In the rapidly evolving field of AI and machine learning, deploying language models into production environments efficiently and reliably is a significant challenge. LangServe, a tool developed by the LangChain team, addresses this challenge by providing a robust framework for easily deploying LangChain-based models and applications. In this article, we will look at what LangServe is, its applications, and how to use it to deploy our LangChain applications.
Table Of Contents:
- What is LangServe?
- Why do we need LangServe?
- Advantages of LangServe
- Applications of LangServe
- Using LangServe to Deploy the LangChain Application
Let us dive into LangServe’s applications and then deploy an app to localhost.
What is LangServe?
LangServe is a deployment tool that helps transform LangChain prototypes into production-ready applications. It uses FastAPI to provide an easy web server setup with features such as streaming outputs, batch processing, automatic schema generation, and support for multiple concurrent requests. It is an ideal solution for developers who want to streamline the deployment of language models and chains.
Why Do We Need LangServe?
Automatic API Generation
LangServe creates FastAPI services automatically, complete with streaming and batch endpoints, so we do not have to set up the API manually.
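As a minimal sketch of what this looks like (the echo runnable and route name here are hypothetical, not part of this article’s app), a single add_routes call generates the endpoints:
from fastapi import FastAPI
from langchain_core.runnables import RunnableLambda
from langserve import add_routes

app = FastAPI(title="Echo API")

# Any LCEL runnable works; a trivial lambda stands in for a real chain here.
echo = RunnableLambda(lambda text: f"You said: {text}")

# This generates POST /echo/invoke, /echo/batch, and /echo/stream,
# plus input/output schemas and an interactive playground page.
add_routes(app, echo, path="/echo")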
Playground UI
LangServe provides us with a user-friendly playground. We can experiment with different configurations and inputs, making it easy to test and refine our LangChain applications.
Configuration Options
Using LangServe, we can configure key components of our LangChain application, such as the model, temperature, and top-k parameters, directly through the API.
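For example, LangChain’s configurable_fields mechanism is one way to expose such settings; this sketch makes the temperature of an illustrative model configurable per request:
from langchain_core.runnables import ConfigurableField
from langchain_openai import ChatOpenAI

# Mark temperature as configurable so API callers can override it per request.
model = ChatOpenAI(temperature=0).configurable_fields(
    temperature=ConfigurableField(
        id="llm_temperature",
        name="LLM Temperature",
        description="Sampling temperature used by the model",
    )
)

# A caller can then send {"config": {"configurable": {"llm_temperature": 0.9}}}
# in the request body to change the setting for that call only.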
Remote Calling
LangServe supports calling remote LangServe instances from JavaScript environments like the browser, making it possible to deploy our LangChain applications to a wider audience.
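LangServe also ships a RemoteRunnable client in Python (a similar client exists for JavaScript), which lets us call a deployed route as if it were a local runnable. A sketch, assuming a hypothetical local deployment at the URL below:
from langserve import RemoteRunnable

# Point the client at a deployed LangServe route (hypothetical local URL).
remote_chain = RemoteRunnable("http://localhost:8000/agent/")

# invoke, batch, and stream all work over the wire like a local runnable.
response = remote_chain.invoke({"input": "Hello!", "chat_history": []})
print(response)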
LangChain Expression Language (LCEL)
LangServe is built on top of LCEL, which allows you to deploy chains with no code changes, from simple prompts to complex multi-step chains. This feature ensures that your LangChain applications can be easily adapted and scaled as your requirements evolve.
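As a quick illustration, here is a minimal LCEL chain (the prompt and model choice are illustrative); the very same object can later be served by add_routes without modification:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# A simple LCEL pipeline: prompt -> model -> string output.
prompt = ChatPromptTemplate.from_template("Tell me one fact about {topic}")
chain = prompt | ChatOpenAI(model="gpt-3.5-turbo") | StrOutputParser()

# The same object can be served later: add_routes(app, chain, path="/fact")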
Advantages of Using LangServe
Let us look at the advantages of using LangServe.
Streamlined Deployment
It is easy to deploy LangChain runnables and chains as accessible REST APIs using LangServe. Removing the technical complexities of deployment allows developers to focus more on building and improving their applications.
Visibility
When deploying with LangServe, we can use the LangSmith platform to gain valuable insights into our application’s performance, monitoring usage, errors, costs, and other key metrics.
Versatile Integration
LangServe provides a client-friendly interface that closely resembles the functionality of other runnables in the LangChain framework. This makes it simple to integrate deployed APIs into your application, regardless of the language or environment.
Simplified API Serving
By leveraging FastAPI and Pydantic, LangServe offers a robust solution for serving APIs from your LangChain applications. It exposes core LangChain Expression Language methods such as invoke, batch, and stream as remote APIs, simplifying the process of making an app’s functionality available over HTTP.
Applications of LangServe
Chatbots and Conversational Agents
Chatbots have become essential in customer service, virtual assistants, and user engagement interfaces. LangServe simplifies the deployment of chatbots by offering an easy setup for conversational agents. By integrating LangChain’s conversational retrieval chains with LangServe, developers can create chatbots capable of understanding and responding to user queries accurately and efficiently. This allows businesses to provide better customer support and enhance user interaction through seamless, real-time communication.
Knowledge Management Systems
Organizations dealing with large volumes of documents can leverage LangServe to build advanced knowledge management systems. These systems can efficiently index and retrieve information, helping users find relevant documents quickly. By using LangServe’s capabilities, businesses can deploy retrieval-augmented generation (RAG) setups.
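As a rough sketch of such a RAG setup (assuming a retriever built from an indexed document store, much like the one we create later in this article), an LCEL chain such as the following could be deployed directly with add_routes:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

# "retriever" is assumed to be a vector-store retriever built elsewhere.
rag_prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | rag_prompt
    | ChatOpenAI()
    | StrOutputParser()
)
# rag_chain can be deployed directly: add_routes(app, rag_chain, path="/rag")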
Real-Time Data Processing
Real-time data processing is crucial for applications such as monitoring systems and live analytics. LangServe supports streaming outputs, which is essential for handling real-time data streams. This feature allows applications to process data as it arrives and generate incremental outputs.
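To illustrate, any LCEL chain supports the stream method, and add_routes exposes the same behaviour over HTTP via a /stream endpoint. A small sketch (the prompt and topic are illustrative):
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

chain = (
    ChatPromptTemplate.from_template("Summarize recent trends in {topic}")
    | ChatOpenAI(model="gpt-3.5-turbo")
    | StrOutputParser()
)

# Tokens print as they arrive instead of waiting for the complete answer.
for chunk in chain.stream({"topic": "streaming analytics"}):
    print(chunk, end="", flush=True)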
Using LangServe to Deploy the LangChain Application
As we have seen, LangServe makes deploying LangChain applications quick and straightforward. We will now deploy a LangChain app that answers queries about a document, and then a second app that either chats with us or performs the task it is given.
To begin with, we will install all the required packages and import all the libraries that we will be using.
%pip install langchain langchain-openai langchain-core python-dotenv langserve fastapi uvicorn langchain-community sse_starlette beautifulsoup4 langchainhub
from typing import List
import os
from fastapi import FastAPI
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_community.document_loaders import WebBaseLoader
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.tools.retriever import create_retriever_tool
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain import hub
from langchain.agents import create_openai_functions_agent
from langchain.agents import AgentExecutor
from langchain.pydantic_v1 import BaseModel, Field
from langchain_core.messages import BaseMessage
from langserve import add_routes
We will be using the Tavily Search API, so we need to set its API key in our environment. Our app also uses OpenAI models for chat and embeddings, so the OpenAI API key must be set as well.
os.environ["TAVILY_API_KEY"] = "******"
os.environ["OPENAI_API_KEY"] = "******"
We are going to load our document. Here, we will use WebBaseLoader to load a Wikipedia page. We will split this document into chunks. Next, we will embed these chunks into vectors and store them in our vector database.
# Load Document
loader = WebBaseLoader("https://en.wikipedia.org/wiki/Harry_Potter")
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
embeddings = OpenAIEmbeddings()
vector = FAISS.from_documents(documents, embeddings)
retriever = vector.as_retriever()
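Optionally, we can sanity-check the retriever before wiring it into an agent (the query here is just an example):
# Optional sanity check: confirm the retriever returns relevant chunks.
found = retriever.invoke("Who is the author of Harry Potter?")
print(found[0].page_content[:200])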
Now, let us create a retriever tool, which the agent will use to fetch relevant chunks from the vector store.
# Creating tools
retriever_tool = create_retriever_tool(
    retriever,
    "HarryPotter",
    "Search for information related to the Harry Potter books. For any questions about the books, use this tool!",
)
search = TavilySearchResults()
tools = [retriever_tool, search]
Next, we will create an LLM agent. This agent will decide when to use the retriever tool and will execute the retrieval process.
# Create agent
prompt = hub.pull("hwchase17/openai-functions-agent")
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
agent = create_openai_functions_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
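Before serving the agent, we can optionally test it locally with a direct call (the question is illustrative):
# Optional local test before serving: call the agent executor directly.
result = agent_executor.invoke({"input": "Who wrote the Harry Potter books?"})
print(result["output"])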
We can now define our app using FastAPI, giving it a title, version, and description.
# Defining app
app = FastAPI(
    title="Harry Potter: Ask me Anything",
    version="1.0",
    description="A simple API server using LangChain's Runnable interfaces",
)
Next, we will define the application’s input and output classes. Then, we will add a route connecting our app and agent executor. We have to define a path here, which will be appended to our localhost URL.
# Adding chain route
class Input(BaseModel):
    input: str
    chat_history: List[BaseMessage] = Field(
        ...,
        extra={"widget": {"type": "chat", "input": "location"}},
    )

class Output(BaseModel):
    output: str

add_routes(
    app,
    agent_executor.with_types(input_type=Input, output_type=Output),
    path="/agent",
)
Now, just run the app using uvicorn. The app will run on localhost.
if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="localhost", port=8000)
When we run this code, we will get a localhost link. Visiting http://localhost:8000/docs takes us to the app’s auto-generated API documentation and schema. To open the application itself, we append the path we defined, followed by /playground/, for example, http://localhost:8000/agent/playground/. In the playground, we can ask any question we want and see the retrieved response.
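We can also call the deployed route programmatically. As a sketch, each LangServe route exposes a POST <path>/invoke endpoint whose payload nests the Input schema under an "input" key (the question below is illustrative):
import requests

# POST <path>/invoke with the Input schema nested under "input".
response = requests.post(
    "http://localhost:8000/agent/invoke",
    json={"input": {"input": "Who wrote Harry Potter?", "chat_history": []}},
)
print(response.json()["output"])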
Similarly, we can deploy an app that will either chat with us or execute the task we have given it.
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# The prompt's variables (input, chat_history) match the Input schema we
# defined earlier, so the same route types can be reused for this agent.
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. You may not need to use tools for every query - the user may just want to chat!",
        ),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)
from langchain.agents import AgentExecutor, create_openai_tools_agent
agent = create_openai_tools_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
add_routes(
    app,
    agent_executor.with_types(input_type=Input, output_type=Output),
    path="/trial",
)
After restarting the server, the new route’s playground is available at http://localhost:8000/trial/playground/.
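As with the first route, we can consume the /trial route programmatically. A sketch using the /stream endpoint’s server-sent events (the input is illustrative):
import requests

# POST /trial/stream replies with server-sent events;
# stream=True lets us read them incrementally.
with requests.post(
    "http://localhost:8000/trial/stream",
    json={"input": {"input": "Tell me a joke", "chat_history": []}},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if line:
            print(line.decode())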
Thus, using LangServe, we deployed our retriever app and an all-in-one chat-and-task app to localhost. We simply built our tools, defined our app, and added routes to it. With minimal steps, we were able to deploy an app; hence, LangServe is very handy and easy to use.
Conclusion
LangServe bridges the gap between prototype and production, making it an invaluable tool for deploying scalable and efficient language model applications. Its versatility allows it to be applied across various domains, from customer service chatbots and knowledge management systems to real-time data processing and personalized content generation. By leveraging LangServe, developers can focus on building innovative solutions without worrying about the complexities of deployment, ultimately enhancing the efficiency and scalability of their AI-driven applications.