
A Practitioner’s Guide to Deploy LangChain Applications with LangServe

Deploy LangChain applications easily with LangServe, ensuring optimal performance and scalability for your AI projects.

In the rapidly evolving field of AI and machine learning, deploying language models into production efficiently and reliably is a significant challenge. LangServe, a tool developed by the LangChain team, addresses this challenge by providing a framework for deploying LangChain-based models and applications. In this article, we will look at what LangServe is, where it can be applied, and how to use it to deploy our LangChain applications.

Table Of Contents:

  1. What is LangServe?
  2. Why Do We Need LangServe?
  3. Advantages of Using LangServe
  4. Applications of LangServe
  5. Using LangServe to Deploy the LangChain Application

Let us take a deep dive into LangServe’s applications and then deploy an app on localhost.

What is LangServe?

LangServe is a deployment tool that helps transform LangChain prototypes into production-ready applications. It uses FastAPI to provide an easy web server setup with features such as streaming outputs, batch processing, automatic schema generation, and support for multiple concurrent requests. It is an ideal solution for developers who want to streamline the deployment of language models and chains.

Why Do We Need LangServe?

Automatic API Generation

LangServe automatically creates FastAPI services with streaming and batch endpoints, so we do not have to set up the API manually.
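As a rough illustration of what gets generated, the sketch below lists the routes that LangServe typically registers for a chain mounted at a given path. The helper `langserve_routes`, the base URL, and the `/agent` path are assumptions made for this example, and the exact set of routes can vary between LangServe versions.

```python
# Illustrative helper (not part of LangServe): lists the HTTP routes that
# add_routes() typically registers for a runnable mounted at `path`.
from typing import Dict

ROUTE_METHODS: Dict[str, str] = {
    "invoke": "POST",        # run the chain on a single input
    "batch": "POST",         # run the chain on a list of inputs
    "stream": "POST",        # stream output chunks as server-sent events
    "input_schema": "GET",   # JSON schema of the expected input
    "output_schema": "GET",  # JSON schema of the produced output
    "playground": "GET",     # interactive test UI
}


def langserve_routes(base_url: str, path: str) -> Dict[str, str]:
    """Map each full route URL to its HTTP method."""
    prefix = base_url.rstrip("/")
    if path.strip("/"):
        prefix += "/" + path.strip("/")
    return {f"{prefix}/{name}": method for name, method in ROUTE_METHODS.items()}


# Hypothetical base URL and path, used only for demonstration
for url, method in langserve_routes("http://localhost:8000", "/agent").items():
    print(method, url)
```

None of these routes has to be written by hand; they follow from a single `add_routes` call.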

Playground UI

LangServe provides us with a user-friendly playground. We can experiment with different configurations and inputs, making it easy to test and refine our LangChain applications.  

Configuration Options

Using LangServe, we can configure the key components of our LangChain application directly from the API. These key components include the model, temperature, and top-k parameters.
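To make this concrete, here is a minimal sketch of a request body that overrides a configurable field at call time. It assumes the served chain declares a configurable field with the id `llm_temperature` (a hypothetical id chosen for this example) and that the server accepts per-request settings under the `config` key, which is how LangServe's request format is documented.

```python
import json
from typing import Any, Dict


def build_invoke_body(user_input: Any, **configurable: Any) -> Dict[str, Any]:
    """Build an /invoke request body, optionally overriding configurable fields."""
    body: Dict[str, Any] = {"input": user_input}
    if configurable:
        # Per-request overrides go under config -> configurable
        body["config"] = {"configurable": configurable}
    return body


# `llm_temperature` is an assumed field id, not one defined in this article
body = build_invoke_body({"topic": "owls"}, llm_temperature=0.7)
print(json.dumps(body, indent=2))
```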

Remote Calling

LangServe supports calling remote LangServe instances from JavaScript environments like the browser, making it possible to deploy our LangChain applications to a wider audience.

LangChain Expression Language (LCEL)

LangServe is built on top of LCEL, which allows you to deploy chains with no code changes, from simple prompts to complex multi-step chains. This feature ensures that your LangChain applications can be easily adapted and scaled as your requirements evolve.

Advantages of Using LangServe

Let us look at the advantages of using LangServe.

Streamlined Deployment

With LangServe, it is easy to deploy LangChain runnables and chains as accessible REST APIs. Removing the technical complexities of deployment allows developers to focus more on building and improving their applications.


When deploying with LangServe, we can use the LangSmith platform to gain valuable insights into our application’s performance. We can monitor usage, errors, costs, and other key metrics.
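Tracing is usually switched on through environment variables rather than code changes. The sketch below assumes the commonly documented LangSmith variables (`LANGCHAIN_TRACING_V2`, `LANGCHAIN_API_KEY`, `LANGCHAIN_PROJECT`). The helper `langsmith_env`, the project name, and the key value are placeholders made up for illustration.

```python
import os
from typing import Dict


def langsmith_env(api_key: str, project: str) -> Dict[str, str]:
    """Return the environment settings that enable LangSmith tracing."""
    return {
        "LANGCHAIN_TRACING_V2": "true",  # turn tracing on
        "LANGCHAIN_API_KEY": api_key,    # LangSmith credentials
        "LANGCHAIN_PROJECT": project,    # project under which runs are grouped
    }


# Apply before the chains and the LangServe app are created.
# Both values below are placeholders, not real credentials.
os.environ.update(langsmith_env("<your-langsmith-api-key>", "langserve-demo"))
```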

Versatile Integration

LangServe provides a client-friendly interface that closely resembles the functionality of other runnables in the LangChain framework. This makes it simple to integrate deployed APIs into your application, regardless of the language or environment.

Simplified API Serving

By leveraging FastAPI and Pydantic, LangServe offers a robust solution for serving APIs from your LangChain applications. It exposes core LangChain Expression Language methods such as invoke, batch, and stream as remote APIs, which simplifies making the app's functionality available over HTTP.
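The mapping from runnable methods to HTTP endpoints can be sketched as follows. The helper `build_request` and the `http://localhost:8000/chain` URL are assumptions for this example. The code only constructs request objects with the standard library and does not send them. The body shapes (`input` for invoke and stream, `inputs` for batch) follow LangServe's documented request format.

```python
import json
import urllib.request
from typing import Any


def build_request(base_url: str, method: str, payload: Any) -> urllib.request.Request:
    """Build a POST request for one of the remote runnable methods."""
    if method not in {"invoke", "batch", "stream"}:
        raise ValueError(f"unsupported method: {method}")
    # batch takes a list under "inputs"; invoke and stream take a single "input"
    key = "inputs" if method == "batch" else "input"
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{method}",
        data=json.dumps({key: payload}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical route and inputs, for demonstration only
single = build_request("http://localhost:8000/chain", "invoke", {"topic": "owls"})
many = build_request(
    "http://localhost:8000/chain", "batch", [{"topic": "owls"}, {"topic": "cats"}]
)
print(single.full_url, many.full_url)
```

With a server running, a request built this way could be sent with `urllib.request.urlopen(single)`.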

Applications of LangServe

Chatbots and Conversational Agents

Chatbots have become essential in customer service, virtual assistants, and user engagement interfaces. LangServe simplifies the deployment of chatbots by offering an easy setup for conversational agents. By integrating LangChain’s conversational retrieval chains with LangServe, developers can create chatbots capable of understanding and responding to user queries accurately and efficiently. This allows businesses to provide better customer support and enhance user interaction through seamless, real-time communication.

Knowledge Management Systems

Organizations dealing with large volumes of documents can leverage LangServe to build advanced knowledge management systems. These systems can efficiently index and retrieve information, helping users find relevant documents quickly. By using LangServe’s capabilities, businesses can deploy retrieval-augmented generation (RAG) setups.

Real-Time Data Processing

Real-time data processing is crucial for applications such as monitoring systems and live analytics. LangServe supports streaming outputs, which is essential for handling real-time data streams. This feature allows applications to process data as it arrives and generate incremental outputs. 
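A client consumes such a stream as server-sent events. The sketch below is a minimal event parser, assuming the stream uses `event:` and `data:` lines with JSON payloads and finishes with an `end` event, which is how LangServe's stream endpoint is described in its documentation. The function `parse_sse` and the sample lines are illustrative and not taken from a real server response.

```python
import itertools
import json
from typing import Any, Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, Any]]:
    """Yield (event, payload) pairs from server-sent-event lines."""
    event = "message"
    data_lines: List[str] = []
    # A trailing empty line is appended so the last event is always flushed
    for raw in itertools.chain(lines, [""]):
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event
            if data_lines or event != "message":
                payload = "\n".join(data_lines)
                yield event, (json.loads(payload) if payload else None)
            event, data_lines = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())


# Made-up sample stream: two output chunks followed by the end marker
sample = [
    "event: data\n", 'data: "Har"\n', "\n",
    "event: data\n", 'data: "ry"\n', "\n",
    "event: end\n", "\n",
]
chunks = [data for event, data in parse_sse(sample) if event == "data"]
print("".join(chunks))
```

Because each chunk is yielded as soon as its event is complete, the caller can display partial output without waiting for the full response.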

Using LangServe to Deploy the LangChain Application

As we have seen, LangServe makes deploying LangChain applications quick and easy. We will now deploy two LangChain apps: one that answers queries about a document, and one that can either chat or perform the task asked of it.

To begin with, we will install all the required packages and import all the libraries that we will be using. 

%pip install langchain langchain-openai langchain-core python-dotenv langserve fastapi uvicorn langchain-community sse_starlette beautifulsoup4 faiss-cpu

from typing import List
import os
from fastapi import FastAPI
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_community.document_loaders import WebBaseLoader
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.tools.retriever import create_retriever_tool
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain import hub
from langchain.agents import create_openai_functions_agent
from langchain.agents import AgentExecutor
from langchain.pydantic_v1 import BaseModel, Field
from langchain_core.messages import BaseMessage
from langserve import add_routes

The agent uses the Tavily Search API, so we need to make its API key available in our environment (as the TAVILY_API_KEY variable), alongside the OpenAI API key.
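One way to do this, sketched below, is to keep the keys in a local `.env` file and check that they are present before building the app. The variable names `OPENAI_API_KEY` and `TAVILY_API_KEY` are the ones these integrations read by default. The helper `missing_keys` and the use of a `.env` file are assumptions for this example.

```python
import os
from typing import Iterable, List, Mapping

REQUIRED_KEYS = ("OPENAI_API_KEY", "TAVILY_API_KEY")

try:
    # python-dotenv is in the install list above; it loads KEY=value pairs from .env
    from dotenv import load_dotenv

    load_dotenv()
except ImportError:
    pass  # fall back to variables already set in the shell


def missing_keys(
    env: Mapping[str, str] = os.environ, required: Iterable[str] = REQUIRED_KEYS
) -> List[str]:
    """Return the names of required keys that are unset or empty."""
    return [name for name in required if not env.get(name)]


absent = missing_keys()
if absent:
    print("Missing API keys:", ", ".join(absent))
```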


Next, we load our document. Here, we use WebBaseLoader to load a Wikipedia page and split the document into chunks. We then embed these chunks into vectors and store them in our vector database.

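# Load the document, split it into chunks, and index the chunks in FAISS

loader = WebBaseLoader("")  # pass the URL of the Wikipedia page to load
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
embeddings = OpenAIEmbeddings()  # requires OPENAI_API_KEY in the environment
vector = FAISS.from_documents(documents, embeddings)
retriever = vector.as_retriever()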
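To give a feel for what the splitting step produces, here is a deliberately simplified stand-in: fixed-size windows with a small overlap so that text cut at a boundary still appears intact in one chunk. This is not the algorithm used by RecursiveCharacterTextSplitter, which prefers to break on natural separators such as paragraphs and sentences. The function `chunk_text` and its parameter values are assumptions for illustration.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> List[str]:
    """Split text into fixed-size chunks, each sharing `overlap` characters with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining text is already covered by the previous chunk
    last_start = max(len(text) - overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


text = "abcdefghij" * 10  # 100 characters of dummy text
chunks = chunk_text(text, chunk_size=40, overlap=10)
for number, chunk in enumerate(chunks, start=1):
    print(number, len(chunk), chunk)
```

Each chunk is then embedded separately, so the chunk size determines how much context a single retrieved passage carries.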

Now, let us create a retriever tool, which the agent will use to look up relevant passages from the document.

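# Creating tools

retriever_tool = create_retriever_tool(
   retriever,
   "harry_potter_search",  # placeholder tool name; the original name is missing from the source
   "Search for information related to the Harry Potter book. For any questions on the book, use this tool!",
)
search = TavilySearchResults()
tools = [retriever_tool, search]
# For a search-only agent, use this list instead (it drops the retriever tool):
# tools = [TavilySearchResults(max_results=1)]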

Next, we create an LLM agent. The agent decides when to call the retriever tool and uses it to carry out the retrieval.

# Create agent

prompt = hub.pull("hwchase17/openai-functions-agent")
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
agent = create_openai_functions_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

We can now define our app using FastAPI. We have to give the title of our application and version of the app.

# Defining app

app = FastAPI(
   title="Harry Potter: Ask me Anything",
   version="1.0",
   description="A simple API server using LangChain's Runnable interfaces",
)

Next, we will define the application’s input and output classes. Then, we will add a route that connects our app to the agent executor. We also have to define a path here, which is appended to the localhost URL to reach this route.

# Adding chain route

class Input(BaseModel):
   input: str
   chat_history: List[BaseMessage] = Field(
       ...,
       extra={"widget": {"type": "chat", "input": "location"}},
   )

class Output(BaseModel):
   output: str

add_routes(
   app,
   agent_executor.with_types(input_type=Input, output_type=Output),
   path="/agentl",  # path as it appears in the playground URL used in this article
)

Now, just run the app using uvicorn. The app will be running on localhost.

if __name__ == "__main__":
   import uvicorn

   uvicorn.run(app, host="localhost", port=8000)

When we run this code, we get a localhost link. Opening it initially shows a page that looks something like this:

At this point, we go to http://localhost:8000/docs in the browser, which takes us to the app’s API schema. To see the application itself, we append the route path we defined, followed by /playground/, to the localhost URL.

For example, http://localhost:8000/agentl/playground/.

The playground now opens, and we can ask it any question we want. The retrieved response is displayed, as shown below.

The output will be something like this:

Similarly, we can deploy an app that will either chat with us or execute the task we have given it.

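from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.agents import AgentExecutor, create_openai_tools_agent

prompt = ChatPromptTemplate.from_messages([
   ("system", "You are a helpful assistant. You may not need to use tools for every query - the user may just want to chat!"),
   # The rest of the message list is missing from the source. The placeholders
   # below are assumed so that the prompt matches the Input schema defined earlier.
   MessagesPlaceholder(variable_name="chat_history"),
   ("human", "{input}"),
   MessagesPlaceholder(variable_name="agent_scratchpad"),
])

agent = create_openai_tools_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

add_routes(
   app,
   agent_executor.with_types(input_type=Input, output_type=Output),
   path="/chat",  # assumed path (missing from the source); must differ from the first route
)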

The output will be something like this:

Thus, using LangServe, we deployed both our retriever app and our all-in-one app on localhost. All we had to do was build the tools, define the app, and add the agent to an app route. With a minimum of steps, we were able to deploy an app, which shows how handy and easy to use LangServe is.
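Once the server is running, the same route can also be called from code instead of the playground. The sketch below uses only the standard library and assumes the route path from the playground URL above (`/agentl`), the request shape implied by the Input class, and a response whose `output` field contains the agent's `output` string. The function names are hypothetical, and the response shape may differ between LangServe versions.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

AGENT_URL = "http://localhost:8000/agentl"  # route path taken from the playground URL


def build_agent_payload(
    question: str, chat_history: Optional[List[Dict[str, str]]] = None
) -> Dict[str, Any]:
    """Wrap a question in the body expected by the route's /invoke endpoint."""
    return {"input": {"input": question, "chat_history": chat_history or []}}


def extract_answer(response_body: Dict[str, Any]) -> str:
    """Pull the agent's final answer out of the response (assumed shape)."""
    return response_body["output"]["output"]


def ask_agent(question: str, base_url: str = AGENT_URL, timeout: float = 60.0) -> str:
    """Send one question to the running server and return the answer text."""
    request = urllib.request.Request(
        url=f"{base_url}/invoke",
        data=json.dumps(build_agent_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_answer(json.load(response))


# Requires the server from this article to be running:
# print(ask_agent("Who wrote the Harry Potter books?"))
```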


LangServe bridges the gap between prototype and production, making it an invaluable tool for deploying scalable and efficient language model applications. Its versatility allows it to be applied across various domains, from customer service chatbots and knowledge management systems to real-time data processing and personalized content generation. By leveraging LangServe, developers can focus on building innovative solutions without worrying about the complexities of deployment, ultimately enhancing the efficiency and scalability of their AI-driven applications.




Shreepradha Hegde

Shreepradha is an accomplished Associate Lead Consultant at AIM, showcasing expertise in AI and data science, specifically Generative AI. With a wealth of experience, she has consistently demonstrated exceptional skills in leveraging advanced technologies to drive innovation and insightful solutions. Shreepradha's dedication and strategic mindset have made her a valuable asset in the ever-evolving landscape of artificial intelligence and data science.
