When a model receives a single query, distance-based vector database retrieval embeds the query in a high-dimensional space and looks for similar embedded contexts to build a response. However, even a slight rephrasing of the query can surface different documents and produce a different answer. To address this problem, LlamaIndex features a “Multi-Step Query Engine,” while LangChain has released a powerful “MultiQuery Retriever.” Both let users ask sophisticated questions and receive the most suitable and accurate answers. In this article, we will go deep into the components of the MultiQuery Retriever and the Multi-Step Query Engine, compare the two, and use them to retrieve answers to a query from a document.
Table of Contents
- Introduction to MultiQuery Retriever and Multi-Step Query Engine
- Implementation of LangChain’s MultiQuery Retriever
- Using LlamaIndex’s Multi-Step Query Engine to Answer a Query
- Comparison between MultiQuery Retriever and Multi-Step Query Engine
Let’s start with an introduction to these advanced query retrievers and understand how they work:
Introduction to MultiQuery Retriever and Multi-Step Query Engine
The MultiQuery Retriever and the Multi-Step Query Engine were both introduced to handle complex queries that may require multiple phrasings or multiple steps to answer fully.
The goal of the MultiQuery Retriever is to overcome the drawbacks of distance-based similarity search. It employs a language model to generate several queries similar to the original one, then retrieves documents for each of these queries from data sources such as search engines and vector databases. Taking the union of these results raises the likelihood of discovering the most appropriate and useful information.
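Conceptually, the retrieval loop looks like the minimal sketch below. The paraphrase and search callables are hypothetical stand-ins for an LLM call and a vector lookup; they are not LangChain’s actual API, which we use in the hands-on section.
# Minimal sketch of multi-query retrieval (hypothetical helpers, not LangChain's API)
def multi_query_retrieve(question, paraphrase, search, n_variants=3):
    """Return the deduplicated union of results over several query phrasings."""
    variants = paraphrase(question, n_variants)  # LLM generates alternative phrasings
    seen, results = set(), []
    for q in [question] + variants:
        for doc in search(q):                    # distance-based lookup per phrasing
            if doc not in seen:                  # merge and deduplicate
                seen.add(doc)
                results.append(doc)
    return results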
The Multi-Step Query Engine, in contrast, breaks a complex query down into a sequence of simpler sub-queries. Each sub-query is generated from the original question and the answers obtained so far, so every step refines the previous one. The responses to the individual sub-queries are then combined into the final answer.
Implementation of LangChain’s MultiQuery Retriever
Here, we are going to use LangChain’s MultiQuery Retriever to find a more relevant and accurate response for an input query. We will ask a question about a document, and the retriever will rephrase this question in multiple different ways. For each of those questions, it will search for relevant passages in the vector store where our embedded document is stored. Once it has the results for every query, it merges and deduplicates them, and the combined context is used to generate the final response for the input query.
So, to start with, install all the required packages and import all the libraries.
%pip install langchain langchain-openai langchain-chroma langchain-community pypdf
import os
import logging

from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
Once all the libraries are imported, read the document. For that, we use LangChain’s PyPDFLoader.
loader = PyPDFLoader("./Document/Harry Potter and the Sorcerers Stone.pdf")
pages = loader.load_and_split()
Now that we have our document, we need to embed it and save it in a vector store. For embedding, we will use OpenAIEmbeddings and store the vectors in a Chroma database.
embedding = OpenAIEmbeddings()
vectordb = Chroma.from_documents(documents=pages, embedding=embedding)
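Before wiring up the retriever, we can sanity-check the store with a plain similarity search. This is the distance-based baseline that the MultiQuery Retriever is meant to improve on (the query string here is just an illustration):
# Plain distance-based retrieval: one query, top-k nearest chunks
baseline_docs = vectordb.similarity_search("Who are Harry's friends?", k=3)
for doc in baseline_docs:
    print(doc.page_content[:100])  # preview each retrieved chunk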
Next, we are going to be using the “gpt-3.5-turbo-0125” model.
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0, openai_api_key=os.environ["OPENAI_API_KEY"])
This model will be used to create the alternative questions from the input query and, later in the QA chain, to combine the retrieved contexts into a single answer.
Let’s now build our retriever using MultiQueryRetriever. This will be used to retrieve data from the vector database.
retriever = MultiQueryRetriever.from_llm(
retriever=vectordb.as_retriever(), llm=llm
)
To see the alternative queries generated by the LLM, we will use logging.
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)
Let us now give an input query related to the document and see the alternative questions created.
question = "What are the characters present in the book? And who are Harry's friends?"
docs = retriever.get_relevant_documents(query=question)
len(docs)
INFO:langchain.retrievers.multi_query:Generated queries: ['1. Which characters are featured in the book?', '2. Can you list the characters that appear in the book?', '3. Who are the friends of Harry in the story?']
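Note that newer LangChain releases deprecate get_relevant_documents in favor of the runnable interface, so docs = retriever.invoke(question) should behave equivalently; either way, the retriever returns the deduplicated union of documents found for all generated queries.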
Once we have the retrieved documents ready, we can create a prompt template that takes the query and the retrieved contexts as input variables. We will then create an LLM chain using this prompt.
QA_PROMPT = PromptTemplate(
    input_variables=["query", "contexts"],
    template="""You are a helpful assistant. You have to answer the provided query.
Contexts:
{contexts}
Question: {query}""",
)
# Chain
qa_chain = LLMChain(llm=llm, prompt=QA_PROMPT)
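As a side note, recent LangChain versions deprecate LLMChain in favor of LCEL-style composition. Assuming a current release, an equivalent chain can be built by piping the prompt into the model:
# LCEL-style equivalent of the LLMChain above (recent LangChain versions)
qa_chain_lcel = QA_PROMPT | llm
# qa_chain_lcel.invoke({"query": ..., "contexts": ...}) returns a chat message;
# the generated text lives in its .content attribute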
Next, we can get the combined response as the final output for the input query.
out = qa_chain(
inputs={
"query": question,
"contexts": "\n---\n".join([d.page_content for d in docs])
}
)
out["text"]
And the output will be something like this:
'In the book, "Harry Potter and the Sorcerer\'s Stone," some of the characters present are Harry Potter, Ron Weasley, Hermione Granger, Hagrid, Professor Quirrell, Mr. Ollivander, Dudley Dursley, Aunt Petunia, Uncle Vernon, Nearly Headless Nick, Peeves the Poltergeist, Argus Filch, and Professor Dumbledore. \n\nHarry\'s friends are Ron Weasley and Hermione Granger.'
So, by using the MultiQuery Retriever, we were able to get a much better response than what we would have gotten using a single-query retriever.
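If the default behavior doesn’t fit, MultiQueryRetriever.from_llm also accepts a custom query-generation prompt and, in recent LangChain versions, an include_original flag that searches with the untouched question alongside the generated variants. Treat the exact parameter names as assumptions to verify against your installed version:
# Assumed from_llm parameters in recent LangChain versions -- verify against your release
variant_prompt = PromptTemplate(
    input_variables=["question"],
    template="Generate 3 different rephrasings of this question, one per line:\n{question}",
)
retriever_custom = MultiQueryRetriever.from_llm(
    retriever=vectordb.as_retriever(),
    llm=llm,
    prompt=variant_prompt,    # custom query-generation prompt
    include_original=True,    # also search with the original question
)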
Using LlamaIndex’s Multi-Step Query Engine to Answer a Query
A multi-step query engine is used to decompose a complex query into simple sub-queries. The input query is given to an LLM, which creates sub-queries based on the main query. These sub-queries are created in sequential order: each generated sub-query is a refined form of the previous one. We can inspect the response to each sub-query, and the final response is a combination of these intermediate responses.
Now, let us see the hands-on implementation of the multi-step query engine using LlamaIndex.
To begin with, install all the important packages and import the libraries.
%pip install llama-index-llms-openai
%pip install llama-index
import os
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from IPython.display import Markdown, display
from llama_index.core.indices.query.query_transform.base import StepDecomposeQueryTransform
from llama_index.core.query_engine import MultiStepQueryEngine
# Make sure your OpenAI API key is available in the environment (e.g., loaded from a .env file)
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")
We will be using “gpt-3.5-turbo” to decompose the input query and combine the responses.
# LLM (gpt-3.5)
gpt35 = OpenAI(temperature=0, model="gpt-3.5-turbo")
Now, let us read the document using LlamaIndex’s SimpleDirectoryReader.
documents = SimpleDirectoryReader("Document/").load_data()
We will be building the vector store index.
index = VectorStoreIndex.from_documents(documents)
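For comparison, we can first run an ordinary single-step query against this index; the multi-step engine below wraps exactly this kind of base engine:
# Baseline: a plain single-step query over the index
baseline_engine = index.as_query_engine(llm=gpt35)
print(baseline_engine.query("Who are Harry's friends?"))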
We will be using StepDecomposeQueryTransform to decompose the input query into sub-queries.
# Use gpt-3.5 to generate each decomposition step
step_decompose_transform = StepDecomposeQueryTransform(llm=gpt35, verbose=True)
A useful feature of the multi-step query engine is that we can supply a summary of what the index contains and what kinds of questions we expect it to answer; this summary guides the sub-query generation.
index_summary = "Answers questions about Harry and his friends"
Now, let us create a query engine that generates sub-queries using the LLM and the decomposition query transform. We will then run our question through it and see what sub-queries it generates.
query_engines = index.as_query_engine(llm=gpt35)
query_engine = MultiStepQueryEngine(
query_engine=query_engines,
query_transform=step_decompose_transform,
index_summary=index_summary,
)
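The engine keeps refining for a bounded number of steps. Recent LlamaIndex versions expose this bound as a num_steps argument on MultiStepQueryEngine (treat the exact name as an assumption to verify); lowering it trades answer depth for latency and token cost.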
response_gpt = query_engine.query(
"What are the characters present in the book? And among them who are Harry's friends?"
)
Let us now display the response to the query.
display(Markdown(f"{response_gpt}"))
Output:
With the multi-step query engine, we can also display the responses for each sub-query.
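In recent LlamaIndex versions, these intermediate question-answer pairs are exposed through the response metadata under a sub_qa key; assuming our version matches the documented examples, they can be printed as follows:
# Each entry pairs a generated sub-query with the response it produced
sub_qa = response_gpt.metadata["sub_qa"]
tuples = [(t[0], t[1].response) for t in sub_qa]
print(tuples)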
[("Who are Harry's friends in the book?", "Ron is one of Harry's friends in the book."), ("What are the names of the other friends of Harry in the book?", "Ron is one of Harry's friends in the book. Fred and George are also mentioned as friends of Harry in the text."), ("What roles do Ron, Fred, and George play in Harry's life in the book?", "Ron is a friend who is curious about Harry's identity and background, sharing details about his own family and expressing his insecurities about living up to his older siblings' achievements. Fred and George are mentioned as mischievous twins who are part of Ron's family, known for getting good marks and being funny. They provide a sense of humor and light-heartedness in Harry's life, as well as a connection to the wizarding world through their magical abilities and family background.")]
So, by using LlamaIndex’s multi-step query engine, we were able to answer the query. Keep in mind that LlamaIndex is primarily concerned with loading, ingesting, and indexing data to form a base of knowledge.
Comparing the MultiQuery Retriever and the Multi-Step Query Engine on this example, LangChain gives the better response: it lists all the character names first and then names Harry’s friends. With LlamaIndex, each sub-query builds on the previous one, so some of the original query’s intent is lost along the way.
Comparison between MultiQuery Retriever and Multi-Step Query Engine
Customization vs. Ease of Use
LangChain offers extensive customization options, letting users adjust the retrieval procedure to suit their requirements. This makes it well suited to difficult and specialized tasks, but it demands technical know-how.
LlamaIndex is designed with efficiency and usability in mind, making it suitable for users with different degrees of technical expertise. It is best suited for situations requiring accurate and timely results.
Query Processing
LangChain generates multiple parallel queries to cover different aspects of the original query.
LlamaIndex uses a sequential, iterative process where each query step refines the previous one, gradually narrowing down the results.
Document Retrieval
LangChain retrieves relevant documents for each generated query from a vector store.
LlamaIndex executes the sub-questions one by one, using the results of previous steps to answer the final query.
Result Aggregation
LangChain merges and deduplicates the retrieved documents to provide a comprehensive result.
LlamaIndex provides a step-by-step explanation of how the query was answered.
Performance and Resource Usage
LangChain may require more computational resources due to the generation and processing of multiple queries simultaneously.
LlamaIndex is designed to be efficient and less resource-intensive, focusing on iterative refinement for speed and accuracy.
Integration and Scalability
LangChain provides robust integration capabilities, suitable for integrating with various data sources and other LangChain tools.
LlamaIndex, while efficient and user-friendly, may have limitations in handling extremely large-scale or highly complex retrieval scenarios.
Conclusion
The MultiQuery Retriever from LangChain and the Multi-Step Query Engine from LlamaIndex each provide distinct benefits suited to different retrieval requirements. LangChain’s approach is well suited to retrieving extensive, detailed information, particularly in research-heavy scenarios. LlamaIndex, on the other hand, excels at clarifying ambiguous queries and delivering prompt, accurate responses, which suits real-time applications. The exact needs of the task, the degree of customization required, and the computational resources available will determine which of them to use. With features like LangChain’s comprehensive question generation and LlamaIndex’s methodical decomposition, sophisticated query retrievers are an effective way to improve search results and extract insight from a variety of data sources.