Large language models (LLMs) have captured the world's attention with their ability to produce convincing code, clever scripts, and even beautiful poetry. Despite this impressive capability, they frequently fail to ground their output in verifiable facts. Retrieval Augmented Generation (RAG) is a technique that improves the accuracy and relevance of LLMs by letting them draw on external knowledge. In this article, we explore RAG with a hands-on implementation built on the Pinecone vector database.
Table of Contents
- Retrieval Augmented Generation
- RAG Pipeline
- Need and Importance of RAG
- Vector Databases
- Importance of Vector Databases
- Feature Comparison of Different Vector Databases
- Pinecone Vector Database
- Enhancing RAG with Pinecone: Hands-on Implementation
Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) improves the quality of generative AI by integrating external knowledge, enabling large language models (LLMs) to provide more accurate and contextually relevant answers. RAG allows systems to use current data from external sources without retraining the LLM from scratch. The technique involves two primary phases: retrieval and generation.
During the retrieval phase, relevant data is ingested from external public or private sources; during the generation phase, the language model uses the retrieved data to generate informed responses. RAG thereby improves the efficiency and performance of AI models, reduces information gaps and hallucinations, and enables AI systems to act appropriately based on contextual information and user instructions.
RAG Pipeline
Understanding the RAG pipeline involves a multi-step process that comprises the following stages (a minimal code sketch follows the list):
- Retrieval: In the initial phase, the LLM examines the input prompt to determine which external sources are most pertinent to consult. Mission logs, news archives, and scientific databases are a few examples of these sources. The model gathers the necessary data from these sources using natural language processing techniques, which forms the basis for the next steps.
- Augmentation: The second stage incorporates the retrieved data into the LLM's generative process. Through this fusion, the model's internal representations are combined with external knowledge to produce contextualised and informed text.
- Generation: In the third step, the refined LLM produces the actual text using the merged internal and external knowledge. Thanks to the improvements made by RAG, the output is well-written, cohesive, fully descriptive, and current.
RAG (minimalistic view)
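To make these stages concrete, here is a minimal, illustrative sketch of a RAG loop in Python. The function names (retrieve_documents, build_augmented_prompt, generate_answer) are hypothetical, and the vector store and LLM are assumed to expose LangChain-style similarity_search and invoke methods; this is a sketch of the idea, not the implementation used later in this article.
def retrieve_documents(query, vector_store, top_k=3):
    # Retrieval: fetch the chunks most similar to the query
    return vector_store.similarity_search(query, k=top_k)

def build_augmented_prompt(query, documents):
    # Augmentation: merge the retrieved knowledge into the prompt
    context = "\n\n".join(doc.page_content for doc in documents)
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

def generate_answer(query, vector_store, llm):
    # Generation: the LLM produces a grounded answer from the augmented prompt
    docs = retrieve_documents(query, vector_store)
    prompt = build_augmented_prompt(query, docs)
    return llm.invoke(prompt)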
Need and Importance of RAG
The purpose of Retrieval Augmented Generation (RAG) is to enhance the quality of language model-generated responses by grounding the model on external sources of knowledge. RAG combines information retrieval with text generation, allowing language models to access current, verifiable information from external sources, supplementing the model’s internal representation of information.
The importance of RAG in LLMs can be understood by the image below:
Importance of RAG
Vector Databases
Vector databases are specialised data management platforms designed for managing, storing, and querying high-dimensional vectors. These vectors are mathematical representations that can encode a wide range of data types, including text, audio, images, and other complex information.
Importance of Vector Databases
The following are some important features of vector databases:
- Efficient similarity search: Conventional databases are good at identifying exact matches, but they struggle to locate related data based on context or meaning. Vector databases excel at this by employing:
- Vector embeddings: Data points are converted into vectors that capture the semantic relationships between them.
- Specialised indexing and search algorithms: These algorithms efficiently find similar vectors based on distance metrics such as cosine similarity (see the sketch after this list).
Example of Similarity Search
- Handling high-dimensional data: Unlike traditional databases, whose fixed-column tables struggle with high-dimensional data, vectors can hold a flexible number of features. This is crucial for the following:
- Text embeddings: Capture the meaning of words and sentences from documents.
- Image and video features: Represent visual content beyond basic metadata.
- Sensor data: Store readings from various sensors in a single vector.
- Scalability and performance: As data volume and dimensions grow, traditional databases can slow down. Vector databases use the following for scaling:
- Distributed architectures: Handle large datasets across multiple servers.
- Optimised indexing and search: Maintain fast performance even with complex queries. This ensures the applications stay responsive even with massive amounts of high-dimensional data.
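As a concrete illustration of similarity search, the sketch below ranks a few toy vectors against a query vector using cosine similarity with NumPy. The vectors and document names are made up for demonstration; a real vector database would use learned embeddings and approximate nearest-neighbour indexes rather than this brute-force loop.
import numpy as np

def cosine_similarity(a, b):
    # 1.0 means identical direction, 0.0 means unrelated (orthogonal)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "database" of 4-dimensional embeddings (real embeddings have hundreds of dimensions)
stored = {
    "doc_about_cats": np.array([0.9, 0.1, 0.0, 0.2]),
    "doc_about_dogs": np.array([0.8, 0.2, 0.1, 0.3]),
    "doc_about_cars": np.array([0.1, 0.9, 0.8, 0.0]),
}
query = np.array([0.85, 0.15, 0.05, 0.25])  # embedding of the user's query

# Rank the stored vectors by similarity to the query
for name, vec in sorted(stored.items(), key=lambda kv: cosine_similarity(query, kv[1]), reverse=True):
    print(name, round(cosine_similarity(query, vec), 3))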
Feature Comparison of Different Vector Databases
Comparative Analysis of Different Vector Databases
Pinecone Vector Database
Pinecone is a cloud-based managed vector database specifically designed for machine learning applications. It’s a great choice for implementing semantic search due to its features and capabilities:
- Key features of Pinecone:
- High-performance search: Offers very low latency for vector search, even with billions of vectors.
- Real-time updates: Supports updating data in real time, enabling dynamic changes to your search index.
- Filtered search: Combine vector search with metadata filters for more precise and relevant results (a small sketch follows this list).
- Easy integration: Offers toolkits for popular languages such as Python.
- Managed service: Pinecone manages the vector infrastructure, so users do not have to provision or maintain it.
- Security and compliance: Pinecone is SOC 2 certified and HIPAA compliant.
- How Pinecone can be used for semantic search:
- Text search: Find similar documents, articles, or product descriptions based on their meaning, not just keywords.
- Recommendations: Make product recommendations to users by using vectors that represent their prior purchases and interests.
- Chatbot Creation: Create chatbots that can comprehend difficult inquiries and provide insightful answers.
- Knowledge Graph Integration: Investigate vast text datasets using knowledge graphs to locate pertinent information based on relationships and concepts.
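To give a feel for filtered search in particular, here is a small sketch of querying a Pinecone index with a metadata filter via the official Python client. The index name, the query embedding, and the metadata fields (category, year) are illustrative assumptions and are separate from the hands-on implementation that follows.
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")  # placeholder key
index = pc.Index("example-index")  # hypothetical index name

# Combine vector similarity with a metadata filter:
# only vectors whose metadata matches the filter are considered
results = index.query(
    vector=[0.1] * 1536,  # query embedding, e.g. from OpenAI's text-embedding-ada-002
    top_k=5,
    filter={"category": {"$eq": "news"}, "year": {"$gte": 2023}},
    include_metadata=True,
)
for match in results.matches:
    print(match.id, match.score, match.metadata)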
Enhancing RAG with Pinecone: Hands-on Implementation
Initial Requirements:
- API Keys
- OpenAI
- Pinecone
Let’s delve into the implementations with the following step-by-step, hands-on guide.
1: Dependencies
We need to install the following libraries:
- langchain – for LangChain orchestration of the RAG pipeline
- openai – to access OpenAI's GPT models for RAG operations
- unstructured – for working with the URL data
- clean-text – for pre-processing the URL data
- langchain-pinecone – for implementing vector storage with RAG
!pip install langchain-openai unstructured==0.7.12 pinecone-client openai langchain clean-text langchain-pinecone
2: Importing Dependencies
We import the installed libraries needed to create our RAG agent backed by the Pinecone vector DB.
# Text splitting and document handling
from langchain.text_splitter import CharacterTextSplitter
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.docstore.document import Document
# Loading web page content from URLs
from langchain.document_loaders import UnstructuredURLLoader
# Retrieval QA chains
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.chains import RetrievalQA
# Pinecone vector store integration
from langchain.vectorstores.pinecone import Pinecone
# OpenAI LLM wrappers
from langchain import OpenAI
from langchain_openai import ChatOpenAI
# Text cleaning utility
from cleantext import clean
import os
import nltk
import pinecone
import openai
3: User input for building a RAG agent that takes URL data and builds a vector store
The code below takes user input, namely the number of web links and the links themselves, which is used to build a tailored RAG agent.
urls = []
n_weblinks = int(input("How many web links do you want the RAG agent to refer to for response generation & insights? Enter here: "))
print("Enter your links below: ")
for i in range(0, n_weblinks):
    inp = input()
    # appending the link to the list
    urls.append(inp)
Output:
4: Loading URL data with LangChain's UnstructuredURLLoader and cleaning its content
We implement the UnstructuredURLLoader to load the URL data and convert it into textual format, split it into chunks, and then clean each chunk with the cleantext library's clean function.
loader = UnstructuredURLLoader(urls=urls)
docs = loader.load()
# Merging the loaded pages into a single document list
documents = []
documents.extend(docs)
# Splitting the documents into overlapping chunks for embedding
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
texts = text_splitter.split_documents(documents)
texts
# Clean the text of each chunk in place so the cleaned content is what gets embedded
for doc in texts:
    doc.page_content = clean(
        text=doc.page_content,
        fix_unicode=True,
        to_ascii=True,
        lower=True,
        no_line_breaks=False,
        no_urls=False,
        no_emails=False,
        no_phone_numbers=False,
        no_numbers=False,
        no_digits=False,
        no_currency_symbols=False,
        no_punct=False,
        replace_with_punct="",
        replace_with_url="This is a URL",
        replace_with_email="Email",
        replace_with_phone_number="",
        replace_with_number="123",
        replace_with_digit="0",
        replace_with_currency_symbol="$",
        lang="en"
    )
texts
Output:
5: Setting up OpenAI and Pinecone API Keys
We use Colab's secrets feature (userdata.get) to initialise the OpenAI and Pinecone API keys. We also set up the Pinecone ServerlessSpec with the 'aws' cloud provider and the 'us-east-1' region, which are available on the free tier.
# Set the OpenAI API key as an environment variable
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_APIKEY")
# Set the Pinecone API key
from pinecone import Pinecone
os.environ["PINECONE_API_KEY"] = userdata.get("PINECONE_API")
api_key = os.getenv("PINECONE_API_KEY")
# configure client
pc = Pinecone(api_key=api_key)
from pinecone import ServerlessSpec
cloud = os.environ.get('PINECONE_CLOUD') or 'aws'
region = os.environ.get('PINECONE_REGION') or 'us-east-1'
spec = ServerlessSpec(cloud=cloud, region=region)
6: Creating a new Index in Pinecone DB
We delete the index named 'myindex' if it already exists and create a new index 'myindex' with dimension 1536 (matching OpenAI's text-embedding-ada-002 embeddings) and the 'dotproduct' similarity metric.
import time
index_name = "myindex"
# delete the index if it already exists
if index_name in pc.list_indexes().names():
    pc.delete_index(index_name)
# we create a new index
pc.create_index(
    index_name,
    dimension=1536,  # dimensionality of text-embedding-ada-002
    metric='dotproduct',
    spec=spec
)
# wait for index to be initialized
while not pc.describe_index(index_name).status['ready']:
    time.sleep(1)
index = pc.Index(index_name)
index.describe_index_stats()
Output:
7: Using LangChain for Creating Pinecone Vector Store
We use the langchain_pinecone library to create a PineconeVectorStore with OpenAI embeddings, which converts our URL data into vectors and stores them in the Pinecone DB.
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
vectorstore = PineconeVectorStore(index_name=index_name, embedding=embeddings)
8: Upserting URL content data into the Pinecone Vector Store
We insert the vectorised URL content into the Pinecone index 'myindex' for RAG querying.
vectorstore_from_docs = PineconeVectorStore.from_documents(
    texts,
    index_name=index_name,
    embedding=embeddings
)
Output: Checking Pinecone web dashboard
9: Querying our agent
We query our RAG agent and inspect the retrieved chunks using the vectorstore.similarity_search function.
query = "What is AdaSci?"
vectorstore.similarity_search(query)
Output:
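Optionally, the number of retrieved chunks can be limited and relevance scores inspected. The snippet below is an add-on to the step above, using standard LangChain vector-store methods (similarity_search with k, and similarity_search_with_score) on the same vectorstore and query.
# Return only the 3 most similar chunks
top_docs = vectorstore.similarity_search(query, k=3)

# Retrieve each chunk together with its similarity score
for doc, score in vectorstore.similarity_search_with_score(query, k=3):
    print(round(score, 3), doc.page_content[:80])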
10: Building a proper generative AI chatbot model powered by LangChain, RAG and Pinecone
We set up a generative LLM using OpenAI's GPT-3.5 Turbo model with temperature 0.0 (temperature controls the degree of randomness; higher values make the generated response more varied).
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA
# completion llm
llm = ChatOpenAI(
    openai_api_key=userdata.get("OPENAI_APIKEY"),
    model_name='gpt-3.5-turbo',
    temperature=0.0
)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever()
)
qa.run(query)
Output:
initial_prompt = "You are a researcher who is going to search the web links, summarize them and share insights as asked"
query = "What is the primary goal of AdaSci?"
result = qa({"query": query, "prompt": initial_prompt})
print(result['result'])
Output:
The primary goal of ADaSci is to advance AI expertise and provide global recognition as a certified professional in the field of data science and artificial intelligence. They offer memberships, accreditations, and opportunities for upskilling and reskilling in AI-related areas.
Final Words
In conclusion, combining RAG and Pinecone offers a powerful and versatile approach to advanced large language modelling. RAG's ability to mitigate hallucinations and generate responses grounded in external data is complemented by Pinecone's high-performance indexing and retrieval, which keeps retrieval and response times fast. Pinecone's serverless architecture also removes the need for infrastructure management, which can yield significant cost savings when applying RAG to large-scale tasks compared with traditional information retrieval systems.