How to Enhance RAG Models with Pinecone Vector Database?

Discover the transformative power of Retrieval Augmented Generation (RAG) models in unleashing the full potential of large language models (LLMs) through Pinecone Vector DB.

The world has become enthralled by large language models (LLMs), which produce persuasive code, clever scripts, and even beautiful poetry. Despite their immense talent, they frequently lack grounding in factual, up-to-date information. Retrieval Augmented Generation (RAG) is a method that improves the accuracy and relevance of LLMs by enabling them to draw on vast stores of external knowledge. In this article, we explore RAG with the Pinecone vector database through a hands-on approach.

Table of Contents

  1. Retrieval Augmented Generation
    1. RAG Pipeline
    2. Need and Importance of RAG
  2. Vector Databases
    1. Importance of Vector Databases
    2. Feature Comparison of Different Vector Databases
    3. Pinecone Vector Database
  3. Enhancing RAG with Pinecone: Hands-on Implementation

Retrieval Augmented Generation

Retrieval Augmented Generation (RAG) improves the quality of generative AI by integrating external knowledge, enabling large language models (LLMs) to provide more accurate and contextually relevant answers. RAG allows systems to use current data from external sources without retraining the LLM from scratch. The technique involves two primary phases: retrieval and generation.

During the retrieval phase, the relevant data is ingested from external public or private sources, and during the generation phase the language model uses the retrieved data to generate insightful responses. The effect of RAG extends to improving the efficiency and performance of AI models, reducing information gaps and hallucinations, and enabling AI systems to take appropriate actions based on contextual information and user instructions.

RAG Pipeline

Understanding the RAG pipeline involves a multi-step process that comprises the following stages:

  1. Retrieval: In the initial stage, the LLM examines the input prompt to determine which external sources are most pertinent to consult. Mission logs, news archives, and scientific databases are a few examples of such sources. The model gathers the necessary data from these sources using natural language processing techniques, which forms the basis for the next steps.
  2. Augmentation: The goal of the second stage is to incorporate the retrieved data into the LLM's generation process. Through this fusion, the model's internal representations are combined with external knowledge to produce contextualised and informed text.
  3. Generation: In the third stage, the refined LLM produces the actual text from the merged internal and external knowledge. The output is well-written, cohesive, fully descriptive, and current because of the improvements made by RAG.

RAG (minimalistic view)
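To make these stages concrete, here is a minimal, self-contained Python sketch of the retrieve, augment, and generate loop. The tiny corpus, the keyword-overlap scorer, and the generate_answer stub are purely illustrative assumptions, not part of any library; the hands-on section later replaces them with Pinecone retrieval and an OpenAI model.

# Minimal, illustrative RAG loop: retrieve -> augment -> generate.
# The corpus, scorer, and generate_answer below are toy placeholders.
corpus = [
    "RAG grounds LLM answers in external documents.",
    "Pinecone is a managed vector database for similarity search.",
    "Temperature controls randomness in LLM generation.",
]

def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    # Toy retrieval: rank documents by word overlap with the query.
    q_words = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)[:top_k]

def augment(query: str, context: list[str]) -> str:
    # Fuse the retrieved context with the user question into one prompt.
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}\nAnswer:"

def generate_answer(prompt: str) -> str:
    # Placeholder for an LLM call (e.g. an OpenAI chat completion).
    return f"[An LLM would answer here, grounded in]\n{prompt}"

query = "What does a vector database like Pinecone do?"
print(generate_answer(augment(query, retrieve(query, corpus))))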

Need and Importance of RAG

The purpose of Retrieval Augmented Generation (RAG) is to enhance the quality of language model-generated responses by grounding the model on external sources of knowledge. RAG combines information retrieval with text generation, allowing language models to access current, verifiable information from external sources, supplementing the model’s internal representation of information. 

The importance of RAG in LLMs can be understood by the image below: 

Importance of RAG

Vector Databases

Vector databases are specialised data management platforms designed for storing, managing, and querying high-dimensional vectors. These vectors are mathematical representations that can encode a wide range of data types, including text, audio, images, and other complex information.

Importance of Vector Databases

The following are some important features of vector databases:

  1. Efficient similarity search: Conventional databases are good at identifying exact matches, but they struggle to locate related data based on context or meaning. Vector databases excel at this by employing:
    1. Vector embeddings: Semantic relationships between data points are captured by converting them into vectors.
    2. Specialised indexing and search algorithms: These algorithms efficiently find similar vectors based on distance metrics such as cosine similarity (see the short example after this list).

Example of Similarity Search

  2. Handling high-dimensional data: Unlike traditional databases built around tables with fixed columns, which struggle with high-dimensional data, vectors can hold a flexible number of features. This is crucial for the following:
    1. Text embeddings: Capture the meaning of words and sentences from documents.
    2. Image and video features: Represent visual content beyond basic metadata.
    3. Sensor data: Store readings from various sensors in a single vector.
  3. Scalability and performance: As data volume and dimensions grow, traditional databases can slow down. Vector databases use the following for scaling:
    1. Distributed architectures: Handle large datasets across multiple servers.
    2. Optimised indexing and search: Maintain fast performance even with complex queries. This ensures the applications stay responsive even with massive amounts of high-dimensional data. 
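To illustrate the distance metrics mentioned in the similarity-search point above, here is a small NumPy sketch of cosine similarity between embedding vectors. The three-dimensional vectors are made up purely for illustration; real text embeddings typically have hundreds or thousands of dimensions (for example, 1536 for OpenAI's text-embedding-ada-002).

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: 1.0 means same direction, values near 0 mean unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional "embeddings"; real embeddings are far larger.
query_vec = np.array([0.9, 0.1, 0.2])
doc_close = np.array([0.8, 0.2, 0.1])   # semantically close to the query
doc_far   = np.array([0.1, 0.9, 0.7])   # semantically distant

print(cosine_similarity(query_vec, doc_close))  # higher score, more similar
print(cosine_similarity(query_vec, doc_far))    # lower score, less similar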

Feature Comparison of Different Vector Databases

Comparative Analysis of Different Vector Databases

Pinecone Vector Database

Pinecone is a cloud-based managed vector database specifically designed for machine learning applications. It’s a great choice for implementing semantic search due to its features and capabilities: 

  1. Key features of Pinecone:
    1. High-performance search: Offers very low latency for vector search, even with billions of vectors.
    2. Real-time updates: Supports updating data in real time, enabling dynamic changes to your search index.
    3. Filtered search: Combine vector search with metadata filters for more precise and relevant results (a short example follows this list).
    4. Easy integration: Offers client libraries for popular languages such as Python.
    5. Managed service: Pinecone manages the underlying vector infrastructure, so users don't have to operate it themselves.
    6. Security and compliance: Pinecone is SOC 2 certified and HIPAA compliant.
  2. How Pinecone can be used for semantic search:
    1. Text search: Find similar documents, articles, or product descriptions based on their meaning, not just keywords.
    2. Recommendations: Make product recommendations to users by using vectors that represent their prior purchases and interests.
    3. Chatbot Creation: Create chatbots that can comprehend difficult inquiries and provide insightful answers.
    4. Knowledge Graph Integration: Investigate vast text datasets using knowledge graphs to locate pertinent information based on relationships and concepts.
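As a sketch of the filtered search capability listed above, the snippet below combines a vector query with a metadata filter using the Pinecone client. The API key placeholder, the index name 'myindex', the 'category' metadata field, and the all-zero 1536-dimensional query vector are assumptions for illustration only; consult the Pinecone documentation for the filter operators supported by your client version.

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")   # placeholder key (assumption)
index = pc.Index("myindex")                      # assumes an existing 1536-dim index

# Nearest-neighbour query restricted to records whose metadata matches the filter
results = index.query(
    vector=[0.0] * 1536,                         # placeholder query embedding
    top_k=3,
    include_metadata=True,
    filter={"category": {"$eq": "news"}},        # hypothetical metadata field
)
for match in results.matches:
    print(match.id, match.score, match.metadata)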

Enhancing RAG with Pinecone: Hands-on Implementation

Initial Requirements: 

  1. API Keys
    1. OpenAI
    2. Pinecone

Let's delve into the implementation with the following step-by-step, hands-on guide.

1: Dependencies

We need to install the following libraries: 

  1. langchain – to implement LangChain orchestration for RAG
  2. openai – to access OpenAI's GPT models for RAG operations
  3. unstructured – for working with the URL data
  4. clean-text – for pre-processing the URL data
  5. langchain-pinecone – for implementing vector storage with RAG
!pip install langchain-openai unstructured==0.7.12 pinecone-client openai langchain clean-text langchain-pinecone

2: Importing Dependencies

We will import the installed libraries for creating our RAG agent based on Pinecone vector DB.

# LangChain components for document loading, splitting, retrieval and QA
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import UnstructuredURLLoader
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

# Text cleaning and supporting utilities
from cleantext import clean
import os
import nltk      # used internally by unstructured for text partitioning
import pinecone
import openai

3: User input for building a RAG agent that takes URL data and builds a vector store

The code below takes user input, namely the number of web links and the links themselves, for building a tailored RAG agent.

urls = []
n_weblinks = int(input("How many web links do you want the RAG agent to refer to for response generation and insights? Enter here: "))
print("Enter your links below: ")
for i in range(n_weblinks):
    inp = input()
    # append each entered link to the list
    urls.append(inp)

Output: 

4: Loading URLs using LangChain's UnstructuredURLLoader and Cleaning their Content

We implement UnstructuredURLLoader to load the URL data and convert it into textual format, then split it into overlapping chunks with RecursiveCharacterTextSplitter. The chunks are cleaned using the cleantext library's clean function.

loader = UnstructuredURLLoader(urls=urls)
url_docs = loader.load()

# Merging the loaded URL documents into a single list
documents = []
documents.extend(url_docs)

# Split the documents into overlapping chunks for embedding
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
texts = text_splitter.split_documents(documents)
texts

# Clean each chunk's text in place so the cleaned content is what gets embedded
for doc in texts:
    doc.page_content = clean(text=doc.page_content,
                             fix_unicode=True,
                             to_ascii=True,
                             lower=True,
                             no_line_breaks=False,
                             no_urls=False,
                             no_emails=False,
                             no_phone_numbers=False,
                             no_numbers=False,
                             no_digits=False,
                             no_currency_symbols=False,
                             no_punct=False,
                             replace_with_punct="",
                             replace_with_url="This is a URL",
                             replace_with_email="Email",
                             replace_with_phone_number="",
                             replace_with_number="123",
                             replace_with_digit="0",
                             replace_with_currency_symbol="$",
                             lang="en")

# Preview the cleaned text of the first chunk
texts[0].page_content

Output:

5: Setting up OpenAI and Pinecone API Keys

We use Colab's secrets feature (userdata.get) to initialise the OpenAI and Pinecone API keys. We also set up the Pinecone ServerlessSpec with the 'aws' cloud provider and 'us-east-1' region, which are available in the free tier.

# Set the OpenAI API key as an environment variable
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_APIKEY")

#Set the pinecone key
from pinecone import Pinecone
os.environ["PINECONE_API_KEY"] = userdata.get("PINECONE_API")
api_key = os.getenv("PINECONE_API_KEY")

# configure client
pc = Pinecone(api_key=api_key)
from pinecone import ServerlessSpec
cloud = os.environ.get('PINECONE_CLOUD') or 'aws'
region = os.environ.get('PINECONE_REGION') or 'us-east-1'
spec = ServerlessSpec(cloud=cloud, region=region)

6: Creating a new Index in Pinecone DB

We delete the index named 'myindex' if it already exists, then create a new index 'myindex' with dimension 1536 (the output dimensionality of OpenAI's text-embedding-ada-002 embeddings) and the 'dotproduct' similarity metric.

import time
index_name = "myindex"

if index_name in pc.list_indexes().names():
   pc.delete_index(index_name)

# we create a new index
pc.create_index(
       index_name,
       dimension=1536,  # dimensionality of text-embedding-ada-002
       metric='dotproduct',
       spec=spec
   )

# wait for index to be initialized
while not pc.describe_index(index_name).status['ready']:
   time.sleep(1)

index = pc.Index(index_name)
index.describe_index_stats()

Output:

7: Using LangChain for Creating Pinecone Vector Store

We use the langchain_pinecone library to create a PineconeVectorStore with OpenAIEmbeddings, which converts our URL data into vectors for storage in Pinecone DB.

from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
vectorstore = PineconeVectorStore(index_name=index_name, embedding=embeddings)

8: Upserting URL content data into the Pinecone Vector Store

We insert the vector data from our URLs into the Pinecone index 'myindex' for RAG querying.

vectorstore_from_docs = PineconeVectorStore.from_documents(
       texts,
       index_name=index_name,
       embedding=embeddings
   )

Output: Checking Pinecone web dashboard

9: Querying our agent

We query our RAG agent and check the response using the vectorstore.similarity_search function.

query = "What is AdaSci?"
vectorstore.similarity_search(query)

Output:
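If relevance scores are also of interest, the LangChain Pinecone vector store exposes a similarity_search_with_score method that returns (document, score) pairs. The short sketch below assumes the same vectorstore and query objects created in the previous steps.

# Retrieve the top matches together with their similarity scores
results = vectorstore.similarity_search_with_score(query, k=3)
for doc, score in results:
    print(round(score, 3), doc.page_content[:120])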

10: Building a Generative AI Chatbot Powered by LangChain, RAG and Pinecone

We set up a generative LLM using OpenAI's GPT-3.5-Turbo model with temperature 0.0 (temperature controls the degree of randomness; a higher value makes the generated response more random, while 0.0 keeps it deterministic and grounded in the retrieved context).

from langchain_openai import ChatOpenAI 
from langchain.chains import RetrievalQA 

# completion llm 
llm = ChatOpenAI( 
   openai_api_key=userdata.get("OPENAI_APIKEY"), 
   model_name='gpt-3.5-turbo', 
   temperature=0.0 
) 
qa = RetrievalQA.from_chain_type( 
   llm=llm, 
   chain_type="stuff", 
   retriever=vectorstore.as_retriever() 
) 
qa.run(query)

Output:

initial_prompt = "You are a researcher who is going to search the web links, summarize them and share insights as asked"

query = "What is the primary goal of AdaSci?"
result = qa({"query": query, "prompt": initial_prompt})
print(result['result'])

Output:

The primary goal of ADaSci is to advance AI expertise and provide global recognition as a certified professional in the field of data science and artificial intelligence. They offer memberships, accreditations, and opportunities for upskilling and reskilling in AI-related areas.
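A note on the prompt above: RetrievalQA expects only the "query" input, so an extra "prompt" key passed at call time is not automatically injected into the chain. If you want the instruction to actually shape the answer, one option, sketched below under the assumption that the llm and vectorstore objects from the previous steps are available, is to pass a custom PromptTemplate through chain_type_kwargs when building the chain.

from langchain.prompts import PromptTemplate

# A custom prompt for the "stuff" chain; it must expose {context} and {question}
template = (
    "You are a researcher who searches the given web content, summarizes it "
    "and shares insights as asked.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)
qa_prompt = PromptTemplate(template=template, input_variables=["context", "question"])

qa_with_prompt = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
    chain_type_kwargs={"prompt": qa_prompt},
)
print(qa_with_prompt.run("What is the primary goal of AdaSci?"))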

Final Words

In conclusion, combining RAG and Pinecone offers a powerful and versatile approach to advanced large language modelling. RAG's ability to mitigate hallucinations and ground responses in external data is complemented by Pinecone's fast retrieval and response times, thanks to its high-performance indexing and search capabilities. Pinecone's serverless architecture also eliminates the need for infrastructure management, which keeps costs low when applying RAG to large-scale tasks compared with traditional information retrieval systems.


References

  1. Link to Colab Notebook
  2. Diving Deeper into Retrieval Augmented Generation AdaSci Course  
  3. Pinecone Documentation
  4. LangChain Documentation 
  5. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Sachin Tripathi

Sachin Tripathi is the Manager of AI Research at AIM, with over a decade of experience in AI and Machine Learning. An expert in generative AI and large language models (LLMs), Sachin excels in education, delivering effective training programs. His expertise also includes programming, big data analytics, and cybersecurity. Known for simplifying complex concepts, Sachin is a leading figure in AI education and professional development.
