Large Language Models (LLMs) have changed how we work with data. While these AI tools are powerful, they usually require sending data to external servers – a major concern when dealing with sensitive information. No company wants to share confidential data with third-party servers. Yet traditional search methods, which simply match keywords, often fall short of finding exactly what we need. We need something better – a solution that combines AI's intelligence with local security. This is where Ollama comes in. In this guide, we'll show you how to build a search engine that:
- keeps your data on your own computer
- understands questions in plain language
- finds precise answers from your documents
- protects your sensitive information
Let's explore how to make AI search both smart and secure.
Table of Contents
- What Are LLM-Based Local Search Engines?
- Implementation Guide: Building Your Own Search Engine with Ollama
- Demonstrating the Search Engine
Let’s start with understanding what exactly local search engines are.
What Are LLM-Based Local Search Engines?
Local search engines powered by Large Language Models (LLMs) are revolutionizing how we find and use information on our computers. Unlike traditional online search tools, these AI-powered engines work entirely offline, allowing users to search their personal documents instantly without internet connectivity. They act like a smart personal assistant who understands natural language queries, grasps context, and finds exactly what you need from your files – all while keeping your data private and secure on your own computer.
This combination of advanced AI understanding and local processing makes them particularly valuable for professionals handling sensitive information, researchers working with private documents, or anyone who needs quick, secure access to their personal data. Whether you’re looking for specific details in meeting notes, searching through confidential reports, or simply wanting faster access to your files, local AI search offers a smarter, more secure way to work with your information.
Implementation Guide: Building Your Own Search Engine with Ollama
Step 1: Setting Up Your Environment
First, ensure you have the necessary libraries installed. You can install them using pip:
pip install numpy faiss-cpu sentence-transformers streamlit ollama Pillow python-docx python-pptx PyMuPDF
Also get Ollama up and running on your PC. Follow the instructions in ADaSci's blog post "Hands-On Guide to Running LLMs Locally using Ollama". Once the installation completes successfully, Ollama should be running on your PC – this is essential for building your own search engine. You'll be able to interact with it locally through your command prompt.
After installation, verify from your command prompt that Ollama is running and that the model used later in this guide is available.
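For example, the checks below are a minimal sketch of what that verification might look like (the exact output will differ on your machine; llama3.2:3b is the model used later in Step 11):
# Pull the model used later in this guide (Step 11)
ollama pull llama3.2:3b
# List locally available models to confirm the download
ollama list
# Send a quick test prompt to confirm Ollama responds
ollama run llama3.2:3b "Say hello in one sentence."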
Step 2: Structuring Your Code
Create a new Python file, for example, local_search.py, and start by importing the necessary modules.
import os
import json
import logging
from typing import List, Dict, Any, Optional, Tuple, Set
from pathlib import Path
from dataclasses import dataclass
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer
import streamlit as st
import ollama
from PIL import Image
import fitz
from docx import Document
from pptx import Presentation
import io
from functools import lru_cache
from concurrent.futures import ThreadPoolExecutor, as_completed
import re
This code imports the required libraries for file handling, data processing, and creating a web interface. Each library serves a specific purpose, such as reading different document formats or handling embeddings.
Step 3: Configure Logging
# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
This sets up a basic logging configuration to output information and error messages, which helps in troubleshooting.
Step 4: Define Constants
Let’s set some constants that will be used throughout the application.
# Constants
CHUNK_SIZE = 500
CHUNK_OVERLAP = 50
MODEL_NAME = 'sentence-transformers/all-MiniLM-L6-v2' # Faster, lighter model
INDEX_PATH = "document_index.faiss"
METADATA_PATH = "metadata.json"
MAX_WORKERS = 4 # Adjust based on your CPU cores
These constants define the chunk size for document processing, the model to be used for embeddings, and paths for saving the index and metadata.
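To make the chunking behaviour concrete, here is a small worked sketch (purely illustrative, using the constants above): with CHUNK_SIZE = 500 and CHUNK_OVERLAP = 50, the window advances by 450 words at a time, so each chunk shares its last 50 words with the start of the next one.
# Illustrative only: where chunks would start for a 1,200-word document
step = CHUNK_SIZE - CHUNK_OVERLAP      # 500 - 50 = 450
starts = list(range(0, 1200, step))    # [0, 450, 900]
# Chunk 1 covers words 0-499, chunk 2 covers 450-949, chunk 3 covers 900-1199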
Step 5: Create the DocumentProcessor Class
This class will handle all operations related to document processing, including indexing and searching.
@dataclass
class SearchResult:
    id: int
    path: str
    content: str
    score: float

class DocumentProcessor:
    def __init__(self):
        self.model = SentenceTransformer(MODEL_NAME)
        self.dimension = self.model.get_sentence_embedding_dimension()
        self.index = None
        self.metadata: List[Dict[str, Any]] = []
        self._initialize_index()
        # Cache for document content
        self.document_cache = {}
Here, SearchResult is a data class that holds a single search result, while DocumentProcessor initializes the sentence transformer model and sets up an index for document embeddings.
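Once the class is complete (after Step 10), a quick sanity check in a Python shell might look like this (a minimal usage sketch, not part of the application itself):
# Hypothetical quick check in a Python shell
processor = DocumentProcessor()
print(processor.dimension)     # 384 for all-MiniLM-L6-v2
print(processor.index.ntotal)  # 0 until documents are indexed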
Step 6: Initialize the Index
We need to create or load an existing FAISS index for fast retrieval.
    def _initialize_index(self):
        """Initialize or load existing FAISS index"""
        try:
            if os.path.exists(INDEX_PATH) and os.path.exists(METADATA_PATH):
                self.index = faiss.read_index(INDEX_PATH)
                with open(METADATA_PATH, 'r') as f:
                    self.metadata = json.load(f)
                logger.info("Loaded existing index and metadata")
            else:
                self.index = faiss.IndexFlatIP(self.dimension)
                logger.info("Created new index")
        except Exception as e:
            logger.error(f"Error initializing index: {e}")
            self.index = faiss.IndexFlatIP(self.dimension)

    @lru_cache(maxsize=1000)
    def chunk_text(self, text: str) -> List[str]:
        """Split text into manageable chunks with caching"""
        words = text.split()
        chunks = []
        for i in range(0, len(words), CHUNK_SIZE - CHUNK_OVERLAP):
            chunk = ' '.join(words[i:i + CHUNK_SIZE])
            chunks.append(chunk)
        return chunks
The first method checks whether an index already exists on disk: if it does, it is loaded along with its metadata; otherwise a new inner-product index is created. The chunk_text helper splits a document into overlapping word chunks (caching its results with lru_cache), and these chunks are what we later embed and index.
Step 7: Process Files and Index Documents
We need to read various document types and convert them into chunks that can be indexed.
    def process_file(self, file_path: str) -> tuple:
        """Process a single file and return its chunks and metadata"""
        content = self.read_file(str(file_path))
        if not content:
            return [], []
        chunks = self.chunk_text(content)
        file_metadata = [
            {"path": str(file_path), "chunk_id": i}
            for i in range(len(chunks))
        ]
        return chunks, file_metadata
This method reads a file, splits it into manageable chunks, and prepares metadata for each chunk.
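For instance (a hypothetical call, assuming processor is an instance of the completed class and a file named notes.txt exists), the method returns parallel lists of chunks and metadata:
# Hypothetical example: notes.txt is an assumed local file
chunks, file_metadata = processor.process_file("notes.txt")
# chunks[0] holds the first ~500 words; file_metadata[0] is {"path": "notes.txt", "chunk_id": 0}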
Step 8: Read Files of Various Formats
We will implement a function to read content from different document formats like text, PDF, DOCX, and PPTX.
    def read_file(self, file_path: str) -> str:
        """Read content from different file types with caching"""
        if file_path in self.document_cache:
            return self.document_cache[file_path]
        try:
            ext = Path(file_path).suffix.lower()
            content = ""
            if ext == '.txt':
                with open(file_path, 'r', encoding='utf-8') as f:
                    content = f.read()
            elif ext == '.pdf':
                with fitz.open(file_path) as doc:
                    content = " ".join(page.get_text() for page in doc)
            elif ext == '.docx':
                doc = Document(file_path)
                content = " ".join(p.text for p in doc.paragraphs)
            elif ext == '.pptx':
                prs = Presentation(file_path)
                content = " ".join(shape.text
                                   for slide in prs.slides
                                   for shape in slide.shapes
                                   if hasattr(shape, "text"))
            self.document_cache[file_path] = content
            return content
        except Exception as e:
            logger.error(f"Error reading file {file_path}: {e}")
            return ""
This function checks the file extension and reads the content accordingly, caching it for efficiency.
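If you want to check what PyMuPDF extracts from a single PDF before indexing a whole folder, a quick standalone test might look like this (report.pdf is a hypothetical file name used only for illustration):
import fitz  # PyMuPDF

# Hypothetical file name, purely for illustration
with fitz.open("report.pdf") as doc:
    text = " ".join(page.get_text() for page in doc)
print(text[:500])  # preview the first 500 characters of extracted text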
Step 9: Index Documents from a Directory
Now we will create a method to index all documents from a specified directory.
    def index_documents(self, directory: str) -> bool:
        """Index documents from directory using parallel processing"""
        try:
            # Collect all valid files
            file_paths = []
            for ext in ['.txt', '.pdf', '.docx', '.pptx']:
                file_paths.extend(Path(directory).rglob(f'*{ext}'))
            if not file_paths:
                st.warning("No supported documents found")
                return False

            # Process files in parallel
            documents = []
            self.metadata = []
            with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
                future_to_file = {
                    executor.submit(self.process_file, str(file_path)): file_path
                    for file_path in file_paths
                }
                for future in as_completed(future_to_file):
                    chunks, file_metadata = future.result()
                    documents.extend(chunks)
                    self.metadata.extend(file_metadata)

            if not documents:
                st.warning("No content extracted from documents")
                return False

            # Create embeddings in batches
            batch_size = 32  # Adjust based on your memory
            all_embeddings = []
            for i in range(0, len(documents), batch_size):
                batch = documents[i:i + batch_size]
                embeddings = self.model.encode(
                    batch,
                    show_progress_bar=False,
                    batch_size=batch_size
                )
                all_embeddings.append(embeddings)
            embeddings = np.vstack(all_embeddings)

            # Create and save index
            self.index = faiss.IndexFlatIP(self.dimension)
            self.index.add(embeddings.astype('float32'))

            # Save index and metadata
            faiss.write_index(self.index, INDEX_PATH)
            with open(METADATA_PATH, 'w') as f:
                json.dump(self.metadata, f)

            st.success(f"Indexed {len(documents)} text chunks from {len(file_paths)} files")
            return True
        except Exception as e:
            logger.error(f"Indexing error: {e}")
            st.error(f"Error during indexing: {str(e)}")
            return False
This function collects files from the directory, processes them in parallel, and indexes the content. If no files are found, it displays a warning in the Streamlit app.
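One detail worth knowing: IndexFlatIP scores results by raw inner product, so unless the embeddings are L2-normalized the scores are not true cosine similarities. If you prefer cosine similarity, a minimal adjustment (an optional change, not part of the code above) is to normalize both document and query vectors with FAISS's built-in helper before adding or searching:
import numpy as np
import faiss

# Optional sketch: with L2-normalized vectors, inner product equals cosine similarity
dimension = 384                                             # all-MiniLM-L6-v2 embedding size
vectors = np.random.rand(10, dimension).astype('float32')   # stand-in for real embeddings
faiss.normalize_L2(vectors)                                 # normalizes each row in place
index = faiss.IndexFlatIP(dimension)
index.add(vectors)
# In this app, you would normalize the document embeddings in index_documents()
# and the query vector in search() the same way.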
Step 10: Implement the Search Functionality
Let’s create a method to search for relevant documents based on user queries.
    @lru_cache(maxsize=100)
    def search(self, query: str, k: int = 5) -> List[SearchResult]:
        """Search for relevant documents with caching"""
        try:
            if not self.index or self.index.ntotal == 0:
                st.warning("No documents indexed yet")
                return []

            # Encode query and search
            query_vector = self.model.encode([query])[0]
            distances, indices = self.index.search(
                query_vector.reshape(1, -1).astype('float32'),
                min(k, self.index.ntotal)
            )

            # Process results
            results = []
            for i, idx in enumerate(indices[0]):
                if idx < len(self.metadata):
                    meta = self.metadata[idx]
                    try:
                        content = self.read_file(meta["path"])
                        chunks = self.chunk_text(content)
                        chunk_content = chunks[meta["chunk_id"]] if meta["chunk_id"] < len(chunks) else ""
                        results.append(SearchResult(
                            id=int(idx),
                            path=meta["path"],
                            content=chunk_content,
                            score=float(distances[0][i])
                        ))
                    except Exception as e:
                        logger.error(f"Error reading chunk: {e}")
                        continue
            return results
        except Exception as e:
            logger.error(f"Search error: {e}")
            st.error(f"Error during search: {str(e)}")
            return []
This method encodes the user’s query, searches the index, and retrieves the most relevant documents along with their metadata.
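Outside the Streamlit app, the search method can also be exercised directly once an index has been built and saved to disk (a hypothetical sketch; the query string is a placeholder):
# Hypothetical direct usage, assuming document_index.faiss already exists on disk
processor = DocumentProcessor()          # loads the saved index and metadata
for result in processor.search("What were the Q3 action items?", k=3):
    print(f"{result.score:.3f}  {result.path}")
    print(result.content[:200])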
Step 11: Generate Answers Using Context
We will now implement a function to generate answers based on the search results using the Ollama model.
@lru_cache(maxsize=100)
def generate_answer(query: str, context: str) -> Tuple[str, Set[int]]:
    """
    Generate answer using Ollama with caching and return referenced citations

    Args:
        query (str): The user's question
        context (str): The context information from documents

    Returns:
        Tuple[str, Set[int]]: A tuple containing the generated answer and a set of citation indices
    """
    try:
        prompt = f"""
Answer based on the context below. Be concise.
You must cite your sources using [0], [1], etc. for EVERY claim you make.
Make sure to use the citations explicitly in your answer.

Context:
{context}

Question: {query}

Answer:"""

        response = ollama.generate(
            model='llama3.2:3b',
            prompt=prompt
        )
        answer = response['response']

        # Extract citation numbers from the answer
        citations = set(int(num) for num in re.findall(r'\[(\d+)\]', answer))
        return answer, citations
    except Exception as e:
        logger.error(f"Answer generation error: {e}")
        return "Sorry, I couldn't generate an answer at this time.", set()
This function constructs a prompt using the context and query, utilizes the Ollama model to generate answers, and caches results to optimize performance.
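To see what the citation extraction step returns, here is a small standalone sketch of the same regex applied to a made-up answer string:
import re

# Made-up model output, purely for illustration
answer = "Revenue grew 12% [0] while costs stayed flat [2]."
citations = set(int(num) for num in re.findall(r'\[(\d+)\]', answer))
print(citations)  # {0, 2}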
Step 12: Create the Streamlit User Interface
Finally, let’s set up a user-friendly interface using Streamlit.
def main():
    st.set_page_config(page_title="Local Search", layout="wide")

    # Initialize processor
    if 'processor' not in st.session_state:
        st.session_state.processor = DocumentProcessor()

    # Simple UI
    st.title("Local Document Search")

    # Document indexing
    docs_path = st.text_input("Documents folder path:")
    if st.button("Index Documents") and docs_path:
        with st.spinner("Indexing documents..."):
            st.session_state.processor.index_documents(docs_path)

    # Search interface
    query = st.text_input("Enter your question:")
    if st.button("Search") and query:
        with st.spinner("Searching..."):
            results = st.session_state.processor.search(query)

            if results:
                # Generate answer and get cited references
                context = "\n\n".join(
                    f"[{i}] {r.content}"
                    for i, r in enumerate(results)
                )
                answer, citations = generate_answer(query, context)

                # Display results
                st.subheader("Answer")
                st.write(answer)

                # Only show cited documents
                if citations:
                    st.subheader("Referenced Documents")
                    for citation_num in sorted(citations):
                        if citation_num < len(results):
                            result = results[citation_num]
                            with st.expander(f"Reference [{citation_num}] - {Path(result.path).name}"):
                                st.write(result.content)
                else:
                    st.info("No specific documents were cited in the answer.")
            else:
                st.warning("No relevant documents found")

if __name__ == "__main__":
    main()
This is the main function that drives the application. It provides a text input for the directory to index documents and a search bar for user queries.
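With everything saved in local_search.py, the app is launched with Streamlit's standard command (make sure Ollama is still running in the background):
streamlit run local_search.py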
Demonstrating the Search Engine
From a given folder path containing various text files on different topics, this system processes and indexes the documents intelligently. When a user poses a question, it not only generates a relevant answer but also provides references to the source documents used to construct that answer. This approach ensures transparency and allows users to verify information directly from the source materials. The system leverages advanced natural language processing to understand queries and document content, enabling accurate matching between questions and the most relevant document sections.
Final Words
Congratulations on building your own Local Document Search Engine with Ollama! Your tool can now efficiently process and index various document formats, delivering relevant results based on user queries. This achievement marks just the beginning – you can enhance its capabilities by adding support for more document types, implementing advanced search features like semantic grouping and fuzzy matching, or building a more intuitive user interface with document previews and result highlighting. Consider boosting performance through caching and batch processing, and don't forget about maintenance: keep your dependencies updated and monitor system performance. Whether you're handling PDFs, spreadsheets, or complex databases, each enhancement you add will make your tool more powerful and user-friendly, creating a truly personalized search experience that respects data privacy while delivering intelligent results.