In the evolving landscape of large language models (LLMs), optimizing prompts and model behavior is often crucial but labor-intensive. Traditional approaches require breaking down problems manually, tuning prompts step by step, and iteratively refining synthetic data for fine-tuning, all of which can become chaotic when changes are introduced. Enter DSPy, a framework designed to make this process systematic and powerful by separating the program’s structure from its LLM parameters.
DSPy introduces LM-driven optimizers that automatically adjust prompts and weights based on defined metrics, creating reliable and adaptable LLM pipelines. Similar to how frameworks like PyTorch manage neural network parameters, DSPy offers modules and optimizers that eliminate manual prompt-tweaking, allowing developers to focus on building high-quality systems without wrestling with repetitive prompt engineering. In this blog, we’ll explore how DSPy transforms prompt and parameter optimization for LLMs, making it less cumbersome and more impactful.
Table of Contents:
- Understanding DSPy for Optimizing Language Model Workflows
- Overview of DSPy Workflow
- Hands-on Implementation of DSPy
Let’s start with understanding DSPy in depth.
Understanding DSPy for Optimizing Language Model Workflows
The concept behind DSPy addresses a core issue in developing robust language model (LM) pipelines: optimizing prompts and LM parameters separately from the programming logic. By introducing a “signature” system that encapsulates prompt best practices, DSPy aims to make prompt engineering both modular and systematic. Imagine a Retrieval-Augmented Generation (RAG) workflow, where prompt adjustments are typically made manually to improve accuracy. DSPy removes the burden of managing prompt engineering within code, letting developers focus on system logic while DSPy handles prompt refinements and adjustments automatically.
In essence, DSPy allows you to set high-level assertions and configurations, which it then optimizes automatically. For instance, in a binary question-answering task, rather than manually adjusting prompts to ensure binary responses, DSPy lets you assert that the answer should only be “yes” or “no.” If the LM deviates, DSPy backtracks and re-optimizes the prompt automatically to guide the model toward the desired output. Similarly, DSPy facilitates complex multi-step retrieval processes without the need for intricate prompt engineering, making it a powerful tool for building flexible LM pipelines.
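To make this concrete, here is a minimal sketch of assertion-based backtracking for the binary question-answering case. It assumes the `dspy.Suggest` API shipped with dspy-ai 2.4 (the version installed later in this post); the `BinaryQA` module name and the inline signature are illustrative, not from the original.

import dspy

class BinaryQA(dspy.Module):
    """Illustrative module that constrains answers to 'yes' or 'no'."""
    def __init__(self):
        super().__init__()
        self.answer = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        pred = self.answer(question=question)
        # If this check fails, DSPy backtracks: it retries the LM call with
        # the failure message injected into the prompt as feedback.
        dspy.Suggest(
            pred.answer.strip().lower() in {"yes", "no"},
            "Answer with exactly 'yes' or 'no'.",
        )
        return pred

In this version of DSPy, the suggestion takes effect once assertions are activated on the module, e.g. via `BinaryQA().activate_assertions()`.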
However, DSPy as a framework is still evolving. Although the concept is promising, the current implementation has notable limitations: it lacks production readiness, has a steep learning curve due to its heavy reliance on meta-programming, and suffers from inadequate documentation. While DSPy simplifies prompt optimization theoretically, the code complexity can be a significant hurdle for users.
Overview of DSPy Workflow
DSPy employs a logical, five-step workflow tailored for language tasks, streamlining the process from data preparation to evaluation.
Workflow of DSPy
DSPy’s workflow begins with the Dataset stage, where training data, such as blog posts, Q&A pairs, or other text, is prepared and structured. The next step, Signature, establishes an input-output contract, clearly defining the task’s expected inputs and outputs. The Module (Pipeline) stage follows, where DSPy combines operators to execute specific tasks, such as content generation or text analysis. In the Optimization phase, DSPy automatically fine-tunes prompts and parameters to improve pipeline performance. Finally, Evaluation assesses pipeline effectiveness using metrics like accuracy and quality. This structured approach suits a wide range of tasks, from content generation to automated content enhancement.
Hands-on Implementation of DSPy
Step 1: Setting up the environment
First, we’ll set up our development environment by installing necessary packages, configuring paths, and importing required libraries. This setup is specifically designed to work in Google Colab.
# Automatically reload modules in Colab
%load_ext autoreload
%autoreload 2

import sys
import os

# Clone the DSPy repository if not already cloned (specific to Google Colab)
repo_path = 'dspy'
if "google.colab" in sys.modules:
    !git -C $repo_path pull origin || git clone https://github.com/stanfordnlp/dspy $repo_path

# Add repo_path to the system path
if repo_path not in sys.path:
    sys.path.append(repo_path)

# Set the DSPy cache directory in Colab
os.environ["DSP_NOTEBOOK_CACHEDIR"] = os.path.join(repo_path, 'cache')

# Install the DSPy and OpenAI packages if not already installed
import pkg_resources
required_packages = {"dspy-ai", "openai"}
installed_packages = {pkg.key for pkg in pkg_resources.working_set}
if not required_packages.issubset(installed_packages):
    !pip install -U pip
    !pip install dspy-ai==2.4.17 openai==0.28.1

# Import DSPy
import dspy
Step 2: Configuring LM and RM
Now we’ll configure our Language Model (GPT-3.5-turbo) and Retrieval Model (ColBERTv2). These will form the backbone of our RAG system.
# Language model: GPT-3.5-turbo via the OpenAI API (requires an OpenAI API key)
turbo = dspy.OpenAI(model='gpt-3.5-turbo')

# Retrieval model: a hosted ColBERTv2 index over Wikipedia 2017 abstracts
colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')

# Register these as the default LM and RM for all DSPy modules
dspy.settings.configure(lm=turbo, rm=colbertv2_wiki17_abstracts)
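Before moving on, it can help to confirm the remote retriever responds. This optional check is a sketch, assuming the hosted ColBERTv2 endpoint above is reachable and that `dspy.Retrieve` picks up the configured RM:

# Optional sanity check: the retriever should return Wikipedia abstracts.
retrieve = dspy.Retrieve(k=1)
print(retrieve("When was the first FIFA World Cup held?").passages[0][:200])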
Step 3: Loading the dataset
We’ll use the HotpotQA dataset for training and evaluation, loading a small subset: 20 examples for training and 50 for development. HotpotQA is a multi-hop question-answering dataset built from Wikipedia, where each example pairs a question with a short factoid answer.
from dspy.datasets import HotPotQA
# Load the dataset.
dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)
# Tell DSPy that the 'question' field is the input. Any other fields are labels and/or metadata.
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]
len(trainset), len(devset)
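To see what the loader produces, you can inspect a single training example; the `question` and `answer` field names come from the HotPotQA loader:

# Peek at one training example and its fields.
train_example = trainset[0]
print(f"Question: {train_example.question}")
print(f"Answer: {train_example.answer}")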
Step 4: Building Signatures
Signatures in DSPy define the interface for our LM calls. Here we’ll create a signature for generating answers that specifies input/output fields and their descriptions.
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")
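A signature can be exercised on its own before it is wired into a pipeline. This small sketch, with a made-up context string, runs it through `dspy.Predict`:

# Use the signature with a bare predictor (no retrieval involved yet).
generate_answer = dspy.Predict(GenerateAnswer)
pred = generate_answer(context="Paris has been the capital of France since 987.",
                       question="What is the capital of France?")
print(pred.answer)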
Step 5: Building the Pipeline
Let’s create our RAG pipeline by combining retrieval and answer generation into a single module. This pipeline will retrieve relevant passages and generate answers.
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)
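Even before any optimization, this module runs zero-shot; the sample question below is illustrative:

# The uncompiled pipeline can be run directly (no bootstrapped demos yet).
uncompiled_rag = RAG()
print(uncompiled_rag("Which country is the Eiffel Tower located in?").answer)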
Step 6: Optimizing the Pipeline
Using DSPy’s Teleprompter, we’ll optimize our pipeline by automatically learning effective prompts for its modules through few-shot learning.
from dspy.teleprompt import BootstrapFewShot

# Validation logic: check that the predicted answer is correct,
# and that the retrieved context actually contains that answer.
def validate_context_and_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM
# Set up a basic teleprompter, which will compile our RAG program.
teleprompter = BootstrapFewShot(metric=validate_context_and_answer)
# Compile!
compiled_rag = teleprompter.compile(RAG(), trainset=trainset)
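Once compiled, you can peek at what was actually learned. This sketch uses DSPy's `named_predictors()` accessor to count the demonstrations bootstrapped for each predictor in the program:

# Inspect the few-shot demonstrations the teleprompter bootstrapped.
for name, parameter in compiled_rag.named_predictors():
    print(name, "->", len(parameter.demos), "demos")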
Step 7: Executing the Pipeline
Now we can test our optimized RAG system with a sample question to see how it performs in practice.
# Ask any question you like about this simple RAG program.
my_question = "What castle did David Gregory inherit?"
# Get the prediction. This contains `pred.context` and `pred.answer`.
pred = compiled_rag(my_question)
# Print the contexts and the answer.
print(f"Question: {my_question}")
print(f"Predicted Answer: {pred.answer}")
print(f"Retrieved Contexts (truncated): {[c[:200] + '...' for c in pred.context]}")
Step 8: Evaluating the Pipeline
Let’s evaluate our pipeline’s overall performance using exact match metrics on our development set.
from dspy.evaluate.evaluate import Evaluate

# Set up the `evaluate_on_hotpotqa` function. We'll use this many times below.
evaluate_on_hotpotqa = Evaluate(devset=devset, num_threads=1, display_progress=True, display_table=5)

# Evaluate the `compiled_rag` program with the `answer_exact_match` metric.
metric = dspy.evaluate.answer_exact_match
evaluate_on_hotpotqa(compiled_rag, metric=metric)
Step 9: Evaluating the Retrieval
Finally, we’ll specifically evaluate the retrieval component by checking if our system finds the gold (correct) passages for each question.
def gold_passages_retrieved(example, pred, trace=None):
    gold_titles = set(map(dspy.evaluate.normalize_text, example['gold_titles']))
    found_titles = set(map(dspy.evaluate.normalize_text, [c.split(' | ')[0] for c in pred.context]))
    return gold_titles.issubset(found_titles)
compiled_rag_retrieval_score = evaluate_on_hotpotqa(compiled_rag, metric=gold_passages_retrieved)
Output –
Follow the following format.
Context: may contain relevant facts
Question: ${question}
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: often between 1 and 5 words
Context:
[1] «Rosario Dawson | Rosario Isabel Dawson (born May 9, 1979) is an American actress, producer, singer, comic book writer, and political activist. She made her film debut in the 1995 teen drama "Kids". Her subsequent film roles include "He Got Game", "Men in Black II", "25th Hour", "Rent", "Sin City", "Death Proof", "Seven Pounds", "", and "Top Five". Dawson has also provided voice-over work for Disney and DC.»
[2] «Sarai Gonzalez | Sarai Isaura Gonzalez (born 2005) is an American Latina child actress who made her professional debut at the age of 11 on the Spanish-language ""Soy Yo"" ("That's Me") music video by Bomba Estéreo. Cast as a "nerdy" tween with a "sassy" and "confident" attitude, her performance turned her into a "Latina icon" for "female empowerment, identity and self-worth". She subsequently appeared in two get out the vote videos for Latinos in advance of the 2016 United States elections.»
[3] «Gabriela (2001 film) | Gabriela is a 2001 American romance film, starring Seidy Lopez in the title role alongside Jaime Gomez as her admirer Mike. The film has been cited as an inspiration behind the Premiere Weekend Club, which supports Latino film-making.»
Question: Which American actress who made their film debut in the 1995 teen drama "Kids" was the co-founder of Voto Latino?
Reasoning: Let's think step by step in order to produce the answer. We know that the actress made her film debut in 1995 and co-founded Voto Latino.
Answer: Rosario Dawson
Final Words
In summary, DSPy presents a promising approach to optimizing complex language model workflows by separating prompt engineering from programming logic, bringing structure and automation to what is traditionally a manual, iterative process. While the framework is still evolving and has some limitations, its foundational ideas—like automatic prompt adjustments, module settings, and assertion-based backtracking—are innovative steps towards a more robust and scalable way to develop LM-based applications. DSPy simplifies the integration of LLMs into pipelines, making it easier to experiment, refine, and deploy, especially as the tool matures and becomes more production-ready.