OpenAI recently released Swarm, an open-source, lightweight, experimental framework for exploring multi-agent orchestration interfaces. Swarm is currently experimental and not intended for production use; therefore, it has no official support. Its primary goal is to demonstrate routines and handoffs, and how they can be used to orchestrate multiple agents simply and effectively. This article explores OpenAI’s Swarm through a practical application.
Table of Contents
- Understanding Swarm
- Core components of Swarm
- Key features and benefits of Swarm
- Hands-on implementation of Swarm
Understanding Swarm
OpenAI’s Swarm is an open-source multi-agent orchestration framework that enables users to build and manage multi-agent systems through routines and handoffs. These notions help users build and execute multi-agent systems holistically, rather than trying to encode independent capabilities and instructions into a single, unwieldy prompt. Swarm is powered by OpenAI’s Chat Completions API and is stateless between calls.
Swarm is used by instantiating a Swarm client, which is essentially a thin wrapper around an OpenAI client. The client exposes a run() function analogous to chat.completions.create() in the OpenAI Chat Completions API: it takes messages and returns messages, saving no state between calls.
Under the hood, client.run() implements a loop that gets a completion from the currently active agent, executes any tool calls and appends their results, switches agents if required, and updates context variables, repeating until no tool calls remain. Once the run is finished, it returns a response containing the updated state.
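A minimal sketch of this flow, adapted from the example in the Swarm README (the agent and message here are illustrative):

```python
from swarm import Swarm, Agent

# The Swarm client wraps an OpenAI client and reads OPENAI_API_KEY from the environment.
client = Swarm()

agent = Agent(
    name="Assistant",
    instructions="You are a helpful agent.",
)

# run() takes an agent plus a message list and returns the updated conversation.
response = client.run(
    agent=agent,
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.messages[-1]["content"])  # final assistant reply
print(response.agent.name)               # the agent that produced it
```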
Core components of Swarm
Swarm makes agent coordination and execution lightweight and easily testable through two primitive abstractions: Agents and Handoffs. An agent is composed of instructions and a set of functions, and can choose to hand off a conversation to another agent as required. Agents perform the tasks, whereas handoffs let one agent transfer control to another, enabling agent-to-agent communication based on the context of the task.
The key agent fields used in Swarm are shown below:
Agent Fields in Swarm (Source)
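Putting those fields together, a fully specified agent would look roughly like this (field names and defaults as documented in the Swarm repository):

```python
from swarm import Agent

agent = Agent(
    name="Agent",                             # display name; default "Agent"
    model="gpt-4o",                           # Chat Completions model to use
    instructions="You are a helpful agent.",  # str, or a function returning a str
    functions=[],                             # Python callables exposed as tools
    tool_choice=None,                         # forwarded to the Chat Completions API
    parallel_tool_calls=True,                 # allow multiple tool calls per turn
)
```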
Instructions, on the other hand, are converted directly into the system prompt of a conversation. Only the instructions of the active agent are present at any given time: if there is a handoff, the system prompt changes, but the chat history persists. A sample agent for a Google search task is shown below:
from swarm import Agent

google_search_agent = Agent(
    name="Google Search Agent",
    instructions="You are a Google search agent specialised in searching the web.",
)
Another important notion in Swarm is the routine: a list of instructions predefined in natural language, along with the functions required to execute them. Routines break workflows into manageable steps, enabling a high degree of control. A sample routine, showcasing an intelligent search assistant, is shown below:
system_message = """Welcome! You are an intelligent search engine agent tasked with assisting users in finding the most accurate, relevant, and timely information.
Your goals are:
1. Understand the user's query with clarity and intent.
2. Provide results from reliable and diverse sources, ensuring factual accuracy.
3. Be concise yet informative in your responses, summarizing key points effectively.
Remember, your purpose is to serve as a highly efficient and helpful search assistant, capable of understanding both simple and complex queries while maintaining a user-friendly tone."""
def search_engine_agent(query):
    # Placeholder for the actual search logic.
    return "success"
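To turn this routine into a runnable agent, the system message becomes the agent's instructions and the function becomes one of its tools; a minimal sketch, reusing the names defined above:

```python
from swarm import Agent

search_routine_agent = Agent(
    name="Search Engine Agent",
    instructions=system_message,        # the routine's natural-language steps
    functions=[search_engine_agent],    # the functions that execute them
)
```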
A handoff transfers control from one agent to another within the Swarm framework. A sample handoff is showcased below:
def handoff_to_search_google():
    """Hand off the search query to the Google search agent."""
    return google_search_agent

user_interface_agent = Agent(
    name="User Interface Agent",
    instructions="You are a user interface agent that handles all interactions with the user. You need to always start with a web data extraction objective that the user wants to achieve by searching the web, mapping the web pages, and extracting the content from a specific page. Be concise.",
    functions=[handoff_to_search_google],
)
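When the model calls handoff_to_search_google, Swarm notices that the function returned an Agent and makes that agent active for the remainder of the conversation. A hypothetical single-turn run illustrating this:

```python
from swarm import Swarm

client = Swarm()
response = client.run(
    agent=user_interface_agent,
    messages=[{"role": "user", "content": "Find the official Swarm repository."}],
)

# If the model invoked the handoff, the active agent has changed.
print(response.agent.name)  # e.g. "Google Search Agent"
```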
Key features and benefits of Swarm
Swarm is still in an experimental phase and lacks features such as memory, advanced agentic deployments, and integrations. However, it offers some notable capabilities, listed below:
- Executing multiple agents in a controlled manner
- Handling diverse and independent capabilities
- Managing multi-step processes, where agents handle different tasks and handoffs enable seamless coordination
- Making it simple and easy to build AI applications in which multiple agents work together towards a common goal
Hands-on implementation of Swarm
In this section, we will build a simple web data analyst using Swarm, Firecrawl and SerpAPI.
Prerequisites:
- OpenAI API Key
- Firecrawl API Key
- SerpAPI API Key
Step 1: Installing and importing the necessary libraries –
pip install git+https://github.com/openai/swarm.git
pip install openai firecrawl-py serpapi google-search-results
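The remaining steps assume the following imports (module paths as documented by each library; swarm.repl provides the demo REPL used in Step 7):

```python
import os

import dotenv
from firecrawl import FirecrawlApp
from openai import OpenAI
from serpapi import GoogleSearch
from swarm import Agent
from swarm.repl import run_demo_loop
```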
Step 2: Loading the environment variables containing the required API keys –
dotenv.load_dotenv()
Step 3: Initialising the APIs –
app = FirecrawlApp(api_key=os.getenv("FIRECRAWL_API_KEY"))
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
Step 4: Defining functions for agents –
def search_google(query, objective):
    """Search Google using SerpAPI."""
    print(f"Parameters: query={query}, objective={objective}")
    search = GoogleSearch({"q": query, "api_key": os.getenv("SERP_API_KEY")})
    results = search.get_dict().get("organic_results", [])
    return {"objective": objective, "results": results}

def map_url_pages(url, objective):
    """Map a website's pages using Firecrawl."""
    search_query = generate_completion(
        "website search query generator",
        f"Generate a 1-2 word search query for the website: {url} based on the objective",
        "Objective: " + objective
    )
    print(f"Parameters: url={url}, objective={objective}, search_query={search_query}")
    map_status = app.map_url(url, params={'search': search_query})
    if map_status.get('status') == 'success':
        links = map_status.get('links', [])
        top_link = links[0] if links else None
        return {"objective": objective, "results": [top_link] if top_link else []}
    else:
        return {"objective": objective, "results": []}

def scrape_url(url, objective):
    """Scrape a website using Firecrawl."""
    print(f"Parameters: url={url}, objective={objective}")
    scrape_status = app.scrape_url(
        url,
        params={'formats': ['markdown']}
    )
    return {"objective": objective, "results": scrape_status}

def analyze_website_content(content, objective):
    """Analyze the scraped website content using OpenAI."""
    print(f"Parameters: content={content[:50]}..., objective={objective}")
    analysis = generate_completion(
        "website data extractor",
        "Analyze the following website content and extract a JSON object based on the objective.",
        "Objective: " + objective + "\nContent: " + content
    )
    return {"objective": objective, "results": analysis}

def generate_completion(role, task, content):
    """Generate a completion using OpenAI."""
    print(f"Parameters: role={role}, task={task[:50]}..., content={content[:50]}...")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"You are a {role}. {task}"},
            {"role": "user", "content": content}
        ]
    )
    return response.choices[0].message.content
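Before wiring these functions into agents, each one can be sanity-checked directly; for example (a hypothetical query, assuming valid API keys in the environment):

```python
result = search_google("OpenAI Swarm", "find the official Swarm repository")
print(result["results"][:1])  # first organic result, if any
```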
Step 5: Defining the handoff functions that transfer control between agents –
def handoff_to_search_google():
    """Hand off the search query to the Google search agent."""
    return google_search_agent

def handoff_to_map_url():
    """Hand off the URL to the map URL agent."""
    return map_url_agent

def handoff_to_website_scraper():
    """Hand off the URL to the website scraper agent."""
    return website_scraper_agent

def handoff_to_analyst():
    """Hand off the website content to the analyst agent."""
    return analyst_agent
Step 6: Defining the agents –
# UI agent for user interaction
user_interface_agent = Agent(
    name="User Interface Agent",
    instructions="You are a user interface agent that handles all interactions with the user. You need to always start with a web data extraction objective that the user wants to achieve by searching the web, mapping the web pages, and extracting the content from a specific page. Be concise.",
    functions=[handoff_to_search_google],
)

# Google search agent for searching the web
google_search_agent = Agent(
    name="Google Search Agent",
    instructions="You are a google search agent specialized in searching the web. Only search for the website not any specific page. When you are done, you must hand off to the map agent.",
    functions=[search_google, handoff_to_map_url],
)

# URL mapping agent for mapping web pages
map_url_agent = Agent(
    name="Map URL Agent",
    instructions="You are a map url agent specialized in mapping the web pages. When you are done, you must hand off the results to the website scraper agent.",
    functions=[map_url_pages, handoff_to_website_scraper],
)

# Website scraper agent for scraping data off the website
website_scraper_agent = Agent(
    name="Website Scraper Agent",
    instructions="You are a website scraper agent specialized in scraping website content. When you are done, you must hand off the website content to the analyst agent to extract the data based on the objective.",
    functions=[scrape_url, handoff_to_analyst],
)

# Analyst agent for understanding the website content and returning it in JSON format
analyst_agent = Agent(
    name="Analyst Agent",
    instructions="You are an analyst agent that examines website content and returns a JSON object. When you are done, you must return a JSON object.",
    functions=[analyze_website_content],
)
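Instead of the interactive REPL shown in the next step, the pipeline can also be exercised programmatically in a single call; a sketch with an illustrative user message:

```python
from swarm import Swarm

# Named swarm_client to avoid clashing with the OpenAI client defined in Step 3.
swarm_client = Swarm()
response = swarm_client.run(
    agent=user_interface_agent,
    messages=[{"role": "user", "content": "Extract the key insights from firecrawl.dev"}],
)
print(response.messages[-1]["content"])
```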
Step 7: Testing the Swarm through a REPL on the command line –
if __name__ == "__main__":
    run_demo_loop(user_interface_agent, stream=True)
Output (in Terminal):
Analyst Agent: ```json
{
"title": "MongoDB Atlas Vector Search for RAG powered LLM Applications",
"author": "Sachin Tripathi",
"published_date": "August 16, 2024",
"key_insights": [
{
"concept": "Vector Search",
"details": "Vector search allows for understanding semantic meaning and relationships by representing data as numerical vectors. It uses embeddings and vector space to facilitate similarity search."
},
{
"concept": "MongoDB Atlas Vector Search",
"details": "Atlas Vector Search combines MongoDB's document database capabilities with vector search. It enables similarity-based searches, storing and querying high-dimensional vector data efficiently, which is crucial for LLM applications."
},
{
"concept": "Integration with LLMs",
"details": "Atlas Vector Search integrates with various LLMs and frameworks including LangChain, LlamaIndex, OpenAI, Cohere, Hugging Face, Haystack, MS Semantic Kernel, and AWS, enhancing its utility for LLM applications."
}
],
"benefits_for_RAG_applications": [
"Allows retrieval of semantically similar documents using vector search.",
"Improves efficiency and accuracy in querying high-dimensional vector data.",
"Facilitates seamless integration with multiple LLM frameworks, simplifying the development of intelligent applications."
],
"semantic_search_advantages": [
"Enables understanding of semantic meaning by incorporating relevance scoring for ranked results.",
"Supports advanced search functionalities like full-text search and fuzzy matching within the data.",
"Seamless integration with MongoDB's aggregation pipeline allows for combined data processing operations and search queries."
],
"technical_details": [
{
"feature": "Atlas Vector Search Index",
"description": "Separate from basic database indexes, used to retrieve documents containing vector embeddings efficiently at query time."
},
{
"feature": "Search Algorithms Supported",
"description": "Atlas Vector Search supports Approximate Nearest Neighbor (ANN) search with HNSW algorithm and Exact Nearest Neighbor (ENN) search."
}
]
}
```
We can see that the Swarm framework, integrated with Firecrawl and SerpAPI, was able to scrape and analyse the website, generating proper results in JSON format as per the agent definitions.
Final Words
OpenAI’s educational framework Swarm is open-sourced under the MIT License, enabling users to work with it and enhance its capabilities. This is a great step towards fostering a collaborative environment and active expansion in the field of multi-agent orchestration. Thanks to the lightweight and controllable design demonstrated in the framework, Swarm lets users develop multi-agent orchestrations with notable efficiency and ease.