OpenAI recently released Swarm, an open-source, lightweight, experimental framework for exploring multi-agent orchestration interfaces. Swarm is currently experimental and not intended for production use; therefore, it has no official support. Its primary goal is to demonstrate routines and handoffs, and how they can be used to orchestrate multiple agents simply and effectively. This article explores OpenAI’s Swarm through a practical application.
Table of Contents
- Understanding Swarm
- Core components of Swarm
- Key features and benefits of Swarm
- Hands-on implementation of Swarm
Understanding Swarm
OpenAI’s Swarm is an open-source multi-agent orchestration framework that enables users to build and manage multi-agent systems through routines and handoffs. These notions help users build and execute multi-agent systems holistically, rather than trying to encode independent capabilities and instructions into a single, unwieldy prompt. Swarm is powered by OpenAI’s Chat Completions API and is stateless between calls.
Swarm is used by instantiating a Swarm client, which is essentially a thin wrapper around an OpenAI client. The client exposes a run() function analogous to chat.completions.create() in the OpenAI Chat Completions API: it takes messages and returns messages, saving no state between calls.
Under the hood, client.run() implements a loop that gets a completion from the currently active agent, executes any tool calls and appends their results, switches agents if required, and updates context variables, repeating until no tool calls remain. Once the run is finished, it returns a response containing the updated state.
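A minimal sketch of this flow, adapted from the example in the Swarm README (the agent and message here are illustrative):

```python
from swarm import Swarm, Agent

# The Swarm client wraps an OpenAI client and reads OPENAI_API_KEY from the environment.
client = Swarm()

agent = Agent(
    name="Assistant",
    instructions="You are a helpful agent.",
)

# run() takes an agent plus a message list and returns the updated conversation.
response = client.run(
    agent=agent,
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.messages[-1]["content"])  # final assistant reply
print(response.agent.name)               # the agent that produced it
```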
Core components of Swarm
Swarm makes agent coordination and execution lightweight and easily testable through two primitive abstractions: Agents and Handoffs. An agent is composed of instructions and a set of functions, and can choose to hand off a conversation to another agent as required. Agents perform the tasks, whereas handoffs let one agent transfer control to another, enabling agent-to-agent communication based on the context of the task.
The key agent fields used in Swarm are shown below:
Agent Fields in Swarm (Source)
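Putting those fields together, a fully specified agent would look roughly like this (field names and defaults as documented in the Swarm repository):

```python
from swarm import Agent

agent = Agent(
    name="Agent",                             # display name; default "Agent"
    model="gpt-4o",                           # Chat Completions model to use
    instructions="You are a helpful agent.",  # str, or a function returning a str
    functions=[],                             # Python callables exposed as tools
    tool_choice=None,                         # forwarded to the Chat Completions API
    parallel_tool_calls=True,                 # allow multiple tool calls per turn
)
```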
Instructions, on the other hand, are converted directly into the system prompt of a conversation. Only the instructions of the active agent are present at any given time: if there is a handoff, the system prompt changes, but the chat history persists. A sample agent for a Google search task is shown below:
from swarm import Agent

google_search_agent = Agent(
    name="Google Search Agent",
    instructions="You are a Google search agent specialised in searching the web.",
)
Another important notion in Swarm is the routine: a list of instructions predefined in natural language, along with the functions required to execute them. Routines break workflows into manageable steps, enabling a high degree of control. A sample routine, showcasing an intelligent search assistant, is shown below:
system_message = """Welcome! You are an intelligent search engine agent tasked with assisting users in finding the most accurate, relevant, and timely information.
Your goals are:
1. Understand the user's query with clarity and intent.
2. Provide results from reliable and diverse sources, ensuring factual accuracy.
3. Be concise yet informative in your responses, summarizing key points effectively.
Remember, your purpose is to serve as a highly efficient and helpful search assistant, capable of understanding both simple and complex queries while maintaining a user-friendly tone."""
def search_engine_agent(query):
    # Placeholder for the actual search logic.
    return "success"
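To turn this routine into a runnable agent, the system message becomes the agent's instructions and the function becomes one of its tools; a minimal sketch, reusing the names defined above:

```python
from swarm import Agent

search_routine_agent = Agent(
    name="Search Engine Agent",
    instructions=system_message,        # the routine's natural-language steps
    functions=[search_engine_agent],    # the functions that execute them
)
```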
A handoff transfers control from one agent to another within the Swarm framework. A sample handoff is showcased below:
def handoff_to_search_google():
    """Hand off the search query to the Google search agent."""
    return google_search_agent

user_interface_agent = Agent(
    name="User Interface Agent",
    instructions="You are a user interface agent that handles all interactions with the user. You need to always start with a web data extraction objective that the user wants to achieve by searching the web, mapping the web pages, and extracting the content from a specific page. Be concise.",
    functions=[handoff_to_search_google],
)
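When the model calls handoff_to_search_google, Swarm notices that the function returned an Agent and makes that agent active for the remainder of the conversation. A hypothetical single-turn run illustrating this:

```python
from swarm import Swarm

client = Swarm()
response = client.run(
    agent=user_interface_agent,
    messages=[{"role": "user", "content": "Find the official Swarm repository."}],
)

# If the model invoked the handoff, the active agent has changed.
print(response.agent.name)  # e.g. "Google Search Agent"
```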
Key features and benefits of Swarm
Swarm is still in an experimental phase and lacks features such as memory, advanced agentic deployments, and integrations. However, it offers some notable capabilities, listed below:
- Executing multiple agents in a controlled manner
- Handling diverse and independent capabilities
- Managing multi-step processes, where agents handle different tasks and handoffs enable seamless coordination
- Making it simple and easy to build AI applications in which multiple agents work together towards a common goal
Hands-on implementation of Swarm
In this section, we will build a simple web data analyst using Swarm, Firecrawl and SerpAPI.
Prerequisites:
- OpenAI API Key
- Firecrawl API Key
- SerpAPI API Key
Step 1: Installing and importing the necessary libraries –
pip install git+https://github.com/openai/swarm.git
pip install openai firecrawl-py serpapi google-search-results
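The remaining steps assume the following imports (module paths as documented by each library; swarm.repl provides the demo REPL used in Step 7):

```python
import os

import dotenv
from firecrawl import FirecrawlApp
from openai import OpenAI
from serpapi import GoogleSearch
from swarm import Agent
from swarm.repl import run_demo_loop
```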
Step 2: Loading the environment variables containing the required API keys –
dotenv.load_dotenv()
Step 3: Initialising the APIs –
app = FirecrawlApp(api_key=os.getenv("FIRECRAWL_API_KEY"))
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
Step 4: Defining functions for agents –
def search_google(query, objective):
    """Search Google using SerpAPI."""
    print(f"Parameters: query={query}, objective={objective}")
    search = GoogleSearch({"q": query, "api_key": os.getenv("SERP_API_KEY")})
    results = search.get_dict().get("organic_results", [])
    return {"objective": objective, "results": results}

def map_url_pages(url, objective):
    """Map a website's pages using Firecrawl."""
    search_query = generate_completion(
        "website search query generator",
        f"Generate a 1-2 word search query for the website: {url} based on the objective",
        "Objective: " + objective
    )
    print(f"Parameters: url={url}, objective={objective}, search_query={search_query}")
    map_status = app.map_url(url, params={'search': search_query})
    if map_status.get('status') == 'success':
        links = map_status.get('links', [])
        top_link = links[0] if links else None
        return {"objective": objective, "results": [top_link] if top_link else []}
    else:
        return {"objective": objective, "results": []}

def scrape_url(url, objective):
    """Scrape a website using Firecrawl."""
    print(f"Parameters: url={url}, objective={objective}")
    scrape_status = app.scrape_url(
        url,
        params={'formats': ['markdown']}
    )
    return {"objective": objective, "results": scrape_status}

def analyze_website_content(content, objective):
    """Analyze the scraped website content using OpenAI."""
    print(f"Parameters: content={content[:50]}..., objective={objective}")
    analysis = generate_completion(
        "website data extractor",
        "Analyze the following website content and extract a JSON object based on the objective.",
        "Objective: " + objective + "\nContent: " + content
    )
    return {"objective": objective, "results": analysis}

def generate_completion(role, task, content):
    """Generate a completion using OpenAI."""
    print(f"Parameters: role={role}, task={task[:50]}..., content={content[:50]}...")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"You are a {role}. {task}"},
            {"role": "user", "content": content}
        ]
    )
    return response.choices[0].message.content
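Before wiring these functions into agents, each one can be sanity-checked directly; for example (a hypothetical query, assuming valid API keys in the environment):

```python
result = search_google("OpenAI Swarm", "find the official Swarm repository")
print(result["results"][:1])  # first organic result, if any
```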
Step 5: Defining the handoff functions that transfer control between agents –
def handoff_to_search_google():
    """Hand off the search query to the Google search agent."""
    return google_search_agent

def handoff_to_map_url():
    """Hand off the URL to the map URL agent."""
    return map_url_agent

def handoff_to_website_scraper():
    """Hand off the URL to the website scraper agent."""
    return website_scraper_agent

def handoff_to_analyst():
    """Hand off the website content to the analyst agent."""
    return analyst_agent
Step 6: Defining the agents –
# UI agent for user interaction
user_interface_agent = Agent(
    name="User Interface Agent",
    instructions="You are a user interface agent that handles all interactions with the user. You need to always start with a web data extraction objective that the user wants to achieve by searching the web, mapping the web pages, and extracting the content from a specific page. Be concise.",
    functions=[handoff_to_search_google],
)

# Google search agent for searching the web
google_search_agent = Agent(
    name="Google Search Agent",
    instructions="You are a google search agent specialized in searching the web. Only search for the website not any specific page. When you are done, you must hand off to the map agent.",
    functions=[search_google, handoff_to_map_url],
)

# URL mapping agent for mapping web pages
map_url_agent = Agent(
    name="Map URL Agent",
    instructions="You are a map url agent specialized in mapping the web pages. When you are done, you must hand off the results to the website scraper agent.",
    functions=[map_url_pages, handoff_to_website_scraper],
)

# Website scraper agent for scraping data off the website
website_scraper_agent = Agent(
    name="Website Scraper Agent",
    instructions="You are a website scraper agent specialized in scraping website content. When you are done, you must hand off the website content to the analyst agent to extract the data based on the objective.",
    functions=[scrape_url, handoff_to_analyst],
)

# Analyst agent for understanding the website content and returning it in JSON format
analyst_agent = Agent(
    name="Analyst Agent",
    instructions="You are an analyst agent that examines website content and returns a JSON object. When you are done, you must return a JSON object.",
    functions=[analyze_website_content],
)
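Instead of the interactive REPL shown in the next step, the pipeline can also be exercised programmatically in a single call; a sketch with an illustrative user message:

```python
from swarm import Swarm

# Named swarm_client to avoid clashing with the OpenAI client defined in Step 3.
swarm_client = Swarm()
response = swarm_client.run(
    agent=user_interface_agent,
    messages=[{"role": "user", "content": "Extract the key insights from firecrawl.dev"}],
)
print(response.messages[-1]["content"])
```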
Step 7: Testing the Swarm through a REPL on the command line –
if __name__ == "__main__":
    run_demo_loop(user_interface_agent, stream=True)
Output (in Terminal):
Analyst Agent: ```json
{
"title": "MongoDB Atlas Vector Search for RAG powered LLM Applications",
"author": "Sachin Tripathi",
"published_date": "August 16, 2024",
"key_insights": [
{
"concept": "Vector Search",
"details": "Vector search allows for understanding semantic meaning and relationships by representing data as numerical vectors. It uses embeddings and vector space to facilitate similarity search."
},
{
"concept": "MongoDB Atlas Vector Search",
"details": "Atlas Vector Search combines MongoDB's document database capabilities with vector search. It enables similarity-based searches, storing and querying high-dimensional vector data efficiently, which is crucial for LLM applications."
},
{
"concept": "Integration with LLMs",
"details": "Atlas Vector Search integrates with various LLMs and frameworks including LangChain, LlamaIndex, OpenAI, Cohere, Hugging Face, Haystack, MS Semantic Kernel, and AWS, enhancing its utility for LLM applications."
}
],
"benefits_for_RAG_applications": [
"Allows retrieval of semantically similar documents using vector search.",
"Improves efficiency and accuracy in querying high-dimensional vector data.",
"Facilitates seamless integration with multiple LLM frameworks, simplifying the development of intelligent applications."
],
"semantic_search_advantages": [
"Enables understanding of semantic meaning by incorporating relevance scoring for ranked results.",
"Supports advanced search functionalities like full-text search and fuzzy matching within the data.",
"Seamless integration with MongoDB's aggregation pipeline allows for combined data processing operations and search queries."
],
"technical_details": [
{
"feature": "Atlas Vector Search Index",
"description": "Separate from basic database indexes, used to retrieve documents containing vector embeddings efficiently at query time."
},
{
"feature": "Search Algorithms Supported",
"description": "Atlas Vector Search supports Approximate Nearest Neighbor (ANN) search with HNSW algorithm and Exact Nearest Neighbor (ENN) search."
}
]
}
```
We can see that the Swarm framework, integrated with Firecrawl and SerpAPI, was able to scrape and analyse the website, generating proper results in JSON format as per the agent definitions.
Final Words
OpenAI’s educational framework Swarm is open-sourced under the MIT License, enabling users to work with it and enhance its capabilities. This is a great step towards fostering a collaborative environment and active expansion in the field of multi-agent orchestration. Thanks to the lightweight and controllable design demonstrated in the framework, Swarm lets users develop multi-agent orchestrations with notable efficiency and ease.