
A Hands-on Guide to llama-agents: Building AI Agents as Microservices

Discover the power of llama-agents: a comprehensive framework for creating, iterating, and deploying efficient multi-agent AI systems.

llama-agents is a new addition to the llama-index family of frameworks, designed for building, iterating on and deploying multi-agent AI systems. It uses an async-first architecture that turns individual AI agents into microservices, so each agent runs continuously and processes incoming tasks as they arrive. This article explains llama-agents and walks through a hands-on implementation.

Table of Contents

  1. Understanding Async-First Architecture
  2. Understanding the System Layout of llama-agents
  3. Hands-on Implementation of llama-agents

Understanding Async-First Architecture

llama-agents relies on asynchronous communication so that multiple tasks can be handled at the same time. Each agent in a multi-agent system works independently on its assigned portion of a larger problem, starting and progressing without waiting for other agents to finish.

Because the components (agents, in this case) are not blocked waiting on one another, an async-first system handles increased workloads more efficiently and scales better. Tasks that do not depend on each other also run concurrently, which shortens overall processing time.
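
The effect described above can be illustrated with Python's standard asyncio module. This is a minimal sketch and does not use the llama-agents API: the agent names, the run_agent helper and the sleep durations are all hypothetical stand-ins for real agent work such as an LLM call.

```python
import asyncio
import time


async def run_agent(name: str, seconds: float) -> str:
    """Simulate an agent whose work is mostly waiting (for example, on an LLM call)."""
    await asyncio.sleep(seconds)  # yields control so the other agents can proceed
    return f"{name} done"


async def run_all() -> tuple[list[str], float]:
    start = time.perf_counter()
    # All three agents are started together; none waits for another to finish.
    finished = await asyncio.gather(
        run_agent("researcher", 0.3),
        run_agent("summarizer", 0.3),
        run_agent("translator", 0.3),
    )
    return list(finished), time.perf_counter() - start


results, elapsed = asyncio.run(run_all())
print(results, f"{elapsed:.2f}s")
```

Run one after another, the three simulated agents would need about 0.9 seconds; run concurrently, the total stays close to the duration of a single agent.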

Understanding the System Layout of llama-agents

Each agent in llama-agents acts as a microservice responsible for specific tasks. This modular structure brings scalability, maintainability and reusability. llama-agents adopts a distributed, service-oriented architecture in which individual AI agents collaborate on complex tasks.

llama-agents System Layout

The control plane is the central hub of the system. It is responsible for task management, the agent registry, orchestration and communication management. It receives incoming tasks, breaks them down into smaller subtasks and assigns them to the relevant agents. It also maintains a registry of all agents in the system, keeping track of their capabilities and availability.
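
To make the registry and assignment roles concrete, here is a toy sketch in plain Python. It is not the llama-agents control plane: AgentRecord, ToyControlPlane and the agent names are hypothetical, and the matching rule (first available agent that lists the subtask as a capability) is an assumption chosen for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRecord:
    """Hypothetical registry entry: what an agent can do and whether it is free."""
    name: str
    capabilities: set[str]
    available: bool = True


@dataclass
class ToyControlPlane:
    registry: dict[str, AgentRecord] = field(default_factory=dict)

    def register(self, record: AgentRecord) -> None:
        """Add an agent to the registry under its name."""
        self.registry[record.name] = record

    def assign(self, subtasks: list[str]) -> dict[str, str]:
        """Map each subtask to the first available agent that lists it as a capability."""
        assignments: dict[str, str] = {}
        for subtask in subtasks:
            for record in self.registry.values():
                if record.available and subtask in record.capabilities:
                    assignments[subtask] = record.name
                    break  # subtasks with no capable agent are simply left out
        return assignments


plane = ToyControlPlane()
plane.register(AgentRecord("search_agent", {"search"}))
plane.register(AgentRecord("writer_agent", {"summarize", "translate"}))
plane.register(AgentRecord("backup_writer", {"summarize"}, available=False))

assignments = plane.assign(["search", "summarize", "plot"])
print(assignments)
```

In this sketch the "plot" subtask stays unassigned because no registered agent advertises that capability, and the unavailable backup agent is never chosen.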

The orchestrator is the decision-making engine within the control plane. It determines the sequence of subtasks, the flow of information between the agents and the criteria for completing the overall task. llama-agents offers two primary orchestration types. In agentic orchestration, an LLM analyzes the task and dynamically decides which agents to involve and how they should interact. In explicit orchestration, the user defines a workflow that fixes the sequence of interactions between the agents.
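
The difference between the two orchestration types can be sketched without any library. In the hypothetical example below, the "agents" are plain string functions, run_explicit follows a user-supplied order, and run_agentic asks a decision function for the next step each time. The stub_chooser function stands in for the LLM and is purely an assumption for illustration.

```python
from typing import Callable, Optional

Agent = Callable[[str], str]

# Two toy "agents" that each transform the task text.
AGENTS: dict[str, Agent] = {
    "upper": lambda text: text.upper(),
    "exclaim": lambda text: text + "!",
}


def run_explicit(task: str, pipeline: list[str]) -> str:
    """Explicit orchestration: the user fixes the order of agents up front."""
    for name in pipeline:
        task = AGENTS[name](task)
    return task


def run_agentic(task: str, choose_next: Callable[[str], Optional[str]]) -> str:
    """Agentic orchestration: a decision function picks each step until it returns None."""
    while (name := choose_next(task)) is not None:
        task = AGENTS[name](task)
    return task


def stub_chooser(text: str) -> Optional[str]:
    """Stand-in for the LLM: pick the next agent from the current state, or stop."""
    if not text.isupper():
        return "upper"
    if not text.endswith("!"):
        return "exclaim"
    return None


explicit_result = run_explicit("hello", ["upper", "exclaim"])
agentic_result = run_agentic("hello", stub_chooser)
print(explicit_result, agentic_result)
```

Both runs end with the same text here, but only the explicit one had its route written down in advance; the agentic one discovered it step by step.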

Message queues act as the communication channels between the agents and the control plane. Messages are sent and received asynchronously, so agents work independently without waiting for immediate responses. The queues decouple senders from receivers and contribute to fault tolerance.
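
A minimal sketch of queue-based messaging, using asyncio.Queue from the standard library rather than the llama-agents message queue. The worker name, the message dictionaries and the None sentinel are assumptions made for this example.

```python
import asyncio


async def agent_worker(name: str, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Consume task messages until a None sentinel arrives, publishing each result."""
    while True:
        message = await inbox.get()
        if message is None:
            break
        # The "work" here is just reversing the task text.
        await outbox.put({"from": name, "result": message["task"][::-1]})


async def main() -> list[dict]:
    inbox: asyncio.Queue = asyncio.Queue()
    outbox: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(agent_worker("reverser", inbox, outbox))

    # The publisher enqueues every task without waiting for any reply.
    for task in ["abc", "llama"]:
        await inbox.put({"task": task})
    await inbox.put(None)  # sentinel: no more work

    await worker
    return [outbox.get_nowait() for _ in range(outbox.qsize())]


replies = asyncio.run(main())
print(replies)
```

The publisher and the worker never call each other directly; they only share the two queues, which is the decoupling the paragraph above refers to.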

Each agent is a self-contained unit that specializes in specific tasks and runs as an independent microservice. Agents communicate with each other and with the control plane asynchronously. Giving each agent a narrow, well-defined function promotes modularity, scalability and reusability.

llama-agents allows external tools to be integrated as services. These tools can handle computationally expensive tasks or specialized functionality that agents do not support natively. Agents invoke tool services through the control plane, which extends the overall capability of the system.

Hands-on Implementation of llama-agents

Step 1 – Install the required libraries.

Step 2 – Import the libraries.

Step 3 – Enable logging to see the system's operations in the output.
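
A minimal sketch of this step using Python's standard logging module. It assumes the library emits its messages under the logger name "llama_agents"; if the installed version uses a different name, adjust it accordingly.

```python
import logging

# Attach a default handler; other libraries stay at WARNING and above.
logging.basicConfig(level=logging.WARNING)

# Assumption: llama-agents logs under the "llama_agents" logger name.
logging.getLogger("llama_agents").setLevel(logging.INFO)
```

Setting the level on the named logger only, rather than on the root logger, keeps the output focused on the agent system.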


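Step 4 – Set the OPENAI_API_KEY environment variable. The line below assumes a Colab notebook in which os and google.colab's userdata have already been imported, and reads the key from the Colab secrets store.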

os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')

Step 5 – Set up the message queue and control plane.

Step 6 – Create a user-defined tool for the agent that will be turned into a microservice.
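
A user-defined tool is, at its core, an ordinary Python function with type hints and a docstring the agent can read. The function below is a hypothetical stand-in, not the tool used in the original notebook; the lookup table is invented, and its first entry is chosen only to echo the query used later in this guide. In llama-index, such a function is typically wrapped with FunctionTool.from_defaults before being handed to an agent; that step is left out so the sketch runs without the library installed.

```python
# Hypothetical lookup table; a real tool might call a thesaurus API instead.
_SYNONYMS = {
    "knowledge": "expertise",
    "quick": "fast",
}


def get_synonym(word: str) -> str:
    """Return a synonym for the given word, or a short message if none is known."""
    return _SYNONYMS.get(word.lower().strip(), f"No synonym found for '{word}'.")


print(get_synonym("Knowledge"))
```

The docstring matters more than usual here, since an LLM-driven agent relies on it to decide when the tool is relevant.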

Step 7 – Define the agent, then create the agent service using the agent, message_queue, description, service_name, host and port parameters.

Step 8 – Define a human consumer for handling the published result and launch the service. 

Step 9 – Use the real-time monitor to interact with the agent service and submit task queries for the agent to answer. The monitor should be run in a separate terminal while the agent service is running.


The monitoring tool is a point-and-click terminal application that lists the agent services and creates jobs from user queries.

Submitting the task query “synonym for word ‘knowledge’?” (that is, creating a new job) returns the output “expertise”.

Final Words

llama-agents provides an efficient, scalable and reusable framework for building complex agent-based systems. Its async-first architecture, flexible orchestration options and human-in-the-loop integration make it easier to develop and deploy multi-agent systems. The framework is still at an early stage, and the llama-index team has published a roadmap of planned improvements.


  1. Link to Colab Notebook
  2. llama-agents GitHub Repo
  3. Official Roadmap
  4. Official blog 


Sachin Tripathi

Sachin Tripathi is the Manager of AI Research at AIM, with over a decade of experience in AI and Machine Learning. An expert in generative AI and large language models (LLMs), Sachin excels in education, delivering effective training programs. His expertise also includes programming, big data analytics, and cybersecurity. Known for simplifying complex concepts, Sachin is a leading figure in AI education and professional development.
