Observing and Tracing Multi-Modal Multi-Agent Systems through Portkey

Portkey enables observability and tracing in multi-modal, multi-agent systems for enhanced understanding and development.

Multi-modal and multi-agent architectures have become increasingly prevalent in the current LLM landscape. These systems are designed to process and synthesize information from diverse sources while orchestrating the collaborative efforts of multiple autonomous agents, and they hold immense potential for solving complex real-world problems. Understanding the inner workings of these agents presents significant challenges, which observability and tracing platforms such as Portkey can help address. This article explains multi-modal, multi-agent development and its observation in detail.

Table of Contents

  1. Understanding Multi-Modal and Multi-Agent Development
  2. Comprehending Replicate and its Usability
  3. AI Agent Observability and Tracing
  4. Hands-on Implementation of Portkey for Observing Agents

Understanding Multi-Modal and Multi-Agent Development

A multimodal agent is a system that can process and understand information from multiple types of data, such as text, images, audio, video, and sensor readings. These systems integrate information from different modalities to achieve a more comprehensive understanding and broader capabilities. They typically combine specialized encoders for each modality with alignment techniques that create a unified semantic representation capturing the relationships between different data types.

The architecture of multimodal systems involves modality-specific processing pathways that eventually converge into shared representations. Modern implementations often use transformer-based approaches, which are highly effective at handling cross-modal attention. Applications of multimodal agents have expanded rapidly, ranging from systems that jointly interpret text and images to generative systems that create images from textual descriptions or produce captions for visual content. The ability to process multiple modalities simultaneously has enabled more natural human-computer interaction and more comprehensive data analysis across industries.

On the other hand, multi-agent systems are composed of multiple intelligent agents that interact with each other and their environment to achieve a common goal. The key characteristics of multi-agent systems are autonomy, interaction, and collaboration. Autonomy refers to an agent's ability to make decisions independently; interaction is the process by which agents communicate and coordinate with each other; and collaboration is agents working together to solve complex problems.

The effectiveness of multi-agent systems depends on their coordination mechanisms, which include task allocation, information sharing, and conflict resolution. Agents may interact in cooperative frameworks, working toward shared goals and objectives. Agents can operate sequentially, one after another, or hierarchically, where a supervisor agent directs other agents to carry out and complete tasks.

Frameworks like CrewAI, AutoGen, and LangGraph are among the most widely used for multi-agent, multi-modal development. They provide infrastructure for defining agent roles, communication channels, shared knowledge management, and collaborative workflow orchestration. By abstracting much of the complexity involved in building such architectures, they make agentic development more approachable and efficient. Tools such as AgentOps and Portkey complement them by monitoring, tracing, and evaluating agents.

Working of CrewAI 

Comprehending Replicate and its Usability

Replicate is a platform for running AI models in the cloud, giving users access to advanced models through API tokens. It serves as a hub for deploying, sharing, and using AI models without the complex infrastructure normally required to run them. Replicate enables users to run open-source models, create fine-tuned models, or build and publish custom models.

Replicate hosts a wide range of models, covering diverse applications such as image generation, language modeling, and audio and video processing. It also offers API access, enabling users to integrate models into their applications with ease. The API simplifies the process of sending inputs to models and receiving outputs.
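As an illustration, a prediction request against Replicate's HTTP API can be composed as below. This is a minimal sketch, assuming a `REPLICATE_API_TOKEN` environment variable and a model version id; the `replicate` Python client wraps the same endpoint.

```python
import json
import os
import urllib.request

REPLICATE_API = "https://api.replicate.com/v1/predictions"

def build_prediction_request(version: str, model_input: dict) -> urllib.request.Request:
    """Compose an HTTP request asking Replicate to run one model version."""
    body = json.dumps({"version": version, "input": model_input}).encode()
    return urllib.request.Request(
        REPLICATE_API,
        data=body,
        headers={
            # The token comes from the account settings on replicate.com.
            "Authorization": f"Token {os.environ.get('REPLICATE_API_TOKEN', '')}",
            "Content-Type": "application/json",
        },
    )

# Sending the request with urllib.request.urlopen(req) returns a prediction
# object whose status can be polled until the model's output is ready.
```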

Replicate Dashboard

Replicate provides several features: models, predictions, deployments, and webhooks. Models are trained, packaged, and published software programs that accept user input and return output. Predictions represent the execution of a model, including the inputs provided and the outputs generated. Deployments, on the other hand, allow more control over how models are executed, while webhooks provide real-time updates about user predictions.

AI Agent Observability and Tracing

AI agent observability and tracing refer to the practice of tracking and understanding the internal workings and behavior of agents. It extends beyond simple monitoring, which focuses on external metrics like uptime and resource utilization, to understanding the agent's decision-making processes, interactions, and states. In complex AI agents powered by LLMs, observability becomes important for debugging and for ensuring that the agent's actions align with the intended goals and objectives.

Observing an AI agent involves collecting and analyzing various forms of data, including metrics, logs, and traces, which together build a complete view of the agent's operation. Tracing focuses on tracking the path of a specific request or action as it flows through the agent system, providing a granular, step-by-step breakdown of how the agent processes information and interacts with other components or agents.

By following the trace of a request, users can understand the sequence of actions and events that led to an outcome and analyze the interactions between different agents. This level of detail is important for optimizing performance, improving agent reliability, and building trust in AI agent systems. The importance of observability and tracing grows with the complexity of the system: as agents take on more critical tasks, ensuring their reliability and trustworthiness becomes paramount.
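To make the idea concrete, a trace can be modeled as a shared identifier stamped on every step a request takes through the agents. This is only an illustrative stdlib sketch of the concept, not Portkey's implementation:

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class Trace:
    """Collects the ordered steps a single request takes through a set of agents."""
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: list = field(default_factory=list)

    def record(self, agent: str, action: str) -> None:
        # Every span carries the shared trace id, so the full request
        # path can later be reassembled and inspected step by step.
        self.spans.append({"trace_id": self.trace_id,
                           "agent": agent,
                           "action": action})

trace = Trace()
trace.record("router", "classified question as web_search")
trace.record("retriever", "fetched 3 snippets from the web")
```

Platforms like Portkey do this automatically at the gateway level, attaching a trace id to each LLM call so related calls group together in the dashboard.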

Portkey is a comprehensive platform designed to act as a unified interface for interacting with AI models, observing and tracing agents, and implementing guardrails with ease. It also offers advanced capabilities such as multimodal support, load balancing, virtual keys, conditional routing, caching, and budget limits. Its most important feature is observability and logging: it helps users gain real-time insights, track important metrics, and streamline debugging.

Portkey Dashboard

Hands-on Implementation of Portkey for Observing Agents

Let’s implement a multi-modal, multi-agent system using CrewAI and Replicate, and add agent observability through Portkey.

Pre-requisites: 

  1. A Tavily account, for creating and implementing a web-search agent.
  2. A Replicate account, for using a model in a text-to-image agent. 
  3. A Portkey account, for monitoring the agent run and tracking execution phases. 

Step 1: Library Installation
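The exact dependency list is not shown in the article; assuming the stack described above (CrewAI, its tools package, Replicate, Tavily, Portkey, and an OpenAI-backed LLM), an installation along these lines should work:

```shell
# Assumed dependency set for this walkthrough; pin versions as needed.
pip install crewai crewai-tools replicate tavily-python portkey-ai langchain-openai
```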

Step 2: API Initialization
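The keys below are the ones this walkthrough assumes; each comes from the respective provider's dashboard. A minimal initialization sketch (in a notebook you would paste real values, e.g. via `getpass.getpass`):

```python
import os

# Keys assumed by this walkthrough; each comes from the provider's dashboard.
REQUIRED_KEYS = ("TAVILY_API_KEY", "REPLICATE_API_TOKEN",
                 "PORTKEY_API_KEY", "OPENAI_API_KEY")

for key in REQUIRED_KEYS:
    # Placeholder default so downstream code can at least import and run;
    # replace with real secrets before executing any API calls.
    os.environ.setdefault(key, "<paste-your-key>")

missing = [k for k in REQUIRED_KEYS if os.environ[k].startswith("<")]
if missing:
    print("Remember to set real values for:", ", ".join(missing))
```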

Step 3: Implement a web search tool helper function using Tavily
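The helper itself is not reproduced here, so the following is a hypothetical sketch against Tavily's REST search endpoint, assuming `TAVILY_API_KEY` is set; the `tavily-python` client offers the same functionality through a typed interface.

```python
import json
import os
import urllib.request

TAVILY_URL = "https://api.tavily.com/search"

def build_search_payload(query: str, max_results: int = 3) -> dict:
    """Build the JSON body for Tavily's search endpoint."""
    return {
        "api_key": os.environ.get("TAVILY_API_KEY", ""),
        "query": query,
        "max_results": max_results,
    }

def web_search(query: str) -> str:
    """Run a Tavily web search and concatenate the result snippets."""
    req = urllib.request.Request(
        TAVILY_URL,
        data=json.dumps(build_search_payload(query)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        results = json.load(resp).get("results", [])
    return "\n".join(r.get("content", "") for r in results)
```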

Step 4: Create a text-to-image helper function using Replicate; the model used is adirik/flux-cinestill:216a43b9975de9768114644bbf8cd0cba54a923c6d0f65adceaccfc9383a938f
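A minimal version of the helper might look like the following, assuming the `replicate` package and the model id quoted above. The import is guarded so the sketch still loads where `replicate` is not installed:

```python
try:
    import replicate  # needs REPLICATE_API_TOKEN set in the environment
except ImportError:
    replicate = None  # sketch still loads without the package

MODEL = ("adirik/flux-cinestill:"
         "216a43b9975de9768114644bbf8cd0cba54a923c6d0f65adceaccfc9383a938f")

def text2image(prompt: str) -> str:
    """Generate an image from a text prompt and return the image URL."""
    output = replicate.run(MODEL, input={"prompt": prompt})
    # Flux-style models typically return a list of output file URLs.
    first = output[0] if isinstance(output, (list, tuple)) else output
    return str(first)
```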

Step 5: Create an image-to-text helper function

import replicate  # assumes REPLICATE_API_TOKEN is set in the environment

def image2text(image_url: str, prompt: str) -> str:
    """This tool is useful when we want to generate textual descriptions from images."""
    output = replicate.run(
        "adirik/flux-cinestill:216a43b9975de9768114644bbf8cd0cba54a923c6d0f65adceaccfc9383a938f",
        input={
            "image": image_url,
            "top_p": 1,
            "prompt": prompt,
            "max_tokens": 1024,
            "temperature": 0.2
        }
    )
    return "".join(output)

Step 6: Setup a Router Tool

from crewai.tools import tool

# `llm` is assumed to be a chat model initialized earlier (e.g. during API setup).

@tool("router tool")
def router_tool(question: str) -> str:
    """Routes a question to text2image, image2text, or web_search."""
    prompt = f"""Based on the Question provided below, determine the following:
    1. Is the question directed at generating an image?
    2. Is the question directed at describing an image?
    3. Is the question a generic one that needs to be answered by searching the web?
    Question: {question}
    RESPONSE INSTRUCTIONS:
    - Answer either 1 or 2 or 3.
    - The answer should strictly be a string.
    - Do not provide any preamble or explanation except for 1, 2, or 3.
    OUTPUT FORMAT:
    1
    """
    response = llm.invoke(prompt).content
    if response == "1":
        return 'text2image'
    elif response == "3":
        return 'web_search'
    else:
        return 'image2text'

Step 7: Setup a Retriever Tool
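The retriever tool's code is not shown above, so here is a hypothetical shape for it: a dispatcher that invokes whichever helper the router selected. The handlers shown are stand-ins for the real helpers from Steps 3 to 5.

```python
def retriever(router_decision: str, question: str, handlers: dict) -> str:
    """Dispatch the question to the helper chosen by the router tool.

    `handlers` maps route names ('text2image', 'image2text', 'web_search')
    to the helper functions defined in the earlier steps.
    """
    handler = handlers.get(router_decision)
    if handler is None:
        raise ValueError(f"Unknown route: {router_decision!r}")
    return handler(question)

# Example wiring with stand-in helpers (replace with the real functions):
routes = {
    "web_search": lambda q: f"searched: {q}",
    "text2image": lambda q: f"image for: {q}",
    "image2text": lambda q: f"caption for: {q}",
}
```

Wrapping this function with CrewAI's `@tool` decorator, as done for the router tool, makes it callable by an agent.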

Step 8: Portkey Setup
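Portkey works as a gateway: an OpenAI-compatible client is pointed at the gateway URL and Portkey's headers are attached to each request. The sketch below builds those headers by hand; the `portkey-ai` package's `createHeaders` helper produces the same thing.

```python
import uuid

PORTKEY_GATEWAY_URL = "https://api.portkey.ai/v1"  # Portkey's hosted gateway

def portkey_headers(api_key: str, provider: str, trace_id: str = "") -> dict:
    """Build the headers Portkey's gateway reads to log, meter, and trace a call."""
    return {
        "x-portkey-api-key": api_key,
        "x-portkey-provider": provider,  # e.g. "openai"
        # Calls sharing one trace id are grouped together in the Portkey UI.
        "x-portkey-trace-id": trace_id or uuid.uuid4().hex,
    }
```

Passing these as `default_headers` to an OpenAI-compatible client (with `base_url=PORTKEY_GATEWAY_URL`) routes every LLM call the agents make through Portkey, where the logs and traces appear.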

Step 9: Create a Router Agent

Step 10: Create a Retriever Agent
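The agent definitions are not reproduced in the article; the roles, goals, and backstories below are illustrative guesses at what they might contain. The `crewai` import is guarded so the sketch loads even where the package is missing.

```python
# Role definitions as plain data, so the sketch runs even without crewai installed.
# All wording here is hypothetical, not the article's original configuration.
ROUTER_AGENT = {
    "role": "Router",
    "goal": "Decide whether a question needs a web search, image generation, "
            "or an image description.",
    "backstory": "An orchestrator that reads the user question and picks the right tool.",
}
RETRIEVER_AGENT = {
    "role": "Retriever",
    "goal": "Use the tool chosen by the router to produce the final answer.",
    "backstory": "A worker agent that executes the selected tool and returns its output.",
}

try:
    from crewai import Agent
    # In the full pipeline, pass tools=[router_tool] / tools=[retriever_tool] here.
    router_agent = Agent(**ROUTER_AGENT, verbose=True)
    retriever_agent = Agent(**RETRIEVER_AGENT, verbose=True)
except ImportError:
    router_agent = retriever_agent = None  # crewai not installed here
```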

Step 11: Create the Router Task and Retriever Task

Step 12: Initiate the Crew
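Since the task and crew code is not shown, here is a self-contained sketch of how the two tasks and the crew could be wired up with CrewAI's `Task`, `Crew`, and `Process.sequential` (the descriptions and expected outputs are assumptions, and the kickoff line is commented out because it makes live LLM calls):

```python
# Hypothetical task wording; {question} is filled in by crew.kickoff(inputs=...).
ROUTER_TASK_DESC = "Analyse the question: {question} and route it to the right tool."
RETRIEVER_TASK_DESC = "Answer the question: {question} using the tool the router selected."

try:
    from crewai import Agent, Task, Crew, Process

    router_agent = Agent(role="Router",
                         goal="Pick the right tool for each question.",
                         backstory="Routes every incoming question.")
    retriever_agent = Agent(role="Retriever",
                            goal="Produce the final answer with the chosen tool.",
                            backstory="Executes the tool the router selected.")

    router_task = Task(description=ROUTER_TASK_DESC,
                       expected_output="One of: text2image, image2text, web_search.",
                       agent=router_agent)
    retriever_task = Task(description=RETRIEVER_TASK_DESC,
                          expected_output="The final answer, or a generated image URL.",
                          agent=retriever_agent)

    crew = Crew(agents=[router_agent, retriever_agent],
                tasks=[router_task, retriever_task],
                process=Process.sequential)  # router runs first, then the retriever

    # result = crew.kickoff(inputs={"question": "Generate an image of a sunset"})
except ImportError:
    crew = None  # crewai is not installed in this environment
```

With the LLM routed through Portkey's gateway (Step 8), every call made during `kickoff` is logged and traced automatically.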

Step 13: Check the generated image as per the executed agents

Output: 

Check the Portkey web UI for agent tracing and logs.

Final Words

Continuously improving AI agents by providing valuable feedback to refine their behavior and capabilities is of prime importance in a multi-agent, multi-modal setting. Observability and tracing become both more complex and more necessary when multiple data types are processed, since data must be tracked from different sources; transparency and accountability are also crucial for building public trust in AI. Observability and tracing provide the means to achieve these goals by offering important insights into an agent's inner workings.


Sachin Tripathi

Sachin Tripathi is the Manager of AI Research at AIM, with over a decade of experience in AI and Machine Learning. An expert in generative AI and large language models (LLMs), Sachin excels in education, delivering effective training programs. His expertise also includes programming, big data analytics, and cybersecurity. Known for simplifying complex concepts, Sachin is a leading figure in AI education and professional development.
