Observing and Examining AI Agents through AgentOps

Explore how AgentOps monitors, debugs, and tracks costs for LLM-based AI agents in various contexts.

Computer programs that can act autonomously in an environment to achieve a user-specified goal are known as AI agents. These agents can be built on large language models (LLMs), which understand language, apply reasoning and generate text. LLM-based AI agents are queried through prompts that guide the model towards user-specific goals and requirements. By observing how the LLM interacts with prompts and data, a user can gain insight into its strengths and weaknesses. AgentOps is a tool for observing and examining AI agents, supporting generative AI agent development in different contexts.

This article explores AgentOps and how it can be used for monitoring, debugging and cost-tracking AI agents.

Table of Contents

  1. AI Agent Monitoring and its Importance
  2. Understanding AgentOps and its Features
  3. Hands-on Implementation of AgentOps

AI Agent Monitoring and its Importance

Monitoring an AI agent involves overseeing its performance and behaviour to ensure it operates effectively. AI agents that are not observed and monitored properly can malfunction, produce inaccurate results or generate biased outputs. Further challenges include explainability, regulatory compliance and secure operation. These challenges can be addressed by building a proper observation and monitoring approach into the development of LLM-based AI agents.

Observing and examining agents can be accomplished using a wide array of techniques:

Logging and Metrics – Tracking the agent’s activity and logging execution phases, outputs, resource usage and costs (a minimal sketch follows this list). This helps in understanding an agent’s overall operation and how it can be improved.

Visualisation Tools – Monitoring data can be visualised through dashboards and reports, providing a clear picture of the agent’s performance over time.

Replay Analytics – Replaying user interactions, especially in LLM-based agents, reveals usability issues and explains the agent’s behaviour.

Specialised Toolkits – Several toolkits offer exhaustive monitoring and evaluation of AI agents, such as AgentOps, LangFuse and Phoenix.
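To make the logging-and-metrics idea concrete, here is a minimal, framework-agnostic sketch that wraps an LLM call with Python’s standard logging module to record token usage and latency. The call_llm function and the fields it returns are placeholders for illustration, not part of any particular SDK.

import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("agent-monitor")

def monitored_call(call_llm, prompt):
    # call_llm stands in for any LLM client call; it is assumed to
    # return a dict containing the generated text and token counts.
    start = time.time()
    result = call_llm(prompt)
    elapsed = time.time() - start
    logger.info(
        "prompt_tokens=%s completion_tokens=%s latency=%.2fs",
        result.get("prompt_tokens"),
        result.get("completion_tokens"),
        elapsed,
    )
    return result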

Understanding AgentOps and its Features

AgentOps is a comprehensive platform for building reliable AI agents through monitoring, testing and replay analytics. The AgentOps dashboard examines AI agents through sessions. A session encapsulates a single execution of an LLM agent workflow, keeping all the agents, LLMs, agent actions and responses in one container for easy management. Session attributes include ID, Project ID, Timestamps and Tags.

AgentOps provides features such as session drill-down and session overview, which list the recorded sessions, their key details and meta-analysis.

AgentOps utilises the concept of events, which are agent executions based on LLMs, tools and agentic actions. An event consists of attributes such as ID, Session ID, Agent ID, Parameters, Timestamps, LLM Model, Prompt Messages, Tokens, Cost, Thread ID and Logs.
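To make the event structure concrete, the snippet below sketches such a record as a plain Python dictionary. The field names follow the attribute list above, but the values are invented examples and the exact schema AgentOps stores may differ.

# Illustrative only: an event record with the attributes listed above.
example_event = {
    "id": "evt_123",
    "session_id": "sess_456",
    "agent_id": "agent_789",
    "params": {"temperature": 0.1},
    "init_timestamp": "2024-05-01T10:00:00Z",
    "end_timestamp": "2024-05-01T10:00:02Z",
    "llm_model": "gpt-3.5-turbo",
    "prompt_messages": [{"role": "user", "content": "..."}],
    "prompt_tokens": 25,
    "completion_tokens": 120,
    "cost": 0.0002,
    "thread_id": "thread_001",
    "logs": [],
}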

AgentOps can perform LLM call tracking, agent tracking and event recording through its SDK decorators – @track_agent() and @record_function(). The AgentOps SDK detects installed modules such as OpenAI, LiteLLM and Cohere and automatically starts tracking their usage. AgentOps also integrates with the AutoGen and CrewAI frameworks for building multi-agent applications.
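Here is a minimal sketch of how these decorators can be applied, based on the decorator names above; exact signatures and availability vary across AgentOps SDK versions, so treat this as illustrative rather than definitive.

import agentops
from agentops import track_agent, record_function

agentops.init("<AGENTOPS_API_KEY>")  # starts a tracked session

@track_agent(name="qa-agent")  # registers the class as a named agent
class QAAgent:
    @record_function("answer_question")  # records each call as an event
    def answer(self, question: str) -> str:
        # an LLM call here would be tracked automatically
        return "some answer"

QAAgent().answer("What is AgentOps?")
agentops.end_session("Success")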

Hands-on Implementation of AgentOps

Step 1 – Visit https://www.agentops.ai/ and create an account.

AgentOps WebUI

Step 2 – Visit the settings and create a new project, which will also automatically generate an API key (you can also use the default project).

Setting up Project in AgentOps

Step 3 – Generate the API key, then switch to a development environment for further coding. Install the OpenAI, Cohere and AgentOps Python libraries.

!pip install agentops openai cohere

Step 4 – Import the installed libraries.

from openai import OpenAI
import cohere
from google.colab import userdata  # Colab's secrets store for API keys
import agentops

Step 5 – Define the OpenAI, Cohere and AgentOps API keys, then initialise AgentOps using the agentops.init() function to start tracking agent operations.

OPENAI_API_KEY = userdata.get("OPENAI_API_KEY")
AGENTOPS_API_KEY = userdata.get("AGENTOPS_API_KEY")
COHERE_API_KEY = userdata.get("COHERE_API_KEY")

openai = OpenAI(api_key=OPENAI_API_KEY)
agentops.init(AGENTOPS_API_KEY)  # starts a new AgentOps session
co = cohere.Client(COHERE_API_KEY)
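Since sessions carry a Tags attribute (as noted earlier), a session can also be labelled at initialisation to make it easier to filter in the dashboard. The tags argument below reflects the SDK’s init options, though the exact options may vary by version.

# Optional: initialise with tags instead of the plain init above.
agentops.init(AGENTOPS_API_KEY, tags=["colab-demo", "blog-example"])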

Step 6 – Implement a simple agent using OpenAI’s chat completions and Cohere’s chat stream, and track it with AgentOps.

messages = [
    {"role": "system", "content": "You are an assistant."},
    {"role": "user", "content": "What is the main premise of the Game of Thrones series?"},
]

# OpenAI chat completion, consumed chunk by chunk as a stream
openai_stream = openai.chat.completions.create(
    model="gpt-3.5-turbo", messages=messages, temperature=0.1, stream=True
)

openai_response = ""
for chunk in openai_stream:
    if chunk.choices[0].delta.content:
        openai_response += chunk.choices[0].delta.content

# Cohere streaming chat using the web-search connector
stream = co.chat_stream(
    message="What is the main premise of the Game of Thrones series?",
    connectors=[{"id": "web-search"}],
)

cohere_response = ""
for event in stream:
    if event.event_type == "text-generation":
        cohere_response += event.text

Step 7 – Once the agent finishes the text generation with OpenAI and Cohere, end the AgentOps session to complete the tracking process.

agentops.end_session("Success")
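"Success" is one of several end states; the SDK also accepts values such as "Fail" and "Indeterminate" (availability may vary by version). A common pattern, sketched below, is to end the session according to whether the workflow raised an error.

try:
    # ... run the agent workflow ...
    agentops.end_session("Success")
except Exception:
    # Mark the session as failed so the dashboard reflects the error.
    agentops.end_session("Fail")
    raise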

Output

Step 8 – Visit the AgentOps WebUI using the link shown in the previous output to explore the session drill-down and summary features. These help the user examine AI agents comprehensively.

Session Drill-Down and Summary Output 1 – It showcases the key metrics. 

Session Drill-Down and Summary Output 2 – It showcases the graphs representing event-based metrics and chat viewer.

Session Drill-Down and Summary Output 3 – Check the session replay for a step-by-step examination of the agent’s execution.

The session drill-down menu showcases all the artefacts required to monitor, examine and test our agent against different criteria such as cost, token usage and timestamps, along with a complete trace of agent actions.

Step 9 – You can build a Pivot Table using the Pivot Table option in the left panel to cross-tabulate, examine and compare multiple metrics.

In the Pivot Table, we can see the total number of sessions and events, the prompt_tokens and completion_tokens counts, and the total cost incurred.

Final Words

AgentOps is an excellent tool for comprehensively monitoring, tracking and checking an AI agent’s performance, which is beneficial in understanding and optimising the agent’s actions and behaviour. The Session Drill-Down and Pivot Table features provide an exhaustive view of agent execution events, making the entire operation explainable.

References

  1. Link to Code Notebook
  2. AgentOps Documentation
  3. AgentOps GitHub Repo
