A Hands-on Guide to Arize Phoenix for LLM Observability

Arize AI's Phoenix is a robust monitoring and observability tool that aids LLM deployment and optimization.

The complexity of LLMs presents significant challenges in their deployment and maintenance, especially when it comes to understanding and optimizing their performance. Overcoming these challenges requires a robust monitoring and analysis tool. Arize AI's Phoenix is one such comprehensive tool, giving LLM application developers an exhaustive view of an LLM's execution. This article explains the fundamentals of Phoenix in a practical manner.

Table of Contents

  1. Understanding Arize Phoenix
  2. Functionalities of Phoenix
  3. Hands-on Tutorial on Arize Phoenix Implementation

Understanding Arize Phoenix

Phoenix is an open-source, comprehensive LLM observability and evaluation platform developed by Arize AI. As the adoption of LLMs continues to rise across industries, the need for robust observability and evaluation has become increasingly critical. Arize AI, a leading provider of AI observability solutions, built Phoenix to address the unique challenges users face in LLM traceability and explainability.

LLMs harness the power of deep learning and NLP in ways that are sophisticated and often unpredictable, which makes them difficult to monitor or evaluate with traditional approaches. Phoenix aims to bridge this gap, providing a holistic solution that gives users deep visibility into their applications, ensuring reliable performance and continuous LLM optimization.

One of the key features of Phoenix is its real-time monitoring and tracing capability. LLM tracing refers to the process of capturing and recording detailed information about each step involved in an LLM application's execution: the inputs, outputs, timings, and other relevant metadata. It is a comprehensive log of whatever happens when an LLM application runs.

LLM tracing enables users to track issues related to application latency, token usage, runtime exceptions, retrieved documents, embeddings, LLM parameters, prompt templates, and more. Phoenix performs LLM tracing using the concepts of spans and traces. A span is a unit of work or operation; it carries a name, time-related data, structured log messages, and other metadata describing the operation it tracks.

Types of spans and their roles

A trace, on the other hand, records the path a request takes as it moves through multiple spans. Tracing improves the visibility of LLM applications and lets us pinpoint performance issues in the system easily.

Example of Trace

Functionalities of Phoenix

The functionalities of Phoenix fall into a few broad areas: tracing (capturing spans and traces of an application's execution), evaluation (scoring LLM outputs for qualities such as hallucination, QA correctness, and relevance), retrieval analysis for RAG pipelines, and embedding analysis for inspecting model inputs and outputs. The tutorial below exercises tracing and evaluation.

Hands-on Tutorial on Arize Phoenix Implementation

Let us implement Phoenix with LlamaIndex and the OpenAI API to trace and evaluate a RAG application.

Step 1: Install the necessary libraries 
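The original install commands are not shown, so here is a minimal sketch for a notebook environment; exact package names (for example, the LlamaIndex Arize Phoenix callback package) can vary across versions:

# Install Phoenix, LlamaIndex, and supporting packages (names may differ by version)
!pip install -q arize-phoenix llama-index llama-index-callbacks-arize-phoenix openai python-dotenv tqdm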

Step 2: Importing the installed libraries
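A sketch of the imports the rest of the tutorial relies on; note that in older LlamaIndex versions, set_global_handler lives in the top-level llama_index package rather than llama_index.core:

import os

from dotenv import load_dotenv
from tqdm import tqdm

import phoenix as px
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    set_global_handler,
)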

Step 3: Launching phoenix session to record our LLM operations 
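A single call does this; launch_app() starts the Phoenix server in the background and prints the local URL of the dashboard:

# Start the Phoenix app; note the printed localhost URL for the dashboard
session = px.launch_app()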

Output: 

Make a note of the localhost link; we will use it to access the Phoenix dashboard.

Step 4: Passing the OpenAI API Key (Make sure you have a .env file with the OPENAI_API_KEY variable and your key)
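Assuming the .env file sits next to the notebook, python-dotenv loads the key into the environment, where the OpenAI client picks it up automatically:

# Read OPENAI_API_KEY from .env into the environment
load_dotenv()
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY not found in .env"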

Step 5: Setting 'arize_phoenix' as the global handler. This mounts Phoenix's OpenInferenceTraceCallback as the global handler, enabling tracing for our LLM app

set_global_handler("arize_phoenix")

Step 6: Implementing the vector index and query engine for LlamaIndex
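The original data-loading code is not shown; the sketch below assumes the source text about Game of Thrones sits in a local data/ directory (a hypothetical path):

# Load documents and build an in-memory vector index over them
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# The query engine performs retrieval and response synthesis; with the global
# handler set, Phoenix traces every call made through it
query_engine = index.as_query_engine()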

Step 7: Passing a list of queries for Phoenix to trace and record

# Passing our queries (prompts) based on the data we have loaded
queries = [
    "What is Game of Thrones?",
    "Who is the main character?",
    "What is the significance of Game of Thrones?",
    "How did Game of Thrones get its name?",
]

queries

Step 8: Running our custom query list on the query engine 

# Run each query through the query engine; Phoenix records a trace per call
for query in tqdm(queries):
    query_engine.query(query)

Step 9: Importing Phoenix’s evaluation components
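A sketch of the imports this step needs, following Phoenix's evals API; module paths may differ slightly between Phoenix versions:

from phoenix.evals import (
    HallucinationEvaluator,
    OpenAIModel,
    QAEvaluator,
    RelevanceEvaluator,
    run_evals,
)
from phoenix.session.evaluation import get_qa_with_reference, get_retrieved_documents
from phoenix.trace import DocumentEvaluations, SpanEvaluations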

Step 10: Creating a data frame of our queries and retrieved documents for evaluation
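Phoenix ships helpers that turn the recorded traces into evaluation-ready DataFrames; a sketch:

# Query/answer pairs along with the reference context retrieved for each query
queries_df = get_qa_with_reference(px.Client())

# One row per retrieved document, for document-level relevance evaluation
retrieved_documents_df = get_retrieved_documents(px.Client())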

Output: 

Step 11: Implementing evaluation using Phoenix based on Hallucination, QA Correctness and Relevance
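Below is a sketch of the evaluation step, following the pattern in Phoenix's documentation; the choice of gpt-4 as the judge model is an assumption:

# Judge model for the LLM-assisted evals (older Phoenix versions use model_name=)
eval_model = OpenAIModel(model="gpt-4")

hallucination_evaluator = HallucinationEvaluator(eval_model)
qa_correctness_evaluator = QAEvaluator(eval_model)
relevance_evaluator = RelevanceEvaluator(eval_model)

# Span-level evals: hallucination and QA correctness on query/answer pairs
hallucination_eval_df, qa_correctness_eval_df = run_evals(
    dataframe=queries_df,
    evaluators=[hallucination_evaluator, qa_correctness_evaluator],
    provide_explanation=True,
)

# Document-level eval: relevance of each retrieved chunk to its query
relevance_eval_df = run_evals(
    dataframe=retrieved_documents_df,
    evaluators=[relevance_evaluator],
    provide_explanation=True,
)[0]

# Log the results back to Phoenix so they appear alongside the traces
px.Client().log_evaluations(
    SpanEvaluations(eval_name="Hallucination", dataframe=hallucination_eval_df),
    SpanEvaluations(eval_name="QA Correctness", dataframe=qa_correctness_eval_df),
    DocumentEvaluations(eval_name="Relevance", dataframe=relevance_eval_df),
)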

Output: 

Once the code has executed, we can head over to the Phoenix app, which has been running in the background, using the URL printed as output in Step 3.

Image 1: Phoenix shows a default project with our query traces and spans.

Image 2: Clicking on any query shows its step-by-step details.

Image 3: We can also see the spans.

Image 4: Important metrics.

Image 5: Traces show the evaluations as well.

As you can see, Phoenix does a fantastic job of capturing all the information related to our RAG pipeline built on LlamaIndex and OpenAI.

Final Words

Arize Phoenix is a pivotal tool in the complex landscape of LLM development and deployment. By providing comprehensive observability and explainability, it enables users and developers to gain insights into LLM behavior that are otherwise invisible. Phoenix stands among the prime options for LLM operations and monitoring (LLMOps).

Sachin Tripathi

Sachin Tripathi is the Manager of AI Research at AIM, with over a decade of experience in AI and Machine Learning. An expert in generative AI and large language models (LLMs), Sachin excels in education, delivering effective training programs. His expertise also includes programming, big data analytics, and cybersecurity. Known for simplifying complex concepts, Sachin is a leading figure in AI education and professional development.
