The complexity of LLMs presents significant challenges in their deployment and maintenance, especially when it comes to understanding and optimizing their performance. A robust monitoring and analysis tool is needed to overcome these challenges. Arize AI’s Phoenix is one such comprehensive tool, giving LLM application developers an exhaustive view of an LLM’s execution. This article explains the fundamentals of Phoenix in a practical manner.
Table of Contents
- Understanding Arize Phoenix
- Functionalities of Phoenix
- Hands-on Tutorial on Arize Phoenix Implementation
Understanding Arize Phoenix
Phoenix is a comprehensive open-source LLM observability and evaluation platform developed by Arize AI. As the adoption of LLMs continues to rise across industries, the need for robust observability and evaluation has become increasingly critical. Arize AI, a leading provider of AI observability solutions, built Phoenix to address the unique challenges users face in LLM traceability and explainability.
LLMs harness the power of deep learning and NLP in ways that are sophisticated and often unpredictable, which makes them difficult to monitor or evaluate with traditional approaches. Phoenix aims to bridge this gap, providing a holistic solution that gives users unprecedented visibility, ensuring reliable performance and continuous LLM optimization.
One of the key features of Phoenix is its real-time monitoring and tracing capability. LLM tracing refers to capturing and recording detailed information about each step involved in an LLM application’s execution, including the inputs, outputs, timings, and other relevant metadata. It is a comprehensive log of everything that happens when an LLM application runs.
LLM tracing enables users to track issues related to application latency, token usage, runtime exceptions, retrieved documents, embeddings, LLM parameters, prompt templates, and more. Phoenix performs LLM tracing using the concepts of spans and traces. A span is a single unit of work or operation. It carries a name, time-related data, structured log messages, and other metadata describing the operation it tracks.
Types of spans and their roles
A trace, on the other hand, records the path a request takes as it moves through multiple steps. Tracing improves the visibility of LLM applications and makes it easy to pinpoint performance issues in the system.
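The span/trace model described above can be illustrated with a minimal, framework-agnostic sketch. This is plain Python, not Phoenix’s actual API: each span records a name, timing, and metadata, and a trace is simply the ordered collection of spans produced by one request.

```python
import time
from contextlib import contextmanager

# A trace is the ordered list of spans recorded for one request.
trace = []

@contextmanager
def span(name, **metadata):
    """Record one unit of work: its name, latency, and structured metadata."""
    record = {"name": name, "metadata": metadata}
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["latency_s"] = time.perf_counter() - start
        trace.append(record)

# Simulate the steps a single RAG request moves through
with span("retrieve", query="What is Game of Thrones?"):
    time.sleep(0.01)  # stand-in for a vector-store search
with span("llm_call", model="gpt-3.5-turbo") as s:
    time.sleep(0.02)  # stand-in for the LLM API call
    s["metadata"]["token_count"] = 42

for s in trace:
    print(f"{s['name']}: {s['latency_s']:.3f}s")
```

A real tracer like Phoenix captures the same kind of records automatically, at much finer granularity, and links spans into a tree rather than a flat list.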
Functionalities of Phoenix
The functionalities and capabilities of Phoenix can be easily understood using the image below:
Hands-on Tutorial on Arize Phoenix Implementation
Let us implement Phoenix with LlamaIndex and OpenAI API for tracing our RAG application and evaluating it.
Step 1: Install the necessary libraries
!pip install "arize-phoenix[evals,llama-index]" llama-index-llms-openai openai llama-index-callbacks-arize-phoenix
Step 2: Importing the installed libraries
import os
import openai
import dotenv
import pandas as pd
import phoenix as px
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    ServiceContext,
    set_global_handler,
)
from llama_index.llms.openai import OpenAI
from tqdm import tqdm
Step 3: Launching a Phoenix session to record our LLM operations
session = px.launch_app()
Output:
Make a note of the localhost link; we will use it to access the Phoenix dashboard
Step 4: Passing the OpenAI API key (make sure you have a .env file containing an OPENAI_API_KEY variable set to your key)
dotenv.load_dotenv()
Step 5: Setting ‘arize_phoenix’ as the global handler. This mounts Phoenix’s OpenInference trace callback as the global handler, enabling tracing of our LLM app
set_global_handler("arize_phoenix")
Step 6: Implementing the vector index and query engine for LlamaIndex
# Loading the data from the directory using SimpleDirectoryReader and building a VectorStoreIndex
# The directory is named "data" and contains a text document of a Game of Thrones script
documents = SimpleDirectoryReader("./data/").load_data()
vector_index = VectorStoreIndex.from_documents(documents)
# Initializing a query engine
query_engine = vector_index.as_query_engine()
Step 7: Passing a list of queries for Phoenix to trace and record
# Passing our queries (prompts) based on the data we have loaded
queries = ["What is Game of Thrones?", "Who is the main character?", "What is the significance of Game of Thrones?", "How did Game of Thrones get its name?"]
queries
Step 8: Running our custom query list on the query engine
# Looping over the query list and passing each query to the query engine
for query in tqdm(queries):
    query_engine.query(query)
Step 9: Importing Phoenix’s evaluation components
from phoenix.session.evaluation import get_qa_with_reference, get_retrieved_documents
from phoenix.trace import DocumentEvaluations, SpanEvaluations
from phoenix.evals import (
HallucinationEvaluator,
OpenAIModel,
QAEvaluator,
RelevanceEvaluator,
run_evals,
)
Step 10: Creating a data frame of our queries and retrieved documents for evaluation
queries_df = get_qa_with_reference(session)
queries_df
Output:
Step 11: Creating a data frame of the retrieved documents, the second input for evaluation based on Hallucination, QA Correctness, and Relevance
retrieved_documents_df = get_retrieved_documents(session)
retrieved_documents_df
Output:
Once the code is executed, we can head to the Phoenix app that has been running in the background, using the URL printed in the output of Step 3, and inspect the results.
Image 1: Phoenix shows a default project with our query traces and spans –
Image 2: Let’s click on any query and check its step-by-step details –
Image 3: We can also see the spans –
Image 4: Important metrics –
Image 5: Traces show the evaluation as well –
As you can see, Phoenix does a fantastic job of capturing all the information related to our RAG pipeline built on LlamaIndex and OpenAI.
Final Words
Arize Phoenix is a pivotal tool in the complex landscape of LLM development and deployment. By providing comprehensive observability and explainability, it enables users and developers to gain insights into LLM behavior that are otherwise invisible. Phoenix stands among the prime options for LLM operations and monitoring (LLMOps).