# Quickstart Guide for GenAI Monitor
GenAI Monitor provides automatic observability for your GenAI applications with zero code changes. Follow this guide to get started with minimal effort.
## Installation
Install the base package with support for your preferred AI framework:
```bash
# Choose one or more providers based on your needs
pip install "genai-monitor[openai]"        # For the OpenAI API
pip install "genai-monitor[transformers]"  # For Hugging Face Transformers
pip install "genai-monitor[diffusers]"     # For Hugging Face Diffusers
pip install "genai-monitor[litellm]"       # For LiteLLM

# Or install with all providers at once
pip install "genai-monitor[all]"
```
## Implicit Monitoring (Zero-Code Changes)
Simply import the auto module at the beginning of your application:
```python
# Import this first to enable automatic monitoring
import genai_monitor.auto

# Then use your AI libraries as usual - no changes needed!
```
### Example with OpenAI
```python
import os

import genai_monitor.auto  # This enables monitoring automatically
from openai import OpenAI

# Your regular OpenAI code
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

question = "What is the capital of France?"

# First call - sent to OpenAI and cached in the local database
response = client.chat.completions.create(
    messages=[{"role": "user", "content": question}],
    model="gpt-3.5-turbo",
)
print(response.choices[0].message.content)

# Second call with the same parameters - served from the database, no API call made!
response = client.chat.completions.create(
    messages=[{"role": "user", "content": question}],
    model="gpt-3.5-turbo",
)
print(response.choices[0].message.content)
```
### Example with Transformers
```python
import genai_monitor.auto  # This enables monitoring automatically
from transformers import pipeline

# Your regular transformers code
generator = pipeline("text-generation", model="gpt2")

# First call - runs the model and caches the result
result = generator("Hello, I'm a language model", max_length=30)
print(result[0]["generated_text"])

# Second call with the same parameters - served from the database, no model inference!
result = generator("Hello, I'm a language model", max_length=30)
print(result[0]["generated_text"])
```
## How It Works
GenAI Monitor automatically:

- Intercepts calls to supported AI frameworks
- Stores inputs and outputs in a local database
- Returns cached results for identical calls
- Does all of this with zero changes to your application code (the sketch below shows the core idea)
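The exact hooks are provider-specific, but the core intercept-and-cache idea can be sketched in a few lines of plain Python. This is an illustrative sketch only, not GenAI Monitor's actual implementation: the `monitored` decorator and `_cache` dict are hypothetical names, and the real library persists results to a database rather than an in-memory dict.

```python
import functools
import hashlib
import json

_cache = {}  # hypothetical in-memory stand-in for the local database

def monitored(fn):
    """Wrap a callable so identical calls are served from the cache."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Build a stable key from the function name and the full call signature
        key = hashlib.sha256(
            json.dumps([fn.__qualname__, args, kwargs],
                       sort_keys=True, default=str).encode()
        ).hexdigest()
        if key not in _cache:   # first call: run and store the result
            _cache[key] = fn(*args, **kwargs)
        return _cache[key]      # repeat call: no API or model invocation
    return wrapper
```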
This approach dramatically reduces:
- API costs by eliminating duplicate calls
- Latency by skipping remote API calls for repeated queries
- Compute resources by caching expensive model inferences
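Continuing the OpenAI example above, a quick timing check makes the cache visible: the repeated call should return almost instantly because no network request is made. (The timing code below is a suggested check, not part of GenAI Monitor.)

```python
import time

# Repeat the identical request and time it - it should be served from the cache
start = time.perf_counter()
client.chat.completions.create(
    messages=[{"role": "user", "content": question}],
    model="gpt-3.5-turbo",
)
print(f"Repeat call took {time.perf_counter() - start:.3f}s")
```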
## Next Steps
- See how to use explicit registration for unsupported frameworks
- Add artifact tracking to your model calls
- Add metadata to your model calls