Observability Metrics & Traces Reference

This page describes the available metrics and traces presets. These parameterized query templates provide access to common telemetry patterns for monitoring AI Refinery inference services, agent workflows, and user sessions.

Note: To use the Observability APIs, set the environment variable USE_AIR_API_V2_BASE_URL=True in your SDK environment. Queries will then use https://api-prod-k8s.airefinery.accenture.com/. This feature is available starting from SDK version 1.25.0.

Any preset that uses time_window can be converted to a time-series range query by passing the step parameter (e.g., "step": "15m"). A range query returns time-bucketed matrix data suitable for charts and visualizations.
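As a minimal sketch of the note above, the snippet below sets the documented environment variable and shows how adding `step` to a preset's parameters turns an instant query into a range query. The dictionary layout and the `my_project` value are assumptions for illustration; only the parameter names come from this page, and the SDK call that would consume these parameters is not shown.

```python
import os

# Documented switch: set it before the SDK client is created.
os.environ["USE_AIR_API_V2_BASE_URL"] = "True"

# Hypothetical parameter dictionary; "my_project" is a placeholder value.
instant_params = {"project_name": "my_project", "time_window": "1h"}

# Adding "step" requests time-bucketed (matrix) data for the same preset.
range_params = {**instant_params, "step": "15m"}

print(range_params)
```

Keeping the instant and range parameters in separate dictionaries makes it easy to reuse one preset for both a single headline number and a chart.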

Metrics

Inference Metrics

Metrics for monitoring LLM inference performance, including request counts, latency distributions, error rates, and model usage patterns.


inference_requests_total

  • Total number of inference requests over the specified time window.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `model_key` (optional)
- `session_id` (optional) — filter to a specific session
- `time_window` (required)
- `step` (optional)

inference_active_model_count

  • Number of distinct models that have received requests within the time window.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `model_key` (optional)
- `time_window` (required)
- `step` (optional)

inference_model_usage

  • Per-model inference usage rate over the time window.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `model_key` (optional)
- `session_id` (optional) — filter to a specific session
- `time_window` (required)
- `step` (optional)

inference_latency

  • Inference latency at a specified percentile. Defaults to p95 when percentile is not provided.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `model_key` (optional)
- `session_id` (optional) — filter to a specific session
- `time_window` (required)
- `percentile` (optional) — e.g., `0.50`, `0.90`, `0.95`, `0.99`, or `50`, `90`, `95`, `99`. Default: `0.95`
- `step` (optional)
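Because `percentile` accepts both fractional (`0.95`) and whole-number (`95`) forms, client code may want to normalize the value before logging or comparing it. The helper below is a hypothetical client-side sketch, not part of the SDK; it assumes any value greater than 1 is a whole-number percentile, and the service may resolve edge cases differently.

```python
def normalize_percentile(value: float = 0.95) -> float:
    """Return the percentile as a fraction in (0, 1].

    Assumption: values greater than 1 are whole-number percentiles (95 -> 0.95).
    """
    fraction = value / 100 if value > 1 else float(value)
    if not 0 < fraction <= 1:
        raise ValueError(f"percentile out of range: {value!r}")
    return fraction


print(normalize_percentile(95))    # whole-number form
print(normalize_percentile(0.50))  # fractional form
print(normalize_percentile())      # documented default, p95
```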

inference_error_rate

  • Inference error rate as a ratio of errors to total requests. Returns a value between 0 and 1 (e.g., 0.05 means 5% error rate).

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `model_key` (optional)
- `session_id` (optional) — filter to a specific session
- `time_window` (required)
- `step` (optional)
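Since the preset returns a ratio rather than a percentage, a small formatting step is often useful before displaying or alerting on it. The sketch below is illustrative only; the function name and the 5% alert threshold are assumptions, not values defined by the service.

```python
def describe_error_rate(ratio: float, threshold: float = 0.05) -> str:
    """Format an error ratio (0 to 1) as a percentage with a simple status flag."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError(f"expected a ratio between 0 and 1, got {ratio!r}")
    # The threshold is a hypothetical alerting choice made by the caller.
    status = "ALERT" if ratio > threshold else "ok"
    return f"{ratio:.2%} ({status})"


print(describe_error_rate(0.05))
print(describe_error_rate(0.125))
```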

Agent Metrics

Most agent metrics support filtering by both agent_name and agent_class; agent_messages and agent_messages_with_tokens instead break results down by sender and receiver. The agent_class refers to the implementation type (e.g., ToolUseAgent, SearchAgent, CustomAgent) and is useful for aggregating across agents of the same type regardless of their user-defined names.
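To illustrate aggregating by implementation type, the sketch below sums values per `agent_class` across differently named agents. The row layout, the agent names, and the numbers are made up for illustration; the actual response shape of the Observability APIs is not described on this page.

```python
from collections import defaultdict

# Hypothetical result rows; names and values are invented sample data.
rows = [
    {"agent_name": "planner", "agent_class": "ToolUseAgent", "value": 12},
    {"agent_name": "researcher", "agent_class": "SearchAgent", "value": 7},
    {"agent_name": "executor", "agent_class": "ToolUseAgent", "value": 5},
]


def total_by_class(rows: list[dict]) -> dict[str, float]:
    """Sum the 'value' field per agent_class, ignoring user-defined agent names."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["agent_class"]] += row["value"]
    return dict(totals)


print(total_by_class(rows))
```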


agent_task_total

  • Total agent tasks broken down by agent name, agent class, and status (success/failure/timeout) over the time window.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `session_id` (optional) — filter to a specific session
- `time_window` (required)
- `step` (optional)

agent_performance_rate

  • Agent task rate by status over the time window. Defaults to success rate when status is not provided.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `session_id` (optional) — filter to a specific session
- `status` (optional) — defaults to `success`. Can also be `failure` or `timeout`
- `time_window` (required)
- `step` (optional)

agent_throughput

  • Agent task completion rate in tasks per second.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `session_id` (optional) — filter to a specific session
- `time_window` (required)
- `step` (optional)
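A per-second rate can be hard to read for low-volume agents. Assuming the returned value is an average rate over the queried window (an assumption; the page does not state how the rate is computed), it can be converted to an approximate task count as sketched below. The function name is hypothetical.

```python
def estimated_tasks(tasks_per_second: float, window_seconds: int) -> float:
    """Approximate the number of tasks completed in a window from an average rate."""
    if tasks_per_second < 0 or window_seconds < 0:
        raise ValueError("rate and window must be non-negative")
    return tasks_per_second * window_seconds


# 0.5 tasks per second sustained over a 1h window (3600 seconds).
print(estimated_tasks(0.5, 3600))
```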

agent_latency

  • Agent task latency at a specified percentile, grouped by agent name and agent class. Defaults to p95 when percentile is not provided.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `session_id` (optional) — filter to a specific session
- `time_window` (required)
- `percentile` (optional) — e.g., `0.50`, `0.90`, `0.95`. Default: `0.95`
- `step` (optional)

agent_duration

  • Total time spent per agent in seconds over the time window, grouped by agent name and agent class.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `session_id` (optional) — filter to a specific session
- `time_window` (required)
- `step` (optional)

agent_dependency_calls

  • Count of external dependency calls over the time window, broken down by agent name, agent class, API type, and source.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `session_id` (optional) — filter to a specific session
- `time_window` (required)
- `step` (optional)

agent_tool_calls

  • Count of tool calls over the time window, broken down by agent name, agent class, API type, and tool name.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `session_id` (optional) — filter to a specific session
- `time_window` (required)
- `step` (optional)

agent_messages

  • Inter-agent message counts over the time window, broken down by sender and receiver, including their agent classes.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `session_id` (optional) — filter to a specific session
- `time_window` (required)
- `step` (optional)

agent_messages_with_tokens

  • Inter-agent message token consumption over the time window, broken down by sender/receiver pair, their agent classes, and token type (input/output/total).

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `session_id` (optional) — filter to a specific session
- `time_window` (required)
- `step` (optional)

agent_orchestration_overhead

  • Orchestration overhead ratio at a specified percentile. Defaults to p95 when percentile is not provided.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `session_id` (optional) — filter to a specific session
- `time_window` (required)
- `percentile` (optional) — Default: `0.95`
- `step` (optional)

Token Consumption Metrics

Metrics for tracking LLM token usage across models and agents, including input/output breakdowns for cost analysis and usage optimization.


token_consumption

  • Total token consumption grouped by organization, project, and model. Supports optional agent filtering to narrow down consumption to specific agents.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `model_key` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `session_id` (optional) — filter to a specific session
- `time_window` (required)
- `step` (optional)

token_input_total / token_output_total

  • Total input tokens and total output tokens, reported as two separate presets. Both support optional agent filtering to narrow results to specific agents.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `model_key` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `session_id` (optional) — filter to a specific session
- `time_window` (required)
- `step` (optional)

token_consumption_by_agent

  • Token consumption grouped by agent name and agent class over the time window.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `session_id` (optional) — filter to a specific session
- `time_window` (required)
- `step` (optional)

Session Metrics

Metrics for monitoring user session activity, including session counts, durations, and request throughput.


sessions_total

  • Total number of sessions started over the time window.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `time_window` (required)
- `step` (optional)

sessions_active

  • Number of currently active sessions (gauge — returns current value, no time window needed).

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)

session_duration

  • Session duration at a specified percentile. Defaults to p95 when percentile is not provided.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `time_window` (required)
- `percentile` (optional) — e.g., `0.50`, `0.90`, `0.95`, `0.99`. Default: `0.95`
- `step` (optional)

session_requests_total

  • Total requests processed within sessions over the time window.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `time_window` (required)
- `step` (optional)

session_requests_rate

  • Session request rate in requests per second.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `time_window` (required)
- `step` (optional)

RAI Compliance Metrics

Metrics for tracking Responsible AI (RAI) compliance checks, including check counts, rejection rates by category, and latency.


rai_check_total

  • Total number of RAI (Responsible AI) compliance checks performed over the time window.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `session_id` (optional) — filter to a specific session
- `time_window` (required)
- `step` (optional)

rai_rejection_total

  • Total number of queries that failed RAI compliance checks over the time window. Supports optional filtering by rejection category.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `session_id` (optional) — filter to a specific session
- `category` (optional) — filter by rejection category: `harassment`, `hate`, `self-harm`, `sexual`, `violence`, `illicit`
- `time_window` (required)
- `step` (optional)

rai_check_latency

  • RAI compliance check latency at a specified percentile. Defaults to p95 when percentile is not provided.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `session_id` (optional) — filter to a specific session
- `time_window` (required)
- `percentile` (optional) — Default: `0.95`
- `step` (optional)

Traces


inference_traces

  • Traces for inference service requests.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)

distiller_traces

  • Traces for distiller service operations.

Parameters:

- `organization_id` (auto-resolved from token)
- `project_name` (optional)

Notes

  • Time windows: Duration format (5m, 1h, 24h). Default: 1h
  • Percentile: Accepts 0.95 or 95 format. Default: 0.95 (p95)
  • Time-series mode: Pass step (e.g., "15m") to get matrix data for charting
  • Session filtering: Most metrics support session_id for narrowing results to a specific user session
  • Agent class: Filter by implementation type (e.g., ToolUseAgent) across all agents of that type
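When choosing a `step` for a chart, it helps to estimate how many buckets a given `time_window` will produce. The sketch below parses the duration format shown above and computes an approximate bucket count. Only the `m` and `h` units appear on this page; support for `s` and `d` is an assumption, and the service may return a slightly different number of points (for example, if both endpoints are included).

```python
import math
import re

# Seconds per unit; "s" and "d" are assumed, "m" and "h" are documented.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> int:
    """Convert a duration such as '5m', '1h', or '24h' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


def approx_bucket_count(time_window: str, step: str) -> int:
    """Estimate how many step-sized buckets fit in the time window."""
    step_seconds = parse_duration(step)
    if step_seconds == 0:
        raise ValueError("step must be greater than zero")
    return math.ceil(parse_duration(time_window) / step_seconds)


print(parse_duration("24h"))
print(approx_bucket_count("1h", "15m"))
```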