
Observability Router Configuration

This document describes the metrics and traces defined in the observability router configuration. These definitions provide parameterized query templates for monitoring AIRefinery inference services, agent workflows, and user sessions, enabling access to common telemetry patterns without writing raw PromQL or TraceQL queries.

Note: To use the Observability APIs, set the environment variable USE_AIR_API_V2_BASE_URL=True in your SDK environment. Queries are then sent to https://api-prod-k8s.airefinery.accenture.com/. This feature is available starting from SDK version 1.25.0.

Any preset that accepts time_window can be converted to a time-series range query by passing the step parameter (e.g., "step": "15m"). This returns time-bucketed matrix data suitable for charts and visualizations.
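
A minimal sketch of preparing a preset query in Python. Only the environment variable and the parameter names come from this page. The `build_preset_query` helper and the shape of the dictionary it returns are hypothetical, and the SDK call that would actually send the query is not shown.

```python
import os

# Route Observability API queries to the v2 base URL (requires SDK >= 1.25.0).
os.environ["USE_AIR_API_V2_BASE_URL"] = "True"


def build_preset_query(preset, organization_id, time_window=None, step=None, **filters):
    """Assemble parameters for a preset query.

    Hypothetical helper: the returned dict layout is an assumption,
    not the SDK's documented request format.
    """
    params = {"organization_id": organization_id}
    if time_window is not None:
        params["time_window"] = time_window
    if step is not None:
        # Passing step turns the preset into a time-series range query.
        params["step"] = step
    # Keep only the optional filters that were actually supplied.
    params.update({k: v for k, v in filters.items() if v is not None})
    return {"preset": preset, "params": params}


instant = build_preset_query("inference_requests_total", "my-org", time_window="1h")
ranged = build_preset_query(
    "inference_requests_total", "my-org", time_window="24h", step="15m", model_key=None
)
```

Here `instant` keeps the default (non-time-series) form, while `ranged` adds `step` and would return matrix data; the unset `model_key` filter is dropped rather than sent as an empty value.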

Metrics

Inference Metrics

Metrics for monitoring LLM inference performance, including request counts, latency distributions, error rates, and model usage patterns.


inference_requests_total

  • Total number of inference requests over the specified time window.

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `model_key` (optional)
- `time_window` (required)
- `step` (optional)

inference_active_model_count

  • Number of distinct models that have received requests within the time window.

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `model_key` (optional)
- `time_window` (required)
- `step` (optional)

inference_model_usage

  • Per-model inference usage rate over the time window.

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `model_key` (optional)
- `time_window` (required)
- `step` (optional)

inference_latency

  • Inference latency at a specified percentile. Defaults to p95 when percentile is not provided.

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `model_key` (optional)
- `time_window` (required)
- `percentile` (optional) — e.g., `0.50`, `0.90`, `0.95`, `0.99`, or `50`, `90`, `95`, `99`. Default: `0.95`
- `step` (optional)
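
Because `percentile` accepts either a fraction or a whole-number percentage, a client might normalize it before sending. The sketch below is illustrative and not part of the SDK; it assumes that any value greater than 1 is a percentage.

```python
def normalize_percentile(value=None):
    """Return the percentile as a fraction in (0, 1), defaulting to p95.

    Assumption: values greater than 1 are whole-number percentages (e.g. 95).
    """
    if value is None:
        return 0.95  # documented default
    value = float(value)
    if value > 1:
        value = value / 100
    if not 0 < value < 1:
        raise ValueError(f"percentile out of range: {value}")
    return value


print(normalize_percentile(99))  # 0.99
```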

inference_error_rate

  • Inference error rate as a ratio of errors to total requests. Returns a value between 0 and 1 (e.g., 0.05 means 5% error rate).

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `model_key` (optional)
- `time_window` (required)
- `step` (optional)
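
The error rate is simply errors divided by total requests. The helper below recomputes it from two counts and formats it as a percentage; the counts are made-up numbers, and returning 0 when there were no requests is an assumption of this sketch.

```python
def error_rate(errors, total):
    """Ratio of failed to total inference requests, in [0, 1]."""
    if total == 0:
        return 0.0  # Assumption: report 0 when there were no requests.
    return errors / total


def as_percent(ratio):
    """Format a 0-1 ratio as a percentage string with one decimal place."""
    return f"{ratio * 100:.1f}%"


rate = error_rate(errors=25, total=500)  # illustrative counts
print(as_percent(rate))  # 5.0%
```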

Agent Metrics

All agent metrics except agent_messages and agent_messages_with_tokens support filtering by both agent_name and agent_class. The agent_class is the implementation type (e.g., ToolUseAgent, SearchAgent, CustomAgent) and is useful for aggregating across agents of the same type regardless of their user-defined names.
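
To illustrate why agent_class is useful, the sketch below sums per-agent task counts by class. The row format (dicts with `agent_name`, `agent_class`, and `value`) and the numbers are hypothetical, not the documented response schema.

```python
from collections import defaultdict

# Hypothetical result rows, one per user-defined agent name.
rows = [
    {"agent_name": "billing_search", "agent_class": "SearchAgent", "value": 120},
    {"agent_name": "docs_search", "agent_class": "SearchAgent", "value": 80},
    {"agent_name": "ticket_tools", "agent_class": "ToolUseAgent", "value": 45},
]


def totals_by_class(rows):
    """Aggregate values across agents that share an implementation type."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["agent_class"]] += row["value"]
    return dict(totals)


print(totals_by_class(rows))  # {'SearchAgent': 200, 'ToolUseAgent': 45}
```

Passing agent_class as a filter narrows the query itself; client-side grouping like this is only needed when results for several classes are fetched together.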


agent_task_total

  • Total agent tasks broken down by agent name, agent class, and status (success/failure/timeout) over the time window.

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)

agent_performance_rate

  • Agent task rate by status over the time window. Defaults to success rate when status is not provided.

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `status` (optional) — defaults to `success`. Can also be `failure` or `timeout`
- `time_window` (required)
- `step` (optional)
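
Since `status` has a documented default and three listed values, a client could resolve it before building the request. This is a hypothetical client-side check that assumes those three values are the only ones accepted; the service may validate differently.

```python
# Values listed for the status parameter on this page.
VALID_STATUSES = ("success", "failure", "timeout")


def resolve_status(status=None):
    """Return the status to query, applying the documented default of "success"."""
    if status is None:
        return "success"
    if status not in VALID_STATUSES:
        raise ValueError(f"unsupported status: {status!r}")
    return status


# Illustrative parameters for agent_performance_rate (placeholder organization ID).
params = {
    "organization_id": "my-org",
    "agent_class": "ToolUseAgent",
    "status": resolve_status("failure"),
    "time_window": "1h",
}
```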

agent_throughput

  • Agent task completion rate in tasks per second.

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)
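
Because agent_throughput is a per-second rate, multiplying it by the window length in seconds gives an approximate task count for that window. A worked example with an illustrative rate:

```python
def approx_tasks(rate_per_second, window_seconds):
    """Approximate completed tasks = average rate x window length."""
    return rate_per_second * window_seconds


# 0.5 tasks/s sustained over a 1h window (3600 s)
print(approx_tasks(0.5, 3600))  # 1800.0
```

The result is approximate because the rate is an average over the window. The same relationship should roughly hold between session_requests_rate and session_requests_total, assuming both are derived from the same underlying counter.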

agent_latency

  • Agent task latency at a specified percentile, grouped by agent name and agent class. Defaults to p95 when percentile is not provided.

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `percentile` (optional) — e.g., `0.50`, `0.90`, `0.95`. Default: `0.95`
- `step` (optional)

agent_duration

  • Total time spent per agent in seconds over the time window, grouped by agent name and agent class.

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)

agent_dependency_calls

  • Count of external dependency calls over the time window, broken down by agent name, agent class, API type, and source.

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)

agent_tool_calls

  • Count of tool calls over the time window, broken down by agent name, agent class, API type, and tool name.

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)

agent_messages

  • Inter-agent message counts over the time window, by sender and receiver including their agent classes.

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `time_window` (required)
- `step` (optional)

agent_messages_with_tokens

  • Inter-agent message token consumption over the time window, broken down by sender/receiver pair, their agent classes, and token type (input/output/total).

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `time_window` (required)
- `step` (optional)
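
agent_messages_with_tokens yields one series per sender/receiver pair and token type. The sketch below pivots such rows into a per-pair table. The field names `sender`, `receiver`, `token_type`, and `value` are assumptions, since this page does not specify the response labels, and the numbers are illustrative.

```python
from collections import defaultdict

# Hypothetical rows: one per (sender, receiver, token_type) combination.
rows = [
    {"sender": "orchestrator", "receiver": "search", "token_type": "input", "value": 900},
    {"sender": "orchestrator", "receiver": "search", "token_type": "output", "value": 300},
    {"sender": "orchestrator", "receiver": "search", "token_type": "total", "value": 1200},
]


def pivot_by_pair(rows):
    """Map (sender, receiver) to a dict of token_type -> token count."""
    table = defaultdict(dict)
    for row in rows:
        pair = (row["sender"], row["receiver"])
        table[pair][row["token_type"]] = row["value"]
    return dict(table)


print(pivot_by_pair(rows))
```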

agent_orchestration_overhead

  • Orchestration overhead ratio at a specified percentile. Defaults to p95 when percentile is not provided.

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `percentile` (optional) — Default: `0.95`
- `step` (optional)

Token Consumption Metrics

Metrics for tracking LLM token usage across models and agents, including input/output breakdowns for cost analysis and usage optimization.


token_consumption

  • Total token consumption grouped by organization, project, and model. Supports optional agent filtering to narrow down consumption to specific agents.

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `model_key` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)

token_input_total / token_output_total

  • Input and output token totals, reported as two separate presets. Both support optional agent filtering to narrow results to specific agents.

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `model_key` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)
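
One use of the separate input and output totals is a rough cost estimate. The per-million-token prices below are placeholders, not AIRefinery pricing; substitute the rates that apply to your models.

```python
# Placeholder prices per one million tokens (assumption, not real pricing).
PRICE_PER_M_INPUT = 1.00
PRICE_PER_M_OUTPUT = 4.00


def estimate_cost(input_tokens, output_tokens):
    """Rough cost from token_input_total and token_output_total values."""
    return (input_tokens * PRICE_PER_M_INPUT + output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000


print(estimate_cost(2_000_000, 500_000))  # 4.0
```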

token_consumption_by_agent

  • Token consumption grouped by agent name and agent class over the time window.

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)

Session Metrics

Metrics for monitoring user session activity, including session counts, durations, and request throughput.


sessions_total

  • Total number of sessions started over the time window.

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `time_window` (required)
- `step` (optional)

sessions_active

  • Number of currently active sessions. This is a gauge: it returns the current value, so no time window is needed.

Parameters:

- `organization_id` (required)
- `project_name` (optional)

session_duration

  • Session duration at a specified percentile. Defaults to p95 when percentile is not provided.

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `time_window` (required)
- `percentile` (optional) — e.g., `0.50`, `0.90`, `0.95`, `0.99`. Default: `0.95`
- `step` (optional)

session_requests_total

  • Total requests processed within sessions over the time window.

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `time_window` (required)
- `step` (optional)

session_requests_rate

  • Session request rate in requests per second.

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `time_window` (required)
- `step` (optional)

RAI Compliance Metrics

Metrics for tracking Responsible AI (RAI) compliance checks, including check counts, rejection rates by category, and latency.


rai_check_total

  • Total number of RAI compliance checks performed over the time window.

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `time_window` (required)
- `step` (optional)

rai_rejection_total

  • Total number of queries that failed RAI compliance checks over the time window. Supports optional filtering by rejection category.

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `category` (optional) — filter by rejection category: `harassment`, `hate`, `self-harm`, `sexual`, `violence`, `illicit`
- `time_window` (required)
- `step` (optional)
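
Dividing rai_rejection_total by rai_check_total over the same window gives the share of checks that ended in a rejection (equivalently, the share of queries rejected, assuming one check per query). An illustrative calculation with made-up counts, plus a guard for the category names listed above:

```python
# Rejection categories listed for the category parameter on this page.
RAI_CATEGORIES = {"harassment", "hate", "self-harm", "sexual", "violence", "illicit"}


def rejection_share(rejections, checks):
    """Fraction of RAI checks that resulted in a rejection (0 if no checks)."""
    return rejections / checks if checks else 0.0


def check_category(category=None):
    """Hypothetical client-side validation of the optional category filter."""
    if category is not None and category not in RAI_CATEGORIES:
        raise ValueError(f"unknown RAI category: {category!r}")
    return category


print(rejection_share(12, 400))  # 0.03
```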

rai_check_latency

  • RAI compliance check latency at a specified percentile. Defaults to p95 when percentile is not provided.

Parameters:

- `organization_id` (required)
- `project_name` (optional)
- `time_window` (required)
- `percentile` (optional) — Default: `0.95`
- `step` (optional)

Traces


inference_traces

  • Traces for inference service requests.

Parameters:

- `organization_id` (required)
- `project_name` (optional)

distiller_traces

  • Traces for distiller service operations.

Parameters:

- `organization_id` (required)
- `project_name` (optional)

Notes

  • Time windows: Prometheus duration format (5m, 1h, 24h). Default: 1h
  • Percentile: Accepts 0.95 or 95 format. Default: 0.95 (p95)
  • Time-series mode: Pass step (e.g., "15m") to get matrix data for charting
  • Agent class: Filter by implementation type (e.g., ToolUseAgent) across all agents of that type
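
Time windows and steps both use Prometheus duration strings, so the number of points a range query returns is roughly the window divided by the step. The parser below handles only single-unit values such as 5m or 24h, a subset of the full Prometheus duration syntax (which also allows combinations like 1h30m).

```python
import re

# Seconds per unit for single-unit Prometheus durations (subset of the syntax).
UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def duration_seconds(text):
    """Convert a single-unit duration such as "15m" or "24h" to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d|w|y)", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return int(amount) * UNIT_SECONDS[unit]


def approx_points(time_window, step):
    """Approximate number of time buckets in a range query."""
    return int(duration_seconds(time_window) // duration_seconds(step))


print(approx_points("24h", "15m"))  # 96
```

A 24h window with a 15m step therefore yields about 96 buckets; choosing a larger step keeps charts over long windows manageable.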