Observability Metrics & Traces Reference¶
This page describes the available metric and trace presets. These parameterized query templates provide access to common telemetry patterns for monitoring AI Refinery inference services, agent workflows, and user sessions, without writing raw PromQL or TraceQL queries.
Note: The Observability APIs are available by default on `api.airefinery.accenture.com`. This feature is available starting from SDK version 1.25.0.

Time-series queries: Any metric that accepts `time_window` can also accept an optional `step` parameter (e.g., `"step": "15m"`). When provided, the response includes multiple data points at regular intervals instead of a single aggregated value, which is useful for building charts and trend visualizations.
Metrics¶
Inference Metrics¶
Metrics for monitoring LLM inference performance, including request counts, latency distributions, error rates, and model usage patterns.
inference_requests_total
- Total number of inference requests over the specified time window.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `model_key` (optional)
- `time_window` (required)
- `step` (optional)
inference_active_model_count
- Number of distinct models that have received requests within the time window.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `model_key` (optional)
- `time_window` (required)
- `step` (optional)
inference_model_usage
- Per-model inference usage rate over the time window.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `model_key` (optional)
- `time_window` (required)
- `step` (optional)
inference_latency
- Inference latency at a specified percentile. Defaults to p95 when `percentile` is not provided.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `model_key` (optional)
- `time_window` (required)
- `percentile` (optional) — e.g., `0.50`, `0.90`, `0.95`, `0.99`, or `50`, `90`, `95`, `99`. Default: `0.95`
- `step` (optional)
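Since `percentile` accepts both fractional (`0.95`) and whole-number (`95`) forms, a small client-side normalizer (hypothetical, not part of the SDK) can convert user input to the fractional form before building a query:

```python
def normalize_percentile(p: float = 0.95) -> float:
    """Accept either fractional (0.95) or whole-number (95) percentiles
    and return the fractional form. Values above 1 are treated as
    whole-number percentiles, matching the documented accepted formats."""
    if not 0 < p <= 100:
        raise ValueError(f"percentile out of range: {p}")
    return p / 100 if p > 1 else p
```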
inference_error_rate
- Inference error rate as a ratio of errors to total requests. Returns a value between 0 and 1 (e.g., 0.05 means 5% error rate).
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `model_key` (optional)
- `time_window` (required)
- `step` (optional)
Agent Metrics¶
All agent metrics support filtering by both agent_name and agent_class. The agent_class refers to the implementation type (e.g., ToolUseAgent, SearchAgent, CustomAgent) and is useful for aggregating across agents of the same type regardless of their user-defined names.
agent_task_total
- Total agent tasks broken down by agent name, agent class, and status (success/failure/timeout) over the time window.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)
agent_performance_rate
- Agent task rate by status over the time window. Defaults to success rate when `status` is not provided.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `status` (optional) — defaults to `success`. Can also be `failure` or `timeout`
- `time_window` (required)
- `step` (optional)
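The relationship between `agent_task_total` and `agent_performance_rate` can be illustrated with a small helper that derives per-status rates from status counts. This mirrors the ratio the service reports; the actual computation happens server-side, and this snippet is only a sketch of the arithmetic.

```python
def rates_by_status(counts: dict[str, int]) -> dict[str, float]:
    """Derive per-status rates (0..1) from agent_task_total style counts
    keyed by status (success/failure/timeout)."""
    total = sum(counts.values())
    if total == 0:
        # No tasks in the window: report zero rates rather than divide by zero.
        return {status: 0.0 for status in counts}
    return {status: n / total for status, n in counts.items()}
```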
agent_throughput
- Agent task completion rate in tasks per second.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)
agent_latency
- Agent task latency at a specified percentile, grouped by agent name and agent class. Defaults to p95 when `percentile` is not provided.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `percentile` (optional) — e.g., `0.50`, `0.90`, `0.95`. Default: `0.95`
- `step` (optional)
agent_duration
- Total time spent per agent in seconds over the time window, grouped by agent name and agent class.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)
agent_dependency_calls
- Count of external dependency calls over the time window, broken down by agent name, agent class, API type, and source.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)
agent_tool_calls
- Count of tool calls over the time window, broken down by agent name, agent class, API type, and tool name.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)
agent_messages
- Inter-agent message counts over the time window, broken down by sender, receiver, and their agent classes.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `time_window` (required)
- `step` (optional)
agent_messages_with_tokens
- Inter-agent message token consumption over the time window, broken down by sender/receiver pair, their agent classes, and token type (input/output/total).
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `time_window` (required)
- `step` (optional)
agent_orchestration_overhead
- Orchestration overhead ratio at a specified percentile. Defaults to p95 when `percentile` is not provided.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `percentile` (optional) — Default: `0.95`
- `step` (optional)
Token Consumption Metrics¶
Metrics for tracking LLM token usage across models and agents, including input/output breakdowns for cost analysis and usage optimization.
token_consumption
- Total token consumption grouped by organization, project, and model. Supports optional agent filtering to narrow down consumption to specific agents.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `model_key` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)
token_input_total / token_output_total
- Input and output tokens broken out separately. Both support optional agent filtering to narrow results to specific agents.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `model_key` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)
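A common use of the input/output breakdown is cost analysis. The sketch below combines `token_input_total` and `token_output_total` counts with per-1k-token prices; the prices and function are illustrative assumptions (substitute your model's actual rates), not values provided by the API.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Rough cost estimate from token_input_total / token_output_total
    counts queried over the same time_window. Prices are per 1,000 tokens
    and are caller-supplied assumptions."""
    return ((input_tokens / 1000) * price_in_per_1k
            + (output_tokens / 1000) * price_out_per_1k)

# Example with hypothetical prices of $0.50 / 1k input and $1.50 / 1k output:
cost = estimate_cost(10_000, 2_000, 0.5, 1.5)
```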
token_consumption_by_agent
- Token consumption grouped by agent name and agent class over the time window.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)
Session Metrics¶
Metrics for monitoring user session activity, including session counts, durations, and request throughput.
sessions_total
- Total number of sessions started over the time window.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `time_window` (required)
- `step` (optional)
sessions_active
- Number of currently active sessions (gauge — returns current value, no time window needed).
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
session_duration
- Session duration at a specified percentile. Defaults to p95 when `percentile` is not provided.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `time_window` (required)
- `percentile` (optional) — e.g., `0.50`, `0.90`, `0.95`, `0.99`. Default: `0.95`
- `step` (optional)
session_requests_total
- Total requests processed within sessions over the time window.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `time_window` (required)
- `step` (optional)
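`sessions_total` and `session_requests_total` can be combined client-side into a derived "requests per session" figure, provided both are queried over the same `time_window`. This derivation is a sketch, not a preset offered by the API.

```python
def avg_requests_per_session(session_requests_total: int,
                             sessions_total: int) -> float:
    """Mean requests per session. Both counts must come from queries
    with identical time_window values for the ratio to be meaningful."""
    if sessions_total == 0:
        # No sessions started in the window.
        return 0.0
    return session_requests_total / sessions_total
```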
session_requests_rate
- Session request rate in requests per second.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `time_window` (required)
- `step` (optional)
RAI Compliance Metrics¶
Metrics for tracking Responsible AI (RAI) compliance checks, including check counts, rejection rates by category, and latency.
rai_check_total
- Total number of RAI (Responsible AI) compliance checks performed over the time window.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `time_window` (required)
- `step` (optional)
rai_rejection_total
- Total number of queries that failed RAI compliance checks over the time window. Supports optional filtering by rejection category.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `category` (optional) — filter by rejection category: `harassment`, `hate`, `self-harm`, `sexual`, `violence`, `illicit`
- `time_window` (required)
- `step` (optional)
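Per-category rejection rates can be derived by pairing one `rai_check_total` query with one category-filtered `rai_rejection_total` query per category of interest, all over the same `time_window`. The helper below sketches that arithmetic; it is not an API preset.

```python
# The six documented rejection categories.
CATEGORIES = ("harassment", "hate", "self-harm", "sexual", "violence", "illicit")

def rejection_rates(checks_total: int,
                    rejections_by_category: dict[str, int]) -> dict[str, float]:
    """Per-category rejection rates (0..1): each category's
    rai_rejection_total count divided by the overall rai_check_total."""
    if checks_total == 0:
        return {c: 0.0 for c in rejections_by_category}
    return {c: n / checks_total for c, n in rejections_by_category.items()}
```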
rai_check_latency
- RAI compliance check latency at a specified percentile. Defaults to p95 when `percentile` is not provided.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `time_window` (required)
- `percentile` (optional) — Default: `0.95`
- `step` (optional)
Traces¶
inference_traces
- Traces for inference service requests.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
distiller_traces
- Traces for distiller service operations.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
Notes¶
- Time windows: Prometheus duration format (`5m`, `1h`, `24h`). Default: `1h`
- Percentile: Accepts `0.95` or `95` format. Default: `0.95` (p95)
- Time-series mode: Pass `step` (e.g., `"15m"`) to get matrix data for charting
- Agent class: Filter by implementation type (e.g., `ToolUseAgent`) across all agents of that type
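A client-side sanity check for the duration format can catch malformed `time_window`/`step` combinations before sending a query. This validation is an assumption for illustration (the API's own enforcement may differ), and it handles only single-unit durations like those shown above, not compound forms such as `1h30m`.

```python
import re

# Seconds per supported single-letter Prometheus duration unit.
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def duration_seconds(d: str) -> int:
    """Parse a simple Prometheus-style duration like '5m', '1h', '24h'."""
    m = re.fullmatch(r"(\d+)([smhd])", d)
    if not m:
        raise ValueError(f"bad duration: {d!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]

def validate_step(time_window: str, step: str) -> None:
    """Reject a step at least as large as the window, since it would
    yield at most one data point."""
    if duration_seconds(step) >= duration_seconds(time_window):
        raise ValueError("step should be smaller than time_window")
```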