Observability Metrics & Traces Reference¶
This page describes the available metric and trace presets. These parameterized query templates provide access to common telemetry patterns for monitoring AI Refinery inference services, agent workflows, and user sessions, without writing raw PromQL or TraceQL queries.
Note: The Observability APIs are available by default on `api.airefinery.accenture.com`. This feature is available starting from SDK version 1.25.0.

Time-series queries: Any metric that accepts `time_window` can also accept an optional `step` parameter (e.g., `"step": "15m"`). When provided, the response includes multiple data points at regular intervals instead of a single aggregated value, which is useful for building charts and trend visualizations.
Metrics¶
Inference Metrics¶
Metrics for monitoring LLM inference performance, including request counts, latency distributions, error rates, and model usage patterns.
inference_requests_total
- Total number of inference requests over the specified time window.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `model_key` (optional)
- `time_window` (required)
- `step` (optional)
inference_active_model_count
- Number of distinct models that have received requests within the time window.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `model_key` (optional)
- `time_window` (required)
- `step` (optional)
inference_model_usage
- Per-model inference usage rate over the time window.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `model_key` (optional)
- `time_window` (required)
- `step` (optional)
inference_latency
- Inference latency at a specified percentile. Defaults to p95 when `percentile` is not provided.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `model_key` (optional)
- `time_window` (required)
- `percentile` (optional) — e.g., `0.50`, `0.90`, `0.95`, `0.99`, or `50`, `90`, `95`, `99`. Default: `0.95`
- `step` (optional)
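Since `percentile` accepts both fractional (`0.95`) and whole-number (`95`) forms, a small client-side normalizer (hypothetical, not part of the SDK) can convert user input to the fractional form before building a query:

```python
def normalize_percentile(p: float = 0.95) -> float:
    """Accept either fractional (0.95) or whole-number (95) percentiles
    and return the fractional form. Values above 1 are treated as
    whole-number percentiles, matching the documented accepted formats."""
    if not 0 < p <= 100:
        raise ValueError(f"percentile out of range: {p}")
    return p / 100 if p > 1 else p
```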
inference_error_rate
- Inference error rate as a ratio of errors to total requests. Returns a value between 0 and 1 (e.g., 0.05 means 5% error rate).
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `model_key` (optional)
- `time_window` (required)
- `step` (optional)
Agent Metrics¶
All agent metrics support filtering by both agent_name and agent_class. The agent_class refers to the implementation type (e.g., ToolUseAgent, SearchAgent, CustomAgent) and is useful for aggregating across agents of the same type regardless of their user-defined names.
agent_task_total
- Total agent tasks broken down by agent name, agent class, and status (success/failure/timeout) over the time window.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)
agent_performance_rate
- Agent task rate by status over the time window. Defaults to success rate when `status` is not provided.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `status` (optional) — defaults to `success`. Can also be `failure` or `timeout`
- `time_window` (required)
- `step` (optional)
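The relationship between `agent_task_total` and `agent_performance_rate` can be illustrated with a small helper that derives per-status rates from status counts. This mirrors the ratio the service reports; the actual computation happens server-side, and this snippet is only a sketch of the arithmetic.

```python
def rates_by_status(counts: dict[str, int]) -> dict[str, float]:
    """Derive per-status rates (0..1) from agent_task_total style counts
    keyed by status (success/failure/timeout)."""
    total = sum(counts.values())
    if total == 0:
        # No tasks in the window: report zero rates rather than divide by zero.
        return {status: 0.0 for status in counts}
    return {status: n / total for status, n in counts.items()}
```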
agent_throughput
- Agent task completion rate in tasks per second.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)
agent_latency
- Agent task latency at a specified percentile, grouped by agent name and agent class. Defaults to p95 when `percentile` is not provided.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `percentile` (optional) — e.g., `0.50`, `0.90`, `0.95`. Default: `0.95`
- `step` (optional)
agent_duration
- Total time spent per agent in seconds over the time window, grouped by agent name and agent class.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)
agent_dependency_calls
- Count of external dependency calls over the time window, broken down by agent name, agent class, API type, and source.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)
agent_tool_calls
- Count of tool calls over the time window, broken down by agent name, agent class, API type, and tool name.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)
agent_messages
- Inter-agent message counts over the time window, broken down by sender, receiver, and their agent classes.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `time_window` (required)
- `step` (optional)
agent_messages_with_tokens
- Inter-agent message token consumption over the time window, broken down by sender/receiver pair, their agent classes, and token type (input/output/total).
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `time_window` (required)
- `step` (optional)
agent_orchestration_overhead
- Orchestration overhead ratio at a specified percentile. Defaults to p95 when `percentile` is not provided.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `percentile` (optional) — Default: `0.95`
- `step` (optional)
Token Consumption Metrics¶
Metrics for tracking LLM token usage across models and agents, including input/output breakdowns for cost analysis and usage optimization.
token_consumption
- Total token consumption grouped by organization, project, and model. Supports optional agent filtering to narrow down consumption to specific agents.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `model_key` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)
token_input_total / token_output_total
- Input and output tokens broken out separately. Both support optional agent filtering to narrow results to specific agents.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `model_key` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)
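A common use of the input/output breakdown is cost analysis. The sketch below combines `token_input_total` and `token_output_total` counts with per-1k-token prices; the prices and function are illustrative assumptions (substitute your model's actual rates), not values provided by the API.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Rough cost estimate from token_input_total / token_output_total
    counts queried over the same time_window. Prices are per 1,000 tokens
    and are caller-supplied assumptions."""
    return ((input_tokens / 1000) * price_in_per_1k
            + (output_tokens / 1000) * price_out_per_1k)

# Example with hypothetical prices of $0.50 / 1k input and $1.50 / 1k output:
cost = estimate_cost(10_000, 2_000, 0.5, 1.5)
```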
token_consumption_by_agent
- Token consumption grouped by agent name and agent class over the time window.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `agent_name` (optional)
- `agent_class` (optional)
- `time_window` (required)
- `step` (optional)
Session Metrics¶
Metrics for monitoring user session activity, including session counts, durations, and request throughput.
sessions_total
- Total number of sessions started over the time window.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `time_window` (required)
- `step` (optional)
sessions_active
- Number of currently active sessions (gauge — returns current value, no time window needed).
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
session_duration
- Session duration at a specified percentile. Defaults to p95 when `percentile` is not provided.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `time_window` (required)
- `percentile` (optional) — e.g., `0.50`, `0.90`, `0.95`, `0.99`. Default: `0.95`
- `step` (optional)
session_requests_total
- Total requests processed within sessions over the time window.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `time_window` (required)
- `step` (optional)
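`sessions_total` and `session_requests_total` can be combined client-side into a derived "requests per session" figure, provided both are queried over the same `time_window`. This derivation is a sketch, not a preset offered by the API.

```python
def avg_requests_per_session(session_requests_total: int,
                             sessions_total: int) -> float:
    """Mean requests per session. Both counts must come from queries
    with identical time_window values for the ratio to be meaningful."""
    if sessions_total == 0:
        # No sessions started in the window.
        return 0.0
    return session_requests_total / sessions_total
```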
session_requests_rate
- Session request rate in requests per second.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `time_window` (required)
- `step` (optional)
RAI Compliance Metrics¶
Metrics for tracking Responsible AI (RAI) compliance checks, including check counts, rejection rates by category, and latency.
rai_check_total
- Total number of RAI (Responsible AI) compliance checks performed over the time window.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `time_window` (required)
- `step` (optional)
rai_rejection_total
- Total number of queries that failed RAI compliance checks over the time window. Supports optional filtering by rejection category.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `category` (optional) — filter by rejection category: `harassment`, `hate`, `self-harm`, `sexual`, `violence`, `illicit`
- `time_window` (required)
- `step` (optional)
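Per-category rejection rates can be derived by pairing one `rai_check_total` query with one category-filtered `rai_rejection_total` query per category of interest, all over the same `time_window`. The helper below sketches that arithmetic; it is not an API preset.

```python
# The six documented rejection categories.
CATEGORIES = ("harassment", "hate", "self-harm", "sexual", "violence", "illicit")

def rejection_rates(checks_total: int,
                    rejections_by_category: dict[str, int]) -> dict[str, float]:
    """Per-category rejection rates (0..1): each category's
    rai_rejection_total count divided by the overall rai_check_total."""
    if checks_total == 0:
        return {c: 0.0 for c in rejections_by_category}
    return {c: n / checks_total for c, n in rejections_by_category.items()}
```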
rai_check_latency
- RAI compliance check latency at a specified percentile. Defaults to p95 when `percentile` is not provided.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
- `time_window` (required)
- `percentile` (optional) — Default: `0.95`
- `step` (optional)
Traces¶
inference_traces
- Traces for inference service requests.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
distiller_traces
- Traces for distiller service operations.
Parameters:
- `organization_id` (auto-resolved from token)
- `project_name` (optional)
Notes¶
- Time windows: Prometheus duration format (`5m`, `1h`, `24h`). Default: `1h`
- Percentile: Accepts `0.95` or `95` format. Default: `0.95` (p95)
- Time-series mode: Pass `step` (e.g., `"15m"`) to get matrix data for charting
- Agent class: Filter by implementation type (e.g., `ToolUseAgent`) across all agents of that type
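A client-side sanity check for the duration format can catch malformed `time_window`/`step` combinations before sending a query. This validation is an assumption for illustration (the API's own enforcement may differ), and it handles only single-unit durations like those shown above, not compound forms such as `1h30m`.

```python
import re

# Seconds per supported single-letter Prometheus duration unit.
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def duration_seconds(d: str) -> int:
    """Parse a simple Prometheus-style duration like '5m', '1h', '24h'."""
    m = re.fullmatch(r"(\d+)([smhd])", d)
    if not m:
        raise ValueError(f"bad duration: {d!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]

def validate_step(time_window: str, step: str) -> None:
    """Reject a step at least as large as the window, since it would
    yield at most one data point."""
    if duration_seconds(step) >= duration_seconds(time_window):
        raise ValueError("step should be smaller than time_window")
```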