Observability REST Endpoints

This page documents the REST endpoints for querying observability data from AI Refinery. These OpenTelemetry (OTel)-based endpoints enable you to query logs, metrics, and distributed traces from your AI applications through direct API calls.

Note: The Observability APIs are available by default on api.airefinery.accenture.com. This feature is available starting from SDK version 1.25.0.

Deprecation Notice: The USE_AIR_API_V2_BASE_URL environment variable is deprecated and no longer needed as of SDK version 1.28.0. The observability endpoints are now served from the default API URL.

For SDK-based access, see Observability API.

Overview

We provide access to three types of telemetry data collected via OpenTelemetry: Logs, Metrics, and Traces. Each type has a corresponding query endpoint:

  • /logs - Query AIRefinery logs

    • Logs capture time-stamped records of discrete events for debugging and auditing.
  • /metrics - Query AIRefinery metrics

    • Metrics aggregate numerical measurements over time for monitoring performance trends.
  • /traces - Query AIRefinery traces

    • Traces track request flows across AIRefinery services for identifying agent workflows and dependencies.

All endpoints support two-scope filtering:

  • Organization-level: Filter by organization_id (returns data for all projects)

  • Project-level: Filter by project_name (returns data for specific project)

Authentication

All endpoints require authentication. When using the SDK, pass your API key to the client:

from air import AIRefinery

client = AIRefinery(api_key="<api-key>")

For direct REST access, include the following headers:

-H "Authorization: Bearer <api-key>"
-H "sdk_version: 1.28.0"
-H "Content-Type: application/json"

Required headers: The Authorization header is required for authentication. The sdk_version header is also required — requests without it will be rejected. Use SDK version 1.13.0 or higher for API-key authentication.

The organization_id is automatically resolved from your API key. Tenants can only access observability data within their organization.
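For illustration, a direct REST request can be assembled as in the following Python sketch. The helper name `build_logs_request` and the example request body are assumptions for illustration, not part of the API:

```python
import json


def build_logs_request(api_key: str, sdk_version: str = "1.28.0"):
    """Assemble URL, headers, and JSON body for a direct call to
    POST /observability/logs (hypothetical helper for illustration)."""
    url = "https://api.airefinery.accenture.com/observability/logs"
    headers = {
        "Authorization": f"Bearer {api_key}",  # required for authentication
        "sdk_version": sdk_version,            # required; requests without it are rejected
        "Content-Type": "application/json",
    }
    body = json.dumps({"time_window": "1h", "limit": 100})
    return url, headers, body
```

The result can then be sent with any HTTP client, e.g. `requests.post(url, headers=headers, data=body)`.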


POST /observability/logs

Query AIRefinery logs. Users can view application logs with timestamps, filterable by labels and time range. These logs capture request handling, authentication flows, system interactions, and external dependency behavior, helping diagnose runtime issues and system health.

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| organization_id | string | No | Organization ID to filter logs. Auto-resolved from bearer token if omitted |
| project_name | string | No | Project name to filter logs |
| severity | string | No | Filter logs by severity level: debug, info, warning, error |
| time_window | string | No | Time range for logs (e.g., '5m', '1h', '24h'). Default: '24h' |
| limit | integer | No | Maximum number of log entries to return. Default: 500 |

Example Usage

Get logs within 1 hour:

from air import AIRefinery

client = AIRefinery(api_key="<api-key>")
response = client.logs.query(time_window="1h")

print(f"Status: {response.status()}")
for sample in response.samples():
    for ts, msg in sample.iter_messages_seconds():
        print(f"[{ts}] {msg}")

Get 100 logs for a specific project within 30 minutes:

from air import AIRefinery

client = AIRefinery(api_key="<api-key>")
response = client.logs.query(
    project_name="project-x",
    time_window="30m",
    limit=100
)

for sample in response.samples():
    print(f"Stream: {sample.stream}")
    for ts, msg in sample.iter_messages_seconds():
        print(f"  [{ts}] {msg}")


POST /observability/metrics

Query application metrics. This endpoint provides access to a series of metrics covering inference performance, agent operations, token consumption, RAI compliance, and session analytics. For a complete list of available metrics and their descriptions, see the configuration of observability data retrieval.

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| metric | string | Yes | Metric name from the configuration of observability data retrieval (e.g., 'token_consumption', 'agent_task_total') |
| organization_id | string | No | Organization ID to filter metrics. Auto-resolved from bearer token if omitted |
| project_name | string | No | Project name to filter metrics |
| agent_name | string | No | Agent name to filter metrics (for agent metrics) |
| agent_class | string | No | Agent class to filter metrics (e.g., 'ToolUseAgent', 'SearchAgent'). Useful for aggregating across all agents of a given type |
| model_key | string | No | Model identifier for inference metrics |
| session_id | string | No | Session ID to filter metrics to a specific user session. Note: a single user conversation produces multiple internal session IDs (one per agent dispatch) |
| status | string | No | Status filter for agent metrics (e.g., 'success', 'failure', 'timeout'). Default: 'success' |
| category | string | No | RAI rejection category filter: harassment, hate, self-harm, sexual, violence, illicit |
| percentile | string | No | Percentile for latency/distribution metrics (e.g., '0.50', '0.95', or '50', '95'). Default: '0.95' |
| time_window | string | No | Time range for rate/increase queries (e.g., '5m', '1h', '24h'). Default: '1h' |
| step | string | No | Bucket interval for time-series output (e.g., '15m', '1h', '1d'). When provided, returns matrix data over time; omit to get a single aggregated value (instant query). When included, the response contains ceil(time_window / step) + 1 data points; for example, time_window=2d with step=2d returns 2 data points, not 1 |
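As a quick check of the point-count rule for the step parameter, a small sketch (assuming both durations have already been converted to seconds):

```python
import math


def expected_points(time_window_s: int, step_s: int) -> int:
    # Matrix (time-series) queries return ceil(time_window / step) + 1
    # data points, as described above.
    return math.ceil(time_window_s / step_s) + 1


# time_window=2d with step=2d yields 2 data points, not 1
assert expected_points(2 * 86400, 2 * 86400) == 2
```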

Response Body

All metric responses follow this structure:

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": { "model_key": "openai/gpt-oss-120b" },
        "value": [1774037224.863, "3896.49"]
      }
    ],
    "analysis": {}
  },
  "query": "<PromQL query that was executed>"
}
  • resultType: "vector" for instant queries (no step), "matrix" for time-series queries (with step).
  • result: Array of series. Each series contains a metric object (labels) and either:
    • "value": [timestamp, "value_string"] — for instant queries (single data point)
    • "values": [[timestamp, "value_string"], ...] — for time-series queries (multiple data points)
  • value format: The value array is [unix_timestamp, "value_as_string"]. The timestamp is when Prometheus evaluated the query. Values are always returned as strings.
  • query: The PromQL query that was executed. Useful for debugging unexpected results.

Values are returned as strings and may be decimals. Token counts and other counter-based metrics return floating-point values (e.g., "3896.49") rather than integers. This is because the underlying recording rules use Prometheus's increase() function, which interpolates counter changes between scrape intervals. The values are accurate to within ~1% of the true count.

Empty results. If no data matches your query filters, the result array will be empty ([]). This can happen when filtering by a project_name that has had no activity in the specified time_window, or when querying metrics that have not been emitted (e.g., inference_error_rate when no errors have occurred).
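A defensive reader for instant-query responses can distinguish "no matching data" from a genuine zero; the helper `total_value` below is an illustrative sketch, not part of the SDK:

```python
def total_value(response_json: dict):
    """Sum the values across all series of an instant-query response.

    Returns None when the result array is empty, so callers can tell
    'metric does not exist / no matching data' apart from a value of 0.
    """
    result = response_json.get("data", {}).get("result", [])
    if not result:
        return None
    # Each instant-query series carries value = [timestamp, "value_string"];
    # values are always strings and may be decimals.
    return sum(float(series["value"][1]) for series in result)
```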

Example Usage

Token consumption metrics:

from air import AIRefinery

client = AIRefinery(api_key="<api-key>")
response = client.metrics.query(
    metric="token_consumption",
    time_window="1h"
)

print(f"Query: {response.query()}")
for sample in response.samples():
    print(f"{sample.metric}: {sample.values}")

Agent task metrics (project-level):

response = client.metrics.query(
    metric="agent_task_total",
    project_name="project-x",
    time_window="1h"
)

for labels, values in response.iter_pairs():
    print(f"{labels} -> {values}")

Agent metrics filtered by agent class:

response = client.metrics.query(
    metric="agent_task_total",
    agent_class="ToolUseAgent",
    time_window="1h"
)

Inference latency at p95 (default) and p50:

# p95 latency (default)
response = client.metrics.query(
    metric="inference_latency",
    time_window="1h"
)

# p50 latency
response = client.metrics.query(
    metric="inference_latency",
    time_window="1h",
    percentile="0.50"
)

Token consumption filtered by agent:

response = client.metrics.query(
    metric="token_consumption",
    agent_name="orchestrator",
    time_window="1h"
)

RAI rejection total filtered by category:

response = client.metrics.query(
    metric="rai_rejection_total",
    category="harassment",
    time_window="1h"
)

Time-series token consumption (for charting):

response = client.metrics.query(
    metric="token_consumption",
    time_window="24h",
    step="1h"
)

# Returns matrix data with multiple data points over time
for sample in response.samples():
    print(f"Labels: {sample.metric}")
    for timestamp, value in sample.values:
        print(f"  {timestamp}: {value}")


POST /observability/traces

Query distributed traces using trace definitions from the configuration of observability data retrieval. This endpoint provides access to request traces across AIRefinery services, enabling you to inspect agent workflows, identify performance bottlenecks, and debug cross-service interactions.

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| trace | string | Yes | Trace name from the configuration of observability data retrieval (e.g., 'inference_traces', 'distiller_traces') |
| organization_id | string | No | Organization ID to filter traces. Auto-resolved from bearer token if omitted |
| project_name | string | No | Project name to filter traces |
| trace_id | string | No | Specific trace ID to retrieve |
| time_window | string | No | Time range for query (e.g., '5m', '1h', '24h') |
| detail | boolean | No | Whether to include detailed trace information. Default: true |
| limit | integer | No | Maximum number of traces to return. Default: 100 |

Example Usage

Inference traces:

from air import AIRefinery

client = AIRefinery(api_key="<api-key>")
response = client.traces.query(
    trace="inference_traces",
    time_window="1h"
)

print(f"Found {len(response.traces())} traces")
for span in response.iter_spans():
    print(f"Span: {span.get('name')} - {span.get('status')}")

Project-level distiller traces:

response = client.traces.query(
    trace="distiller_traces",
    project_name="project-x",
    time_window="30m"
)

for batch in response.batches():
    print(f"Resource: {batch.resource}")
    for span in batch.iter_spans():
        print(f"  {span.get('name')}")

Get specific trace by ID:

response = client.traces.query(
    trace="inference_traces",
    trace_id="abc123def456"
)

Search without detailed trace data:

response = client.traces.query(
    trace="inference_traces",
    detail=False,
    limit=50
)

print(f"Total spans: {len(response.spans())}")


Notes

Authentication

  • The organization_id is automatically resolved from your API key. You do not need to include it — the server enforces tenant isolation automatically.
  • When making direct REST calls (not through the SDK), you must include both Authorization: Bearer <api-key> and sdk_version: <version> headers. Omitting sdk_version will result in a 500 error.

Parameters

  • Time windows support units: 'm' (minutes), 'h' (hours), 'd' (days). Default: '1h'.
  • The percentile parameter accepts values in 0–1 format (e.g., 0.95) or 1–100 format (e.g., 95). Default: 0.95 (p95).
  • Any metric preset can return time-series data by passing the step parameter (e.g., step="15m"). This returns matrix data with multiple data points over time, where time_window controls the lookback period and step controls the bucket interval. Omit step to get a single aggregated value.
  • The agent_class filter is available on all agent-related metrics, letting you aggregate by agent type (e.g., ToolUseAgent, SearchAgent) instead of individual agent names.
  • The status filter defaults to success for agent_performance_rate. Pass failure or timeout to query other status rates.
  • The severity filter on /logs accepts values: debug, info, warning, error.
  • The session_id filter is available on most metrics for narrowing results to a specific user session.
  • The category filter on rai_rejection_total lets you narrow down rejections to a specific category (e.g., harassment, hate, violence).
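The time-window strings described above can be parsed with a short sketch; `time_window_seconds` is a hypothetical helper mirroring the 'm'/'h'/'d' units listed:

```python
import re


def time_window_seconds(window: str) -> int:
    # Parse a time_window such as '5m', '1h', or '2d' into seconds.
    m = re.fullmatch(r"(\d+)([mhd])", window)
    if m is None:
        raise ValueError(f"invalid time_window: {window!r}")
    value, unit = int(m.group(1)), m.group(2)
    return value * {"m": 60, "h": 3600, "d": 86400}[unit]
```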

Understanding Token Consumption

  • Token consumption metrics (token_consumption, token_input_total, token_output_total, token_consumption_by_agent) track text tokens only. They reflect the total LLM token usage across all internal inference calls, not just the visible user message and response. A single user query to the orchestrator can produce multiple internal LLM calls (routing decisions, agent prompts, response generation), each with system prompts and context — so token counts will be higher than the visible conversation text.
  • Token values are returned as floating-point numbers (e.g., "1938.47" instead of "1938"). This is because the recording rules use Prometheus's increase() function which interpolates between scrape intervals. The values are accurate to within ~1% of the true integer count.
  • Image generation does not consume text tokens. The ImageGenerationAgent calls a diffuser API (e.g., FLUX) which generates images without producing token metrics. It will not appear in token_consumption_by_agent. However, if the image generation agent has a prompt rewriter enabled (default), the rewriter's LLM calls do consume tokens and appear under the image_generation/rewriter agent name.
  • ImageUnderstandingAgent does consume tokens because it makes LLM calls using a Vision Language Model (VLM). The VLM model used (e.g., Qwen/Qwen3-VL-32B-Instruct) is a platform-level default and may differ from the llm_config model set in your YAML.
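Since increase()-derived counters are floating-point values accurate to about 1%, a display layer can simply round them to integers; a minimal sketch:

```python
def approx_token_count(raw_value: str) -> int:
    # Counter values arrive as strings like "1938.47"; round to the
    # nearest integer for display (accurate to ~1% of the true count).
    return round(float(raw_value))
```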

Understanding Agent Classes

  • DirectInference: A synthetic label applied to requests that go through the chat completions API directly (client.chat.completions.create()) without the orchestrator. Not a real agent — used to track direct API calls in the same token metrics as agent-routed calls.
  • FallbackAgent: The agent class used when the orchestrator cannot route a query to a specific agent. Appears in results as agent_class: "FallbackAgent" with agent_name: "Orchestrator".
  • Orchestrator: The orchestrator itself, which routes queries to the appropriate agent. It consumes tokens for routing decisions and context processing.
  • Other agent classes (e.g., SearchAgent, AuthorAgent, ToolUseAgent, FlowSuperAgent, ImageUnderstandingAgent, ImageGenerationAgent, PlanningAgent, HumanAgent) correspond to built-in or custom agent types configured in your project YAML.

Understanding Response Values

  • Value of 0 vs empty result: A value of "0" means the metric exists but had no activity in the time window (e.g., an agent was instantiated but did not consume tokens). An empty result ("result": []) means the metric does not exist at all (e.g., querying inference_error_rate when no errors have ever occurred, or filtering by a project_name with no matching data).
  • Multiple session IDs per conversation: A single user chat session with the orchestrator produces multiple internal telemetry sessions — one for the orchestrator and one for each agent it dispatches to. Seeing 5–10 session IDs for a single conversation is expected behavior.
  • agent_messages vs agent_messages_with_tokens: These are two separate metrics. agent_messages returns message counts between agent pairs. agent_messages_with_tokens returns token counts between agent pairs, broken down by token_type (input/output/total). To build a complete inter-agent communication table, query both endpoints.
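To assemble the complete inter-agent communication table described above, the two result sets can be merged by agent pair. A minimal sketch, assuming each metric's results have already been flattened into a dict keyed by (source, target); the helper and its input shape are illustrative, not SDK types:

```python
def merge_agent_tables(message_counts: dict, token_counts: dict) -> dict:
    # Combine agent_messages (message counts per agent pair) and
    # agent_messages_with_tokens (token counts per agent pair) into
    # one table keyed by (source_agent, target_agent).
    table = {}
    for pair in sorted(set(message_counts) | set(token_counts)):
        table[pair] = {
            "messages": message_counts.get(pair, 0),
            "tokens": token_counts.get(pair, 0.0),
        }
    return table
```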