# Observability REST Endpoints
This page documents the REST endpoints for querying observability data from AI Refinery. These OpenTelemetry (OTel)-based endpoints enable you to query logs, metrics, and distributed traces from your AI applications through direct API calls.
> **Note:** The Observability APIs are available by default on `api.airefinery.accenture.com`. This feature is available starting from SDK version 1.25.0.

> **Deprecation Notice:** The `USE_AIR_API_V2_BASE_URL` environment variable is deprecated and no longer needed as of SDK version 1.28.0. The observability endpoints are now served from the default API URL.
For SDK-based access, see Observability API.
## Overview
AI Refinery collects three types of telemetry data via OpenTelemetry: logs, metrics, and traces. Each type has a corresponding endpoint:

- `/logs`: Query AI Refinery logs. Logs capture time-stamped records of discrete events for debugging and auditing.
- `/metrics`: Query AI Refinery metrics. Metrics aggregate numerical measurements over time for monitoring performance trends.
- `/traces`: Query AI Refinery traces. Traces track request flows across AI Refinery services, revealing agent workflows and dependencies.
All endpoints support two-scope filtering:

- **Organization-level**: filter by `organization_id` (returns data for all projects)
- **Project-level**: filter by `project_name` (returns data for a specific project)
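For illustration (the exact request-body schema is an assumption based on the parameter tables below, and the project name is a placeholder), a project-level query body names the project explicitly, while an organization-level query simply omits `project_name`, since `organization_id` is resolved from your API key:

```json
{
  "project_name": "project-x",
  "time_window": "1h"
}
```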
## Authentication
All endpoints require authentication. When using the SDK, pass your API key to the client, as shown in the examples below. For direct REST access, include the bearer token in the request header:
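As a minimal sketch using only the Python standard library (the endpoint path and request body here are assumptions based on the `/observability/logs` reference below; substitute your own API key):

```python
import json
import urllib.request

API_KEY = "<api-key>"
url = "https://api.airefinery.accenture.com/observability/logs"

# Build a POST request carrying the bearer token in the Authorization header
body = json.dumps({"time_window": "1h", "limit": 100}).encode("utf-8")
request = urllib.request.Request(
    url,
    data=body,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# response = urllib.request.urlopen(request)  # uncomment to send the request
print(request.get_header("Authorization"))  # Bearer <api-key>
```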
The `organization_id` is automatically resolved from your API key. Tenants can only access observability data within their organization.
## POST /observability/logs

Query AI Refinery logs: time-stamped application log entries, filterable by severity, project, and time range. These logs capture request handling, authentication flows, system interactions, and external dependency behavior, helping you diagnose runtime issues and monitor system health.
### Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `organization_id` | string | No | Organization ID to filter logs. Auto-resolved from the bearer token if omitted |
| `project_name` | string | No | Project name to filter logs |
| `severity` | string | No | Filter logs by severity level: `debug`, `info`, `warning`, `error` |
| `time_window` | string | No | Time range for logs (e.g., `5m`, `1h`, `24h`). Default: `24h` |
| `limit` | integer | No | Maximum number of log entries to return. Default: 500 |
### Example Usage

Get logs from the last hour:

```python
from air import AIRefinery

client = AIRefinery(api_key="<api-key>")

response = client.logs.query(time_window="1h")
print(f"Status: {response.status()}")
for sample in response.samples():
    for ts, msg in sample.iter_messages_seconds():
        print(f"[{ts}] {msg}")
```
Get 100 logs for a specific project within the last 30 minutes:

```python
from air import AIRefinery

client = AIRefinery(api_key="<api-key>")

response = client.logs.query(
    project_name="project-x",
    time_window="30m",
    limit=100,
)
for sample in response.samples():
    print(f"Stream: {sample.stream}")
    for ts, msg in sample.iter_messages_seconds():
        print(f"  [{ts}] {msg}")
```
## POST /observability/metrics

Query application metrics. This endpoint exposes metrics covering inference performance, agent operations, token consumption, RAI compliance, and session analytics. For the complete list of available metrics and their descriptions, see the configuration of observability data retrieval.
### Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `metric` | string | Yes | Metric name from the configuration of observability data retrieval (e.g., `token_consumption`, `agent_task_total`) |
| `organization_id` | string | No | Organization ID to filter metrics. Auto-resolved from the bearer token if omitted |
| `project_name` | string | No | Project name to filter metrics |
| `agent_name` | string | No | Agent name to filter metrics (for agent metrics) |
| `agent_class` | string | No | Agent class to filter metrics (e.g., `ToolUseAgent`, `SearchAgent`). Useful for aggregating across all agents of a given type |
| `model_key` | string | No | Model identifier for inference metrics |
| `session_id` | string | No | Session ID to filter metrics to a specific user session |
| `status` | string | No | Status filter for agent metrics (e.g., `success`, `failure`, `timeout`). Default: `success` |
| `category` | string | No | RAI rejection category filter: `harassment`, `hate`, `self-harm`, `sexual`, `violence`, `illicit` |
| `percentile` | string | No | Percentile for latency/distribution metrics (e.g., `0.50`, `0.95`, or `50`, `95`). Default: `0.95` |
| `time_window` | string | No | Time range for rate/increase queries (e.g., `5m`, `1h`, `24h`). Default: `1h` |
| `step` | string | No | Bucket interval for time-series output (e.g., `15m`, `1h`, `1d`). When provided, returns matrix data over time. Default: `1h` |
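For illustration (the exact request-body schema is an assumption based on the parameter table above), a single metrics query can combine several of these filters, for example failed `ToolUseAgent` tasks in a project, bucketed hourly over the last day:

```json
{
  "metric": "agent_task_total",
  "project_name": "project-x",
  "agent_class": "ToolUseAgent",
  "status": "failure",
  "time_window": "24h",
  "step": "1h"
}
```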
### Example Usage

Token consumption metrics:

```python
from air import AIRefinery

client = AIRefinery(api_key="<api-key>")

response = client.metrics.query(
    metric="token_consumption",
    time_window="1h",
)
print(f"Query: {response.query()}")
for sample in response.samples():
    print(f"{sample.metric}: {sample.values}")
```
Agent task metrics (project-level):

```python
response = client.metrics.query(
    metric="agent_task_total",
    project_name="project-x",
    time_window="1h",
)
for labels, values in response.iter_pairs():
    print(f"{labels} -> {values}")
```
Agent metrics filtered by agent class:

```python
response = client.metrics.query(
    metric="agent_task_total",
    agent_class="ToolUseAgent",
    time_window="1h",
)
```
Inference latency at p95 (default) and p50:

```python
# p95 latency (default)
response = client.metrics.query(
    metric="inference_latency",
    time_window="1h",
)

# p50 latency
response = client.metrics.query(
    metric="inference_latency",
    time_window="1h",
    percentile="0.50",
)
```
Token consumption filtered by agent:

```python
response = client.metrics.query(
    metric="token_consumption",
    agent_name="orchestrator",
    time_window="1h",
)
```
RAI rejection total filtered by category:

```python
response = client.metrics.query(
    metric="rai_rejection_total",
    category="harassment",
    time_window="1h",
)
```
Time-series token consumption (for charting):

```python
# Returns matrix data with multiple data points over time
response = client.metrics.query(
    metric="token_consumption",
    time_window="24h",
    step="1h",
)
for sample in response.samples():
    print(f"Labels: {sample.metric}")
    for timestamp, value in sample.values:
        print(f"  {timestamp}: {value}")
```
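Once you have matrix data, a common next step is aggregating the buckets. A minimal sketch, assuming each sample's `values` is a list of `(timestamp, value)` pairs as iterated above, and that values may arrive as strings or numbers (an assumption; check your response payload):

```python
def total_from_matrix(values):
    """Sum the value column of a (timestamp, value) time series.

    Assumes each pair is (unix_timestamp, numeric_or_string_value),
    matching the iteration pattern shown above.
    """
    return sum(float(v) for _, v in values)

# Hypothetical hourly token counts over three buckets
series = [(1700000000, "120"), (1700003600, "95"), (1700007200, 85)]
print(total_from_matrix(series))  # 300.0
```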
## POST /observability/traces

Query distributed traces using trace definitions from the configuration of observability data retrieval. This endpoint provides access to request traces across AI Refinery services, enabling you to inspect agent workflows, identify performance bottlenecks, and debug cross-service interactions.
### Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `trace` | string | Yes | Trace name from the configuration of observability data retrieval (e.g., `inference_traces`, `distiller_traces`) |
| `organization_id` | string | No | Organization ID to filter traces. Auto-resolved from the bearer token if omitted |
| `project_name` | string | No | Project name to filter traces |
| `trace_id` | string | No | Specific trace ID to retrieve |
| `time_window` | string | No | Time range for the query (e.g., `5m`, `1h`, `24h`) |
| `detail` | boolean | No | Whether to include detailed trace information. Default: `true` |
| `limit` | integer | No | Maximum number of traces to return. Default: 100 |
### Example Usage

Inference traces:

```python
from air import AIRefinery

client = AIRefinery(api_key="<api-key>")

response = client.traces.query(
    trace="inference_traces",
    time_window="1h",
)
print(f"Found {len(response.traces())} traces")
for span in response.iter_spans():
    print(f"Span: {span.get('name')} - {span.get('status')}")
```
Project-level distiller traces:

```python
response = client.traces.query(
    trace="distiller_traces",
    project_name="project-x",
    time_window="30m",
)
for batch in response.batches():
    print(f"Resource: {batch.resource}")
    for span in batch.iter_spans():
        print(f"  {span.get('name')}")
```
To retrieve a specific trace, pass its `trace_id` to the query. To search without detailed trace data:

```python
response = client.traces.query(
    trace="inference_traces",
    detail=False,
    limit=50,
)
print(f"Total spans: {len(response.spans())}")
```
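When inspecting trace payloads for bottlenecks, span durations are usually the first thing to compute. A minimal sketch, assuming OTLP-style `startTimeUnixNano` / `endTimeUnixNano` fields on each span dict (an assumption; adjust the keys to match your payload):

```python
def span_duration_ms(span):
    """Compute a span's duration in milliseconds.

    Assumes OTLP-style nanosecond timestamp fields, which may be
    encoded as strings; adjust the keys if your payload differs.
    """
    start = int(span["startTimeUnixNano"])
    end = int(span["endTimeUnixNano"])
    return (end - start) / 1_000_000

# Hypothetical span for illustration
span = {
    "name": "inference",
    "startTimeUnixNano": "1700000000000000000",
    "endTimeUnixNano": "1700000000250000000",
}
print(f"{span['name']}: {span_duration_ms(span)} ms")  # inference: 250.0 ms
```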
## Notes

- The `organization_id` is automatically resolved from your API key. You do not need to include it; the server enforces tenant isolation automatically.
- Time windows support the units `m` (minutes), `h` (hours), and `d` (days). Default: `1h`.
- The `percentile` parameter accepts values in 0–1 format (e.g., `0.95`) or 1–100 format (e.g., `95`). Default: `0.95` (p95).
- Any metric preset can return time-series data by passing the `step` parameter (e.g., `step="15m"`). This returns matrix data with multiple data points over time, where `time_window` controls the lookback period and `step` controls the bucket interval.
- The `agent_class` filter is available on all agent-related metrics, letting you aggregate by agent type (e.g., `ToolUseAgent`, `SearchAgent`) instead of individual agent names.
- The `status` filter defaults to `success` for `agent_performance_rate`. Pass `failure` or `timeout` to query other status rates.
- The `severity` filter on `/logs` accepts the values `debug`, `info`, `warning`, and `error`.
- The `session_id` filter is available on most metrics for narrowing results to a specific user session.
- The `category` filter on `rai_rejection_total` narrows rejections to a specific category (e.g., `harassment`, `hate`, `violence`).
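Since `percentile` accepts both 0–1 and 1–100 formats, client code that builds queries programmatically may want to normalize user input first. An illustrative helper (not part of the SDK; it treats any value greater than 1 as being on the 1–100 scale):

```python
def normalize_percentile(p):
    """Normalize a percentile given as '0.95'/'95' (string or number) to the 0-1 scale.

    Mirrors the accepted formats described above; values greater than 1
    are assumed to be on the 1-100 scale.
    """
    value = float(p)
    if not 0 < value <= 100:
        raise ValueError(f"percentile out of range: {p}")
    return value / 100 if value > 1 else value

print(normalize_percentile("95"))    # 0.95
print(normalize_percentile("0.50"))  # 0.5
```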