Error Handling

When you call the AI Refinery SDK, every non-2xx response is surfaced as a Python exception. On the server side, our FastAPI backend normalises its own failures into a consistent JSON envelope; this page outlines what SDK developers should expect and how to handle the resulting error payloads when the server reports a failure.

In HTTP, status codes from 200 to 299 mean “success”; any other status code counts as an error and triggers the behaviours described below.

How the SDK surfaces failures

| Client flavour | Exception type | Trigger | How to read the error |
| --- | --- | --- | --- |
| AIRefinery and other synchronous clients | requests.exceptions.HTTPError (subclass of requests.exceptions.RequestException) | HTTP status ≥ 400 returned by the FastAPI backend | Read err.response.json()["error"] for the message and optional detail emitted by FastAPI. |
| AsyncAIRefinery and other async clients | aiohttp.ClientResponseError (subclass of aiohttp.ClientError) | HTTP status ≥ 400 returned by the FastAPI backend | Use err.message; if you capture the body, decode the FastAPI envelope just as in the synchronous case. |
| Streaming chat completions (stream=True) | air.chat.client.SSEStreamError or ChunkValidationError | FastAPI emits an event: error frame or sends malformed SSE data | str(err) includes the upstream FastAPI error payload when provided. |
| Network/runtime issues | requests.exceptions.RequestException, aiohttp.ClientError, asyncio.TimeoutError | DNS failures, TLS problems, timeouts, etc. | str(err) and the stack trace describe the failure context (these errors arise before FastAPI can respond). |

All sub-clients (chat completions, embeddings, images, models, etc.) follow the same pattern: they perform the HTTP request, call raise_for_status() and convert successful responses into Pydantic models. You should therefore wrap calls in try/except blocks that distinguish HTTP errors from application-specific failures in your own code.
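
For example, a minimal sketch of such a wrapper for the synchronous clients is shown below; the helper name and the (result, error) return convention are illustrative and not part of the SDK.

import requests  # exception hierarchy raised by the synchronous clients

def call_sdk(fn, *args, **kwargs):
    """Hypothetical helper: returns (result, error_info) instead of raising."""
    try:
        return fn(*args, **kwargs), None
    except requests.exceptions.HTTPError as err:
        # The backend answered with a non-2xx status, so the body carries the envelope.
        body = err.response.json() if err.response is not None else {}
        return None, body.get("error", {"message": str(err)})
    except requests.exceptions.RequestException as err:
        # Transport failure (DNS, TLS, timeout): FastAPI never produced a response.
        return None, {"code": None, "message": str(err)}

A call such as call_sdk(client.chat.completions.create, model=..., messages=...) then lets the rest of your code branch on the returned error instead of catching exceptions inline.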

The server error envelope

Our FastAPI layer catches exceptions raised by the backend and returns a consistent JSON envelope:

{
  "error": {
    "code": "auth.authentication_failed",
    "message": "Invalid or expired token.",
    "detail": {
      "...": "optional diagnostic fields"
    }
  }
}
  • code – A stable, machine-friendly identifier that you can branch on.
  • message – A human-readable explanation suitable for logs or UI surfaces.
  • detail – Optional structured metadata (such as limits, identifiers, or retry hints).
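
If you prefer working with a typed structure, a small hedged sketch that lifts these fields out of a decoded response body might look like this (the dataclass and helper names are illustrative, not part of the SDK):

from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class EnvelopeError:
    """Illustrative container mirroring the envelope fields above."""
    code: Optional[str]
    message: Optional[str]
    detail: dict[str, Any] = field(default_factory=dict)

def parse_error_envelope(body: dict) -> EnvelopeError:
    error = body.get("error", {})  # the envelope lives under the top-level "error" key
    return EnvelopeError(
        code=error.get("code"),
        message=error.get("message"),
        detail=error.get("detail") or {},
    )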

Any non-AIRefineryError raised by the backend becomes an HTTP 500 with this envelope. The original exception name is preserved in logs so you can follow up with support if needed.

The SDK does not modify this payload. In synchronous flows you can reach it via err.response.json(). In asynchronous flows, aiohttp.ClientResponseError exposes the HTTP status and headers; if you require the response body, wrap the request in a helper that inspects the aiohttp response before calling raise_for_status() (example below). The parsed JSON matches the HTTP error envelope.

Error handling in SDK clients

Synchronous clients

import logging  # log unexpected API errors
import os  # read environment variables
import time  # simple sleep-based backoff
from dotenv import load_dotenv  # load variables from .env
from requests import HTTPError  # surface HTTP errors from requests
from air import AIRefinery  # sync SDK entry point into the FastAPI service

load_dotenv()  # load API_KEY from .env file
logger = logging.getLogger(__name__)  # module-level logger
client = AIRefinery(api_key=os.environ["API_KEY"])  # instantiate the client with credentials

try:
    completion = client.chat.completions.create(  # perform a call against FastAPI
        model="meta-llama/Llama-3.1-70B-Instruct",  # choose the model
        messages=[{"role": "user", "content": "Hello!"}],  # provide conversation context
    )
except HTTPError as err:  # catch HTTP failures
    payload = err.response.json() if err.response is not None else {}  # decode error body
    error = payload.get("error", {})  # extract the envelope
    code = error.get("code")  # pull the machine-readable code

    if code == "inference.model_key.not_found":  # handle specific model issues
        raise ValueError("Choose a model that exists in your workspace") from err
    if code == "inference.llm.rate_limit":  # throttle-aware branch
        retry_after = (error.get("detail") or {}).get("retry_after")  # parse retry hint
        time.sleep(float(retry_after or 5))  # simple backoff before the caller retries
    else:
        logger.error("API error %s: %s", code, error.get("message"))  # log fallback details
        raise  # re-raise unknown errors

Asynchronous clients

import asyncio  # await-friendly backoff
import os  # read environment variables
import aiohttp  # aiohttp exceptions for async failures
from dotenv import load_dotenv  # load variables from .env
from air import AsyncAIRefinery  # async SDK entry point into the FastAPI service

load_dotenv()  # load API_KEY from .env file
client = AsyncAIRefinery(api_key=os.environ["API_KEY"])  # instantiate async client

async def safe_completion(messages):
    try:
        return await client.chat.completions.create(  # await FastAPI request
            model="meta-llama/Llama-3.1-70B-Instruct",  # chosen model
            messages=messages,  # chat history supplied by caller
        )
    except aiohttp.ClientResponseError as err:  # handle HTTP error responses
        if err.status == 401:  # auth failure
            raise RuntimeError("Check the API key or project permissions") from err
        if err.status == 429:  # rate limit branch
            retry_after = err.headers.get("Retry-After") if err.headers else None  # parse retry header
            await asyncio.sleep(float(retry_after or 5))  # back off before giving up on this attempt
            return None  # stop current workflow; the caller decides whether to retry
        raise  # propagate unhandled errors
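
A brief usage sketch for the helper above, run from a synchronous entry point (the conversation content is just an example):

import asyncio  # drive the coroutine from synchronous code

async def main():
    result = await safe_completion([{"role": "user", "content": "Hello!"}])  # helper defined above
    if result is not None:  # None signals a rate-limited call that backed off
        print(result)  # completion object returned by the SDK

asyncio.run(main())  # run the example end to end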

If you need the JSON body in an async workflow, issue the request manually:

import os  # read environment variables
import aiohttp  # manual request handling
from dotenv import load_dotenv  # load variables from .env
from air.utils import get_base_headers_async  # helper re-used by the SDK for FastAPI calls

load_dotenv()  # load API_KEY from .env file

async def call_with_body(client, payload):
    headers = await get_base_headers_async(client.api_key)  # base headers with auth
    async with aiohttp.ClientSession() as session:  # create HTTP session
        async with session.post(
            f"{client.base_url}/v1/chat/completions",  # FastAPI endpoint
            json=payload,  # request body
            headers=headers,  # include auth headers
        ) as resp:
            body = await resp.json()  # decode JSON body
            if resp.status >= 400:  # treat non-2xx as failures
                return None, body  # return error payload
            return body, None  # return success payload
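
A short illustration of consuming that helper from an async context, reusing the AsyncAIRefinery client created earlier; the payload fields mirror the chat completion examples above:

async def demo():
    payload = {
        "model": "meta-llama/Llama-3.1-70B-Instruct",  # any model available in your workspace
        "messages": [{"role": "user", "content": "Hello!"}],  # request body
    }
    body, error = await call_with_body(client, payload)  # helper defined above
    if error is not None:
        envelope = error.get("error", {})  # FastAPI error envelope
        print(envelope.get("code"), envelope.get("message"))  # branch on the stable code as needed
    else:
        print(body)  # successful completion payload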

Streaming

import logging  # log stream failures
import os  # read environment variables
from dotenv import load_dotenv  # load variables from .env
from air import AIRefinery  # sync SDK entry point for FastAPI streaming
from air.chat.client import SSEStreamError  # streaming error class

load_dotenv()  # load API_KEY from .env file
logger = logging.getLogger(__name__)  # module-level logger
client = AIRefinery(api_key=os.environ["API_KEY"])  # instantiate client for streaming

messages = [{"role": "user", "content": "Hello!"}]  # chat context for the stream

try:
    for chunk in client.chat.completions.create(
        model="meta-llama/Llama-3.1-70B-Instruct",  # streaming-compatible model
        messages=messages,  # chat context
        stream=True,  # opt into FastAPI SSE stream
    ):
        print(chunk)  # process each streamed chunk (e.g. append deltas to the reply)
except SSEStreamError as err:
    logger.warning("Stream aborted: %s", err)  # log stream failure

Common error codes

Authentication and request limits

| Code | HTTP status | What it means | Typical next step |
| --- | --- | --- | --- |
| auth.header_missing | 401 Unauthorized | No Authorization header was provided. | Supply the API key (or refresh the token). |
| auth.authentication_failed | 401 Unauthorized | Token is invalid, expired, or tied to another workspace. | Rotate credentials and retry once. |
| server.request_entity_too_large | 413 Payload Too Large | Upload exceeded the configured limit (default 100 MB); detail.limit_mb and detail.content_length are included. | Reduce the payload size or upload in smaller chunks. |
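
As an illustration of the "rotate credentials and retry once" advice, the hedged sketch below re-reads the API key from the environment and retries a single time; how credentials are actually rotated is deployment-specific.

import os  # read the (possibly rotated) API key
from requests import HTTPError  # raised by the synchronous clients
from air import AIRefinery  # sync SDK entry point

def completion_with_refresh(messages):
    client = AIRefinery(api_key=os.environ["API_KEY"])  # initial credentials
    for attempt in range(2):  # at most one retry after refreshing
        try:
            return client.chat.completions.create(
                model="meta-llama/Llama-3.1-70B-Instruct",  # example model
                messages=messages,
            )
        except HTTPError as err:
            envelope = (err.response.json() if err.response is not None else {}).get("error", {})
            if attempt == 0 and envelope.get("code") == "auth.authentication_failed":
                client = AIRefinery(api_key=os.environ["API_KEY"])  # re-read the rotated key (illustrative)
                continue
            raise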

Model catalogue and selection

| Code | HTTP status | What it means | Action |
| --- | --- | --- | --- |
| inference.registry.unsupported_model_type | 400 Bad Request | The referenced model type is not recognised by the platform. | Choose a model/type listed by client.models.list(). |
| inference.registry.missing_model_type | 400 Bad Request | The registry entry lacks a mandatory model_type. | Fix the configuration before retrying. |
| inference.registry.io_error | 500 Internal Server Error | Temporary failure while reading the model catalogue. | Retry with backoff; contact support if persistent. |
| inference.registry.parse_error | 400 Bad Request | Registry metadata is malformed. | Validate the registered model definition. |
| inference.registry.duplicate_key | 409 Conflict | Two models share the same logical key. | Remove or rename duplicate entries. |
| inference.model_key.missing | 400 Bad Request | The request omitted the mandatory model parameter. | Provide the model argument. |
| inference.model_key.not_found | 404 Not Found | Requested model key does not exist. | List models and select an available key. |
| inference.model_key.type_mismatch | 400 Bad Request | Model exists but is incompatible with the endpoint. | Switch to a compatible model family. |
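
When a call fails with inference.model_key.not_found, one reactive option is to surface the catalogue so a valid key can be chosen; this sketch only assumes that client.models.list() exists (it is referenced above) and makes no assumption about the shape of what it returns.

import os  # read environment variables
from requests import HTTPError  # raised by the synchronous clients
from air import AIRefinery  # sync SDK entry point

client = AIRefinery(api_key=os.environ["API_KEY"])

try:
    completion = client.chat.completions.create(
        model="meta-llama/Llama-3.1-70B-Instruct",  # key that may not exist in this workspace
        messages=[{"role": "user", "content": "Hello!"}],
    )
except HTTPError as err:
    envelope = (err.response.json() if err.response is not None else {}).get("error", {})
    if envelope.get("code") == "inference.model_key.not_found":
        print("Available models:", client.models.list())  # pick a key from this catalogue and retry
    raise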

Runtime and vendor interactions

| Code | HTTP status | What it means | Action |
| --- | --- | --- | --- |
| inference.runtime.error | 500 Internal Server Error | Unexpected exception while executing the request. | Retry with exponential backoff; capture the request ID for support. |
| inference.llm.configuration_error | 400 Bad Request | Invalid request payload (missing fields, wrong types, etc.). | Validate your parameters before calling the SDK. |
| inference.llm.client_not_initialized | 500 Internal Server Error | Backend worker was not ready to accept traffic. | Retry; report if it recurs. |
| inference.llm.invalid_request | 400 Bad Request | The vendor rejected malformed input (e.g., empty messages). | Correct the request payload. |
| inference.llm.service_error | 502 Bad Gateway | Vendor returned an unknown error. | Retry or switch models. |
| inference.llm.rate_limit | 429 Too Many Requests | Shared or vendor quota exceeded; detail.retry_after is set when available. | Back off for the indicated interval before retrying. |
| inference.llm.service_unavailable | 503 Service Unavailable | Temporary vendor outage or timeout. | Retry with exponential backoff. |
| inference.llm.streaming_error | 502 Bad Gateway | Streaming connection broke mid-request. | Reconnect; re-send the request if idempotent. |
| inference.llm.serialization_error | 502 Bad Gateway | Unexpected payload returned by the vendor SDK. | Retry; report to support with the request ID. |
| inference.llm.unsupported_return_type | 502 Bad Gateway | Requested return_type is not supported. | Remove or correct the return_type argument. |

  • Log the HTTP status alongside the error.code so you can spot patterns quickly.
  • Use error.detail to decide whether to retry or prompt client-side action (limits, retry hints, and similar signals).
  • Apply exponential backoff for recoverable statuses (429, 500, 502, 503); a minimal sketch follows this list.
  • Surface actionable messages to end users (e.g., “refresh credentials”) and hide internal codes behind your own abstractions.
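
A minimal retry sketch following those guidelines for the synchronous clients; the attempt count and delays are illustrative defaults, not SDK behaviour, and you can additionally honour detail.retry_after when the envelope provides it.

import time  # sleep between attempts
from requests import HTTPError  # raised by the synchronous clients

RETRYABLE = {429, 500, 502, 503}  # statuses worth retrying per the guidance above

def with_backoff(call, max_attempts=4, base_delay=1.0):
    """Retry a zero-argument callable with exponential backoff on retryable statuses."""
    for attempt in range(max_attempts):
        try:
            return call()
        except HTTPError as err:
            status = err.response.status_code if err.response is not None else None
            if status not in RETRYABLE or attempt == max_attempts - 1:
                raise  # not recoverable, or out of attempts
            time.sleep(base_delay * (2 ** attempt))  # 1 s, 2 s, 4 s, ...

For example, completion = with_backoff(lambda: client.chat.completions.create(model="meta-llama/Llama-3.1-70B-Instruct", messages=messages)) retries the chat call transparently.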