Error Handling

When you call the AI Refinery SDK, every non-2xx response is surfaced as a Python exception. On the server side, our FastAPI backend normalises its own failures into a consistent JSON envelope; this page outlines what SDK developers should expect and how to handle the resulting error payloads when the server reports a failure.

In HTTP, status codes from 200 to 299 mean “success”; any other status code counts as an error and triggers the behaviours described below.

How the SDK surfaces failures

| Client flavour | Exception type | Trigger | How to read the error |
| --- | --- | --- | --- |
| AIRefinery and other synchronous clients | requests.exceptions.HTTPError (subclass of requests.exceptions.RequestException) | HTTP status ≥ 400 returned by the FastAPI backend | Read err.response.json()["error"] for the message and optional detail emitted by FastAPI. |
| AsyncAIRefinery and other async clients | aiohttp.ClientResponseError (subclass of aiohttp.ClientError) | HTTP status ≥ 400 returned by the FastAPI backend | Use err.message; if you capture the body, decode the FastAPI envelope just as in the synchronous case. |
| Streaming chat completions (stream=True) | air.chat.client.SSEStreamError or ChunkValidationError | FastAPI emits an event: error frame or sends malformed SSE data | str(err) includes the upstream FastAPI error payload when provided. |
| Network/runtime issues | requests.exceptions.RequestException, aiohttp.ClientError, asyncio.TimeoutError | DNS failures, TLS problems, timeouts, etc. | str(err) and the stack trace describe the failure context (these errors arise before FastAPI can respond). |

All sub-clients (chat completions, embeddings, images, models, etc.) follow the same pattern: they perform the HTTP request, call raise_for_status() and convert successful responses into Pydantic models. You should therefore wrap calls in try/except blocks that distinguish HTTP errors from application-specific failures in your own code.
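
For example, a minimal sketch of such a wrapper for the synchronous clients is shown below; the helper name and the (result, error) return convention are illustrative and not part of the SDK.

import requests  # exception hierarchy raised by the synchronous clients

def call_sdk(fn, *args, **kwargs):
    """Hypothetical helper: returns (result, error_info) instead of raising."""
    try:
        return fn(*args, **kwargs), None
    except requests.exceptions.HTTPError as err:
        # The backend answered with a non-2xx status, so the body carries the envelope.
        body = err.response.json() if err.response is not None else {}
        return None, body.get("error", {"message": str(err)})
    except requests.exceptions.RequestException as err:
        # Transport failure (DNS, TLS, timeout): FastAPI never produced a response.
        return None, {"code": None, "message": str(err)}

A call such as call_sdk(client.chat.completions.create, model=..., messages=...) then lets the rest of your code branch on the returned error instead of catching exceptions inline.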

The server error envelope

Our FastAPI layer catches exceptions raised by the backend and returns a consistent JSON envelope:

{
  "error": {
    "code": "auth.authentication_failed",
    "message": "Invalid or expired token.",
    "detail": {
      "...": "optional diagnostic fields"
    }
  }
}
  • code – A stable, machine-friendly identifier that you can branch on.
  • message – A human-readable explanation suitable for logs or UI surfaces.
  • detail – Optional structured metadata (such as limits, identifiers, or retry hints).
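
If you prefer working with a typed structure, a small hedged sketch that lifts these fields out of a decoded response body might look like this (the dataclass and helper names are illustrative, not part of the SDK):

from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class EnvelopeError:
    """Illustrative container mirroring the envelope fields above."""
    code: Optional[str]
    message: Optional[str]
    detail: dict[str, Any] = field(default_factory=dict)

def parse_error_envelope(body: dict) -> EnvelopeError:
    error = body.get("error", {})  # the envelope lives under the top-level "error" key
    return EnvelopeError(
        code=error.get("code"),
        message=error.get("message"),
        detail=error.get("detail") or {},
    )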

Any non-AIRefineryError raised by the backend becomes an HTTP 500 with this envelope. The original exception name is preserved in logs so you can follow up with support if needed.

The SDK does not modify this payload. In synchronous flows you can reach it via err.response.json(). In asynchronous flows, aiohttp.ClientResponseError exposes the HTTP status and headers; if you require the response body, wrap the request in a helper that inspects the aiohttp response before calling raise_for_status() (example below). The parsed JSON matches the HTTP error envelope.

Error handling in SDK clients

Synchronous clients

import logging  # log unexpected API errors
import os  # read environment variables
import time  # simple sleep-based backoff
from dotenv import load_dotenv  # load variables from .env
from requests import HTTPError  # surface HTTP errors from requests
from air import AIRefinery  # sync SDK entry point into the FastAPI service

load_dotenv()  # load API_KEY from .env file
logger = logging.getLogger(__name__)  # module-level logger
client = AIRefinery(api_key=os.environ["API_KEY"])  # instantiate the client with credentials

try:
    completion = client.chat.completions.create(  # perform a call against FastAPI
        model="meta-llama/Llama-3.1-70B-Instruct",  # choose the model
        messages=[{"role": "user", "content": "Hello!"}],  # provide conversation context
    )
except HTTPError as err:  # catch HTTP failures
    payload = err.response.json() if err.response is not None else {}  # decode error body
    error = payload.get("error", {})  # extract the envelope
    code = error.get("code")  # pull the machine-readable code

    if code == "inference.model_key.not_found":  # handle specific model issues
        raise ValueError("Choose a model that exists in your workspace") from err
    if code == "inference.llm.rate_limit":  # throttle-aware branch
        retry_after = (error.get("detail") or {}).get("retry_after")  # parse retry hint
        time.sleep(float(retry_after or 5))  # simple backoff before the caller retries
    else:
        logger.error("API error %s: %s", code, error.get("message"))  # log fallback details
        raise  # re-raise unknown errors

Asynchronous clients

import asyncio  # await-friendly backoff
import os  # read environment variables
import aiohttp  # aiohttp exceptions for async failures
from dotenv import load_dotenv  # load variables from .env
from air import AsyncAIRefinery  # async SDK entry point into the FastAPI service

load_dotenv()  # load API_KEY from .env file
client = AsyncAIRefinery(api_key=os.environ["API_KEY"])  # instantiate async client

async def safe_completion(messages):
    try:
        return await client.chat.completions.create(  # await FastAPI request
            model="meta-llama/Llama-3.1-70B-Instruct",  # chosen model
            messages=messages,  # chat history supplied by caller
        )
    except aiohttp.ClientResponseError as err:  # handle HTTP error responses
        if err.status == 401:  # auth failure
            raise RuntimeError("Check the API key or project permissions") from err
        if err.status == 429:  # rate limit branch
            retry_after = err.headers.get("Retry-After") if err.headers else None  # parse retry header
            await asyncio.sleep(float(retry_after or 5))  # back off before giving up on this attempt
            return None  # stop current workflow; the caller decides whether to retry
        raise  # propagate unhandled errors
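
A brief usage sketch for the helper above, run from a synchronous entry point (the conversation content is just an example):

import asyncio  # drive the coroutine from synchronous code

async def main():
    result = await safe_completion([{"role": "user", "content": "Hello!"}])  # helper defined above
    if result is not None:  # None signals a rate-limited call that backed off
        print(result)  # completion object returned by the SDK

asyncio.run(main())  # run the example end to end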

If you need the JSON body in an async workflow, issue the request manually:

import os  # read environment variables
import aiohttp  # manual request handling
from dotenv import load_dotenv  # load variables from .env
from air.utils import get_base_headers_async  # helper re-used by the SDK for FastAPI calls

load_dotenv()  # load API_KEY from .env file

async def call_with_body(client, payload):
    headers = await get_base_headers_async(client.api_key)  # base headers with auth
    async with aiohttp.ClientSession() as session:  # create HTTP session
        async with session.post(
            f"{client.base_url}/v1/chat/completions",  # FastAPI endpoint
            json=payload,  # request body
            headers=headers,  # include auth headers
        ) as resp:
            body = await resp.json()  # decode JSON body
            if resp.status >= 400:  # treat non-2xx as failures
                return None, body  # return error payload
            return body, None  # return success payload
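
A short illustration of consuming that helper from an async context, reusing the AsyncAIRefinery client created earlier; the payload fields mirror the chat completion examples above:

async def demo():
    payload = {
        "model": "meta-llama/Llama-3.1-70B-Instruct",  # any model available in your workspace
        "messages": [{"role": "user", "content": "Hello!"}],  # request body
    }
    body, error = await call_with_body(client, payload)  # helper defined above
    if error is not None:
        envelope = error.get("error", {})  # FastAPI error envelope
        print(envelope.get("code"), envelope.get("message"))  # branch on the stable code as needed
    else:
        print(body)  # successful completion payload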

Streaming

import logging  # log stream failures
import os  # read environment variables
from dotenv import load_dotenv  # load variables from .env
from air import AIRefinery  # sync SDK entry point for FastAPI streaming
from air.chat.client import SSEStreamError  # streaming error class

load_dotenv()  # load API_KEY from .env file
logger = logging.getLogger(__name__)  # module-level logger
client = AIRefinery(api_key=os.environ["API_KEY"])  # instantiate client for streaming

messages = [{"role": "user", "content": "Hello!"}]  # chat context for the stream

try:
    for chunk in client.chat.completions.create(
        model="meta-llama/Llama-3.1-70B-Instruct",  # streaming-compatible model
        messages=messages,  # chat context
        stream=True,  # opt into FastAPI SSE stream
    ):
        print(chunk)  # process each streamed chunk (e.g. append deltas to the reply)
except SSEStreamError as err:
    logger.warning("Stream aborted: %s", err)  # log stream failure

Common error codes

Authentication and request limits

| Code | HTTP status | What it means | Typical next step |
| --- | --- | --- | --- |
| auth.header_missing | 401 Unauthorized | No Authorization header was provided. | Supply the API key (or refresh the token). |
| auth.authentication_failed | 401 Unauthorized | Token is invalid, expired, or tied to another workspace. | Rotate credentials and retry once. |
| server.request_entity_too_large | 413 Payload Too Large | Upload exceeded the configured limit (default 100 MB); detail.limit_mb and detail.content_length are included. | Reduce the payload size or upload in smaller chunks. |
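
As an illustration of the "rotate credentials and retry once" advice, the hedged sketch below re-reads the API key from the environment and retries a single time; how credentials are actually rotated is deployment-specific.

import os  # read the (possibly rotated) API key
from requests import HTTPError  # raised by the synchronous clients
from air import AIRefinery  # sync SDK entry point

def completion_with_refresh(messages):
    client = AIRefinery(api_key=os.environ["API_KEY"])  # initial credentials
    for attempt in range(2):  # at most one retry after refreshing
        try:
            return client.chat.completions.create(
                model="meta-llama/Llama-3.1-70B-Instruct",  # example model
                messages=messages,
            )
        except HTTPError as err:
            envelope = (err.response.json() if err.response is not None else {}).get("error", {})
            if attempt == 0 and envelope.get("code") == "auth.authentication_failed":
                client = AIRefinery(api_key=os.environ["API_KEY"])  # re-read the rotated key (illustrative)
                continue
            raise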

Model catalogue and selection

| Code | HTTP status | What it means | Action |
| --- | --- | --- | --- |
| inference.registry.unsupported_model_type | 400 Bad Request | The referenced model type is not recognised by the platform. | Choose a model/type listed by client.models.list(). |
| inference.registry.missing_model_type | 400 Bad Request | The registry entry lacks a mandatory model_type. | Fix the configuration before retrying. |
| inference.registry.io_error | 500 Internal Server Error | Temporary failure while reading the model catalogue. | Retry with backoff; contact support if persistent. |
| inference.registry.parse_error | 400 Bad Request | Registry metadata is malformed. | Validate the registered model definition. |
| inference.registry.duplicate_key | 409 Conflict | Two models share the same logical key. | Remove or rename duplicate entries. |
| inference.model_key.missing | 400 Bad Request | The request omitted the mandatory model parameter. | Provide the model argument. |
| inference.model_key.not_found | 404 Not Found | Requested model key does not exist. | List models and select an available key. |
| inference.model_key.type_mismatch | 400 Bad Request | Model exists but is incompatible with the endpoint. | Switch to a compatible model family. |
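
When a call fails with inference.model_key.not_found, one reactive option is to surface the catalogue so a valid key can be chosen; this sketch only assumes that client.models.list() exists (it is referenced above) and makes no assumption about the shape of what it returns.

import os  # read environment variables
from requests import HTTPError  # raised by the synchronous clients
from air import AIRefinery  # sync SDK entry point

client = AIRefinery(api_key=os.environ["API_KEY"])

try:
    completion = client.chat.completions.create(
        model="meta-llama/Llama-3.1-70B-Instruct",  # key that may not exist in this workspace
        messages=[{"role": "user", "content": "Hello!"}],
    )
except HTTPError as err:
    envelope = (err.response.json() if err.response is not None else {}).get("error", {})
    if envelope.get("code") == "inference.model_key.not_found":
        print("Available models:", client.models.list())  # pick a key from this catalogue and retry
    raise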

Runtime and vendor interactions

| Code | HTTP status | What it means | Action |
| --- | --- | --- | --- |
| inference.runtime.error | 500 Internal Server Error | Unexpected exception while executing the request. | Retry with exponential backoff; capture the request ID for support. |
| inference.llm.configuration_error | 400 Bad Request | Invalid request payload (missing fields, wrong types, etc.). | Validate your parameters before calling the SDK. |
| inference.llm.client_not_initialized | 500 Internal Server Error | Backend worker was not ready to accept traffic. | Retry; report if it recurs. |
| inference.llm.invalid_request | 400 Bad Request | The vendor rejected malformed input (e.g., empty messages). | Correct the request payload. |
| inference.llm.service_error | 502 Bad Gateway | Vendor returned an unknown error. | Retry or switch models. |
| inference.llm.rate_limit | 429 Too Many Requests | Shared or vendor quota exceeded; detail.retry_after is set when available. | Back off for the indicated interval before retrying. |
| inference.llm.service_unavailable | 503 Service Unavailable | Temporary vendor outage or timeout. | Retry with exponential backoff. |
| inference.llm.streaming_error | 502 Bad Gateway | Streaming connection broke mid-request. | Reconnect; re-send the request if idempotent. |
| inference.llm.serialization_error | 502 Bad Gateway | Unexpected payload returned by the vendor SDK. | Retry; report to support with the request ID. |
| inference.llm.unsupported_return_type | 502 Bad Gateway | Requested return_type is not supported. | Remove or correct the return_type argument. |

  • Log the HTTP status alongside the error.code so you can spot patterns quickly.
  • Use error.detail to decide whether to retry or prompt client-side action (limits, retry hints, and similar signals).
  • Apply exponential backoff for recoverable statuses (429, 500, 502, 503); a minimal sketch follows this list.
  • Surface actionable messages to end users (e.g., “refresh credentials”) and hide internal codes behind your own abstractions.
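
A minimal retry sketch following those guidelines for the synchronous clients; the attempt count and delays are illustrative defaults, not SDK behaviour, and you can additionally honour detail.retry_after when the envelope provides it.

import time  # sleep between attempts
from requests import HTTPError  # raised by the synchronous clients

RETRYABLE = {429, 500, 502, 503}  # statuses worth retrying per the guidance above

def with_backoff(call, max_attempts=4, base_delay=1.0):
    """Retry a zero-argument callable with exponential backoff on retryable statuses."""
    for attempt in range(max_attempts):
        try:
            return call()
        except HTTPError as err:
            status = err.response.status_code if err.response is not None else None
            if status not in RETRYABLE or attempt == max_attempts - 1:
                raise  # not recoverable, or out of attempts
            time.sleep(base_delay * (2 ** attempt))  # 1 s, 2 s, 4 s, ...

For example, completion = with_backoff(lambda: client.chat.completions.create(model="meta-llama/Llama-3.1-70B-Instruct", messages=messages)) retries the chat call transparently.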