Error Handling¶
When you call the AI Refinery SDK, every non-2xx response is surfaced as a Python exception. Our FastAPI backend normalises those exceptions into a consistent JSON envelope, and this page outlines what SDK developers should expect and how to handle the resulting error payloads when the server reports a failure.
In HTTP, status codes from 200 to 299 mean “success”; any other status code counts as an error and triggers the behaviours described below.
How the SDK surfaces failures¶
| Client flavour | Exception type | Trigger | Human-readable description of the error |
|---|---|---|---|
| AIRefinery and other synchronous clients | requests.exceptions.HTTPError (subclass of requests.exceptions.RequestException) | HTTP status ≥ 400 returned by the FastAPI backend | Read err.response.json()["error"] for the message and optional detail emitted by FastAPI. |
| AsyncAIRefinery and other async clients | aiohttp.ClientResponseError (subclass of aiohttp.ClientError) | HTTP status ≥ 400 returned by the FastAPI backend | Use err.message; if you capture the body, decode the FastAPI envelope just like the synchronous case. |
| Streaming chat completions (stream=True) | air.chat.client.SSEStreamError or ChunkValidationError | FastAPI emits an event: error frame or sends malformed SSE data | str(err) includes the upstream FastAPI error payload when provided. |
| Network/runtime issues | requests.exceptions.RequestException, aiohttp.ClientError, asyncio.TimeoutError | DNS failures, TLS problems, timeouts, etc. | str(err) and the stack trace describe the failure context (these errors arise before FastAPI can respond). |
All sub-clients (chat completions, embeddings, images, models, etc.) follow the same
pattern: they perform the HTTP request, call raise_for_status() and convert
successful responses into Pydantic models. You should therefore wrap calls in
try/except blocks that distinguish HTTP errors from application-specific
failures in your own code.
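As a minimal sketch of that pattern with the synchronous client: AppValidationError is a hypothetical application-specific exception of your own, and we assume the parsed completion exposes an OpenAI-style choices list, which may differ in your deployment.
import logging # log unexpected HTTP failures
import os # read the API key
from requests import HTTPError # HTTP errors raised after raise_for_status()
from air import AIRefinery # sync SDK entry point

logger = logging.getLogger(__name__) # module-level logger

class AppValidationError(Exception): # hypothetical application-specific failure
    """Raised when a completion violates your own business rules."""

def complete_and_validate(messages):
    client = AIRefinery(api_key=os.environ["API_KEY"]) # credentials from the environment
    try:
        completion = client.chat.completions.create( # HTTP request against the FastAPI backend
            model="meta-llama/Llama-3.1-70B-Instruct",
            messages=messages,
        )
        if not completion.choices: # application-level check on the parsed Pydantic model
            raise AppValidationError("The model returned no choices")
        return completion
    except HTTPError as err: # server reported a non-2xx status
        logger.error("HTTP failure from the backend: %s", err)
        raise
    except AppValidationError: # your own failure type, handled separately
        logger.warning("Completion rejected by application rules")
        raise
The sections below show fuller versions of the HTTP branch for each client flavour.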
The server error envelope¶
Our FastAPI layer wraps these exceptions and returns a consistent JSON envelope:
{
  "error": {
    "code": "auth.authentication_failed",
    "message": "Invalid or expired token.",
    "detail": {
      "...": "optional diagnostic fields"
    }
  }
}
- code – A stable, machine-friendly identifier that you can branch on.
- message – A human-readable explanation suitable for logs or UI surfaces.
- detail – Optional structured metadata (such as limits, identifiers, or retry hints).
Any non-AIRefineryError raised by the backend becomes an HTTP 500 with this
envelope. The original exception name is preserved in logs so you can follow up
with support if needed.
The SDK does not modify this payload. In synchronous flows you can reach it via
err.response.json(). In asynchronous flows, aiohttp.ClientResponseError
exposes the HTTP status and headers; if you require the response body, wrap the
request in a helper that inspects the aiohttp response before calling
raise_for_status() (example below). The parsed JSON matches the
HTTP error envelope.
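If you decode this envelope in several places, a small helper can centralise it. The sketch below is ours, not part of the SDK; the name parse_error_envelope is illustrative.
def parse_error_envelope(body):
    """Split a decoded FastAPI error body into (code, message, detail)."""
    error = (body or {}).get("error", {}) # tolerate empty or non-envelope bodies
    return error.get("code"), error.get("message"), error.get("detail") or {}
In the synchronous example below this is equivalent to reading payload.get("error", {}) and its individual keys; in the async helper shown later you can apply it to the body returned alongside the status.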
Error Handling in SDK Clients¶
Synchronous clients¶
import logging # standard-library logging for the fallback branch
import os # read environment variables
from dotenv import load_dotenv # load variables from .env
from requests import HTTPError # surface HTTP errors from requests
from air import AIRefinery # sync SDK entry point into the FastAPI service
load_dotenv() # load API_KEY from .env file
logger = logging.getLogger(__name__) # logger used in the except branch
client = AIRefinery(api_key=os.environ["API_KEY"]) # instantiate the client with credentials
try:
    completion = client.chat.completions.create( # perform a call against FastAPI
        model="meta-llama/Llama-3.1-70B-Instruct", # choose the model
        messages=[{"role": "user", "content": "Hello!"}], # provide conversation context
    )
except HTTPError as err: # catch HTTP failures
    payload = err.response.json() if err.response is not None else {} # decode error body
    error = payload.get("error", {}) # extract the envelope
    code = error.get("code") # pull the machine-readable code
    if code == "inference.model_key.not_found": # handle specific model issues
        raise ValueError("Choose a model that exists in your workspace") from err
    if code == "inference.llm.rate_limit": # throttle-aware branch
        retry_after = error.get("detail", {}).get("retry_after") # parse retry hint
        backoff(retry_after or 5) # schedule retry via your own helper (see the sketch below)
    else:
        logger.error("API error %s: %s", code, error.get("message")) # log fallback details
        raise # re-raise unknown errors
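backoff in the snippet above is a placeholder for your own retry helper, not an SDK function. A minimal sketch, assuming you simply want to sleep before the caller retries, could be:
import random # jitter so concurrent clients do not retry in lockstep
import time # blocking sleep is acceptable in synchronous code

def backoff(retry_after, attempt=0):
    """Sleep for the server-suggested interval, with exponential growth and jitter."""
    delay = float(retry_after) * (2 ** attempt) + random.uniform(0, 1) # exponential backoff with jitter
    time.sleep(delay) # block the current thread before the caller retries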
Asynchronous clients¶
import os # read environment variables
import aiohttp # aiohttp exceptions for async failures
from dotenv import load_dotenv # load variables from .env
from air import AsyncAIRefinery # async SDK entry point into the FastAPI service
load_dotenv() # load API_KEY from .env file
client = AsyncAIRefinery(api_key=os.environ["API_KEY"]) # instantiate async client
async def safe_completion(messages):
    try:
        return await client.chat.completions.create( # await FastAPI request
            model="meta-llama/Llama-3.1-70B-Instruct", # chosen model
            messages=messages, # chat history supplied by caller
        )
    except aiohttp.ClientResponseError as err: # handle HTTP error responses
        if err.status == 401: # auth failure
            raise RuntimeError("Check the API key or project permissions") from err
        if err.status == 429: # rate limit branch
            retry_after = err.headers.get("Retry-After") # parse retry header
            schedule_retry(retry_after) # queue retry via your own helper (see the sketch below)
            return None # stop current workflow
        raise # propagate unhandled errors
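schedule_retry is likewise your own code rather than an SDK helper. One possible sketch, which merely waits out the Retry-After window on the running event loop and then hands control to whatever retry hook you wire in, is:
import asyncio # event-loop timer scheduling
import logging # report when the retry window opens

logger = logging.getLogger(__name__) # module-level logger

def schedule_retry(retry_after, default=5.0):
    """Queue a notification (or your retry hook) once the Retry-After interval elapses."""
    try:
        delay = float(retry_after) if retry_after else default # header may be absent or non-numeric
    except (TypeError, ValueError):
        delay = default
    loop = asyncio.get_running_loop() # safe_completion always runs inside the event loop
    loop.call_later(delay, logger.info, "Rate-limit window elapsed; safe to retry") # swap in your retry callback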
If you need the JSON body in an async workflow, issue the request manually:
import os # read environment variables
import aiohttp # manual request handling
from dotenv import load_dotenv # load variables from .env
from air.utils import get_base_headers_async # helper re-used by the SDK for FastAPI calls
load_dotenv() # load API_KEY from .env file
async def call_with_body(client, payload):
    headers = await get_base_headers_async(client.api_key) # base headers with auth
    async with aiohttp.ClientSession() as session: # create HTTP session
        async with session.post(
            f"{client.base_url}/v1/chat/completions", # FastAPI endpoint
            json=payload, # request body
            headers=headers, # include auth headers
        ) as resp:
            body = await resp.json() # decode JSON body
            if resp.status >= 400: # treat non-2xx as failures
                return None, body # return error payload
            return body, None # return success payload
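A caller can then branch on whichever half of the tuple is populated; for example (the ask coroutine and the payload shape are illustrative):
async def ask(client, messages):
    payload = { # request body in the same shape the chat completions endpoint expects
        "model": "meta-llama/Llama-3.1-70B-Instruct",
        "messages": messages,
    }
    body, error_body = await call_with_body(client, payload) # (success, error) from the helper above
    if error_body is not None:
        code = error_body.get("error", {}).get("code") # machine-readable code from the envelope
        raise RuntimeError(f"Chat completion failed with code {code}")
    return body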
Streaming¶
import logging # standard-library logging for stream failures
import os # read environment variables
from dotenv import load_dotenv # load variables from .env
from air import AIRefinery # sync SDK entry point for FastAPI streaming
from air.chat.client import SSEStreamError # streaming error class
load_dotenv() # load API_KEY from .env file
logger = logging.getLogger(__name__) # logger used when the stream aborts
client = AIRefinery(api_key=os.environ["API_KEY"]) # instantiate client for streaming
messages = [{"role": "user", "content": "Hello!"}] # chat context for the stream
def handle_chunk(chunk): # minimal chunk handler; replace with your own processing
    print(chunk)
try:
    for chunk in client.chat.completions.create(
        model="meta-llama/Llama-3.1-70B-Instruct", # streaming-compatible model
        messages=messages, # chat context
        stream=True, # opt into FastAPI SSE stream
    ):
        handle_chunk(chunk) # process each streamed chunk
except SSEStreamError as err:
    logger.warning("Stream aborted: %s", err) # log stream failure
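Streaming failures are often transient (see inference.llm.streaming_error below), so you may want a bounded retry around the loop. The sketch below reuses the client, logger, and SSEStreamError import from the example above, and is only appropriate when re-sending the request is idempotent for your use case:
def stream_with_retries(client, messages, on_chunk, max_attempts=3):
    """Re-open the SSE stream a bounded number of times if it aborts mid-request."""
    for attempt in range(max_attempts):
        try:
            for chunk in client.chat.completions.create(
                model="meta-llama/Llama-3.1-70B-Instruct", # streaming-compatible model
                messages=messages, # chat context
                stream=True, # opt back into the SSE stream
            ):
                on_chunk(chunk) # caller-supplied chunk handler
            return # stream completed cleanly
        except SSEStreamError as err:
            logger.warning("Stream attempt %d aborted: %s", attempt + 1, err)
    raise RuntimeError("Streaming failed after repeated attempts")
Called as stream_with_retries(client, messages, handle_chunk), it behaves like the loop above but tolerates a dropped connection.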
Common error codes¶
Authentication and request limits¶
| Code | HTTP status | What it means | Typical next step |
|---|---|---|---|
| auth.header_missing | 401 Unauthorized | No Authorization header was provided. | Supply the API key (or refresh the token). |
| auth.authentication_failed | 401 Unauthorized | Token is invalid, expired, or tied to another workspace. | Rotate credentials and retry once. |
| server.request_entity_too_large | 413 Payload Too Large | Upload exceeded the configured limit (default 100 MB). detail.limit_mb and detail.content_length are included. | Reduce the payload size or upload in smaller chunks. |
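A synchronous caller might map these codes to a next step like this; the classify_auth_error helper and its return values are illustrative, not part of the SDK:
from requests import HTTPError # HTTP failures raised by the synchronous client

def classify_auth_error(err: HTTPError) -> str:
    """Map the authentication/limit codes from the table above to a next step."""
    body = err.response.json() if err.response is not None else {}
    error = body.get("error", {}) # FastAPI error envelope
    code = error.get("code")
    if code in ("auth.header_missing", "auth.authentication_failed"):
        return "refresh-credentials" # supply or rotate the API key, then retry once
    if code == "server.request_entity_too_large":
        limit_mb = error.get("detail", {}).get("limit_mb") # configured limit reported by the server
        return f"reduce-payload-below-{limit_mb}mb"
    return "unhandled" # not an auth/limit code; handle elsewhere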
Model catalogue and selection¶
| Code | HTTP status | What it means | Action |
|---|---|---|---|
| inference.registry.unsupported_model_type | 400 Bad Request | The referenced model type is not recognised by the platform. | Choose a model/type listed by client.models.list(). |
| inference.registry.missing_model_type | 400 Bad Request | The registry entry lacks a mandatory model_type. | Fix the configuration before retrying. |
| inference.registry.io_error | 500 Internal Server Error | Temporary failure while reading the model catalogue. | Retry with backoff; contact support if persistent. |
| inference.registry.parse_error | 400 Bad Request | Registry metadata is malformed. | Validate the registered model definition. |
| inference.registry.duplicate_key | 409 Conflict | Two models share the same logical key. | Remove or rename duplicate entries. |
| inference.model_key.missing | 400 Bad Request | The request omitted the mandatory model parameter. | Provide the model argument. |
| inference.model_key.not_found | 404 Not Found | Requested model key does not exist. | List models and select an available key. |
| inference.model_key.type_mismatch | 400 Bad Request | Model exists but is incompatible with the endpoint. | Switch to a compatible model family. |
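To fail fast instead of waiting for inference.model_key.not_found, you can check the catalogue up front. The sketch below assumes the listing returned by client.models.list() is either a plain list or an object with a data attribute whose entries expose an id; adjust it to the shape your deployment returns:
def ensure_model_available(client, model_key):
    """Raise early if the requested model key is not in the catalogue."""
    catalogue = client.models.list() # same listing suggested in the table above
    entries = getattr(catalogue, "data", catalogue) # tolerate list-or-wrapper response shapes
    available = {getattr(entry, "id", None) for entry in entries} # collect model identifiers
    if model_key not in available:
        raise ValueError(f"Model {model_key!r} is not available in this workspace")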
Runtime and vendor interactions¶
| Code | HTTP status | What it means | Action |
|---|---|---|---|
| inference.runtime.error | 500 Internal Server Error | Unexpected exception while executing the request. | Retry with exponential backoff; capture the request ID for support. |
| inference.llm.configuration_error | 400 Bad Request | Invalid request payload (missing fields, wrong types, etc.). | Validate your parameters before calling the SDK. |
| inference.llm.client_not_initialized | 500 Internal Server Error | Backend worker was not ready to accept traffic. | Retry; report if it recurs. |
| inference.llm.invalid_request | 400 Bad Request | The vendor rejected malformed input (e.g., empty messages). | Correct the request payload. |
| inference.llm.service_error | 502 Bad Gateway | Vendor returned an unknown error. | Retry or switch models. |
| inference.llm.rate_limit | 429 Too Many Requests | Shared or vendor quota exceeded. detail.retry_after is set when available. | Back off for the indicated interval before retrying. |
| inference.llm.service_unavailable | 503 Service Unavailable | Temporary vendor outage or timeout. | Retry with exponential backoff. |
| inference.llm.streaming_error | 502 Bad Gateway | Streaming connection broke mid-request. | Reconnect; re-send the request if idempotent. |
| inference.llm.serialization_error | 502 Bad Gateway | Unexpected payload returned by the vendor SDK. | Retry; report to support with the request ID. |
| inference.llm.unsupported_return_type | 502 Bad Gateway | Requested return_type is not supported. | Remove or correct the return_type argument. |
Recommended handling flow¶
- Log the HTTP status alongside the error.code so you can spot patterns quickly.
- Use error.detail to decide whether to retry or prompt client-side action (limits, retry hints, and similar signals).
- Apply exponential backoff for recoverable statuses (429, 500, 502, 503), as in the sketch after this list.
- Surface actionable messages to end users (e.g., “refresh credentials”) and hide internal codes behind your own abstractions.
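Putting those recommendations together, a consolidated synchronous wrapper might look like the sketch below; the retryable status set, attempt count, and backoff policy are illustrative rather than prescribed by the SDK.
import logging # log the status and envelope code together
import random # jitter for the backoff delay
import time # blocking sleep between attempts
from requests import HTTPError # HTTP failures raised by the synchronous client

logger = logging.getLogger(__name__) # module-level logger
RETRYABLE = {429, 500, 502, 503} # recoverable statuses from the list above

def call_with_recovery(func, *args, max_attempts=3, **kwargs):
    """Run an SDK call, log the error envelope, and retry recoverable statuses."""
    for attempt in range(max_attempts):
        try:
            return func(*args, **kwargs)
        except HTTPError as err:
            status = err.response.status_code if err.response is not None else None
            payload = err.response.json() if err.response is not None else {} # decoded error envelope
            error = payload.get("error", {})
            logger.error("HTTP %s, code=%s: %s", status, error.get("code"), error.get("message"))
            if status not in RETRYABLE or attempt == max_attempts - 1:
                raise # not recoverable, or out of attempts
            retry_after = error.get("detail", {}).get("retry_after") # retry hint when the server provides one
            delay = float(retry_after) if retry_after else (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay) # exponential backoff with jitter
You would invoke it as call_with_recovery(client.chat.completions.create, model=..., messages=...) and let unrecoverable errors propagate to your own handlers.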