Model Catalog¶
Our model catalog offers a diverse selection of models. To configure your agents to use any of these models, refer to our project configuration guidelines. The models currently supported are listed below. We continuously expand the catalog, so check this page regularly for the latest updates.
LLMs & VLMs¶
The table below lists the LLMs and VLMs currently supported:
LLM / VLM | Input Modalities | Output |
---|---|---|
meta-llama/Llama-3.1-8B-Instruct | text | text |
meta-llama/Llama-3.1-70B-Instruct | text | text |
meta-llama/Llama-3.3-70b-Instruct | text | text |
meta-llama/Llama-4-Maverick-17B-128E-Instruct | text | text |
meta-llama/Llama-3.2-90B-Vision-Instruct | text, image | text |
mistralai/Mistral-7B-Instruct-v0.3 | text | text |
mistralai/Mistral-Small-3.1-24B-Instruct-2503 | text, image | text |
Qwen/Qwen3-32B | text | text |
Configuring LLMs & VLMs for Your Project¶
To integrate any of the supported models into your project, update the relevant configuration section within the `base_config` or the `config` block of any utility agent in your YAML file. For models that support image input, ensure the agent is capable of handling images (e.g., `ImageUnderstandingAgent`). Set the `model` parameter to one of the supported model names listed above, and make sure that any required capabilities, such as image input, are supported by the selected agent.
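As a minimal sketch, an agent's `config` block might look like the following. The surrounding keys (`utility_agents`, `agent_class`, `agent_name`) are illustrative placeholders; only the `config` block and `model` parameter are described in this guide:

```yaml
utility_agents:
  - agent_class: ImageUnderstandingAgent   # an agent capable of handling images
    agent_name: "Image Agent"              # placeholder name
    config:
      model: "meta-llama/Llama-3.2-90B-Vision-Instruct"  # a supported VLM from the table above
```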
Using LLMs through Our Inference API¶
You can also directly use any of the models listed above through our inference API. See an example below:
```python
import os

from air import AIRefinery
from air import login

auth = login(
    account=str(os.getenv("ACCOUNT")),  # your account
    api_key=str(os.getenv("API_KEY")),  # your API key
)

base_url = os.getenv("AIREFINERY_ADDRESS", "")
client = AIRefinery(**auth.openai(base_url=base_url))

# Create a chat request
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    model="meta-llama/Llama-3.1-70B-Instruct",  # an LLM from the list above
)
```
Embedding Models¶
The models that we support for embedding your data are as follows:

- `intfloat/e5-mistral-7b-instruct`
- `intfloat/multilingual-e5-large`
- `nvidia/nv-embedqa-mistral-7b-v2`
- `nvidia/llama-3-2-nv-embedqa-1b-v2`
Using Embedding Models in Your Project¶
To utilize any of these embedding models in your project, simply update the `embedding_config` within the `base_config` or within the `aisearch_config` section of the `ResearchAgent`. Ensure that the `model_name` parameter of the `embedding_config` is set to one of the names listed above.
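As a sketch, the relevant YAML fragment might look like this (the nesting shown is illustrative; `base_config`, `embedding_config`, and `model_name` are the keys described above):

```yaml
base_config:
  embedding_config:
    model_name: "intfloat/e5-mistral-7b-instruct"  # one of the supported embedding models
```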
Embedding Your Data Using Our Inference API¶
You can also directly use any of the models listed above to embed your data using our inference API. See an example below:
```python
import os

from air import AIRefinery
from air import login

auth = login(
    account=str(os.getenv("ACCOUNT")),  # your account
    api_key=str(os.getenv("API_KEY")),  # your API key
)

base_url = os.getenv("AIREFINERY_ADDRESS", "")
client = AIRefinery(**auth.openai(base_url=base_url))

# Create an embedding request
response = client.embeddings.create(
    input=["What is the capital of France?"],
    model="nvidia/nv-embedqa-mistral-7b-v2",  # required
    encoding_format="float",  # required
    extra_body={"input_type": "query", "truncate": "NONE"},  # extra_body is required for "nvidia" models,
    # where "input_type" can be either "query" or "passage"
)
```
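Once embeddings are returned, a common next step is comparing a query vector against passage vectors, e.g. with cosine similarity. The sketch below is generic and self-contained (it does not call the inference API); the toy vectors stand in for embeddings you would retrieve as shown above:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings returned by the API
query_embedding = [0.1, 0.3, 0.5]
passage_embeddings = {
    "paris": [0.1, 0.29, 0.52],
    "tokyo": [0.9, 0.1, 0.0],
}

# Rank passages by similarity to the query, most similar first
ranked = sorted(
    passage_embeddings,
    key=lambda name: cosine_similarity(query_embedding, passage_embeddings[name]),
    reverse=True,
)
print(ranked[0])  # the closest passage
```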
Compressors¶
The prompt compression models that we support are:

To utilize any of these prompt compression models in your project, simply update the `compression_config` within the `base_config` of your project. To learn more about prompt compression, see this tutorial. Ensure that the `model` parameter of the `compression_config` is set to one of the names listed above.
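As a sketch, the fragment might look like the following, where the model name is a placeholder for one of the supported compression models:

```yaml
base_config:
  compression_config:
    model: "<supported-compression-model>"  # placeholder: pick one of the names listed above
```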
Rerankers¶
The reranker models that we support are:

To utilize any of these reranker models in your project, simply update the `reranker_config` within the `base_config` of your project. To learn more about reranking, see this tutorial. Ensure that the `model` parameter of the `reranker_config` is set to one of the names listed above.
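As a sketch, the fragment might look like the following, where the model name is a placeholder for one of the supported reranker models:

```yaml
base_config:
  reranker_config:
    model: "<supported-reranker-model>"  # placeholder: pick one of the names listed above
```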
Diffusers¶
The diffusers that we support are:

These diffusers can be used with our image generation agent and the Images API.