Chat Completion API¶
This page provides an overview of the Chat Completion API, which lets you generate dynamic, contextually appropriate responses using advanced language models from our model catalog. You can access this API through our SDK using either the AIRefinery or AsyncAIRefinery client.
Asynchronous Chat Completion¶
AsyncAIRefinery.chat.completions.create()¶
The AsyncAIRefinery client generates chat completions asynchronously from the provided conversation history and model.
Parameters:¶
- messages (array): A list of messages comprising the conversation so far.
- model (string): Model ID used to generate the response.
- audio (object or null): Parameters for audio output. Optional.
- frequency_penalty (number or null): Penalizes new tokens based on their frequency in the text so far. Optional.
- logit_bias (map): Modifies the likelihood of specified tokens appearing in the completion. Optional.
- logprobs (boolean or null): Whether to return log probabilities of the output tokens. Optional.
- max_completion_tokens (integer or null): Maximum number of tokens that can be generated. Optional.
- modality (array or null): Output types to generate. Optional.
- n (integer or null): Number of chat completion choices to generate. Optional.
- temperature (number or null): Sampling temperature controlling randomness in responses. Optional.
- tool_choice (string or object): Controls which tool is called by the model. Optional.
- user (string): Stable identifier for end users. Optional.
- web_search_options (object): Configuration for the web search tool. Optional.
- response_format (object): Specifies the format that the model must output, such as a JSON schema or JSON object for structured outputs. Optional.
- seed (integer or null): Ensures deterministic sampling for repeated requests with the same seed. Beta feature. Optional.
- service_tier (string or null): Specifies the latency tier for processing the request. Options are 'auto', 'default', or 'flex'. Optional.
- stop (string, array, or null): Up to 4 sequences at which the API will stop generating further tokens. Optional.
- store (boolean or null): Whether to store the output for use in model distillation or evals products. Optional.
- stream (boolean or null): Enables streaming of response data using server-sent events; see the streaming sketch after this list. Optional.
- stream_options (object or null): Options for the streaming response. Optional.
- tools (array): A list of tools the model may call; currently only functions are supported. Optional.
- top_logprobs (integer or null): Number of most likely tokens to return at each token position. Optional.
- top_p (number or null): Nucleus sampling; an alternative to temperature. Optional.
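For example, optional parameters such as temperature, max_completion_tokens, and stream can be passed directly to create(). The snippet below is a minimal streaming sketch; it assumes the same login and client setup as the full examples later on this page, and that the SDK follows the OpenAI-compatible streaming interface implied by auth.openai(), where stream=True yields chat completion chunk objects.

```python
import os
import asyncio
from air import AsyncAIRefinery, login

auth = login(
    account=str(os.getenv("ACCOUNT")),
    api_key=str(os.getenv("API_KEY")),
)

async def stream_response(query: str):
    client = AsyncAIRefinery(**auth.openai())
    # stream=True returns an iterator of chat completion chunks
    stream = await client.chat.completions.create(
        messages=[{"role": "user", "content": query}],
        model="meta-llama/Llama-3.1-70B-Instruct",
        temperature=0.2,            # lower values make output more deterministic
        max_completion_tokens=256,  # cap the length of the generated reply
        stream=True,                # receive the reply incrementally
    )
    async for chunk in stream:
        # Each chunk carries an incremental delta of the assistant message
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

if __name__ == "__main__":
    asyncio.run(stream_response("Explain streaming in one paragraph."))
```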
Returns:¶
Returns a ChatCompletion object, or a streamed sequence of chat completion chunk objects if the request is streamed. The ChatCompletion object contains the following attributes:
- id: Unique identifier for this ChatCompletion.
- object: The object type, typically "chat.completion".
- created: A UNIX timestamp indicating the creation time.
- model: The language model used.
- choices: A list of choice objects describing possible completions.
- usage: Token usage statistics for this completion, if available.
- service_tier: Service-tier metadata, if provided.
- system_fingerprint: System or model fingerprint, if provided.
- prompt_logprobs: Log-probability data for the prompt, if available.
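For instance, the attributes of a non-streamed response can be read directly. A minimal sketch, assuming response is the ChatCompletion returned by the create() calls in the examples below and that the attribute names follow the OpenAI-compatible schema listed above:

```python
# `response` is a ChatCompletion returned by client.chat.completions.create(...)
print(response.id)                          # unique identifier for this completion
print(response.model)                       # language model that produced it
print(response.created)                     # UNIX timestamp of creation
print(response.choices[0].message.content)  # generated text of the first choice
if response.usage:                          # token statistics, when available
    print(response.usage.prompt_tokens, response.usage.completion_tokens)
```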
Example Usage¶
```python
import os
import asyncio

from air import AsyncAIRefinery  # a synchronous AIRefinery client is also supported
from air import login

# Authenticate using environment variables for the account and API key
auth = login(
    account=str(os.getenv("ACCOUNT")),  # AI Refinery account name
    api_key=str(os.getenv("API_KEY")),  # AI Refinery API key
)

async def generate_response(query: str) -> str:
    # Initialize the AI Refinery client with the authentication details
    client = AsyncAIRefinery(**auth.openai())
    prompt = f"Your task is to generate a response based on the user query.\n\n{query}"
    # Request a chat completion using the specified prompt and model
    response = await client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],  # conversation so far
        model="meta-llama/Llama-3.1-70B-Instruct",  # model used to generate the response
    )
    # Return the content of the first choice from the response
    return response.choices[0].message.content

# Example call to the generate_response function
if __name__ == "__main__":
    print(asyncio.run(generate_response("What is the weather like today?")))
```
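Because the client is asynchronous, several completions can be requested concurrently. A minimal sketch that reuses the generate_response function above with asyncio.gather:

```python
async def generate_many(queries: list[str]) -> list[str]:
    # Fan out one chat completion request per query and await them together
    return await asyncio.gather(*(generate_response(q) for q in queries))

if __name__ == "__main__":
    answers = asyncio.run(generate_many([
        "What is the weather like today?",
        "Suggest a name for a coffee shop.",
    ]))
    for answer in answers:
        print(answer)
```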
Synchronous Chat Completion¶
AIRefinery.chat.completions.create()¶
The AIRefinery client generates chat completions synchronously from the provided conversation history and model. It supports the same parameters and return structure as the asynchronous method, AsyncAIRefinery.chat.completions.create(), described above.
Example Usage¶
```python
import os

from air import AIRefinery  # an async AsyncAIRefinery client is also supported
from air import login

# Authenticate using environment variables for the account and API key
auth = login(
    account=str(os.getenv("ACCOUNT")),  # AI Refinery account name
    api_key=str(os.getenv("API_KEY")),  # AI Refinery API key
)

def generate_response(query: str) -> str:
    # Initialize the AI Refinery client with the authentication details
    client = AIRefinery(**auth.openai())
    prompt = f"Your task is to generate a response based on the user query.\n\n{query}"
    # Request a chat completion using the specified prompt and model
    response = client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],  # conversation so far
        model="meta-llama/Llama-3.1-70B-Instruct",  # model used to generate the response
    )
    # Return the content of the first choice from the response
    return response.choices[0].message.content

# Example call to the generate_response function
if __name__ == "__main__":
    print(generate_response("What is the weather like today?"))
```