
Realtime Distiller API

Realtime Distiller extends AI Refinery's Distiller to support real-time streaming interactions with both text and voice input. It supports:

  • Voice input: Real-time audio streaming from a microphone
  • Voice output: Speech synthesis responses
  • Text input: Text queries with voice responses

Before you begin, you must create an authenticated AsyncAIRefinery client, as shown below. All Realtime Distiller APIs are accessed via client.realtime_distiller.

import os
from air import AsyncAIRefinery
from dotenv import load_dotenv


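load_dotenv()  # loads your API_KEY from your local '.env' file into the environment
api_key = os.getenv("API_KEY")
if not api_key:
    raise RuntimeError("API_KEY is not set; add it to your local '.env' file")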


client = AsyncAIRefinery(api_key=api_key)

Realtime Distiller Workflow


[Figure: Realtime Distiller workflow diagram]

Preliminaries

Creating Your Project

client.realtime_distiller.create_project() (synchronous)

Creates a new project based on the specified YAML configuration file.

Parameters:

  • config_path (str): The path to the YAML configuration file.
  • project (str): A name for your project (letters, digits, hyphens, underscores only).
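The naming rule above, which also applies to the uuid passed when connecting, can be checked locally before any request is made. A minimal sketch, assuming "letters" means ASCII letters; `is_valid_name` and `check_name` are hypothetical helpers, not part of the SDK:

```python
import re

# Assumption: "letters, digits, hyphens, underscores only" means ASCII
# [A-Za-z0-9_-] with at least one character.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_name(name: str) -> bool:
    """Return True if `name` uses only letters, digits, hyphens, and underscores."""
    return _NAME_PATTERN.fullmatch(name) is not None


def check_name(name: str, field: str = "project") -> str:
    """Raise ValueError early instead of waiting for the server to reject the name."""
    if not is_valid_name(name):
        raise ValueError(f"invalid {field} name: {name!r}")
    return name
```

For example, `check_name("example")` returns the name unchanged, while `check_name("my project")` raises ValueError because of the space.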

Returns:

  • bool: True if the project is successfully created.

Project Versioning:

  • Realtime Distiller automatically handles project versioning, starting at version 0.
  • The first time you create a project with a given name, it is assigned version 0. If you create another project with the same name, Distiller increments the version to 1, and so on.
  • By default, connections are made to the latest project version unless you specify one. For details, see Connecting to Realtime Distiller below.
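To make the versioning rules concrete, here is a hypothetical in-memory model of them. It is not how the server stores projects, and its `register()` returns the assigned version number rather than the bool that create_project() returns; it only mirrors the documented behavior: versions start at 0, re-creating a name increments the version, and an omitted version resolves to the latest.

```python
from typing import Optional


class VersionRegistry:
    """Hypothetical local model of Realtime Distiller project versioning."""

    def __init__(self) -> None:
        # project name -> list of configs, where list index == version number
        self._history: dict[str, list[dict]] = {}

    def register(self, project: str, config: dict) -> int:
        """Store a config under `project` and return its assigned version."""
        history = self._history.setdefault(project, [])
        history.append(config)
        return len(history) - 1  # first registration -> 0, next -> 1, ...

    def resolve(self, project: str, project_version: Optional[str] = None) -> int:
        """Return the version to use; None means the latest version."""
        history = self._history[project]
        if project_version is None:
            return len(history) - 1
        version = int(project_version)  # versions are passed as strings, e.g. "1"
        if not 0 <= version < len(history):
            raise KeyError(f"{project!r} has no version {version}")
        return version

    def get(self, project: str, project_version: Optional[str] = None) -> dict:
        """Return the config for the resolved version."""
        return self._history[project][self.resolve(project, project_version)]


registry = VersionRegistry()
registry.register("example", {"note": "first"})   # version 0
registry.register("example", {"note": "second"})  # version 1
print(registry.resolve("example"))   # 1 (latest)
print(registry.get("example", "0"))  # {'note': 'first'}
```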

Example:

# This command registers the project "example" using the "example.yaml" configuration file.
client.realtime_distiller.create_project(config_path="example.yaml", project="example")

Downloading Your Project Configuration

client.realtime_distiller.download_project() (synchronous)

Retrieves the configuration of a specified project from the server.

Parameters:

  • project (str): The name of the project whose configuration you want to download.
  • project_version (str, optional): The version of the project configuration to download. Defaults to the latest version if not provided.

Returns:

  • dict: A Python dictionary containing the downloaded configuration.

Example:

# This command downloads version "1" of the "example" project.
project_config = client.realtime_distiller.download_project(project="example", project_version="1")

Connecting to Realtime Distiller

client.realtime_distiller.__call__() (asynchronous)

Establishes an asynchronous WebSocket connection to the Realtime Distiller endpoint for a specific project. Using it as an async context manager (async with) lets you manage all Distiller-related operations for the session within a single block.

Parameters:

  • project (str): The project name (letters, digits, hyphens, underscores only).
  • uuid (str): A unique user identifier (letters, digits, hyphens, underscores only).
  • executor_dict (dict[str, Callable], optional): A dictionary mapping custom agent names to callable functions. These callables are invoked when their corresponding agents are triggered by the super agent or orchestrator. Defaults to {}.
  • project_version (str, optional): The project version to connect to. If not provided, Distiller uses the latest version.
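A minimal sketch of building an executor_dict. The agent name `EchoAgent` and the single-string signature of `echo_agent` are hypothetical: this section does not document which arguments Distiller passes to executor callables, so check your agent definitions and the SDK reference before relying on a particular signature.

```python
from typing import Callable


def echo_agent(query: str) -> str:
    """Hypothetical custom agent; the real call signature may differ."""
    return f"echo: {query}"


# Keys are the custom agent names used in your project configuration
# ("EchoAgent" is a placeholder); values are the callables to invoke.
executor_dict: dict[str, Callable] = {
    "EchoAgent": echo_agent,
}

# Cheap sanity check before connecting: every value must be callable.
assert all(callable(fn) for fn in executor_dict.values())
```

You would then pass it when connecting, e.g. `client.realtime_distiller(project="example", uuid="test", executor_dict=executor_dict)`.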

Returns:

  • _VoiceDistillerContextManager: An asynchronous context manager that handles operations within the given project.

Example:

async with client.realtime_distiller(
    project="example",
    uuid="test"
) as vc:
    # Your asynchronous operations here
    pass
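When developing against this interface, it can help to exercise application code without a live connection. The class below is a hypothetical test double, not part of the SDK, that mirrors only the method names used in the examples on this page and replays scripted events; the "type" key in its event dicts is an assumption about the event shape.

```python
import asyncio
from typing import AsyncIterator


class FakeRealtimeSession:
    """Hypothetical offline stand-in for the object bound by `async with ... as vc`.

    It opens no WebSocket and does not contact AI Refinery.
    """

    def __init__(self, scripted_events: list[dict]) -> None:
        self.scripted_events = scripted_events
        self.sent_audio: list[bytes] = []
        self.sent_text: list[str] = []

    async def __aenter__(self) -> "FakeRealtimeSession":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        return False  # do not swallow exceptions raised inside the block

    async def send_audio_chunk(self, audio_bytes: bytes) -> None:
        self.sent_audio.append(audio_bytes)

    async def send_text_query(self, text: str) -> None:
        self.sent_text.append(text)

    async def get_responses(self) -> AsyncIterator[dict]:
        for event in self.scripted_events:
            await asyncio.sleep(0)  # yield control, as a real network read would
            yield event


async def demo() -> list[str]:
    events = [{"type": "response.created"}, {"type": "response.done"}]
    async with FakeRealtimeSession(events) as vc:
        await vc.send_text_query("example query")
        return [event["type"] async for event in vc.get_responses()]


print(asyncio.run(demo()))  # ['response.created', 'response.done']
```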

Audio Input

client.realtime_distiller.send_audio_chunk() (asynchronous)

Sends a chunk of audio bytes containing a voice query over the WebSocket asynchronously. Typically called in a loop to stream audio input.

Parameters:

  • audio_bytes (bytes): Raw audio data to send to the server.

Example:

async with client.realtime_distiller(
    project="example",
    uuid="test"
) as vc:
    # `audio` is any async iterable of raw audio byte chunks (e.g., microphone frames)
    async for audio_chunk in audio:
        await vc.send_audio_chunk(audio_chunk)
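The example above assumes an `audio` async iterable that yields raw byte chunks. One way to produce it from an in-memory buffer is sketched below; the format (16-bit mono PCM at 16 kHz) and the 20 ms frame size are assumptions, so adjust them to your project's speech configuration. `chunk_pcm` is a hypothetical helper.

```python
import asyncio
from typing import AsyncIterator

# Assumptions (not stated in this section): 16-bit mono PCM at 16 kHz,
# sent in 20 ms frames. Match these to your project's speech_config.
SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2
FRAME_MS = 20
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * FRAME_MS // 1000  # 640 bytes


async def chunk_pcm(pcm: bytes, chunk_bytes: int = CHUNK_BYTES) -> AsyncIterator[bytes]:
    """Yield fixed-size chunks of a PCM buffer (the last chunk may be shorter)."""
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]
        await asyncio.sleep(0)  # let other tasks (e.g. a response reader) run


async def chunk_sizes(pcm: bytes) -> list[int]:
    return [len(chunk) async for chunk in chunk_pcm(pcm)]


print(asyncio.run(chunk_sizes(bytes(1500))))  # [640, 640, 220]
```

With a recorded buffer, `audio = chunk_pcm(pcm_bytes)` fits the loop above. Note that `asyncio.sleep(0)` only yields control; it does not pace the stream to real time.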

Text Input

client.realtime_distiller.send_text_query() (asynchronous)

Sends a text query over the WebSocket asynchronously.

Parameters:

  • text (str): The text query to send.

Example:

async with client.realtime_distiller(
    project="example",
    uuid="test"
) as vc:
    text = "example query"
    await vc.send_text_query(text)

Response Stream

client.realtime_distiller.get_responses() (asynchronous)

Continuously retrieves output responses (text or audio) from the WebSocket asynchronously.

Yields:

  • dict: A dictionary representing a Realtime Event, containing a response type and optional response content. Events can be status updates, text responses, or speech responses delivered as streamed audio chunks.

Example:

async with client.realtime_distiller(
    project="example",
    uuid="test"
) as vc:
    async for response in vc.get_responses():
        print(response)
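Because get_responses() streams continuously, a plain `async for` loop over it does not end on its own. A minimal sketch that reads one turn, stopping at `response.done` and raising `asyncio.TimeoutError` if no event arrives within `idle_timeout` seconds. It assumes each event dict stores its type under a "type" key; `read_one_turn` is a hypothetical helper.

```python
import asyncio
from typing import AsyncIterable, AsyncIterator


async def _next_event(iterator: AsyncIterator[dict]) -> dict:
    # Wrap __anext__ in a coroutine so asyncio.wait_for can time it out.
    return await iterator.__anext__()


async def read_one_turn(responses: AsyncIterable[dict], idle_timeout: float = 10.0) -> list[dict]:
    """Collect events up to and including response.done."""
    iterator = responses.__aiter__()
    turn: list[dict] = []
    while True:
        try:
            event = await asyncio.wait_for(_next_event(iterator), timeout=idle_timeout)
        except StopAsyncIteration:
            break  # the stream closed before response.done
        turn.append(event)
        if event.get("type") == "response.done":
            break
    return turn


async def _fake_responses() -> AsyncIterator[dict]:
    # Stand-in for vc.get_responses(); the event after response.done is never read.
    for kind in ["response.created", "response.text.delta", "response.done", "response.created"]:
        yield {"type": kind}


turn = asyncio.run(read_one_turn(_fake_responses()))
print([event["type"] for event in turn])  # ['response.created', 'response.text.delta', 'response.done']
```

Against a live session this would be `turn = await read_one_turn(vc.get_responses())`.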

Realtime Wrapper Methods

High-level methods that handle the complete voice interaction loop. These wrap the base voice APIs (send_audio_chunk(), send_text_query(), get_responses()) to provide a ready-to-use, end-to-end realtime voice experience.

client.realtime_distiller.listen_and_respond() (asynchronous)

Captures audio from the microphone, streams it to the server, and plays back audio responses through the speaker.

Parameters:

  • sample_rate (int, optional): Audio sample rate in Hz. Must match the sample_rate in your YAML speech_config. Defaults to 16000.

Behavior:

  1. Streams microphone audio to the server using send_audio_chunk()
  2. Stops microphone capture when the server begins responding
  3. Receives server responses via get_responses()
  4. Plays TTS audio responses through the speaker
  5. Prints text transcriptions
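The listen_and_respond() behavior above amounts to a half-duplex loop. The sketch below reimplements steps 1 to 3 for illustration only; it is not the SDK's code. It assumes that a `response.created` event marks the point where the server begins responding and that each event dict stores its type under a "type" key. Playback (step 4) and printing (step 5) are omitted, and `half_duplex_turn` is a hypothetical helper.

```python
import asyncio
import contextlib
from typing import AsyncIterable, Awaitable, Callable


async def half_duplex_turn(
    mic: AsyncIterable[bytes],
    send_audio_chunk: Callable[[bytes], Awaitable[None]],
    responses: AsyncIterable[dict],
) -> tuple[int, list[str]]:
    """Stream mic audio until the reply starts; return (chunks sent, event types seen)."""
    sent = 0

    async def pump_mic() -> None:
        nonlocal sent
        async for chunk in mic:
            await send_audio_chunk(chunk)  # step 1: stream microphone audio
            sent += 1

    mic_task = asyncio.create_task(pump_mic())
    seen: list[str] = []
    try:
        async for event in responses:  # step 3: receive server responses
            kind = event.get("type", "")
            seen.append(kind)
            if kind == "response.created":
                mic_task.cancel()  # step 2: stop capture once the server responds
            elif kind == "response.done":
                break
    finally:
        mic_task.cancel()  # no-op if the task already finished
        with contextlib.suppress(asyncio.CancelledError):
            await mic_task
    return sent, seen


async def _demo() -> tuple[int, list[str]]:
    sent_chunks: list[bytes] = []

    async def fake_send(chunk: bytes) -> None:
        sent_chunks.append(chunk)

    async def fake_mic():
        while True:  # an endless microphone
            yield b"\x00" * 640
            await asyncio.sleep(0.01)

    async def fake_responses():
        await asyncio.sleep(0.05)  # the server replies after a short pause
        for kind in ["response.created", "response.audio.delta", "response.done"]:
            yield {"type": kind}

    return await half_duplex_turn(fake_mic(), fake_send, fake_responses())


sent, seen = asyncio.run(_demo())
print(seen)  # ['response.created', 'response.audio.delta', 'response.done']
```

Against a live connection the call would look like `await half_duplex_turn(mic_chunks, vc.send_audio_chunk, vc.get_responses())`, where `mic_chunks` is your own async iterable of microphone audio.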

Example:

async with client.realtime_distiller(
    project="example",
    uuid="test"
) as vc:
    await vc.listen_and_respond(sample_rate=16000)
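Because sample_rate must match the value in your YAML speech_config, you can read it from the configuration returned by download_project() instead of hard-coding it. A minimal sketch, assuming the downloaded dict mirrors the YAML and that speech_config holds a sample_rate key. Since this section does not say where speech_config sits in the file, the hypothetical helper below searches the whole structure.

```python
from typing import Any, Optional


def find_speech_sample_rate(config: Any) -> Optional[int]:
    """Depth-first search for a `speech_config` mapping; return its `sample_rate`.

    Returns None if no speech_config with a sample_rate is found.
    """
    if isinstance(config, dict):
        speech = config.get("speech_config")
        if isinstance(speech, dict) and "sample_rate" in speech:
            return int(speech["sample_rate"])
        for value in config.values():
            found = find_speech_sample_rate(value)
            if found is not None:
                return found
    elif isinstance(config, list):
        for item in config:
            found = find_speech_sample_rate(item)
            if found is not None:
                return found
    return None


# Illustrative structure only; "some_section" is a placeholder key.
config = {"some_section": {"speech_config": {"sample_rate": 24000}}}
print(find_speech_sample_rate(config))  # 24000
```

With a live client: `rate = find_speech_sample_rate(client.realtime_distiller.download_project(project="example")) or 16000`, then `await vc.listen_and_respond(sample_rate=rate)`. The `or 16000` fallback mirrors the documented default.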


client.realtime_distiller.send_text_and_respond() (asynchronous)

Sends a text query to the server and plays back audio responses through the speaker.

Parameters:

  • text (str): The text query to send.
  • sample_rate (int, optional): Audio sample rate in Hz. Must match the sample_rate in your YAML speech_config. Defaults to 16000.

Raises:

  • ValueError: If text is empty.

Behavior:

  1. Sends the text query using send_text_query()
  2. Receives server responses via get_responses()
  3. Plays TTS audio responses through the speaker
  4. Prints text transcriptions

Example:

async with client.realtime_distiller(
    project="example",
    uuid="test"
) as vc:
    await vc.send_text_and_respond(
        text="example query",
        sample_rate=16000
    )


Realtime Events

Events yielded by get_responses(), representing status updates, text responses, or speech responses.

| Type | Fields / Description |
| --- | --- |
| session.created | Status event indicating that the Realtime session was created. |
| response.audio_transcript.delta | delta (string): Partial transcription text. |
| response.audio_transcript.done | text (string): Final transcription text. |
| response.created | Status event indicating that a response has started. |
| response.audio.delta | audio (string): Base64-encoded audio chunk. |
| response.audio.done | Status event indicating that the current audio response is complete. |
| response.text.delta | content (string): Partial text output from Distiller. |
| response.text.done | Status event indicating that the Distiller text response is complete. |
| response.done | Status event indicating that the response has completed. |
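A client usually routes each event by its type. The sketch below accumulates the payload-carrying events from the table and treats the rest as status-only. It is a hypothetical helper, not part of the SDK; the payload field names come from the table, while the "type" key is an assumption about how each event dict stores its type.

```python
import base64
from dataclasses import dataclass, field


@dataclass
class TurnState:
    """Hypothetical accumulator for Realtime Events within one response."""

    transcript_parts: list[str] = field(default_factory=list)
    transcript_final: str = ""
    text_parts: list[str] = field(default_factory=list)
    audio: bytearray = field(default_factory=bytearray)
    done: bool = False

    def feed(self, event: dict) -> None:
        kind = event.get("type")
        if kind == "response.audio_transcript.delta":
            self.transcript_parts.append(event["delta"])
        elif kind == "response.audio_transcript.done":
            self.transcript_final = event["text"]
        elif kind == "response.audio.delta":
            self.audio += base64.b64decode(event["audio"])  # raw audio bytes
        elif kind == "response.text.delta":
            self.text_parts.append(event["content"])
        elif kind == "response.done":
            self.done = True
        # session.created, response.created, response.audio.done and
        # response.text.done are status-only; nothing to accumulate.

    @property
    def text(self) -> str:
        return "".join(self.text_parts)


state = TurnState()
for event in [
    {"type": "response.text.delta", "content": "Hi "},
    {"type": "response.text.delta", "content": "there"},
    {"type": "response.audio.delta", "audio": base64.b64encode(b"\x01\x02").decode()},
    {"type": "response.done"},
]:
    state.feed(event)
print(state.text, state.audio, state.done)  # Hi there bytearray(b'\x01\x02') True
```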


For examples of using Realtime Distiller, check out the tutorials: