Text-to-Speech (TTS) API

The Text-to-Speech (TTS) API generates spoken audio from text input using either the AIRefinery (synchronous) or the AsyncAIRefinery (asynchronous) client.

This API supports two modes: batch synthesis mode, which waits for complete synthesis before returning all audio data at once, and streaming mode, which yields audio chunks as they're produced during synthesis.

Asynchronous TTS

The AsyncAIRefinery client asynchronously generates speech from input text.

Batch and Streaming Methods

  • audio.speech.create() - Returns complete audio after synthesis (batch synthesis mode)
  • audio.speech.with_streaming_response.create() - Returns audio chunks during synthesis (streaming mode)
Parameters:
  • model (string): Model ID used to generate the speech. Required.
  • input (string): The text to convert to speech. Required.
  • voice (string): Voice name for speech synthesis (e.g., "en-US-JennyNeural"). Required.
  • response_format (string): Audio format for output. Optional. Options: "wav", "mp3", "pcm", "opus". Default: "wav".
  • speed (number): Speech speed multiplier (0.25 to 4.0). Optional. Default: 1.0.
  • timeout (number): Request timeout in seconds. Optional.
  • extra_headers (object): Additional HTTP headers. Optional.
  • extra_body (object): Additional parameters such as speech_synthesis_language and sample_rate. Optional.
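As an illustration of the documented speed range, a caller might validate the multiplier on the client side before sending a request. This helper is hypothetical and not part of the SDK; it only encodes the 0.25-4.0 bound stated above:

```python
def validate_speed(speed: float) -> float:
    """Hypothetical client-side check for the documented speed range."""
    if not 0.25 <= speed <= 4.0:
        raise ValueError(f"speed must be between 0.25 and 4.0, got {speed}")
    return speed

print(validate_speed(1.5))  # a valid multiplier passes through unchanged
```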
Returns:
Batch Synthesis

The entire text input is processed in a single request, and the complete synthesized audio is returned only after generation is finished.

In this mode, the API returns a TTSResponse object with:

  • content: Raw audio bytes
  • write_to_file(file): Save audio to file
  • stream_to_file(file, chunk_size): Stream audio to file in chunks
  • iter_bytes(chunk_size): Iterate over audio in byte chunks
  • aiter_bytes(chunk_size): Async iterate over audio in byte chunks
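Conceptually, the chunked iteration helpers slice the raw audio payload into fixed-size pieces. The sketch below mirrors what iter_bytes(chunk_size) does; it is an illustration of the chunking logic, not the SDK's implementation:

```python
def iter_fixed_chunks(content: bytes, chunk_size: int = 1024):
    """Yield successive chunk_size slices of an audio payload,
    as iter_bytes(chunk_size) does conceptually."""
    for offset in range(0, len(content), chunk_size):
        yield content[offset:offset + chunk_size]

# A 2500-byte payload splits into two full chunks and one remainder
chunks = list(iter_fixed_chunks(b"\x00" * 2500, chunk_size=1024))
print([len(c) for c in chunks])  # → [1024, 1024, 452]
```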
Streaming

Synthesized audio is returned incrementally in chunks as it is generated, allowing playback to begin before the full audio is ready.

In this mode, the API returns a StreamingResponse object with:

  • iter(stream_generator()): Synchronous iterator over byte chunks
  • stream_generator.__aiter__(): Asynchronous iterator over byte chunks
  • stream_to_file(file_path): Saves the full streamed audio content to the specified file. Automatically handles sync or async behavior depending on is_async.
Supported Audio Formats

Different use cases prioritize different trade-offs—fidelity, size, compatibility, or streaming efficiency. Supporting multiple formats ensures the API can serve everything from phone-based IVR to high-quality media production.

  • WAV / PCM – Uncompressed, highest fidelity, large files
  • MP3 – Lossy, small, universally supported
  • Ogg Opus – Modern codec that outperforms MP3 at low bitrates
Supported Sampling Rates
Sampling Rate (Hz) Typical Use
8000 Telephony / IVR
16000 Wide-band speech
22050 / 24000 High-quality voice assistants
44100 / 48000 Broadcast / studio quality
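The sampling rate chosen via extra_body directly determines uncompressed payload size, which matters when picking WAV/PCM over a compressed format. A back-of-envelope calculation, assuming 16-bit mono samples (typical for speech; not a value the API mandates):

```python
def pcm_size_bytes(sample_rate_hz: int, duration_s: float,
                   bits_per_sample: int = 16, channels: int = 1) -> int:
    """Uncompressed PCM payload size: rate x sample width x channels x time."""
    return int(sample_rate_hz * (bits_per_sample // 8) * channels * duration_s)

# Ten seconds of 16-bit mono speech at common rates
for rate in (8000, 16000, 24000, 48000):
    print(f"{rate} Hz -> {pcm_size_bytes(rate, 10)} bytes")
```

Ten seconds at 8 kHz is 160 KB, while the same clip at 48 kHz is nearly 1 MB, which is why telephony sticks to 8000 Hz and studio work uses 44100/48000 Hz.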

Example Usage:
Batch Synthesis
import os
import asyncio
from air import AsyncAIRefinery, login
from dotenv import load_dotenv

load_dotenv()  # loads your ACCOUNT and API_KEY from a .env file

auth = login(
    account=str(os.getenv("ACCOUNT")),
    api_key=str(os.getenv("API_KEY")),
)

base_url = os.getenv("AIREFINERY_ADDRESS", "")

async def tts_synthesis_async():

    # Initialize the AI Refinery client
    client = AsyncAIRefinery(**auth.openai(base_url=base_url))

    # Generate speech from text (batch mode, async)
    # Speech synthesis language and sample rate can
    # be specified using the `extra_body` parameter
    # Speed can be adjusted from 0.25x (very slow) to 4.0x (very fast)
    response = await client.audio.speech.create(
        model="Azure/AI-Speech", # Specify the model to generate audio chunks
        input="Hello, this is a test of text-to-speech synthesis.",
        voice="en-US-JennyNeural", # Specify the voice used for speech synthesis
        response_format="wav",
        speed=1.0, # e.g. speed = 0.75 results in slow speech, speed = 1.5 results in fast speech
        extra_body={
            "speech_synthesis_language": "en-US",
            "sample_rate": 24000
        }
    )

    # Save the audio to a file
    response.write_to_file("output.wav")
    print(f"Audio saved! Size: {len(response.content)} bytes")

# Run the example
if __name__ == "__main__":
    asyncio.run(tts_synthesis_async())
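After saving, you can sanity-check that the file header matches the sample_rate you requested via extra_body, using Python's standard wave module. The demo below is self-contained: it builds a short silent 24 kHz clip in memory as a stand-in for output.wav, so it runs without calling the API:

```python
import io
import wave

def wav_sample_rate(file_like) -> int:
    """Read the frame rate from a WAV header (path or file-like object)."""
    with wave.open(file_like, "rb") as w:
        return w.getframerate()

# Build a 10 ms silent 24 kHz mono clip in memory (stand-in for output.wav)
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)       # 16-bit samples
    w.setframerate(24000)
    w.writeframes(b"\x00\x00" * 240)
buf.seek(0)

print(wav_sample_rate(buf))  # → 24000
```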
Streaming
import os
import asyncio
from air import AsyncAIRefinery, login
from dotenv import load_dotenv

load_dotenv()  # loads your ACCOUNT and API_KEY from a .env file

auth = login(
    account=str(os.getenv("ACCOUNT")),
    api_key=str(os.getenv("API_KEY")),
)

base_url = os.getenv("AIREFINERY_ADDRESS", "")

async def tts_synthesis_async():

    # Initialize the AI Refinery client
    client = AsyncAIRefinery(**auth.openai(base_url=base_url))

    # Generate speech from text (streaming mode, async)
    # Speech synthesis language and sample rate can
    # be specified using the `extra_body` parameter
    # Speed can be adjusted from 0.25x (very slow) to 4.0x (very fast)
    response = await client.audio.speech.with_streaming_response.create(
        model="Azure/AI-Speech", # Specify the model to generate audio chunks
        input="Hello, this is a test of text-to-speech synthesis.",
        voice="en-US-JennyNeural", # Specify the voice used for speech synthesis
        response_format="wav",
        speed=1.0, # e.g. speed = 0.75 results in slow speech, speed = 1.5 results in fast speech
        extra_body={
            "speech_synthesis_language": "en-US",
            "sample_rate": 24000
        }
    )

    # Stream the audio to a file; stream_to_file handles
    # sync and async responses automatically
    response.stream_to_file("output.wav")
    print("Audio saved to output.wav")

# Run the example
if __name__ == "__main__":
    asyncio.run(tts_synthesis_async())
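Streaming mode pays off when you act on each chunk as it arrives, for example feeding an audio player before synthesis finishes. The sketch below shows that consumption pattern with a stand-in async generator in place of the real response stream, so it runs without the API:

```python
import asyncio

async def fake_audio_stream():
    """Stand-in for the byte-chunk stream a StreamingResponse yields."""
    for payload in (b"RIFF", b"\x00\x01", b"\x02\x03"):
        yield payload

async def consume(stream) -> bytes:
    audio = bytearray()
    async for chunk in stream:
        # In a real app, hand each chunk to an audio player here
        audio.extend(chunk)
    return bytes(audio)

full_audio = asyncio.run(consume(fake_audio_stream()))
print(len(full_audio), "bytes received")
```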

Synchronous TTS

The AIRefinery client generates speech from text synchronously. This method supports the same parameters, batch and streaming modes, and return structure as the asynchronous method.

Example Usage:
Batch Synthesis
import os
from air import AIRefinery, login
from dotenv import load_dotenv

load_dotenv()  # loads your ACCOUNT and API_KEY from a .env file

auth = login(
    account=str(os.getenv("ACCOUNT")),
    api_key=str(os.getenv("API_KEY")),
)

base_url = os.getenv("AIREFINERY_ADDRESS", "")

def tts_synthesis_sync():
    # Initialize the AI Refinery client
    client = AIRefinery(**auth.openai(base_url=base_url))

    # Generate speech from text (batch mode, sync)
    # Speech synthesis language and sample rate can
    # be specified using the `extra_body` parameter
    # Speed can be adjusted from 0.25x (very slow) to 4.0x (very fast)
    response = client.audio.speech.create(
        model="Azure/AI-Speech", # Specify the model to generate audio chunks
        input="Hello, this is a synchronous text-to-speech example.",
        voice="en-US-JennyNeural", # Specify the voice used for speech synthesis
        response_format="wav",
        speed=1.0, # e.g. speed = 0.75 results in slow speech, speed = 1.5 results in fast speech
        extra_body={
            "speech_synthesis_language": "en-US",
            "sample_rate": 22050
        }
    )

    # Save the audio to a file
    response.write_to_file("sync_output.wav")
    print(f"Audio saved! Size: {len(response.content)} bytes")

# Run the example
if __name__ == "__main__":
    tts_synthesis_sync()
Streaming
import os
from air import AIRefinery, login
from dotenv import load_dotenv

load_dotenv()  # loads your ACCOUNT and API_KEY from a .env file

auth = login(
    account=str(os.getenv("ACCOUNT")),
    api_key=str(os.getenv("API_KEY")),
)

base_url = os.getenv("AIREFINERY_ADDRESS", "")

def tts_synthesis_sync():
    # Initialize the AI Refinery client
    client = AIRefinery(**auth.openai(base_url=base_url))

    # Generate speech from text (streaming mode, sync)
    # Speech synthesis language and sample rate can
    # be specified using the `extra_body` parameter
    # Speed can be adjusted from 0.25x (very slow) to 4.0x (very fast)
    response = client.audio.speech.with_streaming_response.create(
        model="Azure/AI-Speech", # Specify the model to generate audio chunks
        input="Hello, this is a test of text-to-speech synthesis.",
        voice="en-US-JennyNeural", # Specify the voice used for speech synthesis
        response_format="wav",
        speed=1.0, # e.g. speed = 0.75 results in slow speech, speed = 1.5 results in fast speech
        extra_body={
            "speech_synthesis_language": "en-US",
            "sample_rate": 24000
        }
    )

    # Stream the audio to a file; stream_to_file handles
    # sync and async responses automatically
    response.stream_to_file("output.wav")
    print("Audio saved to output.wav")

# Run the example
if __name__ == "__main__":
    tts_synthesis_sync()