Text-to-Speech (TTS) API¶
The Text-to-Speech (TTS) API generates spoken audio from text input using the AIRefinery or the AsyncAIRefinery client.
This API supports two modes: batch synthesis mode, which waits for synthesis to complete and returns all audio data at once, and streaming mode, which yields audio chunks as they are produced during synthesis.
Asynchronous TTS¶
The AsyncAIRefinery client asynchronously generates speech from input text.
Batch and Streaming Methods¶
- audio.speech.create() – Returns complete audio after synthesis (batch synthesis mode)
- audio.speech.with_streaming_response.create() – Returns audio chunks during synthesis (streaming mode)
Parameters:¶
- model (string): Model ID used to generate the speech. Required.
- input (string): The text to convert to speech. Required.
- voice (string): Voice name for speech synthesis (e.g., "en-US-JennyNeural"). Required.
- response_format (string): Audio format for output. Optional. Options: "wav", "mp3", "pcm", "opus". Default: "wav".
- speed (number): Speech speed multiplier (0.25 to 4.0). Optional. Default: 1.0.
- timeout (number): Request timeout in seconds. Optional.
- extra_headers (object): Additional HTTP headers. Optional.
- extra_body (object): Additional parameters such as speech_synthesis_language and sample_rate.
Returns:¶
Batch Synthesis¶
The entire text input is processed in a single request, and the complete synthesized audio is returned only after generation is finished.
In this mode, the API returns a TTSResponse object with:
- content: Raw audio bytes
- write_to_file(file): Save audio to a file
- stream_to_file(file, chunk_size): Stream audio to a file in chunks
- iter_bytes(chunk_size): Iterate over audio in byte chunks
- aiter_bytes(chunk_size): Async iterate over audio in byte chunks
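As a rough sketch of how chunked iteration over a batch response behaves, the snippet below walks a stand-in byte payload in fixed-size slices. The `iter_bytes` helper here is hypothetical illustration code, not the library's implementation, and the payload is fake audio:

```python
# Stand-in illustration of iter_bytes(chunk_size)-style chunking over
# the raw audio bytes of a batch response. The payload is fake; a real
# TTSResponse.content would hold synthesized audio.
audio = b"\x00\x01" * 50  # 100 bytes of stand-in "audio"

def iter_bytes(data: bytes, chunk_size: int):
    """Yield successive chunk_size-byte slices of data."""
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]

chunks = list(iter_bytes(audio, 32))
print([len(c) for c in chunks])  # [32, 32, 32, 4]
```

The final chunk is simply whatever remains, which is why code consuming chunks should not assume every chunk has the full requested size.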
Streaming¶
Synthesized audio is returned incrementally in chunks as it is generated, allowing playback to begin before the full audio is ready.
In this mode, the API returns a StreamingResponse object with:
- iter(stream_generator()): Iterator of byte chunks
- stream_generator.__aiter__(): Async iterator of byte chunks
- stream_to_file(file_path): Saves the full streamed audio content to the specified file, automatically handling sync or async behavior depending on is_async.
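One way to picture incremental consumption is with a stand-in async generator in place of the real stream. The `fake_stream` function below is hypothetical (a real StreamingResponse yields actual audio bytes as synthesis progresses); the point is that each chunk is written out as soon as it arrives:

```python
import asyncio
import io

async def fake_stream():
    # Hypothetical stand-in for a streaming response's async iterator;
    # a real stream would yield audio bytes as synthesis progresses.
    for chunk in (b"chunk-1", b"chunk-2", b"chunk-3"):
        yield chunk

async def consume_stream(sink: io.BytesIO) -> int:
    # Write each chunk as soon as it arrives, so saving (or playback)
    # can begin before the full audio is ready.
    total = 0
    async for chunk in fake_stream():
        sink.write(chunk)
        total += len(chunk)
    return total

buf = io.BytesIO()
total = asyncio.run(consume_stream(buf))
print(total)  # 21
```

Replacing the `io.BytesIO` sink with an open file handle gives the same incremental-save behavior that stream_to_file provides.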
Supported Audio Formats¶
Different use cases prioritize different trade-offs: fidelity, size, compatibility, or streaming efficiency. Supporting multiple formats lets the API serve everything from phone-based IVR to high-quality media production.
- WAV / PCM – Uncompressed, highest fidelity, large files
- MP3 – Lossy, small, universally supported
- Ogg Opus – Modern codec that outperforms MP3 at low bit-rates
Supported Sampling Rates¶
Sampling Rate (Hz) | Typical Use |
---|---|
8000 | Telephony / IVR |
16000 | Wide-band speech |
22050 / 24000 | High-quality voice assistants |
44100 / 48000 | Broadcast / studio quality |
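To see why the uncompressed formats trade size for fidelity, a quick back-of-the-envelope calculation helps. The helper below assumes 16-bit mono PCM, which is a common layout but an assumption here, not something the API guarantees:

```python
def pcm_size_bytes(sample_rate_hz: int, seconds: float,
                   bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Raw PCM size: each second stores sample_rate samples per channel."""
    return int(sample_rate_hz * seconds * bytes_per_sample * channels)

# 10 seconds of speech at several of the supported rates
for rate in (8000, 16000, 24000, 48000):
    print(f"{rate} Hz -> {pcm_size_bytes(rate, 10) / 1024:.0f} KiB")
```

Doubling the sampling rate doubles the raw size, which is why telephony sticks to 8000 Hz while studio work pays for 44100/48000 Hz.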
Example Usage:¶
Batch Synthesis¶
import os
import asyncio
from air import AsyncAIRefinery, login
from dotenv import load_dotenv

load_dotenv()  # loads your ACCOUNT and API_KEY from a .env file

auth = login(
    account=str(os.getenv("ACCOUNT")),
    api_key=str(os.getenv("API_KEY")),
)
base_url = os.getenv("AIREFINERY_ADDRESS", "")

async def tts_synthesis_async():
    # Initialize the AI Refinery client
    client = AsyncAIRefinery(**auth.openai(base_url=base_url))

    # Generate speech from text (batch mode, async).
    # Speech synthesis language and sample rate can be specified
    # using the `extra_body` parameter; speed can be adjusted from
    # 0.25x (very slow) to 4.0x (very fast).
    response = await client.audio.speech.create(
        model="Azure/AI-Speech",  # Model used to generate the audio
        input="Hello, this is a test of text-to-speech synthesis.",
        voice="en-US-JennyNeural",  # Voice used for speech synthesis
        response_format="wav",
        speed=1.0,  # e.g. 0.75 slows speech down, 1.5 speeds it up
        extra_body={
            "speech_synthesis_language": "en-US",
            "sample_rate": 24000,
        },
    )

    # Save the audio to a file
    response.write_to_file("output.wav")
    print(f"Audio saved! Size: {len(response.content)} bytes")

# Run the example
if __name__ == "__main__":
    asyncio.run(tts_synthesis_async())
Streaming¶
import os
import asyncio
from air import AsyncAIRefinery, login
from dotenv import load_dotenv

load_dotenv()  # loads your ACCOUNT and API_KEY from a .env file

auth = login(
    account=str(os.getenv("ACCOUNT")),
    api_key=str(os.getenv("API_KEY")),
)
base_url = os.getenv("AIREFINERY_ADDRESS", "")

async def tts_streaming_async():
    # Initialize the AI Refinery client
    client = AsyncAIRefinery(**auth.openai(base_url=base_url))

    # Generate speech from text (streaming mode, async).
    # Speech synthesis language and sample rate can be specified
    # using the `extra_body` parameter; speed can be adjusted from
    # 0.25x (very slow) to 4.0x (very fast).
    response = await client.audio.speech.with_streaming_response.create(
        model="Azure/AI-Speech",  # Model used to generate the audio chunks
        input="Hello, this is a test of text-to-speech synthesis.",
        voice="en-US-JennyNeural",  # Voice used for speech synthesis
        response_format="wav",
        speed=1.0,  # e.g. 0.75 slows speech down, 1.5 speeds it up
        extra_body={
            "speech_synthesis_language": "en-US",
            "sample_rate": 24000,
        },
    )

    # Save the streamed audio to a file; stream_to_file handles
    # sync vs. async behavior automatically
    response.stream_to_file("output.wav")
    print("Audio saved to output.wav")

# Run the example
if __name__ == "__main__":
    asyncio.run(tts_streaming_async())
Synchronous TTS¶
The AIRefinery client generates speech from text synchronously. This method supports the same parameters, batch and streaming modes, and return structure as the asynchronous method.
Example Usage:¶
Batch Synthesis¶
import os
from air import AIRefinery, login
from dotenv import load_dotenv

load_dotenv()  # loads your ACCOUNT and API_KEY from a .env file

auth = login(
    account=str(os.getenv("ACCOUNT")),
    api_key=str(os.getenv("API_KEY")),
)
base_url = os.getenv("AIREFINERY_ADDRESS", "")

def tts_synthesis_sync():
    # Initialize the AI Refinery client
    client = AIRefinery(**auth.openai(base_url=base_url))

    # Generate speech from text (batch mode, sync).
    # Speech synthesis language and sample rate can be specified
    # using the `extra_body` parameter; speed can be adjusted from
    # 0.25x (very slow) to 4.0x (very fast).
    response = client.audio.speech.create(
        model="Azure/AI-Speech",  # Model used to generate the audio
        input="Hello, this is a synchronous text-to-speech example.",
        voice="en-US-JennyNeural",  # Voice used for speech synthesis
        response_format="wav",
        speed=1.0,  # e.g. 0.75 slows speech down, 1.5 speeds it up
        extra_body={
            "speech_synthesis_language": "en-US",
            "sample_rate": 22050,
        },
    )

    # Save the audio to a file
    response.write_to_file("sync_output.wav")
    print(f"Audio saved! Size: {len(response.content)} bytes")

# Run the example
if __name__ == "__main__":
    tts_synthesis_sync()
Streaming¶
import os
from air import AIRefinery, login
from dotenv import load_dotenv

load_dotenv()  # loads your ACCOUNT and API_KEY from a .env file

auth = login(
    account=str(os.getenv("ACCOUNT")),
    api_key=str(os.getenv("API_KEY")),
)
base_url = os.getenv("AIREFINERY_ADDRESS", "")

def tts_streaming_sync():
    # Initialize the AI Refinery client
    client = AIRefinery(**auth.openai(base_url=base_url))

    # Generate speech from text (streaming mode, sync).
    # Speech synthesis language and sample rate can be specified
    # using the `extra_body` parameter; speed can be adjusted from
    # 0.25x (very slow) to 4.0x (very fast).
    response = client.audio.speech.with_streaming_response.create(
        model="Azure/AI-Speech",  # Model used to generate the audio chunks
        input="Hello, this is a test of text-to-speech synthesis.",
        voice="en-US-JennyNeural",  # Voice used for speech synthesis
        response_format="wav",
        speed=1.0,  # e.g. 0.75 slows speech down, 1.5 speeds it up
        extra_body={
            "speech_synthesis_language": "en-US",
            "sample_rate": 24000,
        },
    )

    # Save the streamed audio to a file; stream_to_file handles
    # sync vs. async behavior automatically
    response.stream_to_file("output.wav")
    print("Audio saved to output.wav")

# Run the example
if __name__ == "__main__":
    tts_streaming_sync()