Automatic Speech Recognition (ASR) Transcription API

The Automatic Speech Recognition (ASR) transcription API generates text transcriptions of an input audio file using the AIRefinery or the AsyncAIRefinery client.

This API supports two modes: batch inference, which processes the complete audio file and returns the final transcription once processing finishes, and streaming, which returns transcription results incrementally as the audio is processed.

Asynchronous Transcription

AsyncAIRefinery.audio.transcriptions.create()

This method asynchronously generates the text transcription of an input audio file.

Parameters
Parameter Type Description
model string (required) Model ID of the ASR model used to generate the transcription.
file IO[bytes] (required) Open file-like object containing the audio to transcribe, in WAV or PCM format.
chunking_strategy string | ChunkingStrategy (optional) Configures server-side VAD and chunking. Accepts "auto" or a ChunkingStrategy object. (default: "auto")
language string (optional) Language to detect and transcribe. (default: "en-US")
response_format string (optional) Desired output format. Supported values: "json", "verbose_json". (default: "json")
timestamp_granularities List[string] (optional) Timestamp types to include in the response. Supported values: "segment", "word". Requires response_format="verbose_json".
stream boolean (optional) If True, enables streaming transcription output. (default: False)
extra_headers map (optional) Additional HTTP headers to include with the request.
extra_body map (optional) Additional fields to merge with or override top-level request parameters.
timeout integer (optional) Request timeout in seconds. (default: 60)

Chunking Strategy (ChunkingStrategy)

Field Type Description
type string ("server_vad") Enables server-side voice activity detection (VAD)–based chunking.
prefix_padding_ms integer (0–5000 ms, optional) Lead-in audio retained before detected speech. Recommended value: ≥4000 ms.
silence_duration_ms integer (0–5000 ms, optional) Trailing silence duration that marks the end of a chunk. Recommended value: 5000 ms.
threshold float (0.0–1.0, optional) VAD sensitivity threshold. Currently ignored.

Note
For audio files with initial silence, set prefix_padding_ms to at least 4000 ms to avoid premature cutoff of detected speech.
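
For example, the chunking configuration can be passed as a ChunkingStrategy object. The sketch below (using the air.types.audio.ChunkingStrategy import shown in the examples further down) tunes server-side VAD for audio that begins with silence:

from air.types.audio import ChunkingStrategy

# Server-side VAD chunking tuned for audio that starts with silence
chunking_strategy = ChunkingStrategy(
    type="server_vad",
    prefix_padding_ms=4000,    # keep 4 s of lead-in audio before detected speech
    silence_duration_ms=5000,  # end a chunk after 5 s of trailing silence
)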


Returns:
Batch Inference

The entire audio file is uploaded and processed as a single request, and the final transcription is returned only after processing is complete.

  • ASRResponse

    In this mode (stream=False, the default), when timestamp_granularities is not specified, the API returns an ASRResponse object.

    Field Type Description
    text string | null Transcription of the audio file. null if no text was produced.
  • TranscriptionVerbose

    When timestamp_granularities is included in the request (together with response_format="verbose_json"), the API returns a TranscriptionVerbose object.

    TranscriptionVerbose

    Field Type Description
    task string ("transcribe") Type of task performed. Always "transcribe".
    language string Detected or specified language code (e.g., en-US, fr-FR).
    duration float Total duration of the audio in seconds.
    text string Complete transcribed text aggregated from all segments.
    segments List[Segment] Segment-level transcription results. Included when "segment" is requested in timestamp_granularities.
    words List[Word] (optional) Word-level timing and confidence data. Included when "word" is requested in timestamp_granularities.
    speakers List[string] (optional) List of unique speaker identifiers detected in the audio.

    Segment (TranscriptionVerbose.Segment)

    Field Type Description
    id integer Unique identifier for the segment.
    seek float Offset indicating where the segment starts in the original audio.
    start float Start time of the segment in seconds.
    end float End time of the segment in seconds.
    text string Transcribed text for this segment.
    avg_logprob float Average log probability of word-level confidence scores within the segment.
    compression_ratio float Average characters-per-word compression ratio for the segment.
    speaker_id string (optional) Speaker label (e.g., "Guest-1", "Guest-2", …, "Guest-N" or "Unknown").

    Word (TranscriptionVerbose.Word)

    Field Type Description
    word string Transcribed word text.
    start float Start time of the word in seconds.
    end float End time of the word in seconds.
    confidence float (0.0–1.0, optional) Word-level confidence score.
    segment integer (optional) ID of the segment this word belongs to.
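
A verbose batch result can be consumed roughly as follows. This is a minimal sketch, assuming a TranscriptionVerbose object returned with both "segment" and "word" granularities requested (as in the detailed example further below); fields that were not requested may be absent.

def print_verbose_result(transcription):
    # transcription: TranscriptionVerbose with timestamp_granularities=["segment", "word"]
    print(f"Language: {transcription.language}, duration: {transcription.duration}s")

    # Word-level timing and confidence (present when "word" is requested)
    for word in transcription.words or []:
        print(f"{word.word} [{word.start:.2f}s - {word.end:.2f}s] (confidence: {word.confidence})")

    # Segment-level results with optional speaker attribution (present when "segment" is requested)
    for segment in transcription.segments or []:
        print(f"[{segment.start:.2f}s - {segment.end:.2f}s] {segment.speaker_id}: {segment.text}")
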
Streaming

Transcription results are returned incrementally as the audio is processed, enabling display of partial transcription results before the full transcription is complete.

In this mode (stream=True), the API returns an AsyncStream[TranscriptionStreamEvent] object, which yields:

  • TranscriptionTextDeltaEvent

    Represents an incremental transcription update emitted during streaming. Provides a newly transcribed text segment (“delta”) as it becomes available, enabling display of partial results.

    Field Type Description
    delta string Newly transcribed text segment emitted as a partial update.
    type string ("transcript.text.delta") Event type identifier. Always "transcript.text.delta".
    logprobs array | null Optional token-level log probabilities associated with the delta.
  • TranscriptionTextDoneEvent

    Represents the final transcription result emitted at the end of audio processing. Marks the completion of the transcription stream and contains the full transcribed text.

    Field Type Description
    text string Complete transcription of the audio input.
    type string ("transcript.text.done") Event type identifier. Always "transcript.text.done".
    logprobs array | null Optional token-level log probabilities for the final transcription.
  • TranscriptionWordEvent

    Represents a real-time word-level transcription event with timing and confidence.

    This event provides detailed word-level information as it becomes available during streaming transcription, including precise timing and confidence scores.

    Emitted only when "word" is included in timestamp_granularities.

    Field Type Description
    word string Transcribed word text.
    start float Start time of the word in seconds.
    end float End time of the word in seconds.
    confidence float (0.0–1.0) Confidence score for the word.
    segment integer Segment ID the word belongs to.
    type string ("transcript.word") Event type identifier. Always "transcript.word".
  • TranscriptionSegmentEvent

    Represents a real-time segment-level transcription event with timing and metadata.

    This event provides detailed segment-level information as it becomes available during streaming transcription, including timing, confidence statistics, and speaker attribution.

    Emitted only when "segment" is included in timestamp_granularities.

    Field Type Description
    segment TranscriptionVerbose.Segment Complete segment data with timing and metadata.
    type string ("transcript.segment") Event type identifier. Always "transcript.segment".

Example Usage:
Batch Inference (Basic response - text transcription only)
import asyncio
import os
from air import AsyncAIRefinery
from dotenv import load_dotenv

# Load environment variables from .env file (contains API_KEY)
load_dotenv()
api_key = str(os.getenv("API_KEY"))

async def generate_transcription(file_name):
    # Initialize the async client with your API key
    client = AsyncAIRefinery(api_key=api_key)

    # Open audio file in binary read mode (supports WAV or PCM format)
    audio_file = open(file_name, "rb")

    # Send transcription request and wait for complete result (batch mode)
    # Returns an ASRResponse; the transcribed text is available as .text
    transcription = await client.audio.transcriptions.create(
        model="Azure/AI-Transcription",  # ASR model ID
        file=audio_file,
    )

    # Access the transcribed text from the response
    print(transcription.text)
    return transcription.text

if __name__ == "__main__":
    asyncio.run(generate_transcription("audio/sample1.wav"))
Batch Inference (Detailed response - Transcription with Timestamps)
import asyncio
import os
from air import AsyncAIRefinery
from air.types.audio import ChunkingStrategy
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()
api_key = str(os.getenv("API_KEY"))

async def generate_verbose_transcription(file_name):
    client = AsyncAIRefinery(api_key=api_key)
    audio_file = open(file_name, "rb")

    # Request verbose transcription with segment and word-level timestamps
    # Returns TranscriptionVerbose with detailed timing and speaker info
    transcription = await client.audio.transcriptions.create(
        model="Azure/AI-Transcription",
        file=audio_file,
        response_format="verbose_json",  # Required for timestamp data
        timestamp_granularities=["segment", "word"],  # Request both segment and word timestamps
        # Configure Voice Activity Detection (VAD) for chunking
        chunking_strategy=ChunkingStrategy(
            type="server_vad",  # Use server-side VAD
            prefix_padding_ms=4000,  # Keep 4s of audio before detected speech
            silence_duration_ms=5000,  # End chunk after 5s of silence
            threshold=1,  # VAD sensitivity (currently ignored by server)
        ),
    )

    # Access aggregated transcription text and total audio duration
    print(f"Full text: {transcription.text}")
    print(f"Duration: {transcription.duration}s")

    # Iterate through segments with timing and speaker attribution
    for segment in transcription.segments:
        print(f"[{segment.start:.2f}s - {segment.end:.2f}s] {segment.speaker_id}: {segment.text}")

    return transcription

if __name__ == "__main__":
    asyncio.run(generate_verbose_transcription("audio/sample1.wav"))
Streaming Inference (Basic response - text transcription only)
import asyncio
import os
from air import AsyncAIRefinery
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()
api_key = str(os.getenv("API_KEY"))

async def generate_transcription(file_name):
    client = AsyncAIRefinery(api_key=api_key)
    audio_file = open(file_name, "rb")

    # Enable streaming mode to receive transcription results incrementally
    # Returns AsyncStream[TranscriptionStreamEvent] for real-time processing
    transcription_stream = await client.audio.transcriptions.create(
        model="Azure/AI-Transcription",
        file=audio_file,
        stream=True,  # Enable streaming mode
    )

    print("\n[Streaming Transcription Output]")
    # Iterate over stream events as they arrive
    # Events: TranscriptionTextDeltaEvent (partial) and TranscriptionTextDoneEvent (final)
    async for event in transcription_stream:
        print(event)

if __name__ == "__main__":
    asyncio.run(generate_transcription("audio/sample1.wav"))
Streaming Inference (Detailed response - Transcription with Timestamps)
import asyncio
import os
from air import AsyncAIRefinery
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()
api_key = str(os.getenv("API_KEY"))

async def generate_streaming_transcription_with_timestamps(file_name):
    client = AsyncAIRefinery(api_key=api_key)
    audio_file = open(file_name, "rb")

    # Combine streaming with verbose output for real-time timestamps
    # Emits word and segment events as audio is processed
    transcription_stream = await client.audio.transcriptions.create(
        model="Azure/AI-Transcription",
        file=audio_file,
        response_format="verbose_json",  # Required for timestamp events
        stream=True,  # Enable streaming mode
        timestamp_granularities=["segment", "word"],  # Request both granularities
    )

    print("\n[Streaming Transcription with Timestamps]")

    # Process each event based on its type
    async for event in transcription_stream:
        if hasattr(event, "type"):
            event_type = event.type

            # TranscriptionTextDeltaEvent: incremental text updates
            if event_type == "transcript.text.delta":
                delta = getattr(event, "delta", "")
                print(f"Delta: {delta}")

            # TranscriptionWordEvent: word-level timing and confidence
            elif event_type == "transcript.word":
                word = getattr(event, "word", "")
                start = getattr(event, "start", 0)
                end = getattr(event, "end", 0)
                confidence = getattr(event, "confidence", 0)
                print(f"Word: {word} [{start:.2f}s - {end:.2f}s] (confidence: {confidence:.2f})")

            # TranscriptionSegmentEvent: segment with speaker attribution
            elif event_type == "transcript.segment":
                segment = getattr(event, "segment", None)
                if segment is not None:
                    # segment is a TranscriptionVerbose.Segment, so use attribute access
                    speaker_id = segment.speaker_id or "Unknown"
                    print(f"Segment: [{segment.start:.2f}s - {segment.end:.2f}s] {speaker_id}: {segment.text}")

            # TranscriptionTextDoneEvent: final complete transcription
            elif event_type == "transcript.text.done":
                text = getattr(event, "text", "")
                print(f"\nFinal text: {text}")

if __name__ == "__main__":
    asyncio.run(generate_streaming_transcription_with_timestamps("audio/sample1.wav"))
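
Note that the examples above open the audio file without explicitly closing it. In longer-running applications you may prefer a standard context manager so the handle is released once the request has been sent; a minimal variation of the first example:

async def generate_transcription(file_name):
    client = AsyncAIRefinery(api_key=api_key)

    # The file handle is closed automatically when the with-block exits
    with open(file_name, "rb") as audio_file:
        transcription = await client.audio.transcriptions.create(
            model="Azure/AI-Transcription",
            file=audio_file,
        )

    print(transcription.text)
    return transcription.text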

Synchronous Transcription

AIRefinery.audio.transcriptions.create()

This method synchronously generates the text transcription of an input audio file. It supports the same parameters and return structure as the asynchronous method.

Example Usage:
Batch Inference (Basic response - text transcription only)
import os
from air import AIRefinery
from dotenv import load_dotenv

# Load environment variables from .env file (contains API_KEY)
load_dotenv()
api_key = str(os.getenv("API_KEY"))

def generate_transcription(file_name):
    # Initialize the synchronous client with your API key
    client = AIRefinery(api_key=api_key)

    # Open audio file in binary read mode (supports WAV or PCM format)
    audio_file = open(file_name, "rb")

    # Send transcription request and wait for complete result (batch mode)
    # Returns an ASRResponse; the transcribed text is available as .text
    transcription = client.audio.transcriptions.create(
        model="Azure/AI-Transcription",  # ASR model ID
        file=audio_file,
    )

    # Access the transcribed text from the response
    print(transcription.text)
    return transcription.text

if __name__ == "__main__":
    generate_transcription("audio/sample1.wav")
Batch Inference (Detailed response - Transcription with Timestamps)
import os
from air import AIRefinery
from air.types.audio import ChunkingStrategy
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()
api_key = str(os.getenv("API_KEY"))

def generate_verbose_transcription(file_name):
    client = AIRefinery(api_key=api_key)
    audio_file = open(file_name, "rb")

    # Request verbose transcription with segment and word-level timestamps
    # Returns TranscriptionVerbose with detailed timing and speaker info
    transcription = client.audio.transcriptions.create(
        model="Azure/AI-Transcription",
        file=audio_file,
        response_format="verbose_json",  # Required for timestamp data
        timestamp_granularities=["segment", "word"],  # Request both segment and word timestamps
        # Configure Voice Activity Detection (VAD) for chunking
        chunking_strategy=ChunkingStrategy(
            type="server_vad",  # Use server-side VAD
            prefix_padding_ms=4000,  # Keep 4s of audio before detected speech
            silence_duration_ms=5000,  # End chunk after 5s of silence
            threshold=1,  # VAD sensitivity (currently ignored by server)
        ),
    )

    # Access aggregated transcription text and total audio duration
    print(f"Full text: {transcription.text}")
    print(f"Duration: {transcription.duration}s")

    # Iterate through segments with timing and speaker attribution
    for segment in transcription.segments:
        print(f"[{segment.start:.2f}s - {segment.end:.2f}s] {segment.speaker_id}: {segment.text}")

    return transcription

if __name__ == "__main__":
    generate_verbose_transcription("audio/sample1.wav")
Streaming Inference (Basic response - text transcription only)
import os
from air import AIRefinery
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()
api_key = str(os.getenv("API_KEY"))

def generate_transcription(file_name):
    client = AIRefinery(api_key=api_key)
    audio_file = open(file_name, "rb")

    # Enable streaming mode to receive transcription results incrementally
    # Returns Stream[TranscriptionStreamEvent] for real-time processing
    transcription_stream = client.audio.transcriptions.create(
        model="Azure/AI-Transcription",
        file=audio_file,
        stream=True,  # Enable streaming mode
    )

    # Iterate over stream events as they arrive
    # Events: TranscriptionTextDeltaEvent (partial) and TranscriptionTextDoneEvent (final)
    for event in transcription_stream:
        print(event)

if __name__ == "__main__":
    generate_transcription("audio/sample1.wav")
Streaming Inference (Detailed response - Transcription with Timestamps)
import os
from air import AIRefinery
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()
api_key = str(os.getenv("API_KEY"))

def generate_streaming_transcription_with_timestamps(file_name):
    client = AIRefinery(api_key=api_key)
    audio_file = open(file_name, "rb")

    # Combine streaming with verbose output for real-time timestamps
    # Emits word and segment events as audio is processed
    transcription_stream = client.audio.transcriptions.create(
        model="Azure/AI-Transcription",
        file=audio_file,
        response_format="verbose_json",  # Required for timestamp events
        stream=True,  # Enable streaming mode
        timestamp_granularities=["segment", "word"],  # Request both granularities
    )

    print("\n[Streaming Transcription with Timestamps]")

    # Process each event based on its type
    for event in transcription_stream:
        if hasattr(event, "type"):
            event_type = event.type

            # TranscriptionTextDeltaEvent: incremental text updates
            if event_type == "transcript.text.delta":
                delta = getattr(event, "delta", "")
                print(f"Delta: {delta}")

            # TranscriptionWordEvent: word-level timing and confidence
            elif event_type == "transcript.word":
                word = getattr(event, "word", "")
                start = getattr(event, "start", 0)
                end = getattr(event, "end", 0)
                confidence = getattr(event, "confidence", 0)
                print(f"Word: {word} [{start:.2f}s - {end:.2f}s] (confidence: {confidence:.2f})")

            # TranscriptionSegmentEvent: segment with speaker attribution
            elif event_type == "transcript.segment":
                segment = getattr(event, "segment", None)
                if segment is not None:
                    # segment is a TranscriptionVerbose.Segment, so use attribute access
                    speaker_id = segment.speaker_id or "Unknown"
                    print(f"Segment: [{segment.start:.2f}s - {segment.end:.2f}s] {speaker_id}: {segment.text}")

            # TranscriptionTextDoneEvent: final complete transcription
            elif event_type == "transcript.text.done":
                text = getattr(event, "text", "")
                print(f"\nFinal text: {text}")

if __name__ == "__main__":
    generate_streaming_transcription_with_timestamps("audio/sample1.wav")