Automatic Speech Recognition (ASR) Transcription API¶
The Automatic Speech Recognition (ASR) transcription API generates text transcriptions of an input audio file using the `AIRefinery` or the `AsyncAIRefinery` client.
This API supports two modes: batch inference, which processes the complete audio file and returns the final transcription once processing finishes, and streaming, which returns transcription results incrementally as the audio is processed.
Asynchronous Transcription¶
AsyncAIRefinery.audio.transcriptions.create()¶
This method asynchronously generates the text transcription of an input audio file.
Parameters¶

- `model` (string, Required): Model ID of the ASR model to be used to generate the transcription.
- `file` (IO[bytes], Required): Open file-like object containing the audio to transcribe, in WAV or PCM format.
- `chunking_strategy` (string or ChunkingStrategy, Optional): Configures server-side VAD and chunking. Accepts `"auto"` or a `ChunkingStrategy` object. (default: `"auto"`)

  `ChunkingStrategy` attributes:
  - `type` (`"server_vad"`, Required): Selects server-side VAD chunking.
  - `prefix_padding_ms` (integer, 0–5000 ms, Optional): Lead-in audio retained before detected speech.
  - `silence_duration_ms` (integer, 0–5000 ms, Optional): Trailing silence duration that ends a chunk.
  - `threshold` (float, 0.0–1.0, Optional): VAD sensitivity (currently ignored).
- `language` (string, Optional): Language to detect and transcribe. (default: `"en-US"`)
- `response_format` (string, Optional): Desired output format. (default: `"json"`)
- `stream` (boolean, Optional): If `True`, enables streaming output. (default: `False`)
- `extra_headers` (map, Optional): Additional HTTP headers to include.
- `extra_body` (map, Optional): Additional fields to merge into or override top-level parameters.
- `timeout` (integer, Optional): Request timeout in seconds. (default: `60`)
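As a sketch of the `chunking_strategy` parameter, the snippet below builds a server-side VAD configuration using the attributes listed above. The dict form is an assumption for illustration; the SDK may instead expect an actual `ChunkingStrategy` object with the same fields, and the commented call shows where it would be passed.

```python
# Hypothetical VAD chunking configuration (dict form is an assumption;
# the SDK may require a ChunkingStrategy object with these same fields).
chunking_strategy = {
    "type": "server_vad",        # required: selects server-side VAD chunking
    "prefix_padding_ms": 300,    # keep 300 ms of audio before detected speech
    "silence_duration_ms": 500,  # end a chunk after 500 ms of trailing silence
}

# It would then be passed alongside the other parameters, e.g.:
# transcription = await client.audio.transcriptions.create(
#     model="Azure/AI-Transcription",
#     file=audio_file,
#     chunking_strategy=chunking_strategy,
# )
print(chunking_strategy["type"])
```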
Returns:¶
Batch Inference¶
The entire audio file is uploaded and processed as a single request, and the final transcription is returned only after processing is complete.
In this mode (`stream=False`, the default), the API returns an `ASRResponse` object with:

- `text` (string | null): The transcription of the audio file. `null` if no text was produced.
- `success` (boolean): Indicates whether the transcription request completed successfully.
- `error` (string | null): An optional error message describing why the transcription failed. `null` if no error occurred.
- `confidence` (number | null): An optional confidence score for the transcription, typically representing the average token confidence. `null` if unavailable.
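A minimal sketch of handling these fields before trusting the transcript. The `ASRResponse` dataclass below is a stand-in that mirrors the attributes above for illustration; in real code the object comes back from `client.audio.transcriptions.create()`.

```python
from dataclasses import dataclass
from typing import Optional


# Illustrative stand-in mirroring the ASRResponse fields documented above;
# the real class is returned by the SDK, not defined by user code.
@dataclass
class ASRResponse:
    text: Optional[str]
    success: bool
    error: Optional[str] = None
    confidence: Optional[float] = None


def handle_response(resp: ASRResponse) -> str:
    """Return the transcript, raising if the request failed."""
    if not resp.success:
        raise RuntimeError(f"Transcription failed: {resp.error}")
    # text may be null even on success (e.g. silent audio)
    return resp.text or ""


print(handle_response(ASRResponse(text="hello world", success=True, confidence=0.93)))
# prints: hello world
```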
Streaming¶
Transcription results are returned incrementally as the audio is processed, enabling display of partial transcription results before the full transcription is complete.
In this mode (`stream=True`), the API returns an `AsyncStream[TranscriptionStreamEvent]` object, which yields:

- `TranscriptionTextDeltaEvent`: An incremental transcription update emitted during streaming. Provides a newly transcribed text segment ("delta") as it becomes available, enabling display of partial results.
  - `delta` (string): The newly transcribed text segment.
  - `type` (`"transcript.text.delta"`): Event type identifier. Always `"transcript.text.delta"`.
  - `logprobs` (array | null): Optional token-level log probabilities for the `delta`.
- `TranscriptionTextDoneEvent`: The final transcription result emitted at the end of audio processing. Marks the completion of the transcription stream and contains the full transcribed text.
  - `text` (string): The complete transcription of the audio input.
  - `type` (`"transcript.text.done"`): Event type identifier. Always `"transcript.text.done"`.
  - `logprobs` (array | null): Optional token-level log probabilities for the transcription.
Example Usage:¶
Batch Inference¶
```python
import asyncio
import os

from dotenv import load_dotenv

from air import login
from air.client import AsyncAIRefinery

load_dotenv()
auth = login(account=os.getenv("ACCOUNT"), api_key=os.getenv("API_KEY"))
base_url = os.getenv("AIREFINERY_ADDRESS", "")


async def generate_transcription(file_name):
    client = AsyncAIRefinery(**auth.openai(base_url=base_url))
    # Context manager ensures the audio file is closed after the request
    with open(file_name, "rb") as audio_file:
        transcription = await client.audio.transcriptions.create(
            model="Azure/AI-Transcription",
            file=audio_file,
        )
    print(transcription.text)
    return transcription.text


if __name__ == "__main__":
    asyncio.run(generate_transcription("audio/sample1.wav"))
```
Streaming¶
```python
import asyncio
import os

from dotenv import load_dotenv

from air import login
from air.client import AsyncAIRefinery

load_dotenv()
auth = login(account=os.getenv("ACCOUNT"), api_key=os.getenv("API_KEY"))
base_url = os.getenv("AIREFINERY_ADDRESS", "")


async def generate_transcription(file_name):
    client = AsyncAIRefinery(**auth.openai(base_url=base_url))
    # Keep the file open while the stream is consumed
    with open(file_name, "rb") as audio_file:
        transcription_stream = await client.audio.transcriptions.create(
            model="Azure/AI-Transcription",
            file=audio_file,
            stream=True,
        )
        print("\n[Streaming Transcription Output]")
        async for event in transcription_stream:
            print(event)


if __name__ == "__main__":
    asyncio.run(generate_transcription("audio/sample1.wav"))
```
Synchronous Transcription¶
AIRefinery.audio.transcriptions.create()¶
This method synchronously generates the text transcription of an input audio file. It supports the same parameters and return structure as the asynchronous method.
Example Usage:¶
Batch Inference¶
```python
import os

from dotenv import load_dotenv

from air import login
from air.client import AIRefinery

load_dotenv()
auth = login(account=os.getenv("ACCOUNT"), api_key=os.getenv("API_KEY"))
base_url = os.getenv("AIREFINERY_ADDRESS", "")


def generate_transcription(file_name):
    client = AIRefinery(**auth.openai(base_url=base_url))
    # Context manager ensures the audio file is closed after the request
    with open(file_name, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="Azure/AI-Transcription",
            file=audio_file,
        )
    print(transcription.text)
    return transcription.text


if __name__ == "__main__":
    generate_transcription("audio/sample1.wav")
```
Streaming¶
```python
import os

from dotenv import load_dotenv

from air import login
from air.client import AIRefinery

load_dotenv()
auth = login(account=os.getenv("ACCOUNT"), api_key=os.getenv("API_KEY"))
base_url = os.getenv("AIREFINERY_ADDRESS", "")


def generate_transcription(file_name):
    client = AIRefinery(**auth.openai(base_url=base_url))
    # Keep the file open while the stream is consumed
    with open(file_name, "rb") as audio_file:
        transcription_stream = client.audio.transcriptions.create(
            model="Azure/AI-Transcription",
            file=audio_file,
            stream=True,
        )
        for event in transcription_stream:
            print(event)


if __name__ == "__main__":
    generate_transcription("audio/sample1.wav")
```