Automatic Speech Recognition (ASR) Transcription API

The Automatic Speech Recognition (ASR) transcription API generates text transcriptions of an input audio file using the AIRefinery or the AsyncAIRefinery client.

Note: This API currently supports batch synthesis only. Streaming output capabilities will be available in a future release.

Asynchronous Transcription

AsyncAIRefinery.audio.transcriptions.create()

This method asynchronously generates a text transcription of an input audio file.

Parameters:
  • model (string, Required): Model ID of the ASR model used to generate the transcription. For detailed information on supported models, see the Automatic Speech Recognition Models section in the model catalog.
  • file (IO[bytes], Required): An open file-like object containing the audio to transcribe, in WAV or PCM format.
  • chunking_strategy (string or ChunkingStrategy, Optional): Parameters to configure server-side VAD and chunking. Accepts "auto" or a ChunkingStrategy object. (default "auto").
    • ChunkingStrategy object attributes:
      • type ("server_vad", Required): Selects server-side VAD chunking.
      • prefix_padding_ms (integer, 0-5000 ms, Optional): Lead-in audio retained before detected speech, giving context.
      • silence_duration_ms (integer, 0-5000 ms, Optional): Trailing silence that closes a chunk.
      • threshold (float, 0.0-1.0, Optional): VAD sensitivity (currently ignored by the service).
  • language (string, Optional): Specifies which language the recognizer should detect and transcribe. (default "en-US").
  • response_format (string, Optional): Desired output format. Currently only "json" is supported. (default "json").
  • stream (boolean, Optional): If True, enables streaming of transcription responses. Currently not supported; reserved for future use. (default False).
  • extra_headers (map, Optional): Additional HTTP headers to include with the request.
  • extra_body (map, Optional): Additional body fields to merge/override the top-level parameters.
  • timeout (integer, Optional): Request timeout in seconds. (default 60).
Returns:
  • Returns an ASRResponse object, which contains the following attributes:

    • text (string | null): The transcription of the audio file. null if no text could be produced.
    • success (boolean): true if the transcription request completed successfully; false otherwise.
    • error (string | null): Error message describing why transcription failed. Present only when success is false.
    • confidence (number | null): Overall confidence score for the transcription (0-1). Omitted if the service does not provide a score.
Example Usage
import asyncio
import os

from air import login
from air.client import AsyncAIRefinery # a non-async AIRefinery client is also supported

# Authenticate using environment variables for account and API key  
auth = login(
    account=str(os.getenv("ACCOUNT")), # Fetching AI Refinery ACCOUNT from environment variables
    api_key=str(os.getenv("API_KEY")), # Fetching AI Refinery API_KEY from environment variables
)
base_url = os.getenv("AIREFINERY_ADDRESS", "") # Fetching AI Refinery Address from environment variables

async def generate_transcription(file_name):
    # Initialize the AI Refinery client with authentication details
    client = AsyncAIRefinery(**auth.openai(base_url=base_url))
    # Open the audio file and request a transcription using the specified model
    with open(file_name, "rb") as audio_file:
        transcription = await client.audio.transcriptions.create(
            model="Azure/AI-Transcription",  # Model to use for the transcription
            file=audio_file,  # Pass the open audio file object
        )

    # Print and return the transcription text
    print(transcription.text)
    return transcription.text

# Example call to the generate_transcription function
if __name__ == "__main__":
    asyncio.run(generate_transcription("audio/sample1.wav"))

Synchronous Transcription

AIRefinery.audio.transcriptions.create()

This method synchronously generates a text transcription of an input audio file. It supports the same parameters and return structure as the asynchronous method AsyncAIRefinery.audio.transcriptions.create() described above.

Example Usage
import os

from air import login
from air.client import AIRefinery

# Authenticate using environment variables for account and API key  
auth = login(
    account=str(os.getenv("ACCOUNT")), # Fetching AI Refinery ACCOUNT from environment variables
    api_key=str(os.getenv("API_KEY")), # Fetching AI Refinery API_KEY from environment variables
)
base_url = os.getenv("AIREFINERY_ADDRESS", "") # Fetching AI Refinery Address from environment variables

def generate_transcription(file_name):
    # Initialize the AI Refinery client with authentication details
    client = AIRefinery(**auth.openai(base_url=base_url))

    # Open the audio file and request a transcription using the specified model
    with open(file_name, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="Azure/AI-Transcription",  # Model to use for the transcription
            file=audio_file,  # Pass the open audio file object
        )

    # Print and return the transcription text
    print(transcription.text)
    return transcription.text

# Example call to the generate_transcription function
if __name__ == "__main__":
    generate_transcription("audio/sample1.wav")