Automatic Speech Recognition (ASR) Transcription API¶
The Automatic Speech Recognition (ASR) transcription API generates text transcriptions of an input audio file using the AIRefinery or the AsyncAIRefinery client.
Note: This API currently supports batch transcription only. Streaming output capabilities will be available in a future release.
Asynchronous Transcription¶
AsyncAIRefinery.audio.transcriptions.create()
¶
This method asynchronously generates a text transcription of an input audio file.
Parameters:¶
- `model` (string, Required): Model ID of the ASR model used to generate the transcription. For detailed information on supported models, see the Automatic Speech Recognition Models in the model catalog.
- `file` (IO[bytes], Required): Open file-like object containing the audio to transcribe, in wav or pcm format.
- `chunking_strategy` (string or `ChunkingStrategy`, Optional): Parameters to configure server-side VAD and chunking. Accepts `"auto"` or a `ChunkingStrategy` object. (default `"auto"`). `ChunkingStrategy` object attributes:
    - `type` (`"server_vad"`, Required): Selects server-side VAD chunking.
    - `prefix_padding_ms` (integer, 0-5000 ms, Optional): Lead-in audio retained before detected speech, giving context.
    - `silence_duration_ms` (integer, 0-5000 ms, Optional): Trailing silence that closes a chunk.
    - `threshold` (float, 0.0-1.0, Optional): VAD sensitivity (currently ignored by the service).
- `language` (string, Optional): Specifies which language the recognizer should detect and transcribe. (default `"en-US"`).
- `response_format` (string, Optional): Desired output format. Currently only `"json"` is supported. (default `"json"`).
- `stream` (boolean, Optional): If `True`, enables streaming transcription responses. Currently not supported; reserved for future use. (default `False`).
- `extra_headers` (map, Optional): Additional HTTP headers to include with the request.
- `extra_body` (map, Optional): Additional body fields that merge with or override the top-level parameters.
- `timeout` (integer, Optional): Request timeout in seconds. (default `60`).
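As a sketch of how the chunking parameters above fit together, the snippet below assembles them as a plain dict mirroring the documented `ChunkingStrategy` attributes (the exact import path of the library's `ChunkingStrategy` class is not shown in this document, so a dict stand-in and a hypothetical `validate_chunking_strategy` helper are used here):

```python
# Hypothetical sketch: server-side VAD chunking parameters as a plain dict,
# mirroring the ChunkingStrategy attributes documented above.
chunking_strategy = {
    "type": "server_vad",        # Required: selects server-side VAD chunking
    "prefix_padding_ms": 300,    # Keep 300 ms of lead-in audio before detected speech
    "silence_duration_ms": 500,  # 500 ms of trailing silence closes a chunk
    "threshold": 0.5,            # VAD sensitivity (currently ignored by the service)
}


def validate_chunking_strategy(cfg: dict) -> bool:
    """Lightweight client-side check mirroring the documented value ranges."""
    return (
        cfg.get("type") == "server_vad"
        and 0 <= cfg.get("prefix_padding_ms", 0) <= 5000
        and 0 <= cfg.get("silence_duration_ms", 0) <= 5000
        and 0.0 <= cfg.get("threshold", 0.0) <= 1.0
    )


print(validate_chunking_strategy(chunking_strategy))  # True
```

Such a dict (or the equivalent `ChunkingStrategy` object) would be passed as the `chunking_strategy` argument of `transcriptions.create()` in place of the `"auto"` default.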
Returns:¶
- Returns an `ASRResponse` object, which contains the following attributes:
    - `text` (string | null): The transcription of the audio file; `null` if no text could be produced.
    - `success` (boolean): `true` if the transcription request completed successfully; `false` otherwise.
    - `error` (string | null): Error message describing why transcription failed. Present only when `success` is `false`.
    - `confidence` (number | null): Overall confidence score for the transcription (0-1). Omitted if the service does not provide a score.
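A minimal sketch of consuming these fields follows; the `ASRResponseStub` dataclass below is a stand-in that mirrors the attributes listed above, not the library's actual class:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ASRResponseStub:
    """Stand-in mirroring the documented ASRResponse attributes."""
    text: Optional[str] = None
    success: bool = False
    error: Optional[str] = None
    confidence: Optional[float] = None


def summarize(resp: ASRResponseStub) -> str:
    # Per the docs, `error` is present only when `success` is false,
    # and `confidence` may be omitted by the service.
    if not resp.success:
        return f"transcription failed: {resp.error}"
    conf = f" (confidence {resp.confidence:.2f})" if resp.confidence is not None else ""
    return f"{resp.text}{conf}"


print(summarize(ASRResponseStub(text="hello world", success=True, confidence=0.93)))
print(summarize(ASRResponseStub(success=False, error="unsupported codec")))
```

Checking `success` before reading `text` avoids treating a `null` transcription as valid output.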
Example Usage¶
import asyncio
import os

from air import login
from air.client import AsyncAIRefinery  # a synchronous AIRefinery client is also supported

# Authenticate using environment variables for account and API key
auth = login(
    account=str(os.getenv("ACCOUNT")),  # AI Refinery ACCOUNT from environment variables
    api_key=str(os.getenv("API_KEY")),  # AI Refinery API_KEY from environment variables
)
base_url = os.getenv("AIREFINERY_ADDRESS", "")  # AI Refinery address from environment variables


async def generate_transcription(file_name):
    # Initialize the AI Refinery client with authentication details
    client = AsyncAIRefinery(**auth.openai(base_url=base_url))

    # Open the audio file and request a transcription using the specified model
    with open(file_name, "rb") as audio_file:
        transcription = await client.audio.transcriptions.create(
            model="Azure/AI-Transcription",  # Model to use for generating the transcription
            file=audio_file,  # Open audio file object
        )

    # Print and return the transcription text
    print(transcription.text)
    return transcription.text


# Example call to the generate_transcription function
if __name__ == "__main__":
    asyncio.run(generate_transcription("audio/sample1.wav"))
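Because the `file` parameter expects wav or pcm audio, a quick client-side header check can catch format mistakes before the upload. The `is_wav` helper below is an illustrative sketch using only the standard library, not part of the AI Refinery SDK:

```python
import io
import wave


def is_wav(fileobj) -> bool:
    """Return True if the file-like object starts with a RIFF/WAVE header."""
    header = fileobj.read(12)
    fileobj.seek(0)  # Rewind so the same object can still be uploaded
    return header[:4] == b"RIFF" and header[8:12] == b"WAVE"


# Build a tiny in-memory wav file to demonstrate the check
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)       # Mono
    w.setsampwidth(2)       # 16-bit samples
    w.setframerate(16000)   # 16 kHz
    w.writeframes(b"\x00\x00" * 160)  # 10 ms of silence
buf.seek(0)

print(is_wav(buf))  # True
```

Note that raw pcm files have no header, so this check applies only to wav input; pcm data would need to be validated against its expected sample format instead.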
Synchronous Transcription¶
AIRefinery.audio.transcriptions.create()
¶
This method synchronously generates a text transcription of an input audio file. It supports the same parameters and return structure as the asynchronous method AsyncAIRefinery.audio.transcriptions.create() described above.
Example Usage¶
import os

from air import login
from air.client import AIRefinery

# Authenticate using environment variables for account and API key
auth = login(
    account=str(os.getenv("ACCOUNT")),  # AI Refinery ACCOUNT from environment variables
    api_key=str(os.getenv("API_KEY")),  # AI Refinery API_KEY from environment variables
)
base_url = os.getenv("AIREFINERY_ADDRESS", "")  # AI Refinery address from environment variables


def generate_transcription(file_name):  # synchronous client, so no `async` here
    # Initialize the AI Refinery client with authentication details
    client = AIRefinery(**auth.openai(base_url=base_url))

    # Open the audio file and request a transcription using the specified model
    with open(file_name, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="Azure/AI-Transcription",  # Model to use for generating the transcription
            file=audio_file,  # Open audio file object
        )

    # Print and return the transcription text
    print(transcription.text)
    return transcription.text


# Example call to the generate_transcription function
if __name__ == "__main__":
    generate_transcription("audio/sample1.wav")