Transcribe - From audio file
POST /transcribe
Generate a transcript from an audio file. Only audio/* MIME types are supported. The maximum duration is 10 minutes. For longer files, use the asynchronous equivalent.
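Both constraints above can be checked on the client before uploading. The following is a minimal Python sketch, not part of the API: the helper names `is_audio_file` and `fits_sync_limit` are hypothetical, the MIME type is guessed from the file extension only (the file content is not inspected), and the audio duration is assumed to be already known to the caller.

```python
import mimetypes

# 10-minute limit stated for this synchronous endpoint, in seconds.
MAX_SYNC_DURATION_S = 10 * 60


def is_audio_file(filename: str) -> bool:
    """Return True if the file extension maps to an audio/* MIME type."""
    mime, _ = mimetypes.guess_type(filename)
    return mime is not None and mime.startswith("audio/")


def fits_sync_limit(duration_s: float) -> bool:
    """Return True if the audio is short enough for the synchronous endpoint."""
    return 0 < duration_s <= MAX_SYNC_DURATION_S
```

A file that fails `fits_sync_limit` is a candidate for the asynchronous equivalent mentioned above.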
Request
- multipart/form-data
Body
required
request_parameters object required
The object containing all the information needed along with the audio file to transcribe.
Possible values: [ENGLISH_US, ENGLISH_UK, SPANISH_ES, SPANISH_MX, FRENCH_FR, ARABIC_EG, ARABIC_LB, ARABIC_MA, ARABIC_SA, ARMENIAN_AM, BENGALI_IN, CANTONESE_CN, CROATIAN_HR, FILIPINO_PH, GERMAN_DE, GREEK_GR, GUJARATI_IN, HEBREW_IL, HINDI_IN, ITALIAN_IT, JAPANESE_JP, KHMER_KH, KOREAN_KR, MANDARIN_CN, PERSIAN_IR, POLISH_PL, PORTUGUESE_PT, PUNJABI_IN, RUSSIAN_RU, SERBIAN_RS, TAMIL_IN, TELUGU_IN, THAI_TH, URDU_IN, VIETNAMESE_VN]
The spoken or written locale of the transcript, representing both the language and its specific regional variant.
Indicates whether to segment transcription results at sentence boundaries. Default is false, meaning that a single transcript item may encompass multiple sentences, provided they are not delineated by pauses (silence) in the audio.
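Because the locale must be one of the listed values, a client may want to validate it before sending a request. The sketch below is illustrative only: `SUPPORTED_LOCALES` is copied from the list above, while `normalize_locale` and `locales_for_language` are hypothetical helper names. The upper-casing is a client-side convenience and does not imply that the API itself accepts lower-case values.

```python
# Locale values copied from the "Possible values" list in this section.
SUPPORTED_LOCALES = frozenset({
    "ENGLISH_US", "ENGLISH_UK", "SPANISH_ES", "SPANISH_MX", "FRENCH_FR",
    "ARABIC_EG", "ARABIC_LB", "ARABIC_MA", "ARABIC_SA", "ARMENIAN_AM",
    "BENGALI_IN", "CANTONESE_CN", "CROATIAN_HR", "FILIPINO_PH", "GERMAN_DE",
    "GREEK_GR", "GUJARATI_IN", "HEBREW_IL", "HINDI_IN", "ITALIAN_IT",
    "JAPANESE_JP", "KHMER_KH", "KOREAN_KR", "MANDARIN_CN", "PERSIAN_IR",
    "POLISH_PL", "PORTUGUESE_PT", "PUNJABI_IN", "RUSSIAN_RU", "SERBIAN_RS",
    "TAMIL_IN", "TELUGU_IN", "THAI_TH", "URDU_IN", "VIETNAMESE_VN",
})


def normalize_locale(value: str) -> str:
    """Trim and upper-case a locale, raising ValueError if it is not listed."""
    candidate = value.strip().upper()
    if candidate not in SUPPORTED_LOCALES:
        raise ValueError(f"Unsupported locale: {value!r}")
    return candidate


def locales_for_language(language: str) -> list[str]:
    """Return the listed regional variants for a language, sorted by name."""
    prefix = language.strip().upper() + "_"
    return sorted(loc for loc in SUPPORTED_LOCALES if loc.startswith(prefix))
```

For example, `locales_for_language("arabic")` returns the four Arabic variants from the list.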
Responses
- 200
Results of processing the audio file.
- application/json
Schema
transcript object[] required
Transcript items from the audio file.
text
The transcribed text.
speaker_type
Possible values: [DOCTOR, PATIENT, UNSPECIFIED]
Who said the text in this transcript item.
start_offset_ms
Start time of this transcript item, as an offset in milliseconds from the start of the audio file.
end_offset_ms
End time of this transcript item, as an offset in milliseconds from the start of the audio file. Equals start_offset_ms plus the duration of the corresponding audio portion.
Example (from schema)
{
  "transcript": [
    {
      "text": "Also, I'm allergic to peanuts.",
      "speaker_type": "DOCTOR",
      "start_offset_ms": 65100,
      "end_offset_ms": 69300
    }
  ]
}
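A response body of this shape can be parsed into typed items on the client. The following Python sketch is an illustration rather than an official client: the `TranscriptItem` class and the `parse_transcript` and `format_item` helpers are hypothetical names, and the code assumes every item carries the four fields that appear in the example response.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptItem:
    text: str
    speaker_type: str
    start_offset_ms: int
    end_offset_ms: int

    @property
    def duration_ms(self) -> int:
        """Length of the transcribed audio portion, in milliseconds."""
        return self.end_offset_ms - self.start_offset_ms


def parse_transcript(body: str) -> list[TranscriptItem]:
    """Parse a JSON response body into a list of transcript items."""
    payload = json.loads(body)
    return [
        TranscriptItem(
            text=item["text"],
            speaker_type=item["speaker_type"],
            start_offset_ms=item["start_offset_ms"],
            end_offset_ms=item["end_offset_ms"],
        )
        for item in payload["transcript"]
    ]


def format_item(item: TranscriptItem) -> str:
    """Render one item as '[<start in seconds>s] SPEAKER: text'."""
    return f"[{item.start_offset_ms / 1000:.1f}s] {item.speaker_type}: {item.text}"
```

Applied to the example response, `parse_transcript` yields a single item whose `duration_ms` is 4200.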