Transcribe — From audio file

POST /transcribe

Generate a transcript from an audio file. Only audio/* MIME types are supported. The maximum duration is 10 minutes; for longer files, use the asynchronous equivalent.
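The two constraints above (audio/* MIME type, 10-minute maximum) can be checked on the client before uploading. The following is a minimal sketch, not part of the API: the helper names are hypothetical, the MIME type is guessed from the file extension only, the duration helper handles WAV streams only, and treating exactly 10 minutes as acceptable is an assumption.

```python
import io
import mimetypes
import wave

# Documented limit for this endpoint: 10 minutes.
MAX_DURATION_SECONDS = 10 * 60


def is_audio_file(filename: str) -> bool:
    """Return True if the MIME type guessed from the extension is audio/*."""
    mime_type, _ = mimetypes.guess_type(filename)
    return mime_type is not None and mime_type.startswith("audio/")


def wav_duration_seconds(fileobj) -> float:
    """Return the duration of a WAV stream in seconds (WAV input only)."""
    with wave.open(fileobj, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_sync_endpoint(filename: str, duration_seconds: float) -> bool:
    """Assumption: the 10-minute limit is inclusive."""
    return is_audio_file(filename) and duration_seconds <= MAX_DURATION_SECONDS


# Demo: one second of 8 kHz mono silence, built in memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(8000)
    writer.writeframes(b"\x00\x00" * 8000)
buf.seek(0)
demo_duration = wav_duration_seconds(buf)
```

Files that fail this check would go to the asynchronous equivalent instead. The server remains the authority on what it accepts; a local check only avoids obviously doomed uploads.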

Request

multipart/form-data

Body

required

request_parameters: object, required

The object containing all the information needed along with the audio file to transcribe.

speech_locale: speech_locale (string), required

The spoken or written locale of the transcript, representing both the language and its specific regional variant.

Possible values: [ENGLISH_US, ENGLISH_UK, SPANISH_ES, SPANISH_MX, FRENCH_FR, ARABIC_EG, ARABIC_LB, ARABIC_MA, ARABIC_SA, ARMENIAN_AM, BENGALI_IN, CANTONESE_CN, CROATIAN_HR, FILIPINO_PH, GERMAN_DE, GREEK_GR, GUJARATI_IN, HEBREW_IL, HINDI_IN, ITALIAN_IT, JAPANESE_JP, KHMER_KH, KOREAN_KR, MANDARIN_CN, PERSIAN_IR, POLISH_PL, PORTUGUESE_PT, PUNJABI_IN, RUSSIAN_RU, SERBIAN_RS, TAMIL_IN, TELUGU_IN, THAI_TH, URDU_IN, VIETNAMESE_VN]

Example: ENGLISH_US

split_by_sentence: boolean

Whether to split transcript items at sentence boundaries. When false (the default), a single transcript item may span multiple sentences, as long as no pause (silence) in the audio separates them.

Default value: false

file: binary, required
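As an illustration of the request shape described above, the sketch below assembles a multipart/form-data body with the two documented parts, request_parameters and file, using only the Python standard library. Several details are assumptions rather than documented behavior: that request_parameters is sent as a JSON-encoded part, the helper names, and the base URL passed in the usage note. Authentication is not covered by this section and is omitted.

```python
import json
import urllib.request
import uuid


def build_transcribe_body(audio_bytes, filename, audio_mime_type,
                          speech_locale, split_by_sentence=False):
    """Return (body, content_type) for a multipart/form-data request."""
    boundary = uuid.uuid4().hex
    # Assumption: request_parameters travels as a JSON-encoded form part.
    params = json.dumps({
        "speech_locale": speech_locale,
        "split_by_sentence": split_by_sentence,
    })
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="request_parameters"\r\n'
        "Content-Type: application/json\r\n\r\n"
        f"{params}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {audio_mime_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    content_type = f"multipart/form-data; boundary={boundary}"
    return head + audio_bytes + tail, content_type


def make_request(base_url, body, content_type):
    """Build (but do not send) the POST /transcribe request."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

A caller would read the audio bytes from disk, call build_transcribe_body(audio, "visit.wav", "audio/wav", "ENGLISH_US"), pass the result to make_request with the real base URL, add whatever authentication headers the service requires, and send it with urllib.request.urlopen.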

Responses

Results of processing the audio file.

application/json

Schema

transcript: object[], required

Array [

text: string, required

The transcribed text.

Example: Also, I’m allergic to peanuts.

speaker: copilot_speaker (string), required

Who said the text in this transcript item.

Possible values: [doctor, patient, unspecified]

Example: doctor

start_offset_ms: integer, required

Start time of this transcription item as the offset, in milliseconds, from the start of the audio file.

Example: 65100

end_offset_ms: integer, required

End time of this transcription item as the offset, in milliseconds, from the start of the audio file. Equals start_offset_ms plus the duration of the related transcribed audio portion.

Example: 69300

]

Example (from schema)

{
  "transcript": [
    {
      "text": "Also, I’m allergic to peanuts.",
      "speaker": "doctor",
      "start_offset_ms": 65100,
      "end_offset_ms": 69300
    }
  ]
}
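A response of this shape can be parsed into typed items on the client. The sketch below is one way to do it; the TranscriptItem class and helper names are hypothetical, and only the four documented fields are read. The duration follows from the schema: end_offset_ms minus start_offset_ms.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptItem:
    text: str
    speaker: str  # documented values: doctor, patient, unspecified
    start_offset_ms: int
    end_offset_ms: int

    @property
    def duration_ms(self) -> int:
        """Length of the transcribed audio portion in milliseconds."""
        return self.end_offset_ms - self.start_offset_ms


def parse_transcript(payload: str) -> list:
    """Parse the JSON response body into a list of TranscriptItem."""
    data = json.loads(payload)
    return [
        TranscriptItem(
            text=item["text"],
            speaker=item["speaker"],
            start_offset_ms=item["start_offset_ms"],
            end_offset_ms=item["end_offset_ms"],
        )
        for item in data["transcript"]
    ]


def format_offset(ms: int) -> str:
    """Render a millisecond offset as MM:SS.mmm."""
    minutes, remainder = divmod(ms, 60_000)
    seconds, millis = divmod(remainder, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


# The example response from this page.
SAMPLE = """
{
  "transcript": [
    {
      "text": "Also, I’m allergic to peanuts.",
      "speaker": "doctor",
      "start_offset_ms": 65100,
      "end_offset_ms": 69300
    }
  ]
}
"""

items = parse_transcript(SAMPLE)
```

For the example item, duration_ms is 4200 and format_offset(items[0].start_offset_ms) gives "01:05.100".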
