API version '2024-10-01' was just released with many new features. If you are still on an older version, check the migration guide.

Transcribe: From audio file

POST /transcribe

Generate a transcript from an audio file. Only audio/* MIME types are supported. The maximum audio duration is 10 minutes; for longer files, use the asynchronous equivalent of this endpoint.
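The 10-minute limit can be checked on the client before uploading. The following is a minimal sketch that only handles WAV input, using Python's standard `wave` module; the helper names `wav_duration_seconds` and `fits_sync_endpoint` are illustrative and not part of the API, and treating exactly 10 minutes as allowed is an assumption.

```python
import io
import wave

# Limit stated in the endpoint description: 10 minutes.
MAX_DURATION_S = 10 * 60


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of an in-memory WAV file in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_sync_endpoint(wav_bytes: bytes) -> bool:
    """True if the audio is short enough for POST /transcribe.

    Assumption: a file of exactly 10 minutes is still accepted.
    """
    return wav_duration_seconds(wav_bytes) <= MAX_DURATION_S
```

Files that fail this check would go to the asynchronous endpoint instead. Other audio formats need a different way to read the duration.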

Request

Body

required
    request_parameters object required

    The object containing all the information needed along with the audio file to transcribe.

    speech_locale speech_locale (string) required

    Possible values: [ENGLISH_US, ENGLISH_UK, SPANISH_ES, SPANISH_MX, FRENCH_FR, ARABIC_EG, ARABIC_LB, ARABIC_MA, ARABIC_SA, ARMENIAN_AM, BENGALI_IN, CANTONESE_CN, CROATIAN_HR, FILIPINO_PH, GERMAN_DE, GREEK_GR, GUJARATI_IN, HEBREW_IL, HINDI_IN, ITALIAN_IT, JAPANESE_JP, KHMER_KH, KOREAN_KR, MANDARIN_CN, PERSIAN_IR, POLISH_PL, PORTUGUESE_PT, PUNJABI_IN, RUSSIAN_RU, SERBIAN_RS, TAMIL_IN, TELUGU_IN, THAI_TH, URDU_IN, VIETNAMESE_VN]

    The spoken or written locale of the transcript, representing both the language and its specific regional variant.

    split_by_sentence boolean

    Whether to segment transcription results at sentence boundaries. Defaults to false, in which case a single transcript item may span multiple sentences, provided no pause (silence) in the audio separates them.

    file binary required
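As an illustration of how the two body fields fit together, here is a sketch that builds (but does not send) the request with the Python standard library. It assumes the body is sent as multipart/form-data with a JSON-encoded `request_parameters` part and a binary `file` part, which this page does not state explicitly. The host `https://api.example.com` and the function name `build_transcribe_request` are placeholders, and authentication headers are left out because this section does not describe them.

```python
import json
import mimetypes
import urllib.request
import uuid

# Hypothetical host; replace with the real API base URL.
BASE_URL = "https://api.example.com"

# Values copied from the speech_locale list above.
SPEECH_LOCALES = frozenset("""
ENGLISH_US ENGLISH_UK SPANISH_ES SPANISH_MX FRENCH_FR ARABIC_EG ARABIC_LB
ARABIC_MA ARABIC_SA ARMENIAN_AM BENGALI_IN CANTONESE_CN CROATIAN_HR
FILIPINO_PH GERMAN_DE GREEK_GR GUJARATI_IN HEBREW_IL HINDI_IN ITALIAN_IT
JAPANESE_JP KHMER_KH KOREAN_KR MANDARIN_CN PERSIAN_IR POLISH_PL
PORTUGUESE_PT PUNJABI_IN RUSSIAN_RU SERBIAN_RS TAMIL_IN TELUGU_IN THAI_TH
URDU_IN VIETNAMESE_VN
""".split())


def build_transcribe_request(audio_bytes, filename, speech_locale,
                             split_by_sentence=False, base_url=BASE_URL):
    """Build a POST /transcribe request without sending it.

    Assumption: multipart/form-data with a JSON `request_parameters`
    part and a binary `file` part.
    """
    if speech_locale not in SPEECH_LOCALES:
        raise ValueError(f"unsupported speech_locale: {speech_locale!r}")

    # Only audio/* MIME types are supported, so reject anything else early.
    mime_type = mimetypes.guess_type(filename)[0] or ""
    if not mime_type.startswith("audio/"):
        raise ValueError(f"expected an audio/* file, got {mime_type!r}")

    params = json.dumps({
        "speech_locale": speech_locale,
        "split_by_sentence": split_by_sentence,
    })
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="request_parameters"\r\n',
        b"Content-Type: application/json\r\n\r\n",
        params.encode("utf-8"), b"\r\n",
        f"--{boundary}\r\n".encode(),
        (f'Content-Disposition: form-data; name="file"; '
         f'filename="{filename}"\r\n').encode(),
        f"Content-Type: {mime_type}\r\n\r\n".encode(),
        audio_bytes, b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

The returned object could then be passed to `urllib.request.urlopen` once the real host and any required credentials are filled in.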

Responses

Results of processing the audio file.

Schema
    transcript object[] required

    Transcript items from the audio file.

  • Array [
  • text string required

    The transcribed text.

    speaker_type core_api_speaker (string)

    Possible values: [DOCTOR, PATIENT, UNSPECIFIED]

    Who said the text in this transcript item.

    start_offset_ms integer required

    Start time of this transcription item as the offset, in milliseconds, from the start of the audio file.

    end_offset_ms integer required

    End time of this transcription item as the offset, in milliseconds, from the start of the audio file. Equals start_offset_ms plus the duration of the related transcribed audio portion.

  • ]
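The response schema above can be read into typed objects on the client. This is a sketch only: the `TranscriptItem` class and `parse_transcript` function are illustrative names, and the sample payload is made up to match the documented fields rather than taken from a real response.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TranscriptItem:
    text: str
    start_offset_ms: int
    end_offset_ms: int
    # DOCTOR, PATIENT, or UNSPECIFIED; not marked required in the schema.
    speaker_type: Optional[str] = None

    @property
    def duration_ms(self) -> int:
        # end_offset_ms equals start_offset_ms plus the item's duration.
        return self.end_offset_ms - self.start_offset_ms


def parse_transcript(response_body: str) -> List[TranscriptItem]:
    """Convert a JSON response body into a list of TranscriptItem."""
    payload = json.loads(response_body)
    return [
        TranscriptItem(
            text=item["text"],
            start_offset_ms=item["start_offset_ms"],
            end_offset_ms=item["end_offset_ms"],
            speaker_type=item.get("speaker_type"),
        )
        for item in payload["transcript"]
    ]


# Made-up sample shaped like the schema above.
SAMPLE_RESPONSE = json.dumps({
    "transcript": [
        {"text": "How are you feeling today?", "speaker_type": "DOCTOR",
         "start_offset_ms": 0, "end_offset_ms": 1800},
        {"text": "Much better, thanks.",
         "start_offset_ms": 2100, "end_offset_ms": 3400},
    ]
})

items = parse_transcript(SAMPLE_RESPONSE)
```

Because `speaker_type` is optional, the sketch falls back to `None` when the field is absent instead of raising an error.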