Listen — From audio file
POST /listen
Generate a transcript and a structured clinical note from an audio file. Only audio/* MIME types are supported. The maximum duration is 10 minutes; for longer files, use the asynchronous equivalent.
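A client may want to check both constraints before uploading. The sketch below is illustrative only: `can_use_sync_listen` is a hypothetical helper, the MIME type is guessed from the file name with Python's standard `mimetypes` module, and the duration is assumed to be already known to the caller (for example from recording metadata).

```python
import mimetypes

MAX_SYNC_DURATION_S = 10 * 60  # 10-minute limit stated above


def can_use_sync_listen(filename: str, duration_s: float) -> bool:
    """Return True if the file fits the synchronous endpoint's duration limit.

    Raises ValueError when the guessed MIME type is not audio/*.
    """
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("audio/"):
        raise ValueError(f"unsupported MIME type for {filename!r}: {mime_type}")
    return duration_s <= MAX_SYNC_DURATION_S
```

A `False` result means the file is valid audio but too long, so the asynchronous equivalent would be the better fit.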
Request
- multipart/form-data
Body
required
request_parameters object, required
The object containing all the information needed, along with the audio file, to transcribe the audio and generate a note.
Possible values: [transcript_item, note]
Specifies which items to return, that is, which features to use: transcription, note generation, or both.
Possible values: [fr, en, en-US, en-GB, fr-FR, es-ES, es-MX]
Language spoken in the audio. 'fr' and 'en' are deprecated and correspond to 'fr-FR' and 'en-US' respectively.
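Since two of the accepted codes are deprecated aliases, a client could normalize them before sending. This is a minimal sketch; `normalize_language` is a hypothetical helper name, and the value sets are copied from the list above.

```python
SUPPORTED_LANGUAGES = {"en-US", "en-GB", "fr-FR", "es-ES", "es-MX"}
DEPRECATED_ALIASES = {"fr": "fr-FR", "en": "en-US"}


def normalize_language(code: str) -> str:
    """Map deprecated codes to their replacements and reject unknown ones."""
    code = DEPRECATED_ALIASES.get(code, code)
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {code!r}")
    return code
```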
Default value: false
Indicates whether to segment transcription results at sentence boundaries. Default is false, meaning that a single transcript item may encompass multiple sentences, provided they are not delineated by pauses (silence) in the audio.
Possible values: [highest_quality, fastest]
Choose a generation mode:
• highest_quality: generates very high quality notes; might take up to one minute.
• fastest: quicker note generation (a few seconds), but might not give the best possible output.
Default is highest_quality.
Possible values: [auto, paragraphs, bullet_points]
Choose a desired style for note sections:
• paragraphs: prioritizes generating paragraphs.
• bullet_points: prioritizes structuring content using bullet points.
• auto: automatically picks the most natural formatting option.
Default is auto.
Possible values: [GENERAL_MEDICINE, CARDIOLOGY, PSYCHIATRY, DIET, PSYCHOLOGY, SOAP]
The desired template of the generated note. Default is GENERAL_MEDICINE.
Check Note template for details.
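The options above can be assembled and validated client-side before the request is sent. The sketch below is an assumption-laden illustration: the property names are not shown in this section, so every key (`output_objects`, `language`, `split_by_sentence`, `note_generation_mode`, `note_sections_style`, `note_template`) is a hypothetical placeholder, as is the helper `build_request_parameters`. Only the allowed values and defaults come from the descriptions above.

```python
import json

# Hypothetical keys: replace each one with the documented property name.
ALLOWED_VALUES = {
    "output_objects": {"transcript_item", "note"},
    "language": {"fr", "en", "en-US", "en-GB", "fr-FR", "es-ES", "es-MX"},
    "note_generation_mode": {"highest_quality", "fastest"},
    "note_sections_style": {"auto", "paragraphs", "bullet_points"},
    "note_template": {
        "GENERAL_MEDICINE", "CARDIOLOGY", "PSYCHIATRY",
        "DIET", "PSYCHOLOGY", "SOAP",
    },
}


def build_request_parameters(
    output_objects,
    language,
    split_by_sentence=False,
    note_generation_mode="highest_quality",
    note_sections_style="auto",
    note_template="GENERAL_MEDICINE",
) -> str:
    """Validate the options and return them as a JSON string."""
    params = {
        "output_objects": list(output_objects),
        "language": language,
        "split_by_sentence": bool(split_by_sentence),
        "note_generation_mode": note_generation_mode,
        "note_sections_style": note_sections_style,
        "note_template": note_template,
    }
    for name, allowed in ALLOWED_VALUES.items():
        value = params[name]
        # output_objects is a list; every other checked option is a single value.
        candidates = value if isinstance(value, list) else [value]
        for candidate in candidates:
            if candidate not in allowed:
                raise ValueError(f"invalid value for {name}: {candidate!r}")
    return json.dumps(params)
```

The resulting string would be sent as the `request_parameters` part of the multipart/form-data body, next to the audio file part.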
Responses
- 200
Results of processing the audio file.
- application/json
- Schema
- Example (from schema)
Schema
transcript object[]
Transcript items from the audio file.
The transcribed text.
Possible values: [doctor, patient, unspecified]
Who said the text in this transcript item.
Start time of this transcription item as the offset, in milliseconds, from the start of the audio file.
End time of this transcription item as the offset, in milliseconds, from the start of the audio file. Equals the start_offset_ms plus the duration of the related transcribed audio portion.
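The relationship between the two offsets can be sketched as follows. The field names (`text`, `speaker`, `start_offset_ms`, `end_offset_ms`) are taken from the example response on this page; the `TranscriptItem` class and the `format_offset` helper are hypothetical conveniences, not part of the API.

```python
from dataclasses import dataclass


@dataclass
class TranscriptItem:
    text: str
    speaker: str  # "doctor", "patient" or "unspecified"
    start_offset_ms: int
    end_offset_ms: int

    @property
    def duration_ms(self) -> int:
        # The end offset is the start offset plus the spoken duration.
        return self.end_offset_ms - self.start_offset_ms


def parse_transcript(items: list) -> list:
    """Build TranscriptItem objects from the decoded `transcript` array."""
    return [
        TranscriptItem(
            text=item["text"],
            speaker=item["speaker"],
            start_offset_ms=item["start_offset_ms"],
            end_offset_ms=item["end_offset_ms"],
        )
        for item in items
    ]


def format_offset(offset_ms: int) -> str:
    """Render a millisecond offset as MM:SS.mmm."""
    minutes, remainder = divmod(offset_ms, 60_000)
    seconds, millis = divmod(remainder, 1_000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"
```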
note object
The generated note.
sections object[], required
Content of the note structured in multiple sections.
A key identifying a section of a note. The set of possible keys depends on the template that is used. Check Note template for possible values.
The section title.
Content of the note section.
{
  "transcript": [
    {
      "text": "Also, I’m allergic to peanuts.",
      "speaker": "doctor",
      "start_offset_ms": 65100,
      "end_offset_ms": 69300
    }
  ],
  "note": {
    "sections": [
      {
        "key": "CHIEF_COMPLAINT",
        "title": "Chief complaint",
        "text": "Fatigue and headaches"
      },
      {
        "key": "SYMPTOMS",
        "title": "Symptoms",
        "text": "- Tiredness all day long\n- Mild headaches on the right side"
      }
    ]
  }
}
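Once the response is decoded, the note sections can be turned into readable text. This is a minimal sketch, assuming the caller has already extracted the list of section objects from the response; `render_note` is a hypothetical helper, and it relies only on the `title` and `text` fields described above.

```python
def render_note(sections: list) -> str:
    """Join note sections into plain text, one titled block per section."""
    blocks = [f"{section['title']}\n{section['text']}" for section in sections]
    return "\n\n".join(blocks)


# Section objects copied from the example response.
example_sections = [
    {"key": "CHIEF_COMPLAINT", "title": "Chief complaint",
     "text": "Fatigue and headaches"},
    {"key": "SYMPTOMS", "title": "Symptoms",
     "text": "- Tiredness all day long\n- Mild headaches on the right side"},
]
print(render_note(example_sections))
```

Because the set of section keys depends on the template, iterating over whatever sections are returned avoids hard-coding keys such as `CHIEF_COMPLAINT`.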