Copilot Listen v0.2 documentation
Copilot Listen WebSocket API takes audio streams of your consultations
and returns a live transcription and a generated clinical note.
General specifications
- All messages sent and received via websockets are encoded as UTF-8 JSON text frames.
- We don't keep any of your data beyond the websocket lifecycle. To be network-resilient, we recommend that you store whatever you need to get back on track in case of an untimely closure.
- You should pass your bearer authentication token as an Authorization header when initiating the websocket. To get one, contact us.
Example: url: 'wss://api.nabla.com/v1/server/copilot/listen', protocol: 'copilot-listen-protocol', extra_headers: { 'Authorization': 'Bearer <YOUR_TOKEN>' }.
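The handshake parameters listed above can be gathered in one place before handing them to a WebSocket client. The following is a minimal sketch, not an official client: the helper name `build_connection_params` and the dictionary keys are hypothetical, and how you pass these values depends on the WebSocket library you use.

```python
# Values taken from the example above.
LISTEN_URL = "wss://api.nabla.com/v1/server/copilot/listen"
LISTEN_SUBPROTOCOL = "copilot-listen-protocol"


def build_connection_params(token: str) -> dict:
    """Return the URL, sub-protocol and headers for the opening handshake.

    Hypothetical helper: the key names are illustrative and must be mapped
    onto whatever arguments your WebSocket client expects.
    """
    if not token:
        raise ValueError("a bearer token is required")
    return {
        "url": LISTEN_URL,
        "subprotocols": [LISTEN_SUBPROTOCOL],
        "headers": {"Authorization": f"Bearer {token}"},
    }
```

Keeping the token out of the URL and only in the Authorization header matches the specification above.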
Servers
nabla Server
- URL: wss://api.nabla.com/v1/server/copilot
- Protocol: wss
Operations
PUB /listen
Copilot Listen WebSocket API takes audio streams of your consultations
and returns a live transcription and a generated clinical note.
Note generation can take from 10 seconds to a few minutes, depending on the length of the transcript.
Example communication:
- You: Open the websocket specifying an authorization token.
- You: Send a first message setting the configuration.
- You: Continuously send small audio chunks for each stream.
- Copilot: Continuously computes and sends transcript items.
- You: Stop streaming audio and immediately send { "object": "end" }.
- Copilot: Generates the summarizing note and sends it as soon as it is ready (see the timing above).
- Copilot: closes the websocket.
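The client side of this exchange can be sketched as an ordered sequence of JSON text frames. This is an illustrative sketch only: `outbound_frames` is a hypothetical helper, and actually sending the frames over the socket is left to your WebSocket client.

```python
import json
from typing import Iterable, Iterator


def outbound_frames(config: dict, chunks: Iterable[dict]) -> Iterator[str]:
    """Yield the client's JSON text frames in the order described above."""
    # 1. The configuration message comes first.
    yield json.dumps(config)
    # 2. Then the audio chunks, for as long as the consultation lasts.
    for chunk in chunks:
        yield json.dumps(chunk)
    # 3. Finally the end marker, sent right after the last chunk.
    yield json.dumps({"object": "end"})
```

After the last frame, the client keeps reading until Copilot closes the websocket, since the note arrives after the end message.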
Stream your consultation audio by sending small chunks from each speaker.
Channel specific information (wss)
Name | Type | Description | Value | Constraints | Notes |
---|---|---|---|---|---|
headers | object | - | - | - | additional properties are allowed |
headers.Sec-WebSocket-Protocol | string | WebSocket sub-protocol header, as per RFC 6455. | const ("copilot-listen-protocol" ) | - | - |
headers.Authorization | string | Your API key, prefixed by "Bearer ". To get one, contact us. | - | - | - |
Accepts one of the following messages:
Message listen_config
Initiates the listening feature with the given configuration.
This should be your first message in the websocket.
Payload
Name | Type | Description | Value | Constraints | Notes |
---|---|---|---|---|---|
(root) | object | First message to configure transcription and note generation (audio format, language, etc). | - | - | additional properties are allowed |
object | string | - | const ("listen_config" ) | - | required |
output_objects | array | Specifies which items you want us to send you back. In other words, which feature(s) you want to use, live transcription and/or note generation. | - | - | required |
output_objects (single item) | string | - | allowed ("transcript_item" , "note" ) | - | - |
streams | array | Describes the audio streams you intend to send as input to the Listen API. Typically, if you have separate doctor/patient audio tracks (or do diarization yourself), you will configure two streams for transcription. Each stream expects audio chunks to flow continuously (even if silent): if a stream does not receive any audio chunk for 3 seconds, the websocket will fail with a timeout error. | - | - | required |
streams.id | string | An identifier for this stream. | - | - | required |
streams.speaker_type | string | Who is going to speak on this audio stream. | allowed ("doctor" , "patient" ) | - | - |
encoding | string | Encoding of the submitted streaming audio. | allowed ("pcm_s16le" ) | - | required |
sample_rate | integer | Sample rate of submitted streaming audio, in hertz. | allowed (48000 , 44100 , 32000 , 16000 ) | - | required |
language | string | Language spoken in the audio. | allowed ("en" , "fr" ) | - | required |
Examples of payload (generated)
{
  "object": "listen_config",
  "output_objects": [
    "transcript_item"
  ],
  "streams": [
    {
      "id": "stream1",
      "speaker_type": "doctor"
    }
  ],
  "encoding": "pcm_s16le",
  "sample_rate": 48000,
  "language": "en"
}
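Since the payload table restricts several fields to fixed values, it can help to check a configuration locally before sending it. The sketch below assumes a hypothetical helper `make_listen_config`; the allowed values are copied from the table above, and the server remains the authority on what is accepted.

```python
ALLOWED_OUTPUT_OBJECTS = {"transcript_item", "note"}
ALLOWED_SPEAKER_TYPES = {"doctor", "patient"}
ALLOWED_ENCODINGS = {"pcm_s16le"}
ALLOWED_SAMPLE_RATES = {48000, 44100, 32000, 16000}
ALLOWED_LANGUAGES = {"en", "fr"}


def make_listen_config(output_objects, streams, sample_rate, language,
                       encoding="pcm_s16le"):
    """Build a listen_config dict, rejecting values outside the documented sets."""
    if not set(output_objects) <= ALLOWED_OUTPUT_OBJECTS:
        raise ValueError(f"unsupported output_objects: {output_objects}")
    for stream in streams:
        if "id" not in stream:
            raise ValueError("every stream needs an id")
        # speaker_type is not marked as required, so only check it when present.
        speaker = stream.get("speaker_type")
        if speaker is not None and speaker not in ALLOWED_SPEAKER_TYPES:
            raise ValueError(f"unsupported speaker_type: {speaker}")
    if encoding not in ALLOWED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding}")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    if language not in ALLOWED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    return {
        "object": "listen_config",
        "output_objects": list(output_objects),
        "streams": list(streams),
        "encoding": encoding,
        "sample_rate": sample_rate,
        "language": language,
    }
```

Failing fast on the client side avoids opening a websocket only to have the first message rejected.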
Message audio_chunk
A chunk (small portion) of a single audio track from the consultation. The maximum allowed duration is 1 second; the recommended duration is 50 ms.
Payload
Name | Type | Description | Value | Constraints | Notes |
---|---|---|---|---|---|
(root) | object | - | - | - | additional properties are allowed |
object | string | - | const ("audio_chunk" ) | - | required |
payload | string | Raw audio chunk in base64 string. | - | - | required |
stream_id | string | Identifier of one of the streams you defined in streams in the configuration. | - | - | required |
Examples of payload (generated)
{
  "object": "audio_chunk",
  "payload": "ZXhhbXBsZQ==",
  "stream_id": "stream1"
}
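To illustrate the chunk sizing, here is a minimal sketch that slices raw audio into 50 ms pieces and wraps each one in an audio_chunk message. It assumes mono audio (the channel count is not stated above) and uses a hypothetical helper named `audio_chunk_messages`. With pcm_s16le each sample takes 2 bytes, so 50 ms at 16000 Hz is 800 samples, or 1600 bytes.

```python
import base64
import json

BYTES_PER_SAMPLE = 2  # pcm_s16le stores each sample on 16 bits


def audio_chunk_messages(pcm: bytes, stream_id: str, sample_rate: int,
                         chunk_ms: int = 50):
    """Yield audio_chunk JSON frames for raw pcm_s16le audio (assumed mono)."""
    if not 0 < chunk_ms <= 1000:
        raise ValueError("chunk duration must be between 1 and 1000 ms")
    # Whole samples per chunk, then converted to bytes.
    chunk_bytes = sample_rate * chunk_ms // 1000 * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), chunk_bytes):
        piece = pcm[start:start + chunk_bytes]
        yield json.dumps({
            "object": "audio_chunk",
            "payload": base64.b64encode(piece).decode("ascii"),
            "stream_id": stream_id,
        })
```

Zero-filled bytes decode as silence in this encoding, so a buffer such as `bytes(1600)` is one possible way to keep a stream fed while a speaker is muted; whether that suits your integration is for you to verify.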
Message end
End the streaming.
Signal the end of streaming and ask the Copilot to finish what is still in progress (e.g. generate the final note).
Payload
Name | Type | Description | Value | Constraints | Notes |
---|---|---|---|---|---|
(root) | object | - | examples ({"object":"end"} ) | - | additional properties are allowed |
object | string | - | const ("end" ) | - | required |
Examples of payload
{
  "object": "end"
}
SUB /listen
- Operation ID: receiveResults
Receive the live transcription then a summarizing note.
Channel specific information (wss)
Name | Type | Description | Value | Constraints | Notes |
---|---|---|---|---|---|
headers | object | - | - | - | additional properties are allowed |
headers.Sec-WebSocket-Protocol | string | WebSocket sub-protocol header, as per RFC 6455. | const ("copilot-listen-protocol" ) | - | - |
headers.Authorization | string | Your API key, prefixed by "Bearer ". To get one, contact us. | - | - | - |
You will receive one of the following messages:
Message transcript_item
A transcript item.
A portion of the transcript being generated. Typically, this is the sentence currently being spoken, transcribed from the most recently transmitted audio chunks. It might be an incomplete sentence, since we keep transcribing as audio chunks are received. You should patch the previously received transcript item with the same id until is_final is true.
Payload
Name | Type | Description | Value | Constraints | Notes |
---|---|---|---|---|---|
(root) | object | - | - | - | additional properties are allowed |
object | string | - | const ("transcript_item" ) | - | required |
id | string | ID of the object you're receiving. It will be re-used as you receive more refined versions of this object, until you get the final version (the one for which is_final is true ). | - | - | required |
text | string | The transcribed text. | - | - | required |
speaker | string | Who said the text in this transcript item. Without diarization, this is simply inferred from the speaker_type of the audio stream from which this sentence was transcribed. | allowed ("doctor" , "patient" ) | - | required |
start_offset_ms | integer | Start time of this transcription item as the offset, in milliseconds, from the opening time of the websocket connection. | - | - | required |
end_offset_ms | integer | End time of this transcription item as the offset, in milliseconds, from the opening time of the websocket connection. Equals start_offset_ms plus the duration of the related transcribed audio portion (regardless of how it was chunked). | - | - | required |
is_final | boolean | Indicates if this is the final version of the transcript item. | - | - | required |
Examples of payload (generated)
{
  "object": "transcript_item",
  "id": "ba50e302-413d-480d-a880-c173624707c2",
  "text": "Also, I’m allergic to peanuts.",
  "speaker": "doctor",
  "start_offset_ms": 65100,
  "end_offset_ms": 69300,
  "is_final": true
}
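The patching rule described above (replace the item with the same id until is_final is true) can be sketched with a dictionary keyed by id. The helper names `apply_transcript_item` and `render_transcript` are hypothetical, and sorting by start_offset_ms is a presentation choice of this sketch, not something the API prescribes.

```python
def apply_transcript_item(transcript: dict, item: dict) -> None:
    """Insert the item, or overwrite the earlier version with the same id."""
    transcript[item["id"]] = item


def render_transcript(transcript: dict) -> list:
    """Return one 'speaker: text' line per item, ordered by start offset."""
    ordered = sorted(transcript.values(), key=lambda it: it["start_offset_ms"])
    return [f'{it["speaker"]}: {it["text"]}' for it in ordered]
```

Because each refined version overwrites the previous one, the transcript never shows the same sentence twice, and the last stored version of an item is the one with is_final set to true.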
Message note
The note summarizing the whole transcribed consultation so far.
An AI-generated summary of the consultation, presented as bullet points and separated into relevant sections (chief complaint, allergies, medication, etc).
Payload
Name | Type | Description | Value | Constraints | Notes |
---|---|---|---|---|---|
(root) | object | - | - | - | additional properties are allowed |
object | string | - | const ("note" ) | - | required |
id | string | ID of the object you're receiving. It will be re-used as you receive more refined versions of this object, until you get the final version (the one for which is_final is true ). | - | - | required |
sections | array | The generated note. | - | - | required |
sections.title | string | The section title | examples ("Chief complaint" , "Observations" , "Allergies" ) | - | required |
sections.content | array | Content of the note section as bullet points. | - | - | required |
sections.content (single item) | string | - | - | - | - |
is_final | boolean | Indicates if this is the final version of the note. | - | - | required |
Examples of payload (generated)
{
  "object": "note",
  "id": "ba50e302-413d-480d-a880-c173624707c2",
  "sections": [
    {
      "title": "Chief complaint",
      "content": [
        "Sleep disorder"
      ]
    },
    {
      "title": "Observations",
      "content": [
        "32 years old",
        "Chronic insomnia"
      ]
    }
  ],
  "is_final": true
}
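As an illustration of the sections structure, the following sketch flattens a note message into plain text, with one title line per section followed by its bullet points. The function name `note_to_text` and the output layout are assumptions made for this example.

```python
def note_to_text(note: dict) -> str:
    """Render a note message as section titles followed by '- ' bullets."""
    lines = []
    for section in note["sections"]:
        lines.append(section["title"])
        lines.extend(f"- {bullet}" for bullet in section["content"])
    return "\n".join(lines)
```

Applied to the example payload above, this yields "Chief complaint" followed by one bullet, then "Observations" followed by two bullets.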
Message duration_limit
Signals that the streamed audio has reached (or is close to reaching) the maximum duration of 1 hour.
Payload
Name | Type | Description | Value | Constraints | Notes |
---|---|---|---|---|---|
(root) | object | - | - | - | additional properties are allowed |
object | string | - | const ("duration_limit" ) | - | required |
remaining_seconds | integer | Number of seconds remaining before the limit. Zero if the limit has been reached and the connection is about to be closed. | - | - | required |
Examples of payload (generated)
{
  "object": "duration_limit",
  "remaining_seconds": 0
}
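Since every incoming frame carries an object field, a client can route messages with a small dispatcher. This is a sketch under assumptions: `handle_frame` and the layout of the `state` dictionary are hypothetical, and silently ignoring unknown message types is a choice of this example rather than documented behavior.

```python
import json


def handle_frame(frame: str, state: dict) -> None:
    """Route one incoming JSON text frame according to its object field."""
    message = json.loads(frame)
    kind = message.get("object")
    if kind == "transcript_item":
        # Later versions of an item replace earlier ones with the same id.
        state["transcript"][message["id"]] = message
    elif kind == "note":
        state["note"] = message
    elif kind == "duration_limit":
        state["remaining_seconds"] = message["remaining_seconds"]
        # Zero means the limit is reached and the connection is about to close.
        state["limit_reached"] = message["remaining_seconds"] == 0
    # Any other object type is ignored in this sketch.
```

A client might use a non-zero remaining_seconds as a cue to wrap up and send the end message while there is still time for the note to be generated.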