The Dictate Transcription WebSocket API is designed specifically for healthcare professionals to dictate clinical notes efficiently and accurately. It takes audio captured directly from your provider's microphone and returns a live transcription.
It differs from our general transcription service (/transcribe-ws) by offering configuration options catered to the unique needs of clinical documentation dictation, such as precise control over punctuation.
General specifications
- All messages sent and received via websockets are encoded as UTF-8 JSON text frames.
- We don't keep any of your data beyond the websocket lifecycle. To be resilient to network issues, we therefore recommend storing whatever you need to get back on track in case of an untimely closure.
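For example, a client can keep every sent audio chunk in a buffer until the server acknowledges it (see audio_chunk_ack below), so the unacknowledged tail can be resent on a fresh websocket after an untimely closure. A minimal sketch in TypeScript, where all names are illustrative rather than part of the API:

```typescript
// Minimal sketch of client-side state worth keeping for network resilience.
// All names here are illustrative, not part of the API.
interface PendingChunk {
  seqId: number;   // the seq_id sent in the AUDIO_CHUNK message
  payload: string; // base64-encoded audio, as sent in the AUDIO_CHUNK message
}

class DictationBuffer {
  private pending: PendingChunk[] = [];

  // Remember each chunk when it is sent.
  onChunkSent(seqId: number, payload: string): void {
    this.pending.push({ seqId, payload });
  }

  // Drop chunks the server has acknowledged (see audio_chunk_ack below).
  onAck(ackId: number): void {
    this.pending = this.pending.filter((c) => c.seqId > ackId);
  }

  // After an untimely closure, resend everything still unacknowledged
  // on a newly opened websocket.
  unacknowledged(): readonly PendingChunk[] {
    return this.pending;
  }
}
```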
Authentication
- Get an access token (see Authentication for details). It can be a server access token or a user access token, depending on whether you are calling the Server API or the User API.
- We support two different ways of specifying the authentication token:
- The recommended way: pass your bearer authentication token as an Authorization header when initiating the websocket. Example:
  url: 'wss://us.api.nabla.com/v1/core/server/dictate-ws', protocol: 'dictate-protocol', extra_headers: { 'Authorization': 'Bearer <YOUR_TOKEN>' }
- An alternative way, especially when specifying extra headers is not supported by your websocket client (e.g. in web browsers), is to pass the token as a second websocket protocol prefixed with jwt-. Example:
  url: 'wss://us.api.nabla.com/v1/core/user/dictate-ws', protocols: ['dictate-protocol', 'jwt-<YOUR_TOKEN>']
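For instance, from a web browser (where extra headers cannot be set on the websocket handshake), the protocol-based authentication might look like the following sketch; the token value is a placeholder:

```typescript
// Minimal sketch: opening the dictation websocket from a browser, passing
// the JWT as a second subprotocol because browsers cannot attach custom
// headers to the WebSocket upgrade request.
const token = '<YOUR_TOKEN>'; // placeholder: a user-scoped access token

const ws = new WebSocket(
  'wss://us.api.nabla.com/v1/core/user/dictate-ws',
  ['dictate-protocol', `jwt-${token}`],
);

ws.onopen = () => {
  // The first message must be the dictation configuration (see below).
};
```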
Servers
- Server API: wss://us.api.nabla.com/v1/core/server (wss). Called from your servers, authenticated with a server key.
- User API: wss://us.api.nabla.com/v1/core/user (wss). Called from front-end apps, authenticated with a user-scoped access token.
Operations
PUB /dictate-ws
The Dictate Transcription WebSocket API takes audio from your provider's microphone and returns a live transcription.
The Dictation API expects audio chunks to flow continuously (even if silent): if no audio chunk is received for 10 seconds, the websocket will fail with a timeout error (code: 83011).
Example communication:
- You: Open the websocket, specifying an authorization token.
- You: Send a first message setting the configuration.
- You: Continuously send small audio chunks.
- Core API: Continuously computes and sends dictation items.
- You: Stop streaming audio and immediately send a { "type": "END" } message.
- Core API: Finishes processing audio and pushes any ongoing dictation item to a final state.
- Core API: Closes the websocket.
Stream the audio of the note being dictated by sending it in small, defined chunks (e.g., segments of 100 milliseconds each).
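A minimal sketch of this flow, assuming PCM_S16LE audio at 16 kHz in a browser, and that you already capture raw ~100 ms audio buffers elsewhere (the capture code is out of scope here):

```typescript
// Minimal sketch of a dictation session. Assumes PCM_S16LE at 16 kHz and a
// browser WebSocket authenticated as shown above; audio capture is elided.
function toBase64(buf: ArrayBuffer): string {
  // Fine for ~100 ms chunks (a few KB); larger buffers would need chunked encoding.
  return btoa(String.fromCharCode(...new Uint8Array(buf)));
}

const ws = new WebSocket(
  'wss://us.api.nabla.com/v1/core/user/dictate-ws',
  ['dictate-protocol', 'jwt-<YOUR_TOKEN>'],
);

ws.onopen = () => {
  // 1. First message: the dictation configuration.
  ws.send(JSON.stringify({
    type: 'CONFIG',
    encoding: 'PCM_S16LE',
    sample_rate: 16000,
    speech_locale: 'ENGLISH_US',
    dictate_punctuation: true,
    enable_audio_chunk_ack: true, // to receive AUDIO_CHUNK_ACK messages (see below)
  }));
};

// 2. Continuously send small audio chunks (100 ms recommended, 1 s maximum).
let seqId = 0;
function sendChunk(chunk: ArrayBuffer): void {
  ws.send(JSON.stringify({
    type: 'AUDIO_CHUNK',
    payload: toBase64(chunk),
    seq_id: seqId++,
  }));
}

// 3. When the provider stops dictating, end the stream; the server then
// finalizes any in-progress dictation item and closes the websocket.
function endDictation(): void {
  ws.send(JSON.stringify({ type: 'END' }));
}
```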
Accepts one of the following messages:
- #0 dictate_config (object)
Initiates the dictation feature with the given configuration. This should be your first message in the websocket; it configures dictation (audio format, locale, etc.).
Example:
{ "type": "CONFIG", "encoding": "PCM_S16LE", "sample_rate": 16000, "speech_locale": "ENGLISH_US", "dictate_punctuation": true, "enable_audio_chunk_ack": false }
- #1 audio_chunk (object)
A chunk (small portion) of audio from the provider's microphone, on which they dictate. Maximum allowed duration is 1 second; recommended is 100 ms.
Example:
{ "type": "AUDIO_CHUNK", "payload": "ZXhhbXBsZQ==", "seq_id": 0 }
- #2 end (object)
Signals the end of streaming and asks the speech-to-text engine to finish what is still in progress (e.g. give a final state to the latest dictation item).
Example:
{ "type": "END" }
SUB /dictate-ws
Receive the live transcription.
You will receive one of the following messages:
- #0 dictation_item (object)
A portion of the dictation being generated. Typically, the sentence currently being spoken, transcribed from the last transmitted audio chunks. This might be an incomplete sentence, since we keep transcribing as audio chunks are received. You should patch the previously received dictation item with the same id until is_final is true (see the handler sketch after this list).
Example:
{ "type": "DICTATION_ITEM", "id": "98FCE1EF-DBCA-41EF-8BC7-4D1621AC07C6", "text": "Patient showed signs of rapid improvement.", "start_offset_ms": 65100, "end_offset_ms": 69300, "is_final": true }
- #1 audio_chunk_ack (object)
Acknowledgement of audio receipt by the server, up to the audio chunk with the sequential id ack_id. When enabled in the configuration, the server will regularly send audio chunk acknowledgements to signal receipt and imminent processing of audio. Clients should consider acknowledged audio as processed and delete it from their buffer (see the handler sketch after this list).
Moreover, audio acknowledgement is intended to set the pace of streaming: clients should refrain from sending new audio chunks until acknowledgement is received for previous ones. The server will accept up to 10 seconds of not-yet-acknowledged audio; clients going further will face an "audio chunks buffer overflow" error.
Read more in Make transcription and dictation resilient to network interruptions.
Example:
{ "type": "AUDIO_CHUNK_ACK", "ack_id": 42 }
- #2 error_message (object)
An error message sent right before closing the websocket due to a fatal error. It briefly explains what went wrong.
Example:
{ "type": "ERROR_MESSAGE", "message": "Unable to parse JSON at path $.speech_locale" }
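To illustrate how these messages might be consumed together, here is a minimal sketch of a message handler, reusing the ws and DictationBuffer from the sketches above; the transcript map is illustrative client-side state, not part of the API:

```typescript
// Minimal sketch of handling server messages. `buffer` is a DictationBuffer
// instance from the resilience sketch above; `transcript` is illustrative.
const buffer = new DictationBuffer();
const transcript = new Map<string, string>(); // dictation item id -> latest text

ws.onmessage = (event: MessageEvent) => {
  const msg = JSON.parse(event.data as string);

  switch (msg.type) {
    case 'DICTATION_ITEM':
      // Patch the item with the same id; once is_final is true,
      // its text will no longer change.
      transcript.set(msg.id, msg.text);
      break;

    case 'AUDIO_CHUNK_ACK':
      // Chunks up to ack_id are safe to drop from the resend buffer;
      // this also paces how fast new audio should be sent.
      buffer.onAck(msg.ack_id);
      break;

    case 'ERROR_MESSAGE':
      // Sent right before the websocket closes on a fatal error.
      console.error('Dictation failed:', msg.message);
      break;
  }
};
```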