The Transcribe WebSocket API takes audio streams of your encounters and returns a live transcription.
Show me some code
We've published a JavaScript example of how you can integrate the /transcribe-ws API on a web page.
General specifications
- All messages sent and received via websockets are encoded as UTF-8 JSON text frames.
- We don't keep any of your data beyond the websocket lifecycle. To be resilient to network issues, we recommend storing whatever you need to get back on track in case of an untimely closure.
Authentication
- Get an access token (details in Authentication). It can be a server access token or a user access token, depending on whether you are calling the Server API or the User API.
- We support two different ways of specifying the authentication token:
- The recommended way: pass your bearer authentication token as an Authorization header when initiating the websocket. Example:
url: 'wss://us.api.nabla.com/v1/core/server/transcribe-ws', protocol: 'transcribe-protocol', extra_headers: { 'Authorization': 'Bearer <YOUR_TOKEN>' }
- An alternative way, especially when your websocket client does not support extra headers (e.g. in web browsers): pass the token as a second websocket protocol prefixed with jwt-. Example:
url: 'wss://us.api.nabla.com/v1/core/user/transcribe-ws', protocols: ['transcribe-protocol', 'jwt-<YOUR_TOKEN>']
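For instance, a minimal browser-side sketch of the protocol-based approach (browsers cannot set extra websocket headers; the token value is a placeholder):

// Browsers cannot set extra headers on a WebSocket, so the token travels
// as a second subprotocol prefixed with "jwt-".
const token = '<YOUR_TOKEN>'; // a user-scoped access token
const ws = new WebSocket(
  'wss://us.api.nabla.com/v1/core/user/transcribe-ws',
  ['transcribe-protocol', `jwt-${token}`]
);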
Servers
- wss://{region}.api.nabla.com/v1/core/server (Server API)
Called from your servers, authenticated with a server key.
- wss://{region}.api.nabla.com/v1/core/user (User API)
Called from front-end apps, authenticated with a user-scoped access token.
Operations
PUB /transcribe-ws
The Transcribe WebSocket API takes audio streams of your encounters and returns a live transcription.
Each audio stream you declare in the configuration expects audio chunks to flow continuously (even if silent). If a stream does not receive any audio chunk for 10 seconds, the websocket fails with a timeout error (code: 83011).
Example communication (a minimal client sketch follows this list):
- You: open the websocket, specifying an authorization token.
- You: send a first message setting the configuration.
- You: continuously send small audio chunks for each stream.
- Nabla: continuously computes and sends transcript items.
- You: stop streaming audio and immediately send a { "type": "END" } message.
- Nabla: finishes processing audio and pushes any ongoing transcript item to a final state.
- Nabla: closes the websocket.
- You: filter out non-final transcript items and sort the remaining ones by start_offset_ms before calling the note-generation API.
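Putting these steps together, a minimal, illustrative browser client might look like the following. Message shapes are copied from the examples below; captureNextChunkBase64 is a hypothetical helper standing in for your audio-capture code, and handleTranscriptItem is sketched under transcript_item.

const ws = new WebSocket(
  'wss://us.api.nabla.com/v1/core/user/transcribe-ws',
  ['transcribe-protocol', 'jwt-<YOUR_TOKEN>']
);

ws.onopen = () => {
  // Step 1: the first message sets the configuration
  // (shape taken from the transcribe_config example below).
  ws.send(JSON.stringify({
    type: 'config',
    encoding: 'pcm_s16le',
    sample_rate: 16000,
    speech_locales: ['ENGLISH_US'],
    streams: [
      { id: 'doctor_stream', speaker_type: 'doctor' },
      { id: 'patient_stream', speaker_type: 'patient' },
    ],
  }));
};

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  if (message.type === 'TRANSCRIPT_ITEM') {
    handleTranscriptItem(message); // see the transcript_item sketch below
  }
};

// Step 2: continuously send small audio chunks, with a per-stream sequence counter.
const seqIds = { doctor_stream: 0, patient_stream: 0 };
function sendChunk(streamId) {
  ws.send(JSON.stringify({
    type: 'AUDIO_CHUNK',
    payload: captureNextChunkBase64(streamId), // hypothetical: base64 of ~100 ms of PCM
    stream_id: streamId,
    seq_id: seqIds[streamId]++,
  }));
}

// Step 3: when the encounter ends, stop capturing and immediately signal the end.
// Nabla then finalizes in-progress transcript items and closes the websocket.
function finish() {
  ws.send(JSON.stringify({ type: 'END' }));
}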
Stream your encounter's audio by sending small chunks from each speaker.
Accepts one of the following messages:
- #0 transcribe_config
Initiates the transcribing feature with the given configuration.
This should be your first message in the websocket.
object (uid: transcribe_config). First message to configure transcription (audio format, locale, etc.).
Examples
{ "type": "config", "encoding": "pcm_s16le", "sample_rate": 16000, "speech_locales": [ "ENGLISH_US", "SPANISH_ES" ], "streams": [ { "id": "doctor_stream", "speaker_type": "doctor" }, { "id": "patient_stream", "speaker_type": "patient" } ], "split_by_sentence": true }
- #1 audio_chunk
A chunk of an audio track from the encounter.
A small portion of a single audio track from the encounter. Maximum allowed duration is 1 second; 100 ms is recommended.
object (uid: audio_chunk)
Examples
{ "type": "AUDIO_CHUNK", "payload": "ZXhhbXBsZQ==", "stream_id": "doctor_stream", "seq_id": 0 }
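As an illustration, with the pcm_s16le / 16000 Hz configuration above, 100 ms of mono audio is 16000 × 0.1 × 2 = 3200 bytes. A sketch of slicing raw PCM into such chunks and base64-encoding them, assuming mono audio is already available as a Uint8Array:

// 16,000 samples/s x 0.1 s x 2 bytes/sample (s16le, mono) = 3,200 bytes per 100 ms.
const BYTES_PER_CHUNK = 3200;

function* pcmChunks(pcmBytes) {
  for (let offset = 0; offset < pcmBytes.length; offset += BYTES_PER_CHUNK) {
    yield pcmBytes.subarray(offset, offset + BYTES_PER_CHUNK);
  }
}

function toBase64(bytes) {
  // Build a binary string for btoa; fine at this chunk size.
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary); // goes into the "payload" field of an AUDIO_CHUNK message
}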
- #2 end
End the streaming.
Signals the end of streaming and asks the Nabla Core API to finish what is still in progress (e.g. give a final state to the latest transcript item).
object (uid: end)
Examples
{ "type": "END" }
SUB /transcribe-ws
(Same channel description and example communication as PUB /transcribe-ws above.)
Receive the live transcription.
You receive one of the following messages:
- #0 transcript_item
A transcript item.
A portion of the transcript being generated: typically the sentence currently being spoken, transcribed from the last transmitted audio chunks. It may be an incomplete sentence, since we keep transcribing as audio chunks are received. You should patch the previously received transcript item with the same id until is_final is true.
object (uid: transcript_item)
Examples
{ "type": "TRANSCRIPT_ITEM", "id": "98FCE1EF-DBCA-41EF-8BC7-4D1621AC07C6", "text": "Also, Iโm allergic to peanuts.", "speaker_type": "DOCTOR", "start_offset_ms": 65100, "end_offset_ms": 69300, "is_final": true }
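A minimal sketch of this patching logic, assuming items are kept in a Map keyed by id (this is the handleTranscriptItem referenced in the client sketch above):

const itemsById = new Map();

function handleTranscriptItem(item) {
  // Interim and final versions share the same id: each message replaces the previous one.
  itemsById.set(item.id, item);
}

// After the websocket closes: keep only final items, ordered by start time,
// before calling the note-generation API.
function finalTranscript() {
  return [...itemsById.values()]
    .filter((item) => item.is_final)
    .sort((a, b) => a.start_offset_ms - b.start_offset_ms);
}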
- #1 audio_chunk_ack
Acknowledgement for audio chunks up to the specified sequential id.
When enabled in the configuration, the server regularly sends audio chunk acknowledgements to signal receipt and imminent transcription of audio.
Clients should consider acknowledged audio as processed and delete it from their buffer.
Audio acknowledgement is also intended to set the pace of streaming: clients should refrain from sending new audio chunks until previous ones have been acknowledged. The server accepts up to 10 seconds of not-yet-acknowledged audio; clients exceeding this limit will get an "audio chunks buffer overflow" error.
Read more in Make transcription and dictation resilient to network interruptions.
object (uid: audio_chunk_ack): acknowledgement of audio receipt by the server up to the audio chunk with the sequential id ack_id. Read more in Make transcription and dictation resilient to network interruptions.
Examples
{ "type": "AUDIO_CHUNK_ACK", "stream_id": "patient_stream", "ack_id": 42 }
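A sketch of ack-driven flow control under these rules, simplified to a single stream and assuming 100 ms chunks (so the 10-second window corresponds to 100 in-flight chunks):

// With 100 ms chunks, 10 s of unacknowledged audio = at most 100 chunks in flight.
const MAX_UNACKED_CHUNKS = 100;

const pendingChunks = new Map(); // seq_id -> message, kept for resending after a reconnect
let lastSentSeqId = -1;
let lastAckedSeqId = -1;

function trySendChunk(ws, streamId, base64Payload) {
  if (lastSentSeqId - lastAckedSeqId >= MAX_UNACKED_CHUNKS) {
    return false; // hold off until acks catch up, or risk the buffer-overflow error
  }
  const message = {
    type: 'AUDIO_CHUNK',
    payload: base64Payload,
    stream_id: streamId,
    seq_id: ++lastSentSeqId,
  };
  pendingChunks.set(message.seq_id, message);
  ws.send(JSON.stringify(message));
  return true;
}

function onAudioChunkAck(ack) { // { type: 'AUDIO_CHUNK_ACK', stream_id: ..., ack_id: ... }
  lastAckedSeqId = Math.max(lastAckedSeqId, ack.ack_id);
  // Acknowledged audio is processed server-side: drop it from the local buffer.
  for (const seqId of [...pendingChunks.keys()]) {
    if (seqId <= ack.ack_id) pendingChunks.delete(seqId);
  }
}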
- #2 error_message
An error message.
An error message sent right before closing the websocket due to a fatal error. It briefly explains what went wrong.
object (uid: error_message)
Examples
{ "type": "ERROR_MESSAGE", "message": "Unable to parse JSON at path $.speech_locale" }
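A sketch of surfacing this message on the client; a close handler is included because the websocket is closed right after (event.code and event.reason are standard WebSocket close fields, not Nabla-specific):

ws.addEventListener('message', (event) => {
  const message = JSON.parse(event.data);
  if (message.type === 'ERROR_MESSAGE') {
    console.error('Transcription failed:', message.message); // sent just before the close
  }
});

ws.addEventListener('close', (event) => {
  console.warn('Websocket closed', event.code, event.reason);
});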
Messages
- #1 transcribe_config
Initiates the transcribing feature with the given configuration.
This should be your first message in the websocket.
object (uid: transcribe_config). First message to configure transcription (audio format, locale, etc.).
- #2 audio_chunk
A chunk of an audio track from the encounter.
A small portion of a single audio track from the encounter. Maximum allowed duration is 1 second; 100 ms is recommended.
object (uid: audio_chunk)
- #3 end
End the streaming.
Signals the end of streaming and asks the Nabla Core API to finish what is still in progress (e.g. give a final state to the latest transcript item).
object (uid: end)
- #4 transcript_item
A transcript item.
A portion of the transcript being generated: typically the sentence currently being spoken, transcribed from the last transmitted audio chunks. It may be an incomplete sentence, since we keep transcribing as audio chunks are received. You should patch the previously received transcript item with the same id until is_final is true.
object (uid: transcript_item)
- #5 audio_chunk_ack
Acknowledgement for audio chunks up to the specified sequential id.
When enabled in the configuration, the server regularly sends audio chunk acknowledgements to signal receipt and imminent transcription of audio.
Clients should consider acknowledged audio as processed and delete it from their buffer.
Audio acknowledgement is also intended to set the pace of streaming: clients should refrain from sending new audio chunks until previous ones have been acknowledged. The server accepts up to 10 seconds of not-yet-acknowledged audio; clients exceeding this limit will get an "audio chunks buffer overflow" error.
Read more in Make transcription and dictation resilient to network interruptions.
object (uid: audio_chunk_ack): acknowledgement of audio receipt by the server up to the audio chunk with the sequential id ack_id.
- #6 error_message
An error message.
An error message sent right before closing the websocket due to a fatal error. It briefly explains what went wrong.
object (uid: error_message)
Schemas
- object (uid: transcribe_config)
First message to configure transcription (audio format, locale, etc.).
- object (uid: audio_chunk)
- object (uid: end)
- object (uid: transcript_item)
- object (uid: audio_chunk_ack)
Acknowledgement of audio receipt by the server up to the audio chunk with the sequential id ack_id. Read more in Make transcription and dictation resilient to network interruptions.
- object (uid: error_message)