Dictate Transcription WebSocket API 

Dictate Transcription WebSocket API is designed specifically for healthcare professionals to dictate clinical notes efficiently and accurately. It takes audio from your provider's microphone and returns a live transcription.

It differs from our general transcription service (/transcribe-ws) by offering configuration options tailored to the needs of clinical documentation dictation, such as precise control over punctuation.

General specifications

  • All messages sent and received via websockets are encoded as UTF-8 JSON text frames.
  • The order of returned dictated texts is guaranteed to be chronological.
  • We don't keep any of your data beyond the websocket lifecycle. To be resilient to network interruptions, store whatever state you need client-side so you can get back on track after an untimely closure.

Dictation commands

You can consult the list of available dictation commands in the dedicated guide.

Authentication

  • Get an access token; details in Authentication. Use a server Access-Token when calling the Server API, or a user Access-Token when calling the User API.
  • We support two different ways of specifying the authentication token:
    • The recommended way: pass your bearer authentication token as an Authorization header when initiating the websocket.

      Example:

      url: 'wss://us.api.nabla.com/v1/core/server/dictate-ws',
      protocol: 'dictate-protocol',
      extra_headers: { 'Authorization': 'Bearer <YOUR_TOKEN>' }
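
      For instance, with Node's ws package (an assumption; any client that can set
      request headers works the same way), the connection sketch above becomes:

        import WebSocket from 'ws';

        // Node's ws client accepts custom headers; browser WebSockets do not.
        const socket = new WebSocket(
          'wss://us.api.nabla.com/v1/core/server/dictate-ws',
          'dictate-protocol',
          { headers: { Authorization: 'Bearer <YOUR_TOKEN>' } }
        );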
      
    • An alternative, for when your websocket client does not support extra headers (e.g. in web browsers): pass the token as a second websocket protocol prefixed with jwt-.

      Example:

      url: 'wss://us.api.nabla.com/v1/core/user/dictate-ws',
      protocols: ['dictate-protocol', 'jwt-<YOUR_TOKEN>']
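
      In a browser, where the WebSocket constructor only takes a URL and a list of
      subprotocols, that sketch becomes:

        // Browser WebSocket API: no custom headers, so the token rides as a subprotocol.
        const socket = new WebSocket(
          'wss://us.api.nabla.com/v1/core/user/dictate-ws',
          ['dictate-protocol', 'jwt-<YOUR_TOKEN>']
        );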
      

Servers

  • Server API: wss://us.api.nabla.com/v1/core/server

    Called from your servers, authenticated with a server access token.

  • User API: wss://us.api.nabla.com/v1/core/user

    Called from front-end apps, authenticated with a user-scoped access token.

Operations

  • PUB /dictate-ws

    The Dictate Transcription WebSocket API takes audio from your provider's microphone and returns a live transcription.

    The Dictation API expects audio chunks to flow continuously (even during silence); if no audio chunk is received for 10 seconds, the websocket fails with a timeout error (code: 83011).

    Example communication:

    • You: open the websocket, specifying an authorization token.
    • You: send a first message setting the configuration.
    • You: continuously send small audio chunks.
    • Core API: continuously computes and sends dictation items.
    • You: stop streaming audio and immediately send { "type": "END" }.
    • Core API: finishes processing the audio and sends any remaining dictated texts.
    • Core API: closes the websocket.

    Stream the audio of the note being dictated by sending it in small, defined chunks (e.g., segments of 100 milliseconds each).
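
    A minimal end-to-end sketch of that flow, assuming Node's ws package and
    illustrative frame shapes (only { "type": "END" } is documented verbatim;
    check the message schemas below for the real dictate_config and audio_chunk
    fields):

      import WebSocket from 'ws';

      const socket = new WebSocket(
        'wss://us.api.nabla.com/v1/core/server/dictate-ws',
        'dictate-protocol',
        { headers: { Authorization: 'Bearer <YOUR_TOKEN>' } }
      );

      let seqId = 0;

      socket.on('open', () => {
        // First frame: the dictation configuration (fields are illustrative).
        socket.send(JSON.stringify({ type: 'DICTATE_CONFIG' /* audio format, locale, ... */ }));
      });

      // Call this roughly every 100 ms with the latest microphone audio, even during silence.
      function sendChunk(chunk: Buffer): void {
        // Sequence ids are required; 'seq_id' and 'payload' are assumed field names.
        socket.send(JSON.stringify({
          type: 'AUDIO_CHUNK',
          seq_id: seqId++,
          payload: chunk.toString('base64'),
        }));
      }

      socket.on('message', (raw) => {
        const msg = JSON.parse(raw.toString());
        if (msg.type === 'DICTATED_TEXT') process.stdout.write(msg.text); // append as-is, no extra spaces
        if (msg.type === 'AUDIO_CHUNK_ACK') { /* drop chunks with seq_id <= msg.ack_id from your buffer */ }
        if (msg.type === 'ERROR_MESSAGE') console.error(msg);
      });

      // When the provider stops dictating:
      function finish(): void {
        socket.send(JSON.stringify({ type: 'END' })); // server flushes remaining text, then closes
      }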

    Accepts one of the following messages:

    • #0 dictate_config

      Initiates the dictation feature with the given configuration.

      This should be your first message in the websocket.

      object
      uid: dictate_config

      First message to configure dictation (audio format, locale, etc.). Unlike older WebSocket APIs, the Dictation API now consistently requires sequence ids in incoming audio chunks and always emits audio chunk acknowledgement frames.

      Read more in Make transcription and dictation resilient to network interruptions.

    • #1 audio_chunk

      A chunk of audio from the provider's microphone.

      A short portion of audio from the provider's microphone into which they dictate. Maximum allowed duration is 1 second; 100 ms is recommended.

      object
      uid: audio_chunk

    • #2 end

      End the streaming.

      Signal the end of streaming and ask the speech-to-text engine to finish what is still in progress.

      object
      uid: end

  • SUB /dictate-ws

    The channel behavior, timeout rules, and example communication flow are the same as for PUB /dictate-ws above.

    Receive the live transcription.

    You will receive one of the following messages:

    • #0 dictated_text

      A dictated text.

      A segment of the dictation, usually one or two words. Each received segment should be appended directly to the ongoing dictation result without inserting additional spaces, punctuation, or formatting.

      object
      uid: dictated_text

    • #1 audio_chunk_ack

      Acknowledgement for audio chunks up to the specified sequential id.

      The server regularly sends audio chunk acknowledgements to signal receipt and imminent processing of the audio.

      Clients should consider acknowledged audio as processed and delete it from their buffer.

      Audio acknowledgements also set the streaming pace: clients should refrain from sending new audio chunks until previous ones are acknowledged. The server accepts up to 10 seconds of not-yet-acknowledged audio; clients going beyond that will get an "audio chunks buffer overflow" error. (A pacing sketch follows this list.)

      Read more in Make transcription and dictation resilient to network interruptions.

      object
      uid: audio_chunk_ack

      Acknowledgement of audio receipt by the server up to the audio chunk with the sequential id ack_id.

      Read more in Make transcription and dictation resilient to network interruptions.

    • #2 error_message

      An error message.

      An error message sent right before closing the websocket due to a fatal error. It briefly explains what went wrong.

      object
      uid: error_message

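    The pacing rule above can be implemented with a small bounded buffer of
    unacknowledged chunks. A minimal sketch, assuming ~100 ms chunks and the
    10-second server-side limit (helper names are hypothetical):

      const CHUNK_MS = 100;
      const MAX_UNACKED_MS = 10_000; // server-side limit on not-yet-acknowledged audio

      const pending = new Map<number, Buffer>(); // seq_id -> chunk, kept for replay after a reconnect
      let nextSeqId = 0;

      // Returns false when the in-flight window is full; retry after the next ack.
      function trySend(chunk: Buffer, send: (seqId: number, chunk: Buffer) => void): boolean {
        if (pending.size * CHUNK_MS >= MAX_UNACKED_MS) return false;
        const seqId = nextSeqId++;
        pending.set(seqId, chunk);
        send(seqId, chunk);
        return true;
      }

      // On audio_chunk_ack: everything up to ack_id is processed server-side; free it locally.
      function onAck(ackId: number): void {
        for (const seqId of [...pending.keys()]) {
          if (seqId <= ackId) pending.delete(seqId);
        }
      }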

Messages

  • #1 dictate_config

    Initiates the dictation feature with the given configuration.

    This should be your first message in the websocket.

    object
    uid: dictate_config

    First message to configure dictation (audio format, locale, etc.). Unlike older WebSocket APIs, the Dictation API now consistently requires sequence ids in incoming audio chunks and always emits audio chunk acknowledgement frames.

    Read more in Make transcription and dictation resilient to network interruptions.

  • #2 audio_chunk

    A chunk of audio from the provider's microphone.

    A short portion of audio from the provider's microphone into which they dictate. Maximum allowed duration is 1 second; 100 ms is recommended.

    object
    uid: audio_chunk
  • #3 end

    End the streaming.

    Signal the end of streaming and ask the speech-to-text engine to finish what is still in progress.

    object
    uid: end
  • #4 dictated_text

    A dictated text.

    A segment of the dictation, usually one or two words. Each received segment should be appended directly to the ongoing dictation result without inserting additional spaces, punctuation, or formatting.

    object
    uid: dictated_text
  • #5 audio_chunk_ack

    Acknowledgement for audio chunks up to the specified sequential id.

    The server regularly sends audio chunk acknowledgements to signal receipt and imminent processing of the audio.

    Clients should consider acknowledged audio as processed and delete it from their buffer.

    Audio acknowledgements also set the streaming pace: clients should refrain from sending new audio chunks until previous ones are acknowledged. The server accepts up to 10 seconds of not-yet-acknowledged audio; clients going beyond that will get an "audio chunks buffer overflow" error.

    Read more in Make transcription and dictation resilient to network interruptions.

    object
    uid: audio_chunk_ack

    Acknowledgement of audio receipt by the server up to the audio chunk with the sequential id ack_id.

    Read more in Make transcription and dictation resilient to network interruptions.

  • #6 error_message

    An error message.

    An error message sent right before closing the websocket due to a fatal error. It briefly explains what went wrong.

    object
    uid: error_message
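
  To make these shapes concrete, here is one illustrative frame per message type. Only { "type": "END" } is documented verbatim above; every other field name and type string below is an assumption to check against the schemas:

    // Client -> server
    const dictateConfig = { type: 'DICTATE_CONFIG' /* audio format, locale, punctuation options... */ };
    const audioChunk    = { type: 'AUDIO_CHUNK', seq_id: 42, payload: '<base64 audio>' };
    const end           = { type: 'END' };

    // Server -> client
    const dictatedText  = { type: 'DICTATED_TEXT', text: ' chest pain' };
    const audioChunkAck = { type: 'AUDIO_CHUNK_ACK', ack_id: 42 };
    const errorMessage  = { type: 'ERROR_MESSAGE', code: 83011, message: 'audio timeout' };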

Schemas

  • object
    uid: dictate_config

    First message to configure dictation (audio format, locale, etc.). Unlike older WebSocket APIs, the Dictation API now consistently requires sequence ids in incoming audio chunks and always emits audio chunk acknowledgement frames.

    Read more in Make transcription and dictation resilient to network interruptions.

  • object
    uid: audio_chunk
  • object
    uid: end
  • object
    uid: dictated_text
  • object
    uid: audio_chunk_ack

    Acknowledgement of audio receipt by the server up to the audio chunk with the sequential id ack_id.

    Read more in Make transcription and dictation resilient to network interruptions.

  • object
    uid: error_message