Dictate Transcription WebSocket API

Dictate Transcription WebSocket API is designed specifically for healthcare professionals to dictate clinical notes efficiently and accurately. It takes audio from your provider's microphone and returns a live transcription.

This API delivers live transcription tailored to the medical field, built around audio captured directly from the provider's microphone. It differs from our general transcription service (/transcribe-ws) by offering configuration options catered to the unique needs of clinical documentation dictation, such as precise control over punctuation.

General specifications

  • All messages sent and received via websockets are encoded as UTF-8 JSON text frames.
  • We don't keep any of your data beyond the websocket lifecycle. To be resilient to network failures, we recommend storing whatever you need to get back on track in case the connection closes unexpectedly.

Authentication

  • Get an access token (see Authentication for details). It can be a server access token or a user access token, depending on whether you are calling the Server API or the User API.
  • We support two different ways of specifying the authentication token:
    • The recommended way: You pass your bearer authentication token as an Authorization header when initiating the websocket.

      Example:

      url: 'wss://us.api.nabla.com/v1/core/server/dictate-ws',
      protocol: 'dictate-protocol',
      extra_headers: { 'Authorization': 'Bearer <YOUR_TOKEN>' }
      
    • An alternative way, useful when your websocket client does not support custom headers (e.g., in web browsers), is to pass the token as a second websocket protocol prefixed with jwt- (see the sketch after this list).

      Example:

      url: 'wss://us.api.nabla.com/v1/core/user/dictate-ws',
      protocols: ['dictate-protocol', 'jwt-<YOUR_TOKEN>']
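
For concreteness, here is a minimal connection sketch in TypeScript covering both methods. The browser variant uses the built-in WebSocket with the jwt- protocol; the commented Node.js variant assumes the third-party "ws" package, which supports custom handshake headers.

    // Browser sketch: pass the token as a second websocket protocol
    // (jwt- prefix), since browsers cannot attach an Authorization header.
    const token = "<YOUR_TOKEN>"; // server or user access token
    const ws = new WebSocket(
      "wss://us.api.nabla.com/v1/core/user/dictate-ws",
      ["dictate-protocol", `jwt-${token}`],
    );

    // Node.js alternative (separate file, using the "ws" package), where
    // custom headers are supported:
    //   import WebSocket from "ws";
    //   const ws = new WebSocket(
    //     "wss://us.api.nabla.com/v1/core/server/dictate-ws",
    //     "dictate-protocol",
    //     { headers: { Authorization: `Bearer ${token}` } },
    //   );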
      

Servers

  • wss://us.api.nabla.com/v1/core/server (Server API)

    Called from your servers, authenticated with a server key.

  • wss://us.api.nabla.com/v1/core/user (User API)

    Called from front-end apps, authenticated with a user-scoped access token.

Operations

  • PUB /dictate-ws

    The Dictate Transcription WebSocket API takes audio from your provider's microphone and returns a live transcription.

    The Dictation API expects audio chunks to flow continuously (even if silent); if no audio chunk is received for 10 seconds, the websocket fails with a timeout error (code: 83011).

    Example communication:

    • You: Open the websocket specifying an authorization token.
    • You: Send a first message setting the configuration.
    • You: Continuously send small audio chunks.
    • Core API: Continuously computes and sends dictation items.
    • You: Stop streaming audio and immediately send { "type": "END" }.
    • Core API: Finishes processing audio and pushes any ongoing dictation item to a final state.
    • Core API: Closes the websocket.

    Stream the audio of the note being dictated by sending it in small, defined chunks (e.g., segments of 100 milliseconds each).
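
    To tie these steps together, here is a minimal send-side sketch in TypeScript. Only { "type": "END" } appears verbatim in the flow above; the other message shapes (type values, payload fields) and the nextAudioChunk helper are hypothetical placeholders for the message schemas documented below.

      // Send-side sketch; message shapes other than { "type": "END" } are
      // illustrative assumptions, see the message schemas for actual fields.
      declare function nextAudioChunk(): string; // hypothetical: ~100 ms of encoded audio

      const token = "<YOUR_TOKEN>";
      const ws = new WebSocket(
        "wss://us.api.nabla.com/v1/core/user/dictate-ws",
        ["dictate-protocol", `jwt-${token}`],
      );

      let timer: ReturnType<typeof setInterval>;

      ws.onopen = () => {
        // 1. First message: the dictation configuration (audio format, locale, ...).
        ws.send(JSON.stringify({ type: "CONFIG" /* see dictate_config schema */ }));

        // 2. Continuously send small audio chunks (e.g., 100 ms each), even
        //    when silent, to stay clear of the 10-second timeout (code 83011).
        timer = setInterval(() => {
          ws.send(JSON.stringify({ type: "AUDIO_CHUNK", payload: nextAudioChunk() }));
        }, 100);
      };

      // 3. Call this from your UI when the provider finishes dictating: stop
      //    streaming and immediately send END; the server then finalizes any
      //    ongoing dictation item and closes the websocket.
      function stopDictation() {
        clearInterval(timer);
        ws.send(JSON.stringify({ type: "END" }));
      }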

    Accepts one of the following messages:

    • #0 dictate_config

      Initiates the dictation feature with the given configuration.

      This should be your first message in the websocket.

      object
      uid: dictate_config

      First message to configure dictation (audio format, locale, etc).

    • #1 audio_chunk

      A chunk of audio from the provider's microphone.

      A chunk (small portion) of audio from the provider's microphone on which they dictate. The maximum allowed duration is 1 second; 100 ms is recommended.

      object
      uid: audio_chunk

    • #2 end

      End the streaming.

      Signals the end of streaming and asks the speech-to-text engine to finish what is still in progress (e.g., giving a final state to the latest dictation item).

      object
      uid: end

  • SUB /dictate-ws

    The Dictate Transcription WebSocket API takes audio from your provider's microphone and returns a live transcription.

    The Dictation API expects audio chunks to flow continuously (even if silent); if no audio chunk is received for 10 seconds, the websocket fails with a timeout error (code: 83011).

    Example communication:

    • You: Open the websocket specifying an authorization token.
    • You: Send a first message setting the configuration.
    • You: Continuously send small audio chunks.
    • Core API: Continuously computes and sends dictation items.
    • You: Stop streaming audio and immediately send { "type": "END" }.
    • Core API: Finishes processing audio and pushes any ongoing dictation item to a final state.
    • Core API: Closes the websocket.

    Receive the live transcription.

    Accepts one of the following messages:

    • #0 dictation_item

      A dictation item.

      A portion of the dictation being generated. Typically, this is the sentence currently being spoken, transcribed from the most recently transmitted audio chunks. It may be an incomplete sentence, since we keep transcribing as audio chunks are received. You should patch the previously received dictation item with the same id until is_final is true (see the receive-side sketch after this list).

      object
      uid: dictation_item

    • #1 audio_chunk_ack

      Acknowledgement for audio chunks up to the specified sequential id.

      When enabled in the configuration, the server regularly sends audio chunk acknowledgements to signal receipt and imminent processing of audio.

      Clients should consider acknowledged audio as processed and delete it from their buffer.

      Moreover, audio acknowledgement is intended to set the pace of streaming: clients should refrain from sending new audio chunks until acknowledgement is received for previous ones. The server accepts up to 10 seconds of not-yet-acknowledged audio; clients exceeding this limit will get an "audio chunks buffer overflow" error (see the receive-side sketch after this list).

      Read more in Make transcription and dictation resilient to network interruptions.

      object
      uid: audio_chunk_ack

      Acknowledgement of audio receipt by the server up to the audio chunk with the sequential id ack_id.

      Read more in Make transcription and dictation resilient to network interruptions.

    • #2 error_message

      An error message.

      An error message sent right before closing the websocket due to a fatal error. It briefly explains what went wrong.

      object
      uid: error_message

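    The receive side, reusing the ws connection from the send-side sketch above, could look like the following TypeScript. The field names (id, text, is_final, ack_id) follow the descriptions above, but the exact message shapes, including the type values, are assumptions.

      // Receive-side sketch: patch dictation items until final, and use acks
      // to trim the resend buffer and pace streaming. Type values are assumed.
      const items = new Map<string, string>(); // dictation items by id
      let pending: { seqId: number; data: string }[] = []; // not-yet-acknowledged audio

      ws.onmessage = (event) => {
        const msg = JSON.parse(event.data as string);
        switch (msg.type) {
          case "DICTATION_ITEM":
            // Keep patching the item with the same id until is_final is true.
            items.set(msg.id, msg.text);
            break;
          case "AUDIO_CHUNK_ACK":
            // Audio up to ack_id was received: drop it from the local buffer,
            // and only send further chunks once previous ones are acknowledged
            // (never exceed 10 seconds of not-yet-acknowledged audio).
            pending = pending.filter((c) => c.seqId > msg.ack_id);
            break;
          case "ERROR_MESSAGE":
            // Sent right before the websocket closes on a fatal error.
            console.error("Dictation failed:", msg);
            break;
        }
      };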

Messages

  • #1 dictate_config

    Initiates the dictation feature with the given configuration.

    This should be your first message in the websocket.

    object
    uid: dictate_config

    First message to configure dictation (audio format, locale, etc).

  • #2 audio_chunk

    A chunk of audio from the provider's microphone.

    A chunk (small portion) of audio from the provider's microphone on which they dictate. The maximum allowed duration is 1 second; 100 ms is recommended.

    object
    uid: audio_chunk
  • #3 end

    End the streaming.

    Signals the end of streaming and asks the speech-to-text engine to finish what is still in progress (e.g., giving a final state to the latest dictation item).

    object
    uid: end
  • #4 dictation_item

    A dictation item.

    A portion of the dictation being generated. Typically, this is the sentence currently being spoken, transcribed from the most recently transmitted audio chunks. It may be an incomplete sentence, since we keep transcribing as audio chunks are received. You should patch the previously received dictation item with the same id until is_final is true.

    object
    uid: dictation_item
  • #5 audio_chunk_ack

    Acknowledgement for audio chunks up to the specified sequential id.

    When enabled in the configuration, the server regularly sends audio chunk acknowledgements to signal receipt and imminent processing of audio.

    Clients should consider acknowledged audio as processed and delete it from their buffer.

    Moreover, audio acknowledgement is intended to set the pace of streaming: clients should refrain from sending new audio chunks until acknowledgement is received for previous ones. The server accepts up to 10 seconds of not-yet-acknowledged audio; clients exceeding this limit will get an "audio chunks buffer overflow" error.

    Read more in Make transcription and dictation resilient to network interruptions.

    object
    uid: audio_chunk_ack

    Acknowledgement of audio receipt by the server up to the audio chunk with the sequential id ack_id.

    Read more in Make transcription and dictation resilient to network interruptions.

  • #6 error_message

    An error message.

    An error message sent right before closing the websocket due to a fatal error. It briefly explains what went wrong.

    object
    uid: error_message

Schemas

  • object
    uid: dictate_config

    First message to configure dictation (audio format, locale, etc).

  • object
    uid: audio_chunk
  • object
    uid: end
  • object
    uid: dictation_item
  • object
    uid: audio_chunk_ack

    Acknowledgement of audio receipt by the server up to the audio chunk with the sequential id ack_id.

    Read more in Make transcription and dictation resilient to network interruptions.

  • object
    uid: error_message