Transcribe WebSocket API

Transcribe WebSocket API takes audio streams of your encounters and returns a live transcription.

๐Ÿ‡Show me some code

We've published a JavaScript example of how you can integrate the /transcribe-ws API on a web page.

General specifications

  • All messages sent and received via websockets are encoded as UTF-8 JSON text frames.
  • We don't keep any of your data beyond the websocket lifecycle. To be resilient to network issues, store on your side whatever you need to get back on track in case the websocket closes unexpectedly.

Authentication

  • Get an access token (see Authentication for details). Use a server access token when calling the Server API, or a user access token when calling the User API.
  • We support two different ways of specifying the authentication token:
    • The recommended way: You pass your bearer authentication token as an Authorization header when initiating the websocket.

      Example:

      url: 'wss://us.api.nabla.com/v1/core/server/transcribe-ws',
      protocol: 'transcribe-protocol',
      extra_headers: { 'Authorization': 'Bearer <YOUR_TOKEN>' }
      
    • An alternative way, for websocket clients that don't support extra headers (e.g. in web browsers): pass the token as a second websocket protocol, prefixed with jwt-.

      Example:

      url: 'wss://us.api.nabla.com/v1/core/user/transcribe-ws',
      protocols: ['transcribe-protocol', 'jwt-<YOUR_TOKEN>']
      
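For clients that must use the subprotocol fallback (e.g. the browser WebSocket API, which cannot set headers), building the connection arguments can be sketched as follows; the helper name is illustrative, not part of the API:

```javascript
// Build WebSocket constructor arguments for the subprotocol-based
// authentication fallback (sketch).
function transcribeSocketArgs(region, api, token) {
  return {
    url: `wss://${region}.api.nabla.com/v1/core/${api}/transcribe-ws`,
    // The token rides as a second subprotocol, prefixed with "jwt-".
    protocols: ['transcribe-protocol', `jwt-${token}`],
  };
}

// In a browser you would then open the socket with:
//   const { url, protocols } = transcribeSocketArgs('us', 'user', token);
//   const ws = new WebSocket(url, protocols);
```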

Servers

  • wss://{region}.api.nabla.com/v1/core/server (Server API)

    Called from your servers, authenticated with a server key.

  • wss://{region}.api.nabla.com/v1/core/user (User API)

    Called from front-end apps, authenticated with a user-scoped access token.

Operations

  • PUB /transcribe-ws

    Transcribe WebSocket API takes audio streams of your encounters and returns a live transcription.

    Each audio stream you declare in the configuration expects audio chunks to flow continuously (even during silence). If a stream does not receive any audio chunk for 10 seconds, the websocket fails with a timeout error (code: 83011).
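During pauses, you can keep a stream alive by sending silence. A minimal sketch, assuming 16-bit mono PCM at the sample rate declared in your transcribe_config (the audio format here is an assumption, not mandated by the API):

```javascript
// Generate a silent audio chunk of the given duration (sketch).
// Assumes 16-bit mono PCM; use the sample rate you declared in your
// transcribe_config. 100 ms chunks are the recommended size.
function silentChunk(durationMs, sampleRate = 16000) {
  const samples = Math.round((durationMs / 1000) * sampleRate);
  return new Uint8Array(samples * 2); // 2 bytes per 16-bit sample, all zeros
}
```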

    Example communication:

    • You: Open the websocket, specifying an authorization token.
    • You: Send a first message setting the configuration.
    • You: Continuously send small audio chunks for each stream.
    • Nabla: Continuously computes and sends transcript items.
    • You: Stop streaming audio and immediately send { "type": "END" }.
    • Nabla: Finishes processing audio and pushes any ongoing transcript item to a final state.
    • Nabla: Closes the websocket.
    • You: Filter out non-final transcript items and sort the remaining final items by start_offset_ms before calling the note-generation API.
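The last step above can be sketched as a small helper, using the is_final and start_offset_ms fields of received transcript items:

```javascript
// Prepare received transcript items for the note-generation API:
// drop non-final items, then order the rest by start offset (sketch).
function finalTranscript(items) {
  return items
    .filter((item) => item.is_final)
    .sort((a, b) => a.start_offset_ms - b.start_offset_ms);
}
```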

    Stream your encounter's audio by sending small chunks from each speaker.

    Accepts one of the following messages:

    • #0transcribe_config

      Initiates the transcription feature with the given configuration.

      This should be your first message in the websocket.

      object
      uid: transcribe_config

      First message to configure transcription (audio format, locale, etc).

      Examples

    • #1audio_chunk

      A chunk of an audio track from the encounter.

      A small portion of a single audio track from the encounter. The maximum allowed duration is 1 second; 100 ms is recommended.

      object
      uid: audio_chunk

      Examples

    • #2end

      End the streaming.

      Signals the end of streaming and asks the Nabla Core API to finish what is still in progress (e.g. giving a final state to the latest transcript item).

      object
      uid: end

      Examples

  • SUB /transcribe-ws

    Transcribe WebSocket API takes audio streams of your encounters and returns a live transcription.

    Each audio stream you declare in the configuration expects audio chunks to flow continuously (even during silence). If a stream does not receive any audio chunk for 10 seconds, the websocket fails with a timeout error (code: 83011).

    Example communication:

    • You: Open the websocket, specifying an authorization token.
    • You: Send a first message setting the configuration.
    • You: Continuously send small audio chunks for each stream.
    • Nabla: Continuously computes and sends transcript items.
    • You: Stop streaming audio and immediately send { "type": "END" }.
    • Nabla: Finishes processing audio and pushes any ongoing transcript item to a final state.
    • Nabla: Closes the websocket.
    • You: Filter out non-final transcript items and sort the remaining final items by start_offset_ms before calling the note-generation API.

    Receive the live transcription.

    Accepts one of the following messages:

    • #0transcript_item

      A transcript item.

      A portion of the transcript being generated. Typically, the sentence currently being spoken, transcribed from the most recently transmitted audio chunks. It may be an incomplete sentence, since transcription continues as audio chunks are received. You should patch the previously received transcript item with the same id until is_final is true.

      object
      uid: transcript_item

      Examples
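Patching items by id can be sketched as a simple upsert keyed on the item id (a client-side sketch; the Map-based store is illustrative):

```javascript
// Maintain the live transcript: each incoming transcript_item replaces
// any previous version with the same id, until is_final is true (sketch).
function applyTranscriptItem(transcript, item) {
  transcript.set(item.id, item); // Map<id, latest version of the item>
  return transcript;
}
```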

    • #1audio_chunk_ack

      Acknowledgement for audio chunks up to the specified sequential id.

      When enabled in the configuration, the server regularly sends audio chunk acknowledgements to signal receipt and imminent transcription of the audio.

      Clients should consider acknowledged audio as processed and delete it from their buffer.

      Acknowledgements also set the pace of streaming: clients should refrain from sending new audio chunks until previous ones have been acknowledged. The server accepts up to 10 seconds of not-yet-acknowledged audio; clients exceeding this limit will receive an "audio chunks buffer overflow" error.

      Read more in Make transcription and dictation resilient to network interruptions.

      object
      uid: audio_chunk_ack

      Acknowledgement of audio receipt by the server up to the audio chunk with the sequential id ack_id.

      Read more in Make transcription and dictation resilient to network interruptions.

      Examples
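The buffering and pacing rules above can be sketched as a client-side send buffer (class and parameter names are illustrative, not part of the API):

```javascript
// Pace audio streaming using audio_chunk_ack messages (sketch).
// Chunks are retained for possible resend until acknowledged; sending
// is gated so no more than maxUnackedMs of audio is in flight.
class AckPacedBuffer {
  constructor(chunkMs = 100, maxUnackedMs = 10000) {
    this.chunkMs = chunkMs;
    this.maxUnackedMs = maxUnackedMs;
    this.pending = new Map(); // seqId -> chunk, awaiting acknowledgement
    this.nextSeqId = 0;
  }

  // True when another chunk may be sent without risking an
  // "audio chunks buffer overflow" error.
  canSend() {
    return this.pending.size * this.chunkMs < this.maxUnackedMs;
  }

  enqueue(chunk) {
    const seqId = this.nextSeqId++;
    this.pending.set(seqId, chunk);
    return seqId;
  }

  // On audio_chunk_ack: everything up to ack_id is processed server-side
  // and can be dropped from the client buffer.
  acknowledge(ackId) {
    for (const seqId of [...this.pending.keys()]) {
      if (seqId <= ackId) this.pending.delete(seqId);
    }
  }
}
```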

    • #2error_message

      An error message.

      An error message sent right before the websocket is closed due to a fatal error. It briefly explains what went wrong.

      object
      uid: error_message

      Examples

Messages

  • #1transcribe_config

    Initiates the transcription feature with the given configuration.

    This should be your first message in the websocket.

    object
    uid: transcribe_config

    First message to configure transcription (audio format, locale, etc).

  • #2audio_chunk

    A chunk of an audio track from the encounter.

    A small portion of a single audio track from the encounter. The maximum allowed duration is 1 second; 100 ms is recommended.

    object
    uid: audio_chunk
  • #3end

    End the streaming.

    Signals the end of streaming and asks the Nabla Core API to finish what is still in progress (e.g. giving a final state to the latest transcript item).

    object
    uid: end
  • #4transcript_item

    A transcript item.

    A portion of the transcript being generated. Typically, the sentence currently being spoken, transcribed from the most recently transmitted audio chunks. It may be an incomplete sentence, since transcription continues as audio chunks are received. You should patch the previously received transcript item with the same id until is_final is true.

    object
    uid: transcript_item
  • #5audio_chunk_ack

    Acknowledgement for audio chunks up to the specified sequential id.

    When enabled in the configuration, the server regularly sends audio chunk acknowledgements to signal receipt and imminent transcription of the audio.

    Clients should consider acknowledged audio as processed and delete it from their buffer.

    Acknowledgements also set the pace of streaming: clients should refrain from sending new audio chunks until previous ones have been acknowledged. The server accepts up to 10 seconds of not-yet-acknowledged audio; clients exceeding this limit will receive an "audio chunks buffer overflow" error.

    Read more in Make transcription and dictation resilient to network interruptions.

    object
    uid: audio_chunk_ack

    Acknowledgement of audio receipt by the server up to the audio chunk with the sequential id ack_id.

    Read more in Make transcription and dictation resilient to network interruptions.

  • #6error_message

    An error message.

    An error message sent right before the websocket is closed due to a fatal error. It briefly explains what went wrong.

    object
    uid: error_message

Schemas

  • object
    uid: transcribe_config

    First message to configure transcription (audio format, locale, etc).

  • object
    uid: audio_chunk
  • object
    uid: end
  • object
    uid: transcript_item
  • object
    uid: audio_chunk_ack

    Acknowledgement of audio receipt by the server up to the audio chunk with the sequential id ack_id.

    Read more in Make transcription and dictation resilient to network interruptions.

  • object
    uid: error_message