
Custom WebSocket

When your audio comes from something other than a telephony provider, such as a hardware device, a browser, a media server, or a bot, you can stream it into a VideoSDK room over a custom WebSocket connection. With the Twilio, Plivo, and Telnyx connectors, the provider implements the streaming protocol. Here, you implement the WebSocket client and send PCM16 audio frames yourself.

The flow has two parts: create an ingest session on your backend, then connect a WebSocket client and stream audio in both directions. The same connection also carries a topic-based messaging channel for exchanging chat or data with other participants. If you are new to connectors, read the Connectors Overview first.

Complete example

A runnable browser example that exercises the full flow (session creation, microphone streaming, room-audio playback, and pub/sub messaging) is on GitHub: videosdk-live/videosdk-socket-ingest-example.

How It Works

  1. Your backend creates an ingest session and receives a WebSocket URL that contains a single-use reference.
  2. Your client opens a WebSocket connection to that URL.
  3. The client sends a start frame. VideoSDK claims the session, joins the room, and begins bridging audio.
  4. The client streams media frames into the room and receives media frames containing the room audio.
  5. The client sends a stop frame, or closes the connection, to leave the room.

Audio is 16-bit signed PCM, little-endian, at 8 kHz mono, base64 encoded. A 20 ms frame is 160 samples, which is 320 bytes, or exactly 428 base64 characters (including one padding character). VideoSDK converts between this format and the room's audio internally, so you only send and receive PCM16.
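The frame arithmetic and encoding can be sketched as follows. This is a minimal sketch, assuming your source produces floating-point samples in the range -1 to 1 that are already resampled to 8 kHz mono (for example, from a Web Audio or device capture pipeline); `floatToPcm16Base64` is a hypothetical helper name, not part of any VideoSDK SDK.

```javascript
// Audio format from this guide: 8 kHz mono, 16-bit samples, 20 ms frames.
const SAMPLE_RATE = 8000;
const FRAME_MS = 20;
const BYTES_PER_SAMPLE = 2;
const SAMPLES_PER_FRAME = (SAMPLE_RATE * FRAME_MS) / 1000; // 160
const BYTES_PER_FRAME = SAMPLES_PER_FRAME * BYTES_PER_SAMPLE; // 320
// Base64 emits 4 characters per 3-byte group and pads the final group.
const BASE64_CHARS_PER_FRAME = Math.ceil(BYTES_PER_FRAME / 3) * 4; // 428

// Hypothetical helper: convert float samples in [-1, 1] to a base64 string of
// 16-bit signed little-endian PCM. Assumes the input is already 8 kHz mono.
function floatToPcm16Base64(samples) {
  const buf = Buffer.alloc(samples.length * BYTES_PER_SAMPLE);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first, then scale so -1 maps to -32768 and 1 maps to 32767.
    const s = Math.max(-1, Math.min(1, samples[i]));
    buf.writeInt16LE(Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), i * BYTES_PER_SAMPLE);
  }
  return buf.toString("base64");
}
```

Clamping before scaling keeps out-of-range input from throwing inside `writeInt16LE`, at the cost of hard-clipping loud signals.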

Prerequisites

  • A VideoSDK account and API token. See Generate a VideoSDK Token.
  • A target room ID. If you do not have one, create a room with the Create Room API.
  • A WebSocket client, in any language, that can send and receive JSON text frames containing base64-encoded audio.

Step 1: Create an Ingest Session

Call this endpoint from your backend, because it requires your VideoSDK token. The returned WebSocket URL is single-use, so pass it to whichever client will stream the audio.

curl -X POST https://api.videosdk.live/v2/ingest/sessions \
  -H "Authorization: Bearer $VIDEOSDK_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "roomId": "abcd-efgh-ijkl",
    "participant": {
      "name": "Field Device 12",
      "metadata": { "deviceId": "dev_12" }
    }
  }'

The request body accepts the following fields.

| Field | Type | Required | Description |
|---|---|---|---|
| roomId | string | Yes | The room the stream joins. |
| participant | object | No | Identity for the ingested stream. |
| participant.id | string | No | Custom participant ID. Omit it to let VideoSDK generate one. |
| participant.name | string | No | Display name in the room. |
| participant.metadata | object | No | Metadata attached to the participant. |
| agent | object | No | Attach an AI agent to the session. |
| agent.id | string | Required if agent is provided | Agent identifier. |
| agent.metadata | object | No | Metadata attached to the agent. |
| region | string | No | Pins the ingest region (e.g., us002 or in002). |
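The same request from a Node.js backend might look like the sketch below. It assumes Node.js 18 or later for the built-in `fetch`; `buildIngestSessionRequest` and `createIngestSession` are hypothetical helper names, and the checks only mirror the Required column above rather than the server's full validation.

```javascript
const INGEST_SESSIONS_URL = "https://api.videosdk.live/v2/ingest/sessions";

// Build the fetch options separately so they can be inspected or logged.
function buildIngestSessionRequest(token, { roomId, participant, agent, region }) {
  if (!roomId) throw new Error("roomId is required");
  if (agent && !agent.id) throw new Error("agent.id is required when agent is provided");
  // JSON.stringify drops keys whose value is undefined, so omitted options stay omitted.
  const body = JSON.stringify({ roomId, participant, agent, region });
  return {
    method: "POST",
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    body,
  };
}

// Call from your backend only: the token must not reach the streaming client.
async function createIngestSession(token, options) {
  const res = await fetch(INGEST_SESSIONS_URL, buildIngestSessionRequest(token, options));
  if (!res.ok) {
    // 401 or 403 usually means the token is missing or expired.
    throw new Error(`Create ingest session failed: HTTP ${res.status}`);
  }
  const { data } = await res.json();
  return data; // { wsUrl, callId, expiresIn }
}
```

For example, `const { wsUrl } = await createIngestSession(process.env.VIDEOSDK_TOKEN, { roomId: "abcd-efgh-ijkl" });` returns the URL to hand to your client.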

A successful request returns the session:

{
  "message": "Ingest session created",
  "data": {
    "wsUrl": "wss://ingest.videosdk.live/videosdk?ref=vs_k9d8j2m4x7p1w5v3r6u0",
    "callId": "550e8400-e29b-41d4-a716-446655440000",
    "expiresIn": 90
  }
}

| Field | Description |
|---|---|
| wsUrl | The WebSocket endpoint to connect to. The ref query parameter is the single-use claim, so do not remove or modify it. |
| callId | The backend handle for this session. It appears on lifecycle webhooks. |
| expiresIn | Seconds before the unclaimed session expires. The value is 90. Connect and send the start frame before then. |

Hand the wsUrl to your client. It connects to this URL in the next step, before the session expires.
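Before handing the URL over, a backend can sanity-check the response and record when the claim window closes. This is a minimal sketch; `parseIngestSession` is a hypothetical helper, and requiring a `wss:` URL with a non-empty `ref` parameter is an assumption based on the example response above.

```javascript
// Hypothetical helper: validate a create-session response and compute the
// deadline by which the client must connect and send the start frame.
function parseIngestSession(response, now = Date.now()) {
  const { wsUrl, callId, expiresIn } = response.data ?? {};
  const url = new URL(wsUrl); // throws TypeError if wsUrl is missing or malformed
  if (url.protocol !== "wss:" || !url.searchParams.get("ref")) {
    throw new Error("wsUrl must be a wss:// URL that still carries its ref parameter");
  }
  return {
    wsUrl, // pass through unchanged: the ref parameter is the single-use claim
    callId,
    claimDeadline: now + expiresIn * 1000, // milliseconds since the epoch
  };
}
```

Returning the original `wsUrl` string, instead of re-serializing the parsed `URL`, avoids any chance of re-encoding the `ref` value.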

Step 2: Connect and Stream

Using the wsUrl from Step 1, open a WebSocket connection and follow this sequence.

| Step | Frame sent by the client | Notes |
|---|---|---|
| Start | { "event": "start" } | Must be sent within 5 seconds of connecting, or the server closes the connection with code 1008 and the reason start timeout. |
| Media | { "event": "media", "payload": "<base64 PCM16-LE>" } | Stream continuously, about one frame every 20 ms. |
| Stop | { "event": "stop" } | Ends the session. Closing the connection also ends it. |

The server sends room audio back to the client as media frames in the same shape:

{ "event": "media", "payload": "<base64 PCM16-LE>" }

Decode payload from base64 to get 16-bit little-endian PCM samples at 8 kHz mono, then play or process them.
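Decoding an inbound frame can be sketched as below. This is a minimal sketch, assuming each frame arrives as a JSON text message; `decodeMediaFrame` is a hypothetical helper that returns `null` for non-media events so other handlers can process them.

```javascript
// Hypothetical helper: turn an inbound text frame into 16-bit PCM samples.
function decodeMediaFrame(text) {
  const msg = JSON.parse(text);
  if (msg.event !== "media") return null;
  const bytes = Buffer.from(msg.payload, "base64");
  // Read each sample explicitly as little-endian so the result does not
  // depend on the host's byte order.
  const samples = new Int16Array(bytes.length >> 1);
  for (let i = 0; i < samples.length; i++) {
    samples[i] = bytes.readInt16LE(i * 2);
  }
  return samples; // 8 kHz mono; 160 samples per 20 ms frame
}
```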

Note: The server sends a WebSocket ping every 30 seconds. Most WebSocket libraries, including ws for Node.js, reply with a pong automatically. If your client does not reply, the server terminates the connection after a missed heartbeat.

Example: Node.js

import WebSocket from "ws";

// wsUrl comes from POST /v2/ingest/sessions, which you call on your backend.
const ws = new WebSocket(wsUrl);

ws.on("open", () => {
  // 1. Claim the session.
  ws.send(JSON.stringify({ event: "start" }));

  // 2. Stream audio. getPcm16Frame() returns a Buffer of 16-bit little-endian
  // PCM samples at 8 kHz mono. 320 bytes is one 20 ms frame.
  const timer = setInterval(() => {
    if (ws.readyState !== WebSocket.OPEN) return; // skip ticks once closing starts
    const pcm16 = getPcm16Frame(); // your audio source
    if (!pcm16) return;
    ws.send(JSON.stringify({ event: "media", payload: pcm16.toString("base64") }));
  }, 20);

  ws.on("close", () => clearInterval(timer));
});

// Receive room audio.
ws.on("message", (raw) => {
  const msg = JSON.parse(raw.toString());
  if (msg.event === "media") {
    const pcm16 = Buffer.from(msg.payload, "base64"); // 16-bit LE PCM, 8 kHz mono
    playPcm16(pcm16); // your playback or processing
  }
});

// 3. End the session when you are done.
function stop() {
  if (ws.readyState !== WebSocket.OPEN) return; // already closing or closed
  ws.send(JSON.stringify({ event: "stop" }));
  ws.close();
}
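If your source delivers audio in arbitrary-size buffers (a file, a device read, a media server callback), you need to cut it into 20 ms frames and keep a steady send rate. The sketch below is one way to do that under stated assumptions: `chunkPcm16` and `framesDue` are hypothetical helpers, zero-padding the last partial frame with silence is a choice rather than a documented requirement, and pacing by elapsed time is a swapped-in technique to avoid the drift that `setInterval` can accumulate.

```javascript
const FRAME_BYTES = 320; // 160 samples x 2 bytes at 8 kHz mono = 20 ms

// Split a PCM16 Buffer into 320-byte frames. The final partial frame is
// zero-padded (silence) so every frame has the same length.
function chunkPcm16(buffer) {
  const frames = [];
  for (let off = 0; off < buffer.length; off += FRAME_BYTES) {
    const frame = Buffer.alloc(FRAME_BYTES); // zero-filled
    buffer.copy(frame, 0, off, Math.min(off + FRAME_BYTES, buffer.length));
    frames.push(frame);
  }
  return frames;
}

// How many frames should go out now to stay on a 20 ms schedule, given when
// streaming started and how many frames have already been sent. The first
// frame is due immediately at startMs.
function framesDue(startMs, nowMs, framesSent, frameMs = 20) {
  const expected = Math.floor((nowMs - startMs) / frameMs) + 1;
  return Math.max(0, expected - framesSent);
}
```

In a timer callback, send `framesDue(start, Date.now(), sent)` frames and add that count to `sent`; a late tick then catches up instead of permanently shifting the schedule.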

Step 3: Verify

  1. Add another participant or an AI agent to the room.
  2. Run your client and confirm that the room hears your stream and your client receives the room audio.
  3. If you registered lifecycle webhooks, watch for call-started, call-answered, and call-hangup events that reference the callId.

Messaging

The same WebSocket also carries a topic-based publish/subscribe channel, so your client can exchange text or JSON with other participants over the connection it already has open. Messaging is available only on the custom WebSocket path; the telephony connectors (Twilio, Plivo, Telnyx) have no data channel.

Messages are scoped to the room. When you publish to a topic, every participant in the room that is subscribed to that topic receives it, and you never receive your own messages back. Published messages are persisted to the room's pub/sub history.

Subscribe to a topic

Send a subscribe frame to start receiving messages on a topic:

{ "event": "subscribe", "topic": "CHAT" }

Publish to a topic

Send a message frame with the topic and your payload. data can be a string or any JSON value. Non-string values are serialized before delivery, so subscribers always receive a string. Parse it on the receiving side if needed.

{ "event": "message", "topic": "CHAT", "data": "Hello everyone" }

Receive messages

When another participant publishes to a topic you are subscribed to, the server sends a message frame. The inbound data is an object describing the message and its sender:

{
  "event": "message",
  "topic": "CHAT",
  "data": {
    "message": "Hello everyone",
    "senderId": "participant-uuid",
    "senderName": "Alice",
    "timestamp": "2026-06-27T14:41:00.000Z",
    "id": "msg-uuid"
  }
}
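Because published JSON values arrive serialized, a receiver may want to normalize each inbound frame. This is a minimal sketch; it assumes the serialized string is carried in `data.message`, as in the example above, and `decodeChatMessage` is a hypothetical helper name.

```javascript
// Hypothetical helper: normalize an inbound message frame, parsing
// data.message as JSON when possible and keeping plain text otherwise.
function decodeChatMessage(frame) {
  if (frame.event !== "message" || typeof frame.data !== "object" || frame.data === null) {
    return null;
  }
  const { message, senderId, senderName, timestamp, id } = frame.data;
  let body = message;
  if (typeof message === "string") {
    try {
      body = JSON.parse(message);
    } catch {
      // Not valid JSON: treat it as plain text.
    }
  }
  return { topic: frame.topic, body, senderId, senderName, id, sentAt: new Date(timestamp) };
}
```

One trade-off: a plain string that happens to be valid JSON, such as "42", is also parsed. If that matters, agree on one payload convention per topic.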

Unsubscribe from a topic

{ "event": "unsubscribe", "topic": "CHAT" }
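Subscribe, unsubscribe, and inbound routing can be combined into a small helper. This is a sketch, not a VideoSDK API: `TopicRouter` is hypothetical, `send` is any function that writes a text frame (for example, `(s) => ws.send(s)`), and keeping one handler per topic is a design choice.

```javascript
// Hypothetical helper: track subscriptions and route inbound message frames
// to one handler per topic.
class TopicRouter {
  constructor(send) {
    this.send = send;
    this.handlers = new Map(); // topic -> handler
  }

  subscribe(topic, handler) {
    // Only send a subscribe frame the first time; later calls replace the handler.
    if (!this.handlers.has(topic)) {
      this.send(JSON.stringify({ event: "subscribe", topic }));
    }
    this.handlers.set(topic, handler);
  }

  unsubscribe(topic) {
    // Only send an unsubscribe frame for topics we were subscribed to.
    if (this.handlers.delete(topic)) {
      this.send(JSON.stringify({ event: "unsubscribe", topic }));
    }
  }

  // Returns true if the frame was a message for a subscribed topic.
  dispatch(frame) {
    const handler = frame.event === "message" ? this.handlers.get(frame.topic) : undefined;
    if (!handler) return false;
    handler(frame.data);
    return true;
  }
}
```

Deduplicating frames on the client keeps repeated `subscribe` calls from sending redundant frames over the connection.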

Example: Node.js

Reuse the connection from Step 2. Subscribe and publish after the start frame, and handle inbound message frames in the same handler as media:

// After sending the start frame:
ws.send(JSON.stringify({ event: "subscribe", topic: "CHAT" }));
ws.send(JSON.stringify({ event: "message", topic: "CHAT", data: "Hello everyone" }));

// Replace the ws.on("message") handler from Step 2 with one that handles both
// event types. Registering a second listener would process every frame twice.
ws.on("message", (raw) => {
  const msg = JSON.parse(raw.toString());
  if (msg.event === "media") {
    const pcm16 = Buffer.from(msg.payload, "base64"); // 16-bit LE PCM, 8 kHz mono
    playPcm16(pcm16); // your playback or processing, as in Step 2
  } else if (msg.event === "message") {
    console.log(`[${msg.topic}] ${msg.data.senderName}: ${msg.data.message}`);
  }
});

Topic and payload rules
  • Topic: a non-empty string up to 128 characters.
  • Payload: up to 8 KB (UTF-8) per message.
  • You can subscribe or publish before the session finishes joining the room. Those frames are queued and flushed once it joins.
  • Messaging is best-effort and independent of the audio stream. A failed subscribe or publish never drops the call.
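A client can check these limits before publishing. This is a minimal sketch under two assumptions: that the 8 KB limit applies to the UTF-8 size of the data as sent (strings as-is, other JSON values after serialization), and that 8 KB means 8192 bytes. `buildMessageFrame` is a hypothetical helper, and topic length is measured in JavaScript string length (UTF-16 code units).

```javascript
const MAX_TOPIC_LENGTH = 128;
const MAX_PAYLOAD_BYTES = 8 * 1024; // assumption: 8 KB = 8192 bytes

// Hypothetical helper: validate and serialize a publish frame.
function buildMessageFrame(topic, data) {
  if (typeof topic !== "string" || topic.length === 0 || topic.length > MAX_TOPIC_LENGTH) {
    throw new RangeError(`topic must be a non-empty string of at most ${MAX_TOPIC_LENGTH} characters`);
  }
  const serialized = typeof data === "string" ? data : JSON.stringify(data);
  if (serialized === undefined) {
    // JSON.stringify returns undefined for values such as functions.
    throw new TypeError("data must be a string or a JSON value");
  }
  if (Buffer.byteLength(serialized, "utf8") > MAX_PAYLOAD_BYTES) {
    throw new RangeError(`data exceeds ${MAX_PAYLOAD_BYTES} bytes (UTF-8)`);
  }
  return JSON.stringify({ event: "message", topic, data });
}
```

Failing fast on the client is optional, since a rejected publish never drops the call, but it surfaces oversized payloads where they are created.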

Lifecycle Webhooks

Custom WebSocket sessions emit the same lifecycle events as telephony connectors, such as call-started, call-answered, and call-hangup. Each request carries a videosdk-signature header (a base64 RSA-SHA256 signature of the body) so you can verify it. Connectors use the same webhook system as SIP, so see SIP Webhooks for the full event list, payloads, and registration.
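Signature verification can be sketched with Node's built-in crypto module. This is a minimal sketch under assumptions not confirmed here: that you already have VideoSDK's RSA public key as a PEM string, that the signature covers the exact raw request body, and that it uses PKCS#1 v1.5 padding (Node's default for RSA keys). `verifyWebhookSignature` is a hypothetical helper; check SIP Webhooks for the authoritative details.

```javascript
const crypto = require("node:crypto");

// Hypothetical helper: check the base64 videosdk-signature header against the
// raw body. Verify before JSON parsing, since re-serialized JSON may differ
// byte-for-byte from what was signed.
function verifyWebhookSignature(rawBody, signatureHeader, publicKeyPem) {
  if (!signatureHeader) return false;
  return crypto.verify(
    "sha256", // with an RSA key, this performs RSA-SHA256 verification
    Buffer.isBuffer(rawBody) ? rawBody : Buffer.from(rawBody, "utf8"),
    publicKeyPem,
    Buffer.from(signatureHeader, "base64"),
  );
}
```

In an HTTP framework, this means capturing the raw body (for example, as a Buffer) rather than only the parsed JSON.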

Troubleshooting

| Symptom | Likely cause and fix |
|---|---|
| The connection closes with code 1008 right after connecting | No start frame was sent within 5 seconds. Send { "event": "start" } as soon as the connection opens. |
| The start frame is rejected, or the connection closes during start | The session expired because the start frame arrived more than 90 seconds after the session was created, or the WebSocket URL was modified. Create a new session and connect promptly with the exact URL. |
| Audio is garbled, or plays too fast or too slow | The sample format is wrong. Audio must be 16-bit signed little-endian PCM at 8 kHz mono before base64 encoding. |
| The connection is terminated mid-stream | A heartbeat was missed. Use a WebSocket client that replies to pings automatically, or send a pong in response to each ping yourself. |
| Creating the session returns 401 or 403 | The Authorization token is missing or expired. See Generate a VideoSDK Token. |
| No room audio is received | Make sure another participant is publishing audio in the room. |

API Reference

REST API used in this guide: POST https://api.videosdk.live/v2/ingest/sessions (Step 1).

WebSocket frames:

| Direction | Frame |
|---|---|
| Client to server | { "event": "start" } |
| Client to server | { "event": "media", "payload": "<base64 PCM16-LE>" } |
| Client to server | { "event": "stop" } |
| Client to server | { "event": "subscribe", "topic": "<topic>" } |
| Client to server | { "event": "unsubscribe", "topic": "<topic>" } |
| Client to server | { "event": "message", "topic": "<topic>", "data": <string or JSON> } |
| Server to client | { "event": "media", "payload": "<base64 PCM16-LE>" } |
| Server to client | { "event": "message", "topic": "<topic>", "data": { ... } } |
