Browser Voice Sessions
Integrate real-time voice agents directly into websites and mobile apps.
Orbitali supports browser-based voice sessions, for both testing and production web calling, over secure WebSockets (`wss://`). Users talk to your agents directly through their computer or phone microphone, bypassing the public switched telephone network (PSTN) entirely.
To keep your permanent API keys secure, web integrations use a two-step authentication pattern.
The Two-Step Authentication Flow
```mermaid
sequenceDiagram
    participant Browser as Client Browser
    participant Backend as Your Backend Server
    participant API as Orbitali API
    participant Agent as Orbitali Agent Service
    Browser->>Backend: Request Web Session
    Backend->>API: POST /public/v1/agents/{id}/realtime-sessions<br/>(Bearer API Key)
    API-->>Backend: Return Ephemeral Token & websocketUrl
    Backend-->>Browser: Return Ephemeral Token & websocketUrl
    Browser->>Agent: Connect to wss://agent.orbitali.ai/ws/...<br/>(using Ephemeral Token)
    Note over Browser,Agent: Bidirectional PCM16 Audio Stream
```
Step 1: Request Ephemeral Token (Server-Side)
Your backend server makes a request to the Orbitali API with your permanent API key.
> [!CAUTION]
> Never expose your permanent Orbitali API key (`orb_live_...`) in client-side code. Always request session tokens from a secure backend server environment.
Request:
```bash
curl https://api.orbitali.ai/public/v1/agents/9dbf0d0b-2b55-45de-8c41-7a2d9b0ce8e9/realtime-sessions \
  -X POST \
  -H "Authorization: Bearer $ORBITALI_API_KEY"
```
Response:
```json
{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "expiresAt": "2026-06-24T12:01:00.000Z",
  "websocketUrl": "wss://agent.orbitali.ai/ws/agents/9dbf0d0b-2b55-45de-8c41-7a2d9b0ce8e9?token=eyJhbGciOi...",
  "protocol": {
    "inputAudio": { "encoding": "pcm16", "sampleRate": 16000 },
    "outputAudio": { "encoding": "pcm16", "sampleRate": 24000 }
  }
}
```
The returned session token is valid for 60 seconds (see `expiresAt`). The WebSocket connection must be established before the token expires; once connected, the socket remains active until it is closed.
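Because the token expires 60 seconds after it is issued, a client that fetched a session earlier (or is retrying after an error) may want to check `expiresAt` before connecting. The sketch below types the response shape from the example above and checks how much validity remains. The interface names, `parseRealtimeSession`, `canStillConnect`, and the 5-second safety margin are hypothetical illustrations, not part of an Orbitali SDK.

```typescript
// Shape of the realtime-session response, taken from the example above.
// Treating every field as required is an assumption.
interface AudioFormat {
  encoding: string;
  sampleRate: number;
}

interface RealtimeSession {
  token: string;
  expiresAt: string;
  websocketUrl: string;
  protocol: {
    inputAudio: AudioFormat;
    outputAudio: AudioFormat;
  };
}

// Narrow an unknown JSON value to RealtimeSession, throwing if key fields are missing.
function parseRealtimeSession(json: unknown): RealtimeSession {
  const s = json as Partial<RealtimeSession> | null;
  if (
    !s ||
    typeof s.token !== "string" ||
    typeof s.expiresAt !== "string" ||
    typeof s.websocketUrl !== "string" ||
    !s.protocol
  ) {
    throw new Error("Unexpected realtime-session response");
  }
  return s as RealtimeSession;
}

// True if more than `marginMs` remains before the token expires.
// The 5-second default is a hypothetical allowance for network latency.
function canStillConnect(
  session: RealtimeSession,
  now: Date = new Date(),
  marginMs = 5_000,
): boolean {
  const expiresAtMs = Date.parse(session.expiresAt);
  return Number.isFinite(expiresAtMs) && expiresAtMs - now.getTime() > marginMs;
}
```

If `canStillConnect` returns `false`, request a fresh session from your backend instead of connecting with a token that is about to expire.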
Step 2: Establish the Client WebSocket
Your frontend code connects directly to the returned `websocketUrl`, which already carries the ephemeral token as its `token` query parameter.
Audio Stream Specifications
All WebSocket traffic consists of JSON text messages: control events and audio chunks alike, with audio carried as Base64-encoded PCM16.
Sending Audio (Client → Server)
Capture audio from the user's microphone, convert it to PCM16 (16-bit PCM) at 16 kHz mono, Base64-encode each chunk, and send it as an `input_audio` message:
```json
{
  "type": "input_audio",
  "audio": "//uQRAAM0M503..."
}
```
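Browsers typically deliver microphone samples as Float32 values in [-1, 1] at the `AudioContext` sample rate (often 44.1 or 48 kHz), so they must be resampled and converted before sending. The following is a minimal sketch assuming little-endian PCM16 (this page does not state the byte order), with simple block averaging as the downsampler. Averaging is a crude low-pass filter, not a high-quality resampler. `downsample`, `floatToPcm16`, `bytesToBase64`, and `makeInputAudioMessage` are hypothetical helper names.

```typescript
// Downsample Float32 samples by averaging each block of input samples
// that maps onto one output sample.
function downsample(input: Float32Array, inputRate: number, outputRate = 16000): Float32Array {
  if (outputRate > inputRate) throw new Error("Upsampling is not supported");
  const ratio = inputRate / outputRate;
  const outLength = Math.floor(input.length / ratio);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const start = Math.floor(i * ratio);
    const end = Math.min(Math.floor((i + 1) * ratio), input.length);
    let sum = 0;
    for (let j = start; j < end; j++) sum += input[j];
    out[i] = sum / Math.max(end - start, 1);
  }
  return out;
}

// Convert Float32 samples in [-1, 1] to signed 16-bit PCM (little-endian: an assumption).
function floatToPcm16(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to avoid overflow
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}

// Base64-encode raw bytes; btoa expects a "binary string" with one char per byte.
function bytesToBase64(buffer: ArrayBuffer): string {
  const bytes = new Uint8Array(buffer);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Build one `input_audio` message from a chunk of microphone samples.
function makeInputAudioMessage(samples: Float32Array, sampleRate: number): string {
  const pcm = floatToPcm16(downsample(samples, sampleRate, 16000));
  return JSON.stringify({ type: "input_audio", audio: bytesToBase64(pcm) });
}
```

In a browser, the Float32 chunks would come from an `AudioWorkletNode` (or the deprecated `ScriptProcessorNode`) fed by `getUserMedia`, and each chunk would be sent with `ws.send(makeInputAudioMessage(chunk, audioContext.sampleRate))`.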
Receiving Audio (Server → Client)
The server streams the agent's synthesized speech back in `output_audio` chunks, encoded as PCM16 at 24 kHz mono:
```json
{
  "type": "output_audio",
  "audio": "Z3g1eHJ5...",
  "encoding": "pcm16",
  "sampleRate": 24000
}
```
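Each `output_audio` payload decodes to signed 16-bit samples that must be scaled to Float32 before the Web Audio API can play them. Because PCM16 mono uses 2 bytes per sample, a chunk's duration follows directly from its byte length: 4,800 bytes at 24 kHz is 2,400 samples, or 100 ms. The following sketch uses the same little-endian assumption as above. `decodeOutputAudio` and `chunkDurationMs` are hypothetical helpers.

```typescript
// Decode a Base64 PCM16 payload into Float32 samples in [-1, 1)
// suitable for AudioBuffer.getChannelData(). Assumes little-endian bytes.
function decodeOutputAudio(base64: string): Float32Array {
  const binary = atob(base64);
  const view = new DataView(new ArrayBuffer(binary.length));
  for (let i = 0; i < binary.length; i++) view.setUint8(i, binary.charCodeAt(i));
  const samples = new Float32Array(Math.floor(binary.length / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}

// Playback duration of a PCM16 mono chunk in milliseconds (2 bytes per sample).
function chunkDurationMs(byteLength: number, sampleRate: number): number {
  return ((byteLength / 2) * 1000) / sampleRate;
}
```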
Interruption & Barge-in (clear)
Because the system supports active listening, a user can speak while the agent is talking. When Orbitali detects a barge-in:
- The server immediately stops sending synthesized audio.
- The server sends a `clear` event over the WebSocket: `{ "type": "clear" }`
- Important: When your client receives a `clear` message, it must immediately stop the audio that is currently playing and discard everything still queued in the playback buffer. This keeps the conversation responsive and natural.
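Honoring `clear` means stopping both the chunk that is already playing and any chunks that are only scheduled. One way to manage this is to schedule each chunk at an exact start time for gapless playback and keep track of what is still pending. This is a swapped-in technique, not something this page prescribes. The framework-agnostic model below, built around a hypothetical `PlaybackSchedule` class, keeps that timing logic testable. In a browser, `now` would be `audioContext.currentTime`, and each entry returned by `clear` would map to an `AudioBufferSourceNode` to `stop()`.

```typescript
interface ScheduledChunk {
  id: number;
  startAt: number; // seconds on the audio clock
  endAt: number; // seconds on the audio clock
}

class PlaybackSchedule {
  private nextStart = 0;
  private nextId = 0;
  private scheduled: ScheduledChunk[] = [];

  // Schedule a chunk right after the previous one, or immediately if playback has drained.
  enqueue(durationSec: number, now: number): ScheduledChunk {
    // Forget chunks that have already finished playing
    this.scheduled = this.scheduled.filter((c) => c.endAt > now);
    const startAt = Math.max(now, this.nextStart);
    const chunk: ScheduledChunk = { id: this.nextId++, startAt, endAt: startAt + durationSec };
    this.nextStart = chunk.endAt;
    this.scheduled.push(chunk);
    return chunk;
  }

  // On `clear`: return every chunk that is playing or still pending so the caller can
  // stop it, then reset so the next response starts immediately.
  clear(now: number): ScheduledChunk[] {
    const toStop = this.scheduled.filter((c) => c.endAt > now);
    this.scheduled = [];
    this.nextStart = 0;
    return toStop;
  }
}
```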
Other Server Events
- `session.started`: Emitted when connection negotiation completes.
  `{ "type": "session.started", "callId": "call_uuid" }`
- `transcript`: Emitted each time a conversational turn completes.
  `{ "type": "transcript", "role": "assistant", "text": "Hello, how can I help?" }`
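Since every server message is JSON with a `type` field, a discriminated union lets TypeScript check each handler. The sketch below covers only the four events documented on this page. The `ServerEvent` type, `parseServerEvent`, and the decision to ignore unknown event types are assumptions.

```typescript
// Server events documented on this page, discriminated on `type`.
type ServerEvent =
  | { type: "session.started"; callId: string }
  | { type: "output_audio"; audio: string; encoding: string; sampleRate: number }
  | { type: "clear" }
  | { type: "transcript"; role: string; text: string };

const KNOWN_EVENT_TYPES = new Set(["session.started", "output_audio", "clear", "transcript"]);

// Parse one WebSocket text frame. Returns null for malformed JSON or unknown
// event types. Only `type` is checked here; the other fields are trusted as-is.
function parseServerEvent(data: string): ServerEvent | null {
  let msg: unknown;
  try {
    msg = JSON.parse(data);
  } catch {
    return null;
  }
  if (typeof msg !== "object" || msg === null) return null;
  const type = (msg as { type?: unknown }).type;
  if (typeof type !== "string" || !KNOWN_EVENT_TYPES.has(type)) return null;
  return msg as ServerEvent;
}
```

A `switch (event.type)` over the result then narrows each branch, so `event.text` is only accessible in the `transcript` case.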
Code Example
Backend API Route (Next.js App Router)
Create a file at app/api/voice-token/route.ts:
```ts
import { NextResponse } from "next/server";

// Mints a short-lived Orbitali session. The permanent API key never leaves the server.
// In production, authenticate the caller before creating a session.
export async function POST() {
  const agentId = process.env.ORBITALI_AGENT_ID;
  const apiKey = process.env.ORBITALI_API_KEY;
  if (!agentId || !apiKey) {
    return NextResponse.json({ error: "Configuration missing" }, { status: 500 });
  }

  try {
    const res = await fetch(`https://api.orbitali.ai/public/v1/agents/${agentId}/realtime-sessions`, {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
    });

    if (!res.ok) {
      const errText = await res.text();
      return NextResponse.json({ error: `Orbitali error: ${errText}` }, { status: res.status });
    }

    // Forward token, expiresAt, websocketUrl, and protocol to the browser
    const data = await res.json();
    return NextResponse.json(data);
  } catch (err: unknown) {
    const message = err instanceof Error ? err.message : String(err);
    return NextResponse.json({ error: message }, { status: 500 });
  }
}
```
Client Frontend Controller (TypeScript)
Use this structure to connect and orchestrate the audio loop in the browser:
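```ts
class OrbitaliVoiceConnection {
  private ws: WebSocket | null = null;
  private audioQueue: string[] = [];
  private isPlaying = false;
  private audioContext: AudioContext | null = null;
  // The buffer source playing right now, so a `clear` event can stop it mid-chunk
  private currentSource: AudioBufferSourceNode | null = null;

  async startSession() {
    // Create the AudioContext before the first await so it is tied to the user gesture
    // (e.g. a button click) that called startSession(); otherwise it may start suspended.
    this.audioContext = new (window.AudioContext || (window as any).webkitAudioContext)();

    // 1. Get an ephemeral token from your backend
    const res = await fetch("/api/voice-token", { method: "POST" });
    if (!res.ok) {
      throw new Error(`Failed to create voice session (HTTP ${res.status})`);
    }
    const { websocketUrl } = await res.json();

    // 2. Connect the WebSocket before the token expires
    this.ws = new WebSocket(websocketUrl);

    this.ws.onmessage = (event) => {
      const msg = JSON.parse(event.data);
      switch (msg.type) {
        case "session.started":
          console.log(`Voice session started: ${msg.callId}`);
          this.startMicrophoneCapture();
          break;
        case "output_audio":
          this.queueAudio(msg.audio);
          break;
        case "clear":
          console.log("User interrupted the agent. Clearing audio buffer.");
          this.clearAudioBuffer();
          break;
        case "transcript":
          console.log(`[${msg.role}]: ${msg.text}`);
          break;
      }
    };
  }

  private queueAudio(base64Pcm16: string) {
    this.audioQueue.push(base64Pcm16);
    if (!this.isPlaying) {
      this.playNextChunk();
    }
  }

  private clearAudioBuffer() {
    // Discard everything still queued
    this.audioQueue = [];
    this.isPlaying = false;
    // Stop the chunk that is playing right now. Resetting currentSource first turns its
    // onended handler into a no-op, so it cannot restart the playback loop.
    const source = this.currentSource;
    this.currentSource = null;
    source?.stop();
  }

  private playNextChunk() {
    if (this.audioQueue.length === 0) {
      this.isPlaying = false;
      this.currentSource = null;
      return;
    }
    this.isPlaying = true;
    const base64Data = this.audioQueue.shift()!;
    const pcmData = this.base64ToPCM16(base64Data);
    if (pcmData.length === 0) {
      // createBuffer() rejects zero-length buffers; skip empty chunks
      this.playNextChunk();
      return;
    }

    // Convert PCM16 mono 24 kHz samples to Float32 in [-1, 1) for Web Audio API playback
    const audioBuffer = this.audioContext!.createBuffer(1, pcmData.length, 24000);
    const channelData = audioBuffer.getChannelData(0);
    for (let i = 0; i < pcmData.length; i++) {
      channelData[i] = pcmData[i] / 32768.0;
    }

    const source = this.audioContext!.createBufferSource();
    source.buffer = audioBuffer;
    source.connect(this.audioContext!.destination);
    source.onended = () => {
      // Advance only if this source was not stopped by clearAudioBuffer()
      if (this.currentSource === source) {
        this.playNextChunk();
      }
    };
    this.currentSource = source;
    source.start();
  }

  private base64ToPCM16(base64: string): Int16Array {
    const raw = window.atob(base64);
    const buffer = new ArrayBuffer(raw.length);
    const view = new DataView(buffer);
    for (let i = 0; i < raw.length; i++) {
      view.setUint8(i, raw.charCodeAt(i));
    }
    // Int16Array reads samples in the platform's byte order (little-endian in mainstream browsers)
    return new Int16Array(buffer);
  }

  private startMicrophoneCapture() {
    // Stub: record the user's microphone, convert it to PCM16 16 kHz mono,
    // Base64-encode each chunk, and send it, e.g.:
    // this.ws?.send(JSON.stringify({ type: "input_audio", audio: base64Chunk }));
  }

  stopSession() {
    this.clearAudioBuffer();
    if (this.ws) {
      // send() throws while the socket is still connecting, so only send on an open socket
      if (this.ws.readyState === WebSocket.OPEN) {
        this.ws.send(JSON.stringify({ type: "stop" }));
      }
      this.ws.close();
      this.ws = null;
    }
    if (this.audioContext) {
      this.audioContext.close();
      this.audioContext = null;
    }
  }
}
```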