Orbitali Docs

Browser Voice Sessions

Integrate real-time voice agents directly into websites and mobile apps.

Orbitali supports browser-based voice testing and production web calling over secure WebSockets. Users talk directly to your agents through their computer or phone microphone, bypassing the public switched telephone network (PSTN) and carrier networks entirely.

To keep your permanent API keys secure, web integrations use a two-step authentication pattern.


The Two-Step Authentication Flow

sequenceDiagram
    participant Browser as Client Browser
    participant Backend as Your Backend Server
    participant API as Orbitali API
    participant Agent as Orbitali Agent Service

    Browser->>Backend: Request Web Session
    Backend->>API: POST /public/v1/agents/{id}/realtime-sessions<br/>(Bearer API Key)
    API-->>Backend: Return Ephemeral Token & websocketUrl
    Backend-->>Browser: Return Ephemeral Token & websocketUrl
    Browser->>Agent: Connect to wss://agent.orbitali.ai/ws/...<br/>(using Ephemeral Token)
    Note over Browser,Agent: Bidirectional PCM16 Audio Stream

Step 1: Request Ephemeral Token (Server-Side)

Your backend server makes a request to the Orbitali API with your permanent API key.

[!CAUTION] Never expose your permanent Orbitali API Key (orb_live_...) in client-side code. Always request session tokens from a secure backend server environment.

Request:

curl https://api.orbitali.ai/public/v1/agents/9dbf0d0b-2b55-45de-8c41-7a2d9b0ce8e9/realtime-sessions \
  -X POST \
  -H "Authorization: Bearer $ORBITALI_API_KEY"

Response:

{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "expiresAt": "2026-06-24T12:01:00.000Z",
  "websocketUrl": "wss://agent.orbitali.ai/ws/agents/9dbf0d0b-2b55-45de-8c41-7a2d9b0ce8e9?token=eyJhbGciOi...",
  "protocol": {
    "inputAudio": { "encoding": "pcm16", "sampleRate": 16000 },
    "outputAudio": { "encoding": "pcm16", "sampleRate": 24000 }
  }
}

The returned session token is valid for 60 seconds (see expiresAt in the response). Open the WebSocket connection before the token expires; once connected, the socket remains active until closed.
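Because the token window is short, a client may want to check expiresAt before opening the socket and request a fresh session if too little time remains. The following is a minimal sketch under assumptions: the RealtimeSession interface mirrors only the fields from the sample response, and isSessionUsable and its 5-second safety margin are hypothetical, not part of the Orbitali API.

```typescript
// Fields from the Step 1 sample response that this check relies on.
interface RealtimeSession {
  token: string;
  expiresAt: string; // ISO 8601 timestamp
  websocketUrl: string;
}

// Hypothetical helper: true if the token still has at least `marginMs` left.
function isSessionUsable(
  session: RealtimeSession,
  now: Date = new Date(),
  marginMs = 5000, // assumed safety margin, not an Orbitali requirement
): boolean {
  const expiresAtMs = Date.parse(session.expiresAt);
  if (Number.isNaN(expiresAtMs)) return false; // unparseable timestamp: treat as expired
  return expiresAtMs - now.getTime() >= marginMs;
}
```

Note that this compares against the client's clock, which can drift from the server's. The authoritative check is whether the WebSocket handshake succeeds.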

Step 2: Establish the Client WebSocket

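Your frontend browser code connects directly to the returned websocketUrl. In the sample response above, the URL already embeds the token as a query parameter, so the browser can pass it unchanged to the WebSocket constructor.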


Audio Stream Specifications

All traffic on the WebSocket connection, both signaling and audio, is exchanged as JSON messages. Audio payloads travel inside those messages as Base64-encoded PCM16.

Sending Audio (Client → Server)

Record mono audio from the user's microphone. Convert the audio to PCM16 16 kHz mono format, encode it as a Base64 string, and send it in chunks:

{
  "type": "input_audio",
  "audio": "//uQRAAM0M503..."
}
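The Web Audio API delivers microphone samples as 32-bit floats, usually at 44.1 or 48 kHz, so the conversion described above involves three steps: downsample to 16 kHz, quantize to 16-bit integers, and Base64-encode. The sketch below shows one way to do this. The helper names are hypothetical, the box-filter downsampler is a simple stand-in for a proper resampler, and little-endian byte order is an assumption, since the protocol description does not state it.

```typescript
const TARGET_SAMPLE_RATE = 16000; // matches protocol.inputAudio.sampleRate

// Downsample by averaging each window of source samples (a simple box filter).
// Assumes sourceRate >= TARGET_SAMPLE_RATE.
function downsample(input: Float32Array, sourceRate: number): Float32Array {
  if (sourceRate === TARGET_SAMPLE_RATE) return input;
  const ratio = sourceRate / TARGET_SAMPLE_RATE;
  const output = new Float32Array(Math.floor(input.length / ratio));
  for (let i = 0; i < output.length; i++) {
    const start = Math.floor(i * ratio);
    const end = Math.min(Math.floor((i + 1) * ratio), input.length);
    let sum = 0;
    for (let j = start; j < end; j++) sum += input[j];
    output[i] = sum / (end - start);
  }
  return output;
}

// Convert floats in [-1, 1] to signed 16-bit samples (little-endian: an assumption).
function floatToPcm16Bytes(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale by 32768 and positive by 32767 to cover the full range.
    const value = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * 2, Math.round(value), true);
  }
  return bytes;
}

// btoa expects a binary string, so map each byte to one character first.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Build the JSON text for one input_audio message.
function buildInputAudioMessage(samples: Float32Array, sourceRate: number): string {
  const pcm = floatToPcm16Bytes(downsample(samples, sourceRate));
  return JSON.stringify({ type: "input_audio", audio: bytesToBase64(pcm) });
}
```

A capture callback would then call something like ws.send(buildInputAudioMessage(frame, audioContext.sampleRate)) for each frame of microphone samples.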

Receiving Audio (Server → Client)

The server streams the assistant's synthesized speech back in chunks. The format is PCM16 24 kHz mono:

{
  "type": "output_audio",
  "audio": "Z3g1eHJ5...",
  "encoding": "pcm16",
  "sampleRate": 24000
}

Interruption & Barge-in (clear)

Because the system supports active listening, a user can speak while the agent is talking. When Orbitali detects a barge-in:

  1. The server instantly stops sending synthesized voice.
  2. The server sends a clear event over the WebSocket:
    { "type": "clear" }
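  3. Important: When your client code receives a clear message, it must immediately stop the audio that is currently playing and discard everything still queued in the playback buffer. Otherwise the agent keeps talking over the user after the interruption, and the conversation stops feeling responsive.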

Other Server Events

  • session.started: Emitted when connection negotiation completes.
    { "type": "session.started", "callId": "call_uuid" }
  • transcript: Emitted as each conversational turn completes.
    { "type": "transcript", "role": "assistant", "text": "Hello, how can I help?" }
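The server messages described in this section can be modelled as a discriminated union, which lets the TypeScript compiler check that each handler only touches fields its message type carries. This is a sketch under assumptions: the field lists come only from the sample payloads on this page, and ServerEvent and parseServerEvent are hypothetical names, not part of an Orbitali SDK.

```typescript
// Server-to-client messages, with fields taken from the sample payloads.
type ServerEvent =
  | { type: "session.started"; callId: string }
  | { type: "output_audio"; audio: string; encoding: string; sampleRate: number }
  | { type: "clear" }
  | { type: "transcript"; role: string; text: string };

// Hypothetical helper: returns a typed event, or null for malformed or unknown messages.
function parseServerEvent(raw: string): ServerEvent | null {
  let msg: unknown;
  try {
    msg = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof msg !== "object" || msg === null) return null;

  const { type, callId, audio, encoding, sampleRate, role, text } =
    msg as Record<string, unknown>;

  switch (type) {
    case "session.started":
      return typeof callId === "string" ? { type: "session.started", callId } : null;
    case "output_audio":
      return typeof audio === "string" &&
        typeof encoding === "string" &&
        typeof sampleRate === "number"
        ? { type: "output_audio", audio, encoding, sampleRate }
        : null;
    case "clear":
      return { type: "clear" };
    case "transcript":
      return typeof role === "string" && typeof text === "string"
        ? { type: "transcript", role, text }
        : null;
    default:
      return null; // unknown event types are ignored rather than treated as errors
  }
}
```

Returning null for unknown types keeps a client working if the server later introduces events that this page does not list.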

Code Example

Backend API Route (Next.js App Router)

Create a file at app/api/voice-token/route.ts:

import { NextResponse } from "next/server";

export async function POST() {
  const agentId = process.env.ORBITALI_AGENT_ID;
  const apiKey = process.env.ORBITALI_API_KEY;

  if (!agentId || !apiKey) {
    return NextResponse.json({ error: "Configuration missing" }, { status: 500 });
  }

  try {
    const res = await fetch(`https://api.orbitali.ai/public/v1/agents/${agentId}/realtime-sessions`, {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
    });

    if (!res.ok) {
      const errText = await res.text();
      return NextResponse.json({ error: `Orbitali error: ${errText}` }, { status: res.status });
    }

    const data = await res.json();
    return NextResponse.json(data);
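  } catch (err) {
    // Reached when the request itself fails (network error, DNS, etc.)
    const message = err instanceof Error ? err.message : "Unknown error";
    return NextResponse.json({ error: message }, { status: 500 });
  }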
}

Client Frontend Controller (TypeScript)

Use this structure to connect and orchestrate the audio loop in the browser:

class OrbitaliVoiceConnection {
  private ws: WebSocket | null = null;
  private audioQueue: string[] = [];
  private isPlaying = false;
  private audioContext: AudioContext | null = null;

  async startSession() {
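    // 1. Get ephemeral token from your own backend route
    const res = await fetch("/api/voice-token", { method: "POST" });
    if (!res.ok) {
      throw new Error(`Token request failed with status ${res.status}`);
    }
    const { websocketUrl } = await res.json();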

    // 2. Connect WebSocket
    this.ws = new WebSocket(websocketUrl);
    this.audioContext = new (window.AudioContext || (window as any).webkitAudioContext)();

    this.ws.onmessage = async (event) => {
      const msg = JSON.parse(event.data);

      switch (msg.type) {
        case "session.started":
          console.log(`Voice session started: ${msg.callId}`);
          this.startMicrophoneCapture();
          break;

        case "output_audio":
          this.queueAudio(msg.audio);
          break;

        case "clear":
          console.log("User interrupted the agent. Clearing audio buffer.");
          this.clearAudioBuffer();
          break;

        case "transcript":
          console.log(`[${msg.role}]: ${msg.text}`);
          break;
      }
    };
  }

  private queueAudio(base64Pcm16: string) {
    this.audioQueue.push(base64Pcm16);
    if (!this.isPlaying) {
      this.playNextChunk();
    }
  }

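  // Source node of the chunk that is currently playing, so "clear" can stop it
  private currentSource: AudioBufferSourceNode | null = null;

  private clearAudioBuffer() {
    this.audioQueue = [];
    this.isPlaying = false;
    if (this.currentSource) {
      // Detach the handler first so stop() does not re-trigger playNextChunk()
      this.currentSource.onended = null;
      this.currentSource.stop();
      this.currentSource = null;
    }
  }

  private async playNextChunk(): Promise<void> {
    if (this.audioQueue.length === 0) {
      this.isPlaying = false;
      this.currentSource = null;
      return;
    }

    this.isPlaying = true;
    const base64Data = this.audioQueue.shift()!;
    const pcmData = this.base64ToPCM16(base64Data);
    if (pcmData.length === 0) {
      // createBuffer() rejects zero-length buffers, so skip empty chunks
      return this.playNextChunk();
    }

    // Convert PCM16 mono 24kHz buffer to Float32 for Web Audio API playback
    const audioBuffer = this.audioContext!.createBuffer(1, pcmData.length, 24000);
    const channelData = audioBuffer.getChannelData(0);
    for (let i = 0; i < pcmData.length; i++) {
      channelData[i] = pcmData[i] / 32768.0;
    }

    const source = this.audioContext!.createBufferSource();
    source.buffer = audioBuffer;
    source.connect(this.audioContext!.destination);
    source.onended = () => this.playNextChunk();
    this.currentSource = source;
    source.start();
  }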

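  private base64ToPCM16(base64: string): Int16Array {
    const raw = window.atob(base64);
    // Each sample is 2 bytes; drop a trailing odd byte instead of throwing
    const sampleCount = Math.floor(raw.length / 2);
    const samples = new Int16Array(sampleCount);
    for (let i = 0; i < sampleCount; i++) {
      // Assemble samples as little-endian (assumed byte order)
      const lo = raw.charCodeAt(2 * i);
      const hi = raw.charCodeAt(2 * i + 1);
      samples[i] = (hi << 8) | lo; // Int16Array wraps values above 32767 to negative
    }
    return samples;
  }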

  private startMicrophoneCapture() {
    // Record user microphone at 16kHz mono and send PCM16 base64 chunks
    // ws.send(JSON.stringify({ type: "input_audio", audio: "..." }))
  }

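  stopSession() {
    if (this.ws) {
      // send() throws while the socket is still connecting, so only signal when open
      if (this.ws.readyState === WebSocket.OPEN) {
        this.ws.send(JSON.stringify({ type: "stop" }));
      }
      this.ws.close();
    }
    if (this.audioContext) {
      this.audioContext.close();
    }
  }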
}
