
Aivoco Real-Time Voice Call API Documentation

Overview

The Aivoco Real-Time Call API enables developers to initiate a live, two-way voice conversation with the AI by connecting to a WebSocket endpoint. Once connected, the client streams live microphone audio (audio/x-mulaw) to the AI, while receiving real-time, generated speech responses back through the same connection — enabling fully interactive, natural voice communication.

Base URL

wss://call.aivoco.on.cloud.vispark.in/ws/{apiKey}/{agentId}

Parameters

| Parameter | Type   | Required | Description                                |
|-----------|--------|----------|--------------------------------------------|
| apiKey    | string | Yes      | Your Aivoco API key (unique for each user) |
| agentId   | string | Yes      | Agent identifier                           |

Create agents via the UI (see: Create Agent). API keys are available from playground.aivoco.com.
Example:
wss://call.aivoco.on.cloud.vispark.in/ws/UmhzzHeO20CTJ8gndiYcO5/be91326f-06ad-4d9d-9f433-e1e0d2334be4

⚙️ Authentication

Authentication is done via WebSocket path parameters:
  • The apiKey must be a valid Aivoco-issued key.
  • The agentId identifies the calling agent.
If either is invalid, the connection will be rejected.

🎙️ Connection Flow

1. Connect to WebSocket

Client establishes a secure WebSocket connection using the apiKey and agentId.
const ws = new WebSocket(`wss://call.aivoco.on.cloud.vispark.in/ws/${apiKey}/${agentId}`);

2. Send an Empty Handshake

Once the WebSocket opens, send an empty message ("") to trigger the handshake.
""

3. Send a start Event

After handshake, send a start message that defines your stream.
{
  "event": "start",
  "sequenceNumber": "1",
  "start": {
    "accountSid": "AC1234567890abcdef",
    "streamSid": "MZb3e98b94a3d",
    "callSid": "CA09d8724bce1",
    "tracks": ["inbound"],
    "customParameters": {},
    "mediaFormat": {
      "encoding": "audio/x-mulaw",
      "sampleRate": 8000,
      "channels": 1
    }
  },
  "streamSid": "MZb3e98b94a3d"
}
| Field          | Type            | Description                        |
|----------------|-----------------|------------------------------------|
| event          | "start"         | Type of message event              |
| sequenceNumber | string          | Sequential ID for message ordering |
| streamSid      | string          | Unique stream session ID           |
| mediaFormat    | object          | Format of audio being sent         |
| encoding       | "audio/x-mulaw" | Audio encoding type                |
| sampleRate     | 8000            | Required sample rate (Hz)          |
| channels       | 1               | Mono audio                         |
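The start payload above can be assembled with a small helper. This is a minimal sketch: the function name is illustrative (not part of the API), and the SID values are placeholders supplied by your own session bookkeeping.

```javascript
// Build the "start" event shown above (helper name is illustrative).
// accountSid / streamSid / callSid are placeholders from your own session state.
function buildStartEvent({ accountSid, streamSid, callSid }) {
  return {
    event: "start",
    sequenceNumber: "1",
    start: {
      accountSid,
      streamSid,
      callSid,
      tracks: ["inbound"],
      customParameters: {},
      mediaFormat: { encoding: "audio/x-mulaw", sampleRate: 8000, channels: 1 }
    },
    streamSid
  };
}

// Usage, once the empty handshake has been sent:
// ws.send(JSON.stringify(buildStartEvent({ accountSid: "AC...", streamSid: "MZ...", callSid: "CA..." })));
```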

4. Stream Audio (media Event)

After the start message, stream microphone data in chunks encoded as Base64 µ-law audio.
{
  "event": "media",
  "media": {
    "track": "inbound",
    "chunk": 25,
    "timestamp": "1730446223000",
    "payload": "q0aZmJqampqamJiYmJiYmJgY..."
  },
  "sequenceNumber": 25
}
| Field     | Type      | Description                     |
|-----------|-----------|---------------------------------|
| event     | "media"   | Indicates audio data            |
| track     | "inbound" | Audio direction                 |
| chunk     | integer   | Incremental chunk counter       |
| timestamp | string    | Unix timestamp                  |
| payload   | string    | Base64-encoded µ-law audio data |
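Producing that payload means converting the microphone's 16-bit linear PCM samples to µ-law before Base64-encoding them. The sketch below uses the standard G.711 µ-law compression algorithm; the function names are illustrative, and `btoa` is the browser Base64 helper (in Node.js, `Buffer.from(bytes).toString("base64")` would be the equivalent).

```javascript
const MULAW_BIAS = 0x84; // standard G.711 bias (132)
const MULAW_MAX = 32635; // clip threshold before encoding

// Encode one 16-bit linear PCM sample as a µ-law byte (G.711).
function linearToMulaw(sample) {
  const sign = (sample >> 8) & 0x80;
  if (sign) sample = -sample;
  if (sample > MULAW_MAX) sample = MULAW_MAX;
  sample += MULAW_BIAS;
  let exponent = 7;
  for (let mask = 0x4000; (sample & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (sample >> (exponent + 3)) & 0x0f;
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// Convert an Int16Array of PCM samples to a Base64 µ-law payload.
function pcmToMulawBase64(pcm) {
  let bin = "";
  for (let i = 0; i < pcm.length; i++) {
    bin += String.fromCharCode(linearToMulaw(pcm[i]));
  }
  return btoa(bin);
}

// Wrap one encoded frame in a "media" event; `chunk` is your running counter.
function buildMediaEvent(pcm, chunk) {
  return {
    event: "media",
    media: {
      track: "inbound",
      chunk,
      timestamp: String(Date.now()),
      payload: pcmToMulawBase64(pcm)
    },
    sequenceNumber: chunk
  };
}
```

Each captured frame would then go out as `ws.send(JSON.stringify(buildMediaEvent(frame, chunk++)))`.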

5. Receive Audio Responses

Aivoco will send back similar media messages for AI-generated audio output:
{
  "event": "media",
  "media": {
    "track": "outbound",
    "chunk": 42,
    "timestamp": "1730446230112",
    "payload": "k0ZGRkZGRkZGRkZGRkZGRgYGB..."
  },
  "sequenceNumber": 42
}
Clients should decode the Base64 payload, convert the µ-law bytes to linear PCM, and play the result through the browser's AudioContext.
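The decode-and-play step can be sketched as follows. `mulawToLinear` implements the standard G.711 µ-law expansion; `atob` is the browser Base64 helper (in Node.js, `Buffer.from(b64, "base64")` would be the equivalent), and the commented playback portion assumes a Web Audio `AudioContext` created elsewhere.

```javascript
const MULAW_BIAS = 0x84; // standard G.711 bias (132)

// Expand one µ-law byte back to a 16-bit linear PCM sample (G.711).
function mulawToLinear(mu) {
  mu = ~mu & 0xff;
  const sign = mu & 0x80;
  const exponent = (mu >> 4) & 0x07;
  const mantissa = mu & 0x0f;
  const sample = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS;
  return sign ? -sample : sample;
}

// Decode a Base64 µ-law payload into Float32 samples in [-1, 1].
function decodeMediaPayload(payloadB64) {
  const bin = atob(payloadB64);
  const out = new Float32Array(bin.length);
  for (let i = 0; i < bin.length; i++) {
    out[i] = mulawToLinear(bin.charCodeAt(i)) / 32768;
  }
  return out;
}

// Playback sketch (browser only, assumes an AudioContext named audioCtx):
// const samples = decodeMediaPayload(msg.media.payload);
// const buf = audioCtx.createBuffer(1, samples.length, 8000);
// buf.copyToChannel(samples, 0);
// const src = audioCtx.createBufferSource();
// src.buffer = buf;
// src.connect(audioCtx.destination);
// src.start();
```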

Other Event Types

| Event        | Description                                |
|--------------|--------------------------------------------|
| connected    | Sent when the connection is established    |
| disconnected | Sent when the connection closes            |
| status       | Optional message with diagnostic info      |
| error        | Returned if authentication or format fails |

Audio Requirements

| Parameter   | Requirement | Value              |
|-------------|-------------|--------------------|
| Encoding    | Required    | audio/x-mulaw      |
| Sample rate | Required    | 8000 Hz            |
| Channels    | Required    | 1 (mono)           |
| Chunk size  | Recommended | ≤ 200 ms per frame |
| Payload     | Required    | Base64 encoded     |
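For sizing frames: µ-law mono at 8000 Hz is exactly one byte per sample, so a frame's raw byte size (before Base64 encoding) follows directly from its duration. A small helper, with an illustrative name, makes the arithmetic explicit:

```javascript
const SAMPLE_RATE = 8000; // required by the API

// Raw µ-law bytes per frame of the given duration (1 byte per sample, mono).
function mulawBytesForMs(ms) {
  return Math.round((SAMPLE_RATE * ms) / 1000);
}

// The recommended <= 200 ms frame is therefore at most 1600 raw bytes.
```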

Error Responses

| Code | Message                      | Description                     |
|------|------------------------------|---------------------------------|
| 4001 | Invalid API Key              | The provided API key is invalid |
| 4002 | Invalid Agent ID             | Agent ID not recognized         |
| 1006 | Connection closed abnormally | Network or backend disconnect   |
| 5000 | Internal Server Error        | Unexpected server-side error    |
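How these codes reach the client (as WebSocket close codes or in-band error events) may vary by library, but a simple lookup over the documented values could look like this; the helper name is illustrative:

```javascript
// Map the documented error codes to readable messages.
function describeCloseCode(code) {
  const messages = {
    1000: "Normal closure",
    1006: "Connection closed abnormally (network or backend disconnect)",
    4001: "Invalid API Key",
    4002: "Invalid Agent ID",
    5000: "Internal Server Error"
  };
  return messages[code] || `Unknown close code: ${code}`;
}

// Usage:
// ws.onclose = (e) => console.log(describeCloseCode(e.code));
```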

Ending the Call

To end the session, simply close the WebSocket:
ws.close(1000, "Call ended");
Or send an explicit stop event:
{
  "event": "stop",
  "sequenceNumber": 99,
  "streamSid": "MZb3e98b94a3d"
}

Client Libraries

You can use any WebSocket-compatible library:
| Language   | Library              | Example                       |
|------------|----------------------|-------------------------------|
| JavaScript | WebSocket (built-in) | new WebSocket(url)            |
| Python     | websockets           | await websockets.connect(url) |
| Node.js    | ws                   | const ws = new WebSocket(url) |

Example Connection Flow (JS)

const ws = new WebSocket("wss://call.aivoco.on.cloud.vispark.in/ws/YOUR_API_KEY/YOUR_AGENT_ID");

ws.onopen = () => {
  ws.send("");
  ws.send(JSON.stringify({
    event: "start",
    sequenceNumber: "1",
    start: {
      accountSid: "ACexample",
      streamSid: "MZexample",
      callSid: "CAexample",
      tracks: ["inbound"],
      mediaFormat: { encoding: "audio/x-mulaw", sampleRate: 8000, channels: 1 }
    }
  }));
};

ws.onmessage = (e) => console.log("Received:", e.data);
ws.onerror = (e) => console.error("Error:", e);
ws.onclose = () => console.log("Disconnected");

Quick Summary

| Step | Description                                      |
|------|--------------------------------------------------|
| 1️⃣   | Open the WebSocket using apiKey + agentId        |
| 2️⃣   | Send the empty handshake message                 |
| 3️⃣   | Send the start JSON event                        |
| 4️⃣   | Begin streaming µ-law audio chunks               |
| 5️⃣   | Receive and play AI-generated media audio        |
| 6️⃣   | Close the WebSocket or send stop to end the call |

Example Reference

You can find a working demo showing how to connect to the Aivoco Real-Time API (without the S2S WebSocket) in our GitHub repository: Aivoco Real-Time Call Demo. The repository includes:
  • Example code for establishing a WebSocket connection
  • Audio streaming setup (microphone to AI and back)
  • Step-by-step instructions to run the demo locally