Aivoco Real-Time Voice Call API Documentation
Overview
The Aivoco Real-Time Call API lets developers start a live, two-way voice conversation with the AI over a WebSocket connection. Once connected, the client streams live microphone audio (audio/x-mulaw) to the AI and receives generated speech back in real time over the same connection, which allows fully interactive, natural voice communication.
Base URL
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| apiKey | string | ✅ | Your Aivoco API key (unique for each user) |
| agentId | string | ✅ | Agent identifier |
API keys are available from playground.aivoco.com.
⚙️ Authentication
Authentication is done via WebSocket path parameters:
- The `apiKey` must be a valid Aivoco-issued key.
- The `agentId` identifies the calling agent.
🎙️ Connection Flow
1. Connect to WebSocket
The client establishes a secure WebSocket connection using the `apiKey` and `agentId`.
2. Send an Empty Handshake
Once the WebSocket opens, send an empty message ("") to trigger the handshake.
3. Send a start Event
After handshake, send a start message that defines your stream.
| Field | Type | Description |
|---|---|---|
| event | "start" | Type of message event |
| sequenceNumber | string | Sequential ID for message ordering |
| streamSid | string | Unique stream session ID |
| mediaFormat | object | Format of audio being sent |
| encoding | "audio/x-mulaw" | Audio encoding type |
| sampleRate | 8000 | Required sample rate (Hz) |
| channels | 1 | Mono audio |
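The start message can be sketched as a small builder. This is a minimal sketch, not the official payload: it assumes that encoding, sampleRate and channels are nested inside `mediaFormat` and that the message is sent as a JSON string. The function name `buildStartMessage` and the stream ID are hypothetical.

```javascript
// Build the `start` event from the fields listed in the table.
// Assumption: encoding, sampleRate and channels live inside mediaFormat,
// and the message travels as a JSON text frame.
function buildStartMessage(streamSid, sequenceNumber = 1) {
  return JSON.stringify({
    event: "start",
    sequenceNumber: String(sequenceNumber), // documented as a string
    streamSid,
    mediaFormat: {
      encoding: "audio/x-mulaw",
      sampleRate: 8000,
      channels: 1,
    },
  });
}

// "stream-123" is a made-up stream ID, for illustration only.
const startMessage = buildStartMessage("stream-123");
console.log(startMessage);
```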
4. Stream Audio (media Event)
After the start message, stream microphone data in chunks encoded as Base64 µ-law audio.
| Field | Type | Description |
|---|---|---|
| event | "media" | Indicates audio data |
| track | "inbound" | Audio direction |
| chunk | integer | Incremental chunk counter |
| timestamp | string | Unix timestamp |
| payload | string | Base64-encoded µ-law audio data |
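A media message could be assembled as follows. This is a sketch for Node.js (it uses `Buffer` for Base64). It assumes the chunk counter starts at 1 and that the timestamp is Unix time in milliseconds sent as a string; the table does not specify either, so adjust as needed. `createMediaMessageBuilder` is a hypothetical helper name.

```javascript
// Returns a function that wraps raw µ-law bytes in a `media` event
// and keeps its own incrementing chunk counter.
function createMediaMessageBuilder() {
  let chunk = 0; // assumption: counting starts at 1 for the first chunk
  return function buildMediaMessage(mulawBytes) {
    chunk += 1;
    return JSON.stringify({
      event: "media",
      track: "inbound",
      chunk,
      timestamp: String(Date.now()), // assumption: milliseconds, as a string
      payload: Buffer.from(mulawBytes).toString("base64"),
    });
  };
}

// Usage: two bytes of µ-law silence (0xFF) as the first chunk.
const buildMediaMessage = createMediaMessageBuilder();
console.log(buildMediaMessage(Uint8Array.from([0xff, 0xff])));
```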
5. Receive Audio Responses
Aivoco will send back similar `media` messages for AI-generated audio output.
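Incoming audio can be decoded with a small helper. This sketch assumes that response messages are JSON text frames carrying the same `event` and `payload` fields as the outgoing media messages; the exact response shape is not shown here, so treat the field names as assumptions. `decodeMediaMessage` is a hypothetical name, and `Buffer` makes this Node.js-specific.

```javascript
// Returns the raw µ-law bytes from a `media` message,
// or null for anything else (non-JSON frames, other events).
function decodeMediaMessage(raw) {
  let message;
  try {
    message = JSON.parse(raw);
  } catch {
    return null; // not JSON, ignore
  }
  if (!message || message.event !== "media" || typeof message.payload !== "string") {
    return null;
  }
  return Buffer.from(message.payload, "base64");
}

// Usage: the decoded bytes are 8 kHz mono µ-law, ready for playback or conversion.
const audioBytes = decodeMediaMessage('{"event":"media","payload":"//8="}');
console.log(audioBytes.length);
```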
Other Event Types
| Event | Description |
|---|---|
| connected | Sent when connection established |
| disconnected | Sent when the connection closes |
| status | Optional message with diagnostic info |
| error | Returned if authentication or format fails |
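One way to handle these events is a small dispatcher keyed on the event name. The sketch below assumes each event arrives as a JSON text frame with an `event` field, like the start and media messages; the payloads of these events are not documented here, so the handlers simply receive the parsed message. `routeEvent` is a hypothetical helper.

```javascript
// Calls the handler registered for message.event, if any.
// Returns true when a handler ran, false otherwise.
function routeEvent(raw, handlers) {
  const message = JSON.parse(raw);
  const hasHandler = Object.prototype.hasOwnProperty.call(handlers, message.event);
  if (hasHandler && typeof handlers[message.event] === "function") {
    handlers[message.event](message);
    return true;
  }
  return false;
}

// Usage: log lifecycle events and surface errors.
const handlers = {
  connected: () => console.log("call connected"),
  disconnected: () => console.log("call ended"),
  status: (message) => console.log("status:", message),
  error: (message) => console.error("error:", message),
};
routeEvent('{"event":"connected"}', handlers);
```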
Audio Requirements
| Parameter | Required | Value |
|---|---|---|
| Encoding | ✅ | audio/x-mulaw |
| Sample rate | ✅ | 8000 Hz |
| Channels | ✅ | 1 (mono) |
| Chunk size | Recommended | ≤ 200 ms per frame |
| Payload | ✅ | Base64 encoded |
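At 8000 Hz mono with one byte per µ-law sample, 200 ms of audio is 8000 × 0.2 = 1600 bytes. The sketch below converts 16-bit PCM samples to µ-law using the standard G.711 bias-and-segment method and splits the result into chunks of at most 1600 bytes. It assumes the microphone audio has already been resampled to 8 kHz mono; the function names are illustrative.

```javascript
const SAMPLE_RATE = 8000;
const MAX_CHUNK_MS = 200;
const MAX_CHUNK_BYTES = (SAMPLE_RATE * MAX_CHUNK_MS) / 1000; // 1600 bytes

const MULAW_BIAS = 0x84;
const MULAW_CLIP = 32635;

// Convert one signed 16-bit PCM sample to an 8-bit µ-law byte (G.711).
function linearToMulaw(sample) {
  const sign = sample < 0 ? 0x80 : 0x00;
  const magnitude = Math.min(Math.abs(sample), MULAW_CLIP) + MULAW_BIAS;
  // Find the segment (exponent): position of the highest set bit above bit 7.
  let exponent = 7;
  for (let mask = 0x4000; exponent > 0 && (magnitude & mask) === 0; mask >>= 1) {
    exponent -= 1;
  }
  const mantissa = (magnitude >> (exponent + 3)) & 0x0f;
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// Encode an Int16Array of 8 kHz mono PCM and split it into <= 200 ms chunks.
function encodeAndChunk(pcm16) {
  const encoded = Uint8Array.from(pcm16, (sample) => linearToMulaw(sample));
  const chunks = [];
  for (let offset = 0; offset < encoded.length; offset += MAX_CHUNK_BYTES) {
    chunks.push(encoded.subarray(offset, offset + MAX_CHUNK_BYTES));
  }
  return chunks;
}
```

Each resulting chunk can then be Base64-encoded and sent as the payload of one media message.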
Error Responses
| Code | Message | Description |
|---|---|---|
| 4001 | Invalid API Key | The provided API key is invalid |
| 4002 | Invalid Agent ID | Agent ID not recognized |
| 1006 | Connection closed abnormally | Network or backend disconnect |
| 5000 | Internal Server Error | Unexpected server-side error |
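A client may want to turn these codes into readable messages. The lookup below is a minimal sketch built only from the table above. Whether a code arrives as a WebSocket close code or inside an error event is not stated here (1006 is the standard WebSocket code for an abnormal closure), so the helper takes a plain number. `describeError` is a hypothetical name.

```javascript
// Codes and messages copied from the Error Responses table.
const ERROR_MESSAGES = new Map([
  [4001, "Invalid API Key"],
  [4002, "Invalid Agent ID"],
  [1006, "Connection closed abnormally"],
  [5000, "Internal Server Error"],
]);

// Returns the documented message, or a generic fallback for unlisted codes.
function describeError(code) {
  return ERROR_MESSAGES.get(code) ?? `Unknown error code ${code}`;
}

// Usage, assuming the code is delivered with the WebSocket close event:
// ws.onclose = (event) => console.warn(describeError(event.code));
console.log(describeError(4001));
```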
Ending the Call
To end the session, simply close the WebSocket.
Client Libraries
You can use any WebSocket-compatible library:
| Language | Library | Example |
|---|---|---|
| JavaScript | WebSocket (built-in) | new WebSocket(url) |
| Python | websockets | await websockets.connect(url) |
| Node.js | ws | const ws = new WebSocket(url) |
Example Connection Flow (JS)
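The following is a minimal sketch of the flow described in this document, not the official sample. It assumes a Node.js environment (for `Buffer`), a socket object with the standard WebSocket interface (`onopen`, `onmessage`, `send`, `close`), JSON text frames, and a millisecond timestamp. The name `attachCall` and the stream ID are hypothetical, and the connection URL is left to the caller because the base URL is not shown here.

```javascript
// Wires the documented steps onto an already-created WebSocket.
// onAudio receives the raw µ-law bytes of each incoming media message.
function attachCall(ws, streamSid, onAudio) {
  let chunk = 0;

  ws.onopen = () => {
    ws.send(""); // step 2: empty handshake message
    ws.send(JSON.stringify({ // step 3: start event
      event: "start",
      sequenceNumber: "1",
      streamSid,
      mediaFormat: { encoding: "audio/x-mulaw", sampleRate: 8000, channels: 1 },
    }));
  };

  ws.onmessage = (event) => {
    if (typeof event.data !== "string") return; // assumption: JSON text frames
    const message = JSON.parse(event.data);
    if (message.event === "media") {
      onAudio(Buffer.from(message.payload, "base64")); // step 5
    }
  };

  return {
    // step 4: send one chunk of µ-law audio (<= 200 ms recommended)
    sendAudio(mulawBytes) {
      chunk += 1;
      ws.send(JSON.stringify({
        event: "media",
        track: "inbound",
        chunk,
        timestamp: String(Date.now()), // assumption: milliseconds
        payload: Buffer.from(mulawBytes).toString("base64"),
      }));
    },
    // step 6: closing the socket ends the call
    end() {
      ws.close();
    },
  };
}

// Usage (url must contain your apiKey and agentId; its format is not shown here):
// const call = attachCall(new WebSocket(url), "stream-123", (bytes) => play(bytes));
```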
Quick Summary
| Step | Description |
|---|---|
| 1️⃣ | Open WebSocket using apiKey + agentId |
| 2️⃣ | Send empty handshake message |
| 3️⃣ | Send start JSON event |
| 4️⃣ | Begin streaming µ-law audio chunks |
| 5️⃣ | Receive and play AI-generated media audio |
| 6️⃣ | Close WebSocket or send stop to end call |
Example Reference
You can find a working demo showing how to connect to the Aivoco Real-Time API (without S2S WebSocket) on our GitHub repository:
- Example code for establishing a WebSocket connection
- Audio streaming setup (microphone to AI and back)
- Step-by-step instructions to run the demo locally