Development

Welcome to the AIVOCO Development Guide — your reference for building, testing, and scaling speech-to-speech (S2S) voice agents powered by the ROSE model.

Local Development Setup

Before you start coding, ensure that you’ve completed the Quickstart guide. You’ll need:

Your API key from playground.aivoco.com
An Agent ID created using the Create Agent endpoint
Access to the WebSocket URL to connect your speech agent
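One way to keep these credentials out of source code is to read them from environment variables. The sketch below assumes the hypothetical variable names `AIVOCO_API_KEY` and `AIVOCO_AGENT_ID`. The guide does not prescribe any names, so adapt them to your setup.

```javascript
// Minimal sketch: read the credentials listed above from environment variables.
// AIVOCO_API_KEY and AIVOCO_AGENT_ID are hypothetical names chosen for this example.
function loadAivocoConfig(env) {
  const apiKey = (env.AIVOCO_API_KEY || "").trim();
  const agentId = (env.AIVOCO_AGENT_ID || "").trim();

  // Collect every missing value so the error message lists all of them at once.
  const missing = [];
  if (!apiKey) missing.push("AIVOCO_API_KEY");
  if (!agentId) missing.push("AIVOCO_AGENT_ID");
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }

  return { apiKey, agentId };
}

// Usage in Node.js:
// const { apiKey, agentId } = loadAivocoConfig(process.env);
```

Failing early with a list of every missing value is usually easier to debug than a WebSocket that silently refuses to connect.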

WebSocket Connection

The ROSE model communicates in real time over WebSocket. To connect your application:

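```javascript
// Replace these placeholders with your API key and the ID returned by the Create Agent endpoint.
const apiKey = "YOUR_API_KEY";
const agentId = "YOUR_AGENT_ID";

const ws = new WebSocket(`wss://call.aivoco.on.cloud.vispark.in/ws/${apiKey}/${agentId}`);

ws.onopen = () => {
  console.log("Connected to ROSE agent");
  // Start sending or receiving audio data here
};

ws.onmessage = (event) => {
  // Messages may carry audio or transcript data
  console.log("Received audio or transcript:", event.data);
};

ws.onerror = (event) => console.error("WebSocket error:", event);

ws.onclose = (event) => console.log(`Connection closed (code ${event.code})`);
```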

This WebSocket stream enables real-time bi-directional audio between your system and the ROSE model.
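The guide does not state which audio encoding ROSE expects over this socket. Assuming it accepts raw 16-bit little-endian PCM sent as binary frames (confirm the actual format and sample rate before relying on this), the sketch below converts Web Audio `Float32Array` samples into PCM16 bytes and splits them into fixed-size frames for `ws.send()`. The helper names `floatToPcm16Bytes` and `chunkBuffer`, and the 320-sample frame size (20 ms at an assumed 16 kHz), are illustrative choices, not part of the ROSE API.

```javascript
// Sketch only: assumes ROSE accepts 16-bit little-endian PCM in binary frames.
// floatToPcm16Bytes and chunkBuffer are hypothetical helper names.

// Convert Web Audio samples in [-1, 1] to 16-bit little-endian PCM bytes.
function floatToPcm16Bytes(samples) {
  const view = new DataView(new ArrayBuffer(samples.length * 2));
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to avoid overflow
    const value = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
    view.setInt16(i * 2, value, true); // true = little-endian
  }
  return view.buffer;
}

// Split a PCM16 buffer into frames of frameSamples samples (the last frame may be shorter).
function chunkBuffer(buffer, frameSamples) {
  const frameBytes = frameSamples * 2; // 2 bytes per 16-bit sample
  const frames = [];
  for (let start = 0; start < buffer.byteLength; start += frameBytes) {
    frames.push(buffer.slice(start, start + frameBytes));
  }
  return frames;
}

// Usage (browser), with 320 samples = 20 ms at an assumed 16 kHz rate:
// for (const frame of chunkBuffer(floatToPcm16Bytes(samples), 320)) ws.send(frame);
```

`DataView.setInt16` with the little-endian flag makes the byte order explicit instead of depending on the host platform, which an `Int16Array` would.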

Agent Lifecycle

You can manage your voice agents through the REST API. Common operations include:

| Action | Endpoint |
| --- | --- |
| Create Agent | `POST /agents` |
| Get Agent | `GET /agents/{id}` |
| Update Agent | `PUT /agents/{id}` |
| Delete Agent | `DELETE /agents/{id}` |

Together, these operations let you manage agent configuration, personalities, voices, and function-calling capabilities programmatically.
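The Agent Lifecycle table maps directly onto HTTP methods and paths, so a small helper can keep that mapping in one place. The base URL, the `Authorization: Bearer` header, and the JSON request body are assumptions for illustration: this guide documents neither the REST host nor the authentication scheme, so check the API reference before use. `buildAgentRequest` is a hypothetical name.

```javascript
// Hypothetical helper: the base URL and auth header format are assumptions.
// Only the method/path pairs come from the Agent Lifecycle table.
const AGENT_ROUTES = {
  create: { method: "POST", path: () => "/agents" },
  get: { method: "GET", path: (id) => `/agents/${encodeURIComponent(id)}` },
  update: { method: "PUT", path: (id) => `/agents/${encodeURIComponent(id)}` },
  delete: { method: "DELETE", path: (id) => `/agents/${encodeURIComponent(id)}` },
};

// Build the URL and fetch() options for one lifecycle action without sending anything.
function buildAgentRequest(baseUrl, apiKey, action, id, body) {
  const route = AGENT_ROUTES[action];
  if (!route) throw new Error(`Unknown agent action: ${action}`);
  if (action !== "create" && !id) throw new Error(`Action "${action}" requires an agent id`);

  const headers = { Authorization: `Bearer ${apiKey}` }; // assumed auth scheme
  const init = { method: route.method, headers };
  if (body !== undefined) {
    headers["Content-Type"] = "application/json";
    init.body = JSON.stringify(body);
  }
  // Strip trailing slashes so "https://host/" and "https://host" behave the same.
  return { url: baseUrl.replace(/\/+$/, "") + route.path(id), init };
}

// Usage (Node 18+ or browser):
// const { url, init } = buildAgentRequest(BASE_URL, apiKey, "get", agentId);
// const agent = await (await fetch(url, init)).json();
```

Separating request construction from sending makes the mapping easy to unit-test without network access.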

Connecting with Telephony

To use your agent for phone calls, connect the WebSocket endpoint to your telephony provider. An example Twilio integration is available in the Outbound Telephony Integration guide, but you can use any telephony system that supports audio streaming. If your preferred provider is not covered, contact 📧 tanusri@aivoco.com for custom integration support.
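For reference, Twilio Media Streams deliver caller audio as JSON messages whose `event` is `"media"`, carrying base64-encoded 8 kHz μ-law audio in `media.payload`. If your bridge needs to hand ROSE linear PCM instead (an assumption: this guide does not specify the format ROSE accepts from telephony), the sketch below decodes such a message using the standard G.711 μ-law expansion. It is not the Outbound Telephony Integration example, and `muLawToLinear` and `decodeTwilioMedia` are hypothetical names.

```javascript
// Sketch for a Node.js bridge: decode one Twilio Media Streams "media" message
// into 16-bit linear PCM samples. Whether ROSE wants PCM16 is an assumption.

// Standard G.711 mu-law to 16-bit linear PCM expansion for one byte.
function muLawToLinear(byte) {
  const u = ~byte & 0xff; // mu-law codewords are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84; // remove the 0x84 bias
  return sign ? -magnitude : magnitude;
}

// Returns an Int16Array for "media" events, or null for other events (start, stop, ...).
function decodeTwilioMedia(rawMessage) {
  const msg = JSON.parse(rawMessage);
  if (msg.event !== "media") return null;
  const bytes = Buffer.from(msg.media.payload, "base64");
  const pcm = new Int16Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) pcm[i] = muLawToLinear(bytes[i]);
  return pcm;
}
```

Any resampling from 8 kHz to the rate ROSE expects would be a further step that this sketch does not cover.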

Testing and Debugging

Testing locally

You can test your agents using the ROSE WebSocket without deploying to production.
Simply connect using your API key and agent ID from the playground.
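A quick way to check credentials before wiring up audio is to open the socket and wait for the `open` event. This sketch takes the WebSocket constructor as a parameter (the browser's global `WebSocket`, or the built-in one in Node.js 22 and later) so it can also be exercised with a stub. `checkConnection` and the 5-second default timeout are hypothetical choices.

```javascript
// Resolve with the open socket, or reject on error, early close, or timeout.
// WebSocketImpl is injected so the same code works in browsers, Node.js, and tests.
function checkConnection(url, WebSocketImpl, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    const ws = new WebSocketImpl(url);
    const timer = setTimeout(() => {
      ws.close();
      reject(new Error(`No open event within ${timeoutMs} ms`));
    }, timeoutMs);

    ws.onopen = () => {
      clearTimeout(timer);
      resolve(ws);
    };
    ws.onerror = () => {
      clearTimeout(timer);
      reject(new Error("WebSocket error before open"));
    };
    ws.onclose = (event) => {
      clearTimeout(timer);
      reject(new Error(`Closed before open (code ${event.code})`));
    };
  });
}

// Usage:
// const ws = await checkConnection(
//   `wss://call.aivoco.on.cloud.vispark.in/ws/${apiKey}/${agentId}`, WebSocket);
// console.log("Agent reachable");
// ws.close();
```

Once the promise has resolved, later `reject` calls from `onclose` are ignored, so closing the socket after the check is harmless.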

Audio debugging

Record incoming and outgoing audio locally to verify stream quality.
Tools such as ffmpeg or SoX can then convert, play back, and analyze the recorded audio.
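Raw PCM captured from the socket is hard to open directly, so one option is to wrap it in a WAV header before inspecting it with `ffprobe` or `sox`. This sketch assumes the captured audio is 16-bit little-endian PCM and that you know its sample rate and channel count; the guide does not state the format ROSE sends, so treat these values as placeholders. `pcm16ToWav` is a hypothetical helper name.

```javascript
// Wrap raw 16-bit little-endian PCM bytes in a 44-byte canonical WAV header (Node.js).
function pcm16ToWav(pcmBytes, sampleRate, channels = 1) {
  const bitsPerSample = 16;
  const blockAlign = channels * (bitsPerSample / 8);
  const header = Buffer.alloc(44);

  header.write("RIFF", 0);
  header.writeUInt32LE(36 + pcmBytes.length, 4); // size of everything after this field
  header.write("WAVE", 8);
  header.write("fmt ", 12);
  header.writeUInt32LE(16, 16); // fmt chunk size for plain PCM
  header.writeUInt16LE(1, 20); // audio format 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * blockAlign, 28); // byte rate
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36);
  header.writeUInt32LE(pcmBytes.length, 40);

  return Buffer.concat([header, pcmBytes]);
}

// Usage: collect binary frames into one Buffer, then
// require("node:fs").writeFileSync("capture.wav", pcm16ToWav(captured, 16000));
// and inspect it with `ffprobe capture.wav` or `sox capture.wav -n stat`.
```

If the file plays back too fast, too slow, or as noise, the assumed sample rate or encoding is likely wrong, which is itself useful debugging information.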

Inspecting responses

The WebSocket will send both audio and transcript events.
Log all message events in your client for better observability.
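This guide does not document the wire format of these events, but a defensive logger can still separate binary frames from text frames and parse text that happens to be JSON. The assumption that audio arrives as binary frames and transcripts as text (possibly JSON) is unconfirmed, and `describeMessage` is a hypothetical helper.

```javascript
// Summarize one WebSocket message for logging without assuming a schema.
// Assumption (unconfirmed): audio arrives as binary frames, transcripts as text.
function describeMessage(data) {
  if (typeof data === "string") {
    try {
      return { kind: "json", value: JSON.parse(data) };
    } catch {
      return { kind: "text", value: data };
    }
  }
  if (data instanceof ArrayBuffer) return { kind: "binary", byteLength: data.byteLength };
  if (ArrayBuffer.isView(data)) return { kind: "binary", byteLength: data.byteLength };
  // Browsers deliver binary frames as Blob unless ws.binaryType = "arraybuffer".
  if (typeof Blob !== "undefined" && data instanceof Blob) {
    return { kind: "binary", byteLength: data.size };
  }
  return { kind: "unknown", value: data };
}

// Usage:
// ws.onmessage = (event) => console.log(new Date().toISOString(), describeMessage(event.data));
```

Logging byte counts rather than raw audio keeps logs readable while still showing whether audio is flowing.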

Retrieving Transcriptions

After a call completes, you can retrieve its full conversation transcript from the `GET /transcriptions/{call_id}` endpoint. Use it to analyze the conversation or to store its context for your business logic or analytics layer.
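A minimal retrieval sketch follows. The endpoint path comes from this guide, but the base URL, the `Authorization: Bearer` header, and the assumption that the response body is JSON are not documented here. `transcriptUrl` and `fetchTranscript` are hypothetical names.

```javascript
// Build the transcription URL; encodeURIComponent guards against unsafe characters in the id.
function transcriptUrl(baseUrl, callId) {
  if (!callId) throw new Error("callId is required");
  return `${baseUrl.replace(/\/+$/, "")}/transcriptions/${encodeURIComponent(callId)}`;
}

// Fetch a transcript (Node 18+ or browser). The auth header and JSON response are assumptions.
async function fetchTranscript(baseUrl, apiKey, callId) {
  const res = await fetch(transcriptUrl(baseUrl, callId), {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) {
    throw new Error(`Transcript request failed: ${res.status} ${res.statusText}`);
  }
  return res.json();
}

// Usage:
// const transcript = await fetchTranscript(BASE_URL, apiKey, callId);
// require("node:fs").writeFileSync(`${callId}.json`, JSON.stringify(transcript, null, 2));
```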

Next Steps

Build and deploy your own ROSE voice agent
Connect with your telephony or VoIP system
Retrieve and analyze transcripts
Launch your AI-powered conversations
