This quickstart gets you up and running with ROSE, AIVOCO’s real-time Speech-to-Speech (S2S) model, in three steps.
Goal: Create a voice AI agent, connect it to live audio (web or telephony), and capture transcriptions.

[Figure: flow diagram of the ROSE speech-to-speech connection architecture]

Step 1 — Get your API key

  1. Sign in to the AIVOCO Playground: https://playground.aivoco.com
  2. Open Dashboard → API Keys and generate (or copy) your API key.
  3. Keep the key secure — do not publish it.
You will use this API key to authenticate both HTTP requests and WebSocket connections.
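
One common way to keep the key out of source code is to read it from an environment variable. The sketch below assumes a variable named AIVOCO_API_KEY, a hypothetical name chosen for illustration and not something the AIVOCO docs prescribe. The helper names load_api_key and mask_key are also illustrative.

```python
import os

# Hypothetical variable name; use whatever fits your deployment.
API_KEY_ENV_VAR = "AIVOCO_API_KEY"


def load_api_key(env=None):
    """Return the API key from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set {API_KEY_ENV_VAR} before running this script.")
    return key


def mask_key(key, visible=4):
    """Shorten a key for log output so the full value is never printed."""
    if len(key) <= visible:
        return "*" * len(key)
    return key[:visible] + "*" * (len(key) - visible)
```

Call load_api_key() once at startup and pass the result to the code that needs it. Use mask_key(...) whenever the key has to appear in a log line.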

Step 2 — Create your Agent

Create a new Speech-to-Speech agent by following the Create Agent docs:
  • Provide a name, a system_message that defines the agent persona, and a voice (boy or girl).
  • Optionally add functions to enable function-calling behavior from the agent.
Once the agent is created, note the returned agent_id. You will need it to open your WebSocket session.
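
The agent fields listed above can be assembled and checked before you send them. This is a minimal sketch. The field names name, system_message, voice, and functions come from this quickstart, but the exact request schema, the endpoint, and the shape of each function definition are defined in the Create Agent docs and are not assumed here. The helper build_agent_payload is a hypothetical name.

```python
import json

# The two voices named in this quickstart.
ALLOWED_VOICES = {"boy", "girl"}


def build_agent_payload(name, system_message, voice, functions=None):
    """Assemble the agent fields into a dict, rejecting obviously bad input."""
    if not name.strip():
        raise ValueError("name must not be empty")
    if voice not in ALLOWED_VOICES:
        raise ValueError(f"voice must be one of {sorted(ALLOWED_VOICES)}")
    payload = {
        "name": name,
        "system_message": system_message,
        "voice": voice,
    }
    # Function definitions are passed through untouched; their schema
    # is specified by the Create Agent docs, not by this sketch.
    if functions:
        payload["functions"] = list(functions)
    return payload


if __name__ == "__main__":
    payload = build_agent_payload(
        name="Front desk",
        system_message="You are a friendly receptionist for a dental clinic.",
        voice="girl",
    )
    print(json.dumps(payload, indent=2))
```

Validating the voice value locally catches a typo before it costs a round trip to the API.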

Step 3 — Connect via WebSocket (Web or Telephony)

Use our WebSocket endpoint to stream audio bidirectionally with ROSE. The mechanism is the same whether you connect from a browser app or bridge an external telephony provider.

WebSocket URL (example):
wss://call.aivoco.on.cloud.wispok.in/ws/{API_KEY}/{AGENT_ID}
Replace {API_KEY} and {AGENT_ID} with your values from Step 1 and Step 2.

Once connected, you can:
  • Send audio frames (client → ROSE)
  • Receive synthesized audio responses (ROSE → client)
  • Maintain session context for multi-turn conversations
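
Because the API key is part of the connection URL, it helps to build the URL in one place and to log only a redacted copy. The sketch below uses the example URL from this step. The helper names build_ws_url and redact_ws_url are hypothetical, and percent-encoding the two path segments is a defensive assumption, not a documented requirement. The audio frame format is not described in this quickstart, so sending and receiving audio is left out.

```python
from urllib.parse import quote

# Example endpoint from this quickstart.
WS_BASE = "wss://call.aivoco.on.cloud.wispok.in/ws"


def build_ws_url(api_key, agent_id, base=WS_BASE):
    """Fill {API_KEY} and {AGENT_ID} into the WebSocket URL."""
    if not api_key or not agent_id:
        raise ValueError("both api_key and agent_id are required")
    # Percent-encode each segment so stray characters cannot alter the path.
    return f"{base}/{quote(api_key, safe='')}/{quote(agent_id, safe='')}"


def redact_ws_url(url):
    """Return the URL with the API key segment replaced, for safe logging."""
    head, _api_key, agent_id = url.rsplit("/", 2)
    return f"{head}/***/{agent_id}"


if __name__ == "__main__":
    url = build_ws_url("my-api-key", "my-agent-id")
    print(redact_ws_url(url))
```

The resulting URL can be handed to whichever WebSocket client you use, in the browser or in a telephony bridge.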

Transcription (After the Call)

After a call ends, you can obtain its transcription from the Transcription endpoint (see the Transcription docs). The typical flow is:
  1. Make your call (web or telephony) to the agent connected to ROSE.
  2. Note the call_id, which is returned for the call or available in your call logs.
  3. Use the Transcription API to fetch the text transcript for that call_id.
See: Transcribe Audio

Tips & Notes

  • Use the Playground to test agents interactively before wiring production telephony.
  • If your telephony provider isn’t supported out of the box, contact us at [email protected].
  • Protect your API key — treat it like a password.

© 2025 AIVOCO | ROSE – Real-Time Speech-to-Speech Intelligence