Step 1: Sign up and get your API key

First, you’ll need to obtain your API key from the Vogent dashboard.

  1. Go to app.vogent.ai
  2. Sign up or log in to your account
  3. Navigate to API in the sidebar
  4. Create a new key by clicking New Key

This is a private API key; never expose it in client-side code.
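One common way to keep the key out of your source is to read it from an environment variable. This is a minimal sketch; the variable name VOGENT_API_KEY is just a convention chosen here, not something the API requires:

```python
import os

def auth_headers(env_var: str = "VOGENT_API_KEY") -> dict:
    # Read the private key from the environment so it never appears
    # in source control or client-side bundles.
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set {env_var} before calling the API.")
    return {"Authorization": f"Bearer {key}"}
```

The examples below can then pass `auth_headers()` instead of hardcoding the key.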

Step 2: Make Your First Request

Basic Text-to-Speech

Let’s start with a simple text-to-speech request:

curl -X POST "https://api.vogent.ai/tts" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "So I tried this new thing called Voicelab that I found online, and the voices that it generated were super realistic.",
    "voiceId": "23b2186b-ed56-4185-998c-8d19e1bb227a"
  }' \
  --output my-first-voice.wav

Test the Audio

After running the code above, you should have an audio file called my-first-voice.wav. Play it to hear the generation!
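If you prefer Python, the same request can be made with the requests library. This sketch mirrors the curl call above; the endpoint and field names are taken directly from it, while the function and variable names are our own:

```python
import requests

TTS_URL = "https://api.vogent.ai/tts"  # same endpoint as the curl example

def build_tts_payload(text: str, voice_id: str) -> dict:
    # Field names match the JSON body in the curl example.
    return {"text": text, "voiceId": voice_id}

def synthesize(text: str, voice_id: str, api_key: str,
               out_path: str = "my-first-voice.wav") -> str:
    resp = requests.post(
        TTS_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json=build_tts_payload(text, voice_id),
    )
    resp.raise_for_status()  # surface 4xx/5xx errors instead of writing them to disk
    with open(out_path, "wb") as f:
        f.write(resp.content)
    return out_path
```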

Step 3: Create a Conversation

Now let’s try the multispeaker feature to create a conversation:

import requests

api_key = "YOUR_API_KEY"

# Create a conversation between two people
conversation = {
    "lines": [
        {
            "text": "So I tried this new thing called Voicelab that I found online, and the voices that it generated were super realistic.",
            "voiceId": "23b2186b-ed56-4185-998c-8d19e1bb227a"  # The CSM-1B professional voice clone "Mabel"
        },
        {
            "text": "That's interesting, was it actually like super realistic or was it just not robotic?",
            "voiceId": "50c9287d-bcee-4f2a-943f-f0f2184a5d3b"  # The CSM-1B professional voice clone "Kevin"
        },
        {
            "text": "I mean, it's hard to tell the difference from a real person speaking. The technology is incredible.",
            "voiceId": "23b2186b-ed56-4185-998c-8d19e1bb227a"  # The CSM-1B professional voice clone "Mabel"
        },
        {
            "text": "Interesting, I've been looking for a new API to use in my app. I'll definitely check it out then.",
            "voiceId": "50c9287d-bcee-4f2a-943f-f0f2184a5d3b"  # The CSM-1B professional voice clone "Kevin"
        }
    ]
}

response = requests.post(
    "https://api.vogent.ai/tts/multispeaker",
    headers={"Authorization": f"Bearer {api_key}"},
    json=conversation
)

if response.status_code == 200:
    with open("conversation.wav", "wb") as f:
        f.write(response.content)
    print("✅ Conversation saved as 'conversation.wav'")
else:
    print(f"❌ Error: {response.status_code}")
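A quick way to sanity-check the output without opening an audio player is Python's standard-library wave module, assuming the API returns standard PCM WAV files:

```python
import wave

def wav_duration_seconds(path: str) -> float:
    # Duration = number of audio frames divided by the sample rate.
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()
```

For example, `wav_duration_seconds("conversation.wav")` should return a length roughly matching the four spoken lines.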

Step 4: Real-time Streaming with WebSockets

For real-time applications, you can use WebSockets to stream text and receive audio chunks as they’re generated:

import asyncio
import websockets
import json
import base64
import uuid

async def stream_tts():
    # Connect to the WebSocket endpoint (the key is passed as a query parameter)
    uri = "wss://api.vogent.ai/tts/websocket?api_key=YOUR_API_KEY"
    generation_id = f"gen_{uuid.uuid4()}"
    
    async with websockets.connect(uri) as websocket:
        print("🔗 Connected to Voicelab WebSocket")
        
        # Send the first text chunk; finalText=False tells the server more text is coming
        await websocket.send(json.dumps({
            "generationId": generation_id,
            "voiceId": "23b2186b-ed56-4185-998c-8d19e1bb227a",  # Mabel voice
            "text": "Hello! This is a real-time streaming example.",
            "finalText": False,
            "cancel": False
        }))
        
        # Send the final text chunk; finalText=True lets the server finish the generation.
        # (Both chunks are sent before listening, so the "complete" message can arrive.)
        await websocket.send(json.dumps({
            "generationId": generation_id,
            "voiceId": "23b2186b-ed56-4185-998c-8d19e1bb227a",
            "text": " This demonstrates real-time text-to-speech streaming!",
            "finalText": True,
            "cancel": False
        }))
        
        audio_chunks = []
        
        # Listen for audio chunks until the server reports completion
        async for message in websocket:
            data = json.loads(message)
            
            if data["type"] == "audio":
                # Decode and store the base64-encoded audio chunk
                audio_chunks.append(base64.b64decode(data["audio"]))
                print("🎵 Received audio chunk")
                
            elif data["type"] == "error":
                print(f"❌ Error: {data['error']}")
                break
                
            elif data["type"] == "complete":
                print("✅ Streaming complete!")
                # Save the complete audio
                with open("streaming_audio.wav", "wb") as f:
                    for chunk in audio_chunks:
                        f.write(chunk)
                print("💾 Audio saved as 'streaming_audio.wav'")
                break

# Run the streaming example
asyncio.run(stream_tts())

Why Use WebSocket Streaming?

WebSocket streaming is perfect for:

  • Live chatbots - Generate speech as the conversation progresses
  • Real-time applications - Immediate audio feedback
  • Interactive experiences - Dynamic content that changes based on user input
  • Low-latency needs - Start playing audio before the full text is processed
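To illustrate the low-latency point: received chunks can be handed to a background consumer so playback starts before generation finishes. This is a minimal sketch using only the standard library; `play_chunk` stands in for whatever "play these bytes" call your audio library provides (hypothetical here):

```python
import queue
import threading

def start_playback_worker(play_chunk):
    # play_chunk(bytes) is the audio-output call supplied by the caller.
    # A None item on the queue signals end-of-stream.
    q = queue.Queue()

    def worker():
        while True:
            chunk = q.get()
            if chunk is None:
                break
            play_chunk(chunk)

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return q, t
```

Inside the WebSocket receive loop, call `q.put(audio_data)` for each decoded chunk and `q.put(None)` when the "complete" message arrives, then `t.join()` to wait for playback to drain.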

Next Steps

Congratulations! You’ve successfully generated your first AI voices. Here’s what to explore next:

Need Help?
