Step 1: Sign up and get your API key
First, you’ll need to obtain your API key from the Vogent dashboard.
- Go to app.vogent.ai
- Sign up or log in to your account
- Navigate to API in the sidebar
- Create a new key by clicking New Key
This is a private API key; never expose it in client-side code.
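A common way to keep the key out of your source code is to store it in an environment variable and read it at runtime. A minimal sketch in Python, assuming you export it as VOGENT_API_KEY (the variable name is just a convention, not something the API requires):

import os

# Read the key from the environment so it never appears in source control
api_key = os.environ["VOGENT_API_KEY"]  # raises KeyError if the variable is unset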
Step 2: Make Your First Request
Basic Text-to-Speech
Let’s start with a simple text-to-speech request:
curl -X POST "https://api.vogent.ai/api/tts" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "So I tried this new thing called Voicelab that I found online, and the voices that it generated were super realistic.",
    "voiceId": "23b2186b-ed56-4185-998c-8d19e1bb227a"
  }' \
  --output my-first-voice.wav
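The same request in Python, for comparison. This is a minimal sketch that assumes the /api/tts endpoint returns the WAV bytes directly in the response body, as the curl example above implies:

import requests

api_key = "YOUR_API_KEY"

response = requests.post(
    "https://api.vogent.ai/api/tts",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "text": "So I tried this new thing called Voicelab that I found online, and the voices that it generated were super realistic.",
        "voiceId": "23b2186b-ed56-4185-998c-8d19e1bb227a"
    },
)

if response.status_code == 200:
    # Write the returned audio bytes to disk, mirroring curl's --output flag
    with open("my-first-voice.wav", "wb") as f:
        f.write(response.content)
else:
    print(f"Error: {response.status_code}")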
Test the Audio
After running the command above, you should have an audio file called my-first-voice.wav. Play it to hear the result!
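If you want to verify the file programmatically before opening it in a player, Python's built-in wave module can read the header. A small sketch, assuming the output is a standard PCM WAV file:

import wave

with wave.open("my-first-voice.wav", "rb") as wf:
    duration = wf.getnframes() / wf.getframerate()
    print(f"{wf.getframerate()} Hz, {wf.getnchannels()} channel(s), {duration:.1f} seconds")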
Step 3: Create a Conversation
Now let’s try the multispeaker feature to create a conversation:
import requests

api_key = "YOUR_API_KEY"

# Create a conversation between two people
conversation = {
    "lines": [
        {
            "text": "So I tried this new thing called Voicelab that I found online, and the voices that it generated were super realistic.",
            "voiceId": "23b2186b-ed56-4185-998c-8d19e1bb227a"  # The CSM-1B professional voice clone "Mabel"
        },
        {
            "text": "That's interesting, was it actually like super realistic or was it just not robotic.",
            "voiceId": "50c9287d-bcee-4f2a-943f-f0f2184a5d3b"  # The CSM-1B professional voice clone "Kevin"
        },
        {
            "text": "I mean, it's hard to tell the difference from a real person speaking. The technology is incredible.",
            "voiceId": "23b2186b-ed56-4185-998c-8d19e1bb227a"  # The CSM-1B professional voice clone "Mabel"
        },
        {
            "text": "Interesting, I've been looking for a new API to use in my app. I'll definitely check it out then.",
            "voiceId": "50c9287d-bcee-4f2a-943f-f0f2184a5d3b"  # The CSM-1B professional voice clone "Kevin"
        }
    ]
}

response = requests.post(
    "https://api.vogent.ai/api/tts/multispeaker",
    headers={"Authorization": f"Bearer {api_key}"},
    json=conversation
)

if response.status_code == 200:
    with open("conversation.wav", "wb") as f:
        f.write(response.content)
    print("✅ Conversation saved as 'conversation.wav'")
else:
    print(f"❌ Error: {response.status_code}")
Step 4: Real-time Streaming with WebSockets
For real-time applications, you can use WebSockets to stream text and receive audio chunks as they’re generated:
import asyncio
import websockets
import json
import base64
import uuid
import wave

SAMPLE_RATE = 24000
API_KEY = ""

async def stream_tts():
    # Connect to the WebSocket endpoint
    uri = f"wss://api.vogent.ai/api/tts/websocket?apiKey={API_KEY}"
    generation_id = f"gen_{uuid.uuid4()}"

    async with websockets.connect(uri) as websocket:
        print("🔗 Connected to Voicelab WebSocket")

        # Send initial text chunk
        await websocket.send(json.dumps({
            "generationId": generation_id,
            "voiceId": "36b87413-6d7b-421d-8745-bc0897770d1e",  # Mabel voice
            "text": "Hello! This is a real-time streaming example.",
            "finalText": False,
            "sampleRate": 24000,
            "cancel": False
        }))

        # Send final text chunk
        await websocket.send(json.dumps({
            "generationId": generation_id,
            "voiceId": "36b87413-6d7b-421d-8745-bc0897770d1e",
            "text": " This demonstrates real-time text-to-speech streaming!",
            "finalText": True,
            "cancel": False
        }))

        audio_chunks = []
        print("Listening for audio")

        # Listen for audio chunks
        async for message in websocket:
            data = json.loads(message)

            if data["type"] == "chunk":
                # Decode and store audio chunk
                audio_data = base64.b64decode(data["audio"])
                audio_chunks.append(audio_data)
                print("🎵 Received audio chunk")
            elif data["type"] == "error":
                print(f"❌ Error: {data['error']}")
                break
            elif data["type"] == "finished":
                print("✅ Streaming complete!")
                with wave.open("streaming_audio.wav", "wb") as wf:
                    wf.setnchannels(1)
                    wf.setsampwidth(2)
                    wf.setframerate(SAMPLE_RATE)
                    print("Chunks", len(audio_chunks))
                    for chunk in audio_chunks:
                        wf.writeframes(chunk)
                print("💾 Audio saved as 'streaming_audio.wav'")
                break

# Run the streaming example
asyncio.run(stream_tts())
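Saving to a file shows the mechanics, but the point of streaming is to start playback before generation finishes. The sketch below plays chunks as they arrive, assuming the chunks are raw 16-bit mono PCM at 24 kHz (as the wave settings above suggest) and using the third-party sounddevice package; the helper names are illustrative, not part of the API:

import base64
import sounddevice as sd  # third-party: pip install sounddevice

def make_player(sample_rate=24000):
    """Return a (play_chunk, close) pair for raw 16-bit mono PCM playback."""
    stream = sd.RawOutputStream(samplerate=sample_rate, channels=1, dtype="int16")
    stream.start()

    def play_chunk(b64_audio):
        # Decode the base64 chunk and write it straight to the output device
        stream.write(base64.b64decode(b64_audio))

    def close():
        stream.stop()
        stream.close()

    return play_chunk, close

# In the loop above you would call play_chunk(data["audio"]) on each "chunk"
# message instead of appending to audio_chunks.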
Why Use WebSocket Streaming?
WebSocket streaming is perfect for:
- Live chatbots - Generate speech as the conversation progresses (see the incremental-sending sketch after this list)
- Real-time applications - Immediate audio feedback
- Interactive experiences - Dynamic content that changes based on user input
- Low-latency needs - Start playing audio before the full text is processed
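For the chatbot case in particular, text usually arrives a piece at a time (for example, sentences coming out of an LLM). The helper below is a sketch that sends each piece with finalText set to False and marks only the last one final, reusing the message fields from the Step 4 example; the async-iterator input and the helper name are illustrative, not part of the API:

import json
import uuid
import websockets

async def speak_pieces(text_pieces, api_key, voice_id):
    """Send text to the TTS WebSocket piece by piece, marking the last piece final."""
    uri = f"wss://api.vogent.ai/api/tts/websocket?apiKey={api_key}"
    generation_id = f"gen_{uuid.uuid4()}"

    def message(text, final):
        return json.dumps({
            "generationId": generation_id,
            "voiceId": voice_id,
            "text": text,
            "finalText": final,
            "sampleRate": 24000,
            "cancel": False
        })

    async with websockets.connect(uri) as ws:
        previous = None
        async for piece in text_pieces:  # e.g. sentences streamed from an LLM
            if previous is not None:
                await ws.send(message(previous, final=False))
            previous = piece
        if previous is not None:
            await ws.send(message(previous, final=True))
        # ...then read "chunk" messages exactly as in the Step 4 example.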
Next Steps
Congratulations! You’ve successfully generated your first AI voices. Here’s what to explore next:
Need Help?