Advanced Voice Synthesis Features

Discover the advanced features available in Voicelab’s voice synthesis engine.

Multispeaker Text to Speech

Create conversations and dialogues with multiple distinct voices in a single audio file.

Basic Multispeaker Usage

Generate conversations with different speakers:

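import os
import requests

# Assumption: the API key is kept in an environment variable; the name VOGENT_API_KEY is a placeholder
api_key = os.environ["VOGENT_API_KEY"]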

response = requests.post(
    "https://api.vogent.ai/tts/multispeaker",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "lines": [
            {
                "text": "Welcome to our store! How can I help you today?",
                "voiceId": "professional-female-1"
            },
            {
                "text": "Hi, I'm looking for a new laptop for work.",
                "voiceId": "conversational-male-1"
            },
            {
                "text": "Great! I can help you find the perfect laptop. What type of work do you do?",
                "voiceId": "professional-female-1"
            }
        ]
    }
)

if response.status_code == 200:
    with open("conversation.wav", "wb") as f:
        f.write(response.content)
    print("Conversation saved successfully!")
else:
    print(f"Request failed with status code {response.status_code}")

Advanced Multispeaker Scenarios

Create complex dialogues for various use cases:

Customer Service Training

training_dialogue = {
    "lines": [
        {"text": "Thank you for calling customer support. How can I assist you today?", "voiceId": "professional-female-1"},
        {"text": "I'm having trouble with my recent order. It hasn't arrived yet.", "voiceId": "conversational-male-1"},
        {"text": "I understand your concern. Let me look up your order details. Can you provide your order number?", "voiceId": "professional-female-1"},
        {"text": "Sure, it's order number 12345.", "voiceId": "conversational-male-1"},
        {"text": "Thank you. I can see your order was shipped yesterday and should arrive by tomorrow.", "voiceId": "professional-female-1"}
    ]
}

headers = {"Authorization": f"Bearer {api_key}"}
response = requests.post("https://api.vogent.ai/tts/multispeaker", headers=headers, json=training_dialogue)

Educational Content

lesson_dialogue = {
    "lines": [
        {"text": "Today we're learning about photosynthesis. Can anyone tell me what plants need for photosynthesis?", "voiceId": "professional-female-1"},
        {"text": "They need sunlight, water, and carbon dioxide!", "voiceId": "conversational-female-1"},
        {"text": "Excellent! And what do plants produce during photosynthesis?", "voiceId": "professional-female-1"},
        {"text": "They produce oxygen and glucose!", "voiceId": "conversational-male-1"}
    ]
}

Voice Selection Best Practices

Choose appropriate voices for different scenarios:

Professional Contexts

Business presentations: Use professional-female-1 or professional-male-1
Corporate training: Professional voices for instructors, conversational for participants
Customer service: Professional voices for representatives

Casual and Creative Content

Podcasts: Mix of conversational and expressive voices
Audiobooks: Expressive voices for characters, conversational for narration
Social media content: Conversational voices for relatability

Voice Pairing Guidelines

When using multiple voices in a conversation:

Contrast is key: Use different genders or voice types for clarity
Consistency: Keep the same voice for each character throughout
Context matching: Match voice formality to the scenario

# Good voice pairing example
customer_service_scenario = {
    "lines": [
        {"text": "Good morning, how may I help you?", "voiceId": "professional-female-1"},  # Formal representative
        {"text": "Hi, I need help with my account.", "voiceId": "conversational-male-1"},    # Casual customer
        {"text": "I'd be happy to assist you with that.", "voiceId": "professional-female-1"}  # Consistent representative
    ]
}
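The consistency guideline can be enforced mechanically by assigning each character a voice once and building the lines from a script. The following is a minimal sketch: `build_lines` and the speaker names are hypothetical helpers for illustration, not part of the Voicelab API, and the voice IDs are the ones used in the examples above.

```python
def build_lines(script, voice_map):
    """Turn (speaker, text) pairs into a multispeaker "lines" list.

    Hypothetical helper, not part of the Voicelab API. Every speaker is
    looked up in voice_map, so a character keeps the same voice throughout.
    """
    lines = []
    for speaker, text in script:
        if speaker not in voice_map:
            raise KeyError(f"No voice assigned to speaker {speaker!r}")
        lines.append({"text": text, "voiceId": voice_map[speaker]})
    return lines


# One voice per character: formal representative, casual customer
voice_map = {
    "agent": "professional-female-1",
    "customer": "conversational-male-1",
}
script = [
    ("agent", "Good morning, how may I help you?"),
    ("customer", "Hi, I need help with my account."),
    ("agent", "I'd be happy to assist you with that."),
]
payload = {"lines": build_lines(script, voice_map)}
```

Because the voice is chosen by speaker name rather than typed on every line, a mistyped or swapped voice ID cannot creep into the middle of a dialogue.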

Streaming Audio Output

Voicelab streams audio bytes as they are generated, so a client can start playing or saving audio before synthesis finishes. This suits real-time applications.

Benefits of Streaming

Lower latency: Audio starts playing before generation is complete
Memory efficiency: No need to buffer entire audio files
Better user experience: Immediate audio feedback

Implementation Tips

import requests

# Stream audio directly to a file
response = requests.post(
    "https://api.vogent.ai/tts",
    headers={"Authorization": f"Bearer {api_key}"},
    json={"text": "This is a streaming audio example.", "voiceId": "professional-female-1"},
    stream=True
)
response.raise_for_status()  # Fail early instead of writing an error body into the .wav file

with open("streaming_audio.wav", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            f.write(chunk)
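The write loop above can be factored into a small function that accepts any iterable of byte chunks, which makes it possible to exercise without a network call. This is only a sketch: `write_chunks` is a hypothetical name, and the list of chunks stands in for `response.iter_content(chunk_size=8192)`.

```python
import io


def write_chunks(chunks, out_file):
    """Write byte chunks to a binary file object as they arrive.

    Returns the total number of bytes written. Hypothetical helper,
    not part of the Voicelab API or the requests library.
    """
    total = 0
    for chunk in chunks:
        if not chunk:  # skip empty chunks
            continue
        out_file.write(chunk)
        total += len(chunk)
    return total


# Stand-in for response.iter_content(chunk_size=8192)
fake_chunks = [b"RIFF", b"", b"data"]
buffer = io.BytesIO()
written = write_chunks(fake_chunks, buffer)
```

With a real response, the same call would be `write_chunks(response.iter_content(chunk_size=8192), f)` inside the `with open(...)` block.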

Error Handling and Best Practices

Common Error Scenarios

import requests

def synthesize_text(text, voice_id):
    try:
        response = requests.post(
            "https://api.vogent.ai/tts",
            headers={"Authorization": f"Bearer {api_key}"},
            json={"text": text, "voiceId": voice_id}
        )
        
        if response.status_code == 200:
            return response.content
        elif response.status_code == 400:
            print("Error: Invalid input - check your text and voice ID")
        elif response.status_code == 422:
            print("Error: Validation failed - invalid voice ID or text format")
        elif response.status_code == 429:
            print("Error: Rate limit exceeded - please wait before retrying")
        else:
            print(f"Error: Unexpected status code {response.status_code}")
            
    except requests.exceptions.RequestException as e:
        print(f"Network error: {e}")
    
    return None

Rate Limiting Best Practices

import time
import requests

def synthesize_with_retry(text, voice_id, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.post(
                "https://api.vogent.ai/tts",
                headers={"Authorization": f"Bearer {api_key}"},
                json={"text": text, "voiceId": voice_id}
            )
            
            if response.status_code == 200:
                return response.content
            elif response.status_code == 429:
                # Rate limited, wait and retry
                wait_time = 2 ** attempt  # Exponential backoff
                print(f"Rate limited. Waiting {wait_time} seconds before retry...")
                time.sleep(wait_time)
                continue
            else:
                print(f"Error: {response.status_code}")
                break
                
        except requests.exceptions.RequestException as e:
            print(f"Network error: {e}")
            time.sleep(2 ** attempt)
    
    return None
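When many clients back off on the same fixed schedule, their retries can arrive together. A common refinement is to cap the delay and add random jitter. The sketch below shows one such variant ("full jitter"); the function name, the base delay, and the cap are assumptions for illustration, not values recommended by Voicelab.

```python
import random


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random):
    """Return a random delay in seconds for the given retry attempt.

    The upper bound grows as base * 2**attempt but never exceeds cap;
    the actual delay is drawn uniformly between 0 and that bound.
    """
    upper = min(cap, base * (2 ** attempt))
    return rng.uniform(0, upper)


# Example: delays for the first three attempts (values vary per run)
delays = [backoff_delay(attempt) for attempt in range(3)]
```

In `synthesize_with_retry`, this would replace the line `wait_time = 2 ** attempt` with `wait_time = backoff_delay(attempt)`.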

Performance Optimization

Text Length Considerations

Optimal length: 100-1000 characters per request
Maximum length: 5000 characters for a single TTS request
Multispeaker limits: 500 characters per line, 10 lines maximum
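The multispeaker limits can be checked on the client before a request is sent. The following sketch assumes that "characters" corresponds to Python's `len()` of the text, which the limits above do not specify; `validate_multispeaker` is a hypothetical helper, not part of the Voicelab API.

```python
MAX_LINES = 10             # from the limits listed above
MAX_CHARS_PER_LINE = 500   # from the limits listed above


def validate_multispeaker(payload):
    """Return a list of problems; an empty list means the payload fits the limits."""
    problems = []
    lines = payload.get("lines", [])
    if not lines:
        problems.append("payload has no lines")
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines exceeds the maximum of {MAX_LINES}")
    for index, line in enumerate(lines):
        length = len(line.get("text", ""))
        if length > MAX_CHARS_PER_LINE:
            problems.append(
                f"line {index} has {length} characters (maximum {MAX_CHARS_PER_LINE})"
            )
    return problems


ok_payload = {"lines": [{"text": "Hello!", "voiceId": "professional-female-1"}]}
too_long = {"lines": [{"text": "x" * 501, "voiceId": "professional-female-1"}]}
print(validate_multispeaker(ok_payload))
print(validate_multispeaker(too_long))
```

Returning a list of problems rather than raising on the first one lets a caller report every violation at once.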

Batch Processing

For large amounts of text, split into smaller chunks:

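def batch_synthesize(long_text, voice_id, chunk_size=1000):
    # Slice the text into fixed-size pieces; note that a slice can end mid-word
    chunks = [long_text[i:i+chunk_size] for i in range(0, len(long_text), chunk_size)]
    audio_files = []
    
    for i, chunk in enumerate(chunks):
        # synthesize_text (defined under Common Error Scenarios) returns None on failure
        audio_data = synthesize_text(chunk, voice_id)
        if audio_data:
            filename = f"chunk_{i}.wav"
            with open(filename, "wb") as f:
                f.write(audio_data)
            audio_files.append(filename)
        else:
            print(f"Chunk {i} failed and was skipped")
    
    return audio_files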
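Slicing at fixed character offsets can cut a word or sentence in half, which is likely to be audible where two audio chunks meet. One alternative is to split on sentence boundaries and fall back to a hard split only for sentences longer than the limit. This is a sketch under simple assumptions: `split_text` is a hypothetical helper, and the sentence detection is a basic regular expression rather than a full tokenizer.

```python
import re


def split_text(text, max_chars=1000):
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    # Break after ., ! or ? when followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split by character
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # the sentence still fits in the open chunk
        else:
            chunks.append(current)  # close the open chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks


example = split_text("One. Two! Three? Four.", max_chars=10)
```

The resulting list can be passed through the same loop as in `batch_synthesize` in place of the fixed-size slices.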

Integration Examples

Web Application Integration

// Frontend JavaScript example
async function synthesizeText(text, voiceId) {
    try {
        const response = await fetch('/api/tts', {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
            },
            body: JSON.stringify({ text, voiceId })
        });
        
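        if (response.ok) {
            const audioBlob = await response.blob();
            const audioUrl = URL.createObjectURL(audioBlob);
            
            const audio = new Audio(audioUrl);
            // Release the blob URL once playback finishes
            audio.addEventListener('ended', () => URL.revokeObjectURL(audioUrl));
            // play() returns a promise; awaiting it routes playback errors to the catch block
            await audio.play();
        } else {
            console.error('TTS request failed:', response.status);
        }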
    } catch (error) {
        console.error('TTS Error:', error);
    }
}

Mobile App Integration

# Python backend for mobile app
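import os

from flask import Flask, request, Response
import requests

app = Flask(__name__)
# Assumption: the API key is kept in an environment variable; the name VOGENT_API_KEY is a placeholder
API_KEY = os.environ["VOGENT_API_KEY"]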

@app.route('/tts', methods=['POST'])
def text_to_speech():
    data = request.json
    
    response = requests.post(
        "https://api.vogent.ai/tts",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=data
    )
    
    if response.status_code == 200:
        return Response(
            response.content,
            mimetype='audio/wav',
            headers={'Content-Disposition': 'attachment; filename=speech.wav'}
        )
    else:
        return {'error': 'TTS generation failed'}, 500

