Advanced Voice Synthesis Features

Discover the advanced features available in Voicelab’s voice synthesis engine.

Multispeaker Text to Speech

Create conversations and dialogues with multiple distinct voices in a single audio file.

Basic Multispeaker Usage

Generate conversations with different speakers:

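import os
import requests

# Assumption: the API key is kept in an environment variable; the name is illustrative.
api_key = os.environ["VOICELAB_API_KEY"]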

response = requests.post(
    "https://api.vogent.ai/tts/multispeaker",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "lines": [
            {
                "text": "Welcome to our store! How can I help you today?",
                "voiceId": "professional-female-1"
            },
            {
                "text": "Hi, I'm looking for a new laptop for work.",
                "voiceId": "conversational-male-1"
            },
            {
                "text": "Great! I can help you find the perfect laptop. What type of work do you do?",
                "voiceId": "professional-female-1"
            }
        ]
    }
)

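if response.status_code == 200:
    # The response body is the rendered conversation audio
    with open("conversation.wav", "wb") as f:
        f.write(response.content)
    print("Conversation saved successfully!")
else:
    print(f"Request failed with status {response.status_code}")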

Advanced Multispeaker Scenarios

Create complex dialogues for various use cases:

Customer Service Training

training_dialogue = {
    "lines": [
        {"text": "Thank you for calling customer support. How can I assist you today?", "voiceId": "professional-female-1"},
        {"text": "I'm having trouble with my recent order. It hasn't arrived yet.", "voiceId": "conversational-male-1"},
        {"text": "I understand your concern. Let me look up your order details. Can you provide your order number?", "voiceId": "professional-female-1"},
        {"text": "Sure, it's order number 12345.", "voiceId": "conversational-male-1"},
        {"text": "Thank you. I can see your order was shipped yesterday and should arrive by tomorrow.", "voiceId": "professional-female-1"}
    ]
}

headers = {"Authorization": f"Bearer {api_key}"}
response = requests.post(
    "https://api.vogent.ai/tts/multispeaker",
    headers=headers,
    json=training_dialogue,
)

Educational Content

lesson_dialogue = {
    "lines": [
        {"text": "Today we're learning about photosynthesis. Can anyone tell me what plants need for photosynthesis?", "voiceId": "professional-female-1"},
        {"text": "They need sunlight, water, and carbon dioxide!", "voiceId": "conversational-female-1"},
        {"text": "Excellent! And what do plants produce during photosynthesis?", "voiceId": "professional-female-1"},
        {"text": "They produce oxygen and glucose!", "voiceId": "conversational-male-1"}
    ]
}

Voice Selection Best Practices

Choose appropriate voices for different scenarios:

Professional Contexts

  • Business presentations: Use professional-female-1 or professional-male-1
  • Corporate training: Professional voices for instructors, conversational for participants
  • Customer service: Professional voices for representatives

Casual and Creative Content

  • Podcasts: Mix of conversational and expressive voices
  • Audiobooks: Expressive voices for characters, conversational for narration
  • Social media content: Conversational voices for relatability

Voice Pairing Guidelines

When using multiple voices in a conversation:

  1. Contrast is key: Use different genders or voice types for clarity
  2. Consistency: Keep the same voice for each character throughout
  3. Context matching: Match voice formality to the scenario

# Good voice pairing example
customer_service_scenario = {
    "lines": [
        {"text": "Good morning, how may I help you?", "voiceId": "professional-female-1"},  # Formal representative
        {"text": "Hi, I need help with my account.", "voiceId": "conversational-male-1"},    # Casual customer
        {"text": "I'd be happy to assist you with that.", "voiceId": "professional-female-1"}  # Consistent representative
    ]
}
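
One way to enforce the consistency guideline is to assign each character a voice once and build the payload from that mapping. The sketch below is only illustrative: `build_dialogue` and the speaker labels are hypothetical names, not part of the Voicelab API, and the output simply mirrors the `lines` payload shape used in the examples above.

```python
def build_dialogue(script, voices):
    """Turn (speaker, text) pairs into a multispeaker payload.

    `voices` maps each speaker label to one voice ID, so a character
    cannot accidentally switch voices partway through the dialogue.
    """
    lines = []
    for speaker, text in script:
        if speaker not in voices:
            raise KeyError(f"No voice assigned to speaker {speaker!r}")
        lines.append({"text": text, "voiceId": voices[speaker]})
    return {"lines": lines}


# Hypothetical speaker labels mapped to the voice IDs used in this guide
voices = {"agent": "professional-female-1", "customer": "conversational-male-1"}
script = [
    ("agent", "Good morning, how may I help you?"),
    ("customer", "Hi, I need help with my account."),
    ("agent", "I'd be happy to assist you with that."),
]
payload = build_dialogue(script, voices)
```

The resulting `payload` can be passed as the `json` argument of the multispeaker request.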

Streaming Audio Output

Voicelab automatically streams audio bytes as they are generated, so a client can start playing or saving audio before synthesis has finished. This makes it well suited to real-time applications.

Benefits of Streaming

  • Lower latency: Audio starts playing before generation is complete
  • Memory efficiency: No need to buffer entire audio files
  • Better user experience: Immediate audio feedback

Implementation Tips

import requests

# Stream audio directly to a file
response = requests.post(
    "https://api.vogent.ai/tts",
    headers={"Authorization": f"Bearer {api_key}"},
    json={"text": "This is a streaming audio example.", "voiceId": "professional-female-1"},
    stream=True
)

response.raise_for_status()  # Do not write an error body into the audio file

with open("streaming_audio.wav", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            f.write(chunk)

Error Handling and Best Practices

Common Error Scenarios

import requests

def synthesize_text(text, voice_id):
    try:
        response = requests.post(
            "https://api.vogent.ai/tts",
            headers={"Authorization": f"Bearer {api_key}"},
            json={"text": text, "voiceId": voice_id},
            timeout=30,  # Fail instead of hanging indefinitely on a stalled connection
        )
        
        if response.status_code == 200:
            return response.content
        elif response.status_code == 400:
            print("Error: Invalid input - check your text and voice ID")
        elif response.status_code == 422:
            print("Error: Validation failed - invalid voice ID or text format")
        elif response.status_code == 429:
            print("Error: Rate limit exceeded - please wait before retrying")
        else:
            print(f"Error: Unexpected status code {response.status_code}")
            
    except requests.exceptions.RequestException as e:
        print(f"Network error: {e}")
    
    return None

Rate Limiting Best Practices

import time
import requests

def synthesize_with_retry(text, voice_id, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.post(
                "https://api.vogent.ai/tts",
                headers={"Authorization": f"Bearer {api_key}"},
                json={"text": text, "voiceId": voice_id}
            )
            
            if response.status_code == 200:
                return response.content
            elif response.status_code == 429:
                # Rate limited, wait and retry
                wait_time = 2 ** attempt  # Exponential backoff
                print(f"Rate limited. Waiting {wait_time} seconds before retry...")
                time.sleep(wait_time)
                continue
            else:
                print(f"Error: {response.status_code}")
                break
                
        except requests.exceptions.RequestException as e:
            print(f"Network error: {e}")
            time.sleep(2 ** attempt)
    
    return None
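
When many clients retry on the same schedule, plain exponential backoff can make them collide again. A common variation, sketched here, adds random jitter and caps the delay. `backoff_delay`, `base`, and `cap` are hypothetical names and values chosen for illustration; nothing here reflects documented Voicelab rate-limit behavior.

```python
import random


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random):
    """Seconds to wait before retry number `attempt` (0-based).

    Uses "full jitter": a random delay between 0 and the capped
    exponential value, so simultaneous clients spread out their retries.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)
```

In `synthesize_with_retry`, the fixed `2 ** attempt` wait could be replaced with `backoff_delay(attempt)`.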

Performance Optimization

Text Length Considerations

  • Optimal length: 100-1000 characters per request
  • Maximum length: 5000 characters for single TTS
  • Multispeaker limits: 500 characters per line, 10 lines maximum
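
Checking a multispeaker payload against these limits before sending it avoids a round trip that would be rejected. The following is a minimal sketch that assumes the limits listed above; `validate_multispeaker` and the constant names are hypothetical.

```python
MAX_LINE_CHARS = 500  # per-line limit quoted above
MAX_LINES = 10        # line-count limit quoted above


def validate_multispeaker(payload):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    lines = payload.get("lines", [])
    if not lines:
        problems.append("payload has no lines")
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines exceeds the maximum of {MAX_LINES}")
    for index, line in enumerate(lines):
        length = len(line.get("text", ""))
        if length > MAX_LINE_CHARS:
            problems.append(
                f"line {index} has {length} characters (max {MAX_LINE_CHARS})"
            )
    return problems
```

A caller would send the request only when the returned list is empty, and otherwise report the problems to the user.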

Batch Processing

For large amounts of text, split into smaller chunks:

def batch_synthesize(long_text, voice_id, chunk_size=1000):
    chunks = [long_text[i:i+chunk_size] for i in range(0, len(long_text), chunk_size)]
    audio_files = []
    
    for i, chunk in enumerate(chunks):
        audio_data = synthesize_text(chunk, voice_id)
        if audio_data:
            filename = f"chunk_{i}.wav"
            with open(filename, "wb") as f:
                f.write(audio_data)
            audio_files.append(filename)
    
    return audio_files
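
Slicing at fixed character offsets, as above, can cut a word or sentence in half, which tends to produce audible breaks at chunk boundaries. A possible refinement is to split on sentence endings first. This sketch uses a simple punctuation-based rule and a hypothetical helper name, `split_into_chunks`; it is not a full sentence tokenizer.

```python
import re


def split_into_chunks(text, max_chars=1000):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is itself too long.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

The list it returns could replace the `chunks` list comprehension in `batch_synthesize`.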

Integration Examples

Web Application Integration

// Frontend JavaScript example
async function synthesizeText(text, voiceId) {
    try {
        const response = await fetch('/api/tts', {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
            },
            body: JSON.stringify({ text, voiceId })
        });
        
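        if (response.ok) {
            const audioBlob = await response.blob();
            const audioUrl = URL.createObjectURL(audioBlob);
            
            const audio = new Audio(audioUrl);
            // Release the blob URL once playback has finished
            audio.addEventListener('ended', () => URL.revokeObjectURL(audioUrl));
            // play() returns a promise; awaiting it lets the catch block see playback failures
            await audio.play();
        } else {
            console.error('TTS request failed with status', response.status);
        }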
    } catch (error) {
        console.error('TTS Error:', error);
    }
}

Mobile App Integration

# Python backend for mobile app
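import os

import requests
from flask import Flask, request, Response

app = Flask(__name__)

# Assumption: the API key is kept in an environment variable; the name is illustrative.
API_KEY = os.environ["VOICELAB_API_KEY"]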

@app.route('/tts', methods=['POST'])
def text_to_speech():
    data = request.json
    
    response = requests.post(
        "https://api.vogent.ai/tts",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=data
    )
    
    if response.status_code == 200:
        return Response(
            response.content,
            mimetype='audio/wav',
            headers={'Content-Disposition': 'attachment; filename=speech.wav'}
        )
    else:
        return {'error': 'TTS generation failed'}, 500 
