Explore the available text-to-speech models and their capabilities

Dia recognizes the following non-verbal tags, written inline in the transcript: `(laughs)`, `(clears throat)`, `(sighs)`, `(gasps)`, `(coughs)`, `(singing)`, `(sings)`, `(mumbles)`, `(beep)`, `(groans)`, `(sniffs)`, `(claps)`, `(screams)`, `(inhales)`, `(exhales)`, `(applause)`, `(burps)`, `(humming)`, `(sneezes)`, `(chuckle)`, `(whistles)`.
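
Before synthesis, you can scan a Dia script for cues that are not in the list above. The following is a minimal sketch. It assumes that every parenthesized span in the script is meant as a cue. `DIA_TAGS` and `unknown_dia_tags` are hypothetical helper names and are not part of Dia's API.

```python
import re

# Dia's non-verbal tags, transcribed from the list above (without parentheses).
DIA_TAGS = {
    "laughs", "clears throat", "sighs", "gasps", "coughs", "singing",
    "sings", "mumbles", "beep", "groans", "sniffs", "claps", "screams",
    "inhales", "exhales", "applause", "burps", "humming", "sneezes",
    "chuckle", "whistles",
}

# Matches any parenthesized span, e.g. "(laughs)" or "(clears throat)".
PAREN_TAG = re.compile(r"\(([^()]+)\)")


def unknown_dia_tags(script: str) -> list[str]:
    """Return the parenthesized cues in `script` that are not in DIA_TAGS.

    Assumption: every parenthesized span is intended as a cue, so ordinary
    parenthetical prose is reported as well. Matching is case-insensitive.
    """
    return [
        m.group(1)
        for m in PAREN_TAG.finditer(script)
        if m.group(1).strip().lower() not in DIA_TAGS
    ]


if __name__ == "__main__":
    text = "Well (clears throat), that went well (laughs) (giggles)."
    print(unknown_dia_tags(text))  # ['giggles']
```

Because every parenthesized span is treated as a cue, ordinary asides such as `(see above)` are also flagged. Review the report by hand rather than rejecting the script automatically.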

Orpheus recognizes the following emotive tags, written inline in the prompt: `<laugh>`, `<chuckle>`, `<sigh>`, `<cough>`, `<sniffle>`, `<groan>`, `<yawn>`, `<gasp>`.

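Dia writes cues as parenthesized words, and Orpheus writes them as angle-bracketed tags. A script therefore needs its cues rewritten when it moves from one model to the other. The following is a hypothetical sketch of that conversion. The `DIA_TO_ORPHEUS` pairings are an editorial assumption and not an equivalence stated by either project. For example, treating `(sniffs)` as `<sniffle>` is a judgment call. Cues with no counterpart are dropped and returned so you can review them.

```python
import re

# Hypothetical pairing of Dia cues with the closest Orpheus tag.
# These equivalences are an assumption, not documented by Dia or Orpheus.
DIA_TO_ORPHEUS = {
    "laughs": "<laugh>",
    "chuckle": "<chuckle>",
    "sighs": "<sigh>",
    "coughs": "<cough>",
    "sniffs": "<sniffle>",
    "groans": "<groan>",
    "gasps": "<gasp>",
}

# Matches any parenthesized span, e.g. "(laughs)".
PAREN_TAG = re.compile(r"\(([^()]+)\)")


def dia_to_orpheus(script: str) -> tuple[str, list[str]]:
    """Rewrite Dia-style cues as Orpheus tags.

    Cues without a mapping are removed from the text and returned in the
    order they appeared, so the caller can decide how to handle them.
    """
    dropped: list[str] = []

    def swap(match: re.Match) -> str:
        cue = match.group(1).strip().lower()
        if cue in DIA_TO_ORPHEUS:
            return DIA_TO_ORPHEUS[cue]
        dropped.append(cue)
        return ""

    converted = PAREN_TAG.sub(swap, script)
    # Tidy whitespace left behind by removed cues.
    converted = re.sub(r" {2,}", " ", converted)
    converted = re.sub(r"\s+([.,!?;:])", r"\1", converted).strip()
    return converted, dropped


if __name__ == "__main__":
    text, lost = dia_to_orpheus("Sorry (coughs) I was (humming) away (laughs).")
    print(text)  # Sorry <cough> I was away <laugh>.
    print(lost)  # ['humming']
```

The function drops unmapped cues instead of passing them through. A parenthesized word left in an Orpheus prompt would most likely be read aloud as text rather than performed.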
Feature | Sesame CSM-1B | Dia | Orpheus |
---|---|---|---|
Voice Cloning | ✅ | ✅ | ✅ |
Emotive Tokens | ❌ | ✅ | ✅ |
Multi-speaker | ✅ | ✅ | ✅ |
Real-time Streaming | ✅ | ✅ | ✅ |
Custom Fine-tuning | ✅ | ✅ | ✅ |