Sesame - Vogent

We provide access to Sesame’s ultra-realistic CSM-1B voice model through a rebuilt version that produces audio in real time and at low latency. This voice engine is available at no additional cost. In our testing, Sesame voices generally surpass other state-of-the-art voice vendors in realism while also outperforming them on cost and latency.

Key Features

Natural Prosody: Sesame voices deliver more natural intonation, rhythm, and stress patterns in speech
Improved Expressiveness: Better emotional range and contextual understanding
Enhanced Pronunciation and Spelling: More accurate handling of complex words and phrases
Seamless Transitions: Smoother flow between sentences and paragraphs

Using Sesame Voices

Sesame voices are identified by the “Sesame” tag in the voice selection interface. Only a small number of Sesame voices are currently available, but cloning a new Sesame voice is straightforward and requires only about 8 to 20 seconds of audio. For tips on creating effective new Sesame voices, see the Voice Cloning section.
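Because cloning works best with roughly 8 to 20 seconds of audio, it can help to check a sample's length before uploading it. The sketch below is a hypothetical local helper, not part of any Vogent API: the function names and constants are illustrative, it assumes the sample is an uncompressed PCM WAV file, and it uses only Python's standard `wave` module.

```python
import wave
from typing import BinaryIO, Union

# Approximate clip length suggested for cloning a Sesame voice (~8-20 seconds).
# This mirrors the guidance in these docs; it is not an official Vogent limit.
SUGGESTED_MIN_SECONDS = 8.0
SUGGESTED_MAX_SECONDS = 20.0


def wav_duration_seconds(source: Union[str, BinaryIO]) -> float:
    """Return the duration, in seconds, of a PCM WAV file given a path or file object."""
    with wave.open(source, "rb") as wav:
        # Total frames divided by frames per second gives the clip length.
        return wav.getnframes() / wav.getframerate()


def is_suggested_clone_length(duration_seconds: float) -> bool:
    """Hypothetical check: True if the clip length falls in the suggested range."""
    return SUGGESTED_MIN_SECONDS <= duration_seconds <= SUGGESTED_MAX_SECONDS
```

For example, `is_suggested_clone_length(wav_duration_seconds("sample.wav"))` returns `True` for a 12-second WAV clip. Compressed formats such as MP3 would need a different decoder to measure their length.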

Sesame voices are in beta and may be unstable during inference (for example, producing long pauses or strange conversational artifacts). We release regular updates that improve their capabilities and performance.
