While you can bring any voice to any of our offered models through voice cloning, we also offer select pre-cloned and pre-trained voices for each model. Voices marked as High Quality are full fine-tunes from professional voice actors, and are higher-quality and more stable. Other voices are zero-shot voice clones; while still high-quality, they may be more likely to produce artifacts in generated audio.

Voice Cloning

Create custom voices from audio samples for unique, branded voice experiences.

There are two ways to clone a voice: zero-shot voice cloning requires 15-20 seconds of a high-quality recording, while professional voice cloning (also called fine-tuning) requires 2-3 hours of conversational audio between two speakers.

Professional voice cloning produces higher-fidelity clones with fewer quirks and artifacts, but zero-shot voice cloning usually produces a good result. Clone quality also depends on the style of the voice you’re cloning: voices that are far out of distribution (that is, voices whose accents or styles the model likely did not encounter during its initial training) are more likely to produce lower-fidelity clones or clones with more artifacts.

Zero-Shot Voice Cloning

You can generate a zero-shot voice clone by uploading a 15-20-second voice recording through our UI.

To do so, go to the Voices tab on the left sidebar, then click the Clone Voice button. Select a gender and voice model, then upload the clip.
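Before uploading, it can help to confirm that a clip falls in the recommended 15-20-second range. The following is a minimal sketch using Python's standard `wave` module. It assumes the clip is an uncompressed WAV file (other formats the UI may accept are not covered), and `clip_duration_seconds` and `check_clip_length` are hypothetical helpers for illustration, not part of our product.

```python
import io
import wave

# Recommended zero-shot clip length from this guide, in seconds.
MIN_SECONDS = 15.0
MAX_SECONDS = 20.0


def clip_duration_seconds(source) -> float:
    """Return the duration of a WAV clip; `source` is a path or binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_clip_length(source, min_s=MIN_SECONDS, max_s=MAX_SECONDS):
    """Return (duration, ok), where ok is True if the clip is within [min_s, max_s]."""
    duration = clip_duration_seconds(source)
    return duration, min_s <= duration <= max_s


if __name__ == "__main__":
    # Build a 16-second silent mono WAV in memory to exercise the check.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)  # 16-bit samples
        out.setframerate(16000)
        out.writeframes(b"\x00\x00" * 16000 * 16)
    buf.seek(0)
    print(check_clip_length(buf))  # (16.0, True)
```

In practice you would pass a file path, e.g. `check_clip_length("my_clip.wav")`, where the file name is a placeholder.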

Voice Cloning Tips

  • Make sure your clip is free of background noise and contains only one speaker.
  • For best results, upload a conversational clip. Don’t over-enunciate or artificially control the speed of speech.
  • Leave in mistakes (stutters, ums and uhs, etc.); these lead to more realistic clones.
  • Improvise a few lines on a topic that’s easy for you to speak about (e.g., your plans for this weekend).
  • Do not read from a script.
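Listening back is the most reliable way to check a clip against these tips, but a rough automated screen for background noise can catch obvious problems. The sketch below is an illustrative heuristic, not a platform feature: it splits a clip into 50 ms windows, takes the mean RMS level of the quietest 10% of windows as a noise-floor estimate, and compares it with the loudest window. It assumes 16-bit mono PCM WAV input; the window size and quantile are arbitrary choices, `estimate_snr_db` is a hypothetical name, and the check cannot detect a second speaker.

```python
import array
import io
import math
import sys
import wave


def estimate_snr_db(source, window_ms=50, quiet_fraction=0.1):
    """Rough signal-to-noise estimate, in dB, for a clip.

    Illustrative heuristic only (not a platform check): the noise floor is the mean
    RMS of the quietest `quiet_fraction` of fixed-size windows, and the signal level
    is the RMS of the loudest window. Assumes 16-bit mono PCM WAV input.
    """
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM WAV")
        rate = wav.getframerate()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Per-window RMS levels; a trailing partial window is ignored.
    win = max(1, int(rate * window_ms / 1000))
    levels = []
    for start in range(0, len(samples) - win + 1, win):
        chunk = samples[start:start + win]
        levels.append(math.sqrt(sum(s * s for s in chunk) / win))
    if not levels:
        raise ValueError("clip is shorter than one analysis window")

    levels.sort()
    signal = levels[-1]
    if signal == 0:
        raise ValueError("clip is silent")
    k = max(1, int(len(levels) * quiet_fraction))
    noise = sum(levels[:k]) / k
    if noise == 0:
        return math.inf  # quiet windows are digital silence: no measurable floor
    return 20 * math.log10(signal / noise)


if __name__ == "__main__":
    # Synthetic 1 s clip at 8 kHz: a quiet half (amplitude 10), then a loud half (1000).
    data = array.array("h", [10] * 4000 + [1000] * 4000)
    if sys.byteorder == "big":
        data.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(8000)
        out.writeframes(data.tobytes())
    buf.seek(0)
    print(round(estimate_snr_db(buf), 1))  # 40.0
```

Higher values suggest a cleaner clip. Treat the number as a prompt to re-listen rather than a pass/fail gate, since what counts as acceptable noise is a judgment call.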

Fine-tuning and Professional Voice Cloning

For more information, see the fine-tuning guide.