Voice Cloning
Create custom voices from audio samples for unique, branded voice experiences.
There are two ways to clone a voice. Zero-shot voice cloning requires only 15-20 seconds of high-quality recording, while professional voice cloning (or fine-tuning) requires 2-3 hours of two speakers engaged in conversation. Professional voice cloning produces higher-fidelity clones with fewer quirks and artifacts, but zero-shot voice cloning can usually produce a good result. Clone quality also depends on the style of the voice you’re trying to clone: voices that are far out-of-distribution (i.e., voices whose accents or styles the model was unlikely to have encountered during its initial training) are more likely to produce lower-fidelity clones, or clones with more artifacts.
Zero-Shot Voice Cloning
You can generate a zero-shot voice clone by uploading a 15-20 second voice recording through our UI. Go to the Voices tab in the left sidebar, then click the Clone Voice button. Select a gender and voice model, then upload the clip.
Voice Cloning Tips
- Make sure your clip is free from background noise and contains only one speaker.
- For best results, upload a conversational clip. Don’t over-enunciate or artificially control the speed of speech.
- Leave in natural mistakes (stutters, ums/uhs, etc.). These lead to more realistic clones.
- For best results, improvise some lines on a topic that’s easy for you to speak about (e.g., your plans for this weekend).
- Do not read from a script.
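The 15-20 second guideline can be checked locally before you upload a clip. The sketch below is not part of the product. It assumes the recording is an uncompressed WAV file and uses only Python's standard `wave` module. The function names and the `MIN_SECONDS` and `MAX_SECONDS` constants are illustrative. It checks length only and cannot tell you anything about background noise or the number of speakers.

```python
import wave

# Recommended length for a zero-shot clip, taken from the guidance above.
MIN_SECONDS = 15.0
MAX_SECONDS = 20.0


def clip_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def is_recommended_length(seconds: float) -> bool:
    """True when the duration falls inside the 15-20 second window."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS
```

For example, `is_recommended_length(clip_duration_seconds("my_clip.wav"))` (with a hypothetical file name) reports whether a recording falls inside the recommended window. Other formats such as MP3 would need a different decoder.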