Can I provide more data to an existing voice?

Voice cloning and fine-tuning.

Yes, you can add more data to an existing voice.

Supplemental data can substantially improve the quality and expressiveness of a synthesized voice. This article explains how to use supplemental data effectively and covers the key considerations in the process.

What is Supplemental Data?

Supplemental data refers to additional audio recordings that are used to enhance an existing voice model. This data typically consists of expressive recordings, which can include a variety of emotions, tones, and speech patterns. These recordings help the Resemble system create more natural and diverse voice responses.

Key Points to Consider:

Well-prepared supplemental data can improve a voice's naturalness, speaker similarity, and expressiveness. Keep the following points in mind when preparing it.

Expressive Recordings:

When collecting supplemental data, it's essential to focus on expressive recordings. These should cover a wide range of emotions, from happiness and excitement to sadness and frustration. Including variations in tone, pitch, and pacing is crucial for creating a versatile voice.

Minimum 45 Minutes of Data:

To meaningfully improve the voice model, we recommend gathering at least 45 minutes of supplemental data. This gives the model a substantial dataset to learn from, resulting in a more expressive and versatile voice.
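Before uploading, you can check whether your recordings reach the 45-minute recommendation. The sketch below is a minimal, hypothetical helper (not part of any Resemble tool) that assumes your supplemental data is stored as WAV files; it uses only Python's standard `wave` module to sum the durations.

```python
import wave
from pathlib import Path

# Recommended minimum amount of supplemental audio, per this article (45 minutes).
MIN_SECONDS = 45 * 60


def wav_duration_seconds(path):
    """Return the duration of a single WAV file in seconds."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def total_duration_seconds(paths):
    """Sum the durations of all the given WAV files."""
    return sum(wav_duration_seconds(p) for p in paths)


def meets_minimum(paths, minimum=MIN_SECONDS):
    """Return (ok, total_seconds), where ok is True if total >= minimum."""
    total = total_duration_seconds(paths)
    return total >= minimum, total


def report(folder):
    """Print a short summary for every *.wav file in a folder (hypothetical layout)."""
    ok, total = meets_minimum(sorted(Path(folder).glob("*.wav")))
    print(f"{total / 60:.1f} minutes of audio; meets 45-minute minimum: {ok}")
```

For example, `report("supplemental_recordings")` prints the total for a folder; the folder name is only an illustration.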

Minimizing Background Noise:

Record in a quiet, controlled environment to keep background noise out of your recordings and preserve the clarity of the supplemental data. Clean, high-quality recordings lead to high-quality voices.
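One rough way to compare how quiet your recordings are is to measure the noise floor, the level of the quietest stretch of audio (typically the pauses between phrases). The sketch below is a hypothetical heuristic, not a Resemble requirement: it assumes 16-bit PCM WAV files, splits the audio into 50 ms windows, and reports the RMS level of the quietest window in dBFS. Lower (more negative) values indicate a quieter background; this article does not specify an acceptable threshold.

```python
import array
import math
import sys
import wave


def noise_floor_dbfs(path, window_ms=50):
    """Estimate the noise floor as the RMS level (dBFS) of the quietest window.

    Assumes 16-bit PCM WAV input. Returns -inf for digital silence.
    """
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV stores samples little-endian

    # Window length in interleaved samples (frames per window * channels).
    window = max(1, rate * window_ms // 1000) * channels
    quietest = None
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(s * s for s in chunk) / window)
        quietest = rms if quietest is None else min(quietest, rms)

    if quietest is None:
        raise ValueError("file is shorter than one analysis window")
    if quietest == 0:
        return float("-inf")
    # 32768 is full scale for signed 16-bit samples.
    return 20 * math.log10(quietest / 32768)
```

Running this on each file and flagging outliers with an unusually high noise floor can help you spot recordings with hum, fan noise, or room echo before you submit them.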

Retraining Time:

Integrating supplemental data and retraining an existing voice model can be time-intensive. Retraining can take up to 48 hours, and sometimes longer, depending on the complexity of the model and the amount of supplemental data used.


Supplemental data is a valuable resource for improving the quality and expressiveness of synthesized voices. By collecting expressive recordings, gathering at least 45 minutes of data, minimizing background noise, and allowing enough time for retraining, you can create more natural and versatile voice models.

For specific instructions on how to integrate supplemental data into your voice training process, please refer to the documentation provided by your voice synthesis platform or service.


You can always reach us at