Creating Clips in Resemble AI API
Best Practices for Building TTS and STS Clips
Introduction:
Creating clips efficiently through Resemble AI's API is key to getting the best output quality. This guide covers best practices to consider when making API calls for your TTS (Text-to-Speech) and STS (Speech-to-Speech) projects.
Making API Calls to Resemble AI:
Clip Size Consideration:
When making API calls for TTS and STS projects, it's essential to consider the clip size for optimal results. The table below lists the JSON body parameters for a clip request, and the guidelines that follow it will help you achieve the best quality output:
JSON Body | Type | Description |
---|---|---|
title | (optional) string | Title for this clip |
body | string | Content to be synthesized. Max size of 3,000 characters excluding SSML tags. |
voice_uuid | string | UUID of the voice to use for synthesizing |
is_public | boolean | Whether this clip should be accessible publicly |
is_archived | boolean | Whether this clip should be archived |
callback_uri | string | The URL to POST the final clip to |
precision | (optional) one of the following values: PCM_32, PCM_24, PCM_16, or MULAW. Default is PCM_32. | The audio bit depth of the generated audio |
sample_rate | (optional) one of the following values: 16000, 22050, or 44100. Default is 22050. | The sample rate (Hz) of the generated audio |
output_format | (optional) one of the following values: wav or mp3. Default is wav. | The format of the generated audio |
include_timestamps | (optional) boolean. Default is false. | Whether to include timestamps for the generated content; see the documentation on timestamps for more information |
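As a rough illustration of the table above, the following Python sketch assembles and validates a clip request body before it is sent. The function names `build_clip_payload` and `visible_length` are hypothetical helpers, not part of any Resemble AI SDK. The `False` defaults for `is_public` and `is_archived`, and the regular expression used to ignore SSML tags when counting characters, are assumptions made for this example. No network call is made here; see the API documentation for the actual endpoint and authentication details.

```python
import re

# Allowed values and defaults as listed in the parameter table.
ALLOWED_PRECISIONS = {"PCM_32", "PCM_24", "PCM_16", "MULAW"}
ALLOWED_SAMPLE_RATES = {16000, 22050, 44100}
ALLOWED_OUTPUT_FORMATS = {"wav", "mp3"}
MAX_BODY_CHARS = 3000


def visible_length(body):
    """Count characters in body, skipping anything shaped like <...>.

    Treating every <...> span as an SSML tag is an assumption of this sketch.
    """
    return len(re.sub(r"<[^>]+>", "", body))


def build_clip_payload(body, voice_uuid, callback_uri, title=None,
                       is_public=False, is_archived=False,
                       precision="PCM_32", sample_rate=22050,
                       output_format="wav", include_timestamps=False):
    """Return a dict matching the JSON body table, or raise ValueError."""
    if visible_length(body) > MAX_BODY_CHARS:
        raise ValueError("body exceeds 3,000 characters (excluding SSML tags)")
    if precision not in ALLOWED_PRECISIONS:
        raise ValueError(f"unsupported precision: {precision}")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    if output_format not in ALLOWED_OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format}")

    payload = {
        "body": body,
        "voice_uuid": voice_uuid,
        "is_public": is_public,      # default of False is an assumption
        "is_archived": is_archived,  # default of False is an assumption
        "callback_uri": callback_uri,
        "precision": precision,
        "sample_rate": sample_rate,
        "output_format": output_format,
        "include_timestamps": include_timestamps,
    }
    if title is not None:  # title is optional, so omit it when not given
        payload["title"] = title
    return payload
```

The returned dictionary can be serialized with `json.dumps` and used as the request body. Validating locally like this catches an oversized body or an unsupported value before a request is spent on it.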
- Body string amounts:
Limit the number of sentences per body string to a maximum of 3. This lets the AI model focus on generating accurate and expressive speech for a smaller set of sentences.
- Number of characters per clip:
Keep each clip within 3,000 characters, excluding SSML tags.
- Number of clips per project:
Keep the number of clips within a project (e.g. one chapter) to a maximum of 150. This helps maintain efficient processing and quality throughout your project.
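The size guidelines above can be sketched as a small pre-processing step that turns a long text into clip-sized body strings. This is a minimal sketch, not an official utility: the helper names are hypothetical, the sentence splitter is a naive regular expression that will misjudge abbreviations such as "e.g.", and the input is assumed to be plain text without SSML tags.

```python
import re

MAX_SENTENCES_PER_BODY = 3   # guideline: at most 3 sentences per body string
MAX_CHARS_PER_CLIP = 3000    # guideline: at most 3,000 characters per clip
MAX_CLIPS_PER_PROJECT = 150  # guideline: at most 150 clips per project


def split_sentences(text):
    """Naively split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def chunk_into_clips(text):
    """Group sentences into body strings that respect both size limits.

    A single sentence longer than MAX_CHARS_PER_CLIP is passed through
    unchanged and would need to be shortened by hand.
    """
    clips, current = [], []
    for sentence in split_sentences(text):
        candidate = " ".join(current + [sentence])
        too_many = len(current) >= MAX_SENTENCES_PER_BODY
        too_long = len(candidate) > MAX_CHARS_PER_CLIP
        if current and (too_many or too_long):
            clips.append(" ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    if current:
        clips.append(" ".join(current))
    return clips


def group_into_projects(clips):
    """Split the clip list into batches of at most MAX_CLIPS_PER_PROJECT."""
    return [clips[i:i + MAX_CLIPS_PER_PROJECT]
            for i in range(0, len(clips), MAX_CLIPS_PER_PROJECT)]
```

For example, `chunk_into_clips("One. Two. Three. Four. Five. Six. Seven.")` yields three body strings, the last holding the single leftover sentence. Each string would then become the `body` of its own clip.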
Clip Hacks for Enunciation:
Sometimes, you may encounter pronunciation challenges with the AI models.
This also applies to localized clips and multilingual models.
Here are some useful tips to work around these issues and enhance the pronunciation of your text:
- Use Punctuation:
Add punctuation marks before or after a word to help guide the model's pronunciation. A comma or period, for instance, can influence the pronunciation of adjacent words.
For example: Grammar doesn't matter here, you can insert, commas, where,ever emphasis is needed.
- Double Letters:
Consider doubling letters, whether vowels or consonants, in a word to achieve specific sounds. For example, changing "Say" to "Saay" can result in a more accurate representation of the long 'A' sound.
- Capitalizing Letters:
Capitalizing letters can create emphasis on those letters in the word. For example, "request" can be spelled as "requeST" for an emphasis on the 'ST' sound at the end.
- Split Paragraphs:
If the model struggles to pronounce all the words in a paragraph, consider breaking the paragraph into separate clips. This can significantly enhance the pronunciation of individual words and phrases.
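Once you have found respellings that work for a voice, one way to apply them consistently is to keep them in a small lookup table and substitute them into the text before it goes into a clip body. The sketch below assumes such a table; `apply_respellings` is a hypothetical helper, and the two entries simply reuse the examples from the tips above. Which respellings actually help depends on the voice and model, so treat the table as something to tune by listening.

```python
import re

# Respellings taken from the examples above; extend after testing by ear.
RESPELLINGS = {
    "say": "Saay",         # doubled vowel for a longer 'A' sound
    "request": "requeST",  # capitals to stress the final 'ST'
}


def apply_respellings(text, respellings=RESPELLINGS):
    """Replace whole words (case-insensitively) with their respelled form."""
    if not respellings:
        return text
    # Longest keys first so a longer entry wins over a shorter one.
    words = sorted(respellings, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    lookup = {w.lower(): r for w, r in respellings.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)
```

Because the pattern matches on word boundaries, a word such as "essay" is left untouched even though it contains "say".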
Getting Further Assistance:
If you encounter persistent issues or have specific challenges that require additional support, please don't hesitate to refer to our API documentation or reach out to our dedicated support team at Support@Resemble.ai. Our team members are ready to assist you and provide guidance to ensure your project's success.
Conclusion:
By following these best practices when making API calls to Resemble AI, you can optimize your projects for exceptional output quality and achieve the desired results for your TTS and STS endeavors.