What are the best practices for creating clips in the Resemble AI API?

Creating Clips in Resemble AI API

Best Practices for Building TTS and STS Clips 

Introduction:
How you create clips in the Resemble AI API has a large effect on output quality. This guide covers the best practices to follow when making API calls for your TTS (Text-to-Speech) and STS (Speech-to-Speech) projects.

Making API Calls to Resemble AI:

Clip Size Consideration:
When making API calls for TTS and STS projects, clip size affects the quality of the output. The fields accepted in the request body are listed first, followed by the size guidelines:

| JSON body field | Type | Description |
| title | (optional) string | Title for this clip |
| body | string | Content to be synthesized. Maximum of 3,000 characters, excluding SSML tags. |
| voice_uuid | string | UUID of the voice to use for synthesizing |
| is_public | boolean | Whether this clip should be publicly accessible |
| is_archived | boolean | Whether this clip should be archived |
| callback_uri | string | The URL to POST the final clip to |
| precision | (optional) one of the following values: PCM_32, PCM_24, PCM_16, or MULAW. Default is PCM_32. | The audio bit depth of the generated audio |
| sample_rate | (optional) one of the following values: 16000, 22050, or 44100. Default is 22050. | The sample rate (Hz) of the generated audio |
| output_format | (optional) one of the following values: wav or mp3. Default is wav. | The format of the generated audio |
| include_timestamps | (optional) boolean. Default is false. | Whether to include timestamps for the generated content; see the documentation on timestamps for more information |

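
As an illustration of the fields above, the following Python sketch assembles a request body and checks it against the documented values and defaults. The helper names (build_clip_payload, body_length_without_ssml) are hypothetical and not part of the Resemble AI API. The defaults chosen for is_public and is_archived, leaving out callback_uri when it is not supplied, and the regular expression used to discount SSML tags are all assumptions made for this example. Sending the request (endpoint and authentication) is intentionally left out; refer to the API documentation for that.

```python
import json
import re

# Allowed values and defaults as listed in the field table.
ALLOWED_PRECISION = {"PCM_32", "PCM_24", "PCM_16", "MULAW"}
ALLOWED_SAMPLE_RATE = {16000, 22050, 44100}
ALLOWED_OUTPUT_FORMAT = {"wav", "mp3"}
MAX_BODY_CHARS = 3000


def body_length_without_ssml(body):
    """Approximate the character count of body with SSML tags removed."""
    # Assumption: anything between '<' and '>' is treated as an SSML tag.
    return len(re.sub(r"<[^>]+>", "", body))


def build_clip_payload(body, voice_uuid, *, title=None, is_public=False,
                       is_archived=False, callback_uri=None,
                       precision="PCM_32", sample_rate=22050,
                       output_format="wav", include_timestamps=False):
    """Build a clip request body (hypothetical helper, not an official client)."""
    if body_length_without_ssml(body) > MAX_BODY_CHARS:
        raise ValueError("body exceeds 3,000 characters (excluding SSML tags)")
    if precision not in ALLOWED_PRECISION:
        raise ValueError(f"unsupported precision: {precision}")
    if sample_rate not in ALLOWED_SAMPLE_RATE:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    if output_format not in ALLOWED_OUTPUT_FORMAT:
        raise ValueError(f"unsupported output_format: {output_format}")

    payload = {
        "body": body,
        "voice_uuid": voice_uuid,
        "is_public": is_public,
        "is_archived": is_archived,
        "precision": precision,
        "sample_rate": sample_rate,
        "output_format": output_format,
        "include_timestamps": include_timestamps,
    }
    # Only include these fields when the caller supplies them (an assumption).
    if title is not None:
        payload["title"] = title
    if callback_uri is not None:
        payload["callback_uri"] = callback_uri
    return payload


# Example usage with a placeholder voice UUID.
payload = build_clip_payload("Hello from the API.", "YOUR_VOICE_UUID", title="Greeting")
print(json.dumps(payload, indent=2))
```

Validating locally before the call makes it easier to spot an unsupported sample rate or an oversized body without spending a request on it.
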
  • Sentences per body string:
    Limit each body string to a maximum of 3 sentences. This lets the model focus on generating accurate and expressive speech for a smaller set of sentences.

  • Number of characters per clip:
    Keep each clip to a maximum of 3,000 characters, excluding SSML tags.

  • Number of clips per project:
    Keep the number of clips within a project to a maximum of 150 (e.g. one chapter per project). This helps maintain efficient processing and quality throughout your project.
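
The size guidelines above can be applied before any API call is made. The Python sketch below is one possible approach, not an official utility. It splits plain text into body strings of at most 3 sentences and 3,000 characters, then groups those bodies into batches of at most 150 clips per project. The function names are hypothetical, and the sentence splitter is a naive assumption (it breaks on ".", "!" or "?" followed by whitespace), so abbreviations and SSML markup would need extra handling.

```python
import re

MAX_SENTENCES_PER_CLIP = 3
MAX_CHARS_PER_CLIP = 3000
MAX_CLIPS_PER_PROJECT = 150


def split_into_clip_bodies(text):
    """Split plain text into body strings that respect the size guidelines."""
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    bodies = []
    current = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        too_many = len(current) == MAX_SENTENCES_PER_CLIP
        too_long = len(candidate) > MAX_CHARS_PER_CLIP
        if current and (too_many or too_long):
            # Close the current body and start a new one with this sentence.
            bodies.append(" ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    if current:
        bodies.append(" ".join(current))
    # Note: a single sentence longer than 3,000 characters is returned as-is
    # and would still need to be shortened by hand.
    return bodies


def group_into_projects(bodies):
    """Group clip bodies into batches of at most 150 clips per project."""
    return [
        bodies[i:i + MAX_CLIPS_PER_PROJECT]
        for i in range(0, len(bodies), MAX_CLIPS_PER_PROJECT)
    ]


# Example usage.
for body in split_into_clip_bodies("One. Two! Three? Four. Five."):
    print(body)
```

Each returned string can then be used as the body of one clip, with one batch per project.
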

Clip Hacks for Enunciation:
Sometimes, you may encounter pronunciation challenges with the AI models.
Note: This also applies to localized clips and multilingual models.

Here are some useful tips to work around these issues and enhance the pronunciation of your text:

  • Use Punctuation:
    Add punctuation marks before or after a word to help guide the model in pronunciation. For example, placing a comma or period can influence the pronunciation of adjacent words.

    • Grammar doesn't matter here, so you can insert commas wherever emphasis is needed, as in this deliberately over-punctuated line: "You can insert, commas, where,ever emphasis is needed."

  • Double Letters:
    Consider doubling letters, whether vowels or consonants, in the words to achieve specific sounds. For example, changing "Say" to "Saay" can result in a more accurate representation of the long 'A' sound.

  • Capitalizing Letters:
    Capitalizing letters can add emphasis to those letters within a word. For example, "request" can be spelled as "requeST" for an emphasis on the 'ST' sound at the end.

  • Split Paragraphs:
    If the model struggles to pronounce all the words in a paragraph, consider breaking the paragraph into separate clips. This can significantly enhance the pronunciation of individual words and phrases.
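
If the same respellings are needed across many clips, they can be applied in code before the text is sent. The Python sketch below is a minimal illustration under that assumption. The RESPELLINGS table and the apply_respellings helper are hypothetical, and the two entries simply reuse the "Saay" and "requeST" examples from the tips above; which respellings actually help will depend on the voice and model.

```python
import re

# Hypothetical respelling table, reusing the examples from the tips above.
RESPELLINGS = {
    "say": "Saay",         # doubled vowel for a longer 'A' sound
    "request": "requeST",  # capitals to stress the final 'ST'
}


def apply_respellings(text, respellings):
    """Replace whole words (case-insensitively) with their respelled form."""
    if not respellings:
        return text
    # \b keeps the match to whole words, so "essay" is not affected by "say".
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in respellings) + r")\b",
        re.IGNORECASE,
    )
    lookup = {word.lower(): spelled for word, spelled in respellings.items()}
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)


# Example usage.
print(apply_respellings("Say it again, then send the request.", RESPELLINGS))
```

Keeping the respellings in one table makes it easy to adjust or remove an entry after listening to the generated audio.
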

Getting Further Assistance:
If you encounter persistent issues or have specific challenges that require additional support, please don't hesitate to refer to our API documentation or reach out to our dedicated support team at Support@Resemble.ai. Our team members are ready to assist you and provide guidance to ensure your project's success.

Conclusion:
By following these best practices when making API calls to Resemble AI, you can optimize your projects for exceptional output quality and achieve the desired results for your TTS and STS endeavors.