Speech recognition and synthesis

Bots that make and accept calls use automatic speech recognition and synthesis:

  • Automatic Speech Recognition (ASR) is the process of translating speech to text.
  • Text-To-Speech (TTS), or speech synthesis, is the process of generating speech from written text.

When creating a phone channel, you can do either of the following:

  • Select one of the ASR/TTS providers supported by Tovie AI.
    You can then customize speech recognition and text-to-speech settings in Tovie Platform: select a model for recognition, a specific voice for speech synthesis, etc.

  • Create a connection using your own account registered with the ASR/TTS provider.

    If you use your own connection, the Tovie AI ASR limit does not apply to you.

Then use the a tag or the $reactions.answer method to generate replies from the script.
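
As a rough illustration of the method form, the sketch below assumes that $reactions.answer accepts the reply text as a string. The $reactions object is normally provided by the platform; the stub defined here is a stand-in so that the snippet can run on its own, and the sentReplies array is a hypothetical name used only for this example.

```javascript
// Stand-in for the platform-provided $reactions object (assumption: in a real
// script the platform supplies it, so you would not define it yourself).
var sentReplies = [];
var $reactions = {
    // Records the reply instead of sending it to the caller.
    answer: function (reply) {
        sentReplies.push(reply);
    }
};

// Inside a script, a reply would be generated with a call like this one.
$reactions.answer("Hello! How can I help you?");
```

In a phone channel, the reply text is what the TTS provider turns into speech.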

Speech synthesis markup

To make the bot’s speech more expressive, you can use speech synthesis markup. Tovie Platform supports Speech Synthesis Markup Language (SSML), which lets you adjust the tone, pronunciation, speed, and volume of the synthesized speech. Learn more about SSML in Speech synthesis markup.
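
To give a feel for what such markup looks like, here is a minimal sketch that builds an SSML string in JavaScript. The speak, prosody, and break elements come from the W3C SSML specification, but which elements and attribute values are honored depends on the TTS provider, so treat the output as an assumption to verify. The escapeXml and buildSsml helpers are hypothetical names introduced for this example.

```javascript
// Escape characters that are special in XML so the reply text
// cannot break the surrounding SSML markup.
function escapeXml(text) {
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Hypothetical helper: wraps the text in <prosody> when a rate or volume
// is given, optionally appends a pause, and puts everything in <speak>.
function buildSsml(text, options) {
    var opts = options || {};
    var attrs = "";
    if (opts.rate) attrs += ' rate="' + escapeXml(opts.rate) + '"';
    if (opts.volume) attrs += ' volume="' + escapeXml(opts.volume) + '"';
    var body = attrs
        ? "<prosody" + attrs + ">" + escapeXml(text) + "</prosody>"
        : escapeXml(text);
    if (opts.pauseAfter) {
        body += '<break time="' + escapeXml(opts.pauseAfter) + '"/>';
    }
    return "<speak>" + body + "</speak>";
}

// Slow, loud speech followed by a half-second pause.
var ssml = buildSsml("Your order is ready", {
    rate: "slow",
    volume: "loud",
    pauseAfter: "500ms"
});
```

The resulting string could then be passed as the reply text, provided the selected TTS provider accepts SSML in that form.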

Changing ASR and TTS settings from the script

The settings configured for the speech recognition and synthesis provider apply to all calls made through the phone channel. However, you can override them for each individual call if necessary: for example, you can switch the recognition language mid-conversation or change the voice in which the bot talks to a specific user.

To control the ASR and TTS settings from the script, use the $dialer built-in service methods:

  • Get the ASR/TTS provider name.
  • Get the current ASR/TTS settings.
  • Override the ASR/TTS settings.
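
The relationship between channel settings and per-call overrides can be pictured as one set of values layered on top of another. The sketch below uses plain objects only and does not call $dialer; the field names (lang, voice, speed) and the effectiveSettings helper are hypothetical. It also assumes that an override replaces only the fields it names, which this page does not confirm, so check the $dialer method reference for the actual behavior and settings format.

```javascript
// Channel-level defaults shared by every call (hypothetical field names).
var channelTtsSettings = { lang: "en-US", voice: "voice-a", speed: 1.0 };

// Hypothetical helper: returns the settings in effect for a single call.
// Per-call values win; anything not overridden falls back to the channel value.
function effectiveSettings(channelDefaults, callOverrides) {
    var result = {};
    Object.keys(channelDefaults).forEach(function (key) {
        result[key] = channelDefaults[key];
    });
    Object.keys(callOverrides || {}).forEach(function (key) {
        result[key] = callOverrides[key];
    });
    return result;
}

// Change only the voice for this call; language and speed stay as configured.
var forThisCall = effectiveSettings(channelTtsSettings, { voice: "voice-b" });
```

Because the helper copies the defaults instead of modifying them, the channel settings stay the same for all other calls, which mirrors the per-call scope described above.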