Time is money, and innovation can save you both. Businesses around the globe deploy interactive voice response (IVR) systems to create good customer experiences and automate business processes.
In this article, we talk about Twilio, one of the most popular interactive voice response platforms. Twilio combines automated phone calls with voice generation directly from text, allowing you to establish a comprehensive IVR service.
We also take a look at the language Twilio uses for creating services, explore the most commonly used commands in this language, and show how to generate outgoing calls and use the text to speech functionality.
An interactive voice response (IVR) system is an automated phone system technology that allows callers to access information via a voice response system without speaking to an agent. Such systems automatically initiate outgoing calls and receive incoming calls.
Reporting systems with IVR technologies can notify customers about events with an SMS or a call. Therefore, they’re a must-have for businesses that aim to provide a great customer experience and automate business processes.
While generating SMS notifications is simple, generating voice messages is a bit more challenging. There are two common ways to do so:
- Record audio files for all possible scenarios beforehand. This approach is technologically complicated. It can also be extremely costly for multinational companies that offer support in several languages. On top of that, even standard phrases may contain variables such as forms of address, names, dates, and numbers, which can’t all be recorded in advance.
- Create voice messages in real time with a speech generation service. Such services allow you to generate voice messages from written text and send them to a call center or directly to a customer. The performance of some speech generation services may not be up to your project’s demands, however, so choosing the right speech generation tool is essential.
In this article, we concentrate on Twilio. Its most significant advantages are a vast number of supported languages and accents, clear and helpful documentation, and a large community. In one of our projects, we used Twilio for developing an AI text to speech messaging feature and sending SMS texts.
Twilio is a cloud communication platform that provides an API you can use to build your own IVR system, make and receive phone calls and SMS messages, and perform other operations.
Working with Twilio is straightforward and intuitive, especially since the platform offers step-by-step tutorials.
First, you need to create a Twilio account and pick a phone number. Twilio provides one free phone number per account, even for trial accounts. However, if you need to purchase additional phone numbers for your project, you’ll have to upgrade your account. You can select any available number from the list offered in Twilio, but note that the phone number selection for trial projects may be limited.
To determine which project an API request is coming from, Twilio uses the following credentials:
- Account SID as the username
- Auth Token as a password that can be changed if needed
Twilio offers IVR developers a variety of services including programmable SMS messages, voice messages, videos, chats, and faxes along with APIs for sending messages via email and WhatsApp. In this article, we concentrate on calls and text messaging with the Twilio API, so let’s take a closer look at the relevant features.
Sending SMS messages
The Twilio platform allows you to send SMS messages to one or multiple phone numbers. You can also add media files to messages and Twilio will automatically send them in the MMS format. The price differs depending on the type of message (SMS or MMS), the number to which a message is sent (toll-free, short code, local number, etc.), and the total number of messages sent.
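As a rough illustration, here is how such requests could be prepared in Python with only the standard library. This is a sketch under assumptions: we call the Messages resource of Twilio's REST API directly instead of using a helper library, and the Account SID, Auth Token, phone numbers, and media URL are placeholders. Twilio sends one message per request, so several recipients mean several requests.

```python
import base64
import urllib.parse
import urllib.request

MESSAGES_URL = "https://api.twilio.com/2010-04-01/Accounts/{sid}/Messages.json"


def build_sms_requests(account_sid, auth_token, sender, recipients, body, media_urls=()):
    """Build (without sending) one Messages request per recipient."""
    # The Account SID and Auth Token act as HTTP Basic username and password.
    credentials = base64.b64encode(f"{account_sid}:{auth_token}".encode()).decode()
    prepared = []
    for recipient in recipients:
        fields = [("To", recipient), ("From", sender), ("Body", body)]
        # Each MediaUrl field attaches one media file, which turns the SMS into an MMS.
        fields += [("MediaUrl", url) for url in media_urls]
        prepared.append(urllib.request.Request(
            MESSAGES_URL.format(sid=account_sid),
            data=urllib.parse.urlencode(fields).encode(),
            headers={"Authorization": f"Basic {credentials}"},
            method="POST",
        ))
    return prepared


sms_requests = build_sms_requests(
    "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",  # placeholder Account SID
    "your_auth_token",                     # placeholder Auth Token
    sender="+15017122661",
    recipients=["+15558675310", "+15558675311"],
    body="Your order has shipped.",
    media_urls=["https://example.com/receipt.png"],
)
# Passing each request to urllib.request.urlopen() would actually send it.
```

Nothing is sent until a request is passed to `urllib.request.urlopen()`, which makes the sketch safe to run with placeholder credentials.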
Receiving SMS messages
You can automatically receive SMS messages for any number in your Twilio account using a webhook. To do this, implement an HTTP API endpoint on your website to receive and process SMS data (sender, contents, etc.). You can connect one or several phone numbers to this HTTP API. The endpoint is configured to receive information about all messages sent to the numbers you’ve assigned. Once Twilio receives an SMS, it will forward it to your website.
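The processing step inside such an endpoint could look roughly like the sketch below. We assume Twilio delivers the message as a form-encoded POST body with `From`, `To`, and `Body` fields, and that replying with a `Message` element in TwiML sends a text back to the sender. The function names are our own, and the web framework that would call them is left out.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import escape


def parse_incoming_sms(raw_body: str) -> dict:
    """Extract the fields we care about from the form-encoded webhook body."""
    form = parse_qs(raw_body)
    return {
        "sender": form.get("From", [""])[0],
        "recipient": form.get("To", [""])[0],
        "text": form.get("Body", [""])[0],
    }


def reply_twiml(text: str) -> str:
    """Return TwiML that answers the sender with a text message."""
    return f"<Response><Message>{escape(text)}</Message></Response>"


# Example body as it might arrive from Twilio (values are placeholders).
sms = parse_incoming_sms("From=%2B15558675310&To=%2B15017122661&Body=Where+is+my+order%3F")
print(sms["sender"], sms["text"])
print(reply_twiml(f"We got your message: {sms['text']}"))
```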
Twilio allows you to make outgoing phone calls and play audio files or specify text to synthesize. You can also use it to learn what digits were pressed during a call.
Here’s the code that initializes an outgoing call:
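A minimal sketch in Python, assuming we call the Calls resource of Twilio's REST API directly with the standard library rather than through a helper library. The credentials, phone numbers, and the URL that serves the call instructions are placeholders.

```python
import base64
import urllib.parse
import urllib.request


def build_call_request(account_sid, auth_token, to, from_, instructions_url):
    """Build (without sending) a request that asks Twilio to place a call."""
    fields = {
        "To": to,
        "From": from_,
        # Twilio fetches the call instructions from this URL once the call connects.
        "Url": instructions_url,
    }
    credentials = base64.b64encode(f"{account_sid}:{auth_token}".encode()).decode()
    return urllib.request.Request(
        f"https://api.twilio.com/2010-04-01/Accounts/{account_sid}/Calls.json",
        data=urllib.parse.urlencode(fields).encode(),
        headers={"Authorization": f"Basic {credentials}"},
        method="POST",
    )


call_request = build_call_request(
    "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",  # placeholder Account SID
    "your_auth_token",                     # placeholder Auth Token
    to="+15558675310",
    from_="+15017122661",
    instructions_url="https://example.com/voice.xml",
)
# urllib.request.urlopen(call_request) would actually start the call.
```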
The mechanism behind receiving incoming calls works the same way as the mechanism for receiving SMS messages. Twilio informs you about call initiation through an HTTP endpoint on your website. You can either decline a call or accept it and play a voice message generated from a specified text.
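The decision between declining and answering can be sketched as a function that turns the webhook body into call instructions. This is an assumption-laden example: the deny list is hypothetical, we assume the caller's number arrives in the `From` form field, and the instructions are built as XML with Python's standard library.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs

BLOCKED_CALLERS = {"+15550000000"}  # hypothetical deny list


def answer_incoming_call(raw_body: str) -> str:
    """Return instructions that either decline the call or greet the caller."""
    caller = parse_qs(raw_body).get("From", [""])[0]
    response = ET.Element("Response")
    if caller in BLOCKED_CALLERS:
        # Decline the call without answering it.
        ET.SubElement(response, "Reject")
    else:
        # Accept the call and read a message generated from text.
        ET.SubElement(response, "Say").text = "Thank you for calling. Please hold."
    return ET.tostring(response, encoding="unicode")


print(answer_incoming_call("From=%2B15550000000"))
print(answer_incoming_call("From=%2B15558675310"))
```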
Beyond these features, Twilio has its own markup language for IVR application development. We’ll take a look at it in the next section.
The Twilio Markup Language (TwiML) is an XML-based markup language that specifies instructions and the order in which Twilio will perform them during a call or when sending an SMS. You can set instructions right from the code or in an XML file.
Here’s an example of code that initializes a call and instructs Twilio to say “Hello!” to the call recipient:
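One way to sketch this in Python is to build the TwiML with the standard library and pass it inline when creating the call. We assume here that the Calls resource accepts the instructions in a `Twiml` form field; the phone numbers are placeholders, and the authenticated POST itself is omitted.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Build the instructions: a Response that contains a single Say command.
response = ET.Element("Response")
ET.SubElement(response, "Say").text = "Hello!"
twiml = ET.tostring(response, encoding="unicode")
print(twiml)  # <Response><Say>Hello!</Say></Response>

# Form fields for a POST to the Calls resource, with the instructions passed inline.
call_fields = urlencode({
    "To": "+15558675310",    # placeholder recipient
    "From": "+15017122661",  # placeholder Twilio number
    "Twiml": twiml,
})
```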
You can use different combinations of TwiML commands to create voice applications. The most commonly used commands are:
- Say — read the text to the caller
- Play — play an audio file
- Dial — add another party to the call
- Record — record the caller’s voice
- Gather — collect digits the caller types on the keypad during the call
The following commands help developers handle the flow of a call:
- Hangup — end the call
- Enqueue — add a caller to the queue of callers
- Leave — remove a caller from the queue
- Pause — wait before executing more instructions
- Redirect — redirect the flow to a different TwiML file
- Reject — decline an incoming call
Twilio also offers libraries that help you automatically create valid TwiML commands for specific needs. Another way of working with TwiML is using TwiML Bins, a serverless solution that helps you quickly prototype an application or even run it in production directly from Twilio’s servers without writing any code.
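To show how these commands combine, here is a sketch of a two-step phone menu generated with Python's standard XML module. The `/menu` and `/menu-choice` paths and the agent's number are hypothetical. We assume that Gather posts the pressed key to its `action` URL in a `Digits` field and that, when nothing is pressed, Twilio continues with the commands that follow Gather.

```python
import xml.etree.ElementTree as ET


def build_menu_twiml() -> str:
    """First step: read the menu and wait for a single key press."""
    response = ET.Element("Response")
    gather = ET.SubElement(
        response, "Gather",
        {"numDigits": "1", "action": "/menu-choice", "method": "POST"},
    )
    ET.SubElement(gather, "Say").text = "Press 1 for order status. Press 2 to talk to an agent."
    # The commands below run only if the caller presses nothing.
    ET.SubElement(response, "Pause", {"length": "1"})
    ET.SubElement(response, "Say").text = "We did not receive any input. Goodbye."
    ET.SubElement(response, "Hangup")
    return ET.tostring(response, encoding="unicode")


def build_choice_twiml(digits: str) -> str:
    """Second step: react to the key the caller pressed."""
    response = ET.Element("Response")
    if digits == "1":
        ET.SubElement(response, "Say").text = "Your order is on its way."
    elif digits == "2":
        ET.SubElement(response, "Dial").text = "+15551230000"  # hypothetical agent number
    else:
        # Unknown key: send the caller back to the menu.
        ET.SubElement(response, "Redirect").text = "/menu"
    return ET.tostring(response, encoding="unicode")


print(build_menu_twiml())
print(build_choice_twiml("2"))
```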
Now that you’ve learned some basics of working with Twilio, let’s move to something more interesting — text to speech functionality.
Text to speech (TTS), also known as speech synthesis, is the process of transforming text into audio.
TTS is popular among developers and businesses building IVR solutions and other voice applications because it shortens the time needed to create voice messages: there’s no need to record audio files with voice actors. Instead of playing recorded files, TTS dynamically generates voice messages directly from text.
Automatic speech recognition (ASR), or speech to text, is also possible with Twilio. The Gather command captures keystrokes and voice responses of call recipients. Once a call recipient finishes talking, the service waits for a few seconds of silence, then sends the recognized text to your HTTP endpoint so you can process the customer’s message.
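As a rough illustration, the sketch below builds a Gather that accepts both keypad and speech input and then reads what Twilio posts back. We assume the result arrives in the `Digits`, `SpeechResult`, and `Confidence` form fields; the `/answer` path and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs


def ask_question_twiml() -> str:
    """Ask an open question and accept either key presses or speech."""
    response = ET.Element("Response")
    gather = ET.SubElement(response, "Gather", {
        "input": "dtmf speech",
        "action": "/answer",      # hypothetical endpoint that receives the result
        "speechTimeout": "auto",  # let Twilio detect when the caller stops talking
        "language": "en-US",
    })
    ET.SubElement(gather, "Say").text = "How can we help you today?"
    return ET.tostring(response, encoding="unicode")


def read_answer(raw_body: str) -> dict:
    """Pull the caller's answer out of the form-encoded webhook body."""
    form = parse_qs(raw_body)
    return {
        "digits": form.get("Digits", [None])[0],
        "speech": form.get("SpeechResult", [None])[0],
        "confidence": float(form.get("Confidence", ["0"])[0]),
    }


print(ask_question_twiml())
print(read_answer("SpeechResult=Track+my+order&Confidence=0.92"))
```

Checking the confidence value before acting on the recognized text lets the service ask the caller to repeat an answer it didn't understand well.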
Currently, Twilio can generate speech with a male or female voice in five languages including English (with both British and American accents). It can also synthesize speech for another 18 languages and 14 regional accents with a female voice.
Twilio can be integrated with Amazon Polly, an AWS service for synthesizing speech from text that offers dozens of lifelike voices. Each Amazon Polly voice supports a particular language and locale. You can also configure pronunciation, speed, tone, and other settings for Polly’s voices.
The Twilio platform allows you to configure the default language and provider (Twilio or Amazon Polly) as well as specify a language right in the program code.
Here’s an example of code for making a call with speech synthesis:
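The sketch below shows one way to prepare the spoken part of such a call in Python. It assumes the voice and language are chosen with the `voice` and `language` attributes of Say and that Amazon Polly voices are referenced with a `Polly.` prefix, such as `Polly.Joanna` and `Polly.Vicki`. The resulting TwiML can be served from your endpoint or passed along when the call is created.

```python
import xml.etree.ElementTree as ET


def say_twiml(text: str, voice: str, language: str) -> str:
    """Build TwiML that reads `text` aloud with the given voice and language."""
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say", {"voice": voice, "language": language})
    say.text = text
    return ET.tostring(response, encoding="unicode")


english = say_twiml("Your train departs at 9:40.", "Polly.Joanna", "en-US")
german = say_twiml("Ihr Zug fährt um 9:40 Uhr ab.", "Polly.Vicki", "de-DE")
print(english)
print(german)
```

Keeping the voice and language as parameters makes it easy to serve the same message template to customers in different locales.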
Although both text to speech generation and IVR system development with Twilio are straightforward, there are several nuances you should consider. In the next section, we highlight two crucial things you should pay attention to when working with such solutions.
No matter what IVR development platform you work with, you'll want to keep in mind a few key principles.
For one of our recent projects, we built AI text to speech messaging functionality for a transportation corporation’s customer service department. This feature allowed our client to automate a significant portion of routine processes carried out by customer service representatives and decreased their workload by 30%.
In our experience, there are two essential things to keep in mind when working on IVR services:
- Log all data. Consider logging all information about calls, requests, and answers given. This is important for monitoring service quality, conducting statistical analysis, and troubleshooting.
- Balance system load. An IVR is a high-load service that requires frequent access to databases. It’s a good idea to have a separate synchronized database only for your IVR service. Consider placing the service in the cloud to automatically scale computing power when the load increases. If you can’t move your IVR service to the cloud, consider using several dedicated servers for this service only.
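For the logging advice, a minimal sketch could write one structured record per call event so that calls are easy to search and aggregate later. The logger name, event names, and fields below are our own assumptions rather than a required schema.

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("ivr")


def log_call_event(call_id: str, event: str, **details) -> str:
    """Write one JSON line describing a step in a call and return it."""
    record = {"ts": time.time(), "call_id": call_id, "event": event, **details}
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line


# Example: record the question asked and the answer given during one call.
log_call_event("call-001", "prompt", text="Press 1 for order status.")
log_call_event("call-001", "answer", digits="1")
```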
The Twilio platform is a great option for speech synthesis and recognition. Using Twilio, you can create a comprehensive IVR service to improve the quality and efficiency of your customer service. Twilio provides a vast selection of integration options, a transparent development process, and many additional features.
At Apriorit, we have dedicated teams of cloud computing and web development specialists who are ready to help you build a dream solution from scratch or enhance an existing one. Contact us to start discussing your project right away.