The Future of Text-to-Speech and What It Means for Voice-Over


We’re closing our month-long focus on text-to-speech by looking ahead at where TTS is going. It’s already everywhere: the new assistants on Amazon Echo and Google Home, for example, are very sophisticated TTS voices coupled with voice recognition. Because TTS is a cost-effective and rapid option for VO production, it will come to dominate various VO applications, especially those that require localization.

In today’s post, we’ll look at the future of TTS, and the three voiceover applications that will benefit the most – and the most quickly – from this technology.

[Average read time: 3 minutes]

TTS will dominate audio applications that disseminate information

TTS is revolutionary specifically because it can turn text into voice-over audio almost instantly. For digital applications, this means that user interfaces formerly limited to text interactions can now support audio interactions by coupling TTS with voice recognition. Siri, the Google Assistant and Amazon’s Alexa let users interact with their devices this way. Most phone-based customer service interfaces do the same, though in a more rudimentary way.


Over the next two years, human-sounding text-to-speech services will also start to dominate the following voiceover recording applications.

1. Accessibility

There are already several software suites designed to convert on-screen messages to voice audio. Windows and macOS both support this technology to make their operating systems accessible to blind users. But TTS is also a boon to documentation accessibility. Think of long health care plans, instruction manuals, user agreements, or even warranties. TTS allows companies, or even users themselves, to quickly convert all of this content into accurate voiceover audio.

We probably won’t see TTS in audio description for movies and TV shows any time soon, since that work still calls for a human voice. For everything else, expect to hear TTS.

2. e-Learning and skills training applications

Adobe Captivate already provides TTS voices in English and a couple of other languages. Microsoft already uses TTS voices for its how-to video tutorials. You can see one of the Spanish-language videos here; it includes a note explaining that Microsoft switched to TTS because it lets the company release its localized videos more quickly. More and more e-Learning developers are turning to TTS voices for their content, especially as a more user-friendly but equally low-cost alternative to subtitles.

TTS is a natural fit for e-Learning, skills training, and any other VO application that disseminates information – it’s cost-effective, it can be produced incredibly quickly, and the recordings are accurate.

3. Online news clips

e-Learning is not the only form of information dissemination that benefits from rapid VO in multiple languages. There’s also the news, which produces vast amounts of video content that a localization services provider and a voice recording studio must process quickly, often at inconvenient times of the day. Need Chinese voice localization at 3 a.m. for a breaking news video? With Chinese text-to-speech, this is entirely feasible. In fact, the BBC is already doing exactly this, as you can see in their video here.

There are many more applications, of course. Think of airport, subway or other public announcements, or of automated messaging systems from doctors’ offices, stores, or even government agencies. Any person or institution that needs to relay new or unique information quickly, whether news, traffic updates, or appointment cancellations, can now use TTS voice-over.

By the way – quality will improve dramatically too

The improvements in TTS quality over the last decade have been extraordinary, and you can expect the upgrades to continue. TTS voices are in development around the world, and new languages are being added every month. Expect the next generation to sound fuller, have more seamless vocal transitions, and interpret content better. Because of this, we expect TTS VO to become commonplace, and possibly even dominant, as the source of voiceover in e-Learning and corporate multimedia localization before the decade is out.
