Text-to-speech is now widely used – most notably in the recent boom in smart speakers, which rely on TTS voices like Alexa and Google Assistant. But there’s another field in which TTS voices are about to become just as important – audio accessibility voice-over for documentation. Because multimedia localization is closely tied to accessibility services, it’s crucial for language service providers and post-production professionals to understand why TTS will come to dominate this particular field in 2018 – and how to be ready for these projects.
This post will list the five reasons TTS will become the standard for document accessibility audio in 2018.
[Average read time: 3 minutes]
The next generation of technology, in particular artificial intelligence, will depend on TTS voices successfully replicating human speech. This is the main reason that TTS development is so strong at the moment. But document accessibility audio – which makes legal, health, governmental and other kinds of texts accessible to the blind and sight-impaired – is where TTS will establish itself as the standard for VO in the next few months, in many cases replacing human voice-over entirely.
Here are the five reasons for this shift.
We say this every time we write about TTS, so we won’t belabor the point. Major developments in voice quality happen every 2-3 months because there are so many developers working on the technology, from multinationals like Google, Amazon, Microsoft and Apple, to smaller cutting-edge developers around the world. And the pace of improvement will quicken this year, particularly with advances in artificial intelligence, audio stitching and big data.
More content is being made accessible in the US because of the continued implementation of the Americans with Disabilities Act (ADA). And new technologies are enabling much of this implementation and making it more cost-effective. For example, stand-alone text caption formats like SRT and DFXP made the implementation of captioning and subtitling online relatively quick and cost-effective, which in turn drove the demand for these services. The same holds for documents – the amount of content receiving accessibility audio will only increase, in part because TTS VO makes the service so quick and cost-effective.
Locales around the world are implementing accessibility requirements as well – in fact, many countries have legislation akin to the ADA already in place. Locales that have good TTS options available in their language will adopt it for accessibility relatively quickly, and that’ll in turn push language-specific TTS development.
Sales of smart speakers were robust this past holiday season, especially in the US – and will continue to grow rapidly in 2018, making these products the fastest-growing consumer tech of all time, according to Canalys. While the US still leads this adoption, demand is growing in the rest of the world. And that will mean better TTS voices for foreign-language locales, since they’re critical for smart speaker functionality.
It’s also worth noting that TTS voice development in some languages should be less labor-intensive than it is for English – for example, in languages that are written phonetically and have a smaller number of vowel sounds, like Spanish. In fact, text-to-speech should come to dominate Spanish accessibility voice-over relatively soon as well, especially since the language has a large number of speakers in the US.
Much of the content being made accessible through voice audio is critical for its users. Consider a health plan guide – plan members need this information to understand their health benefits, pick doctors, and find services. But these documents often run to 100,000 words or more, which can take a human voice-over talent weeks to record. During that time, a significant portion of the population has reduced access to this critical information. While the cost benefits of recording with TTS are certainly attractive, for accessibility it’s the drastically shorter timelines that make all the difference.
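The scale here is easy to quantify with a back-of-the-envelope estimate. A minimal sketch, assuming a typical narration pace of roughly 150 words per minute (an assumed industry-average figure, not one stated in this post):

```python
# Rough estimate of finished audio length for a long accessibility document.
# The 150 words-per-minute narration pace is an assumption, not a fixed standard.
WORDS = 100_000        # document length from the health plan guide example
PACE_WPM = 150         # assumed average narration pace

minutes_of_audio = WORDS / PACE_WPM
hours_of_audio = minutes_of_audio / 60

print(f"~{minutes_of_audio:.0f} minutes (~{hours_of_audio:.1f} hours) of finished audio")
# → ~667 minutes (~11.1 hours) of finished audio
```

Eleven-plus hours of finished audio translates into weeks of studio time for a human talent once retakes, editing and pickups are factored in – while a TTS engine can render the same script in a fraction of that time.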
Finally, the reason that TTS will become the standard for document audio is that it’s already the standard for many accessibility applications. In fact, text-to-speech systems, along with synthetic human-sounding voices, were absolutely crucial to making computers accessible to the blind and sight-impaired. “Reader” software has existed since the release of the first Macintosh computer in 1984. And today, almost all devices include a TTS accessibility function. Want to see one? Go to the Accessibility settings on your iPhone or Android device and you’ll see a selection of TTS voices ready for use.
The audience served by document accessibility audio is already familiar with these voices, and with TTS in general. They understand its quirks and shortcomings. And they expect to hear it.
As a post-production or multimedia localization professional, there are a few things you can do to prepare. First, keep in mind that TTS is already a widely used accessibility tool – that’ll help you pitch it to clients who aren’t familiar with the technology and may default to human voice-overs. Second, remember that not all languages have the same TTS support, so check your project’s language set against existing voice fonts. And third, don’t forget that even though TTS turnarounds are much shorter, the productions still take some time. Audio script formatting and pronunciation guideline creation can be particularly labor-intensive (and can lead to project delays if not done correctly for TTS), and the audio still requires a quality assurance review. Make sure to allow enough time to “record” the audio correctly, and to check the files as you would with any other voice audio. As with human recordings, a thorough QA performed by a native speaker – which JBI provides on all productions – is the only way to ensure the quality and accuracy of any multimedia localization and accessibility project.
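Pronunciation guidelines are often applied to a script programmatically before synthesis. A minimal sketch of the idea, assuming an engine that accepts SSML substitution tags (the lexicon entries below are hypothetical examples, and real TTS engines vary in which SSML markup they support):

```python
import re

# Illustrative pronunciation lexicon: terms a TTS voice might mispronounce,
# mapped to SSML <sub> substitutions. These entries are hypothetical examples.
LEXICON = {
    "HMO": '<sub alias="H M O">HMO</sub>',
    "copay": '<sub alias="co-pay">copay</sub>',
}

def apply_lexicon(script_text: str) -> str:
    """Wrap known problem terms in SSML substitution tags before synthesis."""
    for term, ssml in LEXICON.items():
        # \b word boundaries keep us from altering longer words containing the term
        script_text = re.sub(rf"\b{re.escape(term)}\b", ssml, script_text)
    return script_text

print(apply_lexicon("Your HMO copay is listed in section 2."))
```

Building the lexicon itself – deciding which acronyms, drug names or plan terms need guidance – is the labor-intensive part; the substitution step is the easy one to automate.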