3 Pronunciation Musts for Text-to-Speech Voice-Over Recordings

Text-to-speech (TTS) is everywhere, and you’re probably using it every day. Siri, Google Now and Cortana, or the voices you hear in GPS navigation apps, customer service phone menus, or even the messages at the train station – they’re all text-to-speech voice fonts. Better yet, they’re an exciting technology for corporate and e-Learning voice-over because production with them is so cost-effective and rapid. But they do present a few challenges, one of which is pronunciation.

This post will provide the three rules you must follow to ensure proper text-to-speech voice-over pronunciation, with a video example for each one.

[Average reading & viewing time: 4 minutes]

TTS pronunciation is now sophisticated

Yes, TTS is a cost-effective alternative to human voice-over for high-volume, informational content. A good example is making health care program guidelines accessible to the visually impaired. These documents are generally hundreds of pages long, so recording them with a human talent is both pricey and incredibly time-consuming. A recording that could take months with a human talent can be created with a TTS font in a matter of days.

However, the main reason that TTS is now useful for English and foreign-language voice-over recordings is that the technology itself has advanced to the point where the voices sound better (more mellifluous, less choppy, and more natural-sounding) and no longer make silly mistakes, like not being able to distinguish between the present and past tenses of the word “read.” They also now do things like raise their intonation when there’s a question mark, or increase the overall intensity of sentences that end in exclamation points.

They’re still not perfect, though, and TTS productions should stick to the following three rules to avoid the most common pronunciation issues.

1. Avoid foreign-language words in your text

This isn’t always possible, of course. Corporate and e-Learning content contains brands, names of people and places, and even loan words from another language. When translating, any proper names automatically become foreign-language words. In the following example, you’ll see how the TTS voice font has problems pronouncing a well-known last name.

This becomes even more of an issue when localizing into languages that don’t use the Latin alphabet, though some TTS fonts are adept at pronouncing English-language words. Mandarin and Cantonese Chinese text-to-speech voice fonts, for example, handle them surprisingly well.

2. Write out acronyms, or spell them with periods

TTS fonts “know” the most basic abbreviations, like “i.e.” and “e.g.,” but anything beyond that will be a problem. Even an acronym as common as “CEO” can be an issue, as you can see in the following video.

In general, avoid acronyms, but if you must use them, make sure to add periods after each initial for best pronunciation. Also, it’s good to note that acronyms in other languages that use the Latin alphabet will be pronounced in that language, never in English. Acronyms left in Latin script in translations into languages with other writing systems (like Russian, Arabic, Chinese, Japanese and Korean) will always be pronounced in English, with varying degrees of success. For example, Chinese TTS has better results with these kinds of terms, whereas for Japanese TTS recordings it’s best to transliterate whenever possible. All of this is also true of human voice-over talents; the difference is that their knowledge of English varies from one person to another.
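The “periods after each initial” advice can be applied to a script automatically. The following is only a minimal sketch: the function name dot_acronyms is hypothetical, and it assumes that any standalone run of two or more capital letters is an acronym, which will also catch words typed in all caps. The output should still be auditioned with the actual voice font.

```python
import re

# Assumption: a standalone run of 2+ capital letters is an acronym.
# The optional trailing period is consumed so "CEO." does not become "C.E.O..".
ACRONYM = re.compile(r"\b([A-Z]{2,})\b\.?")


def dot_acronyms(text: str) -> str:
    """Put a period after each initial of every all-caps acronym in text."""
    return ACRONYM.sub(lambda m: ".".join(m.group(1)) + ".", text)


if __name__ == "__main__":
    print(dot_acronyms("Ask the CEO about it"))
```

Because all-caps words that are not acronyms would be dotted too, it is safer to review the changes than to apply them blindly.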

3. Be prepared to use phonetic spellings

When you must have a term, name, or word in your recording, and the TTS voice can’t say it, there’s only one solution: phonetic spellings. They effectively “trick” the TTS voice into saying the word you intend. You can see an example in the following video:

Note that these aren’t traditional phonetic spellings, like you’d see in a dictionary. TTS voice fonts can’t process those, so phonetic spellings have to be approximated using words or syllables. This requires some guesswork. Moreover, it’s difficult to predict how a voice font will say a particular phonetic spelling, so it’s crucial to test them out with a TTS voice generator before committing to audio. This, of course, is a service that JBI Studios provides.
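One way to manage approximated spellings is to keep them in a lookup table and apply them to a copy of the script, so the on-screen text keeps the real spelling. This is a sketch under stated assumptions: the table entries below are illustrative guesses, not tested pronunciations, and the names RESPELLINGS and apply_respellings are hypothetical. Each entry would need to be tried with the chosen voice font, as described above.

```python
import re

# Illustrative guesses only; every respelling must be auditioned
# with the actual TTS voice before it is trusted.
RESPELLINGS = {
    "Nguyen": "win",
    "Siobhan": "shiv-awn",
}


def apply_respellings(script: str, respellings: dict[str, str]) -> str:
    """Return a copy of script with whole-word matches replaced by respellings."""
    if not respellings:
        return script
    # Longest terms first so a short entry never shadows a longer one.
    terms = sorted(respellings, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    return pattern.sub(lambda m: respellings[m.group(1)], script)


if __name__ == "__main__":
    print(apply_respellings("Ms. Nguyen met Siobhan.", RESPELLINGS))
```

Matching is case-sensitive and whole-word only, so the table does not accidentally rewrite parts of other words.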

Scripts have to be perfect t?o avoid TTS problems

Did you understand the above headline despite the typo? Human brains compensate for communication glitches like that one all the time. A TTS voice engine, on the other hand, wouldn’t be able to make it through the above phrase. Therefore, TTS scripts have to be free of typos, additional line breaks, extra spaces, double periods, or anything else that will trip up a voice font. That said, since TTS voice recordings allow for very short project turnarounds, it’s not out of the question to add a pick-up round to deal with a few straggler TTS pronunciation issues, and of course, a TTS voice pick-up is much more cost-effective than one with a human voice talent. Ultimately, that’s the beauty of TTS: it gives translation and localization producers much more flexibility when preparing budgets and timelines.
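Some of the mechanical problems listed above (extra spaces, double periods, additional line breaks, and stray punctuation inside a word, like the one in the headline) can be flagged before a script goes to the voice engine. The sketch below is an assumption-laden starting point, not a complete proofreader: the function name find_script_problems and the specific patterns are hypothetical, and real typos still need a spell-check and a human read-through.

```python
import re

# Each check is a (label, pattern) pair applied to one line at a time.
CHECKS = [
    ("extra spaces", re.compile(r"\S {2,}\S")),
    # Exactly two periods; a three-dot ellipsis is not flagged.
    ("double period", re.compile(r"(?<!\.)\.\.(?!\.)")),
    ("punctuation inside a word", re.compile(r"[A-Za-z][?!;:][A-Za-z]")),
]


def find_script_problems(script: str) -> list[tuple[int, str]]:
    """Return (line number, problem label) pairs for a TTS script."""
    problems = []
    blank_run = 0
    for number, line in enumerate(script.splitlines(), start=1):
        if not line.strip():
            blank_run += 1
            if blank_run == 2:  # report each run of blank lines once
                problems.append((number, "additional line break"))
            continue
        blank_run = 0
        for label, pattern in CHECKS:
            if pattern.search(line):
                problems.append((number, label))
    return problems


if __name__ == "__main__":
    for number, label in find_script_problems("Scripts have to be perfect t?o avoid TTS problems"):
        print(f"line {number}: {label}")
```

A single blank line between paragraphs is allowed; only two or more in a row are reported.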

 
