More than one in five Americans speaks a language other than English, with Spanish the most popular second language. In Canada, English and French are the two official languages. India is home to a wealth of languages, with Hindi the most widely spoken and English in wide use. It makes sense, then, that companies like Google, Apple, Amazon, and Samsung are making sure their voice assistants and smart speakers support multiple languages.
With recent updates, certain voice assistants can respond to a bilingual user who speaks one language and then switches to another, without any change to the settings. Voice assistants are also getting better at understanding the regional accents of a particular language. These advancements take a great deal of research, development, and localization.
Let’s take a look at how voice assistants learn languages, some of the challenges involved, and the leaders in voice assistant localization.
[Average read time: 4 minutes]
Voice assistants are software applications that use speech-to-text, text-to-speech, and natural language processing systems to learn languages. Smart speakers (Google Home, Amazon Echo, Apple HomePod, etc.) are physical devices with built-in voice assistants.
Natural language processing (NLP) relies on neural networks: computing systems loosely modeled on the brain's networks of neurons that allow computers to learn and improve from experience and input, one of the modern breakthroughs in artificial intelligence. Through neural networks, NLP better predicts the next sound in a sentence by learning the common sound combinations of a language. NLP also helps voice assistants learn to distinguish between similar but distinct sounds like d, b, and p in English (as in the words pad, bad, and pat).
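To make the idea concrete, here is a toy sketch (not any vendor's actual model): learning which sound tends to follow which from example words, then using those counts to predict the next sound. Real assistants use neural networks; this simple frequency table illustrates the same intuition, and the phoneme spellings are illustrative only.

```python
from collections import Counter, defaultdict

# Toy training data: words broken into rough phonemes.
training = [
    ["p", "ae", "d"],   # "pad"
    ["b", "ae", "d"],   # "bad"
    ["p", "ae", "t"],   # "pat"
    ["b", "ae", "t"],   # "bat"
]

# Count, for each sound, which sounds follow it and how often.
following = defaultdict(Counter)
for word in training:
    for cur, nxt in zip(word, word[1:]):
        following[cur][nxt] += 1

def predict_next(sound):
    """Return the most frequently observed sound after `sound`."""
    return following[sound].most_common(1)[0][0]

print(predict_next("p"))  # in this data, "p" is always followed by "ae"
```

A neural network does essentially this at far greater scale, generalizing to combinations it has never seen rather than relying on exact counts.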
Text-to-speech programs like WaveNet or Google’s Tacotron 2 can learn languages from human speech input alone. One approach, for example, has researchers pose open-ended questions to a large number of native speakers of a language: “How was your day?” “What do you think about [subject]?” Through this question-and-response format, the program learns common responses to a multitude of questions in a particular language.
Native speakers are needed to oversee localization and teach the program cultural norms and usage that a computer cannot pick up on its own. For instance, “heavy metal” can refer to the musical genre or to high-density metallic elements. Add the word “fan” after it and, without some tweaking by the developers, the voice assistant might think someone likes spinning fans made of lead rather than the far more common meaning: a fan of heavy metal music.
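The kind of tweak developers add can be imagined as a small disambiguation rule. This is a hypothetical sketch, not how any production assistant works; the cue words are invented for illustration.

```python
# Hypothetical rule: if "heavy metal" appears alongside music-related
# words (including "fan"), read it as the genre; otherwise fall back
# to the literal chemistry sense.
MUSIC_CUES = {"fan", "band", "concert", "album"}

def interpret_heavy_metal(phrase):
    words = set(phrase.lower().split())
    if {"heavy", "metal"} <= words:
        if MUSIC_CUES & words:
            return "music genre"
        return "dense metallic elements"
    return "no heavy metal reference"

print(interpret_heavy_metal("heavy metal fan"))            # music genre
print(interpret_heavy_metal("heavy metal contamination"))  # dense metallic elements
```

In practice these judgments come from statistical models trained on native-speaker data rather than hand-written lists, but the goal is the same: use surrounding words to pick the culturally expected meaning.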
[Photo by Andres Urena]
Even with neural networks, there are still a number of challenges facing AI language learning, such as accents. According to a recent Washington Post study, Google and Amazon smart speakers were 30% less likely to comprehend English spoken with a non-American accent.
This can be a major issue in countries like India, where it would be beneficial to have a smart speaker that responds to both Hindi and English commands; however, the English will generally carry a heavy Hindi accent. In this case, developers would collect Hindi-accented voice recordings of an English script to train their system to recognize the commands.
Another solution is to add contextual information, such as where the smart speaker is located. If the smart speaker is in Québec (where Canadian French is spoken) rather than France (European French), the device can be localized to respond better to the variety of French used in that area.
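A minimal sketch of this idea, assuming a simple mapping from a device's configured region to a language-variant tag (the region codes and lookup function here are hypothetical, not a real API):

```python
# Hypothetical region-to-locale table using BCP 47-style tags:
# a device set up in Québec gets Canadian French, one in France
# gets European French.
LOCALE_BY_REGION = {
    "CA-QC": "fr-CA",  # Québec: Canadian French
    "FR": "fr-FR",     # France: European French
}

def pick_voice_locale(region, default="fr-FR"):
    """Choose the French variant to load for a given device region."""
    return LOCALE_BY_REGION.get(region, default)

print(pick_voice_locale("CA-QC"))  # fr-CA
print(pick_voice_locale("FR"))     # fr-FR
```

Real systems combine location with many other signals, but the principle is the same: context narrows which language model and voice the device should use.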
Adding new languages to a system takes time. Training a voice assistant in a new language takes an estimated 30 to 90 days; however, it may be much longer before the system can understand commands with a high accuracy rate. As of 2019, Apple’s HomePod and Amazon Echo understood English and Hindi commands with a 94% accuracy rate, but managed only 78% accuracy for Mandarin Chinese.
Google leads the multilingual voice assistant pack with its Google Assistant, which supports over 40 languages. Apple’s Siri comes in second with 21 supported languages, followed by Microsoft’s Cortana with 8, and finally Amazon’s Alexa and Samsung’s Bixby, each supporting 7. However, each company localizes its voices differently.
Apple has worked with voice talent Karen Jacobsen to record a unique Australian English voice for Siri, as well as with former British journalist Jon Briggs for a unique British English voice. Microsoft’s Cortana has been localized for England with actress Ginnie Watson’s voice and for China with Mandarin lines from voice actor Xiao Na.
Amazon has made similar localization efforts for Alexa, training its “all-new English voice” on the various accents of English found across the different regions of India.
These major companies understand the importance of localizing content to help expand their customer base. By adding new languages, voice assistant technology is spreading globally, and voice is becoming one of the most popular ways people around the world interact with their devices. With AI, these devices are getting smarter and, based on previous interactions with the user, are getting better at predicting what a user wants. In this way, voice assistants may soon be the main avenue through which people access information, buy products, and receive marketing messages.
Want to implement text-to-speech in your organization? Click below to see some of the solutions we can offer you.