What You Need to Know for Captioning & Subtitling in Indian Languages

The demand for captioning source materials in Indian languages, as well as subtitling English-language content for India, has increased dramatically in the last few years. It makes sense – India is growing rapidly, especially when it comes to internet adoption. So what do video localization professionals need to know to caption and subtitle for a country with a linguistically diverse population?

This post will list three tips to help you caption and subtitle your videos for India.

[Average read time: 3 minutes]

Why the surge in multimedia localization for India?

Two major reasons. First, internet usage in India has grown dramatically, almost quadrupling between 2011 and 2016. This growth doesn’t seem to be slowing, either: the number of internet users is projected to almost double by 2021. Second, internet usage in the country has shifted away from English-language content and towards local Indian languages. In fact, most internet adoption is now happening among non-English speakers.

For an in-depth look at internet use in India, see our previous post, What the KPMG/Google India Study Means for Multimedia Localization.

This surge has led directly to an increase in multimedia localization into local Indian languages, including dubbing and subtitling – in particular the latter, since it’s a cost-effective way to prepare video content for early market penetration. So what do you need to know before creating captions & subtitles for India? Let’s jump right in.

1. India has 22 officially recognized languages.

The linguistic diversity of India is astonishing – there are over 1,500 languages spoken throughout the country, and about 30 languages with 1 million or more speakers. That said, it’s good to remember that most of the population speaks one of the following 11 languages: Hindi, Bengali, Telugu, Marathi, Tamil, Urdu, Kannada, Gujarati, Odia, Malayalam and Punjabi. If you’re creating subtitles for India, it’ll most likely be into one of these languages – in fact, the demand for Hindi dubbing & subtitling has surged in the last decade, as well as for languages with high internet adoption rates like Tamil and Gujarati.

You should also know that these languages are quite different from each other. For example, Hindi and Marathi use the Devanagari script, a left-to-right abugida writing system that is instantly recognizable by the horizontal line that runs along the top of its letters. Urdu subtitling, on the other hand, uses a Perso-Arabic script, which is written right-to-left. Bengali uses the Eastern Nagari script, which has several differences from Devanagari. Telugu and Kannada, meanwhile, use two closely related Brahmic scripts – these also run left-to-right, but look very different from both the Devanagari and Eastern Nagari scripts.

[Image: India language scripts used for captioning and subtitling]

You get the idea – you can’t make wholesale assumptions about local languages in India, and it’s crucial to understand the basics of each one when localizing.
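As a rough illustration of the script differences described above, here is a minimal Python sketch that reports the writing direction and script of a text sample, using only the standard unicodedata module. The function names dominant_direction and script_names are hypothetical helpers made up for this example, and taking the first word of a character's Unicode name is a simplification rather than a full script-detection method.

```python
import unicodedata

def dominant_direction(text):
    """Return 'rtl' if most strongly directional characters are right-to-left."""
    rtl = ltr = 0
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):   # right-to-left letters, including Arabic-script letters
            rtl += 1
        elif bidi == "L":         # left-to-right letters
            ltr += 1
    return "rtl" if rtl > ltr else "ltr"

def script_names(text):
    """Collect the first word of each letter's or mark's Unicode name (assumed proxy for script)."""
    names = set()
    for ch in text:
        if ch.isalpha() or unicodedata.category(ch).startswith("M"):
            names.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return names

# Sample words: "Hindi" in Devanagari, "Urdu" in Perso-Arabic, "Telugu" in Telugu script
for sample in ("हिन्दी", "اردو", "తెలుగు"):
    print(sample, dominant_direction(sample), script_names(sample))
```

A check like this can flag early which of your target languages will need right-to-left handling later in the workflow.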

2. Make sure your translation template and final deliverable support your languages.

Not all legacy caption and subtitle formats support the Unicode character set, or right-to-left text. The Scenarist Closed Caption (SCC) format, still widely used in broadcast captioning, is a notorious example, since it supports only a limited set of Latin characters. Likewise, many proprietary formats use one of the old character encodings that don’t support Indian languages (or double-byte languages, for that matter). Before starting your video translation project, make sure that your entire subtitle workflow supports your language set – especially your translation template, since that’ll be where your linguists work. Of course, JBI’s subtitle template supports the full range of Indian languages covered by the Unicode standard, as well as right-to-left text.
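One quick way to see whether a character encoding can hold your language set is to try encoding a sample of the translated text. The sketch below is a minimal Python example under that assumption. The function name unsupported_characters is hypothetical, and cp1252 is used only as a stand-in for an older single-byte encoding, not as the encoding of any particular subtitle format.

```python
def unsupported_characters(text, encoding):
    """Return the distinct characters in `text` that `encoding` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing

sample = "नमस्ते"  # Hindi greeting in Devanagari

# An empty list means every character can be stored in that encoding.
print(unsupported_characters(sample, "utf-8"))
print(unsupported_characters(sample, "cp1252"))
```

If the second call returns any characters, a workflow step that relies on that encoding will lose or garble them.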

3. Make sure you have a “locked-text” reference of your subtitle translations.

This is a best practice generally for multimedia localization, but it’s especially useful for subtitle workflows. Characters can get corrupted when files move from one computer to another – it’s just a fact of digital translation. This is especially true when text created by a linguist on a native-language operating system gets reviewed by a PM on an English-language system, then transferred to a Mac for conversion, and then uploaded online to a legacy player. For this reason, it’s crucial to spot character corruptions early on, and the best way to do that is to get a reference of your translated text that’s “locked.” PDF format is best, but you can also create JPEGs or screenshots. The idea is to have a visual of exactly what your linguist sees, to make sure it’s what you see in each step of the caption or subtitle workflow.
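The visual check against a locked reference can be complemented by an automated comparison, assuming you also keep the linguist's original text as a Unicode string. The Python sketch below is one possible approach, not a replacement for the PDF reference. The function name find_corruption is hypothetical, and the simulated corruption (UTF-8 bytes misread as Latin-1) is just one common failure chosen for illustration.

```python
import unicodedata

def find_corruption(reference, candidate):
    """Compare a candidate string against the reference and describe any problems."""
    problems = []
    if "\ufffd" in candidate:
        problems.append("contains U+FFFD replacement character")
    # Normalize both sides so equivalent composed/decomposed forms compare equal.
    ref = unicodedata.normalize("NFC", reference)
    cand = unicodedata.normalize("NFC", candidate)
    if ref != cand:
        for i, (a, b) in enumerate(zip(ref, cand)):
            if a != b:
                problems.append(
                    f"first difference at character {i}: "
                    f"expected U+{ord(a):04X}, found U+{ord(b):04X}"
                )
                break
        else:
            # No differing character in the shared prefix, so one text is longer.
            problems.append("texts differ in length")
    return problems

reference = "नमस्ते"
# Simulate a file that was saved as UTF-8 but opened as Latin-1 somewhere in the workflow.
garbled = reference.encode("utf-8").decode("latin-1")

print(find_corruption(reference, reference))
print(find_corruption(reference, garbled))
```

Running a comparison like this at each handoff narrows down which step introduced the corruption.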

Test your workflow for your language set before starting localization

This is especially critical if you’re not using a standard output format like SRT, or burning your subtitles to video. Remember that aside from template and deliverable support, there are many other factors that may cause issues in your final caption or subtitle implementation. For example, some online video players or e-Learning LMSs need small code tweaks to support Unicode or right-to-left text. Likewise, some pre-set text format specifications may not work for Indian languages, which can sometimes require more leading or kerning. For this reason, test your full workflow, including final implementation, in your full language set, before you start time-coding. You’ll spot encoding or formatting issues when you can still make tweaks to your workflow relatively easily and cost-effectively – ensuring that your project stays on budget and delivers on time.
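A small part of such a workflow test can be automated. The Python sketch below writes a few sample cues to a temporary SRT file and checks that the text reads back unchanged in a given encoding. The function name srt_round_trip_ok and the sample cues are hypothetical, and the check covers only file encoding; player rendering, fonts, leading and right-to-left display still need to be tested in the real implementation.

```python
import tempfile
from pathlib import Path

def srt_round_trip_ok(cues, encoding="utf-8"):
    """Write cues to a temporary SRT file and confirm the text reads back unchanged."""
    body = "\n".join(
        f"{i}\n{start} --> {end}\n{text}\n"
        for i, (start, end, text) in enumerate(cues, start=1)
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.srt"
        try:
            path.write_text(body, encoding=encoding)
        except UnicodeEncodeError:
            # The encoding cannot represent at least one character in the cues.
            return False
        return path.read_text(encoding=encoding) == body

sample_cues = [
    ("00:00:01,000", "00:00:03,000", "नमस्ते"),      # Hindi, left-to-right
    ("00:00:03,500", "00:00:05,000", "خوش آمدید"),   # Urdu, right-to-left
]

print(srt_round_trip_ok(sample_cues))                     # UTF-8
print(srt_round_trip_ok(sample_cues, encoding="cp1252"))  # stand-in legacy encoding
```

Running this with one sample line per target language before time-coding starts gives an early signal about encoding problems.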