What You Need to Know When Subtitling Asian Double-Byte Languages

Asian double-byte languages have been a challenge for multimedia localization, and for captioning and subtitling in particular, since the early days of computing. Fortunately, most of the initial issues with these languages have been resolved, and long gone are the days of corrupted documents and single-language versions of Windows. But it still takes a bit of extra work and know-how to avoid issues when localizing into them.

This post will list the four things you must know to caption and subtitle Asian double-byte languages.

[Average read time: 4 minutes]

What exactly are Asian double-byte languages?

As recently as 20 years ago, most digital text was encoded using an 8-bit system, which had a limit of 256 distinct characters. There are 8 bits in 1 byte, so it was said that this system used a single byte of information to encode each character. And of course, this worked well for English and a few other languages that use a basic Latin character set, and which didn’t require more than 256 different character encodings.

That was not so great for languages with much larger character sets – namely, Simplified and Traditional Chinese, Japanese, and, to some extent, Korean. For example, Chinese alone has over 50,000 characters, though most dictionaries list only about 20,000. New encoding systems – with two bytes of information per character encoding, or “double-bytes” – were created to support each one of these languages.

While this solved character support issues, having a different encoding system for each language was a huge challenge for document translation and multimedia localization, as well as for communication over the internet, which was growing rapidly. A consortium was established to create a universal character encoding set, and in 1991 it released the first version of Unicode, which is used widely today. Unicode's code space has room for over one million characters. In many ways, the term double-byte is outdated, except that the defining feature of these languages – large sets of pictographic and ideographic characters – still affects translations.
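To make the byte counts above concrete, here is a minimal Python sketch that encodes one sample character in a few legacy encodings and in UTF-8. The sample characters and the choice of codecs are illustrative assumptions, not part of the original post; the codec names are those of Python's standard library.

```python
# Illustrative samples: one character per legacy encoding (assumed examples).
SAMPLES = {
    "latin-1": "A",     # single-byte Western European encoding
    "shift_jis": "字",  # legacy Japanese encoding
    "gb2312": "字",     # legacy Simplified Chinese encoding
    "big5": "字",       # legacy Traditional Chinese encoding
    "euc_kr": "한",     # legacy Korean encoding
}


def byte_length(char, encoding):
    """Return how many bytes one character occupies in the given encoding."""
    return len(char.encode(encoding))


if __name__ == "__main__":
    for encoding, char in SAMPLES.items():
        legacy = byte_length(char, encoding)
        utf8 = byte_length(char, "utf-8")
        print(f"{char!r}: {legacy} byte(s) in {encoding}, {utf8} byte(s) in utf-8")
```

Note that a single Unicode encoding (UTF-8 here) can hold all of the sample characters, whereas each legacy encoding covers only its own language.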

How “double-byte” character sets affect captioning & subtitling workflows

Here’s what you need to do to caption or subtitle double-byte languages successfully.

1. Make sure your deliverables support these languages.

With the widespread use of Unicode, we expect just about any software or document to support multiple languages. However, some legacy caption and subtitle formats have issues. The Scenarist Closed Caption (SCC) format, still widely used in broadcast captioning, is particularly restrictive: it supports little beyond English and a few other Latin-alphabet languages. Likewise, many proprietary formats use one of the old character encodings that support a limited number of languages. When working in these languages, especially if you’re translating legacy content, it’s always good to do a quick check that your deliverables have full language support.
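The quick check mentioned above can be sketched in a few lines of Python, assuming you know which character encoding your deliverable format uses. The function name is hypothetical, and `latin-1` and `shift_jis` merely stand in for whatever encoding a legacy format actually relies on.

```python
def unsupported_characters(text, encoding):
    """Return the distinct characters in text that the encoding cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


if __name__ == "__main__":
    subtitle = "字幕 OK"
    # Assumed stand-ins for a legacy format's encoding and a Unicode deliverable.
    for encoding in ("latin-1", "shift_jis", "utf-8"):
        problems = unsupported_characters(subtitle, encoding)
        status = "ok" if not problems else f"cannot encode {problems}"
        print(f"{encoding}: {status}")
```

An empty result means every character in the subtitle survives the trip into that encoding; anything else is a sign that the deliverable needs a different format or encoding.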

2. Keep character legibility in mind.

Pictographic and ideographic characters are often complex, which means that character legibility can be an issue, especially on low-resolution video. This is especially true for Traditional Chinese subtitling, since its characters have the most strokes on average. Fortunately, there are a few things you can do to keep subtitles legible.

First, adjust your template font specs – often, this will mean larger font sizes and more leading. Second, look for ways to ensure caption and subtitle legibility on screen, like adding a background to the text, as illustrated in the following picture:

[Image: Chinese subtitles with a background added for readability]

This is especially useful on videos that have a lot of action in them. And third, avoid fonts that don’t work well on screen – usually the more ornate ones, or designer sets created specifically for print applications.

3. Make sure your template takes different character limitations into account.

The length of each caption or subtitle line should also be shorter than it would be for other languages to ensure readability. Make sure your template supports these language-specific character limitations, and that your linguist understands the requirements as well. Of course, JBI’s proprietary subtitle translation template supports character limits for double-byte languages, and in fact features a counter that helps linguists stay within them.
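A per-language character counter of the kind described above could look roughly like the following Python sketch. This is not JBI's template; the limits in `MAX_CHARS_PER_LINE` are placeholder values chosen for illustration, and your own style guide should supply the real ones.

```python
import unicodedata

# Hypothetical per-language limits (characters per line); replace with the
# values from your own style guide.
MAX_CHARS_PER_LINE = {"en": 42, "ja": 13, "zh": 16, "ko": 16}


def visible_length(line):
    """Count characters in a line, ignoring combining marks."""
    return sum(1 for ch in line if not unicodedata.combining(ch))


def over_limit_lines(subtitle, lang):
    """Return (line_number, length) for each line that exceeds the limit."""
    limit = MAX_CHARS_PER_LINE[lang]
    flagged = []
    for number, line in enumerate(subtitle.split("\n"), start=1):
        length = visible_length(line)
        if length > limit:
            flagged.append((number, length))
    return flagged


if __name__ == "__main__":
    subtitle = "あ" * 14 + "\n" + "あ" * 5
    print(over_limit_lines(subtitle, "ja"))
```

Counting per line rather than per subtitle matters here, because a two-line subtitle can be within the overall budget while one of its lines is still too long to read comfortably.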

4. Insert manual line breaks.

Japanese and Chinese don’t use spaces to separate words or ideas, so it’s nearly impossible for a non-native speaker to know where actual words or linguistic units “break” in these languages. This creates issues in particular in Japanese subtitling, since Japanese also uses two phonetic scripts (hiragana and katakana) alongside the pictographs and ideographs borrowed from Chinese. Therefore, it’s crucial to make sure your template allows linguists to insert manual line breaks, or at least note them, so that the final text reads properly on screen. JBI’s subtitle translation template likewise supports manual line breaks for both text and burned-in video deliverables.

In fact, line breaks are a challenge for Japanese translation & localization services in general, especially for on-screen titles replacement and e-Learning course integration. If you’re localizing any media in this language, make sure you take this into account.
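One simple way to carry a linguist's manual line breaks through a workflow is to have them mark each break with an agreed symbol and convert it at delivery time, with no automatic wrapping. The Python sketch below assumes a `|` marker and a two-line maximum; both are hypothetical conventions for illustration, not features of any particular template.

```python
BREAK_MARKER = "|"  # hypothetical marker the linguist types where a line should break


def apply_manual_breaks(text, marker=BREAK_MARKER, max_lines=2):
    """Turn linguist-marked break points into real line breaks.

    Raises ValueError if the marked text would exceed max_lines, so the
    problem is caught before the subtitle reaches the video.
    """
    lines = [part.strip() for part in text.split(marker)]
    lines = [line for line in lines if line]
    if len(lines) > max_lines:
        raise ValueError(f"{len(lines)} lines marked, but only {max_lines} allowed")
    return "\n".join(lines)


if __name__ == "__main__":
    marked = "字幕の翻訳では|改行の位置が重要です"
    print(apply_manual_breaks(marked))
```

The point of the design is that the software never guesses where a Japanese or Chinese line should break; it only applies the breaks a native-speaking linguist has chosen.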

Best practices for any captioning or subtitles localization

While double-byte languages have the largest character sets, many other languages use complex scripts. Think of Thai, for example, with its vowel and tone marks above and below the main script line. Other languages present unique challenges of their own, like Arabic, which has calligraphic ligatures and relatively complex characters with multiple diacritics, and which flows from right to left. The tips listed above should really be taken into account whenever you subtitle into any language, as part of a thorough prep stage, which is critical to ensure that your project delivers on time and on budget, and that your subtitles integrate seamlessly with your video.
