3 Time-Code Errors that Kill Subtitles Translation Projects

Captioning and subtitling are now a must for any video production project – many countries require captions and subtitles for the deaf and hard of hearing (SDH), and subtitle translation is incredibly cost-effective. Tab-delimited and coded text deliverables, in particular, have made distribution cost-effective, scalable, and easy to implement – though certainly not trouble-free. After all, every text deliverable is code, and a single character inserted or deleted during translation can render it unusable.

This post will list the three most common code errors inserted during translation, and what you can do to avoid them.

[Average read time: 4 minutes]

Text caption & subtitle deliverables

Text files are the most-requested deliverable for subs video translation. This is quite a change from even five years ago, when most projects required burning them to picture, or delivering them as graphics for overlay. There’s a simple reason for this change – online video streaming platforms like YouTube, Vimeo, Netflix, Amazon, and Hulu, as well as channel-based apps. As more content has moved online – including TV shows, movies, marketing spots, e-Learning and instructional videos – the subtitle deliverables have followed suit. So have dubbing and voice-over deliverables, by the way – most of those platforms also support multiple foreign-language audio channels.

Today multimedia localization professionals must have a working knowledge of text-based formats like SRT, STL, WebVTT, and SCC. If they’re working on e-Learning or corporate content, that adds DFXP and TTML to the mix, XML-based formats long associated with Flash video players. These formats vary widely in complexity, from SRT, which displays time-codes and caption units in a relatively user-friendly way, to SCC, which encodes caption data as hexadecimal codes, as you can see in the following side-by-side comparison:

[Image: side-by-side comparison of an SRT file and an SCC file]

No matter their complexity, all caption text files have one thing in common – strict requirements in terms of their structure. A single change to that structure, whether in the time-codes or even in the number of tabs or spaces in a file, can have disastrous consequences. Naturally, this is an issue in translation because linguists often have access to this code as they do their work, and sometimes they make mistakes.

Following are the most common ones that can be real “code-killers.”

1. Inserted or deleted characters – or spaces

Can you spot the issues in this SRT file (it’s the same one as above)?

[Image: SRT file containing time-code errors]

They’re circled in the following picture:

[Image: the same SRT file with the errors circled and numbered]

There were actually four issues – a sequence of two hyphens converted to an em-dash (#1), a space inserted in the middle of an SRT “arrow” (#2), a space inserted before an end time-code (#3), and a tab inserted at the end of a line (#4). Note that #3 is nearly invisible, and #4 completely so.

All of these mistakes would’ve caused issues when adding this SRT to a video on most online players. In fact, most players give an integration warning, but usually just for the first line with an issue. And these issues can be very difficult to fix, especially if they’re invisible, like the tab above. And of course, an SRT file for a feature film or TV show can have hundreds or even thousands of segments, so a widespread issue can mean hours of frustrating labor.

And these mistakes are very easy to make – people often lose their place in a long document and hit stray keys, or delete text and then fail to replace it correctly. Ultimately, simple human error will cause this kind of issue in the work of even the most diligent linguists.
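Because several of these errors are invisible to the eye, an automated check can help. What follows is a minimal sketch, assuming SRT input with the usual `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line; the names `TIMING`, `LOOKS_LIKE_TIMING` and `find_timing_errors` are made up for illustration. The "looks like a timing line" test is only a heuristic, so subtitle text that happens to start with a clock time would be flagged too.

```python
import re

# Strict SRT timing line: HH:MM:SS,mmm --> HH:MM:SS,mmm, single spaces only.
TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")
# Heuristic (an assumption): a line starting with digits and a colon
# was probably meant to be a timing line.
LOOKS_LIKE_TIMING = re.compile(r"^\s*\d+:\d")


def find_timing_errors(srt_text):
    """Return (line_number, message) pairs for suspicious timing lines."""
    problems = []
    for number, line in enumerate(srt_text.splitlines(), start=1):
        if not LOOKS_LIKE_TIMING.match(line):
            continue  # cue index, subtitle text, or blank line
        if TIMING.fullmatch(line):
            continue  # well-formed timing line
        if line != line.rstrip():
            problems.append((number, "trailing whitespace (space or tab)"))
        elif "\u2014" in line or "\u2013" in line:
            problems.append((number, "arrow hyphens replaced by a dash character"))
        else:
            problems.append((number, "malformed time-code line"))
    return problems
```

Running this on a translated file before delivery reports every flagged line at once, rather than only the first one a player complains about.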

2. Misplaced or poorly structured format tags

Many text files support font formatting, screen placement, or various other special formats. Most of them, in fact, use XML-style tags, even in some formats that aren’t XML-based to begin with. If you’ve translated in XML, you know how easy it is to misplace those tags – even one opening <i> tag (for italics) without its corresponding </i> will throw off an entire string. That applies to tag hierarchies as well – just one tag out of place will invalidate the code. We see this issue regularly in German localization, for example, since German syntax differs so much from English, and translation requires moving a lot of tags around.

If you’re using a format with tags, make sure that your linguists are familiar with XML in general or the file format specifically.
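A simple tag-balance check can catch an unclosed or badly nested tag before the file reaches a player. The sketch below is illustrative only: it assumes simple tags such as `<i>`, `<b>` or `<font ...>` inside one subtitle's text, and `unbalanced_tags` is a hypothetical helper, not part of any subtitle tool. It is not a full XML validator, and plain text containing angle brackets could trigger false positives.

```python
import re

# Matches simple opening, closing and self-closing tags such as <i>, </i>, <br/>.
TAG = re.compile(r"<(/?)([A-Za-z]+)[^>]*>")


def unbalanced_tags(text):
    """Return a list of problems with open/close tags in one subtitle's text."""
    stack = []      # names of tags opened but not yet closed
    problems = []
    for match in TAG.finditer(text):
        if match.group(0).endswith("/>"):
            continue  # self-closing tag needs no partner
        name = match.group(2).lower()
        if match.group(1) != "/":
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            problems.append(f"unexpected </{name}>")
    problems.extend(f"unclosed <{name}>" for name in stack)
    return problems
```

For example, `unbalanced_tags("<i>Hallo Welt")` reports the missing `</i>`, while a correctly closed string returns an empty list.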

3. Wrong time-codes

Most of the time this error comes down to simple human error, much like the inserted spaces, characters or tabs in the first item. But sometimes it happens because linguists change the time-codes themselves, usually to combine two English-language segments, or to split them up when the translations are too long to fit. The mistakes fall into two main camps – first, time-code structure errors (like a missing reel number, a missing colon, a frame number that doesn’t fit within the frame-rate, or a missing decimal number); and second, time-codes that overlap with the previous or next subtitle, which are common when translators try to lengthen the on-screen time of a particular segment.

These mistakes are particularly difficult to fix, especially for double-byte language projects, like Japanese and Chinese subtitling, since they require a linguist, as well as a professional time-coder who can re-spot those sections of the video.
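Overlaps, at least, are easy to detect automatically, even if fixing them still takes a linguist and a time-coder. Here is a rough sketch for SRT only, where times are `HH:MM:SS,mmm` rather than frame-based; `find_overlaps` and `to_ms` are hypothetical names, and timing lines that are structurally malformed are simply skipped here.

```python
import re

CUE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_ms(hours, minutes, seconds, millis):
    """Convert the four captured time fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def find_overlaps(srt_text):
    """Return (line_number, message) pairs for inverted or overlapping cues."""
    problems = []
    previous_end = None
    for number, line in enumerate(srt_text.splitlines(), start=1):
        match = CUE.fullmatch(line.strip())
        if not match:
            continue  # not a well-formed timing line
        parts = match.groups()
        start, end = to_ms(*parts[:4]), to_ms(*parts[4:])
        if end <= start:
            problems.append((number, "end is not after start"))
        if previous_end is not None and start < previous_end:
            problems.append((number, "overlaps previous subtitle"))
        previous_end = end
    return problems
```

The check assumes cues appear in chronological order, which is the normal case for a delivered subtitle file.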

So, what can you do?

Fortunately, you can avoid these issues by doing the following:

Choose the simplest format that will work for your project. The more code in your text file, the greater the chance that it’ll get mangled. Use the simplest format that will still give you the quality, and flexibility, that your localization project requires. Unfortunately this isn’t possible for a lot of entertainment content, which uses text formatting to help convey emotion or irony. And a lot of content still uses SCC (the hex-based format above), which is particularly difficult to produce.
When translating, stay away from any software that auto-formats. Especially Microsoft Word. The hyphen-to-em-dash conversion above is a Word auto-correction. Plus, remember that Word often auto-capitalizes the first word after a hard return, which can wreak havoc on subs with multiple lines.
Lock time-codes in your workflow. If your linguists can’t touch them, they won’t be able to insert errors by mistake, or when trying to re-spot by hand. JBI Studios has a workflow that specifically moves the time-codes away from the text for translation to avoid these issues as part of its video subtitling services.
Employ linguists with subtitling services experience. It’s hard enough for experts to translate subs, let alone for newbies.
Do a QA of the implemented captions & subtitles. This won’t avoid issues created by the translation process, but it will make sure that no issues make it to your audience. This will usually mean taking videos offline while the captions are being implemented, or setting up a beta test. Allow time for this in your workflow.

Allowing enough time is good practice for audio and video translation projects in general. Rushing through caption & subtitle projects can lead to more human error, which means more bugs and longer QA cycles. Planning projects thoroughly – even during the original English-language project’s post – is the best way to make sure that audio & video translation projects run smoothly, release on time, and stay on budget.
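As an illustration of the "lock time-codes" tip above, here is one way the idea could be sketched. This is not JBI Studios' actual workflow, which the post doesn't describe. It assumes a clean SRT file with `\n` line endings and exactly one blank line between cues, and the function names `extract_text` and `merge_translation` are invented for the example.

```python
def extract_text(srt_text):
    """Return {cue_index: text} so linguists never see the time-codes."""
    texts = {}
    for block in srt_text.strip().split("\n\n"):
        lines = block.split("\n")
        # lines[0] is the cue index, lines[1] the timing line, the rest is text.
        texts[lines[0]] = "\n".join(lines[2:])
    return texts


def merge_translation(srt_text, translated):
    """Rebuild the SRT from the original indexes and time-codes plus translated text."""
    blocks = []
    for block in srt_text.strip().split("\n\n"):
        index, timing = block.split("\n")[:2]
        blocks.append(f"{index}\n{timing}\n{translated[index]}")
    return "\n\n".join(blocks) + "\n"
```

The linguist works only on the dictionary returned by `extract_text` (exported to a spreadsheet or translation tool, for instance), and `merge_translation` puts the untouched time-codes back afterwards.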