3 Tips for Lining up Untimed Voice-Over Audio Translations to Video

Text expansion is one of the greatest hurdles to recording foreign-language voice-over for video and animations. This is especially true when the localized VO is recorded untimed – that is to say, without synchronizing it to the original English-language content, usually because an editor will extend the visuals to match the new voice-over recording. Lining up that untimed audio is pretty difficult, but it’s a very common task in corporate multimedia translation, especially for e-Learning courses and webinar presentations. Fortunately, there are a few tricks of the trade to make it easier.

This post will list three tips that editors and e-Learning developers can use to make lining up untimed audio less painful, and more accurate.

[Average read time: 4 minutes]

Why recording “wild” VO can be good for video & animations

There are a few reasons that recording untimed voiceover audio – also known as “wild” audio – might be a better option for translated VO that is going to be integrated with a video or animation. For starters, not editing scripts for timing can provide better localization accuracy, especially for highly technical, governmental or medical content. Second, untimed audio is generally more cost-effective, since it doesn’t require editing a script for timing, and because it takes slightly less time to record in the studio. Finally, untimed audio can all be recorded at a natural pace – as opposed to timed audio or video dubbing, in which the talent occasionally has to speed up a little to get a line to fit.

Recording “wild” is particularly common for e-Learning translation, specifically for courses developed in Adobe Captivate and Articulate Storyline, both of which have easy-to-use timeline interfaces that allow re-synchronizing animations to localized audio. Likewise, video productions can benefit from re-editing for the localized off-screen narration if they have leeway in their footage (usually a fair amount of B-roll) – as well as no on-screen speakers, of course. Same thing for animations – editors can extend key frames to allow more time for the localized audio, or even re-time entire scenes if they’re using software that automatically re-animates lip movements to VO audio files.

Of course, recording untimed audio requires re-synchronizing to visuals, which can be a substantial labor cost by itself. The following three tips all try to make this process easier, more accurate, and ultimately less labor-intensive.

1. Segment voiceover into audio files that correspond to main animations.

Audio deliverables for video are usually single files – one per video, matching its exact length – that synchronize to the visuals. Video editors can just line up the translated audio at the first frame of a video, and it’ll synchronize perfectly – the audio file is said to be “frame-accurate” to the source video.

When recording untimed audio, though, the output file wouldn’t be the same length as the video, nor would it line up to any synchronization points. Moreover, long audio files usually have several sync points within the file. For example, the following sample script is for a biomedical device tutorial – in this case, different pictures will flash for each of the steps:

The editor will have to take the full audio file produced in the studio, find the sync points by listening to the translated audio, and then line up each one of the animations to its corresponding step. That’s not a terribly difficult task for Spanish voice overs – but if you’re working in a language written in a non-Latin script (for example, for Arabic, Russian, Chinese, Korean or Japanese voice overs) this task can be exceptionally difficult.

Breaking up the audio into its smaller “synchronizable” clips, however, can help an audio editor to find those sync points more easily. For example, imagine breaking up the script like this:

The audio editor would get six audio files, which he or she could sequence in the editing timeline, and then adjust the footage to match each corresponding file. This is a great option for video editing timelines especially, since most editing software supports importing multiple audio files at once (lining them up one by one in the audio track), as well as snapping to edits for faster sync. It’s not as great an option for e-Learning sync, since neither Captivate nor Storyline (though especially the latter) is great with multiple audio files in slide timelines.
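To get a feel for how much the footage has to move once the clips are sequenced, the arithmetic can be sketched in a few lines of Python. This is only an illustration, not the API of any editing tool: the `sequence_clips` function, the `gap` parameter and the sample durations are all made up for the example. It lays the localized clips end to end and reports where each one starts and how much the matching visuals would need to stretch.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    index: int       # 1-based clip number
    start: float     # where the clip begins on the timeline, in seconds
    duration: float  # length of the localized clip, in seconds
    stretch: float   # localized duration divided by source duration


def sequence_clips(source_durations, localized_durations, gap=0.0):
    """Lay localized clips end to end and compare each to its source segment.

    The names and the optional `gap` (silence between clips, in seconds)
    are illustrative assumptions, not part of any real tool.
    """
    if len(source_durations) != len(localized_durations):
        raise ValueError("each source segment needs exactly one localized clip")
    placements = []
    cursor = 0.0
    pairs = zip(source_durations, localized_durations)
    for i, (src, loc) in enumerate(pairs, start=1):
        placements.append(Placement(i, cursor, loc, loc / src))
        cursor += loc + gap
    return placements


# Hypothetical durations, in seconds, for a script split into six clips.
english = [4.0, 6.0, 5.0, 8.0, 3.0, 4.0]
spanish = [5.0, 7.5, 6.0, 10.0, 3.0, 5.0]

for p in sequence_clips(english, spanish):
    print(f"clip {p.index}: starts at {p.start:.1f}s, footage x{p.stretch:.2f}")
```

A stretch value above 1 means the visuals for that step need more screen time than in the English version, which is where B-roll or extended key frames come in.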

2. Add markers to the recorded audio file.

For this method, the voice over production studio outputs a single file per video, but then adds markers to it, usually per specifications from the client. For example, in the following script, we’ve added markers at the major sync points in the audio – they’re the numbers inside brackets, highlighted in yellow:

Note that each numbered marker is in the same relative place in the Spanish audio. A post-production studio like JBI could then add them to the final audio deliverable – in the following screen shot of the audio recording file, the markers have been added:

A video or e-Learning editor can use these markers as guides for where exactly to line up visuals.

The best part of this method is that it keeps the overall number of audio files down. However, there are three main drawbacks. First, not all kinds of markers are supported by the different editing and e-Learning software, so it’s crucial to run rigorous tests before committing to a workflow. Second, it requires adding marker cues (the numbers in brackets) to the script, and these cues are then retained in the translation – this decreases translation memory reuse, and can lead to errors. Finally, adding markers after recording does require additional studio labor. But for some projects, markers can mean shortened QA and bug-fixing timelines, and better localized products.
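Since the marker cues travel through translation, a quick automated check can catch a dropped or reordered cue before the script reaches the studio. The following Python sketch assumes the cues are whole numbers in square brackets, like `[3]`. The function names and the two sample lines are hypothetical and are not taken from the script discussed above.

```python
import re

# Assumed cue format: a whole number in square brackets, e.g. "[3]".
CUE = re.compile(r"\[(\d+)\]")


def cue_sequence(text):
    """Return the marker numbers in the order they appear in the text."""
    return [int(n) for n in CUE.findall(text)]


def check_cues(source_text, translated_text):
    """Compare the cues in the source script with those in the translation.

    Returns a list of readable problems. An empty list means the
    translation carries the same cues, in the same order, as the source.
    """
    src = cue_sequence(source_text)
    tgt = cue_sequence(translated_text)
    problems = []
    missing = [n for n in src if n not in tgt]
    extra = [n for n in tgt if n not in src]
    if missing:
        problems.append(f"missing in translation: {missing}")
    if extra:
        problems.append(f"not in source: {extra}")
    if not missing and not extra and src != tgt:
        problems.append(f"sequence differs: source {src}, translation {tgt}")
    return problems


# Made-up sample lines; the Spanish version has lost cue [3].
english = "[1] Remove the cap. [2] Attach the tube. [3] Press start."
spanish = "[1] Retire la tapa. [2] Conecte el tubo. Pulse el botón de inicio."

print(check_cues(english, spanish))
```

Running a check like this on every segment of the script is cheap, and it flags cue problems while they can still be fixed in the text rather than in the audio.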

On a side note, Storyline has a cue points feature that can aid localization – for more on that, see our previous post, Using cue points to lower e-Learning translation re-sync costs.

3. Hire bilingual editors.

This isn’t necessarily a viable solution for projects with many languages. Likewise, it means bringing on an additional editor who may not be as familiar with an organization’s content, which can lead to issues. However, having an editor who knows the target language can make the re-sync process faster and much more accurate. Best of all, the editor doesn’t have to be a native speaker or even very fluent in the target language – he or she just needs to be able to read the script and find the corresponding sections in the audio file. This means that intermediate proficiency will do for most languages. If you have a large project in one or two closely related languages (like Danish and Norwegian), hiring a bilingual editor may be a great option.

The importance of bilingual scripts

Finally, we should note that none of the above tips could be implemented without a bilingual script – that is to say, a script that has both the English and target voiceover texts, one per column, and side-by-side. While bilingual scripts aren’t absolutely necessary to record international voiceover, they are nonetheless incredibly useful in the studio, especially if the foreign-language voice over talent or director has any questions about meaning or intent. When lining up untimed audio to a video, animation or e-Learning course, these bilingual scripts are absolutely critical, since they provide a record of which translated audio corresponds to each section of the English. Without them, project timelines can increase, and costs can rise rapidly. This is true of audio & video translation voiceover projects in general – bilingual scripts can help avoid pick-ups, keeping costs low and multimedia localization projects on time.