Are timed audio and synchronized audio different deliverables? This is one of the most common questions we get from voice-over recording clients, for both video and audio applications. The answer is a resounding yes. More importantly, they’re different in ways that affect recording workflows – and therefore, pricing. So it’s essential for audio and video translation project managers to understand the difference between them.
This post will outline the difference between timed and synchronized voiceover audio.
[Average read time: 3 minutes]
Timed and synchronized voiceover audio are indeed very similar, but a small distinction makes all the difference. In short, timed audio just has to come in at a certain length – what’s usually called a time limitation in the studio. Synchronized audio has a time limitation as well, but it also has to synchronize to another element, usually a video. To make this distinction clear, we’ll look at both deliverables in detail.
As we said earlier, timed audio just has to meet a certain time limitation, or come in at a certain number of minutes and seconds. The most obvious example is radio commercials, which must all be a certain length for broadcast – usually 15, 30 or 60 seconds exactly. Why? Broadcasters have to be able to place a certain number of commercials together, one after the other, and know that they’ll use up exactly 2 or 3 minutes of airtime – not a millisecond more, not a millisecond less. When recording the voice over for a 30-second spot, then, it’s imperative to have the talent do as many takes as necessary to get the timing just right, plus some tweaking from the studio engineer to shave off very small amounts of time. Of course, this also requires editing the translated scripts to make sure they’ll fit – otherwise, there’s almost no way that a Russian voice translation script, for example, would fit within the time allotted for the English, since the language expands a fair bit.
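To give a rough idea of the kind of check an engineer runs after a timed session, here’s a minimal Python sketch that verifies a recorded spot lands on its target length. The file name, tolerance and WAV-only assumption are ours for the example – actual studio tooling varies.

```python
import wave

def spot_length_ok(path, target_seconds, tolerance_ms=10):
    """Return True if a WAV recording hits the target spot length.

    Broadcast spots have to land on the mark (e.g. 15, 30 or 60 seconds),
    so the tolerance here is deliberately tiny.
    """
    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    return abs(duration - target_seconds) <= tolerance_ms / 1000.0

# Hypothetical usage: flag a 30-second spot that runs long or short.
if not spot_length_ok("spot_ru_take_07.wav", 30):
    print("Take is off the 30-second mark - another take or an edit is needed.")
```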
This is what a timed audio script looks like, usually – note the columns for English text, translated text, and time limitation. As you may guess by the lines, this is for a video game:
You get the picture – recording timed audio requires script editing, more takes in the studio, and a little more engineering than recording the audio untimed. That’s why timing adds a little cost. What you may not realize is that timed audio is a required deliverable for many voiceover applications. Video games, for instance, often require translated lines to be about the same length as the English (usually no more than 10% longer), so that they line up with what the characters are doing on-screen – a character’s death moans really shouldn’t be audible after the character has died. The same goes for public announcements: metro stations need announcements to finish before passengers have to board the train, meaning they can only be so long. There are many more timed audio applications out there.
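As a simple illustration of that 10% rule in script editing, here’s a minimal sketch that flags translated lines likely to run long. The line IDs and text below are made up, and raw character count is only a rough proxy for spoken length – but it’s the kind of pre-check that saves studio time.

```python
# Flag translated lines that run more than 10% longer than the English source.
MAX_EXPANSION = 1.10  # translated line may be at most 10% longer

lines = [
    {"id": "VG_0001", "en": "Take cover behind the wall!", "ru": "Укройся за стеной!"},
    {"id": "VG_0002", "en": "We're out of ammo.", "ru": "У нас закончились боеприпасы."},
]

for line in lines:
    ratio = len(line["ru"]) / len(line["en"])
    if ratio > MAX_EXPANSION:
        print(f"{line['id']}: {ratio:.0%} of the English length - needs trimming")
```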
Synchronized audio, then, is timed audio that also has to correspond to another, separate element. Usually, that element is a video – synchronized audio is most often voiceover for video. It’s different from timed audio in that it doesn’t just have to be a certain length; it also has to line up with the different cue points in the piece with which it’s synchronizing. Those cue points are, of course, also called sync points.
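For a concrete picture of what those sync points look like on paper, a cue sheet is essentially a list of timecodes paired with the on-screen element each line has to hit. The timecodes, labels and line IDs below are purely illustrative:

```python
# Purely illustrative cue sheet: each sync point pairs a video timecode with
# the on-screen element the voiceover line has to hit.
sync_points = [
    {"timecode": "00:00:04.5", "on_screen": "Title card",       "line": "VO_01"},
    {"timecode": "00:00:09.0", "on_screen": "Bullet 1 appears", "line": "VO_02"},
    {"timecode": "00:00:13.5", "on_screen": "Bullet 2 appears", "line": "VO_03"},
]

for point in sync_points:
    print(f"{point['timecode']}  {point['line']}  ->  {point['on_screen']}")
```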
It’s better to see it than to read about it – watch the following video in Japanese, and note how the voiceover lines up with the words on the screen:
{{ script_embed('wistia', 'hjsls6ky4f', ' ') }}
To fit with the titles, the foreign-language voice over has to be the same length as the original English-language audio – in that respect, it’s timed audio. But the voiceover also has to line up with each bulleted item, ideally so that the spoken word lands just as the same word appears on-screen. This is especially difficult in languages with a syntax significantly different from English – it’s very tricky in German voice-over projects, for example, since German often inverts the subject-verb order English speakers expect.
When recording synchronized audio, professional voiceover talents usually watch the video to which they’re synchronizing. The really good ones can watch the video, listen to the original English audio, and time their voice to fit as they record – often while reading the text for the first time, as in the following picture from our studio.
Lip-sync dubbing is the most difficult type of synchronized audio, since talents must line up their voice over with the lip movements of the actor on-screen – in another language. And they must perform at the same time, often in scenes in which characters get shot, cry, laugh hysterically, run, fight, burp, rap… you get the idea. Needless to say, the dubbing talent pool is smaller than the overall VO pool.
The professional studio setup is more complex for synchronized audio than it is for timed audio – you still need the script timing for editing, but now the playback in the studio includes the sync element, usually a video. Moreover, because the recording situation is more difficult for the talent, these sessions take more time and usually require more takes. Therefore, the cost is generally slightly higher than it is for timed audio.
A good way to think of timed and synchronized audio is the rectangle/square analogy. A square is a special kind of rectangle – like a rectangle, it has four sides and four 90-degree angles. However, a square has one further requirement: it must have four equal sides. Therefore, all squares are rectangles, since they fulfill all the requirements of a rectangle, but the reverse is not true. The same goes for audio – synchronized audio is a trickier, more specialized kind of timed audio. Because of that, it requires more work in script editing and in the studio, as well as more specialized talents – and of course, it’s slightly less cost-effective and more time-consuming.