What’s the Difference Between SRT & WebVTT in Captioning & Subtitling?


One of the questions we get most often from clients is what the difference is between the SRT and WebVTT captioning and subtitling formats. The question makes sense – the two formats look similar, and most online players accept both. That said, every multimedia localization professional should have a working knowledge of their differences – especially in terms of capabilities and workflow.

This post lists the main differences between the SRT and WebVTT caption and subtitle formats.

[Average read time: 3 minutes]

What is the SRT format?

The SubRip text format – commonly called SRT – was initially developed as part of a program that extracts captions and subtitles from media files. The format was notable for its simplicity and ease of use, especially when compared to other formats available at the time, many of which used XML-based code. In the following screenshot of a Spanish subtitling SRT file, you can see just how simple this format is.
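To make that simplicity concrete, here is a minimal Python sketch that reads SRT text into a list of cues. The sample captions, the `parse_srt` function and the `to_ms` helper are all made up for illustration (the sample is not the file from the screenshot), and the sketch assumes well-formed input: a caption number, a timing line, then the caption text, with cues separated by blank lines.

```python
import re

# Illustrative two-cue SRT sample (hypothetical, not the file from the screenshot).
SAMPLE_SRT = """\
1
00:00:01,000 --> 00:00:04,000
Bienvenidos al curso.

2
00:00:04,500 --> 00:00:07,250
Hoy hablaremos de subtítulos.
"""

# SRT timing line: hours:minutes:seconds,milliseconds on both sides of the arrow.
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_ms(hours, minutes, seconds, millis):
    """Convert the four time-code fields to a millisecond count."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_srt(text):
    """Return a list of (number, start_ms, end_ms, caption_text) tuples."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        number = int(lines[0])  # SRT cues start with a caption number
        match = TIMING.fullmatch(lines[1].strip())
        if match is None:
            raise ValueError(f"bad timing line in cue {number}: {lines[1]!r}")
        fields = match.groups()
        cues.append(
            (number, to_ms(*fields[:4]), to_ms(*fields[4:]), "\n".join(lines[2:]))
        )
    return cues
```

Calling `parse_srt(SAMPLE_SRT)` yields one tuple per caption, with the start and end times expressed in milliseconds. A production parser would also need to cope with malformed files and formatting tags, which this sketch ignores.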


As you can imagine, this format was very attractive for the post-production and video localization of online media, and in 2008 YouTube adopted it. Vimeo and Netflix did so as well, and today it’s the go-to for many video streaming platforms.

What’s the WebVTT format?

The Web Video Text Tracks format (WebVTT, also known just as VTT) was initially created in 2010. The idea was to base the format on SRT – in fact, it was initially called WebSRT – but to also make it more robust, specifically by enabling HTML5 code functionalities. The resulting caption and subtitle text files have a “.vtt” extension, and look quite similar to SRT, as you’ll see in the following screenshot. (We’ll discuss the differences highlighted in yellow below.)


WebVTT is also used widely, especially for e-Learning localization and multimedia applications, since it works particularly well with HTML5-based platforms.

So, what are the main differences for captioning & subtitling?

Though many platforms and post-production suites will accept both formats, SRT and WebVTT are different enough that they’re not actually compatible. That is to say, if a program or platform is expecting one of them exclusively, it will not be able to work with the other. For that reason, it’s critical to know the main differences.
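Because the formats aren’t interchangeable, it can help to check which one a file actually is before sending it to a player. The Python sketch below is one hedged way to do that: it relies on the fact that WebVTT files start with a WEBVTT line, while SRT timing lines use a comma before the milliseconds. The function name `sniff_caption_format` is hypothetical, and the check is a heuristic, not a validator.

```python
import re

# SRT-style timing line: a comma separates seconds from milliseconds.
SRT_TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def sniff_caption_format(text):
    """Guess 'vtt', 'srt', or 'unknown' from the contents of a caption file."""
    text = text.lstrip("\ufeff")  # drop an optional byte order mark
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    # WebVTT files begin with WEBVTT, optionally followed by a space or tab and more text.
    if first_line == "WEBVTT" or first_line.startswith(("WEBVTT ", "WEBVTT\t")):
        return "vtt"
    # SRT has no signature line, so look for its comma-style time-codes instead.
    if SRT_TIMING.search(text):
        return "srt"
    return "unknown"
```

For example, `sniff_caption_format("WEBVTT\n\n00:01.000 --> 00:04.000\nHi\n")` returns `"vtt"`, while text that starts with a caption number and a comma-style timing line returns `"srt"`.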

  1. Caption numbers. VTT files can have caption numbers, but they’re not actually necessary, as you can see in the file above. SRTs must have them.
  2. Time-code format. SRT separates seconds from milliseconds with a comma. VTT uses a period instead (see the time-code in yellow above). Also, VTT time-codes don’t require the hours field, though it’s almost always provided.
  3. Metadata. WebVTT files can have metadata, and in fact, some is required, in particular having WEBVTT in the first line of the file. The VTT screenshot above has the full header highlighted (it includes file type and language), as well as a metadata note in the body. SRT can’t support metadata.
  4. Formatting options. WebVTT has very robust formatting features, including font, color, text styling and placement. Initially SRT couldn’t support any formatting, but it’s since been extended to support basic text formats (bold, italic, underline) and placement. Even so, it doesn’t have nearly the same capabilities as VTT.
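The first three differences map onto a very small conversion routine. The Python sketch below is an illustration only, not how any particular tool does it: `srt_to_vtt` and its `keep_numbers` parameter are hypothetical names, the input is assumed to be well-formed SRT, and formatting tags and placement data are passed through untouched instead of being translated.

```python
import re

# Captures both SRT time-codes so the comma before the milliseconds can be swapped out.
SRT_TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_vtt(srt_text, keep_numbers=False):
    """Convert well-formed SRT text to WebVTT text (basic cues only)."""
    out = ["WEBVTT", ""]  # difference 3: the required header line, then a blank line
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 2:
            continue  # skip fragments that lack a timing line
        number, timing, body = lines[0], lines[1], lines[2:]
        # Difference 2: a period, not a comma, before the milliseconds.
        timing = SRT_TIMING.sub(r"\1.\2 --> \3.\4", timing)
        if keep_numbers:
            out.append(number)  # difference 1: caption numbers are optional in VTT
        out.append(timing)
        out.extend(body)
        out.append("")  # blank line between cues
    return "\n".join(out)
```

Given the input `"1\n00:00:01,000 --> 00:00:04,000\nHola\n"`, the function returns a file that starts with `WEBVTT`, drops the caption number, and writes the timing line as `00:00:01.000 --> 00:00:04.000`. Going the other way, from VTT to SRT, takes more care, since VTT metadata and most styling have no SRT equivalent.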

How do multimedia localization professionals pick a format?

There are three main factors to consider when choosing one of the formats.

  • Video player: If your video player (or your client’s) supports only one of the formats, the decision is made for you.
  • Feature set: VTTs are the clear winner here – they have much more functionality than SRT files, some of which is very useful for complex multimedia localization.
  • Simplicity: Sure, SRTs don’t have the same bells and whistles as VTTs, but this can be a virtue. For example, linguists often prefer to translate directly in SRT because it has fewer code elements in the text, or because just about any subtitle-processing program can open it. And remember – despite their limitations, SRTs can still support the text formatting and placement required to produce professional captions and subtitles.

In short, both formats are excellent for meeting accessibility and video localization requirements. A professional studio that specializes in captioning and subtitling (like JBI Studios) will be able to provide both as deliverables, and even convert from one to the other. The key thing to remember, though, is that the workflow and the actual code for each format are slightly different. For this reason, it’s important to know your deliverable format before beginning your captioning or subtitling project. That’s the best way to avoid rework and ensure that you deliver on time and on budget.
