Investigating how to transcode a video for video-on-demand streaming with ffmpeg

NOTE: This was written using ffmpeg version 4.4.2-0ubuntu0.22.04.1, and it has not been used in a production environment.

I investigated how to transcode a video with ffmpeg and play it in a browser, which required understanding the theory of how streaming works. In this article, I describe the basics of video streaming, such as CMAF, HLS, and DASH, along with the actual commands and code to transcode a video and play it in a browser.

The video I transcoded is like the one below, which I made quickly in Blender. The audio is Beauteous by AlexiAction on Pixabay.

This article doesn't cover some related transcoding topics, such as

  • Encryption of segments
  • DRM (Digital Rights Management)

tl;dr

The commands below transcode a video into the following qualities for streaming:

  • 2.5 Mbps at 24 fps, for 480p video
  • 8 Mbps at 30 fps, for 1080p video

For a video file with an audio stream:

INPUT_VIDEO_FILE=animation_with_audio.mkv

ffmpeg -i $INPUT_VIDEO_FILE \
    -c:v libx264 \
    -hls_time 6 \
    -hls_playlist_type vod \
    -hls_segment_type fmp4 \
    -b:v:0 8M -r:v:0 30 -bufsize 8M \
    -b:v:1 2.5M -r:v:1 24 -bufsize 2.5M \
    -b:a:0 384k \
    -b:a:1 128k \
    -map 0:v -map 0:a -map 0:v -map 0:a -var_stream_map "v:0,a:0,name:1080p v:1,a:1,name:480p" \
    -master_pl_name 'playlist.m3u8' \
    'public/animation_with_audio/playlist_segment_%v.m3u8'

For a video file without an audio stream:

INPUT_VIDEO_FILE=animation_without_audio.mkv

ffmpeg -i $INPUT_VIDEO_FILE \
    -c:v libx264 \
    -hls_time 6 \
    -hls_playlist_type vod \
    -hls_segment_type fmp4 \
    -b:v:0 8M -r:v:0 30 -bufsize 8M \
    -b:v:1 2.5M -r:v:1 24 -bufsize 2.5M \
    -map 0:v -map 0:v -var_stream_map "v:0,name:1080p v:1,name:480p" \
    -master_pl_name 'playlist.m3u8' \
    'public/animation_without_audio/playlist_segment_%v.m3u8'

How does Video on Demand work?

To develop video-on-demand streaming, transcoding is used to convert a video into a format suitable for streaming. In this article, transcoding serves two purposes: encoding and fragmenting.

Fragmenting is used to achieve adaptive bitrate streaming, which is essential for a seamless viewing experience: playback without buffering, at the highest quality possible.

First, I'll explain adaptive bitrate streaming, and then the encoding needed to achieve it.

Adaptive bitrate streaming

Adaptive bitrate streaming is a way to improve streaming over HTTP. With it, viewers can watch a video without waiting for the entire file to load, and they can play it at the highest quality possible without buffering, given their network bandwidth and device type [1].

To implement adaptive bitrate streaming, a video file is transcoded so that it is split into segment files and encoded for streaming. Each segment usually covers a few seconds and is produced at each bit rate, so that a client can adapt to current conditions, for example by downloading high-quality segments while the network is good.
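
Jumping ahead a little to HLS, which is introduced below, the per-bit-rate playlist that lists such segments looks roughly like the sketch below. The segment and init file names match the ones produced later in this article, but the durations and tags here are illustrative, not verbatim output:

#EXTM3U
#EXT-X-VERSION:7
#EXT-X-TARGETDURATION:6
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MAP:URI="init_0.mp4"
#EXTINF:6.000000,
playlist_segment_1080p0.m4s
#EXTINF:6.000000,
playlist_segment_1080p1.m4s
#EXT-X-ENDLIST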

Streaming formats

So, to implement adaptive bitrate streaming, videos have to be encoded into formats suitable for streaming. There are mainly three formats for this: CMAF, HLS, and MPEG-DASH [2][3].

CMAF (Common Media Application Format)

CMAF is not a protocol but a standard for using a common media format for streaming, which reduces the complexity of making content work with both the HLS and DASH protocols. CMAF specifies a standard container format for segments: fragmented MP4.

HLS (HTTP Live Streaming)

HLS is a streaming format developed by Apple and widely used by its products.

After Apple added CMAF support to HLS, it also developed Low-Latency HLS, so there are three variants of HLS:

  • Traditional HLS
  • HLS with the Low-Latency HLS protocol, which improves latency
  • HLS with the CMAF format, which standardizes container files

MPEG-DASH (Dynamic Adaptive Streaming over HTTP)

MPEG-DASH is another streaming method. You can see some comparisons in the article from Cloudflare.

Comparison of the 3 formats

                                 CMAF   HLS   DASH
Device support required          No     Yes   Yes
Fragmented MP4 (fMP4) support    Yes    Yes   Yes
MPEG-TS support                  No     Yes   Yes

Video bit rate and frame rate

Next are bit rate and frame rate. The frame rate is the number of images shown per second, where each image is called a frame. The video bit rate, on the other hand, is the amount of data spent per second of video, which determines how much detail each frame can carry [4]. As a rough sense of scale, a stream at 8 Mbps amounts to 8 Mbit/s × 60 s = 480 Mbit = 60 MB per minute of video.

The recommended bit rate and frame rate parameters differ between streaming services. For example, YouTube publishes recommended settings.

On YouTube, the recommended parameters differ between HDR (High Dynamic Range) and SDR (Standard Dynamic Range) videos. In short, HDR offers higher quality than SDR, according to the article from ViewSonic.

For SDR videos, the recommended bit rates for each resolution and frame rate on YouTube as of January 2024 are:

Resolution   Video bit rate (24/25/30 fps)   Video bit rate, high frame rates (48/50/60 fps)
1080p        8 Mbps                          12 Mbps
720p         5 Mbps                          7.5 Mbps
480p         2.5 Mbps                        4 Mbps

On the other hand, the recommended audio bit rates on YouTube are the following:

  • Mono: 128 kbps
  • Stereo: 384 kbps
  • 5.1: 512 kbps

Compression algorithms

There are multiple compression algorithms; two major ones are H.264 and H.265.

H.264, or Advanced Video Coding (AVC), has long been the standard for video compression. H.265, or High Efficiency Video Coding (HEVC), is its successor [5].

As a successor, H.265 compresses files roughly twice as efficiently as H.264 while maintaining image quality, although it requires more computational resources, not only for encoding but also for decoding during playback.
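
As an illustrative side-by-side (input.mkv is a hypothetical file, not the one from this article), the ffmpeg H.265 encoding guide suggests that libx265 at CRF 28 is visually comparable to libx264 at its default CRF 23, at roughly half the file size:

# H.264 at libx264's default quality level (CRF 23)
ffmpeg -i input.mkv -c:v libx264 -crf 23 -c:a aac output_h264.mp4

# H.265 at CRF 28: visually comparable, but roughly half the size
ffmpeg -i input.mkv -c:v libx265 -crf 28 -c:a aac output_h265.mp4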

FFmpeg commands

Now let's look at how the above ffmpeg command works.

ffmpeg -i animation_with_audio.mkv \
    -c:v libx264 \
    -hls_time 6 \
    -hls_playlist_type vod \
    -hls_segment_type fmp4 \
    -b:v:0 8M -r:v:0 30 -bufsize 8M \
    -b:v:1 2.5M -r:v:1 24 -bufsize 2.5M \
    -b:a:0 384k \
    -b:a:1 128k \
    -map 0:v -map 0:a -map 0:v -map 0:a -var_stream_map "v:0,a:0,name:1080p v:1,a:1,name:480p" \
    -master_pl_name 'playlist.m3u8' \
    'public/animation_with_audio/playlist_segment_%v.m3u8'

For details on the options and how they work, see the official documentation and the wiki.

Here, I'll describe them briefly; there is more information on pages like the wiki. For instance, H.264 encoding is described on this page.

Map option

The map option chooses which streams of the input files are included in the output. For example, in the command ffmpeg -i input0.mp4 -i input1.mp4 -map 0:v -map 1:a:

  • -map 0:v means the video stream of the 1st input file, input0.mp4, will be output
  • -map 1:a means the audio stream of the 2nd input file, input1.mp4, will be output

More details of the map option are described in the ffmpeg wiki.
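
As a minimal sketch (input0.mp4 and input1.mp4 are hypothetical files), the selected streams can be combined into a single output without re-encoding:

# Take the video from the 1st input and the audio from the 2nd input,
# copying both streams as-is into the output container.
ffmpeg -i input0.mp4 -i input1.mp4 -map 0:v -map 1:a -c copy output.mp4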

HLS specific parameters

  • -c:v libx264: Use H.264. (H.265 didn't work in the browser in my tests.)
  • -hls_segment_type: The file format of a segment file. fmp4 means fragmented MP4 format.
  • -hls_time X: The time for each segment in seconds.
  • -hls_playlist_type vod: Emit #EXT-X-PLAYLIST-TYPE:VOD in the header of the m3u8 (manifest) file, and set hls_list_size to 0.
  • -hls_list_size: The max number of segments. The default value is 5 and 0 means all segments.
  • -hls_segment_filename: The filename of each segment
  • -master_pl_name: The name of the master playlist for HLS (see the sample playlist after this list)
  • -var_stream_map: Define how to group the audio, video, and subtitle streams into variant streams.
    • v:0,a:0 v:1,a:1,s:1 creates the first variant from the 1st video and audio streams, and the second variant from the 2nd video, audio, and subtitle streams.
    • The allowed values for groups/variants are 0 to 9.
    • To use a text label instead of indexes in segment filenames, add name: to each variant.
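
For reference, the master playlist generated with these options looks roughly like the sketch below. The variant playlist names match this article's output, but the BANDWIDTH, RESOLUTION, and CODECS values are illustrative; the real ones depend on the input and encoder settings:

#EXTM3U
#EXT-X-VERSION:7
#EXT-X-STREAM-INF:BANDWIDTH=9000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
playlist_segment_1080p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2900000,RESOLUTION=854x480,CODECS="avc1.64001e,mp4a.40.2"
playlist_segment_480p.m3u8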

Bit rate and frame rate parameters

  • -b:v Xk: The bit rate of a video stream
  • -b:a Xk: The bit rate of an audio stream
  • -r:v X: The frame rate of a video stream

On top of the above, there are a few parameters to limit output bit rates, described in the ffmpeg wiki.

  • Bit rate and buffer size:
    • bufsize determines how often the encoder checks and corrects the average bit rate. Without it, the output bit rate could drift too far above or below the specified average (see the sketch after this list).
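
Here is a minimal sketch of constraining a single 1080p rendition; the input file name and the values are illustrative, and -maxrate works together with -bufsize to cap short-term spikes:

# Encode one rendition at an average of 8 Mbps; -maxrate caps momentary
# spikes, and -bufsize controls how often the rate control corrects itself.
ffmpeg -i input.mkv -c:v libx264 -b:v 8M -maxrate 8M -bufsize 16M -r 30 output.mp4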

Frontend

Adaptive bitrate streaming, whether HLS, DASH, or CMAF, isn't natively supported by browsers like Google Chrome, so a library like Video.js is required.

<html>
    <head>
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <link href="//vjs.zencdn.net/8.3.0/video-js.min.css" rel="stylesheet">
    </head>
    <body>
        <video id="video" class="video-js vjs-default-skin" controls>
            <source src="./animation_with_audio/playlist.m3u8" type="application/x-mpegURL" />

            Your browser does not support the video tag.
        </video>

        <script src="//vjs.zencdn.net/8.3.0/video.min.js"></script>
        <script src="https://unpkg.com/@videojs/http-streaming@3.8.0/dist/videojs-http-streaming.min.js"></script>
        <script>
            videojs('video');
        </script>
    </body>
</html>

File structure

Before running the above commands, create the directories public/animation_with_audio and public/animation_without_audio; ffmpeg will write its output files into them. For example:
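
mkdir -p public/animation_with_audio public/animation_without_audio

Then put the index.html file described in the previous section under the public directory. The final file structure looks like this: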

$ tree .
.
├── animation_with_audio.mkv
├── animation_without_audio.mkv
├── hls.sh
├── index.js
├── package-lock.json
├── package.json
├── public
│   ├── animation_with_audio
│   │   ├── init_0.mp4
│   │   ├── init_1.mp4
│   │   ├── playlist.m3u8
│   │   ├── playlist_segment_1080p.m3u8
│   │   ├── playlist_segment_1080p0.m4s
│   │   ├── playlist_segment_1080p1.m4s
│   │   ├── playlist_segment_480p.m3u8
│   │   ├── playlist_segment_480p0.m4s
│   │   └── playlist_segment_480p1.m4s
│   ├── animation_without_audio
│   │   ├── init_0.mp4
│   │   ├── init_1.mp4
│   │   ├── playlist.m3u8
│   │   ├── playlist_segment_1080p.m3u8
│   │   ├── playlist_segment_1080p0.m4s
│   │   ├── playlist_segment_1080p1.m4s
│   │   ├── playlist_segment_480p.m3u8
│   │   ├── playlist_segment_480p0.m4s
│   │   └── playlist_segment_480p1.m4s
│   └── index.html
└── transcode.sh

Note that index.js is a small script to start an HTTP server.
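
index.js itself isn't shown in this article; any static file server over the public directory works. For example, as an alternative (not the original script):

# Serve the public directory at http://localhost:8000/
python3 -m http.server 8000 --directory public

After starting it, open http://localhost:8000/ in a browser to load index.html.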

Others

Up to this section, everything needed to play the above video has been described. I also looked into a few other things, which are described below.

How to transcode to the DASH format?

The next command generates segment files in the DASH format, as well as manifest files for HLS.

ffmpeg -i $INPUT \
    -c:v libx264 \
    -c:a aac \
    -f dash \
    -dash_segment_type mp4 \
    -seg_duration 4 \
    -init_seg_name init\$RepresentationID\$.\$ext\$ \
    -media_seg_name segment\$RepresentationID\$-\$Number%05d\$.\$ext\$ \
    -hls_playlist true \
    -hls_master_name 'playlist.m3u8' \
    public/output/output.mpd

You can see the details of each option in the official documentation.

How to detect whether a video file has an audio stream

I couldn't find a way to run a single ffmpeg command that works for video files both with and without audio. As a workaround, we can check whether a video file contains an audio stream with the following script.

AUDIO_CODEC_NAME=$(ffprobe -i "$INPUT_FILE" -select_streams a -v error -show_entries stream=codec_name -of json | jq -r '.streams[0].codec_name // empty')

if [ "$AUDIO_CODEC_NAME" = "" ]; then
  echo "No audio"    # run the ffmpeg command without audio options here
else
  echo "With audio"  # run the ffmpeg command with audio options here
fi
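
Building on that check, here is a hypothetical sketch of merging the two tl;dr invocations into one script. The array variables, the branching, and the output path public/output are my assumptions, not the contents of the original hls.sh:

if [ -z "$AUDIO_CODEC_NAME" ]; then
  # No audio: map only the two video renditions.
  MAPS=(-map 0:v -map 0:v)
  STREAM_MAP="v:0,name:1080p v:1,name:480p"
  AUDIO_OPTS=()
else
  # With audio: map video and audio for each rendition.
  MAPS=(-map 0:v -map 0:a -map 0:v -map 0:a)
  STREAM_MAP="v:0,a:0,name:1080p v:1,a:1,name:480p"
  AUDIO_OPTS=(-b:a:0 384k -b:a:1 128k)
fi

ffmpeg -i "$INPUT_FILE" \
    -c:v libx264 \
    -hls_time 6 \
    -hls_playlist_type vod \
    -hls_segment_type fmp4 \
    -b:v:0 8M -r:v:0 30 -bufsize 8M \
    -b:v:1 2.5M -r:v:1 24 -bufsize 2.5M \
    "${AUDIO_OPTS[@]}" \
    "${MAPS[@]}" -var_stream_map "$STREAM_MAP" \
    -master_pl_name 'playlist.m3u8' \
    "public/output/playlist_segment_%v.m3u8"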

References

There are also other resources that describe video transcoding in more detail, such as the article from the Scaleflex Blog.