Investigating how to transcode a video for video-on-demand streaming with ffmpeg
NOTE: This was written using ffmpeg version 4.4.2-0ubuntu0.22.04.1, and it has not been used in a production environment.
I investigated how to transcode a video with ffmpeg and play it in a browser, which required understanding the theory behind how streaming works. In this article, I describe the basics of video streaming, such as CMAF, HLS, and DASH, along with the actual commands and code to transcode a video and play it in a browser.
The video I transcoded is shown below; I made it quickly in Blender. The audio is "Beauteous" by AlexiAction on Pixabay.
This article doesn't go into some transcoding topics, such as:
- Encryption of segments
- DRM (Digital Rights Management)
tl;dr
The commands below transcode a video into the following qualities for streaming:
- 2.5 Mbps at 24 fps for 480p video
- 8 Mbps at 30 fps for 1080p video
For a video file with an audio stream:
```sh
INPUT_VIDEO_FILE=animation_with_audio.mkv

ffmpeg -i "$INPUT_VIDEO_FILE" \
  -c:v libx264 \
  -hls_time 6 \
  -hls_playlist_type vod \
  -hls_segment_type fmp4 \
  -b:v:0 8M -r:v:0 30 -bufsize:v:0 8M \
  -b:v:1 2.5M -r:v:1 24 -bufsize:v:1 2.5M \
  -b:a:0 384k \
  -b:a:1 128k \
  -map 0:v -map 0:a -map 0:v -map 0:a -var_stream_map "v:0,a:0,name:1080p v:1,a:1,name:480p" \
  -master_pl_name 'playlist.m3u8' \
  'public/animation_with_audio/playlist_segment_%v.m3u8'
```
For a video file without an audio stream:
```sh
INPUT_VIDEO_FILE=animation_without_audio.mkv

ffmpeg -i "$INPUT_VIDEO_FILE" \
  -c:v libx264 \
  -hls_time 6 \
  -hls_playlist_type vod \
  -hls_segment_type fmp4 \
  -b:v:0 8M -r:v:0 30 -bufsize:v:0 8M \
  -b:v:1 2.5M -r:v:1 24 -bufsize:v:1 2.5M \
  -map 0:v -map 0:v -var_stream_map "v:0,name:1080p v:1,name:480p" \
  -master_pl_name 'playlist.m3u8' \
  'public/animation_without_audio/playlist_segment_%v.m3u8'
```
How does Video on Demand work?
To build video-on-demand streaming, transcoding is used to convert a video into a format suitable for streaming. In this article, transcoding serves two purposes: encoding and fragmenting. Fragmenting is used to achieve adaptive bitrate streaming, which is essential for a seamless viewing experience: playback at the highest possible quality without buffering. First, I'll explain adaptive bitrate streaming, and then the encoding needed to achieve it.
Adaptive bitrate streaming
Adaptive bitrate streaming is a way to improve streaming over HTTP. With it, viewers can watch a video without waiting for the entire file to load, and the player can choose the highest quality that plays without buffering, taking the network bandwidth and device type into account [1].
To implement adaptive bitrate streaming, a video file is transcoded so that it is split into segment files and encoded for streaming. Each segment is usually a few seconds long, and segments are produced for each bit rate, so that a client can adapt to current conditions, for example by downloading high-quality segments when the network condition is good.
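For illustration, the master playlist that ties the variants together might look roughly like the following (standard HLS syntax; the concrete values here are hypothetical, since the real file is generated by ffmpeg):

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=8000000,RESOLUTION=1920x1080
playlist_segment_1080p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=854x480
playlist_segment_480p.m3u8
```

The player fetches this list first, measures its download throughput, and then switches between the variant playlists accordingly.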
Streaming format
So, to implement adaptive bitrate streaming, videos have to be encoded into formats suitable for streaming. There are mainly 3 formats for this: CMAF, HLS, and MPEG-DASH [2][3].
CMAF (Common Media Application Format)
This is not a protocol but a standard for using a common media format for streaming, reducing the complexity of making content work with both the HLS and DASH protocols. CMAF specifies a standard container format for segments: fragmented MP4.
HLS (HTTP Live Streaming)
HLS is a format developed by Apple and widely used by its products. After Apple added CMAF support to HLS, it also developed Low-Latency HLS, so there are 3 flavors of HLS:
- Traditional HLS
- HLS with the Low-Latency HLS protocol to improve latency
- HLS with the CMAF format for standardized container files
MPEG-DASH (Dynamic Adaptive Streaming over HTTP)
MPEG-DASH is another streaming method. You can see some comparisons in the article from Cloudflare.
Comparison of the 3 formats

| | CMAF | HLS | DASH |
|---|---|---|---|
| Device support required | No | Yes | Yes |
| Fragmented MP4 (fMP4) support | Yes | Yes | Yes |
| MPEG-TS support | No | Yes | Yes |
Video bit rate and frame rate
Next, bit rate and frame rate. The frame rate is the number of images per second, and each image is called a frame. The video bit rate, on the other hand, is the amount of detail in each frame [4].
The recommended bit rate and frame rate parameters differ between streaming services. For example, YouTube publishes recommended settings.
On YouTube, the recommended parameters differ between HDR (High Dynamic Range) and SDR (Standard Dynamic Range) videos. In short, HDR is higher quality than SDR, according to the article from ViewSonic.
For SDR videos, the recommended bit rates and frame rates on YouTube as of January 2024 are:
| Resolution | Video bit rate for frame rates (24, 25, 30) | Video bit rate for high frame rates (48, 50, 60) |
|---|---|---|
| 1080p | 8 Mbps | 12 Mbps |
| 720p | 5 Mbps | 7.5 Mbps |
| 480p | 2.5 Mbps | 4 Mbps |
On the other hand, the recommended audio bit rates on YouTube are the following:
- Mono: 128 kbps
- Stereo: 384 kbps
- 5.1: 512 kbps
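To check which bit rate and frame rate a source video actually has, ffprobe can report them. A minimal sketch, using this article's example file (note that some containers, such as MKV, may report the bit rate as N/A):

```sh
# Print the bit rate and average frame rate of the first video stream
ffprobe -v error -select_streams v:0 \
  -show_entries stream=bit_rate,avg_frame_rate \
  -of default=noprint_wrappers=1 \
  animation_with_audio.mkv
```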
Compression algorithm
There are multiple compression algorithms; two common ones are H.264 and H.265.
H.264, or Advanced Video Coding (AVC), has long been the standard for video compression, and H.265, or High-Efficiency Video Coding (HEVC), is its successor [5].
As a successor, H.265 compresses files roughly twice as efficiently as H.264 while maintaining image quality, although it requires more computational resources, both for encoding and for decoding during playback.
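As a rough way to see this trade-off on your own material, you can encode the same source with both codecs and compare file sizes and encoding times. A sketch, assuming your ffmpeg build includes libx265 (the CRF values are the encoders' defaults, not tuned settings):

```sh
# H.264 (AVC); CRF 23 is the libx264 default quality level
ffmpeg -i animation_with_audio.mkv -c:v libx264 -crf 23 out_h264.mp4

# H.265 (HEVC); CRF 28 is the libx265 default, commonly described as
# roughly comparable in quality to libx264 at CRF 23
ffmpeg -i animation_with_audio.mkv -c:v libx265 -crf 28 out_h265.mp4
```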
FFmpeg commands
Now let's look at how the above ffmpeg command works.
```sh
ffmpeg -i animation_with_audio.mkv \
  -c:v libx264 \
  -hls_time 6 \
  -hls_playlist_type vod \
  -hls_segment_type fmp4 \
  -b:v:0 8M -r:v:0 30 -bufsize:v:0 8M \
  -b:v:1 2.5M -r:v:1 24 -bufsize:v:1 2.5M \
  -b:a:0 384k \
  -b:a:1 128k \
  -map 0:v -map 0:a -map 0:v -map 0:a -var_stream_map "v:0,a:0,name:1080p v:1,a:1,name:480p" \
  -master_pl_name 'playlist.m3u8' \
  'public/animation_with_audio/playlist_segment_%v.m3u8'
```
The details of the options and how they work are covered in the official documentation and the wiki; for instance, H.264 encoding is described on this page. Here, I'll describe them briefly.
Map option
The map option chooses which streams of the input files are included in the output.
For example, in the command `ffmpeg -i input0.mp4 -i input1.mp4 -map 0:v -map 1:a`:
- `-map 0:v` means the video stream of the 1st input file, `input0.mp4`, will be output
- `-map 1:a` means the audio stream of the 2nd input file, `input1.mp4`, will be output
More details of the `map` option are described in the ffmpeg wiki.
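Putting this together, a complete command along those lines might look like the following (the file names are placeholders; `-c copy` just avoids re-encoding for this illustration):

```sh
# Output the video stream of input0.mp4 and the audio stream of input1.mp4
ffmpeg -i input0.mp4 -i input1.mp4 -map 0:v -map 1:a -c copy output.mp4
```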
HLS specific parameters
- `-c:v libx264`: Use H.264 (H.265 didn't work in a browser).
- `-hls_segment_type`: The file format of segment files; `fmp4` means the fragmented MP4 format.
- `-hls_time X`: The duration of each segment in seconds.
- `-hls_playlist_type vod`: Emit `#EXT-X-PLAYLIST-TYPE:VOD` in the header of the m3u8 (manifest) file and set `hls_list_size` to 0.
- `-hls_list_size`: The maximum number of segments in the playlist. The default value is 5, and 0 means all segments.
- `-hls_segment_filename`: The filename of each segment.
- `-master_pl_name`: The name of the master playlist for HLS.
- `-var_stream_map`: Defines how to group the audio, video, and subtitle streams into variant streams. `v:0,a:0 v:1,a:1,s:1` creates the first variant from the 1st video and audio streams, and the second variant from the 2nd video, audio, and subtitle streams.
  - The allowed indexes for groups/variants are 0 to 9.
  - To use a text label instead of an index in segment filenames, add `name:` to each variant.
Bit rate and frame rate parameters
- `-b:v X`: The bit rate of a video stream
- `-b:a X`: The bit rate of an audio stream
- `-r:v X`: The frame rate of a video stream
On top of the above, there are a few parameters to limit the output bit rate, described in the ffmpeg wiki.
- Bit rate and buffer size: `-bufsize` determines how often the encoder recalculates and corrects the average bit rate. Without it, the output bit rate can drift too far above or below the specified average.
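For example, a sketch of capping a single 1080p output around 8 Mbps with `-maxrate` and `-bufsize` (the concrete values are illustrative; a `-bufsize` of twice the bit rate means the rate is corrected over roughly a two-second window):

```sh
ffmpeg -i animation_with_audio.mkv -c:v libx264 \
  -b:v 8M -maxrate 8M -bufsize 16M \
  output.mp4
```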
Frontend
Adaptive bitrate streaming, whether HLS or DASH (and likely CMAF), isn't natively supported by browsers like Google Chrome, so it's necessary to use a library like Video.js.
```html
<html>
  <head>
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <link href="//vjs.zencdn.net/8.3.0/video-js.min.css" rel="stylesheet">
  </head>
  <body>
    <video id="video" class="video-js vjs-default-skin" controls>
      <source src="./animation_with_audio/playlist.m3u8" type="application/x-mpegURL" />

      Your browser does not support the video tag.
    </video>

    <script src="//vjs.zencdn.net/8.3.0/video.min.js"></script>
    <script src="https://unpkg.com/@videojs/http-streaming@3.8.0/dist/videojs-http-streaming.min.js"></script>
    <script>
      videojs('video');
    </script>
  </body>
</html>
```
File structure
Before running the above commands, create the directories `public/animation_with_audio` and `public/animation_without_audio`; ffmpeg will output its files into them. Then put the `index.html` file described in the section above under the `public` directory. The final file structure looks like this:
```
$ tree .
.
├── animation_with_audio.mkv
├── animation_without_audio.mkv
├── hls.sh
├── index.js
├── package-lock.json
├── package.json
├── public
│   ├── animation_with_audio
│   │   ├── init_0.mp4
│   │   ├── init_1.mp4
│   │   ├── playlist.m3u8
│   │   ├── playlist_segment_1080p.m3u8
│   │   ├── playlist_segment_1080p0.m4s
│   │   ├── playlist_segment_1080p1.m4s
│   │   ├── playlist_segment_480p.m3u8
│   │   ├── playlist_segment_480p0.m4s
│   │   └── playlist_segment_480p1.m4s
│   ├── animation_without_audio
│   │   ├── init_0.mp4
│   │   ├── init_1.mp4
│   │   ├── playlist.m3u8
│   │   ├── playlist_segment_1080p.m3u8
│   │   ├── playlist_segment_1080p0.m4s
│   │   ├── playlist_segment_1080p1.m4s
│   │   ├── playlist_segment_480p.m3u8
│   │   ├── playlist_segment_480p0.m4s
│   │   └── playlist_segment_480p1.m4s
│   └── index.html
└── transcode.sh
```
Note that `index.js` is a small script that starts an HTTP server.
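The page must be served over HTTP rather than opened directly from the filesystem, because the player fetches the playlists and segments with HTTP requests. Any static file server pointed at `public` works; for example, with the `http-server` npm package (an assumption for illustration, not necessarily what this project's `index.js` does):

```sh
# Serve the public directory on port 8080, then open
# http://localhost:8080/index.html in a browser
npx http-server public -p 8080
```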
Others
Everything needed to transcode and play the video above has been covered up to this section. I also looked into a few other things, described below.
How to transcode to the DASH format?
The next command generates segment files in the DASH format, as well as manifest files for HLS.
```sh
ffmpeg -i $INPUT \
  -c:v libx264 \
  -c:a aac \
  -f dash \
  -dash_segment_type mp4 \
  -seg_duration 4 \
  -init_seg_name init\$RepresentationID\$.\$ext\$ \
  -media_seg_name segment\$RepresentationID\$-\$Number%05d\$.\$ext\$ \
  -hls_playlist true \
  -hls_master_name 'playlist.m3u8' \
  public/output/output.mpd
```
You can see the details of each option in the official documentation.
How to detect whether a video file has an audio stream
I couldn't find a single way to run `ffmpeg` that works for video files both with and without audio. As a workaround, we can check whether a video file contains an audio stream with the following script.
```sh
AUDIO_CODEC_NAME=$(ffprobe -i "$INPUT_FILE" -select_streams a -v error -show_entries stream=codec_name -of json | jq -r '.streams[0].codec_name // empty')

if [ "$AUDIO_CODEC_NAME" = "" ]; then
  echo "no audio"   # run the command for a video without audio
else
  echo "with audio" # run the command for a video with audio
fi
```
Reference
There are also other resources that describe video transcoding in more detail, such as the article from the Scaleflex Blog.