Keyframes

2024-05-28 15:37:18 +02:00
parent a6d4a23c15
commit 27d669599a
2 changed files with 22 additions and 22 deletions


@@ -18,7 +18,7 @@ The service should be able to:
 - Allow clients to automatically select the best video quality it can play (and auto-switch when internet speed changes)
 - Prefer the original video if it can be played by the device/connection speed
-The last point is particularly important since Kyoo is self-hosted and user's servers are not always powerful enough to always transcode video.
+The last point is particularly important since Kyoo is self-hosted and users' servers are not always powerful enough to transcode video. For example, a Raspberry Pi will not be able to transcode videos, but simply transmuxing and keeping the original video stream is possible.
 As for any video service, the following points should also be satisfied:
 - Start playing fast (we don't want to wait 30s to start watching a movie)
@@ -152,42 +152,45 @@ The user will now have to rewatch part of the movie or wait for the transcoder t
 ![scenario-illustration.png](./scenario-paint.png "A schema of the scenario made with the help of paintjs.app")
-So how should we fix that? The obvious idea is to start the new encode directly at the requested segment, so users don't have to wait. While the idea is pretty simple, actually implementing it is a lot harder.
+So how should we fix that? The obvious idea is to start the new encode directly at the requested segment, so users don't have to wait.<br/>
-First, you want to start the transcode at a specific segment, but you don't know the start time in seconds of that segment. And even if we knew the start time of the segment, we can't simply remove previous segments from the index.m3u8 file. It's illegal to do so and the player would not be able to seek before in the video.
+While the idea is pretty simple, actually implementing it is a lot harder. First, we want to start the transcode at a specific segment, but we don't know the start time in seconds of that segment. And even if we knew the start time of the segment, we couldn't simply remove previous segments from the index.m3u8 file. Doing so is not valid, and the player would not be able to seek to earlier parts of the video.
### Alignments
 In truth, HLS has another rule: each variant needs to have its segments aligned (same length and start time). I'll steal a diagram from a Twitch blog post:
 ![variant-alignment](./twitch-variant-alignment.png "Source: https://blog.twitch.tv/en/2017/10/10/live-video-transmuxing-transcoding-f-fmpeg-vs-twitch-transcoder-part-i-489c1c125f28/")
-We will talk about what does IDR means in the next chapter. You can see that each segment is aligned: they start and end at exactly the same time in all variants. This makes it easy to switch quality/variant at any point (as illustrated by the arrows).
+You can see that each segment is aligned: they start and end at exactly the same time in all variants. This makes it easy to switch quality/variant at any point (as illustrated by the arrows).
-To specify segments length we can either use `-segment_time` to specify a single length for all segments, or we can use `-segment_times` and specify an array of length with one value per segment.
+In ffmpeg, we can either use `-segment_time` to specify a single length for all segments, or we can use `-segment_times` and specify a list of split times, one per segment boundary.<br/>
-That's great, and you might think this solves the issue, but we can't simply cut a video in two at any point. We need each segment to start with a keyframe, the `IDR` in the previous illustration.
+If we tried to run an ffmpeg command with these flags, we would quickly notice an issue: segments do not have the right duration! This is because a segment must start with a keyframe (the `IDR` in the previous illustration).
-While we can manually create keyframes at the start of segment when we transcode, we have no control over keyframes when we transmux (keep the original video stream). This means we could have a HLS setup like this:
+While we can manually create keyframes at the start of each segment when we transcode (using the `-force_key_frames` flag), we have no control over keyframes when we transmux (keep the original video stream). This means we could have an HLS setup like this:
 ![variant-misalignment](./twitch-variant-misalignment.png "Same graph as before but with a transmux stream")
-Let's take a step back and focus on what's a keyframe beforehand:
+Clients watching this stream could not change quality without replaying or skipping part of a segment. Let's take a step back and look at what a keyframe is before searching for a solution.
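To make the alignment rule concrete, here is a minimal Python sketch that checks whether every variant starts its segments at the same times. The function name `are_aligned`, the variant labels, and the tolerance value are assumptions made for illustration; they are not part of Kyoo's transcoder.

```python
def are_aligned(variants: dict[str, list[float]], tolerance: float = 0.001) -> bool:
    """Return True if all variants share the same segment start times.

    `variants` maps a (hypothetical) variant name to its segment start
    times in seconds. `tolerance` absorbs small rounding differences.
    """
    starts = list(variants.values())
    if not starts:
        return True  # nothing to compare
    reference = starts[0]
    for other in starts[1:]:
        # A different number of segments can never be aligned.
        if len(other) != len(reference):
            return False
        # Every segment must start at (almost) the same time.
        if any(abs(a - b) > tolerance for a, b in zip(reference, other)):
            return False
    return True
```

With a transmuxed stream whose keyframes fall at arbitrary times, a call such as `are_aligned({"480p": [0.0, 2.0, 4.0], "original": [0.0, 3.1, 4.7]})` returns `False`, which is the misaligned situation shown in the second diagram.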
-## Keyframes
+## A story about Keyframes
-What's a keyframe you might ask: it's an independent frame (I-frame) in a video stream. Think of it has an image. Video frames can either be independent (keyframes) or dependant on a keyframe. A dependant frame does not store the whole image but the differences relative to a keyframe (a keyframe before for a B-frame and a keyframe after for a P-frame)
+### What's a keyframe
-![i frame graph]
+So, what's a keyframe? It's an independent frame (I-frame) in a video stream. Think of it as an image. Video frames can either be independent (keyframes) or dependent on another frame. A dependent frame does not store the whole image, only the differences relative to another frame.
-Great so just put a keyframe every time we create a segment, no? Well yes and no. It would be easy to do so when we transcode, there is a ffmpeg option for that: `-force_keyframe 2` will force a keyframe every 2 seconds. But what about times when we preserve the original video stream (copy it)?
+![ipb-frame-explanation](./ipb-frames.jpg "Source: https://www.canon.com.hk/cpx/en/technical/va_EOS_Movie_Compression_Options_All_I_and_IPB.html")
-It's important to allow playback of the original video stream without re-encoding it since it offers the best video quality. It is also way faster to process on the server. With this enabled even playing on a raspberry pi is doable.
+In the previous illustration, the top row shows what you would see when playing the video, and the bottom row shows the video frames. I-frames are keyframes; P-frames and B-frames are both dependent frames (B-frames are `bidirectionally predicted pictures`: they can depend on both previous and future frames). You can read more about I/P/B frames on [Wikipedia](https://en.wikipedia.org/wiki/Inter_frame).<br/>
+With this new knowledge about dependent frames, you can now understand why segments must start with a keyframe. A player could not show the image without the preceding keyframe anyway.
-So we absolutely need to allow playback of the original video stream, where we have no control of keyframes. There can be a keyframe every frame or we could have 3 minutes of video without any keyframes. Segments still need to start with a keyframe, even in original quality.
+So we absolutely need to allow playback of the original video stream, where we have no control over keyframes. There can be a keyframe every frame, or we could have 3 minutes of video without any keyframe. Segments still need to start with a keyframe, even in original quality.
-## Allowing original playback
+### Allowing original playback
-There is only one way to meet the previously stated constraints: giving up control on fixed segments length and aligning on keyframes. Instead of creating a segment every 4s, we scan the whole video and extract keyframes timestamps and create a new segment only on one of those timestamps.
+There is only one way to align the original video stream with the transcoded streams: giving up on fixed segment lengths and aligning on the original keyframes. Since we control the transcoded streams' keyframes, we can put them at the same times as the original stream's keyframes.<br/>
+This means we can't simply create a keyframe/segment every 4s: we need to scan the whole video to extract keyframe timestamps and create new segments only at those timestamps.
-When creating the hls stream from the original video stream, we simply cut it at a previously extracted keyframes. For transcoded stream, we force keyframes and segments cut exactly like before but we use the original's video keyframes as a reference.
+When creating the HLS stream from the original video stream, we simply cut segments at previously extracted keyframes. For transcoded streams, we force keyframes and cut segments exactly like before, but we use the original video's keyframes as a reference.
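As an illustration of the paragraph above, the following Python sketch turns a list of extracted keyframe timestamps into the ffmpeg arguments discussed in this post (`-force_key_frames`, `-f segment`, `-segment_times`). The helper name `segment_args` and its `transcode` parameter are hypothetical, and the sketch assumes the first keyframe marks the start of the video, so it is not used as a cut point.

```python
def segment_args(keyframes: list[float], transcode: bool) -> list[str]:
    """Build ffmpeg arguments that cut segments on the original keyframes.

    `keyframes` are timestamps in seconds, sorted in increasing order.
    The first one is assumed to be the start of the video, so it is skipped.
    A real caller would also need to handle a video with a single keyframe.
    """
    cut_points = ",".join(f"{t:.6f}" for t in keyframes[1:])
    args: list[str] = []
    if transcode:
        # When re-encoding, ask the encoder to emit a keyframe at every
        # cut point so each segment can start with one.
        args += ["-force_key_frames", cut_points]
    # In both cases, the segment muxer splits at the same timestamps.
    args += ["-f", "segment", "-segment_times", cut_points]
    return args
```

Calling `segment_args([0.0, 2.002, 4.004, 6.006], transcode=True)` produces `2.002000,4.004000,6.006000` for both flags. With `transcode=False` the `-force_key_frames` argument is left out, since the original stream's keyframes cannot be changed.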
 To extract keyframes from a video file, we can use ffprobe, a tool that ships with ffmpeg. The following command gives keyframes:
@@ -195,7 +198,7 @@ To extract keyframes from a video file, we can use ffprobe, a tool that ships wi
 If you run this command, you will notice that it's extremely slow. That's because the `-skipkey` argument is a decoder argument, so it needs to decode all video frames and then discard the frames which are not keyframes. We can effectively do the same thing 20 times faster by manually filtering keyframes.
-```ffprobe fast```
+```ffprobe -loglevel error -select_streams v:0 -show_entries packet=pts_time,flags -of csv=print_section=0 input.mkv | awk -F',' '/K/ {print $1}'```
 This command will output something like this:
@@ -215,10 +218,7 @@ I iterated a lot on this transcoder, my first implementation was written in C an
 Kyoo's transcoder also has other features that revolve around video, like extracting subtitles, fonts or media thumbnails for seeking (see picture below). It's still a moving project with new features coming, but the core transcoding process is done and fully working! The next feature that will probably come is intro/outro detection using audio fingerprints.
-This was my first blog about Kyoo's development, If you want to read more about a specific topic, please manifest yourself! If you liked this article, consider sharing it or staring Kyoo on github.
+This was my first blog post about Kyoo's development. If you want to read more about a specific topic, please let me know! If you liked this article, consider sharing it or starring [Kyoo](https://github.com/zoriya/kyoo) on GitHub.
 <!-- vim: set wrap: -->
`ffmpeg -i input.mkv -map 0:V:0 -c:v libx264 -preset faster -vf scale=854:480 -bufsize 10500000 -b:v 1200000 -maxrate 2100000 -force_key_frames 2.002000,4.004000,6.006000,... -sc_threshold 0 -f segment -segment_time_delta 0.2 -segment_format mpegts -segment_times 2.002000,4.004000,6.006000,... -segment_list_type flat -segment_list pipe:1 -segment_start_number 0 /tmp/k/segment-480p-0-%d.ts `

Binary file not shown.
