How does bitrate differ for the same resolution and framerate?
Reading about video quality, I found that it depends on resolution, frames per second, and bitrate, which determines the size of the video.
My question is how the bitrate is calculated and how it can differ.
Let's say a video has a 360 × 240 resolution. That is 86,400 pixels per frame. At a frame rate of 30 Hz, the video carries 86,400 × 30 = 2,592,000 pixels per second.
If each pixel takes 3 bytes (24 bits) of data, we get 2,592,000 × 24 = 62,208,000 bits per second, that is about 62,208 kbit/s (this does not sound right, so maybe there is some problem in my calculation).
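For reference, here is the same calculation written out as a small Python sketch (the variable names are just mine, for illustration):

```python
# Raw (uncompressed) bitrate for the example above:
# 360x240 pixels, 30 frames per second, 24 bits per pixel.

width, height = 360, 240
fps = 30
bits_per_pixel = 24  # 3 bytes: one byte each for R, G and B

pixels_per_frame = width * height            # 86,400
pixels_per_second = pixels_per_frame * fps   # 2,592,000
raw_bitrate = pixels_per_second * bits_per_pixel

print(raw_bitrate, "bit/s")                  # 62,208,000 bit/s
print(raw_bitrate / 1000, "kbit/s")          # 62,208 kbit/s
print(raw_bitrate / 1e6, "Mbit/s")           # ~62.2 Mbit/s
```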
But how can the bitrate differ for the same resolution and frame rate, and how does that affect quality?
What you've calculated is the bitrate for a raw, uncompressed video. You typically won't find these except in research or other specialized applications. Even broadcasters use compressed video, albeit at a much higher bitrate than your typical YouTube video.
So, video quality has a lot to do with how the video was compressed. The more you compress it, the fewer bits it takes per frame. Also, the more you compress, the worse the quality is. Now, some videos are much easier to compress than others; in essence, this is why they can have a lower bitrate even though they have the same resolution and framerate.
In order to understand why this is, you need to be aware of the two main principles video compression uses. These are called "spatial" and "temporal redundancy".
Spatial redundancy
Spatial redundancy exists in images that show natural content. This is the reason JPEG works so well: it compresses image data by coding blocks of pixels together, for example 8 × 8 pixels at a time. These blocks are called "macroblocks".
Modern video codecs do the same: they basically use algorithms similar to JPEG's in order to compress a frame, block by block. So you no longer store bits per pixel, but bits per macroblock, because you "summarize" pixels into larger groups. While summarizing them, the algorithm also discards information that is not visible to the human eye, and this is where most of the bitrate reduction comes from. It works by quantizing the data: frequencies that are more perceivable are retained, and those we can't really see are thrown away. The quantization factor is expressed as "QP" in most codecs, and it's the main control knob for quality.
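To make the idea concrete, here is a rough transform-and-quantize sketch in Python, using a single 8 × 8 block and a made-up step size. Real codecs use their own transforms, quantization matrices and QP-to-step mappings, so treat this purely as an illustration of the principle:

```python
import numpy as np
from scipy.fft import dctn, idctn  # 2-D DCT, conceptually like JPEG-style coding

# Hypothetical 8x8 block of luma samples (a smooth gradient).
block = np.tile(np.arange(8) * 4 + 100, (8, 1)).astype(float)

# Transform to the frequency domain.
coeffs = dctn(block, norm="ortho")

# "Quantize": divide by a step size and round. A larger step (think: higher QP)
# zeroes out more of the small, high-frequency coefficients we barely perceive.
step = 20.0
quantized = np.round(coeffs / step)

# Most coefficients are now zero, which is what makes them cheap to store.
print("non-zero coefficients:", np.count_nonzero(quantized), "of 64")

# Decode: de-quantize and inverse-transform. The result is close to, but not
# exactly, the original block; that difference is the quality loss.
reconstructed = idctn(quantized * step, norm="ortho")
print("max error:", np.abs(reconstructed - block).max())
```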
You can even go ahead and predict macroblocks from macroblocks that have already been encoded in the same image. This is called intra prediction. For example, if a part of a grey wall was already encoded in the upper left corner of the frame, we can reuse that macroblock within the same frame, for example for the macroblock right next to it, and store only the difference from the previously coded one. This way, we don't have to fully encode two macroblocks that are very similar to each other.
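A tiny sketch of that idea, with made-up sample values and using only the block to the left as the predictor (real codecs have many prediction modes):

```python
import numpy as np

np.random.seed(0)

# Two neighbouring 8x8 blocks of a nearly uniform grey wall (hypothetical values).
left_block  = np.full((8, 8), 128) + np.random.randint(-2, 3, (8, 8))
right_block = np.full((8, 8), 128) + np.random.randint(-2, 3, (8, 8))

# Intra prediction, very roughly: predict the new block from an already
# encoded neighbour and store only the residual.
prediction = left_block
residual = right_block - prediction

# The residual values are tiny and centred around zero, which compresses far
# better than the raw samples would.
print("raw sample range:     ", right_block.min(), "to", right_block.max())
print("residual sample range:", residual.min(), "to", residual.max())

# The decoder reverses it: prediction + residual gives back the original block.
assert np.array_equal(prediction + residual, right_block)
```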
Why does the bitrate change for the same image size? Well, some images are easier to encode than others. The higher the spatial activity, the more you actually have to encode: smooth textures take up fewer bits than detailed ones. The same goes for intra prediction: a frame of a grey wall will allow you to use one macroblock to predict all the others, whereas a frame of flowing water might not work that well.
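You can see the effect even with a general-purpose compressor like zlib, which is not a video codec at all but reacts to spatial activity in the same way:

```python
import zlib
import numpy as np

np.random.seed(0)

# Two hypothetical 360x240 greyscale frames with the same resolution:
flat_wall   = np.full((240, 360), 128, dtype=np.uint8)                 # low spatial activity
noisy_water = np.random.randint(0, 256, (240, 360), dtype=np.uint8)    # high spatial activity

# The smooth frame shrinks dramatically; the detailed one barely compresses.
print("flat wall:  ", len(zlib.compress(flat_wall.tobytes())), "bytes")
print("noisy water:", len(zlib.compress(noisy_water.tobytes())), "bytes")
```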
Temporal redundancy
This exists because a frame following another frame is probably very similar to its predecessor. Mostly, just a tiny bit changes, and it wouldn't make sense to fully encode it. What video encoders do is just encode the difference between two subsequent frames, just like they can do for macroblocks.
Wikipedia's article on motion compensation illustrates this with a pair of images: the original frame, and then only the difference between it and the following frame.
The encoder now only stores the actual differences, not the pixel-by-pixel values. This is why the bits used for each frame are not the same every time. These "difference" frames depend on a fully encoded frame, and this is why there are at least two types of frames for modern codecs:
- I-frames (aka keyframes) — these are the fully encoded ones
- P-frames — these are the ones that just store the difference
You occasionally need to insert I-frames into a video, so the actual bitrate also depends on the number of I-frames used. Moreover, the more motion there is between two subsequent frames, the more the encoder has to store. A video of "nothing" moving will be easier to encode than a sports video, and use fewer bits per frame.
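Here is a toy sketch of the I-frame/P-frame idea in Python; the `encode`/`decode` functions and the GOP size of 30 are made up for illustration and have nothing to do with any real codec:

```python
import numpy as np

def encode(frames, gop_size=30):
    """Toy encoder: every gop_size-th frame is stored fully (an 'I-frame');
    the rest store only the difference to the previous frame ('P-frames')."""
    encoded = []
    for i, frame in enumerate(frames):
        if i % gop_size == 0:
            encoded.append(("I", frame.copy()))
        else:
            encoded.append(("P", frame - frames[i - 1]))  # difference only
    return encoded

def decode(encoded):
    """Toy decoder: rebuild each P-frame from the previous decoded frame."""
    decoded = []
    for kind, data in encoded:
        if kind == "I":
            decoded.append(data.copy())
        else:
            decoded.append(decoded[-1] + data)  # previous frame + difference
    return decoded

# A hypothetical static scene: 60 identical 240x360 frames.
frames = [np.full((240, 360), 128, dtype=np.int16) for _ in range(60)]
encoded = encode(frames)

# Every P-frame is all zeros here, so almost nothing needs to be stored;
# a sports clip would leave much larger differences behind.
p_frames = [data for kind, data in encoded if kind == "P"]
print("non-zero samples across all P-frames:",
      sum(np.count_nonzero(d) for d in p_frames))

# Round trip is exact in this lossless toy example.
assert all(np.array_equal(a, b) for a, b in zip(frames, decode(encoded)))
```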
I believe your math is actually correct, but there is a little more to it; compression is the missing link here.
You calculated the uncompressed bit rate, and in doing so you found the very reason compression exists: the bit rates become impossibly large with uncompressed video. So the video is compressed at the source and decompressed at the receiver, and the bit rate becomes manageable. You just need a fast enough decompressor, which may be hardware or software.
So the issue becomes how much compression can be tolerated. It is usually not lossless, so you are losing information, but encoders try to be intelligent about discarding the less important data that won't be so noticeable. That is usually fairly easy until there is a lot of motion, at which point it becomes more complicated.
Edit: Forgot to add that the part which implements the compression method is the codec; I noticed that you used this as a tag in your post.