Compare two video files to find out which has best quality

I work in video quality research, and it's hard to give a simple answer to the question "which video is better". What you want is a program that gives you a Mean Opinion Score (MOS) of a video, i.e. a number between 1 and 5, or between 0 and 100, which corresponds to the quality as perceived by a human being.

If you want practical tools, you can skip the next sections.

Intro: Why you cannot simply compare bitrate/resolution/etc.

Just comparing video resolution won't tell you anything about the quality. In fact, it may be completely misleading. A 1080p movie rip at 700MB might look worse than a 720p rip at 700MB, because the former has too few bits to spend on each pixel, which introduces all kinds of compression artifacts.
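To put rough numbers on the 700MB example, here is a small sketch that works out the average bitrate and the bits available per pixel. The 90-minute duration and 24 fps are assumptions chosen for illustration, and audio is ignored.

```python
def average_bitrate_kbps(file_size_mb: float, duration_min: float) -> float:
    """Average total bitrate in kbit/s for a file of the given size and duration."""
    bits = file_size_mb * 1000 * 1000 * 8  # decimal megabytes, for simplicity
    return bits / (duration_min * 60) / 1000


def bits_per_pixel(bitrate_kbps: float, width: int, height: int, fps: float) -> float:
    """Average number of bits the encoder can spend on each pixel of each frame."""
    return bitrate_kbps * 1000 / (width * height * fps)


# Assumed values: a 90-minute movie at 24 frames per second, audio ignored.
bitrate = average_bitrate_kbps(700, 90)
bpp_1080p = bits_per_pixel(bitrate, 1920, 1080, 24)
bpp_720p = bits_per_pixel(bitrate, 1280, 720, 24)

print(f"bitrate: {bitrate:.0f} kbit/s")
print(f"1080p:   {bpp_1080p:.4f} bits per pixel")
print(f"720p:    {bpp_720p:.4f} bits per pixel")
```

At the same file size, the 720p rip has 2.25 times as many bits per pixel as the 1080p rip, simply because it has 2.25 times fewer pixels to encode.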

The same goes for comparing bitrate at similar frame sizes, as different encoders can deliver better quality at a lower bitrate, or vice versa. For example, a 720p 700MB rip produced with XviD will look worse than a 700MB rip produced with x264, because the latter is much more efficient.

You would also have to define how a final "integral score" (the MOS) is composed of the individual quality factors. This heavily depends on several things, including but not limited to:

the type of videos you are comparing (cartoons, movies, news, etc.)
the original quality before they were encoded (you can encode a bad file with high bitrate and it won't make it better)

This is just speaking about pure compression quality. We're not even talking about how humans would perceive the videos. Let's assume you have a friend who watches movies because he or she enjoys crisp details and high motion resolution. They would be much more critical of a low-quality rip than a friend who just watches movies for their content. The latter probably would not care about the quality so much, as long as the movie is funny or entertaining.

There are different types of video quality metrics!

There are several so-called video quality metrics, which can be classified according to which kind of information is used to determine the quality. In principle and very simply speaking, you distinguish between the following:

No-reference metrics – You have one video as input and want a quality score. In your case you are looking for a no-reference metric, because you often do not even have the original video. Such a metric will take one video and output one quality score. Typical problems an NR metric will detect are blurring and blockiness.
Full-reference metrics – You have an original video and an encoded video and want a quality score. For example, you could take a Blu-ray movie, then create two rips from it, and use a full-reference metric to estimate the quality loss of your rips. This will take a long time to compute, but it's more accurate than NR metrics.

Note that the above metrics look at video encoding quality, but there are also metrics that incorporate problems like initial loading times and stalling events when streaming video (e.g. ITU-T P.1203).

What tools can I use?

Here is a list of ready-to-use tools that you can use to test video quality metrics:

VMAF – Video Multi-Method Assessment Fusion by Netflix
VQMT – Video Quality Measurement Tool by the EPFL in Lausanne, Switzerland – combines several metrics
MSU Video Quality Tool – commercial software that combines several metrics
AVQT – Apple's own video quality tool, premiered at WWDC 2021
ITU-T P.1203 Implementation for analysis of HTTP streaming quality
ITU-T P.1204.3 Implementation for analysis of H.264, H.265 and VP9-encoded bitstreams
FFMetrics, a Windows GUI for several video quality metrics available in FFmpeg

Now what metrics are there?

PSNR, PSNR-HVS and PSNR-HVS-M

For starters, PSNR (Peak Signal-to-Noise Ratio) is a very simple-to-use but somewhat poor method of assessing video quality. It works relatively well for most applications and quick diagnostics, but it does not give a good estimation of how humans would perceive the quality.

PSNR can be calculated frame-by-frame, and then you would for example average the PSNR of a whole video sequence to get the final score. Higher PSNR is better. ffmpeg can be used to calculate PSNR.
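As a minimal sketch of the frame-by-frame calculation described above, the following computes PSNR from the mean squared error of each frame and then averages over the sequence. Frames are represented as flat lists of 8-bit pixel values, and the sample data is made up for illustration.

```python
import math


def frame_psnr(reference, distorted, peak=255):
    """PSNR in dB between two equally sized frames given as flat lists of pixel values."""
    if len(reference) != len(distorted):
        raise ValueError("frames must have the same number of pixels")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)


def sequence_psnr(reference_frames, distorted_frames):
    """Average of the per-frame PSNR values over a whole sequence."""
    scores = [frame_psnr(r, d) for r, d in zip(reference_frames, distorted_frames)]
    return sum(scores) / len(scores)


# Two tiny made-up 2x2 grayscale frames, purely for illustration.
ref = [[100, 110, 120, 130], [50, 60, 70, 80]]
dist = [[101, 109, 121, 129], [52, 58, 72, 78]]
print(sequence_psnr(ref, dist))
```

Some tools average the MSE over all frames first and convert to dB only once, which gives a slightly different number, so compare scores only when they come from the same tool.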

PSNR-HVS and PSNR-HVS-M are extensions of PSNR that try to emulate human visual perception, so they should be more accurate. VQMT and MSU can calculate PSNR, PSNR-HVS and PSNR-HVS-M between two videos.

SSIM, MS-SSIM

Structural Similarity (SSIM) is as easy to calculate as PSNR, and it delivers more accurate results, but still on a frame-by-frame basis. ffmpeg can calculate SSIM with its ssim filter, which is used the same way as the psnr filter.

You can also use VQMT or MSU. These tools also include MS-SSIM, which gives better (i.e., more representative) results than SSIM, as well as a few other derivatives.

The results should be similar to PSNR. Again, you need to compare a reference to a processed video for this to work, and both videos should be of the same size.
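The reference-versus-processed comparison can be run with ffmpeg's psnr and ssim filters. The sketch below only builds the command line; the file names are placeholders, and it assumes both files already have the same frame size and frame rate.

```python
import shlex


def ffmpeg_metric_command(distorted, reference, metric="ssim"):
    """Build an ffmpeg argument list that compares two videos with the psnr or ssim filter."""
    if metric not in ("psnr", "ssim"):
        raise ValueError("metric must be 'psnr' or 'ssim'")
    return [
        "ffmpeg",
        "-i", distorted,    # first input: the processed video
        "-i", reference,    # second input: the reference video
        "-lavfi", metric,   # the filter takes both inputs and logs the score
        "-f", "null", "-",  # decode only, write no output file
    ]


cmd = ffmpeg_metric_command("distorted.mp4", "reference.mp4", "ssim")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
# The score is printed in ffmpeg's log output at the end of the run.
```

If the two files differ in size, one of them has to be scaled to match before the comparison, and that scaling step itself affects the score.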

VMAF

Video Multi-Method Assessment Fusion by Netflix is a set of tools to calculate video quality based on some existing metrics, which are then fused by machine learning methods into a final score between 0 and 100. Netflix explain it as follows:

[VMAF] predicts subjective quality by combining multiple elementary quality metrics. The basic rationale is that each elementary metric may have its own strengths and weaknesses with respect to the source content characteristics, type of artifacts, and degree of distortion. By ‘fusing’ elementary metrics into a final metric using a machine-learning algorithm - in our case, a Support Vector Machine (SVM) regressor - which assigns weights to each elementary metric, the final metric could preserve all the strengths of the individual metrics, and deliver a more accurate final score.

You can also use ffmpeg to calculate VMAF scores.
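A rough sketch of how that could look: the first function builds an ffmpeg command using the libvmaf filter (this requires an ffmpeg build with libvmaf enabled), and the second averages per-frame scores from the JSON log. The log layout used here is an assumption and the sample log is made up, so check what your libvmaf version actually writes.

```python
import json


def ffmpeg_vmaf_command(distorted, reference, log_path="vmaf.json"):
    """Build an ffmpeg argument list that runs the libvmaf filter and writes a JSON log."""
    return [
        "ffmpeg",
        "-i", distorted,   # first input: the processed video
        "-i", reference,   # second input: the reference video
        "-lavfi", f"libvmaf=log_fmt=json:log_path={log_path}",
        "-f", "null", "-",
    ]


def mean_vmaf(log_text):
    """Average the per-frame VMAF scores found in a JSON log.

    Assumes a layout of {"frames": [{"metrics": {"vmaf": <score>}}, ...]};
    this is an assumption, not a documented guarantee.
    """
    frames = json.loads(log_text)["frames"]
    scores = [frame["metrics"]["vmaf"] for frame in frames]
    return sum(scores) / len(scores)


# Made-up log excerpt, only to show the parsing step.
sample_log = '{"frames": [{"metrics": {"vmaf": 90.0}}, {"metrics": {"vmaf": 80.0}}]}'
print(mean_vmaf(sample_log))
```

As with PSNR and SSIM, both inputs need the same frame size, so a lower-resolution encode has to be scaled up to the reference resolution first.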

VQM

The Video Quality Metric was validated in the Video Quality Experts Group (VQEG) and is a very good full-reference algorithm. You can download VQM for free or use the implementation from MSU.

When you register and download, you want to use the NTIA General Model or the Video Quality Model with Variable Frame Delay.

AVQT

Apple have developed their own video quality tool (Advanced Video Quality Tool, AVQT), which works in a full-reference manner, so it requires an input video and a degraded version. You can only use this on Apple machines with Apple Silicon processors.

So far, only a few details are known about how well this tool works. In contrast to other tools like VMAF, it can handle much larger display resolutions and even HDR. However, its accuracy has not yet been independently validated.

ITU-T P.1204.3

This is an ITU-T standard for bitstream-based evaluation of video quality. It is a short-term video quality prediction model that uses full bitstream data to estimate video quality scores on a segment level (for segments of about 10 seconds in length).

A reference implementation can be found on GitHub.

Other Metrics

PEVQ is a standardized full-reference metric under ITU-T J.246. It is aimed at multimedia signals, but not HD video.
VQuad-HD is another full-reference metric standardized as ITU-T J.341. Since it's newer, it's better suited for HD video.

Both of them are commercial solutions, and you won't find software to download for them.

There are also some ITU standards on no-reference metrics, such as ITU-T P.1201 and ITU-T P.1202, which work with parameters from the bitstream for IPTV streaming. ITU-T P.1203 can be used for adaptive streaming cases.

Summary

If you just seek to compare simple objectively measurable criteria like:

Frame size
Bit rate
Frames per second
Video resolution

… a simple call to ffprobe input.mp4 should give you all the details you need at the beginning. You could then summarize this in a spreadsheet. Note that when you encode videos, x264 for example will log stuff like PSNR straight to a file if you need to, so you can use these values later.
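If you would rather fill that spreadsheet from a script, ffprobe can print its findings as JSON. The sketch below parses such output for the first video stream; the sample JSON is made up, and which fields are present depends on the container, so treat the field handling as an assumption.

```python
import json
from fractions import Fraction

# Append the file name to this list to get JSON output from ffprobe.
FFPROBE_ARGS = ["ffprobe", "-v", "error", "-print_format", "json",
                "-show_format", "-show_streams"]


def parse_rate(text):
    """Turn a rate such as '24000/1001' into a float, or None if it is unusable."""
    try:
        return float(Fraction(text))
    except (ValueError, ZeroDivisionError):
        return None


def summarize(ffprobe_json):
    """Extract codec, frame size, frame rate and bitrate of the first video stream."""
    info = json.loads(ffprobe_json)
    video = next(s for s in info["streams"] if s.get("codec_type") == "video")
    # Stream-level bit_rate is missing in some containers; fall back to the container value.
    bit_rate = video.get("bit_rate") or info.get("format", {}).get("bit_rate")
    return {
        "codec": video.get("codec_name"),
        "width": video.get("width"),
        "height": video.get("height"),
        "fps": parse_rate(video.get("avg_frame_rate", "")),
        "bit_rate": int(bit_rate) if bit_rate else None,
    }


# Made-up ffprobe output, only to show the parsing step.
sample = json.dumps({
    "streams": [
        {"codec_type": "audio", "codec_name": "aac"},
        {"codec_type": "video", "codec_name": "h264", "width": 1280, "height": 720,
         "avg_frame_rate": "24000/1001", "bit_rate": "1500000"},
    ],
    "format": {"bit_rate": "1650000"},
})
print(summarize(sample))
# Real use: subprocess.run(FFPROBE_ARGS + ["input.mp4"], capture_output=True,
#                          text=True, check=True).stdout
```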

As for how to weigh these criteria, you should probably emphasize the bitrate – but only if you know that the codec is the same. You could generally say that when you have two encodes of the same original, and both videos use x264, the one with higher bitrate will be better. Even more generally, you should choose a lower resolution when you have two videos with the same bitrate, since the degradation due to upscaling is not as bad as the degradation due to low bitrate.
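The rule of thumb above could be written down roughly like this. The dictionary keys are hypothetical, and restricting the bitrate rule to videos with the same frame size is an assumption of this sketch; the function returns None whenever the heuristic has nothing to say.

```python
def pick_by_rule_of_thumb(a, b):
    """Return the video the rule of thumb prefers, or None if it cannot decide.

    Each video is a dict with the hypothetical keys
    'codec', 'width', 'height' and 'bit_rate'.
    """
    if a["codec"] != b["codec"]:
        return None  # bitrates of different codecs are not comparable
    size_a = a["width"] * a["height"]
    size_b = b["width"] * b["height"]
    if size_a == size_b and a["bit_rate"] != b["bit_rate"]:
        # Same codec and frame size: prefer the higher bitrate.
        return a if a["bit_rate"] > b["bit_rate"] else b
    if a["bit_rate"] == b["bit_rate"] and size_a != size_b:
        # Same codec and bitrate: prefer the lower resolution.
        return a if size_a < size_b else b
    return None


hd = {"codec": "h264", "width": 1280, "height": 720, "bit_rate": 1_500_000}
full_hd = {"codec": "h264", "width": 1920, "height": 1080, "bit_rate": 1_500_000}
print(pick_by_rule_of_thumb(hd, full_hd)["height"])
```

This only applies to two encodes of the same original; it says nothing about files that come from different sources.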

Comparing different codecs according to their bit rate is not possible unless you know more about the content and the individual encoding settings. Frame rate is a very subjective thing too and should be counted into your measurements if it is well below 25 Hz.

To summarize, heavily emphasize the bitrate if it's the only thing you have. Don't forget to use your eyes, too :)

I'm unaware of any tool which will give you a final recommendation or score, but using FFmpeg, you can output all the details you listed in the question.

On the command line, ffmpeg -i followed by the file name will list the information about the video. From there, you can write a script to parse the information and weight it as you see fit.