Video Quality Metric

Video quality metrics are algorithms de-signed to predict how actual viewers wouldvgauge video quality. These metrics are used primarily for comparing codecs and encoding settings, video bitrates.

Understanding Video Quality Metrics

PSNR

Probably the most widely used metric, but also recognized as having the lowest predictive value. Still cited by Netflix, Facebook, and other companies in codec comparisons and similar applications, but usage is declining.

PSNR measures decibels on a scale from 1–100. Though these numbers are not universally accepted, Netflix has posited that values in excess of 45dB yield no perceivable benefits, while values below 30 are almost always accompanied by visual artifacts. These observations have proven extremely useful for my work, but only when comparing full-resolution output to full-resolution source. When applied to lower rungs in an encoding ladder, higher numbers are better, but lose their ability to predict a subjective rating. For example, for 360p video compared to the original 1080p source, you’ll seldom see a PSNR score higher than 39dB, even if there are no visible compression artifacts.

SSIM

Slightly higher predictive value than PSNR, and less well-known, but favored by some codec researchers and compression engineers.It’s scoring system anticipates a very small range from -1 to +1, with higher scores better. Most high-quality video is around .98 and above, which complicates comparisons. While you can mathematically calculate how much better .985 is than .982, at the end of the day, it still feels irrelevant.

VMAF

Invented by Netflix and then open-sourced, VMAF is widely available. Designed and tuned for use in evaluating streams encoded for multiple-resolu- tion rungs on an encoding ladder, VMAF is the en- gine behind Netflix’s highly respected per-title and per-clip encoding stacks.VMAF scores also rank from 1–100. While higher scores are always better, individual scores, like a rating of 55 for a 540p file, have no predictive value of subjective quality. You can’t tell if that means the video is perfect or awful. That said, when analyzing an encoding ladder, VMAF scores typically run from the low teens or lower for 180p streams, to 98+ for 1080p streams, which meaningfully distinguishes the scores. In addition, VMAF differences of 6 points or more equals a just-noticeable difference (JND), which is very useful for analyzing a number of encoding-related scenarios, including codec comparisons.

Tool for measuring metrics

We used FFMPEG, a open-source tool that can compute both PSRN and SSIM and VMAF with vmaf library at Netflix - https://github.com/Netflix/vmaf

Credited to: SK (haisk@uiza.io)