How can I normalize the volume of a single video file?

I have a video file that was recorded from TV. During recording, we changed the volume level several times.

Is it possible to (approximately) normalize the volume of the video's audio track, using a tool like ffmpeg and some heuristics? I am looking at two problems:

  1. What parameter/algorithm would be used for a single video file? Given that the video is not a song (and hence can have valid portions of silence or low sound), a simple normalization may not give good results.

  2. What tool (command/options) would I use to normalize the sound?


Solution 1:

Normalization vs. compression

Normalization is not what you're trying to achieve. If the audio track has different volume levels, normalization applies one gain to the whole track at once, so that the loudest part (the peak) goes to 0 dBFS or slightly below, depending on the implementation. The loudness difference between quiet and loud parts therefore stays the same as before and remains audible. So you're right: "simple normalization" doesn't do the job here.
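To see why a single gain cannot help, here is a minimal Python sketch of peak normalization. The sample values and the helper names `peak_normalize` and `level_db` are made up for illustration and are not taken from any particular tool.

```python
import math

def peak_normalize(samples, target_dbfs=0.0):
    """Scale all samples by one gain so the largest peak hits target_dbfs."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    target_amplitude = 10 ** (target_dbfs / 20)
    gain = target_amplitude / peak
    return [s * gain for s in samples]

def level_db(samples):
    """Peak level of a segment in dB relative to full scale (1.0)."""
    return 20 * math.log10(max(abs(s) for s in samples))

# Hypothetical recording: a quiet stretch followed by a loud stretch.
quiet = [0.05, -0.04, 0.05, -0.05]
loud = [0.5, -0.4, 0.5, -0.5]

normalized = peak_normalize(quiet + loud)

# Level difference between the loud and quiet stretch, before and after.
gap_before = level_db(loud) - level_db(quiet)
gap_after = level_db(normalized[4:]) - level_db(normalized[:4])
print(round(gap_before, 2), round(gap_after, 2))
```

Both gaps come out at about 20 dB: the whole recording gets louder, but the quiet stretch is just as far below the loud one as before.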

What you need to do is compress the audio signal (dynamic range compression, not file-size compression). Compression reduces the dynamic range of the signal, so that loud parts and quiet parts are "closer" to each other and the difference is less obvious. This is what radio stations frequently do: they apply heavy compression to their broadcast tracks so that they sound good in loud environments such as cars and also when listened to at lower volume levels. The downside is that sometimes you hear the chorus of a song (which should sound louder) come out quieter than the other parts.
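As a rough illustration of what a compressor does to levels, the following sketch applies the usual static compression curve: anything above the threshold is scaled down by the ratio. The threshold of -30 dB, the 4:1 ratio, the makeup gain, and the function name are assumptions chosen for the example, not recommended settings.

```python
def compress_level_db(level_db, threshold_db=-30.0, ratio=4.0):
    """Static compressor curve, working on levels in dB.

    Levels at or below the threshold pass unchanged; the part of the
    level that exceeds the threshold is divided by the ratio.
    """
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

# Hypothetical levels of a quiet and a loud passage, 20 dB apart.
quiet_db, loud_db = -26.0, -6.0

out_quiet = compress_level_db(quiet_db)
out_loud = compress_level_db(loud_db)

# Makeup gain (illustrative): lift everything so the loud part sits at -1 dB.
makeup_db = -1.0 - out_loud

print("gap before:", loud_db - quiet_db, "dB")
print("gap after: ", out_loud - out_quiet, "dB")
print("final levels:", out_quiet + makeup_db, out_loud + makeup_db)
```

With these numbers the 20 dB gap shrinks to 5 dB, which is the reduction in dynamic range described above. The makeup gain afterwards is an ordinary constant gain, so it raises the overall level without changing the gap.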

Practical approach

I would extract the audio signal from the video and then open it in a tool like Audacity. It has a built-in compressor that you can use to reduce the dynamic range.

Here are some guidelines for its settings (depends on the file though, so you should just experiment and see what works best for you):

  • Threshold: The threshold is the loudness level at which the compressor kicks in. If you have very quiet parts, set the threshold low enough that the compressor is active most of the time.
  • Ratio: The ratio should be rather high. Values that are too high might make the track sound unnatural, though.
  • Attack/Release times: Experiment with these. Normally, you want a short attack time and a longer release time. Bad settings here can lead to "pumping" sounds, also depending on the content.
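To make the role of these settings more concrete, here is a simplified feed-forward compressor sketch in Python. It is not how Audacity implements its compressor. All default values and names are assumptions for illustration, and the level is measured on each raw sample, whereas a real compressor would normally follow a smoothed peak or RMS envelope.

```python
import math

def smoothing_coeff(time_ms, sample_rate):
    """One-pole smoothing coefficient for a time constant given in ms."""
    return math.exp(-1.0 / (time_ms / 1000.0 * sample_rate))

def compress(samples, sample_rate, threshold_db=-30.0, ratio=4.0,
             attack_ms=5.0, release_ms=200.0):
    """Apply gain reduction that is smoothed by attack and release times."""
    attack = smoothing_coeff(attack_ms, sample_rate)
    release = smoothing_coeff(release_ms, sample_rate)
    reduction_db = 0.0  # current smoothed gain reduction, always >= 0
    out = []
    for s in samples:
        level_db = 20 * math.log10(max(abs(s), 1e-9))  # avoid log(0)
        over_db = max(level_db - threshold_db, 0.0)
        target_db = over_db * (1 - 1 / ratio)  # reduction the curve asks for
        # Reduction has to grow: use attack. Has to shrink: use release.
        coeff = attack if target_db > reduction_db else release
        reduction_db = coeff * reduction_db + (1 - coeff) * target_db
        out.append(s * 10 ** (-reduction_db / 20))
    return out

# Hypothetical test signal: one second of loud square wave at 8 kHz,
# followed by 5 ms (40 samples) of quiet signal.
rate = 8000
loud = [0.5 if i % 2 == 0 else -0.5 for i in range(rate)]
quiet = [0.01] * 40
out = compress(loud + quiet, rate)

print("first loud sample:", abs(out[0]))          # attack has barely started
print("settled loud sample:", abs(out[rate - 1]))  # full reduction applied
print("first quiet sample:", abs(out[rate]))       # release still holding gain down
```

A short attack means the reduction reaches its target quickly when a loud part starts. A long release means the gain comes back up slowly afterwards, which is where audible "pumping" can come from if the times do not suit the material.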

After that, you can add the audio track back to the video file (there are several tutorials online, and probably some SO/SU questions as well).
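For the extraction and re-adding steps, one possible route is ffmpeg. The sketch below only builds and prints the two command lines, assuming ffmpeg is installed and on the PATH. The file names are placeholders, and the choice of 16-bit WAV for editing and AAC for the final audio is an assumption, not a requirement.

```python
import shlex

def extract_audio_cmd(video_path, wav_path):
    """ffmpeg command that drops the video (-vn) and writes 16-bit PCM WAV."""
    return ["ffmpeg", "-i", video_path, "-vn", "-c:a", "pcm_s16le", wav_path]

def remux_cmd(video_path, audio_path, output_path):
    """ffmpeg command that combines the original video stream with new audio.

    The video stream from the first input is copied without re-encoding;
    the audio from the second input is encoded to AAC.
    """
    return ["ffmpeg", "-i", video_path, "-i", audio_path,
            "-map", "0:v", "-map", "1:a",
            "-c:v", "copy", "-c:a", "aac", output_path]

# Placeholder file names for illustration only.
extract = extract_audio_cmd("recording.mkv", "audio.wav")
remux = remux_cmd("recording.mkv", "audio_compressed.wav", "recording_fixed.mkv")

print(shlex.join(extract))
print(shlex.join(remux))

# To actually run a command, something like this would work:
#   import subprocess
#   subprocess.run(extract, check=True)
```

Here `audio.wav` would be opened in Audacity and the compressed result exported as `audio_compressed.wav` before running the second command.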