Reduce background noise and optimize the speech from an audio clip using ffmpeg
I extract audio clips from a video file for speech recognition. These videos come from mobile/other handmade devices and hence contain a lot of noise. I want to reduce the background noise of the audio so that the speech that I relay to my speech recognition engine is clear. I am using ffmpeg to do all of this stuff, but am stuck at the noise reduction phase.
So far I have tried the following filters:
ffmpeg-20140324-git-63dbba6-win64-static\bin>ffmpeg -i input.wav -filter_complex "highpass=f=400,lowpass=f=1800" out2.wav
ffmpeg -i input.wav -af "equalizer=f=1000:width_type=h:width=900:g=-10" output.wav
ffmpeg -i input.wav -af "bandreject=f=1200:width_type=h:width=900:g=-10" output.wav
But the results are very disappointing. My reasoning was that since speech falls within the 300-3000 Hz range, I can filter out all other frequencies to suppress any background noise. What am I missing?
Also, I read about Wiener filters that could be used for speech enhancement and found this, but am not sure how to use it.
Solution 1:
If you are looking to isolate audible speech, try combining a high-pass filter with a low-pass filter. For usable audio, I have found that filtering out 200 Hz and below, then 3000 Hz and above, does a pretty good job of keeping the voice intelligible.
ffmpeg -i <input_file> -af "highpass=f=200, lowpass=f=3000" <output_file>
In this example, the high-pass filter comes first to cut the lower frequencies, then the low-pass filter cuts the higher frequencies. If needed, you could run your file through this more than once to further attenuate loud content that remains in the cut frequency ranges.
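Instead of re-encoding the file several times, a similar effect can be had in one run by listing each filter twice in the chain, since cascading two identical filters doubles their attenuation in dB. A minimal sketch, assuming placeholder file names input.wav and output.wav:

```shell
# Speech band-pass with each filter applied twice in a single pass.
# The doubled filters give a steeper roll-off outside 200-3000 Hz.
filters="highpass=f=200,highpass=f=200,lowpass=f=3000,lowpass=f=3000"

# input.wav and output.wav are placeholder names; nothing runs if the input is missing.
if [ -f input.wav ]; then
    ffmpeg -i input.wav -af "$filters" output.wav
fi
```

Doing it in one command also avoids decoding and re-encoding the audio between passes.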
Solution 2:
FFmpeg now has 3 native filters to deal with background noise:
- afftdn: Denoises audio samples with FFT
- anlmdn: Reduces broadband noise in audio samples using a Non-Local Means algorithm
- arnndn: Reduces noise from speech using Recurrent Neural Networks. Examples for model files to load can be found here.
Also, for some time now, one can use ladspa (look for noise-suppressor) and/or lv2 (look for speech denoiser) filters with FFmpeg.
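As a rough starting point, each of the three native filters can be tried on the same clip and the results compared by ear. This is only a sketch: the file names are placeholders, model.rnnn is a hypothetical path to a model file you would have to download yourself, and the noise floor of -25 dB is just one value to start from.

```shell
# Placeholder names: input.wav, the out_*.wav files, and the model path are assumptions.
in=input.wav
model=model.rnnn                 # hypothetical path to a model file for arnndn

fft_filter="afftdn=nf=-25"       # FFT denoiser with a -25 dB noise floor
nlm_filter="anlmdn"              # non-local means denoiser, default settings
rnn_filter="arnndn=m=$model"     # RNN denoiser; the model option is required

if [ -f "$in" ]; then
    ffmpeg -i "$in" -af "$fft_filter" out_afftdn.wav
    ffmpeg -i "$in" -af "$nlm_filter" out_anlmdn.wav
    # Only run arnndn when the model file is actually present.
    if [ -f "$model" ]; then
        ffmpeg -i "$in" -af "$rnn_filter" out_arnndn.wav
    fi
fi
```

Which one works best depends on the kind of noise, so it is worth listening to all three outputs before tuning options.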
Solution 3:
Update: FFmpeg recently added afftdn, which uses the noise threshold per-FFT-bin method described below, with various options for adapting / figuring out appropriate threshold values on the fly.
anlmdn (non-local means) is a technique that works well for video; I haven't tried the audio filter.
Either of these should be much better than highpass / lowpass, unless your only noise is a 60Hz hum or something. (Human speech can still sound ok in a pretty narrow bandpass, but there are much better ways to clean up a broadband noise background hiss.)
When this answer was first written, ffmpeg didn't have any decent audio filters for noise reduction built in. Audacity has a fairly effective NR filter, but it's designed for 2-pass operation: first with a sample of just the noise, and then with the input.
The comments at the top of https://github.com/audacity/audacity/blob/master/src/effects/NoiseReduction.cpp explain how it works. Basically: suppress every FFT bin that's below the threshold, so it only lets signals through when they're louder than the noise floor in that frequency band. It can do amazing things without causing problems. It's like a band-pass filter that adapts to the signal. Since the energy of the noise is spread over the whole spectrum, only letting through a few narrow bands of it will reduce the total noise energy a LOT.
See also Audio noise reduction: how does audacity compare to other options? for more details of how it works, and that thresholding FFT bins in one way or another is the basis of typical commercial noise-reduction filters, too.
Porting that filter to ffmpeg would be a bit awkward. Maybe implementing it as a filter with 2 inputs, instead of a 2-pass filter, would work best. Since it only needs a few seconds to get a noise profile, it's not like it has to read through the whole file. And you SHOULDN'T feed it the whole audio stream as a noise sample, anyway. It needs to see a sample of JUST noise to set thresholds for each FFT bin.
So yeah, a 2nd input, rather than 2-pass, would make sense. But that makes it a lot less easy to use than most ffmpeg filters. You'd need a bunch of voodoo with stream split / time-range extract. And of course you need manual intervention, unless you have a noise sample in a separate file that will be appropriate for multiple input files. (One noise sample from the same mic / setup should be fine for all clips from that setup.)
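With afftdn (from the update at the top of this answer), the "sample of just noise" idea can apparently be done without a second input: as I understand the filter documentation, afftdn accepts a sample_noise (sn) command, and asendcmd can start and stop it at given timestamps. A sketch, assuming the first 0.4 seconds of input.wav contain only noise and that your FFmpeg build is recent enough to support the command:

```shell
# Profile the noise during 0.0-0.4 s (assumed to be speech-free), then
# reduce noise by 20 dB starting from a -40 dB noise floor.
filters="asendcmd=0.0 afftdn sn start,asendcmd=0.4 afftdn sn stop,afftdn=nr=20:nf=-40"

# input.wav and output.wav are placeholder names; nothing runs if the input is missing.
if [ -f input.wav ]; then
    ffmpeg -i input.wav -af "$filters" output.wav
fi
```

The timestamps still need manual intervention: you have to find a stretch of each recording (or each mic setup) that has no speech in it.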
Solution 4:
I had a video with very bad background noise. I managed to fix it this way. First I ran two passes with the following commands:
ffmpeg -i input.mp4 -af "afftdn=nf=-25" file1.mp4
ffmpeg -i file1.mp4 -af "afftdn=nf=-25" file2.mp4
Then, to make the speech clearer, I used:
ffmpeg -i file2.mp4 -af "highpass=f=200, lowpass=f=3000" file3.mp4
At the end I increased the volume with:
ffmpeg -i file3.mp4 -af "volume=4" finaloutput.mp4
This way I managed to get fairly good audio. That said, sound quality is subjective, and what is good for me may not be good for others. Hope it helps. M.M.
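The same steps could probably be chained in a single command, which avoids re-encoding the file at every stage. A sketch under that assumption; the `-c:v copy` option is my addition (it copies the video stream unchanged) and was not part of the commands above:

```shell
# Two afftdn passes, speech band-pass, then a 4x volume boost, all in one filter chain.
filters="afftdn=nf=-25,afftdn=nf=-25,highpass=f=200,lowpass=f=3000,volume=4"

# input.mp4 and finaloutput.mp4 are the names used above; nothing runs if the input is missing.
if [ -f input.mp4 ]; then
    ffmpeg -i input.mp4 -af "$filters" -c:v copy finaloutput.mp4
fi
```

Since the audio is encoded only once, the result may differ slightly from the four-step version.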