Which audio encoders in FFmpeg support 8 kHz?
I have an old video (made by a Casio Exilim EX-Z40, if it matters), whose audio stream ffprobe reports as pcm_u8, 8000 Hz, mono, u8.
I would like to transcode it into something modern.
Transcoding with FFmpeg defaults fails:
libfaac doesn't support this output format!
So presumably libfaac doesn't support 8 kHz, because -c:a copy works.
Which encoders support an 8 kHz sampling rate? The list found here barely mentions sampling rates at all.
Can I script something that tries every installed codec, from…
ffmpeg -codecs | grep EA
…to see directly which ones work?
The native FFmpeg AAC encoder (-c:a aac) supports an 8000 Hz sample rate:
ffmpeg -h encoder=aac
...
Supported sample rates: 96000 88200 64000 48000 44100 32000 24000 22050 16000 12000 11025 8000 7350
It will automatically choose the sample rate that most closely matches the input, so you don't need to declare -ar:
ffmpeg -i input.mov -c:a aac output.m4a
Which audio encoders in FFmpeg support 8 kHz?
aac, aptx, aptx_hd, dca, flac, g723_1, libfdk_aac, libmp3lame, libopus, libspeex, libvorbis, real_144, wavpack, many pcm variants.
There are probably others, but reporting of supported_samplerates is inconsistent.
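One way to check a particular build is to read the "Supported sample rates:" line that ffmpeg -h encoder=NAME prints, as in the aac output above. The following is a minimal Python sketch, assuming ffmpeg is on the PATH; the function names are made up for illustration. Because the reporting is inconsistent, an encoder that prints no such line is treated as "unknown", not as "unsupported".

```python
import re
import subprocess


def parse_supported_sample_rates(help_text):
    """Return the rates from a 'Supported sample rates:' line, or None.

    None means the encoder did not report a list at all, which is not
    the same thing as "8000 Hz is unsupported".
    """
    match = re.search(r"Supported sample rates:\s*([\d ]+)", help_text)
    if match is None:
        return None
    return [int(rate) for rate in match.group(1).split()]


def encoder_help(name):
    """Run `ffmpeg -h encoder=NAME` and return its standard output."""
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-h", f"encoder={name}"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def reports_rate(name, rate=8000):
    """True/False if the encoder lists its rates, None if it lists nothing."""
    rates = parse_supported_sample_rates(encoder_help(name))
    if rates is None:
        return None
    return rate in rates
```

Calling reports_rate("aac") on a current build should return True, given the output quoted above. For encoders that return None, the only reliable check is to try a short test encode.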
I would like to transcode it into something modern.
libfaac was removed from FFmpeg years ago and is not considered a modern AAC encoder. Your ffmpeg must be ancient. Update and use the native FFmpeg AAC encoder, or compile and use libfdk_aac.
If you want the most modern option, use libopus.
But when I tried [aac], compared to the original, the file size increased and some high frequencies were attenuated.
Since I suspect your ffmpeg is very old, you are likely missing the major quality updates to the aac encoder. Upgrade and quality will likely improve.
Sampling rate and codec are different parameters. Most likely you want something along the lines of
-ar 48000 -c:a aac
to upsample from 8 kHz to 48 kHz and then compress to AAC.
8 kHz is fairly standard for speech, known as 'narrowband'. If this is speech then you should have plenty of options, although not that many are supported by FFmpeg out of the box. Probably the best options are
- AMR - you can compile libopencore-amrnb into FFmpeg for support
- Opus, which uses the SILK speech codec (originally from Skype) at low bitrates and CELT at higher ones
However, 8 kHz 8-bit PCM isn't a very good source in the first place: most encoders will expect or hope for better input, e.g. 8-bit G.711 mu-law, which is effectively about 14-bit linear data companded into an 8-bit, floating-point-like format. They may not do well with pure 8-bit PCM input, as it might not fit the speech patterns they're modelled for.
It's also a fairly small file already, and it's possible that your video container won't support more complicated codecs. So I think this is more trouble than it's worth, and I'd leave the audio as-is.
Opus is generally considered the best low-bitrate codec available, and doesn't have problems with an 8 kHz input sample rate. The resulting Opus stream can still be decoded to whatever sample rate is convenient for the decoder. (Like other lossy codecs, it compresses based on frequency bands after doing an FFT, but some other codecs apparently only want to decode to the same sample rate as the input.) As other answers point out, you can get FFmpeg to resample the input before giving it to the codec, but you don't need that for Opus.
Try ffmpeg -c:a libopus -b:a 24k -frame_duration 120 for 24 kbit/s Opus.
Perhaps worth trying: -application voip to tune for "improved speech intelligibility" instead of the default audio profile.
Setting -frame_duration to the highest value reduces overhead, I think. You don't care about encoder / decoder latency because you just have files, not real-time 2-way voice chat. So you can let it buffer 120 ms of audio and pack together multiple CELT or SILK frames to reduce redundancy of frame headers.
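Putting the options above together, here is a small Python sketch that assembles the full command line. The -c:v copy part and the .mkv output container are my assumptions, not something tested with this camera's files, and opus_command is just an illustrative helper name.

```python
import shlex


def opus_command(src, dst, bitrate_kbps=24, application="voip", frame_ms=120):
    """Build an ffmpeg argument list for a low-bitrate libopus encode.

    Assumption: the video stream is copied untouched (-c:v copy) and the
    output container chosen by the caller can hold Opus audio.
    """
    return [
        "ffmpeg",
        "-i", src,
        "-c:v", "copy",                     # leave the video as-is (assumption)
        "-c:a", "libopus",
        "-b:a", f"{bitrate_kbps}k",         # target audio bitrate
        "-application", application,         # "voip" tunes for speech
        "-frame_duration", str(frame_ms),    # longest frames, least overhead
        dst,
    ]


# Print the command so it can be inspected before running it.
print(shlex.join(opus_command("input.mov", "output.mkv")))
```

Pass the list to subprocess.run() to actually execute it, or paste the printed line into a shell.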
The best available Opus encoder is the free and open source libopus (https://opus-codec.org) so FFmpeg can just use it, unlike with AAC where the best encoders are closed-source.
Opus has special modes for very low bitrate speech (like 16kb/s), detecting speech and even switching over to a speech-specific encoder (SILK) at low bitrates.
Opus's low-bitrate coding tools are similar to what HE-AACv2 can do, see the wikipedia article.
But when I tried it, compared to the original, the file size increased ...
Part of the point of lossy compression is that you can choose the output bitrate, trading off against quality. Most codecs can use -b:a 32k for example to choose an audio bitrate of 32 kbit/s.
(For video, you can also trade off CPU time spent encoding, e.g. -preset veryslow vs. -preset medium. But compressing audio is cheap enough that most codecs don't have a lot of options for spending more CPU time to improve the bitrate vs. quality tradeoff.)
Mono 8-bit 8 kHz PCM has a bitrate of 64 kbit/s = 8 * 8000, so you're aiming for lower than that; otherwise you might as well keep your original files. PCM is just raw samples, so bitrate is just the product of sample rate and sample width, like the audio equivalent of a .bmp bitmap image. That's highly inefficient, and the reason better codecs were invented. (And as you know from listening, saving bitrate for PCM comes at a massive cost to quality and frequency range because bitrate is tied 1:1 with sample rate. That's not the case when you quantize in the frequency domain with a lossy codec.)
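To make the arithmetic concrete, here is a short Python sketch of the PCM bitrate calculation and the audio size that follows from a given bitrate. The function names and the 60-second clip length are only illustrative.

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels=1):
    """Bitrate of uncompressed PCM in bit/s: rate * width * channels."""
    return sample_rate_hz * bits_per_sample * channels


def audio_size_bytes(bitrate_bps, seconds):
    """Approximate audio payload size in bytes, ignoring container overhead."""
    return bitrate_bps * seconds / 8


original = pcm_bitrate(8000, 8)          # the pcm_u8, 8000 Hz, mono source
print(original)                          # 64000 bit/s, i.e. 64 kbit/s

# One minute of audio: original PCM vs. a 24 kbit/s lossy encode.
print(audio_size_bytes(original, 60))    # 480000.0 bytes
print(audio_size_bytes(24_000, 60))      # 180000.0 bytes
```

Any target bitrate at or above 64 kbit/s gives no size benefit over keeping the original stream.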
and some high frequencies were attenuated. So, worse than -c:a copy
FFmpeg's native AAC encoder -c:a aac used to be pretty bad, and you were using an old FFmpeg. https://trac.ffmpeg.org/wiki/Encode/HighQualityAudio says that as of 2017, aac is sometimes better than libfdk_aac for AAC-LC (low-complexity high bitrate). It doesn't mention HE-AAC, though, and that's what you want for low bitrate AAC.
libfdk_aac used to be the best open-source AAC encoder available, and maybe still is for HE-AAC. AFAIK, neither of them is as good as the best non-free AAC encoders, though.
For low-bitrate AAC, you really want HE-AAC, which adds more coding tools: https://en.wikipedia.org/wiki/High-Efficiency_Advanced_Audio_Coding. I'm not sure if -c:a aac can do that.
https://trac.ffmpeg.org/wiki/Encode/HighQualityAudio lists some recommended settings and ranges of useful bitrates for various encoders.
But you probably want Opus, or possibly AMR-NB (narrowband) for bitrates like 4 kbit/s. I don't know how old the quality vs. bitrate plot on the Opus wiki article is, but it shows AMR-NB at higher quality than Opus down below 8kb/s.
With that few bits, you might be able to understand speech but it won't sound nice. It's just a question of which codec is least horrible.