Detect silent audio via FFMPEG (entire file)
I am trying to pass audio and video files to FFmpeg and determine whether their audio is entirely silent. I don't need to detect whether an audio stream is present, only whether it is silent. Ideally the result would be a boolean, or a 0/1. I am able to output the silence information via:
ffmpeg -i FILE.mov -af silencedetect=noise=0.0001 -f null - 2>&1
I think I would need to check if the last silence_duration value is equal to the duration.
FFmpeg seems to round the values in its output to different numbers of decimal places (Duration = 15.67, silence_duration = 15.6667), so whatever accuracy is possible under these circumstances is OK.
I'm not sure how to parse the output in order to do this and any nudges in the right direction would be extremely helpful - Thanks!
Here are two possible examples that contain silence:
Entirely Silent File
ffmpeg version 4.4.1 Copyright (c) 2000-2021 the FFmpeg developers
built with Apple clang version 12.0.0 (clang-1200.0.32.29)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.4.1_3 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-avresample --enable-videotoolbox
libavutil 56. 70.100 / 56. 70.100
libavcodec 58.134.100 / 58.134.100
libavformat 58. 76.100 / 58. 76.100
libavdevice 58. 13.100 / 58. 13.100
libavfilter 7.110.100 / 7.110.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 9.100 / 5. 9.100
libswresample 3. 9.100 / 3. 9.100
libpostproc 55. 9.100 / 55. 9.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/Users/user/Desktop/SilenceAll.mov':
Metadata:
major_brand : qt
minor_version : 1
compatible_brands: qt
creation_time : 2021-11-23T16:08:58.000000Z
timecode : 01:00:00:09
Duration: 00:00:15.67, start: 0.000000, bitrate: 11795 kb/s
Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 8955 kb/s, 30 fps, 30 tbr, 3k tbn, 60 tbc (default)
Metadata:
handler_name : Video Handler
vendor_id : [0][0][0][0]
encoder : H.264
Stream #0:1(und): Audio: pcm_s24le (in24 / 0x34326E69), 48000 Hz, stereo, s32 (24 bit), 2304 kb/s (default)
Metadata:
handler_name : Sound Handler
vendor_id : [0][0][0][0]
Stream #0:2(und): Data: none (tmcd / 0x64636D74), 0 kb/s (default)
Metadata:
handler_name : Timecode Handler
timecode : 01:00:00:09
Stream mapping:
Stream #0:0 -> #0:0 (h264 (native) -> wrapped_avframe (native))
Stream #0:1 -> #0:1 (pcm_s24le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
Metadata:
major_brand : qt
minor_version : 1
compatible_brands: qt
timecode : 01:00:00:09
encoder : Lavf58.76.100
Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
Metadata:
handler_name : Video Handler
vendor_id : [0][0][0][0]
encoder : Lavc58.134.100 wrapped_avframe
Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s (default)
Metadata:
handler_name : Sound Handler
vendor_id : [0][0][0][0]
encoder : Lavc58.134.100 pcm_s16le
[silencedetect @ 0x7f93de904280] silence_start: 0.49 bitrate=N/A speed=11.7x
frame= 470 fps=0.0 q=-0.0 Lsize=N/A time=00:00:15.66 bitrate=N/A speed=31.7x
video:246kB audio:2938kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[silencedetect @ 0x7f93de904280] silence_end: 15.6667 | silence_duration: 15.6667
Silent at the Beginning and End of File (Two sections of silence, but not entirely silent)
ffmpeg version 4.4.1 Copyright (c) 2000-2021 the FFmpeg developers
built with Apple clang version 12.0.0 (clang-1200.0.32.29)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.4.1_3 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-avresample --enable-videotoolbox
libavutil 56. 70.100 / 56. 70.100
libavcodec 58.134.100 / 58.134.100
libavformat 58. 76.100 / 58. 76.100
libavdevice 58. 13.100 / 58. 13.100
libavfilter 7.110.100 / 7.110.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 9.100 / 5. 9.100
libswresample 3. 9.100 / 3. 9.100
libpostproc 55. 9.100 / 55. 9.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/Users/user/Desktop/SilenceTopTail.mov':
Metadata:
major_brand : qt
minor_version : 1
compatible_brands: qt
creation_time : 2021-11-23T16:09:32.000000Z
timecode : 01:00:00:09
Duration: 00:00:15.67, start: 0.000000, bitrate: 11795 kb/s
Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 8955 kb/s, 30 fps, 30 tbr, 3k tbn, 60 tbc (default)
Metadata:
handler_name : Video Handler
vendor_id : [0][0][0][0]
encoder : H.264
Stream #0:1(und): Audio: pcm_s24le (in24 / 0x34326E69), 48000 Hz, stereo, s32 (24 bit), 2304 kb/s (default)
Metadata:
handler_name : Sound Handler
vendor_id : [0][0][0][0]
Stream #0:2(und): Data: none (tmcd / 0x64636D74), 0 kb/s (default)
Metadata:
handler_name : Timecode Handler
timecode : 01:00:00:09
Stream mapping:
Stream #0:0 -> #0:0 (h264 (native) -> wrapped_avframe (native))
Stream #0:1 -> #0:1 (pcm_s24le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
Metadata:
major_brand : qt
minor_version : 1
compatible_brands: qt
timecode : 01:00:00:09
encoder : Lavf58.76.100
Stream #0:0(und): Video: wrapped_avframe, yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 30 fps, 30 tbn (default)
Metadata:
handler_name : Video Handler
vendor_id : [0][0][0][0]
encoder : Lavc58.134.100 wrapped_avframe
Stream #0:1(und): Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s (default)
Metadata:
handler_name : Sound Handler
vendor_id : [0][0][0][0]
encoder : Lavc58.134.100 pcm_s16le
[silencedetect @ 0x7fb140d0d100] silence_start: 0.49 bitrate=N/A speed=12.8x
[silencedetect @ 0x7fb140d0d100] silence_end: 5.16667 | silence_duration: 5.16667
[silencedetect @ 0x7fb140d0d100] silence_start: 10.1
frame= 470 fps=0.0 q=-0.0 Lsize=N/A time=00:00:15.66 bitrate=N/A speed=31.9x
video:246kB audio:2938kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[silencedetect @ 0x7fb140d0d100] silence_end: 15.6667 | silence_duration: 5.56667
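You can pipe the output of ffmpeg through awk for further processing. Since ffmpeg writes its log to stderr, keep the 2>&1 redirect from your command in front of the pipe: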
ffmpeg ... | awk '/silence_end/ && ($5 == $8) {print "silent"}'
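I didn't bother to check silence_start because, for the whole audio to be silent, silence_end needs to match silence_duration anyway. Note that this is a necessary condition rather than a sufficient one: a file that merely starts with silence also satisfies it (the first silence_end line of your second example reads 5.16667 | 5.16667), so compare the value against the file's Duration as well if you need to rule that case out.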
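If you want the 0/1 result from the question, one possible sketch is to wrap the awk step in a shell function that compares the longest reported silence_duration with the Duration line, within a tolerance. The function name is_all_silent and the 0.05 s default tolerance are assumptions made for illustration, not anything ffmpeg provides. The values are located by their labels rather than by field position, because ffmpeg's progress text can end up on the same line as a silencedetect message.

```shell
#!/bin/sh
# Hypothetical helper (not part of ffmpeg): reads an ffmpeg silencedetect log
# on stdin and exits 0 if one silence interval spans the whole file, else 1.
# Optional first argument: tolerance in seconds (assumed default: 0.05).
#
# Possible usage (needs ffmpeg; -hide_banner and -nostats only trim the log):
#   ffmpeg -hide_banner -nostats -i FILE.mov -af silencedetect=noise=0.0001 \
#     -f null - 2>&1 | is_all_silent && echo 1 || echo 0
is_all_silent() {
  awk -v tol="${1:-0.05}" '
    {
      for (i = 1; i < NF; i++) {
        # "Duration: 00:00:15.67," -> seconds (first occurrence only)
        if ($i == "Duration:" && !have_total) {
          split($(i + 1), t, "[:,]")
          total = t[1] * 3600 + t[2] * 60 + t[3]
          have_total = 1
        }
        # Remember the longest silence interval reported so far
        if ($i == "silence_duration:" && $(i + 1) + 0 > longest)
          longest = $(i + 1) + 0
      }
    }
    END {
      diff = total - longest
      if (diff < 0) diff = -diff
      # Silent only if a usable duration was found and one interval covers it
      if (have_total && total > 0 && diff <= tol) exit 0
      exit 1
    }
  '
}
```

With the two logs from the question, the first should give exit status 0 (15.6667 against 15.67) and the second exit status 1 (the longest interval is 5.56667 against 15.67). If the Duration line reads N/A, the function reports "not silent" rather than guessing.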