Split Up a Video Using FFMPEG through Scene Detection

I saw this thread, which does almost exactly what I want, but I'm looking to split on detected scene changes rather than on black frames:

Automatically split large .mov video files into smaller files at black frames (scene changes)?

For example, let's say I have a woman on screen from 0:01 -> 0:05, then a man in a different scene from 0:06 -> 0:09, and a second woman on screen from 0:10 -> 0:14.

This (ideally) would create three different video clips. I'd really like it down to the frame level, if possible, with autodetection for when the scenes change.

** UPDATED **

OK, I'm off to a great start. I've done the following using ffprobe:

ffprobe -show_frames -of compact=p=0 -f lavfi "movie=foo.mp4,select=gt(scene\,.4)" > foo.txt

This gives me a list of timestamps that appear to be exactly right! Now the next step: how do I take this list of timestamps and feed it back into ffmpeg to split the video? Here's a sample of the output.

media_type=video|key_frame=1|pkt_pts=972221|pkt_pts_time=10.802456|pkt_dts=972221|pkt_dts_time=10.802456|best_effort_timestamp=972221|best_effort_timestamp_time=10.802456|pkt_duration=N/A|pkt_duration_time=N/A|pkt_pos=5083698|pkt_size=6220800|width=1920|height=1080|pix_fmt=rgb24|sample_aspect_ratio=1:1|pict_type=I|coded_picture_number=0|display_picture_number=0|interlaced_frame=0|top_field_first=0|repeat_pict=0|tag:lavfi.scene_score=0.503364
media_type=video|key_frame=1|pkt_pts=2379878|pkt_pts_time=26.443089|pkt_dts=2379878|pkt_dts_time=26.443089|best_effort_timestamp=2379878|best_effort_timestamp_time=26.443089|pkt_duration=N/A|pkt_duration_time=N/A|pkt_pos=12736403|pkt_size=6220800|width=1920|height=1080|pix_fmt=rgb24|sample_aspect_ratio=1:1|pict_type=I|coded_picture_number=0|display_picture_number=0|interlaced_frame=0|top_field_first=0|repeat_pict=0|tag:lavfi.scene_score=1.000000
media_type=video|key_frame=1|pkt_pts=2563811|pkt_pts_time=28.486789|pkt_dts=2563811|pkt_dts_time=28.486789|best_effort_timestamp=2563811|best_effort_timestamp_time=28.486789|pkt_duration=N/A|pkt_duration_time=N/A|pkt_pos=13162601|pkt_size=6220800|width=1920|height=1080|pix_fmt=rgb24|sample_aspect_ratio=1:1|pict_type=I|coded_picture_number=0|display_picture_number=0|interlaced_frame=0|top_field_first=0|repeat_pict=0|tag:lavfi.scene_score=0.745838
media_type=video|key_frame=1|pkt_pts=2627625|pkt_pts_time=29.195833|pkt_dts=2627625|pkt_dts_time=29.195833|best_effort_timestamp=2627625|best_effort_timestamp_time=29.195833|pkt_duration=N/A|pkt_duration_time=N/A|pkt_pos=13485087|pkt_size=6220800|width=1920|height=1080|pix_fmt=rgb24|sample_aspect_ratio=1:1|pict_type=I|coded_picture_number=0|display_picture_number=0|interlaced_frame=0|top_field_first=0|repeat_pict=0|tag:lavfi.scene_score=0.678877
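
For what it's worth, pulling the timestamps out of that compact output is mostly a matter of splitting on "|" and "=". Below is a rough Python sketch, not a tested tool. The function names are made up for illustration, the sample lines are shortened versions of the output above, and the fallback to a pts_time field is an assumption about ffprobe builds that name the field differently.

```python
def parse_compact_line(line):
    """Turn one 'key=value|key=value' ffprobe compact line into a dict of strings."""
    fields = {}
    for item in line.strip().split("|"):
        key, sep, value = item.partition("=")
        if sep:  # skip anything that is not a key=value pair
            fields[key] = value
    return fields


def scene_times(lines):
    """Return (timestamp_in_seconds, scene_score) pairs, one per non-empty line."""
    pairs = []
    for line in lines:
        if not line.strip():
            continue
        fields = parse_compact_line(line)
        # Assumption: fall back to other timestamp fields if pkt_pts_time is absent.
        time_text = (
            fields.get("pkt_pts_time")
            or fields.get("pts_time")
            or fields.get("best_effort_timestamp_time")
        )
        if time_text is None or time_text == "N/A":
            continue
        score = float(fields.get("tag:lavfi.scene_score", "nan"))
        pairs.append((float(time_text), score))
    return pairs


# Shortened copies of two lines from foo.txt (most fields dropped for readability).
SAMPLE = [
    "media_type=video|key_frame=1|pkt_pts=972221|pkt_pts_time=10.802456|tag:lavfi.scene_score=0.503364",
    "media_type=video|key_frame=1|pkt_pts=2379878|pkt_pts_time=26.443089|tag:lavfi.scene_score=1.000000",
]

for seconds, score in scene_times(SAMPLE):
    print(f"{seconds:.6f}\t{score:.6f}")
```

In practice you would pass the lines of foo.txt instead of SAMPLE, e.g. scene_times(open("foo.txt")).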

You can use ffmpeg directly to detect scene changes and extract the first frame of each new scene on the fly, without printing and parsing the frame information:

ffmpeg -i foo.mp4 -vf select='gt(scene\,0.4)' -vsync vfr frame%d.png

The -vsync vfr option is required because image extraction does not handle a variable frame rate by default; see #1644.


Parse the ffprobe output to extract the timestamps, write them to a .txt file, and then pass them to ffmpeg's segment muxer, whose -segment_times option takes a comma-separated list of split points.
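
As a hedged illustration of that last step, here is a small Python sketch that turns a list of timestamps into an ffmpeg command line for the segment muxer. The function name, the frame_accurate switch and the scene%03d.mp4 output pattern are made up for the example; the timestamps are the ones from the question. With -c copy the cuts can only land on existing keyframes, so the sketch optionally re-encodes and forces keyframes at the split points instead.

```python
import shlex


def build_segment_command(source, times, pattern="scene%03d.mp4", frame_accurate=False):
    """Return an ffmpeg argument list that cuts `source` at each time in `times`."""
    # -segment_times expects increasing split points in seconds, comma-separated.
    split_points = ",".join(f"{t:.6f}" for t in sorted(times))
    cmd = ["ffmpeg", "-i", source]
    if frame_accurate:
        # Re-encode with default codecs and force a keyframe at every split point,
        # so each segment can start exactly where the scene changes.
        cmd += ["-force_key_frames", split_points]
    else:
        # Stream copy is fast, but segments start at the nearest following keyframe.
        cmd += ["-c", "copy"]
    cmd += [
        "-f", "segment",
        "-segment_times", split_points,
        "-reset_timestamps", "1",  # each output file starts near time zero
        pattern,
    ]
    return cmd


# Timestamps taken from the ffprobe output in the question.
times = [10.802456, 26.443089, 28.486789, 29.195833]
print(shlex.join(build_segment_command("foo.mp4", times)))
```

The printed line can be pasted into a shell, or the list can be handed to subprocess.run as-is.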

Accuracy won't be perfect, and there are plenty of issues you may run into unless you have total control over the incoming content.

It's worth noting that automatic scene detection is still an active research topic, so again, expect imperfect results.