Meaningful thumbnails for a Video using FFmpeg

FFmpeg can capture images from a video that can be used as thumbnails to represent it. The most common ways of doing that are covered in the FFmpeg Wiki.

However, I don't want to pick arbitrary frames at fixed intervals. I found some options that use FFmpeg filters to capture scene changes:

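The thumbnail filter tries to pick the most representative frame from each batch of consecutive frames (100 by default), so with -frames:v 1 this returns the pick from the first batch: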

ffmpeg -i input.mp4 -vf "thumbnail,scale=640:360" -frames:v 1 thumb.png

The following command selects only frames whose scene-change score against the previous frame is above 0.4 (roughly a 40% change, so probably a scene change) and writes up to 5 PNGs:

ffmpeg -i input.mp4 -vf "select=gt(scene\,0.4),scale=640:360" -frames:v 5 thumb%03d.png

Credit for the above commands goes to Fabio Sonnati. The second one seemed better because I could get n images and pick the best, but when I tried it, it generated the same image 5 times.

Some more investigation led me to:

ffmpeg -i input.mp4 -vf "select=gt(scene\,0.5)" -frames:v 5 -vsync vfr out%02d.png

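-vsync vfr ensures that you get different images: without it, the output is written at a constant frame rate and the gaps left by the dropped frames are filled with duplicates of the selected frame. This still always picks the first frame of the video, and in most cases the first frame is credits or a logo and not meaningful, so I added -ss 3 to discard the first 3 seconds of the video.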

My final command looks like this:

ffmpeg -ss 3 -i input.mp4 -vf "select=gt(scene\,0.5)" -frames:v 5 -vsync vfr out%02d.jpg

This was the best I could do. I have noticed that because I pick only 5 frames, most of them come from the beginning of the video, so important scenes that occur later may be missed.

I would like to pick your brains for any other better options.


How about looking for, ideally, the first >40%-change frame within each of 5 time spans, where the time spans are the 1st, 2nd, 3rd, 4th, and 5th 20% of the video?

You could also split it into 6 time spans and disregard the 1st one to avoid credits.

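In practice, this would mean setting the fps to a low number while applying your scene change check and your -ss option to throw out the first bit of the video. Both filters have to go into a single -vf chain, because ffmpeg only uses the last -vf given for a stream.

...something like (1/600 is one frame per 10 minutes, so adjust it to the length of the video):

ffmpeg -ss 3 -i input.mp4 -vf "select=gt(scene\,0.4),fps=1/600" -frames:v 5 -vsync vfr out%02d.jpg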
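
The time-span idea can also be run as one ffmpeg call per span, which guarantees that later parts of the video are represented. This is only a minimal shell sketch, assuming the duration is already known in whole seconds. The function name span_commands, the output file names and the 0.4 threshold are illustrative choices, and the function only prints the commands so they can be reviewed before running:

```shell
# Print one ffmpeg command per time span of the video.
# Usage: span_commands <input> <duration_seconds> <spans> [skip]
# The first 'skip' spans are ignored (e.g. 1 to avoid opening credits).
span_commands() {
  input=$1 duration=$2 spans=$3 skip=${4:-0}
  len=$(( duration / spans ))   # length of one span in whole seconds
  i=$skip
  while [ "$i" -lt "$spans" ]; do
    start=$(( i * len ))
    # -ss and -t before -i limit reading to this span; -frames:v 1 keeps
    # only the first frame whose scene score is above 0.4.
    printf 'ffmpeg -ss %d -t %d -i "%s" -vf "select=gt(scene\\,0.4)" -frames:v 1 -vsync vfr out%02d.jpg\n' \
      "$start" "$len" "$input" "$(( i - skip + 1 ))"
    i=$(( i + 1 ))
  done
}
```

For a 600 second video, span_commands input.mp4 600 6 1 prints five commands covering 100s to 600s. Piping the output to sh would run them. A span with no frame above the threshold produces no image, so a lower threshold may be needed for calm footage.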

Defining "meaningful" is hard, but if you want to efficiently make N thumbnails spanning the whole video file, this is what I use in production to generate thumbnails for user-uploaded content.

Pseudo-code

for X in 1..N
  T = integer( (X - 0.5) * D / N )  
  run `ffmpeg -ss <T> -i <movie>
              -vf select="eq(pict_type\,I)" -vframes 1 image<X>.jpg`

Where:

  • D - video duration, read from the output of ffmpeg -i <movie> alone or from ffprobe, which can also write its output as JSON
  • N - total number of thumbnails you want
  • X - thumbnail number, from 1 to N
  • T - time point for the thumbnail, in seconds

Put simply, the above writes out the key frame at the center of each partition of the movie. E.g. if the movie is 300s long and you want 3 thumbnails, it takes the first key frame after 50s, 150s and 250s. For 5 thumbnails it would be 30s, 90s, 150s, 210s and 270s. You can adjust N depending on the movie duration D, so that e.g. a 5 minute movie gets 3 thumbnails but one over 1 hour gets 20.
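
The pseudo-code can be turned into a small shell sketch. This is one possible rendering, not necessarily the script used in production. The function names thumb_times and thumb_commands are made up for illustration, the duration is assumed to be in whole seconds, and -frames:v 1 is used as the equivalent of -vframes 1:

```shell
# Print the seek time (whole seconds) at the center of each of N partitions.
# Usage: thumb_times <duration_seconds> <count>
thumb_times() {
  d=$1 n=$2 x=1
  while [ "$x" -le "$n" ]; do
    # (x - 0.5) * d / n, rewritten for integer arithmetic
    echo $(( (2 * x - 1) * d / (2 * n) ))
    x=$(( x + 1 ))
  done
}

# Print one ffmpeg command per thumbnail, mirroring the pseudo-code.
# Usage: thumb_commands <movie> <duration_seconds> <count>
thumb_commands() {
  movie=$1 idx=1
  for t in $(thumb_times "$2" "$3"); do
    printf 'ffmpeg -ss %d -i "%s" -vf "select=eq(pict_type\\,I)" -frames:v 1 image%d.jpg\n' \
      "$t" "$movie" "$idx"
    idx=$(( idx + 1 ))
  done
}
```

thumb_times 300 3 prints 50, 150 and 250, matching the example above. The duration can be read with ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 <movie>, which prints seconds with a fractional part that has to be cut off before it is passed in.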

Performance

Each invocation of the above ffmpeg command takes a fraction of a second (!) for a ~1GB H.264 file. That is because it jumps straight to the <T> position (mind the -ss before -i) and takes the first key frame, which is a complete picture that can be decoded without any other frames. No time is wasted decoding the movie up to the exact time position.

Post-processing

You can combine the above with scale or any other resize method. You can also discard solid color frames, or try to combine it with other filters such as thumbnail.


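Try this, replacing no_of_thumbs_req with the number of thumbnails you want and total_video_time with the video duration in seconds:

 ffmpeg -i input.mp4 -vf fps=no_of_thumbs_req/total_video_time out%d.png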

Using this command I am able to generate the required number of thumbnails which are representative of the entire video.
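
As a worked example of that ratio, here is a small shell sketch that fills in the two values. The function name fps_command is hypothetical, the duration is assumed to be in whole seconds, and the -frames:v cap is an addition of this sketch to guard against one extra frame from rounding:

```shell
# Print an ffmpeg command that samples <count> evenly spaced frames.
# Usage: fps_command <input> <duration_seconds> <count>
fps_command() {
  # The fps filter accepts a ratio, so count/duration is passed through
  # as written instead of being divided here.
  printf 'ffmpeg -i "%s" -vf "fps=%s/%s" -frames:v %s out%%d.png\n' \
    "$1" "$3" "$2" "$3"
}
```

For a 300 second video and 5 thumbnails, fps_command input.mp4 300 5 prints a command using fps=5/300, i.e. one frame per minute. Unlike seeking with -ss, this decodes the whole file, so it will be slower on long videos.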


I once did something similar, but I exported all frames of the video (at 1 fps) and compared them with a Perl utility I found that computes the difference between images. I compared each frame to the thumbnails kept so far, and if it was different from all of them, I added it to the thumbnail collection. The advantage here is that if your video moves from scene A to B and then returns to A, ffmpeg's scene detection will export 2 frames of A, while this approach keeps only one.