Specifying parameters to create videos for ffmpeg's concat demuxer (to avoid a large re-encode)

ffmpeg can be used to concatenate files together:

If you have media files with exactly the same codec and codec parameters you can concatenate them [...]

(emphasis mine) My intention [1] is to produce media files with the same codec and parameters so that I can take advantage of concat without incurring a long re-encode.

Preamble:

I have a file I would like to cut and keep the useful parts from. I have written a Python script to find the keyframe nearest to the desired cut point and cut there, since when doing a stream copy ffmpeg can only split on I-frames:

Using -ss as input option together with -c:v copy might not be accurate since ffmpeg is forced to only use/split on i-frames.
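
For illustration, the keyframe lookup described above might look something like the sketch below. This is not the actual script from the question; the function names are made up for this example. It assumes ffprobe is asked for packet timestamps and flags in CSV form, where a flags field containing K marks a keyframe packet.

```python
import bisect


def keyframe_probe_command(path):
    """Build an ffprobe call that prints 'pts_time,flags' for each video packet."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "packet=pts_time,flags",
        "-of", "csv=p=0",
        path,
    ]


def parse_keyframe_times(probe_output):
    """Return sorted timestamps (seconds) of packets whose flags contain 'K'."""
    times = []
    for line in probe_output.splitlines():
        fields = line.strip().split(",")
        if len(fields) < 2 or "K" not in fields[1]:
            continue
        try:
            times.append(float(fields[0]))
        except ValueError:
            continue  # skip packets without a usable timestamp, e.g. "N/A"
    return sorted(times)


def nearest_keyframe(keyframes, target):
    """Return the keyframe time closest to target (ties go to the earlier one)."""
    if not keyframes:
        raise ValueError("no keyframes found")
    i = bisect.bisect_left(keyframes, target)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = keyframes[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda t: abs(t - target))
```

The command list can be passed to subprocess.run(..., capture_output=True, text=True) and its stdout fed to parse_keyframe_times; the result of nearest_keyframe is then the value to use with -ss for a stream-copy cut.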

As it happens, the splits don't land at exactly the right moment, but they are close enough for now that I can focus on another part of the equation. If I use the concat demuxer at this point, the different parts join together perfectly. So far so good!
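
As a rough sketch of that joining step: the concat demuxer reads a list file with one "file '...'" line per input, and the output is written with stream copy. The helper names below are hypothetical; -safe 0 is included on the assumption that the list may contain paths the demuxer would otherwise reject as unsafe.

```python
def concat_list_text(paths):
    """Return the contents of a list file for ffmpeg's concat demuxer."""
    lines = []
    for p in paths:
        # Inside single quotes, a literal ' has to be written as '\''
        escaped = str(p).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_path, output_path):
    """Build an ffmpeg call that joins the listed files without re-encoding."""
    return [
        "ffmpeg",
        "-f", "concat", "-safe", "0",
        "-i", list_path,
        "-c", "copy",
        output_path,
    ]
```

Write the text from concat_list_text to a file, then run the list returned by concat_command with subprocess.run. The join only works cleanly when every listed file shares the same codec parameters, which is the point of the rest of this question.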

However, I would like there to be smooth transitions between these segments, so I have further split these segments so that the short ends can be used to create a crossfade transition without re-encoding the entire set of files.

A basic diagram would probably help illustrate this:

  [111AAAA111BBBBB111111CCCCCCC1111DDDDD111]   | (original file)
     [AAAA] [BBBBB]    [CCCCCCC]  [DDDDD]      | (desired clips extracted)
[AAA] [A][B] [BBB] [B][C] [CCCCC] [C][D] [DDDD]| (split ends from clips)
      [AAA][ab][BBB][bc][CCCCC][cd][DDD]       | (transitions between short ends)
            [AAAabBBBbcCCCCCcdDDD]             | (intended output)
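
The third row of the diagram can be expressed as a small planning step. The sketch below is only an illustration with hypothetical names. It assumes each clip is given as (name, start, end) in seconds and that every transition uses the same fade length. The times are nominal, since real stream-copy cuts will snap to keyframes.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    label: str
    start: float
    end: float


def split_ends(name, start, end, fade, head=True, tail=True):
    """Split one clip into an optional head, a body, and an optional tail."""
    needed = fade * (int(head) + int(tail))
    if end - start <= needed:
        raise ValueError(f"clip {name} is too short for its transitions")
    segments = []
    body_start, body_end = start, end
    if head:
        body_start = start + fade
        segments.append(Segment(f"{name}-head", start, body_start))
    if tail:
        body_end = end - fade
    segments.append(Segment(f"{name}-body", body_start, body_end))
    if tail:
        segments.append(Segment(f"{name}-tail", body_end, end))
    return segments


def plan_segments(clips, fade):
    """Plan all pieces: the first clip has no head, the last clip has no tail."""
    plan = []
    last_index = len(clips) - 1
    for i, (name, start, end) in enumerate(clips):
        plan.extend(split_ends(name, start, end, fade,
                               head=i > 0, tail=i < last_index))
    return plan
```

Each tail/head pair (A-tail with B-head, and so on) is what gets re-encoded into a short transition clip, while the bodies are stream-copied untouched.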

Problem:

This is where I've gotten to. When I use ffmpeg's concat demuxer to join the clips above, I get significant video and audio artifacts on playback. My guess is that there is a mismatch in codec parameters, which the quote at the top of this question names as a prerequisite. Checking the video with ffprobe gives:

$ ffprobe -i ab-transition.mkv 2>&1 | grep Stream.*Video ; ffprobe -i B.mkv 2>&1 | grep Stream.*Video
Stream #0:0: Video: h264 (Main), yuv420p(tv, bt709/bt709/iec61966-2-1), 1280x720, SAR 1:1 DAR 16:9, 62.50 fps, 62.50 tbr, 1k tbn, 120 tbc (default)
Stream #0:0: Video: h264 (Main), yuv420p(tv, bt709/bt709/iec61966-2-1), 1280x720 [SAR 1:1 DAR 16:9], 62.50 fps, 62.50 tbr, 1k tbn, 125 tbc (default)

(I have omitted audio stream output as the streams have ostensibly the same parameters, yet the audio is not joined correctly)

There are differences. I used -show_streams to get more detailed info, which is available at http://pastebin.com/4vcnDYtj (a single blank line separates the 2 outputs). Diffing the output gives:

7c7
< codec_time_base=1/120
---
> codec_time_base=1/125
70,71c70,71
< start_pts=12
< start_time=0.012000
---
> start_pts=11
> start_time=0.011000

Update:

I have found options and matched parameters for everything that I can see except the codec time base (tbc). Is there a setting which will allow me to set codec_time_base (tbc)? Setting -r has no effect.
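
To check which parameters still differ after each attempt, the manual diff of -show_streams output can be automated. This is a minimal sketch with hypothetical function names. It assumes ffprobe's default output format, where each stream is a block of key=value lines between [STREAM] and [/STREAM].

```python
def parse_show_streams(text):
    """Parse 'ffprobe -show_streams' default output into one dict per stream."""
    streams = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[STREAM]":
            current = {}
        elif line == "[/STREAM]":
            if current is not None:
                streams.append(current)
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key] = value
    return streams


def diff_streams(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Run ffprobe on the transition clip and on the neighbouring clip, parse both outputs, and compare the matching streams. An empty result from diff_streams means the fields that ffprobe reports are identical, though that alone may not guarantee a clean concat.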

Update 2: Fearing this question was too niche for SU, I asked the question of the ffmpeg-user mailing list. Unfortunately -time_base is not an appropriate encoder option in this case:

This is an option for FFmpeg-internal encoders that you try to use for an external encoder (x264).

And more unfortunately, when I asked about general feasibility, the reply was

I don't think this is possible.

I have asked for clarification and possibilities surrounding the original encoding software - in this case OBS - which is potentially less flexible in option specification than ffmpeg due to having to match live stream consumer (Twitch) format specifications. I've yet to receive a reply from the mailing list, but have asked in the OBS forums as well.

More crucially, will controlling for these allow me to use the concat demuxer in ffmpeg to join these together without the need for a long encode process? Many thanks in advance.

(I realise this is a wall-of-text-and-a-half, so additions, subtractions or clarification suggestions are welcome of course. I would link to more official info but being <10 rep I cannot include more than 2 links!)


1: For more context, see my related question: How to efficiently and automatedly join video clips using short transitions?


According to the generic codec options, you can add -time_base to the libx264 encoder set during the creation of the transitional clips.

If I'm reading your file comparisons correctly -- ab-transition.mkv is showing a tbc of 1/120 while B.mkv is showing 1/125 (which is the value you want, right?) -- I'd suggest including an -r value as well to make certain that both the framerate and time base are maintained:

-c:v libx264 [preset & crf/qp settings] -r 62.50 -time_base 1/125 [output]

As a side note, I would like to mention that my own attempts at using the concat demuxer without fully re-encoding the output file(s) have always resulted in problems, chiefly audio sync and frame drops. Best results have come through encoding separate clips with lossless audio and video to preserve the original quality...

-c:v libx264 -preset ultrafast -qp 0 -c:a pcm_s16le

...then encoding the final file using the same audio/video settings used to create the source video(s).
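
A rough sketch of that two-stage workflow as command builders is below. The function names are hypothetical. Joining the lossless intermediates through the concat demuxer is an assumption, since the join method is not specified above, and the final encoder settings are left as parameters because they depend on how the source video was made.

```python
def lossless_intermediate_command(src, dst):
    """Stage 1: re-encode one clip with lossless x264 video and PCM audio."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264", "-preset", "ultrafast", "-qp", "0",
        "-c:a", "pcm_s16le",
        dst,
    ]


def final_encode_command(list_path, dst, video_args, audio_args):
    """Stage 2: join the intermediates and encode once with the final settings.

    video_args and audio_args are caller-supplied option lists; they should
    mirror whatever settings produced the original source video.
    """
    return [
        "ffmpeg",
        "-f", "concat", "-safe", "0",  # assumption: intermediates joined via a list file
        "-i", list_path,
        *video_args,
        *audio_args,
        dst,
    ]
```

The intermediates should go in a container that accepts PCM audio, such as .mkv. Because they are all produced by the same stage-1 command, their parameters should match each other, at the cost of large temporary files and one full encode at the end.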