Merging video and audio with different durations

Following https://superuser.com/a/762752/7796, I'm using the command below to merge a video file and an audio file into a single file with some custom audio panning rules:

ffmpeg -i vid.mp4 -i rec.mp3 -filter_complex "[0:a][1:a]amerge[a]" \
-strict experimental -c:a aac -map 0:v -map "[a]" -ar 48000 -ab 128k \
-c:v copy -f mp4 out.mp4 -y

ffmpeg version N-61286-gdbc3e11 Copyright (c) 2000-2014 the FFmpeg developers
  built on Mar 11 2014 22:01:37 with gcc 4.8.2 (GCC)
  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libcaca --enable-libfreetype --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-zlib
  libavutil      52. 66.101 / 52. 66.101
  libavcodec     55. 52.102 / 55. 52.102
  libavformat    55. 34.100 / 55. 34.100
  libavdevice    55. 11.100 / 55. 11.100
  libavfilter     4.  3.100 /  4.  3.100
  libswscale      2.  5.101 /  2.  5.101
  libswresample   0. 18.100 /  0. 18.100
  libpostproc    52.  3.100 / 52.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'vid.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf55.34.100
  Duration: 00:00:20.07, start: 0.033333, bitrate: 214 kb/s
    Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 480x270 [SAR 1:1 DAR 16:9], 80 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 48000 Hz, mono, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
[mp3 @ 02c6c2a0] Estimating duration from bitrate, this may be inaccurate
Input #1, mp3, from 'rec.mp3':
  Duration: 00:00:50.31, start: 0.000000, bitrate: 127 kb/s
    Stream #1:0: Audio: mp3, 44100 Hz, mono, s16p, 128 kb/s
[Parsed_amerge_0 @ 02d514e0] No channel layout for input 1
[Parsed_amerge_0 @ 02d514e0] Input channel layouts overlap: output layout will be determined by the number of distinct input channels
Output #0, mp4, to 'out.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf55.34.100
    Stream #0:0(eng): Video: h264 ([33][0][0][0] / 0x0021), yuv420p, 480x270 [SAR 1:1 DAR 16:9], q=2-31, 80 kb/s, 25 fps, 12800 tbn, 12800 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1: Audio: aac ([64][0][0][0] / 0x0040), 48000 Hz, stereo, fltp, 128 kb/s (default)
Stream mapping:
  Stream #0:1 (aac) -> amerge:in0
  Stream #1:0 (mp3) -> amerge:in1
  Stream #0:0 -> #0:0 (copy)
  amerge -> Stream #0:1 (aac)
Press [q] to stop, [?] for help
[mp3 @ 02c6f960] overread, skip -6 enddists: -4 -4
frame=  501 fps=227 q=-1.0 Lsize=     527kB time=00:00:20.07 bitrate= 215.0kbits/s
video:196kB audio:315kB subtitle:0 data:0 global headers:0kB muxing overhead 3.170177%

The problem is that if the video is 30 seconds long and the audio is 45 seconds long, the merged file is only 30 seconds long, so the last 15 seconds of audio are missing. Showing just a black screen or the final frame of the video for the remaining seconds (until the audio finishes) would be fine.

How can I do this?


Solution 1:

After some testing, it seems that the problem is caused by the amerge filter. According to the documentation:

6.7 amerge

[...]

If inputs do not have the same duration, the output will stop with the shortest.

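Try running the video's audio through apad before amerge. apad pads that stream with silence, so amerge no longer stops when the video's audio ends and instead stops when rec.mp3 ends: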

ffmpeg -i vid.mp4 -i rec.mp3 -filter_complex \
"[0:a]apad [b] ; [b][1:a]amerge[a]" \
-strict experimental -c:a aac -map 0:v -map "[a]" -ar 48000 -ab 128k \
-c:v copy -f mp4 out.mp4 -y
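
The command above assumes that the video's audio is the shorter of the two. If you don't know in advance which input is shorter, something like the following might help. It is only a rough sketch: pad_filter is a made-up helper name, not part of ffmpeg, and it simply prints a filter graph that applies apad to whichever input has the smaller duration. The durations are passed in as seconds, and the ffprobe call in the comments is one way you could obtain them (it reports the container duration, which may differ slightly from the audio stream's own duration).

```shell
# Hypothetical helper (not part of ffmpeg): print an amerge filter graph
# that pads the shorter audio input with silence via apad.
# $1 = duration of input 0 in seconds, $2 = duration of input 1 in seconds.
pad_filter() {
    # awk does the floating-point comparison; exit status 0 means $1 < $2.
    if awk -v a="$1" -v b="$2" 'BEGIN { exit !(a < b) }'; then
        # Input 0 is shorter: pad it, keep the channel order [0:a], [1:a].
        echo "[0:a]apad[b];[b][1:a]amerge[a]"
    else
        # Input 1 is shorter (or both are equal): pad input 1 instead.
        echo "[1:a]apad[b];[0:a][b]amerge[a]"
    fi
}

# Possible usage (assumes vid.mp4 and rec.mp3 exist in the current directory):
#   d0=$(ffprobe -v error -show_entries format=duration \
#        -of default=noprint_wrappers=1:nokey=1 vid.mp4)
#   d1=$(ffprobe -v error -show_entries format=duration \
#        -of default=noprint_wrappers=1:nokey=1 rec.mp3)
#   ffmpeg -i vid.mp4 -i rec.mp3 -filter_complex "$(pad_filter "$d0" "$d1")" \
#     -strict experimental -c:a aac -map 0:v -map "[a]" -ar 48000 -ab 128k \
#     -c:v copy -f mp4 out.mp4 -y
```

Only one input should be padded: apad pads without end by default, so padding both would leave amerge with no shortest input to stop on.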