How to concatenate videos in ffmpeg with different attributes?

Since I'm assuming your inputs are going to be arbitrary, I recommend the concat filter rather than the concat demuxer. You'll need filtering anyway to conform everything to a common set of parameters, and with the concat filter you can do it all in one command.

Make all videos 1280x720, 1:1 SAR, 30 fps, yuv420p

This uses the scale (width x height / resolution), pad (letterboxing/pillarboxing to the target size), setsar (sample aspect ratio), fps (frame rate), format (pixel format / chroma subsampling), and concat (concatenation/joining) filters.
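To see what force_original_aspect_ratio=decrease plus pad actually does to a given input, here is a minimal sketch of the size arithmetic. The function name fit_size is hypothetical, and it assumes each candidate dimension is rounded to the nearest integer (as ffmpeg's internal rescaling does for positive values). Treat it as a model of the computation, not output captured from ffmpeg, which may apply further rounding for chroma alignment.

```shell
#!/usr/bin/env bash
# Model of: scale=OW:OH:force_original_aspect_ratio=decrease,pad=OW:OH:-1:-1
# Assumption: candidate sizes are rounded to nearest (half rounds up).

# fit_size IN_W IN_H OUT_W OUT_H -> prints "scaled_w scaled_h pad_w pad_h"
fit_size() {
  local iw=$1 ih=$2 ow=$3 oh=$4
  # width if scaled to the target height, and height if scaled to the target width
  local tmp_w=$(( (oh * iw + ih / 2) / ih ))
  local tmp_h=$(( (ow * ih + iw / 2) / iw ))
  # "decrease": keep the smaller of each pair so the frame fits inside the box
  local w=$(( tmp_w < ow ? tmp_w : ow ))
  local h=$(( tmp_h < oh ? tmp_h : oh ))
  # pad then fills this many pixels in total (centered by the -1:-1 offsets)
  echo "$w $h $(( ow - w )) $(( oh - h ))"
}

fit_size 1920 1080 1280 720   # -> 1280 720 0 0    (16:9, no padding)
fit_size 640 480 1280 720     # -> 960 720 320 0   (4:3, pillarboxed)
fit_size 1080 1920 1280 720   # -> 405 720 875 0   (portrait, heavy pillarbox)
```

This is why the pad step is needed: any input whose aspect ratio differs from 16:9 comes out of scale smaller than 1280x720 in one dimension.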

ffmpeg -i 1.mp4 -i 2.mp4 -i 3.mp4 -filter_complex \
"[0:v]scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:-1:-1,setsar=1,fps=30,format=yuv420p[v0];
 [1:v]scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:-1:-1,setsar=1,fps=30,format=yuv420p[v1];
 [2:v]scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:-1:-1,setsar=1,fps=30,format=yuv420p[v2];
 [v0][0:a][v1][1:a][v2][2:a]concat=n=3:v=1:a=1[v][a]" \
-map "[v]" -map "[a]" -c:v libx264 -c:a aac -movflags +faststart output.mp4
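The three per-input chains above differ only in the input index, so for a variable number of files the filtergraph can be generated. A minimal sketch, assuming every input has exactly one video and one audio stream; build_filter and the script name concat.sh are hypothetical.

```shell
#!/usr/bin/env bash
# Per-input video normalization chain, identical to the command above.
CHAIN='scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:-1:-1,setsar=1,fps=30,format=yuv420p'

# build_filter N -> prints a -filter_complex string for N inputs
build_filter() {
  local n=$1 graph="" pairs="" i
  for (( i = 0; i < n; i++ )); do
    graph+="[$i:v]${CHAIN}[v$i];"   # normalize the video of input i
    pairs+="[v$i][$i:a]"             # concat wants v/a interleaved per segment
  done
  printf '%s%sconcat=n=%d:v=1:a=1[v][a]' "$graph" "$pairs" "$n"
}

# Usage (hypothetical script name): ./concat.sh 1.mp4 2.mp4 3.mp4
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
  args=()
  for f in "$@"; do args+=(-i "$f"); done
  ffmpeg "${args[@]}" -filter_complex "$(build_filter $#)" \
    -map "[v]" -map "[a]" -c:v libx264 -c:a aac -movflags +faststart output.mp4
fi
```

Input order on the command line is the concatenation order, just as in the hand-written command.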

Same as above, but also conforms audio to stereo at a 48000 Hz sample rate

Added the aformat (sample rate and channel layout) filter.

ffmpeg -i 1.mp4 -i 2.mp4 -i 3.mp4 -filter_complex \
"[0:v]scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:-1:-1,setsar=1,fps=30,format=yuv420p[v0];
 [1:v]scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:-1:-1,setsar=1,fps=30,format=yuv420p[v1];
 [2:v]scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:-1:-1,setsar=1,fps=30,format=yuv420p[v2];
 [0:a]aformat=sample_rates=48000:channel_layouts=stereo[a0];
 [1:a]aformat=sample_rates=48000:channel_layouts=stereo[a1];
 [2:a]aformat=sample_rates=48000:channel_layouts=stereo[a2];
 [v0][a0][v1][a1][v2][a2]concat=n=3:v=1:a=1[v][a]" \
-map "[v]" -map "[a]" -c:v libx264 -c:a aac -movflags +faststart output.mp4
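The audio chains follow the same pattern as the video chains, so both can be generated in one loop. A minimal sketch, assuming every input has one video and one audio stream; build_filter_av is a hypothetical name.

```shell
#!/usr/bin/env bash
VCHAIN='scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:-1:-1,setsar=1,fps=30,format=yuv420p'
ACHAIN='aformat=sample_rates=48000:channel_layouts=stereo'

# build_filter_av N -> video and audio normalized per input, then concat
build_filter_av() {
  local n=$1 graph="" pairs="" i
  for (( i = 0; i < n; i++ )); do
    graph+="[$i:v]${VCHAIN}[v$i];[$i:a]${ACHAIN}[a$i];"
    pairs+="[v$i][a$i]"   # concat now takes the normalized audio labels
  done
  printf '%s%sconcat=n=%d:v=1:a=1[v][a]' "$graph" "$pairs" "$n"
}

build_filter_av 2; echo
```

Pass the result to -filter_complex and keep the same -map "[v]" -map "[a]" output options.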

With watermark

ffmpeg -i 1.mp4 -i 2.mp4 -i 3.mp4 -i logo.png -filter_complex \
"[0:v]scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:-1:-1,setsar=1,fps=30,format=yuv420p[v0];
 [1:v]scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:-1:-1,setsar=1,fps=30,format=yuv420p[v1];
 [2:v]scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:-1:-1,setsar=1,fps=30,format=yuv420p[v2];
 [0:a]aformat=sample_rates=48000:channel_layouts=stereo[a0];
 [1:a]aformat=sample_rates=48000:channel_layouts=stereo[a1];
 [2:a]aformat=sample_rates=48000:channel_layouts=stereo[a2];
 [v0][a0][v1][a1][v2][a2]concat=n=3:v=1:a=1[vid][a];[vid][3]overlay=W-w-5:H-h-5[v]" \
-map "[v]" -map "[a]" -c:v libx264 -c:a aac -movflags +faststart output.mp4

For more info, see the overlay filter documentation and How to add and position watermark with ffmpeg?
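In W-w-5:H-h-5, W and H are the width and height of the main (concatenated) video and w and h are those of the logo, so the logo sits 5 pixels from the bottom-right corner. A small sketch of the equivalent expressions for other positions; overlay_pos and its corner names are hypothetical.

```shell
#!/usr/bin/env bash
# overlay_pos CORNER [MARGIN] -> x:y expression for the overlay filter
# W/H = main video size, w/h = overlay (logo) size.
overlay_pos() {
  local corner=$1 m=${2:-5}
  case $corner in
    top-left)     echo "${m}:${m}" ;;
    top-right)    echo "W-w-${m}:${m}" ;;
    bottom-left)  echo "${m}:H-h-${m}" ;;
    bottom-right) echo "W-w-${m}:H-h-${m}" ;;
    center)       echo "(W-w)/2:(H-h)/2" ;;
    *) echo "unknown corner: $corner" >&2; return 1 ;;
  esac
}

# Rebuild the tail of the watermark graph (logo is input 3):
echo "[vid][3]overlay=$(overlay_pos bottom-right)[v]"
```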

Adding silent dummy audio for an input that does not have audio

The anullsrc filter provides silent dummy audio when one of your inputs has no audio. This is needed because all segments to be concatenated must have the same number and type of streams. In other words, you can't concat a video without audio to a video with audio, so silent audio can be added as in this example:

ffmpeg -i 1.mp4 -i 2.mp4 -i 3.mp4 -t 0.1 -f lavfi -i anullsrc=channel_layout=stereo:sample_rate=48000 -filter_complex \
"[0:v]scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:-1:-1,setsar=1,fps=30,format=yuv420p[v0];
 [1:v]scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:-1:-1,setsar=1,fps=30,format=yuv420p[v1];
 [2:v]scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:-1:-1,setsar=1,fps=30,format=yuv420p[v2];
 [0:a]aformat=sample_rates=48000:channel_layouts=stereo[a0];
 [2:a]aformat=sample_rates=48000:channel_layouts=stereo[a2];
 [v0][a0][v1][3:a][v2][a2]concat=n=3:v=1:a=1[v][a]" \
-map "[v]" -map "[a]" -c:v libx264 -c:a aac -movflags +faststart output.mp4

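Note: Leave -t 0.1 as is. The anullsrc duration only needs to be shorter than the duration of the associated video input(s), because the concat filter automatically pads the short silent audio to match the length of the associated video segment. According to the concat filter documentation, this padding applies to every segment except the last one, so the input without audio should not be the final segment if you need the audio track to cover the whole output.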
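When you don't know in advance which inputs lack audio, the concat inputs can be built from a list of flags. A minimal sketch: build_filter_silent is hypothetical, the flags (1 = has audio, 0 = no audio) would come from your own probing, and it assumes the anullsrc input is added right after the video inputs so its index equals the number of video inputs (3 in the example above). If several inputs lack audio, this sketch feeds the same anullsrc input to each of them.

```shell
#!/usr/bin/env bash
VCHAIN='scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:-1:-1,setsar=1,fps=30,format=yuv420p'
ACHAIN='aformat=sample_rates=48000:channel_layouts=stereo'

# build_filter_silent FLAG... (one 0/1 flag per video input)
build_filter_silent() {
  local silent=$# graph="" pairs="" i=0 has
  for has in "$@"; do
    graph+="[$i:v]${VCHAIN}[v$i];"
    if [[ $has == 1 ]]; then
      graph+="[$i:a]${ACHAIN}[a$i];"
      pairs+="[v$i][a$i]"
    else
      # anullsrc is already 48 kHz stereo, so it goes straight to concat
      pairs+="[v$i][$silent:a]"
    fi
    i=$(( i + 1 ))
  done
  printf '%s%sconcat=n=%d:v=1:a=1[v][a]' "$graph" "$pairs" "$#"
}

# Same layout as the example above: input 1 has no audio.
build_filter_silent 1 0 1; echo
```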