How to force the first frame to be key-frame?

When encoding video, the first frame has to be a keyframe. It is the first frame to be coded completely on its own, and subsequent frames may reference it for inter-frame prediction. In H.264, a coded video sequence also begins with an IDR (Instantaneous Decoder Refresh) access unit, which tells the decoder to refresh its reference state.

So regardless of what you're doing, unless you just copy the bitstream (for example with -c:v copy), you're re-encoding the video, and your first frame has to be a keyframe.

Now, for whatever reason, your stream has an offset in its start time. This means that all presentation timestamps (PTS) are shifted by that offset. If you inspect the head of the ffprobe -show_frames output, you'll see that frame 0 is indeed a keyframe (key_frame=1), but its PTS equals the start offset instead of 0.
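If you want to check this programmatically, here is a minimal sketch. It assumes ffprobe is on your PATH and relies on its JSON writer (-of json), whose per-frame entries include key_frame, pts and pts_time. The helper names probe_frames and first_frame_info are hypothetical, and parsing is kept separate from running ffprobe so it can be tried on captured output.

```python
import json
import subprocess


def probe_frames(path: str) -> str:
    """Run ffprobe on the first video stream and return its JSON frame listing."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",  # first video stream only
        "-show_frames",
        "-of", "json",             # machine-readable output
        path,
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def first_frame_info(ffprobe_json: str) -> dict:
    """Extract the keyframe flag and timestamps of the first listed frame."""
    first = json.loads(ffprobe_json)["frames"][0]
    return {
        "key_frame": first.get("key_frame") == 1,
        "pts": first.get("pts"),  # in units of the stream's time_base
        "pts_time": float(first["pts_time"]) if "pts_time" in first else None,
    }


# Usage (requires ffprobe and an input file):
# print(first_frame_info(probe_frames("input.mp4")))
```

With an offset stream you would expect key_frame to be True while pts_time is non-zero.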

To compensate for this, you can subtract the start time from all PTS.
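As a rough sketch of that subtraction, assuming you already have the frames as dicts with an integer pts (as in ffprobe's JSON output) and the stream's start_pts (ffprobe reports start_pts and start_time per stream with -show_streams). The function names are hypothetical; timestamps stay in time_base units and are only converted to seconds at the end, to avoid floating-point drift.

```python
from fractions import Fraction


def shift_pts(frames: list[dict], start_pts: int) -> list[dict]:
    """Return copies of the frames with start_pts subtracted from every PTS."""
    shifted = []
    for frame in frames:
        copy = dict(frame)  # leave the caller's data untouched
        copy["pts"] = frame["pts"] - start_pts
        shifted.append(copy)
    return shifted


def pts_to_seconds(pts: int, time_base: Fraction) -> Fraction:
    """Convert a PTS in time_base units to an exact number of seconds."""
    return pts * time_base


# Example: a 90 kHz stream whose first frame starts at 1.4 s.
frames = [{"pts": 126000, "key_frame": 1}, {"pts": 129600, "key_frame": 0}]
normalized = shift_pts(frames, start_pts=126000)
print([f["pts"] for f in normalized])  # [0, 3600]
print(pts_to_seconds(normalized[1]["pts"], Fraction(1, 90000)))  # 1/25
```

If you re-encode with ffmpeg anyway, the setpts=PTS-STARTPTS video filter applies the same shift inside the filter graph.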