I would like to be able to robustly stop a video when playback reaches certain specified frames, in order to give oral presentations based on videos made with Blender, Manim...

I'm aware of this question, but the problem is that the video does not stop exactly at the right frame. Sometimes it continues for one more frame, and when I force it back to the intended frame we see the video going backward, which looks weird. Even worse, if the next frame is completely different (different background...), this is very visible.

To illustrate my issues, I created a demo project here (just click "next" and see that when the video stops, sometimes it goes backward). The full code is here.

The important part of the code I'm using is:

      var video = VideoFrame({
          id: 'video',
          frameRate: 24,
          callback: function(curr_frame) {
              // Stop the video when it reaches one of the frames to stop at.
              if (stopFrames.includes(curr_frame)) {
                  console.log("Automatic stop: found stop frame.");
                  pauseMyVideo();
                  // Ensure we are on the proper frame.
                  video.seekTo({frame: curr_frame});
              }
          }
      });

So far, I avoid this issue by stopping one frame before the stop frame and then using seekTo (not sure how sound this is), as demonstrated here. But as you can see, sometimes when going to the next frame it "freezes" a bit: I guess this is when the stop happens right before the seekTo.

PS: if you know a reliable way in JS to know the number of frames of a given video, I'm also interested.

Concerning the idea of cutting the video beforehand on the desktop: this could be used, but I have had bad experiences with it in the past, notably because switching between videos sometimes produces glitches. It can also be more complicated to use, as it means the video has to be manually cut many times, re-encoded...

EDIT: Is there any solution based, for instance, on WebAssembly (more compatible with old browsers) or WebCodecs (more efficient, but not yet widespread)? WebCodecs seems to allow pretty amazing things, but I'm not sure how to use it for this. I would love to hear solutions based on both of them, since Firefox does not support WebCodecs yet. Note that it would be great if audio is not lost in the process. Bonus if I can also make controls appear on request.

EDIT: I'm not sure I understand what's happening here (source)... But it seems to do something close to what I need (using WebAssembly, I think), since it manages to play a video in a canvas, with frame... Here is another website that does something close to what I need using WebCodecs. But I'm not sure how to reliably synchronize sound and video with WebCodecs.

EDIT: answer to the first question

Concerning the video frame rate: indeed, I chose my frame rate poorly; it was 25, not 24. But even with a frame rate of 25, I still don't get a frame-precise stop, in both Firefox and Chromium. For instance, here is a recording (using OBS) of your demo (I see the same with mine when I use 25 instead of 24):

[screenshot]

One frame later, see that the butterfly "flies backward" (this is maybe not very visible in still screenshots, but see for instance the position of the lower left wing in the flowers):

[screenshot]

I can see several potential reasons. First (and I think most likely), I have heard that video.currentTime does not always report the time accurately, which might explain the failure. It seems accurate enough for changing the current frame (I can step forward and backward by one frame quite reliably, as far as I can see), but people reported here that in Chromium video.currentTime is computed from the audio time rather than the video time, leading to some inconsistencies (I observe similar inconsistencies in Firefox), and here that it may report either the time at which the frame is sent to the compositor or the time at which the frame is actually displayed by the compositor (if it is the latter, that could explain the delay we sometimes see). This would also explain why requestVideoFrameCallback is better, as it also provides the current media time.

The second possible reason is that setInterval may not be precise enough. However, requestAnimationFrame is not really better (requestVideoFrameCallback is not available in Firefox), even though it should fire 60 times per second, which should be fast enough.

The third option I can see is that .pause() may take a while to take effect, so that by the time the call completes the video has already played another frame. On the other hand, your example using requestVideoFrameCallback, https://mvyom.csb.app/requestFrame.html, seems to work pretty reliably, and it uses .pause()! Unfortunately it only works in Chromium, not in Firefox. I see that you use metadata.mediaTime instead of currentTime; maybe it is more precise than currentTime.

The last option is that there may be something subtle going on with vsync, as explained in this page. It also reports that expectedDisplayTime may help to solve this issue when using requestVideoFrameCallback.


Solution 1:

The video has a frame rate of 25 fps, not 24 fps: [image]

After putting in the correct value it works OK: demo
The VideoFrame API relies heavily on the FPS value you provide. You can determine the FPS of your videos offline and send it from the server as metadata along with the stop frames.
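
As a rough sketch of that idea, the server could send the frame rate together with the stop frames, and the client could convert each stop frame into a seek time. The metadata shape (`fps`, `stopFrames`) and the helper names `frameToTime` and `timeToFrame` are assumptions made for illustration; they are not part of the VideoFrame API. The sketch also assumes a constant frame rate and zero-based frame numbers. Seeking to the middle of a frame rather than to its start leaves some tolerance for rounding in `currentTime`.

```javascript
// Hypothetical metadata, e.g. fetched as JSON next to the video file.
const videoInfo = { fps: 25, stopFrames: [50, 125, 300] };

// Time (in seconds) at the middle of a zero-based frame, assuming a
// constant frame rate.
function frameToTime(frame, fps) {
  return (frame + 0.5) / fps;
}

// Zero-based index of the frame that contains a given time. The small
// epsilon absorbs floating-point error for times on a frame boundary.
function timeToFrame(time, fps) {
  return Math.floor(time * fps + 1e-6);
}

// Precompute where to seek for each stop frame.
const stopTimes = videoInfo.stopFrames.map((f) => frameToTime(f, videoInfo.fps));

// In the page (not executed here):
//   video.pause();
//   video.currentTime = stopTimes[0];
```

Converting a stop time back with `timeToFrame` gives the original frame number, which makes it easy to check that a seek landed where it was supposed to.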


The site videoplayer.handmadeproductions.de uses window.requestAnimationFrame() to get the callback.


There is a newer, better alternative to requestAnimationFrame: requestVideoFrameCallback(), which allows per-video-frame operations on a video.
The same functionality you demoed in the OP can be achieved like this:

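   // startTime, doneCount, stopFrames, updateStats() and pauseMyVideo() are
   // assumed to be defined elsewhere in the demo.
   const callback = (now, metadata) => {
      if (startTime == 0) {
        startTime = now;
      }
      // Media time (in seconds) of the frame that was presented.
      elapsed = metadata.mediaTime;
      // Frames presented so far, minus the doneCount offset.
      currentFrame = metadata.presentedFrames - doneCount;

      // Average frame rate so far; 0 while it is not yet a finite number.
      fps = (currentFrame / elapsed).toFixed(3);
      fps = !isFinite(fps) ? 0 : fps;

      updateStats();
      if (stopFrames.includes(currentFrame)) {
        pauseMyVideo();
      } else {
        // The callback fires only once, so re-register it for the next frame.
        video.requestVideoFrameCallback(callback);
      }
   };
   video.requestVideoFrameCallback(callback);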

And here is what the demo looks like.
The API works in Chromium-based browsers such as Chrome, Edge and Brave.
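
For browsers without this API, one possible approach is to feature-detect it and fall back to requestAnimationFrame with `currentTime`. The sketch below is only an illustration under assumptions: `watchStopFrames`, its parameters and the injectable `raf` scheduler are hypothetical names, a constant frame rate is assumed, and the frame index is derived from the media time instead of `presentedFrames`. It uses a range test rather than `includes()`, so a dropped frame or a late callback cannot skip over a stop frame. On the fallback path `currentTime` may not match the displayed frame exactly, so an overshoot is still possible there.

```javascript
// Sketch: call onStop(stopFrame, actualFrame) the first time playback
// reaches or passes a stop frame, then stop watching.
function watchStopFrames(video, fps, stopFrames, onStop, raf) {
  // Scheduler for the fallback path; injectable so it can be replaced.
  const schedule = raf || ((cb) => requestAnimationFrame(cb));
  const toFrame = (t) => Math.floor(t * fps + 1e-6);
  const sorted = [...stopFrames].sort((a, b) => a - b);
  // Start from the current position so a stop frame we are already
  // sitting on does not trigger again when playback resumes.
  let lastFrame = toFrame(video.currentTime);

  function check(mediaTime) {
    const frame = toFrame(mediaTime);
    // First stop frame in the interval (lastFrame, frame], if any.
    const hit = sorted.find((f) => f > lastFrame && f <= frame);
    lastFrame = frame;
    if (hit === undefined) return false;
    video.pause();
    onStop(hit, frame);
    return true;
  }

  function onVideoFrame(now, metadata) {
    if (!check(metadata.mediaTime)) video.requestVideoFrameCallback(onVideoFrame);
  }

  function onAnimationFrame() {
    if (!check(video.currentTime)) schedule(onAnimationFrame);
  }

  if (typeof video.requestVideoFrameCallback === "function") {
    video.requestVideoFrameCallback(onVideoFrame);
  } else {
    schedule(onAnimationFrame);
  }
}

// In the page (not executed here):
//   watchStopFrames(video, 25, stopFrames, (stop, actual) => {
//     if (actual !== stop) video.currentTime = (stop + 0.5) / 25;
//   });
```

The watcher ends after the first stop, so it would be started again each time playback resumes.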


There is a JS library named mediainfo.js that finds the frame rate from the video's binary file.
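
Once the frame rate is known, the total number of frames asked about in the PS can be estimated from `video.duration`. This is only a sketch: `estimateFrameCount` is a hypothetical helper, it assumes a constant frame rate, and the reported duration may cover an audio track that is slightly longer than the video track, so the result is an estimate rather than an exact count.

```javascript
// Estimate the total number of frames from the duration (in seconds)
// and a constant frame rate. Returns NaN when the inputs are unusable,
// e.g. before the video metadata has loaded.
function estimateFrameCount(durationSeconds, fps) {
  if (!Number.isFinite(durationSeconds) || !(fps > 0)) return NaN;
  return Math.round(durationSeconds * fps);
}

// In the page (not executed here):
//   video.addEventListener("loadedmetadata", () => {
//     console.log(estimateFrameCount(video.duration, 25));
//   });
```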

Solution 2:

See if this helps you. I will expand on it later if it's useful to you.

You can test it online via: https://www.w3schools.com/tags/tryit.asp?filename=tryhtml5_video

  • It counts the total frames in an MP4 file.
  • It estimates the current frame (as the video plays).

Let me know if this is useful towards a solution for your problem, and I will expand it to do reverse playback etc. and also to deal with stopping at specific frames (when set in the "stopping points" box).
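
The frame-rate estimate in the code below comes down to one formula: the number of video samples (read from the `stsz` atom) divided by the movie duration in seconds (the `mvhd` duration divided by the `mvhd` timescale). Here is a minimal sketch of that arithmetic with made-up numbers; `fpsFromMp4Fields` is a hypothetical name used only for this illustration.

```javascript
// FPS = sampleCount / (duration / timescale)
//     = sampleCount * timescale / duration
function fpsFromMp4Fields(sampleCount, timescale, duration) {
  if (!(timescale > 0) || !(duration > 0)) return NaN;
  const seconds = duration / timescale;
  return sampleCount / seconds;
}

// Made-up example: 250 samples, 1000 ticks per second, 10000 ticks long
// (that is, a 10 second movie).
const exampleFps = fpsFromMp4Fields(250, 1000, 10000);
```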

<!DOCTYPE html>
<html>
<body>

<h1 style="position: absolute; top: 10px; left: 10px" > Demo // Stop Video at Specific Frame(s) : </h1>
<br>

<div style="z-index: 1; overflow:hidden; position: absolute; top: 60px; left: 10px; font-family: OpenSans; font-size: 14px">
<p> <b> Choose an .MP4 video file... </b> </p>
<input type="file" id="choose_media" accept=".mov, .mp4" />
</div>

<video id="myVideo" width="640" height="480" controls muted playsinline 
style="position: absolute; top: 80px; left: 10px" >
<source src="vc_timecode3.mp4" type="video/mp4">
</video>

<div id="cont_texts" style="position: absolute; top: 80px; left: 700px" >

<span> Current Time : </span> <span id="txt_curr_time"> 00:00:00 </span> 
<br><br>
<span> Estimated Frame Num : </span> <span id="txt_est_frame"> 0 </span> 
<br><br>
<span> Total Frames (video) : </span> <span id="txt_total_frame"> -999 </span> 
<br><br>

<span onclick="check_points()" > Stopping Points Array : </span> <input type="text" id="stopPointsArray" value="" > 

</div>

</body>


<script>


////////////////////////////////

//# VARS
var myVideo = document.getElementById( 'myVideo' );
var video_duration;

var h; var m; var s;
var h_time; var m_time; var s_time;

var vid_curr_time = document.getElementById( 'txt_curr_time' );
var vid_est_frame = document.getElementById( 'txt_est_frame' );
var vid_total_frame = document.getElementById( 'txt_total_frame' );

var reader; //to get bytes from file into Array
var bytes_MP4; //updated as Array

//# MP4 related vars
var got_FPS = false; var video_FPS = -1; 
var temp_Pos = 0;  var sampleCount = -1;
//# temp_int: atom size; temp_int_1: MVHD timescale; temp_int_2: MVHD duration
var temp_int = -1, temp_int_1 = -1, temp_int_2 = -1;

                    
var array_points = document.getElementById("stopPointsArray");
array_points.addEventListener('change', check_points );

//# EVENTS
document.getElementById('choose_media').addEventListener('change', onFileSelected, false);

myVideo.addEventListener("timeupdate", video_timeUpdate);           
myVideo.addEventListener("play", handle_Media_Events );
myVideo.addEventListener("pause", handle_Media_Events );
myVideo.addEventListener("ended", handle_Media_Events );

//# LET'S GO...
        
function onFileSelected ( evt )
{
    const file = evt.target.files[0];
    if ( !file ) { return; } //# no file was chosen
    
    reader = new FileReader();
    
    //# attach the handler before starting the read
    reader.onloadend = function(evt) 
    {
        if (evt.target.readyState == FileReader.DONE) 
        {
            bytes_MP4 = new Uint8Array( evt.target.result );
            get_MP4_info( bytes_MP4 );
            vid_total_frame.innerHTML = sampleCount;
            
            //# use bytes Array as video tag source
            let path_MP4 = (window.URL || window.webkitURL).createObjectURL(new Blob([bytes_MP4], { type: 'video/mp4' }));
            myVideo.src = path_MP4;
            myVideo.load();
            
            //# the duration is only known once the metadata has loaded
            myVideo.addEventListener("loadedmetadata", function() { video_duration = myVideo.duration; }, { once: true });
        }
        
    };
    
    reader.readAsArrayBuffer(file);
    
}

function check_points ()
{
    //# read the input box directly: the inline onclick handler passes no event object
    alert( "Array Points are : " + array_points.value );
}

function handle_Media_Events( evt )
{
    //# use the event passed in by addEventListener, not the deprecated global "event"
    if ( evt.type == "ended" )
    { 
        //# loop: jump back to the start and play again
        myVideo.currentTime = 0; myVideo.play();
    }
    
    else if ( evt.type == "play" )
    {
        //# placeholder: nothing to do yet
    }
    
    else if ( evt.type == "pause" )
    {
        //# placeholder: nothing to do yet
    }
    
}

function video_timeUpdate()
{
    vid_curr_time.innerHTML = ( convertTime ( myVideo.currentTime ) );
    
    vid_est_frame.innerHTML = Math.round ( video_FPS * myVideo.currentTime );
    
}

function convertTime ( input_Secs ) 
{
    h = Math.floor(input_Secs / 3600);
    m = Math.floor(input_Secs % 3600 / 60);
    s = Math.floor(input_Secs % 3600 % 60);

    h_time = h < 10 ? ("0" + h) : h ;
    m_time = m < 10 ? ("0" + m) : m ;
    s_time = s < 10 ? ("0" + s) : s ;
    
    if ( (h_time == 0) && ( video_duration < 3600) ) 
    { return ( m_time + ":" + s_time ); }
    else 
    { return ( h_time + ":" + m_time + ":" + s_time ); }
     
}

function get_MP4_info( input ) //# "input" is Array of Byte values
{
    //# console.log( "checking MP4 frame-rate..." );
    
    got_FPS = false;
    temp_Pos = 0; //# our position inside bytes of MP4 array
     
    let hdlr_type = "-1"; 
    
    while(true)
    {
        //# Step 1) Prepare for when metadata pieces are found  
        //# When VIDEO HANDLER Atom is found in MP4
        
        //# if STSZ ... 73 74 73 7A  
        if (input[ temp_Pos+0 ] == 0x73)
        {
            if ( ( input[temp_Pos+1] == 0x74 ) && ( input[temp_Pos+2] == 0x73 ) && ( input[temp_Pos+3] == 0x7A ) )
            {
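                if ( hdlr_type == "vide" ) //# only IF it's the "video" track
                {
                    //# skip atom name (4), version/flags (4) and sample size (4)
                    temp_Pos += 12;
                    sampleCount = ( ( input[temp_Pos+0] << 24) | (input[temp_Pos+1] << 16) | (input[temp_Pos+2] << 8) | input[temp_Pos+3] );
                    console.log( "found VIDEO sampleCount : " + sampleCount );
                    
                    //# FPS = samples / seconds, where seconds = duration / timescale
                    video_FPS = ( ( sampleCount * temp_int_1 ) / temp_int_2 );
                    console.log( "FPS of MP4 ### : " +  video_FPS );
                    got_FPS = true; //# everything needed is known, so the loop can stop
                }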
                
            }
            
        }
        
        //# Step 2) Find the pieces of metadata info
        //# Find other Atoms with data needed by above VIDEO HANDLER code.
        
        
        //# for MOOV and MDAT
        if (input[ temp_Pos ] == 0x6D) //0x6D
        {
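            //# if MDAT ... 6D 64 61 74
            if ( ( input[temp_Pos+1] == 0x64 ) && ( input[temp_Pos+2] == 0x61 ) && ( input[temp_Pos+3] == 0x74 ) )
            {
                //# atom size is the 4 bytes before the name; ">>> 0" keeps it unsigned
                temp_int = ( ( ( input[temp_Pos-4] << 24) | (input[temp_Pos-3] << 16) | (input[temp_Pos-2] << 8) | input[temp_Pos-1] ) >>> 0 );
                //# sizes 0 (runs to end of file) and 1 (64-bit size) are not handled here
                if ( temp_int < 8 ) { break; }
                //# skip the media data: jump to the last byte of this atom
                temp_Pos = (temp_Pos - 4) + (temp_int - 1);
                if ( temp_Pos >= (input.length-1) ) { break; }
            }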
            
            //# if MOOV ... 6D 6F 6F 76
            if ( ( input[temp_Pos+1] == 0x6F ) && ( input[temp_Pos+2] == 0x6F ) && ( input[temp_Pos+3] == 0x76 ) )
            {
                temp_int = ( ( input[temp_Pos-4] << 24) | (input[temp_Pos-3] << 16) | (input[temp_Pos-2] << 8) | input[temp_Pos-1] );
            }
            
            //# if MDHD ... 6D 64 68 64
            if ( ( input[temp_Pos+1] == 0x64 ) && ( input[temp_Pos+2] == 0x68 ) && ( input[temp_Pos+3] == 0x64 ) )
            {
                temp_Pos += 32;
                
                //# if HDLR ... 68 64 6C 72
                if (  input[temp_Pos+0] == 0x68 )
                {
                    if ( ( input[temp_Pos+1] == 0x64 ) && ( input[temp_Pos+2] == 0x6C ) && ( input[temp_Pos+3] == 0x72 ) )
                    {
                        temp_Pos += 12;
                        hdlr_type = String.fromCharCode(input[temp_Pos+0], input[temp_Pos+1], input[temp_Pos+2], input[temp_Pos+3] );
                    }
                }
            }
            
            //# if MVHD ... 6D 76 68 64
            if ( ( input[temp_Pos+1] == 0x76 ) && ( input[temp_Pos+2] == 0x68 ) && ( input[temp_Pos+3] == 0x64 ) )
            {
                temp_Pos += (12 + 4);
                
                //# get timescale
                temp_int_1 = ( ( input[temp_Pos+0] << 24) | (input[temp_Pos+1] << 16) | (input[temp_Pos+2] << 8) | input[temp_Pos+3] );
                ///console.log( "MVHD timescale at: " + temp_int_1 );
                
                //# get duration
                temp_int_2 = ( ( input[temp_Pos+4+0] << 24) | (input[temp_Pos+4+1] << 16) | (input[temp_Pos+4+2] << 8) | input[temp_Pos+4+3] );
                ///console.log( "MVHD duration at: " + temp_int_2 );
            }
            
        }
        
        if( temp_Pos >= (input.length-1) ) { break; }
        if( got_FPS == true) { break; }
        
        temp_Pos++;
    }
    
}

</script>

</html>
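
The parser above scans the file byte by byte for four-character atom names. A more structured alternative is to walk the atoms (boxes) by their size fields, which skips the `mdat` payload without inspecting it. The following is a minimal sketch of that technique for top-level boxes only; `listTopLevelBoxes` is a hypothetical name, and descending into `moov` to reach `mvhd`, `hdlr` and `stsz` is left out.

```javascript
// List the top-level boxes of an MP4 (ISO base media) byte array.
// Each box starts with a 4-byte big-endian size followed by a 4-byte type.
function listTopLevelBoxes(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const boxes = [];
  let offset = 0;
  while (offset + 8 <= bytes.length) {
    let size = view.getUint32(offset); // DataView reads big-endian by default
    const type = String.fromCharCode(
      bytes[offset + 4], bytes[offset + 5], bytes[offset + 6], bytes[offset + 7]
    );
    let headerSize = 8;
    if (size === 1) {
      // A 64-bit size follows the type field.
      if (offset + 16 > bytes.length) break;
      size = Number(view.getBigUint64(offset + 8));
      headerSize = 16;
    } else if (size === 0) {
      // The box extends to the end of the file.
      size = bytes.length - offset;
    }
    if (size < headerSize) break; // malformed box: stop instead of looping forever
    boxes.push({ type, offset, size });
    offset += size;
  }
  return boxes;
}

// In the page (not executed here), with bytes_MP4 from the FileReader:
//   console.log( listTopLevelBoxes( bytes_MP4 ) );
```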