How do all of these "Save video from YouTube" services work?

There is a very popular open source command-line downloader called youtube-dl, which does exactly that. It grabs the actual video and audio file links from a given YouTube link – or any other popular web video site like Vimeo, Yahoo! Video, uStream, etc.

To see how that's done, look into the YouTube extractor. That's just too much to show here. Other extractors exist for simpler sites.

To find the video stream, you have to pretend to be the actual browser client trying to load the video. This means you first have to parse the HTML code, load the relevant JavaScript code, and initialize a player object, which plays video through an HTML <video> element.

It also means that somewhere in the JavaScript execution there is initialization code for the player, containing important parameters such as where to actually find the video.

In the simplest case, the video might be present as a URL to some MP4 file, directly in some configuration object. This is very easy to parse by looking at the src attribute of the <video> element. But it could also be generated on the fly with some specific download tokens negotiated between client and some authentication server. The video might also play through a blob URL, so you cannot see it directly, because it's generated via MediaSource APIs.
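For the simplest case described above, a minimal sketch using Python's standard html.parser could look like the following. The class name, helper name, and sample markup are made up for illustration. It only sees src attributes present in static HTML; it will not help when the player sets a blob URL at runtime.

```python
from html.parser import HTMLParser


class VideoSrcParser(HTMLParser):
    """Collect src attributes from <video> and nested <source> elements."""

    def __init__(self):
        super().__init__()
        self.sources = []
        self._in_video = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "video":
            self._in_video = True
            if attrs.get("src"):
                self.sources.append(attrs["src"])
        elif tag == "source" and self._in_video and attrs.get("src"):
            self.sources.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "video":
            self._in_video = False


def find_video_sources(html):
    """Return every video source URL found in the given HTML text."""
    parser = VideoSrcParser()
    parser.feed(html)
    return parser.sources


# Hypothetical page fragment, for illustration only.
sample = '<video controls src="https://example.com/media/clip.mp4"></video>'
print(find_video_sources(sample))
```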

Often, the JavaScript code itself is obfuscated to make it harder to reverse-engineer, using variable names like xyz rather than player.

Most video websites these days use MPEG-DASH or Apple's HTTP Live Streaming (HLS) behind the scenes. These do not use direct URLs to a video file, but instead work with a so-called "manifest" file. The manifest provides meta-information to get the actual video stream. The manifest file (.mpd for example in DASH, and .m3u8 for HLS) will contain links to segments of video and audio, which you'd later have to combine to get a playable file.
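To illustrate what "links to segments" means for HLS, here is a rough sketch that lists the segment URLs from a media playlist. The function name and the sample playlist are invented. It assumes a plain media playlist: a master playlist, which points to other playlists rather than to segments, and encrypted streams are not handled.

```python
from urllib.parse import urljoin


def hls_segment_urls(playlist_text, playlist_url):
    """Return absolute segment URLs from an HLS media playlist."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments; the rest are URIs.
        if line and not line.startswith("#"):
            # Relative URIs are resolved against the playlist's own URL.
            urls.append(urljoin(playlist_url, line))
    return urls


# Hypothetical playlist, for illustration only.
sample = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXTINF:6.0,
seg0.ts
#EXTINF:6.0,
seg1.ts
#EXT-X-ENDLIST
"""
print(hls_segment_urls(sample, "https://example.com/video/index.m3u8"))
```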

Many websites transmit these manifests from the server to the client player, so if you can inspect the network requests made by the client, you might find a .mpd file, which you can then use to download the video segments from your own client.

However, the manifest could also be transmitted via other side channels, embedded into some JavaScript code, generated on the fly, etc. For youtube-dl, you can see how the code tries to extract the DASH manifest URL from the transmitted configuration information.
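When the configuration is embedded in a script as JSON, one possible approach is to locate a marker and let the JSON decoder read exactly one object from that position. This is only a sketch: the variable name playerConfig and the key manifestUrl are made up, and this is not how youtube-dl does it. It raises json.JSONDecodeError if the embedded object is a JavaScript literal that is not valid JSON.

```python
import json


def extract_json_after(text, marker):
    """Parse the first JSON object that follows `marker` in `text`.

    Returns None when the marker or an opening brace is not found.
    """
    start = text.find(marker)
    if start == -1:
        return None
    brace = text.find("{", start + len(marker))
    if brace == -1:
        return None
    # raw_decode parses one JSON value and ignores any trailing text.
    obj, _end = json.JSONDecoder().raw_decode(text, brace)
    return obj


# Hypothetical page: the variable and key names are made up.
page = """<script>var playerConfig = {"title": "demo",
"manifestUrl": "https://example.com/video/manifest.mpd"};
initPlayer(playerConfig);</script>"""

config = extract_json_after(page, "playerConfig =")
print(config["manifestUrl"])
```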

There's no general solution for this. It requires careful inspection and debugging of the target site.


Start with a typical video:

https://www.youtube.com/watch?v=XeojXq6ySs4

Using the same ID, construct a URL like this:

https://www.youtube.com/get_video_info?eurl=https://www.youtube.com&video_id=XeojXq6ySs4

The response will be a query string, like this (edited for readability):

innertube_api_version=v1&
innertube_context_client_version=2.20210504.09.00&
player_response=%7B%22responseContext%22%3A%7B%22serviceTrackingParams%22%3A...
ps=desktop-polymer&
root_ve_type=27240&

Extract the player_response value. This will be a JSON object, like this:

{
  "streamingData": {
    "adaptiveFormats": [
      {
        "itag": 137,
        "mimeType": "video/mp4; codecs=\"avc1.640020\"",
        "bitrate": 570464,
        "height": 1080,
        "signatureCipher": "s=VZVZOq0QJ8wRgIhANWm3sPF-2hbzQQGrErjQFMNmxTfALco..."
      }
    ]
  }
}
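The two steps above (split the query string, then decode player_response as JSON) might be sketched like this. The function name is made up, and the sample body is built locally from a shortened stand-in rather than fetched, so nothing here depends on the endpoint still answering.

```python
import json
from urllib.parse import parse_qs, urlencode


def adaptive_formats(response_body):
    """Decode player_response from a query-string body and list its formats."""
    # parse_qs percent-decodes values and returns a list per key.
    fields = parse_qs(response_body)
    player_response = json.loads(fields["player_response"][0])
    return player_response.get("streamingData", {}).get("adaptiveFormats", [])


# Stand-in for a real response, built from a shortened sample.
sample_player_response = {
    "streamingData": {
        "adaptiveFormats": [
            {"itag": 137, "mimeType": 'video/mp4; codecs="avc1.640020"', "height": 1080}
        ]
    }
}
body = urlencode({
    "innertube_api_version": "v1",
    "player_response": json.dumps(sample_player_response),
})

for fmt in adaptive_formats(body):
    print(fmt["itag"], fmt["mimeType"], fmt.get("height"))
```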

Then extract the signatureCipher value. This is itself a query string, like this:

sp=sig&
s=VZVZOq0QJ8wRgIhANWm3sPF-2hbzQQGrErjQFMNmxTfALcoZkZ4IVR1djIpAiEA8HFKix6d4B3T...&
url=https://r3---sn-q4flrnek.googlevideo.com/videoplayback%3Fexpire%3D16201927...

The url value is the URL of the audio or video stream. Before you can access it, however, you must add one entry to its query string. The key is the value of sp above (sig in this case). The value is derived from s above (VZVZOq0QJ8wRgIhANWm3sPF-2hbzQQGrErjQFMNmxTfALcoZkZ4IVR1djIpA... in this case), which must first be decoded. To decode it, take the following steps. First, visit the original page:

https://www.youtube.com/watch?v=XeojXq6ySs4

The page source will contain a path like this:

/s/player/3e7e4b43/player_ias.vflset/en_US/base.js

which you can turn into:

https://www.youtube.com/s/player/3e7e4b43/player_ias.vflset/en_US/base.js
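Finding that path and turning it into a full URL could be sketched as follows. The regular expression is only modeled on the sample path above, so it is an assumption about the layout and may need adjusting; the function name is made up.

```python
import re
from urllib.parse import urljoin

# Pattern modeled on the sample path above; the real layout may differ.
PLAYER_JS = re.compile(r"/s/player/[\w-]+/[\w.-]+/[\w-]+/base\.js")


def player_js_url(page_html, page_url="https://www.youtube.com/"):
    """Return the absolute base.js URL found in the page, or None."""
    match = PLAYER_JS.search(page_html)
    return urljoin(page_url, match.group(0)) if match else None


# Hypothetical page fragment containing the sample path.
sample = '<script src="/s/player/3e7e4b43/player_ias.vflset/en_US/base.js"></script>'
print(player_js_url(sample))
```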

This file will contain some code like this:

var uy={an:function(a){a.reverse()},
gN:function(a,b){a.splice(0,b)},
J4:function(a,b){var c=a[0];a[0]=a[b%a.length];a[b%a.length]=c}};
vy=function(a){a=a.split("");uy.gN(a,2);uy.J4(a,47);uy.gN(a,1);uy.an(a,49);
uy.gN(a,2);uy.J4(a,4);uy.an(a,71);uy.J4(a,15);uy.J4(a,40);return a.join("")};

Take the original s value, and run it through this function:

vy('_l_lOq0QJ8wRAIgc-yNc9Z4lSO2CozG4B-W9uC5zeuTATDvqHlnQaHGNmkCICsZJGbEjKDmD...')

The result will look about the same, but scrambled:

AOq0QJ8wRAIgc-ylc9Z4lSO2CozG4B-W9uC5zeuTNTDvqH_nQaHGNmkCICsZJGbEjKDmDSnKg_atTR...
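If you would rather not run JavaScript, the sample function shown above can be ported by hand. The sketch below is a direct Python translation of that one snippet; the Python names are my own. Since the JavaScript is obfuscated, the helper names and the step list are likely to differ in other versions of base.js, so treat the STEPS table as specific to this sample.

```python
def reverse(a, _unused=None):
    """uy.an: reverse the list in place (the extra argument is ignored)."""
    a.reverse()


def drop_front(a, b):
    """uy.gN: remove the first b items, like a.splice(0, b)."""
    del a[:b]


def swap_first(a, b):
    """uy.J4: swap the first item with the item at index b % len(a)."""
    i = b % len(a)
    a[0], a[i] = a[i], a[0]


# Step order copied from the sample vy function.
STEPS = [
    (drop_front, 2), (swap_first, 47), (drop_front, 1), (reverse, 49),
    (drop_front, 2), (swap_first, 4), (reverse, 71), (swap_first, 15),
    (swap_first, 40),
]


def decipher(s):
    """Apply the sample transformation to the s value."""
    a = list(s)
    for op, arg in STEPS:
        op(a, arg)
    return "".join(a)
```

The three drop steps remove five characters in total, so the output is always five characters shorter than the input.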

Finally you can construct the resulting URL:

https://r3---sn-q4fl6nz7.googlevideo.com/videoplayback?vprv=1&
id=o-AHThxQXyxJ3jfw5EBUJeT0IJLrdQeYpMdCsCImMfbuac&
sig=AOq0QJ8wRAIgc-ylc9Z4lSO2CozG4B-W9uC5zeuTNTDvqH_nQaHGNmkCICsZJGbEjKDmDSnKg_...
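Putting the pieces together, the final URL could be assembled roughly like this. The function name is hypothetical, and the decipher argument stands for whatever decoding function you derived from base.js; the sample below uses a dummy string reversal in its place, and the sample cipher is invented.

```python
from urllib.parse import parse_qs, urlencode


def build_stream_url(signature_cipher, decipher):
    """Combine url, sp and the deciphered s value into one URL."""
    fields = parse_qs(signature_cipher)
    url = fields["url"][0]       # percent-decoded stream URL
    key = fields["sp"][0]        # name of the new parameter, e.g. "sig"
    value = decipher(fields["s"][0])
    # Append instead of re-encoding, so the existing query stays untouched.
    separator = "&" if "?" in url else "?"
    return url + separator + urlencode({key: value})


# Hypothetical cipher string; the reversal stands in for the real decoder.
cipher = urlencode({
    "sp": "sig",
    "s": "SCRAMBLED",
    "url": "https://example.com/videoplayback?expire=123&id=abc",
})
print(build_stream_url(cipher, decipher=lambda s: s[::-1]))
```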

I have a library and program that perform these steps:

https://pkg.go.dev/github.com/89z/mech/youtube


My answer: since 22 January 2019, anyone using these methods can get caught, even when trying to get around the rule without linking any user information.

Why? Since I'm a new user on this platform, I cannot comment on the rule mentioned by @Daniel-B. According to the new YouTube ToS (I read them in German, as I am in Germany; translated here), under §6.1 G:

You agree not to use any automated system (including, but not limited to, any robot, spider, or offline reader) that accesses the website in a way that sends more requests to the YouTube servers within a given period than a human could reasonably produce in the same period using a publicly available, unmodified standard web browser;

They can now measure the timing of each request and detect whether you are violating this rule. So how is downloading still possible in this scenario, given that your external IP address will be known to them even if you use a VPN to protect yourself and do not link any user details to the service?