Getting URL of YouTube video with `youtube-dl` is slow, without downloading video
Solution 1:
The time is spend doing work
The command does not hang or wait for something wasting time,
it actually does work that takes time; It most probably takes time by adding up multiple small network delays. But it could also be that there are delays on the youtube side, that add up.
That it just the time it takes to download the HTML that is needed;
The command needs to make at least two HTTP requests, one after another, and probably more.
So if anything is slow, it's multiplied by the number of requests already.
For me it takes 1.5 seconds on a very fast line - that is not that far from 8 seconds.
How to find out
I'll show the commands I used to find out:
To make the examples more tidy, we use a variable for the URL:
$ u="https://www.youtube.com/watch?v=k4JGSAmu4lg"
We want to measure the duration of commands; Using the command time
needs to take care not to mix up the command and the shell builtin. We use a small function to make the lines shorter:
$ t(){/usr/bin/time -f 'Time: %es' "$@";}
Your command writes out the URL of the video file (truncated to 80 columns):
$ youtube-dl -g "$u"
https://r20---sn-cxg7en7d.googlevideo.com/videoplayback?signature=091F68E823
Let's measure the time it takes to run on my computer:
$ t youtube-dl -g "$u"
https://r20---sn-cxg7en7d.googlevideo.com/videoplayback?signature=091F68E823
Time: 1.44s
Ok, one and a half seconds. Faster than in the question, but not that much faster.
But how is it spending the time? Maybe it does download the video in some hidden way and discards it? The video is 11min in 360p. Just downloading it with no options takes about 13s - ten times longer.
Need to take a closer look, with the verbose option -v
:
$ t youtube-dl -v -g "$u"
[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['-v', '-g', 'https://www.youtube.com/watch?v=k4J
[debug] Encodings: locale 'UTF-8', fs 'UTF-8', out 'UTF-8', pref: 'UTF-8'
[debug] youtube-dl version 2014.02.06
[debug] Python version 2.7.6 - Linux-3.13.0-24-generic-x86_64-with-Ubuntu-14
[debug] Proxy map: {}
https://r20---sn-cxg7en7d.googlevideo.com/videoplayback?sparams=id%2Cinitcwn
Time: 1.40s
Oh, there is some delay before the '[debug]' lines are printed. Looks like youtube-dl
spends some time for it's own configuration setup. It's a quarter of a second or so, not the delay we are looking for. But what we can learn from it is that the youtube-dl
implementation itself may be slow.
After the messages, nothing happens until the result URL is printed. So we still do not see the interesting part.
The option -g
is to "simulate" the video download in the sense that it does the complicated part of finding out that semi-secret URL, prints it, but then skips the actual download in the end. There is a similar option -s
that does not output the URL, and seems similar otherwise. Let's assume it's similar enough if it takes about the same time; We need to check that.
$ t youtube-dl -v -s "$u"
[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['-v', '-s', 'https://www.youtube.com/watch?v=k4J
[debug] Encodings: locale 'UTF-8', fs 'UTF-8', out 'UTF-8', pref: 'UTF-8'
[debug] youtube-dl version 2014.02.06
[debug] Python version 2.7.6 - Linux-3.13.0-24-generic-x86_64-with-Ubuntu-14
[debug] Proxy map: {}
[youtube] Setting language
[youtube] k4JGSAmu4lg: Downloading webpage
[youtube] k4JGSAmu4lg: Downloading video info webpage
[youtube] k4JGSAmu4lg: Extracting video information
Time: 1.45s
Ok, -s
takes the same time as -g
, so it's ok to replace them for testing.
More interesting is that we got more output now. And it's printed with an interesting timing: The lines get printed with a similar delay to each other, so it seems they are about the actions that are actually taking the time we look for.
From the messages, at least two web pages are downloaded. But we can assume that the word "page" will not mean a single HTTP request and a single HTML document.
What did we learn?
The main point is, the work of the program does actually take time, it's not waiting for something, or hanging.
Also, we see multiple steps taking similar amounts of time. There is not much to calculate, so that's network roundtrips in some way, adding up.
That means, the latency of our connection is important only here. The throughput of the connection is just irrelevant.
If you would make your internet connection faster so it can transfer data at double speed - that would not help at all.
But if you can get better ping
times, that will make it much faster.
It's not about 'ping' times to your internet service provider, though; The ping time all the way to YouTube it what matters - and may be not possible to change.
Interestingly, for the next step, downloading a video, the requirements for a fast line are exactly the opposite: latency is not relevant at all, and throughput really matters.
Not tired yet?
Want still more details to understand what the time is really spend on?
The next step would be to trace the HTTP connection; I would suspect that it may show many more roundtrips than two, for redirects for example. You could use wireshark
, or a logging HTTP proxy, or strace
to just count the system calls for connecting or writing.
For today, we have both looked deep enough into the rabbit hole of networking.
Solution 2:
Just do a:
youtube-dl -j --flat-playlist 'https://www.youtube.com/watch?v=k4JGSAmu4lg' | jq -r '.id' | sed 's_^_https://youtube.com/v/_'
Source