Why was the Windows File Transfer so bad at predicting transfer time? [duplicate]

[Image: xkcd comic "Estimation"]

I know that the Windows copy dialog (in Windows XP) stores the copy in memory first and is still copying after the dialog closes, so the time is off. But why is the estimate of how long a copy will take so inaccurate, even when memory copying has been disabled (in Vista and Windows 7)? It seems so arbitrary! How does the whole copy procedure work, and why can't Windows estimate it correctly?


Solution 1:

In short: the poor algorithm and the jumpy estimate are an implementation weakness.

Other tools like TeraCopy do a better job. I don't think it is worth explaining why the Windows implementation is not good. Microsoft will have noticed it and will improve it.

What is difficult:

  1. You have to take resource fluctuations into account (mainly CPU, network bandwidth and HDD speed).
  2. You need to extrapolate the time it will take by predicting the behavior (which Windows file copy definitely does badly right now).
  3. You need to adjust your original estimate over time (I mean small adjustments, not like in the funny picture above!).

For this, not only the number of bytes but also the number of files to create plays a role. A million 1KB files and a thousand 1MB files are the same amount of data, but the situation is quite different because the former has the overhead of creating many, many files. Depending on the filesystem used, this could take more time than actually transferring the data.
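To make the small-files point concrete, here is a minimal sketch of a copy-time model with a fixed cost per file plus the time to stream the bytes. The function name and the numbers (5 ms per file, 100 MB/s) are assumptions picked for illustration, not measurements of any real filesystem or of what Windows does.

```python
def estimate_seconds(n_files, total_bytes, per_file_overhead_s, bytes_per_s):
    """Rough model: fixed cost to create each file plus time to stream the data."""
    return n_files * per_file_overhead_s + total_bytes / bytes_per_s


# Assumed, illustrative values only.
OVERHEAD_S = 0.005          # 5 ms to create one file
THROUGHPUT = 100_000_000    # 100 MB/s sustained transfer rate

# Same 1 GB of data, split two different ways.
many_small = estimate_seconds(1_000_000, 1_000_000 * 1_000, OVERHEAD_S, THROUGHPUT)
few_large = estimate_seconds(1_000, 1_000 * 1_000_000, OVERHEAD_S, THROUGHPUT)

print(f"a million 1KB files:  {many_small:.0f} s")
print(f"a thousand 1MB files: {few_large:.0f} s")
```

With these assumed numbers the per-file term dominates the first case completely, so an estimate based on bytes alone would be far off.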

This dialog has also driven me mad quite a few times:

  • On an older WinNT system, if you had a lot of small files to copy, it displayed the name and a nice animation for each file, slowing the whole process down until it was practically unusable.

The modern Windows copy stuff is not much better:

  • To compute the amount of data to transfer, it seems to scan everything first (that is what I suppose it does), so if you select many directories it takes ages before it actually starts doing the job.
  • Some built-in timeout prevents big files from being copied (more than about 60GB on my system). The pain is that it only tells you this after it has already copied more than 30GB over the network, and that is lost bandwidth and time because you have to restart from scratch!
  • Copying files from one computer to another is damn slow for some reason. (I mean compared with the available network bandwidth. Other tools are faster, so it's not a computational limitation.)

Solution 2:

Raymond Chen wrote a very nice article about this once. Basically, the dialog is just guessing :).

https://devblogs.microsoft.com/oldnewthing/20040106-00/?p=41193

"Because the copy dialog is just guessing. It can't predict the future, but it is forced to try. And at the very beginning of the copy, when there is very little history to go by, the prediction can be really bad.

Here's an analogy: Suppose somebody tells you, "I am going to count to 100, and you need to give continuous estimates as to when I will be done." They start out, "one, two, three...". You notice they are going at about one number per second, so you estimate 100 seconds. Uh-oh, now they're slowing down. "Four... ... ... five... ... ..." Now you have to change your estimate to maybe 200 seconds. Now they speed up: "six-seven-eight-nine" You have to update your estimate again.

Now somebody who is listening only to your estimates and not to the person counting thinks you are off your rocker. Your estimate went from 100 seconds to 200 seconds to 50 seconds; what's your problem? Why can't you give a good estimate?

File copying is the same thing. The shell knows how many files and how many bytes are going to be copied, but it doesn't know how fast the hard drive or network or internet is going to be, so it just has to guess. If the copy throughput changes, the estimate needs to change to take the new transfer rate into account."

Solution 3:

I am going to count to ten: 1....2....3....4. How many dots is it going to take to get to 10?

5.6.7 What about now? Do you take all past dots between numbers into account and average them, do you take only the last 4 intervals and use that average, or do you look only at the last interval?

You have the same problem with file transfers. The speed of a file transfer is not constant; it speeds up and slows down based on a lot of factors. The reason the number jumps around so much is that Microsoft leaned toward the "only count the last interval" side of the spectrum.

There is nothing wrong with that side of the spectrum. It gives you more accurate "seconds per second" (one second of real time makes the counter go down by one second), but it causes the total ETA to jump around a lot.
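The three choices (whole history, last 4 intervals, last interval only) can be compared with a small sketch. The function name, the one-sample-per-second history and the numbers are made up for illustration; this is not how the Windows dialog is actually implemented.

```python
def eta_seconds(remaining, intervals, window=None):
    """ETA from a list of per-second transfer amounts.

    window=None averages the whole history;
    window=k averages only the last k intervals.
    """
    recent = intervals if window is None else intervals[-window:]
    rate = sum(recent) / len(recent)
    if rate == 0:
        return float("inf")  # nothing moved in the window, so no finite estimate
    return remaining / rate


# Hypothetical history in MB per second: steady at 10, then a sudden drop to 2.
history = [10, 10, 10, 10, 2]
remaining = 100  # MB still to copy

eta_all = eta_seconds(remaining, history)             # about 11.9 s
eta_last4 = eta_seconds(remaining, history, window=4)  # 12.5 s
eta_last1 = eta_seconds(remaining, history, window=1)  # 50.0 s

print(eta_all, eta_last4, eta_last1)
```

A single slow second barely moves the first two estimates but quadruples the last one, which is the jumpiness people complain about.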

A good example of the opposite side is 7-Zip when it is compressing. If the compression speed drops partway through, the ETA does not jump dramatically like a file transfer ETA, but it may take 2 to 3 real seconds for the timer to tick down one second (or it may even start counting up) until it stabilizes at the new speed.
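One way to get that kind of slow-moving ETA is to smooth the measured rate, for example with an exponential moving average. I don't know what 7-Zip actually does internally, so treat this as a sketch of the general idea; the function name, the `alpha` weight and the sample rates are all assumptions.

```python
def smoothed_etas(remaining, rates, alpha=0.2):
    """Return the ETA after each one-second sample, using a smoothed rate.

    A smaller alpha trusts history more, so the ETA reacts more slowly.
    """
    smoothed = rates[0]
    etas = []
    for rate in rates:
        # Blend the newest sample into the running rate.
        smoothed = alpha * rate + (1 - alpha) * smoothed
        remaining -= rate
        etas.append(remaining / smoothed if smoothed > 0 else float("inf"))
    return etas


# Hypothetical rates in MB per second: the speed halves after three seconds.
rates = [10, 10, 10, 5, 5, 5]
etas = smoothed_etas(1000, rates)

for second, eta in enumerate(etas, start=1):
    print(f"t={second}s  ETA={eta:.1f}s")
```

With these numbers, a "last interval only" estimate would jump from 97 s to 193 s the moment the speed halves. The smoothed ETA instead creeps upward over several seconds, so the timer counts up for a while before it settles at the new speed.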