Does 5 GB of jpeg images take the same amount of time to download and/or import as 5 GB of plain text?

Solution 1:

The answer is "it depends", mainly on what you mean by "download".

If you're downloading from a web site, then some sites automatically compress files "on the fly". Text compresses very well, while JPEG is already compressed, so it won't compress much further. In that case, there will be a big difference.
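A rough sketch of why this matters (not from the original answer, just an illustration): gzip a block of repetitive text and a block of random bytes standing in for JPEG data, which is already close to random. The text shrinks dramatically; the JPEG-like data barely changes.

```python
# Compare how well plain text and already-compressed (JPEG-like) data shrink
# with gzip. Random bytes are used here as a stand-in for JPEG content.
import gzip
import os

text_sample = ("The quick brown fox jumps over the lazy dog. " * 20000).encode("utf-8")
jpeg_like_sample = os.urandom(len(text_sample))  # already-incompressible stand-in

for label, payload in [("plain text", text_sample), ("JPEG-like", jpeg_like_sample)]:
    compressed = gzip.compress(payload)
    ratio = len(compressed) / len(payload)
    print(f"{label}: {len(payload)} bytes -> {len(compressed)} bytes ({ratio:.0%})")
```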

If you're just using a copy command to copy files from one computer to another, then there will be no difference. However, if you're employing some kind of specialized tool, then again it depends on whether that tool uses automatic compression or not. The only difference between JPEG and text is whether the files can be compressed.

There is no difference in the 'overhead' associated with file transfer, no matter what the file is.

Solution 2:

With 5 GB of pictures you are likely talking about a few thousand reasonably sized files, say 3 MB+ each. If you were downloading 5 GB of text files, you'd typically expect each file to be a lot smaller, so you'd likely be dealing with one or two orders of magnitude more files (hundreds of thousands or millions of files).

Copying lots of small files takes longer than copying the same amount of data in bigger files, because there is a noticeable per-file overhead in creating each individual file.
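A quick, hypothetical way to see the per-file overhead for yourself (numbers and sizes here are made up for illustration): copy the same total number of bytes once as many small files and once as a single file, and time both.

```python
# Rough benchmark of per-file overhead: many small files vs. one big file
# of the same total size. Results vary by filesystem and hardware.
import os
import shutil
import tempfile
import time

def make_small_files(directory, count, size):
    for i in range(count):
        with open(os.path.join(directory, f"file_{i}.txt"), "wb") as f:
            f.write(b"x" * size)

with tempfile.TemporaryDirectory() as work:
    small_src = os.path.join(work, "small_src")
    os.makedirs(small_src)
    make_small_files(small_src, count=2000, size=4096)   # 2000 files x 4 KiB
    big_src = os.path.join(work, "big.bin")
    with open(big_src, "wb") as f:
        f.write(b"x" * (2000 * 4096))                    # same total bytes

    start = time.perf_counter()
    shutil.copytree(small_src, os.path.join(work, "small_dst"))
    small_time = time.perf_counter() - start

    start = time.perf_counter()
    shutil.copy(big_src, os.path.join(work, "big.copy"))
    big_time = time.perf_counter() - start

    print(f"2000 small files: {small_time:.3f} s, one big file: {big_time:.3f} s")
```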

Not enough to make a massive difference probably, but still a difference.

Solution 3:

The "It Depends" in ftp is in the fine details.

ftp binary mode is just a straight transfer and will take however long 5 GB takes.

If you're going from Windows to Linux as an ftp text transfer (i.e., ASCII mode for, surprisingly, plain text), ftp actually changes the line endings from \r\n to \n and vice versa. There's probably a little overhead in the streaming replacement, but with 5 GB of text you'll have less to write to disk going from Windows to Linux, since you drop one character per line, and more going from Linux to Windows, since you add one character per line.
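A tiny illustration of that conversion (not part of the original answer): stripping the carriage returns drops exactly one byte per line, and adding them back puts it back.

```python
# CRLF (Windows) vs. LF (Unix) line endings: one byte per line of difference.
windows_text = b"line one\r\nline two\r\nline three\r\n"
unix_text = windows_text.replace(b"\r\n", b"\n")

print(len(windows_text), "bytes with CRLF")  # three bytes more, one per line
print(len(unix_text), "bytes with LF only")
```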

So, is it 5GB on Linux? or Windows?

Enough pedantry for one night, going to bed!

Solution 4:

There is no overhead associated with files themselves, but some storage/transfer facilities support automatic compression, and that may introduce a difference.

When copying from a DVD to an uncompressed drive, there is no difference. When copying to a compressed NTFS drive, text will take less space than JPEGs.

When downloading from an HTTP server that uses compression, text will take less time to download. But if the server does not use compression, there will be no difference.
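One way to check what a given server actually does (the URLs below are placeholders; whether compression happens depends entirely on the server's configuration): ask for gzip and look at the Content-Encoding header and the size on the wire.

```python
# Ask an HTTP server for gzip and report what came back. urllib does not
# auto-decompress, so len(body) reflects the bytes actually transferred.
import urllib.request

def fetch_with_compression(url):
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request) as response:
        body = response.read()
        encoding = response.headers.get("Content-Encoding", "none")
        print(f"{url}: {len(body)} bytes on the wire, Content-Encoding: {encoding}")

fetch_with_compression("https://example.com/page.html")  # text: often compressed
fetch_with_compression("https://example.com/photo.jpg")  # JPEG: usually sent as-is
```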

Also, talking about overhead: a million small files totaling 5 GB will take more actual space and usually more time to copy than a single 5 GB file, because that 5 GB figure does not include the space needed to store file names, dates, and other metadata.
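If you want to see the allocation overhead on your own machine, here is a sketch that assumes a Unix-like filesystem (where st_blocks is reported in 512-byte units); it compares the apparent size of the files in a directory with the space actually reserved for them on disk.

```python
# Apparent file size vs. space actually allocated on disk for a directory.
import os

def apparent_and_allocated(directory):
    apparent = allocated = 0
    for entry in os.scandir(directory):
        if entry.is_file(follow_symlinks=False):
            info = entry.stat(follow_symlinks=False)
            apparent += info.st_size
            allocated += info.st_blocks * 512  # blocks reserved, not bytes written
    return apparent, allocated

apparent, allocated = apparent_and_allocated(".")
print(f"apparent size: {apparent} bytes, allocated on disk: {allocated} bytes")
```

Many tiny files each round up to at least one allocation block, so the gap grows with the file count.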

Solution 5:

This is meant to be an addition to the other answers that address compression, etc., as factors that affect efficiency and download time.

One point that hadn't been mentioned yet is packet efficiency. I doubt most people have even come across this, so here's a brief bit of background.

Before venturing into using web services, we wanted to know the difference in efficiency between using them and using a more "standard" database connection (such as OleDb, System.Data.SqlClient, JDBC, etc.). We had our guru put packet sniffers in place to track the data streams across the network to see the difference.

We expected that using web services would be less efficient because of the binary format of the other types of connections, and the added overhead of the XML tags used to describe the data.

What we found was that the web services were, in many cases, MORE efficient, at least on our network. The difference was that when we were transferring binary data, some of the bytes within the packets were empty, but when sending text data, the packets were used more efficiently.

We found this interesting and tried it while transferring different sorts of files, and found that, as a rule, plain text going over the network always used 100% of the bits available in each packet, whereas binary transfers often had unused bits. Why this is, I couldn't tell you, but several experiments bore it out.

Several comments on the question seemed to dismiss this as an obviously flawed question, but it's really not. Even though the amount of data remains the same, the efficiency of the pipe matters as well.

Because I can't resist making analogies that a non-IT person would understand:

A single shelf in a freezer in a grocery store has x amount of space, yet you can fit more gallons of ice cream on a shelf if the containers are square than you can if they are round, because of the wasted space created by using round containers. Our tests, although counter-intuitive at first, told us what any grocery store stocker could have told us.