Entropy of file in network transfer

If we ignore application layer file compression (such as Mega or iCloud compressing a file before transfer), do the contents of a file affect the speed of transfer?

I.e., all things being equal, does the underlying internet/routers/PHY layer care whether it's transporting 1 GB of zeros vs. 1 GB of high-entropy random data?

I understand there can be compression, but I'm asking specifically without that enabled.


If we ignore application layer file compression (such as Mega or iCloud compressing a file before transfer), do the contents of a file affect the speed of transfer?

I understand there can be compression, but I'm asking specifically without that enabled.

No, one byte always takes the same amount of time to transmit, regardless of its value, and one packet of a given size and type always takes the same amount of time to handle, regardless of its payload. For example, at 1 Gbit/s, a 1 GB file occupies 8 × 10⁹ bits ÷ 10⁹ bit/s = 8 seconds of line time whether it's all zeros or random noise.

However, there are other possible differences besides just the speed of transfer:

I.e., all things being equal, does the underlying internet/routers/PHY layer care whether it's transporting 1 GB of zeros vs. 1 GB of high-entropy random data?

Some physical layers do care. A long string of identical bits can cause them to desynchronize: they rely on the occasional transition between 0s and 1s, and without it they can lose track of where a bit, a byte, or a symbol starts and ends. To prevent this from causing problems, a higher layer has to scramble the data (encrypt it, in a sense) to increase its entropy. For example, SONET has this problem.

  • Juniper: Enabling SONET Payload Scrambling

  • Cisco: When Should Scrambling Be Enabled on ATM Virtual Circuits?

  • Wikipedia: 64b/66b encoding – run length; Cisco Press: Packet over SONET

    "An earlier scrambler used in Packet over SONET/SDH (RFC 1619) had a short polynomial with only 7 bits of internal state which allowed a malicious attacker to create a denial-of-service attack by transmitting patterns in all 27−1 states, one of which was guaranteed to de-synchronize the clock recovery circuits. This vulnerability was kept secret until the scrambler length was increased to 43 bits (RFC 2615) making it impossible for a malicious attacker to jam the system with a short sequence."

This doesn't apply to all physical layers, only some.

For example, gigabit fiber Ethernet (1000BASE-X) is unaffected thanks to its 8b/10b encoding, which guarantees frequent transitions. In other cases (such as copper Ethernet), scrambling is built directly into the physical layer, so higher layers do not need to care about it (just as they shouldn't).

Serial (RS-232) links use explicit 'start/stop bits' for the same reason.

  • Wikipedia: clock recovery
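
To make this concrete, here is a toy bit-level sketch (my own Python illustration, not from any of the references above) of a self-synchronous x⁴³ + 1 scrambler like the one RFC 2615 specifies for Packet over SONET. The point is only to show that a payload full of zeros still produces transitions on the wire, and that the receiver recovers the payload without sharing any state in advance; it is not a faithful SONET implementation.

    import random

    DEGREE = 43  # x^43 + 1, the scrambler length from RFC 2615

    def scramble(bits, register):
        """Self-synchronous scrambler: each output bit is the input bit XORed
        with the output bit transmitted DEGREE positions earlier."""
        out = []
        for b in bits:
            s = b ^ register[-1]
            out.append(s)
            register.insert(0, s)   # newest output bit enters the shift register
            register.pop()          # oldest bit falls out
        return out

    def descramble(bits, register):
        """Inverse: XOR each received bit with the received bit DEGREE back."""
        out = []
        for s in bits:
            out.append(s ^ register[-1])
            register.insert(0, s)
            register.pop()
        return out

    payload = [0] * 500                                       # "a file of zeros", in miniature
    tx_state = [random.randint(0, 1) for _ in range(DEGREE)]  # leftover state from earlier traffic
    line = scramble(payload, tx_state)

    print(sum(line), "one-bits on the wire out of", len(line))  # plenty of 1s despite the all-zero payload

    # A receiver that starts with the wrong register still recovers the payload
    # after the first DEGREE bits -- that's the "self-synchronous" part.
    rx = descramble(line, [0] * DEGREE)
    assert rx[DEGREE:] == payload[DEGREE:]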

Higher layers don't care at all. They've all been built to transport arbitrary payloads, and there's no particular reason why e.g. a TCP segment containing all 0s would be handled differently than the rest. (And even such a segment still has a TCP header and an IP header that are distinctly not null.)


Of course, this is also not an issue if your data gets encrypted by an intermediate layer (such as being transferred over TLS or via secure Wi-Fi), which always makes it look high-entropy to the outside.
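
As a rough illustration of that last point (my own sketch, not part of the original answer), measuring the Shannon entropy of the byte values shows why an all-zero file and ciphertext look completely different to anything inspecting the payload, even though both transmit at the same speed:

    import math
    import os
    from collections import Counter

    def byte_entropy(data: bytes) -> float:
        """Shannon entropy in bits per byte: 0 for constant data, close to 8 for random data."""
        counts = Counter(data)
        n = len(data)
        return sum(c / n * math.log2(n / c) for c in counts.values())

    zeros = bytes(1_000_000)        # stand-in for "1 GB of zeros"
    noise = os.urandom(1_000_000)   # stand-in for encrypted or well-compressed data

    print(byte_entropy(zeros))      # 0.0 bits per byte
    print(byte_entropy(noise))      # just under 8.0 bits per byte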


As others have said, most modern transmission technologies are pretty deterministic: a string of X bits will always take the same time to transmit, either as-is or, if a lower layer requires scrambling or line coding, with a fixed overhead ratio.

There are, however, a few cases where there could be a slight effect, if some characters need to be escaped. This is for instance the case for PPP, where at least 0x7D and 0x7E need to be escaped (the former being the escape prefix, the latter the frame delimiter). Additional characters may need to be escaped if the link requires it. Each escaped character takes twice as long to transmit. Since PPP is still the basis for PPPoA and PPPoE and used in some last-mile scenarios, this could have a very slight effect; unless, of course, your file is just a repetition of 0x7D or 0x7E, in which case it will take twice as long as a file that doesn't contain those characters at all.
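
A minimal sketch of that byte stuffing (my own Python illustration of RFC 1662-style escaping, not code from the answer) shows the worst case and the typical case:

    import os

    FLAG, ESC = 0x7E, 0x7D

    def ppp_escape(payload: bytes, extra=frozenset()) -> bytes:
        """RFC 1662-style byte stuffing: 0x7D and 0x7E (plus any link-specific
        characters in `extra`) are sent as 0x7D followed by the byte XORed with 0x20."""
        out = bytearray()
        for b in payload:
            if b in (FLAG, ESC) or b in extra:
                out += bytes([ESC, b ^ 0x20])
            else:
                out.append(b)
        return bytes(out)

    print(len(ppp_escape(bytes([FLAG]) * 1000)))   # 2000: worst case, every byte doubled
    print(len(ppp_escape(os.urandom(1000))))       # ~1008: random data barely grows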

There is also the case of bit stuffing, as used for instance by HDLC and USB: long runs of ones produce no transitions on the line (USB's NRZI coding only changes level on a zero), so after too many consecutive ones a zero is inserted to make sure sync is not lost. The worst case here is that if you send only ones (i.e. your file is just a repetition of 0xFF), the transfer takes 20% longer (HDLC, extra bit after 5 ones) or about 17% longer (USB, extra bit after 6 ones) than if you send all zeros or any sequence that never contains 5 or 6 consecutive ones.
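
Here is a similarly rough sketch of bit stuffing, assuming the HDLC rule of inserting a 0 after five consecutive 1s (USB uses six):

    def bit_stuff(bits, run=5):
        """Insert a 0 after `run` consecutive 1s (HDLC uses run=5, USB uses run=6)."""
        out, ones = [], 0
        for b in bits:
            out.append(b)
            ones = ones + 1 if b else 0
            if ones == run:
                out.append(0)   # stuffed bit
                ones = 0
        return out

    all_ones  = [1] * 8000      # a kilobyte of 0xFF
    all_zeros = [0] * 8000      # a kilobyte of 0x00
    print(len(bit_stuff(all_ones)))    # 9600: 20% longer on the wire
    print(len(bit_stuff(all_zeros)))   # 8000: unchanged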

Back in the old days, when not all links were 8-bit transparent, transmitted data could need encoding in some situations (e.g. base64 for binary data) and not in others (e.g. pure ASCII sent as-is), with things like quoted-printable in between (e.g. text with a few accented characters). So depending on what you sent, it would require more or fewer characters/bits on the wire. But that should be extremely rare nowadays (it was mostly an issue for mail).
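
For the record, the overhead of those encodings is easy to measure; the sample data below is just a placeholder of my own:

    import base64
    import quopri

    binary = bytes(range(256)) * 16                            # arbitrary binary bytes
    text   = "Un bref résumé, mostly plain ASCII text.\n".encode("utf-8") * 100

    print(len(base64.b64encode(binary)) / len(binary))   # ~1.33: base64 always adds a third
    print(len(quopri.encodestring(text)) / len(text))    # ~1.2: only the accented bytes grow (each becomes =XX)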

In all those cases, it's not really the entropy that matters, but whether the actual content matches specific sequences. If you have high-entropy data (e.g. compressed or encrypted data), you get a relatively consistent average speed even in those cases. If you have specific sequences of data (say, 1 GB of 0x7D over PPP or 1 GB of 0xFF over HDLC), it could take longer. If you avoid those sequences altogether, it could be slightly shorter.

Note that some lower layers introduce compression even if you don't use it at higher layers. Again, back in the old days of POTS (dial-up) modems, the modems could use V.42bis compression between them. There are probably a few other transmission technologies which include compression at a relatively low layer.
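
V.42bis itself is dictionary-based (LZW-style); as a stand-in for any such low-layer compression, the same effect is easy to see with zlib (my own illustration, not part of the answer):

    import os
    import zlib

    zeros = bytes(1_000_000)
    noise = os.urandom(1_000_000)

    print(len(zlib.compress(zeros)))   # around 1 KB: a megabyte of zeros all but vanishes
    print(len(zlib.compress(noise)))   # slightly larger than the input: random data doesn't compress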


Does the underlying internet/routers/PHY layer care whether it's transporting 1 GB of zeros vs. 1 GB of high-entropy random data?

Quite often there's something along those lines. Some examples:

  • On many commonly used links, including anything HDLC-based, bit stuffing causes long sequences of 1s to require nearly 20% more time than a random sequence of the same length.
  • Some modem-level data compression was standardized as V.42bis and V.44, and is still found in modern devices.
  • I'm told some long-distance carriers, or their customers, insert lossless low-latency compression in their links, because that does save bandwidth/money. References appreciated.
  • There's compression (often gzip) built into HTTP and supported by common servers and browsers (see the sketch after this list), and that's hardly an "application layer" compression comparable to "Mega, or iCloud compressing a file before transfer".
  • There are (typically old) protocols where a byte is reserved for an escape character and is transmitted as two bytes (except, in some protocols, when repeated).
  • In Morse code, a dot takes less time to send than a dash, and that's not the only protocol in which 0 and 1 take different amounts of time to transmit.
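
To see that HTTP-level compression from the client side, here is a minimal sketch (my own example; the URL is a placeholder, and note that urllib does not decode Content-Encoding by itself):

    import gzip
    import urllib.request

    # Placeholder URL: any server that honours "Accept-Encoding: gzip" will do.
    req = urllib.request.Request("https://example.com/",
                                 headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        if resp.headers.get("Content-Encoding") == "gzip":
            body = gzip.decompress(body)   # on-the-wire size and decoded size can differ a lot
    print(len(body), "bytes after decoding")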

Generally, on-the-fly compression would be possible at any level. In practice it can happen below the application layer, e.g. if your connection includes SSH port forwarding through ssh -C (enable compression, including for port forwarding and X11).

SSH compression only offers zlib (the same deflate algorithm used by gzip), not a faster modern algorithm like zstd or lz4, so it's only going to speed things up if your CPU is fast compared to the link speed.

Standard physical/link-layer protocols like 802.3 Ethernet or 802.11 Wi-Fi don't use compression: it would add latency, require powerful hardware to keep up with gigabit data rates, and any size gain on compressible payloads comes at the cost of a worse worst case for incompressible ones. The same goes for link-level protocols used over long-distance fiber-optic links.

Compression works a lot better at the application level, or at least for a VPN-like tunnel over multiple lower-level links.


The only things that could speed up the transfer are sending fewer packets (application-level compression), sending smaller packets (less good, and what you'd probably get from hypothetical link-level compression applied after the data has already been framed into TCP segments), or a lower packet-loss rate, if it were non-zero to begin with.

user1686's answer brought up the possibility of some data patterns being a problem for the link-level encodings, e.g. possibly you could craft packets that push some equipment to the point of miscommunicating, if they were already close to their timing tolerances. Generally not something you have to worry about on the Internet, though, AFAIK. Real-world fiber links are usually well maintained and have a very low rate of introducing bit-errors that would lead to a TCP checksum failure, and that's probably not data-dependent to any significant degree. (And after scrambling, not dependent on long runs of 0s or 1s; those happen in real-life uncompressed data, so scrambling functions are designed to make sure those aren't a problem.)