NIC spontaneously switches between 100 and 1000 Mbit [closed]

After having given numerous excuses to my coworkers, tested a pile of hardware (including a stack of recently purchased, but somewhat cheap Realtek NICs), looked through MBs of log files and read up on kernel parameters, I've found the culprit.

Deciding that this was probably software related, I booted my system in single-user mode and ran the command-line version of speedtest from Ookla. It reflected the bandwidth advertised by my ISP, around 500/500 Mbit. At no point did the link fall back to 100 Mbit.

After booting into multi-user mode again, the problem was back. When left alone for a couple of minutes, the NIC would fall back to 100 Mbit. When I ran speedtest again, the NIC would renegotiate the link to 1,000 Mbit about 2-5 seconds into the test. About a minute or so after the test completed, the NIC was back at 100 Mbit.
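For anyone who wants to watch the fallback happen, here is a minimal sketch that reads the negotiated speed from sysfs. The function name `link_speed`, the `SYSFS_NET` override and the interface name `eth0` are my own placeholders, not something from my actual session; the sysfs `speed` file reports Mbit/s on Linux, but it may be unreadable while the link is down.

```shell
# Print the negotiated link speed (Mbit/s) of an interface, read from sysfs.
# SYSFS_NET is a hypothetical override so the function can be pointed at a
# test directory; by default it uses the standard /sys/class/net path.
link_speed() {
    iface="$1"
    speed_file="${SYSFS_NET:-/sys/class/net}/$iface/speed"
    if [ -r "$speed_file" ]; then
        # Reading can still fail (e.g. link down), so fall back to "unknown".
        cat "$speed_file" 2>/dev/null || { echo "unknown"; return 1; }
    else
        echo "unknown"
        return 1
    fi
}

# Example (eth0 is an assumed interface name): log the speed every 5 seconds.
#   while sleep 5; do printf '%s %s\n' "$(date +%T)" "$(link_speed eth0)"; done
```

Leaving the loop running in a second terminal while the machine idles should show the value change from 1000 to 100 and back once traffic picks up.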

Armed with systemctl stop, I started going through the daemons running on the system one by one. Finally I found the gremlin hiding in my system: tuned.
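The one-by-one hunt can be sketched roughly like this. The helper `service_names` is a hypothetical name of mine; it only extracts the unit names from the output of `systemctl list-units`, and the stopping itself stays a manual step so you can watch the link after each one.

```shell
# Extract the unit name (first column) from `systemctl list-units` output
# read on stdin. Assumes the --no-legend --plain format, where each line
# starts with the unit name.
service_names() {
    awk 'NF { print $1 }'
}

# Example: list the candidates, then stop them one at a time and observe
# the link speed for a few minutes before moving on to the next one.
#   systemctl list-units --type=service --state=running --no-legend --plain | service_names
#   sudo systemctl stop tuned.service
```

Stopping services one at a time is slow, but it avoids guessing; a service that is needed can be started again with systemctl start afterwards.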

When I stopped tuned, the issue was resolved. Upon closer inspection, it appears that at some point I had enabled the maximum power-saving profile in tuned:

$ tuned-adm --debug active
Current active profile: powersave

I would imagine setting this to something less aggressive would also do the trick, but for now I turned it off completely:

$ tuned-adm off

More info from https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/performance_tuning_guide/chap-red_hat_enterprise_linux-performance_tuning_guide-tuned:

As a practical example, consider a typical office workstation. Most of the time, the Ethernet network interface is very inactive. Only a few emails go in and out every once in a while or some web pages might be loaded. For those kinds of loads, the network interface does not have to run at full speed all the time, as it does by default. Tuned has a monitoring and tuning plug-in for network devices that can detect this low activity and then automatically lower the speed of that interface, typically resulting in a lower power usage. If the activity on the interface increases for a longer period of time, for example because a DVD image is being downloaded or an email with a large attachment is opened, tuned detects this and sets the interface speed to maximum to offer the best performance while the activity level is so high. This principle is used for other plug-ins for CPU and hard disks as well.
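The behaviour described in that quote is tuned's dynamic tuning. As a hedged sketch, assuming a stock install where the daemon's global settings live in /etc/tuned/tuned-main.conf and use a `dynamic_tuning = 0|1` line, the helper below (the name `dynamic_tuning_setting` is my own) prints the current value. I have not verified that disabling dynamic tuning alone stops the speed switching on my system; turning tuned off is what I actually tested.

```shell
# Print the value of dynamic_tuning from a tuned main config file.
# The default path is an assumption based on a stock install; pass another
# path as the first argument to inspect a different file.
dynamic_tuning_setting() {
    conf="${1:-/etc/tuned/tuned-main.conf}"
    # Match "dynamic_tuning = <value>" on non-comment lines and print the
    # value; if the key appears more than once, the last occurrence wins.
    sed -n 's/^[[:space:]]*dynamic_tuning[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$conf" 2>/dev/null | tail -n 1
}

# Example:
#   dynamic_tuning_setting
#   dynamic_tuning_setting /path/to/other/tuned-main.conf
```

An empty result means the key was not found or the file could not be read, in which case the daemon's built-in default applies.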

Learnings:

  • I should really remember when I enable something like this.
  • A little more (than no) verbosity would be lovely when daemons tweak hardware like this.
  • I should test more before running off to the hardware store.
  • It's not always the kernel's fault ;-)