How to repair the checksum of the non-volatile memory (NVM) of Intel Ethernet Controller I219-V of an ASUS laptop?

I have a problem with a new ASUSPRO B8430UA laptop: its Intel Ethernet Connection I219-V does not work under Linux. In fact, I tried two different laptops of this model, and both had the same problem.

The Linux driver is e1000e; it produces the following messages while Linux (Ubuntu 16.04) boots:

$ dmesg | grep e1000e 
[ 5.643760] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k 
[ 5.643761] e1000e: Copyright(c) 1999 - 2015 Intel Corporation. 
[ 5.644308] e1000e 0000:00:1f.6: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode 
[ 5.877838] e1000e 0000:00:1f.6: The NVM Checksum Is Not Valid 
[ 5.907340] e1000e: probe of 0000:00:1f.6 failed with error -5 

I tried installing the latest version of e1000e, 3.3.4, but it didn't help (and it tainted the kernel).

I asked about this on the e1000-devel mailing list and was advised to contact my laptop manufacturer, because "The NVM Checksum Is Not Valid" means that the contents of the non-volatile memory of my Ethernet chip are corrupted, or at least that they do not match the stored checksum (unfortunately, I am not a specialist and cannot explain this more precisely).

I then asked Intel customer support, and they replied that they do not support OEM systems (on-board Ethernet chips in laptops) and that I should contact ASUS:

Unfortunately as your system is OEM our support options are extremely limited. The laptop manufacturer may have altered the software or the hardware and this is why support and drivers for such systems is provided directly by the laptop manufacturer.

I contacted ASUS customer support, but they replied that they have no tools for checking or repairing the contents of the NVM, and that if I found such tools, they would be glad to hear about them. They also explained that they only support the original hardware and software configuration, and this laptop model is sold with Windows 7. Under Windows 7 my Ethernet seems to work fine. From what I've learned, Windows simply doesn't check the NVM checksum.

I found that in a similar case in 2011, the problem could be fixed using the Intel Ethernet Connections Boot Utility:

https://thesorcerer.wordpress.com/2011/07/01/guide-intel-82573l-gigabit-ethernet-with-ubuntu-11-04-and-fix-pxe-e05/

However, the DISCLAIMER in the last paragraph warns:

You probably need to know that the Intel(R) Ethernet Connections Boot Utility WAS NOT designed to be used with on board (also know as OEM) lan cards (is for the PCI cards) therefore there is no sure way to predict it’s interactions with others on board components like USB or SOUND controllers.

The description of BootUtil version 1.6.13.0 also seems to say that it is not exactly intended for use with on-board Ethernet controllers:

The Intel(R) Ethernet Flash Firmware Utility (BootUtil) is a utility that can be used to program the PCI option ROM on the flash memory of supported Intel PCI and PCI-Express-based network adapters, and to update configurations.

[...]

OEMs may provide custom flash firmware images for OEM network adapters. Please refer to the instructions given by OEMs.

There is however a paragraph I didn't understand:

PXE+EFI and iSCSI+EFI image combinations are supported for all OEM generic adapters, however support is limited to devices which support both technologies as discrete images.

Moreover, comment 5 on a 2008 issue, where the NVM was getting corrupted because of an e1000e driver bug, advises:

Please DO NOT run ibautil as some sites on the web suggest to try to fix this issue. It will likely cause you to have to replace your motherboard to get LAN functionality back.

IBAUTIL is one of the predecessors of BootUtil.

Nevertheless, I ran BootUtil under Linux without command-line options to get the "list of all supported Intel network ports found in the system." This is what I got:

$ sudo ./bootutil64e

Intel(R) Ethernet Flash Firmware Utility
BootUtil version 1.6.13.0
Copyright (C) 2003-2016 Intel Corporation

Type BootUtil -? for help

Port Network Address Location Series  WOL Flash Firmware                Version
==== =============== ======== ======= === ============================= =======
  1   D017C2201F59     0:31.6 Gigabit N/A FLASH Not Present

I would like to understand what "FLASH Not Present" means in this context, and what options I have for fixing the checksum.


Update 1. According to a comment I received on the e1000-devel mailing list about "FLASH Not Present":

The flash and NVM are two separate items. The flash enables things like PXE booting and iSCSI whereas the NVM stores things like the Network Address.


Update 2. I found Intel's datasheet for the I219. Section 10.3.2.2, Checksum Word Calculation, says:

The Checksum word (Word 0x3F, NVM bytes 0x7E and 0x7F) is used to ensure that the base NVM image is a valid image. The value of this word should be calculated such that after adding all the words (0x00- 0x3F) / bytes (0x00-0x7F), including the Checksum word itself, the sum should be 0xBABA. The initial value in the 16 bit summing register should be 0x0000 and the carry bit should be ignored after each addition.
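To make the rule above concrete, here is a minimal C sketch of the calculation, working on an in-memory copy of words 0x00-0x3F (for example, decoded from an ethtool -e dump). The function and macro names are hypothetical and this is not the e1000e driver's code; it only restates the datasheet arithmetic: add the 64 words in a 16-bit register, drop carries, and expect 0xBABA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants taken from the quoted datasheet text (names are hypothetical). */
#define NVM_CHECKSUM_WORD 0x3F   /* word index of the Checksum word */
#define NVM_SUM_TARGET    0xBABA /* required 16-bit sum of words 0x00-0x3F */

/* Sum the first `count` words in a 16-bit register, discarding carries. */
static uint16_t nvm_sum(const uint16_t *words, size_t count)
{
    uint16_t sum = 0x0000; /* initial value per the datasheet */
    for (size_t i = 0; i < count; i++)
        sum = (uint16_t)(sum + words[i]); /* carry out of bit 15 is dropped */
    return sum;
}

/* 1 if words 0x00-0x3F, Checksum word included, add up to 0xBABA. */
static int nvm_checksum_valid(const uint16_t words[NVM_CHECKSUM_WORD + 1])
{
    return nvm_sum(words, NVM_CHECKSUM_WORD + 1) == NVM_SUM_TARGET;
}

/* Value the Checksum word must hold so that the total becomes 0xBABA. */
static uint16_t nvm_checksum_for(const uint16_t words[NVM_CHECKSUM_WORD + 1])
{
    return (uint16_t)(NVM_SUM_TARGET - nvm_sum(words, NVM_CHECKSUM_WORD));
}

/* Overwrite the Checksum word in the in-memory copy (not on the chip). */
static void nvm_fix_checksum(uint16_t words[NVM_CHECKSUM_WORD + 1])
{
    words[NVM_CHECKSUM_WORD] = nvm_checksum_for(words);
    assert(nvm_checksum_valid(words));
}
```

Because the carry is discarded, the correct Checksum word is simply 0xBABA minus the sum of words 0x00-0x3E, taken modulo 0x10000.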


Solution 1:

Before trying my solution, please consider trying the one by ppparadox first.


With kind help from the e1000-devel mailing list, here is how I fixed the NVM Checksum word using ethtool.

tl;dr: I first patched e1000e so that I could access the Ethernet chip from Linux, and then used ethtool to read a value from the "checksummed" region of the NVM of my I219-V and write the same value back. The write fixed the checksum.

To access my Ethernet chip from Linux, I had to patch e1000e to skip NVM checksum validation. In src/netdev.c, I changed the first line of

for (i = 0;; i++) {
    if (e1000_validate_nvm_checksum(&adapter->hw) >= 0)
        break;
    if (i == 2) {
        dev_err(pci_dev_to_dev(pdev),
            "The NVM Checksum Is Not Valid\n");
        err = -EIO;
        goto err_eeprom;
    }
}

into

for (i = 0; false; i++) {

(The whole block could also be just removed or commented out.)

Then I installed the patched module. From the driver's src/ directory I ran:

sudo make install
sudo modprobe -r e1000e
sudo modprobe e1000e
sudo update-initramfs -u
reboot

Now the checksum validation was skipped and the Ethernet started working.

Before fixing the Checksum word, I studied the NVM layout of the I219 given in Section 10 of Intel's datasheet. The use of the Checksum word is explained in Section 10.3.2.2.

I noted the Checksum word before writing to the NVM:

$ sudo ethtool -e enp0s31f6 offset 0x7e length 2
Offset      Values
------      ------
0x007e:     60 13 

(enp0s31f6 is the name of my Ethernet interface.) NVM words are stored little-endian, with byte 0x7e as the low byte, so the erroneous Checksum word value was 0x1360.
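Converting ethtool's byte dump into NVM words is easy to get backwards, so here is a small C sketch of the mapping used above, assuming the dump lists bytes in little-endian word order (which matches the datasheet's "Word 0x3F, NVM bytes 0x7E and 0x7F"). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Combine two consecutive dump bytes into one NVM word (little-endian:
 * the byte at the lower offset is the low byte). */
static uint16_t nvm_word_from_bytes(uint8_t lo_byte, uint8_t hi_byte)
{
    return (uint16_t)(lo_byte | (hi_byte << 8));
}

/* Map an even byte offset in the dump to its NVM word index. */
static unsigned nvm_word_index(unsigned byte_offset)
{
    assert(byte_offset % 2 == 0); /* each word starts at an even byte offset */
    return byte_offset / 2;
}
```

For example, nvm_word_from_bytes(0x60, 0x13) gives 0x1360, and byte offset 0x7e maps to word 0x3F, the Checksum word.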

I looked at the full NVM dump with sudo ethtool -e enp0s31f6 and then read the byte at offset 0x10 on its own:

$ sudo ethtool -e enp0s31f6 offset 0x10 length 1
Offset      Values
------      ------
0x0010:     ff 

(Apparently any location in the checksummed region would do, but I was told that in my case the value at offset 0x10 was not used at all, so it seemed "safer.")

To write to the NVM (EEPROM) with ethtool, I needed a "magic key." After reading Unbricking an Intel Pro/1000 (e1000) network interface, I worked out from lspci -nn that my magic key was 0x15708086, i.e. the device ID 0x1570 followed by the vendor ID 0x8086:

$ lspci -nn | grep Ethernet
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection I219-V [8086:1570] (rev 21)
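The magic key is just the two PCI IDs from the [vendor:device] pair above packed into 32 bits. As far as I can tell from the e1000e source (e1000_set_eeprom in ethtool.c; worth checking against your kernel version), the driver rejects a write unless the magic equals vendor ID | (device ID << 16). A minimal C sketch of that packing, with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/* Pack PCI IDs the way e1000e is assumed to expect the ethtool magic:
 * device ID in the high 16 bits, vendor ID in the low 16 bits. */
static uint32_t e1000e_eeprom_magic(uint16_t vendor_id, uint16_t device_id)
{
    return (uint32_t)vendor_id | ((uint32_t)device_id << 16);
}
```

With vendor 0x8086 and device 0x1570 from the lspci line, this yields 0x15708086.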

Then I wrote 0xff back to offset 0x10 in the NVM:

$ sudo ethtool -E enp0s31f6 magic 0x15708086 offset 0x10 value 0xff

After comparing the dumps of the NVM before and after the write, I could see that, as expected, the only thing that changed was the Checksum word:

$ sudo ethtool -e enp0s31f6 offset 0x7e length 2
Offset      Values
------      ------
0x007e:     60 93 

The new value thus was 0x9360.
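Why did writing an unchanged byte fix anything? My reading of the e1000e ethtool write path (an assumption, not confirmed in this thread) is that after any write touching words 0x00-0x3F the driver recomputes and stores the Checksum word. The C sketch below only simulates that arithmetic on an in-memory array; the function name is hypothetical, and the real driver additionally handles shadow RAM and flash commits.

```c
#include <assert.h>
#include <stdint.h>

#define NVM_CHECKSUM_REG 0x3F   /* Checksum word index */
#define NVM_SUM          0xBABA /* required sum of words 0x00-0x3F */

/* Simulate writing one byte, then recomputing the Checksum word as the
 * driver is assumed to do when the written word is <= 0x3F. */
static void simulated_ethtool_write_byte(uint16_t nvm[NVM_CHECKSUM_REG + 1],
                                         unsigned byte_offset, uint8_t value)
{
    unsigned word = byte_offset / 2;
    assert(word <= NVM_CHECKSUM_REG); /* only the checksummed region here */

    /* Read-modify-write of the containing little-endian word. */
    if (byte_offset % 2 == 0)
        nvm[word] = (uint16_t)((nvm[word] & 0xFF00) | value);
    else
        nvm[word] = (uint16_t)((nvm[word] & 0x00FF) | (value << 8));

    /* Recompute the Checksum word from words 0x00-0x3E. */
    uint16_t sum = 0;
    for (unsigned i = 0; i < NVM_CHECKSUM_REG; i++)
        sum = (uint16_t)(sum + nvm[i]);
    nvm[NVM_CHECKSUM_REG] = (uint16_t)(NVM_SUM - sum);
}
```

Writing 0xff back to offset 0x10 leaves word 0x08 unchanged but still replaces a stale Checksum word with the correct one, which matches what the before/after dumps showed.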

I booted a kernel with an unpatched e1000e, and the Ethernet port worked fine.

P.S. I find it a bit worrying that only the highest bit in the Checksum word was wrong.
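The "only the highest bit" observation can be checked directly: 0x1360 XOR 0x9360 is 0x8000, a single set bit. A small C sketch that counts differing bits (hypothetical helper name):

```c
#include <assert.h>
#include <stdint.h>

/* Count how many bits differ between two 16-bit words. */
static int bits_differing(uint16_t a, uint16_t b)
{
    uint16_t diff = (uint16_t)(a ^ b);
    int count = 0;
    while (diff) {
        diff = (uint16_t)(diff & (diff - 1)); /* clear the lowest set bit */
        count++;
    }
    return count;
}
```

Here bits_differing(0x1360, 0x9360) returns 1, and the differing bit is bit 15.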

Solution 2:

I used Intel's bootutil for Linux (as suggested in the 2011 post) to fix this error on the integrated Intel NIC of my Asus Z270-A, without the recompiling and magic keys discussed in the upvoted answer. It worked great. I downloaded the tool from Intel's download site:

chmod +x ./bootutil64e
sudo ./bootutil64e -NIC 1 -defcfg