Networking doesn't initialize properly when PXE booting Linux Mint (live CD) using CIFS, but works with NFS

I have a TFTP/DHCP/NFS/SMB server (Ubuntu server 12.04 LTS) on 192.168.26.1. I use pxelinux to display a menu containing startup and installation options for Windows, an Ubuntu network installer, and the Linux Mint 17 MATE live CD. Getting it running like this was already nasty and I'm running out of steam...

For Linux Mint, I have provided two netboot options: NFS and CIFS. NFS works fully: the user selects it in the boot menu and, a short while later, lands on the Linux Mint live CD desktop. With CIFS, however, networking doesn't initialize properly. When Linux Mint starts, network initialization hangs for 120 seconds. The boot then continues to the desktop, but network-manager isn't started (and doesn't start). I suspected that the DHCP server might not be responding, but the DHCP server log shows the DHCP request and a successful response.

Once in the Linux Mint desktop, ifconfig reports an IP address that is assigned by the DHCP, and pinging the server works.

My pxelinux configuration is (everything after APPEND is in one line, I just split it up for readability on this site):

NFS:

LABEL linuxmint17
    MENU LABEL Linux Mint 17
    KERNEL linux-mint-17/image/casper/vmlinuz
    APPEND 
        root=/dev/nfs boot=casper netboot=nfs
        nfsroot=192.168.26.1:/var/lib/tftpboot/linux-mint-17/image
        initrd=/linux-mint-17/image/casper/initrd.lz

CIFS:

LABEL linuxmint17smb
    MENU LABEL Linux Mint 17 (SMB)
    KERNEL linux-mint-17/image/casper/vmlinuz
    APPEND
        root=/dev/cifs boot=casper netboot=cifs
        nfsroot=//192.168.26.1/tftpshare/linux-mint-17/image
        ip=dhcp
        initrd=/linux-mint-17/image/casper/initrd.lz

Note that I had to add the ip=dhcp option to the CIFS menu entry. Without it, the boot process hangs for 120 seconds while initializing networking and then never continues. With it, the boot still hangs, but after 120 seconds it continues.

The setup:

The client and server virtual machines are only connected to each other (internal network). There are no other machines in the network at all.

The server has all the pxe boot files under /var/lib/tftpboot/. The Linux Mint ISO (unmodified) is mounted under /var/lib/tftpboot/linux-mint-17/image. vmlinuz and initrd are in /var/lib/tftpboot/linux-mint-17/image/casper. /var/lib/tftpboot/ is an NFS export. There is a samba share called tftpshare that maps to /var/lib/tftpboot/ (read-only, allows access to everyone).

smb.conf

[tftpshare]
   comment = TFTP Root
   path = /var/lib/tftpboot
   browsable = yes
   guest ok = yes
   read only = no
   create mask = 0644

dhcpd.conf

authoritative;
subnet 192.168.26.0 netmask 255.255.255.0 {
  range 192.168.26.10 192.168.26.40;
  next-server 192.168.26.1;
  filename "pxelinux.0";
}

There is a strange 2-minute gap in the syslog of the client machine after a successful boot to the live desktop environment:

Jun 14 13:13:18 mint kernel: [   23.388873] intel_rapl: domain core energy ctr 0:0 not working, skip
Jun 14 13:13:18 mint kernel: [   23.528409] intel_rapl: domain uncore energy ctr 0:0 not working, skip
Jun 14 13:13:18 mint kernel: [   23.528453] intel_rapl: no valid rapl domains found in package 0
Jun 14 13:13:20 mint ntpdate[1198]: Can't find host ntp.ubuntu.com: Name or service not known (-2)
Jun 14 13:13:20 mint ntpdate[1198]: no servers can be used, exiting

(2 Minute gap without any entries, roughly at the time when the 120 second boot delay occurs)

Jun 14 13:15:19 mint dbus[864]: [system] Activating service name='org.freedesktop.ConsoleKit' (using servicehelper)
Jun 14 13:15:19 mint dbus[864]: [system] Activating service name='org.freedesktop.PolicyKit1' (using servicehelper)
Jun 14 13:15:19 mint acpid: starting up with netlink and the input layer
Jun 14 13:15:19 mint acpid: 9 rules loaded
Jun 14 13:15:19 mint acpid: waiting for events: event logging is off
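
To locate silent periods like this one without reading the whole log by eye, a small script can compare consecutive timestamps. This is only a minimal sketch: the function names are made up for illustration, and because classic syslog lines carry no year, the year used for parsing is an assumption.

```python
from datetime import datetime, timedelta

ASSUMED_YEAR = 2014  # assumption: syslog lines do not include a year


def parse_stamp(line, year=ASSUMED_YEAR):
    """Return the timestamp at the start of a classic syslog line."""
    # The first 15 characters look like "Jun 14 13:13:20".
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_gaps(lines, threshold=timedelta(seconds=60)):
    """Yield (line_before, line_after, gap) for silences longer than threshold."""
    previous = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        stamp = parse_stamp(line)
        if previous is not None and stamp - previous[0] > threshold:
            yield previous[1], line, stamp - previous[0]
        previous = (stamp, line)


if __name__ == "__main__":
    # Two lines taken from the excerpt above, on either side of the gap.
    sample = [
        "Jun 14 13:13:20 mint ntpdate[1198]: no servers can be used, exiting",
        "Jun 14 13:15:19 mint acpid: 9 rules loaded",
    ]
    for before, after, gap in find_gaps(sample):
        print(f"{gap.total_seconds():.0f}s of silence after: {before}")
```

Run against the full /var/log/syslog of the client, the last line logged before each gap is a reasonable place to start looking for whatever is blocking the boot.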

This is what happens in both cases when using CIFS:

(screenshot: Hangs)

On the server:

...
Jun 14 13:12:52 ubuntu-netboot in.tftpd[2722]: RRQ from 192.168.26.13 filename /linux-mint-17/image/casper/initrd.lz
Jun 14 13:13:14 ubuntu-netboot dhcpd: DHCPDISCOVER from 08:00:27:1c:c5:43 via eth1
Jun 14 13:13:14 ubuntu-netboot dhcpd: DHCPOFFER on 192.168.26.14 to 08:00:27:1c:c5:43 via eth1
Jun 14 13:13:14 ubuntu-netboot dhcpd: DHCPREQUEST for 192.168.26.14 (192.168.26.1) from 08:00:27:1c:c5:43 via eth1
Jun 14 13:13:14 ubuntu-netboot dhcpd: DHCPACK on 192.168.26.14 to 08:00:27:1c:c5:43 via eth1

The IP that is assigned to the client in case of a successful boot to the desktop, according to ifconfig, is indeed ...14.
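
The same check of the server log can be scripted. The sketch below assumes the ISC dhcpd syslog format shown above; the function names are hypothetical. It reports whether a given client MAC completed the DISCOVER/OFFER/REQUEST/ACK exchange.

```python
import re

# Captures the message type and the client MAC from dhcpd syslog lines such as
# "dhcpd: DHCPACK on 192.168.26.14 to 08:00:27:1c:c5:43 via eth1".
DHCP_LINE = re.compile(
    r"dhcpd: (DHCP[A-Z]+) .*?((?:[0-9a-f]{2}:){5}[0-9a-f]{2})", re.IGNORECASE
)


def handshake_for(mac, lines):
    """Return the DHCP message types logged for one client MAC, in log order."""
    seen = []
    for line in lines:
        match = DHCP_LINE.search(line)
        if match and match.group(2).lower() == mac.lower():
            seen.append(match.group(1).upper())
    return seen


def lease_completed(mac, lines):
    """True if the log contains a full DISCOVER/OFFER/REQUEST/ACK run."""
    expected = ["DHCPDISCOVER", "DHCPOFFER", "DHCPREQUEST", "DHCPACK"]
    seen = handshake_for(mac, lines)
    # Accept the four messages as a contiguous run anywhere in the sequence.
    return any(seen[i:i + 4] == expected for i in range(len(seen) - 3))
```

For the log excerpt above, lease_completed("08:00:27:1c:c5:43", lines) should come out true, which matches the observation that the server side of DHCP is working and the delay originates on the client.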

This is what happens without the ip=dhcp:

(screenshots: nodhcp1, nodhcp2)

This is what happens with the ip=dhcp, immediately before the Desktop shows:

(screenshot: success)

I'm thankful for any ideas. If any other logs (which?) would help, I can provide them.


Solution 1:

This problem has been solved by Serva (I'm related to Serva development).

The complete kernel and append lines plus the additional initrd.gz required for PXE booting current Ubuntu/Mint live versions with CIFS can be found here

Basically, the problem is a Casper bug (AFAIK never reported or fixed before): in the case of a CIFS netmount, Casper forgets to export a kernel parameter. That parameter later affects the networking configuration scripts, which end up recreating the file /etc/network/interfaces with delays and errors.

If we look at Serva's Ubuntu/Mint "append" line

append   = showmounts toram root=/dev/cifs initrd=NWA_PXE/$HEAD_DIR$/casper/initrd.lz,NWA_PXE/$HEAD_DIR$/casper/INITRD_N11.GZ boot=casper netboot=cifs nfsroot=//$IP_BSRV$/NWA_PXE_SHARE/$HEAD_DIR$ NFSOPTS=-ouser=serva,pass=avres,ro ip=dhcp ro

we find that the embedded "initrd" variable is made of 2 "consecutively loaded" initrd files (initrd.lz and INITRD_N11.GZ)

initrd=NWA_PXE/$HEAD_DIR$/casper/initrd.lz,NWA_PXE/$HEAD_DIR$/casper/INITRD_N11.GZ 

The first one (initrd.lz) is the one that comes with Ubuntu/Mint, while the second one (INITRD_N11.GZ) is a tiny 8 KB custom initrd, originally developed by Serva, that contains the patched components. This approach avoids the need to recreate the big original initrd.lz (20 MB). INITRD_N11.GZ can be freely downloaded from Serva's site (please do not post direct links here).

If we continue analyzing the "append" line, we see the need to add the CIFS mount options (the OP forgot this step). In this case they are carried by the somewhat misleadingly named variable "NFSOPTS"

NFSOPTS=-ouser=serva,pass=avres,ro

In this example the SMB share has a user=serva with password=avres, and it will be mounted read-only. Of course, the user/pass parameters must be edited accordingly.

The TFTP paths and CIFS locator are the ones required by Serva's repository structure; when the PXE server is not Serva, those parameters must be edited accordingly.
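
As a rough illustration of that adaptation, here is a minimal sketch that assembles a single-line APPEND string for a non-Serva layout such as the OP's. The helper name and its parameters are hypothetical. The location of the extra initrd is also an assumption: it has to sit somewhere under the TFTP root that is writable, not inside the mounted ISO. USER and PASS are placeholders for the share's real credentials.

```python
def build_append(server, share, image_dir, extra_initrd_path, user, password):
    """Assemble the one-line APPEND string for a casper CIFS netboot entry."""
    # pxelinux loads several initrd files when they are given as a
    # comma-separated list; the stock initrd comes first, the patch second.
    initrds = [f"/{image_dir}/casper/initrd.lz", extra_initrd_path]
    options = [
        "root=/dev/cifs",
        "boot=casper",
        "netboot=cifs",
        f"nfsroot=//{server}/{share}/{image_dir}",
        # Despite its name, NFSOPTS carries the CIFS mount options here.
        f"NFSOPTS=-ouser={user},pass={password},ro",
        "ip=dhcp",
        "initrd=" + ",".join(initrds),
    ]
    return "APPEND " + " ".join(options)


if __name__ == "__main__":
    print(build_append(
        server="192.168.26.1",
        share="tftpshare",
        image_dir="linux-mint-17/image",
        extra_initrd_path="/linux-mint-17/INITRD_N11.GZ",  # hypothetical location
        user="USER",      # placeholder
        password="PASS",  # placeholder
    ))
```

The printed line replaces the APPEND line of the CIFS menu entry; the KERNEL line stays as it is.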

If you PXE boot Ubuntu/Mint live versions from a CIFS share this way, there will be no network-related delays, and Internet/networking will work right away after boot.

Edit:

The bug has been reported on Ubuntu Launchpad and confirmed.