Debian - Slow boot waiting for network

I have a Debian 10 Server that I think could/should boot faster, but I don't know what the problem is.

I see A start job is running for Raise network interfaces when booting, and looking at systemd-analyze blame it seems a lot of time is spend in networking.service (I left out the services below 50ms).

         50.846s networking.service
          1.872s smb.mount
           717ms nftables.service
           560ms ifupdown-pre.service
           544ms systemd-logind.service
           205ms systemd-journald.service
           194ms dev-nvme0n1p2.device
            88ms systemd-udev-trigger.service
            83ms smbd.service
            66ms libvirtd.service
            61ms lvm2-monitor.service
            60ms chrony.service
            57ms [email protected]
            50ms nmbd.service

Doing a systemctl stop networking.service and systemctl start networking.service takes less than 2 seconds though.

This is my /etc/network/interfaces for reference (br0 is used for VMs):

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

auto enp5s0
iface enp5s0 inet static
        address 192.168.1.190
        netmask 255.255.255.0
        gateway 192.168.1.1 

auto enp9s0
iface enp9s0 inet manual

auto enp8s0
iface enp8s0 inet static
        address 10.36.14.242
        netmask 255.255.255.0
        post-up ip route add 10.0.0.0/8 via 10.36.14.1 dev enp8s0
        pre-down ip route del 10.0.0.0/8 via 10.36.14.1 dev enp8s0

auto br0
iface br0 inet static
        address 10.36.15.11
        netmask 255.255.255.0
        bridge_ports enp9s0
        bridge_stp off
        bridge_fd 0

auto wg-p2p
iface wg-p2p inet static
        address 10.88.88.1
        netmask 255.255.255.0
        pre-up ip link add $IFACE type wireguard
        pre-up wg setconf $IFACE /etc/wireguard/$IFACE.conf
        post-down ip link del $IFACE

I did see some posts about changing auto to allow-hotplug, but the explanations make it sound like it just starts the interfaces without blocking the boot i.e. the time to reach the network would be the same. Changing the enp* interfaces to that did however improve at least the time to start routing. Pinging another network from another device after starting the server takes about 1 minute with auto and ~32 seconds with allow-hotplug (probably because the interfaces are started in parallel?), and I get a console immediately without the A start job is running message. Changing br0 to it made that interface not start automatically though, so I left it.

Also only when auto is active on the enp* interfaces, I get A stop job is running for Raise network interfaces on shutdown, which takes about 50 seconds to complete.

So I want to know if it's OK to leave the interfaces on allow-hotplug (maybe there can be interface binding issues for some services?), and if there are any other problems I can fix to improve boot times.

Edit:

Relevant output of dmesg -T is:

[Mo Jun 24 13:16:48 2019] IPv6: ADDRCONF(NETDEV_UP): enp5s0: link is not ready
[Mo Jun 24 13:16:51 2019] igb 0000:05:00.0 enp5s0: igb: enp5s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[Mo Jun 24 13:16:51 2019] IPv6: ADDRCONF(NETDEV_CHANGE): enp5s0: link becomes ready
[Mo Jun 24 13:16:58 2019] r8169 0000:09:00.0: firmware: direct-loading firmware rtl_nic/rtl8168e-3.fw
[Mo Jun 24 13:16:58 2019] RTL8211E Gigabit Ethernet r8169-900:00: attached PHY driver [RTL8211E Gigabit Ethernet] (mii_bus:phy_addr=r8169-900:00, irq=IGNORE)
[Mo Jun 24 13:16:58 2019] r8169 0000:09:00.0 enp9s0: No native access to PCI extended config space, falling back to CSI
[Mo Jun 24 13:16:58 2019] IPv6: ADDRCONF(NETDEV_UP): enp9s0: link is not ready
[Mo Jun 24 13:17:01 2019] r8169 0000:09:00.0 enp9s0: Link is Up - 1Gbps/Full - flow control off
[Mo Jun 24 13:17:01 2019] IPv6: ADDRCONF(NETDEV_CHANGE): enp9s0: link becomes ready
[Mo Jun 24 13:17:08 2019] RTL8211E Gigabit Ethernet r8169-800:00: attached PHY driver [RTL8211E Gigabit Ethernet] (mii_bus:phy_addr=r8169-800:00, irq=IGNORE)
[Mo Jun 24 13:17:08 2019] IPv6: ADDRCONF(NETDEV_UP): enp8s0: link is not ready
[Mo Jun 24 13:17:11 2019] r8169 0000:08:00.0 enp8s0: Link is Up - 1Gbps/Full - flow control off
[Mo Jun 24 13:17:11 2019] IPv6: ADDRCONF(NETDEV_CHANGE): enp8s0: link becomes ready
[Mo Jun 24 13:17:18 2019] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[Mo Jun 24 13:17:18 2019] br0: port 1(enp9s0) entered blocking state
[Mo Jun 24 13:17:18 2019] br0: port 1(enp9s0) entered disabled state
[Mo Jun 24 13:17:18 2019] device enp9s0 entered promiscuous mode
[Mo Jun 24 13:17:18 2019] br0: port 1(enp9s0) entered blocking state
[Mo Jun 24 13:17:18 2019] br0: port 1(enp9s0) entered forwarding state
[Mo Jun 24 13:17:18 2019] IPv6: ADDRCONF(NETDEV_UP): br0: link is not ready
[Mo Jun 24 13:17:19 2019] IPv6: ADDRCONF(NETDEV_CHANGE): br0: link becomes ready
[Mo Jun 24 13:17:28 2019] wireguard: loading out-of-tree module taints kernel.
[Mo Jun 24 13:17:28 2019] wireguard: module verification failed: signature and/or required key missing - tainting kernel
[Mo Jun 24 13:17:28 2019] wireguard: WireGuard 0.0.20190406 loaded. See www.wireguard.com for information.
[Mo Jun 24 13:17:28 2019] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <[email protected]>. All Rights Reserved.

Disabling any interface seems to gain me ~10 seconds per interface

I had exactly the same problem as you (I even referenced your serverfault question in this posting to the Debian User Forums)

My problem turned out to be another instance of "Boottime Entropy Starvation" (I'm guessing that wpa-supplicant needed some random data for key exchange, and that the kernel blocked while waiting for the random number generator to provide the random data). It turns out that it's a surprisingly common problem - unfortunately almost all the "so-called solutions" that I've seen simply suggest reducing the time-out. :-(

The fix, for me, was to install the havege daemon (this is another random data generator that starts early during the boot process, so there's plenty of entropy available when wpa-supplicant requests it), thus:

sudo apt install haveged

Enjoy!! :-)

More info:

https://wiki.debian.org/BoottimeEntropyStarvation
https://www.debian.org/releases/stable/amd64/release-notes/ch-information.en.html#entropy-starvation
https://daniel-lange.com/archives/152-hello-buster.html

Did you check if systemd-networkd-wait-online.service is disabled?

"""systemd-networkd-wait-online is a oneshot system service (see systemd.service(5)), that waits for the network to be configured. By default, it will wait for all links it is aware of and which are managed by systemd-networkd.service(8) to be fully configured or failed, and for at least one link to gain a carrier.""" https://manpages.debian.org/testing/systemd/systemd-networkd-wait-online.service.8.en.html

Also make sure your network is free of rogue servers first. A broadcasting DHCP server with invalid leases maybe?

Debian - Slow boot waiting for network

Related

Recent Posts