How should I configure Ubuntu/Upstart for unusual network configuration?

First, let me apologize for answering my own question.

Second, I have, in fact, conquered the failsafe.conf startup delay problem. While I realize there's been no torrent of activity on this question, I've seen enough activity on various other threads about similar failsafe/boot delay problems that I'm posting my research and solution for the benefit of others in a similar pickle.

Overview

As noted in the initial post, the problem as I saw it was that the failsafe upstart job was imposing an unwanted delay on the booting of my system. I researched the issue further and found out why failsafe was behaving as it was.

Analysis

By default, failsafe.conf defines a start condition that effectively fires it at boot time (as soon as the filesystem and the loopback interface are available), and a stop condition that either of two events satisfies:

start on filesystem and net-device-up IFACE=lo
stop on static-network-up or starting rc-sysinit

Failsafe's delays occurred because neither 'stop' event ever fired. The second event refers to rc-sysinit, one of the final system initialization jobs upstart runs, which has its own start condition:

start on (filesystem and static-network-up) or failsafe-boot

Since failsafe never stops, rc-sysinit evidently never starts on its own. It only starts once failsafe's timeouts expire and failsafe emits the failsafe-boot event. Because failsafe has started, 'filesystem' has already fired, which leaves 'static-network-up' as the sole remaining condition common to both jobs. Failsafe keeps running because 'static-network-up' never fires: as far as upstart is concerned, the network never comes 'up.'
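To make that chain of reasoning concrete, here is a toy Python model of the two conditions quoted above. It is only a sketch and not upstart itself. Events are plain strings, the string "starting rc-sysinit" stands in for upstart's job-state event, and the helper names (`rc_sysinit_starts`, `failsafe_stops`, `boot`) are hypothetical.

```python
from __future__ import annotations


def rc_sysinit_starts(events: set[str]) -> bool:
    # start on (filesystem and static-network-up) or failsafe-boot
    return ("filesystem" in events and "static-network-up" in events) or (
        "failsafe-boot" in events
    )


def failsafe_stops(events: set[str]) -> bool:
    # stop on static-network-up or starting rc-sysinit
    return "static-network-up" in events or "starting rc-sysinit" in events


def boot(early_events: set[str]) -> str:
    """Report how failsafe ends, given the events seen early in boot (toy model)."""
    events = set(early_events)
    if rc_sysinit_starts(events):
        events.add("starting rc-sysinit")
    if failsafe_stops(events):
        return "failsafe stopped early"
    # Neither stop event arrived: failsafe sleeps out its timeouts and then
    # emits failsafe-boot, which is what finally lets rc-sysinit start.
    return "failsafe timed out"
```

With only `{"filesystem", "net-device-up IFACE=lo"}` the model ends in the timeout path. Adding "static-network-up" to that set stops failsafe early, which matches the behaviour described above.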

The cause

Working backward, I found an upstart script in /etc/network/if-up.d that iterates through every interface declared in /etc/network/interfaces with the "auto" qualifier, meaning that the interface is to be brought up at boot time. Exactly what counts as an 'up' interface becomes an important semantic issue, which I describe later.

If and only if all "auto"-configured interfaces are 'up' does the upstart script emit the famed 'static-network-up' event. That event would, in turn, let rc-sysinit start and stop failsafe, so its absence is the root cause of my problem. None of my network interfaces has an IP address at boot time, by design. But 'static-network-up' doesn't accept the idea of an interface being 'up' without an IP address, so failsafe hangs until its timeouts expire.
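The rule described above can be modelled in a few lines of Python. This is a simplified sketch and not the actual if-up.d script. It only recognizes plain `auto` lines, skipping `allow-*` and `source` stanzas. It also takes the set of interfaces considered "up" as an input, because deciding that is exactly the semantic question at issue. The function names are hypothetical.

```python
from __future__ import annotations


def auto_interfaces(interfaces_text: str) -> list[str]:
    """List interfaces marked "auto" in /etc/network/interfaces-style text."""
    found: list[str] = []
    for raw in interfaces_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # interfaces(5) comments are whole lines
            continue
        words = line.split()
        if words[0] == "auto":  # "auto a b c" may name several interfaces
            for name in words[1:]:
                if name not in found:
                    found.append(name)
    return found


def should_emit_static_network_up(interfaces_text: str, up: set[str]) -> bool:
    # Fire only when every "auto" interface counts as up. With no "auto"
    # interfaces at all, all() over an empty list is True, so it fires at once.
    return all(iface in up for iface in auto_interfaces(interfaces_text))
```

In the original layout, with `auto lo`, `auto tpRED` and `auto brRED`, the check stays false while only `lo` counts as up. Once the tap and bridge lines are commented out, `lo` alone satisfies it.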

In my situation, I slave the two physical NICs in the box to bridges and expose them via taps to two different VMs. One VM serves DHCP over one tap; the other is just a server on the same network. For the bridges to function properly as tapped by the VMs, the NICs must at least be "UP", passively passing packets through. Hence, 'auto' seemed appropriate in /etc/network/interfaces. In failsafe's eyes, however, it was not appropriate, so the only workable solution was one that respected failsafe's semantics.

The solution to my problem, then, was twofold:

  1. Remove the 'auto' declaration from every network interface I'd defined (other than loopback).
  2. Create upstart jobs to bring up the previously "auto" interfaces "manually."
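Step 1 is a plain text edit. As a hedged sketch, the hypothetical `disable_auto` helper below (not an existing tool) comments out every `auto` line except loopback. That produces the `#auto tpRED` style used in the configuration at the end of this answer.

```python
from __future__ import annotations


def disable_auto(interfaces_text: str, keep: frozenset[str] = frozenset({"lo"})) -> str:
    """Comment out "auto" lines in interfaces(5) text, except for names in keep.

    A line such as "auto lo eth1" is split so that "lo" stays auto while
    "eth1" is commented out. Other lines pass through unchanged; the result
    always ends with a newline.
    """
    out: list[str] = []
    for raw in interfaces_text.splitlines():
        words = raw.split()
        if not words or words[0] != "auto":  # "#auto ..." lines are left alone
            out.append(raw)
            continue
        kept = [w for w in words[1:] if w in keep]
        dropped = [w for w in words[1:] if w not in keep]
        if kept:
            out.append("auto " + " ".join(kept))
        if dropped:
            out.append("#auto " + " ".join(dropped))
    return "\n".join(out) + "\n"
```

Print the result and review it before writing anything back to /etc/network/interfaces, and keep a backup of the original file.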

I defined one job for each of the four devices (two taps and two virtual bridges) by mimicking a solution provided here.
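For illustration only, here is a Python sketch that renders job files in that spirit. Several details are assumptions of this sketch and do not come from the linked solution: the job names (`ifup-<iface>`), the `start on static-network-up` trigger for the first job, and chaining each later job on `stopped` of the previous one so a tap exists before its bridge. `description`, `start on`, `task` and `exec` are standard upstart stanzas. Only tpRED and brRED are named in this answer, so the second tap/bridge pair would follow the same pattern.

```python
from __future__ import annotations


def upstart_jobs(ifaces: list[str], first_event: str = "static-network-up") -> dict[str, str]:
    """Render one minimal upstart job per interface, brought up in order.

    Hypothetical layout: job "ifup-<iface>" runs "ifup <iface>" once. The
    first job waits for first_event; each later job waits for the previous
    job to stop, so a tap is up before the bridge that enslaves it.
    """
    jobs: dict[str, str] = {}
    previous: str | None = None
    for iface in ifaces:
        name = f"ifup-{iface}"
        trigger = first_event if previous is None else f"stopped {previous}"
        jobs[f"/etc/init/{name}.conf"] = (
            f'description "bring up {iface} without auto"\n'
            f"start on {trigger}\n"
            "task\n"
            f"exec /sbin/ifup {iface}\n"
        )
        previous = name
    return jobs
```

Calling `upstart_jobs(["tpRED", "brRED"])` yields two files. The bridge job starts on `stopped ifup-tpRED`, so `ifup brRED` runs only after the tap job has finished.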

In this configuration, with no 'auto' interfaces left, the upstart script in /etc/network/if-up.d should emit 'static-network-up' immediately, which stops failsafe. One final change was to add a "post-up" clause to each tap's interface definition that calls 'brctl' to create the corresponding virtual bridge, something the 'auto' configuration previously took care of.

So, my /etc/network/interfaces (in part) now looks like:

#auto tpRED  (commented out)
iface tpRED inet manual
    pre-up /usr/sbin/tunctl -t tpRED
    post-up /sbin/brctl addbr brRED

#auto brRED
iface brRED inet manual
    bridge_ports eth1 tpRED
    bridge_hw xx:yy:aa:bb:cc:dd

The acid test

The acid test? Reboot the server. And when I did, the failsafe timeout was gone, and my network came up in a functionally identical configuration. IT WORKS!! I just wish we had a better handle on the semantics of an "UP" network interface!!