Bridging LXC containers to host eth0 so they can have a public IP
UPDATE:
I found the solution there: http://www.linuxfoundation.org/collaborate/workgroups/networking/bridge#No_traffic_gets_trough_.28except_ARP_and_STP.29
# cd /proc/sys/net/bridge
# ls
bridge-nf-call-arptables bridge-nf-call-iptables
bridge-nf-call-ip6tables bridge-nf-filter-vlan-tagged
# for f in bridge-nf-*; do echo 0 > $f; done
But I'd like to have expert opinions on this: is it safe to disable all bridge-nf-*? What are they here for?
END OF UPDATE
I need to bridge LXC containers to the physical interface (eth0) of my host, reading numerous tutorials, documents and blog posts on the subject.
I need the containers to have their own public IP (which I've previously done KVM/libvirt).
After two days of searching and trying, I still can't make it work with LXC containers.
The host runs a freshly installed Ubuntu Server Quantal (12.10) with only libvirt (which I'm not using here) and lxc installed.
I created the containers with :
lxc-create -t ubuntu -n mycontainer
So they also run Ubuntu 12.10.
Content of /var/lib/lxc/mycontainer/config is:
lxc.utsname = mycontainer
lxc.mount = /var/lib/lxc/test/fstab
lxc.rootfs = /var/lib/lxc/test/rootfs
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
lxc.network.name = eth0
lxc.network.veth.pair = vethmycontainer
lxc.network.ipv4 = 179.43.46.233
lxc.network.hwaddr= 02:00:00:86:5b:11
lxc.devttydir = lxc
lxc.tty = 4
lxc.pts = 1024
lxc.arch = amd64
lxc.cap.drop = sys_module mac_admin mac_override
lxc.pivotdir = lxc_putold
# uncomment the next line to run the container unconfined:
#lxc.aa_profile = unconfined
lxc.cgroup.devices.deny = a
# Allow any mknod (but not using the node)
lxc.cgroup.devices.allow = c *:* m
lxc.cgroup.devices.allow = b *:* m
# /dev/null and zero
lxc.cgroup.devices.allow = c 1:3 rwm
lxc.cgroup.devices.allow = c 1:5 rwm
# consoles
lxc.cgroup.devices.allow = c 5:1 rwm
lxc.cgroup.devices.allow = c 5:0 rwm
#lxc.cgroup.devices.allow = c 4:0 rwm
#lxc.cgroup.devices.allow = c 4:1 rwm
# /dev/{,u}random
lxc.cgroup.devices.allow = c 1:9 rwm
lxc.cgroup.devices.allow = c 1:8 rwm
lxc.cgroup.devices.allow = c 136:* rwm
lxc.cgroup.devices.allow = c 5:2 rwm
# rtc
lxc.cgroup.devices.allow = c 254:0 rwm
#fuse
lxc.cgroup.devices.allow = c 10:229 rwm
#tun
lxc.cgroup.devices.allow = c 10:200 rwm
#full
lxc.cgroup.devices.allow = c 1:7 rwm
#hpet
lxc.cgroup.devices.allow = c 10:228 rwm
#kvm
lxc.cgroup.devices.allow = c 10:232 rwm
Then I changed my host /etc/network/interfaces to:
auto lo
iface lo inet loopback
auto br0
iface br0 inet static
bridge_ports eth0
bridge_fd 0
address 92.281.86.226
netmask 255.255.255.0
network 92.281.86.0
broadcast 92.281.86.255
gateway 92.281.86.254
dns-nameservers 213.186.33.99
dns-search ovh.net
When I try command line configuration ("brctl addif", "ifconfig eth0", etc.) my remote host becomes inaccessible and I have to hard reboot it.
I changed the content of /var/lib/lxc/mycontainer/rootfs/etc/network/interfaces to:
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet static
address 179.43.46.233
netmask 255.255.255.255
broadcast 178.33.40.233
gateway 92.281.86.254
It takes several minutes for mycontainer to start (lxc-start -n mycontainer).
I tried replacing
gateway 92.281.86.254
by :
post-up route add 92.281.86.254 dev eth0
post-up route add default gw 92.281.86.254
post-down route del 92.281.86.254 dev eth0
post-down route del default gw 92.281.86.254
My container then starts instantly.
But whatever configuration I set in /var/lib/lxc/mycontainer/rootfs/etc/network/interfaces, I cannot ping from mycontainer to any IP (including the host's) :
ubuntu@mycontainer:~$ ping 92.281.86.226
PING 92.281.86.226 (92.281.86.226) 56(84) bytes of data.
^C
--- 92.281.86.226 ping statistics ---
6 packets transmitted, 0 received, 100% packet loss, time 5031ms
And my host cannot ping the container:
root@host:~# ping 179.43.46.233
PING 179.43.46.233 (179.43.46.233) 56(84) bytes of data.
^C
--- 179.43.46.233 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4000ms
My container's ifconfig:
ubuntu@mycontainer:~$ ifconfig
eth0 Link encap:Ethernet HWaddr 02:00:00:86:5b:11
inet addr:179.43.46.233 Bcast:255.255.255.255 Mask:0.0.0.0
inet6 addr: fe80::ff:fe79:5a31/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:64 errors:0 dropped:6 overruns:0 frame:0
TX packets:54 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:4070 (4.0 KB) TX bytes:4168 (4.1 KB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:32 errors:0 dropped:0 overruns:0 frame:0
TX packets:32 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:2496 (2.4 KB) TX bytes:2496 (2.4 KB)
My host's ifconfig:
root@host:~# ifconfig
br0 Link encap:Ethernet HWaddr 4c:72:b9:43:65:2b
inet addr:92.281.86.226 Bcast:91.121.67.255 Mask:255.255.255.0
inet6 addr: fe80::4e72:b9ff:fe43:652b/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1453 errors:0 dropped:18 overruns:0 frame:0
TX packets:1630 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:145125 (145.1 KB) TX bytes:299943 (299.9 KB)
eth0 Link encap:Ethernet HWaddr 4c:72:b9:43:65:2b
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:3178 errors:0 dropped:0 overruns:0 frame:0
TX packets:1637 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:298263 (298.2 KB) TX bytes:309167 (309.1 KB)
Interrupt:20 Memory:fe500000-fe520000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:6 errors:0 dropped:0 overruns:0 frame:0
TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:300 (300.0 B) TX bytes:300 (300.0 B)
vethtest Link encap:Ethernet HWaddr fe:0d:7f:3e:70:88
inet6 addr: fe80::fc0d:7fff:fe3e:7088/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:54 errors:0 dropped:0 overruns:0 frame:0
TX packets:67 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:4168 (4.1 KB) TX bytes:4250 (4.2 KB)
virbr0 Link encap:Ethernet HWaddr de:49:c5:66:cf:84
inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
I have disabled lxcbr0 (USE_LXC_BRIDGE="false" in /etc/default/lxc).
root@host:~# brctl show
bridge name bridge id STP enabled interfaces
br0 8000.4c72b943652b no eth0
vethtest
I have configured the IP 179.43.46.233 to point to 02:00:00:86:5b:11 in my hosting provider (OVH) config panel.
(The IPs in this post are not the real ones.)
Thanks for reading this long question! :-)
Vianney
A better way to make your change permanent is to use sysctl instead of writing to /proc directly since that is the standard way to configure kernel parameters at runtime so they are set correctly at next boot:
# cat >> /etc/sysctl.d/99-bridge-nf-dont-pass.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0
net.bridge.bridge-nf-filter-vlan-tagged = 0
EOF
# service procps start
As for the answer to the question in your update...
bridge-netfilter (or bridge-nf) is a very simple bridge for IPv4/IPv6/ARP packets (even in 802.1Q VLAN or PPPoE headers) that provides the functionality for a stateful transparent firewall, but more advanced functionality like transparent IP NAT is provided by passing those packets to arptables/iptables for further processing-- however even if the more advanced features of arptables/iptables is not need, passing packets to those programs is still turned on by default in the kernel module and must be turned off explicitly using sysctl.
What are they here for? These kernel configuration options are here to either pass (1) or don't pass (0) packets to arptables/iptables as described in the bridge-nf FAQ:
As of kernel version 2.6.1, there are three sysctl entries for bridge-nf behavioral control (they can be found under /proc/sys/net/bridge/):
bridge-nf-call-arptables - pass (1) or don't pass (0) bridged ARP traffic to arptables' FORWARD chain.
bridge-nf-call-iptables - pass (1) or don't pass (0) bridged IPv4 traffic to iptables' chains.
bridge-nf-call-ip6tables - pass (1) or don't pass (0) bridged IPv6 traffic to ip6tables' chains.
bridge-nf-filter-vlan-tagged - pass (1) or don't pass (0) bridged vlan-tagged ARP/IP traffic to arptables/iptables.
Is it safe to disable all bridge-nf-*? Yes, it is not only safe to do so, but there is a recommendation for distributions to turn it off by default to help people avoid confusion for the kind of problem you encountered:
In practice, this can lead to serious confusion where someone creates a bridge and finds that some traffic isn't being forwarded across the bridge. Because it's so unexpected that IP firewall rules apply to frames on a bridge, it can take quite some time to figure out what's going on.
and to increase security:
I still think the risk with bridging is higher, especially in the presence of virtualisation. Consider the scenario where you have two VMs on the one host, each with a dedicated bridge with the intention that neither should know anything about the other's traffic.
With conntrack running as part of bridging, the traffic can now cross over which is a serious security hole.
UPDATE: May 2015
If you are running a kernel older than 3.18, then you may be subject to the old behavior of bridge filtering enabled by default; if you newer than 3.18, then you can still be bitten by this if you've loaded the bridge module and haven't disabled the bridge filtering. See:
https://bugzilla.redhat.com/show_bug.cgi?id=634736#c44
After all these years of asking for the default of bridge filtering to be "disabled" and the change being refused by the kernel maintainers, now the filtering has been moved into a separate module that isn't loaded (by default) when the bridge module is loaded, effectively making the default "disabled". Yay!
I think this is in the kernel as of 3.17 (It definitely is in kernel 3.18.7-200.fc21, and appears to be in git prior to the tag "v3.17-rc4")