Assign physical interface to docker exclusively
I would like to run a high-performance network test in a docker container and do not want the overhead of bridging (so pipework won't work, AFAIK). In addition to the normal docker veth device, I would like to assign a physical 40GbE network interface from the host to a docker container, as in the lxc "phys" mode. This should make the physical interface invisible to the host.
Solution 1:
pipework can move a physical network interface from the default network namespace into the container's network namespace:
$ sudo pipework --direct-phys eth1 $CONTAINERID 192.168.1.2/24
For more information, see the pipework documentation.
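One way to confirm that the move worked is to check that the interface has disappeared from the host. The following is a minimal sketch, assuming a Linux host where /sys/class/net lists only the interfaces of the network namespace the shell runs in. The function name iface_visible and its optional second argument (an alternative sysfs directory, mainly useful for testing) are hypothetical and not part of pipework or docker.

```shell
#!/bin/sh
# iface_visible DEV [SYSFS_NET_DIR]
# Succeeds if DEV is listed in the given sysfs net directory, i.e. the
# interface is visible in the network namespace this shell runs in.
iface_visible() {
    [ -e "${2:-/sys/class/net}/$1" ]
}

# Example: run on the host after moving eth1 into a container.
if iface_visible eth1; then
    echo "eth1 is still visible on the host"
else
    echo "eth1 is not visible on the host (moved, or it never existed)"
fi
```

Running the same check inside the container (with the name the interface has there) should succeed once the interface has been moved.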
Solution 2:
In my search I came across old solutions that involved passing lxc-config parameters to docker, but newer versions of docker no longer use the lxc tools, so that approach cannot work.
Following the suggestion at https://groups.google.com/d/msg/docker-user/pL8wlmiuAEU/QfcoFcKI3kgJ I found a solution. I did not look into modifying the pipework script as mentioned above, and instead used the required commands directly. Also see the subsequent blog post: http://jason.digitalinertia.net/exposing-docker-containers-with-sr-iov/.
The following low-level (i.e. not docker specific) network namespace tool commands can be used to transfer an interface from the host to a docker container:
CONTAINER=slave-play # Name of the docker container
HOST_DEV=ethHOST # Name of the ethernet device on the host
GUEST_DEV=test10gb # Target name for the same device in the container
ADDRESS_AND_NET=10.101.0.5/24
# The next three lines hook up the docker container's network namespace
# so that the ip netns commands below will work
mkdir -p /var/run/netns
PID=$(docker inspect -f '{{.State.Pid}}' $CONTAINER)
ln -s /proc/$PID/ns/net /var/run/netns/$PID
# Move the ethernet device into the container. Leave out
# the 'name $GUEST_DEV' bit to use an automatically assigned name in
# the container
ip link set $HOST_DEV netns $PID name $GUEST_DEV
# Enter the container network namespace ('ip netns exec $PID...')
# and configure the network device in the container
ip netns exec $PID ip addr add $ADDRESS_AND_NET dev $GUEST_DEV
# and bring it up.
ip netns exec $PID ip link set $GUEST_DEV up
# Delete netns link to prevent stale namespaces when the docker
# container is stopped
rm /var/run/netns/$PID
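The same steps can be wrapped in a function so that the netns symlink is removed even when one of the ip commands fails. This is only a sketch under a few assumptions: the function name attach_phys_iface and the RUN variable are hypothetical, and RUN=echo is a dry-run hook I added so the privileged commands can be printed instead of executed.

```shell
#!/bin/sh
# attach_phys_iface CONTAINER HOST_DEV GUEST_DEV ADDRESS_AND_NET
# Moves HOST_DEV into the container's network namespace as GUEST_DEV,
# assigns the address and brings the interface up.
# Set RUN=echo (hypothetical dry-run hook) to only print the commands.
attach_phys_iface() {
    container=$1 host_dev=$2 guest_dev=$3 addr=$4
    run=${RUN:-}

    pid=$(docker inspect -f '{{.State.Pid}}' "$container") || return 1
    # docker reports Pid 0 for a container that is not running
    if ! [ "$pid" -gt 0 ] 2>/dev/null; then
        echo "container $container is not running" >&2
        return 1
    fi

    # Make the container's namespace visible to 'ip netns'
    $run mkdir -p /var/run/netns || return 1
    $run ln -sf "/proc/$pid/ns/net" "/var/run/netns/$pid" || return 1

    # Move, configure and bring up the device; stop at the first failure
    $run ip link set "$host_dev" netns "$pid" name "$guest_dev" &&
        $run ip netns exec "$pid" ip addr add "$addr" dev "$guest_dev" &&
        $run ip netns exec "$pid" ip link set "$guest_dev" up
    status=$?

    # Always drop the symlink so no stale namespace entry is left behind
    $run rm -f "/var/run/netns/$pid"
    return $status
}
```

Called as root with the values from above, this would be attach_phys_iface slave-play ethHOST test10gb 10.101.0.5/24. Prefixing the call with RUN=echo prints the commands without touching the network configuration.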
A minor caveat on interface naming if your host has a lot of ethX devices (mine had eth0 through eth5). Say you move eth3 into the container as eth1 in the container's namespace. When you stop the container, the kernel will try to move the container's eth1 device back to the host, but will find that the host already has an eth1 device. It then renames the interface to something arbitrary; it took me a while to find it again. For this reason I edited /etc/udev/rules.d/70-persistent-net.rules (I think this filename is common to most popular Linux distros; I am using Debian) to give the interface in question a unique, unmistakable name, and I use that name both in the container and on the host.
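Two small helpers illustrate this. The first finds an interface by its MAC address, which stays the same even when the kernel renames the device. The second prints a simplified udev rule that pins a name to a MAC address. Both function names and the MAC address in the example are hypothetical placeholders, and the rule is a reduced form: the entries Debian generates in 70-persistent-net.rules contain more match keys.

```shell
#!/bin/sh
# find_iface_by_mac MAC [SYSFS_NET_DIR]
# Prints the name of the interface whose hardware address is MAC.
# MAC must be given in lower case, as the kernel reports it.
find_iface_by_mac() {
    for dir in "${2:-/sys/class/net}"/*; do
        # Skip entries that are not interfaces (no address file)
        [ -r "$dir/address" ] || continue
        if [ "$(cat "$dir/address")" = "$1" ]; then
            basename "$dir"
            return 0
        fi
    done
    return 1
}

# udev_net_rule MAC NAME
# Prints a simplified udev rule that names the interface with the
# given MAC address NAME.
udev_net_rule() {
    printf 'SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="%s", NAME="%s"\n' "$1" "$2"
}

# Example with a placeholder MAC address:
udev_net_rule 00:11:22:33:44:55 test10gb
```

With a name such as test10gb that no other device uses, the interface keeps its name when it returns to the host, which is why the init script below uses the same value for HOST_DEV and GUEST_DEV.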
Since we are not using docker to do this configuration, the standard docker lifecycle tools (e.g. docker run --restart=on-failure:10 ...) cannot be used. The host machine in question runs Debian Wheezy, so I wrote the following init script:
#!/bin/sh
### BEGIN INIT INFO
# Provides: slave-play
# Required-Start: $local_fs $network $named $time $syslog docker
# Required-Stop: $local_fs $network $named $time $syslog docker
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Description: some slavishness
### END INIT INFO
CONTAINER=slave-play
SCRIPT="docker start -i $CONTAINER"
RUNAS=root
LOGFILE=/var/log/$CONTAINER.log
HOST_DEV=test10gb
GUEST_DEV=test10gb
ADDRESS_AND_NET=10.101.0.5/24
start() {
# Ask docker whether the container is running; no PID file is written for it
if [ "$(docker inspect -f '{{.State.Running}}' $CONTAINER 2>/dev/null)" = "true" ]; then
echo 'Service already running' >&2
return 1
fi
echo 'Starting service…' >&2
local CMD="$SCRIPT > \"$LOGFILE\" 2>&1 &"
su -c "$CMD" $RUNAS
sleep 0.5 # Nasty hack so that docker container is already running before we do the rest
mkdir -p /var/run/netns
PID=$(docker inspect -f '{{.State.Pid}}' $CONTAINER)
ln -s /proc/$PID/ns/net /var/run/netns/$PID
ip link set $HOST_DEV netns $PID name $GUEST_DEV
ip netns exec $PID ip addr add $ADDRESS_AND_NET dev $GUEST_DEV
ip netns exec $PID ip link set $GUEST_DEV up
rm /var/run/netns/$PID
echo 'Service started' >&2
}
stop() {
echo "Stopping docker container $CONTAINER" >&2
docker stop $CONTAINER
echo "docker container $CONTAINER stopped" >&2
}
case "$1" in
start)
start
;;
stop)
stop
;;
restart)
stop
start
;;
*)
echo "Usage: $0 {start|stop|restart}"
esac
Slightly hacky, but it works :)
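The sleep 0.5 in start() only works if docker happens to be fast enough. A less fragile variant is to poll until docker reports a PID for the container. This is a sketch, not a tested replacement: wait_for_container is a hypothetical helper, and the default of 50 attempts at 0.1 second intervals is an arbitrary choice.

```shell
#!/bin/sh
# wait_for_container NAME [TRIES]
# Polls docker until the container has a non-zero PID, prints that PID
# and returns 0. Returns 1 if the container is not up after TRIES polls.
wait_for_container() {
    name=$1
    tries=${2:-50}
    while [ "$tries" -gt 0 ]; do
        pid=$(docker inspect -f '{{.State.Pid}}' "$name" 2>/dev/null)
        case $pid in
            ''|0|*[!0-9]*)
                # Not running yet (docker reports 0) or inspect failed
                ;;
            *)
                echo "$pid"
                return 0
                ;;
        esac
        tries=$((tries - 1))
        sleep 0.1
    done
    return 1
}
```

In start(), the sleep and the following docker inspect line could then become PID=$(wait_for_container $CONTAINER) || return 1, so the network setup only runs once the container's namespace exists.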