
Strange Behaviour in Container Namespace

Note: This post is compiled from several posts in the Fediverse, where I tend towards English for technical posts in the hope that someone reacts to them. For better findability I'm publishing it here after the fact as well. I'm setting the post's date to that of the first post in the thread.


I am, by no means, a good C programmer, or a good programmer at all. But looking at the #dnsmasq source code causes some severe headaches. Indentation is wildly mixed between tab stops, 8 spaces and 2 spaces. Functions have no meaningful documentation … does anybody have some resources to make sense of this code? I'm especially interested in iface_check (network.c:112), as dnsmasq complains about my interface not having an address (dhcp.c:295).
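To see what an address check in the container's namespace boils down to, here is a minimal standalone C sketch. It is emphatically not dnsmasq's code (which, as far as I can tell, enumerates addresses via netlink); it merely asks, via getifaddrs(3), whether a named interface carries an IPv4 address in the namespace the process runs in:

    /* iface_addr_check.c — hypothetical helper, not dnsmasq code.
     * Build: cc -o iface_addr_check iface_addr_check.c
     * Run inside the container: ./iface_addr_check enp6s0 */
    #include <stdio.h>
    #include <string.h>
    #include <ifaddrs.h>
    #include <sys/socket.h>

    int main(int argc, char **argv)
    {
        struct ifaddrs *addrs, *ifa;
        int found = 0;

        if (argc != 2 || getifaddrs(&addrs) == -1)
            return 2;

        /* Walk all addresses visible in this network namespace. */
        for (ifa = addrs; ifa; ifa = ifa->ifa_next)
            if (ifa->ifa_addr && ifa->ifa_addr->sa_family == AF_INET
                && strcmp(ifa->ifa_name, argv[1]) == 0)
                found = 1;

        freeifaddrs(addrs);
        printf("%s: %s\n", argv[1],
               found ? "has an IPv4 address" : "no IPv4 address");
        return found ? 0 : 1;
    }

Running this inside the nspawn should tell whether the kernel reports an address where dnsmasq claims there is none.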

For the moment, I'm tempted to just switch over to Kea + Bind. Or try to get systemd-networkd + systemd-resolved to do what I want.

Background: I'm migrating my NAS to a new NAS (NAS1 → NAS2). On NAS1 I have a systemd-nspawn container, lan-basics, which provides some LAN basics like DNS and DHCP. NAS1 only has one physical interface, so I created a bridge and attached the nspawn to it. That worked reasonably well. NAS2 has four physical ports, and I wanted to dedicate one port to the nspawn. I did not yet try to replicate the complete NAS1 interface config on NAS2 to rule out other sources of my problems.
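In systemd-networkd/systemd-nspawn terms, such a bridge setup looks roughly like this (file names and layout are illustrative):

    # /etc/systemd/network/br0.netdev — create the bridge
    [NetDev]
    Name=br0
    Kind=bridge

    # /etc/systemd/network/enp1s0.network — enslave the physical port
    [Match]
    Name=enp1s0

    [Network]
    Bridge=br0

    # /etc/systemd/nspawn/lan-basics.nspawn — attach the container via veth
    [Network]
    Bridge=br0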

On NAS1 I added all nspawns to the bridge interface. On NAS2 my plan was to only expose the nspawns that need to be exposed. Everything else should be reverse proxied by nginx on the lan-basics nspawn. So lan-basics requires at least two interfaces: the physical interface and a veth connected to the nspawn network zone on NAS2. The other nspawns only get veth interfaces bound to that zone.
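The zone part is a one-liner per container, and systemd-nspawn creates the host bridge vz-zone on demand. A sketch, assuming Interface= and Zone= can be combined as systemd.nspawn(5) suggests:

    # /etc/systemd/nspawn/lan-basics.nspawn — sketch
    [Network]
    Interface=enp6s0   # move the physical port into the container
    Zone=zone          # plus a veth attached to the host bridge vz-zone

    # /etc/systemd/nspawn/<other>.nspawn — the reverse-proxied containers
    [Network]
    Zone=zone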

NAS2 itself does DHCP, via systemd-networkd, on the zone interface vz-zone. lan-basics should provide DNS and DHCP on the physical interface but itself use LLMNR on the veth interface to find the neighbouring nspawns.
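Sketched as systemd-networkd fragments (the subnet and file names are made up):

    # NAS2: /etc/systemd/network/vz-zone.network
    [Match]
    Name=vz-zone

    [Network]
    Address=192.0.2.1/24   # example subnet
    DHCPServer=yes

    # inside lan-basics: /etc/systemd/network/host0.network
    # (host0 is the container-side name of the zone veth)
    [Match]
    Name=host0

    [Network]
    DHCP=yes
    LLMNR=yes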

At the moment, I think I'll just drop this requirement and use a dedicated nspawn for nginx. But then along came the message "DHCP packet received on enp6s0 which has no address", emitted by dnsmasq, and I can't figure out what's wrong.

I could imagine it has something to do with the interface being moved into a different namespace, though. Maybe a veth enslaved to a bridge behaves differently from a physical interface moved into a different namespace, even though veth and physical interface both end up inside the same (container) namespace.
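At least the visibility of the address can be checked quickly with standard tooling, independent of dnsmasq:

    # on the host: once moved, enp6s0 disappears from the host namespace
    ip link show enp6s0

    # inside the container: does the interface really carry an address?
    machinectl shell lan-basics /bin/sh -c 'ip addr show enp6s0'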


Okay, I did replicate the config but it's still "DHCP packet received on host0 which has no address".

The last difference is the version of dnsmasq: 2.85 vs 2.89. I'll move the nspawn to NAS1 and try it there.

I must emphasize that I really appreciate the networkctl command. Prior to that I always issued systemctl restart systemd-networkd.service and still wasn't sure if it really applied my changes. networkctl reload, OTOH, always seems to.
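For the record, the pattern is simply:

    networkctl reload          # re-read .network/.netdev/.link files
    networkctl status enp6s0   # verify the changes actually took effect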

And then there's netplan. The people from Openmediavault, as much as I appreciate their work, decided to use netplan for network management. I just managed to terminate my ssh connections with netplan apply.

Of course, that's just a coincidence as my DHCP server wasn't running and the interfaces were set to DHCP. 😄

Well, I cannot tell for sure why, but the lan-basics nspawn from NAS2 doesn't work on NAS1 either. Because I already wasted^Wlearned so much, I'll just copy the old lan-basics to NAS2 and see if it works there as I intended. Actually, I feel bad not using btrfs send|receive for that, but it's just too big a hassle to ro-snapshot the subvolumes, send/receive them and then set them rw again. I should suggest automating these steps.
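Since the manual steps are always the same, a script along these lines could do it (paths and host name are placeholders, untested sketch):

    #!/bin/sh -e
    # Sketch: replicate a btrfs subvolume to another host via send/receive.
    SRC=/var/lib/machines/lan-basics   # hypothetical subvolume path
    SNAP="${SRC}.migrate"
    DEST_HOST=nas2
    DEST_PATH=/var/lib/machines

    btrfs subvolume snapshot -r "$SRC" "$SNAP"         # ro snapshot
    btrfs send "$SNAP" | ssh "$DEST_HOST" btrfs receive "$DEST_PATH"
    # make the received copy writable again on the target
    ssh "$DEST_HOST" btrfs property set -ts "$DEST_PATH/${SNAP##*/}" ro false
    btrfs subvolume delete "$SNAP"                     # clean up locally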

Result 1: The lan-basics1 nspawn bound to a bridge interface works.

Result 2: When providing exclusive access to enp6s0 the nspawn's dnsmasq starts showing "DHCP packet received on enp6s0 which has no address".

I don't know what's happening here but I'd say it's a bug somewhere, either in systemd-nspawn or in dnsmasq.

After a long chat with the nice people of the #systemd IRC channel (nice to have it hashtagged when I mention the channel) I conclude that the bug is with dnsmasq. There's no reason it shouldn't work.


I then went on to file a bug report:

I finally found some time for further investigations. Here are my observations.

NAS1: old NAS with systemd-nspawn lan-basics
NAS2: new NAS with systemd-nspawn lan-basics
lan-basics1: NAS1's lan-basics on NAS2
lan-basics2: NAS2's lan-basics on NAS1
OS NAS1: Debian 11
OS NAS2: Debian 12
OS lan-basics at NAS1: Debian 11
OS lan-basics at NAS2: Debian 12

lan-basics at NAS1 is configured to use bridge br0 which enslaved the physical interface enp1s0.

lan-basics at NAS2 started out with one of the four interfaces, enp6s0, moved into the systemd-nspawn. I removed any additional interfaces from my prior setup, as I wanted to reduce the moving parts.
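The reduced configuration is as small as it gets; a sketch of the relevant file:

    # /etc/systemd/nspawn/lan-basics.nspawn on NAS2 — reduced setup
    [Network]
    Interface=enp6s0   # exclusive access: the port is moved into the container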

Leaving dnsmasq's configuration untouched, i.e. the one that worked on lan-basics at NAS1, I made the following attempts:

  1. I copied lan-basics at NAS2 to lan-basics2 at NAS1. As NAS1 only has one interface I added lan-basics2 to the same br0 as lan-basics at NAS1. The result was the same as on NAS2: "DHCP packet received on host0 which has no address" (the veth interface in this configuration is named host0 in the container).

  2. I replicated the network configuration from NAS1: I created a bridge interface br0, enslaved enp6s0 and configured lan-basics at NAS2 to use this bridge interface. Still the same symptoms.

  3. I copied lan-basics at NAS1 to lan-basics1 at NAS2. There I first tried using the same initial config as lan-basics at NAS2 with enp6s0, but lan-basics1 now oddly showed the same symptoms.

  4. I reconfigured the network to enslave enp6s0 under br0 and lan-basics1 to use this bridge interface. It works.

For the moment I'll stick to setup 4, but there's apparently something wrong with how dnsmasq treats namespaced interfaces. I can switch back and forth between setups 3 and 4, always with the same result: dnsmasq complains about receiving DHCP packets on an interface whose address it doesn't know. Yes, I made sure that dnsmasq was started after the interface got its configuration (stopping dnsmasq, checking the address of the interface, starting dnsmasq).
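Concretely, the ordering check inside the container amounts to:

    systemctl stop dnsmasq.service
    ip addr show enp6s0                # the address is present at this point
    systemctl start dnsmasq.service
    journalctl -u dnsmasq.service -f   # the "no address" complaint still appears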

I haven't yet tried the other network options of systemd-nspawn, e.g. IPVLAN, MACVLAN, or VETH with port forwarding. Still, giving a systemd-nspawn container exclusive access to a network interface should be the gold standard of all possible network configurations.
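Those untested alternatives would look roughly like this in the .nspawn file (one option at a time, of course):

    # /etc/systemd/nspawn/lan-basics.nspawn — untested alternatives
    [Network]
    MACVLAN=enp6s0          # MACVLAN on top of the physical port
    # IPVLAN=enp6s0         # or IPVLAN
    # VirtualEthernet=yes   # or a plain veth ...
    # Port=tcp:80:80        # ... with port forwarding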

FWIW, the versions of the involved dnsmasq binaries are 2.85 for Debian 11 and 2.89 for Debian 12.

I believe this to be a bug in dnsmasq. I'd gladly help to debug it but I can't do it myself. AFAICT the problem seems to be in iface_check in network.c but I don't know where it goes wrong.
