In the previous article, we mentioned that after the default cilium install, the Cilium feature enablement status is as follows:
datapath mode: tunnel. For compatibility reasons, Cilium enables the tunnel (VXLAN-based) datapath mode by default, which is an overlay network architecture.
KubeProxyReplacement: Disabled. Cilium does not fully replace kube-proxy. We will cover how to achieve this replacement in a future article.
IPv6 BIG TCP: Disabled. This feature requires Linux Kernel >= 5.19, so it is disabled on Kernel 4.19.232.
BandwidthManager: Disabled. This feature requires Linux Kernel >= 5.1, so it is currently disabled.
Host Routing: Legacy. Legacy host routing still uses iptables and performs worse; BPF-based host routing requires Linux Kernel >= 5.10.
Masquerading: IPtables. There are two IP masquerading methods, eBPF-based and iptables-based. The default is iptables-based, but eBPF-based is recommended.
Hubble Relay: disabled. Hubble is also disabled by default.
Today we will try to disable the tunnel feature and enable Native Routing to improve network performance.
Test Environment
Cilium 1.13.4
K3s v1.26.6+k3s1
OS: 3x Ubuntu 23.04 VMs, Kernel 6.2, x86
VXLAN Encapsulation
Without any configuration provided, Cilium automatically runs in this mode because it has the lowest requirements on the underlying network infrastructure.
In this mode, all cluster nodes form a mesh of tunnels using the UDP-based encapsulation protocol VXLAN or Geneve. All traffic between Cilium nodes is encapsulated.
Drawbacks of This Mode
MTU Overhead
Because of the added encapsulation headers, the MTU available to the payload is lower than with native routing (VXLAN adds 50 bytes of overhead per network packet). This reduces the maximum throughput of a given network connection.
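The 50-byte figure can be checked with a little arithmetic. The sketch below assumes an IPv4 underlay and a 1500-byte link MTU (both assumptions for illustration) and adds up the headers that VXLAN places in front of every inner frame:

```shell
# Assumption: 1500-byte link MTU and an IPv4 underlay.
LINK_MTU=1500

# Headers added by VXLAN encapsulation, in bytes.
OUTER_IPV4=20   # outer IPv4 header
OUTER_UDP=8     # outer UDP header
VXLAN_HDR=8     # VXLAN header
INNER_ETH=14    # inner Ethernet header carried inside the tunnel

VXLAN_OVERHEAD=$((OUTER_IPV4 + OUTER_UDP + VXLAN_HDR + INNER_ETH))
TUNNEL_MTU=$((LINK_MTU - VXLAN_OVERHEAD))

echo "VXLAN overhead: ${VXLAN_OVERHEAD} bytes"
echo "MTU left for pod traffic in tunnel mode: ${TUNNEL_MTU}"
echo "MTU left for pod traffic in native routing mode: ${LINK_MTU}"
```

With these assumptions the overhead comes to 50 bytes, leaving 1450 bytes per packet in tunnel mode against the full 1500 in native routing mode.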
Native Routing
The native routing datapath is enabled by setting tunnel: disabled. In this mode, Cilium relies on the routing capabilities of the network it runs on instead of performing encapsulation.
In native routing mode, Cilium delegates all packets not addressed to another local endpoint to the Linux kernel’s routing subsystem, so packets are routed as if a local process had emitted them. The network connecting the cluster nodes must therefore be able to route the PodCIDRs.
When configuring native routing, Cilium automatically enables IP forwarding in the Linux kernel.
Network Requirements
To run in native routing mode, the network connecting the hosts running Cilium must be capable of forwarding IP traffic using addresses assigned to pods or other workloads.
The Linux kernel on each node must know how to forward packets for pods or other workloads on all nodes running Cilium. This can be achieved in two ways:
The nodes themselves do not know how to route all pod IPs, but a router on the network knows how to reach all other pods. In this case, the Linux nodes are configured with a default route pointing to such a router. This mode is used for cloud provider network integration. For details, see Google Cloud, AWS ENI, and Azure IPAM.
Each node knows all pod IPs of all other nodes and inserts routes into the Linux kernel routing table to represent this.
If all nodes share a single L2 network, the option auto-direct-node-routes: true can be enabled to handle this. In this experiment, we use this approach to enable native routing.
Otherwise, an additional system component such as a BGP daemon must be run to distribute routes. For how to achieve this using the kube-router project, see the guide Running BGP with Kube-Router.
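To make the second approach concrete, the sketch below prints the kind of routes each node needs when all nodes share one L2 network: one route per remote PodCIDR, with the remote node's IP as next hop. The node IPs and PodCIDRs are hypothetical, and the printed `ip route replace` commands are only a rough manual equivalent of what auto-direct-node-routes automates, not Cilium's actual implementation.

```shell
# Hypothetical node IPs and per-node PodCIDRs, for illustration only.
# Format per line: "<node IP> <PodCIDR allocated to that node>"
NODES="192.168.1.11 10.0.0.0/24
192.168.1.12 10.0.1.0/24
192.168.1.13 10.0.2.0/24"

# Print the routes a given node needs for every other node's PodCIDR.
routes_for() {
  local_ip="$1"
  echo "$NODES" | while read -r node_ip pod_cidr; do
    # The node's own PodCIDR is local, so no route via another node is needed.
    if [ "$node_ip" != "$local_ip" ]; then
      echo "ip route replace ${pod_cidr} via ${node_ip}"
    fi
  done
}

routes_for 192.168.1.11
```

The next hop is always another node's IP, which is why this only works when the nodes can reach each other directly at L2. Once a router sits between them, something like BGP has to distribute the routes instead.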
Hands-On: Enabling Native Routing
From here on, Cilium installations need more and more custom parameters, so we switch to installing Cilium with the Helm Chart.
│ 📚️Reference:
│
│ The Helm Chart method is suitable for advanced installations and production environments that require fine-grained control over the Cilium installation. It requires you to manually select the optimal datapath and IPAM mode for your specific Kubernetes environment.
First, perform a basic installation using Helm Chart to ensure the configuration matches the previous article.
Uninstall Cilium
First, uninstall the Cilium that was installed via cilium install.
If you created a cluster with nodes that do not use the node.cilium.io/agent-not-ready taint, you need to manually restart unmanaged pods. Restart all already-running pods that are not running in host networking mode to ensure Cilium starts managing them. This ensures that all pods running before Cilium was deployed have Cilium-provided network connectivity and that NetworkPolicy applies to them:
$ kubectl get pods --all-namespaces -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,HOSTNETWORK:.spec.hostNetwork --no-headers=true | grep '<none>' | awk '{print "-n "$1" "$2}' | xargs -L 1 -r kubectl delete pod
pod "helm-install-traefik-crd-wv67f" deleted
pod "helm-install-traefik-vt2zh" deleted
pod "svclb-traefik-c19bcc42-6jqxs" deleted
pod "coredns-59b4f5bbd5-qmn2k" deleted
pod "local-path-provisioner-76d776f6f9-mpct2" deleted
pod "traefik-57c84cf78d-jpx47" deleted
pod "metrics-server-68cf49699b-dxvnk" deleted
pod "hubble-ui-68fb44f6f5-z9w7c" deleted
pod "hubble-relay-5f68b89b76-s6xp5" deleted
--reuse-values reuses the configuration from the previous Helm Chart installation.
tunnel=disabled enables native routing mode.
autoDirectNodeRoutes=true makes each node learn the PodCIDRs of all other nodes and insert the corresponding routes into its Linux kernel routing table. This works when all nodes share a single L2 network (the auto-direct-node-routes: true option).
ipv4-native-routing-cidr: x.x.x.x/y sets the CIDR for which native routing can be performed.
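Put together, the options above could form a single Helm upgrade. The sketch below only assembles and prints the command so it can be reviewed first. The release name, namespace, chart repo alias and especially the CIDR are assumptions and must be replaced with the values of your own cluster; ipv4NativeRoutingCIDR is the Helm value behind ipv4-native-routing-cidr.

```shell
# Assumptions: release "cilium" in namespace "kube-system", chart repo alias "cilium".
RELEASE="cilium"
NAMESPACE="kube-system"
# Hypothetical placeholder: use the CIDR that covers your cluster's PodCIDRs.
NATIVE_CIDR="10.0.0.0/8"

# Assemble the upgrade command as one line so it can be reviewed before running.
HELM_UPGRADE="helm upgrade ${RELEASE} cilium/cilium \
--namespace ${NAMESPACE} \
--reuse-values \
--set tunnel=disabled \
--set autoDirectNodeRoutes=true \
--set ipv4NativeRoutingCIDR=${NATIVE_CIDR}"

echo "${HELM_UPGRADE}"
# After reviewing the printed command, run it with: eval "${HELM_UPGRADE}"
```

Printing before running is a deliberate choice here: a wrong native routing CIDR can leave pod traffic unmasqueraded or unroutable, so it is worth a second look.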
At this point, native routing is enabled. You can run the relevant commands again to verify.
Verifying Native Routing Is Enabled
First, before enabling native routing (i.e., with VXLAN encapsulation), there is a corresponding VXLAN network interface cilium_vxlan. Example:
5: cilium_vxlan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 52:5b:dd:37:f5:45 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::505b:ddff:fe37:f545/64 scope link
       valid_lft forever preferred_lft forever
$ ip a
...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:15:5d:02:20:22 brd ff:ff:ff:ff:ff:ff
    inet 172.17.236.121/20 brd 172.17.239.255 scope global dynamic noprefixroute eth0
       valid_lft 84958sec preferred_lft 84958sec
    inet6 fe80::e4ed:31d3:3101:3265/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether f6:e6:97:fa:8a:d9 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::f4e6:97ff:fefa:8ad9/64 scope link
       valid_lft forever preferred_lft forever
4: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 72:f7:bb:f9:31:0b brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.172/32 scope global cilium_host
       valid_lft forever preferred_lft forever
    inet6 fe80::70f7:bbff:fef9:310b/64 scope link
       valid_lft forever preferred_lft forever
15: lxca13b12696333@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether de:89:24:7b:86:e0 brd ff:ff:ff:ff:ff:ff link-netns cni-0253f30e-07bc-2273-640c-7ec96f0a30dd
    inet6 fe80::dc89:24ff:fe7b:86e0/64 scope link
       valid_lft forever preferred_lft forever
...
You can see that the Cilium and lxc-related interfaces now have an MTU consistent with eth0: mtu 1500. Before enabling native routing, the MTU was mtu 1280.
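Comparing MTUs by eye across many interfaces is tedious. As a small convenience, the sketch below defines a helper (list_mtus, a hypothetical name) that reduces `ip -o link`-style output to "interface mtu" pairs. It is demonstrated here on one line taken from the output above.

```shell
# Print "<interface> <mtu>" for each interface line read from stdin.
# Expects one interface per line, as produced by `ip -o link show`.
list_mtus() {
  awk '{
    name = $2
    sub(/:$/, "", name)                 # strip the trailing colon from the name
    for (i = 3; i < NF; i++) {
      if ($i == "mtu") { print name, $(i + 1); break }
    }
  }'
}

# Demo on a sample line from the output above.
printf '%s\n' \
  '3: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000' \
  | list_mtus
```

On a cluster node, the same helper can be fed live data with `ip -o link show | list_mtus`, which makes it easy to spot any interface whose MTU differs from eth0.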
Without native routing enabled, the MTU with VXLAN encapsulation is as follows:
The result is 1266 MBytes/sec, nearly identical to the native VM bandwidth.
Summary
👍️ Disabling encapsulation (tunnel mode, VXLAN in this test) and enabling native routing does improve maximum network throughput.
Conclusion
Without any configuration provided, Cilium automatically runs in encapsulation (tunnel) mode because it has the lowest requirements on the underlying network infrastructure.
In this mode, all cluster nodes form a mesh of tunnels using the UDP-based encapsulation protocol VXLAN or Geneve.
Due to the added encapsulation headers, the MTU available for the payload is lower than with native routing, resulting in reduced maximum throughput for specific network connections.
Enabling native routing avoids this issue, but it places certain requirements on the underlying network. In this article, we enabled it using the autoDirectNodeRoutes=true approach.
Through iperf testing, we confirmed that enabling native routing does improve throughput. 💪