Cilium Series - 4 - Cilium Native Routing

This article was last updated on: May 17, 2026 am

Introduction

In the previous article, we mentioned that after the default cilium install, the Cilium feature enablement status is as follows:

  1. datapath mode: tunnel. For compatibility reasons, Cilium enables the tunnel (VXLAN-based) datapath mode by default, which is an overlay network architecture.
  2. KubeProxyReplacement: Disabled. Cilium does not fully replace kube-proxy. We will cover how to achieve this replacement in a future article.
  3. IPv6 BIG TCP: Disabled. This feature requires Linux Kernel >= 5.19, so it is disabled on Kernel 4.19.232.
  4. BandwidthManager: Disabled. This feature requires Linux Kernel >= 5.1, so it is currently disabled.
  5. Host Routing: Legacy. Legacy Host Routing still uses iptables, resulting in weaker performance; however, BPF-based host routing requires Linux Kernel >= 5.10.
  6. Masquerading: IPtables. There are two IP masquerading methods: eBPF-based and iptables-based. The default is iptables-based, but eBPF-based is recommended.
  7. Hubble Relay: disabled. Hubble is also disabled by default.

Today we will try to disable the tunnel feature and enable Native Routing to improve network performance.

Test Environment

  • Cilium 1.13.4
  • K3s v1.26.6+k3s1
  • OS
    • 3x Ubuntu 23.04 VMs, Kernel 6.2, x86

VXLAN Encapsulation

Without any configuration provided, Cilium automatically runs in this mode because it has the lowest requirements on the underlying network infrastructure.

In this mode, all cluster nodes form a mesh of tunnels using the UDP-based encapsulation protocol VXLAN or Geneve. All traffic between Cilium nodes is encapsulated.

Drawbacks of This Mode

MTU Overhead

Due to the added encapsulation headers, the MTU available for the payload is lower than with native routing (50 bytes per network packet for VXLAN). This results in reduced maximum throughput for specific network connections.
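
The 50-byte figure is the sum of the headers VXLAN adds over IPv4: outer IPv4 (20), UDP (8), VXLAN (8), and the inner Ethernet header (14). A minimal sketch of that arithmetic, assuming a 1500-byte underlay MTU as reported by eth0 in this test environment; the function names are only illustrative:

```shell
#!/bin/sh
# VXLAN-over-IPv4 encapsulation overhead, in bytes.
OUTER_IPV4=20
OUTER_UDP=8
VXLAN_HDR=8
INNER_ETH=14

vxlan_overhead() {
  echo $((OUTER_IPV4 + OUTER_UDP + VXLAN_HDR + INNER_ETH))
}

# Largest inner IP packet that still fits into one underlay packet of MTU $1.
vxlan_inner_mtu() {
  echo $(( $1 - $(vxlan_overhead) ))
}

echo "overhead:  $(vxlan_overhead) bytes"
echo "inner MTU: $(vxlan_inner_mtu 1500) bytes"
```

With a 1500-byte underlay, each encapsulated packet can carry at most a 1450-byte inner IP packet.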

Native Routing

The native routing datapath is enabled when tunnel: disabled is set, which activates the native packet forwarding mode. The native packet forwarding mode leverages the routing capabilities of the network where Cilium runs, instead of performing encapsulation.

In native routing mode, Cilium delegates all packets not addressed to other local endpoints to the Linux kernel’s routing subsystem. This means packets are routed as if a local process had emitted them. Therefore, the network connecting the cluster nodes must be capable of routing the PodCIDRs.

When configuring native routing, Cilium automatically enables IP forwarding in the Linux kernel.
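
Whether forwarding is actually on can be read from the kernel's sysctl file. A small sketch; the helper name and its optional path argument are our own additions so the check can also be tried against a test file:

```shell
#!/bin/sh
# Succeeds when the given sysctl file contains 1 (forwarding enabled).
# Without an argument it reads the IPv4 forwarding switch of the running kernel.
ip_forward_enabled() {
  file="${1:-/proc/sys/net/ipv4/ip_forward}"
  [ "$(cat "$file" 2>/dev/null)" = "1" ]
}

if ip_forward_enabled; then
  echo "IPv4 forwarding is enabled"
else
  echo "IPv4 forwarding is disabled (or not readable on this host)"
fi
```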

Network Requirements

  • To run in native routing mode, the network connecting the hosts running Cilium must be capable of forwarding IP traffic using addresses assigned to pods or other workloads.
  • The Linux kernel on each node must know how to forward packets for pods or other workloads on all nodes running Cilium. This can be achieved in two ways:
    • The nodes themselves do not know how to route all pod IPs, but a router on the network knows how to reach all other pods. In this case, the Linux nodes are configured with a default route pointing to such a router. This mode is used for cloud provider network integration. For details, see Google Cloud, AWS ENI, and Azure IPAM.
    • Each node knows all pod IPs of all other nodes and inserts routes into the Linux kernel routing table to represent this.
      • If all nodes share a single L2 network, the option auto-direct-node-routes: true can be enabled to handle this. In this experiment, we use this approach to enable native routing.
      • Otherwise, an additional system component such as a BGP daemon must be run to distribute routes. For how to achieve this using the kube-router project, see the guide Running BGP with Kube-Router.
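
To make the auto-direct-node-routes case concrete: each node needs one route per remote node, sending that node's PodCIDR directly to the node's IP. The sketch below only prints those routes from a list of "node IP, PodCIDR" pairs. The three-node layout is a hypothetical example (the article does not state the per-node PodCIDRs), and Cilium installs the real routes itself, so nothing here is meant to be applied by hand:

```shell
#!/bin/sh
# Reads "NODE_IP POD_CIDR" pairs on stdin and prints, for the node given as $1,
# the direct routes it needs towards every *other* node's PodCIDR.
direct_node_routes() {
  local_ip="$1"
  while read -r node_ip pod_cidr; do
    [ -z "$node_ip" ] && continue
    [ "$node_ip" = "$local_ip" ] && continue   # local pods need no remote route
    echo "$pod_cidr via $node_ip"
  done
}

# Hypothetical three-node layout, for illustration only.
direct_node_routes 192.168.2.43 <<'EOF'
192.168.2.43 10.0.0.0/24
192.168.2.44 10.0.1.0/24
192.168.2.45 10.0.2.0/24
EOF
```

On a running cluster, the printed lines can be compared with the output of ip route on the corresponding node.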

Hands-On: Enabling Native Routing

From now on, subsequent Cilium installation configurations become increasingly complex with many custom parameters, so we will start using the Helm Chart method to install Cilium.

│ 📚️Reference:

│ The Helm Chart method is suitable for advanced installations and production environments that require fine-grained control over the Cilium installation. It requires you to manually select the optimal datapath and IPAM mode for your specific Kubernetes environment.

We first uninstall the existing Cilium and then reinstall it with Helm Chart, keeping the configuration identical to the previous article.

Uninstall Cilium

First, uninstall the Cilium that was installed via cilium install.

export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
cilium uninstall

Basic Helm Chart Installation

Then, perform a basic installation using Helm Chart to ensure the configuration matches the previous article.

helm repo add cilium https://helm.cilium.io/

helm install cilium cilium/cilium --version 1.13.4 \
--namespace kube-system \
--set operator.replicas=1 \
--set k8sServiceHost=192.168.2.43 \
--set k8sServicePort=6443 \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true

Explanation:

  • --namespace kube-system installs Cilium into kube-system, consistent with the default cilium install
  • operator.replicas=1 sets the Operator replica count to 1 (default is 2)
  • k8sServiceHost and k8sServicePort explicitly specify the IP and port of the K8s cluster’s APIServer
  • hubble.relay.enabled=true and hubble.ui.enabled=true enable Hubble observability

Restart Unmanaged Pods

If you created a cluster with nodes that do not use the node.cilium.io/agent-not-ready taint, you need to manually restart unmanaged pods. Restart all already-running pods that are not running in host networking mode to ensure Cilium starts managing them. This ensures that all pods running before Cilium was deployed have Cilium-provided network connectivity and that NetworkPolicy applies to them:

$ kubectl get pods --all-namespaces -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,HOSTNETWORK:.spec.hostNetwork --no-headers=true | grep '<none>' | awk '{print "-n "$1" "$2}' | xargs -L 1 -r kubectl delete pod
pod "helm-install-traefik-crd-wv67f" deleted
pod "helm-install-traefik-vt2zh" deleted
pod "svclb-traefik-c19bcc42-6jqxs" deleted
pod "coredns-59b4f5bbd5-qmn2k" deleted
pod "local-path-provisioner-76d776f6f9-mpct2" deleted
pod "traefik-57c84cf78d-jpx47" deleted
pod "metrics-server-68cf49699b-dxvnk" deleted
pod "hubble-ui-68fb44f6f5-z9w7c" deleted
pod "hubble-relay-5f68b89b76-s6xp5" deleted
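
The one-liner above works in three stages: list every pod together with its hostNetwork field, keep the rows where that field is unset (printed as <none>), and turn each remaining row into "-n NAMESPACE NAME" arguments for kubectl delete pod. The sketch below isolates the filter stage so it can be tried without a cluster. The helper name is ours, and it matches on the third column exactly, which is slightly stricter than the grep in the original command:

```shell
#!/bin/sh
# Filter stage of the pipeline: from "NAMESPACE NAME HOSTNETWORK" rows keep the
# pods that are NOT in host networking mode and print kubectl delete arguments.
unmanaged_pod_args() {
  awk '$3 == "<none>" { print "-n " $1 " " $2 }'
}

# Sample rows shaped like the custom-columns output.
unmanaged_pod_args <<'EOF'
kube-system   coredns-59b4f5bbd5-qmn2k   <none>
kube-system   cilium-nxbsn               true
kube-system   traefik-57c84cf78d-jpx47   <none>
EOF
```

Only the two pods without host networking are printed; the Cilium agent pod, which runs with hostNetwork set to true, is skipped.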

Enable Native Routing via Helm Chart

helm upgrade cilium cilium/cilium --version 1.13.4 \
--namespace kube-system \
--reuse-values \
--set tunnel=disabled \
--set autoDirectNodeRoutes=true \
--set ipv4NativeRoutingCIDR=10.0.0.0/22

Configuration details:

  • --reuse-values reuses the configuration from the previous Helm Chart installation
  • tunnel=disabled disables encapsulation, which enables native routing mode
  • autoDirectNodeRoutes=true (agent option auto-direct-node-routes: true) makes each node insert routes for the pod CIDRs of all other nodes into its Linux kernel routing table. This only works if all nodes share a single L2 network.
  • ipv4NativeRoutingCIDR=10.0.0.0/22 (agent option ipv4-native-routing-cidr: x.x.x.x/y) sets the CIDR in which native routing can be performed.
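
As the number of --set flags grows, the same settings can also be kept in a values file. The following is a sketch of that alternative, not what was run for this article. It collects the values from both Helm commands in this article, and the file name is an arbitrary choice:

```shell
#!/bin/sh
# Write the settings from both Helm commands of this article into one values file.
VALUES_FILE="${VALUES_FILE:-cilium-native-routing-values.yaml}"

cat > "$VALUES_FILE" <<'EOF'
operator:
  replicas: 1
k8sServiceHost: 192.168.2.43
k8sServicePort: 6443
hubble:
  relay:
    enabled: true
  ui:
    enabled: true
tunnel: disabled
autoDirectNodeRoutes: true
ipv4NativeRoutingCIDR: 10.0.0.0/22
EOF

echo "wrote $VALUES_FILE"
# The file could then replace --reuse-values and the --set flags:
#   helm upgrade cilium cilium/cilium --version 1.13.4 \
#     --namespace kube-system -f "$VALUES_FILE"
```

Keeping the file in version control makes it easier to see which options changed between installations.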

At this point, native routing is enabled. You can run the relevant commands again to verify.

Verifying Native Routing Is Enabled

First, before enabling native routing (i.e., with VXLAN encapsulation), there is a corresponding VXLAN network interface cilium_vxlan. Example:

5: cilium_vxlan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 52:5b:dd:37:f5:45 brd ff:ff:ff:ff:ff:ff
inet6 fe80::505b:ddff:fe37:f545/64 scope link
valid_lft forever preferred_lft forever

You can check the Cilium Agent logs:

$ k3s kubectl logs -f cilium-nxbsn -n kube-system|grep datapath
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
level=info msg=" --datapath-mode='veth'" subsys=daemon
level=info msg="clang (10.0.0) and kernel (6.2.0) versions: OK!" subsys=linux-datapath
level=info msg="linking environment: OK!" subsys=linux-datapath
level=info msg="Restored 1 node IDs from the BPF map" subsys=linux-datapath
level=info msg="Detected devices" devices="[]" subsys=linux-datapath
level=info msg="Setting up BPF datapath" bpfClockSource=jiffies bpfInsnSet=v3 subsys=datapath-loader

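The --datapath-mode='veth' line shows that the agent uses veth pairs for the pod interfaces. Because veth is also the default datapath mode when tunneling is enabled, this line alone does not prove that native routing is active; the clearer indicators are the tunnel=disabled setting and the disappearance of the cilium_vxlan interface.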
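
The tunnel setting itself can also be inspected. The sketch below maps its value to the datapath it selects. It assumes that the 1.13 Helm Chart renders the setting as the key tunnel in the cilium-config ConfigMap in kube-system; the helper name is ours:

```shell
#!/bin/sh
# Maps the value of the "tunnel" setting to the datapath it selects (Cilium 1.13).
routing_mode() {
  case "$1" in
    disabled)      echo "native routing" ;;
    vxlan|geneve)  echo "encapsulation ($1)" ;;
    *)             echo "unknown" ;;
  esac
}

routing_mode disabled
routing_mode vxlan

# On a live cluster the value could be read from the agent ConfigMap
# (assumed key name, see above):
#   routing_mode "$(kubectl -n kube-system get configmap cilium-config -o jsonpath='{.data.tunnel}')"
```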

You can also check the MTU of the network interfaces. The Cilium VXLAN interface is gone:

$ ip a
...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:15:5d:02:20:22 brd ff:ff:ff:ff:ff:ff
inet 172.17.236.121/20 brd 172.17.239.255 scope global dynamic noprefixroute eth0
valid_lft 84958sec preferred_lft 84958sec
inet6 fe80::e4ed:31d3:3101:3265/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether f6:e6:97:fa:8a:d9 brd ff:ff:ff:ff:ff:ff
inet6 fe80::f4e6:97ff:fefa:8ad9/64 scope link
valid_lft forever preferred_lft forever
4: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 72:f7:bb:f9:31:0b brd ff:ff:ff:ff:ff:ff
inet 10.0.0.172/32 scope global cilium_host
valid_lft forever preferred_lft forever
inet6 fe80::70f7:bbff:fef9:310b/64 scope link
valid_lft forever preferred_lft forever
15: lxca13b12696333@if14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether de:89:24:7b:86:e0 brd ff:ff:ff:ff:ff:ff link-netns cni-0253f30e-07bc-2273-640c-7ec96f0a30dd
inet6 fe80::dc89:24ff:fe7b:86e0/64 scope link
valid_lft forever preferred_lft forever
...

The Cilium and lxc interfaces now have the same MTU as eth0 (mtu 1500). Before enabling native routing, they reported mtu 1280.

Without native routing enabled, the MTU with VXLAN encapsulation is as follows:

$ ip a
...
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether aa:94:b7:b4:25:ac brd ff:ff:ff:ff:ff:ff
inet 192.168.2.44/24 brd 192.168.2.255 scope global dynamic noprefixroute eth0
valid_lft 74264sec preferred_lft 74264sec
inet6 240e:3a1:166d:dd70:4ea1:7c0c:13de:aa3/64 scope global dynamic noprefixroute
valid_lft 208339sec preferred_lft 121939sec
inet6 fe80::b0:3f98:e4e1:1d16/64 scope link noprefixroute
valid_lft forever preferred_lft forever
6: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1280 qdisc noqueue state UP group default qlen 1000
link/ether be:0f:af:14:c7:05 brd ff:ff:ff:ff:ff:ff
inet6 fe80::bc0f:afff:fe14:c705/64 scope link
valid_lft forever preferred_lft forever
7: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1280 qdisc noqueue state UP group default qlen 1000
link/ether 1e:96:a5:af:3c:a3 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.109/32 scope global cilium_host
valid_lft forever preferred_lft forever
inet6 fe80::1c96:a5ff:feaf:3ca3/64 scope link
valid_lft forever preferred_lft forever
98: lxc_health@if97: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1280 qdisc noqueue state UP group default qlen 1000
link/ether 1a:41:2c:3b:18:0b brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::1841:2cff:fe3b:180b/64 scope link
valid_lft forever preferred_lft forever
...
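
Comparing the two listings by eye is error-prone, so the interface names and MTUs can be pulled out with a short awk filter. A minimal sketch; the helper name is ours, and the sample input consists of two header lines from the VXLAN-mode output above:

```shell
#!/bin/sh
# Prints "INTERFACE MTU" for every interface header line of `ip a` / `ip link` output.
iface_mtus() {
  awk '/^[0-9]+: / {
         name = $2; sub(/:$/, "", name); sub(/@.*/, "", name)
         for (i = 3; i < NF; i++) if ($i == "mtu") print name, $(i + 1)
       }'
}

# Two header lines copied from the VXLAN-mode output above.
iface_mtus <<'EOF'
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
6: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1280 qdisc noqueue state UP group default qlen 1000
EOF
```

On a node, ip a | iface_mtus gives the same two-column view before and after switching modes.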

Performance Testing

We use iperf3 to measure network throughput and verify the performance improvement from enabling native routing.

Inter-VM Bandwidth

Test the native bandwidth between VMs. Install iperf3 via apt:

sudo apt install -y iperf3

Test the inter-VM bandwidth. Results:

$ iperf3 -c 192.168.2.3 -f M
Connecting to host 192.168.2.3, port 5201
[ 5] local 192.168.2.26 port 32930 connected to 192.168.2.3 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.02 GBytes 1047 MBytes/sec 0 3.12 MBytes
[ 5] 1.00-2.00 sec 1.13 GBytes 1161 MBytes/sec 0 3.12 MBytes
[ 5] 2.00-3.00 sec 1.12 GBytes 1150 MBytes/sec 0 3.12 MBytes
[ 5] 3.00-4.00 sec 1.08 GBytes 1107 MBytes/sec 0 3.12 MBytes
[ 5] 4.00-5.00 sec 1.17 GBytes 1194 MBytes/sec 0 3.12 MBytes
[ 5] 5.00-6.00 sec 1.09 GBytes 1120 MBytes/sec 0 3.12 MBytes
[ 5] 6.00-7.00 sec 1.10 GBytes 1128 MBytes/sec 0 3.12 MBytes
[ 5] 7.00-8.00 sec 1.10 GBytes 1131 MBytes/sec 0 3.12 MBytes
[ 5] 8.00-9.00 sec 1.18 GBytes 1211 MBytes/sec 0 3.12 MBytes
[ 5] 9.00-10.00 sec 1.11 GBytes 1133 MBytes/sec 0 3.12 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 11.1 GBytes 1138 MBytes/sec 0 sender
[ 5] 0.00-10.00 sec 11.1 GBytes 1138 MBytes/sec receiver

iperf Done.

The result is 1138 MBytes/sec bandwidth.

Deploying iperf3 in Containers

To test both Cilium VXLAN encapsulation and native routing modes, deploy iperf3 as a DaemonSet:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: iperf3
  labels:
    app: iperf3
spec:
  selector:
    matchLabels:
      app: iperf3
  template:
    metadata:
      labels:
        app: iperf3
    spec:
      containers:
        - name: iperf3
          image: clearlinux/iperf:3
          command: ['/bin/sh', '-c', 'sleep 1d']
          ports:
            - containerPort: 5201

Results:

$ k3s kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
iperf3-dmqzb 1/1 Running 0 30s 10.0.0.13 cilium-62-1 <none> <none>
iperf3-g84hd 1/1 Running 0 30s 10.0.2.239 cilium-62-3 <none> <none>
iperf3-lnwfn 1/1 Running 0 30s 10.0.1.39 cilium-62-2 <none> <none>
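
The pod IPs should all fall inside the CIDR passed as ipv4NativeRoutingCIDR (10.0.0.0/22 in this article). A small sketch of that membership check in plain shell arithmetic; the helper names are ours, and 10.0.4.1 is a made-up address added as a counterexample:

```shell
#!/bin/sh
# Converts a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeeds when address $1 lies inside CIDR $2 (e.g. 10.0.0.0/22).
in_cidr() {
  net=${2%/*}; bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

for ip in 10.0.0.13 10.0.2.239 10.0.1.39 10.0.4.1; do
  if in_cidr "$ip" 10.0.0.0/22; then echo "$ip inside"; else echo "$ip outside"; fi
done
```

The three pod IPs from the listing are inside 10.0.0.0/22, while the made-up 10.0.4.1 is not.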

Testing with iperf3 Inside Containers

Select one pod as the server (on the cilium-62-2 node) and another as the client (on the cilium-62-3 node).

Server (iperf3-lnwfn) command:

kubectl exec -it iperf3-lnwfn -- iperf3 -s -f M

Client (iperf3-g84hd) command:

kubectl exec -it iperf3-g84hd -- iperf3 -c 10.0.1.39 -f M
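
When repeating the test several times, it helps to extract only the receiver-side average from the iperf3 output. A minimal sketch with a helper name of our own; the sample lines are the summary of the VXLAN run in this article:

```shell
#!/bin/sh
# Extracts the receiver-side average bitrate (number only) from iperf3 text output.
receiver_bitrate() {
  awk '/receiver/ { print $(NF - 2) }'
}

# Summary lines copied from the VXLAN run in this article.
receiver_bitrate <<'EOF'
[ 5] 0.00-10.00 sec 4.82 GBytes 494 MBytes/sec 185 sender
[ 5] 0.00-10.00 sec 4.82 GBytes 493 MBytes/sec receiver
EOF
```

Against the cluster, the client command can be piped into it, for example kubectl exec iperf3-g84hd -- iperf3 -c 10.0.1.39 -f M | receiver_bitrate.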

VXLAN Encapsulation

Results with VXLAN encapsulation:

$ kubectl exec -it iperf3-g84hd -- iperf3 -c 10.0.1.39 -f M
Connecting to host 10.0.1.39, port 5201
[ 5] local 10.0.2.239 port 38102 connected to 10.0.1.39 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 377 MBytes 377 MBytes/sec 46 1.19 MBytes
[ 5] 1.00-2.00 sec 458 MBytes 457 MBytes/sec 0 1.31 MBytes
[ 5] 2.00-3.00 sec 538 MBytes 538 MBytes/sec 46 1.43 MBytes
[ 5] 3.00-4.00 sec 538 MBytes 537 MBytes/sec 0 1.49 MBytes
[ 5] 4.00-5.00 sec 525 MBytes 525 MBytes/sec 14 1.50 MBytes
[ 5] 5.00-6.00 sec 494 MBytes 494 MBytes/sec 0 1.51 MBytes
[ 5] 6.00-7.00 sec 494 MBytes 494 MBytes/sec 0 1.51 MBytes
[ 5] 7.00-8.00 sec 494 MBytes 494 MBytes/sec 33 1.52 MBytes
[ 5] 8.00-9.00 sec 528 MBytes 528 MBytes/sec 0 1.53 MBytes
[ 5] 9.00-10.00 sec 495 MBytes 495 MBytes/sec 46 1.54 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 4.82 GBytes 494 MBytes/sec 185 sender
[ 5] 0.00-10.00 sec 4.82 GBytes 493 MBytes/sec receiver

iperf Done.

The result is approximately 493 MBytes/sec, less than half of the 1138 MBytes/sec measured between the VMs.

Native Routing

$ kubectl exec -it iperf3-g84hd -- iperf3 -c 10.0.1.39 -f M
Connecting to host 10.0.1.39, port 5201
[ 5] local 10.0.2.239 port 39518 connected to 10.0.1.39 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.01 GBytes 1030 MBytes/sec 33 1.53 MBytes
[ 5] 1.00-2.00 sec 1.16 GBytes 1191 MBytes/sec 0 2.01 MBytes
[ 5] 2.00-3.00 sec 1.31 GBytes 1339 MBytes/sec 0 2.45 MBytes
[ 5] 3.00-4.00 sec 1.28 GBytes 1312 MBytes/sec 0 2.79 MBytes
[ 5] 4.00-5.00 sec 1.25 GBytes 1283 MBytes/sec 0 3.00 MBytes
[ 5] 5.00-6.00 sec 1.28 GBytes 1310 MBytes/sec 0 3.00 MBytes
[ 5] 6.00-7.00 sec 1.26 GBytes 1292 MBytes/sec 0 3.01 MBytes
[ 5] 7.00-8.00 sec 1.31 GBytes 1337 MBytes/sec 0 3.01 MBytes
[ 5] 8.00-9.00 sec 1.23 GBytes 1260 MBytes/sec 0 3.01 MBytes
[ 5] 9.00-10.00 sec 1.28 GBytes 1308 MBytes/sec 92 3.01 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 12.4 GBytes 1266 MBytes/sec 125 sender
[ 5] 0.00-10.00 sec 12.4 GBytes 1266 MBytes/sec receiver

iperf Done.

The result is 1266 MBytes/sec, on par with the inter-VM bandwidth (in this run even slightly above the 1138 MBytes/sec measured between the VMs).

Summary

👍️ Disabling encapsulation (tunnel mode, VXLAN in this test) and enabling native routing does improve maximum network throughput: about 1266 MBytes/sec versus 493 MBytes/sec between pods.
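
The three averages can be put side by side with a little arithmetic. The numbers below are the receiver-side results from this article; each comes from a single 10-second run, so the percentages should be read as rough indications only:

```shell
#!/bin/sh
# Average receiver bitrates measured in this article, in MBytes/sec.
VM=1138       # VM to VM, no Cilium involved
VXLAN=493     # pod to pod, VXLAN encapsulation
NATIVE=1266   # pod to pod, native routing

# Prints a/b as a percentage with one decimal place.
percent_of() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f%%\n", 100 * a / b }'
}

echo "VXLAN  vs VM:    $(percent_of "$VXLAN" "$VM")"
echo "native vs VM:    $(percent_of "$NATIVE" "$VM")"
echo "native vs VXLAN: $(percent_of "$NATIVE" "$VXLAN")"
```

This prints 43.3%, 111.2% and 256.8%: in this environment, pod-to-pod throughput with native routing is roughly 2.5 times the VXLAN figure.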

Conclusion

Without any configuration provided, Cilium automatically runs in encapsulation (tunnel) mode because it has the lowest requirements on the underlying network infrastructure.

In this mode, all cluster nodes form a mesh of tunnels using the UDP-based encapsulation protocol VXLAN or Geneve.

Due to the added encapsulation headers, the MTU available for the payload is lower than with native routing, resulting in reduced maximum throughput for specific network connections.

Enabling Native Routing avoids this issue, but it does have certain requirements on the local network. In this article, we enabled it using the autoDirectNodeRoutes=true approach.

Through iperf testing, we confirmed that enabling native routing does improve throughput. 💪

📚️References