Cilium Series Part 13: Enabling XDP Acceleration and Cilium Performance Tuning Summary
This article was last updated on: May 17, 2026
Introduction
Switching the Kubernetes CNI from another plugin to Cilium can already improve network performance noticeably. Switching Cilium modes and enabling additional features can improve it further. Specific tuning items include but are not limited to:
- Enable Native Routing
- Fully replace KubeProxy
- Switch IP Address Masquerading to eBPF-based mode
- Run Kubernetes NodePort implementation in DSR (Direct Server Return) mode
- Bypass iptables Connection Tracking
- Switch Host Routing to BPF-based mode (requires Linux Kernel >= 5.10)
- Enable IPv6 BIG TCP (requires Linux Kernel >= 5.19)
- Disable Hubble (not recommended — observability is more important than a marginal performance gain)
- Change MTU to jumbo frames (requires network conditions to allow it)
- Enable Bandwidth Manager (requires Kernel >= 5.1)
- Enable BBR congestion control for Pods (requires Kernel >= 5.18)
- Enable XDP Acceleration (requires native XDP driver support)
- (Optional for advanced users) Adjust eBPF Map Size
- Linux Kernel optimization and upgrade
  - CONFIG_PREEMPT_NONE=y
- Other:
  - tuned network-* profiles, e.g.: tuned-adm profile network-latency or network-throughput
  - Set CPU to performance mode
  - Stop irqbalance and pin NIC interrupts to specific CPUs
When network/NIC/OS conditions permit, we enable as many of these tuning options as possible. Related optimizations will be covered one by one in subsequent articles. Stay tuned.
Today we tune Cilium by enabling XDP acceleration so that inbound requests such as NodePort can be processed directly from the network driver layer, helping reduce latency and scale services. We also provide a summary of Cilium performance tuning.
XDP Acceleration
Cilium has built-in acceleration support for NodePort and LoadBalancer services and for services with external IPs: when the backends are on remote nodes, incoming requests can be forwarded straight back out of the receiving node. This feature was introduced in Cilium 1.8 and runs at the XDP (eXpress Data Path) layer, where eBPF executes directly in the network driver rather than at higher layers.
In this scenario, network packets do not need to travel all the way up the network stack. With XDP, Cilium can process these requests directly at the network driver layer. Because the forwarding capacity of a single node improves significantly, this helps reduce latency and scale services. Starting with Cilium 1.8, the kube-proxy replacement itself can run at the XDP layer.
Requirements
- Kernel >= 4.19.57, >= 5.1.16, >= 5.2
- Native XDP driver support — see Cilium’s driver list for details
- Direct-routing configuration
- eBPF-based kube-proxy replacement
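The kernel requirement above can be checked with a few lines of shell. This is a minimal sketch, not an official check: the function name kernel_ok_for_xdp is hypothetical, and it assumes the requirement is to be read as "4.19.x with x >= 57, or 5.1.x with x >= 16, or any kernel from 5.2 onward". It says nothing about driver support, which still has to be checked against Cilium's driver list.

```shell
# Returns 0 if a kernel release string such as "5.15.0-91-generic" satisfies
# the requirement, read here (an assumption) as:
#   4.19.x with x >= 57, or 5.1.x with x >= 16, or any kernel >= 5.2.
# The body runs in a subshell so its variables do not leak into the caller.
kernel_ok_for_xdp() (
  release=${1%%-*}              # drop distro suffix, e.g. "-91-generic"
  major=${release%%.*}
  rest=${release#*.}
  minor=${rest%%.*}
  minor=${minor%%[!0-9]*}       # keep leading digits only
  patch=0
  case $rest in
    *.*) patch=${rest#*.}; patch=${patch%%[!0-9]*} ;;
  esac
  : "${minor:=0}" "${patch:=0}"

  if [ "$major" -gt 5 ]; then exit 0; fi
  if [ "$major" -eq 5 ] && [ "$minor" -ge 2 ]; then exit 0; fi
  if [ "$major" -eq 5 ] && [ "$minor" -eq 1 ] && [ "$patch" -ge 16 ]; then exit 0; fi
  if [ "$major" -eq 4 ] && [ "$minor" -eq 19 ] && [ "$patch" -ge 57 ]; then exit 0; fi
  exit 1
)

# Usage: check the kernel of the node this runs on.
if kernel_ok_for_xdp "$(uname -r)"; then
  echo "kernel $(uname -r): meets the XDP acceleration requirement"
else
  echo "kernel $(uname -r): too old for XDP acceleration"
fi
```

Running it on every node (for example through a DaemonSet or SSH loop) is left out here, since how nodes are reached depends on the environment.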
To enable XDP acceleration, refer to Cilium’s Getting Started Guide, which also includes instructions for setup on public cloud providers.
Acceleration is controlled by the loadBalancer.acceleration setting: native enables it, and disabled (the default) turns it off. Most drivers supporting 10G or higher speeds also support native XDP on recent kernels. For cloud-based deployments, most of these drivers have SR-IOV variants that support native XDP. For on-premises deployments, Cilium XDP acceleration can be used in combination with Kubernetes load balancer service implementations such as MetalLB. The acceleration feature can only be enabled on a single device used for direct routing.
The load balancer acceleration setting supports DSR, SNAT, and Hybrid modes.
To understand where Cilium’s XDP service acceleration fits in the overall picture, here is a brief overview of Cilium 1.8’s service load balancing architecture:

As shown, Cilium’s kube-proxy replacement in eBPF consists of two main components at a high level: eBPF at the socket layer and eBPF at the driver layer.
- East-west traffic — all service traffic between Cilium-managed nodes — is handled solely at the kernel’s socket layer, before memory is allocated for packet metadata. Executing at this point allows Cilium to eliminate the per-packet cost of service translation.
- North-south traffic — all inbound service traffic from external sources to Cilium-managed nodes — is processed as close to the driver layer as possible, with ingress and egress operations on a single interface. This enables very fast forwarding and can even drop or reflect traffic back to the inbound interface before any expensive operations higher up the stack. The latter component handling north-south traffic is accelerated via XDP.
Cilium’s service XDP acceleration currently supports direct routing mode and shares the same core code with the tc eBPF implementation. After XDP service translation, three options are available for redirecting traffic to remote backends: DSR, SNAT, and Hybrid.
Implementation
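Set loadBalancer.acceleration to native in the Cilium Helm values, with direct (native) routing and the eBPF-based kube-proxy replacement already enabled as listed in the requirements.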
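As a rough sketch of what enabling the feature on an existing installation might look like: the helper below only prints the commands so they can be reviewed first. It assumes a Helm release named cilium in the kube-system namespace installed from the cilium/cilium chart, and it picks hybrid as the load balancer mode purely as an example. The function name xdp_enable_cmd is hypothetical.

```shell
# Prints the commands instead of running them; pipe the output to `sh` to apply.
# Assumptions: release "cilium" in namespace "kube-system", chart cilium/cilium,
# native routing and kube-proxy replacement already configured on the release.
xdp_enable_cmd() {
  cat <<'EOF'
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set loadBalancer.acceleration=native \
  --set loadBalancer.mode=hybrid
kubectl -n kube-system rollout restart ds/cilium
EOF
}

xdp_enable_cmd
```

Printing first is a deliberate choice: the upgrade restarts the Cilium agents, so it is worth reading the exact command before applying it with xdp_enable_cmd | sh.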
Verification
To verify that your installation is using XDP acceleration, run cilium status in any Cilium pod and look for the line reporting “XDP Acceleration”, which should show “Native”.
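The check can be scripted. The sketch below assumes the status output contains a line of the form "XDP Acceleration: <mode>" and that the agent runs as a DaemonSet named cilium in kube-system; the --verbose flag is added on the assumption that the line is part of the detailed kube-proxy replacement section. The function name xdp_acceleration_mode is hypothetical.

```shell
# Reads `cilium status` output on stdin and prints the value of the
# "XDP Acceleration" line. Exits non-zero if no such line is present.
xdp_acceleration_mode() {
  awk -F': *' '
    /XDP Acceleration/ { sub(/[ \t]+$/, "", $2); print $2; found = 1 }
    END { exit !found }
  '
}

# Usage against a live cluster (skipped when kubectl is not installed):
if command -v kubectl >/dev/null 2>&1; then
  kubectl -n kube-system exec ds/cilium -- cilium status --verbose \
    | xdp_acceleration_mode || echo "XDP Acceleration line not found"
fi
```

A value other than Native suggests that acceleration was not applied, for example because the NIC driver lacks native XDP support.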
Note that packets pushed back from the device at the XDP layer for NodePort processing are not visible in tcpdump, since the packet tap occurs at a later stage of the network stack. You can use Cilium’s monitor command or metric counters to gain visibility.
Performance Improvement
Cilium conducted preliminary benchmarks by deploying a single service on a freshly kubeadm-deployed node with kernel 5.7. The node was first run with kube-proxy in iptables and ipvs modes to establish baselines, and then with Cilium’s eBPF kube-proxy replacement handling the same service, once at the tc layer and once at the XDP layer.

Preliminary results show that Cilium’s kube-proxy replacement with XDP acceleration delivers a massive improvement: it can max out the packet generator and push all 10 million incoming requests per second to the remote service backend. In contrast, with kube-proxy in iptables mode, the node under test could only forward about 2.1 million requests per second for the same service, with the rest being dropped. A similar situation was observed with ipvs: although ipvs offers better “first-packet” scalability than iptables for large numbers of services, its per-packet cost appears slightly higher. Replacing kube-proxy with Cilium’s tc eBPF implementation not only solves the “first-packet” scalability issue but also improves performance, as evidenced by approximately 3.6 million requests per second from that node. However, this still cannot compare with the significant gains achieved when Cilium accelerates at the XDP layer.

Comparing the flame graphs of kube-proxy and Cilium’s XDP implementation at 10 million requests per second also reveals the shortcut for accelerated service processing in the driver’s poll routine. Furthermore, compared to Cilium running eBPF under tc and kube-proxy in iptables and ipvs modes, XDP-accelerated forwarding requires significantly less processing overhead in the softirq context. The following test was run on an otherwise idle system where the node’s CPU was used solely for processing softirq. The graph shows the remaining available CPU capacity. As shown, even at a low rate of approximately 1 million requests per second on a given node, the CPU spends only about 13% of its time processing softirq context for XDP, leaving 87% of remaining capacity available for other purposes. In contrast, with kube-proxy, the CPU spends at least 60% of its time servicing the softirq context, with at most 40% remaining available capacity. At approximately 2 million or 4 million requests per second, the kube-proxy situation worsens further, with only 1-2% idle share as the CPU spends 98% of its time processing packets in the softirq context:

In short, using Cilium to accelerate Kubernetes service processing at the XDP layer dramatically improves the performance of pushing packets to remote backends while significantly reducing CPU overhead. Under the default external traffic policy (externalTrafficPolicy: Cluster), this also increases the overall cluster capacity, because scaling a service to more backends is limited only by the forwarding capacity of a single node to those backends. And even if a Kubernetes deployment does not need to handle that many packets, the CPU cycles saved are freed up for actual user workloads.
Summary
This article continues tuning Cilium by enabling XDP acceleration so that inbound requests such as NodePort can be processed directly from the network driver layer. The specific benefits are:
- Dramatically improved performance for pushing packets to remote backends
- Significantly reduced CPU overhead
- Increased overall cluster capacity
At this point, the following performance tuning items have been validated in practice:
- ✔️ Enable Native Routing
- ✔️ Fully replace KubeProxy
- ✔️ Switch IP Address Masquerading to eBPF-based mode
- ✔️ Run Kubernetes NodePort implementation in DSR (Direct Server Return) mode
- ✔️ Bypass iptables Connection Tracking
- ✔️ Switch Host Routing to BPF-based mode (requires Linux Kernel >= 5.10)
- ❌ Enable IPv6 BIG TCP (requires Linux Kernel >= 5.19, supported NICs: mlx4, mlx5)
  - Could not be validated due to lack of supported NICs
- ❌ Change MTU to jumbo frames (requires network conditions to allow it)
  - Could not be validated due to network conditions not permitting it
- ✔️ Enable Bandwidth Manager (requires Kernel >= 5.1)
- ✔️ Enable BBR congestion control for Pods (requires Kernel >= 5.18)
- ✔️ Enable XDP Acceleration (requires native XDP driver support)
Cilium Performance Tuning Summary
At this point, we have completed the major Cilium performance optimization items.
Cilium tuning can be divided into the following major dimensions:
- Cilium tuning
- Underlying network tuning
- Linux Kernel optimization and upgrade
- Other dimensions of tuning
Cilium Tuning
Cilium tuning includes:
- Enable Native Routing
- Fully replace KubeProxy
- Switch IP Address Masquerading to eBPF-based mode
- Run Kubernetes NodePort implementation in DSR (Direct Server Return) mode
- Bypass iptables Connection Tracking
- Switch Host Routing to BPF-based mode (requires Linux Kernel >= 5.10)
- Enable IPv6 BIG TCP (requires Linux Kernel >= 5.19)
- Disable Hubble (not recommended — observability is more important than a marginal performance gain)
- Enable Bandwidth Manager (requires Kernel >= 5.1)
- Enable BBR congestion control for Pods (requires Kernel >= 5.18)
- Enable XDP Acceleration (requires native XDP driver support)
- (Optional for advanced users) Adjust eBPF Map Size
Underlying Network Tuning
Underlying network tuning includes:
- Change MTU to jumbo frames (requires network conditions to allow it)
Linux Kernel Optimization and Upgrade
Linux Kernel optimization and upgrade includes:
- CONFIG_PREEMPT_NONE=y
Other Dimensions of Tuning
Other dimensions of tuning include:
- tuned network-* profiles, e.g.: tuned-adm profile network-latency or network-throughput
- Set CPU to performance mode
- Stop irqbalance and pin NIC interrupts to specific CPUs
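The host-level items above can be sketched as follows. This is an illustration under assumptions, not a tested recipe: the interface name eth0 and the target CPU 2 are examples, the helper nic_irqs is hypothetical, and it assumes the NIC's interrupts are labeled with the interface name in /proc/interrupts, which not every driver does. The root-only commands are left as comments so the block is safe to paste.

```shell
# Reads /proc/interrupts-formatted text on stdin and prints the IRQ numbers
# whose last column (the interrupt name) starts with the given interface name.
nic_irqs() {
  awk -v dev="$1" 'index($NF, dev) == 1 { sub(/:$/, "", $1); print $1 }'
}

# Root-only steps, shown as comments:
#   tuned-adm profile network-latency          # or: network-throughput
#   cpupower frequency-set -g performance      # CPU governor to performance
#   systemctl stop irqbalance                  # stop automatic IRQ balancing
#   for irq in $(nic_irqs eth0 < /proc/interrupts); do
#     echo 2 > "/proc/irq/$irq/smp_affinity_list"   # pin this IRQ to CPU 2
#   done

# Harmless preview on the current machine (Linux only):
if [ -r /proc/interrupts ]; then
  nic_irqs eth0 < /proc/interrupts
fi
```

Which CPUs to pin to depends on the NUMA layout and on where the workloads run, so the single CPU used here should be treated as a placeholder.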
Cilium “Ultimate” Optimization Configuration
Based on personal experience, the recommended Cilium “performance mode” configuration is as follows:
First, use Kernel >= 5.10. This is the minimum kernel version that enables the critically important “BPF-based Host Routing” feature, and it supports most Cilium features, as shown below:
| Cilium Feature | Minimum Kernel Version |
|---|---|
| Bandwidth Manager | >= 5.1 |
| Egress Gateway | >= 5.2 |
| VXLAN Tunnel Endpoint (VTEP) Integration | >= 5.2 |
| WireGuard Transparent Encryption | >= 5.6 |
| Full support for Session Affinity | >= 5.7 |
| BPF-based proxy redirect | >= 5.7 |
| Socket-level LB bypass in pod netns | >= 5.7 |
| L3 Devices | >= 5.8 |
| BPF-based Host Routing | >= 5.10 |
| BBR Congestion Control for Pods | >= 5.18 |
| IPv6 BIG TCP Support | >= 5.19 |
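The table above can be turned into a small lookup that lists which features a given kernel is new enough for. This is a sketch only: the function name cilium_features_for_kernel is hypothetical, the data is copied from the table, and it compares major and minor versions only.

```shell
# Prints the features from the table whose minimum kernel version is met by
# the given release string (e.g. "5.10.0-8-amd64"). Only major.minor is compared.
cilium_features_for_kernel() {
  kmajor=${1%%.*}
  krest=${1#*.}
  kminor=${krest%%[!0-9]*}
  while read -r fmajor fminor feature; do
    if [ "$kmajor" -gt "$fmajor" ] ||
       { [ "$kmajor" -eq "$fmajor" ] && [ "$kminor" -ge "$fminor" ]; }; then
      echo "$feature"
    fi
  done <<'EOF'
5 1 Bandwidth Manager
5 2 Egress Gateway
5 2 VXLAN Tunnel Endpoint (VTEP) Integration
5 6 WireGuard Transparent Encryption
5 7 Full support for Session Affinity
5 7 BPF-based proxy redirect
5 7 Socket-level LB bypass in pod netns
5 8 L3 Devices
5 10 BPF-based Host Routing
5 18 BBR Congestion Control for Pods
5 19 IPv6 BIG TCP Support
EOF
}

# Usage: list what the current node's kernel is new enough for.
cilium_features_for_kernel "$(uname -r)"
```

Meeting the kernel version is necessary but not sufficient; some features also depend on NIC or driver support, as noted elsewhere in this article.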
The recommended Cilium configuration and features include:
- Disable tunneling, disable encryption
- Enable Native Routing
- Fully replace KubeProxy
- Switch IP Address Masquerading to eBPF-based mode
- Run Kubernetes NodePort implementation in DSR (Direct Server Return) mode
- Switch Host Routing to BPF-based mode (requires Linux Kernel >= 5.10)
- Enable Bandwidth Manager (requires Kernel >= 5.1)
- Enable XDP Acceleration (requires native XDP driver support, but most 10G/40G NICs, including virtual NICs and cloud providers, already support it)
Bypassing iptables Connection Tracking is optional: once “BPF-based Host Routing” is enabled, setting this option is unnecessary.
Enabling IPv6 BIG TCP is not recommended — on one hand, it requires a high kernel version (Linux Kernel >= 5.19); on the other hand, IPv6 adoption in Kubernetes is not yet widespread.
Disabling Hubble for performance gains is also not recommended, because observability is more important than a marginal performance improvement.
Enabling BBR congestion control for Pods is not recommended either, due to the high kernel requirement (Kernel >= 5.18). Enable it as needed if conditions permit.
The final installation command sets the Helm values for each of the recommended options listed above.
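One way the recommended options might be combined is sketched below. It is not the author's exact command. It assumes the Helm value names used by Cilium 1.13.x (the version of the documentation cited in the references); later releases rename some of them, for example tunnel and the strict value of kubeProxyReplacement. API_SERVER_IP, API_SERVER_PORT, and POD_CIDR are placeholders, and autoDirectNodeRoutes with ipv4NativeRoutingCIDR is only one of several ways to set up native routing. The function name is hypothetical.

```shell
# Prints a candidate install command for review; nothing touches the cluster
# unless the output is piped to `sh`.
# Assumptions: Cilium 1.13.x value names; export API_SERVER_IP, API_SERVER_PORT
# and POD_CIDR before applying. BPF-based host routing has no switch of its
# own here; it is expected to be used when the kernel is >= 5.10 and eBPF
# masquerading plus kube-proxy replacement are on.
cilium_performance_install_cmd() {
  cat <<'EOF'
helm install cilium cilium/cilium --version 1.13.4 \
  --namespace kube-system \
  --set tunnel=disabled \
  --set autoDirectNodeRoutes=true \
  --set ipv4NativeRoutingCIDR="${POD_CIDR}" \
  --set kubeProxyReplacement=strict \
  --set k8sServiceHost="${API_SERVER_IP}" \
  --set k8sServicePort="${API_SERVER_PORT}" \
  --set bpf.masquerade=true \
  --set loadBalancer.mode=hybrid \
  --set loadBalancer.acceleration=native \
  --set bandwidthManager.enabled=true
EOF
}

cilium_performance_install_cmd
```

Swap hybrid for dsr in loadBalancer.mode if pure DSR is wanted, and replace the native routing values with whatever matches the actual network.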
│ 🐾Warning
│
│ 1. Native Routing requires additional helm parameters — please select and add them according to your actual environment.
│ 2. Choose loadBalancer.mode between DSR and hybrid based on your actual requirements. (Default is SNAT mode)
🎉🎉🎉
📚️References
- LoadBalancer & NodePort XDP Acceleration - Kubernetes Without kube-proxy — Cilium 1.13.4 documentation
- Cilium 1.8: XDP Load Balancing, Cluster-wide Flow Visibility, Host Network Policy, Native GKE & Azure modes, Session Affinity, CRD-mode Scalability, Policy Audit mode, …
- Tuning Guide — Cilium 1.13.4 documentation