Stop HPA Flapping: Ten Tips to Make Your Pod Autoscaling Rock-Solid
This article was last updated on May 17, 2026.
Kubernetes HPA seems simple enough — just a single kubectl autoscale and you’re done. But once you hit production, you’ll discover how maddening the “flapping” can be — suddenly scaling out to 10 Pods and maxing out resources, then dropping back to 2, causing per-Pod QPS to spike and triggering another scale-up… Eventually your entire cluster’s CPU gets eaten alive by the churn, and your workload still can’t keep up. 😱
Let’s be honest: HPA was never a “set it and forget it” feature. It’s more like a double-edged sword — wield it well and it’s a cost-saving powerhouse; get it wrong and it becomes the root cause of performance oscillation.
Today, I’m sharing 10 core tips distilled from hard-won experience tuning HPA in InsurTech scenarios. Hopefully they’ll save you some pain.
1. Start by Setting targetAverageUtilization and stabilizationWindowSeconds
These two parameters are HPA’s “stabilizers” and must be used together.
- targetAverageUtilization: The target utilization rate. It’s a relative value, not an absolute one: utilization is measured against the Pod’s resource requests, not its limits. Why requests? Because requests are what the scheduler uses to evaluate resources, making the calculation more accurate.
- stabilizationWindowSeconds: The stabilization window. It tells HPA how long to observe metrics before acting on a scale-down (or scale-up) decision. The scale-down default is 5 minutes (300 seconds), which is sufficient for most workloads. For bursty traffic scenarios, you can extend it beyond the default to avoid “tidal flapping.”
│ 📝Notes: Don’t set this window too large — say 10 minutes. If metrics stay low for an extended period, HPA will take too long to scale down, wasting resources.
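To make this concrete, here is a minimal sketch of an HPA manifest that combines both settings. The workload name web-api and the 70% target are illustrative assumptions; note that in the autoscaling/v2 API the utilization target is spelled averageUtilization, and the stabilization window lives under behavior.scaleDown.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api                      # hypothetical workload name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70     # measured against the Pod's requests
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # the 5-minute default, made explicit
```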
2. Don’t Just Watch CPU — Try custom and external Metrics
CPU and memory utilization only reflect container-level “health” — they can’t capture actual business load. For example, if your service is I/O-intensive, CPU utilization may be low while QPS is skyrocketing. In that case, CPU-based HPA is essentially flying blind.
I strongly recommend introducing custom metrics such as QPS, request latency, and message queue depth. These are the real “business signals.”
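As a sketch of what that looks like, here is a QPS-driven HPA. It assumes a custom metrics adapter (e.g., prometheus-adapter) already exposes a per-Pod metric named http_requests_per_second; both the adapter setup and the metric name are assumptions, not givens.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # assumed metric from the adapter
        target:
          type: AverageValue
          averageValue: "100"              # aim for ~100 QPS per Pod
```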
3. Pair Your HPA Pods with a PodDisruptionBudget (PDB)
This one gets overlooked all the time. If you haven’t configured a PDB for critical workloads, a node drain during maintenance or a rolling update can evict all your Pods at once, causing an outright service outage.
Recommended approach: Configure a PDB for every Deployment managed by HPA, ensuring that at least minAvailable Pods are running at all times.
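A minimal PDB sketch, assuming the Deployment’s Pods carry a hypothetical app: web-api label:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-api-pdb
spec:
  minAvailable: 2            # voluntary disruptions may never drop below 2 Pods
  selector:
    matchLabels:
      app: web-api           # must match the Deployment's Pod labels
```

One design note: keep minAvailable below your HPA’s usual replica count, otherwise node drains can block indefinitely.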
4. HPA + Cluster Autoscaler: They Must Be Best Friends
What happens when a single node runs out of resources? HPA can only manage Pod scaling — it can’t provision nodes. Without Cluster Autoscaler (CA), you’ll hit the awkward situation of “scaling out halfway, then running out of node resources.”
Best practice: when HPA scales out and the new Pods can’t be scheduled because no node has room, CA spots the Pending Pods and automatically requests new nodes. Once the nodes are ready, the scheduler places the waiting Pods onto them. But there’s a catch — see tip #6.
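When this handoff stalls, it shows up as Pods stuck in Pending. Two quick checks (the Pod name is a placeholder):

```bash
# List Pods that HPA created but the scheduler could not place
kubectl get pods --field-selector=status.phase=Pending

# Look for "Insufficient cpu" / "Insufficient memory" scheduling events,
# which are exactly what Cluster Autoscaler reacts to
kubectl describe pod <pending-pod-name> | grep -i insufficient
```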
5. Watch Out for Cold Start Latency — Optimize Pod Startup Time
Is your image too large? Startup scripts too complex? Database connection pool pre-allocation consuming too many resources? All of these cause slow Pod startup, and the Pod may not be able to serve traffic properly for a while after starting.
As a result, HPA keeps adding more Pods to fill the gap, leading to “over-scaling.”
My recommendations:
- Use smaller base images (e.g., distroless)
- Use readinessProbe and livenessProbe so HPA knows when a Pod is actually Ready.
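Here is a minimal probe sketch for the Deployment’s Pod template. The /healthz path, port 8080, and timings are assumptions you would tune per service:

```yaml
    # Inside the Deployment's Pod template
    containers:
      - name: web-api
        image: registry.example.com/web-api:1.0   # hypothetical image
        readinessProbe:                 # gates traffic until the Pod can actually serve
          httpGet:
            path: /healthz              # assumed health endpoint
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        livenessProbe:                  # restarts the container if it wedges
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 10
```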
6. Scale-Down Cooldown — Don’t Kill Pods Right After They Start
This is the most common pitfall with HPA + CA. You set a low targetAverageUtilization, the cluster runs out of resources, and CA triggers node provisioning. The node becomes ready a few minutes later, HPA sees the Pods running — but by then the load has already dropped, so HPA immediately starts scaling down…
Result: Pods ran for nothing, nodes were provisioned for nothing. 😑
Solution: Set stabilizationWindowSeconds and scale-down-delay (supported by some HPA implementations) appropriately. Additionally, configure terminationGracePeriodSeconds in your Deployment’s Pod template to give Pods enough time for graceful shutdown and avoid surge effects.
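With the autoscaling/v2 behavior field, you can encode both the window and a rate limit on scale-down in one place. A sketch, with values you would tune to your own traffic:

```yaml
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes of low metrics before shrinking
      policies:
        - type: Pods
          value: 1                      # then remove at most 1 Pod...
          periodSeconds: 120            # ...every 2 minutes
```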
7. Monitor HPA Status — Don’t Rely on Gut Feeling
I’ve seen too many people configure HPA and never check its status. The Conditions field in kubectl describe hpa tells you whether HPA can fetch metrics, whether it can compute a target from them, and whether it’s currently allowed to scale. If you see FailedGetResourceMetric or FailedGetCustomMetric, investigate immediately.
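Two commands worth wiring into your routine (the HPA name web-api is a placeholder):

```bash
# Inspect the Conditions section: AbleToScale, ScalingActive, ScalingLimited
kubectl describe hpa web-api

# Surface HPA-related events, including FailedGetResourceMetric
kubectl get events --field-selector involvedObject.kind=HorizontalPodAutoscaler
```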
8. Multi-Metric Coordination Strategy
HPA v2 supports configuring multiple metrics simultaneously (e.g., CPU + QPS). Under the hood, HPA computes a desired replica count for each metric and takes the highest. The practical effect: any single metric exceeding its threshold triggers a scale-up (OR logic), while scale-down happens only when every metric agrees the replica count can drop (AND logic).
This design makes perfect sense: scale up aggressively, scale down conservatively.
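A sketch of a combined CPU + QPS configuration, reusing the assumed http_requests_per_second metric from tip #2:

```yaml
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
```

Whichever metric demands more replicas wins; neither can drag the count down on its own.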
9. Set Pod Topology Spread Constraints for HPA-Managed Pods
Prevent all newly scaled Pods from being scheduled onto the same node. Combined with CA, this avoids a single node failure taking down your entire service. I recommend using topologySpreadConstraints for this.
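A sketch for the Pod template, reusing the hypothetical app: web-api label:

```yaml
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname   # spread across nodes
          whenUnsatisfiable: ScheduleAnyway     # soft rule; DoNotSchedule makes it hard
          labelSelector:
            matchLabels:
              app: web-api
```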
10. Test Your HPA — Load Test with kubectl run
Don’t wait until production breaks to verify your configuration.
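A minimal load-generator sketch, assuming your workload is reachable in-cluster through a Service named web-api (a hypothetical name):

```bash
# Spin up a throwaway Pod that hammers the Service in a tight loop
kubectl run load-generator --rm -i --tty --image=busybox --restart=Never \
  -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://web-api; done"
```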
By watching kubectl get hpa -w, you can see in real time how Pod count changes with load.
Summary
HPA is no silver bullet, but it is the core of Kubernetes elastic scaling. Every one of these 10 tips you put into practice brings you a step closer to stable, efficient, cost-effective operations.
One last thing I want to emphasize: invest in observability. Without comprehensive logs and metrics, you’ll never truly understand what HPA is doing. This may be even more important than configuring HPA itself.
🎉🎉🎉
│ Technology iteration isn’t elimination — it’s evolution. Keep learning, keep optimizing.