From Spikes to Savings: Kubernetes Cost Optimization
Our AWS bill jumped 34% in one quarter, even though the product roadmap had not changed. The cause was simple: our Kubernetes clusters were reserving far more capacity than our services used.
Engineers often guess how much CPU and memory a service needs, and they round up to be safe. This creates phantom capacity: resources you pay for that your applications never use.
Here is how we fixed it and cut our monthly compute costs by 34%.
The Core Problem: Requests vs Limits
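Requests are what you reserve. The Kubernetes scheduler uses the request to place your pod on a node with enough unreserved capacity, and that capacity stays set aside whether or not the pod uses it. Requests therefore determine how many nodes you need, which is why they drive your bill.
Limits are the ceiling. If a container hits its CPU limit, it is throttled and slows down. If it exceeds its memory limit, the kernel kills it (OOMKilled).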
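To make the CPU case concrete: on Linux, the kubelet enforces a CPU limit through the kernel's CFS bandwidth controller, which grants a container a quota of CPU time per scheduling period (100 ms by default). The sketch below is a simplified model, assuming every busy thread runs flat out on its own core and that CFS burst is off; the function names are hypothetical. It shows why a multi-threaded service can be paused for most of a period during a burst.

```python
"""Simplified model of how a Kubernetes CPU limit throttles a container.

Assumptions: every busy thread runs flat out on its own core, and CFS burst
is disabled. Function names are illustrative, not a Kubernetes API.
"""

DEFAULT_CFS_PERIOD_US = 100_000  # default CFS period: 100 ms


def cfs_quota_us(limit_millicores: int, period_us: int = DEFAULT_CFS_PERIOD_US) -> int:
    """CPU time (in microseconds) the container may consume per CFS period."""
    return limit_millicores * period_us // 1000


def throttled_us_per_period(
    limit_millicores: int,
    busy_threads: int,
    period_us: int = DEFAULT_CFS_PERIOD_US,
) -> int:
    """Wall-clock time per period the container spends paused by throttling."""
    if busy_threads < 1:
        raise ValueError("busy_threads must be at least 1")
    quota = cfs_quota_us(limit_millicores, period_us)
    # N parallel threads drain the quota N times faster than wall-clock time.
    time_to_exhaust_us = quota / busy_threads
    return max(0, round(period_us - time_to_exhaust_us))


if __name__ == "__main__":
    print(cfs_quota_us(500))                 # 50000: 50 ms of CPU per 100 ms
    print(throttled_us_per_period(500, 4))   # 87500: 4 threads spend the quota in 12.5 ms
    print(throttled_us_per_period(2000, 1))  # 0: one thread never hits a 2-CPU limit
```

With a 500m limit, four busy threads use up the whole quota in 12.5 ms and then wait 87.5 ms for the next period. That stall is the latency cost of tight CPU limits.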
Many teams set requests equal to limits, with both sized for peak load. This means you pay for peak capacity 24/7, even when your service is idle.
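The scheduler places pods by their requests. A rough lower bound on node count is therefore the summed requests divided by one node's allocatable capacity, taken for whichever resource runs out first. The sketch below compares peak-sized requests against right-sized requests for one hypothetical service. The workload, node size, and numbers are illustrative assumptions.

```python
"""Lower bound on node count implied by pod requests (hypothetical numbers)."""
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    replicas: int
    cpu_request_m: int    # per-replica CPU request, millicores
    mem_request_mib: int  # per-replica memory request, MiB


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return -(-a // b)


def min_nodes(workloads: list[Workload], node_cpu_m: int, node_mem_mib: int) -> int:
    """Nodes needed if requests could be packed perfectly onto identical nodes.

    Real clusters need more: bin-packing fragmentation, DaemonSets and
    system reservations all eat into each node's allocatable capacity.
    """
    total_cpu = sum(w.replicas * w.cpu_request_m for w in workloads)
    total_mem = sum(w.replicas * w.mem_request_mib for w in workloads)
    return max(ceil_div(total_cpu, node_cpu_m), ceil_div(total_mem, node_mem_mib))


if __name__ == "__main__":
    # Requests set equal to peak-sized limits.
    peak = [Workload("api", replicas=30, cpu_request_m=2000, mem_request_mib=4096)]
    # Same service with requests right-sized to observed usage.
    right_sized = [Workload("api", replicas=30, cpu_request_m=500, mem_request_mib=2048)]
    # Hypothetical node with 8 vCPU and 32 GiB allocatable.
    print(min_nodes(peak, node_cpu_m=8000, node_mem_mib=32768))         # 8
    print(min_nodes(right_sized, node_cpu_m=8000, node_mem_mib=32768))  # 2
```

In this example, memory would need only 4 nodes even at peak sizing, so the inflated CPU request alone decides the node count.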
Our Strategy for Savings
- Measure before you act: Use Prometheus and Grafana to see actual usage.
- Use percentiles: Look at p95 usage over 4 weeks. Do not use averages. Averages hide spikes.
- Right-size requests: Set requests at p95 usage plus a 20% buffer.
- Manage CPU limits: Avoid tight CPU limits on latency-sensitive services to prevent throttling.
- Automate scaling: Use the Horizontal Pod Autoscaler (HPA) for traffic spikes and the Vertical Pod Autoscaler (VPA) to tune per-pod requests.
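The right-sizing rule and the autoscaling step above can be sketched in a few lines. This minimal illustration assumes you have already exported CPU usage samples (for example from Prometheus) as integer millicores, and its function names are hypothetical. It uses a nearest-rank percentile, while Prometheus's `quantile_over_time` interpolates, so results can differ slightly. The HPA function follows the core formula from the Kubernetes documentation, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with the default 0.1 tolerance. It leaves out min/max replicas and stabilization windows.

```python
"""Two sizing rules: p95-plus-buffer requests and the core HPA replica formula.

Illustrative sketch only; function names are hypothetical, not a Kubernetes API.
"""
import math


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def percentile_nearest_rank(samples: list[int], p: int) -> int:
    """Nearest-rank percentile of a non-empty sample list, for integer p in 1..100."""
    if not samples or not 1 <= p <= 100:
        raise ValueError("need at least one sample and 1 <= p <= 100")
    ordered = sorted(samples)
    return ordered[ceil_div(p * len(ordered), 100) - 1]


def recommended_request_m(samples_m: list[int], p: int = 95, buffer_pct: int = 20) -> int:
    """CPU request (millicores): the p-th percentile of usage plus buffer_pct percent."""
    return ceil_div(percentile_nearest_rank(samples_m, p) * (100 + buffer_pct), 100)


def hpa_desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,
) -> int:
    """desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric).

    No change when the ratio is within `tolerance` of 1.0. Ignores min/max
    replicas, stabilization windows, and missing or unready pod metrics.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # Hypothetical CPU usage samples (millicores): mostly quiet, two spikes.
    usage = [100] * 18 + [400, 500]
    print(sum(usage) / len(usage))       # 135.0, the average hides the spikes
    print(recommended_request_m(usage))  # 480, i.e. p95 (400m) plus 20%
    print(hpa_desired_replicas(4, current_metric=90, target_metric=60))  # 6
    print(hpa_desired_replicas(4, current_metric=63, target_metric=60))  # 4, within tolerance
```

In the sample data, the mean (135m) sits far below both spikes. The p95-based request (480m) covers all but the largest one, which is the point of preferring percentiles to averages.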
The Results
We reduced our node count from 40 to 26. Average CPU utilization rose from 14% to 52%. Monthly compute costs dropped from $48,200 to $31,900. Latency actually improved by 35%.
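As a quick check, the cost figures (in US dollars) and node counts above work out as follows:

```latex
\frac{48{,}200 - 31{,}900}{48{,}200} = \frac{16{,}300}{48{,}200} \approx 0.338 \approx 34\%,
\qquad
\frac{40 - 26}{40} = \frac{14}{40} = 0.35 = 35\%
```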
Optimization is not a one-time project. It is a habit. If you write a resource request based on a guess, you are wasting money.
Checklist for your cluster:
• Build a dashboard showing requested vs actual usage.
• Set requests based on 4 weeks of data.
• Run VPA in recommendation mode before letting it make changes.
• Review resource specs every quarter.
• Give engineering teams visibility into their own costs.
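The first and last checklist items can feed the same report: requested versus used CPU per team, priced per month. This is a minimal sketch. It assumes you can export each workload's total CPU request and p95 usage (for example from Prometheus) and that a single blended per-core-hour price is good enough. The teams, workload names, $0.04 rate, and `team_report` helper are all hypothetical.

```python
"""Requested-vs-used CPU per team, priced per month (hypothetical data and price)."""
from collections import defaultdict
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation: 8760 hours / 12 months


@dataclass
class WorkloadUsage:
    team: str
    name: str
    requested_cpu_m: int  # CPU request summed over all replicas, millicores
    p95_used_cpu_m: int   # p95 of measured CPU usage over the window, millicores


def monthly_cost(millicores: int, price_per_core_hour: float) -> float:
    """Cost of holding `millicores` of CPU for a month at a flat per-core price."""
    return millicores / 1000 * price_per_core_hour * HOURS_PER_MONTH


def team_report(workloads: list[WorkloadUsage], price_per_core_hour: float) -> dict:
    """Per team: cost of requested CPU, cost of the idle share, p95/request ratio."""
    totals = defaultdict(lambda: {"requested_m": 0, "used_m": 0, "idle_m": 0})
    for w in workloads:
        t = totals[w.team]
        t["requested_m"] += w.requested_cpu_m
        t["used_m"] += w.p95_used_cpu_m
        # Requested but unused CPU; usage above the request is not "idle".
        t["idle_m"] += max(0, w.requested_cpu_m - w.p95_used_cpu_m)
    return {
        team: {
            "requested_cost": monthly_cost(t["requested_m"], price_per_core_hour),
            "idle_cost": monthly_cost(t["idle_m"], price_per_core_hour),
            "p95_utilization": t["used_m"] / t["requested_m"] if t["requested_m"] else 0.0,
        }
        for team, t in totals.items()
    }


if __name__ == "__main__":
    workloads = [
        WorkloadUsage("payments", "checkout-api", requested_cpu_m=8000, p95_used_cpu_m=1200),
        WorkloadUsage("payments", "ledger-worker", requested_cpu_m=4000, p95_used_cpu_m=2800),
        WorkloadUsage("search", "indexer", requested_cpu_m=6000, p95_used_cpu_m=5400),
    ]
    # $0.04 per vCPU-hour is a placeholder; substitute your own blended rate.
    for team, row in team_report(workloads, price_per_core_hour=0.04).items():
        print(
            f"{team}: requested ${row['requested_cost']:.2f}/mo, "
            f"idle ${row['idle_cost']:.2f}/mo, p95/request {row['p95_utilization']:.0%}"
        )
```

Showing each team its idle cost next to its total, rather than a single cluster-wide number, is what makes over-provisioned requests visible to the people who wrote them.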
Source: https://dev.to/samarth_05/from-spikes-to-savings-practical-k8s-cost-optimization-for-2026-75k
