From Spikes to Savings: Kubernetes Cost Optimization

Our AWS bill jumped 34% in one quarter. The product roadmap showed no changes. The cause was simple: our Kubernetes clusters were wasting money.

Engineers often guess how much CPU and memory a service needs. They round up to be safe. This creates phantom capacity. You pay for resources that your applications never use.

Here is how we fixed it and saved 34% on monthly costs.

The Core Problem: Requests vs Limits

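Requests are what Kubernetes reserves for your pod. The scheduler uses this number to place the pod on a node, and a node counts as full once the requests on it add up to its capacity, whether or not the pods use what they reserved. That is why requests, not actual usage, drive your bill.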

Limits are the ceiling. If a container hits its CPU limit, it is throttled and slows down. If it exceeds its memory limit, it is OOM-killed.

Many teams set requests equal to limits. This means you pay for peak capacity 24/7, even when your service is idle.
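
To see why requests rather than usage set the bill, here is a minimal Python sketch. The workload names, replica counts, and the 4-vCPU node size are hypothetical, and the sketch ignores memory and bin-packing losses.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    replicas: int
    cpu_request_m: int  # millicores requested per pod
    cpu_used_m: int     # millicores a pod typically uses


def nodes_needed(total_m: int, node_allocatable_m: int) -> int:
    """Nodes required to hold a CPU total, ignoring bin-packing losses."""
    return math.ceil(total_m / node_allocatable_m)


# Hypothetical numbers for illustration only.
NODE_ALLOCATABLE_M = 4000  # one 4-vCPU node
workloads = [
    Workload("api", replicas=10, cpu_request_m=2000, cpu_used_m=300),
    Workload("worker", replicas=6, cpu_request_m=1000, cpu_used_m=250),
]

requested_m = sum(w.replicas * w.cpu_request_m for w in workloads)
used_m = sum(w.replicas * w.cpu_used_m for w in workloads)

nodes_by_request = nodes_needed(requested_m, NODE_ALLOCATABLE_M)
nodes_by_usage = nodes_needed(used_m, NODE_ALLOCATABLE_M)

print(f"requested: {requested_m}m -> {nodes_by_request} nodes")
print(f"used:      {used_m}m -> {nodes_by_usage} nodes")
print(f"phantom capacity: {requested_m - used_m}m")
```

With these made-up figures the scheduler needs 7 nodes to satisfy requests, while the actual load would fit on 2.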

Our Strategy for Savings

  • Measure before you act: Use Prometheus and Grafana to see actual usage.
  • Use percentiles: Look at p95 usage over 4 weeks. Do not use averages. Averages hide spikes.
  • Right-size requests: Set requests at p95 usage plus a 20% buffer.
  • Manage CPU limits: Avoid tight CPU limits on latency-sensitive services to prevent throttling.
  • Automate scaling: Use HPA for traffic spikes and VPA to tune individual pods.
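
The percentile and right-sizing steps above can be sketched in a few lines of Python. This is an illustration, not our production tooling: the function names, the nearest-rank percentile method, the rounding to 10m, and the sample data are all assumptions. In practice the samples would come from Prometheus.

```python
from statistics import mean


def percentile(samples: list[int], pct: int) -> int:
    """Nearest-rank percentile: the smallest sample with at least
    pct% of all samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceil(pct * n / 100)
    return ordered[max(rank, 1) - 1]


def recommend_request_m(samples_m: list[int], pct: int = 95,
                        buffer_pct: int = 20, round_to_m: int = 10) -> int:
    """Percentile usage plus a buffer, rounded up to a multiple of round_to_m.
    Integer math avoids floating-point surprises when rounding up."""
    scaled = percentile(samples_m, pct) * (100 + buffer_pct)
    return -(-scaled // (100 * round_to_m)) * round_to_m


# Hypothetical CPU samples in millicores: mostly idle, with some spikes.
usage_m = [200] * 90 + [400] * 8 + [900] * 2

avg_m = mean(usage_m)
p95_m = percentile(usage_m, 95)
request_m = recommend_request_m(usage_m)

print(f"average: {avg_m}m, p95: {p95_m}m, suggested request: {request_m}m")
```

On this sample the average is 230m, well below the busy-period usage that p95 captures at 400m. The suggested request is 480m.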

The Results

We reduced our node count from 40 to 26. Average CPU utilization rose from 14% to 52%. Monthly compute costs dropped from $48,200 to $31,900. Latency actually improved by 35%.
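
The percentages behind these figures are easy to recompute. A short Python check, using only the numbers quoted above (the annualized figure is simply the monthly saving times 12 and assumes costs stay flat):

```python
def pct_change(before: float, after: float) -> float:
    """Percentage change from before to after (negative means a drop)."""
    return (after - before) * 100 / before


node_change = pct_change(40, 26)
cost_change = pct_change(48_200, 31_900)
monthly_saving = 48_200 - 31_900
annual_saving = monthly_saving * 12  # assumes the monthly saving holds all year

print(f"nodes: {node_change:.1f}%")
print(f"compute cost: {cost_change:.1f}%")
print(f"saving: ${monthly_saving:,}/month, ${annual_saving:,}/year")
```

The cost drop works out to about 33.8%, which is the 34% saving quoted at the top.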

Optimization is not a one-time project. It is a habit. If you write a resource request based on a guess, you are wasting money.

Checklist for your cluster:

  • Build a dashboard showing requested vs actual usage.
  • Set requests based on 4 weeks of data.
  • Run VPA in recommendation mode before letting it make changes.
  • Review resource specs every quarter.
  • Give engineering teams visibility into their own costs.
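
For the first checklist item, here is one possible starting point for the dashboard queries, written as a small Python helper that builds PromQL strings. It assumes Prometheus scrapes kube-state-metrics (for kube_pod_container_resource_requests) and cAdvisor (for container_cpu_usage_seconds_total). The helper name is hypothetical, and label names may differ in your setup.

```python
def requested_vs_used_queries(window: str = "5m") -> dict[str, str]:
    """Build PromQL for CPU cores requested vs used, grouped by namespace."""
    # Assumes kube-state-metrics v2 metric and label names.
    requested = (
        'sum by (namespace) '
        '(kube_pod_container_resource_requests{resource="cpu"})'
    )
    # Assumes cAdvisor metrics; container!="" skips pod-level aggregate series.
    used = (
        'sum by (namespace) '
        f'(rate(container_cpu_usage_seconds_total{{container!=""}}[{window}]))'
    )
    return {
        "requested_cores": requested,
        "used_cores": used,
        "utilization_ratio": f"({used}) / ({requested})",
    }


queries = requested_vs_used_queries()
for panel, promql in queries.items():
    print(f"{panel}:\n  {promql}")
```

Each string can be pasted into a Grafana panel. A utilization ratio far below 1 for a namespace points at requests worth reviewing.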

Source: https://dev.to/samarth_05/from-spikes-to-savings-practical-k8s-cost-optimization-for-2026-75k
