I expected the cheaper model to be cheaper
I tested Claude Haiku against Gemini 2.5 Flash.
Flash has a lower price per token. I expected it to be cheaper. It cost 8.6 times more.
Flash is a thinking model. Before it answers a prompt, it spends tokens on reasoning. Reasoning costs money. On the same prompt, Haiku used 4 tokens. Flash used 28.
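A rough way to see why a lower per-token price is not enough: with 4 tokens against 28, the thinking model's price has to be under one seventh of the other model's just to break even on those tokens. The sketch below is only an illustration. The function name is made up for this post, and it ignores input-token pricing, so it will not reproduce the 8.6x figure.

```python
from fractions import Fraction


def break_even_price_ratio(baseline_tokens: int, thinking_tokens: int) -> Fraction:
    """Highest price ratio (thinking model / baseline model) at which the
    thinking model is still not more expensive, counting only these tokens.

    Hypothetical helper for illustration; not part of the gateway.
    """
    if baseline_tokens <= 0 or thinking_tokens <= 0:
        raise ValueError("token counts must be positive")
    return Fraction(baseline_tokens, thinking_tokens)


# Token counts from the test above: Haiku used 4, Flash used 28.
ratio = break_even_price_ratio(4, 28)
print(ratio)  # 1/7
```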
I only saw this because I track every call. I log tokens, cost, and latency to Postgres. I do this because I spent years building real-time payment systems.
In payments, a rounding error is a crisis.
I spent two years building cross-border payments at NPCI. This year, I built an LLM gateway. I used the same tools.
AI infrastructure looks new. It is actually standard systems work.
An LLM API is a downstream dependency. It is slow. It goes down. It has rate limits. It bills you. You have dealt with dependencies like this before: payment processors, banks.
The model is not magic. It is just a new, expensive, and flaky dependency. The hard problems remain the same:
- Reliability
- Cost control
- Failover
I built circuit breakers for my gateway. I built these same breakers at NPCI. When a partner bank fails, you do not keep hitting it. You trip the breaker, fail fast, and wait for recovery. The logic is identical whether the partner is a bank or an AI provider.
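The trip, fail fast, wait for recovery cycle can be sketched in a few lines of Python. This is a minimal illustration, not the code from the gateway. The class name, the thresholds, and the single-threaded design are all assumptions made for the example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is rejected untried."""


class CircuitBreaker:
    """Minimal single-threaded breaker (illustrative; no locking)."""

    def __init__(self, failure_threshold=3, recovery_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed default
        self.recovery_seconds = recovery_seconds    # assumed default
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.recovery_seconds:
                # Still inside the recovery window: fail fast, do not hit the provider.
                raise CircuitOpenError("circuit is open; failing fast")
            # Recovery window elapsed: let one trial call through (half-open).
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrap each provider call as `breaker.call(send_request, payload)`, where `send_request` stands in for whatever function talks to the bank or the model provider. A real deployment would want one breaker per provider and a lock around the state.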
I also needed to meter every call. This requires an audit log. In payments, you never use floats for money. You use fixed-precision numbers. I use NUMERIC for cost logs in my gateway. You do not approximate spend.
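On the Python side, the same rule means computing cost with `decimal.Decimal` before the value reaches a NUMERIC column. The sketch below is an assumption about how that could look, not the gateway's actual code. The function name and the prices are placeholders, not real provider rates.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def _exact(value) -> Decimal:
    """Convert to Decimal, refusing floats so binary rounding never sneaks in."""
    if isinstance(value, float):
        raise TypeError("pass prices as str or Decimal, not float")
    return Decimal(value)


def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m, output_price_per_m) -> Decimal:
    """Cost of one call; prices are per million tokens."""
    input_cost = _exact(input_tokens) * _exact(input_price_per_m) / TOKENS_PER_MILLION
    output_cost = _exact(output_tokens) * _exact(output_price_per_m) / TOKENS_PER_MILLION
    return input_cost + output_cost


# Placeholder prices for illustration only.
cost = call_cost(10, 4, "0.50", "2.00")
print(cost == Decimal("0.000013"))  # True

# Why floats are out: binary floating point cannot represent 0.1 exactly.
print(0.1 + 0.2 == 0.3)                                   # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```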
The logic for retries is also muscle memory. A bad retry in payments causes a double debit. In AI, a bad retry wastes tokens. The stakes changed, but the problem did not.
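A retry that does not waste tokens is bounded, backs off, and only fires on failures that are worth retrying. Here is a minimal sketch under those assumptions. `TransientError` and `call_with_retry` are hypothetical names, and the attempt limit and delays are arbitrary.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, rate limits)."""


def call_with_retry(fn, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying only on TransientError, at most max_attempts times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # budget exhausted: surface the failure
            # Exponential backoff with full jitter: 0..base, 0..2*base, ...
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
```

Any other exception, such as a rejected request, propagates on the first attempt. Retrying it would only spend more tokens on the same failure.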
AI does introduce some new challenges:
- Token economics
- Non-deterministic outputs
- Models spending your budget on reasoning
If you are a backend engineer, your skills transfer to AI. Anyone can call an API. Few people can make that call reliable, cheap, and observable.
That is not just AI expertise. That is core engineering.
The gateway is live: https://llm-gateway-python.onrender.com
The code is on GitHub: https://github.com/Yogesh23012001/llm-gateway-python
Full post: https://dev.to/yogesh23012001/i-expected-the-cheaper-model-to-be-cheaper-it-cost-86x-more-5cph
Optional learning community: https://t.me/GyaanSetuAi