๐—œ ๐—˜๐˜…๐—ฝ๐—ฒ๐—ฐ๐˜๐—ฒ๐—ฑ ๐˜๐—ต๐—ฒ ๐—ฐ๐—ต๐—ฒ๐—ฎ๐—ฝ๐—ฒ๐—ฟ ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น ๐˜๐—ผ ๐—ฏ๐—ฒ ๐—ฐ๐—ต๐—ฒ๐—ฎ๐—ฝ๐—ฒ๐—ฟ

I tested Claude Haiku against Gemini 2.5 Flash.

Flash has a lower price per token. I expected it to be cheaper. It cost 8.6 times more.

Flash is a thinking model. Before it answers a prompt, it spends tokens on reasoning, and those tokens cost money. Haiku used 4 tokens. Flash used 28.
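
Here is a rough sketch of the arithmetic. The prices and token counts are made-up placeholders, not the real rates for either model, and it assumes reasoning tokens are billed at the output rate. The function name call_cost is hypothetical.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

def call_cost(prompt_tokens, visible_tokens, reasoning_tokens,
              in_price_per_m, out_price_per_m):
    """Dollar cost of one call. Prices are per million tokens.

    Assumption: hidden reasoning tokens are billed like output tokens.
    """
    billed_output = visible_tokens + reasoning_tokens
    return (prompt_tokens * in_price_per_m
            + billed_output * out_price_per_m) / MILLION

# Placeholder numbers only: a pricier model with no reasoning step...
plain = call_cost(20, 4, 0, Decimal("1.00"), Decimal("5.00"))
# ...versus a cheaper-per-token model that reasons before answering.
thinking = call_cost(20, 4, 200, Decimal("0.30"), Decimal("2.50"))

print(thinking > plain)  # True
```

The per-token price is only half of the bill. The other half is how many tokens the model decides to spend.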

I only saw this because I track every call. I log tokens, cost, and latency to Postgres. I do this because I spent years building real-time payment systems.

In payments, a rounding error is a crisis.

I spent two years building cross-border payments at NPCI. This year, I built an LLM gateway. I used the same tools.

AI infrastructure looks new. It is actually standard systems work.

An LLM API is a downstream dependency. It is slow. It goes down. It has rate limits. It bills you. You have dealt with dependencies like this before: payment processors and banks.

The model is not magic. It is just a new, expensive, and flaky dependency. The hard problems remain the same:

I built circuit breakers for my gateway. I built these same breakers at NPCI. When a partner bank fails, you do not keep hitting it. You trip the breaker, fail fast, and wait for recovery. The logic is identical whether the partner is a bank or an AI provider.
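
A minimal sketch of that pattern, not the gateway's actual code. The class name, the thresholds, and the injectable clock are assumptions made for illustration.

```python
import time

class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""

class CircuitBreaker:
    def __init__(self, failure_threshold=3, recovery_seconds=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self.clock = clock          # injectable so tests can fake time
        self.failures = 0           # consecutive failures while closed
        self.opened_at = None       # None means the breaker is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.recovery_seconds:
                # Still inside the recovery window: fail fast.
                raise CircuitOpenError("provider circuit is open")
            # Window elapsed: half-open, let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if (self.opened_at is not None
                    or self.failures >= self.failure_threshold):
                # Trip the breaker, or re-trip after a failed trial call.
                self.opened_at = self.clock()
            raise
        # Success closes the breaker and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrap each provider call as breaker.call(send_request, payload), where send_request stands in for whatever function talks to the bank or the model.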

I also needed to meter every call. This requires an audit log. In payments, you never use floats for money. You use fixed-precision numbers. I use NUMERIC for cost logs in my gateway. You do not approximate spend.
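
A small sketch of why. The table below is hypothetical (the gateway's real schema may differ), and the 0.1 per-call cost is a placeholder. In Python, Decimal is the natural counterpart to a NUMERIC column.

```python
from decimal import Decimal

# Hypothetical audit table. Column names and precision are assumptions.
CREATE_CALL_LOG = """
CREATE TABLE IF NOT EXISTS call_log (
    id           BIGSERIAL PRIMARY KEY,
    model        TEXT NOT NULL,
    total_tokens INTEGER NOT NULL,
    latency_ms   INTEGER NOT NULL,
    cost_usd     NUMERIC(12, 8) NOT NULL
);
"""

# Ten calls at a placeholder cost of 0.1 each, summed two ways.
float_total = sum(0.1 for _ in range(10))
exact_total = sum((Decimal("0.1") for _ in range(10)), Decimal("0"))

print(float_total == 1.0)             # False: binary floats drift
print(exact_total == Decimal("1.0"))  # True: fixed precision stays exact
```

The drift is tiny on ten calls. On millions of logged calls, the books stop matching the invoice.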

The logic for retries is also muscle memory. A bad retry in payments causes a double debit. In AI, a bad retry wastes tokens. The scale changed, but the problem did not.
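
One way to keep retries from burning tokens is to cap attempts, back off between them, and retry only failures that are worth retrying. This is a sketch under those assumptions. RetryableError and call_with_retries are hypothetical names, not part of any provider SDK.

```python
import random
import time

class RetryableError(Exception):
    """Hypothetical marker for transient failures (timeouts, 429s, 5xx)."""

def call_with_retries(fn, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn, retrying only RetryableError, with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts:
                raise  # budget spent: surface the failure
            delay = base_delay * 2 ** (attempt - 1)
            # Jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay))
```

Any other exception, such as a bad request, propagates on the first attempt. Retrying it would only pay for the same failure again.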

AI introduces new challenges:

If you are a backend engineer, your skills transfer to AI. Anyone can call an API. Few people can make that call reliable, cheap, and observable.

That is not just AI expertise. That is core engineering.

The gateway is live: https://llm-gateway-python.onrender.com

The code is on GitHub: https://github.com/Yogesh23012001/llm-gateway-python

Full post: https://dev.to/yogesh23012001/i-expected-the-cheaper-model-to-be-cheaper-it-cost-86x-more-5cph

Optional learning community: https://t.me/GyaanSetuAi