𝗜 𝗘𝘅𝗽𝗲𝗰𝘁𝗲𝗱 𝘁𝗵𝗲 𝗰𝗵𝗲𝗮𝗽𝗲𝗿 𝗺𝗼𝗱𝗲𝗹 𝘁𝗼 𝗯𝗲 𝗰𝗵𝗲𝗮𝗽𝗲𝗿

📅2 days ago⏱2 min read

I tested Claude Haiku against Gemini 2.5 Flash.

Flash has a lower price per token. I expected it to be cheaper. It cost 8.6 times more.

Flash is a thinking model. Before it answers a prompt, it spends tokens on reasoning. Those tokens are billed. Haiku used 4 tokens. Flash used 28.
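A minimal sketch of that arithmetic, not the gateway's actual code. The model names, the prices, the 10 input tokens, and the split of 28 tokens into 4 visible plus 24 reasoning tokens are all made up for illustration. It also assumes reasoning tokens are billed at the output rate. The point is only that a lower price per token can still produce a higher bill.

```python
from decimal import Decimal

# Hypothetical prices in USD per one million tokens. These are NOT real list prices.
PRICE_PER_MILLION = {
    "plain-model": {"input": Decimal("1.00"), "output": Decimal("5.00")},
    "thinking-model": {"input": Decimal("0.50"), "output": Decimal("2.50")},
}


def call_cost(model, input_tokens, output_tokens, reasoning_tokens=0):
    """Cost of one call, assuming reasoning tokens are billed at the output rate."""
    price = PRICE_PER_MILLION[model]
    billed_output = output_tokens + reasoning_tokens
    total = input_tokens * price["input"] + billed_output * price["output"]
    return total / Decimal(1_000_000)


# Same prompt, same visible answer length; only the thinking model reasons first.
plain = call_cost("plain-model", input_tokens=10, output_tokens=4)
thinking = call_cost(
    "thinking-model", input_tokens=10, output_tokens=4, reasoning_tokens=24
)

# The thinking model is half the price per token here, yet costs more per call.
print(plain, thinking, thinking / plain)
```

With these invented numbers the model that is half the price per token still costs 2.5 times more per call. The real ratio depends on real prices and real token counts, which is why the per-call log matters.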

I only saw this because I track every call. I log tokens, cost, and latency to Postgres. I do this because I spent years building real-time payment systems.

In payments, a rounding error is a crisis.

I spent two years building cross-border payments at NPCI. This year, I built an LLM gateway. I used the same tools.

AI infrastructure looks new. It is actually standard systems work.

An LLM API is a downstream dependency. It is slow. It goes down. It has rate limits. It bills you. You have dealt with dependencies like this before: payment processors and banks.

The model is not magic. It is just a new, expensive, and flaky dependency. The hard problems remain the same:

Reliability
Cost control
Failover

I built circuit breakers for my gateway. I built these same breakers at NPCI. When a partner bank fails, you do not keep hitting it. You trip the breaker, fail fast, and wait for recovery. The logic is identical whether the partner is a bank or an AI provider.
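The trip, fail fast, and recover cycle can be sketched in a few lines. This is an illustrative breaker, not the one in the gateway. The class name, the thresholds, and the injectable clock are assumptions chosen to keep the example small and testable.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, recovery_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.recovery_seconds:
                # Fail fast: do not touch the struggling provider at all.
                raise CircuitOpenError("provider is cooling down")
            # Recovery window elapsed: let one trial call through (half-open).
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                # Trip the breaker, or re-trip it after a failed trial call.
                self.opened_at = self.clock()
            raise
        # Success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Usage is one wrapper per provider: breaker.call(send_request, payload). Nothing in the class knows whether the callee is a bank or a model API.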

I also needed to meter every call. This requires an audit log. In payments, you never use floats for money. You use fixed-precision numbers. I use NUMERIC for cost logs in my gateway. You do not approximate spend.
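A small illustration of why floats are the wrong type for spend, using Python's decimal module as the application-side counterpart of a NUMERIC column. The 8-decimal scale and the helper name to_ledger_amount are assumptions for this sketch. The post does not state the precision the gateway uses.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Accumulate ten charges of 0.1 with floats, then with fixed-precision decimals.
float_total = 0.0
decimal_total = Decimal("0")
for _ in range(10):
    float_total += 0.1
    decimal_total += Decimal("0.1")

print(float_total == 1.0)              # False: binary floats drift
print(decimal_total == Decimal("1"))   # True: decimals stay exact

# Assumed scale: 8 decimal places, as a NUMERIC(18, 8) column would store.
COST_QUANTUM = Decimal("0.00000001")


def to_ledger_amount(value: str) -> Decimal:
    """Parse a cost from text and round it to the assumed column scale."""
    return Decimal(value).quantize(COST_QUANTUM, rounding=ROUND_HALF_EVEN)
```

Parsing from a string matters. Decimal(0.1) would inherit the float's error, while Decimal("0.1") does not.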

The logic for retries is also muscle memory. A bad retry in payments causes a double debit. In AI, a bad retry wastes tokens. The stakes changed, but the problem did not.
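A hedged sketch of a retry that cannot run away: a fixed attempt budget, exponential backoff with jitter, and one idempotency key reused across attempts. The TransientError type, the send callable, and its signature are hypothetical. Whether a given provider honors an idempotency key is also an assumption. In payments that key is what prevents the double debit.

```python
import random
import time
import uuid


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (timeouts, 429s, 5xx)."""


def call_with_retries(send, payload, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    # One key for every attempt lets a cooperating server deduplicate repeats.
    idempotency_key = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload, idempotency_key)
        except TransientError:
            if attempt == max_attempts:
                # Budget exhausted: surface the failure instead of looping.
                raise
            # Exponential backoff with full jitter: 0 to base_delay * 2^(attempt-1).
            delay = random.uniform(0, base_delay * 2 ** (attempt - 1))
            sleep(delay)
```

Only errors marked transient are retried. Anything else propagates on the first attempt, so a malformed request never burns tokens three times.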

AI does add a few problems of its own:

Token economics
Non-deterministic outputs
Models spending your budget on reasoning

If you are a backend engineer, your skills transfer to AI. Anyone can call an API. Few people can make that call reliable, cheap, and observable.

That is not just AI expertise. That is core engineering.

The gateway is live: https://llm-gateway-python.onrender.com

The code is on GitHub: https://github.com/Yogesh23012001/llm-gateway-python

Full post: https://dev.to/yogesh23012001/i-expected-the-cheaper-model-to-be-cheaper-it-cost-86x-more-5cph

Optional learning community: https://t.me/GyaanSetuAi
