The Hidden Cost of Production AI

AI-assisted draft.

19 hours ago · 2 min read

The worst bugs in production do not crash your system. They fail silently.

An LLM provider might have a partial outage and still return a 200 OK status, with an empty or nonsensical response body. There is no error. There is no alert. It looks like success, but it is a failure.

This is the real cost of production AI. It is not the API bill. It is the failure that looks normal until a user tells you something is wrong.

I run a pipeline that scores 10,000 job listings every day across five providers: OpenAI, Anthropic, Gemini, DeepSeek, and Groq. Here is how to build fallback chains that actually work.

Most teams rely on a single provider. It works in development. Then production traffic hits, and you run into rate limits, degraded responses, or deprecated models.

You need a three-layer architecture:

Layer 1: Primary model. High quality and high cost.
Layer 2: Fallback model. Good quality and lower cost.
Layer 3: Degraded mode. Minimal quality and near-zero cost.

Each layer must use a different provider. That way, if one provider goes down, the other layers stay up.
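
Here is a minimal sketch of the three-layer chain. It assumes each provider's SDK call is wrapped in a plain function that takes a prompt and returns text. `Layer`, `build_chain`, `run_chain`, and the provider labels are hypothetical names for illustration, not any library's API or the author's exact code. The chain refuses to build if two layers share a provider. It moves to the next layer on either an exception or an output that fails validation, so a 200 OK with an empty body does not count as success.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Layer:
    name: str                   # "primary", "fallback" or "degraded"
    provider: str               # provider label; must be unique across layers
    call: Callable[[str], str]  # hypothetical wrapper around the provider's SDK


class AllLayersFailed(Exception):
    """Raised when no layer produced a valid response."""


def build_chain(layers: List[Layer]) -> List[Layer]:
    # Enforce one provider per layer so a single outage hits at most one layer.
    providers = [layer.provider for layer in layers]
    if len(set(providers)) != len(providers):
        raise ValueError(f"each layer needs a distinct provider, got {providers}")
    return layers


def run_chain(
    layers: List[Layer], prompt: str, is_valid: Callable[[str], bool]
) -> Tuple[str, str]:
    """Return (layer name, output) from the first layer whose output validates."""
    errors: List[str] = []
    for layer in layers:
        try:
            output = layer.call(prompt)
        except Exception as exc:  # timeouts, rate limits, 5xx raised by the SDK
            errors.append(f"{layer.name}: {exc!r}")
            continue
        if is_valid(output):
            return layer.name, output
        # A "successful" call with unusable output is still a failure.
        errors.append(f"{layer.name}: output failed validation")
    raise AllLayersFailed("; ".join(errors))


if __name__ == "__main__":
    chain = build_chain([
        Layer("primary", "provider-a", lambda p: ""),  # 200 OK, empty body
        Layer("fallback", "provider-b", lambda p: "score: 7"),
        Layer("degraded", "provider-c", lambda p: "score: unknown"),
    ])
    print(run_chain(chain, "Score this listing", lambda s: bool(s.strip())))
    # prints ('fallback', 'score: 7')
```

In this sketch, degraded mode is simply the last entry in the list. Whether it is a cheap model or a rule-based default is a design choice the sketch leaves open.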

Crucial tip: do not just check the HTTP status. Validate the output itself. Use schema validation for structured data and length checks for free text.
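
A minimal validator for both cases, using only the Python standard library. The required fields (`score` as an integer, `reason` as a string) are a hypothetical schema for a job-listing score. The 20-character minimum is an arbitrary placeholder. A real pipeline might use a schema library such as Pydantic or `jsonschema` instead. Each function raises `ValueError` on bad output, which a caller can treat as the signal to try the next provider.

```python
import json
from typing import Any, Dict

# Hypothetical schema for a job-listing score: field name -> expected type.
REQUIRED_FIELDS: Dict[str, type] = {"score": int, "reason": str}


def validate_json_output(raw: str) -> Dict[str, Any]:
    """Parse raw output; raise ValueError unless each required field is present, non-null, and typed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        value = data.get(name)
        if value is None:
            raise ValueError(f"missing or null field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"field {name} should be {expected.__name__}")
    return data


def validate_text_output(text: str, min_chars: int = 20) -> str:
    """Raise ValueError if free-text output is empty or suspiciously short."""
    length = len(text.strip())
    if length < min_chars:
        raise ValueError(f"output too short: {length} chars, need {min_chars}")
    return text
```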

I route my tasks across three tiers:

Tier 1: Complex tasks. I use GPT-4o or Claude 3.5 Sonnet.
Tier 2: Classification. I use GPT-4o mini or Gemini 2.0 Flash.
Tier 3: Speed-critical tasks. I use models served on Groq, or DeepSeek V4 Flash.

This routing cuts costs because the expensive models run only on tasks that need them.
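
A sketch of that routing as a lookup table. The `Task` labels and the `route` function are hypothetical. The strings are the display names used above, not exact API model IDs; check each provider's documentation for those. The Groq entry is a placeholder. Each tier lists two candidates from different providers, so every tier also has its own fallback.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Task(Enum):
    COMPLEX = "complex"
    CLASSIFICATION = "classification"
    SPEED_CRITICAL = "speed_critical"


# Tier -> ordered (provider, model) candidates: primary first, then fallback.
# Display names only; "<Groq-hosted model>" is a placeholder, not a real model ID.
ROUTES: Dict[Task, List[Tuple[str, str]]] = {
    Task.COMPLEX: [("OpenAI", "GPT-4o"), ("Anthropic", "Claude 3.5 Sonnet")],
    Task.CLASSIFICATION: [("OpenAI", "GPT-4o mini"), ("Google", "Gemini 2.0 Flash")],
    Task.SPEED_CRITICAL: [("Groq", "<Groq-hosted model>"), ("DeepSeek", "DeepSeek V4 Flash")],
}

# Fail at import time if a tier lists the same provider twice.
for _task, _candidates in ROUTES.items():
    _providers = [provider for provider, _ in _candidates]
    if len(set(_providers)) != len(_providers):
        raise ValueError(f"{_task.name}: candidates must come from different providers")


def route(task: Task) -> List[Tuple[str, str]]:
    """Return the ordered model candidates for a task type."""
    return ROUTES[task]
```

With this table, only `Task.COMPLEX` ever reaches the expensive models. Classification and speed-critical work never touch them.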

Do not forget your embedding providers. If your embedding API fails, retrieval fails and your RAG pipeline stops working. I maintain two embedding providers in parallel for every pipeline.
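
Here is one way to keep two embedding providers live, sketched under stated assumptions. `EmbeddingBackend`, `index_document`, `search`, and the `embed` callables are hypothetical wrappers. The in-memory dict stands in for a real vector store. Vectors from different embedding models are not comparable. For that reason, each provider keeps its own complete index, and a query is always embedded by the same provider whose index it searches.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class EmbeddingBackend:
    name: str
    embed: Callable[[str], Vector]  # hypothetical wrapper around one provider's embedding API
    index: Dict[str, Vector] = field(default_factory=dict)  # doc_id -> vector from THIS provider

    def add(self, doc_id: str, text: str) -> None:
        self.index[doc_id] = self.embed(text)


def index_document(backends: List[EmbeddingBackend], doc_id: str, text: str) -> None:
    # Write to every backend so each one holds a complete index of its own.
    for backend in backends:
        backend.add(doc_id, text)


def search(backends: List[EmbeddingBackend], query: str, k: int = 3) -> Tuple[str, List[str]]:
    """Search with the first backend whose embedding call succeeds."""
    last_error: Exception = RuntimeError("no backends configured")
    for backend in backends:
        try:
            q = backend.embed(query)
        except Exception as exc:  # provider down, timeout, rate limit
            last_error = exc
            continue
        ranked = sorted(backend.index.items(), key=lambda item: cosine(q, item[1]), reverse=True)
        return backend.name, [doc_id for doc_id, _ in ranked[:k]]
    raise RuntimeError(f"all embedding backends failed: {last_error!r}")
```

Writing every document to both indexes doubles embedding spend. That is the extra capacity you pay for in exchange for retrieval that survives one provider going down.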

To catch silent failures, track these three metrics:

Response time. If a complex prompt returns too fast, the model likely returned a cached or empty response.
Output length. Short responses are a red flag.
Schema compliance. Check whether the output contains useful content or just null values.
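
Here is a sketch of per-call checks for those three signals, plus a per-provider schema failure rate. The threshold values are hypothetical placeholders to calibrate against your own traffic. The latency floor only makes sense for prompts that should take real work, so a real pipeline would likely set it per task type.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CallRecord:
    provider: str
    latency_s: float   # wall-clock time for the request
    output_chars: int  # length of the returned text
    schema_ok: bool    # result of output validation


@dataclass
class Thresholds:
    # Hypothetical defaults; tune them from your own baseline.
    min_latency_s: float = 0.5
    min_output_chars: int = 40
    max_schema_failure_rate: float = 0.05


def flag_call(record: CallRecord, t: Thresholds) -> List[str]:
    """Return the silent-failure signals raised by a single call."""
    flags = []
    if record.latency_s < t.min_latency_s:
        flags.append("suspiciously fast")
    if record.output_chars < t.min_output_chars:
        flags.append("short output")
    if not record.schema_ok:
        flags.append("schema violation")
    return flags


def schema_failure_rate(records: List[CallRecord], provider: str) -> float:
    """Fraction of this provider's calls that failed validation."""
    calls = [r for r in records if r.provider == provider]
    if not calls:
        return 0.0
    return sum(not r.schema_ok for r in calls) / len(calls)


def providers_to_alert(records: List[CallRecord], t: Thresholds) -> List[str]:
    """Providers whose schema failure rate exceeds the threshold, sorted by name."""
    providers = sorted({r.provider for r in records})
    return [p for p in providers if schema_failure_rate(records, p) > t.max_schema_failure_rate]
```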

A good fallback chain ensures every request gets a usable response. You pay for extra capacity, but you protect user trust.

Source: https://dev.to/abdul___rehman/the-hidden-cost-of-production-ai-how-to-build-fallback-chains-that-dont-fail-silently-dec