𝗜 𝗦𝗽𝗲𝗻𝘁 $𝟱𝟬𝟬 𝗼𝗻 𝗥𝗔𝗚 𝗜𝗻𝗳𝗿𝗮𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲 𝗕𝗲𝗳𝗼𝗿𝗲 𝗙𝗶𝘅𝗶𝗻𝗴 𝗧𝗵𝗲𝘀𝗲 𝟳 𝗠𝗶𝘀𝘁𝗮𝗸𝗲𝘀

AI-assisted draft.

I built a RAG pipeline for private document search. It cost me $500 in compute and weeks of debugging. The results were bad. Users got wrong answers and queries were slow.

I audited the pipeline. I found 7 mistakes. Fixing them changed everything.

Fixed Token Chunking

I split documents every 512 tokens. This destroyed context. An API explanation would be cut mid-sentence. The LLM received fragments and gave garbage answers.

The fix: Use semantic chunking.

Split by natural boundaries like paragraphs or headers.
Use parent-document retrieval.
Create small child chunks for search.
Return the full parent document to the LLM.
Add 10-20% overlap between chunks.
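
The steps above can be sketched in a few lines. This is a minimal illustration, not the pipeline from the post: it assumes paragraphs (blank-line separated) are the parent units, counts words instead of tokens, and uses a toy word-overlap match in place of vector search. All function names and the `max_words` value are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    parent_id: int  # index of the parent paragraph this child came from


def split_parents(document: str) -> list[str]:
    """Split on blank lines so each paragraph becomes a parent unit."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def split_children(parent: str, parent_id: int,
                   max_words: int = 40, overlap: float = 0.15) -> list[Chunk]:
    """Slide a word window over the parent; `overlap` is the shared fraction."""
    words = parent.split()
    step = max(1, round(max_words * (1 - overlap)))
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(Chunk(" ".join(words[start:start + max_words]), parent_id))
        if start + max_words >= len(words):
            break  # the window already reached the end of the parent
    return chunks


def build_index(document: str) -> tuple[list[str], list[Chunk]]:
    parents = split_parents(document)
    children = [c for i, p in enumerate(parents) for c in split_children(p, i)]
    return parents, children


def retrieve_parent(query: str, parents: list[str], children: list[Chunk]) -> str:
    """Toy search: match on the small child chunk, return the full parent."""
    q = set(query.lower().split())
    best = max(children, key=lambda c: len(q & set(c.text.lower().split())))
    return parents[best.parent_id]
```

In a real pipeline the word-overlap match would be replaced by an embedding search over the child chunks; the parent lookup stays the same.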

Bad Hybrid Search Weights

I used a 50/50 split for vector and keyword search. This failed for technical docs. Technical users need exact keyword matches.

The fix: Use dynamic weights.

Factual queries: 35% vector, 65% keyword.
Semantic queries: 75% vector, 25% keyword.
General queries: 60% vector, 40% keyword.
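
A rough sketch of how these weights could be applied when fusing scores. It assumes both score sets are already normalized to the 0-1 range. The `classify_query` heuristic is purely hypothetical (the post does not say how queries are classified), so treat it as a placeholder.

```python
# (vector weight, keyword weight) per query type, taken from the list above.
WEIGHTS = {
    "factual": (0.35, 0.65),
    "semantic": (0.75, 0.25),
    "general": (0.60, 0.40),
}


def classify_query(query: str) -> str:
    """Hypothetical heuristic: identifiers and numbers suggest a factual lookup."""
    q = query.lower()
    if any(ch.isdigit() for ch in q) or "_" in q or '"' in q:
        return "factual"
    if q.startswith(("why", "how", "explain")):
        return "semantic"
    return "general"


def hybrid_scores(query: str,
                  vector_scores: dict[str, float],
                  keyword_scores: dict[str, float]) -> list[tuple[str, float]]:
    """Fuse normalized scores with weights chosen by query type."""
    w_vec, w_kw = WEIGHTS[classify_query(query)]
    doc_ids = set(vector_scores) | set(keyword_scores)
    fused = {
        d: w_vec * vector_scores.get(d, 0.0) + w_kw * keyword_scores.get(d, 0.0)
        for d in doc_ids
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

With the same two score sets, a query like "error code 503" favors the keyword match, while "why is my build slow" favors the vector match.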

Over-tuning HNSW Parameters

I set ef_construction to the maximum. This crashed my server. It used all available RAM.

The fix: Use appropriate parameters.

Set M between 8 and 32.
Set ef_construction to 200.
Set ef_search to 50.

Memory usage dropped 70%.
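
One way to keep these values from drifting is to hold them in a small validated config object. The sketch below is library-agnostic and hypothetical: the default `m=16` is just a value inside the 8-32 range, and the size estimate is a rough back-of-the-envelope model (float32 vectors plus about `M * 2` four-byte links per vector), not a measurement of any particular vector database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HnswParams:
    m: int = 16                 # assumed default, inside the 8-32 range above
    ef_construction: int = 200
    ef_search: int = 50

    def __post_init__(self) -> None:
        if not 8 <= self.m <= 32:
            raise ValueError("M should stay between 8 and 32")
        if self.ef_construction < 1 or self.ef_search < 1:
            raise ValueError("ef values must be positive")


def estimate_index_bytes(num_vectors: int, dim: int, params: HnswParams) -> int:
    """Rough estimate only: raw float32 vectors plus base-layer graph links."""
    vector_bytes = dim * 4          # 4 bytes per float32 component
    link_bytes = params.m * 2 * 4   # ~2*M neighbor ids, 4 bytes each
    return num_vectors * (vector_bytes + link_bytes)
```

Running the estimate before building the index gives an early warning when a parameter change would not fit in RAM.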

General Embedding Models

I used a model trained on Wikipedia. My documents were technical engineering runbooks. The model did not understand my domain.

The fix: Use a model fine-tuned for technical or code content.

No Query Rewriting

Users ask natural questions. Technical docs use formal terms. They do not match.

The fix: Add a lightweight LLM step to rewrite queries.

User asks: "why is my build slow"
System rewrites to: "CI pipeline performance optimization"
This improved recall by 40%.
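
A minimal sketch of the rewrite step. The LLM is passed in as a plain callable so the example stays independent of any provider. The prompt wording and the fallback to the original question are assumptions, not details from the post.

```python
from typing import Callable

# Hypothetical prompt; tune the wording for your own documentation.
REWRITE_PROMPT = (
    "Rewrite the user question as a short search query using the formal "
    "terminology found in technical documentation. Return only the query.\n"
    "Question: {question}"
)


def rewrite_query(question: str, llm: Callable[[str], str]) -> str:
    """Ask the LLM for a formal search query; fall back to the original."""
    rewritten = llm(REWRITE_PROMPT.format(question=question)).strip()
    return rewritten or question
```

Any function that takes a prompt string and returns text can be plugged in as `llm`, which also makes the step easy to test with a stub.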

Redundant Results

Retrieving top-10 chunks often gave the same paragraph three times. The LLM repeated itself.

The fix: Use Maximal Marginal Relevance (MMR) to ensure diversity in results.
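
For readers who have not used MMR, here is a small pure-Python sketch of the greedy selection. It assumes embeddings are plain lists of floats and uses cosine similarity. The `lambda_` value of 0.7 is an illustrative choice, not a number from the post.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query_vec: list[float], doc_vecs: list[list[float]],
        k: int = 3, lambda_: float = 0.7) -> list[int]:
    """Greedily pick documents that are relevant but not redundant.

    Each step scores a candidate as
    lambda_ * sim(query, doc) - (1 - lambda_) * max sim(doc, already selected).
    Returns the indices of the selected documents in selection order.
    """
    candidates = list(range(len(doc_vecs)))
    selected: list[int] = []

    def score(i: int) -> float:
        relevance = cosine(query_vec, doc_vecs[i])
        redundancy = max(
            (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
        )
        return lambda_ * relevance - (1 - lambda_) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

Lowering `lambda_` pushes the selection toward diversity; raising it toward 1.0 falls back to plain relevance ranking.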

Testing the Wrong Thing

I tested the whole pipeline at once. I did not know if the problem was retrieval or the LLM.

The fix: Evaluate retrieval separately.

Track hit rate.
Track Mean Reciprocal Rank (MRR).
Build a test set of 100 query-document pairs.
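
Both metrics are short to compute once the test set exists. The sketch below assumes each query has exactly one expected document ID and that the retriever returns a ranked list of IDs per query. The data shapes and function names are hypothetical.

```python
def hit_rate(results: dict[str, list[str]],
             expected: dict[str, str], k: int = 5) -> float:
    """Fraction of queries whose expected document appears in the top k."""
    hits = sum(
        1 for query, doc_id in expected.items()
        if doc_id in results.get(query, [])[:k]
    )
    return hits / len(expected)


def mean_reciprocal_rank(results: dict[str, list[str]],
                         expected: dict[str, str]) -> float:
    """Average of 1/rank of the expected document; a miss contributes 0."""
    total = 0.0
    for query, doc_id in expected.items():
        ranked = results.get(query, [])
        if doc_id in ranked:
            total += 1.0 / (ranked.index(doc_id) + 1)
    return total / len(expected)
```

Because neither function calls the LLM, a drop in these numbers points at retrieval rather than generation.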

Results after the fixes:

Answer relevance: 45% to 85%
Latency: 3.2s to 1.8s
Monthly cost: $180 to $95

Fix chunking first. Then weights. Then embedding quality.

Source: https://dev.to/kollittle/i-spent-500-on-rag-infrastructure-before-realizing-these-7-mistakes-were-killing-my-results-iph
