𝗜 𝗦𝗽𝗲𝗻𝘁 $𝟱𝟬𝟬 𝗼𝗻 𝗥𝗔𝗚 𝗜𝗻𝗳𝗿𝗮𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲 𝗕𝗲𝗳𝗼𝗿𝗲 𝗠𝗮𝗸𝗶𝗻𝗴 𝟳 𝗠𝗶𝘀𝘁𝗮𝗸𝗲𝘀
I built a RAG pipeline for private document search. It cost me $500 in compute and weeks of debugging. The results were bad. Users got irrelevant answers and queries were slow.
I audited the pipeline and found 7 common mistakes. Fixing them changed everything.
- Fixed Token Chunking: I split documents by fixed token counts. This destroyed context. A sentence would split in half. The LLM received fragmented data and gave poor answers.
- The fix: Use semantic chunking with parent-document retrieval.
- Split by natural boundaries like paragraphs or headers.
- Create small child chunks for search.
- Return the full parent document to the LLM when a match occurs.
- Add 10-20% overlap between chunks.
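A minimal sketch of the parent/child idea, under a few assumptions: parents are paragraphs separated by blank lines, child chunks are word windows with 15% overlap, and a toy word-overlap scorer stands in for real vector search. The names `Chunk`, `split_parents`, `make_children` and `retrieve_parent` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    parent_id: int  # index of the parent paragraph this chunk came from
    text: str       # small child chunk that gets searched


def split_parents(document: str) -> list[str]:
    """Split on blank lines so every parent is a whole paragraph."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def make_children(parents: list[str], size: int = 40, overlap: float = 0.15) -> list[Chunk]:
    """Cut each parent into word windows of `size` words with fractional overlap."""
    step = max(1, int(size * (1 - overlap)))
    children = []
    for pid, parent in enumerate(parents):
        words = parent.split()
        for start in range(0, len(words), step):
            children.append(Chunk(pid, " ".join(words[start:start + size])))
            if start + size >= len(words):
                break  # this window already reached the end of the parent
    return children


def retrieve_parent(query: str, children: list[Chunk], parents: list[str]) -> str:
    """Match on the small child chunks, but return the full parent for the LLM."""
    q = set(query.lower().split())
    # Toy scorer (assumption): shared-word count instead of embedding similarity.
    best = max(children, key=lambda c: len(q & set(c.text.lower().split())))
    return parents[best.parent_id]
```

In a real pipeline the scorer would be the vector index, and the parent could be a whole section or document rather than a paragraph.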
- Default Search Weights: I used a 50/50 split for vector and keyword search. For technical docs, exact keywords matter more.
- The fix: Use dynamic weights.
- Factual queries: 35% vector, 65% keyword.
- Semantic queries: 75% vector, 25% keyword.
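One way this could look in code, as a sketch only. The weights are the ones listed above. The `looks_factual` heuristic is my own assumption (quotes, snake_case identifiers, version numbers or all-caps tokens hint at an exact lookup), and both scores are assumed to be normalized to the 0-1 range before blending.

```python
import re

# (vector weight, keyword weight), taken from the list above.
FACTUAL = (0.35, 0.65)
SEMANTIC = (0.75, 0.25)

# Hypothetical signals of an exact-match lookup: quotes or backticks,
# snake_case identifiers, dotted version numbers, all-caps tokens.
_FACTUAL_PATTERN = re.compile(r'["`]|\b\w+_\w+\b|\b\d+(\.\d+)+\b|\b[A-Z]{2,}\d*\b')


def looks_factual(query: str) -> bool:
    """Guess whether the query is after an exact term rather than a concept."""
    return _FACTUAL_PATTERN.search(query) is not None


def hybrid_score(query: str, vector_score: float, keyword_score: float) -> float:
    """Blend two normalized scores with weights picked per query type."""
    w_vec, w_kw = FACTUAL if looks_factual(query) else SEMANTIC
    return w_vec * vector_score + w_kw * keyword_score
```

A small classifier or an LLM call could replace the regex. The point is only that the weights are chosen per query instead of being fixed.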
- Over-optimizing HNSW Parameters: I set ef_construction to the maximum value. On a large index, this used all my RAM and crashed my server.
- The fix: Use appropriate HNSW settings.
- Keep M between 8 and 32.
- Set ef_construction to 200.
- Set ef_search to 50.
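A small sketch that pins these settings down in one place and gives a rough RAM estimate before building. `HnswConfig` and `estimate_index_bytes` are hypothetical helpers, the default M of 16 is my own pick from inside the 8-32 range, and the estimate assumes float32 vectors plus an hnswlib-style layout with about 2*M four-byte neighbor ids per vector on the base layer. Treat the number as a ballpark, not a measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HnswConfig:
    m: int = 16                 # graph links per node (assumed default, within 8-32)
    ef_construction: int = 200  # candidate list size while building the index
    ef_search: int = 50         # candidate list size while querying

    def __post_init__(self):
        if not 8 <= self.m <= 32:
            raise ValueError(f"M={self.m} is outside the 8-32 range")
        if self.ef_construction < 1 or self.ef_search < 1:
            raise ValueError("ef values must be positive")


def estimate_index_bytes(num_vectors: int, dim: int, m: int) -> int:
    """Ballpark index size: float32 vector data plus base-layer graph links."""
    vector_bytes = dim * 4   # assumes 4-byte floats
    link_bytes = 2 * m * 4   # assumes ~2*M neighbors stored as 4-byte ids
    return num_vectors * (vector_bytes + link_bytes)
```

For example, `estimate_index_bytes(1_000_000, 768, 16)` comes out at about 3.2 GB, which is worth comparing against the server's RAM before indexing.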
- Wrong Embedding Models: I used a general model trained on web text. It did not understand my technical engineering docs.
- The fix: Switch to a model fine-tuned for technical or code content.
- Natural Language Mismatch: Users ask questions like "why is my build slow." Documentation uses terms like "CI pipeline optimization." The two share no keywords.
- The fix: Add an LLM query rewrite step.
- Rewrite the user query into technical terms before searching.
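A sketch of the rewrite step, assuming the LLM is any function that takes a prompt string and returns text. The prompt wording and the `rewrite_query` and `search_queries` helpers are illustrative, not a specific library's API. Searching with both the original and the rewritten query is an extra safety net I added here, so a bad rewrite cannot hide the user's own wording.

```python
from typing import Callable

# Illustrative prompt; tune the wording for your own docs.
REWRITE_PROMPT = (
    "Rewrite the user question as a short search query using the technical "
    "terms found in engineering documentation. Return only the query.\n\n"
    "Question: {question}\nSearch query:"
)


def rewrite_query(question: str, llm: Callable[[str], str]) -> str:
    """Ask the LLM for a technical rephrasing; keep the original if it returns nothing."""
    rewritten = llm(REWRITE_PROMPT.format(question=question)).strip()
    return rewritten or question


def search_queries(question: str, llm: Callable[[str], str]) -> list[str]:
    """Return the original question plus the rewrite (if it differs) to search with."""
    rewritten = rewrite_query(question, llm)
    return [question] if rewritten == question else [question, rewritten]
```

Usage would be something like `search_queries("why is my build slow", my_llm)`, where `my_llm` wraps whatever model client you already use.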
- Redundant Context: Retrieving the top 10 chunks often meant getting the same paragraph three times. This caused hallucinations.
- The fix: Use Maximal Marginal Relevance (MMR) to ensure diversity in results.
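A plain-Python sketch of MMR, assuming embeddings are lists of floats and cosine similarity is the measure. The `lam` value of 0.7 is an arbitrary starting point, not a number from my pipeline: higher favors relevance, lower favors diversity.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=5, lam=0.7):
    """Return indices of up to k documents picked by Maximal Marginal Relevance."""
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            # Penalize similarity to the closest document already selected.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With two identical chunks and one different but equally relevant chunk, the duplicate gets pushed to the end instead of taking the second slot.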
- End-to-End Evaluation Only: I only checked the final answer. When an answer was bad, I could not tell whether retrieval or the LLM was at fault.
- The fix: Evaluate retrieval separately.
- Track hit rate and Mean Reciprocal Rank (MRR).
- Build a test set of 100 query-document pairs.
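Both metrics are short to compute. A sketch, assuming each test query has exactly one expected document id and `results` maps each query to the ranked list of ids the retriever returned. The function names and the cutoff `k=5` are my own choices for illustration.

```python
def hit_rate(results: dict[str, list[str]], expected: dict[str, str], k: int = 5) -> float:
    """Fraction of queries whose expected document appears in the top k results."""
    hits = sum(1 for query, docs in results.items() if expected[query] in docs[:k])
    return hits / len(results)


def mean_reciprocal_rank(results: dict[str, list[str]], expected: dict[str, str]) -> float:
    """Average of 1/rank of the expected document; a miss contributes 0."""
    total = 0.0
    for query, docs in results.items():
        if expected[query] in docs:
            total += 1.0 / (docs.index(expected[query]) + 1)
    return total / len(results)
```

If hit rate is low, the problem is in retrieval. If hit rate is high but answers are still poor, look at the prompt or the LLM.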
Results after fixes:
• Answer relevance: 45% to 85%
• Query latency: 3.2s to 1.8s
• Monthly cost: $180 to $95
Fix chunking first. Then the weights. Then embedding quality.
What is your biggest headache with RAG? Write it in the comments.
Optional learning community: https://t.me/GyaanSetuAi