𝗜 𝗦𝗽𝗲𝗻𝘁 $𝟱𝟬𝟬 𝗼𝗻 𝗥𝗔𝗚 𝗜𝗻𝗳𝗿𝗮𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲 𝗕𝗲𝗳𝗼𝗿𝗲 𝗠𝗮𝗸𝗶𝗻𝗴 𝟳 𝗠𝗶𝘀𝘁𝗮𝗸𝗲𝘀
I built a RAG pipeline for private document search. It cost me $500 in compute and weeks of debugging. The results were bad. Users got irrelevant answers and queries were slow.
I audited the pipeline and found 7 common mistakes. Fixing them changed everything.
- Fixed Token Chunking: I split documents at fixed token counts. This destroyed context, because sentences were cut in half. The LLM received fragments and gave poor answers.
- The fix: Use semantic chunking with parent-document retrieval.
- Split by natural boundaries like paragraphs or headers.
- Create small child chunks for search.
- Return the full parent document to the LLM when a match occurs.
- Add 10-20% overlap between chunks.
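The chunking fix above can be sketched in a few lines. This is a minimal illustration, not the pipeline from the post: the names `split_parents`, `make_children`, and `retrieve_parent` are hypothetical, paragraphs are assumed to be separated by blank lines, chunk size is counted in words rather than tokens, and the word-overlap "search" is a stand-in for a real vector index.

```python
from dataclasses import dataclass


@dataclass
class ChildChunk:
    parent_id: int  # index of the parent paragraph this chunk came from
    text: str       # small window of text that gets indexed for search


def split_parents(document: str) -> list[str]:
    """Split on blank lines so every parent is a whole paragraph."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def make_children(parents: list[str], size: int = 40, overlap: float = 0.15) -> list[ChildChunk]:
    """Cut each parent into word windows of `size` words with fractional `overlap`."""
    step = max(1, round(size * (1 - overlap)))
    children = []
    for pid, parent in enumerate(parents):
        words = parent.split()
        for start in range(0, len(words), step):
            children.append(ChildChunk(pid, " ".join(words[start:start + size])))
            if start + size >= len(words):
                break  # this window already reached the end of the parent
    return children


def retrieve_parent(query: str, parents: list[str], children: list[ChildChunk]) -> str:
    """Toy search: match on the small child, hand back the full parent."""
    q = set(query.lower().split())
    best = max(children, key=lambda c: len(q & set(c.text.lower().split())))
    return parents[best.parent_id]
```

The point of the design is that matching happens on small, focused chunks while the LLM still sees a complete paragraph.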
- Default Search Weights: I used a 50/50 split between vector and keyword search. For technical docs, exact keywords matter more.
- The fix: Use dynamic weights.
- Factual queries: 35% vector, 65% keyword.
- Semantic queries: 75% vector, 25% keyword.
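A rough sketch of the dynamic weighting above. The weights come from the list; the `classify_query` heuristic is purely an assumption for illustration (the post does not say how queries are classified), and both input scores are assumed to be normalized to the range 0 to 1 already.

```python
import re

# (vector weight, keyword weight), taken from the list above.
WEIGHTS = {"factual": (0.35, 0.65), "semantic": (0.75, 0.25)}


def classify_query(query: str) -> str:
    """Hypothetical heuristic: identifiers, quotes, or digits suggest a factual lookup."""
    code_like = re.search(r"\w[_.()/-]\w|\"|\d", query)
    return "factual" if code_like else "semantic"


def hybrid_score(vector_score: float, keyword_score: float, query: str) -> float:
    """Blend two normalized scores using the weights for the query type."""
    w_vec, w_kw = WEIGHTS[classify_query(query)]
    return w_vec * vector_score + w_kw * keyword_score
```

In practice the classifier could be a small model or an LLM call; a regex is just the cheapest thing that shows the idea.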
- Over-optimizing HNSW Parameters: I set ef_construction to the maximum value. On a large index, the build used all my RAM and crashed my server.
- The fix: Use moderate HNSW settings.
- Keep M between 8 and 32.
- Set ef_construction to 200.
- Set ef_search to 50.
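The settings above can be captured in a small config object with a back-of-the-envelope size check. This is a sketch under assumptions: `HnswSettings` and `estimate_index_bytes` are hypothetical names, M=16 is simply a value inside the 8-32 range, and the estimate assumes float32 vectors plus up to 2*M four-byte neighbor ids per vector, ignoring upper graph layers and engine overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HnswSettings:
    m: int = 16                 # assumed value inside the 8-32 range above
    ef_construction: int = 200  # build-time candidate list size
    ef_search: int = 50         # query-time candidate list size

    def __post_init__(self) -> None:
        if not 8 <= self.m <= 32:
            raise ValueError(f"M={self.m} is outside the 8-32 range")


def estimate_index_bytes(num_vectors: int, dim: int, settings: HnswSettings) -> int:
    """Rough size: float32 vectors plus up to 2*M four-byte neighbor ids per vector."""
    vector_bytes = dim * 4
    link_bytes = settings.m * 2 * 4
    return num_vectors * (vector_bytes + link_bytes)


# Example: one million 768-dimensional vectors with the defaults.
size = estimate_index_bytes(1_000_000, 768, HnswSettings())
print(f"{size / 1e9:.1f} GB")
```

Running a check like this before building gives an early warning when an index is unlikely to fit in RAM. Actual usage depends on the vector database, so treat the number as a lower bound.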
- Wrong Embedding Model: I used a general-purpose model trained on web text. It did not understand my technical engineering docs.
- The fix: Switch to a model fine-tuned for technical or code content.
- Natural Language Mismatch: Users ask questions like "why is my build slow." Documentation uses terms like "CI pipeline optimization." The two share no keywords.
- The fix: Add an LLM query rewrite step.
- Rewrite the user query into technical terms before searching.
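A minimal sketch of the rewrite step above. The prompt wording and the `rewrite_query` helper are assumptions, and `fake_llm` is only a stand-in so the example runs offline; in a real pipeline `llm` would wrap whatever model client is in use.

```python
from typing import Callable

REWRITE_PROMPT = (
    "Rewrite the user question using the terminology found in technical "
    "documentation. Return only the rewritten query.\n\nQuestion: {question}"
)


def rewrite_query(question: str, llm: Callable[[str], str]) -> str:
    """Ask the LLM for a rewrite; fall back to the original question on empty output."""
    rewritten = llm(REWRITE_PROMPT.format(question=question)).strip()
    return rewritten or question


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (hypothetical, for demonstration only)."""
    return "CI pipeline optimization" if "build slow" in prompt else ""


print(rewrite_query("why is my build slow", fake_llm))
```

Falling back to the original question keeps search working when the model returns nothing useful.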
- Redundant Context: Retrieving the top 10 chunks often meant getting the same paragraph three times. This caused hallucinations.
- The fix: Use Maximal Marginal Relevance (MMR) to ensure diversity in results.
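For reference, MMR can be sketched in plain Python. This is an illustrative version, assuming embeddings are plain lists of floats and cosine similarity is the metric. `lambda_mult` is the usual trade-off knob: 1.0 means pure relevance, and lower values penalize near-duplicates more.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec: list[float], doc_vecs: list[list[float]],
        k: int = 3, lambda_mult: float = 0.5) -> list[int]:
    """Return indices of up to k documents, trading relevance against redundancy."""
    candidates = list(range(len(doc_vecs)))
    selected: list[int] = []

    def score(i: int) -> float:
        relevance = cosine(query_vec, doc_vecs[i])
        # Similarity to the closest already-selected document (0 if none yet).
        redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 1.0]
docs = [[1.0, 0.8], [1.0, 0.8], [0.7, 1.0]]  # docs 0 and 1 are exact duplicates
print(mmr(query, docs, k=2))
```

With the duplicate pair above, plain top-2 by relevance would return both copies, while MMR skips the second copy in favor of the third document.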
- End-to-End Evaluation Only: I checked only the final answer, so I could not tell whether a bad answer came from retrieval or from the LLM.
- The fix: Evaluate retrieval separately.
- Track hit rate and Mean Reciprocal Rank (MRR).
- Build a test set of 100 query-document pairs.
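Both retrieval metrics above are simple to compute once a test set exists. In this sketch, `results` maps each query to the ranked document ids the retriever returned, and `expected` maps each query to the one document that should be found. The names and the tiny data set are made up for illustration.

```python
def hit_rate(results: dict[str, list[str]], expected: dict[str, str], k: int = 5) -> float:
    """Fraction of queries whose expected document appears in the top k."""
    hits = sum(expected[q] in ranked[:k] for q, ranked in results.items())
    return hits / len(results)


def mean_reciprocal_rank(results: dict[str, list[str]], expected: dict[str, str]) -> float:
    """Average of 1/rank of the expected document (0 when it is missing)."""
    total = 0.0
    for q, ranked in results.items():
        if expected[q] in ranked:
            total += 1.0 / (ranked.index(expected[q]) + 1)
    return total / len(results)


expected = {"q1": "docA", "q2": "docB", "q3": "docC"}
results = {
    "q1": ["docA", "docX"],  # found at rank 1
    "q2": ["docX", "docB"],  # found at rank 2
    "q3": ["docX", "docY"],  # not found
}
print(hit_rate(results, expected, k=2), mean_reciprocal_rank(results, expected))
```

If these numbers are high but answers are still poor, the problem is more likely in the prompt or the LLM than in retrieval.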
Results after fixes:
• Answer relevance: 45% to 85%
• Query latency: 3.2s to 1.8s
• Monthly cost: $180 to $95
Tackle chunking first. Then the search weights. Finally, embedding quality.
What is your biggest headache with RAG? Let me know in the comments.
Learning community (optional): https://t.me/GyaanSetuAi