I Spent $500 on RAG Infrastructure Before Realizing These 7 Mistakes Were Killing My Results

I built a RAG pipeline for private document search. It cost me $500 in compute and weeks of debugging. The results were bad. Users got irrelevant answers and queries were slow.

I audited the pipeline and found 7 common mistakes. Fixing them changed everything.

  1. Fixed Token Chunking: I split documents by fixed token counts. This destroyed context: sentences were cut in half, so the LLM received fragmented text and gave poor answers.
  • The fix: Use semantic chunking with parent-document retrieval.
  • Split by natural boundaries like paragraphs or headers.
  • Create small child chunks for search.
  • Return the full parent document to the LLM when a match occurs.
  • Add 10-20% overlap between chunks.
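A minimal sketch of the child/parent idea, not the pipeline from the post. It assumes paragraphs are separated by blank lines, measures chunk size in words rather than tokens, and uses a toy word-overlap search where a real system would use embeddings. The names `Child`, `split_parents`, `make_children`, and `retrieve_parent` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Child:
    text: str
    parent_id: int  # index of the parent paragraph this chunk was cut from


def split_parents(document: str) -> list[str]:
    """Split on blank lines so every parent is a whole paragraph."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def make_children(parents: list[str], size: int = 40, overlap: float = 0.15) -> list[Child]:
    """Cut each parent into windows of `size` words that overlap by `overlap`."""
    step = max(1, int(size * (1 - overlap)))
    children: list[Child] = []
    for pid, parent in enumerate(parents):
        words = parent.split()
        for start in range(0, len(words), step):
            children.append(Child(" ".join(words[start:start + size]), pid))
            if start + size >= len(words):
                break  # the last window already reached the end of the parent
    return children


def retrieve_parent(query: str, children: list[Child], parents: list[str]) -> str:
    """Toy search (assumption): match on shared words, then return the full parent."""
    q = set(query.lower().split())
    best = max(children, key=lambda c: len(q & set(c.text.lower().split())))
    return parents[best.parent_id]
```

The small chunks are only used for matching. The LLM always sees a complete paragraph, so a sentence that was split during chunking is whole again by the time it reaches the prompt.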
  2. Default Search Weights: I used a 50/50 split between vector and keyword search. For technical docs, exact keywords matter more.
  • The fix: Use dynamic weights.
  • Factual queries: 35% vector, 65% keyword.
  • Semantic queries: 75% vector, 25% keyword.
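One way to apply these weights, sketched under assumptions: both scores are already normalized to [0, 1], and the query type is guessed by a simple regex heuristic. The post does not say how queries were classified, so `classify_query` is purely illustrative. Only the weight values come from the list above.

```python
import re

# (vector weight, keyword weight) per query type, taken from the list above.
WEIGHTS = {"factual": (0.35, 0.65), "semantic": (0.75, 0.25)}


def classify_query(query: str) -> str:
    """Hypothetical heuristic: quotes, digits, snake_case, or camelCase
    suggest the user wants an exact term, so treat the query as factual."""
    if re.search(r'["`]|\d|[A-Za-z]+_[A-Za-z]+|[a-z][A-Z]', query):
        return "factual"
    return "semantic"


def hybrid_score(vector_score: float, keyword_score: float, query: str) -> float:
    """Blend two scores that are assumed to be normalized to [0, 1]."""
    w_vec, w_kw = WEIGHTS[classify_query(query)]
    return w_vec * vector_score + w_kw * keyword_score
```

A small classifier model or an LLM call could replace the regex. The point is only that the weights are chosen per query instead of being fixed.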
  3. Over-optimizing HNSW Parameters: I set ef_construction to the maximum value. On a large index, this used all my RAM and crashed my server.
  • The fix: Use appropriate HNSW settings.
  • Keep M between 8 and 32.
  • Set ef_construction to 200.
  • Set ef_search to 50.
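A small sketch that keeps these settings in one place and rejects values outside the ranges above. `HnswSettings` is a hypothetical helper, not part of any library, and the default `m=16` is my assumption (any value from 8 to 32 fits the advice). The `ef_search >= top_k` check reflects the usual HNSW requirement that the search list be at least as long as the number of results requested.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HnswSettings:
    m: int = 16                 # assumed default, within the 8-32 range above
    ef_construction: int = 200  # build-time candidate list size
    ef_search: int = 50         # query-time candidate list size

    def validate(self, top_k: int) -> None:
        """Raise ValueError if a setting falls outside the ranges above."""
        if not 8 <= self.m <= 32:
            raise ValueError(f"M={self.m} is outside the 8-32 range")
        if self.ef_construction < self.m:
            raise ValueError("ef_construction should not be smaller than M")
        if self.ef_search < top_k:
            raise ValueError(f"ef_search={self.ef_search} is below top_k={top_k}")
```

In hnswlib, for instance, these values would be passed as the `M` and `ef_construction` arguments of `init_index` and to `set_ef`. Other vector stores expose the same knobs under similar names.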
  4. Wrong Embedding Model: I used a general model trained on web text. It did not understand my technical engineering docs.
  • The fix: Switch to a model fine-tuned for technical or code content.
  5. Natural Language Mismatch: Users ask questions like "why is my build slow." The documentation uses terms like "CI pipeline optimization." The two phrasings share no words.
  • The fix: Add an LLM query rewrite step.
  • Rewrite the user query into technical terms before searching.
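A rough sketch of the rewrite step. The LLM is passed in as a plain callable so no particular vendor API is assumed. The prompt wording and the name `rewrite_query` are my own illustration, not taken from the post.

```python
from typing import Callable

# Illustrative prompt (assumption); tune the wording for your own docs.
REWRITE_PROMPT = (
    "Rewrite the user question as a short search query using the technical "
    "terms found in engineering documentation. Return only the query.\n\n"
    "Question: {question}\nQuery:"
)


def rewrite_query(question: str, llm: Callable[[str], str]) -> str:
    """Ask the LLM for a technical rephrasing; fall back to the original
    question if the model returns an empty string."""
    rewritten = llm(REWRITE_PROMPT.format(question=question)).strip()
    return rewritten or question
```

With a model behind `llm`, "why is my build slow" could come back as something like "CI pipeline optimization build time", which finally shares terms with the documentation.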
  6. Redundant Context: Retrieving the top 10 chunks often meant getting the same paragraph three times. This caused hallucinations.
  • The fix: Use Maximal Marginal Relevance (MMR) to ensure diversity in results.
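A compact, dependency-free sketch of MMR over embedding vectors. It assumes cosine similarity and a trade-off parameter `lam` (1.0 means pure relevance, lower values push harder for diversity). The default `lam=0.7` is an arbitrary starting point, not a value from the post.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec: list[float], doc_vecs: list[list[float]],
        k: int = 5, lam: float = 0.7) -> list[int]:
    """Return indices of up to k documents, picked one at a time by
    lam * relevance - (1 - lam) * similarity to what is already selected."""
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    candidates = list(range(len(doc_vecs)))
    selected: list[int] = []
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With two identical chunks and one different chunk of equal relevance, `lam=1.0` returns both duplicates, while `lam=0.7` skips the second copy in favor of the different chunk.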
  7. End-to-End Evaluation Only: I only checked the final answer, so I could not tell whether a bad result came from retrieval or from the LLM.
  • The fix: Evaluate retrieval separately.
  • Track hit rate and Mean Reciprocal Rank (MRR).
  • Build a test set of 100 query-document pairs.
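Both metrics are easy to compute once the test set exists. The sketch below assumes each query has exactly one correct document ID and that the retriever returns a ranked list of IDs. The function name and data layout are illustrative.

```python
def hit_rate_and_mrr(results: dict[str, list[str]],
                     expected: dict[str, str],
                     k: int = 10) -> tuple[float, float]:
    """results maps query -> ranked doc IDs; expected maps query -> correct doc ID.
    Hit rate: share of queries whose correct doc appears in the top k.
    MRR: mean of 1/rank of the correct doc, counting 0 when it is missing."""
    hits = 0
    reciprocal_sum = 0.0
    for query, target in expected.items():
        ranked = results.get(query, [])[:k]
        if target in ranked:
            hits += 1
            reciprocal_sum += 1.0 / (ranked.index(target) + 1)
    n = len(expected)
    return (hits / n, reciprocal_sum / n) if n else (0.0, 0.0)
```

If hit rate is high but answers are still wrong, the problem is likely in generation. If hit rate is low, no amount of prompt tuning will fix it.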

Results after fixes:
  • Answer relevance: 45% to 85%
  • Query latency: 3.2s to 1.8s
  • Monthly cost: $180 to $95

Fix chunking first. Then the weights. Then embedding quality.

What is your biggest headache with RAG? Tell me in the comments.

Source: https://dev.to/kollittle/i-spent-500-on-rag-infrastructure-before-realizing-these-7-mistakes-were-killing-my-results-iph

Optional learning community: https://t.me/GyaanSetuAi