𝗜 𝗦𝗽𝗲𝗻𝘁 $𝟱𝟬𝟬 𝗼𝗻 𝗥𝗔𝗚 𝗜𝗻𝗳𝗿𝗮𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲 𝗕𝗲𝗳𝗼𝗿𝗲 𝗙𝗶𝘅𝗶𝗻𝗴 𝗧𝗵𝗲𝘀𝗲 𝟳 𝗠𝗶𝘀𝘁𝗮𝗸𝗲𝘀

I built a RAG pipeline for private document search. It cost me $500 in compute and weeks of debugging. The results were bad. Users got wrong answers and queries were slow.

I audited the pipeline. I found 7 mistakes. Fixing them changed everything.

  1. Fixed Token Chunking. I split documents every 512 tokens. This destroyed context. An API explanation would split mid-sentence. The LLM received fragments and gave garbage answers. The fix: use semantic chunking.
  • Split by natural boundaries like paragraphs or headers.
  • Use parent-document retrieval.
  • Create small child chunks for search.
  • Return the full parent document to the LLM.
  • Add 10-20% overlap between chunks.
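A minimal sketch of the chunking bullets above, assuming plain-text documents where a blank line marks a paragraph or header boundary. The names `split_sections`, `child_chunks`, `build_index`, and `retrieve_parent` are hypothetical, sizes are counted in words rather than tokens, and the word-overlap search is a toy stand-in for a real vector search.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    parent_id: int  # index of the parent section this chunk was cut from
    text: str


def split_sections(doc: str) -> list[str]:
    """Split on blank lines, a stand-in for paragraph/header boundaries."""
    return [s.strip() for s in re.split(r"\n\s*\n", doc) if s.strip()]


def child_chunks(section: str, size: int = 40, overlap: float = 0.15) -> list[str]:
    """Cut a section into word windows that overlap by `overlap` (10-20%)."""
    words = section.split()
    step = max(1, round(size * (1 - overlap)))
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the section
    return chunks


def build_index(doc: str) -> tuple[list[str], list[Chunk]]:
    """Return (parents, children); children are what gets searched."""
    parents = split_sections(doc)
    children = [Chunk(i, c) for i, p in enumerate(parents) for c in child_chunks(p)]
    return parents, children


def retrieve_parent(query: str, parents: list[str], children: list[Chunk]) -> str:
    """Toy search: pick the child sharing the most words, return its parent."""
    q = set(query.lower().split())
    best = max(children, key=lambda c: len(q & set(c.text.lower().split())))
    return parents[best.parent_id]
```

The search runs over the small children, but the LLM is handed `parents[best.parent_id]`, so it sees the whole section instead of a fragment.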
  2. Bad Hybrid Search Weights. I used a 50/50 split for vector and keyword search. This failed for technical docs. Technical users need exact keyword matches. The fix: use dynamic weights.
  • Factual queries: 35% vector, 65% keyword.
  • Semantic queries: 75% vector, 25% keyword.
  • General queries: 60% vector, 40% keyword.
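The weights above could be applied roughly like this. This is a sketch, assuming both scores are already normalized to the 0-1 range. The `classify_query` heuristic is a hypothetical placeholder (the source does not say how queries are classified); a real system might use a small classifier or an LLM instead.

```python
# (vector_weight, keyword_weight) per query type, taken from the list above.
WEIGHTS = {
    "factual": (0.35, 0.65),
    "semantic": (0.75, 0.25),
    "general": (0.60, 0.40),
}


def classify_query(query: str) -> str:
    """Hypothetical heuristic: identifiers, digits, or quotes suggest a factual lookup."""
    if any(ch in query for ch in '_()"') or any(ch.isdigit() for ch in query):
        return "factual"
    if query.lower().startswith(("why", "how", "explain")):
        return "semantic"
    return "general"


def hybrid_score(vector_score: float, keyword_score: float, query_type: str) -> float:
    """Weighted sum of the two scores; assumes both are normalized to 0-1."""
    vector_weight, keyword_weight = WEIGHTS[query_type]
    return vector_weight * vector_score + keyword_weight * keyword_score
```

For a query like "ef_construction default", the underscore routes it to the factual weights, so an exact keyword match counts for more than a loose semantic neighbor.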
  3. Over-tuning HNSW Parameters. I set ef_construction to the maximum. This crashed my server. It used all available RAM. The fix: use appropriate parameters.
  • Set M between 8 and 32.
  • Set ef_construction to 200.
  • Set ef_search to 50.
  • Memory usage dropped 70%.
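One way to keep these values from drifting is to hold them in a single validated config object. The sketch below is an assumption, not the setup from the post: `HnswParams` and `estimate_index_bytes` are hypothetical names, the default `m=16` is just a midpoint of the 8-32 range, and the size estimate is a rough rule of thumb (float32 vectors plus about 2 × M four-byte neighbor ids per vector) that ignores upper graph layers and build-time memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HnswParams:
    m: int = 16                 # assumed midpoint of the 8-32 range
    ef_construction: int = 200
    ef_search: int = 50

    def __post_init__(self) -> None:
        # Reject values outside the range recommended above.
        if not 8 <= self.m <= 32:
            raise ValueError(f"M={self.m} is outside the 8-32 range")


def estimate_index_bytes(n_vectors: int, dim: int, params: HnswParams) -> int:
    """Rough size of the stored index; build-time memory is not modeled."""
    vector_bytes = dim * 4          # float32 components
    link_bytes = params.m * 2 * 4   # about 2*M neighbor ids at the base layer
    return n_vectors * (vector_bytes + link_bytes)
```

In hnswlib, for example, these values would map to the `M` and `ef_construction` arguments of `init_index` and to `set_ef`; other vector stores expose similar settings under their own names.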
  4. General Embedding Models. I used a model trained on Wikipedia. My documents were technical engineering runbooks. The model did not understand my domain. The fix: use a model fine-tuned for technical or code content.

  5. No Query Rewriting. Users ask natural questions. Technical docs use formal terms. They do not match. The fix: add a lightweight LLM step to rewrite queries.

  • User asks: "why is my build slow"
  • System rewrites to: "CI pipeline performance optimization"
  • This improved recall by 40%.
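The rewrite step can be sketched as a thin wrapper around whatever LLM client is in use. Here the client is passed in as a plain callable, so nothing about a specific provider is assumed. The prompt wording, the name `rewrite_query`, and the choice to keep the original question as a fallback query are illustrative assumptions rather than details from the post.

```python
from typing import Callable

PROMPT = (
    "Rewrite the user question as a short search query using the formal "
    "terminology found in technical documentation. Return only the query.\n"
    "Question: {question}\nQuery:"
)


def rewrite_query(question: str, llm: Callable[[str], str]) -> list[str]:
    """Return the queries to search with: the rewrite first, then the original."""
    rewritten = llm(PROMPT.format(question=question)).strip()
    if not rewritten or rewritten.lower() == question.lower():
        return [question]  # nothing useful came back, search with the original
    return [rewritten, question]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, used only to show the flow."""
    return "CI pipeline performance optimization"


queries = rewrite_query("why is my build slow", fake_llm)
```

Searching with both strings costs one extra retrieval call but protects against a rewrite that drifts away from what the user meant.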
  6. Redundant Results. Retrieving top-10 chunks often gave the same paragraph three times. The LLM repeated itself. The fix: use Maximal Marginal Relevance (MMR) to ensure diversity in results.
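For reference, MMR picks results one at a time, scoring each candidate as λ × (similarity to the query) minus (1 − λ) × (highest similarity to anything already picked). A minimal pure-Python sketch follows, assuming embeddings are plain lists of floats and cosine similarity is the metric; the default `lambda_=0.7` is an illustrative value, not one from the post.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query_vec: list[float], doc_vecs: list[list[float]],
        k: int = 3, lambda_: float = 0.7) -> list[int]:
    """Return indices of up to k documents chosen by Maximal Marginal Relevance."""
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    candidates = list(range(len(doc_vecs)))
    selected: list[int] = []

    def score(i: int) -> float:
        # Penalize a candidate by its closest match among the picks so far.
        redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                         default=0.0)
        return lambda_ * relevance[i] - (1 - lambda_) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

Given two identical chunks and one different but still relevant chunk, the second pick skips the duplicate because its redundancy penalty is close to 1. Lowering `lambda_` pushes harder toward diversity; raising it toward 1 falls back to plain relevance ranking.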

  7. Testing the Wrong Thing. I tested the whole pipeline at once. I did not know if the problem was retrieval or the LLM. The fix: evaluate retrieval separately from generation.

  • Track hit rate.
  • Track Mean Reciprocal Rank (MRR).
  • Build a test set of 100 query-document pairs.
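Both metrics are short to compute once the test set exists. The sketch below assumes each query has exactly one relevant document id and that `results` maps each query to the ranked list of ids the retriever returned. The function names and the cutoff `k=5` are illustrative choices.

```python
def hit_rate(results: dict[str, list[str]], relevant: dict[str, str], k: int = 5) -> float:
    """Fraction of queries whose relevant document appears in the top k."""
    hits = sum(1 for query, docs in results.items() if relevant[query] in docs[:k])
    return hits / len(results)


def mean_reciprocal_rank(results: dict[str, list[str]], relevant: dict[str, str]) -> float:
    """Average of 1/rank of the relevant document; a miss contributes 0."""
    total = 0.0
    for query, docs in results.items():
        if relevant[query] in docs:
            total += 1.0 / (docs.index(relevant[query]) + 1)
    return total / len(results)


# Tiny made-up example: three queries instead of the 100 suggested above.
relevant = {"q1": "d1", "q2": "d2", "q3": "d3"}
results = {
    "q1": ["d1", "d2"],        # relevant doc at rank 1
    "q2": ["d5", "d2", "d9"],  # relevant doc at rank 2
    "q3": ["d7", "d8"],        # relevant doc missing
}
```

Because neither function calls the LLM, a drop in these numbers points at retrieval, and a bad answer with good numbers points at generation.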

Results after the fixes:

  • Answer relevance: 45% to 85%
  • Latency: 3.2s to 1.8s
  • Monthly cost: $180 to $95

Fix chunking first. Then the weights. Then embedding quality.

Source: https://dev.to/kollittle/i-spent-500-on-rag-infrastructure-before-realizing-these-7-mistakes-were-killing-my-results-iph