𝗜 𝗦𝗽𝗲𝗻𝘁 $𝟱𝟬𝟬 𝗼𝗻 𝗥𝗔𝗚 𝗜𝗻𝗳𝗿𝗮𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲 𝗕𝗲𝗳𝗼𝗿𝗲 𝗠𝗮𝗸𝗶𝗻𝗴 𝟳 𝗠𝗶𝘀𝘁𝗮𝗸𝗲𝘀

AI-assisted draft.

GyaanSetu Editorial · the day before yesterday · 2 min read

I built a RAG pipeline for private document search. It cost me $500 in compute and weeks of debugging. The results were bad. Users got irrelevant answers and queries were slow.

I audited the pipeline and found 7 common mistakes. Fixing them changed everything.

Fixed Token Chunking

I split documents by fixed token counts. This destroyed context. A sentence would split in half. The LLM received fragmented data and gave poor answers.

The fix: Use semantic chunking with parent-document retrieval.
Split by natural boundaries like paragraphs or headers.
Create small child chunks for search.
Return the full parent document to the LLM when a match occurs.
Add 10-20% overlap between chunks.
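
The steps above can be sketched in a few lines. This is a minimal illustration, not the pipeline from the post: it assumes paragraphs are separated by blank lines, counts words instead of tokens, and the names `ChildChunk`, `split_parents`, `split_children`, `build_index` and `retrieve_parent` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChildChunk:
    parent_id: int  # index of the parent paragraph this chunk came from
    text: str


def split_parents(document: str) -> list[str]:
    """Split on blank lines so every parent is a whole paragraph."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def split_children(parent_id: int, parent: str,
                   size: int = 40, overlap: float = 0.15) -> list[ChildChunk]:
    """Slide a word window over one parent; neighbours share `overlap` of a window."""
    words = parent.split()
    # Assumption: sizes are in words, not tokens, to keep the sketch stdlib-only.
    step = max(1, size - round(size * overlap))
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(ChildChunk(parent_id, " ".join(words[start:start + size])))
        if start + size >= len(words):
            break  # the window already reached the end of the paragraph
    return chunks


def build_index(document: str, size: int = 40, overlap: float = 0.15):
    """Return (parents, children); only the children would be embedded and searched."""
    parents = split_parents(document)
    children = [c for i, p in enumerate(parents)
                for c in split_children(i, p, size, overlap)]
    return parents, children


def retrieve_parent(parents: list[str], children: list[ChildChunk],
                    matched_child_index: int) -> str:
    """Search matches a small child chunk; the LLM receives the whole parent."""
    return parents[children[matched_child_index].parent_id]
```

Children never cross a paragraph boundary here, so a match always maps back to exactly one parent. In a real pipeline the window size would be measured with the embedding model's tokenizer.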

Default Search Weights

I used a 50/50 split for vector and keyword search. For technical docs, exact keywords matter more.

The fix: Use dynamic weights.
Factual queries: 35% vector, 65% keyword.
Semantic queries: 75% vector, 25% keyword.
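
One way to apply these weights is sketched below. The two weight pairs come from the list above; everything else is an assumption for illustration, in particular the regex heuristic that decides whether a query is "factual", the names `classify_query` and `hybrid_score`, and the premise that both scores are already normalised to the 0-1 range.

```python
import re

# (vector weight, keyword weight) per query type, as listed above.
WEIGHTS = {
    "factual": (0.35, 0.65),
    "semantic": (0.75, 0.25),
}

# Hypothetical heuristic: quoted strings, snake_case or dotted identifiers,
# and tokens containing digits suggest the user wants an exact match.
_EXACT_HINT = re.compile(r"\"[^\"]+\"|\w+_\w+|\w+\.\w+|\b\w*\d\w*\b")


def classify_query(query: str) -> str:
    """Label a query 'factual' or 'semantic' using the heuristic above."""
    return "factual" if _EXACT_HINT.search(query) else "semantic"


def hybrid_score(query: str, vector_score: float, keyword_score: float) -> float:
    """Blend two scores that are assumed to be normalised to [0, 1]."""
    w_vec, w_kw = WEIGHTS[classify_query(query)]
    return w_vec * vector_score + w_kw * keyword_score
```

For example, `classify_query("default value of ef_construction")` is treated as factual because of the underscore, while `classify_query("why is my build slow")` falls through to semantic. A small classifier or an LLM call could replace the regex if the heuristic proves too crude.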

Over-optimizing HNSW Parameters

I set ef_construction to the maximum value. On a large index, this used all my RAM and crashed my server.

The fix: Use appropriate HNSW settings.
Keep M between 8 and 32.
Set ef_construction to 200.
Set ef_search to 50.
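
These settings can be kept in one place and checked before an index is built. The sketch below is an assumption-laden example: `HnswSettings` is a hypothetical helper, the default `m=16` is simply one value inside the 8-32 range, and the keyword names returned by `init_kwargs` follow hnswlib's `init_index` call, which may not be the library used in the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HnswSettings:
    m: int = 16                 # any value in the 8-32 range recommended above
    ef_construction: int = 200  # value recommended above
    ef_search: int = 50         # value recommended above

    def validate(self, top_k: int) -> None:
        """Raise ValueError if the settings leave the recommended range."""
        if not 8 <= self.m <= 32:
            raise ValueError(f"M={self.m} is outside the 8-32 range")
        # HNSW needs a search-time candidate list at least as long as top_k.
        if self.ef_search < top_k:
            raise ValueError(f"ef_search={self.ef_search} is below top_k={top_k}")

    def init_kwargs(self, max_elements: int) -> dict:
        """Build-time arguments, named as in hnswlib's init_index (assumed library)."""
        return {
            "max_elements": max_elements,
            "ef_construction": self.ef_construction,
            "M": self.m,
        }
```

With hnswlib, usage would look roughly like `index.init_index(**settings.init_kwargs(n))` followed by `index.set_ef(settings.ef_search)`. Other vector stores expose the same knobs under different names.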

Wrong Embedding Models

I used a general model trained on web text. It did not understand my technical engineering docs.

The fix: Switch to a model fine-tuned for technical or code content.

Natural Language Mismatch

Users ask questions like "why is my build slow." Documentation uses terms like "CI pipeline optimization." There was zero overlap.

The fix: Add an LLM query rewrite step.
Rewrite the user query into technical terms before searching.
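
A rewrite step can be a thin wrapper around whatever LLM client is in use. In this sketch the client is passed in as a plain callable, so no specific API is assumed; the prompt wording, the names `rewrite_query` and `search_queries`, and the choice to search with both phrasings are illustrative assumptions, not details from the post.

```python
from typing import Callable

# Hypothetical prompt; tune the wording for your own documentation.
REWRITE_PROMPT = (
    "Rewrite the user's question as a short search query that uses the "
    "terminology found in technical documentation. "
    "Return only the rewritten query.\n\n"
    "Question: {question}\nRewritten query:"
)


def rewrite_query(question: str, llm: Callable[[str], str]) -> str:
    """Ask the LLM for a technical phrasing; fall back to the original if it returns nothing."""
    rewritten = llm(REWRITE_PROMPT.format(question=question)).strip()
    return rewritten or question


def search_queries(question: str, llm: Callable[[str], str]) -> list[str]:
    """Return the original question plus the rewrite, so a bad rewrite cannot hide good matches."""
    rewritten = rewrite_query(question, llm)
    return [question] if rewritten == question else [question, rewritten]
```

Here `llm` is any function that takes a prompt string and returns the model's text. Running retrieval on both strings and merging the results costs one extra search but keeps the user's own wording in play.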

Redundant Context

Retrieving the top 10 chunks often meant getting the same paragraph three times. This caused hallucinations.

The fix: Use Maximal Marginal Relevance (MMR) to ensure diversity in results.
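
MMR picks results one at a time, scoring each candidate as its relevance to the query minus its similarity to what has already been picked. Below is a small pure-Python sketch, assuming embeddings are plain lists of floats and cosine similarity is the metric. The default `lambda_=0.5` is an arbitrary starting point to tune, not a value from the post.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec: list[float], doc_vecs: list[list[float]],
        k: int = 5, lambda_: float = 0.5) -> list[int]:
    """Return indices of up to k documents, trading relevance against redundancy.

    lambda_ = 1.0 ranks purely by relevance; lower values favour diversity.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))

    def score(i: int) -> float:
        # Redundancy is the highest similarity to anything already selected.
        redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                         default=0.0)
        return lambda_ * relevance[i] - (1 - lambda_) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

When two of the retrieved vectors are identical, only one of them survives as long as a less similar alternative is available, which is exactly the duplicate-paragraph case described above.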

End-to-End Evaluation Only

I only checked the final answer. I did not know if the problem was retrieval or the LLM.

The fix: Evaluate retrieval separately.
Track hit rate and Mean Reciprocal Rank (MRR).
Build a test set of 100 query-document pairs.
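
Both metrics are only a few lines of code once the test set exists. The sketch below assumes each test query has exactly one expected document ID and that the retriever returns a ranked list of IDs per query. The function names are illustrative.

```python
def hit_rate(results: list[list[str]], expected: list[str], k: int = 5) -> float:
    """Fraction of queries whose expected document appears in the top k results."""
    hits = sum(1 for ranked, gold in zip(results, expected) if gold in ranked[:k])
    return hits / len(expected)


def mean_reciprocal_rank(results: list[list[str]], expected: list[str]) -> float:
    """Average of 1/rank of the expected document; a miss contributes 0."""
    total = 0.0
    for ranked, gold in zip(results, expected):
        if gold in ranked:
            total += 1.0 / (ranked.index(gold) + 1)
    return total / len(expected)


# Toy example: three queries, ranked document IDs returned for each.
results = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"]]
expected = ["a", "y", "missing"]
print(hit_rate(results, expected, k=3))         # 2 of 3 queries hit
print(mean_reciprocal_rank(results, expected))  # (1 + 1/2 + 0) / 3 = 0.5
```

A low hit rate points at retrieval (chunking, weights, embeddings). A high hit rate with poor answers points at the prompt or the LLM.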

Results after fixes:
• Answer relevance: 45% to 85%
• Query latency: 3.2s to 1.8s
• Monthly cost: $180 to $95

Fix chunking first. Then the search weights. Then embedding quality.

What is your biggest headache with RAG? Share it in the comments.

Source: https://dev.to/kollittle/i-spent-500-on-rag-infrastructure-before-realizing-these-7-mistakes-were-killing-my-results-iph

Optional learning community: https://t.me/GyaanSetuAi
