𝗥𝗔𝗚 𝗶𝗻 𝟴 𝗟𝗮𝘆𝗲𝗿𝘀: 𝗙𝗿𝗼𝗺 𝗧𝗼𝗸𝗲𝗻𝘀 𝘁𝗼 𝗣𝗿𝗼𝗱𝘂𝗰𝘁𝗶𝗼𝗻

You ship a RAG system. A week later, it breaks. The answers are confident, the citations look real, but the conclusions are wrong. Your logs show nothing.

I hit this wall many times. I realized RAG is not one step. It is eight layers. Each layer is a place where things go wrong.

If you build an AI assistant for engineers, bad answers cost time and money. Use this framework to build systems that actually work.

Layer 1: Tokenization

Before a model reads a word, it converts the text to tokens: small units such as sub-words. A 512-token chunk is not 512 words, and technical jargon like identifiers and error codes fragments into many tokens. If a chunk exceeds the embedding model's input limit, many pipelines silently cut off the end of the text. If the fix was in that tail, you lose the fix.
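To see why word counts mislead, here is a minimal sketch that reports what a hard cut would drop instead of letting it happen silently. The tokenizer is a toy stand-in (words cut into pieces of at most 4 characters), not a real BPE or WordPiece vocabulary, so real counts will differ; in practice you would count with your embedding model's own tokenizer. `toy_tokenize`, `check_budget`, and the sample chunk are hypothetical.

```python
import re


def toy_tokenize(text):
    """Stand-in sub-word tokenizer (NOT a real model vocabulary): splits text
    into words and punctuation, then cuts each word into pieces of at most
    4 characters, so long identifiers become several tokens."""
    tokens = []
    for word in re.findall(r"\w+|[^\w\s]", text):
        tokens.extend(word[i:i + 4] for i in range(0, len(word), 4))
    return tokens


def check_budget(text, max_tokens, tokenize=toy_tokenize):
    """Report what a hard truncation at max_tokens would drop,
    instead of letting it happen silently."""
    tokens = tokenize(text)
    dropped = tokens[max_tokens:]
    return len(dropped) == 0, len(tokens), dropped


# Illustrative chunk: the fix sits at the end, where truncation bites.
chunk = "Timeout in pool. Fix: raise POOL_MAX_CONNECTIONS to 50."
fits, n_tokens, dropped = check_budget(chunk, max_tokens=8)
print(len(chunk.split()), "words,", n_tokens, "tokens, fits:", fits)
print("would be cut:", dropped)
```

With this toy tokenizer the 8-word chunk becomes 17 tokens, and an 8-token limit cuts off everything after "Fix:", including the setting name.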

Layer 2: Chunking

Bad chunks ruin everything downstream. If you split a table in half, one chunk holds the headers and the other holds rows without them, so neither chunk answers the question.

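Use overlap between adjacent chunks so a sentence cut at a boundary still appears whole in one of them.
Use recursive splitting: try paragraph breaks first, then line breaks, then sentence breaks, so sentences stay together.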
Use parent-child chunking. Index small chunks for precision, but give the LLM the large parent chunk for context.
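The three chunking techniques above can be sketched in plain Python. This is a minimal version under stated assumptions: lengths are counted in characters rather than tokens, the separator list is one common choice, and `recursive_split`, `add_overlap`, and `parent_child` are hypothetical helper names, not a library API.

```python
SEPARATORS = ["\n\n", "\n", ". ", " "]  # paragraph, line, sentence, word


def recursive_split(text, max_len, seps=SEPARATORS):
    """Split text into chunks of at most max_len characters, preferring the
    coarsest separator that works. Joining the chunks gives back the text."""
    if len(text) <= max_len:
        return [text]
    for i, sep in enumerate(seps):
        if sep not in text:
            continue
        parts = text.split(sep)
        # Re-attach the separator so no characters are lost.
        pieces = [p + sep for p in parts[:-1]] + [parts[-1]]
        chunks, current = [], ""
        for piece in pieces:
            if len(current) + len(piece) <= max_len:
                current += piece  # keep packing pieces into the current chunk
                continue
            if current:
                chunks.append(current)
                current = ""
            if len(piece) <= max_len:
                current = piece
            else:
                # This piece alone is too long: split it with finer separators.
                chunks.extend(recursive_split(piece, max_len, seps[i + 1:]))
        if current:
            chunks.append(current)
        return chunks
    # No separator left: fall back to a hard character cut.
    return [text[j:j + max_len] for j in range(0, len(text), max_len)]


def add_overlap(chunks, overlap):
    """Prefix each chunk with the last `overlap` characters of the previous one."""
    out = []
    for k, chunk in enumerate(chunks):
        prefix = chunks[k - 1][-overlap:] if k > 0 and overlap > 0 else ""
        out.append(prefix + chunk)
    return out


def parent_child(text, parent_len, child_len):
    """Index small child chunks; keep a pointer to the parent each came from."""
    children, parents = [], []
    for parent in recursive_split(text, parent_len):
        for child in recursive_split(parent, child_len):
            children.append(child)
            parents.append(parent)  # hand this to the LLM when the child matches
    return children, parents


doc = ("First para. Second sentence here.\n\n"
       "Another paragraph that is a bit longer. It has two sentences.")
print(recursive_split(doc, 40))
print(add_overlap(recursive_split(doc, 40), 10))
```

Note that overlap makes chunks longer than `max_len` by up to `overlap` characters, so leave that headroom in your token budget. In the parent-child case you embed and search `children`, then pass the matching entry of `parents` to the LLM.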

Layer 3: Embeddings

Embeddings turn text into vectors of numbers.

Sparse representations (such as BM25 keyword weights) are great for exact keywords like error codes.
Dense embeddings are great for meaning and synonyms.
Use both.
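The sparse side is simple enough to write out. Below is a minimal in-memory Okapi BM25 scorer, a sketch rather than a production index; `k1=1.5` and `b=0.75` are common default values, not tuned ones, and the three documents are made up. The dense side is not shown because it needs a trained embedding model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; \w keeps underscores, so ERR_CONN_RESET stays one term.
    return re.findall(r"\w+", text.lower())


class BM25:
    """Okapi BM25 keyword scoring over a small in-memory corpus."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(d)) for d in docs]  # term frequencies per doc
        self.lengths = [sum(c.values()) for c in self.docs]
        self.avgdl = sum(self.lengths) / len(self.docs)
        n = len(self.docs)
        df = Counter(term for c in self.docs for term in c)  # docs containing each term
        # Rare terms (like a specific error code) get a high IDF weight.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query, i):
        doc, dl = self.docs[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            tf = doc[term]  # Counter returns 0 for missing terms
            if tf == 0:
                continue
            norm = tf + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * tf * (self.k1 + 1) / norm
        return total

    def rank(self, query):
        return sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)


docs = [
    "Checkout fails with ERR_CONN_RESET when the proxy drops idle connections.",
    "Requests become slow when the connection pool is exhausted.",
    "Rotate TLS certificates before they expire.",
]
bm25 = BM25(docs)
print(bm25.rank("ERR_CONN_RESET"))
```

An exact error code scores only on documents that contain it verbatim, which is why BM25 complements a dense model that might rank a merely similar symptom higher.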

Layer 4: Vector Indexing

Comparing a query against millions of vectors one by one takes too long. You need Approximate Nearest Neighbor (ANN) indexing. Use HNSW (Hierarchical Navigable Small World graphs) to trade a tiny bit of recall for massive speed. Aim for sub-100ms search latency.
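HNSW itself is too involved for a short sketch, so the example below swaps in a simpler ANN method, random-hyperplane locality-sensitive hashing (LSH), purely to show the same trade: score a small bucket of candidates instead of every vector, and accept that some true neighbors are missed. For production HNSW, libraries such as hnswlib or FAISS provide implementations. `LSHIndex` and `exact_search` are hypothetical names.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class LSHIndex:
    """Random-hyperplane LSH: vectors on the same side of every hyperplane share a bucket."""

    def __init__(self, dim, n_planes=8, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
        self.buckets = defaultdict(list)  # bucket key -> list of vector IDs
        self.vectors = []

    def _key(self, v):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0 for plane in self.planes)

    def add(self, v):
        self.buckets[self._key(v)].append(len(self.vectors))
        self.vectors.append(v)

    def search(self, q, k=5):
        # Score only the query's bucket: far fewer comparisons than a full scan,
        # at the cost of missing true neighbors that landed in other buckets.
        candidates = self.buckets.get(self._key(q), [])
        return sorted(candidates, key=lambda i: cosine(q, self.vectors[i]), reverse=True)[:k]


def exact_search(vectors, q, k=5):
    """Brute-force baseline: compares the query with every vector."""
    return sorted(range(len(vectors)), key=lambda i: cosine(q, vectors[i]), reverse=True)[:k]


# Usage: 2,000 random 32-dimensional vectors, 8 hyperplanes (up to 256 buckets).
rng = random.Random(1)
data = [[rng.gauss(0, 1) for _ in range(32)] for _ in range(2000)]
index = LSHIndex(dim=32, n_planes=8)
for vec in data:
    index.add(vec)
query = data[0]
approx = index.search(query)
exact = exact_search(data, query)
print("overlap with exact top-5:", len(set(approx) & set(exact)))
```

More hyperplanes mean smaller buckets: faster search, lower recall. HNSW exposes a similar dial through its search breadth parameter.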

Layer 5: Retrieval Strategy

Do not rely on one method. Use hybrid search: run BM25 and dense retrieval side by side, then merge the two ranked lists. This catches both the exact error code and the general symptom.
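The layer does not say how to merge the two lists. One common choice is reciprocal rank fusion (RRF), sketched below; `k=60` is a commonly used default, and the document IDs are made up. RRF looks only at ranks, so BM25 scores and cosine similarities, which live on different scales, never need normalizing.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of doc IDs: each appearance at rank r adds 1 / (k + r)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


bm25_hits = ["doc_err_code", "doc_a", "doc_b"]         # exact keyword matches
dense_hits = ["doc_symptom", "doc_err_code", "doc_c"]  # semantic matches
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
```

A document found by both retrievers, like `doc_err_code` here, rises to the top even if neither retriever ranked it first on its own.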

Layer 6: Reranking

This is the biggest quality jump. Retrieve 20 candidates with a fast retriever. Then use a cross-encoder, which reads the query and each candidate together, to score them precisely and keep the best few. This turns a "maybe" into a "correct" answer.
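The two-stage shape can be sketched with toy scorers. Both `toy_fast_rank` and `toy_cross_score` are hypothetical stand-ins: the first counts shared words and ignores order, and the second looks at query and document together and counts word pairs that appear side by side in both. Neither is a real model; only the retrieve-then-rerank structure is the point.

```python
import re


def words(text):
    return re.findall(r"\w+", text.lower())


def toy_fast_rank(query, docs):
    """Hypothetical first-stage retriever: ranks docs by shared words (cheap, order-blind)."""
    q = set(words(query))
    return sorted(range(len(docs)), key=lambda i: len(q & set(words(docs[i]))), reverse=True)


def toy_cross_score(query, doc):
    """Toy stand-in for a cross-encoder: looks at query and doc together and
    counts query word pairs that also appear side by side in the doc."""
    q, d = words(query), words(doc)
    return len(set(zip(q, q[1:])) & set(zip(d, d[1:])))


def retrieve_then_rerank(query, docs, fast_rank, cross_score, n_candidates=20, top_k=5):
    # Stage 1: cheap retrieval over everything, keep n_candidates.
    candidates = fast_rank(query, docs)[:n_candidates]
    # Stage 2: expensive joint scoring, but only on the short candidate list.
    return sorted(candidates, key=lambda i: cross_score(query, docs[i]), reverse=True)[:top_k]


docs = [
    "The pending state of the pod is stuck in review.",
    "A pod stuck in pending usually lacks CPU or memory.",
    "Rotate TLS certificates before they expire.",
]
query = "pod stuck in pending state"
print(toy_fast_rank(query, docs))
print(retrieve_then_rerank(query, docs, toy_fast_rank, toy_cross_score))
```

Here the fast stage puts the word-salad document first, and the joint scorer moves the actually relevant one up. In a real pipeline, `cross_score` would call a trained cross-encoder, for example one loaded with the `CrossEncoder` class from the sentence-transformers library, whose `predict` method scores (query, document) pairs.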

Layer 7: Query Rewriting

Users ask messy questions.

Multi-query: Generate several versions of the question to find more matches.
HyDE (Hypothetical Document Embeddings): Generate a hypothetical answer first, embed it, then search for documents whose embeddings are close to that answer.
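Both rewriting strategies are mostly orchestration around an LLM call. The sketch below assumes hypothetical callables you would supply: `generate` (prompt to text), `search` (query to ranked doc IDs), `embed` (text to vector), and `vector_search` (vector to ranked doc IDs). The prompts are illustrative, and merging the multi-query results with reciprocal rank fusion is one reasonable choice, not the only one.

```python
from collections import defaultdict


def multi_query_search(question, generate, search, n_variants=3, k=60):
    """Multi-query: search with the question plus LLM-written rephrasings,
    then merge the ranked lists with reciprocal rank fusion."""
    prompt = f"Rewrite this question {n_variants} different ways, one per line:\n{question}"
    variants = [line.strip() for line in generate(prompt).splitlines() if line.strip()]
    scores = defaultdict(float)
    for q in [question] + variants[:n_variants]:
        for rank, doc_id in enumerate(search(q), start=1):
            scores[doc_id] += 1.0 / (k + rank)  # docs found by many variants rise
    return sorted(scores, key=scores.get, reverse=True)


def hyde_search(question, generate, embed, vector_search):
    """HyDE: embed a hypothetical answer rather than the question itself,
    because answer-shaped text tends to sit closer to real answer passages."""
    fake_answer = generate(f"Write a short passage that answers: {question}")
    return vector_search(embed(fake_answer))


# Stubs standing in for an LLM, an embedder, and a search backend.
def fake_generate(prompt):
    if prompt.startswith("Rewrite"):
        return "why is the pod pending\nscheduler cannot place pod\n"
    return "The pod is pending because no node has enough memory."


def fake_search(query):
    return ["doc_scheduler"] if "scheduler" in query else ["doc_pending"]


print(multi_query_search("pod stuck", fake_generate, fake_search))
print(hyde_search("Why is my pod pending?", fake_generate, str.lower,
                  lambda vec: ["doc_memory"] if "memory" in vec else []))
```

The hypothetical answer may be factually wrong; HyDE only uses it as a search key, and the final answer still comes from the retrieved documents.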

Layer 8: Evaluation

Do not wait for a crash to test. Use RAGAS to measure:

Faithfulness: Is every claim in the answer supported by the retrieved context?
Answer relevancy: Does the answer address the question?
Context recall: Did retrieval find the information the correct answer needs?
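RAGAS computes these metrics with LLM calls, so results depend on the judge model. As a cheap, deterministic complement for the retrieval side, here is a sketch of recall@k against hand-labeled relevant chunk IDs. This is a simpler stand-in for the "did you find the right data" question, not RAGAS's own context recall, and the eval set and `stub_retrieve` are hypothetical.

```python
def recall_at_k(retrieved, relevant, k):
    """Share of the relevant chunk IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        raise ValueError("each question needs at least one relevant chunk ID")
    hits = set(retrieved[:k]) & set(relevant)
    return len(hits) / len(set(relevant))


def mean_recall_at_k(eval_set, retrieve, k=5):
    """Average recall@k over (question, relevant_ids) pairs."""
    scores = [recall_at_k(retrieve(question), relevant, k) for question, relevant in eval_set]
    return sum(scores) / len(scores)


# Hypothetical labeled eval set and a stub retriever.
eval_set = [
    ("Why does checkout return ERR_CONN_RESET?", ["chunk_17", "chunk_18"]),
    ("How do I rotate TLS certificates?", ["chunk_42"]),
]


def stub_retrieve(question):
    return ["chunk_17", "chunk_03", "chunk_42"]


print(mean_recall_at_k(eval_set, stub_retrieve, k=2))
print(mean_recall_at_k(eval_set, stub_retrieve, k=3))
```

Running this on every change to chunking or retrieval settings catches regressions before users do.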

If your RAG is failing, check your chunking, your hybrid search, and your evaluation first.

Source: https://dev.to/aashna_mahajan/rag-in-8-layers-from-tokens-to-production-39kf

Optional learning community: https://t.me/GyaanSetuAi