Why My First RAG System Failed (and How I Fixed It)
I built a bot for internal documents. I used a vector database and an LLM. It looked good at first. Then it lied.
My first version had three big problems.
- It gave wrong numbers.
- It missed steps in long guides.
- It found the wrong documents.
I fixed these with two methods.
First: parent-child chunking. I split each document into small child chunks for searching. When a child chunk matched, I gave the LLM the larger parent section it came from. The LLM saw the full picture instead of a fragment.
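Here is a minimal sketch of the parent-child idea. The post does not say how the chunks were actually cut, so the fixed-size word windows, the `Child` class, and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Child:
    text: str
    parent_id: int  # index of the parent section this chunk was cut from


def build_children(sections: list[str], child_size: int = 40) -> list[Child]:
    """Cut every parent section into windows of `child_size` words (assumed scheme)."""
    children = []
    for parent_id, section in enumerate(sections):
        words = section.split()
        for start in range(0, len(words), child_size):
            window = " ".join(words[start:start + child_size])
            children.append(Child(window, parent_id))
    return children


def expand_to_parents(hits: list[Child], sections: list[str]) -> list[str]:
    """Swap matched children for their parent sections, de-duplicated, order kept."""
    seen: list[int] = []
    for hit in hits:
        if hit.parent_id not in seen:
            seen.append(hit.parent_id)
    return [sections[i] for i in seen]
```

Only the children get embedded and searched. `expand_to_parents` runs after the search, so two hits from the same section send that section to the LLM once.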
Second: hybrid search. I combined vector search with keyword matching. Keyword matching catches exact terms like "admin password" that vector search alone can miss.
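One common way to merge a vector ranking with a keyword ranking is reciprocal rank fusion (RRF). The post does not say which fusion method was used, so treat this as one possible sketch. The function name and the chunk IDs in the usage note are made up, and `k = 60` is just a commonly used default.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several best-first ID lists into one list, best first.

    Each list adds 1 / (k + rank) to the score of every ID it contains,
    so IDs ranked well by more than one retriever rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda chunk_id: scores[chunk_id], reverse=True)
```

For example, `reciprocal_rank_fusion([["c1", "c2", "c3"], ["c2", "c4"]])` puts `c2` first because both retrievers returned it. RRF only looks at ranks, so the vector scores and keyword scores never need to be on the same scale.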
My new pipeline:
- User asks a question.
- Hybrid search finds child chunks.
- System pulls parent sections.
- Reranker picks the top 3.
- GPT-4 writes the answer.
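The steps above can be wired together roughly like this. Every callable is a hypothetical stand-in passed in by the caller, since the post does not name the search engine, reranker, or client library. The prompt wording is also an assumption.

```python
from typing import Callable


def answer_question(
    question: str,
    search: Callable[[str], list[str]],             # hybrid search, returns child chunk IDs
    get_parent: Callable[[str], str],               # child chunk ID to parent section text
    rerank: Callable[[str, list[str]], list[str]],  # returns sections ordered best first
    generate: Callable[[str], str],                 # LLM call, e.g. a GPT-4 wrapper
    top_n: int = 3,
) -> str:
    child_ids = search(question)

    # Pull each child's parent section, skipping sections already collected.
    parents: list[str] = []
    for child_id in child_ids:
        section = get_parent(child_id)
        if section not in parents:
            parents.append(section)

    # Keep only the best few sections so the prompt stays focused.
    context = rerank(question, parents)[:top_n]

    prompt = (
        "Answer using only the context below.\n\n"
        + "\n\n".join(context)
        + f"\n\nQuestion: {question}"
    )
    return generate(prompt)
```

Passing the stages in as functions keeps each one swappable. You can test the flow with simple stubs before connecting a real database or model.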
This stopped the hallucinations. The bot found the right sections. It stopped guessing.
RAG is a system design problem. The embedding model is a small part. How you slice and retrieve data matters most.
My advice for you:
- Create a test set to measure progress.
- Monitor retrieval quality in production.
- Log the chunks the bot finds.
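A small sketch of the first and third points together: score retrieval against a hand-made test set and log what came back. The test-set shape (question, expected chunk ID), the `retrieve` callable, and the logger name are assumptions. Hit rate at k is only one of several metrics you could track.

```python
import logging

logger = logging.getLogger("rag.retrieval")  # hypothetical logger name


def hit_rate(test_set, retrieve, k: int = 3) -> float:
    """Fraction of questions whose expected chunk ID shows up in the top k results.

    test_set: list of (question, expected_chunk_id) pairs written by hand.
    retrieve: function that maps a question to a best-first list of chunk IDs.
    """
    if not test_set:
        return 0.0
    hits = 0
    for question, expected_id in test_set:
        retrieved = retrieve(question)[:k]
        # Log the chunks so a bad answer can be traced back to retrieval.
        logger.info("question=%r retrieved=%s", question, retrieved)
        if expected_id in retrieved:
            hits += 1
    return hits / len(test_set)
```

Run it before and after every chunking or search change. If the number drops, the change hurt retrieval even if a few sample answers still look fine.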
What is your chunking strategy?
Source: https://dev.to/__c1b9e06dc90a7e0a676b/why-my-first-rag-system-hallucinated-and-how-i-fixed-it-cha