๐—ช๐—ต๐˜† ๐— ๐˜† ๐—™๐—ถ๐—ฟ๐˜€๐˜ ๐—ฅ๐—”๐—š ๐—ฆ๐˜†๐˜€๐˜๐—ฒ๐—บ ๐—™๐—ฎ๐—ถ๐—น๐—ฒ๐—ฑ (๐—ฎ๐—ป๐—ฑ ๐—›๐—ผ๐˜„ ๐—œ ๐—™๐—ถ๐˜…๐—ฒ๐—ฑ ๐—œ๐˜)

I built a bot for internal documents. I used a vector database and an LLM. It looked good at first. Then it lied.

My first version had three big problems.

I fixed these with two methods.

First. Parent-child chunking. I split data into small child chunks for searching. I gave the LLM the larger parent section for context. The LLM saw the full picture.
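A minimal sketch of parent-child chunking in Python. It is an illustration under assumptions, not the exact code behind this bot. The word-overlap `score` is a toy stand-in for vector similarity. The names `ChildChunk`, `build_index`, and `retrieve_parents` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChildChunk:
    text: str
    parent_id: int  # index of the parent section this chunk was cut from


def build_index(sections: list[str], child_size: int = 40) -> list[ChildChunk]:
    """Split every parent section into small word-window child chunks."""
    children = []
    for parent_id, section in enumerate(sections):
        words = section.split()
        for start in range(0, len(words), child_size):
            piece = " ".join(words[start:start + child_size])
            children.append(ChildChunk(piece, parent_id))
    return children


def score(query: str, text: str) -> int:
    """Toy relevance: count chunk words that appear in the query.

    Assumption: a real system would compare embeddings here instead.
    """
    query_words = set(query.lower().split())
    return sum(1 for word in text.lower().split() if word in query_words)


def retrieve_parents(query: str, sections: list[str],
                     children: list[ChildChunk], top_k: int = 2) -> list[str]:
    """Search over child chunks, but return the full parent sections."""
    ranked = sorted(children, key=lambda c: score(query, c.text), reverse=True)
    parent_ids: list[int] = []
    for child in ranked[:top_k]:
        if child.parent_id not in parent_ids:  # avoid sending a parent twice
            parent_ids.append(child.parent_id)
    return [sections[i] for i in parent_ids]
```

Search runs on the small chunks, so matches are precise. The prompt gets the whole parent section, so the LLM is not answering from a fragment.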

Second. Hybrid search. I combined vector search with keyword matching. Keyword matching catches exact terms like "admin password" that vector search alone can miss.
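A minimal sketch of one way to combine the two result lists in Python. It uses reciprocal rank fusion, which is my choice for illustration; the post does not say which fusion method was used. The vector ranking is assumed to come from your vector database, and `keyword_rank` is a toy substitute for a real keyword engine such as BM25.

```python
def keyword_rank(query: str, docs: dict[str, str]) -> list[str]:
    """Rank doc ids by how often the query terms occur in the text.

    Documents with zero hits are left out of the ranking.
    """
    terms = query.lower().split()
    hits = {
        doc_id: sum(text.lower().count(term) for term in terms)
        for doc_id, text in docs.items()
    }
    ranked = sorted(hits, key=lambda doc_id: hits[doc_id], reverse=True)
    return [doc_id for doc_id in ranked if hits[doc_id] > 0]


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists: each list adds 1 / (k + rank) per doc."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical documents and a hypothetical vector-search ranking.
docs = {
    "a": "General security guidelines for staff accounts.",
    "b": "The admin password is rotated every 90 days.",
    "c": "Cafeteria menu for the week.",
}
vector_ranking = ["a", "b", "c"]  # assumed output of a vector database
fused = reciprocal_rank_fusion([vector_ranking, keyword_rank("admin password", docs)])
```

A document that appears in both lists collects score from both, so an exact keyword hit can lift it above a result that only looked similar in embedding space.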

My new pipeline:

This stopped the hallucinations. The bot found the right sections. It stopped guessing.

RAG is a system design problem. The embedding model is a small part. How you slice and retrieve the data matters most.

My advice for you:

What is your chunking strategy?

Source: https://dev.to/__c1b9e06dc90a7e0a676b/why-my-first-rag-system-hallucinated-and-how-i-fixed-it-cha

Optional learning community: https://t.me/GyaanSetuAi