Why My RAG App Kept Hallucinating and How I Fixed It

A few months ago, I demoed my RAG support bot. It told a colleague our refund policy was 30 days. Our actual policy is 14 days. The bot did not hesitate. It did not say it was unsure. It made up an answer with total confidence.

RAG should reduce hallucinations. My setup only moved them around. I learned five lessons while debugging this system.

  1. Stop using fixed character counts for chunks
     I used 1,000-character chunks with a slight overlap. Because the cut points ignored document structure, one chunk often mixed shipping rules with return rules, and the model blended the two sections into one wrong answer. Fix: I switched to semantic chunking, splitting documents at headings and paragraph boundaries so that related information stays together.
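As a rough sketch of splitting at headings and paragraphs (not the bot's actual code), the function below assumes Markdown-style documents where headings start with `#` and paragraphs are separated by blank lines. The name `chunk_by_structure`, the `max_chars` cap, and the sample text are all made up for illustration.

```python
import re


def chunk_by_structure(text: str, max_chars: int = 1000) -> list[dict]:
    """Split Markdown-style text into chunks that never cross a heading."""
    chunks = []
    heading = ""  # most recent heading seen
    buffer = []   # paragraphs collected for the current chunk

    def flush():
        # Emit the collected paragraphs as one chunk under the current heading.
        if buffer:
            chunks.append({"heading": heading, "text": "\n\n".join(buffer)})
            buffer.clear()

    # Blank lines separate blocks; a block starting with '#' opens a new section.
    for block in re.split(r"\n\s*\n", text.strip()):
        block = block.strip()
        if block.startswith("#"):
            flush()  # close the previous section before switching headings
            first_line, _, block = block.partition("\n")
            heading = first_line.lstrip("#").strip()
            block = block.strip()
        if not block:
            continue
        # Start a new chunk if this paragraph would push it past the size cap.
        if buffer and sum(len(p) for p in buffer) + len(block) > max_chars:
            flush()
        buffer.append(block)
    flush()
    return chunks


# Made-up sample document: two sections that must not be merged.
sample = """# Shipping
Orders ship within a few business days.

# Returns
Items can be returned within 14 days."""

for chunk in chunk_by_structure(sample):
    print(chunk["heading"], "->", chunk["text"])
```

Keeping the heading with each chunk also gives the retriever and the model a hint about which policy a passage belongs to.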

  2. Similarity does not mean relevance
     My retriever pulled the top 3 chunks by cosine similarity. A chunk can look similar to a question without containing the answer, and the model tends to treat everything in its context as true. Fix: I added a reranking step using a cross-encoder. I also started logging retrieval scores, which shows when the system has no real answer to offer.
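One way the rerank-and-log step could be wired up is sketched below. The scorer is passed in as a function that takes a list of (query, chunk) pairs and returns one score per pair. That is the input shape a cross-encoder typically expects (for example, the `predict` method of a sentence-transformers `CrossEncoder`), though which library the original bot used is not stated. The `word_overlap` scorer here is only a toy stand-in so the example runs without a model, and the logger name is hypothetical.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("rag.retrieval")  # hypothetical logger name


def rerank(
    query: str,
    chunks: Sequence[str],
    score_pairs: Callable[[list[tuple[str, str]]], Sequence[float]],
    top_k: int = 3,
) -> list[tuple[float, str]]:
    """Rescore candidate chunks against the query and keep the best top_k."""
    pairs = [(query, chunk) for chunk in chunks]
    scores = [float(s) for s in score_pairs(pairs)]
    ranked = sorted(zip(scores, chunks), key=lambda item: item[0], reverse=True)
    # Log every score so weak retrievals are visible later.
    for score, chunk in ranked:
        logger.info("score=%.3f chunk=%r", score, chunk[:60])
    return ranked[:top_k]


def word_overlap(pairs: list[tuple[str, str]]) -> list[float]:
    """Toy stand-in scorer: fraction of query words that appear in the chunk."""
    scores = []
    for query, chunk in pairs:
        q_words = set(query.lower().split())
        c_words = set(chunk.lower().split())
        scores.append(len(q_words & c_words) / max(len(q_words), 1))
    return scores


candidates = [
    "shipping takes two days",
    "refund window is 14 days for returns",
]
best = rerank("what is the refund window", candidates, word_overlap, top_k=1)
print(best[0][1])
```

Swapping `word_overlap` for a real cross-encoder changes only the `score_pairs` argument; the sorting and logging stay the same.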

  3. Tell the model it is okay to fail
     My original prompt was simple: "Use the context to answer the question." It gave the model no instructions for what to do when the context did not contain the answer, so the model filled the gaps with guesses. Fix: I added a specific instruction: "If the answer is not in the context, say you do not know." Hallucinations dropped immediately.
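A prompt builder with that escape hatch might look like the sketch below. Only the "say you do not know" instruction comes from the lesson above. The function name, the rest of the wording, and the layout are assumptions.

```python
def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a prompt that tells the model what to do when context is missing."""
    if context_chunks:
        context = "\n\n".join(context_chunks)
    else:
        context = "(no context retrieved)"
    return (
        "Answer the question using only the context below.\n"
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


print(build_prompt("What is the refund window?",
                   ["Items can be returned within 14 days."]))
```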

  4. Enforce a retrieval threshold
     Even with the new prompt, the model still fell back on general knowledge when retrieval failed. I was hoping the prompt would be enough, but hope is not a strategy. Fix: I set a hard score threshold. If the top retrieval score falls below it, the system stops and returns a fallback message instead of letting the model guess.
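The gate can be sketched in a few lines. It assumes `retrieve` returns (score, chunk) pairs where a higher score means a better match, and that `generate` wraps the model call. Both are hypothetical callables. The `0.5` default and the fallback wording are placeholders, since a sensible threshold depends on the scorer and has to be tuned against real queries.

```python
FALLBACK = "I could not find this in our documentation."  # placeholder wording


def answer_with_gate(question, retrieve, generate, min_score=0.5):
    """Call the model only when retrieval found something good enough."""
    hits = retrieve(question)  # expected: list of (score, chunk) pairs
    if not hits or max(score for score, _ in hits) < min_score:
        return FALLBACK  # the model is never called, so it cannot guess
    return generate(question, [chunk for _, chunk in hits])


# Stub retriever and generator, just to show both paths.
def weak_retrieve(question):
    return [(0.2, "unrelated chunk")]


def strong_retrieve(question):
    return [(0.9, "Items can be returned within 14 days.")]


def echo_generate(question, chunks):
    return f"Based on the docs: {chunks[0]}"


print(answer_with_gate("Do you ship to Mars?", weak_retrieve, echo_generate))
print(answer_with_gate("What is the refund window?", strong_retrieve, echo_generate))
```

Because the check runs in code before the model is called, it holds even when the model ignores the prompt instruction.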

  5. Test for failures, not just success
     I had only tested easy questions that I knew the documents covered. I ignored ambiguous queries and questions about missing information, and hallucinations live in those gaps. Fix: I built an evaluation set of trap questions, cases where the correct answer is not in the documents. I run these tests every time I make a change.
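A trap-question check can be as small as the sketch below. It assumes the bot is any callable that takes a question and returns a string, and that abstentions can be recognized by a few marker phrases. The questions and markers are invented examples, and plain string matching is a crude check that a real evaluation would likely need to refine.

```python
# Invented examples of questions the documents do not answer.
TRAP_QUESTIONS = [
    "Do you offer a lifetime warranty?",
    "Can I pay with cryptocurrency?",
]

# Phrases that count as the bot admitting it has no answer (assumed wording).
ABSTAIN_MARKERS = ("do not know", "don't know", "could not find")


def abstained(answer: str) -> bool:
    """Return True if the answer contains one of the abstention phrases."""
    text = answer.lower()
    return any(marker in text for marker in ABSTAIN_MARKERS)


def run_trap_eval(bot, questions) -> list[str]:
    """Return the trap questions the bot answered instead of abstaining."""
    return [q for q in questions if not abstained(bot(q))]


def honest_bot(question: str) -> str:
    return "I do not know."


def overconfident_bot(question: str) -> str:
    return "Yes, absolutely."


print(run_trap_eval(honest_bot, TRAP_QUESTIONS))
print(run_trap_eval(overconfident_bot, TRAP_QUESTIONS))
```

An empty list means every trap was handled. Any question that comes back is a regression to look at before shipping the change.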

RAG does not stop hallucinations entirely. It makes them controllable. My bot still does not know everything. But now, when it is unsure, it says so. That makes the tool usable.

Source: https://dev.to/pallavi_sharma_10c1a6f1da/why-my-rag-app-kept-hallucinating-and-how-i-fixed-it-3i10