Building an Internal AI Chatbot: Lessons Learned
Internal docs are often a mess. My team had too many pages. New hires struggled to find answers. I built an AI chatbot to solve this.
It took two months of testing. Here is what I learned.
First, I used GPT-4 with fixed-size chunks. It worked, but the cost was too high. Then I tried local models. They struggled with our technical jargon.
The secret is better retrieval. A bigger model is not the answer.
What worked:
- Semantic chunking. I split text at sentence boundaries so each chunk kept its context.
- Hybrid search. I combined vector search with BM25. Vectors capture meaning; BM25 catches exact keywords.
- Reranking. I added a cross-encoder to reorder the retrieved results.
- Honest fallbacks. The bot says "I don't know" when it is unsure.
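The post doesn't include code, so here is a minimal Python sketch of the sentence-boundary chunking idea. It assumes a simple regex sentence splitter and a hypothetical `max_chars` budget; docs with abbreviations, lists, or code blocks would need a sturdier splitter. Many semantic chunkers also merge neighbouring sentences by embedding similarity, which this sketch does not attempt.

```python
import re

# Split after ".", "!" or "?" when followed by whitespace.
# Assumption: a naive rule; abbreviations like "e.g." will cause extra splits.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Return the non-empty sentences in text."""
    return [s.strip() for s in _SENTENCE_END.split(text.strip()) if s.strip()]


def chunk_by_sentences(text: str, max_chars: int = 500) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars where possible.

    max_chars is a hypothetical budget. Sentences are never cut in half,
    so a single sentence longer than max_chars becomes its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for sentence in split_sentences(text):
        # Joining with a space adds one character, except for the first sentence.
        added = len(sentence) + (1 if current else 0)
        if current and length + added > max_chars:
            chunks.append(" ".join(current))
            current, length = [], 0
            added = len(sentence)
        current.append(sentence)
        length += added
    if current:
        chunks.append(" ".join(current))
    return chunks


doc = "VPN access needs a token. Request one from IT. Tokens expire after 90 days."
for chunk in chunk_by_sentences(doc, max_chars=50):
    print(chunk)
# VPN access needs a token. Request one from IT.
# Tokens expire after 90 days.
```

Packing whole sentences keeps each chunk readable on its own, at the cost of uneven chunk sizes.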
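Hybrid search can be sketched the same way. The post doesn't say how the two result lists were merged; this sketch assumes Reciprocal Rank Fusion (RRF), one common choice, and takes the vector scores as input because the embedding model isn't named. The BM25 parameters `k1=1.5` and `b=0.75` are commonly used values, not settings from the post.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every tokenized doc for a tokenized query."""
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: how many docs contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            norm = freq + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank(scores: list[float]) -> list[int]:
    """Doc indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def rrf_fuse(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Reciprocal Rank Fusion (assumed merge method): each list adds 1 / (k + rank)."""
    fused: dict[int, float] = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + position)
    return sorted(fused, key=lambda d: fused[d], reverse=True)


def hybrid_search(query: list[str], docs: list[list[str]],
                  dense_scores: list[float], top_k: int = 5) -> list[int]:
    """Fuse keyword (BM25) and vector rankings into one list of doc indices.

    dense_scores is assumed to come from an embedding model (not shown),
    e.g. cosine similarity between the query and each doc.
    """
    lexical = bm25_scores(query, docs)
    # Only docs that share at least one query term enter the keyword ranking.
    keyword_rank = [i for i in rank(lexical) if lexical[i] > 0]
    return rrf_fuse([keyword_rank, rank(dense_scores)])[:top_k]
```

Because RRF combines ranks rather than raw scores, BM25 scores and cosine similarities never need to share a scale; `k=60` is a commonly used default.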
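Reranking and the honest fallback fit together: once a cross-encoder has scored each (query, passage) pair, a low top score is a natural signal to answer "I don't know". This sketch takes the scorer as a parameter (sentence-transformers' `CrossEncoder.predict`, for example, scores lists of such pairs). `toy_overlap_score`, `min_score=0.5`, and the `generate` callback are hypothetical stand-ins for illustration.

```python
from typing import Callable, Optional, Sequence

FALLBACK = "I don't know. I couldn't find that in our docs."

# A scorer maps (query, passage) to a relevance score, e.g. a cross-encoder.
Scorer = Callable[[str, str], float]


def rerank(query: str, passages: Sequence[str], score_pair: Scorer,
           top_n: int = 3) -> list[tuple[str, float]]:
    """Score every candidate against the query and keep the best top_n."""
    scored = [(p, score_pair(query, p)) for p in passages]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


def select_context(query: str, passages: Sequence[str], score_pair: Scorer,
                   min_score: float = 0.5, top_n: int = 3) -> Optional[list[str]]:
    """Return reranked passages at or above min_score (hypothetical), or None."""
    ranked = rerank(query, passages, score_pair, top_n)
    context = [p for p, score in ranked if score >= min_score]
    return context or None


def answer(query: str, passages: Sequence[str], score_pair: Scorer,
           generate: Callable[[str, list[str]], str]) -> str:
    """Call the (hypothetical) LLM only when retrieval found something relevant."""
    context = select_context(query, passages, score_pair)
    if context is None:
        return FALLBACK  # honest fallback instead of a guess
    return generate(query, context)


def toy_overlap_score(query: str, passage: str) -> float:
    """Hypothetical stand-in for a cross-encoder: share of query words found."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0
```

Score scales differ between cross-encoder models, so a threshold like `min_score` would need tuning against an evaluation set.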
My top tips for you:
- Focus on chunking first.
- Use hybrid search for technical terms.
- Use small models for simple factual questions.
- Build an evaluation set with domain experts early on.
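An expert-written evaluation set also lets you measure retrieval on its own, before any generation happens. A minimal sketch, assuming each case pairs a question with the chunk IDs an expert marked as correct, and that `retrieve` is whatever search function is under test (the format, IDs, and fake results below are all hypothetical), computes hit rate@k and mean reciprocal rank (MRR):

```python
from typing import Callable, Sequence

# Hypothetical format: (expert-written question, IDs of chunks that answer it).
EvalCase = tuple[str, set[str]]
Retriever = Callable[[str], list[str]]


def hit_rate_at_k(cases: Sequence[EvalCase], retrieve: Retriever, k: int = 5) -> float:
    """Share of questions with at least one relevant chunk in the top k."""
    if not cases:
        raise ValueError("evaluation set is empty")
    hits = sum(1 for question, relevant in cases
               if relevant.intersection(retrieve(question)[:k]))
    return hits / len(cases)


def mean_reciprocal_rank(cases: Sequence[EvalCase], retrieve: Retriever, k: int = 5) -> float:
    """Average of 1/rank of the first relevant chunk (0 if none in the top k)."""
    if not cases:
        raise ValueError("evaluation set is empty")
    total = 0.0
    for question, relevant in cases:
        for position, chunk_id in enumerate(retrieve(question)[:k], start=1):
            if chunk_id in relevant:
                total += 1.0 / position
                break
    return total / len(cases)


# Hypothetical retriever output standing in for a real search function.
fake_results = {"How do I reset my VPN password?": ["vpn-03", "vpn-01"],
                "What is the holiday policy?": ["hr-07", "hr-02"]}
cases = [("How do I reset my VPN password?", {"vpn-01"}),
         ("What is the holiday policy?", {"hr-99"})]
print(hit_rate_at_k(cases, lambda q: fake_results[q], k=5))         # 0.5
print(mean_reciprocal_rank(cases, lambda q: fake_results[q], k=5))  # 0.25
```

Rerunning these numbers after each change to chunking, search, or reranking shows whether retrieval actually improved.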
Stop chasing the newest model. Your bottleneck is retrieval.
Optional learning community: https://t.me/GyaanSetuAi