Teaching A Reranker The Language Of Security Tickets
We improved our RAG pipeline by fine-tuning a reranker on security tickets. MRR@10 rose from 0.598 to 0.846, a 41 percent relative increase.
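For readers who want to check the number: the standard definition of MRR@10 and the relative gain implied by the two scores can be written as below. Q is the set of evaluation queries and rank_q is the position of the first relevant result for query q. The notation is ours, not the article's.

```latex
\mathrm{MRR@10} = \frac{1}{|Q|} \sum_{q \in Q} \frac{1}{\mathrm{rank}_q},
\qquad \frac{1}{\mathrm{rank}_q} := 0 \ \text{if no relevant result appears in the top 10}

\frac{0.846 - 0.598}{0.598} = \frac{0.248}{0.598} \approx 0.415
```

So the 41 percent figure is a relative improvement. The absolute gain is about 0.25 MRR points.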
We kept the model architecture. We kept the embedding model. The only change was training the reranker on our own data.
We found training data in 142,000 closed tickets. Analysts often write notes such as "Refer to ticket #123". These notes are free relevance labels. Each one marks two tickets as related.
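A minimal sketch of how such notes could be turned into labels. The regex, the function name `extract_links`, and the sample note are assumptions for illustration. The post only shows the "Refer to ticket #123" pattern, and real notes will likely need more patterns.

```python
import re

# Assumed reference format: the word "ticket" followed by "#<digits>".
# This pattern is illustrative, not the one used in the original work.
TICKET_REF = re.compile(r"\bticket\s*#(\d+)", re.IGNORECASE)

def extract_links(ticket_id, notes):
    """Return the set of ticket IDs referenced in one ticket's analyst notes."""
    refs = {int(num) for num in TICKET_REF.findall(notes)}
    refs.discard(ticket_id)  # a ticket citing itself is not a useful label
    return refs

note = "Same phishing campaign. Refer to ticket #123 and Ticket #98."
print(sorted(extract_links(456, note)))
```

Each (source ticket, referenced ticket) pair can then serve as one positive example.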
Our method:
- We used regex to find ticket references.
- We kept high-quality analyst links.
- We added sibling tickets to grow the set.
- We used hard negatives from the embedder.
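The sibling step is not spelled out in the post. One plausible reading, assumed here, is that tickets referenced by the same parent ticket are treated as related to each other. The sketch below uses that assumption. The function name `expand_with_siblings` and the input shape are hypothetical.

```python
from itertools import combinations

def expand_with_siblings(links):
    """links: dict mapping a ticket ID to the set of ticket IDs it references.

    Returns unordered positive pairs as frozensets. Direct analyst links are
    kept, and tickets sharing a parent are paired as siblings (an assumption
    about what "sibling tickets" means).
    """
    positives = set()
    for parent, refs in links.items():
        others = sorted(r for r in refs if r != parent)
        for ref in others:
            positives.add(frozenset((parent, ref)))   # direct link
        for a, b in combinations(others, 2):
            positives.add(frozenset((a, b)))          # sibling link
    return positives

links = {1: {2, 3}, 4: {2}}
pairs = expand_with_siblings(links)
print(len(pairs))
```

Sibling pairs are noisier than direct links, so it may be worth sampling and reviewing some of them before training on them.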
Hard negatives matter most. These are tickets that look relevant to the embedder but are not. Training the reranker to tell them apart from true matches is what increases accuracy.
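A minimal sketch of hard negative mining, assuming ticket embeddings are already available as plain vectors. The function names, the toy vectors, and the choice of cosine similarity are illustrative assumptions, not details from the post.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mine_hard_negatives(query_id, embeddings, positives, k=2):
    """Return up to k tickets the embedder ranks highest for query_id
    that are not among its labeled positives."""
    q = embeddings[query_id]
    ranked = sorted(
        (tid for tid in embeddings if tid != query_id),
        key=lambda tid: cosine(q, embeddings[tid]),
        reverse=True,
    )
    return [tid for tid in ranked if tid not in positives][:k]

# Toy 2-d embeddings, made up for the example.
embeddings = {
    "T1": [1.0, 0.0],
    "T2": [0.9, 0.1],  # labeled positive for T1
    "T3": [0.8, 0.3],  # similar but unlabeled: a hard negative
    "T4": [0.0, 1.0],  # dissimilar: an easy negative
}
print(mine_hard_negatives("T1", embeddings, positives={"T2"}, k=1))
```

An unlabeled ticket is not guaranteed to be irrelevant, so some mined negatives may be missed positives. Filtering or spot-checking the top candidates is a common precaution.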
Use this approach if:
- Your data has a specific professional language.
- You have hidden signals like links or clicks.
- You have already tuned your chunking.
Build your evaluation tool on day one. Find labels in your existing text.
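An evaluation tool can start very small. The sketch below computes MRR@10 from ranked results and a set of known relevant tickets per query. The function name `mrr_at_k` and the data layout are assumptions for illustration.

```python
def mrr_at_k(rankings, relevant, k=10):
    """Mean reciprocal rank of the first relevant result within the top k.

    rankings: dict mapping query ID to an ordered list of candidate IDs.
    relevant: dict mapping query ID to the set of relevant candidate IDs.
    Queries with no relevant result in the top k contribute 0.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for query, ranked in rankings.items():
        gold = relevant.get(query, set())
        for rank, candidate in enumerate(ranked[:k], start=1):
            if candidate in gold:
                total += 1.0 / rank
                break
    return total / len(rankings)

rankings = {"q1": ["a", "b", "c"], "q2": ["x", "y", "z"]}
relevant = {"q1": {"b"}, "q2": {"w"}}
print(mrr_at_k(rankings, relevant))  # q1 hits at rank 2, q2 misses: 0.25
```

Running the same function on the embedder-only ranking and on the reranked list gives a before and after comparison on identical queries.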
Source: https://dev.to/vinayiitkgp/teaching-a-reranker-the-language-of-security-tickets-41-mrr10-4mgk
Optional learning community: https://t.me/GyaanSetuAi