𝗧𝗲𝗮𝗰𝗵𝗶𝗻𝗴 𝗔 𝗥𝗲𝗿𝗮𝗻𝗸𝗲𝗿 𝗧𝗵𝗲 𝗟𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗢𝗳 𝗦𝗲𝗰𝘂𝗿𝗶𝘁𝘆 𝗧𝗶𝗰𝗸𝗲𝘁𝘀

📅1 week ago⏱1 min read

We improved our RAG pipeline by fine-tuning a reranker on security tickets. MRR@10 rose from 0.598 to 0.846, a relative increase of about 41 percent.

We kept the model architecture. We kept the embedding model. We trained the reranker on our own data.

We found training data in 142,000 closed tickets. Analysts often write "Refer to ticket #123". These notes are free relevance labels. Each one shows which tickets an analyst considered related.
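A minimal sketch of how such references could be pulled out of analyst notes. The source only quotes the phrasing "Refer to ticket #123", so the extra verbs in the pattern, the `extract_links` helper, and the sample notes are all hypothetical.

```python
import re

# Hypothetical pattern: only "Refer to ticket #123" comes from the post;
# the alternative verbs are illustrative guesses.
TICKET_REF = re.compile(
    r"\b(?:refer to|see|related to)\s+ticket\s*#(\d+)", re.IGNORECASE
)

def extract_links(ticket_id: int, note: str) -> list[tuple[int, int]]:
    """Return (source_ticket, referenced_ticket) pairs found in one note."""
    pairs = []
    for match in TICKET_REF.finditer(note):
        target = int(match.group(1))
        if target != ticket_id:  # ignore self-references
            pairs.append((ticket_id, target))
    return pairs

# Made-up notes keyed by ticket id.
notes = {
    501: "Same phishing kit as last month. Refer to ticket #123 for the IOC list.",
    502: "Closed as benign. No related activity.",
    503: "Refer to Ticket #123 and see ticket #501.",
}

links = [pair for tid, note in notes.items() for pair in extract_links(tid, note)]
print(links)
```

Each pair can then be treated as one positive (query ticket, relevant ticket) label.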

Our method:

We used regex to find ticket references.
We kept high-quality analyst links.
We added sibling tickets to grow the set.
We used hard negatives from the embedder.

Hard negatives matter most. These are tickets the embedder scores as similar but that are not actually relevant. Training the reranker on them teaches it the distinctions the embedder misses.
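One way hard-negative mining could look, as a sketch rather than the authors' implementation. The toy 3-dimensional vectors stand in for real embedder output, and `mine_hard_negatives` and its `k` parameter are hypothetical names.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mine_hard_negatives(query_id, embeddings, positives, k=2):
    """Return the k tickets most similar to the query that are not labeled relevant."""
    query_vec = embeddings[query_id]
    candidates = [
        (cosine(query_vec, vec), tid)
        for tid, vec in embeddings.items()
        if tid != query_id and tid not in positives
    ]
    candidates.sort(reverse=True)  # highest similarity first
    return [tid for _, tid in candidates[:k]]

# Toy vectors (assumption): a real pipeline would use the embedder's output.
embeddings = {
    501: [0.9, 0.1, 0.0],  # query ticket
    123: [0.8, 0.2, 0.1],  # analyst-linked positive
    777: [0.9, 0.2, 0.0],  # looks similar, but no analyst link
    888: [0.0, 0.1, 0.9],  # clearly unrelated, an easy negative
    999: [0.7, 0.0, 0.3],
}

hard = mine_hard_negatives(501, embeddings, positives={123}, k=2)
print(hard)
```

An unlinked ticket is not guaranteed to be irrelevant, so mined negatives like these may need spot checks before training.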

Use this approach if:

Your data uses specialized professional language.
You have hidden signals like links or clicks.
You already tuned your chunking.

Build your evaluation tool on day one. Find labels in your existing text.
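A small evaluation helper for MRR@10 might look like the sketch below. The `mrr_at_k` name and the sample rankings and labels are made up for illustration.

```python
def mrr_at_k(rankings, relevant, k=10):
    """Mean reciprocal rank of the first relevant item within the top k.

    rankings: query id -> ranked list of candidate ids (best first)
    relevant: query id -> set of ids labeled relevant
    """
    total = 0.0
    for query_id, ranked in rankings.items():
        labels = relevant.get(query_id, set())
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in labels:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings) if rankings else 0.0

# Made-up example: three queries with reranker output and analyst labels.
rankings = {
    501: [777, 123, 999],  # first relevant hit at rank 2
    503: [123, 501, 888],  # first relevant hit at rank 1
    504: [888, 999, 777],  # no relevant hit in the list
}
relevant = {501: {123}, 503: {123, 501}, 504: {42}}

score = mrr_at_k(rankings, relevant)
print(score)
```

Running the same function before and after fine-tuning, on a held-out set of labeled tickets, gives a like-for-like comparison.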

Source: https://dev.to/vinayiitkgp/teaching-a-reranker-the-language-of-security-tickets-41-mrr10-4mgk
