𝗟𝗶𝗻𝗲𝗮𝗿 𝗘𝗻𝘀𝗲𝗺𝗯𝗹𝗲𝘀 𝗘𝗿𝗮𝘀𝗲 𝗟𝗟𝗠 𝗪𝗮𝘁𝗲𝗿𝗺𝗮𝗿𝗸𝘀

📅2 days ago⏱1 min read

Watermarks in Large Language Models are failing.

Most watermark detectors look for small statistical patterns that a model deliberately leaves in its word choices. The common assumption was that these patterns would survive in the text no matter how it was produced. That assumption does not hold.

If you use a mix of models, you erase the watermark.

The technique is called a linear ensemble. When you take the outputs of three to five different models and average them, the watermark signal disappears from the resulting text.

Each model's watermark nudges word choice in its own model-specific way. Those nudges do not line up across models, so averaging dilutes each one. The resulting text looks statistically close to unwatermarked text.
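The dilution effect can be sketched with a toy simulation. This is a minimal illustration and not the source's experiment. It assumes a green-list style watermark (each model boosts a secret, pseudo-random half of the vocabulary), seeds the green list by position rather than by previous token, and gives every model the same underlying distribution. The names `green_list`, `generate`, and `z_score` and the constants are hypothetical.

```python
import math
import random

VOCAB = 50    # toy vocabulary size (assumption)
GAMMA = 0.5   # fraction of the vocabulary on each green list (assumption)
DELTA = 2.0   # logit boost applied to green tokens (assumption)


def green_list(key, step):
    """Pseudo-random green list for one model key at one position."""
    rng = random.Random(key * 1_000_003 + step)
    return set(rng.sample(range(VOCAB), int(GAMMA * VOCAB)))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def watermarked_probs(base_logits, key, step):
    """Boost the logits of this model's green tokens, then normalize."""
    green = green_list(key, step)
    return softmax([l + DELTA if t in green else l
                    for t, l in enumerate(base_logits)])


def generate(keys, length, seed=0):
    """Sample tokens from the average of the watermarked distributions."""
    rng = random.Random(seed)
    tokens = []
    for step in range(length):
        base = [rng.gauss(0.0, 1.0) for _ in range(VOCAB)]
        dists = [watermarked_probs(base, k, step) for k in keys]
        avg = [sum(d[t] for d in dists) / len(dists) for t in range(VOCAB)]
        tokens.append(rng.choices(range(VOCAB), weights=avg)[0])
    return tokens


def z_score(tokens, key):
    """One-proportion z-test: how many tokens fall on this key's green list?"""
    n = len(tokens)
    hits = sum(tok in green_list(key, step) for step, tok in enumerate(tokens))
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


if __name__ == "__main__":
    single = z_score(generate([1], 400), key=1)
    ensemble = z_score(generate([1, 2, 3, 4, 5], 400), key=1)
    print(f"z-score, model 1 alone:      {single:.1f}")
    print(f"z-score, 5-model ensemble:   {ensemble:.1f}")
```

In this toy setup, the detector for model 1 scores the ensemble text much lower than text from model 1 alone, because the other four models spread probability evenly across model 1's green and red tokens. The exact numbers depend on the assumed constants and say nothing about real detectors.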

The data shows how effective this is:

• Averaging 3 models drops detection scores below the standard threshold.
• True detection rates fall from over 90% to under 50%.
• Text quality improves by 27.5%.
• The process runs 6 times faster than current baselines.

Researchers created a tool called WASH to make this practical. It aligns the models' vocabularies so their outputs can be averaged token by token. This lets the approach work even when the models have different structures.
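To see why vocabulary alignment matters, here is a minimal sketch of the simplest stand-in I can think of: keep only the tokens that every model shares, renormalize, then average. This is not WASH's actual procedure, which the post does not describe. The function name `align_and_average` and the example tokens are hypothetical.

```python
def align_and_average(dists):
    """Average next-token distributions over the tokens all models share.

    dists: list of dicts mapping token string -> probability, one per model.
    Assumption: tokens are matched by exact string; real tokenizers need
    more careful handling (prefixes, subword splits, and so on).
    """
    shared = set(dists[0])
    for d in dists[1:]:
        shared &= set(d)
    if not shared:
        raise ValueError("models share no tokens")

    averaged = {tok: 0.0 for tok in shared}
    for d in dists:
        mass = sum(d[t] for t in shared)
        if mass == 0.0:
            raise ValueError("a model puts no probability on shared tokens")
        for t in shared:
            # Renormalize over the shared tokens, then take the mean.
            averaged[t] += d[t] / mass / len(dists)
    return averaged


model_a = {"the": 0.5, "cat": 0.3, "_dog": 0.2}
model_b = {"the": 0.4, "cat": 0.4, "dogs": 0.2}
combined = align_and_average([model_a, model_b])
```

Here `combined` covers only "the" and "cat", since those are the tokens both toy models know. Dropping unshared tokens is a blunt choice that throws away probability mass, which is presumably part of what a real alignment tool handles better.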

This creates a big problem for AI safety. You cannot rely on simple watermarks to track AI content if people use multiple models at once.

If you build a service that uses many different AI APIs, you must assume watermark detection is unreliable.

To fix this, the industry needs better methods. We need cryptographic signatures or shared signing keys that survive the averaging process.
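One possible reading of "shared signing keys" is that the service signs the final text after ensembling, so provenance no longer depends on token statistics. The sketch below uses an HMAC from Python's standard library. It is an assumption about what such a scheme could look like, not a method from the source. The function names and the example key are hypothetical.

```python
import hashlib
import hmac


def sign_text(text: str, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the generated text."""
    return hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_text(text: str, tag: str, key: bytes) -> bool:
    """Check the tag in constant time."""
    return hmac.compare_digest(sign_text(text, key), tag)


shared_key = b"example-key-do-not-use"  # hypothetical key shared by signer and verifier
output = "Text produced by an ensemble of models."
tag = sign_text(output, shared_key)
```

The trade-off is that the tag must travel with the text, and any edit to the text invalidates it. A statistical watermark does not have that limitation, which is why the two approaches are not direct substitutes.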

Source: https://dev.to/olaughter/linear-ensembles-can-erase-llm-watermarks-34oo

