Linear Ensembles Erase LLM Watermarks
Watermarks in Large Language Models are failing.
Most tools try to detect AI text by finding tiny patterns in how models pick words. Experts thought these patterns would stay in the text no matter what. They were wrong.
If you use a mix of models, you erase the watermark.
This happens through a simple process called linear ensembles. When you take the outputs from three to five different models and average them, the watermark disappears.
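Each model's watermark adds its own small bias to word choice. Averaging the models' outputs dilutes every one of those biases, so no single pattern stays strong enough to detect. The text drifts back toward its natural, unwatermarked state.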
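Here is a toy sketch of why averaging weakens a watermark. It assumes a "green list" style watermark, where each model boosts a secret half of the vocabulary. The post does not say which scheme was tested, so the scheme, the keys, the vocabulary size and the bias strength are all made up for illustration.

```python
import math
import random


def softmax(logits):
    """Turn logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def green_list(vocab_size, key):
    """Hypothetical watermark key -> pseudo-random half of the vocabulary."""
    rng = random.Random(key)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: vocab_size // 2])


def watermarked_probs(base_logits, green, delta):
    """Add a bias `delta` to the logits of green tokens, then normalise."""
    biased = [x + (delta if i in green else 0.0) for i, x in enumerate(base_logits)]
    return softmax(biased)


def linear_ensemble(dists):
    """Equal-weight average of several next-token distributions."""
    k = len(dists)
    return [sum(d[i] for d in dists) / k for i in range(len(dists[0]))]


def green_mass(probs, green):
    """Probability that the next token falls in the given green list."""
    return sum(p for i, p in enumerate(probs) if i in green)


VOCAB, DELTA = 1000, 2.0          # illustrative values, not from the source
base = [0.0] * VOCAB              # flat logits keep the arithmetic simple
greens = [green_list(VOCAB, key) for key in (11, 22, 33)]  # one key per model
dists = [watermarked_probs(base, g, DELTA) for g in greens]

single = green_mass(dists[0], greens[0])                 # model 0 alone
mixed = green_mass(linear_ensemble(dists), greens[0])    # 3-model average

print(f"green mass, single model: {single:.3f}")
print(f"green mass, 3-model mix:  {mixed:.3f}")
```

Model 0's detector looks for its own green tokens. After averaging, the other two models pull the green share back toward 50%, which is what unwatermarked text would show in this toy setup.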
The data shows how effective this is:
• Averaging 3 models drops detection scores below the standard threshold.
• True detection rates fall from over 90% to under 50%.
• Text quality improves by 27.5%.
• The process runs 6 times faster than current baselines.
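To make "below the threshold" concrete, here is a small sketch of a z-score detector of the kind often used with green-list watermarks. The post does not name the detector or the threshold, so the formula choice, the threshold of 4.0, and the token counts below are assumptions for illustration, not the reported results.

```python
import math


def z_score(green_count: int, total_tokens: int, gamma: float = 0.5) -> float:
    """How far the green-token count sits above chance.

    gamma is the share of the vocabulary that is green (assumed 0.5 here).
    """
    expected = gamma * total_tokens
    std = math.sqrt(total_tokens * gamma * (1.0 - gamma))
    return (green_count - expected) / std


THRESHOLD = 4.0   # assumed detection threshold
TOKENS = 200      # illustrative text length

# Made-up green-token counts for a 200-token text.
single_model = z_score(176, TOKENS)   # strongly watermarked text
ensemble = z_score(126, TOKENS)       # text after averaging several models

for name, z in (("single model", single_model), ("ensemble", ensemble)):
    verdict = "flagged" if z > THRESHOLD else "not flagged"
    print(f"{name}: z = {z:.2f} -> {verdict}")
```

A modest drop in the share of green tokens is enough to move the score from clearly flagged to not flagged.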
Researchers created a tool called WASH to make this work. It helps different models talk to each other by aligning their vocabularies. This makes the process work even if the models have different structures.
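The post does not describe how WASH aligns vocabularies, so the sketch below is not WASH. It only shows the general idea in the simplest possible form: put every model's next-token probabilities onto one shared set of token strings, then average. The function name and the example probabilities are hypothetical.

```python
def align_and_average(dists):
    """Average next-token distributions from models with different vocabularies.

    dists: list of {token_string: probability} dicts, one per model.
    Tokens a model does not know get probability 0 for that model.
    """
    shared_vocab = sorted(set().union(*dists))
    averaged = {
        tok: sum(d.get(tok, 0.0) for d in dists) / len(dists)
        for tok in shared_vocab
    }
    total = sum(averaged.values())
    return {tok: p / total for tok, p in averaged.items()}


# Two toy models whose vocabularies only partly overlap.
model_a = {"the": 0.5, "cat": 0.3, "dog": 0.2}
model_b = {"the": 0.4, "cat": 0.4, "bird": 0.2}

mixed = align_and_average([model_a, model_b])
for tok, p in mixed.items():
    print(f"{tok}: {p:.2f}")
```

Real tokenizers also split the same word into different pieces, which this string-matching shortcut ignores. A working tool has to handle that case too.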
This creates a big problem for AI safety. You cannot rely on simple watermarks to track AI content if people use multiple models at once.
If you build a service that uses many different AI APIs, you must assume watermark detection is unreliable.
To fix this, the industry needs better methods. We need cryptographic signatures or shared signing keys that survive the averaging process.
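One reading of "shared signing keys" is a keyed signature attached to the final text instead of hidden inside word choices. Here is a minimal sketch using HMAC from the Python standard library. The key, the function names and the workflow are assumptions, not a scheme proposed in the source.

```python
import hashlib
import hmac


def sign_text(text: str, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the text under a shared key."""
    return hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_text(text: str, tag: str, key: bytes) -> bool:
    """Check a tag in constant time."""
    return hmac.compare_digest(sign_text(text, key), tag)


SHARED_KEY = b"demo-key-do-not-use"   # hypothetical key for illustration only

output = "Text produced by an ensemble of models."
tag = sign_text(output, SHARED_KEY)

print(verify_text(output, tag, SHARED_KEY))             # True
print(verify_text(output + " edited", tag, SHARED_KEY)) # False
```

The tag is computed after the ensemble produces its output, so averaging cannot dilute it. The trade-off is that the tag travels next to the text and any edit invalidates it.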
Source: https://dev.to/olaughter/linear-ensembles-can-erase-llm-watermarks-34oo