Cloudflare Sets Deadline to Force AI Companies to Pay Publishers
Cloudflare has announced a landmark policy shift designed to decouple traditional search crawling from AI training and agentic services. By implementing strict new defaults, the edge computing giant aims to protect intellectual property and create a sustainable economic ecosystem for web publishers.
The End of "Mixed-Use" Crawlers
In a move that directly challenges the current data-scraping status quo, Cloudflare has set a deadline of September 15, 2026, to address the rise of "mixed-use" crawlers. These are bots that blend traditional search indexing with AI model training and agentic functions. Starting on that date, Cloudflare’s default settings will automatically block these hybrid crawlers from accessing any pages that host advertisements.
This policy change applies to all new Cloudflare customers, new sites created by existing customers, and all current free-tier users. The goal is to force AI companies to declare their intent: a bot that indexes a site for search follows one path, while a bot that ingests data to train a large language model (LLM) must follow another, one that may require compensation.
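The default described above can be read as a simple decision rule. The sketch below is illustrative only: the purpose labels, the `Crawler` type, and the `default_allows` function are hypothetical names, not Cloudflare's API, and the announcement does not describe how crawlers are actually classified.

```python
from dataclasses import dataclass

# Hypothetical purpose labels; the announcement does not define a taxonomy.
SEARCH = "search"
AI_TRAINING = "ai_training"
AGENTIC = "agentic"


@dataclass(frozen=True)
class Crawler:
    name: str
    purposes: frozenset  # every purpose the bot's fetched data is used for


def is_mixed_use(crawler: Crawler) -> bool:
    """A bot is 'mixed-use' if it blends search indexing with AI uses."""
    ai_uses = {AI_TRAINING, AGENTIC}
    return SEARCH in crawler.purposes and bool(ai_uses & crawler.purposes)


def default_allows(crawler: Crawler, page_has_ads: bool) -> bool:
    """Model only the rule from the article: mixed-use bots are blocked
    on pages that host ads. Any other bot rules a site has are out of scope."""
    return not (is_mixed_use(crawler) and page_has_ads)


search_only = Crawler("search-bot", frozenset({SEARCH}))
hybrid = Crawler("hybrid-bot", frozenset({SEARCH, AI_TRAINING}))

print(default_allows(search_only, page_has_ads=True))
print(default_allows(hybrid, page_has_ads=True))
print(default_allows(hybrid, page_has_ads=False))
```

Under these assumptions, a crawler that declares a single purpose is unaffected by the new default, which is the separation of intent the policy is meant to encourage.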
Challenging the Search Giant’s Dominance
A significant driver behind this decision is the perceived unfair advantage held by major search engines. Cloudflare specifically highlighted that the world’s largest search engine—widely understood to be Google—currently has access to approximately "2x more information" than its AI competitors.
While Google offers the "Google-Extended" robots.txt token to allow publishers to opt out of AI training without affecting search visibility, its flagship Googlebot continues to crawl extensively to power features like AI Overviews. Cloudflare’s intervention seeks to level the playing field, ensuring that AI companies cannot piggyback on the massive indexing capabilities of search engines to train their models for free.
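To make the limits of that opt-out concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `example.com` URL are made up for illustration, and robots.txt only expresses a publisher's preference; it does not enforce anything.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: opt out of Google-Extended, allow everything else.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/article"  # placeholder URL
googlebot_ok = parser.can_fetch("Googlebot", url)
extended_ok = parser.can_fetch("Google-Extended", url)

# Googlebot stays allowed even though Google-Extended is disallowed.
print(googlebot_ok, extended_ok)
```

The opt-out leaves Googlebot's access untouched, so a publisher who wants to stay in search results cannot use this file alone to keep the same crawl out of AI-powered search features.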
Moving Toward "Pay Per Use" Models
Beyond mere blocking, Cloudflare is actively building the infrastructure for a new content economy. The company is evolving its "Pay Per Crawl" marketplace into a more sophisticated "Pay Per Use" model. Under this framework, publishers can charge AI companies not just when their content is fetched, but when that content actually generates value.
To pilot this, Cloudflare is partnering with Ceramic.ai and You.com. Through these partnerships, publishers can receive direct compensation when their content appears in Ceramic’s AI search results or when You.com accesses premium material. This shift addresses a critical inefficiency in the current web: Cloudflare data reveals that over 50% of AI crawler traffic is wasted re-fetching unchanged pages, a process that drains both publisher bandwidth and AI compute resources.
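One long-standing way to avoid re-downloading unchanged pages is HTTP conditional requests, where a client sends back a previously seen `ETag` in an `If-None-Match` header and the server answers `304 Not Modified` with no body. The article does not say this is the mechanism Cloudflare or its partners use; the sketch below simply illustrates the idea with an in-memory stand-in for a server, and `FakeOrigin` and `CachingCrawler` are hypothetical names.

```python
import hashlib


class FakeOrigin:
    """In-memory stand-in for a web server that supports ETag validation."""

    def __init__(self, pages):
        self.pages = dict(pages)
        self.bytes_sent = 0  # total body bytes transferred to crawlers

    def get(self, path, if_none_match=None):
        body = self.pages[path]
        etag = '"' + hashlib.sha256(body.encode()).hexdigest()[:16] + '"'
        if if_none_match == etag:
            return 304, etag, None  # Not Modified: no body transferred
        self.bytes_sent += len(body.encode())
        return 200, etag, body


class CachingCrawler:
    """Remembers each page's ETag and revalidates instead of re-downloading."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}  # path -> (etag, body)

    def fetch(self, path):
        cached = self.cache.get(path)
        etag = cached[0] if cached else None
        status, new_etag, body = self.origin.get(path, if_none_match=etag)
        if status == 304:
            return cached[1]  # reuse the stored copy
        self.cache[path] = (new_etag, body)
        return body


origin = FakeOrigin({"/story": "unchanged article text"})
crawler = CachingCrawler(origin)

crawler.fetch("/story")
bytes_after_first = origin.bytes_sent
crawler.fetch("/story")  # second visit: revalidated, body not resent
bytes_after_second = origin.bytes_sent

print(bytes_after_first, bytes_after_second)
```

In this toy setup the second visit transfers no additional body bytes, which is the kind of saving at stake when a large share of crawler traffic targets pages that have not changed.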
Why This Matters for the AI Landscape
As non-human traffic now surpasses human traffic on the internet, the "scrape everything for free" era is hitting a wall. Cloudflare's move signals a transition toward a more regulated and transactional web. For AI developers, this means frictionless, zero-cost data acquisition is coming to an end, and continued access to quality data will depend on more transparent and cooperative relationships with content creators.
Key Takeaways
- Default Blocking: Starting September 15, 2026, Cloudflare will default to blocking "mixed-use" crawlers from ad-supported pages.
- Monetization Shift: Cloudflare is transitioning from "Pay Per Crawl" to a "Pay Per Use" model, allowing publishers to charge AI companies based on content value.
- Efficiency Gains: Cloudflare reports that over 50% of AI crawler traffic is wasted re-fetching unchanged pages, an inefficiency the shift to "Pay Per Use" is meant to address.
