Publishers facing AI crawler activity have three paths forward. Block access entirely with robots.txt. Monetize through marketplace infrastructure like Cloudflare Pay-Per-Crawl and the RSL protocol. Or negotiate custom contracts directly with OpenAI, Anthropic, and Google.
Each path has its own economics, its own enforcement mechanisms, and its own ideal publisher profile.
The decision isn't philosophical. It's financial. A 5-million-pageview trade publication and a 500-million-pageview news conglomerate face the same AI crawlers but have radically different leverage positions.
The Three Paths to AI Content Licensing
Blocking (robots.txt) - Control Without Compensation
The oldest approach. Declare in your robots.txt file which crawlers can access which content.
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Google-Extended
Disallow: /
This stops compliant crawlers from accessing your content. It doesn't generate revenue. It's a wall, not a tollbooth.
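One way to sanity-check rules like these before deploying them is Python's standard-library `urllib.robotparser`. The sketch below parses the same three rules and asks whether a given user agent may fetch a page. The `example.com` URL is a placeholder, and the parser only models what a compliant crawler would do; it says nothing about crawlers that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The same rules as above, with a blank line between groups.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URL; substitute a real page from your own site.
url = "https://example.com/articles/some-story"

# Ask the parser what each user agent is permitted to fetch.
results = {
    agent: parser.can_fetch(agent, url)
    for agent in ("GPTBot", "ClaudeBot", "Google-Extended", "Googlebot")
}

for agent, allowed in results.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Googlebot comes back as allowed because no rule group names it, so these directives leave ordinary search indexing untouched.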
75% of major publishers now block CCBot. 69% block ClaudeBot. 62% block GPTBot. These numbers represent a defensive posture: publishers protecting archives without a clear path to monetization.
Blocking works when the goal is negotiating leverage. News Corp blocked AI crawlers before their $250 million OpenAI deal. The block created scarcity. Scarcity created negotiating power.
Marketplace (RSL + Cloudflare) - Standardized Pricing, Automated Billing
The RSL protocol and Cloudflare Pay-Per-Crawl represent the marketplace approach. Publishers set per-crawl rates. Compliant AI companies pay automatically. Non-compliant crawlers get blocked or throttled.
The marketplace democratizes AI licensing. A 10-million-pageview B2B publication can set rates and collect revenue without hiring licensing lawyers. Expected revenue ranges from $500 to $5,000 monthly for mid-size publishers.
The limitation: participation is voluntary. An AI company can decline the terms, and a crawler that ignores them and evades detection bypasses the system entirely.
Direct Deals (News Corp Model) - Negotiated Contracts, Upfront Payments
News Corp's $250 million OpenAI agreement. Reddit's $60 million annual Google contract. Associated Press's OpenAI partnership, terms undisclosed. Financial Times' licensing deal with OpenAI.
Custom contracts. Negotiated terms. Upfront payments or guaranteed annual minimums. Attribution requirements. Audit rights.
The threshold for direct deal viability sits around 50 million monthly pageviews or truly irreplaceable niche data.
robots.txt: The Blocking-Only Approach
Compliance Rates
| Crawler | Company | Publisher Block Rate |
|---|---|---|
| CCBot | Common Crawl | 75% |
| ClaudeBot | Anthropic | 69% |
| GPTBot | OpenAI | 62% |
| Google-Extended | Google | 58% |
| Bytespider | ByteDance | 45% |
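Block rates show publisher intent, not crawler behavior. On behavior, Anthropic and OpenAI demonstrate strong compliance with robots.txt directives, while ByteDance's Bytespider shows lower compliance.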
When Blocking Makes Sense
Pre-negotiation leverage building. News Corp blocked AI crawlers before their OpenAI deal.
Unique, irreplaceable data. If your archives contain information unavailable elsewhere, blocking creates scarcity value.
Regulatory positioning. Some publishers block preemptively while legal frameworks develop.
Limitations
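Blocking generates zero revenue. Legal enforceability is weak, because robots.txt is a voluntary convention rather than an access control. A block left in place with no negotiation plan also forecloses future optionality.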
RSL + Cloudflare Pay-Per-Crawl: The Marketplace Approach
How It Works
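RSL provides machine-readable licensing terms. The RSL specification itself defines an XML vocabulary; the JSON below is a simplified illustration of the kind of terms a publisher declares: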
{
  "licensor": "Example Publisher",
  "content_type": "news",
  "pricing_model": "per_crawl",
  "rates": {
    "news": 0.005,
    "analysis": 0.010,
    "research": 0.020
  }
}
Cloudflare detects AI crawler requests, checks pricing configuration, routes compliant crawlers through payment flow, and blocks non-paying crawlers.
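Those four steps can be sketched as a small decision function. This illustrates the logic only and is not Cloudflare's implementation: the `CrawlRequest` type, the `AI_CRAWLERS` set, the `offered_price` field, and the assumption that rates are in US dollars are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical per-crawl rates, assumed to be US dollars, mirroring the sample terms.
RATES = {"news": 0.005, "analysis": 0.010, "research": 0.020}

# Hypothetical set of user-agent tokens treated as AI crawlers.
AI_CRAWLERS = {"GPTBot", "ClaudeBot", "Google-Extended", "CCBot", "Bytespider"}


@dataclass
class CrawlRequest:
    user_agent: str
    content_type: str                      # expected to be a key in RATES
    offered_price: Optional[float] = None  # what the crawler agrees to pay, if anything


def handle(request: CrawlRequest) -> str:
    """Return 'serve', 'charge', or 'block' for a single request."""
    # Step 1: detect whether the request comes from an AI crawler.
    if request.user_agent not in AI_CRAWLERS:
        return "serve"
    # Step 2: look up the configured price for this content type.
    price = RATES.get(request.content_type)
    if price is None:
        return "block"  # no price configured, so no licensed access
    # Step 3: crawlers that accept the price go through the payment flow.
    if request.offered_price is not None and request.offered_price >= price:
        return "charge"
    # Step 4: everything else is refused.
    return "block"


print(handle(CrawlRequest("GPTBot", "news", offered_price=0.005)))
print(handle(CrawlRequest("Bytespider", "news")))
```

In a real deployment the refusal is an HTTP response rather than a string; Cloudflare has described answering unpaid crawler requests with HTTP 402 Payment Required.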
Expected Revenue by Publisher Size
| Monthly Pageviews | Expected Monthly Revenue |
|---|---|
| 1-5 million | $200-$800 |
| 5-15 million | $800-$2,500 |
| 15-50 million | $2,500-$5,000 |
| 50+ million | $5,000+ (consider direct deals) |
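
To see how per-crawl rates relate to ranges like these, it helps to run the arithmetic. The sketch below multiplies monthly paid-crawl counts by the sample rates from the licensing terms under How It Works. The crawl counts are invented for illustration, not measurements, and the rates are assumed to be in US dollars.

```python
# Sample per-crawl rates, assumed to be US dollars (from the sample licensing terms).
RATES = {"news": 0.005, "analysis": 0.010, "research": 0.020}

# Hypothetical paid crawls per month by content type. Replace with your own log data.
monthly_crawls = {"news": 200_000, "analysis": 40_000, "research": 5_000}


def monthly_revenue(crawls: dict, rates: dict) -> float:
    """Sum crawls * rate over every content type, rounded to cents."""
    return round(sum(count * rates[kind] for kind, count in crawls.items()), 2)


revenue = monthly_revenue(monthly_crawls, RATES)
print(f"Estimated monthly revenue: ${revenue:,.2f}")
```

With these made-up volumes the estimate comes to $1,500 a month, which sits inside the table's 5-15 million pageview range. The point is the sensitivity: revenue scales linearly with paid crawl volume, so the crawl counts in your logs matter more than pageviews alone.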
Pros and Cons
Pros: Accessibility for mid-size publishers, automated billing, transparency, low barrier to entry.
Cons: Platform dependency on Cloudflare, compliance still voluntary, revenue ceiling below direct deals.
Direct Licensing Deals
News Corp ($250M Over 5 Years)
Properties: Wall Street Journal, New York Post, Times of London, Barron's, MarketWatch.
What they licensed: Current and archived news, real-time feeds, paywalled content.
Reddit ($60M Annually)
What Google licensed: 18 years of posts and comments, real-time API access, structured metadata.
Value drivers: Conversational patterns, niche expertise, temporal dynamics, community signals.
What Direct Deals Include
- Training data rights
- Retrieval rights
- Attribution requirements
- Audit rights
- Exclusivity terms (usually non-exclusive)
Hybrid Strategies
Block Aggressive Crawlers, License Compliant Ones
Compliant crawlers (GPTBot, ClaudeBot, Google-Extended): Route through Cloudflare Pay-Per-Crawl.
Non-compliant crawlers (Bytespider): Block via firewall rules.
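A policy like this is easier to maintain as data than as hand-written rules. The sketch below keeps the license-or-block decision in one dictionary and derives a user-agent match expression for the blocked crawlers. The `POLICY` table and helper names are hypothetical, and the generated expression follows the `http.user_agent contains "..."` form used in Cloudflare's rules language; verify it against the current documentation before deploying.

```python
# Policy from the text: which crawlers to license and which to block.
POLICY = {
    "GPTBot": "license",
    "ClaudeBot": "license",
    "Google-Extended": "license",
    "Bytespider": "block",
}


def crawlers_with(action: str) -> list[str]:
    """Return the crawler tokens assigned the given action, sorted by name."""
    return sorted(name for name, act in POLICY.items() if act == action)


def block_expression(tokens: list[str]) -> str:
    """Build one match expression covering every blocked user-agent token."""
    return " or ".join(f'(http.user_agent contains "{t}")' for t in tokens)


expression = block_expression(crawlers_with("block"))
print(expression)
```

Matching on the user-agent string only catches crawlers that identify themselves honestly, so a rule like this complements the payment flow rather than replacing it.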
Separate Retrieval from Training
Cloudflare marketplace handles retrieval. Per-crawl pricing. Automated billing.
Direct negotiation handles training. Flat fee for archive access.
Decision Framework
Content Uniqueness Assessment
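Score your content's uniqueness from 1 (widely available elsewhere) to 5 (unavailable anywhere else).
Commodity content (Score 1-2): Focus on marketplace efficiency.
Industry specialization (Score 3): Test the marketplace first, then use the resulting crawl and revenue data to build the case for a direct deal.
Unique datasets (Score 4-5): Direct deal potential warrants upfront investment.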
Technical Resources and Legal Budget
| Resource Level | Recommended Approach |
|---|---|
| No dedicated tech/legal | Marketplace only |
| Part-time tech, no legal | Marketplace primary, simple RSL |
| Dedicated tech, external legal | Hybrid approach |
| Full tech team, in-house legal | Direct deals primary |
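
The framework can be expressed as a short lookup. The sketch below returns the guidance for a uniqueness score and a resource level, plus a flag for direct deal viability based on the roughly 50-million-pageview threshold mentioned earlier. The resource labels, the `Assessment` type, and the choice to treat a score of 4 or 5 as "irreplaceable" are assumptions made for the example.

```python
from dataclasses import dataclass

# Recommendations from the resource table, keyed by hypothetical short labels.
BY_RESOURCES = {
    "none": "Marketplace only",
    "part_time_tech": "Marketplace primary, simple RSL",
    "tech_plus_external_legal": "Hybrid approach",
    "full_team": "Direct deals primary",
}


def by_uniqueness(score: int) -> str:
    """Map a 1-5 content uniqueness score to the guidance in the text."""
    if not 1 <= score <= 5:
        raise ValueError("score must be between 1 and 5")
    if score <= 2:
        return "Focus on marketplace efficiency"
    if score == 3:
        return "Test marketplace first"
    return "Direct deal potential"


@dataclass
class Assessment:
    content_guidance: str
    resource_guidance: str
    direct_deal_viable: bool


def assess(score: int, resources: str, monthly_pageviews: int) -> Assessment:
    # Assumption: direct deals become viable at about 50 million monthly
    # pageviews, or with highly unique data (a score of 4 or 5 here).
    viable = monthly_pageviews >= 50_000_000 or score >= 4
    return Assessment(by_uniqueness(score), BY_RESOURCES[resources], viable)


result = assess(score=3, resources="part_time_tech", monthly_pageviews=10_000_000)
print(result)
```

Keeping the two recommendations separate, rather than collapsing them into one answer, makes conflicts visible: a publisher with unique data but no legal budget sees both signals and can decide which constraint to address first.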
Publishers don't have to choose one model permanently. The landscape is evolving.
Marketplace revenue today funds direct deal negotiations tomorrow. Direct deal learnings inform marketplace pricing refinements. Blocking creates optionality for future negotiations.
The worst strategy is paralysis. While you deliberate, AI crawlers are scraping. Choose a path. Execute. Iterate.
For implementation guides, see Cloudflare Pay-Per-Crawl Setup and RSL Protocol Implementation.