robots.txt told crawlers where they could and couldn't go. For decades, that was sufficient. Search engines respected the directive. Webmasters trusted the honor system.
AI crawlers broke that system. Not because they ignored robots.txt (though some do), but because the file answers the wrong question. robots.txt says "don't crawl here." It doesn't say "crawl here, but pay for it." It doesn't communicate pricing, usage rights, or licensing terms.
Eckart Walther, who co-created RSS at Netscape, saw this gap. In September 2025, he and Doug Leeds, co-founders of the RSL Collective, launched Really Simple Licensing (RSL) as a machine-readable standard for AI content licensing. The name echoes RSS deliberately: simple, standardized, built for adoption.
RSL lets publishers communicate licensing terms in a format AI crawlers can parse automatically. When GPTBot hits your domain, it can check your RSL file, read your pricing, and decide whether to proceed or move on. Cloudflare Pay-Per-Crawl uses RSL as one of its data sources for automated enforcement.
Why RSL Matters
The Case for Machine-Readable Licensing
The core insight: AI licensing will happen at scale or not at all. Publishers can't negotiate individual contracts with every AI company. AI companies can't manually review licensing terms for every domain they crawl. The only path to a functioning market is standardization.
RSL provides that standard. A single file format. A single location convention. A predictable structure that any crawler can read and any publisher can create without hiring developers.
How AI Companies Discover RSL Files
AI crawlers check predictable locations. Just as robots.txt lives at /robots.txt, RSL files live at /rsl.json or /rsl.xml.
The discovery process:
- Crawler arrives at your domain
- Before crawling content, it requests /rsl.json
- If the file exists, crawler parses the licensing terms
- Crawler compares your terms against its configured parameters
- If terms align, crawler proceeds. If not, crawler either skips your domain or flags it for human review.
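The discovery steps above can be sketched from the crawler's side in Python. This is a hypothetical illustration, not code from OpenAI, Anthropic, or Cloudflare: `rsl_url`, `fetch_rsl`, `decide`, and the `max_rate` budget parameter are made-up names, only the JSON location is checked (not /rsl.xml), and the field names follow the example file in the next section.

```python
import json
import urllib.error
import urllib.request
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def rsl_url(page_url: str) -> str:
    """Map any page on a domain to the root-level /rsl.json for that origin."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/rsl.json", "", ""))


def fetch_rsl(page_url: str, timeout: float = 10.0) -> Optional[dict]:
    """Request /rsl.json before crawling; return None when the file is absent."""
    try:
        with urllib.request.urlopen(rsl_url(page_url), timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def decide(terms: Optional[dict], max_rate: float, currency: str = "USD") -> str:
    """Compare published terms against this crawler's configured budget."""
    if terms is None:
        return "no_terms"  # no RSL file; robots.txt alone governs access
    pricing = terms.get("pricing", {})
    if terms.get("pricing_model") != "per_crawl" or pricing.get("currency") != currency:
        return "review"  # terms this sketch cannot evaluate automatically
    if pricing.get("rate", float("inf")) <= max_rate:
        return "proceed"
    return "skip"
```

A crawler would call `decide(fetch_rsl("https://example.com/news/story"), max_rate=0.01)` before fetching the page itself. Anything other than a plain per-crawl price in the expected currency is routed to "review", mirroring the "flag for human review" branch.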
OpenAI's GPTBot and Anthropic's ClaudeBot both include RSL checking in their crawl logic. Cloudflare explicitly reads RSL as part of Pay-Per-Crawl enforcement.
RSL vs. robots.txt vs. Direct Licensing Deals
robots.txt
- Function: Access control (allow/disallow crawling)
- Pricing capability: None
- Enforcement: Honor system
- Best for: Blocking crawlers you don't want, period
RSL Protocol
- Function: Licensing terms communication
- Pricing capability: Full (per-crawl, per-inference, flat-rate, hybrid)
- Enforcement: Through systems like Cloudflare Pay-Per-Crawl
- Best for: Automated licensing to compliant AI companies at scale
Direct Deals
- Function: Custom contracts with specific AI companies
- Pricing capability: Unlimited
- Enforcement: Contract law
- Best for: Large publishers with unique leverage (News Corp, Reddit)
RSL sits between blocking and negotiating. It enables licensing to AI companies you'll never talk to, at rates you set, without lawyers or contracts.
Understanding RSL File Structure
Required Fields
Every RSL file needs three pieces of information: who you are, what you're licensing, and how you want to get paid.
{
  "rsl_version": "1.0",
  "licensor": {
    "name": "Example Publication",
    "contact": "licensing@examplepub.com",
    "url": "https://examplepub.com"
  },
  "content_type": "news",
  "pricing_model": "per_crawl",
  "pricing": {
    "rate": 0.008,
    "currency": "USD"
  },
  "updated": "2026-01-15"
}
The licensor block identifies your organization. Include a contact email that routes to someone authorized to discuss licensing.
content_type helps AI companies categorize your content. Common values: "news", "technical_documentation", "b2b_trade", "academic", "user_generated_content".
pricing_model declares your approach. Options: "per_crawl", "per_inference", "flat_rate", "hybrid", "negotiable".
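Before publishing, a quick script can confirm the required fields are present. This is a hypothetical helper written against the field names in the example above, not the RSL specification's own validator; the accepted pricing_model values are the five listed in the previous paragraph.

```python
import json
from datetime import date

REQUIRED_TOP_LEVEL = ("rsl_version", "licensor", "content_type", "pricing_model", "pricing", "updated")
REQUIRED_LICENSOR = ("name", "contact", "url")
PRICING_MODELS = {"per_crawl", "per_inference", "flat_rate", "hybrid", "negotiable"}


def check_required(text: str) -> list:
    """Return a list of problems; an empty list means the required fields look complete."""
    doc = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    problems = [f"missing field: {key}" for key in REQUIRED_TOP_LEVEL if key not in doc]
    licensor = doc.get("licensor", {})
    problems += [f"missing licensor.{key}" for key in REQUIRED_LICENSOR if key not in licensor]
    if doc.get("pricing_model") not in PRICING_MODELS:
        problems.append(f"unknown pricing_model: {doc.get('pricing_model')!r}")
    # A per-crawl file is useless without a numeric rate.
    pricing = doc.get("pricing", {})
    if doc.get("pricing_model") == "per_crawl" and not isinstance(pricing.get("rate"), (int, float)):
        problems.append("pricing.rate must be a number")
    try:
        date.fromisoformat(str(doc.get("updated", "")))
    except ValueError:
        problems.append("updated must be an ISO date (YYYY-MM-DD)")
    return problems
```

Run against the example file, `check_required` returns an empty list; dropping the licensor's contact email produces `"missing licensor.contact"`.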
Optional Fields
{
  "restrictions": {
    "geographic": ["US", "EU", "UK"],
    "usage_type": ["retrieval"],
    "excluded_paths": ["/premium/", "/subscriber-only/"]
  },
  "attribution": {
    "required": true,
    "format": "Source: Example Publication (examplepub.com)",
    "link_required": true
  },
  "volume_discounts": [
    {"threshold": 50000, "rate": 0.006},
    {"threshold": 200000, "rate": 0.004}
  ]
}
Geographic restrictions limit where AI systems can use your content. Usage type can distinguish training from retrieval. Attribution requirements specify how AI systems should cite you.
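On the consuming side, a compliant AI system would test each intended use against these fields. The sketch below is a hypothetical interpretation, not behavior defined by RSL: it treats excluded_paths as URL prefixes, treats geographic and usage_type as allow-lists (absent means unrestricted), and assumes the caller maps its deployment region to the same labels ("US", "EU", "UK") the publisher used.

```python
from typing import Optional


def permits(terms: dict, path: str, region: str, usage: str) -> bool:
    """Check one intended use against the optional restrictions block."""
    restrictions = terms.get("restrictions", {})
    # Excluded paths are prefix matches: "/premium/" also covers "/premium/report-1".
    if any(path.startswith(prefix) for prefix in restrictions.get("excluded_paths", [])):
        return False
    # An absent list means "no restriction"; a present list is an allow-list.
    regions = restrictions.get("geographic")
    if regions is not None and region not in regions:
        return False
    usages = restrictions.get("usage_type")
    if usages is not None and usage not in usages:
        return False
    return True


def citation(terms: dict, page_url: str) -> Optional[str]:
    """Build the attribution string an AI answer would display, if one is required."""
    attribution = terms.get("attribution", {})
    if not attribution.get("required"):
        return None
    text = attribution["format"]
    if attribution.get("link_required"):
        text += f" {page_url}"
    return text
```

With the example terms, retrieval use of /news/ pages from the US is permitted, while training use, or any request for /premium/ content, is not.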
JSON vs. XML Format
Both formats work. This guide uses JSON for practical reasons:
- Lighter weight
- Native parsing in JavaScript
- Easier to read and debug
- More consistent with modern API conventions
If you're choosing fresh: use JSON. Name it rsl.json. Host it at your domain root.
Creating Your First RSL File
Defining Pricing Models
Per-crawl pricing charges for each page request. Simplest model. Cloudflare Pay-Per-Crawl built its system around this approach.
Industry benchmarks:
- News content: $0.002-$0.005 per crawl
- B2B trade publications: $0.008-$0.012 per crawl
- Technical documentation: $0.015-$0.025 per crawl
{
  "pricing_model": "per_crawl",
  "pricing": {
    "rate": 0.008,
    "currency": "USD"
  }
}
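Combining a per-crawl rate with the volume_discounts tiers from the optional fields gives a monthly bill. The RSL fields shown here don't say how thresholds apply, so this sketch assumes thresholds are counted per billing month and that the highest threshold reached sets the rate for every crawl that month; `monthly_cost` is a hypothetical name.

```python
from decimal import Decimal


def monthly_cost(crawls: int, base_rate: float, discounts: list) -> Decimal:
    """Per-crawl bill for one month under the 'highest tier reached' assumption."""
    rate = Decimal(str(base_rate))  # str() keeps 0.008 exact instead of its binary approximation
    for tier in sorted(discounts, key=lambda t: t["threshold"]):
        if crawls >= tier["threshold"]:
            rate = Decimal(str(tier["rate"]))
    return rate * crawls
```

At $0.008 with the example tiers, 10,000 crawls cost $80 and 60,000 crawls cost $360. Note the cliff this interpretation creates: 49,999 crawls cost $399.992, while 50,000 crawls cost $300. A graduated scheme, where only crawls above each threshold get the lower rate, avoids that, so state which one you mean.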
Per-inference pricing charges when AI systems use your content in responses. Higher theoretical value, but nearly impossible to track without AI company cooperation.
Flat-rate pricing mimics the News Corp model: annual fee for access. Works when you have leverage to demand upfront payment.
Hybrid models combine approaches: per-crawl for retrieval, flat-rate for training data rights.
Setting Content Scope
Site-wide licensing:
{
  "scope": {
    "type": "site_wide",
    "exclusions": ["/admin/", "/private/"]
  }
}
Directory-based licensing:
{
  "scope": {
    "type": "directory",
    "pricing_by_path": [
      {"path": "/news/", "rate": 0.005},
      {"path": "/analysis/", "rate": 0.012},
      {"path": "/research/", "rate": 0.020}
    ],
    "default_rate": 0.008
  }
}
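Resolving the rate for a specific URL is a prefix lookup. This is a hypothetical sketch: `rate_for_path` and its `site_rate` fallback (standing in for the top-level pricing.rate) are illustrative, and "longest matching prefix wins" is an assumed tie-breaking rule, not one the scope fields define.

```python
from typing import Optional


def rate_for_path(scope: dict, path: str, site_rate: float) -> Optional[float]:
    """Per-crawl rate for one URL path, or None if the path is excluded from licensing."""
    if any(path.startswith(prefix) for prefix in scope.get("exclusions", [])):
        return None
    # Assumption: the longest matching prefix wins, so a future "/news/archive/"
    # entry would override "/news/" for archive pages.
    matches = [entry for entry in scope.get("pricing_by_path", []) if path.startswith(entry["path"])]
    if matches:
        return max(matches, key=lambda entry: len(entry["path"]))["rate"]
    # Fall back to the scope's default_rate, then to the top-level pricing rate.
    return scope.get("default_rate", site_rate)
```

Under the directory-based example, /analysis/q1-outlook resolves to $0.012, while an unlisted path such as /about/ falls back to the $0.008 default_rate.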
Where to Host Your RSL File
Domain Root Placement
Host at the domain root. Full stop.
https://example.com/rsl.json
Not:
https://example.com/legal/rsl.json
https://example.com/licensing/terms/rsl.json
Every AI crawler checking for RSL looks at the root first. Some stop there. If your file lives elsewhere, many crawlers never find it.
Linking from robots.txt
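Cross-referencing the file from robots.txt helps anyone auditing your setup find your terms. robots.txt parsers discard comment lines, though, so this pointer is for human readers; crawlers still discover the file through the root location above or the HTTP headers below.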
# RSL Protocol licensing terms
# https://example.com/rsl.json
User-agent: GPTBot
Crawl-delay: 10
Allow: /
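Python's standard-library robots.txt parser shows what a crawler actually retains from the file above: comment lines are stripped during parsing, so only the GPTBot directives survive.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# RSL Protocol licensing terms
# https://example.com/rsl.json
User-agent: GPTBot
Crawl-delay: 10
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
parser.modified()  # record a fetch time; can_fetch() and crawl_delay() need one

print(parser.can_fetch("GPTBot", "https://example.com/news/story"))  # True
print(parser.crawl_delay("GPTBot"))  # 10
print(parser.crawl_delay("CCBot"))   # None: no group applies to this agent
```

Nothing about the RSL file is exposed by the parser, which is why the root location and HTTP headers matter for automated discovery.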
HTTP Header Declarations
Advanced implementation adds HTTP headers to all responses:
X-RSL-Location: https://example.com/rsl.json
Link: <https://example.com/rsl.json>; rel="license"
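A crawler that sees these headers on any response can locate the terms without guessing paths. The sketch below is a simplified, hypothetical reader: `license_url` is an illustrative name, it prefers the Link header and falls back to X-RSL-Location, and it does not handle every edge case of Link header syntax (for example, quoted parameter values containing commas). Because rel="license" is a general-purpose relation, a real crawler would also confirm the target is an RSL file.

```python
import re
from typing import Optional


def license_url(headers: dict) -> Optional[str]:
    """Find the RSL location in response headers (simplified Link parsing)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    # Each link-value looks like <URI>; param=value; ... and values are comma-separated.
    for match in re.finditer(r"<([^>]+)>([^,<]*)", lowered.get("link", "")):
        target, params = match.groups()
        for param in params.split(";"):
            name, _, value = param.partition("=")
            rels = value.strip().strip('"').lower().split()
            if name.strip().lower() == "rel" and "license" in rels:
                return target
    return lowered.get("x-rsl-location")
```

Given a header such as `Link: <https://example.com/feed.xml>; rel="alternate", <https://example.com/rsl.json>; rel="license"`, the function skips the feed and returns the RSL URL.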
Testing and Validation
RSL Validator Tools
Before deployment, validate your RSL file:
Syntax validation: Paste your JSON into any JSON validator. Catches structural errors.
Schema validation: The RSL specification includes a JSON Schema definition. Tools like ajv can check your file against the schema.
Run validation after any edit. A typo in your pricing field can change $0.008 to $0.08 (10x your intended rate).
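Schema validation catches structural errors but not a misplaced decimal point, since 0.08 is as valid a number as 0.008. A plausibility check against the previous version can catch that. This is a hypothetical pre-commit helper that compares only the top-level pricing.rate; `rate_change_warnings` and the 2x threshold are illustrative choices.

```python
import json


def rate_change_warnings(old_text: str, new_text: str, max_ratio: float = 2.0) -> list:
    """Flag a top-level per-crawl rate that moved by more than max_ratio between versions.

    json.loads raises json.JSONDecodeError first if either version has a syntax error.
    """
    old_rate = json.loads(old_text)["pricing"]["rate"]
    new_rate = json.loads(new_text)["pricing"]["rate"]
    if old_rate <= 0 or new_rate <= 0:
        return [f"non-positive rate: {old_rate} -> {new_rate}"]
    ratio = new_rate / old_rate
    if ratio > max_ratio or ratio < 1 / max_ratio:
        return [f"pricing.rate moved from {old_rate} to {new_rate} ({ratio:.1f}x)"]
    return []
```

Comparing a file at $0.008 with an edit that reads 0.08 produces one warning reporting a 10.0x jump, while a deliberate cut to $0.006 passes silently.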
Monitoring Which AI Companies Read Your RSL
Filter logs for requests to /rsl.json:
grep "rsl.json" /var/log/nginx/access.log
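To track RSL fetches over time rather than eyeballing grep output, a short script can tally them by month and user agent. This sketch assumes nginx's default "combined" log format; `rsl_fetches` is a hypothetical name, and the user-agent strings in any sample data you try it on should be checked against your real logs, since crawlers send longer strings than the simplified ones used here.

```python
import re
from collections import Counter

# nginx "combined" format: ip - user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>\d{2})/(?P<mon>\w{3})/(?P<year>\d{4})[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def rsl_fetches(lines) -> Counter:
    """Count requests for /rsl.json per (year, month, user agent)."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        # Ignore query strings so /rsl.json?v=2 still counts as an RSL fetch.
        if match and match["path"].split("?")[0] == "/rsl.json":
            counts[(match["year"], match["mon"], match["agent"])] += 1
    return counts
```

Passing an open file handle for /var/log/nginx/access.log works because the function accepts any iterable of lines. A crawler that appears under January but has no entry for March is the kind of change worth investigating.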
User-agent strings identify the crawler. Track over time. If ClaudeBot checked your RSL file in January but stopped in March, something changed.
Version Control
Treat your RSL file like code. Version control it.
git commit -m "RSL v1.1: Added volume discounts for 50k+ crawls"
This creates an audit trail. If an AI company disputes your terms, you can demonstrate exactly what your RSL file said on any given date.
Enforcement: What Happens When AI Companies Ignore RSL
Legal Documentation
RSL communicates your terms. It doesn't enforce them. Enforcement requires separate mechanisms.
Your RSL file establishes that terms existed and were communicated. If an AI company scrapes without payment, you have documentation showing:
- Your licensing terms were published at a known location
- Their crawler accessed your domain
- They didn't comply with stated terms
Cloudflare Pay-Per-Crawl as Enforcement Layer
Cloudflare sits between crawlers and your origin server:
- Cloudflare intercepts the request
- Cloudflare checks your RSL file for pricing terms
- Cloudflare checks whether this crawler has a payment relationship
- If paid: request proceeds
- If unpaid: request blocked, throttled, or redirected to payment setup
Your RSL file feeds into this system. Change your RSL file, Cloudflare's enforcement updates automatically.
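The decision an edge layer makes for each request can be sketched in a few lines. This is a hypothetical model of the flow above, not Cloudflare's implementation: `gate`, `Decision`, and the crawler sets are illustrative, and the sketch picks HTTP 402 Payment Required for the unpaid case, where blocking (403) or throttling (429) are the other options the list mentions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    status: int
    action: str
    price: Optional[float] = None


def gate(user_agent: str, terms: dict, paying_crawlers: set, known_ai_crawlers: set) -> Decision:
    """Hypothetical edge check run before a request reaches the origin server."""
    crawler = next((name for name in sorted(known_ai_crawlers) if name.lower() in user_agent.lower()), None)
    if crawler is None:
        return Decision(200, "proceed")  # not an AI crawler this gate manages
    rate = terms["pricing"]["rate"]  # price comes from the parsed RSL file
    if crawler in paying_crawlers:
        return Decision(200, "proceed", rate)  # forward and bill at the published rate
    return Decision(402, "payment_required", rate)  # unpaid: point the crawler at payment setup
```

Because the price is read from the parsed terms on each call, editing rsl.json changes what the gate charges without touching the gate itself, which is the property the paragraph above describes.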
The RSL protocol fills the gap between blocking AI crawlers and negotiating individual contracts. Implementation takes 30 minutes for a basic file and about an hour for tiered pricing and the full set of options.
Publishers waiting for the "right time" to implement RSL are watching the market develop without them. The right time was when you first noticed AI crawlers in your server logs. The second-best time is now.
For related guides, see Cloudflare Pay-Per-Crawl Setup and llms.txt Specification.