RSL Protocol Implementation: How Publishers License Content to AI Systems

robots.txt told crawlers where they could and couldn't go. For nearly three decades, that was sufficient. Search engines respected the directive. Webmasters trusted the honor system.

AI crawlers broke that system. Not because they ignored robots.txt (though some do), but because the file answers the wrong question. robots.txt says "don't crawl here." It doesn't say "crawl here, but pay for it." It doesn't communicate pricing, usage rights, or licensing terms.

Really Simple Licensing (RSL) grew out of this gap. The RSL Collective, co-founded by Eckart Walther (a co-creator of RSS) and Doug Leeds (former CEO of Ask.com), launched it in September 2025 as a machine-readable standard for AI content licensing. The name echoes RSS deliberately. Simple. Standardized. Built for adoption.

RSL lets publishers communicate licensing terms in a format AI crawlers can parse automatically. When GPTBot hits your domain, it can check your RSL file, read your pricing, and decide whether to proceed or move on. Cloudflare Pay-Per-Crawl uses RSL as one of its data sources for automated enforcement.

Why RSL Matters

The Vision for Machine-Readable Licensing

The core insight: AI licensing will happen at scale or not at all. Publishers can't negotiate individual contracts with every AI company. AI companies can't manually review licensing terms for every domain they crawl. The only path to a functioning market is standardization.

RSL provides that standard. A single file format. A single location convention. A predictable structure that any crawler can read and any publisher can create without hiring developers.

How AI Companies Discover RSL Files

AI crawlers check predictable locations. Just as robots.txt lives at /robots.txt, RSL files live at /rsl.json or /rsl.xml.

The discovery process:

Crawler arrives at your domain
Before crawling content, it requests /rsl.json
If the file exists, crawler parses the licensing terms
Crawler compares your terms against its configured parameters
If terms align, crawler proceeds. If not, crawler either skips your domain or flags for human review.

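Crawler support is not uniform. Whether a given crawler, such as OpenAI's GPTBot or Anthropic's ClaudeBot, actually requests the file is something to confirm in your own server logs rather than assume. Cloudflare explicitly reads RSL as part of Pay-Per-Crawl enforcement.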
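The discovery steps above can be sketched from the crawler's side. This is a rough illustration, not any vendor's actual crawl logic: the function names rsl_url and decide and the max_rate budget parameter are hypothetical, and the field names are taken from the example file in this article.

```python
from urllib.parse import urljoin


def rsl_url(page_url):
    """Return the root-level RSL location for whatever URL the crawler holds."""
    return urljoin(page_url, "/rsl.json")


def decide(terms, max_rate):
    """Compare parsed RSL terms with the crawler's budget (hypothetical logic).

    Returns "crawl", "skip", or "review".
    """
    model = terms.get("pricing_model")
    if model == "per_crawl":
        rate = terms["pricing"]["rate"]
        # Terms align when the asking price fits the configured ceiling.
        return "crawl" if rate <= max_rate else "skip"
    # Anything the crawler cannot price automatically goes to a human.
    return "review"


terms = {"pricing_model": "per_crawl", "pricing": {"rate": 0.008, "currency": "USD"}}
print(rsl_url("https://example.com/news/today"), decide(terms, max_rate=0.01))
```

Fetching and parsing the file is left out on purpose; the point is that the decision itself reduces to a small comparison once the terms are machine-readable.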

RSL vs. robots.txt vs. Direct Licensing Deals

robots.txt

Function: Access control (allow/disallow crawling)
Pricing capability: None
Enforcement: Honor system
Best for: Blocking crawlers you don't want, period

RSL Protocol

Function: Licensing terms communication
Pricing capability: Full (per-crawl, per-inference, flat-rate, hybrid)
Enforcement: Through systems like Cloudflare Pay-Per-Crawl
Best for: Automated licensing to compliant AI companies at scale

Direct Deals

Function: Custom contracts with specific AI companies
Pricing capability: Unlimited
Enforcement: Contract law
Best for: Large publishers with unique bargaining power (News Corp, Reddit)

RSL sits between blocking and negotiating. It enables licensing to AI companies you'll never talk to, at rates you set, without lawyers or contracts.

Understanding RSL File Structure

Required Fields

Every RSL file needs three pieces of information: who you are, what you're licensing, and how you want to get paid.

{
  "rsl_version": "1.0",
  "licensor": {
    "name": "Example Publication",
    "contact": "licensing@examplepub.com",
    "url": "https://examplepub.com"
  },
  "content_type": "news",
  "pricing_model": "per_crawl",
  "pricing": {
    "rate": 0.008,
    "currency": "USD"
  },
  "updated": "2026-01-15"
}

The licensor block identifies your organization. Include a contact email that routes to someone authorized to discuss licensing.

content_type helps AI companies categorize your content. Common values: "news", "technical_documentation", "b2b_trade", "academic", "user_generated_content".

pricing_model declares your approach. Options: "per_crawl", "per_inference", "flat_rate", "hybrid", "negotiable".
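A quick pre-publish check for these fields might look like the sketch below. It assumes the three pieces named above map to the licensor, content_type, and pricing_model keys and that the licensor block carries name, contact, and url as in the example; which keys a formal specification mandates may differ, and check_required is a hypothetical helper name.

```python
REQUIRED = ("licensor", "content_type", "pricing_model")  # assumed from the example file
PRICING_MODELS = {"per_crawl", "per_inference", "flat_rate", "hybrid", "negotiable"}


def check_required(doc):
    """Return a list of problems; an empty list means the basics are present."""
    problems = [f"missing field: {name}" for name in REQUIRED if name not in doc]

    model = doc.get("pricing_model")
    if model is not None and model not in PRICING_MODELS:
        problems.append(f"unknown pricing_model: {model}")

    licensor = doc.get("licensor")
    if isinstance(licensor, dict):
        for key in ("name", "contact", "url"):
            if key not in licensor:
                problems.append(f"missing licensor.{key}")
    return problems


draft = {"licensor": {"name": "Example Publication"}, "pricing_model": "per_page"}
for problem in check_required(draft):
    print(problem)
```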

Optional Fields

{
  "restrictions": {
    "geographic": ["US", "EU", "UK"],
    "usage_type": ["retrieval"],
    "excluded_paths": ["/premium/", "/subscriber-only/"]
  },
  "attribution": {
    "required": true,
    "format": "Source: Example Publication (examplepub.com)",
    "link_required": true
  },
  "volume_discounts": [
    {"threshold": 50000, "rate": 0.006},
    {"threshold": 200000, "rate": 0.004}
  ]
}

Geographic restrictions limit where AI systems can use your content. Usage type can distinguish training from retrieval. Attribution requirements specify how AI systems should cite you.
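To make the optional fields concrete, here is one way a consumer of the file could interpret them. Two assumptions are baked in and labeled in the code: a volume tier applies once the crawl count reaches its threshold (the counting period is not defined above), and excluded_paths are treated as simple path prefixes. The function names are hypothetical.

```python
def effective_rate(base_rate, discounts, crawl_count):
    """Pick the rate of the highest threshold reached, else the base rate.

    Assumption: the discounted rate replaces the base rate once the count
    reaches the tier's threshold.
    """
    rate = base_rate
    for tier in sorted(discounts, key=lambda t: t["threshold"]):
        if crawl_count >= tier["threshold"]:
            rate = tier["rate"]
    return rate


def is_permitted(restrictions, path, usage, region):
    """Check one request against excluded_paths, usage_type, and geographic."""
    # Assumption: excluded_paths entries are path prefixes.
    if any(path.startswith(p) for p in restrictions.get("excluded_paths", [])):
        return False
    if "usage_type" in restrictions and usage not in restrictions["usage_type"]:
        return False
    if "geographic" in restrictions and region not in restrictions["geographic"]:
        return False
    return True


tiers = [{"threshold": 50000, "rate": 0.006}, {"threshold": 200000, "rate": 0.004}]
print(effective_rate(0.008, tiers, 60000))
```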

JSON vs. XML Format

Both formats work. JSON has become dominant for practical reasons:

Lighter weight
Native parsing in JavaScript
Easier to read and debug
More consistent with modern API conventions

If you're choosing fresh: use JSON. Name it rsl.json. Host it at your domain root.

Creating Your First RSL File

Defining Pricing Models

Per-crawl pricing charges for each page request. Simplest model. Cloudflare Pay-Per-Crawl built its system around this approach.

Industry benchmarks:

News content: $0.002-$0.005 per crawl
B2B trade publications: $0.008-$0.012 per crawl
Technical documentation: $0.015-$0.025 per crawl

"pricing_model": "per_crawl",
"pricing": {
  "rate": 0.008,
  "currency": "USD"
}

Per-inference pricing charges when AI systems use your content in responses. Higher theoretical value, but nearly impossible to track without AI company cooperation.

Flat-rate pricing mimics the News Corp model: annual fee for access. Works when you have enough power to demand upfront payment.

Hybrid models combine approaches: per-crawl for retrieval, flat-rate for training data rights.

Setting Content Scope

Site-wide licensing:

"scope": {
  "type": "site_wide",
  "exclusions": ["/admin/", "/private/"]
}

Directory-based licensing:

"scope": {
  "type": "directory",
  "pricing_by_path": [
    {"path": "/news/", "rate": 0.005},
    {"path": "/analysis/", "rate": 0.012},
    {"path": "/research/", "rate": 0.020}
  ],
  "default_rate": 0.008
}
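Directory-based pricing implies a lookup from URL path to rate. A minimal sketch, assuming the most specific (longest) matching prefix wins and default_rate applies when nothing matches; rate_for_path is a hypothetical helper, not part of any RSL tooling.

```python
def rate_for_path(scope, path):
    """Return the per-crawl rate for a URL path using longest-prefix matching."""
    best = None
    for rule in scope.get("pricing_by_path", []):
        if path.startswith(rule["path"]):
            # Prefer the longest prefix so nested directories can override parents.
            if best is None or len(rule["path"]) > len(best["path"]):
                best = rule
    return best["rate"] if best else scope["default_rate"]


scope = {
    "type": "directory",
    "pricing_by_path": [
        {"path": "/news/", "rate": 0.005},
        {"path": "/analysis/", "rate": 0.012},
        {"path": "/research/", "rate": 0.020},
    ],
    "default_rate": 0.008,
}
print(rate_for_path(scope, "/analysis/q3-outlook"))
```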

Where to Host Your RSL File

Domain Root Placement

Host at the domain root. Full stop.

https://example.com/rsl.json

Not:

https://example.com/legal/rsl.json
https://example.com/licensing/terms/rsl.json

Every AI crawler checking for RSL looks at the root first. Some stop there. If your file lives elsewhere, many crawlers never find it.

Linking from robots.txt

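A cross-reference in robots.txt makes the file easier to find. Standard robots.txt parsers ignore comment lines, so this pointer mainly helps people and tools that read the raw file: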

# RSL Protocol licensing terms
# https://example.com/rsl.json

User-agent: GPTBot
Crawl-delay: 10
Allow: /

HTTP Header Declarations

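A more advanced implementation adds HTTP headers to every response. X-RSL-Location is a custom header; the Link header uses the registered "license" link relation: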

X-RSL-Location: https://example.com/rsl.json
Link: <https://example.com/rsl.json>; rel="license"
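If your site runs behind a Python WSGI server, the headers can be attached in one place. This is a sketch under that assumption (on nginx or a CDN you would set them in server configuration instead); add_rsl_headers and demo_app are hypothetical names.

```python
RSL_URL = "https://example.com/rsl.json"


def add_rsl_headers(app, rsl_url=RSL_URL):
    """Wrap a WSGI app so every response advertises the RSL file location."""
    def wrapped(environ, start_response):
        def start_with_rsl(status, headers, exc_info=None):
            extra = [
                ("X-RSL-Location", rsl_url),
                ("Link", f'<{rsl_url}>; rel="license"'),
            ]
            return start_response(status, list(headers) + extra, exc_info)
        return app(environ, start_with_rsl)
    return wrapped


def demo_app(environ, start_response):
    """Stand-in for your real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


application = add_rsl_headers(demo_app)
```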

Testing and Validation

RSL Validator Tools

Before deployment, validate your RSL file:

Syntax validation: Paste your JSON into any JSON validator. Catches structural errors.

Schema validation: The RSL specification includes a JSON Schema definition. Tools like ajv can check your file against the schema.

Run validation after any edit. A typo in your pricing field can change $0.008 to $0.08 (10x your intended rate).
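Neither syntax nor schema validation will catch that kind of typo, because 0.08 is a perfectly valid number. One extra guard is to compare each new version against the previous one and flag large jumps. A sketch, with suspicious_rate_changes and the 2x default threshold as assumptions:

```python
def suspicious_rate_changes(old, new, max_factor=2.0):
    """Compare two parsed RSL documents and flag large jumps in pricing.rate."""
    warnings = []
    old_rate = old.get("pricing", {}).get("rate")
    new_rate = new.get("pricing", {}).get("rate")
    if old_rate and new_rate:
        factor = new_rate / old_rate
        # Flag both sharp increases and sharp decreases.
        if factor > max_factor or factor < 1 / max_factor:
            warnings.append(
                f"pricing.rate changed {factor:.1f}x: {old_rate} -> {new_rate}"
            )
    return warnings


previous = {"pricing": {"rate": 0.008, "currency": "USD"}}
edited = {"pricing": {"rate": 0.08, "currency": "USD"}}
print(suspicious_rate_changes(previous, edited))
```

Running a check like this before each commit turns a silent pricing error into a visible warning.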

Monitoring Which AI Companies Read Your RSL

Filter logs for requests to /rsl.json:

grep -F "/rsl.json" /var/log/nginx/access.log

User-agent strings identify the crawler. Track over time. If ClaudeBot checked your RSL file in January but stopped in March, something changed.
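To track this over time rather than eyeballing grep output, a short script can tally RSL requests per crawler per month. The sketch below assumes nginx's default "combined" log format and matches crawlers by substring in the user-agent; the BOTS tuple and the function name are illustrative, and user-agent strings can be spoofed, so treat the counts as indicative.

```python
import re
from collections import Counter

# Matches nginx's default "combined" access-log format.
LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>\d{2})/(?P<month>\w{3})/(?P<year>\d{4})[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
BOTS = ("GPTBot", "ClaudeBot")  # substrings to look for; extend as needed


def rsl_checks_by_month(lines):
    """Count /rsl.json requests per (year, month, bot) from access-log lines."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line)
        # Ignore unparseable lines and requests for anything but the RSL file.
        if not m or m["path"].split("?")[0] != "/rsl.json":
            continue
        for bot in BOTS:
            if bot in m["agent"]:
                counts[(m["year"], m["month"], bot)] += 1
    return counts


if __name__ == "__main__":
    with open("/var/log/nginx/access.log") as log:
        for key, total in sorted(rsl_checks_by_month(log).items()):
            print(key, total)
```

A crawler that appears in January's counts and is missing from March's is the signal described above.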

Version Control

Treat your RSL file like code. Version control it.

git commit -m "RSL v1.1: Added volume discounts for 50k+ crawls"

This creates an audit trail. If an AI company disputes your terms, you can demonstrate exactly what your RSL file said on any given date.

Enforcement: What Happens When AI Companies Ignore RSL

Legal Precedent

RSL communicates your terms. It doesn't enforce them. Enforcement requires separate mechanisms.

Your RSL file establishes that terms existed and were communicated. If an AI company scrapes without payment, you have documentation showing:

Your licensing terms were published at a known location
Their crawler accessed your domain
They didn't comply with stated terms

Cloudflare Pay-Per-Crawl as Enforcement Layer

Cloudflare sits between crawlers and your origin server:

Cloudflare intercepts the request
Cloudflare checks your RSL file for pricing terms
Cloudflare checks whether this crawler has a payment relationship
If paid: request proceeds
If unpaid: request blocked, throttled, or redirected to payment setup

Your RSL file feeds into this system. Change your RSL file and Cloudflare's enforcement updates automatically.
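The flow above reduces to a small decision function. This sketch is a generic illustration of that logic and not Cloudflare's implementation: the function name, the crawler sets, and the choice to answer unpaid crawlers with HTTP 402 (Payment Required) rather than throttling or redirecting are all assumptions.

```python
def handle_request(user_agent, rate, ai_crawlers, paid_crawlers):
    """Return (status_code, charge) for one intercepted request.

    rate is the per-crawl price read from the RSL file.
    """
    bot = next((name for name in ai_crawlers if name in user_agent), None)
    if bot is None:
        return 200, 0.0   # not a known AI crawler: pass through, no charge
    if bot in paid_crawlers:
        return 200, rate  # payment relationship exists: serve and bill
    return 402, 0.0       # unpaid crawler: refuse until payment is set up


AI_CRAWLERS = {"GPTBot", "ClaudeBot"}
PAID = {"GPTBot"}  # hypothetical: crawlers with a payment relationship
print(handle_request("Mozilla/5.0 (compatible; ClaudeBot/1.0)", 0.008, AI_CRAWLERS, PAID))
```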

The RSL protocol fills the gap between blocking AI crawlers and negotiating individual contracts. Implementation takes about 30 minutes for a basic file and about an hour for tiered pricing and the full set of options.

Publishers waiting for the "right time" to implement RSL are watching the market develop without them. The right time was when you first noticed AI crawlers in your server logs. The second-best time is now.

For related guides, see Cloudflare Pay-Per-Crawl Setup and llms.txt Specification.