The AI licensing landscape has a communication problem. Machine-readable protocols like RSL (Really Simple Licensing) work for automated systems, but they fail when human judgment enters the picture. An OpenAI engineer reviewing licensing terms doesn't want to parse JSON. An Anthropic compliance officer checking whether their crawler respects publisher wishes doesn't want to decode XML schemas.
llms.txt solves this by being readable. Plain text. Natural language. A document that both AI systems and humans can interpret without tooling.
This isn't a replacement for RSL or robots.txt. It's a complement. RSL tells machines what to do. llms.txt tells humans (and increasingly, AI systems themselves) what the rules are in language they can process contextually.
What llms.txt Is
Human-Readable vs. Machine-Readable Licensing
The RSL protocol uses structured data formats. A JSON record looks like this:
{
  "licensor": "Example Publisher",
  "content_type": "news",
  "pricing_model": "per_crawl",
  "rate": 0.005
}
Machines parse it efficiently. Humans squint at curly braces.
llms.txt uses prose:
This site is operated by Example Publisher.
We license our content to AI companies for training and retrieval purposes.
Per-crawl rate: $0.005 for news content.
Contact [email protected] to establish a billing relationship.
Both communicate the same information. llms.txt does it in a format that requires no technical training to understand.
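To make the equivalence concrete, here is a minimal Python sketch that turns the JSON record above into llms.txt-style sentences. The field names come from the example; the `render_terms` function and its sentence templates are assumptions for illustration, not part of RSL or of any llms.txt tooling.

```python
import json

# The structured record from the example above.
RSL_RECORD = """
{
  "licensor": "Example Publisher",
  "content_type": "news",
  "pricing_model": "per_crawl",
  "rate": 0.005
}
"""

def render_terms(record_json: str) -> str:
    """Render a structured licensing record as plain sentences (hypothetical helper)."""
    record = json.loads(record_json)
    model = record["pricing_model"].replace("_", "-")  # per_crawl -> per-crawl
    return "\n".join([
        f"This site is operated by {record['licensor']}.",
        f"{model.capitalize()} rate: ${record['rate']:.3f} for {record['content_type']} content.",
    ])

print(render_terms(RSL_RECORD))
```

Generating the prose from the structured record is one way to keep the two files from drifting apart.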
Why Plain Text Matters for AI Context Understanding
Modern LLMs process text through context windows. When Claude or GPT-4 encounters your llms.txt file during a retrieval operation, it doesn't need special parsing logic. It reads the text the same way it reads any document.
AI systems can understand and potentially respect licensing terms expressed in natural language, even without explicit programming to parse a specific file format.
Anthropic's Claude implementation reportedly checks for llms.txt files and incorporates the content into its context when deciding how to handle retrieved information.
RSL handles the crawling phase. llms.txt potentially influences what happens after content is already in the AI's context window.
llms.txt as Complementary to RSL
RSL Protocol
- Target: AI crawler systems
- Format: JSON or XML
- Function: Automated decision-making during crawl operations
- Enforcement: Cloudflare Pay-Per-Crawl
llms.txt
- Target: Humans reviewing licensing, AI systems during retrieval/inference
- Format: Plain text, Markdown optional
- Function: Communication of terms, contextual understanding
- Enforcement: Relies on AI company compliance teams
A publisher running both files has coverage across the full lifecycle, from crawl time to inference time.
File Structure and Required Elements
Header Section
Every llms.txt file starts with identification:
# llms.txt for ExamplePublication.com
# Last updated: 2026-01-15
# Licensing contact: [email protected]
The update date matters for audit trails. The licensing contact should route to someone authorized to negotiate.
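Because the header uses simple "key: value" comment lines, it is easy to read programmatically. The sketch below assumes that exact layout. The `parse_header` and `days_since_update` helpers are hypothetical, and the contact address is a placeholder rather than a real mailbox.

```python
import re
from datetime import date

HEADER = """\
# llms.txt for ExamplePublication.com
# Last updated: 2026-01-15
# Licensing contact: licensing@example.com
"""  # the contact address is a placeholder for illustration

def parse_header(text: str) -> dict:
    """Collect 'key: value' pairs from the leading comment lines."""
    fields = {}
    for line in text.splitlines():
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        match = re.match(r"#\s*([^:]+):\s*(.+)", line)
        if match:
            fields[match.group(1).strip().lower()] = match.group(2).strip()
    return fields

def days_since_update(fields: dict, today: date) -> int:
    """Age of the file in days, based on the 'Last updated' field."""
    return (today - date.fromisoformat(fields["last updated"])).days

fields = parse_header(HEADER)
print(fields["licensing contact"], days_since_update(fields, date(2026, 2, 14)))
```

A check like this could run in a publishing pipeline to flag a file whose date has gone stale or whose contact line is missing.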
Licensing Terms in Natural Language
ExamplePublication.com publishes business journalism covering
the technology sector.
Our content is protected by copyright. AI companies may access
our content for training or retrieval purposes under the following
terms:
1. Per-crawl licensing: $0.008 per page crawled for training purposes.
2. Retrieval licensing: $0.003 per page retrieved for real-time
AI responses.
3. Payment must be established before crawling begins. Contact
our licensing team to set up billing via Stripe.
4. Crawlers that access our content without payment will be
blocked and reported.
Specificity matters. "$0.008 per crawl for training, $0.003 for retrieval" is actionable. "We charge for AI access" is vague.
Content Scope Definitions
Content scope:
INCLUDED in licensing terms:
- All articles published in /news/, /analysis/, and /research/
- Archived content from 2020 to present
- Data tables, charts, and embedded visualizations
EXCLUDED from licensing terms (not available for AI training):
- Subscriber-only content behind /premium/
- Wire service content (AP, Reuters) that we redistribute under
separate license
- User-submitted comments
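The path-based part of this scope can be expressed as a small check. This is a sketch under stated assumptions: it only models the URL prefixes listed above, and the `is_licensable` name is hypothetical. Wire service content, user comments, and the 2020 cutoff cannot be detected from the path alone, so they would need separate metadata.

```python
# Prefixes taken from the scope example above.
INCLUDED_PREFIXES = ("/news/", "/analysis/", "/research/")
EXCLUDED_PREFIXES = ("/premium/",)

def is_licensable(path: str) -> bool:
    """Return True if the path falls under the licensed sections.

    Exclusions win over inclusions. Anything not listed is treated
    as out of scope (an assumption; the sample file does not say).
    """
    if path.startswith(EXCLUDED_PREFIXES):
        return False
    return path.startswith(INCLUDED_PREFIXES)

for path in ("/news/2026/chip-shortage", "/premium/deep-dive", "/about"):
    print(path, is_licensable(path))
```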
Pricing and Payment Instructions
Pricing:
Standard per-crawl rates:
- News content (/news/): $0.005 per crawl
- Analysis content (/analysis/): $0.010 per crawl
- Research reports (/research/): $0.020 per crawl
Volume discounts available for crawlers exceeding 50,000
requests per month.
Payment:
We use Cloudflare Pay-Per-Crawl for automated billing.
Compliant crawlers will be prompted to establish payment
via Stripe upon first request.
Non-payment enforcement:
Crawlers that access content without payment will be:
1. Throttled to 10 requests per day (first offense)
2. Blocked entirely (repeated violation)
3. Reported to industry blocklists
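A crawler operator reading these terms could estimate a bill with a few lines of Python. The rates below are the ones from the sample pricing section. The `rate_for` and `crawl_cost` helpers are hypothetical, and the volume discount is left out because the sample does not state its size.

```python
from decimal import Decimal

# Per-crawl rates from the sample pricing section, keyed by URL prefix.
RATES = {
    "/news/": Decimal("0.005"),
    "/analysis/": Decimal("0.010"),
    "/research/": Decimal("0.020"),
}

def rate_for(path: str):
    """Return the per-crawl rate for a path, or None if no rate is listed."""
    for prefix, rate in RATES.items():
        if path.startswith(prefix):
            return rate
    return None

def crawl_cost(paths) -> Decimal:
    """Sum the listed rates for a batch of crawled paths; unpriced paths are skipped."""
    total = Decimal("0")
    for path in paths:
        rate = rate_for(path)
        if rate is not None:
            total += rate
    return total

print(crawl_cost(["/news/a", "/analysis/b", "/research/c", "/about"]))
```

Using `Decimal` rather than floats avoids rounding drift when fractions of a cent are summed over many requests.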
Creating an Effective llms.txt File
Tone and Clarity
Write llms.txt as if you're explaining your licensing to two audiences: a competent business professional and a large language model.
Both benefit from:
- Short sentences over compound structures
- Concrete numbers over vague ranges
- Explicit statements over implied meanings
- Consistent terminology throughout
Weak language:
We generally expect AI companies to pay for access to our
valuable content, though we're open to discussing various
arrangements depending on the nature of the usage.
Strong language:
AI companies must pay $0.008 per page crawled. We offer
volume discounts for monthly crawl counts exceeding 50,000
pages. Contact [email protected] before crawling begins.
Specificity Requirements
Research suggests that vague terms get treated as non-binding suggestions. Specific terms get treated as requirements.
Publishers testing llms.txt implementations report higher compliance rates when their files include:
- Exact dollar amounts, not ranges
- Specific URL paths, not general references
- Named contacts with email addresses
- Numbered lists of terms, not paragraph-form prose
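The four elements above lend themselves to a rough lint pass. The sketch below is an assumption-heavy heuristic, not a validator: the regular expressions and the `missing_elements` name are invented for illustration, and they only check that each kind of element appears at least once.

```python
import re

# One loose pattern per element from the list above (heuristics only).
CHECKS = {
    "exact dollar amount": re.compile(r"\$\d+(?:\.\d+)?"),
    "specific URL path": re.compile(r"(?<![\w/])/[a-z0-9_-]+/"),
    "contact email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "numbered list": re.compile(r"^\s*\d+\.\s", re.MULTILINE),
}

def missing_elements(text: str) -> list:
    """Return the names of the elements that never appear in the text."""
    return [name for name, pattern in CHECKS.items() if not pattern.search(text)]

weak = "We generally expect AI companies to pay for access to our valuable content."
strong = (
    "1. Training: $0.008 per page under /news/.\n"
    "2. Contact licensing@example.com before crawling."  # placeholder address
)
print(missing_elements(weak))
print(missing_elements(strong))
```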
Placement and Discoverability
Hosting at /llms.txt
Convention matters:
https://example.com/llms.txt
Not in a subdirectory. Not with a different filename.
Cloudflare's crawler detection system checks the root location. Community standards assume the root location.
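Since the location is fixed by convention, a client can derive it from any page URL on the same host. A minimal sketch using the standard library follows; the `llms_txt_url` helper name is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit

def llms_txt_url(page_url: str) -> str:
    """Map any page URL to the root-level llms.txt on the same host."""
    parts = urlsplit(page_url)
    # Keep scheme and host; drop the original path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))

print(llms_txt_url("https://example.com/news/2026/story?id=7#top"))
```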
Linking from robots.txt
# AI licensing terms available at:
# https://example.com/llms.txt
# https://example.com/rsl.json
User-agent: GPTBot
Crawl-delay: 10
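Standard robots.txt parsers ignore comment lines, so a tool that wants these pointers has to look for them explicitly. The sketch below assumes the comment convention shown above; the `licensing_links` helper is hypothetical.

```python
import re

ROBOTS = """\
# AI licensing terms available at:
# https://example.com/llms.txt
# https://example.com/rsl.json
User-agent: GPTBot
Crawl-delay: 10
"""

def licensing_links(robots_txt: str) -> list:
    """Collect URLs that appear inside robots.txt comment lines."""
    links = []
    for line in robots_txt.splitlines():
        if line.lstrip().startswith("#"):
            links.extend(re.findall(r"https?://\S+", line))
    return links

print(licensing_links(ROBOTS))
```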
HTTP Header Signals
X-AI-Licensing: https://example.com/llms.txt
X-RSL-Location: https://example.com/rsl.json
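One way to attach these headers to every response is a small WSGI middleware. This is a sketch, assuming a Python WSGI application. The header names are the ones from the example above and are not registered standards, and `with_licensing_headers` and `hello_app` are hypothetical names.

```python
# Header names as used in the example above (custom, non-standard).
LICENSING_HEADERS = [
    ("X-AI-Licensing", "https://example.com/llms.txt"),
    ("X-RSL-Location", "https://example.com/rsl.json"),
]

def with_licensing_headers(app):
    """Wrap a WSGI app so every response carries the licensing headers."""
    def wrapped(environ, start_response):
        def start(status, headers, exc_info=None):
            # Append the licensing pointers to whatever the app already set.
            return start_response(status, list(headers) + LICENSING_HEADERS, exc_info)
        return app(environ, start)
    return wrapped

def hello_app(environ, start_response):
    """Stand-in application used only to demonstrate the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]

application = with_licensing_headers(hello_app)
```

In practice the same headers can usually be set in the web server or CDN configuration instead, which avoids touching application code.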
How AI Systems Use llms.txt
Claude's Parsing Behavior
Observed behavior suggests:
- When Claude retrieves content from a domain, it may check for llms.txt
- If found, the content enters Claude's context window
- Claude's responses may incorporate awareness of licensing terms
This isn't guaranteed behavior. It's inferred from publisher reports.
OpenAI's Response
OpenAI has not published an official position on llms.txt. GPTBot respects robots.txt directives. ChatGPT's retrieval behavior is less well documented.
What's clear: OpenAI's partnerships team reads licensing documentation when evaluating publishers for direct deals.
Retrieval-Augmented Generation Systems
RAG systems retrieve external content and inject it into LLM prompts. llms.txt becomes relevant when RAG systems:
- Retrieve content from your domain
- Check for licensing terms before including content
- Potentially filter or modify responses based on stated terms
Most RAG implementations don't check llms.txt today. But as AI licensing matures, expect more systems to incorporate licensing awareness.
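What licensing awareness might look like in a RAG pipeline can be sketched in a few lines. Everything here is an assumption for illustration: the `build_prompt` function, the bracketed section labels, and the idea that llms.txt text has already been fetched and keyed by domain. It shows the pattern of placing a publisher's terms in the prompt next to that publisher's content, not how any particular system works.

```python
def build_prompt(question: str, passages, terms_by_domain: dict) -> str:
    """Assemble a prompt that places each domain's terms before its content.

    passages: list of (domain, text) pairs from the retriever.
    terms_by_domain: llms.txt text per domain, fetched elsewhere.
    """
    sections = []
    seen = set()
    for domain, text in passages:
        terms = terms_by_domain.get(domain)
        if terms and domain not in seen:
            # Include each publisher's terms once, ahead of its first passage.
            sections.append(f"[Licensing terms published by {domain}]\n{terms}")
            seen.add(domain)
        sections.append(f"[Content from {domain}]\n{text}")
    sections.append(f"[Question]\n{question}")
    return "\n\n".join(sections)

prompt = build_prompt(
    "What happened?",
    [("example.com", "Story A"), ("example.com", "Story B"), ("other.org", "Post C")],
    {"example.com": "AI companies must pay $0.008 per page crawled."},
)
print(prompt)
```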
Updating and Versioning
When to Update
Update your llms.txt when:
- Pricing changes
- Scope changes
- Contact changes
- Policy changes
Don't update for trivial changes such as typo fixes or rewording that leaves the terms unchanged.
Changelog Best Practices
# CHANGELOG
2026-01-15: Updated retrieval pricing from $0.004 to $0.006
2025-11-20: Added /research/ section to licensed content
2025-09-01: Initial llms.txt publication
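A changelog in this "date: description" shape is easy to parse, which helps when checking that the newest entry matches the "Last updated" date in the header. The sketch below assumes ISO dates and one entry per line; `parse_changelog` is a hypothetical helper.

```python
import re
from datetime import date

CHANGELOG = """\
# CHANGELOG
2026-01-15: Updated retrieval pricing from $0.004 to $0.006
2025-11-20: Added /research/ section to licensed content
2025-09-01: Initial llms.txt publication
"""

ENTRY = re.compile(r"^(\d{4}-\d{2}-\d{2}):\s*(.+)$")

def parse_changelog(text: str) -> list:
    """Return (date, description) pairs, newest first."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            entries.append((date.fromisoformat(match.group(1)), match.group(2)))
    return sorted(entries, reverse=True)

latest_date, latest_change = parse_changelog(CHANGELOG)[0]
print(latest_date.isoformat(), latest_change)
```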
Archiving Previous Versions
Save dated copies:
/archive/llms-txt-2025-09-01.txt
/archive/llms-txt-2025-11-20.txt
/archive/llms-txt-2026-01-15.txt
Don't link to these from your active llms.txt. Maintain them internally for legal documentation.
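The archive names above follow a predictable pattern, so they can be generated from the update date rather than typed by hand. A minimal sketch follows; `archive_path` is a hypothetical helper and the /archive/ directory is taken from the example.

```python
from datetime import date
from pathlib import PurePosixPath

def archive_path(updated: date) -> str:
    """Build the dated archive path for a given llms.txt revision."""
    return str(PurePosixPath("/archive") / f"llms-txt-{updated.isoformat()}.txt")

for revision in (date(2025, 9, 1), date(2025, 11, 20), date(2026, 1, 15)):
    print(archive_path(revision))
```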
The llms.txt specification fills a gap in the AI licensing stack. RSL handles machine-to-machine communication. robots.txt handles crawler directives. llms.txt handles everything that requires human interpretation or AI contextual understanding.
The file takes 30 minutes to write. The benefits compound as AI licensing matures.
For related guides, see RSL Protocol Implementation and AI Content Licensing Models.