llms.txt Specification: The Human-Readable Licensing Standard for AI Systems

The AI licensing landscape has a communication problem. Machine-readable protocols like RSL (Really Simple Licensing) work for automated systems, but they fail when human judgment enters the picture. An OpenAI engineer reviewing licensing terms doesn't want to parse JSON. An Anthropic compliance officer checking whether their crawler respects publisher wishes doesn't want to decode XML schemas.

llms.txt solves this by being readable. Plain text. Natural language. A document that both AI systems and humans can interpret without tooling.

This isn't a replacement for RSL or robots.txt. It's a complement. RSL tells machines what to do. llms.txt tells humans (and increasingly, AI systems themselves) what the rules are in language they can process contextually.

What llms.txt Is

Human-Readable vs. Machine-Readable Licensing

The RSL protocol uses structured data formats. A JSON record looks like this:

{
  "licensor": "Example Publisher",
  "content_type": "news",
  "pricing_model": "per_crawl",
  "rate": 0.005
}

Machines parse it efficiently. Humans squint at curly braces.

llms.txt uses prose:

This site is operated by Example Publisher.

We license our content to AI companies for training and retrieval purposes.

Per-crawl rate: $0.005 for news content.
Contact [email protected] to establish a billing relationship.

Both communicate the same information. llms.txt does it in a format that requires no technical training to understand.
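
The relationship between the two formats can be sketched in a few lines of Python. This is only an illustration: the field names come from the JSON example above, and `render_prose` is a hypothetical helper, not part of RSL or of any llms.txt tooling.

```python
import json

# Record copied from the JSON example above; the field names are illustrative.
RECORD = json.loads("""
{
  "licensor": "Example Publisher",
  "content_type": "news",
  "pricing_model": "per_crawl",
  "rate": 0.005
}
""")


def render_prose(record: dict) -> str:
    """Turn the structured record into llms.txt-style sentences."""
    # "per_crawl" -> "Per-crawl"
    model = record["pricing_model"].replace("_", "-").capitalize()
    return "\n".join([
        f"This site is operated by {record['licensor']}.",
        "",
        f"{model} rate: ${record['rate']:.3f} for {record['content_type']} content.",
    ])


print(render_prose(RECORD))
```

Generating the prose from the structured record is one way to keep the two files from drifting apart, though a hand-written llms.txt can carry context that a template cannot.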

Why Plain Text Matters for AI Context Understanding

Modern LLMs process text through context windows. When Claude or GPT-4 encounters your llms.txt file during a retrieval operation, it doesn't need special parsing logic. It reads the text the same way it reads any document.

AI systems can understand and potentially respect licensing terms expressed in natural language, even without explicit programming to parse a specific file format.

Anthropic's Claude implementation reportedly checks for llms.txt files and incorporates the content into its context when deciding how to handle retrieved information.

RSL handles the crawling phase. llms.txt potentially influences what happens after content is already in the AI's context window.

llms.txt as Complementary to RSL

RSL Protocol

Target: AI crawler systems
Format: JSON or XML
Function: Automated decision-making during crawl operations
Enforcement: Cloudflare Pay-Per-Crawl

llms.txt

Target: Humans reviewing licensing, AI systems during retrieval/inference
Format: Plain text, Markdown optional
Function: Communication of terms, contextual understanding
Enforcement: Relies on AI company compliance teams

A publisher running both files has coverage across the full lifecycle: RSL during the crawl, llms.txt during human review and at retrieval or inference time.

File Structure and Required Elements

Header Section

Every llms.txt file starts with identification:

# llms.txt for ExamplePublication.com
# Last updated: 2026-01-15
# Licensing contact: [email protected]

The update date matters for audit trails. Licensing contact should route to someone authorized to negotiate.
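
A reader that wants to use the header for audit purposes could extract the fields like this. The sketch assumes the `# Key: value` comment layout shown above; `parse_header` is a hypothetical helper and `licensing@example.com` is a placeholder address.

```python
import re
from datetime import date

# Matches "# Key: value" comment lines.
HEADER_LINE = re.compile(r"^#\s*(?P<key>[^:]+):\s*(?P<value>.+)$")


def parse_header(text: str) -> dict:
    """Collect key/value pairs from the leading comment block."""
    fields = {}
    for line in text.splitlines():
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        match = HEADER_LINE.match(line)
        if match:
            fields[match.group("key").strip().lower()] = match.group("value").strip()
    return fields


sample = """# llms.txt for ExamplePublication.com
# Last updated: 2026-01-15
# Licensing contact: licensing@example.com

ExamplePublication.com publishes business journalism.
"""

header = parse_header(sample)
last_updated = date.fromisoformat(header["last updated"])
print(last_updated, header["licensing contact"])
```

Writing the date in ISO 8601 form (YYYY-MM-DD) keeps it unambiguous for both people and parsers.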

Licensing Terms in Natural Language

ExamplePublication.com publishes business journalism covering
the technology sector.

Our content is protected by copyright. AI companies may access
our content for training or retrieval purposes under the following
terms:

1. Per-crawl licensing: $0.008 per page crawled for training purposes.
2. Retrieval licensing: $0.003 per page retrieved for real-time
   AI responses.
3. Payment must be established before crawling begins. Contact
   our licensing team to set up billing via Stripe.
4. Crawlers that access our content without payment will be
   blocked and reported.

Specificity matters. "$0.008 per crawl for training, $0.003 for retrieval" is actionable. "We charge for AI access" is vague.

Content Scope Definitions

Content scope:

INCLUDED in licensing terms:
- All articles published in /news/, /analysis/, and /research/
- Archived content from 2020 to present
- Data tables, charts, and embedded visualizations

EXCLUDED from licensing terms (not available for AI training):
- Subscriber-only content behind /premium/
- Wire service content (AP, Reuters) that we redistribute under
  separate license
- User-submitted comments
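
A crawler operator could turn a scope statement like the one above into a simple path check. This is a sketch under assumptions: the prefixes are taken from the example, `is_licensable` is a hypothetical name, and rules that are not path-based (wire content, comments, the 2020 cutoff) would need page metadata that is not modeled here.

```python
from urllib.parse import urlparse

# Path prefixes from the example scope statement above.
INCLUDED_PREFIXES = ("/news/", "/analysis/", "/research/")
EXCLUDED_PREFIXES = ("/premium/",)


def is_licensable(url: str) -> bool:
    """Return True if the URL path falls under the licensed sections."""
    path = urlparse(url).path
    if path.startswith(EXCLUDED_PREFIXES):
        return False  # exclusions win over inclusions
    return path.startswith(INCLUDED_PREFIXES)


print(is_licensable("https://example.com/news/2026/story"))
print(is_licensable("https://example.com/premium/report"))
```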

Pricing and Payment Instructions

Pricing:

Standard per-crawl rates:
- News content (/news/): $0.005 per crawl
- Analysis content (/analysis/): $0.010 per crawl
- Research reports (/research/): $0.020 per crawl

Volume discounts available for crawlers exceeding 50,000
requests per month.

Payment:

We use Cloudflare Pay-Per-Crawl for automated billing.
Compliant crawlers will be prompted to establish payment
via Stripe upon first request.

Non-payment enforcement:

Crawlers that access content without payment will be:
1. Throttled to 10 requests per day (first offense)
2. Blocked entirely (repeated violation)
3. Reported to industry blocklists

Creating an Effective llms.txt File

Tone and Clarity

Write llms.txt as if you're explaining your licensing to two audiences: a competent business professional and a large language model.

Both benefit from:

Short sentences over compound structures
Concrete numbers over vague ranges
Explicit statements over implied meanings
Consistent terminology throughout

Weak language:

We generally expect AI companies to pay for access to our
valuable content, though we're open to discussing various
arrangements depending on the nature of the usage.

Strong language:

AI companies must pay $0.008 per page crawled. We offer
volume discounts for monthly crawl counts exceeding 50,000
pages. Contact [email protected] before crawling begins.

Specificity Requirements

Research suggests that vague terms get treated as non-binding suggestions. Specific terms get treated as requirements.

Publishers testing llms.txt implementations report higher compliance rates when their files include:

Exact dollar amounts, not ranges
Specific URL paths, not general references
Named contacts with email addresses
Numbered lists of terms, not paragraph-form prose
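
The four elements above lend themselves to a quick self-check before publishing. The following is a rough sketch, not an official validator: the regular expressions are heuristics, `missing_elements` is a hypothetical helper, and `licensing@example.com` is a placeholder address.

```python
import re

# One heuristic pattern per element in the list above.
CHECKS = {
    "exact dollar amount": re.compile(r"\$\d+(?:\.\d+)?"),
    "specific URL path": re.compile(r"(?<![\w.])/[a-z0-9_-]+/", re.I),
    "contact email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "numbered term": re.compile(r"^\s*\d+\.\s", re.M),
}


def missing_elements(text: str) -> list:
    """Return the names of the elements that the text does not contain."""
    return [name for name, pattern in CHECKS.items() if not pattern.search(text)]


weak = "We generally expect AI companies to pay for access to our valuable content."
strong = (
    "1. AI companies must pay $0.008 per page crawled in /news/.\n"
    "2. Contact licensing@example.com before crawling begins."
)

print(missing_elements(weak))
print(missing_elements(strong))
```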

Placement and Discoverability

Hosting at /llms.txt

Convention matters:

https://example.com/llms.txt

Not in a subdirectory. Not with a different filename.

Cloudflare's crawler detection system checks the root location. Community standards assume the root location.
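
Because the location is fixed, a client can derive it from any page URL on the site. A minimal sketch using only the standard library (`llms_txt_url` is a hypothetical helper name):

```python
from urllib.parse import urlsplit, urlunsplit


def llms_txt_url(page_url: str) -> str:
    """Map any URL on a site to that site's root-level llms.txt."""
    parts = urlsplit(page_url)
    # Keep scheme and host; drop the path, query string, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))


print(llms_txt_url("https://example.com/news/2026/story?id=7#top"))
```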

Linking from robots.txt

# AI licensing terms available at:
# https://example.com/llms.txt
# https://example.com/rsl.json

User-agent: GPTBot
Crawl-delay: 10
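
robots.txt parsers ignore comment lines, so these pointers are only useful to a human reader or to a tool that looks for them deliberately. A tool of that kind might be sketched as follows; `licensing_links` is a hypothetical helper, and the "URL alone on a comment line" convention is an assumption based on the example above.

```python
import re

# A comment line whose content is a bare URL.
URL_IN_COMMENT = re.compile(r"^#[ \t]*(https?://\S+)", re.M)


def licensing_links(robots_txt: str) -> list:
    """Return URLs that appear on their own in robots.txt comment lines."""
    return URL_IN_COMMENT.findall(robots_txt)


robots = """# AI licensing terms available at:
# https://example.com/llms.txt
# https://example.com/rsl.json

User-agent: GPTBot
Crawl-delay: 10
"""

print(licensing_links(robots))
```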

HTTP Header Signals

X-AI-Licensing: https://example.com/llms.txt
X-RSL-Location: https://example.com/rsl.json
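
How these headers get attached depends on your server. As one possibility, a Python site served through WSGI could add them with a small middleware. The header names are the non-standard ones shown above, and `add_licensing_headers` is a hypothetical name.

```python
# Custom headers from the example above; neither is a registered standard.
LICENSING_HEADERS = [
    ("X-AI-Licensing", "https://example.com/llms.txt"),
    ("X-RSL-Location", "https://example.com/rsl.json"),
]


def add_licensing_headers(app):
    """Wrap a WSGI app so every response carries the licensing headers."""
    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            # Append our headers to whatever the wrapped app set.
            return start_response(status, list(headers) + LICENSING_HEADERS, exc_info)
        return app(environ, patched_start)
    return wrapped
```

On nginx, Apache, or a CDN, the equivalent is usually a one-line response-header rule in the server configuration.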

How AI Systems Use llms.txt

Claude's Parsing Behavior

Observed behavior suggests:

When Claude retrieves content from a domain, it may check for llms.txt
If found, the content enters Claude's context window
Claude's responses may incorporate awareness of licensing terms

This isn't guaranteed behavior. It's inferred from publisher reports.

OpenAI's Response

OpenAI has not published an official position on llms.txt. GPTBot respects robots.txt directives. ChatGPT retrieval behavior is less well documented.

What's clear: OpenAI's partnerships team reads licensing documentation when evaluating publishers for direct deals.

Retrieval-Augmented Generation Systems

RAG systems retrieve external content and inject it into LLM prompts. llms.txt becomes relevant when RAG systems:

Retrieve content from your domain
Check for licensing terms before including content
Potentially filter or modify responses based on stated terms

Most RAG implementations don't check llms.txt today. But as AI licensing matures, expect more systems to incorporate licensing awareness.
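
For a RAG pipeline that does want licensing awareness, the simplest approach is to place the site's terms in the prompt next to the retrieved passages. The sketch below is hypothetical: `build_prompt` and the prompt layout are illustrative and do not describe how any particular vendor's system works.

```python
from typing import Optional


def build_prompt(question: str, passages: list, licensing_terms: Optional[str] = None) -> str:
    """Assemble an LLM prompt, including the source site's terms if any were found."""
    sections = []
    if licensing_terms:
        sections.append(
            "Licensing terms stated by the source site:\n" + licensing_terms.strip()
        )
    sections.append("Retrieved content:\n" + "\n\n".join(passages))
    sections.append("Question: " + question)
    return "\n\n".join(sections)


prompt = build_prompt(
    "What did the report conclude?",
    ["Passage one.", "Passage two."],
    licensing_terms="Retrieval licensing: $0.003 per page retrieved.\n",
)
print(prompt)
```

A stricter pipeline could check the terms before retrieval and skip the domain entirely instead of passing the terms along.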

Updating and Versioning

When to Update

Update your llms.txt when:

Pricing changes
Scope changes
Contact changes
Policy changes

Don't update for trivial changes.

Changelog Best Practices

# CHANGELOG

2026-01-15: Updated retrieval pricing from $0.004 to $0.003
2025-11-20: Added /research/ section to licensed content
2025-09-01: Initial llms.txt publication
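
Keeping each entry in a `YYYY-MM-DD: note` form also makes the changelog easy to read programmatically. A minimal sketch, assuming that layout (`parse_changelog` is a hypothetical helper):

```python
import re
from datetime import date

# One changelog entry per line: "YYYY-MM-DD: description".
ENTRY = re.compile(r"^(\d{4}-\d{2}-\d{2}):\s*(.+)$", re.M)


def parse_changelog(text: str) -> list:
    """Return (date, note) pairs, newest first."""
    entries = [(date.fromisoformat(day), note) for day, note in ENTRY.findall(text)]
    return sorted(entries, reverse=True)


changelog = """# CHANGELOG

2025-11-20: Added /research/ section to licensed content
2025-09-01: Initial llms.txt publication
"""

for day, note in parse_changelog(changelog):
    print(day.isoformat(), "-", note)
```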

Archiving Previous Versions

Save dated copies:

/archive/llms-txt-2025-09-01.txt
/archive/llms-txt-2025-11-20.txt
/archive/llms-txt-2026-01-15.txt

Don't link to these from your active llms.txt. Maintain them internally for legal documentation.

The llms.txt specification fills a gap in the AI licensing stack. RSL handles machine-to-machine communication. robots.txt handles crawler directives. llms.txt handles everything that requires human interpretation or AI contextual understanding.

The file takes 30 minutes to write. The benefits compound as AI licensing matures.

For related guides, see RSL Protocol Implementation and AI Content Licensing Models.