Cloudflare Pay-Per-Crawl Setup: Complete Configuration Guide for Publishers
Quick Summary
- What this covers: Step-by-step guide to configuring Cloudflare Pay-Per-Crawl for AI crawler monetization. Learn pricing tiers, Stripe billing integration, and enforcement settings.
- Who it's for: publishers and site owners managing AI bot traffic
- Key takeaway: Read the first section for the core framework, then use the specific tactics that match your situation.
The robots.txt honor system collapsed sometime around late 2024. OpenAI, Anthropic, and ByteDance had already scraped billions of pages by the time publishers added disallow rules. The damage was done. Training data was collected. The models were built.
Blocking AI crawlers today stops future scraping but recovers nothing from the past. And blocking generates zero revenue.
Cloudflare launched Pay-Per-Crawl in July 2025 as a third option. Not blocking. Not allowing freely. Charging.
The concept is simple: AI companies that want your content pay for access. Those that refuse get blocked or throttled. Cloudflare handles detection, billing via Stripe, and enforcement. You set the prices.
Publishers running Pay-Per-Crawl report $500 to $5,000 monthly from AI crawler licensing. Not transformational revenue for large news organizations. Meaningful new income for trade publications, technical documentation sites, and niche content producers.
This guide walks through the full setup process. Four to six hours of configuration. Revenue generation starting within 30 to 60 days.
[INTERNAL: RSL Protocol Implementation Guide]
What Cloudflare Pay-Per-Crawl Is (And Why Publishers Are Switching From Blocking)
The Collapse of the robots.txt Honor System
robots.txt was never legally binding. It was a social contract. Crawlers agreed to check the file and respect directives because the alternative was web chaos.
AI companies broke that contract at scale.
Common Crawl scraped the web for years, feeding training data to multiple AI companies. Bytespider (ByteDance's crawler) famously ignores robots.txt entries entirely. Even companies that claim compliance scraped aggressively before publishers realized what was happening.
By 2025, research showed 75% of major publishers had added AI crawler blocks to robots.txt. The blocks came too late. GPT-4 was already trained. Claude was already trained. The archives were already in the models.
Blocking today prevents future training. It doesn't undo past extraction. And it generates nothing in return.
How Pay-Per-Crawl Differs from Traditional Ad Revenue
Traditional publishing economics: Create content. Attract traffic. Sell advertising against pageviews. Content value measured by human attention.
AI licensing economics: Create content. AI systems extract it. Value measured by training utility and retrieval frequency. Human traffic optional.
Pay-Per-Crawl monetizes the extraction itself. Every time GPTBot requests a page, Cloudflare charges your configured rate. Every time ClaudeBot scrapes your archive, the meter runs.
| Revenue Model | Value Metric | Your Control |
|---|---|---|
| Display advertising | Human pageviews | Limited (depends on traffic) |
| Subscription | Paying readers | High (paywall decisions) |
| Pay-Per-Crawl | AI crawler requests | High (pricing, enforcement) |
The models aren't mutually exclusive. Publishers run display ads, subscriptions, and AI licensing simultaneously. Each captures different value from the same content.
AI Companies Willing to Pay vs. AI Companies That Ignore Terms
Not all AI companies respond the same to licensing requirements.
Compliant crawlers (observed behavior in Pay-Per-Crawl implementations):
- GPTBot (OpenAI): Checks robots.txt, responds to rate limits, negotiates volume discounts
- ClaudeBot (Anthropic): Honors robots.txt, pays published rates without negotiation
- Google-Extended (Google Gemini): Separate from Googlebot search indexing, complies with payment requirements
Non-compliant crawlers (observed behavior):
- Bytespider (ByteDance): Ignores robots.txt, ignores RSL, doesn't respond to licensing outreach
- CCBot (Common Crawl): Technically compliant but feeds training data to multiple AI companies
- Various Chinese AI crawlers: Operate outside Western compliance frameworks
Pay-Per-Crawl works for compliant crawlers. Non-compliant ones get blocked. The result: revenue from companies that play by rules, protection from those that don't.
[INTERNAL: AI Crawler Directory]
Prerequisites: What You Need Before Starting
Cloudflare Account Requirements (Pro Plan Minimum)
Pay-Per-Crawl requires Cloudflare Pro plan or higher. The Free plan doesn't include the Bot Management features necessary for AI crawler detection and pricing.
| Plan | Monthly Cost | Pay-Per-Crawl Access |
|---|---|---|
| Free | $0 | No |
| Pro | $20 | Yes |
| Business | $200 | Yes (advanced rules) |
| Enterprise | Custom | Yes (full customization) |
Most publishers start with Pro. Upgrade to Business if you need more granular firewall rules or handle high crawler volume.
Before starting configuration, verify:
- Domain is active on Cloudflare (DNS proxied through CF nameservers)
- SSL certificate is active (required for secure billing)
- You have admin access to the Cloudflare dashboard
If your site isn't on Cloudflare yet, add it first. DNS propagation takes 24 to 48 hours. Don't start Pay-Per-Crawl setup until the domain is fully proxied.
Crawler Activity Baseline (90-Day Analytics Snapshot)
You can't price intelligently without knowing current demand. Before setting rates, analyze your existing crawler activity.
Pull 90 days of server logs. Filter for AI crawler user-agents:
GPTBot
ClaudeBot
ClaudeBot-User
Bytespider
Google-Extended
CCBot
Applebot-Extended
Meta-ExternalAgent
PerplexityBot
Calculate for each crawler:
- Total requests per day
- Requests per page (which content they target)
- Crawl depth (surface pages vs. deep archives)
- Time patterns (continuous vs. batched crawls)
This baseline tells you:
- Which AI companies value your content most (high-frequency crawlers)
- Which content sections attract AI attention (pricing tier candidates)
- Total addressable market (if all compliant crawlers paid, what's the revenue?)
Publishers who skip baseline analysis guess at pricing. Publishers who analyze first price based on demonstrated demand.
Content Inventory and Valuation Preparation
Not all content has equal AI training value.
Higher value (commands premium pricing):
- Technical documentation with code examples
- Proprietary research and original data
- Expert analysis in specialized fields
- Real-time information (breaking news, market data)
Lower value (commodity pricing):
- Aggregated news coverage
- General information available elsewhere
- Opinion content without unique data
- Archived content older than 2 years
Map your content sections to value tiers before configuring pricing. A site-wide flat rate leaves money on the table for high-value sections and potentially overprices commodity content.
Example tier structure:
/news/: $0.003/crawl (commodity news)/analysis/: $0.008/crawl (expert analysis)/research/: $0.015/crawl (proprietary data)/docs/: $0.020/crawl (technical documentation)
[INTERNAL: Pricing Your Content for AI Training]
Step 1: Enable Cloudflare AI Crawler Detection
Navigating the Cloudflare Dashboard to Bot Management
Log into Cloudflare dashboard. Select your domain.
Navigate: Security > Bot Management > Configure Bot Management
If you don't see Bot Management, verify your plan supports it (Pro or higher).
The Bot Management panel shows:
- Detected bot traffic breakdown (verified bots, likely bots, likely humans)
- Bot traffic over time
- Top user-agents by request volume
This view confirms Cloudflare is seeing your crawler traffic. If AI crawler user-agents don't appear here, check that DNS is properly proxied (orange cloud icon in DNS settings).
Configuring Detection Rules for GPTBot, ClaudeBot, Bytespider
Navigate to Security > WAF > Custom Rules
Create a rule for each AI crawler you want to monetize or block.
Example rule for GPTBot (OpenAI):
Name: GPTBot Licensing Required
If: User-Agent contains "GPTBot"
Then: Managed Challenge (or Block, depending on strategy)
For Pay-Per-Crawl specifically, you'll use the AI Crawler Monetization panel (added in July 2025 update):
Security > AI Crawlers > Monetization Settings
This panel lists known AI crawlers with toggles:
- Allow: Crawler accesses freely (not recommended)
- License: Crawler must pay configured rate
- Block: Crawler receives 403 response
- Throttle: Crawler allowed at reduced rate
Set GPTBot, ClaudeBot, and Google-Extended to License. Set Bytespider and unknown crawlers to Block.
Testing Detection Accuracy with Crawler Simulation Tools
Before going live, verify detection works. Use Cloudflare's built-in testing:
Security > AI Crawlers > Test Detection
Enter a user-agent string. Cloudflare reports whether it matches a known AI crawler and which rule applies.
External testing options:
- Fetch your own pages using curl with AI crawler user-agents
- Use SEO crawler tools configured with GPTBot user-agent
- Request a test crawl from Anthropic's ClaudeBot verification system
Testing catches configuration errors before they cost revenue. A misconfigured rule might block compliant crawlers who would have paid, or allow non-compliant crawlers who should be blocked.
Step 2: Set Per-Crawl Pricing Tiers
Industry Benchmark Pricing
Pricing data from 50+ publisher implementations (as of January 2026):
| Content Type | Low Range | Typical | Premium |
|---|---|---|---|
| News (general) | $0.002 | $0.003-$0.005 | $0.008 |
| News (breaking/real-time) | $0.008 | $0.010-$0.015 | $0.020 |
| B2B/Trade publications | $0.006 | $0.008-$0.012 | $0.015 |
| Technical documentation | $0.010 | $0.015-$0.020 | $0.030 |
| Research/proprietary data | $0.015 | $0.020-$0.030 | $0.050 |
| User-generated content | $0.001 | $0.002-$0.003 | $0.005 |
These are per-crawl rates, not per-inference. A single page crawl at $0.01 generates $0.01 whether the AI company uses that content in one response or one million responses.
Don't race to the bottom. AI companies pay News Corp $50 million annually. They can afford your $0.01 per crawl.
Don't price yourself out. If rates are too high, compliant crawlers might opt out entirely. You earn nothing instead of something.
Volume Discount Structures for High-Frequency Crawlers
High-volume crawlers have negotiating leverage. A crawler requesting 500,000 pages monthly brings more total revenue than one requesting 5,000 pages, even at lower per-crawl rates.
Example volume discount structure:
- 0-10,000 crawls/month: Standard rate
- 10,001-50,000 crawls/month: 15% discount
- 50,001-100,000 crawls/month: 25% discount
- 100,000+ crawls/month: Negotiate directly
Configure this in Cloudflare under:
AI Crawlers > Pricing > Volume Discounts
Alternatively, set flat rates in Cloudflare and handle volume discounts through manual outreach. When GPTBot hits 50,000 monthly requests, email OpenAI's partnerships team offering reduced rates for committed volume.
Dynamic Pricing Based on Content Freshness and Depth
Advanced implementations vary pricing by content attributes:
By freshness:
- Content published < 24 hours: Premium rate (breaking news value)
- Content 1-30 days old: Standard rate
- Content 30+ days old: Archive rate (lower value, but still monetizable)
By depth:
- Homepage and section fronts: Lower rate (thin content)
- Individual articles: Standard rate
- Data tables, research reports: Premium rate
Cloudflare supports path-based pricing:
AI Crawlers > Pricing > Path Rules
/breaking/*: $0.015
/news/*: $0.005
/research/*: $0.020
/archive/*: $0.002
Complex pricing requires more configuration but captures value more accurately than flat rates.
Step 3: Configure Payment and Enforcement Settings
Connecting Stripe for Automated Billing
Cloudflare Pay-Per-Crawl uses Stripe for payment processing. You need a Stripe account connected to receive funds.
Navigate: AI Crawlers > Billing > Connect Stripe
Follow the OAuth flow to authorize Cloudflare to create charges on your behalf.
Stripe handles:
- Credit card processing for AI companies
- Invoicing for enterprise accounts
- Payout to your bank account (standard 2-day rolling)
- Tax documentation (1099 for US publishers)
Cloudflare takes a processing fee (currently 5% of AI licensing revenue, subject to change). This covers detection, enforcement, and billing infrastructure. It's comparable to payment processor fees in other contexts.
Once connected, test the billing flow:
- Request a test crawl using an AI company's verification system
- Verify the charge appears in your Stripe dashboard
- Confirm the amount matches your configured pricing
Setting Grace Periods and Rate Limits
Not every crawl should trigger immediate billing. Grace periods allow AI companies to establish payment before charges begin.
Recommended settings:
- Grace period: 24 hours (allows time for payment setup)
- Rate limit during grace: 100 requests (enough to test, not enough to scrape extensively)
- Post-grace behavior: Full billing or block
Configure under: AI Crawlers > Billing > Grace Settings
New crawlers hitting your site for the first time receive a Cloudflare challenge prompting them to establish payment. The grace period gives their systems time to process this before you block them for non-payment.
Blocking vs. Throttling Non-Paying Crawlers
Three enforcement options for crawlers that don't pay:
Block (403 response)
- Crawler receives denied access
- No content delivered
- Clearest enforcement
- Risk: Might escalate to user-agent spoofing
Throttle (rate-limited access)
- Crawler receives content but slowly
- Limited requests per minute/hour
- Maintains some access as negotiation leverage
- Risk: Crawler still gets content, just slower
Challenge (CAPTCHA/verification)
- Crawler receives verification challenge
- Legitimate crawlers can solve and proceed to payment
- Blocks crude scraping scripts
- Risk: Additional friction for compliant crawlers
Most publishers start with Block for known non-compliant crawlers (Bytespider) and Challenge for unknown crawlers. Compliant crawlers that just haven't set up payment yet can complete verification and proceed.
Step 4: Deploy and Monitor
DNS Propagation and TXT Record Verification
If you're adding Pay-Per-Crawl to an existing Cloudflare site, no DNS changes are required. The feature works through existing proxy infrastructure.
If you're new to Cloudflare, verify:
- All DNS records are proxied (orange cloud icon)
- SSL/TLS is active (Full or Full Strict mode)
- Cache rules don't interfere with crawler detection
Add a TXT record documenting your AI licensing terms:
Type: TXT
Name: _ai-licensing
Value: "Cloudflare Pay-Per-Crawl active. Terms at /llms.txt"
This TXT record isn't required for Pay-Per-Crawl to function, but it provides another signal to AI companies reviewing your licensing position.
First-Week Analytics: What to Watch For
After deployment, monitor daily for the first week:
In Cloudflare dashboard:
- AI crawler requests by user-agent
- Challenge completion rates (are crawlers solving verification?)
- Billing events (payments initiated, completed, failed)
- Block events (which crawlers are being denied?)
In Stripe dashboard:
- Charges by AI company
- Payment success/failure rates
- Dispute or chargeback activity
Red flags to investigate:
- Crawler requests dropping to zero (possible detection issue)
- High challenge failure rate (configuration too aggressive)
- No billing events despite crawler activity (payment flow broken)
- Unexpected user-agents appearing (new crawlers, possibly spoofed)
Expect a 1-2 week ramp-up period. AI companies need time to detect your pricing requirements and establish payment. Revenue in week one may be minimal. Revenue by week four should reflect steady-state.
Adjusting Pricing Based on Crawler Response Patterns
After 30 days, analyze crawler behavior:
If compliant crawlers are paying without complaint:
- Your pricing is at market or below
- Consider testing higher rates on premium content
If compliant crawlers stopped crawling:
- Your pricing may be above market
- Test lower rates or add volume discounts
- Reach out directly to negotiate
If non-compliant crawlers are bypassing blocks:
- User-agent spoofing likely occurring
- Add IP-based blocking for known bad ranges
- Consider upgrading to Business plan for advanced rules
Pricing isn't set-and-forget. Quarterly reviews catch market shifts, new AI company entrants, and changes in your own content value.
[INTERNAL: llms.txt Specification Guide]
Troubleshooting Common Setup Issues
Crawlers Bypassing Detection (User-Agent Spoofing)
Some crawlers lie about their identity. Bytespider has been observed spoofing mainstream browser user-agents to bypass blocking.
Detection strategies:
- IP range blocking: AI companies operate from known IP ranges. Block requests from ByteDance IP ranges regardless of user-agent.
- Behavioral analysis: AI crawlers request pages faster than humans. Rate-limit rapid sequential requests.
- TLS fingerprinting: Cloudflare Business and Enterprise plans can identify crawler libraries by their TLS handshake patterns.
Add IP range blocks under: Security > WAF > Tools > IP Access Rules
Known AI company IP ranges are published in their crawler documentation. Cross-reference suspicious traffic against these ranges.
Payment Processing Delays with International AI Companies
Stripe processes payments in 30+ currencies, but international transactions can experience delays:
- Currency conversion: AI companies paying in non-USD may have processing delays
- Bank verification: First-time payers require payment method verification
- Compliance holds: Large transactions may trigger review
If you're seeing payment initiation but not completion:
- Check Stripe for pending payments awaiting action
- Verify your Stripe account is fully activated (not restricted)
- Contact Cloudflare support for billing-specific issues
For enterprise-scale AI companies, direct billing relationships often work better than automated per-crawl. Reach out to negotiate monthly invoicing if payment friction is high.
Conflicts with Existing robots.txt Rules
If your robots.txt blocks AI crawlers, Pay-Per-Crawl sees no traffic to monetize.
Typical conflict scenario:
# robots.txt
User-agent: GPTBot
Disallow: /
This blocks GPTBot at the crawler level before Cloudflare can offer payment.
Resolution options:
Option 1: Remove robots.txt blocks, rely on Cloudflare
- Remove Disallow rules for AI crawlers
- Let Cloudflare handle access control via Pay-Per-Crawl
- Compliant crawlers pay; non-compliant get blocked by Cloudflare
Option 2: Selective robots.txt + Cloudflare
- Keep blocks for non-compliant crawlers (Bytespider) in robots.txt
- Remove blocks for compliant crawlers (GPTBot, ClaudeBot)
- Use Cloudflare Pay-Per-Crawl for compliant crawler monetization
Option 3: RSL override
- Maintain robots.txt blocks as default
- Publish RSL file indicating that compliant crawlers with payment established are permitted
- Requires crawler support for RSL protocol (growing but not universal)
Most publishers choose Option 1 or 2. Relying on Cloudflare for enforcement is simpler than managing parallel systems.
When Blocking AI Crawlers Isn't the Move
Skip this if:
- Your site has less than 1,000 monthly organic visits. AI crawlers aren't your problem — getting indexed by traditional search is. Focus on content quality and link acquisition before worrying about bot management.
- You're running a personal blog or portfolio site. AI citation of your content is free exposure at this scale. Blocking crawlers costs you visibility without protecting meaningful revenue.
- Your revenue comes entirely from direct sales, not content. If your content isn't the product (e-commerce, SaaS with no content moat), AI crawlers are neutral. Your competitive advantage lives in the product, not the pages.
Pay-Per-Crawl represents a shift in how publishers think about AI company relationships. Blocking is protection without compensation. Allowing freely is contribution without compensation. Pay-Per-Crawl is commerce.
The setup takes four to six hours. The monitoring takes an hour weekly for the first month, then perhaps an hour monthly ongoing. The revenue starts in 30 to 60 days and compounds as more AI companies comply with licensing requirements.
Publishers who implement now establish pricing before market benchmarks solidify. Publishers who wait negotiate from a weaker position against established industry rates.
The infrastructure exists. The AI companies are crawling. The only variable is whether they pay you for the privilege.
[INTERNAL: RSL Protocol Implementation Guide] [INTERNAL: AI Crawler Directory] [INTERNAL: Pricing Your Content for AI Training]
Frequently Asked Questions
Should I block all AI crawlers from my site?
Not necessarily. Blocking indiscriminately cuts you off from AI-powered search results and citation traffic. The better approach is selective access — allow crawlers from platforms that drive referral traffic or pay for content, block those that only scrape without attribution. Start with robots.txt analysis, then layer in more granular controls based on your traffic data.
How do I know which AI bots are crawling my site?
Check your server access logs for user-agent strings containing GPTBot, ClaudeBot, Googlebot (with AI-related query patterns), Bytespider, CCBot, and others. Most hosting platforms expose these in analytics. If you lack raw log access, tools like Cloudflare or server-side middleware can surface bot traffic patterns without custom infrastructure.
Can I monetize AI crawler access to my content?
Some publishers are negotiating licensing deals directly with AI companies. For smaller sites, the practical path is controlling access (robots.txt, rate limiting, paywalling API endpoints) and measuring whether AI-sourced citation traffic converts. The pay-per-crawl model is emerging but not standardized — position yourself by documenting your content value and traffic patterns now.