OpenAI's Crawl Tripled After GPT-5 While ChatGPT Users Fell 28%
OpenAI's web crawl roughly tripled after the August 2025 GPT-5 launch, according to a Botify analysis of 7+ billion server log entries collected between November 2024 and March 2026. OAI-SearchBot events rose 3.5x and GPTBot events rose 2.9x. Over the same window, ChatGPT-User events (the crawler that fires when a real user runs an in-chat query) fell 28% year over year. Publishers are feeding more and getting less back.
That's the underreported line. The crawl numbers always get the headlines. The return traffic is what changes the business case for letting OpenAI into your site at all, and the return traffic is moving in the wrong direction.
The exchange rate keeps getting worse
Cloudflare's running crawl-to-refer ratio for OpenAI sits around 1,255 pages scraped for every single referral back to a publisher. Google's equivalent is roughly 14:1. Anthropic's ClaudeBot is north of 20,000:1. If you run any kind of content site, you already suspected OpenAI was an unbalanced trade. The Botify numbers close the loop: the training side is still compounding (GPTBot at 2.9x), while the user-facing side that would justify the trade is shrinking.
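The exchange rate here is just a division, but it's worth computing on your own numbers instead of quoting industry averages. A minimal sketch; the event counts below are illustrative placeholders, not figures from the Cloudflare or Botify datasets:

```python
def crawl_to_refer(crawl_events: int, referrals: int) -> float:
    """Pages crawled per referral sent back. Higher means a worse trade."""
    if referrals == 0:
        return float("inf")  # all crawl cost, zero return traffic
    return crawl_events / referrals

# Illustrative counts that reproduce the ratios cited in the text:
print(crawl_to_refer(1_255_000, 1_000))  # ~1255:1, the OpenAI-style trade
print(crawl_to_refer(14_000, 1_000))     # ~14:1, the Google-style trade
```

The zero-referral branch matters in practice: plenty of sites see nonzero GPTBot crawl and literally no OpenAI referrals, which is an infinite ratio, not a missing data point.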
The study was run by Nectiv co-founder Chris Long in partnership with Botify, pulling from 250+ billion total log entries across Botify's enterprise clients. That's the largest publicly discussed dataset on OpenAI crawl behavior I've seen in 2026, and the signal is consistent: from what the data shows, the 28% drop in ChatGPT-User activity has held since December 2025. Not a spike. Not a seasonal dip. A sustained flattening of the user side of the crawl.
For scale, Googlebot generated 18.2 billion events in the same dataset versus OpenAI's 887 million. OpenAI still sits at about 4% of Google's crawl volume, which sounds small until you put it on the year-over-year line: OpenAI's share of total crawl was 1.38% twelve months ago, so that share has roughly tripled, which matches the headline number. The trajectory is what to watch, not the absolute number. If the next 12 months compress at the same rate, OpenAI crawl stops being a rounding error and starts sitting inside most sites' top three bot loads.
Some verticals are carrying almost all the cost
Botify broke the crawl growth out by industry, and the spread is wider than most operators would guess:
- Healthcare: 740% growth in OpenAI crawl volume
- Media and publishing: 702%
- Marketplaces: 216%
- Software and internet: 205%
- Retail and e-commerce: 195%
- Travel: 30%
Travel was the only vertical that roughly held flat. Everyone else got absorbed, with media and healthcare leading by a mile. That tracks with what the Chartbeat numbers showed about small publishers losing 60% of search traffic: different crawler, same dynamic, same publishers on the receiving end.
The search-vs-training ratio flipped, and it matters per vertical
Before GPT-5 shipped, the ratio of OAI-SearchBot (live query crawler) to GPTBot (training crawler) sat at 0.95. OpenAI was pulling slightly more for training than for live search. Post-GPT-5, the ratio climbed to 1.14. Search-side crawl now dominates, at least in aggregate.
Break it out by vertical and it gets interesting. Media and publishing sits at 256%, the highest OAI-SearchBot-to-GPTBot ratio in any industry. Healthcare runs the opposite direction, roughly -50% (GPTBot favored). Retail comes in near -33% (GPTBot favored). The two crawlers do different jobs. A SearchBot pull carries at least the potential for a citation inside a ChatGPT answer, so there's a theoretical upside. GPTBot pulls are strictly for model training and return nothing you can measure in GA4.
If you're in media, the trade is at least two-sided on paper: content gets scraped for search, search surfaces might cite you, users may click through. If you're in healthcare or e-commerce, you're still subsidizing training with no near-term citation signal to balance it. That's a meaningfully different calculation per vertical, and I don't see most content teams making it that way.
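That per-vertical calculation reduces to the ratio of search-side to training-side crawl. A hypothetical helper to make the framing concrete; the labels and the 1:1 threshold are my framing of the text's argument, not Botify's methodology:

```python
def search_training_balance(searchbot_events: int, gptbot_events: int) -> str:
    """Classify crawl load as search-favored (citation upside) or training-favored."""
    if gptbot_events == 0:
        return "search-only"
    ratio = searchbot_events / gptbot_events
    if ratio > 1.0:
        return f"search-favored ({ratio:.2f}:1)"
    return f"training-favored ({ratio:.2f}:1)"

# The aggregate pre- and post-GPT-5 ratios cited above, 0.95 and 1.14:
print(search_training_balance(95, 100))   # training-favored (0.95:1)
print(search_training_balance(114, 100))  # search-favored (1.14:1)
```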
The consumer side doesn't look great either
OpenAI's share of AI chatbot traffic worldwide dropped from 86.7% in January 2025 to roughly 64.5% by January 2026, according to SimilarWeb. Gemini went from 5.7% to over 20% in that same window. Grok crossed 3%. DeepSeek is approaching it.
That's the context the 28% ChatGPT-User decline is sitting inside. Fewer users on the interface means fewer in-chat queries, means fewer real-time crawler hits for citation opportunities, means the referral end of the bargain keeps thinning. Meanwhile the training side keeps growing. And to be fair, this isn't entirely new. OpenAI has always indexed more than it referred. It just feels a lot less forgiving now that the consumer side is plateauing.
The 20-minute crawl audit worth running this week
You already have the log data. Most content teams just don't look at it by user-agent. Here's the minimum version:
- Pull the last 30 days of server log events. Filter by user-agent for `OAI-SearchBot`, `GPTBot`, `ChatGPT-User`, and `Googlebot` as a baseline.
- Count events per bot. Then count referrals from `chat.openai.com` or `openai.com` in GA4 or whatever analytics tool you use.
- Calculate your own crawl-to-refer ratio. If your OpenAI ratio sits north of 2,000 pages per referral and you're not in media or publishing, the training subsidy is real and it's one-sided.
- Make a separate decision per bot. OAI-SearchBot usually stays allowed (citation upside). GPTBot is where the training-only question gets sharp. Cloudflare's managed robots.txt tool handles this cleanly if you're already on their network.
- Re-run the audit in 60 days. The Botify trendline suggests the ratio keeps getting worse until OpenAI's consumer side rebounds, or the company starts signing revenue-share deals with publishers the way Google eventually did.
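The counting steps in the audit above can be sketched in a few lines against raw access-log text. A minimal version, assuming the four user-agent tokens named in the steps; real logs will want proper combined-log parsing, and the sample lines are fabricated for illustration:

```python
from collections import Counter

# User-agent tokens from the audit steps above.
BOTS = ("OAI-SearchBot", "ChatGPT-User", "GPTBot", "Googlebot")

def count_bot_events(log_lines):
    """Tally crawl events per bot by substring-matching the user-agent field."""
    counts = Counter()
    for line in log_lines:
        for bot in BOTS:
            if bot in line:
                counts[bot] += 1
                break  # attribute each line to at most one bot
    return counts

# Fabricated sample lines in combined log format:
sample = [
    '1.2.3.4 - - [01/Mar/2026] "GET /post HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '1.2.3.5 - - [01/Mar/2026] "GET /post HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]
print(count_bot_events(sample))  # GPTBot: 1, Googlebot: 1
```

Note the ordering in `BOTS`: checking `ChatGPT-User` before `GPTBot` guards against any substring overlap between the two tokens.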
One thing worth flagging: robots.txt blocks do not retroactively unwind training. If GPTBot has already crawled you 2 million times, that content is already weighted in the model. The audit isn't about clawing back what's gone. It's about deciding what happens from here, per vertical, per crawler, with the actual ratio in front of you instead of a vibe.
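The per-bot split the audit points toward, keeping the search crawler while stopping future training crawl, comes down to a few robots.txt lines. These are the user-agent tokens OpenAI publishes for its crawlers; treat this as a sketch of the mechanism, not a blanket recommendation for every vertical:

```
# Keep the live-search crawler (citation upside)
User-agent: OAI-SearchBot
Allow: /

# Stop the training crawler from here forward;
# already-ingested pages stay ingested
User-agent: GPTBot
Disallow: /
```

Well-behaved crawlers honor this on their next fetch of robots.txt, which is exactly why it governs future crawl only.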
Where the math actually breaks
The uncomfortable read is that this doesn't feel like a cyclical dip. ChatGPT's user growth flattening at the same time crawler volume is compounding points to a structural decoupling between OpenAI's product side and its ingestion side. The publishers in the middle are the ones absorbing the difference, and the distribution of that cost looks regressive: healthcare and media sites carry most of the load, retail takes a smaller hit, travel barely notices.
I'm not sure the right move is blanket blocking. There's probably still real value in AI search citations over the next 12 months, and cutting off SearchBot feels premature for anyone in a vertical where citation has a clear path to traffic. But sitting on the current crawl-to-refer numbers without flagging them internally, or without at least pulling the GPTBot lever, is the lazier call. The referral exchange rate is not your friend right now, and the next Botify cut of the data will probably tell us whether this is a temporary imbalance or the actual new shape of the deal.