Lily Ray Planted a Fake Google Update and Fooled Every Major LLM
SEO researcher Lily Ray published a fabricated article about a Google core update that never existed. Within 24 hours, Google's AI Overviews was presenting it as fact, and a related test caught Perplexity confidently describing yet another update that never happened. The experiment, documented in Search Engine Journal, exposed a self-reinforcing loop where AI-generated misinformation gets scraped, republished, and treated as authoritative by the same systems that created the problem.
How One Fake Blog Post Became an LLM's "Source of Truth"
Ray's experiment started with a straightforward question. She asked Perplexity about recent Google SEO and AI search news. Perplexity responded with confident details about a "September 2025 Perspectives Core Algorithm Update," complete with specifics about how it emphasized "deeper expertise" and "completion of the user journey."
No such update ever happened. Google never announced it, never rolled it out, and it doesn't appear in any official changelog.
When Ray traced Perplexity's citations, both URLs led to AI-generated content on SEO agency blogs. Not analyst reports. Not Google documentation. Blog posts that had fabricated the update details, which then got scraped and treated as credible because they existed on multiple sites.
For RAG-based systems like Perplexity and AI Overviews, citation volume is essentially all it takes for something to be treated as fact. If three AI-generated blog posts all say the same wrong thing, the system sees three independent sources confirming it. What it actually is: one hallucination repeated three times.
It's the information equivalent of a photocopier copying its own copies. Each generation looks a little worse, but nobody's checking against the original.
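To make the mechanism concrete, here's a minimal sketch of that failure mode. It is not Perplexity's or Google's actual retrieval code (neither pipeline is public), and the URLs and claims are invented for illustration; it just shows how scoring a claim by the number of documents that repeat it counts copies, not independent confirmations.

```python
from collections import Counter

# Toy retrieved corpus: three agency blog posts that all copied the same
# fabricated claim from one AI-generated source, plus one accurate post.
# (Hypothetical URLs and text, for illustration only.)
documents = [
    {"url": "agency-a.example/seo-news", "claim": "September 2025 Perspectives Core Update emphasized deeper expertise"},
    {"url": "agency-b.example/updates",  "claim": "September 2025 Perspectives Core Update emphasized deeper expertise"},
    {"url": "agency-c.example/blog",     "claim": "September 2025 Perspectives Core Update emphasized deeper expertise"},
    {"url": "honest-site.example/news",  "claim": "No core update was announced in September 2025"},
]

def naive_consensus(docs):
    """Score each claim by how many retrieved documents assert it.

    This is the failure mode in the article: counting documents measures
    repetition, not independence. Three scraped copies of one hallucination
    outvote a single accurate source.
    """
    counts = Counter(doc["claim"] for doc in docs)
    best_claim, votes = counts.most_common(1)[0]
    return best_claim, votes, len(docs)

claim, votes, total = naive_consensus(documents)
print(f"'Consensus' answer ({votes}/{total} sources): {claim}")
# The fabricated update wins 3 votes to 1, despite having a single real origin.
```

Real RAG pipelines are far more elaborate than a Counter, but nothing in the retrieval step inherently tells the system whether those three sources wrote independently or copied one hallucination.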
Ray took it further. In January 2026, she published a deliberately fake article on her personal blog claiming Google had approved an update "between slices of leftover pizza." Within 24 hours, AI Overviews confidently presented the fake update and even added its own context by connecting the pizza detail to real 2024 pizza-related search incidents. The system didn't just repeat the misinformation. It built on it.
91% Accuracy Sounds Good Until You Run the Numbers
A New York Times investigation, conducted with the AI startup Oumi, tested the accuracy of AI Overviews using SimpleQA, a standard factual-accuracy benchmark. Gemini 3 was about 91% accurate across 4,326 queries.
Ninety-one percent sounds reasonable. Google processes over 5 trillion searches annually. At a 9% error rate, that works out to tens of millions of wrong answers every hour. Futurism called it "misinformation at a scale possibly unprecedented in the history of human civilization." And that's the accuracy number. The citation quality number is worse.
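Before getting to the citation problem, it's worth sanity-checking the "tens of millions every hour" claim, because it sounds like hyperbole. Here's the back-of-envelope version, assuming the 5-trillion annual figure and applying the 9% error rate to every search (an upper bound, since only a fraction of searches actually trigger an AI Overview):

```python
searches_per_year = 5_000_000_000_000   # "over 5 trillion searches annually"
error_rate = 0.09                       # 1 - 0.91 SimpleQA accuracy for Gemini 3

wrong_per_year = searches_per_year * error_rate
wrong_per_hour = wrong_per_year / (365 * 24)

print(f"{wrong_per_year:,.0f} wrong answers per year")   # 450,000,000,000
print(f"{wrong_per_hour:,.0f} wrong answers per hour")   # roughly 51 million
```

Even if only a tenth of searches surface an AI Overview, that still leaves millions of wrong answers an hour.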
Even when Gemini 3 got the answer right, 56% of responses were "ungrounded," meaning the cited sources didn't actually support the claim being made. That's up from 37% with Gemini 2.
The citation quality is actively degrading while the headline accuracy number improves slightly.
For marketers using these tools to research competitor moves, track algorithm updates, or validate strategy, that 56% ungrounded rate matters more than the 91% accuracy headline. You might get a correct answer backed by sources that say something completely different. And most people never click through to check.
Model Collapse Stopped Being Theoretical About Six Months Ago
The academic term for what's happening is "model collapse." A 2024 Nature study showed that when generative models train on data that includes their own outputs, the resulting models lose the tails of the original distribution. Rare information disappears. Outputs converge toward the median. Quality degrades generation after generation.
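You can watch the mechanism in about fifteen lines. This toy simulation assumes nothing about the Nature paper's actual setup beyond its core idea: each generation trains only on a finite sample of the previous generation's output, so rare items get undersampled, and once a rare item draws zero samples it is gone for good.

```python
import numpy as np

rng = np.random.default_rng(7)

# Generation 0: the "real" distribution over four kinds of facts,
# including one rare-but-real fact that makes up 1% of the data.
probs = np.array([0.55, 0.30, 0.14, 0.01])
SAMPLE_SIZE = 100  # each generation "trains" on this many sampled examples

for generation in range(1, 16):
    # Draw a finite training set from the current model's distribution...
    sample = rng.choice(len(probs), size=SAMPLE_SIZE, p=probs)
    # ...and let the next model be the empirical frequencies of that sample.
    counts = np.bincount(sample, minlength=len(probs))
    probs = counts / counts.sum()
    print(f"gen {generation:2d}: rare-fact share = {probs[-1]:.3f}")

# The rare fact's share drifts with every generation, and the moment it
# draws zero samples its probability becomes exactly 0 -- an absorbing
# state it can never climb back out of. The tails go first.
```

The healthcare example below is this same dynamic, with "rare but critical pathology" playing the part of the 1% category.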
This used to live in research papers. A February 2026 report in Communications of the ACM documented model collapse appearing in production systems, including a commercial background remover that started failing on specific hair textures and image generators producing increasingly homogeneous outputs. In healthcare, researchers found false reassurance rates in AI-generated medical reports tripled to 40% after just two generations of training on synthetic data. Rare but critical pathology simply vanished from the model's vocabulary.
We wrote recently about Sam Altman expressing concern about the dead internet. The AI slop loop is the mechanism behind that concern. It's not just that AI content is everywhere. The next generation of AI tools is training on that content, and the generation after that will train on even more of it. The quality ratchet only turns one direction.
The Research Problem Sitting Upstream of Every Marketing Decision
If you use AI tools for competitive research, keyword analysis, or staying current on platform changes, you're downstream of this problem.
Perplexity confidently described a Google algorithm update that never happened. If a marketing team adjusted their SEO strategy based on that fabricated update, they'd have wasted weeks optimizing for criteria that don't exist. And they'd have no way of knowing unless someone on the team had the experience to spot it.
It's worth flagging that 76% of marketers now use generative AI daily. Most companies aren't checking the output against primary sources. The AI slop loop means the output is getting less reliable over time, but most workflows haven't added verification steps to account for that drift.
A University of Florida study published in the Journal of Marketing Research found that the flood of low-quality AI content is already congesting recommendation systems, making it harder for both consumers and professionals to surface quality work. "Because the quantity is so large, it congests the recommendation systems," the study's lead author Tianxin Zou noted. The volume problem feeds the quality problem feeds the volume problem.
Your AI Research Workflow Probably Has a Credibility Gap
The uncomfortable part of this: if you're using an LLM as a research starting point (and from what I've seen, most marketers are), you now need a verification step between "AI told me this" and "I'm going to act on this." That's annoying, because the whole point of using AI for research was to save time. But the alternative is making decisions on increasingly contaminated information.
For algorithm updates and platform changes: go to the source. Google's official blog, Meta's newsroom, the platform's actual documentation. If the AI summary says there was an update, find the announcement. If you can't find it within five minutes of searching, it probably didn't happen.
For competitive intelligence and market data: cross-reference against at least one paid tool (Semrush, Ahrefs, SimilarWeb) that pulls from proprietary datasets, not scraped web content. These aren't immune to data quality issues, but they're not training on each other's outputs. Yet.
For industry analysis: favor named authors with track records. A bylined piece on Search Engine Journal has different credibility than an unsigned blog post on an agency site you've never heard of. Source authority still matters, even though the LLMs can't tell the difference. Recent research from Ahrefs showed that what makes ChatGPT cite a page has almost nothing to do with what makes Google rank it, which means the LLM's idea of a "good source" is already diverging from anything resembling editorial standards.
And honestly, the simplest check: if an AI tool gives you a fact that would change how you spend money, click the citation link. Just that one thing. If the link doesn't exist, or if the source doesn't say what the AI claims it says (which happens 56% of the time with AI Overviews), you've saved yourself from a bad decision with five seconds of effort.
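If you want to make that five-second habit slightly more systematic, even the dumbest possible script catches the two worst cases: a citation link that doesn't resolve, and a cited page that never mentions the claim at all. The helper below is hypothetical, not an integration with any AI tool's output format, and keyword matching is nowhere near real fact-checking; treat it as a tripwire, not a verdict.

```python
import requests

def spot_check_citation(url: str, claim_keywords: list[str]) -> str:
    """Crude sanity check for an AI-cited source.

    Catches only the grossest failures: a link that doesn't resolve, or a
    page that never even mentions the terms of the claim it supposedly
    supports. Anything subtler still needs a human read.
    """
    try:
        resp = requests.get(url, timeout=10,
                            headers={"User-Agent": "citation-spot-check/0.1"})
    except requests.RequestException as exc:
        return f"FAIL: could not fetch {url} ({exc})"
    if resp.status_code >= 400:
        return f"FAIL: {url} returned HTTP {resp.status_code}"

    page = resp.text.lower()
    missing = [kw for kw in claim_keywords if kw.lower() not in page]
    if missing:
        return f"SUSPECT: page never mentions {missing} -- read it before acting"
    return "Passed the dumb check -- now actually read the relevant paragraph"

# Example: an AI answer claims a "Perspectives core algorithm update" and
# cites a blog post (hypothetical URL).
print(spot_check_citation(
    "https://example.com/seo-news/google-update",
    ["perspectives", "core", "update"],
))
```

Anything that passes still needs a human to read the relevant paragraph. The 56% ungrounded figure is exactly the failure a keyword match won't catch: a page that mentions the topic but doesn't actually support the claim.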
The Ratchet Doesn't Have a Reverse Gear
I think the AI slop loop is going to get worse before anything forces it to get better. Generating AI content is cheap, verification is expensive, and the platforms ingesting this content have no mechanism to distinguish a thoroughly researched piece from a blog post that fabricated its details entirely. I'd estimate we'll see the ungrounded citation rate in AI Overviews cross 70% by the end of 2026 if nothing changes structurally in how these systems evaluate sources.
The economic incentives all point in the wrong direction, at least for now. The models get trained, the loop tightens, and the average quality of AI-generated research drifts another notch down. Nobody's going to fix this for you.
From what I've seen, the marketers who end up best positioned over the next year aren't necessarily the ones using AI the most or the least. They're probably just the ones who figured out where in their workflow the AI output is trustworthy and where it isn't, and built habits around that line. Which sounds simple enough, except the line keeps moving.