The Soft 404 Pattern That Cost a News Network 90% of Its Search Traffic
A multinational news network lost 90% of its Google search traffic in 10 months after a domain migration triggered a wave of soft 404 errors and crawl-budget waste. At peak, its Brazilian property had 513,369 pages crawled but not indexed and 1,193 soft 404s flagged in Search Console. The fix took roughly one quarter and relied on four Search Console signals that most SEO teams never check until traffic is already gone.
What actually broke during the migration
The site moved from a country-code domain (xx.com.br) to a subdomain on the global brand (br.xx.com) in January 2022. On paper, that is the cleanest possible setup. In practice, the redirects did not consolidate authority, and Google ended up crawling both domains in parallel for months. The result, documented in Search Engine Land's deep-dive case study, was a slow bleed rather than a single catastrophic drop. Daily clicks fell from a pre-migration range of 15,000 to 25,000 to between 2,000 and 4,000 by December 2022, and stayed at that floor for more than a year.
That last part is what I keep getting stuck on. The traffic did not recover on its own. There was no algorithm update to blame. The audit that fixed it was not glamorous, and most of the actual work lived in two GSC reports that have been free, sitting in everyone's account, since 2020.
The Search Console signal nobody was watching
If you only read Google's own description, a soft 404 sounds harmless: a page that returns HTTP 200 but has no real content. The problem is what Googlebot does with that pattern over time. Each soft 404 burns crawl budget on a page Google has already decided is not worth indexing, and the more of them your CMS generates, the less often Googlebot returns to the rest of your inventory.
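Google does not publish how it classifies soft 404s, but you can approximate the pattern on your own crawl data: a URL that answers 200 while carrying almost no content. Here is a minimal Python sketch, assuming you already have each page's status code and HTML from a crawler. The 150-word threshold and the `CrawledPage` and `looks_like_soft_404` names are hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical threshold: Google does not publish its soft 404 classifier,
# so this word count is an illustrative assumption, not a known cutoff.
MIN_VISIBLE_WORDS = 150


@dataclass
class CrawledPage:
    url: str
    status: int
    html: str


def visible_word_count(html: str) -> int:
    """Rough visible-text word count: drop <script>/<style> blocks, then strip tags."""
    text = re.sub(r"<(script|style)\b.*?</\1>", " ", html, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", text)
    return len(text.split())


def looks_like_soft_404(page: CrawledPage, min_words: int = MIN_VISIBLE_WORDS) -> bool:
    """Flag a soft 404 *candidate*: the page claims success (200) but has thin content."""
    return page.status == 200 and visible_word_count(page.html) < min_words


widget = CrawledPage(
    "/usd-to-thor?amount=250",
    200,
    "<html><body><div>250 USD = ? THOR</div><script>convert()</script></body></html>",
)
print(looks_like_soft_404(widget))
```

A page that already returns a real 404 is never flagged, because the problem this check targets is the mismatch between a success status and empty content.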
The case study numbers make the scale concrete. Across the network's properties:
- Main brand: 90,400 soft 404 pages
- Spain: 17,700
- Korea: 15,400
- France: 15,100
- Germany: 8,010
And then on the Brazil property: 513,369 pages with the status "Crawled, currently not indexed," 1,193 soft 404s, 2,532 flagged as alternate pages with a canonical tag, and 524 "Discovered, currently not indexed." Most standard guides to "Crawled, currently not indexed" treat these categories as separate problems. Read together, they tell one story: through four different indexing statuses, Google was telling the site that it had stopped trusting the URLs.
Crawl rate confirmed the same story. On the France domain, daily Googlebot requests fell from between 60,000 and 70,000 to between 20,000 and 30,000. New articles took 24 hours to index, which is fine for an evergreen blog post and nearly useless for a news property whose entire value to Google is freshness.
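Because the France crawl figures are reported as ranges, the size of the drop is also a range. This small Python helper makes the arithmetic explicit. The function names are illustrative, and the inputs are the ranges quoted above.

```python
def pct_drop(before: float, after: float) -> float:
    """Percentage decline from `before` to `after`."""
    return (before - after) / before * 100


def drop_range(before: tuple[float, float], after: tuple[float, float]) -> tuple[float, float]:
    """Smallest and largest possible decline when both periods are given as (low, high) ranges."""
    lo_before, hi_before = before
    lo_after, hi_after = after
    smallest = pct_drop(lo_before, hi_after)  # e.g. 60,000 -> 30,000
    largest = pct_drop(hi_before, lo_after)   # e.g. 70,000 -> 20,000
    return smallest, largest


lo, hi = drop_range((60_000, 70_000), (20_000, 30_000))
print(f"Daily Googlebot requests fell by {lo:.1f}% to {hi:.1f}%")
```

On those figures, crawl requests fell by somewhere between half and nearly three quarters.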
The currency converter problem (every CMS has its version)
The single biggest contributor to the soft 404 count was auto-generated currency converter pages, with URLs such as /usd-to-thor?amount=250 and /eur-to-signaturechain?amount=1000. The CMS happily generated a near-infinite parameter space. Each URL technically returned 200 with a tiny widget on it. Google looked at these pages and quietly moved tens of thousands of them into soft 404 territory because they had no meaningful unique content.
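One way to see how a single template explodes into thousands of URLs is to collapse each URL into a pattern and count the patterns. The Python sketch below uses only the standard library. The "<a>-to-<b>" slug rule and the `url_pattern` and `count_patterns` names are hypothetical and tuned to the converter URLs above, so a real CMS will need its own rules.

```python
import re
from collections import Counter
from urllib.parse import parse_qsl, urlsplit

# Hypothetical rule for this sketch: a path segment shaped like "<a>-to-<b>"
# is treated as a converter slug and replaced with a placeholder.
CONVERTER_SLUG = re.compile(r"^[a-z0-9]+-to-[a-z0-9]+$")


def url_pattern(url: str) -> str:
    """Collapse a URL into a template: converter slugs and query values become placeholders."""
    parts = urlsplit(url)
    segments = [
        "{pair}" if CONVERTER_SLUG.match(seg) else seg
        for seg in parts.path.split("/")
    ]
    path = "/".join(segments)
    # Keep parameter names (sorted, so order does not matter) but drop their values.
    keys = sorted(k for k, _ in parse_qsl(parts.query, keep_blank_values=True))
    query = "&".join(f"{k}={{v}}" for k in keys)
    return f"{path}?{query}" if query else path


def count_patterns(urls: list[str]) -> Counter:
    """How many URLs each template accounts for."""
    return Counter(url_pattern(u) for u in urls)


sample = ["/usd-to-thor?amount=250", "/eur-to-signaturechain?amount=1000", "/news/story"]
print(count_patterns(sample).most_common())
```

The slug rule is deliberately crude. A root-level article slug like /back-to-school would also match, so treat the counts as a starting point for investigation rather than a verdict.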
If you have ever wondered why soft 404 audits always start with "what auto-generated patterns is your CMS shipping," this is the reason. Every CMS has its version of it: WordPress tag pages with one post, Shopify variant URLs for sold-out colors, Webflow CMS list pages that never populated, faceted nav on ecommerce that lets bots generate URLs for filter combinations no one searches.
I think most teams underestimate how quickly this can escalate. The case study notes the soft 404 count growing exponentially from October 2022 onward. Once Google decides a pattern of URLs is low-value, it does not stop at the originals. It generalizes.
The 30-minute GSC audit most teams keep skipping
The most striking part of the case study, honestly, is that the data was already in Search Console for nearly a year before anyone treated it as a priority. So if you only had 30 minutes a week to keep your site out of this hole, this is roughly what I would do, in order:
- Open the Page Indexing report. Look at the "Why pages aren't indexed" table, sorted by page count descending.
- Scan for any reason bucket holding more than 1% of your total URL inventory. Those are the categories Google is actively trying to tell you about.
- Click into "Soft 404." If the offending URLs share a path pattern (parameters, tag pages, calculator pages), that is a CMS problem, not a content team problem.
- Cross-check "Crawled, currently not indexed" and "Discovered, currently not indexed." If those buckets are growing month over month while your impressions are flat, you have a crawl-budget leak even if nothing has visibly crashed yet.
- Open the Crawl Stats report. A sustained 30 to 50% drop in daily requests against a stable sitemap is the same story Google was telling the Brazilian property in mid-2022.
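The checklist is manual by design, but if you copy the bucket counts out of Search Console each month, the threshold, growth, and crawl-rate checks reduce to a few comparisons. This is a minimal Python sketch under stated assumptions. The input dictionaries, the total-inventory figure, and all function names are hypothetical, and a single before/after comparison only approximates the "sustained" drop the checklist asks for.

```python
THRESHOLD_SHARE = 0.01  # the 1%-of-inventory rule of thumb from the checklist


def flag_buckets(counts: dict[str, int], total_urls: int,
                 threshold: float = THRESHOLD_SHARE) -> list[tuple[str, float]]:
    """Reasons holding more than `threshold` of total URL inventory, largest share first."""
    flagged = [(reason, n / total_urls) for reason, n in counts.items()
               if n / total_urls > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


def growing_buckets(prev: dict[str, int], curr: dict[str, int],
                    watch: tuple[str, ...]) -> list[str]:
    """Watched reasons whose page count rose month over month."""
    return [r for r in watch if curr.get(r, 0) > prev.get(r, 0)]


def crawl_drop_alert(baseline_daily: float, recent_daily: float,
                     min_drop: float = 0.30) -> bool:
    """True when daily Googlebot requests fell by at least `min_drop` versus the baseline."""
    return (baseline_daily - recent_daily) / baseline_daily >= min_drop


WATCH = ("Soft 404", "Crawled - currently not indexed", "Discovered - currently not indexed")

# Hypothetical monthly snapshots and inventory size, for illustration only.
last_month = {"Soft 404": 900, "Crawled - currently not indexed": 400_000}
this_month = {"Soft 404": 1_193, "Crawled - currently not indexed": 513_369,
              "Discovered - currently not indexed": 524}

print(flag_buckets(this_month, total_urls=600_000))
print(growing_buckets(last_month, this_month, WATCH))
print(crawl_drop_alert(65_000, 25_000))
```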
That is it. No tools, no licenses, no agency engagement. If you publish frequently, the lag between publishing and indexing is itself a leading indicator. Our recent piece on Google's TurboQuant 30-result expansion sits on the other end of the same problem: more SERP slots means crawl budget gets fought over harder, not less.
What I would cut from the typical fix list
Not every entry on a "comprehensive fix" template earns its place. Looking at what actually drove the case study site's recovery, two moves did most of the work: returning real 404 or 410 status codes on the dead URL patterns, and blocking the parameter combinations with stricter robots.txt rules. Core Web Vitals optimization was on the original plan but was deprioritized because regional infrastructure made it a slow lift. The indexing fixes alone moved traffic.
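The status-code half of that fix is mostly a routing decision: stop answering 200 for URLs you no longer want indexed. Below is a minimal sketch as a plain WSGI app in Python. It is not the case study site's implementation. The retired-pattern regex, `status_for`, and `make_app` are hypothetical, and the pattern is aimed at the converter URLs described earlier.

```python
import re

# Hypothetical retired pattern: single-segment converter paths like /usd-to-thor.
# (PATH_INFO excludes the query string, so ?amount=... does not affect matching.)
RETIRED_PATTERNS = [re.compile(r"^/[a-z0-9]+-to-[a-z0-9]+$")]

REASONS = {200: "OK", 404: "Not Found", 410: "Gone"}


def status_for(path: str, known_pages: set[str]) -> int:
    """Pick an honest status instead of serving a 200 'nothing here' template."""
    if any(p.match(path) for p in RETIRED_PATTERNS):
        return 410  # Gone: the removal is intentional
    if path in known_pages:
        return 200
    return 404  # unknown URL gets a real 404, not a soft one


def make_app(known_pages: set[str]):
    def app(environ, start_response):
        code = status_for(environ.get("PATH_INFO", "/"), known_pages)
        body = REASONS[code].encode("utf-8")
        start_response(
            f"{code} {REASONS[code]}",
            [("Content-Type", "text/plain; charset=utf-8"),
             ("Content-Length", str(len(body)))],
        )
        return [body]
    return app


if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    make_server("127.0.0.1", 8000, make_app({"/news/story"})).serve_forever()
```

Google treats 404 and 410 similarly for indexing purposes, so the choice between them matters less than moving off 200. As with any pattern rule, check that the regex cannot catch real articles before deploying it.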
The numbers, from the same write-up: Brazil's "Crawled, currently not indexed" pool dropped 57% (513,000 to 220,000), soft 404s on Brazil dropped 69% (1,193 to 370), and across all 13 domains soft 404s went from roughly 120,000 to under 20,000 by late April 2023. Germany's clicks climbed from about 8,000 per day to 12,000 to 15,000 per day. Spain's Google Discover share went from baseline to 65% of total traffic.
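The reported percentages are easy to sanity-check. This short Python snippet recomputes the changes from the before and after figures quoted above. The labels are descriptive only. Because the all-domains figure is "under 20,000," its computed drop is a lower bound.

```python
def pct_change(before: float, after: float) -> float:
    """Signed percentage change; negative means a decline."""
    return (after - before) / before * 100


# Before/after figures as quoted from the case study write-up.
checks = {
    "Brazil crawled, currently not indexed": (513_000, 220_000),
    "Brazil soft 404s": (1_193, 370),
    "All domains soft 404s (drop is at least)": (120_000, 20_000),
    "Germany daily clicks (low end)": (8_000, 12_000),
    "Germany daily clicks (high end)": (8_000, 15_000),
}

for label, (before, after) in checks.items():
    print(f"{label}: {pct_change(before, after):+.1f}%")
```

The Brazil figures match the quoted 57% and 69% declines. The all-domains soft 404 count fell by at least 83%, and Germany's daily clicks rose by roughly 50 to 88%.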
So when the audit comes up at your next QBR, the order of operations is the part worth pushing back on. Status codes and parameter handling are roughly 80% of the lift. Everything else is a nice-to-have that buys small percentage gains while the soft 404 pile keeps compounding.
The audit you set up before traffic moves
If I had to leave a single rule on the wall above the SEO desk, it would be that soft 404 reports are not a tidy-up task for next quarter. They are a leading indicator of how Google is currently grading your inventory, and the case study site spent 10 months ignoring them. The version of this story that ends in recovery starts with someone opening the Page Indexing report on a Tuesday and refusing to close it until the largest reason bucket is named, mapped to a CMS pattern, and assigned to someone with a deploy button.
The crypto news network in the case study got there. It just took them losing eight figures of search visibility first.