Web Pages Tripled in Size Since 2015. Google Noticed. They Are Also Part of the Problem.

The median mobile page is almost 2.4 MB now. Google says it matters, then asks you to add more code.

Web pages have tripled in size over the last decade. The median mobile homepage weighed 845 KB in 2015. By mid-2025, the HTTP Archive Web Almanac measured that number at 2,362 KB. Nearly three times heavier, and the trend shows no sign of flattening.

Google knows this. On a recent episode of the Search Off the Record podcast, Gary Illyes and Martin Splitt discussed why page weight still matters and what is contributing to the bloat. The conversation was surprisingly honest, and one admission in particular deserves more attention than it is getting: Google itself is part of the problem.

Google is asking for more code while telling you pages are too heavy

Illyes raised a specific concern during the podcast that most coverage has glossed over. He pointed out that structured data, the machine-readable markup Google asks publishers to add for rich results, is contributing to page bloat. His phrasing was blunt: Google is requesting data "for machines, not users."

That tension is worth sitting with for a moment. The same company that encourages you to add JSON-LD structured data for product reviews, FAQ accordions, how-to steps, article metadata, breadcrumbs, and organization schema is also telling you that your pages are getting too large and it matters for performance.

Both of those things are true at the same time, and nobody at Google seems to have a clean answer for the contradiction. Splitt acknowledged that page weight matters significantly for users on slower or metered connections, which is a large portion of the global web audience. But he also admitted the issue is "less relevant" on fast home connections, which is a strangely casual thing to say about a ranking consideration that affects billions of pages.

What 2,362 KB actually feels like on a real connection

The number is abstract until you put it in context. A 2.4 MB page on a 3G connection (which is still the reality for a significant chunk of mobile users globally) takes roughly 8 to 12 seconds to become interactive. On a metered data plan, loading 50 pages a day at that weight burns through about 120 MB of data, just on web browsing.
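
The arithmetic behind those figures is easy to sanity-check. A minimal sketch, assuming an effective 3G throughput of about 1.6 Mbps (a commonly cited ballpark, not a measured value, and it ignores latency and render time):

```python
# Back-of-the-envelope check of the numbers above.
PAGE_WEIGHT_MB = 2.4           # median mobile page weight
THROUGHPUT_MBPS = 1.6          # assumed effective 3G bandwidth

# Transfer time alone, ignoring DNS, latency, and rendering
transfer_seconds = (PAGE_WEIGHT_MB * 8) / THROUGHPUT_MBPS
print(f"Transfer time: {transfer_seconds:.1f} s")   # 12.0 s

# Data consumed by 50 page views per day at the median weight
daily_mb = 50 * PAGE_WEIGHT_MB
print(f"Daily data: {daily_mb:.0f} MB")             # 120 MB
```

Pure transfer time lands at the top of the 8-to-12-second range, and that is before any latency or rendering cost is added.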

For publishers in North America and Western Europe, it is easy to dismiss this as someone else's problem. Most of your audience is on LTE or WiFi. But Google's crawl infrastructure does not make that distinction. Googlebot has finite resources, and heavier pages consume more of its crawling budget. If your pages are large, Google may crawl fewer of them in any given session. That is not a penalty. It is just math.

During the podcast, Martin Splitt argued that "website-level size is meaningless" and that individual page weight is what matters. He is right in theory. But the practical issue for most SEO teams is that they are adding weight to every page through global elements: analytics scripts, consent managers, tag managers, chat widgets, header and footer structured data. The weight is not in the content. It is in the infrastructure wrapped around the content.

Where the actual weight is coming from

If you have not audited your page weight recently, the composition probably looks something like this (based on patterns from the Web Almanac data and a few real audits I have seen shared on r/webdev):

  • Images: Still the single largest contributor, typically 40-60% of total page weight. Unoptimized hero images and product photos are the usual suspects.
  • JavaScript: Usually 20-30% of weight. Third-party scripts (analytics, ads, tracking pixels, consent management) are the biggest source, and most publishers have no idea how much weight each script adds because nobody audits the tag manager after initial setup.
  • CSS: Often 5-10%, but framework CSS (looking at you, Tailwind output without purging, or Bootstrap loaded in full) can push this higher.
  • Fonts: 3-7% typically. Custom web fonts add up fast if you are loading multiple weights and styles.
  • Structured data and metadata: Usually small individually (2-5%), but it adds up across JSON-LD blocks, Open Graph tags, Twitter cards, and various platform-specific metadata.
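
You can reproduce this kind of breakdown yourself from a DevTools or HAR export. A small sketch with illustrative resource sizes (the filenames and kilobyte figures below are hypothetical, not from a real audit):

```python
from collections import defaultdict

# Hypothetical resource list in the shape you would pull from
# DevTools or a HAR file; all sizes in KB are made up for illustration.
resources = [
    ("hero.jpg", "image", 780),
    ("product-1.png", "image", 310),
    ("app.bundle.js", "script", 420),
    ("tag-manager.js", "script", 95),
    ("styles.css", "css", 140),
    ("brand-font.woff2", "font", 110),
    ("jsonld-product", "structured-data", 4),
]

# Sum resource weight per category
totals = defaultdict(int)
for _, kind, kb in resources:
    totals[kind] += kb

# Print each category's share of total page weight, heaviest first
page_total = sum(totals.values())
for kind, kb in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{kind:16} {kb:5} KB  {100 * kb / page_total:5.1f}%")
```

Even with invented numbers, the shape matches the Almanac pattern: images dominate, scripts come second, and the structured data line is small per page but present on every page.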

The structured data piece is interesting because it grows with every new rich result type Google introduces. FAQ schema, HowTo schema, product schema with reviews, organization schema, breadcrumbs. Each one is maybe 1-3 KB. On a page with several of them, you are adding 5-15 KB of machine-readable content that provides zero value to the human reader looking at the page.
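
Measuring that cost is straightforward. A sketch that serializes a minimal FAQ schema block (the question and answer text are placeholders) and reports how many bytes it adds to the HTML:

```python
import json

# A minimal FAQPage block of the kind Google's rich-result docs
# describe; the question and answer text are placeholders.
faq_jsonld = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "How long does shipping take?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "Orders typically arrive within 3-5 business days.",
            },
        }
    ],
}

# The bytes this adds once serialized into a <script> tag
payload = json.dumps(faq_jsonld, separators=(",", ":"))
print(f"{len(payload.encode('utf-8'))} bytes for one FAQ entry")
```

One entry is a few hundred bytes; a real FAQ block with five or ten questions, plus product, breadcrumb, and organization schema, is how a page quietly reaches that 5-15 KB of machine-only markup.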

Google's crawl limits are closer than you think

During the podcast, the team mentioned several crawl size limits that are worth knowing about:

  • Googlebot crawls the first 2 MB of supported file types (HTML, CSS, etc.)
  • There is a broader 15 MB default limit across Google's crawl infrastructure
  • PDFs get a more generous 64 MB limit
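
A trivial check against the first of those limits, sketched in Python. The 2 MB constant restates the podcast figure; verify it against Google's own crawler documentation before relying on it:

```python
# Crawl-limit check: does a page's raw HTML risk Googlebot's
# 2 MB cutoff? The limit value restates the podcast figure.
GOOGLEBOT_HTML_LIMIT = 2 * 1024 * 1024   # 2 MB in bytes

def within_crawl_limit(html_bytes: int, limit: int = GOOGLEBOT_HTML_LIMIT) -> bool:
    """Return True if the HTML payload fits under the crawl cutoff."""
    return html_bytes <= limit

# A typical article page is nowhere near the cutoff...
print(within_crawl_limit(180 * 1024))        # True
# ...but an auto-generated page with huge inline data could be.
print(within_crawl_limit(3 * 1024 * 1024))   # False
```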

Most individual pages are well under 2 MB of HTML. But HTML is only part of the story: once you add everything the browser needs to render the page (CSS, JavaScript, images, fonts), it is the user experience that suffers. And Core Web Vitals, which Google does use as a ranking signal, measure that full rendering experience, not just the HTML size.

There is a useful distinction here that gets muddled in most conversations about page speed. Googlebot cares about HTML size because that is what it crawls. Users care about total page weight because that is what they wait for. Core Web Vitals care about rendering performance, which is affected by both. You need to optimize for all three, and they are not always the same optimization.

The audit you should run this week

Open Chrome DevTools on your highest-traffic page. Go to the Network tab, reload, and sort by size. The answer to "why is my page heavy?" is almost always in the top 10 resources loaded.

Specifically, look for:

  1. Images over 200 KB. If your hero image is 800 KB, converting to WebP or AVIF and properly sizing it will likely cut that by 60-70%. This is the single highest-impact change for most sites.
  2. JavaScript bundles you do not recognize. Third-party scripts accumulate. That chat widget someone added two years ago, the A/B testing script from a vendor you no longer use, the second analytics implementation nobody decommissioned. Each one adds latency and weight.
  3. Uncompressed resources. Check your server's Content-Encoding headers. If you are not serving Brotli (or at minimum gzip) for text resources, you are shipping 60-80% more bytes than necessary.
  4. Structured data you added for rich results that never materialized. If you implemented FAQ schema on 500 pages and none of them show FAQ rich results, that schema is just weight. Consider removing it or limiting it to pages that actually qualify.
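
Step 3 above can be scripted. A small helper that classifies a response's Content-Encoding header; the header dicts here are illustrative, and in practice you would read them from DevTools or your HTTP client's response object:

```python
def compression_status(headers: dict) -> str:
    """Classify a Content-Encoding header as 'brotli', 'gzip', or 'none'.

    Note: real HTTP header names are case-insensitive; this sketch
    assumes the exact key "Content-Encoding" for simplicity.
    """
    encodings = {e.strip() for e in
                 headers.get("Content-Encoding", "").lower().split(",")}
    if "br" in encodings:
        return "brotli"
    if "gzip" in encodings:
        return "gzip"
    return "none"

print(compression_status({"Content-Encoding": "br"}))    # brotli
print(compression_status({}))                            # none
```

Anything coming back as "none" for HTML, CSS, or JavaScript is a quick win: enabling Brotli or gzip at the server is usually a one-line config change.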

The structured data audit is the one most SEO teams skip, and it connects directly to the paradox Illyes raised. Google asks for structured data. You add it. But if the result is heavier pages that load slower, and Google never shows the rich result anyway, you have made your site worse for both users and crawlers in exchange for a feature you never got.

We recently covered Google's TurboQuant algorithm and how it is changing semantic search processing. As search systems get more sophisticated about understanding content, the argument for adding ever more machine-readable markup to your pages gets weaker. Google is getting better at understanding your content without you having to spell it out in JSON-LD.

Pages will keep getting heavier. The question is whether yours have to.

The broader trend is not going to reverse. Richer media, more interactive features, more third-party integrations. The Web Almanac data basically guarantees that the median page will be heavier next year than it is now.

But your pages do not have to follow the median. The sites that load fastest in their competitive set tend to rank better, convert better, and retain more visitors. That was true five years ago, and with Core Web Vitals baked into rankings since 2021, the performance advantage has only gotten clearer.

I suspect that by the end of 2026, we will see Google introduce more explicit guidance about structured data weight, possibly even recommending against certain schema types on performance-sensitive pages. They created this problem by asking publishers to add more machine-readable data without acknowledging the cumulative cost. Illyes essentially admitted that on the podcast. The fix will probably be slower than the problem, which means the publishers who audit and trim now will have a performance edge over the ones waiting for Google to sort out its own contradictions.