The Delegation Boundary Is Why Schema-Heavy AEO Audits Don't Move Citations
The delegation boundary moves with seven factors. Schema markup doesn't shift any of them.

Jason Barnard published a framework on Search Engine Land naming the three signals AI engines weight when deciding which brand to recommend: accuracy, positive sentiment, and consistency across engines. The implication is narrower than the AEO industry has been selling. Schema markup and entity SEO are table stakes; the signal that decides whether ChatGPT names you or a competitor is corroboration that lines up across seven separate engines.

The framework, in one paragraph

Barnard's argument, published May 12 on Search Engine Land, is that AI engines run on three coexisting modes. Search mode (the user does the sorting). Assistive mode (the engine recommends and the user picks). Agent mode (the engine just executes). The "delegation boundary" is the line each user draws between what they do themselves and what they hand off. Cross that line and the engine has to commit to a single brand. Stay before it and the engine can hand back a fuzzy list and let the human sort it out. The whole AEO question collapses into one thing: at the moment of delegation, does the engine have enough confidence in your brand to actually commit?

Seven things that pull the line one way or the other

Barnard lists seven factors that determine where the boundary sits for any given category:

- Emotional weight: the more a purchase touches identity, the harder it is to delegate.
- Domain expertise required.
- Price relative to income: a $2 coffee delegates easily; a $20,000 car doesn't.
- Frequency: habitual purchases hand off readily.
- Reversibility: returnable goods are easy to delegate; wedding venues aren't.
- Regulatory context: financial, medical, and legal categories carry compliance constraints.
- Cultural context: trust in agents varies by market.

The list looks generic until you flip it. If you sell something that scores high on emotional weight, low on reversibility, or high on regulatory load, your category's delegation boundary is parked closer to the human side. You compete in assistive and search modes, almost never agent. If you sell repeat-purchase, low-cost, easily-returned products, you should already be benchmarking yourself against agent-mode behavior, because that's where the floor is moving fastest.
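To make that flip concrete, here's a minimal sketch of a category scorer. The 1-to-5 scales, the factor keys, and the mode thresholds are my illustrative assumptions, not numbers from Barnard's framework; the point is only to show how the seven factors combine into a single "which mode am I competing in" answer.

```python
# Hypothetical sketch: score a category against Barnard's seven factors
# to estimate which mode you should benchmark against. The 1-5 scales
# and the thresholds are illustrative assumptions, not framework values.

FACTORS = [
    "emotional_weight",    # higher = harder to delegate
    "expertise_required",  # higher = harder to delegate
    "price_to_income",     # higher = harder to delegate
    "infrequency",         # habitual purchases score low here
    "irreversibility",     # wedding venues score high here
    "regulatory_load",     # financial/medical/legal score high here
    "agent_distrust",      # varies by market
]

def delegation_mode(scores: dict[str, int]) -> str:
    """Map 1-5 factor scores to the mode a brand competes in."""
    friction = sum(scores[f] for f in FACTORS) / len(FACTORS)
    if friction >= 3.5:
        return "search"     # user keeps the whole decision
    if friction >= 2.0:
        return "assistive"  # engine recommends, user picks
    return "agent"          # engine commits and executes

coffee = {f: 1 for f in FACTORS}   # low friction on every factor
print(delegation_mode(coffee))     # -> agent
```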

Why structured data alone isn't moving citations

This is the part that should be uncomfortable for anyone running a schema-heavy AEO program. An Ahrefs study of 1,885 pages earlier this quarter found that schema additions produced essentially no measurable lift in AI citations. Kevin Indig's consensus-gap research showed only 2.37% of citations survive across all three major engines. And 13% of ChatGPT citations come from Wikipedia, regardless of what your own pages say.

Barnard's framework explains all three findings cleanly. Schema is table stakes for the global layer (the encoded knowledge in the model and the search index), but the layer that's deciding individual recommendations is the consistency layer, the one watching whether your description on your homepage matches the description on Wikipedia, the description on G2, and the description in last quarter's trade-press feature. Schema doesn't fix consistency. It just gives engines a slightly easier path to read what's already there.

What the three confidence signals actually look like in a spreadsheet

This is the part Barnard hand-waves and the part you actually have to operationalize. The three signals (accuracy, positive sentiment, consistency across engines) need to be measured per-engine, not in aggregate. Run the same brand-defining query across ChatGPT, Claude, Perplexity, Gemini, Copilot, Siri, and Alexa, capture the responses verbatim, and score each one on those three dimensions.
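Here's a minimal sketch of the capture step. Nothing in it is a real engine API: `query_engine` is a stub for whatever access you actually have (an API where one exists, browser automation, or manual copy-paste), and the single query shown is a placeholder borrowed from the Thomann example below.

```python
import csv
from datetime import date

# The seven engines from the audit; QUERIES is where your own eight or
# ten brand-defining queries go (the one below is a placeholder).
ENGINES = ["ChatGPT", "Claude", "Perplexity", "Gemini", "Copilot", "Siri", "Alexa"]
QUERIES = ["best reverb pedal under $125 shipped to Europe"]

def query_engine(engine: str, query: str) -> str:
    """Stub for whatever access you have to each engine: an API where one
    exists, browser automation, or manual copy-paste. Returns verbatim text."""
    raise NotImplementedError(f"wire up access to {engine} here")

def capture_run(path: str) -> None:
    """Write one row per (query, engine) cell with the verbatim response."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "query", "engine", "response"])
        for q in QUERIES:
            for e in ENGINES:
                writer.writerow([date.today().isoformat(), q, e,
                                 query_engine(e, q)])
```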

What I've seen working teams do, mostly informally so far: build a tracking sheet where the rows are queries (the same eight or ten queries you'd test if you were running a brand reputation audit) and the columns are engines. Each cell holds the verbatim response. Then a second tab where you tag each response with a factual error count, a sentiment polarity score, and a "match score" against the canonical brand description on your homepage. Run it weekly. The drift you see week-to-week is the actual AEO problem, and almost none of it is fixed by schema or entity SEO. It gets fixed by getting Wikipedia, G2, Capterra, your own About page, and the trade-press write-ups telling the same story.
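And a sketch of that second scoring tab, under the same caveats. The word-overlap match score is a deliberately crude proxy for consistency (a real run would use embeddings or a human reviewer), and the error count and sentiment values are assumed to come from a human tagger, not from code.

```python
def match_score(response: str, canonical: str) -> float:
    """Crude consistency proxy: word-overlap (Jaccard) between an engine's
    description and the canonical one on your homepage."""
    a, b = set(response.lower().split()), set(canonical.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def score_cell(response: str, canonical: str,
               error_count: int, sentiment: float) -> dict:
    """One scored cell; error_count and sentiment come from a human tagger."""
    return {
        "factual_errors": error_count,
        "sentiment": sentiment,  # e.g. -1.0 to 1.0
        "match": round(match_score(response, canonical), 2),
    }

def weekly_drift(this_week: dict, last_week: dict) -> dict:
    """Week-over-week movement per metric, the actual signal to act on."""
    return {k: round(this_week[k] - last_week[k], 2) for k in this_week}
```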

One marketer on r/SEO recently summed it up bluntly: most AEO audits track what's easy to crawl, not what's actually shaping the recommendation. From what I've seen, that's roughly correct.

The Thomann example, and why it matters

Barnard's anchor case is mundane on purpose. He asked ChatGPT for reverb, compression, and EQ pedals under $125, delivered to Europe by Friday. ChatGPT named Thomann. He clicked, bought, and has since spent over €2,000 with the retailer. The delegation boundary in that conversation sat right at the buy button. ChatGPT made the brand decision, the model decision, the price-tier decision, and the supplier decision, and handed Barnard exactly one option to click on.

That's a 15-minute compression of a process that used to take a week of forum browsing. And it's worth saying: Thomann didn't win that recommendation with schema. They won it because their stock, shipping, returns, and credibility were described the same way on their own site, on third-party reviews, on Reddit, and in trade-press coverage. The model had nothing to disambiguate.

Where this leaves an AEO budget for the next quarter

I think the practical move is to stop spending the next two quarters tuning schema and start spending them auditing consistency. Map the eight or ten queries that matter for your category. Run them through each engine. Find every spot where the description of your product or company differs across sources. Then go fix the highest-traffic external mentions first: your Wikipedia entry if you have one, your G2 and Capterra listings, the major trade-press features. That's it. That's most of the work.
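Here's what that audit could look like as a script. The source names, descriptions, and reach figures are placeholders, and the divergence-times-reach ranking is my assumption about how to order the fix list, not something from Barnard's piece.

```python
# Placeholder brand, sources, and reach figures; rank fixes by how far a
# source's description diverges from canonical, weighted by its reach.

CANONICAL = "Acme makes inventory software for mid-market manufacturers."

SOURCES = [
    # (source, description as published there, estimated monthly reach)
    ("homepage",  CANONICAL,                                        50_000),
    ("Wikipedia", "Acme is a logistics company founded in 2012.",  120_000),
    ("G2",        "Acme: inventory software for manufacturers.",    30_000),
]

def overlap(a: str, b: str) -> float:
    """Same crude word-overlap proxy used in the scoring-tab sketch."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def fix_order(sources, canonical):
    """Rank sources so the biggest, most divergent mentions get fixed first."""
    rows = [(name, 1.0 - overlap(desc, canonical), reach)
            for name, desc, reach in sources]
    return sorted(rows, key=lambda r: r[1] * r[2], reverse=True)

for name, div, reach in fix_order(SOURCES, CANONICAL):
    print(f"{name:10} divergence={div:.2f} reach={reach:,}")
```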

A small honest hedge before signing off: this is still early enough that nobody, including Barnard, has clean before-and-after numbers tying a consistency cleanup to a measurable citation lift. Barnard's Kalicube framework names the right variables. Whether moving them on a six-month timeline actually changes citation share, at the scale a $50K AEO budget can move them, is something the next two quarters will have to answer.

The cleaner read is that schema-first AEO was the version of this work you could outsource to a junior. Consistency-first AEO needs someone senior enough to negotiate with comms, with PR, and with the partnerships team about what the brand actually says it is. That's probably why most agencies are still selling the schema version.
