How do you evaluate AI visibility statistics?

Use a four-step check. First, find the primary source; if a vendor blog only cites another vendor blog, stop. Second, check the sample size and method, because a single-customer case study does not generalise to a category. Third, check the date, since AI search behaviour moves fast enough that older figures may be obsolete and single measurements are unstable. Fourth, check whether the claim survives recasting; a "23×" multiple that shrinks to "1.66% versus 0.15%" once both rates are disclosed was overstated. A reliable reading reports a figure with its method and variance rather than as a bare headline number. Any one of the four steps can fail a statistic.

Which AI visibility sources are most reliable?

The most reliable sources disclose their method and can be audited. Enterprise-scale telemetry with public methodology qualifies, including Pew Research Center, Adobe Digital Insights, and Cloudflare Radar. Peer-reviewed academic work qualifies, such as the Princeton GEO paper and the GEO-16 analysis. Major consultancies with stated samples and primary-source documentation round out the top tier. The common feature is disclosed methodology, not brand reputation.

Why do AI visibility statistics vary so much across sources?

Two reasons, one structural and one technical. The structural reason is sourcing. Vendor blogs cite each other in circular patterns, so a weak figure can appear in many places without ever gaining a primary source. The technical reason is that AI answers are stochastic. The same query run twice can produce different results, so a single measurement is a sample rather than a constant. Different studies also use different methods, which produce different absolute numbers even on the same outcome. A figure reported with its method and a confidence interval is more trustworthy than a bare point estimate.
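The point about single measurements can be sketched in a few lines of Python. This is a minimal illustration, not any study's method: `simulated_answer_cites_brand` is a hypothetical stand-in for one run of a query against an AI engine, the 0.4 citation rate is an invented value, and the Wilson score interval is one common choice of interval among several.

```python
import math
import random


def wilson_interval(hits: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if runs <= 0:
        raise ValueError("need at least one run")
    p = hits / runs
    denom = 1 + z * z / runs
    centre = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def simulated_answer_cites_brand(rng: random.Random, true_rate: float = 0.4) -> bool:
    """Hypothetical stand-in for one query run; true_rate is an invented value."""
    return rng.random() < true_rate


def measure(runs: int, seed: int = 0) -> dict:
    """Repeat the same query `runs` times and report the rate with its interval."""
    rng = random.Random(seed)
    hits = sum(simulated_answer_cites_brand(rng) for _ in range(runs))
    low, high = wilson_interval(hits, runs)
    return {"runs": runs, "hits": hits, "rate": hits / runs, "low": low, "high": high}


if __name__ == "__main__":
    for n in (1, 10, 100):
        r = measure(n)
        print(f"{r['runs']:>3} runs: rate={r['rate']:.2f}, "
              f"95% interval=({r['low']:.2f}, {r['high']:.2f})")
```

With one run the interval spans most of the range from 0 to 1, which is the sense in which a single reading is a sample rather than a constant. The interval narrows as runs are added.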

What is wrong with the "23× AI conversion rate" claim?

It generalises from a single case study with one data point, and the headline multiple depends on framing. A "23×" multiple sounds decisive until both underlying rates are disclosed. A figure like "1.66% versus 0.15%" then reads very differently and rests on a tiny sample. The claim is not necessarily false for that one customer; it simply does not support a category-wide conclusion. The replacement evidence is a study reporting AI traffic converting at roughly three times other channels, with its method stated. The lesson is the second step of the check: a single-customer result does not generalise.
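Recasting a headline multiple is simple arithmetic, and a short Python sketch makes the fragility visible. Every count below is hypothetical and chosen only for illustration; none comes from the case study discussed above. The `Channel` type and both function names are likewise invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    name: str
    conversions: int
    visits: int

    @property
    def rate(self) -> float:
        return self.conversions / self.visits


def recast(treated: Channel, baseline: Channel) -> dict:
    """Restate a headline multiple as the two rates and denominators behind it."""
    return {
        "multiple": treated.rate / baseline.rate,
        "treated_rate_pct": treated.rate * 100,
        "baseline_rate_pct": baseline.rate * 100,
        "gap_points": (treated.rate - baseline.rate) * 100,
        "treated_visits": treated.visits,
        "baseline_visits": baseline.visits,
    }


def multiple_if_one_fewer(treated: Channel, baseline: Channel) -> float:
    """The multiple if the treated channel had recorded one fewer conversion."""
    fewer = Channel(treated.name, max(treated.conversions - 1, 0), treated.visits)
    return fewer.rate / baseline.rate


if __name__ == "__main__":
    # Hypothetical counts: a small AI-referred sample against a large baseline.
    ai = Channel("ai_referrals", conversions=4, visits=200)
    other = Channel("other_channels", conversions=400, visits=100_000)
    summary = recast(ai, other)
    print(f"headline multiple: {summary['multiple']:.2f}x")
    print(f"rates: {summary['treated_rate_pct']:.2f}% vs {summary['baseline_rate_pct']:.2f}%")
    print(f"gap: {summary['gap_points']:.2f} percentage points")
    print(f"one fewer conversion: {multiple_if_one_fewer(ai, other):.2f}x")
```

When a single conversion moves the multiple by a large step, the sample is too small to carry a category-wide claim, however striking the headline number looks.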

Does FAQ schema really lift AI citations by 44%?

No reliable evidence supports a 44% lift, and the figure traces to a single vendor study with unclear methodology. A control-matched study of 1,885 pages found schema produced a +2.4% lift on Google AI Mode, statistically indistinguishable from zero. A separate 300,000-domain analysis found no relationship between the presence of an llms.txt file and citation rate. Schema remains a hygiene baseline that AI engines still read, but it is not a citation lever of the size the vendor figure implies. Treat any large schema-lift claim as provisional until a control-matched study supports it.

How can I check whether a statistic about AI search is reliable?

Run the four-step check. Find the primary source, and stop if the trail is only vendor blogs citing each other. Check the sample size and method, because a case study of one customer does not generalise. Check the date, since the field moves fast and single readings are unstable. Check whether the claim survives recasting into its underlying rates. A figure that holds up reports its denominator, its method, and ideally its variance. If a statistic fails any step, treat it as provisional rather than repeating it.

Where can I find primary sources on AI visibility research?

Start with the source types that disclose method. For traffic and click behaviour, Pew Research Center and Adobe Digital Insights publish their methodology. For crawler and bot data, Cloudflare Radar reports network-scale telemetry. For citation and retrieval mechanics, the peer-reviewed GEO literature is primary. For accuracy and trust, the Tow Center study is a primary academic source. For crawler behaviour specifically, the vendors' own documentation is authoritative. Each names its method, which is the test that matters.

Why does this library publish a list of statistics it no longer uses?

Because a research surface that hides its errors is indistinguishable from a vendor blog. The category's biggest weakness is circular sourcing, where weak figures spread by repetition. Publishing a retraction list breaks that cycle and shows the reader exactly how the tier system is applied. Each retraction names the discredited claim and the cleaner primary source that replaces it. One example is the control-matched schema study standing in for inflated schema-lift figures. The list is a working document. If a figure used here is later shown to be wrong, the correction is published rather than quietly removed.

How to read the evidence in AI search visibility: sources we trust, sources we don't

The AI visibility category repeats statistics that do not survive scrutiny, often because vendor blogs cite each other. Our public three-tier source system separates citable research from noise: peer-reviewed and enterprise-scale data sit at the top, and single-data-point case studies are treated as provisional.

The AI visibility category has a sourcing problem, and a public tier system is the way through it. Vendor blogs cite each other in circular patterns, and statistics with no disclosed methodology get repeated until they look authoritative. Our three-tier system separates evidence that survives audit from evidence that does not. Tier A is citable without qualification, Tier B is citable with a caveat, and Tier C is provisional. The sections that follow set out the tiers, name the sources used most, and give the reader a four-step check.

Sourcing transparency is the category's missing layer

The category repeats numbers that do not hold up. A claim originates in a single vendor case study and travels into an agency deck. It gets quoted in a conference talk, then arrives in a brand-team briefing with the original sample size stripped away. By the time a brand manager reads it, the figure looks like consensus.

Two structural habits drive the problem. Vendor blogs cite other vendor blogs, which manufactures the appearance of corroboration without adding evidence. Real research sits next to marketing collateral in the same search results, and the two are formatted to look alike. A brand team making a budget decision deserves a way to tell them apart.

The fix is not a longer reading list. It is a published standard for what counts as evidence, applied openly, including to this library's own provisional sources. Sourcing transparency is the layer the category is missing, and publishing it is the move that earns external citation.

Three tiers cover the practical space

The system has three tiers, defined by what a source discloses and how it was produced. Tier A is citable without qualification. It covers peer-reviewed academic work, enterprise-scale analytics providers with public methodology, major consultancies, large first-party studies with disclosed methodology, and government or regulatory sources. The Princeton GEO paper is a Tier A example, peer-reviewed and method-disclosed [Aggarwal et al. · Princeton University / Georgia Tech / Allen Institute for AI / IIT Delhi · 2024].

Tier B is citable with attribution and a one-line caveat. It covers credible trade press, case studies from named companies with disclosed methodology, and single-vendor studies with disclosed methodology. The control-matched schema study is a Tier B example, a single vendor's analysis with a clearly stated difference-in-differences design [Ahrefs · 2026].

Tier C is provisional. It covers vendor blogs, individual LinkedIn or Substack posts, and case studies with a single data point. A Tier C source is used only when a finding is genuinely novel and no Tier A or B source exists. It is labelled as provisional in the body. Finer-grained distinctions than these three do not survive contact with the work.
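The tier definitions above can be written down as a small lookup, which is a useful way to test whether the rules are unambiguous. This Python sketch is one possible encoding, not the library's actual tooling. The `kind` labels, the `Source` fields, and the rule that any single-data-point source drops to Tier C are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    A = "citable without qualification"
    B = "citable with attribution and a one-line caveat"
    C = "provisional"


@dataclass(frozen=True)
class Source:
    kind: str                       # hypothetical label, e.g. "peer_reviewed"
    discloses_method: bool
    single_data_point: bool = False


# Hypothetical groupings that mirror the prose definitions of each tier.
ALWAYS_A = {"peer_reviewed", "major_consultancy", "government"}
A_IF_METHOD = {"enterprise_analytics", "large_first_party_study"}
ALWAYS_B = {"trade_press"}
B_IF_METHOD = {"named_company_case_study", "single_vendor_study"}


def classify(source: Source) -> Tier:
    """Assign a tier; anything not matched by a rule falls through to Tier C."""
    if source.single_data_point:
        return Tier.C
    if source.kind in ALWAYS_A:
        return Tier.A
    if source.kind in A_IF_METHOD and source.discloses_method:
        return Tier.A
    if source.kind in ALWAYS_B:
        return Tier.B
    if source.kind in B_IF_METHOD and source.discloses_method:
        return Tier.B
    return Tier.C


if __name__ == "__main__":
    examples = [
        Source("peer_reviewed", discloses_method=True),
        Source("single_vendor_study", discloses_method=True),
        Source("vendor_blog", discloses_method=False),
    ]
    for s in examples:
        print(f"{s.kind}: Tier {classify(s).name} ({classify(s).value})")
```

Defaulting to Tier C keeps the burden of proof on the source: a new source type earns a higher tier only once a rule is written for it.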

Tier A is the list of sources whose methodology survives audit

Tier A is not a wish list. It is the set of sources whose methodology can be checked. The most-used Tier A sources in this library fall into four groups.

The first group is enterprise-scale telemetry with public methodology. Pew Research Center publishes its panel and opt-in tracking method [Pew Research Center · 2025]. Adobe Digital Insights aggregates Analytics data across trillions of visits with disclosed method [Adobe Digital Insights · 2025]. Cloudflare Radar reports network-scale crawler telemetry [Cloudflare · 2025].

The second group is major consultancies with stated samples. Bain surveyed more than 1,000 consumers alongside proprietary analysis [Bain & Company · 2025]. McKinsey published its forecast method for the AI search revenue projection [McKinsey & Company · 2025].

The third group is peer-reviewed academic work: the GEO-16 empirical analysis [arXiv · 2025], the citation-absorption framework [Zhang et al. · arXiv (cs.IR) · 2026], and the OpenAI usage study [Chatterji et al. · NBER / OpenAI / Harvard University · 2025].

The fourth group is primary-source documentation and government work: the Stanford AI Index [Stanford Institute for Human-Centered AI (HAI) · 2026], the Tow Center accuracy study [Tow Center for Digital Journalism, Columbia · 2025], and first-party crawler docs [OpenAI · 2025].

The statistics we no longer cite, and why

Publishing retractions is the move nobody else in the category makes. Several figures that circulate widely are no longer used here, each for a specific, checkable reason.

The first is the "23× AI conversion rate" claim. It generalises from a single SaaS case study with one data point, and the headline multiple shrinks dramatically once the underlying rates are disclosed. The replacement evidence is a study reporting AI traffic converting at roughly three times other channels, with its method stated [Microsoft Clarity · 2026]. The second is the family of "FAQ schema lifts citations 2.5 to 3.2 times" claims, sourced to vendor blogs. The control-matched study found schema produced a +2.4% lift on Google AI Mode, statistically indistinguishable from zero [Ahrefs · 2026].

The third is the "44% citation lift from FAQ schema" figure, a single vendor study with unclear methodology. It is contradicted by the same control-matched evidence and by a 300,000-domain analysis that found no relationship between the presence of an llms.txt file and citation rate [SE Ranking · 2025]. The fourth is a representative vendor-reported case of a 19.72% rise in AI Overview visibility after entity linking [van Berkel · Schema App · 2025]. The figure is a single-data-point case study; treat it as provisional. The pattern across all four is the same: a clean primary source replaces a recycled vendor figure.
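The reason a control-matched design deflates large lift claims can be shown with a bare difference-in-differences calculation. The Python sketch below is a simplified illustration with hypothetical citation counts. It is not the cited study's implementation, which also involved matching pages and running statistical tests.

```python
from statistics import mean


def naive_change(pre: list[float], post: list[float]) -> float:
    """Before/after change for one group, ignoring any background trend."""
    return mean(post) - mean(pre)


def diff_in_diff(
    treated_pre: list[float],
    treated_post: list[float],
    control_pre: list[float],
    control_post: list[float],
) -> float:
    """Change in the treated group minus the change in the control group."""
    return naive_change(treated_pre, treated_post) - naive_change(control_pre, control_post)


if __name__ == "__main__":
    # Hypothetical citation counts per page, before and after a schema change.
    treated_pre, treated_post = [10, 12, 8], [11, 13, 9]
    control_pre, control_post = [10, 11, 9], [11, 12, 10]

    print("naive lift on treated pages:", naive_change(treated_pre, treated_post))
    print("difference-in-differences:",
          diff_in_diff(treated_pre, treated_post, control_pre, control_post))
```

In this invented data the treated pages gain citations, but so do the untouched control pages, so the estimated effect of the change itself is zero. A before/after comparison without controls would have reported the background trend as a lift.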

You can check a statistic yourself in four steps

A reader does not need this library to evaluate a claim. The check has four steps, and any one of them can fail a statistic.

The first step is to find the primary source. If a vendor blog cites another vendor blog, and that one cites a third, stop. A claim with no primary source is not a claim, it is a rumour with a number attached. The second step is to check the sample size and method. A study of one customer does not generalise to a category, and a percentage with no denominator hides its own weakness.

The third step is to check the date. AI search behaviour moves fast, so a 2023 figure may already be obsolete, and single measurements are unstable in the first place [Mustahsan · arXiv · 2025]. The fourth step is to check whether the claim survives recasting. A "23×" multiple that collapses to "1.66% versus 0.15%" once both rates are shown was overstated by its framing. A reliable reading reports the figure with its method and variance [Lior et al. · arXiv (cs.CL) / EMNLP 2025 · 2025].
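The four steps can also be expressed as a checklist in code, which makes the rule that any one step can fail a statistic explicit. This Python sketch is illustrative only. The `Claim` fields are hypothetical, and the thresholds (`max_age_years`, `min_sample`) are assumptions that a reader would set for their own context, since the article does not prescribe numeric cut-offs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    text: str
    primary_source: Optional[str]   # None when the trail ends at vendor blogs
    sample_size: Optional[int]      # None when no denominator is disclosed
    method_disclosed: bool
    year: int
    survives_recasting: bool        # reader's judgement after seeing both rates


def four_step_check(
    claim: Claim,
    current_year: int,
    max_age_years: int = 2,         # assumed threshold, not from the article
    min_sample: int = 2,            # assumed: more than a single data point
) -> list[str]:
    """Return the steps a claim fails; an empty list means it passed all four."""
    failures = []
    if not claim.primary_source:
        failures.append("step 1: no primary source")
    if (claim.sample_size is None
            or claim.sample_size < min_sample
            or not claim.method_disclosed):
        failures.append("step 2: sample size or method not adequate")
    if current_year - claim.year > max_age_years:
        failures.append("step 3: figure may be obsolete")
    if not claim.survives_recasting:
        failures.append("step 4: does not survive recasting")
    return failures


if __name__ == "__main__":
    headline = Claim(
        text="hypothetical headline multiple",
        primary_source=None,
        sample_size=1,
        method_disclosed=False,
        year=2023,
        survives_recasting=False,
    )
    for failure in four_step_check(headline, current_year=2026):
        print(failure)
```

Returning the list of failed steps, rather than a single pass or fail flag, keeps the reason for treating a figure as provisional attached to the figure.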

What we still don't know

Naming the gaps openly is the same discipline as naming the tiers. Several important questions have evidence too thin for a confident answer, and pretending otherwise would fail this article's own test.

The first gap is real-world prompt data. What users actually type into AI assistants is proprietary to the AI companies, so the category works from public proxies rather than direct observation. The second is the long-term effect of model updates on visibility trajectories, where no longitudinal Tier A study yet exists. The third is how agentic browsers reshape attribution and measurement, documented as products but not yet as behaviour [OpenAI · 2025]. The behaviour of the second major agentic browser is equally undocumented at Tier A [Perplexity AI · 2025].

The fourth gap is how AI engines weight first-party against third-party content in equilibrium, which current evidence describes only in snapshots. The fifth is how regulation will affect retrieval in practice. The EU Code of Practice covers copyright and transparency [European Commission / EU AI Office · 2025]. The US Copyright Office has acknowledged that fair use in AI training is fact-specific [U.S. Copyright Office · 2025]. The operational effects on retrieval are still emerging.

The commitment behind every source list

Every article in this library applies the same rule. Tier A sources are used where they exist, Tier B with attribution and a caveat, and Tier C only as provisional and labelled as such. The source list at the bottom of each article shows the tier of every entry. The reader can audit the evidence without leaving the page.

The standard applies to this library's own weak spots. Where a finding rests on a single vendor case study, the article says so in the same sentence as the claim. Where a number is provisional, the word "provisional" sits next to it, not in a footnote. The crawler data this library leans on most is named with its source, so a reader can check the method directly [Cloudflare · 2025].

Credibility is the category's scarcest resource. A public tier system, an open retraction list, and a named gap list are how this library spends it carefully. If a figure here turns out to be wrong, the correction is published. A research surface that hides its errors is just another vendor blog.

Sources

Sources are tiered per our methodology & sources page.

Tier A: Strongest evidence

From Citation Selection to Citation Absorption: A Measurement Framework for Generative Engine Optimization Across AI Search Platforms

arXiv (cs.IR) · Kai Zhang et al. · 2026

Key finding

ChatGPT cites around 7 sources per answer; Perplexity and Google AI Overviews cite more. But pages cited by ChatGPT have a much higher average influence on the answer's wording and evidence. Influence rises with page length, structure, and the density of definitions, statistics, comparisons, and step-by-step procedures.

Methodology note

602 controlled prompts run through ChatGPT, Google AI Overview / Gemini, and Perplexity. The researchers analysed 21,143 citations and 18,151 fetched pages, extracting 72 features per citation. They measured citation breadth (how many sources are cited) and citation depth (how much each cited source actually shapes the final answer). The dataset is public.

arXiv · Accessed 27.05.2026

Tier A: Strongest evidence

The 2026 AI Index Report

Stanford Institute for Human-Centered AI (HAI) · 2026

Key finding

Organisational AI adoption reached 88% and four in five university students now use generative AI. Generative AI reached 53% population adoption within three years, faster than the PC or the internet. The estimated value of generative AI tools to US consumers reached 172 billion dollars annually by early 2026. Documented AI incidents rose to 362 in 2025, up from 233 in 2024.

Methodology note

Annual Stanford HAI report drawing on dozens of sources: AI model benchmark results (SWE-bench, IMO, OSWorld), private investment trackers, patent and publication databases, government policy data, and global public opinion surveys. Nine chapters cover R&D, performance, responsible AI, economy, science, medicine, education, policy, and public opinion.

Stanford HAI · Accessed 27.05.2026

Tier A: Strongest evidence

OpenAI Crawler Documentation Update (Dec 2025 — narrows robots.txt compliance)

OpenAI · 2025

Key finding

In the December 9, 2025 update, OpenAI's bot documentation removed the previous claim that OAI-SearchBot feeds navigational links into ChatGPT answers and dropped any reference to OAI-SearchBot supplying training data. ChatGPT-User was expanded to explicitly cover Custom GPT requests and GPT Actions, and robots.txt is no longer applied to user-initiated ChatGPT-User actions. OpenAI also confirmed OAI-SearchBot and GPTBot share crawl results to avoid duplicate fetching.

Methodology note

Same canonical OpenAI documentation page, captured after the December 9, 2025 revision identified publicly by Pieter Serraris. Direct diff was not available; changes were confirmed against the live developers.openai.com/api/docs/bots page and detailed write-ups on PPC Land, Search Engine Roundtable and Stan Ventures comparing pre- and post-update language.

OpenAI Developer Docs · Accessed 27.05.2026

Tier A: Strongest evidence

Stochasticity in Agentic Evaluations: Quantifying Inconsistency with Intraclass Correlation

arXiv · Mustahsan · 2025

Key finding

Quantifies stochasticity in agentic LLM evaluations using intraclass correlation coefficients (ICC). Shows that single-run evaluations of agentic systems are unreliable because run-to-run variance is large relative to the gap between system variants. Recommends a minimum of 5 to 10 repeated runs per evaluation and reports the ICCs for several common agentic benchmarks.

Methodology note

arXiv preprint 2512.06710 (December 2025). Direct fetch on arxiv.org returned the abstract page. The paper applies the intraclass correlation coefficient framework from psychometrics to LLM agent evaluation and reports ICC values across multiple published benchmarks.

arXiv · Accessed 27.05.2026

Tier A: Strongest evidence

Introducing ChatGPT Atlas

OpenAI · 2025

Key finding

OpenAI launched ChatGPT Atlas, an AI-native web browser that embeds ChatGPT directly into browsing, summarises pages, answers questions in the sidebar, and can carry out multi-step tasks on the user's behalf such as filling forms, comparing products, and completing purchases.

Methodology note

First-party product launch announcement from OpenAI. The post introduces Atlas as a Chromium-based browser with ChatGPT integrated as the default interface, agentic capabilities for browsing and transacting, and memory of past sessions. Initial availability is on macOS with other platforms to follow.

OpenAI · Accessed 27.05.2026

Tier A: Strongest evidence

New Front Door to the Internet: Winning in the Age of AI Search

McKinsey & Company · 2025

Key finding

McKinsey projects that AI-powered search will mediate roughly $750 billion in US consumer revenue by 2028, representing a meaningful share of category-level discovery. Brands that win in AI answers tend to combine strong third-party coverage, structured product information, and active management of their entity presence across the open web.

Methodology note

McKinsey synthesis of consumer survey data, enterprise interviews, and proprietary modelling. The report combines a quantitative consumer survey on AI search adoption with case-level analysis of brand performance in AI answers. Published October 2025. Forecast figures should be cited as projections, not measured outcomes.

McKinsey · Accessed 27.05.2026

Tier A: Strongest evidence

How People Use ChatGPT (NBER Working Paper 34255)

NBER / OpenAI / Harvard University · Aaron Chatterji et al. · 2025

Key finding

ChatGPT reached around 700 million weekly active users by mid-2025, with roughly 18 billion messages sent per week. About 30% of conversations are work-related while 70% are personal, covering writing assistance, information seeking, and tutoring. Adoption is rising fastest in lower-income countries.

Methodology note

NBER working paper by Aaron Chatterji and co-authors, including OpenAI researchers, analysing a representative sample of ChatGPT conversations. The researchers classified messages by topic, work versus personal use, and user demographics to characterise how people actually use the assistant in 2024 and 2025.

National Bureau of Economic Research · Accessed 27.05.2026

Tier A: Strongest evidence

AI Answer Engine Citation Behavior: An Empirical Analysis of the GEO-16 Framework

arXiv · 2025

Key finding

Three on-page properties showed the strongest association with whether a page got cited by AI answer engines: metadata and freshness, semantic HTML markup, and structured data. Pages that scored at least 0.70 on the GEO-16 quality score and met at least 12 of 16 quality pillars were cited at substantially higher rates than pages that did not.

Methodology note

70 product-intent prompts were run across Brave Summary, Google AI Overviews, and Perplexity, producing 1,702 citations across 1,100 unique URLs. The researchers audited each cited page against a 16-pillar framework and used logistic models with domain-clustered standard errors. The study focuses on English-language B2B SaaS pages. Published September 2025.

arXiv · Accessed 27.05.2026

Tier A: Strongest evidence

The Crawl-to-Click Gap: Cloudflare Data on AI Bots, Training, and Referrals

Cloudflare · 2025

Key finding

AI crawlers read content far more than they send referrals back. Anthropic's ClaudeBot crawled around 70,000 pages for every visitor it referred; OpenAI's GPTBot crawled around 1,700 for every visitor; Perplexity around 5 for every visitor. Mistral was the only major AI engine where referrals outweighed crawl volume.

Methodology note

Aggregate analysis of crawl requests and referral traffic across the Cloudflare network. For each major AI crawler, the team divided pages crawled by visits sent to the same destinations during the same window, producing a crawl-to-refer ratio. Published August 2025.

Cloudflare Blog · Accessed 27.05.2026

Tier A: Strongest evidence

Generative AI-Powered Shopping Rises with Traffic to U.S. Retail Sites (Adobe Analytics)

Adobe Digital Insights · 2025

Key finding

Visitors arriving at US retail sites from generative AI sources show measurably higher engagement than visitors from other channels: 8% higher time on site, 12% more pages per visit, and 23% lower bounce rate. AI-driven retail traffic grew sharply through 2024 and 2025, though it remains a small share of total visits.

Methodology note

Aggregate analysis of Adobe Analytics data covering trillions of visits to US retail websites. Adobe compared engagement metrics for visitors arriving from generative AI assistants against visitors from other referral channels. Published August 2025.

Adobe Business Blog · Accessed 27.05.2026

Tier A: Strongest evidence

Google Users Are Less Likely to Click on Links When an AI Summary Appears in Search Results

Pew Research Center · 2025

Key finding

When a Google search result page includes an AI summary, users click on a traditional link in roughly 8% of visits. On result pages without an AI summary, they click in roughly 15% of visits. Users rarely click on the citations inside the AI summary itself, doing so on about 1% of visits.

Methodology note

Pew Research panel study covering 900 US adults and 68,879 Google searches conducted between March and May 2025. Sessions were tracked through opt-in browser participation; click behaviour was observed directly rather than self-reported. Published July 2025.

Pew Research Center · Accessed 27.05.2026

Tier A: Strongest evidence

Code of Practice for General-Purpose AI Models (Copyright Chapter)

European Commission / EU AI Office · 2025

Key finding

The EU General-Purpose AI Code of Practice provides a voluntary route for AI model providers to demonstrate compliance with the AI Act's obligations on copyright, transparency, and safety. The Copyright Chapter requires signatories to honour machine-readable opt-out signals such as robots.txt and TDM reservations, to publish a summary of training data, and to put a complaint mechanism in place for rightsholders.

Methodology note

Official European Commission policy page hosting the Code of Practice for General-Purpose AI Models, developed by independent experts under the EU AI Act process and published in 2025. The Code covers Safety and Security, Transparency, and Copyright chapters, and is signed by major AI providers as a way to show compliance with the AI Act.

European Commission · Accessed 27.05.2026

Tier A: Strongest evidence

Introducing Comet: Browse at the Speed of Thought

Perplexity AI · 2025

Key finding

Perplexity launched Comet, an AI-native web browser built around Perplexity's answer engine. Comet replaces the search bar with a conversational assistant, summarises pages, answers questions about open tabs, and can execute agentic tasks across the web on behalf of the user.

Methodology note

First-party product launch announcement from Perplexity. The post positions Comet as a Chromium-based browser with Perplexity's assistant available across every tab, supporting research workflows, product comparisons, and multi-step actions. Initial access was offered to Perplexity Max subscribers.

Perplexity Hub · Accessed 27.05.2026

Tier A: Strongest evidence

From Googlebot to GPTBot: Who's Crawling Your Site in 2025

Cloudflare · 2025

Key finding

Across Cloudflare's network, search and AI crawler traffic rose 18% from May 2024 to May 2025. Googlebot grew 96% in raw requests and now accounts for 50% of crawler traffic. GPTBot rose 305% in requests, with its share climbing from 2.2% to 7.7%. ChatGPT-User requests jumped 2,825%, and PerplexityBot grew 157,490% off a tiny base. About 14% of top domains now use robots.txt rules targeting AI bots specifically.

Methodology note

Cloudflare Radar analysis published July 2025, comparing crawler activity in May 2024 against May 2025 across a fixed cohort of customers to remove growth bias. The team matches user-agent tokens against an open-source list of AI crawlers and analyses robots.txt files on 3,816 of the top 10,000 domains. Methodology and limits are documented in the post.

Cloudflare Blog · Accessed 27.05.2026

Tier A: Strongest evidence

ReliableEval: A Recipe for Stochastic LLM Evaluation via Method of Moments

arXiv (cs.CL) / EMNLP 2025 · Gili Lior et al. · 2025

Key finding

ReliableEval proposes a method-of-moments recipe for stochastic LLM evaluation that explicitly accounts for run-to-run variance in model outputs. Across standard benchmarks, the method produces tighter confidence intervals than naive averaging and reveals that some headline LLM performance comparisons are within noise margins. Released as an open evaluation toolkit.

Methodology note

arXiv preprint 2505.22169 (May 2025). Direct fetch returned the abstract page. The paper derives the method-of-moments estimator, tests it against several common evaluation tasks, and releases the toolkit for community use.

arXiv · Accessed 27.05.2026

Tier A: Strongest evidence

Copyright and AI Part 3: Generative AI Training (pre-publication report)

U.S. Copyright Office · 2025

Key finding

The US Copyright Office concludes that generative AI training raises copyright questions at several points: data collection, model training, retrieval-augmented generation, and outputs. Fair use is fact-specific and depends on transformativeness, commerciality, the amount used, and effects on the market for the original work, including market dilution and lost licensing. The Office recommends voluntary licensing markets rather than compulsory licensing schemes.

Methodology note

Pre-publication version of Part 3 of the Copyright Office's Report on Copyright and Artificial Intelligence, released May 2025 by the Register of Copyrights. The report draws on more than 10,000 comments submitted in response to a 2023 Notice of Inquiry, plus existing case law and international approaches. Sections cover technical background, prima facie infringement, fair use, and licensing options.

USCO · Accessed 27.05.2026

Tier A: Strongest evidence

AI Search Has a Citation Problem (Tow Center Report)

Tow Center for Digital Journalism, Columbia · 2025

Key finding

Across eight AI search engines tested, more than 60% of news-attribution queries received incorrect answers. Perplexity got 37% wrong; Grok 3 got 94% wrong. Premium paid models were no more accurate than free ones, and often produced confidently incorrect answers without flagging uncertainty. Several engines retrieved content from publishers that had explicitly blocked their crawlers.

Methodology note

1,600 queries were run across ChatGPT Search, Perplexity, Perplexity Pro, DeepSeek Search, Microsoft Copilot, Grok-2, Grok-3, and Google Gemini. The researchers selected 10 articles from each of 20 publishers, used direct excerpts as queries, and asked each chatbot to identify the headline, publisher, publication date, and URL. Responses were manually graded against six categories.

Columbia Journalism Review · Accessed 27.05.2026

Tier A: Strongest evidence

Goodbye Clicks, Hello AI: Zero-Click Search Redefines Marketing

Bain & Company · 2025

Key finding

Roughly 80% of consumers now rely on zero-click results, AI summaries, or assistant answers for at least 40% of their search needs, and AI search use has reduced average organic click-through rates by 15% to 25%. The shift compresses the funnel: brands need to be present and credible in the answer itself, not on the click destination.

Methodology note

Bain survey of more than 1,000 US consumers combined with proprietary analysis of organic search traffic patterns. The report measures self-reported AI search use and click-through behaviour across categories. Published February 2025.

Bain & Company · Accessed 27.05.2026

Tier A: Strongest evidence

GEO: Generative Engine Optimization

Princeton University / Georgia Tech / Allen Institute for AI / IIT Delhi · Pranjal Aggarwal et al. · 2024

Key finding

Adding citations, quotations, and statistics to content can increase its visibility in AI-generated answers by up to 41%. Pages ranked outside the top of traditional search saw the largest gains. The effect varies by content domain and by AI engine, but the lift from evidence-style content elements is consistent across the conditions tested.

Methodology note

10,000 questions were run through generative search engines. The researchers compared answers before and after applying nine content optimisation strategies, including citations, quotations, statistics, and authoritative language. They measured visibility as the share of the AI answer attributable to the optimised page, using both word position and word count metrics. Peer-reviewed at KDD 2024.

arXiv / KDD 2024 · Accessed 27.05.2026

Tier B: Citable with caveats

We Tracked 1,885 Pages Adding Schema. AI Citations Barely Moved

Ahrefs · 2026

Key finding

Across 1,885 pages that added JSON-LD between August 2025 and March 2026, schema produced no meaningful uplift in AI citations. Matched difference-in-differences tests against 4,000 control pages showed +2.4% on Google AI Mode and +2.2% on ChatGPT (both statistically indistinguishable from zero) and a small 4.6% decline on Google AI Overviews. 53% of AI-cited pages already carry schema, but this reflects overall site quality.

Methodology note

Ahrefs identified 1,885 URLs that transitioned from no JSON-LD to having JSON-LD between August 2025 and March 2026, using its crawler database. Each treated page was matched to three control pages from different domains with similar pre-period citation levels. Citation changes were measured 30 days before and after the schema-add date across AI Overviews, AI Mode and ChatGPT using four statistical tests including matched difference-in-differences.
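The matched difference-in-differences idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Ahrefs' actual pipeline: the record layout, the function name `matched_did`, and the citation counts are hypothetical, and the sketch omits the significance tests the study ran.

```python
from statistics import mean

def matched_did(matched_sets):
    """Average difference-in-differences over matched sets.

    Each item in `matched_sets` is a dict with:
      "treated":  (pre, post) citation counts for the page that added schema
      "controls": list of (pre, post) counts for its matched control pages
    """
    effects = []
    for item in matched_sets:
        t_pre, t_post = item["treated"]
        treated_change = t_post - t_pre
        # Average change across this page's matched controls.
        control_change = mean(post - pre for pre, post in item["controls"])
        effects.append(treated_change - control_change)
    return mean(effects)

# Hypothetical data: two treated pages, three controls each.
sample = [
    {"treated": (10, 12), "controls": [(10, 11), (9, 11), (11, 12)]},
    {"treated": (4, 4), "controls": [(4, 5), (5, 5), (3, 3)]},
]
estimate = matched_did(sample)
```

Subtracting the control change is the point of the design: if citations rose for everyone in the period, the treated pages only get credit for the part of their change that the controls did not share.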

Ahrefs Blog · Accessed 27.05.2026

Tier B — Citable with caveats

AI Traffic Converts at 3× the Rate of Other Channels (Study)

Microsoft Clarity · 2026

Key finding

Visitors arriving from AI assistants convert at roughly 3 times the rate of visitors from other channels, and at up to 11 times the rate in certain publisher segments. AI traffic still represents a small share of total visits, but its per-visitor commercial value is materially higher than traditional search or social.

Methodology note

Analysis of Microsoft Clarity user-session data across a multi-publisher dataset. The study compared conversion rates of sessions originating from AI assistants against sessions from other referral channels. Published January 2026. Single-vendor study with disclosed methodology, downgraded to Tier B in v1.1.
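A conversion multiple is easier to judge when both rates and an interval are shown. The sketch below computes a ratio of two conversion rates with an approximate 95% interval using the standard log-scale (Katz) approximation. The function name and all counts are hypothetical and do not come from the Microsoft Clarity study.

```python
import math

def rate_ratio_with_ci(conv_a, visits_a, conv_b, visits_b, z=1.96):
    """Ratio of two conversion rates with an approximate 95% interval.

    Uses the log-scale (Katz) approximation for a ratio of two
    proportions; it is unreliable when conversion counts are very small.
    """
    rate_a = conv_a / visits_a
    rate_b = conv_b / visits_b
    ratio = rate_a / rate_b
    # Standard error of log(ratio).
    se_log = math.sqrt(1 / conv_a - 1 / visits_a + 1 / conv_b - 1 / visits_b)
    low = ratio * math.exp(-z * se_log)
    high = ratio * math.exp(z * se_log)
    return ratio, low, high

# Hypothetical counts, chosen only to show how sample size changes the interval.
small = rate_ratio_with_ci(3, 100, 10, 1_000)
large = rate_ratio_with_ci(300, 10_000, 1_000, 100_000)
```

Both calls give the same headline multiple of about 3, but the interval for the small sample is wide enough to include 1 (no difference), while the large sample's interval stays close to 3.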

Microsoft Clarity Blog · Accessed 27.05.2026

Tier B — Citable with caveats

LLMs.txt Shows No Clear Effect on AI Citations (300K domains)

SE Ranking · 2025

Key finding

Across nearly 300,000 domains, only 10.13% had an llms.txt file. Adoption is roughly flat across traffic tiers, with high-traffic sites slightly less likely (8.27%) to use it than mid-tier ones (10.54%). Statistical tests and an XGBoost model found no relationship between the presence of llms.txt and how often a domain is cited by AI engines. Removing the variable from the model actually improved its accuracy.

Methodology note

SE Ranking study of nearly 300,000 domains, published November 2025. The team checked each domain for an llms.txt file, segmented adoption by monthly traffic, and modelled citation frequency using Spearman correlation, XGBoost regression and SHAP analysis. The conclusion is based on whether llms.txt presence improved or degraded model predictions of LLM citations.
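For readers unfamiliar with the first of those tests, the sketch below shows how a Spearman rank correlation between llms.txt presence and citation counts could be computed using only the standard library. It is an illustration, not SE Ranking's code: the helper names and the eight-domain dataset are invented, and the XGBoost and SHAP steps are not reproduced here.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (var_x * var_y) ** 0.5

# Hypothetical domains: 1 = has llms.txt, 0 = does not; citation counts are made up.
has_llms_txt = [1, 0, 0, 1, 0, 1, 0, 0]
ai_citations = [12, 15, 3, 7, 9, 4, 20, 6]
rho = spearman(has_llms_txt, ai_citations)
```

A value of `rho` near zero would be consistent with "no relationship", though on a sample this small the estimate is noisy, which is why the study's scale matters.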

SE Ranking Blog · Accessed 27.05.2026

Tier C — Tactical signals only

What 2025 Revealed About AI Search and the Future of Schema Markup

Schema App · Martha van Berkel · 2025

Key finding

In 2025, Google and Microsoft publicly confirmed they use Schema markup for generative AI features, and ChatGPT confirmed it uses structured data to decide which products appear in results. Schema App reported a 19.72% rise in AI Overview visibility on its own site after deploying Entity Linking, and a 69% rise in clicks on non-branded queries for its customer InSinkErator.

Methodology note

First-party essay by Schema App's CEO. The piece argues structured data should be treated as a knowledge graph rather than a rich-result trick, and uses examples from Schema App's own site and named customers (InSinkErator, Wells Fargo) plus public statements from Google, Microsoft, and ChatGPT to support the case.

schemaapp.com · Accessed 27.05.2026

About the author Max Ackermann

Max Ackermann is founder and Managing Director of info.link, the product data platform that makes brands visible in AI search and connects every physical product to the web through GS1 Digital Link. He writes about AI search and generative engine optimization (GEO), AI-powered commerce, and how brands can structure product data for ChatGPT, Gemini, Perplexity, and retailer AI assistants like Amazon Rufus. For the past two years he has built the pipelines that put structured product data into AI answers, and run the experiments that test what actually moves AI citations.

Max has 20+ years of experience building digital products and businesses. He previously led McKinsey's Corporate Venture and Design teams across Europe, and as Managing Director of a leading US digital agency he built platforms with Nike, Google, Meta, and Airbnb. He founded the UX Design program at Central Saint Martins College, University of the Arts London, and is a Fellow of the UK's Higher Education Academy. Based in Hamburg, he works closely with GS1 on Digital Link adoption; info.link is headquartered in Hamburg and Berlin and counts GS1 Germany among its investors.

