How to read the evidence in AI search visibility: sources we trust, sources we don't
The AI visibility category repeats statistics that do not survive scrutiny, often vendor blogs citing each other. Our public three-tier source system separates citable research from noise: peer-reviewed and enterprise-scale data at the top, single-data-point case studies treated as provisional.
The AI visibility category has a sourcing problem, and a public tier system is the way through it. Vendor blogs cite each other in circular patterns, and statistics with no disclosed methodology get repeated until they look authoritative. Our three-tier system separates evidence that survives audit from evidence that does not. Tier A is citable without qualification, Tier B is citable with a caveat, and Tier C is provisional. The sections that follow set out the tiers, name the sources used most, list the statistics we have retired, give the reader a four-step check, and name what the evidence cannot yet answer.
Sourcing transparency is the category's missing layer
The category repeats numbers that do not hold up. A claim originates in a single vendor case study and travels into an agency deck. It gets quoted in a conference talk, then arrives in a brand-team briefing with the original sample size stripped away. By the time a brand manager reads it, the figure looks like consensus.
Two structural habits drive the problem. Vendor blogs cite other vendor blogs, which manufactures the appearance of corroboration without adding evidence. Real research sits next to marketing collateral in the same search results, and the two are formatted to look alike. A brand team making a budget decision deserves a way to tell them apart.
The fix is not a longer reading list. It is a published standard for what counts as evidence, applied openly, including to this library's own provisional sources. Sourcing transparency is the layer the category is missing, and publishing it is the move that earns external citation.
Three tiers cover the practical space
The system has three tiers, defined by what a source discloses and how it was produced. Tier A is citable without qualification. It covers peer-reviewed academic work, enterprise-scale analytics providers with public methodology, major consultancies, large first-party studies with disclosed methodology, and government or regulatory sources. The Princeton GEO paper is a Tier A example: peer-reviewed and method-disclosed (Aggarwal et al. · Princeton University / Georgia Tech / Allen Institute for AI / IIT Delhi · 2024).
Tier B is citable with attribution and a one-line caveat. It covers credible trade press, case studies from named companies with disclosed methodology, and single-vendor studies with disclosed methodology. The control-matched schema study is a Tier B example: a single vendor's analysis with a clearly stated difference-in-differences design (Ahrefs · 2026).
Tier C is provisional. It covers vendor blogs, individual LinkedIn or Substack posts, and case studies with a single data point. A Tier C source is used only when a finding is genuinely novel and no Tier A or B source exists. It is labelled as provisional in the body. Finer-grained distinctions than these three do not survive contact with the work.
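The tier rules are mechanical enough to write down as a decision function. The sketch below is a hypothetical illustration, not this library's tooling: the kind labels, the `Source` fields, and the choice to send a source with an undisclosed method straight to Tier C are assumptions made for the example. Judgements such as whether a piece of trade press is "credible" still need a human reader.

```python
from dataclasses import dataclass

# Hypothetical labels for the kinds of source named in each tier definition.
TIER_A_KINDS = {"peer_reviewed", "enterprise_analytics", "major_consultancy",
                "first_party_study", "government"}
TIER_B_KINDS = {"trade_press", "named_case_study", "single_vendor_study"}
# Kinds whose tier definition explicitly requires disclosed methodology.
METHOD_REQUIRED = {"enterprise_analytics", "first_party_study",
                   "named_case_study", "single_vendor_study"}


@dataclass
class Source:
    name: str
    kind: str                      # one of the labels above; anything else falls to Tier C
    discloses_method: bool = False
    single_data_point: bool = False


def classify(source: Source) -> str:
    """Return "A", "B" or "C" under a simplified reading of the tier rules."""
    # A single data point is provisional whoever publishes it.
    if source.single_data_point:
        return "C"
    # Assumption: a kind that needs a disclosed method drops to C without one.
    if source.kind in METHOD_REQUIRED and not source.discloses_method:
        return "C"
    if source.kind in TIER_A_KINDS:
        return "A"
    if source.kind in TIER_B_KINDS:
        return "B"
    return "C"  # vendor blogs, personal posts, and anything unrecognised


sources = [
    Source("GEO paper", "peer_reviewed"),
    Source("Control-matched schema study", "single_vendor_study", discloses_method=True),
    Source("Vendor blog post", "vendor_blog"),
    Source("Entity-linking case", "named_case_study",
           discloses_method=True, single_data_point=True),
]
for s in sources:
    print(f"{s.name}: Tier {classify(s)}")
```

Sending an undisclosed-method source to Tier C rather than Tier B is the stricter of the two plausible readings; a softer policy would change one line.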
Tier A is the list of sources whose methodology survives audit
Tier A is not a wish list. It is the set of sources whose methodology can be checked. The most-used Tier A sources in this library fall into four groups.
The first group is enterprise-scale telemetry with public methodology. Pew Research Center publishes its panel and opt-in tracking method (Pew Research Center · 2025). Adobe Digital Insights aggregates Adobe Analytics data across trillions of visits with a disclosed method (Adobe Digital Insights · 2025). Cloudflare Radar reports network-scale crawler telemetry (Cloudflare · 2025).
The second group is major consultancies with stated samples. Bain surveyed more than 1,000 consumers alongside proprietary analysis (Bain & Company · 2025). McKinsey published the method behind its AI search revenue projection (McKinsey & Company · 2025).

The third group is academic research: the GEO-16 empirical analysis (arXiv · 2025), the citation-absorption framework (Zhang et al. · arXiv (cs.IR) · 2026), and the OpenAI usage study (Chatterji et al. · NBER / OpenAI / Harvard University · 2025).

The fourth group is institutional research and primary-source documentation: the Stanford AI Index (Stanford Institute for Human-Centered AI (HAI) · 2026), the Tow Center accuracy study (Tow Center for Digital Journalism, Columbia · 2025), and first-party crawler documentation (OpenAI · 2025).
The statistics we no longer cite, and why
Publishing retractions is the move nobody else in the category makes. Several figures that circulate widely are no longer used here, each for a specific, checkable reason.
The first is the "23× AI conversion rate" claim. It generalises from a single SaaS case study with one data point, and the headline multiple shrinks dramatically once the underlying rates are disclosed. The replacement evidence is a study reporting AI traffic converting at roughly three times the rate of other channels, with its method stated (Microsoft Clarity · 2026). The second is the family of "FAQ schema lifts citations 2.5 to 3.2 times" claims, sourced to vendor blogs. The control-matched study found schema produced a +2.4% lift on Google AI Mode, statistically indistinguishable from zero (Ahrefs · 2026).
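The control-matched design is what separates the schema study from the vendor claims it replaces. A minimal sketch of matched difference-in-differences follows, assuming before and after citation counts for each treated page and a single matched control. The study itself matched three controls per page and ran four tests, so this is a simplification, and every count is invented. It shows how a positive point estimate can still be indistinguishable from zero; with so few pairs, a t-distribution would be more appropriate than the 1.96 normal multiplier used here.

```python
import math
import statistics


def pair_did(t_before, t_after, c_before, c_after):
    """Difference-in-differences for one treated page and its matched control page."""
    return (t_after - t_before) - (c_after - c_before)


def did_summary(pairs, z=1.96):
    """Mean effect across matched pairs with a rough normal-approximation interval."""
    effects = [pair_did(*pair) for pair in pairs]
    mean = statistics.mean(effects)
    se = statistics.stdev(effects) / math.sqrt(len(effects))  # standard error of the mean
    return mean, (mean - z * se, mean + z * se)


# Invented citation counts per pair:
# (treated before, treated after, control before, control after)
pairs = [(4, 6, 5, 6), (3, 3, 2, 4), (7, 8, 6, 6), (4, 5, 5, 5), (2, 3, 3, 3)]
effect, (low, high) = did_summary(pairs)
verdict = "indistinguishable from zero" if low <= 0 <= high else "distinguishable from zero"
print(f"mean effect {effect:+.2f} citations per page ({low:+.2f} to {high:+.2f}): {verdict}")
```

Subtracting the control's change removes whatever moved citations for every page in the window, which is exactly the background drift that before-and-after vendor case studies leave in their headline numbers.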
The third is the "44% citation lift from FAQ schema" figure, a single vendor study with unclear methodology. It is contradicted by the same control-matched evidence and by a 300,000-domain analysis that found no relationship between having an llms.txt file and citation rate (SE Ranking · 2025). The fourth is a representative vendor-reported case: a 19.72% rise in AI Overview visibility after entity linking (van Berkel · Schema App · 2025). It is a single-data-point case study, so it is cited here only as provisional. The pattern across all four is the same: a recycled vendor figure gives way to a clean primary source or, where none exists, to an explicit provisional label.
You can check a statistic yourself in four steps
A reader does not need this library to evaluate a claim. The check has four steps, and any one of them can fail a statistic.
The first step is to find the primary source. If a vendor blog cites another vendor blog, and that one cites a third, stop. A claim with no primary source is not a claim; it is a rumour with a number attached. The second step is to check the sample size and method. A study of one customer does not generalise to a category, and a percentage with no denominator hides its own weakness.
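The first two steps can be sketched in a few lines. The sketch assumes each source cites at most one upstream source, and every source name in it is hypothetical. `trace_primary` follows a citation chain and stops at a primary source, a cycle, or a dead end. `wilson_interval` is the standard Wilson score interval for a proportion; it shows why a percentage needs its denominator, since 20% from five observations and 20% from a thousand carry very different uncertainty.

```python
import math


def trace_primary(start, cites, primary):
    """Follow citations from `start` until a primary source, a cycle, or a dead end.

    Returns (primary source or None, path followed).
    """
    path = []
    current = start
    while current is not None:
        if current in primary:
            return current, path + [current]
        if current in path:
            return None, path + [current]  # circular citation: no evidence added
        path.append(current)
        current = cites.get(current)
    return None, path  # the chain stops without reaching any primary source


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (about 95% when z = 1.96)."""
    if n <= 0:
        raise ValueError("a percentage needs a denominator")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical chain: an agency deck citing two vendor blogs that cite each other.
cites = {"agency deck": "vendor blog B", "vendor blog B": "vendor blog C",
         "vendor blog C": "vendor blog B"}
source, path = trace_primary("agency deck", cites, primary={"panel study"})
print(source, path)

# The same 20% from 5 observations and from 1,000 observations.
print(wilson_interval(1, 5))
print(wilson_interval(200, 1000))
```

The first interval spans most of the plausible range, while the second is a few percentage points wide; the headline "20%" is identical in both cases.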
The third step is to check the date. AI search behaviour moves fast, so a 2023 figure may already be obsolete, and single measurements are unstable in the first place (Mustahsan · arXiv · 2025). The fourth step is to check whether the claim survives recasting, that is, restating a relative multiple as the absolute rates behind it. A "23×" multiple that collapses to "1.66% versus 0.15%" once both rates are shown was overstated by its framing. A reliable reading reports the figure with its method and variance (Lior et al. · arXiv (cs.CL) / EMNLP 2025 · 2025).
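The last two steps are arithmetic. Taking the two rates quoted above at face value, the sketch restates them as a multiple and as an absolute gap: on these figures the multiple comes out near 11×, and the gap is about 1.5 percentage points. The second function is a hedged sketch of reporting variance, summarising repeated runs of the same measurement rather than trusting one. The five run values are invented for illustration, and the function names are hypothetical.

```python
import statistics


def recast(rate_a: float, rate_b: float) -> tuple[float, float]:
    """Return (relative multiple, absolute gap in percentage points) for two rates given as fractions."""
    if rate_b <= 0:
        raise ValueError("baseline rate must be positive")
    return rate_a / rate_b, (rate_a - rate_b) * 100


def summarise_runs(values: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation across repeated runs of one measurement."""
    if len(values) < 2:
        raise ValueError("a single run has no variance; repeat the measurement")
    return statistics.mean(values), statistics.stdev(values)


multiple, gap_pp = recast(0.0166, 0.0015)
print(f"{multiple:.1f}x, or {gap_pp:.2f} percentage points")

# Hypothetical share of prompts in which a brand was cited, over five identical runs.
runs = [0.42, 0.35, 0.47, 0.38, 0.44]
mean, sd = summarise_runs(runs)
print(f"mean {mean:.3f}, standard deviation {sd:.3f}, range {min(runs)} to {max(runs)}")
```

Any single run here could be reported as 35% or 47%; only the mean with its spread tells the reader how much of a difference between two measurements is noise.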
What we still don't know
Naming the gaps openly is the same discipline as naming the tiers. Several important questions have evidence too thin for a confident answer, and pretending otherwise would fail this article's own test.
The first gap is real-world prompt data. What users actually type into AI assistants is proprietary to the AI companies, so the category works from public proxies rather than direct observation. The second is the long-term effect of model updates on visibility trajectories, where no longitudinal Tier A study yet exists. The third is how agentic browsers reshape attribution and measurement. OpenAI's ChatGPT Atlas is documented as a product but not yet as behaviour (OpenAI · 2025), and Perplexity's Comet is equally undocumented at Tier A (Perplexity AI · 2025).
The fourth gap is how AI engines weight first-party against third-party content in equilibrium, which current evidence describes only in snapshots. The fifth is how regulation will affect retrieval in practice. The EU Code of Practice covers copyright and transparency (European Commission / EU AI Office · 2025). The US Copyright Office has acknowledged that fair use in AI training is fact-specific (U.S. Copyright Office · 2025). The operational effects on retrieval are still emerging.
The commitment behind every source list
Every article in this library applies the same rule. Tier A sources are used where they exist, Tier B with attribution and a caveat, and Tier C only as provisional and labelled as such. The source list at the bottom of each article shows the tier of every entry. The reader can audit the evidence without leaving the page.
The standard applies to this library's own weak spots. Where a finding rests on a single vendor case study, the article says so in the same sentence as the claim. Where a number is provisional, the word "provisional" sits next to it, not in a footnote. The crawler data this library leans on most is named with its source, so a reader can check the method directly (Cloudflare · 2025).
Credibility is the category's scarcest resource. A public tier system, an open retraction list, and a named gap list are how this library spends it carefully. If a figure here turns out to be wrong, the correction is published. A research surface that hides its errors is just another vendor blog.
Sources
Sources are tiered per our methodology & sources page.
From Citation Selection to Citation Absorption: A Measurement Framework for Generative Engine Optimization Across AI Search Platforms
arXiv (cs.IR) · Kai Zhang et al. · 2026
ChatGPT cites around 7 sources per answer; Perplexity and Google AI Overviews cite more. But pages cited by ChatGPT have a much higher average influence on the answer's wording and evidence. Influence rises with page length, structure, and the density of definitions, statistics, comparisons, and step-by-step procedures.
Methodology note
602 controlled prompts were run through ChatGPT, Google AI Overview / Gemini, and Perplexity. The researchers analysed 21,143 citations and 18,151 fetched pages, extracting 72 features per citation. They measured citation breadth (how many sources are cited) and citation depth (how much each cited source actually shapes the final answer). The dataset is public.
The 2026 AI Index Report
Stanford Institute for Human-Centered AI (HAI) · 2026
Organisational AI adoption reached 88% and four in five university students now use generative AI. Generative AI reached 53% population adoption within three years, faster than the PC or the internet. The estimated value of generative AI tools to US consumers reached 172 billion dollars annually by early 2026. Documented AI incidents rose to 362 in 2025, up from 233 in 2024.
Methodology note
Annual Stanford HAI report drawing on dozens of sources: AI model benchmark results (SWE-bench, IMO, OSWorld), private investment trackers, patent and publication databases, government policy data, and global public opinion surveys. Nine chapters cover R&D, performance, responsible AI, economy, science, medicine, education, policy, and public opinion.
OpenAI Crawler Documentation Update (Dec 2025: narrows robots.txt compliance)
OpenAI · 2025
In the December 9, 2025 update, OpenAI's bot documentation removed the previous claim that OAI-SearchBot feeds navigational links into ChatGPT answers and dropped any reference to OAI-SearchBot supplying training data. ChatGPT-User was expanded to explicitly cover Custom GPT requests and GPT Actions, and robots.txt is no longer applied to user-initiated ChatGPT-User actions. OpenAI also confirmed OAI-SearchBot and GPTBot share crawl results to avoid duplicate fetching.
Methodology note
Same canonical OpenAI documentation page, captured after the December 9, 2025 revision identified publicly by Pieter Serraris. Direct diff was not available; changes were confirmed against the live developers.openai.com/api/docs/bots page and detailed write-ups on PPC Land, Search Engine Roundtable and Stan Ventures comparing pre- and post-update language.
Stochasticity in Agentic Evaluations: Quantifying Inconsistency with Intraclass Correlation
arXiv · Mustahsan · 2025
Quantifies stochasticity in agentic LLM evaluations using intraclass correlation coefficients (ICC). Shows that single-run evaluations of agentic systems are unreliable because run-to-run variance is large relative to the gap between system variants. Recommends a minimum of 5 to 10 repeated runs per evaluation and reports the ICCs for several common agentic benchmarks.
Methodology note
arXiv preprint 2512.06710 (December 2025). Direct fetch on arxiv.org returned the abstract page. The paper applies the intraclass correlation coefficient framework from psychometrics to LLM agent evaluation and reports ICC values across multiple published benchmarks.
OpenAI launched ChatGPT Atlas, an AI-native web browser that embeds ChatGPT directly into browsing, summarises pages, answers questions in the sidebar, and can carry out multi-step tasks on the user's behalf such as filling forms, comparing products, and completing purchases.
Methodology note
First-party product launch announcement from OpenAI. The post introduces Atlas as a Chromium-based browser with ChatGPT integrated as the default interface, agentic capabilities for browsing and transacting, and memory of past sessions. Initial availability is on macOS with other platforms to follow.
New Front Door to the Internet: Winning in the Age of AI Search
McKinsey & Company · 2025
McKinsey projects that AI-powered search will mediate roughly $750 billion in US consumer revenue by 2028, representing a meaningful share of category-level discovery. Brands that win in AI answers tend to combine strong third-party coverage, structured product information, and active management of their entity presence across the open web.
Methodology note
McKinsey synthesis of consumer survey data, enterprise interviews, and proprietary modelling. The report combines a quantitative consumer survey on AI search adoption with case-level analysis of brand performance in AI answers. Published October 2025. Forecast figures should be cited as projections, not measured outcomes.
How People Use ChatGPT (NBER Working Paper 34255)
NBER / OpenAI / Harvard University · Aaron Chatterji et al. · 2025
ChatGPT reached around 700 million weekly active users by mid-2025, with roughly 18 billion messages sent per week. About 30% of conversations are work-related while 70% are personal, covering writing assistance, information seeking, and tutoring. Adoption is rising fastest in lower-income countries.
Methodology note
NBER working paper by Aaron Chatterji, David Deming and OpenAI co-authors analysing a representative sample of ChatGPT conversations. The researchers classified messages by topic, work versus personal use, and user demographics to characterise how people actually use the assistant in 2024 and 2025.
AI Answer Engine Citation Behavior: An Empirical Analysis of the GEO-16 Framework
arXiv · 2025
Three on-page properties showed the strongest association with whether a page got cited by AI answer engines: metadata and freshness, semantic HTML markup, and structured data. Pages that scored at least 0.70 on the GEO-16 quality score and met at least 12 of 16 quality pillars were cited at substantially higher rates than pages that did not.
Methodology note
70 product-intent prompts were run across Brave Summary, Google AI Overviews, and Perplexity, producing 1,702 citations across 1,100 unique URLs. The researchers audited each cited page against a 16-pillar framework and used logistic models with domain-clustered standard errors. The study focuses on English-language B2B SaaS pages. Published September 2025.
The Crawl-to-Click Gap: Cloudflare Data on AI Bots, Training, and Referrals
Cloudflare · 2025
AI crawlers read content far more than they send referrals back. Anthropic's ClaudeBot crawled around 70,000 pages for every visitor it referred; OpenAI's GPTBot crawled around 1,700 for every visitor; Perplexity around 5 for every visitor. Mistral was the only major AI engine where referrals outweighed crawl volume.
Methodology note
Aggregate analysis of crawl requests and referral traffic across the Cloudflare network. For each major AI crawler, the team divided pages crawled by visits sent to the same destinations during the same window, producing a crawl-to-refer ratio. Published August 2025.
Generative AI-Powered Shopping Rises with Traffic to U.S. Retail Sites (Adobe Analytics)
Adobe Digital Insights · 2025
Visitors arriving at US retail sites from generative AI sources show measurably higher engagement than visitors from other channels: 8% higher time on site, 12% more pages per visit, and 23% lower bounce rate. AI-driven retail traffic grew sharply through 2024 and 2025, though it remains a small share of total visits.
Methodology note
Aggregate analysis of Adobe Analytics data covering trillions of visits to US retail websites. Adobe compared engagement metrics for visitors arriving from generative AI assistants against visitors from other referral channels. Published August 2025.
Google Users Are Less Likely to Click on Links When an AI Summary Appears in Search Results
Pew Research Center · 2025
When a Google search result page includes an AI summary, users click on a traditional link in roughly 8% of visits. On result pages without an AI summary, they click in roughly 15% of visits. Users rarely click on the citations inside the AI summary itself, doing so on about 1% of visits.
Methodology note
Pew Research panel study covering 900 US adults and 68,879 Google searches conducted between March and May 2025. Sessions were tracked through opt-in browser participation; click behaviour was observed directly rather than self-reported. Published July 2025.
Code of Practice for General-Purpose AI Models (Copyright Chapter)
European Commission / EU AI Office · 2025
The EU General-Purpose AI Code of Practice provides a voluntary route for AI model providers to demonstrate compliance with the AI Act's obligations on copyright, transparency, and safety. The Copyright Chapter requires signatories to honour machine-readable opt-out signals such as robots.txt and TDM reservations, to publish a summary of training data, and to put a complaint mechanism in place for rightsholders.
Methodology note
Official European Commission policy page hosting the Code of Practice for General-Purpose AI Models, developed by independent experts under the EU AI Act process and published in 2025. The Code covers Safety and Security, Transparency, and Copyright chapters, and is signed by major AI providers as a way to show compliance with the AI Act.
Introducing Comet: Browse at the Speed of Thought
Perplexity AI · 2025
Perplexity launched Comet, an AI-native web browser built around Perplexity's answer engine. Comet replaces the search bar with a conversational assistant, summarises pages, answers questions about open tabs, and can execute agentic tasks across the web on behalf of the user.
Methodology note
First-party product launch announcement from Perplexity. The post positions Comet as a Chromium-based browser with Perplexity's assistant available across every tab, supporting research workflows, product comparisons, and multi-step actions. Initial access was offered to Perplexity Max subscribers.
From Googlebot to GPTBot: Who's Crawling Your Site in 2025
Cloudflare · 2025
Across Cloudflare's network, search and AI crawler traffic rose 18% from May 2024 to May 2025. Googlebot grew 96% in raw requests and now accounts for 50% of crawler traffic. GPTBot rose 305% in requests, with its share climbing from 2.2% to 7.7%. ChatGPT-User requests jumped 2,825%, and PerplexityBot grew 157,490% off a tiny base. About 14% of top domains now use robots.txt rules targeting AI bots specifically.
Methodology note
Cloudflare Radar analysis published July 2025, comparing crawler activity in May 2024 against May 2025 across a fixed cohort of customers to remove growth bias. The team matches user-agent tokens against an open-source list of AI crawlers and analyses robots.txt files on 3,816 of the top 10,000 domains. Methodology and limits are documented in the post.
ReliableEval: A Recipe for Stochastic LLM Evaluation via Method of Moments
arXiv (cs.CL) / EMNLP 2025 · Gili Lior et al. · 2025
ReliableEval proposes a method-of-moments recipe for stochastic LLM evaluation that explicitly accounts for run-to-run variance in model outputs. Across standard benchmarks, the method produces tighter confidence intervals than naive averaging and reveals that some headline LLM performance comparisons are within noise margins. Released as an open evaluation toolkit.
Methodology note
arXiv preprint 2505.22169 (May 2025). Direct fetch returned the abstract page. The paper derives the method-of-moments estimator, tests it against several common evaluation tasks, and releases the toolkit for community use.
Copyright and AI Part 3: Generative AI Training (pre-publication report)
U.S. Copyright Office · 2025
The US Copyright Office concludes that generative AI training raises copyright questions at several points: data collection, model training, retrieval-augmented generation, and outputs. Fair use is fact-specific and depends on transformativeness, commerciality, the amount used, and effects on the market for the original work, including market dilution and lost licensing. The Office recommends voluntary licensing markets rather than compulsory licensing schemes.
Methodology note
Pre-publication version of Part 3 of the Copyright Office's Report on Copyright and Artificial Intelligence, released May 2025 by the Register of Copyrights. The report draws on more than 10,000 comments submitted in response to a 2023 Notice of Inquiry, plus existing case law and international approaches. Sections cover technical background, prima facie infringement, fair use, and licensing options.
AI Search Has a Citation Problem (Tow Center Report)
Tow Center for Digital Journalism, Columbia · 2025
Across eight AI search engines tested, more than 60% of news-attribution queries received incorrect answers. Perplexity got 37% wrong; Grok 3 got 94% wrong. Premium paid models were no more accurate than free ones, and often produced confidently incorrect answers without flagging uncertainty. Several engines retrieved content from publishers that had explicitly blocked their crawlers.
Methodology note
1,600 queries were run across ChatGPT Search, Perplexity, Perplexity Pro, DeepSeek Search, Microsoft Copilot, Grok-2, Grok-3, and Google Gemini. The researchers selected 10 articles from each of 20 publishers, used direct excerpts as queries, and asked each chatbot to identify the headline, publisher, publication date, and URL. Responses were manually graded against six categories.
Goodbye Clicks, Hello AI: Zero-Click Search Redefines Marketing
Bain & Company · 2025
Roughly 80% of consumers now rely on zero-click results, AI summaries, or assistant answers for at least 40% of their search needs, and AI search use has reduced average organic click-through rates by 15% to 25%. The shift compresses the funnel: brands need to be present and credible in the answer itself, not on the click destination.
Methodology note
Bain survey of more than 1,000 US consumers combined with proprietary analysis of organic search traffic patterns. The report measures self-reported AI search use and click-through behaviour across categories. Published February 2025.
GEO: Generative Engine Optimization
Princeton University / Georgia Tech / Allen Institute for AI / IIT Delhi · Pranjal Aggarwal et al. · 2024
Adding citations, quotations, and statistics to content can increase its visibility in AI-generated answers by up to 41% on average. Pages ranked outside the top of traditional search saw the largest gains. The effect varies by content domain and by AI engine, but the lift from evidence-style content elements is consistent across the conditions tested.
Methodology note
10,000 questions were run through generative search engines. The researchers compared answers before and after applying nine content optimisation strategies, including citations, quotations, statistics, and authoritative language. They measured visibility as the share of the AI answer attributable to the optimised page, using both word position and word count metrics. Peer-reviewed at KDD 2024.
We Tracked 1,885 Pages Adding Schema. AI Citations Barely Moved
Ahrefs · 2026
Across 1,885 pages that added JSON-LD between August 2025 and March 2026, schema produced no meaningful uplift in AI citations. Matched difference-in-differences tests against 4,000 control pages showed +2.4% on Google AI Mode and +2.2% on ChatGPT (both statistically indistinguishable from zero) and a small 4.6% decline on Google AI Overviews. 53% of AI-cited pages already carry schema, but this reflects overall site quality.
Methodology note
Ahrefs identified 1,885 URLs that transitioned from no JSON-LD to having JSON-LD between August 2025 and March 2026, using its crawler database. Each treated page was matched to three control pages from different domains with similar pre-period citation levels. Citation changes were measured 30 days before and after the schema-add date across AI Overviews, AI Mode and ChatGPT using four statistical tests including matched difference-in-differences.
AI Traffic Converts at 3× the Rate of Other Channels (Study)
Microsoft Clarity · 2026
Visitors arriving from AI assistants convert at roughly 3 times the rate of visitors from other channels, and at up to 11 times the rate in certain publisher segments. AI traffic still represents a small share of total visits, but its per-visitor commercial value is materially higher than traditional search or social.
Methodology note
Analysis of Microsoft Clarity user-session data across a multi-publisher dataset. The study compared conversion rates of sessions originating from AI assistants against sessions from other referral channels. Published January 2026. Single-vendor study with disclosed methodology, downgraded to Tier B in v1.1.
LLMs.txt Shows No Clear Effect on AI Citations (300K domains)
SE Ranking · 2025
Across 300,000 domains, only 10.13% had an llms.txt file. Adoption is roughly flat across traffic tiers, with high-traffic sites slightly less likely (8.27%) to use it than mid-tier ones (10.54%). Statistical tests and an XGBoost model found no relationship between the presence of llms.txt and how often a domain is cited by AI engines. Removing the variable from the model actually improved its accuracy.
Methodology note
SE Ranking study of nearly 300,000 domains, published November 2025. The team checked each domain for an llms.txt file, segmented adoption by monthly traffic, and modelled citation frequency using Spearman correlation, XGBoost regression and SHAP analysis. The conclusion is based on whether llms.txt presence improved or degraded model predictions of LLM citations.
What 2025 Revealed About AI Search and the Future of Schema Markup
Schema App · Martha van Berkel · 2025
In 2025, Google and Microsoft publicly confirmed they use Schema markup for generative AI features, and ChatGPT confirmed it uses structured data to decide which products appear in results. Schema App reported a 19.72% rise in AI Overview visibility on its own site after deploying Entity Linking, and customer InSinkErator a 69% rise in clicks on non-branded queries.
Methodology note
First-party essay by Schema App's CEO. The piece argues structured data should be treated as a knowledge graph rather than a rich-result trick, and uses examples from Schema App's own site and named customers (InSinkErator, Wells Fargo) plus public statements from Google, Microsoft, and ChatGPT to support the case.
About the author: Max Ackermann
Max Ackermann is founder and Managing Director of info.link, the product data platform that makes brands visible in AI search and connects every physical product to the web through GS1 Digital Link. He writes about AI search and generative engine optimization (GEO), AI-powered commerce, and how brands can structure product data for ChatGPT, Gemini, Perplexity, and retailer AI assistants like Amazon Rufus. For the past two years he has built the pipelines that put structured product data into AI answers, and run the experiments that test what actually moves AI citations.
Max has 20+ years of experience building digital products and businesses. He previously led McKinsey's Corporate Venture and Design teams across Europe, and as Managing Director of a leading US digital agency he built platforms with Nike, Google, Meta, and Airbnb. He founded the UX Design program at Central Saint Martins College, University of the Arts London, and is a Fellow of the UK's Higher Education Academy. Based in Hamburg, he works closely with GS1 on Digital Link adoption; info.link is headquartered in Hamburg and Berlin and counts GS1 Germany among its investors.


