What we can actually know about AI chatbot prompts: public proxies, not private feeds

No public first-party source from OpenAI, Anthropic, Google, Microsoft, or Perplexity discloses a browsable feed of real user prompts. Query fan-out and prompt rewriting are real and partly observable through the API. The best defensible workflow is a proxy built from public search demand, labelled as one.

No public source shows that anyone has comprehensive access to platform-wide AI chatbot prompt logs. Query fan-out and prompt rewriting are real and partly observable. Public search demand is the strongest documented proxy for likely chatbot intent. The defensible workflow is a proxy system built from public signals, usage research, and controlled API runs. It should be labelled as a proxy, not sold as a feed. This article separates what is provable from what is asserted, and treats the gap as a measurement-ethics question.

The category's prompt-tracking claims need disambiguation, not denial

AI visibility vendors say they track the prompts that matter for a brand. The implicit promise is visibility into what real users type into ChatGPT, Perplexity, and Gemini. The actual mechanism behind that promise is rarely disclosed in the pitch.

Some vendors are explicit about the limit in their own documentation. Otterly's help content recommends building a prompt set from brand terms, domains, industries, URLs, and SEO keywords Otterly AI · 2024. That is a construction method, not a feed of observed user prompts. Peec's onboarding documentation describes the same shape of workflow, where the customer supplies the inputs the tool then runs Peec AI · 2024.

The question is not whether prompt tracking is fraudulent. It is what the word "track" is doing. A vendor that builds a prompt set from public inputs is doing useful work. A vendor that implies it reads live user conversations is making a claim public evidence does not support.

No public source shows comprehensive access to real chatbot prompts

The strongest defensible statement here is bounded, not absolute. No public first-party source from OpenAI, Anthropic, Google, Microsoft, or Perplexity discloses a browsable feed of real user prompts. The phrasing matters: no public evidence reviewed here shows such a feed exists, which is different from claiming no non-public data exists anywhere.

The first-party documentation that does exist describes mechanisms, not dumps. OpenAI's help content explains how ChatGPT Search behaves for a user, without exposing other users' queries OpenAI · 2024. The web-search API documentation describes what a caller can observe from their own requests, not a global query stream OpenAI · 2025. Otterly confirms the gap from the other side. Its help content states there is no way to learn which prompts are most asked at ChatGPT or Perplexity Otterly AI · 2024.

Any vendor implying platform-wide prompt visibility is asserting something past the public record. The honest position is the bounded one.

Four first-party findings are actually proven

Four findings from primary sources support a real but limited workflow. Each is documented by the platform itself, which puts it on firmer ground than vendor inference.

The first is search expansion. Google documents that the Trends Explore page uses Gemini to expand a seed topic into up to eight related terms and rising queries Google · 2025. This is first-party proof that public search behaviour can be expanded into an adjacent-intent neighbourhood. The second is intent classification. OpenAI's usage research classifies consumer conversations into roughly 49% Asking, 40% Doing, and 11% Expressing Chatterji et al. · NBER / OpenAI / Harvard University · 2025. The taxonomy is published in both the research paper and OpenAI's summary OpenAI · 2025.

The third finding is prompt rewriting. OpenAI documents that ChatGPT Search rewrites a user query into one or more targeted queries. It may send additional, more specific queries after reviewing initial results OpenAI · 2024. Fan-out behaviour is real inside a production chatbot search system. The fourth is partial observability. The web-search API states that the call output will usually, but not always, include the search queries that were run OpenAI · 2025. The sources field can reveal the URLs consulted. Some fan-out is visible to an API caller; not all of it is.
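The partial observability described above can be sketched in a few lines. The snippet below walks a list of response output items and collects whatever search queries and source URLs are present. The item shape (a `type` of `web_search_call` with an `action` holding `query` and `sources`) is an assumption modelled on the documentation's description, not a verified schema. The sample data is invented, and `observed_fanout` is a hypothetical helper name.

```python
from typing import Any


def observed_fanout(output_items: list[dict[str, Any]]) -> dict[str, Any]:
    """Collect the search queries and source URLs that a response exposes.

    Assumed item shape (not verified against a live API response):
    {"type": "web_search_call",
     "action": {"query": str, "sources": [{"url": str}]}}
    """
    queries: list[str] = []
    urls: list[str] = []
    calls = 0
    for item in output_items:
        if item.get("type") != "web_search_call":
            continue  # skip messages and other item types
        calls += 1
        action = item.get("action") or {}
        query = action.get("query")
        if query:  # the query is "usually, but not always" present
            queries.append(query)
        for source in action.get("sources") or []:
            url = source.get("url")
            if url:
                urls.append(url)
    return {
        "search_calls": calls,
        "queries": queries,
        "calls_without_query": calls - len(queries),
        "source_urls": urls,
    }


# Invented sample: two search calls, only one of which exposes its query.
sample_output = [
    {"type": "web_search_call",
     "action": {"query": "best trail running shoes 2025",
                "sources": [{"url": "https://example.com/review"}]}},
    {"type": "web_search_call", "action": {}},
    {"type": "message", "content": "..."},
]

report = observed_fanout(sample_output)
print(report)
```

Counting `calls_without_query` alongside the queries keeps the gap visible: a report that lists only the queries it found would overstate how much of the fan-out was actually observed.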

Fan-out is a measurable mechanism, even when the queries stay hidden

Query fan-out is the step where one user question becomes several targeted sub-queries. A user types a single question. The system rewrites it into more specific queries, runs them, and may issue further queries after seeing early results OpenAI · 2024. Some of those sub-queries are observable through the API output; many are not OpenAI · 2025.

The mechanism has measurable consequences even where the specific queries stay hidden. In a Surfer SEO analysis of Google AI Overviews, pages ranking for both the main query and at least one fan-out sub-query were 161% more likely to be cited than pages ranking only for the main query Search Engine Land · 2025. Fan-out lives inside the wider retrieval pipeline, and the citation effect is the part a brand team can act on.
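The 161% figure is a relative lift, not a percentage-point gap. The coverage summarised in the Sources list reports a 51% citation share for pages ranking on both the main query and a fan-out query, and "just under 20%" for pages ranking on the main query only. The exact baseline is not stated there, so the sketch below back-solves it from the two published numbers. It treats the two citation shares as the quantities being compared, which is an assumption about how the lift was derived. The function names are illustrative.

```python
def relative_lift(share: float, baseline: float) -> float:
    """How much more likely, as a fraction of the baseline (1.61 means 161% more)."""
    return share / baseline - 1.0


def implied_baseline(share: float, lift: float) -> float:
    """Baseline share implied by a reported share and a reported relative lift."""
    return share / (1.0 + lift)


both_share = 0.51      # reported: main query plus at least one fan-out query
reported_lift = 1.61   # reported: "161% more likely"

baseline = implied_baseline(both_share, reported_lift)
print(f"implied main-query-only share: {baseline:.1%}")
print(f"lift recomputed from that baseline: {relative_lift(both_share, baseline):.0%}")
```

The implied baseline comes out at roughly 19.5%, which is consistent with the "just under 20%" wording. The percentage-point gap between the two groups is about 31 points, a different number from the 161% relative lift.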

Fan-out also has reported quirks that affect any reconstruction. A Peec AI analysis covered by Search Engine Journal found ChatGPT Search fan-out queries often switching to English even when the user prompts in another language Search Engine Journal · 2026. That is a vendor-observed pattern from controlled runs, not first-party platform documentation. A tool that surfaces fan-out queries does so from observable signals, not from a privileged feed Ahrefs · 2025.

Public search data is the best proxy, not proof

Google Trends measures public web-search behaviour, not chatbot conversations. Its Gemini-assisted expansion is useful because it reveals adjacent demand and semantic neighbourhoods around a seed topic Google · 2025. It should never be described as a direct measurement of what users typed into a ChatGPT or Gemini chat.

Search intent and chatbot intent are not identical, but they are close enough to be useful. Generative engines and traditional web search return different sources and concepts even for the same query Kirsten et al. · Ruhr University Bochum / Max Planck Institute for Software Systems · 2025. The shift in how attention concentrates under AI search is also documented Aral et al. · MIT IDE · 2026. The proxy has to be read with those differences in mind. Search intent is the strongest publicly documented neighbourhood for likely chatbot intent, and that is the honest claim.

Calibrated proxy is honest research. Pretending the proxy is the thing is not. The distinction is the whole point of the method.

How vendors likely build prompt sets in practice

The most defensible public reconstruction of the vendor workflow has three steps. Vendors begin with customer-provided inputs: prompts, SEO keywords, URLs, brand terms, and public-demand proxies. They then run those prompts through targeted chatbot or search workflows. They then measure visibility, citations, and sometimes the query rewrites where the platform exposes them.
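The three steps above can be sketched as a small pipeline. This is a reconstruction under stated assumptions, not any vendor's actual code: `build_prompt_set`, `fake_runner`, and `measure_visibility` are hypothetical names, the prompt templates are invented, and the runner returns canned answers instead of calling a real chatbot.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunResult:
    prompt: str
    answer: str
    cited_urls: list[str] = field(default_factory=list)
    observed_queries: list[str] = field(default_factory=list)  # may stay empty


def build_prompt_set(brand: str, keywords: list[str]) -> list[str]:
    """Step 1: construct prompts from customer-supplied inputs (a proxy, not a feed)."""
    prompts = [f"Which {kw} would you recommend?" for kw in keywords]
    prompts.append(f"Is {brand} any good?")
    return prompts


def fake_runner(prompt: str) -> RunResult:
    """Step 2 stand-in: a canned response instead of a real chatbot call."""
    if "shoes" in prompt:
        return RunResult(prompt, "Acme and others make solid shoes.",
                         ["https://acme.example/shoes"], ["best running shoes"])
    return RunResult(prompt, "Opinions vary.")


def measure_visibility(results: list[RunResult], brand: str, domain: str) -> dict[str, float]:
    """Step 3: share of runs that mention the brand or cite its domain."""
    n = len(results)
    mentioned = sum(brand.lower() in r.answer.lower() for r in results)
    cited = sum(any(domain in url for url in r.cited_urls) for r in results)
    return {"mention_rate": mentioned / n, "citation_rate": cited / n}


def run_workflow(brand: str, domain: str, keywords: list[str],
                 runner: Callable[[str], RunResult]) -> dict[str, float]:
    prompts = build_prompt_set(brand, keywords)
    results = [runner(p) for p in prompts]
    return measure_visibility(results, brand, domain)


metrics = run_workflow("Acme", "acme.example", ["running shoes", "rain jacket"], fake_runner)
print(metrics)
```

Every number this produces describes the prompts the operator chose to run. Nothing in the pipeline observes what other users typed, which is why the output is a proxy measurement.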

Vendor documentation supports this shape directly. Otterly's help content openly states the prompt set is built from brand terms, domains, industries, and SEO keywords Otterly AI · 2024. Its guidance on adding prompts describes manual and assisted input, not a captured feed Otterly AI · 2024. Peec's documentation describes the same customer-supplied input model Peec AI · 2024. Some tools then surface the fan-out queries they can observe from those runs Ahrefs · 2025.

Treat each vendor's help centre as evidence about that vendor's stated method, not as proof of the whole category. A vendor that admits the workflow is a proxy is more credible, not less.

The caveats an honest researcher keeps in every claim

Four bounded statements hold without overreaching. Each is the calibrated version of a claim the category often overstates.

Public search behaviour is the best available proxy for likely chatbot prompts, not a measurement identical to chatbot behaviour Google · 2025. OpenAI's usage research is useful for classifying likely chatbot intent, not a public prompt dump Chatterji et al. · NBER / OpenAI / Harvard University · 2025. Some query rewrites are observable to API callers, but API responses do not return every internal fan-out query OpenAI · 2025. No public first-party source reviewed here proves comprehensive vendor access to prompt logs, which is narrower than claiming no vendor holds any non-public data anywhere.

Honest caveats strengthen the article's authority rather than weakening it. The bounded claim is the one that survives scrutiny.

A buyer's framework, not a vendor takedown

The best publicly documented workflow approximates intent from public search signals and generates prompt hypotheses from them. It classifies those hypotheses against OpenAI's published taxonomy, then inspects whatever rewrites or sources the platform returns on controlled runs OpenAI · 2025. That is a useful proxy system. It is not direct access to global chatbot prompt demand.
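Classifying prompt hypotheses against the published taxonomy might look like the sketch below. The cue-word rules are a hypothetical heuristic invented for illustration and would need validation before any real use. Only the three category names and their reported shares come from the cited research.

```python
from collections import Counter

# Shares reported in the usage research cited above.
PUBLISHED_SHARES = {"Asking": 0.49, "Doing": 0.40, "Expressing": 0.11}

# Hypothetical cue words; not part of the published taxonomy.
DOING_CUES = ("write", "draft", "create", "plan", "summarise", "translate")
ASKING_CUES = ("what", "which", "how", "why", "is", "are", "can", "does")


def classify(prompt: str) -> str:
    """Assign a prompt hypothesis to Asking, Doing, or Expressing."""
    words = prompt.lower().replace("?", " ").split()
    if not words:
        return "Expressing"
    if words[0] in DOING_CUES:
        return "Doing"
    if prompt.strip().endswith("?") or words[0] in ASKING_CUES:
        return "Asking"
    return "Expressing"


def share_gap(prompts: list[str]) -> dict[str, float]:
    """Difference between the prompt set's category shares and the published ones."""
    if not prompts:
        raise ValueError("need at least one prompt hypothesis")
    counts = Counter(classify(p) for p in prompts)
    return {label: counts[label] / len(prompts) - published
            for label, published in PUBLISHED_SHARES.items()}


hypotheses = [
    "Which rain jacket is best for cycling?",
    "How do I wash a down jacket?",
    "Write a packing list for a wet-weather hike",
    "I love my old jacket",
]
for label, gap in share_gap(hypotheses).items():
    print(f"{label}: {gap:+.0%} versus published share")
```

A large gap does not mean the prompt set is wrong. It flags where the hypotheses lean away from the aggregate mix, for example a set that is almost entirely Asking and so says little about task-oriented use.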

The reader leaves with a usable method, not a list of vendors to distrust. Use Google Trends Explore with Gemini expansion as a seed-discovery layer Google · 2025. Classify the resulting hypotheses against the Asking, Doing, and Expressing taxonomy OpenAI · 2025. Run a controlled set of prompts through ChatGPT, Perplexity, Gemini, and Google AI Overviews, and observe the rewrites where available OpenAI · 2025. When evaluating a vendor, ask one question: what is the actual source of the prompts you say you track? The answer is the test of the vendor's editorial honesty.
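One way to keep the proxy label attached through the controlled runs is to carry each prompt's provenance into the run plan. The sketch below crosses every hypothesis with the four platforms named above. The `PlannedRun` structure and the provenance strings are assumptions made for illustration, and no platform is actually called.

```python
from dataclasses import dataclass
from itertools import product

PLATFORMS = ("ChatGPT", "Perplexity", "Gemini", "Google AI Overviews")


@dataclass(frozen=True)
class PlannedRun:
    prompt: str
    platform: str
    provenance: str  # where the prompt hypothesis came from


def plan_runs(hypotheses: dict[str, str]) -> list[PlannedRun]:
    """Cross every prompt hypothesis with every platform, keeping its provenance."""
    return [PlannedRun(prompt, platform, provenance)
            for (prompt, provenance), platform in product(hypotheses.items(), PLATFORMS)]


# Invented hypotheses, each labelled with the proxy it was derived from.
hypotheses = {
    "Which rain jacket is best for cycling?": "proxy: search-trend expansion",
    "Is Acme waterproof gear worth it?": "proxy: customer-supplied brand term",
}

runs = plan_runs(hypotheses)
print(len(runs), "planned runs")
print(runs[0])
```

Because the provenance travels with each run, any later report can state where a prompt came from instead of implying it was observed in the wild.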

Sources

Sources are tiered per our methodology & sources page.

Key finding

Across controlled experiments comparing AI search engines (ChatGPT, Perplexity, Google AI Overviews) with traditional search, AI search significantly reduces clicks to source publishers and concentrates attention on a smaller set of authoritative domains. Users exposed to AI summaries form more confident but less accurate beliefs on contested topics. (agent inferred)

Methodology note

arXiv preprint 2602.13415 by Sinan Aral, Haiwen Li and Rui Zuo (MIT Sloan), submitted 13 February 2026. Direct fetch on arxiv.org confirmed authorship and the 24,000 queries / 2.8 million results / 243 countries scope. Companion to R128 from the same lab.

arXiv

Tier A: Strongest evidence

Characterizing Web Search in The Age of Generative AI

Ruhr University Bochum / Max Planck Institute for Software Systems · Elisabeth Kirsten et al. · 2025

Key finding

Generative search and traditional web search return different things even for the same query. Generative engines pull from a broader pool of sources than Google web search, mix in varying amounts of internal model knowledge versus retrieved pages, and surface different concept sets. That widens the set of pages that can earn visibility, but also breaks assumptions baked into classical ranked-list evaluation.

Methodology note

Academic comparison of one traditional engine (Google web search) with four generative engines from Google and OpenAI, run across queries from four content domains. The authors measured source coverage, the balance between model-internal knowledge and externally retrieved web pages, and the concepts surfaced in each output.

arXiv

Tier A: Strongest evidence

How People Use ChatGPT (NBER Working Paper 34255)

NBER / OpenAI / Harvard University · Aaron Chatterji et al. · 2025

Key finding

ChatGPT reached around 700 million weekly active users by mid-2025, with roughly 18 billion messages sent per week. About 30% of conversations are work-related while 70% are personal, covering writing assistance, information seeking, and tutoring. Adoption is rising fastest in lower-income countries. (agent inferred)

Methodology note

NBER working paper by Aaron Chatterji and co-authors, including OpenAI researchers, analysing a representative sample of ChatGPT conversations. The researchers classified messages by topic, work versus personal use, and user demographics to characterise how people actually use the assistant in 2024 and 2025.

National Bureau of Economic Research

Tier A: Strongest evidence

How People Use ChatGPT (OpenAI research paper PDF)

OpenAI · 2025

Key finding

OpenAI's research on ChatGPT usage classifies consumer conversations into three categories: roughly 49% 'Asking' (seeking information or advice), 40% 'Doing' (task completion such as drafting text, planning, or programming), and 11% 'Expressing' (personal reflection, exploration, and play). This provides a defensible intent taxonomy for classifying prompt hypotheses, but it is an aggregate breakdown, not a browsable dump of user prompts.

Methodology note

OpenAI research paper published as a downloadable PDF, with co-authorship by external economists (David Deming and colleagues). The paper presents an aggregate analysis of anonymised ChatGPT consumer usage and classifies conversations into Asking, Doing and Expressing categories. No individual prompts are published; methodology and definitions of each category are disclosed in full inside the PDF.

OpenAI

Tier A: Strongest evidence

How people are using ChatGPT (OpenAI summary page)

OpenAI · 2025

Key finding

Publicly accessible summary of OpenAI's ChatGPT usage research. Describes the Asking/Doing/Expressing classification (49%/40%/11%) and the dominant consumer use cases: practical guidance, information seeking, and writing assistance. Useful as a citable first-party summary of the underlying research, but, like the underlying paper, it provides aggregate categories rather than a browsable feed of real user prompts.

Methodology note

Public-facing OpenAI summary page accompanying the 'How People Use ChatGPT' research paper. Provides accessible explanations of the Asking/Doing/Expressing taxonomy and the reported category shares (~49% / 40% / 11%). Content verified by fetch on 2026-05-27. No methodology beyond what is disclosed in the underlying paper (R192) and the NBER working paper (R52).

OpenAI

Tier A: Strongest evidence

Web search (OpenAI API documentation)

OpenAI · 2025

Key finding

OpenAI's web-search API documentation states that the web_search_call output item will usually (but not always) include the search queries that were searched, and that the sources field can reveal all URLs consulted during the search run. This is first-party proof that some query rewrites can be observed for requests under the caller's control — but the 'usually but not always' caveat means observed fan-out is partial rather than exhaustive.

Methodology note

Official OpenAI developer documentation for the web search tool exposed via the Responses API. Describes the schema of the web_search_call output item, including which fields are populated and the explicit caveat that searched queries are returned 'usually (but not always).' Content verified by fetch on 2026-05-27. No aggregate usage data is disclosed.

OpenAI Developer Platform

Key finding

Google Trends Explore uses Gemini to take an area of interest and expand it into up to eight related search terms, additional ideas, and top/rising queries. This is direct first-party proof that public search behaviour can be expanded from a seed topic into an adjacent-intent neighbourhood — but Google explicitly frames it as web-search demand, not chatbot-prompt demand.

Methodology note

Official Google Help Center documentation page describing how the Google Trends Explore experience uses Gemini models to suggest related search terms, follow-up ideas, and top/rising queries from a seed input. Content was fetched and verified directly from support.google.com on 2026-05-27; the page also discloses Gemini privacy practices, data retention, and feedback mechanisms.

Google Help

Tier A: Strongest evidence

ChatGPT Search (OpenAI Help Center)

OpenAI · 2024

Key finding

OpenAI documents that ChatGPT Search rewrites a user query into one or more targeted queries and may send additional, more specific queries after reviewing initial results. This is first-party proof that query fan-out behaviour is real inside a production chatbot search system.

Methodology note

Official OpenAI Help Center article describing how ChatGPT Search functions, including the prompt-rewriting and follow-up query behaviour. Content verified by fetch on 2026-05-27 (HTTP 200 confirmed; full body accessible in a browser session). Article documents product behaviour without disclosing the rewriting algorithm, query-volume statistics or model-side reasoning.

OpenAI Help Center

Key finding

Pages ranking for both the main query and at least one fan-out sub-query collected 51% of AI Overview citations. Pages ranking only for the main query collected just under 20%. Ranking for fan-out queries makes citation 161% more likely than ranking only for the head term. Around 68% of cited pages did not rank in Google's top 10 for any related query.

Methodology note

Search Engine Land coverage, December 2025, of a Surfer SEO analysis of 10,000 keywords and 33,000 fan-out queries extracted with Gemini. Surfer measured the share of AI Overview citations going to pages ranking on the head query, on fan-outs, on both, or on neither, and reported a Spearman correlation of 0.77 between fan-out coverage and citation rate.

Search Engine Land

Key finding

Search Engine Journal reports on vendor-observed fan-out behaviour in ChatGPT Search, including the pattern of fan-out queries often switching to English regardless of the original prompt language. Useful as reporting on vendor-observed patterns; should not be treated as first-party platform proof or as a representative sample of all end-user prompts.

Methodology note

Search Engine Journal article by Matt G. Southern, 18 February 2026, reporting on a Peec AI analysis of 10M+ ChatGPT prompts and 20M fan-out queries. Trade-press coverage of vendor-observed patterns; the underlying dataset comes from Peec's own controlled prompt runs (UI scraping), not a representative sample of all consumer ChatGPT use.

Search Engine Journal

Tier B: Citable with caveats

How to view fanout queries generated by AI (Ahrefs Help)

Ahrefs · 2025

Key finding

Ahrefs provides a way for users to view fan-out queries that AI assistants generate from a seed prompt the user has chosen to track. This is evidence that third-party tools can observe AI-generated query rewrites for prompts under the user's control, but it does not prove access to all end-user prompts in the wild.

Methodology note

Ahrefs Help Center article by Constance Tan (updated weekly) describing the Brand Radar fan-out-queries feature for ChatGPT and Perplexity. Explains that Ahrefs typically returns two fan-out queries per tracked prompt (sometimes one, sometimes none) and compares fan-out to People Also Ask. Vendor-reported product behaviour; the fan-out queries observed are derived from user-defined seed prompts. Content verified by direct fetch.

Ahrefs Help Center

Tier B: Citable with caveats

How to find relevant prompts for your brand? (Otterly Help)

Otterly AI · 2024

Key finding

Otterly's own help documentation explicitly states there is 'no way to learn which prompts are most asked at ChatGPT or Perplexity' and 'no way to know what exactly people are searching for in the AI engines.' Otterly recommends constructing prompts from available external inputs such as brand terms, domains, industries, URLs, and SEO keywords. This is a vendor admission that aligns with the public-proxy thesis.

Methodology note

Otterly AI Help Center article (last updated April 2026) describing the vendor's own recommended methodology for building a brand's prompt list. Self-reported vendor documentation; the page explicitly states that AI search engines do not publish query data and lists three substitute methods Otterly supports (Prompt Research tool, Google Search Console import, AI-assisted brainstorming). Content verified by direct fetch.

Otterly AI Help Center

Tier B: Citable with caveats

How can I add prompts? (Otterly Help)

Otterly AI · 2024

Key finding

Otterly supports prompt construction from external proxies including SEO keywords, brand names, industry terms, and URLs. The page reinforces that prompts are customer-defined and proxy-derived, not drawn from a privileged platform-wide feed of real chatbot user prompts.

Methodology note

Otterly AI Help Center article (December 2025) describing the three ways customers can add prompts inside the Otterly platform: individual entry, CSV import, or the AI Prompt Research tool. Self-reported vendor documentation. Useful as evidence of the kinds of inputs Otterly accepts; not a controlled study or independent benchmark. Content verified by direct fetch.

Otterly AI Help Center

Tier B: Citable with caveats

Quickstart Guide (Peec AI Docs)

Peec AI · 2024

Key finding

Peec's documentation says the platform runs customer prompts daily across AI platforms. This supports the interpretation that vendors like Peec observe outcomes from prompts they execute rather than drawing from a secret platform-wide prompt firehose.

Methodology note

Peec AI's official Quickstart Guide, published on its Mintlify-hosted documentation site. Describes the four-step onboarding workflow (set up prompts, identify competitors, read the dashboard, analyse sources) and confirms that Peec runs customer-defined prompts daily across ChatGPT, Perplexity, Gemini and Copilot. Content verified by direct fetch on 2026-05-27.

Peec AI Documentation

Tier B: Citable with caveats

Welcome to Peec AI (Peec AI Docs)

Peec AI · 2024

Key finding

Peec frames itself as a platform that runs customer-defined prompts across major AI assistants and tracks visibility, citation, and answer-inclusion outcomes. Useful as additional product-context evidence that the platform observes outputs from its own controlled runs.

Methodology note

Peec AI's product-introduction documentation page. Describes the platform's three core metrics (Visibility, Position, Sentiment), the prompt-running cadence, and Peec's UI-scraping data-collection approach. Confirms that data comes from prompts the customer defines, not from a privileged platform-wide feed. Content verified by direct fetch on 2026-05-27.

Peec AI Documentation

About the author Max Ackermann

Max Ackermann is founder and Managing Director of info.link, the product data platform that makes brands visible in AI search and connects every physical product to the web through GS1 Digital Link. He writes about AI search and generative engine optimization (GEO), AI-powered commerce, and how brands can structure product data for ChatGPT, Gemini, Perplexity, and retailer AI assistants like Amazon Rufus. For the past two years he has built the pipelines that put structured product data into AI answers, and run the experiments that test what actually moves AI citations.

Max has 20+ years of experience building digital products and businesses. He previously led McKinsey's Corporate Venture and Design teams across Europe, and as Managing Director of a leading US digital agency he built platforms with Nike, Google, Meta, and Airbnb. He founded the UX Design program at Central Saint Martins College, University of the Arts London, and is a Fellow of the UK's Higher Education Academy. Based in Hamburg, he works closely with GS1 on Digital Link adoption; info.link is headquartered in Hamburg and Berlin and counts GS1 Germany among its investors.
