
How ChatGPT, Perplexity, and Gemini Decide Which Sites to Cite

Traditional SEO is about ranking in a list. GEO (generative engine optimization) is about being selected as a source. These are fundamentally different problems, and each AI search engine has its own mechanism for choosing sources. Understanding those mechanisms is the first step to appearing in their answers.

The Three Pipelines

ChatGPT, Perplexity, and Gemini all surface web content in AI responses — but the pipeline from "your site" to "cited in AI response" is different for each. Here's how each one actually works.

ChatGPT: Responses API + web_search_preview

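When ChatGPT answers a question that requires current information, it runs a web search. Developers can reach the same capability through the web_search_preview tool in the OpenAI Responses API. This is different from a standard Chat Completions call, which answers from the model alone: with web_search_preview enabled, the Responses API fetches pages and cites their URLs.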

The pipeline works as follows:

  1. User asks a question
  2. ChatGPT decides whether web search is needed (informational intent triggers it)
  3. A Bing search is executed with a reformulated query
  4. Top results are fetched and parsed
  5. Content is passed to GPT-4o for synthesis
  6. The model generates a response citing the sources it actually used as url_citation annotations
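Step 6 is the part you can inspect programmatically. The following is a minimal sketch of pulling url_citation annotations out of a Responses API payload that has already been decoded to a dict. The sample payload is hand-written for illustration (it is not real API output), and the function name extract_url_citations is hypothetical.

```python
from urllib.parse import urlparse


def extract_url_citations(response: dict) -> list[dict]:
    """Collect every url_citation annotation from a decoded Responses API payload."""
    citations = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip non-message items such as the web search call itself
        for part in item.get("content", []):
            for annotation in part.get("annotations", []):
                if annotation.get("type") == "url_citation":
                    citations.append(
                        {"url": annotation["url"], "title": annotation.get("title", "")}
                    )
    return citations


# Illustrative payload only: the URLs and text are placeholders.
sample_response = {
    "output": [
        {"type": "web_search_call", "status": "completed"},
        {
            "type": "message",
            "content": [
                {
                    "type": "output_text",
                    "text": "Example answer text.",
                    "annotations": [
                        {"type": "url_citation", "url": "https://example.com/guide", "title": "Guide"},
                        {"type": "url_citation", "url": "https://example.org/faq", "title": "FAQ"},
                    ],
                }
            ],
        },
    ]
}

cited_hosts = [urlparse(c["url"]).hostname for c in extract_url_citations(sample_response)]
print(cited_hosts)
```

If your domain is missing from cited_hosts for a query you care about, the page was either not retrieved or not used in the answer.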

What This Means for Optimization

  • Bing ranking matters. ChatGPT searches Bing, so a page that does not rank there is unlikely to be fetched.
  • Structured data helps. Bing weights structured data heavily in its ranking signals.
  • Page load speed matters. Slow pages may be skipped during the fetch phase.
  • Clear, quotable paragraphs matter. GPT-4o prefers content with distinct, standalone statements it can quote accurately.

Perplexity: Sonar with Direct Citations

Perplexity is built specifically for web search — it's the most citation-heavy of the three. The Sonar API (Perplexity's model family) returns a citations[] array with every response: a list of URLs the model actually drew from to generate its answer.

The Perplexity pipeline:

  1. Query is classified and reformulated
  2. Web search runs across multiple indices (Google, Bing, direct crawl)
  3. Top 5-10 pages are fetched and chunked
  4. Sonar model synthesizes from the chunks
  5. Response includes numbered citations linking back to source URLs

Perplexity's citation mechanism is unique: it cites specific passages, not just pages. This means content that is clearly segmented (headers, short paragraphs, bullet points) is more likely to be quoted verbatim.
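Because the numbered citations map onto the citations[] array, you can check not only whether you were cited but at which position. Here is a small sketch under the assumption that citations[] is a plain list of URL strings, as described above. The helper names and the sample data are hypothetical.

```python
from urllib.parse import urlparse


def normalize_host(url: str) -> str:
    """Lower-case the hostname and drop a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def cited_positions(citations: list[str], domain: str) -> list[int]:
    """Return the 1-based citation numbers whose URL is on `domain` or a subdomain of it."""
    domain = domain.lower()
    hits = []
    for number, url in enumerate(citations, start=1):
        host = normalize_host(url)
        if host == domain or host.endswith("." + domain):
            hits.append(number)
    return hits


# Illustrative response fragment: only the citations field is shown.
sample = {
    "citations": [
        "https://www.example.com/pricing",
        "https://other.example.net/post",
        "https://docs.example.com/api",
    ]
}

positions = cited_positions(sample["citations"], "example.com")
print(positions)
```

Matching on the hostname rather than on a raw substring avoids false positives when your domain name appears inside another site's URL path.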

What This Means for Optimization

  • Content segmentation. Use H2/H3 headers to create quotable sections. Each section should answer one question.
  • Freshness. Perplexity heavily weights recent content. Update key pages regularly.
  • FAQ schema. Perplexity extracts Q&A pairs from FAQPage JSON-LD directly.
  • Domain authority still matters. Higher-authority domains get more weight in the initial retrieval phase.
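For the FAQ schema point, this is roughly what generating FAQPage JSON-LD looks like. The sketch uses the standard schema.org types (FAQPage, Question, Answer); the function name faq_jsonld and the sample question are placeholders, not content from any real site.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Serialize (question, answer) pairs as schema.org FAQPage JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


# Placeholder Q&A pair for illustration.
faq_markup = faq_jsonld(
    [("What is GEO?", "Optimizing a site so that AI search engines select it as a source.")]
)

# The JSON-LD goes inside a script tag in the page's HTML.
print(f'<script type="application/ld+json">\n{faq_markup}\n</script>')
```

Keep each answer short and self-contained, in line with the "one section, one question" advice above.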

Gemini: Google Search Grounding and AI Overviews

Gemini uses Google's own search infrastructure through a feature called "grounding." When Gemini is asked a factual question with grounding enabled, it calls the google_search_retrieval tool (exposed as google_search for newer Gemini models), which searches Google and returns sources as grounding_chunks with web URIs.

These grounding sources are closely related to what powers Google's "AI Overviews," the summary that appears at the top of Google search results. Both draw on the same Google index, so if you appear in Gemini's grounding sources, you're likely to appear in AI Overviews too.

The key difference from ChatGPT: Gemini searches Google, which means Google's full ranking algorithm applies — PageRank, E-E-A-T, Core Web Vitals, and all structured data signals.
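To see which pages Gemini grounded an answer on, you read the web entries out of the grounding chunks. The sketch below assumes the response has been converted to a dict using the snake_case field names from this article; the raw REST JSON uses camelCase (groundingMetadata, groundingChunks), so adjust the keys to match your client. It also assumes the uri may be a redirect link rather than the publisher's own URL, which is why it checks the title field as well. The sample data is made up.

```python
def grounding_sources(response: dict) -> list[dict]:
    """Collect the web sources listed in each candidate's grounding chunks."""
    sources = []
    for candidate in response.get("candidates", []):
        metadata = candidate.get("grounding_metadata") or {}
        for chunk in metadata.get("grounding_chunks", []):
            web = chunk.get("web")
            if web:
                sources.append({"uri": web.get("uri", ""), "title": web.get("title", "")})
    return sources


def grounded_on(response: dict, domain: str) -> bool:
    """Loose substring check: does `domain` appear in any source's uri or title?"""
    domain = domain.lower()
    return any(
        domain in source["uri"].lower() or domain in source["title"].lower()
        for source in grounding_sources(response)
    )


# Illustrative payload: the first uri imitates a redirect link with the domain in the title.
sample_response = {
    "candidates": [
        {
            "grounding_metadata": {
                "grounding_chunks": [
                    {"web": {"uri": "https://redirect.example.net/abc123", "title": "example.com"}},
                    {"web": {"uri": "https://example.org/post", "title": "Example Org post"}},
                ]
            }
        }
    ]
}

print(grounded_on(sample_response, "example.com"))
```

The substring match is deliberately forgiving. Tighten it if your brand name is a common word.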

What This Means for Optimization

  • Google SEO remains the foundation. Gemini/AI Overviews amplify existing Google ranking. Traditional SEO still matters.
  • E-E-A-T signals. Experience, Expertise, Authoritativeness, Trust — especially important for health, finance, and technical topics.
  • FAQPage schema can help. Google's own guidance says no special markup is required to appear in AI Overviews, so treat FAQ structured data as a way to make question-and-answer content unambiguous rather than as a guaranteed input.
  • Core Web Vitals. Slow, janky pages are penalized in Google rankings and therefore in Gemini citations.

Yandex GPT: Text-Based Mention Detection

Yandex GPT (used in Yandex's Alice assistant and search) takes a different approach. Unlike the Western engines, it doesn't expose citation URLs — instead, it mentions brands and sites in its response text. Monitoring Yandex requires checking whether your domain or brand name appears in the response.
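With no citation URLs to inspect, monitoring comes down to text matching. A minimal sketch, assuming you already have the response text as a string: it counts case-insensitive, whole-token mentions of each term. The function name, brand name, and sample text are all placeholders.

```python
import re


def count_mentions(text: str, terms: list[str]) -> dict[str, int]:
    """Count case-insensitive mentions of each term, ignoring matches inside longer words."""
    counts = {}
    for term in terms:
        # (?<!\w) and (?!\w) keep "example.com" from matching inside "myexample.community".
        pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
        counts[term] = len(pattern.findall(text))
    return counts


# Placeholder response text for illustration.
sample_text = (
    "Popular options include Example Brand (example.com) and others. "
    "example brand also offers a free tier."
)

mention_counts = count_mentions(sample_text, ["example.com", "Example Brand"])
print(mention_counts)
```

Python's \w is Unicode-aware by default, so the same boundaries work for Cyrillic brand names.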

Yandex's Turbo Pages (AMP equivalent) and structured data through Yandex's own tools (Yandex Webmaster) are the primary levers for Yandex search visibility.

The Common Signals Across All Engines

Despite different architectures, a consistent set of signals improves citation rate across all AI search engines:

  • FAQPage JSON-LD (+41% citation rate). Works across ChatGPT, Perplexity, and Gemini.
  • robots.txt AI bot access (required baseline). GPTBot, ClaudeBot, and PerplexityBot must not be blocked.
  • Schema.org Organization/Product (+15-20% citation rate). Helps AI models identify who you are.
  • Content depth, 800+ words (+12% citation rate). Thin pages are deprioritized across all engines.
  • Brand mentions + sameAs links (+8% citation rate). LinkedIn, GitHub, and Wikipedia links establish authority.
  • Fresh content, under 6 months old (+10% citation rate). Especially important for Perplexity.
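The robots.txt baseline is easy to verify yourself. This sketch uses Python's standard urllib.robotparser to test whether the three bots named above may fetch a URL. The robots.txt content is a made-up example that blocks GPTBot, and blocked_ai_bots is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def blocked_ai_bots(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI bots that the given robots.txt text forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]


# Illustrative robots.txt that blocks GPTBot and allows everyone else.
sample_robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

blocked = blocked_ai_bots(sample_robots)
print(blocked)
```

An empty list means none of the three bots is blocked for that URL. Run the check against your key landing pages, not just the homepage, since Disallow rules are path-specific.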

How to Monitor Your Citation Rate

The only way to know if AI engines are actually citing you is to ask them. For each engine, this means:

  • ChatGPT: Use the Responses API with web_search_preview tool and check url_citation annotations in the output
  • Perplexity: Use the Sonar API and check the citations[] array in the response
  • Gemini: Use google_search_retrieval grounding and check grounding_metadata.grounding_chunks[].web.uri
  • Yandex: Check whether your domain appears in the response text
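Once you have the cited URLs per query from any of the URL-returning engines, the citation rate is simply the share of queries where your domain shows up at least once. A rough sketch, with hypothetical function names, queries, and URLs:

```python
from urllib.parse import urlparse


def is_cited(urls: list[str], domain: str) -> bool:
    """True if any URL's hostname is `domain` or one of its subdomains."""
    domain = domain.lower()
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            return True
    return False


def citation_rate(results: dict[str, list[str]], domain: str) -> float:
    """Fraction of queries whose cited URLs include `domain`."""
    if not results:
        return 0.0
    cited = sum(1 for urls in results.values() if is_cited(urls, domain))
    return cited / len(results)


# Illustrative data: query -> URLs the engine cited for that query.
weekly_results = {
    "best geo audit tools": ["https://example.com/tools", "https://example.org/a"],
    "how to get cited by ai search": ["https://example.net/guide"],
    "faq schema for ai answers": ["https://blog.example.com/faq-schema"],
    "robots.txt for ai bots": [],
}

rate = citation_rate(weekly_results, "example.com")
print(f"Citation rate: {rate:.0%}")
```

Storing one such number per engine per run gives you a trend line instead of a one-off snapshot.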

This is exactly what Causabi's monitoring module automates — running your target queries through each engine weekly and tracking your citation rate over time.
