In May 2025, Google reported that AI Overviews were reaching more than 1.5 billion users each month, and by some estimates roughly 10% of websites now publish an llms.txt file aimed at large language model crawlers.
ChatGPT and Perplexity each return citations from a noticeably different mix of sources than Google does. The traffic that arrives through these channels behaves differently from organic search traffic, with longer dwell times and lower bounce rates on landing pages that match the cited claim.
A site that wants to participate has to be readable by both kinds of visitors at once.
This article covers the technical, structural, and editorial preparation that a small or mid-sized business site needs to remain visible as AI-mediated discovery grows.
AI Traffic Versus Search Traffic
Search traffic arrives knowing what query produced the result. AI-cited traffic arrives already partway through a conversation, often having read a summary that paraphrased your page.
The buyer is further along but harder to influence with a generic landing page.
The implication for site preparation is concrete.
Pages have to load with the cited claim front and center, with the supporting evidence near it, in a structure a skim-reader can verify in 10 seconds. Pages built around slow reveals and scroll-driven storytelling lose AI-routed buyers immediately because the buyer came to confirm one specific thing.
Content Structure for Machine Reading
Most AI crawlers read the raw HTML a server returns and do not execute JavaScript.
Content that scripts inject after the initial response is therefore invisible to them. The first preparation step is a render audit.
Open the published page, view source, and confirm the main body content appears in the initial HTML. If the body is empty until scripts run, the AI sees an empty page. Server-side rendering or static generation fixes this for sites built on modern frameworks.
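The view-source check can be scripted for more than a handful of pages. The sketch below uses only Python's standard library to count the words present in raw HTML while ignoring script and style content. The function names and the 150-word threshold are illustrative assumptions, not established cutoffs.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Counts words in raw HTML, skipping the contents of script and style."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.word_count = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only text outside script/style counts as readable content.
        if self._skip_depth == 0:
            self.word_count += len(data.split())


def raw_html_word_count(html: str) -> int:
    parser = VisibleTextCounter()
    parser.feed(html)
    parser.close()
    return parser.word_count


def looks_client_rendered(html: str, min_words: int = 150) -> bool:
    # min_words is an assumed threshold; tune it to your shortest real page.
    return raw_html_word_count(html) < min_words
```

To run it against a live page, fetch the HTML first (for example with `urllib.request.urlopen(url).read().decode()`) and pass the string in. A page that comes back flagged is a candidate for server-side rendering or static generation.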
The second preparation step is heading hierarchy. One H1 per page, H2s for major sections, H3s for subsections, in source order matching visual order. AI summarizers use this hierarchy as the table of contents they cite from.
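A heading check can be automated in the same way. This minimal sketch collects heading levels in source order and reports two problems: a page without exactly one H1, and a jump that skips a level. It cannot tell whether source order matches visual order, which still needs a manual look. The function name is hypothetical.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Records heading levels (1-6) in the order they appear in the source."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html: str) -> list[str]:
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    levels = collector.levels

    problems = []
    h1_count = levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one h1, found {h1_count}")
    # Going deeper by more than one level (h1 -> h3) skips a section level.
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            problems.append(f"h{prev} is followed by h{cur}, skipping a level")
    return problems
```

An empty list means the hierarchy passes both checks.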
Structured Data and Schema Markup
Schema markup gives crawlers explicit facts about the page that text alone leaves implicit. Search Engine Journal’s analysis of structured data’s role in AI search visibility found pages with valid JSON-LD appear in AI-generated summaries 20% to 30% more often than unstructured pages.
A small business site needs four schema types at minimum: Organization schema on every page to identify the business, LocalBusiness schema on contact and location pages, Product or Service schema on the offering pages, and FAQPage schema on any page with a question-and-answer block. JSON-LD goes in the page head and is the format Google and most AI systems prefer.
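As an illustration, two of the four types can be generated from plain data rather than written by hand. The sketch below builds Organization and FAQPage JSON-LD with Python's json module and wraps the result in a script tag for the page head. The helper names and the example business are hypothetical, and only a few common properties are shown; real markup usually carries more.

```python
import json


def organization_jsonld(name: str, url: str, logo_url: str) -> str:
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo_url,
    }
    return json.dumps(data, indent=2)


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    # Each (question, answer) pair should match text visible on the page.
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)


def as_script_tag(jsonld: str) -> str:
    # "</" inside the JSON could close the script element early; "<\/" is
    # an equivalent JSON escape that avoids this.
    safe = jsonld.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{safe}\n</script>'


if __name__ == "__main__":
    print(as_script_tag(organization_jsonld(
        "Example Bakery", "https://example.com", "https://example.com/logo.png"
    )))
```

Generating the markup from the same data that renders the visible page is one way to keep the two from drifting apart.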
Schema accuracy matters as much as presence. AI systems compare structured data against visible page content and ignore (or penalize) pages where the two disagree.
Hosting Infrastructure for AI-Ready Sites
A site that AI models can crawl needs to load fast, return clean HTML, and stay reachable during traffic spikes. The site needs powerful WordPress hosting with PHP 8 or later, server-side caching, sufficient memory for the database, and a content delivery network in front of it.
Server response time below 200 ms gives crawlers a wide margin to read the full page. Above 600 ms, crawlers become more likely to time out before reading the whole page during peak crawl windows.
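A rough way to see where a site falls against these two thresholds is to time a few requests from outside the server. The sketch below is measured from the client, so the figure includes DNS, TLS, and network latency and will read higher than the server's own processing time. The function names and the "borderline" label for the 200 to 600 ms band are assumptions made for illustration.

```python
import statistics
import time
import urllib.request


def measure_response_ms(url: str, timeout: float = 10.0) -> float:
    """Milliseconds from request start until the first body byte arrives."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read(1)
    return (time.perf_counter() - start) * 1000.0


def median_response_ms(url: str, samples: int = 5) -> float:
    # The median damps a single slow outlier such as a cold cache.
    return statistics.median(measure_response_ms(url) for _ in range(samples))


def classify_response_ms(ms: float) -> str:
    if ms < 200:
        return "comfortable"
    if ms <= 600:
        return "borderline"  # assumed label for the band between thresholds
    return "at risk"
```

Calling `classify_response_ms(median_response_ms("https://example.com/"))` against your own URL gives a quick reading; running it at several times of day shows whether the number holds under load.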
Page Speed and Core Web Vitals
Page speed acts as a tiebreaker in Google ranking and a deprioritization signal in AI summaries when other factors are close.
The three Core Web Vitals metrics set the targets: Largest Contentful Paint (LCP) under 2.5 seconds, Interaction to Next Paint (INP) under 200 milliseconds, and Cumulative Layout Shift (CLS) under 0.1.
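These targets are easy to encode as a check against field or lab measurements. The sketch below takes one set of measurements and returns the failing metrics, worst first, where "worst" is judged by how far a value exceeds its target as a ratio. That ranking rule and the names used are assumptions for illustration. A value exactly at the target is treated as passing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VitalsSample:
    lcp_seconds: float
    inp_ms: float
    cls: float


# Targets from the text; a value exactly at the target counts as passing.
TARGETS = {"lcp_seconds": 2.5, "inp_ms": 200.0, "cls": 0.1}


def failing_metrics(sample: VitalsSample) -> list[str]:
    """Names of metrics over target, ordered by how far over they are."""
    ratios = {
        name: getattr(sample, name) / limit for name, limit in TARGETS.items()
    }
    failing = [name for name, ratio in ratios.items() if ratio > 1.0]
    return sorted(failing, key=lambda name: ratios[name], reverse=True)
```

For a page measuring 3.0 seconds LCP and 500 ms INP, the function lists INP first because it is 2.5 times its target while LCP is 1.2 times its target.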
Backlinko’s Core Web Vitals study of 208,000 pages found 46% of sites had a poor or needs-improvement LCP score, leaving a meaningful share of the web with room to improve. Real-world cases support the upside of fixing this. Refurbished-phone retailer Swappie reported a 42% mobile revenue lift after focused work on its Core Web Vitals.
Practical fixes are unglamorous. Convert hero images to WebP or AVIF, defer non-critical JavaScript, preload the largest content image, set explicit width and height on every image, and audit any third-party script for the load it adds.
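Two of these fixes can be found mechanically. The sketch below scans a page's HTML for img tags that lack an explicit width or height and for image sources still served as JPEG or PNG. It only inspects the src attribute, so images delivered through picture elements or srcset are outside its reach. The function name and report keys are hypothetical.

```python
from html.parser import HTMLParser


class ImageAuditor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.missing_dimensions = []
        self.legacy_formats = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        # Missing width or height lets the layout shift when the image loads.
        if not attributes.get("width") or not attributes.get("height"):
            self.missing_dimensions.append(src)
        # Strip any query string before checking the file extension.
        path = src.split("?", 1)[0].lower()
        if path.endswith((".jpg", ".jpeg", ".png")):
            self.legacy_formats.append(src)


def audit_images(html: str) -> dict[str, list[str]]:
    auditor = ImageAuditor()
    auditor.feed(html)
    auditor.close()
    return {
        "missing_dimensions": auditor.missing_dimensions,
        "legacy_formats": auditor.legacy_formats,
    }
```

Each list holds the src values that need attention, in the order they appear in the page.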
The llms.txt File and Other Emerging Standards
The llms.txt proposal, introduced by Jeremy Howard in September 2024, is a markdown file at the domain root that points language models to the most important content with one-line descriptions.
Major LLM crawlers from OpenAI, Google, and Anthropic do not yet fetch it in meaningful volume, and Ahrefs’ analysis of the llms.txt format concluded the spec is real but the adoption signal is not.
The honest current advice is to publish one if your site has a documentation hub or a long content library, since the cost is low and the upside is real if adoption catches up. Small marketing sites can wait.
The companion file llms-full.txt, developed with input from Anthropic, is the heavier version that includes complete content rather than links.
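For sites that do decide to publish one, the file is simple enough to generate from a list of pages. The sketch below follows the general shape described in the proposal: a title line, a short summary in a blockquote, then sections of links with one-line descriptions. The function name, the example site, and the URLs are hypothetical, and the proposal itself should be checked for the current details.

```python
def build_llms_txt(
    site_name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Builds llms.txt content from {section: [(title, url, description)]}."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, description in links:
            lines.append(f"- [{title}]({url}): {description}")
        lines.append("")
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip("\n") + "\n"


if __name__ == "__main__":
    print(build_llms_txt(
        "Example Co",
        "Docs for the Example Co widget API.",
        {"Docs": [(
            "Quickstart",
            "https://example.com/quickstart.md",
            "Install and first request",
        )]},
    ))
```

The output is written to a file named llms.txt at the domain root.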
Brand Entity Building Beyond Your Site
AI systems gain confidence in a brand when independent sources agree about it. Research summarized by Advanced Web Ranking on brand authority and search algorithms describes this as consensus building. Five external sources confirming the same facts about your business move it from “probably real” to “confidently cited.”
The off-site work that produces those signals is unglamorous and slow. Industry directory listings with consistent name-address-phone information. Profiles on review platforms (G2 is the most cited software review platform across ChatGPT, Perplexity, and Google AI Overviews). Wikipedia entries where notability supports them. Quoted contributions to industry publications.
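Name-address-phone consistency is the one item here that lends itself to a script. The sketch below normalizes case, punctuation, and phone formatting, then reports which listings disagree with the canonical record. It is deliberately simple: it does not reconcile abbreviations such as "St" and "Street" or phone country codes, and the listing data has to be collected by hand. All names are hypothetical.

```python
import re


def _clean(text: str) -> str:
    # Lowercase and collapse punctuation and whitespace to single spaces.
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def normalize_nap(name: str, address: str, phone: str) -> tuple[str, str, str]:
    digits = re.sub(r"\D", "", phone)  # keep phone digits only
    return (_clean(name), _clean(address), digits)


def inconsistent_listings(
    canonical: tuple[str, str, str],
    listings: dict[str, tuple[str, str, str]],
) -> list[str]:
    """Sources whose normalized name/address/phone differ from canonical."""
    expected = normalize_nap(*canonical)
    return [
        source
        for source, nap in listings.items()
        if normalize_nap(*nap) != expected
    ]
```

A listing that differs only in capitalization or punctuation passes, while one that adds "LLC" to the business name is flagged for correction.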
The site itself cannot do this work alone. The site provides the canonical entity description that the rest of the web verifies.
Implementation Order
A practical sequence saves time. First, fix render and hierarchy so AI crawlers can read the page at all.
Second, add the four core schema types and validate them.
Third, address the worst Core Web Vitals failure on mobile, then the second-worst. Fourth, audit hosting capacity against actual traffic plus a 3x safety factor. Fifth, decide on llms.txt based on content depth. Sixth, start the slow off-site work that builds entity consensus.
Done in that order, the technical floor is set within a week, the structural improvements within a month, and the entity work continues for as long as the business operates. The first pass should leave the site readable by machines and useful to humans, in that order, with each subsequent pass tightening both.
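The capacity audit in the fourth step is simple arithmetic. As a sketch, assuming traffic is measured in requests per second and that a load test has established what the host can sustain, the check looks like this. The function names and the example numbers are illustrative.

```python
def required_capacity_rps(peak_rps: float, safety_factor: float = 3.0) -> float:
    """Capacity the host should sustain: observed peak times the safety factor."""
    return peak_rps * safety_factor


def has_headroom(
    tested_rps: float, peak_rps: float, safety_factor: float = 3.0
) -> bool:
    # tested_rps is what a load test showed the host can sustain.
    return tested_rps >= required_capacity_rps(peak_rps, safety_factor)
```

A site peaking at 40 requests per second needs a host that sustains 120. A load test result of 150 passes and a result of 100 does not.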













