AI Visibility Fundamentals

llms.txt: What It Is, What It Does, When You Actually Need One

Published Apr 17, 2026 Updated Apr 17, 2026

The Short Answer

llms.txt is a proposed standard for a Markdown file placed at the root of your domain. It tells large language models which pages on your site contain the most important information and what order to read them in. Jeremy Howard (Answer.AI, fast.ai) proposed it in 2024 to address one specific technical limit: LLM context windows are too small to read entire websites, so sites should offer a curated reading list instead. The standard is gaining traction among developer-documentation sites and AI-native companies. Most businesses don’t need one yet.

The Problem It Solves

When an AI engine needs to answer a question about your company, it can’t just read your whole website. Context windows cap out in the hundreds of thousands of tokens, which is a few hundred pages at most. If your site has thousands of pages, the engine grabs whatever pages its crawler indexed and hopes the right information is somewhere in that slice.
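The budget math is easy to make concrete. A rough sketch, where the words-per-page and tokens-per-word figures are illustrative assumptions, not numbers from the proposal:

```python
# Rough context-budget arithmetic (all figures are illustrative assumptions).
WORDS_PER_PAGE = 600      # assumed average length of a typical web page
TOKENS_PER_WORD = 1.3     # assumed rough English tokenization ratio
CONTEXT_WINDOW = 200_000  # assumed context window size, in tokens

tokens_per_page = WORDS_PER_PAGE * TOKENS_PER_WORD
pages_that_fit = int(CONTEXT_WINDOW // tokens_per_page)

print(f"~{pages_that_fit} pages fit in a {CONTEXT_WINDOW:,}-token window")
```

Under these assumptions a couple hundred pages fill the window, which is why a thousand-page site can never be read whole.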

llms.txt is a hint. It’s a Markdown file you write that says “here are the ten pages on this site that matter most for understanding what we do.” The format is human-readable (you can write it in a text editor in ten minutes) and machine-readable (LLMs can parse it directly). It coexists with robots.txt and sitemap.xml but serves a different purpose: robots.txt says what crawlers may access, sitemap.xml lists every URL, and llms.txt curates the subset that matters most for inference-time reasoning.
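Because the file is plain Markdown, consuming it takes very little code. A minimal parser sketch, assuming the H1/H2/link-list shape described in the next section (the URLs and names here are placeholders):

```python
import re

def parse_llms_txt(text: str) -> dict:
    """Minimal llms.txt parser sketch: H1 title, H2 sections of Markdown links."""
    title_match = re.search(r"^# (.+)$", text, re.MULTILINE)
    sections: dict[str, list[tuple[str, str]]] = {}
    current = None
    for line in text.splitlines():
        heading = re.match(r"^## (.+)$", line)
        if heading:
            current = heading.group(1)
            sections[current] = []
            continue
        # "- [title](url): description" — the description is optional.
        link = re.match(r"^- \[(.+?)\]\((.+?)\)(?:: (.*))?$", line)
        if link and current is not None:
            url, desc = link.group(2), link.group(3) or link.group(1)
            sections[current].append((url, desc))
    return {"title": title_match.group(1) if title_match else None,
            "sections": sections}
```

Feeding it a file like the example below yields the site title plus a map from each H2 heading to its (URL, description) pairs.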

What the File Looks Like

The format is Markdown. An H1 heading with the site or product name. An optional blockquote summary. One or more H2 sections listing URLs with short descriptions.

# Example Company

> Example Company builds AI visibility audit tools for small businesses.

## Getting Started
- [What we do](https://example.com/about): Company overview
- [How the product works](https://example.com/how-it-works): Product tour

## Documentation
- [Pricing](https://example.com/pricing): Plans and pricing
- [API docs](https://example.com/api): Developer reference

## Deep dives
- [Case study: 38 to 79 score](https://example.com/case-studies/aurea-regina): Real-world outcome

That’s the whole spec for the basic version. A more extensive variant called llms-full.txt inlines the actual page content into a single fetchable file, for cases where you want AI engines to have everything at once.
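Assembling an llms-full.txt is mostly concatenation. A sketch under stated assumptions: the page list and its Markdown bodies are placeholders, and a real build would first fetch each URL and convert the HTML to Markdown, which this sketch skips.

```python
# Sketch: assemble an llms-full.txt by inlining page content under H2 headings.
# Titles and bodies below are placeholders, not a real company's content.

def build_llms_full(site_name: str, summary: str,
                    pages: list[tuple[str, str]]) -> str:
    parts = [f"# {site_name}", "", f"> {summary}", ""]
    for title, markdown_body in pages:
        parts += [f"## {title}", "", markdown_body.strip(), ""]
    return "\n".join(parts)

doc = build_llms_full(
    "Example Company",
    "Example Company builds AI visibility audit tools for small businesses.",
    [("Pricing", "Three plans: Starter, Growth, Agency."),
     ("API docs", "Authenticate with a bearer token, then call the audit endpoint.")],
)
```

The output is one fetchable Markdown document, so an engine gets the whole curated corpus in a single request instead of crawling page by page.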

Who Actually Needs One

Three groups benefit most, in rough order of signal strength.

Software companies with documentation. If you publish developer docs (SDKs, APIs, CLIs, libraries), llms.txt lets you point AI coding assistants straight at the canonical reference. FastHTML and most Answer.AI projects adopted this pattern early. Anthropic, Stripe, and a growing list of infrastructure companies publish llms.txt files as of 2026.

Content-heavy publishers. Sites with hundreds or thousands of articles benefit from curating a top-twenty list of pillar pieces that represent the site’s core authority. AI engines that would otherwise grab random pages can now start with the intended ones.

Complex product sites. If your business offering isn’t obvious from the home page alone (niche B2B, regulated industries, multi-product companies), llms.txt gives AI engines a guided tour instead of relying on surface-level crawling.

If you’re a local service business with a ten-page site, skip it. The file assumes you have enough content that curation helps. For small sites, solid Schema.org markup and a clean Organization or LocalBusiness entity do more.
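For that small-site case, the Schema.org entity is a short JSON-LD object. A minimal sketch that emits one; every field value below is a placeholder, not real business data:

```python
import json

# Minimal LocalBusiness JSON-LD sketch; all field values are placeholders.
entity = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Plumbing Co.",
    "url": "https://example.com",
    "telephone": "+1-555-0100",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Main St",
        "addressLocality": "Springfield",
        "addressRegion": "IL",
        "postalCode": "62701",
    },
}

# The output goes inside a <script type="application/ld+json"> tag in the page head.
print(json.dumps(entity, indent=2))
```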

How to Generate One

The fastest path is a generator. A few tools automate the work by crawling the site and proposing a structure.

Cairrot includes an llms.txt generator as part of its AEO tracking platform. A few open-source generators exist (llmstxt-hub, llms-txt-cli) that output a first draft you can edit. For small sites, writing it by hand in a text editor takes fifteen minutes, and you’ll probably end up editing whatever a generator produces anyway.

The key editorial decision is which pages make the cut. Aim for fifteen to thirty pages maximum. Include the pages that explain what you do, who you do it for, what it costs, and the proof points: case studies, docs, key research. Leave out blog archives, category pages, and anything transient.
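Once the list is curated, rendering the file is mechanical. A sketch of that last step, not the API of any particular generator tool; the section names and URLs mirror the example format earlier in this article:

```python
def render_llms_txt(name: str, summary: str,
                    sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render a curated page list into llms.txt Markdown.

    sections maps an H2 heading to (title, url, description) tuples.
    Choosing which pages go in the dict is the editorial work this
    sketch cannot do for you.
    """
    lines = [f"# {name}", "", f"> {summary}"]
    for heading, links in sections.items():
        lines += ["", f"## {heading}"]
        lines += [f"- [{title}]({url}): {desc}" for title, url, desc in links]
    return "\n".join(lines) + "\n"

text = render_llms_txt(
    "Example Company",
    "Example Company builds AI visibility audit tools for small businesses.",
    {"Getting Started": [
        ("What we do", "https://example.com/about", "Company overview"),
    ]},
)
```

Write the result to a file named llms.txt at the domain root and it matches the format shown above.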

Does It Actually Move the Needle

The honest answer depends on who’s reading. OpenAI, Anthropic, and a handful of other inference providers have acknowledged reading llms.txt files in various contexts. Google hasn’t confirmed that Gemini or AI Overviews respect the standard. The signal is real but not universal in 2026.

The cost of publishing an llms.txt file is low (an hour of work), and the downside is zero; the file does no harm. The upside is a cleaner signal for the AI engines that do read it. That risk-reward calculation pencils out for most businesses with more than a handful of pages.

What This Means in Practice

If you have documentation, publish one. If you have a content library, publish one. If you have a simple marketing site, skip it and spend the time on Schema.org markup and corroboration instead.