12 May 2026

Technical SEO in the AI Era

Schema markup, llms.txt, structured data, AI visibility: how to get read and cited by generative systems

When ChatGPT answers a question or Google displays an AI Overview, where does that information come from? In most cases, from websites like yours, but only if those websites are structured so that artificial intelligence systems can read them. Today, simply being online is no longer enough: you need to be understandable to machines and authoritative enough to be selected as a source.

The stakes are already real. Recent studies estimate that 37–38% of purchases are now influenced by an AI recommendation. Brands that fail to establish a presence in this space are giving up access to a growing share of users who prefer asking a chatbot a question rather than opening ten browser tabs.

How AI systems read a website 

Until recently, the only “automatic reader” companies needed to take into account was Googlebot, the crawler that scans and indexes web pages. The landscape is now much more crowded: new bots known as Large Language Model (LLM) crawlers, such as OpenAI’s GPTBot, Anthropic’s ClaudeBot, and PerplexityBot, visit websites to collect content that can be used in generated answers, alongside Bingbot, which also supplies content for Microsoft’s AI answers.

These systems read pages differently from Google: they often do not execute JavaScript and therefore may not see dynamically loaded content. They work best with linear, well-organized text that is immediately easy to understand. 
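The practical consequence is easy to check. The Python sketch below approximates what a crawler that does not run JavaScript keeps from a page: the text present in the raw HTML, minus anything inside script and style elements. The page is hypothetical, and real crawlers parse pages in their own, largely undocumented ways; this only illustrates why content injected by scripts can be invisible to them.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects the text a non-rendering crawler would find in raw HTML."""

    # Contents of these elements are code, not page text, and are never executed here
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text nodes that sit outside <script>/<style>
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: the free-trial text only exists after the script runs in a browser
PAGE = """<html><body>
<h1>Pricing</h1>
<p>Plans start at 10 EUR per month.</p>
<div id="trial"></div>
<script>
  document.getElementById("trial").textContent = "Free trial: 30 days";
</script>
</body></html>"""

print(visible_text(PAGE))
```

The 30-day free trial never appears in the output, so a bot that reads only the raw HTML would not know it exists. Putting key facts in the server-rendered HTML avoids the problem.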

There is another aspect that few people consider. When a user asks ChatGPT or Perplexity a question, the system does not simply run a single web search and return the first result. It launches a series of parallel searches that approach the same topic from different angles, collects the websites that appear most often across those queries, downloads the most useful content, and uses it as a temporary knowledge base on which the entire conversation is built. This approach is known as RAG, or Retrieval-Augmented Generation.
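The selection step can be illustrated with a toy model. The Python sketch below assumes you already have the result list for each sub-query (the queries and URLs are hypothetical) and ranks URLs by how many sub-queries they appear in. It is an illustrative heuristic that mirrors the description above, not the actual ranking logic of ChatGPT, Perplexity, or any other system, which is not public.

```python
from collections import Counter


def select_sources(results_by_query: dict[str, list[str]], k: int = 3) -> list[str]:
    """Rank URLs by the number of sub-queries they appear in (illustrative heuristic)."""
    counts = Counter()
    for urls in results_by_query.values():
        # set(): a URL counts once per sub-query, however often it is listed
        counts.update(set(urls))
    # Most frequent first; ties broken alphabetically so the output is deterministic
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [url for url, _ in ranked[:k]]


# Hypothetical fan-out for the prompt "Which CRM should a small business choose?"
results = {
    "best crm for small business": ["a.com/crm-guide", "b.com/top10", "c.com/review"],
    "crm pricing comparison": ["a.com/crm-guide", "d.com/pricing", "b.com/top10"],
    "crm for startups pros and cons": ["a.com/crm-guide", "e.com/startups", "c.com/review"],
}
print(select_sources(results, k=2))
```

A page that surfaces for only one angle of the topic, such as d.com/pricing here, scores low even if it ranks first for that single query, which is one way to picture the gap described in the next paragraph.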

This selection process explains a phenomenon many are discovering with surprise: websites that rank well on Google but never appear in AI-generated answers. This happens when content, although optimized for ranking, is too brief or built around mechanics that reward technical form more than informational quality. AI ignores it because it cannot support a conversation across multiple questions. 

While Google indexes pages to display them in search results, AI systems work through ingestion: they collect content from which they extract information to generate answers. In this new paradigm, visibility depends on the ability to be cited, not only on SERP position. 

Schema markup: the language AI understands 

Schema markup is code added to web pages that labels content, making it precisely readable by machines. Thanks to these labels, an AI system can immediately understand whether it is reading an article, a question with its answer, a practical guide, or an author profile. It is a bit like adding captions to an image: the content does not change, but the reader immediately understands what it is about. 

Among the most useful formats are Article, FAQPage, HowTo, and Person, all of which help make content easier to interpret and strengthen its credibility in the eyes of automated systems. 
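A common way to add schema markup is JSON-LD: a block of JSON placed inside a script tag in the page’s HTML. As a minimal sketch, the Python code below generates Article markup whose author is a Person, using schema.org types named above. The author name, job title, and profile URL are hypothetical placeholders, and which properties a given search engine or AI system actually reads is up to that system.

```python
import json


def article_jsonld(headline: str, published: str, modified: str,
                   author_name: str, author_job: str, author_profiles: list[str]) -> dict:
    """Build schema.org Article markup with a Person as author."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": published,  # ISO 8601 dates
        "dateModified": modified,    # signals that the content is kept up to date
        "author": {
            "@type": "Person",
            "name": author_name,
            "jobTitle": author_job,
            "sameAs": author_profiles,  # external profiles that identify the author
        },
    }


def as_script_tag(data: dict) -> str:
    """Wrap the markup in the JSON-LD script tag that goes into the page's HTML."""
    return ('<script type="application/ld+json">\n'
            + json.dumps(data, indent=2, ensure_ascii=False)
            + "\n</script>")


markup = article_jsonld(
    headline="Technical SEO in the AI Era",
    published="2026-05-12",
    modified="2026-05-12",
    author_name="Jane Doe",  # hypothetical author
    author_job="SEO Specialist",
    author_profiles=["https://www.linkedin.com/in/jane-doe-example"],  # hypothetical URL
)
print(as_script_tag(markup))
```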

A concrete example: a page with FAQPage markup can be more likely to appear in AI Overviews, because the system immediately recognizes the question-and-answer structure and already knows how to use it.
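FAQPage markup follows the same JSON-LD pattern: each question becomes a Question with an acceptedAnswer of type Answer. The Python sketch below builds it from two of the questions in the FAQ at the end of this article. The output would be embedded in a script tag of type application/ld+json; the markup makes the question-and-answer structure explicit, but it does not by itself guarantee inclusion in AI Overviews.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """Build schema.org FAQPage markup from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


faq = faq_jsonld([
    ("What is an LLM crawler?",
     "It is a bot used by AI systems to collect content from websites "
     "and integrate it into generated answers."),
    ("What is llms.txt?",
     "It is a file that provides AI systems with guidance on how to read "
     "and interpret a website's content."),
])
print(json.dumps(faq, indent=2))
```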

llms.txt and robots.txt: managing bots in the AI era 

The robots.txt file is a long-standing technical SEO tool: it tells crawlers which sections of a website they are allowed to visit. With the rise of AI bots, it has taken on a more strategic role: it allows companies to decide whether to open or restrict access to generative systems. Allowing access increases the chances of being cited; blocking access protects content but cuts it off from AI visibility. 
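Before changing robots.txt, it helps to test how a policy applies to each AI bot. The Python sketch below uses the standard library’s urllib.robotparser against a hypothetical robots.txt that blocks GPTBot entirely, keeps ClaudeBot out of /private/, and allows every other crawler. The user-agent tokens are the ones named earlier in this article; each provider’s documentation should be checked for the exact token before relying on them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block GPTBot, keep ClaudeBot out of /private/, allow everyone else
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /private/

User-agent: *
Allow: /
"""


def access_report(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, for each user agent, whether the policy lets it fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


bots = ["GPTBot", "ClaudeBot", "PerplexityBot"]
print(access_report(ROBOTS_TXT, bots, "https://www.example.com/blog/post"))
print(access_report(ROBOTS_TXT, bots, "https://www.example.com/private/report"))
```

Python’s parser applies the first matching rule in file order and matches user agents by substring, which can differ in detail from how individual crawlers interpret the file, so the result is a quick check rather than a guarantee. Compliance with robots.txt is also voluntary on the crawler’s side.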

Another emerging file is llms.txt, designed specifically to guide AI systems as they read website content. Unlike robots.txt, it does not block anything: it is a Markdown file placed at the root of the site that provides context and instructions on how to interpret the website. However, it is still a proposal rather than an established standard, and it continues to evolve.
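The public llms.txt proposal describes a simple layout: an H1 with the site or project name, a short blockquote summary, then H2 sections listing annotated links; a section titled “Optional” marks links that can be skipped when a shorter context is needed. The Python sketch below renders a file in that layout. The company, summary, and URLs are hypothetical, and because the format is still evolving, the current version of the proposal should be checked before publishing.

```python
def build_llms_txt(site: str, summary: str,
                   sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render llms.txt: H1 title, blockquote summary, H2 sections of annotated links."""
    lines = [f"# {site}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    # Exactly one trailing newline at the end of the file
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical company and URLs
llms_txt = build_llms_txt(
    site="Example Co",
    summary="Example Co sells CRM software for small businesses in Europe.",
    sections={
        "Products": [
            ("Pricing", "https://www.example.com/pricing", "current plans and prices"),
            ("Features", "https://www.example.com/features", "what the CRM does"),
        ],
        "Optional": [
            ("Blog", "https://www.example.com/blog", "guides and case studies"),
        ],
    },
)
print(llms_txt)
```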

AI-ready content structure: think in topics, not keywords 

AI systems look for information that is clear and immediately usable. For this reason, content structure becomes decisive: hierarchical headings (H1, H2, H3), short paragraphs, and answers that get straight to the point without unnecessary introductions. FAQ sections work well for this very reason: they make content easier to read for anyone looking for quick answers, whether human or machine. 
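A heading outline is easy to audit automatically. The Python sketch below collects the h1 to h6 tags of a page and checks two assumed house rules: the page has exactly one h1, and headings do not skip a level on the way down (for example, an h2 followed directly by an h4). These rules are a common editorial convention rather than a documented requirement of any AI system, and the sample page is hypothetical.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Records the level of every h1-h6 tag in document order."""

    def __init__(self):
        super().__init__()
        self.levels: list[int] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so "H2" arrives as "h2"
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_issues(html: str) -> list[str]:
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    levels = collector.levels
    issues = []
    if levels.count(1) != 1:
        issues.append(f"expected exactly one h1, found {levels.count(1)}")
    for prev, curr in zip(levels, levels[1:]):
        # Going deeper by more than one level skips part of the hierarchy
        if curr > prev + 1:
            issues.append(f"h{prev} is followed by h{curr} (skipped level)")
    return issues


# Hypothetical page outline
draft = ("<h1>Technical SEO</h1><h2>Schema markup</h2>"
         "<h4>FAQPage</h4><h2>llms.txt</h2>")
print(heading_issues(draft))
```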

What changes compared with traditional SEO is topical depth. AI is not looking for the page that answers a single question: it is looking for content that can sustain an entire conversation on that topic. A page built around a single keyword, however well optimized, is less competitive than content that addresses the topic comprehensively, including related questions, comparisons, and use cases. 

This is where what experts call Prompt Research comes in. Instead of asking, “Which keyword should I rank for?” the question becomes: “If a user asks this question to AI, what would the system search on Google to build its answer?” Content is then designed to cover that entire space, not just the core query. 
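Part of Prompt Research can be checked with a crude coverage test. The Python sketch below assumes you already have a list of sub-queries for a prompt (the ones shown are hypothetical) and compares their terms with the section headings of a page, flagging sub-queries that no single section covers. Plain word overlap is only a rough proxy for topical coverage: it misses synonyms and paraphrases, and both the stopword list and the 0.5 threshold are arbitrary assumptions.

```python
import re

# Small, illustrative stopword list (an assumption; extend it for real use)
STOPWORDS = {"a", "an", "the", "for", "of", "to", "in", "and", "or",
             "vs", "is", "how", "what", "with"}


def terms(text: str) -> set[str]:
    """Lowercased content words of a string."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def coverage_gaps(sub_queries: list[str], sections: list[str],
                  threshold: float = 0.5) -> list[str]:
    """Sub-queries whose terms no single section covers to at least `threshold`."""
    section_terms = [terms(s) for s in sections]
    gaps = []
    for query in sub_queries:
        q = terms(query)
        if not q:
            continue  # nothing left to match after removing stopwords
        best = max((len(q & s) / len(q) for s in section_terms), default=0.0)
        if best < threshold:
            gaps.append(query)
    return gaps


sub_queries = [  # hypothetical sub-queries for "Which CRM should a small business choose?"
    "best crm for small business",
    "crm pricing comparison",
    "crm migration from spreadsheets",
]
sections = [  # H2 headings of a hypothetical page
    "Choosing the best CRM for a small business",
    "CRM pricing: plans compared side by side",
    "Integrations with email and calendar",
]
print(coverage_gaps(sub_queries, sections))
```

In this example the migration question is flagged, which suggests a section the page is missing rather than a keyword to add.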

Another rule also applies: AI tends to read the first lines of each paragraph. If the answer is buried at the end of a block of text, it may simply not be extracted. Writing clearly and directly is not a concession to machines: it is quality journalism, and it works for everyone. 
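One way to apply this rule while editing is to read only the lead sentence of each paragraph and see whether the key facts survive. The Python sketch below does that with a naive sentence split on ., ! and ?; the draft text and key terms are hypothetical, and the check is an editorial aid, not a model of how any AI system actually extracts text.

```python
import re


def lead_sentences(text: str) -> list[str]:
    """First sentence of each paragraph (paragraphs are separated by blank lines)."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Naive split: a sentence ends at . ! or ? followed by whitespace
    return [re.split(r"(?<=[.!?])\s+", p, maxsplit=1)[0] for p in paragraphs]


def answers_up_front(text: str, key_terms: list[str]) -> dict[str, bool]:
    """For each key term, whether it appears in some paragraph's lead sentence."""
    leads = " ".join(lead_sentences(text)).lower()
    return {term: term.lower() in leads for term in key_terms}


# Hypothetical draft: the price is buried at the end of the first paragraph
draft = """Many teams ask about pricing. After years of experience and many client projects, we found that plans start at 10 EUR per month.

The free trial lasts 30 days. No credit card is required."""

print(lead_sentences(draft))
print(answers_up_front(draft, ["10 EUR", "30 days"]))
```

The check flags the price because it only appears at the end of its paragraph; moving it into the opening sentence fixes that.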

E-E-A-T in the generative era: authority cannot be improvised 

Experience, Expertise, Authoritativeness, and Trustworthiness: the framework Google uses to evaluate content quality has also become relevant for AI systems. Content is more likely to be cited if it is written by a recognizable author, supported by verifiable data and sources, and updated over time. 

Generative systems skip generic content: they can produce similar material on their own, so they have no reason to cite it. They favor sources that add something more: an original point of view, first-party data, or direct experience with the topic. 

However, there is another variable that is often overlooked: what AI already knows about you before it even searches the web. Language models are trained on large bodies of text up to a certain date—the so-called knowledge cutoff—after which information is not automatically updated. If you have changed your services, pricing, positioning, or name in the meantime, the model may still return outdated information or, worse, fill in the gaps with invented data. 

Auditing this internal knowledge, that is, checking what an AI knows about your brand without letting it consult the web, has become the starting point of any serious Generative Engine Optimization (GEO) strategy. If gaps or inaccuracies emerge, action is needed on two fronts: on-site, by making information on the website clearer and more complete; and off-site, through digital PR activities designed to build a consistent ecosystem of sources around the brand.

AI Visibility: the missing KPI 

If traditional SEO is measured through ranking, GEO—Generative Engine Optimization—introduces a different indicator: share of model, meaning how often your content is chosen as a source by a generative engine compared with your competitors’ content. 
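There is no single standard formula for share of model. As a minimal sketch, assume you have run a fixed set of prompts through an AI assistant and recorded which domains each answer cited; the Python code below then computes each tracked domain’s share of all citations that go to the tracked set. The domains and answers are hypothetical, and other definitions, such as the share of answers that mention a brand at all, are equally reasonable.

```python
from collections import Counter


def share_of_model(answers: list[list[str]], brands: list[str]) -> dict[str, float]:
    """Share of tracked citations each brand receives across sampled AI answers."""
    counts = Counter()
    for cited in answers:
        # Count each tracked brand at most once per answer; ignore untracked sources
        counts.update(b for b in set(cited) if b in brands)
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in brands}


# Hypothetical log: domains cited in four answers to the same set of prompts
sampled = [
    ["yourbrand.com", "rival-a.com"],
    ["rival-a.com", "rival-b.com"],
    ["yourbrand.com", "rival-a.com", "wikipedia.org"],
    ["rival-b.com"],
]
brands = ["yourbrand.com", "rival-a.com", "rival-b.com"]
print(share_of_model(sampled, brands))
```

In this sample, rival-a.com receives 3 of the 7 tracked citations (about 43%), against 2 of 7 (about 29%) for yourbrand.com. Tracking the same prompts over time matters more than any single snapshot.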

The question is no longer, “Where do I rank on Google?” but rather, “Who is talking about me when a user asks ChatGPT or Gemini to compare solutions in my industry?” If your content is not selected as a source in that conversation, you are absent at the very moment when the user is already making a decision.

Monitoring AI visibility also helps identify which sources are influencing answers in your place. If an outdated article on a third-party website is giving ChatGPT incorrect information about your brand, you can contact the owner and request a correction. This type of work looks more like public relations than classic SEO, but it has become part of the job. 

Beyond the website: YouTube, social media, and mentions 

Today’s AI systems do not read only web pages. Many are multimodal, and they also draw on video transcripts, forum threads, and Reddit discussions. YouTube, in particular, is becoming an increasingly cited source in AI-generated answers, because AI systems can extract information from subtitles and transcripts and use it to build answers.

This opens up a path that goes beyond classic SEO. Producing high-quality video content, participating in conversations on industry forums, and maintaining a consistent social media presence are no longer just branding activities: they are ways to increase the surfaces through which AI can find you and use you as a source. 

A new practice is also emerging on the external source front: earning mentions without links. For years, SEO has thought in terms of backlinks. Today, to influence how AI talks about you, it can be just as useful for authoritative sources to mention your brand or expertise naturally within a text, even without linking to your website. AI reads words, not just links. 

The website in the zero-click era: fewer visits, higher quality 

There is a shift that companies still struggle to accept: the website is gradually becoming less of a place where people search for information. They already get that information from AI. They arrive on the website later, when their choice has already largely taken shape. 

The role of the website is changing, but it is not becoming empty. It becomes the place where AI has learned to speak in your voice, and where the user arrives to confirm what they already know. Informational traffic decreases, while transactional traffic remains and, in some cases, grows. Fewer visits, but from users who already know what they want. 

For an e-commerce business or service provider, this can be a clear advantage: if AI learns to recommend your product in the right conversations, you reach an audience that is already oriented toward buying. Artificial intelligence is becoming the point of reference that influencers or search engines once were. With one difference: you do not pay for it—you earn it. 

From Technical SEO to Agentic AI as a Service 

At Mashfrog, this shift is not approached as an evolution of SEO alone or of content production, but as a transformation of the entire digital value chain. We work with multidisciplinary teams of specialists—from data architecture to UX, through to AI engineering—capable of understanding, interacting with, and performing within machine-readable systems. 

Mashfrog is transforming into an agentic-AI-as-a-service organization: a partner that goes beyond optimization for search engines or models, and instead builds environments where AI agents can operate, learn, and effectively interact with data, content, and services. Technical SEO therefore becomes one layer within a broader system, where structure, semantics, accessibility, and information governance work together to make brands truly “present” and able to communicate with AI ecosystems.

Value is no longer defined by individual assets, but by the ability to orchestrate skills and technologies to shape how artificial intelligence systems read, understand, and support business outcomes. Those who can govern this layer are not just visible—they become an active part of the decision-making logic that AI systems apply every day. Those who delay the Technical SEO shift, on the other hand, risk disappearing not only from Google, but from the conversations that truly matter. 

FAQ – User-friendly questions and answers 

What is an LLM crawler? 
It is a bot used by AI systems to collect content from websites and integrate it into generated answers. 

What is schema markup used for? 
It describes the content of a page in a structured way, so that search engines and AI systems can immediately understand what it is about. 

What is llms.txt? 
It is a file that provides AI systems with guidance on how to read and interpret a website’s content. 

Should I block AI bots in robots.txt? 
It depends on the strategy. Blocking them protects content but reduces the chances of being cited by generative systems. 

How can I make content AI-ready? 
With a clear structure, hierarchical headings, direct answers from the very first lines, and comprehensive coverage of the topic—not just the main keyword. 

What is share of model? 
It is the main KPI in GEO. Share of model measures how often your content is chosen as a source by generative engines compared with competitors. 

Does good Google ranking guarantee AI visibility? 
No, not automatically. A website can rank on the first page of Google and still be ignored by AI if its content is not comprehensive enough or is built only for ranking. 

What is Prompt Research? 
It is the evolution of keyword research: it analyzes the sub-queries an AI system would run on Google to answer a given prompt, then builds content to cover that full set of questions—not just the main one.