Why Every AIKit Blog Post Appears in Both Google and LLM Crawlers: EmDash's Dual-Discovery Architecture

Each of AIKit's 634 published blog posts is available to Google's crawler **and** to LLM crawlers as soon as it is published, with no duplication, extra build steps, or separate content management workflows. The secret isn't a clever SEO plugin or a separate AI pipeline. It's EmDash's dual-discovery architecture: a single D1-backed content engine that serves both traditional sitemaps and the emerging llms.txt convention from the same table, queried at request time.

The Problem

Content teams today face an impossible choice. Traditional SEO demands structured metadata, sitemaps, canonical URLs, semantic HTML, and Google-friendly rendering. Meanwhile, the AI-discoverability world (LLM crawlers from OpenAI, Anthropic, Google, Meta) expects clean markdown, token-efficient excerpts, and well-structured plain-text feeds.

Most teams build two separate content pipelines:

- **Pipeline A**: CMS → static site generator → sitemap.xml → Google Search Console.

- **Pipeline B**: A separate API or webhook that feeds content to AI training crawlers — maybe an llms.txt generated at build time, maybe a manual export.

This dual-pipeline approach creates real pain:

| Concern | Separate Pipelines | Unified (EmDash) |
|---|---|---|
| Content freshness | Stale until next deploy | Real-time at every request |
| Maintenance burden | Two codebases to update | One database, one query |
| Format divergence | HTML vs Markdown drift | Single source of truth |
| Scaling cost | N parallel stacks | One D1 instance |

When your pipelines diverge, content drifts. The sitemap has posts the llms.txt doesn't — or vice versa. Excerpts differ. Metadata gets out of sync.

The Solution

EmDash's dual-discovery architecture solves this by treating every piece of content as a single record in Cloudflare D1 — and **rendering it on demand** into whatever format the requesting agent needs.

Google crawlers and LLM crawlers both want the same underlying data (title, body, excerpt, publish date, tags). They just want it formatted differently. Google wants HTML + XML. LLMs want markdown + plain text. Why maintain two copies when you can have one database and two server-rendered views?

Every blog post lives in the `ec_posts` table in D1 — a single row per post. When Google crawls `/sitemap.xml`, EmDash queries D1 and renders XML on the fly. When an LLM crawler hits `/llms.txt`, EmDash queries the same table and renders markdown. Same data. Same freshness. No build step required.
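The one-record, two-views idea can be sketched in a few lines. The `PostRecord` shape mirrors a subset of the columns named in this post, while `toSitemapEntry`, `toLlmsEntry`, and `BASE_URL` are hypothetical names chosen for illustration, not EmDash's actual API.

```typescript
// Hypothetical row shape, mirroring a subset of the ec_posts columns.
interface PostRecord {
  slug: string;
  title: string;
  excerpt: string;
  pub_date: string; // ISO 8601
  updated_at: string; // ISO 8601
}

// Assumed site origin, taken from the URLs used elsewhere in this post.
const BASE_URL = "https://ai-kit.net";

function postUrl(post: PostRecord): string {
  return `${BASE_URL}/posts/${post.slug}/`;
}

// View 1: an XML <url> element for search-engine crawlers.
function toSitemapEntry(post: PostRecord): string {
  return `<url><loc>${postUrl(post)}</loc><lastmod>${post.updated_at}</lastmod></url>`;
}

// View 2: a markdown entry for LLM-oriented readers.
function toLlmsEntry(post: PostRecord): string {
  return `## [${post.title}](${postUrl(post)})\n- Published: ${post.pub_date}\n\n${post.excerpt}\n`;
}
```

Both functions read the same record and differ only in formatting, which is why the two outputs cannot disagree about which posts exist.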

Architecture

EmDash runs on Astro with the Cloudflare adapter, deployed as a Cloudflare Worker:

```
┌──────────────────────────────────────────────────────────┐
│  Cloudflare Edge                                         │
│                                                          │
│  Astro SSR (Worker)                                      │
│  ┌────────────────────────────────────────────────────┐  │
│  │  Dynamic Routes:                                   │  │
│  │  ├── /sitemap.xml   → sitemap.xml.ts               │  │
│  │  ├── /llms.txt      → llms.txt.ts                  │  │
│  │  ├── /llms-full.txt → llms-full.txt.ts             │  │
│  │  ├── /posts/[slug]  → [...slug].astro              │  │
│  │  └── /api/posts     → posts.json (D1 query)        │  │
│  │                                                    │  │
│  │  D1 Binding: context.locals.runtime.env.DB         │  │
│  └────────────────────────────────────────────────────┘  │
│                            │                             │
│                            ▼                             │
│  ┌────────────────────────────────────────────────────┐  │
│  │  Cloudflare D1 (SQLite at the edge)                │  │
│  │  ┌──────────────────────────────────────────────┐  │  │
│  │  │  ec_posts                                    │  │  │
│  │  │  ├── id         (INTEGER PRIMARY KEY)        │  │  │
│  │  │  ├── slug       (TEXT UNIQUE)                │  │  │
│  │  │  ├── title      (TEXT)                       │  │  │
│  │  │  ├── body       (TEXT, full markdown)        │  │  │
│  │  │  ├── excerpt    (TEXT)                       │  │  │
│  │  │  ├── category   (TEXT)                       │  │  │
│  │  │  ├── tags       (TEXT, JSON array)           │  │  │
│  │  │  ├── published  (INTEGER)                    │  │  │
│  │  │  ├── pub_date   (TEXT, ISO 8601)             │  │  │
│  │  │  └── updated_at (TEXT, ISO 8601)             │  │  │
│  │  └──────────────────────────────────────────────┘  │  │
│  └────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────┘
```

The critical detail: there is **no build step**. The sitemap, llms.txt, llms-full.txt, and every blog post page are all dynamic Astro server routes querying D1 on every request. When content is added or updated via the EmDash Content API, changes are visible to every crawler on the very next request.
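One way to keep every route reading from the same source is a small shared query helper. The sketch below is an assumption for illustration: `D1Like` models only the `prepare(...).all()` call shape used in this post rather than the full Cloudflare D1 type, and `listPublishedPosts` and `PublishedPost` are hypothetical names.

```typescript
// Minimal structural type covering only the prepare(...).all() calls used here.
// It is an assumption for illustration, not the full Cloudflare D1 type.
interface D1Like {
  prepare(sql: string): { all(): Promise<{ results: Record<string, unknown>[] }> };
}

interface PublishedPost {
  slug: string;
  title: string;
  excerpt: string;
  pub_date: string;
  updated_at: string;
}

// Hypothetical shared helper: each route calls this instead of writing its own SQL.
async function listPublishedPosts(db: D1Like): Promise<PublishedPost[]> {
  const { results } = await db
    .prepare(
      "SELECT slug, title, excerpt, pub_date, updated_at " +
        "FROM ec_posts WHERE published = true ORDER BY pub_date DESC"
    )
    .all();
  // Coerce each column to a string so the renderers receive a predictable shape.
  return results.map((row) => ({
    slug: String(row.slug),
    title: String(row.title),
    excerpt: String(row.excerpt ?? ""),
    pub_date: String(row.pub_date),
    updated_at: String(row.updated_at),
  }));
}
```

Because the query runs inside the request handler, a row inserted a moment earlier is part of the very next response. A route would call `listPublishedPosts(context.locals.runtime.env.DB)` and differ from its siblings only in how it formats the rows.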

Implementation

The llms.txt route is surprisingly simple:

llms.txt.ts

```typescript
// src/pages/llms.txt.ts
import type { APIRoute } from "astro";

export const GET: APIRoute = async (context) => {
  // D1 binding exposed by the Cloudflare adapter.
  const DB = context.locals.runtime.env.DB;
  const { results } = await DB.prepare(
    "SELECT slug, title, excerpt, pub_date, category, tags " +
      "FROM ec_posts WHERE published = true ORDER BY pub_date DESC"
  ).all();

  // File header: site title plus a one-line summary.
  let output = "# AIKit Blog\n\n";
  output += "> A headless CMS built for the edge.\n\n";

  // One markdown section per published post, newest first.
  for (const post of results) {
    const url = `https://ai-kit.net/posts/${post.slug}/`;
    output += `## [${post.title}](${url})\n`;
    output += `- Published: ${post.pub_date}\n`;
    output += `- Category: ${post.category}\n`;
    // tags is stored as JSON array text, so it is emitted as raw JSON here.
    output += `- Tags: ${post.tags}\n\n`;
    output += `${post.excerpt}\n\n`;
  }

  return new Response(output, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
};
```
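The schema stores `tags` as a JSON array in a TEXT column, so interpolating the column directly prints raw JSON such as `["seo","llms"]`. A small helper could turn that into a readable list. `formatTags` is a hypothetical name, and the fallback behavior is one possible choice rather than what EmDash does.

```typescript
// Hypothetical helper: turn the JSON-array text stored in `tags` into "a, b, c".
// Falls back to the raw string if the column does not hold a valid JSON array.
function formatTags(raw: unknown): string {
  if (typeof raw !== "string" || raw.trim() === "") return "";
  try {
    const parsed: unknown = JSON.parse(raw);
    if (Array.isArray(parsed)) {
      return parsed
        .map((tag) => String(tag).trim())
        .filter((tag) => tag.length > 0)
        .join(", ");
    }
  } catch {
    // Not valid JSON; fall through and return the text unchanged.
  }
  return raw;
}
```

In the route, the tags line would then read `- Tags: ${formatTags(post.tags)}`.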

llms-full.txt.ts

The full-content variant includes complete post bodies — ideal for LLMs that want full corpus context for training or RAG:

```typescript
// src/pages/llms-full.txt.ts
import type { APIRoute } from "astro";

export const GET: APIRoute = async (context) => {
  const DB = context.locals.runtime.env.DB;
  // Same table and filter as llms.txt, but with the full markdown body.
  const { results } = await DB.prepare(
    "SELECT slug, title, body, excerpt, pub_date " +
      "FROM ec_posts WHERE published = true ORDER BY pub_date DESC"
  ).all();

  let output = `# AIKit Blog — Full Content\n\n`;
  output += `Total posts: ${results.length}\n\n---\n\n`;

  // Each post becomes a titled section, separated by a horizontal rule.
  for (const post of results) {
    const url = `https://ai-kit.net/posts/${post.slug}/`;
    output += `# ${post.title}\n\nURL: ${url}\nPublished: ${post.pub_date}\n\n`;
    output += `${post.body}\n\n---\n\n`;
  }

  return new Response(output, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
};
```
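Since each `body` is full markdown, it may contain its own `#` headings, which would sit at the same level as the per-post `# title` line in the combined file. One possible refinement, sketched here under that assumption, is to demote headings inside each body before concatenating. `demoteHeadings` is a hypothetical helper, and its fence detection is deliberately simple: it treats any line starting with three or more backticks or tildes as a toggle.

```typescript
// Hypothetical helper: push markdown headings down by `levels`, capped at level 6.
// Lines inside fenced code blocks are left alone so that "# comment" lines survive.
function demoteHeadings(markdown: string, levels: number = 1): string {
  const fence = /^\s{0,3}(`{3,}|~{3,})/;
  const heading = /^(#{1,6})(\s.*)$/;
  let inFence = false;

  return markdown
    .split("\n")
    .map((line) => {
      if (fence.test(line)) {
        inFence = !inFence; // simplistic toggle, does not match fence lengths
        return line;
      }
      if (inFence) return line;
      const match = heading.exec(line);
      if (!match) return line;
      const depth = Math.min(6, match[1].length + levels);
      return "#".repeat(depth) + match[2];
    })
    .join("\n");
}
```

The route would then append `demoteHeadings(String(post.body))` instead of the raw body, keeping one top-level heading per post.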

sitemap.xml.ts

For Google crawlers:

```typescript
// src/pages/sitemap.xml.ts
import type { APIRoute } from "astro";

export const GET: APIRoute = async (context) => {
  // Same D1 binding and table as the llms.txt routes.
  const DB = context.locals.runtime.env.DB;
  const { results } = await DB.prepare(
    "SELECT slug, updated_at FROM ec_posts WHERE published = true"
  ).all();

  // One <url> entry per published post.
  const urls = results.map(
    (post) => `
  <url>
    <loc>https://ai-kit.net/posts/${post.slug}/</loc>
    <lastmod>${post.updated_at}</lastmod>
    <changefreq>monthly</changefreq>
  </url>`
  ).join("");

  // The sitemap protocol requires a <urlset> root element with its namespace.
  const xml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${urls}
</urlset>`;

  return new Response(xml, {
    headers: { "Content-Type": "application/xml; charset=utf-8" },
  });
};
```
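The sitemap protocol requires that values be entity-escaped, which matters as soon as a slug or timestamp contains a character such as `&`. The sketch below shows one way to build a safe `<url>` element. `escapeXml` and `toSitemapUrl` are hypothetical helpers, and the site origin is the one used elsewhere in this post.

```typescript
// Escape the five characters that the sitemap protocol requires to be entity-escaped.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: build one <url> element from a slug and an ISO 8601 timestamp.
function toSitemapUrl(slug: string, updatedAt: string): string {
  // Encode each path segment separately so slugs containing "/" keep their structure.
  const path = slug.split("/").map(encodeURIComponent).join("/");
  const loc = `https://ai-kit.net/posts/${path}/`;
  return [
    "  <url>",
    `    <loc>${escapeXml(loc)}</loc>`,
    `    <lastmod>${escapeXml(updatedAt)}</lastmod>`,
    "  </url>",
  ].join("\n");
}
```

Inside the route, `results.map((post) => toSitemapUrl(String(post.slug), String(post.updated_at)))` would replace the inline template.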

Three routes. One D1 table. Zero build steps.

Results

The impact has been immediate:

- **634 published posts** — every single one is in both `/sitemap.xml` and `/llms.txt` the moment it's published. No exceptions. No drift.

- **Real-time freshness** — when content updates via the EmDash Content API, all three routes reflect the change on the next HTTP request. No deploy, no regeneration.

- **No separate AI pipeline** — zero additional services or cron jobs for AI discoverability. llms.txt and llms-full.txt are just two more Astro endpoints.

- **Zero extra infrastructure cost** — a single `SELECT ... WHERE published = true` powers both Google crawlers and LLM training pipelines.
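The no-drift property can also be checked mechanically by comparing the URLs listed in the two feeds. The sketch below assumes the output formats shown in this post (`<loc>` elements in the sitemap, `## [title](url)` headings in llms.txt). All function names are hypothetical.

```typescript
// Collect capture group 1 from every match of a global pattern.
function collectUrls(text: string, pattern: RegExp): string[] {
  const urls: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    urls.push(match[1]);
  }
  return urls;
}

// URLs listed in <loc> elements of a sitemap (assumes no entity-escaped characters).
function sitemapUrls(xml: string): string[] {
  return collectUrls(xml, /<loc>\s*([^<\s]+)\s*<\/loc>/g);
}

// URLs used as link targets in "## [title](url)" headings of llms.txt.
function llmsUrls(markdown: string): string[] {
  return collectUrls(markdown, /^## \[.*\]\((\S+)\)\s*$/gm);
}

// Report URLs present in one feed but missing from the other.
function findDrift(
  xml: string,
  markdown: string
): { missingFromLlms: string[]; missingFromSitemap: string[] } {
  const inSitemap = sitemapUrls(xml);
  const inLlms = llmsUrls(markdown);
  return {
    missingFromLlms: inSitemap.filter((url) => inLlms.indexOf(url) === -1),
    missingFromSitemap: inLlms.filter((url) => inSitemap.indexOf(url) === -1),
  };
}
```

A monitoring job could fetch both feeds from the live site and alert if either array is non-empty. With both routes reading the same table, both should stay empty.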

One workflow covers both audiences. Write a post, hit publish, and it's immediately available to:

1. **Googlebot**, via the dynamic `/sitemap.xml` submitted to Search Console

2. **OpenAI crawlers** (GPTBot, ChatGPT-User), via `/llms.txt`

3. **Anthropic crawlers** (ClaudeBot), via the same `/llms.txt`

4. **Meta and other AI crawlers**, via `/llms.txt` and `/llms-full.txt`

5. **Any RAG system**, via `/llms-full.txt` for full-content chunking and embedding

Key Takeaways

1. **Dual-discovery eliminates content drift.** One database powers both your sitemap and your llms.txt — they will never disagree about what content exists.

2. **Real-time beats build-time for AI discoverability.** LLM crawlers don't wait for your next deploy. Because EmDash's routes are dynamic, a new post appears in every feed on the first request after publishing, with no deploy in between.

3. **llms.txt is a first-class route, not an afterthought.** Treating `llms.txt.ts` as an Astro server endpoint (not a static build artifact) means it inherits the same reliability, caching, and edge-deployment benefits as every other route.

4. **This pattern scales.** Whether you have 10 posts or 10,000, the architecture is identical. D1 handles the volume, and Cloudflare Workers scale horizontally for any crawler load.

5. **The future is format-agnostic storage with format-aware rendering.** Store once in D1, render for every consumer — Google, LLMs, RSS readers, JSON APIs, whatever comes next.

