How do you stop guessing which keywords your content covers and start knowing exactly where your SEO gaps are? We built a plugin-level SEO audit tool on Cloudflare D1 and Workers that scans published content, cross-references it against a D1 database of target keywords, and generates automated gap reports — turning a 3-hour manual slog into a 5-minute actionable dashboard.

## The Problem — SEO Content Gap Analysis Is Manual, Slow, and Inconsistent

Every content team faces the same question: "What have we already covered, and what are we missing?" For plugin developers shipping AIKit and similar tools, the stakes are higher because plugin documentation, landing pages, blog posts, and changelogs all compete for the same keyword real estate.

Before we built this system, our SEO audit process looked like this:

1. **Export all content URLs** from the CMS (15 minutes)
2. **Manually scrape or copy titles and headings** from each page (45 minutes)
3. **Paste into a spreadsheet** alongside a keyword master list (20 minutes)
4. **Color-code cells** green (hit) or red (miss) based on human judgment (30 minutes)
5. **Write up findings** in a shared doc (40 minutes)
6. **Repeat every two weeks** — and pray nothing was missed

That's roughly three hours per audit cycle, and it's riddled with failure points:

| Failure Point | Impact |
|---|---|
| Human error in copy-paste | Missing or duplicated URLs |
| Inconsistent keyword judgment | One editor calls it "covered," another disagrees |
| No crawl depth | Only analyzes visible pages, not sitemap-hidden content |
| No trend tracking | Each audit is a fresh snapshot; no historical comparison |
| Manual only | Nobody runs it weekly — it's too painful |

For a project like AIKit — where we ship plugin updates, new features, and documentation changes weekly — a biweekly manual audit meant we were always two weeks behind reality. By the time we identified a gap, four new pages had already been published without keyword guidance.

## The Solution — Build a D1/Workers Plugin That Scans, Queries, and Reports

We needed a system that could:

- **Crawl** our entire content surface (blog, docs, landing pages, changelogs)
- **Compare** every page's headings, meta descriptions, and body text against a keyword taxonomy stored in D1
- **Score** coverage at the keyword level: covered, partial, or missing
- **Report** gaps automatically via a scheduled cron job

The core insight: treat your keyword strategy as a database table, not a spreadsheet. Once keywords live in D1, every Worker endpoint can query them. Once content is crawled on a schedule, every gap is detected automatically.

We packaged this as a plugin — a self-contained Worker with its own D1 binding, cron trigger, and a lightweight admin dashboard rendered directly from Workers. No external CMS plugin, no third-party SaaS, no extra infrastructure.
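To make "keywords as a database table" concrete, here is a minimal sketch of seeding that table. The `keyword_taxonomy` schema itself is defined in full in the Implementation section below; the keywords here are purely illustrative, and the search volumes are left at zero rather than invented:

```sql
-- Illustrative seed rows for the keyword taxonomy (schema defined below).
-- Volumes are placeholders; pull real numbers from your keyword research tool.
INSERT INTO keyword_taxonomy (keyword, theme, priority, target_content_type, monthly_search_volume) VALUES
  ('cloudflare d1 seo audit',   'Cloudflare', 'core',      'blog',    0),
  ('wordpress ai plugin',       'AIKit',      'core',      'landing', 0),
  ('content gap analysis tool', 'SEO',        'secondary', 'blog',    0),
  ('d1 database cron worker',   'Cloudflare', 'long_tail', 'docs',    0);
```

Once these rows exist, "run an audit" reduces to a join between what you want to rank for and what you have actually published.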
## Architecture Overview — Cloudflare Workers + D1 + Scheduled Cron

The architecture follows a simple three-layer pattern:

```
┌─────────────────────────────────────────────────────┐
│                     User / Team                     │
└──────────────────────────┬──────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│           Cloudflare Workers (API Layer)            │
│  ┌──────────┐   ┌─────────────┐   ┌──────────────┐  │
│  │  Crawl   │   │     Gap     │   │  Dashboard   │  │
│  │ Endpoint │   │  Analysis   │   │   Endpoint   │  │
│  └────┬─────┘   └──────┬──────┘   └──────┬───────┘  │
└───────┼────────────────┼─────────────────┼──────────┘
        │                │                 │
        ▼                ▼                 ▼
┌─────────────────────────────────────────────────────┐
│               Cloudflare D1 Database                │
│  ┌────────────┐  ┌───────────────┐  ┌─────────────┐ │
│  │ keyword_   │  │ content_pages │  │ gap_reports │ │
│  │ taxonomy   │  │               │  │             │ │
│  └────────────┘  └───────────────┘  └─────────────┘ │
└─────────────────────────────────────────────────────┘
                           ▲
                           │
┌──────────────────────────┴──────────────────────────┐
│          Cron Trigger (weekly via Workers)          │
└─────────────────────────────────────────────────────┘
```

**Layer 1: D1 Database** — Serves as the single source of truth for:

- `keyword_taxonomy` — All target keywords grouped by theme, priority, and content type
- `content_pages` — Crawled page metadata: URL, title, headings, meta description, word count
- `gap_reports` — Generated report snapshots with coverage scores and missing keyword lists

**Layer 2: Workers API** — Three endpoints:

- `POST /crawl` — Triggers a full content crawl (also called by cron)
- `GET /report/latest` — Returns the most recent gap report
- `GET /dashboard` — Renders an HTML dashboard for quick visual review

**Layer 3: Cron Trigger** — Runs weekly (every Monday at 0600 UTC) to crawl content and regenerate the gap report automatically.
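Before the implementation details, here is one way the Worker's entry point might wire the HTTP endpoints and the cron trigger together. This is a sketch under two assumptions: the Hono app built in the next section is its module's default export, and the weekly schedule (`0 6 * * 1`, Mondays at 0600 UTC) is declared under `[triggers]` in `wrangler.toml`.

```javascript
// src/index.js: one module exports both the HTTP handler and the cron
// handler (assumes src/gap-analysis.js default-exports the Hono app)
import app from './gap-analysis';

export default {
  // HTTP entry point: Hono routes /crawl, /report/latest, and /dashboard
  fetch: app.fetch,

  // Cron entry point: fires on the schedule declared in wrangler.toml,
  // e.g. crons = ["0 6 * * 1"] for Mondays at 0600 UTC
  async scheduled(event, env, ctx) {
    // Reuse the same code path the API exposes by dispatching a synthetic
    // POST /crawl request; waitUntil keeps the Worker alive until it finishes
    ctx.waitUntil(
      app.fetch(new Request('https://audit.internal/crawl', { method: 'POST' }), env, ctx)
    );
  },
};
```

Routing the cron through the same `/crawl` handler keeps one code path for manual and scheduled audits, so the two can never drift apart.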
## Implementation — Code Snippets

### D1 Schema for Keyword Tracking

```sql
-- Keyword taxonomy: the master list of everything we want to rank for
CREATE TABLE IF NOT EXISTS keyword_taxonomy (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  keyword TEXT NOT NULL UNIQUE,
  theme TEXT NOT NULL,                      -- e.g., 'AIKit', 'PluginDev', 'SEO', 'Cloudflare'
  priority TEXT CHECK(priority IN ('core', 'secondary', 'long_tail')),
  target_content_type TEXT,                 -- 'blog', 'docs', 'landing', 'any'
  monthly_search_volume INTEGER DEFAULT 0,
  created_at TEXT DEFAULT (datetime('now')),
  updated_at TEXT DEFAULT (datetime('now'))
);

-- Content pages: every page we crawl
CREATE TABLE IF NOT EXISTS content_pages (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  url TEXT NOT NULL UNIQUE,
  title TEXT,
  meta_description TEXT,
  headings TEXT,                            -- JSON array of all h1-h6 text
  body_snippet TEXT,                        -- First 500 chars of body for quick scan
  word_count INTEGER DEFAULT 0,
  content_type TEXT,                        -- 'blog', 'docs', 'landing', 'changelog'
  last_crawled_at TEXT,
  created_at TEXT DEFAULT (datetime('now'))
);

-- Keyword-page coverage mapping
CREATE TABLE IF NOT EXISTS keyword_coverage (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  keyword_id INTEGER NOT NULL,
  page_id INTEGER NOT NULL,
  match_type TEXT CHECK(match_type IN ('exact', 'fuzzy', 'partial')),
  found_in TEXT,                            -- 'title', 'meta', 'headings', 'body'
  confidence REAL DEFAULT 1.0,
  matched_at TEXT DEFAULT (datetime('now')),
  FOREIGN KEY (keyword_id) REFERENCES keyword_taxonomy(id),
  FOREIGN KEY (page_id) REFERENCES content_pages(id),
  UNIQUE(keyword_id, page_id)
);

-- Snapshot of gap analysis results
CREATE TABLE IF NOT EXISTS gap_reports (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  generated_at TEXT DEFAULT (datetime('now')),
  total_keywords INTEGER,
  covered_keywords INTEGER,
  partial_keywords INTEGER,
  missing_keywords INTEGER,
  total_pages_crawled INTEGER,
  report_data TEXT                          -- Full JSON report payload
);

-- Indexes for fast lookups
CREATE INDEX IF NOT EXISTS idx_coverage_keyword ON keyword_coverage(keyword_id);
CREATE INDEX IF NOT EXISTS idx_coverage_page ON keyword_coverage(page_id);
CREATE INDEX IF NOT EXISTS idx_pages_type ON content_pages(content_type);
```

### Worker Endpoint for Gap Analysis

```javascript
// src/gap-analysis.js — Core analysis logic
import { Hono } from 'hono';
import { cors } from 'hono/cors';

const app = new Hono();
app.use('/*', cors());

// POST /crawl — Trigger a full crawl and gap analysis
app.post('/crawl', async (c) => {
  const db = c.env.D1_SEO_AUDIT;

  // Step 1: Crawl content from configured sources
  const sources = c.env.CONTENT_SOURCES?.split(',') || [
    'https://ai-kit.net/blog',
    'https://ai-kit.net/docs',
    'https://ai-kit.net/plugins'
  ];

  let crawledCount = 0;
  for (const source of sources) {
    const pages = await crawlSource(source, c.env);
    for (const page of pages) {
      await db
        .prepare(`
          INSERT INTO content_pages
            (url, title, meta_description, headings, body_snippet, word_count, content_type, last_crawled_at)
          VALUES (?, ?, ?, ?, ?, ?, ?, datetime('now'))
          ON CONFLICT(url) DO UPDATE SET
            title = excluded.title,
            meta_description = excluded.meta_description,
            headings = excluded.headings,
            body_snippet = excluded.body_snippet,
            word_count = excluded.word_count,
            content_type = excluded.content_type,
            last_crawled_at = excluded.last_crawled_at
        `)
        .bind(page.url, page.title, page.metaDescription, JSON.stringify(page.headings),
              page.bodySnippet, page.wordCount, page.contentType)
        .run();
      crawledCount++;
    }
  }

  // Step 2: Run gap analysis against keyword taxonomy
  const keywords = await db
    .prepare('SELECT * FROM keyword_taxonomy')
    .all();
  const pages = await db
    .prepare('SELECT id, url, title, headings, body_snippet, meta_description FROM content_pages')
    .all();

  let covered = 0, partial = 0, missing = 0;
  const reportLines = [];

  for (const kw of keywords.results) {
    const pattern = kw.keyword.toLowerCase();
    let bestMatch = null;
    let bestConfidence = 0;

    for (const page of pages.results) {
      // Score each field separately so we can record where the match was found
      const fields = {
        title: page.title,
        meta: page.meta_description,
        headings: page.headings,        // stored as a JSON string of h1-h6 text
        body: page.body_snippet
      };
      const match = scoreKeywordMatch(pattern, fields);
      if (match.confidence > bestConfidence) {
        bestConfidence = match.confidence;
        bestMatch = {
          pageId: page.id,
          pageUrl: page.url,
          pageTitle: page.title,
          foundIn: match.foundIn,
          matchType: match.matchType,   // 'exact' | 'fuzzy' | 'partial'
          confidence: match.confidence
        };
      }
    }

    // Classify coverage level
    let coverageLevel;
    if (bestConfidence >= 0.8) {
      coverageLevel = 'covered';
      covered++;
    } else if (bestConfidence >= 0.3) {
      coverageLevel = 'partial';
      partial++;
    } else {
      coverageLevel = 'missing';
      missing++;
    }

    // Persist keyword-page mapping. Note: match_type must satisfy the CHECK
    // constraint ('exact' | 'fuzzy' | 'partial'), so we store how the keyword
    // matched, not the coverage level.
    if (bestMatch) {
      await db
        .prepare(`
          INSERT INTO keyword_coverage
            (keyword_id, page_id, match_type, found_in, confidence, matched_at)
          VALUES (?, ?, ?, ?, ?, datetime('now'))
          ON CONFLICT(keyword_id, page_id) DO UPDATE SET
            match_type = excluded.match_type,
            found_in = excluded.found_in,
            confidence = excluded.confidence,
            matched_at = excluded.matched_at
        `)
        .bind(kw.id, bestMatch.pageId, bestMatch.matchType, bestMatch.foundIn, bestMatch.confidence)
        .run();
    }

    reportLines.push({
      keyword: kw.keyword,
      theme: kw.theme,
      priority: kw.priority,
      coverage: coverageLevel,
      bestPage: bestMatch?.pageUrl || null,
      bestPageTitle: bestMatch?.pageTitle || null,
      confidence: bestConfidence
    });
  }

  // Step 3: Store report snapshot
  const reportPayload = {
    generatedAt: new Date().toISOString(),
    summary: {
      totalKeywords: keywords.results.length,
      covered,
      partial,
      missing,
      totalPagesCrawled: pages.results.length
    },
    details: reportLines
  };

  await db
    .prepare(`
      INSERT INTO gap_reports
        (total_keywords, covered_keywords, partial_keywords, missing_keywords, total_pages_crawled, report_data)
      VALUES (?, ?, ?, ?, ?, ?)
    `)
    .bind(keywords.results.length, covered, partial, missing, pages.results.length,
          JSON.stringify(reportPayload))
    .run();

  return c.json({ crawled: crawledCount, ...reportPayload.summary });
});

export default app;
```
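The handler above leans on two helpers the snippet doesn't define: `crawlSource` and `scoreKeywordMatch`. Neither is part of any library, so treat the following as a minimal sketch of one way to implement them. The crawler assumes each content source publishes a flat `sitemap.xml` at its origin and uses the Workers `HTMLRewriter` API to stream out titles, meta descriptions, and headings; the scorer is a simple weighted heuristic tuned so that an exact title or heading match clears the 0.8 "covered" threshold.

```javascript
// Sketch of the two helpers assumed by the /crawl handler. Both are
// illustrative heuristics, not a canonical implementation.

// crawlSource: fetch the origin's sitemap, keep URLs under this source's
// path, and extract metadata from each page with HTMLRewriter (streaming,
// so no DOM is ever built).
async function crawlSource(source, env) {
  const origin = new URL(source).origin;
  const sitemap = await fetch(`${origin}/sitemap.xml`).then((r) => r.text());

  // Naive <loc> extraction; assumes a flat sitemap rather than a sitemap index
  const urls = [...sitemap.matchAll(/<loc>([^<]+)<\/loc>/g)]
    .map((m) => m[1].trim())
    .filter((u) => u.startsWith(source));

  const pages = [];
  for (const url of urls) {
    const res = await fetch(url);
    if (!res.ok || !res.headers.get('content-type')?.includes('text/html')) continue;

    const page = { url, title: '', metaDescription: '', headings: [], bodySnippet: '', wordCount: 0 };
    let bodyText = '';
    let headingBuf = '';
    const headingHandler = {
      text(t) {
        headingBuf += t.text;                // heading text can arrive in chunks
        if (t.lastInTextNode) {
          if (headingBuf.trim()) page.headings.push(headingBuf.trim());
          headingBuf = '';
        }
      }
    };

    let rewriter = new HTMLRewriter()
      .on('title', { text(t) { page.title += t.text; } })
      .on('meta[name="description"]', {
        element(el) { page.metaDescription = el.getAttribute('content') || ''; }
      })
      .on('body', { text(t) { bodyText += t.text; } });  // crude: includes script text
    for (const h of ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']) rewriter = rewriter.on(h, headingHandler);

    // Handlers only run while the transformed body is being consumed
    await rewriter.transform(res).arrayBuffer();

    const text = bodyText.replace(/\s+/g, ' ').trim();
    page.bodySnippet = text.slice(0, 500);
    page.wordCount = text ? text.split(' ').length : 0;
    page.contentType = new URL(url).pathname.split('/')[1] || 'landing';  // e.g. 'blog', 'docs'
    pages.push(page);
  }
  return pages;
}

// scoreKeywordMatch: weighted per-field scoring. An exact phrase match in the
// title scores 1.0 and in headings 0.9 (both land in the >= 0.8 'covered'
// band); word-level matches are discounted so they fall into 'partial'.
function scoreKeywordMatch(keyword, fields) {
  const weights = { title: 1.0, headings: 0.9, meta: 0.7, body: 0.5 };
  let best = { confidence: 0, foundIn: null, matchType: 'partial' };

  for (const [field, weight] of Object.entries(weights)) {
    const text = (fields[field] || '').toLowerCase();
    if (!text) continue;

    if (text.includes(keyword)) {
      // Exact phrase match anywhere in this field
      if (weight > best.confidence) best = { confidence: weight, foundIn: field, matchType: 'exact' };
      continue;
    }

    // Otherwise: what fraction of the keyword's words appear in this field?
    const terms = keyword.split(/\s+/).filter(Boolean);
    const hits = terms.filter((t) => text.includes(t)).length;
    if (hits > 0) {
      const confidence = (hits / terms.length) * weight * 0.6;  // discount vs. exact
      if (confidence > best.confidence) {
        best = { confidence, foundIn: field, matchType: hits === terms.length ? 'partial' : 'fuzzy' };
      }
    }
  }
  return best;
}
```

The weights are a starting point, not gospel: bump `meta` up if your snippets drive click-through, or add a penalty for matches that only appear in the body snippet.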
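Of the three endpoints, only `POST /crawl` is shown above. For completeness, here is a sketch of what `GET /report/latest` might look like; `GET /dashboard` follows the same pattern, reading the same row and rendering it as HTML instead of JSON.

```javascript
// GET /report/latest: return the newest snapshot from gap_reports. The
// report_data column already holds the full JSON payload, so this is a
// single-row lookup.
app.get('/report/latest', async (c) => {
  const row = await c.env.D1_SEO_AUDIT
    .prepare('SELECT report_data FROM gap_reports ORDER BY id DESC LIMIT 1')
    .first();

  if (!row) {
    return c.json({ error: 'No reports yet. POST /crawl to generate one.' }, 404);
  }
  return c.json(JSON.parse(row.report_data));
});
```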
## Key Takeaways
- **Automation is essential** for teams shipping weekly content — manual audits can't keep pace
- **D1 is a good fit for this use case**: it's serverless SQLite bound directly to the Worker, with no external database to provision or connection pool to manage, and SQL makes the gap analysis itself straightforward
- **Integrate gap detection into the editorial workflow** by surfacing gaps directly in the CMS admin panel