If you're publishing blog content without tracking what actually drives traffic, engagement, and conversions — you're flying blind. AIKit's blog (running on EmDash CMS with Astro and Cloudflare D1) treats content as a data product: every page view, scroll depth, keyword ranking, and conversion event feeds back into our editorial decisions. Here's exactly how we built a lightweight analytics pipeline that transformed our SEO strategy.

The Problem — Blind Content Publishing Without Data

Most technical blogs — especially open-source projects like AIKit — fall into a common trap: publish and pray. You write a tutorial, share it on social media, and maybe check Google Analytics weeks later. But the feedback loop is too slow, and the signal is too noisy.

**We had three specific problems:**

| Problem | Symptom | Impact |
|---|---|---|
| No page-level engagement data | Couldn't tell if readers reached the code examples | High bounce rate on tutorials |
| Keyword ranking blind spots | Writing content based on intuition, not data | Low organic CTR for high-potential terms |
| No conversion attribution | Couldn't connect blog posts to API sign-ups | Wasted effort on low-ROI content |

Our blog was running on Astro with EmDash CMS and Cloudflare D1 — a fast, cost-effective stack. But we had zero insight into what was working. We needed a first-party analytics system that respected privacy while giving us actionable SEO intelligence.

The Solution — Analytics Pipeline Architecture

We built a lightweight, first-party analytics pipeline that lives entirely on Cloudflare's edge network. Here's the stack:

- **Cloudflare Workers** — intercept page views, engagement events, and outbound link clicks

- **Cloudflare D1** — serverless SQLite database for storing analytics events

- **EmDash CMS** — headless CMS that serves content metadata alongside analytics

- **Google Search Console API** — pulled in via a scheduled Worker to sync keyword performance

- **Custom dashboard** — an Astro page querying D1 for real-time content insights

**Why first-party analytics?** No cookie banners, no third-party scripts, no GDPR headaches. Every event is captured at the edge and stored in our own D1 database.

Architecture Overview — How the Data Flows

```mermaid
flowchart LR
    A[Visitor Browser] --> B[Cloudflare Edge]
    B --> C[Worker: Analytics Collector]
    C --> D[Cloudflare D1]
    D --> E[Daily Aggregation Worker]
    E --> F[Dashboard API]
    G[Google Search Console] --> H[Keyword Sync Worker]
    H --> D
    F --> I[Astro Dashboard Page]
    I --> J[Editorial Team]
    J --> K[Content Decisions]
    K --> A
```

The flow works in five stages:

1. **Capture** — Every page request hits a Cloudflare Worker that extracts path, referrer, anonymized user agent, and engagement time

2. **Store** — Events are batched and written to D1 tables: `page_views`, `engagement_events`, `conversions` (see the schema sketch after this list)

3. **Enrich** — A daily cron Worker joins raw page views with Search Console keyword data from the past 28 days

4. **Serve** — The Astro dashboard queries D1 through an API endpoint protected by Cloudflare Access

5. **Act** — The editorial team reviews a ranked "Content Opportunity Score" table every Monday morning
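
For context, here is a minimal sketch of the two raw-event tables from the Store stage. The column set mirrors what the collector snippets below capture; the `engagement_events` shape (a generic `event_type`/`value` pair) is an illustrative choice rather than a mandated schema.

```sql
-- Sketch of the raw-event tables; columns beyond those used in the Worker
-- snippets below (e.g. event_type/value) are illustrative choices.
CREATE TABLE page_views (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  path TEXT NOT NULL,
  referrer TEXT,
  user_agent TEXT,          -- already anonymized before it reaches D1
  country TEXT,
  timestamp INTEGER NOT NULL -- Date.now() milliseconds
);

CREATE TABLE engagement_events (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  path TEXT NOT NULL,
  event_type TEXT NOT NULL,  -- 'scroll', 'time_on_page', 'outbound_click'
  value TEXT,                -- e.g. depth percentage or clicked URL
  timestamp INTEGER NOT NULL
);
```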

Implementation — Tracking Page Views, Engagement, and Conversions

Page View Tracking

Every request to `/blog/*` passes through this Worker handler:

```javascript
export default {
  async fetch(request, env, ctx) {
    const url = new URL(request.url);
    if (!url.pathname.startsWith('/blog/')) {
      return env.ASSETS.fetch(request);
    }

    const event = {
      path: url.pathname,
      referrer: request.headers.get('Referer') || 'direct',
      // anonymizeUA (defined elsewhere) keeps browser family only, no versions or devices
      user_agent: anonymizeUA(request.headers.get('User-Agent')),
      country: request.cf?.country || 'unknown',
      timestamp: Date.now()
    };

    // Fire-and-forget: waitUntil lets the D1 write finish after the response is sent.
    ctx.waitUntil(
      env.DB.prepare(
        `INSERT INTO page_views (path, referrer, user_agent, country, timestamp)
         VALUES (?1, ?2, ?3, ?4, ?5)`
      ).bind(event.path, event.referrer, event.user_agent, event.country, event.timestamp)
       .run()
       .catch(console.error)
    );

    return env.ASSETS.fetch(request);
  }
};
```

Key decisions: **fire-and-forget** (never block the response), **anonymized user agents** (no fingerprinting), and **no cookies** (GDPR compliant out of the box).
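
The `anonymizeUA` helper isn't shown above. One way to implement it, keeping only the browser family and a coarse platform label, might look like this (the exact regexes are illustrative):

```javascript
// Reduce a raw User-Agent string to a coarse browser/platform label.
// Keeps enough signal for reporting without enabling fingerprinting.
function anonymizeUA(ua) {
  if (!ua) return 'unknown';
  // Order matters: Edge UAs contain "Chrome", Chrome UAs contain "Safari".
  const browser =
    /Firefox\//.test(ua) ? 'firefox' :
    /Edg\//.test(ua)     ? 'edge'    :
    /Chrome\//.test(ua)  ? 'chrome'  :
    /Safari\//.test(ua)  ? 'safari'  : 'other';
  const platform = /Mobile|Android|iPhone/.test(ua) ? 'mobile' : 'desktop';
  return `${browser}-${platform}`;
}
```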

Engagement & Scroll Tracking

A tiny client-side script (1.2 KB gzipped) tracks three signals:

| Signal | Threshold | Purpose |
|---|---|---|
| Scroll depth | 25%, 50%, 75%, 100% | Which sections hold attention |
| Time on page | 5s, 30s, 60s, 120s | Engagement quality |
| Outbound clicks | Any external link | Content referral value |

```javascript
// Report each depth threshold once, as the reader scrolls past it.
const THRESHOLDS = [25, 50, 75, 100];
const sent = new Set();

function reportScrollDepth() {
  const scrolled = window.scrollY + window.innerHeight;
  const depth = Math.round((scrolled / document.documentElement.scrollHeight) * 100);

  for (const t of THRESHOLDS) {
    if (depth >= t && !sent.has(t)) {
      sent.add(t);
      navigator.sendBeacon('/api/analytics/scroll', JSON.stringify({
        path: window.location.pathname,
        depth: t,
        ts: Date.now()
      }));
    }
  }
}

// Throttle with requestAnimationFrame so tracking never janks scrolling.
let ticking = false;
window.addEventListener('scroll', () => {
  if (!ticking) {
    ticking = true;
    requestAnimationFrame(() => {
      reportScrollDepth();
      ticking = false;
    });
  }
}, { passive: true });
```
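
The same client script covers the other two signals from the table above. A minimal sketch of the time-on-page and outbound-click beacons follows; the `/api/analytics/time` and `/api/analytics/outbound` endpoint paths are assumptions that mirror the scroll endpoint.

```javascript
// Time-on-page milestones, each reported once.
const TIME_MARKS = [5, 30, 60, 120]; // seconds
TIME_MARKS.forEach((seconds) => {
  setTimeout(() => {
    navigator.sendBeacon('/api/analytics/time', JSON.stringify({
      path: window.location.pathname,
      seconds,
      ts: Date.now()
    }));
  }, seconds * 1000);
});

// Outbound clicks: any anchor pointing off-site.
document.addEventListener('click', (e) => {
  const link = e.target.closest('a[href]');
  if (link && link.hostname && link.hostname !== window.location.hostname) {
    navigator.sendBeacon('/api/analytics/outbound', JSON.stringify({
      path: window.location.pathname,
      target: link.href,
      ts: Date.now()
    }));
  }
});
```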

Conversion Tracking

We track two conversion types from blog content:

1. **GitHub stars** — Clicks to `github.com/nousresearch/ai-kit` with `?ref=blog-{slug}` parameter

2. **API sign-ups** — When a visitor hits `/docs/getting-started` within 7 days of reading a blog post

This attribution lives in a `conversions` table keyed by slug:

```sql
CREATE TABLE conversions (
  slug TEXT PRIMARY KEY,
  github_stars INTEGER DEFAULT 0,
  api_signups INTEGER DEFAULT 0,
  last_attributed DATE
);
```
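
On the Worker side, the outbound-click beacon is enough to attribute GitHub clicks: if the clicked URL points at the AIKit repo and carries the `?ref=blog-{slug}` parameter, the row for that slug is bumped. This is a sketch, and the handler name and payload shape are assumptions consistent with the client snippet above.

```javascript
// Handles POSTs to /api/analytics/outbound (see the client snippet above).
async function recordOutboundClick(request, env, ctx) {
  const { target } = await request.json();
  const clicked = new URL(target);
  const ref = clicked.searchParams.get('ref') || '';

  // Only GitHub repo clicks that carry a blog ref count as conversions.
  if (clicked.hostname === 'github.com' && ref.startsWith('blog-')) {
    const slug = ref.slice('blog-'.length);
    ctx.waitUntil(
      env.DB.prepare(
        `INSERT INTO conversions (slug, github_stars, last_attributed)
         VALUES (?1, 1, date('now'))
         ON CONFLICT (slug) DO UPDATE SET
           github_stars = github_stars + 1,
           last_attributed = date('now')`
      ).bind(slug).run().catch(console.error)
    );
  }

  return new Response(null, { status: 204 });
}
```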

Search Console Integration

A weekly cron Worker syncs keyword performance from Google Search Console:

```javascript
async function syncKeywords(env) {
  // Search Console expects explicit YYYY-MM-DD dates, so compute a 28-day window.
  const end = new Date();
  const start = new Date(end.getTime() - 28 * 24 * 60 * 60 * 1000);
  const fmt = (d) => d.toISOString().slice(0, 10);

  // The property URL must be percent-encoded in the request path.
  const site = encodeURIComponent('https://ai-kit.net/');
  const response = await fetch(
    `https://searchconsole.googleapis.com/webmasters/v3/sites/${site}/searchAnalytics/query`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${await getGSCAccessToken(env)}`, // service-account token helper, defined elsewhere
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        startDate: fmt(start),
        endDate: fmt(end),
        dimensions: ['page', 'query'],
        rowLimit: 5000
      })
    }
  );

  const data = await response.json();
  const stmt = env.DB.prepare(
    `INSERT INTO keyword_performance (slug, keyword, impressions, clicks, position, date)
     VALUES (?1, ?2, ?3, ?4, ?5, date('now'))
     ON CONFLICT (slug, keyword, date) DO UPDATE SET
       impressions = ?3, clicks = ?4, position = ?5`
  );

  for (const row of data.rows ?? []) {
    const slug = extractSlug(row.keys[0]); // maps a full page URL to its blog slug
    await stmt.bind(slug, row.keys[1], row.impressions, row.clicks, row.position).run();
  }
}
```
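
The function above runs from a Worker `scheduled` handler. A minimal wiring sketch; the weekly cron expression shown in the comment is an example, configured in `wrangler.toml`:

```javascript
export default {
  // Triggered by the cron schedule in wrangler.toml,
  // e.g. [triggers] crons = ["0 6 * * 1"] for Monday mornings.
  async scheduled(event, env, ctx) {
    ctx.waitUntil(syncKeywords(env));
  }
};
```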

Results — Metrics Showing Improvement

After six months of running this pipeline, here's what we saw:

| Metric | Before | After | Change |
|---|---:|---:|---:|
| Organic traffic (monthly) | ~8,200 sessions | ~24,600 sessions | **+200%** |
| Avg. blog page position | 18.4 | 7.2 | **−11.2** |
| Content with >50% scroll depth | ~22% | ~67% | **+45pp** |
| Blog-to-signup conversion rate | 0.8% | 3.1% | **+2.3pp** |
| High-value keyword coverage | 47 keywords | 218 keywords | **+364%** |

**Biggest wins from specific data-driven decisions:**

- **Topic pruning:** Identified 14 posts with <50 views and >80% bounce rate. Consolidated or retired them, redirecting traffic to stronger content.

- **Keyword gap filling:** Search Console showed we ranked #11–15 for "LLM API proxy" but had no dedicated post. We published one — it reached #3 in 6 weeks.

- **Content refresh triggers:** Posts with declining impressions over 3 months get auto-flagged for refresh (see the query sketch after this list). We updated 8 posts and saw an average **34% traffic recovery** within 30 days.

- **Section-level optimization:** Scroll-depth data revealed 70% of readers skipped the "prerequisites" section. Moving it to a collapsible element improved time-on-page by 22%.
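
The refresh flag from the third bullet can be a plain query over `keyword_performance`: compare each slug's recent impressions against a window roughly three months earlier and flag large drops. A sketch, assuming the table populated by the sync Worker; the window boundaries and 30% cut-off are illustrative.

```sql
-- Flag slugs whose average 28-day impressions dropped by more than 30%
-- compared with the syncs recorded roughly three months earlier.
SELECT
  cur.slug,
  cur.impressions AS current_impressions,
  prev.impressions AS previous_impressions
FROM (
  SELECT slug, AVG(impressions) AS impressions
  FROM keyword_performance
  WHERE date >= date('now', '-28 days')
  GROUP BY slug
) AS cur
JOIN (
  SELECT slug, AVG(impressions) AS impressions
  FROM keyword_performance
  WHERE date BETWEEN date('now', '-118 days') AND date('now', '-90 days')
  GROUP BY slug
) AS prev ON prev.slug = cur.slug
WHERE cur.impressions < prev.impressions * 0.7
ORDER BY prev.impressions DESC;
```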

**Real example — "Prompt Routing" post:**

```
Week 0: Published "How to Route LLM Prompts with AIKit"
Week 2: 340 impressions, avg position 24.1, scroll-depth 38%
Week 3: Added TL;DR summary + code example above the fold
Week 6: 2,100 impressions, avg position 8.7, scroll-depth 71%
Week 8: 8 blog-to-signup conversions attributed
```

Key Takeaways

If you're running a technical blog for an open-source project, here's what we learned:

1. **First-party analytics is the only way to go.** Third-party tools are blocked by ad blockers, slow down your site, and raise privacy concerns. Cloudflare Workers + D1 cost us ~$5/month.

2. **Connect content data to keyword data.** Raw page views tell you what's popular. Search Console keyword position tells you what's possible. Combining them into the **Content Opportunity Score** tells you what to write next (a query sketch follows this list).

3. **Track engagement, not just visits.** Scroll depth and time-on-page reveal whether your content actually answers the reader's question. If they bounce at 25%, your intro needs work.

4. **Close the attribution loop.** Without knowing which posts drive sign-ups and GitHub stars, you'll optimize for traffic that doesn't convert.

5. **Automate the boring parts.** Weekly keyword syncs, engagement alerts, and content refresh flags should be Workers on cron — not manual reports.
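
For completeness, the Content Opportunity Score from point 2 can be computed straight from the tables above. The sketch below favours queries already ranking on page two (positions 8–20) with impressions that aren't converting to clicks, scaled by how much traffic the post already gets; the weighting, the position band, and the `/blog/{slug}/` URL pattern in the join are editorial assumptions, not a fixed formula.

```sql
-- Keyword upside (impressions not yet converted to clicks at positions 8-20)
-- scaled by each post's recent page views.
WITH keyword_upside AS (
  SELECT slug, SUM(impressions - clicks) AS missed_clicks
  FROM keyword_performance
  WHERE date >= date('now', '-28 days')
    AND position BETWEEN 8 AND 20
  GROUP BY slug
),
recent_views AS (
  SELECT path, COUNT(*) AS views
  FROM page_views
  WHERE timestamp >= unixepoch('now', '-28 days') * 1000
  GROUP BY path
)
SELECT k.slug,
       k.missed_clicks,
       COALESCE(v.views, 0) AS views_28d,
       k.missed_clicks * (1 + COALESCE(v.views, 0) / 100.0) AS opportunity_score
FROM keyword_upside k
LEFT JOIN recent_views v ON v.path = '/blog/' || k.slug || '/'
ORDER BY opportunity_score DESC
LIMIT 20;
```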

---

AIKit is an open-source LLM API proxy and sandbox with observability built in. Our blog runs on EmDash CMS with Astro and Cloudflare D1 — the same stack we use to dogfood this analytics approach. Fork the Worker scripts, adapt the D1 schema, and start treating your blog content as a data product.