If you're publishing blog content without tracking what actually drives traffic, engagement, and conversions — you're flying blind. AIKit's blog (running on EmDash CMS with Astro and Cloudflare D1) treats content as a data product: every page view, scroll depth, keyword ranking, and conversion event feeds back into our editorial decisions. Here's exactly how we built a lightweight analytics pipeline that transformed our SEO strategy.
The Problem — Blind Content Publishing Without Data
Most technical blogs — especially open-source projects like AIKit — fall into a common trap: publish and pray. You write a tutorial, share it on social media, and maybe check Google Analytics weeks later. But the feedback loop is too slow, and the signal is too noisy.
**We had three specific problems:**
| Problem | Symptom | Impact |
|---|---|---|
| No page-level engagement data | Couldn't tell if readers reached the code examples | High bounce rate on tutorials |
| Keyword ranking blind spots | Writing content based on intuition, not data | Low organic CTR for high-potential terms |
| No conversion attribution | Couldn't connect blog posts to API sign-ups | Wasted effort on low-ROI content |
Our blog was running on Astro with EmDash CMS and Cloudflare D1 — a fast, cost-effective stack. But we had zero insight into what was working. We needed a first-party analytics system that respected privacy while giving us actionable SEO intelligence.
The Solution — Analytics Pipeline Architecture
We built a lightweight, first-party analytics pipeline that lives entirely on Cloudflare's edge network. Here's the stack:
- **Cloudflare Workers** — intercept page views, engagement events, and outbound link clicks
- **Cloudflare D1** — serverless SQLite database for storing analytics events
- **EmDash CMS** — headless CMS that serves content metadata alongside analytics
- **Google Search Console API** — pulled in via a scheduled Worker to sync keyword performance
- **Custom dashboard** — an Astro page querying D1 for real-time content insights
**Why first-party analytics?** No cookie banners, no third-party scripts, no GDPR headaches. Every event is captured at the edge and stored in our own D1 database.
Architecture Overview — How the Data Flows
```mermaid
flowchart LR
A[Visitor Browser] --> B[Cloudflare Edge]
B --> C[Worker: Analytics Collector]
C --> D[Cloudflare D1]
D --> E[Daily Aggregation Worker]
E --> F[Dashboard API]
G[Google Search Console] --> H[Keyword Sync Worker]
H --> D
F --> I[Astro Dashboard Page]
I --> J[Editorial Team]
J --> K[Content Decisions]
K --> A
```
The flow works in five stages:
1. **Capture** — Every page request hits a Cloudflare Worker that extracts path, referrer, anonymized user agent, and country
2. **Store** — Events are batched and written to D1 tables: `page_views`, `engagement_events`, and `conversions` (see the schema sketch after this list)
3. **Enrich** — A daily cron Worker joins raw page views with Search Console keyword data from the past 28 days
4. **Serve** — The Astro dashboard queries D1 through an API endpoint protected by Cloudflare Access
5. **Act** — The editorial team reviews a ranked "Content Opportunity Score" table every Monday morning
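For reference, here is a minimal sketch of the two raw event tables that stages 1 and 2 write to. The column names are illustrative rather than the exact production schema; the `conversions` table is shown later.

```sql
-- Sketch of the raw event tables; adjust columns to taste.
CREATE TABLE page_views (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  path TEXT NOT NULL,
  referrer TEXT,
  user_agent TEXT,            -- anonymized browser family only
  country TEXT,
  timestamp INTEGER NOT NULL  -- Unix epoch milliseconds
);

CREATE TABLE engagement_events (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  path TEXT NOT NULL,
  event_type TEXT NOT NULL,   -- 'scroll', 'time', or 'outbound'
  value INTEGER,              -- depth %, seconds on page, or 1 per click
  timestamp INTEGER NOT NULL
);

CREATE INDEX idx_page_views_path_ts ON page_views (path, timestamp);
```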
Implementation — Tracking Page Views, Engagement, and Conversions
Page View Tracking
Every request to `/blog/*` passes through this Worker handler:
```javascript
// Strip the UA down to a coarse browser family -- enough for aggregate
// reporting, useless for fingerprinting.
function anonymizeUA(ua) {
  const match = (ua || '').match(/Firefox|Edg|Chrome|Safari|Opera/);
  return match ? match[0] : 'other';
}

export default {
  async fetch(request, env, ctx) {
    const url = new URL(request.url);
    if (!url.pathname.startsWith('/blog/')) {
      return env.ASSETS.fetch(request);
    }
    const event = {
      path: url.pathname,
      referrer: request.headers.get('Referer') || 'direct',
      user_agent: anonymizeUA(request.headers.get('User-Agent')),
      country: request.cf?.country || 'unknown',
      timestamp: Date.now()
    };
    // Fire-and-forget: waitUntil lets the D1 write finish after the response
    // has been returned, so readers never wait on analytics.
    ctx.waitUntil(
      env.DB.prepare(
        `INSERT INTO page_views (path, referrer, user_agent, country, timestamp)
         VALUES (?1, ?2, ?3, ?4, ?5)`
      ).bind(event.path, event.referrer, event.user_agent, event.country, event.timestamp)
        .run()
        .catch(console.error)
    );
    return env.ASSETS.fetch(request);
  }
};
```
Key decisions: **fire-and-forget** (never block the response), **anonymized user agents** (no fingerprinting), and **no cookies** (GDPR compliant out of the box).
Engagement & Scroll Tracking
A tiny client-side script (1.2 KB gzipped) tracks three signals:
| Signal | Threshold | Purpose |
|---|---|---|
| Scroll depth | 25%, 50%, 75%, 100% | Which sections hold attention |
| Time on page | 5s, 30s, 60s, 120s | Engagement quality |
| Outbound clicks | Any external link | Content referral value |
```javascript
const THRESHOLDS = [25, 50, 75, 100];
const sent = new Set();

// Report each depth milestone exactly once. sendBeacon survives the
// reader navigating away before the request completes.
function reportScrollDepth() {
  const scrolled = window.scrollY + window.innerHeight;
  const depth = Math.round((scrolled / document.documentElement.scrollHeight) * 100);
  for (const t of THRESHOLDS) {
    if (depth >= t && !sent.has(t)) {
      sent.add(t);
      navigator.sendBeacon('/api/analytics/scroll', JSON.stringify({
        path: window.location.pathname,
        depth: t,
        ts: Date.now()
      }));
    }
  }
}

window.addEventListener('scroll', reportScrollDepth, { passive: true });
reportScrollDepth(); // count short pages that are fully visible on load
```
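The time-on-page and outbound-click signals from the table above use the same beacon pattern. A minimal sketch, assuming a shared `/api/analytics/engagement` endpoint — the endpoint name and payload shape are our own choices, not anything Astro or Cloudflare prescribes:

```javascript
// Time-on-page milestones, each sent once.
const TIME_MARKS = [5, 30, 60, 120]; // seconds
const start = Date.now();
const sentTime = new Set();

setInterval(() => {
  const elapsed = Math.round((Date.now() - start) / 1000);
  TIME_MARKS.forEach((mark) => {
    if (elapsed >= mark && !sentTime.has(mark)) {
      sentTime.add(mark);
      navigator.sendBeacon('/api/analytics/engagement', JSON.stringify({
        path: window.location.pathname, event_type: 'time', value: mark
      }));
    }
  });
}, 5000);

// Outbound clicks: any link pointing off-site counts as a referral.
document.addEventListener('click', (e) => {
  const link = e.target.closest('a[href]');
  if (link && new URL(link.href, location.href).host !== location.host) {
    navigator.sendBeacon('/api/analytics/engagement', JSON.stringify({
      path: window.location.pathname, event_type: 'outbound', value: 1, target: link.href
    }));
  }
});
```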
Conversion Tracking
We track two conversion types from blog content:
1. **GitHub stars** — Clicks to `github.com/nousresearch/ai-kit` with `?ref=blog-{slug}` parameter
2. **API sign-ups** — When a visitor hits `/docs/getting-started` within 7 days of reading a blog post
This attribution lives in a `conversions` table keyed by slug:
```sql
CREATE TABLE conversions (
slug TEXT PRIMARY KEY,
github_stars INTEGER DEFAULT 0,
api_signups INTEGER DEFAULT 0,
last_attributed DATE
);
```
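The writes themselves go through a small Worker endpoint. Here is a sketch of what that upsert can look like — the `recordConversion` helper, the `{ slug, type }` payload, and the client-side beacons that would call it are assumptions, not the exact production code:

```javascript
// Sketch of a conversion endpoint: the client beacons { slug, type } when a
// ref-tagged GitHub link is clicked or the getting-started page is reached.
export async function recordConversion(request, env) {
  const { slug, type } = await request.json();
  // `type` is constrained to two known values, so the column name is safe
  // to interpolate into the SQL string.
  const column = type === 'github' ? 'github_stars' : 'api_signups';
  await env.DB.prepare(
    `INSERT INTO conversions (slug, ${column}, last_attributed)
     VALUES (?1, 1, date('now'))
     ON CONFLICT (slug) DO UPDATE SET
       ${column} = ${column} + 1,
       last_attributed = date('now')`
  ).bind(slug).run();
  return new Response(null, { status: 204 });
}
```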
Search Console Integration
A weekly cron Worker syncs keyword performance from Google Search Console:
```javascript
// Entry point for the weekly cron trigger configured in wrangler.toml.
export default {
  async scheduled(event, env, ctx) {
    ctx.waitUntil(syncKeywords(env));
  }
};

async function syncKeywords(env) {
  // Search Analytics expects absolute YYYY-MM-DD dates, not relative ones.
  const end = new Date();
  const start = new Date(end);
  start.setDate(start.getDate() - 28);
  const fmt = (d) => d.toISOString().slice(0, 10);

  // URL-prefix property; use `sc-domain:ai-kit.net` for a domain property.
  const siteUrl = encodeURIComponent('https://ai-kit.net/');
  const response = await fetch(
    `https://searchconsole.googleapis.com/webmasters/v3/sites/${siteUrl}/searchAnalytics/query`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${await getGSCAccessToken(env)}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        startDate: fmt(start),
        endDate: fmt(end),
        dimensions: ['page', 'query'],
        rowLimit: 5000
      })
    }
  );
  const data = await response.json();

  const stmt = env.DB.prepare(
    `INSERT INTO keyword_performance (slug, keyword, impressions, clicks, position, date)
     VALUES (?1, ?2, ?3, ?4, ?5, date('now'))
     ON CONFLICT (slug, keyword, date) DO UPDATE SET
       impressions = ?3, clicks = ?4, position = ?5`
  );
  for (const row of data.rows ?? []) {
    const slug = extractSlug(row.keys[0]);
    await stmt.bind(slug, row.keys[1], row.impressions, row.clicks, row.position).run();
  }
}
```
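With keyword performance and page views in the same D1 database, the Content Opportunity Score the editorial team reviews each Monday boils down to a single query. A sketch over the latest sync only — the weighting and the position window are illustrative, not a fixed formula:

```sql
-- Illustrative opportunity score from the latest keyword sync: high
-- impressions at a weak-but-reachable position means a refresh or a
-- dedicated post has the biggest payoff.
SELECT
  kp.slug,
  SUM(kp.impressions) AS impressions,
  SUM(kp.clicks) AS clicks,
  ROUND(AVG(kp.position), 1) AS avg_position,
  ROUND(SUM(kp.impressions) / AVG(kp.position), 1) AS opportunity_score
FROM keyword_performance kp
WHERE kp.date = (SELECT MAX(date) FROM keyword_performance)
  AND kp.position BETWEEN 4 AND 20  -- close enough to page one to be worth chasing
GROUP BY kp.slug
ORDER BY opportunity_score DESC
LIMIT 20;
```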
Results — Metrics Showing Improvement
After six months of running this pipeline, here's what we saw:
| Metric | Before | After | Change |
|---|---|---:|---:|
| Organic traffic (monthly) | ~8,200 sessions | ~24,600 sessions | **+200%** |
| Avg. blog page position | 18.4 | 7.2 | **−11.2** |
| Content with >50% scroll depth | ~22% | ~67% | **+45pp** |
| Blog-to-signup conversion rate | 0.8% | 3.1% | **+2.3pp** |
| High-value keyword coverage | 47 keywords | 218 keywords | **+364%** |
**Biggest wins from specific data-driven decisions:**
- **Topic pruning:** Identified 14 posts with <50 views and >80% bounce rate. Consolidated or retired them, redirecting traffic to stronger content.
- **Keyword gap filling:** Search Console showed we ranked #11–15 for "LLM API proxy" but had no dedicated post. We published one — it reached #3 in 6 weeks.
- **Content refresh triggers:** Posts with declining impressions over 3 months get auto-flagged for refresh. We updated 8 posts and saw average **34% traffic recovery** within 30 days.
- **Section-level optimization:** Scroll-depth data revealed 70% of readers skipped the "prerequisites" section. Moving it to a collapsible element improved time-on-page by 22%.
**Real example — "Prompt Routing" post:**
```
Week 0: Published "How to Route LLM Prompts with AIKit"
Week 2: 340 impressions, avg position 24.1, scroll-depth 38%
Week 3: Added TL;DR summary + code example above the fold
Week 6: 2,100 impressions, avg position 8.7, scroll-depth 71%
Week 8: 8 blog-to-signup conversions attributed
```
Key Takeaways
If you're running a technical blog for an open-source project, here's what we learned:
1. **First-party analytics is the only way to go.** Third-party tools are blocked by ad blockers, slow down your site, and raise privacy concerns. Cloudflare Workers + D1 cost us ~$5/month.
2. **Connect content data to keyword data.** Raw page views tell you what's popular. Search Console keyword position tells you what's possible. Combining them — the **Content Opportunity Score** — tells you what to write next.
3. **Track engagement, not just visits.** Scroll depth and time-on-page reveal whether your content actually answers the reader's question. If they bounce at 25%, your intro needs work.
4. **Close the attribution loop.** Without knowing which posts drive sign-ups and GitHub stars, you'll optimize for traffic that doesn't convert.
5. **Automate the boring parts.** Weekly keyword syncs, engagement alerts, and content refresh flags should be Workers on cron, not manual reports; a sample refresh-flag query follows this list.
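As one example of that automation, the content-refresh flag can be a single scheduled query comparing the latest keyword sync against a baseline from roughly three months earlier. The 30% drop threshold and 90-day window here are illustrative:

```sql
-- Flag posts whose impressions in the latest sync have dropped by more than
-- 30% versus a sync from ~90 days ago.
WITH latest AS (
  SELECT slug, SUM(impressions) AS imp
  FROM keyword_performance
  WHERE date = (SELECT MAX(date) FROM keyword_performance)
  GROUP BY slug
),
baseline AS (
  SELECT slug, SUM(impressions) AS imp
  FROM keyword_performance
  WHERE date = (SELECT MIN(date) FROM keyword_performance
                WHERE date >= date('now', '-90 days'))
  GROUP BY slug
)
SELECT l.slug,
       b.imp AS impressions_then,
       l.imp AS impressions_now,
       ROUND(100.0 * (l.imp - b.imp) / b.imp, 1) AS change_pct
FROM latest l
JOIN baseline b ON b.slug = l.slug
WHERE l.imp < b.imp * 0.7
ORDER BY change_pct ASC;
```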
---
AIKit is an open-source LLM API proxy and sandbox with observability built in. Our blog runs on EmDash CMS with Astro and Cloudflare D1 — the same stack we use to dogfood this analytics approach. Fork the Worker scripts, adapt the D1 schema, and start treating your blog content as a data product.