The Publishing Bottleneck
When AIKit launched, publishing a single blog post required writing 1,000+ words, formatting the Portable Text JSON, running the publisher script, verifying the live URL, and updating the content calendar. At 3 posts per week, this consumed 6-8 hours of human time. Scaling to 12 posts per week would have required a full-time content manager.
Instead, we built an autonomous pipeline that handles the entire lifecycle -- from topic generation to live publication -- without human intervention. Here is how the architecture works and what it took to get there.
Pipeline Architecture
The pipeline spans four stages, each automated by a dedicated Python script and cron trigger:
```
Calendar --> Generator --> Queue --> Publisher --> Live
    ^                                    |
    +---------- Stats Feedback ----------+
```
1. **Calendar**: A markdown file listing topics, themes, and rotation rules. Updated automatically after each publishing round.
2. **Generator**: A cron-driven Python script that reads the calendar, generates body text using LLM prompts, and writes validated JSON queue files.
3. **Queue**: A directory of `NN-slug.json` files processed in alphabetical order. Each file contains title, body_text, excerpt, category, and tags (see the example after this list).
4. **Publisher**: A wrapper script that takes the first queue file, calls `blog-publisher.py` to insert into D1, archives the file, and updates the calendar.
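A minimal queue file might look like this; the field names come from the pipeline above, while the values are illustrative:

```json
{
  "title": "Automating Content Pipelines with D1",
  "body_text": "Full markdown body, 800-1500 words...",
  "excerpt": "How a cron-driven pipeline publishes posts without human review.",
  "category": "Marketing Automation",
  "tags": ["automation", "content", "d1"]
}
```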
The Generator: Where Content Happens
The generator is the most complex component. It uses a theme rotation to ensure diverse content: Content/Growth, Marketing Automation, Sales Channel, and Hybrid Dev+Marketing. Each theme rotates through five project focuses: AIKit, CCFish, AiSalonHub, PlayableAdStudio, and DeFiKit.
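One way to implement that rotation is a simple round-robin over (theme, project) pairs. This is a sketch of the idea, not the generator's actual state tracking:

```python
from itertools import cycle, product

THEMES = [
    "Content/Growth",
    "Marketing Automation",
    "Sales Channel",
    "Hybrid Dev+Marketing",
]
PROJECTS = ["AIKit", "CCFish", "AiSalonHub", "PlayableAdStudio", "DeFiKit"]

# 4 themes x 5 projects = 20 combinations before any pair repeats.
rotation = cycle(product(THEMES, PROJECTS))

theme, project = next(rotation)  # e.g. ("Content/Growth", "AIKit")
```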
Generation follows a multi-pass pattern. The first pass produces around 400-600 words on a topic. A second pass adds technical depth sections. A third pass covers implementation details and edge cases. Each pass is validated for word count and content type before the next begins.
```python
def generate_post(theme, project):
    # First pass: core content
    body = generate_first_pass(theme, project)
    if len(body.split()) < 800:
        # Second pass: technical deep-dive
        body += expand_technical_section(theme, project)
    if len(body.split()) < 800:
        # Third pass: edge cases and extensions
        body += expand_use_cases()
    return validate(body)
```
Queue Management
The queue directory holds 2-5 pending posts at all times. After the publisher successfully processes a file, it checks the queue count; if one post or fewer remains, the generator immediately creates two new ones. This just-in-time refill ensures the pipeline never stalls while keeping the queue small enough that queued content doesn't go stale.
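A sketch of the refill check, assuming the publisher and generator share the queue directory (the names and paths here are illustrative):

```python
from pathlib import Path

QUEUE_DIR = Path("queue")   # assumed location of pending NN-slug.json files
REFILL_AT = 1               # refill when 1 or fewer posts remain
REFILL_COUNT = 2            # generate 2 new posts per refill

def refill_if_needed(generate_post) -> None:
    pending = sorted(QUEUE_DIR.glob("*.json"))
    if len(pending) <= REFILL_AT:
        for _ in range(REFILL_COUNT):
            generate_post()  # writes a new queue file, validated before acceptance
```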
Each queue file goes through validation before acceptance:
```bash
# Validate JSON structure
python3 -c "import json; json.load(open('queue/NN-slug.json'))"
# Verify body_text is a string, not an array
python3 -c "import json; d=json.load(open('queue/NN-slug.json')); assert isinstance(d['body_text'], str)"
# Check word count is 800-1500
python3 -c "import json; d=json.load(open('queue/NN-slug.json')); w=len(d['body_text'].split()); assert 800<=w<=1500, f'{w} words'"
```
The Publisher: Zero-Config D1 Insertion
The publisher script converts the markdown body text to Sanity Portable Text JSON, the rich-text format EmDash uses, and generates a ULID-style ID plus a URL-safe slug. A minimal sketch of those two transforms, with hypothetical helper names (the real script also handles headings, lists, and inline marks):
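```python
import json
import re

def slugify(title: str) -> str:
    # URL-safe slug: lowercase, collapse non-alphanumerics to single hyphens
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def to_portable_text(body_text: str) -> str:
    # Simplified: each paragraph becomes one Portable Text block.
    # Real Portable Text also carries _key fields and markDefs entries.
    blocks = [
        {
            "_type": "block",
            "style": "normal",
            "markDefs": [],
            "children": [{"_type": "span", "text": para, "marks": []}],
        }
        for para in body_text.split("\n\n")
        if para.strip()
    ]
    return json.dumps(blocks)
```

With the payload prepared, the publisher executes a four-step D1 insert sequence: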
1. Insert the post into `ec_posts` with null revision references
2. Insert a content revision into `revisions` referencing the post ID
3. Update the post with the revision IDs to resolve the circular foreign key
4. Insert SEO metadata into `_emdash_seo` for search engine visibility
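Steps 2 and 3 exist because the post and its revision reference each other. A hedged sketch of the whole sequence, assuming the publisher shells out to the `wrangler` CLI and using illustrative table columns (the real EmDash schema has more fields):

```python
import subprocess

def d1(sql: str) -> None:
    # Assumption: SQL runs through the wrangler CLI;
    # "DB_NAME" is a placeholder for the real D1 binding.
    subprocess.run(
        ["npx", "wrangler", "d1", "execute", "DB_NAME", "--remote", "--command", sql],
        check=True,
    )

post_id, rev_id = "01J...", "01J...rev"  # ULID-style IDs (illustrative)

# 1. Post row with NULL revision references
d1(f"INSERT INTO ec_posts (id, slug, locale, title, revision_id) "
   f"VALUES ('{post_id}', 'my-post', 'en', 'My Post', NULL)")
# 2. Revision row referencing the post ID
d1(f"INSERT INTO revisions (id, post_id, content) VALUES ('{rev_id}', '{post_id}', '[]')")
# 3. Resolve the circular foreign key
d1(f"UPDATE ec_posts SET revision_id = '{rev_id}' WHERE id = '{post_id}'")
# 4. SEO metadata row
d1(f"INSERT INTO _emdash_seo (post_id, meta_title) VALUES ('{post_id}', 'My Post')")
```

Because step 1 writes NULL revision references, the insert order never violates the foreign key; step 3 closes the loop.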
The entire sequence takes under 2 seconds. The post is immediately live at `ai-kit.net/blog/<slug>` -- no build step, no redeploy, no cache purge needed.
Handling Failures
No pipeline runs perfectly every time. Common failure modes and their fixes:
- **Slug collision**: The D1 table has a UNIQUE(slug, locale) constraint. The publisher checks D1 before inserting and archives duplicates instead of crashing.
- **Auth errors**: Cloudflare multi-account setups need explicit `CLOUDFLARE_ACCOUNT_ID`. The publisher propagates the env var to subprocesses.
- **JSON quoting**: LLM-generated body text often contains unescaped double quotes. The generator uses `json.dump()` instead of string templates to avoid this (sketched after this list).
- **Content calendar corruption**: The calendar file accumulates line-number prefixes when `read_file()` output is written back. The pipeline checks for this before every edit.
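For the JSON-quoting failure in particular, serializing with `json.dump()` makes escaping a non-issue. A minimal sketch with illustrative values:

```python
import json

def write_queue_file(path: str, post: dict) -> None:
    # json.dump escapes embedded double quotes and control characters,
    # which an f-string template around LLM output would not.
    with open(path, "w") as f:
        json.dump(post, f, ensure_ascii=False, indent=2)

write_queue_file("queue/07-zero-config-wins.json", {
    "title": 'Why "Zero-Config" Wins',  # embedded quotes handled safely
    "body_text": "LLM-generated body...",
    "excerpt": "Short excerpt...",
    "category": "Marketing Automation",
    "tags": ["automation", "pipelines"],
})
```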
Results at 12 Posts Per Week
After running the pipeline for 4 weeks:
| Metric | Manual (3/week) | Automated (12/week) | Improvement |
|--------|----------------|-------------------|-------------|
| Posts published | 12 per month | 48 per month | 4x volume |
| Human hours | 6-8 hours/week | 0 hours | Fully automated |
| Time to publish | 15 min | <2 seconds | 450x faster |
| Error rate | ~5% (typos, formatting) | <1% (schema edge cases) | 5x improvement |
Key Takeaways
A fully automated content pipeline is achievable without a massive engineering budget. AIKit uses tools already available to any indie developer: cron, Python, D1, and markdown files. The key is designing each stage to fail safely, validating aggressively, and never trusting LLM output without type and length checks. With this pipeline, a solo developer can maintain the content output of a five-person marketing team.