PlayableAd Studio ships complete, MRAID-compliant playable ads in 30 seconds using a serverless stack built entirely on Cloudflare Workers, D1, KV, and Workers AI — replacing multi-day agency production cycles with a multi-tenant API that costs $0.004 per generation.
The Problem
Building a playable ad generation platform that serves multiple customers simultaneously is a tangle of tradeoffs between latency, cost, and multi-tenant isolation.
**Why traditional compute fails.** A typical generation pipeline requires an LLM call for ad copy, a template compiler that renders output into an HTML5 canvas bundle, MRAID wrapper injection for ad network compliance, and ZIP compression. On EC2 or a VPS, this means provisioning instances that sit idle 90% of the time and managing autoscaling groups that take 3-5 minutes to spin up during traffic spikes.
**The multi-tenant complexity.** Each customer has their own campaigns, templates, API keys, rate limits, and billing tiers. A shared database creates a noisy-neighbor problem where one customer's bulk generation slows everyone's dashboard queries. Fully isolated per-customer databases double costs. Neither extreme works for a product serving indie developers and enterprise UA teams alike.
**The state management problem.** A single "generate" request triggers LLM inference, asset download, template compilation, MRAID wrapping, CDN upload, and campaign record creation. Without real-time deduplication, a client that retries after a network hiccup triggers the whole pipeline again and creates duplicate "ghost" campaigns that consume D1 storage and confuse customers.
**Launching without downtime.** Traditional blue-green deployments on Kubernetes require cluster management overhead that a two-person team cannot sustain while also building the product.
The Solution
PlayableAd Studio chose Cloudflare Workers as the entire compute layer — not as a sidecar or CDN cache, but as the primary runtime for ad generation, API serving, asset compilation, and LLM inference. D1 serves as the relational store, KV handles real-time request deduplication, and Workers AI runs server-side inference.
**Why Workers over EC2 or Vercel.** Workers removed scaling warm-up from the picture. The same Worker that handles one request per day handles 10,000 concurrent requests with zero operator action. EC2 autoscaling groups need 3-5 minutes to warm up during bursts, and Vercel's serverless functions impose a 60-second timeout, which leaves little headroom when the LLM call alone can spike to 45 seconds and the full pipeline has a P95 of 52 seconds. Workers also offer sub-5ms cold starts, versus 200-800ms for Lambda, and native WebSocket support for live previews.
Architecture Overview
The system has four layers:
```
               ┌──────────────────────┐
               │  Cloudflare Workers  │
               │  (API + Generation)  │
               └──────┬───────┬───────┘
                      │       │
         ┌────────────┘       └────────────┐
         │                                 │
         ▼                                 ▼
┌─────────────────┐               ┌──────────────────┐
│   D1 Database   │               │    Workers AI    │
│   (Campaigns,   │               │ (LLM Inference)  │
│    Templates,   │               │                  │
│    Tenants)     │               └──────────────────┘
└─────────────────┘
         │
         ▼
┌─────────────────┐
│  Cloudflare KV  │
│ (Request Dedup, │
│  Session Cache) │
└─────────────────┘
```
1. **API Gateway Worker** — Routes requests by tenant, validates API keys against D1, enforces rate limits via KV counters.
2. **Generation Worker** — Orchestrates LLM calls, template compilation, asset bundling, and MRAID wrapping. Fully stateless.
3. **D1 Database** — Multi-tenant relational store for campaigns, templates, tenants, and generation logs.
4. **KV Namespace** — Real-time dedup keys with 60-second TTL, API key caching, and rate-limit counters.
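The rate-limit counters mentioned in the list above could look roughly like the following fixed-window sketch. The name `rateLimitCheck` matches the helper called by the generation endpoint, but the key format, the 60-second window, and the `now` parameter are assumptions made for illustration. KV has no atomic increment and limits writes to a single key to about one per second, so this should be read as an approximate limiter, not an exact one.

```javascript
// Fixed-window request counter on a KV-style store.
// Assumption: one counter key per tenant per 60-second window.
async function rateLimitCheck(kv, tenantId, now = Date.now()) {
  const windowId = Math.floor(now / 60000);
  const key = `rl:${tenantId}:${windowId}`;

  // KV returns null for a missing key, so start from zero.
  const current = parseInt((await kv.get(key)) ?? '0', 10);
  const next = current + 1;

  // Read-modify-write is not atomic in KV: concurrent requests may undercount.
  // The TTL only needs to outlive the window; 120 seconds is an arbitrary choice.
  await kv.put(key, String(next), { expirationTtl: 120 });
  return next;
}
```

The caller compares the returned count with the tenant's `rate_limit` value and responds with 429 when the count is higher.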
Implementation
D1 Schema: Multi-Tenant Campaign Management
```sql
CREATE TABLE tenants (
  id TEXT PRIMARY KEY,
  name TEXT NOT NULL,
  tier TEXT NOT NULL DEFAULT 'free',
  api_key_hash TEXT NOT NULL,
  rate_limit INTEGER NOT NULL DEFAULT 60,
  created_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE TABLE templates (
  id TEXT PRIMARY KEY,
  tenant_id TEXT NOT NULL,
  name TEXT NOT NULL,
  engine TEXT NOT NULL DEFAULT 'html5canvas',
  config JSON NOT NULL,
  active BOOLEAN NOT NULL DEFAULT 1,
  created_at TEXT NOT NULL DEFAULT (datetime('now')),
  FOREIGN KEY (tenant_id) REFERENCES tenants(id)
);

CREATE TABLE campaigns (
  id TEXT PRIMARY KEY,
  tenant_id TEXT NOT NULL,
  template_id TEXT NOT NULL,
  name TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'draft',
  ad_network TEXT NOT NULL,
  metadata JSON,
  generation_ms INTEGER,
  created_at TEXT NOT NULL DEFAULT (datetime('now')),
  FOREIGN KEY (tenant_id) REFERENCES tenants(id),
  FOREIGN KEY (template_id) REFERENCES templates(id)
);

CREATE INDEX idx_campaigns_tenant ON campaigns(tenant_id);
CREATE INDEX idx_campaigns_status ON campaigns(status);
```
Every query includes `WHERE tenant_id = ?` — D1's B-tree indexes handle this efficiently at 50K+ campaigns per tenant. Per-database read replicas isolate read-heavy operations from write-heavy generation.
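To make the tenant-scoping rule concrete, here is a minimal sketch of a query helper built on the D1 binding's `prepare`, `bind`, and `all` methods. The helper name `listCampaigns`, its parameters, and the default limit of 50 are hypothetical. The point is only that `tenant_id` is always the first bound parameter and never comes from user-controlled SQL.

```javascript
// List a tenant's campaigns, optionally filtered by status.
// `db` is a D1 binding such as env.DB; listCampaigns is a hypothetical helper.
async function listCampaigns(db, tenantId, status = null, limit = 50) {
  let sql =
    'SELECT id, name, status, ad_network, created_at ' +
    'FROM campaigns WHERE tenant_id = ?';
  const params = [tenantId];

  if (status !== null) {
    sql += ' AND status = ?';
    params.push(status);
  }

  sql += ' ORDER BY created_at DESC LIMIT ?';
  params.push(limit);

  // all() resolves to an object whose `results` array holds the rows.
  const { results } = await db.prepare(sql).bind(...params).all();
  return results;
}
```

A dashboard route might call `listCampaigns(env.DB, tenantId, 'draft')` after the tenant has been authenticated.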
KV for Real-Time Dedup
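```javascript
// Returns the cached result for an identical recent request, or runs the
// pipeline once and caches its result.
async function dedupeGeneration(env, tenantId, requestHash, body) {
  const dedupKey = `gen:${tenantId}:${requestHash}`;
  const existing = await env.KV.get(dedupKey, 'json');
  if (existing) return { ...existing, cached: true };

  const campaignId = crypto.randomUUID();
  const result = await runGeneration(env, tenantId, body, campaignId);
  const payload = { campaignId, url: result.downloadUrl };
  // Cache the full response so a retry receives the same body as the first call.
  await env.KV.put(dedupKey, JSON.stringify(payload), { expirationTtl: 60 });
  return { ...payload, cached: false };
}
```
This prevents double-generation when a client retries a POST that succeeded server-side but whose HTTP response was lost. The 60-second TTL matches the maximum generation time. Because the key is written only after generation completes, a retry that arrives while the first request is still running is not deduplicated.
Workers API: Generation Endpoint
```javascript
// Full pipeline for one request; dedupeGeneration calls this on a cache miss.
// buildPrompt, compileAd, and storeCampaign are application helpers.
async function runGeneration(env, tenantId, body, campaignId) {
  const llmOutput = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
    prompt: buildPrompt(body)
  });
  const adBundle = await compileAd(body.template, llmOutput);
  // Expected to return the stored campaign, including its downloadUrl.
  return storeCampaign(env.DB, tenantId, body, adBundle, campaignId);
}

export default {
  async fetch(request, env) {
    const url = new URL(request.url);
    const tenantId = request.headers.get('X-Tenant-ID');
    if (!tenantId) {
      return new Response('Missing X-Tenant-ID header', { status: 400 });
    }

    const count = await rateLimitCheck(env.KV, tenantId);
    if (count > (await getRateLimit(env.DB, tenantId))) {
      return new Response('Rate limit exceeded', { status: 429 });
    }

    if (url.pathname === '/api/generate' && request.method === 'POST') {
      const body = await request.json();
      const requestHash = await hashRequest(body);
      const { cached, ...result } =
        await dedupeGeneration(env, tenantId, requestHash, body);
      // 200 for a replayed result, 201 when a new campaign was created.
      return Response.json(result, { status: cached ? 200 : 201 });
    }

    // Every code path must return a Response.
    return new Response('Not found', { status: 404 });
  }
};
```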
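The endpoint relies on a `hashRequest` helper that the post does not show. One way to write it, offered here as an assumption and not as the production implementation, is to serialize the body with sorted object keys and hash the result with SHA-256 through the Web Crypto API. Sorting keys means two payloads that differ only in key order map to the same dedup key. The `canonicalize` helper is hypothetical.

```javascript
// Serialize a JSON-compatible value with object keys sorted, so that
// {a: 1, b: 2} and {b: 2, a: 1} produce the same string.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(value[key])}`);
    return `{${entries.join(',')}}`;
  }
  // Strings, numbers, booleans, and null.
  return JSON.stringify(value);
}

// Hex-encoded SHA-256 of the canonical form, using the global Web Crypto API.
async function hashRequest(body) {
  const bytes = new TextEncoder().encode(canonicalize(body));
  const digest = await crypto.subtle.digest('SHA-256', bytes);
  return [...new Uint8Array(digest)]
    .map((byte) => byte.toString(16).padStart(2, '0'))
    .join('');
}
```

The input is assumed to come from `request.json()`, so it never contains `undefined`, functions, or other values that JSON cannot represent.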
Deployment and Rollout Strategy
The launch rollout followed four phases over two weeks:
**Phase 1 (Days 1-3) — Foundation.** Deployed API gateway Worker, D1 schema migrations, and KV namespace. Validated multi-tenant isolation with synthetic load tests.
**Phase 2 (Days 4-7) — Generation Pipeline.** Deployed LLM integration and template compiler. Workers AI handled inference natively — no GPU instances or API keys to manage. Tested with 100+ synthetic requests per minute.
**Phase 3 (Days 8-10) — Beta with Design Partners.** Five partners received API keys and dedicated Slack channels. Each ran 50+ generations. We discovered that the Worker CPU time limit (30ms of CPU per request) required moving LLM inference into a separate Worker, called through a binding, to avoid CPU starvation.
**Phase 4 (Days 11-14) — General Availability.** Promoted to production with ten design partners as case studies. Traffic splitting via Cloudflare Load Balancer: 10% canary, 90% stable. Gradual shift to 100% over 4 hours while monitoring D1 latency and KV cache hits.
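The Phase 3 change, moving LLM inference into its own Worker, can be sketched with a service binding. Everything named here is an assumption: the binding name `INFERENCE`, the placeholder URL, the `requestInference` helper, and the model identifier. In a real deployment each Worker would live in its own module with its own `export default`; they are shown together only to keep the sketch short.

```javascript
// Inference Worker: the only Worker that holds the Workers AI binding.
const inferenceWorker = {
  async fetch(request, env) {
    const { prompt } = await request.json();
    const output = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', { prompt });
    return Response.json(output);
  }
};

// Generation Worker side: call the inference Worker through a service binding.
// A service binding exposes fetch(); the hostname is a placeholder because the
// request is routed by the binding, not by DNS.
async function requestInference(env, prompt) {
  const response = await env.INFERENCE.fetch('https://inference.internal/run', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt })
  });
  if (!response.ok) {
    throw new Error(`Inference failed with status ${response.status}`);
  }
  return response.json();
}
```

With this split, the generation Worker spends its own CPU budget on compilation and bundling while it waits on the inference call as ordinary I/O.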
Results/Impact
- **Generation time:** Median 28 seconds (P95: 52 seconds) from API POST to downloadable ZIP
- **Cost per generation:** $0.0042 — Workers AI $0.0021, D1 writes $0.0003, Worker CPU $0.0018
- **Cold start:** Effectively zero: active tenants hit warm isolates, and new isolates start in under 5ms
- **D1 performance:** 12ms median read, 35ms median write at 500 QPS per tenant
- **KV cache hit rate:** 94% for rate limits, 87% for request dedup keys
- **Rollout uptime:** 100% — zero downtime during the entire phased launch
- **Scalability:** 1,200 concurrent requests during GA with zero error rate increase
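As a quick check on the cost figure above, the three components can be summed and scaled to a given volume. The per-component numbers come from the results list; the function names and any volume passed in are illustrative.

```javascript
// Per-generation cost components in USD, taken from the results list.
const COST_BREAKDOWN = { workersAi: 0.0021, d1Writes: 0.0003, workerCpu: 0.0018 };

// Sum the components, rounding to four decimals to drop floating-point noise.
function costPerGeneration(breakdown = COST_BREAKDOWN) {
  const total = Object.values(breakdown).reduce((sum, cost) => sum + cost, 0);
  return Number(total.toFixed(4));
}

// Total cost in USD for a given number of generations, rounded to cents.
function projectedCost(generations, breakdown = COST_BREAKDOWN) {
  return Number((generations * costPerGeneration(breakdown)).toFixed(2));
}
```

Under these figures `costPerGeneration()` comes to 0.0042, so a hypothetical 100,000 generations would cost about $420.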
Key Takeaways
1. **Workers are production-ready for compute-heavy workloads.** Workers AI combined with CPU-bound compilation passes fit within the 30ms CPU budget when decomposed into separate Worker bindings.
2. **D1 replaces most standalone Postgres use cases.** For a multi-tenant SaaS with 50K+ rows per tenant, D1's read replicas and B-tree indexes deliver competitive latency without managing Postgres.
3. **KV is a coordination primitive.** Short-TTL KV keys for deduplication eliminated ghost campaigns — the simplest distributed lock-free dedup mechanism we have used.
4. **Phased rollout beats big-bang releases.** The canary pattern caught two issues — LLM timeout under concurrent load and KV write contention at high QPS — that would have been production incidents in a monolithic launch.
5. **Serverless-first does not mean vendor lock-in.** The entire stack (Workers + D1 + KV + Workers AI) is abstracted behind internal interfaces. Each component can be replaced independently.
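The fifth takeaway is easier to picture with an example of an internal interface. The sketch below is an assumption about what such an abstraction might look like, not the project's actual code: a small dedup-store contract with `get` and `set`, one implementation backed by a KV binding and one in-memory implementation with the same shape, which could serve tests or stand in for another backend.

```javascript
// Hypothetical internal interface:
//   get(key) -> Promise<value | null>
//   set(key, value, ttlSeconds) -> Promise<void>

// KV-backed implementation; `kv` is a Workers KV binding.
function createKvDedupStore(kv) {
  return {
    get: (key) => kv.get(key, 'json'),
    set: (key, value, ttlSeconds) =>
      kv.put(key, JSON.stringify(value), { expirationTtl: ttlSeconds })
  };
}

// In-memory implementation with the same shape. The clock is injectable so
// that expiry can be tested without waiting.
function createMemoryDedupStore(now = () => Date.now()) {
  const entries = new Map();
  return {
    async get(key) {
      const entry = entries.get(key);
      if (!entry || entry.expiresAt <= now()) {
        entries.delete(key);
        return null;
      }
      return entry.value;
    },
    async set(key, value, ttlSeconds) {
      entries.set(key, { value, expiresAt: now() + ttlSeconds * 1000 });
    }
  };
}
```

Code that depends only on `get` and `set` does not need to change when the backing store does.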
PlayableAd Studio is live at playablead.studio. The complete open-source architecture is available on GitHub.