How PlayableAd Studio's Serverless Architecture Was Built and Launched on Cloudflare Workers

PlayableAd Studio ships complete, MRAID-compliant playable ads in about 30 seconds using a serverless stack built entirely on Cloudflare Workers, D1, KV, and Workers AI. It replaces multi-day agency production cycles with a multi-tenant API that costs roughly $0.004 per generation.

The Problem

Building a playable ad generation platform that serves multiple customers simultaneously is a tangle of tradeoffs between latency, cost, and multi-tenant isolation.

**Why traditional compute fails.** A typical generation pipeline requires an LLM call for ad copy, a template compiler that renders output into an HTML5 canvas bundle, MRAID wrapper injection for ad network compliance, and ZIP compression. On EC2 or a VPS, this means provisioning instances that sit idle 90% of the time and managing autoscaling groups that take 3-5 minutes to spin up during traffic spikes.

**The multi-tenant complexity.** Each customer has their own campaigns, templates, API keys, rate limits, and billing tiers. A shared database creates a noisy-neighbor problem where one customer's bulk generation slows everyone's dashboard queries. Fully isolated per-customer databases multiply cost and operational overhead with every tenant added. Neither extreme works for a product serving indie developers and enterprise UA teams alike.

**The state management problem.** A single "generate" request triggers LLM inference, asset download, template compilation, MRAID wrapping, CDN upload, and campaign record creation. Without deduplication, a client that retries after a network hiccup runs the whole pipeline again, creating ghost campaigns that consume D1 storage and confuse customers.

**Launching without downtime.** Traditional blue-green deployments on Kubernetes require cluster management overhead that a two-person team cannot sustain while also building the product.

The Solution

PlayableAd Studio chose Cloudflare Workers as the entire compute layer — not as a sidecar or CDN cache, but as the primary runtime for ad generation, API serving, asset compilation, and LLM inference. D1 serves as the relational store, KV handles real-time request deduplication, and Workers AI runs server-side inference.

**Why Workers over EC2 or Vercel.** Workers eliminated the scale-up delay entirely: the same Worker that handles one request per day handles 10,000 concurrent requests with zero operator action. EC2 autoscaling groups take 3-5 minutes to warm up during bursts, and Vercel's serverless functions impose a 60-second timeout, which leaves little headroom when the LLM call alone can spike to 45 seconds and the rest of the pipeline still has to run. Workers also offer sub-5ms cold starts versus 200-800ms for Lambda, plus native WebSocket support for live previews.

Architecture Overview

The system has four layers:

```
               ┌──────────────────────┐
               │  Cloudflare Workers  │
               │  (API + Generation)  │
               └──────┬───────┬───────┘
                      │       │
         ┌────────────┘       └────────────┐
         │                                 │
         ▼                                 ▼
┌─────────────────┐               ┌──────────────────┐
│   D1 Database   │               │    Workers AI    │
│   (Campaigns,   │               │  (LLM Inference) │
│    Templates,   │               │                  │
│    Tenants)     │               └──────────────────┘
└─────────────────┘
         │
         ▼
┌─────────────────┐
│  Cloudflare KV  │
│ (Request Dedup, │
│  Session Cache) │
└─────────────────┘
```

1. **API Gateway Worker** — Routes requests by tenant, validates API keys against D1, enforces rate limits via KV counters.

2. **Generation Worker** — Orchestrates LLM calls, template compilation, asset bundling, and MRAID wrapping. Fully stateless.

3. **D1 Database** — Multi-tenant relational store for campaigns, templates, tenants, and generation logs.

4. **KV Namespace** — Real-time dedup keys with 60-second TTL, API key caching, and rate-limit counters.
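
The KV rate-limit counters in layers 1 and 4 could look roughly like the sketch below. It assumes a fixed one-minute window keyed by tenant and minute, and a KV-like object exposing `get` and `put`. The name `rateLimitCheck` matches the helper called in the generation endpoint, but the body, the `rl:` key format, and the optional `now` parameter are illustrative assumptions rather than the production code.

```javascript
// Hypothetical fixed-window counter on a KV-like store (get/put only).
// Returns the number of requests seen for this tenant in the current minute.
async function rateLimitCheck(kv, tenantId, now = Date.now()) {
  const windowId = Math.floor(now / 60_000);      // one bucket per minute
  const key = `rl:${tenantId}:${windowId}`;       // assumed key format
  const current = Number(await kv.get(key)) || 0; // a missing key counts as 0
  const next = current + 1;
  // KV has no atomic increment, so concurrent requests can undercount.
  // The TTL lets stale windows expire on their own.
  await kv.put(key, String(next), { expirationTtl: 120 });
  return next;
}

// In-memory stand-in for a KV namespace, for local experiments only.
function makeMemoryKV() {
  const store = new Map();
  return {
    async get(key) { return store.has(key) ? store.get(key) : null; },
    async put(key, value) { store.set(key, value); },
  };
}
```

Because KV is eventually consistent, a counter like this is approximate. It suits soft limits, not strict quota enforcement.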

Implementation

D1 Schema: Multi-Tenant Campaign Management

```sql
CREATE TABLE tenants (
  id TEXT PRIMARY KEY,
  name TEXT NOT NULL,
  tier TEXT NOT NULL DEFAULT 'free',
  api_key_hash TEXT NOT NULL,
  rate_limit INTEGER NOT NULL DEFAULT 60,
  created_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE TABLE templates (
  id TEXT PRIMARY KEY,
  tenant_id TEXT NOT NULL,
  name TEXT NOT NULL,
  engine TEXT NOT NULL DEFAULT 'html5canvas',
  config JSON NOT NULL,
  active BOOLEAN NOT NULL DEFAULT 1,
  created_at TEXT NOT NULL DEFAULT (datetime('now')),
  FOREIGN KEY (tenant_id) REFERENCES tenants(id)
);

CREATE TABLE campaigns (
  id TEXT PRIMARY KEY,
  tenant_id TEXT NOT NULL,
  template_id TEXT NOT NULL,
  name TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'draft',
  ad_network TEXT NOT NULL,
  metadata JSON,
  generation_ms INTEGER,
  created_at TEXT NOT NULL DEFAULT (datetime('now')),
  FOREIGN KEY (tenant_id) REFERENCES tenants(id),
  FOREIGN KEY (template_id) REFERENCES templates(id)
);

CREATE INDEX idx_campaigns_tenant ON campaigns(tenant_id);
CREATE INDEX idx_campaigns_status ON campaigns(status);
```

Every query includes `WHERE tenant_id = ?`, which the `idx_campaigns_tenant` B-tree index serves efficiently at 50K+ campaigns per tenant. Per-database read replicas isolate read-heavy operations from write-heavy generation.
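
As an illustration of the tenant-scoping rule, the sketch below wraps a D1 read so the `tenant_id` filter cannot be forgotten. `listCampaigns` and its `limit` parameter are hypothetical names that do not appear in the original. The D1 client methods it relies on are `prepare`, `bind`, and `all`.

```javascript
// Hypothetical helper: every campaign read goes through a tenant-scoped statement.
async function listCampaigns(db, tenantId, status, limit = 50) {
  if (!tenantId) throw new Error('tenantId is required');
  const sql =
    'SELECT id, name, status, ad_network, created_at ' +
    'FROM campaigns WHERE tenant_id = ?1 AND status = ?2 ' +
    'ORDER BY created_at DESC LIMIT ?3';
  // Bound parameters keep tenant input out of the SQL text.
  const { results } = await db.prepare(sql).bind(tenantId, status, limit).all();
  return results;
}
```

Inside a Worker this might be called as `await listCampaigns(env.DB, tenantId, 'draft')`. A composite index on `(tenant_id, status)` would likely suit this access pattern better than two single-column indexes, though that depends on the real query mix.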

KV for Real-Time Dedup

```javascript
async function dedupeGeneration(env, tenantId, requestHash) {
  const dedupKey = `gen:${tenantId}:${requestHash}`;

  // A retry within the TTL window gets the earlier result instead of a new campaign.
  const existing = await env.KV.get(dedupKey);
  if (existing) return JSON.parse(existing);

  const campaignId = crypto.randomUUID();
  const result = await runGeneration(env, tenantId, requestHash, campaignId);

  // Cache the full response so cache hits and misses return the same shape.
  const response = { campaignId, url: result.downloadUrl };
  await env.KV.put(dedupKey, JSON.stringify(response), {
    expirationTtl: 60
  });
  return response;
}
```

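This prevents double generation when a client retries a POST that succeeded server-side but whose HTTP response was lost. The 60-second TTL matches the maximum generation time and is also the shortest expiration KV accepts. Because the key is written only after generation completes, a retry that arrives while the first request is still running is not deduplicated.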
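
The dedup key is only useful if `requestHash` is stable for semantically identical request bodies. One way to get that, assuming JSON bodies, is to serialize with sorted object keys and hash the result with SHA-256 through the Web Crypto API. The original does not show `hashRequest`, so the body below is an assumption, and `canonicalize` is a hypothetical helper.

```javascript
// Serialize a JSON-compatible value with object keys sorted, so that
// {a: 1, b: 2} and {b: 2, a: 1} produce the same string.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.keys(value)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(value[k])}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value); // strings, numbers, booleans, null
}

// Assumed implementation of hashRequest: SHA-256 over the canonical form,
// returned as lowercase hex. Uses the Web Crypto API available in Workers.
async function hashRequest(body) {
  const bytes = new TextEncoder().encode(canonicalize(body));
  const digest = await crypto.subtle.digest('SHA-256', bytes);
  return [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('');
}
```

With this approach, two requests that differ only in key order map to the same dedup key, while any change to a value produces a different one.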

Workers API: Generation Endpoint

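```javascript
export default {
  async fetch(request, env) {
    const url = new URL(request.url);

    // Tenant identity is read from a header; API-key validation is not shown here.
    const tenantId = request.headers.get('X-Tenant-ID');
    if (!tenantId) {
      return new Response('Missing X-Tenant-ID', { status: 400 });
    }

    // Compare the tenant's KV request counter with the limit stored in D1.
    const count = await rateLimitCheck(env.KV, tenantId);
    if (count > (await getRateLimit(env.DB, tenantId))) {
      return new Response('Rate limit exceeded', { status: 429 });
    }

    if (url.pathname === '/api/generate' && request.method === 'POST') {
      const body = await request.json();
      const requestHash = await hashRequest(body);

      // Dedup check, using the same key format as dedupeGeneration and
      // inlined here so the pipeline steps stay visible.
      const dedupKey = `gen:${tenantId}:${requestHash}`;
      const cached = await env.KV.get(dedupKey, 'json');
      if (cached) return Response.json(cached);

      const llmOutput = await env.AI.run('@cf/meta/llama-3.1-8b', {
        prompt: buildPrompt(body)
      });
      const adBundle = await compileAd(body.template, llmOutput);
      const campaign = await storeCampaign(env.DB, tenantId, body, adBundle);

      // Record the result so a retried POST returns it instead of regenerating.
      const result = { campaignId: campaign.id, url: campaign.downloadUrl };
      await env.KV.put(dedupKey, JSON.stringify(result), { expirationTtl: 60 });
      return Response.json(result, { status: 201 });
    }

    return new Response('Not found', { status: 404 });
  }
};
```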

Deployment and Rollout Strategy

The launch rollout followed four phases over two weeks:

**Phase 1 (Days 1-3) — Foundation.** Deployed API gateway Worker, D1 schema migrations, and KV namespace. Validated multi-tenant isolation with synthetic load tests.

**Phase 2 (Days 4-7) — Generation Pipeline.** Deployed LLM integration and template compiler. Workers AI handled inference natively — no GPU instances or API keys to manage. Tested with 100+ synthetic requests per minute.

**Phase 3 (Days 8-10) — Beta with Design Partners.** Five partners received API keys and dedicated Slack channels. Each ran 50+ generations. We discovered that Worker CPU time limits (30ms CPU per request) required splitting LLM inference into a separate Worker binding to avoid CPU starvation.

**Phase 4 (Days 11-14) — General Availability.** Promoted to production with ten design partners as case studies. Traffic splitting via Cloudflare Load Balancer: 10% canary, 90% stable. Gradual shift to 100% over 4 hours while monitoring D1 latency and KV cache hits.
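
The Phase 4 traffic shift was done with Cloudflare Load Balancer weights, not application code. Purely to illustrate the arithmetic, the sketch below assumes a linear ramp from 10% to 100% over 4 hours (the rollout is described only as gradual) and computes the canary weight at a given elapsed time. `canaryWeight` and its parameters are hypothetical.

```javascript
// Hypothetical linear canary ramp: startPct at t=0, 100% at rampMinutes.
function canaryWeight(elapsedMinutes, startPct = 10, rampMinutes = 240) {
  if (elapsedMinutes <= 0) return startPct;
  if (elapsedMinutes >= rampMinutes) return 100;
  const fraction = elapsedMinutes / rampMinutes;
  return Math.round(startPct + (100 - startPct) * fraction);
}

// Print the canary/stable split at each hour of the 4-hour window.
for (const hour of [0, 1, 2, 3, 4]) {
  const canary = canaryWeight(hour * 60);
  console.log(`hour ${hour}: canary ${canary}%, stable ${100 - canary}%`);
}
```

In practice each step would be gated on the monitored signals (D1 latency, KV cache hits) instead of advancing on the clock alone.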

Results/Impact

- **Generation time:** Median 28 seconds (P95: 52 seconds) from API POST to downloadable ZIP

- **Cost per generation:** $0.0042 — Workers AI $0.0021, D1 writes $0.0003, Worker CPU $0.0018

- **Cold start:** Effectively zero for active tenants, whose isolates stay hot; under 5ms for new ones

- **D1 performance:** 12ms median read, 35ms median write at 500 QPS per tenant

- **KV cache hit rate:** 94% for rate limits, 87% for request dedup keys

- **Rollout uptime:** 100% — zero downtime during the entire phased launch

- **Scalability:** 1,200 concurrent requests during GA with zero error rate increase
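
The cost breakdown above can be checked with a little arithmetic. The sketch below sums the three published components in integer units of $0.0001 to avoid floating-point drift, then projects a monthly bill. The volume of 100,000 generations per month is an arbitrary illustration, not a reported figure, and the function names are hypothetical.

```javascript
// Published per-generation costs, in units of $0.0001 to keep the math exact.
const COST_UNITS = { workersAI: 21, d1Writes: 3, workerCPU: 18 };
const UNITS_PER_DOLLAR = 10_000;

function totalUnits() {
  return Object.values(COST_UNITS).reduce((sum, units) => sum + units, 0);
}

// Cost of a single generation in dollars.
function costPerGeneration() {
  return totalUnits() / UNITS_PER_DOLLAR;
}

// Projected spend in dollars for a hypothetical monthly volume.
function monthlyCost(generations) {
  return (totalUnits() * generations) / UNITS_PER_DOLLAR;
}

console.log(costPerGeneration().toFixed(4)); // "0.0042"
console.log(monthlyCost(100_000));           // 420
```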

Key Takeaways

1. **Workers are production-ready for compute-heavy workloads.** Workers AI combined with CPU-bound compilation passes fit within the 30ms CPU budget when decomposed into separate Worker bindings.

2. **D1 can stand in for Postgres in many multi-tenant SaaS workloads.** With 50K+ rows per tenant, D1's read replicas and B-tree indexes deliver competitive latency without the overhead of managing Postgres.

3. **KV is a coordination primitive.** Short-TTL KV keys for deduplication eliminated ghost campaigns — the simplest distributed lock-free dedup mechanism we have used.

4. **Phased rollout beats big-bang releases.** The canary pattern caught two issues — LLM timeout under concurrent load and KV write contention at high QPS — that would have been production incidents in a monolithic launch.

5. **Serverless-first does not mean vendor lock-in.** The entire stack (Workers + D1 + KV + Workers AI) is abstracted behind internal interfaces. Each component can be replaced independently.
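
The fifth takeaway mentions internal interfaces without showing them. A minimal sketch of what such a seam might look like for the dedup store follows. `createKVDedupStore`, `createMemoryDedupStore`, and the `lookup`/`remember` method names are invented for illustration and do not come from the PlayableAd Studio codebase.

```javascript
// Hypothetical dedup-store interface: lookup(key) -> value | null,
// remember(key, value, ttlSeconds) -> void. Callers never touch KV directly.

// Adapter over a Workers KV namespace binding.
function createKVDedupStore(kv) {
  return {
    async lookup(key) {
      return kv.get(key, 'json'); // resolves to null when the key is absent
    },
    async remember(key, value, ttlSeconds) {
      await kv.put(key, JSON.stringify(value), { expirationTtl: ttlSeconds });
    },
  };
}

// Drop-in replacement backed by a Map, e.g. for tests or another runtime.
function createMemoryDedupStore(now = () => Date.now()) {
  const entries = new Map();
  return {
    async lookup(key) {
      const entry = entries.get(key);
      if (!entry || entry.expiresAt <= now()) return null;
      return entry.value;
    },
    async remember(key, value, ttlSeconds) {
      entries.set(key, { value, expiresAt: now() + ttlSeconds * 1000 });
    },
  };
}
```

Code written against `lookup` and `remember` would not need to change if the KV-backed store were swapped for another backend, which is the kind of independence the takeaway describes.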

PlayableAd Studio is live at playablead.studio. The complete open-source architecture is available on GitHub.
