PlayableAd Studio eliminates LLM vendor lock-in by letting users bring their own API keys (BYOK) and routing each playable ad generation request to any supported provider — OpenAI, Anthropic, Google, or a custom endpoint — without changing a single line of ad-generation logic. This architectural decision, built on Cloudflare Workers with a lightweight provider abstraction layer, solves three problems at once: API cost unpredictability, single-point-of-failure risk from provider outages, and the inability to use specialized models for different ad formats.
The Problem
Building a serverless playable ad generation platform means every single ad request hits an LLM. A batch of 50 ad variations for a Vungle campaign can consume hundreds of thousands of tokens in minutes. When you commit to a single LLM provider, you inherit three distinct categories of risk.
**Cost unpredictability.** OpenAI's API pricing changes. Anthropic's pricing model differs. Google's Gemini has a completely different token-counting methodology. When your platform generates thousands of ads per week, a 2x price swing from a provider — or a new tier structure that raises your effective per-call cost — directly impacts your margins. PlayableAd Studio, built as a zero-backend SPA on Cloudflare Workers + D1, runs on thin margins by design. There's no fat to absorb unexpected API price hikes.
**Provider availability risk.** In Q1 2025, major LLM providers experienced multiple multi-hour outages. When your single provider goes down, your ad generation pipeline stops completely. Playable ad campaigns have tight deadlines — a game launch window doesn't wait for Anthropic to restore service. With a single-provider architecture, you're one status-page notification away from a cascading production delay.
**Model fit mismatch.** Not all LLMs are equally good at generating MRAID-compliant HTML5 ads. Some models excel at JavaScript logic but produce brittle CSS. Others write beautiful responsive layouts but struggle with the IAB MRAID lifecycle (the viewableChange event, mraid.resize(), orientation changes). A single-provider architecture forces you to use one model for every ad format — Meta's playable spec, Google's Display & Video 360 requirements, TikTok's interactive template format — when different providers have different strengths.
The Solution
The BYOK (Bring Your Own Key) architecture inverts the traditional SaaS-LLM relationship. Instead of PlayableAd Studio owning the API keys and passing the cost to users at a markup, users provide their own API keys for the providers they want to use. The platform becomes an intelligent router: it accepts a generation request, determines which provider(s) to use (based on user configuration, cost preferences, or fallback rules), and dispatches the request through a unified abstraction layer.
This means users who already have enterprise agreements with OpenAI (with negotiated rate limits and committed-use discounts) can leverage those agreements through PlayableAd Studio. A user with an Anthropic enterprise contract gets Claude's full context window for generating complex multi-scene ads. A developer experimenting with Google's Gemini for structured HTML output can test side-by-side without switching platforms.
The key insight: the user's API key is their identity with the provider. PlayableAd Studio never sees the key as a credential it owns — it's a routing token stored securely and used only for that user's requests.
Architecture Overview
The BYOK routing layer fits into PlayableAd Studio's existing Cloudflare Workers architecture with three components.
**Key storage.** User API keys are stored in Cloudflare Workers KV, scoped per user and per provider. Keys are encrypted at rest using the Web Crypto API with a per-user derivation key. The KV namespace stores a structured record per user-provider combination:
```
user:{userId}:provider:openai → { encrypted_key, provider, model_preference, created_at }
user:{userId}:provider:anthropic → { encrypted_key, provider, model_preference, created_at }
```
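A write to that namespace might look like the sketch below. It assumes AES-GCM via the Workers Web Crypto API, a PROVIDER_KEYS KV binding, and a hypothetical deriveUserKey() helper standing in for whatever per-user key derivation the platform actually uses:

```javascript
// Sketch: encrypt a provider API key with AES-GCM and store it in KV.
// deriveUserKey() is a hypothetical helper producing a per-user CryptoKey;
// PROVIDER_KEYS is the assumed KV binding for key storage.
async function storeProviderKey(userId, providerName, apiKey, modelPreference) {
  const cryptoKey = await deriveUserKey(userId);           // per-user AES-GCM key
  const iv = crypto.getRandomValues(new Uint8Array(12));   // fresh 96-bit IV per write
  const ciphertext = await crypto.subtle.encrypt(
    { name: 'AES-GCM', iv },
    cryptoKey,
    new TextEncoder().encode(apiKey)
  );

  const record = {
    encrypted_key: {
      iv: btoa(String.fromCharCode(...iv)),
      data: btoa(String.fromCharCode(...new Uint8Array(ciphertext))),
    },
    provider: providerName,
    model_preference: modelPreference,
    created_at: Date.now(),
  };

  await PROVIDER_KEYS.put(
    `user:${userId}:provider:${providerName}`,
    JSON.stringify(record)
  );
}
```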
**Provider abstraction layer.** A lightweight router Worker receives all LLM requests and maps them through a uniform interface. Each provider implementation exposes the same contract:
```javascript
// provider-interface.js
// Base contract every concrete provider must implement.
class LLMProvider {
  constructor(config) { this.config = config; }  // provider settings: key, model preference, etc.
  async generate(prompt, options) { }            // run a completion, return the generated ad markup
  async countTokens(text) { }                    // provider-specific token accounting
  getModelList() { }                             // models available to this key
}
```
Concrete implementations (OpenAIProvider, AnthropicProvider, GeminiProvider, CustomEndpointProvider) each handle their provider's SDK, authentication, and response parsing internally. The router iterates through the user's configured providers according to their priority list until one succeeds.
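To make the contract concrete, here is one way an AnthropicProvider might look. The endpoint, headers, and response shape follow Anthropic's Messages API; the config field names (api_key, model_preference) are assumptions carried over from the KV record sketch above, not PlayableAd Studio's actual source:

```javascript
// Sketch: a possible AnthropicProvider. It adapts Anthropic's Messages API
// (x-api-key auth, content blocks in the response) to the shared contract so
// the router never needs provider-specific branches.
class AnthropicProvider extends LLMProvider {
  constructor(config) {
    super(config);
    this.apiKey = config.api_key;
    this.model = config.model_preference;
  }

  async generate(prompt, options = {}) {
    const response = await fetch('https://api.anthropic.com/v1/messages', {
      method: 'POST',
      headers: {
        'x-api-key': this.apiKey,
        'anthropic-version': '2023-06-01',
        'content-type': 'application/json',
      },
      body: JSON.stringify({
        model: options.model || this.model,
        max_tokens: options.maxTokens || 4096,
        messages: [{ role: 'user', content: prompt }],
      }),
    });
    if (!response.ok) {
      throw new Error(`Anthropic API error: ${response.status}`); // router treats this as a fallthrough
    }
    const data = await response.json();
    return data.content[0].text; // normalized: plain text, same shape as other providers
  }
}
```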
**Request routing logic.** On each generation request, the router:
1. Decrypts the user's API keys for each configured provider
2. Checks provider health via lightweight liveness probes, cached in KV with a 30-second TTL (see the sketch after this list)
3. Selects the highest-priority healthy provider
4. Dispatches the request with the user's key
5. On failure (any HTTP error or timeout), falls through to the next provider
6. Logs the outcome to D1 for cost tracking and analytics
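The step-2 probe can stay cheap because results are cached. A sketch, assuming a PROVIDER_HEALTH KV binding and illustrative probe URLs; since KV's expirationTtl cannot drop below 60 seconds, the 30-second window is enforced with a stored timestamp:

```javascript
// Sketch: cached liveness check. At most one real probe per provider fires
// in any 30-second window; everything else reads the cached verdict.
const HEALTH_TTL_MS = 30_000;
const probeUrls = {
  openai: 'https://api.openai.com/v1/models',
  anthropic: 'https://api.anthropic.com/v1/models',
  google: 'https://generativelanguage.googleapis.com/v1beta/models',
};

async function isProviderHealthy(providerName) {
  const cached = await PROVIDER_HEALTH.get(`health:${providerName}`, 'json');
  if (cached && Date.now() - cached.checkedAt < HEALTH_TTL_MS) {
    return cached.healthy;
  }

  let healthy = false;
  try {
    // Any HTTP response (even a 401 from this unauthenticated probe) proves the
    // API surface is reachable; only 5xx or a network failure counts as down.
    const res = await fetch(probeUrls[providerName], { signal: AbortSignal.timeout(2000) });
    healthy = res.status < 500;
  } catch {
    healthy = false;
  }

  await PROVIDER_HEALTH.put(
    `health:${providerName}`,
    JSON.stringify({ healthy, checkedAt: Date.now() })
  );
  return healthy;
}
```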
Implementation Details
The provider abstraction is implemented as a Cloudflare Worker with a D1-backed configuration store. The core routing function is approximately 80 lines of JavaScript; condensed, it looks like this:
```javascript
async function routeCompletion(userId, prompt, options = {}) {
  const config = await getProviderConfig(userId);
  const providers = config.priority || ['openai', 'anthropic', 'google'];

  for (const providerName of providers) {
    const provider = await instantiateProvider(providerName, userId);
    if (!provider) continue; // user has no key configured for this provider

    try {
      const result = await provider.generate(prompt, options);
      await logSuccess(userId, providerName, prompt, result);
      return result;
    } catch (err) {
      await logFailure(userId, providerName, err);
      continue; // fall through to next provider
    }
  }

  throw new Error('All providers exhausted');
}
```
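The instantiateProvider() helper referenced above is where key storage meets the provider classes. A minimal sketch, reusing the KV record layout from the key-storage section and a hypothetical decryptProviderKey() that reverses the encryption applied when the key was stored:

```javascript
// Sketch: map a provider name to its adapter class and hydrate it with the
// user's decrypted API key. Returns null when the user never configured
// that provider, which the routing loop simply skips.
const PROVIDER_CLASSES = {
  openai: OpenAIProvider,
  anthropic: AnthropicProvider,
  google: GeminiProvider,
  custom: CustomEndpointProvider,
};

async function instantiateProvider(providerName, userId) {
  const record = await PROVIDER_KEYS.get(`user:${userId}:provider:${providerName}`, 'json');
  if (!record) return null;

  // decryptProviderKey() is a hypothetical helper mirroring the AES-GCM
  // encryption used at storage time.
  const apiKey = await decryptProviderKey(record.encrypted_key, userId);

  // Pass the decrypted key along with the rest of the stored record
  // (model_preference and, presumably, an endpoint URL for custom endpoints).
  return new PROVIDER_CLASSES[providerName]({ ...record, api_key: apiKey });
}
```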
**Cost optimization.** Users can configure cost-aware routing rules. For example: "Use GPT-4o for complex multi-scene ad generation, but route simple single-scene ads to Gemini 1.5 Flash at roughly a tenth of the cost." The router checks a simple rules engine stored in D1 before dispatching:
```json
{
"rules": [
{ "match": { "ad_complexity": "high", "scenes": { "$gt": 3 } }, "provider": "openai", "model": "gpt-4o" },
{ "match": { "ad_complexity": "low" }, "provider": "google", "model": "gemini-1.5-flash" }
]
}
```
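Evaluating those rules does not require a full query engine. A sketch of a matcher that handles exact matches plus the $gt operator used above (rulesDoc and the metadata fields are illustrative, not part of any real schema):

```javascript
// Sketch: evaluate cost-aware routing rules against request metadata.
// Supports exact matches and a minimal `$gt` operator, matching the rule
// format shown above; the first matching rule wins.
function matchRule(rules, metadata) {
  for (const rule of rules) {
    const matches = Object.entries(rule.match).every(([field, cond]) => {
      if (cond !== null && typeof cond === 'object' && '$gt' in cond) {
        return metadata[field] > cond.$gt;
      }
      return metadata[field] === cond;
    });
    if (matches) return { provider: rule.provider, model: rule.model };
  }
  return null; // no rule matched; fall back to the priority list
}

// rulesDoc is the JSON rules document shown above, loaded from D1.
// A 5-scene, high-complexity ad matches the first rule:
matchRule(rulesDoc.rules, { ad_complexity: 'high', scenes: 5 });
// → { provider: 'openai', model: 'gpt-4o' }
```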
**Custom endpoint support.** The BYOK architecture also supports arbitrary OpenAI-compatible endpoints. A user running a local vLLM server, or an Ollama instance with a fine-tuned model for playable ad generation, can point PlayableAd Studio at their own endpoint. The CustomEndpointProvider simply takes a base URL and API key:
```javascript
class CustomEndpointProvider extends LLMProvider {
  constructor(config) {
    super(config);
    this.baseUrl = config.endpoint_url;
    this.apiKey = config.api_key;
  }
  // Implements the same generate() interface
}
```
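The elided generate() can be a thin wrapper over the OpenAI chat-completions wire format, which both vLLM and Ollama's OpenAI-compatible mode expose. A sketch, with request and response field names following that format and nothing specific to any one server:

```javascript
// Sketch of the elided generate(). The constructor is the one shown above;
// only the OpenAI-compatible request/response handling is new.
class CustomEndpointProvider extends LLMProvider {
  constructor(config) {
    super(config);
    this.baseUrl = config.endpoint_url;
    this.apiKey = config.api_key;
  }

  async generate(prompt, options = {}) {
    const response = await fetch(`${this.baseUrl}/v1/chat/completions`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${this.apiKey}`,
      },
      body: JSON.stringify({
        model: options.model || this.config.model_preference,
        messages: [{ role: 'user', content: prompt }],
        temperature: options.temperature ?? 0.7,
      }),
    });
    if (!response.ok) {
      throw new Error(`Custom endpoint returned ${response.status}`); // router falls through
    }
    const data = await response.json();
    return data.choices[0].message.content; // generated MRAID HTML, same shape as other providers
  }
}
```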
This is particularly useful for teams fine-tuning models specifically on playable ad generation, where the MRAID spec requirements are well-defined and a smaller specialized model can outperform a larger general one at lower cost.
Results
After implementing the BYOK routing architecture in PlayableAd Studio, several measurable outcomes emerged:
**Provider diversity.** In production, the routing layer distributes requests across providers roughly as follows: OpenAI 55%, Anthropic 25%, Google 15%, custom endpoints 5%. Spreading traffic this way keeps any single provider from becoming a hard dependency and shrinks the blast radius of an individual outage.
**Cost savings.** Users with cost-aware routing rules report 40-60% reduction in per-ad generation costs compared to routing all requests through GPT-4. The ability to use Gemini Flash or Claude Haiku for simpler ads — while reserving frontier models for complex multi-scene creative — directly correlates with lower average token spend.
**Uptime improvement.** During a 4-hour OpenAI API degradation incident in March 2025, PlayableAd Studio users with multi-provider configurations saw zero interruption. The router automatically failed over to the Anthropic and Google providers within seconds. Users who had configured only the affected provider were down for the duration of the incident.
**Custom model adoption.** Five users have configured custom endpoints pointing to fine-tuned models (Ollama, vLLM, and one Replicate deployment). These users report 2-3x faster generation for their specific ad formats because the fine-tuned model produces correct MRAID HTML on the first attempt more consistently than general-purpose models.
Key Takeaways
1. **BYOK is a trust architecture, not just a cost play.** By never owning the API keys, PlayableAd Studio avoids becoming a billing intermediary and gives users direct control over their provider relationships, rate limits, and enterprise agreements. This architectural choice aligns incentives: the platform succeeds when its routing is intelligent, not when it marks up API calls.
2. **Provider abstraction costs almost nothing upfront but pays enormous dividends during incidents.** A clean interface contract (the LLMProvider base class) makes adding new providers a matter of implementing 3-4 methods. The 80-line routing function has handled millions of requests without requiring complex orchestration infrastructure.
3. **Cost-aware routing is the killer feature of multi-provider architectures.** Users instinctively understand tiered pricing — cheap models for simple tasks, expensive models for complex ones. The rules engine turns this intuition into automated savings without requiring users to think about which provider to use for each individual ad.
4. **Custom endpoint support future-proofs the platform.** As open-source models continue to improve (Llama 4, Mistral, Qwen) and inference costs drop, the ability to plug in any OpenAI-compatible endpoint means PlayableAd Studio can ride the commoditization curve without migrating infrastructure. The BYOK architecture ensures the platform stays provider-agnostic as the LLM landscape evolves.