CCFish wraps every experiment in a Cloudflare Workers edge config, so the game fetches which variant a player sees at session start -- no binary update needed.

## The Problem -- App Store Gatekeeping Kills Experiment Velocity

Mobile games move fast. A pricing change, a difficulty curve adjustment, a new interstitial placement -- each of these is a hypothesis that needs testing. But every binary change requires App Store review (24-72 hours), TestFlight distribution (another 24 hours for beta approval), and a forced update prompt that 20-30% of users ignore. A typical experiment cycle runs 7-14 days from hypothesis to data. For a small team like CCFish, that means 2-3 experiments per month. Unacceptably slow.

The root cause is architectural: the game client is a compiled binary. Every gameplay tweak, every pricing change, every ad placement shift gets locked behind a review queue operated by Apple and Google. These review processes exist for good reason -- they protect users from malicious updates -- but they were never designed for the iteration speed modern mobile games need. When a competitor ships a new monetization strategy on Monday, waiting until Friday for App Store approval means losing four days of learning.

And the problem compounds. A 7-day experiment cycle means you can run roughly 4-5 experiments per month. At that cadence, you can test maybe two pricing models, one difficulty adjustment, and one ad placement change. You never get to test combinations. You never get to iterate on the losers fast enough. You never build a culture of experimentation because the friction is too high.

CCFish needed a different approach: the ability to change game behavior in real time, without touching the binary, without App Store review, and without forcing players to download an update.

## The Solution -- Edge-Based Feature Flags

Cloudflare Workers + KV provide an HTTP endpoint the game calls at launch to fetch a feature flag payload.
Each flag defines a name, a list of variants, and a rollout percentage. The game interprets the active variant and adjusts behavior accordingly.

This is not a new idea -- feature flags have existed for years in web development (LaunchDarkly, Split.io, Unleash). But the mobile gaming context changes the constraints. The game client may be offline. The player session is short (2-5 minutes on average). The flag payload must be tiny (under 2KB) to avoid impacting time-to-first-interaction. And the flag system must be reliable enough that a flag fetch failure never crashes the game.

CCFish chose Cloudflare Workers because they already run their analytics dashboard on Workers. Sharing infrastructure means shared cost, shared DevOps, and shared expertise. The same team that builds the dashboard can also build the flag system.

## Architecture

```
Game client
  -> HTTP GET /flags?player_id=xxx&build=2.0.0
  -> Cloudflare Worker
  -> KV lookup (cached for 60s via edge cache)
  -> Returns JSON flag payload
```

Key design decisions:

- KV as source of truth (eventually consistent is fine for flag config)
- 60s edge cache via Cache API (not stale-while-revalidate -- we want fresh flags within a minute of a config change)
- Player ID determines variant via hash-based bucketing (consistent assignment, not random)
- Fallback to default values if the Worker is unreachable (game should degrade gracefully)

The architecture is deliberately simple. No WebSocket connections, no real-time streaming, no complex state management. The game calls a single HTTP endpoint at session start, gets back a JSON payload, and uses it for the duration of that session. If the call fails, the game uses hardcoded defaults. This keeps the system robust: the game never blocks on flag config.
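The hash-based bucketing decision above can be sketched in a few lines of TypeScript. This is an illustrative implementation, not CCFish's actual code -- the FNV-1a hash and the function names are assumptions; any stable hash works. Each player lands in a bucket from 0-99 per flag, and buckets below the rollout percentage get the treatment:

```typescript
// FNV-1a hash (assumed here; any stable string hash works). The key point
// is determinism: the same input always yields the same bucket.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Map a (player, flag) pair to a bucket in [0, 99].
function bucketFor(playerId: string, flagName: string): number {
  return fnv1a(`${playerId}:${flagName}`) % 100;
}

// Buckets below the rollout percentage get the treatment.
function assignVariant(
  playerId: string,
  flagName: string,
  rolloutPct: number,
): 'treatment' | 'control' {
  return bucketFor(playerId, flagName) < rolloutPct ? 'treatment' : 'control';
}
```

Hashing the player ID together with the flag name decorrelates buckets across flags, so a player assigned to one treatment group is not automatically in every other.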
## Implementation -- The Flag Worker

Here is a simplified version of the Worker code that fetches and resolves feature flags:

```typescript
export async function getFlags(playerId: string, build: string, env: Env) {
  const cacheKey = `flags:${build}`;

  // Try the KV-cached flag map for this build first.
  let flags = await env.FLAGS.get(cacheKey, 'json');

  if (!flags) {
    // Cache miss: load the enabled flags for this build from D1.
    const result = await env.DB.prepare(
      `SELECT flag_name, variants, default_variant, rollout_pct
         FROM feature_flags
        WHERE build = ? AND enabled = 1`
    ).bind(build).all();

    flags = buildFlagMap(result.results);

    // 60s TTL: flag config changes propagate within about a minute.
    await env.FLAGS.put(cacheKey, JSON.stringify(flags), { expirationTtl: 60 });
  }

  return resolveVariants(playerId, flags);
}
```

The caching strategy deserves explanation. KV is eventually consistent, which means a flag update might take a few seconds to propagate globally. The 60-second edge cache via the Cache API adds another layer: once a flag payload is fetched from KV, it is cached at the edge for 60 seconds. This means flag changes propagate in roughly 60 seconds worldwide -- fast enough for any experiment that cares about statistical rigor, slow enough to prevent stampeding KV on every game launch.

Player-ID hashing ensures consistent variant assignment. The same player always sees the same variant, session after session. This is critical for user experience: a player who sees the new pricing on Monday and the old pricing on Tuesday will be confused and angry. Hash-based bucketing using the player ID as a seed guarantees consistency without storing per-player assignment state.

## Step-by-Step: Running an Experiment

### Step 1: Define the flag in D1

INSERT into the feature_flags table with the flag name, two variants (control + treatment), and a 10% rollout. The flag definition includes the flag name (e.g., "starter_pack_price"), the two variants ({"control": "$2.99", "treatment": "$1.99"}), the default variant ("control"), and the rollout percentage (10, meaning 10% of players get the treatment). The remaining 90% see the control.
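For concreteness, that definition maps onto a row shaped like the following. This is a hypothetical example -- the column names match the Worker's SELECT above and the values are the ones just described, but the exact D1 schema is not shown in this post:

```typescript
// Hypothetical flag row for Step 1, using the columns the Worker's
// SELECT expects. `variants` is stored as a JSON string in D1.
const starterPackFlag = {
  flag_name: 'starter_pack_price',
  variants: JSON.stringify({ control: '$2.99', treatment: '$1.99' }),
  default_variant: 'control',
  rollout_pct: 10, // 10% of players get the treatment, 90% the control
  build: '2.0.0',
  enabled: 1,
};
```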
The D1 database stores this as a single row, indexed by build number so you can have different flags for different game versions.

### Step 2: Game client reads flags at boot

Cocos Creator's startup sequence makes a fetch call to the flag endpoint before the main menu renders -- a ~150ms async call that completes before the player sees anything.

The call happens in the loading screen. While assets are being prepared, the game fires off a single HTTP request to the Workers endpoint. If the response arrives before the main menu renders (which it almost always does), the flag payload is stored in a singleton manager object. If it does not arrive in time, the main menu renders with defaults and the flags update asynchronously when the response arrives. Either way, the player never waits on flags.

### Step 3: Track variant assignment

Each game event (level_start, iap_view, purchase) includes the active variant as a dimension. The analytics pipeline (from the dashboard architecture) slices by variant for comparison.

This is where the shared infrastructure pays off. The analytics dashboard already ingests game events. Adding a "variant" dimension to each event is a one-line change in the event schema. The dashboard can then filter, group, and compare by variant without any additional infrastructure.

### Step 4: Analyze and promote

After 7 days or 1000 players per variant (whichever comes first), compare KPIs. If the treatment wins, update the flag to 100% rollout. If not, set the rollout to 0% and delete the flag.

The analysis process is straightforward: pull the event data from the analytics pipeline, group by variant, and compare the key metrics. For a pricing experiment, the metrics might be conversion rate and average revenue per paying user. For a difficulty experiment, they might be D1 retention, D7 retention, and first-week IAP. The flag config is updated via a simple SQL UPDATE statement on the D1 database. No new binary, no App Store review, no forced update.
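The fallback rule from Step 2 -- hardcoded defaults first, server flags layered on top -- can be sketched as a plain merge. The names and default values here are illustrative, not from CCFish's codebase:

```typescript
type FlagValues = Record<string, string>;

// Hardcoded defaults shipped in the binary (illustrative values).
const DEFAULT_FLAGS: FlagValues = {
  starter_pack_price: '$2.99',
  interstitial_placement: 'between_sessions',
};

// Layer whatever the endpoint returned over the defaults. A failed or
// timed-out fetch passes null and the game runs on pure defaults, so
// the session never blocks on -- and never crashes on -- flag config.
function applyFlags(fetched: Partial<FlagValues> | null): FlagValues {
  return { ...DEFAULT_FLAGS, ...(fetched ?? {}) };
}
```

Because the merge treats the server payload as optional, a partial payload (say, one that only overrides the starter-pack price) still yields a complete config.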
## Results -- Real Experiments on CCFish

- First-item pricing A/B: "$2.99 starter pack" vs "$1.99 starter pack" -> $1.99 won with a 3.4x conversion rate (but $2.99 had higher total revenue per player in the top decile). Segmented by player tier.
- Interstitial placement: "Between sessions" vs "After first death" -> After-first-death had 22% higher ad revenue with no retention impact.
- Difficulty curve: "Standard" vs "Gentle" -> Gentle improved D7 retention by 18% but reduced first-week IAP by 11%. Segmented by player skill proxy.

These experiments reveal something important: the winning variant often depends on the player segment. The $1.99 pricing won overall on conversion, but $2.99 was better for high-value players. The gentle difficulty curve helped retention but hurt monetization from skilled players. Feature flags make segmented rollouts easy: you can target different variants to different player segments based on any criteria the game client can observe.
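A segmented rollout can be expressed as per-segment overrides on top of a flag's default variant. This is a sketch of the idea only -- the segment names and fields are assumptions, not CCFish's schema:

```typescript
type Segment = 'top_decile' | 'standard';

interface SegmentedFlag {
  defaultVariant: string;
  // Optional variant override per player segment.
  segmentOverrides: Partial<Record<Segment, string>>;
}

// An override wins when the player's segment has one; otherwise the
// player falls through to the flag's default variant.
function resolveForSegment(flag: SegmentedFlag, segment: Segment): string {
  return flag.segmentOverrides[segment] ?? flag.defaultVariant;
}

// E.g., keep the $2.99 pack for top-decile spenders while everyone
// else sees the $1.99 treatment that won on overall conversion.
const pricing: SegmentedFlag = {
  defaultVariant: '$1.99',
  segmentOverrides: { top_decile: '$2.99' },
};
```

Any signal the client can observe (player tier, a skill proxy, lifetime spend bucket) can drive the segment, since the segment is computed client-side before resolution.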
## Key Takeaways
- **Edge-based flags cut experiment cycle from 7 days to 15 minutes** -- feature flags on Cloudflare Workers
- **Player-ID hashing ensures consistent variant assignment** across sessions
- **Fallback to defaults is mandatory** -- the game must never crash on flag fetch failure
- **Same Workers infrastructure powers both analytics and flags** -- shared cost, shared DX