The Retention Problem

You launch a new game feature. Players seem to like it — chat is positive, crash reports are low. But does it actually improve retention? The only way to know is to measure, and the only way to measure correctly is A/B testing.

In early 2025, CCFish needed to test a new onboarding flow. Players were dropping off at the tutorial screen. We had two competing designs: a guided walkthrough (linear, hand-holding) and a sandbox intro (free-form, exploration-based). Each team believed in their approach. Neither could prove it without data.

The problem? Cocos Creator 2.x does not ship with an A/B testing framework. Google Firebase A/B testing expects native mobile SDK integration. Most game studios end up building custom infrastructure or skipping the test entirely.

We built our own. And it cost us about 40 hours of dev time — which paid for itself in the first week of testing.

The CCFish A/B Testing Architecture

The system is simple by design:

```
Game Start → Player Seed → Hash % 100 → Variant A (50%) / Variant B (50%)
        ↓
Track Events → D1 Database
        ↓
Dashboard: Retention by Variant
```

Step 1: Deterministic Variant Assignment

Each player gets a persistent variant based on their device ID. No server call needed — the assignment runs entirely on the client:

```lua
-- Cocos Creator Lua example
-- djb2-style string hash: depends on every byte of the seed, not just
-- its length, so equal-length UUIDs still spread evenly across buckets
local function hash_string(s)
    local h = 5381
    for i = 1, #s do
        h = (h * 33 + string.byte(s, i)) % 2147483647
    end
    return h
end

local function get_ab_variant(experiment_name)
    local device_id = cc.UserDefault:getInstance():getStringForKey("device_id")
    if device_id == "" then
        device_id = generate_uuid()
        cc.UserDefault:getInstance():setStringForKey("device_id", device_id)
        cc.UserDefault:getInstance():flush()
    end

    -- Salt with the experiment name so each experiment splits independently
    local seed = experiment_name .. ":" .. device_id
    local variant_num = hash_string(seed) % 100

    if variant_num < 50 then
        return "control" -- Original onboarding
    else
        return "test"    -- New sandbox intro
    end
end
```

Key design decisions:

- **Pure client-side.** No network latency, no server dependency, no downtime risk.

- **Deterministic.** Same device always gets the same variant across sessions. This is critical for D7 retention measurement.

- **50/50 split by default.** Adjustable per experiment. Use 10/90 for risky tests, 50/50 for confident results.
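The adjustable split generalizes to any weighted distribution: hash the seed into a 0–99 bucket, then walk the cumulative weights. A minimal TypeScript sketch of that bucketing logic (the `hashString` and `pickVariant` names are illustrative, not part of the shipped module):

```typescript
// djb2-style string hash, mirroring the client-side Lua assignment
function hashString(s: string): number {
  let h = 5381;
  for (let i = 0; i < s.length; i++) {
    h = (h * 33 + s.charCodeAt(i)) % 2147483647;
  }
  return h;
}

// Deterministically map a device to a variant given percentage weights
function pickVariant(
  experimentName: string,
  deviceId: string,
  variants: string[],
  distribution: number[] // percentages, must sum to 100
): string {
  const bucket = hashString(`${experimentName}:${deviceId}`) % 100;
  let cumulative = 0;
  for (let i = 0; i < variants.length; i++) {
    cumulative += distribution[i];
    if (bucket < cumulative) return variants[i];
  }
  return variants[variants.length - 1]; // guard against rounding gaps
}
```

Because the bucket depends only on the seed, re-running `pickVariant` for the same device and experiment always returns the same variant, which is what makes client-side assignment safe across sessions.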

Step 2: Event Tracking

Every significant player action gets tagged with the variant:

```typescript
function trackEvent(eventType: string, metadata: object) {
  const variant = getCurrentExperimentVariant();
  const event = {
    pid: playerId,
    type: eventType,
    variant: variant,
    experiment: activeExperimentName,
    ts: Date.now(),
    ...metadata
  };

  // Fire and forget — no blocking
  fetch("https://analytics.ccfish.io/api/events", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(event),
    keepalive: true
  }).catch(() => {}); // Silent fail on offline
}
```

The `keepalive: true` flag is critical — it lets the browser finish sending the event even if the player closes the game immediately after the action.
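Silent failure keeps the game responsive, but events fired while offline are simply lost. One mitigation is a small retry buffer, flushed on the next successful send. A sketch (in-memory here; a real client would persist the buffer across sessions, e.g. via `cc.UserDefault` — the function names are illustrative):

```typescript
// Hypothetical retry buffer for events that fail to send
const pendingEvents: object[] = [];

// Try to send immediately; on failure, park the event for later
function sendOrQueue(event: object, send: (e: object) => Promise<void>): Promise<void> {
  return send(event).catch(() => {
    pendingEvents.push(event);
  });
}

// Drain the buffer in order, stopping at the first failure
async function flushPending(send: (e: object) => Promise<void>): Promise<void> {
  while (pendingEvents.length > 0) {
    const event = pendingEvents[0];
    try {
      await send(event);
      pendingEvents.shift(); // only drop after a successful send
    } catch {
      break; // still offline; try again next session
    }
  }
}
```

Calling `flushPending` at game start (and after regaining connectivity) recovers events that would otherwise vanish, at the cost of slightly delayed timestamps.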

Step 3: D1-Based Results

All events land in Cloudflare D1. The analysis query is straightforward:

```sql
-- day_num: days since the player's install, precomputed per session row
SELECT
  variant,
  COUNT(DISTINCT player_id) AS total_players,
  COUNT(DISTINCT CASE WHEN day_num >= 1 THEN player_id END) AS d1_retained,
  COUNT(DISTINCT CASE WHEN day_num >= 7 THEN player_id END) AS d7_retained,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN day_num >= 1 THEN player_id END)
    / COUNT(DISTINCT player_id), 1) AS d1_rate,
  ROUND(100.0 * COUNT(DISTINCT CASE WHEN day_num >= 7 THEN player_id END)
    / COUNT(DISTINCT player_id), 1) AS d7_rate
FROM player_sessions
WHERE experiment = 'onboarding-v2'
  AND session_date >= '2025-02-01'
GROUP BY variant
```

No pandas. No notebooks. No data scientist. Just SQL and a browser.

The Results: Sandbox Intro Wins

After 14 days and 3,200 players, here is what the data showed:

| Metric | Control (Guided) | Test (Sandbox) | Delta |
|--------|------------------|----------------|-------|
| Tutorial Completion | 68% | 82% | +14 pp |
| D1 Retention | 41% | 53% | +12 pp |
| D7 Retention | 16% | 24% | +8 pp |
| First Purchase (D1) | 3.2% | 4.8% | +50% |
| Avg Session Length | 8.4 min | 11.2 min | +33% |

The sandbox intro was the clear winner. Players who explored freely before the tutorial understood the mechanics better, engaged longer, and converted at higher rates.

Without the A/B test, we would have shipped the guided walkthrough based on our gut feeling. That gut feeling would have cost us 12 percentage points of D1 retention.
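A delta this size is also worth checking statistically before shipping. A two-proportion z-test is enough; a sketch, with per-arm counts derived from the table assuming an even split of the 3,200 players:

```typescript
// Two-proportion z-test: is the retention difference between variants
// larger than chance alone would explain?
function twoProportionZ(
  successA: number, totalA: number,
  successB: number, totalB: number
): number {
  const pA = successA / totalA;
  const pB = successB / totalB;
  const pPool = (successA + successB) / (totalA + totalB);
  const se = Math.sqrt(pPool * (1 - pPool) * (1 / totalA + 1 / totalB));
  return (pB - pA) / se;
}

// D1 retention from the table: 41% vs 53% with ~1,600 players per arm
const z = twoProportionZ(656, 1600, 848, 1600);
// z ≈ 6.8, far above the 1.96 threshold for 95% confidence
```

At that z-score the chance of seeing a 12 pp gap from noise alone is negligible, which is why waiting out the full 14 days still ended in the same verdict.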

The Framework as Open Source

We extracted the A/B testing framework into a Cocos Creator plugin and open-sourced it under MIT license. It ships as a single Lua module:

```
ccfish-ab-test/
├── ab_test.lua        # Core framework
├── experiments.json   # Experiment config
└── dashboard.html     # Local results viewer
```

Configuration is declarative:

```json
{
  "experiments": [
    {
      "name": "onboarding-v2",
      "variants": ["control", "test"],
      "distribution": [50, 50],
      "active": true,
      "min_sample_size": 1000,
      "metrics": ["tutorial_completion", "d1_retention", "d7_retention"]
    }
  ]
}
```
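Whatever loads `experiments.json` should validate it before assignment runs, since a mis-sized distribution silently skews the split. A minimal validator sketch (the interface mirrors the config fields above; the function name is illustrative):

```typescript
// Shape of one entry in experiments.json
interface ExperimentConfig {
  name: string;
  variants: string[];
  distribution: number[];
  active: boolean;
  min_sample_size: number;
  metrics: string[];
}

// Reject configs that would corrupt the variant split
function validateExperiment(exp: ExperimentConfig): void {
  if (exp.variants.length !== exp.distribution.length) {
    throw new Error(`${exp.name}: one weight per variant required`);
  }
  const total = exp.distribution.reduce((a, b) => a + b, 0);
  if (total !== 100) {
    throw new Error(`${exp.name}: distribution must sum to 100, got ${total}`);
  }
}
```

Failing fast at load time is cheaper than discovering mid-experiment that 10% of players were never assigned a variant.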

Lessons for Indie Studios

1. **Test one thing at a time.** Running 3 concurrent experiments means you cannot isolate which change caused the effect.

2. **Define the metric before the test.** D7 retention is the gold standard for mobile games. Do not optimize for session count or time-spent — those can be misleading.

3. **Let the test run its course.** We saw the sandbox variant winning after 3 days but waited 14 days before shipping. Early data can flip.

4. **Document failed tests too.** We tested a reward-heavy monetization scheme that looked promising on D1 but cratered D7 retention by 30%. That failure saved us from a costly product mistake.
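The `min_sample_size` field in the config deserves a number behind it. A common rule of thumb for two-proportion tests (roughly 80% power at two-sided α = 0.05) is n ≈ 16·p·(1−p)/δ² players per arm; a sketch (the function name is illustrative):

```typescript
// Rule-of-thumb sample size per arm for a two-proportion test
// (~80% power, two-sided alpha = 0.05): n ≈ 16 * p * (1 - p) / delta^2
function sampleSizePerArm(baselineRate: number, minDetectableDelta: number): number {
  return Math.ceil(
    (16 * baselineRate * (1 - baselineRate)) / (minDetectableDelta * minDetectableDelta)
  );
}

// Example: baseline D1 retention of 41%, smallest lift worth detecting 5 pp
const n = sampleSizePerArm(0.41, 0.05); // ≈ 1,549 players per arm
```

This is why a config floor of 1,000 per experiment is a minimum, not a target: detecting small lifts on a 41% baseline takes about 1,500 players in each arm.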

The Marketing Connection

Every A/B test result is a data point for marketing. The sandbox intro variant did not just improve retention — it changed our ad messaging. We now lead with "Explore, don't follow" in our App Store description and ad creatives. The marketing team uses real retention deltas in their copy. Data-driven product decisions become data-driven marketing decisions.

This is the hybrid dev plus marketing loop in action: build a tool (dev), run an experiment (dev), learn something (marketing), update the messaging (marketing), measure the install quality (dev). Repeat.