The Retention Problem
You launch a new game feature. Players seem to like it — chat is positive, crash reports are low. But does it actually improve retention? The only way to know is to measure, and the only way to measure correctly is A/B testing.
In early 2025, CCFish needed to test a new onboarding flow. Players were dropping off at the tutorial screen. We had two competing designs: a guided walkthrough (linear, hand-holding) and a sandbox intro (free-form, exploration-based). Each team believed in their approach. Neither could prove it without data.
The problem? Cocos Creator 2.x does not ship with an A/B testing framework. Google Firebase A/B testing expects native mobile SDK integration. Most game studios end up building custom infrastructure or skipping the test entirely.
We built our own. And it cost us about 40 hours of dev time — which paid for itself in the first week of testing.
The CCFish A/B Testing Architecture
The system is simple by design:
```
Game Start → Player Seed → Hash % 100 → Variant A (50%) / Variant B (50%)
                     ↓
              Track Events → D1 Database
                     ↓
              Dashboard: Retention by Variant
```
Step 1: Deterministic Variant Assignment
Each player gets a persistent variant based on their device ID. No server call needed — the assignment runs entirely on the client:
```lua
-- Cocos Creator Lua example
local function get_ab_variant(experiment_name)
    -- Reuse the persisted device ID, or create one on first launch
    local device_id = cc.UserDefault:getInstance():getStringForKey("device_id")
    if device_id == "" then
        device_id = generate_uuid()
        cc.UserDefault:getInstance():setStringForKey("device_id", device_id)
    end
    -- Hash experiment name + device ID so the same device always lands in
    -- the same bucket, while different experiments split independently
    local seed = experiment_name .. ":" .. device_id
    local hash = 0
    for i = 1, #seed do
        hash = (hash * 31 + string.byte(seed, i)) % 2147483647
    end
    local variant_num = hash % 100
    if variant_num < 50 then
        return "control" -- Original onboarding
    else
        return "test" -- New sandbox intro
    end
end
```
Key design decisions:
- **Pure client-side.** No network latency, no server dependency, no downtime risk.
- **Deterministic.** Same device always gets the same variant across sessions. This is critical for D7 retention measurement.
- **50/50 split by default.** Adjustable per experiment: use 10/90 for risky tests, 50/50 for maximum statistical power (a weighted-assignment sketch follows this list).
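With unequal splits, the same 0-99 hash bucket maps onto variants through cumulative weights. Here is a minimal TypeScript sketch of that mapping; `pickVariant` is an illustrative name, not the shipped plugin API:
```typescript
// Map a 0-99 hash bucket onto variants using cumulative weights.
// With variants ["control", "test"] and distribution [10, 90],
// buckets 0-9 land in control and buckets 10-99 land in test.
function pickVariant(bucket: number, variants: string[], distribution: number[]): string {
  let cumulative = 0;
  for (let i = 0; i < variants.length; i++) {
    cumulative += distribution[i];
    if (bucket < cumulative) {
      return variants[i];
    }
  }
  return variants[variants.length - 1]; // fallback if weights sum to less than 100
}

// pickVariant(7,  ["control", "test"], [10, 90]) -> "control"
// pickVariant(42, ["control", "test"], [10, 90]) -> "test"
```
Because the bucket comes from the deterministic hash, changing the weights mid-experiment reshuffles players between variants, so fix the distribution before the test starts.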
Step 2: Event Tracking
Every significant player action gets tagged with the variant:
```typescript
function trackEvent(eventType: string, metadata: object) {
  const variant = getCurrentExperimentVariant();
  const event = {
    pid: playerId,
    type: eventType,
    variant: variant,
    experiment: activeExperimentName,
    ts: Date.now(),
    ...metadata
  };
  // Fire and forget — no blocking
  fetch("https://analytics.ccfish.io/api/events", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(event),
    keepalive: true
  }).catch(() => {}); // Silent fail on offline
}
```
The `keepalive: true` flag is critical: it lets the request complete even if the player closes the game immediately after the action.
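Call sites stay one-liners. Two hypothetical examples (event names and metadata fields are illustrative, not the actual CCFish event taxonomy):
```typescript
// Hypothetical call sites; event names and fields are illustrative
trackEvent("tutorial_complete", { step: "final" });
trackEvent("first_purchase", { sku: "starter_pack", price_usd: 4.99 });
```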
Step 3: D1-Based Results
All events land in Cloudflare D1, ingested by a small HTTP endpoint (on Cloudflare, that means a Worker sitting in front of the database).
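The ingestion side fits in one handler. A minimal sketch, assuming a D1 binding named `DB` and an `events` table (both names are assumptions, not pinned down in the shipped module); in the real pipeline these rows would then feed the `player_sessions` rollup the query below reads from:
```typescript
// Minimal Cloudflare Worker sketch: accepts the JSON posted by trackEvent()
// and writes one row per event into D1. Binding and table names are assumptions.
export default {
  async fetch(request: Request, env: { DB: D1Database }): Promise<Response> {
    if (request.method !== "POST") {
      return new Response("method not allowed", { status: 405 });
    }
    const e = (await request.json()) as {
      pid: string; type: string; variant: string; experiment: string; ts: number;
    };
    await env.DB
      .prepare(
        "INSERT INTO events (player_id, event_type, variant, experiment, ts) " +
        "VALUES (?1, ?2, ?3, ?4, ?5)"
      )
      .bind(e.pid, e.type, e.variant, e.experiment, e.ts)
      .run();
    return new Response("ok");
  }
};
```
With events in the table, the analysis query is straightforward: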
```sql
SELECT
  variant,
  COUNT(DISTINCT player_id) AS total_players,
  COUNT(DISTINCT CASE WHEN day_num >= 1 THEN player_id END) AS d1_retained,
  COUNT(DISTINCT CASE WHEN day_num >= 7 THEN player_id END) AS d7_retained,
  ROUND(1.0 * COUNT(DISTINCT CASE WHEN day_num >= 1 THEN player_id END)
      / COUNT(DISTINCT player_id) * 100, 1) AS d1_rate,
  ROUND(1.0 * COUNT(DISTINCT CASE WHEN day_num >= 7 THEN player_id END)
      / COUNT(DISTINCT player_id) * 100, 1) AS d7_rate
FROM player_sessions
WHERE experiment = 'onboarding-v2'
  AND session_date >= '2025-02-01'
GROUP BY variant
```
No pandas. No notebooks. No data scientist. Just SQL and a browser.
The Results: Sandbox Intro Wins
After 14 days and 3,200 players, here is what the data showed:
| Metric | Control (Guided) | Test (Sandbox) | Delta |
|--------|-----------------|----------------|-------|
| Tutorial Completion | 68% | 82% | +14 pp |
| D1 Retention | 41% | 53% | +12 pp |
| D7 Retention | 16% | 24% | +8 pp |
| First Purchase (D1) | 3.2% | 4.8% | +1.6 pp (+50%) |
| Avg Session Length | 8.4 min | 11.2 min | +2.8 min (+33%) |
The sandbox intro was the clear winner. Players who explored freely before the tutorial understood the mechanics better, engaged longer, and converted at higher rates.
Without the A/B test, we would have shipped the guided walkthrough based on our gut feeling. That gut feeling would have cost us 12 percentage points of D1 retention.
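One sanity check worth adding: the pipeline above does not include a significance test, but a quick two-proportion z-test confirms the D1 delta is far outside noise. A minimal sketch, with sample sizes approximated from the roughly 3,200-player 50/50 split:
```typescript
// Two-proportion z-test: is the retention difference larger than chance?
function twoProportionZ(successA: number, nA: number, successB: number, nB: number): number {
  const pA = successA / nA;
  const pB = successB / nB;
  const pooled = (successA + successB) / (nA + nB); // pooled success rate
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
  return (pB - pA) / se;
}

// D1 retention from the table: 41% of ~1,600 control vs 53% of ~1,600 test
const z = twoProportionZ(656, 1600, 848, 1600);
console.log(z.toFixed(2)); // ~6.80; anything above ~1.96 is significant at p < 0.05
```
At z ≈ 6.8, the result clears any reasonable significance threshold.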
The Framework as Open Source
We extracted the A/B testing framework into a Cocos Creator plugin and open-sourced it under the MIT license. The core is a single Lua module, shipped alongside a config file and a local results dashboard:
```
ccfish-ab-test/
  ab_test.lua        # Core framework
  experiments.json   # Experiment config
  dashboard.html     # Local results viewer
```
Configuration is declarative:
```json
{
  "experiments": [
    {
      "name": "onboarding-v2",
      "variants": ["control", "test"],
      "distribution": [50, 50],
      "active": true,
      "min_sample_size": 1000,
      "metrics": ["tutorial_completion", "d1_retention", "d7_retention"]
    }
  ]
}
```
Lessons for Indie Studios
1. **Test one thing at a time.** Running 3 concurrent experiments means you cannot isolate which change caused the effect.
2. **Define the metric before the test.** D7 retention is the gold standard for mobile games. Do not optimize for session count or time spent; those can be misleading.
3. **Let the test run its course.** We saw the sandbox variant winning after 3 days but waited 14 days before shipping. Early data can flip.
4. **Document failed tests too.** We tested a reward-heavy monetization scheme that looked promising on D1 but cratered D7 retention by 30%. That failure saved us from a costly product mistake.
The Marketing Connection
Every A/B test result is a data point for marketing. The sandbox intro variant did not just improve retention — it changed our ad messaging. We now lead with "Explore, don't follow" in our App Store description and ad creatives. The marketing team uses real retention deltas in their copy. Data-driven product decisions become data-driven marketing decisions.
This is the hybrid dev plus marketing loop in action: build a tool (dev), run an experiment (dev), learn something (marketing), update the messaging (marketing), measure the install quality (dev). Repeat.