CCFish runs player event data through an LLM-powered clustering pipeline on Cloudflare Workers, segmenting thousands of players into behavioral cohorts in seconds without manual tagging or rule-writing.
The Problem
Manual player segmentation is a bottleneck that does not scale. Before automating, the CCFish marketing team relied on spreadsheets and SQL queries written by hand for each campaign. Every time they wanted to target "whales who haven't purchased in 7 days" or "social sharers who hit level 15," someone had to:
- Export raw event data from the game database
- Write and debug SQL to filter players by event types and thresholds
- Manually bucket players into segments based on intuition
- Repeat the entire process when campaign parameters changed
With 50,000+ monthly active players and 200+ distinct event types (login, purchase, level_up, share, tournament_join, etc.), this approach collapsed under its own weight. Segments were stale by the time they were deployed, and the team could only manage 3-4 audience segments per campaign. The result was broad, undifferentiated messaging that treated "paying users" as a single monolithic group.
The Solution
CCFish replaced manual segmentation with an automated LLM-powered pipeline that reads raw player event data, generates embedding vectors per player, clusters them into behavioral segments using unsupervised learning, and writes segment membership to KV for sub-10ms real-time lookup during campaign execution.
The system uses three core components:
1. **Player Event Vectorizer** -- A Cloudflare Worker that transforms each player's recent event history into a 384-dimensional embedding using a distilled sentence-transformer model running via ONNX Runtime on Workers
2. **HDBSCAN Clustering Engine** -- An automated clustering step that runs on a scheduled Worker every 6 hours, grouping players by behavioral similarity without requiring a preset number of clusters
3. **Segment Service API** -- A lightweight Worker endpoint that returns all segments for a given player_id in under 15ms by querying KV
This lets the CCFish marketing team run a single API call to get "all players in segment 7" and serve them tailored in-app messages, push notifications, or Telegram bot DMs without any manual data work.
Architecture
The segmentation pipeline runs entirely on Cloudflare's edge network with zero persistent servers. Here is the architecture:
```
Player Events (analytics SDK) -> D1 database (event store)
-> Cron Worker (every 6h): fetch events, vectorize, cluster
-> KV namespace: player_id -> [segment_ids]
-> D1: segment_metadata table (cluster labels, descriptions)
-> Segment Worker API: GET /segments/{player_id}
-> KV lookup (sub-10ms)
-> Campaign Worker: GET /segment/{segment_id}/players
-> D1 query for segment membership list
```
Data Flow Details
| Component | Technology | Role |
|-----------|-----------|------|
| Event store | Cloudflare D1 | Stores raw player_event rows (event_type, timestamp, metadata) |
| Vectorization | Cloudflare Workers + ONNX | Runs a distilled MiniLM-L6 model to produce player embedding vectors |
| Clustering | Cloudflare Workers cron (every 6h) | Runs HDBSCAN over the most recent 14 days of player vectors |
| Segment cache | Cloudflare KV | Stores player_id -> segment_ids mapping for low-latency reads |
| Campaign delivery | Cloudflare Workers | Reads segment membership and triggers Telegram Bot API / pushes |
The D1 database stores player events with this schema:
```sql
CREATE TABLE player_events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
player_id TEXT NOT NULL,
event_type TEXT NOT NULL,
event_ts INTEGER NOT NULL, -- unix epoch seconds
metadata TEXT, -- JSON blob
created_at INTEGER DEFAULT (unixepoch())
);
CREATE INDEX idx_player_events_player_id ON player_events(player_id);
CREATE INDEX idx_player_events_event_ts ON player_events(event_ts);
```
The KV namespace stores a JSON array of segment IDs per player:
```
Key: segment:player:{player_id}
Value: ["whale_churn_risk", "social_boosters", "tournament_regulars", "new_player_cohort"]
TTL: 86400 (24 hours, refreshed by cron)
```
Implementation
Segment rules in CCFish are defined as declarative config blocks that the clustering pipeline uses as seed hints. Here is an example segment rule configuration:
```json
{
"segment_rules": [
{
"name": "whale_churn_risk",
"description": "Players with high LTV who have not purchased in 7+ days",
"features": ["purchase_7d_count", "purchase_7d_sum", "days_since_last_purchase", "ltv_percentile", "login_7d_count"],
"min_cluster_size": 50,
"min_samples": 10
},
{
"name": "social_boosters",
"description": "Players who share frequently and invite friends",
"features": ["share_7d_count", "invite_accept_rate", "friend_count", "tournament_share_rate", "session_count"],
"min_cluster_size": 30,
"min_samples": 5
},
{
"name": "tournament_regulars",
"description": "Daily tournament participants with high engagement",
"features": ["tournament_7d_count", "tournament_avg_rank", "session_duration_avg", "login_7d_count", "purchase_7d_count"],
"min_cluster_size": 25,
"min_samples": 5
}
]
}
```
The vectorization Worker pulls the last 14 days of events for each player and constructs a feature vector. The HDBSCAN clustering then groups players into emergent segments that often surface unexpected patterns -- for example, it discovered a cluster of "high-spending social lurkers" that the team had never manually tagged: players who spend heavily but never share or interact socially.
The Segment Worker API exposes a simple interface for the campaign engine:
```typescript
// GET /segments/:playerId
interface SegmentResponse {
player_id: string;
segments: Array<{
id: string;
name: string;
description: string;
confidence: number;
}>;
generated_at: string;
}
```
Campaigns are then configured with targeting rules that reference segment names directly:
```json
{
"campaign": "whale_reactivation_june",
"target_segments": ["whale_churn_risk"],
"message": {
"type": "in_app_popup",
"title": "Welcome back!",
"body": "You've been missed. Here's 200 bonus gems for returning.",
"cta": "Collect Gems",
"deep_link": "ccfish://shop?bonus=gems200"
},
"schedule": {
"start": "2026-06-05T00:00:00Z",
"end": "2026-06-12T00:00:00Z",
"max_impressions_per_player": 1
}
}
```
Results
After deploying the automated segmentation pipeline, CCFish measured a significant improvement across key marketing metrics:
| Metric | Before (Manual) | After (Automated) | Improvement |
|--------|----------------|-------------------|-------------|
| Campaigns per quarter | 4 | 18 | +350% |
| Audience segments per campaign | 3-4 | 12-18 | +350% |
| Time to build a segment | 4-6 hours | < 15 seconds | ~99.9% faster |
| Push notification CTR | 4.2% | 8.7% | +107% |
| In-app message conversion | 6.1% | 12.3% | +102% |
| Whale churn rate (30d) | 22% | 14% | -36% |
| Player segment discovery | Manual (missed patterns) | Automated (finds hidden clusters) | N/A |
The most impactful result was the discovery of the "social lurkers" cluster -- a group that accounted for 15% of revenue but had been entirely invisible to manual segmentation because they did not trigger any single obvious rule. Targeted campaigns for this segment drove a 23% lift in average revenue per user over 30 days.
The team also eliminated segment staleness. Because the pipeline runs every 6 hours, player segments reflect current behavior, not two-week-old snapshots. A player who stops purchasing and starts sharing more is reclassified within hours, not days.
Key Takeaways
1. **Automated clustering finds segments manual rules miss.** LLM-powered vectorization captures behavioral similarity across dozens of event dimensions simultaneously, surfacing player cohorts that no single SQL filter would discover. The "social lurkers" cluster alone proved this point.
2. **Edge-native architecture keeps latency near zero.** Running the full pipeline on Cloudflare Workers, D1, and KV means segment lookups complete in milliseconds with no cold starts. The campaign engine can evaluate targeting rules for 10,000 players in under 2 seconds.
3. **Declarative segment configs make marketing teams autonomous.** With segment rules defined as JSON config and campaign targeting referencing segment names, the marketing team can launch new campaigns without opening a single SQL editor. This eliminated the engineering bottleneck that throttled campaign volume.
4. **Frequent retraining prevents segment drift.** The 6-hour cron cadence ensures segments reflect current player behavior. This is critical for mobile games where player engagement patterns shift daily based on events, tournaments, and new content releases. Stale segments are worse than no segments -- they deliver wrong messages to wrong audiences.