The Problem: Inconsistent Signal-to-Trade Conversion
DeFiKit’s Telegram mini-app delivers real-time trading signals for Solana and TON prediction games. Our pipeline—a multi-agent Bot Matrix architecture—pulls wallet data via RPC agents, computes crash-point predictions through signal agents, and pushes alerts to Telegram groups through messenger agents powered by grammY and Telegraf. The technical side worked beautifully. The conversion side did not.
Our signal-to-trade conversion rate hovered around 12–14%. Users were seeing signals, but something between “signal received” and “trade executed” was breaking down. We hypothesized that signal threshold parameters—minimum confidence score, volatility window, cooldown periods—were tuned too conservatively for most users but too aggressively for power users. The problem? Changing any of those parameters meant a full redeploy of our Cloudflare Workers, a pull-request review cycle, and at minimum 20 minutes before the change reached production. We could test exactly one hypothesis per deployment, and we had a dozen variables to optimize.
We needed a way to run concurrent A/B tests on signal parameters without touching the deployment pipeline. We needed feature flags.
The Solution: Cloudflare Workers + KV-Based Feature Flags
Cloudflare Workers already formed the backbone of our serverless backend. Workers KV—their global, low-latency key-value store—was already in our stack for configuration storage. Using KV as a feature flag system was a natural extension: store flag definitions and segment assignments in KV, resolve them at request time in Worker middleware, and let non-developers toggle parameters through a simple admin interface (a Telegram command handler) without ever opening a pull request.
The architecture is simple:
- **KV namespace** holds flag configurations as JSON blobs keyed by flag name
- **Worker middleware** resolves the active flag variant for each incoming request (or user session)
- **Analytics pipeline** tracks which variant each user saw and whether they converted
- **Admin interface** (Telegram command `/setflag`) lets product managers toggle flags in real time
A product manager can say “Set signal confidence threshold to 0.75 for 25% of users” and it’s live at the edge within seconds—no deploy, no code review, no DevOps involvement.
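The handler behind that command is mostly parsing. The command grammar and function name below are illustrative sketches, not our exact implementation; the returned descriptor would then be merged into the flag’s JSON in KV by the admin Worker:

```javascript
// Hypothetical parser for "/setflag <flag> <variant> key=value ...",
// e.g. "/setflag signal-confidence-threshold treatment_a weight=25".
// Returns null for anything that isn't a well-formed /setflag command.
function parseSetFlagCommand(text) {
  const [command, flagName, variantName, ...pairs] = text.trim().split(/\s+/);
  if (command !== '/setflag' || !flagName || !variantName) return null;

  const params = {};
  for (const pair of pairs) {
    const [key, value] = pair.split('=');
    // Coerce numeric values ("25" -> 25); leave everything else as a string
    params[key] = Number.isNaN(Number(value)) ? value : Number(value);
  }
  return { flagName, variantName, params };
}
```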
Architecture Overview
```
User Request → Worker Entry Point
↓
Flag Resolution Middleware
↓
KV Lookup: /flags/{flag_name}
↓
Resolve: user_id → segment → variant
↓
Attach variant config to request context
↓
Downstream handlers read ctx.featureFlags
```
The Worker runtime reads flag config from KV at the start of each request, resolves the user’s segment (stable hashing of user ID into buckets), and serves different signal thresholds to different segments. KV reads globally average 2–5ms at the edge, meaning the feature flag overhead is imperceptible to users.
Step-by-Step Implementation
1. Flag Storage in KV
Each flag is a JSON object stored under a predictable key. We namespace flags under `/flags/` for organization.
**KV key structure:**
```
/flags/signal-confidence-threshold
/flags/volatility-window-seconds
/flags/cooldown-minutes
/flags/max-leverage
```
**Flag value format:**
```json
{
  "variants": {
    "control": {
      "weight": 50,
      "config": {"minConfidence": 0.85, "volatilityWindow": 300}
    },
    "treatment_a": {
      "weight": 25,
      "config": {"minConfidence": 0.75, "volatilityWindow": 180}
    },
    "treatment_b": {
      "weight": 25,
      "config": {"minConfidence": 0.80, "volatilityWindow": 120}
    }
  },
  "defaultVariant": "control",
  "enabled": true
}
```
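One practice worth adopting from day one: validate a flag definition before it ever reaches KV, so a typo in an admin command can’t ship a malformed config to every edge location. The helper name and exact rules below are illustrative:

```javascript
// Hypothetical pre-write validator for the flag JSON shape above.
// Returns a list of human-readable errors; an empty array means valid.
function validateFlagDef(def) {
  const errors = [];
  if (typeof def.enabled !== 'boolean') errors.push('enabled must be a boolean');

  const variants = def.variants ?? {};
  const names = Object.keys(variants);
  if (names.length === 0) errors.push('at least one variant is required');

  const totalWeight = names.reduce((s, n) => s + (variants[n].weight ?? 0), 0);
  if (totalWeight <= 0) errors.push('variant weights must sum to a positive number');

  if (!names.includes(def.defaultVariant)) {
    errors.push(`defaultVariant "${def.defaultVariant}" is not a defined variant`);
  }
  return errors;
}
```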
2. Worker Flag Resolution Middleware
We wrote a lightweight middleware that runs before any request handler. It reads the relevant flags from KV, resolves the user’s variant using consistent hashing, and attaches the resolved config to the request context.
```javascript
// flag-resolver.js — Worker middleware
import { getFlaggableUserId } from './segment-utils';
import { logFlagAssignment } from './analytics'; // analytics helper (step 3)

const FLAG_CACHE_TTL = 30_000; // 30 seconds of in-isolate caching
const flagCache = new Map();

export async function resolveFlags(request, env, ctx) {
  const userId = getFlaggableUserId(request);
  // Fetch all active flags from KV (with local cache)
  const flags = await getActiveFlags(env);
  const resolved = {};

  for (const [flagName, flagDef] of Object.entries(flags)) {
    if (!flagDef.enabled) {
      resolved[flagName] = flagDef.variants[flagDef.defaultVariant].config;
      continue;
    }
    // Consistent-hash the user into a weighted bucket
    const variant = assignVariant(userId, flagName, flagDef.variants);
    resolved[flagName] = flagDef.variants[variant].config;
    // Record the assignment for analytics without blocking the response
    ctx.waitUntil(logFlagAssignment(env, userId, flagName, variant));
  }

  request.featureFlags = resolved;
}

// FLAGS is our KV namespace binding (configured in wrangler.toml)
async function getActiveFlags(env) {
  const cached = flagCache.get('all');
  if (cached && Date.now() - cached.fetchedAt < FLAG_CACHE_TTL) {
    return cached.flags;
  }
  const { keys } = await env.FLAGS.list({ prefix: '/flags/' });
  const flags = {};
  for (const { name } of keys) {
    flags[name.replace('/flags/', '')] = await env.FLAGS.get(name, 'json');
  }
  flagCache.set('all', { flags, fetchedAt: Date.now() });
  return flags;
}

function assignVariant(userId, flagName, variants) {
  const hash = simpleHash(`${userId}:${flagName}`);
  const totalWeight = Object.values(variants).reduce((s, v) => s + v.weight, 0);
  const bucket = hash % totalWeight;

  let cumulative = 0;
  for (const [name, variant] of Object.entries(variants)) {
    cumulative += variant.weight;
    if (bucket < cumulative) return name;
  }
  return Object.keys(variants)[0];
}

// FNV-1a: a fast, stable 32-bit string hash, plenty for bucketing
function simpleHash(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}
```
The `getFlaggableUserId` helper extracts the Telegram user ID from the request context (either a direct Bot API update or signed Telegram WebApp init data). Consistent hashing ensures the same user always sees the same variant of the same flag, which is critical for valid A/B test results.
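For the WebApp path, here is a stripped-down sketch of the extraction. Telegram sends a URL-encoded `initData` string whose `user` field is JSON; the real helper must also verify the HMAC-SHA256 `hash` field against the bot token, which is elided here:

```javascript
// Hypothetical helper: pull the user ID out of Telegram WebApp initData.
// WARNING: this sketch skips signature validation; never trust initData
// in production without checking its "hash" field first.
function userIdFromInitData(initData) {
  const params = new URLSearchParams(initData);
  const userJson = params.get('user');
  if (!userJson) return null;
  try {
    return String(JSON.parse(userJson).id);
  } catch {
    return null;
  }
}
```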
3. Analytics Pipeline for A/B Comparison
We log every flag assignment—user ID, flag name, variant, timestamp—to a separate KV namespace that serves as our raw event store. A scheduled Worker (cron trigger every 15 minutes) aggregates this data into a summary table.
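`logFlagAssignment` can be as simple as a timestamped KV write. The binding name (`EVENTS`) and key scheme below are illustrative assumptions; the `env` parameter carries the KV binding through from the middleware:

```javascript
// Sketch: append one assignment event per (user, flag) resolution.
// A timestamp-prefixed key keeps writes append-only and sortable;
// expirationTtl lets KV garbage-collect raw events after aggregation.
async function logFlagAssignment(env, userId, flagName, variant) {
  const ts = Date.now();
  const key = `assignment:${ts}:${userId}:${flagName}`;
  await env.EVENTS.put(
    key,
    JSON.stringify({ userId, flagName, variant, ts }),
    { expirationTtl: 60 * 60 * 24 * 30 } // keep raw events for 30 days
  );
}
```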
**Analytics query pattern:**
```sql
-- Pseudocode query run against aggregated data.
-- LEFT JOIN keeps non-converting users in the denominator;
-- DISTINCT counts avoid inflating conversions for multi-trade users.
SELECT
  fa.flag_name,
  fa.variant,
  COUNT(DISTINCT fa.user_id) AS users_exposed,
  COUNT(DISTINCT te.user_id) AS conversions,
  ROUND(
    COUNT(DISTINCT te.user_id) * 100.0
      / COUNT(DISTINCT fa.user_id), 2
  ) AS conversion_rate
FROM flag_assignments fa
LEFT JOIN trade_events te ON fa.user_id = te.user_id
WHERE fa.flag_name = 'signal-confidence-threshold'
  AND fa.timestamp >= NOW() - INTERVAL '7 days'
GROUP BY fa.flag_name, fa.variant
ORDER BY conversion_rate DESC;
```
We built a simple dashboard endpoint (`/analytics/flags`) that returns these numbers as JSON, which our Telegram admin bot formats into a table when queried with `/flagstats signal-confidence-threshold`.
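Under the hood that endpoint reduces to a small aggregation. A sketch in plain JavaScript, with illustrative data shapes (an array of assignment events plus the set of user IDs that executed at least one trade in the window):

```javascript
// Mirror of the SQL sketch: per-variant exposure, conversions, and rate.
// assignments: [{ userId, variant }], convertedUserIds: Set<string>.
function flagStats(assignments, convertedUserIds) {
  // Group distinct exposed users by variant (duplicates collapse in the Set)
  const byVariant = {};
  for (const { userId, variant } of assignments) {
    (byVariant[variant] ??= new Set()).add(userId);
  }
  return Object.entries(byVariant).map(([variant, users]) => {
    const conversions = [...users].filter((u) => convertedUserIds.has(u)).length;
    return {
      variant,
      usersExposed: users.size,
      conversions,
      conversionRate: Number(((conversions * 100) / users.size).toFixed(2)),
    };
  });
}
```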
Results
The A/B test ran for 14 days with approximately 4,200 users randomly assigned across control and two treatment variants. Here are the numbers:
<table>
<tr><th>Variant</th><th>Users Exposed</th><th>Conversions</th><th>Conversion Rate</th><th>vs. Control</th></tr>
<tr><td>Control (0.85 confidence)</td><td>1,412</td><td>183</td><td>12.96%</td><td>—</td></tr>
<tr><td>Treatment A (0.75 confidence)</td><td>1,396</td><td>232</td><td>16.62%</td><td>+28.2%</td></tr>
<tr><td>Treatment B (0.80 confidence)</td><td>1,392</td><td>257</td><td>18.46%</td><td>+42.5%</td></tr>
</table>
**Weighted average uplift across treatments: ≈35%**
Treatment B (0.80 min confidence, 120s volatility window) emerged as the clear winner. We promoted it to 100% of users with a single KV write operation—a `PUT` to `/flags/signal-confidence-threshold` with Treatment B’s config as the default variant. Time from decision to full rollout: **4.7 seconds**.
**Flag toggling latency measured:**
- KV read latency (p95): 4.8ms
- KV write latency (p95): 6.2ms
- Flag resolution overhead per request: <1ms (in-memory hash + cache hit)
- End-to-end flag update to edge propagation: 2–7 seconds (KV global cache invalidation)
**Cost comparison:**
- Before feature flags: ~$47/month in CI/CD minutes for redeploys related to parameter tuning
- After feature flags: ~$0.30/month in additional KV reads (negligible bandwidth)
- Annualized savings: ~$560 in compute costs alone, not counting engineering time
Key Takeaways
1. **Feature flags decouple deployment from release.** We can change trading signal parameters 50 times a day without a single Worker deploy. This is the single biggest productivity win for a lean team.
2. **KV is surprisingly capable as a feature flag backend.** With sub-5ms read latency and global replication, Cloudflare Workers KV handles flag resolution at scale without dedicated infrastructure. For teams already on Workers, it’s essentially free.
3. **Segment-based assignment with consistent hashing is critical for valid A/B tests.** Without stable user-to-variant mapping, your results are noise. Our `assignVariant` function uses a simple hash that keeps users in their assigned bucket even as flag configs change.
4. **Let non-developers run experiments.** The Telegram `/setflag` command handler was the game-changer. Our product lead now runs A/B tests independently, which means more experiments, more data, and faster optimization cycles.
5. **Start small, measure everything.** We began with one flag (`signal-confidence-threshold`) and one metric (signal-to-trade conversion). Now we have 14 flags covering everything from UI copy variants to notification timing preferences. The pattern scales perfectly.