The Problem With Black-Box Trading Bots

Auto-trading bots are notoriously opaque. You set a strategy, turn it on, and check the balance at the end of the week. If the number is green, everything is fine. If it's red, you don't know what went wrong -- was the strategy bad, the timing unlucky, or the exchange connection flaky?

DeFiKit runs multiple trading strategies across different exchanges and blockchains. The Ichimoku-based Freqtrade bot on KuCoin. The momentum scanner on Solana. The prediction market bot on HyperLiquid. Each strategy has its own risk profile, time horizon, and failure modes. Running them without a performance monitoring system is flying blind.

This post covers how we built a lightweight performance monitoring system for DeFiKit's trading bots using Cloudflare Workers, D1, and LLM-based analysis -- no external monitoring service required.

Core Metrics We Track

Every strategy emits structured events. From those events, the monitoring system computes five key metrics:

| Metric | Definition | Warning Threshold | Critical Threshold |
|--------|-----------|-------------------|--------------------|
| Win Rate | Profitable trades / total closed trades | < 45% over 50 trades | < 35% over 50 trades |
| Max Drawdown | Largest peak-to-trough decline in portfolio value | > 10% | > 20% |
| Sharpe Ratio | Risk-adjusted return (daily) | < 0.5 over 30 days | < 0.2 over 30 days |
| Slippage Avg | Average execution slippage in basis points | > 5 bps | > 15 bps |
| Consecutive Losses | Number of losing trades in a row | > 5 | > 10 |

These metrics are computed every hour by a Cloudflare Worker that queries the D1 database where trade events are stored.
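
Two of these definitions are easier to pin down in code than in words. The sketch below is one plausible reading of "max drawdown" and "daily Sharpe" as pure functions. The function names, the zero risk-free rate, the lack of annualization, and the use of the sample standard deviation are all assumptions for illustration, not details taken from the DeFiKit worker.

```javascript
// Hypothetical helpers; names and conventions are assumptions.

// Largest peak-to-trough decline, as a percentage of the running peak.
// `equity` is an array of portfolio values in chronological order.
function maxDrawdownPercent(equity) {
  let peak = -Infinity;
  let maxDd = 0;
  for (const value of equity) {
    if (value > peak) peak = value;
    if (peak > 0) {
      const dd = (peak - value) / peak * 100;
      if (dd > maxDd) maxDd = dd;
    }
  }
  return maxDd;
}

// Mean daily return divided by the sample standard deviation of daily returns.
// Assumes a zero risk-free rate and no annualization.
// Returns null when there are too few points or no variation.
function dailySharpe(returns) {
  const n = returns.length;
  if (n < 2) return null;
  const mean = returns.reduce((a, b) => a + b, 0) / n;
  const variance = returns.reduce((a, r) => a + (r - mean) ** 2, 0) / (n - 1);
  const sd = Math.sqrt(variance);
  return sd === 0 ? null : mean / sd;
}
```

For example, an equity curve of `[100, 120, 90, 110]` peaks at 120 and bottoms at 90, so the drawdown is 25%.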

Database Schema

Trade events land in a simple D1 table:

```sql
CREATE TABLE IF NOT EXISTS trade_events (
  id TEXT PRIMARY KEY,
  bot_id TEXT NOT NULL,
  strategy TEXT NOT NULL,
  exchange TEXT NOT NULL,
  symbol TEXT NOT NULL,
  side TEXT NOT NULL,        -- buy | sell
  event_type TEXT NOT NULL,  -- opened | closed | cancelled
  entry_price REAL,
  exit_price REAL,
  quantity REAL,
  pnl REAL,                  -- realized P&L (null for open positions)
  pnl_percent REAL,
  slippage_bps REAL,
  tx_fee REAL,
  raw_data TEXT,             -- full bot log as JSON
  created_at TEXT DEFAULT (datetime('now'))
);

CREATE INDEX idx_bot_strategy ON trade_events(bot_id, strategy);
CREATE INDEX idx_created ON trade_events(created_at);
```

Every bot writes to this table after each trade execution and position close. The schema is deliberately simple -- the monitoring layer handles the computation.
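
As a rough illustration of that write path, here is a minimal sketch of a helper a bot-side Worker might call. The function name `recordTradeEvent` and the camelCase field names on `event` are hypothetical; only the column names come from the schema above. `db` is assumed to be a D1 binding.

```javascript
// Hypothetical helper: inserts one trade event into trade_events.
// `db` is assumed to be a D1 binding (for example env.DB).
async function recordTradeEvent(db, event) {
  // Callers may supply their own id; otherwise generate one.
  const id = event.id ?? crypto.randomUUID();
  await db.prepare(`
    INSERT INTO trade_events
      (id, bot_id, strategy, exchange, symbol, side, event_type,
       entry_price, exit_price, quantity, pnl, pnl_percent,
       slippage_bps, tx_fee, raw_data)
    VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
  `).bind(
    id, event.botId, event.strategy, event.exchange, event.symbol,
    event.side, event.eventType,
    // Missing optional fields are bound as null rather than undefined.
    event.entryPrice ?? null, event.exitPrice ?? null,
    event.quantity ?? null, event.pnl ?? null, event.pnlPercent ?? null,
    event.slippageBps ?? null, event.txFee ?? null,
    JSON.stringify(event.raw ?? {})
  ).run();
  return id;
}
```

`created_at` is left out of the insert on purpose so the column default fills it in.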

The LLM-Based Analysis Loop

The key innovation is that performance data isn't just displayed -- it's analyzed by an LLM. Here's how the loop works:

1. Every 6 hours, the monitoring worker fetches the last 7 days of trade events per strategy.

2. It computes the five core metrics and packages them as a structured report.

3. The report is sent to an LLM (OpenRouter, any capable model) with a system prompt:

```
You are a quantitative trading analyst reviewing DeFiKit bot performance.
Given the following metrics for strategy {name} over the last 7 days,
provide: 1) A one-paragraph health assessment. 2) Three specific
recommendations for improvement. 3) A verdict: CONTINUE, ADJUST, or PAUSE.
```

4. The LLM's response is logged alongside the metrics and optionally sent as a Telegram alert (PAUSE verdicts are always pushed).

5. If the same strategy gets two consecutive PAUSE verdicts, the system automatically stops new trades and notifies the admin.
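
Steps 4 and 5 depend on turning free-form LLM text into a machine-readable verdict. A minimal sketch of that logic follows. It assumes the model labels its verdict as `Verdict: <WORD>` and that verdict history is kept oldest first; both function names are hypothetical.

```javascript
// Hypothetical helpers for steps 4 and 5; not taken from the DeFiKit worker.

// Pulls the verdict out of free-form LLM text.
// Returns 'CONTINUE', 'ADJUST', 'PAUSE', or null if no verdict is found.
function extractVerdict(text) {
  const match = /Verdict:\s*(CONTINUE|ADJUST|PAUSE)\b/i.exec(text);
  return match ? match[1].toUpperCase() : null;
}

// True when the two most recent verdicts are both PAUSE.
// `verdictHistory` is ordered oldest first.
function shouldAutoPause(verdictHistory) {
  const lastTwo = verdictHistory.slice(-2);
  return lastTwo.length === 2 && lastTwo.every((v) => v === 'PAUSE');
}
```

Treating an unparseable response as `null` rather than guessing keeps a malformed reply from counting toward an auto-pause.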

Concrete Example

Here's a real output from the monitoring system after an Ichimoku strategy started underperforming:

```
📊 Strategy Health Report -- Ichimoku (KUCOIN)
Period: 2026-04-30 to 2026-05-07
Trades: 48 | Win Rate: 39.6%
Max Drawdown: -12.3% | Sharpe: 0.31
Slippage Avg: 4.2 bps
Consecutive Losses: 7 (active)

🤖 LLM Analysis:
Assessment: The Ichimoku strategy has entered a drawdown period
consistent with ranging markets. The 7 consecutive losses suggest
the current trend-following parameters are catching false breakouts.

Recommendations:
1. Tighten the conversion line (tenkan-sen) period from 9 to 7 for
   faster signal generation in the current volatile conditions.
2. Add a volatility filter (ATR > 2%) to skip trades during
   low-volatility range-bound periods.
3. Reduce position size from 2% to 1% per trade until win rate
   recovers above 50%.

Verdict: ADJUST
```

The system identified the problem (ranging market, false breakouts) and suggested concrete parameter changes -- not just a red or green light.

Building the Monitoring Worker

The monitoring worker is ~200 lines of JavaScript deployed on Cloudflare Workers. Core components:

1. Metrics Computation

```javascript
async function computeMetrics(db, botId, strategy, since) {
  // D1's .first() resolves to the row itself (or null), not to { results }.
  // Note: max_dd below is the largest single-trade loss, a rough proxy for
  // drawdown rather than a true peak-to-trough figure.
  const row = await db.prepare(`
    SELECT
      SUM(CASE WHEN event_type = 'closed' THEN 1 ELSE 0 END) AS total_trades,
      SUM(CASE WHEN pnl > 0 THEN 1 ELSE 0 END) AS wins,
      SUM(CASE WHEN pnl < 0 THEN 1 ELSE 0 END) AS losses,
      AVG(slippage_bps) AS avg_slippage,
      MAX(CASE WHEN event_type = 'closed' THEN -pnl_percent ELSE 0 END) AS max_dd,
      AVG(pnl_percent) AS avg_return
    FROM trade_events
    WHERE bot_id = ? AND strategy = ? AND created_at > ?
  `).bind(botId, strategy, since).first();

  const totalTrades = row?.total_trades ?? 0;
  return {
    // Win rate is defined over closed trades only.
    winRate: totalTrades > 0
      ? (row.wins / totalTrades * 100).toFixed(1) : 0,
    maxDrawdown: row?.max_dd?.toFixed(1) || 0,
    avgSlippage: row?.avg_slippage?.toFixed(1) || 0,
    consecutiveLosses: await computeConsecutiveLosses(db, botId, strategy)
  };
}
```
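
The `computeConsecutiveLosses` helper is called above but not shown. One plausible way to write it, offered as a sketch rather than the actual implementation, is to read the most recent closed trades newest first and count until the first non-losing trade. The `limit` parameter and the `countLosingStreak` helper are assumptions added for illustration.

```javascript
// Hypothetical sketch of the helper referenced by computeMetrics.

// Counts the current run of losing trades from a list ordered newest first.
// A null or non-negative pnl ends the streak.
function countLosingStreak(pnlsNewestFirst) {
  let streak = 0;
  for (const pnl of pnlsNewestFirst) {
    if (pnl < 0) streak++;
    else break;
  }
  return streak;
}

// Reads the latest closed trades from D1 and returns the active losing streak.
// The streak is capped at `limit`, which is an assumed tuning knob.
async function computeConsecutiveLosses(db, botId, strategy, limit = 50) {
  const { results } = await db.prepare(`
    SELECT pnl FROM trade_events
    WHERE bot_id = ? AND strategy = ? AND event_type = 'closed'
    ORDER BY created_at DESC
    LIMIT ?
  `).bind(botId, strategy, limit).all();
  return countLosingStreak(results.map((r) => r.pnl));
}
```

Keeping the counting logic in a pure function makes it easy to test without a database.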

2. LLM Integration

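```javascript
// `env` holds the Worker's bindings and secrets; it is passed in explicitly
// because it is not available as a global inside this function.
async function analyzeStrategy(env, metrics, strategyName) {
  const prompt = buildAnalysisPrompt(metrics, strategyName);
  const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${env.OPENROUTER_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'openai/gpt-4o-mini',
      messages: [{ role: 'user', content: prompt }],
      temperature: 0.3 // low temp for analytical consistency
    })
  });
  // Surface HTTP errors instead of treating an error body as an analysis.
  if (!response.ok) {
    throw new Error(`OpenRouter request failed: ${response.status}`);
  }
  return await response.json();
}
```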

Using a cheaper model (GPT-4o-mini or Claude 3 Haiku) keeps costs under $0.50/month for hourly analysis of 5 strategies.
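
The `buildAnalysisPrompt` helper used in the LLM call is not shown in this post. A minimal sketch, assuming it simply combines the analyst instructions quoted earlier with the metrics serialized as JSON, could look like this. The JSON layout and the "Metrics (JSON):" label are assumptions.

```javascript
// Hypothetical sketch of buildAnalysisPrompt; the real helper may differ.
// Combines the analyst instructions with the metrics serialized as JSON.
function buildAnalysisPrompt(metrics, strategyName) {
  return [
    'You are a quantitative trading analyst reviewing DeFiKit bot performance.',
    `Given the following metrics for strategy ${strategyName} over the last 7 days,`,
    'provide: 1) A one-paragraph health assessment. 2) Three specific',
    'recommendations for improvement. 3) A verdict: CONTINUE, ADJUST, or PAUSE.',
    '',
    'Metrics (JSON):',
    JSON.stringify(metrics, null, 2)
  ].join('\n');
}
```

Sending the metrics as JSON rather than prose keeps the prompt short and makes it harder for the model to misread a number.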

Alerting Rules Engine

Not every metric deviation needs human attention. We built a simple rules engine that classifies alerts:

| Severity | Rule | Action |
|----------|------|--------|
| 🔴 Critical | Drawdown > 20% OR 10+ consecutive losses | Auto-pause strategy + push Telegram alert |
| 🟡 Warning | Drawdown > 10% OR win rate < 45% | Telegram alert + LLM analysis report |
| 🔵 Info | Slippage > 5 bps OR Sharpe < 0.5 | Log to dashboard only |
| ⚪ All Clear | All metrics green | No action (silent) |

The silence of All Clear is intentional -- nobody needs a notification that everything is fine.
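
The table maps naturally onto a function that checks the most severe rule first. The sketch below is one way to express it, with assumptions stated up front: metrics arrive as plain numbers, drawdown is a positive percentage, "10+" is read as `>= 10`, and the severity and action labels are hypothetical names.

```javascript
// Hypothetical rules engine; labels and field names are assumptions.
// Expects numeric metrics, with maxDrawdown as a positive percentage.
function classifyAlert(m) {
  // Most severe rule first, so a strategy is only classified once.
  if (m.maxDrawdown > 20 || m.consecutiveLosses >= 10) {
    return { severity: 'critical', actions: ['pause_strategy', 'telegram_alert'] };
  }
  if (m.maxDrawdown > 10 || m.winRate < 45) {
    return { severity: 'warning', actions: ['telegram_alert', 'llm_report'] };
  }
  if (m.avgSlippage > 5 || m.sharpe < 0.5) {
    return { severity: 'info', actions: ['log_dashboard'] };
  }
  // All clear: deliberately no actions.
  return { severity: 'all_clear', actions: [] };
}
```

A strategy with a 12.3% drawdown, a 39.6% win rate, and 7 consecutive losses would be classified as a warning under these rules.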

Tracking Over Time

Metrics are sampled hourly and stored in a separate `strategy_snapshots` table:

```sql
CREATE TABLE strategy_snapshots (
  id TEXT PRIMARY KEY,
  bot_id TEXT NOT NULL,
  strategy TEXT NOT NULL,
  win_rate REAL,
  drawdown REAL,
  sharpe_ratio REAL,
  total_trades INTEGER,
  llm_verdict TEXT,
  snapshot_at TEXT NOT NULL
);
```

This enables time-series queries: "Show me the win rate of Ichimoku over the last 30 days." The frontend (a simple Telegram chart generated by chart.js on Workers) renders these as line graphs.
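
A query like the one quoted above could be sketched as follows. The function name `winRateHistory` is hypothetical, and the sketch assumes `snapshot_at` is stored in the same `YYYY-MM-DD HH:MM:SS` UTC text format that SQLite's `datetime('now')` produces, so that string comparison orders rows correctly.

```javascript
// Hypothetical helper: win-rate series for one strategy over the last N days.
// Assumes snapshot_at uses SQLite's datetime('now') text format (UTC).
async function winRateHistory(db, botId, strategy, days = 30) {
  const { results } = await db.prepare(`
    SELECT snapshot_at, win_rate
    FROM strategy_snapshots
    WHERE bot_id = ? AND strategy = ?
      AND snapshot_at >= datetime('now', ?)
    ORDER BY snapshot_at ASC
  `).bind(botId, strategy, `-${days} days`).all();
  // Shape the rows as simple points for a charting layer.
  return results.map((r) => ({ t: r.snapshot_at, winRate: r.win_rate }));
}
```

With hourly snapshots, a 30-day window returns roughly 720 points per strategy, which is small enough to chart without downsampling.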

Cost Breakdown

| Component | Monthly Cost |
|-----------|-------------|
| Cloudflare Workers (1M requests/month) | $0 |
| D1 database (5GB storage + 10M reads) | $0.75 |
| OpenRouter LLM API (~3,000 calls/month) | ~$0.40 |
| **Total** | **~$1.15/month** |

For under $2/month, DeFiKit gets real-time strategy monitoring, LLM-powered analysis, automated pausing, and historical trend tracking.

What We Learned

- **LLM analysis is cheaper than a human analyst.** The LLM can't replace deep quantitative research, but it catches the obvious problems (ranging market, parameter drift) before a human would notice them.

- **Silence is a feature.** When every 6-hour report is green, the system stays quiet. Trust the metrics, not notifications.

- **Auto-pause needs manual resume.** Two PAUSE verdicts auto-stop the strategy, but only a human can restart it. This prevents the bot from re-entering a losing loop during a flash crash.

- **Start with 5 metrics, not 20.** More metrics make the dashboard harder to scan. Win rate, drawdown, Sharpe, slippage, and consecutive losses cover 90% of strategy health signals.

Extending to Multi-Exchange Monitoring

The same system now monitors bots across KuCoin, Solana, and HyperLiquid from a single D1 database. Each exchange has different latency, fee structures, and failure modes, but the unified schema means a single dashboard tracks everything. Next up: integrating on-chain data (wallet balances, staking rewards, gas spent) alongside the exchange trade data for a complete portfolio view.