6 lessons across the team
Promoted writeups become owned, time-boxed lessons with a 90-day review cycle. Stale lessons fall to under-review until refreshed.
Scope rm operations explicitly — never trust 'old' as a filter
When asking Claude to clean up files, always scope by an objective signal (unreferenced in code, last modified more than N days ago, matching a glob). A bare 'old' sets no boundary and tends to be interpreted aggressively.
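As a minimal sketch of scoping by an objective signal, the helper below lists files that match a glob and were last modified more than N days ago, without deleting anything. The name `stale_candidates` and its parameters are hypothetical, chosen for illustration only.

```python
import time
from pathlib import Path


def stale_candidates(root, pattern, max_age_days, now=None):
    """List files under root that match pattern and are older than max_age_days.

    Nothing is deleted here; the caller reviews the list before acting on it.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    return sorted(
        p for p in Path(root).rglob(pattern)
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

The returned list can be reviewed, or pasted into the prompt, before any deletion is approved.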
Front-load the hypothesis, not the trace
Pasting a stack trace alone invites Claude to guess. Stating your hypothesis first turns the request into verification, and a wrong hypothesis is cheap to rule out.
Backfill in adaptive batches — measure replication lag, back off if >2s
Static batch sizes either underuse capacity or melt replicas. The pattern: start small, observe lag, scale up until lag exceeds the threshold, then back off.
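The pattern can be sketched roughly as follows. The 2-second threshold comes from the lesson title; the growth factor, the halving step, the size bounds, and the `process_batch` and `read_lag` callables are assumptions made for illustration.

```python
def next_batch_size(current, lag_seconds, max_lag=2.0, min_size=500, max_size=50_000):
    """Pick the next batch size from the observed replication lag.

    Grow by 50% while lag stays at or under max_lag; halve once it is exceeded.
    """
    if lag_seconds > max_lag:
        return max(min_size, current // 2)
    return min(max_size, int(current * 1.5))


def backfill(total_rows, process_batch, read_lag, start_size=500):
    """Process total_rows in adaptive batches and return the batch sizes used."""
    done, size, sizes = 0, start_size, []
    while done < total_rows:
        n = min(size, total_rows - done)
        process_batch(done, n)  # hypothetical: caller updates rows [done, done + n)
        done += n
        sizes.append(n)
        size = next_batch_size(size, read_lag())  # hypothetical lag probe
    return sizes
```

Halving on overshoot while growing more slowly is one possible choice; it recovers quickly when replicas fall behind and approaches the limit gradually.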
Never let Claude paste keys into source — even in 'just to demonstrate' replies
Claude will sometimes echo a key back inside a code suggestion to make the example concrete. Reject the turn and re-prompt with a placeholder.
Use middleware-level rate limiting (legacy)
Apply rate limits in the auth middleware, before requests reach controllers. This gives lower latency and simpler retry semantics.
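A framework-agnostic sketch of the idea, assuming a token bucket per client and a request represented as a plain dict. The `client_id` key, the default limits, and the 429 response shape are all hypothetical.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def rate_limited(handler, buckets, rate=5, capacity=10, clock=time.monotonic):
    """Wrap a handler so the limit is checked before the controller runs."""
    def middleware(request):
        key = request["client_id"]  # hypothetical: set by the auth step
        bucket = buckets.setdefault(key, TokenBucket(rate, capacity, clock))
        if not bucket.allow():
            return {"status": 429, "body": "rate limit exceeded"}
        return handler(request)
    return middleware
```

Because the rejection happens in the wrapper, the controller never runs for a throttled request, so a client can retry it without risk of partial side effects.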
Test SSE replay after adding any new event type
New event types in the SSE channel routinely break clients that were written against the old event shape. Add a replay assertion to the integration suite.
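One way such an assertion might look, assuming "replay" means resuming a stream from a Last-Event-ID. The parser is deliberately simplified (it handles only `id`, `event`, and `data` fields), and the `presence` event type is a made-up stand-in for a newly added type.

```python
def parse_sse(stream_text):
    """Parse SSE text into dicts with 'id', 'event', and 'data' keys."""
    events = []
    for block in stream_text.strip().split("\n\n"):
        fields = {"id": None, "event": "message", "data": []}
        for line in block.splitlines():
            if not line or line.startswith(":"):
                continue  # blank line or comment line
            name, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if name == "data":
                fields["data"].append(value)
            elif name in ("id", "event"):
                fields[name] = value
        fields["data"] = "\n".join(fields["data"])
        events.append(fields)
    return events


def replay_after(events, last_event_id):
    """Return the events a client should receive after reconnecting.

    Raises ValueError if last_event_id is not in the log.
    """
    ids = [e["id"] for e in events]
    return events[ids.index(last_event_id) + 1:]


def test_replay_includes_new_event_type():
    log = (
        "id: 1\nevent: message\ndata: hello\n\n"
        "id: 2\nevent: presence\ndata: joined\n\n"  # hypothetical new event type
        "id: 3\nevent: message\ndata: bye\n\n"
    )
    replayed = replay_after(parse_sse(log), "1")
    assert [(e["id"], e["event"]) for e in replayed] == [
        ("2", "presence"),
        ("3", "message"),
    ]
```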
What's incubating
Writeups are the operator-driven artifact. Promote one to a lesson when the team agrees the pattern is durable.
Cleaning up test fixtures without nuking CI
★ promoted
My first prompt, "clean up old fixtures", was too vague, and Claude went straight to rm -rf. I caught it at the permission card. On the second try I scoped it: "list fixtures unreferenced in tests/, then propose deletions one by one". That worked.
Stripe webhook idempotency: when to dedupe by event.id vs payload
draft
We hit a duplicate-charge bug because we were deduping on the payload hash, not event.id. Claude's first instinct was a Redis TTL set; that would have worked but added infra. The second prompt asked for a DB-only solution given our constraints, and we landed on a UNIQUE(event_id) constraint plus ON CONFLICT DO NOTHING.
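A minimal sketch of that DB-only approach, using Python's built-in sqlite3 module as a stand-in for the team's actual database (which the writeup does not name). The table name `webhook_events` and the helper `record_event` are hypothetical.

```python
import sqlite3


def record_event(conn, event_id, payload):
    """Insert the event once; return True only the first time event_id is seen."""
    cur = conn.execute(
        "INSERT INTO webhook_events (event_id, payload) VALUES (?, ?) "
        "ON CONFLICT(event_id) DO NOTHING",
        (event_id, payload),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows changed means a duplicate delivery


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE webhook_events ("
    "event_id TEXT NOT NULL UNIQUE, "
    "payload TEXT NOT NULL)"
)
```

A webhook handler would run its side effects (such as charging) only when `record_event` returns True, so a redelivered event with the same id becomes a no-op even if its payload differs.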
Migration backfill: batches of 5k, not 50k
★ promoted
I tried 50k batches first; replication lag spiked to 8s. Claude suggested smaller batches and a sleep, which was fine, but the sleep value was hand-wavy. I asked it to compute the sleep from observed lag and back off if lag > 2s. That logic turned out to be generalizable, so I pulled it into a small helper.
First week with PairWave — front-loading the hypothesis
draft
I used to paste the whole stack trace and ask 'what's wrong?'. I got confident-but-wrong answers. After reading Alice's lesson on hypothesis-first prompts, I tried 'here's the trace, here's what I think is happening (X), verify before changing code'. My one-shot rate jumped noticeably across the week.