Dunning Brain — Rollout Plan
Four-phase rollout from observe-only deployment through cleanup. Each phase is independently deployable and reversible.
Phase 1: Observe-Only Deployment
Deployed Apr 3, 2026. The observe-only brain runs on every invoice.marked_uncollectible and invoice.paid event, reads from Stripe, evaluates all policy decisions, and logs what it would do. It makes no writes and has no side effects.
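To make the observe-only contract concrete, here is a minimal Go sketch (assuming Go, as the internal/dunning package path and applystripe.Gather suggest). Event, Action, Plan, and ObserveOnly are hypothetical names, and Plan's policy is a placeholder rather than the brain's real rules. The shape is what matters: evaluation produces a plan, and the only thing done with that plan is logging.

```go
package main

import "log"

// Action names one side effect the brain could take. The values are
// hypothetical labels that mirror the dual-write table in this plan.
type Action string

const (
	FlagBadDebt        Action = "flag_bad_debt"
	BanAccount         Action = "ban_account"
	CancelSubscription Action = "cancel_subscription"
	SendBadDebtEmail   Action = "send_bad_debt_email"
)

// Event is a hypothetical, trimmed-down view of the Stripe webhook event.
type Event struct {
	Type      string // "invoice.marked_uncollectible" or "invoice.paid"
	InvoiceID string
}

// Plan evaluates policy and returns the actions a live run would take.
// Placeholder policy: uncollectible invoices get the full bad debt
// sequence; every other event type plans nothing.
func Plan(ev Event) []Action {
	if ev.Type == "invoice.marked_uncollectible" {
		return []Action{FlagBadDebt, BanAccount, CancelSubscription, SendBadDebtEmail}
	}
	return nil
}

// ObserveOnly logs each planned action and returns the plan.
// It deliberately has no write path: nothing is sent to Stripe or the DB.
func ObserveOnly(ev Event, logf func(format string, args ...any)) []Action {
	actions := Plan(ev)
	for _, a := range actions {
		logf("dunning brain (observe-only): would %s for %s", a, ev.InvoiceID)
	}
	return actions
}

func main() {
	ObserveOnly(Event{Type: "invoice.marked_uncollectible", InvoiceID: "in_123"}, log.Printf)
	ObserveOnly(Event{Type: "invoice.paid", InvoiceID: "in_123"}, log.Printf)
}
```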
| Item | Status | Detail |
|---|---|---|
| Code: observe-only brain | ✅ Done | MR !9658 — internal/dunning package, 44 test cases passing |
| Wiring: handler integration | ✅ Done | Runs after the audit trail; a nil guard makes it a no-op when unwired |
| Logdash spec: dunning-brain | ✅ Done | MR !23 — 27 patterns in specs/dunning-brain.yaml |
| Grafana dashboards | ✅ Done | Aggregate (6 rows, 13 panels) + account drilldown |
| Deploy to production | ✅ Done | Deployed Apr 3, 2026 16:00 UTC |
| Validate parity | ✅ Done | 20/20 amounts match, 10/10 BQ coverage, 3/3 filters agree, 0 errors |
Validation Results
Validation ran Apr 3–5. Cross-referenced brain ES logs against billing-webhooks behavior and BigQuery invoice data.
| Check | Result |
|---|---|
| Bad debt amounts match billing-webhooks exactly | 20/20 |
| BQ invoices found in brain logs | 10/10 |
| Filtered invoices agree (all three via the account_resolution filter) | 3/3 |
| Errors over the entire validation period | 0 |
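The parity figures are simple tallies. A minimal sketch of that arithmetic, assuming each side can be exported as an invoice ID to amount-in-cents map; CompareParity and ParityReport are hypothetical and are not the tooling used for the Apr 3–5 validation.

```go
package main

import (
	"fmt"
	"sort"
)

// ParityReport tallies brain-vs-billing-webhooks agreement.
type ParityReport struct {
	AmountMatches, AmountCompared int
	BQFound, BQTotal              int
	Mismatched                    []string // invoice IDs that disagree or are missing from brain logs
}

// CompareParity compares bad debt amounts (in cents) keyed by invoice ID.
// webhooks is the reference side: an invoice it handled but the brain never
// logged counts as a mismatch. bqInvoices lists invoice IDs from BigQuery.
func CompareParity(brain, webhooks map[string]int64, bqInvoices []string) ParityReport {
	var r ParityReport
	for id, want := range webhooks {
		r.AmountCompared++
		if got, ok := brain[id]; ok && got == want {
			r.AmountMatches++
		} else {
			r.Mismatched = append(r.Mismatched, id)
		}
	}
	sort.Strings(r.Mismatched) // map iteration order is random
	for _, id := range bqInvoices {
		r.BQTotal++
		if _, ok := brain[id]; ok {
			r.BQFound++
		}
	}
	return r
}

func (r ParityReport) String() string {
	return fmt.Sprintf("%d/%d amounts match, %d/%d BQ coverage",
		r.AmountMatches, r.AmountCompared, r.BQFound, r.BQTotal)
}

func main() {
	brain := map[string]int64{"in_a": 1000, "in_b": 2500}
	webhooks := map[string]int64{"in_a": 1000, "in_b": 2500}
	fmt.Println(CompareParity(brain, webhooks, []string{"in_a", "in_b"})) // 2/2 amounts match, 2/2 BQ coverage
}
```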
Phase 2: applystripe Convergence
Fill the three convergence gaps in applystripe so it can be the sole bad debt execution path. See the applystripe convergence page for the full gap analysis and commit breakdown.
Phase 3: Cutover
Deploy the convergence MR to activate the brain. Writes begin as soon as it is deployed; no separate config change is needed. Then disable the billing-webhooks dunning path.
Dual-Write Safety
Between Step 1 (brain activated) and Step 3 (billing-webhooks disabled), both systems process the same events. Every applystripe action except the bad debt email is idempotent, so the second writer is a no-op for everything but the email.
| Action | Idempotent? | During Dual-Write |
|---|---|---|
| Flag bad debt in DB | ✅ Yes | Sets boolean flag — same result on re-apply |
| Apply account ban | ✅ Yes | Ban already applied = IAPI no-op |
| Remove DNU from zones | ✅ Yes | Already removed = IAPI no-op |
| Cancel subscription | ✅ Yes | Already canceled = Stripe no-op |
| Update Stripe metadata | ✅ Yes | Set to same value — no change |
| Send bad debt email | ⚠️ No | Customer may get duplicate email during overlap. Keep window short. |
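The idempotency column can be modelled as repeated application of the same writer. In this toy Go sketch (AccountState, applyBadDebt, and the dunning_status metadata key are hypothetical), running the writer a second time leaves every field from the table unchanged except the email counter, which is exactly the dual-write exposure.

```go
package main

import "fmt"

// AccountState is a toy model of the side effects in the dual-write table.
type AccountState struct {
	BadDebtFlag bool
	Banned      bool
	DNUZones    map[string]bool // zones still carrying DNU
	SubCanceled bool
	Metadata    map[string]string
	EmailsSent  int
}

// applyBadDebt models one writer (brain or billing-webhooks) handling the
// same invoice.marked_uncollectible event.
func applyBadDebt(s *AccountState) {
	s.BadDebtFlag = true // boolean set: same result on re-apply
	s.Banned = true      // ban already applied: no-op
	for zone := range s.DNUZones {
		delete(s.DNUZones, zone) // already removed: nothing left to delete
	}
	s.SubCanceled = true                      // already canceled: no-op
	s.Metadata["dunning_status"] = "bad_debt" // hypothetical key; same value, no change
	s.EmailsSent++                            // NOT idempotent: every writer sends again
}

func main() {
	s := &AccountState{DNUZones: map[string]bool{"zone-a": true}, Metadata: map[string]string{}}
	applyBadDebt(s) // first writer
	applyBadDebt(s) // second writer during the overlap
	fmt.Printf("flag=%v banned=%v dnu=%d canceled=%v emails=%d\n",
		s.BadDebtFlag, s.Banned, len(s.DNUZones), s.SubCanceled, s.EmailsSent)
	// flag=true banned=true dnu=0 canceled=true emails=2
}
```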
Phase 4: Cleanup
Starts after at least one week of stable production operation with billing-webhooks dunning disabled.
| Cleanup Item | Where | Jira |
|---|---|---|
| Delete dunning handlers | billing-webhooks | FINPE-1402 |
| Delete DISABLED_EVENTS config | billing-webhooks deployment | FINPE-1402 |
| Remove observe-only logging code | internal/dunning (optional) | — |
| Update documentation | billing-knowledge pages | — |
Risks
| Risk | Severity | Mitigation |
|---|---|---|
| Brain applystripe call fails silently | Medium | Errors are logged non-fatally (dunning brain: applystripe failed). The webhook still succeeds, so Stripe won't retry. Monitor ES for this pattern after deploy. If persistent, revert the convergence MR. |
| Duplicate email during dual-write | Low | Customer gets two bad debt emails. Keep the dual-write window to 1–3 days. Not fixable without a dedup mechanism; accepted as a tradeoff. |
| hasEnterpriseSubs omission | Low | The brain omits the hasEnterpriseSubs check, so it may flag accounts that billing-webhooks would skip. In practice, all remaining enterprise customers are in the contract segment, which shouldSkipDunning already filters. Validated in Phase 1. |
| TOCTOU: invoice paid between check and ban | Low | Pre-existing from billing-webhooks. Same race window. Brain doesn't change the risk profile. |
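| applystripe Gather adds latency | Low | applystripe.Gather makes ~15 sequential external calls inside the webhook handler. If Gather times out, the error is logged and the webhook still succeeds, so, as with silent failures above, Stripe will not redeliver. Stripe redelivers only if the handler as a whole misses Stripe's response timeout. |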
Rollback Strategy
| Phase | Rollback | Effect |
|---|---|---|
| Phase 1 | Revert MR !9658 → deploy | Brain stops running. Nothing to undo — it never wrote anything. |
| Phase 2 | Revert convergence MR → deploy | Brain returns to observe-only. applystripe gap fixes remain (harmless). |
| Phase 3 | Remove DISABLED_EVENTS in billing-webhooks → restart | billing-webhooks resumes dunning. Both systems active until brain reverted. |
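The Phase 3 rollback relies on the DISABLED_EVENTS setting in billing-webhooks. Its format is not described in this plan; the sketch below assumes a comma-separated list of Stripe event types read once at startup, which would be consistent with rollback requiring a restart. parseDisabledEvents and dunningEnabled are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseDisabledEvents turns a comma-separated DISABLED_EVENTS value into a
// set. The format is an assumption, not billing-webhooks' documented config.
func parseDisabledEvents(raw string) map[string]bool {
	set := make(map[string]bool)
	for _, part := range strings.Split(raw, ",") {
		if ev := strings.TrimSpace(part); ev != "" {
			set[ev] = true
		}
	}
	return set
}

// dunningEnabled reports whether billing-webhooks should still run its
// dunning handler for this event type.
func dunningEnabled(eventType string, disabled map[string]bool) bool {
	return !disabled[eventType]
}

func main() {
	// Read once at startup: changing the variable takes effect only after a restart.
	disabled := parseDisabledEvents(os.Getenv("DISABLED_EVENTS"))
	for _, ev := range []string{"invoice.marked_uncollectible", "invoice.paid"} {
		fmt.Printf("%s: dunning enabled=%v\n", ev, dunningEnabled(ev, disabled))
	}
}
```

In this sketch, unsetting the variable and restarting yields an empty set, so dunning runs for every event type again, which matches the effect described in the Phase 3 rollback row.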
Tickets
| Ticket | Summary | Status | Blocked By |
|---|---|---|---|
| FINPE-1393 | Ship observe-only dunning brain | ✅ Closed | — |
| FINPE-1394 | Validate parity (brain vs billing-webhooks) | ✅ Closed | — |
| FINPE-1400 | Implement write ops — applystripe convergence + invoice-repo | 🔴 Open | — |
| FINPE-1401 | Cut over dunning from billing-webhooks to brain | ⏳ Open | FINPE-1400 |
| FINPE-1402 | Delete billing-webhooks dunning code | ⏳ Open | FINPE-1401 |
Each phase is independently deployable and reversible. Phase 1 proved the brain sees everything billing-webhooks sees. Phase 2 makes applystripe capable. Phase 3 flips the switch. Phase 4 cleans up.