PAYGO PLATFORM
PayPal Health Audit
An honest assessment of PayPal integration health. What we fixed, what we measured, what we still don't know, and what doesn't fit the narrative.
The FINPE-1311 void fix works. User-initiated uncollectible invoices dropped from 10–18/day to zero. That's the good news. The rest of this page is about everything else.
User-initiated uncollectibles per day since Mar 20: 0, down from 10–18. Complete elimination.
Customers hit by the void bug before the fix. 30+ CUSTESC tickets. Executive escalation.
Four new BQ query specs created. None have been run yet.
Two fixes shipped in March 2026. Together they close the user-initiated invoice void gap completely.
| Fix | Deployed | Mechanism | Status |
|---|---|---|---|
| FINPE-1311 void fix | ~Mar 16 | PayPal invoices now voided on payment failure instead of going uncollectible | ✅ Holding |
| VoidStaleUserInvoicesCron | Mar 20 | Safety net — hourly scan voids orphaned user-initiated invoices before Stripe dunning marks them uncollectible | ✅ Holding |
The control group (non-user-initiated uncollectibles) stayed flat at ~250/day throughout — confirming the fix is targeted and the cron isn't over-voiding.
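The cron's selection rule can be sketched as a pure function. This is a minimal illustration, not the actual VoidStaleUserInvoicesCron code: the `Invoice` record, the `select_stale_invoices` name, and the one-hour `max_age` default are assumptions made for the example. Only the billing reasons and the send_invoice collection method come from this audit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Billing reasons this audit treats as user-initiated.
USER_INITIATED = {"subscription_create", "subscription_update"}


@dataclass
class Invoice:
    # Hypothetical record; field names loosely mirror Stripe invoice fields.
    id: str
    billing_reason: str
    status: str  # e.g. "open", "paid", "void", "uncollectible"
    collection_method: str
    created: datetime


def select_stale_invoices(invoices, now, max_age=timedelta(hours=1)):
    """Return open, user-initiated send_invoice invoices at least max_age old.

    max_age is an assumed threshold; the real cron's cutoff is not stated here.
    """
    return [
        inv
        for inv in invoices
        if inv.status == "open"
        and inv.collection_method == "send_invoice"
        and inv.billing_reason in USER_INITIATED
        and now - inv.created >= max_age
    ]


if __name__ == "__main__":
    now = datetime(2026, 3, 21, 12, 0, tzinfo=timezone.utc)
    sample = [
        Invoice("in_a", "subscription_create", "open", "send_invoice", now - timedelta(hours=3)),
        Invoice("in_b", "subscription_cycle", "open", "send_invoice", now - timedelta(days=2)),
    ]
    for inv in select_stale_invoices(sample, now):
        print("would void", inv.id)
```

Renewal invoices (subscription_cycle) are deliberately excluded by the filter, which is consistent with the control group staying flat: a rule scoped this way cannot touch non-user-initiated invoices.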
We built four query specs. We haven't run any of them. Every number below is a prediction, not a measurement.
paypal-volume query: ready to run, but not yet run. If PayPal is less than 1% of customers, most risks are low-priority. If it's more than 5%, they're material. We don't know which.
paypal-collection-rate query: compares send_invoice vs charge_automatically outcomes. The dunning gap hypothesis says PayPal should be worse. We haven't confirmed it.
paypal-drift-detection query: finds PM/collection_method mismatches. Could be zero. Could be hundreds. Unknown.
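What a PM/collection_method mismatch means can be made concrete with a small sketch. It assumes the expected state is "customers with a PayPal PM have every subscription on send_invoice, everyone else is on charge_automatically". That rule is inferred from this page, and the actual paypal-drift-detection spec may define drift differently. The `Subscription` record and `find_drift` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    # Hypothetical record for illustration only.
    id: str
    customer_id: str
    collection_method: str  # "send_invoice" or "charge_automatically"


def find_drift(paypal_customer_ids, subscriptions):
    """Return (subscription id, expected, actual) for each mismatched subscription.

    Assumed rule: PayPal customers should be on send_invoice,
    all other customers on charge_automatically.
    """
    drifted = []
    for sub in subscriptions:
        if sub.customer_id in paypal_customer_ids:
            expected = "send_invoice"
        else:
            expected = "charge_automatically"
        if sub.collection_method != expected:
            drifted.append((sub.id, expected, sub.collection_method))
    return drifted
```

The length of the returned list corresponds to the drift count the query is meant to produce; whether it is zero or in the hundreds is exactly what remains unknown.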
The PayPal spike on client-secret compatibility was supposed to run early Q2. It hasn't started. If the answer is no, the unified checkout design needs to branch on payment method type. This gates the checkout workstream.
The team thesis is "PayPal is not special" — that PayPal and card payments reduce to the same requires_action flow and don't need separate handling. Here's the evidence against that thesis.
Every other payment method (card, Apple Pay, Google Pay, ACH, CashApp, Link, SEPA) routes through Stripe. PayPal routes through Braintree. This isn't an abstraction difference — it's a completely separate processor with separate error handling, separate reconciliation, and separate monitoring gaps.
Adding PayPal as a PM switches every subscription on the account to send_invoice with days_until_due. This means a customer's existing card subscriptions lose Stripe smart retry and dunning. The blast radius of a PayPal addition is account-wide, not scoped to PayPal-funded subscriptions.
FINPE-1311 fixed user-initiated invoices (subscription_create, subscription_update). But subscription_cycle renewals still go through send_invoice with no Stripe smart retry. The structural dunning gap is unchanged for renewals. The void fix addressed the symptom, not the architecture.
No business-level PayPal failure metric exists. No alert fires on PayPal failure spikes. No Grafana dashboard panel. Engineering discovers PayPal problems from customer escalations, not monitoring. A Braintree outage would be invisible until tickets arrive. PayPal is the only payment method with zero proactive observability.
The knowledge graph covers PayPal double-charge risk (30s Braintree dedup window). But 182 double-charge SF cases per month suggest systemic causes: proration generating unexpected second invoices, schedule overlap, Stripe retry duplication. The graph's coverage is mechanism-specific when the problem is systemic.
As long as PayPal uses Braintree, someone has to reconcile Stripe invoices against Braintree transactions monthly. What Stripe says should be paid vs what Braintree actually collected. This isn't a bug to fix — it's a structural cost of the integration.
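The monthly comparison can be sketched as a keyed diff between the two systems. This assumes a shared reference exists that ties a Stripe invoice to its Braintree transaction. That key, the `reconcile` function, and the bucket names are all hypothetical and used only to show the shape of the check.

```python
from decimal import Decimal


def reconcile(stripe_invoices, braintree_transactions):
    """Compare what Stripe expects against what Braintree collected.

    Both arguments map an assumed shared reference to an amount.
    Returns a dict of buckets describing each reference's state.
    """
    report = {
        "matched": [],
        "missing_in_braintree": [],
        "amount_mismatch": [],
        "unexpected_in_braintree": [],
    }
    for ref, expected in stripe_invoices.items():
        collected = braintree_transactions.get(ref)
        if collected is None:
            report["missing_in_braintree"].append(ref)
        elif collected != expected:
            report["amount_mismatch"].append((ref, expected, collected))
        else:
            report["matched"].append(ref)
    # Anything Braintree collected that Stripe never expected.
    for ref in braintree_transactions:
        if ref not in stripe_invoices:
            report["unexpected_in_braintree"].append(ref)
    return report


if __name__ == "__main__":
    stripe = {"ref_1": Decimal("20.00"), "ref_2": Decimal("5.00")}
    braintree = {"ref_1": Decimal("20.00"), "ref_3": Decimal("5.00")}
    print(reconcile(stripe, braintree))
```

Amounts are held as `Decimal` rather than floats so that equality comparison is exact. The "unexpected_in_braintree" bucket is where a double charge collected outside Stripe's view would surface.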
These predictions are falsifiable. If the measured values miss them by May 1, the void fix isn't working as expected and we need to re-examine the approach.
| Metric | Before | Predicted | Mechanism |
|---|---|---|---|
| User-initiated uncollectibles/day | 10–18 | 0 | Cron voids before Stripe dunning |
| New bad-debt users/day | 65 | 35 | Void gap eliminated; rollback failures remain |
| Bad-debt tickets/month | ~17 | <10 | Fewer orphaned invoices → fewer bad-debt bans |
| Invoice remediation tickets/month | ~30 | ~25 | Fewer manual void/waive requests |
| Void/waive escalations/week | 11 | 5–7 | Upstream cause removed |
| PayPal uncollectible rate | Unknown | <5% | First measurement needed |
| PayPal PM drift count | Unknown | <10 | First measurement needed |
Note: two rows say "Unknown" for the before value. We cannot validate those predictions until the paypal-collection-rate and paypal-drift-detection queries have been run.
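One way to make the May 1 check mechanical is to encode each prediction as a predicate and report unmeasured metrics explicitly. The sketch below is illustrative: the metric keys and the `evaluate` function are hypothetical. Rows with a point or approximate target (35, ~25) are left out because the table gives no tolerance for them.

```python
# Each predicate encodes the "Predicted" column for one row of the table.
PREDICTIONS = {
    "user_initiated_uncollectibles_per_day": lambda v: v == 0,
    "bad_debt_tickets_per_month": lambda v: v < 10,
    "void_waive_escalations_per_week": lambda v: 5 <= v <= 7,
    "paypal_uncollectible_rate": lambda v: v < 0.05,
    "paypal_pm_drift_count": lambda v: v < 10,
}


def evaluate(measured):
    """Return 'holds', 'falsified', or 'unmeasured' for every prediction.

    measured maps a metric key to its observed value; missing keys and
    None values are reported as 'unmeasured' rather than silently passing.
    """
    results = {}
    for name, holds in PREDICTIONS.items():
        value = measured.get(name)
        if value is None:
            results[name] = "unmeasured"
        else:
            results[name] = "holds" if holds(value) else "falsified"
    return results
```

Treating a missing measurement as its own state keeps the two "Unknown" rows from being counted as confirmed by default.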
Four BQ queries created. All target cf-billing-jh8o3p1. All follow the query spec schema. None have been executed.
| Query | What It Measures | Why It Matters |
|---|---|---|
| paypal-volume | Customer and subscription counts by PM type | Sizes the denominator — are PayPal risks material or noise? |
| paypal-collection-rate | Invoice outcomes: paid, uncollectible, voided by collection_method | The core question: is PayPal dunning worse than card? |
| paypal-drift-detection | PM/collection_method mismatches | Finds customers in actively broken states |
| paypal-dunning-gap | Overdue PayPal renewal outcomes | Measures the structural gap that the void fix didn't address |
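The comparison behind paypal-collection-rate amounts to an outcome share per collection method. The sketch below shows that arithmetic on in-memory pairs. It is not the BQ query itself, and the `outcome_rates` name and input shape are assumptions.

```python
from collections import Counter, defaultdict


def outcome_rates(invoices):
    """Return the share of each invoice status per collection method.

    invoices is an iterable of (collection_method, status) pairs.
    """
    counts = defaultdict(Counter)
    for method, status in invoices:
        counts[method][status] += 1
    rates = {}
    for method, by_status in counts.items():
        total = sum(by_status.values())
        rates[method] = {status: n / total for status, n in by_status.items()}
    return rates
```

If the dunning gap hypothesis holds, the "uncollectible" share under send_invoice would come out higher than under charge_automatically. The function only computes the shares and takes no position on the result.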
Knowledge graph updated: 5 new nodes in collection tree, 1 risk status changed from dormant to mitigated. Brain has 2016 nodes, 2828 edges after compilation.
All four specs are ready. Running them turns predictions into measurements. Until then, this audit is theory.
A paypal_charge_outcome Prometheus counter in subs-api. An alert on failure rate spikes. A Grafana panel. The minimum viable monitoring for a payment method that processes real money through a separate processor.
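The alerting arithmetic behind such a counter can be sketched without any monitoring stack. The class below is an in-memory stand-in, not a Prometheus client, and the `threshold` and `min_samples` values are placeholders rather than recommended settings. In a real setup the rate would be computed by an alert rule over the counter's time series, not inside the service.

```python
from collections import Counter


class ChargeOutcomeCounter:
    """In-memory stand-in for a labelled counter such as paypal_charge_outcome."""

    def __init__(self):
        self._counts = Counter()

    def inc(self, outcome):
        # outcome is a label value, e.g. "success" or "failure".
        self._counts[outcome] += 1

    def total(self):
        return sum(self._counts.values())

    def failure_rate(self):
        total = self.total()
        return self._counts["failure"] / total if total else 0.0


def should_alert(counter, threshold=0.2, min_samples=20):
    """Fire only when there is enough traffic and the failure share is high.

    threshold and min_samples are assumed values for illustration.
    """
    return counter.total() >= min_samples and counter.failure_rate() > threshold
```

The `min_samples` guard matters if PayPal volume turns out to be small: without it, a handful of failed charges in a quiet window would trip the alert.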
The client-secret compatibility question has been open since Q1. It gates checkout design. Someone needs to actually try it.
PayPal will always require dual-system reconciliation. PayPal will always use send_invoice. These aren't bugs — they're the architecture. Plan accordingly.