Payments Are a Distributed State Machine, Not CRUD
09 Jun 2026
The bug that taught me the most respect for billing systems was the one where our website showed an invoice for a subscription the customer had already cancelled.
That actually happened. The subscriptions page did not list it: cancelled and gone. But somewhere else in the product, an invoice for it was still cheerfully on display. Two parts of the same system disagreed about whether someone was even a customer.
Payments look like CRUD until you have lived in them. Create a subscription, read its status, update the plan, delete on cancel. It feels like a table with some rows. Then you realise it is a distributed state machine: the source of truth sits at the provider (Stripe, Razorpay) and reaches you as webhooks, and those webhooks arrive out of order, more than once, and sometimes not at all.
Here are a few scars from that stretch of work, and what they drilled into me.
A subscription that quietly stopped renewing
A Razorpay subscription stopped auto-renewing in production. No error, no alert. It just did not renew, and we found out the slow way.
The cause was not Razorpay. It was us. A few weeks earlier we had split webhook handling into a separate path, a reasonable refactor on its own. But a renewal event slipped through the seam between the old handling and the new. The event came in, nothing was listening for it in the right place, and the subscription silently lapsed.
This is the thing about webhooks. They are not a request you control and can retry on your terms. They are an event stream the provider pushes at you, and any gap in your handling shows up later as money that did not move. When you change how you process them, you are changing a load-bearing wall.
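One way to make that kind of seam visible is to route every event through a single table and treat "no handler" as a signal rather than a silent no-op. The sketch below is a minimal illustration, not our actual code: `WebhookRouter`, its method names, and the event type strings are all hypothetical, and a real system would page someone instead of appending to a list.

```python
from typing import Callable, Dict, List, Set

Handler = Callable[[dict], None]


class WebhookRouter:
    """Routes events by type and records anything nobody handles."""

    def __init__(self, expected_types: Set[str]) -> None:
        # Event types the business depends on (illustrative names only).
        self.expected_types = expected_types
        self.handlers: Dict[str, Handler] = {}
        self.unhandled: List[str] = []

    def register(self, event_type: str, handler: Handler) -> None:
        self.handlers[event_type] = handler

    def missing_handlers(self) -> Set[str]:
        # Run this at startup or in a test: it catches a refactor that
        # moved handlers around and dropped one along the way.
        return self.expected_types - set(self.handlers)

    def dispatch(self, event: dict) -> bool:
        handler = self.handlers.get(event["type"])
        if handler is None:
            # Never drop an event quietly; keep it so it can be alerted on.
            self.unhandled.append(event["type"])
            return False
        handler(event)
        return True
```

A startup check along the lines of `assert not router.missing_handlers()` turns "a renewal event slipped through" into a failed deploy instead of a lapsed subscription.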
“Invalid integer: all”
Around the same time, a merge to master lit up our Firebase Cloud Functions with this:
error: "Invalid integer: all"
message: "Failed to fetch invoices from Stripe"
A tiny type assumption somewhere was passing the string "all" where an integer was expected. Completely invisible in code review. Loud and immediate in production, the moment it ran against real Stripe data.
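The usual defence is to validate at the boundary, before the value ever reaches the provider's API. Which parameter carried the string is not recorded here, so the sketch below is purely hypothetical: it assumes a page-size option where some caller used "all" to mean "no limit". The function name `parse_page_size` and the cap of 100 are made up for illustration.

```python
import re


def parse_page_size(raw: object, maximum: int = 100) -> int:
    """Return a validated page size, or raise before any API call is made."""
    if isinstance(raw, int) and not isinstance(raw, bool):
        value = raw
    elif isinstance(raw, str) and re.fullmatch(r"[0-9]+", raw):
        value = int(raw)
    else:
        # "all", None, 2.5, True and friends all end up here.
        raise ValueError(f"page size must be an integer, got {raw!r}")
    if not 1 <= value <= maximum:
        raise ValueError(f"page size must be between 1 and {maximum}, got {value}")
    return value
```

The failure still happens, but it happens in your own code with a message that names the bad value, and a unit test can exercise it without touching real data.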
I mention it because it is so ordinary. The expensive payment bugs are almost never clever. They are a string where a number should be, an event handled in the wrong place, a status that two systems read differently.
What I actually learned
A few principles that I now treat as non-negotiable when money is involved.
Make webhook handling idempotent. The same payment will be reported to you several times. A retry, a duplicate, a replay after an outage. Your job is not to process each delivery, it is to land in the same final state no matter how many times you see the event. Dedupe on the provider’s event id, make the state transition safe to apply twice, and stop assuming “exactly once”.
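A minimal sketch of both halves of that advice, deduping on the event id and making the transition itself safe to repeat. Everything here is an assumption for illustration: the `SubscriptionStore` class, the event shape (`id`, `subscription_id`, `period_end`), and the in-memory storage standing in for a database.

```python
from typing import Dict, Set


class SubscriptionStore:
    """In-memory stand-in for a database table plus a processed-events table."""

    def __init__(self) -> None:
        self.seen_event_ids: Set[str] = set()
        # subscription id -> end of the period the provider says is paid for
        self.paid_through: Dict[str, int] = {}

    def apply_renewal(self, event: dict) -> bool:
        """Apply a renewal event. Returns False if it was a duplicate delivery."""
        if event["id"] in self.seen_event_ids:
            return False  # already processed; redelivery changes nothing

        sub_id = event["subscription_id"]
        current = self.paid_through.get(sub_id, 0)
        # max() rather than "add one period": applying this twice, or applying
        # an older event late, can never move the subscription backwards.
        self.paid_through[sub_id] = max(current, event["period_end"])

        self.seen_event_ids.add(event["id"])
        return True
```

In a real database the dedupe check and the state write would need to sit in one transaction (or behind a unique constraint on the event id), otherwise two concurrent deliveries can both pass the check.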
The provider is the source of truth. Your database is a cache that is allowed to be wrong. This sounds obvious and almost nobody builds like they believe it. Your local subscription.status is a convenience copy. It will drift. So you reconcile against the provider rather than trusting your own row, and you build a way to re-sync when (not if) they disagree.
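Reconciliation can start as something very small: fetch the provider's view, diff it against your rows, overwrite where they disagree, and keep a record of every disagreement. The sketch below assumes both sides have already been reduced to a plain mapping of subscription id to status; the `Drift` type and the `reconcile` function are hypothetical names, and fetching the provider snapshot is left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Drift:
    subscription_id: str
    local_status: Optional[str]
    provider_status: Optional[str]


def reconcile(local: Dict[str, str], provider: Dict[str, str]) -> List[Drift]:
    """Overwrite local statuses with the provider's and report every mismatch."""
    drifts: List[Drift] = []
    for sub_id in sorted(set(local) | set(provider)):
        ours = local.get(sub_id)
        theirs = provider.get(sub_id)
        if ours == theirs:
            continue
        drifts.append(Drift(sub_id, ours, theirs))
        if theirs is not None:
            local[sub_id] = theirs  # the provider wins
        # If the provider has no record at all, leave the row alone and let
        # a human look at it; deleting on a missing record is too risky.
    return drifts
```

The returned list is as useful as the repair itself: a steady trickle of drift usually points at a webhook that is being dropped or mishandled somewhere.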
State changes are not free-form. A subscription goes active, past_due, cancelled, and the valid transitions between those are a small graph. Writing each webhook handler as “set status to whatever the event says” is how you end up showing invoices for cancelled subscriptions. Model the state machine explicitly and reject transitions that should not happen.
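An explicit transition table can be only a few lines. The graph below is an assumption made for illustration (in particular, treating cancelled as terminal); the real set of edges depends on the provider and on your own product rules. `ALLOWED`, `InvalidTransition`, and `transition` are hypothetical names.

```python
from typing import Dict, Set

# Assumed transition graph: cancelled is terminal, the other two can
# move between each other or end in cancelled.
ALLOWED: Dict[str, Set[str]] = {
    "active": {"past_due", "cancelled"},
    "past_due": {"active", "cancelled"},
    "cancelled": set(),
}


class InvalidTransition(Exception):
    pass


def transition(current: str, target: str) -> str:
    """Return the new status, or raise if the move is not in the graph."""
    if current not in ALLOWED:
        raise InvalidTransition(f"unknown current status {current!r}")
    if current == target:
        return current  # replayed event: a no-op, not an error
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current} -> {target} is not allowed")
    return target
```

With this in place, a late "payment succeeded" event for a cancelled subscription raises instead of quietly flipping the row back to active, which is the point at which you want to go and ask the provider what is actually true.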
The bugs that cost money do not throw exceptions. That is the uncomfortable part. A crash you will see. A subscription that silently fails to renew, or two pages that disagree about a customer’s status, those just sit there quietly costing you until someone notices. So you instrument the money-paths more carefully than the rest of your app, and you alert on the absence of events you expected, not just on errors.
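Alerting on silence needs a statement of what you expected to hear. A rough sketch, assuming you store for each subscription you believe is active the end of the period the provider has confirmed as paid: anything whose paid period ended longer ago than a grace window, with no renewal recorded since, is a candidate for a silent lapse. The function name, the input shape, and the six-hour default are all made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def find_silent_lapses(
    paid_through: Dict[str, datetime],
    now: datetime,
    grace: timedelta = timedelta(hours=6),
) -> List[str]:
    """Return ids of supposedly active subscriptions whose renewal never arrived.

    paid_through maps subscription id -> end of the last confirmed paid period.
    """
    return sorted(
        sub_id
        for sub_id, period_end in paid_through.items()
        if now > period_end + grace
    )
```

Run on a schedule, this catches exactly the failure that never throws: the renewal event that did not arrive, or arrived and was handled nowhere.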
The mindset shift
If I had to compress it: stop thinking of billing as your data and start thinking of it as a projection of the provider’s truth, assembled from an unreliable event stream.
Once that clicks, a lot of the defensive work stops feeling like overhead and starts feeling like the actual job. Idempotency, reconciliation, explicit state, alerting on silence. None of it is glamorous. All of it is the difference between a billing system you trust and one you are quietly afraid of.
If your billing “mostly works”, that is exactly the phrase I would worry about. Go and find the place where two parts of your system disagree about who is paying you.