A Successful Payment That Never Became a Booking

A customer paid. Razorpay showed success. Our webhook endpoint returned HTTP 200. The payment was captured.

Yet the booking stayed stuck on "confirming."

No errors appeared. No exceptions were raised. No alerts went off. Every metric showed a healthy system.

But the customer had nothing. The creator had no booking.

Accepting money is easy. Ensuring every payment leads to a booking is the real challenge.

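Most tutorials suggest a flow where the webhook handler itself runs the business logic: receive the payment event, then confirm the booking in the same request.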
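A minimal sketch of that kind of handler, in Python. The function names and the `notes.booking_id` field are assumptions for illustration, not code from the original system, and the payload is a simplified `payment.captured` event.

```python
import json

# Stand-in for a real database: booking_id -> status.
bookings = {}


def confirm_booking(booking_id: str) -> None:
    """Hypothetical business step. If this raises, the confirmation is lost."""
    bookings[booking_id] = "confirmed"


def naive_webhook(raw_body: bytes) -> int:
    """Parse the event and run the business logic inside the same request."""
    event = json.loads(raw_body)
    if event["event"] == "payment.captured":
        # Assumed convention: the booking id travels in the payment notes.
        booking_id = event["payload"]["payment"]["entity"]["notes"]["booking_id"]
        confirm_booking(booking_id)  # the booking exists only if this line runs
    return 200
```

If `confirm_booking` raises, or the process dies before it finishes, nothing records that this payment still needs a booking.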

This is dangerous. If the business logic lives inside the webhook handler, every booking depends on that single delivery succeeding. Webhooks can be retried, duplicated, or fail partway through.

We changed our architecture to separate these tasks. Webhooks now only record events. They do not perform business logic.

We introduced an event ledger with three tables:
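The table definitions are not shown here, so the following is only one plausible shape for such a ledger, sketched with SQLite. Every table and column name (`webhook_events`, `payments`, `bookings`, and their fields) is hypothetical.

```python
import sqlite3

# Hypothetical schema: one table for raw events, one each for payment
# and booking state. None of these names come from the original system.
SCHEMA = """
CREATE TABLE IF NOT EXISTS webhook_events (
    event_id    TEXT PRIMARY KEY,                 -- provider event id; duplicates are rejected
    event_type  TEXT NOT NULL,
    payload     TEXT NOT NULL,                    -- raw JSON body, stored untouched
    status      TEXT NOT NULL DEFAULT 'pending',  -- pending | processed
    attempts    INTEGER NOT NULL DEFAULT 0,
    received_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS payments (
    payment_id TEXT PRIMARY KEY,
    booking_id TEXT NOT NULL,
    status     TEXT NOT NULL                      -- e.g. created | captured | failed
);
CREATE TABLE IF NOT EXISTS bookings (
    booking_id TEXT PRIMARY KEY,
    status     TEXT NOT NULL                      -- e.g. confirming | confirmed
);
"""


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the tables if they are missing."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

The primary key on `event_id` is what lets a redelivered event be detected at insert time rather than later in the business logic.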

The webhook now has one job: record the event.
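A sketch of such a record-only handler, assuming a hypothetical `webhook_events` table and an HMAC-SHA256 signature over the raw request body (the scheme Razorpay documents for webhooks). How the event id is obtained is left to the caller, and all function names are illustrative.

```python
import hashlib
import hmac
import json
import sqlite3


def open_events_db() -> sqlite3.Connection:
    """In-memory stand-in for the ledger; table and columns are assumed."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE webhook_events ("
        " event_id TEXT PRIMARY KEY,"
        " event_type TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def record_event(conn: sqlite3.Connection, secret: bytes, raw_body: bytes,
                 signature: str, event_id: str) -> int:
    """Verify, store, acknowledge. No business logic runs here."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison; reject bodies not signed with our secret.
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return 400
    event_type = json.loads(raw_body).get("event", "unknown")
    with conn:  # commits on success, rolls back on error
        # INSERT OR IGNORE turns a redelivered event into a no-op.
        conn.execute(
            "INSERT OR IGNORE INTO webhook_events (event_id, event_type, payload)"
            " VALUES (?, ?, ?)",
            (event_id, event_type, raw_body.decode("utf-8")),
        )
    return 200
```

The handler returns 200 only after the insert has committed, so an acknowledged event is always on disk, and a duplicate delivery leaves the existing row untouched.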

This protects the system. If anything fails after the event is recorded, the event is still safe and can be processed later.

We also learned that payment state and booking state are different. A captured payment is an input. A confirmed booking is the result. Keeping them separate allows for reconciliation.
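One way to picture that reconciliation is a query for captured payments whose booking is not yet confirmed. This is a sketch under assumptions: the `payments` and `bookings` tables, their columns, and the status values are hypothetical.

```python
import sqlite3


def open_state_db() -> sqlite3.Connection:
    """In-memory stand-in with separate payment and booking state (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE payments (
            payment_id TEXT PRIMARY KEY,
            booking_id TEXT NOT NULL,
            status     TEXT NOT NULL
        );
        CREATE TABLE bookings (
            booking_id TEXT PRIMARY KEY,
            status     TEXT NOT NULL
        );
        """
    )
    return conn


def find_paid_but_unconfirmed(conn: sqlite3.Connection) -> list:
    """Return (payment_id, booking_id) pairs where money arrived but no booking did."""
    return conn.execute(
        """
        SELECT p.payment_id, p.booking_id
        FROM payments AS p
        LEFT JOIN bookings AS b ON b.booking_id = p.booking_id
        WHERE p.status = 'captured'
          AND (b.booking_id IS NULL OR b.status != 'confirmed')
        ORDER BY p.payment_id
        """
    ).fetchall()
```

Because the two states live in separate rows, the mismatch is something the system can query for instead of something a customer has to report.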

During an investigation, we found a bug. The events existed in the database. The processor was healthy. The webhook was healthy.

But the processor never ran. Nobody was triggering the function to process pending events.

Decoupling ingestion from processing is good design. But it creates a new requirement: something must trigger the processing.
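The function that something must call could look roughly like this. It is a sketch, not the original processor: the `webhook_events` table, its columns, and the `handle` callback are assumptions.

```python
import sqlite3
from typing import Callable


def open_queue() -> sqlite3.Connection:
    """In-memory stand-in for the pending-events table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE webhook_events ("
        " event_id TEXT PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " attempts INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def drain_pending(conn: sqlite3.Connection,
                  handle: Callable[[str, str], None],
                  batch_size: int = 100) -> int:
    """Process pending events in insertion order; return how many succeeded."""
    rows = conn.execute(
        "SELECT event_id, payload FROM webhook_events"
        " WHERE status = 'pending' ORDER BY rowid LIMIT ?",
        (batch_size,),
    ).fetchall()
    done = 0
    for event_id, payload in rows:
        try:
            handle(event_id, payload)
        except Exception:
            # Count the attempt and leave the event pending for the next run.
            with conn:
                conn.execute(
                    "UPDATE webhook_events SET attempts = attempts + 1"
                    " WHERE event_id = ?",
                    (event_id,),
                )
            continue
        with conn:
            conn.execute(
                "UPDATE webhook_events SET status = 'processed' WHERE event_id = ?",
                (event_id,),
            )
        done += 1
    return done
```

In this sketch the handler and the status update are not atomic, so `handle` has to tolerate being called twice for the same event. And the function does nothing at all unless a scheduler, cron entry, or worker loop actually invokes it.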

We implemented a scheduler to run several jobs:
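The jobs themselves are not listed here, so the sketch below only illustrates the mechanism: a small interval scheduler in plain Python. The class names are illustrative, and the job names in the usage note are placeholders rather than the real job list.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    name: str
    interval_seconds: float
    run: Callable[[], None]
    next_due: float = 0.0  # due immediately on the first tick


@dataclass
class Scheduler:
    jobs: List[Job] = field(default_factory=list)

    def add(self, name: str, interval_seconds: float, run: Callable[[], None]) -> None:
        self.jobs.append(Job(name, interval_seconds, run))

    def tick(self, now: float) -> List[str]:
        """Run every job that is due at `now`; return the names that were run."""
        ran = []
        for job in self.jobs:
            if now >= job.next_due:
                try:
                    job.run()
                except Exception as exc:
                    # One failing job must not stop the others.
                    print(f"job {job.name} failed: {exc}")
                job.next_due = now + job.interval_seconds
                ran.append(job.name)
        return ran

    def run_forever(self, poll_seconds: float = 1.0) -> None:
        """Blocking loop for a worker process."""
        while True:
            self.tick(time.monotonic())
            time.sleep(poll_seconds)
```

A worker might register something like `scheduler.add("process_pending_events", 30, process_pending)` and `scheduler.add("reconcile_payments", 300, reconcile)` before calling `run_forever()`. A hosted cron or a task queue would serve the same purpose; the point is that the trigger exists and is itself monitored.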

To prevent errors during retries, we use this logic:
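The exact logic is not reproduced here. A common pattern that fits the description is to claim the event with a conditional update and apply the booking change in the same transaction, so a retry either does all of the work or none of it. The tables, columns, and the `booking_id` payload field below are assumptions.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    """In-memory stand-in; schema is hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE webhook_events (
            event_id TEXT PRIMARY KEY,
            payload  TEXT NOT NULL,
            status   TEXT NOT NULL DEFAULT 'pending'
        );
        CREATE TABLE bookings (
            booking_id TEXT PRIMARY KEY,
            status     TEXT NOT NULL
        );
        """
    )
    return conn


def process_event(conn: sqlite3.Connection, event_id: str) -> bool:
    """Apply one event at most once; return True if this call did the work."""
    with conn:  # one transaction: any exception rolls everything back
        claimed = conn.execute(
            "UPDATE webhook_events SET status = 'processed'"
            " WHERE event_id = ? AND status = 'pending'",
            (event_id,),
        ).rowcount
        if claimed == 0:
            return False  # already processed, or unknown event: nothing to do
        (payload,) = conn.execute(
            "SELECT payload FROM webhook_events WHERE event_id = ?",
            (event_id,),
        ).fetchone()
        booking_id = json.loads(payload)["booking_id"]
        # Setting a status is naturally idempotent: repeating it changes nothing.
        conn.execute(
            "UPDATE bookings SET status = 'confirmed' WHERE booking_id = ?",
            (booking_id,),
        )
    return True
```

Calling `process_event` twice for the same event confirms the booking once and returns `False` the second time. If the payload is malformed, the transaction rolls back and the event stays `pending` for a later retry.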

A system that only works when every webhook arrives on time is a fragile system. If your queue has no one to drain it, work waits forever.

Reliability means building for when things fail.

Source: https://dev.to/abhishekvoid/a-successful-payment-that-never-became-a-booking-building-a-fault-tolerant-payment-pipeline-4ioj