We have implemented two different approaches: one using the synchronous Saga pattern and the other following an asynchronous event-driven architecture.
We would greatly appreciate it if you could also briefly check out the event-driven implementation, as it also provides fault tolerance while ensuring consistency. The main difference is that logging for the checkout workflow is better implemented in this branch, which is why we use it as main; the event-driven approach, however, achieves a slightly higher RPS (requests per second).
The event-driven approach is implemented in the rabbitmq-final branch.
This repository contains our primary implementation of a distributed checkout system using the Saga pattern with orchestration.
Key Components
- Implemented a central orchestrator in the order-service, which:
  - Initiates stock subtraction (stock-service)
  - Initiates payment deduction (payment-service)
  - Finalizes the order upon success
  - Performs compensation in case of failure (refund payment or restore stock)
- Followed an orchestration-based pattern: The order service directly coordinates the workflow through synchronous HTTP requests.
- Workflow is synchronous and request-driven, which keeps the flow deterministic and easier to debug.
- Leveraged Redis pipelines and optimistic locking (WATCH/MULTI/EXEC) in stock-service and payment-service to:
  - Prevent race conditions in concurrent updates
  - Ensure atomicity and consistency in high-concurrency scenarios
  - Support safe retries when conflicts are detected
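
The orchestrated flow listed above (subtract stock, deduct payment, finalize, compensate on failure) can be sketched as follows. This is a minimal illustration, not the repository's code: `FakeStockService`, `FakePaymentService`, `StepFailed`, `run_saga`, and `checkout` are hypothetical names, and the in-memory classes stand in for the HTTP calls that order-service makes to the other services.

```python
from dataclasses import dataclass, field
from functools import partial


class StepFailed(Exception):
    """Raised by a participant when it cannot complete its step."""


@dataclass
class FakeStockService:
    # Hypothetical in-memory stand-in for stock-service.
    stock: dict = field(default_factory=dict)

    def subtract(self, item_id, amount):
        if self.stock.get(item_id, 0) < amount:
            raise StepFailed(f"not enough stock for {item_id}")
        self.stock[item_id] -= amount

    def add(self, item_id, amount):
        self.stock[item_id] = self.stock.get(item_id, 0) + amount


@dataclass
class FakePaymentService:
    # Hypothetical in-memory stand-in for payment-service.
    credit: dict = field(default_factory=dict)

    def pay(self, user_id, amount):
        if self.credit.get(user_id, 0) < amount:
            raise StepFailed(f"not enough credit for {user_id}")
        self.credit[user_id] -= amount

    def refund(self, user_id, amount):
        self.credit[user_id] = self.credit.get(user_id, 0) + amount


def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action fails, the compensations of all completed steps
    are executed in reverse order and the saga reports failure.
    """
    completed = []
    for action, compensate in steps:
        try:
            action()
        except StepFailed:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


def checkout(order, stock, payment):
    """Orchestrate one checkout: stock first, then payment, then finalize."""
    def mark_paid():
        order["paid"] = True

    steps = [
        (partial(stock.subtract, item, n), partial(stock.add, item, n))
        for item, n in order["items"].items()
    ]
    user, cost = order["user_id"], order["total_cost"]
    steps.append((partial(payment.pay, user, cost),
                  partial(payment.refund, user, cost)))
    steps.append((mark_paid, lambda: None))  # finalizing needs no compensation
    return run_saga(steps)
```

Keeping each action next to its compensation makes it hard to add a step without also deciding how to undo it, which is the main reason this sketch uses pairs rather than separate forward and rollback functions.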
- Orchestration inside order-service: There is no standalone orchestrator service. Instead, order-service drives the workflow, coordinating the saga while also being a domain-bound participant.
- Synchronous coordination simplifies tracing, error handling, and debugging.
- Optimistic concurrency control using Redis detects conflicting updates and maintains performance without holding locks.
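
To illustrate the optimistic concurrency idea, here is a minimal sketch of the read, check, and conditional-write retry loop. It deliberately uses an in-memory stand-in instead of Redis so that it runs without a server: `VersionedStore`, `write_if_unchanged`, and `subtract_stock` are hypothetical names. With a client such as redis-py, the same loop would typically call `watch` on a pipeline, read the value, call `multi`, queue the write, and retry when `execute` raises `redis.WatchError`.

```python
import threading


class VersionedStore:
    """In-memory stand-in for Redis: every successful write bumps a version."""

    def __init__(self):
        self._data = {}                # key -> (value, version)
        self._lock = threading.Lock()  # guards the dict, not the caller's logic

    def read(self, key):
        """Return (value, version); missing keys read as (0, 0)."""
        with self._lock:
            return self._data.get(key, (0, 0))

    def write_if_unchanged(self, key, expected_version, value):
        """Commit only if nobody wrote since `expected_version` was read.

        This plays the role of EXEC after WATCH: the write is rejected
        if the watched key changed in the meantime.
        """
        with self._lock:
            _, version = self._data.get(key, (0, 0))
            if version != expected_version:
                return False
            self._data[key] = (value, version + 1)
            return True


def subtract_stock(store, item_id, amount, max_retries=50):
    """Subtract stock optimistically, retrying when a conflict is detected."""
    for _ in range(max_retries):
        stock, version = store.read(item_id)
        if stock < amount:
            return False  # business rule failed; nothing was written
        if store.write_if_unchanged(item_id, version, stock - amount):
            return True   # no concurrent writer interfered
        # Another writer committed first: re-read and try again.
    raise RuntimeError("gave up after too many conflicting updates")
```

No lock is held between the read and the write, so readers never block each other; the cost is that a writer may have to repeat its work under contention.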
To ensure fault tolerance and state recoverability, each service implements logging:
- All state transitions (e.g., order placed, stock reserved, payment processed) are written to log files:

      logging/
      ├── order_log.txt
      ├── stock_log.txt
      └── payment_log.txt

- These logs are:
  - Append-only and human-readable
  - Chronologically ordered for replay
  - Used for recovery after crashes
- On startup or failure recovery, each service runs:

      ./start_redis_with_recovery.sh

  This script:
  - Parses the service’s log file
  - Reconstructs Redis state deterministically
  - Allows the service to resume as if it never crashed
- This decentralized logging model ensures:
  - Fast recovery with no coordination overhead
  - No single point of failure
  - Replay-safe execution aligned with the last committed operation
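
The append-and-replay idea behind these logs can be sketched as follows. The actual log format and the contents of `start_redis_with_recovery.sh` are not shown here, so everything in this example is an assumption: it uses one JSON object per line, hypothetical `set`, `add`, and `subtract` operations, and a plain dictionary in place of Redis. The function names `append_event` and `replay` are likewise illustrative.

```python
import json
import os
import tempfile


def append_event(log_path, event):
    """Append one state transition as a JSON line and force it to disk."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")
        log.flush()
        os.fsync(log.fileno())  # make the record durable before continuing


def replay(log_path):
    """Rebuild state by applying logged events in their original order."""
    state = {}
    if not os.path.exists(log_path):
        return state  # nothing logged yet: start empty
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                # A crash can leave a torn final record; stop at the
                # last complete one so replay matches committed state.
                break
            item = event["item"]
            if event["op"] == "set":
                state[item] = event["value"]
            elif event["op"] == "subtract":
                state[item] = state.get(item, 0) - event["amount"]
            elif event["op"] == "add":
                state[item] = state.get(item, 0) + event["amount"]
    return state


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "stock_log.txt")
    append_event(path, {"op": "set", "item": "apple", "value": 10})
    append_event(path, {"op": "subtract", "item": "apple", "amount": 3})
    print(replay(path))
```

Because replay only reads the service's own file and applies events in order, two runs over the same log produce the same state, which is what makes recovery deterministic in this sketch.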