April 18, 2025

🔄 Mastering the SAGA Pattern: Java vs React – A Deep Dive for Architects and Interview Champions

🧠 Why Do We Need the SAGA Pattern?

In modern distributed systems, especially microservices and rich client-side apps, a single ACID database transaction cannot span the whole workflow, because each service owns its own data and each step is a separate network call. Here's why we need the SAGA pattern:

  • 🔄 Ensures eventual consistency across services

  • ❌ Handles partial failure gracefully

  • Enables complex, multi-step workflows

  • ⛔ Avoids the complexity and tight coupling of two-phase commit (2PC)


📘 What Is the SAGA Pattern?

A SAGA is a sequence of local transactions. Each service updates its data and publishes an event. If a step fails, compensating transactions are triggered to undo the impact of prior actions.

✌️ Two Main Styles:

Pattern | Description
Orchestration | Centralized controller manages the saga
Choreography | Services communicate via events

💻 SAGA in Java (Spring Boot)

🛍️ E-Commerce Checkout Flow

  1. Create Order

  2. Reserve Inventory

  3. Charge Payment

  4. Initiate Shipping

❌ If Payment Fails:

  • Release Inventory

  • Cancel Order

  • Refund (only needed when a later step, such as shipping, fails after the charge went through)

✨ Java Orchestration Example

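import java.util.ArrayDeque;
import java.util.Deque;

public class OrderSagaOrchestrator {
  // inventoryService, paymentService, shippingService and orderService are
  // assumed to be injected fields (omitted here, as in the original snippet).

  public void startSaga(OrderEvent event) {
    // Compensations for the steps that have completed, most recent first.
    Deque<Runnable> compensations = new ArrayDeque<>();
    // The order already exists when the saga starts, so cancelling it runs last.
    compensations.push(() -> orderService.cancelOrder(event.getOrderId()));
    try {
      inventoryService.reserveItem(event.getProductId());
      compensations.push(() -> inventoryService.releaseItem(event.getProductId()));

      paymentService.charge(event.getUserId(), event.getAmount());
      compensations.push(() -> paymentService.refund(event.getUserId(), event.getAmount()));

      // Shipping is the last step; if it fails there is no shipment to cancel.
      shippingService.shipOrder(event.getOrderId());
    } catch (Exception e) {
      rollbackSaga(compensations);
    }
  }

  // Undo only the steps that actually completed, in reverse order.
  private void rollbackSaga(Deque<Runnable> compensations) {
    while (!compensations.isEmpty()) {
      compensations.pop().run();
    }
  }
}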
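
For contrast with the orchestrator, here is a rough sketch of the same checkout flow in the choreography style, where no central controller exists and each service only reacts to events. Everything in it is an assumption made for illustration: `EventBus` is a hypothetical in-memory stand-in for a broker such as Kafka or RabbitMQ, and the event names are invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for a message broker.
class EventBus {
  private final Map<String, List<Consumer<String>>> handlers = new HashMap<>();
  final List<String> log = new ArrayList<>(); // every event published, in order

  void subscribe(String type, Consumer<String> handler) {
    handlers.computeIfAbsent(type, t -> new ArrayList<>()).add(handler);
  }

  void publish(String type, String orderId) {
    log.add(type);
    for (Consumer<String> handler : handlers.getOrDefault(type, List.of())) {
      handler.accept(orderId);
    }
  }
}

class ChoreographyDemo {
  // Wires the services together, runs one checkout, and returns the event trail.
  static List<String> run(boolean paymentSucceeds) {
    EventBus bus = new EventBus();
    // Inventory service: reserves on OrderCreated, releases on PaymentFailed.
    bus.subscribe("OrderCreated", id -> bus.publish("InventoryReserved", id));
    bus.subscribe("PaymentFailed", id -> bus.publish("InventoryReleased", id));
    // Payment service: charges once inventory is reserved.
    bus.subscribe("InventoryReserved", id ->
        bus.publish(paymentSucceeds ? "PaymentCharged" : "PaymentFailed", id));
    // Shipping service: ships once the payment is charged.
    bus.subscribe("PaymentCharged", id -> bus.publish("ShipmentStarted", id));
    // Order service: cancels the order once inventory has been released.
    bus.subscribe("InventoryReleased", id -> bus.publish("OrderCancelled", id));

    bus.publish("OrderCreated", "order-1");
    return bus.log;
  }

  public static void main(String[] args) {
    System.out.println(run(true));
    System.out.println(run(false));
  }
}
```

The compensation path here is just more events: a failed payment triggers the inventory release, which in turn triggers the order cancellation. The trade-off is that the overall flow is no longer visible in one place, which is the usual argument for an orchestrator once a saga grows beyond a few steps.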

📈 Tools & Frameworks:

  • Spring Boot

  • Kafka/RabbitMQ

  • Axon Framework / Eventuate


⚛️ SAGA in React (redux-saga)

🚪 Multi-Step Login Workflow

  1. Authenticate User

  2. Fetch Profile

  3. Load Preferences

❌ If Fetch Profile Fails:

  • Logout

  • Show Error

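import { call, put } from 'redux-saga/effects';

// loginAPI, fetchProfile, fetchPreferences and logoutUser are app-specific
// functions assumed to be defined elsewhere; each returns a promise.
function* loginSaga(action) {
  try {
    // Step 1: authenticate
    const token = yield call(loginAPI, action.payload);
    yield put({ type: 'LOGIN_SUCCESS', token });

    // Step 2: fetch the profile
    const profile = yield call(fetchProfile, token);
    yield put({ type: 'PROFILE_SUCCESS', profile });

    // Step 3: load preferences
    const prefs = yield call(fetchPreferences, token);
    yield put({ type: 'PREFERENCES_SUCCESS', prefs });

  } catch (err) {
    // Any failed step lands here: record the failure, then compensate by
    // clearing whatever session state the earlier steps created.
    yield put({ type: 'LOGIN_FAILURE', error: err.message });
    yield call(logoutUser);
    yield put({ type: 'SHOW_ERROR', message: 'Login failed' });
  }
}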

🧠 Key Concepts:

  • redux-saga = orchestrator

  • yield call() = async step

  • Rollback = logout/cleanup


🌍 Real-Life Use Cases

Backend:

  • Booking systems (flight + hotel)

  • Wallet fund transfers

  • eCommerce checkouts

Frontend:

  • Multi-step login/signup

  • Form wizard undo

  • Order confirmation with rollback


🏠 Architectural Deep Dive

πŸ”¨ Orchestration

            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
            β”‚ Orchestratorβ”‚
            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚ Order Created β”‚
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 β–Ό
       Inventory β†’ Payment β†’ Shipping
        ↓         ↓         ↓
   Release ← Refund ← Cancel
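
The unwinding in the diagram can be sketched as a small generic runner: each step carries its own compensation, and a failure compensates the completed steps in reverse order. This is a minimal illustration, not a production orchestrator. `SagaStep`, `SagaRunner`, and the step names are hypothetical, everything runs in memory, and a real system would also persist saga state and retry compensations that fail.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// One saga step: a forward action plus the compensation that undoes it.
class SagaStep {
  final Runnable action;
  final Runnable compensation;

  SagaStep(Runnable action, Runnable compensation) {
    this.action = action;
    this.compensation = compensation;
  }
}

class SagaRunner {
  // Runs steps in order. On failure, compensates the completed steps in
  // reverse order and returns false; returns true if every step succeeded.
  static boolean run(List<SagaStep> steps) {
    Deque<SagaStep> completed = new ArrayDeque<>();
    for (SagaStep step : steps) {
      try {
        step.action.run();
        completed.push(step);
      } catch (RuntimeException e) {
        while (!completed.isEmpty()) {
          completed.pop().compensation.run();
        }
        return false;
      }
    }
    return true;
  }
}

class SagaRunnerDemo {
  // Simulates the checkout with a declined payment and returns the call trail.
  static List<String> failingPaymentTrail() {
    List<String> trail = new ArrayList<>();
    List<SagaStep> steps = List.of(
        new SagaStep(() -> trail.add("createOrder"), () -> trail.add("cancelOrder")),
        new SagaStep(() -> trail.add("reserve"), () -> trail.add("release")),
        new SagaStep(
            () -> { throw new IllegalStateException("card declined"); },
            () -> trail.add("refund")),
        new SagaStep(() -> trail.add("ship"), () -> trail.add("cancelShipment")));
    SagaRunner.run(steps);
    return trail;
  }

  public static void main(String[] args) {
    System.out.println(failingPaymentTrail());
  }
}
```

Because the payment step never completed, its refund is never called, and shipping is never attempted. Only the inventory release and the order cancellation run, in that order.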

📚 What's Next?

  1. Event Sourcing

  2. CQRS (Command Query Responsibility Segregation)

  3. Outbox Pattern

  4. Retry Patterns

  5. Step Functions / State Machines


🛌 Interview Questions

Question | Tip
What is a SAGA pattern? | Explain distributed transaction and compensation
Orchestration vs Choreography? | Orchestrator vs Event-based
SAGA in React? | Use redux-saga, show generator pattern
How to rollback in microservices? | Compensating transaction
Why not 2PC? | Not scalable, tight coupling

🧐 Self-Test Questions

  1. Design a SAGA for flight + hotel combo

  2. SAGA with Kafka vs SAGA with REST

  3. Difference: retry vs compensation

  4. How to ensure idempotency in SAGA?

  5. Drawbacks of SAGA? Latency, complexity?


📄 Summary: Backend vs Frontend

Feature | Java (Spring) | React (redux-saga)
Purpose | Distributed data consistency | UI flow control
Pattern | Event-driven orchestration | Generator-based orchestration
Rollback | Compensating transaction | State rollback/logout
Communication | Kafka, REST, RabbitMQ | Redux actions, async calls

Ready to master distributed consistency like a pro? SAGA is your first step to engineering Olympic-level microservice systems and stateful UI flows!

April 16, 2025

🚀 Deploying AWS Lambda with CloudFormation: Deep Dive with Reasoning, Strategy & Implementation

Infrastructure as Code (IaC) is not just a DevOps trend; it's a necessity in modern cloud environments. AWS CloudFormation empowers teams to define, version, and deploy infrastructure consistently.

In this blog, we'll explore a full-fledged CloudFormation template designed for deploying Lambda functions, focusing on why we structure it this way, what each section does, and how it contributes to reliable, scalable infrastructure.


✅ WHY: The Motivation Behind This Template

1. Consistency Across Environments

Manual deployment = human error. This template ensures every Lambda function in QA, PreProduction, or Production is configured exactly the same, reducing bugs and drift.

2. Scalability for Teams

Multiple teams, multiple functions. Parameterization, environment mapping, and IAM policies are designed so one template can serve dozens of use cases.

3. Security-First Approach

IAM roles, security groups, and S3 access policies follow the least privilege principle, enforcing boundaries between services and reducing risk.

4. Automated, Repeatable Deployments

Once written, this template becomes part of CI/CD pipelines, so there is no more clicking around the AWS Console to deploy each version.


🧾 WHAT: Key Components of the Template

πŸ”§ Parameters

Define runtime configurations:

  • Memory, timeout, environment

  • S3 path to code

  • Whether it's deployed to a disaster recovery (DR) environment or not

Why: Keeps the template generic & reusable. You plug in values instead of rewriting.


🧩 Mappings

Mappings connect abstract inputs to actual values:

  • VPC CIDRs by environment

  • Mongo Atlas domains

  • AWS Account-specific values

Why: Allows deployment across multiple AWS accounts and regions without code change.


🔐 IAM Roles & Policies

Provides:

  • Execution Role for Lambda

  • S3 read access for code artifacts

  • Access to services like SQS, Batch, Kinesis

Why: Lambda runs with temporary credentials. These permissions define what Lambda can touch, and nothing more.


🌐 VPC & Subnets

Lambda can be placed in VPC subnets to access:

  • Databases

  • Internal services

  • VPC-only APIs

Why: Enables secure, private connectivity, which is essential for production-grade workloads.


🎯 Scheduled Invocations

Supports setting a CRON schedule for periodic executions (e.g. cleanup tasks, polling jobs).

Why: Reduces the need for additional services like CloudWatch Events or external schedulers.


🔧 HOW: Putting It All Together

Let's walk through the deployment logic of the template:

1. Define Inputs via Parameters

Parameters:
  Project:
    Type: String
  Environment:
    Type: String
    AllowedValues: [QA, PreProduction, Production]
  ...

You pass these values when you deploy the stack (e.g., via AWS CLI or CD pipelines).


2. Look Up Environment-Specific Values

Mappings:
  EnvCIDRByVPC:
    QA:
      PlatformA: 10.0.0.0/16

CloudFormation uses Fn::FindInMap to fetch the right CIDR, Mongo domain, or account ID dynamically.


3. Create IAM Roles with Granular Access

Policies:
  - PolicyName: LambdaArtifactAccess
    PolicyDocument:
      Statement:
        - Effect: Allow
          Action: [s3:GetObject]
          Resource: arn:aws:s3:::bucket-name/path/to/code.zip

This ensures the Lambda function can only read its own code and interact with intended services.


4. Provision Lambda in VPC

VpcConfig:
  SubnetIds: [subnet-1, subnet-2]
  SecurityGroupIds: [sg-xxxx]

Running Lambda in a VPC helps isolate network traffic and control what it can talk to.


5. Support for Schedule

Properties:
  ScheduleExpression: rate(1 hour)

This allows you to deploy event-driven or scheduled Lambdas from the same template, without maintaining an external scheduler.


🧠 DEEP KNOWLEDGE: Under-the-Hood Design Decisions

πŸ”„ Environment Agnostic via Mappings

Instead of using if/else logic, Mappings let CloudFormation resolve values at runtime with Fn::FindInMap. This is more maintainable and faster than using Conditions.


🔐 S3 Bucket Access is Explicit

Instead of granting wide access, the template builds the exact S3 ARN path of each Lambda artifact. This follows the principle of least privilege.


🛡 IAM Role Segregation

Lambda roles are created per function, so access doesn't bleed over into unrelated resources.


🧩 Security Group Logic Uses External Service

Outbound rules are set using:

ServiceToken: arn:aws:lambda:...

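This is a custom resource: ServiceToken names the Lambda function that CloudFormation calls to manage the outbound rules. The template treats it as a Service Catalog custom resource, showing how advanced teams abstract and reuse security config logic across orgs.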


🧬 Metadata Tags

Every resource is tagged with:

  • Project

  • Platform

  • Environment

  • Cost center

  • Application owner

This is crucial for FinOps, auditing, and visibility in large-scale environments.


🧰 Want to Go Further?

  • πŸ’‘ Add CloudWatch log groups

  • πŸͺ Use Lambda Destinations for post-processing

  • πŸ§ͺ Integrate with SAM for local testing

  • πŸ”„ Automate deployments with CodePipeline or GitHub Actions


✍️ Final Thoughts

This CloudFormation template is more than just a deployment script; it's a framework for building scalable, secure, and repeatable serverless architectures. Designed with flexibility, observability, and compliance in mind, it helps teams move faster without sacrificing control.

Whether you're managing one Lambda or a hundred, this structure will help you stay organized and resilient as you scale.



April 15, 2025

Experienced Level System Design 💡

 A practical overview of challenging real-world system designs. Each design idea includes its purpose, blockers, solutions, intuition, and a popular interview Q&A to help you prepare for high-level interviews or system architecture discussions.

Use this as a cheat sheet or learning reference to guide your system design thinking.

# | System Design Problem | Intuition & Design Idea | Blockers & Challenges | Solution/Best Practices | Famous Interview Question & Answer
1 | URL Shortening (bit.ly) | Map long URLs to short hashes. Store metadata and handle redirection. | High scale, link abuse | Use Base62/UUID, Redis cache, rate-limiting | Q: How to avoid collisions in shortened URLs? A: Use hash + check DB for duplicates.
2 | Distributed KV Store (Redis) | Store data as key-value pairs across nodes. | Network partitions, consistency | Gossip/Raft protocol, sharding, replication | Q: How to handle Redis master failure? A: Sentinel auto-failover.
3 | Scalable Social Network (Facebook) | Users interact via posts, likes, comments. Need timeline/feed generation. | Feed generation latency, DB bottlenecks | Precompute feed (fanout), cache timeline | Q: How is news feed generated? A: Fan-out to followers or pull on-demand.
4 | Recommendation System (Netflix) | Suggest content based on user taste + trends | Cold start, real-time scoring | Use hybrid filtering, vector embeddings | Q: How to solve cold start? A: Use content-based filtering.
5 | Distributed File System (HDFS) | Break files into blocks, replicate across nodes. | Metadata scaling, file recovery | NameNode for metadata, block replication | Q: How does HDFS ensure fault tolerance? A: 3x replication and heartbeat checks.
6 | Real-time Messaging (WhatsApp) | Deliver messages instantly, maintain order. | Ordering, delivery failures | Kafka queues, delivery receipts, retries | Q: How to ensure delivery? A: ACK, retry, message status flags.
7 | Web Crawler (Googlebot) | Crawl web, avoid duplicate/irrelevant content. | URL duplication, crawl efficiency | BFS + filters, politeness policy | Q: How to avoid crawling same URL? A: Normalize + deduplicate with hash.
8 | Distributed Cache (Memcached) | Store frequently accessed data closer to users. | Cache invalidation, stampede | TTL staggering, background refresh | Q: How to handle cache stampede? A: Use mutex/locks for rebuilds.
9 | CDN (Cloudflare) | Serve static assets from edge for low latency. | Cache expiry, geolocation | Use geo-DNS, cache invalidation APIs | Q: How does CDN reduce latency? A: Edge nodes cache closer to user.
10 | Search Engine (Google) | Index content and rank pages on queries. | Real-time indexing, ranking | MapReduce, inverted index, TF-IDF | Q: How does Google rank pages? A: Relevance + PageRank + freshness.
11 | Ride-sharing (Uber) | Match drivers to riders using location data. | Geo-search, dynamic pricing | Use GeoHashing, Kafka, ETA predictions | Q: How does Uber find nearby drivers? A: Geo index or R-tree based lookup.
12 | Video Streaming (YouTube) | Store and stream videos with low buffer. | Encoding, adaptive playback | ABR (adaptive bitrate), chunking, CDN | Q: How to support multiple devices? A: Transcode to multiple formats.
13 | Food Delivery (Zomato) | Show restaurants, manage orders, track delivery. | ETA accuracy, busy hours | ML models for ETA, real-time maps | Q: How is ETA calculated? A: Based on past data + live traffic.
14 | Collaborative Docs (Google Docs) | Enable multiple users to edit in real time. | Conflict resolution | Use CRDTs/OT, state sync | Q: How does real-time collaboration work? A: Merge edits using CRDT.
15 | E-Commerce (Amazon) | Sell products, track inventory, handle payments. | Concurrency, pricing errors | Use event sourcing, locking, audit trail | Q: How to handle flash sale? A: Queue requests + inventory locking.
16 | Marketplace Recommendation | Personalize based on shopping history. | New users, noisy data | Use embeddings, clustering, trending items | Q: How to personalize for new user? A: Use trending/best-selling items.
17 | Fault-tolerant DB | Ensure consistency + uptime in failures. | Partitioning, network split | Raft/Paxos, quorum reads/writes | Q: CAP theorem real example? A: CP (MongoDB), AP (Cassandra).
18 | Event System (Twitter) | Send tweets/events to followers in real time. | Fan-out, latency | Kafka, event store, async processing | Q: Push or pull tweets? A: Push for active, pull for passive.
19 | Photo Sharing (Instagram) | Users upload, view, and like photos. | Storage, metadata | Store media on CDN/S3, DB for metadata | Q: Where are images stored? A: CDN edge, S3 origin.
20 | Task Scheduler | Schedule and trigger jobs reliably. | Time zone issues, duplication | Use cron w/ distributed locks | Q: How to ensure task runs once? A: Use leader election or DB locks.
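
Row 8's answer ("use mutex/locks for rebuilds") can be sketched in a few lines. This is one possible approach under stated assumptions, not the way Memcached itself works: `SingleFlightCache` and `StampedeDemo` are hypothetical names, the cache lives in a single JVM, and TTLs and background refresh are left out. It relies on `ConcurrentHashMap.computeIfAbsent`, which applies the loader at most once for an absent key while other callers for that key wait.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical in-process cache that lets only one caller rebuild a missing key.
class SingleFlightCache<K, V> {
  private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
  private final Function<K, V> loader;

  SingleFlightCache(Function<K, V> loader) {
    this.loader = loader;
  }

  // computeIfAbsent runs the loader at most once per absent key; concurrent
  // callers asking for the same key block until the value is in the map.
  V get(K key) {
    return cache.computeIfAbsent(key, loader);
  }

  void invalidate(K key) {
    cache.remove(key);
  }
}

class StampedeDemo {
  // Starts several threads that all miss on the same key and returns how many
  // times the (simulated) expensive loader actually ran.
  static int loadsFor(int threads) {
    AtomicInteger loads = new AtomicInteger();
    SingleFlightCache<String, String> cache = new SingleFlightCache<String, String>(key -> {
      loads.incrementAndGet(); // stands in for an expensive database query
      return "value-for-" + key;
    });
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> cache.get("user:42"));
      workers[i].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting", e);
      }
    }
    return loads.get();
  }

  public static void main(String[] args) {
    System.out.println("loader ran " + loadsFor(8) + " time(s)");
  }
}
```

One caveat of this shortcut: the loader runs while the map holds an internal lock for that key's bin, so it should be reasonably quick and must not modify the same map. Across several servers the same idea needs a distributed lock or a request-coalescing layer instead.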

🧠 Tips for Developers:

  • Always consider scalability (horizontal vs vertical).

  • Trade-offs are key: CAP, latency vs availability.

  • Use queues to decouple services.

  • Think about observability: logging, metrics, alerts.

📚 Want to go deeper? Check out:

  • "Designing Data-Intensive Applications" by Martin Kleppmann

  • SystemDesignPrimer (GitHub)

  • Grokking the System Design Interview (Educative.io)


Fresher Level System Design Blog

Introduction

This blog is a quick reference guide for freshers preparing for system design interviews. Each topic below is summarized in 3-4 lines and presented in a table format for easy review. It also includes common interview questions, challenges, and suggestions to help you build intuition.

# | System Design Topic | Design Summary | Challenges / Blockers | Suggested Solution | Famous Interview Question & Answer | Intuition & Design Ideas
1 | URL Shortening Service | Use a key-value store to map short codes to long URLs. Generate short codes using Base62. Cache frequently accessed URLs. | Collision in short code generation | Use hashing + collision checks or UUID/base62 encoding. | Q: How do you avoid collisions in short URL generation? A: Use base62 encoding of incremental IDs or UUID + retry on collision. | Think of it like a dictionary: you store a short code and retrieve the original. Add expiration support and track analytics.
2 | Basic Chat Application | Use WebSockets for real-time messaging. Store messages in a NoSQL DB. Ensure message ordering and delivery. | Ensuring delivery and message order | Use message queues and timestamps, ACKs from client. | Q: How would you ensure message order in group chats? A: Use timestamps with logical clocks or message queues per chat room. | Use WebSocket for real-time, and fall back to polling for older clients. Consider how to handle offline messages.
3 | File Storage System | Use object storage like S3 for files. Store metadata in a DB. Provide upload/download APIs. | Large file handling, partial uploads | Use chunked upload/download and resumable uploads. | Q: How would you implement versioning for files? A: Store file version history with timestamps in metadata DB. | Think Dropbox: sync files across devices with deduplication and conflict resolution.
4 | Social Media Platform | Use relational DB for users/posts. Cache timelines. Implement followers and feed service. | High write/read traffic on feeds | Use fan-out on write/read strategy and timeline caching. | Q: How do you design the user timeline? A: Use fan-out on write for accounts with few followers, fan-out on read for celebrities. | Prioritize read-heavy optimization. Add notification and media support.
5 | Simple Search Engine | Crawl pages and index using inverted index. Use ranking algorithm for results. | Keeping index up to date | Use distributed crawlers and scheduled re-indexing. | Q: How would you rank search results? A: Use TF-IDF, PageRank, or user behavior signals like clicks. | Think Google-lite: crawl, index, rank. Add caching and autosuggestions.
6 | E-commerce Website | Use microservices: product, cart, order, payment. SQL DB for product and inventory. | Inventory sync and order consistency | Use distributed transactions or eventual consistency with event queues. | Q: How would you handle high traffic flash sales? A: Use inventory preloading to Redis and lock stock before checkout. | Start with catalog, then cart/order/payments. Consider promotions, reviews, delivery tracking.
7 | Ride-Sharing System | Match riders and drivers using location. Real-time tracking. | Accurate location matching, dynamic pricing | Use geo-hashing, real-time map APIs, and ML for pricing. | Q: How do you match drivers and riders efficiently? A: Use a spatial index like QuadTrees or GeoHash. | Focus on live map, ETA, and surge pricing. Add cancellation/reassignment logic.
8 | Video Streaming Service | Use CDN for delivery. Store videos in chunks. Use adaptive bitrate for smooth playback. | Latency and buffering | Use HLS/DASH protocol and edge caching. | Q: How to stream to users with different network speeds? A: Use adaptive bitrate streaming with multiple resolutions. | Break videos into chunks. Use a manifest file (HLS). Add user history, playlist, and DRM.
9 | Recommendation System | Use collaborative or content-based filtering. Precompute recommendations. | Cold start for new users or items | Use hybrid approach with default/popular items. | Q: How would you recommend items to a new user? A: Show trending items or use demographic similarity. | Think YouTube/Netflix. Store events (views, clicks), then use ML models offline for suggestions.
10 | Food Delivery App | Use microservices: restaurant, user, order, delivery. Real-time tracking. | Live order tracking, delivery partner availability | Use Google Maps APIs + ETA algorithms and dynamic delivery assignment. | Q: How do you ensure food is delivered fresh and on time? A: Assign nearest delivery agent, optimize route, notify delays. | Focus on real-time updates and restaurant status. Add rating system for feedback.
11 | Parking Lot System | Track available slots in DB. Assign spots. Entry/exit logs and payments. | Real-time availability accuracy | Use sensors or manual sync + DB updates. | Q: How would you design for multiple floors or zones? A: Partition lot into zones and track slots per zone in DB. | Add reservation system, payments, QR/barcode entry. Consider IoT for sensors.
12 | Music Streaming Service | Store music on cloud. Use playlists, search, recommendations. | Latency and copyright handling | Use CDN + streaming DRM integration. | Q: How would you support offline playback? A: Encrypt songs on device with limited-time license key. | Similar to video streaming but lighter files. Add social sharing, lyrics, etc.
13 | Ticket Booking System | Locking to avoid double bookings. Store event/show data in DB. | High concurrency for popular events | Use row-level locking or optimistic locking strategies. | Q: How to prevent double booking of the same seat? A: Use atomic seat lock with expiry during checkout. | Add seat map UI, payment integration, reminders. Handle refunds/cancellations.
14 | Note-Taking Application | CRUD operations. Sync across devices. Store in cloud DB. | Conflict resolution in sync | Use timestamps + conflict resolution policies. | Q: How to sync notes across multiple devices? A: Use timestamps and push updates via WebSocket or polling. | Think Notion/Keep. Add tags, reminders, and collaborative editing.
15 | Weather Forecasting System | Collect weather data from APIs/sensors. Store time-series data. | High frequency updates, regional accuracy | Use time-series DBs and ML-based predictions. | Q: How do you predict weather for a new location? A: Use nearby station data and interpolate using models. | Combine IoT sensors, external APIs, and ML models. Add alerting and maps.
16 | Email Service | Use SMTP to send emails. Store in DB. Support inbox, outbox, spam. | Spam filtering and delivery issues | Use heuristics + feedback systems + email queue management. | Q: How would you ensure email delivery reliability? A: Use retries, bounce monitoring, and SPF/DKIM setup. | Design mailbox, filters, attachments. Add UI like Gmail.
17 | File Sync System | Use file hash and timestamps. Sync diffs. Handle conflict resolution. | Merge conflicts | Use last-write-wins or manual merge strategy. | Q: How do you sync two files modified at the same time? A: Detect conflict and ask user to merge manually. | Think Dropbox/GDrive. Compress, diff-check, and background upload.
18 | Calendar Application | Support events, reminders, recurrence. Notifications and sync. | Time zone handling, reminders | Normalize time and use push notification service. | Q: How to handle daylight saving and multiple time zones? A: Store in UTC and convert to local for display. | Focus on recurrence (RRULE), invites, rescheduling. Add integrations like email or Google Meet.
19 | Online Quiz Platform | Create quizzes. Store answers, scores. Track user progress. | Prevent cheating, real-time scoring | Use proctoring APIs or time-restricted tests with session tracking. | Q: How to handle large-scale exam with many users? A: Use horizontal scaling and rate-limit suspicious activity. | Think Google Forms + timer. Add leaderboard, difficulty levels.
20 | Auth System | Use OAuth2 or JWT. Store hashed passwords. Support MFA. | Token expiration, brute force attacks | Use refresh tokens, rate limiting, and password hashing (bcrypt). | Q: How do you revoke JWT tokens? A: Use token blacklist or short expiry + refresh token. | Start with sign-up/login, session vs token, role-based access. Add social login and 2FA.
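
Row 1's answer (base62 encoding of incremental IDs) is small enough to sketch in full. The class name `Base62` and the alphabet order are choices made for this illustration; any fixed 62-character alphabet works as long as encoder and decoder agree. Because every database ID is unique, the resulting codes cannot collide.

```java
// Hypothetical helper: turns a numeric row ID into a short code and back.
class Base62 {
  private static final String ALPHABET =
      "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

  // Repeatedly takes the remainder mod 62, then reverses the digits.
  static String encode(long id) {
    if (id < 0) {
      throw new IllegalArgumentException("id must be non-negative");
    }
    if (id == 0) {
      return "0";
    }
    StringBuilder sb = new StringBuilder();
    while (id > 0) {
      sb.append(ALPHABET.charAt((int) (id % 62)));
      id /= 62;
    }
    return sb.reverse().toString();
  }

  // Inverse of encode: accumulates digits from most to least significant.
  static long decode(String code) {
    long id = 0;
    for (char c : code.toCharArray()) {
      int digit = ALPHABET.indexOf(c);
      if (digit < 0) {
        throw new IllegalArgumentException("invalid character: " + c);
      }
      id = id * 62 + digit;
    }
    return id;
  }

  public static void main(String[] args) {
    System.out.println(encode(62));   // prints 10
    System.out.println(decode("10")); // prints 62
  }
}
```

Sequential IDs make codes guessable, so a design that cares about enumeration would typically scramble the ID or add a random component before encoding.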

Conclusion

This concise table helps you quickly review common system designs. Build a few for hands-on experience and better understanding.

Learn More: