January 23, 2026

Java Concurrency in Practice (JCIP) — Chapter 1

Why Concurrency Exists, Why It’s Hard, and How to Talk About It (with diagrams + examples for your YouTube video)

Hook: Concurrency is basically “reading the news while the kettle boils.”
The tricky part is: if two people try to make tea using the same kitchen counter at the same time… stuff gets knocked over.

This blog is a complete, future-proof Chapter 1 recap (plus the missing pieces you asked for: async event handling, non-blocking I/O, Unix select()/poll(), and modern Java updates). It’s designed so you can lift sections directly into a YouTube script and use the diagrams as visuals.


1) The reason concurrency became mandatory: the clock-speed wall → multi-core world

CPU performance used to rise mainly by increasing clock speed. That path got limited by power/heat constraints, so manufacturers shifted to adding more cores.

Consequence: A single-threaded program often can’t use all available CPU capacity on a multi-core machine. If your machine has 8 cores and you run 1 runnable thread, you’re leaving a lot on the table.

Diagram: “One thread on many cores”

4-core CPU
+-------+-------+-------+-------+
| Core1 | Core2 | Core3 | Core4 |
|  RUN  | idle  | idle  | idle  |
+-------+-------+-------+-------+

4 runnable threads
+-------+-------+-------+-------+
| Core1 | Core2 | Core3 | Core4 |
| RUN   | RUN   | RUN   | RUN   |
+-------+-------+-------+-------+

2) A short history: from bare metal to multitasking OS

Early systems often ran one program at a time. Then operating systems evolved to run multiple programs “at once” (time-slicing, scheduling) because of:

A) Resource utilization

If one program blocks on I/O (disk/network), the CPU shouldn’t sit idle—run something else.

B) Fairness

Multiple users/processes should get a fair share of resources.

C) Convenience

Humans naturally model the world as independent activities.
Your tea example is perfect:

  • Boiling water = waiting (blocked)

  • Read news while waiting = overlap idle time with useful work

That’s concurrency as real life.


3) Threads: “lightweight processes” + what’s shared vs. private

Modern operating systems schedule threads, not processes, as the basic unit of execution.

What threads share (inside one process)

  • Heap (objects, shared state)

  • The whole process address space

What each thread owns

  • Stack

  • Program counter / registers (execution context)

Diagram: process vs threads

flowchart LR
  subgraph P["One Process (one address space)"]
    Heap[(Heap / Objects / Shared State)]
    subgraph T1["Thread 1"]
      S1[Stack]
      C1[PC + Registers]
    end
    subgraph T2["Thread 2"]
      S2[Stack]
      C2[PC + Registers]
    end
    T1 --> Heap
    T2 --> Heap
  end

Key point: Sharing the heap makes communication easy—and makes races possible.


4) Why we use threads anyway: the 3 big benefits

1) Exploit multi-core

Multiple runnable threads can run simultaneously on multiple cores.

2) Better throughput via I/O overlap

A single thread blocks waiting for I/O. With concurrency, another thread can run while one waits.

3) Simpler modeling (sometimes)

It can be easier to think:

  • “Each user request is an independent activity”

  • “Each background task is an independent activity”

  • “UI events are handled separately from slow work”

Frameworks often give you this “simple model” while hiding scheduling.


5) Concurrency is already inside your app (even if you didn’t add it)

Chapter 1’s sneaky truth: you’re already writing concurrent programs because the platform/framework is.

Common sources of “hidden concurrency”

  • Servlet containers: multiple requests invoke your code concurrently.

  • RMI: remote calls can overlap on the server.

  • Swing/AWT: event-driven and asynchronous.

  • Timers/schedulers: background task execution.

  • JVM internals: background threads exist (GC, runtime work).

So even “normal Java code” can be called from multiple threads.


6) The 3 concurrency hazard families (Chapter 1’s core warning)

JCIP gives a super practical taxonomy:

flowchart TB
  C[Concurrency] --> S[Safety: nothing bad happens<br/>Correctness]
  C --> L[Liveness: something good eventually happens<br/>Progress]
  C --> P[Performance: it works but may be slower<br/>Overhead & contention]

Let’s make each one concrete.


6A) Safety hazards (correctness): “nothing bad happens”

Safety problems show up as:

  • race conditions

  • broken invariants

  • corrupted state

  • “works 999 times, fails once” bugs

Example 1: count++ is not atomic

class Counter {
  private int count = 0;
  public void inc() { count++; } // read → add → write (can interleave)
  public int get() { return count; }
}

Two threads can lose updates because increments overlap.
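To watch the loss happen, here is a minimal sketch (class and method names are mine, not from JCIP): two threads each increment the shared counter 100,000 times, and on most runs the final count comes up short of 200,000.

```java
class LostUpdateDemo {
  static int count = 0;

  // Hammer the shared field from two threads and return the final value.
  static int run() throws InterruptedException {
    count = 0;
    Runnable task = () -> { for (int i = 0; i < 100_000; i++) count++; };
    Thread t1 = new Thread(task), t2 = new Thread(task);
    t1.start(); t2.start();
    t1.join(); t2.join();
    return count; // usually < 200_000: interleaved read->add->write loses updates
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("final count = " + run());
  }
}
```

The bug is nondeterministic: some runs may even reach 200,000, which is exactly why these failures survive testing.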

One fix (covered properly in later chapters): make the increment a single atomic operation.

Example fix:

import java.util.concurrent.atomic.AtomicInteger;

class Counter {
  private final AtomicInteger count = new AtomicInteger();
  public void inc() { count.incrementAndGet(); }
  public int get() { return count.get(); }
}

Example 2: “check-then-act” race

if (!map.containsKey(k)) {
  map.put(k, v);
}

Two threads can both pass the check and both insert.
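A sketch of the standard fix, assuming the map is a ConcurrentMap: let the map perform the check and the insert as one atomic step via putIfAbsent (the wrapper class here is my own illustration).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class CheckThenActFix {
  // putIfAbsent does the containsKey check and the put atomically,
  // so two racing threads cannot both "win" for the same key.
  static String insertOnce(ConcurrentMap<String, String> map, String k, String v) {
    String prev = map.putIfAbsent(k, v); // null if we inserted, existing value otherwise
    return prev == null ? v : prev;
  }

  public static void main(String[] args) {
    ConcurrentMap<String, String> map = new ConcurrentHashMap<>();
    System.out.println(insertOnce(map, "a", "first"));  // prints: first
    System.out.println(insertOnce(map, "a", "second")); // prints: first (insert lost the race)
  }
}
```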


6B) Liveness hazards (progress): “something good eventually happens”

Liveness failures include:

  • deadlock (mutual waiting)

  • starvation (one thread never gets scheduled/lock)

  • livelock (lots of motion, no progress)

Deadlock example (classic lock ordering bug)

final Object A = new Object();
final Object B = new Object();

Thread t1 = new Thread(() -> {
  synchronized (A) { synchronized (B) { } }
});

Thread t2 = new Thread(() -> {
  synchronized (B) { synchronized (A) { } }
});

t1.start();
t2.start(); // with unlucky timing: t1 holds A, t2 holds B, both wait forever

Diagram: deadlock cycle

flowchart LR
  T1[Thread 1] -->|holds| A[Lock A]
  T1 -->|waits for| B[Lock B]
  T2[Thread 2] -->|holds| B
  T2 -->|waits for| A
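The classic escape, sketched below with names of my choosing: impose one global lock order so the cycle can never form. Both code paths take A before B.

```java
class LockOrderingFix {
  static final Object A = new Object();
  static final Object B = new Object();

  // Every path acquires A first, then B. With a single global order,
  // no thread can hold B while waiting for A, so no cycle is possible.
  static int critical(int x) {
    synchronized (A) {
      synchronized (B) {
        return x + 1;
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Thread t1 = new Thread(() -> critical(1));
    Thread t2 = new Thread(() -> critical(2));
    t1.start(); t2.start();
    t1.join(); t2.join();
    System.out.println("both threads finished; no deadlock");
  }
}
```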

6C) Performance hazards: “threads can make you slower”

This is the point you insisted on (and it’s one of the most real-world parts of Chapter 1):

Threads can bring net performance gain, but they introduce runtime overhead.
If you overdo threads or coordinate poorly, performance collapses.

The big overhead sources

1) Context switches

The scheduler suspends one thread, resumes another:

  • save/restore execution context (registers, PC, stack metadata)

  • scheduler bookkeeping

2) Loss of locality (cache pain)

Frequent switching means:

  • more cache misses

  • more time reloading data

  • less time doing real work

3) CPU time spent scheduling instead of running

Too many runnable threads → CPU becomes a “thread juggler.”

4) Contention & coordination overhead

Locks/queues shared by many threads:

  • waiting, wakeups, convoying

  • bad tail latency

5) Memory overhead per platform thread

Each OS thread has stack + runtime overhead.

Diagram: where the CPU time goes when you over-thread

CPU time →
[work][context switch][cache refill][work][scheduler][work][lock wait]...

Example: thread-per-client can explode

while (true) {
  Socket s = server.accept();
  new Thread(() -> handleClient(s)).start(); // scales... until it doesn't
}

7) “Simplified handling of async events” (how real systems avoid raw thread chaos)

A big reason concurrency is usable in practice: we structure it.

Instead of “new Thread for everything,” we do:

Pattern: events → queue → bounded workers

flowchart LR
  UI[UI click / HTTP request / Timer tick] --> Q[Task Queue]
  Q --> Pool[Thread Pool / Executor]
  Pool --> Result[Response / Update / Next Task]

Java example: ExecutorService

ExecutorService pool = Executors.newFixedThreadPool(32);

void onEvent(Event e) {
  pool.submit(() -> handleEvent(e));
}

This is “simplified async event handling” in a nutshell:

  • you submit tasks, not threads

  • concurrency is bounded and manageable
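A self-contained sketch of the same pattern (the `6 * 7` task stands in for a real handleEvent, which this snippet doesn't define):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class EventPoolDemo {
  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      // submit() hands the task to the pool and returns immediately with a Future
      Future<Integer> f = pool.submit(() -> 6 * 7); // stand-in for handleEvent(e)
      System.out.println("result = " + f.get());    // prints: result = 42
    } finally {
      pool.shutdown(); // release the worker threads when done
    }
  }
}
```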

Async composition: CompletableFuture

CompletableFuture
  .supplyAsync(this::fetchData, pool)
  .thenApply(this::transform)
  .thenAccept(this::respond);

This helps you model “what happens next” without blocking the caller.
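Here is a runnable miniature of that pipeline, with constants standing in for the hypothetical fetchData/transform steps:

```java
import java.util.concurrent.CompletableFuture;

class AsyncPipeline {
  public static void main(String[] args) {
    int result = CompletableFuture
        .supplyAsync(() -> 21)   // stand-in for fetchData (runs off the caller's thread)
        .thenApply(x -> x * 2)   // stand-in for transform (runs when the value is ready)
        .join();                 // main blocks here only to observe the result
    System.out.println(result);  // prints: 42
  }
}
```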


8) Non-blocking I/O and Unix select() / poll() (and why Java NIO exists)

When you have many connections, thread-per-connection often becomes expensive.

Unix solved “watch many connections efficiently” with I/O multiplexing:

select(): “wait until some fd is ready”

select() lets a program monitor multiple file descriptors until one becomes “ready.” A descriptor is ready if an I/O operation (read/write) will not block. (man7.org)

poll(): similar concept, different interface

poll() also waits for file descriptors to become ready; “ready” means the requested operation will not block. (Arch Manual Pages)

epoll (Linux): designed to scale

Linux epoll performs a similar task to poll but “scales well to large numbers of watched file descriptors.” (man7.org)

kqueue (BSD/macOS): kernel event notification mechanism

kqueue provides a generic event notification mechanism based on kernel “filters.” (man.freebsd.org)


9) Java NIO: the Java face of select/poll/epoll/kqueue

Java NIO gives you a Selector: a “multiplexor of SelectableChannel objects.” (Oracle Docs)

Diagram: blocking vs non-blocking server model

flowchart LR
  subgraph Blocking["Blocking I/O (thread-per-connection)"]
    C1[Client 1] --> T1["Thread 1<br/>blocks on read()"]
    C2[Client 2] --> T2["Thread 2<br/>blocks on read()"]
    C3[Client 3] --> T3["Thread 3<br/>blocks on read()"]
  end

  subgraph NonBlocking["Non-blocking I/O (Selector/event loop)"]
    Many[Many clients] --> Sel["Selector + event loop<br/>readiness notifications"]
    Sel --> CPU["Optional worker pool<br/>for CPU-heavy tasks"]
  end

Minimal Java NIO skeleton (teaching version)

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

Selector selector = Selector.open(); // uses default provider

ServerSocketChannel server = ServerSocketChannel.open();
server.configureBlocking(false);
server.bind(new InetSocketAddress(8080));
server.register(selector, SelectionKey.OP_ACCEPT);

while (true) {
  selector.select(); // block until at least one channel is ready

  for (Iterator<SelectionKey> it = selector.selectedKeys().iterator(); it.hasNext();) {
    SelectionKey key = it.next();
    it.remove();

    if (key.isAcceptable()) {
      SocketChannel client = server.accept();
      client.configureBlocking(false);
      client.register(selector, SelectionKey.OP_READ);
    }

    if (key.isReadable()) {
      SocketChannel client = (SocketChannel) key.channel();
      ByteBuffer buf = ByteBuffer.allocate(4096);
      int n = client.read(buf);
      if (n < 0) { client.close(); continue; } // peer closed the connection
      // process buf...
    }
  }
}

“Under the hood” proof (future-proof + concrete)

OpenJDK literally includes:

  • EPollSelectorImpl: “implementation of Selector … uses the epoll event notification facility.” (GitHub)

  • On Linux, the default selector provider wiring uses EPollSelectorProvider. (cocalc.com)

  • KQueueSelectorImpl: “Implementation of Selector using FreeBSD / Mac OS X kqueues.” (cr.openjdk.org)

So conceptually:

  • Unix gives you select/poll/epoll/kqueue

  • Java gives you Selector

  • The JDK maps to platform mechanisms internally


10) Swing EDT: responsiveness + thread safety by confinement

Swing apps are asynchronous by nature: users can click anytime, and they expect the UI to respond promptly.

Swing’s rule is strict:

“All other Swing component methods must be invoked from the event dispatch thread.” (Oracle Docs)

Diagram: Swing EDT + worker thread

flowchart LR
  User[User action] --> EDT[Event Dispatch Thread]
  EDT -->|must be quick| UI[Update UI]
  EDT -->|offload slow work| Worker[Background Thread]
  Worker -->|post result| EDT

Example: UI freeze (bad)

button.addActionListener(e -> {
  doSlowNetworkCall();  // blocks EDT → UI freezes
  label.setText("Done");
});

Example: correct pattern (good)

button.addActionListener(e -> {
  new Thread(() -> {
    String result = doSlowNetworkCall();
    SwingUtilities.invokeLater(() -> label.setText(result));
  }).start();
});
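The same pattern, runnable without any actual UI (the field and latch stand in for the label; names are mine):

```java
import java.util.concurrent.CountDownLatch;
import javax.swing.SwingUtilities;

class EdtHandoff {
  static volatile String published;

  // Slow work runs on a background thread; the "UI update" is posted back
  // to the EDT with invokeLater. No Swing components are created, so this
  // runs even in a headless environment.
  static String runPattern() throws InterruptedException {
    CountDownLatch done = new CountDownLatch(1);
    new Thread(() -> {
      String result = "Done";              // stand-in for doSlowNetworkCall()
      SwingUtilities.invokeLater(() -> {   // hop back onto the EDT
        published = result;                // stand-in for label.setText(result)
        done.countDown();
      });
    }).start();
    done.await();
    return published;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(runPattern()); // prints: Done
  }
}
```

For the common "background work + publish results" case, SwingWorker packages this handoff for you.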

11) RMI: can the same remote method be called concurrently?

Yes. Oracle’s RMI spec is blunt:

“Since remote method invocation on the same remote object may execute concurrently, a remote object implementation needs to make sure its implementation is thread-safe.” (Oracle Docs)

Diagram: concurrent RMI invocations

sequenceDiagram
  participant C1 as Client 1
  participant C2 as Client 2
  participant R as RMI Runtime (Server)
  participant O as Remote Object

  C1->>R: invoke method()
  C2->>R: invoke method()

  par concurrent execution
    R->>O: method() on Thread A
  and
    R->>O: method() on Thread B
  end

  O-->>R: return
  R-->>C1: result
  O-->>R: return
  R-->>C2: result

Implication: Treat remote objects like servlets: assume multiple threads can call them and protect shared mutable state.


12) “Future-proof” update: what changed since JCIP, what never changes

JCIP Chapter 1 is still correct because the problem categories haven’t changed:

  • Safety

  • Liveness

  • Performance

What changed is we now have better tools.

Virtual threads (Java 21)

Virtual threads are lightweight threads meant to dramatically reduce the effort of writing high-throughput concurrent apps. (OpenJDK)

Why this matters for the chapter’s story:

  • It brings back “thread-per-task” readability for many I/O-heavy workloads

  • While still requiring you to think about shared state safety, liveness, and performance

Structured concurrency (still evolving)

Structured concurrency is about treating a group of subtasks as a unit (scope), improving cancellation and observability. The JEP describes StructuredTaskScope for structuring concurrent subtasks. (OpenJDK)

Future-proof rule: Even if APIs change, the core reasoning (shared state + hazards + overhead) stays.


Final recap (Chapter 1 distilled into one slide)

If your viewers remember only this, they’ll be set for Chapter 2:

  1. Multicore made concurrency non-optional.

  2. OS multitasking evolved for utilization, fairness, convenience.

  3. Threads share heap memory → easy communication, easy races.

  4. Threads help with throughput, I/O overlap, responsiveness, and modeling.

  5. Concurrency shows up “under the hood” (Servlets, Swing, RMI, timers, JVM).

  6. Hazards fall into Safety, Liveness, Performance.

  7. Performance hazards are real: context switches, cache locality loss, scheduling/coordination overhead.

  8. Non-blocking I/O exists to avoid “thread-per-connection” scalability limits.

  9. Unix select/poll and Linux epoll/BSD kqueue are the OS foundation. (man7.org)

  10. Java NIO Selector is the Java abstraction over readiness multiplexing. (Oracle Docs)


YouTube-friendly slide plan (you can literally follow this)

  1. Hook: Tea kettle analogy (no diagram)

  2. Multi-core shift: Core boxes diagram

  3. OS motivations: utilization/fairness/convenience (3 bullet icons)

  4. Process vs thread: Mermaid process/thread diagram

  5. Benefits: throughput + responsiveness + modeling (3 callouts)

  6. Hazards triangle: Safety/Liveness/Performance diagram

  7. Safety example: count++ race code

  8. Liveness example: deadlock cycle diagram

  9. Performance: CPU time strip diagram + “too many threads” snippet

  10. Blocking vs non-blocking: server model diagram

  11. Unix readiness: select/poll/epoll/kqueue bullets with one visual

  12. Swing EDT: EDT diagram

  13. RMI concurrency: sequence diagram

  14. Modern Java: virtual threads + structured concurrency (one slide)

  15. Wrap-up checklist: final recap slide

Java Concurrency in Practice — Chapter 1, but future-proof (with examples + NIO + select/poll)

If Chapter 1 had a vibe, it’s this:

Concurrency is how you cash in on modern hardware and responsive software…
and also how you accidentally summon chaos.

You already captured the main storyline. Below is a “final blog” that includes every point you raised, adds examples, explains simplified async event handling, and connects Java NIO all the way down to Unix select() / poll() (and friends). I’ll also make it future-proof by mapping the chapter’s ideas to modern Java concurrency choices (virtual threads, structured concurrency, etc.).


Table of contents

  1. Why concurrency matters (the “clock-speed wall”)

  2. From bare metal to multitasking OS: utilization, fairness, convenience

  3. Threads: “lightweight processes” + what’s shared vs private

  4. Why threads are used: throughput, responsiveness, simpler modeling

  5. Concurrency is already in your app (Servlets, RMI, Swing, timers, JVM)

  6. The three hazard families (Safety, Liveness, Performance) + examples

  7. Performance hazards (your added point) — why “more threads” can be slower

  8. Simplified handling of async events (tasks, pools, futures, EDT)

  9. Non-blocking I/O: Unix select/poll → Java NIO Selector (with code)

  10. Future-proofing: what changed since JCIP, what still matters

  11. A tiny checklist + exercises for Chapter 2


1) Why concurrency matters: CPUs stopped getting faster “for free”

For years, CPUs got faster mostly by increasing clock speed. Then power/thermal limits made that approach unattractive, and manufacturers shifted toward adding more cores instead.

That’s the key consequence Chapter 1 is pushing:

  • A single-threaded program on a multi-core machine can leave a lot of compute unused.

  • Exploiting concurrency effectively becomes more important as core counts rise.

So concurrency isn’t “an optional advanced topic.” It’s how modern systems scale.


2) From bare metal to multitasking OS: why “multiple things at once” became normal

Early computing often ran one program at a time on bare metal. As systems evolved, operating systems began supporting multiple programs executing simultaneously (or appearing to).

Three motivations you listed are the “holy trinity”:

A) Resource utilization

When a program blocks on I/O (disk/network), the CPU would otherwise sit idle. Let something else run.

B) Fairness

Multiple users/processes should each get a reasonable share of machine resources (no single bully process hogging everything).

C) Convenience

Humans naturally model independent activities. Your tea example nails it:

  • Water boiling = “blocked” time

  • Read news while waiting = overlap waiting with useful work

That’s concurrency as lived experience.


3) Threads: “lightweight processes” and the shared-memory deal

A thread is often called a lightweight process because it has its own execution state, but shares the process memory.

Modern OS schedulers typically treat threads (not processes) as the basic unit of scheduling — so threads get time-sliced and scheduled across cores.

What threads share (inside one process)

  • Heap (objects, shared data structures)

  • Same address space

What threads do not share

  • Stack (local variables per call chain)

  • Program counter, registers (execution context)

Because threads share heap state, without coordination they run asynchronously with respect to each other and can interfere in subtle ways.


4) Why threads are used: the big three benefits

1) Exploiting multi-core hardware

Multiple runnable threads can execute simultaneously on different cores.

2) Better throughput via overlapping I/O waits

While one thread blocks on I/O, another thread can run — improving CPU utilization.

3) Simpler modeling (sometimes!)

A lot of systems become easier to describe as independent activities:

  • “each request is handled independently”

  • “UI responds to events; background work fetches data”

  • “periodic cleanup runs on a timer”

Frameworks often use threads internally so you can write “simple-looking” code.


5) Concurrency is already in your app (even if you didn’t add it)

This is one of the most important Chapter 1 lessons: you might already be concurrent because the platform/framework is.

Examples you mentioned (and yes, they matter)

  • Servlet/JSP containers: multiple requests can call into your code concurrently.

  • Timers / scheduled tasks: time-based tasks run in the background.

  • RMI: remote calls can be served concurrently.

  • Swing/AWT: user interaction is event-driven and asynchronous by nature.

  • JVM housekeeping: background threads exist for runtime behavior.


6) The three hazard families (with concrete examples)

JCIP’s intro gives an extremely useful taxonomy:

A) Safety hazards — “nothing bad happens”

Correctness failures: races, broken invariants, inconsistent state.

Example: non-atomic increment

class Counter {
  private int count = 0;

  public void inc() { count++; }  // NOT atomic
  public int get() { return count; }
}

Two threads can both read the same value and overwrite each other → lost updates.

Example: check-then-act (classic race)

if (!map.containsKey(k)) {
  map.put(k, v);
}

Two threads can both pass the containsKey check and both put.

B) Liveness hazards — “something good eventually happens”

The system doesn’t crash, but it stops making progress.

  • Deadlock: circular waiting

  • Starvation: one thread never gets what it needs (CPU/lock/resource)

  • Livelock: threads actively “react,” but no progress

Deadlock example

final Object A = new Object();
final Object B = new Object();

Thread t1 = new Thread(() -> {
  synchronized (A) {
    synchronized (B) { /* ... */ }
  }
});

Thread t2 = new Thread(() -> {
  synchronized (B) {
    synchronized (A) { /* ... */ }
  }
});

If t1 holds A and t2 holds B, both wait forever.
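Besides consistent lock ordering, another escape is ReentrantLock.tryLock with a timeout: if you cannot get both locks, release whatever you hold and report failure instead of waiting forever. A sketch (the helper is my own illustration):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class TryLockAvoidance {
  static final ReentrantLock lockA = new ReentrantLock();
  static final ReentrantLock lockB = new ReentrantLock();

  // Try both locks with a timeout; on failure, the finally blocks release
  // anything held, so the caller can back off and retry rather than deadlock.
  static boolean withBothLocks(Runnable action) throws InterruptedException {
    if (!lockA.tryLock(50, TimeUnit.MILLISECONDS)) return false;
    try {
      if (!lockB.tryLock(50, TimeUnit.MILLISECONDS)) return false;
      try {
        action.run();
        return true;
      } finally {
        lockB.unlock();
      }
    } finally {
      lockA.unlock();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    boolean ok = withBothLocks(() -> System.out.println("did work"));
    System.out.println("acquired both: " + ok); // prints: acquired both: true
  }
}
```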

C) Performance hazards — “it works, but it’s slow”

Concurrency can reduce throughput/latency if overhead dominates (details next section).


7) Performance hazards (your added point) — why threads have real overhead

You were 100% right to add this: threads aren’t free.

Even when multithreading improves throughput in principle, each thread introduces runtime costs:

1) Context switching overhead

When the scheduler suspends one thread and resumes another, it saves/restores execution context. Too many runnable threads → CPU time spent juggling instead of working.

2) Loss of locality (cache pain)

Threads bouncing across cores can trash CPU caches. Even with high CPU usage, throughput can drop because the CPU is constantly reloading data rather than executing logic.

3) Scheduler overhead

More runnable threads means more time deciding who runs next and maintaining scheduling structures.

4) Contention costs

Even with “reasonable” thread counts, shared locks/queues can serialize work:

  • lock contention → waiting

  • wakeups → overhead

  • convoy effects → latency spikes

5) Memory overhead

Each platform thread consumes stack space and runtime bookkeeping. Huge thread counts can burn memory/GC budget.

Performance hazard example: unbounded thread-per-request

while (true) {
  Socket s = server.accept();
  new Thread(() -> handle(s)).start(); // scales... until it doesn't
}

At high load you don’t just get slower — you can get unstable:

  • too many threads

  • too many context switches

  • memory pressure

  • catastrophic tail latency


8) Simplified handling of async events: “don’t model everything as raw threads”

Chapter 1 hints at a better approach: structure concurrency around tasks and events, not around manually created threads everywhere.

A) The “task queue + thread pool” approach (simple, scalable, readable)

Instead of “new Thread per event,” you do:

ExecutorService pool = Executors.newFixedThreadPool(32);

void onEvent(Event e) {
  pool.submit(() -> handleEvent(e));
}

This is “simplified async event handling” in practice:

  • the event arrives (UI click, HTTP request, timer tick)

  • you enqueue work

  • a bounded number of workers execute it

You get concurrency and control.

B) Futures/CompletableFuture for “async flow”

When you want non-blocking composition:

CompletableFuture
  .supplyAsync(this::fetchUser, pool)
  .thenApply(this::enrichUser)
  .thenAccept(this::renderOrRespond);

You model “what happens next” without blocking the caller.

C) Swing: async events + thread confinement

Swing uses event-driven async behavior, but with a strict rule:

  • most UI access must happen on the Event Dispatch Thread (EDT)
    Oracle’s Swing docs: only methods explicitly documented thread-safe are safe off-EDT; all others must run on the EDT. (Oracle Docs)

So you handle long work on a worker thread and post back to EDT:

  • SwingUtilities.invokeLater(...) for UI updates (Oracle Docs)

  • or SwingWorker for the common “background + publish results” pattern (Oracle Docs)


9) Non-blocking I/O: Unix select/poll → Java NIO Selector

This is the “under the hood” part you asked for.

The core problem

If you use blocking I/O like:

read(fd, buf, n);  // blocks until data arrives

…and you have 10,000 connections, then:

  • thread-per-connection becomes expensive

  • or you end up busy-waiting (wasting CPU)

Unix solved this with I/O multiplexing.


A) Unix select() (classic readiness multiplexing)

select() allows a program to monitor multiple file descriptors, waiting until one or more become “ready” for I/O. A descriptor is “ready” if an I/O operation would not block. (man7.org)

Two practical details that matter:

  • select() uses fd_set bitsets, whose size POSIX caps at FD_SETSIZE. (man7.org)

  • On return, the sets are modified in place (so you must reinitialize them each loop). (man7.org)

Tiny pseudo-loop (C-ish):

for (;;) {
  FD_ZERO(&readset);
  FD_SET(server_fd, &readset);
  FD_SET(client_fd, &readset);

  int n = select(maxfd+1, &readset, NULL, NULL, &timeout);
  if (FD_ISSET(server_fd, &readset)) accept_client();
  if (FD_ISSET(client_fd, &readset)) read_client();
}

B) Unix poll() (similar idea, different interface)

poll() performs a similar role to select(): wait for one of a set of file descriptors to become ready for I/O. It takes an array of pollfd structs. (man7.org)

It avoids some select() awkwardness (bitsets + fixed FD range), and is typically preferred over select() on modern systems. (Arch Linux Manual Pages)


C) epoll (Linux) and kqueue (BSD/macOS) — scalable successors

  • epoll monitors many file descriptors for readiness and “scales well to large numbers” of watched descriptors. (man7.org)

  • kqueue provides a general kernel event notification mechanism based on “filters.” (FreeBSD Manual Pages)

These are the usual “industrial” building blocks behind high-concurrency network servers.


D) Java NIO Selector: the Java version of readiness multiplexing

Java NIO gives you:

  • Channel + Buffer

  • Selector to monitor many channels and react when they’re ready

Oracle’s Selector docs describe selection operations (select(), selectNow(), etc.) that identify keys/channels ready for operations. (Oracle Docs)

Minimal NIO skeleton (Java):

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

Selector selector = Selector.open();

ServerSocketChannel server = ServerSocketChannel.open();
server.configureBlocking(false);
server.bind(new InetSocketAddress(8080));
server.register(selector, SelectionKey.OP_ACCEPT);

while (true) {
  selector.select(); // block until some channel is ready

  for (Iterator<SelectionKey> it = selector.selectedKeys().iterator(); it.hasNext();) {
    SelectionKey key = it.next();
    it.remove();

    if (key.isAcceptable()) {
      SocketChannel client = server.accept();
      client.configureBlocking(false);
      client.register(selector, SelectionKey.OP_READ);
    }

    if (key.isReadable()) {
      SocketChannel client = (SocketChannel) key.channel();
      ByteBuffer buf = ByteBuffer.allocate(4096);
      int n = client.read(buf);
      if (n < 0) { client.close(); continue; } // peer closed the connection
      // process buf...
    }
  }
}

How does Java map this to the OS?
The JDK uses platform-specific selector providers internally. For example, OpenJDK has an EPollSelectorProvider for Linux. (GitHub)
And it has a KQueueSelectorProvider for macOS. (GitHub)

So conceptually:

  • Unix gives you select/poll/epoll/kqueue

  • Java gives you Selector

  • the JDK bridges them with a platform-appropriate provider

That’s the missing “why NIO exists” piece from Chapter 1:

  • thread-per-client hit limits

  • multiplexed readiness APIs evolved

  • NIO exposes that model to Java


10) RMI and concurrency: yes, same remote method can be called concurrently

You asked:

Can the same remote method on the same remote object be called simultaneously by multiple RMI threads?

Yes, it may. Oracle’s RMI architecture spec states:

  • RMI makes no guarantees about mapping invocations to threads.

  • Remote invocation on the same remote object may execute concurrently, so the remote object implementation must be thread-safe. (Oracle Docs)

So treat remote objects like servlets:

  • assume concurrent calls

  • guard shared mutable state

  • design for thread safety


11) Future-proofing this Chapter 1 mindset (2026+)

JCIP is older, but Chapter 1 is still structurally correct. What’s changed is the set of tools available.

A) Choose the model based on the workload

If mostly CPU-bound (pure computation):

  • limit concurrency to core count-ish

  • avoid contention

  • consider fork/join or parallel streams for data-parallel work

If mostly I/O-bound (network, DB, remote calls):

  • concurrency is about overlapping waits

  • you can scale with bounded pools, non-blocking I/O, or virtual threads (next)

B) Virtual threads (modern “thread-per-task” without the old pain)

Virtual threads are designed to dramatically reduce the cost of high-concurrency applications and were finalized in Java 21 (JEP 444). (openjdk.org)

This matters because it makes the old “thread-per-request” style viable again for many I/O-heavy servers—without thousands of heavyweight platform threads.
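A minimal sketch (requires Java 21+): submit a thousand blocking tasks to a virtual-thread-per-task executor, a count that would already be painful with platform threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class VirtualThreadDemo {
  public static void main(String[] args) throws Exception {
    // One virtual thread per task: cheap enough that 1_000 blocking tasks
    // is unremarkable. close() (via try-with-resources) awaits completion.
    try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
      List<Future<Integer>> futures = new ArrayList<>();
      for (int i = 0; i < 1_000; i++) {
        int n = i;
        futures.add(exec.submit(() -> {
          Thread.sleep(10);   // stand-in for blocking I/O
          return n;
        }));
      }
      long sum = 0;
      for (Future<Integer> f : futures) sum += f.get();
      System.out.println("sum = " + sum); // prints: sum = 499500
    }
  }
}
```

The code reads like plain thread-per-task, which is exactly the point: the readability of blocking style without the per-thread cost.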

C) Structured concurrency (making async lifecycles less error-prone)

Structured concurrency is an approach to preserve parent/child relationships between tasks to improve readability, maintainability, cancellation, and observability. (openjdk.org)
As of JDK 25 it’s still in preview iterations (JEP 505). (openjdk.org)

D) Stick to LTS for production baselines

Oracle’s roadmap lists Java 21 and Java 25 as LTS releases (among others), and indicates the LTS cadence. (Oracle)

Future-proof principle:
Even if APIs evolve, the Chapter 1 hazard categories don’t. Safety, liveness, performance remain the real game.


12) Mini checklist for Chapter 2 (Thread Safety) + exercises

Thread-safety checklist (quick mental scan)

  1. What state is shared? (heap fields, statics, caches, singletons)

  2. Is that state mutable? (if yes, danger)

  3. Are there compound actions? (check-then-act, read-modify-write)

  4. What’s the coordination strategy?

    • lock? atomic? immutable? confinement?

  5. What’s the liveness risk?

    • lock ordering? blocking calls? bounded pools?

  6. What’s the performance plan?

    • thread count? contention? queue depth? tail latency?

Small exercises (fast learning, big payoff)

  • Create a shared count++ bug with two threads; fix with AtomicInteger.

  • Create the two-lock deadlock; fix with consistent lock ordering.

  • Implement a tiny selector-based echo server (NIO) and compare it conceptually to thread-per-connection.

  • In Swing: freeze UI by doing work on EDT, then fix using SwingWorker. (Oracle Docs)


Closing: what Chapter 1 really arms you with

By the end of Chapter 1, you’ve built the exact mental model you need:

  • Concurrency exists because hardware + I/O realities demand it.

  • Threads help utilization, responsiveness, and modeling.

  • Threads also create safety, liveness, and performance hazards.

  • OS I/O multiplexing (select/poll etc.) is the foundation of non-blocking/event-driven servers, and Java NIO is the Java face of that world.

  • Modern Java adds new tools (virtual threads, structured concurrency), but the core reasoning remains exactly what JCIP teaches.

If you want, for Chapter 2, paste any code snippet you’re reading (even a short one) and I’ll do a “thread-safety review” on it using the checklist above—like a mini code review, but focused on atomicity/visibility/invariants.

Concurrency in Java Concurrency in Practice — Chapter 1 (A human-friendly blog + your notes, organized)

You’ve basically decoded what Chapter 1 is trying to do: it’s not teaching APIs yet — it’s building the mental model and warning labels.

Here’s a clean, interesting “blog-style” version that combines everything you covered (plus your added performance hazard point about context switching + runtime overhead), with a few extra hints where the book is implicitly going.


The world changed: CPUs stopped getting “faster” the old way

For a long time, performance improvements were “free”: each hardware generation raised clock speeds, so the same single-threaded code simply ran faster.

Then reality hit: pushing clock speed further ran into power and heat limits.

So chip makers started adding more cores instead of cranking frequency. That gives you a new deal:

You don’t automatically get faster. You get more total compute, but only if your software can use it concurrently / in parallel.

That’s the big modern motivation: multicore is the default, and concurrency is how you cash that check.


A quick history lesson: from bare metal to “many things at once”

Early systems:

  • one machine, one program, one user, one “flow”

Then OSs evolved to run multiple programs “at once” (time-slicing). Why?

1) Resource utilization

If program A is waiting on disk/network, the CPU shouldn’t sit idle. Run program B.

2) Fairness

Multiple users/processes should get a fair share of machine time.

3) Convenience (this one matters more than people admit)

Humans think in independent activities:

  • boil water

  • while it boils, read the news

  • then make tea

That’s concurrency as a model—even if there’s only one CPU.


Threads: “lightweight processes” with shared memory

A thread is often called a “lightweight process,” but the key detail is:

Threads share: the process’s heap (objects, static fields, caches).

Threads do NOT share: their own stacks, program counters, and local variables.
This is why threads are powerful and dangerous:

Sharing memory makes communication easy… and bugs easy.

Because now two threads can touch the same object at the same time.


Why threads are useful (the 3 big wins)

1) Better responsiveness

UI example: a long task on the UI thread freezes the interface; move the slow work to a background thread and the UI stays responsive.

You already called this out with Swing’s Event Dispatch Thread (EDT) idea.

2) Better throughput / resource usage

If one thread blocks on I/O:

  • another thread can run

  • CPU stays busy

  • system handles more work per unit time

3) Simpler modeling (sometimes)

Even if the implementation is complex, the mental model can be simpler:

  • “each request is handled independently”

  • “each client gets a flow”

This is why classic servers used the thread-per-request model.
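The thread-per-request model can be sketched as a tiny echo server. This is my own illustrative sketch (class and method names are invented, not from the book): each connection gets a dedicated worker, so the handler reads like simple sequential code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Thread-per-connection in miniature: every client gets its own worker thread.
public class ThreadPerConnectionEcho {
    final ServerSocket server;
    final ExecutorService pool = Executors.newCachedThreadPool();

    ThreadPerConnectionEcho() throws IOException {
        server = new ServerSocket(0);                // port 0 = any free port
    }

    int port() { return server.getLocalPort(); }

    void acceptLoop() {
        try {
            while (true) {
                Socket client = server.accept();     // blocks until someone connects
                pool.submit(() -> echo(client));     // hand this connection to a worker
            }
        } catch (IOException closed) { /* server socket closed: stop accepting */ }
    }

    static void echo(Socket client) {
        try (client; InputStream in = client.getInputStream();
             OutputStream out = client.getOutputStream()) {
            in.transferTo(out);                      // copy bytes back until client EOF
        } catch (IOException ignored) { }
    }
}
```

The cost is that every connection ties up one platform thread even while idle, which is exactly the scaling problem NIO addresses later in this chapter.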


Concurrency is already everywhere in Java (even if you didn’t ask for it)

Chapter 1 is basically saying: you’re already in the concurrency game.

Examples: servlet containers (multiple request threads call your servlet), RMI dispatch threads, Swing’s EDT, Timer tasks, and JVM housekeeping threads like the garbage collector.

So even “normal” code can become concurrent because the framework calls you from multiple threads.


The three categories of concurrency hazards

This is the part where the book goes: “Okay, threads are great… now here’s how they ruin your day.”

1) Safety hazards (correctness)

This is about bad things happening: race conditions, lost updates, broken invariants.

Classic example: count++ is not atomic.
It’s roughly:

  1. read count

  2. add 1

  3. write back

Two threads can interleave those steps and lose updates.
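Here’s a minimal demonstration of that (my own sketch, not a listing from the book): two threads hammer a plain int and an AtomicInteger side by side. The plain counter usually comes up short; the atomic one never does.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CounterRace {
    static int unsafeCount = 0;                        // count++ on this is 3 steps
    static final AtomicInteger safeCount = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                unsafeCount++;                         // read, add, write: interleavable
                safeCount.incrementAndGet();           // one atomic operation
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // safeCount is always 200000; unsafeCount typically loses updates
        System.out.println("unsafe=" + unsafeCount + " safe=" + safeCount.get());
    }
}
```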

2) Liveness hazards (progress)

This is about nothing good happening even if nothing “crashes”.

You nailed the definition:

  • Safety: “nothing bad happens”

  • Liveness: “something good eventually happens”

Common liveness failures:

  • Deadlock (A waits for B, B waits for A)

  • Starvation (a thread never gets CPU/lock/resources)

  • Livelock (everyone keeps reacting and moving, but no progress)
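The classic fix for the two-lock deadlock is consistent lock ordering: if every thread acquires the locks in the same global order, a circular wait can never form. A small sketch (names are mine):

```java
public class LockOrderingFix {
    static final Object lockA = new Object();
    static final Object lockB = new Object();
    static int guarded = 0;                 // state protected by holding BOTH locks

    // Deadlock recipe: thread 1 takes A then B while thread 2 takes B then A.
    // Fix: everyone takes A before B, so neither can hold what the other needs.
    static void increment() {
        synchronized (lockA) {
            synchronized (lockB) {
                guarded++;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> { for (int i = 0; i < 10_000; i++) increment(); };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("guarded = " + guarded);   // completes: no circular wait
    }
}
```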

3) Performance hazards (doing work… but slower ๐Ÿ˜ญ)

This is the one you asked to strengthen, and it’s a BIG deal in real systems.

Concurrency can absolutely improve throughput — but threads also introduce runtime overhead. If you overdo threads (or synchronize poorly), your app can get slower as you add “parallelism”.


Performance hazards: the concrete overhead threads introduce

Here’s the “meat” point you wanted added, expanded in practical terms.

A) Context switching overhead

When the scheduler suspends one thread and runs another, the system has to:

  • save registers / program counter / stack metadata

  • restore another thread’s execution context

  • update scheduling structures

A context switch isn’t “free.” If you have too many runnable threads, the CPU spends real time just juggling them.

B) Loss of locality (cache pain)

Modern CPUs are fast partly because of caches (L1/L2/L3).

When you switch threads frequently:

  • the next thread likely touches different memory

  • caches get “cold”

  • more cache misses → slower execution

  • branch prediction and pipeline friendliness also degrade

Result: CPU time goes into reloading data instead of doing your business logic.

C) Scheduling overhead

With lots of threads:

  • the OS/JVM does more bookkeeping

  • the scheduler spends more time deciding who runs next

  • your app does less useful work per second

D) Lock contention + coordination costs

Even without many threads, performance dies when:

  • many threads fight over the same lock

  • you serialize “parallel” work accidentally

  • you trigger blocking/wakeup storms

E) Memory overhead

Each platform thread costs:

  • stack memory

  • internal structures in OS + JVM

Thousands of threads can eat memory even if “idle”.

Big takeaway:

Threads can increase throughput… until they don’t. After a point, more threads = more overhead, less real work.

That’s why real systems care about:

  • choosing sane thread counts

  • using thread pools

  • minimizing contention

  • measuring with profilers instead of guessing
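The “sane thread counts + thread pools” advice looks like this in code. A hedged sketch (class and method names are mine): fan small tasks out over a bounded pool sized to the core count instead of spawning one thread per task.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.IntStream;

public class PooledWork {
    // Submit n tiny tasks to a fixed-size pool and gather the results.
    static int sumOfSquares(int n) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);   // bounded, not unbounded
        try {
            List<Future<Integer>> futures = IntStream.rangeClosed(1, n)
                    .mapToObj(i -> pool.submit(() -> i * i))          // one Callable per task
                    .toList();
            int total = 0;
            for (Future<Integer> f : futures) total += f.get();       // wait and combine
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

The fixed size is the point: adding more threads than cores for CPU-bound work only buys you more context switching, not more throughput.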


Why NIO shows up here (and what it means in plain English)

You mentioned the book talking about multiplexed I/O and then name-dropping Java NIO.

The point is:

  • classic java.io often blocks a thread per connection

  • with enough clients, thread-per-client becomes expensive (context switches + memory + scheduler)

NIO lets you handle lots of connections with fewer threads by using:

  • non-blocking channels

  • selectors (“tell me which sockets are ready”)

This is the event-loop style used by Netty/Undertow/etc.

Rule of thumb:

  • Many mostly-idle connections → NIO/event loop shines

  • Moderate concurrency or simpler workloads → blocking I/O + thread pools can be simpler and fine

  • Modern Java also adds virtual threads (huge topic later), which changes the tradeoffs again
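A minimal selector loop, in the spirit of the rules of thumb above. This is my own sketch of a selector-based echo server, not production code (error handling and partial writes are glossed over):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// One thread, many connections: the selector reports which channels are ready,
// so the loop never blocks on any single client.
public class EchoLoop {
    final Selector selector;
    final ServerSocketChannel server;

    EchoLoop() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(0));            // port 0 = any free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() { return server.socket().getLocalPort(); }

    // One pass of the event loop: accept new clients, echo back whatever arrived.
    void pollOnce(long timeoutMs) throws IOException {
        selector.select(timeoutMs);
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (key.isAcceptable()) {
                SocketChannel client = server.accept();
                if (client != null) {
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                }
            } else if (key.isReadable()) {
                SocketChannel client = (SocketChannel) key.channel();
                ByteBuffer buf = ByteBuffer.allocate(1024);
                int n = client.read(buf);
                if (n > 0) { buf.flip(); client.write(buf); }   // echo back
                else if (n < 0) { key.cancel(); client.close(); }
            }
        }
    }
}
```

Compare this mentally to the thread-per-connection version: here a single polling thread multiplexes every client, which is exactly what select()/poll() enable at the OS level.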


Swing EDT: thread safety via confinement (why JTable isn’t thread-safe)

Swing’s approach is simple and strict:

  • Only the Event Dispatch Thread touches UI components

  • background threads do slow work

  • UI updates are posted back onto the EDT

This avoids races by design: don’t share UI state across threads.
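In code, the hop back to the EDT looks like this. SwingUtilities.invokeAndWait/invokeLater are the real Swing APIs; the class and the field standing in for a JLabel are invented for illustration:

```java
import java.util.concurrent.atomic.AtomicReference;
import javax.swing.SwingUtilities;

// Confinement in miniature: slow work runs off the EDT, and the "UI update"
// (here just a field standing in for a label's text) hops back onto the EDT.
public class EdtHopDemo {
    static final AtomicReference<String> labelText = new AtomicReference<>("loading...");

    static void backgroundJob() throws Exception {
        String result = "done";                                      // pretend this took seconds
        SwingUtilities.invokeAndWait(() -> labelText.set(result));   // update on the EDT only
    }
}
```

In real code you’d use invokeLater (fire-and-forget) or SwingWorker; invokeAndWait is shown here only because it makes the hand-off easy to observe.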


Your RMI question: can the same remote method run simultaneously on the same remote object?

In typical Java RMI server implementations, yes—concurrent calls are possible.

What usually happens:

  • the RMI runtime uses threads to handle incoming calls

  • multiple clients can invoke the same remote method at the same time

  • if they target the same remote object instance, your method can be entered concurrently by multiple threads

So the remote object must be designed like any other shared object:

  • make it thread-safe, or

  • confine state, or

  • synchronize access appropriately

If you keep shared mutable state inside a remote object without protection, you can absolutely create race conditions.
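A sketch of the “make it thread-safe” option (the class name and the Remote framing are hypothetical; the point is only that shared state inside the object needs protection, here via AtomicLong):

```java
import java.util.concurrent.atomic.AtomicLong;

// A remote object can be entered by several RMI dispatch threads at once,
// so treat it like any other shared object.
public class HitCounterService {          // imagine: implements a java.rmi.Remote interface
    private final AtomicLong hits = new AtomicLong();

    public long recordHit() {             // may run concurrently for multiple clients
        return hits.incrementAndGet();    // atomic: no updates lost under contention
    }

    public long total() {
        return hits.get();
    }
}
```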


Practical hints (Chapter 1 “rules of survival”)

These are the habits that save you later:

  • Treat any framework callback as “could be called concurrently” unless documented otherwise

  • Prefer immutable objects or thread confinement where possible

  • Use thread pools instead of unbounded thread creation

  • Avoid blocking in event loops (NIO/reactor style)

  • Don’t optimize by guesswork — measure contention, CPU, GC, queue depth, latency percentiles


Final block: Everything you learned so far, distilled

If you remember only this, you’re in great shape going into Chapter 2:

  1. Multicore is the norm because clock speed stopped scaling nicely. To use modern hardware well, software must exploit concurrency/parallelism.

  2. Threads share heap memory, which makes communication easy and bugs likely.

  3. Threads improve systems through:

    • Responsiveness (UI stays alive)

    • Throughput (overlap I/O waits and CPU work)

    • Simpler modeling (request-per-thread is easy to reason about)

  4. Concurrency hazards come in three families:

    • Safety: wrong results (races, broken invariants)

    • Liveness: no progress (deadlock, starvation, livelock)

    • Performance: slower due to overhead/coordination

  5. Performance hazards are real: context switches, scheduler overhead, memory overhead, cache locality loss, lock contention. More threads can make you slower, not faster.

  6. Java already uses concurrency under the hood (servlets, RMI, Swing, timers, JVM internals) — so you must assume concurrency exists even if you didn’t “create threads.”

  7. NIO exists because thread-per-connection doesn’t scale forever; multiplexed I/O lets fewer threads manage many connections.

  8. Swing is thread-safe by confinement: all UI work on the EDT; background threads do slow work; UI updates hop back to EDT.


If you want, for Chapter 2 I can turn this into a “mini checklist” you can apply to any code snippet you read:

  • What’s shared?

  • What’s mutable?

  • What are the invariants?

  • What’s the publication/visibility story?

  • What’s the contention story?

That’ll make the next chapter feel way less abstract.

January 13, 2026

Mastering Advanced SQL Interview Questions

A Practical Guide for Freshers & Experienced Engineers

SQL interviews are not about memorizing syntax — they test data thinking, edge-case handling, and real-world querying skills.

This blog covers frequently asked SQL interview problems, explained step-by-step with:

  • clear intent

  • correct SQL

  • beginner-friendly explanations

  • advanced variations for experienced candidates

You can revisit this blog anytime — it’s written for long-term learning.


1️⃣ Delete Duplicate Records While Keeping One

๐Ÿ“Œ Problem

A table contains duplicate rows. You need to delete duplicates but keep one record per group.

Assume a table:

Employees(id, email)

✅ Solution Using ROW_NUMBER() (Best Practice)

DELETE FROM Employees
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
        FROM Employees
    ) t
    WHERE rn > 1
);

๐Ÿง  Explanation

  • PARTITION BY email groups duplicates

  • ROW_NUMBER() assigns 1, 2, 3…

  • Keep rn = 1, delete the rest

๐Ÿ’ก Interview Tip

Always preview with SELECT before DELETE.


2️⃣ Find Employees Who Worked on All Projects

Tables:

Employees(emp_id)
EmployeeProjects(emp_id, project_id)
Projects(project_id)

✅ Solution Using GROUP BY + HAVING

SELECT emp_id
FROM EmployeeProjects
GROUP BY emp_id
HAVING COUNT(DISTINCT project_id) =
       (SELECT COUNT(*) FROM Projects);

๐Ÿง  Explanation

  • Count projects per employee

  • Compare with total project count

๐Ÿ’ก Interview Tip

This pattern = “worked on all” / “matched all” → remember it.


3️⃣ Customers With Most Orders but Lowest Total Spend (Last Month)

Table:

Orders(order_id, customer_id, amount, order_date)

✅ Step 1: Aggregate Last Month Data (PostgreSQL date syntax)

WITH last_month_orders AS (
    SELECT customer_id,
           COUNT(*) AS order_count,
           SUM(amount) AS total_amount
    FROM Orders
    WHERE order_date >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month')
      AND order_date < DATE_TRUNC('month', CURRENT_DATE)
    GROUP BY customer_id
)
SELECT *
FROM last_month_orders
ORDER BY order_count DESC, total_amount ASC;

๐Ÿง  Explanation

  • Highest orders → order_count DESC

  • Lowest spend → total_amount ASC

๐Ÿ’ก Interview Tip

Ordering by multiple business conditions is very common.


4️⃣ Products With Average Sales Higher Than the Overall Average

Tables:

Sales(product_id, sale_amount, sale_date)
Products(product_id, product_name)

✅ Using Subquery + JOIN

SELECT p.product_id, p.product_name
FROM Products p
JOIN (
    SELECT product_id,
           AVG(sale_amount) AS avg_sales
    FROM Sales
    GROUP BY product_id
) ps ON p.product_id = ps.product_id
WHERE ps.avg_sales >
      (SELECT AVG(sale_amount) FROM Sales);

๐Ÿง  Explanation

  • Inner query → avg per product

  • Subquery → global avg

  • Compare both

๐Ÿ’ก Interview Tip

Interviewers love compare against average questions.


5️⃣ Students in Top 10% of Their Class (Window Functions)

Table:

Students(student_id, class_id, marks)

✅ Using PERCENT_RANK()

SELECT student_id, class_id, marks
FROM (
    SELECT student_id,
           class_id,
           marks,
           PERCENT_RANK() OVER (PARTITION BY class_id ORDER BY marks DESC) AS pr
    FROM Students
) t
WHERE pr <= 0.10;

๐Ÿง  Explanation

  • PERCENT_RANK() gives percentile

  • <= 0.10 → top 10%

๐Ÿ’ก Interview Tip

For rank-based questions, always think window functions.


6️⃣ Suppliers With Products Cheaper Than Category Average

(Correlated Subquery + JOIN)

Tables:

Suppliers(supplier_id, name)
Products(product_id, supplier_id, category_id, price)

✅ Solution

SELECT DISTINCT s.supplier_id, s.name
FROM Suppliers s
JOIN Products p ON s.supplier_id = p.supplier_id
WHERE p.price <
      (
        SELECT AVG(p2.price)
        FROM Products p2
        WHERE p2.category_id = p.category_id
      );

๐Ÿง  Explanation

  • Subquery recalculates avg per category

  • Compares product price with category avg

๐Ÿ’ก Interview Tip

This is a classic correlated subquery example.


7️⃣ Customers and Their Total Order Amount

(Include Customers With No Orders)

Tables:

Orders(order_id, customer_id, amount)
Customers(customer_id, name)

✅ Correct Solution Using LEFT JOIN

SELECT c.customer_id,
       c.name,
       COALESCE(SUM(o.amount), 0) AS total_order_amount
FROM Customers c
LEFT JOIN Orders o
       ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;

๐Ÿง  Explanation

  • LEFT JOIN keeps all customers

  • COALESCE converts NULL to 0

๐Ÿ’ก Interview Tip

If they say include even if no records exist, think LEFT JOIN.


๐Ÿง  Key SQL Patterns to Remember

Problem Type          | Pattern
----------------------|----------------------
Delete duplicates     | ROW_NUMBER()
Worked on all items   | GROUP BY + HAVING
Top N per group       | RANK() / DENSE_RANK()
Compare with average  | Subquery
Percentile / top %    | PERCENT_RANK()
Include missing data  | LEFT JOIN

๐ŸŽฏ What Interviewers Really Look For

✔ Correct joins
✔ Proper grouping
✔ No missing edge cases
✔ Business logic clarity
✔ Clean, readable SQL

Not just syntax.


๐Ÿ“Œ Final Advice for Long-Term SQL Mastery

  • Always start with SELECT, then DELETE/UPDATE

  • Think in sets, not rows

  • Ask clarifying questions (> vs >=, date ranges)

  • Practice explaining why, not just how


Identifying the Top 10 Sellers for Featured Promotions (E-commerce Platform)

E-commerce platforms frequently rank sellers to decide featured placements, promotions, and incentives.

This problem simulates a real-world backend/data scenario where seller performance is distributed across multiple months and must be evaluated using business rules.


๐Ÿ“Œ Problem Statement

An e-commerce platform wants to identify the Top 10 sellers for featured promotions based on their performance over the last 3 months.


๐Ÿ“‚ Data Sources

  • jan_orders.csv

  • feb_orders.csv

  • mar_orders.csv

Each Record

(sellerId, orderValue, customerRating)

Each record represents one completed order.


✅ Selection Criteria

A seller is considered eligible only if all the following conditions are met:

  1. At least 100 total orders across all months

  2. Average customer rating ≥ 4.2

  3. At least 20 orders in EACH month

Ranking Rule

Among eligible sellers, rank by average order value (highest first) and take the top 10.

๐ŸŽฏ Expected Output

List<String> → sellerIds sorted by average order value (highest first)

๐Ÿง  Key Clarifications (Interview Gold)

Before coding, clarify these assumptions:

  • Customer rating is per order (not per seller)

  • Average rating = totalRatings / totalOrders

  • Average order value = totalOrderValue / totalOrders

  • A seller missing even one month is not eligible

  • ≥ 4.2 means 4.2 is allowed

These clarifications show strong analytical thinking.


๐Ÿ— High-Level Approach

Step 1: Aggregate seller data

Use a map:

sellerId → SellerSummary

Track:

  • order count per month

  • total order value

  • total rating sum


Step 2: Compute derived metrics

For each seller:

  • totalOrders

  • averageRating

  • averageOrderValue


Step 3: Apply eligibility rules

Filter sellers who fail any condition:

  • totalOrders < 100

  • any month < 20 orders

  • averageRating < 4.2


Step 4: Rank & select

  • Sort by average order value (DESC)

  • Pick top 10


๐Ÿงฉ Java Data Model (Interview-Ready)

class SellerSummary {
    String sellerId;

    int ordersJan, ordersFeb, ordersMar;
    double totalOrderValue;
    double totalRating;

    int totalOrders() {
        return ordersJan + ordersFeb + ordersMar;
    }

    double averageRating() {
        return totalOrders() == 0 ? 0 : totalRating / totalOrders();
    }

    double averageOrderValue() {
        return totalOrders() == 0 ? 0 : totalOrderValue / totalOrders();
    }

    boolean isEligible() {
        return totalOrders() >= 100
            && ordersJan >= 20
            && ordersFeb >= 20
            && ordersMar >= 20
            && averageRating() >= 4.2;
    }
}

๐Ÿง  Stream Pipeline (Conceptual)

sellers.values().stream()
    .filter(SellerSummary::isEligible)
    .sorted(Comparator.comparingDouble(
        SellerSummary::averageOrderValue).reversed())
    .limit(10)
    .map(s -> s.sellerId)
    .toList();

⏱ Time & Space Complexity

Let:

  • N = total number of orders

  • S = number of sellers

Operation    | Complexity
-------------|-----------
Aggregation  | O(N)
Filtering    | O(S)
Sorting      | O(S log S)
Space        | O(S)

This is optimal and production-ready.


❌ Common Mistakes to Avoid

  1. Averaging averages

    • Always calculate from totals

  2. Integer division

    • Use double for all averages

  3. Ignoring monthly constraints

    • Each month must meet minimum orders

  4. Using > instead of ≥

    • Ratings are inclusive here (exactly 4.2 is allowed)

  5. Ranking before filtering

    • Eligibility must come first


๐Ÿš€ How to Extend This in Real Systems

At production scale you typically wouldn’t recompute rankings per request: aggregate orders in a batch or streaming job, keep per-seller summaries in a store, and refresh the ranking on a schedule. The follow-up questions below probe exactly these directions.


๐Ÿงช Interview Follow-Up Questions

  • How would you handle seller fraud or fake ratings?

  • What if promotions require category-wise top sellers?

  • How would you optimize this for millions of orders/day?

  • Would you precompute rankings or calculate on demand?


✅ Final Takeaway

This problem tests more than coding:

✔ data modeling
✔ aggregation logic
✔ business rule enforcement
✔ ranking correctness
✔ scalability thinking

If you can clearly explain this flow, you demonstrate strong backend and system-level thinking—exactly what interviewers want.


Rewarding the Top 5 Sales Executives Across 3 Quarters (Java + Map + Streams)

This post shows an interview-ready way to solve a common “aggregate + filter + rank” problem in Java. We’ll go from the problem statement to a clean final solution, and then cover tips, pitfalls, and improvements that interviewers typically look for.


✅ Question

A company wants to reward the Top 5 sales executives based on performance over the last 3 quarters.

Data Files

  • q1_sales.csv

  • q2_sales.csv

  • q3_sales.csv

Each Record

(employeeId, dealsClosed, revenue)

Selection Criteria

An employee is eligible only if all conditions are met:

  1. Total deals closed ≥ 30 across all quarters

  2. Average revenue per deal > $5,000

  3. Deals closed ≥ 5 in each quarter

  4. From eligible employees, select Top 5 by total revenue (descending)

Output

Return a list of employeeIds sorted by total revenue DESC.


๐Ÿง  Approach

This is an “aggregation + eligibility + ranking” problem. The clean approach is:

  1. Aggregate quarter data into one summary per employee.

    • Track: deals and revenue per quarter, and totals.

  2. Apply filters:

    • totalDeals ≥ 30

    • dealsQ1 ≥ 5, dealsQ2 ≥ 5, dealsQ3 ≥ 5

    • averageRevenuePerDeal > 5000

  3. Sort by total revenue descending

  4. Take top 5

  5. Return list of employeeIds

Why Map?

A HashMap<String, EmployeeSalesSummary> gives you O(1) updates per record and keeps the code straightforward.


✅ Final Answer (Clean Java Solution)

This is a polished version of the code you started, with small improvements:

  • Uses constants for rule thresholds

  • Fixes the placeholder return (now returns top list)

  • Ensures filters match the problem statement

  • Adds a tie-breaker for consistent sorting (optional but good practice)

  • Removes unused imports

import java.util.*;
import java.util.stream.Collectors;

public class TopSalesExecutivesFinder {

    // ---------------- SAMPLE INPUT (Hardcoded) ----------------
    static List<SalesRecord> q1Sales = Arrays.asList(
            new SalesRecord("E1", 10, 60000),
            new SalesRecord("E2", 8, 30000),
            new SalesRecord("E3", 12, 80000)
    );

    static List<SalesRecord> q2Sales = Arrays.asList(
            new SalesRecord("E1", 9, 55000),
            new SalesRecord("E2", 12, 70000),
            new SalesRecord("E3", 10, 60000)
    );

    static List<SalesRecord> q3Sales = Arrays.asList(
            new SalesRecord("E1", 11, 65000),
            new SalesRecord("E2", 10, 50000),
            new SalesRecord("E3", 9, 55000)
    );

    public static List<String> findTopSalesExecutives() {

        // ---------------- RULE THRESHOLDS ----------------
        final int MIN_TOTAL_DEALS = 30;
        final int MIN_DEALS_EACH_QUARTER = 5;
        final double MIN_AVG_REV_PER_DEAL = 5000.0; // must be strictly greater
        final int TOP_K = 5;

        // ---------------- AGGREGATION ----------------
        Map<String, EmployeeSalesSummary> summaryMap = new HashMap<>();

        q1Sales.forEach(r ->
                summaryMap.computeIfAbsent(r.employeeId, EmployeeSalesSummary::new)
                          .addQ1(r.dealsClosed, r.revenue)
        );

        q2Sales.forEach(r ->
                summaryMap.computeIfAbsent(r.employeeId, EmployeeSalesSummary::new)
                          .addQ2(r.dealsClosed, r.revenue)
        );

        q3Sales.forEach(r ->
                summaryMap.computeIfAbsent(r.employeeId, EmployeeSalesSummary::new)
                          .addQ3(r.dealsClosed, r.revenue)
        );

        // ---------------- FILTER + SORT + PICK ----------------
        return summaryMap.values().stream()
                // 1) total deals >= 30
                .filter(s -> s.totalDeals() >= MIN_TOTAL_DEALS)
                // 2) deals >= 5 in each quarter
                .filter(s -> s.dealsEachQuarterAtLeast(MIN_DEALS_EACH_QUARTER))
                // 3) avg revenue per deal > 5000
                .filter(s -> s.averageRevenuePerDeal() > MIN_AVG_REV_PER_DEAL)
                // 4) sort by total revenue desc (tie-breaker optional)
                .sorted(Comparator.comparingDouble(EmployeeSalesSummary::totalRevenue).reversed()
                        .thenComparing(EmployeeSalesSummary::totalDeals, Comparator.reverseOrder())
                        .thenComparing(s -> s.employeeId))
                // 5) top 5
                .limit(TOP_K)
                // return employeeIds
                .map(s -> s.employeeId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> result = findTopSalesExecutives();
        System.out.println("Top Sales Executives: " + result);
    }
}

// ---------------- SUPPORT CLASSES ----------------

class SalesRecord {
    String employeeId;
    int dealsClosed;
    double revenue;

    SalesRecord(String employeeId, int dealsClosed, double revenue) {
        this.employeeId = employeeId;
        this.dealsClosed = dealsClosed;
        this.revenue = revenue;
    }
}

class EmployeeSalesSummary {
    String employeeId;

    int dealsQ1, dealsQ2, dealsQ3;
    double revenueQ1, revenueQ2, revenueQ3;

    EmployeeSalesSummary(String employeeId) {
        this.employeeId = employeeId;
    }

    void addQ1(int deals, double revenue) {
        dealsQ1 += deals;
        revenueQ1 += revenue;
    }

    void addQ2(int deals, double revenue) {
        dealsQ2 += deals;
        revenueQ2 += revenue;
    }

    void addQ3(int deals, double revenue) {
        dealsQ3 += deals;
        revenueQ3 += revenue;
    }

    int totalDeals() {
        return dealsQ1 + dealsQ2 + dealsQ3;
    }

    double totalRevenue() {
        return revenueQ1 + revenueQ2 + revenueQ3;
    }

    double averageRevenuePerDeal() {
        int totalDeals = totalDeals();
        if (totalDeals == 0) return 0.0;
        return totalRevenue() / totalDeals; // safe double division
    }

    boolean dealsEachQuarterAtLeast(int minDeals) {
        return dealsQ1 >= minDeals && dealsQ2 >= minDeals && dealsQ3 >= minDeals;
    }

    @Override
    public String toString() {
        return "EmployeeSalesSummary{" +
                "employeeId='" + employeeId + '\'' +
                ", dealsQ1=" + dealsQ1 +
                ", dealsQ2=" + dealsQ2 +
                ", dealsQ3=" + dealsQ3 +
                ", totalDeals=" + totalDeals() +
                ", totalRevenue=" + totalRevenue() +
                ", avgRevPerDeal=" + averageRevenuePerDeal() +
                '}';
    }
}

๐Ÿงพ What’s the Output for Your Sample Data?

With your sample input:

  • E1 total deals = 10 + 9 + 11 = 30
    avg revenue per deal = (60000+55000+65000)/30 = 180000/30 = 6000
    each quarter deals ≥ 5 ✅
    total revenue = 180000

  • E2 total deals = 8 + 12 + 10 = 30
    avg revenue per deal = (30000+70000+50000)/30 = 150000/30 = 5000 ❌ (must be > 5000)

  • E3 total deals = 12 + 10 + 9 = 31
    avg revenue per deal = (80000+60000+55000)/31 ≈ 6290
    each quarter deals ≥ 5 ✅
    total revenue = 195000

✅ Result:

Top Sales Executives: [E3, E1]

✅ Tips & Interview Notes

1) Don’t mix up “average revenue per deal”

The rule is:

average revenue per deal across ALL quarters

So compute:

totalRevenue / totalDeals

Not quarter averages.

2) Beware >= vs >

  • The problem says average revenue per deal > 5000

  • That means 5000 exactly should FAIL (like E2 above)

3) Avoid integer division

Here we’re safe because totalRevenue is double.
If revenue was int, you’d need:

(double) totalRevenue / totalDeals

4) Sorting tie-breakers (bonus)

If total revenue ties, you can stabilize output:

  • higher total deals wins

  • then alphabetical employeeId

Interviewers like deterministic sorting.

5) Make rules configurable

Putting thresholds in constants makes the function reusable:

  • change to Top 10

  • change min deals from 30 to 20

  • etc.


⏱ Complexity

Let N be the total number of sales records across q1, q2, and q3, and P the number of distinct employees.

  • Aggregation: O(N)

  • Sorting: O(P log P)

  • Memory: O(P)

This is optimal for the problem.


Optional Enhancements (If Asked)

  • Read from actual CSVs using BufferedReader

  • Support any number of quarters (use arrays instead of dealsQ1/Q2/Q3)

  • Add unit tests:

    • avg exactly 5000

    • missing quarter data

    • fewer than 5 eligible employees