Scaling SSE Across Multiple Nodes with Redis

Part of Redis Pub/Sub Fan-Out for SSE.

Single-node SSE works fine in development, but the moment you deploy two or more application instances behind a load balancer, clients reconnecting after a network blip land on whichever node the balancer picks — which has zero knowledge of their subscription state or missed events. This guide shows how to make every node identical and stateless using Redis Pub/Sub for live fan-out and Redis Streams for Last-Event-ID replay, so any node can serve any client at any time without sticky-session hacks.

Symptom & Developer Intent

Observed behaviour: A client reconnects with Last-Event-ID: 42, but the new node has no event history and silently sends nothing — or sends duplicate events because you ran two in-process broadcaster goroutines/threads. In logs you see:

WARN  sse: client reconnected with Last-Event-ID=42, no history found — sending nothing

Or, after adding a second pod, roughly half of reconnecting clients stop receiving events until the next manual refresh.

Developer intent: Run N identical application pods behind a round-robin load balancer (nginx, AWS ALB, GCP LB) with no sticky sessions, emit every event to every connected client across all pods, and replay missed events to reconnecting clients by reading from a shared durable log.

Root Cause Analysis

Each SSE server maintains an in-process subscriber registry: a map of channel → []http.ResponseWriter or equivalent. When a client connects to Pod A, its ResponseWriter lives only in Pod A’s memory. When an event is published on Pod A, only Pod A’s clients receive it. Pod B’s clients never see it.

Three separate problems compound this:

| Problem | Consequence |
| --- | --- |
| In-process subscriber registry | Events only fan out to clients on the emitting pod |
| No shared event log | Reconnecting clients on a different pod cannot replay missed events |
| In-process sequence counter | Each pod issues its own id: sequence, causing Last-Event-ID collisions across pods |

The SSE specification has the browser send Last-Event-ID on every automatic reconnect so that the server can resume from that point. Without a shared log, no pod other than the one that served the original connection can honour it. Sticky sessions “fix” the replay problem but reintroduce statefulness, complicate zero-downtime deploys, and cause a thundering herd of reconnects when a pod restarts.

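Redis addresses both fan-out and replay: Pub/Sub for low-latency fan-out, Streams (XADD/XRANGE) for durable, indexed event replay. A Redis Stream entry ID (for example 1718000000000-0) has the form <milliseconds>-<sequence> and increases monotonically within a stream regardless of which pod appended the entry, so it doubles as the SSE event id: field and replaces the per-pod sequence counter.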
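Since the entry ID is also what clients echo back in Last-Event-ID, it can help to parse and compare IDs in application code. The sketch below is illustrative only: parseStreamId and compareStreamIds are hypothetical helper names, and it assumes both parts fit in a JavaScript safe integer (the sequence part is a 64-bit value in Redis, so this is a simplification). It is also stricter than Redis itself, which accepts incomplete IDs such as 42.

```typescript
// Hypothetical helpers; not part of ioredis or Redis.
interface StreamId {
  ms: number;  // millisecond timestamp part
  seq: number; // per-millisecond sequence part
}

// Accepts only the full "<milliseconds>-<sequence>" form.
// Returns null for anything else, e.g. a missing or tampered header.
function parseStreamId(raw: string | undefined): StreamId | null {
  if (raw === undefined) return null;
  const match = /^(\d+)-(\d+)$/.exec(raw);
  if (!match) return null;
  const ms = Number(match[1]);
  const seq = Number(match[2]);
  // Assumption: values beyond the safe-integer range are rejected, not handled.
  if (!Number.isSafeInteger(ms) || !Number.isSafeInteger(seq)) return null;
  return { ms, seq };
}

// Negative if a sorts before b, positive if after, 0 if equal.
function compareStreamIds(a: StreamId, b: StreamId): number {
  if (a.ms !== b.ms) return a.ms < b.ms ? -1 : 1;
  if (a.seq !== b.seq) return a.seq < b.seq ? -1 : 1;
  return 0;
}
```

A handler could call parseStreamId on the incoming header and skip replay when it returns null, rather than passing arbitrary client input to Redis.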

Step-by-Step Resolution

Step 1 — Deploy a Redis instance (or cluster)

Ensure every application pod can reach the same Redis endpoint. For production, use Redis Sentinel or Redis Cluster for HA. The steps below use a single Redis 7 instance; the commands are the same in Cluster mode, subject to the key-placement notes in the FAQ.

# Quick local Redis via Docker
docker run -d --name redis-sse \
  -p 6379:6379 \
  redis:7-alpine redis-server --appendonly yes

Configure maxmemory-policy noeviction so Stream entries are never silently dropped while pods are still consuming them.

Step 2 — Write every event to a Redis Stream (the shared log)

Replace any in-process event buffer with XADD. The Stream key is the channel name (e.g., sse:news). Cap the stream with MAXLEN ~ to avoid unbounded growth.

// Node.js — TypeScript, ioredis
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL!);

interface SseEvent {
  type: string;
  data: string;
}

// publish returns the Redis Stream entry ID, which becomes the SSE event id
async function publishEvent(channel: string, event: SseEvent): Promise<string> {
  const id = await redis.xadd(
    `sse:${channel}`,
    "MAXLEN", "~", "10000",   // keep ~10 000 entries, approximate trim
    "*",                       // auto-generate entry id (timestamp-sequence)
    "type", event.type,
    "data", event.data,
  );
  // Also fan-out via Pub/Sub for zero-latency delivery to live subscribers
  await redis.publish(`sse:channel:${channel}`, JSON.stringify({ id, ...event }));
  return id!;
}

Using * lets Redis generate the entry ID as <milliseconds>-<seq>. IDs increase monotonically within a stream no matter which pod appended the entry, so no per-pod sequence counter is needed.

Step 3 — Subscribe to Pub/Sub on every pod to fan-out live events

Each pod maintains one Redis subscriber connection per channel it serves. When a message arrives, it writes to every local ResponseWriter.

// Node.js — fan-out subscriber (run once per channel on pod startup)
import type { ServerResponse } from "http";

const clients = new Map<string, Set<ServerResponse>>(); // channel → local clients

async function startChannelFanOut(channel: string) {
  const sub = new Redis(process.env.REDIS_URL!);
  await sub.subscribe(`sse:channel:${channel}`);

  sub.on("message", (_ch: string, raw: string) => {
    const event = JSON.parse(raw) as { id: string; type: string; data: string };
    const payload =
      `id: ${event.id}\nevent: ${event.type}\ndata: ${event.data}\n\n`;

    for (const res of clients.get(channel) ?? []) {
      res.write(payload);
      // flush() is not part of Node's http API; it exists only when compression middleware wraps res (see Buffer Management guide)
      (res as any).flush?.();
    }
  });
}

Because every pod runs this subscriber loop, a PUBLISH from any pod reaches all pods simultaneously, which then forward to their local clients. This is the classic Redis fan-out pattern.
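One detail the template-string payload above does not cover is event data that contains line breaks: in the SSE wire format each line of the payload needs its own data: prefix, otherwise the frame is cut short at the first newline. A minimal sketch of a frame formatter follows; formatSseFrame and the SseFrame shape are hypothetical names introduced here for illustration.

```typescript
// Hypothetical helper; mirrors the { id, type, data } shape used in this guide.
interface SseFrame {
  id?: string;
  type?: string;
  data: string;
}

function formatSseFrame(frame: SseFrame): string {
  const lines: string[] = [];
  // id and event values must stay on a single line, so strip line breaks.
  if (frame.id !== undefined) {
    lines.push(`id: ${frame.id.replace(/[\r\n]/g, "")}`);
  }
  if (frame.type !== undefined) {
    lines.push(`event: ${frame.type.replace(/[\r\n]/g, "")}`);
  }
  // Every line of the payload gets its own "data:" field.
  for (const line of frame.data.split(/\r\n|\r|\n/)) {
    lines.push(`data: ${line}`);
  }
  // A blank line terminates the event.
  return lines.join("\n") + "\n\n";
}
```

The same function could be used for both live fan-out and replay, so the two paths cannot drift apart in how they frame events.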

Step 4 — Replay missed events from the Stream on reconnect

When the client sends Last-Event-ID, read forward from that position using XRANGE.

import type { IncomingMessage, ServerResponse } from "http";

async function handleSseRequest(req: IncomingMessage, res: ServerResponse) {
  const channel = parseChannel(req.url!);            // e.g. "news"
  const lastId   = req.headers["last-event-id"] as string | undefined;

  // SSE headers — disable all proxy/CDN buffering
  res.writeHead(200, {
    "Content-Type":  "text/event-stream",
    "Cache-Control": "no-cache, no-store",
    "Connection":    "keep-alive",
    "X-Accel-Buffering": "no",                       // nginx directive
  });
  res.write(": connected\n\n");                      // initial comment pushes headers and first bytes through proxies

  // --- Replay missed events ---
  if (lastId) {
    // XRANGE key (lastId, +∞]; "(" prefix means exclusive start
    const missed = await redis.xrange(`sse:${channel}`, `(${lastId}`, "+");
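    for (const [id, fields] of missed) {
      // fields is a flat [name, value, name, value, …] array; pair by position
      // so that a value equal to "type" or "data" is never mistaken for a field name
      const entry: Record<string, string> = {};
      for (let i = 0; i + 1 < fields.length; i += 2) entry[fields[i]] = fields[i + 1];
      res.write(`id: ${id}\nevent: ${entry.type}\ndata: ${entry.data}\n\n`);
    }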
  }

  // --- Register for live events ---
  if (!clients.has(channel)) clients.set(channel, new Set());
  clients.get(channel)!.add(res);

  // Clean up on disconnect
  req.on("close", () => {
    clients.get(channel)?.delete(res);
  });
}

function parseChannel(url: string): string {
  // e.g. /events/news → "news"
  return url.split("/").pop() ?? "default";
}

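The exclusive lower bound, written as (${lastId}, means the client receives events strictly after the one it already processed, with no duplicates. Exclusive range bounds require Redis 6.2 or later. Because the client is registered for live events only after the replay finishes, an event published in between can be missed; production code should register first and de-duplicate by id.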
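The handler above registers the client only after the XRANGE replay has been written, which leaves a short window in which a live event can slip past. One possible way to close it, sketched below under stated assumptions, is to register first and hold live events in a small buffer until the replay is done. ReplayGate is a hypothetical class name, and IDs are assumed to be well-formed <milliseconds>-<sequence> strings with parts that fit in a safe integer.

```typescript
interface LiveEvent {
  id: string;
  type: string;
  data: string;
}

// Splits "<ms>-<seq>" into numbers; missing parts default to 0.
function idParts(id: string): [number, number] {
  const parts = id.split("-");
  return [Number(parts[0] ?? 0), Number(parts[1] ?? 0)];
}

// True if stream ID a sorts strictly after stream ID b.
function isAfter(a: string, b: string): boolean {
  const [aMs, aSeq] = idParts(a);
  const [bMs, bSeq] = idParts(b);
  return aMs !== bMs ? aMs > bMs : aSeq > bSeq;
}

// Hypothetical per-connection gate: buffers live events during replay,
// then flushes only those newer than the last replayed entry.
class ReplayGate {
  private buffered: LiveEvent[] | null = [];
  private readonly write: (e: LiveEvent) => void;

  constructor(write: (e: LiveEvent) => void) {
    this.write = write;
  }

  // Called by the fan-out loop for every live Pub/Sub event.
  push(event: LiveEvent): void {
    if (this.buffered) this.buffered.push(event);
    else this.write(event);
  }

  // Called once replay has been written. lastWrittenId is the last replayed
  // entry ID, or the client's Last-Event-ID if nothing was replayed.
  finishReplay(lastWrittenId: string | undefined): void {
    const pending = this.buffered ?? [];
    this.buffered = null; // switch to pass-through mode
    for (const e of pending) {
      if (lastWrittenId === undefined || isAfter(e.id, lastWrittenId)) {
        this.write(e);
      }
    }
  }
}
```

In this arrangement the per-channel registry would hold gates rather than raw responses: the gate is added before the XRANGE call is awaited, and finishReplay runs after the replay loop.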

Step 5 — Remove sticky-session configuration from the load balancer

With stateless pods, sticky sessions are no longer needed and are actively harmful.

# nginx upstream — round-robin (default), no ip_hash
upstream sse_pods {
  server pod1:3000;
  server pod2:3000;
  server pod3:3000;
  # NO ip_hash; NO sticky directive
  keepalive 64;          # reuse upstream TCP connections
}

server {
  location /events/ {
    proxy_pass         http://sse_pods;
    proxy_http_version 1.1;
    proxy_set_header   Connection "";        # enable keep-alive to upstream
    proxy_set_header   Host $host;
    proxy_set_header   Last-Event-ID $http_last_event_id;
    proxy_buffering    off;                  # critical — no buffering
    proxy_cache        off;
    proxy_read_timeout 86400s;              # allow long-lived connections
    chunked_transfer_encoding on;
  }
}

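The browser sets Last-Event-ID itself on each automatic reconnect, and Node.js exposes it as req.headers["last-event-id"]. nginx passes client request headers to the upstream by default, so the explicit proxy_set_header line is a defensive measure that documents the dependency and keeps the header flowing if another proxy tier is later configured to strip unknown headers. See the Buffer Management & Chunked Transfer Encoding guide for the full list of proxy directives.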

Step 6 — Set a Stream TTL and eviction policy

Streams are durable by default; cap them so Redis memory stays bounded.

# The cap is not a persistent setting: it is applied by each XADD (Step 2) or by an explicit trim
redis-cli XTRIM sse:news MAXLEN "~" 10000
# Optionally set a TTL on the stream key (refresh it on each XADD, e.g. via a Lua script)
redis-cli EXPIRE sse:news 86400   # 24-hour TTL

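For high-throughput channels, use approximate trimming (MAXLEN ~): Redis then removes entries only in whole internal nodes, which is considerably cheaper than trimming to the exact length on every write. The stream may briefly hold somewhat more than the requested count.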
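Capping the stream also bounds the replay window: a client that stays offline long enough may send a Last-Event-ID that is older than anything still retained. A rough way to detect that, sketched here with hypothetical helper names, is to compare the client's ID with the oldest retained entry ID, which can be read with XRANGE key - + COUNT 1. The check is conservative, since it cannot tell whether entries in between were actually trimmed.

```typescript
// Splits "<ms>-<seq>" into numbers; assumes well-formed IDs with safe-integer parts.
function splitId(id: string): [number, number] {
  const parts = id.split("-");
  return [Number(parts[0] ?? 0), Number(parts[1] ?? 0)];
}

// Hypothetical helper: true when the oldest retained entry is newer than the
// client's last seen ID, meaning some entries may have been trimmed away.
function mayHaveReplayGap(lastEventId: string, oldestRetainedId: string): boolean {
  const [lastMs, lastSeq] = splitId(lastEventId);
  const [oldMs, oldSeq] = splitId(oldestRetainedId);
  return oldMs !== lastMs ? oldMs > lastMs : oldSeq > lastSeq;
}
```

When the check returns true, one option is to send the client an application-defined event (for example a hypothetical resync type) so it refetches full state instead of trusting an incomplete replay.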

Validation & Monitoring

Verify fan-out with curl

Open two terminals. In terminal 1, subscribe:

curl -N -H "Accept: text/event-stream" http://localhost:3000/events/news

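In terminal 2, publish via Redis CLI (simulating any pod). The id in the PUBLISH payload below is a placeholder; substitute the entry ID that XADD prints so the live event and the stored entry carry the same id: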

redis-cli XADD sse:news "*" type update data '{"msg":"hello"}'
redis-cli PUBLISH sse:channel:news '{"id":"1718000000001-0","type":"update","data":"{\"msg\":\"hello\"}"}'

Terminal 1 should print the event within milliseconds.

Verify Last-Event-ID replay

# Simulate a reconnect after event id 1718000000001-0
curl -N \
  -H "Accept: text/event-stream" \
  -H "Last-Event-ID: 1718000000001-0" \
  http://localhost:3000/events/news

You should see all events added after that ID before the live stream begins.

Monitoring metrics to track

| Metric | Tool | Alert threshold |
| --- | --- | --- |
| redis_connected_clients | Redis INFO | > 80% of maxclients |
| xlen(sse:<channel>) | XLEN command | > 9 000 (approaching MAXLEN cap) |
| Pub/Sub messages/sec | redis-cli monitor or Prometheus redis_exporter | sudden drop to 0 |
| SSE client count per pod | custom gauge | unexpected imbalance |
| Reconnect rate | app metric | spike > baseline × 3 |

A sudden drop in Pub/Sub messages while XLEN keeps growing indicates pods are writing but the subscriber loop crashed — instrument with a reconnect wrapper around the sub.on("error") handler.
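The "writes continue but fan-out stopped" condition can be expressed as a comparison of two periodic samples. The sketch below is a hypothetical illustration, not an established metric: it uses the newest stream entry ID rather than XLEN as the write signal (an assumption made here because XLEN stops growing once the stream sits at its MAXLEN cap), and a per-pod counter of Pub/Sub messages received. The newest ID could be sampled with XREVRANGE key + - COUNT 1.

```typescript
// Hypothetical sample shape, collected on a timer by each pod.
interface FanOutSample {
  newestStreamId: string;     // newest entry ID in the stream at sample time
  pubSubMessagesSeen: number; // total Pub/Sub messages this pod has received
}

// True when the stream advanced between samples but this pod's subscriber
// received nothing, which suggests the subscriber loop has stalled.
function fanOutLooksStalled(prev: FanOutSample, curr: FanOutSample): boolean {
  const streamAdvanced = curr.newestStreamId !== prev.newestStreamId;
  const pubSubAdvanced = curr.pubSubMessagesSeen > prev.pubSubMessagesSeen;
  return streamAdvanced && !pubSubAdvanced;
}
```

A pod that sees this condition for several consecutive samples might recreate its subscriber connection and raise an alert.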

⚡ Production Directives

  • Set maxmemory-policy noeviction on Redis so Stream entries are never silently evicted. Pub/Sub messages are not stored at all, so replay depends entirely on the Stream.
  • Use MAXLEN ~ (approximate) trimming on every XADD to keep stream size bounded without per-write O(n) cost.
  • Forward Last-Event-ID through every proxy tier (proxy_set_header Last-Event-ID $http_last_event_id) so the application always receives it.
  • Run one Redis subscriber connection per channel per pod, not one per client — fan-out locally in process.
  • Disable proxy buffering at every layer: nginx proxy_buffering off, CDN pass-through, ALB idle timeout ≥ 3 600 s.

Frequently Asked Questions

Does this approach work with Redis Cluster (sharded) mode?

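Yes, with some care around key placement. Each Stream key lives on a single shard, and a cluster-aware client routes XADD and XRANGE for that key to the right node. Classic Pub/Sub (PUBLISH/SUBSCRIBE) is broadcast to every node in the cluster, so a pod can subscribe through any node and still receive the message, at the cost of cluster-wide message traffic. Sharded Pub/Sub (SPUBLISH/SSUBSCRIBE, Redis 7.0 and later) instead routes each channel to the shard that owns its hash slot. If you use sharded Pub/Sub, hash tags keep the Stream and the channel on the same shard: name your keys sse:{news} and sse:channel:{news} so both hash to the same slot. Check your client library's documentation for how its Cluster mode manages subscriber connections.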

What happens if Redis goes down?

Live fan-out stops and new connections cannot replay missed events. Design a graceful degradation: catch the Redis connection error in the subscriber loop, close all local SSE connections so clients reconnect (triggering EventSource's built-in automatic reconnect), and expose a health check endpoint that returns 503 when Redis is unreachable so the load balancer stops routing new SSE traffic to affected pods. Sending a retry field with a longer interval (e.g. retry: 5000) before closing delays those reconnects and reduces thundering-herd pressure while Redis recovers.
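The two mechanical pieces of that degradation path, the health status and the retry frame, are small enough to sketch. Both function names below are hypothetical. The status strings are an assumption based on the ioredis client's status property, where "ready" indicates a usable connection; adapt the check to whatever your client exposes.

```typescript
// Maps a Redis client status string to an HTTP status for the health endpoint.
// Assumption: "ready" is the only state in which the pod should take SSE traffic.
function healthStatusCode(redisStatus: string): 200 | 503 {
  return redisStatus === "ready" ? 200 : 503;
}

// Builds the SSE "retry" frame; the field value must be a whole number of
// milliseconds written in plain digits.
function retryFrame(milliseconds: number): string {
  const ms = Math.max(0, Math.floor(milliseconds));
  return `retry: ${ms}\n\n`;
}
```

A pod could write retryFrame(5000) to each open response just before ending it, so reconnecting clients wait five seconds instead of the browser default.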

Should I use Redis Pub/Sub or Redis Streams alone?

Use both together. Pub/Sub alone has no persistence: a client that disconnects for one second misses those events forever. Streams alone give you replay, but live delivery then requires each pod to tail the stream with blocking XREAD calls instead of receiving a push. The combination gives you low-latency push from Pub/Sub plus a replay window from Streams that is bounded only by the MAXLEN cap. The overhead is one extra PUBLISH call per event, which is small compared to the XADD.

How do I handle per-user channels vs broadcast channels?

Use namespaced stream and Pub/Sub keys: sse:user:{userId} and sse:channel:user:{userId}. Authenticate the SSE request before subscribing the client, then subscribe the pod's fan-out loop to that user's channel. Evict the Pub/Sub subscription when the client disconnects. For very high user counts, batch the per-user subscriptions using PSUBSCRIBE sse:channel:user:* with pattern matching and filter in-process — this keeps the number of Redis subscriber connections per pod constant regardless of connected-user count.
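The in-process filter for the pattern-subscription variant might look like the sketch below. routeUserMessage, Writer, and the localUsers registry are hypothetical names; the channel prefix follows the naming used in this answer. Note the trade-off: with a pattern subscription every pod receives every user's messages and discards those for users connected elsewhere.

```typescript
// Minimal shape of something we can write SSE bytes to (e.g. a ServerResponse).
type Writer = { write(chunk: string): unknown };

const USER_CHANNEL_PREFIX = "sse:channel:user:";

// Delivers one Pub/Sub message to this pod's connections for the target user.
// Returns the number of local connections written to.
function routeUserMessage(
  channel: string,
  raw: string,
  localUsers: Map<string, Set<Writer>>, // userId → this pod's open connections
): number {
  if (!channel.startsWith(USER_CHANNEL_PREFIX)) return 0;
  const userId = channel.slice(USER_CHANNEL_PREFIX.length);
  const writers = localUsers.get(userId);
  if (!writers) return 0; // user is connected to another pod, or offline

  const event = JSON.parse(raw) as { id: string; type: string; data: string };
  const payload = `id: ${event.id}\nevent: ${event.type}\ndata: ${event.data}\n\n`;
  writers.forEach((w) => w.write(payload));
  return writers.size;
}

// Usage sketch with ioredis (assumed wiring, shown as a comment only):
//   await sub.psubscribe("sse:channel:user:*");
//   sub.on("pmessage", (_pattern, channel, raw) =>
//     routeUserMessage(channel, raw, localUsers));
```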

Can I use this pattern with the Fetch API and ReadableStream instead of EventSource?

Yes. The server side is identical — the HTTP response is the same text/event-stream body. The client-side difference is that fetch+ReadableStream does not automatically reconnect or send Last-Event-ID, so you must track the last received ID in JavaScript and re-issue the fetch with a Last-Event-ID header on connection loss. See the Error Handling & Reconnection UX guide for a complete client retry loop.
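Tracking the last received ID on the client comes down to parsing the text/event-stream body yourself. The following is a minimal sketch of an incremental parser, not a full implementation of the specification: SseStreamParser is a hypothetical class name, and it assumes LF-only line endings, which is what the server code in this guide emits.

```typescript
interface ParsedEvent {
  id: string;
  type: string;
  data: string;
}

// Feed it decoded text chunks as they arrive from the ReadableStream.
class SseStreamParser {
  lastEventId = "";
  private buffer = "";

  feed(chunk: string): ParsedEvent[] {
    this.buffer += chunk;
    const events: ParsedEvent[] = [];
    let boundary = this.buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const block = this.buffer.slice(0, boundary);
      this.buffer = this.buffer.slice(boundary + 2);
      let type = "message"; // default event type when no "event:" field is sent
      const data: string[] = [];
      for (const line of block.split("\n")) {
        if (line === "" || line.startsWith(":")) continue; // comment or keep-alive
        const colon = line.indexOf(":");
        const field = colon === -1 ? line : line.slice(0, colon);
        let value = colon === -1 ? "" : line.slice(colon + 1);
        if (value.startsWith(" ")) value = value.slice(1); // drop one leading space
        if (field === "id") this.lastEventId = value;
        else if (field === "event") type = value;
        else if (field === "data") data.push(value);
      }
      if (data.length > 0) {
        events.push({ id: this.lastEventId, type, data: data.join("\n") });
      }
      boundary = this.buffer.indexOf("\n\n");
    }
    return events;
  }
}
```

After a dropped connection, the client would re-issue the fetch with a Last-Event-ID header set to parser.lastEventId, provided that value is non-empty.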