Backend Stream Generation & Connection Management

The server side of Server-Sent Events looks deceptively simple: set Content-Type: text/event-stream, write a few data: lines, flush. In production that one open response becomes a long-lived, stateful socket that must survive proxy idle timeouts, NAT table expiry, mobile network handoffs, slow consumers, rolling deploys, and file-descriptor exhaustion — sometimes for hours at a time. This section is for backend and full-stack engineers building those endpoints in Node.js, Python, or Go and operating them at scale. It covers the wire format you must emit byte-for-byte, the reverse-proxy and load-balancer config that keeps streams alive, how to fan events out across many processes with Redis, and how to handle the dozens of failure modes that never appear in a localhost demo. The frontend half of the system — EventSource, reconnection, and UI state — lives under Frontend Consumption & Client Patterns, and the protocol rules both sides obey are specified in SSE Protocol Fundamentals & Architecture.

Figure: SSE backend topology from producer to client. Events flow from a producer into Redis pub/sub (PUBLISH / SUB), fan out to SSE workers A, B, and C (each flushing per event), pass through a reverse proxy with buffering off and a long read timeout, and reach the browser EventSource over a single long-lived HTTP response. The client sends Last-Event-ID on reconnect.
Producer to browser: a single domain event fanned out across stateless workers and flushed down long-lived responses.

Concept overview: a stream is one HTTP response that never ends

An SSE endpoint is an ordinary HTTP request whose response body is written incrementally and kept open by the server until the client leaves. The server sets three things: the text/event-stream content type, Cache-Control: no-cache so nothing caches a partial body, and, on HTTP/1.1, chunked transfer encoding so the body can be streamed without a Content-Length. After the headers are flushed, the handler blocks (or awaits) on an event source and writes UTF-8 text frames separated by blank lines. The browser’s EventSource parses those frames and reconnects automatically when the socket drops, which means your server logic must be idempotent and resumable rather than assuming a single continuous session.

The single most common production bug is buffering. Application servers, language runtimes, and reverse proxies all accumulate output before sending it, to optimise throughput. For SSE that turns a real-time stream into a batch delivery: events sit in a buffer for seconds and arrive in a clump. Every layer must be told to flush. Here is a minimal correct Node.js handler, including the early header flush that most tutorials omit. Node's built-in http module hands each res.write() to the socket without an application-level buffer, so no separate per-event flush call is needed unless compression or similar middleware is added:

// Node.js built-in http, no framework. Correct headers, early header flush, immediate per-event writes.
import http from 'node:http';

http.createServer((req, res) => {
  if (req.url !== '/events') { res.writeHead(404).end(); return; }

  res.writeHead(200, {
    'Content-Type': 'text/event-stream; charset=utf-8',
    'Cache-Control': 'no-cache, no-transform', // no-transform stops proxies gzip-buffering
    'Connection': 'keep-alive',
    'X-Accel-Buffering': 'no',                 // nginx: disable proxy_buffering for this response
  });
  res.flushHeaders(); // send 200 + headers now, before the first event

  // Comment line as an immediate heartbeat so the client sees bytes at once.
  res.write(': connected\n\n');

  const tick = setInterval(() => {
    // id: enables Last-Event-ID resume; data: is the payload; blank line dispatches.
    res.write(`id: ${Date.now()}\n`);
    res.write(`event: tick\n`);
    res.write(`data: ${JSON.stringify({ ts: Date.now() })}\n\n`);
  }, 1000);

  // The only reliable disconnect signal: the request stream closing.
  req.on('close', () => clearInterval(tick));
}).listen(8080);

Note req.on('close'), not res.on('finish') — the client closing the TCP connection surfaces as a close on the request. Getting this teardown right is the subject of Node.js Streaming Architecture Basics; skip it and every dropped client leaks a timer, a closure, and a file descriptor.

The runtime primitive differs by language but the contract is identical: hold the response open, write framed UTF-8, flush, and tear down on disconnect. Node exposes a Writable stream whose write() return value signals backpressure. Python ASGI frameworks express the same loop as an async generator that yields frames and must cooperatively check for disconnection so the event loop is never blocked. Go gives you the most direct control — a goroutine writes to the http.ResponseWriter, then type-asserts it to http.Flusher and calls Flush() after each event, with the request Context() cancelled when the client leaves. Across all three, two invariants hold: never block the I/O scheduler with synchronous work, and never assume a write reached the client just because it returned — it may sit in a kernel or proxy buffer until the next flush.

Specification & wire format: the bytes you must emit

The wire format is defined by the WHATWG HTML standard. A stream is a sequence of UTF-8 lines; each non-empty line is field: value, and a blank line dispatches the accumulated event to the client. Only four field names are meaningful (data, event, id, and retry); any other field is ignored, and a line beginning with : is a comment, commonly used for heartbeats. Multi-line payloads repeat the data: field, one line per logical line; the client joins them with \n. The full grammar and parsing rules are covered in Understanding the Event Stream Format.

| Field | Purpose | Server emits | Client behaviour |
| --- | --- | --- | --- |
| data: | Event payload. Repeat for multi-line bodies. | data: {"x":1}\n | Concatenated with \n, delivered as event.data |
| event: | Named event type. | event: price\n | Dispatches to addEventListener('price', …) instead of message |
| id: | Last-Event-ID checkpoint. | id: 4821\n | Stored; resent as Last-Event-ID header on reconnect |
| retry: | Reconnection delay in ms. | retry: 5000\n | Overrides the browser default reconnect interval |
| : (comment) | Keep-alive / no-op. | : keepalive\n | Ignored, but resets idle timers end-to-end |
| (blank line) | Event terminator. | \n | Triggers dispatch of the buffered fields |

The response headers carry equal weight. This table is the authoritative checklist for what an SSE response must and must not set:

| Header | Value | Why |
| --- | --- | --- |
| Content-Type | text/event-stream; charset=utf-8 | Required for EventSource to accept the stream |
| Cache-Control | no-cache, no-transform | Stops CDN/proxy caching and gzip rebuffering |
| Connection | keep-alive | HTTP/1.1 only; omit under HTTP/2 (illegal there) |
| X-Accel-Buffering | no | Per-response nginx buffering off without editing nginx.conf |
| Content-Encoding | (absent) | Compression buffers; never gzip an event stream |
| Content-Length | (absent) | Body is open-ended; rely on chunked transfer |

Three correctness rules trip up most implementations. First, IDs must never contain a newline or null byte. Second, a data: value of more than a few hundred kilobytes risks hitting client and proxy limits — see Maximum Payload Size Limits for SSE Streams. Third, the stream is text-only and UTF-8: binary payloads must be base64-encoded into data:, never written raw.
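The field table and the correctness rules above can be folded into one small serialiser. The following is a minimal Python sketch, not a reference implementation: the names format_sse_event and format_binary_event are hypothetical, and the choice to raise on an invalid ID (rather than strip the offending characters) is an assumption.

```python
import base64
from typing import Optional


def format_sse_event(data: str, event: Optional[str] = None,
                     event_id: Optional[str] = None,
                     retry_ms: Optional[int] = None) -> bytes:
    """Serialise one event into text/event-stream bytes."""
    lines = []
    if event_id is not None:
        # A newline would end the field early; a NULL makes the client ignore the id.
        if any(ch in event_id for ch in "\r\n\0"):
            raise ValueError("event id must not contain CR, LF, or NULL")
        lines.append(f"id: {event_id}")
    if event is not None:
        if "\r" in event or "\n" in event:
            raise ValueError("event name must be a single line")
        lines.append(f"event: {event}")
    if retry_ms is not None:
        lines.append(f"retry: {int(retry_ms)}")
    # One data: line per logical line; the client joins them with "\n".
    for part in data.replace("\r\n", "\n").replace("\r", "\n").split("\n"):
        lines.append(f"data: {part}")
    # The trailing blank line dispatches the event.
    return ("\n".join(lines) + "\n\n").encode("utf-8")


def format_binary_event(payload: bytes, **fields) -> bytes:
    """Base64-encode binary payloads so the stream stays valid UTF-8 text."""
    return format_sse_event(base64.b64encode(payload).decode("ascii"), **fields)
```

Calling format_sse_event("a\nb", event="price", event_id="7") produces the id, event, and two data lines followed by the blank-line terminator. Centralising the framing in one function keeps the newline and NULL checks out of every handler.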

Architecture patterns: proxy, load balancer, fan-out

A production SSE deployment is rarely a single process talking to a browser. There is almost always a reverse proxy in front, a load balancer spreading connections across nodes, and a message bus distributing events to whichever node holds a given client.

Reverse proxy

The proxy must disable buffering and raise read/send timeouts to the maximum stream lifetime, and it must speak HTTP/1.1 upstream with a cleared Connection header so it does not pass keep-alive semantics through incorrectly.

location /events {
    proxy_pass http://sse_pool;
    proxy_http_version 1.1;          # required for chunked streaming upstream
    proxy_set_header Connection "";  # clear hop-by-hop header; enables upstream keep-alive
    proxy_buffering off;             # flush each chunk straight to the client
    proxy_cache off;
    proxy_read_timeout 1h;           # raise above any expected idle gap
    proxy_send_timeout 1h;
    chunked_transfer_encoding on;
}

proxy_buffering off is non-negotiable; without it nginx accumulates the response and your “real-time” stream arrives in bursts. The buffering mechanics and how chunked transfer interacts with flushing are detailed in Buffer Management & Chunked Transfer Encoding.

Load balancer

Each SSE stream is a single long-lived connection, so plain round-robin works for placing connections and no sticky sessions are needed, provided every node can reach the message bus: a client pinned to node B must still receive an event produced on node A. Configure idle timeouts generously: an AWS ALB defaults to 60 seconds and will silently kill a quiet stream, so either raise idle_timeout.timeout_seconds or emit a heartbeat well inside that window. The interaction between proxy timeouts and your heartbeat cadence is the heart of HTTP Keep-Alive & Connection Lifecycle.
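One way to keep a quiet stream inside the idle window is to emit a comment frame only when nothing else has been written for a while. The sketch below assumes each connection reads ready-made frames from an asyncio.Queue; the name with_heartbeat and the 20-second default are illustrative choices, picked to sit well inside the 60-second ALB default mentioned above.

```python
import asyncio
from typing import AsyncIterator


async def with_heartbeat(queue: "asyncio.Queue[bytes]",
                         interval: float = 20.0) -> AsyncIterator[bytes]:
    """Yield queued frames, or a comment frame when the stream is idle for `interval` seconds."""
    while True:
        try:
            frame = await asyncio.wait_for(queue.get(), timeout=interval)
        except asyncio.TimeoutError:
            # Comment lines are ignored by EventSource but still count as traffic for proxies.
            yield b": heartbeat\n\n"
            continue
        yield frame
```

Because the timer restarts after every real event, a busy stream sends no heartbeats at all, and an idle one sends a single comment frame per interval.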

Pub/sub fan-out

The decoupling pattern that makes SSE scale horizontally is a publish/subscribe bus between producers and SSE workers. Each worker subscribes to the relevant channels; when a domain event is published, every worker holding a matching client serialises and flushes it. Redis pub/sub is the common choice:

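// Go worker: subscribe to Redis, flush to every local SSE client on a channel.
// Assumes a go-redis style client (rdb), a context (ctx), and an application-defined hub.
sub := rdb.Subscribe(ctx, "events:orders")
defer sub.Close() // release the subscription when the worker exits
for msg := range sub.Channel() {
    // Payload must be single-line (e.g. compact JSON); a raw newline would split the frame.
    hub.Broadcast([]byte("data: " + msg.Payload + "\n\n")) // fan out to local conns
}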

This is the foundation of Redis Pub/Sub Fan-Out for SSE, which covers channel design, fan-out cost, and the at-most-once delivery semantics of Redis pub/sub. Because workers hold no shared connection state, you can keep them stateless and lean on Connection Pooling for SSE Servers to manage the socket inventory on each node.

One subtlety: Redis pub/sub is fire-and-forget. If a worker is mid-reconnect to Redis when a message is published, that message is gone; there is no buffering and no replay. For streams where a missed event is merely stale (a price ticker, a presence indicator) this is acceptable. For streams where every event matters (an audit feed, a billing event), pair pub/sub with a durable log: publish to a Redis Stream or Kafka topic for ordering and replay, and use pub/sub only as a low-latency wake-up so workers know to read the log. The worker then resumes each client from its Last-Event-ID against the log rather than trusting the volatile channel.

Either way, keep the number of channels each worker subscribes to bounded. A single worker subscribed to tens of thousands of fine-grained channels spends more CPU on subscription bookkeeping than on flushing, so prefer a few coarse channels filtered in-process over a channel per client.
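The coarse-channel approach can be sketched as a small in-process router. This is an illustration under stated assumptions: the LocalHub class, the per-connection send callables, and the {"topic": ..., "payload": ...} message envelope are all hypothetical, not part of Redis or any library.

```python
import json
from collections import defaultdict
from typing import Callable, DefaultDict, Set

Send = Callable[[bytes], None]  # hypothetical per-connection send function


class LocalHub:
    """Routes messages from one coarse bus channel to the local connections that want them."""

    def __init__(self) -> None:
        self._by_topic: DefaultDict[str, Set[Send]] = defaultdict(set)

    def subscribe(self, topic: str, send: Send) -> None:
        self._by_topic[topic].add(send)

    def unsubscribe(self, topic: str, send: Send) -> None:
        self._by_topic[topic].discard(send)
        if not self._by_topic[topic]:
            del self._by_topic[topic]  # keep the map from growing under churn

    def dispatch(self, raw: str) -> int:
        """Handle one bus message; returns how many local connections received it."""
        message = json.loads(raw)
        # json.dumps emits no raw newlines, so the payload fits on one data: line.
        frame = f"data: {json.dumps(message['payload'])}\n\n".encode("utf-8")
        targets = list(self._by_topic.get(message["topic"], ()))
        for send in targets:
            send(frame)
        return len(targets)
```

The worker holds one Redis subscription and calls dispatch() for each message; the per-client filtering is a dictionary lookup rather than a bus-level subscription.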

Topology decision

The table below maps a deployment shape to when it fits, so you pick the simplest topology that meets the requirement rather than over-building.

| Topology | When it fits | Cost |
| --- | --- | --- |
| Single process | < 5k streams, one box, no HA need | Cheapest; a deploy drops every client |
| N stateless workers + Redis pub/sub | Many streams, at-most-once is fine | One bus hop; events lost during Redis blips |
| Workers + durable log (Streams/Kafka) | Every event must be delivered | Higher latency and ops; full replay on resume |
| Edge/serverless + KV state | Global low latency, bursty traffic | Cold starts disrupt streams; state sync needed |

Edge cases & failure modes

Streams fail in ways that never surface in development. The list below pairs each with its root cause and the mitigation.

  • Events arrive in bursts, not real time. Buffering somewhere in the chain. Mitigation: set X-Accel-Buffering: no, proxy_buffering off, never gzip, and explicitly flush after every write.
  • The stream dies silently after ~30–60 s of inactivity. A proxy or load balancer idle timeout closed it without a TCP FIN reaching your app. Mitigation: emit a : heartbeat\n\n comment every 15–25 seconds, comfortably inside the shortest hop’s timeout.
  • Leaked file descriptors and timers under churn. The disconnect handler never fired or never cleaned up. Mitigation: bind cleanup to the request-close event and verify with a load test that idle FDs return to baseline. See Handling Client Disconnects in Node.js SSE.
  • A slow consumer balloons server memory. The client’s TCP window is full, writes queue in your process, and the producer keeps pushing. Mitigation: honour write backpressure — pause the producer when res.write() returns false, or bound the per-connection queue and drop low-priority events. Covered in Handling Slow Consumers with SSE Backpressure.
  • Reconnect storms after a deploy or outage. Thousands of clients reconnect simultaneously. Mitigation: set a sane retry: and let clients apply jittered backoff; reject excess connections at the edge rather than accepting and OOMing.
  • Duplicate or missing events on resume. The server ignored Last-Event-ID or reused IDs. Mitigation: emit monotonic IDs and keep a short replay window. See Idempotent Event ID Generation.
  • EventSource refuses to connect across origins. Missing CORS headers; EventSource does not send credentials unless withCredentials and matching headers are set. Mitigation: configure CORS per Handling CORS in SSE Implementations.
  • Async generator keeps running after the client left (Python). The framework did not cancel the generator. Mitigation: check await request.is_disconnected() and handle asyncio.CancelledError — detailed in Streaming SSE Responses with FastAPI and sse-starlette.

A correct FastAPI generator that cooperates with disconnection looks like this:

# FastAPI / Starlette: stop the generator when the client disconnects.
# `bus` (an awaitable event source) and `cleanup` are application-defined placeholders.
import asyncio

from fastapi import FastAPI, Request
from sse_starlette.sse import EventSourceResponse

app = FastAPI()

@app.get("/events")
async def events(request: Request):
    async def gen():
        try:
            while True:
                if await request.is_disconnected():  # poll disconnect between events
                    await cleanup(request)           # release the subscription on a clean exit
                    break
                event = await bus.get()              # awaits without blocking the loop
                yield {"id": event.id, "event": "msg", "data": event.json}
        except asyncio.CancelledError:
            await cleanup(request)                   # release the subscription
            raise
    return EventSourceResponse(gen())

The same loop in Go is built on goroutines plus http.Flusher; the full FastAPI patterns live in the Python FastAPI SSE Implementation Guide and the Go equivalents in Go Streaming Patterns for SSE.
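The slow-consumer mitigation from the list above (bound the per-connection queue and drop low-priority events) can be sketched as follows. BoundedOutbox is a hypothetical name, the 256-frame default is arbitrary, and raising when the queue is full of critical frames is one possible policy, chosen here on the assumption that the client can resume via Last-Event-ID after the connection is closed.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class BoundedOutbox:
    """Per-connection send queue that sheds low-priority frames when the consumer is slow."""

    def __init__(self, max_frames: int = 256) -> None:
        self.max_frames = max_frames
        self.dropped = 0
        self._frames: Deque[Tuple[bool, bytes]] = deque()

    def push(self, frame: bytes, critical: bool = False) -> bool:
        """Queue a frame; returns False when the frame itself was dropped."""
        if len(self._frames) >= self.max_frames:
            if not critical:
                self.dropped += 1
                return False  # queue full: shed the new low-priority frame
            if not self._evict_low_priority():
                raise OverflowError("outbox holds only critical frames; close the connection")
            self.dropped += 1
        self._frames.append((critical, frame))
        return True

    def _evict_low_priority(self) -> bool:
        """Remove the oldest low-priority frame, if there is one."""
        for index, (is_critical, _) in enumerate(self._frames):
            if not is_critical:
                del self._frames[index]
                return True
        return False

    def pop(self) -> Optional[bytes]:
        """Next frame to write, or None when the queue is empty."""
        return self._frames.popleft()[1] if self._frames else None
```

The writer loop pops a frame only when the socket is ready for more, so memory per connection is capped at max_frames regardless of how fast the producer pushes. Exporting the dropped counter gives the write-backpressure signal a concrete source.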

Horizontal scaling & production ops

SSE scales by connection count, and each connection costs one file descriptor plus a slice of per-connection memory. The first ceiling you hit is the per-process open-file limit (nofile), whose soft default on most Linux distributions is 1024, orders of magnitude too low. Raise it system-wide and per-process before anything else.

# /etc/security/limits.conf — per-user soft/hard FD ceilings
*  soft  nofile  262144
*  hard  nofile  262144

# Kernel-wide ceiling and ephemeral port range for many outbound subs
sysctl -w fs.file-max=2000000
sysctl -w net.ipv4.ip_local_port_range="1024 65535"

Under systemd, also set LimitNOFILE=262144 in the unit, since limits.conf does not apply to systemd-managed services. The full tuning playbook — nofile, somaxconn, conntrack table size, and ephemeral ports — is in Tuning File-Descriptor Limits for SSE Connection Pools and Configuring Connection Pools for High-Concurrency SSE.

This table sizes the limits against connection targets so capacity planning is concrete, not guesswork:

| Concurrent streams / node | nofile floor | Approx. RAM (8 KB/conn) | Bottleneck to watch |
| --- | --- | --- | --- |
| 1,000 | 4,096 | ~8 MB | Event-loop CPU per flush |
| 10,000 | 32,768 | ~80 MB | GC pressure, heartbeat fan-out |
| 50,000 | 131,072 | ~400 MB | conntrack table, ephemeral ports |
| 100,000+ | 262,144 | ~800 MB | Per-conn memory; shard across nodes |
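The RAM column is simple arithmetic, and the nofile headroom can be checked from inside the process at startup. The sketch below uses Python's standard resource module (Unix only). The function names and the reserved_fds allowance for listening sockets, log files, and bus connections are assumptions for illustration.

```python
import resource  # Unix-only standard-library module
from typing import Optional


def estimated_ram_mb(streams: int, kb_per_conn: int = 8) -> float:
    """Per-connection memory estimate using the table's 8 KB/conn figure (decimal units)."""
    return streams * kb_per_conn / 1000


def fd_usage_ratio(active_streams: int, reserved_fds: int = 64,
                   soft_limit: Optional[int] = None) -> float:
    """Fraction of the soft nofile limit in use; alert as this approaches 0.8."""
    if soft_limit is None:
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return 0.0
    return (active_streams + reserved_fds) / soft_limit
```

Logging fd_usage_ratio() for the expected peak at boot catches a missing LimitNOFILE long before the first accept() fails with "too many open files".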

For deployment, drain rather than kill. On shutdown, stop accepting new connections, send a final event so clients know to reconnect elsewhere, then close. Go makes this explicit with http.Server.Shutdown and context cancellation — see Graceful Shutdown for Go SSE Servers. For observability, instrument four numbers: active connections (gauge), connection lifetime (histogram), events flushed per second (counter), and write-backpressure events (counter). Alert at 80% of nofile. Keep liveness probes off the SSE route — point them at a cheap /health endpoint, because a probe that opens a real stream and never reads it leaks resources and skews your connection gauge.

Protect the node from overload with admission control: cap connections per node and per client, and apply token-bucket throttling on event emission so one chatty tenant cannot starve the rest. The mechanics are in Rate Limiting & Backpressure Handling and Applying Token-Bucket Rate Limiting to Event Streams. Once a single node is saturated, scale out with stateless workers behind the Redis bus described in Scaling SSE Across Multiple Nodes with Redis.
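As a rough illustration of token-bucket throttling on event emission, the sketch below allows short bursts while capping the sustained rate. The class name, the per-tenant usage, and the injectable clock (there to make the behaviour testable) are assumptions, not part of any particular library.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity` events and a sustained `rate` events per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend tokens if the event may be emitted now."""
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond the burst capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

Keeping one bucket per tenant and checking allow() before each write means a chatty tenant's excess events are dropped or deferred without affecting anyone else's stream.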

Migration & fallback paths

Most teams arrive at SSE from polling or after deciding WebSockets are overkill for one-way data. The trade-offs are laid out in SSE vs WebSockets vs HTTP Polling and the decision guide When to Use Server-Sent Events over WebSockets; the short version is that SSE wins when the client only consumes, because you keep plain HTTP, automatic reconnection, and Last-Event-ID resume for free.

From polling, the migration is mechanical: replace the poll loop’s response with a streamed body and reuse the same serialization. Keep the polling endpoint live during rollout so old clients keep working; SSE clients simply stop re-requesting.

From WebSockets, abstract event production behind a transport-agnostic interface so the same producer feeds both a WS handler and an SSE handler during the transition, then retire WS once clients are upgraded.

A practical staging is to run both transports behind a feature flag and shift a percentage of clients to SSE while watching the connection gauge, backpressure counter, and error rate. Because SSE clients reconnect automatically, a partial rollback is low-risk: flip the flag, and clients re-establish on the legacy transport on their next reconnect. Keep the event schema versioned so a producer can serve old and new clients from the same payload during the overlap.

Fallback for environments without EventSource — older runtimes, some proxies that mangle text/event-stream, or when you need POST/headers the native API cannot send — is fetch plus a ReadableStream reader that parses the same wire format manually:

// fetch + ReadableStream fallback: same wire format, full control over headers.
// Simplified parser: it reads data: fields only and ignores event:, id:, and retry:.
const res = await fetch('/events', {
  headers: { Accept: 'text/event-stream', Authorization: `Bearer ${token}` },
});
if (!res.ok || !res.body) throw new Error(`stream failed: HTTP ${res.status}`);
const reader = res.body.getReader();
const decoder = new TextDecoder();
let buf = '';
for (;;) {
  const { value, done } = await reader.read();
  if (done) break;
  // Normalise CRLF so the terminator search below only has to look for \n\n.
  buf = (buf + decoder.decode(value, { stream: true })).replace(/\r\n/g, '\n');
  let i;
  while ((i = buf.indexOf('\n\n')) !== -1) {       // split on event terminator
    const frame = buf.slice(0, i);
    buf = buf.slice(i + 2);
    const data = frame.split('\n')
      .filter(l => l.startsWith('data:'))
      .map(l => l.slice(5).replace(/^ /, ''))      // the format strips one leading space only
      .join('\n');
    if (data) handle(JSON.parse(data));            // your dispatch
  }
}

This pattern is also how authenticated streams ship a bearer token the native EventSource cannot set — covered in Authenticating SSE Streams with Tokens & Cookies. On the consuming side, wiring either transport into a UI is the domain of React EventSource Hooks & State and Vue EventSource Composables.

⚡ Production Directives

  • Disable buffering everywhere: X-Accel-Buffering: no, proxy_buffering off, and never gzip text/event-stream.
  • Emit a comment heartbeat every 15–25 s — strictly inside the shortest proxy/LB idle timeout in the path.
  • Raise nofile to at least the floor in the sizing table, roughly 2.5–4× peak concurrent streams (and set LimitNOFILE in the systemd unit); alert at 80% of the limit.
  • Bind cleanup to the request-close event and load-test that idle FDs and memory return to baseline.
  • Honour write backpressure: pause or bound the producer when the socket buffer is full; drop low-priority events first.

Frequently Asked Questions

Why do my events arrive in a burst instead of one at a time?

Something in the path is buffering. Disable proxy buffering (proxy_buffering off / X-Accel-Buffering: no), never gzip the stream, and explicitly flush your application output after every event write.

How often should the server send a heartbeat?

Every 15–25 seconds, comfortably inside the shortest idle timeout in the chain (ALB defaults to 60 s, many nginx setups to 60 s). Send a comment line : heartbeat\n\n — it resets every idle timer without dispatching a visible event.

Can I serve SSE over HTTP/2?

Yes. Streams are multiplexed over one connection, which sidesteps the six-connections-per-origin limit that browsers apply to HTTP/1.1. Do not send the Connection: keep-alive header under HTTP/2; it is illegal there. Otherwise the wire format is identical.

How do I scale SSE beyond one server?

Keep workers stateless and fan events out through a pub/sub bus such as Redis. Any worker holding a matching client serialises and flushes the event, so the load balancer can route connections freely without sticky sessions.

What happens to events while a client is reconnecting?

The client resends its last id: as the Last-Event-ID header. If you keep a bounded replay window keyed by ID, you can resend the gap; otherwise emit a resync event telling the client to refetch state.
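A bounded replay window of this kind might look like the sketch below. It assumes event IDs are strictly increasing integers. ReplayWindow is a hypothetical name, and returning None is this sketch's way of signalling that the server should send a resync event instead of a replay.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class ReplayWindow:
    """Keeps the most recent frames so a reconnecting client can be sent the gap."""

    def __init__(self, max_events: int = 1000) -> None:
        self._events: Deque[Tuple[int, bytes]] = deque(maxlen=max_events)

    def append(self, event_id: int, frame: bytes) -> None:
        # IDs are assumed to be strictly increasing integers.
        self._events.append((event_id, frame))

    def since(self, last_event_id: Optional[str]) -> Optional[List[bytes]]:
        """Frames newer than last_event_id, or None when the gap cannot be replayed."""
        if last_event_id is None:
            return []  # first connection: nothing to replay
        try:
            last = int(last_event_id)
        except ValueError:
            return None  # unrecognised ID: ask the client to resync
        if not self._events:
            return None  # nothing retained (e.g. after a restart): cannot prove there is no gap
        if last < self._events[0][0]:
            return None  # the client's last event has left the window
        return [frame for event_id, frame in self._events if event_id > last]
```

On reconnect the handler calls since(request_header_value): a list is written to the stream before live events resume, and None triggers the resync event.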

Topics in this Section

  • Buffer Management & Chunked Transfer Encoding. Master HTTP chunked transfer encoding, flush timing, watermark tuning, and proxy buffering bypass for production SSE streams across Node.js, Python, and Go.
  • Connection Pooling for SSE Servers. How to manage thousands of long-lived SSE connections: FD limits, worker tuning, pooled sockets, heartbeats, and graceful drain under load.
  • Go Streaming Patterns for SSE. Idiomatic Go SSE using http.Flusher, goroutines, channels, context cancellation, and graceful shutdown, with production-ready patterns.
  • HTTP Keep-Alive & Connection Lifecycle. How HTTP Keep-Alive, TCP keepalives, heartbeats, and teardown interact for long-lived SSE streams: config, edge cases, and production ops.
  • Idempotent Event ID Generation. Assign collision-free, monotonic IDs to SSE events so clients can resume streams after disconnection without duplicates or missed events.
  • Node.js Streaming Architecture Basics. Build reliable SSE endpoints in Node.js: headers, res.write, flush, heartbeats, backpressure, disconnect cleanup, and production ops.
  • Python FastAPI SSE Implementation Guide. Ship production SSE endpoints with FastAPI: StreamingResponse, async generators, sse-starlette, ASGI tuning, proxy buffering, and scale considerations.
  • Rate Limiting & Backpressure Handling. Token-bucket rate limiting, slow-consumer backpressure, and drop policies for SSE servers: Node.js, Go, and Python patterns with production ops.
  • Redis Pub/Sub Fan-Out for SSE. Decouple SSE producers from connections using Redis pub/sub. Multi-node broadcast, channel design, at-least-once delivery, and backpressure strategies.