HTTP Keep-Alive & Connection Lifecycle

SSE connections must stay open for minutes to hours, sometimes indefinitely. Every layer of the stack has opinions about that: HTTP has its own keep-alive semantics, TCP has its own keepalive probes, reverse proxies and mobile NATs enforce idle timeouts or silently age out sockets, and application code must emit heartbeats to survive all of the above. Getting these four layers (TCP, HTTP, proxy/network, application) aligned is the central challenge of SSE connection lifecycle management.

This guide covers the full lifecycle: how HTTP/1.1 and HTTP/2 keep-alive work at the wire level, how to configure each layer server-side (Node.js and Go), how clients detect and recover from half-open connections, what breaks in real networks (proxy buffering, firewall resets, CDN interference), and how to validate that everything is working under load.

[Figure: four-layer timeline for a long-lived SSE connection. TCP layer: SYN/SYN-ACK/ACK, keepalive probes (75 s) and probe ACKs, FIN/ACK. HTTP layer: GET /events, 200 OK text/event-stream, open stream, connection close. Proxy / CDN: idle timeout window (e.g. 60 s Nginx default) ending in 504 / RST. Application: data events and :ping heartbeats every 15 s that reset the proxy idle counter.]
Four-layer SSE connection lifecycle: TCP keepalive probes, HTTP persistent connection, proxy idle timeout window, and application heartbeats that prevent premature teardown.

How HTTP Keep-Alive Works for SSE

HTTP/1.1 Persistent Connections

HTTP/1.1 made persistent connections the default. A client omits Connection: close (or sends Connection: keep-alive) and the server reuses the TCP socket for subsequent requests. For SSE, the GET /events response never completes — the body stays open, written to incrementally. The connection is therefore persistent by definition.

Wire-level handshake for an SSE stream:

GET /events HTTP/1.1
Host: api.example.com
Accept: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

HTTP/1.1 200 OK
Content-Type: text/event-stream; charset=utf-8
Cache-Control: no-cache, no-store
Connection: keep-alive
Keep-Alive: timeout=120, max=1
Transfer-Encoding: chunked
X-Accel-Buffering: no

: ping

data: {"type":"connected"}

id: 1
data: {"type":"update","value":42}

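The Keep-Alive response header carries two directives (persistent connections themselves are specified in RFC 7230, §6.3; the header's timeout and max parameters come from earlier HTTP keep-alive practice):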

| Directive | Meaning | Typical SSE value |
| --- | --- | --- |
| timeout | Seconds the server will keep the socket open with no data | 120–600 |
| max | Maximum requests on the connection before forced teardown | 1 (SSE never reuses for other requests) |

The timeout value is advisory — intermediate proxies and the OS can terminate sooner. max=1 is appropriate because SSE ties the socket to a single long-running response.
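To make the two directives concrete, here is a minimal sketch of how a client or test harness might read them. The function name parseKeepAlive is hypothetical (it is not part of Node.js or any library), and it assumes the comma-separated name=value form shown in the wire example.

```javascript
// Hypothetical helper: parse a Keep-Alive header value such as
// "timeout=120, max=1" into numeric directives.
function parseKeepAlive(headerValue) {
  const directives = {};
  for (const part of String(headerValue).split(",")) {
    const [rawName, rawValue] = part.split("=");
    const name = rawName.trim().toLowerCase();
    const value = Number.parseInt((rawValue ?? "").trim(), 10);
    // Skip empty segments and anything that is not a non-negative integer.
    if (name && Number.isInteger(value) && value >= 0) {
      directives[name] = value;
    }
  }
  return directives;
}

const parsedKeepAlive = parseKeepAlive("timeout=120, max=1");
console.log(parsedKeepAlive.timeout, parsedKeepAlive.max); // 120 1
```

Because the values are advisory, a result like this is best used for logging or assertions in tests, not as a guarantee of how long the socket will actually survive.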

HTTP/2 Multiplexed Streams

HTTP/2 changes the topology: one TCP connection carries many concurrent streams. Keep-alive at the connection level is managed by PING frames sent by either side. SETTINGS_INITIAL_WINDOW_SIZE controls flow-control per-stream. SSE over HTTP/2 arrives as a single stream within the shared connection.

Client                    Server
  | ── SETTINGS ──────────► |
  | ◄── SETTINGS ────────── |
  | ── HEADERS (GET) ─────► |  stream 1 opens
  | ◄── HEADERS (200) ───── |  stream half-closed (remote); server sends DATA frames
  | ◄── DATA (:ping\n\n) ── |
  | ◄── DATA (event) ────── |
  | ── PING ───────────────► |  connection keepalive
  | ◄── PING (ACK) ───────── |

With HTTP/2 you do not set Connection: keep-alive (it is a hop-by-hop header forbidden in HTTP/2). Instead ensure your server emits periodic PING frames and that SETTINGS_INITIAL_WINDOW_SIZE (default 65535 bytes) is large enough for burst event payloads.
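One way to avoid sending forbidden headers when the same handler serves both protocol versions is to filter them by HTTP version. The sketch below is illustrative only: sseHeadersFor is a hypothetical helper, and it assumes the version string has the form Node.js reports in req.httpVersion ("1.1" or "2.0").

```javascript
// Connection-specific header fields that must not appear in HTTP/2
// messages (RFC 7540 §8.1.2.2).
const CONNECTION_SPECIFIC = new Set([
  "connection",
  "keep-alive",
  "proxy-connection",
  "transfer-encoding",
  "upgrade",
]);

// Hypothetical helper: return a copy of `headers` that is safe to send
// on the given HTTP version.
function sseHeadersFor(httpVersion, headers) {
  if (!String(httpVersion).startsWith("2")) return { ...headers };
  return Object.fromEntries(
    Object.entries(headers).filter(
      ([name]) => !CONNECTION_SPECIFIC.has(name.toLowerCase())
    )
  );
}

const baseHeaders = {
  "Content-Type": "text/event-stream; charset=utf-8",
  "Connection": "keep-alive",
  "Keep-Alive": "timeout=120",
};
console.log(Object.keys(sseHeadersFor("2.0", baseHeaders))); // [ 'Content-Type' ]
```

On HTTP/1.1 the headers pass through unchanged, so the handler can keep a single header set and let the helper decide what reaches the wire.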

TCP Keepalive vs HTTP Keep-Alive

These are distinct mechanisms that must both be configured:

| Mechanism | Layer | Controlled by | Default off? | Purpose |
| --- | --- | --- | --- | --- |
| HTTP Connection: keep-alive | Application | HTTP headers | No (H/1.1 default) | Reuse TCP socket across requests |
| HTTP Keep-Alive: timeout= | Application | HTTP headers | | Advisory idle window |
| TCP SO_KEEPALIVE + probe intervals | Kernel | sysctl / socket options | Yes on most Linux | Detect half-open sockets |

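TCP keepalive sends probe packets on an otherwise idle socket (Linux defaults: first probe after 7200 s, then one every 75 s, giving up after 9 unanswered probes). For SSE that is far too slow: with those defaults a dead socket can linger for more than 2 hours (7200 + 9 × 75 = 7875 s). Tighten OS defaults or set per-socket options.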
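The worst-case detection time follows directly from the three kernel parameters. A small worked calculation, with keepaliveDetectionSeconds as a hypothetical helper name, using the Linux defaults quoted above and the tuned values this guide suggests for sysctl:

```javascript
// Worst-case time for the kernel to declare an idle peer dead:
// idle time before the first probe, plus one interval per unanswered probe.
function keepaliveDetectionSeconds({ time, intvl, probes }) {
  return time + intvl * probes;
}

// Linux defaults: tcp_keepalive_time / _intvl / _probes = 7200 / 75 / 9
console.log(keepaliveDetectionSeconds({ time: 7200, intvl: 75, probes: 9 })); // 7875

// Tuned values (60 / 10 / 3)
console.log(keepaliveDetectionSeconds({ time: 60, intvl: 10, probes: 3 })); // 90
```

The calculation ignores retransmission timing inside the kernel, so treat the result as an estimate of the order of magnitude, not an exact bound.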

Server-Side Configuration: Node.js

Timeout Alignment

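Node.js (v18+) exposes four timeout properties on http.Server: keepAliveTimeout, headersTimeout, requestTimeout, and timeout. For SSE, keep the socket inactivity timer (timeout) disabled or well above your heartbeat interval, and align the other three with the proxy in front of the server: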

import http from "node:http";
import { createApp } from "./app.js";

const server = http.createServer(createApp());

// keepAliveTimeout: how long an idle socket waits for the NEXT request
// after a response has finished. It does not apply while an SSE response
// is still open, but keep it above the proxy's idle timeout so Node never
// closes a reusable socket before the proxy does.
server.keepAliveTimeout = 125_000; // 125 s, slightly above proxy timeout

// headersTimeout must be strictly GREATER than keepAliveTimeout,
// otherwise Node will fire headersTimeout first on slow proxies.
server.headersTimeout = 130_000;

// requestTimeout covers the time to receive the entire request.
// Not critical for SSE (requests are tiny), but set defensively.
server.requestTimeout = 30_000;

// timeout is the socket inactivity timer (default 0, meaning disabled,
// since Node 13). Keep it at 0 so quiet streams are never torn down.
server.timeout = 0;

server.listen(3000, () => console.log("SSE server listening on :3000"));

SSE Endpoint with Heartbeat

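import { randomUUID } from "node:crypto";
// eventBus is assumed to be an application-level EventEmitter imported or
// created elsewhere; it is not defined in this snippet.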

export function sseHandler(req, res) {
  // Mandatory headers for SSE
  res.writeHead(200, {
    "Content-Type": "text/event-stream; charset=utf-8",
    "Cache-Control": "no-cache, no-store",
    "Connection": "keep-alive",
    "Keep-Alive": "timeout=120",
    "X-Accel-Buffering": "no",     // disable Nginx proxy buffering
    "X-Content-Type-Options": "nosniff",
  });

  // Flush headers immediately — do not wait for the first event
  res.flushHeaders();

  const clientId = randomUUID();
  console.log({ event: "sse_connect", clientId });

  // Heartbeat: SSE comment lines (": ping\n\n") reset proxy idle timers.
  // 15 s is safe against the 60 s defaults of Nginx (proxy_read_timeout)
  // and AWS ALB (idle timeout).
  const heartbeat = setInterval(() => {
    res.write(": ping\n\n");
  }, 15_000);

  // Propagate real events from an emitter, pub/sub, or queue
  const onEvent = (payload) => {
    res.write(`id: ${payload.id}\ndata: ${JSON.stringify(payload)}\n\n`);
  };
  eventBus.on("update", onEvent);

  // Teardown: client closes tab, proxy resets, or network error
  req.on("close", () => {
    clearInterval(heartbeat);
    eventBus.off("update", onEvent);
    res.end(); // release the response so the socket can close cleanly
    console.log({ event: "sse_disconnect", clientId });
  });
}
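The handler above builds each frame inline, which works because JSON.stringify never emits raw newlines. For payloads that may contain line breaks, a small serializer keeps the framing correct. This is a sketch under that assumption; formatSseEvent is a hypothetical helper name, not an API from Node.js.

```javascript
// Hypothetical helper: serialize one SSE event. Multi-line data is split
// into one "data:" line per line, as the event stream format requires.
function formatSseEvent({ id, event, data = "" }) {
  const lines = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  if (event !== undefined) lines.push(`event: ${event}`);
  for (const line of String(data).split(/\r\n|\r|\n/)) {
    lines.push(`data: ${line}`);
  }
  // A blank line terminates the event.
  return lines.join("\n") + "\n\n";
}

// Usage inside a handler: res.write(formatSseEvent({ id: 1, data: "a\nb" }));
console.log(JSON.stringify(formatSseEvent({ id: 1, data: "a\nb" })));
// "id: 1\ndata: a\ndata: b\n\n"
```

The sketch does not validate that id or event are free of newlines; callers that accept untrusted values for those fields would need to check them first.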

Per-Socket TCP Keepalive (Node.js)

The "connection" event fires for every new TCP socket before any HTTP parsing. Set OS-level keepalive here:

server.on("connection", (socket) => {
  socket.setKeepAlive(
    true,
    10_000  // first keepalive probe after 10 s of inactivity
  );
  // setKeepAlive enables SO_KEEPALIVE and sets TCP_KEEPIDLE to the delay.
  // Per the Node.js net docs, recent releases also set TCP_KEEPINTVL=1 and
  // TCP_KEEPCNT=10 on the socket; older runtimes fall back to OS defaults
  // (75 s / 9), which you can tighten via sysctl.
});

For full control set kernel parameters:

# Reduce from Linux defaults (7200/75/9) to detect dead sockets faster.
sysctl -w net.ipv4.tcp_keepalive_time=60
sysctl -w net.ipv4.tcp_keepalive_intvl=10
sysctl -w net.ipv4.tcp_keepalive_probes=3
# Persist in /etc/sysctl.d/99-sse.conf

Server-Side Configuration: Go

Go’s net/http package exposes finer-grained timeout controls via http.Server. For SSE endpoints leave the write deadline disabled (WriteTimeout: 0, or clear it per request with http.ResponseController on Go 1.20+), but keep server-level guards for non-streaming routes.

package main

import (
    "fmt"
    "log/slog"
    "net/http"
    "time"
)

func main() {
    mux := http.NewServeMux()
    mux.HandleFunc("/events", sseHandler)
    mux.HandleFunc("/health", healthHandler) // healthHandler: assumed to be defined elsewhere

    srv := &http.Server{
        Addr:    ":8080",
        Handler: mux,
        // IdleTimeout governs keep-alive socket reuse for non-SSE routes.
        // For SSE the connection is never idle in this sense, but set a
        // reasonable ceiling for non-streaming requests on the same server.
        IdleTimeout: 120 * time.Second,
        // ReadHeaderTimeout protects against Slowloris on the request side.
        ReadHeaderTimeout: 5 * time.Second,
        // WriteTimeout MUST be 0 for SSE — a finite value will terminate
        // the response after that duration, closing the stream.
        WriteTimeout: 0,
    }

    slog.Info("starting", "addr", srv.Addr)
    if err := srv.ListenAndServe(); err != nil {
        slog.Error("server exit", "err", err)
    }
}

func sseHandler(w http.ResponseWriter, r *http.Request) {
    flusher, ok := w.(http.Flusher)
    if !ok {
        http.Error(w, "streaming not supported", http.StatusInternalServerError)
        return
    }

    w.Header().Set("Content-Type", "text/event-stream; charset=utf-8")
    w.Header().Set("Cache-Control", "no-cache, no-store")
    w.Header().Set("Connection", "keep-alive")
    w.Header().Set("X-Accel-Buffering", "no")
    w.WriteHeader(http.StatusOK)
    flusher.Flush() // flush headers immediately

    ticker := time.NewTicker(15 * time.Second)
    defer ticker.Stop()

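    notify := r.Context().Done()
    // eventChannel is assumed to be defined elsewhere; fetch the channel
    // once so every loop iteration waits on the same channel.
    events := eventChannel()

    for {
        select {
        case <-notify:
            slog.Info("client disconnected", "remote", r.RemoteAddr)
            return
        case <-ticker.C:
            // SSE comment: invisible to EventSource, resets proxy timer
            fmt.Fprintf(w, ": ping\n\n")
            flusher.Flush()
        case evt := <-events: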
            fmt.Fprintf(w, "id: %s\ndata: %s\n\n", evt.ID, evt.JSON)
            flusher.Flush()
        }
    }
}

See Go Streaming Patterns for SSE for http.Flusher gotchas and graceful shutdown sequencing.

Client-Side: EventSource and Half-Open Detection

The browser EventSource API has built-in reconnection, but its half-open detection is weak. If the TCP connection dies without a FIN packet (common on mobile networks), EventSource.readyState stays 1 (OPEN) indefinitely — it never triggers onerror and never reconnects.

Watchdog Timer Pattern

class ResilientEventSource {
  #es = null;
  #watchdog = null;
  #url;
  #timeout;   // ms to wait for any traffic (events or heartbeats)
  #onMessage;

  constructor(url, { timeout = 30_000, onMessage } = {}) {
    this.#url = url;
    this.#timeout = timeout;
    this.#onMessage = onMessage;
    this.#connect();
  }

  #resetWatchdog() {
    clearTimeout(this.#watchdog);
    this.#watchdog = setTimeout(() => {
      // No traffic for #timeout ms — assume half-open socket
      console.warn("SSE watchdog fired; forcing reconnect");
      this.#es?.close();
      this.#connect();
    }, this.#timeout);
  }

  #connect() {
    this.#es = new EventSource(this.#url, { withCredentials: true });

    this.#es.onopen = () => this.#resetWatchdog();

    // Per spec, SSE comment lines fire no event at all (some polyfills
    // differ), so the watchdog cannot see them. Use a named event for
    // heartbeats instead.
    this.#es.addEventListener("ping", () => this.#resetWatchdog());

    this.#es.onmessage = (e) => {
      this.#resetWatchdog();
      this.#onMessage?.(JSON.parse(e.data));
    };

    this.#es.onerror = () => {
      // EventSource will auto-reconnect after `retry:` ms.
      // Reset watchdog so we don't double-reconnect.
      this.#resetWatchdog();
    };
  }

  close() {
    clearTimeout(this.#watchdog);
    this.#es?.close();
  }
}

The server emits a named ping event every 15 seconds:

// Server-side named ping (Node.js)
setInterval(() => res.write("event: ping\ndata: \n\n"), 15_000);

Named events avoid the ambiguity of whether SSE comment lines (: ping) reach onmessage — they do not per spec, but some polyfills differ.
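The difference is easier to see with a toy parser. The sketch below follows the dispatch rules of the event stream format for complete input only; parseSse is a hypothetical name, and the id and retry fields, BOM handling, and incremental chunking are deliberately left out.

```javascript
// Minimal SSE parser sketch: takes a complete stream as text and returns
// the events a client would dispatch.
function parseSse(text) {
  const events = [];
  let data = "";
  let type = "";
  for (const line of text.split(/\r\n|\r|\n/)) {
    if (line === "") {
      // Blank line: dispatch, unless no data field was seen.
      if (data !== "") {
        events.push({ type: type || "message", data: data.replace(/\n$/, "") });
      }
      data = "";
      type = "";
    } else if (line.startsWith(":")) {
      continue; // comment line: ignored, nothing is dispatched
    } else {
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1);
      if (field === "event") type = value;
      else if (field === "data") data += value + "\n";
    }
  }
  return events;
}

const dispatched = parseSse(": ping\n\nevent: ping\ndata: \n\ndata: hello\n\n");
console.log(dispatched.map((e) => e.type)); // [ 'ping', 'message' ]
```

The comment frame produces no event, while the named ping with an empty data field is dispatched with an empty string as its data, which is what lets a client-side watchdog observe it.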

For React-specific patterns see React EventSource Hooks & State and Error Handling & Reconnection UX.

EventSource retry: Field

The server controls the client’s reconnection interval via the retry: field (milliseconds). Send it once at stream open:

retry: 3000
data: {"type":"connected"}

This tells EventSource to wait 3 seconds before reconnecting after any closure. The HTML spec also allows browsers to wait longer after repeated failures (for example with exponential backoff), but whether and how they do so is browser-defined. You cannot control the backoff curve from the server, only the base interval.
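Where a client manages its own reconnects (for example from a watchdog that closes and recreates the EventSource), it can apply its own backoff. The following is a sketch of capped exponential backoff with jitter; reconnectDelayMs, baseMs, and capMs are hypothetical names and values chosen for illustration, not part of the EventSource API.

```javascript
// Hypothetical helper for a custom reconnect loop. `attempt` starts at 0
// and is incremented by the caller after each failed reconnect.
function reconnectDelayMs(
  attempt,
  { baseMs = 3000, capMs = 60000, random = Math.random } = {}
) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  // "Full jitter": pick a point in [0, ceiling) so clients that lost the
  // same proxy do not all reconnect at the same instant.
  return Math.floor(random() * ceiling);
}

// Example: setTimeout(connect, reconnectDelayMs(failedAttempts));
```

The random source is injectable so the function can be tested deterministically; in production the default Math.random is sufficient.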

For detailed retry and event ID design see Event ID & Retry Mechanism Design.

Edge Cases and Network Interference

Proxy Buffering and CDN Interference

Most proxy/CDN layers default to buffering HTTP responses before forwarding. This silently breaks SSE because events accumulate in the proxy’s buffer and reach the client late, in batches, or not at all.

# Nginx: disable buffering per location
location /events {
    proxy_pass          http://upstream;
    proxy_http_version  1.1;
    proxy_set_header    Connection "";  # enable keepalive to upstream
    proxy_set_header    Host $host;

    # Critical: disable all response buffering
    proxy_buffering     off;
    proxy_cache         off;
    proxy_read_timeout  3600s;   # longer than your longest expected idle
    proxy_send_timeout  3600s;

    # Alternatively: X-Accel-Buffering: no in the app response header
    # disables buffering for that response without touching Nginx config
    # (the timeouts above still have to be set here)
}

HAProxy equivalent:

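backend sse_backend
    timeout server  3600s   # idle limit while waiting for data from the app (SSE is a plain HTTP response)
    timeout tunnel  3600s   # takes over only once a connection becomes a tunnel (e.g. upgraded conns)
    option             http-server-close
    server             app1 127.0.0.1:3000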

AWS ALB has a default idle timeout of 60 seconds, configurable from 1 to 4000 seconds. Keep heartbeat intervals under 55 seconds when behind an ALB with the default setting.

Cloudflare’s default proxy_read_timeout is 100 seconds for non-enterprise plans. Set Cache-Control: no-store and ensure X-Accel-Buffering: no reaches Cloudflare; their edge respects this header on enterprise plans.

Firewall and NAT Teardown

| Environment | Typical idle TCP teardown | Mitigation |
| --- | --- | --- |
| AWS ALB | 60 s default (configurable to 4000 s) | Heartbeat every 55 s |
| Nginx (default) | 60 s (proxy_read_timeout); 75 s (keepalive_timeout) | Heartbeat every 55 s |
| HAProxy | 50 s (timeout tunnel unset) | Set timeout tunnel 3600s |
| Cloudflare (free) | 100 s | Heartbeat every 90 s |
| Corporate firewall | 30–90 s (varies) | Heartbeat every 25 s |
| Mobile NAT (iOS/Android) | 15–30 s (varies by carrier) | Heartbeat every 12 s |
| Kubernetes default service | no timeout (pass-through) | OS-level TCP keepalive |

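The safest general-purpose heartbeat interval is 15 seconds. It clears most infrastructure timeouts (the most aggressive mobile NATs in the table are the exception) at the cost of a few KB/hour of SSE comment traffic per connection: 240 heartbeats of 8 bytes each is about 1.9 KB of payload, plus chunk framing.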
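The interval and its cost can both be derived from numbers like those in the table. The helpers below are a sketch with hypothetical names (pickHeartbeatSeconds, heartbeatBytesPerHour); the safety factor of 0.5 is an assumption, not a recommendation from any vendor. Buffer is the Node.js global.

```javascript
// Pick a heartbeat interval that leaves headroom below the shortest idle
// timeout on the path. safetyFactor = 0.5 means "two heartbeats per window".
function pickHeartbeatSeconds(idleTimeoutsSeconds, safetyFactor = 0.5) {
  return Math.floor(Math.min(...idleTimeoutsSeconds) * safetyFactor);
}

// Payload bytes per hour for a given heartbeat frame, excluding chunk
// framing, TLS records, and TCP/IP headers.
function heartbeatBytesPerHour(intervalSeconds, frame = ": ping\n\n") {
  const framesPerHour = 3600 / intervalSeconds;
  return framesPerHour * Buffer.byteLength(frame, "utf8");
}

// ALB 60 s, Cloudflare 100 s, corporate firewall 30 s
console.log(pickHeartbeatSeconds([60, 100, 30])); // 15
console.log(heartbeatBytesPerHour(15)); // 1920
```

On-the-wire cost is higher than the payload figure because every heartbeat travels in its own packet with protocol headers, so use the second function for relative comparisons between frame formats, not for bandwidth billing.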

Half-Open Sockets and CLOSE_WAIT Accumulation

A CLOSE_WAIT state on the server means the client sent a FIN but the server has not yet called close() on the socket. Growing CLOSE_WAIT counts indicate a server-side listener cleanup leak:

# Count by state (-a includes every state; NR > 1 skips the header row)
ss -tan | awk 'NR > 1 {print $1}' | sort | uniq -c | sort -rn

# Find CLOSE_WAIT sockets on the SSE port
ss -tnp state close-wait '( dport = :3000 or sport = :3000 )'

Fix: always call res.end() (Node.js) when req.on("close") fires, or return from the handler (Go) when r.Context().Done() is signalled. Never leave event listeners attached after the socket is gone; this is also the most common EventSource memory leak pattern on the server side.

TLS Renegotiation Cost

Each new TLS 1.2 handshake costs ~2 RTTs and ~1–5 ms of CPU per connection. At 10,000 reconnections/minute (about 167 per second) that is roughly 170–830 ms of CPU time per second, most of a single core at the upper end. TLS 1.3 reduces the handshake to 1 RTT (and 0-RTT for session resumption), but session tickets still require a key lookup. HTTP Keep-Alive amortises this cost across the lifetime of the stream: one handshake per physical TCP connection.

Performance and Scale Considerations

Each persistent SSE connection holds:

  • One file descriptor (check ulimit -n; Linux default is 1024 for non-root processes — raise to 100 000 for SSE servers)
  • ~4–8 KB kernel socket buffers (send + receive ring)
  • Application-level state: event listener registration, heartbeat timer, client ID

At 50 000 concurrent connections on a single Node.js process you carry approximately:

  • 50 000 file descriptors
  • ~200–400 MB kernel socket buffers (at ~4–8 KB per connection)
  • Node.js heap for 50 000 timer objects (~100 MB)
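The figures in the list above can be reproduced with a back-of-the-envelope estimator. This is a sketch: estimateFootprint is a hypothetical name, the 8 KB default is the upper end of the per-socket range given earlier, and the 2000 bytes per timer is simply back-derived from the ~100 MB figure, not a measured value.

```javascript
// Rough per-process footprint for N concurrent SSE connections.
// Uses decimal units (1 MB = 1000 KB) to match the round numbers above.
function estimateFootprint(
  connections,
  { socketBufferKB = 8, perTimerBytes = 2000 } = {}
) {
  return {
    fileDescriptors: connections,
    socketBufferMB: (connections * socketBufferKB) / 1000,
    timerHeapMB: (connections * perTimerBytes) / 1e6,
  };
}

console.log(estimateFootprint(50_000));
// { fileDescriptors: 50000, socketBufferMB: 400, timerHeapMB: 100 }
```

Real numbers depend on kernel buffer autotuning and on what each connection's closure retains, so measure on your own workload before sizing hosts from this estimate.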

Use a shared heartbeat timer rather than per-connection setInterval to reduce timer overhead:

// Single shared interval pings ALL connected clients
const clients = new Set();

setInterval(() => {
  for (const res of clients) {
    res.write(": ping\n\n");
  }
}, 15_000);

// In handler:
clients.add(res);
req.on("close", () => clients.delete(res));

This cuts timer count from N timers to 1, reducing event-loop load significantly. See Rate Limiting & Backpressure Handling for managing slow consumers that cause heartbeat writes to stall.
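A shared interval still has to cope with responses that are gone or backed up. The class below is one possible sketch: HeartbeatHub is a hypothetical name, and treating "buffered bytes at or above the high-water mark" as stalled is an assumption about what counts as a slow consumer.

```javascript
// Shared heartbeat registry that drops finished responses and skips
// clients whose write buffer is already full.
class HeartbeatHub {
  #clients = new Set();

  add(res) { this.#clients.add(res); }
  remove(res) { this.#clients.delete(res); }
  get size() { return this.#clients.size; }

  // Writes one heartbeat frame to every healthy client.
  // Returns the number of clients skipped because they are stalled.
  tick(frame = ": ping\n\n") {
    let stalled = 0;
    for (const res of this.#clients) {
      if (res.writableEnded || res.destroyed) {
        this.#clients.delete(res); // already ended or destroyed
      } else if (res.writableLength >= res.writableHighWaterMark) {
        stalled += 1; // buffer full: skip this round instead of queueing more
      } else {
        res.write(frame);
      }
    }
    return stalled;
  }
}

// Usage sketch: const hub = new HeartbeatHub();
// setInterval(() => hub.tick(), 15_000); hub.add(res) in the handler.
```

Returning the stalled count makes it easy to export as a metric; a client that is skipped for several consecutive ticks is a candidate for disconnection.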

For connection pool sizing and file-descriptor tuning across multiple server nodes, see Connection Pooling for SSE Servers.

When distributing events across nodes, heartbeat traffic itself becomes a fan-out concern — see Redis Pub/Sub Fan-Out for SSE for the broadcast architecture.

Validation and Debugging

curl Smoke Test

# Verify keep-alive header and event stream format
curl -v --no-buffer \
  -H "Accept: text/event-stream" \
  -H "Cache-Control: no-cache" \
  https://api.example.com/events 2>&1 | grep -E "< |^(:|data:|id:|event:|retry:)"

# Check TCP socket state during streaming (separate terminal)
ss -tnp state established '( dport = :443 )'

Expected response headers:

< HTTP/1.1 200 OK
< Content-Type: text/event-stream; charset=utf-8
< Cache-Control: no-cache, no-store
< Connection: keep-alive
< Keep-Alive: timeout=120
< X-Accel-Buffering: no
< Transfer-Encoding: chunked

Chrome DevTools

  1. Open Network → filter by EventStream type.
  2. Select the /events request → EventStream tab shows parsed events in real time.
  3. Timing tab: “Waiting (TTFB)” should be < 500 ms; “Content Download” should grow continuously.
  4. Check Headers → Response Headers for X-Accel-Buffering: no and Cache-Control: no-cache.

Structured Log Fields to Capture

{
  "event": "sse_connect",
  "client_id": "uuid",
  "remote_addr": "1.2.3.4",
  "user_agent": "...",
  "ts": "2026-06-21T12:00:00Z"
}
{
  "event": "sse_disconnect",
  "client_id": "uuid",
  "duration_ms": 84200,
  "events_sent": 312,
  "heartbeats_sent": 5
}
{
  "event": "sse_heartbeat_miss",
  "client_id": "uuid",
  "last_write_ms_ago": 16200
}

Alert when p99(duration_ms) drops sharply — it signals proxy teardown rather than graceful client close. Track heartbeats_sent / duration_s ratio; a ratio significantly below 1/15 means heartbeat writes are being dropped or buffered.
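The ratio check can be applied directly to an sse_disconnect record. A minimal sketch, assuming the field names from the log examples above; heartbeatHealth and the 0.5 tolerance are hypothetical choices for illustration.

```javascript
// Compare heartbeats actually sent with the number expected for the
// connection's lifetime at the configured interval.
function heartbeatHealth(
  { duration_ms, heartbeats_sent },
  intervalSeconds = 15,
  tolerance = 0.5
) {
  const expected = Math.floor(duration_ms / 1000 / intervalSeconds);
  if (expected === 0) return { expected, ok: true }; // too short to judge
  return { expected, ok: heartbeats_sent >= expected * tolerance };
}

// The sample disconnect record: 84.2 s at a 15 s interval expects 5 heartbeats.
console.log(heartbeatHealth({ duration_ms: 84200, heartbeats_sent: 5 }));
// { expected: 5, ok: true }
```

A tolerance below 1 avoids flagging connections that closed just before a heartbeat was due; tighten it if your alerting proves too quiet.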

Load Testing with k6

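// k6 script: verify keep-alive persists under concurrent load.
// Note: k6's built-in http.get() waits for the complete response, so this
// assumes the endpoint (or a test-only setting) ends the stream before the
// 90 s timeout; otherwise the request times out and the checks fail.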
import http from "k6/http";
import { check, sleep } from "k6";

export const options = {
  vus: 200,
  duration: "60s",
};

export default function () {
  const res = http.get("https://api.example.com/events", {
    headers: { Accept: "text/event-stream" },
    timeout: "90s",
  });
  check(res, {
    "status 200": (r) => r.status === 200,
    // Guard against a missing header (for example after a timeout)
    "content-type is event-stream": (r) =>
      (r.headers["Content-Type"] || "").includes("text/event-stream"),
  });
  sleep(60);
}

Monitor active_connections in Prometheus during the run; it should hold at 200, not oscillate.

⚡ Production Directives

  • Set server.keepAliveTimeout and server.headersTimeout (Node.js) or WriteTimeout: 0 (Go) — wrong values silently close streams before clients notice.
  • Emit a named event: ping every 15 seconds from every SSE handler; use a shared interval, not per-connection timers, to keep event-loop overhead constant regardless of connection count.
  • Add X-Accel-Buffering: no to every SSE response and configure proxy_read_timeout 3600s in Nginx — forgetting either causes events to batch-deliver seconds late or never.
  • Raise ulimit -n to at least 100 000 before deploying; each SSE connection consumes one file descriptor, and the default 1024 will hard-cap concurrency on most Linux distros.
  • Monitor CLOSE_WAIT count on the SSE port; sustained growth means cleanup listeners are leaking and the process will eventually exhaust memory.

Frequently Asked Questions

Why does my SSE stream disconnect exactly every 60 seconds?

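This is most likely a 60-second idle timeout in front of your server: the Nginx default proxy_read_timeout (60 s) or the AWS ALB default idle timeout (60 s). The Nginx keepalive_timeout default (75 s) governs idle client connections between requests, not an open stream. The proxy drops the connection when it sees no data for that long. Fix by either: (a) raising the timeout, for example proxy_read_timeout 3600s in the Nginx location block, or (b) emitting a heartbeat comment/event at intervals shorter than 60 seconds so the proxy sees traffic and resets its idle timer.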

Does Connection: keep-alive on the response header do anything for SSE?

For HTTP/1.1 it is advisory — it signals intent but the actual socket lifetime is controlled by the server's keepAliveTimeout and the OS TCP stack. The header is required for HTTP/1.0 clients but redundant in HTTP/1.1 (where persistence is the default). Always include it for clarity and compatibility with older clients and proxies that check for it explicitly. Do NOT set it in HTTP/2 responses — it is a hop-by-hop header forbidden by the HTTP/2 spec (RFC 7540 §8.1.2.2).

What is the difference between an SSE comment (`: ping`) and a named ping event?

SSE comment lines (lines beginning with :) are defined in the WHATWG HTML spec §9.2 as a way to keep connections alive — they are stripped by the parser and never reach onmessage or any addEventListener handler. A named event (event: ping\ndata: \n\n) is delivered to listeners registered with es.addEventListener("ping", handler). Use comment lines if you do not need the client to react; use named events if you want the client to reset a watchdog timer or update a last-seen timestamp. Comments generate slightly less overhead because the parser discards them without dispatching a DOM event.

Should I use HTTP/2 for SSE at scale?

HTTP/2 multiplexing means many SSE streams share one TCP connection per client, reducing TLS handshake overhead and OS-level socket counts. However, head-of-line blocking at the TCP level can delay events on one stream when another stream's packets are lost. HTTP/3 (QUIC) solves this. For most deployments HTTP/1.1 with keep-alive is simpler to reason about and debug. Use HTTP/2 if your infrastructure supports it and you have many concurrent streams per client (e.g. multiple tabs); the per-stream overhead is lower but the failure modes are harder to observe in DevTools.

How do I handle SSE keep-alive on mobile where the tab goes to the background?

iOS and Android aggressively suspend background tabs, which closes the underlying socket. The EventSource will eventually detect the error and attempt reconnection when the tab resumes, but the timing depends on the OS scheduler. Implement a Page Visibility API listener: when the tab becomes visible again, check whether EventSource.readyState is CLOSED (2) and recreate it. Also reset the watchdog timer on visibilitychange so you do not immediately declare the connection dead when returning from background. See Mobile & Background-Tab Handling for the full pattern.
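A minimal sketch of that listener is shown here. The function name attachVisibilityRecovery and its callbacks (getSource, reconnect, resetWatchdog) are hypothetical; in a page, doc would be the global document, and the callbacks would be wired to whatever wrapper owns the EventSource.

```javascript
// Hypothetical wiring: `doc` is the page's `document`, `getSource` returns
// the current EventSource (or null), `reconnect` recreates it, and
// `resetWatchdog` restarts the idle timer.
function attachVisibilityRecovery(doc, { getSource, reconnect, resetWatchdog }) {
  const CLOSED = 2; // same value as EventSource.CLOSED
  const onChange = () => {
    if (doc.visibilityState !== "visible") return;
    resetWatchdog(); // give the stream a fresh window after resume
    const source = getSource();
    if (!source || source.readyState === CLOSED) reconnect();
  };
  doc.addEventListener("visibilitychange", onChange);
  // Return a cleanup function so callers can detach the listener.
  return () => doc.removeEventListener("visibilitychange", onChange);
}
```

Passing the document in as a parameter keeps the function testable outside a browser and makes the dependency on the Page Visibility API explicit.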
