Error Handling & Reconnection UX

EventSource reconnects automatically, but that single fact conceals a pile of real-world failure modes: exponential backoff isn’t built in, the browser’s default retry interval is fixed (about 3 seconds in Chromium) regardless of server load, stale data accumulates silently, and users see a frozen UI with no indication the stream died. This guide covers the full reconnection lifecycle, from spec-defined retry fields through custom backoff logic, state machine design, and UI indicators, so engineers can ship SSE endpoints that fail gracefully under adverse network conditions.


How the Reconnection Mechanism Works

The WHATWG HTML spec (§9.2, “The EventSource interface”) defines a reconnection algorithm the browser runs automatically when a connection closes unexpectedly:

  1. The browser fires an error event on the EventSource object.
  2. If readyState transitions to CLOSED (2), no reconnect happens. If it transitions to CONNECTING (0), the browser waits retry milliseconds and then re-issues the HTTP request.
  3. If the server previously sent a retry: field, the browser uses that value (in milliseconds) for this wait. The default is implementation-defined—Chromium uses 3 000 ms, Firefox 5 000 ms.
  4. On reconnect, the browser includes Last-Event-ID: <id> in the request header if the stream set any id: field during the session.

The server’s retry: field is the only spec-standard lever for controlling reconnect cadence. Everything else—exponential backoff, jitter, circuit breaking—must be layered on top in JavaScript by closing and re-opening EventSource yourself.
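To make the wire format behind these steps concrete, here is a minimal sketch of how a single event block could be interpreted. `parseEventBlock` is a hypothetical helper written for illustration, not a browser API. It follows the field rules described in the spec (comment lines start with a colon, one optional space after the colon is stripped, `retry` is honored only when it consists solely of ASCII digits) and ignores details such as BOM handling and CR line endings.

```typescript
interface ParsedEvent {
  id?: string;
  event?: string;
  data: string;
  retry?: number; // reconnection time in ms, if the block carried a valid retry: field
}

// Hypothetical helper: parses one blank-line-delimited SSE block.
function parseEventBlock(block: string): ParsedEvent {
  const result: ParsedEvent = { data: "" };
  const dataLines: string[] = [];

  for (const line of block.split("\n")) {
    if (line === "" || line.startsWith(":")) continue; // blank line or comment

    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // strip one leading space

    switch (field) {
      case "data":
        dataLines.push(value);
        break;
      case "event":
        result.event = value;
        break;
      case "id":
        if (!value.includes("\0")) result.id = value; // ids containing NUL are ignored
        break;
      case "retry":
        if (/^[0-9]+$/.test(value)) result.retry = parseInt(value, 10);
        break;
      // any other field name is ignored
    }
  }

  result.data = dataLines.join("\n");
  return result;
}

const parsedBlock = parseEventBlock(": heartbeat\nretry: 2000\nid: 42\ndata: hello\ndata: world");
console.log(parsedBlock.retry, parsedBlock.id, JSON.stringify(parsedBlock.data));
```

The comment line is skipped, the two data lines are joined with a newline, and a malformed value such as `retry: 2s` would leave the reconnection time unchanged.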

[Figure: SSE reconnection state machine, browser-native retry vs. custom close-and-backoff flow. States: CONNECTING (readyState = 0), CONNECTED (readyState = 1), BACKOFF (custom JS timer), CLOSED (readyState = 2). Transitions: new EventSource() enters CONNECTING, initially or after backoff; open (HTTP 200) leads to CONNECTED; error leads either to the browser retry (3 s) or, via es.close(), to BACKOFF; when setTimeout(backoff) fires, new EventSource() is created; 401 or max retries leads to CLOSED.]

Wire-level events that trigger errors

| Scenario | error fired? | readyState after | Browser auto-reconnects? |
|---|---|---|---|
| Server closes TCP connection cleanly | Yes | CONNECTING (0) | Yes, after retry ms |
| Server sends HTTP 200 then drops | Yes | CONNECTING (0) | Yes |
| Server returns HTTP 4xx on connect | Yes | CLOSED (2) | No |
| Server returns HTTP 5xx on connect | Yes | CLOSED (2) | No |
| Network interface goes down | Yes | CONNECTING (0) | Yes (keeps retrying) |
| CORS preflight fails | Yes | CLOSED (2) | No |

4xx and 5xx responses immediately close the EventSource permanently—the browser will not retry. Your code must detect this case and reopen manually.
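The table boils down to a check on readyState inside the error listener. The following is a small sketch of that check; `classifyError` and the `ErrorKind` labels are hypothetical names chosen for this example, and the numeric constants mirror the readyState values quoted above.

```typescript
// readyState values defined by the EventSource interface.
const CONNECTING = 0;
const OPEN = 1;
const CLOSED = 2;

type ErrorKind = "browser-retrying" | "permanent" | "unexpected";

// Hypothetical helper: pass es.readyState from inside an "error" listener.
function classifyError(readyState: number): ErrorKind {
  switch (readyState) {
    case CONNECTING:
      return "browser-retrying"; // native retry is pending
    case CLOSED:
      return "permanent"; // e.g. non-200 status or wrong Content-Type
    case OPEN:
    default:
      return "unexpected"; // an error event while OPEN is not expected
  }
}

console.log(classifyError(CLOSED)); // prints "permanent"
```

In a browser this would typically be called as `classifyError(es.readyState)` from the error handler, with the "permanent" branch triggering a manual reopen.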


Client-Side Implementation: Custom Backoff Manager

Because the native EventSource exposes no retry count and no backoff control, wrap it in a class that manages its own state machine:

// reconnecting-sse.ts
type SSEState = "connecting" | "connected" | "backoff" | "closed";

interface SSEOptions {
  url: string;
  withCredentials?: boolean;
  maxRetries?: number;          // default: 10; set Infinity for endless retry
  baseDelay?: number;           // ms, default: 1000
  maxDelay?: number;            // ms cap, default: 30000
  jitter?: boolean;             // add ±25% jitter, default: true
  onMessage?: (ev: MessageEvent) => void;
  onCustomEvent?: Record<string, (ev: MessageEvent) => void>;
  onStateChange?: (state: SSEState) => void;
  onError?: (attempt: number, delay: number) => void;
  onOpen?: () => void;
}

export class ReconnectingSSE {
  private es: EventSource | null = null;
  private attempt = 0;
  private timer: ReturnType<typeof setTimeout> | null = null;
  private state: SSEState = "connecting";
  private aborted = false;

  constructor(private opts: SSEOptions) {
    this.connect();
  }

  private setState(s: SSEState) {
    this.state = s;
    this.opts.onStateChange?.(s);
  }

  private connect() {
    if (this.aborted) return;
    this.setState("connecting");

    const es = new EventSource(this.opts.url, {
      withCredentials: this.opts.withCredentials ?? false,
    });
    this.es = es;

    es.addEventListener("open", () => {
      this.attempt = 0;       // reset backoff counter on successful open
      this.setState("connected");
      this.opts.onOpen?.();
    });

    es.addEventListener("message", (ev) => {
      this.opts.onMessage?.(ev);
    });

    // Register named event listeners
    for (const [type, handler] of Object.entries(this.opts.onCustomEvent ?? {})) {
      es.addEventListener(type, handler);
    }

    es.addEventListener("error", () => {
      if (es.readyState === EventSource.CLOSED) {
        // Browser will NOT retry (4xx / CORS / network permanent failure)
        es.close();
        this.scheduleReconnect();
      }
      // If readyState === CONNECTING, browser is already retrying natively.
      // We hijack by closing it and doing our own backoff instead:
      if (es.readyState === EventSource.CONNECTING) {
        es.close();
        this.scheduleReconnect();
      }
    });
  }

  private scheduleReconnect() {
    const maxRetries = this.opts.maxRetries ?? 10;
    if (this.attempt >= maxRetries) {
      this.setState("closed");
      return;
    }

    const base = this.opts.baseDelay ?? 1000;
    const cap = this.opts.maxDelay ?? 30_000;
    // Capped exponential delay, min(cap, base * 2^attempt), then optional ±25% jitter
    let delay = Math.min(cap, base * 2 ** this.attempt);
    if (this.opts.jitter !== false) {
      delay = delay * (0.75 + Math.random() * 0.5); // ±25%
    }
    this.attempt++;
    this.setState("backoff");
    this.opts.onError?.(this.attempt, delay);

    this.timer = setTimeout(() => {
      if (!this.aborted) this.connect();
    }, delay);
  }

  /** Call when the component unmounts or the user navigates away. */
  close() {
    this.aborted = true;
    if (this.timer !== null) clearTimeout(this.timer);
    this.es?.close();
    this.setState("closed");
  }

  get currentState(): SSEState {
    return this.state;
  }
}

Key design decisions:

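  • Jitter (±25% around the computed delay) spreads reconnects over time. Without it, thousands of clients reconnecting after a server restart hit in synchronized waves. Full jitter (a uniform random value in [0, delay]) spreads them further, at the cost of occasional very short waits.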
  • attempt resets on open so a brief disconnect doesn’t consume the retry budget permanently.
  • aborted flag prevents a race where close() is called just before a setTimeout fires and spawns a new EventSource.
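The delay calculation can be pulled out into a pure function, which makes it easy to test and to compare jitter strategies. The sketch below is illustrative only: `backoffDelay` and `JitterMode` are hypothetical names, the "proportional" mode reproduces the ±25% calculation from scheduleReconnect, and the "full" mode shows the uniform [0, ceiling) alternative. The random source is injectable so results can be made deterministic.

```typescript
type JitterMode = "none" | "proportional" | "full";

// Hypothetical helper; `rand` defaults to Math.random but can be replaced in tests.
function backoffDelay(
  attempt: number,
  baseMs = 1000,
  capMs = 30_000,
  mode: JitterMode = "proportional",
  rand: () => number = Math.random,
): number {
  // Exponential growth, capped so the wait never exceeds capMs before jitter.
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  switch (mode) {
    case "none":
      return ceiling;
    case "proportional":
      // ±25% around the ceiling, matching ReconnectingSSE
      return ceiling * (0.75 + rand() * 0.5);
    case "full":
      // uniform in [0, ceiling)
      return rand() * ceiling;
  }
}

// Un-jittered schedule for the first seven attempts.
const schedule = Array.from({ length: 7 }, (_, i) => backoffDelay(i, 1000, 30_000, "none"));
console.log(schedule); // prints [1000, 2000, 4000, 8000, 16000, 30000, 30000]
```

Note that proportional jitter can exceed the cap by up to 25%; clamp the result again if the cap must be strict.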

Server-Side Implementation: Retry Field & Status Codes

The server controls reconnect timing via the retry: field and must respond correctly to different failure modes. Below is a Go example (see also the Python FastAPI SSE guide and Node.js streaming basics for those runtimes):

// handler.go — Go net/http SSE handler
package main

import (
    "fmt"
    "net/http"
    "time"
)

func sseHandler(w http.ResponseWriter, r *http.Request) {
    // Must flush headers immediately; proxies buffer otherwise.
    flusher, ok := w.(http.Flusher)
    if !ok {
        http.Error(w, "streaming unsupported", http.StatusInternalServerError)
        return
    }

    w.Header().Set("Content-Type", "text/event-stream")
    w.Header().Set("Cache-Control", "no-cache")
    w.Header().Set("X-Accel-Buffering", "no")   // disable nginx proxy buffering
    w.Header().Set("Connection", "keep-alive")

    // Tell clients: retry after 2 000 ms (the default is browser-defined, e.g. 3 000 in Chromium).
    // Send once at stream start; browser remembers it for the session.
    fmt.Fprintf(w, "retry: 2000\n\n")
    flusher.Flush()

    ticker := time.NewTicker(5 * time.Second)
    defer ticker.Stop()

    ctx := r.Context()
    for {
        select {
        case <-ctx.Done():
            // Client disconnected: stop streaming. Writes after this point would only return errors, not panic.
            return
        case t := <-ticker.C:
            id := t.UnixMilli()
            fmt.Fprintf(w, "id: %d\ndata: {\"ts\":%d}\n\n", id, id)
            flusher.Flush()
        }
    }
}

HTTP status codes the server must handle explicitly

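// maintenanceMode is assumed to be a package-level flag toggled elsewhere
// (for example by a deploy hook). Guard it with sync/atomic if it can change at runtime.
var maintenanceMode bool

// authMiddleware wraps sseHandler to handle auth failures correctly.
func authMiddleware(next http.HandlerFunc) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        // Only presence is checked here; real code must also validate the token.
        token := r.URL.Query().Get("token")
        if token == "" {
            // 401 → EventSource closes permanently, browser won't retry.
            // Client must detect and re-authenticate before reconnecting.
            w.Header().Set("Content-Type", "application/json")
            w.WriteHeader(http.StatusUnauthorized)
            fmt.Fprintln(w, `{"error":"unauthorized"}`)
            return
        }
        // 503 during maintenance. EventSource does not expose response headers,
        // so Retry-After only helps clients and proxies that read it themselves.
        if maintenanceMode {
            w.Header().Set("Retry-After", "60")
            http.Error(w, "maintenance", http.StatusServiceUnavailable)
            return
        }
        next(w, r)
    }
}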

The Last-Event-ID header on reconnect lets you resume from where the client left off. See Idempotent Event ID Generation for strategies to generate replay-safe IDs.
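As a rough illustration of resuming from Last-Event-ID, the sketch below keeps recent events in a bounded in-memory buffer and returns those newer than the ID the client reports. Everything here is an assumption made for the example: `ReplayBuffer` and `StoredEvent` are hypothetical names, IDs are taken to be increasing numbers (like the millisecond timestamps in the handler above), and a missing or unparsable header is treated as a fresh client that gets no replay.

```typescript
interface StoredEvent {
  id: number; // assumed to increase monotonically
  data: string;
}

// Hypothetical in-memory replay buffer; a real deployment might use Redis or a log instead.
class ReplayBuffer {
  private events: StoredEvent[] = [];
  private readonly capacity: number;

  constructor(capacity = 1000) {
    this.capacity = capacity;
  }

  push(ev: StoredEvent): void {
    this.events.push(ev);
    if (this.events.length > this.capacity) this.events.shift(); // drop the oldest
  }

  // Returns events newer than the value of the Last-Event-ID request header.
  since(lastEventId: string | null | undefined): StoredEvent[] {
    if (!lastEventId) return []; // fresh client: nothing to replay
    const last = Number(lastEventId);
    if (!Number.isFinite(last)) return []; // unparsable header: treat as fresh
    return this.events.filter((ev) => ev.id > last);
  }
}

const buffer = new ReplayBuffer(3);
[1, 2, 3, 4].forEach((id) => buffer.push({ id, data: `event ${id}` }));
console.log(buffer.since("2").map((ev) => ev.id)); // prints [3, 4]
```

If the requested ID is older than anything left in the buffer, the client has missed events that can no longer be replayed; a production server would need to signal that case, for example by sending a full snapshot.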


UI State: Surfacing Connection Health

Four connection states to expose

| State | Description | Suggested UI |
|---|---|---|
| connecting | Initial connection or reconnection in progress | Spinner or pulsing indicator |
| connected | Stream active, events flowing | Green dot, no intrusion |
| backoff | Waiting before next reconnect attempt | “Reconnecting in Xs…” countdown |
| closed | Max retries exceeded or permanent error | Error banner with “Try again” button |

React hook using ReconnectingSSE

// useSSE.ts — wraps ReconnectingSSE with React state
import { useCallback, useEffect, useRef, useState } from "react";
import { ReconnectingSSE, type SSEState } from "./reconnecting-sse";

interface UseSSEOptions {
  url: string;
  withCredentials?: boolean;
  onMessage: (ev: MessageEvent) => void;
}

export function useSSE({ url, withCredentials, onMessage }: UseSSEOptions) {
  const [connectionState, setConnectionState] = useState<SSEState>("connecting");
  const [attempt, setAttempt] = useState(0);
  const [nextRetryIn, setNextRetryIn] = useState<number | null>(null);
  const sseRef = useRef<ReconnectingSSE | null>(null);
  const countdownRef = useRef<ReturnType<typeof setInterval> | null>(null);
  // Always call the latest onMessage without re-creating the stream on every render.
  const onMessageRef = useRef(onMessage);
  onMessageRef.current = onMessage;

  const stopCountdown = useCallback(() => {
    if (countdownRef.current) clearInterval(countdownRef.current);
    countdownRef.current = null;
    setNextRetryIn(null);
  }, []);

  // Opens a fresh stream. Shared by the effect and the manual "Reconnect" button
  // so both paths get the same onError and countdown wiring.
  const open = useCallback(() => {
    sseRef.current?.close();
    stopCountdown();
    setAttempt(0);
    sseRef.current = new ReconnectingSSE({
      url,
      withCredentials,
      onMessage: (ev) => onMessageRef.current(ev),
      onStateChange: setConnectionState,
      onError: (att, delay) => {
        setAttempt(att);
        // Drive a countdown timer for UX ("reconnecting in 4s")
        stopCountdown();
        let remaining = Math.ceil(delay / 1000);
        setNextRetryIn(remaining);
        countdownRef.current = setInterval(() => {
          remaining -= 1;
          if (remaining <= 0) stopCountdown();
          else setNextRetryIn(remaining);
        }, 1000);
      },
    });
  }, [url, withCredentials, stopCountdown]);

  useEffect(() => {
    open();
    return () => {
      sseRef.current?.close();
      stopCountdown();
    };
  }, [open]); // Re-create when url or withCredentials changes (e.g., token rotation)

  return { connectionState, attempt, nextRetryIn, reconnect: open };
}

// ConnectionBanner.tsx
import { useState } from "react";
import { useSSE } from "./useSSE";

export function LiveFeed() {
  const [messages, setMessages] = useState<string[]>([]);
  const { connectionState, nextRetryIn, reconnect } = useSSE({
    url: "/api/events",
    onMessage: (ev) => setMessages((m) => [...m.slice(-99), ev.data]),
  });

  return (
    <div>
      {connectionState === "backoff" && (
        <div className="banner banner--warn">
          Stream interrupted. Reconnecting{nextRetryIn ? ` in ${nextRetryIn}s` : "…"}
        </div>
      )}
      {connectionState === "closed" && (
        <div className="banner banner--error">
          Connection lost.{" "}
          <button onClick={reconnect}>Reconnect</button>
        </div>
      )}
      {connectionState === "connected" && (
        <span className="status-dot status-dot--live" aria-label="Live" />
      )}
      <ul>{messages.map((m, i) => <li key={i}>{m}</li>)}</ul>
    </div>
  );
}

For Vue, the same pattern applies as a composable; see Vue EventSource Composables for the equivalent implementation. For Redux, see State-Management Integration for SSE on dispatching connection state actions.

Stale-data indicators

When EventSource is in backoff or closed state, data already rendered may be minutes old. Detect staleness with a last-received timestamp:

// Inside the component that calls useSSE (hooks cannot live inside the handler itself)
const lastReceived = useRef<number>(Date.now());
const [stale, setStale] = useState(false);

function onMessage(ev: MessageEvent) {
  lastReceived.current = Date.now();
  setStale(false); // fresh data arrived, clear the indicator
  // ...update state
}

// Periodically check for stale data (e.g., heartbeat missed)
useEffect(() => {
  const interval = setInterval(() => {
    const age = Date.now() - lastReceived.current;
    if (age > 30_000 && connectionState === "connected") {
      // readyState says connected but no events in 30s: likely a zombie
      setStale(true);
    }
  }, 5_000);
  return () => clearInterval(interval);
}, [connectionState]);

This catches “zombie connections”—TCP sessions that remain open at the OS level after a proxy silently dropped them, leaving EventSource in OPEN state forever. See Mobile & Background-Tab Handling for the Page Visibility API approach to detecting these on mobile.


Edge Cases & Network Interference

Proxy and CDN buffering

Many reverse proxies and CDNs (nginx, Cloudflare, Fastly) buffer responses by default. A buffering proxy accumulates SSE events into chunks, delivering them in bursts, or not at all if the buffer never fills. Mitigation:

# nginx: disable buffering for the SSE endpoint only
location /api/events {
    proxy_pass         http://upstream;
    proxy_buffering    off;
    proxy_cache        off;
    proxy_read_timeout 3600s;   # keep idle connections alive 1 h
    proxy_http_version 1.1;
    proxy_set_header   Connection "";  # enable HTTP/1.1 keepalive to upstream
    # Tell any nginx sitting in front of this server not to buffer the response
    add_header         X-Accel-Buffering no;
}

| Proxy / CDN | Default buffering | How to disable |
|---|---|---|
| nginx proxy_pass | On (8 KB) | proxy_buffering off |
| HAProxy | Off by default | n/a |
| Cloudflare | Buffers unless Enterprise | “Disable Buffering” per route rule |
| AWS ALB | Off for HTTP/1.1 streaming | n/a |
| Fastly | On by default | beresp.do_stream = true in VCL |

Firewall connection timeouts

Many corporate firewalls and NAT gateways drop idle TCP connections after roughly 60–120 seconds. Send a heartbeat comment every 25 seconds to keep them alive:

# FastAPI: heartbeat every 25 s
import asyncio

async def event_generator():
    while True:
        yield ": heartbeat\n\n"   # comment line, never dispatched to JavaScript
        await asyncio.sleep(25)

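A comment line (starts with :) is not dispatched to onmessage; it’s purely a keep-alive probe that keeps bytes flowing through intermediaries. Because JavaScript never sees comments, they cannot reset the stale-data detector above. If the detector should treat heartbeats as proof of life, send them as a real event instead (for example a named event with a data: line) and update the last-received timestamp in its listener.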
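One way to connect heartbeats to staleness detection is to keep the timing logic in a small framework-independent class. The sketch below is a hypothetical helper, not part of any library: `StaleDetector` and the event name `heartbeat` are assumptions, and the clock is injectable so the logic can be exercised without waiting 30 seconds.

```typescript
// Hypothetical helper: records when the stream last proved it was alive.
class StaleDetector {
  private lastSeen: number;
  private readonly thresholdMs: number;
  private readonly now: () => number;

  constructor(thresholdMs = 30_000, now: () => number = Date.now) {
    this.thresholdMs = thresholdMs;
    this.now = now;
    this.lastSeen = now();
  }

  // Call from every listener that shows the stream is alive.
  touch(): void {
    this.lastSeen = this.now();
  }

  isStale(): boolean {
    return this.now() - this.lastSeen > this.thresholdMs;
  }
}

// Simulated clock for demonstration purposes.
let fakeNow = 0;
const detector = new StaleDetector(30_000, () => fakeNow);
fakeNow = 31_000;
console.log(detector.isStale()); // prints true
detector.touch();
console.log(detector.isStale()); // prints false
```

In the browser, `detector.touch()` would be called from the message listener and from a listener for the named heartbeat event, which assumes the server emits something like `event: heartbeat` followed by a `data: ping` line rather than a bare comment.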

HTTP/2 and multiplexing

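Under HTTP/2, all SSE streams to the same origin share a single TCP connection. That avoids the HTTP/1.1 per-origin connection limit (typically six in browsers), but a slow or stalled stream can throttle the others through connection-level flow control and TCP head-of-line blocking. Validate with: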

curl -v --http2 -N -H "Accept: text/event-stream" https://example.com/api/events 2>&1 | grep -E "^[<>*]"

If the server sends HTTP/1.1 (< HTTP/1.1 200), ensure Connection: keep-alive is set. If it sends HTTP/2 (< HTTP/2 200), verify there is no Transfer-Encoding: chunked header (illegal in HTTP/2; framing is handled by the protocol).

EventSource and 301/302 redirects

EventSource follows redirects automatically, but on redirect the Last-Event-ID header is not forwarded by some browsers (Chromium bug #40089). If your auth flow redirects (e.g., SSO), the client reconnects from event 0 silently, causing duplicate or missing events. Mitigation: canonicalize the URL before constructing EventSource, or embed the Last-Event-ID as a query param and handle it server-side on redirect.
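The query-parameter mitigation can be sketched as a small URL helper. The function name `withLastEventId` and the parameter name `lastEventId` are assumptions made for this example; the server has to be written to read that parameter when the header is absent. String handling is used instead of the URL class so that relative paths work unchanged.

```typescript
// Hypothetical helper: appends the last seen event ID as a query parameter.
function withLastEventId(url: string, lastEventId: string | null): string {
  if (!lastEventId) return url; // nothing to resume from

  // Keep any fragment at the end of the URL.
  const hashIndex = url.indexOf("#");
  const base = hashIndex === -1 ? url : url.slice(0, hashIndex);
  const hash = hashIndex === -1 ? "" : url.slice(hashIndex);

  const separator = base.includes("?") ? "&" : "?";
  return `${base}${separator}lastEventId=${encodeURIComponent(lastEventId)}${hash}`;
}

console.log(withLastEventId("/api/events", "1718300000000"));
// prints "/api/events?lastEventId=1718300000000"
console.log(withLastEventId("/api/events?room=a", "id 7"));
// prints "/api/events?room=a&lastEventId=id%207"
```

The result would be passed to the EventSource constructor each time a connection is opened, with the ID taken from the lastEventId property of the most recent MessageEvent.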


Performance & Scale Considerations

Connection count and backoff storms

With 50 000 clients connected and a server restart, all 50 000 fire reconnect within the same retry interval. Full jitter reduces thundering herd from a synchronized spike to near-uniform distribution:

Without jitter: 50,000 connections at t=3000ms
With full jitter (0–3000ms): ~16.7 connections/ms average

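Set retry: 2000 in the stream as a baseline for clients that rely on the native retry, and let the JavaScript backoff cap at 30 seconds; repeated failures then spread reconnects over a widening window instead of a single fixed interval.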
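The arithmetic behind these figures is a simple division of client count by the width of the reconnect window. The sketch below (with a hypothetical `averageRatePerMs` helper) reproduces the full-jitter number and, for comparison, shows what a ±25% jitter around a 3 000 ms delay would yield, assuming reconnects are spread uniformly over the window.

```typescript
// Average reconnect arrivals per millisecond when `clients` reconnects
// are spread uniformly over the window [startMs, endMs).
function averageRatePerMs(clients: number, startMs: number, endMs: number): number {
  if (endMs <= startMs) throw new RangeError("window must have positive width");
  return clients / (endMs - startMs);
}

// Full jitter over 0 to 3000 ms: 50 000 / 3000
const fullJitterRate = averageRatePerMs(50_000, 0, 3000);
// ±25% jitter around 3000 ms: window is 2250 to 3750 ms, i.e. 1500 ms wide
const quarterJitterRate = averageRatePerMs(50_000, 2250, 3750);

console.log(fullJitterRate.toFixed(1), quarterJitterRate.toFixed(1)); // prints "16.7 33.3"
```

A narrower jitter window roughly doubles the average arrival rate here, which is the trade-off between predictable reconnect times and load spreading.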

Memory per pending connection

Each EventSource object holds:

  • One open TCP socket (file descriptor)
  • ~4–8 KB of kernel socket buffer (Linux default)
  • One pending HTTP request object in the browser

The server side holds roughly the same. With 10 000 concurrent SSE connections on a single Node.js process, expect roughly 40–80 MB of socket buffer overhead (10 000 × 4–8 KB) before application memory, and more when both send and receive buffers are populated. See Connection Pooling for SSE Servers for file-descriptor limits and worker-process allocation.

Backpressure from slow reconnects

If a reconnecting client re-subscribes to a Redis channel before the server removes the old subscription (race between TCP close detection and garbage collection), the client may receive duplicate events. Assign a unique clientId per EventSource instance (regenerated on each reconnect) and deduplicate on the server by Last-Event-ID:

// Include clientId in URL so the server can clean up the old subscription
const clientId = crypto.randomUUID();
const es = new EventSource(`/api/events?clientId=${clientId}`);

Validation & Debugging

curl diagnostics

# Stream events in real time, show response headers
curl -N -i -H "Accept: text/event-stream" http://localhost:3000/api/events

# Verify Last-Event-ID round-trip after simulated disconnect
curl -N -H "Accept: text/event-stream" \
     -H "Last-Event-ID: 1718300000000" \
     http://localhost:3000/api/events

# Check for proxy buffering: if events arrive in bursts, buffering is on
curl -N --no-buffer -H "Accept: text/event-stream" https://prod.example.com/api/events \
  | ts "[%Y-%m-%d %H:%M:%.S]"   # requires moreutils 'ts'

Chrome DevTools

  1. Open Network → EventStream tab for the SSE request.
  2. Confirm Content-Type: text/event-stream in Response Headers.
  3. Each row in the EventStream tab shows: id, event type, data, and timestamp. Gaps in timestamps reveal backoff intervals.
  4. Throttle to “Slow 3G” in the Network panel to simulate mobile reconnection; confirm your countdown UI appears.

Structured logging for reconnection events

// Log each reconnect attempt for observability
const sse = new ReconnectingSSE({
  url: "/api/events",
  onError: (attempt, delay) => {
    console.warn(JSON.stringify({
      event: "sse_reconnect",
      attempt,
      delay_ms: Math.round(delay),
      url: "/api/events",
      timestamp: new Date().toISOString(),
    }));
    // Ship to your logging pipeline (DataDog, Loki, etc.)
    fetch("/log", {
      method: "POST",
      body: JSON.stringify({ event: "sse_reconnect", attempt, delay_ms: delay }),
    }).catch(() => {/* non-blocking */});
  },
});

Alert if attempt reaches 5 for more than 1% of sessions in a 5-minute window—it typically indicates a deployment issue or upstream failure not yet caught by server-side monitors.
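The alert rule can be expressed as a small predicate over aggregated session data. This is only a sketch under assumptions: `SessionStats` and `shouldAlert` are hypothetical names, and the input is taken to be one record per session for a single 5-minute window, already aggregated by the logging pipeline.

```typescript
interface SessionStats {
  sessionId: string;
  maxAttempt: number; // highest reconnect attempt logged in the window
}

// Hypothetical check: true when more than `fraction` of sessions
// reached at least `attemptThreshold` reconnect attempts.
function shouldAlert(
  sessions: SessionStats[],
  attemptThreshold = 5,
  fraction = 0.01,
): boolean {
  if (sessions.length === 0) return false;
  const affected = sessions.filter((s) => s.maxAttempt >= attemptThreshold).length;
  return affected / sessions.length > fraction;
}

// 200 sessions, 3 of which reached attempt 5 or higher (1.5%).
const windowStats: SessionStats[] = Array.from({ length: 200 }, (_, i) => ({
  sessionId: `s${i}`,
  maxAttempt: i < 3 ? 6 : 1,
}));
console.log(shouldAlert(windowStats)); // prints true
```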


⚡ Production Directives

  • Set retry: 2000 in the SSE stream and implement jittered exponential backoff in JavaScript; never rely on the browser's fixed default retry interval (about 3 seconds in Chromium) for production traffic.
  • Detect permanent failures (4xx, CORS) separately from transient ones (TCP drop) and surface a manual "Reconnect" button with a re-auth prompt when auth is the root cause.
  • Send a : heartbeat comment every 25 seconds to prevent corporate firewalls from silently killing idle connections; pair with a client-side stale-data detector at 30 seconds, fed by a dispatched event, since comments never reach JavaScript.
  • Set proxy_buffering off and X-Accel-Buffering: no for all SSE routes; buffer-related delivery failures are the single most common production SSE issue.
  • Log each reconnect attempt with attempt number and delay to your observability stack; alert if reconnect rates spike above baseline within 5 minutes of any deployment.


Frequently Asked Questions

Why doesn't EventSource retry automatically after a 503?

The spec mandates that any HTTP response that is not a 200 with Content-Type: text/event-stream causes the browser to close the EventSource and set readyState to CLOSED (2)—permanently. This applies to 5xx errors. The rationale is that the server explicitly rejected the connection; blindly retrying could amplify load during an outage. You must listen for the error event, check readyState === 2, and implement retry logic in JavaScript as shown above.

How do I differentiate a network drop from a server-side close?

You cannot distinguish them via EventSource events alone—both fire the same error event. The difference is detectable only after the fact: if the server cleanly closes with a specific final event (e.g., event: close\ndata: maintenance\n\n), your addEventListener("close", ...) handler can set a flag before error fires. Without a sentinel event, assume all drops are transient and retry with backoff.

Does the browser forward Last-Event-ID across redirects?

Not reliably. Chromium does not include Last-Event-ID in the redirected request (issue #40089). To work around this, store the last received ID in sessionStorage and append it as a query parameter (?lastEventId=…) when constructing the EventSource URL. The server reads it from the query string when the header is absent.

What backoff parameters should I use in production?

Start with baseDelay: 1000ms, maxDelay: 30000ms, maxRetries: 10, and jitter enabled. This gives a maximum wait budget of roughly 3 minutes (1 + 2 + 4 + 8 + 16 s, then five waits capped at 30 s, for 181 s before jitter) before giving up and showing a "Connection lost" message. For dashboards or notification feeds where users expect persistence, set maxRetries: Infinity and show a reconnect countdown indefinitely. For one-shot data fetches, set maxRetries: 3.
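The wait budget for any parameter set can be checked by summing the capped delays. The helper below is a hypothetical sketch (`totalBackoffBudgetMs` is not part of ReconnectingSSE) and ignores jitter, which shifts each wait by up to 25% in either direction.

```typescript
// Sum of the un-jittered waits across all retry attempts.
function totalBackoffBudgetMs(baseMs: number, capMs: number, maxRetries: number): number {
  let total = 0;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    total += Math.min(capMs, baseMs * 2 ** attempt);
  }
  return total;
}

console.log(totalBackoffBudgetMs(1000, 30_000, 10)); // prints 181000 (about 3 minutes)
console.log(totalBackoffBudgetMs(1000, 30_000, 3));  // prints 7000
```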

How do I test reconnection behavior locally?

Use Chrome DevTools Network throttling (including the Offline preset) to simulate a degraded or dropped network, or kill the backend server mid-stream and restart it. For automated testing, write a test HTTP server that sends 3 events then drops the connection (destroy the socket rather than ending the response cleanly), and assert that your client reconnects and sends the correct Last-Event-ID. Tools like toxiproxy let you inject latency and disconnects programmatically.

