Event ID & Retry Mechanism Design

Part of SSE Protocol Fundamentals & Architecture.

Without explicit event IDs and a deliberate retry policy, a dropped TCP connection silently loses every event emitted while the client was disconnected. The browser reconnects, the server starts a fresh stream, and the client has no idea what it missed. This guide covers the two SSE primitives that fix that: the id field, which gives every event a cursor that the browser remembers and sends back on reconnect, and the retry directive, which controls how long the browser waits before attempting that reconnect. Together they form a resumption contract: the backbone of at-least-once delivery over a fundamentally unreliable transport.

[Figure: SSE Event ID and Retry Reconnection Flow. Sequence diagram with three participants (Browser Client, SSE Server, Cursor Store). The client opens GET /stream with no Last-Event-ID; the server sends retry: 2000, then id: evt-42 and id: evt-43, recording the cursor in the store. The TCP connection drops, the client waits 2000 ms and reconnects with Last-Event-ID: evt-43, and the server queries the store for events after evt-43 and sends id: evt-44, so the stream resumes without a gap.]
SSE reconnection lifecycle: client stores the last id, waits retry ms on disconnect, then reconnects with Last-Event-ID so the server can replay missed events from a cursor store.

How the Mechanism Works

The WHATWG HTML specification defines id and retry as optional event-stream fields alongside data and event. Their interaction is precise and worth reading at the wire level.

The id field

When a client’s EventSource parser encounters an id: line, it stores the value in its last event ID buffer. That buffer persists across reconnections for the lifetime of the EventSource object. On every reconnect, the browser sends the buffer’s contents as the Last-Event-ID HTTP request header.

One special case: an id: line with an empty value (id: followed immediately by a newline) clears the buffer. The next reconnect will not send the header at all. Use this deliberately when you want to signal “start fresh” — for example, after a client-initiated full resync.
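
The buffer rules above can be sketched as two small pure functions. This is an illustration of the parsing rule as the specification describes it, not browser source code; the names applyIdField and reconnectHeaders are hypothetical.

```javascript
// Sketch of how an id: field value updates the last-event-ID buffer.
// applyIdField and reconnectHeaders are hypothetical helper names.
function applyIdField(buffer, value) {
  // The spec ignores any id value that contains U+0000 NULL
  if (value.includes('\u0000')) return buffer;
  // Otherwise the value replaces the buffer; an empty value clears it
  return value;
}

// The Last-Event-ID header is only sent when the buffer is non-empty
function reconnectHeaders(buffer) {
  return buffer === '' ? {} : { 'Last-Event-ID': buffer };
}

let buffer = '';
buffer = applyIdField(buffer, 'evt-42');     // buffer is now "evt-42"
buffer = applyIdField(buffer, 'bad\u0000');  // ignored, buffer unchanged
const resumeHeaders = reconnectHeaders(buffer);
buffer = applyIdField(buffer, '');           // empty id clears the buffer
const freshHeaders = reconnectHeaders(buffer);
```

Here resumeHeaders carries the cursor, while freshHeaders is empty because the buffer was cleared before the simulated reconnect.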

The retry field

retry: takes a decimal integer in milliseconds. The spec requires that the value consist only of ASCII digits; any non-digit character causes the directive to be ignored. Each valid retry: line replaces the browser's reconnection time, and the new value applies to every subsequent reconnect until another retry: line overwrites it.

Key behaviors to internalize:

  • retry: updates the timer globally for the connection, not per-event. Send it once at stream open, then only when you want to change it.
  • The browser does not apply exponential backoff automatically. Whatever you send is what it uses. Implement progressive backoff server-side by updating the retry value over time, or use a fetch-based polyfill for full backoff control.
  • If retry: is never sent, browsers fall back to a default of a few seconds (about 3000 ms in Chrome). The specification leaves this value implementation-defined, so do not depend on it.
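
The digits-only rule can be illustrated with a short sketch. The function name parseRetry is hypothetical; it mirrors the rule described above, where a value that is not purely ASCII digits leaves the current reconnection time untouched.

```javascript
// Sketch of the retry: parsing rule. parseRetry is a hypothetical name.
function parseRetry(value, currentMs) {
  // Only a non-empty run of ASCII digits is accepted; anything else is ignored
  if (!/^[0-9]+$/.test(value)) return currentMs;
  return parseInt(value, 10);
}

const accepted = parseRetry('2000', 3000);    // valid: new reconnection time
const withUnit = parseRetry('2s', 3000);      // ignored: contains a letter
const negative = parseRetry('-500', 3000);    // ignored: "-" is not a digit
const fractional = parseRetry('1.5', 3000);   // ignored: "." is not a digit
```

A practical consequence: units, signs, and decimals all silently disable the directive, so format the value as a plain integer when writing it to the stream.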

Wire-level annotated example

HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
X-Accel-Buffering: no

retry: 2000
id: evt-001
event: order_update
data: {"orderId":"ORD-9182","status":"confirmed","ts":1748736000}

id: evt-002
event: order_update
data: {"orderId":"ORD-9183","status":"shipped","ts":1748736005}

: heartbeat

id: evt-003
event: order_update
data: {"orderId":"ORD-9182","status":"delivered","ts":1748736060}

Notes on the above:

  • retry: is sent once, before the first id:. The browser stores 2000 ms.
  • Each event has its own id:. The browser’s last-event-ID buffer is updated to evt-003 after the third event.
  • The comment line (: heartbeat) is a keepalive. It has no id: and does not affect the buffer.
  • The blank line after each event block is mandatory — it dispatches the event.
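
A serializer that produces blocks in this shape might look like the sketch below. The helper name formatSseEvent is hypothetical, and it assumes id and event values contain no newline characters. A payload containing newlines is split into one data: line per payload line, which is how the wire format carries multi-line data.

```javascript
// Sketch of a serializer for one event block. formatSseEvent is a hypothetical helper.
// Assumes id and event contain no newline characters.
function formatSseEvent({ id, event, data, retry } = {}) {
  const lines = [];
  if (retry !== undefined) lines.push(`retry: ${retry}`);
  if (id !== undefined) lines.push(`id: ${id}`);
  if (event !== undefined) lines.push(`event: ${event}`);
  if (data !== undefined) {
    // Each line of the payload needs its own data: field
    for (const part of String(data).split(/\r\n|\r|\n/)) lines.push(`data: ${part}`);
  }
  // The trailing blank line dispatches the event
  return lines.join('\n') + '\n\n';
}

const block = formatSseEvent({
  id: 'evt-001',
  event: 'order_update',
  data: 'line one\nline two',
});
```

Centralizing serialization in one function makes it harder to forget the terminating blank line or to emit a raw newline inside a data: value.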

The Last-Event-ID header sent on reconnect looks like:

GET /stream HTTP/1.1
Accept: text/event-stream
Last-Event-ID: evt-003
Cache-Control: no-cache

Server-Side Implementation: Cursor-Based Resumption

The server’s job on reconnect is to read Last-Event-ID, look up what events the client missed, and replay them in order before resuming the live stream. The cursor store must support range queries by event ID.

Node.js with Redis sorted set

import { createClient } from 'redis';
import http from 'http';

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

const STREAM_KEY = 'sse:events';
const RETENTION_MS = 24 * 60 * 60 * 1000; // 24 hours

http.createServer(async (req, res) => {
  if (req.url !== '/stream') { res.writeHead(404).end(); return; }

  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'X-Accel-Buffering': 'no',   // disable nginx buffering
    'Connection': 'keep-alive',
  });

  // Send initial retry directive: clients will wait 3 s before reconnecting
  res.write('retry: 3000\n\n');

  const send = (evt) =>
    res.write(`id: ${evt.id}\ndata: ${JSON.stringify(evt.payload)}\n\n`);
  // Extract the sequence from "evt-NNN"; yields NaN for missing or malformed ids
  const seqOf = (id) => parseInt(String(id).split('-')[1], 10);

  // Subscribe BEFORE replaying so events published during the replay are
  // buffered instead of lost; they are flushed, deduplicated, afterwards.
  let replaying = true;
  let lastSent = 0;
  const pending = [];
  const sub = redis.duplicate();
  await sub.connect();
  await sub.subscribe('sse:live', (message) => {
    const evt = JSON.parse(message);
    if (replaying) { pending.push(evt); return; }
    send(evt);
  });

  // Clean up on client disconnect
  req.on('close', async () => {
    await sub.unsubscribe('sse:live');
    await sub.quit();
  });

  // Replay missed events if the client provides a well-formed cursor
  const cursor = seqOf(req.headers['last-event-id'] ?? '');
  if (Number.isInteger(cursor) && cursor >= 0) {
    // Events stored as JSON strings in a sorted set, score = numeric sequence
    const missed = await redis.zRangeByScore(
      STREAM_KEY,
      cursor + 1,           // scores are integers, so this means "after the cursor"
      '+inf',
      { LIMIT: { offset: 0, count: 200 } }  // cap replay to 200 events
    );
    for (const raw of missed) {
      const evt = JSON.parse(raw);
      send(evt);            // re-emit with the original id so the client's buffer stays accurate
      lastSent = seqOf(evt.id);
    }
  }

  // Flush events that arrived during the replay, skipping any already replayed
  replaying = false;
  for (const evt of pending) {
    if (seqOf(evt.id) > lastSent) send(evt);
  }
}).listen(3000);

Publishing events (in another service or route):

async function publishEvent(payload) {
  const seq = await redis.incr('sse:seq');  // atomic monotonic counter
  const id = `evt-${seq}`;
  const evt = { id, payload, ts: Date.now() };

  // Store in sorted set with score = sequence number for range queries
  await redis.zAdd(STREAM_KEY, { score: seq, value: JSON.stringify(evt) });

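  // Trim by rank, not by time: scores are sequence numbers, so a timestamp
  // cutoff would exceed every score and delete the entire set.
  const MAX_RETAINED = 100_000; // assumed cap; size it to cover your retention window
  await redis.zRemRangeByRank(STREAM_KEY, 0, -(MAX_RETAINED + 1));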

  // Fan out to all connected stream nodes
  await redis.publish('sse:live', JSON.stringify(evt));
}

For a deeper treatment of distributed fan-out across multiple stream nodes, see Redis Pub/Sub Fan-Out for SSE.

Python / FastAPI with sse-starlette

import json

import redis.asyncio as aioredis
from fastapi import FastAPI, Request
from sse_starlette.sse import EventSourceResponse

app = FastAPI()
pool = aioredis.ConnectionPool.from_url("redis://localhost")

STREAM_KEY = "sse:events"
RETRY_MS = 3000

async def event_generator(request: Request, last_id: str | None):
    r = aioredis.Redis(connection_pool=pool)

    # Send initial retry directive once
    yield {"retry": RETRY_MS}

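    # Replay only when the cursor looks like "evt-NNN"; ignore malformed values
    prefix, _, seq = (last_id or "").partition("-")
    if prefix == "evt" and seq.isascii() and seq.isdigit():
        cursor = int(seq)
        missed = await r.zrangebyscore(STREAM_KEY, cursor + 1, "+inf", start=0, num=200)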
        for raw in missed:
            evt = json.loads(raw)
            yield {"id": evt["id"], "data": json.dumps(evt["payload"])}

    pubsub = r.pubsub()
    await pubsub.subscribe("sse:live")

    async for msg in pubsub.listen():
        if await request.is_disconnected():
            await pubsub.unsubscribe("sse:live")
            break
        if msg["type"] == "message":
            evt = json.loads(msg["data"])
            yield {"id": evt["id"], "data": json.dumps(evt["payload"])}

@app.get("/stream")
async def stream(request: Request):
    last_id = request.headers.get("last-event-id")
    return EventSourceResponse(event_generator(request, last_id))

See the Python FastAPI SSE Implementation Guide for full middleware and deployment configuration.

Client-Side Consumption with EventSource

The native EventSource API handles id and retry automatically, but you need to understand what it actually does — and what it does not do.

const es = new EventSource('/stream', { withCredentials: true });

es.addEventListener('order_update', (e) => {
  // e.lastEventId is the most recently seen id: value
  // The browser will send this as Last-Event-ID on reconnect automatically
  console.log('last cursor:', e.lastEventId);
  processOrder(JSON.parse(e.data));
});

es.addEventListener('error', (e) => {
  // readyState 0 = CONNECTING (auto-reconnecting), 2 = CLOSED (permanent failure)
  if (es.readyState === EventSource.CLOSED) {
    // Fatal failure (e.g. non-200 status or wrong Content-Type): the browser will NOT auto-reconnect
    scheduleManualReconnect();
  }
  // readyState === CONNECTING: browser is already waiting the retry interval
});

What the browser does automatically:

  1. Receives retry: 3000 → stores 3000 ms as the reconnect delay.
  2. Receives id: evt-042 → stores "evt-042" in the last-event-ID buffer.
  3. On TCP drop → waits 3000 ms → sends GET /stream with Last-Event-ID: evt-042.
  4. On id: (empty) → clears buffer → next reconnect sends no Last-Event-ID.

What the browser does not do: exponential backoff, jitter, max retry attempts, or custom headers beyond Last-Event-ID. For those capabilities, use a fetch-based implementation instead — covered in Browser Support & Polyfill Strategies.

Fetch-based client with exponential backoff

async function* streamEvents(url, options = {}) {
  const initialDelay = options.initialDelay ?? 1000;
  const maxDelay = options.maxDelay ?? 30000;
  const factor = options.factor ?? 2;
  let lastId = null;
  let delay = initialDelay;

  while (true) {
    const headers = { Accept: 'text/event-stream' };
    if (lastId) headers['Last-Event-ID'] = lastId;
    if (options.token) headers['Authorization'] = `Bearer ${options.token}`;

    try {
      const res = await fetch(url, { headers, signal: options.signal });
      if (!res.ok) throw new Error(`HTTP ${res.status}`);

      delay = initialDelay; // reset on successful connect

      const reader = res.body.getReader();
      const decoder = new TextDecoder();
      let buf = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        buf += decoder.decode(value, { stream: true });

        // Blocks end with a blank line; servers may use LF, CRLF, or CR
        const blocks = buf.split(/\r\n\r\n|\n\n|\r\r/);
        buf = blocks.pop(); // incomplete trailing block

        for (const block of blocks) {
          let id;
          let event = 'message';
          const dataLines = [];
          for (const line of block.split(/\r\n|\n|\r/)) {
            if (line === '' || line.startsWith(':')) continue; // blank or comment/keepalive
            const colon = line.indexOf(':');
            const field = colon === -1 ? line : line.slice(0, colon);
            // Strip at most one leading space from the value
            const val = colon === -1 ? '' : line.slice(colon + 1).replace(/^ /, '');
            if (field === 'id') id = val;
            else if (field === 'data') dataLines.push(val); // multi-line data accumulates
            else if (field === 'event') event = val || 'message';
          }
          if (id !== undefined) lastId = id || null; // an empty id: clears the cursor
          if (dataLines.length) yield { id: lastId, event, data: dataLines.join('\n') };
        }
      }
    } catch (err) {
      if (options.signal?.aborted) return;
    }

    // Wait before reconnecting, after errors and after clean stream ends alike.
    // Exponential backoff with up to 20% additive jitter.
    const jitter = Math.random() * delay * 0.2;
    await new Promise(r => setTimeout(r, delay + jitter));
    delay = Math.min(delay * factor, maxDelay);
  }
}

// Usage
const ctrl = new AbortController();
for await (const evt of streamEvents('/stream', { token: getToken(), signal: ctrl.signal })) {
  dispatch({ type: evt.event, payload: JSON.parse(evt.data) });
}
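
To make the backoff schedule concrete, here is a small sketch that computes the base delay for each attempt using the same defaults as the client above (1000 ms initial, factor 2, 30000 ms cap). The function names are hypothetical. It also shows a "full jitter" variant, which draws the whole delay uniformly from zero up to the backoff value instead of adding a small fraction on top.

```javascript
// Sketch: base delay before reconnect attempt n (0-based), capped at maxDelay.
// backoffDelay and fullJitterDelay are hypothetical helper names.
function backoffDelay(attempt, { initialDelay = 1000, factor = 2, maxDelay = 30000 } = {}) {
  return Math.min(initialDelay * factor ** attempt, maxDelay);
}

// "Full jitter": a uniform value in [0, backoffDelay). rand is injectable for tests.
function fullJitterDelay(attempt, opts, rand = Math.random) {
  return rand() * backoffDelay(attempt, opts);
}

// Base delays for the first six attempts: 1000, 2000, 4000, 8000, 16000, 30000
const schedule = [0, 1, 2, 3, 4, 5].map((n) => backoffDelay(n));
```

Full jitter spreads reconnects more widely than a small additive jitter, at the cost of some clients retrying almost immediately; which trade-off fits depends on how large the reconnecting population is.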

This pattern is also useful for authenticated streams — see Authenticating SSE Streams with Tokens & Cookies.

Designing the retry Value

The retry value is a lever with real production consequences. Set it by profiling your failure modes:

| Scenario | Recommended retry | Reasoning |
|---|---|---|
| Stable datacenter link, low jitter | 1000–2000 ms | Fast recovery without thundering herd |
| Mobile / lossy networks | 3000–5000 ms | Reduces storm risk; battery friendly |
| Server deploy / rolling restart | 5000–10000 ms | Gives pods time to come up healthy |
| Degraded upstream dependency | 15000–30000 ms | Back off while dependency recovers |
| Bulk reconnect storm (spike traffic) | 10000 ms + jitter | Flatten the reconnect curve |

Update the retry directive dynamically as conditions change. A server under memory pressure should lengthen retry before it starts dropping connections:

// Increase retry interval as connection count climbs
function adaptiveRetry(connectionCount) {
  if (connectionCount < 10_000) return 2000;
  if (connectionCount < 50_000) return 5000;
  return 10000; // slow reconnects under pressure
}

// Re-emit the retry directive every 60 s so connected clients use the current value on their next reconnect
setInterval(() => {
  const ms = adaptiveRetry(getConnectionCount());
  broadcastToAll(`retry: ${ms}\n\n`);
}, 60_000);
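
Because retry is stored per connection, the server can also hand each connection a slightly different value, which is one way to realize the "10000 + jitter" row in the table. The sketch below is an assumption-labeled illustration: retryWithJitter, retryDirective, and the 5000 ms spread are hypothetical choices, not values from a specification.

```javascript
// Sketch: spread reconnects by giving each connection its own retry value.
// retryWithJitter, retryDirective, and the spread size are illustrative assumptions.
function retryWithJitter(baseMs, spreadMs, rand = Math.random) {
  // Uniform integer in [baseMs, baseMs + spreadMs]
  return baseMs + Math.floor(rand() * (spreadMs + 1));
}

// Build the directive for one connection; the blank line ends the block
function retryDirective(baseMs, spreadMs, rand = Math.random) {
  return `retry: ${retryWithJitter(baseMs, spreadMs, rand)}\n\n`;
}

// Deterministic examples using a fixed random source
const lowest = retryDirective(10000, 5000, () => 0);
const highest = retryDirective(10000, 5000, () => 0.999999);
```

Writing retryDirective(10000, 5000) to each response individually, rather than broadcasting one shared string, means a simultaneous disconnect produces reconnects spread across a five-second window.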

For the full specification of retry semantics and advanced interval tuning, see Setting the retry Interval in SSE Streams.

Edge Cases and Network Interference

SSE streams traverse HTTP infrastructure that was not designed for long-lived responses. Each hop can corrupt or destroy the id/retry mechanism.

Proxy and CDN buffering

The most common failure: an intermediate proxy accumulates your events before forwarding them. The client sees no data for seconds, then a burst, then a disconnect — and its Last-Event-ID may be several events behind the actual last delivered event.

| Infrastructure | Default buffer behavior | Mitigation |
|---|---|---|
| nginx (reverse proxy) | Buffered by default | X-Accel-Buffering: no |
| Apache mod_proxy | Buffered | ProxyBufSize 0 / SetEnv proxy-sendchunked 1 |
| Cloudflare (free/pro) | Buffers until response completes | Use SSE-compatible Cloudflare plan or Workers |
| AWS ALB | Does not buffer HTTP/1.1 chunked | No action needed |
| Fastly / Varnish | beresp.do_stream = true required | Set in VCL |
| CDN with gzip | Can re-buffer to compress | Set Content-Encoding: identity |

The X-Accel-Buffering: no header is the single most important header to set on SSE responses. Without it, nginx silently defeats the entire streaming mechanism.

ID field stripping

Some reverse proxies and WAFs normalize or strip unrecognized HTTP headers on the request path. If Last-Event-ID is stripped from the reconnect request, the server has no cursor and replays from the beginning (or not at all). Test this explicitly:

curl -v -N \
  -H "Last-Event-ID: evt-042" \
  -H "Accept: text/event-stream" \
  https://api.example.com/stream 2>&1 | head -40

Confirm that the > Last-Event-ID: line appears in curl’s verbose output, which shows the header was sent, then look for it in the server-side access log. If it is missing there, a proxy is stripping it. Allowlist the header in your WAF or proxy config.

Firewall idle-connection resets

Many corporate firewalls and NAT devices reset TCP connections idle for 60–90 seconds. Your keepalive comment must arrive before that threshold:

// Send a keepalive comment every 25 seconds
const keepalive = setInterval(() => {
  if (!res.writableEnded) res.write(': keepalive\n\n');
}, 25_000);
req.on('close', () => clearInterval(keepalive));

The comment line (: keepalive\n\n) does not update the id buffer and does not trigger a message event on the client.

Stale cursor / cursor expiry

If a client reconnects with a cursor that has been evicted from your store (e.g., after a 24-hour TTL), replaying from that cursor is impossible. Options:

  1. Return a sync-required event with a full state snapshot and a fresh id.
  2. Respond with HTTP 204 No Content to signal “no events since cursor” (browser does not reconnect on 204).
  3. Use HTTP 410 Gone to permanently close the stream — but note the browser will stop auto-reconnecting.

Do not silently ignore the cursor. Replaying from the oldest retained event delivers duplicates, and resuming from the live tip leaves a gap the client cannot detect.
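
One way to make that decision explicit is to classify the incoming cursor against the range of sequence numbers still retained. The sketch below assumes numeric sequence cursors (as in the evt-NNN scheme used earlier); classifyCursor, syncRequiredEvent, and the snapshot payload shape are hypothetical.

```javascript
// Sketch: decide how to answer a reconnect, given the retained sequence range.
// classifyCursor and the sync-required payload shape are illustrative assumptions.
function classifyCursor(cursor, oldestRetained, newest) {
  if (cursor === null) return 'fresh';               // no Last-Event-ID: start live
  if (cursor > newest) return 'invalid';             // cursor is ahead of anything published
  if (cursor + 1 < oldestRetained) return 'stale';   // events after the cursor were evicted
  return 'replay';                                   // everything after the cursor is still stored
}

// Option 1 from the list: a snapshot event that carries a fresh id
function syncRequiredEvent(snapshot, newest) {
  return `id: evt-${newest}\nevent: sync-required\ndata: ${JSON.stringify(snapshot)}\n\n`;
}

const verdicts = [
  classifyCursor(null, 100, 200), // 'fresh'
  classifyCursor(50, 100, 200),   // 'stale'
  classifyCursor(99, 100, 200),   // 'replay' (event 100 is the next one needed)
  classifyCursor(250, 100, 200),  // 'invalid'
];
```

Only the 'stale' and 'invalid' verdicts need the snapshot path; the client then continues from the fresh id carried by the sync-required event.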

Distributed ID collisions

Auto-increment counters are safe on a single node. Across multiple stream nodes they produce collisions. Use one of:

| Strategy | Uniqueness guarantee | Sortable | Overhead |
|---|---|---|---|
| Redis INCR on shared key | Global (single Redis) | Yes | 1 round-trip per event |
| ULID | Probabilistic (128-bit) | Yes (ms-level) | None (local) |
| Snowflake ID | Global (with node ID config) | Yes | Config required |
| UUID v4 | Probabilistic (122 random bits) | No | None (local) |

For production systems, Redis INCR on a single Redis instance (or cluster with slot affinity) gives strict monotonicity with no collision risk. See Idempotent Event ID Generation for implementation details and Generating Monotonic Event IDs for SSE for code examples.

Performance and Scale Considerations

Memory: the cursor store cost

At 1000 events/second with a 24-hour retention window, your sorted set holds up to 86.4 million entries. At ~200 bytes per entry (ID + JSON payload), that is roughly 17 GB — likely too large for RAM.
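
The arithmetic behind that estimate, as a sketch you can rerun with your own numbers. The function name is hypothetical, and the figure covers payload bytes only; it ignores the per-entry overhead of the sorted set itself.

```javascript
// Sketch: back-of-envelope sizing for the cursor store (payload bytes only).
// estimateStoreBytes is a hypothetical helper name.
function estimateStoreBytes(eventsPerSecond, retentionSeconds, bytesPerEntry) {
  return eventsPerSecond * retentionSeconds * bytesPerEntry;
}

const retentionSeconds = 24 * 60 * 60;                            // 86,400 s in 24 hours
const entries = 1000 * retentionSeconds;                          // 86,400,000 entries
const bytes = estimateStoreBytes(1000, retentionSeconds, 200);    // 17,280,000,000 bytes
const gigabytes = bytes / 1e9;                                    // 17.28 GB
```

Halving either the retention window or the entry size halves the total, which is why moving payloads out of the sorted set has such a direct effect.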

Mitigation strategies:

  • Store only event IDs in the sorted set; keep payloads in a separate key-value store with per-key TTL.
  • Use a stream-native data structure: Redis Streams (XADD / XRANGE) are purpose-built for this and are more memory-efficient than sorted sets.
  • Tier cold events to disk or an object store; serve hot events (last N minutes) from memory.

CPU: replay fan-out cost

A mass reconnect event (deploy, network partition recovery) can trigger thousands of concurrent cursor lookups. Each lookup is a ZRANGEBYSCORE or XRANGE call. Protect Redis with:

  • A per-client replay cap (e.g., 200 events maximum).
  • A reconnect queue or token bucket to stagger replay requests. See Rate Limiting & Backpressure Handling for token-bucket patterns.
  • Redis cluster read replicas for replay queries (replicas handle reads; primary handles writes).
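
A token bucket for staggering replay queries could be sketched as follows. The class name, capacity, and refill rate are illustrative assumptions, and the clock is injectable so the behavior can be checked deterministically.

```javascript
// Sketch of a token bucket that gates replay queries. The class name, rate, and
// capacity are illustrative; the clock is injectable so behavior is testable.
class TokenBucket {
  constructor({ capacity, refillPerSecond, now = () => Date.now() }) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity;
    this.last = now();
  }

  // Returns true if a replay may run now, false if the caller should defer it
  tryTake() {
    const t = this.now();
    const elapsedSeconds = (t - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.last = t;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

let clock = 0;
const bucket = new TokenBucket({ capacity: 2, refillPerSecond: 1, now: () => clock });
const results = [bucket.tryTake(), bucket.tryTake(), bucket.tryTake()]; // third is refused
clock = 1000;                    // one second later, one token has been refilled
results.push(bucket.tryTake());
```

When tryTake() returns false, a handler can skip the replay and send a longer retry value instead, so the client comes back once the burst has passed.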

Connection overhead

Each SSE connection holds an open TCP socket and a server-side goroutine/thread/async task. At 50k connections:

  • File descriptors: set ulimit -n 100000 and configure OS-level net.core.somaxconn.
  • Memory per connection: roughly 10–20 KB in Go (goroutine stacks plus I/O buffers) and on the order of 100 KB in Node.js. Treat these as rough figures and measure your own workload.
  • Heartbeat timer: one setInterval per connection; consolidate into a single broadcast interval.
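
Consolidating heartbeats into one timer might look like the sketch below. HeartbeatHub is a hypothetical name; the only assumption about each response object is that it exposes write() and writableEnded, as Node's http.ServerResponse does.

```javascript
// Sketch: one timer for all connections instead of one setInterval each.
// HeartbeatHub is a hypothetical name; responses only need write() and writableEnded.
class HeartbeatHub {
  constructor(intervalMs = 25000) {
    this.clients = new Set();
    this.timer = setInterval(() => this.tick(), intervalMs);
    this.timer.unref?.(); // in Node.js, do not keep the process alive for heartbeats
  }

  add(res) { this.clients.add(res); }
  remove(res) { this.clients.delete(res); }

  // Write one keepalive comment to every open response
  tick() {
    for (const res of this.clients) {
      if (res.writableEnded) this.clients.delete(res); // drop finished responses
      else res.write(': keepalive\n\n');
    }
  }

  stop() { clearInterval(this.timer); }
}
```

In a request handler this would be used as hub.add(res) at stream open and req.on('close', () => hub.remove(res)) on disconnect, replacing the per-connection interval.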

Validation and Debugging

Verify Last-Event-ID survives your proxy chain

# 1. Connect to the raw origin (bypass CDN)
curl -N -H "Last-Event-ID: test-42" -H "Accept: text/event-stream" \
  http://origin-host:3000/stream

# 2. Connect through the full proxy chain
curl -N -H "Last-Event-ID: test-42" -H "Accept: text/event-stream" \
  https://api.example.com/stream

# Compare: does the server log show the header in both cases?
# In Node.js: console.log(req.headers['last-event-id'])

Chrome DevTools inspection

  1. Open Network tab → filter by EventStream (or text/event-stream).
  2. Click the /stream request → EventStream sub-tab: shows each dispatched event with its id.
  3. Headers tab of the reconnect request: verify Last-Event-ID is present.
  4. Simulate disconnect: kill the server process; watch the browser reconnect and the new request’s headers.

Structured logging

// Log every reconnect attempt server-side
const lastId = req.headers['last-event-id'];
logger.info({
  event: 'sse_connect',
  clientIp: req.socket.remoteAddress,
  lastEventId: lastId ?? null,
  resuming: !!lastId,
  userAgent: req.headers['user-agent'],
});

Alert on:

  • resuming: true with lastEventId absent from your cursor store (stale cursor).
  • A high reconnect rate per IP (client-side loop bug or very short retry).
  • A spike in the resuming: false rate (mass disconnect without prior id delivery, possibly proxy stripping).

Replay correctness test

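# Publish 10 events via your normal publish path, then reconnect with the id
# of event 5 and check that events 6-10 are replayed exactly once, in order.
# Assumes a fresh sse:seq counter, so the published ids are evt-1 .. evt-10.
for i in $(seq 1 10); do
  curl -s -X POST http://localhost:3000/publish \
    -H "Content-Type: application/json" \
    -d "{\"msg\": \"event-$i\"}"
done

# Reconnect with the cursor of event 5; --max-time ends the otherwise endless stream
curl -s -N --max-time 3 \
  -H "Last-Event-ID: evt-5" -H "Accept: text/event-stream" \
  http://localhost:3000/stream | grep '^id:'
# Pass criterion: id: evt-6 through id: evt-10, one line each, in order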

⚡ Production Directives

  • Set X-Accel-Buffering: no and Cache-Control: no-cache on every SSE response; omitting them silently breaks delivery through nginx and CDNs.
  • Always emit retry: explicitly at stream open; never rely on the browser's implementation-defined default of ~3 s.
  • Use Redis INCR or Redis Streams XADD for globally monotonic event IDs; local counters collide in multi-node deployments.
  • Cap replay depth on reconnect (200 events or 5 minutes of backlog); unbounded replays can saturate Redis and delay new events for recovering clients.
  • Test the full proxy chain with curl -v -H "Last-Event-ID: ..." after every infrastructure change — WAF rule updates routinely strip custom request headers.

Production Checklist

Frequently Asked Questions

Does the browser send Last-Event-ID if the connection was closed intentionally with es.close()?

No. Calling es.close() permanently closes the EventSource. When you create a new EventSource instance (even with the same URL), it starts fresh with no Last-Event-ID header. The last-event-ID buffer lives on the EventSource object, not in browser storage. If you need cursor persistence across page reloads, write the last e.lastEventId value to sessionStorage and include it as a query parameter on the new connection URL, then have the server honour that parameter with the same semantics as the header.
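
The persistence approach described in this answer could be sketched as below. The storage key, the lastEventId query parameter name, and both helper names are assumptions for illustration; the storage argument is anything with getItem and setItem, such as sessionStorage in a browser.

```javascript
// Sketch: persist the cursor across reloads. The storage key, the lastEventId
// query parameter, and both helper names are assumptions for illustration.
const CURSOR_KEY = 'sse:lastEventId';

// Call from each event handler with e.lastEventId
function rememberCursor(storage, lastEventId) {
  if (lastEventId) storage.setItem(CURSOR_KEY, lastEventId);
}

// Build the stream URL, adding the saved cursor when one exists.
// baseUrl must be absolute, e.g. `${location.origin}/stream` in a browser.
function streamUrlWithCursor(baseUrl, storage) {
  const url = new URL(baseUrl);
  const saved = storage.getItem(CURSOR_KEY);
  if (saved) url.searchParams.set('lastEventId', saved);
  return url.toString();
}

// Browser usage (not executed here):
//   const es = new EventSource(streamUrlWithCursor(`${location.origin}/stream`, sessionStorage));
//   es.addEventListener('order_update', (e) => rememberCursor(sessionStorage, e.lastEventId));
```

The query parameter is fixed when the EventSource is constructed, so on automatic reconnects the browser sends both the original parameter and an up-to-date Last-Event-ID header; the server should treat the header as authoritative when both are present.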

What happens if two events have the same id value?

The browser simply overwrites its buffer with the repeated value. On reconnect it sends that value once. There is no error or duplicate-detection at the protocol level. Duplicate IDs mean your server cannot distinguish "client missed events after evt-042" from "client already received evt-042" — you may re-deliver or skip. This is why strict monotonicity matters: use a counter that never repeats.

Can I send Last-Event-ID as a query parameter instead of a header?

The native EventSource API always uses the header, not a query parameter. However, nothing stops you from also accepting a ?lastEventId= query parameter on your server for environments where the header is stripped. A fetch-based polyfill can inject the cursor as a query parameter on each reconnect. If you support both, prefer the header when present (it is set by the browser automatically and is more reliable than client-managed query strings).
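
On the server, supporting both sources with the header taking precedence might look like this sketch. resolveCursor is a hypothetical helper; it assumes Node-style lower-cased header keys and the ?lastEventId= parameter name used in this answer.

```javascript
// Sketch: pick the cursor from the header first, then from ?lastEventId=.
// resolveCursor is a hypothetical helper; headers uses Node's lower-cased keys.
function resolveCursor(headers, requestUrl) {
  const fromHeader = headers['last-event-id'];
  if (fromHeader) return fromHeader;
  // req.url is path-relative in Node, so a placeholder base is supplied for parsing
  const params = new URL(requestUrl, 'http://placeholder.invalid').searchParams;
  return params.get('lastEventId') || null;
}

const both = resolveCursor({ 'last-event-id': 'evt-50' }, '/stream?lastEventId=evt-42');
const queryOnly = resolveCursor({}, '/stream?lastEventId=evt-42');
const neither = resolveCursor({}, '/stream');
```

Whatever this returns should go through the same validation and replay path as a header-supplied cursor, since a query string is just as untrusted as a header.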

How do I handle the reconnect storm when a server restarts?

All clients reconnect simultaneously after the retry interval expires. Mitigations: (1) Use a longer retry value during rolling deploys (send retry: 10000 a few seconds before shutting down the stream). (2) Add a random jitter on the client side in a fetch-based implementation. (3) Rate-limit reconnects at your load balancer (e.g., nginx limit_req zone). (4) Pre-warm the cursor store on new nodes before they start accepting connections, so replay queries do not hit a cold cache.

Does HTTP/2 multiplexing affect the retry mechanism?

The id/retry fields and Last-Event-ID header are transport-agnostic: they work identically over HTTP/1.1 and HTTP/2. The main difference is the failure domain. Over HTTP/2, multiple SSE streams can share one TCP connection, so a single TCP drop disconnects every stream on that connection at once, whereas a stream-level reset (RST_STREAM) ends only the affected stream. In both cases each EventSource reconnects with its last-event-ID buffer intact. Ensure your HTTP/2 proxy does not translate stream resets into full TCP teardowns, which would turn a single-stream failure into a reconnect of every stream.

Deep Dives