Generating Monotonic Event IDs for SSE

Part of Idempotent Event ID Generation.

Monotonic event IDs are the mechanism that makes SSE reconnection deterministic. Without them, a client’s Last-Event-ID header is meaningless to the server, and the client either receives duplicate events or silently misses events published during the gap. This guide walks through several strategies, from a single-process integer counter to Redis INCR to Snowflake-style distributed IDs, and shows how to persist sequence state across restarts so a deployment rollout never resets the counter to zero.

Symptom and Developer Intent

The observable failure pattern: a client reconnects after a 10-second network drop, sends Last-Event-ID: 8f3a9c02-…, and the server returns 200 OK but starts the stream from the beginning. Alternatively, the server logs an error like:

WARN: Last-Event-ID '1672531200.483-0' not found in replay window — full resend

or the client silently receives events it has already processed (duplicates) because the server cannot identify the resume point. In some deployments the ID is a UUID, a raw Date.now() timestamp, or a counter that resets on each server restart, so the server cannot map the ID to a position in the stream and treats every reconnect as a fresh connection.

The intent: emit a strictly increasing, globally unique id: field on every SSE event block, recover that counter after process restarts, and handle multi-node deployments without collisions.

Root Cause Analysis

Why UUIDs and Timestamps Fail

UUIDs are not monotonic. The server cannot answer “give me all events after ID X” by comparing UUIDs because random (version 4) UUIDs carry no inherent ordering. Date.now() in milliseconds usually increases within a single process run, but timestamps and naive counters fail in four ways:

  1. Clock rollback: NTP corrections or VM live-migration can move the system clock backwards, producing IDs lower than previously emitted ones.
  2. Sub-millisecond collisions: Two events generated in the same millisecond get the same timestamp component, requiring an additional counter that is rarely implemented correctly.
  3. Reset on restart: An in-memory counter that starts from 0 (or is re-seeded) on every restart reissues IDs from the previous run. A client holding Last-Event-ID: 1000 may later receive a different event that also carries id: 1000 from the restarted server.
  4. Multi-process duplication: Two Node.js cluster workers that each start an integer counter at 0 will emit id: 1, id: 2, … independently, so different events share identical IDs.
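
Points 1 and 2 can be handled in a hand-rolled generator, although it takes more care than it first appears. The following is a minimal sketch rather than a replacement for ULID or Snowflake: createTimestampIds and its injectable now parameter are hypothetical names, and the fixed-width "ms-seq" format is an assumption chosen so that plain string comparison matches emission order. It does nothing for points 3 and 4.

```javascript
// Hypothetical helper: timestamp-plus-counter IDs that never go backwards
// within one process, even if the clock does.
function createTimestampIds(now = Date.now) {
  let lastMs = 0;
  let seq = 0;
  return function nextId() {
    let ms = now();
    if (ms <= lastMs) {
      // Same millisecond, or the clock moved backwards: stay on lastMs
      // and bump the sequence instead of reusing or lowering the timestamp.
      ms = lastMs;
      seq += 1;
    } else {
      seq = 0;
    }
    lastMs = ms;
    // Fixed-width fields so lexicographic order equals emission order.
    // Assumption: fewer than 1,000,000 IDs share one timestamp.
    return `${String(ms).padStart(15, '0')}-${String(seq).padStart(6, '0')}`;
  };
}
```

Passing a fake clock such as () => 1000 makes the rollback branch easy to exercise in a unit test.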

Protocol-Level Consequence

The WHATWG HTML specification for EventSource defines Last-Event-ID as the value of the most recent id: field received. When the browser reconnects it sends this value in the Last-Event-ID HTTP request header. The server must use this value to seek an event log and resume delivery. If the ID scheme does not support ordered lookup, the server has no choice but to replay everything or drop the reconnect to a fresh start—both are incorrect.

For the Event ID & Retry Mechanism Design to work correctly, the ID must be comparable: the server must be able to answer “is event A newer than event B?” and “fetch all events after event C.”
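
Because Last-Event-ID always arrives as a string header, "comparable" has to be defined per ID scheme. The sketch below shows one way to do that; compareEventIds is a hypothetical helper, and it assumes IDs are either decimal integers (Redis INCR, Snowflake) or fixed-width sortable strings (ULID).

```javascript
// Hypothetical comparator: returns -1, 0, or 1, the same contract that
// Array.prototype.sort callbacks use.
function compareEventIds(a, b) {
  const numeric = /^\d+$/;
  if (numeric.test(a) && numeric.test(b)) {
    // Integer IDs: compare as BigInt. String comparison would put "10"
    // before "9", and Number loses precision above 2^53.
    const x = BigInt(a);
    const y = BigInt(b);
    return x < y ? -1 : x > y ? 1 : 0;
  }
  // Fixed-width sortable strings (for example ULIDs): plain comparison.
  return a < b ? -1 : a > b ? 1 : 0;
}
```

With this in place, "fetch all events after event C" reduces to filtering a log for entries where compareEventIds(entry.id, c) > 0.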

Step-by-Step Resolution

Step 1 — Choose the Right ID Strategy for Your Deployment Scale

Select a strategy before writing any code. The wrong choice requires a breaking migration later.

| Strategy | Monotonic | Collision-safe across nodes | Survives restart | Throughput (approx.) |
|---|---|---|---|---|
| In-process integer | Yes | No (single process only) | With file/DB flush | ~10 M/s |
| Redis INCR | Yes | Yes (Redis serialises) | Yes (with Redis persistence enabled) | ~80 k/s per node |
| ULID monotonicFactory | Yes, per process (ms + incrementing random suffix) | Probabilistic (80-bit random) | Yes (no coordination needed) | ~800 k/s |
| Snowflake (64-bit) | Yes, per node (ms + node + seq) | Yes (node ID differentiates) | Yes (node ID persisted) | ~4 M/s |
| PostgreSQL NEXTVAL | Yes | Yes | Yes | ~10 k/s (network round-trip) |

Use Redis INCR if you already run Redis, have fewer than 50 k events/s, and want the simplest possible guarantee. Use ULID if you want decentralised generation without assigning node IDs. Use Snowflake for high-throughput Go or Rust services where per-event Redis latency is unacceptable.

Step 2 — Implement the Counter

Option A: Redis INCR (distributed, simple)

# Initialise the key once — skip if key already exists
redis-cli SET sse_event_seq 0 NX
# Each call returns the next integer; Redis serialises all INCRs
redis-cli INCR sse_event_seq

In Node.js with node-redis v4 or later (ioredis offers the same INCR command but constructs its client differently):

import { createClient } from 'redis';

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

// Atomic: returns a guaranteed-unique, always-increasing integer
async function nextEventId() {
  return redis.incr('sse_event_seq');   // returns Promise<number>
}

Option B: ULID monotonic factory (decentralised)

import { monotonicFactory } from 'ulid';

// One factory per process; the monotonic variant increments the
// 80-bit random suffix when two calls land in the same millisecond.
const ulid = monotonicFactory();

function nextEventId() {
  return ulid();   // e.g. "01HZAR3P2E0000000000000007" — lexicographically sortable
}

Option C: Snowflake (Go, high throughput)

package ids

import (
    "sync"
    "time"
)

// Bit layout: 41 bits epoch-ms | 10 bits nodeID | 12 bits sequence
const (
    epoch    = int64(1_700_000_000_000) // custom epoch reduces ID magnitude
    nodeBits = 10
    seqBits  = 12
    maxSeq   = (1 << seqBits) - 1      // 4095
)

type Generator struct {
    mu     sync.Mutex
    nodeID int64
    lastMS int64
    seq    int64
}

func New(nodeID int) *Generator { return &Generator{nodeID: int64(nodeID) & 0x3FF} }

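func (g *Generator) Next() int64 {
    g.mu.Lock()
    defer g.mu.Unlock()
    ms := time.Now().UnixMilli() - epoch
    if ms < g.lastMS {
        // Clock moved backwards: keep issuing from the last timestamp
        // so IDs never decrease.
        ms = g.lastMS
    }
    if ms == g.lastMS {
        g.seq = (g.seq + 1) & maxSeq
        if g.seq == 0 {
            // Sequence exhausted this ms: spin until the next ms
            for ms <= g.lastMS {
                ms = time.Now().UnixMilli() - epoch
            }
        }
    } else {
        g.seq = 0
    }
    g.lastMS = ms
    // Pack the fields using the declared bit widths
    return (ms << (nodeBits + seqBits)) | (g.nodeID << seqBits) | g.seq
}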
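
If a Node.js service or browser client consumes these IDs, note that a 64-bit Snowflake value exceeds Number.MAX_SAFE_INTEGER, so it should travel as a decimal string and be handled as a BigInt. The sketch below unpacks an ID using the same bit layout and custom epoch as the Go generator; decodeSnowflake is a hypothetical helper name.

```javascript
// Constants mirror the Go generator: 41 bits ms | 10 bits node | 12 bits seq.
const EPOCH = 1_700_000_000_000n;
const NODE_BITS = 10n;
const SEQ_BITS = 12n;

// Hypothetical helper: split a Snowflake ID (decimal string or BigInt)
// into its three fields.
function decodeSnowflake(id) {
  const n = BigInt(id);
  return {
    // Milliseconds since the Unix epoch (safe as a Number: 41 bits + offset)
    timestampMs: Number((n >> (NODE_BITS + SEQ_BITS)) + EPOCH),
    nodeId: Number((n >> SEQ_BITS) & ((1n << NODE_BITS) - 1n)),
    seq: Number(n & ((1n << SEQ_BITS) - 1n)),
  };
}
```

Decoding is mainly useful for debugging, for example to see which node emitted an ID that a client sent back in Last-Event-ID.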

Step 3 — Assign IDs Before Publishing, Not Inside the SSE Handler

This is the single most common mistake in multi-node deployments. If each SSE handler generates its own ID after receiving an event from a message bus, different servers will assign different IDs to the same logical event. A client reconnecting to a different server will send a Last-Event-ID that no other server recognises.

Correct flow:

Producer → assign ID → publish {id, type, payload} to Redis Stream → SSE handlers tail stream → forward as-is

// publisher.js: runs on the service that creates events
import { createClient } from 'redis';
import { monotonicFactory } from 'ulid';

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();
const ulid = monotonicFactory();

export async function publishEvent(channel, type, payload) {
  const id = ulid();   // assigned HERE, before any SSE handler sees it
  await redis.xAdd(
    `sse:${channel}`,
    '*',
    { eid: id, type, data: JSON.stringify(payload) },
    { TRIM: { strategy: 'MAXLEN', threshold: 1000, strategyModifier: '~' } }
  );
  return id;
}

// sse-handler.js: runs on each API server node
import { createClient } from 'redis';

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

export async function sseHandler(req, res) {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('X-Accel-Buffering', 'no');   // prevent nginx buffering
  res.flushHeaders();

  // Register the close handler before any await so a disconnect is never missed
  let active = true;
  req.on('close', () => { active = false; });

  const streamKey = `sse:${req.params.channel}`;
  // The client sends back the ULID (eid), not a Redis Stream entry ID, so it
  // cannot be passed to XRANGE as a start ID. Compare eids instead.
  const lastId = req.headers['last-event-id'] || '';

  // Replay missed events: scan the retained window (bounded by MAXLEN)
  // and forward the entries whose ULID sorts after lastId.
  let cursor = '$';
  const retained = await redis.xRange(streamKey, '-', '+');
  for (const { id: rid, message } of retained) {
    cursor = rid;   // tail from the last entry read, so nothing falls in between
    if (message.eid <= lastId) continue;    // the client already has this event
    res.write(`id: ${message.eid}\nevent: ${message.type}\ndata: ${message.data}\n\n`);
  }

  // Tail live events on a dedicated connection: a blocking XREAD would
  // otherwise stall every other command on the shared client.
  const reader = redis.duplicate();
  await reader.connect();

  while (active) {
    const results = await reader.xRead(
      [{ key: streamKey, id: cursor }],
      { COUNT: 50, BLOCK: 5000 }
    );
    if (!results) continue;
    for (const { messages } of results) {
      for (const { id: rid, message } of messages) {
        if (!active) break;
        res.write(`id: ${message.eid}\nevent: ${message.type}\ndata: ${message.data}\n\n`);
        cursor = rid;
      }
    }
  }
  await reader.quit();
}
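
The handler builds each event block by string interpolation, which is safe there because the payload is a single-line JSON string. If a payload or event name could ever contain a line break, the block needs more care. The following is a minimal sketch of a serialiser; formatSseEvent is a hypothetical helper, not part of any library.

```javascript
// Hypothetical helper: serialise one event into an SSE block.
function formatSseEvent({ id, event, data }) {
  // A line break inside id or event would end the field early and let the
  // remainder be parsed as a separate field.
  if (/[\r\n]/.test(String(id)) || /[\r\n]/.test(String(event))) {
    throw new Error('id and event must not contain line breaks');
  }
  // Each line of the payload needs its own "data:" prefix; the client
  // joins them back together with "\n".
  const dataLines = String(data)
    .split(/\r\n|\r|\n/)
    .map((line) => `data: ${line}`)
    .join('\n');
  // The blank line (double "\n") terminates the event block.
  return `id: ${id}\nevent: ${event}\n${dataLines}\n\n`;
}
```

A call such as res.write(formatSseEvent({ id: message.eid, event: message.type, data: message.data })) could then replace the interpolated strings.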

Step 4 — Persist the Sequence Across Restarts

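For Redis INCR, the counter survives restarts as long as Redis persistence is enabled (appendonly yes in redis.conf or an RDB snapshot). RDB snapshots alone can lose every increment made since the last snapshot, which would make IDs repeat after a crash, so prefer AOF; with the default appendfsync everysec, up to about one second of writes can still be lost. Verify: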

redis-cli CONFIG GET appendonly
# Expected output:
# 1) "appendonly"
# 2) "yes"

For in-process counters (fallback mode or single-process deployments), flush the current counter value to durable storage on shutdown:

import fs from 'fs/promises';

let localSeq = 0;

// On startup: restore the counter
async function initCounter() {
  try {
    const saved = await fs.readFile('/var/lib/myapp/sse_last_id.txt', 'utf8');
    localSeq = parseInt(saved.trim(), 10) || 0;
  } catch (err) {
    if (err.code !== 'ENOENT') throw err;  // surface permission and I/O errors
    localSeq = 0;  // first run: no file yet
  }
}

// On shutdown: save the counter
async function persistCounter() {
  await fs.writeFile('/var/lib/myapp/sse_last_id.txt', String(localSeq));
}

// Persist on both termination signals, then exit
for (const signal of ['SIGTERM', 'SIGINT']) {
  process.on(signal, async () => {
    await persistCounter();
    process.exit(0);
  });
}

// In Kubernetes the pod receives SIGTERM first and SIGKILL once
// terminationGracePeriodSeconds expires, so keep the grace period long
// enough for the write to finish. SIGKILL and crashes skip this handler.
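
Flushing only on shutdown leaves a gap: after a crash or SIGKILL the saved value is stale and IDs can repeat. One common alternative is to reserve IDs in blocks and persist the high-water mark before handing any of them out. The sketch below illustrates the idea under stated assumptions: createBlockCounter, createMemoryStore, and the store interface (load and save) are hypothetical, and a real store would write the value to a file or database row durably.

```javascript
// Hypothetical block-reserving counter. `store` is an assumed interface:
//   load()  -> last persisted high-water mark (0 on first run)
//   save(n) -> durably record n before any ID up to n is handed out
function createBlockCounter(store, blockSize = 1000) {
  let reserved = store.load();  // every ID <= reserved may already be in use
  let next = reserved;          // so resume above the whole reserved block
  return function nextId() {
    if (next >= reserved) {
      reserved = next + blockSize;
      store.save(reserved);     // persist first, then issue from the block
    }
    next += 1;
    return next;
  };
}

// In-memory store for demonstration only.
function createMemoryStore() {
  let value = 0;
  return { load: () => value, save: (n) => { value = n; } };
}
```

The trade-off is gaps: a restart skips the unused remainder of the last block, which is harmless because SSE IDs only need to increase, not be contiguous.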

For Snowflake-style generators, the only thing to persist is the nodeID (assign it once at provisioning time and store it in an environment variable or a Kubernetes ConfigMap). The timestamp component self-recovers after a restart, provided the wall clock has advanced past the last emitted millisecond; this holds unless the clock was stepped backwards while the process was down.

Step 5 — Handle the Replay-Window Gap Case

When a client reconnects with a Last-Event-ID that has aged out of the replay buffer, do not return a 4xx status. A 4xx response causes EventSource to stop retrying permanently. Instead, return 200 and emit a sync-required event with a full state snapshot:

async function replayOrSync(lastId, streamKey, res) {
  const oldest = await redis.xRange(streamKey, '-', '+', { COUNT: 1 });
  const oldestEid = oldest[0]?.message?.eid ?? null;

  if (!oldestEid || lastId < oldestEid) {
    // ID is older than the replay window — send a full snapshot
    const snapshot = await buildStateSnapshot();   // your data layer
    const freshId = ulid();
    res.write(`id: ${freshId}\nevent: sync-required\ndata: ${JSON.stringify(snapshot)}\n\n`);
    return freshId;   // caller tails from here
  }
  return lastId;  // within window, normal replay proceeds
}

The client should listen for sync-required and rebuild local state from the snapshot payload rather than merging incremental events. See Broadcasting SSE Events with Redis Pub/Sub for the complementary server-side fan-out pattern.
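
On the client, the distinction between "replace" and "merge" can be kept in one pure function. The following is a sketch only: applyServerEvent is a hypothetical name, and the payload shapes (a snapshot with an items map, incremental events with an id field) are assumptions for illustration rather than a format defined by this guide.

```javascript
// Hypothetical client-side reducer. `evt` has the shape of a browser
// MessageEvent: { type, data, lastEventId }.
function applyServerEvent(state, evt) {
  const payload = JSON.parse(evt.data);
  if (evt.type === 'sync-required') {
    // Incremental events were missed: discard local state and
    // rebuild it from the snapshot alone.
    return { items: { ...payload.items }, lastEventId: evt.lastEventId };
  }
  // Normal incremental event: merge one item into the existing state.
  return {
    items: { ...state.items, [payload.id]: payload },
    lastEventId: evt.lastEventId,
  };
}
```

In a browser, each named event type needs its own listener, for example source.addEventListener('sync-required', (e) => { state = applyServerEvent(state, e); }).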

Validation and Monitoring

Verify ID Monotonicity with curl

# Check that id: fields appear and increase on every event
curl -sN -H "Accept: text/event-stream" http://localhost:3000/events/orders \
  | grep "^id:"

# Expected output (IDs strictly increasing):
# id: 01HZAR3P2E0000000000000001
# id: 01HZAR3P2E0000000000000002
# id: 01HZAR3P2E0000000000000003

# Simulate reconnect: verify the server resumes after the given ID, not before
LAST="01HZAR3P2E0000000000000002"
curl -sN \
  -H "Accept: text/event-stream" \
  -H "Last-Event-ID: $LAST" \
  http://localhost:3000/events/orders \
  | grep "^id:" | head -5
# First id: line must be greater than $LAST (lexicographically for ULID, numerically for integers)

DevTools Verification Steps

  1. Open Chrome DevTools → Network → filter by EventStream.
  2. Click the SSE request → EventStream sub-tab.
  3. Confirm lastEventId column updates on every row, not just the first.
  4. In DevTools → Network conditions, set throttling to Offline for 3 seconds, then No throttling.
  5. Watch the new request row: Headers tab must show Last-Event-ID: <last received value>.
  6. The first id: in the resumed stream must be strictly greater than the value in Last-Event-ID.

Unit Test Stub (Node.js / Jest)

import { monotonicFactory } from 'ulid';

describe('ULID monotonicity', () => {
  it('generates strictly increasing IDs even in rapid succession', () => {
    const ulid = monotonicFactory();
    const ids = Array.from({ length: 1000 }, () => ulid());
    for (let i = 1; i < ids.length; i++) {
      // ULID lexicographic comparison is equivalent to time ordering
      expect(ids[i] > ids[i - 1]).toBe(true);
    }
  });
});

Prometheus Metrics to Export

# prometheus.yml scrape config
scrape_configs:
  - job_name: sse_id_health
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9464']

Instrument these counters in your SSE server:

  • sse_id_gaps_total — incremented when a reconnect Last-Event-ID is not found in the replay window (indicates window exhaustion).
  • sse_events_emitted_total — monotonically increasing event counter per channel; sudden drops indicate handler crashes.
  • sse_reconnect_replay_count — histogram of events replayed per reconnect; p99 > 200 suggests the replay window is undersized.

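Alert when the rate of sse_id_gaps_total exceeds 2% of the rate of sse_reconnect_total (a counter of reconnects that carry a Last-Event-ID header, which must be exported alongside the metrics above) over a 5-minute window.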

Confirm Proxy Does Not Strip Last-Event-ID

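Some proxy configurations strip or rewrite the Last-Event-ID header on upstream requests. Test with tcpdump on the interface where the upstream server receives traffic. The example captures on loopback (lo), which applies when the proxy and the upstream run on the same host, and it only works for unencrypted HTTP: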

# Capture HTTP requests on port 3000 and check for Last-Event-ID
sudo tcpdump -A -i lo 'tcp port 3000' 2>/dev/null | grep -i 'last-event-id'

If the header is absent from the tcpdump output but present in your test curl call, the proxy is stripping it. For nginx, confirm that proxy_pass_request_headers is on and that no proxy_set_header directive overrides Last-Event-ID without passing the original value. For nginx configuration details that also affect ID delivery, see Buffer Management & Chunked Transfer Encoding.

⚡ Production Directives

  • Assign event IDs in the publisher before writing to the message bus — never inside the SSE handler — so all nodes emit the same ID for the same logical event.
  • Enable Redis AOF persistence (appendonly yes) so the INCR counter survives restarts; for in-process counters, flush to disk on SIGTERM and SIGINT.
  • Set X-Accel-Buffering: no on every SSE response and verify with tcpdump that the proxy passes Last-Event-ID to the upstream unchanged.
  • Respond to out-of-window Last-Event-ID with a 200 + sync-required event — never 4xx, which causes EventSource to stop reconnecting.
  • Size your Redis Stream replay window to cover at least 2× the 99th-percentile client reconnect gap (mobile on poor connections: target 600–1000 events minimum).

Verification Checklist

Frequently Asked Questions

Can I use a plain Date.now() timestamp as the event ID?

Only in single-process, single-deployment scenarios where you can guarantee the clock never rolls back and you restart with a counter seeded from the last persisted value. In practice this means Date.now() alone is fragile: two events in the same millisecond collide, NTP corrections can go backwards, and a counter that resets to the current epoch on startup will produce IDs lower than those a client holds from a previous server run. Use ULID's monotonicFactory instead — it uses Date.now() as its timestamp component but handles same-millisecond collisions and is drop-in compatible.

What NODE_ID should I assign to Snowflake generators in a Kubernetes deployment?

Assign node IDs via a Kubernetes StatefulSet ordinal (the pod name ends in -0, -1, …) or via a Redis SETNX lease at startup: each pod claims the first unclaimed integer in the range 0–1023. Store the claimed ID in the pod's local state and release it on shutdown. For Deployments (not StatefulSets), use a Redis sorted set: ZADD nodes:pool NX <timestamp> <nodeId> to atomically claim an ID. A 10-bit node field supports up to 1024 concurrent nodes (IDs 0–1023); if you need more, widen the node field to 12 bits and narrow the sequence field to 10 bits.
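
For the StatefulSet case, the ordinal can be read from the pod name at startup. The following is a minimal sketch; nodeIdFromPodName is a hypothetical helper, and it assumes the pod name is made available to the process (for example through an environment variable populated from metadata.name).

```javascript
// Hypothetical helper: derive a Snowflake node ID from a StatefulSet
// pod name such as "sse-gateway-3".
function nodeIdFromPodName(podName, maxNodeId = 1023) {
  const match = /-(\d+)$/.exec(podName);
  if (!match) {
    throw new Error(`no ordinal suffix in pod name: ${podName}`);
  }
  const nodeId = Number(match[1]);
  if (nodeId > maxNodeId) {
    // The ordinal would not fit in the 10-bit node field.
    throw new Error(`ordinal ${nodeId} exceeds max node ID ${maxNodeId}`);
  }
  return nodeId;
}
```

Failing fast on an out-of-range ordinal matters here: silently masking it to 10 bits would make two pods share a node ID and emit colliding event IDs.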

Does the ULID monotonicFactory guarantee uniqueness across multiple processes?

No. The monotonic variant guarantees uniqueness within a single factory instance (single process/thread). Across processes, the 80-bit random component makes collisions astronomically unlikely (probability ≈ 1/2^80 per millisecond per pair of processes), but it is not guaranteed. For absolute uniqueness across nodes, use Redis INCR or assign each process a unique node ID and use Snowflake. In practice, ULID's collision probability is lower than hardware failure rates, so most production teams accept it for fan-out workloads where coordinating via Redis would add unacceptable latency.

How do I migrate from UUID event IDs to ULID without breaking existing clients?

Run a dual-emit period: emit both the old UUID (in the payload) and the new ULID as the id: field value. Existing clients holding UUID-format Last-Event-ID values will send those on reconnect. Add a migration handler that detects UUID-format IDs (regex /^[0-9a-f-]{36}$/), maps them to the ULID of the corresponding event via a lookup table in Redis, and resumes from there. After all clients have reconnected at least once and now hold ULID-format IDs (observable via metrics), remove the UUID lookup table and the dual-emit logic.
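
The detection step can be made stricter than the regex above, which would also accept a string of 36 hyphens and rejects uppercase UUIDs. The sketch below is one possible classifier; classifyLastEventId and its return labels are hypothetical, and the ULID pattern assumes the canonical uppercase Crockford base32 encoding.

```javascript
// 8-4-4-4-12 hex groups, case-insensitive.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
// 26 Crockford base32 characters (no I, L, O, U); the first is 0-7.
const ULID_RE = /^[0-7][0-9A-HJKMNP-TV-Z]{25}$/;

// Hypothetical helper: decide which resume path a reconnect should take.
function classifyLastEventId(value) {
  if (!value) return 'none';        // fresh connection, no header sent
  if (UUID_RE.test(value)) return 'uuid';   // legacy client: use lookup table
  if (ULID_RE.test(value)) return 'ulid';   // migrated client: normal replay
  return 'unknown';                 // malformed: treat as out of window
}
```

Counting the 'uuid' results over time gives the metric that tells you when the lookup table can be removed.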

Should the retry: field value influence ID generation frequency?

No — the retry: field controls the client's reconnect delay, not how often events are emitted. However, your replay window size should be calibrated against the retry: value: if you set retry: 30000 (30 seconds) and emit 10 events/second, a client that disconnects and waits the full retry delay needs at least 300 events in the replay window to guarantee zero data loss. Size the window at 2× the retry value times the peak event rate. See Event ID & Retry Mechanism Design for the full interaction between retry intervals and replay windows.
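
The sizing rule is simple arithmetic and can be expressed as a small function. This is a sketch of the calculation described above; replayWindowSize and its parameter names are hypothetical.

```javascript
// Hypothetical helper: minimum number of events the replay window should
// retain, given the retry delay and the peak emission rate.
function replayWindowSize(retryMs, peakEventsPerSec, safetyFactor = 2) {
  // Events published during one full retry delay, scaled by the safety margin.
  return Math.ceil((retryMs / 1000) * peakEventsPerSec * safetyFactor);
}
```

For the example in the answer, replayWindowSize(30000, 10) gives 600: 300 events cover the 30-second delay, and the factor of 2 doubles that as headroom. The result maps directly onto the MAXLEN threshold of the Redis Stream.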