How to Parse the text/event-stream MIME Type Correctly

Part of Understanding the Event Stream Format.

The text/event-stream wire format looks trivial until production hits you with partial TCP chunks, mixed line endings, multi-line data: fields, and a UTF-8 BOM the browser silently strips but your custom parser chokes on. This guide walks through every rule the WHATWG HTML specification imposes on a conformant event stream parser and shows a production-ready implementation in JavaScript and Go.

Symptom & Developer Intent

You are consuming an SSE endpoint without the browser’s native EventSource, whether in a Node.js backend, a React component using fetch, or a CLI tool. You observe one or more of these failure modes:

  • evt.data is undefined or the raw unparsed line (e.g., "data: {\"status\":\"ok\"}") instead of the payload.
  • Multi-line data: blocks arrive as separate events rather than one joined string.
  • The parser hangs or emits stale state after a heartbeat comment line (: ping).
  • Events are dropped at chunk boundaries: fields split across two TCP packets produce one empty event and one half-parsed event.
  • id and retry fields are silently ignored, breaking event ID and retry mechanism semantics on reconnect.

The goal is a deterministic, zero-allocation-waste parser that converts an unbounded byte stream into a sequence of typed event objects with data, event, id, and retry fields.

Root Cause Analysis

The protocol is line-framed, not length-prefixed

text/event-stream uses newlines as delimiters. A TCP stack may deliver any number of bytes per read; there is no guarantee that a single read() call aligns with a logical line. Parsers that call response.text() or split the full response body once fail because:

  1. response.text() buffers the entire body and only resolves on connection close, so real-time delivery is lost.
  2. Splitting a single value chunk on "\n" without a persistent cross-chunk buffer loses the tail of every chunk that ends mid-field.
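
To make the second failure concrete, here is a minimal sketch that feeds the same two chunks through both approaches. The helper names naiveLines and bufferedLines and the sample chunks are illustrative assumptions, not part of any library.

```javascript
// Two chunks as a TCP stack might deliver them: the split lands mid-field.
const chunks = ['data: {"status"', ':"ok"}\n\n'];

// Naive: split every chunk on its own, so the tail of chunk 1 is
// treated as if it were a complete line.
function naiveLines(parts) {
  return parts.flatMap((part) => part.split('\n'));
}

// Buffered: carry the unterminated tail over to the next chunk.
function bufferedLines(parts) {
  const out = [];
  let buffer = '';
  for (const part of parts) {
    buffer += part;
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // last element has no terminator yet
    out.push(...lines);
  }
  return out;
}

console.log(naiveLines(chunks));    // -> ['data: {"status"', ':"ok"}', '', '']
console.log(bufferedLines(chunks)); // -> ['data: {"status":"ok"}', '']
```

Note that the second fragment in the naive output starts with a colon, so a parser that checks for comments would silently discard it.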

The spec mandates specific whitespace rules

The WHATWG spec (§9.2.6) states: if the line contains a U+003A COLON, the field name is the substring before the first colon, and the field value is the substring after it, with exactly one leading U+0020 SPACE stripped if present. Not trimmed, not all spaces: exactly one. Stripping all whitespace silently corrupts field values that intentionally start with a space.
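
As a rough illustration of that rule in isolation, the sketch below splits a single line into field and value. The name parseFieldLine is an assumption made for this example, and the caller is expected to have filtered out comment lines and empty lines beforehand.

```javascript
// Sketch of the field-splitting rule; `parseFieldLine` is an illustrative name.
function parseFieldLine(line) {
  const colonPos = line.indexOf(':'); // first colon only
  if (colonPos === -1) {
    // No colon: the whole line is the field name, the value is empty.
    return { field: line, value: '' };
  }
  const raw = line.slice(colonPos + 1);
  return {
    field: line.slice(0, colonPos),
    // Remove at most one leading space; any further spaces are payload.
    value: raw.startsWith(' ') ? raw.slice(1) : raw,
  };
}

console.log(parseFieldLine('data: hello').value);     // -> 'hello'
console.log(parseFieldLine('data:hello').value);      // -> 'hello'
console.log(parseFieldLine('data:  indented').value); // -> ' indented'
console.log(parseFieldLine('data: a: b').value);      // -> 'a: b'
console.log(parseFieldLine('data').value);            // -> ''
```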

Line-ending normalization is required

The spec requires that \r\n and \r be treated identically to \n. Parsers that only split on "\n" leave a trailing \r on every field value when the server (commonly on Windows or some Go frameworks) uses CRLF.

BOM on the first chunk must be stripped

A UTF-8 BOM (\xEF\xBB\xBF, decoded as U+FEFF) prepended by some servers or CDN edge nodes must be stripped before parsing. The TextDecoder API strips the BOM when constructed with { ignoreBOM: false } (the default), but a raw Buffer.toString('utf8') in Node.js does not.
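
The difference between the two settings is easy to check directly. The sketch below decodes the same BOM-prefixed bytes with each one; the byte array is an illustrative input.

```javascript
// UTF-8 BOM (EF BB BF) followed by the ASCII bytes for "data".
const bomBytes = new Uint8Array([0xef, 0xbb, 0xbf, 0x64, 0x61, 0x74, 0x61]);

const stripping = new TextDecoder('utf-8');                   // ignoreBOM defaults to false
const keeping = new TextDecoder('utf-8', { ignoreBOM: true }); // BOM is passed through

const stripped = stripping.decode(bomBytes);
const kept = keeping.decode(bomBytes);

console.log(stripped === 'data');           // -> true
console.log(kept.length);                   // -> 5
console.log(kept.charCodeAt(0) === 0xfeff); // -> true
```

With the BOM left in place, the first line of the stream would have the field name "\uFEFFdata", which matches no known field and is dropped.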

Comment lines must be ignored, not break state

Lines beginning with : are comments (used as heartbeats). Parsers that treat them as unknown fields and emit a partial event reset field state at the wrong moment, causing the next real event to inherit a half-initialized event or id field.

Failure pattern                        Root cause
data is raw "data: ..." string         Not stripping field prefix; not buffering
Dropped event at chunk boundary        No persistent cross-chunk buffer
Multi-line data split into N events    Not accumulating data: lines before empty-line dispatch
Trailing \r in field values            Missing \r\n normalization
id/retry ignored                       Switch/case doesn’t handle those field names
Parser breaks after : ping             Comment line incorrectly resets eventObj

Step-by-Step Resolution

Step 1: Open a streaming reader, not a buffered body consumer

const response = await fetch(url, {
  headers: { Accept: 'text/event-stream' },
  // signal: abortController.signal  // attach for cleanup
});

if (!response.ok) {
  throw new Error(`HTTP ${response.status} ${response.statusText}`);
}

const contentType = response.headers.get('content-type') ?? '';
if (!contentType.includes('text/event-stream')) {
  throw new TypeError(`Expected text/event-stream, got: ${contentType}`);
}

const reader = response.body.getReader();

Never use response.json() or response.text(): both buffer to EOF. response.body.getReader() gives a ReadableStreamDefaultReader that yields Uint8Array chunks as they arrive.

Step 2: Decode bytes with a stateful TextDecoder and strip the BOM

// ignoreBOM defaults to false, which strips a leading U+FEFF from the output.
// { stream: true } belongs on each decode() call, not the constructor; it keeps
// decoder state across chunk boundaries (multi-byte UTF-8 split across chunks).
const decoder = new TextDecoder('utf-8');

let buffer = '';  // persistent cross-chunk accumulator

Call decoder.decode(chunk, { stream: true }) for every chunk so multi-byte sequences that straddle chunk boundaries are completed across calls.
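
A small sketch of what goes wrong without the stream option: the three UTF-8 bytes of the euro sign are cut after the second byte, as a chunk boundary might do. The split position is chosen purely for illustration.

```javascript
const bytes = new TextEncoder().encode('€'); // three bytes: E2 82 AC
const first = bytes.slice(0, 2);             // chunk 1 ends mid-character
const rest = bytes.slice(2);                 // chunk 2 carries the last byte

// Without stream: true each call is treated as a complete input,
// so the truncated sequence is replaced with U+FFFD.
const oneShot = new TextDecoder();
const broken = oneShot.decode(first) + oneShot.decode(rest);

// With stream: true the decoder remembers the pending bytes between calls.
const streaming = new TextDecoder();
const intact =
  streaming.decode(first, { stream: true }) +
  streaming.decode(rest, { stream: true });

console.log(broken.includes('\uFFFD')); // -> true
console.log(intact === '€');            // -> true
```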

Step 3: Normalize line endings

function normalizeLineEndings(str) {
  // Replace \r\n then lone \r so both become \n
  return str.replace(/\r\n/g, '\n').replace(/\r/g, '\n');
}

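Apply this to the decoded text before splitting it into lines. Normalizing once per chunk is cheaper than checking each character during field parsing. One caveat: a \r\n pair can be split across two chunks, in which case normalizing each chunk independently yields two newlines and a spurious empty line. If a chunk ends in \r, hold that character back and prepend it to the next chunk before normalizing.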

Step 4: Split lines, preserve incomplete tail

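let text = buffer + decoder.decode(chunk, { stream: true });
// Hold back a trailing \r: it may be the first half of a \r\n pair split across chunks.
const heldCR = text.endsWith('\r') ? '\r' : '';
if (heldCR) text = text.slice(0, -1);

const lines = normalizeLineEndings(text).split('\n');
// The last element is an incomplete line (no trailing \n yet).
// Keep it (plus any held \r) in buffer for the next iteration.
buffer = (lines.pop() ?? '') + heldCR;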

This is the critical invariant: buffer always holds a potentially incomplete line that will be completed by future chunks.
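
One way to package that invariant, sketched under the assumption that chunks have already been decoded to strings, is a small closure that owns the buffer. The name createLineSplitter is hypothetical. The sketch also holds back a trailing \r so that a \r\n pair split across two chunks is not counted as two line endings.

```javascript
// Illustrative helper: returns a function that accepts decoded text chunks
// and returns only the lines that are known to be complete.
function createLineSplitter() {
  let buffer = '';
  return function push(text) {
    let pending = buffer + text;
    // A trailing \r may be the first half of a \r\n pair; keep it for the next call.
    let heldCR = '';
    if (pending.endsWith('\r')) {
      heldCR = '\r';
      pending = pending.slice(0, -1);
    }
    const lines = pending.split(/\r\n|\r|\n/);
    buffer = (lines.pop() ?? '') + heldCR; // unterminated tail stays buffered
    return lines;
  };
}

const push = createLineSplitter();
console.log(push('data: a\r'));       // -> []  (the \r is held back)
console.log(push('\ndata: b\r\n\r')); // -> ['data: a', 'data: b']
console.log(push('\n'));              // -> ['']  (blank line: dispatch point)
```

The trade-off in this sketch is that a line terminated by a bare \r is only reported once the following chunk arrives.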

Step 5: Initialize per-event state and iterate lines

// Reset at each event boundary (empty line)
let dataLines = [];
let eventType = 'message';  // default per spec
let lastEventId = null;
let retryMs = null;

Initializing eventType to 'message' matches the spec default. The id field must persist across events (it is the "last event ID" for reconnect); only reset it when a new id: field is encountered, never on each event boundary.

Step 6: Parse each field using a colon split, strip exactly one leading space

for (const line of lines) {
  // Comment: ignore entirely, do not reset state
  if (line.startsWith(':')) continue;

  // Empty line: dispatch event if data accumulated
  if (line === '') {
    if (dataLines.length > 0) {
      yield {
        data: dataLines.join('\n'),
        event: eventType,
        id: lastEventId,
        retry: retryMs,
      };
    }
    // Reset per-event fields (but NOT lastEventId, which persists)
    dataLines = [];
    eventType = 'message';
    retryMs = null;
    continue;
  }

  const colonPos = line.indexOf(':');
  if (colonPos === -1) {
    // Spec: line with no colon: field name is the entire line, value is empty string
    // Treat as a field with empty value (relevant for bare "data" line)
    handleField(line, '');
    continue;
  }

  const field = line.slice(0, colonPos);
  // Strip exactly one leading space per spec §9.2.6
  const rawValue = line.slice(colonPos + 1);
  const value = rawValue.startsWith(' ') ? rawValue.slice(1) : rawValue;

  handleField(field, value);
}

function handleField(field, value) {
  switch (field) {
    case 'data':  dataLines.push(value); break;
    case 'event': eventType = value; break;
    case 'id':    lastEventId = value; break;  // empty string clears the ID
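    case 'retry':
      // Spec: accept the value only if it consists entirely of ASCII digits
      // (parseInt alone would accept "3000ms" or "-5")
      if (/^\d+$/.test(value)) retryMs = parseInt(value, 10);
      break;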
    // Unknown fields are ignored per spec
  }
}

Step 7: Assemble the complete async generator

async function* parseSSEStream(response) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder('utf-8'); // default ignoreBOM: false strips a leading U+FEFF

  let buffer = '';
  let dataLines = [];
  let eventType = 'message';
  let lastEventId = null;
  let retryMs = null;

  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

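      let text = buffer + decoder.decode(value, { stream: true });
      // Hold back a trailing \r: it may be the first half of a \r\n pair split across chunks
      const heldCR = text.endsWith('\r') ? '\r' : '';
      if (heldCR) text = text.slice(0, -1);
      const lines = text.split(/\r\n|\r|\n/);
      buffer = (lines.pop() ?? '') + heldCR;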

      for (const line of lines) {
        if (line.startsWith(':')) continue;  // comment / heartbeat

        if (line === '') {
          if (dataLines.length > 0) {
            yield { data: dataLines.join('\n'), event: eventType, id: lastEventId, retry: retryMs };
          }
          dataLines = [];
          eventType = 'message';
          retryMs = null;
          continue;
        }

        const colonPos = line.indexOf(':');
        const field = colonPos === -1 ? line : line.slice(0, colonPos);
        const rawVal = colonPos === -1 ? '' : line.slice(colonPos + 1);
        const val = rawVal.startsWith(' ') ? rawVal.slice(1) : rawVal;

        switch (field) {
          case 'data':  dataLines.push(val); break;
          case 'event': eventType = val; break;
          case 'id':    lastEventId = val; break;
          case 'retry': if (/^\d+$/.test(val)) retryMs = parseInt(val, 10); break; // ASCII digits only, per spec
        }
      }
    }
  } finally {
    // Cancel the stream so the connection is released on throw or early return
    await reader.cancel().catch(() => {});
  }
}

// Caller
const res = await fetch('/api/events', { headers: { Accept: 'text/event-stream' } });
for await (const evt of parseSSEStream(res)) {
  console.log(evt.event, JSON.parse(evt.data));
}

Step 8: Equivalent Go implementation

For server-side fan-out or CLI tools consuming Redis Pub/Sub fan-out streams, a Go parser wraps bufio.Scanner:

package sseparse

import (
    "bufio"
    "io"
    "strconv"
    "strings"
)

type Event struct {
    Data  string
    Type  string
    ID    string
    Retry int // milliseconds, 0 = not set
}

// ParseStream reads from r and sends events on ch until EOF or error.
func ParseStream(r io.Reader, ch chan<- Event) error {
    scanner := bufio.NewScanner(r)
    scanner.Buffer(make([]byte, 64*1024), 1024*1024) // 1 MB max line

    var (
        dataLines []string
        eventType = "message"
        lastID    string
        retryMs   int
    )

    for scanner.Scan() {
        line := scanner.Text() // bufio strips \n; handles \r\n too

        if strings.HasPrefix(line, ":") {
            continue // comment
        }

        if line == "" {
            if len(dataLines) > 0 {
                ch <- Event{
                    Data:  strings.Join(dataLines, "\n"),
                    Type:  eventType,
                    ID:    lastID,
                    Retry: retryMs,
                }
            }
            dataLines = nil
            eventType = "message"
            retryMs = 0
            continue
        }

        field, value, _ := strings.Cut(line, ":")
        if strings.HasPrefix(value, " ") {
            value = value[1:] // strip exactly one leading space
        }

        switch field {
        case "data":  dataLines = append(dataLines, value)
        case "event": eventType = value
        case "id":    lastID = value
        case "retry":
            if ms, err := strconv.Atoi(value); err == nil {
                retryMs = ms
            }
        }
    }
    return scanner.Err()
}

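bufio.Scanner's default ScanLines split function strips a trailing \r along with the \n, so CRLF input is handled automatically. It does not split on a lone \r, and it does not strip a leading BOM; a server that emits either needs a custom bufio.SplitFunc or a pre-processing step. The strings.Cut call correctly handles fields that have no colon (returns field=line, value="", found=false).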

Validation & Monitoring

curl smoke test

# -N (--no-buffer) disables curl's output buffering, so you see the raw wire format as it arrives
curl -N -H "Accept: text/event-stream" https://example.com/api/events

# Expected output for a valid stream:
# data: {"status":"ok"}
#
# event: heartbeat
# data: ping
# id: 42
#

Unit test fixtures for edge cases

import { describe, it, expect } from 'vitest';

// Helper: turn a string into a single-chunk ReadableStream response
function makeResponse(body) {
  const stream = new ReadableStream({
    start(c) { c.enqueue(new TextEncoder().encode(body)); c.close(); }
  });
  return new Response(stream, { headers: { 'content-type': 'text/event-stream' } });
}

describe('parseSSEStream', () => {
  it('strips exactly one leading space from values', async () => {
    const events = [];
    for await (const e of parseSSEStream(makeResponse('data:  leading\n\n')))
      events.push(e);
    expect(events[0].data).toBe(' leading'); // one space stripped, one remains
  });

  it('joins multi-line data with newline', async () => {
    const payload = 'data: line1\ndata: line2\n\n';
    const events = [];
    for await (const e of parseSSEStream(makeResponse(payload))) events.push(e);
    expect(events[0].data).toBe('line1\nline2');
  });

  it('ignores comment lines without resetting state', async () => {
    const payload = 'data: hello\n: heartbeat\ndata: world\n\n';
    const events = [];
    for await (const e of parseSSEStream(makeResponse(payload))) events.push(e);
    expect(events[0].data).toBe('hello\nworld');
  });

  it('handles CRLF line endings', async () => {
    const events = [];
    for await (const e of parseSSEStream(makeResponse('data: crlf\r\n\r\n'))) events.push(e);
    expect(events[0].data).toBe('crlf');
  });
});
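
The fixtures above deliver each body as a single chunk, so they never exercise a chunk boundary. A possible extension, sketched here with hypothetical helper names (makeChunkedResponse and splitEverywhere are not part of vitest or any library), enqueues several chunks and enumerates every two-way split of an ASCII payload. It assumes the ReadableStream and Response globals available in Node.js 18+ and browsers.

```javascript
// Illustrative fixture helper: one enqueue() call per chunk.
function makeChunkedResponse(chunks) {
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    start(controller) {
      for (const chunk of chunks) controller.enqueue(encoder.encode(chunk));
      controller.close();
    },
  });
  return new Response(stream, { headers: { 'content-type': 'text/event-stream' } });
}

// Every way of cutting `payload` into two non-empty pieces.
// Intended for ASCII payloads; cutting inside a surrogate pair would corrupt it.
function splitEverywhere(payload) {
  const pairs = [];
  for (let i = 1; i < payload.length; i++) {
    pairs.push([payload.slice(0, i), payload.slice(i)]);
  }
  return pairs;
}

console.log(splitEverywhere('a\n\n')); // -> [['a', '\n\n'], ['a\n', '\n']]
```

A test can then loop over splitEverywhere('data: crlf\r\n\r\n'), pass each pair to makeChunkedResponse, and assert that parseSSEStream yields the same single event regardless of where the cut falls.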

DevTools verification

  1. Open Network tab, filter by Fetch/XHR, click the SSE request.
  2. Under Headers, confirm content-type: text/event-stream (not application/octet-stream or text/plain).
  3. Under Response (Chrome) or EventStream (Firefox), confirm individual events appear as rows, not a single blob of text.
  4. Use Copy as cURL, replay with curl -N, and pipe through cat -A to reveal hidden \r characters: curl -N ... | cat -A.

Browser EventSource handles all parsing automatically; only custom fetch-based parsers need the steps above. For multi-line data: field formatting rules on the server side, see the companion guide.

Verification Checklist

⚑ Production Directives

  • Never use response.text() or response.json() for SSE: they buffer to EOF and destroy real-time delivery.
  • Construct TextDecoder with the default ignoreBOM: false so a leading BOM is stripped, and pass { stream: true } to every decode() call to handle multi-byte UTF-8 split across chunks.
  • Normalize line endings (\r\n → \n, \r → \n) immediately after decode, before any string splitting.
  • Cap your cross-chunk buffer at a sane limit (e.g., 1 MB) and abort the stream on overflow to prevent OOM from malformed or adversarial servers.
  • Call reader.cancel() in a finally block so the underlying TCP connection is released on generator return or throw.
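
The buffer cap can be sketched as a check on the unterminated tail after each split. The 1 MB figure and the names MAX_TAIL_CHARS and checkTail are assumptions for illustration; the limit is counted in UTF-16 code units, which only approximates bytes.

```javascript
// Assumed limit: about 1 MB for ASCII payloads; tune per deployment.
const MAX_TAIL_CHARS = 1024 * 1024;

// Returns the tail unchanged, or throws if a single unterminated line
// has grown past the limit (malformed or adversarial server).
function checkTail(tail, limit = MAX_TAIL_CHARS) {
  if (tail.length > limit) {
    throw new RangeError(`SSE line exceeds ${limit} characters without a line ending`);
  }
  return tail;
}

// Inside the read loop this would wrap the tail assignment, for example:
//   buffer = checkTail(lines.pop() ?? '');
console.log(checkTail('data: partial')); // -> 'data: partial'
```

Checking the tail rather than the whole chunk avoids rejecting a large chunk that simply contains many complete lines. Throwing inside the generator's try block also runs the finally clause, so the reader is cancelled.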

Frequently Asked Questions

Why does my parser drop the first event after a heartbeat comment?

Comment lines (: ...) must be silently discarded: they must not trigger the empty-line dispatch or reset accumulated data lines. If your parser resets dataLines on any unrecognized line, a : ping heartbeat before the first real data: line will clear partially-accumulated state. Check that if (line.startsWith(':')) continue; is the very first check, before the empty-line check.

What happens when a field has no colon, e.g., a bare "data" line?

The WHATWG spec says: if the line contains no colon, the entire line is the field name and the value is the empty string. So a bare data line appends an empty string to dataLines. This is rarely intentional but must not crash the parser. Handle it by checking colonPos === -1 and treating value as "".

Should I reset lastEventId to null on each event dispatch?

No. The id field sets the "last event ID" that the browser sends back in Last-Event-ID on reconnect. It persists until explicitly overwritten by a new id: field. An id: line with an empty value clears the ID. Never reset it on empty-line dispatch or you lose reconnect continuity, which is exactly what the event ID and retry mechanism depends on.
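
A fetch-based client does not send that header on its own, so a custom parser has to do it when reconnecting. A minimal sketch, assuming a hypothetical helper named buildRequestInit whose result is passed as the second argument to fetch:

```javascript
// Illustrative helper: builds fetch options for a connect or reconnect.
// `lastEventId` is the most recent id yielded by the parser (null or '' if none).
function buildRequestInit(lastEventId, signal) {
  const headers = { Accept: 'text/event-stream' };
  // Send the header only when an ID is known and non-empty,
  // mirroring what EventSource does on reconnect.
  if (lastEventId) headers['Last-Event-ID'] = lastEventId;
  return { headers, signal };
}

console.log(buildRequestInit('42').headers);
// -> { Accept: 'text/event-stream', 'Last-Event-ID': '42' }
console.log(buildRequestInit(null).headers);
// -> { Accept: 'text/event-stream' }
```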

Does the parser need to handle chunked transfer encoding explicitly?

No. HTTP chunked transfer encoding is handled transparently by the browser's Fetch layer and Node.js's http module. By the time bytes reach response.body.getReader(), chunk boundaries are invisible: you see a stream of application-layer bytes. The only boundaries you must handle are SSE's own line-based delimiters. See Buffer Management & Chunked Transfer Encoding for server-side considerations.

Can I use this parser with the Fetch API in Node.js 18+?

Yes. Node.js 18+ ships a native fetch with a Web Streams-compatible response.body. The async generator above works without modification. For earlier Node.js versions, use node-fetch v3 (ESM only) or consume the raw http.IncomingMessage as a Node.js Readable stream with a readline.createInterface equivalent.