How to Parse the text/event-stream MIME Type Correctly
The text/event-stream wire format looks trivial until production hits you with partial TCP chunks, mixed line endings, multi-line data: fields, and a UTF-8 BOM the browser silently strips but your custom parser chokes on. This guide walks through every rule the WHATWG HTML specification imposes on a conformant event stream parser and shows a production-ready implementation in JavaScript and Go.
Symptom & Developer Intent
You are consuming an SSE endpoint without the browser's native EventSource, whether in a Node.js backend, a React component using fetch, or a CLI tool. You observe one or more of these failure modes:
- evt.data is undefined or the raw unparsed line (e.g., "data: {\"status\":\"ok\"}") instead of the payload.
- Multi-line data: blocks arrive as separate events rather than one joined string.
- The parser hangs or emits stale state after a heartbeat comment line (: ping).
- Events are dropped at chunk boundaries: fields split across two TCP packets produce one empty event and one half-parsed event.
- id and retry fields are silently ignored, breaking event ID and retry mechanism semantics on reconnect.
The goal is a deterministic, zero-allocation-waste parser that converts an unbounded byte stream into a sequence of typed event objects with data, event, id, and retry fields.
Root Cause Analysis
The protocol is line-framed, not length-prefixed
text/event-stream uses newlines as delimiters. A TCP stack may deliver any number of bytes per read; there is no guarantee that a single read() call aligns with a logical line. Parsers that call response.text() or split the full response body once fail because:
- response.text() buffers the entire body and only resolves on connection close, so real-time delivery is lost.
- Splitting a single value chunk on "\n" without a persistent cross-chunk buffer loses the tail of every chunk that ends mid-field.
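The difference is easy to reproduce without a network. The sketch below feeds the same two chunks to a stateless splitter and to one that carries the incomplete tail forward. The helper names naiveSplit and createLineBuffer are hypothetical, and the example assumes LF-only line endings for brevity.

```javascript
// Stateless: each chunk is split on its own, so a line cut in half
// by a chunk boundary shows up as two bogus lines.
function naiveSplit(chunk) {
  return chunk.split('\n');
}

// Stateful: the text after the last newline is kept until the next chunk.
function createLineBuffer() {
  let tail = '';
  return function push(chunk) {
    const parts = (tail + chunk).split('\n');
    tail = parts.pop(); // incomplete line (possibly empty)
    return parts;       // complete lines only
  };
}

// One logical line delivered in two reads.
const chunks = ['data: {"sta', 'tus":"ok"}\n\n'];

const naiveLines = chunks.flatMap(naiveSplit);
const pushChunk = createLineBuffer();
const bufferedLines = chunks.flatMap((c) => pushChunk(c));

console.log(JSON.stringify(naiveLines));
// ["data: {\"sta","tus\":\"ok\"}","",""]
console.log(JSON.stringify(bufferedLines));
// ["data: {\"status\":\"ok\"}",""]
```

The buffered version yields the complete data line followed by the blank line that triggers dispatch; the naive version yields two fragments that no field parser can interpret.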
The spec mandates specific whitespace rules
The WHATWG spec (§9.2.6) states: if the line contains a U+003A COLON, the field name is the substring before the colon, and the field value is the substring after, with exactly one leading U+0020 SPACE stripped if present. Not trimmed, not all spaces: exactly one. Stripping all whitespace silently corrupts field values that start with a space intentionally.
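As a minimal sketch of that rule, the hypothetical helper parseFieldLine below splits one non-empty, non-comment line at the first colon and removes at most one leading space from the value. It is an illustration of the rule, not the parser built later in this guide.

```javascript
// Splits one non-empty, non-comment line into [field, value].
function parseFieldLine(line) {
  const colonPos = line.indexOf(':');
  if (colonPos === -1) return [line, '']; // no colon: whole line is the field name
  const field = line.slice(0, colonPos);
  let value = line.slice(colonPos + 1);
  if (value.startsWith(' ')) value = value.slice(1); // one space only, never trim()
  return [field, value];
}

// Two spaces after the colon: one is stripped, one belongs to the value.
const twoSpaces = parseFieldLine('data:' + ' '.repeat(2) + 'indented');
// No space after the colon: nothing is stripped.
const noSpace = parseFieldLine('data:no-space');
// Only the first colon separates field from value.
const extraColon = parseFieldLine('data: a: b');

console.log(JSON.stringify(twoSpaces));  // ["data"," indented"]
console.log(JSON.stringify(noSpace));    // ["data","no-space"]
console.log(JSON.stringify(extraColon)); // ["data","a: b"]
```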
Line-ending normalization is required
The spec requires that \r\n and \r be treated identically to \n. Parsers that only split on "\n" leave a trailing \r on every field value when the server (commonly on Windows or some Go frameworks) uses CRLF.
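The effect on a CRLF stream can be seen with two split calls. This is a small illustration using a made-up wire string, not part of the parser itself.

```javascript
// A CRLF-terminated event as it might arrive on the wire.
const wire = 'event: tick\r\ndata: 1\r\n\r\n';

// Splitting on LF only leaves a \r glued to every line.
const lfOnly = wire.split('\n');

// Splitting on CRLF, lone CR, or LF yields clean lines.
// \r\n must come first in the alternation so the pair is consumed as one ending.
const anyEnding = wire.split(/\r\n|\r|\n/);

console.log(JSON.stringify(lfOnly));
// ["event: tick\r","data: 1\r","\r",""]
console.log(JSON.stringify(anyEnding));
// ["event: tick","data: 1","",""]
```

Note the third element of the LF-only result: the blank line that should dispatch the event is "\r", so a check such as line === '' never matches and the event is never emitted.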
BOM on the first chunk must be stripped
A UTF-8 BOM (\xEF\xBB\xBF, decoded as U+FEFF) prepended by some servers or CDN edge nodes must be stripped before parsing. The TextDecoder API strips the BOM when constructed with { ignoreBOM: false } (the default), but a raw Buffer.toString('utf8') in Node.js does not.
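The three behaviors can be compared side by side. The sketch below assumes a Node.js runtime (it uses the global Buffer) and a hand-built byte array standing in for the first chunk of a response.

```javascript
// EF BB BF is the UTF-8 BOM, followed by the bytes for "data: hi".
const bytes = new Uint8Array([0xef, 0xbb, 0xbf, ...Buffer.from('data: hi', 'utf8')]);

// Default options (ignoreBOM: false): the BOM is removed from the output.
const viaDecoder = new TextDecoder('utf-8').decode(bytes);

// ignoreBOM: true: the BOM is kept as U+FEFF at the start of the string.
const viaKeepBom = new TextDecoder('utf-8', { ignoreBOM: true }).decode(bytes);

// Buffer#toString: the BOM is also kept.
const viaBuffer = Buffer.from(bytes).toString('utf8');

console.log(viaDecoder === 'data: hi');            // true
console.log(viaKeepBom.charCodeAt(0) === 0xfeff);  // true
console.log(viaBuffer.charCodeAt(0) === 0xfeff);   // true
```

With the BOM left in place, the first line's field name becomes "\uFEFFdata" and falls through to the unknown-field branch, so the first event loses its data.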
Comment lines must be ignored, not break state
Lines beginning with : are comments (used as heartbeats). Parsers that treat them as unknown fields and emit a partial event reset field state at the wrong moment, causing the next real event to inherit a half-initialized event or id field.
| Failure pattern | Root cause |
|---|---|
| data is raw "data: ..." string | Not stripping field prefix; not buffering |
| Dropped event at chunk boundary | No persistent cross-chunk buffer |
| Multi-line data split into N events | Not accumulating data: lines before empty-line dispatch |
| Trailing \r in field values | Missing \r\n normalization |
| id/retry ignored | Switch/case doesn't handle those field names |
| Parser breaks after : ping | Comment line incorrectly resets eventObj |
Step-by-Step Resolution
Step 1: Open a streaming reader, not a buffered body consumer
const response = await fetch(url, {
  headers: { Accept: 'text/event-stream' },
  // signal: abortController.signal // attach for cleanup
});
if (!response.ok) {
  throw new Error(`HTTP ${response.status} ${response.statusText}`);
}
const contentType = response.headers.get('content-type') ?? '';
if (!contentType.includes('text/event-stream')) {
  throw new TypeError(`Expected text/event-stream, got: ${contentType}`);
}
const reader = response.body.getReader();
Never use response.json() or response.text(): both buffer to EOF. response.body.getReader() gives a ReadableStreamDefaultReader that yields Uint8Array chunks as they arrive.
Step 2: Decode bytes with a stateful TextDecoder and strip the BOM
// The default ignoreBOM: false makes the decoder strip a leading U+FEFF.
// (ignoreBOM: true would keep the BOM in the decoded output.)
const decoder = new TextDecoder('utf-8');
let buffer = ''; // persistent cross-chunk accumulator
Call decoder.decode(chunk, { stream: true }) for every chunk so that multi-byte UTF-8 sequences straddling chunk boundaries are completed across calls. Note that stream is an option of decode(), not of the TextDecoder constructor.
Step 3: Normalize line endings
function normalizeLineEndings(str) {
  // Replace \r\n then lone \r so both become \n
  return str.replace(/\r\n/g, '\n').replace(/\r/g, '\n');
}
Apply this immediately after decoding each chunk, before appending to the buffer. Normalizing once per chunk is cheaper than checking each character during field parsing. One caveat: a \r\n pair can itself be split across two chunks, in which case per-chunk normalization produces two newlines and a spurious blank line, so hold back a trailing \r until the next chunk arrives.
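A chunk that ends in \r is ambiguous: it may be a lone CR line ending or the first half of a CRLF pair. One way to resolve this, sketched below with a hypothetical createSplitter helper, is to keep a trailing \r in the pending buffer and decide once more text arrives.

```javascript
function createSplitter() {
  let pending = ''; // incomplete line, possibly ending in a held-back \r
  return function push(text) {
    let input = pending + text;
    let held = '';
    if (input.endsWith('\r')) {
      // Could be the first half of \r\n; decide when more text arrives.
      held = '\r';
      input = input.slice(0, -1);
    }
    const lines = input.split(/\r\n|\r|\n/);
    pending = lines.pop() + held; // tail after the last line ending
    return lines;                 // complete lines only
  };
}

const pushText = createSplitter();
const out = [
  ...pushText('data: a\r'),        // chunk ends in the middle of a CRLF
  ...pushText('\ndata: b\r\n'),    // completes the pair, then a full line
  ...pushText('\r\n'),             // blank line that ends the event
];

console.log(JSON.stringify(out)); // ["data: a","data: b",""]
```

The trade-off is that a line terminated by a lone \r at the very end of a chunk is delivered one chunk late, which is usually acceptable for SSE since CRLF and LF are far more common on the wire.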
Step 4: Split lines, preserve incomplete tail
buffer += normalizeLineEndings(decoder.decode(chunk, { stream: true }));
const lines = buffer.split('\n');
// The last element is an incomplete line (no trailing \n yet).
// Pop it back into buffer for the next iteration.
buffer = lines.pop() ?? '';
This is the critical invariant: buffer always holds a potentially incomplete line that will be completed by future chunks.
Step 5: Initialize per-event state and iterate lines
// Reset at each event boundary (empty line)
let dataLines = [];
let eventType = 'message'; // default per spec
let lastEventId = null;
let retryMs = null;
Initializing eventType to 'message' matches the spec default. The id field must persist across events (it is the "last event ID" for reconnect); only change it when a new id: field is encountered, never on each event boundary.
Step 6: Parse each field using a colon split, strip exactly one leading space
for (const line of lines) {
  // Comment: ignore entirely, do not reset state
  if (line.startsWith(':')) continue;
  // Empty line: dispatch event if data accumulated
  if (line === '') {
    if (dataLines.length > 0) {
      yield {
        data: dataLines.join('\n'),
        event: eventType,
        id: lastEventId,
        retry: retryMs,
      };
    }
    // Reset per-event fields (but NOT lastEventId, which persists)
    dataLines = [];
    eventType = 'message';
    retryMs = null;
    continue;
  }
  const colonPos = line.indexOf(':');
  if (colonPos === -1) {
    // Spec: a line with no colon is a field name with an empty value
    // (relevant for a bare "data" line)
    handleField(line, '');
    continue;
  }
  const field = line.slice(0, colonPos);
  // Strip exactly one leading space per spec §9.2.6
  const rawValue = line.slice(colonPos + 1);
  const value = rawValue.startsWith(' ') ? rawValue.slice(1) : rawValue;
  handleField(field, value);
}
function handleField(field, value) {
  switch (field) {
    case 'data': dataLines.push(value); break;
    case 'event': eventType = value; break;
    case 'id':
      // Spec: ignore an id containing U+0000; an empty string clears the ID
      if (!value.includes('\0')) lastEventId = value;
      break;
    case 'retry':
      // Spec: accept the value only if it consists solely of ASCII digits
      // (parseInt alone would also accept input such as "12abc")
      if (/^[0-9]+$/.test(value)) retryMs = parseInt(value, 10);
      break;
    // Unknown fields are ignored per spec
  }
}
Step 7: Assemble the complete async generator
async function* parseSSEStream(response) {
  const reader = response.body.getReader();
  // Default ignoreBOM (false) strips a leading U+FEFF
  const decoder = new TextDecoder('utf-8');
  let buffer = '';
  let dataLines = [];
  let eventType = 'message';
  let lastEventId = null;
  let retryMs = null;
  try {
    while (true) {
      const { done, value } = await reader.read();
      // On EOF, flush any bytes the decoder is still holding
      buffer += done ? decoder.decode() : decoder.decode(value, { stream: true });
      // Hold back a trailing \r until more data arrives: it may be half of \r\n
      const held = !done && buffer.endsWith('\r') ? '\r' : '';
      const lines = buffer.slice(0, buffer.length - held.length).split(/\r\n|\r|\n/);
      buffer = (lines.pop() ?? '') + held;
      for (const line of lines) {
        if (line.startsWith(':')) continue; // comment / heartbeat
        if (line === '') {
          if (dataLines.length > 0) {
            yield { data: dataLines.join('\n'), event: eventType, id: lastEventId, retry: retryMs };
          }
          dataLines = [];
          eventType = 'message';
          retryMs = null;
          continue;
        }
        const colonPos = line.indexOf(':');
        const field = colonPos === -1 ? line : line.slice(0, colonPos);
        const rawVal = colonPos === -1 ? '' : line.slice(colonPos + 1);
        const val = rawVal.startsWith(' ') ? rawVal.slice(1) : rawVal;
        switch (field) {
          case 'data': dataLines.push(val); break;
          case 'event': eventType = val; break;
          case 'id': if (!val.includes('\0')) lastEventId = val; break;
          case 'retry': if (/^[0-9]+$/.test(val)) retryMs = parseInt(val, 10); break;
        }
      }
      if (done) break; // an unterminated trailing event is discarded
    }
  } finally {
    reader.cancel(); // cancel the stream so the connection is released on throw/return
  }
}
// Caller
const res = await fetch('/api/events', { headers: { Accept: 'text/event-stream' } });
for await (const evt of parseSSEStream(res)) {
  console.log(evt.event, JSON.parse(evt.data));
}
Step 8: Equivalent Go implementation
For server-side fan-out or CLI tools consuming Redis Pub/Sub fan-out streams, a Go parser wraps bufio.Scanner:
package sseparse

import (
	"bufio"
	"io"
	"strconv"
	"strings"
)

type Event struct {
	Data  string
	Type  string
	ID    string
	Retry int // milliseconds, 0 = not set
}

// ParseStream reads from r and sends events on ch until EOF or error.
func ParseStream(r io.Reader, ch chan<- Event) error {
	scanner := bufio.NewScanner(r)
	scanner.Buffer(make([]byte, 64*1024), 1024*1024) // 1 MB max line
	var (
		dataLines []string
		eventType = "message"
		lastID    string
		retryMs   int
	)
	for scanner.Scan() {
		line := scanner.Text() // bufio strips \n; handles \r\n too
		if strings.HasPrefix(line, ":") {
			continue // comment
		}
		if line == "" {
			if len(dataLines) > 0 {
				ch <- Event{
					Data:  strings.Join(dataLines, "\n"),
					Type:  eventType,
					ID:    lastID,
					Retry: retryMs,
				}
			}
			dataLines = nil
			eventType = "message"
			retryMs = 0
			continue
		}
		field, value, _ := strings.Cut(line, ":")
		if strings.HasPrefix(value, " ") {
			value = value[1:] // strip exactly one leading space
		}
		switch field {
		case "data":
			dataLines = append(dataLines, value)
		case "event":
			eventType = value
		case "id":
			lastID = value
		case "retry":
			if ms, err := strconv.Atoi(value); err == nil {
				retryMs = ms
			}
		}
	}
	return scanner.Err()
}
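With its default ScanLines split function, bufio.Scanner removes the trailing \n and an optional preceding \r, so LF and CRLF streams both work. Lone \r line endings and a leading BOM are not handled by this version; if your server emits them, add a custom split function or a pre-pass. The strings.Cut call correctly handles fields that have no colon (returns field=line, value="", found=false). Note that strconv.Atoi also accepts a leading sign, which is looser than the digits-only rule the spec applies to retry.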
Validation & Monitoring
curl smoke test
# -N disables response buffering; see raw wire format
curl -N -H "Accept: text/event-stream" https://example.com/api/events
# Expected output for a valid stream:
# data: {"status":"ok"}
#
# event: heartbeat
# data: ping
# id: 42
#
Unit test fixtures for edge cases
import { describe, it, expect } from 'vitest';
// Helper: turn a string into a single-chunk ReadableStream response
function makeResponse(body) {
  const stream = new ReadableStream({
    start(c) { c.enqueue(new TextEncoder().encode(body)); c.close(); }
  });
  return new Response(stream, { headers: { 'content-type': 'text/event-stream' } });
}
describe('parseSSEStream', () => {
  it('strips exactly one leading space from values', async () => {
    const events = [];
    // Two spaces follow the colon in the input
    for await (const e of parseSSEStream(makeResponse('data:  leading\n\n')))
      events.push(e);
    expect(events[0].data).toBe(' leading'); // one space stripped, one remains
  });
  it('joins multi-line data with newline', async () => {
    const payload = 'data: line1\ndata: line2\n\n';
    const events = [];
    for await (const e of parseSSEStream(makeResponse(payload))) events.push(e);
    expect(events[0].data).toBe('line1\nline2');
  });
  it('ignores comment lines without resetting state', async () => {
    const payload = 'data: hello\n: heartbeat\ndata: world\n\n';
    const events = [];
    for await (const e of parseSSEStream(makeResponse(payload))) events.push(e);
    expect(events[0].data).toBe('hello\nworld');
  });
  it('handles CRLF line endings', async () => {
    const events = [];
    for await (const e of parseSSEStream(makeResponse('data: crlf\r\n\r\n'))) events.push(e);
    expect(events[0].data).toBe('crlf');
  });
});
DevTools verification
- Open Network tab, filter by Fetch/XHR, click the SSE request.
- Under Headers, confirm content-type: text/event-stream (not application/octet-stream or text/plain).
- Under Response (Chrome) or EventStream (Firefox), confirm individual events appear as rows, not a single blob of text.
- Use Copy as cURL, replay with curl -N, and pipe through cat -A to reveal hidden \r characters: curl -N ... | cat -A.
Browser EventSource handles all parsing automatically; only custom fetch-based parsers need the steps above. For multi-line data: field formatting rules on the server side, see the companion guide.
Verification Checklist
Production Directives
- Never use response.text() or response.json() for SSE: they buffer to EOF and destroy real-time delivery.
- Construct TextDecoder with its default ignoreBOM: false so a leading BOM is stripped, and pass { stream: true } to every decode() call to handle multi-byte UTF-8 split across chunks.
- Normalize line endings (\r\n → \n, \r → \n) immediately after decode, before any field parsing, and hold back a trailing \r in case it is the first half of a \r\n pair.
- Cap your cross-chunk buffer at a sane limit (e.g., 1 MB) and abort the stream on overflow to prevent OOM from malformed or adversarial servers.
- Call reader.cancel() in a finally block so the underlying TCP connection is released on generator return or throw.
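The buffer cap is the one directive the generator in Step 7 does not implement. A minimal sketch of such a guard follows; the name assertTailWithinLimit and the 1 MB default are assumptions for illustration, and the limit is counted in UTF-16 code units rather than bytes.

```javascript
const MAX_TAIL_CHARS = 1024 * 1024; // assumed limit; tune for your payload sizes

// Throws if the incomplete line carried between chunks has grown past the limit.
function assertTailWithinLimit(tail, limit = MAX_TAIL_CHARS) {
  if (tail.length > limit) {
    throw new RangeError(`SSE line exceeded ${limit} characters without a line ending`);
  }
  return tail;
}

// Intended placement inside the read loop, right after splitting:
//   buffer = assertTailWithinLimit(lines.pop() ?? '');
// Throwing inside the generator runs its finally block, which cancels the reader.

let overflowDetected = false;
try {
  assertTailWithinLimit('x'.repeat(11), 10);
} catch (err) {
  overflowDetected = err instanceof RangeError;
}
console.log(overflowDetected); // true
```

This only bounds a single unterminated line. A server that sends endless data: lines without a blank line grows dataLines instead, so a similar check on the accumulated data size may be worth adding.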
Frequently Asked Questions
Why does my parser drop the first event after a heartbeat comment?
Comment lines (: ...) must be silently discarded: they must not trigger the empty-line dispatch or reset accumulated data lines. If your parser resets dataLines on any unrecognized line, a : ping heartbeat before the first real data: line will clear partially-accumulated state. Check that if (line.startsWith(':')) continue; is the very first check, before the empty-line check.
What happens when a field has no colon, e.g., a bare "data" line?
The WHATWG spec says: if the line contains no colon, the entire line is the field name and the value is the empty string. So a bare data line appends an empty string to dataLines. This is rarely intentional but must not crash the parser. Handle it by checking colonPos === -1 and treating value as "".
Should I reset lastEventId to null on each event dispatch?
No. The id field sets the "last event ID" that the browser sends back in Last-Event-ID on reconnect. It persists until explicitly overwritten by a new id: field. An id: line with an empty value clears the ID. Never reset it on empty-line dispatch or you lose reconnect continuity, which is exactly the behavior event ID and retry mechanism design depends on.
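A custom fetch-based client has to send that header itself when it reconnects. The sketch below shows one way to build the request headers; buildReconnectHeaders is a hypothetical helper, and it follows the browser behavior of omitting the header when no ID is set or the ID has been cleared.

```javascript
// lastEventId: null if no id: field was ever seen, '' if the server cleared it,
// otherwise the most recent id value.
function buildReconnectHeaders(lastEventId) {
  const headers = { Accept: 'text/event-stream' };
  if (lastEventId !== null && lastEventId !== '') {
    headers['Last-Event-ID'] = lastEventId;
  }
  return headers;
}

const resumed = buildReconnectHeaders('42');
const fresh = buildReconnectHeaders(null);
const cleared = buildReconnectHeaders('');

console.log(resumed['Last-Event-ID']);   // 42
console.log('Last-Event-ID' in fresh);   // false
console.log('Last-Event-ID' in cleared); // false
```

The returned object can be passed as the headers option of fetch on the reconnect attempt, with lastEventId taken from the id of the last event the generator yielded.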
Does the parser need to handle chunked transfer encoding explicitly?
No. HTTP chunked transfer encoding is handled transparently by the browser's Fetch layer and Node.js's http module. By the time bytes reach response.body.getReader(), transfer-encoding chunk boundaries are invisible: you see a stream of application-layer bytes. The only boundaries you must handle are SSE's own line-based delimiters. See Buffer Management & Chunked Transfer Encoding for server-side considerations.
Can I use this parser with the Fetch API in Node.js 18+?
Yes. Node.js 18+ ships a native fetch with a Web Streams-compatible response.body. The async generator above works without modification. For earlier Node.js versions, use node-fetch v3 (ESM only) or consume the raw http.IncomingMessage as a Node.js Readable stream with a readline.createInterface equivalent.