Buffer Management & Chunked Transfer Encoding


Without explicit flush controls, application servers silently accumulate SSE response payloads in memory until the connection closes, turning a real-time stream into a batched dump. Chunked transfer encoding is the HTTP/1.1 mechanism that escapes this trap: it lets the server emit variable-length frames the moment data is ready, without knowing the final response size. For SSE, every call to flush() must immediately drive a TCP write; any layer in the stack that re-buffers those chunks destroys latency guarantees and can cause unbounded memory growth under sustained load. This guide covers the wire format, per-runtime flush APIs, proxy/CDN interference, watermark-based memory bounding, and the observability tools needed to verify correct behaviour end-to-end.

Figure: SSE data path: in-process write buffer (HWM: 16 KB) → chunked framer → Nginx proxy (proxy_buffering off, X-Accel-Buffering: no, proxy_http_version 1.1) → browser EventSource. Backpressure signals flow upstream when the write buffer is full.

How Chunked Transfer Encoding Works

HTTP/1.1 chunked transfer encoding (RFC 7230 §4.1, since absorbed into RFC 9112) allows a response body to be sent as a series of chunks without knowing the total size in advance. Each chunk is:

<hex-length>\r\n
<chunk-data>\r\n

A terminal zero-length chunk signals end-of-stream:

0\r\n
\r\n
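
The framing rule above can be sketched in a few lines of Python. This is only an illustration of the envelope; the names encode_chunk and LAST_CHUNK are made up for this example, and real HTTP servers apply the framing for you.

```python
def encode_chunk(data: bytes) -> bytes:
    """Wrap a payload in one HTTP/1.1 chunk: <hex-length> CRLF <data> CRLF."""
    if not data:
        # A zero-length chunk would be read as end-of-stream, so refuse it here.
        raise ValueError("empty payload; send LAST_CHUNK to end the stream")
    return f"{len(data):x}".encode("ascii") + b"\r\n" + data + b"\r\n"


# Terminal chunk: zero length followed by an empty trailer section.
LAST_CHUNK = b"0\r\n\r\n"
```

For instance, encode_chunk(b"data: hi\n\n") returns b"a\r\ndata: hi\n\n\r\n": the payload is 10 bytes, which is a in hexadecimal.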

The server MUST NOT set Content-Length alongside Transfer-Encoding: chunked; the two are mutually exclusive. Modern HTTP stacks (Node.js http, Go net/http, ASGI servers such as Uvicorn) activate chunked encoding automatically when the handler writes data without first setting Content-Length. Python's standard-library http.server is an exception: it leaves chunk framing to the handler.

For SSE the wire format layering looks like this for a single event:

HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Transfer-Encoding: chunked

15\r\n
data: {"temp":42.1}\n\n\r\n

15 hex = 21 bytes (the literal data: {"temp":42.1}\n\n, where each \n counts as one byte). The outer CRLF pair belongs to the chunked encoding envelope; the inner \n\n is the SSE event terminator described in Understanding the Event Stream Format.

Flush vs. Chunk boundary alignment

A critical misunderstanding: calling flush() does not guarantee that each event reaches the client as its own chunk or TCP segment. The TCP stack may coalesce several small writes if Nagle’s algorithm is active. Disable Nagle at the socket level (TCP_NODELAY) for SSE endpoints, or verify that your runtime does so automatically: Go's net package enables TCP_NODELAY on every TCP connection by default, and the Node.js http server has defaulted to noDelay: true since Node.js 18.
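
Where you manage the socket yourself, disabling Nagle is a single setsockopt call. A minimal Python sketch, assuming a plain TCP socket (the helper name disable_nagle is illustrative):

```python
import socket


def disable_nagle(sock: socket.socket) -> bool:
    """Set TCP_NODELAY on a TCP socket and report whether it is now enabled."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Read the option back instead of assuming the kernel accepted it.
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```

Reading the option back is a cheap way to confirm the setting in a startup check or integration test.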

| Layer | Flush call | Nagle disabled | Result |
|---|---|---|---|
| Node.js res.write() + res.flushHeaders() | yes (implicit) | yes (http module, Node.js 18+) | immediate TCP write |
| Go http.Flusher.Flush() | yes (explicit) | yes (net package default) | immediate TCP write after Flush() |
| Python WSGI | no native flush | depends on server | buffered until chunk threshold |
| Python ASGI (Starlette/FastAPI) | yes (yield) | depends on Uvicorn | immediate on yield |
| Nginx upstream | N/A | N/A | re-buffers unless proxy_buffering off |

Node.js Implementation

Node.js http.ServerResponse writes are chunked automatically. The two must-dos are: call res.flushHeaders() before emitting events (so the browser’s EventSource receives the Content-Type: text/event-stream header without waiting for data), and never let the stream pause without a heartbeat.

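import http from 'node:http';
import { EventEmitter } from 'node:events';

// Stand-in event source for this example; replace with your own producer
const eventBus = new EventEmitter();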

const server = http.createServer((req, res) => {
  if (req.url !== '/events') { res.end(); return; }

  // Headers must arrive before any data; flushHeaders() forces the flush
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'X-Accel-Buffering': 'no',   // tell Nginx not to buffer this response
    'Connection': 'keep-alive',
  });
  res.flushHeaders();

  let seq = 0;

  const send = (event, data) => {
    // Each write() call is framed as its own chunk and sent without waiting for
    // a buffer to fill; the return value is false once the write queue is over the HWM
    return res.write(`id: ${seq++}\nevent: ${event}\ndata: ${JSON.stringify(data)}\n\n`);
  };

  // Heartbeat every 15s prevents proxy idle-timeout disconnection
  const heartbeat = setInterval(() => res.write(': heartbeat\n\n'), 15_000);

  // Drain detection: write() returns false once buffered data reaches the high-water mark;
  // 'drain' fires when the queue has emptied and it is safe to write again
  res.on('drain', () => {
    // resume upstream producer here
  });

  // Example: push data from an event emitter
  const onData = (payload) => send('update', payload);
  eventBus.on('update', onData);

  // One close handler releases everything tied to this connection
  req.on('close', () => {
    clearInterval(heartbeat);
    eventBus.off('update', onData);
    // no res.end() needed; client closed the connection
  });
});

server.listen(3000);

res.write() returning false is the Node.js backpressure signal. When the internal write buffer exceeds its high-water mark (16 KB by default for writable streams before Node.js 22, 64 KB from Node.js 22 onward), further writes are still accepted but queue in process memory. Pause your upstream producer and resume on the drain event to avoid unbounded queue growth. For a deeper treatment of connection lifecycle concerns see HTTP Keep-Alive & Connection Lifecycle.
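
The write()/drain contract can be modelled without Node.js. The sketch below is a toy Python model of the mechanism, not Node's implementation; BoundedWriter, flush_to_socket and on_drain are hypothetical names chosen for this example.

```python
from collections import deque
from typing import Callable, Deque, Optional


class BoundedWriter:
    """Toy model of a writable stream with a high-water mark (HWM)."""

    def __init__(self, high_water_mark: int,
                 on_drain: Optional[Callable[[], None]] = None):
        self.high_water_mark = high_water_mark
        self.on_drain = on_drain
        self._queue: Deque[bytes] = deque()
        self._buffered = 0
        self._needs_drain = False

    def write(self, frame: bytes) -> bool:
        """Queue a frame; return False once buffered bytes reach the HWM."""
        self._queue.append(frame)
        self._buffered += len(frame)
        if self._buffered >= self.high_water_mark:
            self._needs_drain = True   # the producer has been told to pause
            return False
        return True

    def flush_to_socket(self) -> int:
        """Pretend the socket accepted everything; fire on_drain if needed."""
        sent = self._buffered
        self._queue.clear()
        self._buffered = 0
        if self._needs_drain:
            self._needs_drain = False
            if self.on_drain:
                self.on_drain()        # the producer may resume
        return sent
```

The key property is that write() never rejects data: it only reports that the queue is too long. A producer that ignores the False return keeps growing the queue, which is the unbounded-memory failure described above.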

Python / FastAPI Implementation

WSGI was designed around short request/response cycles: each open stream occupies a worker for its whole lifetime, and common deployments (Gunicorn, uWSGI) give you no portable per-chunk flush API, so SSE over WSGI is impractical without server-specific workarounds. Use an ASGI server (Uvicorn, Hypercorn) with FastAPI or raw Starlette, where yield in a generator naturally produces one chunk per iteration.

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
import asyncio, json, time

app = FastAPI()

async def event_generator(request: Request):
    seq = 0
    try:
        while True:
            if await request.is_disconnected():
                break
            payload = json.dumps({"ts": time.time(), "seq": seq})
            # Each yield produces one HTTP chunk; no explicit flush needed
            yield f"id: {seq}\ndata: {payload}\n\n"
            seq += 1
            await asyncio.sleep(1)
    except asyncio.CancelledError:
        pass  # client disconnect

@app.get("/events")
async def sse_endpoint(request: Request):
    return StreamingResponse(
        event_generator(request),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "X-Accel-Buffering": "no",
            "Connection": "keep-alive",
        },
    )

Starlette encodes each yielded string to bytes, and Uvicorn hands it to asyncio’s transport layer without intermediate buffering. Access logging does not influence when chunks are written, so --no-access-log is an optional overhead reduction rather than a streaming requirement. See the Python FastAPI SSE Implementation Guide for authentication and multi-tenant patterns.
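
Hand-built f-strings break as soon as a payload contains a newline, because the SSE format needs one data: line per payload line. A small formatter avoids that. This is a sketch under that assumption; format_sse and its parameter names are illustrative and not part of FastAPI or Starlette.

```python
import json
import re
from typing import Any, Optional


def format_sse(data: Any, event: Optional[str] = None,
               event_id: Optional[str] = None,
               retry_ms: Optional[int] = None) -> str:
    """Serialize one SSE event, ending with the blank-line terminator."""
    lines = []
    if retry_ms is not None:
        lines.append(f"retry: {retry_ms}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    text = data if isinstance(data, str) else json.dumps(data)
    # Each line of the payload becomes its own data: field.
    for part in re.split(r"\r\n|\r|\n", text):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"
```

Inside a generator such as event_generator, the yield would then become yield format_sse(payload, event_id=str(seq)).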

Buffer sizing in Uvicorn / Hypercorn

Both servers inherit the underlying asyncio transport’s write buffer limits. You can lower the high-water mark to detect backpressure sooner:

# In a custom asyncio Protocol or via server config
# Uvicorn (via h11 or httptools) does not expose HWM directly;
# use OS-level SO_SNDBUF instead:
import socket, uvicorn

class CustomServer(uvicorn.Server):
    async def startup(self, sockets=None):
        await super().startup(sockets)
        for sock in self.servers[0].sockets:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 32768)  # 32 KB

Setting SO_SNDBUF to a low value (16–64 KB) means the kernel signals backpressure sooner, reducing per-connection memory overhead under high concurrency. Pair this with Rate Limiting & Backpressure Handling to drop slow consumers rather than accumulating write queues.
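
The kernel treats the requested size as a hint, so it is worth reading the value back. A minimal sketch using only the standard socket module (cap_send_buffer is an illustrative name):

```python
import socket


def cap_send_buffer(sock: socket.socket, size_bytes: int) -> int:
    """Request a send-buffer size and return what the kernel actually applied."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size_bytes)
    # Linux doubles the requested value to leave room for bookkeeping,
    # so the number read back is usually larger than size_bytes.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
```

Logging the effective value at startup makes it obvious when a system-wide limit is clamping the request.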

Go Implementation

Go’s net/http handler activates chunked encoding automatically when you call w.Write() without setting Content-Length. The critical extra step is asserting http.Flusher and calling Flush() after each event write, otherwise the default 4 KB response buffer will hold data until full.

package main

import (
    "fmt"
    "net/http"
    "time"
)

func sseHandler(w http.ResponseWriter, r *http.Request) {
    flusher, ok := w.(http.Flusher)
    if !ok {
        http.Error(w, "streaming not supported", http.StatusInternalServerError)
        return
    }

    w.Header().Set("Content-Type", "text/event-stream")
    w.Header().Set("Cache-Control", "no-cache")
    w.Header().Set("X-Accel-Buffering", "no")
    // Force chunked; no Content-Length
    w.WriteHeader(http.StatusOK)

    seq := 0
    ticker := time.NewTicker(time.Second)
    defer ticker.Stop()

    for {
        select {
        case <-r.Context().Done():
            // Client disconnected; context cancelled
            return
        case t := <-ticker.C:
            fmt.Fprintf(w, "id: %d\ndata: {\"ts\":%d}\n\n", seq, t.Unix())
            flusher.Flush() // pushes the buffered chunk to the kernel send buffer
            seq++
        }
    }
}

func main() {
    http.HandleFunc("/events", sseHandler)
    http.ListenAndServe(":8080", nil)
}

For deeper Go-specific buffer tuning (including sync.Pool reuse, zero-allocation formatting, and bufio.Writer wrapping), see Managing Memory Buffers in Go Streaming Servers. For channel-based fan-out patterns, see Go Streaming Patterns for SSE.

Edge Cases & Network Interference

Chunked streaming breaks in surprising ways across the network stack. Every layer between your application server and the browser is a potential re-buffering point.

Proxy and CDN buffering

The most common production failure: a reverse proxy or CDN (Nginx, HAProxy, Cloudflare, CloudFront) accumulates chunks and delivers them in a burst after a timeout, or not at all.

Nginx mitigation:

location /events {
    proxy_pass          http://backend;
    proxy_http_version  1.1;           # required for keep-alive and chunked passthrough
    proxy_set_header    Connection '';  # clear hop-by-hop Connection header
    proxy_buffering     off;           # disable proxy_buffer_size accumulation
    proxy_cache         off;
    proxy_read_timeout  3600s;         # allow long-lived connections
    chunked_transfer_encoding on;

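    # add_header only changes the response sent downstream (useful when another
    # Nginx sits in front of this one); proxy_buffering off above is what
    # disables buffering at this hop
    add_header X-Accel-Buffering no;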
}

AWS ALB / CloudFront: ALB does not buffer SSE; CloudFront does unless you set Origin Response Timeout ≥ connection duration and disable compression for text/event-stream. Set the cache policy to CachingDisabled.

Mitigation checklist for proxy interference:

Gzip compression kills chunked streaming

Gzip and low-latency chunked streaming conflict in practice. A gzip compressor buffers input while it looks for repeated sequences; it only emits a decodable event immediately when the caller forces Z_SYNC_FLUSH after each write, which costs compression ratio and is rarely wired up correctly in framework middleware. Strip the Accept-Encoding header for SSE clients at the proxy layer, or disable compression per content-type in your application middleware.

# Nginx: exclude text/event-stream from gzip
gzip_types text/plain application/json;
# text/event-stream is omitted intentionally
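
The buffering behaviour is easy to observe with Python's zlib module. The sketch below compresses a single event with a gzip-format compressor and shows that nothing decodable comes out until a sync flush is forced; the variable names are illustrative.

```python
import zlib

event = b'data: {"temp":42.1}\n\n'

compressor = zlib.compressobj(wbits=31)      # wbits=31 selects the gzip container
decompressor = zlib.decompressobj(wbits=31)

# compress() alone keeps the event inside the compressor's internal buffer.
held_back = compressor.compress(event)
visible_before = decompressor.decompress(held_back)

# Z_SYNC_FLUSH forces out everything buffered so far, aligned to a byte boundary.
flushed = compressor.flush(zlib.Z_SYNC_FLUSH)
visible_after = decompressor.decompress(flushed)

# Bytes on the wire compared with the uncompressed event.
wire_bytes = len(held_back) + len(flushed)
```

Here visible_before is empty and visible_after equals the original event. Comparing wire_bytes with len(event) shows how little a per-event flush saves on payloads this small.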

HTTP/2 and chunked encoding

HTTP/2 does not use Transfer-Encoding: chunked; framing is handled at the protocol layer. Browsers connecting over HTTP/2 will still receive events immediately if the server flushes the DATA frames, but the Transfer-Encoding header is stripped. EventSource is agnostic: it works over both HTTP/1.1 and HTTP/2 provided the connection stays open. Verify with DevTools → Network → Protocol column.

Firewall idle-timeout disconnections

Stateful firewalls (and NAT devices) silently drop connections idle for longer than their timeout (typically 60–300 s). Send a comment-only heartbeat every 15–25 s to keep the connection alive through intermediate NAT:

// Comment lines (starting with ':') are valid SSE and ignored by EventSource
res.write(': heartbeat\n\n');

Pair this with the retry: field so clients reconnect quickly after a drop; see Event ID & Retry Mechanism Design.
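
A fixed-interval timer sends heartbeats even while real events are flowing. An alternative is to emit the comment frame only when the stream has been idle. The following asyncio sketch assumes events arrive on a queue and that None marks the end of the stream; events_with_heartbeat and that sentinel convention are choices made for this example.

```python
import asyncio
from typing import AsyncIterator, Optional

HEARTBEAT = ": heartbeat\n\n"


async def events_with_heartbeat(queue: "asyncio.Queue[Optional[str]]",
                                interval: float) -> AsyncIterator[str]:
    """Yield SSE frames from a queue, inserting a comment frame whenever idle."""
    while True:
        try:
            payload = await asyncio.wait_for(queue.get(), timeout=interval)
        except asyncio.TimeoutError:
            yield HEARTBEAT      # nothing to send: keep NAT/firewall state alive
            continue
        if payload is None:      # None is this sketch's end-of-stream sentinel
            return
        yield f"data: {payload}\n\n"
```

With interval set to 15, the client sees a comment frame only after 15 s of silence, which keeps idle connections open without adding traffic to busy ones.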

Performance & Scale Considerations

Memory per connection

Each open SSE connection holds at minimum:

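  • Kernel TCP send buffer (SO_SNDBUF; on Linux it starts at the net.ipv4.tcp_wmem default, typically 16 KB, and autotunes upward for slow or distant clients)
  • Application write buffer (Node.js writable HWM: 16 KB, or 64 KB from Node.js 22; Go ResponseWriter buffer: 4 KB; Python asyncio transport: 64 KB high-water mark by default)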
  • Event queue entries if backpressure is not respected

At 10 000 concurrent connections, send buffers that have grown to 87 KB each consume ~870 MB of kernel memory. Cap the send buffer at 8–16 KB for SSE; individual events are small and write latency matters more than throughput:

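# System-wide; affects every TCP socket on the host, so apply before the process starts
# tcp_wmem = minimum, initial, and autotuning maximum send buffer (bytes)
sysctl -w net.ipv4.tcp_wmem="4096 16384 16384"
# wmem_max caps explicit setsockopt(SO_SNDBUF) requests
sysctl -w net.core.wmem_max=16384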
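
The arithmetic behind these figures is simple enough to keep in a capacity-planning script. A sketch, using decimal kilobytes to match the numbers in the text (estimate_buffer_memory is an illustrative helper):

```python
def estimate_buffer_memory(connections: int, send_buffer_bytes: int,
                           app_buffer_bytes: int = 0) -> int:
    """Worst-case bytes pinned when every per-connection buffer is full."""
    return connections * (send_buffer_bytes + app_buffer_bytes)


# 10 000 connections with 87 KB kernel send buffers
large = estimate_buffer_memory(10_000, 87_000)
# The same fleet with 16 KB send buffers plus a 16 KB application buffer
small = estimate_buffer_memory(10_000, 16_000, 16_000)
```

Here large is 870 000 000 bytes (about 870 MB) and small is 320 000 000 bytes (about 320 MB).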

High-water mark and backpressure

The high-water mark (HWM) is the threshold at which a writable stream reports that its internal queue is full and the producer should pause. Setting it too high wastes memory per slow client; too low causes excessive pause/resume cycles and CPU overhead.

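| Runtime | Default HWM | Recommended for SSE | API to override |
|---|---|---|---|
| Node.js Writable | 16 384 bytes (65 536 from Node.js 22) | 4 096–8 192 bytes | new Writable({ highWaterMark: 4096 }); for HTTP servers, the highWaterMark option of http.createServer() |
| Go http.ResponseWriter | 4 096 bytes (bufio) | 2 048–4 096 bytes | custom bufio.NewWriterSize wrapper |
| Python asyncio transport | 64 KiB transport buffer, plus OS SO_SNDBUF | lower via transport limits or setsockopt | transport.set_write_buffer_limits(); SO_SNDBUF on server socket |
| Nginx proxy_buffer_size | 4 KB | N/A (responses pass through as received with proxy_buffering off) | proxy_buffering off |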
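
For code that owns its asyncio transport, the watermarks can be set directly with the documented set_write_buffer_limits() method. The sketch below uses a local socket pair purely so that it runs on its own; demo_write_limits is an illustrative name, and in a real server you would call the method on whatever transport your protocol or framework exposes.

```python
import asyncio
import socket
from typing import Tuple


async def demo_write_limits(high: int = 8192, low: int = 2048) -> Tuple[int, int]:
    """Open a stream over a local socket pair and tighten its write watermarks."""
    ours, theirs = socket.socketpair()
    reader, writer = await asyncio.open_connection(sock=ours)
    try:
        # Writing is paused above `high` and resumed once the buffer falls to `low`.
        writer.transport.set_write_buffer_limits(high=high, low=low)
        # get_write_buffer_limits() returns the pair as (low, high).
        return writer.transport.get_write_buffer_limits()
    finally:
        writer.close()
        await writer.wait_closed()
        theirs.close()
```

With these limits, await writer.drain() starts blocking once more than 8 KB is queued, which is the asyncio counterpart of write() returning false in Node.js.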

CPU cost of frequent small flushes

Each Flush() call is a syscall (write(2) or send(2)). At 1 event/s per connection across 10 000 connections, that is 10 000 syscalls/s, which is negligible. At 100 events/s, you reach 1 M syscalls/s, which starts to become measurable. Consider batching events at the application layer into a single write() call when the event rate is high and the consumer can tolerate slight aggregation (e.g., telemetry dashboards). For fan-out from a message broker, see Redis Pub/Sub Fan-Out for SSE.
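
Application-level batching can be as simple as concatenating already-formatted frames up to a size bound and issuing one write per batch. A sketch under that assumption (batch_frames and max_batch_bytes are illustrative names):

```python
from typing import Iterable, List


def batch_frames(frames: Iterable[bytes], max_batch_bytes: int) -> List[bytes]:
    """Concatenate already-formatted SSE frames into batches of bounded size."""
    batches: List[bytes] = []
    current = bytearray()
    for frame in frames:
        # Start a new batch if adding this frame would exceed the bound.
        if current and len(current) + len(frame) > max_batch_bytes:
            batches.append(bytes(current))
            current.clear()
        current += frame
    if current:
        batches.append(bytes(current))
    return batches
```

Because every frame still ends with its own blank line, the client parses a batch as separate events; only the number of writes changes. A frame larger than the bound is sent alone rather than split.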

Connection count and file descriptor limits

Each SSE connection is a file descriptor. On Linux the default per-process limit is 1 024. Increase it in /etc/security/limits.conf and confirm with ulimit -n. Node.js workers typically need 65536 for a moderately loaded SSE server. For pool sizing guidance see Connection Pooling for SSE Servers.
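
A process can also inspect and raise its own soft limit at startup, up to the hard limit set by the administrator. A sketch using Python's Unix-only resource module (raise_fd_soft_limit is an illustrative name):

```python
import resource
from typing import Tuple


def raise_fd_soft_limit() -> Tuple[int, int]:
    """Raise the soft RLIMIT_NOFILE to the hard limit; return (before, after)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            pass  # some platforms refuse the hard value; keep the old soft limit
    return soft, resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Logging both values at boot makes it clear whether the limit configured in limits.conf actually reached the process.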

Validation & Debugging

curl: verify chunked headers and immediate delivery

# -N disables curl's own response buffering
# -v shows headers including Transfer-Encoding
curl -N -v -H "Accept: text/event-stream" http://localhost:3000/events

Look for < Transfer-Encoding: chunked in the verbose output. Events should appear line-by-line as they arrive, not in a single burst.

curl: check raw chunk framing

# --raw passes the chunk headers through without decoding
curl -N --raw -H "Accept: text/event-stream" http://localhost:3000/events | xxd | head -40

You should see hex-encoded chunk size bytes (1b\r\n or similar) before each event payload.
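
To check a saved capture programmatically, the raw body can be split back into its chunks. The sketch below assumes a complete capture that ends with the terminal chunk and carries no trailers; split_chunks is an illustrative helper, and a truncated capture raises ValueError.

```python
from typing import List


def split_chunks(raw: bytes) -> List[bytes]:
    """Decode a chunked body (no trailers) into its individual chunk payloads."""
    chunks: List[bytes] = []
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        # Chunk extensions after ';' are allowed by the RFC; ignore them.
        size = int(raw[pos:line_end].split(b";", 1)[0], 16)
        if size == 0:
            return chunks            # terminal chunk reached
        start = line_end + 2
        chunks.append(raw[start:start + size])
        if raw[start + size:start + size + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        pos = start + size + 2
```

If each returned element holds exactly one SSE event, the server is emitting one chunk per event; several events in one element point to coalescing before the framer.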

Check for proxy re-buffering

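# Compare time-to-first-byte when hitting the app directly vs. through the proxy
# -w prints curl's own TTFB measurement; --max-time ends the otherwise endless stream
curl -N -s -o /dev/null --max-time 5 -w '%{time_starttransfer}\n' http://localhost:3000/events
curl -N -s -o /dev/null --max-time 5 -w '%{time_starttransfer}\n' https://api.example.com/events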

A multi-second TTFB through the proxy but sub-100 ms direct indicates buffering. Add X-Accel-Buffering: no to the application response and proxy_buffering off to Nginx.

DevTools: protocol and timing

  1. Open DevTools → Network → filter by event-stream.
  2. Select the request → Headers tab: confirm transfer-encoding: chunked (HTTP/1.1) or no content-length (HTTP/2).
  3. EventStream tab: events should appear individually as they arrive, not all at once.
  4. Timing tab: Waiting (TTFB) should be under 200 ms; Content Download should grow incrementally.

Structured logging for buffer events

// Node.js: count writes that push the buffer past the HWM
// (frame is the formatted SSE event string being sent)
if (!res.write(frame)) {
  metrics.counter('sse.backpressure.total').inc();
}

// 'drain' fires once the buffer has emptied again after backpressure
res.on('drain', () => {
  logger.warn({ event: 'sse_drain', remoteAddr: req.socket.remoteAddress });
});

Track sse_drain events per endpoint over time. A rising rate indicates slow consumers or network congestion upstream of the client; both are candidates for Rate Limiting & Backpressure Handling strategies.

tcpdump verification

# Capture on loopback port 3000, print ASCII payload
tcpdump -i lo -A 'tcp port 3000 and (tcp[tcpflags] & tcp-push != 0)' 2>/dev/null | grep -A2 'data:'

TCP segments with PSH flag set are being flushed to the receiver immediately. If you see large batches of events arriving simultaneously with a single PSH, buffering is occurring in the kernel (Nagle) or a middleware layer.

⚑ Production Directives

  • Set proxy_buffering off and send X-Accel-Buffering: no from every SSE handler; a single missed location block silently destroys real-time delivery.
  • Call flusher.Flush() (Go) or rely on yield (Python ASGI) after every event; never wait for the runtime's default buffer to fill.
  • Send a comment-only heartbeat (: heartbeat\n\n) every 15–25 s to prevent NAT/firewall idle-timeout disconnections.
  • Tune SO_SNDBUF to 8–16 KB per connection and monitor drain events in Node.js to detect slow consumers before they exhaust heap memory.
  • Disable gzip compression for text/event-stream at every proxy layer; gzip middleware that does not sync-flush after each event holds the body back and breaks streaming.

Production Checklist

Frequently Asked Questions

Why do I see events batching in the browser even though I call flush() on the server?

The most common cause is an intermediate proxy or CDN (Nginx, Cloudflare, CloudFront) re-buffering the response. Verify that proxy_buffering off is set in Nginx and that X-Accel-Buffering: no appears in the response headers the browser receives. A secondary cause is Nagle's algorithm on the server socket, which lets the kernel coalesce small writes before sending. Go's net package and the Node.js http server (18 and later) enable TCP_NODELAY by default; if you manage the listener or socket yourself, confirm it is set, for example with conn.(*net.TCPConn).SetNoDelay(true) in Go.

Does HTTP/2 support chunked transfer encoding for SSE?

HTTP/2 does not use Transfer-Encoding: chunked; framing is handled at the protocol level by DATA frames. The browser's EventSource API works transparently over HTTP/2; you do not need to change your SSE format. However, you cannot inspect Transfer-Encoding: chunked in DevTools for HTTP/2 connections; instead, verify that events arrive incrementally in the EventStream tab rather than in a burst.

What is the right high-water mark for an SSE write buffer?

For SSE, individual events are small (typically 50–500 bytes) and latency matters more than throughput. A HWM of 4–8 KB in Node.js (vs. the 16 KB default) means backpressure signals are generated sooner, limiting per-connection memory to a tighter bound. If you have a mix of event sizes or burst patterns, profile under load and choose the lowest HWM at which drain events stay below 1 per minute per connection at normal load.

Can I use gzip compression with SSE?

No, not safely. A gzip stream only becomes decodable up to the current event if the compressor is flushed with Z_SYNC_FLUSH after every write, which most framework middleware does not do correctly. The practical consequence is that the middleware holds events back, in the worst case until the connection closes, delivering nothing in real time. Disable gzip (and Brotli) for text/event-stream responses at every proxy layer. Individual SSE events are small and flushed one at a time, so compression saves little on these payloads.

How do I detect that a CDN is silently buffering my SSE stream in production?

Compare time-to-first-byte (TTFB) for the SSE endpoint hit directly against the application server vs. through the CDN. If TTFB through the CDN is >500 ms but the direct TTFB is <100 ms, the CDN is buffering. Also instrument your client: record the timestamp when an event is emitted on the server (embed it in the event payload) and subtract it from the client-side Date.now() when the onmessage handler fires. A sustained delta of several seconds confirms CDN-layer buffering. Check your CDN's documentation for text/event-stream MIME type handling and ensure streaming is not disabled in the cache policy.
