Buffer Management & Chunked Transfer Encoding
Part of Backend Stream Generation & Connection Management.
Without explicit flush controls, application servers silently accumulate SSE response payloads in memory until the connection closes, turning a real-time stream into a batched dump. Chunked transfer encoding is the HTTP/1.1 mechanism that escapes this trap: it lets the server emit variable-length frames the moment data is ready, without knowing the final response size. For SSE, every call to flush() must immediately drive a TCP write; any layer in the stack that re-buffers those chunks destroys latency guarantees and can cause unbounded memory growth under sustained load. This guide covers the wire format, per-runtime flush APIs, proxy/CDN interference, watermark-based memory bounding, and the observability tools needed to verify correct behaviour end-to-end.
How Chunked Transfer Encoding Works
HTTP/1.1 chunked transfer encoding (RFC 7230 §4.1, absorbed into RFC 9112) allows a response body to be sent as a series of chunks without knowing the total size in advance. Each chunk is:
<hex-length>\r\n
<chunk-data>\r\n
A terminal zero-length chunk signals end-of-stream:
0\r\n
\r\n
The server MUST NOT set Content-Length alongside Transfer-Encoding: chunked; the two are mutually exclusive. Node.js http and Go net/http activate chunked encoding automatically for HTTP/1.1 clients when the handler writes data without first setting Content-Length, as do ASGI servers such as Uvicorn. Python's standard-library http.server does not: a handler there has to set the header and frame each chunk itself.
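The framing rule above is small enough to sketch directly. This is an illustrative encoder, not what any particular server runs internally; the names encode_chunk, encode_body and TERMINAL_CHUNK are made up for this example.

```python
def encode_chunk(payload: bytes) -> bytes:
    """Frame one payload as an HTTP/1.1 chunk: hex length, CRLF, data, CRLF."""
    if not payload:
        # A zero-length chunk is reserved for end-of-stream
        raise ValueError("an empty payload would encode as the terminal chunk")
    return f"{len(payload):x}".encode("ascii") + b"\r\n" + payload + b"\r\n"


# Zero-length chunk followed by an empty trailer section ends the body
TERMINAL_CHUNK = b"0\r\n\r\n"


def encode_body(payloads: list[bytes]) -> bytes:
    """Encode every payload as its own chunk and append the terminal chunk."""
    return b"".join(encode_chunk(p) for p in payloads) + TERMINAL_CHUNK
```

For instance, encode_chunk(b"hello") yields b"5\r\nhello\r\n". The length is written in hexadecimal and counts only the payload bytes, never the framing CRLFs.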
For SSE the wire format layering looks like this for a single event:
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Transfer-Encoding: chunked
15\r\n
data: {"temp":42.1}\n\n\r\n
15 hex = 21 bytes (the literal data: {"temp":42.1}\n\n). The \r\n after the chunk size and the \r\n after the chunk data belong to the chunked encoding envelope; the inner \n\n is the SSE event terminator described in Understanding the Event Stream Format.
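To make the two layers concrete, the sketch below first strips the chunk envelope and only then splits on the SSE terminator. It is a simplified illustration: it assumes a complete, well-formed body, LF-only SSE line endings, and no trailers. The function names are invented for this example.

```python
def decode_chunked(raw: bytes) -> list[bytes]:
    """Split a complete chunked body into chunk payloads, stopping at the terminal chunk."""
    chunks, pos = [], 0
    while True:
        eol = raw.index(b"\r\n", pos)
        # The size line may carry optional chunk extensions after ';'
        size = int(raw[pos:eol].split(b";")[0], 16)
        if size == 0:
            return chunks
        start = eol + 2
        chunks.append(raw[start:start + size])
        pos = start + size + 2  # skip the CRLF that closes the chunk


def split_events(stream: bytes) -> list[bytes]:
    """Split decoded SSE bytes on the blank-line terminator."""
    return [event for event in stream.split(b"\n\n") if event]


# One chunk (0x12 = 18 bytes) that happens to carry two SSE events
raw = b"12\r\ndata: 1\n\ndata: 2\n\n\r\n0\r\n\r\n"
chunks = decode_chunked(raw)
events = split_events(b"".join(chunks))
```

Here chunks has one entry while events has two, which shows that chunk boundaries and event boundaries are independent: a client must parse events from the reassembled byte stream, not from chunk edges.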
Flush vs. chunk boundary alignment
A critical misunderstanding: calling flush() does not guarantee one TCP segment per event. The kernel may coalesce several small writes into one segment if Nagle's algorithm is active, so events can still arrive in clumps. Disable Nagle at the socket level (TCP_NODELAY) for SSE endpoints, or verify that your runtime does so automatically. Go's net package enables TCP_NODELAY on every TCP connection by default, and recent Node.js releases default the http server's noDelay option to true; on older Node.js versions call socket.setNoDelay(true) yourself.
| Layer | Flush call | Nagle disabled | Result |
|---|---|---|---|
| Node.js res.write() + res.flushHeaders() | yes (implicit) | yes (noDelay default in recent releases) | immediate TCP write |
| Go http.Flusher.Flush() | yes (explicit) | yes (net package default) | immediate TCP write after Flush() |
| Python WSGI | no native flush | depends on server | buffered until chunk threshold |
| Python ASGI (Starlette/FastAPI) | yes (yield) | depends on Uvicorn | immediate on yield |
| Nginx upstream | N/A | N/A | re-buffers unless proxy_buffering off |
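Where a runtime leaves Nagle's algorithm on, the option can be set and read back on the socket itself. The following is a minimal sketch using Python's standard socket module on an unconnected TCP socket; disable_nagle is a hypothetical helper name.

```python
import socket


def disable_nagle(sock: socket.socket) -> bool:
    """Set TCP_NODELAY and confirm the kernel accepted it."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    nodelay_enabled = disable_nagle(s)
```

Reading the option back is a cheap way to verify what a framework has configured on a connection it hands you.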
Node.js Implementation
Node.js http.ServerResponse writes are chunked automatically. The two must-dos are: call res.flushHeaders() before emitting events (so the browser's EventSource receives the Content-Type: text/event-stream header without waiting for data), and never let the stream pause without a heartbeat.
import http from 'node:http';
import { EventEmitter } from 'node:events';

// Stand-in event source for this example; replace with your own producer
const eventBus = new EventEmitter();

const server = http.createServer((req, res) => {
  if (req.url !== '/events') { res.statusCode = 404; res.end(); return; }

  // Headers must arrive before any data; flushHeaders() forces the flush
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'X-Accel-Buffering': 'no', // tell Nginx not to buffer this response
    'Connection': 'keep-alive',
  });
  res.flushHeaders();

  let seq = 0;
  const send = (event, data) => {
    // Each write() is framed as one chunk and handed to the socket right away
    res.write(`id: ${seq++}\nevent: ${event}\ndata: ${JSON.stringify(data)}\n\n`);
  };

  // Heartbeat every 15s prevents proxy idle-timeout disconnection
  const heartbeat = setInterval(() => res.write(': heartbeat\n\n'), 15_000);

  // Backpressure: write() returns false once buffered bytes reach the
  // high-water mark; 'drain' fires when the buffer has emptied again
  res.on('drain', () => {
    // resume upstream producer here
  });

  // Example: push data from an event emitter
  const onData = (payload) => send('update', payload);
  eventBus.on('update', onData);

  req.on('close', () => {
    clearInterval(heartbeat);
    eventBus.off('update', onData);
    // no res.end() needed; client closed the connection
  });
});

server.listen(3000);
res.write() returning false is the Node.js backpressure signal. When the internal write buffer reaches its high-water mark (16 KB by default through Node.js 20; Node.js 22 raised the stream default to 64 KB), further writes are still accepted but queue in process memory. Pause your upstream producer and resume on the drain event to avoid unbounded queue growth. For a deeper treatment of connection lifecycle concerns see HTTP Keep-Alive & Connection Lifecycle.
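The write-returns-false and drain contract can be modelled in a few lines. The class below is a toy model written in Python for illustration only; it is not Node's implementation, and BoundedWriter and flush_to_socket are names invented for this sketch.

```python
from collections import deque


class BoundedWriter:
    """Toy model of a writable stream with a high-water mark."""

    def __init__(self, high_water_mark: int = 16 * 1024):
        self.high_water_mark = high_water_mark
        self.queue = deque()
        self.buffered = 0
        self.needs_drain = False

    def write(self, data: bytes) -> bool:
        """Always queue the data; return False once buffered bytes reach the HWM."""
        self.queue.append(data)
        self.buffered += len(data)
        if self.buffered >= self.high_water_mark:
            self.needs_drain = True
            return False
        return True

    def flush_to_socket(self, max_bytes: int) -> bool:
        """Simulate the socket accepting up to max_bytes; True means 'drain' would fire."""
        while self.queue and max_bytes >= len(self.queue[0]):
            chunk = self.queue.popleft()
            max_bytes -= len(chunk)
            self.buffered -= len(chunk)
        if self.needs_drain and self.buffered == 0:
            self.needs_drain = False
            return True
        return False
```

The point of the model is that write() never rejects data: a producer that ignores the False return keeps growing the queue, which is exactly the unbounded-memory case described above.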
Python / FastAPI Implementation
WSGI servers (Gunicorn, uWSGI) expose no per-chunk flush API and, with synchronous workers, tie up one worker for the lifetime of every open stream; SSE over WSGI is impractical without server-specific workarounds. Use an ASGI server (Uvicorn, Hypercorn) with FastAPI or raw Starlette, where yield in a generator naturally produces one chunk per iteration.
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
import asyncio, json, time

app = FastAPI()

async def event_generator(request: Request):
    seq = 0
    try:
        while True:
            if await request.is_disconnected():
                break
            payload = json.dumps({"ts": time.time(), "seq": seq})
            # Each yield produces one HTTP chunk; no explicit flush needed
            yield f"id: {seq}\ndata: {payload}\n\n"
            seq += 1
            await asyncio.sleep(1)
    except asyncio.CancelledError:
        pass  # client disconnect

@app.get("/events")
async def sse_endpoint(request: Request):
    return StreamingResponse(
        event_generator(request),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "X-Accel-Buffering": "no",
            "Connection": "keep-alive",
        },
    )
Uvicorn hands each yielded chunk to the asyncio transport without intermediate buffering of its own, so no flush call is needed. Access logging does not affect delivery; --no-access-log only trims per-request logging overhead. See the Python FastAPI SSE Implementation Guide for authentication and multi-tenant patterns.
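The generator above formats events inline with an f-string, which works for single-line JSON. A small helper makes the framing reusable and handles multi-line payloads, which need one data: field per line. This is a sketch; format_sse is a hypothetical helper, not part of FastAPI or Starlette.

```python
import json
from typing import Any, Optional


def format_sse(data: Any, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one SSE event, terminated by the blank line the format requires."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    text = data if isinstance(data, str) else json.dumps(data)
    # Each line of a multi-line payload needs its own "data:" field
    for line in text.split("\n"):
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"
```

Inside the generator, yield format_sse({"seq": seq}, event_id=str(seq)) would then replace the hand-built string.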
Buffer sizing in Uvicorn / Hypercorn
Both servers inherit the underlying asyncio transport's write buffer limits. Neither exposes the high-water mark as a setting, but you can shrink the kernel send buffer so that backpressure surfaces sooner:
# Uvicorn (via h11 or httptools) does not expose the transport HWM directly;
# use OS-level SO_SNDBUF on the listening sockets instead:
import socket
import uvicorn

class CustomServer(uvicorn.Server):
    async def startup(self, sockets=None):
        await super().startup(sockets)
        # On Linux, accepted connections inherit the size set on the listening socket
        for sock in self.servers[0].sockets:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 32768)  # 32 KB
Setting SO_SNDBUF to a low value (16 to 64 KB) means the kernel signals backpressure sooner, reducing per-connection memory overhead under high concurrency. Pair this with Rate Limiting & Backpressure Handling to drop slow consumers rather than accumulating write queues.
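One way to drop slow consumers instead of accumulating write queues is a capped per-connection outbox. The sketch below assumes each SSE handler owns one such channel and that a full outbox means the client should be disconnected; ClientChannel and max_pending are illustrative names, not an API of any framework mentioned here.

```python
import asyncio


class ClientChannel:
    """Per-connection outbox with a hard cap on pending events."""

    def __init__(self, max_pending: int = 100):
        self.outbox = asyncio.Queue(maxsize=max_pending)
        self.dropped = False

    def publish(self, event: str) -> bool:
        """Enqueue without waiting; mark the client as dropped once the cap is hit."""
        if self.dropped:
            return False
        try:
            self.outbox.put_nowait(event)
            return True
        except asyncio.QueueFull:
            # The consumer is not keeping up: stop queueing instead of growing memory
            self.dropped = True
            return False
```

The handler would read from outbox in its generator and close the response when dropped becomes true, so memory per connection is bounded by max_pending events.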
Go Implementation
Go's net/http handler activates chunked encoding automatically when you call w.Write() without setting Content-Length. The critical extra step is asserting http.Flusher and calling Flush() after each event write, otherwise the default 4 KB response buffer will hold data until full.
package main

import (
	"fmt"
	"net/http"
	"time"
)

func sseHandler(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming not supported", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	w.Header().Set("X-Accel-Buffering", "no")
	// Force chunked; no Content-Length
	w.WriteHeader(http.StatusOK)

	seq := 0
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-r.Context().Done():
			// Client disconnected; context cancelled
			return
		case t := <-ticker.C:
			fmt.Fprintf(w, "id: %d\ndata: {\"ts\":%d}\n\n", seq, t.Unix())
			flusher.Flush() // pushes the buffered chunk to the kernel send buffer
			seq++
		}
	}
}

func main() {
	http.HandleFunc("/events", sseHandler)
	http.ListenAndServe(":8080", nil)
}
For deeper Go-specific buffer tuning, including sync.Pool reuse, zero-allocation formatting, and bufio.Writer wrapping, see Managing Memory Buffers in Go Streaming Servers. For channel-based fan-out patterns visit Go Streaming Patterns for SSE.
Edge Cases & Network Interference
Chunked streaming breaks in surprising ways across the network stack. Every layer between your application server and the browser is a potential re-buffering point.
Proxy and CDN buffering
The most common production failure: a reverse proxy (Nginx, HAProxy, AWS ALB, Cloudflare) accumulates chunks and delivers them in a burst after a timeout, or not at all.
Nginx mitigation:
location /events {
    proxy_pass http://backend;
    proxy_http_version 1.1;          # required for keep-alive and chunked passthrough
    proxy_set_header Connection '';  # clear hop-by-hop Connection header
    proxy_buffering off;             # stream upstream bytes to the client as they arrive
    proxy_cache off;
    proxy_read_timeout 3600s;        # allow long-lived connections
    chunked_transfer_encoding on;
    # add_header only affects the response sent downstream: it tells any Nginx
    # layer in front of this one not to buffer. Buffering at this layer is
    # already disabled by proxy_buffering off above.
    add_header X-Accel-Buffering no;
}
AWS ALB / CloudFront: ALB does not buffer SSE; CloudFront does unless you set Origin Response Timeout ≥ connection duration and disable compression for text/event-stream. Set the cache policy to CachingDisabled.
Mitigation checklist for proxy interference:
Gzip compression kills chunked streaming
Gzip and low-latency streaming are at odds in practice. A gzip compressor holds input back to fill its compression window; it cannot emit a single event as decodable output unless it performs a sync flush (Z_SYNC_FLUSH) after every event, which costs compression ratio and is rarely implemented correctly in framework middleware. Strip the Accept-Encoding header for SSE clients at the proxy layer, or disable compression per content-type in your application middleware.
# Nginx: exclude text/event-stream from gzip
gzip_types text/plain application/json;
# text/event-stream is omitted intentionally
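A quick way to spot the proxy and compression problems described above is to inspect the response headers as the client sees them. The checker below is a heuristic sketch: audit_sse_headers is a hypothetical helper and the rules it applies are this guide's recommendations, not a standard.

```python
def audit_sse_headers(headers: dict[str, str]) -> list[str]:
    """Return human-readable warnings for headers that suggest buffering."""
    h = {k.lower(): v.strip().lower() for k, v in headers.items()}
    problems = []
    if not h.get("content-type", "").startswith("text/event-stream"):
        problems.append("content-type is not text/event-stream")
    if "content-length" in h:
        # A known length means some layer collected the whole body first
        problems.append("content-length present: response was buffered")
    if h.get("content-encoding") in ("gzip", "br", "deflate"):
        problems.append("compressed response: events may be held by the compressor")
    cache_control = h.get("cache-control", "")
    if "no-cache" not in cache_control and "no-store" not in cache_control:
        problems.append("cache-control does not forbid caching")
    return problems
```

An empty list does not prove the stream is unbuffered, since a proxy can hold chunks without changing headers, but any warning is worth chasing before looking deeper.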
HTTP/2 and chunked encoding
HTTP/2 does not use Transfer-Encoding: chunked; framing is handled at the protocol layer. Browsers connecting over HTTP/2 will still receive events immediately if the server flushes the DATA frames, but the Transfer-Encoding header is stripped. EventSource is agnostic: it works over both HTTP/1.1 and HTTP/2 provided the connection stays open. Verify with DevTools → Network → Protocol column.
Firewall idle-timeout disconnections
Stateful firewalls (and NAT devices) silently drop connections idle for longer than their timeout (typically 60 to 300 s). Send a comment-only heartbeat every 15 to 25 s to keep the connection alive through intermediate NAT:
// Comment lines (starting with ':') are valid SSE and ignored by EventSource
res.write(': heartbeat\n\n');
Pair this with the retry: field so clients reconnect quickly after a drop; see Event ID & Retry Mechanism Design.
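For an async Python handler, the same heartbeat can be produced without a separate timer by waiting on the event source with a timeout. This is a sketch under assumptions: events arrive on an asyncio.Queue, and None is used as an end-of-stream sentinel purely for this example.

```python
import asyncio
from typing import AsyncIterator

HEARTBEAT = ": heartbeat\n\n"


async def with_heartbeat(queue: asyncio.Queue, interval: float = 15.0) -> AsyncIterator[str]:
    """Yield queued events; emit a comment line whenever `interval` passes with no event."""
    while True:
        try:
            event = await asyncio.wait_for(queue.get(), timeout=interval)
        except asyncio.TimeoutError:
            yield HEARTBEAT
            continue
        if event is None:  # sentinel chosen for this sketch: producer is done
            return
        yield event
```

Because the timer restarts after every real event, heartbeats are only sent during idle periods and add no traffic to a busy stream.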
Performance & Scale Considerations
Memory per connection
Each open SSE connection holds at minimum:
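- Kernel TCP send buffer (SO_SNDBUF; on Linux it starts at the net.ipv4.tcp_wmem default of 16 KB and autotunes upward, to 87 KB and beyond, while a slow reader leaves data queued)
- Application write buffer (Node.js writable HWM: 16 KB; Go ResponseWriter buffer: 4 KB; Python asyncio transport: varies)
- Event queue entries if backpressure is not respected
At 10 000 concurrent connections, 87 KB of queued send-buffer data per socket alone consumes ~870 MB of kernel memory. Cap the send buffer at 8 to 16 KB for SSE, since individual events are small and write latency matters more than throughput:
# System-wide; apply before the process starts.
# TCP sockets size their send buffer from tcp_wmem (min, default, max in bytes),
# not from net.core.wmem_default.
sysctl -w net.ipv4.tcp_wmem="4096 16384 16384"
# Upper bound for explicit setsockopt(SO_SNDBUF) requests
sysctl -w net.core.wmem_max=16384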
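The sizing arithmetic is simple enough to keep as a helper when comparing settings. This sketch uses decimal units (1 MB = 1000 KB) to match the figures in this section and assumes the worst case where every connection fills its buffers; the function name is invented for the example.

```python
def send_buffer_footprint_mb(connections: int, kernel_buffer_kb: float, app_buffer_kb: float = 0.0) -> float:
    """Worst-case buffered megabytes if every connection fills both buffers."""
    return connections * (kernel_buffer_kb + app_buffer_kb) / 1000


before = send_buffer_footprint_mb(10_000, 87)  # 870.0
after = send_buffer_footprint_mb(10_000, 16)   # 160.0
```

Passing an application buffer size as the third argument shows how much the runtime's own write buffer adds on top of the kernel figure.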
High-water mark and backpressure
The high-water mark (HWM) is the threshold at which a writable stream reports that its internal queue is full and the producer should pause. Setting it too high wastes memory per slow client; too low causes excessive pause/resume cycles and CPU overhead.
| Runtime | Default HWM | Recommended for SSE | API to override |
|---|---|---|---|
| Node.js Writable | 16 384 bytes | 4 096 to 8 192 bytes | new Writable({ highWaterMark: 4096 }) |
| Go http.ResponseWriter | 4 096 bytes (bufio) | 2 048 to 4 096 bytes | custom bufio.NewWriterSize wrapper |
| Python asyncio transport | OS SO_SNDBUF | set via setsockopt | SO_SNDBUF on server socket |
| Nginx proxy_buffer_size | 4 KB | N/A (set to 0 with proxy_buffering off) | proxy_buffering off |
CPU cost of frequent small flushes
Each Flush() call is a syscall (write(2) or send(2)). At 1 event/s per connection across 10 000 connections, that is 10 000 syscalls/s, which is negligible. At 100 events/s, you reach 1 M syscalls/s, which starts to become measurable. Consider batching events at the application layer into a single write() call when the event rate is high and the consumer can tolerate slight aggregation (e.g., telemetry dashboards). For fan-out from a message broker, see Redis Pub/Sub Fan-Out for SSE.
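Application-layer batching can be as simple as joining already-formatted events up to a size budget and issuing one write per batch. The sketch below is one possible approach; coalesce_events and the 4096-byte default are assumptions for illustration.

```python
def coalesce_events(events: list[str], max_bytes: int = 4096) -> list[str]:
    """Group formatted SSE events into write-sized batches without splitting any event."""
    batches, current, size = [], [], 0
    for ev in events:
        ev_size = len(ev.encode("utf-8"))
        # Close the current batch if this event would push it over budget
        if current and size + ev_size > max_bytes:
            batches.append("".join(current))
            current, size = [], 0
        current.append(ev)
        size += ev_size
    if current:
        batches.append("".join(current))
    return batches
```

Each batch still contains complete events ending in a blank line, so the client parses them exactly as if they had been written one by one; only the number of write calls changes. An event larger than the budget is sent alone rather than split.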
Connection count and file descriptor limits
Each SSE connection is a file descriptor. On Linux the default per-process limit is 1 024. Increase it in /etc/security/limits.conf and confirm with ulimit -n. Node.js workers typically need 65536 for a moderately loaded SSE server. For pool sizing guidance see Connection Pooling for SSE Servers.
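A process can also inspect and raise its own soft limit at startup, up to the hard limit the system allows. This is a minimal sketch using Python's standard resource module (Unix only); raise_fd_soft_limit is a hypothetical helper and the target value is just the figure mentioned above.

```python
import resource


def raise_fd_soft_limit(target: int = 65536) -> tuple[int, int]:
    """Raise the soft RLIMIT_NOFILE toward target without exceeding the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    ceiling = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft == resource.RLIM_INFINITY or soft >= ceiling:
        return soft, hard  # already high enough
    resource.setrlimit(resource.RLIMIT_NOFILE, (ceiling, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Raising the hard limit itself still requires the limits.conf change (or the service manager's equivalent), since an unprivileged process cannot exceed it.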
Validation & Debugging
curl: verify chunked headers and immediate delivery
# -N disables curl's own response buffering
# -v shows headers including Transfer-Encoding
curl -N -v -H "Accept: text/event-stream" http://localhost:3000/events
Look for < Transfer-Encoding: chunked in the verbose output. Events should appear line-by-line as they arrive, not in a single burst.
curl: check raw chunk framing
# --raw passes the chunk headers through without decoding
curl -N --raw -H "Accept: text/event-stream" http://localhost:3000/events | xxd | head -40
You should see the hex chunk size followed by CRLF (for example 15\r\n for a 21-byte event) before each event payload.
Check for proxy re-buffering
# Compare time-to-first-byte when hitting the app directly vs. through the proxy.
# time_starttransfer is curl's TTFB measurement; --max-time ends the open-ended stream.
curl -N -s -o /dev/null -w '%{time_starttransfer}\n' --max-time 5 http://localhost:3000/events
curl -N -s -o /dev/null -w '%{time_starttransfer}\n' --max-time 5 https://api.example.com/events
A multi-second TTFB through the proxy but sub-100 ms direct indicates buffering. Add X-Accel-Buffering: no to the application response and proxy_buffering off to Nginx.
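TTFB only covers the first byte. Buffering later in the stream shows up as events that arrive in tight clusters separated by long gaps. Given a list of client-side arrival timestamps (however you collect them), a sketch like the following can flag such clusters; find_bursts, the 50 ms window and the minimum cluster size are illustrative assumptions.

```python
def find_bursts(arrivals: list[float], window: float = 0.05, min_size: int = 3) -> list[list[float]]:
    """Group arrival times (seconds) that each land within `window` of the previous one."""
    bursts, current = [], []
    for t in arrivals:
        # A gap larger than the window closes the current cluster
        if current and t - current[-1] > window:
            if len(current) >= min_size:
                bursts.append(current)
            current = []
        current.append(t)
    if len(current) >= min_size:
        bursts.append(current)
    return bursts
```

For a server that emits one event per second, a healthy stream produces no bursts at all, while a buffering proxy typically produces one burst per flush of its buffer.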
DevTools: protocol and timing
- Open DevTools → Network → filter by event-stream.
- Select the request → Headers tab: confirm transfer-encoding: chunked (HTTP/1.1) or no content-length (HTTP/2).
- EventStream tab: events should appear individually as they arrive, not all at once.
- Timing tab: Waiting (TTFB) should be under 200 ms; Content Download should grow incrementally.
Structured logging for buffer events
// Node.js: 'drain' fires after the write buffer filled past the HWM and then emptied,
// so each occurrence marks one backpressure episode
// (logger and metrics stand for your own logging and metrics clients)
res.on('drain', () => {
  logger.warn({ event: 'sse_drain', remoteAddr: req.socket.remoteAddress });
  // Count back-pressure events in your metrics
  metrics.counter('sse.backpressure.total').inc();
});
Track sse_drain events per endpoint over time. A rising rate indicates slow consumers or network congestion upstream of the client; both are candidates for Rate Limiting & Backpressure Handling strategies.
tcpdump verification
# Capture on loopback port 3000, print ASCII payload
tcpdump -i lo -A 'tcp port 3000 and (tcp[tcpflags] & tcp-push != 0)' 2>/dev/null | grep -A2 'data:'
TCP segments with PSH flag set are being flushed to the receiver immediately. If you see large batches of events arriving simultaneously with a single PSH, buffering is occurring in the kernel (Nagle) or a middleware layer.
⚡ Production Directives
- Set proxy_buffering off and send X-Accel-Buffering: no from every SSE handler; a single missed location block silently destroys real-time delivery.
- Call flusher.Flush() (Go) or rely on yield (Python ASGI) after every event; never wait for the runtime's default buffer to fill.
- Send a comment-only heartbeat (: heartbeat\n\n) every 15 to 25 s to prevent NAT/firewall idle-timeout disconnections.
- Tune SO_SNDBUF to 8 to 16 KB per connection and monitor drain events in Node.js to detect slow consumers before they exhaust heap memory.
- Disable gzip compression for text/event-stream at every proxy layer; gzip middleware typically buffers its output and will break streaming.
Production Checklist
Frequently Asked Questions
Why do I see events batching in the browser even though I call flush() on the server?
The most common cause is an intermediate proxy (Nginx, a CDN, or a load balancer) re-buffering the response. Verify that proxy_buffering off is set in Nginx and that X-Accel-Buffering: no appears in the response headers the browser receives. A secondary cause is Nagle's algorithm on the server socket: if TCP_NODELAY is not set, the kernel may coalesce small writes before sending. Go's net package sets TCP_NODELAY on TCP connections by default, so this mainly affects runtimes or custom listeners that leave it off; where you hold the raw connection you can set it explicitly, for example with conn.(*net.TCPConn).SetNoDelay(true) in Go.
Does HTTP/2 support chunked transfer encoding for SSE?
HTTP/2 does not use Transfer-Encoding: chunked; framing is handled at the protocol level by DATA frames. The browser's EventSource API works transparently over HTTP/2; you do not need to change your SSE format. However, you cannot inspect Transfer-Encoding: chunked in DevTools for HTTP/2 connections. Instead, verify that events arrive incrementally in the EventStream tab rather than in a burst.
What is the right high-water mark for an SSE write buffer?
For SSE, individual events are small (typically 50 to 500 bytes) and latency matters more than throughput. A HWM of 4 to 8 KB in Node.js (vs. the 16 KB default) means backpressure signals are generated sooner, limiting per-connection memory to a tighter bound. If you have a mix of event sizes or burst patterns, profile under load and lower the HWM only as far as drain events stay below roughly 1 per minute per connection at normal load.
Can I use gzip compression with SSE?
No, not safely. Gzip requires buffering the entire deflate stream to produce a valid compressed block unless Z_SYNC_FLUSH is used, which most framework middleware does not implement correctly. The practical consequence is that the middleware buffers the entire SSE response until the connection closes, delivering nothing in real time. Disable gzip (and Brotli) for text/event-stream responses at every proxy layer. SSE events are already text, so compression provides minimal benefit for the small, frequently flushed payloads that SSE is designed for.
How do I detect that a CDN is silently buffering my SSE stream in production?
Compare time-to-first-byte (TTFB) for the SSE endpoint hit directly against the application server vs. through the CDN. If TTFB through the CDN is >500 ms but the direct TTFB is <100 ms, the CDN is buffering. Also instrument your client: record the timestamp when an event is emitted on the server (embed it in the event payload) and subtract it from the client-side Date.now() when the onmessage handler fires. A sustained delta of several seconds confirms CDN-layer buffering. Check your CDN's documentation for text/event-stream MIME type handling and ensure streaming is not disabled in the cache policy.