High-concurrency SSE deployments drop connections, throw ERR_CONNECTION_RESET, or exhaust worker threads under load. Engineers require precise pool tuning to sustain thousands of persistent, long-lived HTTP streams without blocking event loops or exhausting OS file descriptors. The objective is stable throughput and zero resource starvation during traffic spikes.
Default HTTP server connection pools optimize for short-lived request/response cycles. SSE requires persistent, unidirectional streams. Defaults like maxSockets, keepAliveTimeout, and worker thread limits saturate rapidly, forcing new connections into queues or triggering hard resets. Misconfigured backpressure, missing Connection: keep-alive headers, and aggressive reverse-proxy buffering cause premature socket closures, heap bloat, and silent stream termination.
Determine baseline socket capacity as (concurrent_clients * avg_stream_duration) / timeout_window. Set maxSockets to this cap, or to Infinity paired with strict OS file-descriptor limits.
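As a worked example of that formula, a small sizing helper (the function name and traffic numbers below are illustrative, not benchmarks):

```javascript
// Hypothetical sizing helper; feed it your own measured traffic figures.
function baselineSocketCap(concurrentClients, avgStreamDurationSec, timeoutWindowSec) {
  // Sockets needed to keep every stream open across one timeout window.
  return Math.ceil((concurrentClients * avgStreamDurationSec) / timeoutWindowSec);
}

// Example: 5,000 clients holding 300s streams against a 60s timeout window.
console.log(baselineSocketCap(5000, 300, 60)); // 25000
```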
OS Limits (Linux):
# Check current limit
ulimit -n
# Raise the system-wide descriptor ceiling
sudo sysctl -w fs.file-max=2097152
# Set soft/hard limits for the service user
echo "sse-service soft nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "sse-service hard nofile 131072" | sudo tee -a /etc/security/limits.conf
Node.js Agent Configuration:
const http = require('http');
const agent = new http.Agent({
keepAlive: true,
maxSockets: Infinity, // Rely on OS ulimit for hard cap
maxFreeSockets: 256,
timeout: 0 // Disable agent-level socket timeout
});
Set keepAliveTimeout to at least twice your SSE heartbeat interval, so a routine gap between heartbeats never looks like an idle connection. Disable idle-connection pruning for active SSE sockets to prevent mid-stream teardowns.
const server = http.createServer(app);
server.keepAliveTimeout = 120_000; // 2x typical 60s heartbeat
server.headersTimeout = 130_000; // Must exceed keepAliveTimeout
server.timeout = 0; // Disable global request timeout
server.listen(3000);
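The heartbeat the timeouts above are aligned with can be sketched as follows. This is a minimal illustration assuming a 60s interval and the `res` object from an SSE handler; `heartbeatFrame` and `startHeartbeat` are hypothetical helpers, not library APIs.

```javascript
// 60s is an assumption; keep it below server.keepAliveTimeout / 2.
const HEARTBEAT_MS = 60_000;

// SSE comment lines (": ...") are ignored by EventSource clients,
// so they are safe as pure keep-alive traffic.
function heartbeatFrame(now = Date.now()) {
  return `: heartbeat ${now}\n\n`;
}

function startHeartbeat(res, intervalMs = HEARTBEAT_MS) {
  const timer = setInterval(() => res.write(heartbeatFrame()), intervalMs);
  // Stop pinging once the client goes away.
  res.on('close', () => clearInterval(timer));
  return timer;
}
```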
Proxies aggressively buffer and timeout long-lived streams by default. Disable buffering, extend read timeouts, and enforce chunked transfer encoding.
Nginx Configuration:
location /api/sse {
proxy_pass http://backend_upstream;
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_buffering off;
proxy_cache off;
proxy_read_timeout 3600s;
proxy_send_timeout 3600s;
chunked_transfer_encoding on;
}
Track active sockets, enforce graceful shutdown, and apply per-client rate limits to prevent pool starvation. Reference Connection Pooling for SSE Servers for proven architecture patterns.
const crypto = require('crypto'); // needed for crypto.randomUUID()
const activeStreams = new Map();

function handleSSE(req, res) {
  const clientId = crypto.randomUUID();
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive'
  });
  activeStreams.set(clientId, res);
  // Remove the entry when the client disconnects
  res.on('close', () => activeStreams.delete(clientId));
}

// Graceful drain on SIGTERM: register ONE handler for all streams,
// not one per request (per-request handlers leak listeners)
process.on('SIGTERM', () => {
  for (const res of activeStreams.values()) {
    res.write(`event: shutdown\ndata: Server shutting down\n\n`);
    res.end();
  }
  activeStreams.clear();
});
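The per-client rate limits mentioned above can be sketched as a token bucket. `TokenBucket` is a hypothetical helper, and the capacity/refill values are illustrative; tune them to your SLA.

```javascript
// Minimal per-client token bucket; numbers are illustrative.
class TokenBucket {
  constructor(capacity = 20, refillPerSec = 5) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillPerSec = refillPerSec;
    this.last = Date.now();
  }
  // Returns true if the caller may send one event, false if it should
  // drop or queue it.
  tryRemove(now = Date.now()) {
    const elapsedSec = (now - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Usage: keep one bucket per clientId alongside activeStreams, and gate
// each res.write() on bucket.tryRemove().
```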
Tune highWaterMark on writable streams. Pause event emitters when client drain rates fall below threshold to prevent heap overflow and OOM kills.
const { Writable } = require('stream');
const sseStream = new Writable({
  highWaterMark: 64 * 1024, // 64KB
  write(chunk, encoding, callback) {
    // Deliver the chunk to the client socket here (omitted). Check the
    // public writableLength/writableHighWaterMark properties rather than
    // the internal _writableState object.
    if (this.writableLength > this.writableHighWaterMark) {
      this.emit('backpressure');
    }
    callback();
  }
});
// Emitter integration
eventEmitter.on('data', (payload) => {
if (!sseStream.writableNeedDrain) {
sseStream.write(`data: ${JSON.stringify(payload)}\n\n`);
} else {
// Queue or drop based on SLA
sseStream.once('drain', () => {
sseStream.write(`data: ${JSON.stringify(payload)}\n\n`);
});
}
});
Expose http_active_connections, pool_utilization_percent, and sse_disconnects_total via Prometheus. Use prom-client to scrape Node.js event loop lag and heap usage alongside connection counts.
const client = require('prom-client');
const connectionsGauge = new client.Gauge({
name: 'http_active_connections',
help: 'Current active SSE connections'
});
setInterval(() => connectionsGauge.set(activeStreams.size), 1000);
Simulate 10k concurrent clients using k6. Verify 0% connection drops, stable RSS memory, and consistent time-to-first-byte (<100ms p99).
import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = {
vus: 10000,
duration: '5m',
thresholds: {
  // http_req_duration spans the entire stream for SSE, so gate on
  // time-to-first-byte (http_req_waiting) and failure rate instead.
  http_req_waiting: ['p(99)<100'],
  http_req_failed: ['rate<0.01'],
},
};
export default function () {
const res = http.get('https://your-api.com/stream', {
headers: { 'Accept': 'text/event-stream' },
timeout: '300s'
});
check(res, { 'status is 200': (r) => r.status === 200 });
sleep(60);
}
Confirm required headers on every response. Missing headers trigger browser/client fallback behavior and premature closures.
# Use GET, not HEAD (-I): many SSE routes only answer GET
curl -sN -D - -o /dev/null --max-time 5 https://your-api.com/stream | grep -iE '(cache-control|connection|content-type|transfer-encoding)'
# Expected:
# Cache-Control: no-cache
# Connection: keep-alive
# Content-Type: text/event-stream
# Transfer-Encoding: chunked
Align pool metrics with broader Backend Stream Generation & Connection Management observability standards for end-to-end reliability. Correlate sse_disconnects_total with upstream latency, GC pauses, and proxy error logs to isolate pool saturation vs. network-induced drops.
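As a sketch of that correlation workflow, a disconnect counter labeled by cause (the reason names here are illustrative) separates server-initiated closes from client resets before the totals are exported to Prometheus:

```javascript
// Hypothetical disconnect classifier feeding sse_disconnects_total.
const disconnects = new Map(); // reason -> count

function recordDisconnect(reason) {
  disconnects.set(reason, (disconnects.get(reason) || 0) + 1);
}

// Wire into the response lifecycle, e.g.:
// res.on('close', () =>
//   recordDisconnect(res.writableEnded ? 'server_close' : 'client_reset'));
//
// A spike in 'server_close' under load points at pool saturation or drain
// logic; a spike in 'client_reset' points at network or proxy behavior.
```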