Graceful Shutdown for Go SSE Servers Permalink to this section

Part of Go Streaming Patterns for SSE.

A Go SSE server that ignores OS signals kills every open stream mid-event when it restarts. Clients see an abrupt disconnect, EventSource fires an onerror, and they begin a retry reconnect loop with no guarantee of receiving the event that was in-flight. The correct goal is: receive SIGTERM, stop accepting new connections, drain all active stream handlers to a clean event boundary, then exit — without data loss, and within a bounded timeout.

Symptom and Developer Intent Permalink to this section

Observable behavior: After kill <pid> or a kubectl rollout restart, clients log connection errors immediately. Metrics show a spike in EventSource reconnections. Partially-written SSE frames arrive as malformed data: blocks on the client because the underlying TCP connection was torn down while http.Flusher.Flush() was mid-call.

Developer intent: You want http.Server.Shutdown to stop the listener, wait for every in-flight SSE handler to finish its current event write and then return, and propagate a cancellation signal so long-lived goroutines (ticker loops, Redis subscriber loops) also exit cleanly. The whole drain should complete within a configurable timeout (typically 15–30 s) so Kubernetes or systemd does not forcibly kill the process.

Root Cause Analysis Permalink to this section

http.Server.Close versus http.Server.Shutdown Permalink to this section

http.Server.Close is immediate: it closes the listener, then forcibly closes all active connections. Any goroutine blocked in response.Write or Flush gets a broken-pipe error and may leave a partial SSE frame on the wire.

http.Server.Shutdown is cooperative: it closes the listener and then waits for every connection’s ServeHTTP goroutine to return. This is what SSE handlers need.

However, http.Server.Shutdown has a subtlety: it only waits for hijacked connections if you manage them separately. Standard SSE handlers run as ordinary http.ResponseWriter goroutines, so Shutdown covers them automatically — but only if they actually return. A handler that blocks forever (e.g., for { select {} }) prevents Shutdown from completing. The handler must observe a context cancellation signal and exit.

SSE handlers are long-lived goroutines Permalink to this section

A typical SSE handler keeps the HTTP connection open for minutes or hours. It loops on a channel or ticker, writing events. Without a shutdown signal, the handler has no mechanism to break out of the loop and return from ServeHTTP, which means http.Server.Shutdown will block until its own deadline fires — then it returns context.DeadlineExceeded and Go’s HTTP internals force-close the remaining connections anyway.

Signal delivery in Go Permalink to this section

The default behavior for SIGTERM and SIGINT in a Go binary is immediate process exit. You must register an os/signal channel before the signal arrives or it is discarded.

Step-by-Step Resolution Permalink to this section

Step 1 — Register OS signal handling before starting the server Permalink to this section

package main

import (
    "context"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    // Buffer 1 so the signal send never blocks.
    quit := make(chan os.Signal, 1)
    signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)

    srv := &http.Server{
        Addr:    ":8080",
        Handler: buildRouter(),
    }

    // Start serving in a goroutine so main() can block on quit.
    go func() {
        if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
            log.Fatalf("ListenAndServe: %v", err)
        }
    }()

    log.Println("server listening on :8080")
    <-quit // block until SIGTERM or SIGINT
    log.Println("shutdown signal received")

    // Allow 30 s for active SSE streams to drain.
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()

    if err := srv.Shutdown(ctx); err != nil {
        log.Fatalf("forced shutdown: %v", err)
    }
    log.Println("server exited cleanly")
}

srv.ListenAndServe returns http.ErrServerClosed after Shutdown closes the listener — that is not an error.

Step 2 — Propagate a server-wide context into SSE handlers Permalink to this section

Create a root context that is cancelled when SIGTERM fires. Every SSE handler receives this context so it can break its event loop.

package main

import (
    "context"
    "net/http"
    "os"
    "os/signal"
    "syscall"
)

// serverContext returns a context that is cancelled when the process
// receives SIGTERM or SIGINT, plus a cancel func for testing.
func serverContext() (context.Context, context.CancelFunc) {
    ctx, cancel := context.WithCancel(context.Background())
    go func() {
        quit := make(chan os.Signal, 1)
        signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)
        <-quit
        cancel()
    }()
    return ctx, cancel
}

func buildRouter() http.Handler {
    rootCtx, _ := serverContext()

    mux := http.NewServeMux()
    mux.HandleFunc("/events", func(w http.ResponseWriter, r *http.Request) {
        // Merge the server root context with the per-request context.
        // The per-request context is cancelled when the client disconnects.
        ctx, cancel := context.WithCancel(rootCtx)
        defer cancel()
        // Attach to the request so the handler receives either signal.
        r = r.WithContext(ctx)
        sseHandler(w, r)
    })
    return mux
}

Step 3 — Write an SSE handler that exits on context cancellation Permalink to this section

package main

import (
    "context"
    "fmt"
    "net/http"
    "time"
)

func sseHandler(w http.ResponseWriter, r *http.Request) {
    flusher, ok := w.(http.Flusher)
    if !ok {
        http.Error(w, "streaming unsupported", http.StatusInternalServerError)
        return
    }

    w.Header().Set("Content-Type", "text/event-stream")
    w.Header().Set("Cache-Control", "no-cache")
    w.Header().Set("X-Accel-Buffering", "no") // disable nginx buffering
    w.Header().Set("Connection", "keep-alive")

    // Send an initial comment to flush proxy buffers.
    fmt.Fprint(w, ": connected\n\n")
    flusher.Flush()

    ticker := time.NewTicker(2 * time.Second)
    defer ticker.Stop()

    for {
        select {
        case <-r.Context().Done():
            // Either SIGTERM (root context cancelled) or client disconnect.
            // Return cleanly so http.Server.Shutdown can complete.
            return

        case t := <-ticker.C:
            // Write one complete SSE event before checking for shutdown.
            fmt.Fprintf(w, "data: %s\n\n", t.Format(time.RFC3339))
            flusher.Flush()
        }
    }
}

The select with r.Context().Done() is the essential piece. When the root context is cancelled at shutdown, the next select iteration unblocks and the handler returns, releasing the goroutine that http.Server.Shutdown is waiting on.

Step 4 — Track in-flight handlers with a WaitGroup for zero-drop drains Permalink to this section

If you need a guarantee that every event write completes before the process exits (not just that handlers have returned), use a sync.WaitGroup to count active handlers:

package main

import (
    "net/http"
    "sync"
)

type SSEServer struct {
    wg sync.WaitGroup
}

func (s *SSEServer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    s.wg.Add(1)
    defer s.wg.Done()
    sseHandler(w, r)
}

// DrainAndWait blocks until all active SSE handlers have returned.
// Call this after http.Server.Shutdown returns.
func (s *SSEServer) DrainAndWait() {
    s.wg.Wait()
}

In main, after srv.Shutdown:

sseServer.DrainAndWait()
log.Println("all SSE handlers finished")

Step 5 — Integrate with an external event source (Redis, channel broker) Permalink to this section

A real SSE handler typically receives events from a broker goroutine. That goroutine must also exit when the root context is cancelled. Here is the pattern for a channel-based broker:

package main

import (
    "context"
    "fmt"
    "net/http"
)

// Broker distributes events to all connected SSE clients.
type Broker struct {
    publish   chan string
    subscribe chan chan string
    cancel    chan chan string
}

func (b *Broker) Run(ctx context.Context) {
    clients := make(map[chan string]struct{})
    for {
        select {
        case <-ctx.Done():
            // Close all client channels so SSE handlers unblock immediately.
            for ch := range clients {
                close(ch)
            }
            return
        case ch := <-b.subscribe:
            clients[ch] = struct{}{}
        case ch := <-b.cancel:
            delete(clients, ch)
            close(ch)
        case msg := <-b.publish:
            for ch := range clients {
                // Non-blocking send; drop slow consumers rather than stalling shutdown.
                select {
                case ch <- msg:
                default:
                }
            }
        }
    }
}

func (b *Broker) Handler(w http.ResponseWriter, r *http.Request) {
    flusher, ok := w.(http.Flusher)
    if !ok {
        http.Error(w, "streaming unsupported", http.StatusInternalServerError)
        return
    }

    w.Header().Set("Content-Type", "text/event-stream")
    w.Header().Set("Cache-Control", "no-cache")

    ch := make(chan string, 8) // buffered to absorb burst during shutdown
    b.subscribe <- ch
    defer func() { b.cancel <- ch }()

    fmt.Fprint(w, ": connected\n\n")
    flusher.Flush()

    for {
        select {
        case <-r.Context().Done():
            return
        case msg, ok := <-ch:
            if !ok {
                // Broker closed the channel (shutdown path).
                return
            }
            fmt.Fprintf(w, "data: %s\n\n", msg)
            flusher.Flush()
        }
    }
}

When the root context is cancelled, Broker.Run closes every client channel. Each SSE handler’s case msg, ok := <-ch detects ok == false and returns, freeing the goroutine.

Shutdown Behaviour Comparison Permalink to this section

Approach Accepts new conns after signal Active streams Timeout needed
os.Exit(0) direct Yes (race) Killed mid-frame No
http.Server.Close No Killed mid-frame No
http.Server.Shutdown (no ctx cancellation) No Block until timeout Yes
http.Server.Shutdown + ctx cancellation No Clean return at next select Yes (safety net)
Above + WaitGroup drain No Verified clean exit Yes (safety net)

Validation and Monitoring Permalink to this section

Smoke test with curl Permalink to this section

Open a stream in one terminal, then send SIGTERM to the server:

# Terminal 1 — start a long-lived SSE stream
curl -N http://localhost:8080/events

# Terminal 2 — graceful shutdown
kill -TERM $(pgrep -f myserver)

Expected: the curl session receives any in-flight event, then the connection closes with a clean TCP FIN (curl exits with code 0, not a reset). No partial data: line appears.

Verify with ss / netstat Permalink to this section

# Watch connections drain; should reach 0 within the timeout
watch -n1 'ss -tn sport = :8080 | grep ESTABLISHED | wc -l'

Integration test stub Permalink to this section

package main

import (
    "context"
    "net/http/httptest"
    "testing"
    "time"
)

func TestGracefulShutdown(t *testing.T) {
    rootCtx, cancel := context.WithCancel(context.Background())
    srv := httptest.NewServer(buildRouterWithContext(rootCtx))
    defer srv.Close()

    // Connect a client.
    // (use http.Get with a response body reader in a goroutine)
    done := make(chan struct{})
    go func() {
        // simulate long-lived SSE read — omitted for brevity
        close(done)
    }()

    // Trigger shutdown after 100 ms.
    time.AfterFunc(100*time.Millisecond, cancel)

    select {
    case <-done:
        // handler exited cleanly
    case <-time.After(2 * time.Second):
        t.Fatal("handler did not exit within timeout after context cancel")
    }
}

Kubernetes readiness probe Permalink to this section

Set terminationGracePeriodSeconds in your Pod spec to at least 5 s more than the http.Server.Shutdown timeout so Kubernetes does not send SIGKILL before the drain completes:

spec:
  terminationGracePeriodSeconds: 40   # shutdown timeout (30s) + buffer
  containers:
    - name: sse-server
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sleep", "5"]  # let LB drain before SIGTERM

The preStop sleep gives the load balancer time to stop routing new connections before SIGTERM fires.

⚡ Production Directives

  • Always use http.Server.Shutdown, never http.Server.Close or os.Exit, when SSE connections are active.
  • Derive every SSE handler's context from a root context cancelled on SIGTERM; use a select on r.Context().Done() to exit the event loop.
  • Set the http.Server.Shutdown deadline to 20–30 s and Kubernetes terminationGracePeriodSeconds to that value plus 10 s to prevent SIGKILL races.
  • Add a preStop: sleep 5 hook so the upstream load balancer drains before SIGTERM is delivered.
  • Use a sync.WaitGroup to confirm all handlers have exited before the process returns from main.

Verification Checklist Permalink to this section

Frequently Asked Questions Permalink to this section

Does http.Server.Shutdown wait for hijacked connections?

http.Server.Shutdown does not wait for hijacked connections (e.g., WebSocket upgrades). Standard SSE handlers that keep the http.ResponseWriter without hijacking are handled automatically. If you call conn.Hijack() anywhere, you must track those connections and close them manually before Shutdown returns.

What if a slow consumer holds a connection open past the shutdown timeout?

When the context.WithTimeout passed to http.Server.Shutdown expires, Shutdown returns context.DeadlineExceeded and the runtime force-closes remaining connections. To prevent this, use a non-blocking send in your broker (see the select { case ch <- msg: default: } pattern in Step 5) so slow consumers are dropped from the broker before shutdown starts, and use backpressure handling to evict them proactively.

How do I send a final SSE comment or close event before shutting down?

In the case <-r.Context().Done() branch, write a final event before returning: fmt.Fprint(w, "event: close\ndata: server_shutdown\n\n"); flusher.Flush(). The client can listen for eventType === 'close' to show a "server restarting" UI state instead of triggering the default reconnect. Note that the write may fail if the client already disconnected — ignore the error or log at debug level.

Can I use os/signal.NotifyContext instead of a manual channel?

Yes. signal.NotifyContext(parent, syscall.SIGTERM, syscall.SIGINT) (added in Go 1.16) returns a context that is automatically cancelled when a signal fires. It replaces the manual channel pattern in Steps 1 and 2. Pass that context as the root context and call the returned stop function in a defer to release signal resources. It is the idiomatic Go 1.16+ approach.