Streaming Claude to the Browser With Real Backpressure
Streaming LLM tokens is easy to get 80% right. The last 20% is where most developers fail. Naive setups work on your local machine but break on slow connections or with fast models.
If you want production-grade streaming, you must handle two specific problems.
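The Nginx Buffer Problem
Nginx buffers proxied responses by default, and many developers forget to turn that off for streams. Unless your response carries the header X-Accel-Buffering: no, Nginx holds tokens back and releases them in large bursts or, for a short reply, only once the entire response finishes. This defeats the purpose of streaming.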
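As a minimal sketch of the header fix, the helper below builds the response headers for an SSE stream served from behind Nginx. The name sseHeaders is hypothetical, and the Content-Type and Cache-Control values are common SSE defaults rather than something this post prescribes.

```typescript
// Headers for a Server-Sent Events response that sits behind Nginx.
// `sseHeaders` is a hypothetical helper name, not part of any framework.
function sseHeaders(): Record<string, string> {
  return {
    "Content-Type": "text/event-stream; charset=utf-8",
    // Keep intermediaries from caching or rewriting the stream.
    "Cache-Control": "no-cache, no-transform",
    // Asks Nginx not to buffer this particular response.
    "X-Accel-Buffering": "no",
  };
}
```

In a route handler this would typically be passed along with the stream, for example new Response(stream, { headers: sseHeaders() }).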
The Abandoned Stream Problem
This is the most expensive mistake. If a user closes a tab or loses connection while the model is generating, the server keeps running. Your loop keeps pulling tokens from Claude. You pay for output that no one sees.
The Fix: End-to-End Aborts
You must link the request signal to the Claude stream. When the client disconnects, the server must stop generating immediately.
In your Next.js route, pass the request signal to the Anthropic SDK:
- Use { signal: request.signal } in your SDK call.
- Add a listener for the abort event on request.signal.
- Call llm.abort() on the model stream and controller.close() on your response stream when the abort fires.
This stops the generation and stops your bill from growing.
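The steps above can be sketched roughly as follows. To stay self-contained, the example does not import the Anthropic SDK: AbortableTokenStream is a hypothetical stand-in for the SDK's stream object, assumed here to be an async iterable of text chunks with an abort() method, and toSseStream is an illustrative name. In a real route, llm would come from your SDK call made with { signal: request.signal }, and you may need a small adapter to turn its events into strings.

```typescript
// Hypothetical stand-in for the SDK stream: async-iterable text plus abort().
interface AbortableTokenStream extends AsyncIterable<string> {
  abort(): void;
}

// Wraps the model stream as an SSE byte stream and stops generation as soon
// as the request signal fires or the reader cancels.
function toSseStream(
  llm: AbortableTokenStream,
  signal: AbortSignal,
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  const tokens = llm[Symbol.asyncIterator]();
  let done = false;

  return new ReadableStream<Uint8Array>({
    start(controller) {
      const onAbort = () => {
        if (done) return;
        done = true;
        llm.abort(); // stop generation upstream
        try {
          controller.close();
        } catch {
          // The stream was already closed or cancelled.
        }
      };
      if (signal.aborted) onAbort();
      else signal.addEventListener("abort", onAbort, { once: true });
    },
    // pull() runs only when the consumer wants more data, so a slow client
    // naturally slows down how fast tokens are read from the model stream.
    async pull(controller) {
      if (done) return;
      try {
        const next = await tokens.next();
        if (done) return; // aborted while waiting for the next token
        if (next.done) {
          done = true;
          controller.close();
          return;
        }
        // JSON-encode so newlines inside a token cannot break SSE framing.
        const frame = `data: ${JSON.stringify(next.value)}\n\n`;
        controller.enqueue(encoder.encode(frame));
      } catch (err) {
        if (!done) {
          done = true;
          controller.error(err);
        }
      }
    },
    cancel() {
      done = true;
      llm.abort();
    },
  });
}
```

The done flag guards every controller call, because closing or enqueueing on a stream that has already ended throws.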
On the Frontend
The browser receives chunks at arbitrary boundaries, so a single read can end in the middle of an event. You must buffer these chunks and split them on the SSE delimiter, a blank line (\n\n).
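A minimal sketch of that buffering step, assuming the server separates events with \n\n and sends the payload on data: lines. createSseParser is a hypothetical helper name, and the sketch ignores other SSE fields such as event: and id:.

```typescript
// Accumulates decoded text chunks and returns the data payload of every
// event that is complete so far. Incomplete tails stay in the buffer.
function createSseParser() {
  let buffer = "";
  return {
    push(chunk: string): string[] {
      buffer += chunk;
      const parts = buffer.split("\n\n");
      // The last piece is either an unfinished event or an empty string.
      buffer = parts.pop() ?? "";
      const payloads: string[] = [];
      for (const event of parts) {
        const dataLines = event
          .split("\n")
          .filter((line) => line.startsWith("data:"))
          // Drop the field name and the single optional space after it.
          .map((line) => line.slice(5).replace(/^ /, ""));
        if (dataLines.length > 0) payloads.push(dataLines.join("\n"));
      }
      return payloads;
    },
  };
}
```

When reading from response.body, decode each chunk with a single TextDecoder and { stream: true } before pushing it, so a multi-byte character split across two chunks is not corrupted.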
- Use an AbortController in your fetch call.
- Return that controller to your React component.
- Call controller.abort() in the component cleanup function.
This ensures the abort signal travels from the UI all the way back to your server.
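One way to wire this up is sketched below. startStream is a hypothetical helper, the fetchImpl parameter exists only so the function can be exercised without a network, and the React usage is shown as a comment because the component itself is not part of this post.

```typescript
// Starts a streaming request and returns the AbortController that cancels it.
// Decoded text is handed to `onText` as it arrives.
function startStream(
  url: string,
  onText: (text: string) => void,
  fetchImpl: typeof fetch = fetch,
): AbortController {
  const controller = new AbortController();

  (async () => {
    try {
      const res = await fetchImpl(url, { signal: controller.signal });
      if (!res.body) return;
      const reader = res.body.getReader();
      const decoder = new TextDecoder();
      while (true) {
        const { value, done } = await reader.read();
        if (done) break;
        onText(decoder.decode(value, { stream: true }));
      }
    } catch (err) {
      // Aborting rejects the pending fetch or read; that case is expected.
      if (!controller.signal.aborted) console.error(err);
    }
  })();

  return controller;
}

// In a React component (sketch, `appendText` is a hypothetical state updater):
// useEffect(() => {
//   const controller = startStream("/api/chat", appendText);
//   return () => controller.abort(); // runs on unmount
// }, []);
```

Aborting the fetch closes the connection, which is what fires the request signal on the server side.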
One final tip for performance: Fast models emit tokens faster than the DOM can repaint. Updating React state for every single token will lag your UI. Buffer tokens and update in batches. This keeps your interface smooth.
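A small batching sketch, under the assumption that one UI update per frame is enough. createTokenBatcher and the Schedule type are illustrative names. The default scheduler is a roughly 16 ms timer; in a browser you could pass (run) => requestAnimationFrame(run) instead.

```typescript
// A scheduler decides when the next flush happens.
type Schedule = (run: () => void) => void;

// Collects tokens and hands them to `flush` at most once per scheduled tick.
function createTokenBatcher(
  flush: (text: string) => void,
  schedule: Schedule = (run) => {
    setTimeout(run, 16); // about one frame at 60 Hz
  },
): (token: string) => void {
  let pending = "";
  let scheduled = false;

  return (token: string): void => {
    pending += token;
    if (scheduled) return; // a flush is already on its way
    scheduled = true;
    schedule(() => {
      scheduled = false;
      const text = pending;
      pending = "";
      flush(text);
    });
  };
}
```

With React, flush would typically be something like (text) => setMessage((prev) => prev + text), so state changes once per batch instead of once per token.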
Stop building demo-only streams. Disable proxy buffering and propagate aborts to save money and build robust apps.
