𝗛𝗼𝘄 𝗜 𝗠𝗲𝘀𝘀𝗲𝗱 𝗨𝗽 𝗔𝗜 𝗦𝘁𝗿𝗲𝗮𝗺𝗶𝗻𝗴 (𝗔𝗻𝗱 𝗛𝗼𝘄 𝗬𝗼𝘂 𝗔𝘃𝗼𝗶𝗱 𝗜𝘁)

📅 6 days ago ⏱ 1 min read

I built a code review assistant. It used AI to give feedback in real time. I wanted a stream of tokens. Everything went wrong.

First, I used a standard REST endpoint. The backend waited for the full AI response. The frontend timed out after 30 seconds. Users complained.

Next, I tried streaming with a buffer. I collected all tokens before sending them. The latency stayed the same. The user still waited.
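The difference between buffering and real streaming shows up in the time to first token. Here is a minimal sketch in plain asyncio. `fake_model_tokens` is a hypothetical stand-in for the model client, and the delays are made up for illustration.

```python
import asyncio
import time
from typing import AsyncIterator, Optional


async def fake_model_tokens(n: int = 5, delay: float = 0.02) -> AsyncIterator[str]:
    """Hypothetical model: emits one token every `delay` seconds."""
    for i in range(n):
        await asyncio.sleep(delay)
        yield f"tok{i} "


async def buffered(source: AsyncIterator[str]) -> AsyncIterator[str]:
    """The mistake: collect every token first, then send them all at once."""
    tokens = [tok async for tok in source]
    for tok in tokens:
        yield tok


async def incremental(source: AsyncIterator[str]) -> AsyncIterator[str]:
    """The fix: forward each token as soon as it arrives."""
    async for tok in source:
        yield tok


async def time_to_first_token(stream: AsyncIterator[str]) -> Optional[float]:
    """Consume the whole stream and report when the first token showed up."""
    start = time.perf_counter()
    first: Optional[float] = None
    async for _ in stream:
        if first is None:
            first = time.perf_counter() - start
    return first


async def compare() -> tuple:
    slow = await time_to_first_token(buffered(fake_model_tokens()))
    fast = await time_to_first_token(incremental(fake_model_tokens()))
    return slow, fast


if __name__ == "__main__":
    slow, fast = asyncio.run(compare())
    print(f"buffered: {slow:.3f}s, incremental: {fast:.3f}s")
```

The buffered version only delivers its first token after the model has finished, so the user waits for the full generation either way.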

Then, I tried Server-Sent Events (SSE). I ignored backpressure. The AI pushed tokens faster than slow clients could read them. Unsent tokens piled up in memory. Connections died.
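One common way to get backpressure in asyncio is a bounded `asyncio.Queue`: when the queue is full, `put` suspends the producer until the consumer catches up. The sketch below assumes a slow consumer and a queue size of 8. The function names and numbers are hypothetical, chosen only to show the idea.

```python
import asyncio
from typing import List, Optional, Tuple


async def produce(tokens: List[str], queue: "asyncio.Queue[Optional[str]]") -> None:
    """Push tokens into the queue; `put` suspends while the queue is full."""
    for tok in tokens:
        await queue.put(tok)  # backpressure: waits for the consumer to catch up
    await queue.put(None)     # sentinel: no more tokens


async def consume(
    queue: "asyncio.Queue[Optional[str]]", delay: float, sizes: List[int]
) -> List[str]:
    """Slow consumer, standing in for a client on a slow connection."""
    received: List[str] = []
    while True:
        sizes.append(queue.qsize())  # record how much is waiting in memory
        tok = await queue.get()
        if tok is None:
            return received
        received.append(tok)
        await asyncio.sleep(delay)   # simulate a slow network write


async def run_pipeline(
    n_tokens: int = 50, maxsize: int = 8, delay: float = 0.001
) -> Tuple[List[str], int]:
    """Run producer and consumer together; return tokens and peak queue size."""
    queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue(maxsize=maxsize)
    tokens = [f"tok{i}" for i in range(n_tokens)]
    sizes: List[int] = []
    producer = asyncio.create_task(produce(tokens, queue))
    received = await consume(queue, delay, sizes)
    await producer
    return received, max(sizes)


if __name__ == "__main__":
    received, peak = asyncio.run(run_pipeline())
    print(f"received {len(received)} tokens, peak queue size {peak}")
```

With an unbounded queue the peak would grow with the length of the response. With `maxsize` set, memory per connection stays capped no matter how fast the model produces tokens.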

I fixed it with FastAPI and asyncio. Here is the setup:
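A minimal sketch of what such a setup might look like, not the exact production code. It assumes FastAPI's `StreamingResponse` with the `text/event-stream` media type. The route path, the `fake_model_tokens` stand-in, and the `done` event are hypothetical.

```python
import asyncio
from typing import AsyncIterator


async def fake_model_tokens(n: int = 3) -> AsyncIterator[str]:
    """Hypothetical stand-in for the real model client."""
    for i in range(n):
        await asyncio.sleep(0)  # yield control, as a real network read would
        yield f"tok{i}"


async def sse_events(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Wrap each token in an SSE frame: a `data:` line followed by a blank line.

    Assumes tokens contain no newlines; multi-line payloads would need
    one `data:` line per line of text.
    """
    async for tok in tokens:
        yield f"data: {tok}\n\n"
    yield "event: done\ndata: end\n\n"  # hypothetical end-of-stream marker


def create_app():
    """Build the app. FastAPI is imported lazily so the generators above
    can be run and tested without it installed."""
    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse

    app = FastAPI()

    @app.get("/review/stream")  # hypothetical route
    async def stream_review():
        # The response pulls from the generator one frame at a time.
        return StreamingResponse(
            sse_events(fake_model_tokens()),
            media_type="text/event-stream",
        )

    return app
```

Because the response pulls frames from the generator instead of having them pushed at it, the next token is only requested once the previous frame has been handed to the server. If this file were saved as `main.py` (a hypothetical name), it could be served with `uvicorn main:create_app --factory`.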

Streaming is hard. Keep these points in mind:

My final advice:
