๐ ๐๐น๐บ๐ผ๐๐ ๐๐ฎ๐๐ฒ ๐จ๐ฝ ๐ข๐ป ๐ ๐ ๐๐ ๐๐๐๐ถ๐๐๐ฎ๐ป๐
I spent months building a personal AI assistant. It was supposed to summarize emails and answer questions about my notes.
It started simple with a few Python scripts. Then problems appeared.
The longer the chat went, the worse the bot became. It forgot previous messages. It contradicted itself. It repeated advice. My API costs also went up fast.
I tried three common fixes, but they all failed:
- Sending the whole history: This hit token limits and broke the conversation.
- A sliding window: Keeping only the last few messages made the bot forget everything from earlier.
- Summarizing every turn: This worked but cost too much money and was too slow.
I needed a way to keep recent messages intact while maintaining a summary of older parts. I found the solution in hierarchical context management.
The design is simple:
- Keep the last 5 to 10 messages as raw text.
- Turn older history into a single summary string.
You do not need to summarize after every message. Only trigger a summary when the conversation grows past a certain point.
Here is how the logic works:
- Set a threshold for the number of recent messages.
- Set a time limit so you do not summarize too often.
- When both conditions meet, move older messages into the summary.
This approach helps the bot remember key points without breaking the budget. It works for 90% of use cases.
A few things to keep in mind:
- Summary quality is vital. Use a good model for the summary task.
- This is not for legal or medical work. You will lose fine details.
- For web apps, run the summarization as a background job so it does not slow down the user.
- Store context in a database like Redis so the bot does not forget everything if the server restarts.
How do you handle context in your AI apps? Do you use a vector store or a fixed window?
Optional learning community: https://t.me/GyaanSetuAi