𝗚𝗶𝘃𝗶𝗻𝗴 𝗔𝗴𝗲𝗻𝘁𝗚𝗮𝘁𝗲𝘄𝗮𝘆 𝗮 𝗦𝗲𝗺𝗮𝗻𝘁𝗶𝗰 𝗕𝗿𝗮𝗶𝗻
My AI agent routing used to be a mess.
I built a personal AI agent named Pi. It runs 24/7 from my living room. To save money, I used three different models:
- Ollama (Local) for coding.
- OpenAI for deep reasoning.
- Gemini for fast tasks.
To choose the right model, I used a Python script with keyword lists. It was a simple if-else chain.
It failed constantly. If a prompt asked about Rust patterns without using one of my keywords, the router sent it to the wrong model. If the prompt was in Hindi, routing broke.
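Here is a minimal sketch of that kind of keyword router. The keyword lists and the default model are made up for illustration; my original script is not reproduced here.

```python
# Hypothetical keyword lists, for illustration only.
KEYWORDS = {
    "ollama": ["code", "python", "function", "bug"],
    "openai": ["analyze", "explain why", "prove", "compare"],
}
DEFAULT_MODEL = "gemini"  # assumed catch-all when nothing matches


def keyword_route(prompt: str) -> str:
    """Return the first model whose keyword list matches the prompt."""
    text = prompt.lower()
    for model, words in KEYWORDS.items():
        if any(word in text for word in words):
            return model
    return DEFAULT_MODEL


# A prompt that uses a listed keyword is routed as intended.
print(keyword_route("Fix this Python function"))
# A Rust question without any listed keyword falls through to the default.
print(keyword_route("What are idiomatic ownership patterns in Rust?"))
# A Hindi prompt ("fix the bug in this function") matches no English keyword.
print(keyword_route("इस फ़ंक्शन में बग ठीक करो"))
```

The last two prompts are coding questions, yet both land on the default model because no keyword matches.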
The results were bad:
- 18% of requests went to the wrong model.
- I wasted money on expensive APIs for simple tasks.
- I had to manually update keywords every week.
I needed a system that understood meaning, not just keywords.
I switched to the vLLM Semantic Router with AgentGateway. This changed everything.
The Semantic Router replaces my Python script and runs as an Envoy sidecar. It uses a small 130 MB embedding model to infer the intent of every prompt. You do not write keywords. You write a short description of what each model does in a YAML file.
The results after two weeks:
- Misrouted requests dropped from 18% to 3%.
- Routing latency dropped from 45ms to 1ms.
- Monthly API costs dropped from $24 to $14.
- Routing maintenance is now zero: no more weekly keyword updates.
The router uses embeddings to compare your prompt against your model descriptions. If you describe a model as a coding specialist, the router sends coding prompts there automatically. It even works across different languages.
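The comparison step can be sketched in a few lines of Python. This is not the Semantic Router's code: the three-dimensional vectors are hand-written stand-ins for what an embedding model would produce, and the function names are hypothetical. It only shows the idea of picking the model whose description is closest to the prompt by cosine similarity.

```python
import math

# Stand-in vectors for each model description. In a real setup these would
# come from the embedding model, not be written by hand.
TOY_DESCRIPTION_VECTORS = {
    "ollama": [0.9, 0.1, 0.1],  # "coding specialist"
    "openai": [0.1, 0.9, 0.2],  # "deep reasoning"
    "gemini": [0.1, 0.2, 0.9],  # "fast tasks"
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_route(prompt_vector, description_vectors):
    """Return the model whose description vector is most similar to the prompt."""
    return max(
        description_vectors,
        key=lambda model: cosine(prompt_vector, description_vectors[model]),
    )


# Assumed embedding of a Rust question: close to the coding description,
# even though the prompt shares no keyword with it.
rust_prompt_vector = [0.8, 0.3, 0.1]
print(semantic_route(rust_prompt_vector, TOY_DESCRIPTION_VECTORS))
```

Because the match happens in vector space, a prompt in another language can land near the same description as long as the embedding model places it there.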
If the router fails, the system stays online. I configured a fail-open policy. If the router crashes, the requests move to Gemini automatically. The agent never stops working.
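In my setup the fail-open policy lives in the gateway configuration, not in application code. The Python sketch below only illustrates the behavior, with hypothetical function names: if the routing call raises for any reason, the request goes to the fallback model instead of failing.

```python
from typing import Callable

FALLBACK_MODEL = "gemini"  # the fallback described above


def route_fail_open(
    prompt: str,
    router: Callable[[str], str],
    fallback: str = FALLBACK_MODEL,
) -> str:
    """Ask the router for a model; on any router error, use the fallback."""
    try:
        return router(prompt)
    except Exception:
        # Fail open: a broken router must not take the agent offline.
        return fallback


def healthy_router(prompt: str) -> str:
    """Stand-in for a working router that picks the coding model."""
    return "ollama"


def crashing_router(prompt: str) -> str:
    """Stand-in for a router that is down."""
    raise ConnectionError("router unavailable")


print(route_fail_open("Refactor this module", healthy_router))
print(route_fail_open("Refactor this module", crashing_router))
```

The trade-off is that during an outage every request goes to one model, so routing quality drops but availability does not.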
I even found and helped fix two bugs in the source code related to ARM64 support on Apple Silicon. This is how open source should work. You find an issue, contribute a fix, and the whole community gets better.
If you build AI agents, stop using keyword matching. Use semantic routing to control your costs and improve your answers.
Optional learning community: https://t.me/GyaanSetuAi