𝗠𝗶𝗰𝗿𝗼𝘀𝗼𝗳𝘁 𝗙𝗮𝘀𝘁𝗖𝗼𝗻𝘁𝗲𝘅𝘁 𝗖𝘂𝘁𝘀 𝗖𝗼𝗱𝗶𝗻𝗴 𝗔𝗴𝗲𝗻𝘁 𝗧𝗼𝗸𝗲𝗻𝘀 𝗯𝘆 𝗨𝗽 𝘁𝗼 𝟲𝟬%
Coding agents waste too much time searching for code.
When an agent searches a repository, it often pulls whole files into its own context window. That fills its working space, its "desk," with raw data before it even starts coding.
Microsoft researchers studied GPT-5.4 traces and found a massive problem:
- Searching and reading code took 56.2% of all tool use.
- It consumed 46.5% of the main agent's total tokens.
Most of this data is low signal. The agent only needs a few lines, but it carries the whole file.
Microsoft released FastContext to solve this.
Instead of having the main agent do the searching, FastContext hands that work to a dedicated explorer subagent. Think of the explorer as a librarian: you stay at your desk and send the librarian into the stacks to find what you need.
How it works:
- The main agent sends a natural language query to the explorer.
- The explorer uses read-only tools like Read, Glob, and Grep.
- The explorer finds the code in its own separate context.
- Instead of sending the whole file back, it sends a "file-line citation."
- A citation looks like this: path/to/file.ts:88-104.
The main agent gets the exact location without the bulky text.
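To make the flow above concrete, here is a minimal Python sketch of an explorer that searches on its own and hands back only citations. It is an illustration under assumptions, not FastContext's actual implementation: the `Citation` class, the `explore` function, and the `pad` parameter are hypothetical names, and a plain regex scan stands in for the Read, Glob, and Grep tools.

```python
import re
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Citation:
    """A file-line citation such as path/to/file.ts:88-104 (1-indexed, inclusive)."""
    path: str
    start: int
    end: int

    def __str__(self) -> str:
        return f"{self.path}:{self.start}-{self.end}"


def explore(root: Path, pattern: str, pad: int = 2) -> list[Citation]:
    """Hypothetical explorer: scan files under root and return locations, not file text."""
    regex = re.compile(pattern)
    citations: list[Citation] = []
    for file in sorted(root.rglob("*")):
        if not file.is_file():
            continue
        try:
            lines = file.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(lines, start=1):
            if regex.search(line):
                # Cite a small window around the hit, clamped to the file bounds.
                start = max(1, number - pad)
                end = min(len(lines), number + pad)
                rel_path = file.relative_to(root).as_posix()
                citations.append(Citation(rel_path, start, end))
    return citations
```

The file contents are read only inside `explore`, so the caller receives short strings like `src/auth.py:2-6` instead of the text itself. This sketch does not merge overlapping windows when two hits are close together; a fuller version probably would.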
The results are significant:
- Up to 60% reduction in token usage.
- Up to 5.5% increase in task resolution rates.
The explorer model (4B to 30B parameters) is trained in two stages. First, supervised fine-tuning teaches it how to explore. Second, task-grounded reinforcement learning rewards it for finding evidence that actually helps the main agent solve the problem.
By offloading the "haystack" to a subagent, the main agent keeps its context window clean for actual reasoning and coding.
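On the receiving side, the main agent can turn a citation back into just the lines it needs. The sketch below assumes the `path:start-end` format shown earlier, with 1-indexed, inclusive line numbers; `parse_citation` and `read_cited_lines` are hypothetical helper names, and the post does not say how FastContext's main agent actually consumes citations.

```python
import re
from pathlib import Path

# Matches citations like path/to/file.ts:88-104.
CITATION_RE = re.compile(r"^(?P<path>.+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_citation(text: str) -> tuple[str, int, int]:
    """Split a citation string into (path, start_line, end_line)."""
    match = CITATION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a file-line citation: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start < 1 or end < start:
        raise ValueError(f"invalid line range in citation: {text!r}")
    return match["path"], start, end


def read_cited_lines(root: Path, citation: str) -> str:
    """Return only the cited lines, so the rest of the file never enters context."""
    path, start, end = parse_citation(citation)
    lines = (root / path).read_text(encoding="utf-8").splitlines()
    # Convert the 1-indexed inclusive range to a Python slice.
    return "\n".join(lines[start - 1:end])
```

For a citation like `path/to/file.ts:88-104`, the agent would load 17 lines rather than the whole file, which is the kind of saving the explorer design is after.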