𝗦𝘁𝗼𝗽 𝗟𝗼𝗮𝗱𝗶𝗻𝗴 𝗘𝘃𝗲𝗿𝘆 𝗜𝗻𝘀𝘁𝗿𝘂𝗰𝘁𝗶𝗼𝗻 𝗜𝗻𝘁𝗼 𝗘𝘃𝗲𝗿𝘆 𝗦𝗲𝘀𝘀𝗶𝗼𝗻
Most people focus on better prompts. They ignore what happens before the prompt starts. They load too many instructions into the assistant context.
This causes three problems:
- High token costs.
- High latency.
- Low signal-to-noise ratio.
When you load a massive instruction file for every small question, it is like reading an entire employee handbook before asking a simple question. Most of that information is useless for the current task.
The more rules you add, the more you dilute the relevant parts. More context does not mean more competence.
I solved this by moving from a single file to a modular system. I split my instructions into specialized modules:
• instructions.md: A small entry point that always loads. • persona.md: Personality and tone. • structure.md: System structure for navigation tasks. • workflows.md: Specific rules for ending sessions.
Now, the main file acts as a router. It only calls other modules when the task requires them.
For example:
- If you need to navigate a project, load structure.md.
- If you need to end a session, load workflows.md.
- If you have a quick question, load nothing else.
The results were clear. My baseline token load dropped from 4,800 tokens to 1,450 tokens. That is a 70% reduction.
The goal is not to make instructions smaller. The goal is to separate baseline load from on-demand load.
Baseline load is what you pay for every single time. You must keep this tiny. On-demand load is what you load only when it matters. This can be large and detailed.
This approach has trade-offs. You gain efficiency but add complexity in how you route instructions. You must ensure the assistant can access the modules reliably.
If your instructions are small, do not do this. It is a waste of time. If your instruction set is huge and growing, do this immediately.
Stop forcing the assistant to carry unnecessary weight. Keep the room clear of irrelevant instructions.
Optional learning community: https://t.me/GyaanSetuAi