𝗬𝗢𝗨𝗥 𝗔𝗚𝗘𝗡𝗧 𝗙𝗔𝗜𝗟𝗘𝗗 𝗜𝗡 𝗣𝗥𝗢𝗗. 𝗚𝗢𝗢𝗗 𝗟𝗨𝗖𝗞 𝗥𝗘𝗣𝗥𝗢𝗗𝗨𝗖𝗜𝗡𝗚 𝗜𝗧.
Your agent deleted the wrong record. A customer sends a screenshot of the damage. You copy the prompt from the logs. You run it again. It works perfectly.
You cannot find the bug. You cannot promise it will not happen again.
Many teams set temperature to zero. They think this makes the model deterministic. On a shared production endpoint, it often does not.
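GPU floating-point math is not exact. Adding the same numbers in a different order can give a slightly different result. The server batches your request with other users' requests, and the batch changes that order. A tiny change in a number can change the token the model picks. One token shifts. The whole sentence drifts.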
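A minimal sketch of the effect in plain Python, no GPU needed. The action names and the tie score of 0.6 are made up for illustration. Real logits come from far larger sums, but the mechanism is the same: the grouping of the additions changes the last bit, and the last bit decides a near-tie.

```python
# Floating-point addition is not associative: grouping changes the result.
left = (0.1 + 0.2) + 0.3   # one summation order
right = 0.1 + (0.2 + 0.3)  # same numbers, different order

print(left == right)  # False
print(left, right)    # 0.6000000000000001 0.6


def pick_action(delete_score: float, archive_score: float) -> str:
    """Greedy choice (temperature zero): take the strictly higher score."""
    return "delete" if delete_score > archive_score else "archive"


# Hypothetical near-tie between two actions. Only the summation order differs.
ARCHIVE_SCORE = 0.6
choice_run_a = pick_action(left, ARCHIVE_SCORE)
choice_run_b = pick_action(right, ARCHIVE_SCORE)

print(choice_run_a, choice_run_b)  # delete archive
```

Same inputs, same greedy rule, different action.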
You might think you should force the model to be identical every time. Do not do this. Randomness is part of what makes LLMs useful. It prevents boring, looping text. It can help models find more accurate answers. It lets agents try new paths to solve a problem.
You do not need a deterministic model. You need a reproducible run.
Stop chasing the model. Record the evidence. Capture the full envelope for every run:
- The exact assembled prompt
- The model version
- The retrieved context chunks
- The tool inputs and outputs
- The final completion
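One way to hold the five items above in a single record. This is a sketch, not a standard schema: the names `RunEnvelope`, `ToolCall`, and `run_id` and the sample values are assumptions made for illustration. It needs Python 3.9 or newer.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class ToolCall:
    name: str
    inputs: dict[str, Any]
    output: Any


@dataclass
class RunEnvelope:
    run_id: str                  # hypothetical key for looking the run up later
    prompt: str                  # the exact assembled prompt
    model_version: str           # the model version
    context_chunks: list[str]    # the retrieved context chunks
    tool_calls: list[ToolCall]   # the tool inputs and outputs, in call order
    completion: str              # the final completion

    def to_json(self) -> str:
        # asdict() also converts the nested ToolCall objects to dicts.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "RunEnvelope":
        data = json.loads(raw)
        data["tool_calls"] = [ToolCall(**call) for call in data["tool_calls"]]
        return cls(**data)


envelope = RunEnvelope(
    run_id="run-001",
    prompt="Delete the duplicate record for Acme Corp.",
    model_version="example-model-2024-01",
    context_chunks=["cust-24: Acme Corp (duplicate)", "cust-42: Acme Corp"],
    tool_calls=[ToolCall("delete_record", {"record_id": "cust-42"}, {"ok": True})],
    completion="Deleted the duplicate record.",
)

restored = RunEnvelope.from_json(envelope.to_json())
print(restored == envelope)  # True
```

Write the JSON string wherever your logs already go. The round trip matters more than the storage choice: what you read back must equal what the agent saw.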
When a ticket lands, do not re-run the model. Open the recorded envelope. The failure is frozen. You see the exact path the agent took. You find the error. You fix it.
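A sketch of what opening the envelope can look like. The dict below stands in for a record loaded from storage. Its field names, the `delete_record` tool, and the record IDs are all hypothetical. Nothing here calls a model.

```python
# A recorded envelope, as it might come back from storage (illustrative data).
recorded = {
    "prompt": "Delete the duplicate record for Acme Corp.",
    "model_version": "example-model-2024-01",
    "context_chunks": ["cust-24: Acme Corp (duplicate)", "cust-42: Acme Corp"],
    "tool_calls": [
        {"name": "search_records", "inputs": {"query": "Acme Corp"},
         "output": ["cust-24", "cust-42"]},
        {"name": "delete_record", "inputs": {"record_id": "cust-42"},
         "output": {"ok": True}},
    ],
    "completion": "Deleted the duplicate record.",
}


def replay_steps(envelope: dict) -> list:
    """Return the run as ordered (stage, detail) pairs, read only from the record."""
    steps = [("prompt", envelope["prompt"]),
             ("model_version", envelope["model_version"])]
    steps += [("context", chunk) for chunk in envelope["context_chunks"]]
    steps += [("tool:" + call["name"], call) for call in envelope["tool_calls"]]
    steps.append(("completion", envelope["completion"]))
    return steps


def find_tool_calls(envelope: dict, name: str) -> list:
    """Return every recorded call to the named tool, in call order."""
    return [call for call in envelope["tool_calls"] if call["name"] == name]


steps = replay_steps(recorded)
for stage, detail in steps:
    print(stage, "->", detail)

deleted = find_tool_calls(recorded, "delete_record")[0]["inputs"]["record_id"]
print(deleted)  # cust-42
```

In this made-up run, the context marked cust-24 as the duplicate and the agent deleted cust-42. The mismatch is visible in the record itself.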
Keep the creativity in generation. Put the determinism in replay.
Source: https://dev.to/tisha_chawla/your-agent-failed-in-prod-good-luck-reproducing-it-56ci
Optional learning community: https://t.me/GyaanSetuAi