๐ฆ๐ฒ๐น๐ณ ๐๐ผ๐๐๐ฒ๐ฑ ๐๐๐ ๐๐ผ๐ป๐๐ถ๐๐๐ฒ๐ป๐ฐ๐ ๐๐๐๐๐ฒ๐
Sequential tests worked. You got the same answer every time.
Parallel tests failed. The model disagreed with itself. It returned the right answer 87% of the time.
Batching causes this. Requests arrive at once. Floating point results change between a batch of one and five. This change picks a different token. Temperature 0.0 does not stop this drift.
GPU scheduling controls this layer. You do not control it. Your agents run in parallel. This breaks your guarantee.
Follow these steps:
- Run a consistency probe.
- Do this before you delegate actions.
Source: https://dev.to/codelev/self-hosted-llm-same-prompt-temperature-zero-6-different-answers-ae6 Optional learning community: https://t.me/GyaanSetuAi