MarginGate: Fixing LLM Determinism
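You might think temperature-0 (greedy) decoding is deterministic. It is not. The same prompt can produce different tokens when the batch size changes.
GPU kernels add numbers in a different order depending on batch size. Floating-point addition is not associative, and BF16 keeps only about 8 bits of precision, so each order rounds differently. When the top two logits are nearly tied, those small rounding differences can change which token wins.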
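To see why summation order matters, here is a minimal sketch that simulates BF16 rounding in plain Python. The helpers `to_bf16` and `bf16_sum` are hypothetical names written for this illustration. They are not part of MarginGate or any GPU library, and real kernels reduce in parallel rather than in a simple loop.

```python
import struct

def to_bf16(x: float) -> float:
    """Round x to the nearest BF16 value (round-to-nearest-even).

    Illustrative only: no special handling for NaN.
    """
    # Reinterpret the float32 encoding of x as a 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # BF16 keeps the top 16 bits of float32; round the discarded half.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (((bits + rounding_bias) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def bf16_sum(values):
    """Sum left to right, rounding the accumulator to BF16 after every add."""
    acc = 0.0
    for v in values:
        acc = to_bf16(acc + to_bf16(v))
    return acc

# Same three numbers, two different orders.
big_first = bf16_sum([256.0, 1.0, 1.0])    # each +1 is rounded away
small_first = bf16_sum([1.0, 1.0, 256.0])  # the 1s combine to 2 first

print(big_first, small_first)  # 256.0 258.0
```

The inputs are identical, but the totals differ by 2. A gap that size is enough to flip the winner between two nearly tied logits.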
MarginGate fixes this. It checks the gap between the top two tokens.
- Large gap: Use BF16. It stays fast.
- Small gap: Use FP32. It is precise.
FP32 re-checks only the risky steps. If the re-check flips a token, MarginGate repairs the K/V cache so later steps stay consistent. You get 100% consistency across batch sizes.
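The gating step can be sketched in a few lines of Python. This is a simplified illustration, not the MarginGate implementation. The function names, the `recheck_fp32` callback, and the `threshold` value are all assumptions, and the K/V cache repair is only signalled by a flag here, not performed.

```python
from typing import Callable, Sequence, Tuple

def top2_margin(logits: Sequence[float]) -> Tuple[int, float]:
    """Return (index of the largest logit, gap to the second largest).

    Assumes at least two logits.
    """
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    best, runner_up = order[0], order[1]
    return best, logits[best] - logits[runner_up]

def gated_pick(
    fast_logits: Sequence[float],
    recheck_fp32: Callable[[], Sequence[float]],
    threshold: float,
) -> Tuple[int, bool, bool]:
    """Pick a token, re-checking in high precision only on a small margin.

    Returns (token, was_rechecked, flipped). A True `flipped` marks a step
    where the caller would need to repair the K/V cache.
    """
    fast_token, margin = top2_margin(fast_logits)
    if margin >= threshold:
        # Large gap: rounding noise cannot change the winner, keep the fast result.
        return fast_token, False, False
    # Small gap: recompute this step in higher precision and trust that result.
    precise_token, _ = top2_margin(recheck_fp32())
    return precise_token, True, precise_token != fast_token

# Clear winner: the re-check callback is never called.
clear = gated_pick([5.0, 1.0, 0.5], lambda: [5.0, 1.0, 0.5], threshold=0.25)

# Near tie: the hypothetical FP32 logits disagree with the fast ones.
tie = gated_pick([2.00, 2.02, 0.5], lambda: [2.011, 2.009, 0.5], threshold=0.25)

print(clear)  # (0, False, False)
print(tie)    # (0, True, True)
```

The threshold is the tuning knob. Set it too low and a flip slips through. Set it too high and more steps pay for the FP32 re-check.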
Why use this over full FP32 verification?
- Full FP32 checks every step.
- MarginGate checks 15 to 18 percent of steps.
- This cuts the verification overhead by half compared with full FP32.
Determinism helps with debugging and audits. You get stability without losing speed.
Source: https://dev.to/pueding/margingate-margin-gated-verification-for-batch-invariant-decoding-1cko