๐—ก๐—ฉ๐—œ๐——๐—œ๐—” ๐—•๐—น๐—ฎ๐—ฐ๐—ธ๐˜„๐—ฒ๐—น๐—น ๐—ฆ๐—ฝ๐—ฒ๐—ฒ๐—ฑ๐˜€ ๐—จ๐—ฝ ๐—๐—”๐—ซ ๐—ง๐—ฟ๐—ฎ๐—ถ๐—ป๐—ถ๐—ป๐—ด

NVIDIA released NVFP4. This 4-bit floating-point format runs natively on Blackwell GPUs. It speeds up JAX training in MaxText.
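For context, NVFP4 stores each value as an FP4 E2M1 number and shares one scale factor across each block of 16 values. NVIDIA stores that block scale in FP8 (E4M3) and adds a second, per-tensor FP32 scale. The Python sketch below is a simplified illustration of the block-scaled idea, not NVIDIA's or MaxText's code. It keeps the block scale as a plain float, skips the per-tensor scale, and breaks rounding ties toward the smaller magnitude. The names `quantize_block` and `dequantize_block` are hypothetical.

```python
# Simplified NVFP4-style block quantization (illustrative sketch only).
# Assumptions: the shared scale stays a Python float (real NVFP4 rounds it
# to FP8 E4M3), and the second-level per-tensor FP32 scale is omitted.

# Non-negative magnitudes an FP4 E2M1 element can hold. Each value's index in
# this sorted tuple equals its 3-bit E2M1 encoding (2 exponent bits + 1 mantissa bit).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # NVFP4 shares one scale across each block of 16 values


def quantize_block(block):
    """Return 4-bit codes (sign bit + 3-bit magnitude) and one shared scale."""
    if not 0 < len(block) <= BLOCK_SIZE:
        raise ValueError("block must hold 1 to 16 values")
    amax = max(abs(x) for x in block)
    # Map the largest magnitude onto 6.0, the largest E2M1 value.
    scale = amax / E2M1_GRID[-1] if amax > 0 else 1.0
    codes = []
    for x in block:
        mag = abs(x) / scale
        # Round to the nearest grid point; ties go to the smaller magnitude.
        idx = min(range(len(E2M1_GRID)), key=lambda i: abs(E2M1_GRID[i] - mag))
        codes.append((8 if x < 0 else 0) | idx)  # bit 3 holds the sign
    return codes, scale


def dequantize_block(codes, scale):
    """Map 4-bit codes back to floats using the shared scale."""
    return [(-1.0 if c & 8 else 1.0) * E2M1_GRID[c & 7] * scale for c in codes]


codes, scale = quantize_block([12.0, -3.0, 1.0, 0.2])
print(codes, scale)                      # [7, 11, 1, 0] 2.0
print(dequantize_block(codes, scale))    # [12.0, -3.0, 1.0, 0.0]
```

The 0.2 comes back as 0.0: with only eight magnitudes per block, small values next to a large one lose precision. That is why the format scales small blocks instead of whole tensors.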

You get 1.8x faster training than with FP8. The format packs two 4-bit values into a single byte. This doubles the math density relative to FP8.
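A minimal sketch of the packing idea: two 4-bit codes (0 to 15) share one byte, so 16 values fit in 8 bytes instead of the 16 bytes FP8 needs. The helper names and the low-nibble-first order are assumptions for illustration. Real hardware and library layouts may differ.

```python
# Hypothetical helpers showing two 4-bit codes sharing one byte.
# Assumption: the first value of each pair goes in the low nibble.


def pack_nibbles(codes):
    """Pack 4-bit codes (0..15) two per byte; odd lengths get a zero pad."""
    codes = list(codes)
    if len(codes) % 2:
        codes.append(0)
    packed = bytearray()
    for lo, hi in zip(codes[0::2], codes[1::2]):
        if not (0 <= lo <= 15 and 0 <= hi <= 15):
            raise ValueError("each code must fit in 4 bits")
        packed.append(lo | (hi << 4))  # low nibble first, high nibble second
    return bytes(packed)


def unpack_nibbles(data, count):
    """Recover the first `count` 4-bit codes from packed bytes."""
    codes = []
    for byte in data:
        codes.extend((byte & 0x0F, byte >> 4))
    return codes[:count]  # drop the padding nibble, if any


codes = [7, 11, 1, 0, 5]
packed = pack_nibbles(codes)
print(len(codes), "codes ->", len(packed), "bytes")  # 5 codes -> 3 bytes
```

Round-tripping with `unpack_nibbles(packed, len(codes))` returns the original list. The `count` argument is needed because an odd-length input picks up one padding nibble.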

Here are the key facts:

NVFP4 saves memory. Training with it uses about half the memory of FP16 and 1.5x less memory than FP8. This lets you use larger batch sizes.
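To see where the savings come from, here is the raw per-tensor storage arithmetic. It assumes NVFP4 stores 4-bit elements plus one 8-bit scale per 16 values and ignores per-tensor scales, which amortize to almost nothing. On those assumptions a stored tensor shrinks by roughly 3.6x versus FP16 and 1.8x versus FP8. The end-to-end training figures above are smaller. A plausible reason (our assumption, not from the source) is that master weights, optimizer state, and some layers stay in higher precision. `bits_per_value` is a hypothetical helper.

```python
# Raw storage arithmetic per tensor element (illustrative sketch).
# Assumptions: NVFP4 = 4-bit elements plus one 8-bit scale per 16 values;
# per-tensor scales are ignored because they amortize to ~0 bits per value.


def bits_per_value(element_bits, block_size=None, scale_bits=0):
    """Average bits per element, including any per-block scale overhead."""
    overhead = scale_bits / block_size if block_size else 0.0
    return element_bits + overhead


FP16 = bits_per_value(16)
FP8 = bits_per_value(8)
NVFP4 = bits_per_value(4, block_size=16, scale_bits=8)

print(f"NVFP4 stores {NVFP4} bits per value")    # 4.5
print(f"{FP16 / NVFP4:.2f}x smaller than FP16")   # 3.56x
print(f"{FP8 / NVFP4:.2f}x smaller than FP8")     # 1.78x
```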

The H100 lacks FP4 Tensor Cores. Its smallest floating-point Tensor Core format is FP8. This gives Blackwell a clear edge for pre-training.

NVIDIA says accuracy stays high. Models over 100B parameters still need independent tests. We need to see published benchmark scores to be sure.

Source: https://dev.to/gentic_news/nvidia-nvfp4-on-blackwell-cuts-jax-training-by-18x-in-maxtext-373a
Optional learning community: https://t.me/GyaanSetuAi