NVIDIA Blackwell Speeds Up JAX Training

📅 5 days ago · ⏱ 1 min read

NVIDIA has released NVFP4, a 4-bit floating-point format that runs on Blackwell GPUs and speeds up JAX training in MaxText.

Training is 1.8x faster than with FP8. The format packs two 4-bit values into a single byte, which doubles math density relative to FP8.
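To make the two-values-per-byte idea concrete, here is a minimal sketch in plain Python. It only shows the bit packing, not NVFP4 quantization itself. The function names and the nibble order (first value in the low 4 bits) are assumptions made for illustration, not NVIDIA's actual memory layout.

```python
def pack_nibbles(codes):
    """Pack 4-bit codes (ints in 0..15) into bytes, two codes per byte.

    Hypothetical helper: the low-nibble-first order is an assumption.
    """
    if len(codes) % 2:
        raise ValueError("need an even number of 4-bit codes")
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("each code must fit in 4 bits")
    # Even-indexed codes go in the low nibble, odd-indexed in the high nibble.
    return bytes(lo | (hi << 4) for lo, hi in zip(codes[0::2], codes[1::2]))


def unpack_nibbles(packed):
    """Recover the original 4-bit codes from the packed bytes."""
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)  # low nibble
        codes.append(byte >> 4)    # high nibble
    return codes


codes = [1, 15, 0, 7]
packed = pack_nibbles(codes)
print(len(codes), "codes stored in", len(packed), "bytes")
```

Four codes fit in two bytes, where an 8-bit format would need four.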

Here are the key facts:

NVFP4 on Blackwell saves memory. It uses half the memory of FP16 and 1.5x less than FP8, which leaves room for larger batch sizes.
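Per-tensor storage can be estimated from bit widths alone. The sketch below is that back-of-envelope calculation, not a reproduction of the figures above, which likely also reflect state that is not kept in 4 bits (optimizer state, activations, higher-precision copies). The helper name is hypothetical, and the layout used for the 4-bit case (one 8-bit scale shared by each block of 16 values) is an assumption about NVFP4 that the article does not state.

```python
def tensor_bytes(num_values, bits_per_value, block_size=None, scale_bits=0):
    """Raw storage in bytes for one tensor, plus optional per-block scales.

    Hypothetical helper for illustration; it ignores alignment and padding.
    """
    total_bits = num_values * bits_per_value
    if block_size:
        num_blocks = -(-num_values // block_size)  # ceiling division
        total_bits += num_blocks * scale_bits
    return total_bits / 8


n = 1_000_000
fp16 = tensor_bytes(n, 16)
fp8 = tensor_bytes(n, 8)
# Assumed 4-bit layout: 16-value blocks, one 8-bit scale per block.
fp4 = tensor_bytes(n, 4, block_size=16, scale_bits=8)

print(f"FP16: {fp16:,.0f} B  FP8: {fp8:,.0f} B  4-bit + scales: {fp4:,.0f} B")
print(f"FP16 / 4-bit = {fp16 / fp4:.2f}x, FP8 / 4-bit = {fp8 / fp4:.2f}x")
```

Under these assumptions the scale factors add half a bit per value, so the per-tensor saving over FP8 comes out slightly under 2x.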

The H100 has no FP4 Tensor Core support, which gives Blackwell a clear edge for pre-training.

NVIDIA says accuracy stays high. We still need more tests on models over 100B parameters, and real benchmark scores, to be sure.
