𝗚𝗲𝗺𝗺𝗮 𝟰 𝗡𝗼𝘄 𝗥𝘂𝗻𝘀 𝗟𝗼𝗰𝗮𝗹𝗹𝘆 𝗼𝗻 𝗬𝗼𝘂𝗿 𝗚𝗣𝗨
mlx-vlm v0.6.2 is out.
It adds support for Gemma 4 QAT checkpoints. You can now run these Google DeepMind models on your own hardware, including consumer GPUs and edge devices.
Key updates:
- Support for Gemma 4 QAT checkpoints.
- Video input for the 12B model.
- Reliability fixes for Gemma 4.
- APC fix for single requests.
QAT stands for quantization-aware training. Google trains the model with low-precision weights simulated during training, so it stays accurate after it is shrunk. That lets you run larger models on smaller hardware.
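To make the "larger models on smaller hardware" point concrete, here is a minimal sketch in plain Python. It is not Google's training recipe and not part of mlx-vlm. The 12B parameter count comes from the model size mentioned above. The 4-bit width, the symmetric rounding scheme, and the helper names `weight_memory_gb` and `fake_quantize` are assumptions made for illustration only.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Counts weights only; activations, KV cache and quantization
    scales are ignored, so real usage will be somewhat higher.
    """
    return num_params * bits_per_weight / 8 / 1e9


def fake_quantize(values, bits=4):
    """Round values to a symmetric integer grid, then map back to floats.

    Quantization-aware training typically applies a round trip like
    this during training so the model learns to tolerate the rounding
    error. The exact scheme used for Gemma is not stated in the post;
    this one is a generic illustration.
    """
    qmax = 2 ** (bits - 1) - 1  # largest integer level, 7 for 4-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Quantize to an integer level, clamp to the grid, then dequantize.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


if __name__ == "__main__":
    params = 12e9  # "12B" model size from the post
    print(weight_memory_gb(params, 16))  # 24.0 (16-bit weights)
    print(weight_memory_gb(params, 4))   # 6.0  (assumed 4-bit weights)
    print(fake_quantize([0.7, -0.35, 0.1, 0.0]))
```

Under these assumptions the weights shrink from about 24 GB to about 6 GB, which is the difference between needing a workstation card and fitting on a consumer GPU.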
Google DeepMind released these checkpoints on launch day and partnered with mlx-vlm for this release. You no longer need third-party tools to compress the model.
Source: https://dev.to/gentic_news/mlx-vlm-v062-adds-gemma-4-qat-support-for-local-gpus-lod