𝗗𝗶𝗳𝗳𝘂𝘀𝗶𝗼𝗻𝗚𝗲𝗺𝗺𝗮 𝟮𝟲𝗕: 𝗣𝗮𝗿𝗮𝗹𝗹𝗲𝗹 𝗧𝗲𝘅𝘁 𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻


AI-assisted draft.

3 days ago · 1 min read

Google DeepMind released DiffusionGemma 26B. This model uses discrete diffusion instead of the standard autoregressive method.

Most models like GPT or Llama generate text one token at a time. They must run a full pass for every single token. This makes them slow for local use or real-time tasks.

DiffusionGemma works differently. It starts with a block of 256 random tokens and refines them through multiple passes.
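To make the idea concrete, here is a toy sketch of refining a whole block in a fixed number of passes. It is not DiffusionGemma's actual algorithm: `toy_denoiser`, `refine_block`, the vocabulary size, and the rule of committing the most confident positions first are all assumptions made for illustration. Only the block size of 256 and the pass count of about 10 come from the text above.

```python
import math
import random

BLOCK_SIZE = 256   # block length quoted in the article
NUM_PASSES = 10    # approximate pass count quoted in the article
VOCAB_SIZE = 1000  # hypothetical toy vocabulary size


def toy_denoiser(tokens, target, rng):
    """Hypothetical stand-in for the model.

    One call scores every position of the block at once and returns a
    (proposed_token, confidence) pair per position. A real model would
    predict proposals from `tokens`; this toy reads them from `target`.
    """
    return [(target[pos], rng.random()) for pos in range(len(tokens))]


def refine_block(target, num_passes=NUM_PASSES, seed=0):
    """Start from random tokens and refine the whole block in a few passes."""
    rng = random.Random(seed)
    tokens = [rng.randrange(VOCAB_SIZE) for _ in target]
    committed = [False] * len(target)
    passes = 0
    for step in range(num_passes):
        proposals = toy_denoiser(tokens, target, rng)  # one parallel pass
        passes += 1
        open_positions = [i for i, done in enumerate(committed) if not done]
        # Commit enough positions per pass to finish by the last pass.
        quota = math.ceil(len(open_positions) / (num_passes - step))
        open_positions.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in open_positions[:quota]:
            tokens[i] = proposals[i][0]
            committed[i] = True
    return tokens, passes


target_rng = random.Random(1)
target = [target_rng.randrange(VOCAB_SIZE) for _ in range(BLOCK_SIZE)]
tokens, passes = refine_block(target)
print(tokens == target, passes)  # True 10
```

The point of the sketch is the loop count: the model stand-in is called 10 times for 256 positions, whereas a token-by-token loop would call it 256 times.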

Why this matters:

• Speed: It can reach 1,000 tokens per second on an H100 GPU. Standard autoregressive models reach only about 70 tokens per second on the same hardware.
• Efficiency: Instead of 256 passes for 256 tokens, it needs only about 10 passes.
• GPU usage: It keeps the GPU's compute units busy instead of being limited by memory bandwidth.
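The figures above can be put side by side with a little arithmetic. This is a back-of-the-envelope sketch that uses only the numbers quoted in this post. The variable names are illustrative, and the per-pass comparison assumes the quoted throughputs apply to a single 256-token block.

```python
diffusion_tps = 1000      # tokens per second, quoted for an H100
autoregressive_tps = 70   # tokens per second, quoted for the same hardware
block_size = 256          # tokens per block
diffusion_passes = 10     # approximate passes per block

throughput_ratio = diffusion_tps / autoregressive_tps
pass_ratio = block_size / diffusion_passes

# Time to produce one 256-token block at the quoted throughputs.
block_seconds_diffusion = block_size / diffusion_tps
block_seconds_autoregressive = block_size / autoregressive_tps

# Implied time per forward pass under the assumption stated above.
ms_per_diffusion_pass = 1000 * block_seconds_diffusion / diffusion_passes
ms_per_autoregressive_pass = 1000 / autoregressive_tps

print(f"throughput ratio: {throughput_ratio:.1f}x")            # 14.3x
print(f"pass ratio: {pass_ratio:.1f}x")                        # 25.6x
print(f"block time: {block_seconds_diffusion:.3f} s vs "
      f"{block_seconds_autoregressive:.2f} s")                 # 0.256 s vs 3.66 s
print(f"per pass: {ms_per_diffusion_pass:.1f} ms vs "
      f"{ms_per_autoregressive_pass:.1f} ms")                  # 25.6 ms vs 14.3 ms
```

Under these figures the pass count drops by about 25.6x while throughput rises by about 14.3x, which would mean each diffusion pass takes roughly 1.8 times as long as one autoregressive pass.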

The trade-offs:

The speed comes at a cost in quality. DiffusionGemma scores lower than the standard Gemma 4 26B on reasoning and coding benchmarks.

Best use cases:

• Code infilling.
• Filling JSON schemas.
• Structured document completion.
• Local tasks where low latency is the priority.

Avoid using it for:

• High-concurrency APIs with huge batches.
• Tasks where quality is the only priority.
• Applications that require streaming text word by word.

This model uses a Mixture-of-Experts (MoE) architecture. It has 25.2B total parameters but only uses 3.8B active parameters per step. You can run the 4-bit version on an RTX 4090 with 24GB VRAM.
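A rough memory estimate shows why the 4-bit version fits on a 24GB card. This is a minimal sketch that counts weights only. It assumes exactly 4 bits per parameter and 1 GB = 10^9 bytes, and it ignores quantization metadata, activations, and any runtime buffers, so real usage will be higher.

```python
total_params = 25.2e9    # total parameters, from the article
active_params = 3.8e9    # active parameters per step, from the article
bits_per_param = 4       # assumption: plain 4-bit weights, no overhead
vram_gb = 24             # RTX 4090 memory, from the article

weight_gb = total_params * bits_per_param / 8 / 1e9
headroom_gb = vram_gb - weight_gb
active_fraction = active_params / total_params

print(f"weights: {weight_gb:.1f} GB")              # 12.6 GB
print(f"headroom: {headroom_gb:.1f} GB")           # 11.4 GB
print(f"active share: {active_fraction:.1%}")      # 15.1%
```

The weights alone take about half of the card, and only around 15% of the parameters take part in any single step.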

It is an experimental model. Use standard Gemma 4 if you need the highest accuracy. Use DiffusionGemma if you need extreme speed for local applications.

Source: https://dev.to/prabhakar_chaudhary_7afe4/diffusiongemma-26b-how-googles-text-diffusion-model-generates-tokens-in-parallel-56og

Optional learning community: https://t.me/GyaanSetuAi
