𝗗𝗶𝗳𝗳𝘂𝘀𝗶𝗼𝗻𝗚𝗲𝗺𝗺𝗮: 𝟭,𝟬𝟬𝟬 𝗧𝗼𝗸𝗲𝗻𝘀 𝗣𝗲𝗿 𝗦𝗲𝗰𝗼𝗻𝗱


AI-assisted draft.

3 days ago · 2 min read

Most language models generate text one token at a time, from left to right. This creates a speed limit: the model must finish each token before it can start on the next.

Google DeepMind changed this with DiffusionGemma.

Instead of writing token by token, it uses a denoising process: it takes a block of up to 256 tokens and refines them all at once. This approach reaches over 1,000 tokens per second on a single NVIDIA H100, which is four times faster than standard left-to-right models.
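The speed argument comes down to counting forward passes. The sketch below assumes a fixed number of refinement passes per block; `steps_per_block=64` is a hypothetical value chosen only for illustration (the article does not say how many passes DiffusionGemma uses), and the function names are made up.

```python
import math

def passes_autoregressive(num_tokens: int) -> int:
    """Left-to-right decoding: one forward pass per generated token."""
    return num_tokens

def passes_block_diffusion(num_tokens: int, block_size: int = 256,
                           steps_per_block: int = 64) -> int:
    """Block-wise refinement: a fixed number of passes per block of tokens.

    block_size=256 matches the article; steps_per_block=64 is a HYPOTHETICAL
    value, since the article does not state how many passes are used.
    """
    num_blocks = math.ceil(num_tokens / block_size)
    return num_blocks * steps_per_block

if __name__ == "__main__":
    for n in (256, 1024):
        print(f"{n} tokens: {passes_autoregressive(n)} sequential passes "
              f"vs {passes_block_diffusion(n)} block-refinement passes")
```

Fewer passes do not translate one-to-one into speed: each diffusion pass works on the whole block, so it costs more than a single-token pass. The real gain depends on how much of that extra work the GPU can absorb without slowing down.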

How it works:

The model starts with a block of placeholder tokens.
It runs multiple passes to clean up these placeholders.
Every token looks at every other token in the block at the same time.
This bidirectional view helps the model understand context from both sides.
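The steps above can be sketched as a small decoding loop. This is a toy, assuming a confidence-based unmasking schedule of the kind used in masked-diffusion text models; the article does not describe DiffusionGemma's actual schedule. `toy_denoiser` is a hypothetical stand-in for the network: it reads its proposals from a known target sentence and only mimics bidirectional context by giving higher confidence to positions whose left and right neighbours are already decided.

```python
import math

MASK = "<mask>"  # placeholder token; the real model's placeholder is not specified

def toy_denoiser(block, target):
    """HYPOTHETICAL stand-in for the network (not DiffusionGemma's model).

    Proposes a token for every masked position in one call. The proposal is
    read from `target` (an oracle); confidence grows with the number of
    already-decided neighbours on the left AND right.
    """
    proposals = {}
    for i, tok in enumerate(block):
        if tok != MASK:
            continue
        left = i > 0 and block[i - 1] != MASK
        right = i + 1 < len(block) and block[i + 1] != MASK
        proposals[i] = (target[i], 0.5 + 0.25 * left + 0.25 * right)
    return proposals

def denoise(target, steps):
    """Start from an all-placeholder block and, on each pass, commit the most
    confident proposals, spreading the work evenly over `steps` passes."""
    block = [MASK] * len(target)
    history = [list(block)]
    for step in range(steps):
        remaining = block.count(MASK)
        if remaining == 0:
            break
        k = math.ceil(remaining / (steps - step))  # positions to commit this pass
        proposals = toy_denoiser(block, target)
        # Highest confidence first; ties broken by position for determinism.
        ranked = sorted(proposals.items(), key=lambda kv: (-kv[1][1], kv[0]))
        for pos, (tok, _conf) in ranked[:k]:
            block[pos] = tok
        history.append(list(block))
    return block, history

if __name__ == "__main__":
    target = "the cat sat on the mat".split()
    final, history = denoise(target, steps=3)
    for state in history:
        print(" ".join(state))
```

In a real model each proposal would come from one forward pass with a full (non-causal) attention mask over the block, and tokens committed in earlier passes are what make later proposals more confident.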

Hardware performance:

• NVIDIA H100: 1,000+ tokens/second
• NVIDIA DGX Station: up to 2,000 tokens/second
• GeForce RTX 5090: ~700 tokens/second
• VRAM required: ~18 GB when quantized
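To make the quoted rates concrete, the arithmetic below converts each one into the average time to produce one 256-token block. It treats "1,000+" and "up to 2,000" as exactly 1,000 and 2,000, and it reads the "four times faster" claim as a comparison on the same H100; both are simplifications.

```python
BLOCK_TOKENS = 256  # block size quoted in the article

# Throughput figures from the list above, in tokens per second
# ("1,000+" and "up to 2,000" are taken as round numbers).
QUOTED_RATES = {
    "NVIDIA H100": 1000,
    "NVIDIA DGX Station": 2000,
    "GeForce RTX 5090": 700,
}

def seconds_per_block(tokens_per_second: float,
                      block_tokens: int = BLOCK_TOKENS) -> float:
    """Average wall-clock time to produce one block at a given throughput."""
    return block_tokens / tokens_per_second

if __name__ == "__main__":
    for gpu, rate in QUOTED_RATES.items():
        ms = seconds_per_block(rate) * 1000
        print(f"{gpu}: ~{ms:.0f} ms per {BLOCK_TOKENS}-token block")
    # Assumption: "four times faster" refers to the same H100,
    # implying a left-to-right baseline of about 1000 / 4 tokens/s.
    print(f"Implied H100 baseline: ~{QUOTED_RATES['NVIDIA H100'] / 4:.0f} tokens/s")
```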

Where to use it:

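DiffusionGemma excels in local settings. In the cloud, providers batch many users' requests together so the GPU always has plenty of work. On your own computer there is usually only one request, so the GPU often sits idle while the model weights are read from memory for each new token. DiffusionGemma addresses this by processing a whole block per pass, turning a memory-bound workload into a compute-bound one.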
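The memory-versus-compute point can be checked with a back-of-the-envelope roofline estimate. This is a simplified sketch, not a measurement: it assumes every weight is read once per forward pass, counts about 2 FLOPs per parameter per token, ignores activation and KV-cache traffic, assumes 4-bit weights (the article only says "quantized"), and uses a hypothetical ridge point of 300 FLOPs per byte (a GPU's peak FLOP/s divided by its memory bandwidth; substitute your own card's figures).

```python
# Simplified roofline estimate. Assumptions (not from the article):
#   - each weight is read from memory once per forward pass,
#   - ~2 FLOPs (multiply + add) per parameter per token in that pass,
#   - activation and KV-cache traffic are ignored.

def arithmetic_intensity(tokens_per_pass: int, bytes_per_param: float) -> float:
    """FLOPs performed per byte of weights read during one forward pass."""
    flops_per_param = 2 * tokens_per_pass
    return flops_per_param / bytes_per_param

def classify_pass(tokens_per_pass: int, bytes_per_param: float,
                  ridge_point: float) -> str:
    """Below the ridge point the GPU waits on memory; above it, on arithmetic."""
    ai = arithmetic_intensity(tokens_per_pass, bytes_per_param)
    return "memory-bound" if ai < ridge_point else "compute-bound"

if __name__ == "__main__":
    RIDGE = 300.0          # hypothetical FLOPs/byte for an illustrative GPU
    BYTES_PER_PARAM = 0.5  # assumes 4-bit quantized weights
    for tokens in (1, 256):  # one token per pass vs. a full 256-token block
        ai = arithmetic_intensity(tokens, BYTES_PER_PARAM)
        print(f"{tokens:>3} tokens/pass: {ai:.0f} FLOPs/byte -> "
              f"{classify_pass(tokens, BYTES_PER_PARAM, RIDGE)}")
```

Batching many users in the cloud raises `tokens_per_pass` in the same way, which is why a single local user gains the most from block-wise generation.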

Use it for:

Code infilling: Adding code to the middle of a function.
Text editing: Changing a sentence inside a paragraph.
Constraint tasks: Solving puzzles or math where the whole block must fit together.
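These use cases share one property: the known text on both sides stays fixed and only a span in between is generated. Assuming infilling works by placing placeholder tokens in the gap (the article does not describe DiffusionGemma's prompt format), a minimal sketch of the bookkeeping looks like this. The helper names and the `proposals` dictionary are hypothetical; a real model would supply the proposals.

```python
MASK = "<mask>"  # hypothetical placeholder token

def build_infill_block(prefix, suffix, num_slots):
    """Lay out a block as: known prefix, `num_slots` placeholders, known suffix.
    Returns the block and the set of positions the model may change."""
    block = list(prefix) + [MASK] * num_slots + list(suffix)
    editable = set(range(len(prefix), len(prefix) + num_slots))
    return block, editable

def apply_pass(block, editable, proposals):
    """Write proposals (position -> token) only into editable positions,
    so the surrounding prefix and suffix are never overwritten."""
    out = list(block)
    for pos, tok in proposals.items():
        if pos in editable:
            out[pos] = tok
    return out

if __name__ == "__main__":
    block, editable = build_infill_block(
        ["The", "model", "is"], ["on", "local", "hardware", "."], num_slots=2)
    # The proposal for position 0 is ignored because the prefix is frozen.
    filled = apply_pass(block, editable, {3: "very", 4: "fast", 0: "A"})
    print(" ".join(filled))
```

Because every placeholder attends to both the prefix and the suffix in each pass, the generated span is conditioned on the text after it as well as the text before it.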

The trade-off is quality. Benchmarks show DiffusionGemma scores lower than standard Gemma 4 in reasoning and coding. Language is harder to diffuse than images because one wrong word can ruin a whole sentence.

The verdict:

Use DiffusionGemma if you need speed on local hardware. Use standard Gemma 4 if you need the highest accuracy and deep reasoning.

Source: https://dev.to/prabhakar_chaudhary_7afe4/diffusiongemma-how-google-deepminds-text-diffusion-model-achieves-1000-tokens-per-second-3jnl

Optional learning community: https://t.me/GyaanSetuAi
