Two Diffusion Steps Reach 31 FPS

Diffusion models for lip sync finally reach real-time speeds.

The common assumption is that diffusion needs dozens of denoising steps to work. New research shows that lip sync needs only two.

The Lip Forcing method changes how the pipeline works. It does not just make the model bigger. It makes the process smarter.

Older systems needed more than 50 denoising steps. That meant long delays. They could not be used for live interaction.
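Why do steps cost so much? In an iterative sampler, every step is one more pass through the network. The toy loop below only illustrates that scaling. It is not the Lip Forcing method, and `sample` and `CountingDenoiser` are hypothetical names made up for this sketch.

```python
from typing import Callable, List


def sample(denoise: Callable[[List[float], float], List[float]],
           noise: List[float], num_steps: int) -> List[float]:
    """Generic iterative sampler: exactly one denoiser call per step."""
    x = list(noise)
    for i in range(num_steps):
        t = 1.0 - i / num_steps  # noise level moves from 1.0 toward 0.0
        x = denoise(x, t)
    return x


class CountingDenoiser:
    """Hypothetical stand-in for a neural network. It only counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, x: List[float], t: float) -> List[float]:
        self.calls += 1
        return [0.5 * v for v in x]  # placeholder update, not a real model


many = CountingDenoiser()
sample(many, [1.0, -1.0], num_steps=50)  # step count at the low end of "over 50"

few = CountingDenoiser()
sample(few, [1.0, -1.0], num_steps=2)    # the two-step setting from the post

print(many.calls, few.calls)    # 50 2
print(many.calls / few.calls)   # 25.0
```

With a real network in place of the placeholder, each of those calls is a full forward pass. Fewer steps means fewer passes per output.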

The new 1.3B student model hits 31 FPS. This is 17.6x faster than previous models of the same size.
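What do those two numbers imply? A quick back-of-the-envelope check, assuming the 17.6x figure is a ratio of frames per second (the post does not say how it was measured):

```python
# Figures quoted in the post
student_fps = 31.0  # throughput of the 1.3B student model
speedup = 17.6      # stated speedup over previous models of the same size

# Assumption: the speedup is a ratio of frames per second.
baseline_fps = student_fps / speedup


def frame_time_ms(fps: float) -> float:
    """Average time per frame in milliseconds at a given throughput."""
    return 1000.0 / fps


student_ms = frame_time_ms(student_fps)
baseline_ms = frame_time_ms(baseline_fps)

print(f"implied baseline: {baseline_fps:.2f} FPS ({baseline_ms:.0f} ms per frame)")
# implied baseline: 1.76 FPS (568 ms per frame)
print(f"student: {student_fps:.0f} FPS ({student_ms:.1f} ms per frame)")
# student: 31 FPS (32.3 ms per frame)
```

Under that assumption, earlier models of the same size ran at under 2 FPS, or more than half a second per frame. The student averages about 32 ms per frame.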

How does it work?

The speed comes with a small trade-off in fidelity. However, the synchronization remains high.

The limitations are clear.

If two steps work for lip sync, other video models could follow the same path. Lightweight students could replace heavy models. That opens the door to live streaming filters and on-device animation.

We might see one-step models soon. That would bring video generation even closer to instant.

Source: https://dev.to/olaughter/two-diffusion-steps-reach-31-fps-52pd
