
Researchers from HSE University, working with Yandex Research, have developed a technique that dramatically speeds up diffusion models, which are central to image generation. The university's press service reported the advancement to Gazeta.Ru.
Currently, diffusion models set the benchmark for image synthesis tasks; however, their widespread adoption is hampered by substantial computational demands, typically requiring numerous sequential steps to produce a single image.
The novel technique, dubbed Scale-wise Distillation of Diffusion Models (SwD), cuts the time needed to generate a single image to just 0.3–0.4 seconds while preserving image quality.
This approach builds on the observation that an image's overall structure is established in the early stages of generation, while finer details emerge later. The researchers therefore proposed starting generation at a low resolution and increasing it step by step as the noise level decreases, reserving expensive full-resolution computation for the final, detail-adding steps.
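The scale-wise idea can be illustrated with a toy sampling loop. Everything below is an illustrative assumption, not the authors' actual schedule: the resolutions, the noise levels, and the `denoise` placeholder (which stands in for one model step) are all made up for the sketch.

```python
import numpy as np

def upsample2x(img):
    # Nearest-neighbour upsampling; enough for a sketch.
    return img.repeat(2, axis=0).repeat(2, axis=1)

def scale_wise_sample(denoise, sizes=(64, 128, 256), sigmas=(1.0, 0.5, 0.1)):
    """Hypothetical scale-wise loop: start from pure noise at low
    resolution, then upsample as the noise level sigma decreases,
    running one denoising step per scale."""
    rng = np.random.default_rng(0)
    img = rng.normal(size=(sizes[0], sizes[0]))  # pure noise, low res
    for size, sigma in zip(sizes, sigmas):
        while img.shape[0] < size:
            img = upsample2x(img)     # grow the canvas as noise drops
        img = denoise(img, sigma)     # one generation step at this scale
    return img
```

The point of the structure is that the early, high-noise steps touch far fewer pixels than a fixed full-resolution loop would.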
A second vital component is model distillation. This involves training a simpler ‘student model’ to replicate the output of a complex ‘teacher model,’ such as FLUX or Stable Diffusion 3.5. Consequently, the number of required generation steps plummets from tens down to just 4–6.
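Distillation itself can be shown with a deliberately tiny example: a linear "student" learns to reproduce a frozen linear "teacher" by gradient descent on a mean-squared error. The linear models, loss, and hyperparameters here are simplified stand-ins, not the SwD recipe, which distills large diffusion models such as FLUX or Stable Diffusion 3.5.

```python
import numpy as np

rng = np.random.default_rng(1)
teacher_w = rng.normal(size=(4, 4))   # frozen "teacher" weights
student_w = np.zeros((4, 4))          # "student" starts from scratch

for _ in range(500):
    x = rng.normal(size=(32, 4))            # batch of random inputs
    target = x @ teacher_w                  # teacher's output
    pred = x @ student_w                    # student's output
    grad = x.T @ (pred - target) / len(x)   # gradient of the MSE loss
    student_w -= 0.1 * grad                 # plain SGD update
```

After training, the student reproduces the teacher's mapping; in the real setting the payoff is that the student does so in 4–6 generation steps instead of tens.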
For training, the team employed a new loss function based on Maximum Mean Discrepancy (MMD). It compares the feature representations of the teacher and student models directly, eliminating the need for auxiliary neural networks. This both simplifies and accelerates training: in practical tests, a single training iteration ran seven times faster.
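A minimal version of an MMD loss between two batches of feature vectors might look like the sketch below. The RBF kernel and its bandwidth are assumptions for illustration; the paper's actual feature-space formulation is more elaborate.

```python
import numpy as np

def rbf_kernel(x, y, bandwidth=4.0):
    # Pairwise RBF similarities between rows of x and rows of y.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def mmd2(teacher_feats, student_feats, bandwidth=4.0):
    """Biased estimate of squared Maximum Mean Discrepancy between
    two sets of feature vectors (one row per sample). It is zero when
    the two feature distributions match, and grows as they diverge —
    no discriminator or other auxiliary network is needed."""
    k_tt = rbf_kernel(teacher_feats, teacher_feats, bandwidth)
    k_ss = rbf_kernel(student_feats, student_feats, bandwidth)
    k_ts = rbf_kernel(teacher_feats, student_feats, bandwidth)
    return k_tt.mean() + k_ss.mean() - 2 * k_ts.mean()
```

Because the loss is a closed-form expression over kernel evaluations, each training iteration avoids the extra forward/backward passes an auxiliary network would require, which is where the reported speedup comes from.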
According to the creators, this method renders contemporary generative models faster and more economical to operate, paving the way for broader deployment across various fields, ranging from creative design and media to complex scientific applications.