Computer Science - Computer Vision and Pattern Recognition
Though diffusion models excel in image generation, their step-by-step
denoising leads to slow generation speeds. Consistency training addresses this
issue with single-step sampling but often produces lower-quality generations
and incurs high training costs. In this paper, we show that optimizing the
consistency training loss minimizes the Wasserstein distance between target and
generated distributions. As the timestep increases, the upper bound accumulates
previous consistency training losses. Therefore, larger batch sizes are needed
to reduce both current and accumulated losses. We propose Adversarial
Consistency Training (ACT), which directly minimizes the Jensen-Shannon (JS)
divergence between distributions at each timestep using a discriminator.
Theoretically, ACT enhances generation quality and convergence. By
incorporating a discriminator into the consistency training framework, our
method achieves improved FID scores on the CIFAR10, ImageNet 64$\times$64, and
LSUN Cat 256$\times$256 datasets, retains zero-shot image inpainting
capabilities, and uses less than $1/6$ of the original batch size and fewer
than $1/2$ of the model parameters and training steps compared to the baseline
method. This leads to a substantial reduction in resource consumption. Our code
is available at https://github.com/kong13661/ACT
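
For readers unfamiliar with why a discriminator can target the JS divergence, the standard result from the original GAN analysis is sketched below. The notation is an assumption made for illustration and is not taken from the paper: $p_t$ denotes the target distribution at timestep $t$, $q_t$ the generated distribution, and $D$ a discriminator with outputs in $(0, 1)$. The exact objective used by ACT may differ from this textbook form.

$$
\mathrm{JS}(p_t \,\|\, q_t) = \tfrac{1}{2}\,\mathrm{KL}(p_t \,\|\, m_t) + \tfrac{1}{2}\,\mathrm{KL}(q_t \,\|\, m_t),
\qquad m_t = \tfrac{1}{2}\,(p_t + q_t)
$$

$$
\max_{D}\; \mathbb{E}_{x \sim p_t}\big[\log D(x)\big] + \mathbb{E}_{x \sim q_t}\big[\log\big(1 - D(x)\big)\big]
= 2\,\mathrm{JS}(p_t \,\|\, q_t) - \log 4
$$

Under this identity, training the generator against an optimal discriminator at timestep $t$ reduces $\mathrm{JS}(p_t \,\|\, q_t)$ directly, without relying on a bound that accumulates over earlier timesteps.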
Details
Title
ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models