Hello, thank you very much for open-sourcing this project for us to learn from. I also came across Nitro (https://github.com/AMD-AGI/Nitro-1/blob/main/train.py) and noticed that in your training process, you first train the discriminator and then reuse the noised student predictions from the discriminator phase to train the generator. This differs from Nitro, which computes separate student predictions to train the generator and discriminator independently. Have you run any experiments confirming that this configuration works better?
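To make sure I'm describing the difference correctly, here is a minimal sketch of the two configurations as I understand them. The `student`, `discriminator`, and `add_noise` objects below are placeholders I made up for illustration, not your actual modules, and the loss form is just a generic non-saturating GAN loss:

```python
import torch
import torch.nn.functional as F

# Placeholder stand-ins for the student (generator), latent discriminator,
# and the forward noising applied before the discriminator.
student = torch.nn.Linear(16, 16)
discriminator = torch.nn.Linear(16, 1)
opt_g = torch.optim.Adam(student.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

def add_noise(x, sigma=0.5):
    # Stand-in for the diffusion forward process.
    return x + sigma * torch.randn_like(x)

real_latents = torch.randn(4, 16)
noisy_input = add_noise(real_latents)

# --- Variant A (this repo, as I read it): one noised student prediction reused ---
x_fake = student(noisy_input)        # single student forward pass
noised_fake = add_noise(x_fake)      # noised prediction shared by both phases

# Discriminator step (detached so only D receives gradients)
d_loss = F.softplus(discriminator(noised_fake.detach())).mean() \
       + F.softplus(-discriminator(add_noise(real_latents))).mean()
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step reuses the same noised prediction, now with gradients to the student
g_loss = F.softplus(-discriminator(noised_fake)).mean()
opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# --- Variant B (Nitro-style, as I read it): independent predictions per phase ---
x_fake_d = student(add_noise(real_latents)).detach()   # prediction used only for D
d_loss = F.softplus(discriminator(add_noise(x_fake_d))).mean() \
       + F.softplus(-discriminator(add_noise(real_latents))).mean()
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

x_fake_g = student(add_noise(real_latents))            # fresh forward pass for G
g_loss = F.softplus(-discriminator(add_noise(x_fake_g))).mean()
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

Please correct me if this does not match what the training code actually does.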
Recently, I’ve been trying to apply LADD training to FLUX, aiming to generate high-resolution images (1024×768) with fewer inference steps. While distilling the model improves results compared to directly using the teacher model for few-step inference, I still notice residual noise that can’t be fully removed. Do you have any suggestions for addressing this issue?