freeze and band #67
base: conv-generator
Conversation
    discriminator_optimizer.zero_grad()
    self.manual_backward(discriminator_loss["loss"], retain_graph=True)
    discriminator_optimizer.step()
    if self.global_step >= 10000:
I know this is a draft, but can we make this configurable / toggleable?
It's worse than a draft, it's a hack; and yes, we could/should.
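A minimal sketch of what making it configurable could look like, with the hard-coded 10000 replaced by a num_frozen_steps hyperparameter. The class name and constructor wiring here are assumptions for illustration, not the actual module:

    import pytorch_lightning as pl

    class VocoderGAN(pl.LightningModule):  # hypothetical name, for illustration only
        def __init__(self, num_frozen_steps: int = 10000):
            super().__init__()
            self.save_hyperparameters()
            # Discriminator updates are skipped until global_step reaches this
            # value; setting it to 0 turns the freeze off entirely.
            self.num_frozen_steps = num_frozen_steps

The training step would then compare against self.num_frozen_steps instead of the literal 10000, which is what the later revision below already does.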
A funny note / observation:
global steps are counted per optimizer step, so with three datasets the 10000-step threshold shows up in the graph at 10000 / 3 😅
we should put that in a comment in the docstring
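A possible wording for that docstring note, restating the observation above; the method signature is the standard Lightning one and the surrounding docstring text is a placeholder:

    def training_step(self, batch, batch_idx):
        """Run one manual-optimization GAN step.

        Note:
            ``self.global_step`` counts optimizer steps, not batches, so with
            three datasets a freeze window of ``num_frozen_steps`` global steps
            appears on the logged graphs at roughly ``num_frozen_steps / 3``.
        """
        ...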
    discriminator_optimizer.zero_grad()
    self.manual_backward(discriminator_loss["loss"], retain_graph=True)
    discriminator_optimizer.step()
    if self.global_step >= self.num_frozen_steps:
I was looking at the loss graphs and they didn't seem quite right, so I went back and compared this to how it's done in BigVGAN:
https://github.com/NVIDIA/BigVGAN/blob/main/train.py#L486
In their implementation (different from what's done here), discriminator_optimizer.zero_grad() is always called, but backward and step are only called conditionally.
I think the way you have it here, we may be implicitly stepping the discriminator on gradients that flowed through it during the generator pass, because the gradients are never zeroed while it's frozen.
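A sketch of the ordering being suggested, mirroring the BigVGAN reference: zero_grad() runs unconditionally, so gradients that reach the discriminator through the generator's backward pass are never carried into an update, while backward() and step() only run once the freeze window has passed. The optimizer order and the loss helper are assumptions for illustration:

    def training_step(self, batch, batch_idx):
        discriminator_optimizer, generator_optimizer = self.optimizers()

        discriminator_loss = self._discriminator_loss(batch)  # hypothetical helper

        # Clear the discriminator's gradients every step, even while it is
        # frozen, so nothing accumulated via the generator pass is applied on
        # the first unfrozen step.
        discriminator_optimizer.zero_grad()
        if self.global_step >= self.num_frozen_steps:
            self.manual_backward(discriminator_loss["loss"], retain_graph=True)
            discriminator_optimizer.step()

        # ... generator update follows, unchanged ...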