Congrats, nice work! I have two questions out of curiosity:
- Forward pass: Why did you choose to sample from the Bernoulli distribution instead of using the Gumbel-softmax? To my knowledge, sampling from the Bernoulli distribution introduces a bias in the gradient estimate, which could make optimization trickier. I understand that you would not be able to use sparse convolutions during training, but I wonder if there is another reason.
- Have you tried annealing the temperature parameter to less than 1?
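
To make the two questions concrete, here is a minimal sketch of the sampling options being compared, assuming a binary (two-class) Gumbel-softmax relaxation of a single gate. All function names and the exponential annealing schedule are hypothetical illustrations, not taken from this repository. The relaxed sample is a smooth function of the logit, whereas the hard Bernoulli sample is a step function, so gradients through it have to be approximated.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logistic_noise(rng):
    """Difference of two Gumbel(0, 1) draws, i.e. a Logistic(0, 1) sample."""
    u = rng.random()
    # Clamp away from 0 and 1 so both logs stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return math.log(u) - math.log(1.0 - u)


def relaxed_bernoulli_sample(logit, tau, rng):
    """Binary Gumbel-softmax sample: a soft value in [0, 1].

    Smaller tau pushes the sample towards 0 or 1.
    """
    return sigmoid((logit + logistic_noise(rng)) / tau)


def hard_bernoulli_sample(logit, rng):
    """Plain Bernoulli sample: exactly 0.0 or 1.0, with P(1) = sigmoid(logit)."""
    return 1.0 if rng.random() < sigmoid(logit) else 0.0


def annealed_tau(step, tau_start=1.0, tau_end=0.5, decay=1e-3):
    """Hypothetical schedule: exponential decay from tau_start, floored at tau_end."""
    return max(tau_end, tau_start * math.exp(-decay * step))


if __name__ == "__main__":
    rng = random.Random(0)
    logit = 0.3  # assumed example value
    for step in (0, 500, 5000):
        tau = annealed_tau(step)
        soft = relaxed_bernoulli_sample(logit, tau, rng)
        hard = hard_bernoulli_sample(logit, rng)
        print(f"step={step} tau={tau:.3f} soft={soft:.3f} hard={hard}")
```

Thresholding the relaxed sample at 0.5 gives a hard decision with the same probability as the Bernoulli sample, so the two differ mainly in how the gradient is obtained during training.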