Questions about mask generation #12

@magehrig

Hi @thomasverelst

Congrats, nice work! I have two questions out of curiosity:

  1. Forward pass: Why did you choose to sample from a Bernoulli distribution instead of using the Gumbel-softmax? To my knowledge, sampling from the Bernoulli distribution introduces a bias in the gradient estimate, which could make optimization trickier. I understand that you would not be able to use sparse convolutions during training, but I wonder if there is another reason.

  2. Have you tried annealing the temperature parameter to less than 1?
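
To make the two questions concrete, here is a minimal sketch in plain Python of a binary Gumbel-softmax sample (the two-class case reduces to a sigmoid over the logit plus the difference of two Gumbel noises) next to a plain Bernoulli draw. It is not the repository's implementation: the function names, the scalar `logit`, and the `tau` values are hypothetical, and no gradients are computed. It only illustrates how the temperature changes the relaxed sample.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample by inverse transform sampling."""
    # Clamp the uniform draw away from 0 and 1 so both logs stay finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_sigmoid(logit, tau, rng):
    """Relaxed binary sample (hypothetical helper, not the repo's code).

    Returns (soft, hard): soft lies in (0, 1), hard is its 0/1 threshold.
    """
    noisy_logit = logit + sample_gumbel(rng) - sample_gumbel(rng)
    soft = sigmoid(noisy_logit / tau)
    hard = 1.0 if soft > 0.5 else 0.0
    return soft, hard


def bernoulli(logit, rng):
    """Hard 0/1 sample with P(1) = sigmoid(logit)."""
    return 1.0 if rng.random() < sigmoid(logit) else 0.0


def mean_softness(logit, tau, n, seed):
    """Average distance of the soft samples from the nearest of {0, 1}."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        soft, _ = gumbel_sigmoid(logit, tau, rng)
        total += min(soft, 1.0 - soft)
    return total / n


def hard_rate(logit, tau, n, seed):
    """Fraction of thresholded Gumbel samples that equal 1."""
    rng = random.Random(seed)
    return sum(gumbel_sigmoid(logit, tau, rng)[1] for _ in range(n)) / n


if __name__ == "__main__":
    for tau in (5.0, 1.0, 0.5):
        print(tau, mean_softness(0.5, tau, 2000, seed=0))
```

Two things are visible in this sketch. The thresholded sample is 1 exactly when `logit + g1 - g2 > 0`, and the difference of two independent Gumbel variables follows a logistic distribution, so the hard forward mask has the same distribution as a Bernoulli draw with probability `sigmoid(logit)`; the two approaches differ in how a gradient is passed through the sample, not in the forward mask itself. Lowering `tau` pushes the soft values closer to 0 or 1, which is the effect that annealing the temperature below 1 would have.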
