Q: Padded tokens without an attention_mask? #10

Description

@Axe--

Hey, I wanted to know how padded tokens are handled by the model's attention layers,
considering that this repo's implementation of GPT-2 discards the attention_mask (unlike HuggingFace's, which uses it).
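For reference, here is a minimal sketch of how an attention_mask is typically folded into scaled dot-product attention so that padded key positions get ~zero weight. The function name `attend` and the tensor shapes are illustrative, not taken from this repo:

```python
import torch
import torch.nn.functional as F

def attend(q, k, v, attention_mask=None):
    # q, k, v: (batch, heads, seq_len, head_dim)
    # attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    scores = q @ k.transpose(-2, -1) / k.size(-1) ** 0.5
    if attention_mask is not None:
        # Broadcast to (batch, 1, 1, seq_len) and push padded keys to -inf,
        # so softmax assigns them ~zero probability.
        pad = attention_mask[:, None, None, :] == 0
        scores = scores.masked_fill(pad, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v
```

One common observation (not necessarily this repo's rationale): with causal attention and right-padded batches, every real token sits before the padding, so the causal mask already stops real tokens from attending to padded keys; only the outputs at the padded positions themselves come out wrong, and those are usually ignored downstream.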
