Thanks for your great work!

I am curious: why do we need to account for the residual connections when visualizing the attention maps?