This is a PyTorch implementation of the transformer architecture proposed in Chen et al. (2021), which applies spatial-temporal attention to multivariate time-series forecasting and future localization. The network is implemented modularly, detaching the forecasting heads from the hidden-state output.
Design choices are heavily inspired by Vaswani et al. and the transformers library.
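As a rough illustration of the modular design described above, the sketch below separates an encoder that emits hidden states from an interchangeable forecasting head that consumes them. The class names, dimensions, and layer choices here are illustrative assumptions, not the repository's actual API.

```python
import torch
import torch.nn as nn

class STEncoder(nn.Module):
    """Hypothetical encoder: emits hidden states only, no task head."""
    def __init__(self, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, x):       # x: (batch, seq_len, d_model)
        return self.encoder(x)  # hidden states, same shape as x

class ForecastHead(nn.Module):
    """Hypothetical head: maps hidden states to a forecast vector."""
    def __init__(self, d_model=64, out_dim=2):
        super().__init__()
        self.proj = nn.Linear(d_model, out_dim)

    def forward(self, h):
        return self.proj(h[:, -1])  # predict from the last time step

encoder = STEncoder()
head = ForecastHead()
x = torch.randn(8, 10, 64)   # (batch, time steps, features)
y = head(encoder(x))         # heads are swappable without retouching the encoder
print(y.shape)
```

Because the head only depends on the encoder's output shape, a localization head (or any other task head) can be swapped in without modifying the encoder.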