How can you get enough "aligned" images for pre-training?

The accuracy and robustness of keypoint selection seem to be crucial, and they depend on the "Self-supervised Pretraining" step.

However, this step requires "a single subject and its set of **aligned** different-modality scans".

I wonder how you obtain enough aligned images to train this module.

It looks like we would need to perform groupwise registration first, right?