Hello, thank you for open-sourcing this excellent work.
I have a question about reproducing the training setup in this repository.
From my reading of the code, when running a training command such as:

```bash
python -m src.main +experiment=scannet/fvt +output_dir=train_fvt_full_100v
```

the dataset loader seems to expect per-scene feature files such as language_feats_256/.pt (or related feature .pt files). However, the README only covers preparing color, depth, intrinsic, extrinsics.npy, train_idx.txt, and test_idx.txt; I could not find any guidance on how to obtain these .pt feature files.
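For context, the loading pattern I am referring to looks roughly like the snippet below. This is a paraphrase from memory rather than the repo's actual code, and both the variable names and the per-scene filename are placeholders I made up, since the exact expected filename is part of what I'm unsure about:

```python
import torch

# Paraphrased / hypothetical: the dataset loader appears to expect a
# per-scene tensor saved under language_feats_256/. "scene_feats.pt" is a
# placeholder name, not necessarily the actual filename the repo uses.
lang_feats = torch.load(f"{scene_dir}/language_feats_256/scene_feats.pt")
```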
So I would like to ask:

1. Without these .pt files, is it correct that training cannot be reproduced in a way that is fully consistent with the experimental setup reported in the paper?
2. Could you provide guidance on how these .pt files are generated? (I've sketched my best guess below.)
3. If possible, could you also clarify which pretrained model/checkpoint and which preprocessing steps were used to extract them?
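In case it helps pinpoint where my understanding diverges, here is a rough sketch of how I would naively attempt the extraction. Everything here is an assumption on my part: the choice of OpenAI CLIP ViT-B/32, per-frame image encoding, the output layout, and the output filename are all guesses, and the 256 in language_feats_256 may refer to something other than a feature dimension:

```python
import os
import torch
import clip  # assumption: OpenAI CLIP (pip install git+https://github.com/openai/CLIP.git)
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
# Assumption: ViT-B/32 checkpoint; the backbone actually used may differ.
model, preprocess = clip.load("ViT-B/32", device=device)

scene_dir = "scans/scene0000_00"  # hypothetical scene layout from the README
out_dir = os.path.join(scene_dir, "language_feats_256")
os.makedirs(out_dir, exist_ok=True)

color_dir = os.path.join(scene_dir, "color")
frames = sorted(f for f in os.listdir(color_dir) if f.endswith((".jpg", ".png")))

feats = []
with torch.no_grad():
    for name in frames:
        image = preprocess(Image.open(os.path.join(color_dir, name)))
        # encode_image returns a (1, 512) feature for ViT-B/32
        feats.append(model.encode_image(image.unsqueeze(0).to(device)).squeeze(0).cpu())

# Stack to (num_frames, 512) and save one file per scene.
# Note: this yields 512-dim features, not 256, which is one reason I suspect
# a different model or an extra projection step was used in the paper.
torch.save(torch.stack(feats).float(), os.path.join(out_dir, "scene_feats.pt"))
```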
Any clarification would be greatly appreciated. Thank you very much for your time and help.