python demo.py --input demo_data/lady-running --output_dir demo_tmp --seq_name lady-running
Warning, cannot find cuda-compiled version of RoPE2D, using a slow pytorch version instead
... loading model from checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth
instantiating : AsymmetricCroCo3DStereo(pos_embed='RoPE100', patch_embed_cls='PatchEmbedDust3R', img_size=(512, 512), head_type='dpt', output_mode='pts3d', depth_mode=('exp', -inf, inf), conf_mode=('exp', 1, inf), enc_embed_dim=1024, enc_depth=24, enc_num_heads=16, dec_embed_dim=768, dec_depth=12, dec_num_heads=12, freeze='encoder', landscape_only=False)
Freezing encoder parameters
<All keys matched successfully>
Outputting stuff in demo_tmp
>> Loading a list of 65 items
- Adding demo_data/lady-running/00000.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00001.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00002.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00003.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00004.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00005.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00006.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00007.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00008.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00009.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00010.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00011.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00012.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00013.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00014.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00015.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00016.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00017.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00018.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00019.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00020.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00021.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00022.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00023.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00024.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00025.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00026.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00027.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00028.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00029.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00030.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00031.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00032.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00033.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00034.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00035.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00036.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00037.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00038.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00039.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00040.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00041.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00042.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00043.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00044.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00045.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00046.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00047.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00048.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00049.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00050.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00051.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00052.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00053.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00054.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00055.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00056.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00057.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00058.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00059.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00060.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00061.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00062.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00063.jpg with resolution 854x480 --> 512x288
- Adding demo_data/lady-running/00064.jpg with resolution 854x480 --> 512x288
(Found 65 images)
>> Inference with model on 600 image pairs
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 38/38 [00:23<00:00, 1.60it/s]
precomputing flow...
Loaded pretrained RAFT model from third_party/RAFT/models/Tartan-C-T-TSKH-spring540x960-M.pth
0%| | 0/50 [00:00<?, ?it/s]/opt/conda/envs/py310/lib/python3.10/site-packages/torch/functional.py:507: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3549.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:23<00:00, 2.09it/s]
flow precomputed
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [01:14<00:00, 4.03it/s]
==>> frame_tensors.shape: torch.Size([65, 3, 288, 512])
==>> dir of self.img_pathes[0]: demo_data/lady-running
Traceback (most recent call last):
  File "/repo/monst3r/demo.py", line 424, in <module>
    scene, outfile, imgs = recon_fun(
  File "/repo/monst3r/demo.py", line 132, in get_reconstructed_scene
    scene = global_aligner(output, device=device, mode=mode, verbose=not silent, shared_focal = shared_focal, temporal_smoothing_weight=temporal_smoothing_weight, translation_weight=translation_weight,
  File "/repo/monst3r/dust3r/cloud_opt/__init__.py", line 25, in global_aligner
    net = PointCloudOptimizer(view1, view2, pred1, pred2, **optim_kw).to(device)
  File "/repo/monst3r/dust3r/cloud_opt/optimizer.py", line 148, in __init__
    self.refine_motion_mask_w_sam2()
  File "/repo/monst3r/dust3r/cloud_opt/optimizer.py", line 398, in refine_motion_mask_w_sam2
    inference_state = predictor.init_state(video_path=frame_tensors)
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/repo/sam2/sam2/sam2_video_predictor.py", line 51, in init_state
    images, video_height, video_width = load_video_frames(
  File "/repo/sam2/sam2/utils/misc.py", line 209, in load_video_frames
    raise NotImplementedError(
NotImplementedError: Only MP4 video and JPEG folder are supported at this moment
I found that frame_tensors, which is passed as the video_path argument of predictor.init_state, is a torch tensor rather than an MP4 file or a JPEG folder, and SAM2's load_video_frames does not support tensor input.
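A possible workaround, sketched under assumptions rather than offered as the MonST3R fix: write the frames in frame_tensors to a folder of JPEGs and pass that folder's path to init_state, since a JPEG folder is one of the two inputs load_video_frames accepts. The helper name dump_frames_to_jpeg_folder and its value_range parameter are hypothetical. The default assumes pixel values in [0, 1]; pass value_range=(-1.0, 1.0) (or whatever matches) if the frames are normalized differently. Frames are named 00000.jpg, 00001.jpg, ... because SAM2's JPEG-folder loader sorts files by the integer value of their file stems. Since the JPEGs are written at the tensor's resolution (288x512 in this run), the video height and width SAM2 reports should match frame_tensors.

```python
import os
import tempfile

import torch
from PIL import Image


def dump_frames_to_jpeg_folder(frame_tensors, out_dir=None, value_range=(0.0, 1.0)):
    """Hypothetical helper: save a (T, 3, H, W) tensor as 00000.jpg, 00001.jpg, ...

    ASSUMPTION: frame_tensors holds float RGB frames whose values lie in
    value_range; adjust value_range to match how the frames were normalized.
    Returns the folder path, suitable for predictor.init_state(video_path=...).
    """
    if frame_tensors.ndim != 4 or frame_tensors.shape[1] != 3:
        raise ValueError(f"expected (T, 3, H, W), got {tuple(frame_tensors.shape)}")
    if out_dir is None:
        out_dir = tempfile.mkdtemp(prefix="sam2_frames_")
    os.makedirs(out_dir, exist_ok=True)

    lo, hi = value_range
    # Rescale to [0, 1], clamp, and convert to uint8 in (T, H, W, 3) layout for PIL.
    frames = (frame_tensors.detach().float().cpu() - lo) / (hi - lo)
    frames = (frames.clamp(0.0, 1.0) * 255).round().to(torch.uint8)
    frames = frames.permute(0, 2, 3, 1).contiguous().numpy()

    for idx, frame in enumerate(frames):
        # Plain integer file stems, because SAM2 sorts JPEG frames by int(stem).
        Image.fromarray(frame).save(os.path.join(out_dir, f"{idx:05d}.jpg"), quality=95)
    return out_dir


# Sketch of the call site in refine_motion_mask_w_sam2 (not runnable on its own):
#   video_dir = dump_frames_to_jpeg_folder(frame_tensors)
#   inference_state = predictor.init_state(video_path=video_dir)
```

JPEG compression is lossy, so the frames SAM2 sees will differ slightly from the original tensor; the temporary folder can be deleted with shutil.rmtree once the masks are computed.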