error occurs in sam2 video segmentation: NotImplementedError: Only MP4 video and JPEG folder are supported at this moment #87

@EmmaThompson123

python demo.py --input demo_data/lady-running --output_dir demo_tmp --seq_name lady-running
Warning, cannot find cuda-compiled version of RoPE2D, using a slow pytorch version instead
... loading model from checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth
instantiating : AsymmetricCroCo3DStereo(pos_embed='RoPE100', patch_embed_cls='PatchEmbedDust3R', img_size=(512, 512), head_type='dpt', output_mode='pts3d', depth_mode=('exp', -inf, inf), conf_mode=('exp', 1, inf), enc_embed_dim=1024, enc_depth=24, enc_num_heads=16, dec_embed_dim=768, dec_depth=12, dec_num_heads=12, freeze='encoder', landscape_only=False)
Freezing encoder parameters
<All keys matched successfully>
Outputting stuff in demo_tmp
>> Loading a list of 65 items
 - Adding demo_data/lady-running/00000.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00001.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00002.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00003.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00004.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00005.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00006.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00007.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00008.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00009.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00010.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00011.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00012.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00013.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00014.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00015.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00016.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00017.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00018.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00019.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00020.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00021.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00022.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00023.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00024.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00025.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00026.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00027.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00028.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00029.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00030.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00031.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00032.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00033.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00034.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00035.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00036.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00037.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00038.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00039.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00040.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00041.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00042.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00043.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00044.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00045.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00046.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00047.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00048.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00049.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00050.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00051.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00052.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00053.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00054.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00055.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00056.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00057.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00058.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00059.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00060.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00061.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00062.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00063.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00064.jpg with resolution 854x480 --> 512x288
 (Found 65 images)
>> Inference with model on 600 image pairs
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 38/38 [00:23<00:00,  1.60it/s]
precomputing flow...
Loaded pretrained RAFT model from third_party/RAFT/models/Tartan-C-T-TSKH-spring540x960-M.pth
  0%|                                                                                                                           | 0/50 [00:00<?, ?it/s]/opt/conda/envs/py310/lib/python3.10/site-packages/torch/functional.py:507: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3549.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:23<00:00,  2.09it/s]
flow precomputed
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [01:14<00:00,  4.03it/s]
==>> frame_tensors.shape: torch.Size([65, 3, 288, 512])
==>> dir of self.img_pathes[0]: demo_data/lady-running
Traceback (most recent call last):
  File "/repo/monst3r/demo.py", line 424, in <module>
    scene, outfile, imgs = recon_fun(
  File "/repo/monst3r/demo.py", line 132, in get_reconstructed_scene
    scene = global_aligner(output, device=device, mode=mode, verbose=not silent, shared_focal = shared_focal, temporal_smoothing_weight=temporal_smoothing_weight, translation_weight=translation_weight,
  File "/repo/monst3r/dust3r/cloud_opt/__init__.py", line 25, in global_aligner
    net = PointCloudOptimizer(view1, view2, pred1, pred2, **optim_kw).to(device)
  File "/repo/monst3r/dust3r/cloud_opt/optimizer.py", line 148, in __init__
    self.refine_motion_mask_w_sam2()
  File "/repo/monst3r/dust3r/cloud_opt/optimizer.py", line 398, in refine_motion_mask_w_sam2
    inference_state = predictor.init_state(video_path=frame_tensors)
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/repo/sam2/sam2/sam2_video_predictor.py", line 51, in init_state
    images, video_height, video_width = load_video_frames(
  File "/repo/sam2/sam2/utils/misc.py", line 209, in load_video_frames
    raise NotImplementedError(
NotImplementedError: Only MP4 video and JPEG folder are supported at this moment

I found that the frame_tensors passed as video_path to predictor.init_state is a torch tensor (shape [65, 3, 288, 512], per the log above) rather than a path to an MP4 file or a JPEG folder, and the installed sam2 does not support that input.
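To make the failure mode concrete, here is a minimal sketch of the kind of input check that would produce this error. It is an assumption about the behaviour, not a copy of sam2's load_video_frames: classify_video_input and FakeTensor are hypothetical names introduced only for illustration, and FakeTensor stands in for the real torch tensor so the snippet runs without torch.

```python
import os
import tempfile


def classify_video_input(video_path):
    """Return which loader a SAM 2-style frame loader would pick.

    Hypothetical helper: it mirrors the kind of check that raises the
    error in the traceback, not the library's actual source.
    """
    is_str = isinstance(video_path, str)
    # A string ending in .mp4 is treated as a video file.
    if is_str and os.path.splitext(video_path)[-1].lower() == ".mp4":
        return "mp4"
    # A string naming an existing directory is treated as a JPEG folder.
    if is_str and os.path.isdir(video_path):
        return "jpeg_folder"
    # Anything else (including a tensor) falls through to the error.
    raise NotImplementedError(
        "Only MP4 video and JPEG folder are supported at this moment"
    )


class FakeTensor:
    """Stand-in for the torch.Tensor of shape [65, 3, 288, 512] in the log."""
    shape = (65, 3, 288, 512)


# A directory path passes the check.
with tempfile.TemporaryDirectory() as frame_dir:
    print(classify_video_input(frame_dir))

# A tensor-like object is neither a string path nor a directory.
try:
    classify_video_input(FakeTensor())
except NotImplementedError as exc:
    print("rejected:", exc)
```

Under this assumption, a string path such as the demo_data/lady-running folder printed in the log would be accepted, while the in-memory tensor is rejected, which matches the traceback.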
