
Download CLIP embeddings in build time #81

Closed
mqcmd196 wants to merge 1 commit into HiroIshida:master from mqcmd196:avoid-clip-download-runtime

@mqcmd196
Collaborator

mqcmd196 commented Jul 9, 2025

I want to use this in an offline environment. This PR supports downloading the CLIP models at build time.

This PR requires

@HiroIshida
Owner

HiroIshida commented Jul 9, 2025

Sorry, I don't have time to fully guess the intent of this PR, so could you give me more context?

  1. What do you mean by offline environment?
  2. In the PR on the HiroIshida/Detic side, you reverted the CLIP embedding cache, but I think the cache mechanism is irrelevant to whether or not the model is pre-downloaded. The cache target is the computation result f_encoder(text), not f_encoder itself. Tell me your intention.

@mqcmd196
Collaborator Author

what do you mean by offline environment?

I mean an environment without network access.

you reverted the CLIP embedding cache, but I think the cache mechanism is irrelevant to whether or not the model is pre-downloaded. The cache target is the computation result f_encoder(text), not f_encoder itself. Tell me your intention

Sorry for the confusion. The text_encoder variable has nothing to do with the prompt words (https://github.com/HiroIshida/Detic/pull/1/files#diff=93e33776696c1f26000ebfe7438a4f3ec84f8dc0a00c4256f9944c55abb16142R20). I noticed that computing the text embeddings does not take much time if the CLIP weights are pre-downloaded, so I reverted the commit.

@HiroIshida
Owner

HiroIshida commented Jul 10, 2025

On offline usage

OK, I understand the first point. But the CLIP download is done only once, and the weights are cached under

h-ishida@azarashi:~$ ls  ~/.cache/clip
ViT-B-32.pt

(the cache location is defined around here: https://github.com/openai/CLIP/blob/dcba3cb2e2827b402d2701e7e1c7d9fed8a20ef1/clip/clip.py#L120)
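The weights live in that cache directory. An offline deployment therefore only needs the checkpoint file to be present there before the first run. The sketch below fails fast instead of letting `clip.load` attempt a network download.

- The helper names `expected_checkpoint_path` and `assert_clip_cached` are hypothetical.
- The filename rule is an assumption based on the checkpoint URLs used by openai/CLIP, not a public API of the `clip` package. It replaces `/` and `@` in the model name with `-`, so `ViT-B/32` becomes `ViT-B-32.pt`, matching the listing above.

```python
import os
from pathlib import Path

# Default cache directory that openai/CLIP's clip.load uses when download_root is None.
DEFAULT_CLIP_ROOT = os.path.expanduser("~/.cache/clip")


def expected_checkpoint_path(model_name: str, download_root: str = DEFAULT_CLIP_ROOT) -> Path:
    """Guess where clip.load stores the checkpoint for `model_name`.

    Assumption: the file is named after the last URL component, which for the
    released models equals the model name with '/' and '@' replaced by '-'
    (e.g. 'ViT-B/32' -> 'ViT-B-32.pt').
    """
    filename = model_name.replace("/", "-").replace("@", "-") + ".pt"
    return Path(download_root) / filename


def assert_clip_cached(model_name: str = "ViT-B/32", download_root: str = DEFAULT_CLIP_ROOT) -> Path:
    """Raise early (rather than trying the network) if the weights are missing."""
    path = expected_checkpoint_path(model_name, download_root)
    if not path.is_file():
        raise FileNotFoundError(f"CLIP weights not found at {path}; download them at build time")
    return path
```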

I think that after this PR, we'll have two separate CLIP model download mechanisms in the code base.
So to avoid such duplication, I'd prefer to write a simple Python script that runs the CLIP download function, and to call it once during the build process. Even more simply, running python -c "import clip; hogehoge.." is also an option.
But any opinion is welcome.
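One possible shape for such a build-time script is sketched below. It assumes the openai/CLIP package is installed. When the checkpoint is missing, `clip.load(name, device=..., download_root=...)` downloads it into `download_root`, verifies its SHA256, and loads it; the script only needs that side effect.

The following are illustrative assumptions:

- the script name `prefetch_clip.py`;
- its flags;
- the default model `ViT-B/32`.

`clip` is imported lazily so the argument parser works even where the package is absent.

```python
"""prefetch_clip.py (hypothetical): download CLIP weights once, at image build time."""
import argparse
import os
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Pre-download CLIP weights.")
    parser.add_argument("--model", default="ViT-B/32", help="CLIP model name")
    parser.add_argument(
        "--download-root",
        default=os.path.expanduser("~/.cache/clip"),
        help="directory that clip.load reads from and writes to",
    )
    return parser


def main(argv: Optional[Sequence[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    import clip  # openai/CLIP; imported here so argument parsing works without it

    # Downloads the checkpoint into download_root if it is missing (and checks its
    # SHA256), then loads it; we keep nothing, only the file on disk matters.
    clip.load(args.model, device="cpu", download_root=args.download_root)
    print(f"CLIP {args.model} is cached under {args.download_root}")


if __name__ == "__main__":
    main()
```

Because `clip.load` returns without network access when a file with the expected hash already exists, running this once during the image build should be enough for later offline runs that use the same directory. This relies on a single download mechanism, the one in `clip` itself.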

On reversion of the caching mechanism

As for the second one, I disagree. Embedding computation is dominated by building the CLIP model, not by downloading it. I profiled it as shown below. No model download happens inside the embedding computation: the only download-related cost is clip's _download re-reading and SHA256-checking the already-cached file, about 0.35 s. The main bottleneck is building the model: build_text_encoder takes about 3.1 s of the 3.2 s total. So I think the current caching mechanism is quite effective (it saves about 3 seconds), and I want to keep it.

On current master branch

--- a/detic/predictor.py
+++ b/detic/predictor.py
@@ -26,10 +26,15 @@ def get_clip_embeddings(vocabulary, prompt='a '):
         return torch.load(cache_file_path)
     else:
         from detic.modeling.text.text_encoder import build_text_encoder
+        from pyinstrument import Profiler
+        profiler = Profiler()
+        profiler.start()
         text_encoder = build_text_encoder(pretrain=True)
         text_encoder.eval()
         texts = [prompt + x for x in vocabulary]
         emb = text_encoder(texts).detach().permute(1, 0).contiguous().cpu()
+        profiler.stop()
+        print(profiler.output_text(unicode=True, color=True, show_all=True))
         print(f"saved embeddings for {vocabulary} to {cache_file_path}")
         torch.save(emb, cache_file_path)
         return emb
3.187 get_clip_embeddings  detic/predictor.py:20
├─ 3.107 build_text_encoder  detic/modeling/text/text_encoder.py:174
│  ├─ 2.390 load  clip/clip.py:94
│  │  ├─ 1.125 build_model  clip/model.py:399
│  │  │  ├─ 1.008 CLIP.__init__  clip/model.py:244
│  │  │  │  ├─ 0.431 VisionTransformer.__init__  clip/model.py:207
│  │  │  │  │  ├─ 0.417 Transformer.__init__  clip/model.py:196
│  │  │  │  │  │  └─ 0.417 <listcomp>  clip/model.py:200
│  │  │  │  │  │     └─ 0.417 ResidualAttentionBlock.__init__  clip/model.py:172
│  │  │  │  │  │        ├─ 0.275 Linear.__init__  torch/nn/modules/linear.py:75
│  │  │  │  │  │        │  └─ 0.275 Linear.reset_parameters  torch/nn/modules/linear.py:88
│  │  │  │  │  │        │     └─ 0.275 kaiming_uniform_  torch/nn/init.py:360
│  │  │  │  │  │        │        └─ 0.275 Parameter.uniform_  <built-in>
│  │  │  │  │  │        └─ 0.142 MultiheadAttention.__init__  torch/nn/modules/activation.py:909
│  │  │  │  │  │           ├─ 0.103 MultiheadAttention._reset_parameters  torch/nn/modules/activation.py:951
│  │  │  │  │  │           │  └─ 0.103 xavier_uniform_  torch/nn/init.py:297
│  │  │  │  │  │           │     └─ 0.103 _no_grad_uniform_  torch/nn/init.py:12
│  │  │  │  │  │           │        └─ 0.103 Parameter.uniform_  <built-in>
│  │  │  │  │  │           └─ 0.039 NonDynamicallyQuantizableLinear.__init__  torch/nn/modules/linear.py:110
│  │  │  │  │  │              └─ 0.039 NonDynamicallyQuantizableLinear.__init__  torch/nn/modules/linear.py:75
│  │  │  │  │  │                 └─ 0.039 NonDynamicallyQuantizableLinear.reset_parameters  torch/nn/modules/linear.py:88
│  │  │  │  │  │                    └─ 0.039 kaiming_uniform_  torch/nn/init.py:360
│  │  │  │  │  │                       └─ 0.039 Parameter.uniform_  <built-in>
│  │  │  │  │  ├─ 0.012 Conv2d.__init__  torch/nn/modules/conv.py:411
│  │  │  │  │  │  └─ 0.012 Conv2d.__init__  torch/nn/modules/conv.py:67
│  │  │  │  │  │     └─ 0.012 Conv2d.reset_parameters  torch/nn/modules/conv.py:140
│  │  │  │  │  │        └─ 0.012 kaiming_uniform_  torch/nn/init.py:360
│  │  │  │  │  │           └─ 0.012 Parameter.uniform_  <built-in>
│  │  │  │  │  └─ 0.002 _VariableFunctionsClass.randn  <built-in>
│  │  │  │  ├─ 0.252 CLIP.initialize_parameters  clip/model.py:299
│  │  │  │  │  └─ 0.252 normal_  torch/nn/init.py:138
│  │  │  │  │     └─ 0.252 _no_grad_normal_  torch/nn/init.py:17
│  │  │  │  │        └─ 0.252 Parameter.normal_  <built-in>
│  │  │  │  ├─ 0.190 Transformer.__init__  clip/model.py:196
│  │  │  │  │  └─ 0.190 <listcomp>  clip/model.py:200
│  │  │  │  │     └─ 0.190 ResidualAttentionBlock.__init__  clip/model.py:172
│  │  │  │  │        ├─ 0.124 Linear.__init__  torch/nn/modules/linear.py:75
│  │  │  │  │        │  └─ 0.124 Linear.reset_parameters  torch/nn/modules/linear.py:88
│  │  │  │  │        │     └─ 0.124 kaiming_uniform_  torch/nn/init.py:360
│  │  │  │  │        │        └─ 0.124 Parameter.uniform_  <built-in>
│  │  │  │  │        └─ 0.066 MultiheadAttention.__init__  torch/nn/modules/activation.py:909
│  │  │  │  │           ├─ 0.047 MultiheadAttention._reset_parameters  torch/nn/modules/activation.py:951
│  │  │  │  │           │  └─ 0.047 xavier_uniform_  torch/nn/init.py:297
│  │  │  │  │           │     └─ 0.047 _no_grad_uniform_  torch/nn/init.py:12
│  │  │  │  │           │        └─ 0.047 Parameter.uniform_  <built-in>
│  │  │  │  │           └─ 0.019 NonDynamicallyQuantizableLinear.__init__  torch/nn/modules/linear.py:110
│  │  │  │  │              └─ 0.019 NonDynamicallyQuantizableLinear.__init__  torch/nn/modules/linear.py:75
│  │  │  │  │                 └─ 0.019 NonDynamicallyQuantizableLinear.reset_parameters  torch/nn/modules/linear.py:88
│  │  │  │  │                    └─ 0.019 kaiming_uniform_  torch/nn/init.py:360
│  │  │  │  │                       └─ 0.019 Parameter.uniform_  <built-in>
│  │  │  │  └─ 0.135 Embedding.__init__  torch/nn/modules/sparse.py:120
│  │  │  │     └─ 0.135 Embedding.reset_parameters  torch/nn/modules/sparse.py:148
│  │  │  │        └─ 0.135 normal_  torch/nn/init.py:138
│  │  │  │           └─ 0.135 _no_grad_normal_  torch/nn/init.py:17
│  │  │  │              └─ 0.135 Parameter.normal_  <built-in>
│  │  │  ├─ 0.065 convert_weights  clip/model.py:375
│  │  │  │  └─ 0.065 CLIP.apply  torch/nn/modules/module.py:577
│  │  │  │     └─ 0.065 VisionTransformer.apply  torch/nn/modules/module.py:577
│  │  │  │        └─ 0.065 Transformer.apply  torch/nn/modules/module.py:577
│  │  │  │           ├─ 0.064 Sequential.apply  torch/nn/modules/module.py:577
│  │  │  │           │  └─ 0.064 ResidualAttentionBlock.apply  torch/nn/modules/module.py:577
│  │  │  │           │     ├─ 0.059 Sequential.apply  torch/nn/modules/module.py:577
│  │  │  │           │     │  ├─ 0.038 Linear.apply  torch/nn/modules/module.py:577
│  │  │  │           │     │  │  └─ 0.038 _convert_weights_to_fp16  clip/model.py:378
│  │  │  │           │     │  │     ├─ 0.037 [self]  clip/model.py
│  │  │  │           │     │  │     └─ 0.001 Tensor.half  <built-in>
│  │  │  │           │     │  └─ 0.021 _convert_weights_to_fp16  clip/model.py:378
│  │  │  │           │     │     ├─ 0.017 [self]  clip/model.py
│  │  │  │           │     │     └─ 0.004 Tensor.half  <built-in>
│  │  │  │           │     └─ 0.005 _convert_weights_to_fp16  clip/model.py:378
│  │  │  │           │        ├─ 0.004 Tensor.half  <built-in>
│  │  │  │           │        └─ 0.001 isinstance  <built-in>
│  │  │  │           └─ 0.001 _convert_weights_to_fp16  clip/model.py:378
│  │  │  │              └─ 0.001 Tensor.half  <built-in>
│  │  │  ├─ 0.050 CLIP.load_state_dict  torch/nn/modules/module.py:1354
│  │  │  │  └─ 0.050 load  torch/nn/modules/module.py:1384
│  │  │  │     ├─ 0.049 load  torch/nn/modules/module.py:1384
│  │  │  │     │  ├─ 0.040 load  torch/nn/modules/module.py:1384
│  │  │  │     │  │  ├─ 0.038 load  torch/nn/modules/module.py:1384
│  │  │  │     │  │  │  ├─ 0.036 load  torch/nn/modules/module.py:1384
│  │  │  │     │  │  │  │  ├─ 0.034 load  torch/nn/modules/module.py:1384
│  │  │  │     │  │  │  │  │  ├─ 0.017 load  torch/nn/modules/module.py:1384
│  │  │  │     │  │  │  │  │  │  └─ 0.017 Linear._load_from_state_dict  torch/nn/modules/module.py:1276
│  │  │  │     │  │  │  │  │  │     └─ 0.017 Parameter.copy_  <built-in>
│  │  │  │     │  │  │  │  │  └─ 0.017 Linear._load_from_state_dict  torch/nn/modules/module.py:1276
│  │  │  │     │  │  │  │  │     ├─ 0.014 Parameter.copy_  <built-in>
│  │  │  │     │  │  │  │  │     └─ 0.003 [self]  torch/nn/modules/module.py
│  │  │  │     │  │  │  │  └─ 0.002 LayerNorm._load_from_state_dict  torch/nn/modules/module.py:1276
│  │  │  │     │  │  │  │     ├─ 0.001 str.startswith  <built-in>
│  │  │  │     │  │  │  │     └─ 0.001 [self]  torch/nn/modules/module.py
│  │  │  │     │  │  │  ├─ 0.001 ResidualAttentionBlock._load_from_state_dict  torch/nn/modules/module.py:1276
│  │  │  │     │  │  │  └─ 0.001 [self]  torch/nn/modules/module.py
│  │  │  │     │  │  └─ 0.002 Conv2d._load_from_state_dict  torch/nn/modules/module.py:1276
│  │  │  │     │  │     ├─ 0.001 Parameter.copy_  <built-in>
│  │  │  │     │  │     └─ 0.001 [self]  torch/nn/modules/module.py
│  │  │  │     │  └─ 0.009 Embedding._load_from_state_dict  torch/nn/modules/module.py:1276
│  │  │  │     │     └─ 0.009 Parameter.copy_  <built-in>
│  │  │  │     └─ 0.001 CLIP._load_from_state_dict  torch/nn/modules/module.py:1276
│  │  │  │        └─ 0.001 str.split  <built-in>
│  │  │  └─ 0.001 CLIP.eval  torch/nn/modules/module.py:1644
│  │  │     └─ 0.001 CLIP.train  torch/nn/modules/module.py:1622
│  │  │        └─ 0.001 Transformer.train  torch/nn/modules/module.py:1622
│  │  │           └─ 0.001 Sequential.train  torch/nn/modules/module.py:1622
│  │  │              └─ 0.001 ResidualAttentionBlock.train  torch/nn/modules/module.py:1622
│  │  │                 └─ 0.001 MultiheadAttention.train  torch/nn/modules/module.py:1622
│  │  │                    └─ 0.001 MultiheadAttention.children  torch/nn/modules/module.py:1521
│  │  │                       └─ 0.001 MultiheadAttention.named_children  torch/nn/modules/module.py:1530
│  │  ├─ 0.858 load  torch/jit/_serialization.py:87
│  │  │  ├─ 0.659 PyCapsule.import_ir_module_from_buffer  <built-in>
│  │  │  ├─ 0.155 BufferedReader.read  <built-in>
│  │  │  ├─ 0.028 [self]  torch/jit/_serialization.py
│  │  │  └─ 0.017 wrap_cpp_module  torch/jit/_recursive.py:797
│  │  │     └─ 0.017 _construct  torch/jit/_script.py:482
│  │  │        └─ 0.017 init_fn  torch/jit/_recursive.py:801
│  │  │           ├─ 0.015 wrap_cpp_module  torch/jit/_recursive.py:797
│  │  │           │  └─ 0.015 _construct  torch/jit/_script.py:482
│  │  │           │     └─ 0.015 init_fn  torch/jit/_recursive.py:801
│  │  │           │        └─ 0.015 wrap_cpp_module  torch/jit/_recursive.py:797
│  │  │           │           └─ 0.015 _construct  torch/jit/_script.py:482
│  │  │           │              ├─ 0.014 init_fn  torch/jit/_recursive.py:801
│  │  │           │              │  └─ 0.014 wrap_cpp_module  torch/jit/_recursive.py:797
│  │  │           │              │     └─ 0.014 _construct  torch/jit/_script.py:482
│  │  │           │              │        └─ 0.014 init_fn  torch/jit/_recursive.py:801
│  │  │           │              │           └─ 0.014 wrap_cpp_module  torch/jit/_recursive.py:797
│  │  │           │              │              └─ 0.014 _construct  torch/jit/_script.py:482
│  │  │           │              │                 ├─ 0.012 init_fn  torch/jit/_recursive.py:801
│  │  │           │              │                 │  ├─ 0.011 wrap_cpp_module  torch/jit/_recursive.py:797
│  │  │           │              │                 │  │  └─ 0.011 _construct  torch/jit/_script.py:482
│  │  │           │              │                 │  │     ├─ 0.005 init_fn  torch/jit/_recursive.py:801
│  │  │           │              │                 │  │     │  ├─ 0.003 wrap_cpp_module  torch/jit/_recursive.py:797
│  │  │           │              │                 │  │     │  │  └─ 0.003 _construct  torch/jit/_script.py:482
│  │  │           │              │                 │  │     │  │     ├─ 0.002 _finalize_scriptmodule  torch/jit/_script.py:504
│  │  │           │              │                 │  │     │  │     │  ├─ 0.001 RecursiveScriptModule.__setattr__  torch/jit/_script.py:669
│  │  │           │              │                 │  │     │  │     │  │  └─ 0.001 RecursiveScriptModule.__setattr__  torch/jit/_script.py:387
│  │  │           │              │                 │  │     │  │     │  │     └─ 0.001 RecursiveScriptModule.__setattr__  torch/nn/modules/module.py:1133
│  │  │           │              │                 │  │     │  │     │  │        └─ 0.001 OrderedDictWrapper.__contains__  torch/jit/_script.py:185
│  │  │           │              │                 │  │     │  │     │  └─ 0.001 [self]  torch/jit/_script.py
│  │  │           │              │                 │  │     │  │     └─ 0.001 RecursiveScriptModule.__init__  torch/jit/_script.py:473
│  │  │           │              │                 │  │     │  │        └─ 0.001 RecursiveScriptModule.init_then_script  torch/jit/_script.py:268
│  │  │           │              │                 │  │     │  │           └─ 0.001 RecursiveScriptModule.__init__  torch/jit/_script.py:377
│  │  │           │              │                 │  │     │  │              └─ 0.001 RecursiveScriptModule.__init__  torch/nn/modules/module.py:250
│  │  │           │              │                 │  │     │  │                 └─ 0.001 RecursiveScriptModule.__setattr__  torch/jit/_script.py:669
│  │  │           │              │                 │  │     │  │                    └─ 0.001 RecursiveScriptModule.__setattr__  torch/jit/_script.py:387
│  │  │           │              │                 │  │     │  │                       └─ 0.001 isinstance  <built-in>
│  │  │           │              │                 │  │     │  ├─ 0.001 PyCapsule.from_jit_type  <built-in>
│  │  │           │              │                 │  │     │  └─ 0.001 RecursiveScriptModule.__setattr__  torch/jit/_script.py:669
│  │  │           │              │                 │  │     │     └─ 0.001 RecursiveScriptModule.__setattr__  torch/jit/_script.py:387
│  │  │           │              │                 │  │     │        └─ 0.001 RecursiveScriptModule.__setattr__  torch/nn/modules/module.py:1133
│  │  │           │              │                 │  │     │           └─ 0.001 remove_from  torch/nn/modules/module.py:1134
│  │  │           │              │                 │  │     ├─ 0.004 RecursiveScriptModule.__init__  torch/jit/_script.py:473
│  │  │           │              │                 │  │     │  └─ 0.004 RecursiveScriptModule.init_then_script  torch/jit/_script.py:268
│  │  │           │              │                 │  │     │     └─ 0.004 RecursiveScriptModule.__init__  torch/jit/_script.py:377
│  │  │           │              │                 │  │     │        └─ 0.004 RecursiveScriptModule.__init__  torch/nn/modules/module.py:250
│  │  │           │              │                 │  │     │           ├─ 0.003 RecursiveScriptModule.__setattr__  torch/jit/_script.py:669
│  │  │           │              │                 │  │     │           │  ├─ 0.002 RecursiveScriptModule.__setattr__  torch/jit/_script.py:387
│  │  │           │              │                 │  │     │           │  │  └─ 0.002 RecursiveScriptModule.__setattr__  torch/nn/modules/module.py:1133
│  │  │           │              │                 │  │     │           │  └─ 0.001 [self]  torch/jit/_script.py
│  │  │           │              │                 │  │     │           └─ 0.001 [self]  torch/nn/modules/module.py
│  │  │           │              │                 │  │     └─ 0.002 _finalize_scriptmodule  torch/jit/_script.py:504
│  │  │           │              │                 │  │        └─ 0.002 RecursiveScriptModule.__setattr__  torch/jit/_script.py:669
│  │  │           │              │                 │  │           └─ 0.002 RecursiveScriptModule.__setattr__  torch/jit/_script.py:387
│  │  │           │              │                 │  │              ├─ 0.001 [self]  torch/jit/_script.py
│  │  │           │              │                 │  │              └─ 0.001 RecursiveScriptModule.__setattr__  torch/nn/modules/module.py:1133
│  │  │           │              │                 │  │                 └─ 0.001 OrderedDictWrapper.__contains__  torch/jit/_script.py:185
│  │  │           │              │                 │  └─ 0.001 PyCapsule.from_jit_type  <built-in>
│  │  │           │              │                 └─ 0.002 RecursiveScriptModule.__init__  torch/jit/_script.py:473
│  │  │           │              │                    ├─ 0.001 RecursiveScriptModule.__setattr__  torch/jit/_script.py:669
│  │  │           │              │                    │  └─ 0.001 RecursiveScriptModule.__setattr__  torch/jit/_script.py:387
│  │  │           │              │                    │     └─ 0.001 RecursiveScriptModule.__setattr__  torch/nn/modules/module.py:1133
│  │  │           │              │                    │        └─ 0.001 isinstance  <built-in>
│  │  │           │              │                    └─ 0.001 RecursiveScriptModule.init_then_script  torch/jit/_script.py:268
│  │  │           │              │                       └─ 0.001 RecursiveScriptModule.__init__  torch/jit/_script.py:377
│  │  │           │              │                          └─ 0.001 RecursiveScriptModule.__init__  torch/nn/modules/module.py:250
│  │  │           │              │                             └─ 0.001 RecursiveScriptModule.__setattr__  torch/jit/_script.py:669
│  │  │           │              │                                └─ 0.001 RecursiveScriptModule.__setattr__  torch/jit/_script.py:387
│  │  │           │              │                                   └─ 0.001 RecursiveScriptModule.__setattr__  torch/nn/modules/module.py:1133
│  │  │           │              └─ 0.001 RecursiveScriptModule.__init__  torch/jit/_script.py:473
│  │  │           │                 └─ 0.001 RecursiveScriptModule.init_then_script  torch/jit/_script.py:268
│  │  │           │                    └─ 0.001 RecursiveScriptModule.__init__  torch/jit/_script.py:377
│  │  │           │                       └─ 0.001 RecursiveScriptModule.__init__  torch/nn/modules/module.py:250
│  │  │           │                          └─ 0.001 RecursiveScriptModule.__setattr__  torch/jit/_script.py:669
│  │  │           │                             └─ 0.001 RecursiveScriptModule.__setattr__  torch/jit/_script.py:387
│  │  │           └─ 0.001 PyCapsule.from_jit_type  <built-in>
│  │  ├─ 0.347 _download  clip/clip.py:43
│  │  │  ├─ 0.168 openssl_sha256  <built-in>
│  │  │  ├─ 0.150 BufferedReader.read  <built-in>
│  │  │  └─ 0.028 [self]  clip/clip.py
│  │  ├─ 0.043 CLIP.float  torch/nn/modules/module.py:683
│  │  │  └─ 0.043 CLIP._apply  torch/nn/modules/module.py:528
│  │  │     └─ 0.043 VisionTransformer._apply  torch/nn/modules/module.py:528
│  │  │        ├─ 0.042 Transformer._apply  torch/nn/modules/module.py:528
│  │  │        │  └─ 0.042 Sequential._apply  torch/nn/modules/module.py:528
│  │  │        │     └─ 0.042 ResidualAttentionBlock._apply  torch/nn/modules/module.py:528
│  │  │        │        └─ 0.042 Sequential._apply  torch/nn/modules/module.py:528
│  │  │        │           ├─ 0.023 Linear._apply  torch/nn/modules/module.py:528
│  │  │        │           │  └─ 0.023 <lambda>  torch/nn/modules/module.py:692
│  │  │        │           │     └─ 0.023 Parameter.float  <built-in>
│  │  │        │           └─ 0.019 <lambda>  torch/nn/modules/module.py:692
│  │  │        │              └─ 0.019 Parameter.float  <built-in>
│  │  │        └─ 0.001 <lambda>  torch/nn/modules/module.py:692
│  │  │           └─ 0.001 Parameter.float  <built-in>
│  │  ├─ 0.014 [self]  clip/clip.py
│  │  ├─ 0.002 RecursiveScriptModule.state_dict  torch/nn/modules/module.py:1236
│  │  │  └─ 0.002 RecursiveScriptModule.state_dict  torch/nn/modules/module.py:1236
│  │  │     └─ 0.002 RecursiveScriptModule.state_dict  torch/nn/modules/module.py:1236
│  │  │        └─ 0.002 RecursiveScriptModule.state_dict  torch/nn/modules/module.py:1236
│  │  │           ├─ 0.001 RecursiveScriptModule.state_dict  torch/nn/modules/module.py:1236
│  │  │           │  └─ 0.001 RecursiveScriptModule.state_dict  torch/nn/modules/module.py:1236
│  │  │           │     └─ 0.001 RecursiveScriptModule.state_dict  torch/nn/modules/module.py:1236
│  │  │           │        └─ 0.001 RecursiveScriptModule._save_to_state_dict  torch/nn/modules/module.py:1202
│  │  │           └─ 0.001 RecursiveScriptModule._save_to_state_dict  torch/nn/modules/module.py:1202
│  │  │              └─ 0.001 OrderedDictWrapper.items  torch/jit/_script.py:174
│  │  └─ 0.002 CLIP.to  torch/nn/modules/module.py:752
│  │     └─ 0.002 CLIP._apply  torch/nn/modules/module.py:528
│  │        └─ 0.002 Transformer._apply  torch/nn/modules/module.py:528
│  │           └─ 0.002 Sequential._apply  torch/nn/modules/module.py:528
│  │              └─ 0.002 ResidualAttentionBlock._apply  torch/nn/modules/module.py:528
│  │                 └─ 0.002 LayerNorm._apply  torch/nn/modules/module.py:528
│  │                    ├─ 0.001 [self]  torch/nn/modules/module.py
│  │                    └─ 0.001 LayerNorm._apply  torch/nn/modules/module.py:528
│  ├─ 0.685 CLIPTEXT.__init__  detic/modeling/text/text_encoder.py:68
│  │  ├─ 0.251 CLIPTEXT.initialize_parameters  detic/modeling/text/text_encoder.py:99
│  │  │  └─ 0.251 normal_  torch/nn/init.py:138
│  │  │     └─ 0.251 _no_grad_normal_  torch/nn/init.py:17
│  │  │        └─ 0.251 Parameter.normal_  <built-in>
│  │  ├─ 0.191 Transformer.__init__  detic/modeling/text/text_encoder.py:56
│  │  │  └─ 0.191 <listcomp>  detic/modeling/text/text_encoder.py:61
│  │  │     └─ 0.191 ResidualAttentionBlock.__init__  detic/modeling/text/text_encoder.py:32
│  │  │        ├─ 0.123 Linear.__init__  torch/nn/modules/linear.py:75
│  │  │        │  └─ 0.123 Linear.reset_parameters  torch/nn/modules/linear.py:88
│  │  │        │     └─ 0.123 kaiming_uniform_  torch/nn/init.py:360
│  │  │        │        └─ 0.123 Parameter.uniform_  <built-in>
│  │  │        └─ 0.068 MultiheadAttention.__init__  torch/nn/modules/activation.py:909
│  │  │           ├─ 0.047 MultiheadAttention._reset_parameters  torch/nn/modules/activation.py:951
│  │  │           │  └─ 0.047 xavier_uniform_  torch/nn/init.py:297
│  │  │           │     └─ 0.047 _no_grad_uniform_  torch/nn/init.py:12
│  │  │           │        └─ 0.047 Parameter.uniform_  <built-in>
│  │  │           └─ 0.021 NonDynamicallyQuantizableLinear.__init__  torch/nn/modules/linear.py:110
│  │  │              └─ 0.021 NonDynamicallyQuantizableLinear.__init__  torch/nn/modules/linear.py:75
│  │  │                 ├─ 0.020 NonDynamicallyQuantizableLinear.reset_parameters  torch/nn/modules/linear.py:88
│  │  │                 │  └─ 0.020 kaiming_uniform_  torch/nn/init.py:360
│  │  │                 │     └─ 0.020 Parameter.uniform_  <built-in>
│  │  │                 └─ 0.001 NonDynamicallyQuantizableLinear.__setattr__  torch/nn/modules/module.py:1133
│  │  │                    └─ 0.001 NonDynamicallyQuantizableLinear.register_parameter  torch/nn/modules/module.py:322
│  │  ├─ 0.133 Embedding.__init__  torch/nn/modules/sparse.py:120
│  │  │  └─ 0.133 Embedding.reset_parameters  torch/nn/modules/sparse.py:148
│  │  │     └─ 0.133 normal_  torch/nn/init.py:138
│  │  │        └─ 0.133 _no_grad_normal_  torch/nn/init.py:17
│  │  │           └─ 0.133 Parameter.normal_  <built-in>
│  │  ├─ 0.088 SimpleTokenizer.__init__  clip/simple_tokenizer.py:63
│  │  │  ├─ 0.023 GzipFile.read  gzip.py:287
│  │  │  │  ├─ 0.021 _GzipReader.read  gzip.py:454
│  │  │  │  │  ├─ 0.016 Decompress.decompress  <built-in>
│  │  │  │  │  ├─ 0.003 _GzipReader._add_read_data  gzip.py:505
│  │  │  │  │  │  └─ 0.003 crc32  <built-in>
│  │  │  │  │  ├─ 0.001 _PaddedFile.read  gzip.py:85
│  │  │  │  │  │  └─ 0.001 BufferedReader.read  <built-in>
│  │  │  │  │  └─ 0.001 [self]  gzip.py
│  │  │  │  └─ 0.001 BufferedReader.read  <built-in>
│  │  │  ├─ 0.022 [self]  clip/simple_tokenizer.py
│  │  │  ├─ 0.016 <listcomp>  clip/simple_tokenizer.py:68
│  │  │  │  ├─ 0.008 [self]  clip/simple_tokenizer.py
│  │  │  │  └─ 0.008 str.split  <built-in>
│  │  │  ├─ 0.015 str.split  <built-in>
│  │  │  ├─ 0.006 str.join  <built-in>
│  │  │  ├─ 0.003 bytes.decode  <built-in>
│  │  │  └─ 0.003 <dictcomp>  clip/simple_tokenizer.py:75
│  │  └─ 0.021 CLIPTEXT.build_attention_mask  detic/modeling/text/text_encoder.py:115
│  │     └─ 0.021 Tensor.fill_  <built-in>
│  ├─ 0.031 CLIPTEXT.load_state_dict  torch/nn/modules/module.py:1354
│  │  └─ 0.031 load  torch/nn/modules/module.py:1384
│  │     ├─ 0.030 load  torch/nn/modules/module.py:1384
│  │     │  ├─ 0.021 load  torch/nn/modules/module.py:1384
│  │     │  │  └─ 0.021 load  torch/nn/modules/module.py:1384
│  │     │  │     └─ 0.021 load  torch/nn/modules/module.py:1384
│  │     │  │        ├─ 0.015 load  torch/nn/modules/module.py:1384
│  │     │  │        │  └─ 0.015 Linear._load_from_state_dict  torch/nn/modules/module.py:1276
│  │     │  │        │     ├─ 0.012 Parameter.copy_  <built-in>
│  │     │  │        │     ├─ 0.002 no_grad.__enter__  torch/autograd/grad_mode.py:124
│  │     │  │        │     │  └─ 0.002 set_grad_enabled.__init__  torch/autograd/grad_mode.py:213
│  │     │  │        │     └─ 0.001 str.startswith  <built-in>
│  │     │  │        ├─ 0.004 MultiheadAttention._load_from_state_dict  torch/nn/modules/module.py:1276
│  │     │  │        │  ├─ 0.002 [self]  torch/nn/modules/module.py
│  │     │  │        │  ├─ 0.001 Parameter.copy_  <built-in>
│  │     │  │        │  └─ 0.001 str.startswith  <built-in>
│  │     │  │        └─ 0.001 OrderedDict.get  <built-in>
│  │     │  └─ 0.009 Embedding._load_from_state_dict  torch/nn/modules/module.py:1276
│  │     │     └─ 0.009 Parameter.copy_  <built-in>
│  │     └─ 0.001 CLIPTEXT._load_from_state_dict  torch/nn/modules/module.py:1276
│  │        └─ 0.001 len  <built-in>
│  └─ 0.001 CLIP.state_dict  torch/nn/modules/module.py:1236
│     └─ 0.001 VisionTransformer.state_dict  torch/nn/modules/module.py:1236
│        └─ 0.001 Transformer.state_dict  torch/nn/modules/module.py:1236
│           └─ 0.001 Sequential.state_dict  torch/nn/modules/module.py:1236
│              └─ 0.001 ResidualAttentionBlock.state_dict  torch/nn/modules/module.py:1236
│                 └─ 0.001 Sequential.state_dict  torch/nn/modules/module.py:1236
│                    └─ 0.001 Linear.state_dict  torch/nn/modules/module.py:1236
│                       └─ 0.001 Linear._save_to_state_dict  torch/nn/modules/module.py:1202
│                          └─ 0.001 Parameter.detach  <built-in>
├─ 0.062 CLIPTEXT._call_impl  torch/nn/modules/module.py:1045
│  └─ 0.062 CLIPTEXT.forward  detic/modeling/text/text_encoder.py:165
│     └─ 0.062 CLIPTEXT.encode_text  detic/modeling/text/text_encoder.py:154
│        ├─ 0.060 Transformer._call_impl  torch/nn/modules/module.py:1045
│        │  ├─ 0.052 Transformer.forward  detic/modeling/text/text_encoder.py:64
│        │  │  └─ 0.052 Sequential._call_impl  torch/nn/modules/module.py:1045
│        │  │     └─ 0.052 Sequential.forward  torch/nn/modules/container.py:137
│        │  │        └─ 0.052 ResidualAttentionBlock._call_impl  torch/nn/modules/module.py:1045
│        │  │           └─ 0.052 ResidualAttentionBlock.forward  detic/modeling/text/text_encoder.py:49
│        │  │              ├─ 0.037 Sequential._call_impl  torch/nn/modules/module.py:1045
│        │  │              │  ├─ 0.036 Sequential.forward  torch/nn/modules/container.py:137
│        │  │              │  │  └─ 0.036 Linear._call_impl  torch/nn/modules/module.py:1045
│        │  │              │  │     └─ 0.036 Linear.forward  torch/nn/modules/linear.py:95
│        │  │              │  │        └─ 0.036 linear  torch/nn/functional.py:1831
│        │  │              │  │           └─ 0.036 linear  <built-in>
│        │  │              │  └─ 0.001 LayerNorm.forward  detic/modeling/text/text_encoder.py:20
│        │  │              │     └─ 0.001 LayerNorm.forward  torch/nn/modules/normalization.py:172
│        │  │              │        └─ 0.001 layer_norm  torch/nn/functional.py:2331
│        │  │              │           └─ 0.001 _VariableFunctionsClass.layer_norm  <built-in>
│        │  │              └─ 0.015 ResidualAttentionBlock.attention  detic/modeling/text/text_encoder.py:45
│        │  │                 └─ 0.015 MultiheadAttention._call_impl  torch/nn/modules/module.py:1045
│        │  │                    └─ 0.015 MultiheadAttention.forward  torch/nn/modules/activation.py:974
│        │  │                       └─ 0.015 multi_head_attention_forward  torch/nn/functional.py:4836
│        │  │                          ├─ 0.009 _scaled_dot_product_attention  torch/nn/functional.py:4790
│        │  │                          │  ├─ 0.005 [self]  torch/nn/functional.py
│        │  │                          │  └─ 0.004 _VariableFunctionsClass.bmm  <built-in>
│        │  │                          ├─ 0.002 _in_projection_packed  torch/nn/functional.py:4681
│        │  │                          │  └─ 0.002 linear  torch/nn/functional.py:1831
│        │  │                          │     └─ 0.002 linear  <built-in>
│        │  │                          ├─ 0.002 Tensor.contiguous  <built-in>
│        │  │                          └─ 0.001 linear  torch/nn/functional.py:1831
│        │  │                             └─ 0.001 linear  <built-in>
│        │  └─ 0.008 Embedding.forward  torch/nn/modules/sparse.py:157
│        │     └─ 0.008 embedding  torch/nn/functional.py:1949
│        │        └─ 0.008 _VariableFunctionsClass.embedding  <built-in>
│        └─ 0.002 [self]  detic/modeling/text/text_encoder.py
└─ 0.018 [self]  detic/predictor.py
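The top-level split in this tree can be tallied directly. The snippet below only redoes the arithmetic on wall-clock seconds copied from the profile above; the grouping and labels are mine. Indented entries are children of `build_text_encoder`, so they are not meant to be added to it.

```python
TOTAL_S = 3.187  # get_clip_embeddings, first call on master

# (label, seconds) pairs copied from the pyinstrument tree above.
BREAKDOWN = [
    ("build_text_encoder", 3.107),
    ("  clip.load -> build_model", 1.125),
    ("  clip.load -> torch.jit.load", 0.858),
    ("  clip.load -> _download (hash of cached file)", 0.347),
    ("  CLIPTEXT.__init__", 0.685),
    ("text_encoder(texts) forward pass", 0.062),
]


def share(seconds: float, total: float = TOTAL_S) -> float:
    """Percentage of the total wall-clock time."""
    return 100.0 * seconds / total


if __name__ == "__main__":
    for label, seconds in BREAKDOWN:
        print(f"{label:<48} {seconds:6.3f} s {share(seconds):5.1f} %")
```

By this tally, building the encoder accounts for roughly 97.5% of the first call, while the forward pass itself is under 2%. That is why caching the resulting embeddings removes almost all of the ~3 s.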

And on the second run, with the same text:

diff --git a/detic/predictor.py b/detic/predictor.py
index 047ed80..06454d2 100644
--- a/detic/predictor.py
+++ b/detic/predictor.py
@@ -19,10 +19,15 @@ from .modeling.utils import reset_cls_test
 
 def get_clip_embeddings(vocabulary, prompt='a '):
     # NOTE: need hashing due to filename length limit
+    from pyinstrument import Profiler
+    profiler = Profiler()
+    profiler.start()
     hash_value = md5("-".join(sorted(vocabulary)).encode()).hexdigest()
     cache_file_path = f"/tmp/detic-clip-embeddings-{hash_value}.pt"
     if Path(cache_file_path).exists():
         print(f"loading embeddings for {vocabulary} from {cache_file_path}")
+        profiler.stop()
+        print(profiler.output_text(unicode=True, color=True, show_all=True))
         return torch.load(cache_file_path)
     else:
         from detic.modeling.text.text_encoder import build_text_encoder

  _     ._   __/__   _ _  _  _ _/_   Recorded: 00:44:36  Samples:  0
 /_//_/// /_\ / //_// / //_'/ //     Duration: 0.000     CPU time: 0.000
/   _/                      v4.6.2

This means it takes practically no time.
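The cache lookup in the diff above keys the file on an MD5 hash of the sorted vocabulary. Below is a minimal stdlib-only sketch of just that key derivation. The function name `clip_cache_path` is hypothetical, and the path format is copied from the diff. It makes two properties of the key easy to check.

```python
from hashlib import md5


def clip_cache_path(vocabulary, prompt="a "):
    """Reproduce the cache path computed in get_clip_embeddings (diff above).

    `prompt` is accepted only to mirror the original signature; as in the
    original code, it does not contribute to the cache key.
    """
    # Sorting makes the key independent of vocabulary order.
    hash_value = md5("-".join(sorted(vocabulary)).encode()).hexdigest()
    return f"/tmp/detic-clip-embeddings-{hash_value}.pt"


if __name__ == "__main__":
    a = clip_cache_path(["cup", "bottle"])
    b = clip_cache_path(["bottle", "cup"])
    c = clip_cache_path(["cup", "bottle"], prompt="a photo of a ")
    print(a == b, a == c)  # both comparisons are True
```

Because the vocabulary is sorted before hashing, a reordered vocabulary maps to the same file. The prompt is also not part of the key. This assumes the `else` branch (cut off in the diff) saves the freshly computed embeddings to that path. In that case, the cached columns follow the vocabulary order of whichever call created the file. Whether that matters depends on how callers use the function.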

With your branches applied (both detic_ros and Detic):

On your branch

diff --git a/detic/predictor.py b/detic/predictor.py
index 6f787ac..f5476b8 100644
--- a/detic/predictor.py
+++ b/detic/predictor.py
@@ -16,11 +16,16 @@ from .modeling.utils import reset_cls_test
 
 def get_clip_embeddings(vocabulary, prompt='a ', clip_download_root=None):
     from detic.modeling.text.text_encoder import build_text_encoder
+    from pyinstrument import Profiler
+    profiler = Profiler()
+    profiler.start()
     text_encoder = build_text_encoder(pretrain=True,
                                       clip_download_root=clip_download_root)
     text_encoder.eval()
     texts = [prompt + x for x in vocabulary]
     emb = text_encoder(texts).detach().permute(1, 0).contiguous().cpu()
+    profiler.stop()
+    print(profiler.output_text(unicode=True, color=True, show_all=True))
     return emb

The profiling result below is the same (about 3 seconds) for embedding generation with the same text, both the first time and on every later call.

3.236 get_clip_embeddings  detic/predictor.py:17
├─ 3.165 build_text_encoder  detic/modeling/text/text_encoder.py:174
│  ├─ 2.432 load  clip/clip.py:94
│  │  ├─ 1.150 build_model  clip/model.py:399
│  │  │  ├─ 1.036 CLIP.__init__  clip/model.py:244
│  │  │  │  ├─ 0.444 VisionTransformer.__init__  clip/model.py:207
│  │  │  │  │  ├─ 0.428 Transformer.__init__  clip/model.py:196
│  │  │  │  │  │  └─ 0.428 <listcomp>  clip/model.py:200
│  │  │  │  │  │     └─ 0.428 ResidualAttentionBlock.__init__  clip/model.py:172
│  │  │  │  │  │        ├─ 0.284 Linear.__init__  torch/nn/modules/linear.py:75
│  │  │  │  │  │        │  └─ 0.284 Linear.reset_parameters  torch/nn/modules/linear.py:88
│  │  │  │  │  │        │     └─ 0.284 kaiming_uniform_  torch/nn/init.py:360
│  │  │  │  │  │        │        └─ 0.284 Parameter.uniform_  <built-in>
│  │  │  │  │  │        └─ 0.144 MultiheadAttention.__init__  torch/nn/modules/activation.py:909
│  │  │  │  │  │           ├─ 0.105 MultiheadAttention._reset_parameters  torch/nn/modules/activation.py:951
│  │  │  │  │  │           │  └─ 0.105 xavier_uniform_  torch/nn/init.py:297
│  │  │  │  │  │           │     └─ 0.105 _no_grad_uniform_  torch/nn/init.py:12
│  │  │  │  │  │           │        └─ 0.105 Parameter.uniform_  <built-in>
│  │  │  │  │  │           └─ 0.039 NonDynamicallyQuantizableLinear.__init__  torch/nn/modules/linear.py:110
│  │  │  │  │  │              └─ 0.039 NonDynamicallyQuantizableLinear.__init__  torch/nn/modules/linear.py:75
│  │  │  │  │  │                 └─ 0.039 NonDynamicallyQuantizableLinear.reset_parameters  torch/nn/modules/linear.py:88
│  │  │  │  │  │                    └─ 0.039 kaiming_uniform_  torch/nn/init.py:360
│  │  │  │  │  │                       └─ 0.039 Parameter.uniform_  <built-in>
│  │  │  │  │  ├─ 0.013 Conv2d.__init__  torch/nn/modules/conv.py:411
│  │  │  │  │  │  └─ 0.013 Conv2d.__init__  torch/nn/modules/conv.py:67
│  │  │  │  │  │     └─ 0.013 Conv2d.reset_parameters  torch/nn/modules/conv.py:140
│  │  │  │  │  │        └─ 0.013 kaiming_uniform_  torch/nn/init.py:360
│  │  │  │  │  │           └─ 0.013 Parameter.uniform_  <built-in>
│  │  │  │  │  ├─ 0.003 _VariableFunctionsClass.randn  <built-in>
│  │  │  │  │  └─ 0.001 LayerNorm.__init__  torch/nn/modules/normalization.py:148
│  │  │  │  │     └─ 0.001 LayerNorm.reset_parameters  torch/nn/modules/normalization.py:167
│  │  │  │  │        └─ 0.001 ones_  torch/nn/init.py:189
│  │  │  │  ├─ 0.257 CLIP.initialize_parameters  clip/model.py:299
│  │  │  │  │  └─ 0.257 normal_  torch/nn/init.py:138
│  │  │  │  │     └─ 0.257 _no_grad_normal_  torch/nn/init.py:17
│  │  │  │  │        └─ 0.257 Parameter.normal_  <built-in>
│  │  │  │  ├─ 0.193 Transformer.__init__  clip/model.py:196
│  │  │  │  │  └─ 0.193 <listcomp>  clip/model.py:200
│  │  │  │  │     └─ 0.193 ResidualAttentionBlock.__init__  clip/model.py:172
│  │  │  │  │        ├─ 0.126 Linear.__init__  torch/nn/modules/linear.py:75
│  │  │  │  │        │  └─ 0.126 Linear.reset_parameters  torch/nn/modules/linear.py:88
│  │  │  │  │        │     └─ 0.126 kaiming_uniform_  torch/nn/init.py:360
│  │  │  │  │        │        └─ 0.126 Parameter.uniform_  <built-in>
│  │  │  │  │        └─ 0.067 MultiheadAttention.__init__  torch/nn/modules/activation.py:909
│  │  │  │  │           ├─ 0.048 MultiheadAttention._reset_parameters  torch/nn/modules/activation.py:951
│  │  │  │  │           │  └─ 0.048 xavier_uniform_  torch/nn/init.py:297
│  │  │  │  │           │     └─ 0.048 _no_grad_uniform_  torch/nn/init.py:12
│  │  │  │  │           │        └─ 0.048 Parameter.uniform_  <built-in>
│  │  │  │  │           └─ 0.020 NonDynamicallyQuantizableLinear.__init__  torch/nn/modules/linear.py:110
│  │  │  │  │              └─ 0.020 NonDynamicallyQuantizableLinear.__init__  torch/nn/modules/linear.py:75
│  │  │  │  │                 └─ 0.020 NonDynamicallyQuantizableLinear.reset_parameters  torch/nn/modules/linear.py:88
│  │  │  │  │                    └─ 0.020 kaiming_uniform_  torch/nn/init.py:360
│  │  │  │  │                       └─ 0.020 Parameter.uniform_  <built-in>
│  │  │  │  └─ 0.142 Embedding.__init__  torch/nn/modules/sparse.py:120
│  │  │  │     └─ 0.142 Embedding.reset_parameters  torch/nn/modules/sparse.py:148
│  │  │  │        └─ 0.142 normal_  torch/nn/init.py:138
│  │  │  │           └─ 0.142 _no_grad_normal_  torch/nn/init.py:17
│  │  │  │              └─ 0.142 Parameter.normal_  <built-in>
│  │  │  ├─ 0.062 convert_weights  clip/model.py:375
│  │  │  │  └─ 0.062 CLIP.apply  torch/nn/modules/module.py:577
│  │  │  │     └─ 0.062 VisionTransformer.apply  torch/nn/modules/module.py:577
│  │  │  │        └─ 0.062 Transformer.apply  torch/nn/modules/module.py:577
│  │  │  │           ├─ 0.060 Sequential.apply  torch/nn/modules/module.py:577
│  │  │  │           │  └─ 0.060 ResidualAttentionBlock.apply  torch/nn/modules/module.py:577
│  │  │  │           │     ├─ 0.057 Sequential.apply  torch/nn/modules/module.py:577
│  │  │  │           │     │  ├─ 0.035 Linear.apply  torch/nn/modules/module.py:577
│  │  │  │           │     │  │  └─ 0.035 _convert_weights_to_fp16  clip/model.py:378
│  │  │  │           │     │  │     ├─ 0.034 [self]  clip/model.py
│  │  │  │           │     │  │     └─ 0.001 Tensor.half  <built-in>
│  │  │  │           │     │  └─ 0.022 _convert_weights_to_fp16  clip/model.py:378
│  │  │  │           │     │     ├─ 0.015 [self]  clip/model.py
│  │  │  │           │     │     └─ 0.006 Tensor.half  <built-in>
│  │  │  │           │     └─ 0.003 _convert_weights_to_fp16  clip/model.py:378
│  │  │  │           │        └─ 0.003 Tensor.half  <built-in>
│  │  │  │           └─ 0.001 _convert_weights_to_fp16  clip/model.py:378
│  │  │  │              └─ 0.001 Tensor.half  <built-in>
│  │  │  ├─ 0.052 CLIP.load_state_dict  torch/nn/modules/module.py:1354
│  │  │  │  └─ 0.052 load  torch/nn/modules/module.py:1384
│  │  │  │     ├─ 0.051 load  torch/nn/modules/module.py:1384
│  │  │  │     │  ├─ 0.042 load  torch/nn/modules/module.py:1384
│  │  │  │     │  │  ├─ 0.041 load  torch/nn/modules/module.py:1384
│  │  │  │     │  │  │  ├─ 0.040 load  torch/nn/modules/module.py:1384
│  │  │  │     │  │  │  │  ├─ 0.031 load  torch/nn/modules/module.py:1384
│  │  │  │     │  │  │  │  │  ├─ 0.017 MultiheadAttention._load_from_state_dict  torch/nn/modules/module.py:1276
│  │  │  │     │  │  │  │  │  │  ├─ 0.013 Parameter.copy_  <built-in>
│  │  │  │     │  │  │  │  │  │  ├─ 0.002 [self]  torch/nn/modules/module.py
│  │  │  │     │  │  │  │  │  │  ├─ 0.001 str.startswith  <built-in>
│  │  │  │     │  │  │  │  │  │  └─ 0.001 no_grad.__enter__  torch/autograd/grad_mode.py:124
│  │  │  │     │  │  │  │  │  │     └─ 0.001 set_grad_enabled.__init__  torch/autograd/grad_mode.py:213
│  │  │  │     │  │  │  │  │  └─ 0.013 load  torch/nn/modules/module.py:1384
│  │  │  │     │  │  │  │  │     ├─ 0.012 Linear._load_from_state_dict  torch/nn/modules/module.py:1276
│  │  │  │     │  │  │  │  │     │  ├─ 0.006 Parameter.copy_  <built-in>
│  │  │  │     │  │  │  │  │     │  ├─ 0.002 [self]  torch/nn/modules/module.py
│  │  │  │     │  │  │  │  │     │  ├─ 0.001 <dictcomp>  torch/nn/modules/module.py:1314
│  │  │  │     │  │  │  │  │     │  ├─ 0.001 str.startswith  <built-in>
│  │  │  │     │  │  │  │  │     │  ├─ 0.001 no_grad.__enter__  torch/autograd/grad_mode.py:124
│  │  │  │     │  │  │  │  │     │  │  └─ 0.001 is_grad_enabled  <built-in>
│  │  │  │     │  │  │  │  │     │  └─ 0.001 no_grad.__init__  torch/autograd/grad_mode.py:119
│  │  │  │     │  │  │  │  │     └─ 0.001 OrderedDict.items  <built-in>
│  │  │  │     │  │  │  │  └─ 0.009 LayerNorm._load_from_state_dict  torch/nn/modules/module.py:1276
│  │  │  │     │  │  │  │     ├─ 0.004 [self]  torch/nn/modules/module.py
│  │  │  │     │  │  │  │     ├─ 0.003 Parameter.copy_  <built-in>
│  │  │  │     │  │  │  │     └─ 0.002 str.startswith  <built-in>
│  │  │  │     │  │  │  └─ 0.001 ResidualAttentionBlock._load_from_state_dict  torch/nn/modules/module.py:1276
│  │  │  │     │  │  └─ 0.001 LayerNorm._load_from_state_dict  torch/nn/modules/module.py:1276
│  │  │  │     │  │     └─ 0.001 str.startswith  <built-in>
│  │  │  │     │  └─ 0.009 Embedding._load_from_state_dict  torch/nn/modules/module.py:1276
│  │  │  │     │     └─ 0.009 Parameter.copy_  <built-in>
│  │  │  │     └─ 0.001 CLIP._load_from_state_dict  torch/nn/modules/module.py:1276
│  │  │  │        └─ 0.001 str.split  <built-in>
│  │  │  └─ 0.001 CLIP.eval  torch/nn/modules/module.py:1644
│  │  │     └─ 0.001 CLIP.train  torch/nn/modules/module.py:1622
│  │  │        └─ 0.001 Transformer.train  torch/nn/modules/module.py:1622
│  │  │           └─ 0.001 Sequential.train  torch/nn/modules/module.py:1622
│  │  │              └─ 0.001 ResidualAttentionBlock.train  torch/nn/modules/module.py:1622
│  │  │                 └─ 0.001 ResidualAttentionBlock.__setattr__  torch/nn/modules/module.py:1133
│  │  │                    └─ 0.001 dict.get  <built-in>
│  │  ├─ 0.867 load  torch/jit/_serialization.py:87
│  │  │  ├─ 0.667 PyCapsule.import_ir_module_from_buffer  <built-in>
│  │  │  ├─ 0.155 BufferedReader.read  <built-in>
│  │  │  ├─ 0.027 [self]  torch/jit/_serialization.py
│  │  │  └─ 0.018 wrap_cpp_module  torch/jit/_recursive.py:797
│  │  │     └─ 0.018 _construct  torch/jit/_script.py:482
│  │  │        └─ 0.018 init_fn  torch/jit/_recursive.py:801
│  │  │           ├─ 0.017 wrap_cpp_module  torch/jit/_recursive.py:797
│  │  │           │  └─ 0.017 _construct  torch/jit/_script.py:482
│  │  │           │     └─ 0.017 init_fn  torch/jit/_recursive.py:801
│  │  │           │        ├─ 0.016 wrap_cpp_module  torch/jit/_recursive.py:797
│  │  │           │        │  └─ 0.016 _construct  torch/jit/_script.py:482
│  │  │           │        │     └─ 0.016 init_fn  torch/jit/_recursive.py:801
│  │  │           │        │        └─ 0.016 wrap_cpp_module  torch/jit/_recursive.py:797
│  │  │           │        │           └─ 0.016 _construct  torch/jit/_script.py:482
│  │  │           │        │              └─ 0.016 init_fn  torch/jit/_recursive.py:801
│  │  │           │        │                 ├─ 0.014 wrap_cpp_module  torch/jit/_recursive.py:797
│  │  │           │        │                 │  └─ 0.014 _construct  torch/jit/_script.py:482
│  │  │           │        │                 │     ├─ 0.013 init_fn  torch/jit/_recursive.py:801
│  │  │           │        │                 │     │  ├─ 0.012 wrap_cpp_module  torch/jit/_recursive.py:797
│  │  │           │        │                 │     │  │  └─ 0.012 _construct  torch/jit/_script.py:482
│  │  │           │        │                 │     │  │     ├─ 0.005 RecursiveScriptModule.__init__  torch/jit/_script.py:473
│  │  │           │        │                 │     │  │     │  └─ 0.005 RecursiveScriptModule.init_then_script  torch/jit/_script.py:268
│  │  │           │        │                 │     │  │     │     └─ 0.005 RecursiveScriptModule.__init__  torch/jit/_script.py:377
│  │  │           │        │                 │     │  │     │        └─ 0.005 RecursiveScriptModule.__init__  torch/nn/modules/module.py:250
│  │  │           │        │                 │     │  │     │           ├─ 0.004 RecursiveScriptModule.__setattr__  torch/jit/_script.py:669
│  │  │           │        │                 │     │  │     │           │  └─ 0.004 RecursiveScriptModule.__setattr__  torch/jit/_script.py:387
│  │  │           │        │                 │     │  │     │           │     └─ 0.004 RecursiveScriptModule.__setattr__  torch/nn/modules/module.py:1133
│  │  │           │        │                 │     │  │     │           └─ 0.001 [self]  torch/nn/modules/module.py
│  │  │           │        │                 │     │  │     ├─ 0.004 _finalize_scriptmodule  torch/jit/_script.py:504
│  │  │           │        │                 │     │  │     │  ├─ 0.002 [self]  torch/jit/_script.py
│  │  │           │        │                 │     │  │     │  └─ 0.002 RecursiveScriptModule.__setattr__  torch/jit/_script.py:669
│  │  │           │        │                 │     │  │     │     └─ 0.002 RecursiveScriptModule.__setattr__  torch/jit/_script.py:387
│  │  │           │        │                 │     │  │     │        └─ 0.002 RecursiveScriptModule.__setattr__  torch/nn/modules/module.py:1133
│  │  │           │        │                 │     │  │     │           ├─ 0.001 isinstance  <built-in>
│  │  │           │        │                 │     │  │     │           └─ 0.001 OrderedDictWrapper.__contains__  torch/jit/_script.py:185
│  │  │           │        │                 │     │  │     └─ 0.003 init_fn  torch/jit/_recursive.py:801
│  │  │           │        │                 │     │  │        └─ 0.003 wrap_cpp_module  torch/jit/_recursive.py:797
│  │  │           │        │                 │     │  │           └─ 0.003 _construct  torch/jit/_script.py:482
│  │  │           │        │                 │     │  │              ├─ 0.002 RecursiveScriptModule.__init__  torch/jit/_script.py:473
│  │  │           │        │                 │     │  │              │  ├─ 0.001 RecursiveScriptModule.__delattr__  torch/nn/modules/module.py:1180
│  │  │           │        │                 │     │  │              │  └─ 0.001 RecursiveScriptModule.init_then_script  torch/jit/_script.py:268
│  │  │           │        │                 │     │  │              │     └─ 0.001 RecursiveScriptModule.__init__  torch/jit/_script.py:377
│  │  │           │        │                 │     │  │              │        └─ 0.001 RecursiveScriptModule.__init__  torch/nn/modules/module.py:250
│  │  │           │        │                 │     │  │              │           └─ 0.001 RecursiveScriptModule.__setattr__  torch/jit/_script.py:669
│  │  │           │        │                 │     │  │              │              └─ 0.001 RecursiveScriptModule.__setattr__  torch/jit/_script.py:387
│  │  │           │        │                 │     │  │              └─ 0.001 init_fn  torch/jit/_recursive.py:801
│  │  │           │        │                 │     │  │                 └─ 0.001 PyCapsule.from_jit_type  <built-in>
│  │  │           │        │                 │     │  └─ 0.001 PyCapsule.from_jit_type  <built-in>
│  │  │           │        │                 │     └─ 0.001 RecursiveScriptModule.__init__  torch/jit/_script.py:473
│  │  │           │        │                 │        └─ 0.001 RecursiveScriptModule.init_then_script  torch/jit/_script.py:268
│  │  │           │        │                 │           └─ 0.001 RecursiveScriptModule.__init__  torch/jit/_script.py:377
│  │  │           │        │                 │              └─ 0.001 RecursiveScriptModule.__init__  torch/nn/modules/module.py:250
│  │  │           │        │                 │                 └─ 0.001 RecursiveScriptModule.__setattr__  torch/jit/_script.py:669
│  │  │           │        │                 │                    └─ 0.001 RecursiveScriptModule.__setattr__  torch/jit/_script.py:387
│  │  │           │        │                 │                       └─ 0.001 RecursiveScriptModule.__setattr__  torch/nn/modules/module.py:1133
│  │  │           │        │                 ├─ 0.001 PyCapsule.from_jit_type  <built-in>
│  │  │           │        │                 └─ 0.001 RecursiveScriptModule.__setattr__  torch/jit/_script.py:669
│  │  │           │        │                    └─ 0.001 RecursiveScriptModule.__setattr__  torch/jit/_script.py:387
│  │  │           │        └─ 0.001 PyCapsule.from_jit_type  <built-in>
│  │  │           └─ 0.001 PyCapsule.from_jit_type  <built-in>
│  │  ├─ 0.353 _download  clip/clip.py:43
│  │  │  ├─ 0.169 openssl_sha256  <built-in>
│  │  │  ├─ 0.159 BufferedReader.read  <built-in>
│  │  │  └─ 0.026 [self]  clip/clip.py
│  │  ├─ 0.044 CLIP.float  torch/nn/modules/module.py:683
│  │  │  └─ 0.044 CLIP._apply  torch/nn/modules/module.py:528
│  │  │     └─ 0.044 VisionTransformer._apply  torch/nn/modules/module.py:528
│  │  │        ├─ 0.043 Transformer._apply  torch/nn/modules/module.py:528
│  │  │        │  └─ 0.043 Sequential._apply  torch/nn/modules/module.py:528
│  │  │        │     └─ 0.043 ResidualAttentionBlock._apply  torch/nn/modules/module.py:528
│  │  │        │        └─ 0.043 Sequential._apply  torch/nn/modules/module.py:528
│  │  │        │           ├─ 0.023 Linear._apply  torch/nn/modules/module.py:528
│  │  │        │           │  ├─ 0.022 <lambda>  torch/nn/modules/module.py:692
│  │  │        │           │  │  └─ 0.022 Parameter.float  <built-in>
│  │  │        │           │  └─ 0.001 compute_should_use_set_data  torch/nn/modules/module.py:532
│  │  │        │           │     └─ 0.001 get_overwrite_module_params_on_conversion  torch/__future__.py:18
│  │  │        │           ├─ 0.019 <lambda>  torch/nn/modules/module.py:692
│  │  │        │           │  ├─ 0.018 Parameter.float  <built-in>
│  │  │        │           │  └─ 0.001 [self]  torch/nn/modules/module.py
│  │  │        │           └─ 0.001 [self]  torch/nn/modules/module.py
│  │  │        └─ 0.001 <lambda>  torch/nn/modules/module.py:692
│  │  │           └─ 0.001 Parameter.float  <built-in>
│  │  ├─ 0.013 [self]  clip/clip.py
│  │  ├─ 0.003 CLIP.to  torch/nn/modules/module.py:752
│  │  │  └─ 0.003 CLIP._apply  torch/nn/modules/module.py:528
│  │  │     └─ 0.003 Transformer._apply  torch/nn/modules/module.py:528
│  │  │        └─ 0.003 Sequential._apply  torch/nn/modules/module.py:528
│  │  │           └─ 0.003 ResidualAttentionBlock._apply  torch/nn/modules/module.py:528
│  │  │              └─ 0.003 MultiheadAttention._apply  torch/nn/modules/module.py:528
│  │  │                 ├─ 0.001 convert  torch/nn/modules/module.py:846
│  │  │                 │  └─ 0.001 Parameter.is_floating_point  <built-in>
│  │  │                 ├─ 0.001 no_grad.__exit__  torch/autograd/grad_mode.py:128
│  │  │                 │  └─ 0.001 set_grad_enabled.__init__  torch/autograd/grad_mode.py:213
│  │  │                 │     └─ 0.001 _set_grad_enabled  <built-in>
│  │  │                 └─ 0.001 MultiheadAttention._apply  torch/nn/modules/module.py:528
│  │  │                    └─ 0.001 NonDynamicallyQuantizableLinear._apply  torch/nn/modules/module.py:528
│  │  │                       └─ 0.001 compute_should_use_set_data  torch/nn/modules/module.py:532
│  │  └─ 0.002 RecursiveScriptModule.state_dict  torch/nn/modules/module.py:1236
│  │     ├─ 0.001 RecursiveScriptModule._save_to_state_dict  torch/nn/modules/module.py:1202
│  │     │  └─ 0.001 OrderedDictWrapper.items  torch/jit/_script.py:174
│  │     └─ 0.001 RecursiveScriptModule.state_dict  torch/nn/modules/module.py:1236
│  │        └─ 0.001 RecursiveScriptModule.state_dict  torch/nn/modules/module.py:1236
│  │           └─ 0.001 RecursiveScriptModule.state_dict  torch/nn/modules/module.py:1236
│  │              └─ 0.001 RecursiveScriptModule.state_dict  torch/nn/modules/module.py:1236
│  │                 └─ 0.001 RecursiveScriptModule._save_to_state_dict  torch/nn/modules/module.py:1202
│  │                    └─ 0.001 OrderedDictWrapper.items  torch/jit/_script.py:174
│  ├─ 0.700 CLIPTEXT.__init__  detic/modeling/text/text_encoder.py:68
│  │  ├─ 0.259 CLIPTEXT.initialize_parameters  detic/modeling/text/text_encoder.py:99
│  │  │  └─ 0.259 normal_  torch/nn/init.py:138
│  │  │     └─ 0.259 _no_grad_normal_  torch/nn/init.py:17
│  │  │        └─ 0.259 Parameter.normal_  <built-in>
│  │  ├─ 0.195 Transformer.__init__  detic/modeling/text/text_encoder.py:56
│  │  │  └─ 0.195 <listcomp>  detic/modeling/text/text_encoder.py:61
│  │  │     └─ 0.195 ResidualAttentionBlock.__init__  detic/modeling/text/text_encoder.py:32
│  │  │        ├─ 0.125 Linear.__init__  torch/nn/modules/linear.py:75
│  │  │        │  └─ 0.125 Linear.reset_parameters  torch/nn/modules/linear.py:88
│  │  │        │     └─ 0.125 kaiming_uniform_  torch/nn/init.py:360
│  │  │        │        └─ 0.125 Parameter.uniform_  <built-in>
│  │  │        ├─ 0.068 MultiheadAttention.__init__  torch/nn/modules/activation.py:909
│  │  │        │  ├─ 0.048 MultiheadAttention._reset_parameters  torch/nn/modules/activation.py:951
│  │  │        │  │  └─ 0.048 xavier_uniform_  torch/nn/init.py:297
│  │  │        │  │     └─ 0.048 _no_grad_uniform_  torch/nn/init.py:12
│  │  │        │  │        └─ 0.048 Parameter.uniform_  <built-in>
│  │  │        │  └─ 0.020 NonDynamicallyQuantizableLinear.__init__  torch/nn/modules/linear.py:110
│  │  │        │     └─ 0.020 NonDynamicallyQuantizableLinear.__init__  torch/nn/modules/linear.py:75
│  │  │        │        ├─ 0.019 NonDynamicallyQuantizableLinear.reset_parameters  torch/nn/modules/linear.py:88
│  │  │        │        │  └─ 0.019 kaiming_uniform_  torch/nn/init.py:360
│  │  │        │        │     └─ 0.019 Parameter.uniform_  <built-in>
│  │  │        │        └─ 0.001 Parameter.__new__  torch/nn/parameter.py:23
│  │  │        └─ 0.001 LayerNorm.__init__  torch/nn/modules/normalization.py:148
│  │  │           └─ 0.001 LayerNorm.__setattr__  torch/nn/modules/module.py:1133
│  │  │              └─ 0.001 LayerNorm.register_parameter  torch/nn/modules/module.py:322
│  │  │                 └─ 0.001 LayerNorm.__getattr__  torch/nn/modules/module.py:1117
│  │  ├─ 0.140 Embedding.__init__  torch/nn/modules/sparse.py:120
│  │  │  └─ 0.140 Embedding.reset_parameters  torch/nn/modules/sparse.py:148
│  │  │     └─ 0.140 normal_  torch/nn/init.py:138
│  │  │        └─ 0.140 _no_grad_normal_  torch/nn/init.py:17
│  │  │           └─ 0.140 Parameter.normal_  <built-in>
│  │  ├─ 0.088 SimpleTokenizer.__init__  clip/simple_tokenizer.py:63
│  │  │  ├─ 0.024 [self]  clip/simple_tokenizer.py
│  │  │  ├─ 0.021 GzipFile.read  gzip.py:287
│  │  │  │  ├─ 0.019 _GzipReader.read  gzip.py:454
│  │  │  │  │  ├─ 0.015 Decompress.decompress  <built-in>
│  │  │  │  │  ├─ 0.003 _GzipReader._add_read_data  gzip.py:505
│  │  │  │  │  │  └─ 0.003 crc32  <built-in>
│  │  │  │  │  └─ 0.001 [self]  gzip.py
│  │  │  │  └─ 0.002 BufferedReader.read  <built-in>
│  │  │  ├─ 0.016 <listcomp>  clip/simple_tokenizer.py:68
│  │  │  │  ├─ 0.009 str.split  <built-in>
│  │  │  │  └─ 0.007 [self]  clip/simple_tokenizer.py
│  │  │  ├─ 0.015 str.split  <built-in>
│  │  │  ├─ 0.004 str.join  <built-in>
│  │  │  ├─ 0.004 bytes.decode  <built-in>
│  │  │  ├─ 0.003 <dictcomp>  clip/simple_tokenizer.py:75
│  │  │  └─ 0.001 list.append  <built-in>
│  │  └─ 0.018 CLIPTEXT.build_attention_mask  detic/modeling/text/text_encoder.py:115
│  │     └─ 0.018 Tensor.fill_  <built-in>
│  ├─ 0.032 CLIPTEXT.load_state_dict  torch/nn/modules/module.py:1354
│  │  └─ 0.032 load  torch/nn/modules/module.py:1384
│  │     ├─ 0.031 load  torch/nn/modules/module.py:1384
│  │     │  ├─ 0.021 load  torch/nn/modules/module.py:1384
│  │     │  │  └─ 0.021 load  torch/nn/modules/module.py:1384
│  │     │  │     └─ 0.021 load  torch/nn/modules/module.py:1384
│  │     │  │        ├─ 0.020 load  torch/nn/modules/module.py:1384
│  │     │  │        │  └─ 0.020 Linear._load_from_state_dict  torch/nn/modules/module.py:1276
│  │     │  │        │     ├─ 0.017 Parameter.copy_  <built-in>
│  │     │  │        │     ├─ 0.001 [self]  torch/nn/modules/module.py
│  │     │  │        │     ├─ 0.001 no_grad.__init__  torch/autograd/grad_mode.py:119
│  │     │  │        │     └─ 0.001 no_grad.__enter__  torch/autograd/grad_mode.py:124
│  │     │  │        └─ 0.001 MultiheadAttention._load_from_state_dict  torch/nn/modules/module.py:1276
│  │     │  │           └─ 0.001 Parameter.copy_  <built-in>
│  │     │  └─ 0.009 Embedding._load_from_state_dict  torch/nn/modules/module.py:1276
│  │     │     └─ 0.009 Parameter.copy_  <built-in>
│  │     └─ 0.001 CLIPTEXT._load_from_state_dict  torch/nn/modules/module.py:1276
│  │        └─ 0.001 Parameter.copy_  <built-in>
│  └─ 0.001 CLIP.state_dict  torch/nn/modules/module.py:1236
│     └─ 0.001 VisionTransformer.state_dict  torch/nn/modules/module.py:1236
│        └─ 0.001 Transformer.state_dict  torch/nn/modules/module.py:1236
│           └─ 0.001 Sequential.state_dict  torch/nn/modules/module.py:1236
│              └─ 0.001 ResidualAttentionBlock.state_dict  torch/nn/modules/module.py:1236
│                 └─ 0.001 LayerNorm.state_dict  torch/nn/modules/module.py:1236
│                    └─ 0.001 LayerNorm._save_to_state_dict  torch/nn/modules/module.py:1202
├─ 0.053 CLIPTEXT._call_impl  torch/nn/modules/module.py:1045
│  └─ 0.053 CLIPTEXT.forward  detic/modeling/text/text_encoder.py:165
│     └─ 0.053 CLIPTEXT.encode_text  detic/modeling/text/text_encoder.py:154
│        └─ 0.053 Transformer._call_impl  torch/nn/modules/module.py:1045
│           ├─ 0.052 Transformer.forward  detic/modeling/text/text_encoder.py:64
│           │  └─ 0.052 Sequential._call_impl  torch/nn/modules/module.py:1045
│           │     └─ 0.052 Sequential.forward  torch/nn/modules/container.py:137
│           │        └─ 0.052 ResidualAttentionBlock._call_impl  torch/nn/modules/module.py:1045
│           │           └─ 0.052 ResidualAttentionBlock.forward  detic/modeling/text/text_encoder.py:49
│           │              ├─ 0.036 Sequential._call_impl  torch/nn/modules/module.py:1045
│           │              │  └─ 0.036 Sequential.forward  torch/nn/modules/container.py:137
│           │              │     └─ 0.036 Linear._call_impl  torch/nn/modules/module.py:1045
│           │              │        └─ 0.036 Linear.forward  torch/nn/modules/linear.py:95
│           │              │           └─ 0.036 linear  torch/nn/functional.py:1831
│           │              │              └─ 0.036 linear  <built-in>
│           │              └─ 0.016 ResidualAttentionBlock.attention  detic/modeling/text/text_encoder.py:45
│           │                 └─ 0.016 MultiheadAttention._call_impl  torch/nn/modules/module.py:1045
│           │                    └─ 0.016 MultiheadAttention.forward  torch/nn/modules/activation.py:974
│           │                       └─ 0.016 multi_head_attention_forward  torch/nn/functional.py:4836
│           │                          ├─ 0.006 _scaled_dot_product_attention  torch/nn/functional.py:4790
│           │                          │  ├─ 0.005 _VariableFunctionsClass.bmm  <built-in>
│           │                          │  └─ 0.001 [self]  torch/nn/functional.py
│           │                          ├─ 0.005 _in_projection_packed  torch/nn/functional.py:4681
│           │                          │  └─ 0.005 linear  torch/nn/functional.py:1831
│           │                          │     └─ 0.005 linear  <built-in>
│           │                          ├─ 0.002 linear  torch/nn/functional.py:1831
│           │                          │  └─ 0.002 linear  <built-in>
│           │                          ├─ 0.002 Tensor.contiguous  <built-in>
│           │                          └─ 0.001 Tensor.transpose  <built-in>
│           └─ 0.001 Embedding.forward  torch/nn/modules/sparse.py:157
└─ 0.017 [self]  detic/predictor.py
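To put the call tree above in proportion, here is a small arithmetic sketch that uses only values read off this profile. The dictionary labels are informal names, not pyinstrument API. Almost all of the roughly 3.2 s goes to constructing the encoder (`build_text_encoder`), and only a small share goes to running it on the text. The profile also shows that clip's `_download` still reads and SHA-256-checks the already cached file on every call.

```python
# Wall-clock seconds copied from the pyinstrument output above.
PROFILE = {
    "total": 3.236,                # get_clip_embeddings
    "build_text_encoder": 3.165,   # includes clip load + CLIPTEXT.__init__
    "clip_load": 2.432,            # load  clip/clip.py:94
    "clip_download_check": 0.353,  # _download: file read + openssl_sha256
    "cliptext_init": 0.700,        # CLIPTEXT.__init__
    "encoder_forward": 0.053,      # CLIPTEXT forward on the prompts
}


def shares(profile, total_key="total"):
    """Return each entry's fraction of the total wall time."""
    total = profile[total_key]
    return {name: seconds / total for name, seconds in profile.items()}


if __name__ == "__main__":
    for name, frac in shares(PROFILE).items():
        print(f"{name:<22} {frac:6.1%}")
```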

@mqcmd196
Collaborator Author

Or, more simply, running python -c "import clip; hogehoge.." is also an option.

At first I gave up on this idea because the build method differs between x86 and L4T, but the resulting code seems simpler this way, so I agree with it.
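For an offline (no-network) run, a build-time prefetch only helps if the runtime later finds the file where `clip` expects it. Below is a small stdlib sketch of a fail-fast startup check. It assumes the default cache directory `~/.cache/clip` shown earlier in this thread. It also assumes the file name is the model name with `/` replaced by `-` (as in `ViT-B-32.pt`). In the `clip` package the real file name is derived from the download URL, so treat that mapping as an assumption. Both function names are hypothetical. The optional `download_root` argument plays the same role as the `clip_download_root` parameter in the branch's `get_clip_embeddings`.

```python
from pathlib import Path


def expected_clip_weight_path(model_name="ViT-B/32", download_root=None):
    """Guess where clip caches the weights for `model_name`.

    Assumption: the default root is ~/.cache/clip and the file name is the
    model name with "/" replaced by "-", e.g. "ViT-B/32" -> "ViT-B-32.pt".
    """
    root = Path(download_root) if download_root else Path.home() / ".cache" / "clip"
    return root / (model_name.replace("/", "-") + ".pt")


def require_offline_weights(model_name="ViT-B/32", download_root=None):
    """Raise early, with a clear message, if the weights were not prefetched."""
    path = expected_clip_weight_path(model_name, download_root)
    if not path.is_file():
        raise FileNotFoundError(
            f"CLIP weights for {model_name!r} not found at {path}; "
            "prefetch them at build time before running without network."
        )
    return path
```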

On reversion of the caching mechanism

Okay. You showed an actual use that already relies on it, so I accept it.

On Jetson L4T, I saw another issue: after saving or reloading the cached weights, the model doesn't return any segmentations. Do you have any guess why? This bug was accidentally resolved by this patch.

@HiroIshida
Owner

On Jetson L4T, I saw another issue: after saving or reloading the cached weights, the model doesn't return any segmentations. Do you have any guess why? This bug was accidentally resolved by this patch.

The cache is created under /tmp. Since you probably run detic_ros inside a container, I guess this may be related to a permission issue with /tmp.
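One way to test that guess inside the container is a stdlib-only check like the sketch below. The function name `describe_cache_file` is hypothetical, and `os.getuid` is POSIX-only, which fits Linux/L4T. An owner different from the process user, or a file that is not readable, might point at permissions. A readable, non-empty file would suggest looking elsewhere.

```python
import os
import stat
from pathlib import Path


def describe_cache_file(path):
    """Collect facts that help tell a permission problem from a content problem."""
    p = Path(path)
    info = {"path": str(p), "exists": p.exists()}
    if not info["exists"]:
        return info
    st = p.stat()
    info.update(
        size_bytes=st.st_size,
        mode=stat.filemode(st.st_mode),  # e.g. "-rw-r--r--"
        owner_uid=st.st_uid,
        current_uid=os.getuid(),  # POSIX only
        readable=os.access(p, os.R_OK),
        writable=os.access(p, os.W_OK),
    )
    return info


if __name__ == "__main__":
    import glob

    for f in glob.glob("/tmp/detic-clip-embeddings-*.pt"):
        print(describe_cache_file(f))
```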

@mqcmd196
Collaborator Author

Hmm, the cache file does exist in /tmp. I'll investigate when I have enough time.

@mqcmd196
Collaborator Author

TODO: dump the data to pickle both before saving and after loading, then compare the two hash values.

@mqcmd196 mqcmd196 marked this pull request as draft July 17, 2025 05:29
@mqcmd196 mqcmd196 mentioned this pull request Jul 17, 2025
@mqcmd196 mqcmd196 closed this Jul 17, 2025
@mqcmd196 mqcmd196 deleted the avoid-clip-download-runtime branch July 18, 2025 05:14

Labels

None yet

Projects

None yet


2 participants