Download CLIP embeddings at build time #81
Sorry, I don't have time to fully guess the intent of this PR. Could you give me more context?
I mean an environment with no network access.
Sorry for the confusion.
On offline usage
OK, I see your point on the first one. But the CLIP download happens only once, and the model is then cached (the cache location is defined around here: https://github.com/openai/CLIP/blob/dcba3cb2e2827b402d2701e7e1c7d9fed8a20ef1/clip/clip.py#L120). I think that after this PR we'll have two separate mechanisms for downloading the CLIP model in the code base.
On reverting the caching mechanism
As for the second one, I disagree. The embedding computation time is dominated by building the CLIP model, not by downloading it: no model download happens inside the embedding computation, yet that step is still the main bottleneck. So I think the current caching mechanism is quite effective (it saves about 3 seconds) and I want to keep it. I profiled it as follows.
On the current master branch
--- a/detic/predictor.py
+++ b/detic/predictor.py
@@ -26,10 +26,15 @@ def get_clip_embeddings(vocabulary, prompt='a '):
         return torch.load(cache_file_path)
     else:
         from detic.modeling.text.text_encoder import build_text_encoder
+        from pyinstrument import Profiler
+        profiler = Profiler()
+        profiler.start()
         text_encoder = build_text_encoder(pretrain=True)
         text_encoder.eval()
         texts = [prompt + x for x in vocabulary]
         emb = text_encoder(texts).detach().permute(1, 0).contiguous().cpu()
+        profiler.stop()
+        print(profiler.output_text(unicode=True, color=True, show_all=True))
         print(f"saved embeddings for {vocabulary} to {cache_file_path}")
         torch.save(emb, cache_file_path)
         return emb
The second time with the same text, it takes essentially no time, because the embeddings are loaded from the cache file.
Applying your branch (both detic_ros and Detic)
On your branch
diff --git a/detic/predictor.py b/detic/predictor.py
index 6f787ac..f5476b8 100644
--- a/detic/predictor.py
+++ b/detic/predictor.py
@@ -16,11 +16,16 @@ from .modeling.utils import reset_cls_test
 def get_clip_embeddings(vocabulary, prompt='a ', clip_download_root=None):
     from detic.modeling.text.text_encoder import build_text_encoder
+    from pyinstrument import Profiler
+    profiler = Profiler()
+    profiler.start()
     text_encoder = build_text_encoder(pretrain=True,
                                       clip_download_root=clip_download_root)
     text_encoder.eval()
     texts = [prompt + x for x in vocabulary]
     emb = text_encoder(texts).detach().permute(1, 0).contiguous().cpu()
+    profiler.stop()
+    print(profiler.output_text(unicode=True, color=True, show_all=True))
     return emb
With this branch, the profiling result is the same (about 3 seconds) for generating embeddings of the same text, both on the first run and on every later run.
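For anyone without pyinstrument, the same comparison can be approximated by timing each phase separately with time.perf_counter. This is a minimal sketch, not the code used in the profiling above: fake_build_encoder and fake_encode are hypothetical stand-ins for build_text_encoder(...) and the text_encoder(texts) call, and the sleep only simulates a slow model build.

```python
import time
from typing import Callable, Dict, List


def time_phases(phases: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Run each phase in insertion order and record its wall-clock duration in seconds."""
    durations: Dict[str, float] = {}
    for name, fn in phases.items():
        start = time.perf_counter()
        fn()
        durations[name] = time.perf_counter() - start
    return durations


# Hypothetical stand-in for build_text_encoder(pretrain=True): model construction.
def fake_build_encoder() -> None:
    time.sleep(0.05)  # simulate the expensive part (building the model)


# Hypothetical stand-in for text_encoder(texts): the cheap forward pass.
def fake_encode() -> List[int]:
    return [len(t) for t in ["a cat", "a dog"]]


if __name__ == "__main__":
    result = time_phases({"build": fake_build_encoder, "encode": fake_encode})
    for name, seconds in result.items():
        print(f"{name}: {seconds:.3f} s")
```

Timing the phases separately makes it explicit which step a cache would actually save; with the real encoder, the "build" entry would be the one to watch.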
At first I gave up on this idea because the build method differs between x86 and L4T, but the resulting code turns out to be simpler, so I agree with this.
Okay. You pointed to an actual use case that already exists, so I accept it. On Jetson (L4T), I saw another issue: after saving or reloading the cached weights, the model doesn't return any segmentations. Do you have any idea why? This patch happened to fix that bug.
The cache is created under /tmp. Since you probably run detic_ros inside a container, I guess this may be related to a permission issue with /tmp.
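A minimal sketch of an embedding cache kept under the system temp directory (tempfile.gettempdir(), typically /tmp on Linux), with an explicit writability check so that a permission problem inside a container surfaces as an error instead of going unnoticed. The key scheme (SHA-256 of the prompt and vocabulary), the names cache_path_for and get_embeddings_cached, and the pickle format are assumptions for illustration; Detic's own cache naming and its torch.save/torch.load format may differ.

```python
import hashlib
import os
import pickle
import tempfile
from typing import Any, Callable, List, Optional


def cache_path_for(vocabulary: List[str], prompt: str, cache_dir: str) -> str:
    """Hypothetical naming scheme: hash the prompt and vocabulary into a file name."""
    key = "\n".join([prompt] + list(vocabulary)).encode("utf-8")
    return os.path.join(cache_dir, f"clip_emb_{hashlib.sha256(key).hexdigest()}.pkl")


def get_embeddings_cached(vocabulary: List[str], prompt: str,
                          compute: Callable[[List[str]], Any],
                          cache_dir: Optional[str] = None) -> Any:
    """Load embeddings from the cache if present; otherwise compute and store them."""
    cache_dir = cache_dir or tempfile.gettempdir()  # usually /tmp on Linux
    path = cache_path_for(vocabulary, prompt, cache_dir)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    # os.access checks the current user's permissions (and read-only mounts);
    # fail loudly rather than computing embeddings that can never be cached.
    if not os.access(cache_dir, os.W_OK):
        raise PermissionError(f"cache directory is not writable: {cache_dir}")
    emb = compute([prompt + x for x in vocabulary])
    with open(path, "wb") as f:
        pickle.dump(emb, f)
    return emb
```

Here compute plays the role of the text encoder forward pass; a second call with the same prompt and vocabulary reads the file instead of calling it again.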
Hmm, the cache file does exist in /tmp. I'll investigate when I have enough time.
TODO: Dump the data to a pickle both before saving and after loading, and compare the hash values.
I want to use this in an offline environment. This PR supports downloading CLIP models at build time.
This PR requires
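For the offline case, one way to keep runtime code from ever reaching the network is to download at build time and, at run time, only resolve a local file. Assuming the build step calls openai/CLIP's clip.load(name, device="cpu", download_root=...) once so the checkpoint lands in that directory, the hypothetical helper below looks the file up in the same directory (defaulting to ~/.cache/clip, which is what clip.load uses when download_root is None) and fails fast if it is missing. resolve_local_checkpoint and the "model.pt" filename are illustrative only, not part of Detic or CLIP.

```python
import os
import tempfile
from typing import Optional

# Same default directory openai/CLIP's clip.load() uses when download_root is None.
DEFAULT_CLIP_ROOT = os.path.expanduser("~/.cache/clip")


def resolve_local_checkpoint(filename: str, download_root: Optional[str] = None) -> str:
    """Return the path of a pre-downloaded checkpoint; never touch the network."""
    root = download_root or DEFAULT_CLIP_ROOT
    path = os.path.join(root, filename)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"{path} not found; download it at build time, "
            "since the runtime environment has no network access"
        )
    return path


if __name__ == "__main__":
    # A throwaway directory stands in for the build-time download root.
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "model.pt"), "wb").close()  # simulate the build-time download
        print(resolve_local_checkpoint("model.pt", root))
```

Failing at startup with a clear message is usually easier to debug in a container than a download attempt that hangs or times out without network access.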