Follow up to #96 and #205.
The `_load_model` function

```python
def _load_model(model: Any, trusted=False) -> Any:
    """Return a model instance.

    Loads the model if provided a file path, if already a model instance return
    it unmodified.

    Parameters
    ----------
    model : pathlib.Path, str, or sklearn estimator
        Path/str or the actual model instance. if a Path or str, loads the model.

    trusted : bool, default=False
        Passed to :func:`skops.io.load` if the model is a file path and it's
        a `skops` file.

    Returns
    -------
    model : object
        Model instance.
    """

    if not isinstance(model, (Path, str)):
        return model

    model_path = Path(model)
    if not model_path.exists():
        raise FileNotFoundError(f"File is not present: {model_path}")

    try:
        if zipfile.is_zipfile(model_path):
            model = load(model_path, trusted=trusted)
        else:
            model = joblib.load(model_path)
    except Exception as ex:
        msg = f'An "{type(ex).__name__}" occurred during model loading.'
        raise RuntimeError(msg) from ex

    return model
```

is currently called every time the `model` of a card is accessed, e.g. when `repr` is called. When the model is given as a path, it is loaded from disk on each access, which can be expensive. Therefore, we would like to add a cache to `_load_model`.

A simple `functools.{lru_cache,cache}` is, however, not sufficient. The cache would be keyed on the function argument, in this case the model path, and the path can stay the same while the file on disk is overwritten with a new model, so the cache would return a stale model. The cache key would need to depend on the file contents instead, e.g. the MD5 sum of the file that the path points to.
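
To make the idea concrete, here is a minimal sketch of a content-keyed cache. It is not the skops implementation: `_file_md5`, `_load_cached` and `load_model_cached` are hypothetical names, and `pickle.load` stands in for the real `skops.io.load` / `joblib.load` dispatch so that the snippet only needs the standard library. The content hash is passed as an extra argument purely so that it becomes part of the `lru_cache` key.

```python
import hashlib
import pickle
from functools import lru_cache
from pathlib import Path
from typing import Any


def _file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of the file contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


@lru_cache(maxsize=1)
def _load_cached(path: Path, content_hash: str) -> Any:
    # content_hash is not used in the body; it only takes part in the cache
    # key, so a changed file leads to a cache miss and a fresh load.
    with open(path, "rb") as f:
        return pickle.load(f)  # stand-in for the real loading logic


def load_model_cached(model: Any) -> Any:
    """Hypothetical cached variant of _load_model (assumption, not skops API)."""
    if not isinstance(model, (Path, str)):
        return model  # already a model instance, nothing to load
    path = Path(model)
    return _load_cached(path, _file_md5(path))
```

With this sketch, repeated calls with an unchanged file return the same cached object, while overwriting the file changes the digest and triggers a reload. Two caveats: the file is still read once per access to compute the hash (usually cheaper than deserializing, but not free), and the cached object is shared between callers, so mutating it affects later accesses. A cheaper but weaker key would be the file's modification time and size from `Path.stat()`.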
The `_load_model` excerpt above is taken from skops/skops/card/_model_card.py, lines 168 to 205 in 30ddea7.