Interpretability with tensordict and torch hooks.
Most methods should work with minimal configuration. Here's a basic example of running Integrated Gradients on a VGG16 model (full example available here):
```python
import torch
from tensordict import TensorDict

from tdhook.attribution import IntegratedGradients

# Define attribution target (e.g., zebra class = 340)
def init_attr_targets(targets, _):
    zebra_logit = targets["output"][..., 340]
    return TensorDict(out=zebra_logit, batch_size=targets.batch_size)

# Compute attribution
with IntegratedGradients(init_attr_targets=init_attr_targets).prepare(model) as hooked_model:
    td = TensorDict({
        "input": image_tensor,
        ("baseline", "input"): torch.zeros_like(image_tensor),  # required for Integrated Gradients
    }).unsqueeze(0)
    td = hooked_model(td)  # Access the attribution with td.get(("attr", "input"))
```

To dig deeper, see the documentation.
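For intuition about what the method computes: Integrated Gradients averages the model's gradients along the straight-line path from the baseline to the input, then scales by their difference. A minimal plain-PyTorch sketch, independent of tdhook (the linear model, target index, and step count are illustrative):

```python
import torch

def integrated_gradients(model, x, baseline, target, steps=32):
    # Points along the straight line from baseline to input
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * x.dim()))
    path = baseline + alphas * (x - baseline)  # shape: (steps, *x.shape)
    path.requires_grad_(True)
    logits = model(path)
    grads = torch.autograd.grad(logits[:, target].sum(), path)[0]
    # Riemann approximation of the path integral, scaled by (input - baseline)
    return (x - baseline) * grads.mean(dim=0)

# Tiny linear stand-in for a classifier
model = torch.nn.Linear(4, 10)
x = torch.randn(4)
attr = integrated_gradients(model, x, torch.zeros_like(x), target=3)
print(attr.shape)  # torch.Size([4])
```

For a linear model the gradient is constant along the path, so the attribution reduces exactly to `(x - baseline) * W[target]`; for deep networks the averaging over interpolation steps is what distinguishes the method from a plain input-gradient saliency map.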
This project uses uv to manage Python dependencies and run scripts, and just to run common development commands.
If you're using tdhook in your research, please cite it using the following BibTeX entry:
```bibtex
@misc{poupart2025tdhooklightweightframeworkinterpretability,
  title={TDHook: A Lightweight Framework for Interpretability},
  author={Yoann Poupart},
  year={2025},
  eprint={2509.25475},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2509.25475},
}
```
tdhook is licensed under the MIT License. See LICENSE for details.
