perf: optimize PyTorch forward hooks by caching deque lookups#63

Open
ppraneth wants to merge 2 commits into traceopt-ai:main from ppraneth:perf5
Conversation

@ppraneth
Contributor

PR Description

Description:
This PR optimizes the critical path of the PyTorch layer forward hooks (LayerForwardTimePreHook and LayerForwardTimePostHook). Previously, the hooks performed nested dictionary lookups (setdefault(...) and .get(...)) on every layer's forward pass, introducing microsecond-scale delays that accumulate across an epoch and can subtly starve the GPU of work.

Changes:

  • Extracted target queue (deque) resolution into the __init__ methods of both pre and post hooks.
  • Bound the resolved reference to self.layer_q.
  • Updated the __call__ methods to use self.layer_q.append(...) and self.layer_q.popleft() directly, removing per-call dictionary hashing and repeated lookup resolution from the hot path.
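The change described above can be sketched as follows. This is an illustrative reconstruction, not the project's actual code: the names `timing_store` and `layer_name` are assumptions, and the hook signatures follow PyTorch's `register_forward_pre_hook` / `register_forward_hook` conventions.

```python
from collections import deque
import time

class LayerForwardTimePreHook:
    """Records a start timestamp for each forward pass of one layer."""

    def __init__(self, timing_store, layer_name):
        # Resolve the nested lookup once, at construction time, and keep
        # a direct reference to the per-layer deque.
        self.layer_q = timing_store.setdefault(layer_name, deque())

    def __call__(self, module, inputs):
        # Hot path: a single bound-method call, no dict hashing per forward.
        self.layer_q.append(time.perf_counter())

class LayerForwardTimePostHook:
    """Pops the matching start timestamp and computes the elapsed time."""

    def __init__(self, timing_store, layer_name):
        # Shares the same deque as the pre-hook for this layer.
        self.layer_q = timing_store.setdefault(layer_name, deque())

    def __call__(self, module, inputs, output):
        start = self.layer_q.popleft()
        return time.perf_counter() - start
```

Because both hooks resolve the deque in `__init__`, each forward pass touches only `self.layer_q`, and the pre/post pair naturally stays balanced (one `append` per `popleft`).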

Impact:

  • Eliminates Python dictionary overhead per module execution.
  • Microbenchmarks: ~2.85× speedup in isolated Python overhead within the hook execution path.
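A minimal way to reproduce the flavor of this microbenchmark (not the PR's actual harness; the store layout and iteration count are assumptions, and absolute numbers will vary by machine):

```python
import timeit
from collections import deque

store = {"layers": {"layer0": deque()}}

def per_call_lookup():
    # Pattern before the PR: resolve the deque on every invocation.
    q = store.setdefault("layers", {}).setdefault("layer0", deque())
    q.append(1.0)
    q.popleft()

cached_q = store["layers"]["layer0"]

def cached_reference():
    # Pattern after the PR: the deque was resolved once up front.
    cached_q.append(1.0)
    cached_q.popleft()

t_lookup = timeit.timeit(per_call_lookup, number=200_000)
t_cached = timeit.timeit(cached_reference, number=200_000)
print(f"per-call lookup: {t_lookup:.4f}s  cached reference: {t_cached:.4f}s")
```

The measured ratio isolates pure Python dispatch overhead; in a real training loop the win shows up as reduced host-side latency between kernel launches.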

Signed-off-by: ppraneth <pranethparuchuri@gmail.com>
