perf: optimize PyTorch forward hooks by caching deque lookups#63

Open
ppraneth wants to merge 2 commits into traceopt-ai:main from ppraneth:perf5
Conversation

@ppraneth
Contributor

PR Description

Description:
This PR optimizes the critical path of the PyTorch layer forward hooks (LayerForwardTimePreHook and LayerForwardTimePostHook). Previously, the hooks performed nested dictionary lookups (setdefault(...) and .get(...)) on every layer's forward pass, introducing microsecond-scale delays that accumulate across an epoch and can subtly starve the GPU of work.

Changes:

  • Extracted target queue (deque) resolution into the __init__ methods of both pre and post hooks.
  • Bound the resolved reference to self.layer_q.
  • Updated the __call__ methods to use self.layer_q.append(...) and self.layer_q.popleft() directly, removing per-call dictionary hashing and repeated lookup resolution from the hot path.
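The change described above can be sketched as follows. This is an illustrative reconstruction, not the project's actual code: the names `timing_store` and `layer_name` are assumptions, and the hook signatures follow PyTorch's `register_forward_pre_hook` / `register_forward_hook` conventions.

```python
from collections import deque
import time

class LayerForwardTimePreHook:
    """Records a start timestamp for each forward pass of one layer."""

    def __init__(self, timing_store, layer_name):
        # Resolve the nested lookup once, at construction time, and keep
        # a direct reference to the per-layer deque.
        self.layer_q = timing_store.setdefault(layer_name, deque())

    def __call__(self, module, inputs):
        # Hot path: a single bound-method call, no dict hashing per forward.
        self.layer_q.append(time.perf_counter())

class LayerForwardTimePostHook:
    """Pops the matching start timestamp and computes the elapsed time."""

    def __init__(self, timing_store, layer_name):
        # Shares the same deque as the pre-hook for this layer.
        self.layer_q = timing_store.setdefault(layer_name, deque())

    def __call__(self, module, inputs, output):
        start = self.layer_q.popleft()
        return time.perf_counter() - start
```

Because both hooks resolve the deque in `__init__`, each forward pass touches only `self.layer_q`, and the pre/post pair naturally stays balanced (one `append` per `popleft`).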

Impact:

  • Eliminates Python dictionary overhead per module execution.
  • Microbenchmarks: ~2.85× speedup in isolated Python overhead within the hook execution path.
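A minimal way to reproduce the flavor of this microbenchmark (not the PR's actual harness; the store layout and iteration count are assumptions, and absolute numbers will vary by machine):

```python
import timeit
from collections import deque

store = {"layers": {"layer0": deque()}}

def per_call_lookup():
    # Pattern before the PR: resolve the deque on every invocation.
    q = store.setdefault("layers", {}).setdefault("layer0", deque())
    q.append(1.0)
    q.popleft()

cached_q = store["layers"]["layer0"]

def cached_reference():
    # Pattern after the PR: the deque was resolved once up front.
    cached_q.append(1.0)
    cached_q.popleft()

t_lookup = timeit.timeit(per_call_lookup, number=200_000)
t_cached = timeit.timeit(cached_reference, number=200_000)
print(f"per-call lookup: {t_lookup:.4f}s  cached reference: {t_cached:.4f}s")
```

The measured ratio isolates pure Python dispatch overhead; in a real training loop the win shows up as reduced host-side latency between kernel launches.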

Signed-off-by: ppraneth <pranethparuchuri@gmail.com>
