Phase-level memory breakdown for forward / backward / optimizer #64

@abhinavsriva

Description

Problem

TraceML currently reports approximate step memory by resetting peak memory stats around the training step. This is useful at the step level, but it does not show where memory pressure occurs inside the step: forward, backward, or optimizer.

Goal

Add an estimated memory breakdown for forward, backward, and optimizer, using phase-scoped instrumentation similar to the existing timed-region instrumentation.

Proposed idea

Track memory around each phase and report the peak allocated memory per phase. For forward, one approach is to wrap the outermost nn.Module.__call__ (similar to the timed region) and measure memory within that scope.
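A minimal sketch of what phase-scoped peak tracking could look like. The helper name `phase_peak` and the injected `reset_peak`/`read_peak` callables are assumptions, not existing TraceML APIs; on a real CUDA run they would be `torch.cuda.reset_peak_memory_stats` and `torch.cuda.max_memory_allocated`, while tests can pass in fakes:

```python
from contextlib import contextmanager

@contextmanager
def phase_peak(name, peaks, reset_peak, read_peak):
    """Hypothetical helper: record the peak allocated memory inside one phase.

    reset_peak/read_peak are injected so this works with the real CUDA
    counters (torch.cuda.reset_peak_memory_stats /
    torch.cuda.max_memory_allocated) or with fakes when no GPU is present.
    """
    reset_peak()  # clear the allocator's running peak at phase entry
    try:
        yield
    finally:
        peaks[name] = read_peak()  # peak allocated bytes seen inside this phase

# Usage with a fake allocator standing in for torch.cuda:
class FakeAlloc:
    def __init__(self):
        self.peak = 0
    def reset(self):
        self.peak = 0
    def bump(self, nbytes):  # pretend an allocation raised the peak
        self.peak = max(self.peak, nbytes)

alloc = FakeAlloc()
peaks = {}
with phase_peak("forward", peaks, alloc.reset, lambda: alloc.peak):
    alloc.bump(1200)  # simulated forward-pass allocation
```

Wrapping the outermost module's `__call__` would then just enter this context manager before delegating to the original call.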

Important detail

The step memory approximation works by resetting peak stats around the full step. That approach will no longer work by itself once we also reset peak stats inside forward/backward/optimizer, because those resets happen within the step and clobber the step-level peak. Because of this, step memory should now be computed as the max of:

the existing step-level memory approximation
forward phase peak
backward phase peak
optimizer phase peak
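The combination rule above can be sketched as a one-liner; the function name `step_peak_memory` is hypothetical:

```python
def step_peak_memory(step_approx, phase_peaks):
    """Combine the legacy step-level approximation with per-phase peaks.

    Per-phase resets invalidate the step-wide counter on its own, so the
    reported step memory is the max over all observations.
    """
    return max(step_approx, *phase_peaks.values())

# Example: per-phase resets hid a backward spike from the step-wide counter.
result = step_peak_memory(
    900, {"forward": 700, "backward": 1500, "optimizer": 400}
)
print(result)  # -> 1500
```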
Notes

This is still approximate, not exact attribution, because CUDA is asynchronous and CPU-side phase boundaries do not perfectly match GPU execution.

Requirements

Low overhead, with no forced synchronization in default mode
Instrument the outermost forward only, not submodules
Keep step memory reporting correct after adding per-phase resets
Document the approximation clearly

Labels: enhancement (New feature or request)