docs: add TRB temporary buffer plan#1198
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughAdds ChangesTRB Serial Tensor Buffer Pool Plan
Estimated code review effort🎯 2 (Simple) | ⏱️ ~15 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces an implementation plan for a TRB Temporary Variable Buffer to optimize memory allocation overhead. The review feedback highlights three key areas for improvement: simplifying the cleanup contract by leveraging the existing validate_runtime_impl gateway, resolving a design contradiction between the aggregate budget contract and segmented chunk allocation, and adding a lightweight active-run guard to prevent silent data corruption from concurrent runs.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| Cleanup ownership must be explicit: | ||
|
|
||
| ```text | ||
| bind owns cleanup until bind succeeds | ||
| validate owns cleanup after bind succeeds | ||
| ``` | ||
|
|
||
| If `begin_temporary_buffer_run()` succeeds, exactly one matching | ||
| `end_temporary_buffer_run()` must run. This applies whether bind, H2D, memset, | ||
| run, status readback, D2H, or validation fails. | ||
|
|
||
| Bind should use a local cleanup guard: | ||
|
|
||
| ```text | ||
| temp_run_active = false | ||
|
|
||
| if temporary buffer is enabled: | ||
| begin_temporary_buffer_run() | ||
| temp_run_active = true | ||
|
|
||
| for tensor in tensors: | ||
| acquire or malloc dev_ptr | ||
| record TensorLease immediately | ||
| copy or memset | ||
|
|
||
| runtime.temporary_buffer_run_active = temp_run_active | ||
| release bind cleanup guard | ||
| ``` | ||
|
|
||
| Before the cleanup guard is released, bind failure cleanup must: | ||
|
|
||
| - release all recorded `Free` leases with `device_free()`; | ||
| - leave `BufferNoop` and `ExternalNoop` tensor leases as per-tensor no-ops; | ||
| - end the temporary-buffer run if `temp_run_active` is true; | ||
| - clear recorded leases. | ||
|
|
||
| After bind succeeds, validate cleanup must perform the same release dispatch | ||
| and end the temporary-buffer run if | ||
| `runtime.temporary_buffer_run_active` is true. |
There was a problem hiding this comment.
The proposed cleanup contract introduces a local cleanup guard in bind to handle failures before bind succeeds, and a separate cleanup path in validate. However, in the existing c_api_shared.cpp implementation, validate_runtime_impl is always called by the caller (run_prepared) if bind_callable_to_runtime_impl fails:
rc = bind_callable_to_runtime_impl(...);
if (rc != 0) {
r->set_gm_sm_ptr(nullptr);
validate_runtime_impl(r);
return rc;
}Because validate_runtime_impl is already the designated single cleanup gateway for any failure after Runtime construction, having bind perform its own manual cleanup of the temporary-buffer run and leases duplicates the release dispatch logic and increases the risk of double-cleanup or mismatched state.
Suggestion:
Simplify the contract by letting validate_runtime_impl handle all cleanup on failure. When begin_temporary_buffer_run() succeeds, immediately set runtime.temporary_buffer_run_active = true and record leases in runtime.tensor_leases_ as they are acquired. If any subsequent step in bind fails, simply return an error and let the caller's existing validate_runtime_impl call handle the rollback of Free leases and end the temporary-buffer run.
References
- Ensure documentation and diagrams accurately reflect implementation details regarding resource lifecycles, especially when persistence is used to maintain internal state like caches.
- Ensure that code snippets provided in documentation are kept in sync with the actual implementation in the codebase, including robust error handling (e.g., try-except blocks) and edge-case guards.
| Allocation policy: | ||
|
|
||
| - Support multiple chunks so the implementation does not depend on the largest | ||
| contiguous allocatable HBM block. | ||
| - Allocate retained chunks during positive-budget configuration. Do not add | ||
| chunks lazily from `acquire()` during bind. | ||
| - Never let total retained chunk capacity exceed | ||
| `max_temporary_buffer_bytes`. | ||
| - A tensor slice must be contiguous within one chunk. If a single tensor is | ||
| larger than every retained chunk, configuration must create a large-enough | ||
| chunk within the same aggregate budget or fail before the run. | ||
|
|
There was a problem hiding this comment.
There is a design contradiction between the "Budget Contract" and "Segmented Chunks" sections:
- The "Budget Contract" states that the runtime does not know model-specific tensor sizes and only receives an aggregate
max_temporary_buffer_byteslimit. - The "Segmented Chunks" section states that chunks are allocated statically during configuration (not lazily during
acquire), and that if a single tensor is larger than every retained chunk, the configuration must create a large-enough chunk or fail before the run.
If the configuration API only receives the aggregate max_temporary_buffer_bytes and has no knowledge of individual tensor sizes, it cannot know how to segment the budget into chunks that are guaranteed to fit the largest contiguous tensor. For example, if the budget is split into multiple chunks, a single large tensor might fail to find a contiguous slot even if the aggregate budget is sufficient.
Suggestion:
Clarify how the chunk allocation policy decides the individual chunk sizes during configuration. If the largest tensor size is unknown, the runner should default to allocating a single contiguous chunk of max_temporary_buffer_bytes to guarantee that any tensor within the budget can be allocated contiguously, or the configuration API should accept a list of chunk sizes instead of a single aggregate limit.
| If two host threads call `run_prepared()` concurrently on the same runner while | ||
| temporary buffering is enabled, behavior is unsupported. The caller or serving | ||
| scheduler is responsible for serializing same-runner runs. | ||
|
|
||
| Future same-runner concurrency must add a run-lifecycle mutex, active-run | ||
| guard, fallback-to-malloc behavior, or true double buffering. That work is | ||
| outside this implementation plan. | ||
|
|
There was a problem hiding this comment.
While the plan assumes a single active run per runner and treats concurrent same-runner runs as unsupported, failing to guard against concurrency can lead to silent data corruption (e.g., if a second thread calls begin_temporary_buffer_run() and resets the chunk offsets to zero while the first thread's run is still active).
Suggestion:
Instead of leaving concurrent runs completely unguarded, implement a lightweight active-run guard (e.g., using a std::atomic<bool> or std::atomic_flag on the runner). If begin_temporary_buffer_run() is called while a run is already active, it should fail-fast with a clear error or assertion. This provides a robust safety net against programming errors with zero performance overhead on the hot path.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
docs/trb-serial-tensor-buffer-pool-plan.md (3)
11-11: 📐 Maintainability & Code Quality | 🔵 TrivialConsider using English consistently throughout the document.
Line 11 uses Chinese characters "临时变量" while the rest of the document is entirely in English. For consistency and to avoid encoding issues in some toolchains, consider using only English: "This plan uses 'temporary variable buffer' for the same concept as 'linshi biancun buffer'."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/trb-serial-tensor-buffer-pool-plan.md` at line 11, The document mixes Chinese and English, so update the referenced text in the plan to use English consistently. Replace the Chinese phrase in the buffer description with an English equivalent in the same section of docs/trb-serial-tensor-buffer-pool-plan.md, keeping the wording aligned with the existing terminology used elsewhere in the document.
154-157: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winClarify finalize cleanup ordering when a run is active.
The text says to "log it, release retained chunks" if finalize sees an active run. For defensive correctness, explicitly state whether
end_temporary_buffer_run()should be called beforeclear()to reset theactive_flag, or ifclear()alone is sufficient because the runner is being destroyed. This prevents implementers from leaving the flag in an inconsistent state if any post-finalize diagnostics check it.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/trb-serial-tensor-buffer-pool-plan.md` around lines 154 - 157, Clarify the finalize cleanup order in finalize_common()/sim finalize when an active temporary-buffer run is present: specify whether end_temporary_buffer_run() must be called before clear() to reset active_, or whether clear() alone is sufficient because the runner is being torn down. Update the contract wording so implementers know how to handle the active run state consistently and avoid leaving diagnostics-visible flags stale.
526-552: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueAdd unit tests for configuration edge cases.
Consider adding focused unit tests for:
configure_temporary_bufferwith same budget twice is a no-op (no reallocation);configure_temporary_bufferwhile a run is active fails with clear error;begin_temporary_buffer_runfailure path (e.g., OOM during chunk allocation) is handled correctly.These complement the existing test list and guard against reconfiguration bugs.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/trb-serial-tensor-buffer-pool-plan.md` around lines 526 - 552, Add focused unit tests for the temporary-buffer configuration edge cases: verify configure_temporary_buffer is a no-op when called twice with the same budget, verify configure_temporary_buffer rejects changes while a run is active with a clear error, and verify begin_temporary_buffer_run handles allocation failure/OOM correctly. Place these alongside the existing buffer unit tests and use the same temporary-buffer pool APIs and run lifecycle symbols so the new coverage matches the current single-active-run model.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/trb-serial-tensor-buffer-pool-plan.md`:
- Around line 448-460: The bind pseudocode currently assumes
begin_temporary_buffer_run() always succeeds and sets temp_run_active=true
unconditionally, but the HostApi callback returns a bool. Update the bind flow
to check the return value from begin_temporary_buffer_run() and fail the bind
path immediately with a clear error if it returns false, before any TensorLease
recording or device memory work; keep temp_run_active=true only after a
successful begin_temporary_buffer_run() call.
---
Nitpick comments:
In `@docs/trb-serial-tensor-buffer-pool-plan.md`:
- Line 11: The document mixes Chinese and English, so update the referenced text
in the plan to use English consistently. Replace the Chinese phrase in the
buffer description with an English equivalent in the same section of
docs/trb-serial-tensor-buffer-pool-plan.md, keeping the wording aligned with the
existing terminology used elsewhere in the document.
- Around line 154-157: Clarify the finalize cleanup order in
finalize_common()/sim finalize when an active temporary-buffer run is present:
specify whether end_temporary_buffer_run() must be called before clear() to
reset active_, or whether clear() alone is sufficient because the runner is
being torn down. Update the contract wording so implementers know how to handle
the active run state consistently and avoid leaving diagnostics-visible flags
stale.
- Around line 526-552: Add focused unit tests for the temporary-buffer
configuration edge cases: verify configure_temporary_buffer is a no-op when
called twice with the same budget, verify configure_temporary_buffer rejects
changes while a run is active with a clear error, and verify
begin_temporary_buffer_run handles allocation failure/OOM correctly. Place these
alongside the existing buffer unit tests and use the same temporary-buffer pool
APIs and run lifecycle symbols so the new coverage matches the current
single-active-run model.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e992fafc-1d81-49ff-a3d7-3cea3aeb9bfd
📒 Files selected for processing (1)
docs/trb-serial-tensor-buffer-pool-plan.md
| temp_run_active = false | ||
|
|
||
| if temporary buffer is enabled: | ||
| begin_temporary_buffer_run() | ||
| temp_run_active = true | ||
|
|
||
| for tensor in tensors: | ||
| acquire or malloc dev_ptr | ||
| record TensorLease immediately | ||
| copy or memset | ||
|
|
||
| runtime.temporary_buffer_run_active = temp_run_active | ||
| release bind cleanup guard |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Add error handling for begin_temporary_buffer_run() failure in bind pseudocode.
The pseudocode sets temp_run_active = true unconditionally after begin_temporary_buffer_run(), but the actual callback returns bool. If begin fails (e.g., configuration error, out of memory), the bind path should fail early rather than proceeding with temp_run_active = true. Add:
if temporary buffer is enabled:
if !begin_temporary_buffer_run():
fail with clear error
temp_run_active = true
This aligns with the actual bool return type specified in the HostApi wiring section (line 360).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/trb-serial-tensor-buffer-pool-plan.md` around lines 448 - 460, The bind
pseudocode currently assumes begin_temporary_buffer_run() always succeeds and
sets temp_run_active=true unconditionally, but the HostApi callback returns a
bool. Update the bind flow to check the return value from
begin_temporary_buffer_run() and fail the bind path immediately with a clear
error if it returns false, before any TensorLease recording or device memory
work; keep temp_run_active=true only after a successful
begin_temporary_buffer_run() call.
185bc6e to
befb5c5
Compare
- Add a retained temporary variable buffer owned by DeviceRunnerBase and wire it through HostApi, C ABI, ChipWorker, and Worker configuration.\n- Convert TRB tensor cleanup to explicit leases so configured runs reuse buffer slices while disabled runs keep malloc/free semantics.\n- Cover buffer lifecycle, TRB bind/validate cleanup, child-memory regressions, and Python configuration entrypoints.
befb5c5 to
3bf8392
Compare
- Replace the non-English temporary-buffer note in the plan doc - Group chip child-process startup settings to keep the helper under ruff's argument limit - Apply clang-format changes reported by CI
Summary
Testing