
tmp #1

Open
ChaoZheng109 wants to merge 1 commit into main from task-hub


@ChaoZheng109

Owner

No description provided.

ChaoZheng109 pushed a commit that referenced this pull request Jun 5, 2026
…unning-onboard (hw-native-sys#990)

`.claude/rules/task-submit-isolation.md` asserted a specific causal chain
— "another user on the same physical chip" → contend on FFTS / shared L2
/ MMU TLB / register MMIO → AICore ACK-no-FIN → 507018 — and treated it
as a known mechanism. The whole chain rests on a single anecdote (the
"npu-lock 8 held 5 hours" example) where the fix was "rerun under
task-submit lock and observe pass," which is correlation, not a
controlled measurement of the mechanism. There is no controlled repro
of "two users, same chip, sibling die, deterministic 507018."

In a recent debugging session, a stale build artifact (a binary built
before later C++ edits) reproduced 507018 deterministically across 4
attempts on different chip pairs, all with full chip-pair locks held.
Re-running pip install fixed it. The "must be chip-shared contention"
hypothesis that I repeatedly cited from this rule was wrong and burned
several diagnostic rounds before someone called it out.
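
The binary/source skew described above can be checked mechanically
before blaming the hardware. The following is a minimal sketch, not
part of this change: `stale_sources` and the paths passed to it are
hypothetical names, and comparing modification times is only a
heuristic (it assumes the installed binary's mtime reflects when it was
built).

```python
from pathlib import Path
from typing import Iterable, List


def stale_sources(artifact: Path, sources: Iterable[Path]) -> List[Path]:
    """Return the source files modified after the artifact was built.

    Hypothetical helper: a non-empty result suggests the artifact
    predates later source edits and should be rebuilt before any
    hardware failure is investigated further.
    """
    sources = list(sources)
    if not artifact.exists():
        # No artifact at all: every source counts as newer than the build.
        return sources
    built_at = artifact.stat().st_mtime
    return [src for src in sources if src.stat().st_mtime > built_at]
```

If the returned list is non-empty, rebuilding (here, re-running pip
install) is likely a cheaper first step than another locked run on
hardware.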

Changes:

* Rename .claude/rules/task-submit-isolation.md -> running-onboard.md.
  The file's title is already "NPU Hardware Isolation" — sim variants
  (a2a3sim / a5sim) are silicon-agnostic and never need task-submit;
  the filename now reflects that scope. 5 references in skills and
  cann-examples READMEs updated to the new path.
* "Why" rewritten as a queue-management rule: shared dev box, polite
  device allocation, parity with CI which always uses task-submit. No
  causal claim about specific failure modes.
* "If unlocked results contradict CI ... check another process on the
  same chip" replaced with "rule out binary/source skew, wrong arch,
  stale CMake cache first."
* Anti-pattern #1 reworded: unlocked shell loops break queue
  accounting and race other users for devices — not "your iter results
  are entangled" (which implied the chip-shared contention model).
* Quick-reference rename: "Per-die contention map" -> "Per-die
  utilization + process table." npu-smi info shows utilization, not a
  contention map.

Kept:

* The rule that hardware work goes through task-submit when available.
* The fallback / unisolated guidance for environments without
  task-submit.
* The arch-precheck section (the 507018/507899 wrong-arch failure mode
  is documented separately in onboard-arch-precheck and is backed by
  controlled repro).

Co-authored-by: Chao Wang <26245345+ChaoWao@users.noreply.github.com>
