22 commits
d5128df
Add: fully_distributed_within_core runtime — SPMD on-core orchestration
hengliao1972 Jun 24, 2026
98c0e65
Add: execute-first run-ahead loop + per-core swimlane for fully_distr…
hengliao1972 Jun 25, 2026
3ebf6eb
docs(fully_distributed_within_core): embed FullCore24 execution swimlane
hengliao1972 Jun 25, 2026
9158240
Add: runtime_overhead_test — isolate on-core orchestration/scheduling…
hengliao1972 Jun 26, 2026
ccf77a7
docs(fully_distributed_within_core): add §6.2 measured orchestration …
hengliao1972 Jun 26, 2026
efe9deb
Fix sim pthread-key leak; add CPU-affinity sweep + binding docs
hengliao1972 Jun 26, 2026
30bb973
perf(fully_distributed_within_core): make per-core TensorMap O(N) ins…
hengliao1972 Jun 26, 2026
f896159
perf+feat(fully_distributed_within_core): AICore single-NUMA thread p…
hengliao1972 Jun 26, 2026
5297a0e
fix(sim): guard Linux-only AICore pinning for non-Linux build; platfo…
hengliao1972 Jun 26, 2026
3090183
chore(runtime_overhead_test): macOS default --blocks 1-4
hengliao1972 Jun 26, 2026
75c38a1
perf(fully_distributed_within_core): winner-only fan-in lookup (skip …
hengliao1972 Jun 26, 2026
15c1ae8
perf(fully_distributed_within_core): shard claim cursor by task_index…
hengliao1972 Jun 26, 2026
ca4fdf1
docs(fully_distributed_within_core): add §6.7 measured cursor-shardin…
hengliao1972 Jun 26, 2026
f228b0d
style(fully_distributed_within_core): clang-format dist_engine.cpp
poursoul Jun 30, 2026
13a9f91
Add: sim trace-driven replay via --use-example-exec-time
poursoul Jun 30, 2026
8bd67ca
fix(fully_distributed_within_core): elect a single alloc owner; phase…
poursoul Jun 30, 2026
9035e46
Merge pull request #1 from poursoul/fdwic-alloc-owner-fix
hengliao1972 Jun 30, 2026
4398f45
perf(fully_distributed_within_core): dual-clock swimlane (wall + thre…
poursoul Jun 30, 2026
0e37ff4
perf(fully_distributed_within_core): swimlane dependency + slot-relea…
poursoul Jun 30, 2026
74b6756
perf(fully_distributed_within_core): startup barrier to align worker …
poursoul Jun 30, 2026
b5e39b8
perf(fully_distributed_within_core): full-submit lap swimlane, gated …
poursoul Jul 1, 2026
84e78bd
Merge pull request #2 from poursoul/fdwic-swimlane-deps
hengliao1972 Jul 1, 2026
9 changes: 9 additions & 0 deletions conftest.py
@@ -149,6 +149,15 @@ def pytest_addoption(parser):
         help="Enable L2 swimlane. Bare flag=level 4 (full). "
         "1=AICore timing, 2=+dispatch/fanout, 3=+sched phases, 4=+orch phases",
     )
+    parser.addoption(
+        "--use-example-exec-time",
+        action="store_true",
+        default=False,
+        help="(fully_distributed_within_core sim only) Replace each incore kernel with a "
+        "busy-wait of its CALLABLE example_execute_time (microseconds) instead of running "
+        "the real kernel, so a fast sim run reflects measured on-hardware kernel durations "
+        "plus orchestration overhead. Other runtimes reject this flag.",
+    )
     parser.addoption(
         "--enable-device-log-timing",
         action="store_true",
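The help text for `--use-example-exec-time` describes replacing each incore kernel with a busy-wait of its recorded duration. A minimal sketch of that idea follows. It is an illustration only, not the runtime's actual implementation: `busy_wait_us`, `run_kernel`, and their parameters are hypothetical names, and the real replay happens inside the simulator rather than in Python.

```python
import time


def busy_wait_us(duration_us: float) -> float:
    """Spin (without sleeping) for duration_us microseconds.

    Returns the elapsed time in microseconds. Hypothetical helper,
    not part of the repository.
    """
    start = time.perf_counter_ns()
    deadline = start + int(duration_us * 1_000)  # microseconds -> nanoseconds
    while time.perf_counter_ns() < deadline:
        pass  # keep the core busy, as a real kernel would
    return (time.perf_counter_ns() - start) / 1_000


def run_kernel(kernel, example_execute_time_us: float, use_example_exec_time: bool):
    """Run the real kernel, or replay its recorded duration instead.

    `kernel` is any zero-argument callable; `example_execute_time_us`
    stands in for the per-kernel example_execute_time (assumed to be
    in microseconds, as the help text states).
    """
    if use_example_exec_time:
        return busy_wait_us(example_execute_time_us)
    return kernel()
```

In a pytest fixture, the flag value would typically be read with `request.config.getoption("--use-example-exec-time")` and passed down as `use_example_exec_time`. A spin loop is used here rather than `time.sleep` because the stated goal is to occupy the worker for the measured duration, so scheduling overhead around it stays visible.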
1,236 changes: 1,236 additions & 0 deletions docs/fully_distributed_within_core.md

Large diffs are not rendered by default.