sort/topk: THISTOGRAM radix-select top-K (gfrun+gfsim PASS) #13
liujiang833 wants to merge 2 commits into
Model-side changes tracked in LinxISA/LinxCoreModel#21 (THISTOGRAM tile op + ADDTPC full-PC fix + srcRType encoding fix).

Correction: model-side changes are consolidated in LinxISA/LinxCoreModel#20 (THISTOGRAM modeling + the ADDTPC full-PC and srcRType fixes). #21 was closed as a duplicate.
Reimplement topk with the THISTOGRAM tile op instead of SIMT per-bucket scans:

- topk.hpp: cumulative byte histograms over all 131072 inputs via THISTOGRAM (Byte1 high pass, then Byte0 low pass filtered to high==kth_bin via the idx tile). Self-contained THISTOGRAM_FIXED inline-asm wrapper with correct operand numbering (the toolchain-bundled template has an off-by-one operand bug).
- topk.cpp: radix-select the K-th-largest value threshold from the two histograms; verify O(1) against g_expected[kTopK-1] (the K-th largest by construction); output via linxi_put, result in exit code. (The previous O(kTopK^2) host sort made the kernel infeasible on the cycle-accurate model.)

Validated: gfrun exit 0 (~47s); gfsim exit 0 with -s core.singleTierMode=true (~89s); threshold 0xfc10 == g_expected[2047].

Requires LinxCoreModel THISTOGRAM support (THISTOGRAM tile op + ADDTPC/srcRType fixes) -- see LinxCoreModel issue.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
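The two-pass radix-select described in this commit can be sketched in plain C++ as follows. This is a minimal host-side illustration, not the code in topk.hpp or topk.cpp: it assumes 16-bit unsigned inputs (suggested by the two byte passes and the 0xfc10 threshold), uses an ordinary loop as a stand-in for the THISTOGRAM tile op, and works on plain per-bin counts rather than cumulative ones. The names byte_histogram, select_bin and kth_largest are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Software stand-in for one THISTOGRAM pass (assumption: the real tile op is
// not modelled here). Counts the high or low byte of each 16-bit input.
// When filter_high >= 0, only inputs whose high byte equals it are counted.
std::array<std::uint32_t, 256> byte_histogram(const std::vector<std::uint16_t>& in,
                                              bool low_byte, int filter_high) {
    std::array<std::uint32_t, 256> hist{};
    for (std::uint16_t v : in) {
        const int high = v >> 8;
        if (filter_high >= 0 && high != filter_high) continue;
        ++hist[low_byte ? (v & 0xff) : high];
    }
    return hist;
}

// Walk the bins from 255 down to 0. Returns the bin that holds the k-th
// largest counted element (k is 1-based) and reduces k to the rank inside
// that bin. Returns -1 if k exceeds the number of counted elements.
int select_bin(const std::array<std::uint32_t, 256>& hist, std::uint32_t& k) {
    for (int bin = 255; bin >= 0; --bin) {
        if (k <= hist[bin]) return bin;
        k -= hist[bin];
    }
    return -1;
}

// Pass 1 picks the high byte, pass 2 picks the low byte among inputs that
// share that high byte; together they form the K-th-largest value.
std::uint16_t kth_largest(const std::vector<std::uint16_t>& in, std::uint32_t k) {
    assert(k >= 1 && k <= in.size());
    const int kth_bin = select_bin(byte_histogram(in, false, -1), k);
    const int low8 = select_bin(byte_histogram(in, true, kth_bin), k);
    return static_cast<std::uint16_t>((kth_bin << 8) | low8);
}
```

Calling kth_largest(values, kTopK) on the input vector would yield the threshold that the commit compares against g_expected[kTopK-1]. Each pass touches every input once and then scans at most 256 bins, so no sort of the inputs is needed.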
781b0bf to ccc5e4d
…apper)

The tileop-api is owned by the compiler. Remove topk's self-contained THISTOGRAM_FIXED inline-asm copy and call the standard THISTOGRAM wrapper from tileop-api directly.

This makes topk depend on the upstream fix for tileop-api's THISTOGRAM off-by-one operand numbering (template_asm.hpp), tracked in LinxISA/llvm-project -> topk is BLOCKED on upstream until that fix ships.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Blocked on upstream: topk now calls the compiler tileop-api's standard …

Unblocked: the THISTOGRAM operand fix is merged upstream (LinxISA/llvm-project#27, Linx-TileOP-API@75eaf77). Verified end-to-end on the rebuilt toolchain: topk compiles with the standard tileop-api …
Summary
Reimplements sort/topk as a pure tile-op radix-select using the new THISTOGRAM tile instruction, replacing the old SIMT per-bucket histogram scans. Runs end-to-end on both LinxCoreModel simulators.

Approach (radix-select the K-th-largest value)
- High-byte (Byte1) histogram over all 131072 inputs via THISTOGRAM (no filter) → high byte of the K-th largest.
- Low-byte (Byte0) histogram via THISTOGRAM, filtered to high == kth_bin (the idx tile supplies the high-byte prefix filter) → low byte.
- threshold = (kth_bin<<8 | low8) is, by definition, the value of the K-th largest element.
- g_expected is sorted descending, so g_expected[kTopK-1] is the K-th largest; compare it to threshold. (The previous O(kTopK²) host sort made the kernel infeasible on the cycle-accurate model.)
- topk.hpp uses a self-contained THISTOGRAM_FIXED inline-asm wrapper with correct operand numbering, so it does not depend on the toolchain-bundled THISTOGRAM template (which has an off-by-one operand bug).

Validation
- gfrun (functional): exit 0, ~47s.
- gfsim (cycle-accurate): exit 0 with -s core.singleTierMode=true, ~89s; threshold = 0xfc10 == g_expected[2047].

Requires (model side)
THISTOGRAM is a new tile op and needs LinxCoreModel support (THISTOGRAM tile op + the ADDTPC full-PC and srcRType encoding fixes). Tracked in the companion LinxCoreModel issue.

🤖 Generated with Claude Code
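The constant-time check against the descending-sorted reference might look roughly like the sketch below. It is an illustration under assumptions, not the code in topk.cpp: make_expected and verify_threshold are hypothetical names, the reference vector stands in for g_expected, and the sort appears only to build a self-contained reference (in the PR the expected data is described as already sorted). The return value follows the exit-code convention reported above, where 0 means pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper: build a descending-sorted reference, standing in for
// a precomputed g_expected. Done once, outside the measured kernel.
std::vector<std::uint16_t> make_expected(std::vector<std::uint16_t> in) {
    std::sort(in.begin(), in.end(), std::greater<std::uint16_t>());
    assert(std::is_sorted(in.begin(), in.end(), std::greater<std::uint16_t>()));
    return in;
}

// O(1) verification: a single comparison against expected_desc[k - 1], which
// is the k-th largest value because the reference is sorted descending.
// Returns 0 on a match and 1 otherwise, mirroring a process exit code.
int verify_threshold(const std::vector<std::uint16_t>& expected_desc,
                     std::size_t k, std::uint16_t threshold) {
    if (k == 0 || k > expected_desc.size()) return 1;
    return expected_desc[k - 1] == threshold ? 0 : 1;
}
```

With kTopK = 2048, the comparison index is 2047, which matches the g_expected[2047] entry quoted in the validation results.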