reject worktree-bypassing harnesses instead of silently evaluating the unpatched kernel#255
Open
chao-xu-spec wants to merge 3 commits into
Open
reject worktree-bypassing harnesses instead of silently evaluating the unpatched kernel#255chao-xu-spec wants to merge 3 commits into
chao-xu-spec wants to merge 3 commits into
Conversation
…ommand + harness-generator prompt reorg conflicts)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Closes #246.
For C++/HIP header-only (and editable-install, e.g.
aiter) kernels, theauto-generated
harness.pyhard-coded an absolute path into the source repoinstead of deriving every repository path from
$GEAK_WORK_DIR. The COMMANDMENTside was correct (
cd $GEAK_WORK_DIR, worktree-firstPYTHONPATH), but theharness it invoked ignored those signals via four shapes:
REPO_ROOT = "/sgl-workspace/sglang"module-level literal.hipcc -I /abs/source/include—#includeresolved through the source repo..sobuild cache underHARNESS_DIR/_harness_build/, shared across everyworktree / round / GPU worker.
need_rebuild = Falseand the stale.sois reused.Net effect: every patched candidate compiled/loaded the unpatched kernel.
--correctnessalways PASSed, verified speedup was always ≈ 1.00×, and novalidator fired — the optimizer (and each sub-agent inner loop) trained on a
flat signal, and correctness-breaking patches went undetected.
Existing validators only checked the four argparse flags +
GEAK_RESULT_*stdout markers, and preflight/baseline both ran on the unpatched repo, so the
bypass was indistinguishable from a legitimate "couldn't beat baseline" result.
Approach — defense in depth
The root cause is mechanism-specific (
-I,sys.path,open(), build cache,editable
import), so a single textual check is not enough. This PR adds adeterministic, mechanism-agnostic static gate plus generator/verifier
guidance so the bypass is caught at generation time, verification time, and
optimization time — and never produces a green ≈1.00× run.
1. Deterministic worktree-bypass gate (
kernel_languages/contract.py)find_source_repo_path_leaks(text, repo_root=None)— flags any absolute pathliteral that points into the source repo and is not derived from the
GEAK_WORK_DIR/GEAK_REPO_ROOTenv contract. It is mechanism-agnostic:(
repo_rootarg orGEAK_REPO_ROOT/GEAK_WORK_DIRenv, both literal andresolved);
sys.path.insert(0, "/abs"),-I/abs,REPO_ROOT = "/abs";list/tuple later fed into
sys.path/-I/PYTHONPATHvia a variable —the exact shape that, with
sys.path.insert(0, ...)in a loop, lands aheadof the worktree.
os.environ.get("GEAK_WORK_DIR", "/repo"), and system/toolchain prefixes(
/opt/rocm, etc.). Full-comment lines are stripped; a tokenizer-basedcontext scan avoids false positives on strings/comments.
find_aiter_routing_violation(text)—aiteris installed editable via asys.meta_pathfinder, sosys.pathordering can not shadow it. A harnessthat imports
aiterand builds acsrc/*.cukernel must setAITER_META_DIR=$GEAK_WORK_DIR(source routing) and a per-slotAITER_JIT_DIR(build-output routing) beforeimport aiter, or itsilently evaluates the baseline (and pollutes/races the shared source build
dir). The gate enforces both.
validate_harness(path, repo_root=None)raisesContractViolationwith anactionable message (offending line + how to derive from
$GEAK_WORK_DIR).2. Wire the gate into every layer
run/preprocess/harness_utils.py+run/preprocess/run_harness.py: harnessgeneration and optimization-time evaluation run the gate and reject leaks.
run/preprocess_v3/adapter.py: exportGEAK_REPO_ROOT= the originalsource repo before harness validation. Previously preprocess only saw
GEAK_WORK_DIR(the sub-agent worktree), so_resolve_repo_rootscould notrecognize a leak pointing at the real repo root.
subagents/preprocess/harness-verifier/SUBAGENT.yaml: the verifier sub-agentnow runs the deterministic validators as fatal gates — worktree-bypass (7),
aiter-routing (7b), self-contained-build (7c) — and on failure escalates to
the generator with
FAILED_RULE=...+ a correction hint, instead of patchingaround it by hardcoding another path.
3. Prevent the bug at the source (generator guidance)
subagents/preprocess/harness-generator/SUBAGENT.yaml: a mandatory"Worktree Path Discipline" section — derive
WORK_DIR = os.environ.get("GEAK_WORK_DIR", ...), build-I{WORK_DIR}/...,put the build cache under
WORK_DIR(per-slot isolated), do an incrementalrebuild keyed on the worktree source mtime, build self-contained from a
fresh worktree, and never add a hardcoded source-path fallback to a
sys.pathcandidate list.
kernel_languages/hip/builder_hints.md(new): HIP-specific idioms (pybind /standalone / raw
hipccshapes, timing loop) and the mandatory aiter worktreerouting rule.
kernel_languages/hip/commandment.j2: use Jinjadefault(..., true)so anempty
correctness_command/performance_commandstill falls back to theworktree-
cddefault command (previously an empty string suppressed it).4. Robustness (avoids false negatives / discarded runs)
build from scratch in a fresh per-slot worktree (no reliance on a shared
_harness_buildoutside the worktree, no source-repo mtime check), so eachcandidate recompiles the patched source.
run/preprocess_v3/{orchestrator,tools}.py: retry caps on deterministic toolsharness.py+COMMANDMENT.mdis rescued (with a warning) rather than aborted.tools/bash_command.py: process-group timeouts + denylist scope-gating so theverifier's checks can't hang on recursive scans over a huge filesystem.
How each root-cause block is now closed
REPO_ROOT = "/abs"literalfind_source_repo_path_leaksnamed-constant + repo-root-prefix patterns; generator derives from$GEAK_WORK_DIRhipcc -I /abs/source-I/abspattern; generator emits-I{WORK_DIR}/....socache outside worktreeWORK_DIR(per-slot)aiter(sys.path can't shadow)find_aiter_routing_violation(AITER_META_DIR+AITER_JIT_DIR)Validation
find_source_repo_path_leaksfires on the exact four harness shapes from theissue and stays silent on a contract-compliant
os.environ.get("GEAK_WORK_DIR", ...)harness (true/false-positive cases covered by unit tests).
a harness with no
GEAK_WORK_DIRreference is rejected at preflight ratherthan producing a green ≈1.00× run.
silu.hip) and anaiterkernel: the generated harness resolves paths from$GEAK_WORK_DIR,builds inside the worktree, and a deliberately corrupted worktree kernel now
makes
--correctnessFAIL (previously PASSed).py_compile/ lint clean on all touched modules; existing preprocess testsuite passes.
Files
contract.py(gate),hip/builder_hints.md(new),hip/commandment.j2,harness_utils.py,run_harness.py,adapter.py,orchestrator.py,tools.py,baseline.py,bash_command.py,unit_test_agent.py,litellm_model.py,harness-generator/SUBAGENT.yaml,harness-verifier/SUBAGENT.yaml.