Symlink gitignored build artifacts into worktrees by fxmarty-amd · Pull Request #192 · AMD-AGI/GEAK

fxmarty-amd · 2026-04-24T08:25:22Z

Motivation

Worktrees created by git worktree add don't include gitignored files. For projects like vllm that have compiled
extensions (.so), generated version files (_version.py), and build artifacts, this means the worktree can't run
without a full rebuild. Symlinking these files from the original repo avoids that cost.

For example, I was getting errors such as:

WARNING 04-23 16:39:18 [rocm.py:43] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")

along with others caused by missing objects, and GEAK would then search for them endlessly and unnecessarily.
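The symlinking idea can be illustrated with a minimal sketch. It is not the code from this PR: the function name, its parameters, and the demo layout are hypothetical. The list of ignored paths is passed in explicitly here; in a real repository it could come from something like `git ls-files --others --ignored --exclude-standard`.

```python
import os
import tempfile
from pathlib import Path


def symlink_ignored_files(src_repo, worktree, rel_paths, exclude_dirs=()):
    """Symlink each relative path from src_repo into worktree (hypothetical helper).

    Paths under an excluded directory are skipped, and anything that already
    exists in the worktree is left untouched. Returns the links created.
    """
    created = []
    excluded = [Path(d).parts for d in exclude_dirs]
    for rel in map(Path, rel_paths):
        if any(rel.parts[: len(parts)] == parts for parts in excluded):
            continue  # e.g. the run's own output directory
        source = Path(src_repo) / rel
        target = Path(worktree) / rel
        if not source.exists() or target.exists() or target.is_symlink():
            continue  # never overwrite what the worktree already has
        target.parent.mkdir(parents=True, exist_ok=True)
        os.symlink(source.resolve(), target)
        created.append(rel)
    return created


# Small demo on a throwaway directory tree (POSIX symlinks assumed).
with tempfile.TemporaryDirectory() as tmp:
    repo, wt = Path(tmp, "repo"), Path(tmp, "wt")
    (repo / "vllm").mkdir(parents=True)
    (repo / "out").mkdir()
    (repo / "vllm" / "_C.so").write_bytes(b"")
    (repo / "vllm" / "_version.py").write_text("v = 1\n")
    (repo / "out" / "log.txt").write_text("x")
    (wt / "vllm").mkdir(parents=True)
    (wt / "vllm" / "_version.py").write_text("v = 2\n")  # pre-existing file
    links = symlink_ignored_files(
        repo, wt,
        ["vllm/_C.so", "vllm/_version.py", "out/log.txt"],
        exclude_dirs=["out"],
    )
    demo_links = [p.as_posix() for p in links]
    demo_is_link = (wt / "vllm" / "_C.so").is_symlink()
    demo_kept = (wt / "vllm" / "_version.py").read_text()
```

In the demo only the compiled extension is linked: the version file already exists in the worktree and the `out` directory is excluded.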

Testing

python -m pytest tests/run/test_worktree_symlink.py -v

8 passed. The tests cover .so symlinking, _version.py symlinking, output-directory exclusion, existing-file preservation, and log output.

AI assistance was used (Claude).

When --num-parallel >= 2, multiple agent threads shared the same module-level MCPToolBridge singletons. Since each bridge wraps a single subprocess with one asyncio StreamReader, concurrent readline() calls from different threads triggered:

readuntil() called while another coroutine is already waiting for incoming data

Fix: each ToolRuntime now creates its own set of MCPToolBridge instances via _create_own_bridges(), giving every parallel agent its own subprocess and stdio pipes. The module-level bridges are used only for schema discovery at import time and then shut down.

Fixes: AMD-AGI#100

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The explicit _shutdown_loop() calls at module init killed the event loop, then the atexit handler tried to shut down the same bridges again on a dead loop, causing the process to hang after tests. Let atexit handle cleanup once, matching the behavior on main. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…n-cleanup Fix/postprocess evaluation cleanup

…ract

Four correctness fixes to prevent agents from evaluating patches against the wrong (original) code:

Issue 1 – kernel defs in harness: detect @triton.jit/@triton.autotune decorated functions embedded inside a harness file and split them into kernel_extracted.py, rewriting the harness to import from it. Called at all three harness selection points (deterministic, discovery, UTA).

Issue 2 – relative imports bypass PYTHONPATH: _rewrite_relative_imports() converts `from .. import foo` style imports to absolute imports anchored at repo_root, so PYTHONPATH ordering (GEAK_WORK_DIR first) is respected. Plugged into _rewrite_materialized_harness_source().

Issue 3 – COMMANDMENT.md hardcodes harness path: replace ${GEAK_HARNESS} variable references in _generate_simple() and _generate_inner_kernel() with the literal resolved harness path, so agents cannot accidentally override it.

Issue 4 – null repo_root: add _infer_repo_root() that walks up from the kernel file looking for .git/pyproject.toml/setup.py markers. Used as fallback when resolve_kernel_url returns no local_repo_path (local file path specs). Hard assertion ensures repo_root is never empty downstream.

Adds 11 unit tests covering all four fixes (no GPU/LLM required).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…nel defs

Issue 1 implementation was inverted: the correct split is to pull test functions out into a new test_<stem>_harness.py and leave the original file as a clean kernel. The old code did the opposite (extracted @triton.jit defs, rewrote harness to import them), which would break patch evaluation.

New detect_and_split_kernel_from_harness algorithm:
- Find test-root seeds: run_*/test_* names, pytest/unittest decorators, GEAK CLI flag usage (--correctness etc.), functions called from __main__
- BFS from seeds collecting all reachable functions, skipping @triton.jit
- Strip collected test functions + __main__ block from original file
- Write test_<stem>_harness.py with all imports, sys.path bootstrap, `from <stem> import *`, all test functions, and __main__ block

Update _ensure_harness_has_no_kernel_defs to match new return signature (new_harness_path, kernel_path) and always set ctx["kernel_path"].

Update TestDetectAndSplitKernelFromHarness: 3 tests covering the corrected split direction and BFS exclusion of @triton.jit functions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Previously detect_and_split_kernel_from_harness was only called when a file was being evaluated as a harness candidate. Merged files (like naive_softmax.py) passed directly via --kernel-path were never split, so agents saw and patched a file containing both kernel defs and test infrastructure. Now immediately after Step 1 (resolve-kernel-url), if the resolved kernel file contains both @triton.jit defs and test roots, we split it: test logic goes to test_<stem>_harness.py in output_dir, the original becomes a clean kernel file. The split harness is also surfaced as the harness hint for downstream discovery/UTA so they build on it rather than starting from scratch. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… hint The pytest-extracted harness from detect_and_split_kernel_from_harness lacks --correctness/--profile/--benchmark/--full-benchmark flags, so setting it as the harness hint caused deterministic validation to crash. Instead, let UTA consume the cleaned kernel and generate a proper GEAK harness normally. The split still fires (kernel is cleaned), UTA gets the right input, end-to-end smoke test confirms preprocessing completes successfully. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ract

Write clean kernel copy to output_dir instead of mutating the original repo file. The geak framework uses git apply/revert to manage patches against the repo; in-place modification of the kernel file breaks those git operations and causes patches to fail in subsequent rounds.

Instead:
- Clean kernel (test logic stripped) is written to output_dir/<name>.py
- Original repo file is left untouched
- PYTHONPATH already includes GEAK_WORK_DIR (output_dir) first, so `from <stem> import *` in the harness resolves to the clean copy
- Agents patch the output_dir copy; git state of the repo is clean

Update tests to assert original file is untouched and clean kernel copy lives in output_dir.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fn_map was dict[name -> single node], so duplicate function definitions (e.g. two `def test_vecmat` in test_batched_vecmat.py) would only store the last definition. The first definition was never added to the strip set, leaving it in the clean kernel output.

Two fixes:
1. fn_map is now dict[name -> list[nodes]], so all definitions are captured
2. Strip phase scans tree.body directly (not fn_map) to catch every definition whose name is in test_fns, regardless of duplicates

Validated against all 31 rocmbench kernel files: 31/31 clean splits, no test functions leaking into kernels, originals untouched.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
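The duplicate-definition problem can be reproduced in a few lines with the standard `ast` module. This is only an illustrative sketch under assumed inputs (the `SOURCE` string and the variable names are made up here), not the project's split function; it needs Python 3.9+ for `ast.unparse`.

```python
import ast
from collections import defaultdict

# Hypothetical merged file with a duplicated test function.
SOURCE = '''
def test_vecmat():
    return 1

def kernel():
    return 0

def test_vecmat():
    return 2
'''

tree = ast.parse(SOURCE)

# name -> every top-level definition with that name, not just the last one.
fn_map = defaultdict(list)
for node in tree.body:
    if isinstance(node, ast.FunctionDef):
        fn_map[node.name].append(node)

test_fns = {"test_vecmat"}

# Strip by scanning tree.body directly, so each duplicate is removed.
tree.body = [
    node
    for node in tree.body
    if not (isinstance(node, ast.FunctionDef) and node.name in test_fns)
]
clean_source = ast.unparse(tree)
```

With a plain `dict[name -> node]` the first `test_vecmat` would be overwritten and could survive stripping; filtering `tree.body` by name sidesteps that.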

Issue 4 (_infer_repo_root):
- Add test_finds_setup_py and test_finds_setup_cfg to verify all 4 marker types are detected
- Add test_git_takes_precedence_over_inner_pyproject to verify walk-up handles nested marker files without crashing

Issue 2 (_rewrite_relative_imports):
- Add test_rewrites_same_package_relative_import (level=1, from .sibling)
- Add test_rewrites_multiple_relative_imports_in_one_file: multiple relative imports of different levels in one source
- Add test_rewritten_import_resolves_to_patched_not_original: the core correctness guarantee that the absolute import resolves to the GEAK_WORK_DIR copy (prepended to PYTHONPATH), not the original repo file
- Retain existing absolute/outside-repo no-op tests

Issue 3 (COMMANDMENT hardcodes harness):
- Add test_all_four_sections_contain_literal_harness_path: each of CORRECTNESS, PROFILE, BENCHMARK, FULL_BENCHMARK must embed the literal absolute path and must not contain ${GEAK_HARNESS}
- Add test_harness_path_is_absolute_not_relative: path must start with / so agents can find it from any working directory

68 tests total, all passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
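A walk-up search for repository markers might look roughly like the sketch below. The function name is hypothetical, and returning the nearest marker directory is an assumption of this sketch; how the real _infer_repo_root ranks nested markers is not spelled out in the commit message.

```python
import tempfile
from pathlib import Path

# The four marker types mentioned in the tests above.
MARKERS = (".git", "pyproject.toml", "setup.py", "setup.cfg")


def infer_repo_root_sketch(kernel_file):
    """Return the nearest ancestor directory containing a marker, or None."""
    for parent in Path(kernel_file).resolve().parents:
        if any((parent / marker).exists() for marker in MARKERS):
            return parent
    return None


# Demo on a temporary tree: myrepo/setup.cfg marks the root.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp).resolve() / "myrepo"
    kernel = repo / "ops" / "kernels" / "naive_add.py"
    kernel.parent.mkdir(parents=True)
    kernel.write_text("# kernel\n")
    (repo / "setup.cfg").write_text("")
    found_root = infer_repo_root_sketch(kernel)
    root_matches = found_root == repo
```

A caller that requires a non-empty root can assert on the return value, in the spirit of the hard assertion described for Issue 4.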

…l file path

When detect_and_split_kernel_from_harness writes the new harness to output_dir, the harness is outside the repo, so a subsequent call to _rewrite_relative_imports(new_harness, repo_root) would fail because it cannot determine the package path from outside the repo tree.

Fix: collect and rewrite relative imports inside the split function itself, while harness_path (the original file's location inside the repo) is still available as the reference for computing the package hierarchy. Walk up from harness_path to find repo_root independently so the split function is self-contained.

Smoke tested with a synthetic package:
  myrepo/ops/kernels/naive_add.py (merged, has from ..helpers import set_seed)
  myrepo/ops/helpers.py

After split:
  test_naive_add_harness.py -> from ops.helpers import set_seed, make_input
  naive_add.py -> @triton.jit add_kernel (no test fns)
  original file -> untouched (still has from ..helpers)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
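The rewrite from `from ..helpers import ...` to `from ops.helpers import ...` in the smoke test can be sketched with `ast`. This is a simplified illustration, not the project's _rewrite_relative_imports: the function name and signature are hypothetical, and `ast.unparse` (Python 3.9+) drops comments and original formatting, which a production rewrite would probably want to preserve.

```python
import ast
from pathlib import PurePosixPath


def absolutize_relative_imports(source, file_path, repo_root):
    """Rewrite relative `from` imports as absolute ones anchored at repo_root."""
    # Package of the file, e.g. ("ops", "kernels") for myrepo/ops/kernels/x.py
    package_parts = PurePosixPath(file_path).relative_to(repo_root).parent.parts
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.ImportFrom) or node.level == 0:
            continue  # already absolute
        # level=1 is the file's own package; each extra level drops one part.
        keep = len(package_parts) - (node.level - 1)
        if keep < 0:
            continue  # import climbs above repo_root: leave it alone
        parts = list(package_parts[:keep])
        if node.module:
            parts.append(node.module)
        if not parts:
            continue  # nothing to anchor on (e.g. `from . import x` at the root)
        node.module = ".".join(parts)
        node.level = 0
    return ast.unparse(tree)


rewritten = absolutize_relative_imports(
    "from ..helpers import set_seed, make_input\nfrom .sibling import f\nimport os\n",
    "myrepo/ops/kernels/naive_add.py",
    "myrepo",
)
```

Because the package hierarchy is computed from the file's original location inside the repo, the result stays valid after the harness is written elsewhere.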

Extend the merged-file split path to generate a GEAK-compatible Python wrapper for HIP/CUDA harnesses. This lets real mixed-source HIP kernels run through correctness, profiling, baseline capture, and commandment generation without changing the existing preprocess contract. Made-with: Cursor

- Skip swerex docker tests when docker binary is unavailable (container CI environment has no docker daemon)
- Fix test_env_var_fallback: GEAK_MODEL env var takes precedence over MSWEA_MODEL_NAME; explicitly unset it in the patch.dict context so the MSWEA_MODEL_NAME fallback path is correctly exercised

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test_rewritten_import_resolves_to_patched_not_original was clearing any module whose name contained "utils" from sys.modules. This inadvertently evicted minisweagent.run.utils.* (including task_parser) from the module cache. In xdist workers, subsequent tests that patched minisweagent.run.utils.task_parser.datetime received a freshly re-imported module instance, while the test's tp reference still pointed to the old one — causing the patch to be silently ignored and datetime.now() to return the real timestamp instead of the mock. Fix: use an exact prefix match (key == "ops" or key.startswith("ops.")) so only the temporary ops package created by the test is evicted. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
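The difference between a substring match and an exact prefix match is easy to see on a plain dictionary standing in for sys.modules. The helper name and the sample cache below are hypothetical.

```python
def evict_package(modules, package):
    """Remove `package` and its submodules from a module cache.

    Matches the package name exactly or as a dotted prefix, so unrelated
    modules that merely contain the same substring are left in place.
    """
    doomed = [k for k in modules if k == package or k.startswith(package + ".")]
    for key in doomed:
        del modules[key]
    return doomed


# A dict standing in for sys.modules.
cache = {
    "ops": 1,
    "ops.helpers": 2,
    "opsx": 3,
    "minisweagent.run.utils.task_parser": 4,
}
evicted = evict_package(cache, "ops")
```

Only `ops` and `ops.helpers` are evicted; `opsx` and the `minisweagent` module stay cached, so references held by other tests keep pointing at the same module object.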

…sor artifacts from patches

Two bugs caused degraded optimization runs when a relative output directory was specified:

1. _derive_output_dir_and_traj did not resolve explicit output paths to absolute, causing COMMANDMENT and other artifact paths in task YAML to be relative. read_task_file then resolved them against the task file's directory instead of the workspace root, producing bogus paths. Sub-agents silently fell back to a raw test_command without SETUP/CORRECTNESS/BENCHMARK sections.

2. The preprocessor writes baseline_metrics.json and profile.json to the kernel repo root (introduced in 3178b58). These leaked into patches via git diff, causing "Failed to apply starting patch" in subsequent rounds when the files already existed in worktrees. Add both files to the generated-artifacts exclusion list so they are stripped from patches and excluded from diffs.

Made-with: Cursor

…egrity-main-20260401 Fix/preprocess harness integrity main 20260401

Two changes:

- tools_runtime: collect_mcp_tools() was called at module import time, spawning 3 MCP server subprocesses on every `import minisweagent`. This caused `geak --help` (and any sub-agent that probed the geak CLI) to hang indefinitely waiting for MCP handshakes. Fixed by deferring to _ensure_mcp_collected(), called lazily on the first ToolRuntime instantiation instead.
- mini.py: guard configure_if_first_time() with sys.stdin.isatty() so the interactive setup wizard does not block piped / non-TTY invocations that lack the MSWEA_CONFIGURED env var.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Align model tool filtering with mini.py disabled_tools: when AmdLlmModelConfig.profiling is false, remove both the built-in profiling tool and the MCP profile_kernel tool. Centralize logic in filter_tools_for_amd_config for AmdLlmModelBase (__init__ and set_tools) and reuse from LitellmModel. Made-with: Cursor

parse_speedup_report only recognized the `Overall: Xx (Yms -> Zms)` format from save_and_test. When the harness only prints `GEAK_RESULT_LATENCY_MS=<number>`, candidate_ms was extracted but overall_speedup and baseline_ms stayed null — breaking trajectory tracking, strategy scoring, and dead-end detection in working memory. Fix: record_result now reads the stored baseline from the notebook's event log and passes it to parse_speedup_report, which computes `overall_speedup = baseline_ms / candidate_ms` when the Overall line is absent. Also fixes the GEAK_RESULT_LATENCY_MS regex to handle scientific notation. Made-with: Cursor
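The fallback computation can be sketched as follows. The regular expression and function below are illustrative assumptions rather than the code in parse_speedup_report; they show one way to accept scientific notation and to derive `overall_speedup = baseline_ms / candidate_ms` when only the latency line is present.

```python
import re

# Accepts plain decimals and scientific notation, e.g. 0.42, 3, 1.5e-3.
_LATENCY_RE = re.compile(
    r"GEAK_RESULT_LATENCY_MS=([0-9]*\.?[0-9]+(?:[eE][+-]?[0-9]+)?)"
)


def compute_speedup(output, baseline_ms):
    """Extract candidate latency and derive speedup from a stored baseline."""
    match = _LATENCY_RE.search(output)
    if match is None:
        return None  # harness printed no latency line
    candidate_ms = float(match.group(1))
    if baseline_ms is None or candidate_ms <= 0:
        # No baseline available (or unusable latency): report latency only.
        return {"candidate_ms": candidate_ms, "overall_speedup": None}
    return {
        "candidate_ms": candidate_ms,
        "overall_speedup": baseline_ms / candidate_ms,
    }


sci = compute_speedup("GEAK_RESULT_LATENCY_MS=2.5e-1", baseline_ms=1.0)
plain = compute_speedup("GEAK_RESULT_LATENCY_MS=2.0", baseline_ms=3.0)
no_baseline = compute_speedup("GEAK_RESULT_LATENCY_MS=2.0", baseline_ms=None)
```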

Working memory baseline was set from the METRIX profiler's kernel-level duration_us, not the harness's GEAK_RESULT_LATENCY_MS. When the harness measures a different metric (e.g. full model forward pass vs single kernel dispatch), speedup computation was wrong — agents saw "regression" when they actually improved the target metric. Extract load_baseline_from_artifacts() into WorkingMemory so both the heterogeneous orchestrator and parallel_helpers use the same logic: read baseline_metrics.json first (profiler), then override with benchmark_baseline.txt (harness GEAK_RESULT_LATENCY_MS) if available. Made-with: Cursor

Add missing blank line after inline import per ruff E303. Made-with: Cursor

…ssing

benchmark_baseline.txt is not always written by the preprocessor (depends on the code path). When it's absent, load_baseline_from_artifacts now checks harness_results.json for the benchmark entry's GEAK_RESULT_LATENCY_MS. This fixes the e2e harness scenario where the preprocessor writes harness results but not the separate benchmark_baseline.txt file, causing working memory to use the profiler's kernel-level baseline instead of the harness's end-to-end measurement.

Priority chain:
1. benchmark_baseline.txt (GEAK_RESULT_LATENCY_MS)
2. harness_results.json benchmark entry (GEAK_RESULT_LATENCY_MS)
3. baseline_metrics.json (profiler duration_us)

Made-with: Cursor

fix: prevent geak --help from hanging by deferring MCP server startup

Resolves pylint E0102 (function-redefined) from a merge that left two set_tools definitions; keep the implementation that uses filter_tools_for_amd_config. Made-with: Cursor

Fix/homogeneous repo path

fix: compute overall_speedup from stored baseline in working memory

Adds memory/cross_session/*.json to [tool.setuptools.package-data] so the KB ships with the wheel even if include-package-data is ever disabled or the project moves to a build system without file-tree auto-inclusion. Current behavior: knowledge_base.json already ships because include-package-data=true picks up git-tracked files (verified with wheel builds both inside and outside git). This change is defensive — makes the intent explicit and survives config changes. Without it, a non-editable install into an environment where the KB is not already present (e.g., pip install of a source tarball in a non-docker/non-editable setup) would leave _seed_from_knowledge_base in backends/local.py silently no-op because Path(__file__).parent.parent / knowledge_base.json would not exist in site-packages. Made-with: Cursor

…-fallback fix(postprocess): fallback to 3-way apply when eval worktree lacks parent blob

…er + raised budgets Root cause analysis (with 100% hard evidence from peak vs current logs) showed that the 3 L3 kernels regressed because: 1. Retrieval scoring PENALIZED same-kernel KB entries whose stored bottleneck (e.g. "memory" from prior run's classifier) differed from runtime's current bottleneck ("latency"). The -0.15 penalty + success_boost caused cross-kernel 2.46x entries to outrank byte-identical same-kernel 2.23x entries. Peak retrieval log: top=fused_rms_fp8(0.706) Current (pre-fix): top=fused_mxfp4_quant_moe_sort(0.721) 2. Context budget was slashed 60K -> 20K, per-patch 8K -> 4K. The 2.23x fused_rms_fp8 winning patch is 38KB -- it cannot fit in today's budget even if retrieval ranks it AMD-AGI#1, so the agent cannot read the code to reproduce. 3. FIRST-MOVE / EXACT-CODE-MATCH banner was removed in favor of purely "informed cross-reference" framing. The agent had no prior signal telling it a verbatim-applicable patch was in the context, so it defaulted to authoring minimal own-changes (R1=1.00x every round today vs R2=1.99x in peak runs). Fixes: Retriever (`retriever.py`): * New `_code_similarity(target, kb)`: whitespace-normalized line-set Jaccard with a sha256 short-circuit for byte-identical detection. * `_stage2_code_similarity` replaces `_stage2_text_similarity`. Scoring: total = code_sim * 1.0 + scaled_speedup_boost (cap 0.30) + stem_boost (cap 0.10) Byte-identical entries always outrank non-identical (1.0 + 0 > 0.99 + 0.40). Tie-break by `best_speedup` ensures 2.23x fused_rms_fp8 surfaces above 1.80x fused_rms_fp8 when all share code_sim=1.0. * Removed: category boost, bottleneck match +/-0.25 (the -0.15 penalty was the primary regression trigger), language boost, diversity penalty, text-based `_build_query_terms`/`_experience_text`/`_text_similarity` /`_extract_source_terms`/`_NOISE_WORDS`. ~230 lines of dead scoring machinery deleted. * Relevance gate simplified: emit context iff best code_sim-based score >= 0.02. 
Prevents unrelated entries from being shown when none apply. Formatter (`formatter.py`): * `_MAX_CONTEXT_FULL`: 20_000 -> 40_000 * `_MAX_CONTEXT_COMPACT`: 4_000 -> 8_000 * `_MAX_BEST_PATCH_CHARS`: 4_000 -> 8_000 (fits full winning patch) * `_TOP_IMPROVED_STRATEGIES`: 3 -> 5 (more working patterns visible) * `_TOP_REGRESSED_STRATEGIES`: 2 -> 3 (more anti-patterns to avoid) * `_MAX_REGRESSION_PATCH_CHARS`: 1_500 -> 2_000 * `_MAX_BASELINE_BENCHMARK_CHARS`: 1_500 -> 2_000 * New `_find_exact_code_match(experiences, target_code)` picks highest-speedup byte-identical entry (whitespace-normalized). * New `_build_exact_match_banner(exp)` emits a targeted EXACT-CODE-MATCH banner at the top of the injected context. Banner fires ONLY when at least one retrieved entry is byte-identical; silent for cross-kernel transfers (zero false positives, zero risk of directing agent to copy an inapplicable patch). Simulation validation (per-kernel) on the current KB of 23 entries: fused_rms_fp8: target matches 11 KB entries byte-for-byte; top-8 are all same-kernel sorted by speedup (2.23x, 2.17x, 1.80x, 1.77x, ...). EXACT-CODE-MATCH banner fires naming the 2.23x patch. gemm_a16wfp4: no same-kernel entry exists; top is gemm_a16w16_atomic (3.92x, stem-matched, line-level code_sim=0.131) as expected cross- kernel transfer seed. Banner does not fire. llama_ff_triton: exact match on our own 5.24x entry; banner fires; subsequent cross-kernel entries (gemm_a16w16_atomic, knn, three_nn) follow as expected secondary options. Made-with: Cursor
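The similarity measure described for the retriever, a whitespace-normalized line-set Jaccard with a sha256 short-circuit, can be sketched like this. It is a minimal illustration, not the code in retriever.py: the function names are hypothetical, and because the commit message does not say how the speedup boost is scaled, the boosts are taken as already-computed inputs and only the stated caps are applied.

```python
import hashlib


def _normalized_lines(code):
    """Collapse whitespace runs on each line and drop blank lines."""
    return {" ".join(line.split()) for line in code.splitlines() if line.strip()}


def code_similarity(target, kb):
    """Jaccard overlap of normalized line sets, 1.0 for byte-identical input."""
    if not target or not kb:
        return 0.0
    # Short-circuit: identical bytes hash identically, skip the set work.
    if hashlib.sha256(target.encode()).digest() == hashlib.sha256(kb.encode()).digest():
        return 1.0
    a, b = _normalized_lines(target), _normalized_lines(kb)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def total_score(code_sim, speedup_boost, stem_boost):
    """code_sim plus boosts, capped at 0.30 and 0.10 as in the commit message."""
    return code_sim * 1.0 + min(speedup_boost, 0.30) + min(stem_boost, 0.10)


identical = code_similarity("x = 1\ny = 2\n", "x = 1\ny = 2\n")
partial = code_similarity("a = 1\nb = 2\n", "a  =  1\nc = 3\n")
empty = code_similarity("", "a = 1\n")
```

In the `partial` case the two snippets share one normalized line out of three distinct lines, and an empty target scores zero, which is what lets a relevance gate reject it.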

… is sufficient The banner was redundant. Every KB entry already emits a ``Code fingerprint`` line (``Nbytes, sha256=...``) and the top of the context shows ``Your kernel fingerprint`` for the current kernel. When those match byte-for-byte the agent sees it directly in the evidence and can act on it without a prescriptive "apply verbatim" banner at the top. This returns to the "informed cross-reference, not directive" principle we already committed to. The code-similarity-based ranking from the previous commit already surfaces byte-identical matches to the top of the retrieved set -- no extra banner needed for the agent to notice. Removed: * ``_find_exact_code_match`` (58 lines) * ``_build_exact_match_banner`` (18 lines) * top-of-context banner emission Kept: * per-entry ``Code fingerprint`` line -- the signal the agent compares * top-of-context ``Your kernel fingerprint`` -- the reference value * code-similarity primary ranking (from previous commit) * raised context budgets (from previous commit) Made-with: Cursor

… I/O The retriever no longer accepts a ``kernel_path`` and never reads the filesystem. Callers pass the kernel's raw source as ``target_code``; the integration wrapper at ``memory.cross_session.retrieve()`` keeps a ``kernel_path`` kwarg as a backward-compat shim that reads the file once before forwarding. Why this matters: sub-agent dispatch contexts were passing a ``kernel_path`` that pointed at a not-yet-materialised working-dir location (``/workspace/outputs/<k>/tasks/round_N/outputs/<k>/kernel.py``), so ``_read_target_code`` silently returned ``""`` and every KB entry's code_sim dropped to 0. The ranking then collapsed to success_boost only, which surfaced unrelated high-speedup kernels (e.g. ``three_nn``/5.38x) as top for ``fused_qkv_rope``/ ``fast_rms_layernorm`` sub-agent retrievals. The orchestrator-level retrieval was correct because its path resolved; sub-agents regressed. Making the retriever purely a (target_code, candidates) -> ranking function removes the failure mode by construction -- the caller either has the code and supplies it, or retrieval declines (no misleading partial context). This is semantically cleaner too: code identity matches should be computed on code, not inferred from path strings. Concrete changes: * ``retrieve_context(kernel_path=...)`` -> ``retrieve_context(target_code=...)`` * Removed ``_read_target_code``, ``_infer_category``, ``_infer_language``, ``_kernel_stem_overlap`` (path-based helpers, no longer used). * Scoring collapses to ``code_sim + capped success_boost``: - stem_boost removed: name strings are a path-derived proxy for code similarity; if code similarity is weak, the stem heuristic was overweighting tenuous cross-family transfers. - category/bottleneck/language boosts already gone in the previous commit for the same reason. * ``format_landscape_context`` takes ``target_code`` instead of ``target_kernel_path``; drops its own Path-read logic. 
* ``memory.cross_session.retrieve()`` accepts ``target_code`` (preferred) and still accepts legacy ``kernel_path`` (reads the file once before forwarding the raw code). Semantic validation (re-simulated on current KB of 23 entries): fused_rms_fp8 target (49,666 B): top = fused_rms_fp8 sp=2.231x total=1.100 (code_sim strong=yes) -- identical to the outgoing behaviour, correctly surfaces the same-kernel 2.23x entry at rank 1. fused_qkv_rope target (25,074 B at the real AKA path; 11,491 B at the GEAK working-dir stripped version): * AKA path -> code_sim=1.000 (byte-identical), same-kernel wins. * Working dir -> code_sim~=0.45 (Jaccard over shared lines), same-kernel still wins because 0.45 + 0.05 > any cross-kernel 0 + 0.20 (three_nn's success_boost). Sub-agent path regression (target_code="" failure mode) now impossible: if a caller passes ``target_code=""`` the retriever declines via the ``best_score < 0.02`` relevance gate; no unrelated entries surface. Made-with: Cursor

The agent now sees the Jaccard code-overlap percentage between its current kernel and every retrieved KB entry, plus a top-level KB-relevance tier. Without this signal the agent sometimes anchored on a weak cross-family entry (e.g. gemm_a16w16_atomic ~ 13% overlap with gemm_a16wfp4) and defaulted to generic-GEMM strategies across all 5 rounds instead of recognising early that the KB had no close match and pivoting to kernel-specific analysis (MXFP4 quant ops etc.). The framing stays non-prescriptive: we report the NUMBER, explain what it means, and let the agent weigh. Retriever: * Compute per-entry code_sim for the top-k selected entries (reuses _code_similarity, no extra pass over all candidates). * Thread ``per_entry_code_sim`` into ``format_landscape_context``. Formatter: * Top-of-context ``KB relevance`` tier: STRONG (code_sim ≥ 99%) "patch applies verbatim" PARTIAL (25% ≤ code_sim) "adapt techniques, validate against your hot paths" LOW (code_sim < 25%) "distant cross-family references; analyse YOUR kernel + profiler for kernel-specific optimisations" * Per-entry ``**Code similarity to your kernel**: NN.N%`` line with the same qualitative tier. Agent can now sort/weigh entries by their actual code overlap rather than rank position alone. * Reasoning guidance updated to call out: "if no entry matches well and early rounds of KB-inspired strategies don't improve, pivot to analysing the current kernel's profile and propose kernel-specific optimizations." 
Validated against the current 23-entry KB: fused_rms_fp8 target → KB relevance: STRONG (100% top match) fused_qkv_rope target → KB relevance: STRONG (100% top match) fast_rms_layernorm → KB relevance: PARTIAL (26% top match) gemm_a16wfp4 target → KB relevance: LOW (13% top match) The LOW-relevance case is exactly where the previous run got stuck at 1.03x: the agent anchored on a 13% same-category (GEMM) entry and rolled out 5 rounds of generic GEMM strategies, never trying MXFP4- specific optimisations (fuse-wrapper-ops, precompute-quant-separate, log2/exp2 bitops) that the historical 1.43x peak exploited. With the explicit "LOW / use as weak hints / focus on YOUR kernel's quant ops" framing the agent should recognise the mismatch earlier and pivot. Made-with: Cursor

…m number Earlier we added "KB relevance: STRONG / PARTIAL / LOW" banners and per-entry "STRONG/WEAK: treat as generic hint only" labels with directive-flavoured text like "prioritise analysing YOUR kernel's profiler output". Reverting this -- the agent already has: * Full current kernel source (task body) * Current baseline_metrics / profiler output (injected separately) * Each KB entry's stored code_fingerprint, code_sim %, baseline→ best latency, bottleneck, strategies with diffs, key params, round trajectory, regressions Adding a pre-digested "this is WEAK, pivot to kernel-specific" tier is us interpreting on the agent's behalf. Given the same raw evidence the agent can (and should) form its own judgement. Kept from the previous commit: * Per-entry raw Jaccard percentage -- a signal the agent can't derive from fingerprint alone ("your fingerprint is A; the KB entry's is B" tells you they differ but not by how much). * Per-entry code_fingerprint (so the agent can confirm byte-identity exactly when it matters). * Per-entry full evidence block (hardware, performance, diffs, etc.). Removed: * Top-of-context "KB relevance: STRONG/PARTIAL/LOW" banner. * Per-entry qualitative tier suffix ("STRONG: byte-identical source, patch applies verbatim", "WEAK: distant cross-family, treat as generic hint only", etc.). * Reasoning-guidance directives ("pivot to analysing the current kernel's profile", "prioritise analysing YOUR kernel's profiler output", "if early rounds of KB-inspired strategies don't improve..."). Guidance text is now minimal and descriptive: "*Below: evidence from past optimization runs... You also have your current kernel's full source and profiler metrics from the main task. Use both inputs to form your own plan -- the KB informs your decision, it does not make it for you.*" Made-with: Cursor

`write_task_file` was writing path-valued frontmatter fields verbatim when no `relative_to=` anchor was passed. Callers (`tools.py::tool_generate_tasks`) sometimes pass CWD-relative strings like `outputs/fused_qkv_rope/kernel.py`, sometimes absolute paths -- depending on how the orchestrator was initialised. The downstream reader (`read_task_file`) then resolves the relative string against the *task file's own directory*, producing nonsense paths like `<output>/<kernel>/tasks/round_2/outputs/<kernel>/kernel.py` that don't exist. Visible symptom: the cross-session memory retriever, called from `dispatch.task_file_to_agent_task` for sub-agent injection, gets `kernel_path` pointing at a non-existent file, fails the read silently (OSError -> empty string), and logs `Retriever: target_code=0B bottleneck=unknown`. Code-similarity scoring then returns zero for all entries, so unrelated KB entries get promoted (e.g. `fast_rms_layernorm` 6.50x getting injected into `fused_qkv_rope` sub-agents). Empirically verified against today's MI355X runs: slot1 (`gemm_a16wfp4`) had absolute paths in its task files and 0/28 retrieval calls had target_code=0B; slot2 (`fused_qkv_rope`) had relative paths and 32/37 (86%) retrieval calls had target_code=0B and surfaced wrong-kernel KB entries. Fix: in `write_task_file`, resolve path-valued fields to absolute paths against the writer's CWD before serialising. The read-side resolution in `read_task_file` reduces to a no-op for absolute paths, so the file opens correctly regardless of who later reads it (orchestrator sub-agent dispatch, parallel agent worker, standalone CLI). Smoke-tested end-to-end: a relative `outputs/foo/kernel.py` written from CWD `/tmp/.../workspace` is stored as the absolute path and read back correctly from a different CWD (`/tmp`). Behaviour with `relative_to=` set is unchanged. Made-with: Cursor
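Resolving path-valued fields at write time can be sketched as below. The field names in `PATH_FIELDS` and the function itself are hypothetical; the actual set of path-valued frontmatter fields in write_task_file is not listed in the commit message. POSIX-style paths are assumed.

```python
import os
from pathlib import Path

# Hypothetical names for the path-valued frontmatter fields.
PATH_FIELDS = ("kernel_path", "commandment")


def absolutize_path_fields(frontmatter, cwd=None):
    """Return a copy of frontmatter with relative path fields made absolute.

    Relative values are anchored at the writer's working directory, so a
    reader in any other directory opens the same file.
    """
    base = Path(cwd) if cwd is not None else Path.cwd()
    out = dict(frontmatter)
    for key in PATH_FIELDS:
        value = out.get(key)
        if isinstance(value, str) and value and not os.path.isabs(value):
            out[key] = str(base / value)
    return out


resolved = absolutize_path_fields(
    {"kernel_path": "outputs/foo/kernel.py", "commandment": "/abs/COMMANDMENT.md", "round": 2},
    cwd="/tmp/workspace",
)
```

Absolute inputs and non-path fields pass through unchanged, so read-side resolution becomes a no-op.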

update skill to support docs and scripts within skill folder

… postprocessor

- Replace site.main() with sys.path.insert() in mini.py for more targeted path refresh after rag-mcp auto-install (PR AMD-AGI#90 review followup)
- Pass api_key from agent model config to RAG postprocessor so it can use yaml-configured api_key instead of relying solely on env vars (Issue AMD-AGI#169)

…-install Made-with: Cursor # Conflicts: # README.md # mcp_tools/README.md # pyproject.toml

…ocess fix(preprocess): use full Metrix profile for baseline runs

…tall fix(packaging): make full extras pip-installable again

…-rag-integration feat(memory): enhance cross-session memory retrieval + RAG

update readme

Removed the unnecessary top-level knowledge-base directory copy that causes docker failure.

Update Dockerfile

Removed from install / install-full / install-dev. Replaced with an opt-in 'make index' target. mini.py already lazy-builds on first RAG use when tools.rag is enabled (off by default), so the eager run only slowed every 'make install' (and docker build) by several minutes without benefit. Made-with: Cursor

…efore-install fix(docker): copy scripts/ before make install for RAG index build

Made-with: Cursor

fix: replace site.main() with sys.path.insert and pass api_key to RAG postprocessor

fix(docker): include scripts/ in image so make index works at runtime

Co-authored-by: Claude <noreply@anthropic.com>

yueliu14 and others added 30 commits April 3, 2026 07:31

fix examples/mla_decode import aiter error

0aa6ec1

style: fix ruff import sorting in tools_runtime.py

474b1e7

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Merge pull request AMD-AGI#101 from AMD-AGI/fix/postprocess-evaluatio…

c2b3e41

…n-cleanup Fix/postprocess evaluation cleanup

style: apply ruff formatting to preprocess modules

09a1397

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merge pull request AMD-AGI#98 from AMD-AGI/fix/preprocess-harness-int…

99f2c27

…egrity-main-20260401 Fix/preprocess harness integrity main 20260401

style: fix ruff formatting in working_memory.py

78f7907

Add missing blank line after inline import per ruff E303. Made-with: Cursor

Merge pull request AMD-AGI#108 from AMD-AGI/fix/no-hang-on-geak-help

c6796a1

fix: prevent geak --help from hanging by deferring MCP server startup

fix(models): remove duplicate set_tools in AmdLlmModelBase

1145119

Resolves pylint E0102 (function-redefined) from a merge that left two set_tools definitions; keep the implementation that uses filter_tools_for_amd_config. Made-with: Cursor

Merge branch 'main' into geak_v3_db_mcp_integration

dfdca9f

Merge pull request AMD-AGI#103 from AMD-AGI/fix/homogeneous_repo_path

5300098

Fix/homogeneous repo path

Merge pull request AMD-AGI#106 from AMD-AGI/fix/mem-parsing

5c202a0

fix: compute overall_speedup from stored baseline in working memory

iraj465 and others added 28 commits April 20, 2026 07:11

Merge pull request AMD-AGI#161 from AMD-AGI/fix/eval-patch-apply-3way…

9699652

…-fallback fix(postprocess): fallback to 3-way apply when eval worktree lacks parent blob

update skill to support docs and scripts within skill folder

2dbcda6

fix skill readme

fb8ef2b

Merge pull request AMD-AGI#171 from AMD-AGI/main_skill_fix

81c87bd

update skill to support docs and scripts within skill folder

update readme

72ad38a

Update geak.yaml

d2f51fa

Merge remote-tracking branch 'origin/main' into diagnose/mcp-pip-full…

2fc7df9

…-install Made-with: Cursor # Conflicts: # README.md # mcp_tools/README.md # pyproject.toml

Merge pull request AMD-AGI#146 from AMD-AGI/upandey/metrix-full-prepr…

7244453

…ocess fix(preprocess): use full Metrix profile for baseline runs

Merge pull request AMD-AGI#141 from AMD-AGI/diagnose/mcp-pip-full-ins…

2e764bd

…tall fix(packaging): make full extras pip-installable again

Merge pull request AMD-AGI#166 from AMD-AGI/feat/cross-session-memory…

97891d3

…-rag-integration feat(memory): enhance cross-session memory retrieval + RAG

Merge branch 'main' into doc/readme_framework

0c41fc4

Merge pull request AMD-AGI#174 from AMD-AGI/doc/readme_framework

d7b880c

update readme

Update Dockerfile

4f0846b

Removed the unnecessary top-level knowledge-base directory copy that causes docker failure.

Merge pull request AMD-AGI#177 from AMD-AGI/fix/docker-build

6606f3a

Update Dockerfile

Merge pull request AMD-AGI#180 from AMD-AGI/fix/docker-copy-scripts-b…

8980d2a

…efore-install fix(docker): copy scripts/ before make install for RAG index build

fix(docker): include scripts/ in image so make index works at runtime

b16c219

Made-with: Cursor

Merge pull request AMD-AGI#173 from AMD-AGI/fix/pr90-review-followup

d54a5cc

fix: replace site.main() with sys.path.insert and pass api_key to RAG postprocessor

Merge pull request AMD-AGI#183 from AMD-AGI/fix/docker-include-scripts

393b7af

fix(docker): include scripts/ in image so make index works at runtime

feat(worktree): symlink gitignored build artifacts into worktrees

55fd68f

Co-authored-by: Claude <noreply@anthropic.com>

Umangatamd force-pushed the main branch from 70dd063 to 5ef30d0 Compare May 4, 2026 06:43

yueliu14 self-assigned this May 26, 2026

fxmarty-amd wants to merge 445 commits into AMD-AGI:main from fxmarty-amd:felmarty/symlink-gitignored-files-in-worktree


9 participants
