[AMD] feat(agentic): AgentX v0.3 — Kimi MI355X LMCache MP benchmark#1565
seungrokj wants to merge 11 commits into
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 3fa8c2b.
Additional findings (outside current diff — PR may have been updated during review):
- 🟡 `utils/generate_aiperf_plots.py:691-709`: The two parallel arrays built at `utils/generate_aiperf_plots.py:691-709` use mismatched filters: `starts_ns` skips records where `request_start_ns` is missing or falsy, while the per-record loop for `ttfts_ms`/`e2es_ms`/`interactivities` iterates all records unfiltered. If any record is dropped from the start-time filter, `request_times_s` ends up shorter than the metric arrays and `ax.scatter(request_times_s, values)` in `panel_per_record_metric` raises `ValueError: x and y must be the same size`. The caller (`write_agentic_result_json`) masks failures with `|| true`, so `metrics_plots.png` silently fails to render. Either drop the filter or apply the same filter in the per-record loop.
What the bug is
`utils/generate_aiperf_plots.py` (new file added by this PR) builds two parallel arrays in `main()` from the same `records` iterable but applies different filters:

```python
# Lines 691-697: filtered on request_start_ns
starts_ns = [
    int(r["metadata"]["request_start_ns"])
    for r in records
    if r.get("metadata", {}).get("request_start_ns")
]
first_record_start = min(starts_ns) if starts_ns else 0
request_times_s = [(s - first_record_start) / 1e9 for s in starts_ns]

# Lines 700-709: NOT filtered
ttfts_ms: list[float] = []
e2es_ms: list[float] = []
interactivities: list[float] = []
for r in records:
    ttft = metric_value(r, "time_to_first_token")
    ...
    ttfts_ms.append(ttft if ttft is not None else 0.0)
    ...
```
How it manifests
If any record in `records` has a missing/falsy `metadata.request_start_ns` (the truthy `.get(...)` check also drops 0/None), it gets excluded from `starts_ns`/`request_times_s` but is still appended to `ttfts_ms`/`e2es_ms`/`interactivities` (with a 0.0 placeholder). The lists then have different lengths.

Later, `main()` passes them paired to `panel_per_record_metric` (lines 749, 758, 767), which calls `ax.scatter(request_times_s, values, ...)` at line 582. Matplotlib's `scatter` enforces equal-length sequences and raises `ValueError: x and y must be the same size`.

Step-by-step proof

Consider 3 records, one missing `request_start_ns`:

- `records = [R0 (start=1e9), R1 (no start), R2 (start=3e9)]`
- The `starts_ns` loop filters out R1: `[1_000_000_000, 3_000_000_000]` (len=2)
- `request_times_s = [0.0, 2.0]` (len=2)
- The `for r in records` loop visits all 3 records and appends placeholders for R1: `ttfts_ms = [ttft0, 0.0, ttft2]` (len=3)
- `panel_per_record_metric(axes[4,0], request_times_s, ttfts_ms, ...)` calls `ax.scatter([0.0, 2.0], [ttft0, 0.0, ttft2])` → `ValueError: x and y must be the same size`.
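The length mismatch in the proof above can be reproduced with plain lists, without matplotlib. This is only a minimal sketch: the record layout and the `metric_value` stand-in are assumptions modelled on the snippet quoted in this comment, not the script's actual helpers.

```python
# Hypothetical records shaped like R0, R1, R2 in the proof; R1 has no request_start_ns.
records = [
    {"metadata": {"request_start_ns": 1_000_000_000}, "metrics": {"time_to_first_token": 12.0}},
    {"metadata": {}, "metrics": {}},
    {"metadata": {"request_start_ns": 3_000_000_000}, "metrics": {"time_to_first_token": 9.0}},
]


def metric_value(record, name):
    """Assumed stand-in for the script's helper: return the metric or None."""
    return record.get("metrics", {}).get(name)


# Filtered list, mirroring the quoted lines 691-697.
starts_ns = [
    int(r["metadata"]["request_start_ns"])
    for r in records
    if r.get("metadata", {}).get("request_start_ns")
]
first_record_start = min(starts_ns) if starts_ns else 0
request_times_s = [(s - first_record_start) / 1e9 for s in starts_ns]

# Unfiltered list, mirroring the quoted lines 700-709.
ttfts_ms = []
for r in records:
    ttft = metric_value(r, "time_to_first_token")
    ttfts_ms.append(ttft if ttft is not None else 0.0)

# The x values cover 2 records while the y values cover 3.
print(len(request_times_s), len(ttfts_ms))
```

Passing these two lists to `ax.scatter` is what triggers the `ValueError` described above.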
What the impact would be
The exception propagates out of `main()` and the process exits non-zero. `benchmark_lib.sh`'s `write_agentic_result_json` invokes the script with `|| true`:

```bash
python3 "$INFMAX_CONTAINER_WORKSPACE/utils/generate_aiperf_plots.py" "$result_dir" 2>&1 || true
```

so the launcher does NOT fail: `metrics_plots.png` just silently disappears from the artifact tarball with no warning in CI logs. This is precisely the silent-failure mode the bug finder flagged.

Why this might be dead code in practice

`load_jsonl_records` at line 67 already drops records with `obj['error']` set, and successful aiperf records always populate `metadata.request_start_ns`. So under healthy runs the filter is dead code and the mismatch can't actually trigger. The defensive `.get()` chain itself suggests the author thought the field could be missing, but if it really can't, the filter should be dropped.

How to fix
Either drop the filter on `starts_ns` (treating `request_start_ns` as a hard-required key, which raises loudly if it is missing instead of silently dropping the record):

```python
starts_ns = [int(r["metadata"]["request_start_ns"]) for r in records]
```

OR mirror the same filter into the per-record loop:

```python
for r in records:
    if not r.get("metadata", {}).get("request_start_ns"):
        continue
    ...
```

Both fixes are mechanical one-liners. The two pieces of code currently cannot both be correct simultaneously.
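A variant of the second option is to build the x and y values in a single pass, so a record is either present in every list or in none of them. The sketch below is illustrative only: `build_aligned_series` and the `metric_value` stand-in are hypothetical names, and the record layout is assumed from the snippet quoted earlier in this comment.

```python
from typing import Any, Optional


def metric_value(record: dict[str, Any], name: str) -> Optional[float]:
    """Hypothetical stand-in for the script's helper: a metric value or None."""
    return record.get("metrics", {}).get(name)


def build_aligned_series(records: list[dict[str, Any]]) -> tuple[list[float], list[float]]:
    """Return (request_times_s, ttfts_ms) derived from the same filtered records."""
    # Apply the start-time filter once, then derive every series from `kept`.
    kept = [r for r in records if r.get("metadata", {}).get("request_start_ns")]
    if not kept:
        return [], []
    first_start = min(int(r["metadata"]["request_start_ns"]) for r in kept)
    times_s: list[float] = []
    ttfts_ms: list[float] = []
    for r in kept:
        times_s.append((int(r["metadata"]["request_start_ns"]) - first_start) / 1e9)
        ttft = metric_value(r, "time_to_first_token")
        ttfts_ms.append(ttft if ttft is not None else 0.0)
    return times_s, ttfts_ms


# Hypothetical input: the middle record has no request_start_ns.
records = [
    {"metadata": {"request_start_ns": 1_000_000_000}, "metrics": {"time_to_first_token": 12.0}},
    {"metadata": {}, "metrics": {}},
    {"metadata": {"request_start_ns": 3_000_000_000}, "metrics": {"time_to_first_token": 9.0}},
]
times_s, ttfts_ms = build_aligned_series(records)
print(len(times_s) == len(ttfts_ms))
```

Deriving every series from one filtered list means a later change to the filter cannot desynchronize the arrays again.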
- 🔴 `utils/process_agentic_result.py:40`: The test fixture in `utils/test_process_agentic_result.py` still builds the fake HF cache directory using the old dataset name (`cc-traces-weka-042026`), but this PR updated `_HF_DATASET` in `utils/process_agentic_result.py:40` to `semianalysisai/cc-traces-weka-with-subagents-051926`. As a result, `test_processor_loads_traces_jsonl_for_theoretical_cache` will fail because `_hf_traces_dir()` cannot find the snapshots directory under the new name, leaving `theoretical_cache_hit_rate` as `None` instead of the expected `0.5`. Fix: update the fixture path in `test_process_agentic_result.py` (around line 408) to use `datasets--semianalysisai--cc-traces-weka-with-subagents-051926`.
What the bug is
This PR renamed the agentic-replay dataset constant in `utils/process_agentic_result.py:40` from `semianalysisai/cc-traces-weka-042026` to `semianalysisai/cc-traces-weka-with-subagents-051926`. The dataset name flows through to `_hf_traces_dir()` (`utils/process_agentic_result.py:133-134`), which splits `_HF_DATASET` on `/` and looks for:

```
$HF_HUB_CACHE/datasets--semianalysisai--cc-traces-weka-with-subagents-051926/snapshots
```

But `test_processor_loads_traces_jsonl_for_theoretical_cache` in `utils/test_process_agentic_result.py` still constructs the fake HF cache at the old path:

```python
snapshot = hf_cache / "datasets--semianalysisai--cc-traces-weka-042026" / "snapshots" / "abc"
snapshot.mkdir(parents=True)
...
with open(snapshot / "traces.jsonl", "w") as f:
    for t in traces:
        f.write(json.dumps(t) + "\n")
```
Step-by-step proof the test will fail
- The test sets `HF_HUB_CACHE` to a directory containing only `datasets--semianalysisai--cc-traces-weka-042026/snapshots/abc/traces.jsonl`.
- The subprocess runs `process_agentic_result.py`, which sees the new `_HF_DATASET = "semianalysisai/cc-traces-weka-with-subagents-051926"`.
- `_hf_traces_dir()` computes `org, name = _HF_DATASET.split("/")` → `("semianalysisai", "cc-traces-weka-with-subagents-051926")` and globs `cache_root/"datasets--semianalysisai--cc-traces-weka-with-subagents-051926"/"snapshots"`.
- That directory does not exist in the fixture cache (only the `-042026` directory does), so `_hf_traces_dir()` returns `None`.
- `_TRACE_METADATA_CACHE` is never populated, so per-request hash_ids and trace output_lengths are never attached.
- `theoretical_cache_hit_rate` is computed as `None` instead of the expected `0.5`.
- The assertion `assert agg["theoretical_cache_hit_rate"] == pytest.approx(0.5)` fails. The companion assertion `assert agg["mean_output_tokens_expected"] == pytest.approx((50 + 60 + 55 + 40 + 70) / 5)` will also fail for the same root cause (no metadata loaded → no `output_tokens_expected` aggregated).
Why existing code doesn't prevent it
The two strings are not deduplicated: the production constant lives in the module under test, and the fixture rebuilds the cache directory name by hand. There's no shared constant or fixture builder that would force them to stay in sync. The PR author touched this test module (changing docstrings/paths from `trace_replay/` to `aiperf_artifacts/`) but missed updating the dataset directory name.

Impact

`test_processor_loads_traces_jsonl_for_theoretical_cache` will fail in CI on every run until fixed. The processor's `theoretical_cache_hit_rate`/`mean_output_tokens_expected` codepath is no longer exercised, hiding any future regressions in trace-metadata loading.

Fix
One-line change: update the fixture path string in `utils/test_process_agentic_result.py` to use the new directory name (and ideally extract `_HF_DATASET` into a shared constant or reference `process_agentic_result._HF_DATASET` from the test so this can't drift again). The minimal patch:

```python
snapshot = hf_cache / "datasets--semianalysisai--cc-traces-weka-with-subagents-051926" / "snapshots" / "abc"
```

Summary
AgentX v0.3 agentic benchmark for Kimi-K2.5-MXFP4 on MI355X using LMCache MP (multi-process) CPU offloading. Key changes on top of main:
- `feat(agentic): add LMCache MP for Kimi MI355X`: wires up the external LMCache server path for ROCm
- `fix(agentic): avoid CUDA NIXL import on MI355X LMCache`: ROCm-safe import guard
- `fix(agentic): lazily patch ROCm LMCache allocator` / `defer ROCm LMCache pinned expansion`: avoids 10-min startup stall for 2.5 TB pinned pool
- `fix(agentic): add ROCm LMCache MP block fallback`: Python fallback for `multi_layer_block_kv_transfer` (no CUDA kernel on ROCm)
- `fix(agentic): avoid partial LMCache import patching`: ensures all ROCm patches apply atomically
- `fix(agentic): extend Kimi MI355X LMCache read lease` / `use final LMCache capacity on ROCm`
- `fix(agentic): normalize/filter Kimi MI355X replay context`: caps contexts to server window
- `fix(agentic): reduce Kimi FP4 B200 CPU DRAM limit to 1500 GB`
- `carry AIPerf prefix metadata`, `refresh AIPerf mmap cache schema`, `update AIPerf replay metadata`

Benchmark Results: Run 26448139220
Environment: vLLM v0.21.0, ROCm 7.2.x, 8× MI355X (288 GB VRAM each)
Model: amd/Kimi-K2.5-MXFP4
Benchmark: AgentX v0.3 agentic trace replay, 1800s duration
Offloading: LMCache MP (2.5 TB CPU DRAM pool)
offloadlmcache — all cases (FAILED/zero-request excluded)
Notes:
amds_00, amds_04) and TP8/conc40 (amds_07)
🤖 Generated with Claude Code
Note
Medium Risk
Touches agentic replay, LMCache offloading, and large pinned-memory behavior on ROCm; mis-patches could affect startup time or KV correctness, but scope is benchmark/inference paths rather than core auth or payments.
Overview
Adds AgentX v0.3 agentic benchmarking for Kimi-K2.5-MXFP4 on MI355X with LMCache multi-process CPU offloading, including the external LMCache server path on ROCm.
ROCm-specific fixes avoid CUDA/NIXL imports, lazily patch the allocator and defer setup of the large pinned CPU pool, apply LMCache import patches atomically, add a Python fallback for `multi_layer_block_kv_transfer`, tune read leases and final cache capacity, and normalize/filter replay contexts to the server window. Also lowers the Kimi FP4 B200 CPU DRAM cap to 1500 GB and updates AIPerf prefix/replay metadata and mmap cache schema.

Reviewed by Cursor Bugbot for commit 0a5d493. Bugbot is set up for automated code reviews on this repo.