
explore(agent-wiki): trajectory-derived wiki — skills, builder, experiments#268

Open
vinodmut wants to merge 4 commits into AgentToolkit:main from vinodmut:explorations/agent-wiki

vinodmut commented Jun 10, 2026


Related to #256 — this is a prototype of offline trajectory-mining + consolidation ("dreaming"): reviewing saved trajectories to extract, consolidate, deduplicate, and curate memory outside the main task loop, with an auditable record of what changed.

What this is

An exploration in turning agent trajectories into a reusable, evidence-grounded wiki that future agents consult before acting — plus the experiments measuring whether it helps. Everything lives self-contained under explorations/agent-wiki/.

The core idea: after an agent finishes a task, distill its trajectory into wiki pages — episodic summaries, atomic guidelines, themed cluster pages, and executable skills — each linked back to the trajectory that produced it. A future agent, pointed at the wiki's AGENTS.md, retrieves the pages relevant to its task and applies them instead of re-deriving the recipe.
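A minimal sketch of one distilled page in memory, assuming a hypothetical `WikiPage` record with `kind`, `title`, and `source` fields. The real frontmatter keys are defined in docs/schema.md and are not reproduced here; the sketch only shows that every page carries a provenance back-link to its trajectory.

```python
from dataclasses import dataclass


@dataclass
class WikiPage:
    """Hypothetical in-memory form of one distilled wiki page."""

    kind: str    # e.g. "summary", "guideline", "cluster", "skill"
    title: str
    source: str  # provenance back-link to the trajectory that produced the page
    body: str

    def render(self) -> str:
        """Render as Markdown with a frontmatter block (values are not YAML-escaped)."""
        return (
            "---\n"
            f"kind: {self.kind}\n"
            f"title: {self.title}\n"
            f"source: {self.source}\n"
            "---\n\n"
            f"{self.body}\n"
        )


# Illustrative page; the guideline text is invented for the example.
page = WikiPage(
    kind="guideline",
    title="Check EXIF orientation first",
    source="trajectories/<session-id>.json",
    body="Read the Orientation tag before reporting image dimensions.",
)
print(page.render())
```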

How this maps to #256 ("dreaming")

| #256 asks for | provided here |
| --- | --- |
| extract useful memories from raw trajectories after the fact | agent-wiki-summarize / -extract-guidelines / -synthesize-skill (retroactive + batch ingest) |
| consolidate duplicate / overlapping guidelines | agent-wiki-consolidate-guidelines → cluster pages |
| promote repeated observations; detect stale / redundant entities | delete-on-promote (--archive-covered), recall roll-up, priority tiers |
| auditable summary of what changed and why | _audit.log + provenance back-links on every page |
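For the consolidation row, exact duplicates can be caught cheaply before any semantic merging. The sketch below assumes guideline ids are a truncated SHA-256 of case- and whitespace-normalized text; the PR only says the ids are 12-hex content hashes, so the hashing scheme and the `guideline_id` / `group_duplicates` names are hypothetical. Overlapping but differently worded guidelines still need the cluster consolidation the skill performs.

```python
import hashlib
from collections import defaultdict


def guideline_id(text: str) -> str:
    """Hypothetical 12-hex content id: SHA-256 of case/whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


def group_duplicates(guidelines: list[str]) -> dict[str, list[str]]:
    """Bucket guideline texts whose normalized content hashes collide."""
    groups: dict[str, list[str]] = defaultdict(list)
    for text in guidelines:
        groups[guideline_id(text)].append(text)
    return dict(groups)


groups = group_duplicates([
    "Promote a feature mode to a top-level flag",
    "promote a feature mode to a  top-level flag",  # same after normalization
    "Check EXIF orientation first",
])
```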

Layout

explorations/agent-wiki/
├── skills/        7 agent-wiki skills + build_agent_wiki.py (reference copy)
├── docs/          design.md (rationale) + schema.md (on-disk format)
├── experiments/   RESULTS-SUMMARY + comparison reports; metrics/ rollups; harness/ scripts
└── wikis/         worked examples: wiki-twobatch {base, skills, both, pruned}

Headline findings (experiments/RESULTS-SUMMARY.md)

  • Wiki vs no wiki: −20% cost, −38% duration, −43% tool calls at unchanged accuracy (16-task A/B).
  • Skills > guidelines: a skills-only wiki beats a guidelines-only one on cost (−14%) and matches accuracy.
  • Pointer wording is load-bearing: a strong-imperative CLAUDE.md pointer is read 3/3; a soft one 1/3.
  • Composition > size: piling guidelines on top of skills yields the worst of the populated wikis; delete-on-promote (archiving skill-covered atomics) beats it, but skills-only stays cheapest.
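Delete-on-promote can be pictured as a partition of atomic guidelines by skill coverage. A hypothetical sketch, assuming each skill exposes the set of guideline ids it covers and that each archival writes one audit line; the `archive_covered` name and the audit-line format are illustrative, not the skill's actual output.

```python
def archive_covered(
    atomics: dict[str, str], skill_covers: dict[str, set[str]]
) -> tuple[dict[str, str], dict[str, str], list[str]]:
    """Split atomic guidelines into kept vs archived, given which ids each skill covers.

    Returns (kept, archived, audit_lines). The audit line format is hypothetical.
    """
    covered_by: dict[str, str] = {}
    for skill in sorted(skill_covers):
        for gid in sorted(skill_covers[skill]):
            covered_by.setdefault(gid, skill)  # first covering skill (alphabetical) wins
    kept = {g: t for g, t in atomics.items() if g not in covered_by}
    archived = {g: t for g, t in atomics.items() if g in covered_by}
    audit = [f"archive {g} covered-by {covered_by[g]}" for g in sorted(archived)]
    return kept, archived, audit


kept, archived, audit = archive_covered(
    {"474bb2ba1076": "Promote a mode to a flag", "aaaaaaaaaaaa": "Check EXIF first"},
    {"cli-flags": {"474bb2ba1076"}},  # placeholder skill slug
)
```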

Scope / data note

These are benchmark-derived example wikis (a synthetic 16-task file-format corpus). Raw per-trial sandbox transcripts and any wikis built from internal trajectory corpora are intentionally excluded — only metric rollups, narrative reports, and the benchmark-derived wikis are included. Source links in wiki frontmatter are shown in the generic trajectories/<session-id>.json form. The skills are a standalone reference copy, not wired into a plugin loader.
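The source-link genericization can be done with one anchored regex over the frontmatter. A sketch, assuming the provenance path sits on a single `source:` line (hypothetical key) and that the session id is the file's base name; applying it twice is harmless because an already generic path maps to itself.

```python
import re
from pathlib import PurePosixPath

# Hypothetical frontmatter key; docs/schema.md defines the real field names.
SOURCE_LINE = re.compile(r"^(source:[ \t]*)(\S+\.json)[ \t]*$", re.MULTILINE)


def genericize_sources(page_text: str) -> str:
    """Rewrite every `source:` path to trajectories/<file name>, dropping the directory."""

    def repl(m: "re.Match[str]") -> str:
        session_file = PurePosixPath(m.group(2)).name
        return f"{m.group(1)}trajectories/{session_file}"

    return SOURCE_LINE.sub(repl, page_text)


before = "---\nkind: summary\nsource: /home/alice/runs/2026-06/3f2a9c1e.json\n---\n"
after = genericize_sources(before)
```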

Summary by CodeRabbit

  • New Features

    • Introduced agent-wiki exploration: a system for converting agent task trajectories into curated, evidence-grounded wikis containing skills, guidelines, and episode summaries for knowledge reuse and improved task performance.
  • Documentation

    • Added comprehensive design documents, schema references, and usage guides for agent-wiki workflows, including experimental results from multiple wiki retrieval strategies.
  • Chores

    • Updated linting and secret-scanning configurations to exclude generated wiki content from checks.

vinodmut added 2 commits June 10, 2026 00:54
Adds explorations/agent-wiki/ — the agent-wiki skill family, builder, design
+ schema docs, the wiki-helps experiment reports, and benchmark-derived
example wikis, all under one tree suitable for a public PR.

Contents:
  - skills/        7 agent-wiki skills + build_agent_wiki.py (reference copy,
                   not plugin-wired)
  - docs/          design.md + schema.md
  - experiments/   RESULTS-SUMMARY + twobatch comparison reports +
                   pruned-index-hypothesis; metrics/ rollups (no raw
                   transcripts); harness/ runner + compare scripts
  - wikis/         wiki-terminalbench-bob + the twobatch arms
                   (base / skills / both / pruned-corrected)

Public-safety scrub:
  - Excluded all raw per-trial sandbox transcripts (kept only metric
    rollups + narrative reports).
  - Excluded wikis built from internal corpora (procedural-design,
    consult-meta, iterative, retroactive, simple-claude, test-paired,
    claude) and the build-pattern comparison that ran on them; §3-4 of
    RESULTS-SUMMARY reduced to a portable-finding note.
  - Rewrote all source-path frontmatter to the generic
    trajectories/<session-id>.json form; genericized internal example
    names and the benchmark-data dir convention in skills/docs.
  - Leak gate (benchmark-data / internal corpus + wiki names / org paths)
    passes with zero hits across the tree.
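A leak gate of this kind reduces to a deny-list scan over every text file in the tree. The patterns below are placeholders (the commit does not list the real ones), and `scan_text` / `leak_gate` are hypothetical names; an empty result corresponds to "zero hits".

```python
import re
from pathlib import Path

# Placeholder deny-list; the real gate's patterns are not published in the PR.
DENY_PATTERNS = [
    re.compile(r"benchmark-data"),
    re.compile(r"/home/[^/\s]+/"),        # absolute paths under a user's home
    re.compile(r"internal-corpus-name"),  # stand-in for internal corpus/wiki names
]


def scan_text(name: str, text: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, pattern) for every deny-list match in one file."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pat in DENY_PATTERNS:
            if pat.search(line):
                hits.append((name, lineno, pat.pattern))
    return hits


def leak_gate(root: Path) -> list[tuple[str, int, str]]:
    """Scan every UTF-8 text file under root; an empty list means the gate passes."""
    hits: list[tuple[str, int, str]] = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip binary files
        hits.extend(scan_text(path.relative_to(root).as_posix(), text))
    return hits
```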

Branched off main; diff touches only explorations/agent-wiki/. Builder
catalog + comparison scripts verified runnable from the new location.

Removes the terminal-bench example wiki from the exploration. Repoints the
README reading-order + layout to wiki-twobatch-skills, fixes the docs that
attributed worked examples to it (schema.md now points at the wiki-twobatch
arms; example index rows retagged), and corrects stale relative links the
docs carried from the original tree (../plugin-source → ../skills,
../WIKIS.md removed, ../experiments/wiki-build-comparison.md → RESULTS-SUMMARY
§3–4, design.md/schema.md cross-links to renamed filenames). Skill example
paths (consult, ingest) repointed off the removed wiki.

Remaining wikis: wiki-twobatch {base, skills, both, pruned}. All intra-doc
relative links resolve; leak gate clean.

coderabbitai Bot commented Jun 10, 2026


📝 Walkthrough


This PR introduces the agent-wiki exploration framework: a system for distilling agent trajectories into a reusable, evidence-grounded knowledge base comprising episodic summaries, atomic guidelines, consolidated clusters, executable skills, and cross-task comparisons. It includes comprehensive design/schema documentation, multi-arm experimental validation across 17 tasks and 8 experiments, and operational skills guiding end-to-end ingestion and querying workflows, plus configuration updates excluding the exploration from lint/scan tooling.

Changes

Agent-Wiki Framework Design & Schema

| Layer | File(s) | Summary |
| --- | --- | --- |
| Framework Overview & Rationale | `explorations/agent-wiki/README.md` | High-level introduction explaining the exploration goal (post-task distillation into linked pages), repository layout (skills/docs/experiments/wikis), recommended reading order, and scope constraints (benchmark-derived synthetic corpus, no raw transcripts). |
| Design Principles & Architecture | `explorations/agent-wiki/docs/design.md` | Specifies mandatory provenance, five page kinds with retrieval preference order, guidance on procedural vs declarative structure, consolidation and delete-on-promote lifecycle, recall-time discipline, end-to-end build pipeline with execution/ordering constraints between passes, and experimental evidence summary with cross-experiment findings. |
| On-Disk Schema & Contracts | `explorations/agent-wiki/docs/schema.md` | Complete reference documenting directory layout, filename suffixes, YAML frontmatter for each page kind, index/config/audit artifact structures, linking rules (forward authored, reverse recomputed), promotion/archival lifecycle mechanics with worked examples, and coverage inference for skill-driven archival. |
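The "forward authored, reverse recomputed" linking rule means reverse links never need hand-editing: they are an inversion of the forward links. A minimal sketch, assuming forward links are available as a page → targets mapping (an illustrative representation, not the on-disk format):

```python
from collections import defaultdict


def recompute_backlinks(forward: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert authored forward links into reverse links (page -> pages linking to it)."""
    reverse: dict[str, set[str]] = defaultdict(set)
    for src, targets in forward.items():
        for dst in targets:
            reverse[dst].add(src)
    # Sorted output keeps regenerated files stable across rebuilds.
    return {page: sorted(srcs) for page, srcs in sorted(reverse.items())}


backlinks = recompute_backlinks({"g1": ["s1"], "g2": ["s1", "c1"], "c1": []})
```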

Experimental Validation & Result Analysis

| Layer | File(s) | Summary |
| --- | --- | --- |
| Experiment Harness & Task Suite | `explorations/agent-wiki/experiments/harness/experiment_wiki_consult.py`, `wiki_consult_tasks.yaml` | Main orchestrator creating per-trial workspaces, executing Dockerized agent sessions against wikis, parsing stream-json output to detect AGENTS.md and guideline access, and computing outcome matches. Includes 17 prompt-driven tasks covering EXIF/image metadata, codebase exploration, format parsing, and archival/audio analysis, all with explicit outcome matching rules. |
| Trial Metrics Extraction & Transcript Normalization | `explorations/agent-wiki/experiments/harness/extract_trial_metrics.py`, `normalize_stream_json_transcripts.py` | Extracts per-trial metrics (token usage, tool calls, wiki reads) from stream-json transcripts and normalizes diverse transcript formats into OpenAI chat-completion style JSON for downstream analysis. |
| Result Aggregation & Comparison Reporting | `explorations/agent-wiki/experiments/harness/twobatch_compare.py`, `threeway_compare.py`, `fourway_compare.py`, `fiveway_compare.py` | Modular scripts reading JSONL metrics, grouping by task/arm, computing statistics (accuracy, cost, tokens), and rendering Markdown comparison reports with per-family and per-task breakdowns. |
| Experimental Findings & Reports | `explorations/agent-wiki/experiments/RESULTS-SUMMARY.md`, `twobatch-comparison.md`, `twobatch-skills-comparison.md`, `twobatch-fourway-comparison.md`, `twobatch-fiveway-comparison.md`, `pruned-index-hypothesis.md`, `explorations/agent-wiki/experiments/metrics/*` | Comprehensive markdown reports documenting eight multi-arm experiments comparing empty wiki, guidelines-only, skills-only, combined ("both"), and pruned (delete-on-promote) arms. Includes consolidated findings summary, per-experiment write-ups with aggregate/family/per-task tables, metric JSONL data (48–95 records per file), and root-cause analysis of the index-staleness confound. |
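The transcript-parsing step can be sketched as a single pass over the JSONL stream. This assumes Claude Code-style stream-json events in which assistant messages carry `tool_use` content blocks with a `name` and, for reads, an `input.file_path`; the function and metric names are hypothetical and may not match extract_trial_metrics.py.

```python
import json
from collections.abc import Iterable


def trial_metrics(stream_lines: Iterable[str]) -> dict[str, object]:
    """Count tool calls and wiki reads in a stream-json transcript (assumed event shape)."""
    tool_calls = 0
    read_paths: list[str] = []
    for raw in stream_lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # tolerate truncated or non-JSON lines
        if not isinstance(event, dict) or event.get("type") != "assistant":
            continue
        for block in event.get("message", {}).get("content", []):
            if not isinstance(block, dict) or block.get("type") != "tool_use":
                continue
            tool_calls += 1
            path = block.get("input", {}).get("file_path")
            if block.get("name") == "Read" and isinstance(path, str):
                read_paths.append(path)
    return {
        "tool_calls": tool_calls,
        "read_agents_md": any(p.rsplit("/", 1)[-1] == "AGENTS.md" for p in read_paths),
        "guideline_reads": sum("/guidelines/" in p for p in read_paths),
    }
```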

Operational Skills & Configuration

| Layer | File(s) | Summary |
| --- | --- | --- |
| Wiki Consultation Skill | `explorations/agent-wiki/skills/agent-wiki-consult/SKILL.md` | Defines how agents consult the wiki: resolving wiki root, reading AGENTS.md navigation contract and `_index.jsonl` index, applying retrieval recipes, and presenting 2–5 ranked guideline matches with rationale. |
| Guideline Extraction & Skill Synthesis | `explorations/agent-wiki/skills/agent-wiki-extract-guidelines/SKILL.md`, `agent-wiki-synthesize-skill/SKILL.md` | Workflows for extracting reusable guideline entities from trajectories (with JSON schema for error/root-cause identification and artifact capture) and promoting successful workflows into executable skill pages (with JSON schema for skill definition, validation, and file/index/audit updates). |
| Summarization, Consolidation & Task Comparison | `explorations/agent-wiki/skills/agent-wiki-summarize/SKILL.md`, `agent-wiki-consolidate-guidelines/SKILL.md`, `agent-wiki-tasks/SKILL.md`, `agent-wiki-ingest/SKILL.md` | Four interconnected skills: summarizing trajectories into episodic pages with recalled-guideline status, consolidating atomic guidelines into themed clusters, comparing cross-trajectory task families with evidence-grounded findings and subtask decomposition, and orchestrating the full pipeline from raw traces through normalization, per-trace processing, consolidation, and catalog refresh. |
| Configuration Defaults & Templates | `explorations/agent-wiki/skills/scripts/_default_agent_wiki_config.yaml`, `_default_agents.md` | Default YAML configuration schema (tags, clusters, tasks, session overrides) and template AGENTS.md contract specifying wiki consultation timing, directory/suffix conventions, retrieval algorithms, provenance chaining, and bootstrap/wrapper behavior. |
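Consultation ends in 2–5 ranked guideline matches. One plausible ranking, sketched under the assumption that each `_index.jsonl` line has `id` and `tags` keys (hypothetical field names) and that relevance is plain tag overlap; the SKILL.md describes retrieval recipes, so the actual skill may rank quite differently.

```python
import json
from collections.abc import Iterable


def rank_guidelines(index_lines: Iterable[str], task_tags: Iterable[str], k: int = 5) -> list[str]:
    """Return up to k guideline ids from _index.jsonl, ranked by tag overlap with the task."""
    wanted = set(task_tags)
    scored: list[tuple[int, str]] = []
    for raw in index_lines:
        raw = raw.strip()
        if not raw:
            continue
        entry = json.loads(raw)
        overlap = len(wanted & set(entry.get("tags", [])))
        if overlap:
            scored.append((overlap, entry["id"]))
    scored.sort(key=lambda t: (-t[0], t[1]))  # most overlap first, id as tie-break
    return [gid for _, gid in scored[:k]]


index = [
    '{"id": "474bb2ba1076", "tags": ["cli", "ux", "workspace"]}',
    '{"id": "0a1b2c3d4e5f", "tags": ["exif"]}',
    '{"id": "9f8e7d6c5b4a", "tags": ["cli"]}',
]
print(rank_guidelines(index, ["cli", "ux"]))
```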

Repository Tooling & Configuration

| Layer | File(s) | Summary |
| --- | --- | --- |
| Repository Configuration Updates | `.pre-commit-config.yaml`, `.secrets.baseline`, `pyproject.toml` | Excludes explorations/agent-wiki/ from detect-secrets scanning (with comments explaining that example IDs trigger high-entropy detection), updates .secrets.baseline with adjusted line numbers and file exclusions, and updates Ruff/MyPy in pyproject.toml to exclude generated example-wiki content from linting and type checking. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • visahak
  • gaodan-fang

Poem

🐰 A wiki grows from wandering paths,
Each agent's journey now distilled in haste.
Guidelines gleam, and skills take flight—
One bunny's trace becomes many's light! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 17.14% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately captures the main addition: a new exploration introducing agent-wiki with skills, builder, and experiments. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


vinodmut marked this pull request as ready for review June 10, 2026 06:01

CI (ruff, mypy, detect-secrets) was scanning explorations/agent-wiki/ as
project source — the first content under explorations/ to carry .py files
and high-entropy identifiers. Fixes, scoped so generated example artifacts
are treated like the already-excluded plugin-source/ and examples/ trees:

- ruff: lint + format fixes in the harness scripts + builder; exclude the
  generated wiki scripts (explorations/agent-wiki/wikis/) via extend-exclude.
- mypy: add explorations/agent-wiki/wikis/ to exclude; add file-local
  `# mypy: ignore-errors` to the exploration harness + the builder (a
  verbatim copy of the mypy-excluded plugin-source/ original).
- detect-secrets: exclude explorations/agent-wiki/ in the pre-commit hook
  and .secrets.baseline — the 53 findings are 12-hex guideline content
  hashes and session-id UUIDs, not secrets.

No example-wiki content changed (scripts keep their original names).
Fixes failing CI checks: check-formatting, check-linting, check-typing,
tekton/pr-code-checks/code-detect-secrets.

Drops explorations/agent-wiki/wikis/ (253 generated files, ~10k lines) from
this PR so the diff is the reviewable surface: skills, builder, docs, and
the experiment reports/harness (~34 files). The example wikis are
machine-generated output; bundling them buried the code and appears to have made
CodeRabbit skip deep review (summary only, zero inline findings).

The wikis land in a stacked follow-up PR. README/docs still reference
wikis/wiki-twobatch-* by path; those links resolve once the follow-up
merges. Root-config excludes (ruff/mypy/detect-secrets) are kept — the
detect-secrets exclude still covers example content hashes in docs/schema.md,
and the wiki excludes become live again when the follow-up lands.
vinodmut (Contributor, Author) commented:

Split the generated example wikis into a companion PR #269 (merge after this one) so this diff stays focused on the reviewable code — builder, skills, docs, and experiment harness (34 files vs the original 287). This should let CodeRabbit review the code properly.

vinodmut (Contributor, Author) commented:

@coderabbitai review


coderabbitai Bot commented Jun 10, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai Bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (13)
explorations/agent-wiki/skills/agent-wiki-synthesize-skill/SKILL.md (1)

205-213: ⚡ Quick win

Add language specifier to fenced code block.

The directory structure example should use text or similar language identifier for consistency.

📝 Suggested fix
-```
+```text
 <wiki>/skills/
 ├── _id_index.json                     skill slug → relpath
 ├── index.md                           alphabetical listing (auto-generated)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@explorations/agent-wiki/skills/agent-wiki-synthesize-skill/SKILL.md` around
lines 205 - 213, Update the fenced code block in SKILL.md that shows the
directory tree for "<wiki>/skills/" to include a language specifier (e.g.,
change the opening ``` to ```text) so the block is marked as plain text; locate
the block in the SKILL.md content that begins with the three backticks followed
by the tree and replace the opening fence accordingly to ensure consistent
formatting.

Source: Linters/SAST tools

explorations/agent-wiki/skills/agent-wiki-consult/SKILL.md (2)

53-55: ⚡ Quick win

Add language specifier to fenced code block.

The code block should specify bash as the language for proper syntax highlighting and consistency with the rest of the documentation.

📝 Suggested fix
-```
+```bash
 Read <wiki-root>/AGENTS.md
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@explorations/agent-wiki/skills/agent-wiki-consult/SKILL.md` around lines 53 -
55, Update the fenced code block containing `Read <wiki-root>/AGENTS.md` in
SKILL.md to include a language specifier; specifically, add `bash` to the
opening triple-backtick fence so the block is marked as a bash snippet, which
enables Bash syntax highlighting and keeps formatting consistent with other docs.


</details>


_Source: Linters/SAST tools_

---

`72-74`: _⚡ Quick win_

**Add language specifier to fenced code block.**

The code block should specify `bash` as the language for proper syntax highlighting and consistency.





<details>
<summary>📝 Suggested fix</summary>

````diff
-```
+```bash
 Read <wiki-root>/_index.jsonl
 ```
````
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@explorations/agent-wiki/skills/agent-wiki-consult/SKILL.md` around lines 72 -
74, Add the language specifier "bash" to the fenced code block that contains the
line `Read <wiki-root>/_index.jsonl` in SKILL.md so the block reads as a bash
snippet; locate the triple-backtick fence surrounding that line and add `bash`
to its opening fence to enable proper syntax highlighting and consistency.


</details>


_Source: Linters/SAST tools_

</blockquote></details>
<details>
<summary>explorations/agent-wiki/skills/agent-wiki-tasks/SKILL.md (1)</summary><blockquote>

`43-45`: _⚡ Quick win_

**Add language specifier to fenced code block.**

The code block should specify `bash` as the language for consistency.





<details>
<summary>📝 Suggested fix</summary>

````diff
-```
+```bash
 Read /tmp/summaries.json
 ```
````
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@explorations/agent-wiki/skills/agent-wiki-tasks/SKILL.md` around lines 43 -
45, The fenced code block containing the line "Read /tmp/summaries.json" is
missing a language specifier; update the markdown in SKILL.md by adding `bash`
to the opening fence so the block reads as a bash code block (i.e., tag the
opening fence with bash and keep the closing fence unchanged), ensuring
consistency with other fenced blocks.


</details>


_Source: Linters/SAST tools_

</blockquote></details>
<details>
<summary>explorations/agent-wiki/skills/agent-wiki-consolidate-guidelines/SKILL.md (1)</summary><blockquote>

`43-45`: _⚡ Quick win_

**Add language specifier to fenced code block.**

The code block should specify `bash` as the language for consistency.





<details>
<summary>📝 Suggested fix</summary>

````diff
-```
+```bash
 Read /tmp/guidelines.json
 ```
````
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@explorations/agent-wiki/skills/agent-wiki-consolidate-guidelines/SKILL.md`
around lines 43 - 45, The fenced code block containing the line "Read
/tmp/guidelines.json" in SKILL.md should include a language specifier; update
that code fence to use "bash" (i.e., add bash to the opening fence) so the
block reads as a bash snippet for consistency and proper syntax highlighting.


</details>


_Source: Linters/SAST tools_

</blockquote></details>
<details>
<summary>explorations/agent-wiki/skills/agent-wiki-ingest/SKILL.md (1)</summary><blockquote>

`26-35`: _⚡ Quick win_

**Add language specifier to fenced code block.**

The pipeline diagram should use `text` or similar language identifier for consistency.





<details>
<summary>📝 Suggested fix</summary>

```diff
-```
+```text
 0.  Convert    raw bob / claude traces → normalized analysis JSON   (skip if already normalized)
 1.  Bootstrap  create wiki scaffold + seed catalog                  (skip if wiki exists)
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@explorations/agent-wiki/skills/agent-wiki-ingest/SKILL.md` around lines 26 -
35, The fenced code block that lists the pipeline steps (starting with "0.
Convert raw bob / claude traces → normalized analysis JSON") in SKILL.md is
missing a language specifier; add `text` (or another plain-text identifier) to
the opening fence so the block is marked as plain text and preserves
formatting/consistency across renderers.


</details>


_Source: Linters/SAST tools_

</blockquote></details>
<details>
<summary>explorations/agent-wiki/skills/scripts/_default_agents.md (3)</summary><blockquote>

`35-55`: _💤 Low value_

**Add language specifier to fenced code block.**

The directory structure code block should specify a language (e.g., `text`) to satisfy markdown linting best practices.





<details>
<summary>📝 Proposed fix</summary>

```diff
-```
+```text
 <wiki-root>/
 ├── AGENTS.md          ← this file
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@explorations/agent-wiki/skills/scripts/_default_agents.md` around lines 35 -
55, The fenced directory-structure block at the top of AGENTS.md is missing a
language tag; add `text` to the opening fence so the block is marked as plain
text (modify the code block that begins with the tree under `<wiki-root>/` in
AGENTS.md).


</details>


_Source: Linters/SAST tools_

---

`123-131`: _💤 Low value_

**Add language specifier to fenced code block.**

The provenance chain code block should specify a language (e.g., `text`) to satisfy markdown linting best practices.





<details>
<summary>📝 Proposed fix</summary>

```diff
-```
+```text
 guideline.md
   ↓ frontmatter `related_summary:`
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@explorations/agent-wiki/skills/scripts/_default_agents.md` around lines 123 -
131, The fenced provenance chain block in `_default_agents.md` lacks a language
specifier, which fails markdown linting; update the opening triple-backtick for
that block to include a language (for example, add `text`) so the block becomes
a labeled text code fence, ensuring the block around the lines starting with
"guideline.md" and the following arrows is annotated (look for the
triple-backtick that opens that specific provenance chain block).


</details>


_Source: Linters/SAST tools_

---

`151-152`: _⚡ Quick win_

**Clarify the guideline reference format in the example.**

Line 152 shows `474bb2ba1076` as a guideline reference, but according to the structure at line 46, atomic guideline files follow the pattern `<slug>__<gid>.md`. The example should either show the full filename or clarify that this is just the ID portion from the `_index.jsonl`.





<details>
<summary>📝 Suggested clarification</summary>

```diff
-   - `474bb2ba1076` "Promote a feature mode to a top-level flag, not an
-     option" (atomic; tags include `cli`, `ux`, `workspace`).
+   - `474bb2ba1076` (id) "Promote a feature mode to a top-level flag, not an
+     option" (atomic; tags include `cli`, `ux`, `workspace`; 
+     file: `guidelines/promote-mode-to-flag__474bb2ba1076.md`).
```

Or simply add a note that IDs shown are from `_index.jsonl` and map to full filenames via the pattern.
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@explorations/agent-wiki/skills/scripts/_default_agents.md` around lines 151 -
152, Update the example in _default_agents.md so the guideline reference is
unambiguous: replace or augment the bare ID `474bb2ba1076` with the full atomic
filename pattern `<slug>__474bb2ba1076.md` (or show the actual example full
filename), or add a short note stating that IDs shown (like `474bb2ba1076`) are
the gid field from `_index.jsonl` and map to atomic files via the
`<slug>__<gid>.md` convention; ensure this change is applied where the example
appears so readers can map the ID to the full guideline filename.
```

</details>
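
To make the id-to-file mapping concrete, here is a sketch assuming the `<slug>__<gid>.md` convention cited above, with lowercase-hyphen slugs and a 12-hex gid. The regex and helper names are illustrative; the slug is the one from the suggested clarification.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Assumed shape of atomic guideline files: <slug>__<gid>.md with a 12-hex gid.
ATOMIC_NAME = re.compile(r"^(?P<slug>[a-z0-9-]+)__(?P<gid>[0-9a-f]{12})\.md$")


def atomic_filename(slug: str, gid: str) -> str:
    """Build the atomic guideline file name for an _index.jsonl id."""
    return f"{slug}__{gid}.md"


def parse_atomic_filename(path: str) -> Optional[tuple[str, str]]:
    """Split a guideline path into (slug, gid), or return None if it does not match."""
    m = ATOMIC_NAME.match(PurePosixPath(path).name)
    return (m["slug"], m["gid"]) if m else None
```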


</blockquote></details>
<details>
<summary>explorations/agent-wiki/experiments/harness/twobatch_compare.py (2)</summary><blockquote>

`232-249`: _💤 Low value_

**Move `TASK_IDS_ORDER` constant to module top for conventional readability.**

The constant is defined at line 232 but used in `main()` at line 192. While Python's lazy evaluation makes this work, conventional practice places module-level constants immediately after imports. This improves readability and makes it clear which values are configurable.





<details>
<summary>Suggested placement</summary>

Move lines 232-249 to appear immediately after line 21 (after `REPO` definition and before `FAMILY`).

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@explorations/agent-wiki/experiments/harness/twobatch_compare.py` around lines
232 - 249, TASK_IDS_ORDER is defined far down the file but used in main(); move
the TASK_IDS_ORDER constant up to the top-level constants area (immediately
after the existing REPO constant and before FAMILY) so it’s declared with other
module-level configuration. Update its placement only — keep the exact name
TASK_IDS_ORDER and do not change its contents or usages (e.g., references inside
main()) to restore conventional readability.
```

</details>
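
Applied, the suggestion yields a layout like the following sketch. The task ids and families are placeholders rather than the real task list, and `ordered_tasks` is a hypothetical helper showing how `main()` would consume the hoisted constant.

```python
from collections.abc import Iterable

# Module-level constants directly after the imports (placeholder values).
TASK_IDS_ORDER = ["exif-orientation", "exif-gps", "audio-duration"]
FAMILY = {"exif-orientation": "exif", "exif-gps": "exif", "audio-duration": "audio"}


def ordered_tasks(task_ids: Iterable[str]) -> list[str]:
    """Sort task ids by TASK_IDS_ORDER; unknown ids go last, alphabetically."""
    rank = {task: i for i, task in enumerate(TASK_IDS_ORDER)}
    return sorted(task_ids, key=lambda t: (rank.get(t, len(rank)), t))


def main() -> None:
    for task in ordered_tasks({"audio-duration", "zz-new", "exif-orientation"}):
        print(f"{FAMILY.get(task, '?'):>6}  {task}")


if __name__ == "__main__":
    main()
```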


---

`23-79`: _⚡ Quick win_

**Consider extracting shared constants and utilities to reduce duplication.**

Both `twobatch_compare.py` and `threeway_compare.py` duplicate `FAMILY` dict, `TASK_IDS_ORDER` list, and helper functions (`median`/`mean`, `fmt`, `delta`/`delta_str`, `acc`). For exploration code, self-contained scripts may be intentional, but if these tools will be maintained or extended, consolidating ~80 lines of shared logic into a `comparison_utils.py` module would reduce drift and simplify updates.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@explorations/agent-wiki/experiments/harness/twobatch_compare.py` around lines
23 - 79, Extract the duplicated constants and helper functions into a new module
(e.g., comparison_utils.py): move FAMILY and TASK_IDS_ORDER plus median_or_none,
mean_or_none, fmt, delta_str (and any other shared helpers like acc/delta) into
that module, update twobatch_compare.py and threeway_compare.py to import those
symbols instead of redefining them, and remove the duplicate definitions from
both files so they reference the shared implementations.
```

</details>
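
A minimal `comparison_utils.py` along these lines might look as follows. Function names follow the comment, but the signatures and the "n/a" and percentage formats are assumptions. Using `statistics.median` also sidesteps the even-length median bug flagged for experiment_wiki_consult.py in the inline comments, since it averages the two middle values.

```python
"""comparison_utils.py: hypothetical shared helpers for the *_compare.py scripts."""
import statistics
from collections.abc import Sequence
from typing import Optional


def median_or_none(xs: Sequence[float]) -> Optional[float]:
    """Median, averaging the two middle values when len(xs) is even; None if empty."""
    return statistics.median(xs) if xs else None


def mean_or_none(xs: Sequence[float]) -> Optional[float]:
    """Arithmetic mean, or None for an empty sequence."""
    return statistics.fmean(xs) if xs else None


def fmt(x: Optional[float], digits: int = 2) -> str:
    """Fixed-point formatting with an explicit placeholder for missing values."""
    return "n/a" if x is None else f"{x:.{digits}f}"


def delta_str(new: Optional[float], base: Optional[float]) -> str:
    """Signed relative change of new vs. base, rounded to a whole percent."""
    if new is None or base is None or base == 0:
        return "n/a"
    return f"{(new - base) / base * 100:+.0f}%"
```

For example, `delta_str(0.8, 1.0)` gives `-20%`, the same style as the headline deltas in the PR description.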


</blockquote></details>
<details>
<summary>explorations/agent-wiki/experiments/harness/fourway_compare.py (1)</summary><blockquote>

`20-20`: _⚡ Quick win_

**Remove unused `REPO` variable in both comparison scripts.**

Both `fourway_compare.py` and `fiveway_compare.py` define `REPO = Path(__file__).resolve().parents[1]` but never reference it. This appears to be copy-paste boilerplate that can be removed.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@explorations/agent-wiki/experiments/harness/fourway_compare.py` at line 20,
Remove the unused REPO variable declaration (REPO =
Path(__file__).resolve().parents[1]) from the comparison scripts; locate the
top-level REPO assignment in fourway_compare.py and fiveway_compare.py and
delete that line to eliminate dead/copy-pasted boilerplate, ensuring no other
code references REPO afterward.
```

</details>
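
The parent-index arithmetic behind this comment (and the `REPO_ROOT` inline comment further down) is easy to check with `pathlib`, assuming a hypothetical checkout at `/repo`:

```python
from pathlib import PurePosixPath

# Hypothetical checkout location; only the relative layout matters.
script = PurePosixPath(
    "/repo/explorations/agent-wiki/experiments/harness/experiment_wiki_consult.py"
)

# parents[0] is the script's own directory; each index climbs one level.
for depth, parent in enumerate(script.parents):
    print(depth, parent)
```

From a script in `harness/`, `parents[1]` is `experiments/`, `parents[2]` is `agent-wiki/`, and `parents[4]` is the repository root. If the compare scripts live in `harness/` as the walkthrough lists, `REPO = ...parents[1]` there names the experiments directory rather than the repo.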


</blockquote></details>
<details>
<summary>explorations/agent-wiki/experiments/RESULTS-SUMMARY.md (1)</summary><blockquote>

`408-432`: _💤 Low value_

**Add language identifier to fenced code block.**

The file-map code fence should specify a language such as `text`; static analysis flags fenced code blocks that have no language specified.





<details>
<summary>🔧 Proposed fix</summary>

````diff
-```
+```text
 explorations/agent-wiki/experiments/
 ├── RESULTS-SUMMARY.md                     this file
````
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@explorations/agent-wiki/experiments/RESULTS-SUMMARY.md` around lines 408 -
432, The fenced directory tree at the top of RESULTS-SUMMARY.md is missing a
language identifier; update the opening code fence for the file-map block (the
triple-backticks that surround the explorations/agent-wiki/experiments/ tree) to
include a language identifier (e.g., add `text` to the opening fence) so static
analysis recognizes the block language.

</details>

_Source: Linters/SAST tools_

</blockquote></details>

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@explorations/agent-wiki/experiments/harness/experiment_wiki_consult.py`:
- Line 47: REPO_ROOT is set using Path(__file__).resolve().parents[2] which
points to explorations/agent-wiki/ (too shallow); change the parent index to
parents[4] so REPO_ROOT points to the repository root; update the expression
Path(__file__).resolve().parents[2] to Path(__file__).resolve().parents[4]
(symbol: REPO_ROOT) so all derived paths that use REPO_ROOT (used later on lines
referencing the same variable) resolve correctly.
- Around line 425-426: The median calculation using median = durs[n // 2] is
incorrect for even-length lists; update the logic after computing durs and n so
that you compute mid = n // 2 and set median = durs[mid] when n is odd,
otherwise set median = (durs[mid - 1] + durs[mid]) / 2.0 to return the average
of the two middle values (ensure the result is a float); reference the existing
variables durs, n, median and rows to locate and replace the incorrect line.
- Line 310: The tasks_file path is pointing to the wrong location; update the
tasks_file assignment (the variable named tasks_file which currently uses
REPO_ROOT / "tests" / "e2e" / "wiki_consult_tasks.yaml") to the actual file
location by constructing the path as REPO_ROOT / "explorations" / "agent-wiki" /
"experiments" / "harness" / "wiki_consult_tasks.yaml" so the script can load the
correct YAML; ensure you use the same Path joining style used elsewhere with
REPO_ROOT.
- Around line 114-120: The helper _seed_format_group currently does a local
import "from _format_samples import seed_into" but there is no _format_samples
module in the harness directory, causing runtime import errors; fix by providing
a real implementation or correct import: either add a new _format_samples.py
next to experiment_wiki_consult.py that implements seed_into(ws: Path, group:
str) -> list[str], or change the import in _seed_format_group to point to the
existing module that provides seed_into (or inline the seed_into logic into
_seed_format_group) so that the function _seed_format_group calls a valid
seed_into symbol at runtime.

In `@explorations/agent-wiki/experiments/RESULTS-SUMMARY.md`:
- Line 368: Replace the typo "byes" with "bytes" in the sentence fragment that
currently reads "it reads MORE byes (cache-creation..." in RESULTS-SUMMARY.md so
the sentence reads "it reads MORE bytes (cache-creation..."; search for the
exact phrase "it reads MORE byes" to locate the spot and update only that word.

In `@explorations/agent-wiki/skills/scripts/_default_agents.md`:
- Around line 167-171: Update the bootstrap command path in the markdown so it
points to the correct script location: replace occurrences of
"plugin-source/skills/agent-wiki/scripts/build_agent_wiki.py" with
"explorations/agent-wiki/skills/scripts/build_agent_wiki.py" in the AGENTS.md
bootstrap instructions (the sentence containing the command `uv run python ...
build_agent_wiki.py --wiki-root <wiki-root> catalog`) so the documented command
matches the actual repository layout.
- Around line 28-29: The Structure section in _default_agents.md is missing the
generated guidelines/index.md entry; update the Structure listing to include
guidelines/index.md alongside the existing guidelines/<slug>__<gid>.md and
guidelines/<slug>__cluster.md entries so the docs match what
explorations/agent-wiki/skills/scripts/build_agent_wiki.py generates; edit the
"Structure" block in _default_agents.md to add a line for guidelines/index.md
and ensure formatting/ordering is consistent with the other guideline entries.

---

Nitpick comments:
In `@explorations/agent-wiki/experiments/harness/fourway_compare.py`:
- Line 20: Remove the unused REPO variable declaration (REPO =
Path(__file__).resolve().parents[1]) from the comparison scripts; locate the
top-level REPO assignment in fourway_compare.py and fiveway_compare.py and
delete that line to eliminate dead/copy-pasted boilerplate, ensuring no other
code references REPO afterward.

In `@explorations/agent-wiki/experiments/harness/twobatch_compare.py`:
- Around line 232-249: TASK_IDS_ORDER is defined far down the file but used in
main(); move the TASK_IDS_ORDER constant up to the top-level constants area
(immediately after the existing REPO constant and before FAMILY) so it’s
declared with other module-level configuration. Update its placement only — keep
the exact name TASK_IDS_ORDER and do not change its contents or usages (e.g.,
references inside main()) to restore conventional readability.
- Around line 23-79: Extract the duplicated constants and helper functions into
a new module (e.g., comparison_utils.py): move FAMILY and TASK_IDS_ORDER plus
median_or_none, mean_or_none, fmt, delta_str (and any other shared helpers like
acc/delta) into that module, update twobatch_compare.py and threeway_compare.py
to import those symbols instead of redefining them, and remove the duplicate
definitions from both files so they reference the shared implementations.

In `@explorations/agent-wiki/experiments/RESULTS-SUMMARY.md`:
- Around line 408-432: The fenced directory tree at the top of
RESULTS-SUMMARY.md is missing a language identifier; update the opening code
fence for the file-map block (the triple-backticks that surround the
explorations/agent-wiki/experiments/ tree) to include a language identifier
(e.g., change ``` to ```text) so static analysis recognizes the block language.

In `@explorations/agent-wiki/skills/agent-wiki-consolidate-guidelines/SKILL.md`:
- Around line 43-45: The fenced code block containing the line "Read
/tmp/guidelines.json" in SKILL.md should include a language specifier; update
that code fence to use "bash" (i.e., change the opening ``` to ```bash) so the
block reads as a bash snippet for consistency and proper syntax highlighting.

In `@explorations/agent-wiki/skills/agent-wiki-consult/SKILL.md`:
- Around line 53-55: Update the fenced code block containing "Read
<wiki-root>/AGENTS.md" in SKILL.md to include a language specifier;
specifically, change the backticks that start the block to ```bash so the
snippet is ```bash Read <wiki-root>/AGENTS.md ``` which enables Bash syntax
highlighting and keeps formatting consistent with other docs.
- Around line 72-74: Add the language specifier "bash" to the fenced code block
that contains the line "Read <wiki-root>/_index.jsonl" in SKILL.md so the block
reads as a bash snippet; locate the triple-backtick fence surrounding that line
and change it from ``` to ```bash to enable proper syntax highlighting and
consistency.

In `@explorations/agent-wiki/skills/agent-wiki-ingest/SKILL.md`:
- Around line 26-35: The fenced code block that lists the pipeline steps
(starting with "0.  Convert    raw bob / claude traces → normalized analysis
JSON") in SKILL.md is missing a language specifier; update the opening fence
from ``` to ```text (or another plain language like `text`) so the block reads
```text and preserves formatting/consistency across renderers.

In `@explorations/agent-wiki/skills/agent-wiki-synthesize-skill/SKILL.md`:
- Around line 205-213: Update the fenced code block in SKILL.md that shows the
directory tree for "<wiki>/skills/" to include a language specifier (e.g.,
change the opening ``` to ```text) so the block is marked as plain text; locate
the block in the SKILL.md content that begins with the three backticks followed
by the tree and replace the opening fence accordingly to ensure consistent
formatting.

In `@explorations/agent-wiki/skills/agent-wiki-tasks/SKILL.md`:
- Around line 43-45: The fenced code block containing the line "Read
/tmp/summaries.json" is missing a language specifier; update the markdown in
SKILL.md by changing the opening fence from ``` to ```bash so the block reads as
a bash code block (i.e., use ```bash before the line and keep the closing ```),
ensuring consistency with other fenced blocks.

In `@explorations/agent-wiki/skills/scripts/_default_agents.md`:
- Around line 35-55: The fenced directory-structure block at the top of
AGENTS.md is missing a language tag; update the opening fence from ``` to
```text so the block is marked as plain text (modify the code block that begins
with the tree under <wiki-root>/ in AGENTS.md).
- Around line 123-131: The fenced provenance chain block in _default_agents.md
lacks a language specifier which fails markdown linting; update the opening
triple-backtick for that block to include a language (for example change ``` to
```text) so the block becomes a labeled text code fence, ensuring the block
around the lines starting with "guideline.md" and the following arrows is
annotated (look for the triple-backtick that opens that specific provenance
chain block).
- Around line 151-152: Update the example in _default_agents.md so the guideline
reference is unambiguous: replace or augment the bare ID `474bb2ba1076` with the
full atomic filename pattern `<slug>__474bb2ba1076.md` (or show the actual
example full filename), or add a short note stating that IDs shown (like
`474bb2ba1076`) are the gid field from `_index.jsonl` and map to atomic files
via the `<slug>__<gid>.md` convention; ensure this change is applied where the
example appears so readers can map the ID to the full guideline filename.
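
With the `<slug>__<gid>.md` convention described in the last item, a bare ID can be resolved with a single glob. This is a minimal sketch. It assumes atomic pages sit directly under `<wiki-root>/guidelines/` and that a gid is unique across slugs. `guideline_file_for_gid` is a hypothetical helper, and `build_agent_wiki.py` is not known to provide it.

```python
from pathlib import Path


def guideline_file_for_gid(wiki_root: Path, gid: str) -> Path:
    """Map a bare gid (e.g. from _index.jsonl) to its atomic guideline page."""
    # Atomic pages are named <slug>__<gid>.md; cluster pages (<slug>__cluster.md)
    # never match because their suffix is not a gid.
    matches = sorted((wiki_root / "guidelines").glob(f"*__{gid}.md"))
    if len(matches) != 1:
        raise LookupError(f"expected one guideline for gid {gid!r}, found {len(matches)}")
    return matches[0]
```
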

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 03d507b1-88b4-4d95-926c-e74bcaf7cd25

📥 Commits

Reviewing files that changed from the base of the PR and between 6de3712 and 8cadfa6.

📒 Files selected for processing (34)
  • .pre-commit-config.yaml
  • .secrets.baseline
  • explorations/agent-wiki/README.md
  • explorations/agent-wiki/docs/design.md
  • explorations/agent-wiki/docs/schema.md
  • explorations/agent-wiki/experiments/RESULTS-SUMMARY.md
  • explorations/agent-wiki/experiments/harness/experiment_wiki_consult.py
  • explorations/agent-wiki/experiments/harness/extract_trial_metrics.py
  • explorations/agent-wiki/experiments/harness/fiveway_compare.py
  • explorations/agent-wiki/experiments/harness/fourway_compare.py
  • explorations/agent-wiki/experiments/harness/normalize_stream_json_transcripts.py
  • explorations/agent-wiki/experiments/harness/threeway_compare.py
  • explorations/agent-wiki/experiments/harness/twobatch_compare.py
  • explorations/agent-wiki/experiments/harness/wiki_consult_tasks.yaml
  • explorations/agent-wiki/experiments/metrics/pruned-fixed-9atomic.metrics.jsonl
  • explorations/agent-wiki/experiments/metrics/twobatch-both.metrics.jsonl
  • explorations/agent-wiki/experiments/metrics/twobatch-skills.metrics.jsonl
  • explorations/agent-wiki/experiments/metrics/twobatch.metrics.jsonl
  • explorations/agent-wiki/experiments/pruned-index-hypothesis.md
  • explorations/agent-wiki/experiments/twobatch-comparison.md
  • explorations/agent-wiki/experiments/twobatch-fiveway-comparison.md
  • explorations/agent-wiki/experiments/twobatch-fourway-comparison.md
  • explorations/agent-wiki/experiments/twobatch-skills-comparison.md
  • explorations/agent-wiki/skills/agent-wiki-consolidate-guidelines/SKILL.md
  • explorations/agent-wiki/skills/agent-wiki-consult/SKILL.md
  • explorations/agent-wiki/skills/agent-wiki-extract-guidelines/SKILL.md
  • explorations/agent-wiki/skills/agent-wiki-ingest/SKILL.md
  • explorations/agent-wiki/skills/agent-wiki-summarize/SKILL.md
  • explorations/agent-wiki/skills/agent-wiki-synthesize-skill/SKILL.md
  • explorations/agent-wiki/skills/agent-wiki-tasks/SKILL.md
  • explorations/agent-wiki/skills/scripts/_default_agent_wiki_config.yaml
  • explorations/agent-wiki/skills/scripts/_default_agents.md
  • explorations/agent-wiki/skills/scripts/build_agent_wiki.py
  • pyproject.toml


import yaml

REPO_ROOT = Path(__file__).resolve().parents[2]


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

`REPO_ROOT` calculation resolves to the wrong directory.

The script is located at `explorations/agent-wiki/experiments/harness/experiment_wiki_consult.py`, four directory levels below the repository root. `Path(__file__).resolve().parents[2]` therefore resolves to `explorations/agent-wiki/`, not the repository root. As a result, all derived paths (lines 61, 64, 67) will be incorrect.

🔧 Proposed fix
-REPO_ROOT = Path(__file__).resolve().parents[2]
+REPO_ROOT = Path(__file__).resolve().parents[4]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@explorations/agent-wiki/experiments/harness/experiment_wiki_consult.py` at
line 47, REPO_ROOT is set using Path(__file__).resolve().parents[2] which points
to explorations/agent-wiki/ (too shallow); change the parent index to parents[4]
so REPO_ROOT points to the repository root; update the expression
Path(__file__).resolve().parents[2] to Path(__file__).resolve().parents[4]
(symbol: REPO_ROOT) so all derived paths that use REPO_ROOT (used later on lines
referencing the same variable) resolve correctly.
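
The index arithmetic can be checked without touching the filesystem. The first part of the sketch below uses a hypothetical checkout at `/repo`. The second part is an alternative that some projects prefer: walk upward until a marker file is found, so moving the script does not silently break the path. Using `pyproject.toml` as the marker is an assumption, based only on it appearing at the top level of the changed-files list. `find_repo_root` is a hypothetical helper.

```python
from pathlib import Path, PurePosixPath

# Hypothetical checkout at /repo; the script path is the one named in the review.
script = PurePosixPath(
    "/repo/explorations/agent-wiki/experiments/harness/experiment_wiki_consult.py"
)
# script.parents[2] is /repo/explorations/agent-wiki  (current value, too shallow)
# script.parents[4] is /repo                          (proposed fix)


def find_repo_root(start: Path, marker: str = "pyproject.toml") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no {marker} found above {start}")


# In the harness this would be: REPO_ROOT = find_repo_root(Path(__file__).resolve().parent)
```
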

Comment on lines +114 to +120
def _seed_format_group(ws: Path, group: str) -> list[str]:
"""Seed image/archive/text format samples via the stdlib generators in
`_format_samples.py`. Group is one of `image-formats`, `archive-formats`,
`text-formats`."""
from _format_samples import seed_into # local import — script lives next door

return seed_into(ws, group)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if _format_samples.py exists in the harness directory
fd -t f "_format_samples.py" explorations/agent-wiki/experiments/harness/

Repository: AgentToolkit/altk-evolve

Length of output: 50


🏁 Script executed:

#!/bin/bash
set -euo pipefail

dir="explorations/agent-wiki/experiments/harness"
echo "== harness directory listing =="
ls -la "$dir"

echo
echo "== files matching _format_samples* in harness =="
fd -t f "_format_samples*" "$dir" || true

echo
echo "== any reference to _format_samples in repo (top 50 matches) =="
rg -n --hidden --no-ignore-vcs "_format_samples" . | head -n 50

echo
echo "== any definition of seed_into in repo (top 50 matches) =="
rg -n --hidden --no-ignore-vcs "def seed_into" . | head -n 50

Repository: AgentToolkit/altk-evolve

Length of output: 1295


🌐 Web query:

Search for `_format_samples.py` and `seed_into` in the AgentToolkit/altk-evolve repository if available in public docs (or confirm the expected module location for harness format seeding).

💡 Result:

The altk-evolve framework, managed by the AgentToolkit, does not publicly expose a module or function named _format_samples.py or seed_into in its main documentation or repository structure. Based on official documentation for altk-evolve: 1. Entity Management: Evolve manages knowledge (tips/guidelines) either via an MCP server with Milvus vector storage or as Markdown files with YAML frontmatter in .evolve/entities/ (specifically in the evolve-lite integration) [1][2]. 2. Trajectory Seeding and Guidelines: The system uses tools such as save_trajectory to extract insights and generate guidelines from execution traces [1][3]. The generation process is handled automatically through evolve sync (for observability traces) or via explicit tool calls (like /evolve-lite:learn) [2][4]. 3. Provenance: Every guideline created is assigned metadata, including creation_mode (e.g., auto-phoenix, auto-mcp, or manual) and source_task_id, rather than being injected via a seed_into function [1][4]. It is possible that _format_samples.py or seed_into are internal implementation details or components of a private or older version of the toolkit, but they are not part of the documented public API for harness format seeding or knowledge base initialization in the current stable version [3][5]. If you are working with a specific development branch, you may want to check the src/ or internal/ directories directly in the local repository clone, as these are not exposed in the high-level API docs [1].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== fd any _format_samples.py anywhere in repo =="
fd -t f "_format_samples.py" . || true

echo
echo "== fd any _format_samples.* anywhere in repo (loose) =="
fd -t f "_format_samples.*" . || true

echo
echo "== rg references to 'seed_into(' across repo (top 50) =="
rg -n --hidden --no-ignore-vcs "seed_into\s*\(" . | head -n 50 || true

echo
echo "== rg references to '_format_samples' across repo (top 50) =="
rg -n --hidden --no-ignore-vcs "_format_samples" . | head -n 50 || true

Repository: AgentToolkit/altk-evolve

Length of output: 719


🏁 Script executed:

#!/bin/bash
set -euo pipefail

f="explorations/agent-wiki/experiments/harness/experiment_wiki_consult.py"
y="explorations/agent-wiki/experiments/harness/wiki_consult_tasks.yaml"

echo "== Python: find make_workspace and seed handling =="

rg -n "def make_workspace|make_workspace|seed" "$f" | head -n 80

echo
echo "== Python: show _seed_format_group area (lines ~90-140) =="

python - <<'PY'
import itertools
path="explorations/agent-wiki/experiments/harness/experiment_wiki_consult.py"
start=90
end=150
with open(path,'r',encoding='utf-8') as fp:
    for i,line in enumerate(fp, start=1):
        if start<=i<=end:
            print(f"{i:4d}: {line.rstrip()}")
        if i>end:
            break
PY

echo
echo "== YAML: show tasks that mention 'image-formats'/'archive-formats'/'text-formats' =="

rg -n "image-formats|archive-formats|text-formats" "$y" || true

echo
echo "== YAML: show full file (<=200 lines) or relevant blocks =="

wc -l "$y"
if [ "$(wc -l < "$y")" -le 200 ]; then
  nl -ba "$y"
else
  # print first and last plus around matches
  nl -ba "$y" | head -n 60
  rg -n "image-formats|archive-formats|text-formats" "$y" -n || true
  nl -ba "$y" | tail -n 60
fi

Repository: AgentToolkit/altk-evolve

Length of output: 4383


🏁 Script executed:

#!/bin/bash
set -euo pipefail

y="explorations/agent-wiki/experiments/harness/wiki_consult_tasks.yaml"

echo "== seed entries =="
rg -n "^\s*seed:\s*(image-formats|archive-formats|text-formats)\s*$" "$y" || true

echo
echo "== total lines containing seed: =="
rg -n "^\s*seed:\s*" "$y" | wc -l

echo
echo "== total task items (lines starting with '-' ) =="
rg -n "^\s*-\s" "$y" | wc -l

echo
echo "== show YAML (for context, whole file <=200 lines) =="
sed -n '1,200p' "$y"

Repository: AgentToolkit/altk-evolve

Length of output: 5179


Fix missing `_format_samples` import for format seeding

`experiment_wiki_consult.py` imports `seed_into` via `from _format_samples import seed_into` (assuming `_format_samples.py` sits next to the script). However, `_format_samples.py` is not present in `explorations/agent-wiki/experiments/harness/`, and no `seed_into` implementation exists anywhere else in this repo. The format-seeding tasks will therefore fail at runtime with `ModuleNotFoundError` unless `_format_samples` is generated or copied in elsewhere. This blocks 12 of the 17 tasks: t6-t9, t10-t13, and t14-t17.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@explorations/agent-wiki/experiments/harness/experiment_wiki_consult.py`
around lines 114 - 120, The helper _seed_format_group currently does a local
import "from _format_samples import seed_into" but there is no _format_samples
module in the harness directory, causing runtime import errors; fix by providing
a real implementation or correct import: either add a new _format_samples.py
next to experiment_wiki_consult.py that implements seed_into(ws: Path, group:
str) -> list[str], or change the import in _seed_format_group to point to the
existing module that provides seed_into (or inline the seed_into logic into
_seed_format_group) so that the function _seed_format_group calls a valid
seed_into symbol at runtime.
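
If the module is added rather than the import redirected, its interface follows from the call site: `seed_into(ws: Path, group: str) -> list[str]`, with groups `image-formats`, `archive-formats`, and `text-formats`, built from stdlib generators per the `_seed_format_group` docstring. The sketch below is hypothetical. The file names, contents, and return convention (names relative to `ws`) are assumptions, and the samples the format tasks actually expect are not visible in this PR.

```python
# Hypothetical _format_samples.py: stdlib-only sample generators.
import csv
import io
import json
import struct
import tarfile
import zipfile
import zlib
from pathlib import Path


def _png_1x1_red() -> bytes:
    """Encode a 1x1 RGB PNG by hand: signature, IHDR, IDAT, IEND chunks."""
    def chunk(tag: bytes, data: bytes) -> bytes:
        crc = zlib.crc32(tag + data) & 0xFFFFFFFF
        return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)

    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)  # 1x1, bit depth 8, RGB
    idat = zlib.compress(b"\x00" + b"\xff\x00\x00")      # filter byte 0, one red pixel
    return b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr) + chunk(b"IDAT", idat) + chunk(b"IEND", b"")


def _write_images(ws: Path) -> list[str]:
    (ws / "sample.png").write_bytes(_png_1x1_red())
    (ws / "sample.ppm").write_bytes(b"P6\n1 1\n255\n\xff\x00\x00")  # binary PPM, one red pixel
    return ["sample.png", "sample.ppm"]


def _write_archives(ws: Path) -> list[str]:
    payload = b"hello from an archive\n"
    with zipfile.ZipFile(ws / "sample.zip", "w") as zf:
        zf.writestr("hello.txt", payload)
    with tarfile.open(ws / "sample.tar.gz", "w:gz") as tf:
        info = tarfile.TarInfo("hello.txt")
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return ["sample.zip", "sample.tar.gz"]


def _write_text(ws: Path) -> list[str]:
    (ws / "sample.json").write_text(json.dumps({"name": "sample", "values": [1, 2, 3]}))
    with open(ws / "sample.csv", "w", newline="") as fh:
        csv.writer(fh).writerows([["name", "value"], ["a", 1], ["b", 2]])
    (ws / "sample.txt").write_text("plain text sample\n")
    return ["sample.json", "sample.csv", "sample.txt"]


_GROUPS = {
    "image-formats": _write_images,
    "archive-formats": _write_archives,
    "text-formats": _write_text,
}


def seed_into(ws: Path, group: str) -> list[str]:
    """Write the samples for `group` into `ws`; return their names relative to `ws`."""
    try:
        writer = _GROUPS[group]
    except KeyError:
        raise ValueError(f"unknown format group: {group!r}") from None
    ws.mkdir(parents=True, exist_ok=True)
    return writer(ws)
```

Keeping the generators in the stdlib avoids adding Pillow or similar as a harness dependency, at the cost of only very small, synthetic samples.
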

return 2

# Load tasks (--task may be comma-separated)
tasks_file = REPO_ROOT / "tests" / "e2e" / "wiki_consult_tasks.yaml"


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

`tasks_file` path points to the wrong location.

The code looks for `wiki_consult_tasks.yaml` at `tests/e2e/wiki_consult_tasks.yaml`, but the file actually lives at `explorations/agent-wiki/experiments/harness/wiki_consult_tasks.yaml`. The script will therefore fail when it tries to load tasks.

🔧 Proposed fix
-    tasks_file = REPO_ROOT / "tests" / "e2e" / "wiki_consult_tasks.yaml"
+    tasks_file = Path(__file__).parent / "wiki_consult_tasks.yaml"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@explorations/agent-wiki/experiments/harness/experiment_wiki_consult.py` at
line 310, The tasks_file path is pointing to the wrong location; update the
tasks_file assignment (the variable named tasks_file which currently uses
REPO_ROOT / "tests" / "e2e" / "wiki_consult_tasks.yaml") to the actual file
location by constructing the path as REPO_ROOT / "explorations" / "agent-wiki" /
"experiments" / "harness" / "wiki_consult_tasks.yaml" so the script can load the
correct YAML; ensure you use the same Path joining style used elsewhere with
REPO_ROOT.
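
The proposed fix (`Path(__file__).parent / ...`) and the prompt (`REPO_ROOT / "explorations" / ...`) use different expressions. The sketch below checks, for a hypothetical checkout at `/repo`, that both point at the same file. This holds only once `REPO_ROOT` uses `parents[4]`, as the earlier comment proposes. The script-relative form does not depend on `REPO_ROOT` at all, so it stays correct even if that index is wrong.

```python
from pathlib import PurePosixPath

# Hypothetical checkout at /repo; the script location is the one named in the review.
script = PurePosixPath(
    "/repo/explorations/agent-wiki/experiments/harness/experiment_wiki_consult.py"
)
repo_root = script.parents[4]  # assumes REPO_ROOT is corrected to parents[4]

via_script_dir = script.parent / "wiki_consult_tasks.yaml"  # proposed fix
via_repo_root = (
    repo_root / "explorations" / "agent-wiki" / "experiments"
    / "harness" / "wiki_consult_tasks.yaml"
)  # variant described in the prompt
```
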

Comment on lines +425 to +426
durs = sorted(r["duration_s"] for r in rows)
median = durs[n // 2]


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Median calculation is incorrect for even-length trial lists.

The code uses `median = durs[n // 2]`. For even-length lists, this picks the upper of the two middle values instead of averaging them. For example, with durations `[1, 2, 3, 4]` it returns 3 instead of 2.5.

This doesn't affect the default `--trials 3` case (odd length), but it produces incorrect medians for even trial counts.

📊 Proposed fix for correct median
             durs = sorted(r["duration_s"] for r in rows)
-            median = durs[n // 2]
+            median = durs[n // 2] if n % 2 == 1 else (durs[n // 2 - 1] + durs[n // 2]) / 2
             md_lines.append(f"| {condition:<10} | {rd}/{n} | {ct}/{n} | {om}/{n} | {median:.0f} |")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@explorations/agent-wiki/experiments/harness/experiment_wiki_consult.py`
around lines 425 - 426, The median calculation using median = durs[n // 2] is
incorrect for even-length lists; update the logic after computing durs and n so
that you compute mid = n // 2 and set median = durs[mid] when n is odd,
otherwise set median = (durs[mid - 1] + durs[mid]) / 2.0 to return the average
of the two middle values (ensure the result is a float); reference the existing
variables durs, n, median and rows to locate and replace the incorrect line.
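
The even/odd rule in the prompt can be checked against the standard library, since `statistics.median` averages the two middle values for even-length data. The sketch below implements the prompt's version. The function name `median` is illustrative. The empty-input guard is an addition: the original line raises `IndexError` when `n` is 0.

```python
def median(values: list[float]) -> float:
    """Median that averages the two middle values for even-length input."""
    durs = sorted(values)
    n = len(durs)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    # Odd n: the single middle element. Even n: mean of the two middle elements.
    return float(durs[mid]) if n % 2 == 1 else (durs[mid - 1] + durs[mid]) / 2.0
```

In the harness itself, `median = statistics.median(durs)` would be an equivalent one-line fix.
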


5. **Cost reduction comes mainly from output tokens and tool-call
reduction**, not from input-token compression. The agent doesn't read
*fewer* bytes when it has the wiki — it reads MORE byes (cache-creation


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix typo: "byes" should be "bytes".

The sentence reads "it reads MORE byes" but should be "it reads MORE bytes".

📝 Proposed fix
-   *fewer* bytes when it has the wiki — it reads MORE byes (cache-creation
+   *fewer* bytes when it has the wiki — it reads MORE bytes (cache-creation
🧰 Tools
🪛 LanguageTool

[grammar] ~368-~368: Ensure spelling is correct
Context: ...es when it has the wiki — it reads MORE byes (cache-creation on guideline pages g...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@explorations/agent-wiki/experiments/RESULTS-SUMMARY.md` at line 368, Replace
the typo "byes" with "bytes" in the sentence fragment that currently reads "it
reads MORE byes (cache-creation..." in RESULTS-SUMMARY.md so the sentence reads
"it reads MORE bytes (cache-creation..."; search for the exact phrase "it reads
MORE byes" to locate the spot and update only that word.

Source: Linters/SAST tools

Comment on lines +28 to +29
clearly outside the wiki's scope (the corpus is finite — see
`guidelines/index.md` for the topical surface).


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for guidelines/index.md references and existence

# Search for any mention of guidelines/index.md in the exploration
rg -n "guidelines/index\.md" explorations/agent-wiki/

# Check the actual directory structure in any example wikis
fd -t f "index.md" explorations/agent-wiki/ --exec echo "Found: {}"

Repository: AgentToolkit/altk-evolve

Length of output: 760


Include `guidelines/index.md` in the “Structure” tree shown in `_default_agents.md`

`explorations/agent-wiki/skills/scripts/build_agent_wiki.py` refreshes/generates `guidelines/index.md`, so the link on line 29 isn't stale. Update the “Structure” listing (around lines 41-54) to add `guidelines/index.md` alongside the existing `guidelines/<slug>__<gid>.md`, `guidelines/<slug>__cluster.md`, etc., so the documented structure matches the generated wiki.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@explorations/agent-wiki/skills/scripts/_default_agents.md` around lines 28 -
29, The Structure section in _default_agents.md is missing the generated
guidelines/index.md entry; update the Structure listing to include
guidelines/index.md alongside the existing guidelines/<slug>__<gid>.md and
guidelines/<slug>__cluster.md entries so the docs match what
explorations/agent-wiki/skills/scripts/build_agent_wiki.py generates; edit the
"Structure" block in _default_agents.md to add a line for guidelines/index.md
and ensure formatting/ordering is consistent with the other guideline entries.

Comment on lines +167 to +171
If `AGENTS.md` does not exist in a wiki, run
`uv run python plugin-source/skills/agent-wiki/scripts/build_agent_wiki.py
--wiki-root <wiki-root> catalog` — the bootstrap pass copies the template
in. After bootstrap, this file is yours to edit; subsequent catalog runs
do not overwrite an existing `AGENTS.md`.


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fix the incorrect build script path.

Line 168 references `plugin-source/skills/agent-wiki/scripts/build_agent_wiki.py`, but based on the PR structure, the correct path is `explorations/agent-wiki/skills/scripts/build_agent_wiki.py`. This breaks the bootstrapping instructions.

🔧 Proposed fix
 If `AGENTS.md` does not exist in a wiki, run
-`uv run python plugin-source/skills/agent-wiki/scripts/build_agent_wiki.py
+`uv run python explorations/agent-wiki/skills/scripts/build_agent_wiki.py
 --wiki-root <wiki-root> catalog` — the bootstrap pass copies the template
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@explorations/agent-wiki/skills/scripts/_default_agents.md` around lines 167 -
171, Update the bootstrap command path in the markdown so it points to the
correct script location: replace occurrences of
"plugin-source/skills/agent-wiki/scripts/build_agent_wiki.py" with
"explorations/agent-wiki/skills/scripts/build_agent_wiki.py" in the AGENTS.md
bootstrap instructions (the sentence containing the command `uv run python ...
build_agent_wiki.py --wiki-root <wiki-root> catalog`) so the documented command
matches the actual repository layout.
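
The documented behavior is to copy the template once and never overwrite an edited `AGENTS.md`. Exclusive-create mode expresses this directly. This is a minimal sketch. It assumes the template is `_default_agents.md` from the scripts directory. `bootstrap_agents_md` is a hypothetical name, and `build_agent_wiki.py`'s catalog pass may implement this differently.

```python
import shutil
from pathlib import Path


def bootstrap_agents_md(wiki_root: Path, template: Path) -> bool:
    """Copy `template` to <wiki-root>/AGENTS.md only if AGENTS.md is absent.

    Returns True if the file was created, False if an existing one was kept.
    """
    target = wiki_root / "AGENTS.md"
    try:
        # Open the template first so a missing template never leaves an empty AGENTS.md.
        # Mode "xb" creates the target atomically and raises if it already exists.
        with open(template, "rb") as src, open(target, "xb") as dst:
            shutil.copyfileobj(src, dst)
    except FileExistsError:
        return False
    return True
```

Compared with an `exists()` check followed by a copy, exclusive creation has no window in which two concurrent catalog runs could both decide to write the file.
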
