explore(agent-wiki): example wikis (companion to #268) by vinodmut · Pull Request #269 · AgentToolkit/altk-evolve

vinodmut · 2026-06-10T07:06:32Z

Companion to #268 (the agent-wiki code PR) — merge AFTER #268

Note on the diff: this branch is based on the #268 branch, but GitHub
can't target a fork-only branch as the PR base, so against main it
currently shows #268's files too. Once #268 merges, this diff collapses
to just the example wikis below. Review #268 first.

Adds the four benchmark-derived example wikis built by the agent-wiki skills, split out from #268 so that PR's diff stays focused on reviewable code (builder, skills, docs, experiment harness) instead of ~10k lines of generated output.

explorations/agent-wiki/wikis/
├── wiki-twobatch/          16-task corpus, guidelines arm
├── wiki-twobatch-skills/   skills-only arm
├── wiki-twobatch-both/     skills + guidelines
└── wiki-twobatch-pruned/   skills + only no-skill-coverage atomics (delete-on-promote)

These are generated artifacts — every page is emitted by build_agent_wiki.py from trajectories, not hand-authored. Provenance back-links appear in the generic trajectories/<session-id>.json form. Merging this resolves the wikis/wiki-twobatch-* references in #268's README and docs.

Adds explorations/agent-wiki/: the agent-wiki skill family, builder, design + schema docs, the wiki-helps experiment reports, and benchmark-derived example wikis, all under one tree suitable for a public PR.

Contents:
- skills/ 7 agent-wiki skills + build_agent_wiki.py (reference copy, not plugin-wired)
- docs/ design.md + schema.md
- experiments/ RESULTS-SUMMARY + twobatch comparison reports + pruned-index-hypothesis; metrics/ rollups (no raw transcripts); harness/ runner + compare scripts
- wikis/ wiki-terminalbench-bob + the twobatch arms (base / skills / both / pruned-corrected)

Public-safety scrub:
- Excluded all raw per-trial sandbox transcripts (kept only metric rollups + narrative reports).
- Excluded wikis built from internal corpora (procedural-design, consult-meta, iterative, retroactive, simple-claude, test-paired, claude) and the build-pattern comparison that ran on them; §3-4 of RESULTS-SUMMARY reduced to a portable-finding note.
- Rewrote all source-path frontmatter to the generic trajectories/<session-id>.json form; genericized internal example names and the benchmark-data dir convention in skills/docs.
- Leak gate (benchmark-data / internal corpus + wiki names / org paths) passes with zero hits across the tree.

Branched off main; diff touches only explorations/agent-wiki/. Builder catalog + comparison scripts verified runnable from the new location.
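The leak gate mentioned above is not included in this PR, so the following is only a minimal sketch of how such a gate might work, assuming it is a deny-list scan over the text files in the tree. The patterns in `DENY_PATTERNS` and the function names `scan_text` and `scan_tree` are hypothetical placeholders, not the project's actual gate.

```python
import re
from pathlib import Path

# Hypothetical deny-list: the real gate's patterns are not shown in the PR.
DENY_PATTERNS = [
    re.compile(r"benchmark-data/"),   # assumed form of the benchmark-data dir convention
    re.compile(r"/Users/[^/\s]+/"),   # assumed form of an org- or user-specific path
]


def scan_text(text, patterns=DENY_PATTERNS):
    """Return (line_number, matched_text) for every deny-list hit in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in patterns:
            match = pattern.search(line)
            if match:
                hits.append((lineno, match.group(0)))
    return hits


def scan_tree(root, patterns=DENY_PATTERNS):
    """Scan every UTF-8 text file under root; return {path: hits} for files with hits."""
    report = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        hits = scan_text(text, patterns)
        if hits:
            report[str(path)] = hits
    return report
```

Under these assumptions, "passes with zero hits" would correspond to `scan_tree("explorations/agent-wiki")` returning an empty dict.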

Removes the terminal-bench example wiki from the exploration. Repoints the README reading-order + layout to wiki-twobatch-skills, fixes the docs that attributed worked examples to it (schema.md now points at the wiki-twobatch arms; example index rows retagged), and corrects stale relative links the docs carried from the original tree (../plugin-source → ../skills, ../WIKIS.md removed, ../experiments/wiki-build-comparison.md → RESULTS-SUMMARY §3–4, design.md/schema.md cross-links to renamed filenames). Skill example paths (consult, ingest) repointed off the removed wiki. Remaining wikis: wiki-twobatch {base, skills, both, pruned}. All intra-doc relative links resolve; leak gate clean.
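The claim that all intra-doc relative links resolve could be checked with something like the sketch below. It assumes the docs use standard Markdown inline links of the form `[text](target)`; the helper names `relative_targets` and `broken_links` are illustrative and do not come from the PR.

```python
import re
from pathlib import Path

# Matches Markdown inline links and captures the target: [text](target)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Matches a URI scheme prefix such as "https:" or "mailto:"
SCHEME_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def relative_targets(markdown):
    """Return relative link targets, skipping URLs and pure anchors; fragments are stripped."""
    targets = []
    for target in LINK_RE.findall(markdown):
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue
        targets.append(target.split("#", 1)[0])
    return targets


def broken_links(doc_path):
    """Return the relative targets in doc_path that do not exist on disk."""
    doc = Path(doc_path)
    text = doc.read_text(encoding="utf-8")
    return [t for t in relative_targets(text) if not (doc.parent / t).exists()]
```

Targets are resolved against the directory of the document that contains them, which is why a link such as `../skills` depends on where the doc sits in the tree.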

CI (ruff, mypy, detect-secrets) was scanning explorations/agent-wiki/ as project source: it is the first content under explorations/ to carry .py files and high-entropy identifiers.

Fixes, scoped so generated example artifacts are treated like the already-excluded plugin-source/ and examples/ trees:
- ruff: lint + format fixes in the harness scripts + builder; exclude the generated wiki scripts (explorations/agent-wiki/wikis/) via extend-exclude.
- mypy: add explorations/agent-wiki/wikis/ to exclude; add file-local `# mypy: ignore-errors` to the exploration harness + the builder (a verbatim copy of the mypy-excluded plugin-source/ original).
- detect-secrets: exclude explorations/agent-wiki/ in the pre-commit hook and .secrets.baseline. The 53 findings are 12-hex guideline content hashes and session-id UUIDs, not secrets.

No example-wiki content changed (scripts keep their original names).

Fixes failing CI checks: check-formatting, check-linting, check-typing, tekton/pr-code-checks/code-detect-secrets.
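To illustrate why the detect-secrets findings are benign, here is a small sketch that triages a flagged token as a 12-hex content hash, a canonical UUID, or neither. It assumes the hashes are lowercase hex and the session ids use the hyphenated 8-4-4-4-12 UUID form; the `classify` function is a hypothetical helper and not part of detect-secrets or this repository.

```python
import re

# Assumed shape of a guideline content hash: exactly 12 lowercase hex digits.
HEX12_RE = re.compile(r"[0-9a-f]{12}")
# Canonical hyphenated UUID form (8-4-4-4-12 hex digits).
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def classify(token):
    """Label a flagged token as 'content-hash', 'session-uuid', or 'unknown'."""
    if HEX12_RE.fullmatch(token):
        return "content-hash"
    if UUID_RE.fullmatch(token):
        return "session-uuid"
    return "unknown"
```

Any token that comes back as `unknown` would still deserve a manual look before being excluded from scanning.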

Drops explorations/agent-wiki/wikis/ (253 generated files, ~10k lines) from this PR so the diff is the reviewable surface: skills, builder, docs, and the experiment reports/harness (~34 files). The example wikis are machine-generated output; bundling them buried the code and appears to have made CodeRabbit skip deep review (summary only, zero inline findings). The wikis land in a stacked follow-up PR. README/docs still reference wikis/wiki-twobatch-* by path; those links resolve once the follow-up merges. Root-config excludes (ruff/mypy/detect-secrets) are kept: the detect-secrets exclude still covers example content hashes in docs/schema.md, and the wiki excludes become live again when the follow-up lands.

The four benchmark-derived example wikis built by the agent-wiki skills: wiki-twobatch {base, skills, both, pruned}. Generated artifacts — each page is machine-emitted by build_agent_wiki.py from the trajectories, with provenance back-links shown in the generic trajectories/<session-id>.json form. Stacked on the code PR (AgentToolkit#268); resolves the wikis/wiki-twobatch-* references in that PR's README/docs.

coderabbitai · 2026-06-10T07:06:39Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c31b019f-d4b1-4a6f-8d0d-16a6a2e97199

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


vinodmut added 5 commits June 10, 2026 00:54

vinodmut mentioned this pull request Jun 10, 2026

explore(agent-wiki): trajectory-derived wiki — skills, builder, experiments #268

Open

vinodmut wants to merge 5 commits into AgentToolkit:main from vinodmut:explorations/agent-wiki-wikis
