explore(agent-wiki): example wikis (companion to #268) #269

Draft
vinodmut wants to merge 5 commits into AgentToolkit:main from vinodmut:explorations/agent-wiki-wikis

vinodmut (Contributor) commented Jun 10, 2026

Companion to #268 (the agent-wiki code PR) — merge AFTER #268

Note on the diff: this branch is based on the #268 branch, but GitHub
can't target a fork-only branch as the PR base, so against main it
currently shows #268's files too. Once #268 merges, this diff collapses
to just the example wikis below.
Review #268 first.

Adds the four benchmark-derived example wikis built by the agent-wiki skills, split out from #268 so that PR's diff stays focused on reviewable code (builder, skills, docs, experiment harness) instead of ~10k lines of generated output.

explorations/agent-wiki/wikis/
├── wiki-twobatch/          16-task corpus, guidelines arm
├── wiki-twobatch-skills/   skills-only arm
├── wiki-twobatch-both/     skills + guidelines
└── wiki-twobatch-pruned/   skills + only no-skill-coverage atomics (delete-on-promote)

These are generated artifacts — every page is emitted by build_agent_wiki.py from trajectories, not hand-authored. Provenance back-links appear in the generic trajectories/<session-id>.json form. Merging this resolves the wikis/wiki-twobatch-* references in #268's README and docs.
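A reviewer who wants to spot-check the provenance form could use something like the sketch below. It assumes session ids are UUIDs and that each back-link is a plain relative path; neither detail is confirmed by this PR, and `non_generic_links` is a hypothetical helper, not part of build_agent_wiki.py.

```python
import re

# Assumption: a back-link looks like trajectories/<session-id>.json and the
# session id is a lowercase UUID. Adjust the pattern if the real ids differ.
GENERIC_LINK = re.compile(
    r"trajectories/"
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
    r"\.json"
)


def non_generic_links(links):
    """Return the links that do not match the generic provenance form."""
    return [link for link in links if not GENERIC_LINK.fullmatch(link)]


# Made-up sample values for illustration only.
sample = [
    "trajectories/123e4567-e89b-12d3-a456-426614174000.json",
    "/home/someone/runs/123e4567-e89b-12d3-a456-426614174000.json",
]
```

Calling `non_generic_links(sample)` returns only the second entry, the absolute path that escaped genericization.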

vinodmut added 5 commits June 10, 2026 00:54
Adds explorations/agent-wiki/ — the agent-wiki skill family, builder, design
+ schema docs, the wiki-helps experiment reports, and benchmark-derived
example wikis, all under one tree suitable for a public PR.

Contents:
  - skills/        7 agent-wiki skills + build_agent_wiki.py (reference copy,
                   not plugin-wired)
  - docs/          design.md + schema.md
  - experiments/   RESULTS-SUMMARY + twobatch comparison reports +
                   pruned-index-hypothesis; metrics/ rollups (no raw
                   transcripts); harness/ runner + compare scripts
  - wikis/         wiki-terminalbench-bob + the twobatch arms
                   (base / skills / both / pruned-corrected)

Public-safety scrub:
  - Excluded all raw per-trial sandbox transcripts (kept only metric
    rollups + narrative reports).
  - Excluded wikis built from internal corpora (procedural-design,
    consult-meta, iterative, retroactive, simple-claude, test-paired,
    claude) and the build-pattern comparison that ran on them; §3-4 of
    RESULTS-SUMMARY reduced to a portable-finding note.
  - Rewrote all source-path frontmatter to the generic
    trajectories/<session-id>.json form; genericized internal example
    names and the benchmark-data dir convention in skills/docs.
  - Leak gate (benchmark-data / internal corpus + wiki names / org paths)
    passes with zero hits across the tree.
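
The leak gate itself is not included in this PR. As a rough illustration of the idea, a gate of this kind can be sketched as a deny-list scan over the tree; the patterns below are hypothetical placeholders, and `leak_hits` and `gate_passes` are names invented for this sketch.

```python
import re
from pathlib import Path

# Hypothetical deny-list; the real gate's patterns are not shown in this PR.
FORBIDDEN = [
    re.compile(pattern)
    for pattern in (r"benchmark-data", r"internal-corpus-name", r"/orgs/example-org/")
]


def leak_hits(root):
    """Yield (path, line_number, line) for each line matching a forbidden pattern."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip files that are not UTF-8 text
        for number, line in enumerate(text.splitlines(), start=1):
            if any(pattern.search(line) for pattern in FORBIDDEN):
                yield path, number, line


def gate_passes(root):
    """The gate passes only when the scan finds zero hits."""
    return next(leak_hits(root), None) is None
```

Running `gate_passes("explorations/agent-wiki")` would then correspond to the "zero hits across the tree" condition, under the assumption that the deny-list covers the same names the real gate checks.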

Branched off main; diff touches only explorations/agent-wiki/. Builder
catalog + comparison scripts verified runnable from the new location.

Removes the terminal-bench example wiki from the exploration. Repoints the
README reading-order + layout to wiki-twobatch-skills, fixes the docs that
attributed worked examples to it (schema.md now points at the wiki-twobatch
arms; example index rows retagged), and corrects stale relative links the
docs carried from the original tree (../plugin-source → ../skills,
../WIKIS.md removed, ../experiments/wiki-build-comparison.md → RESULTS-SUMMARY
§3–4, design.md/schema.md cross-links to renamed filenames). Skill example
paths (consult, ingest) repointed off the removed wiki.
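
One way to confirm that repointed relative links like these resolve is a small checker over the Markdown files. This is a minimal sketch, not the check the author ran: it only understands inline `[text](target)` links, and `broken_relative_links` is a hypothetical helper.

```python
import re
from pathlib import Path

# Matches inline Markdown links: [text](target). Reference-style links and
# HTML anchors are deliberately out of scope for this sketch.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
URL_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_relative_links(root):
    """Return (document, target) pairs whose relative target does not exist."""
    broken = []
    for doc in sorted(Path(root).rglob("*.md")):
        for target in LINK.findall(doc.read_text(encoding="utf-8")):
            if URL_SCHEME.match(target) or target.startswith("#"):
                continue  # external URL or same-page anchor
            relative = target.split("#", 1)[0]  # drop any fragment
            if not (doc.parent / relative).exists():
                broken.append((str(doc), target))
    return broken
```

An empty return value from `broken_relative_links("explorations/agent-wiki")` would match the "all intra-doc relative links resolve" claim, at least for inline links.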

Remaining wikis: wiki-twobatch {base, skills, both, pruned}. All intra-doc
relative links resolve; leak gate clean.

CI (ruff, mypy, detect-secrets) was scanning explorations/agent-wiki/ as
project source — the first content under explorations/ to carry .py files
and high-entropy identifiers. Fixes, scoped so generated example artifacts
are treated like the already-excluded plugin-source/ and examples/ trees:

- ruff: lint + format fixes in the harness scripts + builder; exclude the
  generated wiki scripts (explorations/agent-wiki/wikis/) via extend-exclude.
- mypy: add explorations/agent-wiki/wikis/ to exclude; add file-local
  `# mypy: ignore-errors` to the exploration harness + the builder (a
  verbatim copy of the mypy-excluded plugin-source/ original).
- detect-secrets: exclude explorations/agent-wiki/ in the pre-commit hook
  and .secrets.baseline — the 53 findings are 12-hex guideline content
  hashes and session-id UUIDs, not secrets.
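
To illustrate the triage behind that exclude, the sketch below sorts flagged strings into the two benign shapes named above and leaves everything else for manual review. It is an assumption-laden example: it does not use or parse detect-secrets output, and `classify` is a hypothetical helper.

```python
import re

# Shapes taken from the commit message: 12-hex content hashes and UUIDs.
HEX12 = re.compile(r"[0-9a-f]{12}")
UUID = re.compile(r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")


def classify(token):
    """Label a flagged string as a known-benign shape or as needing review."""
    if HEX12.fullmatch(token):
        return "content-hash"
    if UUID.fullmatch(token):
        return "session-id"
    return "review-manually"
```

A finding that falls into `review-manually` is the case where a blanket path exclude would hide a real secret, so a triage pass like this is worth doing before adding the exclude.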

No example-wiki content changed (scripts keep their original names).
Fixes failing CI checks: check-formatting, check-linting, check-typing,
tekton/pr-code-checks/code-detect-secrets.

Drops explorations/agent-wiki/wikis/ (253 generated files, ~10k lines) from
this PR so the diff is the reviewable surface — skills, builder, docs, and
the experiment reports/harness (~34 files). The example wikis are machine-
generated output; bundling them buried the code and appears to have made
CodeRabbit skip deep review (summary only, zero inline findings).

The wikis land in a stacked follow-up PR. README/docs still reference
wikis/wiki-twobatch-* by path; those links resolve once the follow-up
merges. Root-config excludes (ruff/mypy/detect-secrets) are kept — the
detect-secrets exclude still covers example content hashes in docs/schema.md,
and the wiki excludes become live again when the follow-up lands.

The four benchmark-derived example wikis built by the agent-wiki skills:
wiki-twobatch {base, skills, both, pruned}. Generated artifacts — each page
is machine-emitted by build_agent_wiki.py from the trajectories, with
provenance back-links shown in the generic trajectories/<session-id>.json
form. Stacked on the code PR (AgentToolkit#268); resolves the wikis/wiki-twobatch-*
references in that PR's README/docs.

coderabbitai Bot commented Jun 10, 2026


Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

