skill: replace before-and-after with persona-driven version by akanksha276 · Pull Request #344 · mycelium-io/mycelium

akanksha276 · 2026-06-10T17:03:00Z

Summary

Replaces .claude/skills/before-and-after/ with .claude/skills/persona-before-and-after/, a superset that supports three persona input modes:

- Dataset personas (recommended): agents built from versioned preference + strategy files in the agent-personas repo; reproducible across runs
- Inline personas: describe agents yourself in the prompt; the skill writes SOUL.md from your description (same flexibility as the old before-and-after skill)
- Custom SOUL.md: bring your own files, same as before

Key additions over the old skill:

- Before-case uses preference-only personas (no strategy injection) as a clean control; after-case injects the full negotiation strategy, isolating the Mycelium protocol contribution
- summarize_experiments.py aggregates results across multiple experiment gists, scores issue/option recall/F1 against all_missions_set1_gold.json, and synthesises a verdict via LLM
- Validated across 9 experiments (ex01–ex09); results at https://gist.github.com/akanksha276/b7f61c246891e0e335666f034849b90d
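
The recall/F1 scoring mentioned above can be pictured as a set comparison against the gold labels. This is a minimal sketch under stated assumptions, not the code in summarize_experiments.py: the function name `score_issues` and the sample labels are hypothetical, and it uses exact matching after lowercasing, whereas the script itself relies on fuzzy matching.

```python
def score_issues(found, gold):
    """Return (precision, recall, f1) for found labels against gold labels.

    Assumption: labels match only if equal after trimming and lowercasing.
    """
    found_set = {label.strip().lower() for label in found}
    gold_set = {label.strip().lower() for label in gold}
    hits = len(found_set & gold_set)
    precision = hits / len(found_set) if found_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels: two of four gold issues were surfaced.
precision, recall, f1 = score_issues(
    ["Budget", "venue"], ["budget", "venue", "date", "catering"]
)
```

With two of four gold issues found and no spurious ones, precision is 1.0 and recall is 0.5.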

Test plan

- Run persona-before-and-after list to verify persona dataset clone works
- Run one experiment end-to-end with dataset personas (e.g. ex03_personal_planning, the fastest at ~4 rounds)
- Run one experiment with inline personas to verify the old workflow still works
- Run summarize_experiments.py against the 9 published gists and verify output matches the summary gist
- Confirm no other skill or doc references the removed before-and-after skill

…ter skill

anthropic/claude-haiku-* requires a direct Anthropic API key, which isn't configured in this environment; agents auth through litellm. Fixes the "No API key found for provider anthropic" error on agent bootstrap.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…nd full (after) SOUL.md

Before case agents were contaminating the control by spontaneously using `mycelium session join`: their SOUL.md included the strategy/negotiate block, which describes the CLI protocol.

Phase 0.7 now produces two files:
- exp_personas_before.json: preference parts only (keys not in {negotiate, general})
- exp_personas_after.json: full persona (preference + strategy)

Phase 1b writes preference-only SOUL.md for the before case. Phase 3a rewrites SOUL.md with the full persona before the after case runs. All other references (agent list, openclaw config, prompt derivation, cleanup) updated to use the appropriate file.
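
The Phase 0.7 split described above could look roughly like the following. This is a sketch only: it assumes each persona is a flat dict whose strategy content lives under the top-level keys `negotiate` and `general`, and the helper names `split_persona` and `write_persona_files` are hypothetical. Only the two output file names come from the commit message.

```python
import json
from pathlib import Path

# Assumed from the commit message: these top-level keys hold strategy content.
STRATEGY_KEYS = {"negotiate", "general"}


def split_persona(persona: dict) -> tuple:
    """Return (before, after): preference-only parts and the full persona."""
    before = {key: value for key, value in persona.items() if key not in STRATEGY_KEYS}
    after = dict(persona)
    return before, after


def write_persona_files(personas: dict, out_dir: Path) -> None:
    """Write both persona files for a hypothetical {agent_name: persona} mapping."""
    before_all, after_all = {}, {}
    for agent, persona in personas.items():
        before_all[agent], after_all[agent] = split_persona(persona)
    (out_dir / "exp_personas_before.json").write_text(json.dumps(before_all, indent=2))
    (out_dir / "exp_personas_after.json").write_text(json.dumps(after_all, indent=2))
```

Keeping the filter in one place means the before-case SOUL.md cannot pick up strategy text by accident, which is the contamination this commit addresses.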

…case seed

Agents were prematurely declaring 'CONSENSUS LOCKED' in chat before CognitiveEngine confirmed anything, which confused other agents about the actual negotiation state.

Phase 3b seed now instructs agents to:
- Never self-declare consensus: only CE's 'consensus' message is authoritative
- On CE timeout/broken: post a final message with their last accepted position so the transcript has a readable end state for evaluation

Also caps per-turn chat narration to 1-2 sentences.

…uation

- Summary table now tracks messages exchanged (both cases) and engine ticks/rounds (after case only) instead of the rounds-centric metric that had no equivalent in the before case
- Add message-counting script before report generation so values are auto-derived from transcripts rather than hand-filled

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…e counting

Phase 5 now copies before/after transcripts, session transcripts, and ingest logs into ~/.mycelium/rooms/${EXP_ID}/ before deleting room dirs. Phase 4b gist staging reads from the eval dir first, falling back to live room dirs. Required for summarize_experiments.py to compute issue recall/F1 scores.
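
The "eval dir first, live room dirs second" lookup in Phase 4b can be sketched as a simple ordered search. The function name `resolve_artifact` and its parameters are hypothetical, and the directory layout is an assumption; the skill itself may implement this in shell.

```python
from pathlib import Path
from typing import Iterable, Optional


def resolve_artifact(name: str, eval_dir: Path, room_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first existing copy of `name`, preferring the eval dir.

    Assumption: artifacts keep the same file name in every location.
    """
    for directory in (eval_dir, *room_dirs):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None
```

Returning `None` rather than raising lets the caller decide whether a missing transcript should abort gist staging or only skip the recall/F1 scoring.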

…ter analysis

…gest events, fuzzy match

- Add Before/After Negotiation Moves columns (parsed from evaluation.md Summary table)
- Remove Before/After Rounds columns
- Parse ingest events directly from *-ingest-stats.json gist files
- Fix after-case issue recall (0%) with containment-based fuzzy matching + lower Jaccard threshold (0.5→0.3)

Signed-off-by: akanksha276 <akanksha276@gmail.com>
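
For readers unfamiliar with the matching fix above, here is one way containment plus a Jaccard threshold can be combined. It is a sketch under assumptions: the name `fuzzy_match`, the word-level tokenisation, and the token-set definition of containment are all illustrative, and only the 0.3 threshold comes from the commit message.

```python
import re


def _tokens(text: str) -> set:
    """Lowercase word tokens; punctuation is ignored."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def fuzzy_match(found: str, gold: str, jaccard_threshold: float = 0.3) -> bool:
    """Match if one label's tokens contain the other's, or Jaccard >= threshold."""
    a, b = _tokens(found), _tokens(gold)
    if not a or not b:
        return False
    if a <= b or b <= a:  # containment: one token set is a subset of the other
        return True
    return len(a & b) / len(a | b) >= jaccard_threshold
```

For example, "weekly meeting time" and "meeting time slot" share 2 of 4 distinct tokens (Jaccard 0.5), so they match at 0.3 but would not at a 0.6 threshold.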

Before moves = non-facilitator chat messages in before-transcript.md. After moves = direct session actions in after-session-transcript.md. Removes dependency on evaluation.md table being manually populated. Signed-off-by: akanksha276 <akanksha276@gmail.com>

Signed-off-by: akanksha276 <akanksha276@gmail.com>

…table Signed-off-by: akanksha276 <akanksha276@gmail.com>

Signed-off-by: akanksha276 <akanksha276@gmail.com>

- Replace per-experiment verdict concatenation with LLM synthesis via LiteLLM proxy (haiku by default, SUMMARY_MODEL env override)
- Falls back to concatenation if LLM call fails
- Remove Aggregate Statistics table from report
- Strip markdown headers from LLM output

Signed-off-by: akanksha276 <akanksha276@gmail.com>
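
The synthesis-with-fallback behaviour listed above can be sketched with the LLM call injected as a plain callable, which keeps the example independent of any proxy client. All names here (`synthesize_verdict`, `strip_markdown_headers`, `call_llm`) and the prompt wording are hypothetical, and dropping whole header lines is one assumed reading of "strip markdown headers".

```python
import re
from typing import Callable, Sequence


def strip_markdown_headers(text: str) -> str:
    """Drop lines that are markdown headers (e.g. '## Verdict')."""
    kept = [line for line in text.splitlines() if not re.match(r"\s{0,3}#{1,6}\s", line)]
    return "\n".join(kept).strip()


def synthesize_verdict(verdicts: Sequence[str], call_llm: Callable[[str], str]) -> str:
    """Ask the LLM for one combined verdict; concatenate if the call fails."""
    prompt = "Synthesise one overall verdict from these per-experiment verdicts:\n\n"
    prompt += "\n\n".join(verdicts)
    try:
        return strip_markdown_headers(call_llm(prompt))
    except Exception:
        # Fallback path: plain concatenation of the per-experiment verdicts.
        return " ".join(verdict.strip() for verdict in verdicts)
```

In practice `call_llm` would wrap whatever client talks to the LiteLLM proxy and read the model name from SUMMARY_MODEL; that wiring is left out because its details are not given here.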

- Add inline persona option (Phase 0.55) alongside dataset path
- Fix all backend curl paths to use /api/ prefix
- Fix plugin path: adapters/ → integrations/openclaw/assets/
- Upgrade Phase 0.8 to check configured openclaw model first
- Update agent-personas repo to mycelium-io org

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The Est. input tokens regex was anchored with \s* after the label, causing it to miss rows like:

| Est. input tokens (exp-2318 total) | ...
| Est. input tokens (total buffer) | ...

Relaxed to [^|]* so any text between the label and the first pipe is consumed, fixing blank #Tokens for ex03 and ex04.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
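
The difference between the two anchors is easy to reproduce in isolation. The patterns below are reconstructions for illustration, not the exact expressions in summarize_experiments.py, and the token values in the sample rows are made up.

```python
import re

# Hypothetical rows; only the label variants come from the commit message.
ROWS = [
    "| Est. input tokens | 12000 |",
    "| Est. input tokens (exp-2318 total) | 34000 |",
    "| Est. input tokens (total buffer) | 56000 |",
]

# Old behaviour: only whitespace is allowed between the label and the pipe.
STRICT = re.compile(r"\|\s*Est\. input tokens\s*\|\s*([\d,]+)")
# New behaviour: any non-pipe text may follow the label.
RELAXED = re.compile(r"\|\s*Est\. input tokens[^|]*\|\s*([\d,]+)")

strict_hits = [bool(STRICT.search(row)) for row in ROWS]
relaxed_values = [RELAXED.search(row).group(1) for row in ROWS]
```

The strict pattern matches only the first row, while the relaxed pattern captures a value from all three, which is the blank #Tokens symptom and its fix.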

…only Signed-off-by: akanksha276 <akanksha276@gmail.com>

…atching

Add _extract_issues_from_ce_consensus() to parse CognitiveEngine consensus/plan lines from each transcript format observed across experiments:
- [coordination_consensus] CognitiveEngine: {JSON} (inline, ex07 style)
- [coordination_consensus] CognitiveEngine:\n{JSON} (next-line, ex01 style)
- [CognitiveEngine] {JSON} (inline prefix, ex09 style)
- **CognitiveEngine:**\n{JSON} (markdown, exp-4919 style)

Extraction uses three strategies in priority order: full JSON parse of the "assignments" dict, regex over truncated "assignments" blocks, and semicolon-delimited "plan": "key=value" string parsing.

Add _EmbedMatcher using fastembed BAAI/bge-small-en-v1.5 (threshold 0.70) to replace Jaccard-only fuzzy matching. Auto-detects the Mycelium backend venv; falls back to Jaccard when fastembed is unavailable. Eliminates 0% after-recall caused by CE using agent-generated labels rather than gold standard labels.

After-recall improvement across 9 experiments: ex04 8%→108%, ex05 0%→50%, ex06 0%→55%, ex09 38%→100%. Trim candidates reduced from 29 to 17.
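
The three extraction strategies can be sketched as a cascade over a single consensus payload. This is an illustration under assumptions, not the body of _extract_issues_from_ce_consensus(): the function names `extract_issues` and `_issues_from_plan` are hypothetical, the payload shapes are inferred from the commit message, and locating the payload within each transcript format is left out.

```python
import json
import re


def _issues_from_plan(plan: str) -> list:
    """Parse a semicolon-delimited 'key=value; key=value' plan string."""
    return [part.split("=", 1)[0].strip() for part in plan.split(";") if "=" in part]


def extract_issues(payload: str) -> list:
    """Return issue names from a CE consensus payload, trying three strategies."""
    # Strategy 1: the payload is complete JSON.
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        data = None
    if isinstance(data, dict):
        assignments = data.get("assignments")
        if isinstance(assignments, dict) and assignments:
            return list(assignments)
        plan = data.get("plan")
        return _issues_from_plan(plan) if isinstance(plan, str) else []
    # Strategy 2: truncated JSON; collect quoted keys after "assignments": {
    # (a sketch: keys that follow the block would also be picked up).
    block = re.search(r'"assignments"\s*:\s*\{(.*)', payload, re.DOTALL)
    if block:
        keys = re.findall(r'"([^"]+)"\s*:', block.group(1))
        if keys:
            return keys
    # Strategy 3: truncated JSON that still carries a "plan" string.
    plan_match = re.search(r'"plan"\s*:\s*"([^"]*)', payload)
    return _issues_from_plan(plan_match.group(1)) if plan_match else []
```

Trying the strict parse first means the regex paths only run on payloads that were cut off in the transcript, where some loss of precision is the price of getting any labels at all.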

Replace Jaccard 0.4 threshold with embedding cosine similarity (0.65) when comparing found option values against gold option strings. CE negotiated values are paraphrases of gold options, not near-copies — Jaccard was producing 0% after-option-recall across all experiments. After-option-recall improvement: ex03 0%→58%, ex04 0%→81%, ex05 0%→64%, ex06 0%→39%, ex09 0%→71%.
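
The cosine-similarity comparison described above reduces to picking the closest gold option and applying the threshold. In this sketch the embedding step is passed in as a callable so the example runs without fastembed; `best_gold_match`, `cosine`, and the `embed` parameter are hypothetical names, and only the 0.65 threshold comes from the commit message.

```python
import math
from typing import Callable, List, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_gold_match(
    value: str,
    gold_options: List[str],
    embed: Callable[[List[str]], List[Sequence[float]]],
    threshold: float = 0.65,
) -> Optional[str]:
    """Return the gold option most similar to `value`, or None below threshold."""
    if not gold_options:
        return None
    vectors = embed([value, *gold_options])
    query, candidates = vectors[0], vectors[1:]
    scored = [(cosine(query, vec), option) for vec, option in zip(candidates, gold_options)]
    score, option = max(scored, key=lambda pair: pair[0])
    return option if score >= threshold else None
```

A real `embed` would wrap the embedding model; injecting it also makes the Jaccard fallback a matter of swapping the scoring function rather than changing callers.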

Add _extract_options_from_ce_consensus() to extract {issue: resolved_value} pairs from CE consensus/plan lines (assignments dict and plan key=value string). These are added to the options dict so the value matcher can compare the CE's single agreed value against gold option descriptions. Fixes 0% after-option-recall for ex01/ex07/ex08 where no offer-tick payloads were present.

Also upgrade compute_option_metrics() key routing to use embedding similarity (threshold 0.70) in addition to Jaccard/word-overlap. CE plan keys like "broad diversification required" don't word-overlap with gold issue names like "sector exposure", but embedding similarity is 0.69+.

Remaining 0% after-option-recall for ex07/ex08 reflects genuine CE coverage gaps (plan only resolved 2-3 issues vs 10-11 gold), not a matching failure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Signed-off-by: akanksha276 <akanksha276@gmail.com>

akanksha276 and others added 30 commits May 5, 2026 09:34

add persona-before-and-after skill

e8ac855

Merge remote-tracking branch 'origin/main' into persona-before-and-after

d3c9e4b

Merge remote-tracking branch 'origin/main' into persona-before-and-after

9c29c35

Merge remote-tracking branch 'origin/main' into persona-before-and-after

ddfd095

docs(skill): fix before-case contamination and improve Phase 4 messag…

543f765

…e counting

feat(skill): add summarize_experiments.py for aggregate before-and-af…

7995189

…ter analysis

fix(skill): normalize consensus columns to Yes/No only

75183cf

Signed-off-by: akanksha276 <akanksha276@gmail.com>

fix(skill): remove Before Score and After Score columns from summary …

819fc24

…table Signed-off-by: akanksha276 <akanksha276@gmail.com>

fix(skill): round ingest token counts to nearest k

5b1638f

Signed-off-by: akanksha276 <akanksha276@gmail.com>

feat(skill): replace Per-Experiment Issue Coverage with Verdicts section

eb63b2a

Signed-off-by: akanksha276 <akanksha276@gmail.com>

feat(skill): combine verdicts into single paragraph

f603fb1

Signed-off-by: akanksha276 <akanksha276@gmail.com>

fix(summarize): parse 'Agent messages' row as negotiation moves fallback

6056841

fix(summarize): fix before-move counter to match exp- agent headings …

a9fc3b1

…only Signed-off-by: akanksha276 <akanksha276@gmail.com>

fix(summarize): raise max_tokens for verdict synthesis from 300 to 600

87f39a9

widen negotiation-moves column to avoid truncating round labels

0a4edec

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merge remote-tracking branch 'origin/main' into persona-before-and-after

5cf23bf

remove before-and-after skill, superseded by persona-before-and-after

fc3637c

Signed-off-by: akanksha276 <akanksha276@gmail.com>

akanksha276 requested a review from juliarvalenti as a code owner June 10, 2026 17:03

akanksha276 wants to merge 30 commits into main from persona-before-and-after
