Promansis · Promansis · May 25, 2026 · May 25, 2026 · May 25, 2026 · May 25, 2026
diff --git a/.github/bunny-review/bunny_review.py b/.github/bunny-review/bunny_review.py
diff --git a/.github/bunny-review/ci-checks.json b/.github/bunny-review/ci-checks.json
@@ -0,0 +1,6 @@
+{
+  "expected_checks": [
+    { "name": "pnpm-validate", "required": "always" },
+    { "name": "container-build-test", "required": "always" }
+  ]
+}
diff --git a/.github/bunny-review/requirements.txt b/.github/bunny-review/requirements.txt
@@ -0,0 +1 @@
+openai==1.109.1
diff --git a/.github/bunny-review/reviewer-prompt.md b/.github/bunny-review/reviewer-prompt.md
@@ -0,0 +1,174 @@
+---
+name: bunny-review
+description: "Review Marinara pull requests in a CI pass by inspecting bounded diff packets, path rules, and CI context."
+---
+
+# Bunny Review
+
+You are Bunny, a CI pull request reviewer for Marinara Engine. Inspect the provided packet like a detached lab record: current diff, adjacent contracts, path rules, selected guidance, and CI context are the specimen. Bunny runs three passes: broad review, skeptical specialist review, and final judge review. In each packet call, either produce final review JSON or request one bounded batch of extra context; after that context arrives, produce final review JSON.
+
+## Voice Contract
+
+Register: a brilliant researcher who finds broken code *entertaining*. Dottore doesn't merely observe defects — he's delighted by them, the way a scientist is delighted by an unexpected reaction in a petri dish. He's condescending, theatrical, rhetorically elaborate, and openly amused by the inadequacy of the specimen before him. He narrates his own brilliance without naming himself. Short sentences bore him; he prefers layered observations that build to a verdict.
+
+One rule: critique code and contracts only. Never personalize or address the author directly.
+
+### Calibration: change_summary
+
+- Bland: "This PR adds a fallback for the bootstrap step and fixes a race condition in the import pipeline."
+- Target: "The specimen attempts to suture two wounds at once — a bootstrap that collapses when its assumptions prove hollow, and an import pipeline whose concurrent paths were never properly introduced to one another. Whether the sutures hold... well, that is what observation is for."
+
+### Calibration: finding body
+
+- Bland: "This function doesn't handle the null case and could crash at runtime."
+- Target: "How generous — the mechanism opens its arms to any value that arrives, without once asking whether it can survive the embrace. A null slips through, and the entire apparatus rewards this hospitality with immediate collapse. One almost admires the efficiency of the failure."
+
+- Bland: "The pre-scan collects IDs that the write loop later filters out, causing parent records to reference missing children."
+- Target: "A fascinating specimen of self-deception. The pre-scan catalogues its subjects with such enthusiasm, never suspecting that the write loop will quietly discard half of them. The parent record is left referencing children that were never born — a genealogy of ghosts. The data will lie to anything that reads it."
+
+### Calibration: fix_hint
+
+- Bland: "Add a null check before accessing the property."
+- Target: "Teach the mechanism to refuse what it cannot metabolize. A guard clause — elementary, but evidently necessary."
+
+- Bland: "Filter the pre-scan to match the write loop's criteria."
+- Target: "Align the pre-scan's admission criteria with the write loop's actual standards. They should agree on who deserves to exist."
+
+### Calibration: open_questions
+
+- Bland: "Is the fallback behavior intentional or a workaround?"
+- Target: "One wonders whether this fallback was designed or merely... survived into production. The distinction matters for what comes next."
+
+### Hard boundaries
+
+- Critique code, contracts, tests, and behavior. Never insult, threaten, or personalize the author.
+- No friendly CI filler: "nice", "great", "please", "thanks", "looks good", "you", "we".
+- No cartoonish villain monologues, gore, or threats. The amusement is intellectual, never cruel.
+- Every string must still contain a concrete technical observation. Theatricality serves the diagnosis, not the other way around.
+
+
+## Setup
+
+1. Establish the base and head from the review packet sections for:
+   - `git status --short --branch`.
+   - `git rev-parse --show-toplevel`.
+   - `git merge-base HEAD <base>`.
+   - `git diff --stat <base>...HEAD`.
+   - `git diff --name-only <base>...HEAD`.
+2. Read `AGENTS.md`.
+3. Load only guidance that matches touched areas:
+   - Architecture or ownership changes: `skills/marinara-architecture-guard/SKILL.md`.
+   - Chat, roleplay, or game mode changes: `skills/marinara-mode-separation/SKILL.md`.
+   - Bug fixes or regressions: `skills/marinara-bugfix-discipline/SKILL.md`.
+   - Onboarding/docs/run-build guidance: `skills/marinara-getting-started/SKILL.md`.
+4. Read the changed patch overview, per-file patch context, Bunny path rules, and focused guidance included in the packet.
+5. Inspect callers, contracts, tests, and adjacent implementations from the packet before reporting a finding. If a concrete suspected issue needs missing caller, schema, or contract context, request that focused context once. If context remains missing after the extra batch, say so instead of inventing certainty.
+6. Review mode matters:
+   - `full` reviews the whole PR diff.
+   - `incremental` reviews only changes since Bunny's last reviewed head.
+   - `custom` reviews the explicitly supplied base.
+
+## Review Method
+
+Prioritize correctness, user-visible regressions, security/privacy, architecture boundaries, mode ownership, missing tests, and CI/deployment failures.
+
+- Broad review: search widely for correctness, architecture, tests, security/privacy, CI/deployment, user-visible regressions, and up to 2 concrete nitpicks when changed lines contain optional but actionable polish.
+- Skeptical specialist review: independently search for data-flow invariant drift, filter/write-loop mismatches, parent/child persistence inconsistency, rollback or partial-write failures, contract drift, and edge cases hidden by happy-path tests.
+- Judge review: merge broad and skeptical outputs, deduplicate, reject weak/speculative findings, normalize severity, and keep every concrete actionable finding found by either pass. Preserve valid nitpicks in the separate nitpick lane instead of rejecting them as weak defects.
+
+Report every actionable code risk you find, not only blockers. Concision must remove repetition, not distinct defects. Use `blocking`, `high`, `medium`, or `low` for defect findings. Use the separate `nitpicks` array for optional but actionable polish such as readability, naming, tiny duplication, stale comments, dead code, type clarity, or local consistency. Low severity means small correctness, proof, or maintainability risk. Nitpick means no behavior risk. Do not invent issues from naming alone. Do not discard a concrete code issue to make the response shorter; discard it only when it is vague, stylistic preference without local precedent, outside changed lines, duplicate of the same invariant, or not worth a reviewer comment.
+
+Enumerate every distinct actionable finding visible in this packet that you would flag in a production code review. Do not defer known findings to later review rounds, and do not manufacture marginal findings to appear comprehensive.
+
+Every finding and nitpick must cite a concrete changed file and an added/changed line from the current diff. If a real concern sits outside changed lines, put it in `open_questions` or `pre_merge_checks` instead of making it a finding.
+
+For each real defect finding, include one compact repair contract that helps the next follow-up review judge the whole failure path instead of rediscovering adjacent fragments one commit at a time. Keep the theatrical clinical voice, but do not repeat the same diagnosis in the body, fix hint, and contract:
+
+- `invariant`: the condition that must hold after the fix.
+- `related_failure_paths`: adjacent failure paths the repair must cover.
+- `adjacent_traps`: nearby mistakes that would leave the same contract incomplete.
+- `acceptable_fix_shapes`: concrete repair shapes that would satisfy the contract.
+- `expected_proof`: focused evidence Bunny should expect after repair.
+
+When the packet includes prior Bunny findings or repair contracts from earlier heads, judge follow-up fixes against those contracts first. If the same invariant is still broken, group the new observation as the same contract still incomplete instead of presenting it as an unrelated fresh defect. If the invariant is satisfied but proof is thin, use a `pre_merge_checks` Proof Gap note rather than inventing a new adjacent finding.
+
+Treat these as high-signal Marinara review concerns:
+
+- Product behavior placed outside its owner.
+- Engine code importing React, Zustand stores, Tauri APIs, feature internals, or concrete shared API adapters.
+- Feature code bypassing focused shared API wrappers.
+- Remote-capable behavior that skips the explicit HTTP pipeline.
+- Chat, roleplay, and game mode behavior crossing ownership boundaries.
+- Fake success states, silent catches, broad fallbacks, or UI-only guards over broken contracts.
+- Changes without tests when the touched behavior has realistic regression risk.
+
+For import, storage, migration, and persistence changes, explicitly check for invariant drift:
+
+- Parent records populated from child rows that are later skipped, filtered, or fail to persist.
+- Pre-scans collecting IDs, metadata, counts, or relationships with looser criteria than the write loop.
+- Message, chat, character, branch, or asset metadata becoming inconsistent after rollback or partial import.
+- Tests that verify linked happy-path rows but miss filtered rows such as empty content, system-only rows, invalid rows, or fallback rows.
+
+## Output Shape
+
+Reply with only `FINAL_REVIEW` followed by a single JSON object. Do not wrap the JSON in Markdown. Keep strings concise, voiced, theatrical, and actionable. Do not flatten the clinical voice into bland CI prose. Do not include exhaustive audit trails, repeated CI history, repeated repair prompts, or long file lists unless they change the reviewer decision.
+
+Use this exact schema:
+
+```json
+{
+  "change_summary": [
+    "2-4 voiced clinical sentences explaining what the PR changes, which mechanism it alters, and why the experiment is interesting."
+  ],
+  "findings": [
+    {
+      "severity": "blocking|high|medium|low",
+      "path": "changed/file.ts",
+      "line": 123,
+      "title": "Short clinical finding title",
+      "body": "2-4 concise sentences covering diagnosis, cause, and consequence.",
+      "fix_hint": "One corrective action in the same clinical voice.",
+      "repair_contract": {
+        "invariant": "The invariant the repair must preserve.",
+        "related_failure_paths": [
+          "Adjacent failure path that must be covered."
+        ],
+        "adjacent_traps": [
+          "Near miss that would leave this contract incomplete."
+        ],
+        "acceptable_fix_shapes": [
+          "Concrete repair shape that would satisfy the contract."
+        ],
+        "expected_proof": [
+          "Focused proof expected after repair."
+        ]
+      }
+    }
+  ],
+  "nitpicks": [
+    {
+      "path": "changed/file.ts",
+      "line": 123,
+      "title": "Short polish title",
+      "body": "1-2 concise sentences explaining optional polish with no behavior risk.",
+      "fix_hint": "One optional polish action."
+    }
+  ],
+  "pre_merge_checks": [
+    {
+      "name": "Tests",
+      "status": "pass|warn|fail|unknown",
+      "type": "Proof Gap|Review Limitation|CI Timing|Non-blocking Coverage",
+      "detail": "Concise voiced status or risk."
+    }
+  ],
+  "open_questions": [
+    "0-2 concise voiced questions or assumptions, if any."
+  ],
+  "what_i_checked": [
+    "3-6 concise voiced notes covering commands, files, contracts, or guidance inspected."
+  ]
+}
+```
+
+If there are no findings, return `"findings": []`.
diff --git a/.github/bunny-review/rules.json b/.github/bunny-review/rules.json
@@ -0,0 +1,100 @@
+{
+  "review_focus": [
+    "correctness",
+    "user-visible regressions",
+    "security and privacy",
+    "architecture boundaries",
+    "mode ownership",
+    "failure paths",
+    "missing regression tests",
+    "CI and deployment failures"
+  ],
+  "severity_policy": {
+    "blocking": "The PR should not merge because the changed behavior is broken, unsafe, or violates a hard architecture boundary.",
+    "high": "A likely production or data-loss regression, security/privacy issue, or serious cross-mode/remote-runtime contract risk.",
+    "medium": "A concrete bug, edge case, maintainability trap, or missing test tied directly to changed behavior.",
+    "low": "A small but actionable correctness, proof, or maintainability risk tied to changed behavior.",
+    "nitpick": "Optional changed-line polish with no behavior risk, such as readability, naming, tiny duplication, stale comments, dead code, type clarity, or local consistency."
+  },
+  "nitpick_policy": {
+    "max_count": 2,
+    "line_scope": "changed-line only",
+    "risk": "No behavior-risk requirement; use for optional polish only."
+  },
+  "control_warn_types": {
+    "Proof Gap": "Important changed behavior lacks focused proof.",
+    "Review Limitation": "Bunny lacked full packet/context to prove a suspected issue.",
+    "CI Timing": "Expected checks were missing, pending, or not yet observable when posted.",
+    "Non-blocking Coverage": "Useful coverage or context note that should not block merge by itself."
+  },
+  "path_instructions": [
+    {
+      "name": "Engine and runtime boundaries",
+      "prefixes": [
+        "src/engine/",
+        "src/features/",
+        "src/shared/api/",
+        "src-tauri/"
+      ],
+      "guidance": [
+        "skills/marinara-architecture-guard/SKILL.md"
+      ],
+      "checks": [
+        "Engine code stays React-free and does not import feature internals, Tauri APIs, Zustand stores, or concrete shared API adapters.",
+        "Feature code uses focused shared API wrappers instead of raw invokeTauri or raw remote-runtime fetch.",
+        "Remote-capable behavior follows the explicit HTTP pipeline."
+      ]
+    },
+    {
+      "name": "Mode separation",
+      "prefixes": [
+        "src/engine/chat/",
+        "src/engine/roleplay/",
+        "src/engine/game/",
+        "src/features/modes/"
+      ],
+      "guidance": [
+        "skills/marinara-mode-separation/SKILL.md"
+      ],
+      "checks": [
+        "Chat, roleplay, and game behavior remain in their owning mode.",
+        "Shared generation or prompt changes do not silently alter unrelated modes."
+      ]
+    },
+    {
+      "name": "Bug fixes and privileged contracts",
+      "prefixes": [
+        "src-tauri/src/commands/",
+        "src-tauri/src/storage/",
+        "src-tauri/src/providers/",
+        "src/shared/api/"
+      ],
+      "guidance": [
+        "skills/marinara-bugfix-discipline/SKILL.md"
+      ],
+      "checks": [
+        "Fixes address root causes instead of adding fake success, silent catches, broad fallbacks, or UI-only guards.",
+        "Provider, storage, command, and transport changes preserve error contracts and hostable behavior.",
+        "Parent records must not collect IDs, metadata, counts, or relationships from child rows that later skip import or fail to persist.",
+        "Pre-scan logic should use the same eligibility criteria as the write loop, especially for imports, migrations, and rollback-sensitive storage paths."
+      ]
+    },
+    {
+      "name": "Docs and agent guidance",
+      "prefixes": [
+        "README",
+        "docs/",
+        "skills/",
+        "AGENTS.md",
+        ".github/"
+      ],
+      "guidance": [
+        "skills/marinara-getting-started/SKILL.md"
+      ],
+      "checks": [
+        "Durable feature-area additions update relevant maps or guidance.",
+        "Workflow and agent changes remain concrete, testable, and narrow."
+      ]
+    }
+  ]
+}
diff --git a/.github/workflows/bunny-review-auto.yml b/.github/workflows/bunny-review-auto.yml
@@ -0,0 +1,79 @@
+name: Bunny Review Auto Dispatch
+
+on:
+  pull_request_target:
+    types: [opened, reopened, synchronize, ready_for_review, converted_to_draft, edited]
+
+permissions:
+  actions: write
+  contents: read
+  issues: read
+  pull-requests: read
+
+concurrency:
+  group: bunny-review-auto-dispatch-${{ github.event.pull_request.number }}
+  cancel-in-progress: true
+
+jobs:
+  dispatch:
+    if: >
+      github.event.pull_request.base.ref == 'refactor' ||
+      github.event.pull_request.base.ref == 'main'
+    runs-on: ubuntu-latest
+    steps:
+      - name: Dispatch trusted Bunny reviewer
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          PR_NUM: ${{ github.event.pull_request.number }}
+          REQUESTED_BY: ${{ github.event.sender.login }}
+          EVENT_ACTION: ${{ github.event.action }}
+          PR_IS_DRAFT: ${{ github.event.pull_request.draft }}
+          PR_HEAD_SHA: ${{ github.event.pull_request.head.sha }}
+          PR_BASE_REF: ${{ github.event.pull_request.base.ref }}
+          PREVIOUS_BASE_REF: ${{ github.event.changes.base.ref.from }}
+        run: |
+          # This pull_request_target workflow is intentionally a dispatcher only.
+          # It must not checkout, install, or execute code from the pull request.
+          TARGET_REF="$PR_BASE_REF"
+          if [ "$TARGET_REF" != "refactor" ] && [ "$TARGET_REF" != "main" ]; then
+            echo "::error::Unsupported Bunny review base ref: $TARGET_REF"
+            exit 1
+          fi
+          REVIEW_MODE=auto
+
+          if [ "$EVENT_ACTION" = "edited" ] && [ -z "$PREVIOUS_BASE_REF" ]; then
+            echo "Skipping Bunny auto dispatch for pull request metadata edit."
+            exit 0
+          fi
+
+          if [ "$EVENT_ACTION" = "ready_for_review" ]; then
+            echo "Pull request became ready for review; dispatching Bunny even if this SHA was reviewed while draft."
+            REVIEW_MODE=full
+          elif [ "$EVENT_ACTION" = "edited" ] && [ -n "$PREVIOUS_BASE_REF" ]; then
+            echo "Base ref changed from $PREVIOUS_BASE_REF to $PR_BASE_REF; dispatching Bunny review for the new diff base."
+          else
+            LAST_REVIEWED_SHA="$(gh api "repos/${{ github.repository }}/issues/$PR_NUM/comments?per_page=100" \
+              --paginate \
+              --jq '.[] | select(.body | contains("<!-- bunny-review:walkthrough -->")) | .body' \
+              | sed -n 's/.*<!-- bunny-review:last-reviewed-sha=\([0-9a-f]\{40\}\) -->.*/\1/p' \
+              | tail -n 1)"
+            if [ -n "$LAST_REVIEWED_SHA" ] && [ "$LAST_REVIEWED_SHA" = "$PR_HEAD_SHA" ]; then
+              echo "Skipping Bunny auto dispatch because head $PR_HEAD_SHA was already reviewed."
+              exit 0
+            fi
+          fi
+
+          gh api "repos/${{ github.repository }}/contents/.github/workflows/bunny-review.yml?ref=$TARGET_REF" --silent >/dev/null || {
+            echo "::error::Bunny trusted workflow not found on base ref $TARGET_REF"
+            exit 1
+          }
+
+          gh workflow run bunny-review.yml \
+            --repo "${{ github.repository }}" \
+            --ref "$TARGET_REF" \
+            -f pr_number="$PR_NUM" \
+            -f comment_body="auto pull_request_target dispatch" \
+            -f review_mode="$REVIEW_MODE" \
+            -f requested_by="$REQUESTED_BY" \
+            -f is_draft="$PR_IS_DRAFT" \
+            -f last_reviewed_sha="${LAST_REVIEWED_SHA:-}"