From b8b3698d1d0da65a96b1882017f792799a700572 Mon Sep 17 00:00:00 2001
From: ArchieIndian
Date: Sun, 29 Mar 2026 16:49:33 +0530
Subject: [PATCH] Clean up docs and refactor state helpers

---
 CHANGELOG.md                                  | 10 +++
 CONTRIBUTING.md                               | 12 +++-
 README.md                                     | 19 ++++++
 docs/OPERATIONS.md                            | 68 +++++++++++++++++++
 scripts/state_helpers.py                      | 68 +++++++++++++++++++
 .../cron-execution-prover/prove.py            | 38 +++--------
 .../deployment-preflight/check.py             | 38 ++++-------
 .../mcp-auth-lifecycle-manager/manage.py      | 46 ++++---------
 .../message-delivery-verifier/verify.py       | 38 +++--------
 .../session-reset-recovery/recover.py         | 34 +++-------
 .../upgrade-rollback-manager/manage.py        | 35 ++++------
 tests/test-runner.sh                          | 56 ++++++++-------
 12 files changed, 274 insertions(+), 188 deletions(-)
 create mode 100644 docs/OPERATIONS.md
 create mode 100644 scripts/state_helpers.py

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 4035d16..77ecb90 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,15 @@
 # Changelog
 
+## [0.2.0] - 2026-03-29
+
+### Added
+- Runtime reliability skills: `runtime-verification-dashboard`, `deployment-preflight`, `session-reset-recovery`, `cron-execution-prover`, `message-delivery-verifier`, `subagent-capability-auditor`, `upgrade-rollback-manager`, and `mcp-auth-lifecycle-manager`
+- Operational playbooks in `docs/OPERATIONS.md`
+
+### Changed
+- README and contributor guidance now reflect the expanded operational skill set and validation workflow
+- New shared `scripts/state_helpers.py` reduces repeated state loading and saving code across recent Python helpers
+
 ## [0.1.0] - 2026-03-15
 
 ### Added
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 14c0d57..84b4d40 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -6,7 +6,7 @@ We'd love your skills! Here's how to contribute.
 
 1. **Propose your idea** — [Open a Skill Proposal issue](../../issues/new?template=skill-proposal.yml) to get feedback
 2. **Create the skill** — Use the `create-skill` superpower or copy the [template](skills/core/create-skill/TEMPLATE.md)
-3. **Validate locally** — Run `./scripts/validate-skills.sh` to catch issues
+3. **Validate locally** — Run `./scripts/validate-skills.sh` and `bash ./tests/test-runner.sh`
 4. **Submit a PR** — CI validates automatically on any PR that touches `skills/`
 
 ## Where to Put Your Skill
@@ -46,7 +46,15 @@ Run the validation script before submitting:
 ./scripts/validate-skills.sh
 ```
 
-It checks: frontmatter format, naming conventions, file structure, line count, stateful skill coherence (`STATE_SCHEMA.yaml` present when `stateful: true`), and cron expression format.
+It checks: frontmatter format, naming conventions, file structure, line count, stateful skill coherence (`STATE_SCHEMA.yaml` present when `stateful: true`), cron expression format, and README inventory metrics.
+
+Run the repository smoke tests too:
+
+```bash
+bash ./tests/test-runner.sh
+```
+
+If you are adding or updating a stateful helper script, prefer reusing the shared helpers in `scripts/state_helpers.py` instead of open-coding the same YAML/JSON state loader again.
 
 ## Pull Requests
 
diff --git a/README.md b/README.md
index 41a7a55..e723f26 100644
--- a/README.md
+++ b/README.md
@@ -61,6 +61,22 @@ openclaw gateway restart
 
 Install `PyYAML` before using the stateful Python helpers: `python3 -m pip install PyYAML`.
 
+For operator workflows and rollout order, see [docs/OPERATIONS.md](docs/OPERATIONS.md).
+
+---
+
+## Start Here
+
+If you are adopting the repo for real unattended usage, start in this order:
+
+1. Run `deployment-preflight` before the first install or after Docker/compose changes.
+2. Install the repo and run `runtime-verification-dashboard` once the runtime is live.
+3. Wrap scheduled work with `cron-execution-prover` and `message-delivery-verifier`.
+4. Add `session-reset-recovery` and `upgrade-rollback-manager` before relying on overnight or upgrade-heavy automation.
+5. If you use MCP servers, pair `mcp-health-checker` with `mcp-auth-lifecycle-manager`.
+
+This gives you deployment safety, runtime visibility, proof of execution, proof of delivery, reset survival, rollback coverage, and MCP auth coverage without enabling every skill at once.
+
 ---
 
 ## Skills included
@@ -259,6 +275,9 @@ Skills marked with a script ship a small executable alongside their `SKILL.md`:
 **Self-hosted or Docker deployment**
 > Run `deployment-preflight` before the first rollout or after compose changes to catch missing mounts, missing bootstrap files, and public gateway exposure. Follow it with `runtime-verification-dashboard` once the runtime is live.
 
+**Operators building a reliability stack**
+> Use the playbooks in [docs/OPERATIONS.md](docs/OPERATIONS.md) to layer deployment safety, cron proofing, delivery verification, reset recovery, upgrade rollback, and MCP auth checks in a sane order.
+
 **Open-source maintainer**
 > `community-skill-radar` scans Reddit for pain points automatically. `skill-vetting` catches malicious community contributions before they're installed. `installed-skill-auditor` detects post-install tampering.
 
diff --git a/docs/OPERATIONS.md b/docs/OPERATIONS.md
new file mode 100644
index 0000000..7ebd538
--- /dev/null
+++ b/docs/OPERATIONS.md
@@ -0,0 +1,68 @@
+# Operational Playbooks
+
+This repo has enough always-on skills now that the useful question is no longer "which skills exist?" but "which ones should I turn on together?"
+
+Use these playbooks as a rollout order.
+
+## 1. First deployment
+
+Use this when bringing up OpenClaw on a laptop, server, or Docker host for the first time.
+
+1. Run `deployment-preflight` before install or after any compose change.
+2. Install `openclaw-superpowers`.
+3. Run `runtime-verification-dashboard` once the runtime is live.
+4. Fix any missing mounts, missing bootstrap files, missing cron registrations, or state path issues before enabling unattended workflows.
+
+Why this order:
+- `deployment-preflight` catches layout and exposure mistakes before the runtime starts.
+- `runtime-verification-dashboard` catches post-install drift inside the live runtime.
+
+## 2. Scheduled workflow with proof
+
+Use this when a cron workflow writes files, posts a report, or notifies a human.
+
+1. Wrap the workflow in `cron-execution-prover`.
+2. Track the last-mile notification in `message-delivery-verifier`.
+3. Review stale executions and stale deliveries before trusting the automation.
+
+Why this order:
+- `cron-execution-prover` proves the job started and finished.
+- `message-delivery-verifier` proves the output was actually sent and acknowledged.
+
+## 3. Overnight continuity
+
+Use this when long-running work regularly crosses the session reset window.
+
+1. Enable `session-reset-recovery`.
+2. Pair it with `task-handoff` for tasks that may span multiple sessions or agents.
+3. Review `resume_brief` output after restart before resuming work.
+
+Why this order:
+- `session-reset-recovery` preserves the active checkpoint.
+- `task-handoff` keeps the next operator or session from restarting blind.
+
+## 4. Safer upgrades
+
+Use this before changing OpenClaw versions, config structure, or deployment layout.
+
+1. Run `upgrade-rollback-manager --snapshot`.
+2. Apply the upgrade.
+3. Re-run `deployment-preflight`.
+4. Re-run `runtime-verification-dashboard`.
+5. If something regresses, generate rollback instructions with `upgrade-rollback-manager --rollback-plan