[codex] docs: add Slurm academic sandbox plan and adapter#345
Conversation
|
Codex review: needs real behavior proof before merge. Reviewed June 14, 2026, 4:30 AM ET / 08:30 UTC. Summary Reproducibility: not applicable. this is a feature/docs/example PR, not a bug report. Source review confirms current main lacks the requested Slurm surface and the PR adds it. Review metrics: 2 noteworthy metrics.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Proof guidance:
Risk before merge
Maintainer options:
Next step before merge
Security Review findings
Review detailsBest possible solution: Land a maintainer-approved external-provider-first Slurm contract only after the example adapter preserves state on cancellation/status uncertainty, validates config in doctor before job submission, and has redacted real-run proof. Do we have a high-confidence way to reproduce the issue? Not applicable; this is a feature/docs/example PR, not a bug report. Source review confirms current main lacks the requested Slurm surface and the PR adds it. Is this the best way to solve the issue? No as submitted; the external-provider-first direction matches the repository boundary, but the adapter needs cleanup-safety fixes, earlier config validation, maintainer contract approval, and real-run proof before it is the best merge path. Full review comments:
Overall correctness: patch is incorrect AGENTS.md: found and applied where relevant. Codex review notes: model internal, reasoning high; reviewed against ccc27374948c. Label changesLabel justifications:
Evidence reviewedSecurity concerns:
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
@clawsweeper re-review The PR now includes the reference Slurm external-provider adapter and sample runner in addition to the docs/product contract. |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
…tests Take the Slurm reference adapter from draft to mergeable: - Stop emitting the reserved routing label "slug" from lease responses; the broker rejects leases that set reserved labels (lease/slug/name/ externalResourceName/externalResourceNameFromEnv), most visibly when idempotentLeaseId is enabled for acquire and list. Namespace adapter labels (state, slurmJobId, slurmState) instead. - Parse sbatch --parsable output safely: tolerate empty stdout and the "<jobid>;<cluster>" form without IndexError. - Parse sacct State robustly (strip "CANCELLED by <uid>" qualifiers, skip blank step rows) without indexing an empty split. - Make list/cleanup/find_state resilient to corrupt or partially written state.json files via a shared iter_states() helper that skips unreadable files instead of aborting the whole operation. - Add argparse help text and reject negative --poll-interval. Add test_slurm_cbx.py (pytest) with sbatch/squeue/sacct/scancel mocked, covering doctor, the acquire -> resolve -> release happy path, idempotent re-acquire, lease-id parsing, endpoint-timeout cancellation and keep behavior, list filtering, cleanup, expected-identity mismatch, proxy-through-login, secret redaction, corrupt-state resilience, and the stdin main() doctor path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Add `python3 -m pytest examples/slurm-external-provider/ -q` to the local checks in the feature doc and example README, and describe what the suite covers. - Ignore __pycache__, *.pyc, and .pytest_cache so test runs do not dirty the tree. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
provider: externalas the first safe campus offer.examples/slurm-external-provider/, a reference Python external-provider adapter plus sample unprivilegedsshdrunner for campus pilots.What the adapter does
doctor,acquire,resolve,list,release,touch, andcleanup.sbatch --parsable, records the Slurm job ascloudId: slurm/job/<id>, watchessqueue/optionalsacct, and releases withscancel.Research notes
sbatchas accepting a script before an allocation may actually run, so the docs recommendwarmupto absorb scheduler queue time.squeuefor live job state andscancelfor cancellation, which maps cleanly to external-provider list/status/release behavior.slurmrestdis not designed to be directly internet-facing, so the docs keep Slurm control-plane access site-local.References:
Closes #325
Validation
python3 -m py_compile examples/slurm-external-provider/slurm-cbx.pybash -n examples/slurm-external-provider/runner-unprivileged-sshd.shgit diff --checkacquire,list, andreleaseagainst stubbedsbatch,squeue, andscancelscripts/check-docs.sh