A compute resource administration system built for the operational requirements of a MATS Research fellowship program. Each responsibility of the Compute Administrator role maps to one module:
| MATS Role Requirement | Module |
|---|---|
| Scholar account provisioning & offboarding | scholar_manager.py |
| Budget tracking, alerts, CSV exports | budget_tracker.py |
| API key lifecycle (rotation, revocation, audit) | api_key_manager.py |
| Slurm/HPC cluster monitoring, stalled job detection | cluster_monitor.py |
| Compute request queue, SLA tracking, triage | request_handler.py |
| Weekly security/IAM audit (RBAC, orphaned access) | platform_auditor.py |
| CLI for all of the above | cli.py |
pip install -e ".[dev]"
python scripts/seed_demo_data.py # populate realistic demo data
# CLI commands
mats-admin scholars list
mats-admin budget alerts
mats-admin budget platform-summary
mats-admin cluster status
mats-admin cluster stalled
mats-admin requests queue
mats-admin audit run

compute_admin/
├── scholar_manager.py — Scholar lifecycle, RBAC, platform access grants
├── budget_tracker.py — Spend ledger, alert thresholds, cohort reports
├── api_key_manager.py — Key registration, rotation policy, security audit
├── cluster_monitor.py — Slurm squeue/sacct parsing, GPU-hour accounting
├── request_handler.py — Request queue, approval workflow, SLA enforcement
├── platform_auditor.py — Full IAM audit engine + Slack digest generation
└── cli.py — Unified CLI (argparse)
scripts/
└── seed_demo_data.py — Realistic demo data for all modules
tests/
└── test_compute_admin.py — 41 tests, 100% passing
41 passed in 0.15s
Covers: ScholarRegistry (10), BudgetTracker (6), APIKeyManager (7), RequestHandler (6), PlatformAuditor (5), ClusterMonitor (7).
- Zero mandatory dependencies: stdlib-only core. Optional extras for Airtable, Slack, and cloud SDKs.
- Raw API keys are never stored: `api_key_manager.py` stores SHA-256 fingerprints only; real keys must live in a vault (AWS Secrets Manager / HashiCorp Vault).
- Slurm mock mode: `cluster_monitor.py` auto-detects when `squeue` is absent and falls back to realistic mock data, enabling full local dev/CI without an HPC cluster.
- Airtable-ready persistence: `ScholarRegistry._load/_save` and `RequestHandler._load/_save` are thin adapters; swap them for `pyairtable` calls with no changes to business logic.
- Structured audit log: all findings are written as JSONL for easy ingestion into Elasticsearch/Splunk.
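
Three of the points above (fingerprint-only key storage, Slurm detection, and the JSONL audit log) can be illustrated with a minimal stdlib-only sketch. The function names `fingerprint_key`, `slurm_available`, and `append_finding` are hypothetical and chosen for illustration; the actual code in `api_key_manager.py`, `cluster_monitor.py`, and `platform_auditor.py` may be organized differently.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def fingerprint_key(raw_key: str) -> str:
    """Return the SHA-256 hex digest of a key (hypothetical helper).

    Only this digest would be persisted; the raw key stays in the vault.
    """
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def slurm_available() -> bool:
    """Report whether the `squeue` binary is on PATH (hypothetical helper).

    A monitor could fall back to mock data whenever this returns False.
    """
    return shutil.which("squeue") is not None


def append_finding(log_path: Path, finding: dict) -> None:
    """Append one audit finding as a single JSON line (hypothetical helper)."""
    record = {"timestamp": datetime.now(timezone.utc).isoformat(), **finding}
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


if __name__ == "__main__":
    # The key below is a placeholder, not a real credential.
    print(fingerprint_key("sk-demo-not-a-real-key")[:12])
    print("slurm" if slurm_available() else "mock")
```

Because each finding occupies exactly one line, a log shipper can tail the file and forward records without parsing multi-line JSON.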
- Airtable: Replace `_load/_save` in `ScholarRegistry` and `RequestHandler` with `pyairtable` API calls.
- Slack: `PlatformAuditor.slack_digest()` returns ready-to-post Markdown; post via `slack-sdk` Incoming Webhooks.
- Secrets vault: Replace `api_key_manager` mock storage with AWS Secrets Manager `boto3` calls.
- Cron/GitHub Actions: Run `mats-admin audit run` nightly; run `mats-admin budget alerts` on every spend event.
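
As a rough sketch of the first two integration points, the block below shows a storage interface that a file-backed or Airtable-backed class could both satisfy, and how a digest string could be wrapped in a Slack Incoming Webhook request. `RecordStore`, `InMemoryStore`, and `build_slack_request` are hypothetical names, not part of the project, and the request is built with the standard library rather than `slack-sdk` so the example stays dependency-free.

```python
import json
import urllib.request
from typing import Protocol


class RecordStore(Protocol):
    """Hypothetical interface that a `_load/_save` pair could sit behind."""

    def load(self) -> list[dict]: ...

    def save(self, records: list[dict]) -> None: ...


class InMemoryStore:
    """Stand-in backend; an Airtable-backed class would expose the same methods."""

    def __init__(self) -> None:
        self._records: list[dict] = []

    def load(self) -> list[dict]:
        # Return copies so callers cannot mutate stored state by accident.
        return [dict(r) for r in self._records]

    def save(self, records: list[dict]) -> None:
        self._records = [dict(r) for r in records]


def build_slack_request(webhook_url: str, digest: str) -> urllib.request.Request:
    """Build (but do not send) the JSON POST for a Slack Incoming Webhook."""
    body = json.dumps({"text": digest}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    store: RecordStore = InMemoryStore()
    store.save([{"scholar": "demo", "status": "active"}])
    print(store.load())
```

Passing the returned request to `urllib.request.urlopen` would send it. The webhook URL is a secret and would belong in the vault alongside the API keys.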