Source of truth: GitHub issues + milestones at
popsolutions/vessel.
This file mirrors the plan; the live state lives in the issue tracker.
The legacy Forgejo issue tracker at
git.pop.coop/noc/huaweie9000/issues is now read-only; all open work has been migrated to GitHub.
Vessel is infrastructure-as-code for the Huawei E9000 chassis. Where existing tooling forces you through the Java/Palemoon HMM web UI to do each operation by hand, Vessel treats the whole chassis as a programmable resource:
- A Python SDK that exposes every HMM/iBMC/CX310 operation as a typed call.
- A REST API (FastAPI) on top of the SDK, with auto-generated OpenAPI spec.
- A declarative engine that reads desired state from YAML/HCL and reconciles it against the chassis (Terraform-style).
- Ansible collection + OpenTofu provider built on the SDK so existing IaC pipelines can manage E9000 chassis the same way they manage VMware, Junos, or AWS.
Concrete example of what Vessel makes possible:

    resource "vessel_compute_profile" "k8s_worker" {
      name            = "k8s-worker-template"
      bios_boot_order = ["pxe", "disk"]
      vnic_profiles   = [vessel_vnic_profile.mgmt.id, vessel_vnic_profile.data.id]
      mac_pool        = vessel_mac_pool.workers.id
      power_policy    = "always-on"
    }

    resource "vessel_blade" "worker" {
      for_each = toset(["1","2","3","4","5","6","7","8","9","10"])
      slot     = tonumber(each.key)
      profile  = vessel_compute_profile.k8s_worker.id
    }

┌───────────────────────────────────────────────────────────────┐
│ L4 IaC integrations │ Ansible collection │ OpenTofu prov │
│ L3 Declarative engine│ vessel apply -f chassis.yaml │
│ L2 REST API (FastAPI)│ /api/v1/chassis/... + OpenAPI spec │
│ L1 SDK Python │ vessel.chassis.* (typed Pydantic) │
│ L0 Drivers │ HMM web API + Redfish + VRP CLI + SSH│
└───────────────────────────────────────────────────────────────┘
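
To make the layering concrete, here is a minimal sketch of how an L1 typed call might sit on top of an L0 driver. All names (`Driver`, `FakeRedfishDriver`, `BladePower`, `get_blade_power`) and the resource path are hypothetical, and a stdlib dataclass stands in for the Pydantic models the SDK plans to use.

```python
from dataclasses import dataclass
from typing import Any, Protocol


class Driver(Protocol):
    """L0: anything that can fetch a raw JSON-like document (hypothetical interface)."""

    def get(self, path: str) -> dict[str, Any]: ...


@dataclass(frozen=True)
class BladePower:
    """L1: typed result handed to SDK callers instead of a raw dict."""

    slot: int
    powered_on: bool


class FakeRedfishDriver:
    """In-memory stand-in for a real Redfish driver, for illustration only."""

    def __init__(self, documents: dict[str, dict[str, Any]]) -> None:
        self._documents = documents

    def get(self, path: str) -> dict[str, Any]:
        return self._documents[path]


def get_blade_power(driver: Driver, slot: int) -> BladePower:
    """Translate a raw driver response into a typed value.

    The path layout is an assumption; `PowerState` is the standard
    Redfish ComputerSystem property.
    """
    raw = driver.get(f"/redfish/v1/Systems/Blade{slot}")
    return BladePower(slot=slot, powered_on=raw.get("PowerState") == "On")


driver = FakeRedfishDriver({"/redfish/v1/Systems/Blade3": {"PowerState": "On"}})
blade3 = get_blade_power(driver, 3)
```

Because the typed function depends only on the `Driver` protocol, the same call could be backed by Redfish, the HMM web API, or SSH without changing the L1 signature.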
- Snapshot first, change second. No write-action ships until backup + restore for the affected scope is verified on a non-production target.
- Validation watchdog on switch changes. A management-plane heartbeat must succeed after every commit on the CX310s; if it doesn't, auto-rollback using the pre-change snapshot.
- Audit log is a feature, not telemetry. Every mutating call writes a structured record (actor, target, operation, snapshot id, evidence).
- Redfish first, HMM web API second, SSH last. Standardised API beats custom CLI. Reach for SSH only when neither covers the operation (KVM, `swiconfexport`, VRP-only commands).
- Issues = memory. Every non-trivial finding gets an issue. Closed issues carry the evidence forward.
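
The first three principles combine into a single envelope around every mutating call. The sketch below is one possible shape, not the project's implementation: `guarded_change`, `AuditRecord`, and the callback parameters are hypothetical names, and the record fields simply follow the list above.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class AuditRecord:
    # Fields follow the principle above: actor, target, operation, snapshot id, evidence.
    actor: str
    target: str
    operation: str
    snapshot_id: str
    evidence: str
    timestamp: str


def guarded_change(
    *,
    actor: str,
    target: str,
    operation: str,
    take_snapshot: Callable[[], str],
    apply_change: Callable[[], None],
    heartbeat_ok: Callable[[], bool],
    rollback: Callable[[str], None],
    audit_log: list[dict],
) -> bool:
    """Run one mutating change; return True if kept, False if rolled back."""
    snapshot_id = take_snapshot()  # snapshot first
    apply_change()                 # change second
    kept = heartbeat_ok()          # management-plane watchdog
    if not kept:
        rollback(snapshot_id)      # restore the pre-change snapshot
    record = AuditRecord(
        actor=actor,
        target=target,
        operation=operation,
        snapshot_id=snapshot_id,
        evidence="heartbeat ok" if kept else "heartbeat failed, rolled back",
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    audit_log.append(asdict(record))
    return kept


# Simulate a change that cuts off the management plane.
log: list[dict] = []
rolled_back: list[str] = []
kept = guarded_change(
    actor="operator",
    target="swi2",
    operation="vlan add 100",
    take_snapshot=lambda: "snap-0001",
    apply_change=lambda: None,
    heartbeat_ok=lambda: False,
    rollback=rolled_back.append,
    audit_log=log,
)
```

The sketch does not handle an exception raised by `apply_change` itself; a real envelope would probably treat that as a failed heartbeat and roll back as well.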
This is the surface area Vessel must cover. Each row maps a section of the Huawei HMM web UI to a Python SDK module and the IaC resources we expose.
| HMM menu | SDK module | IaC resources |
|---|---|---|
| Chassis Settings → Basic / BIOS Boot / Black Box / FC Ports / Reminder / Restore | `vessel.chassis.basic` | `vessel_chassis_settings` |
| Network Settings → MMs / Blades / Switch Modules | `vessel.chassis.network` | `vessel_mgmt_network` |
| Stateless Computing → Profile / MAC Pool / UUID / Node | `vessel.chassis.stateless` | `vessel_compute_profile`, `vessel_mac_pool`, `vessel_uuid_pool` |
| easyLink → Setting / Wizard / Switch Profile / NIC Profile | `vessel.chassis.easylink` | `vessel_switch_profile`, `vessel_vnic_profile` |
| easyLink Network Info → Overview / Ethernet / FC / MAC | `vessel.chassis.network_info` | data sources (read-only) |
| PSUs & Fans → Power Meter / Hibernation / Capping / Records / Fan Meter | `vessel.chassis.power`, `vessel.chassis.fan` | `vessel_power_policy` |
| Alarm Monitoring → Settings / Simulation | `vessel.chassis.alarms` | `vessel_alarm_subscription` |
| System Mgmt → Logs / Account / Security / NTP / SSL / Upgrade | `vessel.chassis.system`, `vessel.chassis.upgrade` | `vessel_user`, `vessel_firmware`, `vessel_ldap_domain`, `vessel_ntp_server` |
| Compute Nodes → Slot 1..16 | `vessel.chassis.blade` | `vessel_blade` |
| Switch Modules → Swi2 / Swi3 (CX310 VRP) | `vessel.chassis.switch` | `vessel_vlan`, `vessel_switch_port`, `vessel_lag` |
| Phase | Goal | Gate to advance |
|---|---|---|
| 1 — Discovery & Backup | Snapshot + restore for HMM, switches, each iBMC | Round-trip backup → wipe → restore on a staging blade |
| 2 — Proxmox commissioning | Mount ISO, boot-once, install Proxmox unattended | First blade installed end-to-end without human touch |
| 3 — Switch ops & VLAN | VLAN CRUD, port assignment, link diagnostics on CX310 | VLAN add+remove with auto-rollback verified |
| 4 — GUI / KVM | Local web app for power, mount, console | All phase-1/2/3 actions usable from the browser |
| 5 — Vessel Platform | API-first SDK + REST + Ansible + OpenTofu provider | End-to-end IaC: VLAN + vNIC profile + bulk firmware via Ansible/OpenTofu |
Sprints inside Phase 5:
| Sprint | Deliverable | Blocker |
|---|---|---|
| 1 — RE | Capture HMM web API endpoints with mitmproxy/DevTools; document in `docs/hmm-api/*.md` | None (unblocks everything else) |
| 2 — SDK Layer | `vessel.chassis.*` typed Python (Pydantic models per HMM subsystem) | Sprint 1 |
| 3 — Bulk firmware | `vessel firmware plan/apply` CLI + Ansible module (rolling waves + auto-rollback) | Sprint 2 |
| 4 — VLAN/vNIC IaC | OpenTofu provider — VLAN, vNIC profile, MAC pool resources | Sprint 2 |
| 5 — Stateless profiles | Compute profile + blade assignment in OpenTofu + Ansible | Sprint 4 |
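
The "rolling waves + auto-rollback" deliverable of Sprint 3 could work roughly as follows. This is a sketch under assumptions: `waves`, `rolling_upgrade`, and the `upgrade` / `healthy` / `rollback` callbacks are hypothetical, and stopping at the first unhealthy wave is one possible policy, not a documented one.

```python
from typing import Callable, Iterator, Sequence


def waves(targets: Sequence[str], wave_size: int) -> Iterator[list[str]]:
    """Split targets into consecutive waves of at most wave_size items."""
    if wave_size < 1:
        raise ValueError("wave_size must be at least 1")
    for start in range(0, len(targets), wave_size):
        yield list(targets[start:start + wave_size])


def rolling_upgrade(
    targets: Sequence[str],
    wave_size: int,
    upgrade: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> dict[str, list[str]]:
    """Upgrade wave by wave; roll back unhealthy targets and stop early."""
    upgraded: list[str] = []
    all_waves = list(waves(targets, wave_size))
    for index, wave in enumerate(all_waves):
        for target in wave:
            upgrade(target)
        failed = [t for t in wave if not healthy(t)]
        for target in failed:
            rollback(target)
        upgraded.extend(t for t in wave if t not in failed)
        if failed:
            # Later waves are never touched once a wave fails its healthcheck.
            skipped = [t for later in all_waves[index + 1:] for t in later]
            return {"upgraded": upgraded, "rolled_back": failed, "skipped": skipped}
    return {"upgraded": upgraded, "rolled_back": [], "skipped": []}


blades = [f"blade{n}" for n in range(1, 6)]
result = rolling_upgrade(
    blades,
    wave_size=2,
    upgrade=lambda target: None,
    healthy=lambda target: target != "blade3",  # simulate one bad blade
    rollback=lambda target: None,
)
```

In this run the second wave fails its healthcheck, so `blade3` is rolled back and `blade5` is left on its current firmware.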
- Map full HMM `smmget`/`smmset` data-item dictionary (probe + cross-ref the Huawei HMM CLI Reference).
- Determine `swiconfexport` output destination (does it write to a local path? push via SFTP? require a remote target?). Test on Swi2.
- Discover per-blade iBMC IPs in the `172.31.0.0/24` fabric (sweep + map slot ↔ IP).
- Implement `hmm-snapshot` CLI that produces, under `snapshots/<UTC-stamp>/`:
  - `hmm/redfish.json`: full Redfish dump (already prototyped in `discover.py`)
  - `hmm/smmget.txt`: every known data item
  - `hmm/accounts.json`
  - `switches/swi<N>.cfg`: VRP running-config + startup-config
  - `ibmc/blade<N>.json`: Redfish dump from each iBMC
  - `manifest.yaml`: what was captured, by whom, when, hashes
- Implement `hmm-restore <snapshot>` for each scope (HMM, switch, iBMC).
- Verify round-trip on a non-production blade.
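
The manifest is what makes a later restore verifiable, so a small sketch may help. It assumes the manifest records a SHA-256 per captured file plus who captured it and when. The function names and dictionary keys are hypothetical, and serialising the result to `manifest.yaml` is left out to keep the example stdlib-only.

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large dumps do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(snapshot_dir: Path, captured_by: str) -> dict:
    """Describe every file under snapshot_dir except the manifest itself."""
    files = {
        path.relative_to(snapshot_dir).as_posix(): sha256_of(path)
        for path in sorted(snapshot_dir.rglob("*"))
        if path.is_file() and path.name != "manifest.yaml"
    }
    return {
        "captured_by": captured_by,
        "captured_at": datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ"),
        "files": files,
    }


# Build a tiny fake snapshot tree and describe it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "hmm").mkdir()
    (root / "hmm" / "redfish.json").write_bytes(b"{}")
    (root / "switches").mkdir()
    (root / "switches" / "swi2.cfg").write_bytes(b"sysname swi2\n")
    manifest = build_manifest(root, captured_by="operator")
```

A restore could recompute the same hashes and refuse to proceed if any file differs from its manifest entry.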
- Reach iBMC Redfish through SSH tunnel via HMM (`ssh -L`).
- Confirm `VirtualMedia` is exposed by the iBMC; document the `InsertMedia`/`EjectMedia` actions.
- HTTP host the Proxmox ISO from the workstation (`python -m http.server` or a small `httpx` server) on a routable interface for the iBMC.
- Implement `hmm-provision <blade>` that: snapshot → mount ISO → set one-shot boot=CD → power on → wait for install → eject → confirm.
- Build a minimal Proxmox answer file for unattended install.
- Validate cluster: 10× CH121 compute + 3× CH222 ceph-osd.
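
The Redfish side of the provisioning flow can be sketched as an ordered list of requests. The action names and body properties (`VirtualMedia.InsertMedia`, `VirtualMedia.EjectMedia`, `BootSourceOverrideTarget`, `ComputerSystem.Reset`) come from the standard DMTF Redfish schemas. Whether this iBMC exposes them, and under which resource paths, is exactly what the tasks above still need to confirm, so the default paths and the `provision_plan` helper are assumptions. The snapshot and wait-for-install steps are left out.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class RedfishRequest:
    method: str
    path: str
    body: Optional[dict[str, Any]] = None


def provision_plan(
    iso_url: str,
    system: str = "/redfish/v1/Systems/1",                  # assumed path
    media: str = "/redfish/v1/Managers/1/VirtualMedia/CD",  # assumed path
) -> list[RedfishRequest]:
    """Return the requests for: mount ISO, one-shot boot from CD, power on, eject."""
    return [
        RedfishRequest(
            "POST",
            f"{media}/Actions/VirtualMedia.InsertMedia",
            {"Image": iso_url, "Inserted": True, "WriteProtected": True},
        ),
        RedfishRequest(
            "PATCH",
            system,
            {"Boot": {"BootSourceOverrideEnabled": "Once",
                      "BootSourceOverrideTarget": "Cd"}},
        ),
        RedfishRequest(
            "POST",
            f"{system}/Actions/ComputerSystem.Reset",
            {"ResetType": "On"},
        ),
        # The caller waits for the unattended install to finish before this one.
        RedfishRequest("POST", f"{media}/Actions/VirtualMedia.EjectMedia", {}),
    ]


steps = provision_plan("http://192.0.2.10:8000/proxmox.iso")
```

Keeping the plan as data makes it easy to log each request in the audit trail before it is sent through the SSH tunnel.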
- Confirm CX310 mgmt IPs and SSH access (probably via HMM jump).
- Wrapper around VRP CLI: `system-view`, `vlan`, `interface`, `port`, `commit`, `save`. Interactive shell flow with paging-prompt handling (`screen-length 0 temporary`).
- `hmm-net snapshot <swi>` (uses `swiconfexport` from Phase 1 + raw `display current-configuration`).
- `hmm-net vlan add <id> --name <n>` etc., always inside the snapshot → apply → watchdog → rollback-or-commit envelope.
- Diagnostics: `hmm-net diag <blade>` → ping, ARP, MAC table, LLDP neighbours.
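
A sketch of what the VRP wrapper might generate for `hmm-net vlan add`. The helper names are hypothetical, and the `name` sub-command, the `return` command, and the `---- More ----` prompt text are assumptions to verify on the CX310. `save` is deliberately not part of the sequence, on the assumption that it is issued only after the watchdog confirms the management plane is still reachable.

```python
import re
from typing import Optional

# Paging prompt as VRP usually renders it; the exact text is an assumption.
MORE_PROMPT = re.compile(r"[ \t]*---- More ----[ \t]*")


def strip_paging(output: str) -> str:
    """Remove paging prompts left in captured output."""
    return MORE_PROMPT.sub("", output)


def vlan_add_commands(vlan_id: int, name: Optional[str] = None) -> list[str]:
    """Build the command sequence that creates one VLAN and commits it."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN id must be between 1 and 4094")
    commands = [
        "screen-length 0 temporary",  # disable paging for this session
        "system-view",
        f"vlan {vlan_id}",
    ]
    if name:
        commands.append(f"name {name}")  # assumed VLAN-view sub-command
    commands += ["commit", "return"]     # `return` assumed to exit to user view
    return commands


sequence = vlan_add_commands(100, "workers")
cleaned = strip_paging("vlan 100\n  ---- More ----vlan 200")
```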
- FastAPI app, single-binary deploy via `uv tool run`. ✅ scaffolded
- Chassis map view (16-slot grid + 4 switch slots). ✅ done
- Per-blade actions: power on/off, mount ISO, open console. ✅ done
- KVM console (Java applet replacement):
  - OldRLE path (BGR233 + XOR diff): ✅ working live against this chassis
  - NewRLE/JPEG path: 🟡 ported but produces single-tile output; needs pcap capture for byte-diff validation (priority: low)
- Backup browser: list/download/restore snapshots from the GUI.
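
The two building blocks named for the OldRLE path can be illustrated in isolation. The bit layout below is the conventional BGR233 one (blue in bits 7-6, green in bits 5-3, red in bits 2-0). Whether the Huawei KVM stream uses exactly this layout is an assumption, and the run-length framing itself is not shown because its wire format is not described here.

```python
def bgr233_to_rgb(pixel: int) -> tuple[int, int, int]:
    """Expand one 8-bit BGR233 pixel to 8-bit-per-channel RGB."""
    red = pixel & 0x07           # 3 bits
    green = (pixel >> 3) & 0x07  # 3 bits
    blue = (pixel >> 6) & 0x03   # 2 bits
    return (red * 255 // 7, green * 255 // 7, blue * 255 // 3)


def apply_xor_diff(previous: bytes, delta: bytes) -> bytes:
    """Rebuild the current frame from the previous frame and an XOR delta."""
    if len(previous) != len(delta):
        raise ValueError("frame and delta must have the same length")
    return bytes(p ^ d for p, d in zip(previous, delta))


previous_frame = bytes([0x00, 0x07])  # black, pure red
delta = bytes([0x00, 0xC7])           # leave pixel 0 alone, flip pixel 1
current_frame = apply_xor_diff(previous_frame, delta)
pixels = [bgr233_to_rgb(p) for p in current_frame]
```

A zero byte in the delta leaves a pixel unchanged, which is why XOR diffs of mostly static screens compress well under run-length encoding.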
- Reverse engineer the HMM web API (Sprint 1, blocking)
  - Use `mitmproxy` or Chrome DevTools while operator drives the GUI
  - Document each endpoint (path, method, auth, request/response schema)
  - One markdown doc per HMM menu under `docs/hmm-api/`
  - Output: enough to generate a typed SDK
- SDK layer (Sprint 2)
  - `vessel.chassis` package with one module per HMM subsystem
  - Pydantic models for all responses
  - Async + sync APIs
  - Typed errors per failure category
- REST API layer (Sprint 3 in parallel with bulk firmware)
  - Extend existing FastAPI (`gui/app.py`) with `/api/v1/...`
  - All read operations exposed today
  - Mutation endpoints with mandatory backup-first
  - OpenAPI spec at `/api/openapi.json`
- Bulk firmware upgrade (Sprint 3)
  - `vessel firmware inventory`: list all upgradable components and current versions (iBMC, BIOS, switches, fans, SMMs)
  - `vessel firmware plan --image <bin> --targets <selector>`: dry-run
  - `vessel firmware apply --plan <file>`: rolling waves with healthcheck
  - auto-rollback on failure
- Declarative state engine (Sprint 4-5)
  - `vessel apply -f chassis.yaml` (plan/apply/destroy semantics)
  - YAML schema for VLANs, vNIC profiles, blade profiles
  - Drift detection
- Ansible Collection (Sprint 3-5)
  - `popsolutions.vessel.chassis_vlan`
  - `popsolutions.vessel.firmware_upgrade`
  - `popsolutions.vessel.blade_provision`
  - `popsolutions.vessel.alarm_query`
- OpenTofu Provider (Sprint 4-5)
  - `provider "vessel"` published to OpenTofu Registry
  - Resources: `vessel_vlan`, `vessel_vnic_profile`, `vessel_switch_profile`, `vessel_blade`, `vessel_firmware`, `vessel_compute_profile`, `vessel_mac_pool`, `vessel_uuid_pool`
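
The plan step of the declarative engine, which also covers drift detection, reduces to a diff between desired and actual state. Here is a minimal sketch, assuming both states are flat mappings from a resource key to its attributes. The `Action` type, the `plan` function, and the `vlan/<id>` key format are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Action:
    kind: str            # "create", "update" or "delete"
    resource: str
    desired: Any = None  # target attributes; None for deletes


def plan(desired: Mapping[str, Any], actual: Mapping[str, Any]) -> list[Action]:
    """Return the actions needed to move actual state to desired state."""
    actions: list[Action] = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            actions.append(Action("create", name, desired[name]))
        elif name not in desired:
            actions.append(Action("delete", name))
        elif desired[name] != actual[name]:
            actions.append(Action("update", name, desired[name]))
    return actions


desired_state = {"vlan/100": {"name": "workers"}, "vlan/200": {"name": "storage"}}
actual_state = {"vlan/100": {"name": "old"}, "vlan/300": {"name": "legacy"}}
actions = plan(desired_state, actual_state)
```

An empty plan means the chassis matches the declared state. A non-empty plan computed without any pending edits is drift.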
Always check the GitHub board for the freshest list — this table is a human-friendly snapshot.
| # | Title | Phase | Priority |
|---|---|---|---|
| #1 | Map HMM smmget/smmset data-item dictionary | 1 | medium |
| #2 | Determine `swiconfexport` output destination | 1 | medium |
| #3 | Discover per-blade iBMC IPs in 172.31.0.0/24 | 1 | high |
| #4 | Implement `hmm-snapshot` CLI | 1 | high |
| #5 | Implement `hmm-restore <snapshot>` CLI | 1 | high |
| #6 | Verify backup roundtrip on non-production blade | 1 | high |
| #7 | Reach iBMC Redfish through SSH tunnel via HMM | 2 | medium |
| #8 | Confirm iBMC `VirtualMedia` actions | 2 | high |
| #9 | Implement `hmm-provision <blade> --iso <url>` | 2 | medium |
| #10 | Proxmox unattended-install answer file | 2 | medium |
| #11 | CX310 mgmt SSH access + VRP CLI wrapper | 3 | high |
| #12 | VLAN CRUD with snapshot + watchdog rollback | 3 | high |
| #13 | Network diagnostics (ping/ARP/MAC/LLDP) | 3 | medium |
| #14 | FastAPI GUI scaffolding + chassis map | 4 | medium |
| #15 | Capture & decode legacy Java KVM applet handshake | 4 | medium |
| #16 | Console: noVNC fallback to SOL | 4 | low |
| #17 | NewRLE/JPEG decoder: top-row tile copy fails (red square) | 4 | low |
| #18 | Capture NewRLE pcap from Palemoon for byte-diff | 4 | low |
| #19 | Port DQT tables for JPEG quality 20/30/40/60-100% | 4 | low |
| #20 | CX310 dispatcher: interactive shell flow | 3 | medium |
| #21 | Audit log for all chassis-mutating operations | 5 | high |
| #22 | GUI authentication (currently zero auth) | 4 | high |