docs: use dstack v0.5.11 build-args for reproducible key-provider mr_enclave #3408

Open
barakeinav1 wants to merge 2 commits into main from barak/3153-reproducible-key-provider

Conversation

Contributor

barakeinav1 commented May 31, 2026

Closes #3153
Main changes:

Replaces the temporary manual Dockerfile.key-provider patch (#3152) with the upstreamed build-arg mechanism in dstack v0.5.11 (Dstack-TEE/dstack#672), so operators reproduce the canonical key-provider mr_enclave without editing files.

  • Build key-provider-build/ from a v0.5.11 git worktree: APT_SNAPSHOT=20260423T000000Z ./run.sh (Rust patch version + rustup-init pinned in-recipe). dstack-vmm + the guest OS image stay at v0.5.8 (measurements pinned on-chain; VMM bump tracked in #3445, "Move dstack-vmm (and decide on the guest OS image) to v0.5.11").
  • Add docker-compose-v2 + docker-buildx to prerequisites (run.sh uses docker compose).
  • Recommend a self-hosted local PCCS for the key provider in production (avoids a third-party single point of failure on the attestation path).
  • Add troubleshooting for AESM service returned error 44 → platform not registered with Intel (enable SGX Auto MP Registration).
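
For readers who want the worktree flow spelled out, here is a minimal sketch of the build step described above. It is not copied from the guide: the clone location `/opt/mpc/dstack` and the tag name `v0.5.11` are assumptions, and `env -C` requires GNU coreutils 8.28 or newer. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
set -eu

# Assumptions (not confirmed by this PR): the existing v0.5.8 clone lives at
# /opt/mpc/dstack and the upstream release tag is named "v0.5.11".
DSTACK_REPO="${DSTACK_REPO:-/opt/mpc/dstack}"
WORKTREE="${WORKTREE:-/opt/mpc/dstack-v0.5.11}"
DSTACK_TAG="${DSTACK_TAG:-v0.5.11}"
APT_SNAPSHOT="${APT_SNAPSHOT:-20260423T000000Z}"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

build_key_provider() {
  # Make sure the v0.5.11 tag is present in the existing clone.
  run git -C "$DSTACK_REPO" fetch --tags
  # Check out v0.5.11 in a separate directory; the v0.5.8 checkout is untouched.
  run git -C "$DSTACK_REPO" worktree add "$WORKTREE" "$DSTACK_TAG"
  # Build from the worktree's key-provider-build/ with the pinned apt snapshot.
  run env -C "$WORKTREE/key-provider-build" APT_SNAPSHOT="$APT_SNAPSHOT" ./run.sh
}

build_key_provider
```

Keeping the v0.5.11 checkout in its own worktree means dstack-vmm and the guest OS image continue to build from the v0.5.8 tree.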

Some other small doc changes have sneaked in as well.

barakeinav1 marked this pull request as ready for review June 2, 2026 13:40
Copilot AI review requested due to automatic review settings June 2, 2026 13:40

claude Bot commented Jun 2, 2026

Pull request overview

Documentation-only change to the TDX operator guide. Replaces the manual Dockerfile.key-provider patch (the temporary #3152 workaround) with the structural fix from upstream dstack v0.5.11 — APT_SNAPSHOT build-arg + recipe-pinned Rust toolchain — using a v0.5.11 git worktree so dstack-vmm and the guest OS image remain at the on-chain-pinned v0.5.8. Also adds: prerequisite packages (docker-compose-v2, docker-buildx), a production recommendation to point the key provider at a local PCCS, and a troubleshooting entry for AESM error 44 (unregistered platform).

Changes:

  • Add docker-compose-v2 / docker-buildx to the apt install (required by key-provider-build/run.sh's docker compose up --build).
  • Add a top-level note explaining the v0.5.8 (vmm + OS image) vs. v0.5.11 (key-provider only) version split.
  • Remove the hand-patched Dockerfile.key-provider block (apt snapshot + rustup patch pin) and replace with APT_SNAPSHOT=… ./run.sh against a v0.5.11 worktree.
  • Add production guidance to override sgx_default_qcnl.conf with a self-hosted local PCCS.
  • Add troubleshooting section for AESM service returned error 44 (platform-not-registered).
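
A quick preflight for the new prerequisites could look like the sketch below. The function name `check_docker_tooling` is hypothetical and not part of the guide; it only relies on the `docker compose version` and `docker buildx version` subcommands.

```shell
# Returns 0 only if both Docker CLI plugins respond; reports any that are missing.
check_docker_tooling() {
  rc=0
  if ! docker compose version >/dev/null 2>&1; then
    echo "missing: docker compose (apt package docker-compose-v2)" >&2
    rc=1
  fi
  if ! docker buildx version >/dev/null 2>&1; then
    echo "missing: docker buildx (apt package docker-buildx)" >&2
    rc=1
  fi
  return "$rc"
}
```

Possible usage before running `key-provider-build/run.sh`: `check_docker_tooling || sudo apt install docker-compose-v2 docker-buildx`.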

Reviewed changes

Per-file summary
| File | Description |
| --- | --- |
| docs/running-an-mpc-node-in-tdx-external-guide.md | Prereqs gain docker-compose-v2 / docker-buildx; the §2 "Clone the dstack repository" step is replaced by a version-split callout; §3.2 swaps the Dockerfile patch for a v0.5.11-worktree APT_SNAPSHOT build, adds local-PCCS guidance, and the troubleshooting section gains an AESM-error-44 entry. |

Findings

Blocking (must fix before merge):

  • docs/running-an-mpc-node-in-tdx-external-guide.md:146-156 — The previous 1. **Clone the dstack repository:** heading was deleted, but its git clone fenced block was left in place and the next item is now 2. **Compile dstack-vmm:**. The rendered guide ends up with an orphan code fence (still indented as a sub-item of a now-missing step) followed by a numbered list that starts at 2 (then 3, 4). Either restore a 1. **Clone the dstack repository:** heading above the git clone block (move the new "dstack versions" callout inside it as a sub-paragraph), or renumber 2→1, 3→2, 4→3 and remove the orphan block. Skipped numbering is genuinely confusing here because §3 references "see §2 Dstack Setup and Configuration" and operators are expected to follow these steps in order.

Non-blocking (nits, follow-ups, suggestions):

  • docs/running-an-mpc-node-in-tdx-external-guide.md:560-572 — The local-PCCS JSON snippet says "Set it before ./run.sh" but doesn't say which file to overwrite. The paragraph above does mention key-provider-build/sgx_default_qcnl.conf, but a one-line "Edit /opt/mpc/dstack-v0.5.11/key-provider-build/sgx_default_qcnl.conf:" right before the fence would remove all ambiguity — especially since the worktree just introduced two key-provider-build/ directories on disk and an operator could plausibly edit the v0.5.8 copy by mistake (which run.sh would not pick up).
  • PR-level: the description still has two unchecked acceptance-criteria boxes — the ≥2-host mr_enclave = 6b5ed02e… reproduction and the v0.5.11-vs-v0.5.8 vmm measurement-equivalence check. The author has flagged this as a draft gate; recording it here so it's not lost when this comes out of draft.
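
To illustrate the local-PCCS nit, one way to make the target file explicit is to pass its path as an argument. This is a sketch only: `write_local_pccs_conf` is a hypothetical helper, the default URL is a placeholder for whatever the operator's PCCS actually serves, and `"use_secure_cert": false` assumes a self-signed PCCS certificate. The guide's real JSON snippet may carry additional keys.

```shell
# Write a minimal QCNL config pointing at a self-hosted PCCS.
# $1: path to sgx_default_qcnl.conf
# $2: PCCS base URL (optional; the default below is a placeholder assumption)
write_local_pccs_conf() {
  conf_path="$1"
  pccs_url="${2:-https://localhost:8081/sgx/certification/v4/}"
  # use_secure_cert=false assumes the local PCCS uses a self-signed certificate.
  cat > "$conf_path" <<EOF
{
  "pccs_url": "$pccs_url",
  "use_secure_cert": false
}
EOF
}
```

Called as `write_local_pccs_conf /opt/mpc/dstack-v0.5.11/key-provider-build/sgx_default_qcnl.conf` before `./run.sh`, this targets the v0.5.11 worktree copy rather than the v0.5.8 one.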

⚠️ Issues found

Contributor

Copilot AI left a comment

Pull request overview

Updates the external TDX operator guide to rely on dstack v0.5.11’s newly-added APT_SNAPSHOT build-arg for reproducible SGX local key-provider builds (stabilizing the resulting mr_enclave), replacing the prior manual Dockerfile patch instructions.

Changes:

  • Add required Docker tooling packages (docker-compose-v2, docker-buildx) to host prerequisites and explain why.
  • Replace the manual Dockerfile.key-provider patch instructions with a git worktree + APT_SNAPSHOT=... ./run.sh procedure.
  • Add troubleshooting guidance for gramine-sealing-key-provider crash loops with AESM service returned error 44.
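
As a rough sketch of how an operator might spot the crash-loop cause, the helper below scans log text for the error string quoted above. The function name is hypothetical, and the hint text simply restates the remedy named in this PR.

```shell
# Reads log text on stdin; prints a hint and returns 0 if error 44 is present.
detect_aesm_error_44() {
  if grep -F -q 'AESM service returned error 44'; then
    echo 'platform not registered with Intel: enable SGX Auto MP Registration'
    return 0
  fi
  return 1
}
```

Example usage, assuming the container is named `gramine-sealing-key-provider`: `docker logs gramine-sealing-key-provider 2>&1 | detect_aesm_error_44`.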

Comment thread docs/running-an-mpc-node-in-tdx-external-guide.md
Comment thread docs/running-an-mpc-node-in-tdx-external-guide.md Outdated
Comment thread docs/running-an-mpc-node-in-tdx-external-guide.md
barakeinav1 force-pushed the barak/3153-reproducible-key-provider branch 4 times, most recently from 60cdaa9 to 82f47a8, June 3, 2026 12:57
Replace the temporary manual Dockerfile.key-provider patch (#3152) with the
upstreamed build-arg mechanism (Dstack-TEE/dstack#672, first released in dstack
v0.5.11), so operators reproduce the canonical key-provider mr_enclave without
editing files:

- Build key-provider-build/ from a v0.5.11 git worktree with
  `APT_SNAPSHOT=20260423T000000Z ./run.sh` (Rust patch version + rustup-init
  pinned in-recipe). dstack-vmm + the guest OS image stay v0.5.8 (measurements
  pinned on-chain; VMM bump tracked in #3445).
- Add docker-compose-v2 + docker-buildx to prerequisites (run.sh uses docker compose).
- Recommend a self-hosted local PCCS for the key provider in production (avoids a
  third-party single point of failure on the attestation path).
- Add troubleshooting for `AESM service returned error 44` (platform not
  registered with Intel -> enable SGX Auto MP Registration).

Closes #3153
barakeinav1 force-pushed the barak/3153-reproducible-key-provider branch from 82f47a8 to e793ed8, June 3, 2026 13:00
Contributor

gilcu3 left a comment

Thank you!

Development

Successfully merging this pull request may close these issues.

Pin SGX key-provider Dockerfile apt sources for reproducible mr_enclave

3 participants