-
Notifications
You must be signed in to change notification settings - Fork 100
doc: added measured boot documentation #1952
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
kbakanov-dev
wants to merge
5
commits into
NVIDIA:main
Choose a base branch
from
kbakanov-dev:mb_onboarding_docs
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+122
−0
Open
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
f3de5fa
doc: added measured boot documentation
kbakanov-dev bd6f735
Merge branch 'main' of github.com:NVIDIA/bare-metal-manager-core into…
kbakanov-dev db88634
fix: updated doc in line with feedback
kbakanov-dev 58ccd8e
doc: added a clarification
kbakanov-dev ea3ac8c
Merge branch 'main' into mb_onboarding_docs
kbakanov-dev File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,122 @@ | ||
| # Measured Boot Attestation | ||
|
|
||
| ## Introduction | ||
|
|
||
| At the highest level, measured boot works by comparing (attesting) hash values (checksums) of certain critical components of a vulnerable system against some golden values. | ||
|
|
||
| In NICo ecosystem, this system has three components: | ||
|
|
||
| - Custom UEFI code on tenant machines, which performs measurements during machine boot and pushes those hash values (measurements) into so called PCR registers. | ||
| - `scout`, which runs inside discovery image, and which collects PCR values and submits them to `nico-api`. | ||
| - `nico-api`, which analyses PCR values received from `scout`, compares those values against golden values and makes a judgement call whether the measured boot is successful or not. | ||
|
|
||
| Measured Boot attestation happens during the ingestion of a managed host, and after a managed host has been released by a tenant. | ||
|
|
||
| ## Measured Boot Scope and Its Effect | ||
|
|
||
| Measured Boot requires the presence of a TPM (Trusted Platform Module) chip on a managed host. DPUs do not support Measured Boot due to a lack of TPM hardware. | ||
|
|
||
| For the hosts the Measured Boot is configurable and can be turned off with `attestation_enabled` flag set to `false` in `carbide-api-site-config.toml`. | ||
|
|
||
| If Measured Boot is enabled and the attestation fails, the effect will be that `scout` will never get dispensed mTLS certificates that it needs to communicate with `nico-api`. Without mTLS certificates only gRPC calls not requiring mutual authentication will succeed, such as `DiscoverMachine` (which is where attestation happens) or `Version`. | ||
|
|
||
| ## Measured Boot Operation | ||
|
|
||
| A prerequisite for any Measured Boot operation is the presence of a CA certificate. This certificate is needed to verify the authenticity of a TPM. When `scout` submits attestation info, it also includes its TPM's Endorsement Key (EK) certificate. `nico-api` verifies the authenticity of EK certificate (and thus of the TPM) by using a CA certificate. | ||
|
|
||
| When a machine is ingested into NICo, its DPU is being configured first (to provide network connectivity), and then its host is being provisioned second. The host boots into what is called a discovery, or a scout, image. Once booted into discovery image, the `scout` application runs and attempts to register the machine with the `nico-api` by making a gRPC `DiscoverMachine` call. In that call it supplies some attestation information. If the attestation is enabled, the `nico-api` replies with a challenge. The `scout` attempts to solve the challenge with the help of a TPM and, if successful, replies with a solved challenge and with the attestation information including PCR values. `nico-api` verifies the attestation information and the challenge, and, if both of those things are successful, it dispenses mTLS certificates to `scout`, which are necessary for all future communication. If the attestation is unsuccessful, the `scout` will keep calling `DiscoverMachine` for the next ten years at a rate of one call per minute. | ||
|
|
||
| Besides the ingestion path, the Measured Boot will also occur when an assigned machine is released back into the pool, or when attestation status for the machine changes while the machine is in the Ready state. | ||
|
|
||
| ## Configuring Measured Boot | ||
|
|
||
| Once `attestation_enabled` flag is set, the `nico-api` will attempt to do attestation for all hosts. | ||
|
|
||
| By default, if none of the approval strategies have been enacted as discussed below in the section [Configuring nico-api for Attestation](#configuring-nico-api-for-attestation), the machines will be stuck in `Measuring/WaitingForMeasurements` state. | ||
|
|
||
| ## Installing CA Certificates | ||
|
|
||
| First, a CA certificate needs to be installed. This is done using `nico-admin-cli`'s `tpm-ca` commands. | ||
|
|
||
| - `tpm-ca show-unmatched-ek` will show all EK certificates/machines for which there is no matching CA certificate (a machine will be in `Failed/MeasurementsCAValidationFailed` state in this instance). A link to download the CA certificate will typically be shown also. It is important that whoever is adding CA certificates makes sure they are downloaded from a reputable source - the need for a human verification is the reason why this operation was not automated in the first place. | ||
| - `tpm-ca show` will display all installed CA certificates. | ||
| - `tpm-ca add --filename <ca_cert>` will add a given CA certificate to `nico-api`. | ||
| - `tpm-ca delete --ca-id <ca_cert_id>` will delete a given CA certificate from `nico-api`. The `ca_cert_id` can be retrieved from the output of `tpm-ca show` command. | ||
| - `tpm-ca add-bulk --dirname <dir_path>` will add all certificates in a given directory to `nico-api`. | ||
|
|
||
| CA certificates must have extension `.pem`, `.cer` or `.der` and must be in corresponding format. A validation is done in `nico-admin-cli` to confirm that this is the case. | ||
|
|
||
| ## Configuring nico-api for Attestation | ||
|
|
||
| The attestation on the `nico-api` side is done with the help of the following entities, which are all present in `nico-api` DB and can be accessed with `nico-admin-cli`: | ||
|
|
||
| - `report` - this is what `scout` sends to `nico-api`, including PCR values. It is accessible under `measured-boot report` subcommand. | ||
| - `profile` - this is combination of properties that describe a type of a machine. It is used to group identical machines together. It is accessible under `measured-boot profile` subcommand. | ||
| - `bundle` - this is a set of golden values against which attestation is being made. Each bundle is linked to a profile. It is accessible under `measured-boot bundle` subcommand. | ||
| - `journal` - this is an "umbrella" entity linking report to a possible bundle and to a profile. It is accessible under `measured-boot report` subcommand and will contain the measurement status of a machine. | ||
|
|
||
| Once a report is sent for a given machine, it is saved in a DB, and `nico-api` attempts to find a corresponding machine profile, failing that it will create a new profile. After the report is saved and profile is found or created, it will attempt to find the matching bundle and save a journal entry linking all of the above. If a bundle has been identified, the machine's attestation status will be `Measured` and the process returns. | ||
|
|
||
| If a bundle has not been identified, the `nico-api` will attempt to create a bundle from the report in one of two ways: | ||
|
|
||
| 1. It will check if a given machine (or any machine in a site) has been marked for auto-approval. | ||
| 2. Failing that, it will check if a given profile has been marked for auto-approval. | ||
|
|
||
| If a bundle could not be created in any of the two ways above, the attestation will initially fail and `scout` will keep making `DiscoverMachine` calls every minute for the next ten years. | ||
|
|
||
| One way to resolve this situation is to mark a given machine for one shot auto approval with: | ||
|
|
||
| ```sh | ||
| measured-boot site trusted-machine approve <machine id> oneshot --pcr-registers <list of PCR registers> | ||
| ``` | ||
|
|
||
| In this way a given machine will be marked for automatic approval one time only. This will be enough to create an initial bundle. An alternative way would be to promote the existing report with: | ||
|
|
||
| ```sh | ||
| measured-boot report promote <REPORT ID> | ||
| ``` | ||
|
|
||
| Similarly, a given profile can be marked for one shot auto approval with: | ||
|
|
||
| ```sh | ||
| measured-boot site trusted-profile approve <profile id> oneshot --pcr-registers <list of PCR registers> | ||
| ``` | ||
|
|
||
| In both of the above cases, the `--pcr-registers` argument is optional. If omitted, all available PCR registers will be selected. | ||
|
|
||
| If a site is to be left running in permissive mode, an auto approve can be configured for all machines on a site with: | ||
|
|
||
| ```sh | ||
| measured-boot site trusted-machine approve * persist | ||
| ``` | ||
|
|
||
| > Note: the `*` has a tendency to confuse bash scripts, if one is to script this command. | ||
|
|
||
| Similarly, a `persist` auto approval can be set for a profile. | ||
|
|
||
| In order to make it easier set up measured boot on new sites, it is possible to export an existing site with `measured-boot site export` command and then import the values with `measured-boot site import` command on a different site. | ||
|
|
||
| For the full breakdown of possible options, please browse `nico-admin-cli measured-boot` subcommand, where you will discover some other options not discussed here, such as e.g. being able to manually set the state of a bundle. | ||
|
|
||
| ## NICo Web UI | ||
| In the NICo Web UI, there is a Section named "Attestations", where the attestations for all machines are listed. It makes it easier to see the attestations' status at a glance, drill down into individual reports and create bundles out of reports. | ||
|
|
||
| ## Practical Application | ||
| One issue with Measured Boot is that the PCR values can change quite frequently. The operator needs to establish which PCR values are necessary for a particular monitoring use case, and which ones of those are stable enough to include in the bundle. | ||
|
|
||
| ## Troubleshooting | ||
|
|
||
| When a site had just been set up and the attestation had been enabled, all machines will fail attestation unless it had been configured. In order to enable the "accept anything" type of attestation, the easiest way is to execute a command similar to this one: | ||
|
|
||
| ```sh | ||
| nico-admin-cli -c https://nico-api.forge/ measured-boot site trusted-machine approve \* persist --pcr-registers="0,3,5,6" | ||
| ``` | ||
|
|
||
| This will insert a rule to automatically approve all reports for all machines on site `az32`, that rule will persist and bundles will be created out of reports using PCR registers 0, 3, 5 and 6. Obviously, such attestation is not enforcing any specific PCR register values, so it should be used with caution. | ||
|
|
||
| If a machine is stuck in `Measuring/WaitingForMeasurements` state, this means the measurements have not been sent by `scout`. This may be due to the problem booting into discovery/scout image, or maybe due to some problem with the `scout` itself. | ||
|
|
||
| Other Measured Boot error states are: | ||
|
|
||
| - `PendingBundle` - this means attestation is being enforced and measurements from a given host do not have matching golden values. Please refer to above sections for details on creating a new bundle from the latest report. Also, if a closest match has been identified, it will be shown in the Attestation table in NICo UI. Using `nico-admin-cli measured-boot bundle find-closest-match` will achieve the same result. | ||
| - `Failed/MeasurementsCAValidationFailed` indicates that a corresponding CA certificate has not been found for a EK cert that is returned by this machine's TPM. Please refer to above sections on details about installing missing CA certificates. | ||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add this file to the TOC in docs/index.yml. The "Provisioning (Day 0 Operations)" section seems like the proper place to me.