
[vm-repair] Fix unlock failure on Ubuntu 24.04 ADE-encrypted VMs#9907

Open
msaenzbosupport wants to merge 1 commit into
Azure:mainfrom
msaenzbosupport:fix/vm-repair-unlock-ubuntu2404

Conversation

@msaenzbosupport

Description

Fix az vm repair create --unlock-encrypted-vm failing on Ubuntu 24.04 ADE-encrypted VMs.

Problem

The data_os_lvm_check function in linux-mount-encrypted-disk.sh identifies the root partition by filtering partitions larger than 600MB:

export root_part=`lsblk ${data_disk} -l -n -p -b 2>&1 | grep -w -v ${data_disk} | awk '$4 > 600000000{print $1}'`

Ubuntu 24.04 introduced a ~913MB /boot partition (partition 16, ext4, LABEL=BOOT). This partition exceeds the 600MB threshold, causing root_part to capture two values instead of one:

root_part=/dev/sdb1\n/dev/sdb16
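
The double match can be reproduced without an encrypted disk. The following is a minimal sketch, assuming a hypothetical `lsblk -l -n -p -b` listing for an Ubuntu 24.04 disk. Only the ~913MB size of partition 16 comes from the description above; the other partitions, sizes, and device numbers are invented for illustration, and `sample_lsblk` is a stand-in for the real `lsblk` call.

```shell
#!/bin/sh
# Hypothetical listing (NAME MAJ:MIN RM SIZE RO TYPE); only the ~913MB
# size of partition 16 is taken from the PR description.
data_disk=/dev/sdb

sample_lsblk() {
cat <<'EOF'
/dev/sdb   8:16  0 32213303296 0 disk
/dev/sdb1  8:17  0 31137447424 0 part
/dev/sdb14 8:30  0 4194304     0 part
/dev/sdb15 8:31  0 111149056   0 part
/dev/sdb16 259:0 0 957349888   0 part
EOF
}

# Same filter as the original script: drop the whole-disk row, then keep
# every partition whose size column ($4) exceeds 600000000 bytes.
root_part=$(sample_lsblk | grep -w -v "${data_disk}" | awk '$4 > 600000000 {print $1}')

# Both the root partition and the /boot partition pass the threshold.
printf '%s\n' "$root_part"
```

With this listing the variable holds two lines, `/dev/sdb1` and `/dev/sdb16`, which matches the value shown above.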

When unlock_root passes this to cryptsetup, the command becomes:

cryptsetup luksOpen --key-file ... --header ... /dev/sdb1 /dev/sdb16 osencrypt

Where /dev/sdb16 is interpreted as the mapper name instead of osencrypt, producing:

Device sdb16 not found
Cannot use device /dev/sdb16, name is invalid or still in use.
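
The argument shift can be illustrated without calling cryptsetup. This sketch assumes a POSIX shell with default word splitting and uses a hypothetical helper, `mapper_name`, that echoes whichever positional argument follows the device. The `--key-file` and `--header` options are omitted for brevity.

```shell
#!/bin/sh
# Illustration only: mapper_name is a hypothetical stand-in that echoes the
# argument that would be read as the mapper name (the one after the device).
mapper_name() {
    # $1 = action (luksOpen), $2 = device, $3 = mapper name
    printf '%s\n' "$3"
}

# Value captured by the threshold filter on Ubuntu 24.04 (two lines).
root_part='/dev/sdb1
/dev/sdb16'

# Unquoted expansion: the shell splits the value on the newline, so the
# second partition takes the slot intended for osencrypt.
broken_name=$(mapper_name luksOpen $root_part osencrypt)

# With a single-partition value, osencrypt lands in the expected slot.
root_part='/dev/sdb1'
fixed_name=$(mapper_name luksOpen $root_part osencrypt)

printf 'two partitions: mapper name = %s\n' "$broken_name"
printf 'one partition:  mapper name = %s\n' "$fixed_name"
```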

Partition layout comparison

| Ubuntu version | Boot partition | Size | Exceeds 600MB? |
|---|---|---|---|
| 20.04 | sdX2 (ext2) | ~248MB | No |
| 22.04 | sdX2 (ext2) | ~248MB | No |
| 24.04 | sdX16 (ext4) | ~913MB | Yes |

Fix

Replace the fixed-threshold filter with a sort-by-size approach that selects only the largest partition (always root):

export root_part=`lsblk ${data_disk} -l -n -p -b 2>&1 | grep -w -v ${data_disk} | sort -k4 -rn | awk 'NR==1{print $1}'`

This is future-proof against boot partition size changes in newer distro versions.
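
As a rough check of the sort-by-size pipeline, the sketch below runs it against a hypothetical listing. `sample_lsblk` stands in for the real `lsblk` call, and all sizes except the ~913MB boot partition are invented for illustration.

```shell
#!/bin/sh
# Hypothetical listing (NAME MAJ:MIN RM SIZE RO TYPE) standing in for
# `lsblk ${data_disk} -l -n -p -b` on an Ubuntu 24.04 disk.
data_disk=/dev/sdb

sample_lsblk() {
cat <<'EOF'
/dev/sdb   8:16  0 32213303296 0 disk
/dev/sdb1  8:17  0 31137447424 0 part
/dev/sdb14 8:30  0 4194304     0 part
/dev/sdb15 8:31  0 111149056   0 part
/dev/sdb16 259:0 0 957349888   0 part
EOF
}

# Drop the whole-disk row, sort numerically by the size column in
# descending order, and keep only the name on the first row.
root_part=$(sample_lsblk | grep -w -v "${data_disk}" | sort -k4 -rn | awk 'NR==1 {print $1}')

printf '%s\n' "$root_part"
```

Because only the first row is printed, the variable holds exactly one device path regardless of how many partitions exceed any given size.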

Testing

  • Ubuntu 24.04 Gen 1 + ADE: unlock now succeeds (was failing before)
  • Ubuntu 22.04 Gen 1/Gen 2 + ADE: no regression (verified)
  • Ubuntu 20.04 Gen 1/Gen 2 + ADE: no regression (verified)
  • RHEL 7/8/9 LVM + ADE: not affected (uses LVM branch, not this code path)

Affected file

src/vm-repair/azext_vm_repair/scripts/linux-mount-encrypted-disk.sh

Copilot AI review requested due to automatic review settings May 29, 2026 18:02
@azure-client-tools-bot-prd

Validation for Breaking Change Starting...

Thanks for your contribution!

@azure-client-tools-bot-prd

Hi @msaenzbosupport,
Please write the description of changes which can be perceived by customers into HISTORY.rst.
If you want to release a new extension version, please update the version in setup.py as well.

Contributor

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates the Linux encrypted disk mounting helper to more reliably detect the OS/root partition on distros where /boot may exceed the previous size threshold.

Changes:

  • Replace the fixed size-threshold partition selection with “largest partition wins”.
  • Add clarifying comments explaining why the selection logic changed.

# Select the largest partition on the data disk (root is always the largest).
# Using sort+head instead of a size threshold to avoid matching /boot partitions
# that exceed 600MB (e.g. Ubuntu 24.04 has a ~913MB /boot on partition 16).
export root_part=`lsblk ${data_disk} -l -n -p -b 2>&1 | grep -w -v ${data_disk} | sort -k4 -rn | awk 'NR==1{print $1}'` >> ${logpath}/${logfile}
@msaenzbosupport
Author

Re: Copilot review comments

Thanks for the review. Both observations about 2>&1 inside the command substitution and the unquoted variables with >> ${logpath}/${logfile} at the end are pre-existing patterns in the original code — every line in this function uses the same style:

# Original line (before this PR):
export root_part=`lsblk ${data_disk} -l -n -p -b 2>&1 | grep -w -v ${data_disk} |awk '$4 > 600000000{print $1}'` >> ${logpath}/${logfile}

The scope of this PR is intentionally limited to fixing the partition selection logic (threshold → largest partition) to unblock Ubuntu 24.04 ADE unlock. Refactoring the logging/quoting patterns across the entire script would be a separate effort.

Re: HISTORY.rst / setup.py

This change only affects a shell script that runs inside the repair VM via az vm run-command invoke. It does not change any Python code, CLI parameters, or extension behavior beyond fixing the broken Ubuntu 24.04 unlock path. Happy to add a HISTORY.rst entry if the maintainers prefer it — please advise.

The root partition detection in data_os_lvm_check uses a 600MB size
threshold to filter partitions. Ubuntu 24.04 has a ~913MB /boot
partition (partition 16) that also exceeds this threshold, causing
root_part to capture two partitions instead of one.

This results in cryptsetup receiving an invalid device name argument:
  cryptsetup luksOpen ... /dev/sdb1 /dev/sdb16 osencrypt
instead of:
  cryptsetup luksOpen ... /dev/sdb1 osencrypt

The error manifests as:
  Device sdb16 not found
  Cannot use device /dev/sdb16, name is invalid or still in use.

Fix: Replace the fixed-threshold filter with a sort-by-size approach
that selects only the largest partition (which is always the root
partition). This is future-proof against /boot partition size changes.

Additional improvements per review feedback:
- Use $() instead of backticks for command substitution
- Redirect stderr to logfile instead of capturing into the variable
- Quote variables to prevent word-splitting issues
- Separate export from assignment for clarity

Tested on Ubuntu 24.04 Gen 1 and Gen 2 with ADE encryption - unlock
now succeeds. Also verified no regression on Ubuntu 20.04 and 22.04.
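
The four review-feedback items in the commit message above could be combined roughly as follows. This is a sketch under stated assumptions, not the line from the PR: `list_partitions` is a hypothetical stand-in for `lsblk "${data_disk}" -l -n -p -b` that also writes a simulated warning to stderr, the log directory is a temporary one, and the log file name is invented.

```shell
#!/bin/sh
# Hypothetical stand-in for `lsblk "${data_disk}" -l -n -p -b`; it emits a
# sample listing on stdout and a simulated warning on stderr.
list_partitions() {
    echo "warning: simulated stderr noise" >&2
    cat <<'EOF'
/dev/sdb   8:16  0 32213303296 0 disk
/dev/sdb1  8:17  0 31137447424 0 part
/dev/sdb16 259:0 0 957349888   0 part
EOF
}

data_disk=/dev/sdb
logpath=$(mktemp -d)      # assumption: temporary log directory for the demo
logfile=unlock-demo.log   # assumption: illustrative log file name

# $() instead of backticks, stderr appended to the log file rather than
# captured into the variable, and every expansion quoted.
root_part=$(list_partitions 2>>"${logpath}/${logfile}" \
    | grep -w -v "${data_disk}" \
    | sort -k4 -rn \
    | awk 'NR==1 {print $1}')

# Export is a separate statement from the assignment.
export root_part

printf 'root_part=%s\n' "$root_part"
```

Keeping stderr out of the command substitution means a warning from the listing command ends up in the log instead of being mistaken for a partition name.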
@msaenzbosupport msaenzbosupport force-pushed the fix/vm-repair-unlock-ubuntu2404 branch from 65b5286 to 6c8be38 Compare May 29, 2026 19:55
@yonzhan yonzhan assigned yanzhudd and unassigned zhoxing-ms May 29, 2026
@yonzhan yonzhan removed the request for review from zhoxing-ms May 29, 2026 23:33
@yonzhan
Collaborator

yonzhan commented May 29, 2026

vm-repair
