
refine: Refine virt for vmm#114

Merged
junyu0312 merged 1 commit into main from refactor_hvp
Apr 2, 2026

Conversation

@junyu0312
Owner

@junyu0312 junyu0312 commented Apr 1, 2026

Summary by CodeRabbit

  • Refactor
    • Restructured virtual machine lifecycle management, replacing builder pattern with manager-based initialization for improved control flow.
    • Consolidated vCPU trait hierarchy and introduced dedicated error type for enhanced diagnostics during VM operations.
    • Separated hypervisor abstraction into distinct VM creation and instance management responsibilities for cleaner architecture boundaries.

@coderabbitai

coderabbitai bot commented Apr 1, 2026

📝 Walkthrough

This PR fundamentally restructures the VM and vCPU abstraction layers. It removes execution responsibilities from the generic Virt trait, introduces a new Vm trait for VM management, replaces the generic Vcpu<A> trait with an architecture-agnostic Vcpu trait, creates a new VcpuManager to centralize vCPU lifecycle management, introduces VcpuError as the dedicated vCPU error type, and replaces VmBuilder with a Vmm struct. Architecture-specific backends (HVP, KVM) and the PSCI implementation are updated accordingly.
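The trait split described above can be sketched in miniature. This is an illustrative sketch, not the PR's actual code: the trait and type names (Virt, Vm, Vcpu, VcpuManager, VcpuError, post_init_within_thread, run, start_vcpu) come from the walkthrough, but every method body, the DummyVcpu backend, and the demo() function are assumptions made up for the example.

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

// Illustrative stand-in for the PR's dedicated vCPU error type.
#[derive(Debug)]
enum VcpuError {
    VcpuNotCreated(usize),
}

// After the split, Virt only creates VMs; the Vm trait owns VM management.
trait Vm {
    fn id(&self) -> usize;
}

trait Virt {
    fn create_vm(&self) -> Arc<dyn Vm>;
}

// Architecture-agnostic vCPU trait, per the walkthrough.
trait Vcpu: Send {
    fn post_init_within_thread(&mut self) -> Result<(), VcpuError>;
    fn run(&mut self) -> Result<(), VcpuError>;
}

// Minimal manager: owns the vCPUs and spawns one thread per vCPU.
struct VcpuManager {
    vcpus: Vec<Arc<Mutex<dyn Vcpu>>>,
}

impl VcpuManager {
    fn start_vcpu(&self, vcpu_id: usize) -> Result<(), VcpuError> {
        let vcpu = self
            .vcpus
            .get(vcpu_id)
            .cloned()
            .ok_or(VcpuError::VcpuNotCreated(vcpu_id))?;
        thread::spawn(move || {
            let mut vcpu = vcpu.lock().unwrap();
            // Lazy per-thread init, then enter the run loop.
            vcpu.post_init_within_thread()?;
            vcpu.run()
        });
        Ok(())
    }
}

// Dummy backend used only to exercise the shape of the API.
struct DummyVcpu {
    tx: Sender<u32>,
}

impl Vcpu for DummyVcpu {
    fn post_init_within_thread(&mut self) -> Result<(), VcpuError> {
        Ok(())
    }
    fn run(&mut self) -> Result<(), VcpuError> {
        self.tx.send(42).unwrap(); // stand-in for guest execution
        Ok(())
    }
}

fn demo() -> u32 {
    let (tx, rx) = mpsc::channel();
    let vcpu: Arc<Mutex<dyn Vcpu>> = Arc::new(Mutex::new(DummyVcpu { tx }));
    let manager = VcpuManager { vcpus: vec![vcpu] };
    manager.start_vcpu(0).unwrap();
    rx.recv().unwrap()
}

fn main() {
    assert_eq!(demo(), 42);
}
```

The key design point mirrored here is that vCPU construction and per-thread initialization are separate steps, which is what allows the manager to spawn the thread that will own the vCPU before any thread-affine hypervisor calls happen.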

Changes

Cohort / File(s) — Summary

  • Error handling foundation — crates/vm-core/src/vcpu/error.rs, crates/vm-machine/src/error.rs
    Introduced the VcpuError enum with platform-specific variants (ApplevisorError, KvmError, PsciError) plus DeviceError, GuestError, and VcpuNotCreated. Extended the machine-layer Error with VmAlreadyExists, VmNotExists, and VcpuError variants.
  • Core trait layer restructuring — crates/vm-core/src/arch/vcpu.rs, crates/vm-core/src/arch.rs, crates/vm-core/src/virt.rs
    Removed the generic Vcpu<A> trait; introduced a new Vm trait for VM creation and vCPU management; simplified the Virt trait to a single create_vm() method; removed the vcpu module export from the arch layer.
  • Architecture-specific vCPU traits — crates/vm-core/src/vcpu/vcpu.rs, crates/vm-core/src/arch/aarch64/vcpu.rs, crates/vm-core/src/arch/x86_64/vcpu.rs
    Introduced a new Vcpu trait (architecture-conditional) with vm_exit_handler(), post_init_within_thread(), and run() methods, decoupled from the Vcpu<A> bounds. Updated AArch64Vcpu and X86_64Vcpu to use VcpuError and mutable receivers; added get_psci_handler() to AArch64Vcpu.
  • PSCI firmware layer — crates/vm-core/src/arch/aarch64/firmware/psci.rs, crates/vm-core/src/arch/aarch64/firmware/psci/psci_0_2.rs
    Updated Psci::call() to take &mut dyn AArch64Vcpu; replaced the cpu_on_barrier channel-based synchronization in Psci02 with a shared Arc<Mutex<VcpuManager>>; vCPUs are now started directly via start_vcpu() instead of channel messaging.
  • AArch64 VM-exit and register handling — crates/vm-core/src/arch/aarch64/vm_exit.rs, crates/vm-core/src/arch/aarch64/vcpu.rs, crates/vm-core/src/arch/aarch64/vcpu/reg/esr_el2.rs
    Changed handle_vm_exit() to retrieve handlers from vCPU trait methods; updated the error type to VcpuError; added a setup_cpu() function for CPU initialization; changed esr_el2::ec() to return VcpuError.
  • vCPU manager infrastructure — crates/vm-core/src/vcpu/vcpu_manager.rs
    Added the new VcpuManager struct managing the VM instance and vCPU collection; it implements vCPU creation/initialization and per-vCPU thread spawning with architecture-specific CPU setup and VM-exit handling.
  • HVP/Apple Hypervisor backend — crates/vm-core/src/virt/hvp.rs, crates/vm-core/src/virt/hvp/vcpu.rs
    Replaced the Hvp struct with AppleHypervisor (implements the new Virt trait) and AppleHypervisorVm (implements the Vm trait); refactored HvpVcpu to store dependencies as Arc trait objects and Option-wrapped handles; implements the new Vcpu trait; added post_init_within_thread() for lazy vCPU creation; changed vcpu_id from u64 to usize.
  • KVM backend — crates/vm-core/src/virt/kvm.rs, crates/vm-core/src/virt/kvm/vcpu/arch/aarch64.rs, crates/vm-core/src/virt/kvm/vcpu/arch/x86_64.rs
    Updated the Virt implementation to provide only create_vm() (currently a todo); updated vCPU trait method signatures to use VcpuError; changed the AArch64Vcpu/X86_64Vcpu implementations to use mutable receivers and the new error types; implemented the new Vcpu trait; added vm_exit_handler() and post_init_within_thread() stubs.
  • VM machine layer refactoring — crates/vm-machine/src/vm.rs, crates/vm-machine/src/vm/config.rs, crates/vm-machine/src/vmm.rs
    Replaced the generic Vm<V> with a concrete Vm holding a dynamic Arc<dyn vm_core::virt::Vm> and a shared VcpuManager; introduced a VmConfig struct for configuration; added a new Vmm struct managing the hypervisor instance and VM lifecycle; renamed run() to boot() in Vm.
  • VmBuilder removal and architecture updates — crates/vm-machine/src/vm_builder.rs
    Deleted the entire VmBuilder implementation; VM creation is now handled by Vmm::create_vm_from_config().
  • Module exports and compilation — crates/vm-machine/src/lib.rs, crates/vm-core/src/lib.rs, crates/vm-cli/src/main.rs
    Commented out #![deny(warnings)] in both vm-machine and vm-cli; updated the vm-machine module export from vm_builder to vmm; added a vcpu module export to vm-core; reworked vm-cli to use Vmm::new() and Vmm::create_vm_from_config() instead of VmBuilder.
  • Device initialization — crates/vm-machine/src/device.rs
    Updated InitDevice::init_devices() to accept a &[Device] slice instead of an owned Vec<Device>, eliminating an unnecessary ownership transfer.

Sequence Diagram

sequenceDiagram
    participant CLI as CLI
    participant Vmm
    participant Virt as Virt (Hypervisor)
    participant Vm as Vm Instance
    participant VcpuMgr as VcpuManager
    participant Vcpu
    participant Handler as DeviceVmExitHandler

    CLI->>Vmm: new(Box::new(AppleHypervisor))
    activate Vmm

    CLI->>Vmm: create_vm_from_config(VmConfig)
    activate Vmm
    
    Vmm->>Virt: create_vm()
    activate Virt
    Virt->>Vm: new() + hv_vm_create
    activate Vm
    Vm-->>Virt: Arc<dyn Vm>
    deactivate Vm
    Virt-->>Vmm: Arc<dyn Vm>
    deactivate Virt
    
    Vmm->>Vmm: allocate & map RAM<br/>create IRQ chip<br/>init devices
    
    Vmm->>VcpuMgr: new(vm_instance)
    activate VcpuMgr
    VcpuMgr-->>Vmm: VcpuManager
    deactivate VcpuMgr
    
    loop for each vCPU
        Vmm->>VcpuMgr: create_vcpu(id, handler, psci)
        activate VcpuMgr
        VcpuMgr->>Vm: create_vcpu(id, handler, psci)
        activate Vm
        Vm->>Vcpu: new(id, handler, psci)
        activate Vcpu
        Vcpu-->>Vm: HvpVcpu (no hv_vcpu_create yet)
        deactivate Vcpu
        Vm-->>VcpuMgr: Arc<Mutex<dyn Vcpu>>
        deactivate Vm
        VcpuMgr-->>Vmm: ()
        deactivate VcpuMgr
    end
    
    Vmm-->>CLI: create_vm_from_config() done
    deactivate Vmm
    
    CLI->>Vmm: run(bootloader)
    activate Vmm
    
    Vmm->>Vmm: bootloader.load(...)
    
    Vmm->>VcpuMgr: start_vcpu(0, start_pc, x0)
    activate VcpuMgr
    
    Note over VcpuMgr: Spawn thread per vCPU
    VcpuMgr->>Vcpu: post_init_within_thread()<br/>(calls hv_vcpu_create)
    activate Vcpu
    Vcpu-->>VcpuMgr: ()
    deactivate Vcpu
    
    loop VM execution
        VcpuMgr->>Vcpu: run()
        activate Vcpu
        Vcpu->>Handler: vm_exit_handler()
        Handler-->>Vcpu: Arc<dyn DeviceVmExitHandler>
        Vcpu-->>VcpuMgr: VmExitReason
        deactivate Vcpu
        
        VcpuMgr->>VcpuMgr: handle_vm_exit result
    end
    
    VcpuMgr-->>Vmm: ()
    deactivate VcpuMgr
    
    Vmm-->>CLI: run() done
    deactivate Vmm

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

  • feat: Support smp #77 — Both modify the same PSCI/vCPU/HVP/Virt code paths, with related refactoring of Psci trait signature, Psci02 implementation, HvpVcpu changes, and introduction of VcpuManager for vCPU lifecycle management.
  • Refactor hvp #113 — Overlapping changes to VM construction/virt API, including refactoring of Virt/Vm trait separation and updates to Hvp/hypervisor backend implementations.
  • Refactor hvp #112 — Directly related through modifications to the VM/vCPU abstraction layer in virt.rs and HVP backend vCPU storage and trait implementations.

Poem

🐰 Refactored the layers with care,
From Virt to Vm, everywhere,
VcpuManager threads now spawn,
A cleaner dance from dusk to dawn,
Traits now flow like morning dew! 🌱

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 1.61%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed: the title 'Refine virt for vmm' clearly describes the main refactoring effort: restructuring the virtualization trait and VM management to support the new Vmm (virtual machine monitor) abstraction.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 17

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/vm-core/src/virt/kvm/vcpu/arch/x86_64.rs (1)

33-37: ⚠️ Potential issue | 🟡 Minor

Guest debugging enabled unconditionally may impact performance.

KVM_GUESTDBG_SINGLESTEP is enabled by default in init_arch_vcpu. This causes the guest to exit after every instruction, which significantly degrades performance. Consider making this configurable or removing it for production use.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm-core/src/virt/kvm/vcpu/arch/x86_64.rs` around lines 33 - 37,
init_arch_vcpu currently enables guest debugging with KVM_GUESTDBG_SINGLESTEP
unconditionally by calling set_guest_debug with kvm_guest_debug, which forces
single‑step exits and hurts performance; update init_arch_vcpu to make
single‑step optional (e.g., add a config flag or feature) and only include
KVM_GUESTDBG_SINGLESTEP in the control field when that flag is set (leave
KVM_GUESTDBG_ENABLE as needed), or remove the single‑step bit for production
paths so set_guest_debug / kvm_guest_debug are invoked without
KVM_GUESTDBG_SINGLESTEP unless explicitly requested.
🧹 Nitpick comments (8)
crates/vm-core/src/lib.rs (1)

1-1: Re-enable a warnings gate to avoid silent regressions.

Commenting out #![deny(warnings)] weakens CI/compiler feedback and can let quality issues accumulate. Prefer keeping it enabled and adding targeted #[allow(...)] where intentionally needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm-core/src/lib.rs` at line 1, Uncomment the crate-level attribute
that enforces warnings as errors by restoring #![deny(warnings)] at the top of
the crate (undo the commented-out line "// #![deny(warnings)]") so the compiler
treats warnings as failures; if specific warnings are noisy, add targeted
#[allow(...)] annotations on the offending items rather than disabling the deny
for the whole crate.
crates/vm-machine/src/lib.rs (1)

1-1: Please keep a warnings gate at crate level.

Disabling #![deny(warnings)] reduces compile-time enforcement and can hide regressions; prefer explicit targeted allows instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm-machine/src/lib.rs` at line 1, The crate-level warnings gate was
removed (the commented-out #![deny(warnings)]), re-enable a crate-level deny by
restoring the attribute at the top of lib.rs (uncomment or add
#![deny(warnings)]) and, if specific warnings are noisy, replace with targeted
allows (e.g., #![allow(...)] for specific lint names) rather than disabling the
whole deny; look for the crate root attribute location in lib.rs to apply this
change.
crates/vm-machine/src/vmm.rs (1)

103-103: Use expect() instead of unwrap() for better diagnostics.

♻️ Proposed fix
-            vcpu_manager.lock().unwrap().create_vcpu(
+            vcpu_manager.lock().expect("vcpu_manager mutex poisoned").create_vcpu(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm-machine/src/vmm.rs` at line 103, The call
vcpu_manager.lock().unwrap().create_vcpu should use expect() instead of unwrap()
to provide a clear panic message; replace the unwrap() on the MutexGuard
acquisition (vcpu_manager.lock().unwrap()) with
vcpu_manager.lock().expect("failed to lock vcpu_manager mutex when creating
vCPU") so that the failure context is included when calling create_vcpu; ensure
the expect message references vcpu_manager and create_vcpu to aid diagnostics.
crates/vm-core/src/virt/hvp/vcpu.rs (1)

187-196: Consider documenting the thread-affinity requirement.

The method name post_init_within_thread suggests it must be called from the vCPU's dedicated thread, but this contract isn't enforced. Add documentation clarifying this requirement.

📝 Proposed documentation
+    /// Initializes the vCPU within its dedicated thread.
+    ///
+    /// # Safety Contract
+    /// This method MUST be called from the thread that will execute this vCPU.
+    /// All subsequent calls to `run()` and register accessors MUST occur from
+    /// the same thread. Violating this invariant causes undefined behavior.
     fn post_init_within_thread(&mut self) -> Result<(), VcpuError> {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm-core/src/virt/hvp/vcpu.rs` around lines 187 - 196, Add a doc
comment to post_init_within_thread in vcpu.rs stating it must be called from the
vCPU's dedicated thread (thread-affinity requirement), explaining that
hv_vcpu_create expects to run on that thread and that calling it from any other
thread is undefined/unsafe and may cause errors; reference the function name
post_init_within_thread and the fields it sets (self.handler and self.exit) so
readers know the contract and side effects.
crates/vm-core/src/virt/kvm/vcpu/arch/aarch64.rs (1)

31-32: Replace assert_eq! with proper error handling.

Using assert_eq! for length validation will panic in release builds if the KVM ioctl returns an unexpected length. Consider returning an error instead.

♻️ Proposed fix
     fn get_core_reg(&mut self, reg: CoreRegister) -> Result<u64, VcpuError> {
         let mut bytes = [0; 8];
         let len = self.vcpu_fd.get_one_reg(reg.to_kvm_reg(), &mut bytes)?;
-        assert_eq!(len, 8);
+        if len != 8 {
+            return Err(VcpuError::InvalidRegisterSize { expected: 8, actual: len });
+        }

         let value = u64::from_le_bytes(bytes);
         Ok(value)
     }

Also applies to: 41-42

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm-core/src/virt/kvm/vcpu/arch/aarch64.rs` around lines 31 - 32,
Replace the panic caused by assert_eq!(len, 8) with proper error handling: after
calling self.vcpu_fd.get_one_reg(reg.to_kvm_reg(), &mut bytes) check if len != 8
and return a descriptive error (propagate a suitable crate error or io::Error
with context like "unexpected register size from KVM for reg {reg}") instead of
asserting; do the same for the other occurrence (the second get_one_reg check
around the later len variable) so both places return an Err with context rather
than panicking.
crates/vm-core/src/arch/aarch64/vcpu.rs (2)

28-32: Use named constants for PState bit manipulation.

Magic constants 0x03C0, 0xf, and 0x0005 are hard to understand. Consider defining named constants for DAIF mask and EL1h mode.

♻️ Proposed refactor
+const PSTATE_DAIF_MASK: u64 = 0x03C0;  // D, A, I, F bits
+const PSTATE_MODE_MASK: u64 = 0x0F;    // Mode bits [3:0]
+const PSTATE_EL1H: u64 = 0x05;         // EL1 with SP_EL1
+
 pub fn setup_cpu(...) -> Result<(), VcpuError> {
     // ...
     {
         let mut pstate = vcpu.get_core_reg(CoreRegister::PState)?;
-        pstate |= 0x03C0; // DAIF
-        pstate &= !0xf; // Clear low 4 bits
-        pstate |= 0x0005; // El1h
+        pstate |= PSTATE_DAIF_MASK;
+        pstate &= !PSTATE_MODE_MASK;
+        pstate |= PSTATE_EL1H;
         vcpu.set_core_reg(CoreRegister::PState, pstate)?;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm-core/src/arch/aarch64/vcpu.rs` around lines 28 - 32, Replace the
magic numeric masks used when adjusting the CPU PState with named constants to
improve readability: introduce constants like DAIF_MASK (0x03C0), LOW4_MASK
(0xF) or LOW4_CLEAR (~0xF), and EL1H_MODE (0x0005) and then use them in the
sequence that reads and updates pstate (the variable pstate,
CoreRegister::PState, and the calls vcpu.get_core_reg / vcpu.set_core_reg) so
the code becomes pstate |= DAIF_MASK; pstate &= !LOW4_MASK; pstate |= EL1H_MODE;
keeping the same logic but replacing the literals with descriptive constant
names.

35-37: Dead code blocks with if false { todo!() } should be removed or converted to comments.

Multiple if false { ... } blocks contain todo!() placeholders that will never execute. This is confusing dead code. Consider either:

  1. Removing them entirely
  2. Converting to // TODO: comments
  3. Using feature flags if the functionality is planned
♻️ Example refactor for one block
-        // more, non secure el1
-        if false {
-            todo!()
-        }
+        // TODO: Configure non-secure EL1 if needed

Also applies to: 52-56, 58-63, 75-77

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm-core/src/arch/aarch64/vcpu.rs` around lines 35 - 37, Remove the
unreachable "if false { todo!() }" blocks in vcpu.rs (or convert them into clear
comments or gated feature-flagged code); specifically locate each occurrence
(the blocks at the shown locations) and either delete the dead branch or replace
it with a `// TODO: ...` comment or a cfg/gate for planned functionality so no
`todo!()` is left inside an always-false conditional; ensure any intended future
work is documented or controlled by feature flags and that no unreachable
`todo!()` calls remain in functions/impls within this module.
crates/vm-core/src/vcpu/vcpu_manager.rs (1)

53-53: Use expect() or handle poisoned mutex gracefully.

If another vCPU thread panics while holding a mutex (e.g., due to an unhandled error), this unwrap() will panic and crash the entire VM. Consider using expect() with a descriptive message or handling the PoisonError explicitly.

♻️ Proposed fix
-            let mut vcpu = vcpu.lock().unwrap();
+            let mut vcpu = vcpu.lock().expect("vCPU mutex poisoned - another vCPU thread panicked");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm-core/src/vcpu/vcpu_manager.rs` at line 53, The code currently does
vcpu.lock().unwrap() which will panic the whole VM if the mutex is poisoned;
replace the unwrap with either vcpu.lock().expect("failed to lock vcpu mutex:
mutex poisoned") or explicitly handle PoisonError by matching vcpu.lock() ->
Ok(mut vcpu) => { ... } and Err(poison) => { let mut vcpu = poison.into_inner();
/* log warning about poisoned mutex */ ... } so the vCPU mutex is recovered
gracefully and a descriptive message is logged; update the occurrence of
vcpu.lock().unwrap() in vcpu_manager.rs accordingly.
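The PoisonError-recovery option the comment describes can be sketched as follows. This is a minimal standalone example, not the VcpuManager code itself: the u32 counter stands in for the real vCPU state, and the recovery policy (log and continue with the recovered data) is one reasonable choice among several.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Recover from a poisoned mutex instead of unwrap()-panicking the whole VM.
fn read_counter(counter: &Arc<Mutex<u32>>) -> u32 {
    match counter.lock() {
        Ok(guard) => *guard,
        Err(poisoned) => {
            // Another thread panicked while holding the lock; the data is
            // still accessible via into_inner(), so log and recover.
            eprintln!("warning: mutex poisoned, recovering");
            *poisoned.into_inner()
        }
    }
}

fn main() {
    let counter = Arc::new(Mutex::new(7u32));
    let c = Arc::clone(&counter);
    // Poison the mutex by panicking while the guard is held.
    let _ = thread::spawn(move || {
        let _guard = c.lock().unwrap();
        panic!("simulated vCPU thread panic");
    })
    .join();
    // The lock is now poisoned, but the value is still recoverable.
    assert_eq!(read_counter(&counter), 7);
}
```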
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/vm-cli/src/main.rs`:
- Around line 3-9: The code calls the bail! macro but never brings it into
scope; either add an import for the macro (e.g., use anyhow::bail;) near the
other imports or qualify calls as anyhow::bail!(...) where used (e.g., in the
function that currently invokes bail!). Update the top-level imports in main.rs
(alongside anyhow::Result) or change the bail! invocation to anyhow::bail! to
resolve the compilation error.
- Around line 55-58: The match arm for Accel::Kvm currently contains a todo!()
which will panic at runtime; replace that reachable panic with a proper,
non-panicking fallback: either return a structured error (e.g., propagate an Err
with a clear message like "KVM backend not implemented" from the function that
matches on Accel::Kvm) or map the variant to a safe fallback backend if
supported; locate the Accel::Kvm match arm in main.rs and change the todo!() to
return/propagate the error (using the function's Result/error type) or
remove/exclude the KVM option until implementation is added.

In `@crates/vm-core/src/arch/aarch64/firmware/psci/psci_0_2.rs`:
- Around line 53-58: The CPU_ON64 handler currently casts the guest MPIDR from
get_smc_arg1() directly to a vCPU index and calls
self.vcpu_manager.lock().unwrap().start_vcpu(...).unwrap(), which mis-maps MPIDs
in non-trivial topologies and panics on errors; change it to (1) resolve the
guest MPIDR to the correct zero-based vCPU index via your VM/CPUID mapping
routine (e.g., call the existing MPIDR→vcpu lookup helper or add one) instead of
casting target_cpu directly, (2) avoid unwraps: handle the lock() Result and the
start_vcpu Result and translate failures into the appropriate PSCI status codes
(e.g., INVALID_PARAMETERS, ALREADY_ON, DENIED, INTERNAL_FAILURE) returned by the
PSCI handler, and (3) ensure the call site uses the resolved vcpu index when
invoking VcpuManager::start_vcpu rather than target_cpu as usize so
non-sequential MPIDR topologies work correctly.
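The MPIDR-to-index resolution plus PSCI status-code handling suggested above can be sketched like this. The lookup table and helper name are hypothetical (the real VMM would build the table from the topology it programmed into each vCPU's MPIDR_EL1); the status values are the standard PSCI 0.2 return codes.

```rust
use std::collections::HashMap;

// Standard PSCI 0.2 return code (SUCCESS = 0, INVALID_PARAMETERS = -2, etc.).
const PSCI_INVALID_PARAMETERS: i64 = -2;

// Hypothetical MPIDR -> zero-based vCPU-index table lookup, replacing the
// direct `target_cpu as usize` cast.
fn resolve_target_cpu(map: &HashMap<u64, usize>, mpidr: u64) -> Result<usize, i64> {
    // Keep only the affinity fields: Aff3 [39:32], Aff2 [23:16],
    // Aff1 [15:8], Aff0 [7:0].
    let affinity = mpidr & 0x00ff_00ff_ffff;
    map.get(&affinity).copied().ok_or(PSCI_INVALID_PARAMETERS)
}

fn main() {
    let mut map = HashMap::new();
    map.insert(0x0000, 0usize); // cluster 0, core 0
    map.insert(0x0100, 1usize); // cluster 1, core 0 (non-sequential MPIDR)
    assert_eq!(resolve_target_cpu(&map, 0x0100), Ok(1));
    // Unknown MPIDR maps to a PSCI error status instead of a panic.
    assert_eq!(resolve_target_cpu(&map, 0xdead), Err(PSCI_INVALID_PARAMETERS));
}
```

The PSCI handler would then return the Err value directly as the SMC result register, so a bad guest argument yields INVALID_PARAMETERS rather than an unwrap panic in the VMM.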

In `@crates/vm-core/src/arch/aarch64/vcpu.rs`:
- Line 17: The MPIDR_EL1 value is being set by directly casting cpu_id to u64
which violates the ARM bitfield layout; update the code around the
vcpu.set_sys_reg(SysRegister::MpidrEl1, ...) call to build MPIDR_EL1 with the
RES1 bit (bit 31) set and proper affinity fields (Aff3–Aff0 in bits
[39:32],[23:16],[15:8],[7:0]) while keeping reserved bits zero; for
single-cluster use (set bit 31) and place cpu_id into Aff0, or for multi-cluster
split cpu_id into Aff0–Aff3 and OR them into the proper bit positions before
passing the constructed value to vcpu.set_sys_reg(SysRegister::MpidrEl1, ...).
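The single-cluster construction described above is just bit assembly; a minimal sketch (the helper name is made up, the bit layout is the architectural one):

```rust
// MPIDR_EL1 layout: RES1 at bit 31; Aff0 in bits [7:0], Aff1 [15:8],
// Aff2 [23:16], Aff3 [39:32]; reserved bits zero.
const MPIDR_RES1: u64 = 1 << 31;

// Single-cluster case: cpu_id goes into Aff0 only.
fn mpidr_for_cpu(cpu_id: u64) -> u64 {
    MPIDR_RES1 | (cpu_id & 0xff)
}

fn main() {
    assert_eq!(mpidr_for_cpu(0), 0x8000_0000);
    assert_eq!(mpidr_for_cpu(3), 0x8000_0003);
}
```

The result is what would be passed to vcpu.set_sys_reg(SysRegister::MpidrEl1, ...) instead of the raw cpu_id cast.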

In `@crates/vm-core/src/arch/aarch64/vcpu/reg/esr_el2.rs`:
- Around line 38-42: The ec() method eagerly allocates the error string because
ok_or(...) calls format! on every invocation; change it to cache the raw value
(call self.ec_raw() once into a local like raw) and use ok_or_else(...) so the
GuestError string is only constructed on the Err path (e.g.,
Ec::from_repr(raw).ok_or_else(|| VcpuError::GuestError(format!("unknown ec:
0x{:x}", raw)))).

In `@crates/vm-core/src/vcpu/vcpu_manager.rs`:
- Around line 45-50: The start_vcpu function currently indexes the vcpus Vec by
vcpu_id which assumes contiguous ordering; change the lookup to be id-based:
either replace the vcpus Vec with a HashMap<usize, Arc<Mutex<dyn Vcpu>>> and
lookup by key, or if keeping Vec, iterate/search for a vCPU whose own id matches
(e.g., call a getter like vcpu.id() or compare a stored id field) and return
VcpuError::VcpuNotCreated(vcpu_id) when not found; update all codepaths that
construct or access vcpus (creation code and other getters) to use the new map
or the validated lookup to ensure correct vCPU selection in start_vcpu.
- Around line 52-80: The vCPU thread spawned in the closure that calls
post_init_within_thread(), setup_cpu(), vcpu.run(), and handle_vm_exit() returns
Result<(), VcpuError> but the JoinHandle is immediately dropped so errors are
lost; modify VcpuManager to capture and store the JoinHandle (or send
Result<VcpuError> back via a channel) so the main thread can observe
failures—locate the thread::spawn(...) block around the closure, return or push
the JoinHandle into the manager's handle collection (or create an error channel
and send Err(e) before terminating the thread) and ensure callers can join/check
the handle or receive the propagated error.
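The JoinHandle-retention fix can be sketched generically. The VcpuThreads type and the simulated failure are inventions for the example; the point is only that storing the handles makes each thread's Result<(), VcpuError> observable instead of silently dropped:

```rust
use std::thread::{self, JoinHandle};

#[derive(Debug)]
enum VcpuError {
    GuestError(String),
}

// Keep each vCPU thread's JoinHandle so run-loop errors are observable.
struct VcpuThreads {
    handles: Vec<JoinHandle<Result<(), VcpuError>>>,
}

impl VcpuThreads {
    fn spawn(&mut self, vcpu_id: usize) {
        self.handles.push(thread::spawn(move || {
            // Simulated run loop: vCPU 1 fails, others exit cleanly.
            if vcpu_id == 1 {
                return Err(VcpuError::GuestError("simulated exit failure".into()));
            }
            Ok(())
        }));
    }

    // Callers can now join and inspect every vCPU's outcome.
    fn join_all(self) -> Vec<Result<(), VcpuError>> {
        self.handles.into_iter().map(|h| h.join().unwrap()).collect()
    }
}

fn main() {
    let mut threads = VcpuThreads { handles: Vec::new() };
    threads.spawn(0);
    threads.spawn(1);
    let results = threads.join_all();
    assert!(results[0].is_ok());
    assert!(results[1].is_err());
}
```

An mpsc channel carrying the error back to the manager would work equally well when the main thread must react before the vCPU thread fully exits.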

In `@crates/vm-core/src/virt/hvp.rs`:
- Around line 195-201: The vm_config handle from hv_vm_config_create is never
released on error; wrap it in an RAII guard or ensure os_release(vm_config) is
called on every error path inside create_vm. Specifically, create a small guard
type (e.g., HvVmConfigGuard) that stores the raw vm_config, calls os_release in
Drop, and exposes into_raw() or disarm() so you can transfer ownership before
calling hv_vm_create; then replace the direct
hv_vm_config_create/hv_vm_config_set_el2_enabled/hv_vm_create sequence in
create_vm to use the guard and disarm it only after hv_vm_create succeeds,
ensuring hv_vm_config_set_el2_enabled failures or hv_vm_create failures trigger
automatic os_release via the guard.
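The guard-with-disarm pattern can be shown without the Hypervisor.framework types. This generic sketch substitutes a closure for os_release and a Cell flag for the raw handle; HvVmConfigGuard in the real fix would wrap the actual hv_vm_config_t:

```rust
use std::cell::Cell;
use std::rc::Rc;

// Runs a release action on Drop unless ownership is transferred via disarm().
struct Guard<F: FnMut()> {
    release: Option<F>,
}

impl<F: FnMut()> Guard<F> {
    fn new(release: F) -> Self {
        Guard { release: Some(release) }
    }
    /// Transfer ownership: the release action will not run on Drop.
    fn disarm(mut self) {
        self.release = None;
    }
}

impl<F: FnMut()> Drop for Guard<F> {
    fn drop(&mut self) {
        if let Some(mut f) = self.release.take() {
            f();
        }
    }
}

fn main() {
    // Error path: the guard goes out of scope, so the release action
    // (standing in for os_release(vm_config)) runs automatically.
    let released = Rc::new(Cell::new(false));
    {
        let r = Rc::clone(&released);
        let _guard = Guard::new(move || r.set(true));
        // ... imagine hv_vm_config_set_el2_enabled failing here ...
    }
    assert!(released.get());

    // Success path: disarm after the handle's ownership moves on
    // (i.e., after hv_vm_create succeeds), so no release runs.
    let released = Rc::new(Cell::new(false));
    let r = Rc::clone(&released);
    let guard = Guard::new(move || r.set(true));
    guard.disarm();
    assert!(!released.get());
}
```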

In `@crates/vm-core/src/virt/hvp/vcpu.rs`:
- Around line 124-131: The current try_get_exit_info function unsafely
dereferences self.exit without verifying thread-affinity; add a thread-affinity
check (e.g., call or implement an is_on_vcpu_thread/ensure_on_vcpu_thread helper
that compares the current thread id to the vCPU's recorded thread id) at the top
of try_get_exit_info and return a suitable VcpuError (e.g.,
VcpuError::WrongThread or a new variant) if the check fails, and only then
perform the unsafe dereference of
self.exit.as_ref().ok_or(VcpuError::VcpuNotCreated(self.vcpu_id))? so that the
pointer is only accessed on the vCPU thread.
- Line 100: HvpVcpu is currently marked unsafe impl Send which is unsound
because its exit raw pointer (the exit field returned by hv_vcpu_create) is
thread-local and must only be used on the creating thread; remove the unsafe
impl Send for HvpVcpu (or replace it with !Send) and ensure the type system
prevents cross-thread use, and/or add runtime thread-affinity checks in HvpVcpu
(store the creating thread id in HvpVcpu and assert in try_get_exit_info() and
run()) so any attempt to move or call run() from a different thread (or via
VcpuManager mutex acquired on another thread) is rejected with a clear error.

In `@crates/vm-core/src/virt/kvm.rs`:
- Around line 28-29: KvmVirt::create_vm currently panics via todo!(), breaking
any caller that selects the KVM backend; replace the unconditional panic by
either constructing and returning the proper Vm implementation (wire the new Vm
object through the KvmVirt factory) or, if not yet implemented, return a
concrete Error variant (e.g., an Error::NotImplemented or a backend-specific
error) instead of panicking so callers can handle it; update the create_vm
signature's Result path to return Ok(Arc<dyn Vm>) when wiring the real VM or
Err(Error::...) when returning the typed error, referencing KvmVirt::create_vm
and the project’s Error enum to choose the correct variant.

In `@crates/vm-core/src/virt/kvm/vcpu/arch/aarch64.rs`:
- Around line 25-27: The aarch64 KVM vCPU currently contains unimplemented
todo!()s; either fully implement these APIs or mark KVM/aarch64 unsupported. To
implement: add fields to KvmVcpu for psci_handler: Arc<dyn Psci> and
device_vm_exit_handler: Arc<dyn DeviceVmExitHandler>, initialize them in
init_arch_vcpu(), and implement get_psci_handler(), vm_exit_handler(),
get_sys_reg(), set_sys_reg(), post_init_within_thread(), and run() to use those
fields and the existing vcpu_fd/vcpu_id. Alternatively, if you prefer to block
usage until implemented, replace the todo!()s with clear runtime errors (or
Result-based returns) indicating KVM aarch64 is unsupported and update any
feature gating/docs accordingly so enabling the feature fails fast with a clear
message.

In `@crates/vm-core/src/virt/kvm/vcpu/arch/x86_64.rs`:
- Around line 75-86: The KVM x86_64 backend currently panics because KvmVcpu's
Vcpu trait methods vm_exit_handler, post_init_within_thread, and run are left as
todo!(); implement these methods on the KvmVcpu type mirroring the aarch64
backend behavior: return the appropriate Arc<dyn DeviceVmExitHandler> from
vm_exit_handler, perform per-thread initialization and return Result<(),
VcpuError> in post_init_within_thread, and execute the vCPU loop returning
Result<VmExitReason, VcpuError> in run; use the same error handling, state
setup, and device exit dispatch patterns found in the aarch64 implementation
(match the method names vm_exit_handler, post_init_within_thread, run and the
KvmVcpu struct) so runtime panics are eliminated.

In `@crates/vm-machine/src/error.rs`:
- Around line 6-10: Update the error display strings on the VmAlreadyExists and
VmNotExists variants so they are clear and grammatical: change "Vm already
exists" to "VM already exists" and "Vm not exists" to "VM does not exist". Edit
the #[error(...)] attributes for the VmAlreadyExists and VmNotExists enum
variants in error.rs to use the new messages and keep the variant names
unchanged.

In `@crates/vm-machine/src/vm.rs`:
- Around line 55-56: The Vm::boot implementation currently panics on non-aarch64
builds because of the #[cfg(not(target_arch = "aarch64"))] todo!() branch;
replace that panic with a proper error return so the shared VMM API is usable on
other architectures. Update the non-aarch64 branch in Vm::boot (the block
guarded by #[cfg(not(target_arch = "aarch64"))]) to return an appropriate Err
(e.g., an UnsupportedArchitecture or BootError variant) instead of calling
todo!(), and ensure the function signature and callers handle that Result error
path consistently.

In `@crates/vm-machine/src/vmm.rs`:
- Around line 102-109: The loop holds the vcpu_manager mutex while calling
vcpu_manager.lock().unwrap().create_vcpu(...), which can deadlock because the
PSCI CpuOn64 handler (in psci_0_2.rs) also locks vcpu_manager from a vCPU
thread; refactor so the mutex is not held while vCPU threads may start: change
VcpuManager::create_vcpu into two steps (e.g., prepare_vcpu that performs
internal state mutation while holding the lock, then release the lock and call
start_vcpu/spawn_vcpu which does thread creation without holding vcpu_manager),
and update this loop to call prepare_vcpu under the lock and then start_vcpu
after dropping the lock; reference vcpu_manager, create_vcpu, and the PSCI
CpuOn64 handler in psci_0_2.rs when making the change.
- Line 133: The empty busy-wait loop after boot() in vmm.rs should be replaced
with a blocking wait so the main thread doesn't spin at 100% CPU; remove loop {}
and instead park the thread or block on a synchronization primitive (e.g.,
thread::park(), Condvar wait, or recv() on a shutdown channel) and wire that to
the existing vCPU lifecycle/shutdown path (for example use the
VcpuManager::start_vcpu shutdown signal or expose a shutdown Receiver that the
main thread can block on), ensuring the main thread unblocks when a
shutdown/stop event occurs.

---

Outside diff comments:
In `@crates/vm-core/src/virt/kvm/vcpu/arch/x86_64.rs`:
- Around line 33-37: init_arch_vcpu currently enables guest debugging with
KVM_GUESTDBG_SINGLESTEP unconditionally by calling set_guest_debug with
kvm_guest_debug, which forces single‑step exits and hurts performance; update
init_arch_vcpu to make single‑step optional (e.g., add a config flag or feature)
and only include KVM_GUESTDBG_SINGLESTEP in the control field when that flag is
set (leave KVM_GUESTDBG_ENABLE as needed), or remove the single‑step bit for
production paths so set_guest_debug / kvm_guest_debug are invoked without
KVM_GUESTDBG_SINGLESTEP unless explicitly requested.

---

Nitpick comments:
In `@crates/vm-core/src/arch/aarch64/vcpu.rs`:
- Around line 28-32: Replace the magic numeric masks used when adjusting the CPU
PState with named constants to improve readability: introduce constants like
DAIF_MASK (0x03C0), LOW4_MASK (0xF) or LOW4_CLEAR (~0xF), and EL1H_MODE (0x0005)
and then use them in the sequence that reads and updates pstate (the variable
pstate, CoreRegister::PState, and the calls vcpu.get_core_reg /
vcpu.set_core_reg) so the code becomes pstate |= DAIF_MASK; pstate &=
!LOW4_MASK; pstate |= EL1H_MODE; keeping the same logic but replacing the
literals with descriptive constant names.
- Around line 35-37: Remove the unreachable "if false { todo!() }" blocks in
vcpu.rs (or convert them into clear comments or gated feature-flagged code);
specifically locate each occurrence (the blocks at the shown locations) and
either delete the dead branch or replace it with a `// TODO: ...` comment or a
cfg/gate for planned functionality so no `todo!()` is left inside an
always-false conditional; ensure any intended future work is documented or
controlled by feature flags and that no unreachable `todo!()` calls remain in
functions/impls within this module.
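The named-constant suggestion above can be sketched as follows; `adjust_pstate` is a hypothetical free function standing in for the in-place `get_core_reg`/`set_core_reg` sequence, and the values simply mirror the literals quoted in the comment:

```rust
// Named masks for the PState fix-up described above. 0x03C0 covers the
// DAIF interrupt-mask bits, the low 4 bits select the EL/SP mode, and
// 0x0005 is the EL1h mode encoding.
const DAIF_MASK: u64 = 0x03C0;
const MODE_MASK: u64 = 0xF;
const EL1H_MODE: u64 = 0x0005;

// Same logic as the reviewed sequence, but with descriptive names.
fn adjust_pstate(mut pstate: u64) -> u64 {
    pstate |= DAIF_MASK; // mask D, A, I, F exceptions
    pstate &= !MODE_MASK; // clear the mode field
    pstate |= EL1H_MODE; // select EL1 with SP_EL1
    pstate
}

fn main() {
    println!("{:#06x}", adjust_pstate(0));
}
```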

In `@crates/vm-core/src/lib.rs`:
- Line 1: Uncomment the crate-level attribute that enforces warnings as errors
by restoring #![deny(warnings)] at the top of the crate (undo the commented-out
line "// #![deny(warnings)]") so the compiler treats warnings as failures; if
specific warnings are noisy, add targeted #[allow(...)] annotations on the
offending items rather than disabling the deny for the whole crate.

In `@crates/vm-core/src/vcpu/vcpu_manager.rs`:
- Line 53: The code currently does vcpu.lock().unwrap() which will panic the
whole VM if the mutex is poisoned; replace the unwrap with either
vcpu.lock().expect("failed to lock vcpu mutex: mutex poisoned") or explicitly
handle PoisonError by matching vcpu.lock() -> Ok(mut vcpu) => { ... } and
Err(poison) => { let mut vcpu = poison.into_inner(); /* log warning about
poisoned mutex */ ... } so the vCPU mutex is recovered gracefully and a
descriptive message is logged; update the occurrence of vcpu.lock().unwrap() in
vcpu_manager.rs accordingly.
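A minimal sketch of the poison-recovery path described above; `lock_or_recover` is an illustrative helper name, not an existing API in the crate:

```rust
use std::sync::{Arc, Mutex, MutexGuard};
use std::thread;

// Lock a mutex, recovering the inner data if a previous holder panicked
// and poisoned it, instead of propagating the panic to the whole VM.
fn lock_or_recover<T>(m: &Mutex<T>) -> MutexGuard<'_, T> {
    m.lock().unwrap_or_else(|poison| {
        eprintln!("warning: vcpu mutex poisoned, recovering inner state");
        poison.into_inner()
    })
}

fn main() {
    let m = Arc::new(Mutex::new(0u32));
    let m2 = Arc::clone(&m);
    // Poison the mutex by panicking while holding the guard.
    let _ = thread::spawn(move || {
        let _guard = m2.lock().unwrap();
        panic!("simulated vcpu thread crash");
    })
    .join();
    // Recovery still yields the inner value.
    *lock_or_recover(&m) += 1;
    println!("value = {}", *lock_or_recover(&m));
}
```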

In `@crates/vm-core/src/virt/hvp/vcpu.rs`:
- Around line 187-196: Add a doc comment to post_init_within_thread in vcpu.rs
stating it must be called from the vCPU's dedicated thread (thread-affinity
requirement), explaining that hv_vcpu_create expects to run on that thread and
that calling it from any other thread is undefined/unsafe and may cause errors;
reference the function name post_init_within_thread and the fields it sets
(self.handler and self.exit) so readers know the contract and side effects.

In `@crates/vm-core/src/virt/kvm/vcpu/arch/aarch64.rs`:
- Around line 31-32: Replace the panic caused by assert_eq!(len, 8) with proper
error handling: after calling self.vcpu_fd.get_one_reg(reg.to_kvm_reg(), &mut
bytes) check if len != 8 and return a descriptive error (propagate a suitable
crate error or io::Error with context like "unexpected register size from KVM
for reg {reg}") instead of asserting; do the same for the other occurrence (the
second get_one_reg check around the later len variable) so both places return an
Err with context rather than panicking.
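The assert-to-error conversion can be sketched like this; `check_reg_len` is a hypothetical helper (the real code would return the crate's `VcpuError` rather than `io::Error`):

```rust
use std::io;

// Validate the byte count KVM reported for a register read and turn a
// mismatch into a descriptive error instead of panicking the vCPU thread.
fn check_reg_len(reg_name: &str, len: usize) -> io::Result<()> {
    if len != 8 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("unexpected register size {len} from KVM for reg {reg_name}"),
        ));
    }
    Ok(())
}

fn main() {
    assert!(check_reg_len("MPIDR_EL1", 8).is_ok());
    let err = check_reg_len("MPIDR_EL1", 4).unwrap_err();
    println!("{err}");
}
```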

In `@crates/vm-machine/src/lib.rs`:
- Line 1: The crate-level warnings gate was removed (the commented-out
#![deny(warnings)]), re-enable a crate-level deny by restoring the attribute at
the top of lib.rs (uncomment or add #![deny(warnings)]) and, if specific
warnings are noisy, replace with targeted allows (e.g., #![allow(...)] for
specific lint names) rather than disabling the whole deny; look for the crate
root attribute location in lib.rs to apply this change.

In `@crates/vm-machine/src/vmm.rs`:
- Line 103: The call vcpu_manager.lock().unwrap().create_vcpu should use
expect() instead of unwrap() to provide a clear panic message; replace the
unwrap() on the MutexGuard acquisition (vcpu_manager.lock().unwrap()) with
vcpu_manager.lock().expect("failed to lock vcpu_manager mutex when creating
vCPU") so that the failure context is included when calling create_vcpu; ensure
the expect message references vcpu_manager and create_vcpu to aid diagnostics.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5dc57c40-70d4-4f2f-951e-7539787c831f

📥 Commits

Reviewing files that changed from the base of the PR and between e962cfe and ee1bb39.

📒 Files selected for processing (27)
  • crates/vm-cli/src/main.rs
  • crates/vm-core/src/arch.rs
  • crates/vm-core/src/arch/aarch64/firmware/psci.rs
  • crates/vm-core/src/arch/aarch64/firmware/psci/psci_0_2.rs
  • crates/vm-core/src/arch/aarch64/vcpu.rs
  • crates/vm-core/src/arch/aarch64/vcpu/reg/esr_el2.rs
  • crates/vm-core/src/arch/aarch64/vm_exit.rs
  • crates/vm-core/src/arch/vcpu.rs
  • crates/vm-core/src/arch/x86_64/vcpu.rs
  • crates/vm-core/src/lib.rs
  • crates/vm-core/src/vcpu.rs
  • crates/vm-core/src/vcpu/error.rs
  • crates/vm-core/src/vcpu/vcpu.rs
  • crates/vm-core/src/vcpu/vcpu_manager.rs
  • crates/vm-core/src/virt.rs
  • crates/vm-core/src/virt/hvp.rs
  • crates/vm-core/src/virt/hvp/vcpu.rs
  • crates/vm-core/src/virt/kvm.rs
  • crates/vm-core/src/virt/kvm/vcpu/arch/aarch64.rs
  • crates/vm-core/src/virt/kvm/vcpu/arch/x86_64.rs
  • crates/vm-machine/src/device.rs
  • crates/vm-machine/src/error.rs
  • crates/vm-machine/src/lib.rs
  • crates/vm-machine/src/vm.rs
  • crates/vm-machine/src/vm/config.rs
  • crates/vm-machine/src/vm_builder.rs
  • crates/vm-machine/src/vmm.rs
💤 Files with no reviewable changes (3)
  • crates/vm-core/src/arch.rs
  • crates/vm-core/src/arch/vcpu.rs
  • crates/vm-machine/src/vm_builder.rs

Comment on lines 3 to +9
use clap::Parser;
use tracing::debug;
use tracing_subscriber::EnvFilter;
use vm_bootloader::boot_loader::BootLoaderBuilder;
use vm_core::virt::Virt;
use vm_machine::vm_builder::VmBuilder;
use vm_machine::vm::config::VmConfig;
use vm_machine::vmm::Vmm;
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify that `bail!` is used without either `use anyhow::bail;`
# or a qualified `anyhow::bail!` call.
set -e
rg -n '\bbail!\s*\(' crates/vm-cli/src/main.rs
rg -n '^\s*use anyhow::bail;|anyhow::bail!\s*\(' crates/vm-cli/src/main.rs || true

Repository: junyu0312/rust-vm

Length of output: 115


🏁 Script executed:

cat -n crates/vm-cli/src/main.rs | head -80

Repository: junyu0312/rust-vm

Length of output: 2461


Qualify bail! or import it here.

Line 64 calls bail!, but this macro is not brought into scope. The file imports anyhow::Result (line 40) but does not import the bail! macro, causing a compilation error.

Minimal fix
-                bail!("hvp only supports aarch64");
+                anyhow::bail!("hvp only supports aarch64");

Also applies to: 60-72

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm-cli/src/main.rs` around lines 3 - 9, The code calls the bail! macro
but never brings it into scope; either add an import for the macro (e.g., use
anyhow::bail;) near the other imports or qualify calls as anyhow::bail!(...)
where used (e.g., in the function that currently invokes bail!). Update the
top-level imports in main.rs (alongside anyhow::Result) or change the bail!
invocation to anyhow::bail! to resolve the compilation error.

Comment on lines 55 to 58
#[cfg(feature = "kvm")]
Accel::Kvm => {
#[cfg(target_arch = "aarch64")]
{
use vm_bootloader::boot_loader::arch::aarch64::AArch64BootLoader;
use vm_core::arch::aarch64::AArch64;
use vm_core::virt::kvm::KvmVirt;

build_and_run_vm::<KvmVirt<AArch64>, AArch64BootLoader>(args)?;
}

#[cfg(target_arch = "x86_64")]
{
use vm_bootloader::boot_loader::arch::x86_64::X86_64BootLoader;
use vm_core::arch::x86_64::X86_64;
use vm_core::virt::kvm::KvmVirt;

build_and_run_vm::<KvmVirt<X86_64>, X86_64BootLoader>(args)?;
}
todo!()
}
⚠️ Potential issue | 🔴 Critical

Don't leave Accel::Kvm as a reachable panic path.

Line 57 aborts the CLI whenever the user selects KVM. If this backend is intentionally deferred, return a structured error or hide the option until the implementation lands.

Safer fallback
         #[cfg(feature = "kvm")]
         Accel::Kvm => {
-            todo!()
+            anyhow::bail!("KVM support is not implemented in this refactor yet");
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#[cfg(feature = "kvm")]
Accel::Kvm => {
#[cfg(target_arch = "aarch64")]
{
use vm_bootloader::boot_loader::arch::aarch64::AArch64BootLoader;
use vm_core::arch::aarch64::AArch64;
use vm_core::virt::kvm::KvmVirt;
build_and_run_vm::<KvmVirt<AArch64>, AArch64BootLoader>(args)?;
}
#[cfg(target_arch = "x86_64")]
{
use vm_bootloader::boot_loader::arch::x86_64::X86_64BootLoader;
use vm_core::arch::x86_64::X86_64;
use vm_core::virt::kvm::KvmVirt;
build_and_run_vm::<KvmVirt<X86_64>, X86_64BootLoader>(args)?;
}
todo!()
}
#[cfg(feature = "kvm")]
Accel::Kvm => {
anyhow::bail!("KVM support is not implemented in this refactor yet");
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm-cli/src/main.rs` around lines 55 - 58, The match arm for Accel::Kvm
currently contains a todo!() which will panic at runtime; replace that reachable
panic with a proper, non-panicking fallback: either return a structured error
(e.g., propagate an Err with a clear message like "KVM backend not implemented"
from the function that matches on Accel::Kvm) or map the variant to a safe
fallback backend if supported; locate the Accel::Kvm match arm in main.rs and
change the todo!() to return/propagate the error (using the function's
Result/error type) or remove/exclude the KVM option until implementation is
added.

Comment on lines +53 to 58
println!("try lock vcpu_manager");
self.vcpu_manager
.lock()
.unwrap()
.start_vcpu(target_cpu as usize, entry_point_address, context_id)
.unwrap();
⚠️ Potential issue | 🔴 Critical

CPU_ON64 needs an MPIDR→vCPU mapping and real error handling.

The value from get_smc_arg1() is the guest MPIDR, but VcpuManager::start_vcpu indexes a zero-based vCPU vector. Casting it directly on Line 57 will mis-target non-trivial topologies, and the unwrap() turns invalid/already-on requests into a host crash instead of a PSCI status code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm-core/src/arch/aarch64/firmware/psci/psci_0_2.rs` around lines 53 -
58, The CPU_ON64 handler currently casts the guest MPIDR from get_smc_arg1()
directly to a vCPU index and calls
self.vcpu_manager.lock().unwrap().start_vcpu(...).unwrap(), which mis-maps MPIDs
in non-trivial topologies and panics on errors; change it to (1) resolve the
guest MPIDR to the correct zero-based vCPU index via your VM/CPUID mapping
routine (e.g., call the existing MPIDR→vcpu lookup helper or add one) instead of
casting target_cpu directly, (2) avoid unwraps: handle the lock() Result and the
start_vcpu Result and translate failures into the appropriate PSCI status codes
(e.g., INVALID_PARAMETERS, ALREADY_ON, DENIED, INTERNAL_FAILURE) returned by the
PSCI handler, and (3) ensure the call site uses the resolved vcpu index when
invoking VcpuManager::start_vcpu rather than target_cpu as usize so
non-sequential MPIDR topologies work correctly.
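The mapping-plus-status-code approach can be sketched as below. The return codes are taken from the PSCI 0.2 specification (ARM DEN 0022); `vcpu_index_for_mpidr`, `cpu_on_status`, and `StartError` are illustrative names assuming a single-cluster topology where Aff0 is the vCPU index:

```rust
// Return codes from the PSCI 0.2 specification (ARM DEN 0022).
const PSCI_SUCCESS: i64 = 0;
const PSCI_INVALID_PARAMETERS: i64 = -2;
const PSCI_ALREADY_ON: i64 = -4;
const PSCI_INTERNAL_FAILURE: i64 = -6;

// Hypothetical error type standing in for VcpuManager::start_vcpu failures.
enum StartError {
    AlreadyOn,
    Internal,
}

// Resolve a guest MPIDR to a zero-based vCPU index; only Aff0 (bits
// [7:0]) is meaningful in this single-cluster sketch.
fn vcpu_index_for_mpidr(mpidr: u64, num_vcpus: usize) -> Option<usize> {
    if mpidr & !0xFF != 0 {
        return None; // higher affinity levels unsupported here
    }
    let idx = (mpidr & 0xFF) as usize;
    (idx < num_vcpus).then_some(idx)
}

// Translate the lookup and start results into a PSCI status code
// instead of unwrapping.
fn cpu_on_status<F>(mpidr: u64, num_vcpus: usize, start: F) -> i64
where
    F: FnOnce(usize) -> Result<(), StartError>,
{
    match vcpu_index_for_mpidr(mpidr, num_vcpus) {
        None => PSCI_INVALID_PARAMETERS,
        Some(idx) => match start(idx) {
            Ok(()) => PSCI_SUCCESS,
            Err(StartError::AlreadyOn) => PSCI_ALREADY_ON,
            Err(StartError::Internal) => PSCI_INTERNAL_FAILURE,
        },
    }
}

fn main() {
    println!("{}", cpu_on_status(1, 4, |_| Ok(())));
    println!("{}", cpu_on_status(9, 4, |_| Ok(())));
}
```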

cpu_id: usize,
vcpu: &mut dyn AArch64Vcpu,
) -> Result<(), VcpuError> {
vcpu.set_sys_reg(SysRegister::MpidrEl1, cpu_id as u64)?;
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

What is the correct format for MPIDR_EL1 register on ARM64 for multi-core VMs?

💡 Result:

The MPIDR_EL1 register on ARM64 has the same format whether in physical hardware or virtualized multi-core VMs. When EL2 (hypervisor) is enabled, EL1 reads of MPIDR_EL1 return the value from VMPIDR_EL2, which the hypervisor configures to present virtual affinity levels to the guest VM. The bitfield layout is identical to the physical MPIDR_EL1.

64-bit register layout (MSB to LSB):

  • Bits [63:40]: RES0 (Reserved)
  • Bits [39:32]: Aff3 (Affinity level 3)
  • Bit [31]: RES1 (Reserved)
  • Bit [30]: U (Uniprocessor indicator: 1 if uniprocessor system, 0 if multiprocessor)
  • Bits [29:25]: RES0 (Reserved)
  • Bit [24]: MT (Multi-Thread indicator: 1 if lowest affinity level uses multithreading, 0 if independent PEs)
  • Bits [23:16]: Aff2 (Affinity level 2)
  • Bits [15:8]: Aff1 (Affinity level 1)
  • Bits [7:0]: Aff0 (Affinity level 0)

For multi-core VMs, the hypervisor sets unique {Aff3,Aff2,Aff1,Aff0} combinations across vCPUs (typically incrementing Aff0 for each vCPU, with higher Aff levels for clusters/nodes as needed). U=0 and MT=0 for standard multi-core setups. All Aff fields must be unique system-wide within the VM. Access: RO (read-only) from EL1/EL0 when trapped by hypervisor.


🏁 Script executed:

# First, let's examine the file and get context around line 17
cd crates/vm-core/src/arch/aarch64 && cat -n vcpu.rs | head -50

Repository: junyu0312/rust-vm

Length of output: 1787


🏁 Script executed:

# Let's search for set_sys_reg implementation to understand how it handles values
rg "fn set_sys_reg" -A 10

Repository: junyu0312/rust-vm

Length of output: 2531


🏁 Script executed:

# Search for MPIDR_EL1 usage across the codebase
rg "MPIDR|MpidrEl1" -i

Repository: junyu0312/rust-vm

Length of output: 404


🏁 Script executed:

# Look for any affinity or topology-related code
rg "affinity|Aff[0-3]|topology" -i

Repository: junyu0312/rust-vm

Length of output: 306


🏁 Script executed:

# Check if there's any helper to format MPIDR_EL1
rg "mpidr|MPIDR" -i -B 3 -A 3

Repository: junyu0312/rust-vm

Length of output: 2298


🏁 Script executed:

# Look at the reg.rs file to see how registers are defined/formatted
head -100 crates/vm-core/src/arch/aarch64/vcpu/reg.rs

Repository: junyu0312/rust-vm

Length of output: 2028


Construct MPIDR_EL1 with proper ARM bitfield format.

Direct casting cpu_id to u64 violates the ARM MPIDR_EL1 specification. The register requires specific bitfield layout:

  • Bit [31] must be set to 1 (RES1), which the current implementation fails to do
  • Bits [39:32], [23:16], [15:8], [7:0] must encode affinity levels (Aff3–Aff0)
  • Bits [63:40], [29:25] must be 0 (RES0)

The current code only works for cpu_id < 256 by accident. Construct the value with proper affinity encoding, for example: (1u64 << 31) | (cpu_id as u64) for single-cluster configurations, or with Aff3–Aff0 populated according to your topology for multi-cluster setups.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm-core/src/arch/aarch64/vcpu.rs` at line 17, The MPIDR_EL1 value is
being set by directly casting cpu_id to u64 which violates the ARM bitfield
layout; update the code around the vcpu.set_sys_reg(SysRegister::MpidrEl1, ...)
call to build MPIDR_EL1 with the RES1 bit (bit 31) set and proper affinity
fields (Aff3–Aff0 in bits [39:32],[23:16],[15:8],[7:0]) while keeping reserved
bits zero; for single-cluster use (set bit 31) and place cpu_id into Aff0, or
for multi-cluster split cpu_id into Aff0–Aff3 and OR them into the proper bit
positions before passing the constructed value to
vcpu.set_sys_reg(SysRegister::MpidrEl1, ...).
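For the single-cluster case described above, the construction can be sketched as follows; `mpidr_for_cpu` is a hypothetical helper whose result would be passed to `vcpu.set_sys_reg(SysRegister::MpidrEl1, ...)`:

```rust
// MPIDR_EL1 for a single-cluster topology: bit 31 is RES1, Aff0
// (bits [7:0]) carries the vCPU index, Aff1..Aff3 and all reserved
// bits stay zero.
const MPIDR_RES1: u64 = 1 << 31;
const AFF0_MASK: u64 = 0xFF;

fn mpidr_for_cpu(cpu_id: usize) -> u64 {
    MPIDR_RES1 | (cpu_id as u64 & AFF0_MASK)
}

fn main() {
    for cpu in 0..4usize {
        println!("vCPU {cpu}: MPIDR_EL1 = {:#x}", mpidr_for_cpu(cpu));
    }
}
```

A multi-cluster setup would instead split `cpu_id` across Aff0–Aff3 before OR-ing the fields into their bit positions.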

Comment on lines +38 to +42
pub fn ec(&self) -> Result<Ec, VcpuError> {
Ec::from_repr(self.ec_raw()).ok_or(VcpuError::GuestError(format!(
"unknown ec: 0x{:x}",
self.ec_raw()
)))
⚠️ Potential issue | 🟠 Major

Avoid eager error allocation in ec() hot path.

ok_or(...) evaluates format! on every call, including success cases. Use ok_or_else(...) and cache ec_raw once to avoid unnecessary allocations during frequent exit decoding.

💡 Proposed fix
     pub fn ec(&self) -> Result<Ec, VcpuError> {
-        Ec::from_repr(self.ec_raw()).ok_or(VcpuError::GuestError(format!(
-            "unknown ec: 0x{:x}",
-            self.ec_raw()
-        )))
+        let ec_raw = self.ec_raw();
+        Ec::from_repr(ec_raw).ok_or_else(|| {
+            VcpuError::GuestError(format!("unknown ec: 0x{ec_raw:x}"))
+        })
     }
📝 Committable suggestion


Suggested change
pub fn ec(&self) -> Result<Ec, VcpuError> {
Ec::from_repr(self.ec_raw()).ok_or(VcpuError::GuestError(format!(
"unknown ec: 0x{:x}",
self.ec_raw()
)))
pub fn ec(&self) -> Result<Ec, VcpuError> {
let ec_raw = self.ec_raw();
Ec::from_repr(ec_raw).ok_or_else(|| {
VcpuError::GuestError(format!("unknown ec: 0x{ec_raw:x}"))
})
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm-core/src/arch/aarch64/vcpu/reg/esr_el2.rs` around lines 38 - 42,
The ec() method eagerly allocates the error string because ok_or(...) calls
format! on every invocation; change it to cache the raw value (call
self.ec_raw() once into a local like raw) and use ok_or_else(...) so the
GuestError string is only constructed on the Err path (e.g.,
Ec::from_repr(raw).ok_or_else(|| VcpuError::GuestError(format!("unknown ec:
0x{:x}", raw)))).

Comment on lines +75 to 86
impl Vcpu for KvmVcpu {
fn vm_exit_handler(&self) -> Arc<dyn DeviceVmExitHandler> {
todo!()
}

fn post_init_within_thread(&mut self) -> Result<(), VcpuError> {
todo!()
}

fn run(&mut self) -> Result<VmExitReason, VcpuError> {
todo!()
/*
loop {
let r = self.vcpu_fd.run();

match r? {
VcpuExit::IoOut(port, data) => pio.io_out(port, data)?,
VcpuExit::IoIn(port, data) => pio.io_in(port, data)?,
VcpuExit::MmioRead(_, _) => {
// Ignore
}
VcpuExit::MmioWrite(_, _) => {
// Ignore
}
VcpuExit::Unknown => todo!(),
VcpuExit::Exception => todo!(),
VcpuExit::Hypercall(_) => todo!(),
VcpuExit::Debug(_) => {}
VcpuExit::Hlt => {
// warn!("hlt");
// todo!()
}
VcpuExit::IrqWindowOpen => todo!(),
VcpuExit::Shutdown => todo!(),
VcpuExit::FailEntry(_, _) => todo!(),
VcpuExit::Intr => todo!(),
VcpuExit::SetTpr => todo!(),
VcpuExit::TprAccess => todo!(),
VcpuExit::S390Sieic => todo!(),
VcpuExit::S390Reset => todo!(),
VcpuExit::Dcr => todo!(),
VcpuExit::Nmi => todo!(),
VcpuExit::InternalError => {
let kvm_run = self.vcpu_fd.get_kvm_run();
unsafe {
error!(?kvm_run.__bindgen_anon_1.internal, "InternalError");
}
panic!();
}
VcpuExit::Osi => todo!(),
VcpuExit::PaprHcall => todo!(),
VcpuExit::S390Ucontrol => todo!(),
VcpuExit::Watchdog => todo!(),
VcpuExit::S390Tsch => todo!(),
VcpuExit::Epr => todo!(),
VcpuExit::SystemEvent(_, _) => todo!(),
VcpuExit::S390Stsi => todo!(),
VcpuExit::IoapicEoi(_) => todo!(),
VcpuExit::Hyperv => todo!(),
VcpuExit::X86Rdmsr(_) => todo!(),
VcpuExit::X86Wrmsr(_) => todo!(),
VcpuExit::MemoryFault { .. } => todo!(),
VcpuExit::Unsupported(_) => todo!(),
}
}
*/
}
⚠️ Potential issue | 🟠 Major

KVM x86_64 backend is incomplete.

Similar to the aarch64 backend, the Vcpu trait implementation has todo!() placeholders for all required methods. This backend will panic at runtime if used.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm-core/src/virt/kvm/vcpu/arch/x86_64.rs` around lines 75 - 86, The
KVM x86_64 backend currently panics because KvmVcpu's Vcpu trait methods
vm_exit_handler, post_init_within_thread, and run are left as todo!(); implement
these methods on the KvmVcpu type mirroring the aarch64 backend behavior: return
the appropriate Arc<dyn DeviceVmExitHandler> from vm_exit_handler, perform
per-thread initialization and return Result<(), VcpuError> in
post_init_within_thread, and execute the vCPU loop returning
Result<VmExitReason, VcpuError> in run; use the same error handling, state
setup, and device exit dispatch patterns found in the aarch64 implementation
(match the method names vm_exit_handler, post_init_within_thread, run and the
KvmVcpu struct) so runtime panics are eliminated.

Comment on lines +6 to +10
#[error("Vm already exists")]
VmAlreadyExists,

#[error("Vm not exists")]
VmNotExists,
⚠️ Potential issue | 🟡 Minor

Polish VM state error messages for clarity.

The strings read awkwardly (Vm not exists). Prefer concise, grammatical messages like “VM already exists” / “VM does not exist”.

✍️ Suggested wording update
-    #[error("Vm already exists")]
+    #[error("VM already exists")]
     VmAlreadyExists,

-    #[error("Vm not exists")]
+    #[error("VM does not exist")]
     VmNotExists,
📝 Committable suggestion


Suggested change
#[error("Vm already exists")]
VmAlreadyExists,
#[error("Vm not exists")]
VmNotExists,
#[error("VM already exists")]
VmAlreadyExists,
#[error("VM does not exist")]
VmNotExists,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm-machine/src/error.rs` around lines 6 - 10, Update the error display
strings on the VmAlreadyExists and VmNotExists variants so they are clear and
grammatical: change "Vm already exists" to "VM already exists" and "Vm not
exists" to "VM does not exist". Edit the #[error(...)] attributes for the
VmAlreadyExists and VmNotExists enum variants in error.rs to use the new
messages and keep the variant names unchanged.

Comment on lines +55 to +56
#[cfg(not(target_arch = "aarch64"))]
todo!();
⚠️ Potential issue | 🔴 Critical

Vm::boot still hard-panics off aarch64.

Lines 55-56 make the shared VMM API unusable on every non-aarch64 build. Even after the frontend/backend wiring is restored, this path will still abort instead of returning an error or doing architecture-specific startup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm-machine/src/vm.rs` around lines 55 - 56, The Vm::boot
implementation currently panics on non-aarch64 builds because of the
#[cfg(not(target_arch = "aarch64"))] todo!() branch; replace that panic with a
proper error return so the shared VMM API is usable on other architectures.
Update the non-aarch64 branch in Vm::boot (the block guarded by
#[cfg(not(target_arch = "aarch64"))]) to return an appropriate Err (e.g., an
UnsupportedArchitecture or BootError variant) instead of calling todo!(), and
ensure the function signature and callers handle that Result error path
consistently.

Comment on lines +102 to +109
for vcpu_id in 0..vm_config.vcpus {
vcpu_manager.lock().unwrap().create_vcpu(
vcpu_id,
device_manager.clone(),
#[cfg(target_arch = "aarch64")]
psci.clone(),
)?;
}
⚠️ Potential issue | 🟡 Minor

Potential deadlock with PSCI handler.

Based on the context snippet from psci_0_2.rs (lines 35-70), the PSCI CpuOn64 handler acquires vcpu_manager.lock() from within a vCPU thread. If this loop holds the lock while a vCPU thread is executing PSCI, a deadlock could occur.

The current code locks briefly per iteration which mitigates this, but consider documenting this constraint or using a lock-free approach for vCPU creation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm-machine/src/vmm.rs` around lines 102 - 109, The loop holds the
vcpu_manager mutex while calling vcpu_manager.lock().unwrap().create_vcpu(...),
which can deadlock because the PSCI CpuOn64 handler (in psci_0_2.rs) also locks
vcpu_manager from a vCPU thread; refactor so the mutex is not held while vCPU
threads may start: change VcpuManager::create_vcpu into two steps (e.g.,
prepare_vcpu that performs internal state mutation while holding the lock, then
release the lock and call start_vcpu/spawn_vcpu which does thread creation
without holding vcpu_manager), and update this loop to call prepare_vcpu under
the lock and then start_vcpu after dropping the lock; reference vcpu_manager,
create_vcpu, and the PSCI CpuOn64 handler in psci_0_2.rs when making the change.
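The two-step split can be sketched with minimal stand-in types; `VcpuManager`, `prepare_vcpu`, and `spawn_vcpu` here are illustrative simplifications of the real manager, showing only the locking discipline:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Minimal stand-in for the real VcpuManager, for illustration only.
struct VcpuManager {
    prepared: Vec<usize>,
}

impl VcpuManager {
    // Step 1: mutate manager state while the caller briefly holds the lock.
    fn prepare_vcpu(&mut self, id: usize) {
        self.prepared.push(id);
    }
}

// Step 2: spawn the vCPU thread WITHOUT holding the manager lock, so a
// PSCI CPU_ON handler running on another vCPU thread can take it freely.
fn spawn_vcpu(mgr: Arc<Mutex<VcpuManager>>, id: usize) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // The vCPU thread is free to lock the manager itself.
        let n = mgr.lock().unwrap().prepared.len();
        println!("vcpu {id} sees {n} prepared vcpus");
    })
}

fn main() {
    let mgr = Arc::new(Mutex::new(VcpuManager { prepared: Vec::new() }));
    let mut handles = Vec::new();
    for id in 0..2 {
        mgr.lock().unwrap().prepare_vcpu(id); // lock held only here
        handles.push(spawn_vcpu(Arc::clone(&mgr), id)); // lock released
    }
    for h in handles {
        h.join().unwrap();
    }
}
```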

.ok_or(Error::VmNotExists)?
.boot(boot_loader)?;

loop {}
⚠️ Potential issue | 🟠 Major

Busy-wait loop burns CPU unnecessarily.

The empty loop {} after boot() will spin at 100% CPU utilization. Since vCPU threads run independently (as shown in VcpuManager::start_vcpu), the main thread should park or wait on a synchronization primitive.

💡 Proposed fix
     pub fn run(&mut self, boot_loader: &dyn BootLoader) -> Result<()> {
         self.vm
             .as_mut()
             .ok_or(Error::VmNotExists)?
             .boot(boot_loader)?;

-        loop {}
+        // Park the main thread; vCPU threads handle execution
+        loop {
+            std::thread::park();
+        }
     }

Or better, wait on a shutdown signal/channel.

📝 Committable suggestion


Suggested change
loop {}
// Park the main thread; vCPU threads handle execution
loop {
std::thread::park();
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm-machine/src/vmm.rs` at line 133, The empty busy-wait loop after
boot() in vmm.rs should be replaced with a blocking wait so the main thread
doesn't spin at 100% CPU; remove loop {} and instead park the thread or block on
a synchronization primitive (e.g., thread::park(), Condvar wait, or recv() on a
shutdown channel) and wire that to the existing vCPU lifecycle/shutdown path
(for example use the VcpuManager::start_vcpu shutdown signal or expose a
shutdown Receiver that the main thread can block on), ensuring the main thread
unblocks when a shutdown/stop event occurs.
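The channel-based variant suggested above can be sketched as below; `shutdown_tx`/`shutdown_rx` are illustrative names, not existing fields, and the sleeping thread merely simulates a vCPU run loop:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn main() {
    // Shutdown channel: vCPU threads hold the Sender, the main thread
    // blocks on the Receiver instead of spinning in `loop {}`.
    let (shutdown_tx, shutdown_rx) = mpsc::channel::<()>();

    let vcpu = thread::spawn(move || {
        thread::sleep(Duration::from_millis(50)); // simulated guest execution
        let _ = shutdown_tx.send(()); // guest requested power-off
    });

    // Blocks without burning CPU until a shutdown event arrives; recv()
    // also returns Err once every Sender (every vCPU thread) is dropped.
    shutdown_rx
        .recv()
        .expect("all vCPU threads exited without signalling shutdown");
    vcpu.join().unwrap();
    println!("vm stopped");
}
```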

@junyu0312 junyu0312 merged commit f247123 into main Apr 2, 2026
8 checks passed
This was referenced Apr 2, 2026
