[WIP] [Deepin-Kernel-SIG] [linux 6.6.y] [Arm] [Fromlist] [Security] arm64: Support for Arm CCA in KVM#1520
[ Upstream commit 5a47555 ] In confidential computing usages, whether a page is private or shared is necessary information for KVM to perform operations like page fault handling, page zapping, etc. There are other potential use cases for per-page memory attributes, e.g. to make memory read-only (or no-exec, or exec-only, etc.) without having to modify memslots. Introduce the KVM_SET_MEMORY_ATTRIBUTES ioctl, advertised by KVM_CAP_MEMORY_ATTRIBUTES, to allow userspace to set the per-page memory attributes for a guest memory range. Use an xarray to store the per-page attributes internally, with a naive, not fully optimized implementation, i.e. prioritize correctness over performance for the initial implementation. Use bit 3 for the PRIVATE attribute so that KVM can use bits 0-2 for RWX attributes/protections in the future, e.g. to give userspace fine-grained control over read, write, and execute protections for guest memory. Provide arch hooks for handling attribute changes before and after common code sets the new attributes, e.g. x86 will use the "pre" hook to zap all relevant mappings, and the "post" hook to track whether or not hugepages can be used to map the range. To simplify the implementation, wrap the entire sequence with kvm_mmu_invalidate_{begin,end}() even though the operation isn't strictly guaranteed to be an invalidation. For the initial use case, x86 *will* always invalidate memory, and preventing arch code from creating new mappings while the attributes are in flux makes it much easier to reason about the correctness of consuming attributes. It's possible that future usages may not require an invalidation, e.g. if KVM ends up supporting RWX protections and userspace grants _more_ protections, but again opt for simplicity and punt optimizations to if/when they are needed.
Suggested-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/all/Y2WB48kD0J4VGynX@google.com Cc: Fuad Tabba <tabba@google.com> Cc: Xu Yilun <yilun.xu@intel.com> Cc: Mickaël Salaün <mic@digikod.net> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20231027182217.3615211-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit a7800aa ] Introduce an ioctl(), KVM_CREATE_GUEST_MEMFD, to allow creating file-based memory that is tied to a specific KVM virtual machine and whose primary purpose is to serve guest memory. A guest-first memory subsystem allows for optimizations and enhancements that are kludgy or outright infeasible to implement/support in a generic memory subsystem. With guest_memfd, guest protections and mapping sizes are fully decoupled from host userspace mappings. E.g. KVM currently doesn't support mapping memory as writable in the guest without it also being writable in host userspace, as KVM's ABI uses VMA protections to define the allowed guest protections. Userspace can fudge this by establishing two mappings, a writable mapping for the guest and a readable one for itself, but that's suboptimal on multiple fronts. Similarly, KVM currently requires the guest mapping size to be a strict subset of the host userspace mapping size, e.g. KVM doesn't support creating a 1GiB guest mapping unless userspace also has a 1GiB guest mapping. Decoupling the mapping sizes would allow userspace to precisely map only what is needed without impacting guest performance, e.g. to harden against unintentional accesses to guest memory. Decoupling guest and userspace mappings may also allow for a cleaner alternative to high-granularity mappings for HugeTLB, which has reached a bit of an impasse and is unlikely to ever be merged. A guest-first memory subsystem also provides clearer line of sight to things like a dedicated memory pool (for slice-of-hardware VMs) and elimination of "struct page" (for offload setups where userspace _never_ needs to mmap() guest memory). More immediately, being able to map memory into KVM guests without mapping said memory into the host is critical for Confidential VMs (CoCo VMs), the initial use case for guest_memfd.
While AMD's SEV and Intel's TDX prevent untrusted software from reading guest private data by encrypting guest memory with a key that isn't usable by the untrusted host, projects such as Protected KVM (pKVM) provide confidentiality and integrity *without* relying on memory encryption. And with SEV-SNP and TDX, accessing guest private memory can be fatal to the host, i.e. KVM must prevent host userspace from accessing guest memory irrespective of hardware behavior. Attempt #1 to support CoCo VMs was to add a VMA flag to mark memory as being mappable only by KVM (or a similarly enlightened kernel subsystem). That approach was abandoned largely due to it needing to play games with PROT_NONE to prevent userspace from accessing guest memory. Attempt #2 was to usurp PG_hwpoison to prevent the host from mapping guest private memory into userspace, but that approach failed to meet several requirements for software-based CoCo VMs, e.g. pKVM, as the kernel wouldn't easily be able to enforce a 1:1 page:guest association, let alone a 1:1 pfn:gfn mapping. And using PG_hwpoison does not work for memory that isn't backed by 'struct page', e.g. if devices gain support for exposing encrypted memory regions to guests. Attempt #3 was to extend the memfd() syscall and wrap shmem to provide dedicated file-based guest memory. That approach made it as far as v10 before feedback from Hugh Dickins and Christian Brauner (and others) led to its demise. Hugh's objection was that piggybacking shmem made no sense for KVM's use case as KVM didn't actually *want* the features provided by shmem. I.e. KVM was using memfd() and shmem to avoid having to manage memory directly, not because memfd() and shmem were the optimal solution, e.g. things like read/write/mmap in shmem were dead weight. Christian pointed out flaws with implementing a partial overlay (wrapping only _some_ of shmem), e.g.
poking at inode_operations or super_operations would show shmem stuff, but address_space_operations and file_operations would show KVM's overlay. Paraphrasing heavily, Christian suggested KVM stop being lazy and create a proper API. Link: https://lore.kernel.org/all/20201020061859.18385-1-kirill.shutemov@linux.intel.com Link: https://lore.kernel.org/all/20210416154106.23721-1-kirill.shutemov@linux.intel.com Link: https://lore.kernel.org/all/20210824005248.200037-1-seanjc@google.com Link: https://lore.kernel.org/all/20211111141352.26311-1-chao.p.peng@linux.intel.com Link: https://lore.kernel.org/all/20221202061347.1070246-1-chao.p.peng@linux.intel.com Link: https://lore.kernel.org/all/ff5c5b97-acdf-9745-ebe5-c6609dd6322e@google.com Link: https://lore.kernel.org/all/20230418-anfallen-irdisch-6993a61be10b@brauner Link: https://lore.kernel.org/all/ZEM5Zq8oo+xnApW9@google.com Link: https://lore.kernel.org/linux-mm/20230306191944.GA15773@monkey Link: https://lore.kernel.org/linux-mm/ZII1p8ZHlHaQ3dDl@casper.infradead.org Cc: Fuad Tabba <tabba@google.com> Cc: Vishal Annapurve <vannapurve@google.com> Cc: Ackerley Tng <ackerleytng@google.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Maciej Szmigiero <mail@maciej.szmigiero.name> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Quentin Perret <qperret@google.com> Cc: Michael Roth <michael.roth@amd.com> Cc: Wang <wei.w.wang@intel.com> Cc: Liam Merwick <liam.merwick@oracle.com> Cc: Isaku Yamahata <isaku.yamahata@gmail.com> Co-developed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Kirill A.
Shutemov <kirill.shutemov@linux.intel.com> Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Co-developed-by: Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Co-developed-by: Ackerley Tng <ackerleytng@google.com> Signed-off-by: Ackerley Tng <ackerleytng@google.com> Co-developed-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Co-developed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20231027182217.3615211-17-seanjc@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit 80583d0 ] With migration disabled, one function becomes unused: virt/kvm/guest_memfd.c:262:12: error: 'kvm_gmem_migrate_folio' defined but not used [-Werror=unused-function] 262 | static int kvm_gmem_migrate_folio(struct address_space *mapping, | ^~~~~~~~~~~~~~~~~~~~~~ Remove the #ifdef around the reference so that fallback_migrate_folio() is never used. The gmem implementation of the hook is trivial; since the gmem mapping is unmovable, the pages should not be migrated anyway. Fixes: a7800aa ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory") Reported-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit 1d23040 ] truncate_inode_pages_range() may attempt to zero pages before truncating them, and this will occur before arch-specific invalidations can be triggered via .invalidate_folio/.free_folio hooks via kvm_gmem_aops. For AMD SEV-SNP this would result in an RMP #PF being generated by the hardware, which is currently treated as fatal (and even if specifically allowed for, would not result in anything other than garbage being written to guest pages due to encryption). On Intel TDX this would also result in undesirable behavior. Set the AS_INACCESSIBLE flag to prevent the MM from attempting unexpected accesses of this sort during operations like truncation. This may also in some cases yield a decent performance improvement for guest_memfd userspace implementations that hole-punch ranges immediately after private->shared conversions via KVM_SET_MEMORY_ATTRIBUTES, since the current implementation of truncate_inode_pages_range() always ends up zeroing an entire 4K range if it is backed by a 2M folio. Link: https://lore.kernel.org/lkml/ZR9LYhpxTaTk6PJX@google.com/ Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Message-ID: <20240329212444.395559-6-michael.roth@amd.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit 7062372 ] Some SNP ioctls will require the page not to be in the pagecache, and as such they will want to return EEXIST to userspace. Start by passing the error up from filemap_grab_folio. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit fa30b0d ] Because kvm_gmem_get_pfn() is called from the page fault path without any of the slots_lock, filemap lock or mmu_lock taken, it is possible for it to race with kvm_gmem_unbind(). This is not a problem, as any PTE that is installed temporarily will be zapped before the guest has the occasion to run. However, it is not possible to have a complete unbind+bind racing with the page fault, because deleting the memslot will call synchronize_srcu_expedited() and wait for the page fault to be resolved. Thus, we can still warn if the file is there and is not the one we expect. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit 3bb2531 ] guest_memfd pages are generally expected to be in some arch-defined initial state prior to using them for guest memory. For SEV-SNP this initial state is 'private', or 'guest-owned', and requires additional operations to move these pages into a 'private' state by updating the corresponding entries in the RMP table. Allow for an arch-defined hook to handle updates of this sort, and go ahead and implement one for x86 so KVM implementations like AMD SVM can register a kvm_x86_ops callback to handle these updates for SEV-SNP guests. The preparation callback is always called when allocating/grabbing folios via gmem, and it is up to the architecture to keep track of whether or not the pages are already in the expected state (e.g. the RMP table in the case of SEV-SNP). In some cases, it is necessary to defer the preparation of the pages to handle things like in-place encryption of initial guest memory payloads before marking these pages as 'private'/'guest-owned'. Add an argument (always true for now) to kvm_gmem_get_folio() that allows for the preparation callback to be bypassed. To detect possible issues in the way userspace initializes memory, it is only possible to add an unprepared page if it is not already included in the filemap. Link: https://lore.kernel.org/lkml/ZLqVdvsF11Ddo7Dq@google.com/ Co-developed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Message-Id: <20231230172351.574091-5-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit 17573fd ] In preparation for adding a function that walks a set of pages provided by userspace and populates them in a guest_memfd, add a version of kvm_gmem_get_pfn() that has a "bool prepare" argument and passes it down to kvm_gmem_get_folio(). Populating guest memory has to call __kvm_gmem_get_pfn() repeatedly on the same file, so make the new function take a struct file*. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit 1f6c06b ] During guest run-time, kvm_arch_gmem_prepare() is issued as needed to prepare newly-allocated gmem pages prior to mapping them into the guest. In the case of SEV-SNP, this mainly involves setting the pages to private in the RMP table. However, for the GPA ranges comprising the initial guest payload, which are encrypted/measured prior to starting the guest, the gmem pages need to be accessed prior to setting them to private in the RMP table so they can be initialized with the userspace-provided data. Additionally, an SNP firmware call is needed afterward to encrypt them in-place and measure the contents into the guest's launch digest. While it is possible to bypass the kvm_arch_gmem_prepare() hooks so that this handling can be done in an open-coded/vendor-specific manner, this may expose more gmem-internal state/dependencies to external callers than necessary. Try to avoid this by implementing an interface that tries to handle as much of the common functionality inside gmem as possible, while also making it generic enough to potentially be usable/extensible for TDX as well. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Co-developed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit a90764f ] In some cases, like with SEV-SNP, guest memory needs to be updated in a platform-specific manner before it can be safely freed back to the host. Wire up arch-defined hooks to the .free_folio kvm_gmem_aops callback to allow for special handling of this sort when freeing memory in response to FALLOC_FL_PUNCH_HOLE operations and when releasing the inode, and go ahead and define an arch-specific hook for x86 since it will be needed for handling memory used for SEV-SNP guests. Signed-off-by: Michael Roth <michael.roth@amd.com> Message-Id: <20231230172351.574091-6-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit d814738 ] kvm_gmem_populate() is a potentially lengthy operation that can involve multiple calls to the firmware. Interrupt it if a signal arrives. Fixes: 1f6c06b ("KVM: guest_memfd: Add interface for populating gmem pages with user data") Cc: Isaku Yamahata <isaku.yamahata@intel.com> Cc: Michael Roth <michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit e300614 ] Use a guard to simplify early returns, and add two more easy shortcuts. If the requested attributes are invalid, the attributes xarray will never show them as set. And if testing a single page, kvm_get_memory_attributes() is more efficient. Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit 47bb584 ] When running an SEV-SNP guest with a sufficiently large amount of memory (1TB+), the host can experience CPU soft lockups when running an operation in kvm_vm_set_mem_attributes() to set memory attributes on the whole range of guest memory.

watchdog: BUG: soft lockup - CPU#8 stuck for 26s! [qemu-kvm:6372]
CPU: 8 UID: 0 PID: 6372 Comm: qemu-kvm Kdump: loaded Not tainted 6.15.0-rc7.20250520.el9uek.rc1.x86_64 #1 PREEMPT(voluntary)
Hardware name: Oracle Corporation ORACLE SERVER E4-2c/Asm,MB Tray,2U,E4-2c, BIOS 78016600 11/13/2024
RIP: 0010:xas_create+0x78/0x1f0
Code: 00 00 00 41 80 fc 01 0f 84 82 00 00 00 ba 06 00 00 00 bd 06 00 00 00 49 8b 45 08 4d 8d 65 08 41 39 d6 73 20 83 ed 06 48 85 c0 <74> 67 48 89 c2 83 e2 03 48 83 fa 02 75 0c 48 3d 00 10 00 00 0f 87
RSP: 0018:ffffad890a34b940 EFLAGS: 00000286
RAX: ffff96f30b261daa RBX: ffffad890a34b9c8 RCX: 0000000000000000
RDX: 000000000000001e RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffad890a356868
R13: ffffad890a356860 R14: 0000000000000000 R15: ffffad890a356868
FS:  00007f5578a2a400(0000) GS:ffff97ed317e1000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f015c70fb18 CR3: 00000001109fd006 CR4: 0000000000f70ef0
PKRU: 55555554
Call Trace:
 <TASK>
 xas_store+0x58/0x630
 __xa_store+0xa5/0x130
 xa_store+0x2c/0x50
 kvm_vm_set_mem_attributes+0x343/0x710 [kvm]
 kvm_vm_ioctl+0x796/0xab0 [kvm]
 __x64_sys_ioctl+0xa3/0xd0
 do_syscall_64+0x8c/0x7a0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f5578d031bb
Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2d 4c 0f 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe0a742b88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000000004020aed2 RCX: 00007f5578d031bb
RDX: 00007ffe0a742c80 RSI: 000000004020aed2 RDI: 000000000000000b
RBP: 0000010000000000 R08: 0000010000000000 R09: 0000017680000000
R10: 0000000000000080 R11: 0000000000000246 R12: 00005575e5f95120
R13: 00007ffe0a742c80 R14: 0000000000000008 R15: 00005575e5f961e0

While looping through the range of memory setting the attributes, call cond_resched() to give the scheduler a chance to run a higher priority task on the runqueue if necessary and avoid staying in kernel mode long enough to trigger the lockup. Fixes: 5a47555 ("KVM: Introduce per-page memory attributes") Cc: stable@vger.kernel.org # 6.12.x Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Liam Merwick <liam.merwick@oracle.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Link: https://lore.kernel.org/r/20250609091121.2497429-2-liam.merwick@oracle.com Signed-off-by: Sean Christopherson <seanjc@google.com>
[ Upstream commit 19a9a1a ] Rename the Kconfig option CONFIG_KVM_PRIVATE_MEM to CONFIG_KVM_GUEST_MEMFD. The original name implied that the feature only supported "private" memory. However, CONFIG_KVM_PRIVATE_MEM enables guest_memfd in general, which is not exclusively for private memory. Subsequent patches in this series will add guest_memfd support for non-CoCo VMs, whose memory is not private. Renaming the Kconfig option to CONFIG_KVM_GUEST_MEMFD more accurately reflects its broader scope as the main Kconfig option for all guest_memfd-backed memory. This provides clearer semantics for the option and avoids confusion as new features are introduced. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shivank Garg <shivankg@amd.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Co-developed-by: David Hildenbrand <david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250729225455.670324-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit d0d8722 ] Right now this is simply more consistent and avoids use of pfn_to_page() and put_page(). It will be put to more use in upcoming patches, to ensure that the up-to-date flag is set at the very end of both the kvm_gmem_get_pfn() and kvm_gmem_populate() flows. Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit d04c77d ] As it stands, the up-to-date flag is not too useful; it tells guest_memfd not to overwrite the contents of a folio, but it doesn't say that the page is ready to be mapped into the guest. For encrypted guests, mapping a private page requires that the "preparation" phase has succeeded, and at the same time the same page cannot be prepared twice. So, ensure that folio_mark_uptodate() is only called on a prepared page. If kvm_gmem_prepare_folio() or the post_populate callback fails, the folio will not be marked up-to-date; it's not a problem to call clear_highpage() again on such a page prior to the next preparation attempt. Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit 564429a ] Add "ARCH" to the symbols; shortly, the "prepare" phase will include both the arch-independent step to clear out contents left in the page by the host, and the arch-dependent step enabled by CONFIG_HAVE_KVM_GMEM_PREPARE. For consistency do the same for CONFIG_HAVE_KVM_GMEM_INVALIDATE as well. Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit e4ee544 ] This check is currently performed by sev_gmem_post_populate(), but it applies to all callers of kvm_gmem_populate(): the point of the function is that the memory is being encrypted and some work has to be done on all the gfns in order to encrypt them. Therefore, check the KVM_MEMORY_ATTRIBUTE_PRIVATE attribute prior to invoking the callback, and stop the operation if a shared page is encountered. Because CONFIG_KVM_PRIVATE_MEM in principle does not require attributes, this makes kvm_gmem_populate() depend on CONFIG_KVM_GENERIC_PRIVATE_MEM (which does require them). Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit dca6c88 ] Add new members to struct kvm_gfn_range to indicate which mapping (private-vs-shared) to operate on: enum kvm_gfn_range_filter attr_filter. Update the core zapping operations to set them appropriately. TDX utilizes two GPA aliases for the same memslots, one for private memory and one for shared. For private memory, KVM cannot always perform the same operations it does on memory for default VMs, such as zapping pages and having them be faulted back in, as this requires guest coordination. However, some operations such as guest driven conversion of memory between private and shared should zap private memory. Internally to the MMU, private and shared mappings are tracked on separate roots. Mapping and zapping operations will operate on the respective GFN alias for each root (private or shared). So zapping operations will by default zap both aliases. Add fields in struct kvm_gfn_range to allow callers to specify which aliases so they can only target the aliases appropriate for their specific operation. There was feedback that target aliases should be specified such that the default value (0) is to operate on both aliases. Several options were considered, including variations with separate bools defined such that the default behavior was to process both aliases; they either allowed nonsensical configurations, or were confusing for the caller. A simple enum was also explored and was close, but was hard to process in the caller. Instead, use an enum with the default value (0) reserved as a disallowed value. Catch ranges that didn't have the target aliases specified by looking for that specific value. Set target alias with enum appropriately for these MMU operations: - For KVM's mmu notifier callbacks, zap shared pages only because private pages won't have a userspace mapping - For setting memory attributes, kvm_arch_pre_set_memory_attributes() chooses the aliases based on the attribute.
- For guest_memfd invalidations, zap private only. Link: https://lore.kernel.org/kvm/ZivIF9vjKcuGie3s@google.com/ Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Message-ID: <20240718211230.1492011-3-rick.p.edgecombe@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit 923310b ] Rename kvm_slot_can_be_private() to kvm_slot_has_gmem() to improve clarity and accurately reflect its purpose. The function kvm_slot_can_be_private() was previously used to check if a given kvm_memory_slot is backed by guest_memfd. However, its name implied that the memory in such a slot was exclusively "private". As guest_memfd support expands to include non-private memory (e.g., shared host mappings), it's important to remove this association. The new name, kvm_slot_has_gmem(), states that the slot is backed by guest_memfd without making assumptions about the memory's privacy attributes. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shivank Garg <shivankg@amd.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Co-developed-by: David Hildenbrand <david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250729225455.670324-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit 638ea79 ] Refactor user_mem_abort() to improve code clarity and simplify assumptions within the function. Key changes include: * Immediately set force_pte to true at the beginning of the function if logging_active is true. This simplifies the flow and makes the condition for forcing a PTE more explicit. * Remove the misleading comment stating that logging_active is guaranteed to never be true for VM_PFNMAP memslots, as this assertion is not entirely correct. * Extract reusable code blocks into new helper functions: * prepare_mmu_memcache(): Encapsulates the logic for preparing and topping up the MMU page cache. * adjust_nested_fault_perms(): Isolates the adjustments to shadow S2 permissions and the encoding of nested translation levels. * Update min(a, (long)b) to min_t(long, a, b) for better type safety and consistency. * Perform other minor tidying up of the code. These changes primarily aim to simplify user_mem_abort() and make its logic easier to understand and maintain, setting the stage for future modifications. Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Tao Chan <chentao@kylinos.cn> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250729225455.670324-18-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Upstream commit a7b57e0 ] Add arm64 architecture support for handling guest page faults on memory slots backed by guest_memfd. This change introduces a new function, gmem_abort(), which encapsulates the fault handling logic specific to guest_memfd-backed memory. The kvm_handle_guest_abort() entry point is updated to dispatch to gmem_abort() when a fault occurs on a guest_memfd-backed memory slot (as determined by kvm_slot_has_gmem()). Until guest_memfd gains support for huge pages, the fault granule for these memory regions is restricted to PAGE_SIZE. Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: James Houghton <jthoughton@google.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250729225455.670324-19-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fix a potential build error (like the one below, which occurs when asm/kvm_emulate.h is included after kvm/arm_psci.h) by including the missing header file in kvm/arm_psci.h:
./include/kvm/arm_psci.h: In function ‘kvm_psci_version’:
./include/kvm/arm_psci.h:29:13: error: implicit declaration of function
‘vcpu_has_feature’; did you mean ‘cpu_have_feature’? [-Werror=implicit-function-declaration]
29 | if (vcpu_has_feature(vcpu, KVM_ARM_VCPU_PSCI_0_2)) {
| ^~~~~~~~~~~~~~~~
| cpu_have_feature
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
If the host attempts to access granules that have been delegated for use in a realm these accesses will be caught and will trigger a Granule Protection Fault (GPF). A fault during a page walk signals a bug in the kernel and is handled by oopsing the kernel. A non-page walk fault could be caused by user space having access to a page which has been delegated to the kernel and will trigger a SIGBUS to allow debugging why user space is trying to access a delegated page. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Steven Price <steven.price@arm.com>
The RMM (Realm Management Monitor) provides functionality that can be accessed by SMC calls from the host. The SMC definitions are based on DEN0137[1] version 1.0-rel0 [1] https://developer.arm.com/documentation/den0137/1-0rel0/ Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Steven Price <steven.price@arm.com>
The wrappers make the call sites easier to read and deal with the boilerplate of handling the error codes from the RMM. Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Steven Price <steven.price@arm.com>
Query the RMI version number and check if it is a compatible version. A static key is also provided to signal that a supported RMM is available. Functions are provided to query if a VM or VCPU is a realm (or REC), which currently will always return false. Later patches make use of struct realm and the states as the ioctl interfaces are added to support realm and REC creation and destruction. Signed-off-by: Steven Price <steven.price@arm.com>
There is one CAP which identifies the presence of CCA, and two ioctls. One ioctl is used to populate memory and the other is used when user space is providing the PSCI implementation to identify the target of the operation. Signed-off-by: Steven Price <steven.price@arm.com>
Introduce the skeleton functions for creating and destroying a realm. The IPA size requested is checked against what the RMM supports. The actual work of constructing the realm will be added in future patches. Signed-off-by: Steven Price <steven.price@arm.com>
RMM v1.0 provides no mechanism for the host to perform debug operations on the guest. So limit the extensions that are visible to an allowlist so that only those capabilities we can support are advertised. Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Steven Price <steven.price@arm.com>
The RMM only allows setting the GPRS (x0-x30) and PC for a realm guest. Check this in kvm_arm_set_reg() so that the VMM can receive a suitable error return if other registers are written to. The RMM makes similar restrictions for reading of the guest's registers (this is *confidential* compute after all), however we don't impose the restriction here. This allows the VMM to read (stale) values from the registers which might be useful to read back the initial values even if the RMM doesn't provide the latest version. For migration of a realm VM, a new interface will be needed so that the VMM can receive an (encrypted) blob of the VM's state. Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Steven Price <steven.price@arm.com>
The RMM needs to be informed of the target REC when a PSCI call is made with an MPIDR argument. Expose an ioctl to the userspace in case the PSCI is handled by it. Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com>
The RMM doesn't allow injection of an undefined exception into a realm guest. Add a WARN to catch if this ever happens. Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com>
The VMM has no control over or visibility of the vCPU execution of a realm guest, and is therefore unable to provide meaningful stolen time statistics. Reflect this by not advertising KVM_CAP_STEAL_TIME when running a realm guest. Note that steal time accounting is not available when a guest is running within an Arm CCA realm (machine type KVM_VM_TYPE_ARM_REALM). Signed-off-by: Steven Price <steven.price@arm.com>
For Realm guests it is impossible to directly inject a synchronous exception. Instead the RMM can be asked to inject a Synchronous External Abort (SEA) when the next REC enter is performed. Expose the KVM_SET_VCPU_EVENTS API to provide the means for the VMM to trigger an SEA injection, when the previous exit was due to a Data abort for an emulated unprotected access. Signed-off-by: Steven Price <steven.price@arm.com>
Forward RSI_HOST_CALLS to KVM's HVC handler. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com>
Now that different types of VMs are supported, check SVE support for the given instance of the VM to report the status accurately. Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com>
The minimum granule size supported by the RMM is a 4k page, so force 4k pages for realm guests. Signed-off-by: Steven Price <steven.price@arm.com>
Physical device assignment is not supported by RMM v1.0, so it doesn't make much sense to allow device mappings within the realm. Prevent them when the guest is a realm. Signed-off-by: Steven Price <steven.price@arm.com>
Commit fa9d27773873 ("perf: arm_pmu: Kill last use of per-CPU cpu_armpmu
pointer") removed the per-CPU cpu_armpmu. Rather than refactoring the
code to deal with this, just reintroduce it. The CCA PMU code will be
changing when switching to the RMM v2.0 ABI and will need completely
reworking.
Signed-off-by: Steven Price <steven.price@arm.com>
Arm CCA assigns the physical PMU device to the guest running in realm world, however the IRQs are routed via the host. To enter a realm guest while a PMU IRQ is pending it is necessary to block the physical IRQ to prevent an immediate exit. Provide a mechanism in the PMU driver for KVM to control the physical IRQ. Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com>
Use the PMU registers from the RmiRecExit structure to identify when an overflow interrupt is due and inject it into the guest. Also hook up the configuration option for enabling the PMU within the guest. When entering a realm guest with a PMU interrupt pending, it is necessary to disable the physical interrupt. Otherwise, when the RMM restores the PMU state, the physical interrupt will trigger, causing an immediate exit back to the host. The guest is expected to acknowledge the interrupt, causing a host exit (to update the GIC state), which gives the opportunity to re-enable the physical interrupt before the next PMU event. The number of PMU counters is configured by the VMM by writing to PMCR.N. Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Steven Price <steven.price@arm.com>
… to userspace The RMM describes the maximum number of BPs/WPs available to the guest in the Feature Register 0. Propagate those numbers into ID_AA64DFR0_EL1, which is visible to userspace. A VMM needs this information in order to set up realm parameters. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Allow userspace to configure the number of breakpoints and watchpoints of a Realm VM through KVM_SET_ONE_REG ID_AA64DFR0_EL1. The KVM sys_reg handler checks the user value against the maximum value given by RMM (arm64_check_features() gets it from the read_sanitised_id_aa64dfr0_el1() reset handler). Userspace discovers that it can write these fields by issuing a KVM_ARM_GET_REG_WRITABLE_MASKS ioctl. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com>
… by RMM Provide an accurate number of available PMU counters to userspace when setting up a Realm. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com>
RMM provides the maximum vector length it supports for a guest in its feature register. Make it visible to the rest of KVM and to userspace via KVM_REG_ARM64_SVE_VLS. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Obtain the max vector length configured by userspace on the vCPUs, and write it into the Realm parameters. By default the vCPU is configured with the max vector length reported by RMM, and userspace can reduce it with a write to KVM_REG_ARM64_SVE_VLS. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Steven Price <steven.price@arm.com>
KVM_GET_REG_LIST should not be called before SVE is finalized. The ioctl handler currently returns -EPERM in this case. But because it uses kvm_arm_vcpu_is_finalized(), it now also rejects the call for unfinalized REC even though finalizing the REC can only be done late, after Realm descriptor creation. Move the check to copy_sve_reg_indices(). One adverse side effect of this change is that a KVM_GET_REG_LIST call that only probes for the array size will now succeed even if SVE is not finalized, but that seems harmless since the following KVM_GET_REG_LIST with the full array will fail. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com>
Userspace can set a few registers with KVM_SET_ONE_REG (9 GP registers at runtime, and 3 system registers during initialization). Update the register list returned by KVM_GET_REG_LIST. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Steven Price <steven.price@arm.com>
Increment KVM_VCPU_MAX_FEATURES to expose the new capability to user space. Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com>
All the pieces are now in place, so enable kvm_rmi_is_available when the RMM is detected. Signed-off-by: Steven Price <steven.price@arm.com>
Pull request overview
This PR backports Arm CCA (RME/Realm) enablement for KVM to the linux-6.6.y kernel series, including the prerequisite generic KVM infrastructure (guest_memfd, per-page memory attributes, and UAPI extensions) needed to support private vs. shared guest memory.
Changes:
- Add generic KVM guest_memfd and per-page memory attribute infrastructure, including new ioctls/UAPI (KVM_CREATE_GUEST_MEMFD, KVM_SET_MEMORY_ATTRIBUTES, KVM_SET_USER_MEMORY_REGION2, KVM_EXIT_MEMORY_FAULT).
- Integrate Arm64 Realm/RMI plumbing into KVM (new RMI headers, realm VM/vCPU lifecycle, MMU fault handling, VGIC/timer/PSCI adaptations).
- Extend x86 KVM paths to interoperate with generic private memory infrastructure (e.g., memory-fault exit info, gmem prepare/invalidate hooks, SNP-related populate flow in SEV code).
Reviewed changes
Copilot reviewed 49 out of 49 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| virt/kvm/kvm_mm.h | Adds guest_memfd interface declarations/stubs. |
| virt/kvm/kvm_main.c | Wires memslot lifecycle to gmem bind/unbind; adds memory attributes and guest_memfd ioctls. |
| virt/kvm/guest_memfd.c | Introduces guest_memfd backing implementation (folio management, bind/unbind, populate). |
| virt/kvm/Makefile.kvm | Builds guest_memfd support when enabled. |
| virt/kvm/Kconfig | Adds KVM_GUEST_MEMFD and generic memory-attribute/private-mem Kconfig symbols. |
| include/uapi/linux/kvm.h | Adds UAPI for memory attributes, guest_memfd, memory-fault exit, arm64 VM type bits, and new caps. |
| include/linux/perf/arm_pmu.h | Exposes per-CPU arm PMU pointer and physical IRQ toggling API. |
| include/linux/kvm_host.h | Adds mem attributes API hooks and gmem PFN retrieval API. |
| include/kvm/arm_psci.h | Adds arm64 KVM emulate header dependency. |
| include/kvm/arm_pmu.h | Adds helper macro used by realm PMU/IRQ handling. |
| include/kvm/arm_arch_timer.h | Exposes realm timer update helper. |
| fs/anon_inodes.c | Exports anon inode helper for guest_memfd file creation. |
| drivers/perf/arm_pmu.c | Implements PMU physical IRQ enable/disable helper and exposes cpu_armpmu. |
| arch/x86/kvm/x86.c | Adds x86 arch hooks for gmem prepare/invalidate; exposes memory-fault-info capability. |
| arch/x86/kvm/svm/sev.c | Adds SNP launch/update flow using gmem populate support. |
| arch/x86/kvm/mmu/mmu_internal.h | Extends page fault tracking for private memory and refcounted pages. |
| arch/x86/kvm/mmu/mmu.c | Adds private-memory PFN faultin path and memory-attribute integration. |
| arch/x86/kvm/Kconfig | Enables generic private-mem + gmem hooks under SEV. |
| arch/x86/include/asm/kvm_host.h | Adds x86 arch “has_private_mem” plumbing and gmem ops hooks. |
| arch/x86/include/asm/kvm-x86-ops.h | Adds optional x86 ops entries for gmem prepare/invalidate. |
| arch/arm64/mm/fault.c | Adds GPF handling for RME Granule Protection Faults. |
| arch/arm64/kvm/vgic/vgic.h | Adds realm-specific LR count helper and RMI include. |
| arch/arm64/kvm/vgic/vgic.c | Adds realm save/restore paths for VGIC state. |
| arch/arm64/kvm/vgic/vgic-v3.c | Skips host-side APR/trap handling for realm vCPUs. |
| arch/arm64/kvm/vgic/vgic-init.c | Blocks unsupported VGICv2 emulation for realms. |
| arch/arm64/kvm/sys_regs.c | Tightens ID reg validation and hides sysregs for realms. |
| arch/arm64/kvm/rmi-exit.c | Implements realm REC exit decoding/handling. |
| arch/arm64/kvm/reset.c | Adds realm-aware SVE max VL handling and REC cleanup. |
| arch/arm64/kvm/psci.c | Integrates realm PSCI completion semantics. |
| arch/arm64/kvm/pmu-emul.c | Reads realm PMU overflow status from REC exit context. |
| arch/arm64/kvm/mmu.c | Adds realm mapping/unmapping paths and gmem fault handling for private memory. |
| arch/arm64/kvm/mmio.c | Adjusts MMIO emulation return path for realm REC ABI. |
| arch/arm64/kvm/inject_fault.c | Adjusts exception injection behavior for realm RECs. |
| arch/arm64/kvm/hypercalls.c | Hides FW reg indices for realms. |
| arch/arm64/kvm/guest.c | Restricts and validates writable regs and event injection for realms. |
| arch/arm64/kvm/arm.c | Adds realm VM type, capability filtering, REC run loop, and RMI init/populate ioctls. |
| arch/arm64/kvm/arch_timer.c | Adds realm timer IRQ update path and realm-specific offset behavior. |
| arch/arm64/kvm/Makefile | Builds new RMI implementation files. |
| arch/arm64/kvm/Kconfig | Enables generic memory-attributes integration for arm64 KVM and related selects. |
| arch/arm64/kernel/cpufeature.c | Exposes RME feature bits in CPU feature framework. |
| arch/arm64/include/uapi/asm/kvm.h | Adds KVM_ARM_VCPU_REC feature bit. |
| arch/arm64/include/asm/virt.h | Declares static key for RMI availability. |
| arch/arm64/include/asm/rmi_smc.h | Adds RMI SMC IDs and ABI structures. |
| arch/arm64/include/asm/rmi_cmds.h | Adds inline wrappers for RMI commands. |
| arch/arm64/include/asm/kvm_rmi.h | Adds realm/REC structures and internal APIs. |
| arch/arm64/include/asm/kvm_host.h | Adds realm fields to KVM arch structs and realm-private-mem indicator. |
| arch/arm64/include/asm/kvm_emulate.h | Adds realm helpers (kvm_is_realm, kvm_gpa_from_fault, vcpu_is_rec, etc.). |
| Documentation/virt/kvm/api.rst | Documents new UAPIs/capabilities for guest_memfd, memory attributes, and Arm RMI. |
```c
if (IS_ERR(folio))
	return PTR_ERR(folio);

*page = folio_file_page(folio, gfn - slot->base_gfn);
```
The page returned via folio_file_page() is indexed by file offset, but this uses gfn - slot->base_gfn (offset within the memslot) and ignores slot->gmem.pgoff. For non-zero slot->gmem.pgoff, this returns the wrong page pointer. Use the same file index that __kvm_gmem_get_pfn() computes (e.g. gfn - slot->base_gfn + slot->gmem.pgoff).
```c
bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
				     unsigned long attrs)
{
```
kvm_range_has_memory_attributes() is implemented with a 4-argument signature (attrs only), but new callers added in this PR pass five arguments (mask + attrs). This is a build break and also suggests the helper's semantics are incomplete. Update the function signature and implement masked matching (or update all callers to the 4-arg form).
```c
void arm_pmu_set_phys_irq(bool enable)
{
	int cpu = get_cpu();
	struct arm_pmu *pmu = per_cpu(cpu_armpmu, cpu);
	int irq;

	irq = armpmu_get_cpu_irq(pmu, cpu);
	if (irq && !enable)
```
arm_pmu_set_phys_irq() dereferences pmu->hw_events via armpmu_get_cpu_irq(pmu, cpu) without checking whether pmu is NULL. per_cpu(cpu_armpmu, cpu) is set/cleared during CPU hotplug and could be NULL; add a NULL check (and possibly validate cpu_irq_ops) to avoid a potential NULL deref.
```c
struct list_head *gmem_list = &inode->i_mapping->i_private_list;
struct kvm_gmem *gmem;
```
inode->i_mapping doesn't have an i_private_list member (the address_space member is private_list). This will fail to compile and is also inconsistent with later code that uses inode->i_mapping->private_list.
```c
file = anon_inode_create_getfile(anon_name, &kvm_gmem_fops, gmem,
				 O_RDWR, NULL);
if (IS_ERR(file)) {
```
anon_inode_create_getfile() is used here but is not declared in include/linux/anon_inodes.h and doesn't appear to be defined anywhere in the tree. As-is this will not compile/link. Either switch to an existing helper like anon_inode_getfile_secure()/anon_inode_getfile(), or add/declare an anon_inode_create_getfile() wrapper in fs/anon_inodes.c and the header.
```c
	return __anon_inode_getfile(name, fops, priv, flags,
				    context_inode, true);
}
EXPORT_SYMBOL_GPL(anon_inode_create_getfile);
```
This exports anon_inode_create_getfile, but there is no corresponding function definition in this file (only anon_inode_getfile() and anon_inode_getfile_secure()). Exporting an undefined symbol will fail the build, and it doesn't add a prototype for anon_inode_create_getfile() either. Define the wrapper (likely calling __anon_inode_getfile(..., secure=true)) and add it to include/linux/anon_inodes.h, or export the correct existing symbol instead.
```c
while (!kvm_range_has_memory_attributes(kvm, gfn, gfn + (1 << max_order),
					KVM_MEMORY_ATTRIBUTE_PRIVATE,
					KVM_MEMORY_ATTRIBUTE_PRIVATE)) {
	if (!max_order)
```
kvm_range_has_memory_attributes() is called with five arguments here, but its declaration/definition in this PR only accepts four. This is a build break; please align the function signature and update all call sites consistently (e.g. if a mask parameter is intended, add it to the prototype and implement masked matching in the helper).
```c
if (level == PG_LEVEL_2M)
	return kvm_range_has_memory_attributes(kvm, start, end, ~0, attrs);
```
kvm_range_has_memory_attributes() is invoked with five arguments here, but in this PR it's declared/defined with only four parameters. This mismatch will fail to compile; align the helper's prototype/definition with the intended call pattern (or adjust callers).
```c
bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
				     unsigned long attrs);
```
kvm_range_has_memory_attributes() is declared with a 4-argument signature, but there are new callers in this PR that pass five arguments (e.g. x86 MMU hugepage checks and guest_memfd populate). This will not compile; update the prototype (and the implementation) to match the intended signature and semantics.
```
select SCHED_INFO
select GUEST_PERF_EVENTS if PERF_EVENTS
select KVM_GENERIC_MEMORY_ATTRIBUTES
select HAVE_KVM_ARCH_GMEM_POPULATE
```
HAVE_KVM_ARCH_GMEM_POPULATE is selected here, but that Kconfig symbol isn't defined anywhere (at least not in this PR or the existing tree). Selecting an undefined symbol will break Kconfig processing; either add the missing symbol definition (likely in virt/kvm/Kconfig) or drop this select if it's not needed.
The following description was produced with AI assistance:
arm64: Support for Arm CCA (Confidential Compute Architecture) in KVM
Overview
This PR backports Arm CCA (Confidential Compute Architecture) support to the linux-6.6.y (v6.6.127) kernel. CCA is Arm's confidential computing architecture: through the Realm Management Extension (RME) and the Realm Management Monitor (RMM), it provides hardware-level memory isolation and protection for virtual machines, making guest memory invisible to the hypervisor. The backport is based on the upstream v12 patch series "[PATCH v12 00/46] arm64: Support for Arm CCA in KVM" (original baseline v6.19-rc1+) and comprises 71 commits.
Commit breakdown
Backported commits carry a "[ Upstream commit <sha1> ]" tag; patches taken from the mailing list are prefixed with "Fromlist:".
1. Upstream backport commits in detail (25)
Several pieces of infrastructure the CCA series depends on are entirely absent from 6.6.y and had to be backported from upstream first. These commits fall into three sub-phases:
Phase 0.1: guest_memfd core framework (10 commits)
The entire virt/kvm/guest_memfd.c subsystem (the guest private memory backend) and Kconfig options such as CONFIG_KVM_GUEST_MEMFD and CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES do not exist in 6.6.y. The following 10 commits build the minimal guest_memfd framework CCA needs:
- 968978aaba80, 5a475554db1e — the KVM_SET_MEMORY_ATTRIBUTES ioctl and the xarray-backed per-page memory attribute framework
- 303b8bf0c6ad, a7800aa80ea4 — virt/kvm/guest_memfd.c, implementing the guest private memory backend
- 9d80bea67d28, 80583d0cfd8f, 78aaab8f673e, 1d23040caa8b, 04540c790841, 70623723778a, c9b28b8e40c8, fa30b0dc91c8, adb2ddb3b378, 3bb2531e20bf — culminating in the kvm_arch_gmem_prepare() callback, used by CCA for Realm memory initialization
- fd27a620480e, 17573fd971f9, b0fe205e0697, 1f6c06b17751 — kvm_gmem_populate(), used by CCA patch 20 to populate initial Realm memory
- 7b47c551d87b, a90764f0e4ed — the kvm_arch_gmem_invalidate() callback, used by CCA for Realm memory reclaim
Phase 0.2: API fixes and naming alignment (12 commits)
On top of the core framework, a series of fix, rename, and API-evolution commits is needed to match the interfaces the CCA patches expect:
- df348ae0e862, d81473840ce1, ad1f35ab6858, e300614f10bd, f74c8d9744e4, 47bb584237cc, e24c03506e3c, 19a9a1ab5c3d, 3cfed31caac9, d0d87226f535, 244147e581fb, d04c77d23122, 7edb9f3ea128, 564429a6bd8d, 9867b6e495b0, e4ee54479273, 7a4bf632a6a4, dca6c8853232, 4eb848a26ec9, 923310be23b2, 2b3c6d0abdc6, 638ea79669f8 — ending with user_mem_abort() reworked for the shared/gmem paths
- e908e1b7fa31, a7b57e099592 — gmem_abort(), handling faults on guest private memory
Phase 0.3: UAPI/API completion (3 commits)
After applying all 46 CCA patches, the following key UAPI definitions and API signatures turned out to be entirely missing from 6.6.y, causing build failures. These 3 commits fill in the missing interfaces:
- 362de6414aa9, 16f95f3b95ca — KVM_EXIT_MEMORY_FAULT (exit reason #39), the kvm_run.memory_fault structure, and the 3-argument kvm_prepare_memory_fault_exit()
- f52c7a72d8b8, 8dd2eee9d526 — the KVM_MEMORY_EXIT_FLAG_PRIVATE definition, upgrading kvm_prepare_memory_fault_exit() to the 6-argument form (adding is_write/is_exec/is_private), plus kvm_faultin_pfn_private() and kvm_max_level_for_order()
- f33220e586ee, 1fbee5b01a0f — adding a struct page **page output parameter to kvm_gmem_get_pfn(); CCA's arm64 fault handling needs it to obtain the struct page
2. Fromlist CCA patches (46)
The 46 CCA patches come from the v12 series posted to the upstream mailing list (not yet merged into mainline) and are applied in their original 01-46 order.
Full commit list
24b9abbadd7f, 518467276a83, 076d987f7772, 3a3f7ad88a02, a7f730878bef, aeb5303ad0b8, afc2ab68ad30, 077c40cc0590, f2ade45d5ffa, b55df100a8da, 7b5895a4420d, 49f5140f8958, 960dd69f0be4, cfd4093a365d, b422fc1dac78, 37d1dfaf5825, 9a682ece496c, 5247c359d8d5, d3bbfb218950, b283df588980, d092e884102a, fd2d851ec369, 14ed1026558e, 4d43d62fad0b, 07f2c44fbdd1, be825b367036, c4a1a72d6bf4, 2eb2b434e6eb, 770c17b40b43, 82f0a4d8fc4b, 351eefbe3e11, 77b631f8cdad, 95143a558cf7, 43b2e9d91ea0, 5686e6e6d77b, c4ea18576076, 41aacd68e2c9, 66f8d0071a09, 89a8a4caad52, 5308dd832f16, 9de55f73cfa9, d7347fb7a726, 1bcf11badcea, 5ea16f211e4f, 9b38228b2b70, a0ba4561deac
Functional groups in detail
Infrastructure and the RMI interface (patches 01-06): fix header dependencies; add GPF (Granule Protection Fault) handling; define the RMI SMC call interface (rmi_smc.h and rmi_cmds.h, both new files); detect RMI support at KVM initialization and introduce helpers such as kvm_is_realm(); define the Realm userspace ABI and add the new KVM_CAP_ARM_RMI capability.
Realm VM creation and management (patches 07-12): Realm creation infrastructure (IPA limit checks, Realm Descriptor management); capability filtering for Realm guests (masking unsupported features); allowing a Realm to be created with the KVM_VM_TYPE_ARM_REALM machine type; RTT (Realm Translation Table) teardown; Realm activation on first VCPU run; REC (Realm Execution Context) allocation and release.
Interrupt and timer support (patches 13-15): a helper to query the number of VGIC list registers; full VGIC support inside a Realm; timer support in the Realm REC.
Runtime handling (patches 16-18): Realm entry/exit handling (rmi-exit.c, a new file); handling of RMI_EXIT_RIPAS_CHANGE (Realm IPA State change requests); Realm MMIO emulation.
guest_memfd integration (patches 19-24): expose private memory support; allow kvm_gmem_populate() to fill the Realm's initial memory contents; set the RIPAS of the initial memslots; create the Realm Descriptor; the Realm VMID allocator; runtime memory fault handling (patch 24 is the most complex patch — it handles faults on Realm private memory in gmem_abort(), involving RMI data/RTT creation).
VCPU and register management (patches 25-30): Realm VCPU load; register access validation; Realm PSCI request handling; a WARN check when injecting undefined exceptions; disabling stolen time for Realm guests; allowing userspace to inject aborts.
Extended features (patches 31-46): RSI_HOST_CALL support; SVE checks and forcing 4K pages; forbidding device mappings for Realms; PMU support (including restoring the per-CPU cpu_armpmu pointer, the IRQ disable mechanism, and PMU counter initialization); breakpoint/watchpoint parameter propagation and configuration; obtaining and configuring the SVE vector length from the RMM; an accurate register list for Realm RECs; exposing KVM_ARM_VCPU_REC to userspace; and finally enabling the static branch that allows Realms to be created.
3. Commit ordering
The 25 upstream backports are applied before the 46 CCA patches, with the 3 UAPI/API completion commits of phase 0.3 interleaved between CCA patch 23 and patch 24 (because patch 24 is the first patch that actually uses the 6-argument kvm_prepare_memory_fault_exit() and the kvm_gmem_get_pfn() variant with the struct page **page parameter).
4. Adaptations relative to the original upstream/fromlist state
Because the original CCA patches are based on v6.19-rc1+ while the target branch is v6.6.127 — roughly three years of kernel evolution apart — the following major adaptations were made during the backport:
4.1 Full backport of the guest_memfd subsystem
Reason: virt/kvm/guest_memfd.c does not exist at all in 6.6.y, and Kconfig options such as CONFIG_KVM_GUEST_MEMFD and CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES are missing. CCA patches 19, 20, and 24 depend heavily on this subsystem.
Approach: 22 guest_memfd-related commits were cherry-picked from mainline in dependency order, covering the core framework, bug fixes, API evolution, and naming alignment. These commits have strict ordering dependencies among themselves.
Cherry-pick conflict resolution:
- include/uapi/linux/kvm.h: the highest KVM_CAP number in 6.6.y is 229; the new CAP numbers had to be inserted correctly after the existing deepin-specific CAPs (e.g. the HYGON-related ones)
- virt/kvm/Kconfig and Makefile.kvm: add the guest_memfd-related Kconfig options and build rules
- include/linux/kvm_host.h: insert the new memory-attribute and gmem declarations into the structure definitions 6.6.y already has
4.2 Completing the KVM_EXIT_MEMORY_FAULT UAPI
Reason: CCA patch 24 calls kvm_prepare_memory_fault_exit() from gmem_abort() to report private/shared memory access mismatches to userspace, but KVM_EXIT_MEMORY_FAULT (exit reason #39), the kvm_run.memory_fault structure, and KVM_MEMORY_EXIT_FLAG_PRIVATE are all completely missing from 6.6.y.
Approach: three additional upstream commits (#23-#25) were backported to fill in these UAPI definitions and API signatures, rather than using custom deepin: fix commits, to keep the code consistent with upstream.
Conflict resolution:
- include/uapi/linux/kvm.h: insert KVM_EXIT_MEMORY_FAULT = 39 into the 6.6.y exit-reason list and add the memory_fault member to the kvm_run union
- include/linux/kvm_host.h: keep the Phase 0 infrastructure code already present in 6.6.y (memory attributes, gmem function declarations) while correctly merging in the 6-argument kvm_prepare_memory_fault_exit()
- arch/x86/kvm/mmu/mmu.c: the kvm_mmu_max_mapping_level() signature changed (gaining a max_level parameter); keep the kvm_slot_has_gmem() name used in 6.6.y (upstream uses kvm_slot_can_be_private() here, renamed in Phase 0 commit #20)
- arch/x86/kvm/mmu/mmu_internal.h: add the refcounted_page field to struct kvm_page_fault
- arch/x86/kvm/svm/sev.c: the SNP/SEV functions introduced by the upstream commit (snp_rmptable_psmash, sev_handle_rmp_fault, etc.) have no corresponding base code in 6.6.y and were all dropped (keeping the empty HEAD side)
- virt/kvm/guest_memfd.c: after kvm_gmem_get_pfn() gained the struct page **page parameter, a folio_file_page() call was added in the function body to set the page output correctly
4.3 API adaptation in mmu.c
The CCA patches (in particular patches 07, 10, 17, 24, 33, and 34) use several APIs in arch/arm64/kvm/mmu.c that do not exist in 6.6.y. The backport makes the following equivalent substitutions:

| CCA patch API | 6.6.y replacement | Notes |
|---|---|---|
| KVM_PGT_FN(func)(args) | func(args) plus a kvm_is_realm() check | KVM_PGT_FN is the v6.19 pKVM MMU dispatch macro (introduced by fce886a60207); part of a large pKVM refactor, unsuitable for a standalone backport |
| kvm_fault_lock(kvm) | read_lock(&kvm->mmu_lock) | the kvm_fault_lock() helper does not exist |
| kvm_fault_unlock(kvm) | read_unlock(&kvm->mmu_lock) | |
| kvm_release_faultin_page(kvm, pfn, ...) | kvm_release_pfn_clean(pfn) | kvm_release_faultin_page() does not exist |
| kvm_stage2_destroy(pgt) | kvm_pgtable_stage2_destroy(pgt) | renamed/split by d68d66e57e2b |
| kvm_init_ipa_range() | equivalent logic in kvm_init_stage2_mmu() | |

4.4 Context adaptation in arm.c
Several functions in arch/arm64/kvm/arm.c were refactored in v6.19:
- kvm_arch_init_vm(): v6.19 adds a kvm_init_nested() call after the lockdep setup, which 6.6.y lacks; the Realm type parsing code is inserted between 6.6.y's lockdep setup and kvm_share_hyp()
- kvm_arch_vcpu_run_pid_change(): v6.19 removed the kvm_arch_vcpu_run_map_fp() call, which 6.6.y still has; the Realm activation code is inserted after the vcpu_has_run_once() check
- kvm_vm_ioctl_check_extension(): the last case in 6.6.y is KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES (#229), while v6.19 has several newer cases; KVM_CAP_ARM_RMI is assigned a new number in this branch
4.5 Other adaptations
- arch/arm64/kvm/reset.c: v6.19 has a system_supported_vcpu_features(kvm) function (taking a kvm argument) that 6.6.y lacks; the equivalent realm feature checks are implemented directly in 6.6.y
- arch/arm64/kvm/inject_fault.c: v6.19 added generic injection primitives such as __kvm_inject_exception(), which 6.6.y lacks; the CCA patches' WARN checks are inserted directly into the existing 6.6.y functions
- drivers/perf/arm_pmu.c: patch 35 is a HACK restoring the cpu_armpmu per-CPU pointer (deleted in v6.19 by fa9d27773873); the pointer still exists in 6.6.y, so the adaptation of this patch is simpler
- arch/arm64/kvm/pmu-emul.c: v6.19 went through the PMCR.N → nr_pmu_counters rename; 6.6.y still uses the original variable name and has been adapted accordingly
- include/uapi/linux/kvm.h: add the KVM_VM_TYPE_ARM_MASK and KVM_VM_TYPE_ARM_REALM definitions; 6.6.y only has KVM_VM_TYPE_ARM_IPA_SIZE_MASK
- arch/arm64/include/asm/kvm_host.h: bump KVM_VCPU_MAX_FEATURES from 7 to 8, reserving a bit for KVM_ARM_VCPU_REC
4.6 Upstream refactors intentionally not backported
The following upstream refactors were intentionally not backported, to limit the intrusiveness of the changes to the 6.6.y code base:
- fce886a60207 — the large pKVM refactor around the KVM_PGT_FN macro; it would substantially restructure mmu.c and is not required for CCA
- dc06193532af — kvm_release_faultin_page(); kvm_release_pfn_clean() is an equivalent replacement
- 85c7869e30b7
- d68d66e57e2b
- 8cc9dc1ae4fb — device
5. Main files involved
New files
virt/kvm/guest_memfd.c
arch/arm64/include/asm/rmi_smc.h
arch/arm64/include/asm/rmi_cmds.h
arch/arm64/include/asm/kvm_rmi.h
arch/arm64/kvm/rmi.c
arch/arm64/kvm/rmi-exit.c
Main modified files
include/uapi/linux/kvm.h
include/linux/kvm_host.h
arch/arm64/kvm/mmu.c
arch/arm64/kvm/arm.c
arch/arm64/include/asm/kvm_host.h
arch/arm64/include/asm/kvm_emulate.h
arch/arm64/kvm/guest.c
arch/arm64/kvm/reset.c
arch/arm64/kvm/psci.c
arch/arm64/kvm/inject_fault.c
arch/arm64/kvm/mmio.c
arch/arm64/mm/fault.c
arch/arm64/kvm/vgic/
arch/arm64/kvm/arch_timer.c
arch/arm64/kvm/pmu-emul.c
drivers/perf/arm_pmu.c
virt/kvm/Kconfig
Documentation/virt/kvm/api.rst
Link: #1319