Preserve Linx block state across Linux context switches and early boot#23
Preserve Linx block state across Linux context switches and early boot#23zhoubot wants to merge 1 commit into
Conversation
The current work renames the EBSTATE window to spec-shaped SSR aliases, saves and restores the full ACR1 block-state window in __switch_to, tightens exception and return control flow, and adjusts early MMU and initramfs bring-up around the firmwareless runtime lane. Constraint: The active PR lane still depends on full ACR1 EBSTATE save and restore across task switches Rejected: Save only named EBARG fields | current bring-up still needs the full 0x1f40..0x1f5f window preserved per task Confidence: medium Scope-risk: broad Directive: Do not shrink the saved ACR1 window without re-proving task-switch, FRET.STK, and full runtime lanes Tested: git diff --check Not-tested: kernel build, initramfs smoke, BusyBox full-OS regression Co-authored-by: OmX <omx@oh-my-codex.dev>
There was a problem hiding this comment.
Code Review
This pull request introduces numerous early bring-up workarounds, fixes, and boot-time bypasses for the Linx architecture, covering memory management, scheduling, locking, and CPU hotplug subsystems. The code review identified several critical issues that must be addressed: a severe security vulnerability in __get_kernel_nofault and __put_kernel_nofault due to global virtual-to-physical address conversion; a potential post-boot crash caused by marking the persistent linx_root_task_group_boot_buf with __initdata; missing parentheses in the LINX_BOOT_PT_LOW_ALIAS_LIMIT macro definition; broken cacheinfo initialization on SMP systems due to boot-CPU-only hotplug callbacks; a severe memory leak in finish_task_switch caused by an unreachable cleanup path; and a resource leak of MSI domains on initialization failure in the interrupt controller driver.
| if (is_kernel_mapping((unsigned long)__kr_src) || \ | ||
| is_linear_mapping((unsigned long)__kr_src)) \ | ||
| __kr_src = (const void *)(uintptr_t)__pa(__kr_src); \ |
There was a problem hiding this comment.
Converting kernel virtual addresses to physical addresses globally in __get_kernel_nofault and __put_kernel_nofault is highly dangerous. While this might work during early boot when identity mappings are active, at runtime (when user space processes are running), the active page table (current->mm->pgd) maps low virtual addresses to user space memory. Consequently, any attempt to use these nofault helpers at runtime while a user process is active will access the user's virtual memory instead of the kernel's physical memory. This is a critical security vulnerability and correctness bug. Please restrict this physical address conversion workaround to early boot (e.g., by checking system_state == SYSTEM_BOOTING) or avoid doing it globally.
| static unsigned long linx_root_task_group_boot_buf[512] __initdata | ||
| __aligned(SMP_CACHE_BYTES); |
There was a problem hiding this comment.
The static buffer linx_root_task_group_boot_buf is marked with __initdata. However, it is used as a persistent fallback backing store for root_task_group.se and root_task_group.cfs_rq, which must remain valid for the entire lifetime of the system. Since __initdata memory is freed and reclaimed after early boot completes, any scheduling activity after boot will access freed/reclaimed memory, leading to undefined behavior, silent data corruption, or a catastrophic kernel panic. Please remove the __initdata attribute from this buffer.
static unsigned long linx_root_task_group_boot_buf[512] __aligned(SMP_CACHE_BYTES);| #define PTRS_PER_P4D (PAGE_SIZE / sizeof(p4d_t)) | ||
|
|
||
| #if defined(__LINX__) | ||
| #define LINX_BOOT_PT_LOW_ALIAS_LIMIT _AC(64, UL) << 20 |
There was a problem hiding this comment.
The macro LINX_BOOT_PT_LOW_ALIAS_LIMIT is defined using a shift operator (<<) without surrounding parentheses. In C, shift operators have lower precedence than additive operators (+, -). If this macro is ever used in an additive expression (e.g., LINX_BOOT_PT_LOW_ALIAS_LIMIT + offset), it will be evaluated incorrectly as 64UL << (20 + offset). Please wrap the macro definition in parentheses to ensure safe evaluation.
| #define LINX_BOOT_PT_LOW_ALIAS_LIMIT _AC(64, UL) << 20 | |
| #define LINX_BOOT_PT_LOW_ALIAS_LIMIT (_AC(64, UL) << 20) |
| ret = cpuhp_setup_state_nocalls(CPUHP_AP_BASE_CACHEINFO_ONLINE, | ||
| "base/cacheinfo:online", | ||
| cacheinfo_cpu_online, | ||
| cacheinfo_cpu_pre_down); | ||
| if (ret) | ||
| return ret; | ||
|
|
||
| ret = cacheinfo_cpu_online(smp_processor_id()); | ||
| if (ret) { | ||
| cpuhp_remove_state_nocalls(CPUHP_AP_BASE_CACHEINFO_ONLINE); | ||
| return ret; | ||
| } |
There was a problem hiding this comment.
Using cpuhp_setup_state_nocalls and manually invoking the callback only on the boot CPU (smp_processor_id()) will break cacheinfo initialization on multi-core (SMP) systems. Since cacheinfo_sysfs_init is a device_initcall, it runs after secondary CPUs have already been brought online. By using nocalls and only manually invoking it on the boot CPU, all secondary CPUs will miss the cacheinfo_cpu_online callback, resulting in missing or uninitialized cacheinfo sysfs attributes for those CPUs. Please ensure that the callback is executed for all online CPUs, or restrict this workaround specifically to non-SMP/UP configurations.
| #if defined(CONFIG_LINX) || defined(__LINX__) | ||
| return rq; | ||
| #endif |
There was a problem hiding this comment.
The early return rq; added at line 5349 under defined(CONFIG_LINX) || defined(__LINX__) makes the entire rest of the finish_task_switch function unreachable. This completely bypasses critical cleanup steps, most notably mmdrop_lazy_tlb_sched(mm). As a result, mm_struct reference counts will leak on every task switch from a user process, leading to a severe memory leak. Please restructure the conditional compilation so that necessary cleanup steps are still executed.
| ret = setup_lxic(node, &nvec, &ndev, priv); | ||
| if (ret) { | ||
| error = ret; | ||
| goto out_iounmap; | ||
| } |
There was a problem hiding this comment.
If setup_lxic fails in lxic_init, the function jumps to out_iounmap but does not clean up the allocated MSI domains. This results in a resource leak of the PCI domain, base domain, and irqdomain created earlier in lxic_allocate_msi_domains. Please call irq_domain_cleanup(priv) in the error path before returning.
Summary
__switch_toTesting
git diff --checkNot tested