Skip to content

Preserve Linx block state across Linux context switches and early boot#23

Draft
zhoubot wants to merge 1 commit into
mainfrom
codex/model-polish-20260530
Draft

Preserve Linx block state across Linux context switches and early boot#23
zhoubot wants to merge 1 commit into
mainfrom
codex/model-polish-20260530

Conversation

@zhoubot

@zhoubot zhoubot commented May 30, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • package the current model-generated Linx Linux work from the existing checkout
  • save and restore the full ACR1 EBSTATE window across __switch_to
  • tighten exception/return control flow and early MMU/initramfs bring-up for the firmwareless lane

Testing

  • git diff --check

Not tested

  • kernel build
  • initramfs smoke
  • BusyBox full-OS regression lane

The current work renames the EBSTATE window to spec-shaped SSR aliases, saves and restores the full ACR1 block-state window in __switch_to, tightens exception and return control flow, and adjusts early MMU and initramfs bring-up around the firmwareless runtime lane.

Constraint: The active PR lane still depends on full ACR1 EBSTATE save and restore across task switches

Rejected: Save only named EBARG fields | current bring-up still needs the full 0x1f40..0x1f5f window preserved per task

Confidence: medium

Scope-risk: broad

Directive: Do not shrink the saved ACR1 window without re-proving task-switch, FRET.STK, and full runtime lanes

Tested: git diff --check

Not-tested: kernel build, initramfs smoke, BusyBox full-OS regression

Co-authored-by: OmX <omx@oh-my-codex.dev>

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces numerous early bring-up workarounds, fixes, and boot-time bypasses for the Linx architecture, covering memory management, scheduling, locking, and CPU hotplug subsystems. The code review identified several critical issues that must be addressed: a severe security vulnerability in __get_kernel_nofault and __put_kernel_nofault due to global virtual-to-physical address conversion; a potential post-boot crash caused by marking the persistent linx_root_task_group_boot_buf with __initdata; missing parentheses in the LINX_BOOT_PT_LOW_ALIAS_LIMIT macro definition; broken cacheinfo initialization on SMP systems due to boot-CPU-only hotplug callbacks; a severe memory leak in finish_task_switch caused by an unreachable cleanup path; and a resource leak of MSI domains on initialization failure in the interrupt controller driver.

Comment on lines +347 to +349
if (is_kernel_mapping((unsigned long)__kr_src) || \
is_linear_mapping((unsigned long)__kr_src)) \
__kr_src = (const void *)(uintptr_t)__pa(__kr_src); \

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-critical critical

Converting kernel virtual addresses to physical addresses globally in __get_kernel_nofault and __put_kernel_nofault is highly dangerous. While this might work during early boot when identity mappings are active, at runtime (when user space processes are running), the active page table (current->mm->pgd) maps low virtual addresses to user space memory. Consequently, any attempt to use these nofault helpers at runtime while a user process is active will access the user's virtual memory instead of the kernel's physical memory. This is a critical security vulnerability and correctness bug. Please restrict this physical address conversion workaround to early boot (e.g., by checking system_state == SYSTEM_BOOTING) or avoid doing it globally.

Comment thread kernel/sched/core.c
Comment on lines +8845 to +8846
static unsigned long linx_root_task_group_boot_buf[512] __initdata
__aligned(SMP_CACHE_BYTES);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The static buffer linx_root_task_group_boot_buf is marked with __initdata. However, it is used as a persistent fallback backing store for root_task_group.se and root_task_group.cfs_rq, which must remain valid for the entire lifetime of the system. Since __initdata memory is freed and reclaimed after early boot completes, any scheduling activity after boot will access freed/reclaimed memory, leading to undefined behavior, silent data corruption, or a catastrophic kernel panic. Please remove the __initdata attribute from this buffer.

static unsigned long linx_root_task_group_boot_buf[512] __aligned(SMP_CACHE_BYTES);

#define PTRS_PER_P4D (PAGE_SIZE / sizeof(p4d_t))

#if defined(__LINX__)
#define LINX_BOOT_PT_LOW_ALIAS_LIMIT _AC(64, UL) << 20

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The macro LINX_BOOT_PT_LOW_ALIAS_LIMIT is defined using a shift operator (<<) without surrounding parentheses. In C, shift operators have lower precedence than additive operators (+, -). If this macro is ever used in an additive expression (e.g., LINX_BOOT_PT_LOW_ALIAS_LIMIT + offset), it will be evaluated incorrectly as 64UL << (20 + offset). Please wrap the macro definition in parentheses to ensure safe evaluation.

Suggested change
#define LINX_BOOT_PT_LOW_ALIAS_LIMIT _AC(64, UL) << 20
#define LINX_BOOT_PT_LOW_ALIAS_LIMIT (_AC(64, UL) << 20)

Comment thread drivers/base/cacheinfo.c
Comment on lines +1045 to +1056
ret = cpuhp_setup_state_nocalls(CPUHP_AP_BASE_CACHEINFO_ONLINE,
"base/cacheinfo:online",
cacheinfo_cpu_online,
cacheinfo_cpu_pre_down);
if (ret)
return ret;

ret = cacheinfo_cpu_online(smp_processor_id());
if (ret) {
cpuhp_remove_state_nocalls(CPUHP_AP_BASE_CACHEINFO_ONLINE);
return ret;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using cpuhp_setup_state_nocalls and manually invoking the callback only on the boot CPU (smp_processor_id()) will break cacheinfo initialization on multi-core (SMP) systems. Since cacheinfo_sysfs_init is a device_initcall, it runs after secondary CPUs have already been brought online. By using nocalls and only manually invoking it on the boot CPU, all secondary CPUs will miss the cacheinfo_cpu_online callback, resulting in missing or uninitialized cacheinfo sysfs attributes for those CPUs. Please ensure that the callback is executed for all online CPUs, or restrict this workaround specifically to non-SMP/UP configurations.

Comment thread kernel/sched/core.c
Comment on lines +5348 to +5350
#if defined(CONFIG_LINX) || defined(__LINX__)
return rq;
#endif

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The early return rq; added at line 5349 under defined(CONFIG_LINX) || defined(__LINX__) makes the entire rest of the finish_task_switch function unreachable. This completely bypasses critical cleanup steps, most notably mmdrop_lazy_tlb_sched(mm). As a result, mm_struct reference counts will leak on every task switch from a user process, leading to a severe memory leak. Please restructure the conditional compilation so that necessary cleanup steps are still executed.

Comment on lines +858 to +862
ret = setup_lxic(node, &nvec, &ndev, priv);
if (ret) {
error = ret;
goto out_iounmap;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If setup_lxic fails in lxic_init, the function jumps to out_iounmap but does not clean up the allocated MSI domains. This results in a resource leak of the PCI domain, base domain, and irqdomain created earlier in lxic_allocate_msi_domains. Please call irq_domain_cleanup(priv) in the error path before returning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant