[Code Health] Per-device runtime config (scheduler timeout) rides the per-run arena layout instead of a per-device channel #1220

Description

@ChaoZheng109

Category

Technical Debt (cleanup, refactor)

Component

Host Runtime (with AICPU Scheduler on the read side)

Description

PTO2_SCHEDULER_TIMEOUT_MS is a per-device, run-invariant value (the AICPU scheduler no-progress watchdog). It is semantically per-device config, but it is currently carried as a field of the per-run runtime arena layout (PTO2RuntimeArenaLayout::scheduler_timeout_ms) and re-transmitted on every run as part of the full arena image H2D.

This is a structural mismatch with two concrete downsides:

  1. The layout becomes a dumping ground. PTO2RuntimeArenaLayout describes the per-run arena (ring sizes, tensor_map, scope caps — things that genuinely change per run). A per-device watchdog timeout has nothing to do with ring/tensor layout. Every future per-device knob that "just rides the layout" compounds this.

  2. The read path is per-run for a value that never changes between runs. The host re-reads the env (resolve_scheduler_timeout_ms()) on every run and rewrites it into the freshly rebuilt arena image; the device re-reads it from rt_->prebuilt_layout on every boot.

Ring sizes (PTO2_RING_*) legitimately belong in the layout (they are per-run). The mismatch is only for run-invariant per-device config like the scheduler timeout.

There is now a purpose-built channel for exactly this: InitArgs. A recent refactor introduced InitArgs (src/a5/platform/include/common/kernel_args.h:130), documented verbatim as "per-device one-shot invariants ... uploaded once at worker init via the simpler_aicpu_init entry, before any register_callable/exec launch ... so they no longer ride on the per-run KernelArgs: latched once into the resident AICPU SO globals and surviving every subsequent per-task launch." It currently carries device_id, log_level, log_info_v. The scheduler timeout is the same category of value and belongs here.

  • Host send: ensure_aicpu_init_launched() (src/common/platform/onboard/host/device_runner_base.cpp:364) fills InitArgs (:374) and launches KernelNames::InitName exactly once per runner, guarded by aicpu_init_launched_ (:380, aicpu_num=1).
  • Device latch precedent: InitArgs.log_info_v is latched into the resident AICPU global g_log_info_v (src/common/platform/onboard/aicpu/device_log.cpp:36), "latched once per device ... not re-pushed per run."

This supersedes an earlier note in this issue that claimed there was no transmit-once channel — that reasoning only considered the per-run launch tier. InitArgs is a genuine transmit-once-per-device path.

Location

Current placement to remove:

  • src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_runtime2.h:117, the scheduler_timeout_ms field in PTO2RuntimeArenaLayout
  • src/a5/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp:248 — per-run resolve_scheduler_timeout_ms() (env read), written into layout at :499
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp:606 — device read of rt_->prebuilt_layout.scheduler_timeout_ms

Target channel to reuse:

  • InitArgs struct: src/a5/platform/include/common/kernel_args.h:130
  • Host one-shot launch: src/common/platform/onboard/host/device_runner_base.cpp:364 (ensure_aicpu_init_launched)
  • Device latch precedent: src/common/platform/onboard/aicpu/device_log.cpp:36 (g_log_info_v)

The a2a3 equivalents live under src/a2a3/...; the sim variants live under src/*/platform/sim/....

Proposed Fix

Recommended: carry scheduler_timeout_ms in InitArgs (the per-device one-shot channel that already exists for device_id / log config):

  1. Add uint32_t scheduler_timeout_ms; to InitArgs (kernel_args.h).
  2. Host: in ensure_aicpu_init_launched() stamp init_args.scheduler_timeout_ms from the env value resolved once at init (resolve_onboard_timeout_config() already reads the scheduler env at attach for ordering validation and currently discards it — keep it). The per-run getenv in runtime_maker is then deleted.
  3. Device: simpler_aicpu_init latches it into a resident AICPU SO global (next to the device_id / g_log_info_v latches).
  4. Scheduler: scheduler_dispatch.cpp reads that global instead of rt_->prebuilt_layout.scheduler_timeout_ms.
  5. Remove scheduler_timeout_ms from PTO2RuntimeArenaLayout and the per-run resolve_scheduler_timeout_ms().
  6. Apply symmetrically across the four quadrants (onboard/sim x a5/a2a3).

This is a true transmit-once-per-device path: the value leaves the per-run arena and the per-run KernelArgs entirely, is uploaded once at init, latched into AICPU SO globals, and consumed read-only by every subsequent run — exactly how device_id / log config already work. No new device buffer, no per-run pointer, no per-run getenv. InitArgs being strictly per-device (vs per-callable) means there is not even a re-stamp concern.
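The steps above can be sketched as a small host-only model. This is an illustration under stated assumptions, not the repository's code: the InitArgs field types, the global name g_scheduler_timeout_ms, and the helpers fake_aicpu_init, scheduler_no_progress_timeout_ms, and FakeDeviceRunner are all hypothetical stand-ins, and the "launch" is a plain function call rather than a real kernel launch.

```cpp
#include <cstdint>

// Hypothetical stand-in for InitArgs. The real struct lives in kernel_args.h;
// the field types used here are assumed for illustration only.
struct InitArgs {
    uint32_t device_id = 0;
    uint32_t log_level = 0;
    uint32_t log_info_v = 0;
    uint32_t scheduler_timeout_ms = 0;  // step 1: new per-device field
};

// Step 3: resident device-side global, modeled on the g_log_info_v latch.
// The name g_scheduler_timeout_ms is an assumption.
static uint32_t g_scheduler_timeout_ms = 0;

// Stand-in for the simpler_aicpu_init entry: latch the one-shot invariants.
void fake_aicpu_init(const InitArgs& args) {
    g_scheduler_timeout_ms = args.scheduler_timeout_ms;
}

// Step 4: the scheduler reads the latched global, not the per-run layout.
uint32_t scheduler_no_progress_timeout_ms() {
    return g_scheduler_timeout_ms;
}

// Step 2: host-side runner that sends InitArgs exactly once, guarded by a flag
// in the same way the issue describes aicpu_init_launched_.
class FakeDeviceRunner {
public:
    explicit FakeDeviceRunner(uint32_t timeout_ms) : timeout_ms_(timeout_ms) {}

    void ensure_aicpu_init_launched() {
        if (aicpu_init_launched_) return;  // already uploaded for this runner
        InitArgs init_args;
        init_args.scheduler_timeout_ms = timeout_ms_;
        fake_aicpu_init(init_args);  // stands in for launching the init kernel
        aicpu_init_launched_ = true;
        ++init_launches_;
    }

    // A per-run launch: no timeout is read from the env or written anywhere.
    void run() {
        ensure_aicpu_init_launched();
        ++runs_;
    }

    int init_launches() const { return init_launches_; }
    int runs() const { return runs_; }

private:
    uint32_t timeout_ms_;
    bool aicpu_init_launched_ = false;
    int init_launches_ = 0;
    int runs_ = 0;
};
```

In this model, calling run() several times on one FakeDeviceRunner performs a single init upload, and every later run sees the same latched value through scheduler_no_progress_timeout_ms().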

No new env gate is introduced — PTO2_SCHEDULER_TIMEOUT_MS already exists; only its landing/transport changes. Existing per-case tests that set different values (tests/st/runtime_fatal_codes, tests/st/aicore_op_timeout) are per-process and set the env before init, so an init-time read does not break them.
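A minimal sketch of what an init-time, read-once resolution of PTO2_SCHEDULER_TIMEOUT_MS could look like. The helper names parse_timeout_ms and resolve_scheduler_timeout_ms_once, the fallback parameter, and the choice to fall back on malformed input are assumptions made for illustration; the issue does not state the real default or the real validation rules.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <limits>

// Hypothetical helper: parse a decimal millisecond count, returning `fallback`
// for a missing, empty, negative, malformed, or out-of-range value.
uint32_t parse_timeout_ms(const char* text, uint32_t fallback) {
    if (text == nullptr || *text == '\0' || *text == '-') return fallback;
    errno = 0;
    char* end = nullptr;
    unsigned long long value = std::strtoull(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0') return fallback;
    if (value > std::numeric_limits<uint32_t>::max()) return fallback;
    return static_cast<uint32_t>(value);
}

// Hypothetical helper: read the env on the first call only and cache the result.
// Later calls return the cached value, whatever `fallback` they pass.
uint32_t resolve_scheduler_timeout_ms_once(uint32_t fallback) {
    static const uint32_t cached =
        parse_timeout_ms(std::getenv("PTO2_SCHEDULER_TIMEOUT_MS"), fallback);
    return cached;
}
```

Because the value is cached on first use, a test process only has to set the env before the first init, which matches the per-process pattern described above.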

Alternatives considered (inferior, kept for the record): an inline scalar in the per-run KernelArgs (fixes categorization but stays per-run); a separate persistent device buffer modeled on device_wall_dev_ptr_ (data once, but the pointer still free-rides KernelArgs per run — only worthwhile for a large/growing config blob); or the per-callable RegisterCallableArgs register tier (transmit-once but per-callable, so less clean than the strictly per-device InitArgs).

Priority

Low (no impact today, good to fix eventually)

Metadata

Assignees

No one assigned

    Labels

    code health: Technical debt, robustness, code quality