[Code Health] Per-device runtime config (scheduler timeout) rides the per-run arena layout instead of a per-device channel #1220

### Category

Technical Debt (cleanup, refactor)

### Component

Host Runtime (with AICPU Scheduler on the read side)

### Description

`PTO2_SCHEDULER_TIMEOUT_MS` is a **per-device, run-invariant** value (the AICPU scheduler no-progress watchdog). It is semantically per-device config, but it is currently carried as a field of the **per-run** runtime arena layout (`PTO2RuntimeArenaLayout::scheduler_timeout_ms`) and re-transmitted on every run as part of the full arena image H2D.

This is a structural mismatch with two concrete downsides:

1. **The layout becomes a dumping ground.** `PTO2RuntimeArenaLayout` describes the *per-run* arena (ring sizes, tensor_map, scope caps — things that genuinely change per run). A per-device watchdog timeout has nothing to do with ring/tensor layout. Every future per-device knob that "just rides the layout" compounds this.

2. **Read path is per-run for a value that never changes per run.** The host re-reads the env (`resolve_scheduler_timeout_ms()`) every run and re-writes it into the freshly-rebuilt arena image; the device re-reads it from `rt_->prebuilt_layout` on every boot.

Ring sizes (`PTO2_RING_*`) legitimately belong in the layout (they are per-run). The mismatch is only for run-invariant per-device config like the scheduler timeout.
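To make the mismatch concrete, here is a minimal sketch of the current shape. It is not the real code: `ArenaLayoutSketch`, `resolve_timeout_sketch`, `build_layout_for_run`, the `ring_capacity` field and the constant `5000` are all hypothetical stand-ins. Only the presence of a `scheduler_timeout_ms` field in a per-run layout is taken from this issue.

```cpp
#include <cstdint>

// Hypothetical stand-in for PTO2RuntimeArenaLayout; the real struct has
// more fields and its exact types are not shown in this issue.
struct ArenaLayoutSketch {
    uint32_t ring_capacity;         // per-run: legitimately belongs here
    uint32_t scheduler_timeout_ms;  // per-device: the misplaced field
};

// Counts how often the "env" is consulted, to expose the per-run cost.
static int g_timeout_resolutions = 0;

// Hypothetical stand-in for resolve_scheduler_timeout_ms(); returns a
// made-up constant instead of reading the environment.
uint32_t resolve_timeout_sketch() {
    ++g_timeout_resolutions;
    return 5000;
}

// The layout is rebuilt for every run, so the run-invariant timeout is
// re-resolved and re-written each time alongside the per-run fields.
ArenaLayoutSketch build_layout_for_run(uint32_t ring_capacity) {
    ArenaLayoutSketch layout{};
    layout.ring_capacity = ring_capacity;
    layout.scheduler_timeout_ms = resolve_timeout_sketch();
    return layout;
}
```

In this sketch, three runs with different ring capacities resolve the timeout three times and produce the same value each time, which is the redundancy the issue describes.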

**There is now a purpose-built channel for exactly this: `InitArgs`.** A recent refactor introduced `InitArgs` (`src/a5/platform/include/common/kernel_args.h:130`), documented verbatim as *"per-device one-shot invariants ... uploaded once at worker init via the `simpler_aicpu_init` entry, before any register_callable/exec launch ... so they no longer ride on the per-run KernelArgs: latched once into the resident AICPU SO globals and surviving every subsequent per-task launch."* It currently carries `device_id`, `log_level`, `log_info_v`. The scheduler timeout is the same category of value and belongs here.

- Host send: `ensure_aicpu_init_launched()` (`src/common/platform/onboard/host/device_runner_base.cpp:364`) fills `InitArgs` (`:374`) and launches `KernelNames::InitName` exactly once per runner, guarded by `aicpu_init_launched_` (`:380`, `aicpu_num=1`).
- Device latch precedent: `InitArgs.log_info_v` is latched into the resident AICPU global `g_log_info_v` (`src/common/platform/onboard/aicpu/device_log.cpp:36`), "latched once per device ... not re-pushed per run."
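The launch-once guard and the device-side latch described above can be sketched roughly as follows. This is an illustration of the pattern only: `InitArgsSketch`, `RunnerSketch`, `aicpu_init_sketch` and the `_sketch` globals are hypothetical names, the field types are assumed, and the real launch goes through a kernel launch rather than a direct function call.

```cpp
#include <cstdint>

// Hypothetical mirror of InitArgs with the three fields named in this
// issue; the integer types are an assumption.
struct InitArgsSketch {
    int32_t device_id;
    int32_t log_level;
    int32_t log_info_v;
};

// "Device side": resident global that survives across runs, standing in
// for g_log_info_v.
static int32_t g_log_info_v_sketch = 0;
static int g_init_calls_sketch = 0;

// Stand-in for the simpler_aicpu_init entry: latch the invariants once.
void aicpu_init_sketch(const InitArgsSketch& args) {
    g_log_info_v_sketch = args.log_info_v;
    ++g_init_calls_sketch;
}

// "Host side": a boolean guard makes the init launch happen at most once
// per runner, in the spirit of aicpu_init_launched_.
class RunnerSketch {
public:
    void ensure_init_launched(const InitArgsSketch& args) {
        if (init_launched_) {
            return;  // already uploaded; later runs skip the init launch
        }
        aicpu_init_sketch(args);
        init_launched_ = true;
    }

    void run(const InitArgsSketch& args) {
        ensure_init_launched(args);
        // ... the per-run launch would follow here ...
    }

private:
    bool init_launched_ = false;
};
```

Calling `run` twice on the same `RunnerSketch` invokes the init entry once, and a changed `log_info_v` on the second call is ignored because the value was already latched.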

This supersedes an earlier note in this issue that claimed there was no transmit-once channel — that reasoning only considered the per-run launch tier. `InitArgs` is a genuine transmit-once-per-device path.

### Location

Current placement to remove:
- `src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_runtime2.h:117` — `scheduler_timeout_ms` field in `PTO2RuntimeArenaLayout`
- `src/a5/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp:248` — per-run `resolve_scheduler_timeout_ms()` (env read), written into layout at `:499`
- `src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp:606` — device read of `rt_->prebuilt_layout.scheduler_timeout_ms`

Target channel to reuse:
- `InitArgs` struct: `src/a5/platform/include/common/kernel_args.h:130`
- Host one-shot launch: `src/common/platform/onboard/host/device_runner_base.cpp:364` (`ensure_aicpu_init_launched`)
- Device latch precedent: `src/common/platform/onboard/aicpu/device_log.cpp:36` (`g_log_info_v`)

a2a3 mirrors under `src/a2a3/...`; sim variants under `src/*/platform/sim/...`.

### Proposed Fix

**Recommended: carry `scheduler_timeout_ms` in `InitArgs`** (the per-device one-shot channel that already exists for `device_id` / log config):

1. Add `uint32_t scheduler_timeout_ms;` to `InitArgs` (`kernel_args.h`).
2. Host: in `ensure_aicpu_init_launched()`, stamp `init_args.scheduler_timeout_ms` from the env value resolved **once at init**. `resolve_onboard_timeout_config()` already reads the scheduler env at attach for ordering validation and currently discards it; retain that value instead of discarding it. The per-run `getenv` in `runtime_maker` is then deleted.
3. Device: `simpler_aicpu_init` latches it into a resident AICPU SO global (next to the `device_id` / `g_log_info_v` latches).
4. Scheduler: `scheduler_dispatch.cpp` reads that global instead of `rt_->prebuilt_layout.scheduler_timeout_ms`.
5. Remove `scheduler_timeout_ms` from `PTO2RuntimeArenaLayout` and the per-run `resolve_scheduler_timeout_ms()`.
6. Apply symmetrically across the four quadrants (onboard/sim x a5/a2a3).
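Steps 1, 3 and 4 above could look roughly like the sketch below. All names (`InitArgsWithTimeout`, `g_scheduler_timeout_ms_sketch`, `aicpu_init_with_timeout`, `no_progress_timed_out`) are hypothetical, and the watchdog predicate is an assumed shape for how the scheduler consumes the value, not a copy of `scheduler_dispatch.cpp`.

```cpp
#include <cstdint>

// Step 1 (sketch): InitArgs gains the timeout field. Other fields are
// omitted except device_id, whose type is assumed.
struct InitArgsWithTimeout {
    int32_t device_id;
    uint32_t scheduler_timeout_ms;
};

// Step 3 (sketch): resident global written once by the init entry.
static uint32_t g_scheduler_timeout_ms_sketch = 0;

void aicpu_init_with_timeout(const InitArgsWithTimeout& args) {
    g_scheduler_timeout_ms_sketch = args.scheduler_timeout_ms;
}

// Step 4 (sketch): the scheduler reads the latched global instead of a
// layout field. Assumes now_ms >= last_progress_ms and that the watchdog
// fires once the elapsed time reaches the timeout.
bool no_progress_timed_out(uint64_t now_ms, uint64_t last_progress_ms) {
    return (now_ms - last_progress_ms) >= g_scheduler_timeout_ms_sketch;
}
```

With a latched timeout of 500 ms, 400 ms without progress does not trip the predicate and 500 ms does. The point of the sketch is that nothing in the per-run path touches the value after init.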

This is a true transmit-once-per-device path: the value leaves the per-run arena and the per-run `KernelArgs` entirely, is uploaded once at init, latched into AICPU SO globals, and consumed read-only by every subsequent run — exactly how `device_id` / log config already work. No new device buffer, no per-run pointer, no per-run `getenv`. `InitArgs` being strictly per-device (vs per-callable) means there is not even a re-stamp concern.

No new env gate is introduced — `PTO2_SCHEDULER_TIMEOUT_MS` already exists; only its landing/transport changes. Existing per-case tests that set different values (`tests/st/runtime_fatal_codes`, `tests/st/aicore_op_timeout`) are per-process and set the env before init, so an init-time read does not break them.
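A one-time read at init might be sketched as below. The parsing rules are an assumption: this issue does not say how the existing code treats unset, malformed or out-of-range values, so the fallback-on-error behavior, the helper names `parse_timeout_ms` and `resolve_timeout_ms_at_init`, and the caller-supplied default are all hypothetical.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <limits>

// Parse a decimal millisecond value. Returns `fallback` when the text is
// missing, empty, not fully numeric, or does not fit in uint32_t.
// (Assumed policy; the real resolver may behave differently.)
uint32_t parse_timeout_ms(const char* text, uint32_t fallback) {
    if (text == nullptr || *text == '\0') {
        return fallback;
    }
    errno = 0;
    char* end = nullptr;
    unsigned long long value = std::strtoull(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0' ||
        value > std::numeric_limits<uint32_t>::max()) {
        return fallback;
    }
    return static_cast<uint32_t>(value);
}

// Intended to be called once at init, so the env is not consulted again
// on later runs. The env name is the one given in this issue.
uint32_t resolve_timeout_ms_at_init(uint32_t fallback) {
    return parse_timeout_ms(std::getenv("PTO2_SCHEDULER_TIMEOUT_MS"), fallback);
}
```

Because the tests mentioned above set the env before the process initializes the runner, a single read at init would observe the same value the per-run read observes today.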

Alternatives considered (inferior, kept for the record):

- An inline scalar in the per-run `KernelArgs`: fixes categorization but stays per-run.
- A separate persistent device buffer modeled on `device_wall_dev_ptr_`: the data is sent once, but the pointer still free-rides `KernelArgs` per run, so it is only worthwhile for a large or growing config blob.
- The per-callable `RegisterCallableArgs` register tier: transmit-once but per-callable, so less clean than the strictly per-device `InitArgs`.

### Priority

Low (no impact today, good to fix eventually)
