Platform
All / Unknown
Runtime Variant
All / Unknown
Description
Current simpler main can leave a group task requeued forever when the final
eligible endpoint sets require reusing the same endpoint for multiple
automatically selected group members.
This is based on origin/main at:
26b7b1507476024d6c97dbf97e52545853d44bd6
The problematic shape is:
eligible_endpoint_ids = {{0}, {0}};
For a group of size 2, if endpoint 0 exists and both members have no explicit
worker affinity, this submit shape can pass validation. Scheduler dispatch then
cannot assign the second member because automatic selection excludes endpoints
already selected for earlier members in the same group.
Current Main Code Example
In src/common/hierarchical/orchestrator.cpp, current main only checks that
each eligible endpoint set is non-empty. If a member has no explicit affinity,
validation skips the rest of the checks:
for (size_t i = 0; i < args_count; ++i) {
    const auto &eligible =
        eligible_endpoint_ids.empty() ? std::vector<int32_t>{} : eligible_endpoint_ids[i];
    if (!eligible_endpoint_ids.empty() && eligible.empty()) {
        throw std::invalid_argument(
            "Orchestrator: final eligible endpoint set is empty for member " + std::to_string(i)
        );
    }
    int8_t affinity = affinities.empty() ? int8_t(-1) : affinities[i];
    if (affinity < 0) continue;
    ...
}
So eligible_endpoint_ids = {{0}, {0}} is not rejected when both group members
are unconstrained by explicit affinity.
In src/common/hierarchical/types.h, current main stores and exposes
per-member eligible endpoint sets:
const std::vector<int32_t> &eligible_endpoints_for(int32_t i) const {
    static const std::vector<int32_t> empty;
    if (eligible_endpoint_ids.empty()) return empty;
    if (i < 0 || static_cast<size_t>(i) >= eligible_endpoint_ids.size()) return empty;
    return eligible_endpoint_ids[static_cast<size_t>(i)];
}
In src/common/hierarchical/scheduler.cpp, current main uses all-or-nothing
group dispatch. It first selects workers for all group members, and only
dispatches after every member has a selected worker:
std::vector<WorkerThread *> workers(static_cast<size_t>(N), nullptr);
bool ok = true;
// Pass 2: fill unconstrained slots from idle pool
if (ok) {
    for (int i = 0; i < N; i++) {
        if (workers[static_cast<size_t>(i)] != nullptr) continue;
        auto *wt =
            cfg_.manager->pick_idle_excluding_eligible(
                s.worker_type, workers, s.eligible_endpoints_for(i));
        if (!wt) {
            ok = false;
            break;
        }
        workers[static_cast<size_t>(i)] = wt;
    }
}
if (!ok) {
    q->push(slot);
    break;
}
s.state.store(TaskState::RUNNING, std::memory_order_release);
The exclusion happens inside
src/common/hierarchical/worker_manager.cpp::pick_idle_excluding_eligible():
bool excluded = false;
for (auto *ex : exclude) {
    if (ex == wt.get()) {
        excluded = true;
        break;
    }
}
if (!excluded) return wt.get();
For eligible_endpoint_ids = {{0}, {0}}, dispatch behaves like this:
- member 0 tentatively selects endpoint 0 and stores it in workers[0];
- member 1 is also restricted to endpoint 0;
- pick_idle_excluding_eligible() sees endpoint 0, but it is already in the
  exclude list;
- no endpoint is returned for member 1;
- ok = false;
- the whole group slot is pushed back to the ready queue;
- no member is dispatched, so the same state can repeat forever.
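The sequence above can be reproduced with a small standalone model. This is only a sketch: endpoints are plain integer ids rather than WorkerThread pointers, and pick_idle_excluding(), try_select_group(), and count_successful_attempts() are hypothetical names, not functions from the repository. It also assumes that an empty eligible set means "any endpoint".

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

using EndpointSet = std::vector<int32_t>;

// Hypothetical stand-in for pick_idle_excluding_eligible(): returns the first
// idle endpoint that is eligible and not already selected, or -1 if none.
int32_t pick_idle_excluding(const EndpointSet &idle, const EndpointSet &selected,
                            const EndpointSet &eligible) {
    for (int32_t id : idle) {
        // Assumption: an empty eligible set means "any endpoint is allowed".
        bool allowed = eligible.empty() ||
                       std::find(eligible.begin(), eligible.end(), id) != eligible.end();
        bool taken = std::find(selected.begin(), selected.end(), id) != selected.end();
        if (allowed && !taken) return id;
    }
    return -1;
}

// One all-or-nothing selection attempt. Returns true only if every member
// gets an endpoint; on false the real scheduler would requeue the slot.
bool try_select_group(const EndpointSet &idle,
                      const std::vector<EndpointSet> &eligible_per_member) {
    EndpointSet selected;
    for (const EndpointSet &eligible : eligible_per_member) {
        int32_t id = pick_idle_excluding(idle, selected, eligible);
        if (id < 0) return false;
        selected.push_back(id);
    }
    return true;
}

// Repeats the attempt max_attempts times. A failed attempt dispatches nothing,
// so the inputs are identical on every retry.
int count_successful_attempts(const EndpointSet &idle,
                              const std::vector<EndpointSet> &eligible_per_member,
                              int max_attempts) {
    int successes = 0;
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (try_select_group(idle, eligible_per_member)) ++successes;
    }
    return successes;
}
```

In this model, one idle endpoint 0 with eligible sets {{0}, {0}} fails on every attempt, because nothing changes between retries. Sets such as {{0}, {1}} with endpoints 0 and 1 idle succeed on the first attempt.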
Steps to Reproduce
1. Register one NEXT_LEVEL endpoint with endpoint id 0.
2. Submit a NEXT_LEVEL group task with two members and no explicit worker
affinity.
3. Set both members' final eligible endpoint set to endpoint 0:
orch.submit_next_level_group(callable, {args0, args1}, cfg, {}, {{0}, {0}});
4. Run the scheduler/drain path.
Expected Behavior
The scheduler should not requeue forever. It should choose and document one
contract:
- allow endpoint reuse by dispatching both group members to endpoint 0, where
  the WorkerThread queue runs them sequentially, or
- reject this shape at submit time with a clear invalid_argument if group
  members are required to occupy distinct endpoints.
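If the distinct-endpoint contract is chosen, a submit-time check needs to decide whether the members can be given pairwise distinct endpoints at all. One possible sketch uses a standard augmenting-path bipartite matching between members and endpoints. The names try_assign() and has_distinct_assignment() are hypothetical, and the sketch assumes every member has a non-empty eligible set of known endpoint ids and no explicit affinity. It is not what current main does.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using EligibleSets = std::vector<std::vector<int32_t>>;

// Tries to give `member` an endpoint. If a candidate endpoint is already
// owned, recursively tries to move its owner to another eligible endpoint.
bool try_assign(std::size_t member, const EligibleSets &eligible,
                std::unordered_map<int32_t, std::size_t> &owner,
                std::unordered_set<int32_t> &visited) {
    for (int32_t ep : eligible[member]) {
        if (!visited.insert(ep).second) continue;  // already tried on this path
        auto it = owner.find(ep);
        bool is_free = (it == owner.end());
        std::size_t current_owner = is_free ? 0 : it->second;
        if (is_free || try_assign(current_owner, eligible, owner, visited)) {
            owner[ep] = member;
            return true;
        }
    }
    return false;
}

// Returns true if every member can be assigned a distinct endpoint taken
// from its own eligible set.
bool has_distinct_assignment(const EligibleSets &eligible) {
    std::unordered_map<int32_t, std::size_t> owner;  // endpoint id -> member index
    for (std::size_t m = 0; m < eligible.size(); ++m) {
        std::unordered_set<int32_t> visited;
        if (!try_assign(m, eligible, owner, visited)) return false;
    }
    return true;
}
```

Under these assumptions, {{0}, {0}} has no distinct assignment and would be rejected, while {{0, 1}, {0}} is accepted because member 0 can move to endpoint 1. If the endpoint-reuse contract is chosen instead, no such check is needed and the fix belongs in worker selection.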
Actual Behavior
The submit can succeed, but scheduler dispatch cannot complete worker
selection. The whole group slot is pushed back to the ready queue and retried.
Since no member is dispatched, the slot can remain undrained.
Git Commit ID
26b7b15
CANN Version
N/A - scheduler logic issue, not hardware-specific
Driver Version
N/A - scheduler logic issue, not hardware-specific
Host Platform
Linux (aarch64)
Additional Context
This was found while reviewing PR #1011's remote L3 worker-id cleanup. PR #1011 should only reject unknown eligible endpoint/worker ids at submit time. It should not force a distinct-endpoint contract for {{0}, {0}}, because endpoint reuse may be a valid scheduler behavior. The broader scheduler contract issue should be tracked separately here.
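The narrower check described for PR #1011 could look roughly like the following. This is a sketch of the intended scope only, not the code in that PR: validate_known_endpoint_ids() and the registered set are hypothetical names. It rejects ids that are not registered and deliberately says nothing about the same id appearing for several members.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: throws if any eligible id is not a registered endpoint.
// Duplicate ids across members, such as {{0}, {0}}, are deliberately accepted.
void validate_known_endpoint_ids(
    const std::vector<std::vector<int32_t>> &eligible_endpoint_ids,
    const std::unordered_set<int32_t> &registered) {
    for (std::size_t i = 0; i < eligible_endpoint_ids.size(); ++i) {
        for (int32_t id : eligible_endpoint_ids[i]) {
            if (registered.count(id) == 0) {
                throw std::invalid_argument(
                    "unknown endpoint id " + std::to_string(id) +
                    " for member " + std::to_string(i));
            }
        }
    }
}
```

With endpoint 0 registered, {{0}, {0}} passes this check and {{0}, {7}} throws, which leaves the reuse-versus-distinct decision to the scheduler contract tracked in this issue.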