E2E: back-migration test flakes on `Connection refused` after restart #3366

### Background

`e2e-tests::e2e migration_service::migration_service__should_handle_back_migration_a_to_b_to_a` (added in #3346) has been flaking on CI. There are three confirmed failures, all with the same signature (`Connection refused` on `http://127.0.0.1:21725/debug/migrations` at `migration_service.rs:533:29`):

- PR #3346 CI run [26451546911](https://github.com/near/mpc/actions/runs/26451546911/job/77873914394) attempt 1
- PR #3362 CI [job 77946818099](https://github.com/near/mpc/actions/runs/26471648821/job/77946818099)
- PR #3362 CI [job 77952041274](https://github.com/near/mpc/actions/runs/26473240463/job/77952041274)

Root cause: `cluster.wait_for_node_healthy()` only checks that `/health` returns `200 OK`. `/health` is `|| async { "OK" }` (`crates/node/src/web.rs`), and it is bound very early in node startup, before the indexer initializes. After the kill+restart in the back-migration test, `wait_for_node_healthy()` can return in **as little as 16 ms** (the time to spawn the binary and bind a socket) while the indexer is still warming up.

The test then proceeds straight into the back-migration's polling loop on `/debug/migrations`. If A0's process exits during catch-up (likely panicking on the contract state after the forward migration's identity swap), nothing is listening on the port anymore, so every poll fails with `Connection refused` until the full 30 s `INDEXER_SYNC_TIMEOUT` expires.
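To illustrate the gap, here is a minimal sketch of a readiness check that waits for evidence that the indexer is actually advancing rather than for the socket to be bound. It is not the helper from #3365: `wait_for_indexer_progress` and the `fetch_block_height` closure are hypothetical, and the closure stands in for however the test reads the node's block-height metric, returning `None` when the metric cannot be read (for example on `Connection refused`). It is written synchronously with `std` only for brevity; the real e2e harness may well be async.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical readiness helper (sketch, not the repo's API).
/// Succeeds only once the indexer's block height has been observed to
/// increase at least once, instead of returning as soon as `/health` is bound.
/// `fetch_block_height` returns `None` when the metric cannot be read,
/// e.g. because the node process has exited and the connection is refused.
fn wait_for_indexer_progress<F>(
    mut fetch_block_height: F,
    timeout: Duration,
    poll_interval: Duration,
) -> Result<u64, String>
where
    F: FnMut() -> Option<u64>,
{
    let deadline = Instant::now() + timeout;
    // First block height we managed to read; progress is measured against it.
    let mut first_seen: Option<u64> = None;
    loop {
        let reading = fetch_block_height();
        match (reading, first_seen) {
            // First successful read: remember the starting height.
            (Some(height), None) => first_seen = Some(height),
            // The indexer has moved past the starting height: it is making progress.
            (Some(height), Some(start)) if height > start => return Ok(height),
            // Unreadable metric or no progress yet: keep polling.
            _ => {}
        }
        if Instant::now() >= deadline {
            // Distinguish "node unreachable" from "node up but indexer stuck"
            // so a crashed node produces a named error, not a silent timeout.
            return Err(match reading {
                None => "indexer block-height metric not available; node may have exited"
                    .to_string(),
                Some(height) => format!(
                    "indexer did not advance past block height {height} within {timeout:?}"
                ),
            });
        }
        sleep(poll_interval);
    }
}
```

Because an unreadable metric is reported separately from a height that never moves, a node that dies during catch-up fails with an explicit message rather than an opaque polling timeout.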

### User Story

As a developer, I want `migration_service__should_handle_back_migration_a_to_b_to_a` to pass deterministically so flaky CI doesn't block unrelated PRs.

### Acceptance Criteria

- [ ] The test passes on at least 15 consecutive CI runs on the same branch.
- [ ] The fix targets the readiness gap (the test must not proceed until the indexer is making progress) rather than applying a band-aid to the polling timeout.

### Resources & Additional Notes

- Fix PR: #3365 (validated with 15/15 successful runs and per-run wait metrics).
- If the secondary theory (A0 crashing during catch-up) is ever borne out after this fix, the new helper will fail with *"indexer block-height metric not available — node may have exited"* instead of silently timing out on `Connection refused` after 30 s. A future change that captures each MPC node's `stderr.log` into the test output on failure would close the diagnostic loop for that case.
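The `stderr.log` capture suggested above could look roughly like the following sketch. It assumes the test harness knows each node's log path; `collect_stderr_tails`, the `(name, path)` input shape, and the tail length are hypothetical and are not existing APIs in the repo. Only `std` is used.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical helper (sketch): gather the last `max_lines` lines of each
/// node's `stderr.log` into one report that a failing test can print.
fn collect_stderr_tails(logs: &[(&str, &Path)], max_lines: usize) -> String {
    let mut report = String::new();
    for (node, path) in logs {
        // Header so each node's output is easy to find in CI logs.
        report.push_str(&format!("===== {node}: {} =====\n", path.display()));
        match fs::read_to_string(path) {
            Ok(contents) => {
                let lines: Vec<&str> = contents.lines().collect();
                // Keep only the tail, where a panic message would usually be.
                let start = lines.len().saturating_sub(max_lines);
                for line in &lines[start..] {
                    report.push_str(line);
                    report.push('\n');
                }
            }
            // A missing or unreadable log is reported rather than aborting the dump.
            Err(err) => report.push_str(&format!("<could not read log: {err}>\n")),
        }
    }
    report
}
```

Printing only the tail keeps CI output short while still surfacing a panic message if A0 did crash.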

