
E2E: back-migration test flakes on Connection refused after restart #3366

@barakeinav1

Description

Background

e2e-tests::e2e migration_service::migration_service__should_handle_back_migration_a_to_b_to_a (added in #3346) has been flaking on CI. There are three confirmed failures, all with the same signature: Connection refused on http://127.0.0.1:21725/debug/migrations at migration_service.rs:533:29.

Root cause: cluster.wait_for_node_healthy() only checks that /health returns 200 OK. The /health handler is just || async { "OK" } (crates/node/src/web.rs), and it is bound very early in node startup, before the indexer initializes. After the kill+restart in the back-migration test, the function can return in as little as 16 ms (the time needed to spawn the binary and bind the socket) while the indexer is still warming up. The test then enters the back-migration polling loop on /debug/migrations. If A0's process exits during catch-up (likely by panicking on the contract state left after the forward migration's identity swap), the polling loop gets Connection refused until the full 30 s INDEXER_SYNC_TIMEOUT expires.
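A minimal sketch of a progress-based readiness check, assuming the node exposes an indexer block-height gauge in Prometheus text format. The metric name, both helper names, the error strings, and the timeout/poll values are hypothetical and may not match the helper added in #3365. Fetching the metrics body is left to a caller-supplied closure so the sketch has no HTTP dependency; the closure returns None on any request error, such as Connection refused.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Extracts an unlabeled metric value from Prometheus text-format output.
/// Assumes a plain "name value" line; labeled series are not matched.
fn parse_metric(body: &str, name: &str) -> Option<u64> {
    body.lines()
        .filter(|line| !line.starts_with('#')) // skip # HELP / # TYPE lines
        .find_map(|line| -> Option<u64> {
            let mut parts = line.split_whitespace();
            if parts.next()? == name {
                // Prometheus values are floats; truncate to a block height.
                parts.next()?.parse::<f64>().ok().map(|v| v as u64)
            } else {
                None
            }
        })
}

/// Hypothetical helper: blocks until the block height returned by `read_height`
/// strictly exceeds the first height it observed, i.e. the indexer is making
/// progress, not merely serving HTTP. Fails after `timeout`.
fn wait_for_indexer_progress<F>(
    mut read_height: F,
    timeout: Duration,
    poll_interval: Duration,
) -> Result<u64, String>
where
    F: FnMut() -> Option<u64>,
{
    let deadline = Instant::now() + timeout;
    let mut baseline: Option<u64> = None;
    loop {
        let reading = read_height();
        match (reading, baseline) {
            // The first successful reading becomes the baseline.
            (Some(h), None) => baseline = Some(h),
            // Any strictly higher reading proves the indexer is advancing.
            (Some(h), Some(b)) if h > b => return Ok(h),
            _ => {}
        }
        if Instant::now() >= deadline {
            // Distinguish "node unreachable" from "node up but indexer stalled".
            return Err(match reading {
                None => "indexer block-height metric not available (node may have exited)".to_string(),
                Some(h) => format!("indexer made no progress before timeout (block height {h})"),
            });
        }
        sleep(poll_interval);
    }
}
```

In the back-migration test, a check like this would run after cluster.wait_for_node_healthy() returns following the restart. Waiting for the height to increase, rather than for a first reading to appear, is what separates "indexer making progress" from "HTTP server bound".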

User Story

As a developer, I want migration_service__should_handle_back_migration_a_to_b_to_a to pass deterministically so flaky CI doesn't block unrelated PRs.

Acceptance Criteria

  • The test passes on at least 15 consecutive CI runs on the same branch.
  • The fix targets the readiness gap (don't proceed before the indexer is making progress) rather than applying a band-aid to the polling timeout.

Resources & Additional Notes

  • Fix PR: test(e2e): wait for indexer progress in back-migration restart (fix flaky test) #3365 (validated with 15/15 successful runs and per-run wait metrics).
  • If the secondary "A0 crashes during catch-up" theory ever materializes after this fix, the new helper's error message will say "indexer block-height metric not available — node may have exited" (instead of the silent 30 s Connection refused timeout). A future change to capture each MPC node's stderr.log into the test output on failure would close the diagnostic loop for that case.
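One possible shape for that future stderr.log capture, sketched under assumptions: the harness knows each MPC node's stderr.log path (supplied by the caller here; the real locations are not specified), and DumpLogsOnPanic and collect_log_tails are hypothetical names. The guard's Drop impl checks std::thread::panicking(), so log tails are printed only when the test fails. The standard test harness captures eprintln! output and shows it for failing tests, so the tails would appear next to the assertion message.

```rust
use std::fs;
use std::path::PathBuf;

/// Returns the last `max_lines` lines of each log, each under a header naming the file.
/// Missing or unreadable files are reported inline instead of aborting.
fn collect_log_tails(paths: &[PathBuf], max_lines: usize) -> String {
    let mut out = String::new();
    for path in paths {
        out.push_str(&format!("===== {} =====\n", path.display()));
        match fs::read_to_string(path) {
            Ok(contents) => {
                let lines: Vec<&str> = contents.lines().collect();
                let start = lines.len().saturating_sub(max_lines);
                for line in &lines[start..] {
                    out.push_str(line);
                    out.push('\n');
                }
            }
            Err(e) => out.push_str(&format!("<could not read log: {e}>\n")),
        }
    }
    out
}

/// Hypothetical guard: hold one for the duration of a test. If the test panics,
/// Drop runs during unwinding and prints the node log tails to stderr.
struct DumpLogsOnPanic {
    paths: Vec<PathBuf>,
    max_lines: usize,
}

impl Drop for DumpLogsOnPanic {
    fn drop(&mut self) {
        if std::thread::panicking() {
            eprintln!("{}", collect_log_tails(&self.paths, self.max_lines));
        }
    }
}
```

The guard must be bound to a named variable such as _logs, because let _ = DumpLogsOnPanic { .. }; drops it immediately. Limiting output to the tail keeps a crash-looping node from flooding the CI log.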
