Skip to content

test(e2e): add no-restart back-migration variant + refactor shared scaffolding#3388

Draft
barakeinav1 wants to merge 1 commit into
mainfrom
test/back-migration-no-kill
Draft

test(e2e): add no-restart back-migration variant + refactor shared scaffolding#3388
barakeinav1 wants to merge 1 commit into
mainfrom
test/back-migration-no-kill

Conversation

@barakeinav1
Copy link
Copy Markdown
Contributor

Summary

Adds a second back-migration test that does not kill+restart A0 between the forward and back directions, and refactors the shared scaffolding so the two tests share their setup/teardown.

Why a no-restart variant

migration_service__should_handle_back_migration_a_to_b_to_a (existing) kill+restart's A0 to model the disk-persisted-state scenario investigated in #2121. That kill+restart is also the trigger for the upstream nearcore indexer panic surfaced via #3366 — see the diagnostic report for the chain/indexer/src/streamer/mod.rs:207 expect("receipt must be present at this moment") panic. On branches that produce more sandbox chain activity than plain main, A0's post-restart indexer catch-up hits that panic and the test flakes.

Until the upstream nearcore fix lands, this PR adds migration_service__should_handle_back_migration_a_to_b_to_a_no_restart — a stable sibling that exercises the back-migration code path through A0's in-memory state transitions, with no kill+restart and no restart-time catch-up.

What each test covers

Test Coverage Triggers nearcore panic?
…_a_to_b_to_a (existing) back-migration code path + #2121 disk-persisted-state scenario yes, on branches with high chain activity
…_a_to_b_to_a_no_restart (new) back-migration code path (in-memory state transitions only) no

Refactor

The two tests previously had ~85% identical bodies (cluster setup, sanity asserts, forward A→B, post-forward sign+ckd, back B→A, kill B0, final sign+ckd). I factored that into run_back_migration_round_trip(port_seed, between) with a small BetweenDirections enum (None vs KillAndRestartA). Each #[tokio::test] is now a one-line wrapper that picks the port seed and variant.

The kill+restart-specific block (keyshare snapshots, indexer-height snapshot, kill/start/health/indexer-wait, keyshare-preserved assert) lives inside the KillAndRestartA match arm.

Test plan

  • cargo check --tests -p e2e-tests clean
  • Both tests pass in CI (the new no-restart should be stable; the existing one's flake behavior is unchanged — still tracked under E2E: back-migration test flakes on Connection refused after restart #3366)
  • If the no-restart variant goes flaky too, we've learned something important about whether the issue is fully kill-specific

…affolding

The existing `migration_service__should_handle_back_migration_a_to_b_to_a`
kill+restart's A0 between the forward and back directions to model the
disk-persisted-state scenario from #2121. That kill+restart is also the
trigger for the upstream nearcore indexer panic surfaced via #3366,
which flakes the test on branches that produce additional sandbox chain
activity.

This commit:
- Adds `migration_service__should_handle_back_migration_a_to_b_to_a_no_restart`,
  a stable sibling that exercises the back-migration code path through
  A0's in-memory state transitions (no kill+restart, no restart-time
  indexer catch-up).
- Factors the shared round-trip scaffolding (cluster setup, sanity asserts,
  forward, sign+ckd, back, kill B0, final asserts) into
  `run_back_migration_round_trip(port_seed, between)`.
- Splits the kill+restart-specific block (keyshare snapshots, indexer-height
  snapshot, kill/start/health/indexer-wait, keyshare-preserved assert) into
  the `BetweenDirections::KillAndRestartA` arm. The new no-restart test
  uses `BetweenDirections::None`.

The two #[tokio::test] entry points are now one-line wrappers that
differ only in port seed and the between-directions variant.

The kill+restart test stays the #2121 regression coverage; the no-restart
test gives CI a non-flaky back-migration check while the upstream
indexer panic is open.
Copilot AI review requested due to automatic review settings May 28, 2026 13:26
@claude
Copy link
Copy Markdown

claude Bot commented May 28, 2026

Pull request overview

Test-only change in crates/e2e-tests. Extracts the ~85% shared body of migration_service__should_handle_back_migration_a_to_b_to_a into a run_back_migration_round_trip(port_seed, between) helper parameterised by a BetweenDirections enum, then adds a new …_no_restart sibling that skips the kill+restart of A0. The new variant exists so that the back-migration code path keeps a stable green test while the kill+restart variant continues to flake against the upstream nearcore indexer panic from #3366. A new MIGRATION_BACK_NO_RESTART_PORT_SEED = 22 is added so the two tests don't share a sandbox port range.

Changes:

  • Adds MIGRATION_BACK_NO_RESTART_PORT_SEED = 22 in tests/common.rs.
  • Introduces BetweenDirections { None, KillAndRestartA } and run_back_migration_round_trip(port_seed, between) in tests/migration_service.rs.
  • Existing …_a_to_b_to_a now wraps the helper with KillAndRestartA; the kill+restart-specific block (keyshare snapshot, indexer-height snapshot, kill/start/health/indexer-wait, keyshare-preserved assert) lives only inside that match arm.
  • New …_a_to_b_to_a_no_restart test wraps the helper with BetweenDirections::None.

Reviewed changes

Per-file summary
File Description
crates/e2e-tests/tests/common.rs Adds MIGRATION_BACK_NO_RESTART_PORT_SEED = 22
crates/e2e-tests/tests/migration_service.rs Extracts run_back_migration_round_trip helper + BetweenDirections enum; existing back-migration test now a 1-line wrapper; adds new no-restart sibling test

Findings

No blocking issues. The refactor is behaviour-preserving for the existing …_a_to_b_to_a test (all kill+restart steps stay in the KillAndRestartA arm and run in the same order). The new test follows the project's <system_under_test>__should_<assertion> naming and reuses the existing helpers for sign/ckd. Port seed 22 is unique and sequential with the surrounding constants.

Non-blocking (nits, follow-ups):

  • crates/e2e-tests/tests/migration_service.rs:660run_back_migration_round_trip is roughly 130 lines, mostly because the KillAndRestartA arm inlines 6 distinct steps (snapshot, assert, kill, start, wait healthy, wait indexer, assert preserved). If a future variant needs to reuse parts of that block, consider lifting it into a small kill_and_restart_a_preserving_keyshares(&mut cluster, a_idx).await helper. Not worth doing for two callers today.
  • crates/e2e-tests/tests/migration_service.rs:631BetweenDirections::None is fine but a tad opaque at the call site (BetweenDirections::None reads like "no variant"). A name like KeepRunning would make …_no_restart's wrapper self-documenting. Cosmetic.
  • PR description's test plan still has unchecked boxes for "Both tests pass in CI" — worth confirming the new no-restart variant has actually been run end-to-end through cargo make e2e-tests (not just cargo check --tests) before merge, since check-only validation won't catch a port collision or runtime setup issue.

✅ Approved

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a stable no-restart back-migration E2E variant while keeping the existing kill+restart coverage, and refactors shared A→B→A migration setup into a common helper.

Changes:

  • Introduces BetweenDirections and run_back_migration_round_trip(...) to share common test scaffolding.
  • Adds migration_service__should_handle_back_migration_a_to_b_to_a_no_restart.
  • Adds a dedicated port seed for the new E2E test.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
crates/e2e-tests/tests/migration_service.rs Refactors back-migration test flow and adds the no-restart variant.
crates/e2e-tests/tests/common.rs Adds a unique port seed for the new back-migration no-restart test.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants