fix: Improve e2e timeouts #2658

Merged
dianab-cl merged 2 commits into main from diana/fou-251-improve-e2e-pipeline-for-interchain-security
Jun 19, 2026

Conversation

@dianab-cl

Contributor

Please go to the Preview tab and select the appropriate sub-template:

  • Production code - for types fix, feat, and refactor.
  • Docs - for documentation changes.
  • Others - for changes that do not affect production code.

@linear-code

linear-code Bot commented Jun 19, 2026

FOU-251

@github-actions github-actions Bot added the C:Testing and C:CI labels (both assigned automatically by the PR labeler) Jun 19, 2026
@greptile-apps

greptile-apps Bot commented Jun 19, 2026

Greptile Summary

This PR improves e2e CI reliability by adding explicit timeouts to chain-start operations and parallelizing the compatibility-test matrix. Previously a consumer chain that never produced blocks would block indefinitely; now it fails within 5 minutes at the Go level, and the CI job itself is capped at 30 minutes per consumer version.

  • tests/e2e/actions.go and tests/e2e/v5/actions.go: StartChain (and AssignConsumerPubKey in v5) are refactored from a simple scan loop to a goroutine + buffered-channel + select/time.After pattern, killing the child process and calling log.Fatalf on timeout.
  • .github/workflows/nightly-e2e.yml: The single 200-minute sequential job is replaced by a fail-fast: false matrix over [latest, v5.2.0, v6.3.0] with a 30-minute per-job cap; a duplicate setup-go step is also removed, and compatibility-test is wired into nightly-test-fail's dependency list.
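
The goroutine + buffered-channel + select/time.After pattern described above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: it reads from a generic io.Reader instead of a child process's stdout pipe, and the names waitForLine, errTimeout and demo are hypothetical. In the PR the timeout branch additionally kills the child process and calls log.Fatalf, whereas this sketch simply returns an error.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
	"time"
)

// errTimeout is a hypothetical sentinel error used only in this sketch.
var errTimeout = errors.New("timed out waiting for sentinel line")

// waitForLine scans r line by line and returns nil once a line equal to
// sentinel is seen. It returns an error if r is exhausted first, or
// errTimeout if neither happens within timeout.
func waitForLine(r io.Reader, sentinel string, timeout time.Duration) error {
	// Capacity 1 lets the goroutine send its single result and exit even
	// if the select below has already taken the timeout branch.
	scanDone := make(chan error, 1)

	go func() {
		scanner := bufio.NewScanner(r)
		for scanner.Scan() {
			if scanner.Text() == sentinel {
				scanDone <- nil
				return
			}
		}
		if err := scanner.Err(); err != nil {
			scanDone <- err
			return
		}
		// Input ended without the sentinel: treat it as a failure,
		// not as slowness.
		scanDone <- fmt.Errorf("input closed before %q was seen", sentinel)
	}()

	select {
	case err := <-scanDone:
		return err
	case <-time.After(timeout):
		return errTimeout
	}
}

// demo exercises the three outcomes: success, early close, and timeout.
func demo() (started, closed, timedOut error) {
	started = waitForLine(strings.NewReader("booting\ndone\n"), "done", time.Second)
	closed = waitForLine(strings.NewReader("booting\n"), "done", time.Second)

	pr, pw := io.Pipe() // never written to, so the scanner blocks
	timedOut = waitForLine(pr, "done", 50*time.Millisecond)
	pw.Close() // unblock the scanner goroutine so it can exit
	return
}

func main() {
	started, closed, timedOut := demo()
	fmt.Println("started:", started)
	fmt.Println("closed:", closed)
	fmt.Println("timed out:", timedOut)
}
```

Returning an error keeps the sketch testable; a test harness that wants the PR's fail-fast behavior would call log.Fatalf at the call site instead.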

Confidence Score: 5/5

The changes are safe to merge; they tighten timeouts and parallelize CI without altering any production logic.

The goroutine + buffered-channel (size 1) + select pattern is correctly implemented in both actions files: the goroutine can always drain without blocking even if the timeout branch fires first, and log.Fatalf exits the process before any leak becomes a concern. The workflow matrix correctly sets fail-fast: false so a hanging version does not cancel siblings, and the reduced per-job timeout (30 min) is generous relative to the stated healthy startup time (under 1 minute).
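
To illustrate why the channel capacity matters, here is a small, self-contained sketch (the helper name senderFinishes is hypothetical and not part of the PR). It checks whether a goroutine that sends one value can return when nobody ever receives from the channel, which is the situation after the timeout branch has been taken.

```go
package main

import (
	"fmt"
	"time"
)

// senderFinishes reports whether a goroutine that sends a single value on a
// channel of the given capacity returns when no receiver ever reads from it.
func senderFinishes(capacity int) bool {
	ch := make(chan error, capacity)
	finished := make(chan struct{})

	go func() {
		ch <- nil // blocks forever when capacity is 0 and nobody receives
		close(finished)
	}()

	select {
	case <-finished:
		return true
	case <-time.After(100 * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println("unbuffered:", senderFinishes(0))  // prints "unbuffered: false"
	fmt.Println("buffered(1):", senderFinishes(1)) // prints "buffered(1): true"
}
```

With capacity 0 the sending goroutine stays blocked (and is deliberately leaked in this demo); with capacity 1 the send completes immediately and the goroutine exits.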

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| .github/workflows/nightly-e2e.yml | Parallelizes compatibility tests via matrix strategy (latest/v5.2.0/v6.3.0), reduces per-job timeout from 200 to 30 minutes, removes duplicate setup-go step, and adds compatibility-test to nightly-test-fail dependency list. |
| tests/e2e/actions.go | Adds startChainTimeout (5 min) and refactors StartChain to a goroutine+select pattern so a chain that never emits the done sentinel fails fast instead of blocking until the CI job timeout. |
| tests/e2e/v5/actions.go | Adds scanTimeout (5 min) and applies the same goroutine+select timeout pattern to both StartChain and AssignConsumerPubKey; buffered channel (size 1) is correctly sized to prevent the background goroutine from blocking after the timeout branch exits. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Test as Test Runner
    participant GO as Go goroutine
    participant Script as start-chain.sh
    participant Timer as time.After(5m)

    Test->>Script: cmd.Start() (StdoutPipe)
    Test->>GO: launch scanner goroutine
    Test->>Timer: arm 5-minute timer

    loop scan lines
        Script-->>GO: stdout line
        GO-->>GO: check for "done"
    end

    alt Chain starts successfully
        Script-->>GO: "done!!!!!!!!"
        GO->>Test: "scanDone <- nil"
        Test->>Test: continue test
    else Script exits without done
        Script-->>GO: pipe closed
        GO->>Test: "scanDone <- error"
        Test->>Test: log.Fatal(err)
    else Timeout fires (5 min)
        Timer->>Test: timeout signal
        Test->>Script: cmd.Process.Kill()
        Test->>Test: log.Fatalf(timed out)
    end

Reviews (1): Last reviewed commit: "Improve e2e timeouts" | Re-trigger Greptile

Comment thread tests/e2e/actions.go
Comment on lines +213 to +231
+	go func() {
+		for scanner.Scan() {
+			out := scanner.Text()
+			if verbose {
+				fmt.Println("startChain: " + out)
+			}
+			if out == done {
+				scanDone <- nil
+				return
+			}
 		}
-		if out == done {
-			break
+		if err := scanner.Err(); err != nil {
+			scanDone <- err
+			return
 		}
-	}
-	if err := scanner.Err(); err != nil {
-		log.Fatal(err)
+		// The script exited (stdout closed) before signaling done, which means
+		// the chain failed to start rather than just being slow.
+		scanDone <- fmt.Errorf("chain %s start script exited before signaling done", action.Chain)
+	}()
Comment thread tests/e2e/v5/actions.go
Comment on lines +145 to +161
+	go func() {
+		for scanner.Scan() {
+			out := scanner.Text()
+			if verbose {
+				fmt.Println("startChain: " + out)
+			}
+			if out == done {
+				scanDone <- nil
+				return
+			}
 		}
-		if out == done {
-			break
+		if err := scanner.Err(); err != nil {
+			scanDone <- err
+			return
 		}
-	}
-	if err := scanner.Err(); err != nil {
-		log.Fatal(err)
+		scanDone <- fmt.Errorf("chain %s start script exited before signaling done", action.Chain)
+	}()
Comment thread tests/e2e/v5/actions.go
Comment on lines +845 to +861
+	go func() {
+		for scanner.Scan() {
+			out := scanner.Text()
+			if verbose {
+				fmt.Println("assign key - reconfigure: " + out)
+			}
+			if out == done {
+				scanDone <- nil
+				return
+			}
 		}
-		if out == done {
-			break
+		if err := scanner.Err(); err != nil {
+			scanDone <- err
+			return
 		}
-	}
-	if err := scanner.Err(); err != nil {
-		log.Fatal(err)
+		scanDone <- fmt.Errorf("reconfigure node for %s exited before signaling done", action.Chain)
+	}()
@dianab-cl dianab-cl requested a review from Eric-Warehime June 19, 2026 14:05
@dianab-cl dianab-cl changed the title Improve e2e timeouts fix: Improve e2e timeouts Jun 19, 2026
Comment thread .github/workflows/nightly-e2e.yml Outdated
    matrix:
      # Consumer versions tested against the latest provider.
      # For new versions to be tested add/remove entries here.
      consumer-version: [latest, v5.2.0, v6.3.0]

Contributor

We can probably just test latest. Stride is really the only one we care about right now, and they're on v7. That will likely reduce test time even further.

@mergify

mergify Bot commented Jun 19, 2026

Contributor

Queued — the merge queue status continues in this comment ↓.

Comment thread .github/workflows/nightly-e2e.yml Outdated
@mergify

mergify Bot commented Jun 19, 2026

Contributor

Merge Queue Status

This pull request spent 20 seconds in the queue, including 4 seconds running CI.

Waiting for
  • #approved-reviews-by>1
  • any of: [🛡 GitHub branch protection]
    • check-neutral = test-e2e
    • check-skipped = test-e2e
    • check-success = test-e2e
  • any of: [🛡 GitHub branch protection]
    • check-neutral = test-integration
    • check-skipped = test-integration
    • check-success = test-integration
  • any of: [🛡 GitHub branch protection]
    • check-neutral = tests
    • check-skipped = tests
    • check-success = tests
  • any of: [🛡 GitHub branch protection]
    • check-neutral = golangci-lint
    • check-skipped = golangci-lint
    • check-success = golangci-lint
All conditions
  • #approved-reviews-by>1
  • any of [🛡 GitHub branch protection]:
    • check-neutral = test-e2e
    • check-skipped = test-e2e
    • check-success = test-e2e
  • any of [🛡 GitHub branch protection]:
    • check-neutral = test-integration
    • check-skipped = test-integration
    • check-success = test-integration
  • any of [🛡 GitHub branch protection]:
    • check-neutral = tests
    • check-skipped = tests
    • check-success = tests
  • any of [🛡 GitHub branch protection]:
    • check-neutral = golangci-lint
    • check-skipped = golangci-lint
    • check-success = golangci-lint
  • #approved-reviews-by >= 1 [🛡 GitHub branch protection]

Reason

Pull request #2658 has been dequeued

Merge conditions no longer match. Blocked by:

  • #approved-reviews-by>1
  • any of [🛡 GitHub branch protection]:
    • check-neutral = test-e2e
    • check-skipped = test-e2e
    • check-success = test-e2e
  • any of [🛡 GitHub branch protection]:
    • check-neutral = test-integration
    • check-skipped = test-integration
    • check-success = test-integration
  • any of [🛡 GitHub branch protection]:
    • check-neutral = tests
    • check-skipped = tests
    • check-success = tests
  • any of [🛡 GitHub branch protection]:
    • check-neutral = golangci-lint
    • check-skipped = golangci-lint
    • check-success = golangci-lint

Hint

You should look at the reason for the failure and decide if the pull request needs to be fixed or if you want to requeue it.
If you do update this pull request, it will automatically be requeued once the queue conditions match again.
If you think this was a flaky issue instead, you can requeue the pull request, without updating it, by posting a @mergifyio queue comment.

@dianab-cl dianab-cl added this pull request to the merge queue Jun 19, 2026
@mergify mergify Bot added the queued label Jun 19, 2026
mergify Bot added a commit that referenced this pull request Jun 19, 2026
@mergify mergify Bot added dequeued and removed queued labels Jun 19, 2026
Merged via the queue into main with commit f6165bc Jun 19, 2026
38 of 40 checks passed
@dianab-cl dianab-cl deleted the diana/fou-251-improve-e2e-pipeline-for-interchain-security branch June 19, 2026 14:34
Labels

C:CI (assigned automatically by the PR labeler), C:Testing (assigned automatically by the PR labeler), dequeued

3 participants