Add missing timeout configurations to CI workflows #2396

@mr-raj12

The problem

The CI workflows (.github/workflows/test.yml and .github/workflows/build-macos.yml) are missing timeout configurations on several jobs and steps. Without explicit timeouts, a hung job keeps running until GitHub Actions' default limit of 360 minutes, leading to:

  • Wasted CI runner minutes (and costs)
  • Delayed failure feedback for developers
  • Blocked CI queues preventing other jobs from running

Currently:

  • lint job (test.yml:20): No timeout configured (defaults to 360 minutes/6 hours)
  • prepare-matrix job (test.yml:34): No timeout configured (defaults to 360 minutes/6 hours)
  • tmate debug steps (test.yml:82-84, 130-132): No step-level timeout configured (if debug mode is enabled, an open session holds the runner until the enclosing job times out)

While the project recently addressed major CI hang issues through comprehensive test mocking (PR #2353), the workflow timeout configurations themselves still have gaps that could allow jobs to run far longer than necessary.

Requested Solution

Add appropriate timeout-minutes configurations to all jobs and critical steps in CI workflows:

  1. Add timeout-minutes: 15 to lint job (test.yml:20)

    • Linting with ruff typically completes in <2 minutes
    • 15 minutes provides buffer for slow runners or network issues
  2. Add timeout-minutes: 5 to prepare-matrix job (test.yml:34)

    • Matrix generation is a simple bash script (runs in seconds)
    • 5 minutes is more than sufficient
  3. Add timeout-minutes: 60 to tmate debug steps (test.yml:82-84, 130-132)

    • Prevents a forgotten debug session from holding a runner for the rest of the job's time limit
    • 60 minutes is reasonable for interactive debugging
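
For illustration, the three changes above might look roughly like the following fragment of test.yml. This is only a sketch: the timeout-minutes values are the ones proposed in this issue, but the runner labels, step contents, the `debug_enabled` input, and the use of `mxschmitt/action-tmate` are assumptions, since the actual workflow file is not quoted here.

```yaml
# Fragment of .github/workflows/test.yml (sketch, not the real file).
# Only the timeout-minutes values come from this issue; everything else
# (runner, step bodies, debug input name, tmate action) is assumed.
jobs:
  lint:
    runs-on: ubuntu-latest
    timeout-minutes: 15        # job-level limit; ruff normally finishes in <2 minutes
    steps:
      - uses: actions/checkout@v4
      - run: ruff check .      # assumed lint command

  prepare-matrix:
    runs-on: ubuntu-latest
    timeout-minutes: 5         # matrix generation runs in seconds
    steps:
      - run: echo "matrix generation script goes here"   # placeholder

  test-unit:
    runs-on: ubuntu-latest
    timeout-minutes: 120       # already configured (test.yml:59)
    steps:
      - name: Setup tmate session
        if: ${{ inputs.debug_enabled }}   # hypothetical debug switch
        uses: mxschmitt/action-tmate@v3   # assumed tmate action
        timeout-minutes: 60    # step-level limit for interactive debugging
```

Note that `timeout-minutes` is valid at both the job and the step level, and a step can never outlive its job: a step-level limit only has an effect when it is shorter than the time remaining in the enclosing job's budget.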

Alternatives

  • Keep current timeouts and rely solely on the test mocking fixes from PR #2353 (Fix test hangs on Linux/macOS CI by mocking system dependencies)

    • Downside: No protection against workflow-level hangs unrelated to test execution
  • Use shorter timeouts (e.g., 10 minutes for lint, 3 minutes for prepare-matrix)

    • Downside: May be too aggressive and kill legitimate slow jobs

Additional context

Affected files:

  • .github/workflows/test.yml (lines 20, 34, 82-84, 130-132)
  • .github/workflows/build-macos.yml (optional: line 91 for artifact upload)

Current timeout status:

  • test-unit job: 120 minutes (test.yml:59) - already configured
  • test-integration job: 20 minutes (test.yml:112) - already configured
  • build job: 120 minutes (build-macos.yml:25) - already configured
  • lint, prepare-matrix, and tmate steps - missing timeouts
