[COST-7490] Add midnight hold mechanism for PR checks #122
Code Review
This pull request expands the CI management infrastructure to support multiple scheduled jobs, including a new 'onprem' job, and updates the associated scripts, Makefile, and documentation. It also introduces a midnight-hold task to the pipeline, which estimates test duration based on PR labels and delays execution if it would overlap with the UTC midnight boundary to prevent test failures. Feedback highlights a race condition in the time calculation within the midnight-hold task and recommends a more reliable approach for parsing JSON from the GitHub API.
bacciotti left a comment
Race condition fix (high priority): Fixed in 5db0905 — replaced the three separate date calls with a single epoch-based calculation ($(date -u +%s) % 86400) as suggested.
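To illustrate the race: with three separate calls, the hour can be read at 23:59:59 and the minutes a moment later at 00:00:00, giving 23:00:00, an hour off. The sketch below (POSIX sh) derives the time of day from one clock read instead. The function names are hypothetical and not taken from the task itself.

```shell
#!/bin/sh
# Sketch (POSIX sh): take the UTC time of day from a single clock read.
# Function names are hypothetical, not taken from the midnight-hold task.

# seconds_since_utc_midnight [EPOCH]
# Uses EPOCH when given (useful for testing); otherwise reads the clock once.
# POSIX time counts every day as exactly 86400 s, so the remainder is the
# number of seconds elapsed since 00:00:00 UTC.
seconds_since_utc_midnight() {
    now=${1:-$(date -u +%s)}
    echo $(( now % 86400 ))
}

# format_hms SECONDS
# Splits a seconds-of-day value into HH:MM:SS for logging, using integer
# arithmetic only, so no second clock read can disagree with the first.
format_hms() {
    s=$1
    printf '%02d:%02d:%02d\n' $(( s / 3600 )) $(( s % 3600 / 60 )) $(( s % 60 ))
}
```

For example, `format_hms "$(seconds_since_utc_midnight 1700000000)"` prints 22:13:20, since 1700000000 % 86400 = 80000.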
JSON parsing (medium priority): Acknowledged. The current grep + cut approach is intentionally consistent with how init-pipeline-context.yaml already parses PR labels in this same repository. PR labels in project-koku/koku are controlled strings without special characters, so the fragility risk is low. Switching to jq would require changing the base image (curlimages/curl → alpine + apk add jq), which adds complexity not justified for an interim defensive mechanism. Can revisit if the label parsing ever proves fragile in practice.
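A minimal sketch of the grep + cut style of label extraction, assuming the input is the JSON returned by GitHub's GET /repos/{owner}/{repo}/issues/{issue_number}/labels endpoint, where every "name" key belongs to a label object. This is an illustration of the technique, not the code in init-pipeline-context.yaml, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: extract PR label names from GitHub label JSON with grep + cut.
# ASSUMPTION: stdin is the response of the issue-labels endpoint, so every
# "name" key is a label name. Function names are hypothetical.

# extract_label_names: read label JSON on stdin, print one name per line.
extract_label_names() {
    # -o prints only the matched "name": "value" fragments, one per line.
    # Splitting on double quotes makes field 4 the value itself.
    grep -o '"name": *"[^"]*"' | cut -d'"' -f4
}

# has_label NAME: succeed if NAME is among the labels read on stdin.
has_label() {
    # -F: fixed string, -x: whole-line match, so "smoke-tests" does not
    # match "ocp-smoke-tests"; -q: exit status only.
    extract_label_names | grep -qxF "$1"
}
```

The approach breaks if a label name contains an escaped quote, or if the input comes from an endpoint whose objects carry other "name" keys (the pull request object includes repository objects, for instance). With jq available, `jq -r '.[].name'` would cover both cases, which is the trade-off discussed above.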
Implement a new Tekton task that delays the pipeline if the estimated IQE test duration would overlap the UTC midnight boundary, preventing date-sensitive test failures caused by calendar day rollovers. The task fetches PR labels via the GitHub API to estimate job duration using empirical data from historical check runs, and sleeps until 00:05 UTC when a midnight conflict is predicted.
Co-authored-by: Cursor <cursoragent@cursor.com>
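The decision this task makes can be sketched in POSIX sh as follows. The 00:05 UTC resume time and the skip rules (scheduled jobs, and the ok-to-skip-smokes / run-jenkins-tests labels listed under Summary) come from this PR. The function names, passing labels as a newline-separated string, and treating a run that ends exactly at midnight as overlapping are assumptions of the sketch. The duration estimate is taken as an input rather than derived from labels here.

```shell
#!/bin/sh
# Sketch of the midnight-hold decision. Function names and input formats
# are hypothetical; the 00:05 UTC resume time and the skip rules come
# from the PR description.

DAY=86400            # seconds in a UTC day (POSIX time has no leap seconds)
RESUME_OFFSET=300    # 00:05 UTC, as seconds after midnight

# should_skip IS_SCHEDULED LABELS
# Succeeds when the hold does not apply: scheduled jobs, or PRs carrying
# a non-IQE label. LABELS is one label name per line.
should_skip() {
    [ "$1" = "true" ] && return 0
    printf '%s\n' "$2" | grep -qxF -e ok-to-skip-smokes -e run-jenkins-tests
}

# hold_seconds NOW_EPOCH ESTIMATE_MINUTES
# Prints 0 if the run would finish before midnight, otherwise the number
# of seconds from now until 00:05 UTC on the next day.
hold_seconds() {
    sod=$(( $1 % DAY ))            # seconds since UTC midnight, one clock value
    finish=$(( sod + $2 * 60 ))    # projected finish on the same-day scale
    if [ "$finish" -lt "$DAY" ]; then
        echo 0
    else
        echo $(( DAY - sod + RESUME_OFFSET ))
    fi
}
```

For example, a run starting at 23:00 UTC (82800 s after midnight) with a 90-minute estimate would finish at 00:30, so `hold_seconds 82800 90` prints 3900: a 65-minute hold that resumes at 00:05. In a task step the result could be passed straight to `sleep`, since `sleep 0` returns immediately.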
…ace condition
Co-authored-by: Cursor <cursoragent@cursor.com>
Force-pushed from 5db0905 to 02cd7f8
…stimates
Co-authored-by: Cursor <cursoragent@cursor.com>
Jira Ticket
COST-7490
Summary
- New Tekton task midnight-hold that acts as a gate before reserve-namespace …
- … (IS_SCHEDULED_TEST_JOB=true) and non-IQE labels (ok-to-skip-smokes, run-jenkins-tests) are skipped unconditionally
Label → Estimated Duration Mapping
- hot-fix-smoke-tests → …
- aws-smoke-tests / azure-smoke-tests / gcp-smoke-tests → …
- ocp-on-prem-smoke-tests → …
- smoke-tests → …
- ocp-smoke-tests → …
- full-run-smoke-tests → IQE_CJI_TIMEOUT, capped at 240 min
Pipeline Change
midnight-hold is inserted between init-pipeline-context and reserve-namespace, ensuring the hold occurs before any ephemeral namespace is allocated.
Notes
This is an interim defensive measure (see COST-7402 for the long-term fix). If the test suite is refactored to be date-resilient, this task can be decommissioned.