Summary
Implement the Helix Reporter Job (HRJ) — a lightweight, long-running Azure DevOps build job that runs within each pipeline stage that submits Helix work items. Each HRJ instance polls Helix for completed work items from its stage, downloads and parses test result XMLs, and uploads them to Azure DevOps. This is the core component that decouples build agents from Helix test execution.
Motivation
Today, build agents remain allocated while waiting for Helix work items to complete. The HRJ eliminates this by taking over the responsibility of monitoring Helix jobs and reporting test results, allowing build jobs to exit immediately after submitting work items.
Requirements
Deployment Model
The HRJ is deployed as a job within each stage that submits Helix work items — one HRJ per stage, not a single HRJ for the entire pipeline. Each HRJ instance monitors only the Helix jobs submitted by other jobs in its stage.
Job Lifecycle
Each HRJ starts in parallel with the other jobs in its stage and operates in three phases:
Phase 1: Incremental Monitoring
While other jobs in the stage are still running:
- Poll the Azure DevOps REST API to check the status of all non-HRJ jobs in the stage.
- Poll the Helix API (Issue 1 endpoint) for completed Helix jobs associated with this build/stage.
- For each newly completed Helix job:
  - Download test result XML files from Helix storage.
  - Parse results using the existing test result processing logic (from Helix Machines repo).
  - Upload parsed results to Azure DevOps via the test results REST API.
- Log progress to console (jobs completed, jobs remaining, links to failures).
- Sleep and repeat.
Phase 2: Final Collection
Once all non-HRJ jobs in the stage have finished:
- Continue polling Helix until all associated jobs are complete.
- Process any remaining test results.
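Phases 1 and 2 can be sketched as a single polling loop. This is a minimal illustration in Python, not the proposed implementation: the four callables (`stage_jobs_running`, `completed_helix_jobs`, `all_helix_jobs_done`, `process_job`) are hypothetical stand-ins for the Azure DevOps and Helix API calls and for the download/parse/upload step.

```python
import time
from typing import Callable, Iterable, Set


def monitor_stage(
    stage_jobs_running: Callable[[], bool],
    completed_helix_jobs: Callable[[], Iterable[str]],
    all_helix_jobs_done: Callable[[], bool],
    process_job: Callable[[str], None],
    poll_interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Set[str]:
    """Run Phase 1 and Phase 2; return the IDs of the Helix jobs processed."""
    processed: Set[str] = set()
    while True:
        # Read the stage status first: if the stage is already finished here,
        # every Helix job was submitted before the Helix poll below.
        still_running = stage_jobs_running()
        for job_id in completed_helix_jobs():
            if job_id not in processed:
                process_job(job_id)  # download, parse, upload (hypothetical)
                processed.add(job_id)
        # Phase 2 ends once the stage is done and Helix reports no pending jobs.
        if not still_running and all_helix_jobs_done():
            return processed
        sleep(poll_interval_s)
```

The `processed` set only avoids duplicate work within one run. On a re-run it starts empty, so anything that must survive a re-run still has to be derived from the APIs.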
Phase 3: Report and Exit
- If all Helix work items passed → exit green.
- If any work item failed → exit red.
- Output a summary: total jobs, pass/fail counts, links to failed work items.
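As a rough sketch of Phase 3, the exit code and summary can be computed from a flat list of work item results. The `WorkItemResult` type and its fields are assumptions made for illustration; exit code 0 stands for green and 1 for red.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WorkItemResult:
    job: str          # Helix job name (hypothetical field)
    name: str         # work item name (hypothetical field)
    passed: bool
    console_url: str  # link to the work item console log


def final_report(results: List[WorkItemResult]) -> Tuple[int, str]:
    """Return (exit_code, summary_text) for the stage."""
    failed = [r for r in results if not r.passed]
    jobs = {r.job for r in results}
    lines = [
        f"Helix jobs: {len(jobs)}",
        f"Work items: {len(results) - len(failed)} passed, {len(failed)} failed",
    ]
    # One line per failed work item, with a link to its console log.
    lines += [f"  FAILED {r.job}/{r.name}: {r.console_url}" for r in failed]
    return (1 if failed else 0), "\n".join(lines)
```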
Statelessness
The HRJ must be fully stateless — it derives its current state entirely from:
- The Azure DevOps REST API (pipeline state, job statuses, attempt numbers).
- The Helix API (job statuses, work item results).
No pipeline artifacts are used for state tracking. This ensures the HRJ can be safely re-run at any time without getting into a bad state.
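One way to picture stateless operation: on every start, the set of jobs still to be processed is recomputed as a difference between two API snapshots. The sketch below assumes, purely for illustration, that results already uploaded to Azure DevOps can be mapped back to the Helix job ID they came from; how that mapping is recorded is not specified here.

```python
from typing import Iterable, List


def unprocessed_jobs(
    completed_in_helix: Iterable[str],
    reported_in_azdo: Iterable[str],
) -> List[str]:
    """Return completed Helix job IDs that have no uploaded results yet."""
    reported = set(reported_in_azdo)
    # Deduplicate and sort so the processing order is deterministic.
    return sorted(j for j in set(completed_in_helix) if j not in reported)
```

Because both inputs come from live API queries, a crashed or re-run HRJ arrives at the same answer without reading any saved state.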
Retry Handling
| Scenario | Behavior |
| --- | --- |
| Helix work item fails | HRJ goes red. On re-run, it queries Helix for all jobs from the stage (including resubmitted ones), processes only unprocessed results, and reports the new outcome. |
| Build job fails (no Helix submission) | HRJ detects the failure via AzDO API and reflects it in its status. |
| HRJ itself crashes | Re-run re-derives state from APIs. |
| Build job re-run without HRJ re-run | Stage stays red (HRJ is still red), forcing the user to also re-run the HRJ. |
Rate Limiting
The Azure DevOps REST API may throttle test result uploads. The HRJ must therefore honor the rate-limit headers returned with each response (for example, Retry-After) and back off for at least the indicated delay before sending the next request.
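A minimal sketch of header-driven backoff, in Python for illustration. It assumes the server signals the delay through a `Retry-After` header given in seconds. The exponential fallback for 429 and 5xx responses without that header, and the `base_s` and `cap_s` values, are assumptions rather than documented requirements.

```python
from typing import Mapping, Optional


def backoff_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    base_s: float = 1.0,
    cap_s: float = 60.0,
) -> Optional[float]:
    """Return seconds to wait before the next request, or None for no wait."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not handled here
    if status == 429 or status >= 500:
        # No usable header: fall back to capped exponential backoff.
        return min(cap_s, base_s * (2 ** attempt))
    return None
```

The header is honored regardless of status code, since a server may ask a client to slow down on a response that otherwise succeeded.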
Authentication
- Uses its own system access token (from the HRJ's pipeline job) to upload test results.
- No system access tokens are passed to Helix agents.
- No Managed Identities needed — avoids throttling concerns entirely.
Test Result Processing
Reuse the existing Python logic from the Helix Machines repo that:
- Supports multiple formats (xUnit, NUnit, etc.).
- Locates test result XMLs by convention.
- Parses and converts to Azure DevOps test result format.
- Uploads via the Azure DevOps test results REST API.
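To make the parse-and-convert step concrete, here is a small sketch for one format only (xUnit v2 XML, where each `<test>` element carries `name`, `result`, and `time` attributes). It is not the existing Helix Machines logic. The output keys follow field names used by the Azure DevOps test results API, but the exact payload shape the upload needs is an assumption here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# xUnit v2 result values mapped to Azure DevOps outcome names.
_OUTCOMES = {"Pass": "Passed", "Fail": "Failed", "Skip": "NotExecuted"}


def parse_xunit(xml_text: str) -> List[Dict[str, object]]:
    """Convert xUnit v2 XML into a list of test-result dictionaries."""
    root = ET.fromstring(xml_text)
    results: List[Dict[str, object]] = []
    for test in root.iter("test"):
        name = test.get("name", "")
        results.append({
            "testCaseTitle": name,
            "automatedTestName": name,
            "outcome": _OUTCOMES.get(test.get("result", ""), "None"),
            # xUnit reports seconds; convert to whole milliseconds.
            "durationInMs": round(float(test.get("time", "0")) * 1000),
        })
    return results
```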
Console Output
The HRJ should produce informative console output:
- Periodic status: [12:34:56] 35/60 Helix jobs complete (2 failed). Waiting...
- Per-job completion: [12:35:10] Job "coreclr-tests-linux-x64" completed (142 passed, 1 failed).
- Links to failed work item console logs.
- Final summary with pass/fail counts.
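The two sample lines above could be produced by formatters along these lines. The function names are hypothetical, and the timestamp is passed in so the output is easy to test.

```python
from datetime import datetime


def status_line(now: datetime, done: int, total: int, failed: int) -> str:
    """Periodic status line."""
    return (
        f"[{now:%H:%M:%S}] {done}/{total} Helix jobs complete "
        f"({failed} failed). Waiting..."
    )


def job_line(now: datetime, name: str, passed: int, failed: int) -> str:
    """Per-job completion line."""
    return (
        f'[{now:%H:%M:%S}] Job "{name}" completed '
        f"({passed} passed, {failed} failed)."
    )
```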
Infrastructure
| Requirement | Details |
| --- | --- |
| Dedicated pool | A small pool of lightweight Linux containers (minimal CPU/memory). |
| Pool sizing | Must handle concurrent pipelines and stages — each stage with Helix work gets its own HRJ instance. Start with N machines; monitor and scale. |
| Timeout | Configurable; should be at least as long as the longest expected Helix execution (e.g., 4–6 hours). |
| Polling interval | Configurable; default 15–30 seconds. |
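The two configurable values could be read from the job environment roughly as follows. The variable names `HRJ_TIMEOUT_HOURS` and `HRJ_POLL_INTERVAL_SECONDS` are invented for this sketch, and the defaults (6 hours, 15 seconds) are simply picked from the ranges given in the table.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class HrjSettings:
    timeout_hours: float = 6.0
    poll_interval_s: float = 15.0


def load_settings(env: Mapping[str, str]) -> HrjSettings:
    """Build settings from environment-style key/value pairs."""
    settings = HrjSettings(
        timeout_hours=float(env.get("HRJ_TIMEOUT_HOURS", "6")),
        poll_interval_s=float(env.get("HRJ_POLL_INTERVAL_SECONDS", "15")),
    )
    # Reject values that would make the loop spin or never start.
    if settings.timeout_hours <= 0 or settings.poll_interval_s <= 0:
        raise ValueError("timeout and polling interval must be positive")
    return settings
```

In a real job this would be called as `load_settings(os.environ)`.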
Implementation Notes
- The HRJ YAML template should be added to the Arcade SDK so that any pipeline stage using Helix can include it.
- Consider making the HRJ opt-in initially (via a pipeline variable like EnableHelixReporterJob: true) for safe rollout.
- The HRJ script/tool should be a .NET console app living under Arcade's src/Microsoft.DotNet.Helix.
- Investigate whether the HRJ can be re-triggered programmatically (via the AzDO REST API, possibly from the job-sending infrastructure) when a build job is re-run but the HRJ is not, as a convenience for users.
- The Helix SDK must include stage metadata when submitting jobs so the HRJ can scope its queries to its own stage.
Acceptance Criteria