Record rollout start time and show rollout latency in UI #398
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9171bc586c
    if row.execution_metadata.rollout_start_time is None:
        row.execution_metadata.rollout_start_time = datetime.now(timezone.utc)
Reset rollout_start_time on retries
In retry flows (`rollout_processor_with_retry` reuses the same `EvaluationRow`), this `if ... is None` guard means the timestamp is only set on the first attempt. If the first attempt fails and the row is retried, the successful attempt keeps the earlier `rollout_start_time` while `rollout_duration_seconds` reflects only the last attempt, so any latency calculation or trace alignment based on `rollout_start_time` will be too early by the time spent in prior attempts. Consider resetting `rollout_start_time` at the start of each attempt (or in the retry wrapper) to keep these timings consistent.
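One way to picture the suggested fix is the sketch below. It is only an illustration under stated assumptions: `ExecutionMetadata` and `EvaluationRow` are simplified stand-ins for the real models, and `run_with_retry` is a hypothetical wrapper, not the repository's `rollout_processor_with_retry`. The point is that the timestamp is overwritten unconditionally at the start of every attempt, so it always describes the same attempt as the recorded duration.

```python
import time
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class ExecutionMetadata:
    # Simplified stand-in: only the two timing fields from the PR are modeled.
    rollout_start_time: Optional[datetime] = None
    rollout_duration_seconds: Optional[float] = None


@dataclass
class EvaluationRow:
    execution_metadata: ExecutionMetadata = field(default_factory=ExecutionMetadata)


def run_with_retry(
    row: EvaluationRow,
    attempt_fn: Callable[[EvaluationRow], None],
    max_attempts: int = 3,
) -> EvaluationRow:
    """Hypothetical retry wrapper that re-stamps the start time per attempt."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        # No `is None` guard: each attempt replaces the previous timestamp.
        row.execution_metadata.rollout_start_time = datetime.now(timezone.utc)
        started = time.perf_counter()
        try:
            attempt_fn(row)
        except Exception as exc:
            last_error = exc
            continue
        # Duration and start time now both refer to the successful attempt.
        row.execution_metadata.rollout_duration_seconds = time.perf_counter() - started
        return row
    raise RuntimeError("all rollout attempts failed") from last_error
```

With this shape, a row that fails once and then succeeds ends up with the start time of the second attempt, which matches the duration measured for that attempt.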
Motivation
- `created_at` timestamps refer to the invocation and cannot be used to compute per-rollout start times.
Description
- Added `rollout_start_time` to `ExecutionMetadata` in `eval_protocol/models.py` and extended the TypeScript schema in `vite-app/src/types/eval-protocol.ts` with `rollout_start_time`, `rollout_duration_seconds`, and `eval_duration_seconds`.
- Recorded `rollout_start_time` at rollout start in the main rollout entry points by setting it before the processing timer in processors such as `default_single_turn_rollout_process.py`, `default_pydantic_ai_rollout_processor.py`, `remote_rollout_processor.py`, `github_action_rollout_processor.py`, `openenv_rollout_processor.py`, `default_klavis_sandbox_rollout_processor.py`, `default_agent_rollout_processor.py`, `tinker_rollout_processor.py`, `priority_scheduler.py`, and `mcp/execution/manager.py`.
- Showed rollout latency in the UI by adding a `Rollout Latency` column in `vite-app/src/components/EvaluationTable.tsx`, adding a `RowRolloutDuration` renderer, and wiring the cell in `vite-app/src/components/EvaluationRow.tsx` to display `execution_metadata.rollout_duration_seconds` formatted as seconds.
- Existing `rollout_duration_seconds` assignments remain unchanged, while the start timestamp is provided for future trace alignment.
Testing
- Ran `make pre-commit` to run linters/type checks, but it failed because the `pre-commit` tool is not installed in the environment.
- Attempted `npm install` in `vite-app` to validate frontend dependencies, but it failed with `Cannot read properties of null (reading 'matches')` from `npm` in this environment.
- The commit (`git commit`) after the edits completed successfully.
Note
Adds explicit rollout timing to enable accurate per-rollout latency and sorting.
- Add `rollout_start_time` to `ExecutionMetadata` plus `rollout_duration_seconds` and `eval_duration_seconds` fields; keep existing duration plumbing; serialize in `models.py` and TS `ExecutionMetadataSchema`.
- Set `execution_metadata.rollout_start_time` at rollout start and compute `rollout_duration_seconds` in `tinker_rollout_processor.py`, `mcp/execution/manager.py`, `default_single_turn_rollout_process.py`, `default_pydantic_ai_rollout_processor.py`, `default_agent_rollout_processor.py`, `default_klavis_sandbox_rollout_processor.py`, `openenv_rollout_processor.py`, `remote_rollout_processor.py`, `github_action_rollout_processor.py`, and `priority_scheduler.py`.
- Add a `Rollout Latency` column to `EvaluationTable.tsx` and render it with `RowRolloutDuration` in `EvaluationRow.tsx` using `execution_metadata.rollout_duration_seconds`.
Written by Cursor Bugbot for commit 9171bc5.
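The timing pattern summarized above (stamp the wall-clock start, then measure the duration) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `ExecutionMetadata` is a reduced stand-in for the real model, and `timed_rollout` and `rollout_window` are hypothetical helpers introduced only to show how a start timestamp plus a duration yields an interval usable for trace alignment.

```python
import time
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional, Tuple


@dataclass
class ExecutionMetadata:
    # Reduced stand-in for the real model; field names follow the PR.
    rollout_start_time: Optional[datetime] = None
    rollout_duration_seconds: Optional[float] = None


def timed_rollout(meta: ExecutionMetadata, work: Callable[[], None]) -> ExecutionMetadata:
    """Hypothetical helper: stamp the start, run the work, record the duration."""
    # The wall-clock timestamp is set before the processing timer starts.
    meta.rollout_start_time = datetime.now(timezone.utc)
    started = time.perf_counter()
    work()
    meta.rollout_duration_seconds = time.perf_counter() - started
    return meta


def rollout_window(meta: ExecutionMetadata) -> Optional[Tuple[datetime, datetime]]:
    """Hypothetical helper: derive the (start, end) interval of a rollout."""
    if meta.rollout_start_time is None or meta.rollout_duration_seconds is None:
        return None
    end = meta.rollout_start_time + timedelta(seconds=meta.rollout_duration_seconds)
    return meta.rollout_start_time, end
```

Using a monotonic timer (`time.perf_counter`) for the duration and a UTC wall-clock value for the start keeps the latency measurement immune to clock adjustments while still giving an absolute anchor for aligning traces.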