Conversation
xzrderek
left a comment
There was a problem hiding this comment.
looks good. it will also measure the time it takes waiting for the sema, that is intended right?
| async def _execute_pointwise_eval_with_semaphore( | ||
| row: EvaluationRow, | ||
| ) -> EvaluationRow: | ||
| start_time = time.perf_counter() |
There was a problem hiding this comment.
Eval duration includes semaphore wait time, not just execution
The start_time is captured before async with semaphore:, which means eval_duration_seconds includes time spent waiting for the semaphore, not just the actual evaluation execution time. When concurrent evaluations compete for semaphore slots, reported durations will be artificially inflated. The start_time assignment needs to be moved inside the async with semaphore: block to measure only the actual evaluation time.
Additional Locations (1)
| if isinstance(results, list): | ||
| eval_duration = time.perf_counter() - start_time | ||
| for r in results: | ||
| r.execution_metadata.eval_duration_seconds = eval_duration |
There was a problem hiding this comment.
Duration set before element type validation causes AttributeError
The new loop at lines 636-637 accesses r.execution_metadata.eval_duration_seconds before the validation at lines 638-642 confirms all elements are EvaluationRow instances. If the test function returns a list containing non-EvaluationRow objects, this will raise an AttributeError instead of the helpful ValueError with guidance. This is inconsistent with the pointwise case where validation occurs before setting the duration.
|
@xzrderek yes |
Note
Adds precise evaluation timing to aid performance analysis.
result.execution_metadata.eval_duration_secondsresultseval_duration_secondsto all returned rowsWritten by Cursor Bugbot for commit 6bfe52e. This will update automatically on new commits. Configure here.