Bug: Incorrect Latency Metric Extraction in Benchmark Consolidation
Summary
The consolidate_multiple_run_of_metrics.py script incorrectly extracts the avg field instead of the latency field from GStreamer's latency_tracer output, resulting in pipeline latency being reported as ~1374ms when the actual end-to-end latency is ~66ms.
Impact
- Severity: High - Completely incorrect performance metric reporting
- Affected Component: performance-tools/benchmark-scripts/consolidate_multiple_run_of_metrics.py
- Result: Latency metrics in consolidated reports are off by ~20x, making performance benchmarks misleading
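The ~20x figure follows directly from the two values reported below (1374.084365 from the avg field vs 66.573660 from the latency field):

```python
reported = 1374.084365  # avg field, what the script currently extracts
actual = 66.573660      # latency field, the true pipeline latency

print(round(reported / actual, 1))  # -> 20.6
```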
Root Cause
The PipelineLatencyExtractor class extracts the wrong field from the latency_tracer_pipeline output.
Current Buggy Code
File: performance-tools/benchmark-scripts/consolidate_multiple_run_of_metrics.py
Line: 489
```python
class PipelineLatencyExtractor(KPIExtractor):
    def extract_data(self, log_file_path):
        print("parsing latency")
        average_latency_value = ""
        latency = {}
        lat = re.findall(r'\d+', os.path.basename(log_file_path))
        lat_filename = lat[0] if len(lat) > 0 else "UNKNOWN"
        latency_key = "Pipeline_{} {}".format(lat_filename, PIPELINE_LATENCY_CONSTANT)
        with open(log_file_path) as f:
            for line in f:
                if "latency_tracer_pipeline" in line:
                    match = re.search(r'avg=\(double\)([0-9]*\.?[0-9]+)', line)  # <- BUG HERE
                    if match:
                        average_latency_value = match.group(1)
        if len(average_latency_value) > 0:
            latency[latency_key] = average_latency_value
        else:
            latency[latency_key] = "NA"
        return latency
```
Understanding the GStreamer latency_tracer Output
The latency_tracer_pipeline output contains multiple fields with different meanings:
```
latency_tracer_pipeline, frame_latency=(double)1461.186673, avg=(double)1374.084365, min=(double)0.076990, max=(double)1609.683051, latency=(double)66.573660, fps=(double)15.020956, frame_num=(uint)858;
```
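The fields in this line can be pulled apart mechanically. A minimal sketch (using the sample line above) showing that avg and latency carry different values:

```python
import re

# Sample latency_tracer_pipeline line from the log above
line = ("latency_tracer_pipeline, frame_latency=(double)1461.186673, "
        "avg=(double)1374.084365, min=(double)0.076990, max=(double)1609.683051, "
        "latency=(double)66.573660, fps=(double)15.020956, frame_num=(uint)858;")

# Capture every name=(type)value pair into a dict
fields = {name: float(value)
          for name, value in re.findall(r'(\w+)=\((?:double|uint)\)([0-9.]+)', line)}

print(fields["avg"])      # -> 1374.084365 (running average of frame_latency)
print(fields["latency"])  # -> 66.57366 (true end-to-end pipeline latency)
```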
Field Definitions (per Intel DLStreamer Documentation)
According to the official Intel DLStreamer latency_tracer documentation:
pipeline latency: Time that a buffer takes to travel from source pad of the source element to source pad of the element before sink element.
| Field | Value (ms) | Official Definition |
|---|---|---|
| frame_latency | 1461.19 | Current frame's latency measurement |
| avg | 1374.08 | Average of frame_latency values across all frames (NOT true pipeline latency!) |
| min | 0.077 | Minimum frame_latency observed |
| max | 1609.68 | Maximum frame_latency observed |
| latency | 66.57 | TRUE end-to-end pipeline latency (source pad → element before sink) |
| fps | 15.02 | Current frame rate |
| frame_num | 858 | Frame counter |
Why avg ≠ Pipeline Latency
From Intel's example in their documentation:
```
latency_tracer_pipeline, frame_latency=(double)0.249380, avg=(double)0.255765, min=(double)0.205778, max=(double)0.588363, latency=(double)0.340871, fps=(double)2933.658362, frame_num=(uint)8802;
```
Notice that in Intel's example:
- avg = 0.256ms (average of frame_latency values)
- latency = 0.341ms (the actual pipeline latency)
- They are different values representing different metrics
The avg field is the running average of frame_latency measurements, which can be affected by:
- Pipeline initialization irregularities
- Frame pacing variations during startup
- Buffering and queue behaviors
The actual pipeline latency (time for a buffer to traverse the entire pipeline from source to sink) is specifically stored in the latency field: 66.57ms
Evidence
Example Log Output
From benchmark/gst-launch_20251117183446055080963_gst0.log:
Early frames (showing divergence):
```
FRAME:0.076990 AVG:0.076990 LAT:2.018444 FPS:495.431134 NUM:1
FRAME:51.450018 AVG:25.763504 LAT:26.719209 FPS:37.426258 NUM:2
FRAME:104.181531 AVG:51.902846 LAT:40.046907 FPS:24.970718 NUM:3
FRAME:158.584199 AVG:78.573184 LAT:46.727682 FPS:21.400591 NUM:4
```
Final frames (steady state at 15 FPS):
```
FRAME:1264.599661 AVG:1373.685147 LAT:66.572720 FPS:15.021168 NUM:849
FRAME:1397.865997 AVG:1373.713595 LAT:66.572770 FPS:15.021157 NUM:850
FRAME:1330.939429 AVG:1373.663332 LAT:66.572882 FPS:15.021131 NUM:851
FRAME:1598.156056 AVG:1373.926821 LAT:66.573028 FPS:15.021098 NUM:852
FRAME:1461.186673 AVG:1374.084365 LAT:66.573660 FPS:15.020956 NUM:858  <- FINAL
```
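The AVG column is simply the running mean of the FRAME (frame_latency) column, which can be verified directly against the first four frames above (values copied from the log excerpt):

```python
# frame_latency values for frames 1-4, copied from the log excerpt above
frame_latencies = [0.076990, 51.450018, 104.181531, 158.584199]

# Reproduce the AVG column as a running mean of frame_latency
running_avgs = [sum(frame_latencies[:n]) / n
                for n in range(1, len(frame_latencies) + 1)]

for avg in running_avgs:
    print(f"{avg:.6f}")
# Matches the log's AVG column: 0.076990, 25.763504, 51.902846, 78.573184
```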
Current Consolidated Output
File: benchmark/metrics.csv
```
Pipeline_20251117183446055080963 Latency,1374.084365
```
This reports 1374ms latency (WRONG - extracted from avg field)
Expected Consolidated Output
```
Pipeline_20251117183446055080963 Latency,66.573660
```
This should report 66.5ms latency (CORRECT - from latency field)
Proposed Fix
Change the regex pattern to extract the latency field instead of avg. Note the (?<!_) negative lookbehind: without it, a bare latency= pattern would match inside frame_latency= and extract the wrong value.
```python
class PipelineLatencyExtractor(KPIExtractor):
    def extract_data(self, log_file_path):
        print("parsing latency")
        average_latency_value = ""
        latency = {}
        lat = re.findall(r'\d+', os.path.basename(log_file_path))
        lat_filename = lat[0] if len(lat) > 0 else "UNKNOWN"
        latency_key = "Pipeline_{} {}".format(lat_filename, PIPELINE_LATENCY_CONSTANT)
        with open(log_file_path) as f:
            for line in f:
                if "latency_tracer_pipeline" in line:
                    # FIX: extract the 'latency' field instead of 'avg';
                    # (?<!_) prevents a match inside 'frame_latency'
                    match = re.search(r'(?<!_)latency=\(double\)([0-9]*\.?[0-9]+)', line)
                    if match:
                        average_latency_value = match.group(1)
        if len(average_latency_value) > 0:
            latency[latency_key] = average_latency_value
        else:
            latency[latency_key] = "NA"
        return latency
```
Changed line 489:
```python
# Before (WRONG):
match = re.search(r'avg=\(double\)([0-9]*\.?[0-9]+)', line)
# After (CORRECT):
match = re.search(r'(?<!_)latency=\(double\)([0-9]*\.?[0-9]+)', line)
```
Testing
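One regression worth testing explicitly: a bare latency= pattern also matches inside frame_latency=, because re.search returns the leftmost match, so the replacement pattern needs a guard such as a negative lookbehind. A small sketch demonstrating the pitfall:

```python
import re

line = ("latency_tracer_pipeline, frame_latency=(double)1461.186673, "
        "avg=(double)1374.084365, latency=(double)66.573660, frame_num=(uint)858;")

# Naive pattern: leftmost match lands inside "frame_latency"
naive = re.search(r'latency=\(double\)([0-9]*\.?[0-9]+)', line)
print(naive.group(1))  # -> 1461.186673 (wrong field!)

# Guarded pattern: (?<!_) rejects "..._latency", so only the standalone field matches
guarded = re.search(r'(?<!_)latency=\(double\)([0-9]*\.?[0-9]+)', line)
print(guarded.group(1))  # -> 66.573660
```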
Test Case
- Run benchmark: make benchmark REGISTRY=true
- Consolidate metrics: make consolidate-metrics RESULTS_DIR=$PWD/benchmark
- Check benchmark/metrics.csv
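The extraction logic can also be checked without a full benchmark run. A minimal self-contained sketch (extract_pipeline_latency below is a simplified stand-in for PipelineLatencyExtractor.extract_data, not the actual class) using a synthetic log file:

```python
import re
import tempfile

def extract_pipeline_latency(log_file_path):
    """Simplified stand-in for the extractor: return the last 'latency' value seen."""
    value = "NA"
    with open(log_file_path) as f:
        for line in f:
            if "latency_tracer_pipeline" in line:
                match = re.search(r'(?<!_)latency=\(double\)([0-9]*\.?[0-9]+)', line)
                if match:
                    value = match.group(1)
    return value

# Write a synthetic log containing one tracer line plus unrelated output
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("some unrelated pipeline output\n")
    tmp.write("latency_tracer_pipeline, frame_latency=(double)1461.186673, "
              "avg=(double)1374.084365, latency=(double)66.573660, "
              "fps=(double)15.020956, frame_num=(uint)858;\n")
    path = tmp.name

print(extract_pipeline_latency(path))  # -> 66.573660
```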
Expected Result (After Fix)
```
Pipeline_20251117183446055080963 Latency,66.573660
```
Latency should be in the 60-70ms range for typical single-stream YOLO11n inference, not 1300+ms.
Additional Context
- This bug affects all benchmark reports generated by the consolidation script
- Historical benchmark data showing ~1374ms latency is invalid and should be re-evaluated
- The actual 66ms pipeline latency is reasonable for:
- Video decode (h264/h265)
- YOLO11n INT8 inference
- Post-processing and metadata generation
Alignment with Intel DLStreamer Documentation
This bug directly contradicts Intel's official guidance on using the latency_tracer tool.
What Intel's Documentation Says
From Intel DLStreamer latency_tracer guide:
- Definition of pipeline latency: "pipeline latency: Time that a buffer takes to travel from source pad of the source element to source pad of the element before sink element."
- Sample output format shows distinct avg and latency fields: latency_tracer_pipeline, frame_latency=(double)0.249380, avg=(double)0.255765, min=(double)0.205778, max=(double)0.588363, latency=(double)0.340871, fps=(double)2933.658362, frame_num=(uint)8802;
- The latency field is explicitly documented as the pipeline latency metric
What the Current Code Does
The consolidation script extracts the avg field instead of the latency field, which:
- Contradicts Intel's definition of pipeline latency
- Misrepresents performance characteristics by ~20x
- Makes benchmark comparisons meaningless
Correct Implementation
To align with Intel's documentation, the script should extract the latency field, which represents the actual time a buffer takes to traverse the pipeline from source to sink.
References
- Intel DLStreamer latency_tracer documentation: https://dlstreamer.github.io/dev_guide/latency_tracer.html
- Affected file: performance-tools/benchmark-scripts/consolidate_multiple_run_of_metrics.py:476-500
- Related: PIPELINE_LATENCY_CONSTANT definition at line 40
Reproduction
```shell
# Setup
make update-submodules
make download-models
make download-sample-videos

# Run benchmark
make benchmark REGISTRY=true

# Consolidate (will show bug)
make consolidate-metrics RESULTS_DIR=$PWD/benchmark

# Check incorrect output
cat benchmark/metrics.csv | grep Latency
# Shows: Pipeline_20251117183446055080963 Latency,1374.084365 (WRONG)

# Manually verify correct value in logs
grep "latency_tracer_pipeline," benchmark/gst-launch_*.log | grep -v interval | tail -1
# Shows: latency=(double)66.573660 (CORRECT)
```