Bug: Incorrect Latency Metric Extraction in Benchmark Consolidation #180

@JoshCork

Summary

The consolidate_multiple_run_of_metrics.py script incorrectly extracts the avg field instead of the latency field from GStreamer's latency_tracer output, resulting in pipeline latency being reported as ~1374ms when the actual end-to-end latency is ~66ms.

Impact

  • Severity: High - Completely incorrect performance metric reporting
  • Affected Component: performance-tools/benchmark-scripts/consolidate_multiple_run_of_metrics.py
  • Result: Latency metrics in consolidated reports are off by ~20x, making performance benchmarks misleading

Root Cause

The PipelineLatencyExtractor class extracts the wrong field from the latency_tracer_pipeline output.

Current Buggy Code

File: performance-tools/benchmark-scripts/consolidate_multiple_run_of_metrics.py
Line: 489

class PipelineLatencyExtractor(KPIExtractor):
    def extract_data(self, log_file_path):
        print("parsing latency")
        average_latency_value = ""
        latency = {}
        lat = re.findall(r'\d+', os.path.basename(log_file_path))
        lat_filename = lat[0] if len(lat) > 0 else "UNKNOWN"
        latency_key = "Pipeline_{} {}".format(lat_filename, PIPELINE_LATENCY_CONSTANT)
        with open(log_file_path) as f:
            for line in f:
              if "latency_tracer_pipeline" in line:
                match = re.search(r'avg=\(double\)([0-9]*\.?[0-9]+)', line)  # ← BUG HERE
                if match:
                    average_latency_value=match.group(1)
        if len(average_latency_value) > 0:
            latency[latency_key] = average_latency_value
        else:
            latency[latency_key] = "NA"
        return latency

Understanding the GStreamer latency_tracer Output

The latency_tracer_pipeline output contains multiple fields with different meanings:

latency_tracer_pipeline, frame_latency=(double)1461.186673, avg=(double)1374.084365, min=(double)0.076990, max=(double)1609.683051, latency=(double)66.573660, fps=(double)15.020956, frame_num=(uint)858;

Field Definitions (per Intel DLStreamer Documentation)

According to the official Intel DLStreamer latency_tracer documentation:

pipeline latency: Time that a buffer takes to travel from source pad of the source element to source pad of the element before sink element.

Field          Value (ms)   Official Definition
frame_latency  1461.19      Current frame's latency measurement
avg            1374.08      Average of frame_latency values across all frames (NOT true pipeline latency!)
min            0.077        Minimum frame_latency observed
max            1609.68      Maximum frame_latency observed
latency        66.57        TRUE end-to-end pipeline latency (source pad → element before sink)
fps            15.02        Current frame rate
frame_num      858          Frame counter
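
To make the field distinction concrete, here is a minimal standalone sketch (using only the tracer line quoted above and the standard library) that parses one latency_tracer_pipeline line into a dict:

import re

line = (
    "latency_tracer_pipeline, frame_latency=(double)1461.186673, "
    "avg=(double)1374.084365, min=(double)0.076990, max=(double)1609.683051, "
    "latency=(double)66.573660, fps=(double)15.020956, frame_num=(uint)858;"
)

# Each field has the form name=(type)value; capture the name and numeric value.
fields = {name: float(value)
          for name, value in re.findall(r'(\w+)=\((?:double|uint)\)([0-9.]+)', line)}

print(fields["avg"])                      # 1374.084365 -- running average of frame_latency
print(fields["latency"])                  # 66.57366    -- true end-to-end pipeline latency
print(fields["avg"] / fields["latency"])  # ~20.6, the discrepancy seen in the reports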

Why avg ≠ Pipeline Latency

From Intel's example in their documentation:

latency_tracer_pipeline, frame_latency=(double)0.249380, avg=(double)0.255765,
min=(double)0.205778, max=(double)0.588363, latency=(double)0.340871,
fps=(double)2933.658362, frame_num=(uint)8802;

Notice that in Intel's example:

  • avg = 0.256ms (average of frame_latency values)
  • latency = 0.341ms (the actual pipeline latency)
  • They are different values representing different metrics

The avg field is the running average of frame_latency measurements, which can be affected by:

  • Pipeline initialization irregularities
  • Frame pacing variations during startup
  • Buffering and queue behaviors

The actual pipeline latency (the time a buffer takes to travel from the source element to the element just before the sink, per Intel's definition) is stored specifically in the latency field: 66.57ms in this run.

Evidence

Example Log Output

From benchmark/gst-launch_20251117183446055080963_gst0.log:

Early frames (showing divergence):

FRAME:0.076990    AVG:0.076990     LAT:2.018444   FPS:495.431134 NUM:1
FRAME:51.450018   AVG:25.763504    LAT:26.719209  FPS:37.426258  NUM:2
FRAME:104.181531  AVG:51.902846    LAT:40.046907  FPS:24.970718  NUM:3
FRAME:158.584199  AVG:78.573184    LAT:46.727682  FPS:21.400591  NUM:4

Final frames (steady state at 15 FPS):

FRAME:1264.599661 AVG:1373.685147 LAT:66.572720 FPS:15.021168 NUM:849
FRAME:1397.865997 AVG:1373.713595 LAT:66.572770 FPS:15.021157 NUM:850
FRAME:1330.939429 AVG:1373.663332 LAT:66.572882 FPS:15.021131 NUM:851
FRAME:1598.156056 AVG:1373.926821 LAT:66.573028 FPS:15.021098 NUM:852
FRAME:1461.186673 AVG:1374.084365 LAT:66.573660 FPS:15.020956 NUM:858 ← FINAL
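
The AVG column can be reproduced as the cumulative mean of the FRAME column, confirming that avg is just a running average of frame_latency rather than a latency measurement in its own right. A quick check (a minimal sketch; all values are taken from the four early-frame log lines above):

frames = [0.076990, 51.450018, 104.181531, 158.584199]  # FRAME column, NUM 1-4
avgs   = [0.076990, 25.763504, 51.902846, 78.573184]    # AVG column, NUM 1-4

total = 0.0
for i, (frame, avg) in enumerate(zip(frames, avgs), start=1):
    total += frame
    # The cumulative mean of FRAME matches the logged AVG at every step.
    assert abs(total / i - avg) < 1e-5, (i, total / i, avg)
print("AVG column == cumulative mean of FRAME column")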

Current Consolidated Output

File: benchmark/metrics.csv

Pipeline_20251117183446055080963 Latency,1374.084365

This reports 1374ms latency (WRONG - extracted from avg field)

Expected Consolidated Output

Pipeline_20251117183446055080963 Latency,66.573660

This should report ~66.6ms latency (CORRECT - from the latency field)

Proposed Fix

Change the regex pattern from matching avg to matching the standalone latency field. One subtlety: a naive latency=\(double\) pattern would match inside frame_latency= first, because re.search returns the leftmost match. Adding a \b word boundary fixes this: _ is a word character, so no boundary exists before the latency substring of frame_latency. (The loop overwrites match on every tracer line, so the final steady-state reading is what lands in the CSV, which is the desired behavior.)

class PipelineLatencyExtractor(KPIExtractor):
    def extract_data(self, log_file_path):
        print("parsing latency")
        average_latency_value = ""
        latency = {}
        lat = re.findall(r'\d+', os.path.basename(log_file_path))
        lat_filename = lat[0] if len(lat) > 0 else "UNKNOWN"
        latency_key = "Pipeline_{} {}".format(lat_filename, PIPELINE_LATENCY_CONSTANT)
        with open(log_file_path) as f:
            for line in f:
              if "latency_tracer_pipeline" in line:
                # FIX: Extract the standalone 'latency' field instead of 'avg'.
                # \b prevents a false match inside 'frame_latency='.
                match = re.search(r'\blatency=\(double\)([0-9]*\.?[0-9]+)', line)
                if match:
                    average_latency_value = match.group(1)
        if len(average_latency_value) > 0:
            latency[latency_key] = average_latency_value
        else:
            latency[latency_key] = "NA"
        return latency

Changed line 489:

# Before (WRONG): matches the running average of frame_latency
match = re.search(r'avg=\(double\)([0-9]*\.?[0-9]+)', line)

# After (CORRECT): matches the true pipeline latency field;
# \b keeps the pattern from matching inside 'frame_latency='
match = re.search(r'\blatency=\(double\)([0-9]*\.?[0-9]+)', line)
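A quick standalone check of the corrected pattern (a minimal sketch; both sample lines and all expected values are taken verbatim from the log excerpts quoted in this issue):

import re

BENCHMARK_LINE = (
    "latency_tracer_pipeline, frame_latency=(double)1461.186673, "
    "avg=(double)1374.084365, min=(double)0.076990, max=(double)1609.683051, "
    "latency=(double)66.573660, fps=(double)15.020956, frame_num=(uint)858;"
)
INTEL_DOC_LINE = (
    "latency_tracer_pipeline, frame_latency=(double)0.249380, "
    "avg=(double)0.255765, min=(double)0.205778, max=(double)0.588363, "
    "latency=(double)0.340871, fps=(double)2933.658362, frame_num=(uint)8802;"
)

fixed = re.compile(r'\blatency=\(double\)([0-9]*\.?[0-9]+)')
naive = re.compile(r'latency=\(double\)([0-9]*\.?[0-9]+)')

# The corrected pattern picks up the standalone 'latency' field...
assert fixed.search(BENCHMARK_LINE).group(1) == "66.573660"
assert fixed.search(INTEL_DOC_LINE).group(1) == "0.340871"

# ...while a pattern without \b matches inside 'frame_latency=' instead.
assert naive.search(BENCHMARK_LINE).group(1) == "1461.186673"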

Testing

Test Case

  1. Run benchmark: make benchmark REGISTRY=true
  2. Consolidate metrics: make consolidate-metrics RESULTS_DIR=$PWD/benchmark
  3. Check benchmark/metrics.csv

Expected Result (After Fix)

Pipeline_20251117183446055080963 Latency,66.573660

Latency should be in the 60-70ms range for typical single-stream YOLO11n inference, not 1300+ms.

Additional Context

  • This bug affects all benchmark reports generated by the consolidation script
  • Historical benchmark data showing ~1374ms latency is invalid and should be re-evaluated
  • The actual 66ms pipeline latency is reasonable for:
    • Video decode (h264/h265)
    • YOLO11n INT8 inference
    • Post-processing and metadata generation

Alignment with Intel DLStreamer Documentation

This bug directly contradicts Intel's official guidance on using the latency_tracer tool.

What Intel's Documentation Says

From Intel DLStreamer latency_tracer guide:

  1. Definition of pipeline latency:

    "pipeline latency: Time that a buffer takes to travel from source pad of the source element to source pad of the element before sink element."

  2. Sample output format shows distinct avg and latency fields:

    latency_tracer_pipeline, frame_latency=(double)0.249380, avg=(double)0.255765,
    min=(double)0.205778, max=(double)0.588363, latency=(double)0.340871,
    fps=(double)2933.658362, frame_num=(uint)8802;
    
  3. The latency field is explicitly documented as the pipeline latency metric

What the Current Code Does

The consolidation script extracts the avg field instead of the latency field, which:

  • Contradicts Intel's definition of pipeline latency
  • Misrepresents performance characteristics by ~20x
  • Makes benchmark comparisons meaningless

Correct Implementation

To align with Intel's documentation, the script should extract the latency field, which represents the actual time a buffer takes to travel from the source element to the element just before the sink.

Reproduction

# Setup
make update-submodules
make download-models
make download-sample-videos

# Run benchmark
make benchmark REGISTRY=true

# Consolidate (will show bug)
make consolidate-metrics RESULTS_DIR=$PWD/benchmark

# Check incorrect output
cat benchmark/metrics.csv | grep Latency
# Shows: Pipeline_20251117183446055080963 Latency,1374.084365 (WRONG)

# Manually verify correct value in logs
grep "latency_tracer_pipeline," benchmark/gst-launch_*.log | grep -v interval | tail -1
# Shows: latency=(double)66.573660 (CORRECT)
