Bug: Incorrect Latency Metric Extraction in Benchmark Consolidation #180

@JoshCork

Summary

The consolidate_multiple_run_of_metrics.py script incorrectly extracts the avg field instead of the latency field from GStreamer's latency_tracer output, resulting in pipeline latency being reported as ~1374ms when the actual end-to-end latency is ~66ms.

Impact

  • Severity: High - Completely incorrect performance metric reporting
  • Affected Component: performance-tools/benchmark-scripts/consolidate_multiple_run_of_metrics.py
  • Result: Latency metrics in consolidated reports are off by ~20x, making performance benchmarks misleading

Root Cause

The PipelineLatencyExtractor class extracts the wrong field from the latency_tracer_pipeline output.

Current Buggy Code

File: performance-tools/benchmark-scripts/consolidate_multiple_run_of_metrics.py
Line: 489

class PipelineLatencyExtractor(KPIExtractor):
    def extract_data(self, log_file_path):
        print("parsing latency")
        average_latency_value = ""
        latency = {}
        lat = re.findall(r'\d+', os.path.basename(log_file_path))
        lat_filename = lat[0] if len(lat) > 0 else "UNKNOWN"
        latency_key = "Pipeline_{} {}".format(lat_filename, PIPELINE_LATENCY_CONSTANT)
        with open(log_file_path) as f:
            for line in f:
              if "latency_tracer_pipeline" in line:
                match = re.search(r'avg=\(double\)([0-9]*\.?[0-9]+)', line)  # ← BUG HERE
                if match:
                    average_latency_value=match.group(1)
        if len(average_latency_value) > 0:
            latency[latency_key] = average_latency_value
        else:
            latency[latency_key] = "NA"
        return latency

Understanding the GStreamer latency_tracer Output

The latency_tracer_pipeline output contains multiple fields with different meanings:

latency_tracer_pipeline, frame_latency=(double)1461.186673, avg=(double)1374.084365, min=(double)0.076990, max=(double)1609.683051, latency=(double)66.573660, fps=(double)15.020956, frame_num=(uint)858;

Field Definitions (per Intel DLStreamer Documentation)

According to the official Intel DLStreamer latency_tracer documentation:

pipeline latency: Time that a buffer takes to travel from source pad of the source element to source pad of the element before sink element.

Field          Value (ms)   Official Definition
frame_latency  1461.19      Current frame's latency measurement
avg            1374.08      Average of frame_latency values across all frames (NOT true pipeline latency!)
min            0.077        Minimum frame_latency observed
max            1609.68      Maximum frame_latency observed
latency        66.57        TRUE end-to-end pipeline latency (source pad → element before sink)
fps            15.02        Current frame rate
frame_num      858          Frame counter
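
To make the field distinction concrete, here is a minimal standalone sketch (using only the tracer line quoted above and the standard library) that parses one latency_tracer_pipeline line into a dict:

import re

line = (
    "latency_tracer_pipeline, frame_latency=(double)1461.186673, "
    "avg=(double)1374.084365, min=(double)0.076990, max=(double)1609.683051, "
    "latency=(double)66.573660, fps=(double)15.020956, frame_num=(uint)858;"
)

# Each field has the form name=(type)value; capture the name and numeric value.
fields = {name: float(value)
          for name, value in re.findall(r'(\w+)=\((?:double|uint)\)([0-9.]+)', line)}

print(fields["avg"])                      # 1374.084365 -- running average of frame_latency
print(fields["latency"])                  # 66.57366    -- true end-to-end pipeline latency
print(fields["avg"] / fields["latency"])  # ~20.6, the discrepancy seen in the reports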

Why avg ≠ Pipeline Latency

From Intel's example in their documentation:

latency_tracer_pipeline, frame_latency=(double)0.249380, avg=(double)0.255765,
min=(double)0.205778, max=(double)0.588363, latency=(double)0.340871,
fps=(double)2933.658362, frame_num=(uint)8802;

Notice that in Intel's example:

  • avg = 0.256ms (average of frame_latency values)
  • latency = 0.341ms (the actual pipeline latency)
  • They are different values representing different metrics

The avg field is the running average of frame_latency measurements, which can be affected by:

  • Pipeline initialization irregularities
  • Frame pacing variations during startup
  • Buffering and queue behaviors

The actual pipeline latency (the time a buffer takes to travel from the source element to the element just before the sink, per Intel's definition) is stored specifically in the latency field: 66.57ms in this run.

Evidence

Example Log Output

From benchmark/gst-launch_20251117183446055080963_gst0.log:

Early frames (showing divergence):

FRAME:0.076990    AVG:0.076990     LAT:2.018444   FPS:495.431134 NUM:1
FRAME:51.450018   AVG:25.763504    LAT:26.719209  FPS:37.426258  NUM:2
FRAME:104.181531  AVG:51.902846    LAT:40.046907  FPS:24.970718  NUM:3
FRAME:158.584199  AVG:78.573184    LAT:46.727682  FPS:21.400591  NUM:4

Final frames (steady state at 15 FPS):

FRAME:1264.599661 AVG:1373.685147 LAT:66.572720 FPS:15.021168 NUM:849
FRAME:1397.865997 AVG:1373.713595 LAT:66.572770 FPS:15.021157 NUM:850
FRAME:1330.939429 AVG:1373.663332 LAT:66.572882 FPS:15.021131 NUM:851
FRAME:1598.156056 AVG:1373.926821 LAT:66.573028 FPS:15.021098 NUM:852
FRAME:1461.186673 AVG:1374.084365 LAT:66.573660 FPS:15.020956 NUM:858 ← FINAL
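
The AVG column can be reproduced as the cumulative mean of the FRAME column, confirming that avg is just a running average of frame_latency rather than a latency measurement in its own right. A quick check (a minimal sketch; all values are taken from the four early-frame log lines above):

frames = [0.076990, 51.450018, 104.181531, 158.584199]  # FRAME column, NUM 1-4
avgs   = [0.076990, 25.763504, 51.902846, 78.573184]    # AVG column, NUM 1-4

total = 0.0
for i, (frame, avg) in enumerate(zip(frames, avgs), start=1):
    total += frame
    # The cumulative mean of FRAME matches the logged AVG at every step.
    assert abs(total / i - avg) < 1e-5, (i, total / i, avg)
print("AVG column == cumulative mean of FRAME column")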

Current Consolidated Output

File: benchmark/metrics.csv

Pipeline_20251117183446055080963 Latency,1374.084365

This reports 1374ms latency (WRONG - extracted from avg field)

Expected Consolidated Output

Pipeline_20251117183446055080963 Latency,66.573660

This should report ~66.6ms latency (CORRECT - from the latency field)

Proposed Fix

Change the regex pattern from matching avg to matching the standalone latency field. One subtlety: a naive latency=\(double\) pattern would match inside frame_latency= first, because re.search returns the leftmost match. Adding a \b word boundary fixes this: _ is a word character, so no boundary exists before the latency substring of frame_latency. (The loop overwrites match on every tracer line, so the final steady-state reading is what lands in the CSV, which is the desired behavior.)

class PipelineLatencyExtractor(KPIExtractor):
    def extract_data(self, log_file_path):
        print("parsing latency")
        average_latency_value = ""
        latency = {}
        lat = re.findall(r'\d+', os.path.basename(log_file_path))
        lat_filename = lat[0] if len(lat) > 0 else "UNKNOWN"
        latency_key = "Pipeline_{} {}".format(lat_filename, PIPELINE_LATENCY_CONSTANT)
        with open(log_file_path) as f:
            for line in f:
              if "latency_tracer_pipeline" in line:
                # FIX: Extract the standalone 'latency' field instead of 'avg'.
                # \b prevents a false match inside 'frame_latency='.
                match = re.search(r'\blatency=\(double\)([0-9]*\.?[0-9]+)', line)
                if match:
                    average_latency_value = match.group(1)
        if len(average_latency_value) > 0:
            latency[latency_key] = average_latency_value
        else:
            latency[latency_key] = "NA"
        return latency

Changed line 489:

# Before (WRONG): matches the running average of frame_latency
match = re.search(r'avg=\(double\)([0-9]*\.?[0-9]+)', line)

# After (CORRECT): matches the true pipeline latency field;
# \b keeps the pattern from matching inside 'frame_latency='
match = re.search(r'\blatency=\(double\)([0-9]*\.?[0-9]+)', line)
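A quick standalone check of the corrected pattern (a minimal sketch; both sample lines and all expected values are taken verbatim from the log excerpts quoted in this issue):

import re

BENCHMARK_LINE = (
    "latency_tracer_pipeline, frame_latency=(double)1461.186673, "
    "avg=(double)1374.084365, min=(double)0.076990, max=(double)1609.683051, "
    "latency=(double)66.573660, fps=(double)15.020956, frame_num=(uint)858;"
)
INTEL_DOC_LINE = (
    "latency_tracer_pipeline, frame_latency=(double)0.249380, "
    "avg=(double)0.255765, min=(double)0.205778, max=(double)0.588363, "
    "latency=(double)0.340871, fps=(double)2933.658362, frame_num=(uint)8802;"
)

fixed = re.compile(r'\blatency=\(double\)([0-9]*\.?[0-9]+)')
naive = re.compile(r'latency=\(double\)([0-9]*\.?[0-9]+)')

# The corrected pattern picks up the standalone 'latency' field...
assert fixed.search(BENCHMARK_LINE).group(1) == "66.573660"
assert fixed.search(INTEL_DOC_LINE).group(1) == "0.340871"

# ...while a pattern without \b matches inside 'frame_latency=' instead.
assert naive.search(BENCHMARK_LINE).group(1) == "1461.186673"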

Testing

Test Case

  1. Run benchmark: make benchmark REGISTRY=true
  2. Consolidate metrics: make consolidate-metrics RESULTS_DIR=$PWD/benchmark
  3. Check benchmark/metrics.csv

Expected Result (After Fix)

Pipeline_20251117183446055080963 Latency,66.573660

Latency should be in the 60-70ms range for typical single-stream YOLO11n inference, not 1300+ms.

Additional Context

  • This bug affects all benchmark reports generated by the consolidation script
  • Historical benchmark data showing ~1374ms latency is invalid and should be re-evaluated
  • The actual 66ms pipeline latency is reasonable for:
    • Video decode (h264/h265)
    • YOLO11n INT8 inference
    • Post-processing and metadata generation

Alignment with Intel DLStreamer Documentation

This bug directly contradicts Intel's official guidance on using the latency_tracer tool.

What Intel's Documentation Says

From Intel DLStreamer latency_tracer guide:

  1. Definition of pipeline latency:

    "pipeline latency: Time that a buffer takes to travel from source pad of the source element to source pad of the element before sink element."

  2. Sample output format shows distinct avg and latency fields:

    latency_tracer_pipeline, frame_latency=(double)0.249380, avg=(double)0.255765,
    min=(double)0.205778, max=(double)0.588363, latency=(double)0.340871,
    fps=(double)2933.658362, frame_num=(uint)8802;
    
  3. The latency field is explicitly documented as the pipeline latency metric

What the Current Code Does

The consolidation script extracts the avg field instead of the latency field, which:

  • Contradicts Intel's definition of pipeline latency
  • Misrepresents performance characteristics by ~20x
  • Makes benchmark comparisons meaningless

Correct Implementation

To align with Intel's documentation, the script should extract the latency field, which represents the actual time a buffer takes to travel from the source element to the element just before the sink.

Reproduction

# Setup
make update-submodules
make download-models
make download-sample-videos

# Run benchmark
make benchmark REGISTRY=true

# Consolidate (will show bug)
make consolidate-metrics RESULTS_DIR=$PWD/benchmark

# Check incorrect output
cat benchmark/metrics.csv | grep Latency
# Shows: Pipeline_20251117183446055080963 Latency,1374.084365 (WRONG)

# Manually verify correct value in logs
grep "latency_tracer_pipeline," benchmark/gst-launch_*.log | grep -v interval | tail -1
# Shows: latency=(double)66.573660 (CORRECT)
