feature/sparse vcd parser #45

Open
snirqm wants to merge 1 commit into ics-jku:main from snirqm:feature/sparse-vcd-parser

@snirqm snirqm commented Feb 1, 2026

Summary

This PR introduces TraceVcdSparse, an optional memory-efficient VCD parser designed for large simulation waveforms. The existing TraceVcd expands VCD data into a dense array where every signal has a value at every timestamp, which becomes prohibitively expensive for large simulations with many signals that change infrequently.

The Problem

VCD files are inherently sparse (they only record value changes), but the current parser expands this into:

{'clk': ['0','1','0','1',...], 'data': ['x','x','x','x','5','5','5',...]}

For a simulation with 1,000 signals over 50,000 timestamps, this requires 50 million entries even if most signals rarely change.
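
To make the cost concrete, here is a minimal sketch of what dense expansion does to a single signal. `expand_dense` is a hypothetical helper written for illustration, not the code in TraceVcd; it assumes a signal is described by a mapping from timestamp index to new value.

```python
def expand_dense(changes, n_timestamps, initial='x'):
    """Expand {index: new_value} into one value per timestamp (illustrative only)."""
    column = []
    current = initial
    for i in range(n_timestamps):
        # Carry the previous value forward unless a change is recorded at i.
        current = changes.get(i, current)
        column.append(current)
    return column


# One recorded change still costs one stored entry per timestamp.
data = expand_dense({4: '5'}, n_timestamps=7)
print(data)  # ['x', 'x', 'x', 'x', '5', '5', '5']

# Entry count for the example above: signals x timestamps.
dense_entries = 1_000 * 50_000
print(dense_entries)  # 50000000
```

The stored size depends only on the number of signals and timestamps, not on how often anything changes.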

The Solution

TraceVcdSparse preserves the sparse nature of VCD by storing only actual value changes:

{'clk': ([0,1,2,3,...], ['0','1','0','1',...]),
 'data': ([0,40,100,...], ['x','5','7',...])}

Value lookup uses binary search (bisect) to find the most recent change at any given index.
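
A minimal sketch of the lookup, assuming the `(indices, values)` layout shown above. `sparse_lookup` and its `default` argument are illustrative names, not the actual TraceVcdSparse API.

```python
from bisect import bisect_right


def sparse_lookup(indices, values, index, default='x'):
    """Return the value set by the most recent change at or before `index`."""
    # bisect_right gives the insertion point after any entry equal to `index`,
    # so the entry just before it is the latest change that applies.
    pos = bisect_right(indices, index) - 1
    return values[pos] if pos >= 0 else default


data = ([0, 40, 100], ['x', '5', '7'])

print(sparse_lookup(*data, 39))    # x  (still the initial value)
print(sparse_lookup(*data, 40))    # 5  (change lands exactly on this index)
print(sparse_lookup(*data, 99))    # 5
print(sparse_lookup(*data, 5000))  # 7  (last change holds to the end)
```

Each lookup costs O(log k), where k is the number of changes recorded for that signal rather than the number of timestamps.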

Key Features

  • Massive memory reduction: 50-200x for typical simulations (0.5-2% change rate)
  • Full API compatibility: Drop-in replacement, same interface as TraceVcd
  • Opt-in usage: Enable with sparse=True parameter, no breaking changes
  • Comprehensive test coverage: 20 new tests including large-scale validation

Benchmarks

| Simulation Size | Dense Entries | Sparse Entries | Compression |
|---|---|---|---|
| 500 signals × 10K timestamps | 5,000,000 | 100,829 | 49.6x |
| 1000 signals × 50K timestamps | 49,652,000 | 250,803 | 198x |
Access performance:

  • Dense: ~8M accesses/sec (O(1) lookup)
  • Sparse: ~3.6M accesses/sec (O(log n) binary search)
  • Tradeoff: roughly 2.2x slower access for 50-200x less memory

Usage

from wal.trace.container import TraceContainer

tc = TraceContainer()
tc.load('large_simulation.vcd', sparse=True)

# Check compression stats
trace = tc.traces['t0']
stats = trace.memory_stats()
print(f"Compression: {stats['compression_ratio']:.1f}x")

Files Changed

  • wal/trace/vcd_sparse.py - New sparse VCD parser implementation
  • wal/trace/container.py - Added sparse parameter to load()
  • tests/test_vcd_sparse.py - Basic functionality tests
  • tests/test_vcd_sparse_comparison.py - Large-scale comparison tests
  • pyproject.toml - Registered slow pytest marker

Test Plan

  • All existing tests pass
  • New sparse parser produces identical values to dense parser at all indices
  • Signal/scope lists match between parsers
  • Edge cases: static signals, high-frequency signals, x/z values, real values
  • Large-scale tests: 500+ signals, 10K-50K timestamps
  • Memory compression verified for various change rates

# Run all sparse VCD tests
python -m pytest tests/test_vcd_sparse.py tests/test_vcd_sparse_comparison.py -v

# Run only fast tests (skip large simulations)
python -m pytest tests/test_vcd_sparse_comparison.py -m "not slow"
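
The core equivalence check in the test plan (sparse values identical to dense values at every index) can be sketched on plain lists. This is an illustration under assumed names, not the code in tests/test_vcd_sparse_comparison.py; `to_sparse` also shows the change detection that keeps redundant values out of storage.

```python
from bisect import bisect_right


def to_sparse(dense):
    """Keep only the entries whose value differs from the previous one."""
    indices, values = [], []
    for i, value in enumerate(dense):
        # Change detection: skip values that repeat the last stored one.
        if not values or values[-1] != value:
            indices.append(i)
            values.append(value)
    return indices, values


def value_at(sparse, index):
    """Look up the value in effect at `index` via binary search."""
    indices, values = sparse
    return values[bisect_right(indices, index) - 1]


dense = ['x', 'x', 'x', 'x', '5', '5', '5', '7']
sparse = to_sparse(dense)

# Three changes are stored instead of eight dense entries.
assert sparse == ([0, 4, 7], ['x', '5', '7'])

# The sparse form must agree with the dense form at every index.
assert all(value_at(sparse, i) == dense[i] for i in range(len(dense)))
```

Because `to_sparse` always records index 0, `value_at` never has to handle a lookup that precedes the first stored change.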

Introduce TraceVcdSparse, an alternative VCD parser that stores only
value changes rather than expanding to a dense representation. This
addresses memory scalability issues when loading large simulation
waveforms where signals change infrequently.

Key implementation details:
- Sparse storage using (indices, values) tuples per signal
- O(log n) value lookup via binary search (bisect)
- Change detection to avoid storing redundant values
- Full API compatibility with existing TraceVcd

Performance characteristics:
- 50-200x memory reduction for typical simulations (0.5-2% change rate)
- 2x slower random access (still >3M accesses/sec)
- Identical parsing overhead

Usage:
  tc = TraceContainer()
  tc.load('trace.vcd', sparse=True)

Includes comprehensive test suite with synthetic VCD generation
for validation across small, medium, and large simulations.

@meitarsqm

Great proposal. Storing only value changes with binary search lookup is the right approach for VCD data - it matches the inherent sparsity of hardware signals. The 50-200x memory reduction would make it practical to analyze much larger simulations.

sjalloq commented Feb 1, 2026

Why don't you just use vcd2fst?

TDoGoodT commented Feb 2, 2026

@sjalloq We actually use VCD in most of our waveform analysis tooling, so FST is kind of out of our scope.
Also, I haven't looked into the FST parser. Does it also expand the signals to get the full vector?

sjalloq commented Feb 2, 2026

I haven't looked but I think it does. When you want all the resolved edges, the tool expands the compressed FST and returns the full vector. So if you can actually store sparse data in memory then it's a win as long as you don't lose performance in having to resolve when querying.

But I solved the problem another way - convert to FST to save size, and then post process the FST to dump only the signals you need and the time window you're interested in. Most of the time you only care about a handful of signals or a sub-unit or a group of signals across the design, for example all AXI buses. There's no reason to load the full VCD/FST when you're going to be querying 0.01% of the signals in the design.

snirqm commented Feb 5, 2026

> I haven't looked but I think it does. When you want all the resolved edges, the tool expands the compressed FST and returns the full vector. So if you can actually store sparse data in memory then it's a win as long as you don't lose performance in having to resolve when querying.
>
> But I solved the problem another way - convert to FST to save size, and then post process the FST to dump only the signals you need and the time window you're interested in. Most of the time you only care about a handful of signals or a sub-unit or a group of signals across the design, for example all AXI buses. There's no reason to load the full VCD/FST when you're going to be querying 0.01% of the signals in the design.

Actually, we are planning to use WAL as a service, which means we can't know in advance which timestamps or signals the user will want.
This also makes FST a non-option, because we can't keep whole signals loaded in memory: they accumulate and crash the server.

snirqm commented Mar 23, 2026

@LucasKl any comment?

4 participants