Introduce TraceVcdSparse, an alternative VCD parser that stores only
value changes rather than expanding to a dense representation. This
addresses memory scalability issues when loading large simulation
waveforms where signals change infrequently.
Key implementation details:
- Sparse storage using (indices, values) tuples per signal
- O(log n) value lookup via binary search (bisect)
- Change detection to avoid storing redundant values
- Full API compatibility with existing TraceVcd
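The storage and lookup scheme listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name `SparseSignal` and its methods `record` and `value_at` are hypothetical, and it assumes samples arrive in ascending index order.

```python
from bisect import bisect_right


class SparseSignal:
    """Illustrative sparse store for one signal (hypothetical, not the PR's class)."""

    def __init__(self):
        self.indices = []  # timestamp indices where the value changed, ascending
        self.values = []   # value that takes effect at the matching index

    def record(self, index, value):
        # Change detection: skip a sample that repeats the current value.
        if self.values and self.values[-1] == value:
            return
        self.indices.append(index)
        self.values.append(value)

    def value_at(self, index):
        # Binary search for the last change at or before `index`.
        pos = bisect_right(self.indices, index) - 1
        if pos < 0:
            raise IndexError(f"no value recorded at or before index {index}")
        return self.values[pos]


sig = SparseSignal()
for idx, val in [(0, 0), (1, 0), (2, 1), (3, 1), (7, 0)]:
    sig.record(idx, val)
```

Only three of the five samples are stored here, and `sig.value_at(5)` resolves to the change recorded at index 2.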
Performance characteristics:
- 50-200x memory reduction for typical simulations (0.5-2% change rate)
- 2x slower random access (still >3M accesses/sec)
- Identical parsing overhead
Usage:
tc = TraceContainer()
tc.load('trace.vcd', sparse=True)
Includes comprehensive test suite with synthetic VCD generation
for validation across small, medium, and large simulations.
Great proposal. Storing only value changes with binary search lookup is the right approach for VCD data - it matches the inherent sparsity of hardware signals. The 50-200x memory reduction would make it practical to analyze much larger simulations.
Why don't you just use
@sjalloq We actually use VCD in most of our waveform analysis tooling, so FST is kind of out of scope for us.
I haven't looked, but I think it does. When you want all the resolved edges, the tool expands the compressed FST and returns the full vector. So if you can actually store sparse data in memory, then it's a win as long as you don't lose performance by having to resolve values when querying. But I solved the problem another way: convert to FST to save size, then post-process the FST to dump only the signals you need and the time window you're interested in. Most of the time you only care about a handful of signals, a sub-unit, or a group of signals across the design, for example all AXI buses. There's no reason to load the full VCD/FST when you're going to be querying 0.01% of the signals in the design.
Actually, we are planning to use WAL as a service, which means we can't know in advance which timestamps or signals the user will want.
@LucasKl any comment? |
Summary
This PR introduces TraceVcdSparse, an optional memory-efficient VCD parser designed for large simulation waveforms. The existing TraceVcd expands VCD data into a dense array where every signal has a value at every timestamp; this becomes prohibitively expensive for large simulations with many signals that change infrequently.
The Problem
VCD files are inherently sparse (they only record value changes), but the current parser expands this into a dense array with one value per signal per timestamp.
For a simulation with 1,000 signals over 50,000 timestamps, this requires 50 million entries even if most signals rarely change.
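The entry count above can be checked with a quick back-of-the-envelope calculation. The sketch below is only an estimate under stated assumptions: the helper names are hypothetical, the 1% change rate is an assumed value inside the 0.5-2% range quoted in the commit message, and it counts stored values only, ignoring index storage and Python object overhead.

```python
def dense_entries(num_signals, num_timestamps):
    # Dense layout: one value per signal per timestamp.
    return num_signals * num_timestamps


def sparse_entries(num_signals, num_timestamps, change_rate):
    # Sparse layout: one value per actual change (assumed uniform change rate).
    return round(num_signals * num_timestamps * change_rate)


dense = dense_entries(1_000, 50_000)          # 50,000,000 entries
sparse = sparse_entries(1_000, 50_000, 0.01)  # 500,000 entries at an assumed 1% rate
ratio = dense // sparse                       # 100x fewer stored values
```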
The Solution
TraceVcdSparse preserves the sparse nature of VCD by storing only actual value changes, as (indices, values) tuples per signal.
Value lookup uses binary search (bisect) to find the most recent change at any given index.
Key Features
- API-compatible with TraceVcd
- Opt-in via the sparse=True parameter, no breaking changes
Benchmarks
Access performance:
Usage
Files Changed
- wal/trace/vcd_sparse.py - New sparse VCD parser implementation
- wal/trace/container.py - Added sparse parameter to load()
- tests/test_vcd_sparse.py - Basic functionality tests
- tests/test_vcd_sparse_comparison.py - Large-scale comparison tests
- pyproject.toml - Registered slow pytest marker
Test Plan