
v0.1.0: DSN binary parser #47

Merged: valentinozegna merged 49 commits into main from release/v0.1.0 on Mar 10, 2026

@valentinozegna

Summary

  • Complete DSN binary parser for Cadence .DSN schematic files (CFBF/OLE container format), providing direct netlist extraction without Cadence's exported .dat files
  • 100% pin number coverage and 96.1% pin name coverage across 9 test fixtures
  • New CLI commands (--export-json, --coverage), DNS (do-not-stuff) detection, Altium encoding/overbar fixes
  • Internal refactors: parser module splits, service module splits, script consolidation

Test plan

  • npm run type-check passes
  • npm run lint passes
  • npm test passes (393 tests)

valentinozegna and others added 30 commits March 6, 2026 11:08
…al path support

Move OLE/CFBF reader from parsers/altium/ to parsers/ole-reader/ so it can be
shared between Altium and Cadence DSN parsers. Add hierarchical directory
tree traversal (childId/leftSiblingId/rightSiblingId), readStreamByPath(),
and listAllEntries() for nested CFBF containers like .DSN files.
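The traversal described here can be sketched as follows. This is a minimal illustration, not the project's OleReader: the `DirEntry` shape, `listChildren`, and `findEntryByPath` are hypothetical names, and the directory entries are assumed to be already parsed into an array indexed by entry id. In a CFBF directory, each storage points at one child, and the remaining children hang off that child as a binary tree of left/right siblings.

```typescript
// Hypothetical, already-parsed CFBF directory entry (field names follow the commit text).
interface DirEntry {
  name: string;
  childId: number; // root of this storage's child tree, or NO_STREAM
  leftSiblingId: number;
  rightSiblingId: number;
}

const NO_STREAM = 0xffffffff; // CFBF marker for "no entry"

// Collect the direct children of a storage by walking its sibling tree.
function listChildren(entries: DirEntry[], parentId: number): number[] {
  const out: number[] = [];
  const seen = new Set<number>(); // guards against cycles in malformed files
  const stack: number[] = [entries[parentId].childId];
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (id === NO_STREAM || id >= entries.length || seen.has(id)) continue;
    seen.add(id);
    out.push(id);
    stack.push(entries[id].leftSiblingId, entries[id].rightSiblingId);
  }
  return out;
}

// Resolve a slash-separated path to an entry id, starting at the root (entry 0).
// Names are compared case-insensitively here, which is an assumption of this sketch.
function findEntryByPath(entries: DirEntry[], path: string): number | undefined {
  let current = 0;
  for (const part of path.split("/").filter((p) => p.length > 0)) {
    const next = listChildren(entries, current).find(
      (id) => entries[id].name.toLowerCase() === part.toLowerCase(),
    );
    if (next === undefined) return undefined;
    current = next;
  }
  return current;
}
```

A real reader would then follow the entry's start sector to load the stream bytes; that step is omitted here.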
Validate OleReader hierarchical path support against BeagleBone-Black DSN
fixture. Container has 11 page streams under Views/BeagleBoneBlack/Pages/,
a Library stream, and Packages Directory stream.

Port of DataStream.cpp from OpenOrCadParser. Little-endian integer reads,
ASCII string reads (zero-terminated, length-prefixed, length+zero), position
tracking, bounds checking, and data assertion.
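A rough sketch of such a reader, assuming a `Uint8Array` input. The class name `ByteReader` and its method names are illustrative rather than the ported API, and the 16-bit length prefix is an assumption of this example.

```typescript
// Illustrative little-endian reader with position tracking and bounds checks.
class ByteReader {
  private readonly view: DataView;
  private pos = 0;

  constructor(bytes: Uint8Array) {
    this.view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  }

  get offset(): number {
    return this.pos;
  }

  // Throw before any read that would run past the end of the data.
  private require(count: number): void {
    if (this.pos + count > this.view.byteLength) {
      throw new RangeError(`read of ${count} bytes at offset ${this.pos} exceeds stream length`);
    }
  }

  readUint16(): number {
    this.require(2);
    const value = this.view.getUint16(this.pos, true); // true = little-endian
    this.pos += 2;
    return value;
  }

  readUint32(): number {
    this.require(4);
    const value = this.view.getUint32(this.pos, true);
    this.pos += 4;
    return value;
  }

  // ASCII string terminated by a zero byte (the terminator is consumed).
  readZeroTerminated(): string {
    let text = "";
    for (;;) {
      this.require(1);
      const byte = this.view.getUint8(this.pos++);
      if (byte === 0) return text;
      text += String.fromCharCode(byte);
    }
  }

  // ASCII string preceded by its length (assumed uint16 in this sketch).
  readLengthPrefixed(): string {
    const length = this.readUint16();
    this.require(length);
    let text = "";
    for (let i = 0; i < length; i++) text += String.fromCharCode(this.view.getUint8(this.pos++));
    return text;
  }
}
```

Checking bounds before every read turns a truncated or misaligned stream into an immediate `RangeError` instead of silently returning garbage.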
Port GenericParser.cpp and FutureData.hpp from OpenOrCadParser. Implements
prefix chain detection (auto-detect count 10 down to 1), long/short prefix
reading, preamble magic (0xff 0xe4 0x5c 0x39), and FutureDataList checkpoint
system for structure boundary validation.
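As a small illustration of the preamble check only, the magic bytes quoted above can be matched like this. The helper names `hasPreambleAt` and `findPreamble` are hypothetical, and the sketch makes no claim about where the preamble sits relative to the prefix chain.

```typescript
// Magic bytes quoted in the commit message.
const PREAMBLE_MAGIC = [0xff, 0xe4, 0x5c, 0x39];

// True when the four magic bytes start exactly at `offset`.
function hasPreambleAt(bytes: Uint8Array, offset: number): boolean {
  if (offset < 0 || offset + PREAMBLE_MAGIC.length > bytes.length) return false;
  return PREAMBLE_MAGIC.every((magicByte, i) => bytes[offset + i] === magicByte);
}

// Offset of the first preamble at or after `from`, or -1 when none is found.
function findPreamble(bytes: Uint8Array, from = 0): number {
  for (let i = Math.max(0, from); i + PREAMBLE_MAGIC.length <= bytes.length; i++) {
    if (hasPreambleAt(bytes, i)) return i;
  }
  return -1;
}
```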
Port 11 structure parsers from OpenOrCadParser: SymbolDisplayProp, Alias,
Wire (scalar/bus), T0x10 (pin instances), PlacedInstance, GraphicInst base,
Global, Port, OffPageConnector, Device, and Package. Each follows the
prefix-preamble-checkpoint pattern from the C++ reference.

Parses Page streams from DSN CFBF containers to extract nets and
components using coordinate-based wire-to-pin matching. Handles
primitive T0x34/T0x35 structures, net tables, globals, ports, and
off-page connectors. Smoke test validates 331 nets and 414 components
from BeagleBone-Black fixture.

cadenceHandler.parse() now falls back to direct DSN binary parsing
when .dat export files are not available. Comparison tests against
BeagleBone-Black show 94.9% net coverage and 99.7% component coverage
versus the DAT parser output.

Cadence's Allegro netlist export uppercases all net names in .dat files.
The DSN schematic preserves original mixed case (e.g. SYS_RESETn vs
SYS_RESETN). Uppercasing at parse time ensures consistent net names
across both parsing paths.

Add DSN Parser Coverage test suite that compares direct DSN binary
parsing against DAT-derived golden output for all Cadence fixtures.
Component coverage is 100% across all 10 designs. Net coverage ranges
from 59.8% to 100%, with gaps from unnamed auto-generated nets and
unresolved multi-segment wire junctions.

Regenerate BEAGLEBONEBLK_C3 golden from freshly exported dat files.

Replace coordinate-only net matching with netId-based pin grouping
and page-scoped coordinate resolution. This fixes cross-page coordinate
collisions and enables unnamed net synthesis via N{netId}, dramatically
improving DSN parser coverage (e.g. BeagleBone-Black 95.5% -> 99.7%).

Add reusable debug scripts: dsn-coverage-report.ts for comparing DSN
vs DAT golden output, dsn-inspect.ts for low-level binary inspection.

- Add Union-Find wire graph to propagate net names through connected
  wire segments, replacing per-coordinate lookups
- Collect ALL wire aliases (not just first) to resolve dual-name nets
  like USBDM_1/USBDM2 on the same wire
- Use alphabetical-first resolution when a wire group has multiple
  candidate names, matching Cadence CIS export behavior
- Discover wire segmentId field (previously skipped 4 bytes): this is
  the per-segment database object ID that Cadence uses for auto-generated
  net names (N{minSegmentId})
- Synthesize N{minSegmentId} for unnamed wire groups, matching DAT export
- Add sentinel netId handling: netId=0 maps to NC, 0xFFFFFFFF skipped
- Exclude global/port/OPC symbol names from net resolution (they contain
  symbol type names like VCC_BAR, not net names like VDD_CORE)
- Refactor buildNetConnectivity into focused sub-functions:
  buildPageCoordMap, collectPins, assembleNets
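The wire-graph idea in the list above can be sketched roughly as follows. The `WireSegment` shape and `resolveWireNets` are hypothetical, endpoints are keyed as plain strings such as `"x,y"`, and "alphabetical-first" is approximated with JavaScript's default string sort, which may not match Cadence's actual ordering.

```typescript
// Minimal union-find over string keys with path compression.
class UnionFind {
  private readonly parent = new Map<string, string>();

  find(x: string): string {
    let root = x;
    for (let p = this.parent.get(root); p !== undefined && p !== root; p = this.parent.get(root)) {
      root = p;
    }
    // Point every node on the walked path directly at the root.
    let cur = x;
    while (cur !== root) {
      const next = this.parent.get(cur) ?? root;
      this.parent.set(cur, root);
      cur = next;
    }
    return root;
  }

  union(a: string, b: string): void {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra !== rb) this.parent.set(ra, rb);
  }
}

// Hypothetical wire record: two endpoint keys plus any alias names on the segment.
interface WireSegment {
  segmentId: number;
  from: string;
  to: string;
  aliases: string[];
}

// Map each segmentId to its net name: the first alias in sort order for the
// connected group, or a synthesized N{minSegmentId} when the group has no alias.
function resolveWireNets(wires: WireSegment[]): Map<number, string> {
  const uf = new UnionFind();
  for (const w of wires) uf.union(w.from, w.to);

  const groups = new Map<string, { names: string[]; minSegmentId: number }>();
  for (const w of wires) {
    const root = uf.find(w.from);
    const group = groups.get(root) ?? { names: [], minSegmentId: w.segmentId };
    group.names.push(...w.aliases);
    group.minSegmentId = Math.min(group.minSegmentId, w.segmentId);
    groups.set(root, group);
  }

  const result = new Map<number, string>();
  for (const w of wires) {
    const group = groups.get(uf.find(w.from))!;
    const name = group.names.length > 0 ? [...group.names].sort()[0] : `N${group.minSegmentId}`;
    result.set(w.segmentId, name);
  }
  return result;
}
```

Grouping by connected component first and naming second is what lets a name on one segment reach every segment it touches, which a per-coordinate lookup cannot do.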

Coverage improvement (DSN vs DAT golden):
  7/10 fixtures at 100% (was 0/10)
  Aggregate: 97.2% (was 95.8%), missing 128 (was 193), extra 35 (was 1134)

Add DSN debug scripts: dsn-gap-analysis, dsn-check-ports, dsn-wire-trace,
dsn-find-wire, dsn-wire-fields

- OPC midpoint matching: connect OPCs to wires via 5 candidate edge
  midpoints (right, left, top, bottom, loc), avoiding false unions
  from overlapping bboxes on dense schematics
- OPC pairing: union OPCs sharing the same pairingId for intra-page
  and cross-page net equivalence
- Multi-name netId: store all net table names per wireId (not just
  the last one) so hierarchy preference can resolve ties (e.g.,
  VOLUP vs GPIO2 on same netId, hierarchy picks VOLUP)
- Sentinel pin handling: process 0xFFFFFFFF pins that have a
  coordinate-resolved net name instead of dropping them
- Library strLst: fall back to uint32 count when uint16 fails
- PropPairs threading: capture short prefix name/value pairs in
  GraphicInst for OPC label extraction
- Add 20 DSN debug/inspection scripts
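The midpoint matching in the first bullet might look something like this sketch. The `Point` and `BBox` shapes and both function names are assumptions, and an exact coordinate match is assumed (no tolerance).

```typescript
interface Point {
  x: number;
  y: number;
}

interface BBox {
  x1: number;
  y1: number;
  x2: number;
  y2: number;
}

// Five candidates: right, left, top, and bottom edge midpoints, plus the symbol location.
function candidatePoints(bbox: BBox, loc: Point): Point[] {
  const left = Math.min(bbox.x1, bbox.x2);
  const right = Math.max(bbox.x1, bbox.x2);
  const top = Math.min(bbox.y1, bbox.y2);
  const bottom = Math.max(bbox.y1, bbox.y2);
  const midX = (left + right) / 2;
  const midY = (top + bottom) / 2;
  return [
    { x: right, y: midY },
    { x: left, y: midY },
    { x: midX, y: top },
    { x: midX, y: bottom },
    loc,
  ];
}

// Return the first wire endpoint that coincides with a candidate point, if any.
function matchWireEndpoint(bbox: BBox, loc: Point, endpoints: Point[]): Point | undefined {
  const candidates = candidatePoints(bbox, loc);
  return endpoints.find((e) => candidates.some((c) => c.x === e.x && c.y === e.y));
}
```

Testing a handful of exact points, rather than "any endpoint inside the bbox", is what avoids the false unions the commit mentions: an unrelated wire passing through an overlapping box no longer matches.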
The OPC label extraction via propPairs was a dead end: the labels
are already in the net table (multiple names per netId), not in
the short prefix property pairs. Remove the propPairs parameter
from autoReadPrefixes, readPrefixes, readSinglePrefixShort,
parseGraphicInstBase, and the GraphicInst interface.

Add disambiguateCrossPageNets() to resolve duplicate net names across
pages using hierarchy-suffixed names matched by sort order (two-pointer).
Brings reServer J401 v11 from 89.1% to 100% net coverage.

Cleanup: remove unused parseLibraryStrLst, no-op sanitizeCheckpoints,
dead symbolDisplayProps loop in buildComponents, stale comments. Fix
in-place mutation of pageCoordMaps in buildNetConnectivity.

Aggregate DSN coverage: 99.8% (4594/4601), 0 extra.

Handle direct pin-to-pin overlaps (no wire) by grouping sentinel pins
at shared coordinates and matching them to unmatched N{number} hierarchy
names. This closes the last 7 coverage gaps (LAUNCHXL 2, CutiePi 5),
bringing DSN-to-DAT parity to 100% (4601/4601 nets across 10 fixtures).

Refactor assembleNets into composed functions: classifyPins separates
pins into 4 categories, resolveNetIdName and resolveWirelessSentinelNets
handle naming independently, and assembleNets orchestrates.

Also remove stale sanitizeCheckpoints calls from debug scripts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Delete 24 one-off debug scripts written during DSN parser development.
All functionality absorbed into dsn-inspect.ts as 8 new commands:
nettable, symbols, wire, wiretrace, conflicts, hierarchy, streams, stream.

Three scripts remain: dsn-inspect.ts (13 commands covering full DSN
visibility), dsn-coverage-report.ts, and dsn-gap-analysis.ts.

Also adds tsconfig.check.json for type-checking scripts alongside src,
fixes no-explicit-any lint errors in coverage/gap scripts, and updates
AGENTS.md documentation.

Update package.json scripts to use tsconfig.check.json for type-checking
and include scripts/**/*.ts in lint targets. Fix zod indentation.

Remove obsolete cadence.zip test fixture archive.

Route parseCadenceDesign on file extension: .dsn goes to DSN binary
parser, .dat goes to DAT parser. No internal fallback; the agent
controls which path to use.
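In outline, extension-based routing with no internal fallback could be sketched as below. `pickCadenceParser` is a hypothetical helper, not the signature of parseCadenceDesign, and the case-insensitive comparison is an assumption.

```typescript
type CadenceSource = "dsn" | "dat";

// Choose the parsing path purely from the file extension; anything else is an error.
function pickCadenceParser(filePath: string): CadenceSource {
  const lower = filePath.toLowerCase();
  if (lower.endsWith(".dsn")) return "dsn";
  if (lower.endsWith(".dat")) return "dat";
  throw new Error(`Unsupported Cadence file extension: ${filePath}`);
}
```

Throwing on unknown extensions keeps the choice explicit for the caller instead of silently falling back to the other parser.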

Remove discovery error for DSN designs without DAT files since DSN
parsing no longer requires them. Golden tests always parse via DAT
(gold standard) by resolving .dsn paths to their pstxnet.dat.

…N output

Parse Packages/{name} OLE streams to extract Device.pinMap, resolving
physical pin numbers (e.g., BGA A1, QFN pad numbers) instead of using
sequential indices. Set component mpn to sourcePackage as a baseline.
Add explore-packages and verify-pin-numbers scripts.

… matching

Add findPinMap with three matching strategies: direct sourcePackage,
multi-unit pkgName extraction (handles doubled unit refs like AA→A),
and version suffix normalization (_0_ → _0.0_). Pin number match rate
improves from 86% to 90% aggregate across all fixtures.

…t fields

Add dns?: boolean to ComponentDetails, set by parsers via isDnsComponent().
Strip DNS/DNI/DNP/DNF/NF marker tokens from mpn and value fields after
detection. Altium parser now checks Assembly Info parameter for NF markers.
Service and traversal layers read component.dns instead of recomputing.
Add --all flag to gen-golden.ts for batch regeneration.

Add NC (Not Connected) and DNM (Do Not Mount) to DNS_PATTERN and
stripDnsMarkers. CutiePi uses _NC suffix, LAUNCHXL-CC1310 uses _DNM.
Strip marker and trailing content from mid-string positions too.
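One way the detection and stripping could work is sketched below. The regular expression is an illustrative guess, not the project's actual DNS_PATTERN, and `hasDnsMarker` is a hypothetical name. Markers only count when they stand as whole tokens, so a value like `100nF` is not mistaken for an `NF` marker.

```typescript
// Marker tokens named in the commits; must be delimited by non-alphanumerics or string edges.
const DNS_MARKER = /(?:^|[^A-Za-z0-9])(?:DNS|DNI|DNP|DNF|DNM|NF|NC)(?![A-Za-z0-9])/i;

// True when the text carries a do-not-stuff style marker token.
function hasDnsMarker(text: string): boolean {
  return DNS_MARKER.test(text);
}

// Cut the text at the marker, dropping the marker and everything after it,
// then trim separator characters left dangling at the end.
function stripDnsMarkers(text: string): string {
  const match = DNS_MARKER.exec(text);
  if (match === null) return text;
  return text.slice(0, match.index).replace(/[\s_\-\/,(]+$/, "");
}
```

For example, `stripDnsMarkers("10K_DNM")` yields `"10K"`, while `"100nF"` is returned unchanged.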
Parse Library stream strLst string table and resolve PlacedInstance
prefix property pairs to extract real MPN and Value fields. Parse
LibraryPart SymbolPins from Package streams for functional pin names.

- Extract PrefixPropertyPair from short prefix in generic-parser
- Add library-parser.ts for Library stream strLst extraction
- Add parseSymbolPin/parseLibraryPart to structures.ts
- Use T0x10.pinIndex (1-based) for accurate pin map lookup
- Value: 3-source priority (prefix > partValueIdx > LibraryPart default)
- MPN: prefix properties with fallback to sourcePackage
- Pin names: LibraryPart SymbolPin lookup via pinIndex
- Add field-level coverage (Value, PinNum, PinName, MPN) to coverage report

The hook was scanning every word in a command, including arguments and
string literals. Words like 'more' in a commit message triggered false
positives. Now it breaks after finding the first command name per segment.

Three complementary fixes:
- Scan Cache stream for Package/Device structures as fallback pinMap source
- Merge multi-unit component pins (e.g. resistor packs with multiple PlacedInstances)
- Expand findPinMap matching with trailing _N suffix stripping and dual-key indexing

…Name to 85.9%

Replace brute-force byte scanning with sequential parsing based on
OpenOrCadParser StreamCache.cpp reference. The Cache is now parsed
entry-by-entry from header to EOF, extracting both Package (pin maps)
and LibraryPart (pin names) structures. BB-Black PinNum jumps from
66.7% to 99.3%, PinName from 0% to 98.4%.

Cache recovery scanner, findCachedPart matching, pin name
uppercasing/disambiguation, DNS value cleanup, and unit "A" fallback
bring coverage to PinNum 99.1%, PinName 96.0%, Value 99.9%.

Update docs/dsn-format.md with all findings. Add DSN parser reference
section to CLAUDE.md pointing to C++ source and format spec.

DSN binary stores some values uppercased (e.g., 100PF) while pstchip.dat
preserves original case (100pF). Count these as matches but flag them as
"case-transformed" in the report output.
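A sketch of that three-way comparison, with a hypothetical `compareValues` helper and result labels chosen for illustration:

```typescript
type ValueMatch = "exact" | "case-transformed" | "mismatch";

// Classify a DSN value against its DAT counterpart.
function compareValues(dsnValue: string, datValue: string): ValueMatch {
  if (dsnValue === datValue) return "exact";
  // Same text apart from letter case: counted as a match, but flagged.
  if (dsnValue.toUpperCase() === datValue.toUpperCase()) return "case-transformed";
  return "mismatch";
}
```

Keeping "case-transformed" as its own category lets the report count `100PF` vs `100pF` as a match without hiding that the two sources differ.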

Also add scripts/tsconfig.json to resolve IDE diagnostics for scripts.

…um 99.1% → 99.8%)

Multi-section components (resistor packs, transistor arrays) where all
sections share the same pkgName now get the correct pinMap via positional
dbId-order assignment instead of falling back to device "A" for all sections.

…nNum 99.8% → 100.0%)

Sentinel pins (netId=0xFFFFFFFF) that overlap power/ground global symbols
were unresolvable because they connect via geometric overlap, not wires.
Three fixes: (1) bbox containment matching connects globals to wire graph,
(2) pairingId-based fallback resolves net names for pins inside global
symbol bboxes across pages, (3) netId=0 pins with wire connections are
no longer silently dropped. Also adds point-on-segment matching for pins
on wire bodies (not just endpoints).
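The two geometric tests mentioned here, point-on-segment and bbox containment, can be sketched as follows. Integer schematic coordinates are assumed, so the collinearity check can be exact; the type and function names are illustrative.

```typescript
interface Point {
  x: number;
  y: number;
}

interface BBox {
  x1: number;
  y1: number;
  x2: number;
  y2: number;
}

// True when p lies on the segment a-b, including both endpoints.
function pointOnSegment(p: Point, a: Point, b: Point): boolean {
  // A zero cross product means a, b, and p are collinear.
  const cross = (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
  if (cross !== 0) return false;
  return (
    p.x >= Math.min(a.x, b.x) &&
    p.x <= Math.max(a.x, b.x) &&
    p.y >= Math.min(a.y, b.y) &&
    p.y <= Math.max(a.y, b.y)
  );
}

// True when p lies inside or on the border of the box, whatever the corner order.
function bboxContains(box: BBox, p: Point): boolean {
  return (
    p.x >= Math.min(box.x1, box.x2) &&
    p.x <= Math.max(box.x1, box.x2) &&
    p.y >= Math.min(box.y1, box.y2) &&
    p.y <= Math.max(box.y1, box.y2)
  );
}
```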
Pins connected directly to off-page connectors (no wire) were missing
because the OPC had no wire endpoint to match. Resolves net names via
OPC pairingId cross-page lookup: if the same OPC appears on another
page with a wire connection, the net name propagates back.

…→ 100.0%)

GraphicInst first 8 bytes are strLst indices: name_str_idx (net name)
and lib_str_idx (source library path), not unknown/pairing bytes.
OPC net names are now resolved directly via strLst lookup, fixing
pins connected to OPCs with no wire (e.g., U3.30/LOL, U3.32/LOR).

…inNum 100%)

When a physical package has more pads than the schematic symbol exposes
(e.g., a 2-pin crystal in a 4-pad XTAL-CM200S package), the Packages/
stream pinMap contains all physical pads while the Cache stream pinMap
contains only the schematic-level pins. The parser now stores Cache
pinMaps separately and falls back to them when the Packages/ pinMap
has more entries than the instance's T0x10 count.
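The fallback rule can be sketched as a small selection function. `selectPinMap` and the `PinMap` alias are hypothetical, with a pin map modeled simply as an array of physical pin numbers.

```typescript
// Illustrative pin map: index = schematic pin position, value = physical pin number.
type PinMap = string[];

// Prefer the Packages/ pin map unless it lists more pads than the instance has pins
// and a Cache pin map is available to fall back to.
function selectPinMap(
  packagesPinMap: PinMap | undefined,
  cachePinMap: PinMap | undefined,
  instancePinCount: number,
): PinMap | undefined {
  if (packagesPinMap === undefined) return cachePinMap;
  const oversized = packagesPinMap.length > instancePinCount;
  return oversized && cachePinMap !== undefined ? cachePinMap : packagesPinMap;
}
```

In the crystal example, a 4-entry Packages/ map against 2 instance pins would select the 2-entry Cache map.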

Also: coverage report now shows missing PinNum details in verbose mode,
and CLAUDE.md updated with mandatory C++ reference workflow.

Extract 5 modules from the 1699-line monolith:
- cache-parser.ts: Cache stream parsing
- page-parser.ts: Page, Package, Hierarchy stream parsing
- pin-resolver.ts: pin number resolution (shared leaf module)
- net-builder.ts: wire graph connectivity and net assembly
- component-builder.ts: MPN, value, and pin name enrichment

Introduce PinMapData context type to replace threading 3 separate
Maps through ~10 function signatures. Add CachedLibraryPart to
structure-types.ts. dsn-parser.ts reduced to ~145-line orchestrator.

Add OpenOrCadParser attribution to NOTICE and README.md.

Extract coverage analysis from dev script into src/coverage.ts so
the compiled binary can compare DSN parser output against DAT netlist
exports. Reports field-level parity (nets, comps, value, MPN, DNS,
pinNum, pinName) with markdown file export and optional --verbose mode.

list_designs now returns pstxnet.dat as path (preferred) with source
pointing to the .DSN schematic when exported .dat files exist. Extracts
all server/tool description strings into src/descriptions.ts.

Extract the monolithic service.ts into focused modules under src/service/:
- service/index.ts (re-export hub)
- service/load-netlist.ts (design loading)
- service/component-grouping.ts (component aggregation)
- service/regex-helpers.ts (input validation)
- service/tools/ (one file per MCP tool handler)

Its only consumer is cadence-export.ts, so colocate it there.

…bar unescaping

Prefix matching used startsWith(), causing L to match LED, C to match CON, etc.
Now uses getRefdesPrefix() with Set lookup for exact matches.
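To illustrate the difference, here is a sketch of exact prefix matching. The body of `getRefdesPrefix` is a guess at what such a helper might do (take the leading letters of the reference designator); only its name comes from the commit, and `hasRefdesPrefix` is hypothetical.

```typescript
// Leading alphabetic part of a reference designator, e.g. "LED3" -> "LED".
function getRefdesPrefix(refdes: string): string {
  const match = /^[A-Za-z]+/.exec(refdes);
  return match ? match[0].toUpperCase() : "";
}

// Exact membership test against a set of allowed prefixes.
function hasRefdesPrefix(refdes: string, prefixes: Set<string>): boolean {
  return prefixes.has(getRefdesPrefix(refdes));
}
```

With `new Set(["L"])`, `L12` matches but `LED3` does not, whereas `"LED3".startsWith("L")` is true, which is the false positive described above.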

Altium record parser now tries UTF-8 first and falls back to latin1 for
Windows-1252 encoded files (fixes corrupted µ, ±, ° characters).
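A minimal sketch of the try-UTF-8-then-fall-back approach using the standard `TextDecoder`, assuming a runtime (such as a recent Node.js) that supports the `latin1` label. The function name `decodeRecordText` is hypothetical.

```typescript
// Decode as strict UTF-8 first; on invalid byte sequences, decode as latin1 instead.
// In the WHATWG Encoding Standard, the "latin1" label maps to windows-1252.
function decodeRecordText(bytes: Uint8Array): string {
  try {
    // fatal: true makes decode() throw on malformed UTF-8 instead of emitting U+FFFD.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    return new TextDecoder("latin1").decode(bytes);
  }
}
```

A lone byte such as 0xB5 (µ in Windows-1252) is not valid UTF-8, so it triggers the fallback rather than being replaced with a substitution character.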

Altium net names with overbar notation (\V\C\C) are now unescaped to
plain text (VCC) so they can be queried normally.
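Under the simplest reading of that notation, unescaping amounts to dropping the backslashes. The helper below is a hypothetical sketch and assumes backslashes never appear in net names for any other reason.

```typescript
// Remove overbar backslashes so that \V\C\C becomes VCC.
function unescapeOverbar(netName: string): string {
  return netName.replace(/\\/g, "");
}
```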
@valentinozegna valentinozegna merged commit de93e68 into main Mar 10, 2026
2 checks passed