Extract shared resource container infrastructure across generators and loaders #636

@frostney

Summary

Extract the duplicated binary resource container code shared by the three generator scripts (JS) and three embedded resource loaders (Pascal) into shared modules.

Why

After #607 added the Unicode data generator and loader, three independent generators and three independent loaders now contain structurally identical code. The duplication is stable — the binary format is the same across all three — so extracting it reduces maintenance surface without coupling unrelated domains.

Current behavior

JS generators: generate-timezone-data.js, generate-intl-data.js, and generate-unicode-data.js each contain identical copies of:

  • writeUInt32LE, buildResourceContainer, generateResourceFile
  • pascalUnitNameForOutput, resourceFileForOutput
  • Constants: RESOURCE_HEADER_SIZE, RESOURCE_ENTRY_SIZE, RESOURCE_FORMAT_VERSION, PASCAL_UNIT_IDENTIFIER_PATTERN, MAX_REDIRECTS, DOWNLOAD_TIMEOUT_MS

~150 lines duplicated per script.

Pascal loaders: Goccia.Temporal.TimeZoneData.pas, Goccia.Intl.CLDRData.pas, and Goccia.RegExp.UnicodeData.pas each contain structurally identical:

  • HasExpectedMagic, TryReadEmbeddedHeader, TryReadEmbeddedEntry, TryGetEntryName / TryCompareEntryName, TryFindEmbeddedEntry, TryReadEmbeddedResource

Only the constant prefixes (UCD_ / TIME_ZONE_DATA_ / CLDR_DATA_) and magic bytes differ. ~150 lines duplicated per loader.

Expected behavior

JS: A shared scripts/lib/resource-container.js module exporting the container-building and resource-compilation functions. Each generator imports the shared module and only defines its domain-specific logic (data downloading, parsing, entry collection).

Pascal: Either extend EmbeddedResourceReader.pas (which already exists and provides low-level primitives) with a generic container reader parameterized by magic bytes, or add a new shared unit, EmbeddedResourceContainer.pas, providing TryReadHeader, TryFindEntry, and TryLoadFromResource. Each loader delegates to the shared unit and only defines its domain-specific extraction logic.
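To make "parameterized by magic bytes" concrete, here is the control flow of such a reader, sketched in JavaScript so it lines up with the generator side; the actual unit would be Pascal. The header and entry field order, the sizes, and the lowercase function names are assumptions for illustration, not the existing loaders' layout.

```javascript
'use strict';

// Assumed layout: 8-byte magic + 6 x uint32 header, then 16-byte entry records.
const MAGIC_SIZE = 8;
const HEADER_SIZE = 32;
const ENTRY_SIZE = 16;

// Bounds check in the spirit of HasBytesAvailable.
function hasBytesAvailable(bytes, offset, count) {
  return offset >= 0 && count >= 0 && offset + count <= bytes.length;
}

// Returns the parsed header, or null if the buffer is too short or the magic differs.
function tryReadHeader(bytes, magic) {
  if (!hasBytesAvailable(bytes, 0, HEADER_SIZE)) return null;
  if (bytes.toString('latin1', 0, MAGIC_SIZE) !== magic) return null;
  return {
    version: bytes.readUInt32LE(8),
    entryCount: bytes.readUInt32LE(12),
    entryTableOffset: bytes.readUInt32LE(16),
  };
}

// Linear scan of the entry table; returns the entry's data bytes or null.
function tryFindEntry(bytes, magic, name) {
  const header = tryReadHeader(bytes, magic);
  if (header === null) return null;
  for (let i = 0; i < header.entryCount; i++) {
    const at = header.entryTableOffset + i * ENTRY_SIZE;
    if (!hasBytesAvailable(bytes, at, ENTRY_SIZE)) return null;
    const nameOffset = bytes.readUInt32LE(at);
    const nameLength = bytes.readUInt32LE(at + 4);
    const dataOffset = bytes.readUInt32LE(at + 8);
    const dataLength = bytes.readUInt32LE(at + 12);
    if (!hasBytesAvailable(bytes, nameOffset, nameLength)) return null;
    if (!hasBytesAvailable(bytes, dataOffset, dataLength)) return null;
    if (bytes.toString('utf8', nameOffset, nameOffset + nameLength) === name) {
      return bytes.subarray(dataOffset, dataOffset + dataLength);
    }
  }
  return null;
}
```

The only per-resource input is the `magic` argument, which is the point of the extraction: each loader would pass its own magic and keep just the code that interprets the returned bytes.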

Scope notes

  • The binary format (8-byte magic, 6×uint32 header, 16-byte entry table, names section, data section) is already unified across all three resources — this is a pure deduplication, not a format change.
  • EmbeddedResourceReader.pas already provides HasBytesAvailable, ReadUInt32LE, and CopyStringFromBytes. The header/entry/search layer is the missing abstraction.
  • The downloadFile function is also duplicated between the timezone and unicode generators but not the Intl generator (which uses npm). Extract downloadFile only if it can be shared cleanly.
  • Related: #607, "Generate full Unicode property tables for regex \p{} escapes", which introduced the third copy.

Labels: internal (Refactoring, CI, tooling, cleanup)