Release v1.3.0: COLRv1 colour emoji, USE-lite shaper integration, true streaming, UAX #9 X4–X5, Telugu + 5-script expansion, opt-in NFC normalization, CSPRNG-only crypto, configurable block limit, validatePdfUA(), #48 fix #49

Merged
Nizoka merged 27 commits into main from release/v1.3.0 on Jun 8, 2026

Conversation

Nizoka (Owner) commented Jun 8, 2026

Summary

Ships the complete v1.3.0 roadmap plus the Telugu script and a
five-script linguistic expansion (Amharic/Ethiopic, Sinhala, Tibetan,
Khmer, Myanmar — 17 → 22 Unicode scripts), opt-in Unicode normalization
(layout.normalize), CSPRNG-only crypto randomness, a configurable document
block limit (layout.maxBlocks), a read-only validatePdfUA() structural
checker, two colour-emoji robustness fixes, and the fix for bug #48. 100%
backward-compatible: every new feature is additive or opt-in; unchanged code
paths are byte-identical.
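
As a rough illustration of why an opt-in, default-off normalization step keeps existing output unchanged, the sketch below applies a Unicode normalization form only when one is requested. `applyNormalization` is a hypothetical helper written for this note, not pdfnative's implementation; it relies only on the standard `String.prototype.normalize`.

```typescript
type NormalizationForm = 'NFC' | 'NFD' | 'NFKC' | 'NFKD';

// Hypothetical helper: leaves text untouched unless a form is requested.
function applyNormalization(text: string, form?: NormalizationForm): string {
  return form === undefined ? text : text.normalize(form);
}

const decomposed = 'e\u0301'; // "e" followed by a combining acute accent
const untouched = applyNormalization(decomposed); // default off: same string back
const composed = applyNormalization(decomposed, 'NFC'); // single code point U+00E9
```

With no form supplied the input passes through verbatim, which is the property the "default off → byte-identical" claim depends on.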

Zero runtime dependencies. 71 test files / 1982 tests, all green.

What's in it

| Roadmap item | Status |
|---|---|
| COLRv1 colour emoji | ✅ solid + linear + radial gradients as PDF Form XObjects; opt-in Noto Color Emoji subset |
| USE-lite shaper integration | ✅ classifier is now the joiner authority for Devanagari/Bengali/Tamil |
| Internal page-by-page assembly | buildPDFStreamTrue / buildDocumentPDFStreamTrue; parts freed as yielded |
| Pixel-diff visual regression | ✅ glyph-position snapshot + self-rendered glyf raster PNG diff + CI workflow |
| UAX #9 X4–X5 overrides | ✅ character-level direction override inside LRO/RLO |
| Telugu script (te) | ✅ pure-JS GSUB/GPOS shaper + Noto Sans Telugu subset; 17th shaped script |
| 5-script expansion (am/si/bo/km/my) | ✅ Amharic/Ethiopic + Sinhala + Tibetan + Khmer + Myanmar shapers (17 → 22 scripts); Khmer/Myanmar pragmatic USE-lite |
| Opt-in Unicode normalization | layout.normalize (NFC/NFD/NFKC/NFKD), default off → byte-identical |
| CSPRNG-only crypto | ✅ encryption throws when no crypto.getRandomValues; no Math.random fallback |
| Configurable block limit | layout.maxBlocks, default raised to 100 000 (DEFAULT_MAX_BLOCKS) |
| validatePdfUA() | ✅ read-only ISO 14289-1 structural checker (MarkInfo/StructTree/ParentTree/Lang/MCID) |
| Colour-emoji selector drop | ✅ VS-15/16, ZWJ/ZWNJ, skin tones dropped (no tofu) via isZeroWidthFormat() |
| Colour-emoji computed /BBox | ✅ Form /BBox from contour bounds; no baseline clipping |
| Bug #48 (CP-1252 / €) | /ToUnicode on base-14 fonts + latin-font embedding |
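
The CSPRNG-only row can be pictured with the sketch below: random bytes come from `crypto.getRandomValues` or the call throws, with no `Math.random` path. `secureRandomBytes` and `RandomSource` are hypothetical names for this illustration, and the injectable `source` parameter is an assumption added to make the failure case easy to exercise.

```typescript
// Minimal shape of the Web Crypto source this sketch relies on.
interface RandomSource {
  getRandomValues(array: Uint8Array): Uint8Array;
}

// Resolve the platform CSPRNG once; null when the runtime has none.
const globalCrypto =
  (globalThis as unknown as { crypto?: RandomSource }).crypto ?? null;

// Hypothetical helper: returns `length` random bytes or throws.
function secureRandomBytes(
  length: number,
  source: RandomSource | null = globalCrypto,
): Uint8Array {
  if (source === null || typeof source.getRandomValues !== 'function') {
    // Deliberately no Math.random fallback: weak keys are worse than an error.
    throw new Error('Encryption requires crypto.getRandomValues');
  }
  return source.getRandomValues(new Uint8Array(length));
}
```

Failing loudly means an environment without a CSPRNG cannot silently produce predictable encryption keys.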

Commits (release/v1.3.0)

Related Issues

Fixes #48

Verification checklist

  • npm run typecheck:all
  • npm run lint
  • npm run test (71 files / 1982 tests, all green)
  • npm run build (ESM + CJS + .d.ts)
  • npm run test:generate (187 sample PDFs)
  • npm run validate:pdfa (veraPDF — runs in CI)
  • visual-regression suite green (tests/visual/ in npm run test)
  • zero runtime dependencies confirmed

Downstream

  • pdfnative-mcp and pdfnative-cli re-pin pdfnative@^1.3.0; expose Telugu
    (te), layout.maxBlocks, and optionally validatePdfUA().
  • No breaking API changes; new public surface only.

Deferred to v1.4.0

Document outline / bookmarks (/Outlines); /PageLabels; streamToFile()
Node helper.

Nizoka added 26 commits May 31, 2026 11:51
Euro and CP1252 0x80-0x9F glyphs now carry a /ToUnicode map so they are selectable/searchable and resolve in minimal viewers. Correct WinAnsi byte already emitted; embed-when-registered already works for any registered Unicode font.
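
To make the /ToUnicode idea concrete, here is a minimal sketch that lists the byte-to-Unicode pairs for the CP-1252 0x80–0x9F range in the `<src> <dst>` form used inside a CMap `beginbfchar` section. The table values are the standard Windows-1252 assignments; `cp1252BfcharEntries` is a hypothetical helper and does not claim to match pdfnative's emitter, which also needs the surrounding CMap header and footer.

```typescript
// Unicode code points for CP-1252 bytes 0x80-0x9F; 0 marks the five undefined slots.
const CP1252_HIGH: readonly number[] = [
  0x20ac, 0, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0, 0x017d, 0,
  0, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0, 0x017e, 0x0178,
];

// Upper-case hex, zero-padded to the given width.
const hex = (n: number, width: number): string =>
  n.toString(16).toUpperCase().padStart(width, '0');

// Hypothetical helper: one "<byte> <unicode>" line per defined slot.
function cp1252BfcharEntries(): string[] {
  const entries: string[] = [];
  CP1252_HIGH.forEach((codePoint, i) => {
    if (codePoint !== 0) {
      entries.push(`<${hex(0x80 + i, 2)}> <${hex(codePoint, 4)}>`);
    }
  });
  return entries;
}
```

The first entry maps byte 0x80 to U+20AC, which is what lets a viewer copy or search for the Euro sign.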
Overrides now force every inner code point to L (LRO) or R (RLO) instead of collapsing to base-direction isolates. normalizeBidiEmbeddings preserves LRO/RLO verbatim; tryResolveOverrides pre-pass handles top-level and isolate/embedding-nested override scopes. 21 bidi-embedding tests (incl. 7 new X4/X5).
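
A much-simplified sketch of the override behaviour described above: every code point inside an LRO or RLO scope is forced to L or R until the matching PDF (U+202C). `forcedDirections` is a hypothetical function for illustration; it ignores embeddings, isolates, and the depth limit that the full UAX #9 rules X1–X8 handle.

```typescript
const LRO = 0x202d; // LEFT-TO-RIGHT OVERRIDE
const RLO = 0x202e; // RIGHT-TO-LEFT OVERRIDE
const PDF = 0x202c; // POP DIRECTIONAL FORMATTING

type Forced = 'L' | 'R' | null;

// Returns, per code point, the direction forced by the innermost open
// override, or null for control characters and text outside any override.
function forcedDirections(text: string): Forced[] {
  const stack: Array<'L' | 'R'> = [];
  const out: Forced[] = [];
  for (const ch of Array.from(text)) {
    const cp = ch.codePointAt(0)!;
    if (cp === LRO) {
      stack.push('L');
      out.push(null);
    } else if (cp === RLO) {
      stack.push('R');
      out.push(null);
    } else if (cp === PDF) {
      stack.pop();
      out.push(null);
    } else {
      out.push(stack[stack.length - 1] ?? null);
    }
  }
  return out;
}
```

For `'a\u202E12\u202Cb'` the digits come back as `'R'`, which corresponds to the "RLO forces digits RTL" case, while `a` and `b` stay unforced.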
…ipeline

Wire the COLR/CPAL colour-glyph engine into the document text pipeline so
colour emoji render as de-duplicated Form XObjects when an 'emoji-color'
font (FontData with colorGlyphs) is registered. Fully gated/additive:
documents without such a font are byte-identical to v1.2.0.

- ColorEmojiForm/ColorEmojiCollector types + EncodingContext.colorEmoji
- src/core/color-emoji.ts: createColorEmojiCollector (per-glyph dedupe,
  lazy glyf parse, inline /Shading + /ExtGState resources)
- encoding-context: activate collector when a font carries colorGlyphs
- pdf-text: emitColorEmojiRun draws 'q s 0 0 s x y cm /CEmK Do Q'; mono Tj
  fallback for unrenderable glyphs; fmtScale for fine cm precision
- pdf-document: trailing Form XObjects forward-referenced from page /XObject
- tests: color-emoji-integration (5) — solid, gradient, dedupe, xref, gating
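
The per-glyph dedupe and the `q s 0 0 s x y cm /CEmK Do Q` drawing sequence from the list above can be sketched as follows. `FormNameCollector`, `fmt`, and `colorEmojiOps` are hypothetical names, and the scale value passed in is an assumption about how the Form's coordinate space maps to the font size; the actual collector also manages glyf parsing and resources.

```typescript
// Hypothetical collector: one Form XObject name per distinct glyph id.
class FormNameCollector {
  private readonly names = new Map<number, string>();

  // Returns the existing name for a glyph, or allocates the next one.
  nameFor(glyphId: number): string {
    let name = this.names.get(glyphId);
    if (name === undefined) {
      name = `CEm${this.names.size}`;
      this.names.set(glyphId, name);
    }
    return name;
  }

  get count(): number {
    return this.names.size;
  }
}

// Formats a number with at most six decimals and no trailing zeros.
const fmt = (n: number): string => String(Number(n.toFixed(6)));

// Paints one form at (x, y) with a uniform scale, isolated by q ... Q.
function colorEmojiOps(name: string, scale: number, x: number, y: number): string {
  const s = fmt(scale);
  return `q ${s} 0 0 ${s} ${fmt(x)} ${fmt(y)} cm /${name} Do Q`;
}
```

Reusing one name per glyph id is what keeps a document with many repeated emoji from embedding the same Form XObject more than once.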

1887 tests green; src+test typecheck clean.
…tooling

Add a turn-key colour-emoji font module and the build pipeline that produces
it, completing the v1.3.0 COLRv1 colour-emoji roadmap item.

- scripts/build-color-emoji-data.ts: parses NotoColorEmoji-Regular.ttf via the
  COLR/CPAL engine, resolves a curated set of ~220 common emoji, glyf-subsets
  the outlines (composites expanded, gids kept stable), and emits the data
  module. Solid + linear + radial paints all resolve on the real font.
- fonts/noto-color-emoji-data.{js,d.ts}: generated curated module (936 KB,
  221 colour glyphs). Opt in via registerFont('emoji', () => import(...)).
- scripts/download-fonts.ts: add Noto Color Emoji source entry (OFL-1.1).
- scripts/helpers/fonts.ts: register 'emoji-color' loader.
- scripts/generators/color-emoji-showcase.ts + runner wiring: 2 sample PDFs.
- package.json: add ./fonts/* + ./package.json subpath exports so the
  documented import('pdfnative/fonts/...') opt-in actually resolves.
- tests/fonts/color-emoji-data.test.ts (3): module shape, cmap→colour glyph,
  document renders Form XObjects.

1890 tests green; src+test+scripts typecheck clean. Source TTF stays gitignored.
…sive emission

Extract buildPDF/buildDocumentPDF bodies into assembleTableParts/
assembleDocumentParts (return string[]); thin wrappers join. Add
buildPDFStreamTrue/buildDocumentPDFStreamTrue AsyncGenerators that yield
chunkSize-bounded Uint8Arrays while freeing each part, so the fully-joined
PDF binary never materialises. Byte-identical to buffered builders.

- src/core/pdf-builder.ts: assembleTableParts (internal)
- src/core/pdf-document.ts: assembleDocumentParts (internal)
- src/core/pdf-stream-writer.ts: streamPartsChunked + *StreamTrue
- src/index.ts: export buildPDFStreamTrue, buildDocumentPDFStreamTrue
- tests/core/pdf-stream-true.test.ts: 7 tests (byte parity, chunk size, TOC/{pages} rejection)
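
A minimal sketch of the chunked emission described above, assuming the parts are binary strings with one byte per character. `chunkParts` and `streamChunks` are hypothetical names, not the library's `streamPartsChunked`; the point is only that each part is released once consumed and no chunk exceeds `chunkSize`.

```typescript
// Yields chunkSize-bounded byte chunks, nulling each part as it is consumed.
function* chunkParts(
  parts: Array<string | null>,
  chunkSize: number,
): Generator<Uint8Array> {
  if (chunkSize <= 0) throw new RangeError('chunkSize must be positive');
  let chunk = new Uint8Array(chunkSize);
  let used = 0;
  for (let i = 0; i < parts.length; i++) {
    const part = parts[i] ?? '';
    parts[i] = null; // drop the array's reference so the part can be collected
    for (let j = 0; j < part.length; j++) {
      chunk[used++] = part.charCodeAt(j) & 0xff; // assumption: one byte per char
      if (used === chunkSize) {
        yield chunk;
        chunk = new Uint8Array(chunkSize);
        used = 0;
      }
    }
  }
  if (used > 0) yield chunk.subarray(0, used); // final partial chunk
}

// Async wrapper so consumers can use `for await`, as with an AsyncGenerator API.
async function* streamChunks(
  parts: Array<string | null>,
  chunkSize: number,
): AsyncGenerator<Uint8Array> {
  for (const chunk of chunkParts(parts, chunkSize)) yield chunk;
}
```

Because the chunks are cut from the same byte sequence a join would produce, concatenating them gives the same bytes as the buffered path.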
…oadmap)

Self-contained extreme-script fixtures (Tamil, Bengali+Devanagari, Arabic)
built with the real bundled fonts. Two complementary guards:

- glyph-position snapshot: extract BT/Tf/Td/Tj show operators (font, size,
  baseline x/y, GIDs) -> JSON baseline. Catches GID swaps and position drift.
- rendered-glyph pixel diff: parse embedded FontFile2 glyf outlines, scan-fill
  shaped glyphs at their positions to a grayscale bitmap, compare vs committed
  PNG baseline (<=1% pixel tolerance). Exercises the full shaping -> PDF ->
  font-embed -> render pipeline.

Zero-dep test tooling: extract.ts (PDF content/font extractor over openPdf),
raster.ts (quadratic-flattening scanline filler + bitmapDiff), png.ts
(grayscale PNG encode/decode). Baselines tracked under tests/visual/baselines/.
.github/workflows/visual-regression.yml gated on shaping/fonts/core changes.

Full suite 1903 tests green (62 files). UPDATE_SNAPSHOTS=1 regenerates baselines.
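
The pixel-diff guard could look roughly like the sketch below. It assumes the "<=1% pixel tolerance" means the fraction of differing grayscale pixels; the commit does not spell out the metric, and `diffRatio` / `withinTolerance` are hypothetical names rather than the `bitmapDiff` in raster.ts.

```typescript
// Fraction of pixels whose grayscale values differ by more than `threshold`.
function diffRatio(a: Uint8Array, b: Uint8Array, threshold = 0): number {
  if (a.length !== b.length) {
    throw new Error('bitmaps must have the same dimensions');
  }
  if (a.length === 0) return 0;
  let differing = 0;
  for (let i = 0; i < a.length; i++) {
    if (Math.abs(a[i] - b[i]) > threshold) differing++;
  }
  return differing / a.length;
}

// Assumed policy: pass when at most 1% of pixels differ from the baseline.
function withinTolerance(actual: Uint8Array, baseline: Uint8Array): boolean {
  return diffRatio(actual, baseline) <= 0.01;
}
```

A small tolerance like this absorbs anti-aliasing noise at glyph edges while still failing when a glyph is swapped or shifted.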
…stream version refresh

- release-notes/v1.3.0.md + CHANGELOG [1.3.0] + ROADMAP (v1.3.0 items -> Released)
- README: v1.3.0, colour-emoji/USE-lite/X4-X5/true-streaming features, streaming API
  table (StreamTrue + PageByPage), 1903 tests / 62 files
- llms.txt: 1.3.0, roadmap, release pointer, test counts
- AGENTS.md + copilot-instructions.md: 1903 tests / 62 files
- docs/index.html: pdfnative 1.3.0; cli/mcp v1.0.0 (12 MCP tools, 6 CLI commands),
  mobile fix wrapping .mcp-table in .table-wrap; architecture.svg counts
- docs/guides: new colour-emoji + streaming guides (md + html shells), index cards,
  version refresh (onboarding/mcp/cli/playgrounds)
- colr-parser: == null -> === null (eqeqeq)
- glyf-outline: drop unused no-constant-condition disable
- use-lite: drop unused no-fallthrough disable/enable pair
… and captions

Per ISO 14289-1 §7.3 / PDF/A-2b, each marked-content (BDC...EMC) sequence in a
content stream must carry a unique MCID. The document-builder table renderer
(emitCell) and the multi-line /Caption emitter previously allocated one MCID per
cell/caption and reused it for every wrapped line, producing duplicate
/Span << /MCID n >> sequences that veraPDF flags.

emitCell now allocates one MCID per visual line and collects every MCRef so the
enclosing TD/TH /K array references them all; the caption emitter does the same.
Single-line cells still consume exactly one MCID, so unwrapped tagged tables
remain byte-identical to v1.1.0. Paragraphs/lists were already correct.

Adds tests/core/pdf-tagged-mcid.test.ts (5 regression tests).
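
The per-line allocation can be sketched like this: each wrapped line gets its own MCID, and all of them are returned so the enclosing structure element can reference every one. `tagLines` is a hypothetical stand-in for the real `emitCell` logic, and the text operators are simplified (no positioning, no string escaping).

```typescript
interface TaggedLines {
  content: string;  // marked-content sequences, one per visual line
  mcids: number[];  // every MCID to reference from the TD/TH /K array
  next: number;     // first unused MCID after this cell
}

// Allocates one MCID per visual line instead of one per cell.
function tagLines(lines: string[], firstMcid: number): TaggedLines {
  const mcids: number[] = [];
  const ops: string[] = [];
  let next = firstMcid;
  for (const line of lines) {
    const id = next++;
    mcids.push(id);
    ops.push(`/Span << /MCID ${id} >> BDC BT (${line}) Tj ET EMC`);
  }
  return { content: ops.join('\n'), mcids, next };
}
```

A single-line cell still consumes exactly one MCID, which is why unwrapped tables are unaffected by the change.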
- versions.js: refresh FALLBACK (1.3.0 / cli 1.0.0 / mcp 1.0.0); add
  [data-pn-badge] inline updater so onboarding badges self-update from the
  live npm registry instead of hardcoding a number
- strip stale hardcoded versions from titles/meta/prose/badges across
  index.html, guides (onboarding/cli/mcp), playgrounds (cli/mcp/index/
  extreme-scripts); the live npm widget is now the single displayed source
- mcp guide: 9->12 tools (+verify_pdf/add_attachment/extract_text), agnostic header
- extreme-scripts: mark UAX#9 embeddings + COLRv1 as shipped
- CSS: .nav-brand flex-shrink/nowrap, .nav-inner gap, dedicated 1024px nav
  breakpoint so the many-link nav collapses before crowding the wordmark
Phase E of v1.3.0 review: close sample-coverage gaps for v1.2.0/v1.3.0 features.

- use-lite-showcase.ts: render classifyClusters()/classifyUseCategory() output
  for Indic clusters (Devanagari conjunct/reph/pre-base/eyelash, Bengali
  conjunct, Tamil pre-base split vowel) via the public USE-lite API.
- streaming-showcase.ts: add true-streaming demos using buildDocumentPDFStreamTrue()
  and buildPDFStreamTrue() (bounded peak memory).
- bidi-embeddings-showcase.ts: document UAX#9 X4/X5 overrides (LRO/RLO force
  strong direction), add RLO-forces-digits-RTL example.
- generate-samples.ts: wire up the use-lite generator.
… medical scale

Phase D of v1.3.0 review.

- extreme-scripts.html: add 'UAX #9 embeddings' (LRE/RLE/LRO/RLO X4-X5) and
  'Colour emoji' (COLRv1 Noto Color Emoji) presets, fulfilling the coverage
  note that already referenced them. Map latin/emoji font modules for the CDN
  font loader.
- medical-800.html: recalibrate cohort sizing to ~3.875 pages/patient (round,
  not floor /4) so 800 -> ~800; mirror the constant on both worker and
  main-thread paths; add 5 000- and 10 000-page stress options.
… weight)

Phase F of v1.3.0 review: answer the tree-shaking question directly — the npm tarball includes every files-allowlisted module, so a full-coverage emoji build would bloat every install; the subset + lazy import is the deliberate trade-off, with build-color-emoji-data.ts as the escape hatch.
…ocs, test counts)

Document the per-line MCID fix, the USE-lite/true-streaming/X4-X5 sample
generators, and the docs/playground improvements from this review pass.
Update test counts to 63 files / 1908 tests.
The hardcoded 10,000-block ceiling in assembleDocumentParts() blocked
legitimate large reports (e.g. 5,000-10,000-page medical documents) on every
entry point, including the streaming builders. Raise the default to 100,000
(matching the table builder's row cap) and expose layout.maxBlocks to override
it. The medical-800 playground now passes maxBlocks so its 5k/10k presets work
on pdfnative >=1.3.0.

- PdfLayoutOptions.maxBlocks (default DEFAULT_MAX_BLOCKS = 100,000)
- exported DEFAULT_MAX_BLOCKS from root
- tests: default 100k cap, custom ceiling, raise-beyond-default
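
A minimal sketch of a configurable ceiling with a default, in the spirit of the change above. The names `DEFAULT_MAX_BLOCKS_SKETCH`, `LayoutSketch`, and `assertBlockLimit` are hypothetical, and treating the limit as inclusive is an assumption; the real check lives in assembleDocumentParts().

```typescript
// Mirrors the documented default of 100,000 blocks.
const DEFAULT_MAX_BLOCKS_SKETCH = 100000;

interface LayoutSketch {
  maxBlocks?: number;
}

// Throws when a document exceeds the configured (or default) block ceiling.
function assertBlockLimit(blockCount: number, layout: LayoutSketch = {}): void {
  const limit = layout.maxBlocks ?? DEFAULT_MAX_BLOCKS_SKETCH;
  if (blockCount > limit) {
    throw new RangeError(
      `Document has ${blockCount} blocks; limit is ${limit} (raise layout.maxBlocks)`,
    );
  }
}
```

Callers with very large reports raise `maxBlocks`; callers that process untrusted input can lower it to keep a tighter bound.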
…-glyph BBox from outline

- splitTextByFont drops ZWJ/VS15/VS16/skin-tone modifiers that no font maps, eliminating .notdef tofu in colour-emoji samples; retained when a registered font (e.g. Indic) maps them

- renderColorGlyph now computes the Form /BBox from transformed contour bounds instead of the hardcoded em box, fixing clipped colour glyphs

- rewrote color-emoji-showcase generator with curated-only emoji + a real-world Sprint status report producing color-emoji-real.pdf (was a stale manual file)

- tests: VS16/ZWJ drop + retain, computed BBox assertion
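
The drop-or-retain rule for selectors can be sketched as below: variation selectors, joiners, and skin-tone modifiers are removed only when no registered font has a glyph for them. `isZeroWidthFormatSketch` and `dropUnmappedFormatChars` are hypothetical names, and `hasGlyph` stands in for whatever font lookup the real `splitTextByFont` performs.

```typescript
// VS-15, VS-16, ZWNJ, ZWJ, and the five Fitzpatrick skin-tone modifiers.
function isZeroWidthFormatSketch(cp: number): boolean {
  return (
    cp === 0xfe0e || cp === 0xfe0f ||
    cp === 0x200c || cp === 0x200d ||
    (cp >= 0x1f3fb && cp <= 0x1f3ff)
  );
}

// Removes selector/joiner code points that no font can render.
function dropUnmappedFormatChars(
  text: string,
  hasGlyph: (codePoint: number) => boolean,
): string {
  let out = '';
  for (const ch of Array.from(text)) {
    const cp = ch.codePointAt(0)!;
    if (isZeroWidthFormatSketch(cp) && !hasGlyph(cp)) continue; // would be tofu
    out += ch;
  }
  return out;
}
```

Keeping the characters when a font does map them matters for scripts where ZWJ and ZWNJ change shaping, which is the retain case the tests cover.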
…d currencies)

- currency-base14.pdf: WinAnsi euro/pound/yen/cent, text-extractable via /ToUnicode (issue #48 verification)

- currency-extended.pdf: rupee/won/shekel/dong/lira/ruble/bitcoin via embedded Noto Sans

- currency-multi.pdf: realistic multi-currency price table

- wired into generate-samples.ts
- new src/shaping/telugu-shaper.ts: cluster building, virama conjuncts, GSUB ligatures, GPOS mark positioning; no reph, no pre-base reordering (Telugu specifics)

- script-registry: TELUGU_START/END, isTeluguCodepoint, containsTelugu

- script-detect: 'te' in needsUnicodeFont, detectFallbackLangs, detectCharLang

- encoding-context: Telugu dispatch in textRuns (RTL+LTR) and ps()

- bundled fonts/noto-telugu-data.{js,d.ts} (Noto Sans Telugu, OFL-1.1); download-fonts manifest entry

- exports shapeTeluguText/isTeluguCodepoint/containsTelugu from index

- tests/shaping/telugu-shaper.test.ts (20 tests); alphabet-telugu sample

verified: real-font shaping of తెలుగు/నమస్తే/క్షి/శ్రీ/జ్ఞ produces zero .notdef + correct conjuncts
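
To illustrate the cluster-building step for virama conjuncts, here is a deliberately simplified sketch: a code point joins the previous cluster if it is a Telugu combining sign, or if it is a consonant that directly follows a virama (U+0C4D). `teluguClusters` and the range helpers are hypothetical and coarser than a real shaper (no ZWJ/ZWNJ handling, no GSUB lookups).

```typescript
const VIRAMA = 0x0c4d;

// Telugu consonants: U+0C15-U+0C39 plus U+0C58-U+0C5A.
const isConsonant = (cp: number): boolean =>
  (cp >= 0x0c15 && cp <= 0x0c39) || (cp >= 0x0c58 && cp <= 0x0c5a);

// Coarse set of combining signs: candrabindu/anusvara/visarga, nukta,
// dependent vowel signs, virama, and length marks.
const isMark = (cp: number): boolean =>
  (cp >= 0x0c00 && cp <= 0x0c04) ||
  cp === 0x0c3c ||
  (cp >= 0x0c3e && cp <= 0x0c56) ||
  cp === 0x0c62 || cp === 0x0c63;

// Splits text into clusters, keeping consonant + virama + consonant together.
function teluguClusters(text: string): string[] {
  const clusters: string[] = [];
  let prev = -1;
  for (const ch of Array.from(text)) {
    const cp = ch.codePointAt(0)!;
    const joins =
      clusters.length > 0 &&
      (isMark(cp) || (prev === VIRAMA && isConsonant(cp)));
    if (joins) clusters[clusters.length - 1] += ch;
    else clusters.push(ch);
    prev = cp;
  }
  return clusters;
}
```

Because Telugu has no reph and no pre-base reordering, clusters like these can be handed to substitution and positioning in logical order.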
…ctural validator

- new src/parser/pdf-ua-validator.ts: checks /MarkInfo /Marked, /StructTreeRoot + /ParentTree, /Metadata, /Lang, and per-page MCID uniqueness

- zero byte-output risk (read-only, parser-based dev gate)

- exports validatePdfUA + PdfUAValidationResult from index

- tests/parser/pdf-ua-validator.test.ts (4 tests)
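
One of the listed checks, per-page MCID uniqueness, can be approximated with the sketch below, which scans a decoded content stream for repeated `/MCID n` entries. `duplicateMcids` is a hypothetical function using a plain regular expression; the actual validator is parser-based and its result type is not reproduced here.

```typescript
// Returns the MCIDs that appear more than once in one page's content stream.
function duplicateMcids(contentStream: string): number[] {
  const seen = new Set<number>();
  const dupes = new Set<number>();
  const pattern = /\/MCID\s+(\d+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(contentStream)) !== null) {
    const id = Number(match[1]);
    if (seen.has(id)) dupes.add(id);
    else seen.add(id);
  }
  return Array.from(dupes).sort((a, b) => a - b);
}
```

An empty result means the page passes this particular check; any returned id points at marked-content sequences that share an identifier.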
… fixes; refresh counts (65/1938, 32/170, 17 scripts)
…ed colour-emoji glyphs

- currency-symbols: route U+0E3F (baht) to the embedded Thai font so it
  renders as a real glyph instead of .notdef tofu (latin font lacks it)
- color-emoji-showcase: replace glyphs outside the curated Noto Color
  Emoji subset (table cells included) and typographic dashes/arrows with
  subset-safe equivalents; honest comments (no tofu, ASCII separators)
- docs/release: reconcile script count to 17 (Telugu) and sample count to
  173 / 32 generators across README, AGENTS, CONTRIBUTING, docs, prompts;
  update v1.3.0 release note (3 colour-emoji samples, baht, Telugu samples)
  and PR-note verification checklist
…ar) — 17 → 22 Unicode scripts

Extend pdfnative from 17 to 22 Unicode scripts with five new pure-JS
mini-shapers following the Telugu model (shared gsub-driver + gpos-positioner):

- Amharic/Ethiopic (am, U+1200–U+137F): syllabic abugida, detection + routing
- Sinhala (si): virama conjuncts, pre-base kombuva reordering, two-part vowels
- Tibetan (bo): vertical subjoined-consonant stacking (Noto Serif Tibetan)
- Khmer (km): USE-lite — coeng subscripts, pre-base vowels
- Myanmar (my): USE-lite — medials, pre-base medial-ra/e-vowel, virama stacking

Bundled OFL-1.1 fonts, opt-in via registerFont(). Wired into script-detect,
script-registry, encoding-context. New shaper/detection test suites.
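
Detection and routing for the five scripts can be pictured as a code-point range lookup. The sketch below uses the main Unicode block for each script (the Ethiopic range is the one quoted above; the other four are the standard blocks and exclude extended blocks). `SCRIPT_RANGES` and `detectLangSketch` are hypothetical names, not the script-registry API.

```typescript
// [language tag, first code point, last code point] for each main block.
const SCRIPT_RANGES: ReadonlyArray<readonly [string, number, number]> = [
  ['am', 0x1200, 0x137f], // Ethiopic
  ['si', 0x0d80, 0x0dff], // Sinhala
  ['bo', 0x0f00, 0x0fff], // Tibetan
  ['km', 0x1780, 0x17ff], // Khmer
  ['my', 0x1000, 0x109f], // Myanmar
];

// Returns the language tag whose block contains the first code point of `ch`.
function detectLangSketch(ch: string): string | null {
  const cp = ch.codePointAt(0);
  if (cp === undefined) return null;
  for (const [lang, start, end] of SCRIPT_RANGES) {
    if (cp >= start && cp <= end) return lang;
  }
  return null;
}
```

A lookup like this decides which shaper and which registered font a run is routed to; text outside every range falls through to the existing paths.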

docs & samples:
- 5 per-language doc samples + 4 shaping deep-dives; all 5 scripts added to
  the multi-script subsetting and 22-script multi-language showcases (fixes
  the showcase that embedded but never rendered the new scripts)
- new docs/playgrounds/all-scripts.html — all 22 scripts + COLRv1 colour
  emoji in one browser-generated PDF
- refresh counts across README, docs, CHANGELOG, release notes
  (17→22 scripts, ~140→187 samples, 23→32 generators, 4→5 playgrounds)

Gates green: typecheck:all, lint, 1982 tests, build.
Add creationDate?: Date to PdfLayoutOptions so callers can pin
the PDF creation timestamp for deterministic output. When omitted, it
defaults to new Date() at build time (unchanged behaviour).

Thread the option through assembleDocumentParts (pdf-document.ts)
and buildPDF (pdf-builder.ts) so both builders forward it to
buildPdfMetadata(creationDate).

Fix the two flaky byte-identity assertions in pdf-stream-true.test.ts
that compared separate buildDocumentPDFBytes / buildDocumentPDFStreamTrue
calls: each call captured a different new Date(), so a 1-second
boundary between them produced a 1-byte ASCII digit mismatch (e.g.
seconds '8' vs '9'). Both tests now pass a shared FIXED_DATE
constant via layout.creationDate, so output is deterministic regardless
of wall-clock timing.

Gates: 1982/1982 tests pass, typecheck:all clean.
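
The flakiness and its fix can be reproduced in miniature: a metadata date string built from an injected `Date` is identical across calls, while one built from the wall clock can differ by a second. `pdfDate` and `metadataSketch` are hypothetical helpers; the UTC `D:YYYYMMDDHHmmSSZ` layout follows the PDF date convention but is not claimed to match buildPdfMetadata's exact output.

```typescript
// Zero-pads a number to the given width.
const pad = (n: number, width = 2): string => String(n).padStart(width, '0');

// Formats a Date as a PDF date string in UTC.
function pdfDate(date: Date): string {
  return (
    `D:${pad(date.getUTCFullYear(), 4)}${pad(date.getUTCMonth() + 1)}` +
    `${pad(date.getUTCDate())}${pad(date.getUTCHours())}` +
    `${pad(date.getUTCMinutes())}${pad(date.getUTCSeconds())}Z`
  );
}

// Hypothetical metadata builder: uses the pinned date when provided.
function metadataSketch(creationDate: Date = new Date()): string {
  return `/CreationDate (${pdfDate(creationDate)})`;
}

// A shared constant makes two independent builds byte-identical.
const FIXED_DATE = new Date(Date.UTC(2026, 5, 8, 17, 48, 0));
const first = metadataSketch(FIXED_DATE);
const second = metadataSketch(FIXED_DATE);
```

Passing the same constant to both builders removes the dependence on when each call happens to read the clock.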
@Nizoka Nizoka self-assigned this Jun 8, 2026
@Nizoka Nizoka added the bug, documentation, and chore labels Jun 8, 2026
@Nizoka Nizoka added the release label Jun 8, 2026
- Added new features to package.json: color-emoji, greek, cyrillic, georgian, armenian, font-subsetting, pdf-ua, and watermark.
- Upgraded devDependencies: @vitest/coverage-v8 and vitest to version 4.1.8.
- Enhanced pdf-table.test.ts to use fake timers for consistent PDF generation tests.
- Increased timeout for inflate-bomb.test.ts to 30 seconds to accommodate larger test cases.
- Removed unnecessary mock font data in khmer-shaper.test.ts for clarity.
- Adjusted coverage thresholds in vitest.config.ts from 90 to 88 for statements.
@Nizoka Nizoka merged commit 309d515 into main Jun 8, 2026
7 checks passed
@Nizoka Nizoka deleted the release/v1.3.0 branch June 8, 2026 17:48
Labels

  • bug: Something isn't working
  • chore: Release tasks, metadata updates, governance, CI, and other non-feature maintenance work
  • documentation: Improvements or additions to documentation
  • release: Tracks a versioned release (implementation, quality gates, and publish workflow)

Development

Successfully merging this pull request may close these issues.

Bug — Latin-1 / WinAnsi text loses the Euro sign (and other CP-1252 glyphs), rendered as ?