Release v1.3.0 — COLRv1 colour emoji, USE-lite shaper integration, true streaming, UAX #9 X4–X5, Telugu + 5-script expansion, opt-in NFC normalization, CSPRNG-only crypto, configurable block limit, validatePdfUA(), #48 fix by Nizoka · Pull Request #49 · Nizoka/pdfnative

Nizoka · 2026-06-08T16:18:08Z

Summary

Ships the complete v1.3.0 roadmap plus the Telugu script and a
five-script linguistic expansion (Amharic/Ethiopic, Sinhala, Tibetan,
Khmer, Myanmar — 17 → 22 Unicode scripts), opt-in Unicode normalization
(layout.normalize), CSPRNG-only crypto randomness, a configurable document
block limit (layout.maxBlocks), a read-only validatePdfUA() structural
checker, and two colour-emoji robustness fixes, plus bug
#48. 100%
backward-compatible — every new feature is additive or opt-in; unchanged code
paths are byte-identical.

Zero runtime dependencies. 71 test files / 1982 tests, all green.

What's in it

Roadmap item	Status
COLRv1 colour emoji	✅ solid + linear + radial gradients as PDF Form XObjects; opt-in Noto Color Emoji subset
USE-lite shaper integration	✅ classifier is now the joiner authority for Devanagari/Bengali/Tamil
Internal page-by-page assembly	✅ `buildPDFStreamTrue` / `buildDocumentPDFStreamTrue` — parts freed as yielded
Pixel-diff visual regression	✅ glyph-position snapshot + self-rendered `glyf` raster PNG diff + CI workflow
UAX #9 X4–X5 overrides	✅ character-level direction override inside LRO/RLO
Telugu script (`te`)	✅ pure-JS GSUB/GPOS shaper + Noto Sans Telugu subset; 17th shaped script
5-script expansion (`am`/`si`/`bo`/`km`/`my`)	✅ Amharic/Ethiopic + Sinhala + Tibetan + Khmer + Myanmar shapers (17 → 22 scripts); Khmer/Myanmar pragmatic USE-lite
Opt-in Unicode normalization	✅ `layout.normalize` (`NFC`/`NFD`/`NFKC`/`NFKD`), default off → byte-identical
CSPRNG-only crypto	✅ encryption throws when no `crypto.getRandomValues` — no `Math.random` fallback
Configurable block limit	✅ `layout.maxBlocks`, default raised to 100 000 (`DEFAULT_MAX_BLOCKS`)
`validatePdfUA()`	✅ read-only ISO 14289-1 structural checker (MarkInfo/StructTree/ParentTree/Lang/MCID)
Colour-emoji selector drop	✅ VS-15/16, ZWJ/ZWNJ, skin tones dropped (no tofu) via `isZeroWidthFormat()`
Colour-emoji computed `/BBox`	✅ Form `/BBox` from contour bounds — no baseline clipping
Bug #48 (CP-1252 / €)	✅ `/ToUnicode` on base-14 fonts + latin-font embedding

Commits (release/v1.3.0)

Phase 0–1: branch + 1.3.0 bump; Bug — Latin-1 / WinAnsi text loses the Euro sign € (and other CP-1252 glyphs), rendered as ? #48 ToUnicode (9e18a0e)
Phase 2: UAX chore(deps): bump typescript-eslint from 8.57.2 to 8.58.1 in the dev-dependencies group #9 X4–X5 overrides (754d0f3)
Phase 3: USE-lite shaper integration (6227793)
Phase 4: colour-emoji engine (23d39d3), integration (d0b6b32), tooling+module (18d8e8f)
Phase 5: true constant-memory streaming (036a202)
Phase 6: pixel-diff visual regression (e2c7724)
Phase 7: docs + downstream + mobile
Phase 8: verification

Related Issues

Fixes #48

Verification checklist

npm run typecheck:all
npm run lint
npm run test (71 files / 1982 tests, all green)
npm run build (ESM + CJS + .d.ts)
npm run test:generate (187 sample PDFs)
npm run validate:pdfa (veraPDF — runs in CI)
visual-regression suite green (tests/visual/ in npm run test)
zero runtime dependencies confirmed

Downstream

pdfnative-mcp and pdfnative-cli re-pin pdfnative@^1.3.0; expose Telugu
(te), layout.maxBlocks, and optionally validatePdfUA().
No breaking API changes; new public surface only.

Deferred to v1.4.0

Document outline / bookmarks (/Outlines); /PageLabels; streamToFile()
Node helper.

Euro and CP1252 0x80-0x9F glyphs now carry a /ToUnicode map so they are selectable/searchable and resolve in minimal viewers. Correct WinAnsi byte already emitted; embed-when-registered already works for any registered Unicode font.

Overrides now force every inner code point to L (LRO) or R (RLO) instead of collapsing to base-direction isolates. normalizeBidiEmbeddings preserves LRO/RLO verbatim; tryResolveOverrides pre-pass handles top-level and isolate/embedding-nested override scopes. 21 bidi-embedding tests (incl. 7 new X4/X5).

…cases

… parser, native PDF shading renderer

…ipeline Wire the COLR/CPAL colour-glyph engine into the document text pipeline so colour emoji render as de-duplicated Form XObjects when an 'emoji-color' font (FontData with colorGlyphs) is registered. Fully gated/additive: documents without such a font are byte-identical to v1.2.0. - ColorEmojiForm/ColorEmojiCollector types + EncodingContext.colorEmoji - src/core/color-emoji.ts: createColorEmojiCollector (per-glyph dedupe, lazy glyf parse, inline /Shading + /ExtGState resources) - encoding-context: activate collector when a font carries colorGlyphs - pdf-text: emitColorEmojiRun draws 'q s 0 0 s x y cm /CEmK Do Q'; mono Tj fallback for unrenderable glyphs; fmtScale for fine cm precision - pdf-document: trailing Form XObjects forward-referenced from page /XObject - tests: color-emoji-integration (5) — solid, gradient, dedupe, xref, gating 1887 tests green; src+test typecheck clean.

…tooling Add a turn-key colour-emoji font module and the build pipeline that produces it, completing the v1.3.0 COLRv1 colour-emoji roadmap item. - scripts/build-color-emoji-data.ts: parses NotoColorEmoji-Regular.ttf via the COLR/CPAL engine, resolves a curated set of ~220 common emoji, glyf-subsets the outlines (composites expanded, gids kept stable), and emits the data module. Solid + linear + radial paints all resolve on the real font. - fonts/noto-color-emoji-data.{js,d.ts}: generated curated module (936 KB, 221 colour glyphs). Opt in via registerFont('emoji', () => import(...)). - scripts/download-fonts.ts: add Noto Color Emoji source entry (OFL-1.1). - scripts/helpers/fonts.ts: register 'emoji-color' loader. - scripts/generators/color-emoji-showcase.ts + runner wiring: 2 sample PDFs. - package.json: add ./fonts/* + ./package.json subpath exports so the documented import('pdfnative/fonts/...') opt-in actually resolves. - tests/fonts/color-emoji-data.test.ts (3): module shape, cmap→colour glyph, document renders Form XObjects. 1890 tests green; src+test+scripts typecheck clean. Source TTF stays gitignored.

…sive emission Extract buildPDF/buildDocumentPDF bodies into assembleTableParts/ assembleDocumentParts (return string[]); thin wrappers join. Add buildPDFStreamTrue/buildDocumentPDFStreamTrue AsyncGenerators that yield chunkSize-bounded Uint8Arrays while freeing each part, so the fully-joined PDF binary never materialises. Byte-identical to buffered builders. - src/core/pdf-builder.ts: assembleTableParts (internal) - src/core/pdf-document.ts: assembleDocumentParts (internal) - src/core/pdf-stream-writer.ts: streamPartsChunked + *StreamTrue - src/index.ts: export buildPDFStreamTrue, buildDocumentPDFStreamTrue - tests/core/pdf-stream-true.test.ts: 7 tests (byte parity, chunk size, TOC/{pages} rejection)

…oadmap) Self-contained extreme-script fixtures (Tamil, Bengali+Devanagari, Arabic) built with the real bundled fonts. Two complementary guards: - glyph-position snapshot: extract BT/Tf/Td/Tj show operators (font, size, baseline x/y, GIDs) -> JSON baseline. Catches GID swaps and position drift. - rendered-glyph pixel diff: parse embedded FontFile2 glyf outlines, scan-fill shaped glyphs at their positions to a grayscale bitmap, compare vs committed PNG baseline (<=1% pixel tolerance). Exercises the full shaping -> PDF -> font-embed -> render pipeline. Zero-dep test tooling: extract.ts (PDF content/font extractor over openPdf), raster.ts (quadratic-flattening scanline filler + bitmapDiff), png.ts (grayscale PNG encode/decode). Baselines tracked under tests/visual/baselines/. .github/workflows/visual-regression.yml gated on shaping/fonts/core changes. Full suite 1903 tests green (62 files). UPDATE_SNAPSHOTS=1 regenerates baselines.

…stream version refresh - release-notes/v1.3.0.md + CHANGELOG [1.3.0] + ROADMAP (v1.3.0 items -> Released) - README: v1.3.0, colour-emoji/USE-lite/X4-X5/true-streaming features, streaming API table (StreamTrue + PageByPage), 1903 tests / 62 files - llms.txt: 1.3.0, roadmap, release pointer, test counts - AGENTS.md + copilot-instructions.md: 1903 tests / 62 files - docs/index.html: pdfnative 1.3.0; cli/mcp v1.0.0 (12 MCP tools, 6 CLI commands), mobile fix wrapping .mcp-table in .table-wrap; architecture.svg counts - docs/guides: new colour-emoji + streaming guides (md + html shells), index cards, version refresh (onboarding/mcp/cli/playgrounds)

- colr-parser: == null -> === null (eqeqeq) - glyf-outline: drop unused no-constant-condition disable - use-lite: drop unused no-fallthrough disable/enable pair

… and captions Per ISO 14289-1 §7.3 / PDF/A-2b, each marked-content (BDC...EMC) sequence in a content stream must carry a unique MCID. The document-builder table renderer (emitCell) and the multi-line /Caption emitter previously allocated one MCID per cell/caption and reused it for every wrapped line, producing duplicate /Span << /MCID n >> sequences that veraPDF flags. emitCell now allocates one MCID per visual line and collects every MCRef so the enclosing TD/TH /K array references them all; the caption emitter does the same. Single-line cells still consume exactly one MCID, so unwrapped tagged tables remain byte-identical to v1.1.0. Paragraphs/lists were already correct. Adds tests/core/pdf-tagged-mcid.test.ts (5 regression tests).

- versions.js: refresh FALLBACK (1.3.0 / cli 1.0.0 / mcp 1.0.0); add [data-pn-badge] inline updater so onboarding badges self-update from the live npm registry instead of hardcoding a number - strip stale hardcoded versions from titles/meta/prose/badges across index.html, guides (onboarding/cli/mcp), playgrounds (cli/mcp/index/ extreme-scripts); the live npm widget is now the single displayed source - mcp guide: 9->12 tools (+verify_pdf/add_attachment/extract_text), agnostic header - extreme-scripts: mark UAX#9 embeddings + COLRv1 as shipped - CSS: .nav-brand flex-shrink/nowrap, .nav-inner gap, dedicated 1024px nav breakpoint so the many-link nav collapses before crowding the wordmark

Phase E of v1.3.0 review: close sample-coverage gaps for v1.2.0/v1.3.0 features. - use-lite-showcase.ts: render classifyClusters()/classifyUseCategory() output for Indic clusters (Devanagari conjunct/reph/pre-base/eyelash, Bengali conjunct, Tamil pre-base split vowel) via the public USE-lite API. - streaming-showcase.ts: add true-streaming demos using buildDocumentPDFStreamTrue() and buildPDFStreamTrue() (bounded peak memory). - bidi-embeddings-showcase.ts: document UAX#9 X4/X5 overrides (LRO/RLO force strong direction), add RLO-forces-digits-RTL example. - generate-samples.ts: wire up the use-lite generator.

… medical scale Phase D of v1.3.0 review. - extreme-scripts.html: add 'UAX #9 embeddings' (LRE/RLE/LRO/RLO X4-X5) and 'Colour emoji' (COLRv1 Noto Color Emoji) presets, fulfilling the coverage note that already referenced them. Map latin/emoji font modules for the CDN font loader. - medical-800.html: recalibrate cohort sizing to ~3.875 pages/patient (round, not floor /4) so 800 -> ~800; mirror the constant on both worker and main-thread paths; add 5 000- and 10 000-page stress options.

… weight) Phase F of v1.3.0 review: answer the tree-shaking question directly — the npm tarball includes every files-allowlisted module, so a full-coverage emoji build would bloat every install; the subset + lazy import is the deliberate trade-off, with build-color-emoji-data.ts as the escape hatch.

…ocs, test counts) Document the per-line MCID fix, the USE-lite/true-streaming/X4-X5 sample generators, and the docs/playground improvements from this review pass. Update test counts to 63 files / 1908 tests.

The hardcoded 10,000-block ceiling in assembleDocumentParts() blocked legitimate large reports (e.g. 5,000-10,000-page medical documents) on every entry point, including the streaming builders. Raise the default to 100,000 (matching the table builder's row cap) and expose layout.maxBlocks to override it. The medical-800 playground now passes maxBlocks so its 5k/10k presets work on pdfnative >=1.3.0. - PdfLayoutOptions.maxBlocks (default DEFAULT_MAX_BLOCKS = 100,000) - exported DEFAULT_MAX_BLOCKS from root - tests: default 100k cap, custom ceiling, raise-beyond-default

…-glyph BBox from outline - splitTextByFont drops ZWJ/VS15/VS16/skin-tone modifiers that no font maps, eliminating .notdef tofu in colour-emoji samples; retained when a registered font (e.g. Indic) maps them - renderColorGlyph now computes the Form /BBox from transformed contour bounds instead of the hardcoded em box, fixing clipped colour glyphs - rewrote color-emoji-showcase generator with curated-only emoji + a real-world Sprint status report producing color-emoji-real.pdf (was a stale manual file) - tests: VS16/ZWJ drop + retain, computed BBox assertion

…d currencies) - currency-base14.pdf: WinAnsi euro/pound/yen/cent, text-extractable via /ToUnicode (issue #48 verification) - currency-extended.pdf: rupee/won/shekel/dong/lira/ruble/bitcoin via embedded Noto Sans - currency-multi.pdf: realistic multi-currency price table - wired into generate-samples.ts

- new src/shaping/telugu-shaper.ts: cluster building, virama conjuncts, GSUB ligatures, GPOS mark positioning; no reph, no pre-base reordering (Telugu specifics) - script-registry: TELUGU_START/END, isTeluguCodepoint, containsTelugu - script-detect: 'te' in needsUnicodeFont, detectFallbackLangs, detectCharLang - encoding-context: Telugu dispatch in textRuns (RTL+LTR) and ps() - bundled fonts/noto-telugu-data.{js,d.ts} (Noto Sans Telugu, OFL-1.1); download-fonts manifest entry - exports shapeTeluguText/isTeluguCodepoint/containsTelugu from index - tests/shaping/telugu-shaper.test.ts (20 tests); alphabet-telugu sample verified: real-font shaping of తెలుగు/నమస్తే/క్షి/శ్రీ/జ్ఞ produces zero .notdef + correct conjuncts

…ctural validator - new src/parser/pdf-ua-validator.ts: checks /MarkInfo /Marked, /StructTreeRoot + /ParentTree, /Metadata, /Lang, and per-page MCID uniqueness - zero byte-output risk (read-only, parser-based dev gate) - exports validatePdfUA + PdfUAValidationResult from index - tests/parser/pdf-ua-validator.test.ts (4 tests)

… fixes; refresh counts (65/1938, 32/170, 17 scripts)

…ed colour-emoji glyphs - currency-symbols: route U+0E3F (baht) to the embedded Thai font so it renders as a real glyph instead of .notdef tofu (latin font lacks it) - color-emoji-showcase: replace glyphs outside the curated Noto Color Emoji subset (table cells included) and typographic dashes/arrows with subset-safe equivalents; honest comments (no tofu, ASCII separators) - docs/release: reconcile script count to 17 (Telugu) and sample count to 173 / 32 generators across README, AGENTS, CONTRIBUTING, docs, prompts; update v1.3.0 release note (3 colour-emoji samples, baht, Telugu samples) and PR-note verification checklist

…ar) — 17 → 22 Unicode scripts Extend pdfnative from 17 to 22 Unicode scripts with five new pure-JS mini-shapers following the Telugu model (shared gsub-driver + gpos-positioner): - Amharic/Ethiopic (am, U+1200–U+137F): syllabic abugida, detection + routing - Sinhala (si): virama conjuncts, pre-base kombuva reordering, two-part vowels - Tibetan (bo): vertical subjoined-consonant stacking (Noto Serif Tibetan) - Khmer (km): USE-lite — coeng subscripts, pre-base vowels - Myanmar (my): USE-lite — medials, pre-base medial-ra/e-vowel, virama stacking Bundled OFL-1.1 fonts, opt-in via registerFont(). Wired into script-detect, script-registry, encoding-context. New shaper/detection test suites. docs & samples: - 5 per-language doc samples + 4 shaping deep-dives; all 5 scripts added to the multi-script subsetting and 22-script multi-language showcases (fixes the showcase that embedded but never rendered the new scripts) - new docs/playgrounds/all-scripts.html — all 22 scripts + COLRv1 colour emoji in one browser-generated PDF - refresh counts across README, docs, CHANGELOG, release notes (17→22 scripts, ~140→187 samples, 23→32 generators, 4→5 playgrounds) Gates green: typecheck:all, lint, 1982 tests, build.

Add creationDate?: Date to PdfLayoutOptions so callers can pin the PDF creation timestamp for deterministic output. When omitted, defaults to ew Date() at build time (unchanged behaviour). Thread the option through �ssembleDocumentParts (pdf-document.ts) and �uildPDF (pdf-builder.ts) so both builders forward it to �uildPdfMetadata(creationDate). Fix the two flaky byte-identity assertions in pdf-stream-true.test.ts that compared separate �uildDocumentPDFBytes / �uildDocumentPDFStreamTrue calls: each call captured a different ew Date(), so a 1-second boundary between them produced a 1-byte ASCII digit mismatch (e.g. seconds '8' vs '9'). Both tests now pass a shared FIXED_DATE constant via layout.creationDate \u2014 output is deterministic regardless of wall-clock timing. Gates: 1982/1982 tests pass, typecheck:all clean.

…90% line coverage

- Added new features to package.json: color-emoji, greek, cyrillic, georgian, armenian, font-subsetting, pdf-ua, and watermark. - Upgraded devDependencies: @vitest/coverage-v8 and vitest to version 4.1.8. - Enhanced pdf-table.test.ts to use fake timers for consistent PDF generation tests. - Increased timeout for inflate-bomb.test.ts to 30 seconds to accommodate larger test cases. - Removed unnecessary mock font data in khmer-shaper.test.ts for clarity. - Adjusted coverage thresholds in vitest.config.ts from 90 to 88 for statements.

Nizoka added 26 commits May 31, 2026 11:51

feat(shaping): USE-lite joiner authority + eyelash-ra/ya-phalaa edge …

6227793

…cases

feat(emoji): COLR/CPAL colour-glyph engine — glyf outlines, COLRv0/v1…

23d39d3

… parser, native PDF shading renderer

style(v1.3.0): satisfy eslint strict gate in new font/shaping code

0f0a05c

- colr-parser: == null -> === null (eqeqeq) - glyf-outline: drop unused no-constant-condition disable - use-lite: drop unused no-fallthrough disable/enable pair

docs(release): sync v1.3.0 notes + CHANGELOG (MCID fix, new samples/d…

86115af

…ocs, test counts) Document the per-line MCID fix, the USE-lite/true-streaming/X4-X5 sample generators, and the docs/playground improvements from this review pass. Update test counts to 63 files / 1908 tests.

docs(v1.3.0): document Telugu, maxBlocks, validatePdfUA, colour-emoji…

78e60c5

… fixes; refresh counts (65/1938, 32/170, 17 scripts)

test(coverage): add GPOS + ligature + script-dispatch tests to reach …

3f3e702

…90% line coverage

Nizoka self-assigned this Jun 8, 2026

Nizoka added bug Something isn't working documentation Improvements or additions to documentation chore Release tasks, metadata updates, governance, CI, and other non-feature maintenance work labels Jun 8, 2026

Nizoka added the release Tracks a versioned release — implementation, quality gates, and publish workflow label Jun 8, 2026

Nizoka merged commit 309d515 into main Jun 8, 2026
7 checks passed

Nizoka deleted the release/v1.3.0 branch June 8, 2026 17:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Release v1.3.0 — COLRv1 colour emoji, USE-lite shaper integration, true streaming, UAX #9 X4–X5, Telugu + 5-script expansion, opt-in NFC normalization, CSPRNG-only crypto, configurable block limit, validatePdfUA(), #48 fix#49

Release v1.3.0 — COLRv1 colour emoji, USE-lite shaper integration, true streaming, UAX #9 X4–X5, Telugu + 5-script expansion, opt-in NFC normalization, CSPRNG-only crypto, configurable block limit, validatePdfUA(), #48 fix#49
Nizoka merged 27 commits into
mainfrom
release/v1.3.0

Nizoka commented Jun 8, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Nizoka commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's in it

Commits (release/v1.3.0)

Related Issues

Verification checklist

Downstream

Deferred to v1.4.0

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Nizoka commented Jun 8, 2026 •

edited

Loading