Release v1.3.0 — COLRv1 colour emoji, USE-lite shaper integration, true streaming, UAX #9 X4–X5, Telugu + 5-script expansion, opt-in NFC normalization, CSPRNG-only crypto, configurable block limit, validatePdfUA(), #48 fix#49
Merged
Conversation
Euro and CP1252 0x80-0x9F glyphs now carry a /ToUnicode map so they are selectable/searchable and resolve in minimal viewers. Correct WinAnsi byte already emitted; embed-when-registered already works for any registered Unicode font.
Overrides now force every inner code point to L (LRO) or R (RLO) instead of collapsing to base-direction isolates. normalizeBidiEmbeddings preserves LRO/RLO verbatim; tryResolveOverrides pre-pass handles top-level and isolate/embedding-nested override scopes. 21 bidi-embedding tests (incl. 7 new X4/X5).
… parser, native PDF shading renderer
…ipeline Wire the COLR/CPAL colour-glyph engine into the document text pipeline so colour emoji render as de-duplicated Form XObjects when an 'emoji-color' font (FontData with colorGlyphs) is registered. Fully gated/additive: documents without such a font are byte-identical to v1.2.0. - ColorEmojiForm/ColorEmojiCollector types + EncodingContext.colorEmoji - src/core/color-emoji.ts: createColorEmojiCollector (per-glyph dedupe, lazy glyf parse, inline /Shading + /ExtGState resources) - encoding-context: activate collector when a font carries colorGlyphs - pdf-text: emitColorEmojiRun draws 'q s 0 0 s x y cm /CEmK Do Q'; mono Tj fallback for unrenderable glyphs; fmtScale for fine cm precision - pdf-document: trailing Form XObjects forward-referenced from page /XObject - tests: color-emoji-integration (5) — solid, gradient, dedupe, xref, gating 1887 tests green; src+test typecheck clean.
…tooling
Add a turn-key colour-emoji font module and the build pipeline that produces
it, completing the v1.3.0 COLRv1 colour-emoji roadmap item.
- scripts/build-color-emoji-data.ts: parses NotoColorEmoji-Regular.ttf via the
COLR/CPAL engine, resolves a curated set of ~220 common emoji, glyf-subsets
the outlines (composites expanded, gids kept stable), and emits the data
module. Solid + linear + radial paints all resolve on the real font.
- fonts/noto-color-emoji-data.{js,d.ts}: generated curated module (936 KB,
221 colour glyphs). Opt in via registerFont('emoji', () => import(...)).
- scripts/download-fonts.ts: add Noto Color Emoji source entry (OFL-1.1).
- scripts/helpers/fonts.ts: register 'emoji-color' loader.
- scripts/generators/color-emoji-showcase.ts + runner wiring: 2 sample PDFs.
- package.json: add ./fonts/* + ./package.json subpath exports so the
documented import('pdfnative/fonts/...') opt-in actually resolves.
- tests/fonts/color-emoji-data.test.ts (3): module shape, cmap→colour glyph,
document renders Form XObjects.
1890 tests green; src+test+scripts typecheck clean. Source TTF stays gitignored.
…sive emission
Extract buildPDF/buildDocumentPDF bodies into assembleTableParts/
assembleDocumentParts (return string[]); thin wrappers join. Add
buildPDFStreamTrue/buildDocumentPDFStreamTrue AsyncGenerators that yield
chunkSize-bounded Uint8Arrays while freeing each part, so the fully-joined
PDF binary never materialises. Byte-identical to buffered builders.
- src/core/pdf-builder.ts: assembleTableParts (internal)
- src/core/pdf-document.ts: assembleDocumentParts (internal)
- src/core/pdf-stream-writer.ts: streamPartsChunked + *StreamTrue
- src/index.ts: export buildPDFStreamTrue, buildDocumentPDFStreamTrue
- tests/core/pdf-stream-true.test.ts: 7 tests (byte parity, chunk size, TOC/{pages} rejection)
…oadmap) Self-contained extreme-script fixtures (Tamil, Bengali+Devanagari, Arabic) built with the real bundled fonts. Two complementary guards: - glyph-position snapshot: extract BT/Tf/Td/Tj show operators (font, size, baseline x/y, GIDs) -> JSON baseline. Catches GID swaps and position drift. - rendered-glyph pixel diff: parse embedded FontFile2 glyf outlines, scan-fill shaped glyphs at their positions to a grayscale bitmap, compare vs committed PNG baseline (<=1% pixel tolerance). Exercises the full shaping -> PDF -> font-embed -> render pipeline. Zero-dep test tooling: extract.ts (PDF content/font extractor over openPdf), raster.ts (quadratic-flattening scanline filler + bitmapDiff), png.ts (grayscale PNG encode/decode). Baselines tracked under tests/visual/baselines/. .github/workflows/visual-regression.yml gated on shaping/fonts/core changes. Full suite 1903 tests green (62 files). UPDATE_SNAPSHOTS=1 regenerates baselines.
…stream version refresh - release-notes/v1.3.0.md + CHANGELOG [1.3.0] + ROADMAP (v1.3.0 items -> Released) - README: v1.3.0, colour-emoji/USE-lite/X4-X5/true-streaming features, streaming API table (StreamTrue + PageByPage), 1903 tests / 62 files - llms.txt: 1.3.0, roadmap, release pointer, test counts - AGENTS.md + copilot-instructions.md: 1903 tests / 62 files - docs/index.html: pdfnative 1.3.0; cli/mcp v1.0.0 (12 MCP tools, 6 CLI commands), mobile fix wrapping .mcp-table in .table-wrap; architecture.svg counts - docs/guides: new colour-emoji + streaming guides (md + html shells), index cards, version refresh (onboarding/mcp/cli/playgrounds)
- colr-parser: == null -> === null (eqeqeq) - glyf-outline: drop unused no-constant-condition disable - use-lite: drop unused no-fallthrough disable/enable pair
… and captions Per ISO 14289-1 §7.3 / PDF/A-2b, each marked-content (BDC...EMC) sequence in a content stream must carry a unique MCID. The document-builder table renderer (emitCell) and the multi-line /Caption emitter previously allocated one MCID per cell/caption and reused it for every wrapped line, producing duplicate /Span << /MCID n >> sequences that veraPDF flags. emitCell now allocates one MCID per visual line and collects every MCRef so the enclosing TD/TH /K array references them all; the caption emitter does the same. Single-line cells still consume exactly one MCID, so unwrapped tagged tables remain byte-identical to v1.1.0. Paragraphs/lists were already correct. Adds tests/core/pdf-tagged-mcid.test.ts (5 regression tests).
- versions.js: refresh FALLBACK (1.3.0 / cli 1.0.0 / mcp 1.0.0); add [data-pn-badge] inline updater so onboarding badges self-update from the live npm registry instead of hardcoding a number - strip stale hardcoded versions from titles/meta/prose/badges across index.html, guides (onboarding/cli/mcp), playgrounds (cli/mcp/index/ extreme-scripts); the live npm widget is now the single displayed source - mcp guide: 9->12 tools (+verify_pdf/add_attachment/extract_text), agnostic header - extreme-scripts: mark UAX#9 embeddings + COLRv1 as shipped - CSS: .nav-brand flex-shrink/nowrap, .nav-inner gap, dedicated 1024px nav breakpoint so the many-link nav collapses before crowding the wordmark
Phase E of v1.3.0 review: close sample-coverage gaps for v1.2.0/v1.3.0 features. - use-lite-showcase.ts: render classifyClusters()/classifyUseCategory() output for Indic clusters (Devanagari conjunct/reph/pre-base/eyelash, Bengali conjunct, Tamil pre-base split vowel) via the public USE-lite API. - streaming-showcase.ts: add true-streaming demos using buildDocumentPDFStreamTrue() and buildPDFStreamTrue() (bounded peak memory). - bidi-embeddings-showcase.ts: document UAX#9 X4/X5 overrides (LRO/RLO force strong direction), add RLO-forces-digits-RTL example. - generate-samples.ts: wire up the use-lite generator.
… medical scale Phase D of v1.3.0 review. - extreme-scripts.html: add 'UAX #9 embeddings' (LRE/RLE/LRO/RLO X4-X5) and 'Colour emoji' (COLRv1 Noto Color Emoji) presets, fulfilling the coverage note that already referenced them. Map latin/emoji font modules for the CDN font loader. - medical-800.html: recalibrate cohort sizing to ~3.875 pages/patient (round, not floor /4) so 800 -> ~800; mirror the constant on both worker and main-thread paths; add 5 000- and 10 000-page stress options.
… weight) Phase F of v1.3.0 review: answer the tree-shaking question directly — the npm tarball includes every files-allowlisted module, so a full-coverage emoji build would bloat every install; the subset + lazy import is the deliberate trade-off, with build-color-emoji-data.ts as the escape hatch.
…ocs, test counts) Document the per-line MCID fix, the USE-lite/true-streaming/X4-X5 sample generators, and the docs/playground improvements from this review pass. Update test counts to 63 files / 1908 tests.
The hardcoded 10,000-block ceiling in assembleDocumentParts() blocked legitimate large reports (e.g. 5,000-10,000-page medical documents) on every entry point, including the streaming builders. Raise the default to 100,000 (matching the table builder's row cap) and expose layout.maxBlocks to override it. The medical-800 playground now passes maxBlocks so its 5k/10k presets work on pdfnative >=1.3.0. - PdfLayoutOptions.maxBlocks (default DEFAULT_MAX_BLOCKS = 100,000) - exported DEFAULT_MAX_BLOCKS from root - tests: default 100k cap, custom ceiling, raise-beyond-default
…-glyph BBox from outline - splitTextByFont drops ZWJ/VS15/VS16/skin-tone modifiers that no font maps, eliminating .notdef tofu in colour-emoji samples; retained when a registered font (e.g. Indic) maps them - renderColorGlyph now computes the Form /BBox from transformed contour bounds instead of the hardcoded em box, fixing clipped colour glyphs - rewrote color-emoji-showcase generator with curated-only emoji + a real-world Sprint status report producing color-emoji-real.pdf (was a stale manual file) - tests: VS16/ZWJ drop + retain, computed BBox assertion
…d currencies) - currency-base14.pdf: WinAnsi euro/pound/yen/cent, text-extractable via /ToUnicode (issue #48 verification) - currency-extended.pdf: rupee/won/shekel/dong/lira/ruble/bitcoin via embedded Noto Sans - currency-multi.pdf: realistic multi-currency price table - wired into generate-samples.ts
- new src/shaping/telugu-shaper.ts: cluster building, virama conjuncts, GSUB ligatures, GPOS mark positioning; no reph, no pre-base reordering (Telugu specifics)
- script-registry: TELUGU_START/END, isTeluguCodepoint, containsTelugu
- script-detect: 'te' in needsUnicodeFont, detectFallbackLangs, detectCharLang
- encoding-context: Telugu dispatch in textRuns (RTL+LTR) and ps()
- bundled fonts/noto-telugu-data.{js,d.ts} (Noto Sans Telugu, OFL-1.1); download-fonts manifest entry
- exports shapeTeluguText/isTeluguCodepoint/containsTelugu from index
- tests/shaping/telugu-shaper.test.ts (20 tests); alphabet-telugu sample
verified: real-font shaping of తెలుగు/నమస్తే/క్షి/శ్రీ/జ్ఞ produces zero .notdef + correct conjuncts
…ctural validator - new src/parser/pdf-ua-validator.ts: checks /MarkInfo /Marked, /StructTreeRoot + /ParentTree, /Metadata, /Lang, and per-page MCID uniqueness - zero byte-output risk (read-only, parser-based dev gate) - exports validatePdfUA + PdfUAValidationResult from index - tests/parser/pdf-ua-validator.test.ts (4 tests)
… fixes; refresh counts (65/1938, 32/170, 17 scripts)
…ed colour-emoji glyphs - currency-symbols: route U+0E3F (baht) to the embedded Thai font so it renders as a real glyph instead of .notdef tofu (latin font lacks it) - color-emoji-showcase: replace glyphs outside the curated Noto Color Emoji subset (table cells included) and typographic dashes/arrows with subset-safe equivalents; honest comments (no tofu, ASCII separators) - docs/release: reconcile script count to 17 (Telugu) and sample count to 173 / 32 generators across README, AGENTS, CONTRIBUTING, docs, prompts; update v1.3.0 release note (3 colour-emoji samples, baht, Telugu samples) and PR-note verification checklist
…ar) — 17 → 22 Unicode scripts Extend pdfnative from 17 to 22 Unicode scripts with five new pure-JS mini-shapers following the Telugu model (shared gsub-driver + gpos-positioner): - Amharic/Ethiopic (am, U+1200–U+137F): syllabic abugida, detection + routing - Sinhala (si): virama conjuncts, pre-base kombuva reordering, two-part vowels - Tibetan (bo): vertical subjoined-consonant stacking (Noto Serif Tibetan) - Khmer (km): USE-lite — coeng subscripts, pre-base vowels - Myanmar (my): USE-lite — medials, pre-base medial-ra/e-vowel, virama stacking Bundled OFL-1.1 fonts, opt-in via registerFont(). Wired into script-detect, script-registry, encoding-context. New shaper/detection test suites. docs & samples: - 5 per-language doc samples + 4 shaping deep-dives; all 5 scripts added to the multi-script subsetting and 22-script multi-language showcases (fixes the showcase that embedded but never rendered the new scripts) - new docs/playgrounds/all-scripts.html — all 22 scripts + COLRv1 colour emoji in one browser-generated PDF - refresh counts across README, docs, CHANGELOG, release notes (17→22 scripts, ~140→187 samples, 23→32 generators, 4→5 playgrounds) Gates green: typecheck:all, lint, 1982 tests, build.
Add creationDate?: Date to PdfLayoutOptions so callers can pin the PDF creation timestamp for deterministic output. When omitted, defaults to ew Date() at build time (unchanged behaviour). Thread the option through �ssembleDocumentParts (pdf-document.ts) and �uildPDF (pdf-builder.ts) so both builders forward it to �uildPdfMetadata(creationDate). Fix the two flaky byte-identity assertions in pdf-stream-true.test.ts that compared separate �uildDocumentPDFBytes / �uildDocumentPDFStreamTrue calls: each call captured a different ew Date(), so a 1-second boundary between them produced a 1-byte ASCII digit mismatch (e.g. seconds '8' vs '9'). Both tests now pass a shared FIXED_DATE constant via layout.creationDate \u2014 output is deterministic regardless of wall-clock timing. Gates: 1982/1982 tests pass, typecheck:all clean.
…90% line coverage
- Added new features to package.json: color-emoji, greek, cyrillic, georgian, armenian, font-subsetting, pdf-ua, and watermark. - Upgraded devDependencies: @vitest/coverage-v8 and vitest to version 4.1.8. - Enhanced pdf-table.test.ts to use fake timers for consistent PDF generation tests. - Increased timeout for inflate-bomb.test.ts to 30 seconds to accommodate larger test cases. - Removed unnecessary mock font data in khmer-shaper.test.ts for clarity. - Adjusted coverage thresholds in vitest.config.ts from 90 to 88 for statements.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Ships the complete v1.3.0 roadmap plus the Telugu script and a
five-script linguistic expansion (Amharic/Ethiopic, Sinhala, Tibetan,
Khmer, Myanmar — 17 → 22 Unicode scripts), opt-in Unicode normalization
(
layout.normalize), CSPRNG-only crypto randomness, a configurable documentblock limit (
layout.maxBlocks), a read-onlyvalidatePdfUA()structuralchecker, and two colour-emoji robustness fixes, plus bug
#48. 100%
backward-compatible — every new feature is additive or opt-in; unchanged code
paths are byte-identical.
Zero runtime dependencies. 71 test files / 1982 tests, all green.
What's in it
buildPDFStreamTrue/buildDocumentPDFStreamTrue— parts freed as yieldedglyfraster PNG diff + CI workflowte)am/si/bo/km/my)layout.normalize(NFC/NFD/NFKC/NFKD), default off → byte-identicalcrypto.getRandomValues— noMath.randomfallbacklayout.maxBlocks, default raised to 100 000 (DEFAULT_MAX_BLOCKS)validatePdfUA()isZeroWidthFormat()/BBox/BBoxfrom contour bounds — no baseline clipping/ToUnicodeon base-14 fonts + latin-font embeddingCommits (release/v1.3.0)
€(and other CP-1252 glyphs), rendered as?#48 ToUnicode (9e18a0e)Related Issues
Fixes #48
Verification checklist
npm run typecheck:allnpm run lintnpm run test(71 files / 1982 tests, all green)npm run build(ESM + CJS + .d.ts)npm run test:generate(187 sample PDFs)npm run validate:pdfa(veraPDF — runs in CI)tests/visual/innpm run test)Downstream
pdfnative@^1.3.0; expose Telugu(
te),layout.maxBlocks, and optionallyvalidatePdfUA().Deferred to v1.4.0
Document outline / bookmarks (
/Outlines);/PageLabels;streamToFile()Node helper.