feat(shaping): v1.1.0 epic — deep OpenType shaping, full BiDi UAX #9, GPOS anchor improvements #25

@Nizoka

Summary

Three user-supplied PDFs reproduced visual defects that pdfnative v1.0.2
could not handle correctly:

  1. pdfnative-test-bidi-arabic-hebrew-thai.pdf — mixed BiDi paragraphs
    with Arabic + Hebrew + Thai + Latin + digits exhibit non-canonical
    run ordering at neutral boundaries.
  2. pdfnative-test-tamil-ultra-extreme-shaping-positioning-pdfnative.pdf
    — Tamil deep conjuncts (e.g. ஸ்ரீ, க்ஷ) with split vowels render
    as non-ligated forms in some sequences.
  3. pdfnative-test-bengali-devanagari-ultra-extreme-shaping-positioning-pdfnative.pdf
    — Bengali ক্ষ্ম and Devanagari क्ष्म multi-halant chains, and
    reph reordering on certain bases, fall back to base+halant+base
    sequences instead of the expected ligature glyph.

A fourth defect surfaced during repro:

  4. The heading
    "Test Bengali + Devanagari ULTRA EXTREME — Shaping & Positioning — pdfnative"
    overflowed the right margin because wrapText() accepted the first
    segment unconditionally on a fresh line, even when that segment was
    wider than the column.
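The hard-break fix for (4) can be illustrated with a short TypeScript sketch. Only the name hardBreakSegment() comes from this issue. The signature, the MeasureFn callback, and the fixed-width measure in the example are assumptions standing in for pdfnative's real font metrics, so treat this as an outline of the idea rather than the shipped code.

```typescript
// Assumed width callback; a real renderer would sum glyph advances
// from the embedded font's metrics at the current font size.
type MeasureFn = (text: string) => number;

/**
 * Split a segment wider than maxWidth into pieces that each fit, breaking only
 * at Unicode code-point boundaries (for...of never splits a surrogate pair).
 * Every piece holds at least one code point, so a single code point wider than
 * maxWidth is emitted on its own instead of stalling the line breaker.
 */
function hardBreakSegment(segment: string, maxWidth: number, measure: MeasureFn): string[] {
  const pieces: string[] = [];
  let current = "";
  for (const codePoint of segment) {
    const candidate = current + codePoint;
    if (current !== "" && measure(candidate) > maxWidth) {
      pieces.push(current); // current piece is full; start a new one
      current = codePoint;
    } else {
      current = candidate;
    }
  }
  if (current !== "") pieces.push(current);
  return pieces;
}

// Example with a fixed 6-unit advance per code point (an assumption for illustration).
const monospace: MeasureFn = (text) => Array.from(text).length * 6;
console.log(hardBreakSegment("ABCDEFGHIJ", 24, monospace)); // → ["ABCD", "EFGH", "IJ"]
```

Code-point breaking never splits a UTF-16 surrogate pair, but it can still separate an Indic consonant from its virama. If that matters for headings in Bengali or Devanagari, segmenting with Intl.Segmenter at grapheme granularity would likely be a safer break unit.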

v1.0.3 ships the layout fix for (4), four extreme-script visual
baselines under test-output/extreme/, two interactive playgrounds, and
documentation. Defects (1)–(3) are deferred to v1.1.0 because they
require either GPOS table re-extraction in the pre-built font data
modules or new OpenType lookup implementations in the shaping pipeline,
both of which exceed the scope of a SemVer patch release.

Root-cause analysis

| # | Defect | Root cause | Module(s) | Fix scope |
|---|--------|------------|-----------|-----------|
| 1 | BiDi mis-ordering with 3+ scripts | Simplified UAX #9: paragraph-level L2 reordering only; neutrals between RTL runs of different scripts inherit the wrong embedding level. | src/shaping/bidi.ts | Implement the full UBA level-resolution rules W1–W7 + N1/N2 with explicit isolate handling. |
| 2 | Tamil multi-component conjuncts | tryLigature() matches greedily and stops at the first GSUB lookup; nested LookupType 4 (ligature substitution) chains such as (க + ◌்) + ஷ need recursive multi-pass application. | src/shaping/tamil-shaper.ts, fonts/noto-tamil-data.{js,d.ts} | Multi-pass GSUB driver; verify that the pre-built ligatures table contains the deeply nested entries. |
| 3a | Bengali / Devanagari ligature fallback | Same as (2): single-pass GSUB. | src/shaping/bengali-shaper.ts, src/shaping/devanagari-shaper.ts | Multi-pass GSUB driver. |
| 3b | Reph + halant cluster reordering edge cases | The cluster builder treats halant as a terminator; pre-base reph and post-base halant rules need richer cluster classification. | src/shaping/devanagari-shaper.ts | Implement Universal Shaping Engine (USE) cluster types, or a USE-lite variant, for Devanagari/Bengali. |
| | Arabic isolated harakat anchoring (extra) | GPOS mark-to-base anchors are applied only to connected base+mark sequences; isolated tashkeel falls back to default positioning. | src/shaping/arabic-shaper.ts + font data | Support GPOS LookupType 4 (MarkBasePos) for isolated harakat using markGlyphCoverage. |
| | Thai tall-consonant mark stacking (extra) | Mark anchor data is missing for ป ฝ ฟ ฬ when 3+ marks stack. | fonts/noto-thai-data.{js,d.ts} | Re-extract GPOS anchors with full mark coverage; verify with extreme-bidi.pdf. |
| 4 | Heading overflow (FIXED in v1.0.3) | wrapText() accepted an overlong segment unconditionally on a fresh line. | src/core/pdf-renderers.ts | ✅ Fixed: hardBreakSegment() slices at code-point boundaries when a single segment exceeds maxWidth. |
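For defect (1), the N1/N2 part of the planned fix can be sketched on a single isolating run sequence. This is a simplified illustration, not the planned src/shaping/bidi.ts code: it assumes W1–W7 have already run, collapses every neutral and isolate class into one hypothetical NI value, treats EN and AN as R (as rule N1 prescribes), and leaves out rule N0 (paired brackets). The names resolveNeutrals, strongDirection, and BidiClass are illustrative only.

```typescript
// Hypothetical post-W1–W7 classes: NI stands for any neutral or isolate
// formatting character (B, S, WS, ON, FSI, LRI, RLI, PDI in UAX #9 terms).
type BidiClass = "L" | "R" | "EN" | "AN" | "NI";
type Dir = "L" | "R";

// Rule N1: European and Arabic numbers influence neutrals as if they were R.
function strongDirection(c: BidiClass): Dir {
  if (c === "NI") throw new Error("a neutral has no strong direction");
  return c === "L" ? "L" : "R"; // R, EN and AN all count as R
}

/**
 * Resolve neutrals in one isolating run sequence.
 * level: embedding level of the sequence; sos/eos: start/end-of-sequence types.
 */
function resolveNeutrals(types: readonly BidiClass[], level: number, sos: Dir, eos: Dir): BidiClass[] {
  const embeddingDirection: Dir = level % 2 === 0 ? "L" : "R";
  const out = types.slice();
  let i = 0;
  while (i < out.length) {
    if (out[i] !== "NI") {
      i++;
      continue;
    }
    // Find the maximal run of neutrals [start, end).
    const start = i;
    while (i < out.length && out[i] === "NI") i++;
    const end = i;
    const before = start === 0 ? sos : strongDirection(out[start - 1]);
    const after = end === out.length ? eos : strongDirection(out[end]);
    // N1: matching strong directions on both sides win; N2: otherwise use the embedding direction.
    out.fill(before === after ? before : embeddingDirection, start, end);
  }
  return out;
}

// RTL word, space, RTL word, space, LTR word in an LTR (level 0) paragraph.
console.log(resolveNeutrals(["R", "NI", "R", "NI", "L"], 0, "L", "L")); // → ["R", "R", "R", "L", "L"]
```

In the example, the space between the two RTL words resolves to R through N1 even though the paragraph is LTR, which is the kind of case the table attributes to the simplified implementation. Turning resolved types into levels (I1/I2) and then into visual order (L2) are separate steps not shown here.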

Scope of v1.0.3 (this release)

  • Fix heading overflow at character boundaries (hardBreakSegment)
  • Add tests/core/pdf-document.test.ts regression tests for hard-break
  • Generate four extreme-* PDFs as visual baselines
  • tests/integration/extreme-shaping.test.ts end-to-end smoke tests
  • docs/playgrounds/extreme-scripts.html for in-browser repros
  • docs/playgrounds/medical-800.html Web Worker showcase
  • CHANGELOG, README, ROADMAP, release notes
  • Document defects (1)–(3) and Arabic/Thai extras as known limitations

Scope of v1.1.0 (this issue tracks)

  • shaping(bidi): implement full UAX #9 W1–W7 + N1/N2 + isolates
  • shaping(common): multi-pass GSUB driver (re-apply LookupType 4 until a fixed point is reached)
  • shaping(devanagari/bengali): USE-lite cluster classification (pre-base / above-base / below-base / post-base)
  • shaping(arabic): GPOS MarkBasePos for isolated harakat
  • fonts(thai): re-extract GPOS anchors with full mark coverage; rebuild noto-thai-data.{js,d.ts} via tools/build-font-data.cjs
  • fonts(tamil/bengali/devanagari): verify pre-built ligatures tables contain deeply-nested chains; rebuild if needed
  • tests(visual): add pixel-diff visual regression tests using the four extreme-* baselines (probably via pdf2pic + pixelmatch in CI)
  • bench: add a shaping-throughput benchmark covering each script
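The multi-pass GSUB driver in the list above can be sketched as a loop that re-applies ligature substitution until the glyph run stops changing. Everything in this sketch is hypothetical: the comma-joined string keys of LigatureTable, the glyph IDs in the example, and the maxPasses cap are assumptions rather than pdfnative's font-data format. In the OpenType model each lookup runs once, in LookupList order, and a later lookup can match glyphs an earlier one produced; iterating one flattened ligature table to a fixed point is the approximation this issue proposes.

```typescript
// Hypothetical ligature table: component glyph IDs joined with "," -> ligature glyph ID.
type LigatureTable = Map<string, number>;

// One left-to-right pass, preferring the longest matching component sequence at each position.
function applyLigaturesOnce(glyphs: readonly number[], table: LigatureTable, maxComponents: number): number[] {
  const out: number[] = [];
  let i = 0;
  while (i < glyphs.length) {
    let matched = false;
    for (let len = Math.min(maxComponents, glyphs.length - i); len >= 2; len--) {
      const ligature = table.get(glyphs.slice(i, i + len).join(","));
      if (ligature !== undefined) {
        out.push(ligature);
        i += len;
        matched = true;
        break;
      }
    }
    if (!matched) {
      out.push(glyphs[i]);
      i++;
    }
  }
  return out;
}

// Re-apply passes until nothing changes (a fixed point) or maxPasses is hit.
function applyLigaturesToFixedPoint(glyphs: readonly number[], table: LigatureTable, maxPasses = 8): number[] {
  let maxComponents = 2;
  for (const key of table.keys()) {
    maxComponents = Math.max(maxComponents, key.split(",").length);
  }
  let current: number[] = glyphs.slice();
  for (let pass = 0; pass < maxPasses; pass++) {
    const next = applyLigaturesOnce(current, table, maxComponents);
    // Every substitution shortens the run, so an unchanged length means no match.
    if (next.length === current.length) break;
    current = next;
  }
  return current;
}

// Hypothetical glyph IDs: 1 = KA, 2 = virama (pulli), 3 = SSA,
// 10 = KA+virama, 20 = the KA+virama+SSA ligature.
const table: LigatureTable = new Map([
  ["1,2", 10],
  ["10,3", 20],
]);
console.log(applyLigaturesToFixedPoint([1, 2, 3], table)); // → [20] after two substituting passes
```

Because each substitution shortens the run by at least one glyph, the loop would stop within n passes for an n-glyph run even without maxPasses; the cap only bounds worst-case work on unusual tables.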

Acceptance criteria for v1.1.0

  • All four user-supplied PDFs render correctly:
  • Extra defects:
    • Isolated Arabic harakat anchor to default base position
    • Thai 3+ mark stacks on ป ฝ ฟ ฬ do not overlap
  • No regression in existing npm run test:generate outputs
  • Pixel-diff visual regression tests pass on CI for all four extreme-* baselines

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
