Summary
Three user-supplied PDFs expose visual defects that pdfnative v1.0.2
does not render correctly:
1. pdfnative-test-bidi-arabic-hebrew-thai.pdf: mixed BiDi paragraphs
   with Arabic + Hebrew + Thai + Latin + digits exhibit non-canonical
   run ordering at neutral boundaries.
2. pdfnative-test-tamil-ultra-extreme-shaping-positioning-pdfnative.pdf:
   Tamil deep conjuncts (e.g. ஸ்ரீ, க்ஷ) with split vowels render
   as non-ligated forms in some sequences.
3. pdfnative-test-bengali-devanagari-ultra-extreme-shaping-positioning-pdfnative.pdf:
   Bengali ক্ষ্ম and Devanagari क्ष्म multi-halant chains, and
   reph reordering on certain bases, fall back to base+halant+base
   sequences instead of the expected ligature glyph.
A fourth defect surfaced during repro:
4. The heading
"Test Bengali + Devanagari ULTRA EXTREME — Shaping & Positioning — pdfnative"
overflowed the right margin because wrapText() accepted the first
segment unconditionally on a fresh line, even when that segment was
wider than the column.
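For reference, a minimal sketch of the hard-break behaviour that addresses the heading overflow described above. This is not the code in src/core/pdf-renderers.ts: the Measure callback, both function signatures, and the space-based segmentation are assumptions made for illustration. Only the names wrapText and hardBreakSegment come from the fix itself.

```typescript
// Hypothetical width callback; pdfnative's real measurement API may differ.
type Measure = (text: string) => number;

// Split one overlong segment into pieces that each fit within maxWidth.
// Array.from iterates by code point, so surrogate pairs are never split.
function hardBreakSegment(segment: string, maxWidth: number, measure: Measure): string[] {
  const pieces: string[] = [];
  let current = "";
  for (const cp of Array.from(segment)) {
    // Always keep at least one code point per piece to guarantee progress.
    if (current !== "" && measure(current + cp) > maxWidth) {
      pieces.push(current);
      current = cp;
    } else {
      current += cp;
    }
  }
  if (current !== "") pieces.push(current);
  return pieces;
}

// Greedy wrap over space-separated segments. An overlong segment is never
// accepted as-is, even when it starts a fresh line.
function wrapText(text: string, maxWidth: number, measure: Measure): string[] {
  const lines: string[] = [];
  let line = "";
  for (const word of text.split(" ").filter((w) => w !== "")) {
    const candidate = line === "" ? word : line + " " + word;
    if (measure(candidate) <= maxWidth) {
      line = candidate;
      continue;
    }
    if (line !== "") lines.push(line);
    if (measure(word) <= maxWidth) {
      line = word;
    } else {
      // Earlier pieces become full lines; the last piece stays open so
      // that following words can still be appended to it.
      const pieces = hardBreakSegment(word, maxWidth, measure);
      const last = pieces.pop();
      lines.push(...pieces);
      line = last === undefined ? "" : last;
    }
  }
  if (line !== "") lines.push(line);
  return lines;
}
```

With a measure that counts code points and maxWidth = 4, wrapText("ab cdefghij k", 4, measure) yields the lines ab, cdef, ghij, k. Slicing by code point avoids splitting surrogate pairs but can still separate a base character from its combining marks, so a production version would probably want grapheme-cluster boundaries for Indic text.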
v1.0.3 ships the layout fix for (4), four extreme-script visual
baselines under test-output/extreme/, two interactive playgrounds, and
documentation. Defects (1)–(3) are deferred to v1.1.0 because they
require either GPOS table re-extraction in the pre-built font data
modules or new OpenType lookup implementations in the shaping pipeline,
both of which exceed the scope of a SemVer patch release.
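To make defect (1) concrete: UAX #9 rules N1 and N2 decide which direction a run of neutrals (spaces, most punctuation) takes. The sketch below illustrates only those two rules and is not the code in src/shaping/bidi.ts. It assumes the W rules have already run, collapses every neutral class into a single "N", uses the paragraph direction for both sos and eos, and ignores bracket pairs (N0) and isolates. All type and function names are hypothetical.

```typescript
type BidiType = "L" | "R" | "EN" | "AN" | "N";
type Strong = "L" | "R";

// For rule N1, European and Arabic numbers behave as R.
function strongOf(t: BidiType): Strong | null {
  if (t === "L") return "L";
  if (t === "R" || t === "EN" || t === "AN") return "R";
  return null;
}

// Resolve each maximal run of neutrals:
//   N1: if the strong types on both sides agree, the run takes that type.
//   N2: otherwise the run takes the embedding direction.
function resolveNeutrals(types: BidiType[], embedding: Strong): BidiType[] {
  const out = types.slice();
  let i = 0;
  while (i < out.length) {
    if (out[i] !== "N") {
      i++;
      continue;
    }
    let j = i;
    while (j < out.length && out[j] === "N") j++;
    // The neutral run occupies [i, j). Text boundaries use the embedding
    // direction (simplified sos/eos).
    const before = i > 0 ? strongOf(out[i - 1]) : embedding;
    const after = j < out.length ? strongOf(out[j]) : embedding;
    const resolved: Strong = before !== null && before === after ? before : embedding;
    for (let k = i; k < j; k++) out[k] = resolved;
    i = j;
  }
  return out;
}
```

In a left-to-right paragraph, a space between an Arabic word and a Hebrew word has types R, N, R, so N1 resolves it to R and it stays inside the right-to-left run. A space between Hebrew and Thai has types R, N, L, so N2 assigns it the paragraph direction L.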
Root-cause analysis
| # | Defect | Root cause | Module(s) | Fix scope |
| --- | --- | --- | --- | --- |
| 1 | BiDi mis-ordering with 3+ scripts | Simplified UAX #9: paragraph-level L2 reorder only; neutrals between RTL runs of different scripts inherit the wrong embedding level. | src/shaping/bidi.ts | Implement full UBA level resolution rules N1/N2 + W1–W7 with explicit isolate handling. |
| 2 | Tamil multi-component conjuncts | tryLigature() matches greedily and stops at the first GSUB lookup; nested LookupType 4 chains (e.g. (க + ◌்) + ஷ) need recursive multi-pass application. | src/shaping/tamil-shaper.ts, fonts/noto-tamil-data.{js,d.ts} | Multi-pass GSUB driver + verify pre-built ligatures table contains the deeply-nested entries. |
| 3a | Bengali / Devanagari ligature fallback | Same as (2): single-pass GSUB. | src/shaping/bengali-shaper.ts, src/shaping/devanagari-shaper.ts | Multi-pass GSUB driver. |
| 3b | Reph + halant cluster reordering edge-cases | Cluster builder treats halant as terminator; pre-base reph + post-base halant rules need richer cluster classification. | src/shaping/devanagari-shaper.ts | Implement Universal Shaping Engine (USE) cluster types or USE-lite for Devanagari/Bengali. |
| – | Arabic isolated harakat anchoring (extra) | GPOS mark-to-base anchors are applied only on connected base+mark sequences; isolated tashkeel falls back to default positioning. | src/shaping/arabic-shaper.ts + font data | Support GPOS LookupType 4 (MarkBasePos) for isolated harakat with markGlyphCoverage. |
| – | Thai tall-consonant mark stacking (extra) | Mark anchor data missing for ป ฝ ฟ ฬ when 3+ marks stack. | fonts/noto-thai-data.{js,d.ts} | Re-extract GPOS anchors with full mark coverage; verify with extreme-bidi.pdf. |
| 4 | Heading overflow (FIXED in v1.0.3) | wrapText accepted overlong segment unconditionally on fresh line. | src/core/pdf-renderers.ts | ✅ Fixed: hardBreakSegment() slices at code-point boundaries when single segment > maxWidth. |
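The multi-pass GSUB driver proposed for defects (2) and (3a) could look roughly like the sketch below. It is an illustration under assumptions, not the shaper's actual code: the LigatureTable shape (component glyph IDs joined by commas), the function names, the maxLen and maxPasses limits, and the glyph IDs in the usage note are all hypothetical. It also simplifies real OpenType processing, which applies lookups in feature order rather than repeating a single table.

```typescript
// Hypothetical ligature table: key = component glyph IDs joined by ",",
// value = ligature glyph ID. The pre-built font data may be shaped differently.
type LigatureTable = Map<string, number>;

// One left-to-right pass, trying the longest match first at each position.
function applyLigaturesOnce(glyphs: number[], table: LigatureTable, maxLen: number): number[] {
  const out: number[] = [];
  let i = 0;
  while (i < glyphs.length) {
    let matched = false;
    for (let len = Math.min(maxLen, glyphs.length - i); len >= 2; len--) {
      const lig = table.get(glyphs.slice(i, i + len).join(","));
      if (lig !== undefined) {
        out.push(lig);
        i += len;
        matched = true;
        break;
      }
    }
    if (!matched) {
      out.push(glyphs[i]);
      i++;
    }
  }
  return out;
}

// Repeat passes until nothing changes, so that a ligature formed in one
// pass can serve as a component in the next pass.
function applyLigaturesMultiPass(
  glyphs: number[],
  table: LigatureTable,
  maxLen = 4,
  maxPasses = 8
): number[] {
  let current = glyphs;
  for (let pass = 0; pass < maxPasses; pass++) {
    const next = applyLigaturesOnce(current, table, maxLen);
    // Every substitution shortens the run, so equal length means no change.
    if (next.length === current.length) return next;
    current = next;
  }
  return current;
}
```

With a table containing 1,2 → 10 and 10,3 → 20 (made-up IDs standing in for a nested chain such as (க + ◌்) + ஷ), a single pass over [1, 2, 3] stops at [10, 3], which mirrors the reported fallback, while the multi-pass driver reaches [20]. Because each substitution strictly shortens the glyph run, the loop terminates even without the maxPasses guard.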
Scope of v1.0.3 (this release)
- Layout fix for defect (4) (hardBreakSegment)
- tests/core/pdf-document.test.ts regression tests for hard-break
- Four extreme-* PDFs as visual baselines
- tests/integration/extreme-shaping.test.ts end-to-end smoke tests
- docs/playgrounds/extreme-scripts.html for in-browser repros
- docs/playgrounds/medical-800.html Web Worker showcase
Scope of v1.1.0 (this issue tracks)
- Re-extract GPOS anchors in noto-thai-data.{js,d.ts} via tools/build-font-data.cjs
- Verify the pre-built ligatures tables contain deeply-nested chains; rebuild if needed
- Pixel-diff visual regression tests for the extreme-* baselines (probably via pdf2pic + pixelmatch in CI)
Acceptance criteria for v1.1.0
- All four user-supplied PDFs render correctly:
  - Tamil ஸ்ரீ and க்ஷ render as ligatures
  - Bengali ক্ষ্ম and Devanagari क्ष्म render as ligatures
- Extra defects:
  - Isolated Arabic harakat anchor to default base position
  - Thai 3+ mark stacks on ป ฝ ฟ ฬ do not overlap
- No regression in existing npm run test:generate outputs
- Pixel-diff visual regression tests pass on CI for all four extreme-* baselines
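As a rough illustration of the pixel-diff criterion, the sketch below compares two already-rasterized RGBA buffers and counts the pixels that differ. It is a self-contained stand-in and does not use pdf2pic or pixelmatch. The function names, the per-channel tolerance, and the ratio budget are assumptions, and pixelmatch applies a perceptual colour metric and anti-aliasing detection that are omitted here.

```typescript
// Count pixels where any RGBA channel differs by more than `tolerance`
// (0-255). Both buffers must hold width * height RGBA pixels.
function countDifferentPixels(
  a: Uint8Array,
  b: Uint8Array,
  width: number,
  height: number,
  tolerance = 0
): number {
  const expected = width * height * 4;
  if (a.length !== expected || b.length !== expected) {
    throw new Error("buffers must hold width * height RGBA pixels");
  }
  let different = 0;
  for (let p = 0; p < expected; p += 4) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(a[p + c] - b[p + c]) > tolerance) {
        different++;
        break; // count each pixel at most once
      }
    }
  }
  return different;
}

// True when the share of differing pixels stays within maxDiffRatio.
// Both thresholds are placeholders that a real CI setup would need to tune.
function withinBudget(
  baseline: Uint8Array,
  candidate: Uint8Array,
  width: number,
  height: number,
  tolerance: number,
  maxDiffRatio: number
): boolean {
  const different = countDifferentPixels(baseline, candidate, width, height, tolerance);
  return different / (width * height) <= maxDiffRatio;
}
```

A small non-zero tolerance helps absorb rasterizer noise between CI runners, at the cost of hiding very faint regressions such as a slightly shifted mark.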