EPUB export of AZW3/MOBI: resolve TOC fragments and write embedded font assets#23

Open
Imaclean74 wants to merge 1 commit into zacharydenton:master from Imaclean74:feat/epub-resolve-toc-and-fonts
@Imaclean74

Contributor

Closes #22.

Summary

Two narrowly-scoped fixes to EpubExporter that turn currently-broken
output into correct EPUBs for AZW3 / MOBI books with fine-grained NCX
indexes and for any book with embedded fonts.

TOC fragment resolution (AZW3 / MOBI)

AZW3 / MOBI importers leave TOC entries with bare chapter hrefs at open
time; the #fileposN / #aid-XXXX fragment is filled in only when
Book::resolve_toc is called. The exporter previously generated the NCX
before calling that, so every TOC entry within a single source chapter
collapsed onto the same partNNNN.html href.

Both export_raw and export_normalized now call book.resolve_toc()
before generating the NCX. EPUB importers don't need this (their TOC is
already resolved by the importer), so the call is a no-op for that
backend.
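
The ordering fix can be sketched as follows. This is a minimal illustration, not the crate's code: the `Book` and `TocEntry` types, the `pending_fragments` field, and the `ncx_nav_points` / `export_toc` helpers are hypothetical stand-ins, and the real `Book::resolve_toc` may compute fragments quite differently. The only point shown is that resolution has to run before the NCX is rendered, and that a second call changes nothing.

```rust
// Hypothetical, simplified stand-ins for the crate's real types.
#[derive(Debug, Clone, PartialEq)]
pub struct TocEntry {
    pub title: String,
    pub href: String,
}

pub struct Book {
    pub toc: Vec<TocEntry>,
    // Assumption: one optional, not-yet-applied fragment per TOC entry.
    pending_fragments: Vec<Option<String>>,
}

impl Book {
    /// Builds a book from (title, bare href, optional fragment) triples.
    pub fn new(entries: Vec<(&str, &str, Option<&str>)>) -> Self {
        let mut toc = Vec::new();
        let mut pending_fragments = Vec::new();
        for (title, href, fragment) in entries {
            toc.push(TocEntry {
                title: title.to_string(),
                href: href.to_string(),
            });
            pending_fragments.push(fragment.map(str::to_string));
        }
        Book { toc, pending_fragments }
    }

    /// Appends each pending fragment to its entry's href.
    /// `take()` clears the pending slot, so repeated calls are no-ops.
    pub fn resolve_toc(&mut self) {
        for (entry, fragment) in self.toc.iter_mut().zip(self.pending_fragments.iter_mut()) {
            if let Some(f) = fragment.take() {
                entry.href.push('#');
                entry.href.push_str(&f);
            }
        }
    }
}

/// Renders one NCX navPoint per TOC entry (no XML escaping in this sketch).
pub fn ncx_nav_points(book: &Book) -> Vec<String> {
    book.toc
        .iter()
        .enumerate()
        .map(|(i, e)| {
            format!(
                "<navPoint id=\"nav{n}\" playOrder=\"{n}\"><navLabel><text>{t}</text></navLabel><content src=\"{h}\"/></navPoint>",
                n = i + 1,
                t = e.title,
                h = e.href
            )
        })
        .collect()
}

/// Resolve first, then render: the order this PR establishes.
pub fn export_toc(book: &mut Book) -> Vec<String> {
    book.resolve_toc();
    ncx_nav_points(book)
}
```

In this sketch, calling `ncx_nav_points` on a freshly built book yields the same `part0001.html` target for two entries in one chapter, while `export_toc` yields distinct `part0001.html#aid-...` targets.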

Font writing in normalized export

export_normalized writes assets from NormalizedContent::assets,
which only contains resources referenced from the IR DOM. Embedded
fonts are typically referenced from CSS @font-face rules and never
make it into the normalized asset list. The exporter now snapshots
book.list_assets() before normalization and, after writing the
normalized assets, enumerates every fonts/* path that wasn't already
covered — adding both an OPF manifest entry and a ZIP entry for each.

export_raw is unaffected here because it already writes the full
book.list_assets() set verbatim.
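
A rough sketch of the font pass described above, under stated assumptions: asset paths are modelled as plain `String`s, and `missing_font_paths`, `font_media_type`, and `manifest_item` are hypothetical helper names that do not come from the patch. The media-type mapping in particular is a guess at reasonable values, not the exporter's actual table.

```rust
use std::collections::BTreeSet;

/// Returns every `fonts/*` path in `all_assets` that is absent from
/// `normalized_assets`, sorted and de-duplicated.
pub fn missing_font_paths(all_assets: &[String], normalized_assets: &[String]) -> Vec<String> {
    let covered: BTreeSet<&str> = normalized_assets.iter().map(String::as_str).collect();
    let missing: BTreeSet<&str> = all_assets
        .iter()
        .map(String::as_str)
        .filter(|p| p.starts_with("fonts/") && !covered.contains(p))
        .collect();
    missing.into_iter().map(str::to_owned).collect()
}

/// Assumed extension-to-media-type mapping for the OPF manifest.
pub fn font_media_type(path: &str) -> &'static str {
    let ext = path.rsplit('.').next().unwrap_or("").to_ascii_lowercase();
    match ext.as_str() {
        "otf" => "font/otf",
        "ttf" => "font/ttf",
        "woff" => "font/woff",
        "woff2" => "font/woff2",
        _ => "application/octet-stream",
    }
}

/// Formats one OPF manifest item for a font path (no XML escaping here).
pub fn manifest_item(index: usize, path: &str) -> String {
    format!(
        "<item id=\"font{}\" href=\"{}\" media-type=\"{}\"/>",
        index,
        path,
        font_media_type(path)
    )
}
```

Each path returned by `missing_font_paths` would get both a manifest item and a ZIP entry; paths already present in the normalized asset list are skipped so nothing is written twice.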

Related work

#13 surfaced KFX fonts via bcRawFont entity discovery; this PR makes
them survive the EPUB export. For AZW3 / MOBI font extraction, a
separate PR (#21) plumbs the Kindle FONT container decoder through to
list_assets. The font-writing logic here is format-agnostic and
benefits any importer that surfaces fonts/* paths.

Changes

  • src/export/epub.rs:
    • export_raw: call book.resolve_toc() before NCX generation.
    • export_normalized: call book.resolve_toc() before NCX
      generation; snapshot book.list_assets() and, after writing the
      normalized-content assets, emit OPF manifest items + ZIP entries
      for any fonts/* path not already in content.assets.

Tests

tests/epub_exports_fonts_and_toc.rs:

  • epub_export_writes_font_assets_from_kfx — uses the existing
    tests/fixtures/fonts_only.kfx.gz fixture, exports to EPUB, asserts
    the three KFX font assets are written into OEBPS/fonts/ and
    referenced in the OPF manifest.
  • epub_export_resolves_azw3_toc_fragments — uses the existing
    tests/fixtures/epictetus.azw3 fixture, exports to EPUB, asserts
    the generated toc.ncx carries resolved #aid-XXXX fragments
    rather than bare chapter hrefs.

Both tests fail when reverting this patch and pass with it applied —
verified locally by git stash-ing the exporter changes and re-running.
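
For readers who want to spot-check an exported toc.ncx by hand, the kind of assertion the TOC test makes can be approximated with a small helper. This is an illustrative sketch, not the code in tests/epub_exports_fonts_and_toc.rs: `ncx_content_srcs` and `has_fragment` are hypothetical names, and the scan is a naive string search rather than a real XML parse.

```rust
/// Collects the `src` attribute of every `<content .../>` element in an NCX string.
/// Naive scan: assumes double-quoted attributes and no `>` inside attribute values.
pub fn ncx_content_srcs(ncx: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut rest = ncx;
    while let Some(tag_start) = rest.find("<content") {
        let after_tag = &rest[tag_start..];
        let tag_end = after_tag.find('>').unwrap_or(after_tag.len());
        let tag = &after_tag[..tag_end];
        if let Some(attr) = tag.find("src=\"") {
            let value = &tag[attr + 5..];
            if let Some(close) = value.find('"') {
                out.push(value[..close].to_string());
            }
        }
        // Continue scanning after this tag.
        rest = &after_tag[tag_end..];
    }
    out
}

/// True when `src` has the shape `file#fragment` with both parts non-empty.
pub fn has_fragment(src: &str) -> bool {
    matches!(src.split_once('#'), Some((file, frag)) if !file.is_empty() && !frag.is_empty())
}
```

An export of an AZW3 book with in-chapter TOC targets would be expected to have at least some `src` values for which `has_fragment` returns true; before this patch, none would.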

Verification

```
cargo fmt -- --check
cargo clippy --lib --tests
cargo test --lib # 548 passed
cargo test --test epub_exports_fonts_and_toc # 2 passed
```

Two narrowly-scoped improvements to `EpubExporter`:

## TOC fragment resolution

AZW3 and MOBI importers leave TOC entries with bare chapter hrefs
(`partNNNN.html` or `content.html`) at open time; the `#fileposN`
/ `#aid-XXXX` fragment is populated only when `Book::resolve_toc` is
called. Previously, the EPUB exporter generated the NCX before calling
that, so every TOC entry within a single chapter collapsed onto the same
`partNNNN.html` href and readers landed on chapter starts instead of
the intended in-chapter target.

Both `export_raw` and `export_normalized` now call
`book.resolve_toc()` before generating the NCX. EPUB importers don't
need this (their TOC is already resolved by the importer), so the call
is a no-op for that backend.

## Font writing in normalized export

`export_normalized` writes assets from `NormalizedContent::assets`,
which only contains resources referenced from the IR DOM. Embedded
fonts are typically referenced from CSS `@font-face` rules rather
than DOM nodes, so they never made it into the normalized asset list
and the exported EPUB shipped `@font-face` declarations whose `src:`
URLs pointed at files that were never written into the ZIP.

The exporter now snapshots `book.list_assets()` before normalization
and, after writing the normalized assets, additionally enumerates every
`fonts/*` path that wasn't already covered, adding both an OPF
manifest entry and a ZIP entry for each.

`export_raw` is unaffected here because it already writes the full
`book.list_assets()` set verbatim.

## Tests

`tests/epub_exports_fonts_and_toc.rs`:

- `epub_export_writes_font_assets_from_kfx` uses the existing
  `tests/fixtures/fonts_only.kfx.gz` fixture (the one added by zacharydenton#13)
  to confirm that the three KFX font assets the importer surfaces are
  written into the exported EPUB's `OEBPS/fonts/` directory and
  referenced in the OPF manifest.
- `epub_export_resolves_azw3_toc_fragments` uses the existing
  `tests/fixtures/epictetus.azw3` fixture to confirm that the
  generated `toc.ncx` contains resolved `#aid-XXXX` fragments
  rather than the bare chapter hrefs the importer initially produced.

Both tests fail without this patch's changes; both pass with them.

## Related work

zacharydenton#13 surfaced KFX fonts; this PR makes them survive the EPUB export.
For the AZW3 / MOBI side, font extraction is being added in a separate
PR; the font-writing logic here is format-agnostic and benefits any
importer that surfaces `fonts/*` paths.
Development

Successfully merging this pull request may close these issues.

EPUB export of AZW3/MOBI loses TOC fragments and embedded font assets
