Decode Kindle FONT container records for AZW3/MOBI font extraction by Imaclean74 · Pull Request #21 · zacharydenton/boko

Imaclean74 · 2026-05-20T15:49:41Z

Closes #20.

Summary

AZW3 and MOBI ebooks can embed fonts wrapped in a Kindle-specific FONT
FourCC container with optional XOR obfuscation (first 1040 bytes masked
with a per-record key) and optional zlib compression. The importer
previously classified FONT records as metadata and skipped them, so
embedded fonts never reached the asset list and books with custom
typography rendered with fallback system fonts.
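Below is a minimal sketch of the unmasking step, written in Rust to match the crate. It assumes the per-record key is applied cyclically, so payload byte `i` is XORed with `key[i % key.len()]`, and that only the first `min(1040, len)` bytes are masked. `unmask_font_payload` is a hypothetical name, not the PR's API. Because XOR is its own inverse, the same routine both applies and removes the mask.

```rust
/// Number of leading payload bytes covered by the obfuscation mask.
const XOR_EXTENT: usize = 1040;

/// Hypothetical helper: XOR the first `XOR_EXTENT` bytes of `payload` with
/// `key`, repeating the key cyclically (an assumed layout). XOR is an
/// involution, so calling this twice restores the original bytes.
fn unmask_font_payload(payload: &mut [u8], key: &[u8]) {
    if key.is_empty() {
        return; // no key bytes: leave the payload untouched
    }
    let extent = payload.len().min(XOR_EXTENT);
    for (i, byte) in payload[..extent].iter_mut().enumerate() {
        *byte ^= key[i % key.len()];
    }
}
```

This PR unmasks first and inflates second, so the mask covers the stored bytes. The helper would therefore run on the data that follows the record's data offset, before any zlib step.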

A survey of our 864-book AZW3/MOBI test corpus found FONT records in
109 books, 2,121 records in total. With this patch applied, the importer
surfaces them as fonts/font_NNNN.otf assets, and load_image_record
returns the unwrapped font bytes.

Related work

#13 added KFX font extraction by surfacing bcRawFont entities as
fonts/font_NNNN.otf assets. KFX and AZW3/MOBI use different on-disk
mechanisms for embedded fonts (Ion entity table vs. FONT FourCC
container), so this PR is the AZW3/MOBI counterpart to that earlier
work; it does not duplicate or overlap with KFX font handling.

Changes

src/mobi/parser.rs:
- Add decode_font_record, which parses the 24-byte header, reverses
  the XOR mask over the first 1040 bytes when flag bit 1 (0x2) is
  set, then zlib-decompresses when flag bit 0 (0x1) is set.
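- Recognise the FONT magic in detect_font_type (defaults to the
  "otf" extension; the actual format is only known after decoding,
  but e-readers identify fonts via the @font-face src MIME type, not
  the filename extension).
- Drop FONT from is_metadata_record so the records flow through
  discover_assets.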
src/mobi/mod.rs: re-export decode_font_record.
src/import/azw3.rs and src/import/mobi.rs:
- Detect font records in discover_assets (plus the standalone
  discover_assets_from_source in mobi.rs) and emit
  fonts/font_NNNN.<ext> paths.
- Accept the fonts/font_NNNN.<ext> prefix in load_asset (image and
  font records share the same index space; the prefix just selects
  naming).
- Dispatch to decode_font_record in load_image_record when the
  record starts with FONT.
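The decode steps described for decode_font_record can be sketched in std-only Rust. This is not the PR's code. The header field order (magic, decompressed size, flags, data offset, XOR key length, XOR key offset, each a big-endian u32) is an assumption chosen to match the 24-byte size above. `decode_font_record_sketch` and `FontRecordError` are hypothetical names. Zlib inflation is passed in as a closure so the example compiles without a crate such as `flate2`.

```rust
const HEADER_LEN: usize = 24; // six big-endian u32 fields (assumed order)
const XOR_EXTENT: usize = 1040; // leading payload bytes covered by the mask
const FLAG_ZLIB: u32 = 0b01; // bit 0: payload is zlib-compressed
const FLAG_XOR: u32 = 0b10; // bit 1: payload is XOR-obfuscated

#[derive(Debug, PartialEq)]
enum FontRecordError {
    WrongMagic,
    Truncated,
    OffsetOutOfRange,
    Inflate(String),
}

/// Read a big-endian u32 at `at`; callers guarantee `at + 4 <= data.len()`.
fn be_u32(data: &[u8], at: usize) -> u32 {
    u32::from_be_bytes([data[at], data[at + 1], data[at + 2], data[at + 3]])
}

/// Hypothetical decoder: check magic and length, unmask, then inflate.
/// `inflate` stands in for a real zlib decoder.
fn decode_font_record_sketch<F>(record: &[u8], inflate: F) -> Result<Vec<u8>, FontRecordError>
where
    F: Fn(&[u8]) -> Result<Vec<u8>, String>,
{
    if record.len() >= 4 && !record.starts_with(b"FONT") {
        return Err(FontRecordError::WrongMagic);
    }
    if record.len() < HEADER_LEN {
        return Err(FontRecordError::Truncated);
    }
    // Offset 4 holds the decompressed size; this sketch does not validate it.
    let flags = be_u32(record, 8);
    let data_start = be_u32(record, 12) as usize;
    let xor_len = be_u32(record, 16) as usize;
    let xor_start = be_u32(record, 20) as usize;
    let mut payload = record
        .get(data_start..)
        .ok_or(FontRecordError::OffsetOutOfRange)?
        .to_vec();
    if flags & FLAG_XOR != 0 {
        // The key must lie entirely inside the record.
        let key = xor_start
            .checked_add(xor_len)
            .and_then(|end| record.get(xor_start..end))
            .ok_or(FontRecordError::OffsetOutOfRange)?;
        if !key.is_empty() {
            let extent = payload.len().min(XOR_EXTENT);
            for (i, b) in payload[..extent].iter_mut().enumerate() {
                *b ^= key[i % key.len()];
            }
        }
    }
    if flags & FLAG_ZLIB != 0 {
        payload = inflate(&payload).map_err(FontRecordError::Inflate)?;
    }
    Ok(payload)
}
```

Injecting the inflater also makes the four flag combinations easy to unit-test with a stand-in function. A real build would wrap an actual zlib decoder.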

Tests

Seven new unit tests in mobi::parser::tests:

test_decode_font_record_uncompressed_plain — bare payload, no flags
test_decode_font_record_zlib_compressed — zlib only
test_decode_font_record_xor_obfuscated — XOR mask only
test_decode_font_record_xor_and_zlib — both flags (real-world case)
test_decode_font_record_rejects_wrong_magic
test_decode_font_record_rejects_truncated
test_decode_font_record_rejects_offset_beyond_record
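The decode tests need synthetic FONT records for each flag combination. A hypothetical fixture builder is sketched below. It assumes the same six-field big-endian header, stores the XOR key directly after the header and the payload after the key, and leaves compression to the caller. When bit 0 is set, pass bytes that are already zlib-compressed, since std has no zlib encoder. `build_font_record` is an illustrative name, not a helper from this PR.

```rust
const HEADER_LEN: usize = 24;
const XOR_EXTENT: usize = 1040;

/// Hypothetical test fixture: wrap `stored` (already zlib-compressed when
/// bit 0 of `flags` is set) in a FONT container. When bit 1 is set, the
/// first `XOR_EXTENT` bytes are masked with `key`, applied cyclically
/// (assumed). `decompressed_len` is written verbatim into the size field.
fn build_font_record(flags: u32, decompressed_len: u32, key: &[u8], stored: &[u8]) -> Vec<u8> {
    let key_start = HEADER_LEN as u32; // key sits right after the header
    let data_start = key_start + key.len() as u32; // payload follows the key
    let mut payload = stored.to_vec();
    if flags & 0b10 != 0 && !key.is_empty() {
        let extent = payload.len().min(XOR_EXTENT);
        for (i, b) in payload[..extent].iter_mut().enumerate() {
            *b ^= key[i % key.len()];
        }
    }
    let mut record = Vec::with_capacity(data_start as usize + payload.len());
    record.extend_from_slice(b"FONT");
    for field in [decompressed_len, flags, data_start, key.len() as u32, key_start].iter() {
        record.extend_from_slice(&field.to_be_bytes());
    }
    record.extend_from_slice(key);
    record.extend_from_slice(&payload);
    record
}
```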

Existing test_detect_font_type and test_is_metadata_record updated for
the new FONT classification.

Verification

```
cargo fmt -- --check
cargo clippy --lib
cargo test --lib # 555 passed
```

End-to-end check on a real AZW3 with one embedded font: the exported EPUB
contains OEBPS/fonts/font_0000.otf, whose first bytes are the TrueType
magic 00 01 00 00 followed by the expected font tables. This confirms
that the full decode pipeline (XOR unmask → zlib inflate → write to EPUB)
works.
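The end-to-end check above keys on the leading sfnt version bytes. A small, hypothetical helper for that kind of smoke test maps the first four bytes of a decoded font to a format. The magic values are the standard ones for TrueType (`00 01 00 00`, or Apple's `true`), CFF-flavoured OpenType (`OTTO`), and WOFF/WOFF2 (`wOFF`/`wOF2`). `sniff_font_flavor` is an illustrative name, not part of this PR.

```rust
/// Hypothetical helper: classify decoded font bytes by their leading magic
/// number. Returns `None` for anything unrecognised, including a record that
/// still starts with the FONT container magic (i.e. was never decoded).
fn sniff_font_flavor(bytes: &[u8]) -> Option<&'static str> {
    match bytes.get(..4)? {
        &[0x00, 0x01, 0x00, 0x00] => Some("ttf"),
        b"true" => Some("ttf"),
        b"OTTO" => Some("otf"),
        b"wOFF" => Some("woff"),
        b"wOF2" => Some("woff2"),
        _ => None,
    }
}
```

On the exported font_0000.otf described above, this helper would report "ttf". That illustrates the point that the .otf extension is a naming default, not a statement of the actual outline format.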

References

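MobileRead Wiki, MOBI/AZW format overview and palm record types: https://wiki.mobileread.com/wiki/MOBI
Amazon KDP Publishing Guidelines, embedded font support in AZW3 (@font-face rules are honoured by Kindle e-ink devices and apps): https://kdp.amazon.com/en_US/help/topic/G201834180
EPUB 3.3 Core Media Types §3.3 (OTF, TTF, and WOFF are accepted font resource media types): https://www.w3.org/TR/epub-33/#sec-core-media-types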


This was referenced May 24, 2026:

- #22 EPUB export of AZW3/MOBI loses TOC fragments and embedded font assets (Open)
- #23 EPUB export of AZW3/MOBI: resolve TOC fragments and write embedded font assets (Open)

Imaclean74 wants to merge 1 commit into zacharydenton:master from Imaclean74:feat/azw3-mobi-font-extraction.
