Decode Kindle FONT container records for AZW3/MOBI font extraction #21
Open
Imaclean74 wants to merge 1 commit into
AZW3 and MOBI ebooks can embed fonts wrapped in a Kindle-specific `FONT` FourCC container with optional XOR obfuscation (first 1040 bytes masked with a per-record key) and optional zlib compression. Previously the importer classified `FONT` records as metadata and skipped them entirely, so embedded fonts never reached the asset list and books with custom typography rendered with fallback system fonts.

This patch:

- Adds `decode_font_record` to parse the `FONT` container header, reverse the XOR mask, and zlib-decompress the payload, returning the raw font bytes (typically OTF / TTF / WOFF).
- Recognises the `FONT` magic in `detect_font_type` (defaults to `.otf` extension; the actual format is known after decoding but e-readers identify fonts via the `@font-face src:` MIME, not the filename extension).
- Removes `FONT` from `is_metadata_record` so the records flow through `discover_assets`.
- Updates AZW3 and MOBI `discover_assets` (plus the standalone `discover_assets_from_source`) to emit `fonts/font_NNNN.<ext>` paths when a record sniffs as a font.
- Extends AZW3 and MOBI `load_asset` to accept the `fonts/font_NNNN.ext` prefix (image and font records share the same index space, so the prefix just selects naming).
- Updates AZW3 and MOBI `load_image_record` to dispatch to `decode_font_record` when the record starts with `FONT`.

Includes 7 unit tests covering the four flag combinations (uncompressed / zlib-only / XOR-only / both), wrong-magic rejection, truncated-record rejection, and out-of-range data-offset rejection.
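The header-parse and XOR-unmask steps might look roughly like the sketch below. It is not the code from this patch: the name `unwrap_font_record`, the `FontPayload` struct, and the order of the header fields (declared size, flags, data offset, key length, key offset, all big-endian `u32`) are assumptions for illustration. Only the 24-byte header, the flag bits, and the 1040-byte mask span come from the description. To stay dependency-free the sketch stops before inflation and reports whether the payload is still zlib-compressed.

```rust
/// Result of unwrapping a `FONT` record (hypothetical type, not from the patch).
#[derive(Debug, PartialEq)]
pub struct FontPayload {
    /// Payload after XOR unmasking; still zlib-compressed when `zlib` is true.
    pub bytes: Vec<u8>,
    /// True when flag bit 0 is set, so the caller still has to inflate `bytes`.
    pub zlib: bool,
    /// Decoded size as declared in the header (assumed field).
    pub declared_size: u32,
}

const HEADER_LEN: usize = 24; // header size stated in the PR
const XOR_SPAN: usize = 1040; // only this many leading bytes are masked

/// Reads a big-endian u32; callers must have checked the bounds.
fn be_u32(buf: &[u8], at: usize) -> u32 {
    u32::from_be_bytes([buf[at], buf[at + 1], buf[at + 2], buf[at + 3]])
}

/// Parses the header and reverses the XOR mask. The field layout is an assumption.
pub fn unwrap_font_record(record: &[u8]) -> Result<FontPayload, &'static str> {
    if record.len() < HEADER_LEN {
        return Err("truncated FONT record");
    }
    if !record.starts_with(b"FONT") {
        return Err("missing FONT magic");
    }
    let declared_size = be_u32(record, 4);
    let flags = be_u32(record, 8);
    let data_start = be_u32(record, 12) as usize;
    let key_len = be_u32(record, 16) as usize;
    let key_start = be_u32(record, 20) as usize;

    if data_start > record.len() {
        return Err("data offset beyond record");
    }
    let mut bytes = record[data_start..].to_vec();

    // Bit 1: XOR obfuscation over at most the first 1040 payload bytes.
    if flags & 0x2 != 0 {
        let key_end = key_start.checked_add(key_len).ok_or("key range overflow")?;
        if key_len == 0 || key_end > record.len() {
            return Err("XOR key out of range");
        }
        let key = &record[key_start..key_end];
        for (i, b) in bytes.iter_mut().take(XOR_SPAN).enumerate() {
            *b ^= key[i % key_len];
        }
    }

    // Bit 0: zlib compression; inflation is left to the caller in this sketch.
    Ok(FontPayload { bytes, zlib: flags & 0x1 != 0, declared_size })
}
```

When `zlib` is true, a caller would pass `bytes` through a zlib decoder (for instance one from the `flate2` crate) and could compare the result length against `declared_size` as a sanity check.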
## Format references

- MobileRead Wiki (MOBI / AZW format overview, Palm record types): https://wiki.mobileread.com/wiki/MOBI
- Amazon Publishing Guidelines (embedded font support in AZW3; `@font-face` rules are honoured by Kindle e-ink and apps): https://kdp.amazon.com/en_US/help/topic/G201834180
- EPUB 3.3 Core Media Types §3.3 (OTF / TTF / WOFF are accepted font resource MIME types): https://www.w3.org/TR/epub-33/#sec-core-media-types
This was referenced May 24, 2026
Closes #20.
## Summary

AZW3 and MOBI ebooks can embed fonts wrapped in a Kindle-specific `FONT` FourCC container with optional XOR obfuscation (first 1040 bytes masked with a per-record key) and optional zlib compression. The importer previously classified `FONT` records as metadata and skipped them, so embedded fonts never reached the asset list and books with custom typography rendered with fallback system fonts.

A survey of our 864-book AZW3/MOBI test corpus found 109 books carrying `FONT` records, 2,121 records in total. With this patch applied, the importer surfaces them as `fonts/font_NNNN.otf` assets and `load_image_record` returns the unwrapped font bytes.
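
As an illustration of the naming scheme, the sketch below formats and parses `fonts/font_NNNN.<ext>` paths. The function names, the `AssetKind` enum, and the `images/image_` prefix for image records are hypothetical and were chosen only for this example; the patch's actual `load_asset` logic may differ.

```rust
/// Which naming prefix an asset path used (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum AssetKind {
    Image,
    Font,
}

/// Builds a font asset path with a zero-padded four-digit record index.
pub fn font_asset_path(index: usize, ext: &str) -> String {
    format!("fonts/font_{index:04}.{ext}")
}

/// Maps an asset path back to (kind, record index).
/// Both kinds resolve into the same record index space; the prefix only
/// selects the naming. The `images/image_` prefix is an assumption.
pub fn parse_asset_path(path: &str) -> Option<(AssetKind, usize)> {
    let (kind, rest) = if let Some(rest) = path.strip_prefix("fonts/font_") {
        (AssetKind::Font, rest)
    } else if let Some(rest) = path.strip_prefix("images/image_") {
        (AssetKind::Image, rest)
    } else {
        return None;
    };
    // Split "0012.otf" into the digits and the extension.
    let (digits, ext) = rest.split_once('.')?;
    if digits.is_empty() || ext.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    digits.parse().ok().map(|index| (kind, index))
}
```

For example, `font_asset_path(0, "otf")` yields `fonts/font_0000.otf`, and `parse_asset_path` maps that string back to a font asset at record index 0.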

## Related work

#13 added KFX font extraction by surfacing `bcRawFont` entities as `fonts/font_NNNN.otf` assets. KFX and AZW3/MOBI use different on-disk mechanisms for embedded fonts (Ion entity table vs. `FONT` FourCC container), so this PR is the AZW3/MOBI counterpart to that earlier work; it does not duplicate or overlap with KFX font handling.

## Changes

- `src/mobi/parser.rs`:
  - Add `decode_font_record`: parses the 24-byte header, reverses the XOR mask over the first 1040 bytes when bit 1 is set, then zlib-decompresses when bit 0 is set.
  - Recognise the `FONT` magic in `detect_font_type` (defaults to `"otf"` extension).
  - Remove `FONT` from `is_metadata_record`.
- `src/mobi/mod.rs`: re-export `decode_font_record`.
- `src/import/azw3.rs` and `src/import/mobi.rs`:
  - Update `discover_assets` (plus the standalone `discover_assets_from_source` in `mobi.rs`) to emit `fonts/font_NNNN.<ext>` paths.
  - Accept the `fonts/font_NNNN.ext` prefix in `load_asset` (image and font records share the same index space; the prefix just selects naming).
  - Dispatch to `decode_font_record` in `load_image_record` when the record starts with `FONT`.

## Tests

Seven new unit tests in `mobi::parser::tests`:

- `test_decode_font_record_uncompressed_plain`: bare payload, no flags
- `test_decode_font_record_zlib_compressed`: zlib only
- `test_decode_font_record_xor_obfuscated`: XOR mask only
- `test_decode_font_record_xor_and_zlib`: both flags (real-world case)
- `test_decode_font_record_rejects_wrong_magic`
- `test_decode_font_record_rejects_truncated`
- `test_decode_font_record_rejects_offset_beyond_record`

The existing `test_detect_font_type` and `test_is_metadata_record` are updated for the new `FONT` classification.

## Verification

```
cargo fmt -- --check
cargo clippy --lib
cargo test --lib # 555 passed
```

End-to-end check on a real AZW3 with one embedded font: the exported EPUB contains `OEBPS/fonts/font_0000.otf`, whose first bytes are the TrueType magic `00 01 00 00` followed by the expected font tables, confirming that the full decode pipeline (XOR-unmask → zlib-inflate → write-to-EPUB) works.
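
The magic-byte check used in that end-to-end verification can be sketched as a small sniffing helper. `sniff_font_ext` is a hypothetical name and is not part of this patch, which defaults to the `.otf` extension regardless of the decoded content. The signatures are the standard ones: `00 01 00 00` for TrueType outlines, `OTTO` for CFF-based OpenType, and `wOFF` for WOFF.

```rust
/// Guesses a file extension from the first four bytes of decoded font data.
/// Hypothetical helper for illustration; returns None for unknown signatures.
pub fn sniff_font_ext(bytes: &[u8]) -> Option<&'static str> {
    if bytes.starts_with(&[0x00, 0x01, 0x00, 0x00]) {
        Some("ttf") // sfnt version 1.0: TrueType outlines
    } else if bytes.starts_with(b"OTTO") {
        Some("otf") // OpenType with CFF outlines
    } else if bytes.starts_with(b"wOFF") {
        Some("woff") // WOFF 1.0 container
    } else {
        None
    }
}
```

Applied to the font from the check above, the helper would report `ttf` even though the asset is named `font_0000.otf`. That is consistent with the note that readers rely on the `@font-face` declaration rather than the filename extension.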

## References