Strip top-level Ion annotation in KFX parse_entity_ion #19

Open
Imaclean74 wants to merge 2 commits into zacharydenton:master from
Imaclean74:fix/kfx-ion-annotated-entities
@Imaclean74

Summary

Some KFX containers tag entities with an Ion type annotation, e.g.
$490::{ ... }, where the outer Ion value is an Annotated wrapper around
the entity struct. Every callsite of parse_entity_ion in import/kfx.rs
then reaches for the struct via .as_struct() (directly or via get_field),
which returns None for an Annotated value — silently dropping the
entity's fields and producing a book with empty metadata.
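
The failure mode can be illustrated with a minimal sketch. The `IonValue` enum and the `title_of` helper below are hypothetical stand-ins, not the crate's actual definitions; only the variant name `Annotated` and the accessor name `as_struct` come from the description above.

```rust
use std::collections::HashMap;

// Hypothetical, simplified model of an Ion value (not the crate's real type).
#[derive(Debug, Clone, PartialEq)]
pub enum IonValue {
    String(String),
    Struct(HashMap<String, IonValue>),
    // Annotation symbols plus the wrapped value, e.g. $490::{ ... }.
    Annotated(Vec<String>, Box<IonValue>),
}

impl IonValue {
    // Returns the fields only when the value itself is a struct.
    // An Annotated wrapper around a struct yields None.
    pub fn as_struct(&self) -> Option<&HashMap<String, IonValue>> {
        match self {
            IonValue::Struct(fields) => Some(fields),
            _ => None,
        }
    }
}

// Hypothetical consumer: reads "title", falling back to the default
// (an empty string) when the struct or the field is missing.
pub fn title_of(entity: &IonValue) -> String {
    entity
        .as_struct()
        .and_then(|fields| fields.get("title"))
        .and_then(|value| match value {
            IonValue::String(s) => Some(s.clone()),
            _ => None,
        })
        .unwrap_or_default()
}
```

Under these assumptions, passing an `Annotated` value to `title_of` produces an empty title without any error, which mirrors the silent empty-metadata symptom.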

We've observed this on multiple real-world KFX-ZIP packages: the inner
.azw.md metadata sidecar carries a $490::{ ... }-wrapped book_metadata
entity, so title, authors, publisher, language, identifier, etc.
all return their defaults until the wrapper is stripped. A synthetic
fixture reproducing the issue is included.

This patch strips the wrapper once at the parse boundary so all importer
paths can treat the returned value as the entity itself, consistent with how
bin/kfx-dump.rs already unwraps annotated values at every consumer (~17
manual unwraps in that binary).

Changes

  • src/import/kfx.rs: parse_entity_ion now matches on the parsed value
    and returns the inner value when it is an IonValue::Annotated. Existing
    behaviour for un-annotated values is unchanged.

The fix uses a move (not the existing unwrap_annotated() accessor) to
avoid a clone on the hot import path; unwrap_annotated() returns a
reference and would force the caller to clone.
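
A rough sketch of the move-versus-reference trade-off, under stated assumptions: the `IonValue` enum here is a simplified hypothetical, `unwrap_annotated` is modelled only as "returns a reference" per the paragraph above, and `strip_top_level_annotation` is an illustrative name rather than a function in the patch.

```rust
// Hypothetical, simplified model of an Ion value (not the crate's real type).
#[derive(Debug, Clone, PartialEq)]
pub enum IonValue {
    Int(i64),
    Struct(Vec<(String, IonValue)>),
    Annotated(Vec<String>, Box<IonValue>),
}

impl IonValue {
    // Stand-in for the reference-returning accessor: the caller gets a
    // borrow of the inner value and must clone it to obtain ownership.
    pub fn unwrap_annotated(&self) -> &IonValue {
        match self {
            IonValue::Annotated(_, inner) => &**inner,
            other => other,
        }
    }
}

// Move-based alternative: consumes the parsed value and returns the inner
// value of a top-level Annotated wrapper without cloning. Un-annotated
// values pass through unchanged. Only one level of wrapping is removed.
pub fn strip_top_level_annotation(value: IonValue) -> IonValue {
    match value {
        IonValue::Annotated(_, inner) => *inner,
        other => other,
    }
}
```

Moving out of the `Box` with `*inner` transfers ownership of the entity, so the annotation symbols are dropped and the struct is returned as-is.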

Test

  • tests/fixtures/annotated_metadata.kfx.gz (501 bytes gzipped, 641 bytes
    uncompressed): a synthetic KFX container with a single
    $490::{ ... }-annotated book_metadata entity carrying eight synthetic
    categorised metadata fields. Built by
    examples/build_annotated_kfx_fixture.rs using only the public
    kfx::serialization helpers, with no real-book provenance.
  • tests/kfx_annotated_metadata.rs: opens the fixture via the public
    Book::open API and asserts that metadata().title, authors, publisher,
    language, identifier, and date all round-trip. Before this fix,
    the test fails with left: "" / right: "Annotated Entity Test Book"
    on the first assertion.

Verification

cargo fmt -- --check
cargo clippy --lib --tests --examples
cargo test --lib            # 548 passed
cargo test --test kfx_annotated_metadata  # 1 passed

Reference

Amazon Ion specification on annotations — annotations are schema-level
metadata and do not change a value's logical shape:
https://amazon-ion.github.io/ion-docs/docs/spec.html#annot

Some KFX containers tag entities with an Ion type annotation, e.g.
`$490::{ ... }`, where the outer Ion value is an Annotated wrapper
around the actual entity struct. Every callsite of `parse_entity_ion`
in `import/kfx.rs` then calls `.as_struct()` (directly or via
`get_field()`) on the returned value, which returns `None` for an
`Annotated`, silently dropping the entity's fields.

Strip the wrapper once at the parse boundary so all importer paths can
treat the returned value as the entity itself, consistent with how
`bin/kfx-dump.rs` already unwraps annotated values at every consumer.

This matches the Ion spec's treatment of annotations as schema-level
metadata that does not change the value's logical shape:
https://amazon-ion.github.io/ion-docs/docs/spec.html#annot

Adds a 641-byte (501 bytes gzipped) synthetic KFX fixture with a single
`$490`-annotated `book_metadata` entity carrying eight categorised
metadata fields (title/author/publisher/language/ASIN/book_id/cde/issue_date).

`tests/kfx_annotated_metadata.rs` opens the fixture via the public
`Book::open` API and asserts that all metadata fields round-trip. Before
the parent commit's fix, this test fails because the importer's
`.as_struct()` calls return `None` for the annotated value and every
field comes back empty.

`examples/build_annotated_kfx_fixture.rs` is a small generator that uses
the public `kfx::serialization` helpers to rebuild the fixture from
synthetic data. Run with
`cargo run --release --example build_annotated_kfx_fixture` to
regenerate.