
Support JWST and Roman #25

Open
icweaver wants to merge 6 commits into main from strict

Conversation

@icweaver (Member) commented Mar 21, 2026

Hey folks, these are a few toggles I added to try to support some of the quirks I ran into "out in the wild". I've split them into the following three commits:

  1. Don't choke on unknown tags: dc40329
    1. This just provides a fallback for STScI's many asdf tags. Could be replaced by specific converters in the future.
    2. Also adds a metadata tag and a newer ndarray-1.1.0 tag that Roman seems to use now.
    3. This feature is disabled by default (ASDF.load(<filename.asdf>; extensions = false)) to preserve current behavior; pass extensions = true to opt in. Similar to the extensions kwarg in the Python impl.
  2. Checksum or not to checksum: 56a8c33
    1. Apparently some Roman data products store their file's checksum based on the uncompressed file instead of compressed (ASDF.jl's default). This breaks things.
    2. Added a flag to get around this, similarly to the Python impl.
  3. Blocks vs Frames: 0d425a0
    1. Roman seems to use the block layout for their Lz4 compression, instead of frame. This adds support for both, and automatically handles it using magic numbers.
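The magic-number dispatch in item 3 boils down to inspecting the first four bytes of the block payload. A minimal sketch of that check (the helper name here is mine, not the PR's):

```julia
# The LZ4 frame format begins with the magic number 0x184D2204, stored
# little-endian, i.e. the byte sequence 0x04 0x22 0x4D 0x18. Raw LZ4 blocks
# (which Python's asdf appears to write for Roman files) carry no such prefix,
# so the presence of the magic distinguishes the two layouts.
is_lz4_frame(data::AbstractVector{UInt8}) =
    length(data) >= 4 &&
    data[1] == 0x04 && data[2] == 0x22 &&
    data[3] == 0x4D && data[4] == 0x18
```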

Usage examples:
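An illustrative sketch of item 1, based on the REPL session later in this thread (the checksum flag from commit 56a8c33 has its own kwarg, whose exact name I'm omitting here):

```julia
using ASDF

# Item 1: enable the fallback for STScI's extension tags (disabled by default)
af = ASDF.load_file("docs/data/roman.asdf"; extensions = true)

# With the fallback active, tagged trees load as plain metadata and ndarrays resolve as usual
ndarray = af.metadata["roman"]["data"]
```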

Does this seem reasonable to folks? I'd especially appreciate feedback on the last point, which I have little experience with and relied heavily on Claude to come up with the needed incantations.

To-do

  • Add tests for item 3 if we end up sticking with an impl like this
  • Add Lz4 frame/block write option

@icweaver mentioned this pull request Mar 21, 2026
codecov bot commented Mar 21, 2026

Codecov Report

❌ Patch coverage is 98.11321% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 95.54%. Comparing base (c6f56b0) to head (e5b6bd4).

Files with missing lines | Patch % | Lines
src/ASDF.jl | 98.11% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #25      +/-   ##
==========================================
+ Coverage   95.28%   95.54%   +0.26%     
==========================================
  Files           1        1              
  Lines         318      359      +41     
==========================================
+ Hits          303      343      +40     
- Misses         15       16       +1     


data[3] == 0x4D && data[4] == 0x18)
return decode(LZ4FrameCodec(), data)
else
# If the data was originally created from Python's ASDF, then it will be in block instead of frame layout,
Collaborator
Can we also produce the block layout? Should we? Can Python handle the frame layout?

Member Author
Thanks for reminding me, will add that in. Looks like they can, but as a plug-in https://github.com/asdf-format/asdf-compression

Member Author

Added in e012b21

There is a layout option for NDArrayWrapper now to flip between frame and block. Does that seem like a reasonable place to control things? I guess it doesn't really make sense for other compression schemes, so I just kept the existing frame layout as the default.

From a maintainability standpoint, I don't love that there is now a hand-rolled Lz4-specific encode and decode path to accommodate Python's asdf scheme. There's also the matter of compatibility with Lz4 frame support on the Python side, but since that's still an experimental plugin, maybe that can be a problem for future us to deal with.

I am now squarely out of my comfort zone and would gladly accept any suggestions for simplifying things, haha. Thanks again for taking a look!

Collaborator

If this is only for lz4 then I would call it lz4_layout, with values :frame and :block. I am sure that other compression schemes will also want to have options in the future, e.g. specifying the compression level.

Member Author

Sounds good, just renamed here e5b6bd4

@eschnett (Collaborator)
The checksum bit sounds weird. There is a standard for checksums, and THE major player in the field gets the implementation wrong? Does this make checksums useless in practice? Well, the world is what it is...

@icweaver (Member Author) commented Mar 21, 2026

Yea, the checksum bit was bothering me too. Here's what I am seeing for the sample Roman data I tried:

julia> using ASDF, MD5

julia> af = ASDF.load_file("docs/data/roman.asdf"; extensions = true);

julia> ndarray = af.metadata["roman"]["data"];

julia> header = ndarray.lazy_block_headers.block_headers[ndarray.source + 1]
ASDF.BlockHeader(IOStream(<file docs/data/roman.asdf>), 176347, UInt8[0xd3, 0x42, 0x4c, 0x4b], 0x0030, 0x00000000, UInt8[0x6c, 0x7a, 0x34, 0x00], 0x0000000003ffe53b, 0x0000000003ffe53b, 0x0000000003fc0100, UInt8[0xef, 0x4e, 0x63, 0x45, 0xc4, 0xd6, 0xcd, 0xa0, 0xed, 0x4d, 0x14, 0x27, 0x43, 0xa7, 0xb2, 0xbc], true)

julia> block_data_start = header.position + 6 + header.header_size
176401

julia> seek(header.io, block_data_start)
IOStream(<file docs/data/roman.asdf>)

julia> data = Array{UInt8}(undef, header.used_size);

julia> nb = readbytes!(header.io, data)
67102011

julia> data_decompressed = ASDF.decode_Lz4(data);

julia> md5(data_decompressed) == header.checksum
true
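As a side note, the block_data_start arithmetic in the transcript can be sanity-checked: assuming the ASDF block layout of a 4-byte block magic (the 0xd3 'B' 'L' 'K' shown in the BlockHeader above) followed by a 2-byte header-size field, the data begins 6 bytes plus the header itself past header.position.

```julia
# Values taken from the BlockHeader in the transcript above.
position    = 176347     # header.position
header_size = 0x0030     # 48-byte header

# 4-byte block magic + 2-byte header-size field + the header itself
block_data_start = position + 4 + 2 + Int(header_size)
# → 176401, matching the REPL output
```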

I wonder if this behavior could just be an uncompressed data streaming thing for this specific instance with Roman's data products. I don't work with AWS S3, but it seems that this is a selling point for them. fwiw, the JWST data I tried behaves as expected (it's just downloaded as a regular file). Here are worked examples for both

Doc previews:

