Codecov Report

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main      #25      +/-   ##
==========================================
+ Coverage   95.28%   95.54%   +0.26%
==========================================
  Files           1        1
  Lines         318      359      +41
==========================================
+ Hits          303      343      +40
- Misses         15       16       +1
```
Quoted diff context:

```julia
   data[3] == 0x4D && data[4] == 0x18)
    return decode(LZ4FrameCodec(), data)
else
    # If the data was originally created from Python's ASDF, then it will be in block instead of frame layout,
```
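For context, the branch above keys off the LZ4 frame magic number. A minimal sketch of that detection, assuming 1-based byte access as in the diff (`is_lz4_frame` is a hypothetical helper name, not ASDF.jl API):

```julia
# An LZ4 frame starts with the 4-byte magic 0x04 0x22 0x4D 0x18
# (little-endian 0x184D2204); Python asdf's legacy lz4 compression
# writes raw blocks with no such header, so a missing magic means
# we should fall back to the block-layout decode path.
const LZ4_FRAME_MAGIC = (0x04, 0x22, 0x4D, 0x18)

function is_lz4_frame(data::AbstractVector{UInt8})
    length(data) >= 4 || return false
    all(i -> data[i] == LZ4_FRAME_MAGIC[i], 1:4)
end
```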
Can we also produce the block layout? Should we? Can Python handle the frame layout?
Thanks for reminding me, will add that in. Looks like they can, but as a plug-in https://github.com/asdf-format/asdf-compression
Added in e012b21
There is a layout option for NDArrayWrapper now to flip between frame and block layouts. Does that seem like a reasonable place to control this? I guess it doesn't really make sense for other compression schemes, so I just set the default layout as the default.
I don't love, from a maintainability point of view, that there is now a hand-rolled LZ4-specific encode and decode path to accommodate Python's asdf scheme. There's also the matter of compatibility with LZ4 frame support on the Python side, but since that's still an experimental plugin, maybe that can be a problem for future us to deal with.
I am now squarely out of my comfort zone and would gladly accept any suggestions for simplifying things, haha. Thanks again for taking a look!
If this is only for lz4 then I would call it lz4_layout, with values :frame and :block. I am sure that other compression schemes will also want to have options in the future, e.g. specifying the compression level.
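To make that concrete, here is a rough sketch of how an `lz4_layout` keyword could route between the two layouts. The `encode_lz4_*` helpers are placeholders standing in for the real codec calls; none of these names are ASDF.jl API:

```julia
# Placeholder encoders standing in for the real frame/block codec paths.
encode_lz4_frame(data) = data   # e.g. the LZ4 frame codec call
encode_lz4_block(data) = data   # e.g. the hand-rolled block-layout path

# Route on the proposed `lz4_layout` keyword, rejecting unknown values
# early so a typo fails loudly instead of silently writing the wrong layout.
function encode_lz4(data::Vector{UInt8}; lz4_layout::Symbol = :frame)
    lz4_layout === :frame && return encode_lz4_frame(data)
    lz4_layout === :block && return encode_lz4_block(data)
    throw(ArgumentError("lz4_layout must be :frame or :block, got $(repr(lz4_layout))"))
end
```

Scoping the keyword with an `lz4_` prefix leaves room for other scheme-specific options later, e.g. a compression-level knob for another codec.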
The checksum bit sounds weird. There is a standard for checksums, and THE major player in the field gets the implementation wrong? Does this make checksums useless in practice? Well, the world is what it is...
Yea, the checksum bit was bothering me too. Here's what I am seeing for the sample Roman data I tried:

```julia
julia> using ASDF, MD5

julia> af = ASDF.load_file("docs/data/roman.asdf"; extensions = true);

julia> ndarray = af.metadata["roman"]["data"];

julia> header = ndarray.lazy_block_headers.block_headers[ndarray.source + 1]
ASDF.BlockHeader(IOStream(<file docs/data/roman.asdf>), 176347, UInt8[0xd3, 0x42, 0x4c, 0x4b], 0x0030, 0x00000000, UInt8[0x6c, 0x7a, 0x34, 0x00], 0x0000000003ffe53b, 0x0000000003ffe53b, 0x0000000003fc0100, UInt8[0xef, 0x4e, 0x63, 0x45, 0xc4, 0xd6, 0xcd, 0xa0, 0xed, 0x4d, 0x14, 0x27, 0x43, 0xa7, 0xb2, 0xbc], true)

julia> block_data_start = header.position + 6 + header.header_size
176401

julia> seek(header.io, block_data_start)
IOStream(<file docs/data/roman.asdf>)

julia> data = Array{UInt8}(undef, header.used_size);

julia> nb = readbytes!(header.io, data)
67102011

julia> data_decompressed = ASDF.decode_Lz4(data);

julia> md5(data_decompressed) == header.checksum
true
```

I wonder if this behavior could just be an uncompressed data streaming thing for this specific instance with Roman's data products. I don't work with AWS S3, but it seems that this is a selling point for them, I guess. FWIW, the JWST data I tried looks to behave as expected (it's just downloaded as a regular file). Here are worked examples for both.

Doc previews:
Hey folks, these are a few toggles I added to try and support some of the quirks I ran into "out in the wild". I've split them into the following three commits:
- `ndarray-1.1.0` tag that Roman seems to use now.
- `ASDF.load(<filename.asdf>; extensions = false)` kwarg by default to preserve current behavior. Similar to the `extensions` kwarg in the Python impl.

Usage examples:
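A minimal sketch of the `extensions` kwarg in use, assuming the `ASDF.load_file` call shape seen earlier in this thread (the file path is a placeholder):

```julia
using ASDF

# Default: extensions disabled, preserving pre-PR behavior.
af_plain = ASDF.load_file("docs/data/roman.asdf")

# Opt in to extension converters (e.g. Roman's ndarray-1.1.0 tagged arrays).
af = ASDF.load_file("docs/data/roman.asdf"; extensions = true)
```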
Does this seem reasonable to folks? I'd especially appreciate feedback on the last point, which I have little experience with and relied heavily on Claude to come up with the needed incantations.
To-do