I did some basic profiling on a simplified version of #1 (code in https://github.com/englacial/magg/blob/8fff3c8d8693049b7666eeddd2f4dcb91b150711/process-single-cell.py). Note that I'll still put together an example of how to write the indexes to Zarr rather than Parquet, but I wanted to first check how much these gains could contribute to the goal of reducing the total time to <5 minutes, as mentioned in #1 (comment). I profiled aggregating data from the cell with the most granules associated with it, figuring it would be the worst-case, rate-limiting function invocation.
The vast majority of the time was spent reading data, and a lot of time was wasted on lock contention.
A quick Claude-coded experiment cut the execution time to roughly 1/3 of the original by using a Semaphore to limit concurrency and processing batches in parallel, but I have yet to check its work.
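For anyone unfamiliar with the pattern, here is a minimal Python sketch of semaphore-bounded, batched reads. It is not the code from that experiment and does not call h5coro. `read_granule` is a hypothetical blocking reader that only sleeps to simulate I/O, and the `max_concurrency`/`batch_size` defaults are illustrative, not tuned values.

```python
import asyncio
import time


def read_granule(path: str) -> int:
    """Hypothetical blocking per-granule read; sleeps to simulate I/O latency."""
    time.sleep(0.05)
    return len(path)


async def read_all(paths, max_concurrency=8, batch_size=32):
    """Read all paths in batches, with at most `max_concurrency` reads in flight.

    Returns (results in input order, peak number of simultaneous reads).
    """
    sem = asyncio.Semaphore(max_concurrency)
    state = {"in_flight": 0, "peak": 0}  # only touched on the event-loop thread

    async def bounded(path):
        async with sem:  # wait for a free slot before starting the read
            state["in_flight"] += 1
            state["peak"] = max(state["peak"], state["in_flight"])
            try:
                # Run the blocking read in a worker thread so the loop stays free.
                return await asyncio.to_thread(read_granule, path)
            finally:
                state["in_flight"] -= 1

    results = []
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        # gather preserves input order within each batch
        results.extend(await asyncio.gather(*(bounded(p) for p in batch)))
    return results, state["peak"]


if __name__ == "__main__":
    paths = [f"granule_{i:03d}.h5" for i in range(100)]
    t0 = time.perf_counter()
    out, peak = asyncio.run(read_all(paths))
    print(len(out), peak, f"{time.perf_counter() - t0:.2f}s")
```

The semaphore caps how many reads are in flight at once (fewer threads contending for shared locks), while batching bounds how many pending tasks exist at any time. One trade-off of a `gather` per batch is that the next batch cannot start until the slowest read in the current one finishes; a single `gather` over all paths behind the same semaphore avoids that idle time at the cost of creating every task up front.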
Is this type of use case, needing to open hundreds of HDF5 files in a single workflow, common @espg @weiji14? h5coro was definitely not designed for it, and this type of requirement was the motivation for https://github.com/developmentseed/async-tiff for COGs. A similar fully async, Rust-based implementation makes a lot of sense for this style of workflow, but is a big lift. A smaller lift would be contributing some examples of how to use h5coro optimally for different use cases.