Performance profiling results

I did some basic profiling on a simplified version of https://github.com/englacial/magg/pull/1 (code in https://github.com/englacial/magg/blob/8fff3c8d8693049b7666eeddd2f4dcb91b150711/process-single-cell.py). I'll still put together an example of how to write the indexes to Zarr rather than Parquet, but I first wanted to check how the gains could relate to the goal of reducing the total time to <5 minutes, as mentioned in https://github.com/englacial/magg/pull/1#issue-3713697614. I profiled aggregating data from the cell with the most granules associated with it, figuring it would be the worst-case/rate-limiting function invocation.

The vast majority of the time was spent reading data, and a lot of time was wasted on lock contention:

<img width="1780" height="662" alt="Image" src="https://github.com/user-attachments/assets/784307e4-e248-400e-a2ad-cc0e0b3ac8f8" />

A quick [Claude-coded experiment](https://github.com/developmentseed/magg/commit/90dc639a0caba54b7457e38d4a0b6c8794206faa) reduced the execution time to ~1/3 the original processing time by using a Semaphore to limit concurrency and processing batches in parallel, but I have yet to check its work.
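
For anyone unfamiliar with the pattern, here is a minimal sketch of the general idea (a semaphore capping concurrent reads while batches run in parallel). It is not the code from the linked commit. `read_granule`, `MAX_CONCURRENT_READS`, `BATCH_SIZE`, and `WORKERS` are hypothetical names and values chosen for illustration, and the real read would go through h5coro rather than the stand-in below.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Assumed values for illustration only; the right numbers depend on the workload.
MAX_CONCURRENT_READS = 4
BATCH_SIZE = 4
WORKERS = 8

# Shared semaphore: at most MAX_CONCURRENT_READS reads are in flight at once,
# even though more worker threads than that may be running.
read_slots = threading.Semaphore(MAX_CONCURRENT_READS)


def read_granule(url):
    """Hypothetical stand-in for the real HDF5 read of one granule."""
    return {"url": url, "n_chars": len(url)}


def read_with_limit(url):
    # Block until a read slot is free, then release it when the read finishes.
    with read_slots:
        return read_granule(url)


def batched(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_batch(urls):
    # Each batch reads its granules one after another on a single worker thread.
    return [read_with_limit(u) for u in urls]


def process_cell(urls, batch_size=BATCH_SIZE, workers=WORKERS):
    """Process batches in parallel and return results in input order."""
    batches = batched(urls, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_batch, batches))
    # Flatten the per-batch lists back into one list.
    return [r for batch in results for r in batch]


if __name__ == "__main__":
    granules = [f"granule_{i}.h5" for i in range(10)]
    print(len(process_cell(granules)))
```

The semaphore limit is set lower than the worker count here so that it actually constrains the reads. Whether that helps with the lock contention seen in the profile would need to be measured.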

@espg @weiji14, is this type of use case, where a single workflow needs to open hundreds of HDF5 files, common? h5coro was definitely not designed for it, and this type of requirement was the motivation for https://github.com/developmentseed/async-tiff for COGs. A similar fully async, Rust-based implementation makes a lot of sense for this style of workflow, but is a big lift. A smaller lift would be contributing some examples of how to use h5coro optimally for different use cases.
