Cluster-distributed histograms of many tens to hundreds of GB, with billions of bins, are possible:

- https://github.com/dask-contrib/dask-histogram/issues/168

However, at present they are a little slow and could be improved with auxiliary data shuffling services in the filling step.

- What is the actual extent of use cases for histograms this large?
- How do we make a good user interface to "virtual" histograms that are only ever rendered fully in memory back at the client machine?
- GPUs? #49
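To make the scale and the shuffling idea concrete, here is a rough pure-Python sketch. It is not how dask-histogram fills today. It assumes dense storage at 8 bytes per bin (for example one float64 count), and it assumes each worker owns a contiguous slice of bins, so that counts are shuffled to the owner of each bin rather than merged into one full-size histogram. The names `histogram_bytes`, `owner_of`, and `shuffled_fill` are hypothetical.

```python
from collections import Counter, defaultdict


def histogram_bytes(n_bins, bytes_per_bin=8):
    """Dense storage size, assuming a fixed number of bytes per bin."""
    return n_bins * bytes_per_bin


def owner_of(bin_index, n_bins, n_workers):
    """Map a global bin index to the worker owning its contiguous slice."""
    bins_per_worker = -(-n_bins // n_workers)  # ceiling division
    return bin_index // bins_per_worker


def shuffled_fill(partitions, n_bins, n_workers):
    """Fill a histogram whose bins are sharded across workers.

    Each input partition is a list of precomputed global bin indices.
    Map step: count locally within each partition.
    Shuffle step: route each (bin, count) pair to the owner of that bin,
    which accumulates counts only for the bins it owns.
    """
    inbox = defaultdict(Counter)  # worker id -> sparse counts for its bins
    for part in partitions:
        local = Counter(part)  # per-partition partial counts
        for bin_index, count in local.items():
            worker = owner_of(bin_index, n_bins, n_workers)
            inbox[worker][bin_index] += count
    return dict(inbox)


# 4 billion bins at an assumed 8 bytes each: 32 GB of dense storage.
size_gb = histogram_bytes(4_000_000_000) / 1e9

# Two toy partitions, 10 bins, 2 workers (bins 0-4 and 5-9).
shards = shuffled_fill([[0, 1, 1, 9], [9, 9, 5]], n_bins=10, n_workers=2)
```

In this layout no single process ever holds all bins, which is one possible reading of a "virtual" histogram: the client would request only the slices it needs to render.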