Slowness at DGX, server CPUs (Intel Xeon Platinum 8480C, 8570)

Hi Blosc Team,

I have observed a significant degradation in compression/decompression speed on DGX systems (CPUs: Intel Xeon Platinum 8480C and 8570): they are 2-5 times slower than a workstation. Reducing the number of threads helps, but according to the document linked below, the speed should be similar. That said, the result may be limited by the small test sizes: 32 (blocks) and 64 (chunks).

https://github.com/Blosc/c-blosc2/blob/6c16487c76895543c001e2e18605fd56c9c444d9/README_THREADED.rst?plain=1#L9
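A thread-count sweep along the following lines could make the comparison reproducible on both machines. This is only a minimal sketch, not the script behind the numbers above. It assumes the Python bindings (python-blosc2) with `blosc2.set_nthreads`, `blosc2.compress` and `blosc2.decompress`; the payload, the thread counts and the helper names `best_time` and `sweep_threads` are hypothetical. If `blosc2` is not importable, it falls back to `zlib` purely so the script still runs, and those timings say nothing about Blosc.

```python
import time
import zlib

try:
    import blosc2  # python-blosc2; assumed to be installed on the machine under test
except ImportError:
    blosc2 = None  # stand-in path below uses zlib; not a Blosc measurement


def best_time(func, arg, repeats=3):
    """Return (best wall-clock seconds over `repeats` calls, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(arg)
        best = min(best, time.perf_counter() - start)
    return best, result


def sweep_threads(data, thread_counts, repeats=3):
    """Time a compress/decompress round trip of `data` for each thread count."""
    rows = []
    for nthreads in thread_counts:
        if blosc2 is not None:
            blosc2.set_nthreads(nthreads)  # global thread setting for Blosc2
            compress, decompress = blosc2.compress, blosc2.decompress
        else:
            # zlib ignores nthreads; this branch only keeps the sketch runnable
            compress, decompress = zlib.compress, zlib.decompress
        c_time, packed = best_time(compress, data, repeats)
        d_time, unpacked = best_time(decompress, packed, repeats)
        if bytes(unpacked) != data:
            raise RuntimeError("round trip did not reproduce the input")
        rows.append((nthreads, c_time, d_time, len(packed)))
    return rows


# Hypothetical payload: 8 MiB of a repeating byte pattern.
payload = bytes(range(256)) * (8 * 1024 * 4)

results = sweep_threads(payload, [1, 2, 4, 8])
for nthreads, c_time, d_time, size in results:
    print(f"{nthreads:>3} threads: compress {c_time * 1e3:8.2f} ms, "
          f"decompress {d_time * 1e3:8.2f} ms, {size} bytes")
```

Running the same script on the DGX and on the workstation, with payload sizes closer to the real workload, would show whether the slowdown grows with the thread count or is already present with a single thread.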

Or is the current roadmap for 3.0 aimed at exactly this kind of optimization for DGX servers? I found the points below very relevant.

https://github.com/Blosc/c-blosc2/blob/6c16487c76895543c001e2e18605fd56c9c444d9/ROADMAP-TO-3.0.rst?plain=1#L11
https://github.com/Blosc/c-blosc2/blob/6c16487c76895543c001e2e18605fd56c9c444d9/ROADMAP-TO-3.0.rst#L13

For now, it would be very helpful if you could confirm whether blosc2 compression/decompression on DGX systems is expected to be slower than on the most modern workstations. I would also appreciate any advice on reducing latency with the current blosc2 on DGX systems.

Btw, the per-core clock speed of the Xeon Platinum 8480C and 8570 is clearly lower than that of the workstation CPU (AMD Threadripper Pro 5975WX).

<img width="660" height="1101" alt="Image" src="https://github.com/user-attachments/assets/0748107c-22c6-4f52-b80e-ca992d6e9740" />

Slowness at DGX, server CPUs (Intel Xeon Platinum 8480C, 8570) #784
