Optimized shuffle for typesize=12 #649

@froody

Describe the bug
This is a feature request. I'm happy to contribute, but I don't know whether my solutions will be optimal. I compress a lot of data with typesize=12, and with shuffle enabled this falls back to unshuffle_generic, which is slow. It would be nice to have 12-byte variants of all the platform-specific shuffle code. They might not be as fast as a power-of-2 typesize, but they would still be much faster than generic.
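
For context, here is a minimal sketch of what a scalar byte shuffle and unshuffle do for an arbitrary typesize, with a round-trip check at typesize=12. The function names (`shuffle_scalar`, `unshuffle_scalar`, `roundtrip12_ok`) are hypothetical and this is not the library's actual code; it only illustrates the byte transposition that a generic fallback has to perform one byte at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar byte shuffle (illustrative, hypothetical name):
   byte j of element i is moved to byte stream j, position i. */
void shuffle_scalar(size_t typesize, size_t nelems,
                    const uint8_t *src, uint8_t *dest) {
    assert(typesize > 0);
    for (size_t j = 0; j < typesize; j++) {
        for (size_t i = 0; i < nelems; i++) {
            dest[j * nelems + i] = src[i * typesize + j];
        }
    }
}

/* Inverse transform (illustrative, hypothetical name):
   gathers byte j of element i back from stream j, position i. */
void unshuffle_scalar(size_t typesize, size_t nelems,
                      const uint8_t *src, uint8_t *dest) {
    assert(typesize > 0);
    for (size_t i = 0; i < nelems; i++) {
        for (size_t j = 0; j < typesize; j++) {
            dest[i * typesize + j] = src[j * nelems + i];
        }
    }
}

/* Round-trip check for 12-byte elements; returns 1 on success. */
int roundtrip12_ok(void) {
    enum { TYPESIZE = 12, NELEMS = 64 };
    uint8_t in[TYPESIZE * NELEMS];
    uint8_t mid[TYPESIZE * NELEMS];
    uint8_t out[TYPESIZE * NELEMS];

    /* Fill the input with an arbitrary byte pattern. */
    for (size_t k = 0; k < sizeof in; k++) {
        in[k] = (uint8_t)(k * 7u + 3u);
    }
    shuffle_scalar(TYPESIZE, NELEMS, in, mid);
    unshuffle_scalar(TYPESIZE, NELEMS, mid, out);
    return memcmp(in, out, sizeof in) == 0;
}
```

The inner loops touch memory with a stride of either typesize or nelems, one byte per iteration, which is why a SIMD variant specialized for a fixed typesize can be expected to do better.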

To Reproduce
Decompress any data that was compressed with shuffle and typesize=12, and observe that unshuffle_generic dominates the overall time.

Expected behavior
Unshuffle for typesize=12 should be approximately as fast as for typesize=8 or typesize=16.

Additional context
I think it would be nice to support all possible typesizes up to a point, as for most of them this could be quite a significant speedup compared to the generic implementation.

Here's my attempt at avx512-unshuffle: #648
