
Work division between threads improvement #2

Description

@benreese0

Calculating compression on a ~10%-filled device with multiple threads resulted in only ~10% of the threads doing the majority of the work.

It looks like the current algorithm divides the data space evenly among the threads, without regard to content. In this case ~90% of the threads finished very quickly because they were reading unallocated data, while the remaining ~10% took a very long time, so the run was effectively limited to those few threads.
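To illustrate the imbalance described above, here is a minimal sketch of static even division. It assumes a device modeled as numbered blocks, a cost of one unit per allocated block and zero per unallocated block, and hypothetical helper names (`static_split`, `per_thread_cost`); it is not the program's actual code.

```python
def static_split(total_blocks: int, n_threads: int) -> list[range]:
    """Divide [0, total_blocks) into n_threads contiguous, (nearly) equal ranges."""
    base, extra = divmod(total_blocks, n_threads)
    ranges, start = [], 0
    for i in range(n_threads):
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def per_thread_cost(ranges: list[range], allocated: set[int]) -> list[int]:
    """Assumed cost model: one unit per allocated block, zero for unallocated ones."""
    return [sum(1 for b in r if b in allocated) for r in ranges]


if __name__ == "__main__":
    total = 10_000
    allocated = set(range(1_000))  # hypothetical: the first ~10% of the device holds data
    print(per_thread_cost(static_split(total, 10), allocated))
```

With the data packed into the first 10% of the device, the first thread receives all 1,000 units of work and the other nine receive none, which matches the "~10% of threads doing most of the work" symptom.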

I propose a work queue of much smaller (but still manageable) chunks, so that each thread can be handed non-contiguous pieces of work and the load is spread more evenly when some data regions are trivial. For example:

  1. The program is invoked across a large data space with 10 threads.
  2. The main thread generates 10,000 work items, items[10000], evenly distributed across the data space.
  3. Thread 1 takes items[0], thread 2 takes items[1], ..., thread 10 takes items[9].
  4. When a thread finishes its current item, it takes the next unclaimed item in items[] (or exits, and a new thread can be spawned).
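The steps above can be sketched with a shared FIFO queue that workers pull from until it is empty. This is a minimal illustration, not the program's implementation: the helper names (`make_work_items`, `run_work_queue`, `process_chunk`) and the block-counting cost model are hypothetical. Because `queue.Queue` is FIFO, items are claimed in order, but which thread gets items[0] depends on scheduling. CPython threads do not run CPU-bound Python code in parallel, so this shows the scheduling logic rather than a real speedup.

```python
import queue
import threading


def make_work_items(total_blocks: int, n_items: int) -> list[range]:
    """Step 2: split [0, total_blocks) into n_items small contiguous chunks."""
    base, extra = divmod(total_blocks, n_items)
    items, start = [], 0
    for i in range(n_items):
        size = base + (1 if i < extra else 0)
        items.append(range(start, start + size))
        start += size
    return items


def run_work_queue(items, n_threads, process_chunk):
    """Steps 3-4: each worker claims the next unclaimed item until none remain."""
    work: queue.Queue = queue.Queue()
    for item in items:
        work.put(item)

    results = []                  # one result per processed item
    items_done = [0] * n_threads  # how many items each thread serviced
    lock = threading.Lock()

    def worker(tid: int) -> None:
        while True:
            try:
                item = work.get_nowait()  # claim the next un-serviced item
            except queue.Empty:
                return  # no work left, so this thread exits
            r = process_chunk(item)
            with lock:
                results.append(r)
                items_done[tid] += 1

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, items_done


if __name__ == "__main__":
    allocated = set(range(10_000))  # hypothetical: first 10% of 100,000 blocks hold data

    def process_chunk(chunk: range) -> int:
        # Hypothetical cost model: count allocated blocks in the chunk.
        return sum(1 for b in chunk if b in allocated)

    results, items_done = run_work_queue(make_work_items(100_000, 10_000), 10, process_chunk)
    print(sum(results), items_done)
```

Since a thread that lands on an unallocated chunk finishes almost immediately and claims another, the allocated chunks end up spread across all threads instead of being concentrated in one contiguous range.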

Metadata

Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests