Work division between threads improvment

Calculating compression on a ~10% filled device with multiple threads led to ~10% of the actual threads being used for the majority of work.

It looks like the current algo evenly divides the data space among them, without regard to content. In this instance ~90% of threads finished very fast because they were reading unallocated data, and the remaining ~10% took very long due to threading limitations.

I propose making a work queue of much smaller (but still manageable) chunks that can put non-contiguous work per every thread to more evenly allocate work when there are trivial data regions. i.e.

1. Program is invoked across large data space with 10 threads
2. Main thread generates work items `items[10000]` evenly distributed across data space, and program is called with 10 threads
3. Thread 1 takes `items[0]`, thread 2 takes `items[1]` ... thread 10 takes `items[9]`
4. When each thread finishes the current work, it will take the next un-serviced work in `items[]` (or exit, and a new thread can be spawned)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Work division between threads improvment #2

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Work division between threads improvment #2

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions