Nearest neighbor scaling #41

@stharakan

We have some questions about running nearest neighbors, specifically the distributed_nearest_neighbors.cpp example. We are focusing on the DistKernelMatrix run (second half of the script), with the following parameters:

n = 5000
d = 3
k = 64

As we increase the number of MPI tasks, we notice strange behavior. Most importantly, the code slows down significantly when more tasks are introduced. This is clear by the eye test, and the flops and mops reported after each neighbor iteration tell the same story:

nprocs = 1
[ RT] 31 [normal] 0 [listen] 0 [nested] 3.000E+04 flops 3.000E+04 mops
[ RT] 16 [normal] 0 [listen] 0 [nested] 6.250E+06 flops 1.250E+07 mops

nprocs = 2
[ RT] 32 [normal] 0 [listen] 0 [nested] 9.000E+04 flops 9.000E+04 mops
[ RT] 16 [normal] 0 [listen] 0 [nested] 6.250E+06 flops 1.250E+07 mops

nprocs = 4
[ RT] 36 [normal] 0 [listen] 0 [nested] 2.100E+05 flops 2.100E+05 mops
[ RT] 16 [normal] 0 [listen] 0 [nested] 6.250E+06 flops 1.250E+07 mops

nprocs = 8
[ RT] 48 [normal] 0 [listen] 0 [nested] 2.400E+05 flops 2.400E+05 mops
[ RT] 16 [normal] 0 [listen] 0 [nested] 6.250E+06 flops 1.250E+07 mops
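
To put numbers on the growth, the first "[ RT]" line of each run can be compared against the single-task run. The sketch below only restates the flop counts from the logs above; the variable names are chosen here for illustration.

```python
# Flop counts from the first "[ RT]" line of each run, copied from the logs above.
first_line_flops = {1: 3.000e4, 2: 9.000e4, 4: 2.100e5, 8: 2.400e5}

# Growth of each run relative to the single-task run.
baseline = first_line_flops[1]
growth = {nprocs: flops / baseline for nprocs, flops in first_line_flops.items()}

for nprocs, ratio in growth.items():
    print(f"nprocs = {nprocs}: {ratio:.1f}x the single-task flops")
```

The second "[ RT]" line (6.250E+06 flops, 1.250E+07 mops) is identical in all four runs, so the growth is confined to the first line.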

This leads to question 1: Should the number of flops grow for the same problem size when increasing tasks, or is there a problem here?

Our hypothesis is that it has something to do with the splitter warning

[WARNING] increase the middle gap to 10 percent!

which is displayed more and more frequently as the number of processes increases. The number of gap warnings per nearest-neighbor iteration is listed below.

nprocs : 1 warnings: 0
nprocs : 2 warnings: 2
nprocs : 4 warnings: 8
nprocs : 8 warnings: 24
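
The counts above happen to equal nprocs * log2(nprocs) in every run. One possible reading, unconfirmed against the library source, is that every task prints the warning once per level of the distributed part of the tree. The short check below only verifies the arithmetic; it says nothing about where the warnings originate.

```python
import math

# Gap warnings per neighbor iteration, copied from the table above.
warnings_per_iteration = {1: 0, 2: 2, 4: 8, 8: 24}

# Compare each observed count with nprocs * log2(nprocs).
matches = {}
for nprocs, observed in warnings_per_iteration.items():
    predicted = nprocs * int(math.log2(nprocs))
    matches[nprocs] = (observed == predicted)
    print(f"nprocs = {nprocs}: observed {observed}, nprocs * log2(nprocs) = {predicted}")
```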

Unless we are misunderstanding the algorithm, the warning is displayed when there are multiple points that project to the median under that split. Our question is twofold: why would this increase with the number of tasks, and is it fixable?
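
The tie condition described above can be illustrated with a small sketch. It is not the library's splitter: `count_median_ties`, the projection direction, and the tolerance `tol` are hypothetical names chosen for illustration, and the sketch assumes the split is made at the median of the projected values.

```python
import random
import statistics

def count_median_ties(points, direction, tol=0.0):
    """Count points whose projection onto `direction` is within `tol` of the median projection."""
    projections = [sum(x * w for x, w in zip(p, direction)) for p in points]
    median = statistics.median(projections)
    return sum(1 for value in projections if abs(value - median) <= tol)

rng = random.Random(0)
n, d = 5000, 3  # same n and d as the run described above

# Random points: ties at the median are not expected.
random_points = [[rng.random() for _ in range(d)] for _ in range(n)]
# Degenerate points: every point shares the same first coordinate.
degenerate_points = [[0.5] + [rng.random() for _ in range(d - 1)] for _ in range(n)]

axis = [1.0, 0.0, 0.0]  # hypothetical split direction
print("random points, ties at median:    ", count_median_ties(random_points, axis))
print("degenerate points, ties at median:", count_median_ties(degenerate_points, axis))
```

If the warning really is triggered by ties like these, then distinct random points should rarely produce it, which is why its growth with the number of tasks is surprising.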

For the record, this happens even if care is taken so that each process's data is randomized with a unique seed and there are no duplicate points. Thanks for the help!
