
can we multithread the index construction better? #63

@jbellis


Observed init behavior on a large repo, on a 24-core / 48-thread (HT) box with a 3090 GPU:

  • The GPU is busy only intermittently; while it is busy, CPU usage is minimal (single-digit percent)
  • When the GPU is not busy, CPU usage spikes to 3-5 (HT) cores

Educated guess:

  • The GPU is used primarily for embeddings up front [this is surprising given what I remember about the original PLAID]
  • Index construction is CPU-bound and poorly parallelized
  • Possibly there is also a pipelining issue where the GPU sits idle while waiting for the CPU (but it's also possible that the pipeline just isn't deep and the CPU is the bottleneck)
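
To make the pipelining guess concrete, here is a minimal sketch of the kind of overlap I have in mind: one producer thread feeds embedded batches into a bounded queue while several workers build the index from them. `embed_batch`, `index_batch`, and `run_pipeline` are hypothetical stand-ins, not functions from this codebase, and the sleeps only simulate work.

```python
import queue
import threading
import time


def embed_batch(batch):
    """Hypothetical stand-in for the GPU embedding step."""
    time.sleep(0.01)  # simulated GPU work
    return [float(x) for x in batch]


def index_batch(embeddings):
    """Hypothetical stand-in for the CPU-bound index construction step."""
    time.sleep(0.01)  # simulated CPU work
    return sum(embeddings)


def run_pipeline(batches, cpu_workers=4, max_pending=8):
    # A bounded queue lets the producer run ahead of the consumers,
    # but only by max_pending batches, so memory use stays capped.
    pending = queue.Queue(maxsize=max_pending)
    results = []
    results_lock = threading.Lock()

    def producer():
        try:
            for batch in batches:
                pending.put(embed_batch(batch))
        finally:
            # One sentinel per worker so every consumer shuts down.
            for _ in range(cpu_workers):
                pending.put(None)

    def consumer():
        while True:
            embeddings = pending.get()
            if embeddings is None:
                return
            partial = index_batch(embeddings)
            with results_lock:
                results.append(partial)

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer) for _ in range(cpu_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results  # completion order, not input order


if __name__ == "__main__":
    print(sorted(run_pipeline([[1, 2], [3, 4], [5]], cpu_workers=2)))
```

Threads only help on the consumer side if the indexing work releases the GIL (for example, inside native code); if it is pure Python, the workers would need to be processes instead.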

I do see thread counts briefly burst from ~40 to ~100, but this does not correlate strongly with higher %CPU as reported by top.


Labels: enhancement
