[CPU] ML Portion for GPU-BDB Queries  #248

@VibhuJawa

The queries below rely on cuML models for their ML portion on GPU. For the CPU version, we need to decide between a distributed (dask-ml) and a non-distributed (sklearn) implementation of the ML portion of each query, depending on performance. I suggest benchmarking both and choosing whichever performs better.
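
A minimal sketch of how such a comparison could be timed, using only the standard library. The helper names `time_call` and `pick_fastest` are hypothetical and not part of the repository. Each candidate is a zero-argument callable that performs one full fit, so each backend can use its own input type (for example NumPy arrays for sklearn and Dask arrays for dask-ml).

```python
import statistics
import time


def time_call(fn, repeats=3):
    """Return the median wall-clock seconds of calling `fn` `repeats` times."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def pick_fastest(candidates, repeats=3):
    """Time every candidate and return (name_of_fastest, {name: seconds}).

    `candidates` maps a label to a zero-argument callable that runs one
    complete fit, including any data movement that backend needs.
    """
    results = {name: time_call(fn, repeats) for name, fn in candidates.items()}
    best = min(results, key=results.get)
    return best, results


# Hypothetical wiring for Query-05. It assumes scikit-learn and dask-ml are
# installed and that X_np, y_np (NumPy) and X_da, y_da (Dask) already exist:
#
#   from sklearn.linear_model import LogisticRegression as SkLR
#   from dask_ml.linear_model import LogisticRegression as DaskLR
#
#   best, timings = pick_fastest({
#       "sklearn": lambda: SkLR().fit(X_np, y_np),
#       "dask_ml": lambda: DaskLR().fit(X_da, y_da),
#   })
```

Taking the median over a few repeats keeps a single slow run (for example a cold cache) from deciding the outcome. For a fair comparison, the timings should be taken at the data sizes the queries actually run with.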

Query-05 GPU: cuml.LogisticRegression

  1. Non Distributed CPU: sklearn.linear_model.LogisticRegression
  2. Distributed CPU: dask_ml.linear_model.LogisticRegression

Query-20 GPU: cuml.cluster.KMeans

  1. CPU: sklearn.cluster.KMeans
  2. Distributed CPU: dask_ml.cluster.KMeans

Query-25 GPU: cuml.cluster.KMeans

  1. CPU: sklearn.cluster.KMeans
  2. Distributed CPU: dask_ml.cluster.KMeans

Query-26 GPU: cuml.cluster.KMeans

  1. CPU: sklearn.cluster.KMeans
  2. Distributed CPU: dask_ml.cluster.KMeans

Query-28 GPU: cuml.dask.naive_bayes

  1. Distributed CPU: dask_ml.naive_bayes

CC: @DaceT, @randerzander

Related PRs:

#243

#244
