[CPU] ML Portion for GPU-BDB Queries #248

The queries below rely on cuML models for the ML portion on GPU. For the CPU ML portion of these queries, we need to decide between a distributed (dask-ml) and a non-distributed (sklearn) implementation based on performance. I suggest benchmarking both and choosing whichever performs best.
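One way to run the comparison is a small timing helper that works for any estimator exposing `fit`. This is a minimal sketch, not code from gpu-bdb: `benchmark_fit` and `pick_faster` are hypothetical names, and it assumes that `fit` does the work eagerly (if a library defers computation, the timing would have to include the step that triggers it).

```python
import time
from typing import Any, Callable, Dict, Optional


def benchmark_fit(
    make_estimator: Callable[[], Any],
    X: Any,
    y: Optional[Any] = None,
    repeats: int = 3,
) -> float:
    """Return the best wall-clock time in seconds over `repeats` calls to fit.

    `make_estimator` is a zero-argument factory so that every run starts
    from a fresh, unfitted estimator. Hypothetical helper for illustration.
    """
    best = float("inf")
    for _ in range(repeats):
        estimator = make_estimator()
        start = time.perf_counter()
        if y is None:
            estimator.fit(X)       # unsupervised case, e.g. KMeans
        else:
            estimator.fit(X, y)    # supervised case, e.g. LogisticRegression
        best = min(best, time.perf_counter() - start)
    return best


def pick_faster(timings: Dict[str, float]) -> str:
    """Return the name of the candidate with the smallest timing."""
    return min(timings, key=timings.get)
```

A call might look like `pick_faster({"sklearn": benchmark_fit(make_sk, X, y), "dask-ml": benchmark_fit(make_dask, X_da, y_da)})`, where the two factories and the input arrays are supplied by the query being measured.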

**Query-05** **GPU: cuml.LogisticRegression**  
1. Non-distributed CPU: [sklearn.linear_model.LogisticRegression](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.linear_model)
2. Distributed CPU:  [dask_ml.linear_model.LogisticRegression](https://ml.dask.org/modules/generated/dask_ml.linear_model.LogisticRegression.html)
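The two CPU options for Query-05 could be exercised side by side roughly as follows. This is a sketch under assumptions: the synthetic data from `make_data` only stands in for the query's real feature table, the function names and `max_iter=200` are illustrative, and the dask-ml path is imported lazily so the sklearn path runs even where dask-ml is not installed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression as SkLogisticRegression


def make_data(n_rows=2000, n_features=5, seed=0):
    """Synthetic, linearly separable binary data (placeholder for query features)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_rows, n_features))
    weights = rng.normal(size=n_features)
    y = (X @ weights > 0).astype(np.int64)
    return X, y


def fit_sklearn(X, y):
    """Non-distributed CPU path: fit on NumPy arrays and predict on the same data."""
    model = SkLogisticRegression(max_iter=200)
    model.fit(X, y)
    return model.predict(X)


def fit_dask_ml(X, y, chunk_rows=500):
    """Distributed CPU path: same data, wrapped in chunked dask arrays."""
    # Imported here so the sklearn path does not require dask or dask-ml.
    import dask.array as da
    from dask_ml.linear_model import LogisticRegression as DaskLogisticRegression

    X_da = da.from_array(X, chunks=(chunk_rows, X.shape[1]))
    y_da = da.from_array(y, chunks=chunk_rows)
    model = DaskLogisticRegression(max_iter=200)
    model.fit(X_da, y_da)
    # predict returns a lazy dask array; compute() materializes it.
    return model.predict(X_da).compute().astype(np.int64)
```

Keeping both paths behind the same `(X, y) -> predictions` shape makes it straightforward to time them and to check that they agree on the same input.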

**Query-20** **GPU: cuml.cluster.KMeans**  
1. Non-distributed CPU: [sklearn.cluster.KMeans](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html?highlight=kmeans#sklearn.cluster.KMeans)
2. Distributed CPU: [dask_ml.cluster.KMeans](https://ml.dask.org/modules/api.html#module-dask_ml.cluster)


**Query-25** **GPU: cuml.cluster.KMeans**  
1. Non-distributed CPU: [sklearn.cluster.KMeans](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html?highlight=kmeans#sklearn.cluster.KMeans)
2. Distributed CPU: [dask_ml.cluster.KMeans](https://ml.dask.org/modules/api.html#module-dask_ml.cluster)


**Query-26** **GPU: cuml.cluster.KMeans**  
1. Non-distributed CPU: [sklearn.cluster.KMeans](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html?highlight=kmeans#sklearn.cluster.KMeans)
2. Distributed CPU: [dask_ml.cluster.KMeans](https://ml.dask.org/modules/api.html#module-dask_ml.cluster)
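Queries 20, 25, and 26 share the same pair of KMeans candidates, so one comparison sketch can cover all three. The example below is illustrative only: the blob data, the function names, and the parameter values (`n_clusters=3`, `n_init=10`, chunk size) are assumptions and are not taken from the queries themselves. The dask-ml import is deferred so the sklearn path does not depend on it.

```python
import numpy as np
from sklearn.cluster import KMeans as SkKMeans


def make_blobs_data(n_per_cluster=300, seed=0):
    """Three well-separated 2-D blobs (placeholder for the query's feature matrix)."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
    return np.vstack(
        [c + rng.normal(scale=0.5, size=(n_per_cluster, 2)) for c in centers]
    )


def cluster_sklearn(X, n_clusters=3, seed=0):
    """Non-distributed CPU path."""
    model = SkKMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(X)
    return labels, model.cluster_centers_


def cluster_dask_ml(X, n_clusters=3, seed=0, chunk_rows=300):
    """Distributed CPU path on a chunked dask array."""
    # Imported here so the sklearn path does not require dask or dask-ml.
    import dask.array as da
    from dask_ml.cluster import KMeans as DaskKMeans

    X_da = da.from_array(X, chunks=(chunk_rows, X.shape[1]))
    model = DaskKMeans(n_clusters=n_clusters, random_state=seed)
    model.fit(X_da)
    # labels_ is a lazy dask array; cluster_centers_ is already a NumPy array.
    return model.labels_.compute(), model.cluster_centers_
```

Cluster label numbering is arbitrary in both libraries, so a comparison of the two results should match clusters by their centers rather than by label value.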

**Query-28** **GPU: cuml.dask.naive_bayes**  
1. Distributed CPU: [dask_ml.naive_bayes](https://ml.dask.org/naive-bayes.html)
 
 
CC: @DaceT, @randerzander
 
Related PRs:
 
 https://github.com/rapidsai/gpu-bdb/pull/243
 
 https://github.com/rapidsai/gpu-bdb/pull/244
