A minimal C++20 implementation of the K-Means++ clustering algorithm with OpenMP parallelism.
- kmeans_pp.h: public API
- kmeans_pp.cpp: implementation
- main.cpp: example usage (2D synthetic dataset, evaluation, CSV export)
- plot.py: matplotlib script to visualize results
There are no precompiled libraries to link against. Just drop kmeans_pp.h and kmeans_pp.cpp into your project and compile them alongside your own code.
Requires a C++20 (or later) compiler, OpenMP, and CMake 3.8+. The --preset commands below rely on CMake presets, which were introduced in CMake 3.19.
cmake --preset linux-debug # or x64-release, etc.
cmake --build out/build/linux-debug

Windows presets (x64-debug, x64-release, x86-*) use MSVC + Ninja.
The entire API is a single function:
#include "kmeans_pp.h"
// data: flat array of N points, each with D coordinates (row-major)
// Returns centroids and per-point cluster assignments.
auto result = kmeans::kmeans_pp(N, D, K, data, max_iterations, seed);
// result.centroids: std::vector<double>, size K*D
// result.assignments: std::vector<int>, size N

seed = 0 uses std::random_device for non-deterministic initialization.
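To make the row-major layout and the seed convention concrete, here is a minimal sketch of K-Means++ seeding (D² sampling) over a flat array. It is not the code in kmeans_pp.cpp: the names make_rng, sq_dist, and seed_centroids are hypothetical, the input is assumed to be a std::vector<double> of size N*D, and OpenMP is left out for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <vector>

// Hypothetical helper, not part of kmeans_pp.h: mirrors the convention that
// seed == 0 means "draw a non-deterministic seed from std::random_device".
inline std::mt19937_64 make_rng(std::uint64_t seed) {
    if (seed == 0) {
        std::random_device rd;
        return std::mt19937_64(rd());
    }
    return std::mt19937_64(seed);
}

// Squared Euclidean distance between two D-dimensional points.
inline double sq_dist(const double* a, const double* b, std::size_t D) {
    double sum = 0.0;
    for (std::size_t d = 0; d < D; ++d) {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// K-Means++ seeding sketch. `data` holds N points row-major, so coordinate d
// of point i lives at data[i * D + d]. Returns K * D centroid coordinates.
// Assumes N >= 1 and data.size() == N * D.
inline std::vector<double> seed_centroids(std::size_t N, std::size_t D, std::size_t K,
                                          const std::vector<double>& data,
                                          std::mt19937_64& rng) {
    std::vector<double> centroids;
    centroids.reserve(K * D);
    if (K == 0) return centroids;

    std::uniform_int_distribution<std::size_t> uniform(0, N - 1);
    auto append_point = [&](std::size_t i) {
        centroids.insert(centroids.end(), data.data() + i * D, data.data() + (i + 1) * D);
    };

    // First centroid: chosen uniformly at random among the points.
    append_point(uniform(rng));

    // d2[i] = squared distance from point i to its nearest chosen centroid.
    std::vector<double> d2(N, std::numeric_limits<double>::max());
    for (std::size_t k = 1; k < K; ++k) {
        const double* last = centroids.data() + (k - 1) * D;
        double total = 0.0;
        for (std::size_t i = 0; i < N; ++i) {
            const double dist = sq_dist(data.data() + i * D, last, D);
            if (dist < d2[i]) d2[i] = dist;
            total += d2[i];
        }
        std::size_t next;
        if (total > 0.0) {
            // Next centroid: sampled with probability proportional to d2[i].
            std::discrete_distribution<std::size_t> weighted(d2.begin(), d2.end());
            next = weighted(rng);
        } else {
            // Every point coincides with a centroid; fall back to uniform.
            next = uniform(rng);
        }
        append_point(next);
    }
    return centroids;
}
```

With a nonzero seed, two generators built by make_rng(seed) produce the same centroids on the same build, which is the reproducibility a fixed seed is meant to give. Centroid k occupies positions k*D through k*D + D - 1 of the returned vector, matching the K*D layout of result.centroids.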
main.cpp loads a 2D dataset in ARFF format, runs K-Means++, evaluates accuracy against ground-truth labels via greedy cluster-to-class matching, and exports results to CSV. It's just a demo, not part of the library.
The dataset used is 2d-10c (2990 points, 10 classes).
./kmeans data.arff # default seed=42
./kmeans data.arff 123 # custom seed

To plot:
python plot.py datasets/kmeans_result.csv

- OpenMP (linked via CMake's OpenMP::OpenMP_CXX)
- C++20 or later
- plot.py needs pandas and matplotlib.
Do whatever you want with it.