This repository is the official implementation of the paper Generalized Dirichlet Energy and Graph Laplacians for Clustering Directed and Undirected Graphs. It provides a robust Python implementation of the Generalized Spectral Clustering (GSC) algorithm via a custom scikit-learn fork. The implementation builds on state-of-the-art algorithms and is optimized for parallel CPU computation via JIT compilation.
It also provides a flexible framework for custom clustering experiments.
The exact experiments and figures included in the paper can be reproduced with the reproduce_paper.sh shell script; see the associated README for details.
The scikit-learn fork is identical to scikit-learn 1.8 except for the SpectralClustering class and related utilities.
Two parameters are added to this class's API: standard (bool) and laplacian_method (str: one of unnorm, norm, random_walk).
If standard is True, the adjacency matrix is symmetrized and the standard Laplacian is computed according to the specified method.
Otherwise, GSC is performed using the specified Laplacian method.
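The three standard Laplacian variants named above can be sketched as follows. This is an illustrative NumPy implementation of the textbook definitions, not the fork's internals; the function names are hypothetical.

```python
import numpy as np

def graph_laplacian(A, method="unnorm"):
    """Compute a graph Laplacian from adjacency matrix A.

    unnorm:      L = D - A
    norm:        L = I - D^{-1/2} A D^{-1/2}
    random_walk: L = I - D^{-1} A
    """
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                  # degree vector
    if method == "unnorm":
        return np.diag(d) - A
    if method == "norm":
        d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
        return np.eye(len(A)) - d_inv_sqrt @ A @ d_inv_sqrt
    if method == "random_walk":
        return np.eye(len(A)) - np.diag(1.0 / d) @ A
    raise ValueError(f"unknown method: {method}")

def symmetrize(A):
    """Symmetrization applied to a directed adjacency matrix
    (the kind of step taken when standard=True)."""
    A = np.asarray(A, dtype=float)
    return (A + A.T) / 2
```

For an undirected graph, symmetrization is a no-op and the two code paths coincide on the chosen Laplacian.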
Create a virtual environment and activate it (conda or venv):
python -m venv sklearn-env
source sklearn-env/bin/activate   # or activate.fish if you use the fish shell
Install all required dependencies:
pip install -r requirements.txt
Install scikit-learn in editable mode:
pip install -e scikit-learn \
--no-build-isolation \
--config-settings editable-verbose=true \
--verbose
Point-cloud datasets are stored in Hugging Face format, and network datasets are stored as .npz files containing their adjacency matrices.
All file manipulations are handled via the file manager script.
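For reference, a network dataset in .npz form is just a serialized sparse adjacency matrix. The round-trip below uses scipy.sparse directly and is only illustrative; in practice the file manager script wraps this kind of I/O, and the filename here is hypothetical.

```python
import os
import tempfile
import numpy as np
from scipy.sparse import csr_matrix, save_npz, load_npz

# A tiny undirected path graph as a sparse adjacency matrix.
A = csr_matrix(np.array([[0., 1., 0.],
                         [1., 0., 1.],
                         [0., 1., 0.]]))

# Save and reload the graph; the .npz file *is* the network dataset.
path = os.path.join(tempfile.mkdtemp(), "toy_graph.npz")
save_npz(path, A)
A_loaded = load_npz(path)
```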
This pipeline also includes an interactive 2D labeled dataset builder. It is accessible via the build_dataset() function in utils/dataset_builder.py.
You can edit and run experiments using the template script.
An "experiment" consists of a set of datasets to cluster with the given methods and fully customizable parameters.
The resulting clusterings are then evaluated with the metrics of your choice (currently available: nmi, ari, ami, ch, modularity, graph_ch, map).
Three experiment pipelines are available:
- Score: saves results as two-way CSV tables, one per metric
- Visualization: plots results with matplotlib, for 2D or 3D point-cloud datasets
- Grid Search: performs a classical score experiment with a grid search over specified parameters
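The grid-search pipeline amounts to sweeping parameter combinations, clustering, and scoring each run. A self-contained sketch using vanilla scikit-learn's SpectralClustering and synthetic data (the actual pipeline uses the template script and its own parameter grid; the grid and dataset below are hypothetical):

```python
from itertools import product
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import normalized_mutual_info_score

# Synthetic labeled point cloud standing in for an experiment dataset.
X, y = make_blobs(n_samples=90, centers=3, random_state=0)

# Hypothetical parameter grid over two stock parameters.
grid = {"n_clusters": [2, 3, 4], "gamma": [0.1, 1.0]}

# Cluster once per parameter combination and score each run with NMI.
results = {}
for n_clusters, gamma in product(grid["n_clusters"], grid["gamma"]):
    labels = SpectralClustering(n_clusters=n_clusters, gamma=gamma,
                                random_state=0).fit_predict(X)
    results[(n_clusters, gamma)] = normalized_mutual_info_score(y, labels)

best = max(results, key=results.get)
```

The score pipeline would write a table like `results` to CSV instead of keeping it in memory.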
Detailed guidelines for adding custom components are included in the competitors module.