Variational Bayesian PCA (Ilin & Raiko, 2010) with support for missing data, sparse masks, optional bias terms, and an orthogonal post-rotation to a PCA basis. The implementation follows the original MATLAB reference while adding Python-native APIs, fast C++ extensions, and runtime autotuning.
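The orthogonal post-rotation mentioned above works because the factorization X ≈ AS is only identified up to an invertible k × k transform. The basis can therefore be changed without altering the reconstruction. Below is a minimal NumPy sketch of the idea. It assumes point estimates `A` (features × components) and `S` (components × samples), and it ignores the bias term and the posterior covariances that a full variational treatment would also transform. `rotate_to_pca` is a hypothetical name, not part of the VBPCApy API.

```python
import numpy as np


def rotate_to_pca(A: np.ndarray, S: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Re-express A @ S in a PCA-aligned basis without changing the product.

    Hypothetical helper: A is (n_features, k), S is (k, n_samples).
    Bias terms and posterior covariances are ignored in this sketch.
    """
    k, n_samples = S.shape
    # Thin SVD of the rank-k reconstruction: A @ S = U @ diag(sv) @ Vt
    U, sv, Vt = np.linalg.svd(A @ S, full_matrices=False)
    U, sv, Vt = U[:, :k], sv[:k], Vt[:k, :]
    # Scale scores so that S_rot @ S_rot.T / n_samples is the identity;
    # the scale moves into A, whose columns come out orthogonal with
    # decreasing norms (PCA-like ordering).
    S_rot = np.sqrt(n_samples) * Vt
    A_rot = U * (sv / np.sqrt(n_samples))
    return A_rot, S_rot
```

Forming the full `A @ S` product keeps the sketch short. A practical version would work with the small k × k matrices instead.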
Missing values are common in scientific and industrial tabular datasets, but many analysis pipelines either impute first (masking uncertainty) or drop incomplete samples. VBPCApy models missingness directly and exposes posterior uncertainty outputs alongside reconstructions, enabling uncertainty-aware latent-factor analysis in a single reproducible Python API.
From PyPI (pre-built wheels for Python 3.11–3.14 on Linux, macOS, and Windows):

```bash
pip install vbpca-py
```

With plotting support:

```bash
pip install "vbpca-py[plot]"
```

See the installation guide for building from source and Eigen setup.
```python
import numpy as np
from vbpca_py import VBPCA

# 50 features, 200 samples
x = np.random.randn(50, 200)
mask = np.ones_like(x)  # 1 = observed, 0 = missing

model = VBPCA(n_components=5, maxiters=100)
scores = model.fit_transform(x, mask=mask)
recon = model.reconstruction_
var = model.variance_
```

More examples: quickstart, dense PCA tutorial, missing data & model selection, sparse data.
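In the quickstart every entry is observed. The following NumPy-only sketch shows how to prepare data that arrive with `NaN` for missing values. It derives the 0/1 mask described in the quickstart comment, standardizes each feature using observed entries only, and zero-fills the gaps. It assumes rows are features and columns are samples, as in the quickstart, and that the mask (not the fill value) is what marks an entry as missing. Whether `VBPCA` also accepts `NaN` directly is not covered here. `prepare_missing` is a hypothetical helper, and the library's own preprocessing (`AutoEncoder`, `check_data()`) may handle this differently.

```python
import numpy as np


def prepare_missing(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: build a 1/0 mask from NaNs and standardize per feature.

    Rows are features, columns are samples. Statistics use observed entries only.
    """
    mask = (~np.isnan(x)).astype(float)  # 1 = observed, 0 = missing
    mean = np.nanmean(x, axis=1, keepdims=True)  # per-feature mean over observed
    std = np.nanstd(x, axis=1, keepdims=True)  # per-feature std over observed
    std[std == 0] = 1.0  # avoid dividing constant features by zero
    z = (x - mean) / std
    z[mask == 0] = 0.0  # placeholder value; the mask marks these as missing
    return z, mask


rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(50, 200))
x[rng.random(x.shape) < 0.1] = np.nan  # hide about 10% of entries
z, mask = prepare_missing(x)
```

The two arrays can then be passed as `model.fit_transform(z, mask=mask)`, following the quickstart call.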
- Dense or sparse data with explicit missing-entry masks
- Optional bias estimation and rotation to a PCA-aligned solution
- Posterior covariances for scores and loadings; held-out probe RMS
- C++ extensions with runtime autotuning of threading and memory use
- Missing-aware preprocessing: one-hot encoding, standard/min-max scaling, log, power, and winsorizing transforms, and auto-routing (`AutoEncoder`)
- Preflight data diagnostics via `check_data()` / `DataReport`
- scikit-learn-compatible estimator (`fit` / `transform` / `inverse_transform`)
- Model selection via `select_n_components` and `cross_validate_components`
- Configurable convergence: subspace angle, RMS/cost plateau, ELBO, curvature, composite rules, and patience
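The held-out probe RMS and model-selection features listed above rest on a simple idea: hide some observed entries, fit on the rest, and score the reconstruction on the hidden ones. Below is a NumPy-only sketch of that procedure. It deliberately swaps in iterative rank-k SVD imputation as a stand-in estimator instead of VB PCA. The names `svd_impute` and `probe_rms` are hypothetical and not part of the VBPCApy API, and `select_n_components` and `cross_validate_components` may use different scoring rules.

```python
import numpy as np


def svd_impute(x, mask, k, n_iter=100):
    """Stand-in estimator (NOT VB PCA): iterative rank-k SVD imputation.

    Missing entries (mask == 0) are repeatedly replaced by the current
    rank-k reconstruction; observed entries are kept fixed.
    """
    mu = (x * mask).sum(axis=1, keepdims=True) / mask.sum(axis=1, keepdims=True)
    filled = np.where(mask == 1, x, mu)  # start from per-feature observed means
    for _ in range(n_iter):
        U, sv, Vt = np.linalg.svd(filled, full_matrices=False)
        recon = (U[:, :k] * sv[:k]) @ Vt[:k, :]
        filled = np.where(mask == 1, x, recon)
    return recon


def probe_rms(x, train_mask, probe_mask, k):
    """RMS error on held-out (probe) entries that the fit never saw."""
    recon = svd_impute(x, train_mask, k)
    err = (x - recon)[probe_mask == 1]
    return float(np.sqrt(np.mean(err**2)))


rng = np.random.default_rng(0)
# Rank-3 signal plus small noise: 50 features x 200 samples
x = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 200))
x += 0.1 * rng.normal(size=(50, 200))
probe_mask = (rng.random(x.shape) < 0.1).astype(float)  # hold out ~10%
train_mask = 1.0 - probe_mask
rms = {k: probe_rms(x, train_mask, probe_mask, k) for k in (1, 2, 3, 4, 5)}
```

On this synthetic rank-3 data the probe RMS should drop sharply up to `k = 3` and level off beyond it, which is the signal a component-selection routine looks for.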
See the concept guides and API reference for full details.
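The subspace-angle convergence rule in the feature list can be illustrated independently of the library. The following minimal NumPy sketch assumes the check compares the column spaces of loading matrices from consecutive iterations and stops once the largest principal angle falls below a tolerance. The function names and the `tol` value are hypothetical, and VBPCApy's actual criterion and defaults may differ.

```python
import numpy as np


def max_subspace_angle(A_old: np.ndarray, A_new: np.ndarray) -> float:
    """Largest principal angle (radians) between the column spaces of two matrices."""
    Q_old, _ = np.linalg.qr(A_old)  # orthonormal basis for each subspace
    Q_new, _ = np.linalg.qr(A_new)
    # Singular values of Q_old^T Q_new are the cosines of the principal angles;
    # the smallest cosine corresponds to the largest angle.
    cosines = np.linalg.svd(Q_old.T @ Q_new, compute_uv=False)
    # arccos near 1 bottoms out around 1e-8 rad, which is fine for tol=1e-4
    return float(np.arccos(np.clip(cosines.min(), -1.0, 1.0)))


def has_converged(A_old: np.ndarray, A_new: np.ndarray, tol: float = 1e-4) -> bool:
    """Hypothetical stopping rule: the loading subspace has stopped rotating."""
    return max_subspace_angle(A_old, A_new) < tol
```

Comparing subspaces rather than raw loading matrices makes the check insensitive to rotations and rescalings within the latent space, which the factor model does not pin down anyway. `scipy.linalg.subspace_angles` computes the same principal angles with better accuracy for very small angles.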
Development setup (requires uv and just):

```bash
git clone https://github.com/yoavram-lab/VBPCApy.git
cd VBPCApy
uv sync --extra dev --extra plot
just ci          # lint + typecheck + test
just docs-serve  # local docs preview
```

See CONTRIBUTING.md for guidelines.
If you use this package in your research, please cite:
```bibtex
@software{vbpca_py2026,
  author  = {Macdonald, Joshua and Naim, Shany and Ram, Yoav},
  title   = {{VBPCApy}: Variational Bayesian PCA with Missing Data Support},
  year    = {2026},
  url     = {https://github.com/yoavram-lab/VBPCApy},
  version = {0.2.0},
}

@article{ilin2010practical,
  title   = {Practical Approaches to Principal Component Analysis in the Presence of Missing Values},
  author  = {Ilin, Alexander and Raiko, Tapani},
  journal = {Journal of Machine Learning Research},
  volume  = {11},
  pages   = {1957--2000},
  year    = {2010},
}
```

See CITATION.cff for machine-readable metadata.
Licensed under the MIT License; see LICENSE.