Training simple models (Random Forest, a fully connected network, and BERT with a fully connected head) on paired sequence and categorical features. The data is private medical data and is not included in this repository.
- Random Forest — scikit-learn baseline
- SimpleClassifier — fully connected PyTorch network trained on one-hot encoded sequences
- TinyBERTClassifier — Intel's TinyBERT followed by a fully connected head
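The README says only that SimpleClassifier consumes one-hot encoded sequences. The sketch below shows one way to build such an input. It assumes the paired features are two sequences over the 20-letter amino-acid alphabet plus a single categorical column. The alphabet, the padding length `max_len`, the category labels, and the helper names `one_hot_sequence`, `one_hot_category`, and `encode_pair` are all hypothetical and are not taken from `modules/`.

```python
import numpy as np

# Hypothetical alphabet: the README does not state what the sequences contain.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids (assumption)
CHAR_TO_IDX = {c: i for i, c in enumerate(ALPHABET)}


def one_hot_sequence(seq: str, max_len: int) -> np.ndarray:
    """One-hot encode a sequence into a flat (max_len * len(ALPHABET),) vector.

    Shorter sequences are zero-padded; longer ones are truncated to max_len.
    Characters outside ALPHABET raise KeyError in this sketch.
    """
    mat = np.zeros((max_len, len(ALPHABET)), dtype=np.float32)
    for pos, ch in enumerate(seq[:max_len]):
        mat[pos, CHAR_TO_IDX[ch]] = 1.0
    return mat.ravel()


def one_hot_category(value: str, categories: list[str]) -> np.ndarray:
    """One-hot encode a single categorical value against a fixed vocabulary."""
    vec = np.zeros(len(categories), dtype=np.float32)
    vec[categories.index(value)] = 1.0
    return vec


def encode_pair(seq_a: str, seq_b: str, category: str,
                categories: list[str], max_len: int = 30) -> np.ndarray:
    """Concatenate both encoded sequences and the encoded categorical feature."""
    return np.concatenate([
        one_hot_sequence(seq_a, max_len),
        one_hot_sequence(seq_b, max_len),
        one_hot_category(category, categories),
    ])


if __name__ == "__main__":
    cats = ["group_1", "group_2", "group_3"]  # hypothetical category labels
    x = encode_pair("CASSL", "CAVRD", "group_2", cats, max_len=10)
    print(x.shape)  # (403,): 2 * 10 * 20 sequence slots + 3 category slots
```

Flattening and concatenating yields one fixed-length vector per sample. A fully connected input layer can consume it directly, and so can a scikit-learn Random Forest. The actual encoding in `modules/` may differ, for example in padding or in how unknown characters are handled.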

Repository layout:

- `modules/`: core library (datasets, models, preprocessing, analysis utilities)
- `experiments_notebook/`: Jupyter notebooks for running experiments
- `tests/`: unit tests (42 tests)

Install the dependencies with `uv sync`, then open and run the notebooks in `experiments_notebook/`.

Run the unit tests with `uv run pytest`.

Author: Mathieu Charbonnel

References:

- Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT 2019.
- Hugging Face Transformers library: https://github.com/huggingface/transformers