This project involves the processing and modeling of ICU data to predict death or ICU admission based on various physiological measurements and other time series based features. The pipeline involves multiple steps including data extraction, transformation, feature engineering, and model training to achieve reliable predictions.
An API deployment of the models trained here can be found at https://github.com/ATayls/DEWS_fastapi
- Installation
- Data Pipeline
- Configuration
- Usage
- Results
- Plots
- Export
- Files and Directories
- [Citation](#citation)
To run this project, you need Python installed along with the project dependencies, which can be installed with either pip or Poetry.
To install dependencies using pip with a `requirements.txt` file, follow these steps:

1. **Ensure you have Python and pip installed.** You can check by running:

   ```
   python --version
   pip --version
   ```

2. **Install dependencies.** Navigate to the project directory where the `requirements.txt` file is located and run:

   ```
   pip install -r requirements.txt
   ```

   This command will install all the dependencies listed in the `requirements.txt` file.
To install dependencies using Poetry with a `pyproject.toml` file, follow these steps:

1. **Install Poetry.** If you haven't already installed Poetry, you can do so by running:

   ```
   curl -sSL https://install.python-poetry.org | python3 -
   ```

2. **Navigate to your project directory.** Ensure you are in the directory where your `pyproject.toml` file is located.

3. **Install dependencies.** Run the following command to install all dependencies specified in the `pyproject.toml` file:

   ```
   poetry install
   ```

4. **Activate the virtual environment (optional).** To work within the virtual environment managed by Poetry, run:

   ```
   poetry shell
   ```

   This will create and activate a virtual environment with all the dependencies installed.
The ETL (Extract, Transform, Load) process handles loading, preprocessing, and feature engineering of the data. The ETL function:
- Loads the data from the specified filename.
- Applies preprocessing steps.
- Creates additional features.
- Saves or loads the processed dataset.
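The steps above can be sketched as a single cached ETL function. This is an illustrative outline, not the repo's actual code; the function name, cache path, and preprocessing steps are assumptions.

```python
from pathlib import Path

import pandas as pd


def etl(filename: str, processed_dir: str = "processed") -> pd.DataFrame:
    """Illustrative ETL sketch: load raw data, preprocess, and cache the result."""
    cache = Path(processed_dir) / f"{Path(filename).stem}_processed.csv"
    if cache.exists():
        # Load the previously processed dataset instead of recomputing it.
        return pd.read_csv(cache)
    df = pd.read_csv(filename)                       # extract
    df = df.dropna(subset=["patient_id"])            # minimal preprocessing example
    df = df.sort_values(["patient_id", "timestamp"]) # order per-patient time series
    # ... feature engineering would be applied here ...
    cache.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(cache, index=False)                    # save the processed dataset
    return df
```

Caching the processed dataset avoids repeating the expensive feature-engineering step on every run.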
The feature engineering process involves creating time series features, calculating rolling averages, standard deviations, and slopes. It includes functions like:
- `create_time_delta`: Creates a time variable.
- `create_diff`: Calculates differences from previous values.
- `create_rolling`: Calculates rolling averages and standard deviations.
- `create_expanding`: Calculates expanding averages and standard deviations.
- `create_ts_base_features`: Combines multiple time series features.
- `create_slopes_cached`: Calculates slopes for variables.
The model training process involves training logistic regression models using cross-validation and bootstrapping techniques. It includes:
- `run_lr_train`: Trains a logistic regression model.
- `train_logistic_model_cv`: Performs cross-validation.
- `train_logistic_model_bootstrapped`: Uses bootstrapping for model training.
- `train_logistic_model_CV_grouped`: Cross-validation with non-overlapping patient groups.
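The grouped cross-validation idea can be sketched with scikit-learn's `GroupKFold`, which guarantees that rows from one patient never appear in both the train and test folds. This is a minimal sketch under the assumption that the project uses scikit-learn; the function name and signature are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score


def train_logistic_cv_grouped(X, y, groups, n_splits=5):
    """Sketch of grouped CV: patient rows never span train and test folds."""
    model = LogisticRegression(max_iter=1000)
    cv = GroupKFold(n_splits=n_splits)
    # Score each held-out fold with AUROC, splitting by patient group
    scores = cross_val_score(model, X, y, groups=groups, cv=cv, scoring="roc_auc")
    return model.fit(X, y), scores
```

Grouping by patient matters because repeated measurements from the same patient are correlated; ordinary K-fold CV would leak that correlation and inflate the reported AUROC.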
Configuration settings are handled in the settings.py file, including directory paths for data, processed data, saved results, plots, and models.
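A typical layout for such a settings module might look like the following. The `SAVED_RESULTS_DIR` and `PLOTS_DIR` names appear elsewhere in this README; the remaining names and the directory layout are assumptions for illustration:

```python
# settings.py — illustrative layout; the actual values live in the repo's settings.py
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent
DATA_DIR = BASE_DIR / "data"
PROCESSED_DATA_DIR = DATA_DIR / "processed"
SAVED_RESULTS_DIR = BASE_DIR / "results"
PLOTS_DIR = BASE_DIR / "plots"
MODELS_DIR = BASE_DIR / "models"
```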
To run the main experiment, execute the run.py script:

```
python run.py
```

This script will perform the following steps:
- Load and preprocess the training and testing data.
- Perform feature engineering on the data.
- Train logistic regression models using both cross-validation and bootstrapping.
- Evaluate the models on the test set.
- Save the results, models, and plots.
The results of the model training and evaluation are saved in CSV format in the SAVED_RESULTS_DIR directory. The results include metrics such as AUROC and AUPRC along with confidence intervals.
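One common way to attach a confidence interval to AUROC is a percentile bootstrap over the test set, sketched below. This is an illustration of the general technique, not the repo's exact implementation:

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def bootstrap_auroc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Illustrative percentile-bootstrap confidence interval for AUROC."""
    rng = np.random.default_rng(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample with replacement
        if len(np.unique(y_true[idx])) < 2:
            continue  # AUROC is undefined when a resample has a single class
        stats.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc_score(y_true, y_score), (lo, hi)
```

The same pattern applies to AUPRC by swapping in `average_precision_score`.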
The script generates several plots to visualize the model performance and feature importance:
- ROC and PR curves for cross-validation and test sets.
- Permutation importance plots.
- SHAP value summaries.
These plots are saved in the PLOTS_DIR directory.
The processed data, model predictions, and metrics can be exported to Excel files for further analysis. This is handled by the export_as_excel.py module.
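The core of such an export is pandas' `ExcelWriter`, which writes several DataFrames to one workbook, one sheet each. The function name and dict-of-tables API below are assumptions, not the actual interface of `export_as_excel.py`:

```python
import pandas as pd


def export_as_excel(tables: dict, path: str) -> None:
    """Sketch: write each DataFrame in `tables` to its own sheet of one workbook."""
    with pd.ExcelWriter(path) as writer:
        for sheet_name, df in tables.items():
            df.to_excel(writer, sheet_name=sheet_name, index=False)
```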
- `run.py`: Main script to run the experiment.
- `preprocessing.py`: Contains preprocessing functions.
- `feature_engineering.py`: Contains feature engineering functions.
- `train.py`: Contains model training functions.
- `settings.py`: Configuration settings.
- `plots.py`: Functions to generate plots.
- `export_as_excel.py`: Functions to export results to Excel.
- `evaluation.py`: Utilities for model evaluation.
- `utils.py`: General utilities.
If you use this repository, please cite it as follows:
Taylor, A. (2022). ICU Data Analysis [Source code]. GitHub. https://github.com/ATayls/ICU_data_analysis
This repository is part of the work published in Respiratory Research:
Gonem, S., Taylor, A., et al. (2022). Dynamic early warning scores for predicting clinical deterioration in patients with respiratory disease. Respiratory Research, 23(1), Article 130. https://doi.org/10.1186/s12931-022-02130-6
This project is licensed under the MIT License. See the LICENSE file for more details.