Harnessing Routine Data and Healthcare Contacts to Predict Risk and Treatment Requirements in Older People
PhD Thesis Supplementary Codebase
This repository contains the supporting codebase for the PhD thesis titled "Harnessing Routine Data and Healthcare Contacts to Predict Risk and Treatment Requirements in Older People". The codebase provides scripts for the analysis of routine healthcare data, focusing on risk prediction, treatment requirements, and care pathway analytics relevant to geriatric care planning. The original work was conducted within DataLoch Secure Data Environments (SDEs) using data from NHS Lothian care systems.
Note: only publicly available or synthetic (dummy) data are shared in this repository.
- data/: currently contains the open Cochrane REH-COVER summary data collated from previous systematic review rounds, used for the analysis in Chapter 4. Future additions will include synthetic EHRs for the Machine Learning and Process Mining pipelines.
- docs/: basic documentation, including structured overviews of the analysis code and related work.
- notebooks/: Jupyter and R Markdown notebooks and files covering the original data processing pipelines used to conduct the experiments within DataLoch Secure Data Environments:
  - cv19_statistics: relevant to statistical analysis and mapping for COVID-19 care pathways and rehabilitation interventions (covered in Chapters 4, 5 and 6).
  - process_mining: relevant to Process Mining analytics for comparison of care interactions between COVID-19 pandemic waves (covered in Chapter 6).
  - geriatric_ml: relevant to Machine Learning analytics for risk and resource predictions in urgently hospitalised older patients (covered in Chapter 8).
The codebase is under active development to enable reproduction of these experiments on synthetic (dummy) healthcare data. Some scripts and pipelines are prototypes or still in progress. The src/ folder will be used to construct data pipelines for Machine Learning and Process Mining analytics related to geriatric care pathways.
If you use this codebase or its methods, please cite the relevant published work:
- Georgiev, K., et al. "Understanding hospital rehabilitation using electronic health records in patients with and without COVID-19." BMC Health Serv Res 24, 1245 (2024). https://doi.org/10.1186/s12913-024-11665-x
- Georgiev, K., Fleuriot, J.D., Papapanagiotou, P. et al. "Comparing Care Pathways Between COVID-19 Pandemic Waves Using Electronic Health Records: A Process Mining Case Study." J Healthc Inform Res 9, 41–66 (2025). https://doi.org/10.1007/s41666-024-00181-6
- Georgiev, K., McPeake, J., Shenkin, S.D. et al. "Understanding hospital activity and outcomes for people with multimorbidity using electronic health records." Sci Rep 15, 8522 (2025). https://doi.org/10.1038/s41598-025-92940-7
- Georgiev, K., et al. "Predicting incident dementia in community-dwelling older adults using primary and secondary care data from electronic health records." Brain Commun 7(1), fcae469 (2025). https://doi.org/10.1093/braincomms/fcae469
This thesis is supported by a PhD Fellowship award from the Sir Jules Thorn Charitable Trust (21/01PhD) as part of the University of Edinburgh’s Precision Medicine PhD programme.
The original data studies were supported by DataLoch, which is core-funded by the Data-Driven Innovation programme within the Edinburgh and South East Scotland City Region Deal, and the Chief Scientist Office, Scottish Government.
This project is licensed under the MIT License. See the LICENSE file for details.