This repository defines the core diabetes dataset, including codelists and implementation rules for deriving its components, alongside code to extract these in a reproducible way. It also contains information on related projects.
The core diabetes dataset defines a standardised set of variables describing diabetes diagnosis, classification, treatment, and key outcomes, derived from routinely collected health data. It is designed to support consistent, reproducible analyses across studies by providing clear variable definitions, validated codelists, and explicit implementation rules for data extraction and derivation. The dataset was agreed by expert consensus, with input from patients and the public, as described here [link to paper]. It was developed as part of a 2025–2026 NHS-funded driver project.
For the components of the core diabetes dataset, and for other variables, this repository provides codelists and algorithms for defining variables in routine primary and secondary care data (SNOMED, dm+d, ICD-10 and OPCS-4), together with the corresponding implementation rules.
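As a rough illustration of how a codelist and an implementation rule fit together, the sketch below loads a codelist and finds each patient's earliest matching event. It is not the repository's extraction code: the column names (`code`, `term`), the placeholder codes (`CODE_A`, `CODE_B`), the function names, and the "earliest matching event" rule are all assumptions made for the example.

```python
import csv
import io

# Hypothetical codelist contents. Real codelists in this repository may use
# different column names and, of course, real clinical codes.
CODELIST_CSV = (
    "code,term\n"
    "CODE_A,Example diabetes term A\n"
    "CODE_B,Example diabetes term B\n"
)


def load_codelist(text):
    """Read a CSV codelist and return the set of codes it contains."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["code"].strip() for row in reader}


def first_matching_event(events, codes):
    """Return, per patient, the earliest event date whose code is in `codes`.

    `events` is an iterable of (patient_id, code, iso_date) tuples.
    ISO 8601 date strings (YYYY-MM-DD) sort correctly as plain strings.
    """
    earliest = {}
    for patient_id, code, date in events:
        if code not in codes:
            continue
        if patient_id not in earliest or date < earliest[patient_id]:
            earliest[patient_id] = date
    return earliest


# Illustrative event records (not real data).
events = [
    ("p1", "CODE_A", "2021-03-01"),
    ("p1", "CODE_B", "2019-07-15"),
    ("p2", "CODE_Z", "2020-01-01"),  # code not in the codelist, so ignored
]

codes = load_codelist(CODELIST_CSV)
result = first_matching_event(events, codes)
print(result)
```

Keeping the codelist (which codes count) separate from the rule (how matching events are turned into a variable) means either can be reviewed or updated on its own.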