A high-sensitivity clinical triage tool designed to predict heart disease using a subset of seven objective "hard indicators." Built for the Byte 2 Beat Hackathon.
- Recall (Sensitivity): 73%
- Negative Predictive Value (NPV): 96%
- Statistical Significance: $p < 0.001$ across all indicators
- Test Set Size: 50,736 observations
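Both headline metrics are read off the test-set confusion matrix: recall is TP / (TP + FN) and NPV is TN / (TN + FN). The following is a minimal sketch of that calculation, assuming binary labels with 1 = heart disease. `triage_metrics` is a hypothetical helper, not a function from this repository.

```python
from sklearn.metrics import confusion_matrix


def triage_metrics(y_true, y_pred):
    """Return (recall, NPV) for binary labels where 1 = heart disease."""
    # With labels=[0, 1], ravel() yields the counts in the order tn, fp, fn, tp.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    recall = tp / (tp + fn)  # share of true cases that the screen flags
    npv = tn / (tn + fn)     # share of "negative" calls that are truly negative
    return recall, npv


# Toy example: 4 true cases (3 caught, 1 missed) and 6 non-cases (1 false alarm).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
recall, npv = triage_metrics(y_true, y_pred)
# recall = 3 / (3 + 1) = 0.75; npv = 5 / (5 + 1), about 0.833
```

For a triage tool, a high NPV is the property that justifies ruling patients out, while recall measures how many true cases get flagged.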
Our analysis found that cardiovascular risk can be triaged effectively using only seven objective, verifiable data points, with no need for subjective survey responses:
- Age (Odds Ratio: 2.59)
- Stroke History (Odds Ratio: 1.35)
- Smoking Status (Odds Ratio: 1.33)
- BMI
- Sex
- Heavy Alcohol Consumption
- Healthcare Access
Tech stack:
- Language: Python
- Modeling: Scikit-learn (Logistic Regression with Balanced Class Weights)
- Inference: Statsmodels (Maximum Likelihood Estimation)
- Visualization: Matplotlib, Seaborn
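In scikit-learn, `class_weight="balanced"` weights each class by n_samples / (n_classes × class count), so the rare positive class is not swamped during fitting. The sketch below reproduces that formula by hand on toy labels and checks it against `sklearn.utils.class_weight.compute_class_weight`. The helper `balanced_weights` and the toy labels are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight


def balanced_weights(y):
    """Reproduce class_weight='balanced': n_samples / (n_classes * count of each class)."""
    classes = np.unique(y)
    counts = np.array([(y == c).sum() for c in classes])
    return len(y) / (len(classes) * counts)


y = np.array([0] * 9 + [1])  # toy labels: 10% positive
manual = balanced_weights(y)
sklearn_w = compute_class_weight(class_weight="balanced", classes=np.unique(y), y=y)
# Both give [10/18, 5.0]: each positive sample counts 9x as much as each negative one.

# The classifier setting named in the tech stack (max_iter raised to help convergence).
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
```

This reweighting trades precision for recall, which matches the goal of a high-sensitivity screen.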
Methodology:
- Data Cleaning: Cleaned and prepared the CDC's Behavioral Risk Factor Surveillance System (BRFSS) dataset (253,680 records).
- Feature Selection: Reduced 22 variables to 7 "hard indicators" to limit subjective bias.
- Validation: Used a 60/20/20 train/validation/test split, with calibration-curve analysis to check that predicted risks match observed event rates.
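A 60/20/20 split can be made with two successive stratified splits: hold out 40%, then halve it. Applied to 253,680 records, this leaves 253,680 × 0.2 = 50,736 test rows, which matches the reported test-set size. The following minimal sketch uses `make_classification` as a synthetic stand-in for the 7-feature BRFSS subset. The helper `split_60_20_20`, the seed, and the data are assumptions. The source does not say how the validation split is used, so here it is simply held out.

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def split_60_20_20(X, y, seed=42):
    """Stratified 60/20/20 train/validation/test split via two successive splits."""
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.4, stratify=y, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test


# Synthetic stand-in: 7 features, roughly 10% positives.
X, y = make_classification(n_samples=5000, n_features=7, n_informative=5,
                           weights=[0.9], random_state=0)
X_train, X_val, X_test, y_train, y_val, y_test = split_60_20_20(X, y)

clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_train, y_train)
prob_test = clf.predict_proba(X_test)[:, 1]

# Observed positive rate vs. mean predicted probability in up to 10 bins (empty bins dropped).
prob_true, prob_pred = calibration_curve(y_test, prob_test, n_bins=10)
```

Balanced class weights tend to inflate predicted probabilities for the minority class, so the calibration curve typically sits below the diagonal. That is why it is worth plotting before treating the scores as risk estimates.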
Repository structure:
- `notebooks/`: Contains `eda.ipynb` for initial discovery and `model.ipynb` for the full experimental pipeline and statistical analysis.
- `docs/`: Includes primary research references and the final paper, `Rapid Screening for Cardiovascular Risk_ A High-Sensitivity Triage Approach.pdf`.
- `images/`: Exported visualizations, including `confusion_matrix.png`, `calibration_curve.png`, and `p_values.png`.
- `data/`: Structured storage for `raw` BRFSS data and `processed` curated subsets.
- `requirements.txt`: Environment dependencies for reproducibility.