An End-to-End Machine Learning Solution for Clinical Risk Assessment
This project provides a data-driven approach to identifying high-risk stroke patients. By leveraging clinical parameters such as blood pressure, glucose levels, and smoking history, the system outputs a probability score using a trained Logistic Regression model.
- Core Engine: Python 3.10+
- Data Science: Scikit-Learn (Model & Scaler), Pandas (EDA), NumPy
- Web Framework: Streamlit (UI/UX)
- Deployment: Streamlit Cloud / GitHub Actions
The model was trained on the [Kaggle Stroke Dataset]. Key technical implementations include:
- Feature Engineering: Created `pulse_pressure` (SysBP - DiaBP) and `age_glucose_impact` to capture non-linear risks.
- Preprocessing: Robust handling of class imbalance and `StandardScaler` normalization.
- Pipeline: Integrated a serialized `joblib` pipeline for seamless deployment.
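The three steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual training code. The column names (`age`, `glucose`, `SysBP`, `DiaBP`), the definition of `age_glucose_impact` as age multiplied by glucose, the use of `class_weight="balanced"` for class imbalance, and the file name `stroke_pipeline.joblib` are all assumptions, and the data is synthetic.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the two engineered columns added."""
    out = df.copy()
    # Pulse pressure: systolic minus diastolic blood pressure.
    out["pulse_pressure"] = out["SysBP"] - out["DiaBP"]
    # ASSUMPTION: the interaction term is taken to be age * glucose.
    out["age_glucose_impact"] = out["age"] * out["glucose"]
    return out


def build_pipeline() -> Pipeline:
    """Scaler followed by a logistic regression classifier."""
    return Pipeline([
        ("scaler", StandardScaler()),
        # ASSUMPTION: class imbalance is handled by reweighting classes.
        ("model", LogisticRegression(class_weight="balanced", max_iter=1000)),
    ])


# Synthetic stand-in data; not drawn from the Kaggle dataset.
rng = np.random.default_rng(0)
n = 200
raw = pd.DataFrame({
    "age": rng.integers(30, 80, n),
    "SysBP": rng.normal(130, 15, n),
    "DiaBP": rng.normal(80, 10, n),
    "glucose": rng.normal(100, 20, n),
})
# Roughly one positive in seven, to mimic an imbalanced target.
labels = (np.arange(n) % 7 == 0).astype(int)

features = add_engineered_features(raw)
pipeline = build_pipeline().fit(features, labels)

# Serialize the fitted pipeline and load it back, as an app would at startup.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stroke_pipeline.joblib")  # hypothetical file name
    joblib.dump(pipeline, path)
    loaded = joblib.load(path)

# Probability of the positive class for the first three rows.
risk = loaded.predict_proba(features.head(3))[:, 1]
```

Bundling the scaler and the model in one `Pipeline` means the serving code only has to load a single file and call `predict_proba`, so the scaling applied at inference matches what was fitted during training.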
# Clone the repository
git clone https://github.com/YourUsername/Heart-disease-predictor.git
# Install dependencies
pip install -r requirements.txt
# Launch the application
streamlit run app/app.py


