- The dataset contains measurements of various substances present in water, mostly expressed as concentrations (amount per liter).
All attributes are numeric and are listed below:
- A description of the data attributes.
  - aluminium: dangerous if greater than 2.8
  - ammonia: dangerous if greater than 32.5
  - arsenic: dangerous if greater than 0.01
  - barium: dangerous if greater than 2
  - cadmium: dangerous if greater than 0.005
  - chloramine: dangerous if greater than 4
  - chromium: dangerous if greater than 0.1
  - copper: dangerous if greater than 1.3
  - flouride: dangerous if greater than 1.5
  - bacteria: dangerous if greater than 0
  - viruses: dangerous if greater than 0
  - lead: dangerous if greater than 0.015
  - nitrates: dangerous if greater than 10
  - nitrites: dangerous if greater than 1
  - mercury: dangerous if greater than 0.002
  - perchlorate: dangerous if greater than 56
  - radium: dangerous if greater than 5
  - selenium: dangerous if greater than 0.5
  - silver: dangerous if greater than 0.1
  - uranium: dangerous if greater than 0.3
  - is_safe: class attribute (0 = not safe, 1 = safe)
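The limits above can be kept in a lookup table, for example to list which substances in a single sample exceed their stated limit. This is a minimal sketch: `DANGER_THRESHOLDS` and `exceeded_limits` are hypothetical names, and the helper does not reproduce the dataset's `is_safe` label, which is provided with the data.

```python
# Limits as given in the attribute list; a value strictly greater than its
# limit is described as dangerous. Names here are hypothetical.
DANGER_THRESHOLDS = {
    "aluminium": 2.8,
    "ammonia": 32.5,
    "arsenic": 0.01,
    "barium": 2,
    "cadmium": 0.005,
    "chloramine": 4,
    "chromium": 0.1,
    "copper": 1.3,
    "flouride": 1.5,  # spelled as in the attribute list
    "bacteria": 0,
    "viruses": 0,
    "lead": 0.015,
    "nitrates": 10,
    "nitrites": 1,
    "mercury": 0.002,
    "perchlorate": 56,
    "radium": 5,
    "selenium": 0.5,
    "silver": 0.1,
    "uranium": 0.3,
}


def exceeded_limits(sample: dict[str, float]) -> list[str]:
    """Return, in table order, the substances in `sample` above their limit.

    Substances missing from `sample` are skipped.
    """
    return [
        name
        for name, limit in DANGER_THRESHOLDS.items()
        if name in sample and sample[name] > limit
    ]
```

For example, `exceeded_limits({"arsenic": 0.04, "lead": 0.01, "bacteria": 0.2})` returns `["arsenic", "bacteria"]`, since the lead value stays below 0.015.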
Dataset source: https://www.kaggle.com/datasets/mssmartypants/water-quality
The objective is to classify each water sample into one of two classes (safe or not safe) and to predict, as a percentage, how likely it is that the water quality is good.
- Data Preprocessing:
- In this initial stage, missing and invalid values are identified: the `#NUM!` entries are replaced with NaN, and the rows containing NaN are then dropped, since there are very few of them.
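A minimal pandas sketch of this step, assuming the data has already been loaded into a DataFrame whose invalid cells contain the literal string `#NUM!`. The function name `clean_water_data` is hypothetical.

```python
import numpy as np
import pandas as pd


def clean_water_data(df: pd.DataFrame) -> pd.DataFrame:
    """Replace '#NUM!' with NaN, drop incomplete rows, and cast columns to numbers."""
    # '#NUM!' is a spreadsheet error marker; treat it as a missing value.
    cleaned = df.replace("#NUM!", np.nan)
    # Only a few rows are affected, so they are simply dropped.
    cleaned = cleaned.dropna().reset_index(drop=True)
    # Columns that contained '#NUM!' were read as strings; convert them back.
    return cleaned.apply(pd.to_numeric)
```

Passing `na_values=["#NUM!"]` to `pd.read_csv` would perform the same replacement at load time, so the affected columns are parsed as numbers directly.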
- Data Transformation:
- In this stage, standard scaling (zero mean, unit variance) is applied to the complete dataset, i.e. to every column except the target variable `is_safe`.
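With scikit-learn's `StandardScaler`, which maps each feature value x to (x - mean) / std, the scaling step might look like the sketch below. The helper name `scale_features` is hypothetical, and the target column is assumed to be named `is_safe` as in the attribute list.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def scale_features(df: pd.DataFrame, target: str = "is_safe"):
    """Standard-scale every column except `target`; return (X_scaled, y, scaler)."""
    features = df.drop(columns=[target])
    y = df[target]
    scaler = StandardScaler()
    # fit_transform learns each column's mean and (population) std,
    # then maps x -> (x - mean) / std.
    X_scaled = pd.DataFrame(
        scaler.fit_transform(features),
        columns=features.columns,
        index=features.index,
    )
    return X_scaled, y, scaler
```

Returning the fitted scaler matters because the same transformation has to be applied to new input at prediction time. Fitting the scaler on the training split only, rather than on the complete dataset, is a common alternative that keeps test-set statistics out of the scaling.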
- Model Training:
- In this phase, the model was trained using a Random Forest Classifier, which achieved an accuracy of 95.19%.
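- Since this is a binary classification problem, `predict_proba` was used to compute the class probabilities for the given `x_test` data; the probability of the safe class (`is_safe` = 1) can be expressed as the percentage of good water quality.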
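Training and probability prediction could be sketched as follows. The 80/20 stratified split, `n_estimators=100`, `random_state=42`, and the function name `train_and_score` are all assumptions, since the original settings are not stated, so this sketch will not necessarily reproduce the reported 95.19%.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_and_score(X, y, random_state: int = 42):
    """Train a Random Forest; return (model, test accuracy, P(safe) per test row in %)."""
    # Assumed split: 80% train / 20% test, stratified on the label.
    x_train, x_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state, stratify=y
    )
    model = RandomForestClassifier(n_estimators=100, random_state=random_state)
    model.fit(x_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(x_test))
    # predict_proba returns one column per class, ordered as in model.classes_;
    # pick the column for the safe class (label 1) and convert to a percentage.
    safe_col = list(model.classes_).index(1)
    safe_percent = model.predict_proba(x_test)[:, safe_col] * 100
    return model, accuracy, safe_percent
```

Looking up the column through `model.classes_` avoids assuming that the safe class always sits in the second column of the `predict_proba` output.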
- Flask App Creation:
- The Flask library is used to develop a web application that serves as a user interface for predicting water quality as a percentage.
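A minimal Flask sketch of such an app, which returns the prediction as JSON rather than rendering an HTML page. The `/predict` route, the JSON field names, the `safe_percent` response key, and the `create_app` factory are hypothetical; `model` and `scaler` are assumed to be the fitted objects from training (for example, loaded from disk), and `feature_names` the training column order.

```python
import pandas as pd
from flask import Flask, jsonify, request


def create_app(model, scaler, feature_names):
    """Build a Flask app whose /predict route returns P(is_safe = 1) as a percentage."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True) or {}
        missing = [name for name in feature_names if name not in payload]
        if missing:
            return jsonify(error=f"missing features: {missing}"), 400
        # Build a one-row frame in the training column order, then apply
        # the scaler that was fitted during training.
        row = pd.DataFrame(
            [[float(payload[name]) for name in feature_names]], columns=feature_names
        )
        scaled = pd.DataFrame(scaler.transform(row), columns=feature_names)
        # Probability of the safe class (label 1), converted to a percentage.
        safe_col = list(model.classes_).index(1)
        safe_percent = float(model.predict_proba(scaled)[0][safe_col]) * 100
        return jsonify(safe_percent=round(safe_percent, 2))

    return app
```

Flask's `app.test_client()` can be used to exercise the route without starting a server.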



