🛡️ Phishing Website Detection

This project builds and evaluates a machine-learning model that classifies websites as phishing or legitimate from 31 URL and HTML features. The notebook covers data loading, preprocessing, feature engineering, model training, hyperparameter tuning (grid search and randomized search), and evaluation.

📂 Project Structure

Phising1.ipynb      # Main notebook
Phishing Data.csv   # Raw dataset (linked from Google Drive)
README.md           # This file

📈 Architecture

🔍 Data Source

Dataset: “Phishing Data” CSV (31 features + label)
Size: ~2,000 rows; the 100 missing values are handled during preprocessing
Source: Google Drive (Everlytics2/Phishing Data)

Each row represents a website with attributes like having_At_Symbol, double_slash_redirecting, SSLfinal_State, etc. Label column: Result (1 = Legitimate, -1 = Phishing).

⚙️ Workflow

Data Loading & Exploration – Read the CSV, inspect null values, and review descriptive statistics.
Feature Engineering – Example: Symbol_Redirect_Interaction = having_At_Symbol * double_slash_redirecting.
Model Training – Train models such as Logistic Regression, Random Forest, or XGBoost.
Hyperparameter Tuning – Grid search and randomized search (GridSearchCV and RandomizedSearchCV).
Evaluation – Accuracy, Precision, Recall, F1-Score.
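
The workflow above can be sketched end to end as follows. This is a minimal illustration, not the notebook's actual code: it assumes scikit-learn, uses a small synthetic DataFrame in place of Phishing Data.csv, keeps only three of the 31 features, and picks a Random Forest with a deliberately tiny grid. The labelling rule, random seed, grid values, and 80/20 split are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the real CSV (assumption: features are encoded as -1/0/1).
rng = np.random.default_rng(0)
n_rows = 200
df = pd.DataFrame({
    "having_At_Symbol": rng.choice([-1, 1], size=n_rows),
    "double_slash_redirecting": rng.choice([-1, 1], size=n_rows),
    "SSLfinal_State": rng.choice([-1, 0, 1], size=n_rows),
})
# Toy labelling rule so the model has something to learn (1 = legitimate, -1 = phishing).
df["Result"] = np.where(df["SSLfinal_State"] == 1, 1, -1)

# 1. Exploration: count missing values per column.
null_counts = df.isnull().sum()

# 2. Feature engineering: the interaction feature named in the workflow.
df["Symbol_Redirect_Interaction"] = (
    df["having_At_Symbol"] * df["double_slash_redirecting"]
)

# 3. Hold out a stratified test set.
X = df.drop(columns="Result")
y = df["Result"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 4. Hyperparameter tuning over a small illustrative grid.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [20, 50], "max_depth": [3, None]},
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# 5. Evaluation on the held-out rows (scikit-learn treats label 1 as positive by default).
y_pred = search.predict(X_test)
metrics = {
    "Accuracy": accuracy_score(y_test, y_pred),
    "Precision": precision_score(y_test, y_pred),
    "Recall": recall_score(y_test, y_pred),
    "F1-score": f1_score(y_test, y_pred),
}
for name, value in metrics.items():
    print(f"{name:<9}: {value:.3f}")
```

RandomizedSearchCV takes the same estimator but expects param_distributions and n_iter in place of param_grid, which helps when the grid is too large to search exhaustively.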

📝 Sample Input

A single row of the CSV might look like:

having_At_Symbol	double_slash_redirecting	SSLfinal_State	...	Result
0	1	-1	...	-1

Where:

0/1/-1 are encoded feature values,
Result is the ground truth (-1 phishing, 1 legitimate).
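
As a rough sketch of how such a row could be checked before training, the snippet below parses tab-separated text and flags any cell outside the expected encoding. The helper names parse_rows and invalid_cells are hypothetical, the elided columns ("...") are left out, and the rule that every feature lies in {-1, 0, 1} is an assumption drawn from the sample above rather than from the full dataset.

```python
import csv
import io

FEATURE_VALUES = {-1, 0, 1}   # assumed encoding for feature columns
LABEL_VALUES = {-1, 1}        # Result: -1 = phishing, 1 = legitimate


def parse_rows(tsv_text):
    """Parse tab-separated text into a list of {column: int} dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [{name: int(value) for name, value in row.items()} for row in reader]


def invalid_cells(rows):
    """Return (row_index, column, value) for every cell outside its allowed set."""
    problems = []
    for i, row in enumerate(rows):
        for name, value in row.items():
            allowed = LABEL_VALUES if name == "Result" else FEATURE_VALUES
            if value not in allowed:
                problems.append((i, name, value))
    return problems


sample = (
    "having_At_Symbol\tdouble_slash_redirecting\tSSLfinal_State\tResult\n"
    "0\t1\t-1\t-1\n"
)
rows = parse_rows(sample)
print(rows[0]["Result"])    # -1, i.e. phishing
print(invalid_cells(rows))  # []
```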

📝 Sample Output

Predictions on unseen data:

URL_ID	Predicted_Label
1001	Phishing
1002	Legitimate
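
A table like this could be produced by mapping the model's numeric output back to class names. The sketch below is illustrative only: build_report is a hypothetical helper, and the URL_ID values are assumed to be supplied by the caller, since the README does not say where they come from.

```python
import pandas as pd

# Label encoding from the dataset: 1 = legitimate, -1 = phishing.
LABEL_NAMES = {1: "Legitimate", -1: "Phishing"}


def build_report(url_ids, predictions):
    """Pair each ID with the readable name of its predicted class."""
    return pd.DataFrame({
        "URL_ID": list(url_ids),
        "Predicted_Label": pd.Series(list(predictions)).map(LABEL_NAMES),
    })


report = build_report([1001, 1002], [-1, 1])
print(report.to_string(index=False))
```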

Console metrics printout:

Accuracy : 0.964
Precision: 0.958
Recall   : 0.971
F1-score : 0.964

(These numbers reflect a typical RandomForest run on this dataset; actual values vary with the train/test split and hyperparameters.)

📊 Model Performance

Metric	Score
Accuracy	96.4%
Precision	95.8%
Recall	97.1%
F1-Score	96.4%

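The model achieves a recall of 97.1%. High recall is desirable here because it means few phishing sites slip through undetected, assuming phishing is treated as the positive class when the metric is computed.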
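
Which class counts as "positive" changes what precision and recall mean. With labels encoded as -1 and 1, scikit-learn's precision_score and recall_score default to pos_label=1, which is the legitimate class here. The README does not state which convention the notebook used, so the pure-Python sketch below (binary_metrics is a hypothetical helper, and the toy labels are made up) only shows how the two choices differ, then checks that the reported F1 is consistent with the reported precision and recall.

```python
def binary_metrics(y_true, y_pred, positive=-1):
    """Precision, recall, and F1 with `positive` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels: three phishing sites (-1), five legitimate (1); one phishing site is missed.
y_true = [-1, -1, -1, 1, 1, 1, 1, 1]
y_pred = [-1, -1, 1, 1, 1, 1, 1, 1]

phishing_p, phishing_r, _ = binary_metrics(y_true, y_pred, positive=-1)
legit_p, legit_r, _ = binary_metrics(y_true, y_pred, positive=1)
print(f"phishing as positive:   precision={phishing_p:.3f} recall={phishing_r:.3f}")
print(f"legitimate as positive: precision={legit_p:.3f} recall={legit_r:.3f}")

# Consistency check on the reported scores: F1 = 2PR / (P + R).
reported_f1 = 2 * 0.958 * 0.971 / (0.958 + 0.971)
print(round(reported_f1, 3))  # 0.964
```

In this toy case the missed phishing site lowers recall only when phishing is the positive class, which is the view that matches the goal of catching phishing sites.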

🚀 How to Run

Clone this repo or open the notebook in Google Colab.
Upload Phishing Data.csv to your working directory.
Run all cells to train and evaluate the model.
Modify the new_input DataFrame at the bottom of the notebook to classify a custom website from its encoded feature values.

🧪 Example Prediction in Notebook

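import pandas as pd

# Example new input row; `model` is the classifier trained earlier in the notebook
new_input = pd.DataFrame([{
    'having_At_Symbol': 0,
    'double_slash_redirecting': 1,
    'SSLfinal_State': -1,
    # ... all other features
}])

# The model returns 1 for legitimate and -1 for phishing
prediction = model.predict(new_input)
print('Prediction:', 'Legitimate' if prediction[0] == 1 else 'Phishing')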

Output:

Prediction: Phishing
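
A new row has to carry every feature the model was trained on, in the same column order. One way to guard against mistakes is sketched below. It assumes pandas; align_features is a hypothetical helper, and the three-column feature_columns list stands in for the full training column list, which is not reproduced here.

```python
import pandas as pd


def align_features(new_input, feature_columns):
    """Reorder columns to match training order; fail loudly if any are missing."""
    missing = [c for c in feature_columns if c not in new_input.columns]
    if missing:
        raise ValueError(f"Missing feature columns: {missing}")
    return new_input[list(feature_columns)]


# Hypothetical training column order (the real model uses all of its features).
feature_columns = ["having_At_Symbol", "double_slash_redirecting", "SSLfinal_State"]

# Same values as the example above, deliberately given in a different order.
new_input = pd.DataFrame([{
    "SSLfinal_State": -1,
    "having_At_Symbol": 0,
    "double_slash_redirecting": 1,
}])

aligned = align_features(new_input, feature_columns)
print(list(aligned.columns))
```

If the estimator was fitted on a DataFrame with a recent scikit-learn version, its feature_names_in_ attribute may be a convenient source for feature_columns.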
