A machine learning classifier that predicts whether online shopping users will complete a purchase based on their browsing behavior. The model uses a k-nearest neighbors algorithm trained on user session data.
Requirements:

- Python 3.6+
- scikit-learn
Install dependencies:

    pip install scikit-learn

Usage:

    python shopping.py <data_file.csv>

Example:

    python shopping.py shopping.csv

The program expects a CSV file with 18 columns:

| Column | Name | Type | Description |
|---|---|---|---|
| 0 | Administrative | int | Number of administrative pages visited |
| 1 | Administrative_Duration | float | Time spent on administrative pages |
| 2 | Informational | int | Number of informational pages visited |
| 3 | Informational_Duration | float | Time spent on informational pages |
| 4 | ProductRelated | int | Number of product-related pages visited |
| 5 | ProductRelated_Duration | float | Time spent on product-related pages |
| 6 | BounceRates | float | Bounce rate of pages visited |
| 7 | ExitRates | float | Exit rate of pages visited |
| 8 | PageValues | float | Average page value of pages visited |
| 9 | SpecialDay | float | Closeness to a special day (e.g., holiday) |
| 10 | Month | string | Month of the session (e.g., "Jan", "Feb") |
| 11 | OperatingSystems | int | Operating system identifier |
| 12 | Browser | int | Browser identifier |
| 13 | Region | int | Geographic region identifier |
| 14 | TrafficType | int | Traffic source type |
| 15 | VisitorType | string | "Returning_Visitor" or other |
| 16 | Weekend | boolean | TRUE/FALSE - whether session was on weekend |
| 17 | Revenue | boolean | TRUE/FALSE - whether user made a purchase |
The following conversions are applied during data loading:
- Month: Converted to integers 0-11 (Jan=0, Feb=1, ..., Dec=11)
- VisitorType: Converted to 1 (Returning_Visitor) or 0 (other)
- Weekend: Converted to 1 (TRUE) or 0 (FALSE)
- Revenue: Converted to 1 (TRUE) or 0 (FALSE)
Loads and preprocesses CSV data. Returns a tuple (evidence, labels) where:
- evidence: List of 17-element lists containing numeric features
- labels: List of integers (1 for purchase, 0 for no purchase)
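
The conversions and return shape described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual loader: the names `convert_row` and `load_rows`, the assumption that the file starts with a header row, and the three-letter month matching are all hypothetical choices made for the example.

```python
import csv
import io

# Assumed month abbreviations; the README only specifies Jan=0 ... Dec=11.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Column indices taken from the column table.
INT_COLUMNS = {0, 2, 4, 11, 12, 13, 14}
FLOAT_COLUMNS = {1, 3, 5, 6, 7, 8, 9}


def month_to_index(name):
    # Assumption: match on the first three letters so longer spellings also work.
    return MONTHS.index(name[:3].capitalize())


def convert_row(row):
    """Convert one 18-column CSV row into (features, label)."""
    features = []
    for i, value in enumerate(row[:17]):
        if i in INT_COLUMNS:
            features.append(int(value))
        elif i in FLOAT_COLUMNS:
            features.append(float(value))
        elif i == 10:  # Month
            features.append(month_to_index(value))
        elif i == 15:  # VisitorType
            features.append(1 if value == "Returning_Visitor" else 0)
        else:  # i == 16, Weekend
            features.append(1 if value == "TRUE" else 0)
    label = 1 if row[17] == "TRUE" else 0  # Revenue
    return features, label


def load_rows(lines):
    """Read CSV lines (header first, by assumption) and return (evidence, labels)."""
    reader = csv.reader(lines)
    next(reader)  # skip the assumed header row
    evidence, labels = [], []
    for row in reader:
        features, label = convert_row(row)
        evidence.append(features)
        labels.append(label)
    return evidence, labels


# Two made-up sessions, for illustration only.
sample = io.StringIO(
    "Administrative,Administrative_Duration,Informational,Informational_Duration,"
    "ProductRelated,ProductRelated_Duration,BounceRates,ExitRates,PageValues,"
    "SpecialDay,Month,OperatingSystems,Browser,Region,TrafficType,VisitorType,"
    "Weekend,Revenue\n"
    "0,0.0,0,0.0,1,0.0,0.2,0.2,0.0,0.0,Feb,1,1,1,1,Returning_Visitor,FALSE,FALSE\n"
    "3,80.5,0,0.0,10,300.25,0.0,0.01,12.5,0.0,Nov,2,2,1,2,New_Visitor,TRUE,TRUE\n"
)
evidence, labels = load_rows(sample)
```

To read a real file, the same function could be called with an open file handle, for example `with open("shopping.csv", newline="") as f: load_rows(f)`.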
Creates and trains a k-nearest neighbors classifier with k=1. Returns the fitted model.
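
A training step of this kind might look like the sketch below, which uses scikit-learn's `KNeighborsClassifier` with `n_neighbors=1`. The function name `train_model` and the tiny two-feature dataset are assumptions made for illustration; the real program works on 17-element feature lists.

```python
from sklearn.neighbors import KNeighborsClassifier


def train_model(evidence, labels):
    """Fit a k-nearest neighbors classifier with k=1 and return it."""
    model = KNeighborsClassifier(n_neighbors=1)
    model.fit(evidence, labels)
    return model


# Made-up data with two features per sample, for illustration only.
evidence = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]
labels = [0, 0, 1, 1]

model = train_model(evidence, labels)
predictions = model.predict([[0.05, 0.1], [5.1, 5.0]])
```

With k=1, each prediction simply takes the label of the single closest training sample, so results should be judged on data held out from training.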
Calculates classifier performance metrics. Returns a tuple (sensitivity, specificity):
- Sensitivity: True positive rate (correctly identified purchases)
- Specificity: True negative rate (correctly identified non-purchases)
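
The two rates can be computed by counting matches within each class, as in this minimal sketch. The function name `evaluate` and the sample lists are hypothetical, and the code assumes the labels contain at least one purchase and one non-purchase (otherwise a division by zero occurs).

```python
def evaluate(labels, predictions):
    """Return (sensitivity, specificity) for binary labels (1 = purchase)."""
    pairs = list(zip(labels, predictions))
    true_positives = sum(1 for actual, predicted in pairs
                         if actual == 1 and predicted == 1)
    true_negatives = sum(1 for actual, predicted in pairs
                         if actual == 0 and predicted == 0)
    positives = sum(1 for actual in labels if actual == 1)
    negatives = len(labels) - positives

    # sensitivity = TP / all actual positives
    # specificity = TN / all actual negatives
    sensitivity = true_positives / positives
    specificity = true_negatives / negatives
    return sensitivity, specificity


# Made-up example: 3 actual purchases, 5 actual non-purchases.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
predictions = [1, 0, 1, 0, 0, 1, 0, 0]
sensitivity, specificity = evaluate(labels, predictions)
```

Here two of the three purchases and four of the five non-purchases are predicted correctly, giving a sensitivity of 2/3 and a specificity of 4/5.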
The program outputs:
- Correct: Number of correctly classified test samples
- Incorrect: Number of incorrectly classified test samples
- True Positive Rate: Percentage of actual purchases correctly predicted
- True Negative Rate: Percentage of actual non-purchases correctly predicted
Example output:

    Correct: 4088
    Incorrect: 844
    True Positive Rate: 41.02%
    True Negative Rate: 90.55%
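
A report in this format could be produced along the lines of the sketch below. The helper `summarize` and its inputs are hypothetical; the only detail taken from the example output is that the rates are shown as percentages with two decimal places.

```python
def summarize(labels, predictions, sensitivity, specificity):
    """Build the four-line report from labels, predictions, and the two rates."""
    correct = sum(1 for actual, predicted in zip(labels, predictions)
                  if actual == predicted)
    incorrect = len(labels) - correct
    return "\n".join([
        f"Correct: {correct}",
        f"Incorrect: {incorrect}",
        f"True Positive Rate: {100 * sensitivity:.2f}%",
        f"True Negative Rate: {100 * specificity:.2f}%",
    ])


# Made-up test labels and predictions, for illustration only.
labels = [1, 1, 0, 0, 0]
predictions = [1, 0, 0, 0, 1]
report = summarize(labels, predictions, sensitivity=0.5, specificity=2 / 3)
print(report)
```

The rates are passed in as fractions between 0 and 1 and scaled to percentages only when formatted.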