ealbertoav/shopping

Shopping Behavior Prediction using K-Nearest Neighbors

A machine learning classifier that predicts whether online shopping users will complete a purchase based on their browsing behavior. The model uses a k-nearest neighbors algorithm trained on user session data.

Requirements

  • Python 3.6+
  • scikit-learn

Install dependencies:

pip install scikit-learn

Usage

python shopping.py <data_file.csv>

Example:

python shopping.py shopping.csv

Data Format

The program expects a CSV file with 18 columns:

| Column | Name                    | Type    | Description                                  |
|--------|-------------------------|---------|----------------------------------------------|
| 0      | Administrative          | int     | Number of administrative pages visited       |
| 1      | Administrative_Duration | float   | Time spent on administrative pages           |
| 2      | Informational           | int     | Number of informational pages visited        |
| 3      | Informational_Duration  | float   | Time spent on informational pages            |
| 4      | ProductRelated          | int     | Number of product-related pages visited      |
| 5      | ProductRelated_Duration | float   | Time spent on product-related pages          |
| 6      | BounceRates             | float   | Bounce rate of pages visited                 |
| 7      | ExitRates               | float   | Exit rate of pages visited                   |
| 8      | PageValues              | float   | Average page value of pages visited          |
| 9      | SpecialDay              | float   | Closeness to a special day (e.g., holiday)   |
| 10     | Month                   | string  | Month of the session (e.g., "Jan", "Feb")    |
| 11     | OperatingSystems        | int     | Operating system identifier                  |
| 12     | Browser                 | int     | Browser identifier                           |
| 13     | Region                  | int     | Geographic region identifier                 |
| 14     | TrafficType             | int     | Traffic source type                          |
| 15     | VisitorType             | string  | "Returning_Visitor" or other                 |
| 16     | Weekend                 | boolean | TRUE/FALSE: whether session was on a weekend |
| 17     | Revenue                 | boolean | TRUE/FALSE: whether user made a purchase     |

Data Conversions

The following conversions are applied during data loading:

  • Month: Converted to integers 0-11 (Jan=0, Feb=1, ..., Dec=11)
  • VisitorType: Converted to 1 (Returning_Visitor) or 0 (other)
  • Weekend: Converted to 1 (TRUE) or 0 (FALSE)
  • Revenue: Converted to 1 (TRUE) or 0 (FALSE)

Implementation Details

load_data(filename)

Loads and preprocesses CSV data. Returns a tuple (evidence, labels) where:

  • evidence: List of 17-element lists containing numeric features
  • labels: List of integers (1 for purchase, 0 for no purchase)
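
The loading step and the conversions listed under Data Conversions can be sketched as follows. This is a minimal illustration, not the repository's code: the name `load_data_from` is hypothetical, it takes an open file object rather than a filename so the example runs without a file on disk, it assumes the CSV starts with a header row, and the sample row is made up. Matching months on their first three letters is also an assumption, made so that a spelling such as "June" still maps to 5.

```python
import csv
import io

# Month abbreviations in calendar order; the list index is the encoded value.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Column positions by type, following the Data Format table.
INT_COLUMNS = {0, 2, 4, 11, 12, 13, 14}
FLOAT_COLUMNS = {1, 3, 5, 6, 7, 8, 9}


def load_data_from(csv_file):
    """Read an open CSV file and return (evidence, labels)."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row (assumption: the file has one)
    evidence, labels = [], []
    for row in reader:
        features = []
        for i, value in enumerate(row[:17]):
            if i in INT_COLUMNS:
                features.append(int(value))
            elif i in FLOAT_COLUMNS:
                features.append(float(value))
            elif i == 10:
                # Month: compare the first three letters only (assumption)
                features.append(MONTHS.index(value[:3]))
            elif i == 15:
                # VisitorType: 1 for returning visitors, 0 for anything else
                features.append(1 if value == "Returning_Visitor" else 0)
            else:
                # Column 16, Weekend: 1 for TRUE, 0 for FALSE
                features.append(1 if value == "TRUE" else 0)
        evidence.append(features)
        # Column 17, Revenue, becomes the label
        labels.append(1 if row[17] == "TRUE" else 0)
    return evidence, labels


# Made-up sample: a header row plus one session.
sample = io.StringIO(
    "Administrative,Administrative_Duration,Informational,"
    "Informational_Duration,ProductRelated,ProductRelated_Duration,"
    "BounceRates,ExitRates,PageValues,SpecialDay,Month,OperatingSystems,"
    "Browser,Region,TrafficType,VisitorType,Weekend,Revenue\n"
    "0,0.0,0,0.0,1,0.0,0.2,0.2,0.0,0.0,Feb,1,1,1,1,Returning_Visitor,FALSE,FALSE\n"
)
evidence, labels = load_data_from(sample)
print(len(evidence[0]), labels)
```

With a real file, the same function could be called inside `with open(filename) as f:`.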

train_model(evidence, labels)

Creates and trains a k-nearest neighbors classifier with k=1. Returns the fitted model.
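
To illustrate what k=1 means, here is a small pure-Python sketch of a 1-nearest-neighbor prediction: the predicted label is simply the label of the single closest training point. It is not the project's implementation, which uses scikit-learn (where the corresponding classifier is `KNeighborsClassifier(n_neighbors=1)`). The helper names and the toy data are hypothetical, and Euclidean distance is assumed.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_1nn(train_evidence, train_labels, point):
    """Return the label of the training point closest to `point`."""
    nearest = min(
        range(len(train_evidence)),
        key=lambda i: squared_distance(train_evidence[i], point),
    )
    return train_labels[nearest]


# Toy data: two training points with two features each.
train_evidence = [[0.0, 0.0], [10.0, 10.0]]
train_labels = [0, 1]

print(predict_1nn(train_evidence, train_labels, [1.0, 2.0]))  # closest to [0, 0]
print(predict_1nn(train_evidence, train_labels, [9.0, 8.0]))  # closest to [10, 10]
```

Because only one neighbor votes, a single unusual training point can decide a prediction. Note also that unscaled features with large ranges (such as durations) dominate the distance.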

evaluate(labels, predictions)

Calculates classifier performance metrics. Returns a tuple (sensitivity, specificity):

  • Sensitivity: True positive rate, the proportion of actual purchases that were predicted as purchases
  • Specificity: True negative rate, the proportion of actual non-purchases that were predicted as non-purchases
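
The two metrics can be computed by counting matches within each class. The sketch below is one possible way to write `evaluate`, not necessarily the repository's version. It assumes labels and predictions are lists of 0/1 integers, that the test set contains at least one example of each class, and that the rates are returned as fractions between 0 and 1.

```python
def evaluate(labels, predictions):
    """Return (sensitivity, specificity) as fractions between 0 and 1."""
    pairs = list(zip(labels, predictions))
    positives = sum(1 for actual, _ in pairs if actual == 1)
    negatives = sum(1 for actual, _ in pairs if actual == 0)
    true_positives = sum(1 for actual, pred in pairs if actual == 1 and pred == 1)
    true_negatives = sum(1 for actual, pred in pairs if actual == 0 and pred == 0)
    # Assumes both classes are present; otherwise this divides by zero.
    return true_positives / positives, true_negatives / negatives


# Made-up example: 2 actual purchases, 4 actual non-purchases.
labels = [1, 1, 0, 0, 0, 0]
predictions = [1, 0, 0, 0, 0, 1]
sensitivity, specificity = evaluate(labels, predictions)
print(sensitivity, specificity)  # 0.5 0.75
```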

Output

The program outputs:

  • Correct: Number of correctly classified test samples
  • Incorrect: Number of incorrectly classified test samples
  • True Positive Rate: Percentage of actual purchases correctly predicted
  • True Negative Rate: Percentage of actual non-purchases correctly predicted

Example output:

Correct: 4088
Incorrect: 844
True Positive Rate: 41.02%
True Negative Rate: 90.55%
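
The four output lines can be produced from the test labels and predictions as in the sketch below. The function name `format_report` is hypothetical, and it assumes sensitivity and specificity arrive as fractions between 0 and 1 that are scaled to percentages for display.

```python
def format_report(labels, predictions, sensitivity, specificity):
    """Build the four-line summary shown in the example output."""
    correct = sum(1 for actual, pred in zip(labels, predictions) if actual == pred)
    incorrect = len(labels) - correct
    return "\n".join([
        f"Correct: {correct}",
        f"Incorrect: {incorrect}",
        f"True Positive Rate: {100 * sensitivity:.2f}%",
        f"True Negative Rate: {100 * specificity:.2f}%",
    ])


# Made-up example: 4 of 6 predictions match the labels.
labels = [1, 1, 0, 0, 0, 0]
predictions = [1, 0, 0, 0, 0, 1]
print(format_report(labels, predictions, 0.5, 0.75))
```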
