Shopping Behavior Prediction using K-Nearest Neighbors

A machine learning classifier that predicts whether an online shopper will complete a purchase, based on browsing behavior during the session. The model is a k-nearest neighbors classifier trained on user session data.

Requirements

Python 3.6+
scikit-learn

Install dependencies:

pip install scikit-learn

Usage

python shopping.py <data_file.csv>

Example:

python shopping.py shopping.csv

Data Format

The program expects a CSV file with 18 columns:

| Column | Name | Type | Description |
| --- | --- | --- | --- |
| 0 | Administrative | int | Number of administrative pages visited |
| 1 | Administrative_Duration | float | Time spent on administrative pages |
| 2 | Informational | int | Number of informational pages visited |
| 3 | Informational_Duration | float | Time spent on informational pages |
| 4 | ProductRelated | int | Number of product-related pages visited |
| 5 | ProductRelated_Duration | float | Time spent on product-related pages |
| 6 | BounceRates | float | Bounce rate of pages visited |
| 7 | ExitRates | float | Exit rate of pages visited |
| 8 | PageValues | float | Average page value of pages visited |
| 9 | SpecialDay | float | Closeness to a special day (e.g., holiday) |
| 10 | Month | string | Month of the session (e.g., "Jan", "Feb") |
| 11 | OperatingSystems | int | Operating system identifier |
| 12 | Browser | int | Browser identifier |
| 13 | Region | int | Geographic region identifier |
| 14 | TrafficType | int | Traffic source type |
| 15 | VisitorType | string | "Returning_Visitor" or other |
| 16 | Weekend | boolean | TRUE/FALSE - whether session was on weekend |
| 17 | Revenue | boolean | TRUE/FALSE - whether user made a purchase |

Data Conversions

The following conversions are applied during data loading:

Month: Converted to integers 0-11 (Jan=0, Feb=1, ..., Dec=11)
VisitorType: Converted to 1 (Returning_Visitor) or 0 (other)
Weekend: Converted to 1 (TRUE) or 0 (FALSE)
Revenue: Converted to 1 (TRUE) or 0 (FALSE)
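
The four conversions listed above could be sketched as follows. This is only an illustration, not necessarily how `shopping.py` implements them: the helper name `convert_categoricals` is hypothetical, and matching on the first three letters of the month is an assumption made so that a fully spelled month such as "June" would still map correctly.

```python
# Month abbreviations in calendar order; the list index is the encoded value.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def convert_categoricals(month, visitor_type, weekend, revenue):
    """Hypothetical helper: map the four non-numeric CSV fields to integers."""
    return (
        MONTHS.index(month[:3]),                          # Jan=0 ... Dec=11
        1 if visitor_type == "Returning_Visitor" else 0,  # any other value -> 0
        1 if weekend == "TRUE" else 0,
        1 if revenue == "TRUE" else 0,
    )


print(convert_categoricals("Feb", "Returning_Visitor", "FALSE", "TRUE"))
# (1, 1, 0, 1)
```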

Implementation Details

`load_data(filename)`

Loads and preprocesses CSV data. Returns a tuple (evidence, labels) where:

evidence: List of 17-element lists containing numeric features
labels: List of integers (1 for purchase, 0 for no purchase)
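
A minimal sketch of a loader with this return shape is shown below. It follows the column types from the Data Format table, but it assumes that the CSV file starts with a header row and that months can be matched on their first three letters; neither is confirmed here. The helper `parse_row` is hypothetical.

```python
import csv

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
INT_COLUMNS = {0, 2, 4, 11, 12, 13, 14}
FLOAT_COLUMNS = {1, 3, 5, 6, 7, 8, 9}


def parse_row(row):
    """Hypothetical helper: turn one 18-field CSV row into (evidence, label)."""
    evidence = []
    for i, value in enumerate(row[:17]):
        if i in INT_COLUMNS:
            evidence.append(int(value))
        elif i in FLOAT_COLUMNS:
            evidence.append(float(value))
        elif i == 10:    # Month (assumption: match on the first three letters)
            evidence.append(MONTHS.index(value[:3]))
        elif i == 15:    # VisitorType
            evidence.append(1 if value == "Returning_Visitor" else 0)
        else:            # i == 16, Weekend
            evidence.append(1 if value == "TRUE" else 0)
    label = 1 if row[17] == "TRUE" else 0   # Revenue
    return evidence, label


def load_data(filename):
    evidence, labels = [], []
    with open(filename, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # assumption: the first line is a header row
        for row in reader:
            row_evidence, row_label = parse_row(row)
            evidence.append(row_evidence)
            labels.append(row_label)
    return evidence, labels
```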

`train_model(evidence, labels)`

Creates and trains a k-nearest neighbors classifier with k=1, so each prediction takes the label of the single closest training sample. Returns the fitted model.
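
With scikit-learn, a function of this kind might look like the sketch below. It uses `KNeighborsClassifier` with `n_neighbors=1`; the two-feature toy data is made up for illustration and is much smaller than the 17-feature evidence described above.

```python
from sklearn.neighbors import KNeighborsClassifier


def train_model(evidence, labels):
    # k=1: a prediction copies the label of the nearest training sample.
    model = KNeighborsClassifier(n_neighbors=1)
    model.fit(evidence, labels)
    return model


# Toy usage with made-up two-feature evidence.
model = train_model([[0, 0], [10, 10]], [0, 1])
print(model.predict([[1, 1]]))  # nearest training point is [0, 0], so label 0
```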

`evaluate(labels, predictions)`

Calculates classifier performance metrics. Returns a tuple (sensitivity, specificity):

Sensitivity: True positive rate, the proportion of actual purchases that were predicted as purchases
Specificity: True negative rate, the proportion of actual non-purchases that were predicted as non-purchases
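
The two rates can be computed directly from paired labels and predictions. The sketch below is one way to do it, assuming labels are encoded as 1 and 0 and that the test set contains at least one sample of each class (otherwise a division by zero would occur).

```python
def evaluate(labels, predictions):
    pairs = list(zip(labels, predictions))

    # Count actual positives/negatives and how many of each were predicted correctly.
    positives = sum(1 for actual, _ in pairs if actual == 1)
    negatives = sum(1 for actual, _ in pairs if actual == 0)
    true_positives = sum(1 for actual, predicted in pairs
                         if actual == 1 and predicted == 1)
    true_negatives = sum(1 for actual, predicted in pairs
                         if actual == 0 and predicted == 0)

    sensitivity = true_positives / positives
    specificity = true_negatives / negatives
    return sensitivity, specificity


# Two actual purchases (one caught) and three non-purchases (two caught).
print(evaluate([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```

In this toy call the sensitivity is 1/2 and the specificity is 2/3.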

Output

The program outputs:

Correct: Number of correctly classified test samples
Incorrect: Number of incorrectly classified test samples
True Positive Rate: Percentage of actual purchases correctly predicted
True Negative Rate: Percentage of actual non-purchases correctly predicted

Example output:

Correct: 4088
Incorrect: 844
True Positive Rate: 41.02%
True Negative Rate: 90.55%
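
A report in this format could be produced along the lines of the sketch below. The helper name `summarize` is hypothetical, and the sketch assumes that sensitivity and specificity arrive as fractions between 0 and 1 that are scaled to percentages only for display.

```python
def summarize(labels, predictions, sensitivity, specificity):
    """Hypothetical helper: build the four report lines as strings."""
    correct = sum(1 for actual, predicted in zip(labels, predictions)
                  if actual == predicted)
    incorrect = len(labels) - correct
    return [
        f"Correct: {correct}",
        f"Incorrect: {incorrect}",
        f"True Positive Rate: {100 * sensitivity:.2f}%",
        f"True Negative Rate: {100 * specificity:.2f}%",
    ]


for line in summarize([1, 1, 0, 0], [1, 0, 0, 0], 0.5, 1.0):
    print(line)
```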
