📊 Data Mining Lab Projects (AIN429)

📚 Repository Overview

This repository contains three data mining lab projects, each covering a different set of analytical techniques:

- Comprehensive Data Preprocessing: Data integration, cleaning, scaling, and exploratory data analysis (EDA).
- Frequent Pattern Mining: Implementation of the Apriori and FP-Growth algorithms for discovering frequent itemsets.
- Classification and Clustering: Supervised classification and unsupervised clustering applied to a real-world dataset.

📝 Project 1: Comprehensive Data Preprocessing

Objective: Prepare and preprocess datasets for meaningful analysis and modeling.
Tasks:
- Dataset integration and cleaning.
- Feature scaling and normalization.
- Exploratory Data Analysis (EDA).
- Principal Component Analysis (PCA).
Dataset:
- Data_Main.csv
- Data_Additional.csv
Report: Assignment 2 PDF

Key Learnings:

- Handling missing values.
- Dataset integration techniques.
- Insights from feature distributions and PCA.
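
The missing-value handling and feature scaling steps can be illustrated with a small standard-library sketch. It is not the code from the project notebooks: the helper names `impute_mean` and `standardize` and the toy `area` column are assumptions made for illustration, and mean imputation is only one of several possible strategies.

```python
from statistics import mean, pstdev


def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def standardize(column):
    """Z-score scaling: subtract the mean, divide by the population std."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


# Hypothetical toy column with one missing value (not taken from Data_Main.csv).
area = [10.0, None, 14.0, 12.0]
filled = impute_mean(area)
scaled = standardize(filled)
```

After scaling, the column has zero mean and unit variance, which keeps features with large numeric ranges from dominating a later PCA step.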

📝 Project 2: Frequent Pattern Mining

Objective: Identify frequent itemsets and patterns using association rule mining techniques.
Tasks:
- Dataset transformation for pattern mining.
- Implementation of Apriori Algorithm.
- Implementation of FP-Growth Algorithm.
- Performance comparison between both algorithms.
Dataset: transaction_data.csv
Report: Assignment 3 PDF
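
As a rough illustration of the Apriori task listed above, the following sketch mines frequent itemsets level by level using only the standard library. The function name `apriori`, the toy `baskets`, and the 0.5 support threshold are assumptions for this example; the project's own implementation and data may differ.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {c for c in current if support(c) >= min_support}

    frequent = {}
    k = 1
    while current:
        for c in current:
            frequent[c] = support(c)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical toy transactions (not taken from transaction_data.csv).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent_itemsets = apriori(baskets, min_support=0.5)
```

The prune step is what separates Apriori from brute-force enumeration: a candidate is discarded without a data scan if any of its subsets is already known to be infrequent.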

Key Learnings:

- Understanding support, confidence, and lift metrics.
- Differences in runtime and efficiency between Apriori and FP-Growth.
- Visualization of frequent patterns.
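
For reference, the three rule metrics named above can be computed directly from transaction counts. This is a minimal sketch using the standard definitions; the function name `rule_metrics` and the toy `baskets` are assumptions and do not come from the project data.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    count_a = sum(antecedent <= set(t) for t in transactions)
    count_c = sum(consequent <= set(t) for t in transactions)
    count_both = sum((antecedent | consequent) <= set(t) for t in transactions)
    if count_a == 0 or count_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")

    support = count_both / n              # P(A and C)
    confidence = count_both / count_a     # P(C | A)
    lift = confidence / (count_c / n)     # P(C | A) / P(C)
    return support, confidence, lift


# Hypothetical toy transactions (not taken from transaction_data.csv).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence, lift = rule_metrics(baskets, {"bread"}, {"butter"})
```

A lift above 1 indicates that the antecedent and consequent co-occur more often than they would if they were independent.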

📝 Project 3: Classification and Clustering

Objective: Apply supervised and unsupervised learning techniques to classify and cluster data.
Tasks:
- Preprocessing and feature selection.
- Implementation of three classification algorithms (e.g., Logistic Regression, Decision Tree, kNN).
- Implementation of three clustering algorithms (e.g., K-Means, DBSCAN, Hierarchical Clustering).
- Evaluation and comparison of results.
Dataset: data_4.csv
Report: Assignment 4 PDF
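
To make one of the listed classifiers concrete, here is a minimal k-nearest-neighbours sketch in plain Python. It is an illustration only: the function name `knn_predict`, the toy points, and the choice of Euclidean distance with majority voting are assumptions, and the project itself may rely on a library implementation.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    # Sort training points by Euclidean distance to the query and keep k.
    neighbors = sorted(
        zip(train_X, train_y), key=lambda pair: dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical two-cluster toy data (not taken from data_4.csv).
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_y = ["a", "a", "a", "b", "b", "b"]

near_first_cluster = knn_predict(train_X, train_y, (2, 2))
near_second_cluster = knn_predict(train_X, train_y, (8.5, 8.5))
```

Because k-NN depends on raw distances, features are usually scaled first so that no single attribute dominates the vote.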

Key Learnings:

- Model evaluation using Precision, Recall, F1-Score.
- Feature importance and its impact on classification.
- Comparison of clustering algorithm performance.
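
The evaluation metrics mentioned above follow directly from the counts of true positives, false positives, and false negatives. The sketch below shows the binary case; the function name `precision_recall_f1` and the toy labels are assumptions for illustration, not results from the project.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Hypothetical labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

F1 is the harmonic mean of precision and recall, so it only stays high when both are high.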

📊 Dataset Information

Project 1 Dataset

Main and additional datasets with bean seed properties.
Attributes include Area, Perimeter, Major Axis Length, Aspect Ratio, and more.

Source:

Murat Koklu, Selcuk University
Ilker Ali Ozkan, Selcuk University

Project 2 Dataset

Transaction dataset with item-level transactions.

Project 3 Dataset

Real-world dataset for classification and clustering tasks.

⚙️ How to Run the Projects

Clone the Repository:

git clone <repo-link>
cd data-mining-lab-projects
