This repository contains three data mining projects, each focusing on a different set of analytical techniques:
- Comprehensive Data Preprocessing: Data integration, cleaning, scaling, and exploratory data analysis (EDA).
- Frequent Pattern Mining: Implementation of Apriori and FP-Growth algorithms for discovering frequent itemsets.
- Classification and Clustering: Supervised classification and unsupervised clustering applied to a real-world dataset.
- Objective: Prepare and preprocess datasets for meaningful analysis and modeling.
- Tasks:
- Dataset integration and cleaning.
- Feature scaling and normalization.
- Exploratory Data Analysis (EDA).
- Principal Component Analysis (PCA).
- Dataset:
Data_Main.csv, Data_Additional.csv
- Report: Assignment 2 PDF
Key Learnings:
- Handling missing values.
- Dataset integration techniques.
- Insights from feature distributions and PCA.
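The missing-value handling and feature scaling steps listed above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the code from the assignment; the helper names (`impute_mean`, `min_max_scale`, `z_score`) and the sample `area` values are assumptions made up for the example.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Standardize values to zero mean and unit (population) variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical column with one missing entry (not taken from Data_Main.csv).
area = [10.0, None, 30.0, 20.0]
filled = impute_mean(area)          # [10.0, 20.0, 30.0, 20.0]
scaled = min_max_scale(filled)      # [0.0, 0.5, 1.0, 0.5]
standardized = z_score(filled)
```

Min-max scaling keeps the original shape of the distribution but is sensitive to outliers, while z-score standardization is usually the safer choice before PCA, since PCA is driven by variance.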
- Objective: Identify frequent itemsets and patterns using association rule mining techniques.
- Tasks:
- Dataset transformation for pattern mining.
- Implementation of Apriori Algorithm.
- Implementation of FP-Growth Algorithm.
- Performance comparison between both algorithms.
- Dataset:
transaction_data.csv
- Report: Assignment 3 PDF
Key Learnings:
- Understanding support, confidence, and lift metrics.
- Differences in runtime and efficiency between Apriori and FP-Growth.
- Visualization of frequent patterns.
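To make the support, confidence, and lift metrics concrete, here is a small sketch of the Apriori idea in plain Python. It is an illustrative toy, not the implementation from the assignment; the function names and the grocery-style transactions are assumptions, and the code is not tuned for large inputs such as transaction_data.csv.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori(transactions, min_support):
    """Return {frozenset: support} for all frequent itemsets."""
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        # Keep only candidates that meet the support threshold.
        level = {c: support(c, transactions) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent

def confidence(antecedent, consequent, transactions):
    """support(A and C) / support(A) for the rule A -> C."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """confidence(A -> C) / support(C); values above 1 suggest positive association."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

# Hypothetical transactions, made up for this example.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(transactions, min_support=0.5)
conf_bread_milk = confidence({"bread"}, {"milk"}, transactions)
lift_bread_milk = lift({"bread"}, {"milk"}, transactions)
```

With these four transactions and a minimum support of 0.5, `{milk, butter}` is not frequent, so the three-item candidate is pruned without being counted. That pruning is the core of Apriori; FP-Growth avoids candidate generation altogether by compressing the transactions into a prefix tree, which is typically where its runtime advantage comes from.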
- Objective: Apply supervised and unsupervised learning techniques to classify and cluster data.
- Tasks:
- Preprocessing and feature selection.
- Implementation of three classification algorithms (e.g., Logistic Regression, Decision Tree, kNN).
- Implementation of three clustering algorithms (e.g., K-Means, DBSCAN, Hierarchical Clustering).
- Evaluation and comparison of results.
- Dataset:
data_4.csv
- Report: Assignment 4 PDF
Key Learnings:
- Model evaluation using Precision, Recall, F1-Score.
- Feature importance and its impact on classification.
- Comparison of clustering algorithm performance.
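The Precision, Recall, and F1-Score metrics mentioned above can be computed directly from predicted and true labels. The sketch below handles the binary case only; the function name `precision_recall_f1` and the sample labels are assumptions for illustration and are not taken from data_4.csv or the assignment code.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1

# Hypothetical labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Precision answers "of the items predicted positive, how many were right", recall answers "of the actual positives, how many were found", and F1 is their harmonic mean, which is why it drops sharply when either one is low.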
- Main and additional datasets with bean seed properties.
- Attributes include Area, Perimeter, Major Axis Length, Aspect Ratio, and more.
Source:
- Murat Koklu, Selcuk University
- Ilker Ali Ozkan, Selcuk University
- Transaction dataset with item-level transactions.
- Real-world dataset for classification and clustering tasks.
- Clone the Repository:
git clone <repo-link>
cd data-mining-lab-projects