
emirhanutku/data-mining-lab-projects


📊 Data Mining Lab Projects (AIN429)

📚 Repository Overview

This repository includes three comprehensive data mining projects, each focusing on a different analytical technique:

  1. Comprehensive Data Preprocessing: Data integration, cleaning, scaling, and exploratory data analysis (EDA).
  2. Frequent Pattern Mining: Implementation of Apriori and FP-Growth algorithms for discovering frequent itemsets.
  3. Classification and Clustering: Supervised classification and unsupervised clustering applied to a real-world dataset.

📝 Project 1: Comprehensive Data Preprocessing

  • Objective: Prepare and preprocess datasets for meaningful analysis and modeling.
  • Tasks:
    • Dataset integration and cleaning.
    • Feature scaling and normalization.
    • Exploratory Data Analysis (EDA).
    • Principal Component Analysis (PCA).
  • Dataset:
    • Data_Main.csv
    • Data_Additional.csv
  • Report: Assignment 2 PDF
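The integration, cleaning and scaling tasks above can be sketched as follows. This is a minimal illustration in Python with pandas, not the project's actual code. It makes three assumptions:

- Data_Main.csv and Data_Additional.csv share the same column names, so they are stacked row-wise.
- Missing numeric values are filled with the column median.
- The helper names `integrate_and_clean`, `min_max_scale` and `z_score` are hypothetical.

```python
import pandas as pd


def integrate_and_clean(main: pd.DataFrame, additional: pd.DataFrame) -> pd.DataFrame:
    """Stack two tables with identical columns, drop exact duplicate rows,
    and fill missing numeric values with the column median (assumed strategy)."""
    combined = pd.concat([main, additional], ignore_index=True)
    combined = combined.drop_duplicates().reset_index(drop=True)
    numeric = combined.select_dtypes(include="number").columns
    combined[numeric] = combined[numeric].fillna(combined[numeric].median())
    return combined


def min_max_scale(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Rescale each listed column to [0, 1]; constant columns become 0."""
    out = df.copy()
    for col in columns:
        lo, hi = out[col].min(), out[col].max()
        out[col] = (out[col] - lo) / (hi - lo) if hi > lo else 0.0
    return out


def z_score(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Standardize listed columns to mean 0 and (population) std 1."""
    out = df.copy()
    std = out[columns].std(ddof=0).replace(0, 1.0)  # avoid division by zero
    out[columns] = (out[columns] - out[columns].mean()) / std
    return out
```

With the real files, the inputs would come from `pd.read_csv("Data_Main.csv")` and `pd.read_csv("Data_Additional.csv")`.

Median imputation is only one option. Dropping rows or using per-class statistics are equally valid choices, depending on how much data is missing.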

Key Learnings:

  • Handling missing values.
  • Dataset integration techniques.
  • Insights from feature distributions and PCA.
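PCA itself can be illustrated with a short sketch that computes it from the singular value decomposition of the mean-centered data, using NumPy. The function name `pca` is hypothetical, and the project may have used a library implementation instead.

The bean attributes have very different magnitudes (Area versus Aspect Ratio, for example). For that reason, standardizing the features before PCA is usually advisable.

```python
import numpy as np


def pca(X, n_components):
    """Project rows of X onto the top principal components.

    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    scores = X_centered @ components.T
    # Variance along each component is S^2 / (n - 1).
    explained_variance = S**2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    return scores, ratio[:n_components]
```

The explained-variance ratio shows how many components are needed to retain most of the information in the data.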

📝 Project 2: Frequent Pattern Mining

  • Objective: Identify frequent itemsets and patterns using association rule mining techniques.
  • Tasks:
    • Dataset transformation for pattern mining.
    • Implementation of Apriori Algorithm.
    • Implementation of FP-Growth Algorithm.
    • Performance comparison between both algorithms.
  • Dataset: transaction_data.csv
  • Report: Assignment 3 PDF
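The Apriori algorithm listed above works level by level, as in the pure-Python sketch below:

1. Find the frequent single items.
2. Join frequent (k-1)-itemsets into k-item candidates.
3. Prune any candidate that has an infrequent (k-1)-subset.
4. Count the support of the remaining candidates.

This is an illustrative implementation, not the one in the repository. The `apriori` name and the fractional `min_support` convention are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose
    support (fraction of transactions containing it) is >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    level = {}
    for item in {i for b in baskets for i in b}:
        s = support(frozenset([item]))
        if s >= min_support:
            level[frozenset([item])] = s
    frequent = dict(level)

    k = 2
    while level:
        prev = set(level)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        # Count step: keep candidates that meet the support threshold.
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        k += 1
    return frequent
```

Each level rescans every transaction to count support. That repeated scanning is the main cost that FP-Growth avoids by compressing the data into a prefix tree.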

Key Learnings:

  • Understanding support, confidence, and lift metrics.
  • Differences in runtime and efficiency between Apriori and FP-Growth.
  • Visualization of frequent patterns.
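The three rule metrics mentioned above follow their standard definitions for a rule A → C:

- support = fraction of transactions containing both A and C
- confidence = support(A ∪ C) / support(A)
- lift = confidence / support(C)

A minimal sketch, with a hypothetical `rule_metrics` helper:

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for b in baskets if a <= b)          # transactions containing A
    n_c = sum(1 for b in baskets if c <= b)          # transactions containing C
    n_ac = sum(1 for b in baskets if (a | c) <= b)   # transactions containing A and C
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift
```

How to read lift:

- A lift above 1 means A and C co-occur more often than they would if they were independent.
- A lift below 1 means they co-occur less often.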

📝 Project 3: Classification and Clustering

  • Objective: Apply supervised and unsupervised learning techniques to classify and cluster data.
  • Tasks:
    • Preprocessing and feature selection.
    • Implementation of three classification algorithms (e.g., Logistic Regression, Decision Tree, kNN).
    • Implementation of three clustering algorithms (e.g., K-Means, DBSCAN, Hierarchical Clustering).
    • Evaluation and comparison of results.
  • Dataset: data_4.csv
  • Report: Assignment 4 PDF
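K-Means, one of the example clustering algorithms listed above, alternates between two steps:

- an assignment step, which sends each point to its nearest centroid;
- an update step, which moves each centroid to the mean of its points.

The NumPy sketch below implements this loop (Lloyd's algorithm) with random initialization. The `kmeans` name, its parameters and the empty-cluster handling are assumptions, not taken from the project.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Cluster rows of X into k groups; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct rows chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: mean of assigned points (an empty cluster keeps its centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

K-Means needs k chosen in advance and favours roughly spherical clusters. DBSCAN and hierarchical clustering relax those assumptions, which is one reason for comparing the three.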

Key Learnings:

  • Model evaluation using precision, recall, and F1-score.
  • Feature importance and its impact on classification.
  • Comparison of clustering algorithm performance.
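Precision, recall and F1-score are computed from true positives (TP), false positives (FP) and false negatives (FN) for a chosen positive class:

- precision = TP / (TP + FP)
- recall = TP / (TP + FN)
- F1 = 2 · precision · recall / (precision + recall)

The sketch below shows the computation in plain Python. The macro average over classes is an assumed choice for multi-class data, and both helper names are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Per-class precision, recall and F1, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels that appear."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(precision_recall_f1(y_true, y_pred, c)[2] for c in labels) / len(labels)
```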

📊 Dataset Information

Project 1 Dataset

  • Main and additional datasets describing the shape and size properties of bean seeds.
  • Attributes include Area, Perimeter, Major Axis Length, Aspect Ratio, and more.

Source:

  • Murat Koklu, Selcuk University
  • Ilker Ali Ozkan, Selcuk University

Project 2 Dataset

  • Transaction dataset in which each row records a single item belonging to a transaction.
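Item-level data like this has to be grouped into one basket per transaction before frequent itemsets can be mined. The sketch below does the grouping and an optional one-hot encoding.

The real column names in transaction_data.csv are not given here. The `transaction_key`/`item_key` parameters, the sample keys and both helper names are therefore hypothetical.

```python
from collections import defaultdict


def to_baskets(rows, transaction_key, item_key):
    """Group item-level rows (one item per row) into {transaction_id: set_of_items}."""
    baskets = defaultdict(set)
    for row in rows:
        baskets[row[transaction_key]].add(row[item_key])
    return dict(baskets)


def one_hot(baskets):
    """Encode baskets as 0/1 rows over a sorted item vocabulary."""
    items = sorted({i for basket in baskets.values() for i in basket})
    matrix = {tid: [int(i in basket) for i in items] for tid, basket in baskets.items()}
    return items, matrix
```

With the real file, `rows` could be a `csv.DictReader` over transaction_data.csv, passed together with that file's actual column names. Using a set per basket discards repeated purchases of the same item, which is what itemset mining expects.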

Project 3 Dataset

  • Real-world dataset for classification and clustering tasks.

⚙️ How to Run the Projects

  1. Clone the Repository:
    git clone <repo-link>
    cd data-mining-lab-projects

About

This repository contains three distinct data mining projects completed as part of the AIN429 Data Mining Laboratory course. Together they cover data preprocessing, frequent pattern mining, and classification and clustering, with practical implementations and insights drawn from real-world datasets.
