- Prologue 📥 Download Slides
This introductory presentation outlines the foundational structure, learning objectives, and expectations for our comprehensive course on knowledge discovery in databases. By establishing the pedagogical framework, it prepares students to engage with both the theoretical principles and practical applications of data mining.
- Introduction 📥 Download Slides
This lecture explores the fundamental principles, overarching methodologies, and significant challenges of data mining within an era of exponential data growth. It emphasizes the practical and academic relevance of extracting actionable knowledge from large datasets to support informed decision-making across various disciplines.
- Data 📥 Download Slides
This session provides a detailed examination of data characteristics, covering essential attribute types, statistical descriptors, and visualization techniques necessary for comprehensive data profiling. Understanding these core concepts is critical for effectively measuring data similarity and establishing a robust foundation for subsequent analytical processing.
- Preprocessing 📥 Download Slides
This presentation details the critical phases of data preprocessing, including data cleaning, integration, reduction, and transformation, which are essential for mitigating the inherent imperfections of real-world datasets. Ultimately, the methodologies discussed underscore the necessity of high-quality data preparation to ensure the accuracy and reliability of downstream machine learning and data mining models.
- OLAP 📥 Download Slides
This presentation examines the foundational architectures of data warehousing, emphasizing multidimensional data modeling and Online Analytical Processing. By exploring concepts such as data cubes and schema designs, it highlights the critical role of consolidated, historical data in facilitating advanced enterprise decision-making.
- Mining Frequent Patterns 📥 Download Slides
This lecture explores the principles of frequent pattern mining and association rule generation, which are essential for discovering inherent regularities within large datasets. It details scalable algorithmic approaches and evaluation metrics that enable the identification of meaningful data relationships to support predictive analytics.
- Classification 📥 Download Slides
This document provides a comprehensive overview of supervised learning, specifically focusing on the theoretical underpinnings and practical applications of classification algorithms. By detailing techniques such as decision tree induction and Bayesian methods, it demonstrates how predictive models are constructed and evaluated to categorize unseen information.
- Cluster Analysis 📥 Download Slides
This slide deck investigates the core paradigms of cluster analysis, a fundamental unsupervised learning technique used to segment data into meaningful, cohesive groups. It systematically reviews various algorithmic approaches, including partitioning and hierarchical methods, illustrating their significance in uncovering hidden structural patterns across diverse analytical domains.
- Outlier Analysis 📥 Download Slides
This presentation addresses the principles of outlier analysis, detailing the diverse methodologies used to detect significant anomalies and deviations within complex datasets. It outlines statistical, proximity-based, and clustering approaches, underscoring their vital importance in ensuring data integrity and facilitating anomaly recognition in real-world systems.
- Introduction to Python & Pandas
Student: 📄 Exercise Sheet | 📦 Additional Files
Solution: ✅ Solution | 📦 Additional Files
This exercise sheet introduces Python and the pandas data manipulation library. Familiarity with both is a prerequisite for carrying out data science workflows and implementing machine learning algorithms.
- Data Analysis and Preprocessing
Student: 📄 Exercise Sheet | 📦 Additional Files
Solution: ✅ Solution | 📦 Additional Files
This exercise set covers the initial stages of the data mining pipeline: data cleaning, integration, and transformation. These preparatory steps determine data quality and therefore the validity and robustness of any insights extracted later.
- Frequent Patterns
Student: 📄 Exercise Sheet | 📦 Additional Files
Solution: ✅ Solution | 📦 Additional Files
This assignment explores the algorithmic identification of recurring itemsets and association rules within large-scale transactional databases. By practically applying prominent pattern mining methodologies, it highlights how inherent structural regularities can be autonomously discovered and leveraged for strategic data analysis.
- Classification
Student: 📄 Exercise Sheet | 📦 Additional Files
Solution: ✅ Solution | 📦 Additional Files
This exercise delves into supervised learning paradigms, requiring the mathematical construction and empirical evaluation of foundational classification models. It emphasizes the critical processes of feature selection and performance metric analysis to rigorously validate the predictive efficacy of these algorithms.
- Clustering
Student: 📄 Exercise Sheet | 📦 Additional Files
Solution: ✅ Solution | 📦 Additional Files
This exercise sheet centers on the application of unsupervised learning methodologies, specifically partitioning and density-based clustering algorithms, to discover latent groupings within unstructured datasets. By analyzing proximity and density metrics, it demonstrates how spatial data can be autonomously segmented into cohesive, analytically meaningful categories.
- Frequent Patterns 📝 Task Description
This programming assignment focuses on extracting frequent itemsets from transactional databases by implementing the Apriori and FP-growth algorithms. Programming these techniques independently gives practical insight into scalable pattern mining and the data structures needed for efficient association rule mining.
- Classification 📝 Task Description
This submission requires implementing two fundamental predictive modeling techniques from the ground up: decision tree induction and Naïve Bayes classification. Developing these supervised learning methods emphasizes the mathematical criteria (such as entropy, impurity, and probabilistic likelihoods) that are essential for constructing robust automated classification systems.
- Clustering 📝 Task Description
This submission centers on unsupervised learning paradigms, requiring the algorithmic implementation of K-means and DBSCAN to effectively partition multidimensional spatial data. By designing both distance-based and density-based models, the assignment reinforces the computational strategies utilized by modern analytical systems to autonomously uncover hidden structural relationships and cohesive segments within unlabeled datasets.
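The impurity criteria named in the Classification task above (entropy and impurity) can be illustrated with a minimal sketch. The function names `entropy`, `gini`, and `information_gain` are hypothetical and are not taken from the course material; the assignment itself may prescribe a different interface or additional criteria.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def gini(labels):
    """Gini impurity of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def information_gain(parent, partitions):
    """Entropy reduction achieved by splitting `parent` into `partitions`.

    `partitions` is a list of label sequences that together contain
    exactly the labels of `parent` (an assumption of this sketch).
    """
    n = len(parent)
    weighted = sum(len(p) / n * entropy(p) for p in partitions)
    return entropy(parent) - weighted
```

For a balanced two-class sample such as `["yes", "yes", "no", "no"]`, `entropy` returns 1.0 and `gini` returns 0.5; a split that separates the two classes perfectly has an information gain of 1.0.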
- Semester duration: 13 April 2026 – 17 July 2026
- Public holidays:
  - Friday, 1 May 2026
  - Thursday, 14 May 2026
  - Monday, 25 May 2026
  - Thursday, 4 June 2026
- FAU-specific holidays (no lectures and exercises):
  - Tuesday, 26 May 2026
  - Friday, 5 June 2026
To build the lecture slides locally, you need an up-to-date LaTeX distribution such as TeX Live or MiKTeX.
We use the pre-commit framework to manage our pre-commit hooks. This simplifies hook maintenance, especially across heterogeneous systems, but requires a one-time setup by each user. First, install the framework itself, as explained on the framework's website under "Installation". Second, install the hooks by running `pre-commit install` in the root directory of this project. We assume that every commit has been validated with these hooks and will not accept pull requests that contain unvalidated commits (the hooks are also checked again on the server side by a GitHub Action).