KDD - Knowledge Discovery in Databases

Downloads

Lecture

Prologue 📥 Download Slides

This introductory presentation outlines the structure, learning objectives, and expectations of the course on knowledge discovery in databases. By setting out this framework, it prepares students to engage with both the theoretical principles and the practical applications of data mining.
Introduction 📥 Download Slides

This lecture explores the fundamental principles, overarching methodologies, and significant challenges of data mining in an era of exponential data growth. It emphasizes the practical and academic relevance of extracting actionable knowledge from large datasets to support informed decision-making across many disciplines.
Data 📥 Download Slides

This session provides a detailed examination of data characteristics, covering essential attribute types, statistical descriptors, and visualization techniques necessary for comprehensive data profiling. Understanding these core concepts is critical for effectively measuring data similarity and establishing a robust foundation for subsequent analytical processing.
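As a minimal illustration of measuring similarity between data objects, the sketch below computes the Euclidean distance and the cosine similarity of two numeric attribute vectors. It assumes purely numeric attributes; the function names are hypothetical and not taken from the course material.

```python
import math

def euclidean(x, y):
    """Euclidean (L2) distance between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_similarity(x, y):
    """Cosine of the angle between x and y; 1.0 means identical direction."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_x * norm_y)

print(euclidean([0, 0], [3, 4]))          # 5.0
print(cosine_similarity([1, 0], [2, 0]))  # 1.0
```

Distance measures like the Euclidean distance depend on the magnitude of each attribute, whereas cosine similarity only compares directions, which is why the two can rank the same pair of objects differently.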
Preprocessing 📥 Download Slides

This presentation details the critical phases of data preprocessing, including data cleaning, integration, reduction, and transformation, which are essential for mitigating the inherent imperfections of real-world datasets. Ultimately, the methodologies discussed underscore the necessity of high-quality data preparation to ensure the accuracy and reliability of downstream machine learning and data mining models.
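To make the transformation step concrete, the following sketch shows two common normalization schemes: min-max normalization, which rescales values linearly into a target range, and z-score normalization, which centers values on the mean and divides by the standard deviation. The helper names and the income values are illustrative assumptions, not part of the lecture slides.

```python
import statistics

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into the interval [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("min-max normalization needs at least two distinct values")
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

def z_score_normalize(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("z-score normalization needs non-constant data")
    return [(v - mean) / std for v in values]

incomes = [12000, 73600, 98000]  # illustrative values only
print(min_max_normalize(incomes))
print(z_score_normalize(incomes))
```

Min-max normalization preserves the relative spacing of the values but is sensitive to extreme minima and maxima; z-score normalization is often preferred when the range is unknown or outliers are present.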
OLAP 📥 Download Slides

This presentation examines the foundational architectures of data warehousing, emphasizing multidimensional data modeling and Online Analytical Processing (OLAP). By exploring concepts such as data cubes and schema designs, it highlights the critical role of consolidated, historical data in supporting enterprise decision-making.
Mining Frequent Patterns 📥 Download Slides

This lecture explores the principles of frequent pattern mining and association rule generation, which are essential for discovering inherent regularities within large datasets. It details scalable algorithmic approaches and evaluation metrics that enable the identification of meaningful data relationships to support predictive analytics.
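The two standard evaluation metrics for association rules can be sketched as follows: the support of an itemset is the fraction of transactions containing all of its items, and the confidence of a rule A ⇒ B is support(A ∪ B) / support(A). The transactions and helper names below are illustrative assumptions; this is not an implementation of Apriori or FP-growth.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Relative support: fraction of transactions containing itemset."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent => consequent.

    Equal to support(A | B) / support(A); computed from raw counts here.
    """
    a = frozenset(antecedent)
    count_a = support_count(a, transactions)
    if count_a == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return support_count(a | frozenset(consequent), transactions) / count_a

transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer", "eggs"}),
    frozenset({"milk", "diapers", "beer", "cola"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "cola"}),
]
print(support({"diapers", "beer"}, transactions))       # 0.6
print(confidence({"diapers"}, {"beer"}, transactions))  # 0.75
```

A rule is typically reported only if both values exceed user-defined minimum support and minimum confidence thresholds.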
Classification 📥 Download Slides

This lecture provides a comprehensive overview of supervised learning, focusing on the theoretical underpinnings and practical applications of classification algorithms. By detailing techniques such as decision tree induction and Bayesian methods, it demonstrates how predictive models are constructed and evaluated to categorize previously unseen data.
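Decision tree induction needs a criterion for choosing the attribute to split on. The sketch below assumes the ID3-style criterion of information gain, i.e. the reduction in Shannon entropy achieved by partitioning the data on a categorical attribute; other selection measures exist, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    # p * log2(1/p) avoids returning -0.0 for a pure partition
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on a categorical attribute."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the partitions after the split
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

rows = [
    {"windy": "no", "humid": "high"},
    {"windy": "no", "humid": "low"},
    {"windy": "yes", "humid": "high"},
    {"windy": "yes", "humid": "low"},
]
labels = ["play", "play", "stay", "stay"]
print(information_gain(rows, labels, "windy"))  # 1.0 (perfect split)
print(information_gain(rows, labels, "humid"))  # 0.0 (uninformative)
```

A tree-building procedure would evaluate every candidate attribute this way, split on the one with the highest gain, and recurse on each partition.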
Cluster Analysis 📥 Download Slides

This slide deck investigates the core paradigms of cluster analysis, a fundamental unsupervised learning technique used to segment data into meaningful, cohesive groups. It systematically reviews various algorithmic approaches, including partitioning and hierarchical methods, illustrating their significance in uncovering hidden structural patterns across diverse analytical domains.
Outlier Analysis 📥 Download Slides

This presentation addresses the principles of outlier analysis, detailing the methodologies used to detect significant anomalies and deviations in complex datasets. It outlines statistical, proximity-based, and clustering-based approaches, underscoring their importance for maintaining data integrity and detecting anomalies in real-world systems.
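As one example of a simple statistical approach, the sketch below flags values outside Tukey's fences, [Q1 - k·IQR, Q3 + k·IQR], where IQR = Q3 - Q1 is the interquartile range and k = 1.5 is a common default. This rule is chosen here for illustration and is not necessarily the method presented in the slides; the function name and sensor readings are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles() sorts internally and returns the three quartile cut points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

readings = [10, 12, 11, 13, 12, 11, 95]  # illustrative data
print(iqr_outliers(readings))  # [95]
```

Because quartiles are barely affected by a few extreme values, this rule is more robust than a mean-and-standard-deviation threshold on small samples, where a single outlier inflates the standard deviation it is measured against.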

Exercise

Introduction to Python & Pandas
Student: 📄 Exercise Sheet | 📦 Additional Files
Solution: ✅ Solution | 📦 Additional Files

This exercise sheet introduces Python and the pandas library for data manipulation. Working confidently with these tools is a prerequisite for carrying out data science workflows and implementing the machine learning algorithms covered later in the course.
Data Analysis and Preprocessing
Student: 📄 Exercise Sheet | 📦 Additional Files
Solution: ✅ Solution | 📦 Additional Files

This exercise set focuses on the initial stages of the data mining pipeline: data cleaning, integration, and transformation. These preparatory techniques are essential for ensuring data quality, which in turn is required to extract valid and robust analytical insights.
Frequent Patterns
Student: 📄 Exercise Sheet | 📦 Additional Files
Solution: ✅ Solution | 📦 Additional Files

This assignment explores the algorithmic identification of frequent itemsets and association rules in large transactional databases. By applying established pattern mining methods in practice, it shows how structural regularities in the data can be discovered automatically and used for data analysis.
Classification
Student: 📄 Exercise Sheet | 📦 Additional Files
Solution: ✅ Solution | 📦 Additional Files

This exercise covers supervised learning and requires students to construct foundational classification models and evaluate them empirically. It emphasizes feature selection and the analysis of performance metrics to rigorously validate the predictive performance of these algorithms.
Clustering
Student: 📄 Exercise Sheet | 📦 Additional Files
Solution: ✅ Solution | 📦 Additional Files

This exercise sheet centers on unsupervised learning, specifically partitioning and density-based clustering algorithms, applied to discover latent groupings in unlabeled datasets. By analyzing proximity and density measures, it demonstrates how spatial data can be segmented automatically into cohesive, analytically meaningful groups.

Submission

Frequent Patterns 📝 Task Description

This programming assignment focuses on extracting frequent itemsets from transactional databases by implementing the foundational Apriori and FP-growth algorithms. Programming these core techniques independently gives practical insight into scalable pattern mining and into the data structures required for efficient association rule mining.
Classification 📝 Task Description

This submission requires implementing fundamental predictive modeling techniques from scratch, specifically decision tree induction and Naïve Bayes classification. Developing these supervised learning methods emphasizes the mathematical criteria they rely on, such as entropy and other impurity measures for choosing splits and probabilistic likelihoods for scoring classes, which are essential for building robust automated classification systems.
Clustering 📝 Task Description

This submission centers on unsupervised learning and requires implementing K-means and DBSCAN to partition multidimensional spatial data. By building both a distance-based and a density-based model, the assignment reinforces the computational strategies that analytical systems use to uncover hidden structure and cohesive segments in unlabeled datasets.

Summer Semester 2026

Semester duration: 13 April 2026 – 17 July 2026
Public holidays:
- Friday, 1 May 2026
- Thursday, 14 May 2026
- Monday, 25 May 2026
- Thursday, 4 June 2026
FAU-specific holidays (no lectures or exercises):
- Tuesday, 26 May 2026
- Friday, 5 June 2026

Setup for Building Lecture Slides Locally

To build the lecture slides locally on your machine, you need an up-to-date LaTeX distribution such as TeX Live or MiKTeX.

Setup for Committing

We use the pre-commit framework to manage our Git pre-commit hooks. This simplifies maintaining the hooks, especially across heterogeneous systems, but requires each user to complete a one-time setup. First, install the framework itself, as explained under "Installation" on the framework's website. Second, install the hooks by running the command pre-commit install in the root directory of this project.

We assume that every commit has been validated with these hooks and will not accept pull requests that contain unvalidated commits. The hooks are also re-run on the server side by a GitHub Action.