Hands-on projects and notes from the Data Engineering Zoomcamp by Alexey Grigorev and the DataTalks.Club team.
This repository follows the full course curriculum and contains exercises, experiments, and the final project, covering modern data engineering tools and best practices.
The course is a practical, end-to-end introduction to data engineering, including:
- Containerization and infrastructure as code
- Workflow orchestration
- Data ingestion pipelines
- Data warehousing and analytics engineering
- Batch and streaming processing
- A final real-world project with peer review
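As a small taste of the data-ingestion work above, here is a minimal sketch of loading CSV data into a local database. It uses only the Python standard library, with `sqlite3` standing in for the PostgreSQL instance the course runs in Docker; the `trips` table and its columns are illustrative, not taken from the course materials.

```python
import csv
import io
import sqlite3

# Illustrative payload; the course uses NYC taxi trip data instead.
CSV_DATA = """trip_id,distance_km,fare
1,2.5,9.80
2,7.1,21.40
3,0.9,5.00
"""

def ingest(conn: sqlite3.Connection, csv_text: str) -> int:
    """Load CSV rows into a trips table and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trips "
        "(trip_id INTEGER, distance_km REAL, fare REAL)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (int(r["trip_id"]), float(r["distance_km"]), float(r["fare"]))
        for r in reader
    ]
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
count = ingest(conn, CSV_DATA)
total_fare = conn.execute("SELECT SUM(fare) FROM trips").fetchone()[0]
```

The same load-then-query shape carries over to the real pipelines, where the CSV comes from a download step and the database is Postgres behind a SQLAlchemy connection.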
Tools and technologies used throughout the course include:
- Docker and Docker Compose
- Google Cloud Platform (GCP)
- Terraform
- PostgreSQL
- Kestra
- BigQuery
- dbt
- Apache Spark
- Kafka
- Python and SQL
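The batch/streaming distinction in the list above can be illustrated without Spark or Kafka. The following is a hypothetical, dependency-free sketch of the tumbling-window aggregation pattern that the streaming module applies with Kafka consumers; the function name and event shape are assumptions for this example only.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def tumbling_window_counts(
    events: Iterable[tuple[int, str]], window_seconds: int
) -> Iterator[tuple[int, dict[str, int]]]:
    """Group (timestamp, key) events into fixed windows and count per key.

    Assumes events arrive in timestamp order, as a single Kafka
    partition would deliver them; late-data handling is out of scope.
    """
    current_window = None
    counts: dict[str, int] = defaultdict(int)
    for ts, key in events:
        window = ts - ts % window_seconds  # window start time
        if current_window is not None and window != current_window:
            yield current_window, dict(counts)  # emit the closed window
            counts = defaultdict(int)
        current_window = window
        counts[key] += 1
    if current_window is not None:
        yield current_window, dict(counts)

# Toy event stream: (epoch seconds, taxi color), one-minute windows.
events = [(0, "green"), (3, "yellow"), (4, "green"), (65, "green")]
result = list(tumbling_window_counts(events, 60))
```

A batch job would compute the same counts over a complete dataset after the fact; the streaming version emits each window as soon as the next one begins.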
The repository is organized by modules and workshops:
.
├── 01-containerization-iac # IaC = infrastructure as code
├── 02-workflow-orchestration
├── 03-data-warehousing
├── 04-analytics-engineering
├── 05-data-platforms
├── 06-agentic-dlt
├── 07-batch-processing
├── 08-streaming
└── final-project
Structure may evolve as the course progresses.
Working through the course, the goals are to:
- Build production-style data pipelines from scratch
- Learn cloud-native data engineering workflows
- Practice infrastructure automation and orchestration
- Gain experience with both batch and streaming systems
All course content and structure are provided by:
- Alexey Grigorev
- The DataTalks.Club team
This repository is for personal learning and practice, and is intended for educational purposes only.