Hands-on projects and notes from the Data Engineering Zoomcamp by Alexey Grigorev and the DataTalks.Club team.
This repository follows the full course curriculum and contains exercises, experiments, and the final project, covering modern data engineering tools and best practices.
The course is a practical, end-to-end introduction to data engineering, including:
- Containerization and infrastructure as code
- Workflow orchestration
- Data ingestion pipelines
- Data warehousing and analytics engineering
- Batch and streaming processing
- A final real-world project with peer review
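As a small taste of the data-ingestion work above, here is a minimal sketch of loading CSV data into a local database. It uses only the Python standard library, with `sqlite3` standing in for the PostgreSQL instance the course runs in Docker; the `trips` table and its columns are illustrative, not taken from the course materials.

```python
import csv
import io
import sqlite3

# Illustrative payload; the course uses NYC taxi trip data instead.
CSV_DATA = """trip_id,distance_km,fare
1,2.5,9.80
2,7.1,21.40
3,0.9,5.00
"""

def ingest(conn: sqlite3.Connection, csv_text: str) -> int:
    """Load CSV rows into a trips table and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trips "
        "(trip_id INTEGER, distance_km REAL, fare REAL)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (int(r["trip_id"]), float(r["distance_km"]), float(r["fare"]))
        for r in reader
    ]
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
count = ingest(conn, CSV_DATA)
total_fare = conn.execute("SELECT SUM(fare) FROM trips").fetchone()[0]
```

The same load-then-query shape carries over to the real pipelines, where the CSV comes from a download step and the database is Postgres behind a SQLAlchemy connection.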
Tools and technologies used throughout the course include:
- Docker and Docker Compose
- Google Cloud Platform (GCP)
- Terraform
- PostgreSQL
- Kestra
- BigQuery
- dbt
- Apache Spark
- Kafka
- Python and SQL
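The batch/streaming distinction in the list above can be illustrated without Spark or Kafka. The following is a hypothetical, dependency-free sketch of the tumbling-window aggregation pattern that the streaming module applies with Kafka consumers; the function name and event shape are assumptions for this example only.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def tumbling_window_counts(
    events: Iterable[tuple[int, str]], window_seconds: int
) -> Iterator[tuple[int, dict[str, int]]]:
    """Group (timestamp, key) events into fixed windows and count per key.

    Assumes events arrive in timestamp order, as a single Kafka
    partition would deliver them; late-data handling is out of scope.
    """
    current_window = None
    counts: dict[str, int] = defaultdict(int)
    for ts, key in events:
        window = ts - ts % window_seconds  # window start time
        if current_window is not None and window != current_window:
            yield current_window, dict(counts)  # emit the closed window
            counts = defaultdict(int)
        current_window = window
        counts[key] += 1
    if current_window is not None:
        yield current_window, dict(counts)

# Toy event stream: (epoch seconds, taxi color), one-minute windows.
events = [(0, "green"), (3, "yellow"), (4, "green"), (65, "green")]
result = list(tumbling_window_counts(events, 60))
```

A batch job would compute the same counts over a complete dataset after the fact; the streaming version emits each window as soon as the next one begins.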
The repository is organized by modules and workshops:
.
├── 01-containerization-iac # IaC = infrastructure as code
├── 02-workflow-orchestration
├── 03-data-warehousing
├── 04-analytics-engineering
├── 05-data-platforms
├── 06-agentic-dlt
├── 07-batch-processing
├── 08-streaming
└── final-project
Structure may evolve as the course progresses.
Working through the course, the goals are to:
- Build production-style data pipelines from scratch
- Learn cloud-native data engineering workflows
- Practice infrastructure automation and orchestration
- Gain experience with both batch and streaming systems
All course content and structure are provided by:
- Alexey Grigorev
- The DataTalks.Club team
This repository is for personal learning and practice, and is intended for educational purposes only.