rdanielsstat/data-engineering-zoomcamp

Data Engineering Zoomcamp

Hands-on projects and notes from the Data Engineering Zoomcamp by Alexey Grigorev and the DataTalks.Club team.

This repository follows the full course curriculum and contains exercises, experiments, and the final project, covering modern data engineering tools and best practices.

Course Overview

The course is a practical, end-to-end introduction to data engineering, including:

  • Containerization and infrastructure as code
  • Workflow orchestration
  • Data ingestion pipelines
  • Data warehousing and analytics engineering
  • Batch and streaming processing
  • A final real-world project with peer review

Tech Stack

Tools and technologies used throughout the course include:

  • Docker and Docker Compose
  • Google Cloud Platform (GCP)
  • Terraform
  • PostgreSQL
  • Kestra
  • BigQuery
  • dbt
  • Apache Spark
  • Kafka
  • Python and SQL

Repository Structure

The repository is organized by modules and workshops:

.
├── 01-containerization-iac    # IaC = infrastructure as code
├── 02-workflow-orchestration
├── 03-data-warehousing
├── 04-analytics-engineering
├── 05-data-platforms
├── 06-agentic-dlt
├── 07-batch-processing
├── 08-streaming
└── final-project

The structure may evolve as the course progresses.

Goals

  • Build production-style data pipelines from scratch
  • Learn cloud-native data engineering workflows
  • Practice infrastructure automation and orchestration
  • Gain experience with both batch and streaming systems

Acknowledgements

All course content and structure are provided by:

  • Alexey Grigorev
  • The DataTalks.Club team

This repository is for personal learning and practice.

License

This project is for educational purposes.
