Data Engineer with 2+ years of experience building scalable data pipelines, cloud data warehouses, and analytics platforms.
I design and implement end-to-end data solutions — from raw ingestion to production-ready analytics — using modern tools like PySpark, Databricks, Delta Lake and Azure.
| Category | Tools |
| --- | --- |
| Languages | Python · SQL |
| Processing | PySpark · Apache Spark · Databricks |
| Storage | Delta Lake · Azure Data Lake Gen2 · PostgreSQL |
| Cloud | Azure (ADLS, ADF, Synapse, Databricks) · AWS (S3, Glue, Lambda) · GCP |
| Modeling | Data Vault 2.0 · Star Schema · SCD Type 2 · Dimensional Modeling |
| Orchestration | Apache Airflow · Azure Data Factory |
| Quality | Great Expectations · Custom Validation Frameworks |
| CI/CD | Azure DevOps · GitHub Actions · Git · Docker |
| IaC | Terraform |
- End-to-end data lakehouse on Azure Databricks with Medallion Architecture (Bronze → Silver → Gold), Star Schema with SCD Type 2, an automated data quality framework, and a full CI/CD pipeline on Azure DevOps.
- Data Vault 2.0 warehouse with Hubs, Links, and Satellites from 3 source systems, hash-diff change detection, Star Schema marts with SCD Type 2, orchestrated by Apache Airflow and containerized with Docker.
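The hash-diff change detection used in the Data Vault project can be sketched in plain Python. This is a minimal illustration of the general technique, not the project's actual code: the column names, delimiter, and normalization rules are assumptions for the example.

```python
import hashlib


def hash_diff(record: dict, columns: list[str], delimiter: str = "||") -> str:
    """Hash the concatenated descriptive attributes of a record.

    In Data Vault 2.0, a satellite row stores this hash; an incoming row
    whose hash differs from the stored one is treated as a change and
    inserted as a new satellite record.
    """
    # Normalize each value: string-cast, trim, uppercase; empty string for NULLs.
    parts = [str(record.get(col) or "").strip().upper() for col in columns]
    payload = delimiter.join(parts)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


# Illustrative usage: compare the stored satellite row with an incoming row.
columns = ["name", "email", "city"]  # descriptive attributes (hypothetical schema)
stored = {"name": "Ada", "email": "ada@example.com", "city": "London"}
incoming = {"name": "Ada", "email": "ada@example.com", "city": "Paris"}

changed = hash_diff(stored, columns) != hash_diff(incoming, columns)
# changed is True here, so the pipeline would append a new satellite row.
```

Comparing one hash per row instead of every column individually keeps change detection cheap at scale; the same idea translates directly to PySpark with `sha2(concat_ws(...))`.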
- Building production-grade data warehouse architectures
- Designing reliable pipelines with automated quality gates and full data lineage
- Building cloud-native data platforms on Azure and Databricks
- Writing clean, testable, well-documented data engineering code
If you're working on data engineering challenges or looking for a collaborator, feel free to reach out.