Senior Data Engineer Β· Bengaluru, India Β· 8+ years building data platforms across AWS and GCP.
I design and ship production data pipelines that are reliable, observable, and cost-aware. Currently a full-time engineer at Equal Experts, with parallel freelance work building AWS-native data pipelines for clients in the BFSI and telecom domains.
Cloud & Infra: AWS (Glue, Lambda, Step Functions, Redshift, Kinesis, EventBridge, Bedrock), GCP (BigQuery, Cloud Composer, GCS), Terraform, Docker
Data Processing: PySpark, Databricks, Delta Lake, dbt, Airflow (Cosmos), Apache Kafka, Debezium
Storage: Snowflake, Redshift, BigQuery, PostgreSQL, OpenSearch, S3, Delta Lake
Languages: Python, SQL, Bash
These four repos cover the breadth of a modern data engineering role: streaming CDC, batch analytics, lakehouse patterns, and performance engineering.
🏗️ lakehouse-iot-telemetry-pipeline
Multi-tenant IoT lakehouse on Databricks + Delta Lake with Auto Loader, Delta Live Tables, medallion architecture (Bronze → Silver → Gold), SCD2 tracking, and Z-ORDER benchmarks (37× query speedup). 13 Terraform modules for full AWS deployment.
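The SCD2 tracking above is done in Delta Live Tables; the core semantics can be sketched in plain Python (the function, field names, and timestamps below are illustrative, not the repo's actual code): when a key's attributes change, the current row is closed out and a new current row is appended, so full history is preserved.

```python
from datetime import datetime, timezone

def scd2_upsert(history, key, new_attrs, now=None):
    """Apply an SCD Type 2 change: close the open row for `key` if its
    attributes changed, then append a new current row with open validity."""
    now = now or datetime.now(timezone.utc).isoformat()
    for row in history:
        if row["key"] == key and row["is_current"]:
            if row["attrs"] == new_attrs:
                return history  # no change: keep the current row open
            row["is_current"] = False
            row["valid_to"] = now  # close the superseded version
    history.append({"key": key, "attrs": new_attrs,
                    "valid_from": now, "valid_to": None,
                    "is_current": True})
    return history

hist = []
scd2_upsert(hist, "device-1", {"firmware": "1.0"}, now="t0")
scd2_upsert(hist, "device-1", {"firmware": "1.1"}, now="t1")
# hist now holds two versions: the closed "1.0" row and the open "1.1" row
```

In Delta Lake the same close-and-append step is expressed as a single `MERGE` against the dimension table.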
Production-grade CDC pipeline: MySQL → Debezium → Kinesis → S3 → AWS Glue (PySpark) → Redshift + Postgres + OpenSearch. Multi-sink fanout with SCD2 merges, DynamoDB-backed idempotency, Step Functions orchestration, and a complete local dev stack via Docker Compose.
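The DynamoDB-backed idempotency in that pipeline guards against Kinesis replays delivering the same record twice. A minimal sketch of the idea, with an in-memory set standing in for the DynamoDB table (in production this would be a conditional `PutItem`; all names here are made up):

```python
class IdempotencyStore:
    """In-memory stand-in for a DynamoDB idempotency table:
    `claim` succeeds only the first time a record ID is seen."""
    def __init__(self):
        self._seen = set()

    def claim(self, record_id):
        if record_id in self._seen:
            return False  # duplicate delivery: caller should skip it
        self._seen.add(record_id)
        return True

def process_batch(store, events, sink):
    """Write each event's payload to the sink at most once,
    even if the stream redelivers the same event ID."""
    for ev in events:
        if store.claim(ev["id"]):
            sink.append(ev["payload"])
    return sink

store = IdempotencyStore()
events = [{"id": "a", "payload": 1},
          {"id": "a", "payload": 1},   # replayed duplicate
          {"id": "b", "payload": 2}]
out = process_batch(store, events, [])
# out == [1, 2]: the replayed "a" event is written only once
```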
Modern data stack reference: dbt + BigQuery + Airflow (Cloud Composer) with medallion layering (staging → intermediate → marts), SCD2 snapshots, exposures, source freshness SLAs, custom tests, and a documented 45× cost reduction via partition + cluster + incremental tuning.
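The incremental half of that cost reduction can be sketched outside dbt (the function and data below are illustrative): on incremental runs, only rows past the target table's high-water mark are reprocessed, which is what dbt's `is_incremental()` filter compiles to, letting BigQuery prune the older date partitions instead of rescanning them.

```python
def rows_to_process(source_rows, high_water_mark):
    """Mimic a dbt incremental model's WHERE clause: keep only rows
    newer than the max timestamp already loaded into the target."""
    return [r for r in source_rows if r["updated_at"] > high_water_mark]

source = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-01-02"},
    {"id": 3, "updated_at": "2024-01-03"},
]
new_rows = rows_to_process(source, "2024-01-01")
# new_rows contains only ids 2 and 3; id 1 is already loaded
```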
Battle-tested Apache Spark tuning patterns with reproducible benchmarks: 10 techniques (partition pruning, broadcast joins, AQE, skew handling, Z-ORDER, predicate pushdown, executor sizing, and more), each paired with measured before/after speedups runnable on a laptop.
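One of those techniques, skew handling via key salting, can be illustrated without Spark (key names and counts below are invented): a hot join key is spread across N salted buckets so no single task receives all of its rows, while the small side of the join is replicated once per salt value so every salted key still finds its match.

```python
import random
from collections import Counter

NUM_SALTS = 8
rng = random.Random(42)  # seeded so the sketch is reproducible

def salted(key, salt):
    # Append a salt suffix to the fact-side join key; the dimension
    # side would be replicated once per salt to preserve join results.
    return f"{key}#{salt}"

# One heavily skewed key and one normal key on the large (fact) side.
fact_keys = ["hot_customer"] * 800 + ["cold_customer"] * 8
dist = Counter(salted(k, rng.randrange(NUM_SALTS)) for k in fact_keys)
# The 800 hot rows are now spread across up to 8 buckets,
# so no single partition carries the whole hot key.
```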
- End-to-end data platform design: from ingestion (CDC, streaming, batch) through transformation to BI/ML serving layers
- Performance tuning: consistent 5-50× speedups on real production workloads through partition design, join strategy, and executor right-sizing
- Cost engineering: every pipeline I design has documented cost-per-run, lifecycle policies, and idle-cost minimization
- Testing rigor: DQ gates, custom generic and singular dbt tests, CI for every change
- 💼 LinkedIn
- ✉️ sushmakl95@gmail.com
Looking for senior / staff data engineering roles. Open to AWS / Databricks-heavy platforms, analytics engineering, and data-platform team leadership positions.