sushmakl95/README.md

Hi, I'm Sushma 👋

Senior Data Engineer · Bengaluru, India · 8+ years building data platforms across AWS and GCP.

I design and ship production data pipelines that are reliable, observable, and cost-aware. Currently a full-time engineer at Equal Experts, with parallel freelance work building AWS-native data pipelines for top clients in the BFSI and telecom domains.


🛠️ Tech Stack

Cloud & Infra: AWS (Glue, Lambda, Step Functions, Redshift, Kinesis, EventBridge, Bedrock), GCP (BigQuery, Cloud Composer, GCS), Terraform, Docker

Data Processing: PySpark, Databricks, Delta Lake, dbt, Airflow (Cosmos), Apache Kafka, Debezium

Storage: Snowflake, Redshift, BigQuery, PostgreSQL, OpenSearch, S3, Delta Lake

Languages: Python, SQL, Bash


📚 Featured Projects

These four repos cover the breadth of a modern data engineering role: streaming CDC, batch analytics, lakehouse patterns, and performance engineering.

Multi-tenant IoT Lakehouse on Databricks + Delta Lake with Auto Loader, Delta Live Tables, medallion architecture (Bronze → Silver → Gold), SCD2 tracking, and Z-ORDER benchmarks (37× query speedup). 13 Terraform modules for full AWS deployment.

Production-grade CDC pipeline: MySQL → Debezium → Kinesis → S3 → AWS Glue (PySpark) → Redshift + Postgres + OpenSearch. Multi-sink fanout with SCD2 merges, DynamoDB-backed idempotency, Step Functions orchestration, and complete local dev stack via Docker Compose.
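For readers unfamiliar with the pattern, here is a minimal plain-Python sketch of what an SCD2 merge with idempotent event handling looks like. It is an illustration only, not code from the repo: `DimRow`, `scd2_merge`, the event tuple layout, and the in-memory `seen_event_ids` set (standing in for a DynamoDB-style idempotency store) are all assumptions made for the example.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    """One version of a dimension record (hypothetical layout)."""
    key: str                              # business key
    attrs: tuple                          # tracked attributes as sorted (name, value) pairs
    valid_from: datetime
    valid_to: Optional[datetime] = None   # None means the version is still open
    is_current: bool = True


def scd2_merge(dim, changes, seen_event_ids):
    """Apply change events to a dimension while keeping full history.

    changes: iterable of (event_id, key, attrs_dict, event_time)
    seen_event_ids: a set standing in for an external idempotency store
    """
    rows = list(dim)
    for event_id, key, attrs, ts in changes:
        if event_id in seen_event_ids:
            continue                      # replayed event: skip it
        seen_event_ids.add(event_id)
        new_attrs = tuple(sorted(attrs.items()))
        idx = next(
            (i for i, r in enumerate(rows) if r.key == key and r.is_current),
            None,
        )
        if idx is not None:
            if rows[idx].attrs == new_attrs:
                continue                  # nothing changed: keep the open version
            # Close the old version instead of overwriting it.
            rows[idx] = replace(rows[idx], valid_to=ts, is_current=False)
        rows.append(DimRow(key, new_attrs, ts))
    return rows


seen = set()
events = [
    ("e1", "cust-1", {"city": "Pune"}, datetime(2024, 1, 1)),
    ("e2", "cust-1", {"city": "Bengaluru"}, datetime(2024, 6, 1)),
    ("e2", "cust-1", {"city": "Bengaluru"}, datetime(2024, 6, 1)),  # duplicate delivery
]
dim = scd2_merge([], events, seen)
```

After the run, `dim` holds two versions of `cust-1`: a closed Pune row and an open Bengaluru row. The duplicate `e2` is ignored, so replaying the same batch leaves the table unchanged.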

Modern data stack reference: dbt + BigQuery + Airflow (Cloud Composer) with medallion layering (staging → intermediate → marts), SCD2 snapshots, exposures, source freshness SLAs, custom tests, and a documented 45× cost reduction via partition + cluster + incremental tuning.

Battle-tested Apache Spark tuning patterns with reproducible benchmarks. 10 techniques (partition pruning, broadcast joins, AQE, skew handling, Z-ORDER, predicate pushdown, executor sizing, and more), each paired with measured before/after speedups runnable on a laptop.
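As a rough illustration of one of these techniques, the sketch below mimics a broadcast hash join in plain Python. It is not taken from the playbook. `broadcast_hash_join` and the sample rows are hypothetical, and in PySpark the same idea is usually expressed by wrapping the small side in `pyspark.sql.functions.broadcast`.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows, key):
    """Inner-join each partition of the large side against a small lookup.

    The lookup is built once from the small side. In a cluster it would be
    copied to every executor, so the large side never has to be shuffled.
    """
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    for partition in large_partitions:
        for row in partition:
            # Rows with no match on the small side are dropped (inner join).
            for match in lookup.get(row[key], []):
                yield {**match, **row}


# Hypothetical sample data: the large side is already split into partitions.
orders = [
    [{"order_id": 1, "country": "IN"}, {"order_id": 2, "country": "US"}],
    [{"order_id": 3, "country": "IN"}, {"order_id": 4, "country": "FR"}],
]
countries = [
    {"country": "IN", "region": "APAC"},
    {"country": "US", "region": "AMER"},
]

joined = list(broadcast_hash_join(orders, countries, "country"))
```

Each partition is processed independently against the same in-memory lookup, which is why this strategy tends to pay off only when one side is small enough to fit in memory on every worker.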


💼 What I'm good at

  • End-to-end data platform design: from ingestion (CDC, streaming, batch) through transformation to BI/ML serving layers
  • Performance tuning: consistent 5-50× speedups on real production workloads through partition design, join strategy, and executor right-sizing
  • Cost engineering: every pipeline I design has documented cost-per-run, lifecycle policies, and idle-cost minimization
  • Testing rigor: DQ gates, custom generic + singular dbt tests, CI for every change

📫 Get in touch


Looking for senior / staff data engineering roles. Open to AWS / Databricks-heavy platforms, analytics engineering, and data-platform team leadership positions.

Pinned

  1. aws-glue-cdc-framework (Public)

    Production-grade CDC pipeline: MySQL → Debezium → Kinesis → S3 → AWS Glue (PySpark) → Redshift + Postgres + OpenSearch. Multi-sink fanout with SCD2, idempotency tracking, and 13 modular Terraform m…

    Python

  2. dbt-bigquery-analytics-platform (Public)

    Modern data stack reference: dbt + BigQuery + Airflow (Cloud Composer) with medallion layering, SCD2 snapshots, exposures, freshness SLAs, and 45× cost reduction via partition + cluster + increment…

    Python

  3. lakehouse-iot-telemetry-pipeline (Public)

    Multi-tenant IoT telemetry Lakehouse on Databricks + Delta Lake. PySpark, Auto Loader, DLT, medallion architecture, Terraform IaC.

    Python

  4. spark-performance-optimization-playbook (Public)

    Battle-tested Apache Spark tuning patterns with reproducible benchmarks. 10 techniques (partition pruning, broadcast joins, AQE, skew handling, Z-ORDER, and more), each paired with measured before…

    Python