My engineering journey started in the physical world. With a background in Mechanical Engineering and a recognized passion for illustration (honored with a National Award from the President of India in 2012 & 2017), I have always loved breaking down how complex systems fit together.
Today, I apply that same spatial reasoning and structural mindset to distributed cloud systems, DevOps pipelines, and Site Reliability Engineering (SRE). I specialize in automating away operational toil, securing infrastructure, and running highly reliable enterprise environments (currently 50+ distributed compute nodes and high-throughput event streams).
Languages, Backend & Databases:
I believe in understanding why a technology exists, not just how to use it. When I am not building infrastructure, I am deep-diving into the mechanics of next-generation tools:
- Event Streaming at Scale: Researching how massive platforms handle throughput bottlenecks beyond standard Kafka implementations.
- Deep OS Troubleshooting: Going beyond standard administration by utilizing `strace` for system calls, tracking inode allocations, and performing packet forensics with `tcpdump` and Wireshark.
- Version Control Under the Hood: Analyzing Meta’s Sapling to understand how data structures and historical node snapshots operate differently from standard Git architecture.
- Orchestration: Transitioning deeper into container orchestration by exploring Kubernetes (K8s) and multi-stage Docker optimization.
- Backend Engineering: Expanding my architectural scope by building Python-based web backends (including bypassing frameworks to build raw HTTP servers using basic TCP sockets).
- Algorithms: Consistently sharpening my problem-solving efficiency and data structures knowledge.
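To make the "raw HTTP server over basic TCP sockets" item above concrete, here is a minimal sketch of the idea using only the Python standard library. It is an illustration, not code from any of my repositories: the function names, the `/health` route, and the plain-text responses are all hypothetical, and it reads each request with a single `recv`, which is only adequate for small requests.

```python
import socket


def build_response(body: str, status: str = "200 OK") -> bytes:
    """Serialize a plain-text HTTP/1.1 response by hand."""
    payload = body.encode("utf-8")
    head = (
        f"HTTP/1.1 {status}\r\n"
        "Content-Type: text/plain; charset=utf-8\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode("ascii") + payload


def parse_request_line(raw: bytes) -> tuple[str, str]:
    """Return (method, path) from the first line of a raw request."""
    first_line = raw.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    parts = first_line.split(" ")
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {first_line!r}")
    method, path, _version = parts
    return method, path


def handle_client(conn: socket.socket) -> None:
    """Read one request from the connection and answer it."""
    with conn:
        raw = conn.recv(4096)  # assumption: the request fits in one read
        try:
            method, path = parse_request_line(raw)
        except ValueError:
            conn.sendall(build_response("bad request\n", "400 Bad Request"))
            return
        # Hypothetical routing table with a single route.
        if method == "GET" and path == "/health":
            conn.sendall(build_response("ok\n"))
        else:
            conn.sendall(build_response("not found\n", "404 Not Found"))


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Accept connections one at a time until interrupted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen()
        while True:
            conn, _addr = server.accept()
            handle_client(conn)
```

Calling `serve()` and then running `curl http://127.0.0.1:8080/health` should exercise the one route. Writing the status line, headers, and `Content-Length` by hand is the point of the exercise: it shows exactly what a framework normally hides.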
Catch up on my latest algorithmic problem-solving:
Note: Due to strict corporate NDAs and security policies, specific enterprise project code is maintained in private repositories. Below are high-level overviews of the architectural challenges I solve in production environments.
- Zero-Touch QA Validation Engine: Engineered an AWS-native Python automation tool using `boto3` to dynamically discover cloud resources at runtime. Integrated with CloudWatch and Datadog to automate deployment health checks, reducing manual sign-off time from ~30 minutes to under 2 minutes while enforcing a zero-error baseline.
- High-Throughput Data Pipelines: Manage secure, event-driven pipelines (Amazon MSK/Kafka) and public-facing APIs, ensuring reliable delivery for downstream B2B applications processing 200 GB to 500 GB of data daily.
- Proactive System Monitoring: Lead Platform L2 shift support by managing sophisticated Datadog monitors across 5 environments, tracking compute capacity, broker health, and cluster utilization to maintain near-zero deployment downtime.
- Cost Anomaly Root Cause Analysis: Triaged runaway AWS billing alerts by cross-referencing CloudWatch metrics with CloudTrail logs. Identified and severed an infinite synchronous retry loop within an event-streaming architecture, proposing permanent architectural safeguards.
- Vulnerability Remediation (CVEs): Spearheaded DevSecOps remediations to patch critical vulnerabilities in containerized serverless deployments. Implemented multi-stage Docker builds to harden security and achieved a 42% image size reduction (607 MB to 350 MB).
- IAM Least-Privilege Enforcement: Authored and deployed Terraform configurations to systematically re-route AWS IAM permissions and manage VPC endpoints, ensuring seamless transitions without service interruption.
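The infinite synchronous retry loop in the cost anomaly item above belongs to a class of failure that is commonly guarded against with a hard cap on attempts plus exponential backoff. The sketch below illustrates that general pattern only. It is not the production fix and not necessarily the safeguard that was proposed; `call_with_bounded_retries`, `RetriesExhausted`, and every default value are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetriesExhausted(Exception):
    """Raised when the operation still fails after the final attempt."""


def call_with_bounded_retries(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying at most `max_attempts` times in total.

    The wait doubles after each failure (base_delay, 2x, 4x, ...) and is
    capped at `max_delay`, so a persistent failure ends in an exception
    instead of an unbounded stream of billable calls.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:  # a real system would catch narrower types
            last_error = exc
            if attempt == max_attempts:
                break  # no point sleeping after the final failure
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay)

    raise RetriesExhausted(
        f"gave up after {max_attempts} attempts"
    ) from last_error
```

The `sleep` parameter is injectable so the backoff schedule can be checked in tests without waiting. In an event-streaming setup, the exhausted case would typically hand the message to a dead-letter queue rather than drop it, but that wiring is outside this sketch.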
