🧑‍💻
Going to be in FAANG IA
  • https://www.linkedin.com/in/imshafay/
  • Berlin

shafaypro/README.md
Shafay Amjad

Portfolio · GitHub · LinkedIn · Credly · Email

Berlin, Germany · 9+ years experience · 13+ certifications · Billions of events per day · AWS and GCP · Open to remote roles


👋 About Me

def build_data_platform():
    """Return a profile summary as a plain dict."""
    return {
        'name':           'Shafay Amjad',
        'role':           'Lead Data Engineer',
        'based_in':       'Berlin, Germany 🇩🇪',
        'specialization': ['GenAI', 'ML Engineering', 'Data Platforms'],
        'experience':     '9+ years • Amazon, Delivery Hero, Goldman Sachs & more',
        'cloud':          ['AWS', 'GCP', 'Serverless'],
        'architecture':   ['Lakehouse', 'Delta Lake', 'Data Mesh', 'Event-Driven'],
        'data_stack':     ['Airflow', 'dbt', 'Databricks', 'Spark', 'Kafka', 'Iceberg'],
        'genai_ml':       ['Agentic AI', 'Bedrock', 'LangGraph', 'RAG', 'SageMaker'],
        'scale':          'billions of events / day',
        'impact':         'measurable revenue growth & operational excellence',
    }

I'm a Lead Data Engineer and GenAI specialist based in Berlin, with 9+ years architecting enterprise-scale data platforms and intelligent systems. Across top-tier companies like Amazon, Delivery Hero, and Goldman Sachs, I've built solutions processing billions of events daily that drive measurable revenue growth and operational excellence.

  • 🏗️ End-to-end data platforms. I architect and ship complete platforms from ingestion to serving: metadata-driven pipelines with Airflow, dbt, Spark, and Kafka, plus full governance, lineage, and automated lifecycle management across AWS and GCP.
  • 🤖 GenAI and agentic AI. In my latest work I build end-to-end GenAI applications and agentic AI systems: multi-step autonomous agents, tool calling and orchestration, RAG over enterprise knowledge, and human-in-the-loop copilots on Amazon Bedrock, LangChain, and Hugging Face, all with guardrails and evaluation built in.
  • 🧠 Production ML. Recommendation engines, predictive analytics, and real-time intelligent automation with TensorFlow, PyTorch, and SageMaker.


🧰 Tech Stack

Languages

Python · SQL · Scala · R · SPARQL · Bash

Data Engineering & Streaming

Apache Spark · Kafka · Flink · Kinesis · Airflow · Dagster · dbt · Databricks · Delta Lake · Iceberg · Great Expectations

Warehouses & Query Engines

BigQuery · Redshift · Snowflake · Trino · DuckDB

Cloud & DevOps

AWS · GCP · Terraform · Kubernetes · Docker · GitHub Actions

GenAI & Machine Learning

Amazon Bedrock · OpenAI · Anthropic · Agentic AI · LangChain · LangGraph · LlamaIndex · MCP · Hugging Face · PyTorch · TensorFlow · MLflow · SageMaker

BI & Visualization

QuickSight · Looker · Power BI · Streamlit · Metabase


🚀 What I Build

| Area | Focus |
| --- | --- |
| Data Platforms | Lakehouse architectures, ELT/ETL pipelines, metadata-driven workflows, governance, lineage, and scalable analytics |
| Streaming Systems | Kafka, Kinesis, Flink, and Spark for event-driven pipelines that power near real-time reporting and intelligent automation |
| GenAI Engineering | RAG systems, LLM-powered agents, internal copilots, chatbots, knowledge retrieval, and workflow automation |
| ML & MLOps | Production ML pipelines, feature/training workflows, recommendation systems, and cloud-native model deployment |

💼 Professional Experience

| Role | Company | Period | Highlights |
| --- | --- | --- | --- |
| Lead Data Engineer | Orion S.A. | Jan 2024 - Present | Built complete end-to-end data platforms (serverless AWS lakehouse) centralizing manufacturing, operations & finance analytics; metadata-driven dbt/Airflow pipelines; led GenAI adoption, shipping production GenAI applications and agentic AI systems with Bedrock, LangChain & Streamlit |
| Senior Data Engineer | Delivery Hero SE | Jan 2022 - Jan 2024 | Batch & streaming pipelines across AWS/GCP; real-time analytics APIs; Terraform IaC modernization; BigQuery + Spark processing billions of events/day |
| Senior Data Engineer | Amazon | Sep 2020 - Jan 2022 | Predictive modeling & ML analytics; end-to-end BI/ML on Redshift, SageMaker, Glue, Lambda, Athena; multi-stream real-time systems with Kafka & Kinesis |
| Senior Data Engineer | Goldman Sachs | Jun 2021 - Dec 2021 | Enterprise-scale financial analytics pipelines; secure, compliance-driven data workflows; ML model deployment support |
| Sr. Software Engineer / Data Scientist | NorthBay Solutions | Feb 2019 - Sep 2020 | ETL & OCR automation for insurance/healthcare; domain ML models & recommendation engines; graph-based & Alexa-integrated LLM automation |
| Machine Learning Engineer | NorthBay Solutions | Sep 2017 - Feb 2019 | Serverless ML on Lambda & SageMaker; voice-enabled LLM systems; real-time processing with Spark, Flink, EMR |
| Data Integration Intern | Teradata | Jul 2016 - Sep 2016 | Complex SQL & ETL optimization; data warehouse integration procedures and technical documentation |

🔐 Featured Projects & Enterprise Impact

High-impact platform, ML, and GenAI systems delivered across manufacturing, e-commerce, food delivery, and regulated finance. Client names and proprietary details are withheld under NDA, but the engineering and the outcomes are real.

🏭 Enterprise Serverless Lakehouse

Fully serverless, scalable lakehouse on AWS for a leading manufacturing enterprise, centralizing analytics across manufacturing, operations, and finance with metadata-driven pipelines, end-to-end governance, lineage tracking, and automated lifecycle management.
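
To illustrate what "metadata-driven" can mean in practice, here is a minimal sketch, not the production system (which is under NDA). It assumes each table is described by a small metadata record and that run order is derived from declared dependencies. The table names, the S3 path, and the `plan_tasks` helper are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical table metadata: names, sources, and dependencies are illustrative only.
TABLES = {
    "raw_orders": {"source": "s3://example-bucket/orders/", "depends_on": []},
    "stg_orders": {"source": "raw_orders", "depends_on": ["raw_orders"]},
    "fct_revenue": {"source": "stg_orders", "depends_on": ["stg_orders"]},
}


def plan_tasks(tables):
    """Return table names in an order that respects each table's depends_on list."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(meta["depends_on"]) for name, meta in tables.items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for name in plan_tasks(TABLES):
        print(f"run {name} <- {TABLES[name]['source']}")
```

With this approach, adding a table means adding a metadata record rather than writing a new pipeline. An orchestrator such as Airflow or Step Functions could consume the resulting plan, though that wiring is left out here.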

📈 Measurable Revenue Growth · Full Data Governance

AWS Glue · Athena · Lake Formation · S3 · Delta Lake · Step Functions · EventBridge · QuickSight · dbt · Lambda · Terraform

🤖 End-to-End GenAI & Agentic AI Platform

Designed and shipped complete end-to-end GenAI applications and agentic AI systems: multi-step autonomous agents with tool calling and orchestration, RAG over enterprise knowledge bases, and human-in-the-loop copilots, built on Amazon Bedrock, LangChain, and LangGraph with guardrails, evaluation harnesses, and observability throughout.
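
The retrieval step of RAG can be sketched in a few lines. This is a toy illustration only: it swaps real embeddings and a vector database for plain term-frequency vectors and cosine similarity, and the documents, `retrieve`, and `build_prompt` are hypothetical names, not part of Bedrock, LangChain, or LangGraph.

```python
import math
from collections import Counter


def vectorize(text):
    """Toy stand-in for an embedding: lowercase term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = vectorize(query)
    return sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)[:k]


def build_prompt(query, docs, k=2):
    """Assemble a grounded prompt; the LLM call itself is out of scope here."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical knowledge-base snippets.
DOCS = [
    "Invoices are archived after 90 days",
    "The lakehouse stores data in Delta tables",
    "Agents call tools through a registry",
]
```

Calling `retrieve("where does the lakehouse store data", DOCS)` ranks the Delta tables snippet first, because it is the only document that shares terms with the query.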

🤖 Autonomous Agents · Tool Calling & Orchestration · Enterprise RAG

Bedrock · LangChain · LangGraph · MCP · Streamlit · Hugging Face · OpenAI · Vector DB · RAG

⚡ Multi-Stream Real-Time Data Platform

Kafka + Kinesis real-time systems for a leading e-commerce platform processing billions of events/day to power CX optimization, predictive analytics, and dynamic metric dashboards across global markets.

⚡ Billions of Events/Day · Real-Time Insights

Kafka · Kinesis · Lambda · DynamoDB · Step Functions · Python

🌍 Global Large-Scale Analytics on GCP

BigQuery, Spark & Cloud Functions analytics for a global food-delivery platform processing billions of events across international markets, driving millions in revenue through automated decision-making and real-time analytics.

💰 Millions in Revenue · Global Markets

BigQuery · Cloud Functions · Dataflow · Spark · Terraform · dbt · Looker

🧠 Production ML & Recommendation Engines

Production-grade ML pipelines and recommendation engines with SageMaker, TensorFlow & PyTorch, automating deep-learning workflows for predictive analytics, customer segmentation, and targeted advertising.

Improved CX · Targeted Advertising

SageMaker · TensorFlow · PyTorch · MLflow · Kubeflow · Python

🏗️ Terraform Multi-Cloud Modernization

Terraform-based IaC across AWS & GCP, improving deployment consistency and reproducibility while enabling automated multi-cloud infrastructure management at enterprise scale.

Automated Deployments · Enhanced Security

Terraform · AWS · GCP · Kubernetes · Docker · GitHub Actions


🌱 Open Source & Side Projects

📚 Cracking ML Interview

Comprehensive guide for ML & GenAI interviews with solutions, patterns, and best practices, plus a dedicated GenAI section. Helping thousands of engineers land roles at top tech companies.

Read the Guide Repo

🇩🇪 DeutschHier

Full-stack German language learning platform with interactive lessons, practice exercises, and real-world content powered by NLP.

Live

🎮 German Gender Game

Gamified learning for German noun genders (der / die / das) with spaced repetition and adaptive difficulty.
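
One common way to implement spaced repetition is a Leitner-style box system. The sketch below shows that idea and is not necessarily how the game works internally: the `Card` class, the `review` function, and the interval values are all hypothetical.

```python
# Hypothetical review intervals in days, one per Leitner box.
INTERVALS_DAYS = [1, 2, 4, 8, 16]


class Card:
    """A noun with its correct article and its current Leitner box."""

    def __init__(self, noun, article):
        self.noun = noun
        self.article = article
        self.box = 0


def review(card, answer):
    """Update the card's box from the answer and return days until the next review."""
    if answer == card.article:
        # Correct: promote one box, capped at the last box.
        card.box = min(card.box + 1, len(INTERVALS_DAYS) - 1)
    else:
        # Wrong: demote to the first box so the noun comes back soon.
        card.box = 0
    return INTERVALS_DAYS[card.box]


card = Card("Tisch", "der")
days_after_correct = review(card, "der")  # promoted to box 1
days_after_wrong = review(card, "die")    # reset to box 0
```

Correct answers push a noun further into the future, while a miss brings it back the next day, which is the basic adaptive behavior spaced repetition relies on.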

Play


📈 Current Focus

  • 🪨 Modern lakehouse and open table formats: Delta Lake and Apache Iceberg patterns
  • ✅ Data quality, contracts, lineage, and trustworthy pipelines at platform scale (Great Expectations, dbt tests)
  • 🔴 Real-time data products powered by Kafka, Kinesis, Flink & Spark
  • 🧩 End-to-end GenAI and agentic AI: autonomous multi-step agents, tool/function calling, MCP, RAG over knowledge bases, and evaluation-driven iteration
  • ☁️ Multi-cloud data & AI infrastructure with Terraform, Kubernetes, CI/CD
  • ⚙️ ML & LLM platform engineering focused on reliability, governance, and cost-aware deployment

📌 Featured Repositories

CrackingMachineLearningInterview · PYSHA · DeepLearningZerotoHero



🏅 Certifications

13+ verified badges → Credly Profile


🤝 Let's Build Something Amazing Together

Looking for a lead data engineer, need consultation on cloud architecture, or want to collaborate on innovative AI projects? I'd love to hear from you.

Pinned

  1. scikit-learn/scikit-learn (Public)

    scikit-learn: machine learning in Python

    Python · 66.6k stars · 27.1k forks

  2. aws/amazon-sagemaker-examples (Public)

    Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.

    Jupyter Notebook · 11k stars · 7k forks

  3. RDFLib/rdflib (Public)

    RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.

    Python · 2.5k stars · 594 forks

  4. CrackingMachineLearningInterview (Public)

    A repository to prepare you for your machine learning interview, involving most of the questions asked by all the tech giants and local companies. Do this to Ace your Machine Learning Engineer Inte…

    HTML · 665 stars · 135 forks

  5. PYSHA (Public)

    A Simple Virtual Assistant Built in Python 3.5

    Python · 19 stars · 5 forks

  6. DeepLearningZerotoHero (Public)

    A repository for Deep Learning projects, which includes complete preparation from Novice to Expert

    Jupyter Notebook · 7 stars · 1 fork