    def build_data_platform():
        return {
            'name': 'Shafay Amjad',
            'role': 'Lead Data Engineer',
            'based_in': 'Berlin, Germany 🇩🇪',
            'specialization': ['GenAI', 'ML Engineering', 'Data Platforms'],
            'experience': '9+ years • Amazon, Delivery Hero, Goldman Sachs & more',
            'cloud': ['AWS', 'GCP', 'Serverless'],
            'architecture': ['Lakehouse', 'Delta Lake', 'Data Mesh', 'Event-Driven'],
            'data_stack': ['Airflow', 'DBT', 'Databricks', 'Spark', 'Kafka', 'Iceberg'],
            'genai_ml': ['Agentic AI', 'Bedrock', 'LangGraph', 'RAG', 'SageMaker'],
            'scale': 'billions of events / day',
            'impact': 'measurable revenue growth & operational excellence',
        }

I'm a Lead Data Engineer and GenAI specialist based in Berlin, with 9+ years architecting enterprise-scale data platforms and intelligent systems. Across top-tier companies like Amazon, Delivery Hero, and Goldman Sachs, I've built solutions processing billions of events daily that drive measurable revenue growth and operational excellence.

- 🏗️ End-to-end data platforms. I architect and ship complete platforms from ingestion to serving: metadata-driven pipelines with Airflow, DBT, Spark, and Kafka, plus full governance, lineage, and automated lifecycle management across AWS and GCP.
- 🤖 GenAI and agentic AI. My most recent work is end-to-end GenAI applications and agentic AI systems: multi-step autonomous agents, tool calling and orchestration, RAG over enterprise knowledge, and human-in-the-loop copilots, built on Amazon Bedrock, LangChain, and Hugging Face with guardrails and evaluation built in.
- 🧠 Production ML. Recommendation engines, predictive analytics, and real-time intelligent automation with TensorFlow, PyTorch, and SageMaker.
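
As a rough illustration of what "metadata-driven" means in the first bullet, here is a minimal sketch in plain Python. It is not taken from any of the platforms described here: the step names, the `PIPELINE_METADATA` structure, and the `plan_execution` helper are all hypothetical, and the standard-library `graphlib` stands in for a real orchestrator such as Airflow.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline metadata: each entry names a step and its upstream steps.
# In a real platform this would typically live in YAML, a table, or a catalog.
PIPELINE_METADATA = [
    {"name": "ingest_orders", "depends_on": []},
    {"name": "ingest_customers", "depends_on": []},
    {"name": "stage_orders", "depends_on": ["ingest_orders"]},
    {"name": "mart_revenue", "depends_on": ["stage_orders", "ingest_customers"]},
]


def plan_execution(metadata):
    """Return step names in an order that respects every declared dependency."""
    graph = {step["name"]: set(step["depends_on"]) for step in metadata}
    # static_order() raises graphlib.CycleError if the metadata contains a cycle.
    return list(TopologicalSorter(graph).static_order())


order = plan_execution(PIPELINE_METADATA)
print(order)
```

The point of the pattern is that adding a new dataset means adding a metadata entry, not writing a new pipeline; the generic planner stays unchanged.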
Languages
Data Engineering & Streaming
Warehouses & Query Engines
Cloud & DevOps
GenAI & Machine Learning
BI & Visualization
| Area | Focus |
|---|---|
| Data Platforms | Lakehouse architectures, ELT/ETL pipelines, metadata-driven workflows, governance, lineage, and scalable analytics |
| Streaming Systems | Kafka, Kinesis, Flink, and Spark for event-driven pipelines that power near real-time reporting and intelligent automation |
| GenAI Engineering | RAG systems, LLM-powered agents, internal copilots, chatbots, knowledge retrieval, and workflow automation |
| ML & MLOps | Production ML pipelines, feature/training workflows, recommendation systems, and cloud-native model deployment |
| Role | Company | Period | Highlights |
|---|---|---|---|
| Lead Data Engineer | Orion S.A. | Jan 2024 - Present | Built complete end-to-end data platforms (serverless AWS lakehouse) centralizing manufacturing, operations & finance analytics; metadata-driven DBT/Airflow pipelines; led GenAI adoption shipping production GenAI applications and agentic AI systems with Bedrock, LangChain & Streamlit |
| Senior Data Engineer | Delivery Hero SE | Jan 2022 - Jan 2024 | Batch & streaming pipelines across AWS/GCP; real-time analytics APIs; Terraform IaC modernization; BigQuery + Spark processing billions of events/day |
| Senior Data Engineer | Amazon | Sep 2020 - Jan 2022 | Predictive modeling & ML analytics; end-to-end BI/ML on Redshift, SageMaker, Glue, Lambda, Athena; multi-stream real-time systems with Kafka & Kinesis |
| Senior Data Engineer | Goldman Sachs | Jun 2021 - Dec 2021 | Enterprise-scale financial analytics pipelines; secure, compliance-driven data workflows; ML model deployment support |
| Sr. Software Engineer / Data Scientist | NorthBay Solutions | Feb 2019 - Sep 2020 | ETL & OCR automation for insurance/healthcare; domain ML models & recommendation engines; graph-based & Alexa-integrated LLM automation |
| Machine Learning Engineer | NorthBay Solutions | Sep 2017 - Feb 2019 | Serverless ML on Lambda & SageMaker; voice-enabled LLM systems; real-time processing with Spark, Flink, EMR |
| Data Integration Intern | Teradata | Jul 2016 - Sep 2016 | Complex SQL & ETL optimization; data warehouse integration procedures and technical documentation |
High-impact platform, ML, and GenAI systems delivered across manufacturing, e-commerce, food delivery, and regulated finance. Client names and proprietary details are withheld under NDA, but the engineering and the outcomes are real.
| Project | Stack |
|---|---|
| Fully serverless, scalable lakehouse on AWS for a leading manufacturing enterprise, centralizing analytics across manufacturing, operations, and finance with metadata-driven pipelines, end-to-end governance, lineage tracking, and automated lifecycle management. | AWS Glue · Athena · Lake Formation · S3 · Delta Lake · Step Functions · EventBridge · QuickSight · DBT · Lambda · Terraform |
| Designed and shipped complete end-to-end GenAI applications and agentic AI systems: multi-step autonomous agents with tool calling and orchestration, RAG over enterprise knowledge bases, and human-in-the-loop copilots, built on Amazon Bedrock, LangChain, and LangGraph with guardrails, evaluation harnesses, and observability throughout. | Bedrock · LangChain · LangGraph · MCP · Streamlit · HuggingFace · OpenAI · Vector DB · RAG |
| Kafka + Kinesis real-time systems for a leading e-commerce platform processing billions of events/day to power CX optimization, predictive analytics, and dynamic metric dashboards across global markets. | Kafka · Kinesis · Lambda · DynamoDB · Step Functions · Python |
| BigQuery, Spark & Cloud Functions analytics for a global food-delivery platform processing billions of events across international markets, driving millions in revenue through automated decision-making and real-time analytics. | BigQuery · Cloud Functions · Dataflow · Spark · Terraform · DBT · Looker |
| Production-grade ML pipelines and recommendation engines with SageMaker, TensorFlow & PyTorch, automating deep-learning workflows for predictive analytics, customer segmentation, and targeted advertising. | SageMaker · TensorFlow · PyTorch · MLflow · Kubeflow · Python |
| Terraform-based IaC across AWS & GCP, improving deployment consistency and reproducibility while enabling automated multi-cloud infrastructure management at enterprise scale. | Terraform · AWS · GCP · Kubernetes · Docker · GitHub Actions |

- Comprehensive guide for ML & GenAI interviews with solutions, patterns, and best practices, plus a dedicated GenAI section. Helping thousands of engineers land roles at top tech companies.
- Full-stack German language learning platform with interactive lessons, practice exercises, and real-world content powered by NLP.
- Gamified learning for German noun genders (der / die / das) with spaced repetition and adaptive difficulty.
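
For readers unfamiliar with spaced repetition, the idea behind the noun-gender trainer can be sketched with a simple Leitner-box scheme. This is an assumption about the general technique, not the project's actual scheduler: the `Card` class, the `review` function, and the interval values are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical review intervals in days for Leitner boxes 0..4.
INTERVALS_DAYS = [1, 2, 4, 8, 16]


@dataclass
class Card:
    noun: str
    article: str  # "der", "die", or "das"
    box: int = 0


def review(card: Card, answer: str) -> int:
    """Update the card's box and return the number of days until its next review."""
    if answer == card.article:
        # Correct answer: promote the card, capped at the last box.
        card.box = min(card.box + 1, len(INTERVALS_DAYS) - 1)
    else:
        # Wrong answer: send the card back to the most frequent box.
        card.box = 0
    return INTERVALS_DAYS[card.box]


card = Card(noun="Tisch", article="der")
first = review(card, "der")   # correct: card moves to box 1
second = review(card, "das")  # wrong: card drops back to box 0
print(first, second)
```

Nouns answered correctly come back at growing intervals, while mistakes reset the card so it reappears the next day.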
- 🪨 Modern lakehouse and open table formats: Delta Lake and Apache Iceberg patterns
- ✅ Data quality, contracts, lineage, and trustworthy pipelines at platform scale (Great Expectations, dbt tests)
- 🔴 Real-time data products powered by Kafka, Kinesis, Flink & Spark
- 🧩 End-to-end GenAI and agentic AI: autonomous multi-step agents, tool/function calling, MCP, RAG over knowledge bases, and evaluation-driven iteration
- ☁️ Multi-cloud data & AI infrastructure with Terraform, Kubernetes, CI/CD
- ⚙️ ML & LLM platform engineering focused on reliability, governance, and cost-aware deployment
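
To make the data-contract interest above a little more concrete, here is a minimal sketch of a record-level contract check in plain Python. It deliberately does not use the Great Expectations or dbt APIs; the `ORDER_CONTRACT` schema, its column names, and the `contract_violations` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical contract: column name -> (expected type, whether nulls are allowed).
ORDER_CONTRACT = {
    "order_id": (int, False),
    "amount_eur": (float, False),
    "coupon_code": (str, True),
}


def contract_violations(row: dict, contract: dict) -> list[str]:
    """Return readable violations for one record; an empty list means it passes."""
    problems = []
    for column, (expected_type, nullable) in contract.items():
        if column not in row:
            problems.append(f"missing column: {column}")
            continue
        value = row[column]
        if value is None:
            if not nullable:
                problems.append(f"null not allowed: {column}")
        elif not isinstance(value, expected_type):
            problems.append(f"wrong type for {column}: {type(value).__name__}")
    return problems


good = contract_violations(
    {"order_id": 1, "amount_eur": 9.5, "coupon_code": None}, ORDER_CONTRACT
)
bad = contract_violations({"order_id": "1", "amount_eur": None}, ORDER_CONTRACT)
print(good, bad)
```

In practice a check like this would run at the boundary between producer and consumer, so that a breaking schema change fails fast instead of silently corrupting downstream tables.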
13+ verified badges → Credly Profile
Looking for a lead data engineer, need consultation on cloud architecture, or want to collaborate on innovative AI projects? I'd love to hear from you.




