Duncan610/README.md

Duncan Otieno

Data Engineer | Analytics Engineer • Data Pipeline Architect

Building reliable data infrastructure, one transformation at a time

LinkedIn • Email • Nairobi, Kenya 🇰🇪



👨🏾‍💻 What I Do

I transform raw data into reliable insights. Currently specializing in modern data stack engineering with a focus on:

  • Data Modeling → Dimensional modeling, slowly changing dimensions, data vault
  • Pipeline Orchestration → Airflow DAGs, dependency management, error handling
  • Analytics Engineering → dbt transformations, incremental materialization, data quality
  • Cloud Infrastructure → AWS data services, infrastructure-as-code, cost optimization
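
As an illustration of the slowly changing dimensions mentioned above, here is a minimal in-memory sketch of a Type 2 update, where a changed attribute closes the current row and appends a new version. The function name `apply_scd2` and the row fields (`key`, `attrs`, `valid_from`, `valid_to`) are hypothetical and chosen only for this example; a warehouse implementation would normally do this in SQL or dbt snapshots.

```python
from datetime import date


def apply_scd2(dimension, key, new_attrs, effective):
    """Apply a Type 2 change to a list of dimension rows (hypothetical layout).

    The open row for `key` (valid_to is None) is closed when its attributes
    differ from `new_attrs`, and a new open row is appended.
    """
    current = next(
        (row for row in dimension if row["key"] == key and row["valid_to"] is None),
        None,
    )
    if current is not None:
        if current["attrs"] == new_attrs:
            return dimension  # nothing changed, keep history as it is
        current["valid_to"] = effective  # close the previous version
    dimension.append(
        {"key": key, "attrs": dict(new_attrs), "valid_from": effective, "valid_to": None}
    )
    return dimension


# Usage: one customer moves city, so the dimension keeps both versions.
customers = []
apply_scd2(customers, 42, {"city": "Nairobi"}, date(2024, 1, 1))
apply_scd2(customers, 42, {"city": "Mombasa"}, date(2025, 1, 1))
```

Keeping the closed row, rather than overwriting it, is what lets later queries reconstruct the attribute values that were valid at any past date.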

🎯 Current Mission

Transitioning into production analytics engineering after completing a 1-year intensive data science certification and earning my AWS Cloud Practitioner certification. Building portfolio projects that solve real business problems with clean code and thoughtful architecture.


🚀 Featured Work

🏗️ Instacart Analytics Pipeline

Production-Grade E-Commerce Analytics Platform

Building an end-to-end analytics pipeline that processes customer transaction data using the modern data stack.

Stack: dbt • PostgreSQL • Airflow • Python • Docker • Snowflake
Highlights: Incremental ETL, data quality testing, CI/CD automation

Technical Deep Dive

Architecture:

  • Medallion architecture (Bronze → Silver → Gold layers)
  • Incremental materialization for performance
  • Great Expectations for data quality
  • GitHub Actions for continuous deployment
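
To make the layering and the incremental idea concrete, here is a toy in-memory sketch, not the project's actual code. It assumes hypothetical row fields (`order_id`, `user_id`, `amount`, `loaded_at`) and uses a simple high-water mark so that each run only touches bronze rows that arrived since the previous run.

```python
from dataclasses import dataclass, field


@dataclass
class MedallionPipeline:
    """Toy bronze -> silver -> gold pipeline; all names are hypothetical."""

    silver: dict = field(default_factory=dict)  # cleaned rows keyed by order_id
    watermark: int = 0  # highest loaded_at value already processed

    def run_incremental(self, bronze_rows):
        """Process only bronze rows newer than the watermark.

        Returns the number of new bronze rows examined in this run.
        """
        new_rows = [r for r in bronze_rows if r["loaded_at"] > self.watermark]
        for row in new_rows:
            # Silver layer: skip rows that fail basic quality rules.
            if row.get("order_id") is None or row["amount"] < 0:
                continue
            self.silver[row["order_id"]] = {
                "order_id": row["order_id"],
                "user_id": row["user_id"],
                "amount": float(row["amount"]),
            }
        if new_rows:
            self.watermark = max(r["loaded_at"] for r in new_rows)
        return len(new_rows)

    def gold_revenue_by_user(self):
        """Gold layer: aggregate silver rows into a per-user revenue summary."""
        totals = {}
        for row in self.silver.values():
            totals[row["user_id"]] = totals.get(row["user_id"], 0.0) + row["amount"]
        return totals
```

In a dbt project the same pattern is usually expressed with an incremental model that filters on a timestamp column; the watermark here plays that role.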

Business Impact:

  • Reduces data processing time by 80%
  • Automated data quality checks catch 99% of issues
  • Self-service analytics layer for stakeholders

💼 Technical Toolkit

Core Data Engineering Stack

Python SQL dbt Airflow PostgreSQL Snowflake

Cloud & DevOps

AWS Docker Git GitHub Actions Terraform

Data Science & ML

Pandas NumPy Scikit-learn Jupyter

Data Engineering

  • Languages: Python, SQL, Bash
  • Orchestration: Apache Airflow
  • Transformation: dbt (data build tool)
  • Databases: PostgreSQL, Snowflake
  • Data Quality: Great Expectations
  • Version Control: Git, GitHub Actions

Cloud & Infrastructure

  • Cloud Platform: AWS (EC2, S3, RDS, Lambda)
  • Containerization: Docker, Docker Compose
  • IaC: Terraform (learning)
  • BI Tools: Tableau, Power BI (basic)
  • ML Background: Scikit-learn, Pandas, NumPy
  • IDEs: VS Code, Jupyter, PyCharm

📈 What Makes Me Different

🔍 Detail-Oriented Engineering
I don't just make pipelines work; I make them maintainable, testable, and cost-efficient.

🧠 Business-First Mindset
Every technical decision traces back to business value. Data engineering isn't just moving data; it's enabling better decisions.

📚 Continuous Learner
From data science certification to cloud engineering to analytics engineering, I'm always expanding my technical horizons.

🌍 Global Perspective, Local Impact
Based in Nairobi, building skills that compete globally while looking to create impact locally.


💻 Philosophy in Code

def approach_to_data_engineering():
    principles = {
        "quality": "Test everything, twice",
        "efficiency": "Automate the boring stuff",
        "clarity": "Code is read more than written",
        "impact": "Focus on business value"
    }
    return principles

📊 2025 Focus Areas

graph LR
    A[Modern Data Stack] --> B[dbt Mastery]
    A --> C[Airflow Expertise]
    A --> D[Cloud Architecture]
    B --> E[Production Projects]
    C --> E
    D --> E
    E --> F[Analytics Engineering Role]

Currently Building:

  • ✅ Production-grade data pipelines
  • ✅ Data quality frameworks
  • ✅ CI/CD for analytics code
  • 🎯 Real-time streaming (next phase)

📍 Now

Last updated: January 2025

Currently:

  • 🔨 Building: Data engineering and analytics engineering projects
  • 📚 Learning: Advanced data modeling patterns (Kimball methodology)
  • 🎯 Seeking: Data Engineer / Analytics Engineer roles
  • 🌱 Reading: "The Data Warehouse Toolkit" by Ralph Kimball

This Week:

  • Implementing incremental loads in dbt
  • Building Airflow DAGs for orchestration
  • Networking with data engineers on LinkedIn
  • Contributing to data engineering communities

💭 Philosophy

"The best data pipeline is the one you don't have to think about: it just works, scales, and alerts you when it doesn't."

I believe in:

  • Automation over manual work → If I do it twice, I automate it
  • Documentation as code → Good docs prevent 3 AM debugging sessions
  • Test-driven development → Catch bugs before they catch you
  • Incremental improvement → Small wins compound into excellence

🎓 Certifications & Education

AWS Certified Cloud Practitioner • 2024
ALX Data Science Tech Programs • 1-Year Program • 2023-2024


⚑ Fun Fact

When I'm not building data pipelines, I'm probably:

  • ⚽ Training for a football tournament around Nairobi
  • ☕ Experimenting with pour-over coffee (yes, I track the extraction ratios in a spreadsheet)
  • 📖 Reading technical blogs and data engineering case studies
  • ♟️ Playing chess online (data analysis extends to opening theory!)

I've written SQL queries that join 10+ tables without losing my sanity. My secret? CTEs, lots of CTEs.
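
For readers unfamiliar with the pattern, here is a small, self-contained sketch of how CTEs break a query into named steps. It runs against an in-memory SQLite database through Python's standard `sqlite3` module; the `orders` table, its columns, and the 25-unit threshold are made up for illustration.

```python
import sqlite3

# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 30.0), (3, 2, 5.0);
    """
)

# Each CTE names one step, so the final SELECT reads like a summary.
QUERY = """
WITH user_totals AS (
    SELECT user_id, SUM(amount) AS total_spent, COUNT(*) AS order_count
    FROM orders
    GROUP BY user_id
),
high_value AS (
    SELECT user_id, total_spent, order_count
    FROM user_totals
    WHERE total_spent >= 25
)
SELECT user_id, total_spent, order_count
FROM high_value
ORDER BY user_id
"""


def high_value_users(connection):
    """Return (user_id, total_spent, order_count) for users at or above the threshold."""
    return connection.execute(QUERY).fetchall()


print(high_value_users(conn))  # [(1, 50.0, 2)]
```

The same structure scales to wider joins: each source table gets its own CTE for filtering and renaming, and the joins happen in a final step that stays short.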


🀝 Let's Connect

I'm actively seeking Analytics Engineer or Junior Data Engineer roles where I can:

  • Build scalable data infrastructure
  • Work with the modern data stack (dbt, Airflow, Snowflake, Databricks)
  • Collaborate with data teams solving real problems
  • Learn from experienced engineers

Reach out if you're:

  • Hiring for analytics engineering roles
  • Interested in discussing data architecture
  • Building something interesting in the data space
  • Looking for collaboration on open-source data tools

📧 Email: otienoduncan99@gmail.com
💼 LinkedIn: duncan-otieno
📍 Location: Nairobi, Kenya (Open to remote)
🕐 Timezone: EAT (UTC+3)


📊 GitHub Activity


💡 Current Status

+ Building production-grade projects
+ Networking with data engineering community
+ Actively seeking analytics engineering roles
! Available for opportunities - Let's build something great together

"Data is the new electricity, and Engineers are the power grid. Keep Building, Keep Automating, Keep Scaling."



Pinned

  1. instacart-analytics-pipeline (Public)

    Python

  2. urban-pulse-analytics-pipeline (Public)

    UrbanPulse ingests live NYC public data from three sources, transforms it through a production-grade medallion architecture, and surfaces the answers.

    Python 1

  3. sql-warehouse (Public)

    Building a data warehouse using PostgreSQL, including ETL processes, data modelling and analytics.

    PLpgSQL

  4. devopsprojects (Public)

    JavaScript 1

  5. stock-trading-python-app (Public)

    Uses the polygon.io API to extract data about stocks.

    Python

  6. cinemaiq (Public)

    Python