lucasskuja/README.md

Hi, I'm Lucas Skuja

Senior Data Engineer focused on building scalable cloud data platforms, reliable analytics workflows, and practical GenAI-enabled solutions.

I build production-style data workflows across GCP, AWS, and Azure, with hands-on work in Python, SQL, Airflow, dbt, BigQuery, Databricks, Spark, and Terraform.

What I Build

  • Scalable batch and orchestration workflows for analytics and operational data use cases
  • Cloud-native data platforms with infrastructure as code, testing, and CI/CD
  • Analytics engineering layers for trustworthy reporting and decision support
  • GenAI-enabled workflows for reporting, knowledge access, and data product experiences

Selected Projects

gcp-iot-analytics-dbt: Production-style GCP data platform for IoT energy monitoring using Cloud Run Jobs, Composer, BigQuery, dbt, Terraform, and CI workflows.

terraform-datalake: AWS data lake infrastructure provisioned with Terraform, modular environments, secure defaults, MWAA orchestration, and analytics-ready services.

b3-ref-rates-pipeline: ETL and reporting pipeline for B3 reference rates with Airflow orchestration, layered storage, and LLM-assisted executive reporting.

documentation: Bilingual technical documentation portfolio covering data architecture, pipelines, quality, observability, governance, and cloud patterns.

Core Areas

  • Data platform design
  • Analytics engineering
  • Workflow orchestration
  • Cloud architecture
  • Infrastructure as code
  • Data quality and observability
  • GenAI applied to data workflows

Current Focus

  • Strengthening production-style portfolio projects for international opportunities
  • Expanding practical GenAI use cases for data teams and analytics workflows
  • Deepening multi-cloud platform patterns across GCP, AWS, and Azure

Tech Stack

Python, SQL, Spark, PySpark, Airflow, dbt, BigQuery, Databricks, Terraform, GCP, AWS, Azure

Connect


I use this GitHub to share practical data platform projects, architecture decisions, and applied experiments at the intersection of data engineering and GenAI.

Pinned

  1. b3-ref-rates-pipeline (Public)

    This repository implements an extract, transform, load (ETL) pipeline for the B3 Reference Rates (Taxas Referenciais da B3), using Apache Airflow.

    Python

  2. documentation (Public)

    This repository serves as a bilingual data engineering documentation portfolio, bringing together foundational and advanced concepts across the field. The content is organized in Markdown to docume…

  3. terraform-datalake (Public)

    AWS data lake infrastructure with Terraform, modular environments, MWAA orchestration, and secure analytics services.

    HCL

  4. gcp-iot-analytics-dbt (Public)

    GCP data pipeline for IoT energy monitoring with Cloud Run Jobs, Composer, BigQuery, and dbt.

    Python