galafis/README.md



About Me

Data Scientist with hands-on experience in Machine Learning, Deep Learning, and Generative AI (LLMs/RLHF), working across the full lifecycle: ETL/ELT, predictive modeling, deployment, MLOps, and monitoring. Background in real-time fraud detection (30K+ transactions/day), LLM refinement with RLHF, and analytical solutions on GCP.

Currently pursuing a postgraduate degree in AI & Health Data Science at Instituto Sírio-Libanês and an MBA in Data Science & AI at USP/Esalq.

  • Based in Curitiba, PR, Brazil
  • 454+ public repositories across 10+ languages
  • Certified by Google, IBM, Johns Hopkins & Wharton
  • Open to opportunities in Data Science, MLOps & GenAI

Tech Stack

Languages

Python SQL R TypeScript JavaScript Java Rust Go

Machine Learning & AI

Scikit-Learn TensorFlow PyTorch XGBoost LightGBM HuggingFace OpenCV SHAP

Data Engineering & MLOps

Apache Spark Kafka MLflow Docker Kubernetes GitHub Actions FastAPI Streamlit

Cloud & Databases

GCP BigQuery PostgreSQL MongoDB MySQL

BI & Visualization

Power BI Tableau Plotly Looker Studio


Featured Projects

Ensemble of 4 models (RF, XGBoost, Neural Networks, Autoencoders) with end-to-end MLOps pipeline. AUC 0.94 | Latency < 200ms | 30K+ transactions/day.

Python TensorFlow XGBoost MLflow Kafka
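
The blurb above names the four models but not how their outputs are combined. As a minimal sketch, assuming weighted soft voting over per-model scores already scaled to [0, 1] (the function name, weights, scores, and threshold below are all hypothetical, not taken from the project):

```python
from typing import Dict


def ensemble_score(model_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-model fraud scores (soft voting)."""
    total_weight = sum(weights[name] for name in model_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(model_scores[name] * weights[name] for name in model_scores)
    return weighted_sum / total_weight


# Hypothetical scores for one transaction; the values are made up for illustration.
scores = {"random_forest": 0.80, "xgboost": 0.90, "neural_net": 0.70, "autoencoder": 0.60}
# Equal weights are an assumption; a real system would likely tune them on validation data.
weights = {"random_forest": 1.0, "xgboost": 1.0, "neural_net": 1.0, "autoencoder": 1.0}

fraud_probability = ensemble_score(scores, weights)
is_flagged = fraud_probability >= 0.5  # hypothetical decision threshold
```

An autoencoder normally yields a reconstruction error rather than a probability, so the sketch assumes that error has been rescaled before it enters the average.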

Real-time analytics processing 10K+ events/second for market microstructure insights, trading signal generation, and performance monitoring.

Python PySpark Kafka PostgreSQL

Medical entity extraction (ICD-10, medications, symptoms) from clinical texts using BERTimbau/BioBERTpt Transformers with optimized NER F1-score.

Python Transformers NER FastAPI

End-to-end pipeline for DNA-seq, RNA-seq, single-cell & ChIP-seq workflows with ML-based insights on HPC and cloud (AWS, GCP, Azure).

Python Bioinformatics ML Cloud

Fairness auditing for HR models: 5 metrics (including Disparate Impact, Demographic Parity, and Equal Opportunity), SHAP by group, 3 mitigation techniques, automated HTML reports. 62 tests.

Python Fairlearn SHAP FastAPI Docker
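
To make the three named metrics concrete, here is a small standard-library sketch of how they are commonly defined for binary predictions and a binary group split. It is not the project's code (which lists Fairlearn in its stack); the function names and toy data are assumptions for illustration.

```python
from typing import Dict, List, Sequence


def _rate(values: List[int]) -> float:
    """Mean of a list of 0/1 values; raises if the group has no rows."""
    if not values:
        raise ValueError("group has no rows to evaluate")
    return sum(values) / len(values)


def fairness_report(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    group: Sequence[str],
    privileged: str,
) -> Dict[str, float]:
    """Compare selection rates and true positive rates between two groups."""
    rows = list(zip(y_true, y_pred, group))
    # Selection rate: share of positive predictions within each group.
    sel_priv = _rate([p for _, p, g in rows if g == privileged])
    sel_unpriv = _rate([p for _, p, g in rows if g != privileged])
    # True positive rate: share of actual positives predicted positive.
    tpr_priv = _rate([p for t, p, g in rows if g == privileged and t == 1])
    tpr_unpriv = _rate([p for t, p, g in rows if g != privileged and t == 1])
    return {
        # Ratio of selection rates; undefined if the privileged rate is zero.
        "disparate_impact": sel_unpriv / sel_priv,
        "demographic_parity_diff": sel_unpriv - sel_priv,
        "equal_opportunity_diff": tpr_unpriv - tpr_priv,
    }


# Toy data, invented for illustration only.
group = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

report = fairness_report(y_true, y_pred, group, privileged="A")
```

On this toy data group B is selected a third as often as group A (disparate impact of about 0.33), and both difference metrics come out at -0.5.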

Organizational Network Analysis with NetworkX: centrality metrics, bottleneck detection, knowledge loss risk, Louvain community detection, executive recommendations. 68 tests.

Python NetworkX FastAPI Streamlit
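
As a rough illustration of two of the ideas listed above, the sketch below computes degree centrality and flags "bottleneck" people as nodes whose removal disconnects the network (articulation points). The project itself uses NetworkX; this version uses only the standard library so it stays self-contained, and treating articulation points as bottlenecks is an assumption, as are the names in the toy graph.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map from (person, person) pairs."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph: Graph) -> Dict[str, float]:
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph)
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def stays_connected(graph: Graph, removed: str) -> bool:
    """Breadth-first search over the graph with one node left out."""
    nodes = [v for v in graph if v != removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(nodes)


def bottlenecks(graph: Graph) -> List[str]:
    """Nodes whose removal splits an otherwise connected graph."""
    return sorted(node for node in graph if not stays_connected(graph, node))


# Hypothetical collaboration edges, invented for illustration.
org = build_graph([
    ("ana", "bia"), ("ana", "caio"), ("bia", "caio"),
    ("caio", "duda"), ("duda", "eli"),
])
centrality = degree_centrality(org)
critical_people = bottlenecks(org)
```

The removal check is quadratic in graph size, which is fine for a sketch; a library routine for articulation points would be the better choice on large networks.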

End-to-end MLOps pipeline for predicting employee turnover risk with automated retraining, model monitoring, and drift detection.

Python MLflow Docker CI/CD
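
The blurb does not say which drift test the pipeline uses. One common option is the Population Stability Index (PSI), sketched below under that assumption; the function name, bin count, and sample data are hypothetical.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a baseline sample and a new sample of one feature."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the baseline range; fall back to 1.0 if it is constant.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the baseline range
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic data: a uniform baseline and a sample shifted toward higher values.
baseline = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]

no_drift = population_stability_index(baseline, baseline)
drift_score = population_stability_index(baseline, shifted)
# A commonly cited rule of thumb treats PSI above 0.2 as a significant shift.
needs_retraining = drift_score > 0.2
```

In a retraining pipeline, a check like this would typically run per feature on a schedule, with the threshold tuned to how costly false alarms are.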

RAG-powered assistant for HR policy Q&A using LLMs with retrieval-augmented generation over corporate policy documents.

Python LangChain LLMs RAG FastAPI

450+ repositories spanning Data Science, ML/AI, Data Engineering, Quantitative Finance, HealthTech, HR Tech, and more. Explore all repositories →


Certifications


  • Google: Advanced Data Analytics, Data Analytics
  • IBM: AI Engineering, Data Engineering, Machine Learning, GenAI Engineering, Deep Learning, Data Science
  • Johns Hopkins: Data Science Specialization
  • Wharton (UPenn): Business Analytics

Education

| Degree | Institution | Period |
| --- | --- | --- |
| Postgraduate — AI & Health Data Science | Instituto Sírio-Libanês (IEP/HSL) | 2026 – 2027 |
| MBA — Data Science, AI & Analytics | USP / Esalq | 2026 – 2027 |
| B.Tech — Systems Analysis & Development | UniDomBosco | 2022 – 2025 |
| B.Tech — Cyber Defense | UniDomBosco | 2022 – 2025 |
| B.Tech — IT Management | UniDomBosco | 2022 – 2025 |
| Data Scientist (Professional) | EBAC | 2024 – 2025 |


Experience Highlights

Data Analyst / Data Scientist        —  trade2go            (Oct/2025 – Mar/2026)
├── Full ML/AI cycle (POC → Production): regression, classification, clustering
├── EDA on datasets of 2M+ records, identifying business KPI patterns
└── 6+ dashboards (Power BI, Looker Studio) | SQL optimization (+40% performance)

Data Science Researcher              —  Manus AI            (Mar/2025 – Present)
├── R&D in AI / ML / Deep Learning: 5+ architectures benchmarked
└── Feature engineering on 80+ raw features (+8% accuracy improvement)

Cybersecurity Analyst                —  Sicredi PJ/Contt    (Jan/2023 – Jun/2025)
├── Real-time fraud detection: RF, XGBoost, Neural Nets, Autoencoders (-28% FP)
├── MLOps pipeline: Python, TensorFlow, Kafka, Spark, MLflow (99.2% uptime)
└── Anomaly detection contributing to -15% in financial losses

Full-Stack Developer Intern          —  EBANX               (Mar/2022 – Jan/2023)
├── Scalable web applications (PHP, JS, HTML5, CSS)
└── MySQL query optimization (-25% response time)

Domain Expertise

mindmap
  root((Gabriel Lafis))
    Machine Learning
      Supervised & Unsupervised
      Ensemble Methods
      Hyperparameter Optimization
      Feature Engineering
    Deep Learning & NLP
      Transformers & BERT
      LLMs & RLHF
      NER & Text Mining
      Computer Vision
    MLOps & Engineering
      CI/CD Pipelines
      Model Monitoring
      Docker & Kubernetes
      Real-Time Streaming
    Data Engineering
      ETL/ELT Pipelines
      Apache Spark & Kafka
      Data Warehousing
      BigQuery & GCP
    Domain Applications
      Financial Fraud Detection
      HealthTech & Clinical NLP
      Quantitative Finance
      HR Tech & People Analytics

Let's Connect

I'm always open to discussing Data Science, MLOps architectures, or AI applications in Finance, Health, and HR.

LinkedIn Email


Curitiba, PR, Brazil | Open to remote & hybrid opportunities

Pinned

  1. Advanced-ML-Pipeline (Public)

    ML pipeline for classification with automated EDA, comparison of 4 models (RF, GB, LR, SVM), GridSearchCV, and model persistence. Educational project.

    Python 1

  2. ai-financial-fraud-detection (Public)

    AI-powered fraud detection system for financial transactions. Uses ensemble models, anomaly detection, and real-time scoring to identify fraudulent patterns.

    Python 1

  3. genomic-data-analysis-pipeline (Public)

    End-to-end genomic data analysis pipeline: DNA-seq, RNA-seq, single-cell & ChIP-seq workflows with ML-based insights on HPC and cloud (AWS, GCP, Azure)

    Python 1

  4. high-frequency-trading-analytics (Public)

    Real-time analytics platform for high-frequency trading data. Processes tick-level data with ultra-low latency for market microstructure insights and trading performance analysis.

    Python 1

  5. clinical-nlp-pipeline-ptbr (Public)

    Clinical NLP pipeline for Brazilian Portuguese: extraction of medical entities (NER) from clinical texts using Transformers (BERTimbau/BioBERTpt).

    Python 2 1