Skip to content

JheyssonDouglas/sql-showcase

Repository files navigation

SQL Showcase — Olist E-Commerce Analytics

End-to-end SQL analysis of Brazil's largest public e-commerce dataset, demonstrating skills from fundamentals to production-grade analytics engineering.

SQL Dataset Status


Overview

This repository is a structured SQL portfolio built on the Brazilian E-Commerce Public Dataset by Olist — a real dataset with over 100,000 orders, 99k customers, 3k sellers, and 32k products.

The project covers the full analytical lifecycle:

  • Data exploration and validation
  • Core SQL techniques (joins, aggregations, window functions)
  • Business analytics with real KPIs
  • Analytics engineering with dimensional modeling
  • Query optimization and performance analysis

Dataset Schema

customers ──────── orders ─────────── order_items ──── products
    (customer_id)      (order_id)           (order_id)      (product_id)
                           │                    │
                           │               order_items ──── sellers
                           │                                (seller_id)
                    order_payments
                    order_reviews
Table Rows Description
customers 99,441 Customer records with location
orders 99,441 Orders with status and timestamps
order_items 112,650 Products sold within each order
products 32,951 Product catalog with categories
sellers 3,095 Marketplace seller information
order_payments 103,886 Payment method and installments
order_reviews 99,224 Customer satisfaction scores

Project Structure

sql-showcase/
├── 00_data_understanding/      # Schema exploration and relationship validation
├── 01_select_and_filtering/    # SELECT, WHERE, ORDER BY, LIKE, BETWEEN, IN
├── 02_joins/                   # INNER, LEFT, multiple joins, anti-join patterns
├── 03_aggregations/            # GROUP BY, HAVING, COUNT, SUM, AVG, ROLLUP
├── 04_ctes_and_subqueries/     # CTEs, correlated subqueries, multi-step logic
├── 05_window_functions/        # ROW_NUMBER, RANK, LAG/LEAD, running totals
├── 06_business_cases/          # Real KPIs: revenue, RFM, LTV, delivery SLA
├── 07_analytics_engineering/   # Dimensional modeling: fact + dim tables
├── 08_query_optimization/      # Performance rewrites and EXPLAIN analysis
├── database/
│   └── setup.sql               # DuckDB setup script
└── datasets/                   # Olist CSV files

Sections

Schema exploration, primary/foreign key validation, relationship mapping, and business question formulation before any analytical work.

Foundational query patterns: column selection, row filtering, pattern matching, null handling, and sorting. Covers WHERE, LIKE, BETWEEN, IN, IS NULL, ORDER BY, DISTINCT.

Combining data across tables: INNER JOIN, LEFT JOIN, multi-table joins, and anti-join patterns using LEFT JOIN ... WHERE IS NULL.

Summarizing data at scale: GROUP BY, HAVING, COUNT DISTINCT, revenue rollups, and ROLLUP for subtotals.

Multi-step logic with WITH clauses, scalar subqueries, subqueries in FROM and WHERE, correlated subqueries, and chained CTEs.

Advanced analytics: ROW_NUMBER, RANK, DENSE_RANK, LAG/LEAD, running totals, moving averages, NTILE, and PERCENT_RANK.

Real-world KPIs: monthly revenue, YoY growth, customer RFM segmentation, seller performance scorecard, delivery SLA analysis, and LTV estimation.

Dimensional modeling in SQL: building dim_customer, dim_product, dim_seller, dim_date, and fact_sales from raw tables using CREATE TABLE AS SELECT.

Performance-focused rewrites: eliminating SELECT *, early filtering, index recommendations, CTE vs subquery trade-offs, and EXPLAIN output analysis.


Setup

Requirements: DuckDB CLI or any DuckDB-compatible client (DBeaver, VSCode extension).

  1. Clone this repository
  2. Download the dataset from Kaggle — Brazilian E-Commerce Public Dataset by Olist and place the CSV files in the datasets/ folder
  3. Run the setup script to create the database:
duckdb database/olist.duckdb < database/setup.sql
  1. Open any .sql file and run queries against database/olist.duckdb

Skills Demonstrated

Area Tools / Concepts
SQL Fundamentals SELECT, WHERE, JOIN, GROUP BY, ORDER BY
Advanced SQL Window Functions, CTEs, Subqueries, ROLLUP
Business Analytics KPI design, RFM, LTV, cohort-style analysis
Analytics Engineering Star schema, fact/dim modeling, CREATE TABLE AS
Query Performance EXPLAIN, index strategy, query rewriting
Data Platform DuckDB, columnar storage, CSV ingestion

Author

Jheysson Douglas Analytics Engineer | Data Engineer

LinkedIn GitHub

About

SQL portfolio on the Olist Brazilian e-commerce dataset — from fundamentals (joins, aggregations) through window functions, business KPIs (RFM, LTV), dimensional modeling, and query optimization, using DuckDB.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors