Skip to content

SaeedTaghavi/metabase-postgres-tutorial

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Build a Metabase Dashboard on Postgres — from scratch

A hands-on, fully-dockerized tutorial. A small sensor simulator streams synthetic data into a Postgres warehouse, and you build an operational KPI dashboard in Metabase — step by step, with screenshots for every click.

Why synthetic data? Because the point is to mimic a real-time data-producing pipeline without needing real plant access, credentials, or a big download. Simulated "machines" continuously write production telemetry into Postgres — exactly the shape of data a real PLC/sensor system produces — so the whole stack comes alive with one command and you can practise the analyst's real job: turning a live data stream into KPIs, dashboards, and decisions.

The example domain is manufacturing (machines, sensors, OEE), chosen to mirror a Data Analyst brief: exploratory analysis of operational data, dashboards, KPI/metric definition, descriptive + diagnostic views, and functional data-quality validation.

Operational dashboard

Reading the dashboard (an example insight)

Note: the data here is synthetic, so the figures are illustrative — not a real finding about real machines. The point of this section is to show how an analyst reads a dashboard and turns it into a decision. The actual conclusions are always specific to the real data and business you're working with; the same chart in a different context would tell a different story.

Reading the dashboard top-down tells a clear operational story. All three machines show healthy Quality (~0.93) and Performance (~0.92) — they're not producing bad parts or running slowly. The low OEE (0.38–0.45, well below the ~0.85 world-class benchmark) is driven almost entirely by Availability: the machines are stopped too often. The failure breakdown locates the cause — stoppages are dominated by heat-dissipation (HDF) and power (PWF) failures, and PRESS-01 carries a PWF problem the others don't. The takeaway: focus maintenance on availability, not quality or speed — specifically, investigate PRESS-01's power supply and the heat-dissipation conditions affecting the line.

That single sentence — what's fine, what's wrong, the specific cause, the action — is the actual deliverable; the dashboard is the evidence for it. (More on this workflow in docs/02-data-analyst-playbook.md.)

Companion reading — a short course, in order

  1. docs/01-what-is-a-data-analyst.md — what the role is, what a KPI is, and the reports different industries (fintech, IoT, e-commerce) ask for.
  2. docs/02-data-analyst-playbook.md — the craft: how a business question becomes a metric, becomes SQL, becomes a chart, becomes a decision.
  3. docs/03-data-platforms-explained.md — why this repo uses one Postgres, and how operational databases, data warehouses, and data lakes differ in the real world.
  4. docs/04-setup-runbook.md — the full step-by-step build, with a screenshot at every step and a "configure-as-code" section.

Architecture

machine-cnc01   ┐
machine-cnc02   ├──►  Postgres ("plant" DB)  ◄──  Metabase  ──►  dashboard
machine-press01 ┘        (warehouse)              (BI layer)     localhost:3000
   (sensor sim)                                                  (decisions)

All three machines run the same image; only their command-line arguments differ. That is the Dockerfile ENTRYPOINT (the script) + CMD (per-machine args, overridden in compose) pattern — one build, many machines.

In a real system the sensors would write to a live operational/time-series database, and a pipeline would move that data into a separate warehouse for analytics. Here a single Postgres plays the warehouse role for simplicity — see docs/03-data-platforms-explained.md.

Stack

  • Postgres 16 — the warehouse the sensors write to and Metabase reads from.
  • Metabase — open-source BI; SQL questions assembled into a dashboard.
  • Python sensor simulator — AI4I-style telemetry (temperatures, speed, torque, tool wear) plus five failure modes and OEE-ready production fields.
  • Docker Compose — brings the whole thing up with one command.

Quickstart

docker compose up --build

Each machine backfills ~2000 historical cycles instantly, then streams live, so dashboards have data within seconds. Check the warehouse directly:

docker compose exec postgres psql -U sensors -d plant \
  -c "SELECT machine_id, count(*) FROM sensor_readings GROUP BY 1;"

Run it a few times — the counts climb (here 2.6k → 3.5k → 93k across runs), which proves the sensor stream is genuinely live, not a one-off load:

Row counts climbing across repeated runs

Then connect Metabase and build the dashboard by following docs/04-setup-runbook.md. In short: open http://localhost:3000, add the Postgres database (host postgres, db plant, user sensors, password sensors_pw), create one SQL question per file in sql/kpi/, and arrange them on a dashboard.

"Live" dashboards: the sensor stream is genuinely continuous, but a Metabase chart runs its SQL once on load. Turn on the dashboard's auto-refresh (clock icon → 1 minute) for a continuously-updating view. It's near-real-time (1-minute floor), which is what "real-time" means for an operations dashboard in practice.

KPIs

One file per question in sql/kpi/:

Query Question it answers Chart
OEE by Machine How effective is each machine overall? bar
Failure Breakdown by Type Which failure modes dominate, per machine? bar
Downtime % by Hour When and how much are machines stopped? line
Tool Wear Trend How does tool wear evolve between changes? line
Data Quality Checks Is the data internally consistent? (all 0 = healthy) table

Repo structure

.
├── README.md
├── docker-compose.yml          # Postgres + Metabase + 3 machines
├── sensor/
│   ├── sensor.py               # the simulator (ENTRYPOINT target)
│   ├── Dockerfile              # single-stage (prebuilt psycopg2 wheel)
│   ├── Dockerfile.multistage   # learning variant: builder/runtime split
│   └── requirements.txt
├── sql/kpi/                    # one SQL file per Metabase question
└── docs/
    ├── 01-what-is-a-data-analyst.md     # the role, KPIs, industry reports
    ├── 02-data-analyst-playbook.md      # the analyst craft, explained
    ├── 03-data-platforms-explained.md   # DB vs warehouse vs lake
    ├── 04-setup-runbook.md              # illustrated step-by-step build
    └── *.png                            # screenshots

Notes

  • Data is synthetic by design (see the "why" above). The simulator is modelled on the UCI AI4I 2020 Predictive Maintenance dataset, so the columns and failure modes mirror a real machine log. Swap in the real CSV to validate distributions.
  • Tweak a machine by editing its command: args in compose, e.g. --interval 0.2 (rows/sec), --down-prob 0.08, --backfill 20000.

About

Learn to build a Metabase + Postgres dashboard end to end. Synthetic data, one-command setup, illustrated step-by-step runbook.

Topics

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors