SûrEtBon Backend Infrastructure

Executive Summary

SûrEtBon is a backend infrastructure for restaurant safety assessment, combining government health inspection data with public ratings to provide comprehensive food establishment evaluation. Built on Supabase and following medallion architecture patterns, it delivers scalable, secure, and maintainable data processing capabilities.

Important Note: Initialization vs Data Updates

This backend project provides one-time initialization scripts to set up the infrastructure and load initial data. Ongoing data updates are handled by a separate project (data-pipeline) using Apache Airflow for scheduled ETL operations.

  • This project (backend): Initial setup, migrations, and infrastructure maintenance
  • data-pipeline project: Scheduled data updates, ETL pipelines, and data refresh operations

Table of Contents

  • Architecture Overview
  • Prerequisites
  • Setup
  • Applying New Migrations (Post-Initialization)
  • Project Structure
  • Data Architecture
  • Data Sources

Architecture Overview

System Design

SûrEtBon implements a modern data platform architecture with separated initialization and update processes:

┌───────────────────────────────────────────────────────────────────────────┐
│                        External Data Sources                              │
│  ┌──────────────┐  ┌───────────────┐  ┌───────────────┐  ┌─────────────┐  │
│  │ OpenStreetMap│  │ Alim'confiance│  │ Google Places │  │ Tripadvisor │  │
│  └──────┬───────┘  └───────┬───────┘  └───────┬───────┘  └──────┬──────┘  │
└─────────┼──────────────────┼──────────────────┼─────────────────┼─────────┘
          ▼                  ▼                  ▼                 ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                     Data Ingestion (Two Projects)                         │
│         ┌────────────────────────┐  ┌──────────────────────────┐          │
│         │  Backend (This Project)│  │  data-pipeline (Airflow) │          │
│         │  • Initial data load   │  │  • Scheduled updates     │          │
│         │  • One-time setup      │  │  • ETL pipelines         │          │
│         │  • Infrastructure      │  │  • Data refresh          │          │
│         └────────────┬───────────┘  └───────────┬──────────────┘          │
└──────────────────────┼──────────────────────────┼─────────────────────────┘
                       ▼                          ▼
┌───────────────────────────────────────────────────────────────────────────┐
│                           Supabase Platform                               │
│         ┌──────────────────────────────────────────────────────┐          │
│         │                  Storage (Data Lake)                 │          │
│         │                  Parquet File Archive                │          │
│         └────────────────────────┬─────────────────────────────┘          │
│                                  ▼                                        │
│         ┌──────────────────────────────────────────────────────┐          │
│         │              PostgreSQL Database                     │          │
│         │  ┌────────────┐  ┌────────────┐  ┌────────────┐      │          │
│         │  │   Bronze   │→ │   Silver   │→ │    Gold    │      │          │
│         │  │   Schema   │  │   Schema   │  │   Schema   │      │          │
│         │  └────────────┘  └────────────┘  └────────────┘      │          │
│         └──────────────────────────────────────────────────────┘          │
└───────────────────────────────────────────────────────────────────────────┘

Medallion Architecture

The system implements a three-layer medallion architecture for progressive data refinement (sketched after this list):

  • Bronze Layer: Raw data ingestion with minimal transformation
  • Silver Layer: Cleansed, validated, and standardized data
  • Gold Layer: Business-ready aggregations and metrics

Geospatial Capabilities

The platform includes comprehensive PostGIS support for location-based analysis (an example query follows the list):

  • PostGIS Core: 2D/3D geometry and geography types for restaurant locations
  • Raster Support: Heatmap generation and density analysis
  • Advanced 3D: Complex geometric calculations for urban environments
  • Topology Management: Administrative boundaries and inspection zones
  • Spatial Indexing: High-performance proximity searches and geographic queries

Prerequisites

Required Tools

1. Supabase CLI

The Supabase CLI manages local development and deployment:

📚 Installation Guide: https://supabase.com/docs/guides/cli/getting-started

2. Python Environment Manager (uv)

I recommend uv for fast, reliable Python dependency management:

📚 Installation Guide: https://docs.astral.sh/uv/getting-started/installation/

3. Authentication

Authenticate with your Supabase account:

supabase login

📚 Authentication Guide: https://supabase.com/docs/reference/cli/supabase-login

Setup

Step 1: Create Hosted Supabase Project

Create a new project on the Supabase platform:

supabase projects create SurEtBon \
  --db-password <strong-password> \
  --org-id <your-org-id> \
  --region eu-west-3

Parameters:

  • --db-password: Strong password (16+ chars, mixed case, numbers, symbols)
  • --org-id: Organization ID (run supabase orgs list to find)
  • --region: Geographic region (eu-west-3 for Paris)

Important: Save the displayed REFERENCE ID - you'll need it throughout setup.

Step 2: Start Local Development Environment

Initialize the local Supabase stack:

supabase start

Step 3: Link Local to Hosted Project

Connect your local environment to the cloud project:

supabase link --project-ref <reference-id>

Step 3 (Alternative): Self-Hosted Supabase Setup

For self-hosted Supabase instances, the supabase link command is not available. Skip Step 3 and proceed directly to Step 4.

The initialization script auto-detects self-hosted deployments and uses --db-url instead of --linked for database migrations. Ensure SUPABASE_DB_URI is correctly configured in your .env file.
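
Conceptually, the script's branch looks something like the sketch below, where SELF_HOSTED is a hypothetical flag standing in for the auto-detection:

# Sketch only; the script's actual detection logic may differ.
if [ "${SELF_HOSTED:-false}" = "true" ]; then
  supabase db push --db-url "postgres://$SUPABASE_DB_URI"  # self-hosted: direct connection
else
  supabase db push --linked                                # hosted: linked project
fi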

Step 4: Configure Environment Variables

  1. Copy template:

    cp .env.sample .env
    chmod 600 .env  # Secure file permissions
  2. Set SUPABASE_URL:

    SUPABASE_URL=https://<reference-id>.supabase.co
  3. Get API keys:

    supabase projects api-keys --project-ref <reference-id>

    Copy the service_role key to SUPABASE_SERVICE_ROLE_KEY

  4. Configure database URI:

    • Navigate to: https://supabase.com/dashboard/project/<reference-id>?showConnect=true
    • Select "Session pooler" connection type
    • Copy URI and remove postgresql:// prefix
    • Set in .env as SUPABASE_DB_URI

Step 5: Initialize Backend Infrastructure

Run the comprehensive initialization script:

./bin/initialize_backend.sh

This performs:

  • Storage bucket creation with security policies
  • Database setup via migrations (all schemas and tables)
  • Initial data import (OSM + Alim'confiance) for bootstrap

Expected runtime: 5-10 minutes

Note: This is a one-time initialization. Subsequent data updates will be handled by the data-pipeline project.
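
Progress can be followed in the timestamped log written under logs/ (see Project Structure):

tail -f logs/backend_initialization_*.log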

Applying New Migrations (Post-Initialization)

For users who have already initialized the backend and need to apply new migrations:

Hosted Supabase

Push pending migrations to your linked project:

supabase db push --linked

This applies all migrations in supabase/migrations/ that haven't been executed yet.

Self-Hosted Supabase

Push pending migrations using the database URL:

supabase db push --db-url "postgres://$SUPABASE_DB_URI"

Ensure SUPABASE_DB_URI is configured in your .env file.

Verifying Migration Status

Check which migrations have been applied:

# Hosted Supabase
supabase migration list --linked

# Self-hosted Supabase
supabase migration list --db-url "postgres://$SUPABASE_DB_URI"
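
To ship a schema change of your own, the usual workflow is to create a new timestamped migration and push it (the migration name here is illustrative):

supabase migration new add_inspection_zone_index
# edit the generated supabase/migrations/<timestamp>_add_inspection_zone_index.sql, then:
supabase db push --linked   # or --db-url "postgres://$SUPABASE_DB_URI" for self-hosted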

Project Structure

backend/
├── bin/                                                                       # Executable scripts
│   ├── initialize_backend.sh                                                  # Main orchestrator (Bash)
│   ├── setup_bucket.py                                                        # Storage configuration (Python)
│   ├── download_osm_data.py                                                   # OSM initial data loader (Python)
│   └── download_alimconfiance_data.py                                         # Government initial data loader (Python)
│
├── supabase/                                                                  # Supabase configuration
│   ├── config.toml                                                            # Service configuration
│   ├── migrations/                                                            # Database migrations
│   │   ├── 20251008212033_create_medallion_architecture.sql                   # Schemas: bronze, silver, gold
│   │   ├── 20251008230529_create_restaurant_ratings_enrichment_tables.sql     # API response tables
│   │   └── 20251025000000_enable_database_extensions.sql                      # PostGIS and fuzzystrmatch extensions
│   └── functions/                                                             # Edge Functions (future)
│
├── logs/                                                                      # Execution logs (gitignored)
│   └── backend_initialization_*.log                                           # Timestamped logs
│
├── .env.sample                                                                # Environment template
├── .env                                                                       # Local configuration (gitignored)
├── .gitignore                                                                 # Git exclusions
├── pyproject.toml                                                             # Python dependencies
└── README.md                                                                  # This documentation

Data Architecture

Storage Layer

Bucket: data_lake

  • Access: Private (service role only)
  • File Types: Parquet, CSV
  • Size Limit: 50MB per file
  • Organization: Date-partitioned folders

Database Schemas

Extensions Schema (PostgreSQL Extensions)

| Extension | Schema | Description | Use Case |
|-----------|--------|-------------|----------|
| postgis | extensions | Core geospatial types and functions | Restaurant proximity searches, coordinate transforms |
| postgis_raster | extensions | Raster data support | Heatmaps, density analysis, coverage visualization |
| postgis_sfcgal | extensions | Advanced 3D geometry operations | Complex spatial calculations, building-level analysis |
| postgis_topology | topology | Spatial topology management | Administrative boundaries, inspection zone management |
| fuzzystrmatch | extensions | Fuzzy string matching functions | Restaurant name matching, typo tolerance |
| geohash_adjacent | extensions | Compute the adjacent GeoHash cell in a direction | Spatial neighbor computation, boundary handling |
| geohash_neighbors | extensions | Get the 9-cell array (center + 8 neighbors) | Spatial batch processing, boundary coverage |

Spatial Reference Systems: WGS84 (4326), Web Mercator (3857), Lambert-93 (2154)
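
For instance, reprojecting a WGS84 point into Lambert-93 for metre-accurate measurements over France is a single function call (a sketch; the coordinates are the Louvre's):

psql "postgres://$SUPABASE_DB_URI" <<'SQL'
-- WGS84 (EPSG:4326) lon/lat reprojected to Lambert-93 (EPSG:2154)
SELECT ST_AsText(
    ST_Transform(ST_SetSRID(ST_MakePoint(2.3376, 48.8606), 4326), 2154)
);
SQL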

Bronze Schema (Raw Data)

| Table | Description | Initial Data | Row Count |
|-------|-------------|--------------|-----------|
| osm_france_food_service | OpenStreetMap restaurants | Latest snapshot at initialization | ~165K |
| export_alimconfiance | Health inspections | Latest available (daily refresh) | ~80K |
| google_places | Google Maps restaurant ratings | Populated by data-pipeline via Airflow | Variable* |
| tripadvisor_location_details | Tripadvisor restaurant ratings | Populated by data-pipeline via Airflow | Variable* |

*API-based tables: Row counts vary based on monthly query budget. Airflow prioritizes new restaurants first, then oldest updates. Historical data is preserved.
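
After initialization, the bronze layer can be sanity-checked against those orders of magnitude:

psql "postgres://$SUPABASE_DB_URI" <<'SQL'
SELECT count(*) AS osm_rows  FROM bronze.osm_france_food_service;  -- expect ~165K
SELECT count(*) AS alim_rows FROM bronze.export_alimconfiance;     -- expect ~80K
SQL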

Silver Schema (Cleansed Data)

Populated by the data-pipeline project using dbt transformations. Contains validated and standardized datasets in which OpenStreetMap restaurant records are linked to their official health inspection results, enabling accurate matching for rating enrichment.

Gold Schema (Business Layer)

Populated by the data-pipeline project using dbt transformations. Contains the final aggregated dataset, in which every restaurant is enriched with Google Maps and Tripadvisor ratings, providing comprehensive metrics and KPIs for food establishment assessment.

Data Sources

OpenStreetMap France

  • Provider: OpenDataSoft
  • Coverage: Metropolitan France + overseas
  • Initial Load: Latest available snapshot at initialization (dated to most recent Monday)
  • Format: Parquet
  • License: ODbL
  • Note: Regular updates handled by data-pipeline project. The initialization script calculates the most recent Monday date for consistency with OpenDataSoft's weekly refresh cycle.

Alim'confiance

  • Provider: Ministry of Agriculture (via OpenDataSoft DGAL)
  • Coverage: All French food establishments
  • Initial Load: Latest available snapshot at initialization
  • Format: Parquet
  • License: Open License 2.0
  • Note: Data refreshed daily. Regular updates handled by data-pipeline project

Version: 0.0.3 | Last Updated: 2025-12-09 | Maintainers: Jonathan
