WELL Labs Data Platform Uploader

A cross-platform terminal uploader for data.welllabs.org.

This repository contains a Python uploader that helps WELL Labs Data Platform users upload datasets and files from Windows, macOS, or Linux. It follows the same upload flow as the web application and adds a terminal-friendly workflow for metadata, API-key setup, and pre-upload sensitivity checks.

What It Does

  • Prompts for WELL_API_KEY if it is not already set.
  • Optionally saves the API key for future terminal sessions.
  • Provides a local Tkinter UI for users who prefer forms over terminal prompts.
  • Prompts for WELL Labs login credentials locally.
  • Creates a logged-in upload session.
  • Creates new datasets.
  • Adds files to existing datasets.
  • Replaces datasets by uploading a new dataset first, then asking whether to delete the old one.
  • Deletes datasets only after explicit confirmation.
  • Supports reviewed metadata JSON files so metadata can be prepared before upload.
  • Runs a pre-upload scan for sensitive or traceability fields.
  • Offers an interactive sanitization step to drop or anonymise selected columns or JSON keys.
  • Checks the upload size limit and can ZIP large files before upload.

The script does not contain credentials, API keys, or organization-specific data.

Requirements

  • Python 3.10 or newer.
  • A WELL Labs Data Platform account.
  • A WELL Labs Data Platform API key.
  • Permission to upload to data.welllabs.org.

No external Python packages are required. The script uses only the Python standard library.

Install

Download or clone this repository, then run the script directly.

git clone https://github.com/WhatsThisClint/welllabs-data-platform-uploader.git
cd welllabs-data-platform-uploader

On Windows, use PowerShell. On macOS or Linux, use Terminal.

Recommended Launcher

Use the launcher to choose between the local UI and the terminal workflow at startup.

Windows:

python .\welllabs_uploader.py

macOS or Linux:

python3 welllabs_uploader.py

The launcher asks which interface to use:

1. Local UI
2. Terminal

To skip the prompt, pass --ui or --terminal:

python3 welllabs_uploader.py --ui
python3 welllabs_uploader.py --terminal --mode new-dataset --file "/path/to/your-file.csv"

Local UI

Open the local uploader UI:

Windows:

python .\welllabs_uploader_ui.py

macOS or Linux:

python3 welllabs_uploader_ui.py

The UI runs on your computer. It lets you:

  • choose a file
  • fill in metadata through forms
  • scan for sensitive fields
  • drop or anonymise supported columns or JSON keys
  • save or load metadata JSON
  • ZIP files that exceed the platform upload limit
  • upload using your own API key and WELL Labs login

The UI does not save passwords. If you choose to save the API key, it uses the same WELL_API_KEY behavior as the terminal uploader.

Quick Start

Create a new dataset and upload one file.

Windows:

python .\welllabs_data_platform_uploader.py --mode new-dataset --file "C:\path\to\your-file.csv"

macOS or Linux:

python3 welllabs_data_platform_uploader.py --mode new-dataset --file "/path/to/your-file.csv"

If WELL_API_KEY is not set, the script prompts for the key. If you choose to save it, future runs read it from WELL_API_KEY.
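
The lookup order described above (environment first, prompt second) can be sketched as follows. This is an illustrative sketch, not the uploader's actual code: the function name resolve_api_key and its parameters are hypothetical, and how the real script persists a saved key is not shown here.

```python
import getpass
import os


def resolve_api_key(env=None, prompt=getpass.getpass):
    """Return the API key from WELL_API_KEY, or prompt for it if it is missing.

    `env` and `prompt` are injectable so the logic can be exercised without
    touching the real environment or terminal (hypothetical design choice).
    """
    env = os.environ if env is None else env
    key = (env.get("WELL_API_KEY") or "").strip()
    if key:
        return key
    # getpass hides the typed key so it does not end up in terminal scrollback.
    key = prompt("WELL Labs API key: ").strip()
    if not key:
        raise ValueError("An API key is required to upload.")
    return key
```

Using a hidden prompt rather than a command-line argument keeps the key out of shell history, which is consistent with the security notes in this README.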

Add a File

Add a file to an existing dataset:

python3 welllabs_data_platform_uploader.py --mode add-file --dataset-id DATASET_UUID --file "/path/to/your-file.csv"

Use Metadata JSON

Prepare metadata in advance and pass it to the uploader:

python3 welllabs_data_platform_uploader.py --mode new-dataset --file "/path/to/your-file.csv" --metadata "./metadata.example.json"

When --metadata is used, the script uses the JSON values and prompts only for missing required values, API-key setup, and login.

See metadata.example.json.
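
As a rough illustration of "prompt only for missing required values", the sketch below loads a metadata file and lists the fields that would still need a prompt. It assumes a flat JSON object keyed by the field names listed under Metadata Fields. Which fields the uploader actually treats as required is not stated here, so the `required` tuple is supplied by the caller, and both function names are hypothetical.

```python
import json


def load_metadata(text):
    """Parse metadata JSON text and make sure it is a JSON object."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("Metadata JSON must be an object.")
    return data


def missing_fields(metadata, required):
    """Return the required field names that are absent or empty.

    A value of False (for example "private": false) counts as present.
    """
    return [name for name in required if metadata.get(name) in (None, "", [])]
```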

Replace a Dataset

Replace mode uploads a new dataset first. Only after the new upload succeeds does the script ask whether to delete the old dataset.

python3 welllabs_data_platform_uploader.py --mode replace-dataset --dataset-id OLD_DATASET_UUID --file "/path/to/new-file.csv" --metadata "./metadata.example.json"
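
The ordering guarantee (upload first, then ask, then delete) can be sketched as below. The three callables are hypothetical stand-ins for the real upload, confirmation, and delete steps. The point of the sketch is only that a failed upload never reaches the delete step.

```python
def replace_dataset(upload_new, confirm_delete, delete_old):
    """Upload the replacement first; delete the old dataset only afterwards.

    upload_new     -> returns the new dataset ID, raises on failure
    confirm_delete -> returns True if the user agrees to delete the old dataset
    delete_old     -> deletes the old dataset
    """
    # If this raises, the function exits and the old dataset is left untouched.
    new_id = upload_new()
    deleted = False
    if confirm_delete():
        delete_old()
        deleted = True
    return new_id, deleted
```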

Delete a Dataset

Delete mode shows the dataset details and requires typing DELETE before it deletes anything.

python3 welllabs_data_platform_uploader.py --mode delete-dataset --dataset-id DATASET_UUID
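
A typed-confirmation guard of this kind might look like the following sketch. The function name, the prompt wording, and the exact-match rule are assumptions for illustration, not the script's actual implementation.

```python
def confirm_delete(dataset_title, ask=input):
    """Return True only if the user types DELETE exactly (case-sensitive)."""
    answer = ask(f'Type DELETE to permanently delete "{dataset_title}": ')
    return answer.strip() == "DELETE"
```

Requiring a full word instead of a y/n answer makes an accidental keystroke much less likely to trigger a deletion.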

Sensitive Data Preflight

Before uploading, the script scans supported file types for sensitive or traceability fields.

It checks for likely:

  • beneficiary or household fields
  • farmer, respondent, or person fields
  • phone, contact, email, address, Aadhaar, or identity fields
  • owner fields
  • crop fields that may contribute to household traceability
  • latitude, longitude, GPS, or coordinate fields
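
A column-name check along these lines can be sketched with simple keyword matching. The keywords below are taken from the list above. The category names, the substring-matching rule, and both function names are assumptions made for illustration, so the real scanner may match differently.

```python
import csv
import io

# Keyword groups mirror the categories listed in this README (names are hypothetical).
SENSITIVE_KEYWORDS = {
    "household": ("beneficiary", "household"),
    "person": ("farmer", "respondent", "person"),
    "contact_or_identity": ("phone", "contact", "email", "address", "aadhaar", "identity"),
    "owner": ("owner",),
    "crop": ("crop",),
    "location": ("latitude", "longitude", "gps", "coordinate"),
}


def flag_columns(column_names):
    """Map each suspicious column name to the categories whose keywords it contains."""
    findings = {}
    for name in column_names:
        lowered = name.lower()
        hits = [
            category
            for category, words in SENSITIVE_KEYWORDS.items()
            if any(word in lowered for word in words)
        ]
        if hits:
            findings[name] = hits
    return findings


def scan_csv_header(text, delimiter=","):
    """Flag columns using only the header row of CSV or TSV text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, [])
    return flag_columns(header)
```

Substring matching is deliberately broad, so it will produce false positives. That fits a preflight check whose findings are reviewed by a person before upload.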

Supported scans:

  • GeoPackage: .gpkg
  • CSV: .csv
  • TSV: .tsv
  • Excel workbook: .xlsx
  • JSON and GeoJSON: .json, .geojson

For GeoPackage files, the scan checks all non-system tables and layers. It checks column names across the whole package and samples text values for email, phone, and identity-like patterns.
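
Because a GeoPackage is an SQLite database, a scan like this can be sketched with the standard sqlite3 module. The prefix list used to skip system tables, the email pattern, and the function names are assumptions for illustration. The uploader's own rules for "non-system" tables and its value patterns may differ.

```python
import re
import sqlite3

# Assumed prefixes for GeoPackage/SQLite bookkeeping tables.
SYSTEM_PREFIXES = ("gpkg_", "rtree_", "sqlite_")
# Deliberately loose email-like pattern (illustrative only).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def _quote(identifier):
    """Quote an SQL identifier so unusual table or column names are safe to embed."""
    return '"' + identifier.replace('"', '""') + '"'


def list_user_columns(conn):
    """Return {table_name: [column names]} for every non-system table."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    result = {}
    for table in tables:
        if table.lower().startswith(SYSTEM_PREFIXES):
            continue
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        columns = conn.execute(f"PRAGMA table_info({_quote(table)})")
        result[table] = [row[1] for row in columns]
    return result


def sample_email_hits(conn, table, column, limit=100):
    """Count email-like values among up to `limit` text values of one column."""
    col, tbl = _quote(column), _quote(table)
    rows = conn.execute(
        f"SELECT {col} FROM {tbl} WHERE typeof({col}) = 'text' LIMIT ?", (limit,)
    )
    return sum(1 for (value,) in rows if EMAIL_RE.search(value))
```

For a real file, the connection would come from sqlite3.connect("/path/to/file.gpkg"). Sampling with LIMIT keeps the scan fast on large layers at the cost of possibly missing rare values.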

If possible sensitive fields are found, the script prints the findings and asks whether to continue.

It also offers to create a sanitized copy before upload. The original file is not modified. In the sanitizer, you can:

  • drop selected columns or JSON keys
  • anonymise selected columns or JSON keys with one-way stable tokens
  • write the prepared copy to a separate folder with --prepared-output-dir
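
One common way to produce one-way stable tokens is a keyed hash (HMAC-SHA256), sketched below for rows held as dictionaries. This is an assumption about technique, not a description of the uploader's actual method. The function names, the "anon_" prefix, and the token length are hypothetical.

```python
import hashlib
import hmac


def stable_token(value, secret, length=16):
    """Return a one-way token: the same value and secret always give the same token."""
    digest = hmac.new(secret, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return "anon_" + digest[:length]


def sanitize_rows(rows, secret, drop=(), anonymise=()):
    """Return sanitized copies of `rows`; the input rows are not modified."""
    cleaned = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            if key in drop:
                continue  # dropped columns never reach the prepared copy
            if key in anonymise and value not in (None, ""):
                value = stable_token(value, secret)
            new_row[key] = value
        cleaned.append(new_row)
    return cleaned
```

Stable tokens preserve the ability to group or join records by the anonymised column. The secret must stay private, because anyone who holds it can recompute tokens for guessed values.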

Manual sanitization currently supports:

  • GeoPackage: .gpkg
  • CSV: .csv
  • TSV: .tsv
  • JSON and GeoJSON: .json, .geojson

Run the sanitizer even when the automatic scan does not flag anything:

python3 welllabs_data_platform_uploader.py --mode new-dataset --file "/path/to/file.gpkg" --metadata "./metadata.example.json" --sanitize-before-upload

Write sanitized or compressed copies to a chosen folder:

python3 welllabs_data_platform_uploader.py --mode new-dataset --file "/path/to/file.gpkg" --prepared-output-dir "./prepared_uploads"

Stop immediately on possible sensitive fields:

python3 welllabs_data_platform_uploader.py --mode new-dataset --file "/path/to/file.gpkg" --metadata "./metadata.example.json" --fail-on-sensitive

Skip the scan only if the file has already been reviewed separately:

python3 welllabs_data_platform_uploader.py --mode new-dataset --file "/path/to/file.gpkg" --metadata "./metadata.example.json" --skip-sensitivity-check

Large Files

The current platform object upload limit is 5 GiB. Before upload, the script checks the final file size. If the file is larger than the limit, it asks whether to create a ZIP archive and upload that ZIP instead.
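
The size check and ZIP step can be sketched with the standard library as follows. The 5 GiB figure comes from this README. The function names and the choice to name the archive by appending ".zip" to the original file name are assumptions for illustration.

```python
import zipfile
from pathlib import Path

UPLOAD_LIMIT_BYTES = 5 * 1024**3  # 5 GiB, as stated in this README


def needs_compression(path, limit=UPLOAD_LIMIT_BYTES):
    """Return True if the file is larger than the upload limit."""
    return Path(path).stat().st_size > limit


def zip_file(path, output_dir):
    """Write <name>.zip into output_dir and return its path; the source is untouched."""
    src = Path(path)
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / (src.name + ".zip")
    # ZIP64 extensions are enabled by default, so members over 4 GiB are supported.
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(src, arcname=src.name)
    return target
```

Compression does not guarantee the archive fits, so it is worth calling needs_compression again on the resulting ZIP before uploading.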

Automatically compress large files without prompting:

python3 welllabs_data_platform_uploader.py --mode new-dataset --file "/path/to/large-file.gpkg" --metadata "./metadata.example.json" --compress-large-files

Fail instead of compressing:

python3 welllabs_data_platform_uploader.py --mode new-dataset --file "/path/to/large-file.gpkg" --no-compress-large-files

If a file is compressed, the uploader sets the uploaded file format metadata to zip and adds a note that the ZIP should be extracted before use.

Metadata Fields

Dataset metadata:

  • title
  • description
  • tags
  • private

File metadata:

  • file_title
  • file_description
  • file_format
  • provenance
  • source
  • cite_as
  • permissions
  • groundtruthed
  • duration
  • value
  • temporal_from
  • temporal_to
  • geography

The data platform rejects some special characters in text fields. The uploader removes:

< > " ' & ; ( )

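Removing the characters listed above can be sketched with str.translate. The function name clean_text is hypothetical, and whether the uploader also trims or collapses whitespace is not stated, so this sketch only deletes the listed characters.

```python
# The eight characters this README says are removed from text fields.
REJECTED_CHARS = "<>\"'&;()"
_STRIP_TABLE = str.maketrans("", "", REJECTED_CHARS)


def clean_text(value):
    """Return `value` with every rejected character deleted."""
    return value.translate(_STRIP_TABLE)
```

Note that deleting a character can leave a double space behind, for example where an ampersand stood between two words.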
Security Notes

  • Do not commit API keys, passwords, cookies, transcripts, or uploaded data.
  • Do not paste API keys into chat, screenshots, public issues, or README files.
  • Rotate your API key if it is accidentally exposed.
  • The uploader does not save passwords.
  • The sensitivity scan is a safety layer, not a final release decision.
  • GeoPackage and GeoJSON geometries are retained unless you separately generalize or remove them. Dropping latitude and longitude attribute columns does not spatially anonymize the geometry itself.

Common Problems

Python is not installed

Install Python 3 from:

https://www.python.org/downloads/

On Windows, make sure python.exe is added to PATH.

Sign-in fails

Check that:

  • Your email and password work on https://data.welllabs.org.
  • Your account is verified.
  • Your account has upload permission.

Upload-job creation fails

This usually means the login session was not created or the account does not have upload permission.

EntityTooLarge or maximum allowed size

The platform rejected the uploaded object because it exceeded the 5 GiB limit. Re-run the upload and choose compression when prompted, or pass --compress-large-files.

Sensitive fields are reported

Review the findings before uploading. The uploader can help create a sanitized copy for supported formats, but the file may still need to be kept private, aggregated, or withheld.
