Live demo: galaxy-classification.vercel.app
The API runs on Render's free tier — if the page sits cold it may take ~30 s for the first prediction to come back while the worker wakes up.
A portfolio project that classifies galaxy morphology from a single image. A fine-tuned EfficientNet-B0 trained on the Galaxy Zoo 2 dataset is served by a FastAPI backend; the frontend is a Next.js + Three.js site that classifies a curated gallery live on every page load and lets visitors upload their own galaxy images.
Live gallery — every card is classified by the model on page load. The card border glows in the predicted class's color, and ground-truth labels are shown alongside the model's verdict so you can see when it agrees and when it doesn't.
Upload your own galaxy image — the model returns probabilities across all five classes with a written description of the predicted morphology.
| Class | Color |
|---|---|
| Elliptical | orange |
| Spiral | blue |
| Barred Spiral | cyan |
| Edge-on Disk | indigo |
| Merger | pink |
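`ml/class_map.py` collapses Galaxy Zoo 2 codes into these five classes. The sketch below is a hypothetical illustration only, not the project's rules. It assumes labels come from the `gz2_class` morphology string in `gz2_hart16.csv.gz` (e.g. `Ei`, `Sc2m`, `SBb2m`, `Ser`) and that a `(m)` odd-feature tag marks a merger. The real script may instead threshold vote fractions. The function name and label strings are invented for illustration.

```python
from typing import Optional


def map_gz2_class(code: str) -> Optional[str]:
    """Map a GZ2 morphology string to one of five classes (hypothetical rules).

    Order matters: the merger tag overrides the base morphology, and edge-on
    and barred prefixes must be checked before the generic 'S' spiral prefix.
    """
    code = code.strip()
    if "(m)" in code:          # assumed: '(m)' odd-feature tag means merger
        return "merger"
    if code.startswith("Se"):  # edge-on disks, e.g. Ser, Seb, Sen
        return "edge_on"
    if code.startswith("SB"):  # barred spirals
        return "barred_spiral"
    if code.startswith("S"):   # remaining face-on spirals
        return "spiral"
    if code.startswith("E"):   # smooth galaxies, e.g. Er, Ei, Ec
        return "elliptical"
    return None                # anything else (e.g. star/artifact 'A') is dropped
```

Rows that map to `None` are excluded from training.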
Held-out test accuracy: 78.9% (per-class: edge-on 93.4%, merger 85.6%, elliptical 82.6%, barred 69.4%, spiral 64.4%). Spiral ↔ barred-spiral is the expected hard split since bars can be subtle even for human classifiers.
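Per-class accuracy is usually computed as per-class recall: among test galaxies whose true label is X, the fraction the model also labels X. Overall accuracy is the fraction of all test galaxies predicted correctly, so it weights each class by its share of the test set. The following is a minimal stdlib sketch of both numbers. It assumes labels and predictions are parallel lists of class names. The toy data is made up to mimic a spiral/barred mix-up and is not the project's results.

```python
from collections import Counter


def accuracy_report(y_true, y_pred):
    """Return (overall accuracy, {class: per-class recall}) as fractions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be the same length")
    correct = Counter()  # correct predictions per true class
    total = Counter()    # test examples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    overall = sum(correct.values()) / len(y_true)
    per_class = {c: correct[c] / n for c, n in total.items()}
    return overall, per_class


y_true = ["spiral", "spiral", "barred", "barred", "edge_on"]
y_pred = ["spiral", "barred", "barred", "spiral", "edge_on"]
overall, per_class = accuracy_report(y_true, y_pred)
# overall == 0.6; per_class == {"spiral": 0.5, "barred": 0.5, "edge_on": 1.0}
```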
```
galaxy-classification/
  ml/                       # training pipeline (documentation lives inline in the scripts)
    class_map.py            # 5-class labeling logic over Galaxy Zoo 2 codes
    prepare_data.py         # join labels + mapping, balance, train/val/test splits
    dataset.py              # PyTorch Dataset that streams images from the zip
    train.py                # EfficientNet-B0 transfer learn + ONNX export
    colab_train.ipynb       # GPU training wrapper for Colab
    build_gallery.py        # picks ~25 curated images for the website
    verify_onnx.py          # sanity check the exported ONNX model
    artifacts/              # trained weights + ONNX export
  api/                      # FastAPI service
    main.py                 # /health, /gallery, /predict, /predict-gallery/<f>
    model.py                # ONNX inference wrapper with training-matched preprocess
    requirements.txt
  web/                      # Next.js + TS + Tailwind frontend
    app/
    components/
    lib/
    public/gallery/         # 25 curated JPGs surfaced as the live gallery
  data/                     # NOT in git; populated by the user
    raw/
      gz2_hart16.csv.gz
      gz2_filename_mapping.csv
      images_gz2.zip
    processed/              # produced by ml/prepare_data.py
      train.csv
      val.csv
      test.csv
```
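`ml/dataset.py` reads images straight out of `images_gz2.zip` instead of extracting it. The following is a minimal sketch of that pattern using the standard-library `zipfile` module. It is not the project's code: the class name, the `members` list of archive paths, and the lazy-open design are assumptions. It implements the `__len__`/`__getitem__` protocol that `torch.utils.data.Dataset` expects but returns raw JPEG bytes. That way it runs without PyTorch or Pillow; a real version would decode with `PIL.Image.open(io.BytesIO(data))` and apply transforms.

```python
import zipfile
from typing import List, Optional, Tuple


class ZipImageDataset:
    """(image_bytes, label) pairs read on demand from a zip archive, never extracted."""

    def __init__(self, zip_path: str, members: List[str], labels: List[int]):
        if len(members) != len(labels):
            raise ValueError("members and labels must align")
        self.zip_path = zip_path
        self.members = members  # archive paths of the JPGs (layout assumed)
        self.labels = labels
        self._zf: Optional[zipfile.ZipFile] = None  # opened lazily, once per process

    def __len__(self) -> int:
        return len(self.members)

    def __getitem__(self, idx: int) -> Tuple[bytes, int]:
        if self._zf is None:
            self._zf = zipfile.ZipFile(self.zip_path, "r")
        data = self._zf.read(self.members[idx])  # decompresses only this member
        return data, self.labels[idx]

    def __getstate__(self):
        # Never ship an open file handle to another process: each DataLoader
        # worker (spawned on Windows, forked elsewhere) reopens the zip itself.
        state = self.__dict__.copy()
        state["_zf"] = None
        return state
```

Opening the archive lazily and dropping the handle on pickling keeps each worker's file offset independent. Sharing one `ZipFile` across processes is a common source of corrupted reads.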
```bash
pip install -r api/requirements.txt
pip install torch torchvision pandas tqdm scikit-learn onnxscript
cd web
npm install
cd ..
```

Place these files in `data/raw/`:

- `images_gz2.zip` from Zenodo 3565489
- `gz2_filename_mapping.csv` from the same Zenodo record
- `gz2_hart16.csv.gz` from https://data.galaxyzoo.org/
```bash
python ml/prepare_data.py   # produces data/processed/{train,val,test}.csv
```

Then either upload the folder to Google Drive and run `ml/colab_train.ipynb` on a Colab T4 GPU (~30 min), or train locally on CPU (slow, ~6 hr/epoch):

```bash
python ml/train.py --epochs 8 --batch-size 32 --workers 4
```

Either path writes `galaxy_classifier.onnx` and `best.pt` to `ml/artifacts/`.
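The data-prep step (join labels with the filename mapping, balance classes, split into train/val/test) can be sketched with the standard library. This is a hypothetical illustration, not `prepare_data.py`. The following are assumptions: inputs already reduced to `objid -> label` and `objid -> asset_id` dicts, images named `<asset_id>.jpg`, balancing by downsampling every class to the smallest one, and an 80/10/10 split. Splitting within each class keeps the three splits stratified.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def join_balance_split(
    labels: Dict[int, str],      # objid -> 5-class label
    asset_ids: Dict[int, int],   # objid -> asset_id; image file assumed to be f"{asset_id}.jpg"
    fractions: Tuple[float, float] = (0.8, 0.1),  # train, val; test gets the remainder
    seed: int = 0,
) -> Dict[str, List[Tuple[str, str]]]:
    # Inner join: keep only galaxies that have both a label and an image.
    by_class: Dict[str, List[str]] = defaultdict(list)
    for objid, label in labels.items():
        if objid in asset_ids:
            by_class[label].append(f"{asset_ids[objid]}.jpg")

    # Balance: downsample every class to the size of the smallest class.
    rng = random.Random(seed)
    n = min(len(files) for files in by_class.values())
    splits: Dict[str, List[Tuple[str, str]]] = {"train": [], "val": [], "test": []}
    for label in sorted(by_class):           # sorted for a reproducible order
        files = sorted(by_class[label])
        rng.shuffle(files)
        files = files[:n]
        n_train = int(n * fractions[0])
        n_val = int(n * fractions[1])
        # Split per class so each split keeps the same class balance.
        splits["train"] += [(f, label) for f in files[:n_train]]
        splits["val"] += [(f, label) for f in files[n_train:n_train + n_val]]
        splits["test"] += [(f, label) for f in files[n_train + n_val:]]
    return splits
```

Each list could then be written out as `train.csv`, `val.csv`, and `test.csv`. The fixed seed makes the splits reproducible between runs.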
After training, pick a curated set of demo galaxies:

```bash
python ml/build_gallery.py
```

This writes ~25 JPGs to `web/public/gallery/` and a manifest at `web/public/gallery.json`.
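The gallery selection can be sketched as sampling a fixed number of held-out test galaxies per class (5 × 5 = 25). The manifest then records each file's ground-truth label so the site can show it next to the model's verdict. This is a hypothetical sketch: the per-class sampling rule and the manifest fields (`filename`, `label`) are assumptions, and the real `gallery.json` schema may differ. Drawing from the test split keeps the live predictions honest, since the model never saw those images during training.

```python
import json
import random
from collections import defaultdict
from typing import List, Tuple


def build_manifest(test_rows: List[Tuple[str, str]], per_class: int = 5, seed: int = 0) -> str:
    """Pick up to `per_class` (filename, label) rows per class; return a JSON manifest."""
    by_class = defaultdict(list)
    for filename, label in test_rows:
        by_class[label].append(filename)
    rng = random.Random(seed)
    picks = []
    for label in sorted(by_class):
        files = sorted(by_class[label])  # sorted so the seed fully determines the picks
        for filename in rng.sample(files, min(per_class, len(files))):
            picks.append({"filename": filename, "label": label})
    return json.dumps(picks, indent=2)
```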
Run two terminals, both left running:

```bash
# Terminal 1: API
python -m uvicorn api.main:app --port 8000 --reload

# Terminal 2: frontend
cd web
npm run dev
```

Open http://localhost:3000.
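With the API running, `/predict` can also be exercised without the frontend. The following is a stdlib-only sketch that posts an image as `multipart/form-data`. The form field name `file` and the JSON response shape are assumptions (check `api/main.py`). `build_multipart` is a hypothetical helper and runs offline; `predict` needs the API listening on port 8000.

```python
import json
import os
import urllib.request
import uuid
from typing import Tuple


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "image/jpeg") -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex  # random boundary, vanishingly unlikely to appear in the data
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def predict(path: str, url: str = "http://localhost:8000/predict") -> dict:
    """POST an image file to the API and return the decoded JSON response."""
    with open(path, "rb") as fh:
        body, ctype = build_multipart("file", os.path.basename(path), fh.read())
    req = urllib.request.Request(url, data=body, headers={"Content-Type": ctype}, method="POST")
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())
```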
- Training preprocessing: center-crop to 212×212 (the inner half of GZ2's 424×424 frames where the galaxy sits), resize to 224×224, ImageNet normalize. Augmentation is rotation/flip — galaxies have no canonical orientation.
- Inference preprocessing (in `api/model.py`) matches training exactly, so user uploads of arbitrary sizes are funneled into the same effective view the model was trained on.
- Gallery cards call `/predict-gallery/<filename>` from the browser on mount, staggered by ~120 ms each so the page doesn't fire 25 requests simultaneously. Probability bars animate in via Framer Motion, and the card border glows in the predicted class's color.
- The upload zone accepts drag-and-drop or click, sends the file to `/predict` as `multipart/form-data`, and displays the top class with its description and a stack of probability bars.
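The following is a minimal sketch of the preprocessing described above, using Pillow and NumPy. It produces an NCHW `float32` array of the kind an ONNX session expects. The list above doesn't spell out how `api/model.py` handles uploads that aren't 424×424. This sketch assumes it first resizes them to the 424×424 GZ2 frame so the 212×212 center crop covers the same relative region. Pillow's bilinear resize may also differ from torchvision's at the pixel level.

```python
import numpy as np
from PIL import Image

GZ2_FRAME = 424    # native Galaxy Zoo 2 cutout size
CROP = 212         # inner half of the frame, where the galaxy sits
MODEL_INPUT = 224  # EfficientNet-B0 input size
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(img: Image.Image) -> np.ndarray:
    """Return a (1, 3, 224, 224) float32 array, ImageNet-normalized."""
    img = img.convert("RGB")
    if img.size != (GZ2_FRAME, GZ2_FRAME):
        # Assumption: normalize arbitrary uploads to the GZ2 frame size first.
        img = img.resize((GZ2_FRAME, GZ2_FRAME), Image.BILINEAR)
    offset = (GZ2_FRAME - CROP) // 2  # 106 px margin on each side
    img = img.crop((offset, offset, offset + CROP, offset + CROP))
    img = img.resize((MODEL_INPUT, MODEL_INPUT), Image.BILINEAR)
    arr = np.asarray(img, dtype=np.float32) / 255.0  # HWC, values in [0, 1]
    arr = (arr - IMAGENET_MEAN) / IMAGENET_STD       # per-channel normalization
    return arr.transpose(2, 0, 1)[np.newaxis, ...]   # HWC -> NCHW with batch dim
```

With `onnxruntime`, the result can be fed as `session.run(None, {session.get_inputs()[0].name: x})`.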
- The 3.2 GB image zip is never extracted; both training and gallery-building stream JPGs directly out of the zip via Python's `zipfile` module.
- The ONNX export uses opset 17 with `dynamo=False` to avoid the new exporter's Unicode print statements, which crash on Windows consoles using the cp1252 code page.
- CORS in `api/main.py` allows `localhost:3000`, `127.0.0.1:3000`, and any `https://*.vercel.app` origin. Extra origins for custom domains can be added at runtime via the `FRONTEND_ORIGINS` env var (comma-separated).
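The CORS policy above has three parts: a fixed allow-list, an environment-driven extension, and a pattern for Vercel deployments. The following sketch checks origins in plain Python. In a FastAPI app, the same list and pattern would typically go to Starlette's `CORSMiddleware` as `allow_origins` and `allow_origin_regex`. The exact regex, the `http://` scheme on the localhost entries, and the helper names are assumptions.

```python
import os
import re
from typing import List, Mapping

DEFAULT_ORIGINS = ["http://localhost:3000", "http://127.0.0.1:3000"]
# One subdomain label under vercel.app over HTTPS (production and preview URLs).
VERCEL_ORIGIN_RE = re.compile(r"https://[a-z0-9-]+\.vercel\.app")


def allowed_origins(env: Mapping[str, str] = os.environ) -> List[str]:
    """Defaults plus any comma-separated FRONTEND_ORIGINS, ignoring blanks."""
    extra = env.get("FRONTEND_ORIGINS", "")
    return DEFAULT_ORIGINS + [o.strip() for o in extra.split(",") if o.strip()]


def is_allowed(origin: str, env: Mapping[str, str] = os.environ) -> bool:
    # fullmatch rejects look-alikes such as https://x.vercel.app.evil.com
    return origin in allowed_origins(env) or VERCEL_ORIGIN_RE.fullmatch(origin) is not None
```

For example, setting `FRONTEND_ORIGINS="https://example.org, https://foo.dev"` adds both domains without a redeploy of the code.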
This project uses the Galaxy Zoo 2 dataset. If you use it for academic work, please cite:
- Willett, K. W., et al. (2013). "Galaxy Zoo 2: detailed morphological classifications for 304,122 galaxies from the Sloan Digital Sky Survey." Monthly Notices of the Royal Astronomical Society, 435(4), 2835–2860.
- Hart, R. E., et al. (2016). "Galaxy Zoo: comparing the demographics of spiral arm number and a new method for correcting redshift bias." Monthly Notices of the Royal Astronomical Society, 461(4), 3663–3682.
Imagery is from the Sloan Digital Sky Survey (SDSS Data Release 7). Funding for SDSS and SDSS-II has been provided by the Alfred P. Sloan Foundation, the U.S. Department of Energy, NASA, the National Science Foundation, and the Japanese Monbukagakusho.
Released under the MIT License. You're free to use, modify, and
distribute the code for any purpose — academic, commercial, or personal —
as long as the copyright notice in LICENSE is preserved.
Note that this license covers the code only. The Galaxy Zoo 2 catalog, SDSS imagery, and citation requirements above are governed by their respective sources.



