Kaushik1128/galaxy-classification

Galaxy Classifier

Live demo: galaxy-classification.vercel.app

The API runs on Render's free tier, so after a period of inactivity the first prediction may take ~30 s to come back while the worker wakes up.

Landing page with starfield, gradient title, and class legend

A portfolio project that classifies galaxy morphology from a single image. A fine-tuned EfficientNet-B0 trained on the Galaxy Zoo 2 dataset is served by a FastAPI backend; the frontend is a Next.js + Three.js site that classifies a curated gallery live on every page load and lets visitors upload their own galaxy images.

Screenshots

Live gallery — every card is classified by the model on page load. The card border glows in the predicted class's color, and ground-truth labels are shown alongside the model's verdict so you can see when it agrees and when it doesn't.

Gallery row of elliptical galaxies with high-confidence predictions

Gallery row of spiral and barred-spiral galaxies with probability bars

Upload your own galaxy image — the model returns probabilities across all five classes with a written description of the predicted morphology.

Upload zone with a spiral galaxy classified at 63% confidence, full probability breakdown on the right

Five classes

Class          Color
Elliptical     orange
Spiral         blue
Barred Spiral  cyan
Edge-on Disk   indigo
Merger         pink

Held-out test accuracy: 78.9% (per-class: edge-on 93.4%, merger 85.6%, elliptical 82.6%, barred 69.4%, spiral 64.4%). Spiral ↔ barred-spiral is the expected hard split since bars can be subtle even for human classifiers.
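
As a quick sanity check on these figures, the unweighted mean of the five per-class accuracies can be compared with the overall number. The sketch below only uses the values quoted above; the reading that a small gap suggests a roughly class-balanced test split is an assumption, since the split sizes are not stated here.

```python
# Per-class held-out accuracies quoted above, in percent.
per_class = {
    "edge-on": 93.4,
    "merger": 85.6,
    "elliptical": 82.6,
    "barred": 69.4,
    "spiral": 64.4,
}

# Unweighted (macro) mean: every class counts equally, regardless of its size.
macro_accuracy = sum(per_class.values()) / len(per_class)

# Gap between the easiest and hardest class.
spread = max(per_class.values()) - min(per_class.values())

print(f"macro accuracy: {macro_accuracy:.2f}%")
print(f"best-to-worst spread: {spread:.1f} points")
```

The macro mean comes out near 79.1%, within a few tenths of a point of the reported 78.9%, while the spread of about 29 points shows how much of the remaining error sits in the spiral and barred-spiral classes.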

Repo layout

galaxy-classification/
  ml/                       # training pipeline (documented inline in the scripts)
    class_map.py            # 5-class labeling logic over Galaxy Zoo 2 codes
    prepare_data.py         # join labels + mapping, balance, train/val/test splits
    dataset.py              # PyTorch Dataset that streams images from the zip
    train.py                # EfficientNet-B0 transfer learn + ONNX export
    colab_train.ipynb       # GPU training wrapper for Colab
    build_gallery.py        # picks ~25 curated images for the website
    verify_onnx.py          # sanity check the exported ONNX model
    artifacts/              # trained weights + ONNX export
  api/                      # FastAPI service
    main.py                 # /health, /gallery, /predict, /predict-gallery/<f>
    model.py                # ONNX inference wrapper with training-matched preprocess
    requirements.txt
  web/                      # Next.js + TS + Tailwind frontend
    app/
    components/
    lib/
    public/gallery/         # 25 curated JPGs surfaced as the live gallery
  data/                     # NOT in git — populated by the user
    raw/
      gz2_hart16.csv.gz
      gz2_filename_mapping.csv
      images_gz2.zip
    processed/              # produced by ml/prepare_data.py
      train.csv
      val.csv
      test.csv
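
The layout above describes ml/dataset.py as streaming images from the zip. A minimal sketch of that pattern with the standard library is shown here; the class name `ZipImageReader`, the member name `images/100.jpg`, and the lazy-open detail are illustrative assumptions, not the repository's actual code.

```python
import io
import zipfile


class ZipImageReader:
    """Reads individual members out of a zip archive without extracting it."""

    def __init__(self, zip_source):
        # zip_source may be a path or a file-like object.
        self.zip_source = zip_source
        self._zf = None  # opened lazily, on first read

    def read_bytes(self, member_name):
        """Return the raw bytes of one archive member."""
        if self._zf is None:
            self._zf = zipfile.ZipFile(self.zip_source, "r")
        return self._zf.read(member_name)

    def close(self):
        if self._zf is not None:
            self._zf.close()
            self._zf = None


# Demo with an in-memory archive standing in for data/raw/images_gz2.zip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("images/100.jpg", b"fake-jpeg-bytes")  # hypothetical member name

reader = ZipImageReader(buffer)
payload = reader.read_bytes("images/100.jpg")
reader.close()
print(len(payload), "bytes read without extracting the archive")
```

In a real Dataset the returned bytes would then be decoded into an image, for example by wrapping them in `io.BytesIO` and handing them to an image library. Opening the archive lazily is one way to let each data-loading worker hold its own file handle.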

Setup

One-time

pip install -r api/requirements.txt
pip install torch torchvision pandas tqdm scikit-learn onnxscript

cd web
npm install
cd ..

Download the dataset (~3.3 GB)

Place these files in data/raw/:

  • gz2_hart16.csv.gz
  • gz2_filename_mapping.csv
  • images_gz2.zip

Train

python ml/prepare_data.py     # produces data/processed/{train,val,test}.csv

Then either upload the folder to Google Drive and run ml/colab_train.ipynb on a Colab T4 GPU (~30 min), or train locally on CPU (slow, ~6 hr/epoch):

python ml/train.py --epochs 8 --batch-size 32 --workers 4

Either path writes ml/artifacts/galaxy_classifier.onnx and best.pt.

Build the gallery

After training, pick a curated set of demo galaxies:

python ml/build_gallery.py

This writes ~25 JPGs to web/public/gallery/ and a manifest at web/public/gallery.json.

Run

Two terminals — both stay running:

# Terminal 1 — API
python -m uvicorn api.main:app --port 8000 --reload

# Terminal 2 — Frontend
cd web
npm run dev

Open http://localhost:3000.
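
With the API running, /predict can also be exercised without the frontend. The sketch below builds a multipart/form-data POST using only the standard library. The form field name `file`, the filename `galaxy.jpg`, and the helper names are assumptions; check api/main.py for the field name the endpoint actually expects. The response format is not documented here, so the sketch stops at building the request.

```python
import urllib.request
import uuid


def build_multipart(field_name, filename, payload, content_type="image/jpeg"):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def build_predict_request(image_bytes, base_url="http://localhost:8000"):
    """Build (but do not send) a POST to /predict carrying one image."""
    # "file" is an assumed field name, not confirmed by the README.
    body, header_value = build_multipart("file", "galaxy.jpg", image_bytes)
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": header_value},
        method="POST",
    )


request = build_predict_request(b"\xff\xd8 placeholder image bytes")
print(request.get_method(), request.full_url)
# To actually send it once the API is up:
#     with urllib.request.urlopen(request) as response:
#         print(response.read())
```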

How it works

  1. Training preprocessing: center-crop to 212×212 (the central half of each side of GZ2's 424×424 frames, where the galaxy sits), resize to 224×224, then normalize with the ImageNet mean and standard deviation. Augmentation is rotation and flipping, since galaxies have no canonical orientation.
  2. Inference preprocessing (in api/model.py) matches training exactly so user uploads of arbitrary sizes are funneled into the same effective view the model was trained on.
  3. Gallery cards call /predict-gallery/<filename> from the browser on mount, staggered by ~120 ms each so the page doesn't fire 25 requests simultaneously. Probability bars animate in via Framer Motion. The card border glows in the predicted class's color.
  4. Upload zone accepts drag-drop or click, sends to /predict as multipart/form-data, displays the top class with description and a stack of probability bars.
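
The crop and normalization arithmetic in steps 1 and 2 can be sketched in plain Python. The ImageNet mean and standard deviation are the standard published values; the function names are hypothetical, and treating an arbitrary-size upload as "keep the central half of each side" is an assumption about how api/model.py generalizes the 424→212 crop.

```python
# Standard ImageNet per-channel statistics (RGB, for inputs scaled to [0, 1]).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def center_crop_box(width, height, fraction=0.5):
    """Return (left, top, right, bottom) for a centered crop that keeps
    `fraction` of each side of a width x height image."""
    crop_w = round(width * fraction)
    crop_h = round(height * fraction)
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then subtract the ImageNet mean
    and divide by the ImageNet std, channel by channel."""
    return tuple(
        (channel / 255.0 - mean) / std
        for channel, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


gz2_box = center_crop_box(424, 424)      # a native GZ2 frame
upload_box = center_crop_box(1000, 600)  # an arbitrary upload (assumed handling)
black = normalize_pixel((0, 0, 0))
print(gz2_box, upload_box, black)
```

For a 424×424 frame the box spans pixels 106 to 318 on each axis, which is the 212×212 region described in step 1. The cropped region would then be resized to 224×224 before normalization.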

Notes

  • The 3.2 GB image zip is never extracted; both training and gallery-building stream JPGs directly out of the zip via zipfile.
  • The ONNX export uses opset 17 with dynamo=False to avoid the new exporter's Unicode print statements, which crash on Windows cp1252 consoles.
  • CORS in api/main.py allows localhost:3000, 127.0.0.1:3000, and any https://*.vercel.app origin. Extra origins for custom domains can be added at runtime via the FRONTEND_ORIGINS env var (comma-separated).
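
The origin rules in the last note can be illustrated with a small standalone check. This is a sketch of the policy as described, not the code in api/main.py: the `http://` scheme on the localhost origins, the exact regular expression, and the function names are assumptions.

```python
import os
import re

# Fixed origins from the note above; the http:// scheme is assumed.
DEFAULT_ORIGINS = ["http://localhost:3000", "http://127.0.0.1:3000"]

# Any https://<something>.vercel.app origin (illustrative pattern).
VERCEL_ORIGIN_PATTERN = r"https://[a-z0-9.-]+\.vercel\.app"


def parse_extra_origins(raw):
    """Split a comma-separated value into a list, dropping blanks and padding."""
    return [item.strip() for item in (raw or "").split(",") if item.strip()]


def is_allowed(origin, extra_origins=()):
    """Return True if `origin` is a default, an extra, or a Vercel origin."""
    if origin in DEFAULT_ORIGINS or origin in extra_origins:
        return True
    return re.fullmatch(VERCEL_ORIGIN_PATTERN, origin) is not None


# Extra origins come from the FRONTEND_ORIGINS env var (comma-separated).
extras = parse_extra_origins(os.environ.get("FRONTEND_ORIGINS", ""))

for candidate in (
    "http://localhost:3000",
    "https://galaxy-classification.vercel.app",
    "https://example.com",
):
    print(candidate, is_allowed(candidate, extras))
```

In a FastAPI app, a list and a pattern like these would typically be handed to `CORSMiddleware` through its `allow_origins` and `allow_origin_regex` parameters rather than checked by hand.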

Data attribution

This project uses the Galaxy Zoo 2 dataset. If you use it for academic work, please cite:

  • Willett, K. W., et al. (2013). "Galaxy Zoo 2: detailed morphological classifications for 304,122 galaxies from the Sloan Digital Sky Survey." Monthly Notices of the Royal Astronomical Society, 435(4), 2835–2860.
  • Hart, R. E., et al. (2016). "Galaxy Zoo: comparing the demographics of spiral arm number and a new method for correcting redshift bias." Monthly Notices of the Royal Astronomical Society, 461(4), 3663–3682.

Imagery is from the Sloan Digital Sky Survey (SDSS Data Release 7). Funding for SDSS and SDSS-II has been provided by the Alfred P. Sloan Foundation, the U.S. Department of Energy, NASA, the National Science Foundation, and the Japanese Monbukagakusho.

License

Released under the MIT License. You're free to use, modify, and distribute the code for any purpose — academic, commercial, or personal — as long as the copyright notice in LICENSE is preserved.

Note that this license covers the code only. The Galaxy Zoo 2 catalog, SDSS imagery, and citation requirements above are governed by their respective sources.
