This repository hosts the official evaluation system for the Graph Topology Ablation (GTA) challenge. Participants submit predictions for ideal and perturbed graph topologies. All submissions are encrypted, automatically evaluated, and ranked on a public leaderboard.
📊 Live leaderboard: Open leaderboard
Participants must generate predictions for two settings:
- ✅ Ideal graph topology – clean, unmodified node features.
- ✅ Perturbed graph topology – node features corrupted by a combination of distribution shift and Gaussian noise.
The goal is to build a Graph Neural Network (GNN) that is both accurate on clean data and robust to realistic feature corruptions.
We use the MUTAG dataset, a classic benchmark for graph classification from chemical informatics.
🔗 Official source:
https://ls11-www.cs.tu-dortmund.de/people/morris/graphkerneldatasets/MUTAG.zip
| Property | Value |
|---|---|
| Task | Binary graph classification |
| Domain | Chemical compounds (mutagenic vs non‑mutagenic) |
| Number of graphs | 188 |
| Avg. nodes per graph | ~18 |
| Avg. edges per graph | ~40 |
| Node features | Categorical atom labels (interpreted as features) |
| Number of classes | 2 |
Each graph represents a molecule:
- Nodes – atoms
- Edges – chemical bonds
- Graph label – indicates whether the molecule is mutagenic to Salmonella typhimurium.
The dataset is split 70/30 with stratification by class:
data/train.csv– labeled training graphs (70%)data/test.csv– unlabeled test graphs (30%)
Training labels are provided in data/train.csv with columns graph_index and label.
Test graphs are listed in data/test.csv with only graph_index (labels are hidden for scoring).
Two types of feature corruption are applied to the test graphs to generate the perturbed setting:
-
Distribution Shift
A constant offset is added to each node feature:
x ← x + δ
whereδ = feature_shift(default0.3).
Simulates systematic measurement bias or domain shift. -
Gaussian Noise Injection
Random noise is added to each feature:
x ← x + ϵ, ϵ ~ N(0, σ²)
whereσ = noise_std(default0.05).
Simulates noisy feature extraction.
Purpose:
This setup evaluates whether a GNN:
- relies on exact feature values,
- generalizes under feature distribution shift,
- remains stable under noisy topological descriptors.
The model is trained on clean features and evaluated on corrupted features to measure robustness.
Performance is measured using the F1 score (macro‑averaged) because it balances precision and recall, providing a more reliable measure than accuracy alone.
Each submission is evaluated under two conditions:
- F1 Score (Ideal) – performance on clean topological features.
- F1 Score (Perturbed) – performance on corrupted features.
To quantify robustness, we compute the Robustness Gap:
Robustness Gap = F1_ideal − F1_perturbed
A smaller gap indicates a more stable and reliable model.
- Highest Perturbed F1 Score (primary)
- Lowest Robustness Gap (secondary)
- Most recent submission (tie‑breaker)
Only the best perturbed score per participant is kept in the leaderboard.
Create a Python virtual environment and install dependencies:
Starter Code
A baseline GIN model is provided in starter_code/baseline.py. You can modify it or build your own model.
The script Loads the MUTAG dataset:
Reads train.csv and test.csv.
Trains on clean graphs.
Generates predictions for both ideal and perturbed test graphs.
Saves submission files in the required format.
cd starter_code
starter_code/python baseline.py
This will create ideal_submission.csv and perturbed_submission.csv in the submissions/ folder (which is git‑ignored).
Submissions must be encrypted and placed inside a folder named after your team.
Step 1: Prepare your submission files
Your CSV files must have the following format (example for 38 test graphs):
graph_index,label
160,1
62,0
48,0
173,1
.....
.....
ideal_submission.csv- predictions on clean test graphs.
perturbed_submission.csv – predictions on perturbed test graphs.
Step 2: Encrypt your files
From the project root, run the encryption script:
cd submissions
submissions/python encrypt_submissions.py
cd ..
This script will:
Look for ideal_submission.csv and perturbed_submission.csv in 'submissions' folder.
-
Encrypt them using the public key (submissions/python encrypt_submissions.py).
-
It will Produce ideal.enc and perturbed.enc in the same folder (submissions/<Team_Name>/*.enc).
Only the .enc files should be committed; the raw .csv files remain local (they are git‑ignored). Note: Please dont forget to create submisions/<Team_Name>/ (two *enc files will be placed here)
Step 3: Commit and push
-
Fork the repository.
-
Create a folder submissions// and place the .enc files inside.
project_root/
├── encryption/
│ ├── __init__.py
│ └── encrypt.py # contains encrypt_file function
├── submissions/
│ ├── ideal_submission.csv
│ ├── perturbed_submission.csv
│ └── team_name/ # <-- folder to be created
│ ├── ideal.enc # For your geenrated Ideal_submission.csv.enc change name to ideal.enc (mandatory) see, submissions/Idrees_Bhat/
│ └── perturbed.enc # or your geenrated perturbed_submission.csv.enc change name to perturbed.enc (mandatory) see, submissions/Idrees_Bhat/
└── encrypt_submissions.py # script that encrypts files
- Create a new branch, commit only the .enc files, and open a Pull Request (PR) against the main branch.
Important:
Do not commit any raw .csv files.
Ensure your team folder name does not contain spaces.
Step 4: Automatic evaluation
-
Once the PR is opened, the automated workflow will:
-
Decrypt your files (using the organiser’s private key, stored as a secret).
-
Compute F1 scores for both ideal and perturbed submissions.
-
Calculate the robustness gap.
-
Update the leaderboard (only your best perturbed score is retained).
-
The live leaderboard will reflect the new results within minutes.
gnn-topology-ablation/
│
├── .github/ # GitHub Actions workflows
│ ├── scripts/ # Helper scripts for evaluation
│ └── workflows/ # CI/CD pipeline definition
│
├── data/ # Dataset files
│ └── MUTAG/
│ ├── test.csv # Test graph indices
│ └── train.csv # Training labels
│
├── docs/ # Live leaderboard website
│ ├── index.html
│ ├── leaderboard.css
│ ├── leaderboard.csv # Auto‑generated ranking
│ ├── leaderboard.js
│ └── readme
│
├── encryption/ # Encryption/decryption utilities
│ ├── __init__.py
│ ├── decrypt.py
│ ├── encrypt.py
│ ├── generate_keys.py
│ └── public_key.pem # Public key for participants
│
├── leaderboard/ # Scoring and ranking logic
│ ├── __init__.py
│ ├── calculate_scores.py
│ ├── hidden_labels_reader.py
│ ├── render_leaderboard.py
│ ├── score_submission.py
│ └── update_leaderboard.py
│
├── starter_code/ # Participant starter kit
│ ├── baseline.py
│ └── requirements.txt
│
├── submissions/ # Encrypted submissions (git‑tracked)
│ └──<Team_Name>
│ └── encrypt_submissions.py # encrypts the *.CSVs
│
├── .env.example
├── .gitignore
├── LICENSE
├── README.md
├── leaderboard.md
└── requirements.txt
└── scoring_scripts.py
└── utils.py
-
Predictions are encrypted locally using a symmetric key, which is then encrypted with the organiser’s RSA public key.
-
Only the organiser (with the corresponding private key stored as a GitHub secret) can decrypt the submissions.
-
Encrypted files are visible in the repository but completely unreadable without the private key.
-
This ensures blind evaluation – participants cannot see each other’s predictions, and the organiser cannot see them until after the submission deadline (if desired).
This project is released under the MIT License. See the LICENSE file for details.