Predict BACE-1 enzyme inhibition using Graph Neural Networks
Welcome to the GNN Molecular Graph Classification Challenge — a Kaggle-style competition designed to benchmark Graph Neural Network architectures on molecular property prediction.
Given a molecular graph $G = (V, E)$:
- Nodes $V$ represent atoms, with features $\mathbf{x}_v \in \mathbb{R}^d$ encoding atomic properties
- Edges $E$ represent chemical bonds, with features encoding bond types

Your goal is to learn a graph-level representation and predict a binary label $y \in \{0, 1\}$ indicating whether the molecule is an active BACE-1 inhibitor.
Top performers will be invited to join a high-level research project aiming for publication at NeurIPS 2026.
- Dataset
- Evaluation Metrics
- Getting Started
- Baseline Architectures
- Advanced Architectures
- Submission Process
- Evaluation Dimensions
- Repository Structure
- Rules
- References
We use the OGB MolBACE dataset from the Open Graph Benchmark:
| Split | Molecules | Description |
|---|---|---|
| Train | 1,210 | For training your model |
| Valid | 151 | For local validation and hyperparameter tuning |
| Test | 152 | For final evaluation (labels hidden) |
Each molecule is represented as a graph with:
- Node features: 9-dimensional vectors $\mathbf{x}_v \in \mathbb{R}^9$ encoding:
  - Atomic number (type of atom)
  - Chirality tag
  - Degree, formal charge, number of H atoms, number of radical electrons
  - Hybridization, aromaticity, and ring membership
- Edge features: 3-dimensional vectors encoding bond type, stereochemistry, and conjugation
The dataset uses a scaffold split based on molecular substructures, ensuring that:
- Test molecules are structurally different from training molecules
- This simulates real-world drug discovery scenarios
- Prevents data leakage from similar molecular scaffolds
The dataset is imbalanced with approximately 30% positive class (active inhibitors). This makes the task non-trivial — a naive classifier predicting all zeros would achieve ~70% accuracy but poor F1.
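Submissions are evaluated using Macro F1 Score, which weights performance on both classes equally:

$$\text{Macro-F1} = \frac{1}{2}\left(\text{F1}_0 + \text{F1}_1\right)$$

where for each class $c \in \{0, 1\}$:

$$\text{F1}_c = \frac{2 \cdot \text{Precision}_c \cdot \text{Recall}_c}{\text{Precision}_c + \text{Recall}_c}$$

with:

$$\text{Precision}_c = \frac{TP_c}{TP_c + FP_c}, \qquad \text{Recall}_c = \frac{TP_c}{TP_c + FN_c}$$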
Why Macro F1?
- Treats both classes equally regardless of sample size
- Penalizes poor performance on the minority class
- More challenging than accuracy for imbalanced datasets
- Standard metric in molecular property prediction benchmarks
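As a worked illustration of why Macro F1 is harder to game than accuracy, the sketch below computes it in plain Python for an all-zeros classifier on a toy label vector with 30% positives. The function names `f1_for_class` and `macro_f1` are illustrative and are not taken from the repository's scoring script.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 for one class, treating `cls` as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0  # convention: F1 is 0 when the class is never predicted correctly
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy labels: 3 actives out of 10 molecules
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
all_zeros = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, all_zeros)) / len(y_true)
print(accuracy)                                # 0.7
print(round(macro_f1(y_true, all_zeros), 4))   # 0.4118
```

The all-zeros classifier reaches 70% accuracy, but its F1 on the positive class is 0, so its Macro F1 is only about 0.41.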
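We also track computational efficiency to encourage practical solutions. The efficiency score combines the squared Macro F1 with logarithmically scaled inference time and parameter count; scoring_script.py computes it. Its inputs are:
- $\text{time}_{ms}$: average inference time per batch (milliseconds)
- $\text{params}$: total number of trainable parameters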
Interpretation:
- Logarithmic scaling ensures 10x speedup always gives the same benefit
- Squaring F1 heavily rewards prediction quality
- Balances accuracy with practical deployment considerations
The leaderboard shows both Macro F1 (primary ranking) and Efficiency (secondary metric).
git clone https://github.com/muuki2/gnn-ddi.git
cd gnn-ddi
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r starter_code/requirements.txt
cd starter_code
# Run GraphSAGE baseline (default)
python baseline.py
# Run specific model
python baseline.py --model graphsage
python baseline.py --model gcn
python baseline.py --model gin
# Run all baselines for comparison
python baseline.py --all
This will:
- Download the OGB MolBACE dataset automatically
- Train the selected GNN model for 50 epochs
- Generate {model}_submission.csv in the submissions/ folder
- Report validation F1 score
| Model | Validation Macro F1 |
|---|---|
| GCN | 0.6153 |
| GIN | 0.6103 |
| GraphSAGE | 0.5835 |
from ogb.graphproppred import PygGraphPropPredDataset
dataset = PygGraphPropPredDataset(name='ogbg-molbace')
split_idx = dataset.get_idx_split()
# Get a sample graph
graph = dataset[0]
print(f"Nodes: {graph.num_nodes}, Edges: {graph.num_edges}")
print(f"Node features shape: {graph.x.shape}")
print(f"Label: {graph.y.item()}")
Create a CSV file with predictions for all test molecules:
id,target
0,1
1,0
6,1
...
- id: Molecule index from data/test.csv
- target: Your binary prediction (0 or 1)
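One way to produce a file in this format is sketched below with the standard `csv` module. The helper name `write_submission` is hypothetical, and the checks only cover what the format description states (an `id,target` header and binary targets); the repository's own validator may check more.

```python
import csv
import io


def write_submission(ids, preds, out):
    """Write (id, target) rows to a file-like object in the submission format."""
    if len(ids) != len(preds):
        raise ValueError("ids and preds must have the same length")
    if any(p not in (0, 1) for p in preds):
        raise ValueError("targets must be 0 or 1")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "target"])
    writer.writerows(zip(ids, preds))


# Demonstration with an in-memory buffer and the three rows shown above
buf = io.StringIO()
write_submission([0, 1, 6], [1, 0, 1], buf)
print(buf.getvalue())
```

For a real submission, pass a file opened with `open(path, "w", newline="")` instead of the in-memory buffer.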
- Fork this repository
- Add your submission file to the submissions/ folder
  - Name it your_github_username.csv
- Create a Pull Request to the main repository
When you open a Pull Request, the system automatically:
- Validates your submission format
- Evaluates against hidden test labels (stored in a private repository)
- Comments on your PR with your Macro F1 score
- Updates the leaderboard with your result
The test labels are never exposed to participants — they are fetched from a private repository during GitHub Actions execution, ensuring fair evaluation.
To appear on the leaderboard with efficiency metrics, include a metadata file:
Format: Create submissions/your_username_metadata.yaml:
team_name: your_team
model_name: MyGNN
model_architecture:
  type: GCN
  num_layers: 3
  hidden_dim: 64
efficiency_metrics:
  inference_time_ms: 5.2
  total_params: 45000
Use evaluation/speed_benchmark.py to measure these values:
from evaluation.speed_benchmark import ModelProfiler
profiler = ModelProfiler(model)
metrics = profiler.profile_model(loader, device)
print(f"Inference time: {metrics.mean_inference_time_ms} ms")
print(f"Parameters: {metrics.total_params}")
See schema/submission_metadata.json for the full schema.
submissions/
├── sample_submission.csv # Example format (152 predictions)
├── your_username.csv # Your submission
└── your_username_metadata.yaml # Optional efficiency metadata
| Rank | Participant | Macro-F1 | Efficiency | Params |
|---|---|---|---|---|
| 🥇 1 | Baseline-GCN | 0.6153 | - | 45K |
| 🥈 2 | Baseline-GIN | 0.6103 | - | 52K |
| 🥉 3 | Baseline-GraphSAGE | 0.5835 | - | 48K |
The competition provides three baseline GNN architectures. Below are their message-passing formulations.
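GCN (Kipf & Welling, 2017) performs spectral graph convolutions using a first-order approximation:

$$\mathbf{H}^{(l+1)} = \sigma\left(\tilde{\mathbf{D}}^{-1/2} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-1/2} \mathbf{H}^{(l)} \mathbf{W}^{(l)}\right)$$

where $\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I}$ is the adjacency matrix with self-loops, $\tilde{\mathbf{D}}$ is its degree matrix, $\mathbf{W}^{(l)}$ is a learnable weight matrix, and $\sigma$ is a nonlinearity.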
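To make the GCN normalization concrete, here is a minimal pure-Python sketch of one propagation step $\tilde{\mathbf{D}}^{-1/2} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-1/2} \mathbf{X}$ on a three-node path graph. It omits the weight matrix and nonlinearity, and the function name `gcn_propagate` is illustrative rather than part of the baseline code.

```python
import math


def gcn_propagate(adj, feats):
    """One GCN propagation step with self-loops and symmetric degree normalization."""
    n = len(adj)
    # A~ = A + I
    a_tilde = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    out = []
    for i in range(n):
        row = [0.0] * len(feats[0])
        for j in range(n):
            if a_tilde[i][j]:
                weight = a_tilde[i][j] / math.sqrt(deg[i] * deg[j])
                for k, x in enumerate(feats[j]):
                    row[k] += weight * x
        out.append(row)
    return out


# Path graph 0 - 1 - 2 with one scalar feature per node
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0], [0.0], [0.0]]
smoothed = gcn_propagate(adj, feats)
print(smoothed)
```

After one step, node 0 keeps half of its feature, node 1 receives a share scaled by $1/\sqrt{6}$, and node 2 (two hops away) still sees nothing.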
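GraphSAGE (Hamilton et al., 2017) learns to aggregate neighborhood features:

$$\mathbf{h}_v^{(l+1)} = \sigma\left(\mathbf{W}^{(l)} \cdot \text{CONCAT}\left(\mathbf{h}_v^{(l)}, \text{AGG}\left(\{\mathbf{h}_u^{(l)} : u \in \mathcal{N}(v)\}\right)\right)\right)$$

where AGG can be mean, max-pool, or LSTM aggregation. Our baseline uses mean aggregation.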
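GIN (Xu et al., 2019) is as expressive as the Weisfeiler-Lehman graph isomorphism test, the upper bound for message-passing GNNs:

$$\mathbf{h}_v^{(l+1)} = \text{MLP}^{(l)}\left((1 + \epsilon^{(l)}) \, \mathbf{h}_v^{(l)} + \sum_{u \in \mathcal{N}(v)} \mathbf{h}_u^{(l)}\right)$$

where $\epsilon^{(l)}$ is a learnable or fixed scalar.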
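The expressiveness argument for GIN rests on sum aggregation being able to tell apart neighborhoods that mean aggregation cannot. The toy sketch below, with scalar features and no MLP, illustrates this; `mean_agg` and `gin_update` are illustrative names, not functions from the baseline.

```python
def mean_agg(neigh):
    """Mean of scalar neighbor features."""
    return sum(neigh) / len(neigh)


def gin_update(h_v, neigh, eps=0.0):
    """GIN pre-MLP term: (1 + eps) * h_v + sum of neighbor features."""
    return (1 + eps) * h_v + sum(neigh)


neigh_a = [1.0, 1.0]             # two identical neighbors
neigh_b = [1.0, 1.0, 1.0, 1.0]   # four identical neighbors

same_under_mean = mean_agg(neigh_a) == mean_agg(neigh_b)
differ_under_sum = gin_update(0.0, neigh_a) != gin_update(0.0, neigh_b)
print(same_under_mean, differ_under_sum)  # True True
```

Mean aggregation maps both neighborhoods to 1.0, while the sum keeps the neighbor count, which matters for molecules where the number of bonded atoms carries chemical meaning.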
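All models use global mean pooling for graph-level prediction:

$$\mathbf{h}_G = \frac{1}{|V|} \sum_{v \in V} \mathbf{h}_v^{(L)}$$

followed by a linear classifier that produces the class logits:

$$\mathbf{z} = \mathbf{W}_{c} \, \mathbf{h}_G + \mathbf{b}_{c}$$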
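The readout step can be sketched in a few lines of plain Python: average the node embeddings into one graph vector, then apply a linear layer. The weights below are made-up numbers for illustration, and `global_mean_pool` here is a local helper, not the PyTorch Geometric function of the same name.

```python
def global_mean_pool(node_embeddings):
    """Average node embeddings (a list of equal-length lists) into one graph vector."""
    n = len(node_embeddings)
    dim = len(node_embeddings[0])
    return [sum(h[k] for h in node_embeddings) / n for k in range(dim)]


def linear(weights, bias, x):
    """Compute W x + b for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


nodes = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]   # three atoms, 2-d embeddings
h_graph = global_mean_pool(nodes)              # [2.0, 2.0]

W = [[1.0, -1.0], [0.5, 0.5]]                  # hypothetical classifier weights
b = [0.0, -1.0]
logits = linear(W, b, h_graph)                 # [0.0, 1.0]
pred = max(range(len(logits)), key=logits.__getitem__)
print(h_graph, logits, pred)
```

The predicted class is the index of the largest logit, here class 1.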
Beyond the baselines, we provide two advanced architectures with stronger mathematical foundations.
D-MPNN (Yang et al., 2019) is an edge-centric GNN designed for molecular graphs that prevents "message backflow" — a key limitation of standard MPNNs.
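Message Passing:

$$\mathbf{m}_{vw}^{(t+1)} = \sum_{k \in \mathcal{N}(v) \setminus \{w\}} \mathbf{h}_{kv}^{(t)}, \qquad \mathbf{h}_{vw}^{(t+1)} = \tau\left(\mathbf{h}_{vw}^{(0)} + \mathbf{W}_m \, \mathbf{m}_{vw}^{(t+1)}\right)$$

where $\mathbf{h}_{vw}$ is the hidden state of the directed edge $v \to w$, $\mathbf{W}_m$ is a learnable weight matrix, and $\tau$ is a nonlinearity such as ReLU.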
Key Features:
- Messages flow along directed edges
- Prevents information from immediately flowing back to source
- Edge features are first-class citizens
- Particularly effective for molecular property prediction
Implementation: advanced_baselines/dmpnn.py
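The backflow-prevention idea can be illustrated on a tiny path graph. The sketch below is a toy with scalar edge states, not the code in advanced_baselines/dmpnn.py: the message for a directed edge `(v, w)` sums the states of edges arriving at `v`, excluding the reverse edge `(w, v)`.

```python
def dmpnn_messages(edges, h):
    """For each directed edge (v, w), sum the states of edges (k, v) with k != w."""
    messages = {}
    for (v, w) in edges:
        messages[(v, w)] = sum(
            h[(k, u)] for (k, u) in edges if u == v and k != w
        )
    return messages


# Path a - b - c, stored as directed edges in both directions
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")]
# Distinct scalar states so each contribution is easy to trace
h = {("a", "b"): 1.0, ("b", "a"): 10.0, ("b", "c"): 100.0, ("c", "b"): 1000.0}

m = dmpnn_messages(edges, h)
print(m)
```

The message on `a -> b` is 0 because the only edge arriving at `a` is `b -> a`, which is excluded. The message on `b -> c` contains only the state of `a -> b`, never that of `c -> b`.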
Our Spectral GNN operates in the graph frequency domain using Chebyshev polynomial approximations.
Chebyshev Convolution:

$$\mathbf{g}_\theta \star \mathbf{x} = \sum_{k=0}^{K} \theta_k \, T_k(\tilde{\mathbf{L}}) \, \mathbf{x}$$

where:
- $\tilde{\mathbf{L}} = \frac{2}{\lambda_{max}} \mathbf{L} - \mathbf{I}$ is the scaled Laplacian
- $T_k$ are Chebyshev polynomials: $T_0(x) = 1$, $T_1(x) = x$, $T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x)$
- $\theta_k$ are learnable spectral coefficients
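The Chebyshev recurrence is easy to check numerically. This sketch evaluates $T_k$ at a scalar $x$ (think of $x$ as one eigenvalue of the scaled Laplacian, which lies in $[-1, 1]$) and compares it with the closed form $T_k(x) = \cos(k \arccos x)$. The names `chebyshev` and `cheb_filter` are illustrative.

```python
import math


def chebyshev(k, x):
    """Evaluate T_k(x) with the recurrence T_k = 2x T_{k-1} - T_{k-2}."""
    t_prev, t_curr = 1.0, x  # T_0 and T_1
    if k == 0:
        return t_prev
    for _ in range(k - 1):
        t_prev, t_curr = t_curr, 2 * x * t_curr - t_prev
    return t_curr


def cheb_filter(thetas, x):
    """Frequency response sum_k theta_k * T_k(x) at a single scaled eigenvalue x."""
    return sum(theta * chebyshev(k, x) for k, theta in enumerate(thetas))


x = 0.5
print(chebyshev(3, x), math.cos(3 * math.acos(x)))  # both close to -1.0
print(cheb_filter([1.0, 0.0, 2.0], x))              # 1*T_0 + 2*T_2 = 1 + 2*(-0.5) = 0.0
```

In the matrix setting the same recurrence is applied to $\tilde{\mathbf{L}}\mathbf{x}$, so a $K$-term filter needs only $K$ sparse matrix-vector products and no eigendecomposition.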
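Laplacian Regularization Loss:
We minimize the Dirichlet energy to encourage smoothness:

$$\mathcal{L}_{reg} = \text{tr}\left(\mathbf{H}^\top \mathbf{L} \mathbf{H}\right)$$

For the combinatorial Laplacian $\mathbf{L} = \mathbf{D} - \mathbf{A}$, this equals $\sum_{(i,j) \in E} \|\mathbf{h}_i - \mathbf{h}_j\|^2$, the total squared difference between embeddings of bonded atoms.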
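As a sanity check on the Dirichlet energy, the sketch below computes it two ways on a three-node path graph: as a sum of squared embedding differences over edges, and as $\text{tr}(\mathbf{H}^\top \mathbf{L} \mathbf{H})$ with the combinatorial Laplacian $\mathbf{L} = \mathbf{D} - \mathbf{A}$. That choice of Laplacian is an assumption made for this example; the repository's implementation may use a normalized variant.

```python
def dirichlet_energy(edges, h):
    """Sum of squared embedding differences across undirected edges."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(h[i], h[j]))
        for i, j in edges
    )


def trace_form(edges, h, n):
    """tr(H^T L H) with L = D - A built from the same undirected edge list."""
    lap = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        lap[i][i] += 1
        lap[j][j] += 1
        lap[i][j] -= 1
        lap[j][i] -= 1
    dim = len(h[0])
    return sum(
        h[i][k] * lap[i][j] * h[j][k]
        for i in range(n) for j in range(n) for k in range(dim)
    )


edges = [(0, 1), (1, 2)]                     # path graph 0 - 1 - 2
h = [[1.0, 0.0], [2.0, 1.0], [4.0, 1.0]]     # 2-d node embeddings

print(dirichlet_energy(edges, h))  # 6.0
print(trace_form(edges, h, 3))     # 6.0
```

Both forms give 6.0 here: edge (0, 1) contributes 2 and edge (1, 2) contributes 4.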
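Laplacian Positional Encodings:
Optional positional features from Laplacian eigenvectors:

$$\mathbf{L} = \mathbf{U} \mathbf{\Lambda} \mathbf{U}^\top$$

The first $k$ non-trivial eigenvectors (columns of $\mathbf{U}$) serve as node positional features.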
Implementation: advanced_baselines/spectral_gnn.py
We evaluate submissions along multiple dimensions beyond raw accuracy.
Macro F1 Score is the primary ranking metric (see Evaluation Metrics).
Tracked via the efficiency formula above. We record:
- Inference time (ms per batch)
- Parameter count
- Memory usage
- FLOPs estimate
Use the profiler in evaluation/speed_benchmark.py to measure your model.
Good models should know when they don't know. We provide tools to evaluate:
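MC Dropout: Epistemic uncertainty via multiple forward passes with dropout enabled:

$$p(y \mid x) \approx \frac{1}{M} \sum_{m=1}^{M} p(y \mid x, \hat{\theta}_m)$$

where $\hat{\theta}_m$ are the weights under the $m$-th sampled dropout mask and $M$ is the number of passes.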
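Conformal Prediction: Distribution-free prediction sets with coverage guarantees:

$$\mathcal{C}(x) = \{\, y : s(x, y) \le \hat{q} \,\}$$

where $s(x, y)$ is a nonconformity score and $\hat{q}$ is the $\lceil (n+1)(1-\alpha) \rceil / n$ empirical quantile of the scores on $n$ calibration examples, which gives coverage of at least $1 - \alpha$.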
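A minimal sketch of split conformal prediction for a binary classifier follows. It assumes the simple nonconformity score $1 - \hat{p}(y \mid x)$, which is a common choice but not necessarily the one used in evaluation/uncertainty.py, and the calibration scores are made-up numbers.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Quantile of calibration nonconformity scores with the finite-sample correction."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based rank into the sorted scores
    if rank > n:
        return float("inf")  # too few calibration points: every label is included
    return sorted(cal_scores)[rank - 1]


def prediction_set(probs, q_hat):
    """Labels whose score 1 - p(label) does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1 - p <= q_hat]


# Hypothetical calibration scores: 1 - probability assigned to the true label
cal_scores = [0.05, 0.10, 0.20, 0.30, 0.15, 0.40, 0.25, 0.35, 0.45]
q_hat = conformal_threshold(cal_scores, alpha=0.2)

print(q_hat)                               # 0.4
print(prediction_set([0.7, 0.3], q_hat))   # [0]
```

With 9 calibration points and $\alpha = 0.2$, the threshold is the 8th smallest score. With too few calibration points the threshold becomes infinite and every label is returned, which keeps the coverage guarantee at the cost of uninformative sets.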
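Temperature Scaling: Post-hoc calibration via:

$$\hat{\mathbf{p}} = \text{softmax}(\mathbf{z} / T)$$

with temperature $T > 0$, a single scalar typically fitted on held-out validation data by minimizing the negative log-likelihood.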
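Temperature scaling can be sketched without any deep learning framework. The example below fits $T$ by a coarse grid search over validation negative log-likelihood; the grid search is a simplification chosen for this sketch (gradient-based fitting is more usual), and the logits are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    return -sum(
        math.log(softmax(z, temperature)[y]) for z, y in zip(logit_rows, labels)
    ) / len(labels)


def fit_temperature(logit_rows, labels, candidates):
    """Pick the candidate temperature with the lowest validation NLL."""
    return min(candidates, key=lambda t: nll(logit_rows, labels, t))


# Hypothetical overconfident validation logits: large margins, one mistake in four
val_logits = [[4.0, 0.0], [0.0, 4.0], [4.0, 0.0], [4.0, 0.0]]
val_labels = [0, 1, 0, 1]

best_t = fit_temperature(val_logits, val_labels, [0.5, 1.0, 2.0, 4.0])
print(best_t)                              # 4.0
print(softmax([4.0, 0.0], best_t)[0])      # softened confidence, about 0.73
```

Dividing by $T > 1$ softens the probabilities without changing the predicted class, so Macro F1 is unaffected while calibration improves.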
Metrics:
- Expected Calibration Error (ECE)
- Brier Score
- Empirical Coverage at 90%
Implementation: evaluation/uncertainty.py
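Two of the calibration metrics listed above are simple enough to sketch directly. The version below assumes binary predictions given as positive-class probabilities and uses equal-width confidence bins for ECE; the binning details in evaluation/uncertainty.py may differ.

```python
def brier_score(probs, labels):
    """Mean squared error between the positive-class probability and the label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE over equal-width confidence bins for binary predictions."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        conf = p if pred == 1 else 1 - p          # confidence in the predicted class
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1 if pred == y else 0))
    n = len(labels)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(hit for _, hit in b) / len(b)
            ece += len(b) / n * abs(acc - avg_conf)
    return ece


# Four predictions at 90% confidence, of which three are correct
probs = [0.9, 0.9, 0.9, 0.9]
labels = [1, 1, 1, 0]
print(round(expected_calibration_error(probs, labels), 4))  # 0.15
print(round(brier_score(probs, labels), 4))                 # 0.21
```

The model claims 90% confidence but is right 75% of the time, so the ECE is the 0.15 gap between the two.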
We evaluate model robustness to graph perturbations:
Attack Types:
- Random Edge Perturbation: Add/remove random edges
- Gradient-Based Attack: Remove high-importance edges
- Feature Noise: Gaussian noise on node features
- Feature Masking: Zero out random features
Metrics:
- Robust Accuracy under attack
- Attack Success Rate (ASR)
Implementation: evaluation/adversarial.py
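The random perturbations in the list above can be sketched with the standard library alone. These helpers are illustrative stand-ins, not the functions in evaluation/adversarial.py, and the gradient-based attack is omitted because it needs a trained model.

```python
import random


def drop_edges(edges, drop_prob, rng):
    """Remove each undirected edge independently with probability drop_prob."""
    return [e for e in edges if rng.random() >= drop_prob]


def mask_features(feats, mask_prob, rng):
    """Zero out each feature entry independently with probability mask_prob."""
    return [[0.0 if rng.random() < mask_prob else x for x in row] for row in feats]


def add_gaussian_noise(feats, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to every entry."""
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in feats]


rng = random.Random(0)  # fixed seed so perturbations are reproducible
edges = [(0, 1), (1, 2), (2, 3)]
feats = [[1.0, 2.0], [3.0, 4.0]]

perturbed_edges = drop_edges(edges, 0.3, rng)
masked = mask_features(feats, 0.5, rng)
noisy = add_gaussian_noise(feats, 0.1, rng)
print(perturbed_edges, masked, noisy)
```

Robust accuracy is then the model's accuracy when evaluated on the perturbed graphs instead of the clean ones.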
We visualize the accuracy-efficiency trade-off as a Pareto front.
A model is Pareto optimal if no other model is:
- Better in accuracy AND equally efficient, OR
- Equally accurate AND more efficient, OR
- Better in both
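Hypervolume Indicator:

$$HV(\mathcal{P}) = \text{area}\left(\bigcup_{p \in \mathcal{P}} [r, p]\right)$$

where $\mathcal{P}$ is the Pareto front, $r$ is a reference point, and $[r, p]$ is the rectangle spanned by $r$ and $p$ in the accuracy-efficiency plane. Higher hypervolume indicates better overall performance.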
Visualization: visualization/pareto_plot.py
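The Pareto definition and the hypervolume indicator can both be sketched for the two-objective case, where higher Macro F1 and higher efficiency are both better. The model names and scores below are made up, and the functions are illustrative rather than taken from visualization/pareto_plot.py.

```python
def pareto_front(models):
    """Return models not dominated by any other; each model is (name, f1, efficiency)."""
    front = []
    for name, f1, eff in models:
        dominated = any(
            (f2 >= f1 and e2 >= eff) and (f2 > f1 or e2 > eff)
            for _, f2, e2 in models
        )
        if not dominated:
            front.append((name, f1, eff))
    return front


def hypervolume_2d(front, ref=(0.0, 0.0)):
    """Area dominated by the front relative to a reference point (both axes maximized)."""
    pts = sorted(((f1, eff) for _, f1, eff in front), reverse=True)  # F1 descending
    area, best_eff = 0.0, ref[1]
    for f1, eff in pts:
        if eff > best_eff:
            area += (f1 - ref[0]) * (eff - best_eff)
            best_eff = eff
    return area


# Hypothetical (name, macro_f1, efficiency) triples
models = [("A", 0.60, 0.2), ("B", 0.50, 0.5), ("C", 0.45, 0.4), ("D", 0.70, 0.1)]

front = pareto_front(models)
print([name for name, _, _ in front])    # ['A', 'B', 'D']
print(round(hypervolume_2d(front), 4))   # 0.28
```

Model C is dominated by B, which is better on both axes, so it drops out of the front. The hypervolume is the area of the staircase region under the remaining three points.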
- GAT (Graph Attention Network): attention-weighted message passing
- MPNN (Message Passing Neural Network): edge-conditioned convolutions
- AttentiveFP: designed specifically for molecular property prediction
- D-MPNN: see our implementation in advanced_baselines/dmpnn.py
- Spectral GNN: see our implementation in advanced_baselines/spectral_gnn.py
- Ensemble methods: combine multiple architectures
- Class weighting: address class imbalance via weighted cross-entropy
- Focal loss: down-weight easy examples, focus on hard ones
- Laplacian regularization: encourage smooth representations (see Spectral GNN)
- Data augmentation: random edge dropping, node feature masking
- Different pooling: sum pooling, attention-based pooling, Set2Set
- Virtual nodes: add a global node connected to all atoms
- Positional encodings: Laplacian eigenvectors, random walk features
- Learning rate scheduling: cosine annealing, warm restarts
- Early stopping: monitor validation F1 to prevent overfitting
- Speed benchmark: evaluation/speed_benchmark.py (profile inference time)
- Uncertainty: evaluation/uncertainty.py (MC Dropout, Conformal Prediction)
- Adversarial: evaluation/adversarial.py (robustness testing)
- Visualization: visualization/pareto_plot.py (Pareto front analysis)
- PyTorch Geometric Documentation
- OGB Leaderboard for MolBACE
- Graph Neural Networks: A Review
- GraphSAGE Paper
- GIN Paper
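Two of the imbalance-handling suggestions listed above, class weighting and focal loss, can be written per example in a few lines. This is a scalar sketch for intuition only; the `pos_weight` and `gamma` values are illustrative defaults, not settings from the baselines.

```python
import math


def weighted_bce(p, y, pos_weight=1.0):
    """Binary cross-entropy with the positive-class term scaled by pos_weight."""
    return -(pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p))


def focal_loss(p, y, gamma=2.0):
    """Binary focal loss: cross-entropy scaled by (1 - p_t) ** gamma."""
    p_t = p if y == 1 else 1 - p   # probability assigned to the true class
    return -((1 - p_t) ** gamma) * math.log(p_t)


easy = focal_loss(0.9, 1)   # confident and correct: strongly down-weighted
hard = focal_loss(0.6, 1)   # uncertain: keeps more of its loss
print(easy < hard)          # True

# With roughly 30% positives, a weight near 70/30 balances the two classes
print(weighted_bce(0.5, 1, pos_weight=70 / 30))
```

With `gamma=0` the focal loss reduces to plain cross-entropy, so `gamma` controls how aggressively easy examples are discounted.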
gnn-ddi/
├── data/
│ ├── train.csv # Training molecule indices
│ ├── valid.csv # Validation molecule indices
│ ├── test.csv # Test molecule indices (labels hidden)
│ └── ogb/ # OGB dataset (auto-downloaded)
├── submissions/
│ ├── sample_submission.csv
│ ├── gcn_submission.csv
│ ├── gin_submission.csv
│ └── graphsage_submission.csv
├── starter_code/
│ ├── baseline.py # Baseline models (GraphSAGE, GCN, GIN)
│ └── requirements.txt # Python dependencies
├── advanced_baselines/
│ ├── dmpnn.py # Directed Message Passing NN
│ └── spectral_gnn.py # Spectral GNN with Laplacian regularization
├── evaluation/
│ ├── speed_benchmark.py # Performance profiling
│ ├── uncertainty.py # Uncertainty quantification
│ └── adversarial.py # Adversarial robustness tests
├── visualization/
│ └── pareto_plot.py # Pareto front visualization
├── schema/
│ └── submission_metadata.json # Metadata JSON schema
├── scripts/
│ └── generate_labels.py # Label generation utility
├── docs/
│ └── PRIVATE_REPO_SETUP.md # Private repo setup guide
├── .github/
│ └── workflows/
│ └── evaluate.yml # Automated scoring workflow
├── scoring_script.py # Evaluation script (Macro F1 + Efficiency)
├── update_leaderboard.py # Leaderboard update utility
├── leaderboard.md # Current standings
└── README.md
Hidden Infrastructure
Test and validation labels are stored in a private repository (gnn-ddi-private) and are only accessed during GitHub Actions evaluation. This ensures:
- Participants cannot access ground truth labels
- Fair and tamper-proof evaluation
- Transparent scoring via automated comments
- No external data: Use only the provided OGB MolBACE dataset
- No pre-trained models: Train from scratch; pre-trained molecular embeddings are not allowed
- One submission per PR: Each pull request should contain exactly one submission file
- Best score kept: Multiple submissions allowed; the leaderboard shows your best score
- Code sharing encouraged: You may share code and ideas, but submit individually
- Fair play: Do not attempt to access test labels or exploit the evaluation system
Q: Can I use libraries other than PyTorch Geometric?
Yes. You can use DGL, Spektral, JAX, or any other framework. Ensure your final predictions follow the CSV format.
Q: How do I test locally before submitting?
Use the validation set to evaluate your model locally. Training labels are available via OGB; only test labels are hidden.
Q: Can I submit multiple times?
Yes. The leaderboard keeps your best score. Each submission triggers a fresh evaluation.
Q: How does the automated scoring work?
When you open a PR, GitHub Actions fetches the hidden test labels from a private repository, runs the scoring script, and comments on your PR with the result.
Q: When does the competition end?
This is an ongoing challenge. Top performers will be contacted for the research opportunity.
- Dataset: Open Graph Benchmark
- Original BACE data: MoleculeNet
If you use this challenge or the methods implemented here, please cite the following:
Open Graph Benchmark (OGB)
@article{hu2020ogb,
title={Open Graph Benchmark: Datasets for Machine Learning on Graphs},
author={Hu, Weihua and Fey, Matthias and Zitnik, Marinka and Dong, Yuxiao and Ren, Hongyu and Liu, Bowen and Catasta, Michele and Leskovec, Jure},
journal={Advances in Neural Information Processing Systems},
volume={33},
pages={22118--22133},
year={2020}
}
MoleculeNet
@article{wu2018moleculenet,
title={MoleculeNet: A Benchmark for Molecular Machine Learning},
author={Wu, Zhenqin and Ramsundar, Bharath and Feinberg, Evan N and Gomes, Joseph and Geniesse, Caleb and Pappu, Aneesh S and Leswing, Karl and Pande, Vijay},
journal={Chemical Science},
volume={9},
number={2},
pages={513--530},
year={2018},
publisher={Royal Society of Chemistry}
}
GraphSAGE
@inproceedings{hamilton2017inductive,
title={Inductive Representation Learning on Large Graphs},
author={Hamilton, William L and Ying, Rex and Leskovec, Jure},
booktitle={Advances in Neural Information Processing Systems},
volume={30},
year={2017}
}
Graph Convolutional Networks (GCN)
@inproceedings{kipf2017semi,
title={Semi-Supervised Classification with Graph Convolutional Networks},
author={Kipf, Thomas N and Welling, Max},
booktitle={International Conference on Learning Representations},
year={2017}
}
Graph Isomorphism Network (GIN)
@inproceedings{xu2019powerful,
title={How Powerful are Graph Neural Networks?},
author={Xu, Keyulu and Hu, Weihua and Leskovec, Jure and Jegelka, Stefanie},
booktitle={International Conference on Learning Representations},
year={2019}
}
Directed Message Passing Neural Network (D-MPNN)
@article{yang2019analyzing,
title={Analyzing Learned Molecular Representations for Property Prediction},
author={Yang, Kevin and Swanson, Kyle and Jin, Wengong and Coley, Connor and
Eiden, Philipp and Gao, Hua and Guzman-Perez, Angel and Hopper, Timothy and
Kelley, Brian and Mathea, Miriam and others},
journal={Journal of Chemical Information and Modeling},
volume={59},
number={8},
pages={3370--3388},
year={2019},
publisher={ACS Publications}
}
Spectral Graph Theory
@book{chung1997spectral,
title={Spectral Graph Theory},
author={Chung, Fan RK},
year={1997},
publisher={American Mathematical Society}
}
Chebyshev Spectral Convolutions
@inproceedings{defferrard2016convolutional,
title={Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering},
author={Defferrard, Micha{\"e}l and Bresson, Xavier and Vandergheynst, Pierre},
booktitle={Advances in Neural Information Processing Systems},
volume={29},
year={2016}
}
Conformal Prediction
@article{romano2020classification,
title={Classification with Valid and Adaptive Coverage},
author={Romano, Yaniv and Sesia, Matteo and Candes, Emmanuel},
journal={Advances in Neural Information Processing Systems},
volume={33},
pages={3581--3591},
year={2020}
}
PyTorch Geometric
@inproceedings{fey2019fast,
title={Fast Graph Representation Learning with PyTorch Geometric},
author={Fey, Matthias and Lenssen, Jan Eric},
booktitle={ICLR Workshop on Representation Learning on Graphs and Manifolds},
year={2019}
}
- Jure Leskovec (Stanford University): Open Graph Benchmark, GraphSAGE
- Weihua Hu (Stanford University): Open Graph Benchmark
- Zhenqin Wu and Vijay Pande (Stanford University): MoleculeNet
- William L. Hamilton, Rex Ying, Jure Leskovec: GraphSAGE
- Thomas N. Kipf, Max Welling: Graph Convolutional Networks
- Keyulu Xu, Weihua Hu, Jure Leskovec, Stefanie Jegelka: Graph Isomorphism Network
- Matthias Fey, Jan Eric Lenssen: PyTorch Geometric
- Deep Graph Library (DGL) Team: DGL Framework
- BASIRA Lab: Research collaboration and support
- Prof. Islem Rekik (Imperial College London): Mentorship and guidance
- Murat Kolic: Sarajevo, Bosnia and Herzegovina
For questions or issues, please open a GitHub Issue.
Organizer: Murat Kolic (@muuki2)
Location: Sarajevo, Bosnia and Herzegovina
Good luck. May the best GNN win.