This competition bridges Biomedical Engineering and Graph Machine Learning. Your task is to build a model that can diagnose the type of a cell (Tumor, Stroma, Immune, etc.) based on its features and its spatial neighbors in a tissue biopsy. Standard AI often fails in clinics because it memorizes training data, so our challenge forces models to predict cell types inductively, meaning they must generalize to completely unseen patients.
You are provided with cell graphs constructed from H&E stained histology images.
- Training Phase: You receive full graphs (nodes, edges, and labels) from a set of patients (e.g., Patient A, Patient B).
- Testing Phase: You must predict the cell types for completely unseen patients (e.g., Patient C).
Unlike standard transductive tasks (like Cora), the test nodes belong to entirely new graphs (new tissue slides) that were not present during training. Your model must learn general rules about tissue organization, not just memorize a specific graph structure.
The data is derived from the NuCLS dataset (breast cancer).
The graph was built using the following inductive pipeline to ensure biological realism:
[ Histology Image ] --> [ Cell Detection ] --> [ Graph Construction ]
🖼️ 📍 🕸️
(Raw Pixels) (Centroids x,y) (Nodes + Edges)
|
(k-NN Neighbors)
| Component | Description |
|---|---|
| Node Definition | Raw bounding boxes from NuCLS were converted into centroids (x, y). Each node represents a single cell nucleus. |
| Edge Construction (k-NN) | For every cell, we computed its 5 nearest spatial neighbors within the same tissue image. Edges represent the local tissue microenvironment (e.g., cell–cell interactions). Edges strictly connect cells within the same image — no edges between different patients. |
| Inductive Split | Dataset was split by Image ID, not by random cells. Training graph = 80% of tissue images. Test graph = remaining 20% (completely unseen patients). |
| Feature Normalization | Node features (x, y, width, height) are standardized (zero mean, unit variance) for stable GNN training. |
- Nodes: Individual cells (nuclei)
- Edges: Spatial proximity via k-Nearest Neighbors (
k=5). Physically close cells are connected.
| Feature | Description |
|---|---|
x, y |
Normalized spatial coordinates |
width, height |
Morphological features of the nucleus |
| Label | Cell Type | Description |
|---|---|---|
0 |
Tumor | Malignant cells |
1 |
Stromal | Connective tissue |
2 |
Lymphocyte | Immune cells |
3 |
Macrophage | Immune cells |
| File | Description | Columns |
|---|---|---|
train.csv |
Training nodes with labels | id, x, y, width, height, label |
edge_list.csv |
Edges for the training graph | source, target |
test_nodes.csv |
Unseen test nodes (no labels) | id, x, y, width, height |
test_edges.csv |
Edges for the test graph | source, target |
sample_submission.csv |
Example submission format | — |
Create a single CSV file named strictly using your team name: <team_name>.csv (e.g., emmanuel_owusu.csv).
id,y_pred
41269,0
41270,2
...Requirements:
- Header must be exactly:
id,y_pred - One row per test node
- Labels must be integers in
[0–3]
git clone https://github.com/emmakowu3579-ui/inductive-class-challenge.git
cd inductive-class-challenge
pip install -r starter_code/requirements.txtA simple PyTorch GCN baseline is provided in the starter_code/ directory.
python starter_code/baseline.pyThis will:
- Train a basic GCN on the training graph
- Generate a submission file at
submissions/baseline_submission.csv
To preserve privacy, you must encrypt your CSV file before uploading. Do not upload raw CSV files.
# Usage: python starter_code/encrypt.py <path_to_your_team_csv>
python starter_code/encrypt.py submissions/<team_name>.csv
# Example:
# python starter_code/encrypt.py submissions/emmanuel_owusu.csv
⚠️ IMPORTANT: Do NOT commit the raw.csvfile. Ensure you are only committing the.encfile with your team name.
git add submissions/<team_name>.csv.enc
git commit -m "Add encrypted submission for <team_name>"
git push origin <your-branch-name>
# Example:
# git add submissions/emmanuel_owusu.csv.enc
# git commit -m "Add encrypted submission for emmanuel_owusu"
# git push origin mainThen open a Pull Request against the main branch on GitHub.
Once your PR is opened:
- ✅ An Auto-Grader Bot runs automatically (it decrypts your file securely)
- 📊 Your Macro F1-Score is computed
- 💬 The score is posted as a comment on your PR
If the submission is valid, the PR will be merged by an admin and 🎉 your name appears on the Leaderboard.
| Rule | Detail |
|---|---|
| Evaluation Metric | Macro F1-Score |
| Inductive Setting | No access to test labels during training. No memorization of node IDs or embeddings. |
| Message Passing | Allowed only on the training graph during training. Test edges may be used only at inference time. |
| External Data | ❌ Strictly forbidden |
| Runtime Constraint | Training must finish in < 5 minutes on Google Colab (CPU/GPU) |
| Libraries | Any standard GNN library (PyTorch, PyTorch Geometric, DGL, etc.) |
Normal table leaderboard: 📈 View Leaderboard
Interactive web leaderboard that updates automatically whenever a new submission is graded!