154 changes: 134 additions & 20 deletions Graph_Representation_Learning_Rushil_Singha/README.md
@@ -1,5 +1,8 @@
# JetNet Graph Diffusion Model

**Author**: Rushil Singha
**GSoC 2025 Project**: Graph-based diffusion models for realistic jet generation

A PyTorch/PyTorch-Geometric implementation of a **graph-based diffusion model** for generating realistic jets from the [JetNet dataset](https://huggingface.co/datasets/jetnet).

This project builds **k-nearest neighbor (kNN) jet graphs**, learns **Chebyshev GCN (ChebNet) embeddings**, trains a **diffusion model in latent space**, and decodes back into particle-level jets.
@@ -18,37 +21,148 @@ This project builds **k-nearest neighbor (kNN) jet graphs**, learns **Chebyshev

## ⚙️ Installation

Clone the repo and install dependencies:
### Prerequisites
- Python 3.8+ (tested on 3.9)
- CUDA 11.8+ (for GPU acceleration)
- At least 8GB RAM (16GB recommended)

### Setup

```bash
git clone https://github.com/your-username/jetnet-graph-diffusion.git
cd jetnet-graph-diffusion
# Clone and navigate to project
git clone https://github.com/ML4SCI/GENIE.git
cd GENIE/Graph_Representation_Learning_Rushil_Singha

# Install dependencies
pip install -r requirements.txt
```

**Note**: If you encounter PyTorch Geometric installation issues, install manually:
```bash
pip install torch==2.0.0+cu118 -f https://download.pytorch.org/whl/torch_stable.html
pip install torch-geometric torch-scatter torch-sparse torch-cluster -f https://data.pyg.org/whl/torch-2.0.0+cu118.html
```

---

## 🏃‍♂️ Usage

### Basic Run
```bash
python code.py
```

### What the script does:
1. **Downloads JetNet dataset** (~2GB) to `jetnet_data/` directory
2. **Preprocesses jets** - extracts particle features (eta, phi, pt) and masks
3. **Builds kNN graphs** - constructs k=8 nearest neighbor graphs for each jet
4. **Trains ChebNet encoder** - learns 64-dimensional latent representations
5. **Runs diffusion training** - trains denoising model in latent space
6. **Generates synthetic jets** - samples new jets from trained model
7. **Evaluates results** - computes KL divergence and Wasserstein distances
8. **Saves outputs** to `results/` directory
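
The graph-building step (step 3) can be sketched as follows. This is a minimal NumPy illustration, not the code in `code.py`: the function name `knn_edges` is hypothetical, it assumes padded (masked) particles have already been removed, and it measures distance in the eta-phi plane with phi wrapped to [-pi, pi). In practice a library routine such as `knn_graph` from torch-cluster may be used instead.

```python
import numpy as np

def knn_edges(eta, phi, k=8):
    """Link each particle to its k nearest neighbours in the eta-phi plane.

    Returns an integer array of shape (2, n * k): row 0 holds the particle
    index, row 1 the index of one of its neighbours. (Hypothetical helper.)
    """
    eta = np.asarray(eta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    n = eta.shape[0]
    k = min(k, n - 1)  # a jet with n particles has at most n - 1 neighbours
    if k <= 0:
        return np.zeros((2, 0), dtype=np.int64)

    d_eta = eta[:, None] - eta[None, :]
    d_phi = phi[:, None] - phi[None, :]
    d_phi = (d_phi + np.pi) % (2 * np.pi) - np.pi  # wrap the angle difference
    dist = np.hypot(d_eta, d_phi)
    np.fill_diagonal(dist, np.inf)  # exclude self-loops

    neighbours = np.argsort(dist, axis=1)[:, :k]  # shape (n, k)
    src = np.repeat(np.arange(n), k)
    dst = neighbours.reshape(-1)
    return np.stack([src, dst])
```

With `k=8` this matches the `K_NEIGHBORS` setting described in this README; whether edges should also be made symmetric is a design choice the sketch leaves open.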

### Expected Runtime
- **CPU**: 3-4 hours
- **GPU (RTX 3080+)**: 45-90 minutes
- **Memory usage**: 6-12GB RAM

### Output Files
```
results/
├── training_logs.txt # Training progress and losses
├── generated_jets.png # Comparison plots
├── evaluation_metrics.json # KL divergence, Wasserstein distances
├── model_checkpoints/ # Saved model weights
└── jet_visualizations/ # Individual jet plots
```

requirements.txt

numpy==1.24.3
torch==2.0.0
torch-geometric
torch-scatter
torch-sparse
torch-cluster
networkx
scikit-learn
jetnet
---

## 🔧 Configuration

Key parameters in `code.py`:
```python
# Graph construction
K_NEIGHBORS = 8 # kNN graph connectivity
LATENT_DIM = 64 # Embedding dimension

# Training
BATCH_SIZE = 32 # Adjust based on GPU memory
LEARNING_RATE = 1e-4 # Adam optimizer learning rate
NUM_EPOCHS = 100 # Training epochs
```
# This script:

->Encodes jets into latent space
---

->Runs diffusion training
## 📊 Expected Results

->Decodes jets back into particle space
**Good results show:**
- KL divergence < 0.1 for jet mass and pT distributions
- Wasserstein distance < 0.05 for particle multiplicity
- Generated jets visually similar to real jets in eta-phi space
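
For reference, the two metrics above can be computed roughly as follows. This is a sketch under stated assumptions, not the evaluation code in `code.py`: the function names are hypothetical, the KL divergence is taken in the direction KL(real || generated) over a shared histogram, and the Wasserstein distance uses the sorted-sample formula, which requires equal sample sizes. `scipy.stats.wasserstein_distance` handles the general case.

```python
import numpy as np

def histogram_kl(real, generated, bins=50, eps=1e-10):
    """KL(real || generated) estimated from histograms on a shared binning."""
    real = np.asarray(real, dtype=float)
    generated = np.asarray(generated, dtype=float)
    lo = min(real.min(), generated.min())
    hi = max(real.max(), generated.max())
    p, _ = np.histogram(real, bins=bins, range=(lo, hi))
    q, _ = np.histogram(generated, bins=bins, range=(lo, hi))
    # Normalise to probabilities; eps keeps empty bins from producing log(0).
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def wasserstein_1d(real, generated):
    """1-Wasserstein distance between two equal-size 1D samples."""
    a = np.sort(np.asarray(real, dtype=float))
    b = np.sort(np.asarray(generated, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(a - b)))
```

Both values depend on choices such as the number of bins and the units of the observable, so thresholds like those above are only meaningful for a fixed evaluation setup.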

**If results are poor:**
- Increase training epochs (try 200+)
- Adjust learning rate (try 5e-5 or 2e-4)
- Check GPU memory usage (reduce batch size if needed)

---

## 🐛 Troubleshooting

**Common Issues:**

1. **CUDA out of memory**
```python
# Reduce batch size in code.py
BATCH_SIZE = 16 # or 8
```

2. **JetNet download fails**
```bash
# Manual download alternative
wget https://zenodo.org/record/6975118/files/jetnet.tar.gz
tar -xzf jetnet.tar.gz
```

3. **PyTorch Geometric errors**
```bash
# Reinstall with specific CUDA version
pip uninstall torch-geometric torch-scatter torch-sparse
pip install torch-geometric torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-2.0.0+cu118.html
```

4. **Slow training on CPU**
   - This is expected behavior; consider using Google Colab or a cloud GPU
- Reduce dataset size by modifying `num_particles=50` in `load_jetnet_data()`

---

## 📈 Performance Tips

- **GPU acceleration**: Ensure CUDA is properly installed
- **Memory optimization**: Use gradient checkpointing for large models
- **Faster convergence**: Try learning rate scheduling
- **Better results**: Experiment with different graph construction methods (radius graphs, etc.)
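
As an illustration of the learning rate scheduling tip, here is a small warmup-plus-cosine schedule in plain Python. It is a sketch, not part of `code.py`: `cosine_lr`, `min_lr`, and `warmup_epochs` are hypothetical names and values, while the defaults for `base_lr` and `num_epochs` mirror `LEARNING_RATE` and `NUM_EPOCHS` from the configuration section.

```python
import math

def cosine_lr(epoch, num_epochs=100, base_lr=1e-4, min_lr=1e-6, warmup_epochs=5):
    """Learning rate for a given epoch: linear warmup, then cosine decay."""
    if epoch < warmup_epochs:
        # Ramp up linearly so the first updates are not too large.
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, num_epochs - warmup_epochs)
    progress = min(progress, 1.0)
    # Cosine decay from base_lr down to min_lr.
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

One way to use it is to assign the returned value to each entry of the optimizer's `param_groups` at the start of every epoch.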

---

## 🤝 Contributing

Found a bug or want to improve the model?
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request

---

->Logs evaluation metrics
## 📚 References

->Saves visualizations to results/
- [JetNet Dataset](https://huggingface.co/datasets/jetnet)
- [PyTorch Geometric Documentation](https://pytorch-geometric.readthedocs.io/)
- [Chebyshev Graph Convolutions](https://arxiv.org/abs/1606.09375)



105 changes: 105 additions & 0 deletions Non_local_Jet_Classification_Tanmay_Bakshi/readme.md
@@ -0,0 +1,105 @@
# Non-local Jet Classification with Topological Features

**Author**: Tanmay Bakshi
**GSoC 2025 Project**: Advanced jet classification using persistent homology and topological data analysis

## Overview

This project implements neural network architectures for classifying particle jets, with a focus on capturing non-local geometric features through topological data analysis. The approach combines traditional jet features with persistent homology to improve classification performance on quark vs gluon discrimination tasks.

## Dataset

The project uses the **Top Quark Tagging Reference Dataset** by Kasieczka et al., featuring:
- 1.2M training events, 400k validation, 400k test events
- 14 TeV hadronic tops (signal) vs QCD dijets (background)
- Anti-kT 0.8 jets in pT range [550,650] GeV
- Leading 200 jet constituents stored per jet
- Constituents sorted by pT (highest first)
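
The fixed-size, pT-ordered layout described above can be reproduced for a single jet along these lines. This is a minimal sketch, not the project's loader in `datasets.py`: the function name `pad_constituents` is hypothetical, and it assumes the constituents are already available as separate pT, eta, and phi arrays.

```python
import numpy as np

def pad_constituents(pt, eta, phi, max_constituents=200):
    """Sort constituents by descending pT, then truncate or zero-pad.

    Returns (features, mask): features has shape (max_constituents, 3) with
    columns (pT, eta, phi); mask is True for real constituents.
    """
    pt = np.asarray(pt, dtype=float)
    eta = np.asarray(eta, dtype=float)
    phi = np.asarray(phi, dtype=float)

    # Highest pT first; keep only the leading max_constituents entries.
    order = np.argsort(-pt, kind="stable")[:max_constituents]
    n = order.size

    features = np.zeros((max_constituents, 3))
    mask = np.zeros(max_constituents, dtype=bool)
    features[:n] = np.stack([pt[order], eta[order], phi[order]], axis=1)
    mask[:n] = True
    return features, mask
```

Keeping the boolean mask alongside the padded array lets downstream models ignore the zero-padded slots.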

## Project Structure

```
Non_local_Jet_Classification_Tanmay_Bakshi/
├── main.py # Main entry point
├── datasets.py # Data loading utilities
├── coordinates_extract.py # Feature extraction
├── data_arrange.py # Data preprocessing
├── preprocess_dask.py # Parallel preprocessing
├── persistent_net-2.ipynb # Interactive demo notebook
├── console/ # Console utilities
├── helper/ # Helper functions
├── nn/ # Neural network models
├── persistence/ # Topological analysis
├── scnn/ # Simplicial CNN implementation
└── Weaver/ # Weaver framework integration
```

## Quick Start

### Prerequisites
- Python 3.8+
- PyTorch 1.8+
- awkward-array
- scikit-learn
- h5py
- pandas
- numpy

### Installation
```bash
# Navigate to project directory
cd Non_local_Jet_Classification_Tanmay_Bakshi

# Install dependencies (create requirements.txt if needed)
pip install torch awkward scikit-learn h5py pandas numpy matplotlib

# For topological analysis
pip install gudhi # for persistent homology
```

### Running the Code

**Option 1: Python Script**
```bash
python main.py
```

**Option 2: Interactive Notebook (Recommended)**
```bash
jupyter notebook persistent_net-2.ipynb
```

**Option 3: Data Preprocessing**
```bash
# For large datasets, use parallel preprocessing
python preprocess_dask.py
```

## Key Features

- **Topological Feature Extraction**: Uses persistent homology to capture jet topology
- **Multi-scale Analysis**: Analyzes jets at different geometric scales
- **Advanced Architectures**: Implements Simplicial CNNs and graph-based methods
- **Weaver Integration**: Compatible with the Weaver framework for particle physics ML
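
To give a feel for what persistent homology extracts, the sketch below computes only the simplest case: 0-dimensional persistence (connected components) of a point cloud under a Rips-style filtration. It is an illustration in NumPy, not the implementation in `persistence/`: the function name `h0_persistence` is hypothetical, distances are plain Euclidean (no phi wrapping), and the convention that an edge enters the filtration at its length is an assumption. For higher dimensions a library such as `gudhi` (for example its `RipsComplex` class) is the practical choice.

```python
import numpy as np

def h0_persistence(points):
    """Death scales of connected components as the distance scale grows.

    Every point is born as its own component at scale 0. When two components
    merge, one of them dies at the length of the connecting edge. One
    component never dies, so n points yield n - 1 finite death values.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    if n < 2:
        return []

    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    iu, ju = np.triu_indices(n, k=1)
    order = np.argsort(dist[iu, ju])  # process edges from shortest to longest

    deaths = []
    for e in order:
        i, j = int(iu[e]), int(ju[e])
        a, b = find(i), find(j)
        if a != b:
            parent[a] = b  # merge: one component dies at this scale
            deaths.append(float(dist[i, j]))
    return deaths
```

The resulting death values are sorted in increasing order; long-lived components correspond to well-separated clusters of constituents.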

## Expected Outputs

- Classification accuracy metrics
- ROC curves and performance plots
- Topological feature visualizations
- Model checkpoints in respective subdirectories

## Troubleshooting

**Common Issues:**
1. **Memory errors**: Reduce batch size or use `preprocess_dask.py` for large datasets
2. **Missing dependencies**: Install `gudhi` for topological analysis features
3. **CUDA errors**: Ensure PyTorch CUDA version matches your system

## Citation

If you use this code, please cite:
```
Kasieczka, G., Plehn, T., Thompson, J., & Russell, M.
"Top Quark Tagging Reference Dataset"
```