ML4SCI · Asadp3406 · Jan 22, 2026
diff --git a/Graph_Representation_Learning_Rushil_Singha/README.md b/Graph_Representation_Learning_Rushil_Singha/README.md
@@ -1,5 +1,8 @@
 # JetNet Graph Diffusion Model
 
+**Author**: Rushil Singha  
+**GSoC 2025 Project**: Graph-based diffusion models for realistic jet generation
+
 A PyTorch/PyTorch-Geometric implementation of a **graph-based diffusion model** for generating realistic jets from the [JetNet dataset](https://huggingface.co/datasets/jetnet).  
 
 This project builds **k-nearest neighbor (kNN) jet graphs**, learns **Chebyshev GCN (ChebNet) embeddings**, trains a **diffusion model in latent space**, and decodes back into particle-level jets.
@@ -18,37 +21,148 @@ This project builds **k-nearest neighbor (kNN) jet graphs**, learns **Chebyshev
 
 ## ⚙️ Installation
 
-Clone the repo and install dependencies:
+### Prerequisites
+- Python 3.8+ (tested on 3.9)
+- CUDA 11.8+ (optional, for GPU acceleration; the script also runs on CPU)
+- At least 8GB RAM (16GB recommended)
+
+### Setup
 
 ```bash
-git clone https://github.com/your-username/jetnet-graph-diffusion.git
-cd jetnet-graph-diffusion
+# Clone and navigate to project
+git clone https://github.com/ML4SCI/GENIE.git
+cd GENIE/Graph_Representation_Learning_Rushil_Singha
 
+# Install dependencies
 pip install -r requirements.txt
+```
+
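+**Note**: If you encounter PyTorch Geometric installation issues, install the wheels manually. The commands below assume PyTorch 2.0.0 with CUDA 11.8; adjust the version tags in the URLs to match your setup: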
+```bash
+pip install torch==2.0.0+cu118 -f https://download.pytorch.org/whl/torch_stable.html
+pip install torch-geometric torch-scatter torch-sparse torch-cluster -f https://data.pyg.org/whl/torch-2.0.0+cu118.html
+```
+
+---
+
+## 🏃‍♂️ Usage
+
+### Basic Run
+```bash
+python code.py
+```
+
+### What the script does:
+1. **Downloads JetNet dataset** (~2GB) to `jetnet_data/` directory
+2. **Preprocesses jets** - extracts particle features (eta, phi, pt) and masks
+3. **Builds kNN graphs** - constructs k=8 nearest neighbor graphs for each jet
+4. **Trains ChebNet encoder** - learns 64-dimensional latent representations
+5. **Runs diffusion training** - trains denoising model in latent space
+6. **Generates synthetic jets** - samples new jets from trained model
+7. **Evaluates results** - computes KL divergence and Wasserstein distances
+8. **Saves outputs** to `results/` directory
+
+### Expected Runtime
+- **CPU**: 3-4 hours
+- **GPU (RTX 3080+)**: 45-90 minutes
+- **Memory usage**: 6-12GB RAM
+
+### Output Files
+```
+results/
+├── training_logs.txt          # Training progress and losses
+├── generated_jets.png         # Comparison plots
+├── evaluation_metrics.json    # KL divergence, Wasserstein distances
+├── model_checkpoints/         # Saved model weights
+└── jet_visualizations/        # Individual jet plots
+```
 
-requirements.txt
-
-numpy==1.24.3
-torch==2.0.0
-torch-geometric
-torch-scatter
-torch-sparse
-torch-cluster
-networkx
-scikit-learn
-jetnet
+---
+
+## 🔧 Configuration
+
+Key parameters in `code.py`:
+```python
+# Graph construction
+K_NEIGHBORS = 8             # Number of neighbors per node in the kNN graph
+LATENT_DIM = 64             # Embedding dimension
+
+# Training
+BATCH_SIZE = 32             # Adjust based on GPU memory
+LEARNING_RATE = 1e-4        # Adam optimizer learning rate
+NUM_EPOCHS = 100            # Training epochs
 ```
-# This script:
 
-->Encodes jets into latent space
+---
 
-->Runs diffusion training
+## 📊 Expected Results
 
-->Decodes jets back into particle space
+**Good results show:**
+- KL divergence < 0.1 for jet mass and pT distributions
+- Wasserstein distance < 0.05 for particle multiplicity
+- Generated jets visually similar to real jets in eta-phi space
+
+**If results are poor:**
+- Increase training epochs (try 200+)
+- Adjust learning rate (try 5e-5 or 2e-4)
+- Check GPU memory usage (reduce batch size if needed)
+
+---
+
+## 🐛 Troubleshooting
+
+**Common Issues:**
+
+1. **CUDA out of memory**
+   ```python
+   # Reduce batch size in code.py
+   BATCH_SIZE = 16  # or 8
+   ```
+
+2. **JetNet download fails**
+   ```bash
+   # Manual download alternative
+   wget https://zenodo.org/record/6975118/files/jetnet.tar.gz
+   tar -xzf jetnet.tar.gz
+   ```
+
+3. **PyTorch Geometric errors**
+   ```bash
+   # Reinstall all PyG packages with wheels built for your PyTorch/CUDA version
+   pip uninstall torch-geometric torch-scatter torch-sparse torch-cluster
+   pip install torch-geometric torch-scatter torch-sparse torch-cluster -f https://data.pyg.org/whl/torch-2.0.0+cu118.html
+   ```
+
+4. **Slow training on CPU**
+   - This is expected on CPU; consider using Google Colab or a cloud GPU
+   - Reduce the input size by changing `num_particles=50` in `load_jetnet_data()` to a smaller value
+
+---
+
+## 📈 Performance Tips
+
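+- **GPU acceleration**: Confirm that PyTorch can see the GPU (`torch.cuda.is_available()` should return `True`)
+- **Memory optimization**: Use gradient checkpointing (`torch.utils.checkpoint`) for large models
+- **Faster convergence**: Try learning rate scheduling (e.g. a scheduler from `torch.optim.lr_scheduler`)
+- **Better results**: Experiment with different graph construction methods (e.g. radius graphs instead of kNN)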
+
+---
+
+## 🤝 Contributing
+
+Found a bug or want to improve the model? 
+1. Fork the repository
+2. Create a feature branch
+3. Make your changes
+4. Submit a pull request
+
+---
 
-->Logs evaluation metrics
+## 📚 References
 
-->Saves visualizations to results/
+- [JetNet Dataset](https://huggingface.co/datasets/jetnet)
+- [PyTorch Geometric Documentation](https://pytorch-geometric.readthedocs.io/)
+- [Chebyshev Graph Convolutions](https://arxiv.org/abs/1606.09375)
 
 
 

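Step 3 of the README's "What the script does" list builds a k=8 nearest-neighbor graph for each jet. As a rough illustration of that step (not the code in `code.py`, which is not part of this diff), the sketch below connects each particle to its nearest neighbors in the eta-phi plane using plain NumPy. The function name `knn_edges`, the use of eta-phi distance, and the wrapping of the phi difference are assumptions made for this example; padded particles are assumed to have been removed with the mask beforehand.

```python
import numpy as np


def knn_edges(eta, phi, k=8):
    """Return a (2, n * k_eff) array of directed edges particle -> neighbor.

    Distances are measured in the eta-phi plane. This is an illustrative
    sketch, not the implementation used in code.py.
    """
    eta = np.asarray(eta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    n = eta.shape[0]
    k_eff = min(k, n - 1)  # a particle cannot have more than n - 1 neighbors
    if k_eff <= 0:
        return np.zeros((2, 0), dtype=np.int64)

    d_eta = eta[:, None] - eta[None, :]
    # Wrap the azimuthal difference into [-pi, pi) so that angles close to
    # +pi and -pi are treated as neighbors (assumption for this sketch).
    d_phi = (phi[:, None] - phi[None, :] + np.pi) % (2 * np.pi) - np.pi
    dist2 = d_eta ** 2 + d_phi ** 2
    np.fill_diagonal(dist2, np.inf)  # exclude self-loops

    # Indices of the k_eff closest particles for every row.
    neighbors = np.argsort(dist2, axis=1)[:, :k_eff]
    source = np.repeat(np.arange(n), k_eff)
    target = neighbors.reshape(-1)
    return np.stack([source, target])
```

The returned array has the `(2, num_edges)` layout that PyTorch Geometric uses for `edge_index`, so it could be converted with `torch.as_tensor` before being passed to a graph layer. Whether an edge should point from a particle to its neighbor or the other way around depends on the layer's message-passing convention, which this sketch does not settle.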
diff --git a/Non_local_Jet_Classification_Tanmay_Bakshi/readme.md b/Non_local_Jet_Classification_Tanmay_Bakshi/readme.md
@@ -0,0 +1,105 @@
+# Non-local Jet Classification with Topological Features
+
+**Author**: Tanmay Bakshi  
+**GSoC 2025 Project**: Advanced jet classification using persistent homology and topological data analysis
+
+## Overview
+
+This project implements neural network architectures, including Simplicial CNNs and graph-based models, for classifying particle jets, with a focus on capturing non-local geometric features through topological data analysis. The approach combines traditional jet features with persistent homology to improve classification performance on the jet tagging task defined by the dataset below (hadronic tops vs QCD dijets).
+
+## Dataset
+
+The project uses the **Top Quark Tagging Reference Dataset** by Kasieczka et al., featuring:
+- 1.2M training events, 400k validation events, and 400k test events
+- 14 TeV hadronic tops (signal) vs QCD dijets (background)
+- Anti-kT jets with R = 0.8 and pT in the range [550, 650] GeV
+- Leading 200 jet constituents stored per jet
+- Constituents sorted by pT (highest first)
+
+## Project Structure
+
+```
+Non_local_Jet_Classification_Tanmay_Bakshi/
+├── main.py                    # Main entry point
+├── datasets.py                # Data loading utilities
+├── coordinates_extract.py     # Feature extraction
+├── data_arrange.py            # Data preprocessing
+├── preprocess_dask.py         # Parallel preprocessing
+├── persistent_net-2.ipynb     # Interactive demo notebook
+├── console/                   # Console utilities
+├── helper/                    # Helper functions
+├── nn/                        # Neural network models
+├── persistence/               # Topological analysis
+├── scnn/                      # Simplicial CNN implementation
+└── Weaver/                    # Weaver framework integration
+```
+
+## Quick Start
+
+### Prerequisites
+- Python 3.8+
+- PyTorch 1.8+
+- awkward (Awkward Array)
+- scikit-learn
+- h5py
+- pandas
+- numpy
+
+### Installation
+```bash
+# Navigate to project directory
+cd Non_local_Jet_Classification_Tanmay_Bakshi
+
+# Install dependencies (create requirements.txt if needed)
+pip install torch awkward scikit-learn h5py pandas numpy matplotlib
+
+# For topological analysis
+pip install gudhi  # for persistent homology
+```
+
+### Running the Code
+
+**Option 1: Python Script**
+```bash
+python main.py
+```
+
+**Option 2: Interactive Notebook (Recommended)**
+```bash
+jupyter notebook persistent_net-2.ipynb
+```
+
+**Option 3: Data Preprocessing**
+```bash
+# For large datasets, use parallel preprocessing
+python preprocess_dask.py
+```
+
+## Key Features
+
+- **Topological Feature Extraction**: Uses persistent homology to capture jet topology
+- **Multi-scale Analysis**: Analyzes jets at different geometric scales
+- **Advanced Architectures**: Implements Simplicial CNNs and graph-based methods
+- **Weaver Integration**: Compatible with the Weaver framework for particle physics ML
+
+## Expected Outputs
+
+- Classification accuracy metrics
+- ROC curves and performance plots
+- Topological feature visualizations
+- Model checkpoints in respective subdirectories
+
+## Troubleshooting
+
+**Common Issues:**
+1. **Memory errors**: Reduce batch size or use `preprocess_dask.py` for large datasets
+2. **Missing dependencies**: Install `gudhi` for topological analysis features
+3. **CUDA errors**: Ensure the CUDA version of your PyTorch build is supported by your installed GPU driver
+
+## Citation
+
+If you use this code, please cite:
+```
+Kasieczka, G., Plehn, T., Thompson, J., & Russell, M.
+"Top Quark Tagging Reference Dataset"
+```
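The "Topological Feature Extraction" item in the README above relies on persistent homology, for which the install step pulls in `gudhi`. For readers new to the idea, here is a small sketch of the simplest case: 0-dimensional persistence of a point cloud, written with NumPy and a union-find rather than `gudhi`. It assumes the points are jet constituents in some 2D coordinates such as (eta, phi), that every connected component is born at scale 0, and that a component dies at the length of the edge that merges it into another one. The name `h0_deaths` is hypothetical and not part of this repository, and the approach only covers dimension 0.

```python
import numpy as np


def h0_deaths(points):
    """Finite death scales of 0-dimensional persistence for a point cloud.

    Edges are processed in order of increasing length (Kruskal-style).
    Each edge that joins two different components kills one of them, and
    its length is recorded as the death scale. The single component that
    never dies (the infinite bar) is omitted.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)

    # Pairwise Euclidean distances between all points.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # All unordered pairs (i < j), sorted by distance.
    i_idx, j_idx = np.triu_indices(n, k=1)
    order = np.argsort(dist[i_idx, j_idx])

    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    deaths = []
    for e in order:
        i, j = int(i_idx[e]), int(j_idx[e])
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # merge the two components
            deaths.append(dist[i, j])
    return np.array(deaths)
```

For three collinear points at x = 0, 1, and 3, the function returns death scales 1 and 2: the first two points merge at distance 1, and the third joins them at distance 2. Long-lived components (large death scales) indicate well-separated clusters of constituents, which is the kind of non-local structure the topological features are meant to capture.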