A PyTorch-based project for training neural networks to predict entropy values from data. The project includes both synthetic data generation and real-world image processing using the CIFAR-100 dataset.
- Entropy Calculation: Custom histogram-based entropy computation
- Synthetic Data Generation: Generate datasets with controlled entropy characteristics
- Neural Network Models: MLP and Transformer architectures for entropy prediction
- CIFAR-100 Integration: Process real image data and analyze its entropy
- Training Pipelines: Complete training and evaluation workflows
- Visualization: Comprehensive plotting and result analysis
entropy-fitting/
├── entcal.py # Core entropy calculation functions
├── entdata.py # Synthetic data generation and dataset classes
├── train-synth.py # Training on synthetic data
├── train-cifar.py # Training on CIFAR-100 images
├── c100-ent-dist.py # CIFAR-100 entropy distribution analysis
├── compare-entcal.py # Entropy calculation comparison
├── best_entropy_model.pth # Trained model checkpoint
└── cifar-100/ # CIFAR-100 dataset directory
- Clone the repository:
git clone <repository-url>
cd entropy-fitting
- Install dependencies:
pip install torch torchvision numpy scipy matplotlib tqdm scikit-image
- Download the CIFAR-100 dataset (it will also be downloaded automatically when running the training scripts):
python -c "from torchvision import datasets; datasets.CIFAR100(root='./cifar-100', train=True, download=True)"
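Optionally, verify the download with a quick check (this sketch assumes the same `./cifar-100` root as above):

```python
# Quick sanity check that CIFAR-100 is in place; download=True is a no-op
# if the files already exist under ./cifar-100.
from torchvision import datasets

train_set = datasets.CIFAR100(root='./cifar-100', train=True, download=True)
print(len(train_set))  # 50000 training images
```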
Train on generated synthetic data with controlled entropy:
python train-synth.py
Parameters:
- num_samples: Number of training samples (default: 10000)
- dim: Dimension of each sample (default: 128)
- num_bins: Number of bins for entropy calculation (default: 16)
- model_type: 'mlp' or 'transformer' (default: 'mlp')
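For orientation, here is a minimal sketch of the kind of training loop the script runs. It assumes EntropyDataset is a standard PyTorch Dataset yielding (sample, entropy) pairs; the inline MLP stands in for the project's get_mlp helper, and the batch size, learning rate, and epoch count are illustrative, not the script's defaults:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from entdata import EntropyDataset

# Synthetic samples with known entropy labels (defaults from the README).
dataset = EntropyDataset(num_samples=10000, dim=128, num_bins=16)
loader = DataLoader(dataset, batch_size=64, shuffle=True)

# Stand-in for get_mlp(input_dim=128, hidden_dims=[256, 256, 256]).
model = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(10):
    for x, y in loader:
        optimizer.zero_grad()
        pred = model(x.float()).squeeze(-1)   # predicted entropy per sample
        loss = loss_fn(pred, y.float())       # regress against the true entropy
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```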
Train on CIFAR-100 images or their noise components:
python train-cifar.py
Parameters:
- crop_size: Image crop size (default: 24)
- num_bins: Number of bins for entropy calculation (default: 16)
- use_noise: Train on noise components instead of images (default: False)
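As a rough illustration of how a per-crop entropy target can be computed, the sketch below assumes pixels are scaled to [0, 1] by ToTensor, so min_value=0 and max_value=1 are passed to calculate_entropy; the actual script may use different bounds and also supports a noise mode:

```python
from torchvision import datasets, transforms
from entcal import calculate_entropy

transform = transforms.Compose([
    transforms.RandomCrop(24),   # matches the default crop_size
    transforms.ToTensor(),       # pixel values in [0, 1]
])
cifar = datasets.CIFAR100(root='./cifar-100', train=True, download=True,
                          transform=transform)

img, _ = cifar[0]            # tensor of shape (3, 24, 24)
flat = img.reshape(1, -1)    # one sample, flattened pixel values
ent = calculate_entropy(flat, min_value=0, max_value=1, num_bins=16)
print(f"Crop entropy: {ent.item():.4f}")
```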
Analyze the entropy distribution of the CIFAR-100 dataset:
python c100-ent-dist.py
Test the entropy calculation implementation:
python compare-entcal.py
Core functions in entcal.py:
- batch_histc(): Batch histogram calculation with adaptive binning
- calculate_entropy(): Compute entropy from data using the histogram method
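Conceptually, the histogram method bins the values, normalizes the counts into probabilities, and evaluates -sum(p * log p). The sketch below is a rough illustration, not the project's implementation (the adaptive binning of batch_histc() is not reproduced), cross-checked against SciPy in the spirit of compare-entcal.py:

```python
import torch
from scipy.stats import entropy as scipy_entropy

def histogram_entropy(x: torch.Tensor, min_value: float, max_value: float,
                      num_bins: int = 16) -> torch.Tensor:
    # Bin the values over a fixed range, convert counts to probabilities,
    # then compute the Shannon entropy -sum(p * log p) in nats.
    counts = torch.histc(x, bins=num_bins, min=min_value, max=max_value)
    probs = counts / counts.sum()
    probs = probs[probs > 0]          # treat 0 * log 0 as 0
    return -(probs * probs.log()).sum()

x = torch.randn(10_000)
print(histogram_entropy(x, -3, 3).item())

# Cross-check on the same histogram: scipy normalizes the counts internally.
counts = torch.histc(x, bins=16, min=-3, max=3).numpy()
print(scipy_entropy(counts))
```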
Dataset classes in entdata.py:
- BaseEntropyDataset: Base class for entropy datasets
- EntropyDataset: Finite dataset with pre-generated samples
- OnTheFlyEntropyDataset: Infinite dataset with on-the-fly generation
- DataLoader utilities for easy integration
Model architectures:
- MLP Model: Multi-layer perceptron for synthetic data
  def get_mlp(input_dim=128, hidden_dims=[256, 256, 256])
- CNN Model: Convolutional network for image data
  def get_cnn(hidden_channels=[16, 16, "A2", 32, 32, "A2", 64, 64, "A2", 128, 128, "AA"])
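A rough usage sketch follows; the import path is a placeholder since the README does not say which module defines these factories, and the output is assumed to be one entropy prediction per sample:

```python
import torch
from models import get_mlp   # placeholder import path; adjust to this repo's layout

model = get_mlp(input_dim=128, hidden_dims=[256, 256, 256])
x = torch.randn(32, 128)     # a batch of 32 synthetic samples
pred = model(x)              # assumed shape: (32, 1) or (32,) entropy predictions
print(pred.shape)
```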
The project includes comprehensive evaluation metrics:
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- Training and test loss curves
- Prediction vs true value scatter plots
- Residual distribution analysis
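For reference, the three error metrics above reduce to a few lines of tensor arithmetic; in this illustrative sketch, preds and targets are placeholders standing in for model outputs and true entropies:

```python
import torch

preds = torch.randn(1000)      # placeholder predicted entropies
targets = torch.randn(1000)    # placeholder true entropies

errors = preds - targets
mae = errors.abs().mean()      # Mean Absolute Error
mse = (errors ** 2).mean()     # Mean Squared Error
rmse = mse.sqrt()              # Root Mean Squared Error
print(f"MAE={mae.item():.4f}  MSE={mse.item():.4f}  RMSE={rmse.item():.4f}")
```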
Generate a synthetic dataset and plot a sample with its entropy label:
from entdata import EntropyDataset
import matplotlib.pyplot as plt
dataset = EntropyDataset(num_samples=10000, dim=128, num_bins=16)
plt.plot(dataset.data[0])
plt.title(f"Entropy: {dataset.labels[0].item():.4f}")
plt.show()
Compute the entropy of a tensor directly:
from entcal import calculate_entropy
import torch
data = torch.randn(1, 100)  # one sample with 100 values
entropy = calculate_entropy(data, min_value=-3, max_value=3, num_bins=16)
print(f"Entropy: {entropy.item():.4f}")Key parameters can be adjusted in the script files:
- Data Generation: dim, num_bins, num_samples
- Training: batch_size, learning_rate, num_epochs
- Model: Hidden layer sizes, number of heads/layers
- Evaluation: Test ratio, random seed
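For example, illustrative values only; the parameter names come from the list above, but the authoritative defaults live in the scripts themselves:

```python
# Typical knobs near the top of the training scripts (values here are
# illustrative, not the project's defaults).
num_samples = 10000     # data generation
dim = 128
num_bins = 16
batch_size = 64         # training (assumed value)
learning_rate = 1e-3    # assumed value
num_epochs = 50         # assumed value
```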
- Python 3.7+
- PyTorch 1.8+
- torchvision
- NumPy
- SciPy
- Matplotlib
- scikit-image
- tqdm
This project is for research and educational purposes.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
If you use this code in your research, please cite:
@software{entropy_fitting,
title = {Entropy Fitting Project},
author = {Your Name},
year = {2025},
url = {https://github.com/your-username/entropy-fitting}
}
For questions and support, please open an issue on GitHub or contact the maintainers.