yeyun11/entropy-experiment

This README is generated by AI without any manual check (this line is by the author)

Entropy Fitting Project

A PyTorch-based project for training neural networks to predict entropy values from data. It includes both synthetic data generation and real-world image processing using the CIFAR-100 dataset.

Features

  • Entropy Calculation: Custom histogram-based entropy computation
  • Synthetic Data Generation: Generate datasets with controlled entropy characteristics
  • Neural Network Models: MLP and Transformer architectures for entropy prediction
  • CIFAR-100 Integration: Process and analyze real image data entropy
  • Training Pipelines: Complete training and evaluation workflows
  • Visualization: Comprehensive plotting and result analysis

Project Structure

entropy-fitting/
├── entcal.py              # Core entropy calculation functions
├── entdata.py            # Synthetic data generation and dataset classes
├── train-synth.py        # Training on synthetic data
├── train-cifar.py        # Training on CIFAR-100 images
├── c100-ent-dist.py      # CIFAR-100 entropy distribution analysis
├── compare-entcal.py     # Entropy calculation comparison
├── best_entropy_model.pth # Trained model checkpoint
└── cifar-100/           # CIFAR-100 dataset directory

Installation

  1. Clone the repository:
git clone <repository-url>
cd entropy-fitting
  2. Install dependencies:
pip install torch torchvision numpy scipy matplotlib tqdm scikit-image
  3. Download the CIFAR-100 dataset (it is also downloaded automatically when running the training scripts):
python -c "from torchvision import datasets; datasets.CIFAR100(root='./cifar-100', train=True, download=True)"

Usage

Synthetic Data Training

Train on generated synthetic data with controlled entropy:

python train-synth.py

Parameters:

  • num_samples: Number of training samples (default: 10000)
  • dim: Dimension of each sample (default: 128)
  • num_bins: Number of bins for entropy calculation (default: 16)
  • model_type: 'mlp' or 'transformer' (default: 'mlp')

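The pieces above can be tied together in a minimal training loop. This is a sketch using synthetic Gaussian data and a small MLP, not the actual code of train-synth.py; the defaults below just mirror the documented parameters:

```python
import torch
import torch.nn as nn

# Hypothetical defaults mirroring the script's documented parameters
num_samples, dim, num_bins = 1000, 128, 16

# Synthetic data: rows of Gaussian noise, labeled by their histogram entropy
x = torch.randn(num_samples, dim)
labels = []
for row in x:
    hist = torch.histc(row, bins=num_bins, min=-3.0, max=3.0)
    p = hist / hist.sum()
    p = p[p > 0]                          # drop empty bins (0 * log 0 := 0)
    labels.append(-(p * p.log()).sum())
y = torch.stack(labels)

# A small MLP regressor, trained with MSE on the entropy labels
model = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(5):
    pred = model(x).squeeze(1)
    loss = nn.functional.mse_loss(pred, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final MSE: {loss.item():.4f}")
```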
CIFAR-100 Training

Train on CIFAR-100 images or their noise components:

python train-cifar.py

Parameters:

  • crop_size: Image crop size (default: 24)
  • num_bins: Number of bins for entropy calculation (default: 16)
  • use_noise: Train on noise components instead of images (default: False)
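To make the crop_size and num_bins parameters concrete, here is a standalone sketch of computing the histogram entropy of a central image crop. The binning and normalization here are assumptions for illustration; train-cifar.py may handle them differently:

```python
import numpy as np

def crop_entropy(img, crop_size=24, num_bins=16):
    """Shannon entropy (nats) of the intensity histogram of a central crop."""
    h, w = img.shape[:2]
    top, left = (h - crop_size) // 2, (w - crop_size) // 2
    crop = img[top:top + crop_size, left:left + crop_size]
    hist, _ = np.histogram(crop, bins=num_bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]                      # drop empty bins (0 * log 0 := 0)
    return float(-(p * np.log(p)).sum())

img = np.random.rand(32, 32)          # stand-in for a CIFAR-100 image in [0, 1]
print(f"crop entropy: {crop_entropy(img):.4f}")
```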

Entropy Distribution Analysis

Analyze entropy distribution of CIFAR-100 dataset:

python c100-ent-dist.py

Entropy Calculation

Test the entropy calculation implementation:

python compare-entcal.py

Core Components

Entropy Calculation (entcal.py)

  • batch_histc(): Batch histogram calculation with adaptive binning
  • calculate_entropy(): Compute entropy from data using histogram method
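A minimal sketch of the histogram-based approach these functions implement. This is not the repository's exact code; entcal.py's batch_histc() and calculate_entropy() may differ in binning and edge handling:

```python
import torch

def histogram_entropy(x, min_value=-3.0, max_value=3.0, num_bins=16):
    """Shannon entropy (nats) of each row's histogram, for a (B, D) batch."""
    x = x.clamp(min_value, max_value)
    edges = torch.linspace(min_value, max_value, num_bins + 1)
    # assign each value a bin index in [0, num_bins), then count per row
    idx = torch.bucketize(x, edges[1:-1])
    counts = torch.zeros(x.shape[0], num_bins)
    counts.scatter_add_(1, idx, torch.ones_like(x))
    p = counts / counts.sum(dim=1, keepdim=True)
    logp = torch.where(p > 0, p.log(), torch.zeros_like(p))
    return -(p * logp).sum(dim=1)

batch = torch.randn(4, 128)
print(histogram_entropy(batch))        # one entropy value per row
```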

Data Generation (entdata.py)

  • BaseEntropyDataset: Base class for entropy datasets
  • EntropyDataset: Finite dataset with pre-generated samples
  • OnTheFlyEntropyDataset: Infinite dataset with on-the-fly generation
  • DataLoader utilities for easy integration
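The on-the-fly idea can be sketched with a torch IterableDataset. This is a hypothetical stand-in for OnTheFlyEntropyDataset, assuming samples are Gaussian noise with a random scale, labeled by their histogram entropy:

```python
import torch
from torch.utils.data import IterableDataset, DataLoader

class OnTheFlyGaussianEntropy(IterableDataset):
    """Yields (sample, entropy) pairs generated on demand — an infinite stream."""
    def __init__(self, dim=128, num_bins=16):
        self.dim, self.num_bins = dim, num_bins

    def __iter__(self):
        while True:
            # Gaussian noise with a random scale gives varied entropy labels
            x = torch.randn(self.dim) * torch.empty(1).uniform_(0.2, 2.0)
            hist = torch.histc(x, bins=self.num_bins, min=-3.0, max=3.0)
            p = hist / hist.sum()
            p = p[p > 0]
            yield x, -(p * p.log()).sum()

loader = DataLoader(OnTheFlyGaussianEntropy(), batch_size=32)
xb, yb = next(iter(loader))
print(xb.shape, yb.shape)  # torch.Size([32, 128]) torch.Size([32])
```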

Model Architectures

MLP Model: Multi-layer perceptron for synthetic data

def get_mlp(input_dim=128, hidden_dims=[256, 256, 256])

CNN Model: Convolutional network for image data

def get_cnn(hidden_channels=[16, 16, "A2", 32, 32, "A2", 64, 64, "A2", 128, 128, "AA"])
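The string tokens in hidden_channels presumably encode pooling stages. The following sketch is one plausible reading, assuming integers are Conv3x3+ReLU blocks, "A2" is 2x2 average pooling, and "AA" is global (adaptive) average pooling; the repository's actual get_cnn() may differ:

```python
import torch
import torch.nn as nn

def build_cnn(hidden_channels, in_channels=3):
    """Assumed spec: int -> Conv3x3+ReLU, "A2" -> AvgPool2d(2),
    "AA" -> AdaptiveAvgPool2d(1). Ends with a scalar regression head."""
    layers, c = [], in_channels
    for spec in hidden_channels:
        if spec == "A2":
            layers.append(nn.AvgPool2d(2))
        elif spec == "AA":
            layers.append(nn.AdaptiveAvgPool2d(1))
        else:
            layers += [nn.Conv2d(c, spec, 3, padding=1), nn.ReLU()]
            c = spec
    layers += [nn.Flatten(), nn.Linear(c, 1)]
    return nn.Sequential(*layers)

net = build_cnn([16, 16, "A2", 32, 32, "A2", 64, 64, "A2", 128, 128, "AA"])
out = net(torch.zeros(2, 3, 24, 24))   # batch of two 24x24 RGB crops
print(out.shape)  # torch.Size([2, 1])
```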

Training Results

The project includes comprehensive evaluation metrics:

  • Mean Absolute Error (MAE)
  • Mean Squared Error (MSE)
  • Root Mean Squared Error (RMSE)
  • Training and test loss curves
  • Prediction vs true value scatter plots
  • Residual distribution analysis
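The scalar metrics above are standard and can be computed in a few lines; a minimal sketch (the repository's evaluation scripts also produce the plots listed above):

```python
import torch

def regression_metrics(pred, target):
    """MAE, MSE, and RMSE for a batch of scalar predictions."""
    err = pred - target
    mae = err.abs().mean()
    mse = (err ** 2).mean()
    return {"mae": mae.item(), "mse": mse.item(), "rmse": mse.sqrt().item()}

pred = torch.tensor([2.0, 3.0, 4.0])
true = torch.tensor([2.5, 3.0, 3.0])
print(regression_metrics(pred, true))
```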

Examples

Generate and visualize synthetic data:

from entdata import EntropyDataset
import matplotlib.pyplot as plt

dataset = EntropyDataset(num_samples=10000, dim=128, num_bins=16)
plt.plot(dataset.data[0])
plt.title(f"Entropy: {dataset.labels[0].item():.4f}")
plt.show()

Calculate entropy of custom data:

from entcal import calculate_entropy
import torch

data = torch.randn(1, 100)  # One sample of 100 values (shape: batch x dim)
entropy = calculate_entropy(data, min_value=-3, max_value=3, num_bins=16)
print(f"Entropy: {entropy.item():.4f}")

Configuration

Key parameters can be adjusted in the script files:

  • Data Generation: dim, num_bins, num_samples
  • Training: batch_size, learning_rate, num_epochs
  • Model: Hidden layer sizes, number of heads/layers
  • Evaluation: Test ratio, random seed

Dependencies

  • Python 3.7+
  • PyTorch 1.8+
  • torchvision
  • NumPy
  • SciPy
  • Matplotlib
  • scikit-image
  • tqdm

License

This project is for research and educational purposes.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

Citation

If you use this code in your research, please cite:

@software{entropy_fitting,
  title = {Entropy Fitting Project},
  author = {Your Name},
  year = {2025},
  url = {https://github.com/your-username/entropy-fitting}
}

Support

For questions and support, please open an issue on GitHub or contact the maintainers.

About

A toy example for fitting the entropy of noise.
