feat: merge v2 branch into main with major updates and improvements #2

Open: c0deplayer wants to merge 17 commits into main from v2
@c0deplayer (Owner) commented Sep 26, 2024

Pull Request Message:

This pull request proposes merging the v2 branch into the main branch. The v2 branch includes a comprehensive set of updates, new features, and improvements that significantly enhance the project.

Notes:

  • Most scripts do not work yet. Once the v2 work is complete, the branch will be tested and then merged into main.
  • Please review the changes and test them in your local environment before merging.

Refactored the ConvNeXt model to integrate with PyTorch Lightning.
- Replaced the existing nn.Module with LightningModule.
- Added training and validation steps with accuracy metrics.
- Configured optimizer and learning rate scheduler.
- Introduced normalization based on the model type.
- Updated the model freezing and modification logic.
- Updated `base_gpu.yaml` with new `max_seq_len` and `max_text_len` values.
- Added `best_gpu.yaml` for Diffusion with specific parameters.
- Added `best_gpu.yaml` for LatentDiffusion with specific parameters.
- Refactored `config.py` to include new configurations for ConvNeXt and Inception.
- Updated `constants.py` to register new models and configurations.
- Adjusted `MODELS_APP` to exclude Inception and ConvNeXt from the list.

These changes enhance the flexibility and configurability of the system by
introducing new model configurations and updating existing ones.
- Refactored `DataModule` to support additional configurations for ConvNeXt and Inception.
- Improved dataset creation and splitting logic.
- Updated `IAMonDataset` and `IAMDataset` to streamline data loading and preprocessing.
- Added methods to handle dataset-specific tasks like loading dataset text files, parsing lines, and validating labels.
- Simplified attribute names and removed redundant properties.
- Enhanced docstrings for better clarity and understanding.
- Updated `data_utils.py` for better readability and consistency.
- Added `get_device` function in `utils.py` to determine the best available device.

These changes improve the flexibility and maintainability of the dataset
handling and preprocessing code.
- Added sys.path.append(".") to ensure the script can find local modules.
- Reformatted long lines for better readability.
- Updated file_data_mapping to improve clarity.
- Enhanced the filter_data_iam_on_dataset function to handle writerID extraction more robustly.
- Replaced `multiprocessing.Pool` with `concurrent.futures.ProcessPoolExecutor` for better parallelism.
- Reformatted long lines for better readability.
- Simplified the `__get_file_paths` function by using a suffix variable.
- Updated argument parsing to include new model options.
- Enhanced `get_model_params` and `get_trainer_params` functions to handle
  new model configurations.
- Moved callbacks to `utils/callbacks.py`.
- Renamed `train_model` function to `train` and added optional Neptune
  token parameter.
- Replaced custom normalization with `torchvision.transforms.v2`.
- Added `training_step` and `validation_step` methods.
- Integrated `MulticlassAccuracy` for train and validation.
- Configured `AdamW` optimizer and `StepLR` scheduler.
- Replaced `os` with `pathlib.Path` for better path handling.
- Added `ValueError` raise documentation in `get_model_params`.
- Improved exception handling by assigning error messages to variables before raising.
- Adjusted type hints and docstrings for better clarity and consistency.
- Updated return type of __create_dataset to Dataset.
- Improved docstrings for better clarity and consistency.
- Converted max_files to int in prepare_data function.
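
Several of the bullets above (the move from `multiprocessing.Pool` to `concurrent.futures.ProcessPoolExecutor`, the suffix-based file lookup, `pathlib.Path` instead of `os`, and binding error messages to a variable before raising) can be pictured together in one minimal sketch. This is not the code from the v2 branch: `get_file_paths`, `file_size`, and `process_files` are hypothetical names, and the per-file work is a stand-in.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path
from typing import Optional


def get_file_paths(root: Path, suffix: str) -> list[Path]:
    """Return every file under ``root`` whose name ends with ``suffix``, sorted."""
    if not root.is_dir():
        # The message is bound to a variable first, then raised.
        msg = f"Data directory does not exist: {root}"
        raise FileNotFoundError(msg)
    return sorted(p for p in root.rglob(f"*{suffix}") if p.is_file())


def file_size(path: Path) -> int:
    """Stand-in for real per-file work (hypothetical).

    It is defined at module level so worker processes can pickle it.
    """
    return path.stat().st_size


def process_files(paths: list[Path], max_workers: Optional[int] = None) -> list[int]:
    """Apply ``file_size`` to each path in worker processes, preserving order."""
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(file_size, paths))
```

On platforms that start workers with `spawn`, a call such as `process_files(get_file_paths(Path("data"), ".xml"))` should sit under an `if __name__ == "__main__":` guard; the `data` directory here is only an illustration.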
…gging

Added detailed docstrings for all methods and classes, and replaced
print statements with logging for better traceability. Simplified
conditional checks and improved type hinting. This refactor aims to
make the codebase more understandable.
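
As a rough illustration of the print-to-logging change described above, the sketch below uses a module-level logger with lazy `%`-style arguments. The function `load_labels` and its behaviour are assumptions made for the example, not code from this PR.

```python
import logging

# One logger per module; the application decides handlers and verbosity.
logger = logging.getLogger(__name__)


def load_labels(lines: list[str]) -> list[str]:
    """Strip each line and keep the non-empty labels (hypothetical helper)."""
    labels = []
    for number, line in enumerate(lines, start=1):
        label = line.strip()
        if not label:
            # Previously this kind of message would have been a print() call.
            logger.warning("Skipping empty label on line %d", number)
            continue
        labels.append(label)
    logger.info("Loaded %d labels", len(labels))
    return labels
```

An entry-point script can then call `logging.basicConfig(level=logging.INFO)` once to make these messages visible, or raise the level to silence them without touching the library code.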
Refactored `data_utils.py` to enhance type hints, improve error handling,
and update docstrings for better clarity. Replaced `xml.etree.ElementTree`
with `defusedxml.ElementTree` for better security. Updated function
signatures to use modern type hinting and improved exception messages
for better debugging.
Enhanced the `EarlyStopper` and `PeriodicCheckpoint` classes by adding
logging, improving type annotations, and refactoring code for better
readability. Replaced `print` statements with logging for better
control over output verbosity.
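
For readers unfamiliar with the pattern, a patience-based early stopper with logging might look roughly like the sketch below. Only the class name `EarlyStopper` comes from this PR; the constructor parameters, the `should_stop` method, and the stopping rule are assumptions and may differ from the actual implementation.

```python
import logging
import math

logger = logging.getLogger(__name__)


class EarlyStopper:
    """Signal a stop once validation loss has not improved for ``patience`` checks."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = math.inf
        self.counter = 0

    def should_stop(self, val_loss: float) -> bool:
        """Record ``val_loss`` and return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: remember the new best value and reset the counter.
            self.best_loss = val_loss
            self.counter = 0
            logger.debug("Validation loss improved to %.4f", val_loss)
            return False
        self.counter += 1
        logger.info("No improvement for %d/%d checks", self.counter, self.patience)
        return self.counter >= self.patience
```

A training loop would call `should_stop(val_loss)` after each validation pass and break out when it returns `True`.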