feat: merge v2 branch into main with major updates and improvements #2
Open
c0deplayer wants to merge 17 commits into
Refactored the ConvNeXt model to integrate with PyTorch Lightning.
- Replaced the existing nn.Module with LightningModule.
- Added training and validation steps with accuracy metrics.
- Configured optimizer and learning rate scheduler.
- Introduced normalization based on the model type.
- Updated the model freezing and modification logic.

- Updated `base_gpu.yaml` with new `max_seq_len` and `max_text_len` values.
- Added `best_gpu.yaml` for Diffusion with specific parameters.
- Added `best_gpu.yaml` for LatentDiffusion with specific parameters.
- Refactored `config.py` to include new configurations for ConvNeXt and Inception.
- Updated `constants.py` to register new models and configurations.
- Adjusted `MODELS_APP` to exclude Inception and ConvNeXt from the list.
These changes enhance the flexibility and configurability of the system by introducing new model configurations and updating existing ones.

- Refactored `DataModule` to support additional configurations for ConvNeXt and Inception.
- Improved dataset creation and splitting logic.
- Updated `IAMonDataset` and `IAMDataset` to streamline data loading and preprocessing.
- Added methods to handle dataset-specific tasks like loading dataset text files, parsing lines, and validating labels.
- Simplified attribute names and removed redundant properties.
- Enhanced docstrings for better clarity and understanding.
- Updated `data_utils.py` for better readability and consistency.
- Added `get_device` function in `utils.py` to determine the best available device.
These changes improve the flexibility and maintainability of the dataset handling and preprocessing code.

- Added sys.path.append(".") to ensure the script can find local modules.
- Reformatted long lines for better readability.
- Updated file_data_mapping to improve clarity.
- Enhanced the filter_data_iam_on_dataset function to handle writerID extraction more robustly.

- Replaced `multiprocessing.Pool` with `concurrent.futures.ProcessPoolExecutor` for better parallelism.
- Reformatted long lines for better readability.
- Simplified the `__get_file_paths` function by using a suffix variable.

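The executor swap and the suffix-based path lookup described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `collect_paths`, `run_parallel`, and `file_stem_length` are hypothetical names, and the per-file task is a stand-in for whatever the real script does with each file.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def file_stem_length(path: Path) -> int:
    """Hypothetical per-file task; a real one would parse or convert the file."""
    return len(path.stem)


def collect_paths(root: Path, suffix: str) -> list[Path]:
    """Return every file under `root` whose name ends with `suffix`, sorted."""
    return sorted(p for p in root.rglob(f"*{suffix}") if p.is_file())


def run_parallel(task, items, executor_factory=ProcessPoolExecutor, max_workers=None):
    """Apply `task` to each item using a concurrent.futures executor.

    The executor class is a parameter (an assumption of this sketch) so the
    same code can run with ThreadPoolExecutor where processes are impractical.
    Results come back in input order, as with `multiprocessing.Pool.map`.
    """
    with executor_factory(max_workers=max_workers) as executor:
        return list(executor.map(task, items))
```

With the default `ProcessPoolExecutor`, the task must be a picklable module-level function, and the call to `run_parallel` should sit under an `if __name__ == "__main__":` guard when the script is run directly.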
- Updated argument parsing to include new model options.
- Enhanced `get_model_params` and `get_trainer_params` functions to handle new model configurations.
- Moved callbacks to utils/callbacks.py.
- Renamed `train_model` function to `train` and added optional Neptune token parameter.

- Replaced custom normalization with torchvision.transforms.v2
- Added training_step and validation_step methods
- Integrated MulticlassAccuracy for train and validation
- Configured AdamW optimizer and StepLR scheduler

- Replaced `os` with `pathlib.Path` for better path handling.
- Added `ValueError` raise documentation in `get_model_params`.
- Improved exception handling by assigning error messages to variables before raising.
- Adjusted type hints and docstrings for better clarity and consistency.

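The three patterns named in this commit (`pathlib.Path` instead of `os`, a documented `ValueError`, and an error message assigned to a variable before raising) might look like the sketch below. The function `config_path`, the `KNOWN_MODELS` tuple, and the directory layout are assumptions made for illustration; the actual `get_model_params` signature is not shown in this PR.

```python
from pathlib import Path

# Hypothetical model identifiers, for illustration only.
KNOWN_MODELS = ("diffusion", "latent_diffusion", "convnext", "inception")


def config_path(config_dir: Path, model_name: str, config_file: str = "base_gpu.yaml") -> Path:
    """Build the path to a model's config file.

    Raises:
        ValueError: If `model_name` is not one of KNOWN_MODELS.
    """
    if model_name not in KNOWN_MODELS:
        # Build the message first, then raise, so the traceback shows
        # `raise ValueError(msg)` rather than the full f-string.
        msg = f"Unknown model: {model_name!r}. Expected one of {KNOWN_MODELS}."
        raise ValueError(msg)
    # The `/` operator joins path segments portably, replacing os.path.join.
    return Path(config_dir) / model_name / config_file
```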
- Updated return type of __create_dataset to Dataset.
- Improved docstrings for better clarity and consistency.
- Converted max_files to int in prepare_data function.

…gging Added detailed docstrings for all methods and classes, and replaced print statements with logging for better traceability. Simplified conditional checks and improved type hinting. This refactor aims to make the codebase more understandable.
Refactored `data_utils.py` to enhance type hints, improve error handling, and update docstrings for better clarity. Replaced `xml.etree.ElementTree` with `defusedxml.ElementTree` for better security. Updated function signatures to use modern type hinting and improved exception messages for better debugging.
Enhanced the `EarlyStopper` and `PeriodicCheckpoint` classes by adding logging, improving type annotations, and refactoring code for better readability. Replaced `print` statements with logging for better control over output verbosity.
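As a rough illustration of the print-to-logging change, here is a minimal, framework-free early-stopping helper. It is an assumption-laden sketch: the real `EarlyStopper` in this PR may be a Lightning callback with a different interface, and the `patience`, `min_delta`, and `should_stop` names are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


class EarlyStopper:
    """Signal a stop once the loss has not improved for `patience` checks."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_checks = 0

    def should_stop(self, loss: float) -> bool:
        """Record a new loss value and report whether training should stop."""
        if loss < self.best_loss - self.min_delta:
            # Improvement: remember it and reset the counter.
            self.best_loss = loss
            self.bad_checks = 0
            logger.debug("Loss improved to %.4f", loss)
            return False
        self.bad_checks += 1
        # Lazy %-style arguments are only formatted if the record is emitted.
        logger.info("No improvement for %d/%d checks", self.bad_checks, self.patience)
        return self.bad_checks >= self.patience
```

Because the messages go through `logging`, callers can raise or lower verbosity with `logging.basicConfig(level=...)` instead of editing the class.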
Pull Request Message:
This pull request proposes merging the `v2` branch into the `main` branch. The `v2` branch includes a comprehensive set of updates, new features, and improvements that significantly enhance the project.
Notes:
`main` branch.