feat: merge v2 branch into main with major updates and improvements by c0deplayer · Pull Request #2 · c0deplayer/handwriting-generation

c0deplayer · 2024-09-26T09:36:16Z

Pull Request Message:

This pull request proposes merging the v2 branch into the main branch. The v2 branch includes a comprehensive set of updates, new features, and improvements that significantly enhance the project.

Notes:

Most scripts do not work yet. Once the v2 branch is complete, it will be tested and merged into the main branch.
Please review the changes and test them in your local environment before merging.

Refactored the ConvNeXt model to integrate with PyTorch Lightning.

- Replaced the existing nn.Module with LightningModule.
- Added training and validation steps with accuracy metrics.
- Configured optimizer and learning rate scheduler.
- Introduced normalization based on the model type.
- Updated the model freezing and modification logic.

- Updated `base_gpu.yaml` with new `max_seq_len` and `max_text_len` values.
- Added `best_gpu.yaml` for Diffusion with specific parameters.
- Added `best_gpu.yaml` for LatentDiffusion with specific parameters.
- Refactored `config.py` to include new configurations for ConvNeXt and Inception.
- Updated `constants.py` to register new models and configurations.
- Adjusted `MODELS_APP` to exclude Inception and ConvNeXt from the list.

These changes enhance the flexibility and configurability of the system by introducing new model configurations and updating existing ones.

- Refactored `DataModule` to support additional configurations for ConvNeXt and Inception.
- Improved dataset creation and splitting logic.
- Updated `IAMonDataset` and `IAMDataset` to streamline data loading and preprocessing.
- Added methods to handle dataset-specific tasks like loading dataset text files, parsing lines, and validating labels.
- Simplified attribute names and removed redundant properties.
- Enhanced docstrings for better clarity and understanding.
- Updated `data_utils.py` for better readability and consistency.
- Added `get_device` function in `utils.py` to determine the best available device.

These changes improve the flexibility and maintainability of the dataset handling and preprocessing code.

- Added sys.path.append(".") to ensure the script can find local modules.
- Reformatted long lines for better readability.
- Updated file_data_mapping to improve clarity.
- Enhanced the filter_data_iam_on_dataset function to handle writerID extraction more robustly.

- Replaced `multiprocessing.Pool` with `concurrent.futures.ProcessPoolExecutor` for better parallelism.
- Reformatted long lines for better readability.
- Simplified the `__get_file_paths` function by using a suffix variable.
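For reviewers unfamiliar with the executor API, the swap from `multiprocessing.Pool` to `ProcessPoolExecutor` might look roughly like the sketch below. The worker `count_lines` and the wrapper `count_lines_parallel` are hypothetical stand-ins, not functions from this repository; the real per-file task is the dataset preprocessing step.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path
from typing import Optional


def count_lines(path: Path) -> int:
    """Hypothetical per-file task standing in for the real preprocessing step."""
    with path.open(encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def count_lines_parallel(
    paths: list[Path], max_workers: Optional[int] = None
) -> list[int]:
    """Run count_lines over paths in worker processes, preserving input order."""
    # Roughly equivalent to the older form:
    #     with multiprocessing.Pool(max_workers) as pool:
    #         return pool.map(count_lines, paths)
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results lazily and in input order.
        return list(executor.map(count_lines, paths))
```

On platforms that start workers with `spawn`, `count_lines_parallel` should be called from under an `if __name__ == "__main__":` guard, and the worker must be a module-level function so it can be pickled.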

- Updated argument parsing to include new model options.
- Enhanced `get_model_params` and `get_trainer_params` functions to handle new model configurations.
- Moved callbacks to utils/callbacks.py.
- Renamed `train_model` function to `train` and added optional Neptune token parameter.

- Replaced custom normalization with torchvision.transforms.v2
- Added training_step and validation_step methods
- Integrated MulticlassAccuracy for train and validation
- Configured AdamW optimizer and StepLR scheduler

- Replaced `os` with `pathlib.Path` for better path handling.
- Added `ValueError` raise documentation in `get_model_params`.
- Improved exception handling by assigning error messages to variables before raising.
- Adjusted type hints and docstrings for better clarity and consistency.
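As an illustration of the three patterns named above (`pathlib.Path` instead of `os`, a documented `ValueError`, and a message assigned to a variable before raising), here is a minimal sketch. The function `get_config_path`, the `CONFIG_FILES` mapping, and the `configs/<model>/<file>` layout are assumptions made for the example and are not taken from the repository.

```python
from pathlib import Path

# Hypothetical registry of model name -> config file name (assumed, not the real mapping).
CONFIG_FILES = {"Diffusion": "base_gpu.yaml", "LatentDiffusion": "base_gpu.yaml"}


def get_config_path(model_name: str, config_dir: Path = Path("configs")) -> Path:
    """Return the config file path for model_name.

    Raises:
        ValueError: If model_name is not a registered model.
    """
    if model_name not in CONFIG_FILES:
        # Assign the message to a variable first so the traceback's raise line
        # does not repeat the full formatted string.
        msg = f"Unknown model: {model_name!r}. Expected one of {sorted(CONFIG_FILES)}."
        raise ValueError(msg)
    # The "/" operator on Path replaces os.path.join(config_dir, model_name, file_name).
    return config_dir / model_name / CONFIG_FILES[model_name]
```

Assigning the message to `msg` first is the style that linters such as Ruff suggest for exceptions built from f-strings.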

- Updated return type of __create_dataset to Dataset.
- Improved docstrings for better clarity and consistency.
- Converted max_files to int in prepare_data function.

…gging

Added detailed docstrings for all methods and classes, and replaced print statements with logging for better traceability. Simplified conditional checks and improved type hinting. This refactor aims to make the codebase more understandable.

…trings

Refactored `data_utils.py` to enhance type hints, improve error handling, and update docstrings for better clarity. Replaced `xml.etree.ElementTree` with `defusedxml.ElementTree` for better security. Updated function signatures to use modern type hinting and improved exception messages for better debugging.

Enhanced the `EarlyStopper` and `PeriodicCheckpoint` classes by adding logging, improving type annotations, and refactoring code for better readability. Replaced `print` statements with logging for better control over output verbosity.
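The move from `print` to `logging` in a callback could look something like the sketch below. `SimpleEarlyStopper` is a hypothetical, framework-free class written for this example; it is not the project's `EarlyStopper`, and the `patience` and `min_delta` parameters are assumptions.

```python
import logging

logger = logging.getLogger(__name__)


class SimpleEarlyStopper:
    """Hypothetical patience-based early stopper (not the project's class)."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss: float) -> bool:
        """Record val_loss and return True once patience is exhausted."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: remember the new best loss and reset the counter.
            self.best_loss = val_loss
            self.bad_epochs = 0
            logger.debug("Validation loss improved to %.4f", val_loss)
            return False
        self.bad_epochs += 1
        # Lazy %-style arguments are only formatted if the record is emitted.
        logger.info("No improvement for %d/%d epochs", self.bad_epochs, self.patience)
        return self.bad_epochs >= self.patience
```

With this shape, verbosity is controlled by the caller, for example via `logging.basicConfig(level=logging.INFO)`, instead of by editing `print` calls.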

c0deplayer added 8 commits September 22, 2024 14:41

🎨: complete code refactoring (part 1)

6c192c7

feat(data): switch to ProcessPoolExecutor for parallelism

a7b7c94

- Replaced `multiprocessing.Pool` with `concurrent.futures.ProcessPoolExecutor` for better parallelism.
- Reformatted long lines for better readability.
- Simplified the `__get_file_paths` function by using a suffix variable.

feat(InceptionV3): integrate PyTorch Lightning and metrics

5c35862

- Replaced custom normalization with torchvision.transforms.v2
- Added training_step and validation_step methods
- Integrated MulticlassAccuracy for train and validation
- Configured AdamW optimizer and StepLR scheduler

c0deplayer force-pushed the v2 branch from a8ff72e to 5c35862 on September 26, 2024 12:50

c0deplayer added 9 commits September 26, 2024 14:52

fix(dataset): resolve circular import error

5a015b0

refactor(data): improve dataset preparation and type hints

1c7849e

- Updated return type of __create_dataset to Dataset.
- Improved docstrings for better clarity and consistency.
- Converted max_files to int in prepare_data function.

docs(ConvNeXt): Add docstrings to ConvNeXt model methods

6d68f88

refactor(ConvNeXt): Update imports

910a4df

refactor(InceptionV3): Improve readability and add comprehensive docs…

e5d4dbc

…trings


c0deplayer wants to merge 17 commits into main from v2
