We introduce Charm, a novel tokenization approach that preserves Composition, High-resolution, Aspect Ratio, and Multi-scale information simultaneously. By preserving this critical information, Charm works like a charm for image aesthetic and quality assessment 🌟🌟🌟.
- Step 1) Check our GitHub Page and install the requirements.
pip install -r requirements.txt
- Step 2) Install Charm tokenizer.
pip install Charm-tokenizer
- Step 3) Tokenization + Position embedding preparation
from Charm_tokenizer.ImageProcessor import Charm_Tokenizer
img_path = r"img.png"
charm_tokenizer = Charm_Tokenizer(patch_selection='frequency', training_dataset='tad66k', backbone='facebook/dinov2-small', without_pad_or_dropping=True)
tokens, pos_embed, mask_token = charm_tokenizer.preprocess(img_path)

Charm Tokenizer has the following input args:
- patch_selection (str): The method for selecting important patches.
  - Options: 'saliency', 'random', 'frequency', 'gradient', 'entropy', 'original'.
- training_dataset (str): Used to set the number of ViT input tokens to match a specific training dataset from the paper.
  - Aesthetic assessment datasets: 'ava', 'aadb', 'tad66k', 'para', 'baid'.
  - Quality assessment datasets: 'spaq', 'koniq10k'.
- backbone (str): The ViT backbone model (default: 'facebook/dinov2-small' for all datasets except AVA, and 'facebook/dinov2-large' for AVA only).
- factor (float): The downscaling factor for less important patches (default: 0.5).
- scales (int): The number of scales used for multiscale processing (default: 2).
- random_crop_size (tuple): Used for the 'original' patch selection strategy (default: (224, 224)).
- downscale_shortest_edge (int): Used for the 'original' patch selection strategy (default: 256).
- without_pad_or_dropping (bool): Whether to avoid padding or dropping patches (default: True).
The output is the preprocessed tokens, their corresponding positional embeddings, and a mask token that indicates which patches are in high resolution and which are in low resolution.
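The argument list above can be collected into a small helper that checks the option names and picks the backbone by dataset. This is only a sketch: `build_tokenizer_kwargs` is a hypothetical function that is not part of Charm-tokenizer, and the option lists and defaults are copied from the descriptions above rather than read from the package.

```python
# Option lists copied from the argument descriptions above (not read from the package).
PATCH_SELECTIONS = {"saliency", "random", "frequency", "gradient", "entropy", "original"}
AESTHETIC_DATASETS = {"ava", "aadb", "tad66k", "para", "baid"}
QUALITY_DATASETS = {"spaq", "koniq10k"}


def build_tokenizer_kwargs(patch_selection="frequency", training_dataset="tad66k", backbone=None):
    """Hypothetical helper: validate the options and fill in the documented defaults."""
    if patch_selection not in PATCH_SELECTIONS:
        raise ValueError(f"unknown patch_selection: {patch_selection!r}")
    if training_dataset not in AESTHETIC_DATASETS | QUALITY_DATASETS:
        raise ValueError(f"unknown training_dataset: {training_dataset!r}")
    if backbone is None:
        # Documented default: the large backbone for AVA, the small one otherwise.
        backbone = "facebook/dinov2-large" if training_dataset == "ava" else "facebook/dinov2-small"
    return {
        "patch_selection": patch_selection,
        "training_dataset": training_dataset,
        "backbone": backbone,
        "factor": 0.5,                      # downscaling factor for less important patches
        "scales": 2,                        # number of scales
        "random_crop_size": (224, 224),     # only used by the 'original' strategy
        "downscale_shortest_edge": 256,     # only used by the 'original' strategy
        "without_pad_or_dropping": True,
    }
```

Assuming the constructor accepts these names as keyword arguments, as the list above suggests, the result could be passed on as `Charm_Tokenizer(**build_tokenizer_kwargs(training_dataset='ava'))`.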
- Step 4) Predicting aesthetic/quality score
from Charm_tokenizer.Backbone import backbone
model = backbone(training_dataset='tad66k', device='cpu')
prediction = model.predict(tokens, pos_embed, mask_token)

Note:
- While random patch selection during training helps avoid overfitting, inference should use a fully deterministic patch selection approach to get consistent results.
- For the training code, check our GitHub Page.
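
To illustrate the first note, here is a toy sketch in plain Python contrasting random patch selection with a deterministic top-k selection over per-patch importance scores. It is not the Charm implementation: the function names, the toy scores, and the 1 = high resolution mask encoding are all assumptions made for illustration.

```python
import random


def select_random(num_patches, k, rng=None):
    """Pick k patch indices at random; results vary between calls unless rng is seeded."""
    if rng is None:
        rng = random.Random()
    return sorted(rng.sample(range(num_patches), k))


def select_top_k(scores, k):
    """Pick the k highest-scoring patch indices; ties go to the lower index."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(order[:k])


def high_res_mask(num_patches, selected):
    """Return a 0/1 list marking the patches kept at high resolution (assumed encoding)."""
    chosen = set(selected)
    return [1 if i in chosen else 0 for i in range(num_patches)]


# Toy per-patch importance scores (hypothetical values).
scores = [0.12, 0.80, 0.33, 0.80, 0.05, 0.47]
top = select_top_k(scores, k=3)
mask = high_res_mask(len(scores), top)
```

Because `select_top_k` depends only on the scores, repeated calls on the same image return the same patches, whereas `select_random` changes from call to call unless a seeded `random.Random` is passed in.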