asandu-cloud/Ancient-Text-CNN

Neural Font Classification for Ancient Text Digitization

  • Barontini Chiara, Milanino Tommaso, Sandu Andrei
  1. Introduction – Briefly describe your project.
  2. Methods: This section presents the techniques used on our dataset for exploration, preprocessing, data augmentation, and model development.
    • 2.1 Exploratory analysis and preprocessing: The dataset consists of 1,256 digitized images of ancient texts, each labelled with one of 11 font classes: augustus, aureus, cicero, colosseum, consul, forum, laurel, roman, senatus, trajan, and vesta. An initial analysis of the class distribution showed that all classes were well represented, with slight imbalances: for instance, cicero and aureus appeared more frequently than forum and laurel. We then inspected the image dimensions, which varied considerably in both width and height. We also checked for duplicate or corrupt images and found none. Given this variability, we applied standard preprocessing steps to ensure consistency. First, each image was converted to grayscale to reduce complexity. Second, Gaussian blurring with a 3x3 kernel was applied to suppress noise, such as the minor imperfections often present in scanned documents. Third, Otsu's thresholding converted each denoised image to a binary black-and-white format, which improved the contrast between text and background and made the fonts more distinguishable to the model. Finally, images were resized to 224x224 to normalize the input size across the dataset and reduce computational cost.
    • 2.2 Data Augmentation: Given the limited size of the dataset, data augmentation was important for improving the robustness and generalization of our models. The transformations simulate distortions that could occur in real scans, making the models more robust to small variations. We first split the dataset into training and test sets, then applied augmentation only to the training set, leaving the test set untouched so that evaluation is performed on original images. Augmentation was implemented with ImageDataGenerator from Keras, which applies the transformations in real time during training. The transformations include random rotations, horizontal and vertical shifts, zoom, horizontal and vertical flips, shearing, and brightness changes. For the first model (described in the next section) we applied more aggressive augmentation, while for the second model we used more conservative augmentation to preserve the integrity of the pretrained weights.
    • 2.3 Model Architectures: To explore different deep learning methods, we used two distinct neural network architectures. The baseline CNN has a hierarchical structure of four convolutional blocks. Each block consists of two convolutional layers that extract spatial features, combined with batch normalization for training stability, followed by a max-pooling layer for downsampling. The number of filters starts at 32 and increases progressively to 256, allowing the network to learn increasingly complex features. A fully connected layer, followed by dropout and a final softmax output layer, performs the classification. The model was trained with the Adam optimizer at a low learning rate, with categorical cross-entropy as the loss function, since the task is multi-class classification, and accuracy as the evaluation metric. The second model is a CNN based on the EfficientNet architecture pre-trained on ImageNet. Since EfficientNet expects RGB inputs, the grayscale images were converted to three channels. Then, 80% of the EfficientNet base layers were frozen to retain the learned features, while the upper layers remained trainable so they could be fine-tuned for our domain-specific font classification task. Global average pooling was used to reduce the spatial dimensions, followed by a dense layer, a ReLU activation, batch normalization, and a final softmax layer. This architecture benefited from the pre-learned features, allowing faster convergence and improved accuracy.
    • 2.4 Environment: Our project was developed and tested on Google Colab, using Python 3.11.12 on a T4 GPU. The deep learning models were implemented in TensorFlow 2.18.0 (with its bundled Keras API), along with supporting libraries such as Pandas, Matplotlib, Seaborn, and Scikit-learn (tensorflow=2.18.0, numpy=2.0.2, pandas=2.2.2, matplotlib=3.10.0, seaborn=0.13.2, scikit-learn=1.6.1).
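The preprocessing steps in 2.1 (grayscale conversion, 3x3 Gaussian blur, Otsu's thresholding, resize to 224x224) can be sketched as follows. This is a minimal NumPy-only illustration, not the code used in the project: the function names are hypothetical, the luma weights and the nearest-neighbor resize are assumptions made to keep the example dependency-free, and an image library would normally handle these steps.

```python
import numpy as np

def to_grayscale(img):
    """Collapse an H x W x 3 RGB array to H x W (BT.601 luma weights, an assumption)."""
    if img.ndim == 2:
        return img.astype(np.float64)
    return img[..., :3].astype(np.float64) @ np.array([0.299, 0.587, 0.114])

def gaussian_blur_3x3(gray):
    """Blur with the separable 3x3 kernel [1, 2, 1] / 4, replicating edge pixels."""
    k = np.array([0.25, 0.5, 0.25])
    p = np.pad(gray, 1, mode="edge")
    # Horizontal pass over the padded image, then vertical pass.
    h = k[0] * p[:, :-2] + k[1] * p[:, 1:-1] + k[2] * p[:, 2:]
    return k[0] * h[:-2, :] + k[1] * h[1:-1, :] + k[2] * h[2:, :]

def otsu_threshold(gray_u8):
    """Return the threshold t that maximizes the between-class variance."""
    hist = np.bincount(gray_u8.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # weight of the class with values <= t
    mu = np.cumsum(prob * np.arange(256))  # cumulative mean up to t
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu[-1] * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0              # thresholds that leave one class empty
    return int(np.argmax(sigma_b))

def resize_nearest(img, size=(224, 224)):
    """Nearest-neighbor resize; keeps a binary image strictly binary."""
    rows = (np.arange(size[0]) * img.shape[0] / size[0]).astype(int)
    cols = (np.arange(size[1]) * img.shape[1] / size[1]).astype(int)
    return img[rows][:, cols]

def preprocess(img, size=(224, 224)):
    """Grayscale -> 3x3 Gaussian blur -> Otsu binarization -> resize."""
    gray = to_grayscale(img)
    blurred = np.clip(np.rint(gaussian_blur_3x3(gray)), 0, 255).astype(np.uint8)
    t = otsu_threshold(blurred)
    binary = np.where(blurred > t, 255, 0).astype(np.uint8)
    return resize_nearest(binary, size)
```

Calling `preprocess(image)` on an `H x W x 3` or `H x W` `uint8` array returns a 224x224 array containing only 0 and 255. Thresholding before resizing, together with nearest-neighbor sampling, is one way to keep the output strictly binary; an interpolating resize would reintroduce gray values.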
  3. Experimental Design: To approach the problem of font classification, we experimented with two deep learning models (described in detail in the previous section): a custom CNN designed from scratch without external knowledge, and a transfer learning model based on the EfficientNet architecture pre-trained on ImageNet. The first serves as a baseline showing how well a model trained from scratch performs on this task, while the second provides a state-of-the-art benchmark showing the benefit of pre-trained models in our specific case. Our aim was to determine which of the two architectures yields better results given our limited dataset. To evaluate model performance, we used accuracy, precision, recall, F1-score, the confusion matrix, and training and validation curves. Accuracy measures how often the model makes correct predictions overall, while precision, recall, and F1-score capture performance per font and reveal any imbalance between classes. The confusion matrix shows predictions versus actual labels across all fonts, which is useful for identifying frequent misclassifications. Training and validation curves show accuracy and loss across epochs, which helped us see whether both were stable or fluctuated during training and whether the model reached a plateau after a certain epoch.
  4. Results – Describe the following: o Main finding(s): report your results and what you might conclude from your work. o Include at least one placeholder figure and/or table for communicating your findings. o All the figures containing results should be generated from the code.
  5. Conclusions – List some concluding remarks. In particular: o Summarize in one paragraph the take-away point from your work. o Include one paragraph to explain what questions may not be fully answered by your work as well as natural next steps for this direction of future work.
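As an illustration of the per-font metrics named in the Experimental Design section, the sketch below derives accuracy, precision, recall, and F1-score from a confusion matrix. It is a minimal NumPy example under stated assumptions: the function names are hypothetical, the three-font labels and predictions are made-up toy data (not project results), and the project itself may have relied on Scikit-learn for these computations.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] counts samples whose true class is i and predicted class is j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_class_metrics(cm):
    """Return overall accuracy plus per-class precision, recall, and F1 arrays."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how many true samples each class has
    # Classes that are never predicted (or never occur) get a score of 0.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1

# Toy data for three of the font classes (illustrative only, not real results).
fonts = ["cicero", "forum", "trajan"]
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred, n_classes=len(fonts))
accuracy, precision, recall, f1 = per_class_metrics(cm)
for name, p, r, f in zip(fonts, precision, recall, f1):
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Rows of the matrix correspond to true fonts and columns to predicted fonts, so a large off-diagonal entry points to a pair of fonts that the model frequently confuses.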
