This repository was archived by the owner on Apr 2, 2025. It is now read-only.

Light-House-AI/ACFI

Arabic Calligraphy Font Identification

This font identification system identifies the font of a given Arabic text snippet. It is useful in a wide range of applications, such as graphic design (for example, illustration work) and optical character recognition for digitizing hard-copy documents.

It has also attracted interest in the analysis of historical texts and in the authentication of manuscripts, where we would like to know a manuscript's origin. For example, if a manuscript is written in the Andalusi font, it most probably dates to the Andalusian era. If it is written in the Demasqi font, it most probably dates to the Abbasi era.

Getting started

Install

pip install -r requirements.txt

Run

# Run the CLI program to identify fonts in a batch of images
python src/inference/predict.py --test_directory=<FULL PATH> --output_directory=<FULL PATH> --verbose=<OPTIONAL>

The run produces two files:

  • output/results.txt: contains the results of the evaluation.
  • output/time.txt: contains the inference time, in seconds, for each result.
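
As a rough illustration of how the timing file could be summarized, here is a minimal sketch. It assumes that time.txt holds one floating-point value per line; the README does not specify the file layout, so treat that as an assumption. The function name `summarize_times` and the sample values are hypothetical and not part of this repository.

```python
from statistics import mean


def summarize_times(lines):
    """Return (count, total_seconds, mean_seconds) for an iterable of text lines.

    Assumption: each non-blank line holds a single float, in seconds.
    """
    times = [float(line) for line in lines if line.strip()]
    if not times:
        return 0, 0.0, 0.0
    return len(times), sum(times), mean(times)


# Hypothetical lines standing in for the contents of output/time.txt.
sample = ["0.25\n", "0.75\n", "\n", "0.50\n"]
count, total, average = summarize_times(sample)
print(f"{count} results, {total:.2f} s total, {average:.2f} s per result")
```

To use it on a real run, the lines could be read with `open(path).readlines()`, provided the file really follows the assumed one-value-per-line layout.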

Train new model

# Run the CLI program to train a model and save it as model.sav
python src/models/train_model.py --training_directory=<FULL PATH> --output_directory=<FULL PATH> --verbose=<OPTIONAL>

Evaluation criteria

The system is evaluated on the accuracy of its results and on its inference time, with an emphasis on accuracy.
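
The two criteria can be measured along the following lines. This is only a sketch under stated assumptions: `accuracy`, `timed_predictions`, and `dummy_predict` are hypothetical names, and the placeholder classifier stands in for whatever model is actually loaded. It is not the evaluation code in src/evaluation.

```python
import time


def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def timed_predictions(predict, samples):
    """Call predict on each sample and record the wall-clock time of each call."""
    labels, seconds = [], []
    for sample in samples:
        start = time.perf_counter()
        labels.append(predict(sample))
        seconds.append(time.perf_counter() - start)
    return labels, seconds


def dummy_predict(sample):
    """Placeholder classifier (assumption): maps an integer to one of 9 classes."""
    return sample % 9


samples = [0, 1, 2, 12]
expected = [0, 1, 2, 4]
labels, seconds = timed_predictions(dummy_predict, samples)
print("accuracy:", accuracy(labels, expected))
print("mean inference time (s):", sum(seconds) / len(seconds))
```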

System modules

  • Pre-processing Module.
  • Feature Extraction Module.
  • Model Selection and Training Module.
  • Performance Analysis Module.

Note: The project is limited to classical machine learning methods, such as Bayesian classifiers, KNN, linear/logistic regression, neural networks (with at most two hidden layers), support vector machines, and principal component analysis.
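
To make one of the listed methods concrete, here is a minimal pure-Python sketch of a k-nearest-neighbours (KNN) classifier. It only illustrates the technique and is not the model this project trains. The two-dimensional feature vectors and the labels `font_a` and `font_b` are made up for the example.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (
            (math.dist(features, query), label)
            for features, label in zip(train_features, train_labels)
        ),
        key=lambda pair: pair[0],
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy, made-up feature vectors; real features would come from the images.
train_features = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["font_a", "font_a", "font_a", "font_b", "font_b", "font_b"]

print(knn_predict(train_features, train_labels, (0.5, 0.5)))
print(knn_predict(train_features, train_labels, (5.5, 5.5)))
```

KNN needs no training step beyond storing the labelled features, which makes it a convenient baseline before trying the other methods in the list.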

Project structure

├── data
│   ├── processed      <- The processed data.
│   └── raw            <- The original dataset.
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `01-data-exploration`.
│
├── src                <- Source code to build and evaluate models.
│   │
│   ├── data
│   │   ├── preprocess_data.py  <- Pre-processing module.
│   │   └── pipeline.py         <- Data preprocessing pipeline with cli interface.
│   │
│   ├── features
│   │   └──  build_features.py   <- Feature extraction module.
│   │
│   ├── models
│   │   └── train_model.py        <- Model training module.
│   │
│   ├── evaluation
│   │   ├── choose_model.py     <- Script to choose the best model.
│   │   └── evaluate_model.py   <- Script to evaluate the best model.
│   │
│   ├── inference
│   │   └── predict.py    <- Script to predict the results.
│   │
│   └── visualization
│       └── visualize.py        <- Script to visualize the results.
│
├── models             <- Trained and serialized models
│
├── cli                <- CLI code to interact with the models.
│
└── assets             <- Assets for the README file.

Dataset

We used the ACdb Arabic Calligraphy Database, which contains 9 categories of computer-printed Arabic text snippets.

Research Papers

Note that you don't need to read all of the research papers. The following papers gave us hints about which features to extract from an image in order to identify fonts. Each paper was implemented in a separate Python notebook in notebooks/.

  1. A Statistical Global Feature Extraction Method for Optical Font Recognition
  2. A New Computational Method for Arabic Calligraphy Style Representation and Classification
  3. An efficient multiple-classifier system for Arabic calligraphy style recognition
  4. Arabic Artistic Script Style Identification Using Texture Descriptors
