Arabic Calligraphy Font Identification

This font identification system identifies the font of a given Arabic text snippet. It is useful in a range of applications, such as graphic design (for example, illustration work) and optical character recognition for digitizing hard-copy documents.

It has also raised interest in the analysis of historical texts and in the verification of authentic manuscripts, where we would like to know the origin of a manuscript. For example, a manuscript written in the Andalusi font most probably dates to the Andalusian Era, while one written in the Demasqi font most probably dates to the Abbasi Era.

Getting started

Install

pip install -r requirements.txt

Run

# run cli program to identify fonts in image batch
python src/inference/predict.py --test_directory=<FULL PATH> --output_directory=<FULL PATH> --verbose=<OPTIONAL>

The results are two files:

output/results.txt: contains the results of the evaluation.
output/time.txt: contains the inference time for each result. The time is in seconds.
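As a quick way to inspect the timing output, the sketch below summarizes the recorded inference times. It assumes that output/time.txt holds one floating-point value in seconds per line, which this README does not specify. The names parse_times, summarize_times, and summarize_time_file are illustrative and are not part of the project.

```python
from pathlib import Path
from statistics import mean


def parse_times(text):
    """Parse one float per non-blank line (the assumed layout of time.txt)."""
    return [float(line) for line in text.splitlines() if line.strip()]


def summarize_times(times):
    """Return the count, mean, and maximum of the inference times in seconds."""
    if not times:
        raise ValueError("no timing values found")
    return {"count": len(times), "mean": mean(times), "max": max(times)}


def summarize_time_file(path="output/time.txt"):
    """Read a timing file and summarize it (hypothetical helper)."""
    return summarize_times(parse_times(Path(path).read_text()))
```

Calling summarize_time_file() after a prediction run would then report how many results were timed, along with their mean and worst-case inference time.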

Train new model

# run cli program to train and save model with name model.sav
python src/models/train_model.py --training_directory=<FULL PATH> --output_directory=<FULL PATH> --verbose=<OPTIONAL>
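The flags above could be parsed with Python's argparse roughly as follows. This is a minimal sketch, not the actual contents of src/models/train_model.py: the function build_parser, the example paths, and the integer type assumed for --verbose are all assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the three flags shown in the training command."""
    parser = argparse.ArgumentParser(description="Train and save a font classifier.")
    parser.add_argument("--training_directory", required=True,
                        help="Full path to the training data.")
    parser.add_argument("--output_directory", required=True,
                        help="Full path where the trained model is saved.")
    # Assumption: --verbose is an optional integer level; the real script may differ.
    parser.add_argument("--verbose", type=int, default=0,
                        help="Optional verbosity level.")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
# The paths are placeholders, not locations used by the project.
args = build_parser().parse_args(
    ["--training_directory=/data/train", "--output_directory=/models"]
)
print(args.training_directory, args.output_directory, args.verbose)
```

Marking the two directory flags as required makes the script fail early with a usage message when a path is missing, while --verbose falls back to a default.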

Evaluation criteria

The system is evaluated on the accuracy of its results and on its inference time, with an emphasis on accuracy.
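Both criteria can be measured along the following lines. This is a sketch under stated assumptions rather than the project's evaluation code: accuracy and timed_predictions are hypothetical helper names, and predict stands for any callable that maps one sample to a predicted label.

```python
import time


def accuracy(predicted, expected):
    """Return the fraction of predictions that match the expected labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def timed_predictions(predict, samples):
    """Call predict on each sample; return the predictions and per-sample seconds."""
    predictions, seconds = [], []
    for sample in samples:
        start = time.perf_counter()
        predictions.append(predict(sample))
        seconds.append(time.perf_counter() - start)
    return predictions, seconds
```

Timing each sample separately, rather than the whole batch, matches the per-result layout described for output/time.txt.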

System modules

Pre-processing Module.
Feature Extraction Module.
Model Selection and Training Module.
Performance Analysis Module.

Note: The project is limited to classical machine learning methods such as Bayesian Classifiers, KNN, Linear/Logistic Regression, Neural Networks (with at most two hidden layers), Support Vector Machines, Principal Component Analysis, etc.
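To make one of the permitted methods concrete, here is a minimal KNN classifier in pure Python. It illustrates the technique only and is not the model the project selected; knn_predict is a hypothetical name, and the feature vectors and labels in the usage lines are toy values rather than real font features.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Predict a label for query by majority vote among its k nearest neighbors."""
    if k < 1 or k > len(train_features):
        raise ValueError("k must be between 1 and the number of training samples")
    # Pair each training sample's Euclidean distance to the query with its label.
    neighbors = sorted(
        ((math.dist(features, query), label)
         for features, label in zip(train_features, train_labels)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters with placeholder labels.
features = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(features, labels, (0.2, 0.1)))
print(knn_predict(features, labels, (5.5, 5.5)))
```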

Project structure

├── data
│   ├── processed      <- The processed data.
│   └── raw            <- The original dataset.
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `01-data-exploration`.
│
├── src                <- Source code to build and evaluate models.
│   │
│   ├── data
│   │   ├── preprocess_data.py  <- Pre-processing module.
│   │   └── pipeline.py         <- Data preprocessing pipeline with cli interface.
│   │
│   ├── features
│   │   └── build_features.py   <- Feature extraction module.
│   │
│   ├── models
│   │   └── train_model.py        <- Model training module.
│   │
│   ├── evaluation
│   │   ├── choose_model.py     <- Script to choose the best model.
│   │   └── evaluate_model.py   <- Script to evaluate the best model.
│   │
│   ├── inference
│   │   └── predict.py    <- Script to predict the results.
│   │
│   └── visualization
│       └── visualize.py        <- Script to visualize the results.
│
├── models             <- Trained and serialized models
│
├── cli                <- CLI code to interact with the models.
│
└── assets             <- Assets for the README file.

Dataset

We used the ACdb Arabic Calligraphy Database, which contains 9 categories of computer-printed Arabic text snippets.

Research Papers

Note that you do not need to open all of the research papers. The papers gave us hints about which features to extract from an image in order to identify fonts. Each research paper was implemented in a separate Python notebook in notebooks/.

