Fine-tune Qwen3-VL-8B for Arabic manuscript OCR using LoRA.
This repository serves as the baseline for the NAKBA NLP 2026: Arabic Manuscript Understanding Shared Task. Base model: Qwen/Qwen3-VL-8B-Instruct.
Prerequisite: install uv from https://docs.astral.sh/uv/
UV_TORCH_BACKEND=auto uv sync

Prepare your data as:
data/
├── images/
└── train.csv
CSV format:
filename,text
001.png,النص العربي هنا
002.png,نص آخر

Train with the default config:
uv run python train.py --config configs/default.yaml
Override config values:
uv run python train.py --config configs/default.yaml --learning_rate 1e-5 --num_train_epochs 5
Without config file:
uv run python train.py --images_dir data/images --csv_path data/train.csv

Run inference on a single image:
uv run python inference.py --model_path outputs/qwen3vl-arabic-ocr-lora --image path/to/image.png

Released test set metrics:
- CER: 0.2297
- WER: 0.4998
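
For reference, CER and WER are conventionally the character-level and word-level edit distance divided by the reference length. The sketch below illustrates that standard definition, aggregated over the whole corpus. It is not taken from `evaluate.py`, which may normalize text or average per sample instead; the function names `edit_distance`, `cer`, and `wer` are chosen here for illustration only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs, hyps):
    """Corpus-level character error rate (assumes non-empty references)."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


def wer(refs, hyps):
    """Corpus-level word error rate, splitting on whitespace."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    return errors / sum(len(r.split()) for r in refs)


if __name__ == "__main__":
    references = ["the old manuscript"]
    predictions = ["the old manuscrpt"]
    print(f"CER: {cer(references, predictions):.4f}")
    print(f"WER: {wer(references, predictions):.4f}")
```

Under this definition a single wrong character in a word counts as one character error but a full word error, which is why WER is typically much higher than CER on the same predictions.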
With ground truth:
uv run python evaluate.py \
--model_path outputs/qwen3vl-arabic-ocr-lora \
--images_dir data/test/images \
--csv_path data/test/annotations.csv \
--output_dir results/

Generate predictions only:
uv run python evaluate.py \
--model_path outputs/qwen3vl-arabic-ocr-lora \
--images_dir data/test/images \
--csv_path data/test/filenames.csv \
--output_dir submissions/ \
--generate_only

Edit configs/default.yaml to adjust hyperparameters. Key settings:
| Parameter | Default | Description |
|---|---|---|
| learning_rate | 2e-5 | Learning rate |
| num_train_epochs | 3 | Training epochs |
| per_device_train_batch_size | 4 | Per-device training batch size |
| lora.r | 32 | LoRA rank |
| lora.lora_alpha | 64 | LoRA alpha |
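
When tuning the two LoRA settings together, it can help to remember that standard LoRA scales the adapter update by `lora_alpha / r`. The sketch below computes that factor for the defaults in the table. The nested `config` dictionary is a hypothetical view of the settings, inferred from the dotted names `lora.r` and `lora.lora_alpha`; the actual layout of `configs/default.yaml` may differ.

```python
# Hypothetical nested view of the defaults listed in the table;
# the real structure of configs/default.yaml is an assumption here.
config = {
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
    "per_device_train_batch_size": 4,
    "lora": {"r": 32, "lora_alpha": 64},
}


def lora_scaling(cfg):
    """Return the standard LoRA scaling factor, lora_alpha / r."""
    lora = cfg["lora"]
    return lora["lora_alpha"] / lora["r"]


if __name__ == "__main__":
    print(f"LoRA scaling factor: {lora_scaling(config)}")
```

With the defaults the factor is 64 / 32 = 2. If you change the rank, scaling `lora_alpha` by the same ratio keeps this factor, and so the effective magnitude of the adapter update, unchanged.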