cloudforge1/PaddleOCR-VL-For-Medical-Rx


PaddleOCR-VL for Medical Prescription OCR 💊

Fine-tuned PaddleOCR-VL for medical prescription document OCR — drug names, dosages, Latin abbreviations, and structured output.

Highlights

  • Drug name recognition across 22 unique medications (accuracy figures are pending; see Results)
  • Latin pharmaceutical abbreviations: Rp., D.t.d., S., q.i.d., b.i.d., t.i.d., p.o., etc.
  • Structured output: drug name + dosage + frequency + route of administration
  • 840-image evaluation set with 6 visual conditions and 3 difficulty levels
  • 100% synthetic data: no real patient data is used, so the dataset contains no protected health information (PHI)

Quick Start

pip install paddlepaddle paddleformers

from paddleformers.models import PaddleOCRVL

model = PaddleOCRVL.from_pretrained("cloudforge1/PaddleOCR-VL-For-Medical-Rx")
result = model.predict("path/to/prescription.jpg")
print(result["text"])
# Output: "Rp.\n  Amoxicillin 500 mg\n  D.t.d. No. 21\n  S. 1 caps. t.i.d. p.o."

Note: API is preliminary and will be updated once PaddleFormers stabilizes the inference interface.
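The text output shown in the Quick Start can be split into structured fields with a few regular expressions. The following is a minimal sketch, not part of this repository: the function name `parse_prescription`, the field names, and the patterns are all hypothetical, and they only cover the single layout shown in the example output.

```python
import re

# Example output string copied from the Quick Start comment.
RAW = "Rp.\n  Amoxicillin 500 mg\n  D.t.d. No. 21\n  S. 1 caps. t.i.d. p.o."

# Hypothetical patterns for this one layout; real prescriptions vary widely.
DRUG_RE = re.compile(
    r"^(?P<name>[A-Za-z][A-Za-z -]*?)\s+(?P<dose>\d+(?:\.\d+)?)\s*(?P<unit>mcg|mg|ml|g)$"
)
QTY_RE = re.compile(r"No\.\s*(?P<qty>\d+)")
SIG_RE = re.compile(
    r"^S\.\s+(?P<count>\d+)\s+(?P<form>[a-z]+\.)\s+"
    r"(?P<freq>(?:[a-z]\.){3})\s+(?P<route>(?:[a-z]\.){2})$"
)


def parse_prescription(text):
    """Split a recognized prescription string into a flat dict of fields."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        sig = SIG_RE.match(line)
        qty = QTY_RE.search(line) if line.startswith("D.t.d.") else None
        drug = DRUG_RE.match(line)
        if sig:
            # Instruction line: units per dose, form, frequency, route.
            fields.update(
                count=int(sig["count"]),
                form=sig["form"],
                frequency=sig["freq"],
                route=sig["route"],
            )
        elif qty:
            # Quantity line, e.g. "D.t.d. No. 21".
            fields["quantity"] = int(qty["qty"])
        elif drug:
            # Drug line, e.g. "Amoxicillin 500 mg".
            fields["drug"] = drug["name"]
            fields["dosage"] = f'{drug["dose"]} {drug["unit"]}'
    return fields


print(parse_prescription(RAW))
```

Lines that match none of the patterns (such as the `Rp.` header) are skipped silently; a production parser would need to report them instead.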

Model Architecture

| Component | Details |
| --- | --- |
| Base model | PaddleOCR-VL (PaddlePaddle/PaddleOCR-VL) |
| Fine-tuning | LoRA (rank=32, alpha=64) |
| Training framework | PaddleFormers SFT with erniekit dataset format |
| Hardware | NVIDIA V100 16GB |
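To give a sense of what rank=32 and alpha=64 imply, the sketch below computes the LoRA scaling factor and the number of trainable parameters for one adapted weight matrix. It assumes the standard LoRA formulation, where the low-rank update is scaled by alpha / rank; whether PaddleFormers applies exactly this scaling is an assumption, and the 1024 x 1024 layer size is a hypothetical value, not a PaddleOCR-VL dimension.

```python
RANK = 32   # from the Model Architecture table
ALPHA = 64  # from the Model Architecture table


def lora_scale(rank, alpha):
    """Scaling applied to the low-rank update in standard LoRA."""
    return alpha / rank


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters for one adapted matrix: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


# Hypothetical layer size, chosen only for illustration.
D = 1024

scale = lora_scale(RANK, ALPHA)
adapter_params = lora_param_count(D, D, RANK)
full_params = D * D

print(f"scale = {scale}")
print(f"adapter params = {adapter_params} ({adapter_params / full_params:.2%} of the full matrix)")
```

For this hypothetical layer the adapter trains 65,536 parameters instead of 1,048,576, which is the main reason LoRA fine-tuning fits on a 16GB card.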

Evaluation Set

  • 840 images across 6 visual conditions
  • Visual conditions:
    • Clean print: 40%
    • Faded/aged: 18%
    • Photographed (camera capture): 17%
    • Multi-section prescriptions: 10%
    • Stamp overlay: 8%
    • Handwritten elements: 7%
  • Difficulty distribution: easy 30% / medium 40% / hard 30%
  • Drug vocabulary: 22 unique medications across common therapeutic categories
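The percentages above do not all divide 840 into whole numbers (18% of 840 is 151.2, for example), so the published shares are presumably rounded. The sketch below turns the shares into approximate per-condition image counts using largest-remainder rounding. This is an assumption for illustration: the README does not state how the actual split was produced, and the real counts may differ by an image or two.

```python
TOTAL_IMAGES = 840

# Percent shares copied from the list above.
CONDITION_SHARES = {
    "clean_print": 40,
    "faded_aged": 18,
    "photographed": 17,
    "multi_section": 10,
    "stamp_overlay": 8,
    "handwritten": 7,
}


def allocate(total, shares):
    """Largest-remainder allocation of `total` items across integer percent shares."""
    # Integer arithmetic avoids floating-point rounding surprises.
    parts = {name: divmod(total * pct, 100) for name, pct in shares.items()}
    counts = {name: quotient for name, (quotient, _) in parts.items()}
    leftover = total - sum(counts.values())
    # Hand the leftover items to the largest remainders (ties keep list order).
    by_remainder = sorted(parts, key=lambda name: parts[name][1], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


condition_counts = allocate(TOTAL_IMAGES, CONDITION_SHARES)
difficulty_counts = allocate(TOTAL_IMAGES, {"easy": 30, "medium": 40, "hard": 30})

print(condition_counts)
print(difficulty_counts)
```

The difficulty split divides evenly (252 / 336 / 252), so only the visual-condition counts depend on the rounding rule.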

Training Data

  • 2,000+ prescription images, with a planned expansion to 5,000+
  • Fully synthetic — no real patient data used at any stage
  • Multi-section prescriptions with stamps, headers, and signatures
  • Covers typed prescriptions, mixed typed/handwritten, and photographed documents
  • Latin abbreviation coverage: 15+ standard pharmaceutical abbreviations

Prescription Elements

The model recognizes and structures the following prescription components:

| Element | Examples |
| --- | --- |
| Header | Rp. (Recipe), patient info, date |
| Drug name | Amoxicillin, Metformin, Lisinopril, ... |
| Dosage | 500 mg, 10 mg, 0.25 mg |
| Form | caps., tab., sol., susp. |
| Quantity | D.t.d. No. 21, No. 30 |
| Instructions | S. 1 tab. b.i.d. p.o. (take 1 tablet twice daily by mouth) |
| Stamps/signatures | Doctor stamps, registration numbers |
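The Instructions row pairs a Latin shorthand line with its plain-English reading. A small lookup table is enough to reproduce that expansion for the abbreviations listed in this README. The sketch below is illustrative only: the `expand_sig` helper and the glossary wording are not part of this repository, and the glossary covers only the abbreviations named above.

```python
# Glossary for the abbreviations that appear in this README.
# "S." (signa, "label") introduces the directions for the patient; it is
# rendered as "take" here to match the gloss given in the table.
GLOSSARY = {
    "S.": "take",
    "caps.": "capsule",
    "tab.": "tablet",
    "sol.": "solution",
    "susp.": "suspension",
    "b.i.d.": "twice daily",
    "t.i.d.": "three times daily",
    "q.i.d.": "four times daily",
    "p.o.": "by mouth",
}


def expand_sig(sig_line):
    """Replace known abbreviations token by token; unknown tokens pass through."""
    return " ".join(GLOSSARY.get(token, token) for token in sig_line.split())


print(expand_sig("S. 1 tab. b.i.d. p.o."))
print(expand_sig("S. 1 caps. t.i.d. p.o."))
```

Token-by-token lookup does not handle plurals or abbreviations that run together without spaces, so it suits display purposes rather than clinical interpretation.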

Results

| Metric | Pre-SFT | Post-SFT |
| --- | --- | --- |
| NED (overall) | TBD | TBD |
| NED (clean print) | TBD | TBD |
| NED (handwritten) | TBD | TBD |
| Drug name accuracy | TBD | TBD |
| Dosage accuracy | TBD | TBD |
| Latin abbreviation accuracy | TBD | TBD |

Results will be populated after baseline evaluation and fine-tuning runs.
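NED in the table refers to normalized edit distance. The exact normalization used by this project's evaluation scripts is not stated here, so the sketch below assumes one common definition: character-level Levenshtein distance divided by the length of the longer string, giving 0.0 for a perfect match and 1.0 for a complete mismatch. Some benchmarks report 1 - NED instead, so the direction should be checked against the actual scripts.

```python
def levenshtein(a, b):
    """Character-level edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ned(prediction, reference):
    """Normalized edit distance in [0, 1]; assumes max-length normalization."""
    longest = max(len(prediction), len(reference))
    if longest == 0:
        return 0.0
    return levenshtein(prediction, reference) / longest


# One dropped letter in an 18-character reference.
print(ned("Amoxicilin 500 mg", "Amoxicillin 500 mg"))
```

Per-condition rows such as "NED (handwritten)" would then be the mean of this value over the images in that subset.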

Project Structure

PaddleOCR-VL-For-Medical-Rx/
├── configs/            # Training and evaluation configs
├── docs/               # Documentation and research notes
├── eval_set/           # 840-image evaluation dataset
├── scripts/            # Data generation, training, evaluation scripts
├── train_data/         # Training dataset (synthetic)
├── LICENSE             # Apache 2.0
└── README.md

Citation

If you use this work in your research, please cite:

@misc{paddleocr-vl-medical-rx-2025,
  title={PaddleOCR-VL for Medical Prescription OCR},
  author={CloudForge},
  year={2025},
  url={https://github.com/cloudforge1/PaddleOCR-VL-For-Medical-Rx}
}

License

This project is licensed under the Apache License 2.0 — see the LICENSE file for details.

Acknowledgments

  • PaddlePaddle team for PaddleOCR-VL and PaddleFormers
  • Baidu PaddleOCR Derivative Model Challenge (飞桨衍生模型挑战赛)

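Summary (translated from 中文说明)

Project Overview

A medical prescription OCR model fine-tuned from PaddleOCR-VL, supporting recognition and structured output of drug names, dosages, and Latin abbreviations.

Features

  • Accurate recognition of 22 common drug names
  • Latin pharmaceutical abbreviations: Rp., D.t.d., S., q.i.d., b.i.d., t.i.d., p.o., etc.
  • Structured output: drug name + dosage + frequency + route of administration
  • 840 evaluation images covering 6 visual conditions
  • 100% synthetic data, with no patient privacy concerns

Quick Start

pip install paddlepaddle paddleformers

from paddleformers.models import PaddleOCRVL

model = PaddleOCRVL.from_pretrained("cloudforge1/PaddleOCR-VL-For-Medical-Rx")
result = model.predict("path/to/prescription.jpg")
print(result["text"])

Evaluation Set

  • 840 images, 6 visual conditions
  • Condition distribution: clean print 40%, faded/aged 18%, photographed 17%, multi-section prescriptions 10%, stamp overlay 8%, handwritten elements 7%
  • Difficulty distribution: easy 30% / medium 40% / hard 30%

Training Data

  • 2,000+ synthetic prescription images (to be expanded to 5,000+)
  • Fully synthetic data; no real patient information is involved
  • Covers typed prescriptions, mixed typed/handwritten prescriptions, and photographed documents

Acknowledgments

  • PaddlePaddle team for the PaddleOCR-VL model
  • Baidu PaddlePaddle PaddleOCR Derivative Model Challenge (百度飞桨 PaddleOCR 衍生模型挑战赛)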
