diff --git a/ocr/arabic/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md b/ocr/arabic/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md new file mode 100644 index 000000000..fa7016740 --- /dev/null +++ b/ocr/arabic/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md @@ -0,0 +1,224 @@ +--- +category: general +date: 2026-03-28 +description: كيفية استخدام تقنية التعرف الضوئي على الأحرف (OCR) للتعرف على النص المكتوب + يدويًا في الصور. تعلم استخراج النص المكتوب يدويًا، تحويل الصورة المكتوبة يدويًا، + والحصول على نتائج نظيفة بسرعة. +draft: false +keywords: +- how to use OCR +- recognize handwritten text +- extract handwritten text +- handwritten note to text +- convert handwritten image +language: ar +og_description: كيفية استخدام تقنية OCR للتعرف على النص المكتوب يدوياً. يوضح لك هذا + الدرس خطوة بخطوة كيفية استخراج النص المكتوب يدوياً من الصور والحصول على نتائج مصقولة. +og_title: كيفية استخدام تقنية OCR للتعرف على النص المكتوب بخط اليد – دليل شامل +tags: +- OCR +- Handwriting Recognition +- Python +title: كيفية استخدام OCR للتعرف على النص المكتوب بخط اليد – دليل شامل +url: /ar/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# كيفية استخدام OCR للتعرف على النص المكتوب بخط اليد – دليل كامل + +كيفية استخدام OCR للملاحظات المكتوبة بخط اليد هو سؤال يطرحه العديد من المطورين عندما يحتاجون إلى رقمنة الرسومات، محاضر الاجتماعات، أو الأفكار السريعة المكتوبة. في هذا الدليل سنستعرض الخطوات الدقيقة للتعرف على النص المكتوب بخط اليد، استخراج النص المكتوب بخط اليد، وتحويل صورة مكتوبة بخط اليد إلى سلاسل نظيفة قابلة للبحث. + +إذا سبق لك أن حدقت في صورة لقائمة بقالة وتساءلت، “هل يمكنني تحويل هذه الصورة المكتوبة بخط اليد إلى نص دون كتابة كل شيء مرة أخرى؟” – فأنت في المكان الصحيح. 
بحلول النهاية ستحصل على سكريبت جاهز للتنفيذ يحول **ملاحظة مكتوبة بخط اليد إلى نص** في ثوانٍ. + +## ما ستحتاجه + +- Python 3.8+ (الكود يعمل مع أي نسخة حديثة) +- مكتبة `ocr` – ثبّتها باستخدام `pip install ocr-sdk` (استبدل باسم حزمة موفر الخدمة الخاص بك) +- صورة واضحة لملاحظة مكتوبة بخط اليد (`hand_note.png` في المثال) +- قليل من الفضول وفنجان قهوة ☕️ (اختياري لكن يُنصح به) + +لا أطر عمل ثقيلة، ولا مفاتيح سحابية مدفوعة – مجرد محرك محلي يدعم **التعرف على الخط المكتوب بخط اليد** مباشرةً. + +## الخطوة 1 – تثبيت حزمة OCR واستيرادها + +أولاً، لنحصل على الحزمة الصحيحة على جهازك. افتح الطرفية وشغّل: + +```bash +pip install ocr-sdk +``` + +بعد انتهاء التثبيت، استورد الوحدة في سكريبتك: + +```python +# Step 1: Import the OCR SDK +import ocr +``` + +> **نصيحة احترافية:** إذا كنت تستخدم بيئة افتراضية، فعّلها قبل التثبيت. هذا يحافظ على نظافة مشروعك ويتجنب تعارض الإصدارات. + +## الخطوة 2 – إنشاء محرك OCR وتفعيل وضع الخط المكتوب بخط اليد + +الآن نحتاج إلى **كيفية استخدام OCR** – نحتاج إلى كائن محرك يعرف أننا نتعامل مع خطوط منحنية بدلاً من خطوط مطبوعة. المقتطف التالي ينشئ المحرك ويحول وضعه إلى وضع الخط المكتوب بخط اليد: + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = ocr.OcrEngine() +ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN +``` + +لماذا نضبط `recognition_mode`؟ لأن معظم محركات OCR تفضّل الكشف عن النص المطبوع افتراضيًا، مما يتخطى غالبًا الحلقات والمنحنيات في الملاحظة الشخصية. تفعيل وضع الخط المكتوب بخط اليد يزيد الدقة بشكل كبير. + +## الخطوة 3 – تحميل الصورة التي تريد تحويلها (تحويل صورة مكتوبة بخط اليد) + +الصور هي المادة الخام لأي مهمة OCR. تأكد من حفظ صورتك بصيغة غير مضغوطة (PNG يعمل جيدًا) وأن النص قابل للقراءة إلى حد معقول. ثم حمّلها هكذا: + +```python +# Step 3: Load the handwritten image you want to convert +handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") +``` + +إذا كانت الصورة موجودة بجوار سكريبتك، يمكنك ببساطة استخدام `"hand_note.png"` بدلاً من المسار الكامل. 
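قبل تمرير المسار إلى `ocr.Image.load`، من المفيد التحقق من وجود الملف وامتداده مبكرًا لتحصل على رسالة خطأ واضحة بدل استثناء غامض من المكتبة. المقتطف التالي مجرد مساعد توضيحي بالمكتبة القياسية — الدالة `validate_image_path` وقائمة الامتدادات افتراضيتان وليستا جزءًا من أي SDK:

```python
from pathlib import Path

# Common raster formats most OCR engines accept (illustrative list)
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}

def validate_image_path(path_str: str) -> Path:
    """Fail early with a clear message instead of a cryptic SDK error."""
    path = Path(path_str)
    if not path.is_file():
        raise FileNotFoundError(f"Image not found: {path}")
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported image format: {path.suffix}")
    return path
```

استدعِ `validate_image_path("hand_note.png")` قبل التحميل مباشرةً لتكتشف مشاكل المسار قبل بدء التعرف.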
+ +> **ماذا لو كانت الصورة غير واضحة؟** جرّب المعالجة المسبقة باستخدام OpenCV (مثلاً `cv2.cvtColor` إلى تدرج رمادي، `cv2.threshold` لزيادة التباين) قبل تمريرها إلى محرك OCR. + +## الخطوة 4 – تشغيل محرك التعرف لاستخراج النص المكتوب بخط اليد + +مع جاهزية المحرك وتحميل الصورة في الذاكرة، يمكننا أخيرًا **استخراج النص المكتوب بخط اليد**. طريقة `recognize` تُعيد كائن نتيجة خام يحتوي على النص بالإضافة إلى درجات الثقة. + +```python +# Step 4: Perform OCR and get the raw result +raw_result = ocr_engine.recognize(handwritten_image) +print("Raw OCR output:") +print(raw_result.text) +``` + +قد يتضمن الناتج الخام عادةً فواصل أسطر عشوائية أو أحرف غير صحيحة، خاصةً إذا كان الخط غير مرتب. لهذا السبب توجد الخطوة التالية. + +## الخطوة 5 – (اختياري) تحسين النتيجة باستخدام معالج AI ما بعد المعالجة + +معظم SDKs الحديثة لـ OCR تأتي مع معالج AI خفيف الوزن ما بعد المعالجة ينظّف التباعد، يصلح الأخطاء الشائعة، ويُوحّد نهايات الأسطر. تشغيله سهل كالتالي: + +```python +# Step 5: Refine the raw OCR output (handwritten note to text) +polished_result = ocr_engine.run_postprocessor(raw_result) + +# Display the cleaned, readable text +print("\nPolished OCR output:") +print(polished_result.text) +``` + +إذا تخطيت هذه الخطوة ستحصل على نص قابل للاستخدام، لكن تحويل **الملاحظة المكتوبة بخط اليد إلى نص** سيظهر بصورة أقل صقلًا. المعالج ما بعد المعالجة مفيد خصوصًا للملاحظات التي تحتوي على نقاط تعداد أو كلمات مختلطة الأحرف. + +## الخطوة 6 – التحقق من النتيجة ومعالجة الحالات الخاصة + +بعد طباعة النتيجة المُنقّحة، تحقق من أن كل شيء يبدو صحيحًا. إليك فحص سريع يمكنك إضافته: + +```python +# Step 6: Simple verification +if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") +else: + print("\n✅ OCR succeeded! You can now save or further process the text.") +``` + +**قائمة التحقق من الحالات الخاصة** + +| الحالة | ما الذي يجب فعله | +|-----------|------------| +| **تباين منخفض جدًا** | زيادة التباين باستخدام `cv2.convertScaleAbs` قبل التحميل. 
| +| **عدة لغات** | ضبط `ocr_engine.language = ["en", "es"]` (أو اللغات المستهدفة). | +| **مستندات كبيرة** | معالجة الصفحات على دفعات لتجنب ارتفاع استهلاك الذاكرة. | +| **رموز خاصة** | إضافة قاموس مخصص عبر `ocr_engine.add_custom_words([...])`. | + +## نظرة عامة بصرية + +فيما يلي صورة بديلة توضح سير العمل — من ملاحظة مُصوَّرة إلى نص نظيف. يحتوي النص البديل على الكلمة الرئيسية، مما يجعل الصورة صديقة لتحسين محركات البحث. + +![كيفية استخدام OCR على صورة ملاحظة مكتوبة بخط اليد](/images/handwritten_ocr_flow.png "كيفية استخدام OCR على صورة ملاحظة مكتوبة بخط اليد") + +## سكريبت كامل قابل للتنفيذ + +بدمج جميع الأجزاء معًا، إليك البرنامج الكامل جاهز للنسخ واللصق: + +```python +# Complete script: Convert a handwritten image to clean text using OCR + +import ocr + +def main(): + # 1️⃣ Initialize OCR engine for handwritten recognition + ocr_engine = ocr.OcrEngine() + ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN + + # 2️⃣ Load the image containing the handwritten note + handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") + + # 3️⃣ Perform OCR to extract raw text + raw_result = ocr_engine.recognize(handwritten_image) + print("Raw OCR output:") + print(raw_result.text) + + # 4️⃣ (Optional) Run AI post‑processor for cleaner output + polished_result = ocr_engine.run_postprocessor(raw_result) + + # 5️⃣ Show the polished, readable text + print("\nPolished OCR output:") + print(polished_result.text) + + # 6️⃣ Simple sanity check + if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") + else: + print("\n✅ OCR succeeded! You can now save or further process the text.") + +if __name__ == "__main__": + main() +``` + +**الناتج المتوقع (مثال)** + +``` +Raw OCR output: +T0d@y I w3nt to the market +and bought 5 aplpes, 2 bananas, +and a loaf of bread. + +Polished OCR output: +Today I went to the market and bought 5 apples, 2 bananas, and a loaf of bread. 
+``` + +لاحظ كيف قام المعالج ما بعد المعالجة بتصحيح الخطأ “T0d@y” وتوحيد التباعد. + +## الأخطاء الشائعة ونصائح احترافية + +- **حجم الصورة مهم** – عادةً ما تحدّ محركات OCR حجم الإدخال إلى 4 K × 4 K. قم بتغيير حجم الصور الكبيرة مسبقًا. +- **نمط الخط** – الخط المتصل مقابل الحروف المنفصلة قد يؤثر على الدقة. إذا كان بإمكانك التحكم في المصدر (مثلاً قلم رقمي)، شجع على كتابة الحروف المنفصلة للحصول على أفضل النتائج. +- **المعالجة الدفعية** – عند التعامل مع عشرات الملاحظات، غلف السكريبت في حلقة واحفظ كل نتيجة في ملف CSV أو قاعدة بيانات SQLite. +- **تسرب الذاكرة** – بعض SDKs تحتفظ بذاكر داخلية؛ استدعِ `ocr_engine.dispose()` بعد الانتهاء إذا لاحظت بطءً. + +## الخطوات التالية – ما بعد OCR البسيط + +الآن بعد أن أتقنت **كيفية استخدام OCR** لصورة واحدة، فكر في هذه التوسعات: + +1. **التكامل مع التخزين السحابي** – سحب الصور من AWS S3 أو Azure Blob، تشغيل نفس الخطوات، وإعادة النتائج. +2. **إضافة كشف اللغة** – استخدم `ocr_engine.detect_language()` لتبديل القواميس تلقائيًا. +3. **الدمج مع NLP** – مرّر النص المنقّح إلى spaCy أو NLTK لاستخراج الكيانات، التواريخ، أو عناصر العمل. +4. **إنشاء نقطة نهاية REST** – غلف السكريبت بـ Flask أو FastAPI حتى تتمكن الخدمات الأخرى من إرسال صور عبر POST واستلام نص مشفر بصيغة JSON. + +كل هذه الأفكار لا تزال تدور حول المفاهيم الأساسية لـ **التعرف على النص المكتوب بخط اليد**، **استخراج النص المكتوب بخط اليد**، و**تحويل صورة مكتوبة بخط اليد** — العبارات التي من المحتمل أن تبحث عنها لاحقًا. + +--- + +### ملخص + +أظهرنا لك **كيفية استخدام OCR** للتعرف على النص المكتوب بخط اليد، استخراجّه، وتنقيته إلى سلسلة قابلة للاستخدام. السكريبت الكامل جاهز للتنفيذ، تم شرح سير العمل خطوة بخطوة، ولديك الآن قائمة تحقق للحالات الخاصة. احصل على صورة لملاحظة اجتماعك القادمة، ضعها في السكريبت، ودع الآلة تقوم بالكتابة بدلاً منك. + +برمجة سعيدة، ولتظل ملاحظاتك دائمًا قابلة للقراءة! 
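ملحق سريع: لتوضيح نصيحة **المعالجة الدفعية** أعلاه، إليك هيكل مبسط بالمكتبة القياسية يمر على مجلد من الصور ويجمع النتائج في ملف CSV. الدالة `recognize_note` هنا بديل افتراضي — استبدل جسمها باستدعاء محرك OCR الفعلي:

```python
import csv
from pathlib import Path

def recognize_note(image_path: Path) -> str:
    """Placeholder for the real OCR call (e.g. recognize + postprocessor)."""
    return f"text extracted from {image_path.name}"

def batch_to_csv(image_dir: str, csv_path: str) -> int:
    """Run OCR over every PNG in a folder and collect results in a CSV."""
    rows = [
        (path.name, recognize_note(path))
        for path in sorted(Path(image_dir).glob("*.png"))
    ]
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "text"])
        writer.writerows(rows)
    return len(rows)
```

يعيد الهيكل عدد الملفات المعالجة، ويمكنك بسهولة استبدال CSV بإدخالات قاعدة بيانات SQLite كما ورد في النصائح.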
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/arabic/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md b/ocr/arabic/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md new file mode 100644 index 000000000..60a888436 --- /dev/null +++ b/ocr/arabic/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md @@ -0,0 +1,187 @@ +--- +category: general +date: 2026-03-28 +description: قم بإجراء التعرف الضوئي على الأحرف (OCR) على الصورة واحصل على نص نظيف + مع إحداثيات الصناديق المحيطة. تعلم كيفية استخراج النص من OCR، تنظيفه، وعرض النتائج + خطوة بخطوة. +draft: false +keywords: +- perform OCR on image +- how to extract OCR +- how to clean OCR +- display bounding box coordinates +- OCR post‑processing +- OCR bounding boxes +language: ar +og_description: نفّذ OCR على الصورة، نظّف النتيجة، واعرض إحداثيات الصناديق المحيطة + في دليل مختصر. +og_title: تنفيذ التعرف الضوئي على الأحرف في الصورة – نتائج نظيفة ومربعات الإحاطة +tags: +- OCR +- Computer Vision +- Python +title: إجراء التعرف الضوئي على الأحرف في الصورة – نتائج نظيفة وعرض إحداثيات الصندوق + المحيط +url: /ar/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# إجراء OCR على الصورة – تنظيف النتائج وعرض إحداثيات الصناديق المحيطة + +هل احتجت يوماً إلى **إجراء OCR على ملفات الصور** لكنك حصلت على نص فوضوي ولا تعرف أين يقع كل كلمة في الصورة؟ لست وحدك. في العديد من المشاريع—رقمنة الفواتير، مسح الإيصالات، أو استخراج النص البسيط—الحصول على مخرجات OCR الخام هو مجرد العائق الأول. 
الخبر السار؟ يمكنك تنظيف تلك المخرجات ورؤية إحداثيات الصناديق المحيطة لكل منطقة فوراً دون كتابة الكثير من الشيفرة المتكررة. + +في هذا الدليل سنستعرض **كيفية استخراج OCR**، تشغيل **معالج لاحق لتنظيف OCR**، وأخيراً **عرض إحداثيات الصناديق المحيطة** لكل منطقة تم تنظيفها. في النهاية ستحصل على سكريبت واحد قابل للتنفيذ يحول صورة غير واضحة إلى نص منظم ومهيكل جاهز للمعالجة اللاحقة. + +## ما ستحتاجه + +- Python 3.9+ (الصياغة أدناه تعمل على 3.8 وما فوق) +- محرك OCR يدعم `recognize(..., return_structured=True)` – على سبيل المثال مكتبة خيالية `engine` مستخدمة في المقتطف. استبدلها بـ Tesseract أو EasyOCR أو أي SDK يُعيد بيانات المناطق. +- إلمام أساسي بدوال Python والحلقات +- ملف صورة تريد مسحه (PNG، JPG، إلخ) + +> **نصيحة احترافية:** إذا كنت تستخدم Tesseract، فإن الدالة `pytesseract.image_to_data` تُعطيك الصناديق المحيطة بالفعل. يمكنك تغليف نتيجتها بمحول صغير يحاكي واجهة `engine.recognize` الموضحة أدناه. + +--- + +![مثال على إجراء OCR على الصورة](image-placeholder.png "مثال على إجراء OCR على الصورة") + +*نص بديل: مخطط يوضح كيفية إجراء OCR على الصورة وتصور إحداثيات الصناديق المحيطة* + +## الخطوة 1 – إجراء OCR على الصورة والحصول على المناطق المهيكلة + +الخطوة الأولى هي طلب من محرك OCR أن يُعيد ليس فقط النص العادي بل قائمة مهيكلة من مناطق النص. تحتوي هذه القائمة على السلسلة النصية الخام والمستطيل الذي يحيط بها. + +```python +import engine # replace with your actual OCR library +from pathlib import Path + +# Load the image you want to process +image_path = Path("sample_invoice.jpg") +image = engine.load_image(image_path) + +# Step 1: Perform OCR and request a structured list of text regions +raw_result = engine.recognize(image, return_structured=True) +``` + +**لماذا هذا مهم:** +عند طلب النص العادي فقط تفقد السياق المكاني. البيانات المهيكلة تتيح لك لاحقاً **عرض إحداثيات الصناديق المحيطة**، محاذاة النص مع الجداول، أو تمرير المواقع الدقيقة إلى نموذج لاحق. 
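إذا كان محركك يعيد قوائم كلمات خامًا — كما تفعل `pytesseract.image_to_data` مع `output_type=Output.DICT` — يمكنك محاكاة واجهة المناطق المستخدمة أعلاه بمحول صغير. هذا تصور تقريبي فقط: أسماء المفاتيح تطابق مخرجات pytesseract، أما فئة `TextRegion` فافتراضية:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TextRegion:
    """Minimal region object mimicking the structured OCR interface."""
    text: str
    bounding_box: Tuple[int, int, int, int]  # (x, y, width, height)

def regions_from_tesseract(data: dict) -> List[TextRegion]:
    """Convert a pytesseract image_to_data dict into TextRegion objects."""
    regions = []
    for i, word in enumerate(data["text"]):
        if not word.strip():  # skip empty detections
            continue
        bbox = (data["left"][i], data["top"][i],
                data["width"][i], data["height"][i])
        regions.append(TextRegion(text=word, bounding_box=bbox))
    return regions
```

بعد هذا التحويل يعمل باقي الدرس كما هو، لأن كل منطقة تملك خاصيتي `text` و `bounding_box`.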
+ +## الخطوة 2 – كيفية تنظيف مخرجات OCR باستخدام معالج لاحق + +محركات OCR جيدة في التعرف على الأحرف، لكنها غالباً ما تترك مسافات زائدة، قطع سطر غير مرغوب فيها، أو رموز تم التعرف عليها خطأً. المعالج اللاحق يُطبع النص، يُصحح الأخطاء الشائعة في OCR، ويزيل الفراغات الزائدة. + +```python +# Step 2: Clean the plain‑text of each region using the post‑processor +processed_result = engine.run_postprocessor(raw_result) +``` + +إذا كنت تبني منظفك الخاص، فكر في: + +- إزالة الأحرف غير ASCII (`re.sub(r'[^\x00-\x7F]+',' ', text)`) +- دمج مسافات متعددة إلى مسافة واحدة +- تطبيق مدقق إملائي مثل `pyspellchecker` لتصحيح الأخطاء الواضحة + +**لماذا يجب أن تهتم:** +السلسلة النظيفة تجعل البحث، الفهرسة، وسلاسل معالجة اللغة الطبيعية اللاحقة أكثر موثوقية. بعبارة أخرى، **كيفية تنظيف OCR** غالباً ما تكون الفارق بين مجموعة بيانات صالحة وصداع رأس. + +## الخطوة 3 – عرض إحداثيات الصناديق المحيطة لكل منطقة تم تنظيفها + +الآن بعد أن أصبح النص مرتباً، نمر على كل منطقة، نطبع مستطيلها والنص المنظف. هذه هي الخطوة التي نُظهر فيها أخيراً **إحداثيات الصناديق المحيطة**. + +```python +# Step 3 – Iterate over the cleaned regions and display their bounding box and text +for text_region in processed_result.regions: + # Each region has a .bounding_box attribute (x, y, width, height) + bbox = text_region.bounding_box + print(f"[{bbox}] {text_region.text}") +``` + +**نموذج الإخراج** + +``` +[(34, 120, 210, 30)] Invoice #12345 +[(34, 160, 420, 28)] Date: 2026‑03‑01 +[(34, 200, 380, 28)] Total Amount: $1,254.00 +``` + +يمكنك الآن تمرير تلك الإحداثيات إلى مكتبة رسم (مثل OpenCV) لتغطية الصناديق على الصورة الأصلية، أو تخزينها في قاعدة بيانات لاستعلامات لاحقة. + +## البرنامج الكامل الجاهز للتنفيذ + +فيما يلي البرنامج الكامل الذي يجمع بين الخطوات الثلاث. استبدل استدعاءات `engine` الوهمية بـ SDK OCR الفعلي الخاص بك. + +```python +#!/usr/bin/env python3 +""" +Perform OCR on image → clean results → display bounding box coordinates. 
+Author: Your Name +Date: 2026‑03‑28 +""" + +import engine # <-- replace with your OCR library +from pathlib import Path +import sys + +def main(image_path: str): + # Load image + image = engine.load_image(Path(image_path)) + + # 1️⃣ Perform OCR and ask for structured output + raw_result = engine.recognize(image, return_structured=True) + + # 2️⃣ Clean the raw text using the built‑in post‑processor + processed_result = engine.run_postprocessor(raw_result) + + # 3️⃣ Show each region's bounding box and cleaned text + print("\n=== Cleaned OCR Regions ===") + for region in processed_result.regions: + bbox = region.bounding_box # (x, y, w, h) + print(f"[{bbox}] {region.text}") + +if __name__ == "__main__": + if len(sys.argv) != 2: + print("Usage: python perform_ocr.py ") + sys.exit(1) + main(sys.argv[1]) +``` + +### كيفية التشغيل + +```bash +python perform_ocr.py sample_invoice.jpg +``` + +سترى قائمة بالصناديق المحيطة مقترنة بالنص المنظف، تماماً كما هو موضح في نموذج الإخراج أعلاه. + +## الأسئلة المتكررة والحالات الخاصة + +| السؤال | الجواب | +|----------|--------| +| **ماذا لو لم يدعم محرك OCR الخاص بـ `return_structured`؟** | اكتب غلافًا خفيفًا يحول مخرجات المحرك الخام (عادةً قائمة كلمات مع إحداثيات) إلى كائنات تحتوي على خصائص `text` و `bounding_box`. | +| **هل يمكنني الحصول على درجات الثقة؟** | العديد من SDKs تُظهر مقياس الثقة لكل منطقة. أضفه إلى جملة الطباعة: `print(f"[{bbox}] {region.text} (conf: {region.confidence:.2f})")`. | +| **كيف أتعامل مع النص المدور؟** | عالج الصورة مسبقًا باستخدام `cv2.minAreaRect` من OpenCV لتصحيح الانحراف قبل استدعاء `recognize`. | +| **ماذا لو احتجت المخرجات بصيغة JSON؟** | سَلّس `processed_result.regions` باستخدام `json.dumps([r.__dict__ for r in processed_result.regions], indent=2)`. | +| **هل هناك طريقة لتصور الصناديق؟** | استخدم OpenCV: `cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0), 2)` داخل الحلقة، ثم `cv2.imwrite("annotated.jpg", img)`. 
| + +## الخلاصة + +لقد تعلمت الآن **كيفية إجراء OCR على الصورة**، تنظيف المخرجات الخام، و**عرض إحداثيات الصناديق المحيطة** لكل منطقة. تدفق الخطوات الثلاث—التعرف → المعالجة اللاحقة → التكرار—هو نمط قابل لإعادة الاستخدام يمكنك دمجه في أي مشروع Python يحتاج إلى استخراج نص موثوق. + +### ما التالي؟ + +- **استكشاف محركات OCR مختلفة** (Tesseract، EasyOCR، Google Vision) ومقارنة الدقة. +- **دمج مع قاعدة بيانات** لتخزين بيانات المناطق لأرشفة قابلة للبحث. +- **إضافة كشف لغة** لتوجيه كل منطقة إلى مدقق إملائي مناسب. +- **تغطية الصناديق على الصورة الأصلية** للتحقق البصري (انظر مقتطف OpenCV أعلاه). + +إذا صادفتك أية مشاكل، تذكر أن أكبر فائدة تأتي من خطوة المعالجة اللاحقة القوية؛ النص المنظف أسهل بكثير في التعامل منه إلى تفريغ عشوائي من الأحرف. + +برمجة سعيدة، ولتكن خطوط أنابيب OCR الخاصة بك دائماً مرتبة! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/arabic/python/general/python-ocr-tutorial-extract-text-from-images/_index.md b/ocr/arabic/python/general/python-ocr-tutorial-extract-text-from-images/_index.md new file mode 100644 index 000000000..ef02964e5 --- /dev/null +++ b/ocr/arabic/python/general/python-ocr-tutorial-extract-text-from-images/_index.md @@ -0,0 +1,232 @@ +--- +category: general +date: 2026-03-28 +description: دروس OCR بلغة Python توضح كيفية استخراج النص من الصورة باستخدام Aspose + OCR Cloud. تعلم كيفية تحميل الصورة للتعرف الضوئي على الأحرف وتحويل الصورة إلى نص + عادي في دقائق. +draft: false +keywords: +- python ocr tutorial +- extract text image python +- ocr image to text +- load image for ocr +- convert image plain text +language: ar +og_description: يشرح برنامج تعليمي للـ OCR باستخدام بايثون كيفية تحميل الصورة للـ + OCR وتحويل النص العادي للصورة باستخدام Aspose OCR Cloud. احصل على الكود الكامل والنصائح. 
+og_title: دليل بايثون للتعرف الضوئي على الأحرف – استخراج النص من الصور +tags: +- OCR +- Python +- Image Processing +title: دورة بايثون للتعرف الضوئي على الأحرف – استخراج النص من الصور +url: /ar/python/general/python-ocr-tutorial-extract-text-from-images/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Python OCR Tutorial – استخراج النص من الصور + +هل تساءلت يومًا كيف تحول صورة إيصال غير مرتبة إلى نص نظيف وقابل للبحث؟ لست وحدك. في تجربتي، أكبر عقبة ليست محرك OCR نفسه بل الحصول على الصورة بالتنسيق الصحيح واستخراج النص العادي دون أي مشاكل. + +هذا **python ocr tutorial** يشرح لك كل خطوة — تحميل صورة للـ OCR، تشغيل التعرف، وأخيرًا تحويل النص العادي للصورة إلى سلسلة Python يمكنك تخزينها أو تحليلها. في النهاية ستكون قادرًا على **extract text image python**، ولن تحتاج إلى أي ترخيص مدفوع للبدء. + +## ما ستتعلمه + +- كيفية تثبيت واستيراد Aspose OCR Cloud SDK للـ Python. +- الكود الدقيق لـ **load image for OCR** (PNG, JPEG, TIFF, PDF، إلخ). +- كيفية استدعاء المحرك لإجراء تحويل **ocr image to text**. +- نصائح للتعامل مع الحالات الشائعة مثل ملفات PDF متعددة الصفحات أو المسحات منخفضة الدقة. +- طرق للتحقق من النتيجة وماذا تفعل إذا كان النص مشوشًا. + +### المتطلبات المسبقة + +- Python 3.8+ مثبت على جهازك. +- حساب Aspose Cloud مجاني (الإصدار التجريبي يعمل بدون ترخيص). +- إلمام أساسي بـ pip وبيئات virtual environments — لا شيء معقد. + +> **نصيحة احترافية:** إذا كنت تستخدم virtualenv بالفعل، فعّله الآن. يحافظ ذلك على نظافة الاعتمادات ويتجنب تعارض الإصدارات. + +![Python OCR tutorial screenshot showing recognized text](path/to/ocr_example.png "Python OCR tutorial – extracted plain text display") + +## الخطوة 1 – تثبيت Aspose OCR Cloud SDK + +أولًا، نحتاج إلى المكتبة التي تتواصل مع خدمة OCR من Aspose. افتح الطرفية واكتب: + +```bash +pip install asposeocrcloud +``` + +هذا الأمر الواحد يجلب أحدث SDK (الإصدار الحالي 23.12). الحزمة تشمل كل ما تحتاجه — لا حاجة لمكتبات معالجة صور إضافية. 
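قبل المتابعة، إليك فحصًا سريعًا اختياريًا يتأكد من أن الحزمة قابلة للاستيراد فعلًا — مجرد مساعد توضيحي بالمكتبة القياسية وليس جزءًا من SDK نفسه:

```python
import importlib.util

def ensure_package(package_name: str) -> bool:
    """Return True if the package is importable, otherwise print a pip hint."""
    if importlib.util.find_spec(package_name) is None:
        print(f"'{package_name}' is not installed - run: pip install {package_name}")
        return False
    return True
```

استدعِ `ensure_package("asposeocrcloud")` في بداية سكريبتك لتحصل على رسالة واضحة بدل `ImportError` مفاجئ.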
+ +## الخطوة 2 – تهيئة محرك OCR (الكلمة المفتاحية الأساسية في العمل) + +الآن بعد أن أصبح SDK جاهزًا، يمكننا تشغيل محرك **python ocr tutorial**. المُنشئ لا يحتاج إلى مفتاح ترخيص للتجربة، مما يبسط الأمور. + +```python +import asposeocrcloud as ocr + +# Initialise the OCR engine – no licence needed for trial use +ocr_engine = ocr.OcrEngine() +``` + +> **لماذا هذا مهم:** تهيئة المحرك مرة واحدة فقط تجعل الاستدعاءات اللاحقة سريعة. إذا قمت بإنشاء الكائن لكل صورة ستضيع جولات الشبكة. + +## الخطوة 3 – تحميل صورة للـ OCR + +هنا يبرز دور كلمة **load image for OCR**. طريقة `Image.load` في SDK تقبل مسار ملف أو URL، وتكتشف التنسيق تلقائيًا (PNG، JPEG، TIFF، PDF، إلخ). لنحمّل إيصالًا تجريبيًا: + +```python +# Step 3: Load the input image (PNG, JPEG, TIFF, PDF …) +input_image = ocr.Image.load(r"YOUR_DIRECTORY/receipt.png") +``` + +إذا كنت تتعامل مع PDF متعدد الصفحات، ما عليك سوى الإشارة إلى ملف PDF؛ سيعامل SDK كل صفحة كصورة منفصلة داخليًا. + +## الخطوة 4 – تنفيذ تحويل OCR من صورة إلى نص + +مع وجود الصورة في الذاكرة، يحدث الـ OCR الفعلي في سطر واحد. طريقة `recognize` تُعيد كائن `OcrResult` يحتوي على النص العادي، درجات الثقة، وحتى إطارات الحدود إذا احتجتها لاحقًا. + +```python +# Step 4: Perform OCR on the loaded image +ocr_result = ocr_engine.recognize(input_image) +``` + +> **حالة خاصة:** للصور منخفضة الدقة (أقل من 300 dpi) قد ترغب في تكبير الصورة أولًا. يوفر SDK أداة مساعدة `Resize`، لكن بالنسبة لمعظم الإيصالات الإعداد الافتراضي يعمل جيدًا. + +## الخطوة 5 – تحويل النص العادي للصورة إلى سلسلة قابلة للاستخدام + +القطعة الأخيرة من اللغز هي استخراج النص العادي من كائن النتيجة. هذه هي خطوة **convert image plain text** التي تحول كتلة الـ OCR إلى شيء يمكنك طباعته، تخزينه، أو إمداده إلى نظام آخر. + +```python +# Step 5: Output the recognised plain text +print("Plain OCR:") +print(ocr_result.text) +``` + +عند تشغيل السكريبت، يجب أن ترى شيئًا مثل: + +``` +Plain OCR: +Starbucks Coffee +Date: 2026/03/27 +Total: $4.75 +Thank you! 
+``` + +هذه النتيجة الآن سلسلة Python عادية، جاهزة لتصدير CSV، إدخال قاعدة بيانات، أو معالجة اللغة الطبيعية. + +## التعامل مع المشكلات الشائعة + +### 1. صور فارغة أو مشوشة + +إذا كان `ocr_result.text` فارغًا، تحقق مرة أخرى من جودة الصورة. حل سريع هو إضافة خطوة تمهيدية: + +```python +# Simple preprocessing – convert to grayscale and increase contrast +processed = input_image.to_grayscale().adjust_contrast(1.2) +ocr_result = ocr_engine.recognize(processed) +``` + +### 2. ملفات PDF متعددة الصفحات + +عند إمداد PDF، تُعيد `recognize` النتائج لكل صفحة. قم بالتكرار عبرها هكذا: + +```python +pdf_image = ocr.Image.load("document.pdf") +pages = pdf_image.pages # collection of page images + +for i, page in enumerate(pages, start=1): + result = ocr_engine.recognize(page) + print(f"--- Page {i} ---") + print(result.text) +``` + +### 3. دعم اللغات + +يدعم Aspose OCR أكثر من 60 لغة. لتغيير اللغة، اضبط خاصية `language` قبل استدعاء `recognize`: + +```python +ocr_engine.language = "fr" # French +ocr_result = ocr_engine.recognize(input_image) +``` + +## مثال كامل يعمل + +بجمع كل ذلك معًا، إليك سكريبت كامل جاهز للنسخ واللصق يغطي كل شيء من التثبيت إلى معالجة الحالات الخاصة: + +```python +# -*- coding: utf-8 -*- +""" +Python OCR tutorial – complete example using Aspose OCR Cloud. +Demonstrates loading an image, performing OCR, and handling multi‑page PDFs. +""" + +import asposeocrcloud as ocr +import os + +def ocr_file(filepath: str, language: str = "en"): + """ + Perform OCR on a given file (image or PDF) and return plain text. 
+ """ + # Initialise engine (trial licence) + engine = ocr.OcrEngine() + engine.language = language + + # Load the file – SDK auto‑detects format + image = ocr.Image.load(filepath) + + # If it's a PDF, iterate over pages + if image.is_pdf: + all_text = [] + for page in image.pages: + result = engine.recognize(page) + all_text.append(result.text) + return "\n".join(all_text) + + # Single‑image case + result = engine.recognize(image) + return result.text + + +if __name__ == "__main__": + # Example usage – replace with your own path + sample_path = os.path.join("YOUR_DIRECTORY", "receipt.png") + + if not os.path.exists(sample_path): + raise FileNotFoundError(f"File not found: {sample_path}") + + extracted = ocr_file(sample_path) + print("=== Extracted Text ===") + print(extracted) +``` + +شغّل السكريبت (`python ocr_demo.py`) وسترى ناتج **ocr image to text** مباشرة في وحدة التحكم. + +## ملخص – ما تم تغطيته + +- تم تثبيت SDK **Aspose OCR Cloud** (`pip install asposeocrcloud`). +- **تهيئة محرك OCR** بدون ترخيص (مثالي للتجربة). +- تم توضيح كيفية **load image for OCR**، سواء كان PNG أو JPEG أو PDF. +- تم تنفيذ تحويل **ocr image to text** و**convert image plain text** إلى سلسلة Python قابلة للاستخدام. +- تم معالجة المشكلات الشائعة مثل المسحات منخفضة الدقة، ملفات PDF متعددة الصفحات، واختيار اللغة. + +## الخطوات التالية والمواضيع ذات الصلة + +الآن بعد أن أتقنت **python ocr tutorial**، فكر في استكشاف: + +- **Extract text image python** للمعالجة الدفعية لمجلدات كبيرة من الإيصالات. +- دمج ناتج OCR مع **pandas** لتحليل البيانات (`df = pd.read_csv(StringIO(extracted))`). +- استخدام **Tesseract OCR** كخيار احتياطي عندما تكون اتصال الإنترنت محدودًا. +- إضافة معالجة لاحقة باستخدام **spaCy** لتحديد الكيانات مثل التواريخ، المبالغ، وأسماء التجار. + +لا تتردد في التجربة: جرّب صيغ صور مختلفة، عدّل التباين، أو غيّر اللغات. مجال OCR واسع، والمهارات التي اكتسبتها الآن تشكل أساسًا قويًا لأي مشروع أتمتة مستندات. + +برمجة سعيدة، ولتكن نصوصك دائمًا قابلة للقراءة! 
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/arabic/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md b/ocr/arabic/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md new file mode 100644 index 000000000..3939b83e0 --- /dev/null +++ b/ocr/arabic/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md @@ -0,0 +1,219 @@ +--- +category: general +date: 2026-03-28 +description: تعلم كيفية تشغيل OCR على الصورة، تنزيل نموذج Hugging Face تلقائيًا، تنظيف + نص OCR وتكوين نموذج LLM في بايثون باستخدام Aspose OCR Cloud. +draft: false +keywords: +- run OCR on image +- download hugging face model +- clean OCR text +- configure LLM model +language: ar +og_description: قم بتشغيل OCR على الصورة وتنظيف المخرجات باستخدام نموذج Hugging Face + يتم تنزيله تلقائيًا. يوضح هذا الدليل كيفية تكوين نموذج LLM في بايثون. +og_title: تشغيل OCR على الصورة – دليل Aspose OCR Cloud الكامل +tags: +- OCR +- Python +- LLM +- HuggingFace +title: تشغيل OCR على الصورة باستخدام Aspose OCR Cloud – دليل كامل خطوة بخطوة +url: /ar/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# تشغيل OCR على الصورة – دليل Aspose OCR Cloud الكامل + +هل احتجت يوماً لتشغيل OCR على ملفات الصور لكن المخرجات الأولية كانت تبدو فوضىً غير مفهومة؟ في تجربتي، أكبر نقطة ألم ليست في عملية التعرف نفسها—بل في عملية التنظيف. لحسن الحظ، يتيح لك Aspose OCR Cloud إرفاق معالج ما بعد‑الـLLM يمكنه *تنظيف نص OCR* تلقائيًا. في هذا الدرس سنستعرض كل ما تحتاجه: من **تنزيل نموذج Hugging Face** إلى تكوين الـLLM، تشغيل محرك OCR، وأخيرًا صقل النتيجة. 
+ +بحلول نهاية هذا الدليل ستحصل على سكريبت جاهز للتنفيذ يقوم بـ: + +1. سحب نموذج Qwen 2.5 المدمج من Hugging Face (تم تنزيله تلقائيًا لك). +2. تكوين النموذج لتشغيل جزء من الشبكة على الـGPU والباقي على الـCPU. +3. تنفيذ محرك OCR على صورة ملاحظة مكتوبة بخط اليد. +4. استخدام الـLLM لتنظيف النص المُعترف به، مما يمنحك مخرجات قابلة للقراءة البشرية. + +> **المتطلبات المسبقة** – Python 3.8+، حزمة `asposeocrcloud`، بطاقة GPU بسعة لا تقل عن 4 GB VRAM (اختياري لكن يُنصح به)، واتصال إنترنت لتنزيل النموذج الأول. + +--- + +## ما ستحتاجه + +- **Aspose OCR Cloud SDK** – تثبيت عبر `pip install asposeocrcloud`. +- **صورة نموذجية** – مثال: `handwritten_note.jpg` موجودة في مجلد محلي. +- **دعم GPU** – إذا كان لديك GPU يدعم CUDA، سيقوم السكريبت بتحميل 30 طبقة؛ وإلا سيعود تلقائيًا إلى الـCPU. +- **صلاحية كتابة** – يقوم السكريبت بتخزين النموذج مؤقتًا في `YOUR_DIRECTORY`؛ تأكد من وجود المجلد. + +--- + +## الخطوة 1 – تكوين نموذج الـLLM (تنزيل نموذج Hugging Face) + +أول ما نفعله هو إخبار Aspose AI من أين يجلب النموذج. تتولى فئة `AsposeAIModelConfig` عملية التنزيل التلقائي، والكمّية، وتخصيص طبقات الـGPU. + +```python +import asposeocrcloud as ocr +from asposeocrcloud.ai import AsposeAI, AsposeAIModelConfig + +# ---------------------------------------------------------------------- +# Step 1: Model configuration – this will download the model if it’s missing +# ---------------------------------------------------------------------- +model_config = AsposeAIModelConfig( + allow_auto_download="true", # Enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", # Repo on Hugging Face + hugging_face_quantization="int8", # Small footprint, fast inference + gpu_layers=30, # 30 layers on GPU, rest on CPU + directory_model_path=r"YOUR_DIRECTORY" # Cache folder (optional) +) +``` + +**لماذا هذا مهم** – تحويل النموذج إلى `int8` يقلل استهلاك الذاكرة بشكل كبير (≈ 4 GB مقابل 12 GB). تقسيم النموذج بين الـGPU والـCPU يتيح لك تشغيل LLM بحدود 3 مليار معامل حتى على RTX 3060 متوسط القدرة. 
إذا لم يكن لديك GPU، اضبط `gpu_layers=0` وسيتولى الـSDK تشغيل كل شيء على الـCPU. + +> **نصيحة:** التشغيل الأول سيقوم بتنزيل حوالي 1.5 GB، لذا امنحه بضع دقائق واتصالًا مستقرًا. + +--- + +## الخطوة 2 – تهيئة محرك الذكاء الاصطناعي باستخدام تكوين النموذج + +الآن نقوم بتشغيل محرك Aspose AI ونمرره إلى التكوين الذي أنشأناه للتو. + +```python +# ---------------------------------------------------------------------- +# Step 2: Initialise the AI engine – pulls the model if needed +# ---------------------------------------------------------------------- +ocr_ai = AsposeAI() +ocr_ai.initialize(model_config) # This call blocks until the model is ready +``` + +**ما الذي يحدث خلف الكواليس؟** يتحقق الـSDK من `directory_model_path` للعثور على نموذج موجود. إذا وجد نسخة مطابقة، يقوم بتحميلها فورًا؛ وإلا سيقوم بتنزيل ملف GGUF من Hugging Face، فك ضغطه، وإعداد خط أنابيب الاستدلال. + +--- + +## الخطوة 3 – إنشاء محرك OCR وإرفاق معالج ما بعد‑الـAI + +يقوم محرك OCR بالعمل الشاق للتعرف على الأحرف. عبر إرفاق `ocr_ai.run_postprocessor` نُمكّن **تنظيف نص OCR** تلقائيًا بعد عملية التعرف. + +```python +# ---------------------------------------------------------------------- +# Step 3: Build the OCR engine and bind the LLM post‑processor +# ---------------------------------------------------------------------- +ocr_engine = ocr.OcrEngine() +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=None) +``` + +**لماذا نستخدم معالج ما بعد‑الـAI؟** غالبًا ما يحتوي OCR الخام على فواصل أسطر في أماكن غير صحيحة، علامات ترقيم خاطئة، أو رموز عشوائية. يستطيع الـLLM إعادة صياغة المخرجات إلى جمل صحيحة، تصحيح الأخطاء الإملائية، وحتى استنتاج الكلمات المفقودة—مما يحول النص الخام إلى نص منقح. + +--- + +## الخطوة 4 – تشغيل OCR على ملف صورة + +مع ربط جميع المكونات، حان الوقت لتغذية صورة إلى المحرك. 
+ +```python +# ---------------------------------------------------------------------- +# Step 4: Load the image and run OCR +# ---------------------------------------------------------------------- +input_image = ocr.Image.load(r"YOUR_DIRECTORY/handwritten_note.jpg") +raw_result = ocr_engine.recognize(input_image) # Returns an OcrResult object +``` + +**حالة خاصة:** إذا كانت الصورة كبيرة (> 5 MP)، قد ترغب في تصغير حجمها أولاً لتسريع المعالجة. يقبل الـSDK كائن Pillow `Image`، لذا يمكنك التحضير مسبقًا باستخدام `PIL.Image.thumbnail()` إذا لزم الأمر. + +--- + +## الخطوة 5 – السماح للـAI بتنظيف النص المُعترف به وعرض النسختين + +أخيرًا نستدعي معالج ما بعد‑الـAI الذي أرفقناه مسبقًا. تُظهر هذه الخطوة الفارق بين *قبل* و*بعد* التنظيف. + +```python +# ---------------------------------------------------------------------- +# Step 5: Clean the OCR output using the LLM and display both results +# ---------------------------------------------------------------------- +cleaned_result = ocr_engine.run_postprocessor(raw_result) + +print("=== Before AI ===") +print(raw_result.text) + +print("\n=== After AI ===") +print(cleaned_result.text) +``` + +### المخرجات المتوقعة + +``` +=== Before AI === +Th1s 1s a h@ndwr1tt3n n0te. It c0nta1ns m1st@k3s, l1n3 br3aks, & sp3c!@l ch@r@ct3rs. + +=== After AI === +This is a handwritten note. It contains mistakes, line breaks, and special characters. +``` + +لاحظ كيف قام الـLLM بـ: + +- تصحيح الأخطاء الشائعة في OCR (`Th1s` → `This`). +- إزالة الرموز العشوائية (`&` → `and`). +- تحويل فواصل الأسطر إلى جمل صحيحة. + +--- + +## 🎨 نظرة بصرية (سير عمل تشغيل OCR على الصورة) + +![Run OCR on image workflow](run_ocr_on_image_workflow.png "Diagram showing the run OCR on image pipeline from model download to cleaned output") + +تلخّص الصورة أعلاه الخطوات الكاملة: **تنزيل نموذج Hugging Face → تكوين الـLLM → تهيئة AI → محرك OCR → معالج ما بعد‑الـAI → تنظيف نص OCR**. 
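قبل الانتقال إلى الأسئلة الشائعة، إليك توضيحًا مبسّطًا لفكرة «تنظيف نص OCR» في أدنى صورها. المقتطف التالي ليس معالج ما بعد‑الـLLM الخاص بالـSDK، بل مجرد منظّف احتياطي يعتمد على التعابير النمطية من المكتبة القياسية (الدالة `basic_cleanup` اسم افتراضي للتوضيح فقط):

```python
import re

def basic_cleanup(text: str) -> str:
    """منظّف احتياطي بسيط — لا يغني عن معالج الـLLM لكنه يوضح الفكرة."""
    # وصل الكلمات المقسومة على سطرين بواصلة
    text = re.sub(r"-\n", "", text)
    # فاصل سطر داخل جملة (لا يسبقه علامة ترقيم ختامية) -> مسافة
    text = re.sub(r"(?<![.!?])\n", " ", text)
    # دمج المسافات المتكررة وإزالة الفراغات الطرفية
    return re.sub(r"[ \t]{2,}", " ", text).strip()

print(basic_cleanup("This is a\nhandwritten   note."))
# → This is a handwritten note.
```

لاحظ أن هذا البديل لا يصحح أخطاء التعرف مثل `Th1s` → `This` — وهنا بالضبط تظهر قيمة معالج ما بعد‑الـLLM الموضح أعلاه.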
+ +--- + +## أسئلة شائعة ونصائح احترافية + +### ماذا لو لم يكن لدي GPU؟ + +اضبط `gpu_layers=0` في `AsposeAIModelConfig`. سيعمل النموذج بالكامل على الـCPU، وهو أبطأ لكنه لا يزال فعالًا. يمكنك أيضًا الانتقال إلى نموذج أصغر (مثل `Qwen/Qwen2.5-1.5B‑Instruct‑GGUF`) لتقليل زمن الاستدلال. + +### كيف أغيّر النموذج لاحقًا؟ + +ما عليك سوى تحديث `hugging_face_repo_id` وإعادة تشغيل `ocr_ai.initialize(model_config)`. سيكتشف الـSDK تغيير الإصدار، ينزل النموذج الجديد، ويستبدل الملفات المخزنة مؤقتًا. + +### هل يمكن تخصيص موجه معالج ما بعد‑الـAI؟ + +نعم. مرّر قاموسًا إلى `custom_settings` يحتوي على مفتاح `prompt_template`. مثال: + +```python +custom_prompt = { + "prompt_template": "Correct the following OCR text and keep line breaks:\n{ocr_text}" +} +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=custom_prompt) +``` + +### هل يجب تخزين النص المنقح في ملف؟ + +بالتأكيد. بعد التنظيف يمكنك كتابة النتيجة إلى ملف `.txt` أو `.json` لمعالجة لاحقة: + +```python +with open("cleaned_note.txt", "w", encoding="utf-8") as f: + f.write(cleaned_result.text) +``` + +--- + +## الخلاصة + +لقد أظهرنا لك كيفية **تشغيل OCR على الصور** باستخدام Aspose OCR Cloud، مع **تنزيل نموذج Hugging Face** تلقائيًا، وتكوين إعدادات **نموذج الـLLM** بدقة، وأخيرًا **تنظيف نص OCR** باستخدام معالج ما بعد‑الـLLM قوي. العملية بأكملها تتضمن سكريبت Python واحد سهل التشغيل وتعمل على الأجهزة التي تدعم GPU أو على الـCPU فقط. + +إذا شعرت بالراحة مع هذا الخط الأنابيب، فكر في التجربة مع: + +- **نماذج LLM مختلفة** – جرّب `meta-llama/Meta-Llama-3-8B‑Instruct‑GGUF` للحصول على نافذة سياق أوسع. +- **معالجة دفعات** – كرّر العملية على مجلد من الصور واجمع النتائج المنقحة في ملف CSV. +- **موجهات مخصصة** – عدّل الـAI ليتناسب مع مجالك (وثائق قانونية، ملاحظات طبية، إلخ). + +لا تتردد في تعديل قيمة `gpu_layers`، أو استبدال النموذج، أو إدخال موجهك الخاص. السماء هي الحد، والكود الموجود الآن هو منصة الإطلاق. + +برمجة سعيدة، ولتكن مخرجات OCR دائمًا نظيفة! 
🚀 + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/chinese/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md b/ocr/chinese/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md new file mode 100644 index 000000000..b47d2433d --- /dev/null +++ b/ocr/chinese/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md @@ -0,0 +1,221 @@ +--- +category: general +date: 2026-03-28 +description: 如何使用 OCR 在图像中识别手写文字。学习提取手写文字、转换手写图像,并快速获得干净的结果。 +draft: false +keywords: +- how to use OCR +- recognize handwritten text +- extract handwritten text +- handwritten note to text +- convert handwritten image +language: zh +og_description: 如何使用 OCR 识别手写文字。本教程将一步步演示如何从图像中提取手写文字并获得精美的结果。 +og_title: 如何使用 OCR 识别手写文本 – 完整指南 +tags: +- OCR +- Handwriting Recognition +- Python +title: 如何使用 OCR 识别手写文本 – 完整指南 +url: /zh/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# 如何使用 OCR 识别手写文本 – 完整指南 + +如何使用 OCR 处理手写笔记是许多开发者在需要数字化草图、会议记录或快速记下的想法时常问的问题。在本指南中,我们将逐步演示识别手写文本、提取手写文本以及将手写图像转换为干净、可搜索的字符串的完整步骤。 + +如果你曾经盯着一张购物清单的照片,心想“我能把这张手写图像转换成文本,而不必重新输入所有内容吗?”——那么你来对地方了。阅读完本教程后,你将拥有一个可直接运行的脚本,能够在几秒钟内将 **handwritten note to text** 转换为文本。 + +## 你需要的准备 + +- Python 3.8+(代码在任何近期版本均可运行) +- `ocr` 库 – 使用 `pip install ocr-sdk` 安装(请替换为你的供应商的包名) +- 一张清晰的手写笔记图片(示例中的 `hand_note.png`) +- 一点好奇心和一杯咖啡 ☕️(可选,但推荐) + +无需庞大的框架,也不需要付费的云密钥——只需一个本地引擎,即可开箱即用 **handwritten recognition**。 + +## 第一步 – 安装 OCR 包并导入 + +首先,让我们在机器上安装正确的包。打开终端并运行: + +```bash +pip install ocr-sdk +``` + +安装完成后,在脚本中导入该模块: + +```python +# Step 1: Import the OCR SDK +import ocr +``` + +> **专业提示:** 
如果你使用虚拟环境,请在安装前激活它。这可以保持项目整洁,避免版本冲突。 + +## 第二步 – 创建 OCR 引擎并启用手写模式 + +现在我们真正开始 **how to use OCR**——我们需要一个能够识别手写笔画而非印刷字体的引擎实例。下面的代码片段创建了引擎并将其切换到手写模式: + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = ocr.OcrEngine() +ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN +``` + +为什么要设置 `recognition_mode`?因为大多数 OCR 引擎默认检测印刷文本,这往往会忽略个人笔记中的弧线和倾斜。启用手写模式可以显著提升准确率。 + +## 第三步 – 加载要转换的图像(Convert Handwritten Image) + +图像是任何 OCR 任务的原始材料。确保你的图片以无损格式保存(PNG 非常合适),且文字可辨认。然后按如下方式加载: + +```python +# Step 3: Load the handwritten image you want to convert +handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") +``` + +如果图像与脚本位于同一目录,只需使用 `"hand_note.png"` 而无需完整路径。 + +> **如果图像模糊怎么办?** 在将图像输入 OCR 引擎之前,尝试使用 OpenCV 进行预处理(例如,使用 `cv2.cvtColor` 转为灰度,使用 `cv2.threshold` 提高对比度)。 + +## 第四步 – 运行识别引擎以提取手写文本 + +引擎准备就绪且图像已加载到内存后,我们终于可以 **extract handwritten text**。`recognize` 方法返回一个原始结果对象,其中包含文本以及置信度分数。 + +```python +# Step 4: Perform OCR and get the raw result +raw_result = ocr_engine.recognize(handwritten_image) +print("Raw OCR output:") +print(raw_result.text) +``` + +典型的原始输出可能包含多余的换行或误识别的字符,尤其是在手写字迹凌乱时。这也是下一步存在的原因。 + +## 第五步 – (可选)使用 AI 后处理器优化输出 + +大多数现代 OCR SDK 都附带轻量级的 AI 后处理器,可清理空格、修正常见 OCR 错误并规范换行。运行它非常简单: + +```python +# Step 5: Refine the raw OCR output (handwritten note to text) +polished_result = ocr_engine.run_postprocessor(raw_result) + +# Display the cleaned, readable text +print("\nPolished OCR output:") +print(polished_result.text) +``` + +如果跳过此步骤,你仍然会得到可用的文本,但 **handwritten note to text** 转换的效果会稍显粗糙。后处理器对包含项目符号或混合大小写单词的笔记尤其有用。 + +## 第六步 – 验证结果并处理边缘情况 + +打印出优化后的结果后,请再次确认一切是否正确。下面是一个可以添加的快速检查示例: + +```python +# Step 6: Simple verification +if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") +else: + print("\n✅ OCR succeeded! 
You can now save or further process the text.") +``` + +**边缘情况检查表** + +| 情况 | 处理方法 | +|-----------|------------| +| **对比度极低** | 在加载前使用 `cv2.convertScaleAbs` 提高对比度。 | +| **多语言** | 设置 `ocr_engine.language = ["en", "es"]`(或你的目标语言)。 | +| **大文档** | 分批处理页面以避免内存激增。 | +| **特殊符号** | 通过 `ocr_engine.add_custom_words([...])` 添加自定义词典。 | + +## 可视化概览 + +下面是一张占位图,展示了工作流——从拍摄的笔记到干净文本。alt 文本包含主要关键词,使图像更友好于 SEO。 + +![how to use OCR on a handwritten note image](/images/handwritten_ocr_flow.png "how to use OCR on a handwritten note image") + +## 完整可运行脚本 + +将所有部分组合在一起,以下是完整的、可直接复制粘贴的程序: + +```python +# Complete script: Convert a handwritten image to clean text using OCR + +import ocr + +def main(): + # 1️⃣ Initialize OCR engine for handwritten recognition + ocr_engine = ocr.OcrEngine() + ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN + + # 2️⃣ Load the image containing the handwritten note + handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") + + # 3️⃣ Perform OCR to extract raw text + raw_result = ocr_engine.recognize(handwritten_image) + print("Raw OCR output:") + print(raw_result.text) + + # 4️⃣ (Optional) Run AI post‑processor for cleaner output + polished_result = ocr_engine.run_postprocessor(raw_result) + + # 5️⃣ Show the polished, readable text + print("\nPolished OCR output:") + print(polished_result.text) + + # 6️⃣ Simple sanity check + if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") + else: + print("\n✅ OCR succeeded! You can now save or further process the text.") + +if __name__ == "__main__": + main() +``` + +**预期输出(示例)** + +``` +Raw OCR output: +T0d@y I w3nt to the market +and bought 5 aplpes, 2 bananas, +and a loaf of bread. + +Polished OCR output: +Today I went to the market and bought 5 apples, 2 bananas, and a loaf of bread. 
+``` + +请注意,后处理器修正了 “T0d@y” 的拼写错误并规范了间距。 + +## 常见陷阱与专业提示 + +- **图像尺寸重要** – OCR 引擎通常将输入尺寸限制在 4 K × 4 K。请提前缩放大照片。 +- **手写风格** – 连写体与印刷体会影响准确率。如果你能控制来源(例如使用数位笔),建议使用印刷体以获得最佳效果。 +- **批量处理** – 处理数十张笔记时,可将脚本放入循环,并将每个结果存入 CSV 或 SQLite 数据库。 +- **内存泄漏** – 某些 SDK 会保留内部缓冲区;如果发现速度变慢,请在完成后调用 `ocr_engine.dispose()`。 + +## 下一步 – 超越基础 OCR + +现在你已经掌握了单张图像的 **how to use OCR**,可以考虑以下扩展: + +1. **集成云存储** – 从 AWS S3 或 Azure Blob 拉取图像,运行相同的流水线,然后将结果推回。 +2. **添加语言检测** – 使用 `ocr_engine.detect_language()` 自动切换词典。 +3. **结合 NLP** – 将清理后的文本输入 spaCy 或 NLTK,以提取实体、日期或行动项。 +4. **创建 REST 接口** – 将脚本封装在 Flask 或 FastAPI 中,使其他服务能够 POST 图像并接收 JSON 编码的文本。 + +所有这些思路仍围绕 **recognize handwritten text**、**extract handwritten text** 和 **convert handwritten image** 这几个核心概念——这些正是你接下来可能搜索的确切短语。 + +--- + +### TL;DR + +我们向你展示了 **how to use OCR** 来识别手写文本、提取文本并将结果打磨成可用的字符串。完整脚本已准备好运行,工作流已逐步解释,并提供了常见边缘情况的检查清单。拍一张下次会议记录的照片,放入脚本,让机器替你完成输入。 + +祝编码愉快,愿你的笔记永远清晰可读! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/chinese/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md b/ocr/chinese/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md new file mode 100644 index 000000000..018fa61f4 --- /dev/null +++ b/ocr/chinese/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md @@ -0,0 +1,183 @@ +--- +category: general +date: 2026-03-28 +description: 对图像执行 OCR 并获取带有边界框坐标的干净文本。学习如何提取 OCR、清理 OCR,并一步步显示结果。 +draft: false +keywords: +- perform OCR on image +- how to extract OCR +- how to clean OCR +- display bounding box coordinates +- OCR post‑processing +- OCR bounding boxes +language: zh +og_description: 对图像进行 OCR 识别,清理输出,并在简明教程中显示边界框坐标。 +og_title: 对图像进行 OCR – 干净的结果和边界框 +tags: +- OCR +- Computer Vision +- Python +title: 对图像进行 OCR – 清理结果并显示边界框坐标 +url: 
/zh/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# 对图像执行 OCR – 清理结果并显示边界框坐标 + +是否曾需要 **对图像执行 OCR**,却总是得到杂乱的文本并且不确定每个单词在图片中的位置?你并不孤单。在许多项目中——发票数字化、收据扫描或简单的文本提取——获取原始 OCR 输出只是第一道障碍。好消息是?你可以清理这些输出,并立即看到每个区域的边界框坐标,而无需编写大量样板代码。 + +在本指南中,我们将逐步演示 **如何提取 OCR**,运行 **如何清理 OCR** 的后处理器,最后 **显示每个清理后区域的边界框坐标**。完成后,你将拥有一个可直接运行的脚本,将模糊的照片转换为整洁、结构化的文本,准备好进行后续处理。 + +## 你需要准备的内容 + +- Python 3.9+(下面的语法在 3.8 及以上版本均可运行) +- 支持 `recognize(..., return_structured=True)` 的 OCR 引擎——例如本文代码片段中使用的虚构 `engine` 库。请将其替换为 Tesseract、EasyOCR 或任何返回区域数据的 SDK。 +- 对 Python 函数和循环有基本了解 +- 一张你想要扫描的图像文件(PNG、JPG 等) + +> **专业提示:** 如果使用 Tesseract,`pytesseract.image_to_data` 已经提供了边界框。你可以将其结果包装成一个小适配器,以模拟下文展示的 `engine.recognize` API。 + +--- + +![对图像执行 OCR 示例](image-placeholder.png "对图像执行 OCR 示例") + +*Alt text: 展示如何对图像执行 OCR 并可视化边界框坐标的示意图* + +## 第一步 – 对图像执行 OCR 并获取结构化区域 + +首先,让 OCR 引擎返回的不仅是纯文本,而是一个结构化的文本区域列表。该列表包含原始字符串以及包围它的矩形。 + +```python +import engine # replace with your actual OCR library +from pathlib import Path + +# Load the image you want to process +image_path = Path("sample_invoice.jpg") +image = engine.load_image(image_path) + +# Step 1: Perform OCR and request a structured list of text regions +raw_result = engine.recognize(image, return_structured=True) +``` + +**为什么这很重要:** +仅请求纯文本会丢失空间位置信息。结构化数据让你后续能够 **显示边界框坐标**、将文本与表格对齐,或将精确位置提供给下游模型。 + +## 第二步 – 使用后处理器清理 OCR 输出 + +OCR 引擎擅长识别字符,但常会留下多余空格、换行符或误识别的符号。后处理器会对文本进行规范化,修正常见 OCR 错误,并去除多余空白。 + +```python +# Step 2: Clean the plain‑text of each region using the post‑processor +processed_result = engine.run_postprocessor(raw_result) +``` + +如果你自行实现清理器,可以考虑: + +- 删除非 ASCII 字符(`re.sub(r'[^\x00-\x7F]+',' ', text)`) +- 将多个空格合并为单个空格 +- 使用 `pyspellchecker` 等拼写检查器纠正常见拼写错误 + +**为什么你应该在意:** +整洁的字符串使搜索、索引以及后续的 NLP 流程更加可靠。换句话说,**如何清理 OCR** 往往决定了数据集是可用还是让人头疼。 + +## 第三步 – 为每个清理后的区域显示边界框坐标 + 
+文本整理完毕后,我们遍历每个区域,打印其矩形以及清理后的字符串。这一步就是最终 **显示边界框坐标** 的地方。 + +```python +# Step 3 – Iterate over the cleaned regions and display their bounding box and text +for text_region in processed_result.regions: + # Each region has a .bounding_box attribute (x, y, width, height) + bbox = text_region.bounding_box + print(f"[{bbox}] {text_region.text}") +``` + +**示例输出** + +``` +[(34, 120, 210, 30)] Invoice #12345 +[(34, 160, 420, 28)] Date: 2026‑03‑01 +[(34, 200, 380, 28)] Total Amount: $1,254.00 +``` + +现在,你可以将这些坐标传入绘图库(例如 OpenCV)在原图上绘制框,或将其存入数据库以供后续查询。 + +## 完整、可直接运行的脚本 + +下面是把上述三步整合在一起的完整程序。请将占位的 `engine` 调用替换为你实际使用的 OCR SDK。 + +```python +#!/usr/bin/env python3 +""" +Perform OCR on image → clean results → display bounding box coordinates. +Author: Your Name +Date: 2026‑03‑28 +""" + +import engine # <-- replace with your OCR library +from pathlib import Path +import sys + +def main(image_path: str): + # Load image + image = engine.load_image(Path(image_path)) + + # 1️⃣ Perform OCR and ask for structured output + raw_result = engine.recognize(image, return_structured=True) + + # 2️⃣ Clean the raw text using the built‑in post‑processor + processed_result = engine.run_postprocessor(raw_result) + + # 3️⃣ Show each region's bounding box and cleaned text + print("\n=== Cleaned OCR Regions ===") + for region in processed_result.regions: + bbox = region.bounding_box # (x, y, w, h) + print(f"[{bbox}] {region.text}") + +if __name__ == "__main__": + if len(sys.argv) != 2: + print("Usage: python perform_ocr.py ") + sys.exit(1) + main(sys.argv[1]) +``` + +### 如何运行 + +```bash +python perform_ocr.py sample_invoice.jpg +``` + +你应该会看到一系列边界框与清理后文本的对应列表,正如上面的示例输出所示。 + +## 常见问题与边缘情况 + +| Question | Answer | +|----------|--------| +| **What if the OCR engine doesn’t support `return_structured`?** | Write a thin wrapper that converts the engine’s raw output (usually a list of words with coordinates) into objects with `text` and `bounding_box` attributes. 
| +| **Can I get confidence scores?** | Many SDKs expose a confidence metric per region. Append it to the print statement: `print(f"[{bbox}] {region.text} (conf: {region.confidence:.2f})")`. | +| **How to handle rotated text?** | Pre‑process the image with OpenCV’s `cv2.minAreaRect` to deskew before calling `recognize`. | +| **What if I need the output in JSON?** | Serialize `processed_result.regions` with `json.dumps([r.__dict__ for r in processed_result.regions], indent=2)`. | +| **Is there a way to visualize the boxes?** | Use OpenCV: `cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0), 2)` inside the loop, then `cv2.imwrite("annotated.jpg", img)`. | + +## 小结 + +你刚刚学习了 **如何对图像执行 OCR**、清理原始输出,并 **显示每个区域的边界框坐标**。这套三步流程——识别 → 后处理 → 遍历——是一个可复用的模式,能够轻松嵌入任何需要可靠文本提取的 Python 项目。 + +### 接下来可以做什么? + +- **探索不同的 OCR 后端**(Tesseract、EasyOCR、Google Vision),比较准确率。 +- **与数据库集成**,将区域数据存储用于可搜索的档案。 +- **添加语言检测**,为每个区域路由到合适的拼写检查器。 +- **在原图上叠加框** 进行可视化验证(参见上面的 OpenCV 代码片段)。 + +如果遇到奇怪的情况,请记住,最大的收益来自稳固的后处理步骤;干净的字符串远比原始字符堆更易于后续操作。 + +祝编码愉快,愿你的 OCR 流程始终保持整洁! 
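最后补充一个可运行的小示意,对应上文 FAQ 中"输出 JSON"的做法。这里的 `Region` 数据类只是为演示而假设的结构,请替换为你所用 SDK 实际返回的区域对象:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Region:
    # 假设的区域结构:识别文本 + (x, y, w, h) 边界框
    text: str
    bounding_box: tuple

regions = [
    Region("Invoice #12345", (34, 120, 210, 30)),
    Region("Total Amount: $1,254.00", (34, 200, 380, 28)),
]

# 将区域列表序列化为 JSON,便于存档或供下游服务查询
payload = json.dumps([asdict(r) for r in regions], ensure_ascii=False, indent=2)
print(payload)
```

`asdict` 会把数据类转换为普通字典,元组形式的边界框在 JSON 中会以数组呈现,方便任何语言的下游程序解析。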
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/chinese/python/general/python-ocr-tutorial-extract-text-from-images/_index.md b/ocr/chinese/python/general/python-ocr-tutorial-extract-text-from-images/_index.md new file mode 100644 index 000000000..367600aeb --- /dev/null +++ b/ocr/chinese/python/general/python-ocr-tutorial-extract-text-from-images/_index.md @@ -0,0 +1,229 @@ +--- +category: general +date: 2026-03-28 +description: Python OCR 教程,展示如何使用 Aspose OCR Cloud 在 Python 中提取图像文字。学习如何加载图像进行 OCR,并在几分钟内将图像转换为纯文本。 +draft: false +keywords: +- python ocr tutorial +- extract text image python +- ocr image to text +- load image for ocr +- convert image plain text +language: zh +og_description: Python OCR 教程解释如何加载图像进行 OCR 并使用 Aspose OCR Cloud 将图像转换为纯文本。获取完整代码和技巧。 +og_title: Python OCR 教程 – 从图像中提取文本 +tags: +- OCR +- Python +- Image Processing +title: Python OCR 教程——从图像中提取文本 +url: /zh/python/general/python-ocr-tutorial-extract-text-from-images/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Python OCR 教程 – 从图像中提取文本 + +有没有想过如何把一张凌乱的收据照片转换成干净、可搜索的文本?你并不是唯一有这种想法的人。根据我的经验,最大的障碍并不是 OCR 引擎本身,而是将图像转换为正确的格式并顺利提取纯文本。 + +本 **python ocr tutorial** 将逐步指导你完成每一步——加载用于 OCR 的图像、运行识别,最后将图像的纯文本转换为可以存储或分析的 Python 字符串。完成后,你就能以 **extract text image python** 的方式提取文本,而且无需任何付费许可证即可开始。 + +## 你将学到的内容 + +- 如何安装并导入 Aspose OCR Cloud SDK for Python。 +- 获取用于 **load image for OCR** 的确切代码(PNG、JPEG、TIFF、PDF 等)。 +- 如何调用引擎执行 **ocr image to text** 转换。 +- 处理常见边缘情况的技巧,例如多页 PDF 或低分辨率扫描。 +- 验证输出的方法以及当文本出现乱码时的处理方式。 + +### 前提条件 + +- 在机器上已安装 Python 3.8+。 +- 拥有免费 Aspose Cloud 账户(试用版无需许可证即可使用)。 +- 对 pip 和虚拟环境有基本了解——无需高级技巧。 + +> **专业提示:** 如果你已经在使用 virtualenv,请立即激活它。这可以让依赖保持整洁,避免版本冲突。 + +![Python OCR 
教程截图,显示识别的文本](path/to/ocr_example.png "Python OCR 教程 – 提取的纯文本显示") + +## 第一步 – 安装 Aspose OCR Cloud SDK + +首先,我们需要与 Aspose OCR 服务通信的库。打开终端并运行: + +```bash +pip install asposeocrcloud +``` + +该单行命令会拉取最新的 SDK(当前版本为 23.12)。该包已包含所有必需内容——无需额外的图像处理库。 + +## 第二步 – 初始化 OCR 引擎(关键字示例) + +SDK 准备好后,我们即可启动 **python ocr tutorial** 引擎。构造函数在试用期间不需要任何许可证密钥,这让过程更简单。 + +```python +import asposeocrcloud as ocr + +# Initialise the OCR engine – no licence needed for trial use +ocr_engine = ocr.OcrEngine() +``` + +> **为什么重要:** 只初始化一次引擎可以保持后续调用的快速。如果为每张图像都重新创建对象,会浪费网络往返。 + +## 第三步 – 加载用于 OCR 的图像 + +这正是 **load image for OCR** 关键字发挥作用的地方。SDK 的 `Image.load` 方法接受文件路径或 URL,并会自动检测格式(PNG、JPEG、TIFF、PDF 等)。我们来加载一个示例收据: + +```python +# Step 3: Load the input image (PNG, JPEG, TIFF, PDF …) +input_image = ocr.Image.load(r"YOUR_DIRECTORY/receipt.png") +``` + +如果处理的是多页 PDF,只需指向该 PDF 文件;SDK 会在内部将每页视为单独的图像。 + +## 第四步 – 执行 OCR 图像转文本转换 + +图像已加载到内存后,实际的 OCR 只需一行代码即可完成。`recognize` 方法返回一个 `OcrResult` 对象,其中包含纯文本、置信度分数,甚至在需要时的边界框。 + +```python +# Step 4: Perform OCR on the loaded image +ocr_result = ocr_engine.recognize(input_image) +``` + +> **边缘情况:** 对于低分辨率图片(低于 300 dpi),可能需要先放大图像。SDK 提供了 `Resize` 辅助工具,但对大多数收据而言默认设置已足够。 + +## 第五步 – 将图像纯文本转换为可用字符串 + +拼图的最后一块是从结果对象中提取纯文本。这一步即 **convert image plain text**,将 OCR 数据块转换为可打印、存储或输送到其他系统的内容。 + +```python +# Step 5: Output the recognised plain text +print("Plain OCR:") +print(ocr_result.text) +``` + +运行脚本后,你应该会看到类似如下的输出: + +``` +Plain OCR: +Starbucks Coffee +Date: 2026/03/27 +Total: $4.75 +Thank you! +``` + +该输出现在是普通的 Python 字符串,可用于 CSV 导出、数据库插入或自然语言处理。 + +## 处理常见陷阱 + +### 1. 空白或噪声图像 + +如果 `ocr_result.text` 返回为空,请再次检查图像质量。一个快速的解决办法是添加预处理步骤: + +```python +# Simple preprocessing – convert to grayscale and increase contrast +processed = input_image.to_grayscale().adjust_contrast(1.2) +ocr_result = ocr_engine.recognize(processed) +``` + +### 2. 
多页 PDF + +当你提供 PDF 时,`recognize` 会返回每页的结果。可以这样遍历: + +```python +pdf_image = ocr.Image.load("document.pdf") +pages = pdf_image.pages # collection of page images + +for i, page in enumerate(pages, start=1): + result = ocr_engine.recognize(page) + print(f"--- Page {i} ---") + print(result.text) +``` + +### 3. 语言支持 + +Aspose OCR 支持超过 60 种语言。要切换语言,请在调用 `recognize` 前设置 `language` 属性: + +```python +ocr_engine.language = "fr" # French +ocr_result = ocr_engine.recognize(input_image) +``` + +## 完整工作示例 + +将所有步骤整合在一起,下面是一个完整的、可直接复制粘贴的脚本,涵盖从安装到边缘情况处理的全部内容: + +```python +# -*- coding: utf-8 -*- +""" +Python OCR tutorial – complete example using Aspose OCR Cloud. +Demonstrates loading an image, performing OCR, and handling multi‑page PDFs. +""" + +import asposeocrcloud as ocr +import os + +def ocr_file(filepath: str, language: str = "en"): + """ + Perform OCR on a given file (image or PDF) and return plain text. + """ + # Initialise engine (trial licence) + engine = ocr.OcrEngine() + engine.language = language + + # Load the file – SDK auto‑detects format + image = ocr.Image.load(filepath) + + # If it's a PDF, iterate over pages + if image.is_pdf: + all_text = [] + for page in image.pages: + result = engine.recognize(page) + all_text.append(result.text) + return "\n".join(all_text) + + # Single‑image case + result = engine.recognize(image) + return result.text + + +if __name__ == "__main__": + # Example usage – replace with your own path + sample_path = os.path.join("YOUR_DIRECTORY", "receipt.png") + + if not os.path.exists(sample_path): + raise FileNotFoundError(f"File not found: {sample_path}") + + extracted = ocr_file(sample_path) + print("=== Extracted Text ===") + print(extracted) +``` + +运行脚本(`python ocr_demo.py`),你将在控制台看到 **ocr image to text** 的输出。 + +## 回顾 – 我们覆盖的内容 + +- 已安装 **Aspose OCR Cloud** SDK(`pip install asposeocrcloud`)。 +- **初始化 OCR 引擎**,无需许可证(非常适合试用)。 +- 演示了如何 **load image for OCR**,无论是 PNG、JPEG 还是 PDF。 +- 执行了 **ocr image to text** 转换并 **convert image plain 
text** 为可用的 Python 字符串。 +- 解决了常见问题,如低分辨率扫描、多页 PDF 和语言选择。 + +## 后续步骤与相关主题 + +既然你已经掌握了 **python ocr tutorial**,可以进一步探索: + +- **Extract text image python** 用于批量处理大量收据文件夹。 +- 将 OCR 输出与 **pandas** 集成进行数据分析(`df = pd.read_csv(StringIO(extracted))`)。 +- 在网络连接受限时使用 **Tesseract OCR** 作为后备方案。 +- 使用 **spaCy** 添加后处理,以识别日期、金额和商户名称等实体。 + +随意尝试:更换不同的图像格式、调整对比度或切换语言。OCR 领域广阔,你刚学到的技能是任何文档自动化项目的坚实基础。 + +祝编码愉快,愿你的文本始终可读! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/chinese/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md b/ocr/chinese/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md new file mode 100644 index 000000000..c55eebd6a --- /dev/null +++ b/ocr/chinese/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md @@ -0,0 +1,218 @@ +--- +category: general +date: 2026-03-28 +description: 学习如何在图像上运行 OCR,自动下载 Hugging Face 模型,清理 OCR 文本,并使用 Aspose OCR Cloud 在 + Python 中配置 LLM 模型。 +draft: false +keywords: +- run OCR on image +- download hugging face model +- clean OCR text +- configure LLM model +language: zh +og_description: 对图像进行 OCR 并使用自动下载的 Hugging Face 模型清理输出。本指南展示了如何在 Python 中配置 LLM 模型。 +og_title: 在图像上运行 OCR – 完整的 Aspose OCR 云教程 +tags: +- OCR +- Python +- LLM +- HuggingFace +title: 使用 Aspose OCR Cloud 对图像进行 OCR – 完整分步指南 +url: /zh/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# 在图像上运行 OCR – 完整 Aspose OCR Cloud 教程 + +是否曾需要对图像文件进行 OCR,但原始输出却像一团乱麻?在我的经验中,最大痛点并不是识别本身,而是后期的清理。幸运的是,Aspose OCR Cloud 允许你附加一个 LLM 后处理器,能够自动 *清理 OCR 文本*。在本教程中,我们将逐步演示所有必需的步骤:从 **下载 Hugging Face 模型** 到配置 LLM、运行 OCR 引擎,最后对结果进行润色。 + 
+完成本指南后,你将拥有一个可直接运行的脚本,能够: + +1. 从 Hugging Face 拉取一个紧凑的 Qwen 2.5 模型(自动下载)。 +2. 配置模型,使网络的一部分在 GPU 上运行,剩余部分在 CPU 上运行。 +3. 对手写笔记图像执行 OCR 引擎。 +4. 使用 LLM 清理识别出的文本,得到可读的输出。 + +> **先决条件** – Python 3.8+、`asposeocrcloud` 包、至少 4 GB 显存的 GPU(可选但推荐),以及首次下载模型时的网络连接。 + +--- + +## 你需要的内容 + +- **Aspose OCR Cloud SDK** – 通过 `pip install asposeocrcloud` 安装。 +- **示例图像** – 例如 `handwritten_note.jpg`,放置在本地文件夹中。 +- **GPU 支持** – 如果拥有支持 CUDA 的 GPU,脚本会将 30 层卸载到 GPU;否则会自动回退到 CPU。 +- **写入权限** – 脚本会将模型缓存到 `YOUR_DIRECTORY`,请确保该文件夹已存在。 + +--- + +## 第一步 – 配置 LLM 模型(下载 Hugging Face 模型) + +首先我们需要告诉 Aspose AI 从哪里获取模型。`AsposeAIModelConfig` 类负责自动下载、量化以及 GPU 层分配。 + +```python +import asposeocrcloud as ocr +from asposeocrcloud.ai import AsposeAI, AsposeAIModelConfig + +# ---------------------------------------------------------------------- +# Step 1: Model configuration – this will download the model if it’s missing +# ---------------------------------------------------------------------- +model_config = AsposeAIModelConfig( + allow_auto_download="true", # Enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", # Repo on Hugging Face + hugging_face_quantization="int8", # Small footprint, fast inference + gpu_layers=30, # 30 layers on GPU, rest on CPU + directory_model_path=r"YOUR_DIRECTORY" # Cache folder (optional) +) +``` + +**为什么这很重要** – 将模型量化为 `int8` 能显著降低内存占用(≈ 4 GB 对比 12 GB)。在 GPU 与 CPU 之间拆分模型后,即使是 30 亿参数的 LLM 也能在普通 RTX 3060 上运行。如果没有 GPU,设置 `gpu_layers=0`,SDK 将全部在 CPU 上执行。 + +> **提示**:首次运行时会下载约 1.5 GB,请预留几分钟并保持网络稳定。 + +--- + +## 第二步 – 使用模型配置初始化 AI 引擎 + +现在我们启动 Aspose AI 引擎,并将刚才创建的配置传入。 + +```python +# ---------------------------------------------------------------------- +# Step 2: Initialise the AI engine – pulls the model if needed +# ---------------------------------------------------------------------- +ocr_ai = AsposeAI() +ocr_ai.initialize(model_config) # This call blocks until the model is ready +``` + +**内部发生了什么?** SDK 会检查 `directory_model_path` 
中是否已有模型。如果找到匹配的版本,则立即加载;否则会从 Hugging Face 下载 GGUF 文件,解压并准备推理管线。 + +--- + +## 第三步 – 创建 OCR 引擎并附加 AI 后处理器 + +OCR 引擎负责字符识别的核心工作。通过附加 `ocr_ai.run_postprocessor`,我们可以在识别后自动实现 **清理 OCR 文本**。 + +```python +# ---------------------------------------------------------------------- +# Step 3: Build the OCR engine and bind the LLM post‑processor +# ---------------------------------------------------------------------- +ocr_engine = ocr.OcrEngine() +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=None) +``` + +**为什么要使用后处理器?** 原始 OCR 常常出现换行位置错误、标点误识别或杂散符号。LLM 能将输出改写为完整句子,纠正拼写,甚至推断缺失的词语——本质上把原始的乱七八糟转换为润色后的文本。 + +--- + +## 第四步 – 对图像文件运行 OCR + +所有组件已就绪,现在可以将图像输入引擎进行处理。 + +```python +# ---------------------------------------------------------------------- +# Step 4: Load the image and run OCR +# ---------------------------------------------------------------------- +input_image = ocr.Image.load(r"YOUR_DIRECTORY/handwritten_note.jpg") +raw_result = ocr_engine.recognize(input_image) # Returns an OcrResult object +``` + +**边缘情况**:如果图像较大(> 5 MP),建议先缩放以加快处理速度。SDK 接受 Pillow 的 `Image` 对象,你可以使用 `PIL.Image.thumbnail()` 进行预处理。 + +--- + +## 第五步 – 让 AI 清理识别文本并展示前后对比 + +最后调用之前附加的后处理器。此步骤展示了 *清理前* 与 *清理后* 的对比效果。 + +```python +# ---------------------------------------------------------------------- +# Step 5: Clean the OCR output using the LLM and display both results +# ---------------------------------------------------------------------- +cleaned_result = ocr_engine.run_postprocessor(raw_result) + +print("=== Before AI ===") +print(raw_result.text) + +print("\n=== After AI ===") +print(cleaned_result.text) +``` + +### 预期输出 + +``` +=== Before AI === +Th1s 1s a h@ndwr1tt3n n0te. It c0nta1ns m1st@k3s, l1n3 br3aks, & sp3c!@l ch@r@ct3rs. + +=== After AI === +This is a handwritten note. It contains mistakes, line breaks, and special characters. 
+``` + +请注意 LLM 已经: + +- 修正常见的 OCR 误识别(`Th1s` → `This`)。 +- 删除杂散符号(`&` → `and`)。 +- 将换行规范为完整句子。 + +--- + +## 🎨 可视化概览(在图像上运行 OCR 工作流) + +![Run OCR on image workflow](run_ocr_on_image_workflow.png "Diagram showing the run OCR on image pipeline from model download to cleaned output") + +上图概括了完整流程:**下载 Hugging Face 模型 → 配置 LLM → 初始化 AI → OCR 引擎 → AI 后处理器 → 清理 OCR 文本**。 + +--- + +## 常见问题 & 专业技巧 + +### 如果没有 GPU 怎么办? + +在 `AsposeAIModelConfig` 中将 `gpu_layers=0`。模型将完全在 CPU 上运行,速度会慢一些,但仍可使用。你也可以切换到更小的模型(例如 `Qwen/Qwen2.5-1.5B‑Instruct‑GGUF`),以保持推理时间在合理范围。 + +### 如何以后更换模型? + +只需更新 `hugging_face_repo_id` 并重新运行 `ocr_ai.initialize(model_config)`。SDK 会检测版本变化,下载新模型并替换缓存文件。 + +### 能自定义后处理器的提示词吗? + +可以。向 `custom_settings` 传入包含 `prompt_template` 键的字典。例如: + +```python +custom_prompt = { + "prompt_template": "Correct the following OCR text and keep line breaks:\n{ocr_text}" +} +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=custom_prompt) +``` + +### 是否应该把清理后的文本保存到文件? + +当然。清理完成后,你可以将结果写入 `.txt` 或 `.json` 文件,以供后续处理: + +```python +with open("cleaned_note.txt", "w", encoding="utf-8") as f: + f.write(cleaned_result.text) +``` + +--- + +## 结论 + +我们已经演示了如何使用 Aspose OCR Cloud **在图像上运行 OCR**,自动 **下载 Hugging Face 模型**,专业 **配置 LLM 模型** 参数,并最终通过强大的 LLM 后处理器 **清理 OCR 文本**。整个过程可以封装在一个易于运行的 Python 脚本中,兼容 GPU 加速和纯 CPU 环境。 + +如果你对该流水线已经熟悉,可以进一步尝试: + +- **不同的 LLM** – 试试 `meta-llama/Meta-Llama-3-8B‑Instruct‑GGUF`,获取更大的上下文窗口。 +- **批量处理** – 循环遍历文件夹中的图像,并将清理后的结果汇总到 CSV。 +- **自定义提示词** – 为你的领域(法律文档、医疗笔记等)定制 AI 提示。 + +随意调整 `gpu_layers` 参数、替换模型或使用自己的提示词。天地无限,而你手中的代码正是起飞的助推器。 + +祝编码愉快,愿你的 OCR 输出永远干净! 
🚀 + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/czech/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md b/ocr/czech/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md new file mode 100644 index 000000000..cde3a3b3a --- /dev/null +++ b/ocr/czech/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md @@ -0,0 +1,224 @@ +--- +category: general +date: 2026-03-28 +description: Jak používat OCR k rozpoznání ručně psaného textu na obrázcích. Naučte + se extrahovat ručně psaný text, převést ručně psaný obrázek a získat čisté výsledky + rychle. +draft: false +keywords: +- how to use OCR +- recognize handwritten text +- extract handwritten text +- handwritten note to text +- convert handwritten image +language: cs +og_description: Jak použít OCR k rozpoznání rukopisu. Tento tutoriál vám krok za krokem + ukáže, jak z obrázků extrahovat ručně psaný text a získat vylepšené výsledky. +og_title: Jak použít OCR k rozpoznání ručně psaného textu – kompletní průvodce +tags: +- OCR +- Handwriting Recognition +- Python +title: Jak použít OCR k rozpoznání ručně psaného textu – kompletní průvodce +url: /cs/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Jak použít OCR k rozpoznání rukopisu – Kompletní průvodce + +Jak použít OCR pro ručně psané poznámky, je otázka, kterou si klade mnoho vývojářů, když potřebují digitalizovat skici, zápisy ze schůzek nebo rychlé nápady. V tomto průvodci projdeme přesné kroky k rozpoznání rukopisu, extrakci rukopisného textu a převodu obrázku s rukopisem na čisté, prohledávatelné řetězce. 
+ +Pokud jste někdy zírali na fotografii nákupního seznamu a přemýšleli: „Mohu převést tento ručně psaný obrázek na text, aniž bych vše znovu přepisoval?“ – jste na správném místě. Na konci budete mít připravený skript, který během několika sekund změní **ručně psanou poznámku na text**. + +## Co budete potřebovat + +- Python 3.8+ (kód funguje s jakoukoliv novější verzí) +- Knihovna `ocr` – nainstalujte ji pomocí `pip install ocr-sdk` (nahraďte názvem balíčku vašeho poskytovatele) +- Jasná fotografie ručně psané poznámky (`hand_note.png` v příkladu) +- Trocha zvědavosti a káva ☕️ (volitelné, ale doporučené) + +Žádné těžkopádné frameworky, žádné placené cloudové klíče – jen lokální engine, který podporuje **rozpoznání rukopisu** přímo z krabice. + +## Krok 1 – Nainstalujte OCR balíček a importujte jej + +Nejprve si na stroj nainstalujeme správný balíček. Otevřete terminál a spusťte: + +```bash +pip install ocr-sdk +``` + +Po dokončení instalace importujte modul ve svém skriptu: + +```python +# Step 1: Import the OCR SDK +import ocr +``` + +> **Tip:** Pokud používáte virtuální prostředí, aktivujte jej před instalací. Pomůže vám udržet projekt přehledný a vyhnout se konfliktům verzí. + +## Krok 2 – Vytvořte OCR engine a zapněte režim rukopisu + +Nyní skutečně **jak použít OCR** – potřebujeme instanci engine, která ví, že pracujeme s kurzívními tahy místo tištěných fontů. Následující úryvek vytvoří engine a přepne jej do režimu rukopisu: + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = ocr.OcrEngine() +ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN +``` + +Proč nastavit `recognition_mode`? Protože většina OCR engineů ve výchozím nastavení detekuje tištěný text, který často přehlíží smyčky a skloněné tahy osobní poznámky. Zapnutí režimu rukopisu dramaticky zvyšuje přesnost. + +## Krok 3 – Načtěte obrázek, který chcete převést (Convert Handwritten Image) + +Obrázky jsou surovým materiálem pro jakýkoli OCR úkol. 
Ujistěte se, že je vaše fotografie uložena v bezztrátovém formátu (PNG funguje skvěle) a že je text poměrně čitelný. Pak jej načtěte takto: + +```python +# Step 3: Load the handwritten image you want to convert +handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") +``` + +Pokud obrázek leží vedle vašeho skriptu, můžete jednoduše použít `"hand_note.png"` místo úplné cesty. + +> **Co když je obrázek rozmazaný?** Zkuste předzpracování pomocí OpenCV (např. `cv2.cvtColor` na odstíny šedi, `cv2.threshold` ke zvýšení kontrastu) před předáním OCR engineu. + +## Krok 4 – Spusťte rozpoznávací engine a extrahujte rukopisný text + +S připraveným enginem a načteným obrázkem v paměti můžeme konečně **extrahovat rukopisný text**. Metoda `recognize` vrací surový výsledek, který obsahuje text i skóre důvěry. + +```python +# Step 4: Perform OCR and get the raw result +raw_result = ocr_engine.recognize(handwritten_image) +print("Raw OCR output:") +print(raw_result.text) +``` + +Typický surový výstup může obsahovat nadbytečné zalomení řádků nebo špatně rozpoznané znaky, zvláště pokud je rukopis nepořádný. Proto existuje další krok. + +## Krok 5 – (Volitelné) Vylepšete výstup pomocí AI post‑processoru + +Většina moderních OCR SDK obsahuje lehký AI post‑processor, který upravuje mezery, opravuje běžné OCR chyby a normalizuje konce řádků. Spustit jej je tak jednoduché: + +```python +# Step 5: Refine the raw OCR output (handwritten note to text) +polished_result = ocr_engine.run_postprocessor(raw_result) + +# Display the cleaned, readable text +print("\nPolished OCR output:") +print(polished_result.text) +``` + +Pokud tento krok přeskočíte, získáte stále použitelný text, ale konverze **ručně psané poznámky na text** bude vypadat trochu drsněji. Post‑processor je zvláště užitečný pro poznámky s odrážkami nebo smíšeným zápisem velkých a malých písmen. 
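Pokud se rozhodnete post‑processor vynechat, základní úklid mezer a zalomení řádků zvládnete i sami pomocí standardní knihovny. Níže je minimální náčrt – funkce `simple_cleanup` je čistě ilustrační a není součástí žádného SDK:

```python
import re

def simple_cleanup(raw_text: str) -> str:
    """Jednoduchá náhrada AI post-processoru: normalizuje mezery a zalomení řádků."""
    text = re.sub(r"[ \t]+", " ", raw_text)   # sloučí vícenásobné mezery a tabulátory
    text = re.sub(r"\s*\n\s*", " ", text)     # nahradí zalomení řádků jedinou mezerou
    return text.strip()

print(simple_cleanup("Today  I went\n to the   market"))
# → Today I went to the market
```

Překlepy typu „T0d@y“ tím samozřejmě neopravíte – na ty je potřeba AI post‑processor nebo kontrola pravopisu.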
+ +## Krok 6 – Ověřte výsledek a řešte okrajové případy + +Po vytištění vylepšeného výsledku dvojitě zkontrolujte, že vše vypadá správně. Zde je rychlá kontrola, kterou můžete přidat: + +```python +# Step 6: Simple verification +if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") +else: + print("\n✅ OCR succeeded! You can now save or further process the text.") +``` + +**Seznam okrajových případů** + +| Situace | Co dělat | +|-----------|------------| +| **Velmi nízký kontrast** | Zvýšit kontrast pomocí `cv2.convertScaleAbs` před načtením. | +| **Více jazyků** | Nastavit `ocr_engine.language = ["en", "es"]` (nebo vaše cílové jazyky). | +| **Velké dokumenty** | Zpracovávat stránky po dávkách, aby nedošlo k výkyvům paměti. | +| **Speciální symboly** | Přidat vlastní slovník pomocí `ocr_engine.add_custom_words([...])`. | + +## Vizualizace + +Níže je zástupný obrázek, který ilustruje workflow – od vyfocené poznámky po čistý text. Alt text obsahuje hlavní klíčové slovo, což zvyšuje SEO přívětivost obrázku. 
+ +![jak použít OCR na obrázku ručně psané poznámky](/images/handwritten_ocr_flow.png "jak použít OCR na obrázku ručně psané poznámky") + +## Kompletní spustitelný skript + +Sestavením všech částí získáte kompletní, připravený ke zkopírování a vložení program: + +```python +# Complete script: Convert a handwritten image to clean text using OCR + +import ocr + +def main(): + # 1️⃣ Initialize OCR engine for handwritten recognition + ocr_engine = ocr.OcrEngine() + ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN + + # 2️⃣ Load the image containing the handwritten note + handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") + + # 3️⃣ Perform OCR to extract raw text + raw_result = ocr_engine.recognize(handwritten_image) + print("Raw OCR output:") + print(raw_result.text) + + # 4️⃣ (Optional) Run AI post‑processor for cleaner output + polished_result = ocr_engine.run_postprocessor(raw_result) + + # 5️⃣ Show the polished, readable text + print("\nPolished OCR output:") + print(polished_result.text) + + # 6️⃣ Simple sanity check + if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") + else: + print("\n✅ OCR succeeded! You can now save or further process the text.") + +if __name__ == "__main__": + main() +``` + +**Očekávaný výstup (příklad)** + +``` +Raw OCR output: +T0d@y I w3nt to the market +and bought 5 aplpes, 2 bananas, +and a loaf of bread. + +Polished OCR output: +Today I went to the market and bought 5 apples, 2 bananas, and a loaf of bread. +``` + +Všimněte si, jak post‑processor opravil překlep „T0d@y“ a normalizoval mezery. + +## Časté úskalí a tipy od profíků + +- **Velikost obrázku má význam** – OCR engine obvykle omezuje vstupní rozměry na 4 K × 4 K. Předem zmenšete velké fotografie. +- **Styl rukopisu** – Kurzíva vs. bloková písmena mohou ovlivnit přesnost. Pokud máte kontrolu nad zdrojem (např. digitální pero), upřednostněte bloková písmena pro nejlepší výsledek. 
+- **Dávkové zpracování** – Při práci s desítkami poznámek obalte skript do smyčky a uložte každý výsledek do CSV nebo SQLite DB. +- **Úniky paměti** – Některá SDK udržují interní buffery; po dokončení zavolejte `ocr_engine.dispose()`, pokud zaznamenáte zpomalení. + +## Další kroky – Nad rámec jednoduchého OCR + +Nyní, když ovládáte **jak použít OCR** pro jeden obrázek, zvažte tato rozšíření: + +1. **Integrace s cloudovým úložištěm** – Stahujte obrázky z AWS S3 nebo Azure Blob, spusťte stejný pipeline a výsledek vraťte zpět. +2. **Detekce jazyka** – Použijte `ocr_engine.detect_language()` k automatickému přepínání slovníků. +3. **Kombinace s NLP** – Vložte vyčištěný text do spaCy nebo NLTK a extrahujte entity, data nebo úkoly. +4. **Vytvoření REST endpointu** – Zabalte skript do Flask nebo FastAPI, aby jiné služby mohly POSTovat obrázky a získat JSON‑kódovaný text. + +Všechny tyto nápady stále vycházejí z hlavních konceptů **rozpoznat rukopisný text**, **extrahovat rukopisný text** a **převést obrázek s rukopisem** – přesně ty fráze, které pravděpodobně budete příště hledat. + +--- + +### TL;DR + +Ukázali jsme vám **jak použít OCR** k rozpoznání rukopisu, jeho extrakci a vylepšení výsledku na použitelný řetězec. Kompletní skript je připraven k běhu, workflow je vysvětleno krok za krokem a máte nyní kontrolní seznam pro běžné okrajové případy. Pořiďte si fotografii další schůzkové poznámky, vložte ji do skriptu a nechte stroj psát za vás. + +Šťastné programování a ať jsou vaše poznámky vždy čitelné!
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/czech/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md b/ocr/czech/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md new file mode 100644 index 000000000..bbea791d2 --- /dev/null +++ b/ocr/czech/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md @@ -0,0 +1,187 @@ +--- +category: general +date: 2026-03-28 +description: Proveďte OCR na obrázku a získejte čistý text se souřadnicemi ohraničujících + rámečků. Naučte se, jak extrahovat OCR, vyčistit OCR a zobrazit výsledky krok za + krokem. +draft: false +keywords: +- perform OCR on image +- how to extract OCR +- how to clean OCR +- display bounding box coordinates +- OCR post‑processing +- OCR bounding boxes +language: cs +og_description: Proveďte OCR na obrázku, vyčistěte výstup a zobrazte souřadnice ohraničujícího + rámečku ve stručném tutoriálu. +og_title: Proveďte OCR na obrázku – čisté výsledky a ohraničující rámečky +tags: +- OCR +- Computer Vision +- Python +title: Proveďte OCR na obrázku – čisté výsledky a zobrazte souřadnice ohraničujícího + rámečku +url: /cs/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Proveďte OCR na obrázku – Vyčistěte výsledek a zobrazte souřadnice ohraničujícího rámečku + +Už jste někdy potřebovali **provést OCR na obrázku**, ale dostávali jste nečistý text a nevěděli, kde se každé slovo nachází na obrázku? Nejste v tom sami. V mnoha projektech – digitalizace faktur, skenování účtenek nebo jednoduché získávání textu – je získání surového výstupu OCR jen první překážkou.
Dobrá zpráva? Ten výstup můžete vyčistit a okamžitě vidět souřadnice ohraničujících rámečků každého regionu, aniž byste museli psát spoustu boilerplate kódu. + +V tomto průvodci si ukážeme **jak extrahovat OCR**, spustíme **post‑processor pro čištění OCR** a nakonec **zobrazíme souřadnice ohraničujících rámečků** pro každý vyčištěný region. Na konci budete mít jeden spustitelný skript, který promění rozmazanou fotografii na úhledný, strukturovaný text připravený pro další zpracování. + +## Co budete potřebovat + +- Python 3.9+ (syntaxe níže funguje na 3.8 a novějších) +- OCR engine, který podporuje `recognize(..., return_structured=True)` – například fiktivní knihovnu `engine` použité v úryvku. Nahraďte ji Tesseractem, EasyOCR nebo jakýmkoli SDK, které vrací data o regionech. +- Základní znalost funkcí a smyček v Pythonu +- Soubor s obrázkem, který chcete skenovat (PNG, JPG, atd.) + +> **Tip:** Pokud používáte Tesseract, funkce `pytesseract.image_to_data` už poskytuje ohraničující rámečky. Můžete výsledek zabalit do malého adaptéru, který napodobuje API `engine.recognize` uvedené níže. + +--- + +![perform OCR on image example](image-placeholder.png "perform OCR on image example") + +*Alt text: diagram ukazující, jak provést OCR na obrázku a vizualizovat souřadnice ohraničujících rámečků* + +## Krok 1 – Proveďte OCR na obrázku a získejte strukturované regiony + +Prvním krokem je požádat OCR engine, aby nevracel jen prostý text, ale strukturovaný seznam textových regionů. Tento seznam obsahuje surový řetězec a obdélník, který jej obklopuje. 
+ +```python +import engine  # replace with your actual OCR library +from pathlib import Path + +# Load the image you want to process +image_path = Path("sample_invoice.jpg") +image = engine.load_image(image_path) + +# Step 1: Perform OCR and request a structured list of text regions +raw_result = engine.recognize(image, return_structured=True) +``` + +**Proč je to důležité:** +Když požadujete jen prostý text, ztratíte prostorový kontext. Strukturovaná data vám později umožní **zobrazit souřadnice ohraničujících rámečků**, zarovnat text s tabulkami nebo předat přesné polohy dalšímu modelu. + +## Krok 2 – Jak vyčistit výstup OCR pomocí post‑processoru + +OCR engine jsou skvělé v rozpoznávání znaků, ale často zanechávají nadbytečné mezery, artefakty konců řádků nebo špatně rozpoznané symboly. Post‑processor normalizuje text, opravuje běžné OCR chyby a ořezává bílé znaky. + +```python +# Step 2: Clean the plain‑text of each region using the post‑processor +processed_result = engine.run_postprocessor(raw_result) +``` + +Pokud si vytváříte vlastní čistič, zvažte: + +- Odstranění ne‑ASCII znaků (`re.sub(r'[^\x00-\x7F]+',' ', text)`) +- Sloučení více mezer do jedné mezery +- Použití kontroloru pravopisu jako `pyspellchecker` pro zjevné překlepy + +**Proč by vás to mělo zajímat:** +Úhledný řetězec usnadňuje vyhledávání a indexování a zvyšuje spolehlivost následných NLP pipeline. Jinými slovy, **jak vyčistit OCR** je často rozdíl mezi použitelnou datovou sadou a bolestí hlavy. + +## Krok 3 – Zobrazte souřadnice ohraničujících rámečků pro každý vyčištěný region + +Nyní, když je text čistý, projdeme každý region a vypíšeme jeho obdélník a vyčištěný řetězec. To je část, kde konečně **zobrazíme souřadnice ohraničujících rámečků**.
+ +```python +# Step 3 – Iterate over the cleaned regions and display their bounding box and text +for text_region in processed_result.regions: + # Each region has a .bounding_box attribute (x, y, width, height) + bbox = text_region.bounding_box + print(f"[{bbox}] {text_region.text}") +``` + +**Ukázkový výstup** + +``` +[(34, 120, 210, 30)] Invoice #12345 +[(34, 160, 420, 28)] Date: 2026‑03‑01 +[(34, 200, 380, 28)] Total Amount: $1,254.00 +``` + +Nyní můžete tyto souřadnice předat kreslící knihovně (např. OpenCV) a překreslit rámečky na původním obrázku, nebo je uložit do databáze pro pozdější dotazy. + +## Kompletní, připravený ke spuštění skript + +Níže je kompletní program, který spojuje všechny tři kroky. Vyměňte placeholderové volání `engine` za vaše skutečné OCR SDK. + +```python +#!/usr/bin/env python3 +""" +Perform OCR on image → clean results → display bounding box coordinates. +Author: Your Name +Date: 2026‑03‑28 +""" + +import engine # <-- replace with your OCR library +from pathlib import Path +import sys + +def main(image_path: str): + # Load image + image = engine.load_image(Path(image_path)) + + # 1️⃣ Perform OCR and ask for structured output + raw_result = engine.recognize(image, return_structured=True) + + # 2️⃣ Clean the raw text using the built‑in post‑processor + processed_result = engine.run_postprocessor(raw_result) + + # 3️⃣ Show each region's bounding box and cleaned text + print("\n=== Cleaned OCR Regions ===") + for region in processed_result.regions: + bbox = region.bounding_box # (x, y, w, h) + print(f"[{bbox}] {region.text}") + +if __name__ == "__main__": + if len(sys.argv) != 2: + print("Usage: python perform_ocr.py ") + sys.exit(1) + main(sys.argv[1]) +``` + +### Jak spustit + +```bash +python perform_ocr.py sample_invoice.jpg +``` + +Měli byste vidět seznam ohraničujících rámečků spárovaných s vyčištěným textem, přesně jako ve výše uvedeném ukázkovém výstupu. 
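V tipu na začátku jsme zmínili, že výstup z `pytesseract.image_to_data` lze zabalit do malého adaptéru, který napodobí regiony s atributy `text` a `bounding_box`. Následující náčrt je jedna z možností – třída `TextRegion` i funkce `regions_from_tesseract` jsou ilustrační názvy, nikoli součást žádného SDK. Převodní logika pracuje jen se slovníkem ve tvaru, jaký vrací `image_to_data(..., output_type=Output.DICT)`, takže si ji vyzkoušíte i bez nainstalovaného Tesseractu:

```python
from dataclasses import dataclass

@dataclass
class TextRegion:
    text: str
    bounding_box: tuple  # (x, y, šířka, výška)

def regions_from_tesseract(data: dict) -> list:
    """Převede slovník z pytesseract.image_to_data na seznam regionů s rámečky."""
    regions = []
    for i, word in enumerate(data["text"]):
        if word.strip():  # Tesseract vrací i prázdné záznamy – ty přeskočíme
            bbox = (data["left"][i], data["top"][i],
                    data["width"][i], data["height"][i])
            regions.append(TextRegion(text=word, bounding_box=bbox))
    return regions

# Ukázková data ve stejném tvaru, jaký vrací pytesseract.image_to_data
sample = {
    "text": ["Invoice", "", "#12345"],
    "left": [34, 0, 120], "top": [120, 0, 120],
    "width": [80, 0, 60], "height": [30, 0, 30],
}
for region in regions_from_tesseract(sample):
    print(f"[{region.bounding_box}] {region.text}")
```

Takto získané regiony pak můžete poslat do stejné smyčky jako v kroku 3.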
+ +## Často kladené otázky a okrajové případy + +| Otázka | Odpověď | +|----------|--------| +| **Co když OCR engine nepodporuje `return_structured`?** | Napište tenký wrapper, který převede surový výstup engine (obvykle seznam slov se souřadnicemi) na objekty s atributy `text` a `bounding_box`. | +| **Mohu získat skóre důvěry?** | Mnoho SDK poskytuje metriku důvěry pro každý region. Přidejte ji do výpisu: `print(f"[{bbox}] {region.text} (conf: {region.confidence:.2f})")`. | +| **Jak zacházet s otočeným textem?** | Předzpracujte obrázek pomocí OpenCV `cv2.minAreaRect` a deskewujte jej před voláním `recognize`. | +| **Co když potřebuji výstup v JSON?** | Serializujte `processed_result.regions` pomocí `json.dumps([r.__dict__ for r in processed_result.regions], indent=2)`. | +| **Existuje způsob, jak vizualizovat rámečky?** | Použijte OpenCV: `cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0), 2)` uvnitř smyčky, pak `cv2.imwrite("annotated.jpg", img)`. | + +## Závěr + +Právě jste se naučili **jak provést OCR na obrázku**, vyčistit surový výstup a **zobrazit souřadnice ohraničujících rámečků** pro každý region. Tříkrokový tok – rozpoznání → post‑processing → iterace – je znovupoužitelný vzor, který můžete vložit do jakéhokoli Python projektu vyžadujícího spolehlivé získávání textu. + +### Co dál? + +- **Prozkoumejte různé OCR back‑endy** (Tesseract, EasyOCR, Google Vision) a porovnejte přesnost. +- **Integrujte s databází** pro ukládání dat o regionech a vyhledávání v archivech. +- **Přidejte detekci jazyka** a směrujte každý region přes odpovídající kontrolor pravopisu. +- **Překryjte rámečky na původní obrázek** pro vizuální ověření (viz výše uvedený OpenCV úryvek). + +Pokud narazíte na podivnosti, pamatujte, že největší výhoda pochází ze solidního post‑processing kroku; s čistým řetězcem se pracuje mnohem snáze než se surovým výpisem znaků. + +Šťastné kódování a ať jsou vaše OCR pipeline vždy úhledné!
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/czech/python/general/python-ocr-tutorial-extract-text-from-images/_index.md b/ocr/czech/python/general/python-ocr-tutorial-extract-text-from-images/_index.md new file mode 100644 index 000000000..d1e8b2dff --- /dev/null +++ b/ocr/czech/python/general/python-ocr-tutorial-extract-text-from-images/_index.md @@ -0,0 +1,232 @@ +--- +category: general +date: 2026-03-28 +description: Python OCR tutoriál ukazující, jak extrahovat text z obrázku v Pythonu + pomocí Aspose OCR Cloud. Naučte se načíst obrázek pro OCR a během několika minut + převést obrázek na prostý text. +draft: false +keywords: +- python ocr tutorial +- extract text image python +- ocr image to text +- load image for ocr +- convert image plain text +language: cs +og_description: Python OCR tutoriál vysvětluje, jak načíst obrázek pro OCR a převést + obrázek na prostý text pomocí Aspose OCR Cloud. Získejte kompletní kód a tipy. +og_title: Python OCR tutoriál – Extrahování textu z obrázků +tags: +- OCR +- Python +- Image Processing +title: Python OCR tutoriál – Extrahování textu z obrázků +url: /cs/python/general/python-ocr-tutorial-extract-text-from-images/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Python OCR tutoriál – Extrahování textu z obrázků + +Už jste se někdy zamýšleli, jak převést nepořádnou fotografii účtenky na čistý, prohledávatelný text? Nejste v tom jediní. Podle mé zkušenosti není největší překážkou samotný OCR engine, ale získání obrázku do správného formátu a vytažení čistého textu bez problémů. 
+ +Tento **python ocr tutorial** vás provede každým krokem – načtením obrázku pro OCR, spuštěním rozpoznání a nakonec převodem čistého textu z obrázku na Python řetězec, který můžete uložit nebo analyzovat. Na konci budete umět extrahovat text z obrázku ve stylu **extract text image python** a nebudete k tomu potřebovat žádnou placenou licenci. + +## Co se naučíte + +- Jak nainstalovat a importovat Aspose OCR Cloud SDK pro Python. +- Přesný kód pro **load image for OCR** (PNG, JPEG, TIFF, PDF, atd.). +- Jak zavolat engine k provedení **ocr image to text** konverze. +- Tipy pro zvládání běžných edge‑cases, jako jsou více‑stránkové PDF nebo skeny s nízkým rozlišením. +- Způsoby, jak ověřit výstup a co dělat, pokud text vypadá poškozeně. + +### Předpoklady + +- Python 3.8+ nainstalovaný na vašem počítači. +- Bezplatný účet Aspose Cloud (zkouška funguje bez licence). +- Základní znalost pip a virtuálních prostředí – nic složitého. + +> **Pro tip:** Pokud již používáte virtualenv, aktivujte jej nyní. Udrží vaše závislosti přehledné a zabrání konfliktům verzí. + +![Snímek obrazovky Python OCR tutoriálu zobrazující rozpoznaný text](path/to/ocr_example.png "Python OCR tutoriál – zobrazení extrahovaného čistého textu") + +## Krok 1 – Instalace Aspose OCR Cloud SDK + +Nejprve potřebujeme knihovnu, která komunikuje se službou OCR od Aspose. Otevřete terminál a spusťte: + +```bash +pip install asposeocrcloud +``` + +Tento jediný příkaz stáhne nejnovější SDK (aktuálně verze 23.12). Balíček obsahuje vše, co potřebujete – není nutná žádná další knihovna pro zpracování obrázků. + +## Krok 2 – Inicializace OCR engine (Primární klíčové slovo v akci) + +Nyní, když je SDK připravené, můžeme spustit OCR engine, kolem kterého se tento **python ocr tutorial** točí. Konstruktor nevyžaduje žádný licenční klíč pro zkušební verzi, což vše zjednodušuje.
+ +```python +import asposeocrcloud as ocr + +# Initialise the OCR engine – no licence needed for trial use +ocr_engine = ocr.OcrEngine() +``` + +> **Proč je to důležité:** Inicializace engine pouze jednou udržuje následné volání rychlé. Pokud objekt vytvoříte znovu pro každý obrázek, zbytečně plýtváte síťovými požadavky. + +## Krok 3 – Načtení obrázku pro OCR + +Zde se ukáže síla klíčového slova **load image for OCR**. Metoda `Image.load` SDK přijímá cestu k souboru nebo URL a automaticky detekuje formát (PNG, JPEG, TIFF, PDF, atd.). Načtěme ukázkovou účtenku: + +```python +# Step 3: Load the input image (PNG, JPEG, TIFF, PDF …) +input_image = ocr.Image.load(r"YOUR_DIRECTORY/receipt.png") +``` + +Pokud pracujete s více‑stránkovým PDF, jednoduše odkažte na PDF soubor; SDK interně bude každou stránku považovat za samostatný obrázek. + +## Krok 4 – Provedení OCR konverze obrázku na text + +S obrázkem v paměti proběhne skutečné OCR jedním řádkem. Metoda `recognize` vrací objekt `OcrResult`, který obsahuje čistý text, skóre důvěry a dokonce i ohraničující rámečky, pokud je budete později potřebovat. + +```python +# Step 4: Perform OCR on the loaded image +ocr_result = ocr_engine.recognize(input_image) +``` + +> **Edge case:** Pro obrázky s nízkým rozlišením (méně než 300 dpi) možná budete chtít nejprve zvětšit velikost obrázku. SDK nabízí pomocnou funkci `Resize`, ale pro většinu účtenek výchozí nastavení funguje dobře. + +## Krok 5 – Převod čistého textu z obrázku na použitelné řetězec + +Poslední část skládačky je extrakce čistého textu z objektu výsledku. Toto je krok **convert image plain text**, který převádí OCR blob na něco, co můžete vytisknout, uložit nebo předat jinému systému. + +```python +# Step 5: Output the recognised plain text +print("Plain OCR:") +print(ocr_result.text) +``` + +Když spustíte skript, měli byste vidět něco jako: + +``` +Plain OCR: +Starbucks Coffee +Date: 2026/03/27 +Total: $4.75 +Thank you! 
+``` + +Tento výstup je nyní běžný Python řetězec, připravený pro export do CSV, vložení do databáze nebo zpracování přirozeného jazyka. + +## Řešení běžných problémů + +### 1. Prázdné nebo šumové obrázky + +Pokud `ocr_result.text` vrátí prázdný řetězec, zkontrolujte kvalitu obrázku. Rychlé řešení je přidat krok předzpracování: + +```python +# Simple preprocessing – convert to grayscale and increase contrast +processed = input_image.to_grayscale().adjust_contrast(1.2) +ocr_result = ocr_engine.recognize(processed) +``` + +### 2. Více‑stránkové PDF + +Když předáte PDF, `recognize` vrátí výsledky pro každou stránku. Procházejte je takto: + +```python +pdf_image = ocr.Image.load("document.pdf") +pages = pdf_image.pages # collection of page images + +for i, page in enumerate(pages, start=1): + result = ocr_engine.recognize(page) + print(f"--- Page {i} ---") + print(result.text) +``` + +### 3. Podpora jazyků + +Aspose OCR podporuje více než 60 jazyků. Pro změnu jazyka nastavte vlastnost `language` před voláním `recognize`: + +```python +ocr_engine.language = "fr" # French +ocr_result = ocr_engine.recognize(input_image) +``` + +## Kompletní funkční příklad + +Spojením všech částí získáte kompletní skript připravený ke zkopírování, který pokrývá vše od instalace po řešení okrajových případů: + +```python +# -*- coding: utf-8 -*- +""" +Python OCR tutorial – complete example using Aspose OCR Cloud. +Demonstrates loading an image, performing OCR, and handling multi‑page PDFs. +""" + +import asposeocrcloud as ocr +import os + +def ocr_file(filepath: str, language: str = "en"): + """ + Perform OCR on a given file (image or PDF) and return plain text. 
+ """ + # Initialise engine (trial licence) + engine = ocr.OcrEngine() + engine.language = language + + # Load the file – SDK auto‑detects format + image = ocr.Image.load(filepath) + + # If it's a PDF, iterate over pages + if image.is_pdf: + all_text = [] + for page in image.pages: + result = engine.recognize(page) + all_text.append(result.text) + return "\n".join(all_text) + + # Single‑image case + result = engine.recognize(image) + return result.text + + +if __name__ == "__main__": + # Example usage – replace with your own path + sample_path = os.path.join("YOUR_DIRECTORY", "receipt.png") + + if not os.path.exists(sample_path): + raise FileNotFoundError(f"File not found: {sample_path}") + + extracted = ocr_file(sample_path) + print("=== Extracted Text ===") + print(extracted) +``` + +Spusťte skript (`python ocr_demo.py`) a uvidíte výstup **ocr image to text** přímo ve vaší konzoli. + +## Shrnutí – Co jsme probrali + +- Nainstalovali jsme **Aspose OCR Cloud** SDK (`pip install asposeocrcloud`). +- **Initialised the OCR engine** bez licence (ideální pro zkušební verzi). +- Ukázali jsme, jak **load image for OCR**, ať už jde o PNG, JPEG nebo PDF. +- Provedli jsme konverzi **ocr image to text** a **converted image plain text** na použitelné Python řetězce. +- Vyřešili jsme běžné problémy jako skeny s nízkým rozlišením, více‑stránkové PDF a výběr jazyka. + +## Další kroky a související témata + +Nyní, když jste zvládli **python ocr tutorial**, zvažte prozkoumání: + +- **Extract text image python** pro dávkové zpracování velkých složek s účtenkami. +- Integrace výstupu OCR s **pandas** pro analýzu dat (`df = pd.read_csv(StringIO(extracted))`). +- Použití **Tesseract OCR** jako záložní řešení, když je omezené internetové připojení. +- Přidání post‑zpracování pomocí **spaCy** k identifikaci entit jako datum, částky a názvy obchodníků. + +Neváhejte experimentovat: vyzkoušejte různé formáty obrázků, upravte kontrast nebo změňte jazyk. 
Oblast OCR je široká a dovednosti, které jste právě získali, jsou solidním základem pro jakýkoli projekt automatizace dokumentů. + +Šťastné programování a ať je váš text vždy čitelný! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/czech/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md b/ocr/czech/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md new file mode 100644 index 000000000..53cc5318a --- /dev/null +++ b/ocr/czech/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md @@ -0,0 +1,220 @@ +--- +category: general +date: 2026-03-28 +description: Naučte se, jak spustit OCR na obrázku, automaticky stáhnout model Hugging Face, + vyčistit text z OCR a nakonfigurovat model LLM v Pythonu pomocí Aspose OCR Cloud. +draft: false +keywords: +- run OCR on image +- download hugging face model +- clean OCR text +- configure LLM model +language: cs +og_description: Spusťte OCR na obrázku a vyčistěte výstup pomocí automaticky staženého + modelu Hugging Face. Tento průvodce ukazuje, jak nakonfigurovat model LLM v Pythonu. +og_title: Spusťte OCR na obrázku – Kompletní tutoriál Aspose OCR Cloud +tags: +- OCR +- Python +- LLM +- HuggingFace +title: Spusťte OCR na obrázku pomocí Aspose OCR Cloud – Kompletní průvodce krok za + krokem +url: /cs/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Spusťte OCR na obrázku – Kompletní tutoriál Aspose OCR Cloud + +Už jste někdy potřebovali spustit OCR na souborech s obrázky, ale surový výstup vypadal jako chaotický zmatek? 
Podle mé zkušenosti není největším problémem samotné rozpoznání – je to úklid. Naštěstí Aspose OCR Cloud vám umožňuje připojit LLM post‑processor, který může *automaticky vyčistit OCR text*. V tomto tutoriálu projdeme vše, co potřebujete: od **stažení modelu z Hugging Face** po konfiguraci LLM, spuštění OCR enginu a nakonec vyleštění výsledku. + +Na konci tohoto průvodce budete mít připravený skript, který: + +1. Stáhne kompaktní model Qwen 2.5 z Hugging Face (automaticky stažený pro vás). +2. Nakonfiguruje model tak, aby část sítě běžela na GPU a zbytek na CPU. +3. Spustí OCR engine na obrázku s ručně psanou poznámkou. +4. Použije LLM k vyčištění rozpoznaného textu a poskytne vám čitelný výstup. + +> **Požadavky** – Python 3.8+, balíček `asposeocrcloud`, GPU s alespoň 4 GB VRAM (volitelné, ale doporučené) a internetové připojení pro první stažení modelu. + +--- + +## Co budete potřebovat + +- **Aspose OCR Cloud SDK** – nainstalujte pomocí `pip install asposeocrcloud`. +- **Ukázkový obrázek** – např. `handwritten_note.jpg` umístěný v lokální složce. +- **Podpora GPU** – pokud máte CUDA‑povolený GPU, skript přenese 30 vrstev; jinak se automaticky přepne na CPU. +- **Oprávnění k zápisu** – skript ukládá model do `YOUR_DIRECTORY`; ujistěte se, že složka existuje. + +--- + +## Krok 1 – Konfigurace modelu LLM (stažení modelu z Hugging Face) + +První, co uděláme, je říct Aspose AI, odkud má model načíst. Třída `AsposeAIModelConfig` zajišťuje automatické stažení, kvantizaci a alokaci GPU vrstev. 
+ +```python +import asposeocrcloud as ocr +from asposeocrcloud.ai import AsposeAI, AsposeAIModelConfig + +# ---------------------------------------------------------------------- +# Step 1: Model configuration – this will download the model if it’s missing +# ---------------------------------------------------------------------- +model_config = AsposeAIModelConfig( + allow_auto_download="true", # Enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", # Repo on Hugging Face + hugging_face_quantization="int8", # Small footprint, fast inference + gpu_layers=30, # 30 layers on GPU, rest on CPU + directory_model_path=r"YOUR_DIRECTORY" # Cache folder (optional) +) +``` + +**Proč je to důležité** – Kvantizace na `int8` dramaticky snižuje spotřebu paměti (≈ 4 GB vs 12 GB). Rozdělení modelu mezi GPU a CPU vám umožní spustit 3‑miliard‑parametrový LLM i na skromném RTX 3060. Pokud nemáte GPU, nastavte `gpu_layers=0` a SDK vše nechá běžet na CPU. + +> **Tip:** První spuštění stáhne ~ 1,5 GB, takže počítejte s několika minutami a stabilním připojením. + +--- + +## Krok 2 – Inicializace AI enginu s konfigurací modelu + +Nyní spustíme Aspose AI engine a předáme mu konfiguraci, kterou jsme právě vytvořili. + +```python +# ---------------------------------------------------------------------- +# Step 2: Initialise the AI engine – pulls the model if needed +# ---------------------------------------------------------------------- +ocr_ai = AsposeAI() +ocr_ai.initialize(model_config) # This call blocks until the model is ready +``` + +**Co se děje pod kapotou?** SDK kontroluje `directory_model_path` pro existující model. Pokud najde odpovídající verzi, načte ji okamžitě; jinak stáhne soubor GGUF z Hugging Face, rozbalí jej a připraví inference pipeline. + +--- + +## Krok 3 – Vytvoření OCR enginu a připojení AI post‑processoru + +OCR engine provádí těžkou práci rozpoznávání znaků. 
Připojením `ocr_ai.run_postprocessor` automaticky povolíme **clean OCR text** po rozpoznání. + +```python +# ---------------------------------------------------------------------- +# Step 3: Build the OCR engine and bind the LLM post‑processor +# ---------------------------------------------------------------------- +ocr_engine = ocr.OcrEngine() +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=None) +``` + +**Proč používat post‑processor?** Surové OCR často obsahuje špatně umístěné zalomení řádků, chybně detekovanou interpunkci nebo cizí symboly. LLM může výstup přepsat do správných vět, opravit pravopis a dokonce doplnit chybějící slova – v podstatě promění surový dump na upravený text. + +--- + +## Krok 4 – Spuštění OCR na souboru obrázku + +S veškerým propojením je čas předat obrázek engine. + +```python +# ---------------------------------------------------------------------- +# Step 4: Load the image and run OCR +# ---------------------------------------------------------------------- +input_image = ocr.Image.load(r"YOUR_DIRECTORY/handwritten_note.jpg") +raw_result = ocr_engine.recognize(input_image) # Returns an OcrResult object +``` + +**Hraniční případ:** Pokud je obrázek velký (> 5 MP), můžete jej nejprve zmenšit, aby se zrychlilo zpracování. SDK přijímá objekt Pillow `Image`, takže můžete předzpracovat pomocí `PIL.Image.thumbnail()` podle potřeby. + +--- + +## Krok 5 – Nechte AI vyčistit rozpoznaný text a zobrazit obě verze + +Nakonec zavoláme post‑processor, který jsme připojili dříve. Tento krok ukazuje kontrast mezi *před* a *po* vyčištění. 
+ +```python +# ---------------------------------------------------------------------- +# Step 5: Clean the OCR output using the LLM and display both results +# ---------------------------------------------------------------------- +cleaned_result = ocr_engine.run_postprocessor(raw_result) + +print("=== Before AI ===") +print(raw_result.text) + +print("\n=== After AI ===") +print(cleaned_result.text) +``` + +### Očekávaný výstup + +``` +=== Before AI === +Th1s 1s a h@ndwr1tt3n n0te. It c0nta1ns m1st@k3s, l1n3 br3aks, & sp3c!@l ch@r@ct3rs. + +=== After AI === +This is a handwritten note. It contains mistakes, line breaks, and special characters. +``` + +Všimněte si, jak LLM: + +- Opravil běžné OCR chyby (`Th1s` → `This`). +- Odstranil cizí symboly (`&` → `and`). +- Normalizoval zalomení řádků do správných vět. + +--- + +## 🎨 Vizualizace (Workflow spuštění OCR na obrázku) + +![Workflow spuštění OCR na obrázku](run_ocr_on_image_workflow.png "Diagram ukazující pipeline spuštění OCR na obrázku od stažení modelu po vyčištěný výstup") + +Diagram výše shrnuje celý pipeline: **stažení modelu z Hugging Face → konfigurace LLM → inicializace AI → OCR engine → AI post‑processor → clean OCR text**. + +--- + +## Časté otázky a profesionální tipy + +### Co když nemám GPU? + +Nastavte `gpu_layers=0` v `AsposeAIModelConfig`. Model poběží kompletně na CPU, což je pomalejší, ale stále funkční. Můžete také přejít na menší model (např. `Qwen/Qwen2.5-1.5B‑Instruct‑GGUF`), aby byl inference čas přijatelný. + +### Jak změnit model později? + +Stačí aktualizovat `hugging_face_repo_id` a znovu spustit `ocr_ai.initialize(model_config)`. SDK detekuje změnu verze, stáhne nový model a nahradí cache soubory. + +### Můžu přizpůsobit prompt post‑processoru? + +Ano. Předávejte slovník do `custom_settings` s klíčem `prompt_template`. 
Například: + +```python +custom_prompt = { + "prompt_template": "Correct the following OCR text and keep line breaks:\n{ocr_text}" +} +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=custom_prompt) +``` + +### Mám ukládat vyčištěný text do souboru? + +Určitě. Po vyčištění můžete výsledek zapsat do souboru `.txt` nebo `.json` pro další zpracování: + +```python +with open("cleaned_note.txt", "w", encoding="utf-8") as f: + f.write(cleaned_result.text) +``` + +--- + +## Závěr + +Právě jsme vám ukázali, jak **spustit OCR na obrázku** pomocí Aspose OCR Cloud, automaticky **stáhnout model z Hugging Face**, odborně **nakonfigurovat nastavení modelu LLM** a nakonec **vyčistit OCR text** pomocí výkonného LLM post‑processoru. Celý proces se vejde do jediného, snadno spustitelného Python skriptu a funguje jak na strojích s GPU, tak jen s CPU. + +Pokud vám tento pipeline vyhovuje, zkuste experimentovat s: + +- **Různými LLM** – vyzkoušejte `meta-llama/Meta-Llama-3-8B‑Instruct‑GGUF` pro větší kontextové okno. +- **Dávkovým zpracováním** – projděte složku s obrázky a agregujte vyčištěné výsledky do CSV. +- **Vlastními promptami** – přizpůsobte AI vašemu oboru (právní dokumenty, lékařské poznámky atd.). + +Klidně upravte hodnotu `gpu_layers`, vyměňte model nebo připojte vlastní prompt. Možnosti jsou neomezené a kód, který máte nyní, je odrazovým můstkem. + +Šťastné kódování a ať jsou vaše OCR výstupy vždy čisté! 
🚀 + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/dutch/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md b/ocr/dutch/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md new file mode 100644 index 000000000..210d98001 --- /dev/null +++ b/ocr/dutch/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md @@ -0,0 +1,225 @@ +--- +category: general +date: 2026-03-28 +description: Hoe OCR te gebruiken om handgeschreven tekst in afbeeldingen te herkennen. + Leer handgeschreven tekst te extraheren, handgeschreven afbeeldingen te converteren + en snel schone resultaten te krijgen. +draft: false +keywords: +- how to use OCR +- recognize handwritten text +- extract handwritten text +- handwritten note to text +- convert handwritten image +language: nl +og_description: Hoe OCR te gebruiken om handgeschreven tekst te herkennen. Deze tutorial + laat je stap voor stap zien hoe je handgeschreven tekst uit afbeeldingen kunt extraheren + en gepolijste resultaten krijgt. +og_title: Hoe OCR te gebruiken om handgeschreven tekst te herkennen – Complete gids +tags: +- OCR +- Handwriting Recognition +- Python +title: Hoe OCR te gebruiken om handgeschreven tekst te herkennen – Complete gids +url: /nl/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Hoe OCR te gebruiken om handgeschreven tekst te herkennen – Complete gids + +Hoe OCR voor handgeschreven notities te gebruiken is een vraag die veel ontwikkelaars stellen wanneer ze schetsen, notulen of snelle ideeën moeten digitaliseren. 
In deze gids lopen we stap voor stap door hoe je handgeschreven tekst herkent, handgeschreven tekst extraheert en een handgeschreven afbeelding omzet in schone, doorzoekbare strings. + +Als je ooit naar een foto van een boodschappenlijstje hebt gekeken en je afvroeg: “Kan ik deze handgeschreven afbeelding naar tekst omzetten zonder alles opnieuw te typen?” – dan ben je hier op de juiste plek. Aan het einde heb je een kant‑klaar script dat een **handgeschreven notitie naar tekst** omzet in enkele seconden. + +## Wat je nodig hebt + +- Python 3.8+ (de code werkt met elke recente versie) +- De `ocr`‑bibliotheek – installeer deze met `pip install ocr-sdk` (vervang door de pakketnaam van jouw provider) +- Een duidelijke foto van een handgeschreven notitie (`hand_note.png` in het voorbeeld) +- Een beetje nieuwsgierigheid en een kop koffie ☕️ (optioneel maar aanbevolen) + +Geen zware frameworks, geen betaalde cloud‑sleutels – alleen een lokale engine die **handgeschreven herkenning** direct ondersteunt. + +## Stap 1 – Installeer het OCR‑pakket en importeer het + +Allereerst, laten we het juiste pakket op je machine krijgen. Open een terminal en voer uit: + +```bash +pip install ocr-sdk +``` + +Wanneer de installatie voltooid is, importeer je de module in je script: + +```python +# Step 1: Import the OCR SDK +import ocr +``` + +> **Pro tip:** Als je een virtuele omgeving gebruikt, activeer deze dan vóór het installeren. Zo houd je je project netjes en voorkom je versieconflicten. + +## Stap 2 – Maak een OCR‑engine en schakel handgeschreven modus in + +Nu gaan we daadwerkelijk **hoe OCR te gebruiken** – we hebben een engine‑instance nodig die weet dat we te maken hebben met cursieve streken in plaats van gedrukte letters. 
Het volgende fragment maakt de engine aan en schakelt handgeschreven modus in: + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = ocr.OcrEngine() +ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN +``` + +Waarom `recognition_mode` instellen? Omdat de meeste OCR‑engines standaard op gedrukte‑tekstdetectie staan, waardoor de lussen en schuine streken van een persoonlijke notitie vaak over het hoofd worden gezien. Het inschakelen van de handgeschreven modus verhoogt de nauwkeurigheid drastisch. + +## Stap 3 – Laad de afbeelding die je wilt converteren (Handgeschreven afbeelding converteren) + +Afbeeldingen zijn het ruwe materiaal voor elke OCR‑taak. Zorg ervoor dat je foto is opgeslagen in een verliesvrij formaat (PNG werkt uitstekend) en dat de tekst redelijk leesbaar is. Laad de afbeelding vervolgens als volgt: + +```python +# Step 3: Load the handwritten image you want to convert +handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") +``` + +Als de afbeelding zich naast je script bevindt, kun je simpelweg `"hand_note.png"` gebruiken in plaats van een volledig pad. + +> **Wat als de afbeelding onscherp is?** Probeer vooraf te verwerken met OpenCV (bijv. `cv2.cvtColor` naar grijswaarden, `cv2.threshold` om het contrast te verhogen) voordat je deze aan de OCR‑engine geeft. + +## Stap 4 – Laat de herkenningsengine de handgeschreven tekst extraheren + +Met de engine klaar en de afbeelding in het geheugen, kunnen we eindelijk **handgeschreven tekst extraheren**. De `recognize`‑methode geeft een ruwe result‑object terug dat de tekst plus vertrouwensscores bevat. + +```python +# Step 4: Perform OCR and get the raw result +raw_result = ocr_engine.recognize(handwritten_image) +print("Raw OCR output:") +print(raw_result.text) +``` + +Typische ruwe output kan losse regeleinden of verkeerd geïdentificeerde tekens bevatten, vooral als het handschrift rommelig is. Daarom bestaat de volgende stap. 
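Stap 4 vermeldde dat het result‑object naast tekst ook vertrouwensscores bevat. Als jouw SDK die per regio beschikbaar stelt via `.text` en `.confidence` (een aanname – controleer de documentatie van je provider), kun je twijfelgevallen apart houden voordat je ze verder verwerkt. Een minimale schets, met een `Region`‑dataclass als stand‑in voor het echte SDK‑object en dummydata in plaats van echte OCR‑output:

```python
# Hypothetical sketch: flag low-confidence regions for manual review.
# `Region` is a stand-in for the SDK's result objects; we ASSUME they
# expose `.text` and `.confidence` – verify this against your OCR SDK.
from dataclasses import dataclass

@dataclass
class Region:
    text: str
    confidence: float  # assumed range 0.0 – 1.0

def flag_uncertain(regions, threshold=0.80):
    """Split regions into accepted text and items that need human review."""
    accepted = [r.text for r in regions if r.confidence >= threshold]
    review = [r.text for r in regions if r.confidence < threshold]
    return accepted, review

# Dummy data mimicking raw handwriting-OCR output:
demo = [Region("boodschappen", 0.95), Region("m3lk", 0.42), Region("brood", 0.88)]
ok, check = flag_uncertain(demo)
print("OK:", ok)        # confidently recognised words
print("Review:", check) # likely OCR mistakes, worth a second look
```

Zo voorkom je dat een verkeerd herkend woord ongemerkt in je uiteindelijke tekst belandt; de drempelwaarde van 0.80 is slechts een startpunt.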
+ +## Stap 5 – (Optioneel) Polijst de output met de AI‑postprocessor + +De meeste moderne OCR‑SDK's worden geleverd met een lichte AI‑postprocessor die spatiëring opruimt, veelvoorkomende OCR‑fouten corrigeert en regeleinden normaliseert. Het uitvoeren is net zo eenvoudig als: + +```python +# Step 5: Refine the raw OCR output (handwritten note to text) +polished_result = ocr_engine.run_postprocessor(raw_result) + +# Display the cleaned, readable text +print("\nPolished OCR output:") +print(polished_result.text) +``` + +Als je deze stap overslaat, krijg je nog steeds bruikbare tekst, maar de **handgeschreven notitie naar tekst** conversie ziet er iets ruwer uit. De postprocessor is vooral handig voor notities met opsommingstekens of gemengde hoofdletters. + +## Stap 6 – Verifieer het resultaat en behandel randgevallen + +Na het afdrukken van het gepolijste resultaat, controleer je dubbel of alles er goed uitziet. Hier is een snelle sanity‑check die je kunt toevoegen: + +```python +# Step 6: Simple verification +if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") +else: + print("\n✅ OCR succeeded! You can now save or further process the text.") +``` + +**Checklist voor randgevallen** + +| Situatie | Wat te doen | +|-----------|------------| +| **Zeer laag contrast** | Verhoog het contrast met `cv2.convertScaleAbs` vóór het laden. | +| **Meerdere talen** | Stel `ocr_engine.language = ["en", "es"]` in (of jouw doeltalen). | +| **Grote documenten** | Verwerk pagina’s in batches om geheugenpieken te voorkomen. | +| **Speciale symbolen** | Voeg een aangepast woordenboek toe via `ocr_engine.add_custom_words([...])`. | + +## Visueel overzicht + +Hieronder staat een placeholder‑afbeelding die de workflow illustreert — van een gefotografeerde notitie tot schone tekst. De alt‑tekst bevat het belangrijkste zoekwoord, waardoor de afbeelding SEO‑vriendelijk is. 
+ +![hoe OCR te gebruiken op een handgeschreven notitie afbeelding](/images/handwritten_ocr_flow.png "hoe OCR te gebruiken op een handgeschreven notitie afbeelding") + +## Volledig, uitvoerbaar script + +Alle onderdelen samengevoegd, hier is het complete, copy‑and‑paste‑klare programma: + +```python +# Complete script: Convert a handwritten image to clean text using OCR + +import ocr + +def main(): + # 1️⃣ Initialize OCR engine for handwritten recognition + ocr_engine = ocr.OcrEngine() + ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN + + # 2️⃣ Load the image containing the handwritten note + handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") + + # 3️⃣ Perform OCR to extract raw text + raw_result = ocr_engine.recognize(handwritten_image) + print("Raw OCR output:") + print(raw_result.text) + + # 4️⃣ (Optional) Run AI post‑processor for cleaner output + polished_result = ocr_engine.run_postprocessor(raw_result) + + # 5️⃣ Show the polished, readable text + print("\nPolished OCR output:") + print(polished_result.text) + + # 6️⃣ Simple sanity check + if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") + else: + print("\n✅ OCR succeeded! You can now save or further process the text.") + +if __name__ == "__main__": + main() +``` + +**Verwachte output (voorbeeld)** + +``` +Raw OCR output: +T0d@y I w3nt to the market +and bought 5 aplpes, 2 bananas, +and a loaf of bread. + +Polished OCR output: +Today I went to the market and bought 5 apples, 2 bananas, and a loaf of bread. +``` + +Let op hoe de postprocessor de “T0d@y”‑typefout corrigeerde en de spatiëring normaliseerde. + +## Veelvoorkomende valkuilen & Pro‑tips + +- **Afbeeldingsgrootte is belangrijk** – OCR‑engines beperken meestal de invoergrootte tot 4 K × 4 K. Pas grote foto’s vooraf aan. +- **Handschriftstijl** – Cursief versus blokletters kan de nauwkeurigheid beïnvloeden. Als je de bron kunt beheersen (bijv. 
een digitale pen), moedig dan blokletters aan voor het beste resultaat. +- **Batchverwerking** – Bij tientallen notities, wikkel het script in een lus en sla elk resultaat op in een CSV‑ of SQLite‑database. +- **Geheugenlekken** – Sommige SDK's houden interne buffers vast; roep `ocr_engine.dispose()` aan nadat je klaar bent als je een vertraging merkt. + +## Volgende stappen – Verder gaan dan eenvoudige OCR + +Nu je **hoe OCR te gebruiken** voor één afbeelding onder de knie hebt, overweeg je deze uitbreidingen: + +1. **Integreren met cloudopslag** – Haal afbeeldingen op uit AWS S3 of Azure Blob, voer dezelfde pipeline uit en zet de resultaten terug. +2. **Taaldetectie toevoegen** – Gebruik `ocr_engine.detect_language()` om automatisch woordenboeken te wisselen. +3. **Combineren met NLP** – Voer de opgeschoonde tekst in spaCy of NLTK om entiteiten, data of actiepunten te extraheren. +4. **Maak een REST‑endpoint** – Wikkel het script in Flask of FastAPI zodat andere services afbeeldingen kunnen POSTen en JSON‑gecodeerde tekst ontvangen. + +Al deze ideeën draaien nog steeds om de kernconcepten **recognize handwritten text**, **extract handwritten text**, en **convert handwritten image** — de exacte zinnen waar je waarschijnlijk als volgende naar zoekt. + +--- + +### TL;DR + +We hebben je laten zien **hoe OCR te gebruiken** om handgeschreven tekst te herkennen, te extraheren en het resultaat te polijsten tot een bruikbare string. Het volledige script staat klaar om te draaien, de workflow is stap‑voor‑stap uitgelegd, en je hebt nu een checklist voor veelvoorkomende randgevallen. Pak een foto van je volgende notitie, voer die in het script in, en laat de machine het typen voor je doen. + +Happy coding, and may your notes always be readable! 
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/dutch/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md b/ocr/dutch/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md new file mode 100644 index 000000000..38aa6b89b --- /dev/null +++ b/ocr/dutch/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md @@ -0,0 +1,187 @@ +--- +category: general +date: 2026-03-28 +description: Voer OCR uit op een afbeelding en verkrijg schone tekst met de coördinaten + van de begrenzingskaders. Leer hoe je OCR kunt extraheren, OCR kunt opschonen en + de resultaten stap voor stap kunt weergeven. +draft: false +keywords: +- perform OCR on image +- how to extract OCR +- how to clean OCR +- display bounding box coordinates +- OCR post‑processing +- OCR bounding boxes +language: nl +og_description: Voer OCR uit op een afbeelding, maak de output schoon en toon de coördinaten + van de begrenzingskaders in een beknopte tutorial. +og_title: Voer OCR uit op afbeelding – Schone resultaten en begrenzingskaders +tags: +- OCR +- Computer Vision +- Python +title: Voer OCR uit op afbeelding – schone resultaten en toon de coördinaten van het + begrenzingsvak +url: /nl/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# OCR uitvoeren op afbeelding – schone resultaten en coördinaten van begrenzende vakken tonen + +Heb je ooit **OCR op afbeelding** moeten uitvoeren, maar kreeg je rommelige tekst en wist je niet waar elk woord zich op de foto bevond? Je bent niet de enige. 
In veel projecten—factuurdigitalisatie, kassabon‑scannen of eenvoudige tekste‑extractie—is het verkrijgen van ruwe OCR‑output slechts de eerste hindernis. Het goede nieuws? Je kunt die output opschonen en direct de coördinaten van de begrenzende vakken van elke regio zien, zonder een hoop boilerplate‑code te schrijven. + +In deze gids lopen we stap voor stap door **hoe OCR te extraheren**, een **hoe OCR op te schonen** post‑processor uit te voeren, en uiteindelijk **de coördinaten van begrenzende vakken** voor elke opgeschoonde regio weer te geven. Aan het einde heb je één enkel, uitvoerbaar script dat een onscherpe foto omzet in nette, gestructureerde tekst klaar voor verdere verwerking. + +## Wat je nodig hebt + +- Python 3.9+ (de syntaxis hieronder werkt op 3.8 en nieuwer) +- Een OCR‑engine die `recognize(..., return_structured=True)` ondersteunt – bijvoorbeeld een fictieve `engine`‑bibliotheek die in het voorbeeld wordt gebruikt. Vervang deze door Tesseract, EasyOCR of een andere SDK die regiogegevens retourneert. +- Basiskennis van Python‑functies en loops +- Een afbeeldingsbestand dat je wilt scannen (PNG, JPG, enz.) + +> **Pro tip:** Als je Tesseract gebruikt, geeft de functie `pytesseract.image_to_data` al begrenzende vakken terug. Je kunt het resultaat in een kleine adapter wikkelen die de `engine.recognize`‑API nabootst zoals hieronder getoond. + +--- + +![perform OCR on image example](image-placeholder.png "perform OCR on image example") + +*Alt‑tekst: diagram dat laat zien hoe OCR op een afbeelding wordt uitgevoerd en hoe de coördinaten van begrenzende vakken worden gevisualiseerd* + +## Stap 1 – OCR uitvoeren op afbeelding en gestructureerde regio’s ophalen + +Het eerste wat je moet doen, is de OCR‑engine vragen om niet alleen platte tekst, maar een gestructureerde lijst van tekstreeksen terug te geven. Deze lijst bevat de ruwe string en de rechthoek die deze omsluit. 
+ +```python +import engine # replace with your actual OCR library +from pathlib import Path + +# Load the image you want to process +image_path = Path("sample_invoice.jpg") +image = engine.load_image(image_path) + +# Step 1: Perform OCR and request a structured list of text regions +raw_result = engine.recognize(image, return_structured=True) +``` + +**Waarom dit belangrijk is:** +Wanneer je alleen om platte tekst vraagt, verlies je de ruimtelijke context. Gestructureerde data stelt je later in staat om **coördinaten van begrenzende vakken** weer te geven, tekst uit te lijnen met tabellen, of precieze locaties aan een downstream‑model te leveren. + +## Stap 2 – Hoe OCR‑output op te schonen met een post‑processor + +OCR‑engines zijn goed in het herkennen van tekens, maar laten vaak overbodige spaties, regeleinde‑artefacten of foutief herkende symbolen achter. Een post‑processor normaliseert de tekst, corrigeert veelvoorkomende OCR‑fouten en verwijdert onnodige witruimte. + +```python +# Step 2: Clean the plain‑text of each region using the post‑processor +processed_result = engine.run_postprocessor(raw_result) +``` + +Als je je eigen opschoonroutine bouwt, overweeg dan: + +- Het verwijderen van niet‑ASCII‑tekens (`re.sub(r'[^\x00-\x7F]+',' ', text)`) +- Het samenvoegen van meerdere spaties tot één enkele spatie +- Het toepassen van een spell‑checker zoals `pyspellchecker` voor duidelijke typefouten + +**Waarom je hier om zou moeten geven:** +Een nette string maakt zoeken, indexeren en downstream‑NLP‑pijplijnen veel betrouwbaarder. Met andere woorden, **hoe OCR op te schonen** is vaak het verschil tussen een bruikbare dataset en een hoofdpijnervaring. + +## Stap 3 – Coördinaten van begrenzende vakken voor elke opgeschoonde regio weergeven + +Nu de tekst netjes is, itereren we over elke regio, printen we de rechthoek en de opgeschoonde string. Dit is het moment waarop we eindelijk **coördinaten van begrenzende vakken** weergeven. 
+
+```python
+# Step 3 – Iterate over the cleaned regions and display their bounding box and text
+for text_region in processed_result.regions:
+    # Each region has a .bounding_box attribute (x, y, width, height)
+    bbox = text_region.bounding_box
+    print(f"[{bbox}] {text_region.text}")
+```
+
+**Voorbeeldoutput**
+
+```
+[(34, 120, 210, 30)] Invoice #12345
+[(34, 160, 420, 28)] Date: 2026‑03‑01
+[(34, 200, 380, 28)] Total Amount: $1,254.00
+```
+
+Je kunt die coördinaten nu doorgeven aan een tekenbibliotheek (bijv. OpenCV) om vakken over de originele afbeelding te leggen, of ze opslaan in een database voor latere queries.
+
+## Volledig, kant‑klaar script
+
+Hieronder staat het complete programma dat alle drie de stappen samenbrengt. Vervang de placeholder `engine`‑aanroepen door je eigen OCR‑SDK.
+
+```python
+#!/usr/bin/env python3
+"""
+Perform OCR on image → clean results → display bounding box coordinates.
+Author: Your Name
+Date: 2026‑03‑28
+"""
+
+import engine  # <-- replace with your OCR library
+from pathlib import Path
+import sys
+
+def main(image_path: str):
+    # Load image
+    image = engine.load_image(Path(image_path))
+
+    # 1️⃣ Perform OCR and ask for structured output
+    raw_result = engine.recognize(image, return_structured=True)
+
+    # 2️⃣ Clean the raw text using the built‑in post‑processor
+    processed_result = engine.run_postprocessor(raw_result)
+
+    # 3️⃣ Show each region's bounding box and cleaned text
+    print("\n=== Cleaned OCR Regions ===")
+    for region in processed_result.regions:
+        bbox = region.bounding_box  # (x, y, w, h)
+        print(f"[{bbox}] {region.text}")
+
+if __name__ == "__main__":
+    if len(sys.argv) != 2:
+        print("Usage: python perform_ocr.py <image_path>")
+        sys.exit(1)
+    main(sys.argv[1])
+```
+
+### Hoe uit te voeren
+
+```bash
+python perform_ocr.py sample_invoice.jpg
+```
+
+Je zou een lijst met begrenzende vakken gekoppeld aan opgeschoonde tekst moeten zien, precies zoals de voorbeeldoutput hierboven.
+
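De pro‑tip eerder in dit artikel noemde een adapter rond `pytesseract.image_to_data`. Die functie levert (met `output_type=Output.DICT`) een dict met kolommen zoals `text`, `conf`, `left`, `top`, `width` en `height`; onderstaande schets zet die kolommen om naar objecten met `.text` en `.bounding_box`, zodat de loop uit Stap 3 ongewijzigd blijft werken. De dict hieronder is hard‑coded voorbeelddata, zodat de schets ook draait zonder geïnstalleerde Tesseract:

```python
# Sketch: adapt pytesseract-style word data to regions with .text/.bounding_box.
# The input dict mirrors the columns of pytesseract.image_to_data(...,
# output_type=Output.DICT); the sample below is hard-coded stand-in data.
from dataclasses import dataclass

@dataclass
class TextRegion:
    text: str
    bounding_box: tuple  # (x, y, w, h)

def adapt_tesseract(data, min_conf=0):
    """Convert image_to_data columns into TextRegion objects, skipping blanks."""
    regions = []
    for i, word in enumerate(data["text"]):
        if not word.strip():
            continue  # skip empty entries (layout rows without text)
        if float(data["conf"][i]) < min_conf:
            continue  # skip rejected words (Tesseract uses conf == -1)
        box = (data["left"][i], data["top"][i], data["width"][i], data["height"][i])
        regions.append(TextRegion(word, box))
    return regions

sample = {
    "text": ["Invoice", "", "#12345"],
    "conf": [96.1, -1, 91.4],
    "left": [34, 0, 120], "top": [120, 0, 120],
    "width": [80, 0, 70], "height": [30, 0, 30],
}
for r in adapt_tesseract(sample):
    print(f"[{r.bounding_box}] {r.text}")
```

Met zo'n dunne laag kun je vrij wisselen tussen OCR‑back‑ends zonder de rest van je pipeline aan te passen.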
+ +## Veelgestelde vragen & randgevallen + +| Vraag | Antwoord | +|----------|--------| +| **Wat als de OCR‑engine `return_structured` niet ondersteunt?** | Schrijf een dunne wrapper die de ruwe output van de engine (meestal een lijst van woorden met coördinaten) omzet naar objecten met de attributen `text` en `bounding_box`. | +| **Kan ik vertrouwensscores krijgen?** | Veel SDK’s bieden een vertrouwensmetric per regio. Voeg deze toe aan de print‑statement: `print(f"[{bbox}] {region.text} (conf: {region.confidence:.2f})")`. | +| **Hoe ga ik om met gedraaide tekst?** | Pre‑process de afbeelding met OpenCV’s `cv2.minAreaRect` om de scheefstand te corrigeren voordat je `recognize` aanroept. | +| **Wat als ik de output in JSON nodig heb?** | Serialiseer `processed_result.regions` met `json.dumps([r.__dict__ for r in processed_result.regions], indent=2)`. | +| **Is er een manier om de vakken te visualiseren?** | Gebruik OpenCV: `cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0), 2)` binnen de loop, daarna `cv2.imwrite("annotated.jpg", img)`. | + +## Afsluiten + +Je hebt zojuist geleerd **hoe OCR op afbeelding** uit te voeren, de ruwe output op te schonen, en **coördinaten van begrenzende vakken** voor elke regio weer te geven. De drie‑stappen‑flow — herkennen → post‑processen → itereren — is een herbruikbaar patroon dat je in elk Python‑project kunt gebruiken dat betrouwbare tekste‑extractie nodig heeft. + +### Wat is het volgende? + +- **Verken verschillende OCR‑back‑ends** (Tesseract, EasyOCR, Google Vision) en vergelijk de nauwkeurigheid. +- **Integreer met een database** om regiogegevens op te slaan voor doorzoekbare archieven. +- **Voeg taaldetectie toe** om elke regio door de juiste spell‑checker te laten lopen. +- **Leg vakken over de originele afbeelding** voor visuele verificatie (zie het OpenCV‑fragment hierboven). 
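Wil je regiogegevens bewaren voor een doorzoekbaar archief (het tweede idee hierboven), dan is serialisatie via `dataclasses.asdict` iets robuuster dan de `__dict__`‑one‑liner uit de FAQ‑tabel, omdat geneste dataclasses dan ook correct worden omgezet. Een schets, waarbij de `TextRegion`‑vorm een aanname is gebaseerd op de eerdere voorbeelden:

```python
# Sketch: serialise cleaned OCR regions to JSON for storage or an API response.
# `TextRegion` is a hypothetical stand-in for your SDK's region objects.
import json
from dataclasses import dataclass, asdict

@dataclass
class TextRegion:
    text: str
    bounding_box: tuple  # (x, y, w, h)

regions = [
    TextRegion("Invoice #12345", (34, 120, 210, 30)),
    TextRegion("Total Amount: $1,254.00", (34, 200, 380, 28)),
]

# asdict() recursively converts dataclasses; json turns tuples into lists.
payload = json.dumps([asdict(r) for r in regions], indent=2)
print(payload)
```

De resulterende JSON kun je direct in een documentdatabase of als API‑respons gebruiken.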
+ +Als je tegen eigenaardigheden aanloopt, onthoud dan dat de grootste winst voortkomt uit een solide post‑processing stap; een schone string is veel makkelijker mee te werken dan een ruwe dump van tekens. + +Happy coding, and may your OCR pipelines be ever tidy! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/dutch/python/general/python-ocr-tutorial-extract-text-from-images/_index.md b/ocr/dutch/python/general/python-ocr-tutorial-extract-text-from-images/_index.md new file mode 100644 index 000000000..f18dd4a58 --- /dev/null +++ b/ocr/dutch/python/general/python-ocr-tutorial-extract-text-from-images/_index.md @@ -0,0 +1,233 @@ +--- +category: general +date: 2026-03-28 +description: Python OCR-tutorial die laat zien hoe je tekst uit een afbeelding haalt + met Python en Aspose OCR Cloud. Leer hoe je een afbeelding laadt voor OCR en de + afbeelding in platte tekst converteert in enkele minuten. +draft: false +keywords: +- python ocr tutorial +- extract text image python +- ocr image to text +- load image for ocr +- convert image plain text +language: nl +og_description: Python OCR‑tutorial legt uit hoe je een afbeelding laadt voor OCR + en platte tekst van de afbeelding converteert met Aspose OCR Cloud. Haal de volledige + code en tips op. +og_title: Python OCR-tutorial – Tekst uit afbeeldingen extraheren +tags: +- OCR +- Python +- Image Processing +title: Python OCR Tutorial – Tekst extraheren uit afbeeldingen +url: /nl/python/general/python-ocr-tutorial-extract-text-from-images/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Python OCR Tutorial – Tekst uit afbeeldingen extraheren + +Heb je je ooit afgevraagd hoe je een rommelige bonfoto kunt omzetten in schone, doorzoekbare tekst? 
Je bent niet de enige. Naar mijn ervaring is het grootste obstakel niet de OCR-engine zelf, maar het krijgen van de afbeelding in het juiste formaat en het zonder problemen extraheren van de platte tekst. + +Dit **python ocr tutorial** leidt je door elke stap—het laden van een afbeelding voor OCR, het uitvoeren van de herkenning, en uiteindelijk het converteren van de platte tekst van de afbeelding naar een Python‑string die je kunt opslaan of analyseren. Aan het einde kun je **extract text image python** stijl, en heb je geen betaalde licentie nodig om te beginnen. + +## Wat je zult leren + +- Hoe je de Aspose OCR Cloud SDK voor Python installeert en importeert. +- De exacte code om **load image for OCR** (PNG, JPEG, TIFF, PDF, enz.) te gebruiken. +- Hoe je de engine aanroept om **ocr image to text** conversie uit te voeren. +- Tips voor het omgaan met veelvoorkomende edge‑cases zoals multi‑page PDF’s of scans met lage resolutie. +- Manieren om de output te verifiëren en wat te doen als de tekst er onleesbaar uitziet. + +### Vereisten + +- Python 3.8+ geïnstalleerd op je machine. +- Een gratis Aspose Cloud‑account (de proefversie werkt zonder licentie). +- Basiskennis van pip en virtuele omgevingen—niets ingewikkelds. + +> **Pro tip:** Als je al een virtualenv gebruikt, activeer deze dan nu. Het houdt je afhankelijkheden netjes en voorkomt versieconflicten. + +![Python OCR tutorial screenshot die herkende tekst toont](path/to/ocr_example.png "Python OCR tutorial – weergave van geëxtraheerde platte tekst") + +## Stap 1 – Installeer de Aspose OCR Cloud SDK + +Allereerst hebben we de bibliotheek nodig die met de OCR‑service van Aspose communiceert. Open een terminal en voer uit: + +```bash +pip install asposeocrcloud +``` + +Dat enkele commando haalt de nieuwste SDK op (momenteel versie 23.12). Het pakket bevat alles wat je nodig hebt—geen extra beeldverwerkingsbibliotheken vereist. 
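Twijfel je of de installatie is gelukt, dan kun je dat controleren zonder de engine op te starten, met `importlib.metadata` uit de standaardbibliotheek. De pakketnaam `asposeocrcloud` is hier overgenomen uit het pip‑commando hierboven; pas die aan als jouw provider een andere naam gebruikt:

```python
# Quick sanity check: confirm the OCR SDK is installed and report its version.
# The package name "asposeocrcloud" is taken from the pip command above.
from importlib import metadata

def sdk_version(package: str = "asposeocrcloud") -> str:
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"'{package}' not installed – run: pip install {package}"

print(sdk_version())
```

Draai je dit in een verse virtualenv, dan krijg je meteen de pip‑hint te zien in plaats van later een onduidelijke `ImportError`.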
+
+## Stap 2 – Initialiseer de OCR‑engine
+
+Nu de SDK klaar is, kunnen we de **python ocr tutorial** engine opstarten. De constructor heeft geen licentiesleutel nodig voor de proefversie, wat het simpel houdt.
+
+```python
+import asposeocrcloud as ocr
+
+# Initialise the OCR engine – no licence needed for trial use
+ocr_engine = ocr.OcrEngine()
+```
+
+> **Waarom dit belangrijk is:** Het initialiseren van de engine slechts één keer houdt de daaropvolgende oproepen snel. Als je het object voor elke afbeelding opnieuw maakt, verspil je netwerk‑rondreizen.
+
+## Stap 3 – Laad afbeelding voor OCR
+
+Hier komt het **load image for OCR**‑keyword goed van pas. De `Image.load`‑methode van de SDK accepteert een bestandspad of een URL, en detecteert automatisch het formaat (PNG, JPEG, TIFF, PDF, enz.). Laten we een voorbeeldbon laden:
+
+```python
+# Step 3: Load the input image (PNG, JPEG, TIFF, PDF …)
+input_image = ocr.Image.load(r"YOUR_DIRECTORY/receipt.png")
+```
+
+Als je met een multi‑page PDF werkt, verwijs dan simpelweg naar het PDF‑bestand; de SDK behandelt elke pagina intern als een aparte afbeelding.
+
+## Stap 4 – Voer OCR‑afbeelding‑naar‑tekst conversie uit
+
+Met de afbeelding in het geheugen gebeurt de daadwerkelijke OCR in één regel. De `recognize`‑methode retourneert een `OcrResult`‑object dat de platte tekst, vertrouwensscores en zelfs begrenzingskaders bevat als je die later nodig hebt.
+
+```python
+# Step 4: Perform OCR on the loaded image
+ocr_result = ocr_engine.recognize(input_image)
+```
+
+> **Edge case:** Voor afbeeldingen met lage resolutie (onder 300 dpi) wil je de afbeelding eerst opschalen. De SDK biedt een `Resize`‑helper, maar voor de meeste bonnetjes werkt de standaardinstelling prima.
+
+## Stap 5 – Converteer platte tekst van afbeelding naar een bruikbare string
+
+Het laatste puzzelstuk is het extraheren van de platte tekst uit het result‑object.
Dit is de **convert image plain text** stap die de OCR‑blob omzet in iets dat je kunt afdrukken, opslaan of in een ander systeem kunt voeren. + +```python +# Step 5: Output the recognised plain text +print("Plain OCR:") +print(ocr_result.text) +``` + +Wanneer je het script uitvoert, zou je iets moeten zien als: + +``` +Plain OCR: +Starbucks Coffee +Date: 2026/03/27 +Total: $4.75 +Thank you! +``` + +Die output is nu een reguliere Python‑string, klaar voor CSV‑export, database‑invoeging of natural‑language processing. + +## Veelvoorkomende valkuilen behandelen + +### 1. Lege of ruisige afbeeldingen + +Als `ocr_result.text` leeg terugkomt, controleer dan de beeldkwaliteit. Een snelle oplossing is een preprocessing‑stap toe te voegen: + +```python +# Simple preprocessing – convert to grayscale and increase contrast +processed = input_image.to_grayscale().adjust_contrast(1.2) +ocr_result = ocr_engine.recognize(processed) +``` + +### 2. Multi‑page PDF’s + +Wanneer je een PDF invoert, retourneert `recognize` resultaten voor elke pagina. Loop er doorheen als volgt: + +```python +pdf_image = ocr.Image.load("document.pdf") +pages = pdf_image.pages # collection of page images + +for i, page in enumerate(pages, start=1): + result = ocr_engine.recognize(page) + print(f"--- Page {i} ---") + print(result.text) +``` + +### 3. Taalondersteuning + +Aspose OCR ondersteunt meer dan 60 talen. Om van taal te wisselen, stel je de `language`‑eigenschap in vóór het aanroepen van `recognize`: + +```python +ocr_engine.language = "fr" # French +ocr_result = ocr_engine.recognize(input_image) +``` + +## Volledig werkend voorbeeld + +Alles bij elkaar, hier is een compleet, copy‑paste‑klaar script dat alles dekt van installatie tot het afhandelen van edge‑cases: + +```python +# -*- coding: utf-8 -*- +""" +Python OCR tutorial – complete example using Aspose OCR Cloud. +Demonstrates loading an image, performing OCR, and handling multi‑page PDFs. 
+""" + +import asposeocrcloud as ocr +import os + +def ocr_file(filepath: str, language: str = "en"): + """ + Perform OCR on a given file (image or PDF) and return plain text. + """ + # Initialise engine (trial licence) + engine = ocr.OcrEngine() + engine.language = language + + # Load the file – SDK auto‑detects format + image = ocr.Image.load(filepath) + + # If it's a PDF, iterate over pages + if image.is_pdf: + all_text = [] + for page in image.pages: + result = engine.recognize(page) + all_text.append(result.text) + return "\n".join(all_text) + + # Single‑image case + result = engine.recognize(image) + return result.text + + +if __name__ == "__main__": + # Example usage – replace with your own path + sample_path = os.path.join("YOUR_DIRECTORY", "receipt.png") + + if not os.path.exists(sample_path): + raise FileNotFoundError(f"File not found: {sample_path}") + + extracted = ocr_file(sample_path) + print("=== Extracted Text ===") + print(extracted) +``` + +Voer het script uit (`python ocr_demo.py`) en je ziet de **ocr image to text** output direct in je console. + +## Samenvatting – Wat we hebben behandeld + +- De **Aspose OCR Cloud** SDK geïnstalleerd (`pip install asposeocrcloud`). +- De OCR‑engine **geïnitieerd** zonder licentie (perfect voor de proefversie). +- Gedemonstreerd hoe je **load image for OCR** uitvoert, of het nu een PNG, JPEG of PDF is. +- **ocr image to text** conversie uitgevoerd en **convert image plain text** omgezet naar een bruikbare Python‑string. +- Veelvoorkomende valkuilen aangepakt zoals scans met lage resolutie, multi‑page PDF’s en taalkeuze. + +## Volgende stappen & gerelateerde onderwerpen + +Nu je de **python ocr tutorial** onder de knie hebt, overweeg het volgende: + +- **Extract text image python** voor batchverwerking van grote mappen met bonnetjes. +- De OCR‑output integreren met **pandas** voor data‑analyse (`df = pd.read_csv(StringIO(extracted))`). 
+- Using **Tesseract OCR** as a fallback when internet connectivity is limited.
+- Adding post‑processing with **spaCy** to identify entities such as dates, amounts, and store names.
+
+Feel free to experiment: try different image formats, adjust the contrast, or switch languages. The OCR landscape is broad, and the skills you have just picked up form a solid foundation for any document automation project.
+
+Happy coding, and may your text always be readable!
+
+{{< /blocks/products/pf/tutorial-page-section >}}
+{{< /blocks/products/pf/main-container >}}
+{{< /blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/products-backtop-button >}}
\ No newline at end of file
diff --git a/ocr/dutch/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md b/ocr/dutch/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md
new file mode 100644
index 000000000..addd91cf6
--- /dev/null
+++ b/ocr/dutch/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md
@@ -0,0 +1,221 @@
+---
+category: general
+date: 2026-03-28
+description: Learn how to run OCR on an image, automatically download a Hugging Face
+  model, clean OCR text, and configure an LLM model in Python with Aspose OCR Cloud.
+draft: false
+keywords:
+- run OCR on image
+- download hugging face model
+- clean OCR text
+- configure LLM model
+language: nl
+og_description: Run OCR on an image and clean the output with an automatically downloaded
+  Hugging Face model. This guide shows how to configure an LLM model in Python.
+og_title: Run OCR on Image – Complete Aspose OCR Cloud Tutorial
+tags:
+- OCR
+- Python
+- LLM
+- HuggingFace
+title: Run OCR on Image with Aspose OCR Cloud – Full Step‑by‑Step Guide
+url: /nl/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# Run OCR on Image – Complete Aspose OCR Cloud Tutorial
+
+Ever had to run OCR on image files, only to find that the raw output looked like a jumble? In my experience, the biggest pain point isn’t the recognition itself; it’s the cleanup. Fortunately, Aspose OCR Cloud lets you attach an LLM post‑processor that can *clean OCR text automatically*. In this tutorial we walk through everything you need: from **downloading a Hugging Face model** to configuring the LLM, running the OCR engine, and finally polishing the result.
+
+By the end of this guide you’ll have a ready‑to‑run script that:
+
+1. Fetches a compact Qwen 2.5 model from Hugging Face (downloaded automatically for you).
+2. Configures the model to run part of the network on the GPU and the rest on the CPU.
+3. Runs the OCR engine on an image of a handwritten note.
+4. Uses the LLM to clean up the recognized text, giving you human‑readable output.
+
+> **Prerequisites** – Python 3.8+, the `asposeocrcloud` package, a GPU with at least 4 GB of VRAM (optional but recommended), and an internet connection for the initial model download.
+
+---
+
+## What You’ll Need
+
+- **Aspose OCR Cloud SDK** – install via `pip install asposeocrcloud`.
+- **A sample image** – e.g. `handwritten_note.jpg` placed in a local folder.
+- **GPU support** – if you have a CUDA‑enabled GPU, the script will offload 30 layers; otherwise it automatically falls back to the CPU.
+- **Write permissions** – the script caches the model in `YOUR_DIRECTORY`; make sure the folder exists.
+
+---
+
+## Step 1 – Configure the LLM Model (Download Hugging Face Model)
+
+The first thing we do is tell Aspose AI where to fetch the model from. The `AsposeAIModelConfig` class handles auto‑download, quantization, and GPU layer allocation.
+
+```python
+import asposeocrcloud as ocr
+from asposeocrcloud.ai import AsposeAI, AsposeAIModelConfig
+
+# ----------------------------------------------------------------------
+# Step 1: Model configuration – this will download the model if it’s missing
+# ----------------------------------------------------------------------
+model_config = AsposeAIModelConfig(
+    allow_auto_download="true",  # Enables auto‑download
+    hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF",  # Repo on Hugging Face
+    hugging_face_quantization="int8",  # Small footprint, fast inference
+    gpu_layers=30,  # 30 layers on GPU, rest on CPU
+    directory_model_path=r"YOUR_DIRECTORY"  # Cache folder (optional)
+)
+```
+
+**Why this matters** – Quantizing to `int8` drastically cuts memory use (≈ 4 GB vs 12 GB). Splitting the model between GPU and CPU lets you run a 3‑billion‑parameter LLM even on a modest RTX 3060. If you don’t have a GPU, set `gpu_layers=0` and the SDK keeps everything on the CPU.
+
+> **Tip:** The first run downloads ~ 1.5 GB, so give it a few minutes and a stable connection.
+
+---
+
+## Step 2 – Initialise the AI Engine with the Model Configuration
+
+Now we start the Aspose AI engine and feed it the configuration we just created.
+
+```python
+# ----------------------------------------------------------------------
+# Step 2: Initialise the AI engine – pulls the model if needed
+# ----------------------------------------------------------------------
+ocr_ai = AsposeAI()
+ocr_ai.initialize(model_config)  # This call blocks until the model is ready
+```
+
+**What’s happening under the hood?** The SDK checks `directory_model_path` for an existing model. If a matching version is found, it is loaded directly; otherwise the GGUF file is downloaded from Hugging Face, unpacked, and the inference pipeline is prepared.
+
+---
+
+## Step 3 – Create the OCR Engine and Attach the AI Post‑Processor
+
+The OCR engine does the heavy lifting of recognizing characters. By attaching `ocr_ai.run_postprocessor`, we enable **clean OCR text** automatically after recognition.
+
+```python
+# ----------------------------------------------------------------------
+# Step 3: Build the OCR engine and bind the LLM post‑processor
+# ----------------------------------------------------------------------
+ocr_engine = ocr.OcrEngine()
+ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=None)
+```
+
+**Why use a post‑processor?** Raw OCR often contains line breaks in the wrong places, misrecognized punctuation, or odd symbols. The LLM can rewrite the output into proper sentences, fix spelling, and even infer missing words – effectively turning a raw dump into polished prose.
+
+---
+
+## Step 4 – Run OCR on an Image File
+
+With everything wired up, it’s time to feed an image to the engine.
+
+```python
+# ----------------------------------------------------------------------
+# Step 4: Load the image and run OCR
+# ----------------------------------------------------------------------
+input_image = ocr.Image.load(r"YOUR_DIRECTORY/handwritten_note.jpg")
+raw_result = ocr_engine.recognize(input_image)  # Returns an OcrResult object
+```
+
+**Edge case:** If the image is large (> 5 MP), you may want to downscale it first to speed up processing. The SDK accepts a Pillow `Image` object, so you can pre‑process with `PIL.Image.thumbnail()` if needed.
+
+---
+
+## Step 5 – Let the AI Clean the Recognized Text and Show Both Versions
+
+Finally, we call the post‑processor we attached earlier. This step shows the contrast between *before* and *after* cleanup.
+
+```python
+# ----------------------------------------------------------------------
+# Step 5: Clean the OCR output using the LLM and display both results
+# ----------------------------------------------------------------------
+cleaned_result = ocr_engine.run_postprocessor(raw_result)

+print("=== Before AI ===")
+print(raw_result.text)
+
+print("\n=== After AI ===")
+print(cleaned_result.text)
+```
+
+### Expected Output
+
+```
+=== Before AI ===
+Th1s 1s a h@ndwr1tt3n n0te. It c0nta1ns m1st@k3s, l1n3 br3aks, & sp3c!@l ch@r@ct3rs.
+
+=== After AI ===
+This is a handwritten note. It contains mistakes, line breaks, and special characters.
+```
+
+Notice how the LLM:
+
+- Corrected common OCR misrecognitions (`Th1s` → `This`).
+- Removed stray symbols (`&` → `and`).
+ +--- + +## 🎨 Visueel overzicht (OCR uitvoeren op afbeelding workflow) + +![OCR uitvoeren op afbeelding workflow](run_ocr_on_image_workflow.png "Diagram dat de OCR‑uitvoeren‑op‑afbeelding‑pipeline toont, van model‑download tot opgeschoonde output") + +Het diagram hierboven vat de volledige pipeline samen: **download Hugging Face model → configureer LLM → initialiseer AI → OCR‑engine → AI‑post‑processor → clean OCR text**. + +--- + +## Veelgestelde vragen & Pro‑tips + +### Wat als ik geen GPU heb? + +Stel `gpu_layers=0` in de `AsposeAIModelConfig`. Het model draait volledig op CPU, wat langzamer is maar nog steeds functioneel. Je kunt ook overschakelen naar een kleiner model (bijv. `Qwen/Qwen2.5-1.5B‑Instruct‑GGUF`) om de inferentietijd redelijk te houden. + +### Hoe wijzig ik later het model? + +Werk simpelweg `hugging_face_repo_id` bij en voer `ocr_ai.initialize(model_config)` opnieuw uit. De SDK detecteert de versie‑wijziging, downloadt het nieuwe model en vervangt de gecachete bestanden. + +### Kan ik de post‑processor prompt aanpassen? + +Ja. Geef een dictionary door aan `custom_settings` met een `prompt_template`‑sleutel. Bijvoorbeeld: + +```python +custom_prompt = { + "prompt_template": "Correct the following OCR text and keep line breaks:\n{ocr_text}" +} +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=custom_prompt) +``` + +### Moet ik de opgeschoonde tekst opslaan in een bestand? + +Zeker. Na het opschonen kun je het resultaat wegschrijven naar een `.txt`‑ of `.json`‑bestand voor downstream verwerking: + +```python +with open("cleaned_note.txt", "w", encoding="utf-8") as f: + f.write(cleaned_result.text) +``` + +--- + +## Conclusie + +We hebben je net laten zien hoe je **OCR uitvoert op afbeelding**‑bestanden met Aspose OCR Cloud, automatisch **een Hugging Face‑model downloadt**, vakkundig **LLM‑modelinstellingen configureert**, en uiteindelijk **OCR‑tekst opschoont** met een krachtige LLM‑post‑processor. 
The whole process fits into a single, easy‑to‑run Python script and works on both GPU‑enabled and CPU‑only machines.
+
+Once you’re comfortable with this pipeline, consider experimenting with:
+
+- **Different LLM models** – try `meta-llama/Meta-Llama-3-8B‑Instruct‑GGUF` for a larger context window.
+- **Batch processing** – loop over a folder of images and collect the cleaned results in a CSV.
+- **Custom prompts** – tune the AI to your domain (legal documents, medical notes, etc.).
+
+Feel free to tweak the `gpu_layers` value, swap the model, or use your own prompt. The possibilities are endless, and the code you now have is the launchpad.
+
+Happy coding, and may your OCR output always be clean! 🚀
+
+{{< /blocks/products/pf/tutorial-page-section >}}
+{{< /blocks/products/pf/main-container >}}
+{{< /blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/products-backtop-button >}}
\ No newline at end of file
diff --git a/ocr/english/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md b/ocr/english/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md
new file mode 100644
index 000000000..bf3b74e07
--- /dev/null
+++ b/ocr/english/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md
@@ -0,0 +1,223 @@
+---
+category: general
+date: 2026-03-28
+description: How to use OCR to recognize handwritten text in images. Learn to extract
+  handwritten text, convert handwritten image, and get clean results fast.
+draft: false
+keywords:
+- how to use OCR
+- recognize handwritten text
+- extract handwritten text
+- handwritten note to text
+- convert handwritten image
+language: en
+og_description: How to use OCR to recognize handwritten text. This tutorial shows
+  you step‑by‑step how to extract handwritten text from images and get polished results.
+og_title: How to Use OCR to Recognize Handwritten Text – Complete Guide +tags: +- OCR +- Handwriting Recognition +- Python +title: How to Use OCR to Recognize Handwritten Text – Complete Guide +url: /python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# How to Use OCR to Recognize Handwritten Text – Complete Guide + +How to use OCR for handwritten notes is a question many developers ask when they need to digitize sketches, meeting minutes, or quick‑jot ideas. In this guide we’ll walk through the exact steps to recognize handwritten text, extract handwritten text, and turn a handwritten image into clean, searchable strings. + +If you’ve ever stared at a photo of a grocery list and wondered, “Can I convert this handwritten image to text without typing everything again?” – you’re in the right place. By the end you’ll have a ready‑to‑run script that turns a **handwritten note to text** in seconds. + +## What You’ll Need + +- Python 3.8+ (the code works with any recent version) +- The `ocr` library – install it with `pip install ocr-sdk` (replace with your provider’s package name) +- A clear picture of a handwritten note (`hand_note.png` in the example) +- A bit of curiosity and a coffee ☕️ (optional but recommended) + +No heavyweight frameworks, no paid cloud keys – just a local engine that supports **handwritten recognition** out of the box. + +## Step 1 – Install the OCR Package and Import It + +First things first, let’s get the right package on your machine. Open a terminal and run: + +```bash +pip install ocr-sdk +``` + +Once the installation finishes, import the module in your script: + +```python +# Step 1: Import the OCR SDK +import ocr +``` + +> **Pro tip:** If you’re using a virtual environment, activate it before installing. That keeps your project tidy and avoids version clashes. 
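To make the pro tip above concrete: Python ships a `venv` module in the standard library, so you can create an isolated environment without installing anything extra. Here is a minimal sketch (the directory name is arbitrary):

```python
import tempfile
import venv
from pathlib import Path

# Create a throwaway virtual environment – the programmatic twin of
# running `python -m venv .venv` in a terminal.
env_dir = Path(tempfile.mkdtemp()) / "ocr-demo-venv"
venv.create(env_dir, with_pip=False)  # with_pip=True would also bootstrap pip

# Every venv contains a pyvenv.cfg file that marks it as an environment
print((env_dir / "pyvenv.cfg").exists())
```

In day‑to‑day use you would simply run `python -m venv .venv` and then `source .venv/bin/activate` (or `.venv\Scripts\activate` on Windows) before the `pip install` step.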
+
+## Step 2 – Create an OCR Engine and Enable Handwritten Mode
+
+Now we get to the heart of **how to use OCR** – we need an engine instance that knows we’re dealing with cursive strokes rather than printed fonts. The following snippet creates the engine and switches it to handwritten mode:
+
+```python
+# Step 2: Initialize the OCR engine for handwritten text
+ocr_engine = ocr.OcrEngine()
+ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN
+```
+
+Why set `recognition_mode`? Because most OCR engines default to printed‑text detection, which often skips the loops and slants of a personal note. Enabling the handwritten mode boosts accuracy dramatically.
+
+## Step 3 – Load the Image You Want to Convert (Convert Handwritten Image)
+
+Images are the raw material for any OCR job. Make sure your picture is saved in a lossless format (PNG works great) and that the text is reasonably legible. Then load it like this:
+
+```python
+# Step 3: Load the handwritten image you want to convert
+handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png")
+```
+
+If the image lives next to your script, you can simply use `"hand_note.png"` instead of a full path.
+
+> **What if the image is blurry?** Try pre‑processing with OpenCV (e.g., `cv2.cvtColor` to grayscale, `cv2.threshold` to increase contrast) before feeding it to the OCR engine.
+
+## Step 4 – Run the Recognition Engine to Extract Handwritten Text
+
+With the engine ready and the image in memory, we can finally **extract handwritten text**. The `recognize` method returns a raw result object that contains the text plus confidence scores.
+
+```python
+# Step 4: Perform OCR and get the raw result
+raw_result = ocr_engine.recognize(handwritten_image)
+print("Raw OCR output:")
+print(raw_result.text)
+```
+
+Typical raw output might include stray line breaks or mis‑identified characters, especially if the handwriting is messy. That’s why the next step exists.
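If your SDK doesn’t ship a post‑processor at all, you can approximate a very basic cleanup with the standard library alone. This is only a sketch – the `rough_clean` helper and its substitution rules are illustrative, not part of any OCR SDK:

```python
import re

def rough_clean(raw: str) -> str:
    """Tiny stand-in for an AI post-processor: re-joins hyphenated words,
    folds line breaks, and collapses repeated whitespace."""
    text = raw.replace("-\n", "")           # re-join words hyphenated at line ends
    text = re.sub(r"\s*\n\s*", " ", text)   # fold line breaks into single spaces
    text = re.sub(r"[ \t]{2,}", " ", text)  # collapse runs of spaces/tabs
    return text.strip()

sample = "milk  and\neggs,\ntwo toma-\ntoes"
print(rough_clean(sample))  # milk and eggs, two tomatoes
```

Real handwriting cleanup needs far more rules (or an LLM, as shown next), but even this small pass makes raw dumps noticeably easier to read.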
+ +## Step 5 – (Optional) Polish the Output with the AI Post‑Processor + +Most modern OCR SDKs ship with a lightweight AI post‑processor that cleans up spacing, fixes common OCR errors, and normalizes line endings. Running it is as easy as: + +```python +# Step 5: Refine the raw OCR output (handwritten note to text) +polished_result = ocr_engine.run_postprocessor(raw_result) + +# Display the cleaned, readable text +print("\nPolished OCR output:") +print(polished_result.text) +``` + +If you skip this step you’ll still get usable text, but the **handwritten note to text** conversion will look a bit rougher. The post‑processor is especially handy for notes that contain bullet points or mixed‑case words. + +## Step 6 – Verify the Result and Handle Edge Cases + +After printing the polished result, double‑check that everything looks right. Here’s a quick sanity check you can add: + +```python +# Step 6: Simple verification +if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") +else: + print("\n✅ OCR succeeded! You can now save or further process the text.") +``` + +**Edge‑case checklist** + +| Situation | What to do | +|-----------|------------| +| **Very low contrast** | Increase contrast with `cv2.convertScaleAbs` before loading. | +| **Multiple languages** | Set `ocr_engine.language = ["en", "es"]` (or your target languages). | +| **Large documents** | Process pages in batches to avoid memory spikes. | +| **Special symbols** | Add a custom dictionary via `ocr_engine.add_custom_words([...])`. | + +## Visual Overview + +Below is a placeholder image that illustrates the workflow—from a photographed note to clean text. The alt text contains the primary keyword, making the image SEO‑friendly. 
+ +![how to use OCR on a handwritten note image](/images/handwritten_ocr_flow.png "how to use OCR on a handwritten note image") + +## Full, Runnable Script + +Putting all the pieces together, here’s the complete, copy‑and‑paste‑ready program: + +```python +# Complete script: Convert a handwritten image to clean text using OCR + +import ocr + +def main(): + # 1️⃣ Initialize OCR engine for handwritten recognition + ocr_engine = ocr.OcrEngine() + ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN + + # 2️⃣ Load the image containing the handwritten note + handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") + + # 3️⃣ Perform OCR to extract raw text + raw_result = ocr_engine.recognize(handwritten_image) + print("Raw OCR output:") + print(raw_result.text) + + # 4️⃣ (Optional) Run AI post‑processor for cleaner output + polished_result = ocr_engine.run_postprocessor(raw_result) + + # 5️⃣ Show the polished, readable text + print("\nPolished OCR output:") + print(polished_result.text) + + # 6️⃣ Simple sanity check + if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") + else: + print("\n✅ OCR succeeded! You can now save or further process the text.") + +if __name__ == "__main__": + main() +``` + +**Expected output (example)** + +``` +Raw OCR output: +T0d@y I w3nt to the market +and bought 5 aplpes, 2 bananas, +and a loaf of bread. + +Polished OCR output: +Today I went to the market and bought 5 apples, 2 bananas, and a loaf of bread. +``` + +Notice how the post‑processor fixed the “T0d@y” typo and normalized spacing. + +## Common Pitfalls & Pro Tips + +- **Image size matters** – OCR engines usually cap input size at 4 K × 4 K. Resize large photos beforehand. +- **Handwriting style** – Cursive vs. block letters can affect accuracy. If you control the source (e.g., a digital pen), encourage block letters for best results. 
+- **Batch processing** – When dealing with dozens of notes, wrap the script in a loop and store each result in a CSV or SQLite DB. +- **Memory leaks** – Some SDKs keep internal buffers; call `ocr_engine.dispose()` after you’re done if you notice a slowdown. + +## Next Steps – Going Beyond Simple OCR + +Now that you’ve mastered **how to use OCR** for a single image, consider these extensions: + +1. **Integrate with cloud storage** – Pull images from AWS S3 or Azure Blob, run the same pipeline, and push the results back. +2. **Add language detection** – Use `ocr_engine.detect_language()` to automatically switch dictionaries. +3. **Combine with NLP** – Feed the cleaned text into spaCy or NLTK to extract entities, dates, or action items. +4. **Create a REST endpoint** – Wrap the script in Flask or FastAPI so other services can POST images and receive JSON‑encoded text. + +All of these ideas still revolve around the core concepts of **recognize handwritten text**, **extract handwritten text**, and **convert handwritten image**—the exact phrases you’ll likely search for next. + +--- + +### TL;DR + +We showed you **how to use OCR** to recognize handwritten text, extract it, and polish the result into a usable string. The full script is ready to run, the workflow is explained step‑by‑step, and you now have a checklist for common edge cases. Grab a photo of your next meeting note, plug it into the script, and let the machine do the typing for you. + +Happy coding, and may your notes always be readable! 
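As a closing sketch, the batch‑processing idea from the pitfalls section above can be wired up with nothing but the standard library. Note that `ocr_note` here is a hypothetical stand‑in for the recognize‑and‑polish pipeline shown earlier:

```python
import csv
from pathlib import Path

def ocr_note(image_path: Path) -> str:
    # Hypothetical stand-in: replace with the engine calls from this guide.
    return f"recognized text from {image_path.name}"

def notes_to_csv(folder: str, out_csv: str) -> int:
    """Run the (stand-in) OCR pipeline over every PNG in a folder and
    collect the results in a two-column CSV. Returns the row count."""
    rows = [(p.name, ocr_note(p)) for p in sorted(Path(folder).glob("*.png"))]
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "text"])  # header row
        writer.writerows(rows)
    return len(rows)
```

Swap the body of `ocr_note` for the real `recognize` and post‑processor calls and you have a one‑function batch digitizer.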
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/english/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md b/ocr/english/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md new file mode 100644 index 000000000..c116060ad --- /dev/null +++ b/ocr/english/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md @@ -0,0 +1,185 @@ +--- +category: general +date: 2026-03-28 +description: Perform OCR on image and get clean text with bounding box coordinates. + Learn how to extract OCR, clean OCR, and display results step‑by‑step. +draft: false +keywords: +- perform OCR on image +- how to extract OCR +- how to clean OCR +- display bounding box coordinates +- OCR post‑processing +- OCR bounding boxes +language: en +og_description: Perform OCR on image, clean the output, and display bounding box coordinates + in a concise tutorial. +og_title: Perform OCR on Image – Clean Results and Bounding Boxes +tags: +- OCR +- Computer Vision +- Python +title: Perform OCR on Image – Clean Results and Show Bounding Box Coordinates +url: /python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Perform OCR on Image – Clean Results and Show Bounding Box Coordinates + +Ever needed to **perform OCR on image** files but kept getting messy text and unsure where each word lives on the picture? You're not alone. In many projects—invoice digitization, receipt scanning, or simple text extraction—getting raw OCR output is just the first hurdle. The good news? 
You can clean that output and instantly see each region’s bounding box coordinates without writing a ton of boilerplate code. + +In this guide we’ll walk through **how to extract OCR**, run a **how to clean OCR** post‑processor, and finally **display bounding box coordinates** for every cleaned region. By the end you’ll have a single, runnable script that turns a blurry photo into tidy, structured text ready for downstream processing. + +## What You’ll Need + +- Python 3.9+ (the syntax below works on 3.8 and newer) +- An OCR engine that supports `recognize(..., return_structured=True)` – for example, a fictional `engine` library used in the snippet. Replace it with Tesseract, EasyOCR, or any SDK that returns region data. +- Basic familiarity with Python functions and loops +- An image file you want to scan (PNG, JPG, etc.) + +> **Pro tip:** If you’re using Tesseract, the `pytesseract.image_to_data` function already gives you bounding boxes. You can wrap its result in a small adapter that mimics the `engine.recognize` API shown below. + +--- + +![perform OCR on image example](image-placeholder.png "perform OCR on image example") + +*Alt text: diagram showing how to perform OCR on image and visualize bounding box coordinates* + +## Step 1 – Perform OCR on Image and Get Structured Regions + +The first thing is to ask the OCR engine to return not just plain text but a structured list of text regions. This list contains the raw string and the rectangle that encloses it. + +```python +import engine # replace with your actual OCR library +from pathlib import Path + +# Load the image you want to process +image_path = Path("sample_invoice.jpg") +image = engine.load_image(image_path) + +# Step 1: Perform OCR and request a structured list of text regions +raw_result = engine.recognize(image, return_structured=True) +``` + +**Why this matters:** +When you only ask for plain text you lose spatial context. 
Structured data lets you later **display bounding box coordinates**, align text with tables, or feed precise locations to a downstream model. + +## Step 2 – How to Clean OCR Output with a Post‑Processor + +OCR engines are great at spotting characters, but they often leave stray spaces, line‑break artifacts, or mis‑recognized symbols. A post‑processor normalizes the text, fixes common OCR errors, and trims whitespace. + +```python +# Step 2: Clean the plain‑text of each region using the post‑processor +processed_result = engine.run_postprocessor(raw_result) +``` + +If you’re building your own cleaner, consider: + +- Removing non‑ASCII characters (`re.sub(r'[^\x00-\x7F]+',' ', text)`) +- Collapsing multiple spaces into a single space +- Applying a spell‑checker like `pyspellchecker` for obvious typos + +**Why you should care:** +A tidy string makes searching, indexing, and downstream NLP pipelines far more reliable. In other words, **how to clean OCR** is often the difference between a usable dataset and a headache. + +## Step 3 – Display Bounding Box Coordinates for Each Cleaned Region + +Now that the text is tidy, we iterate over each region, printing its rectangle and the cleaned string. This is the part where we finally **display bounding box coordinates**. + +```python +# Step 3 – Iterate over the cleaned regions and display their bounding box and text +for text_region in processed_result.regions: + # Each region has a .bounding_box attribute (x, y, width, height) + bbox = text_region.bounding_box + print(f"[{bbox}] {text_region.text}") +``` + +**Sample output** + +``` +[(34, 120, 210, 30)] Invoice #12345 +[(34, 160, 420, 28)] Date: 2026‑03‑01 +[(34, 200, 380, 28)] Total Amount: $1,254.00 +``` + +You can now feed those coordinates into a drawing library (e.g., OpenCV) to overlay boxes on the original image, or store them in a database for later queries. + +## Full, Ready‑to‑Run Script + +Below is the complete program that ties together all three steps. 
Swap out the placeholder `engine` calls with your actual OCR SDK.
+
+```python
+#!/usr/bin/env python3
+"""
+Perform OCR on image → clean results → display bounding box coordinates.
+Author: Your Name
+Date: 2026‑03‑28
+"""
+
+import engine  # <-- replace with your OCR library
+from pathlib import Path
+import sys
+
+def main(image_path: str):
+    # Load image
+    image = engine.load_image(Path(image_path))
+
+    # 1️⃣ Perform OCR and ask for structured output
+    raw_result = engine.recognize(image, return_structured=True)
+
+    # 2️⃣ Clean the raw text using the built‑in post‑processor
+    processed_result = engine.run_postprocessor(raw_result)
+
+    # 3️⃣ Show each region's bounding box and cleaned text
+    print("\n=== Cleaned OCR Regions ===")
+    for region in processed_result.regions:
+        bbox = region.bounding_box  # (x, y, w, h)
+        print(f"[{bbox}] {region.text}")
+
+if __name__ == "__main__":
+    if len(sys.argv) != 2:
+        print("Usage: python perform_ocr.py <image_path>")
+        sys.exit(1)
+    main(sys.argv[1])
+```
+
+### How to Run
+
+```bash
+python perform_ocr.py sample_invoice.jpg
+```
+
+You should see a list of bounding boxes paired with cleaned text, exactly like the sample output above.
+
+## Frequently Asked Questions & Edge Cases
+
+| Question | Answer |
+|----------|--------|
+| **What if the OCR engine doesn’t support `return_structured`?** | Write a thin wrapper that converts the engine’s raw output (usually a list of words with coordinates) into objects with `text` and `bounding_box` attributes. |
+| **Can I get confidence scores?** | Many SDKs expose a confidence metric per region. Append it to the print statement: `print(f"[{bbox}] {region.text} (conf: {region.confidence:.2f})")`. |
+| **How to handle rotated text?** | Pre‑process the image with OpenCV’s `cv2.minAreaRect` to deskew before calling `recognize`. |
+| **What if I need the output in JSON?** | Serialize `processed_result.regions` with `json.dumps([r.__dict__ for r in processed_result.regions], indent=2)`. 
| +| **Is there a way to visualize the boxes?** | Use OpenCV: `cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0), 2)` inside the loop, then `cv2.imwrite("annotated.jpg", img)`. | + +## Wrapping Up + +You’ve just learned **how to perform OCR on image**, clean the raw output, and **display bounding box coordinates** for every region. The three‑step flow—recognize → post‑process → iterate—is a reusable pattern you can drop into any Python project that needs reliable text extraction. + +### What’s Next? + +- **Explore different OCR back‑ends** (Tesseract, EasyOCR, Google Vision) and compare accuracy. +- **Integrate with a database** to store region data for searchable archives. +- **Add language detection** to route each region through the appropriate spell‑checker. +- **Overlay boxes on the original image** for visual verification (see the OpenCV snippet above). + +If you run into quirks, remember that the biggest win comes from a solid post‑processing step; a clean string is far easier to work with than a raw dump of characters. + +Happy coding, and may your OCR pipelines be ever tidy! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/english/python/general/python-ocr-tutorial-extract-text-from-images/_index.md b/ocr/english/python/general/python-ocr-tutorial-extract-text-from-images/_index.md new file mode 100644 index 000000000..14f083e7c --- /dev/null +++ b/ocr/english/python/general/python-ocr-tutorial-extract-text-from-images/_index.md @@ -0,0 +1,231 @@ +--- +category: general +date: 2026-03-28 +description: Python OCR tutorial showing how to extract text image python with Aspose + OCR Cloud. Learn to load image for OCR and convert image plain text in minutes. 
+draft: false +keywords: +- python ocr tutorial +- extract text image python +- ocr image to text +- load image for ocr +- convert image plain text +language: en +og_description: Python OCR tutorial explains how to load image for OCR and convert + image plain text using Aspose OCR Cloud. Get the full code and tips. +og_title: Python OCR Tutorial – Extract Text from Images +tags: +- OCR +- Python +- Image Processing +title: Python OCR Tutorial – Extract Text from Images +url: /python/general/python-ocr-tutorial-extract-text-from-images/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Python OCR Tutorial – Extract Text from Images + +Ever wondered how to turn a messy receipt photo into clean, searchable text? You're not the only one. In my experience, the biggest hurdle isn’t the OCR engine itself but getting the image into the right format and pulling the plain text out without a hitch. + +This **python ocr tutorial** walks you through every step—loading an image for OCR, running the recognition, and finally converting the image plain text into a Python string you can store or analyse. By the end you’ll be able to **extract text image python** style, and you won’t need any paid licence to get started. + +## What You’ll Learn + +- How to install and import the Aspose OCR Cloud SDK for Python. +- The exact code to **load image for OCR** (PNG, JPEG, TIFF, PDF, etc.). +- How to call the engine to perform **ocr image to text** conversion. +- Tips for handling common edge‑cases like multi‑page PDFs or low‑resolution scans. +- Ways to verify the output and what to do if the text looks garbled. + +### Prerequisites + +- Python 3.8+ installed on your machine. +- A free Aspose Cloud account (the trial works without a licence). +- Basic familiarity with pip and virtual environments—nothing fancy. + +> **Pro tip:** If you’re already using a virtualenv, activate it now. 
It keeps your dependencies tidy and avoids version clashes.

![Python OCR tutorial screenshot showing recognized text](path/to/ocr_example.png "Python OCR tutorial – extracted plain text display")

## Step 1 – Install the Aspose OCR Cloud SDK

First things first, we need the library that talks to Aspose’s OCR service. Open a terminal and run:

```bash
pip install asposeocrcloud
```

That single command pulls the latest SDK (currently version 23.12). The package includes everything you need—no extra image‑processing libs required.

## Step 2 – Initialise the OCR Engine

Now that the SDK is ready, we can spin up the OCR engine. The constructor doesn’t need any licence key for the trial, which keeps things simple.

```python
import asposeocrcloud as ocr

# Initialise the OCR engine – no licence needed for trial use
ocr_engine = ocr.OcrEngine()
```

> **Why this matters:** Initialising the engine only once keeps the subsequent calls fast. If you re‑create the object for every image you’ll waste network round‑trips.

## Step 3 – Load Image for OCR

Here’s where we **load an image for OCR**. The SDK’s `Image.load` method accepts a file path or a URL, and it automatically detects the format (PNG, JPEG, TIFF, PDF, etc.). Let’s load a sample receipt:

```python
# Step 3: Load the input image (PNG, JPEG, TIFF, PDF …)
input_image = ocr.Image.load(r"YOUR_DIRECTORY/receipt.png")
```

If you’re dealing with a multi‑page PDF, simply point to the PDF file; the SDK will treat each page as a separate image internally.

## Step 4 – Perform OCR Image to Text Conversion

With the image in memory, the actual OCR happens in a single line. The `recognize` method returns an `OcrResult` object that contains the plain text, confidence scores, and even bounding boxes if you need them later.
+ +```python +# Step 4: Perform OCR on the loaded image +ocr_result = ocr_engine.recognize(input_image) +``` + +> **Edge case:** For low‑resolution pictures (under 300 dpi) you might want to upscale the image first. The SDK offers a `Resize` helper, but for most receipts the default works fine. + +## Step 5 – Convert Image Plain Text to a Usable String + +The final piece of the puzzle is extracting the plain text from the result object. This is the **convert image plain text** step that turns the OCR blob into something you can print, store, or feed into another system. + +```python +# Step 5: Output the recognised plain text +print("Plain OCR:") +print(ocr_result.text) +``` + +When you run the script, you should see something like: + +``` +Plain OCR: +Starbucks Coffee +Date: 2026/03/27 +Total: $4.75 +Thank you! +``` + +That output is now a regular Python string, ready for CSV export, database insertion, or natural‑language processing. + +## Handling Common Pitfalls + +### 1. Blank or Noisy Images + +If `ocr_result.text` comes back empty, double‑check the image quality. A quick fix is to add a preprocessing step: + +```python +# Simple preprocessing – convert to grayscale and increase contrast +processed = input_image.to_grayscale().adjust_contrast(1.2) +ocr_result = ocr_engine.recognize(processed) +``` + +### 2. Multi‑Page PDFs + +When you feed a PDF, `recognize` returns results for each page. Loop through them like this: + +```python +pdf_image = ocr.Image.load("document.pdf") +pages = pdf_image.pages # collection of page images + +for i, page in enumerate(pages, start=1): + result = ocr_engine.recognize(page) + print(f"--- Page {i} ---") + print(result.text) +``` + +### 3. Language Support + +Aspose OCR supports over 60 languages. 
To switch the language, set the `language` property before calling `recognize`: + +```python +ocr_engine.language = "fr" # French +ocr_result = ocr_engine.recognize(input_image) +``` + +## Full Working Example + +Putting it all together, here’s a complete, copy‑paste‑ready script that covers everything from installation to edge‑case handling: + +```python +# -*- coding: utf-8 -*- +""" +Python OCR tutorial – complete example using Aspose OCR Cloud. +Demonstrates loading an image, performing OCR, and handling multi‑page PDFs. +""" + +import asposeocrcloud as ocr +import os + +def ocr_file(filepath: str, language: str = "en"): + """ + Perform OCR on a given file (image or PDF) and return plain text. + """ + # Initialise engine (trial licence) + engine = ocr.OcrEngine() + engine.language = language + + # Load the file – SDK auto‑detects format + image = ocr.Image.load(filepath) + + # If it's a PDF, iterate over pages + if image.is_pdf: + all_text = [] + for page in image.pages: + result = engine.recognize(page) + all_text.append(result.text) + return "\n".join(all_text) + + # Single‑image case + result = engine.recognize(image) + return result.text + + +if __name__ == "__main__": + # Example usage – replace with your own path + sample_path = os.path.join("YOUR_DIRECTORY", "receipt.png") + + if not os.path.exists(sample_path): + raise FileNotFoundError(f"File not found: {sample_path}") + + extracted = ocr_file(sample_path) + print("=== Extracted Text ===") + print(extracted) +``` + +Run the script (`python ocr_demo.py`) and you’ll see the **ocr image to text** output right in your console. + +## Recap – What We Covered + +- Installed the **Aspose OCR Cloud** SDK (`pip install asposeocrcloud`). +- **Initialised the OCR engine** without a licence (perfect for trial). +- Demonstrated how to **load image for OCR**, whether it’s a PNG, JPEG, or PDF. +- Executed **ocr image to text** conversion and **converted image plain text** into a usable Python string. 
+- Tackled common pitfalls like low‑resolution scans, multi‑page PDFs, and language selection. + +## Next Steps & Related Topics + +Now that you’ve mastered the **python ocr tutorial**, consider exploring: + +- **Extract text image python** for batch processing large folders of receipts. +- Integrating the OCR output with **pandas** for data analysis (`df = pd.read_csv(StringIO(extracted))`). +- Using **Tesseract OCR** as a fallback when internet connectivity is limited. +- Adding post‑processing with **spaCy** to identify entities like dates, amounts, and merchant names. + +Feel free to experiment: try different image formats, tweak the contrast, or switch languages. The OCR landscape is broad, and the skills you’ve just picked up are a solid foundation for any document‑automation project. + +Happy coding, and may your text always be readable! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/english/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md b/ocr/english/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md new file mode 100644 index 000000000..9e0846277 --- /dev/null +++ b/ocr/english/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md @@ -0,0 +1,219 @@ +--- +category: general +date: 2026-03-28 +description: Learn how to run OCR on image, download Hugging Face model automatically, + clean OCR text and configure LLM model in Python using Aspose OCR Cloud. +draft: false +keywords: +- run OCR on image +- download hugging face model +- clean OCR text +- configure LLM model +language: en +og_description: Run OCR on image and clean the output using an auto‑downloaded Hugging Face + model. This guide shows how to configure LLM model in Python. 
+og_title: Run OCR on Image – Complete Aspose OCR Cloud Tutorial +tags: +- OCR +- Python +- LLM +- HuggingFace +title: Run OCR on Image with Aspose OCR Cloud – Full Step‑by‑Step Guide +url: /python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Run OCR on Image – Complete Aspose OCR Cloud Tutorial + +Ever needed to run OCR on image files but the raw output looked like a jumbled mess? In my experience the biggest pain point isn’t the recognition itself—it’s the cleanup. Luckily, Aspose OCR Cloud lets you attach an LLM post‑processor that can *clean OCR text* automatically. In this tutorial we’ll walk through everything you need: from **downloading a Hugging Face model** to configuring the LLM, running the OCR engine, and finally polishing the result. + +By the end of this guide you’ll have a ready‑to‑run script that: + +1. Pulls a compact Qwen 2.5 model from Hugging Face (auto‑downloaded for you). +2. Configures the model to run part of the network on GPU and the rest on CPU. +3. Executes the OCR engine on a handwritten note image. +4. Uses the LLM to clean the recognised text, giving you human‑readable output. + +> **Prerequisites** – Python 3.8+, `asposeocrcloud` package, a GPU with at least 4 GB VRAM (optional but recommended), and an internet connection for the first model download. + +--- + +## What You’ll Need + +- **Aspose OCR Cloud SDK** – install via `pip install asposeocrcloud`. +- **A sample image** – e.g., `handwritten_note.jpg` placed in a local folder. +- **GPU support** – if you have a CUDA‑enabled GPU, the script will offload 30 layers; otherwise it will fall back to CPU automatically. +- **Write permission** – the script caches the model in `YOUR_DIRECTORY`; make sure the folder exists. 
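If you want to sanity‑check those prerequisites up front, a few lines of standard‑library Python will do it. This is an illustrative sketch, not part of the Aspose SDK – `YOUR_DIRECTORY` is the same placeholder used in the steps below, and the `nvidia-smi` probe is only a rough signal that a CUDA‑capable GPU is present:

```python
import os
import shutil

# Placeholder cache folder for the downloaded model – adjust to your setup
model_dir = "YOUR_DIRECTORY"

# Make sure the cache folder exists (the SDK needs write access to it)
os.makedirs(model_dir, exist_ok=True)
print(f"Model cache ready: {os.path.abspath(model_dir)}")

# Rough GPU check: if the NVIDIA driver tools are on PATH, CUDA is likely available
if shutil.which("nvidia-smi"):
    print("NVIDIA tooling found – keeping gpu_layers=30 should work")
else:
    print("No NVIDIA tooling found – set gpu_layers=0 to run on CPU")
```

If the CPU branch prints, remember to change `gpu_layers` in the configuration below.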
+ +--- + +## Step 1 – Configure the LLM Model (download Hugging Face model) + +The first thing we do is tell Aspose AI where to fetch the model from. The `AsposeAIModelConfig` class handles auto‑download, quantization, and GPU layer allocation. + +```python +import asposeocrcloud as ocr +from asposeocrcloud.ai import AsposeAI, AsposeAIModelConfig + +# ---------------------------------------------------------------------- +# Step 1: Model configuration – this will download the model if it’s missing +# ---------------------------------------------------------------------- +model_config = AsposeAIModelConfig( + allow_auto_download="true", # Enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", # Repo on Hugging Face + hugging_face_quantization="int8", # Small footprint, fast inference + gpu_layers=30, # 30 layers on GPU, rest on CPU + directory_model_path=r"YOUR_DIRECTORY" # Cache folder (optional) +) +``` + +**Why this matters** – Quantizing to `int8` shaves memory usage dramatically (≈ 4 GB vs 12 GB). Splitting the model between GPU and CPU lets you run a 3‑billion‑parameter LLM even on a modest RTX 3060. If you don’t have a GPU, set `gpu_layers=0` and the SDK will keep everything on CPU. + +> **Tip:** The first run will download ~ 1.5 GB, so give it a few minutes and a stable connection. + +--- + +## Step 2 – Initialise the AI Engine with the Model Configuration + +Now we spin up the Aspose AI engine and feed it the configuration we just created. + +```python +# ---------------------------------------------------------------------- +# Step 2: Initialise the AI engine – pulls the model if needed +# ---------------------------------------------------------------------- +ocr_ai = AsposeAI() +ocr_ai.initialize(model_config) # This call blocks until the model is ready +``` + +**What’s happening under the hood?** The SDK checks `directory_model_path` for an existing model. 
If it finds a matching version it loads it instantly; otherwise it downloads the GGUF file from Hugging Face, unpacks it, and prepares the inference pipeline. + +--- + +## Step 3 – Create the OCR Engine and Attach the AI Post‑Processor + +The OCR engine does the heavy lifting of recognising characters. By attaching `ocr_ai.run_postprocessor` we enable **clean OCR text** automatically after recognition. + +```python +# ---------------------------------------------------------------------- +# Step 3: Build the OCR engine and bind the LLM post‑processor +# ---------------------------------------------------------------------- +ocr_engine = ocr.OcrEngine() +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=None) +``` + +**Why use a post‑processor?** Raw OCR often includes line breaks in the wrong places, mis‑detected punctuation, or stray symbols. The LLM can rewrite the output into proper sentences, correct spelling, and even infer missing words—essentially turning a raw dump into polished prose. + +--- + +## Step 4 – Run OCR on an Image File + +With everything wired together, it’s time to feed an image to the engine. + +```python +# ---------------------------------------------------------------------- +# Step 4: Load the image and run OCR +# ---------------------------------------------------------------------- +input_image = ocr.Image.load(r"YOUR_DIRECTORY/handwritten_note.jpg") +raw_result = ocr_engine.recognize(input_image) # Returns an OcrResult object +``` + +**Edge case:** If the image is large (> 5 MP), you might want to resize it first to speed up processing. The SDK accepts a Pillow `Image` object, so you can pre‑process with `PIL.Image.thumbnail()` if needed. + +--- + +## Step 5 – Let the AI Clean Up the Recognised Text and Show Both Versions + +Finally we invoke the post‑processor we attached earlier. This step demonstrates the contrast between *before* and *after* cleaning. 
+ +```python +# ---------------------------------------------------------------------- +# Step 5: Clean the OCR output using the LLM and display both results +# ---------------------------------------------------------------------- +cleaned_result = ocr_engine.run_postprocessor(raw_result) + +print("=== Before AI ===") +print(raw_result.text) + +print("\n=== After AI ===") +print(cleaned_result.text) +``` + +### Expected Output + +``` +=== Before AI === +Th1s 1s a h@ndwr1tt3n n0te. It c0nta1ns m1st@k3s, l1n3 br3aks, & sp3c!@l ch@r@ct3rs. + +=== After AI === +This is a handwritten note. It contains mistakes, line breaks, and special characters. +``` + +Notice how the LLM has: + +- Fixed common OCR mis‑recognitions (`Th1s` → `This`). +- Removed stray symbols (`&` → `and`). +- Normalised line breaks into proper sentences. + +--- + +## 🎨 Visual Overview (Run OCR on image Workflow) + +![Run OCR on image workflow](run_ocr_on_image_workflow.png "Diagram showing the run OCR on image pipeline from model download to cleaned output") + +The diagram above summarises the full pipeline: **download Hugging Face model → configure LLM → initialise AI → OCR engine → AI post‑processor → clean OCR text**. + +--- + +## Common Questions & Pro Tips + +### What if I don’t have a GPU? + +Set `gpu_layers=0` in the `AsposeAIModelConfig`. The model will run entirely on CPU, which is slower but still functional. You can also switch to a smaller model (e.g., `Qwen/Qwen2.5-1.5B‑Instruct‑GGUF`) to keep inference time reasonable. + +### How do I change the model later? + +Just update `hugging_face_repo_id` and re‑run `ocr_ai.initialize(model_config)`. The SDK will detect the version change, download the new model, and replace the cached files. + +### Can I customise the post‑processor prompt? + +Yes. Pass a dictionary to `custom_settings` with a `prompt_template` key. 
For example: + +```python +custom_prompt = { + "prompt_template": "Correct the following OCR text and keep line breaks:\n{ocr_text}" +} +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=custom_prompt) +``` + +### Should I store the cleaned text to a file? + +Definitely. After cleaning you can write the result to a `.txt` or `.json` file for downstream processing: + +```python +with open("cleaned_note.txt", "w", encoding="utf-8") as f: + f.write(cleaned_result.text) +``` + +--- + +## Conclusion + +We’ve just shown you how to **run OCR on image** files with Aspose OCR Cloud, automatically **download a Hugging Face model**, expertly **configure LLM model** settings, and finally **clean OCR text** using a powerful LLM post‑processor. The whole process fits into a single, easy‑to‑run Python script and works on both GPU‑enabled and CPU‑only machines. + +If you’re comfortable with this pipeline, consider experimenting with: + +- **Different LLMs** – try `meta-llama/Meta-Llama-3-8B‑Instruct‑GGUF` for a larger context window. +- **Batch processing** – loop over a folder of images and aggregate cleaned results into a CSV. +- **Custom prompts** – tailor the AI to your domain (legal documents, medical notes, etc.). + +Feel free to tweak the `gpu_layers` value, swap the model, or plug in your own prompt. The sky’s the limit, and the code you have now is the launchpad. + +Happy coding, and may your OCR outputs be ever clean! 
🚀 + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/french/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md b/ocr/french/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md new file mode 100644 index 000000000..00fb690cd --- /dev/null +++ b/ocr/french/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md @@ -0,0 +1,225 @@ +--- +category: general +date: 2026-03-28 +description: Comment utiliser l'OCR pour reconnaître le texte manuscrit dans les images. + Apprenez à extraire le texte manuscrit, à convertir l'image manuscrite et à obtenir + des résultats propres rapidement. +draft: false +keywords: +- how to use OCR +- recognize handwritten text +- extract handwritten text +- handwritten note to text +- convert handwritten image +language: fr +og_description: Comment utiliser l’OCR pour reconnaître le texte manuscrit. Ce tutoriel + vous montre, étape par étape, comment extraire le texte manuscrit des images et + obtenir des résultats soignés. +og_title: Comment utiliser l’OCR pour reconnaître le texte manuscrit – Guide complet +tags: +- OCR +- Handwriting Recognition +- Python +title: Comment utiliser l’OCR pour reconnaître le texte manuscrit – Guide complet +url: /fr/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Comment utiliser l'OCR pour reconnaître le texte manuscrit – Guide complet + +Comment utiliser l'OCR pour les notes manuscrites est une question que de nombreux développeurs se posent lorsqu'ils doivent numériser des croquis, des comptes‑rendus de réunion ou des idées griffonnées rapidement. 
Dans ce guide, nous parcourrons les étapes exactes pour reconnaître le texte manuscrit, extraire le texte manuscrit et transformer une image manuscrite en chaînes propres et recherchables. + +Si vous avez déjà fixé une photo d’une liste de courses en vous demandant « Puis‑je convertir cette image manuscrite en texte sans tout retaper ? » – vous êtes au bon endroit. À la fin, vous disposerez d’un script prêt à l’emploi qui transforme une **note manuscrite en texte** en quelques secondes. + +## Ce dont vous avez besoin + +- Python 3.8+ (le code fonctionne avec n'importe quelle version récente) +- La bibliothèque `ocr` – installez‑la avec `pip install ocr-sdk` (remplacez par le nom du paquet de votre fournisseur) +- Une image claire d’une note manuscrite (`hand_note.png` dans l’exemple) +- Un peu de curiosité et un café ☕️ (optionnel mais recommandé) + +Pas de frameworks lourds, pas de clés cloud payantes – juste un moteur local qui prend en charge la **reconnaissance manuscrite** dès le départ. + +## Étape 1 – Installer le package OCR et l’importer + +Tout d’abord, obtenons le bon package sur votre machine. Ouvrez un terminal et exécutez : + +```bash +pip install ocr-sdk +``` + +Une fois l’installation terminée, importez le module dans votre script : + +```python +# Step 1: Import the OCR SDK +import ocr +``` + +> **Astuce :** Si vous utilisez un environnement virtuel, activez‑le avant d’installer. Cela garde votre projet propre et évite les conflits de versions. + +## Étape 2 – Créer un moteur OCR et activer le mode manuscrit + +Maintenant nous passons réellement à **comment utiliser l'OCR** – nous avons besoin d’une instance de moteur qui sait que nous traitons des traits cursifs plutôt que des polices imprimées. 
Le fragment suivant crée le moteur et le passe en mode manuscrit : + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = ocr.OcrEngine() +ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN +``` + +Pourquoi définir `recognition_mode` ? Parce que la plupart des moteurs OCR détectent par défaut du texte imprimé, ce qui ignore souvent les boucles et les inclinaisons d’une note personnelle. Activer le mode manuscrit augmente considérablement la précision. + +## Étape 3 – Charger l’image que vous souhaitez convertir (Convertir une image manuscrite) + +Les images sont la matière première de tout travail OCR. Assurez‑vous que votre photo est enregistrée dans un format sans perte (PNG fonctionne très bien) et que le texte est raisonnablement lisible. Chargez‑la ensuite ainsi : + +```python +# Step 3: Load the handwritten image you want to convert +handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") +``` + +Si l’image se trouve à côté de votre script, vous pouvez simplement utiliser `"hand_note.png"` au lieu d’un chemin complet. + +> **Et si l’image est floue ?** Essayez un pré‑traitement avec OpenCV (par ex., `cv2.cvtColor` en niveaux de gris, `cv2.threshold` pour augmenter le contraste) avant de la transmettre au moteur OCR. + +## Étape 4 – Exécuter le moteur de reconnaissance pour extraire le texte manuscrit + +Avec le moteur prêt et l’image en mémoire, nous pouvons enfin **extraire le texte manuscrit**. La méthode `recognize` renvoie un objet résultat brut qui contient le texte ainsi que les scores de confiance. + +```python +# Step 4: Perform OCR and get the raw result +raw_result = ocr_engine.recognize(handwritten_image) +print("Raw OCR output:") +print(raw_result.text) +``` + +La sortie brute typique peut contenir des sauts de ligne parasites ou des caractères mal identifiés, surtout si l’écriture est désordonnée. C’est pourquoi l’étape suivante existe. 
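Avant de passer au post‑processeur IA de l’étape suivante, notez qu’un nettoyage manuel minimal suffit parfois. Le croquis ci‑dessous, indépendant du SDK, recolle les mots coupés par un trait d’union et fusionne les sauts de ligne parasites avec le module `re` de la bibliothèque standard :

```python
import re

def quick_clean(raw_text: str) -> str:
    """Minimal, SDK-independent cleanup of raw OCR text."""
    text = re.sub(r"-\n", "", raw_text)          # re-join words split by a hyphen
    text = re.sub(r"(?<![.!?:])\n", " ", text)   # merge line breaks inside sentences
    text = re.sub(r"[ \t]{2,}", " ", text)       # collapse repeated spaces
    return text.strip()

sample = "Aujourd'hui je suis allé au mar-\nché\net j'ai acheté du pain."
print(quick_clean(sample))
# Aujourd'hui je suis allé au marché et j'ai acheté du pain.
```

Ce nettoyage rudimentaire ne remplace pas un vrai post‑processeur, mais il dépanne lorsque votre SDK n’en fournit pas.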
## Étape 5 – (Optionnel) Affiner la sortie avec le post‑processeur IA

La plupart des SDK OCR modernes sont fournis avec un post‑processeur IA léger qui nettoie les espaces, corrige les erreurs OCR courantes et normalise les fins de ligne. L’exécuter est aussi simple que :

```python
# Step 5: Refine the raw OCR output (handwritten note to text)
polished_result = ocr_engine.run_postprocessor(raw_result)

# Display the cleaned, readable text
print("\nPolished OCR output:")
print(polished_result.text)
```

Si vous sautez cette étape, vous obtiendrez toujours un texte exploitable, mais la conversion **note manuscrite en texte** sera un peu plus rugueuse. Le post‑processeur est particulièrement utile pour les notes contenant des puces ou des mots à casse mixte.

## Étape 6 – Vérifier le résultat et gérer les cas limites

Après avoir affiché le résultat affiné, revérifiez que tout semble correct. Voici une vérification rapide que vous pouvez ajouter :

```python
# Step 6: Simple verification
if not polished_result.text.strip():
    raise ValueError("OCR returned an empty string – check image quality.")
else:
    print("\n✅ OCR succeeded! You can now save or further process the text.")
```

**Checklist des cas limites**

| Situation | Action à faire |
|-----------|----------------|
| **Contraste très faible** | Augmentez le contraste avec `cv2.convertScaleAbs` avant le chargement. |
| **Plusieurs langues** | Définissez `ocr_engine.language = ["en", "es"]` (ou vos langues cibles). |
| **Documents volumineux** | Traitez les pages par lots pour éviter les pics de mémoire. |
| **Symboles spéciaux** | Ajoutez un dictionnaire personnalisé via `ocr_engine.add_custom_words([...])`. |

## Vue d’ensemble visuelle

Ci‑dessous se trouve une image de substitution qui illustre le flux de travail – d’une note photographiée à du texte propre.
+ +![comment utiliser l'OCR sur une image de note manuscrite](/images/handwritten_ocr_flow.png "comment utiliser l'OCR sur une image de note manuscrite") + +## Script complet et exécutable + +En assemblant toutes les pièces, voici le programme complet, prêt à copier‑coller : + +```python +# Complete script: Convert a handwritten image to clean text using OCR + +import ocr + +def main(): + # 1️⃣ Initialize OCR engine for handwritten recognition + ocr_engine = ocr.OcrEngine() + ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN + + # 2️⃣ Load the image containing the handwritten note + handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") + + # 3️⃣ Perform OCR to extract raw text + raw_result = ocr_engine.recognize(handwritten_image) + print("Raw OCR output:") + print(raw_result.text) + + # 4️⃣ (Optional) Run AI post‑processor for cleaner output + polished_result = ocr_engine.run_postprocessor(raw_result) + + # 5️⃣ Show the polished, readable text + print("\nPolished OCR output:") + print(polished_result.text) + + # 6️⃣ Simple sanity check + if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") + else: + print("\n✅ OCR succeeded! You can now save or further process the text.") + +if __name__ == "__main__": + main() +``` + +**Sortie attendue (exemple)** + +``` +Raw OCR output: +T0d@y I w3nt to the market +and bought 5 aplpes, 2 bananas, +and a loaf of bread. + +Polished OCR output: +Today I went to the market and bought 5 apples, 2 bananas, and a loaf of bread. +``` + +Remarquez comment le post‑processeur a corrigé la faute de frappe « T0d@y » et normalisé les espaces. + +## Pièges courants & Astuces pro + +- **La taille de l’image compte** – les moteurs OCR limitent généralement la taille d’entrée à 4 K × 4 K. Redimensionnez les grandes photos au préalable. +- **Style d’écriture** – la cursive vs. les lettres imprimées peuvent affecter la précision. 
Si vous contrôlez la source (par ex., un stylo numérique), privilégiez les lettres imprimées pour de meilleurs résultats. +- **Traitement par lots** – lorsque vous avez des dizaines de notes, encapsulez le script dans une boucle et stockez chaque résultat dans un CSV ou une base de données SQLite. +- **Fuites de mémoire** – certains SDK conservent des tampons internes ; appelez `ocr_engine.dispose()` une fois terminé si vous remarquez un ralentissement. + +## Prochaines étapes – Aller au‑delà de l’OCR simple + +Maintenant que vous avez maîtrisé **comment utiliser l'OCR** pour une image unique, envisagez ces extensions : + +1. **Intégrer avec le stockage cloud** – récupérer les images depuis AWS S3 ou Azure Blob, exécuter le même pipeline et renvoyer les résultats. +2. **Ajouter la détection de langue** – utilisez `ocr_engine.detect_language()` pour changer automatiquement de dictionnaires. +3. **Combiner avec le NLP** – injectez le texte nettoyé dans spaCy ou NLTK pour extraire des entités, des dates ou des actions. +4. **Créer un endpoint REST** – encapsulez le script dans Flask ou FastAPI afin que d’autres services puissent POST des images et recevoir du texte encodé en JSON. + +Toutes ces idées tournent toujours autour des concepts clés de **reconnaître le texte manuscrit**, **extraire le texte manuscrit**, et **convertir une image manuscrite** — les expressions exactes que vous rechercherez probablement ensuite. + +--- + +### TL;DR + +Nous vous avons montré **comment utiliser l'OCR** pour reconnaître le texte manuscrit, l’extraire et affiner le résultat en une chaîne exploitable. Le script complet est prêt à être exécuté, le flux de travail est expliqué étape par étape, et vous disposez maintenant d’une checklist pour les cas limites courants. Prenez une photo de votre prochaine note de réunion, branchez‑la dans le script, et laissez la machine taper pour vous. + +Bonne programmation, et que vos notes soient toujours lisibles ! 
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/french/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md b/ocr/french/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md new file mode 100644 index 000000000..1b5933138 --- /dev/null +++ b/ocr/french/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md @@ -0,0 +1,188 @@ +--- +category: general +date: 2026-03-28 +description: Effectuer une OCR sur l'image et obtenir du texte propre avec les coordonnées + des boîtes englobantes. Apprenez comment extraire l'OCR, nettoyer l'OCR et afficher + les résultats étape par étape. +draft: false +keywords: +- perform OCR on image +- how to extract OCR +- how to clean OCR +- display bounding box coordinates +- OCR post‑processing +- OCR bounding boxes +language: fr +og_description: Effectuer la reconnaissance optique de caractères sur une image, nettoyer + le résultat et afficher les coordonnées des boîtes englobantes dans un tutoriel + concis. +og_title: Effectuer une OCR sur l'image – Résultats propres et boîtes englobantes +tags: +- OCR +- Computer Vision +- Python +title: Effectuer l'OCR sur une image – Nettoyer les résultats et afficher les coordonnées + des boîtes englobantes +url: /fr/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Effectuer une OCR sur une image – Nettoyer les résultats et afficher les coordonnées des boîtes englobantes + +Vous avez déjà eu besoin d'**effectuer une OCR sur des fichiers image** mais vous obteniez du texte désordonné sans savoir où chaque mot se situe sur l'image ? 
Vous n'êtes pas seul. Dans de nombreux projets—numérisation de factures, scan de reçus ou extraction de texte simple—obtenir la sortie brute d'une OCR n'est que le premier obstacle. Bonne nouvelle : vous pouvez nettoyer cette sortie et voir instantanément les coordonnées de la boîte englobante de chaque région sans écrire une tonne de code boilerplate. + +Dans ce guide, nous allons parcourir **comment extraire l'OCR**, exécuter un **post‑processus de nettoyage d'OCR**, puis **afficher les coordonnées des boîtes englobantes** pour chaque région nettoyée. À la fin, vous disposerez d'un script unique, exécutable, qui transforme une photo floue en texte structuré et propre, prêt pour le traitement en aval. + +## Ce dont vous avez besoin + +- Python 3.9+ (la syntaxe ci‑dessous fonctionne sur 3.8 et versions ultérieures) +- Un moteur OCR qui supporte `recognize(..., return_structured=True)` – par exemple, une bibliothèque fictive `engine` utilisée dans l'exemple. Remplacez‑la par Tesseract, EasyOCR ou tout SDK qui renvoie des données de région. +- Une connaissance de base des fonctions et boucles Python +- Un fichier image que vous souhaitez analyser (PNG, JPG, etc.) + +> **Astuce :** Si vous utilisez Tesseract, la fonction `pytesseract.image_to_data` vous fournit déjà les boîtes englobantes. Vous pouvez envelopper son résultat dans un petit adaptateur qui imite l'API `engine.recognize` montrée ci‑dessous. + +--- + +![perform OCR on image example](image-placeholder.png "exemple d'exécution d'OCR sur une image") + +*Texte alternatif : diagramme montrant comment effectuer une OCR sur une image et visualiser les coordonnées des boîtes englobantes* + +## Étape 1 – Effectuer une OCR sur l'image et obtenir les régions structurées + +La première chose à faire est de demander au moteur OCR de renvoyer non seulement du texte brut mais une liste structurée de régions de texte. Cette liste contient la chaîne brute et le rectangle qui l'englobe. 
+ +```python +import engine # replace with your actual OCR library +from pathlib import Path + +# Load the image you want to process +image_path = Path("sample_invoice.jpg") +image = engine.load_image(image_path) + +# Step 1: Perform OCR and request a structured list of text regions +raw_result = engine.recognize(image, return_structured=True) +``` + +**Pourquoi c'est important :** +Lorsque vous ne demandez que du texte brut, vous perdez le contexte spatial. Les données structurées vous permettent ensuite **d'afficher les coordonnées des boîtes englobantes**, d'aligner le texte avec des tableaux ou d'alimenter des modèles en aval avec des emplacements précis. + +## Étape 2 – Comment nettoyer la sortie OCR avec un post‑processus + +Les moteurs OCR sont excellents pour repérer les caractères, mais ils laissent souvent des espaces superflus, des artefacts de retour à la ligne ou des symboles mal reconnus. Un post‑processus normalise le texte, corrige les erreurs OCR courantes et supprime les espaces inutiles. + +```python +# Step 2: Clean the plain‑text of each region using the post‑processor +processed_result = engine.run_postprocessor(raw_result) +``` + +Si vous créez votre propre nettoyeur, pensez à : + +- Supprimer les caractères non‑ASCII (`re.sub(r'[^\x00-\x7F]+',' ', text)`) +- Réduire plusieurs espaces à un seul espace +- Appliquer un correcteur orthographique comme `pyspellchecker` pour les fautes évidentes + +**Pourquoi cela compte :** +Une chaîne propre rend la recherche, l'indexation et les pipelines NLP en aval beaucoup plus fiables. En d'autres termes, **comment nettoyer l'OCR** est souvent la différence entre un jeu de données exploitable et un cauchemar. + +## Étape 3 – Afficher les coordonnées des boîtes englobantes pour chaque région nettoyée + +Maintenant que le texte est propre, nous parcourons chaque région, affichant son rectangle et la chaîne nettoyée. C'est la partie où nous **affichons enfin les coordonnées des boîtes englobantes**. 
+
+```python
+# Step 3 – Iterate over the cleaned regions and display their bounding box and text
+for text_region in processed_result.regions:
+    # Each region has a .bounding_box attribute (x, y, width, height)
+    bbox = text_region.bounding_box
+    print(f"[{bbox}] {text_region.text}")
+```
+
+**Exemple de sortie**
+
+```
+[(34, 120, 210, 30)] Invoice #12345
+[(34, 160, 420, 28)] Date: 2026‑03‑01
+[(34, 200, 380, 28)] Total Amount: $1,254.00
+```
+
+Vous pouvez maintenant injecter ces coordonnées dans une bibliothèque de dessin (par ex., OpenCV) pour superposer des boîtes sur l'image originale, ou les stocker dans une base de données pour des requêtes ultérieures.
+
+## Script complet, prêt à l'exécution
+
+Voici le programme complet qui réunit les trois étapes. Remplacez les appels fictifs `engine` par votre SDK OCR réel.
+
+```python
+#!/usr/bin/env python3
+"""
+Perform OCR on image → clean results → display bounding box coordinates.
+Author: Your Name
+Date: 2026‑03‑28
+"""
+
+import engine  # <-- replace with your OCR library
+from pathlib import Path
+import sys
+
+def main(image_path: str):
+    # Load image
+    image = engine.load_image(Path(image_path))
+
+    # 1️⃣ Perform OCR and ask for structured output
+    raw_result = engine.recognize(image, return_structured=True)
+
+    # 2️⃣ Clean the raw text using the built‑in post‑processor
+    processed_result = engine.run_postprocessor(raw_result)
+
+    # 3️⃣ Show each region's bounding box and cleaned text
+    print("\n=== Cleaned OCR Regions ===")
+    for region in processed_result.regions:
+        bbox = region.bounding_box  # (x, y, w, h)
+        print(f"[{bbox}] {region.text}")
+
+if __name__ == "__main__":
+    if len(sys.argv) != 2:
+        print("Usage: python perform_ocr.py <image_path>")
+        sys.exit(1)
+    main(sys.argv[1])
+```
+
+### Comment l'exécuter
+
+```bash
+python perform_ocr.py sample_invoice.jpg
+```
+
+Vous devriez voir une liste de boîtes englobantes associées à du texte nettoyé, exactement comme l'exemple de sortie ci‑dessus.
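Si votre SDK ne fournit pas de post‑processeur intégré, vous pouvez en écrire un vous‑même à partir des suggestions de l'étape 2 (suppression des caractères non‑ASCII, réduction des espaces). Voici un sketch minimal en Python pur, indépendant du moteur OCR (le nom `clean_ocr_text` est une convention de cet exemple) :

```python
import re

def clean_ocr_text(text: str) -> str:
    # Replace non-ASCII artifacts with a space, as suggested in Step 2
    text = re.sub(r'[^\x00-\x7F]+', ' ', text)
    # Collapse runs of whitespace inside each line and drop empty lines
    lines = [re.sub(r'\s+', ' ', line).strip() for line in text.splitlines()]
    return '\n'.join(line for line in lines if line)

print(clean_ocr_text("Invoice   #12345 €\n\n\n Total:  $1,254.00"))
# Invoice #12345
# Total: $1,254.00
```

Cette fonction peut remplacer l'appel à `engine.run_postprocessor` dans le script ci‑dessus si votre bibliothèque n'offre pas de nettoyage natif.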
+ +## Questions fréquentes & cas particuliers + +| Question | Réponse | +|----------|---------| +| **Et si le moteur OCR ne supporte pas `return_structured` ?** | Écrivez un petit wrapper qui convertit la sortie brute du moteur (généralement une liste de mots avec coordonnées) en objets possédant les attributs `text` et `bounding_box`. | +| **Puis‑je obtenir des scores de confiance ?** | De nombreux SDK exposent une métrique de confiance par région. Ajoutez‑la à l'instruction d'affichage : `print(f"[{bbox}] {region.text} (conf: {region.confidence:.2f})")`. | +| **Comment gérer du texte tourné ?** | Pré‑traitez l'image avec `cv2.minAreaRect` d'OpenCV pour la redresser avant d'appeler `recognize`. | +| **Et si j'ai besoin du résultat en JSON ?** | Sérialisez `processed_result.regions` avec `json.dumps([r.__dict__ for r in processed_result.regions], indent=2)`. | +| **Existe‑t‑il un moyen de visualiser les boîtes ?** | Utilisez OpenCV : `cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0), 2)` dans la boucle, puis `cv2.imwrite("annotated.jpg", img)`. | + +## Conclusion + +Vous venez d'apprendre **comment effectuer une OCR sur une image**, nettoyer la sortie brute, et **afficher les coordonnées des boîtes englobantes** pour chaque région. Le flux en trois étapes—reconnaître → post‑processer → itérer—est un modèle réutilisable que vous pouvez intégrer à n'importe quel projet Python nécessitant une extraction de texte fiable. + +### Et après ? + +- **Explorez différents back‑ends OCR** (Tesseract, EasyOCR, Google Vision) et comparez leur précision. +- **Intégrez une base de données** pour stocker les données de région et créer des archives consultables. +- **Ajoutez la détection de langue** afin de diriger chaque région vers le correcteur orthographique approprié. +- **Superposez les boîtes sur l'image originale** pour une vérification visuelle (voir le snippet OpenCV ci‑dessus). 
+ +Si vous rencontrez des particularités, souvenez‑vous que le plus grand gain provient d'une étape de post‑traitement solide ; une chaîne propre est bien plus facile à manipuler qu'un dump brut de caractères. + +Bon codage, et que vos pipelines OCR restent toujours impeccables ! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/french/python/general/python-ocr-tutorial-extract-text-from-images/_index.md b/ocr/french/python/general/python-ocr-tutorial-extract-text-from-images/_index.md new file mode 100644 index 000000000..8d445eeca --- /dev/null +++ b/ocr/french/python/general/python-ocr-tutorial-extract-text-from-images/_index.md @@ -0,0 +1,233 @@ +--- +category: general +date: 2026-03-28 +description: Tutoriel OCR Python montrant comment extraire du texte d’une image avec + Aspose OCR Cloud. Apprenez à charger une image pour l’OCR et à convertir l’image + en texte brut en quelques minutes. +draft: false +keywords: +- python ocr tutorial +- extract text image python +- ocr image to text +- load image for ocr +- convert image plain text +language: fr +og_description: Le tutoriel OCR Python explique comment charger une image pour l'OCR + et convertir le texte brut de l'image en utilisant Aspose OCR Cloud. Obtenez le + code complet et des astuces. 
+og_title: Tutoriel OCR Python – Extraire du texte d'images +tags: +- OCR +- Python +- Image Processing +title: Tutoriel OCR Python – Extraire le texte des images +url: /fr/python/general/python-ocr-tutorial-extract-text-from-images/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Tutoriel Python OCR – Extraire du texte à partir d'images + +Vous êtes-vous déjà demandé comment transformer une photo de reçu brouillonne en texte propre et consultable ? Vous n'êtes pas le seul. D'après mon expérience, le principal obstacle n’est pas le moteur OCR lui‑même, mais bien de mettre l’image au bon format et d’extraire le texte brut sans accroc. + +Ce **python ocr tutorial** vous guide pas à pas : charger une image pour l’OCR, lancer la reconnaissance, puis convertir le texte brut de l’image en une chaîne Python que vous pouvez stocker ou analyser. À la fin, vous saurez **extract text image python** et vous n’aurez besoin d’aucune licence payante pour commencer. + +## Ce que vous allez apprendre + +- Comment installer et importer le Aspose OCR Cloud SDK for Python. +- Le code exact pour **load image for OCR** (PNG, JPEG, TIFF, PDF, etc.). +- Comment appeler le moteur pour réaliser la conversion **ocr image to text**. +- Astuces pour gérer les cas limites courants comme les PDF multi‑pages ou les scans basse résolution. +- Méthodes pour vérifier la sortie et que faire si le texte apparaît illisible. + +### Prérequis + +- Python 3.8+ installé sur votre machine. +- Un compte gratuit Aspose Cloud (l’essai fonctionne sans licence). +- Une connaissance de base de pip et des environnements virtuels — rien de compliqué. + +> **Pro tip :** Si vous utilisez déjà un virtualenv, activez‑le maintenant. Cela garde vos dépendances propres et évite les conflits de version. 
+ +![Capture d’écran du tutoriel Python OCR montrant le texte reconnu](path/to/ocr_example.png "Tutoriel Python OCR – affichage du texte brut extrait") + +## Étape 1 – Installer le Aspose OCR Cloud SDK + +Première chose, il nous faut la bibliothèque qui communique avec le service OCR d’Aspose. Ouvrez un terminal et exécutez : + +```bash +pip install asposeocrcloud +``` + +Cette unique commande télécharge le SDK le plus récent (actuellement version 23.12). Le package contient tout ce dont vous avez besoin — aucune bibliothèque de traitement d’image supplémentaire n’est requise. + +## Étape 2 – Initialiser le moteur OCR (Mot‑clé principal en action) + +Maintenant que le SDK est prêt, nous pouvons lancer le moteur du **python ocr tutorial**. Le constructeur ne nécessite aucune clé de licence pour l’essai, ce qui simplifie les choses. + +```python +import asposeocrcloud as ocr + +# Initialise the OCR engine – no licence needed for trial use +ocr_engine = ocr.OcrEngine() +``` + +> **Pourquoi c’est important :** Initialiser le moteur une seule fois garde les appels suivants rapides. Si vous recréez l’objet pour chaque image, vous gaspillerez des allers‑retours réseau. + +## Étape 3 – Charger l’image pour l’OCR + +C’est ici que le mot‑clé **load image for OCR** prend tout son sens. La méthode `Image.load` du SDK accepte un chemin de fichier ou une URL, et détecte automatiquement le format (PNG, JPEG, TIFF, PDF, etc.). Chargons un reçu d’exemple : + +```python +# Step 3: Load the input image (PNG, JPEG, TIFF, PDF …) +input_image = ocr.Image.load(r"YOUR_DIRECTORY/receipt.png") +``` + +Si vous travaillez avec un PDF multi‑pages, indiquez simplement le fichier PDF ; le SDK traitera chaque page comme une image distincte en interne. + +## Étape 4 – Effectuer la conversion OCR Image to Text + +Avec l’image en mémoire, l’OCR réel s’effectue en une seule ligne. 
La méthode `recognize` renvoie un objet `OcrResult` contenant le texte brut, les scores de confiance, et même les boîtes englobantes si vous en avez besoin plus tard. + +```python +# Step 4: Perform OCR on the loaded image +ocr_result = ocr_engine.recognize(input_image) +``` + +> **Cas limite :** Pour des images basse résolution (moins de 300 dpi) vous pourriez vouloir les agrandir d’abord. Le SDK propose un helper `Resize`, mais pour la plupart des reçus le réglage par défaut suffit. + +## Étape 5 – Convertir le texte brut de l’image en une chaîne exploitable + +Le dernier maillon du puzzle consiste à extraire le texte brut de l’objet résultat. C’est l’étape **convert image plain text** qui transforme le blob OCR en une chaîne que vous pouvez afficher, stocker ou transmettre à un autre système. + +```python +# Step 5: Output the recognised plain text +print("Plain OCR:") +print(ocr_result.text) +``` + +Lorsque vous exécuterez le script, vous devriez obtenir quelque chose comme : + +``` +Plain OCR: +Starbucks Coffee +Date: 2026/03/27 +Total: $4.75 +Thank you! +``` + +Cette sortie est maintenant une chaîne Python ordinaire, prête pour une exportation CSV, une insertion en base de données, ou du traitement de langage naturel. + +## Gestion des écueils courants + +### 1. Images vides ou bruyantes + +Si `ocr_result.text` revient vide, revérifiez la qualité de l’image. Une solution rapide consiste à ajouter une étape de pré‑traitement : + +```python +# Simple preprocessing – convert to grayscale and increase contrast +processed = input_image.to_grayscale().adjust_contrast(1.2) +ocr_result = ocr_engine.recognize(processed) +``` + +### 2. PDF multi‑pages + +Lorsque vous fournissez un PDF, `recognize` renvoie des résultats pour chaque page. 
Parcourez‑les ainsi : + +```python +pdf_image = ocr.Image.load("document.pdf") +pages = pdf_image.pages # collection of page images + +for i, page in enumerate(pages, start=1): + result = ocr_engine.recognize(page) + print(f"--- Page {i} ---") + print(result.text) +``` + +### 3. Prise en charge des langues + +Aspose OCR supporte plus de 60 langues. Pour changer de langue, définissez la propriété `language` avant d’appeler `recognize` : + +```python +ocr_engine.language = "fr" # French +ocr_result = ocr_engine.recognize(input_image) +``` + +## Exemple complet fonctionnel + +En rassemblant le tout, voici un script complet, prêt à copier‑coller, qui couvre l’installation ainsi que la gestion des cas limites : + +```python +# -*- coding: utf-8 -*- +""" +Python OCR tutorial – complete example using Aspose OCR Cloud. +Demonstrates loading an image, performing OCR, and handling multi‑page PDFs. +""" + +import asposeocrcloud as ocr +import os + +def ocr_file(filepath: str, language: str = "en"): + """ + Perform OCR on a given file (image or PDF) and return plain text. + """ + # Initialise engine (trial licence) + engine = ocr.OcrEngine() + engine.language = language + + # Load the file – SDK auto‑detects format + image = ocr.Image.load(filepath) + + # If it's a PDF, iterate over pages + if image.is_pdf: + all_text = [] + for page in image.pages: + result = engine.recognize(page) + all_text.append(result.text) + return "\n".join(all_text) + + # Single‑image case + result = engine.recognize(image) + return result.text + + +if __name__ == "__main__": + # Example usage – replace with your own path + sample_path = os.path.join("YOUR_DIRECTORY", "receipt.png") + + if not os.path.exists(sample_path): + raise FileNotFoundError(f"File not found: {sample_path}") + + extracted = ocr_file(sample_path) + print("=== Extracted Text ===") + print(extracted) +``` + +Exécutez le script (`python ocr_demo.py`) et vous verrez la sortie **ocr image to text** directement dans votre console. 
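Une fois le texte brut en main, on peut en tirer des champs structurés. Le sketch ci‑dessous est en Python pur, indépendant du SDK ; les motifs regex sont des hypothèses calées sur l'exemple de reçu affiché plus haut, à adapter à vos propres documents :

```python
import re

def parse_receipt(plain_text: str) -> dict:
    # Hypothetical patterns based on the sample receipt output shown earlier
    fields = {}
    date_match = re.search(r'Date:\s*([\d/.-]+)', plain_text)
    total_match = re.search(r'Total:\s*\$?([\d.,]+)', plain_text)
    if date_match:
        fields['date'] = date_match.group(1)
    if total_match:
        fields['total'] = float(total_match.group(1).replace(',', ''))
    return fields

sample = "Starbucks Coffee\nDate: 2026/03/27\nTotal: $4.75\nThank you!"
print(parse_receipt(sample))
# {'date': '2026/03/27', 'total': 4.75}
```

Le dictionnaire obtenu s'exporte ensuite facilement en CSV ou en base de données.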
+ +## Récapitulatif – Ce que nous avons couvert + +- Installation du SDK **Aspose OCR Cloud** (`pip install asposeocrcloud`). +- **Initialisation du moteur OCR** sans licence (idéal pour l’essai). +- Démonstration du **load image for OCR**, que ce soit un PNG, JPEG ou PDF. +- Exécution de la conversion **ocr image to text** et **convert image plain text** en une chaîne Python exploitable. +- Gestion des problèmes fréquents comme les scans basse résolution, les PDF multi‑pages et le choix de la langue. + +## Prochaines étapes & sujets associés + +Maintenant que vous avez maîtrisé le **python ocr tutorial**, vous pouvez explorer : + +- **Extract text image python** pour le traitement par lots de grands dossiers de reçus. +- Intégrer la sortie OCR avec **pandas** pour l’analyse de données (`df = pd.read_csv(StringIO(extracted))`). +- Utiliser **Tesseract OCR** comme solution de secours lorsque la connectivité internet est limitée. +- Ajouter du post‑traitement avec **spaCy** pour identifier des entités telles que dates, montants et noms de commerçants. + +N’hésitez pas à expérimenter : essayez différents formats d’image, ajustez le contraste, ou changez de langue. Le domaine de l’OCR est vaste, et les compétences que vous venez d’acquérir constituent une base solide pour tout projet d’automatisation de documents. + +Bon codage, et que votre texte soit toujours lisible ! 
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/french/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md b/ocr/french/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md new file mode 100644 index 000000000..3871afed0 --- /dev/null +++ b/ocr/french/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md @@ -0,0 +1,222 @@ +--- +category: general +date: 2026-03-28 +description: Apprenez à exécuter l’OCR sur une image, à télécharger automatiquement + le modèle Hugging Face, à nettoyer le texte OCR et à configurer le modèle LLM en + Python en utilisant Aspose OCR Cloud. +draft: false +keywords: +- run OCR on image +- download hugging face model +- clean OCR text +- configure LLM model +language: fr +og_description: Exécutez la reconnaissance optique de caractères sur une image et + nettoyez le résultat à l'aide d'un modèle Hugging Face téléchargé automatiquement. + Ce guide montre comment configurer le modèle LLM en Python. +og_title: Exécuter l'OCR sur une image – Tutoriel complet Aspose OCR Cloud +tags: +- OCR +- Python +- LLM +- HuggingFace +title: Exécuter la reconnaissance optique de caractères sur une image avec Aspose OCR Cloud + – Guide complet étape par étape +url: /fr/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Exécuter l'OCR sur une image – Tutoriel complet Aspose OCR Cloud + +Vous avez déjà eu besoin d'exécuter l'OCR sur des fichiers image mais la sortie brute ressemblait à un méli-mélo ? 
D'après mon expérience, le principal point douloureux n'est pas la reconnaissance elle‑même — c'est le nettoyage. Heureusement, Aspose OCR Cloud vous permet d'attacher un post‑processeur LLM qui peut *nettoyer le texte OCR* automatiquement. Dans ce tutoriel, nous passerons en revue tout ce dont vous avez besoin : du **téléchargement d'un modèle Hugging Face** à la configuration du LLM, en passant par l'exécution du moteur OCR, jusqu'à la finition du résultat.
+
+À la fin de ce guide, vous disposerez d'un script prêt à l'emploi qui :
+
+1. Récupère un modèle compact Qwen 2.5 depuis Hugging Face (téléchargé automatiquement pour vous).
+2. Configure le modèle pour exécuter une partie du réseau sur le GPU et le reste sur le CPU.
+3. Exécute le moteur OCR sur une image de note manuscrite.
+4. Utilise le LLM pour nettoyer le texte reconnu, vous fournissant une sortie lisible par l'homme.
+
+> **Prérequis** – Python 3.8+, package `asposeocrcloud`, un GPU avec au moins 4 Go de VRAM (optionnel mais recommandé), et une connexion internet pour le premier téléchargement du modèle.
+
+---
+
+## Ce dont vous aurez besoin
+
+- **Aspose OCR Cloud SDK** – installez via `pip install asposeocrcloud`.
+- **Une image d'exemple** – par ex., `handwritten_note.jpg` placée dans un dossier local.
+- **Support GPU** – si vous disposez d'un GPU compatible CUDA, le script déportera 30 couches sur le GPU ; sinon il reviendra automatiquement au CPU.
+- **Permission d'écriture** – le script met en cache le modèle dans `YOUR_DIRECTORY` ; assurez‑vous que le dossier existe.
+
+---
+
+## Étape 1 – Configurer le modèle LLM (télécharger le modèle Hugging Face)
+
+La première chose que nous faisons est d'indiquer à Aspose AI où récupérer le modèle. La classe `AsposeAIModelConfig` gère le téléchargement automatique, la quantisation et l'allocation des couches GPU.
+
+```python
+import asposeocrcloud as ocr
+from asposeocrcloud.ai import AsposeAI, AsposeAIModelConfig
+
+# ----------------------------------------------------------------------
+# Step 1: Model configuration – this will download the model if it’s missing
+# ----------------------------------------------------------------------
+model_config = AsposeAIModelConfig(
+    allow_auto_download="true",  # Enables auto‑download
+    hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF",  # Repo on Hugging Face
+    hugging_face_quantization="int8",  # Small footprint, fast inference
+    gpu_layers=30,  # 30 layers on GPU, rest on CPU
+    directory_model_path=r"YOUR_DIRECTORY"  # Cache folder (optional)
+)
+```
+
+**Pourquoi c'est important** – Quantiser en `int8` réduit drastiquement l'utilisation de mémoire (≈ 4 Go vs 12 Go). Diviser le modèle entre GPU et CPU vous permet d'exécuter un LLM de 3 milliards de paramètres même sur un RTX 3060 modeste. Si vous n'avez pas de GPU, définissez `gpu_layers=0` et le SDK gardera tout sur le CPU.
+
+> **Astuce :** La première exécution téléchargera ~ 1,5 Go, donc accordez‑lui quelques minutes et une connexion stable.
+
+---
+
+## Étape 2 – Initialiser le moteur IA avec la configuration du modèle
+
+Nous lançons maintenant le moteur Aspose AI et lui fournissons la configuration que nous venons de créer.
+
+```python
+# ----------------------------------------------------------------------
+# Step 2: Initialise the AI engine – pulls the model if needed
+# ----------------------------------------------------------------------
+ocr_ai = AsposeAI()
+ocr_ai.initialize(model_config)  # This call blocks until the model is ready
+```
+
+**Que se passe‑t‑il en coulisses ?** Le SDK vérifie `directory_model_path` pour un modèle existant. S'il trouve une version correspondante, il la charge instantanément ; sinon il télécharge le fichier GGUF depuis Hugging Face, le décompresse et prépare le pipeline d'inférence.
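Pour fixer les idées, ce comportement de cache peut se schématiser en quelques lignes de Python standard. Attention : c'est une illustration, pas le code du SDK ; la fonction `ensure_model` et le nom de fichier sont hypothétiques :

```python
from pathlib import Path

def ensure_model(cache_dir: str, model_file: str = "qwen2.5-3b-int8.gguf") -> Path:
    # Hypothetical sketch of the cache logic described above (not the SDK's code)
    path = Path(cache_dir) / model_file
    if path.exists():
        print(f"Model found in cache: {path.name}")
    else:
        print("Model missing, downloading from Hugging Face...")
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()  # the real SDK would write the downloaded GGUF file here
    return path
```

Au premier appel, la branche « téléchargement » s'exécute ; tout appel suivant avec le même dossier réutilise le fichier en cache.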
+
+---
+
+## Étape 3 – Créer le moteur OCR et attacher le post‑processeur IA
+
+Le moteur OCR effectue le travail lourd de reconnaissance des caractères. En attachant `ocr_ai.run_postprocessor`, nous activons automatiquement le **nettoyage du texte OCR** après la reconnaissance.
+
+```python
+# ----------------------------------------------------------------------
+# Step 3: Build the OCR engine and bind the LLM post‑processor
+# ----------------------------------------------------------------------
+ocr_engine = ocr.OcrEngine()
+ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=None)
+```
+
+**Pourquoi utiliser un post‑processeur ?** L'OCR brut inclut souvent des sauts de ligne aux mauvais endroits, une ponctuation mal détectée ou des symboles parasites. Le LLM peut réécrire la sortie en phrases correctes, corriger l'orthographe et même deviner les mots manquants — transformant essentiellement un dump brut en texte poli.
+
+---
+
+## Étape 4 – Exécuter l'OCR sur un fichier image
+
+Avec tout connecté, il est temps de fournir une image au moteur.
+
+```python
+# ----------------------------------------------------------------------
+# Step 4: Load the image and run OCR
+# ----------------------------------------------------------------------
+input_image = ocr.Image.load(r"YOUR_DIRECTORY/handwritten_note.jpg")
+raw_result = ocr_engine.recognize(input_image)  # Returns an OcrResult object
+```
+
+**Cas particulier :** Si l'image est grande (> 5 MP), vous pourriez vouloir la redimensionner d'abord pour accélérer le traitement. Le SDK accepte un objet Pillow `Image`, vous pouvez donc pré‑traiter avec la méthode `Image.thumbnail()` de Pillow si nécessaire.
+
+---
+
+## Étape 5 – Laisser l'IA nettoyer le texte reconnu et afficher les deux versions
+
+Enfin, nous invoquons le post‑processeur que nous avons attaché précédemment. Cette étape montre le contraste entre l'*avant* et l'*après* nettoyage.
+
+```python
+# ----------------------------------------------------------------------
+# Step 5: Clean the OCR output using the LLM and display both results
+# ----------------------------------------------------------------------
+cleaned_result = ocr_engine.run_postprocessor(raw_result)
+
+print("=== Before AI ===")
+print(raw_result.text)
+
+print("\n=== After AI ===")
+print(cleaned_result.text)
+```
+
+### Sortie attendue
+
+```
+=== Before AI ===
+Th1s 1s a h@ndwr1tt3n n0te. It c0nta1ns m1st@k3s, l1n3 br3aks, & sp3c!@l ch@r@ct3rs.
+
+=== After AI ===
+This is a handwritten note. It contains mistakes, line breaks, and special characters.
+```
+
+Remarquez comment le LLM a :
+
+- Corrigé les erreurs de reconnaissance OCR courantes (`Th1s` → `This`).
+- Supprimé les symboles parasites (`&` → `and`).
+- Normalisé les sauts de ligne en phrases correctes.
+
+---
+
+## 🎨 Vue d'ensemble visuelle (Flux de travail Exécuter OCR sur image)
+
+![Flux d'exécution OCR sur image](run_ocr_on_image_workflow.png "Diagramme montrant le pipeline d'exécution OCR sur image, du téléchargement du modèle à la sortie nettoyée")
+
+Le diagramme ci‑dessus résume le pipeline complet : **télécharger le modèle Hugging Face → configurer le LLM → initialiser l'IA → moteur OCR → post‑processeur IA → nettoyer le texte OCR**.
+
+---
+
+## Questions fréquentes & Astuces pro
+
+### Et si je n'ai pas de GPU ?
+
+Définissez `gpu_layers=0` dans le `AsposeAIModelConfig`. Le modèle s'exécutera entièrement sur le CPU, ce qui est plus lent mais toujours fonctionnel. Vous pouvez également passer à un modèle plus petit (par ex., `Qwen/Qwen2.5-1.5B-Instruct-GGUF`) pour garder un temps d'inférence raisonnable.
+
+### Comment changer le modèle plus tard ?
+
+Il suffit de mettre à jour `hugging_face_repo_id` et de relancer `ocr_ai.initialize(model_config)`. Le SDK détectera le changement de version, téléchargera le nouveau modèle et remplacera les fichiers en cache.
+
+### Puis‑je personnaliser le prompt du post‑processeur ?
+
+Oui. Passez un dictionnaire à `custom_settings` avec une clé `prompt_template`. Par exemple :
+
+```python
+custom_prompt = {
+    "prompt_template": "Correct the following OCR text and keep line breaks:\n{ocr_text}"
+}
+ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=custom_prompt)
+```
+
+### Dois‑je enregistrer le texte nettoyé dans un fichier ?
+
+Absolument. Après le nettoyage, vous pouvez écrire le résultat dans un fichier `.txt` ou `.json` pour un traitement en aval :
+
+```python
+with open("cleaned_note.txt", "w", encoding="utf-8") as f:
+    f.write(cleaned_result.text)
+```
+
+---
+
+## Conclusion
+
+Nous venons de vous montrer comment **exécuter l'OCR sur des images** avec Aspose OCR Cloud, **télécharger automatiquement un modèle Hugging Face**, configurer avec expertise les paramètres du **modèle LLM**, et enfin **nettoyer le texte OCR** à l'aide d'un puissant post‑processeur LLM. L'ensemble du processus tient dans un seul script Python facile à exécuter et fonctionne aussi bien sur des machines avec GPU que sur des machines CPU uniquement.
+
+Si vous êtes à l'aise avec ce pipeline, envisagez d'expérimenter avec :
+
+- **Différents LLMs** – essayez `meta-llama/Meta-Llama-3-8B-Instruct-GGUF` pour une fenêtre de contexte plus large.
+- **Traitement par lots** – parcourez un dossier d'images et agrégez les résultats nettoyés dans un CSV.
+- **Prompts personnalisés** – adaptez l'IA à votre domaine (documents juridiques, notes médicales, etc.).
+
+N'hésitez pas à ajuster la valeur `gpu_layers`, changer de modèle, ou brancher votre propre prompt. Le ciel est la limite, et le code que vous avez maintenant est votre rampe de lancement.
+
+Bon codage, et que vos sorties OCR restent toujours propres !
🚀 + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/german/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md b/ocr/german/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md new file mode 100644 index 000000000..9c4eed83e --- /dev/null +++ b/ocr/german/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md @@ -0,0 +1,225 @@ +--- +category: general +date: 2026-03-28 +description: Wie man OCR verwendet, um handgeschriebenen Text in Bildern zu erkennen. + Lernen Sie, handgeschriebenen Text zu extrahieren, handgeschriebene Bilder zu konvertieren + und schnell saubere Ergebnisse zu erhalten. +draft: false +keywords: +- how to use OCR +- recognize handwritten text +- extract handwritten text +- handwritten note to text +- convert handwritten image +language: de +og_description: Wie man OCR verwendet, um handgeschriebenen Text zu erkennen. Dieses + Tutorial zeigt Ihnen Schritt für Schritt, wie Sie handgeschriebenen Text aus Bildern + extrahieren und ein professionelles Ergebnis erzielen. 
+og_title: Wie man OCR zur Erkennung handgeschriebener Texte verwendet – Komplettanleitung +tags: +- OCR +- Handwriting Recognition +- Python +title: Wie man OCR verwendet, um handgeschriebenen Text zu erkennen – Komplettanleitung +url: /de/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Wie man OCR verwendet, um handgeschriebenen Text zu erkennen – Komplett‑Guide + +Wie man OCR für handgeschriebene Notizen einsetzt, ist eine Frage, die sich viele Entwickler stellen, wenn sie Skizzen, Sitzungsprotokolle oder schnelle Ideen digitalisieren wollen. In diesem Guide gehen wir Schritt für Schritt durch, wie man handgeschriebenen Text erkennt, extrahiert und ein handgeschriebenes Bild in saubere, durchsuchbare Zeichenketten umwandelt. + +Wenn du jemals auf ein Foto einer Einkaufsliste gestarrt hast und dich gefragt hast: „Kann ich dieses handgeschriebene Bild in Text umwandeln, ohne alles erneut abzutippen?“ – dann bist du hier genau richtig. Am Ende hast du ein sofort einsatzbereites Skript, das **handgeschriebene Notiz zu Text** in Sekunden verwandelt. + +## Was du brauchst + +- Python 3.8+ (der Code funktioniert mit jeder aktuellen Version) +- Die `ocr`‑Bibliothek – installiere sie mit `pip install ocr-sdk` (ersetze den Namen durch das Paket deines Anbieters) +- Ein klares Bild einer handgeschriebenen Notiz (`hand_note.png` im Beispiel) +- Ein bisschen Neugier und ein Kaffee ☕️ (optional, aber empfohlen) + +Kein schwergewichtiges Framework, keine kostenpflichtigen Cloud‑Keys – nur eine lokale Engine, die **Handschrifterkennung** out of the box unterstützt. + +## Schritt 1 – Das OCR‑Paket installieren und importieren + +Zuerst einmal das richtige Paket auf deinem Rechner. 
Öffne ein Terminal und führe aus:
+
+```bash
+pip install ocr-sdk
+```
+
+Nachdem die Installation abgeschlossen ist, importiere das Modul in deinem Skript:
+
+```python
+# Step 1: Import the OCR SDK
+import ocr
+```
+
+> **Pro‑Tipp:** Wenn du eine virtuelle Umgebung nutzt, aktiviere sie vor der Installation. So bleibt dein Projekt sauber und Versionskonflikte werden vermieden.
+
+## Schritt 2 – Eine OCR‑Engine erstellen und den Handschrift‑Modus aktivieren
+
+Jetzt kommen wir zum Kern von **wie man OCR verwendet** – wir benötigen eine Engine‑Instanz, die weiß, dass wir mit geschwungenen Strichen statt gedruckter Schrift arbeiten. Das folgende Snippet erstellt die Engine und schaltet sie in den Handschrift‑Modus:
+
+```python
+# Step 2: Initialize the OCR engine for handwritten text
+ocr_engine = ocr.OcrEngine()
+ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN
+```
+
+Warum `recognition_mode` setzen? Weil die meisten OCR‑Engines standardmäßig die Erkennung von Drucktext aktivieren, was die Schleifen und Schrägen einer persönlichen Notiz häufig überspringt. Der Handschrift‑Modus erhöht die Genauigkeit dramatisch.
+
+## Schritt 3 – Das Bild laden, das du konvertieren willst (Handgeschriebenes Bild konvertieren)
+
+Bilder sind das Rohmaterial für jede OCR‑Aufgabe. Stelle sicher, dass dein Bild in einem verlustfreien Format gespeichert ist (PNG funktioniert hervorragend) und der Text einigermaßen lesbar ist. Dann lade es so:
+
+```python
+# Step 3: Load the handwritten image you want to convert
+handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png")
+```
+
+Falls das Bild im selben Verzeichnis wie dein Skript liegt, kannst du einfach `"hand_note.png"` anstelle eines kompletten Pfads verwenden.
+
+> **Was, wenn das Bild unscharf ist?** Versuche eine Vorverarbeitung mit OpenCV (z. B. `cv2.cvtColor` zu Graustufen, `cv2.threshold` zur Kontraststeigerung), bevor du es an die OCR‑Engine übergibst.
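Um zu veranschaulichen, was die im Tipp erwähnte Schwellenwert‑Operation (`cv2.threshold`) im Kern macht, hier ein stark vereinfachter Sketch in reinem Python. Das ist nur eine Illustration mit einer winzigen Pixel‑Liste; für echte Bilder nimmst du OpenCV:

```python
def binarize(gray_pixels, threshold=128):
    # Toy illustration of thresholding: map each gray value (0-255)
    # to pure black (0) or pure white (255)
    return [[255 if p >= threshold else 0 for p in row] for row in gray_pixels]

# A tiny 2x3 "image" of gray values
image = [[30, 200, 120],
         [90, 250, 10]]
print(binarize(image))
# [[0, 255, 0], [0, 255, 0]]
```

Genau dieses Prinzip, nur mit automatisch gewähltem Schwellenwert (z. B. Otsu), steckt hinter `cv2.threshold` und hebt den Kontrast vor der OCR deutlich an.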
+ +## Schritt 4 – Die Erkennungs‑Engine ausführen, um handgeschriebenen Text zu extrahieren + +Mit der bereitstehenden Engine und dem Bild im Speicher können wir endlich **handgeschriebenen Text extrahieren**. Die Methode `recognize` liefert ein Roh‑Ergebnisobjekt, das den Text sowie Konfidenzwerte enthält. + +```python +# Step 4: Perform OCR and get the raw result +raw_result = ocr_engine.recognize(handwritten_image) +print("Raw OCR output:") +print(raw_result.text) +``` + +Typische Rohausgaben können überflüssige Zeilenumbrüche oder falsch erkannte Zeichen enthalten, besonders bei unordentlicher Handschrift. Deshalb gibt es den nächsten Schritt. + +## Schritt 5 – (Optional) Ausgabe mit dem KI‑Post‑Processor verfeinern + +Die meisten modernen OCR‑SDKs liefern einen leichten KI‑Post‑Processor, der Abstände bereinigt, gängige OCR‑Fehler korrigiert und Zeilenenden normalisiert. Die Ausführung ist so einfach: + +```python +# Step 5: Refine the raw OCR output (handwritten note to text) +polished_result = ocr_engine.run_postprocessor(raw_result) + +# Display the cleaned, readable text +print("\nPolished OCR output:") +print(polished_result.text) +``` + +Wenn du diesen Schritt überspringst, bekommst du immer noch nutzbaren Text, aber die **Handschrift‑zu‑Text**‑Konvertierung wirkt etwas rauer. Der Post‑Processor ist besonders praktisch für Notizen mit Aufzählungspunkten oder gemischter Groß‑ und Kleinschreibung. + +## Schritt 6 – Ergebnis prüfen und Randfälle behandeln + +Nachdem du das verfeinerte Ergebnis ausgegeben hast, überprüfe, ob alles korrekt aussieht. Hier ein kurzer Plausibilitäts‑Check, den du hinzufügen kannst: + +```python +# Step 6: Simple verification +if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") +else: + print("\n✅ OCR succeeded! 
You can now save or further process the text.") +``` + +**Checkliste für Randfälle** + +| Situation | Was zu tun ist | +|-----------|----------------| +| **Sehr geringer Kontrast** | Kontrast mit `cv2.convertScaleAbs` erhöhen, bevor das Bild geladen wird. | +| **Mehrere Sprachen** | `ocr_engine.language = ["en", "es"]` setzen (oder deine Zielsprachen). | +| **Große Dokumente** | Seiten stapelweise verarbeiten, um Speicher‑Spikes zu vermeiden. | +| **Spezial‑Symbole** | Ein benutzerdefiniertes Wörterbuch via `ocr_engine.add_custom_words([...])` hinzufügen. | + +## Visueller Überblick + +Unten siehst du ein Platzhalter‑Bild, das den Workflow illustriert – von einer fotografierten Notiz zu sauberem Text. Der Alt‑Text enthält das Haupt‑Keyword und macht das Bild SEO‑freundlich. + +![wie man OCR auf einem handschriftlichen Notizbild verwendet](/images/handwritten_ocr_flow.png "wie man OCR auf einem handschriftlichen Notizbild verwendet") + +## Vollständiges, ausführbares Skript + +Alle Teile zusammengefügt, hier das komplette, copy‑and‑paste‑bereite Programm: + +```python +# Complete script: Convert a handwritten image to clean text using OCR + +import ocr + +def main(): + # 1️⃣ Initialize OCR engine for handwritten recognition + ocr_engine = ocr.OcrEngine() + ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN + + # 2️⃣ Load the image containing the handwritten note + handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") + + # 3️⃣ Perform OCR to extract raw text + raw_result = ocr_engine.recognize(handwritten_image) + print("Raw OCR output:") + print(raw_result.text) + + # 4️⃣ (Optional) Run AI post‑processor for cleaner output + polished_result = ocr_engine.run_postprocessor(raw_result) + + # 5️⃣ Show the polished, readable text + print("\nPolished OCR output:") + print(polished_result.text) + + # 6️⃣ Simple sanity check + if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") + else: + 
print("\n✅ OCR succeeded! You can now save or further process the text.") + +if __name__ == "__main__": + main() +``` + +**Erwartete Ausgabe (Beispiel)** + +``` +Raw OCR output: +T0d@y I w3nt to the market +and bought 5 aplpes, 2 bananas, +and a loaf of bread. + +Polished OCR output: +Today I went to the market and bought 5 apples, 2 bananas, and a loaf of bread. +``` + +Sieh, wie der Post‑Processor den Tippfehler „T0d@y“ korrigiert und die Abstände normalisiert hat. + +## Häufige Stolperfallen & Pro‑Tipps + +- **Bildgröße zählt** – OCR‑Engines begrenzen die Eingabe meist auf 4 K × 4 K. Große Fotos vorher verkleinern. +- **Handschrift‑Stil** – Kursive vs. Blockbuchstaben können die Genauigkeit beeinflussen. Wenn du die Quelle kontrollierst (z. B. einen digitalen Stift), verwende Blockbuchstaben für beste Ergebnisse. +- **Batch‑Verarbeitung** – Bei Dutzenden Notizen das Skript in einer Schleife ausführen und jedes Ergebnis in einer CSV‑Datei oder SQLite‑DB speichern. +- **Speicher‑Leaks** – Einige SDKs behalten interne Puffer; rufe `ocr_engine.dispose()` auf, wenn du eine Verlangsamung bemerkst. + +## Nächste Schritte – über einfaches OCR hinaus + +Jetzt, wo du **wie man OCR verwendet** für ein einzelnes Bild gemeistert hast, erwäge diese Erweiterungen: + +1. **Integration mit Cloud‑Speicher** – Bilder von AWS S3 oder Azure Blob holen, dieselbe Pipeline laufen lassen und die Ergebnisse zurückschieben. +2. **Spracherkennung hinzufügen** – `ocr_engine.detect_language()` nutzen, um automatisch Wörterbücher zu wechseln. +3. **Kombination mit NLP** – Den bereinigten Text an spaCy oder NLTK übergeben, um Entitäten, Daten oder Aktionen zu extrahieren. +4. **REST‑Endpoint erstellen** – Das Skript in Flask oder FastAPI einbetten, sodass andere Services Bilder per POST senden und JSON‑kodierten Text erhalten können. 
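
Der Batch‑Verarbeitungs‑Tipp aus den Stolperfallen oben lässt sich mit der Standardbibliothek skizzieren – `ocr_to_text` ist dabei nur ein Platzhalter für die oben gezeigte OCR‑Pipeline:

```python
# Skizze: Batch-Verarbeitung vieler Notizen in eine CSV-Datei (nur Standardbibliothek)
import csv
import io

def ocr_to_text(image_path: str) -> str:
    # Platzhalter – hier wuerde ocr_engine.recognize(...) laufen
    return f"Erkannter Text aus {image_path}"

def batch_to_csv(image_paths, out_file) -> None:
    """Schreibt pro Bild eine Zeile mit Dateiname und erkanntem Text."""
    writer = csv.writer(out_file)
    writer.writerow(["datei", "text"])
    for path in image_paths:
        writer.writerow([path, ocr_to_text(path)])

buffer = io.StringIO()
batch_to_csv(["note1.png", "note2.png"], buffer)
print(buffer.getvalue())
```

Statt `io.StringIO` kannst du natürlich eine echte Datei öffnen (`open("notizen.csv", "w", newline="")`) oder die Zeilen in eine SQLite‑Tabelle schreiben.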
+ +All diese Ideen drehen sich weiterhin um die Kernkonzepte **handgeschriebenen Text erkennen**, **handgeschriebenen Text extrahieren** und **handgeschriebenes Bild konvertieren** – die genauen Phrasen, nach denen du als Nächstes suchen wirst. + +--- + +### TL;DR + +Wir haben dir gezeigt, **wie man OCR** verwendet, um handgeschriebenen Text zu erkennen, zu extrahieren und das Ergebnis zu einem nutzbaren String zu verfeinern. Das vollständige Skript ist einsatzbereit, der Workflow wird Schritt für Schritt erklärt und du hast jetzt eine Checkliste für gängige Randfälle. Schnapp dir ein Foto deiner nächsten Sitzungsnotiz, steck es ins Skript und lass die Maschine das Tippen übernehmen. + +Viel Spaß beim Coden und möge deine Handschrift immer lesbar bleiben! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/german/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md b/ocr/german/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md new file mode 100644 index 000000000..2d373a34b --- /dev/null +++ b/ocr/german/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md @@ -0,0 +1,187 @@ +--- +category: general +date: 2026-03-28 +description: Führen Sie OCR auf einem Bild durch und erhalten Sie bereinigten Text + mit Koordinaten der Begrenzungsrahmen. Lernen Sie, wie man OCR extrahiert, OCR bereinigt + und die Ergebnisse Schritt für Schritt anzeigt. +draft: false +keywords: +- perform OCR on image +- how to extract OCR +- how to clean OCR +- display bounding box coordinates +- OCR post‑processing +- OCR bounding boxes +language: de +og_description: Führen Sie OCR auf einem Bild aus, bereinigen Sie die Ausgabe und + zeigen Sie die Koordinaten der Begrenzungsrahmen in einer kurzen Anleitung. 
+og_title: OCR auf Bild ausführen – Saubere Ergebnisse und Begrenzungsrahmen
+tags:
+- OCR
+- Computer Vision
+- Python
+title: OCR auf Bild ausführen – Ergebnisse bereinigen und Bounding‑Box‑Koordinaten
+  anzeigen
+url: /de/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# OCR auf Bild ausführen – Ergebnisse bereinigen und Begrenzungsbox‑Koordinaten anzeigen
+
+Haben Sie jemals **OCR auf Bild**‑Dateien ausführen müssen, aber immer wieder unordentlichen Text erhalten – und waren sich nie sicher, wo jedes Wort im Bild liegt? Sie sind nicht allein. In vielen Projekten – Rechnungsdigitalisierung, Belegscan oder einfache Textextraktion – ist die Roh‑OCR‑Ausgabe nur das erste Hindernis. Die gute Nachricht? Sie können diese Ausgabe bereinigen und sofort die Begrenzungsbox‑Koordinaten jeder Region sehen, ohne eine Menge Boilerplate‑Code zu schreiben.
+
+In diesem Leitfaden gehen wir Schritt für Schritt durch **how to extract OCR**, führen einen **how to clean OCR**‑Post‑Processor aus und zeigen schließlich **display bounding box coordinates** für jede bereinigte Region. Am Ende haben Sie ein einzelnes, ausführbares Skript, das ein unscharfes Foto in sauberen, strukturierten Text verwandelt, bereit für die nachgelagerte Verarbeitung.
+
+## Was Sie benötigen
+
+- Python 3.9+ (die unten gezeigte Syntax funktioniert aber bereits ab 3.8)
+- Eine OCR‑Engine, die `recognize(..., return_structured=True)` unterstützt – zum Beispiel die fiktive `engine`‑Bibliothek im Beispiel. Ersetzen Sie sie durch Tesseract, EasyOCR oder ein beliebiges SDK, das Regionsdaten zurückgibt.
+- Grundlegende Kenntnisse von Python‑Funktionen und Schleifen
+- Eine Bilddatei, die Sie scannen möchten (PNG, JPG usw.)
+
+> **Profi‑Tipp:** Wenn Sie Tesseract verwenden, liefert die Funktion `pytesseract.image_to_data` bereits Begrenzungsboxen.
Sie können das Ergebnis in einen kleinen Adapter einbinden, der die unten gezeigte `engine.recognize`‑API nachahmt. + +--- + +![perform OCR on image example](image-placeholder.png "perform OCR on image example") + +*Alt text: Diagramm, das zeigt, wie OCR auf Bild ausgeführt wird und Begrenzungsbox‑Koordinaten visualisiert werden* + +## Schritt 1 – OCR auf Bild ausführen und strukturierte Regionen erhalten + +Der erste Schritt besteht darin, die OCR‑Engine zu bitten, nicht nur reinen Text, sondern eine strukturierte Liste von Textregionen zurückzugeben. Diese Liste enthält die Rohzeichenkette und das Rechteck, das sie umschließt. + +```python +import engine # replace with your actual OCR library +from pathlib import Path + +# Load the image you want to process +image_path = Path("sample_invoice.jpg") +image = engine.load_image(image_path) + +# Step 1: Perform OCR and request a structured list of text regions +raw_result = engine.recognize(image, return_structured=True) +``` + +**Warum das wichtig ist:** +Wenn Sie nur nach reinem Text fragen, verlieren Sie den räumlichen Kontext. Strukturierte Daten ermöglichen es Ihnen später, **display bounding box coordinates** anzuzeigen, Text mit Tabellen auszurichten oder präzise Positionen an ein nachgelagertes Modell zu übergeben. + +## Schritt 2 – OCR‑Ausgabe mit einem Post‑Processor bereinigen + +OCR‑Engines erkennen Zeichen gut, lassen jedoch oft überflüssige Leerzeichen, Zeilenumbruch‑Artefakte oder falsch erkannte Symbole zurück. Ein Post‑Processor normalisiert den Text, behebt häufige OCR‑Fehler und entfernt überflüssige Leerzeichen. 
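
Falls Ihr SDK keinen eingebauten Post‑Processor mitbringt, lässt sich ein minimaler Cleaner mit der Standardbibliothek skizzieren – bewusst vereinfacht, und die Funktion `clean_ocr_text` ist eine Annahme dieses Beispiels, kein SDK‑Bestandteil:

```python
# Skizze: minimaler OCR-Cleaner nur mit der Standardbibliothek
import re

def clean_ocr_text(text: str) -> str:
    """Normalisiert Whitespace und löst Silbentrennungen am Zeilenende auf."""
    text = re.sub(r"-\n(\w)", r"\1", text)       # "Rech-\nnung" -> "Rechnung"
    text = re.sub(r"[ \t]*\n[ \t]*", " ", text)  # Zeilenumbrüche zu Leerzeichen
    text = re.sub(r" {2,}", " ", text)           # Mehrfach-Leerzeichen reduzieren
    return text.strip()

print(clean_ocr_text("Rech-\nnung  Nr.   42\nGesamt:  $10"))
# Rechnung Nr. 42 Gesamt: $10
```

Für schnelle Experimente reicht so ein Regex‑Ansatz oft aus; ein trainierter Post‑Processor geht natürlich deutlich weiter.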
+ +```python +# Step 2: Clean the plain‑text of each region using the post‑processor +processed_result = engine.run_postprocessor(raw_result) +``` + +Wenn Sie Ihren eigenen Cleaner bauen, sollten Sie berücksichtigen: + +- Entfernen von Nicht‑ASCII‑Zeichen (`re.sub(r'[^\x00-\x7F]+',' ', text)`) +- Zusammenführen mehrerer Leerzeichen zu einem einzigen +- Anwenden eines Rechtschreibprüfers wie `pyspellchecker` für offensichtliche Tippfehler + +**Warum das wichtig ist:** +Ein bereinigter String macht Suche, Indexierung und nachgelagerte NLP‑Pipelines deutlich zuverlässiger. Mit anderen Worten, **how to clean OCR** ist oft der Unterschied zwischen einem nutzbaren Datensatz und einem Kopfschmerz. + +## Schritt 3 – Begrenzungsbox‑Koordinaten für jede bereinigte Region anzeigen + +Jetzt, wo der Text bereinigt ist, iterieren wir über jede Region, geben ihr Rechteck und den bereinigten String aus. Das ist der Teil, in dem wir schließlich **display bounding box coordinates** anzeigen. + +```python +# Step 3 – Iterate over the cleaned regions and display their bounding box and text +for text_region in processed_result.regions: + # Each region has a .bounding_box attribute (x, y, width, height) + bbox = text_region.bounding_box + print(f"[{bbox}] {text_region.text}") +``` + +**Beispielausgabe** + +``` +[(34, 120, 210, 30)] Invoice #12345 +[(34, 160, 420, 28)] Date: 2026‑03‑01 +[(34, 200, 380, 28)] Total Amount: $1,254.00 +``` + +Sie können diese Koordinaten nun in eine Zeichenbibliothek (z. B. OpenCV) einspeisen, um Boxen über das Originalbild zu legen, oder sie in einer Datenbank für spätere Abfragen speichern. + +## Vollständiges, sofort ausführbares Skript + +Unten finden Sie das vollständige Programm, das alle drei Schritte verbindet. Ersetzen Sie die Platzhalter‑`engine`‑Aufrufe durch Ihr tatsächliches OCR‑SDK. + +```python +#!/usr/bin/env python3 +""" +Perform OCR on image → clean results → display bounding box coordinates. 
+Author: Your Name
+Date: 2026‑03‑28
+"""
+
+import engine  # <-- replace with your OCR library
+from pathlib import Path
+import sys
+
+def main(image_path: str):
+    # Load image
+    image = engine.load_image(Path(image_path))
+
+    # 1️⃣ Perform OCR and ask for structured output
+    raw_result = engine.recognize(image, return_structured=True)
+
+    # 2️⃣ Clean the raw text using the built‑in post‑processor
+    processed_result = engine.run_postprocessor(raw_result)
+
+    # 3️⃣ Show each region's bounding box and cleaned text
+    print("\n=== Cleaned OCR Regions ===")
+    for region in processed_result.regions:
+        bbox = region.bounding_box  # (x, y, w, h)
+        print(f"[{bbox}] {region.text}")
+
+if __name__ == "__main__":
+    if len(sys.argv) != 2:
+        print("Usage: python perform_ocr.py <image_path>")
+        sys.exit(1)
+    main(sys.argv[1])
+```
+
+### So führen Sie das Skript aus
+
+```bash
+python perform_ocr.py sample_invoice.jpg
+```
+
+Sie sollten eine Liste von Begrenzungsboxen zusammen mit bereinigtem Text sehen, genau wie in der obigen Beispielausgabe.
+
+## Häufig gestellte Fragen & Sonderfälle
+
+| Frage | Antwort |
+|----------|--------|
+| **Was ist, wenn die OCR‑Engine `return_structured` nicht unterstützt?** | Schreiben Sie einen dünnen Wrapper, der die Rohausgabe der Engine (in der Regel eine Liste von Wörtern mit Koordinaten) in Objekte mit den Attributen `text` und `bounding_box` konvertiert. |
+| **Kann ich Konfidenzwerte erhalten?** | Viele SDKs stellen eine Konfidenzmetrik pro Region bereit. Hängen Sie sie an die Print‑Anweisung an: `print(f"[{bbox}] {region.text} (conf: {region.confidence:.2f})")`. |
+| **Wie gehe ich mit gedrehtem Text um?** | Verarbeiten Sie das Bild mit OpenCVs `cv2.minAreaRect` vor, um es vor dem Aufruf von `recognize` zu entzerren. |
+| **Was ist, wenn ich die Ausgabe im JSON‑Format benötige?** | Serialisieren Sie `processed_result.regions` mit `json.dumps([r.__dict__ for r in processed_result.regions], indent=2)`. 
| +| **Gibt es eine Möglichkeit, die Boxen zu visualisieren?** | Verwenden Sie OpenCV: `cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0), 2)` innerhalb der Schleife und dann `cv2.imwrite("annotated.jpg", img)`. | + +## Fazit + +Sie haben gerade **how to perform OCR on image** gelernt, die Rohausgabe bereinigt und **display bounding box coordinates** für jede Region angezeigt. Der dreistufige Ablauf – erkennen → nachbearbeiten → iterieren – ist ein wiederverwendbares Muster, das Sie in jedes Python‑Projekt einbinden können, das zuverlässige Textextraktion benötigt. + +### Was kommt als Nächstes? + +- **Untersuchen Sie verschiedene OCR‑Back‑ends** (Tesseract, EasyOCR, Google Vision) und vergleichen Sie die Genauigkeit. +- **Integrieren Sie eine Datenbank**, um Regionsdaten für durchsuchbare Archive zu speichern. +- **Fügen Sie Spracherkennung hinzu**, um jede Region durch den passenden Rechtschreibprüfer zu leiten. +- **Boxen über das Originalbild legen** zur visuellen Überprüfung (siehe das OpenCV‑Snippet oben). + +Wenn Sie auf Eigenheiten stoßen, denken Sie daran, dass der größte Gewinn aus einem soliden Nachbearbeitungsschritt resultiert; ein bereinigter String ist viel einfacher zu verarbeiten als ein Rohdump von Zeichen. + +Viel Spaß beim Coden, und mögen Ihre OCR‑Pipelines stets ordentlich sein! 
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/german/python/general/python-ocr-tutorial-extract-text-from-images/_index.md b/ocr/german/python/general/python-ocr-tutorial-extract-text-from-images/_index.md new file mode 100644 index 000000000..98a72ce5f --- /dev/null +++ b/ocr/german/python/general/python-ocr-tutorial-extract-text-from-images/_index.md @@ -0,0 +1,233 @@ +--- +category: general +date: 2026-03-28 +description: Python OCR‑Tutorial, das zeigt, wie man Text aus einem Bild mit Aspose + OCR Cloud extrahiert. Lernen Sie, ein Bild für OCR zu laden und das Bild in Klartext + zu konvertieren – in wenigen Minuten. +draft: false +keywords: +- python ocr tutorial +- extract text image python +- ocr image to text +- load image for ocr +- convert image plain text +language: de +og_description: Das Python-OCR-Tutorial erklärt, wie man ein Bild für OCR lädt und + den Bild‑Plain‑Text mit Aspose OCR Cloud konvertiert. Holen Sie sich den vollständigen + Code und Tipps. +og_title: Python OCR Tutorial – Text aus Bildern extrahieren +tags: +- OCR +- Python +- Image Processing +title: Python OCR‑Tutorial – Text aus Bildern extrahieren +url: /de/python/general/python-ocr-tutorial-extract-text-from-images/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Python OCR Tutorial – Text aus Bildern extrahieren + +Haben Sie sich jemals gefragt, wie man ein unordentliches Belegfoto in sauberen, durchsuchbaren Text verwandelt? Sie sind nicht allein. Nach meiner Erfahrung ist das größte Hindernis nicht die OCR‑Engine selbst, sondern das Bild in das richtige Format zu bringen und den Klartext problemlos herauszuholen. 
+
+Dieses **python ocr tutorial** führt Sie durch jeden Schritt – das Laden eines Bildes für OCR, das Ausführen der Erkennung und schließlich das Umwandeln des Bild‑Klartexts in einen Python‑String, den Sie speichern oder analysieren können. Am Ende werden Sie im **extract text image python**‑Stil Text extrahieren können, und Sie benötigen keine kostenpflichtige Lizenz, um zu beginnen.
+
+## Was Sie lernen werden
+
+- Wie man das Aspose OCR Cloud SDK für Python installiert und importiert.
+- Der genaue Code zum **load image for OCR** (PNG, JPEG, TIFF, PDF usw.).
+- Wie man die Engine aufruft, um eine **ocr image to text**‑Umwandlung durchzuführen.
+- Tipps zum Umgang mit gängigen Edge‑Cases wie mehrseitigen PDFs oder Aufnahmen mit niedriger Auflösung.
+- Möglichkeiten, die Ausgabe zu überprüfen und was zu tun ist, wenn der Text unleserlich erscheint.
+
+### Voraussetzungen
+
+- Python 3.8+ auf Ihrem Rechner installiert.
+- Ein kostenloses Aspose Cloud‑Konto (die Testversion funktioniert ohne Lizenz).
+- Grundlegende Kenntnisse mit pip und virtuellen Umgebungen – nichts Besonderes.
+
+> **Pro Tipp:** Wenn Sie bereits ein virtualenv verwenden, aktivieren Sie es jetzt. Es hält Ihre Abhängigkeiten sauber und vermeidet Versionskonflikte.
+
+![Python OCR tutorial Screenshot, der erkannten Text zeigt](path/to/ocr_example.png "Python OCR tutorial – Anzeige des extrahierten Klartexts")
+
+## Schritt 1 – Installieren des Aspose OCR Cloud SDK
+
+Zuerst benötigen wir die Bibliothek, die mit dem Aspose OCR‑Dienst kommuniziert. Öffnen Sie ein Terminal und führen Sie aus:
+
+```bash
+pip install asposeocrcloud
+```
+
+Dieser einzelne Befehl lädt das neueste SDK (derzeit Version 23.12). Das Paket enthält alles, was Sie benötigen – keine zusätzlichen Bildverarbeitungs‑Bibliotheken sind erforderlich.
+
+## Schritt 2 – Initialisieren der OCR‑Engine
+
+Jetzt, wo das SDK bereit ist, können wir die Engine aus diesem **python ocr tutorial** starten.
Der Konstruktor benötigt keinen Lizenzschlüssel für die Testversion, was die Sache einfach macht. + +```python +import asposeocrcloud as ocr + +# Initialise the OCR engine – no licence needed for trial use +ocr_engine = ocr.OcrEngine() +``` + +> **Warum das wichtig ist:** Das Initialisieren der Engine nur einmal hält die nachfolgenden Aufrufe schnell. Wenn Sie das Objekt für jedes Bild neu erstellen, verschwenden Sie Netzwerk‑Rundreisen. + +## Schritt 3 – Bild für OCR laden + +Hier kommt das **load image for OCR**‑Schlüsselwort zum Einsatz. Die `Image.load`‑Methode des SDK akzeptiert einen Dateipfad oder eine URL und erkennt das Format automatisch (PNG, JPEG, TIFF, PDF usw.). Laden wir einen Beispielbeleg: + +```python +# Step 3: Load the input image (PNG, JPEG, TIFF, PDF …) +input_image = ocr.Image.load(r"YOUR_DIRECTORY/receipt.png") +``` + +Wenn Sie mit einem mehrseitigen PDF arbeiten, verweisen Sie einfach auf die PDF‑Datei; das SDK behandelt jede Seite intern als separates Bild. + +## Schritt 4 – OCR‑Bild‑zu‑Text‑Umwandlung durchführen + +Mit dem Bild im Speicher erfolgt die eigentliche OCR in einer einzigen Zeile. Die `recognize`‑Methode gibt ein `OcrResult`‑Objekt zurück, das den Klartext, Konfidenzwerte und sogar Begrenzungsrahmen enthält, falls Sie diese später benötigen. + +```python +# Step 4: Perform OCR on the loaded image +ocr_result = ocr_engine.recognize(input_image) +``` + +> **Edge‑Case:** Bei Bildern mit niedriger Auflösung (unter 300 dpi) möchten Sie das Bild möglicherweise zuerst hochskalieren. Das SDK bietet einen `Resize`‑Helfer, aber für die meisten Belege funktioniert die Standardeinstellung gut. + +## Schritt 5 – Bild‑Klartext in einen nutzbaren String umwandeln + +Das letzte Puzzleteil besteht darin, den Klartext aus dem Ergebnisobjekt zu extrahieren. Dies ist der **convert image plain text**‑Schritt, der den OCR‑Blob in etwas umwandelt, das Sie drucken, speichern oder in ein anderes System einspeisen können. 
+ +```python +# Step 5: Output the recognised plain text +print("Plain OCR:") +print(ocr_result.text) +``` + +Wenn Sie das Skript ausführen, sollten Sie etwa Folgendes sehen: + +``` +Plain OCR: +Starbucks Coffee +Date: 2026/03/27 +Total: $4.75 +Thank you! +``` + +Diese Ausgabe ist nun ein regulärer Python‑String, bereit für CSV‑Export, Datenbankeinfügung oder Natural‑Language‑Processing. + +## Umgang mit häufigen Fallstricken + +### 1. Leere oder verrauschte Bilder + +Wenn `ocr_result.text` leer zurückkommt, überprüfen Sie die Bildqualität erneut. Eine schnelle Lösung ist, einen Vorverarbeitungsschritt hinzuzufügen: + +```python +# Simple preprocessing – convert to grayscale and increase contrast +processed = input_image.to_grayscale().adjust_contrast(1.2) +ocr_result = ocr_engine.recognize(processed) +``` + +### 2. Mehrseitige PDFs + +Wenn Sie ein PDF übergeben, gibt `recognize` Ergebnisse für jede Seite zurück. Durchlaufen Sie sie wie folgt: + +```python +pdf_image = ocr.Image.load("document.pdf") +pages = pdf_image.pages # collection of page images + +for i, page in enumerate(pages, start=1): + result = ocr_engine.recognize(page) + print(f"--- Page {i} ---") + print(result.text) +``` + +### 3. Sprachunterstützung + +Aspose OCR unterstützt über 60 Sprachen. Um die Sprache zu wechseln, setzen Sie die `language`‑Eigenschaft, bevor Sie `recognize` aufrufen: + +```python +ocr_engine.language = "fr" # French +ocr_result = ocr_engine.recognize(input_image) +``` + +## Vollständiges funktionierendes Beispiel + +Alles zusammengeführt, hier ein komplettes, copy‑paste‑fertiges Skript, das alles von der Installation bis zur Behandlung von Edge‑Cases abdeckt: + +```python +# -*- coding: utf-8 -*- +""" +Python OCR tutorial – complete example using Aspose OCR Cloud. +Demonstrates loading an image, performing OCR, and handling multi‑page PDFs. 
+""" + +import asposeocrcloud as ocr +import os + +def ocr_file(filepath: str, language: str = "en"): + """ + Perform OCR on a given file (image or PDF) and return plain text. + """ + # Initialise engine (trial licence) + engine = ocr.OcrEngine() + engine.language = language + + # Load the file – SDK auto‑detects format + image = ocr.Image.load(filepath) + + # If it's a PDF, iterate over pages + if image.is_pdf: + all_text = [] + for page in image.pages: + result = engine.recognize(page) + all_text.append(result.text) + return "\n".join(all_text) + + # Single‑image case + result = engine.recognize(image) + return result.text + + +if __name__ == "__main__": + # Example usage – replace with your own path + sample_path = os.path.join("YOUR_DIRECTORY", "receipt.png") + + if not os.path.exists(sample_path): + raise FileNotFoundError(f"File not found: {sample_path}") + + extracted = ocr_file(sample_path) + print("=== Extracted Text ===") + print(extracted) +``` + +Führen Sie das Skript (`python ocr_demo.py`) aus und Sie sehen die **ocr image to text**‑Ausgabe direkt in Ihrer Konsole. + +## Zusammenfassung – Was wir behandelt haben + +- Das **Aspose OCR Cloud** SDK installiert (`pip install asposeocrcloud`). +- **Initialised the OCR engine** ohne Lizenz (perfekt für die Testversion). +- Gezeigt, wie man **load image for OCR** verwendet, egal ob PNG, JPEG oder PDF. +- **ocr image to text**‑Umwandlung durchgeführt und **convert image plain text** in einen nutzbaren Python‑String umgewandelt. +- Häufige Fallstricke wie Scans mit niedriger Auflösung, mehrseitige PDFs und Sprachauswahl behandelt. + +## Nächste Schritte & verwandte Themen + +Jetzt, da Sie das **python ocr tutorial** gemeistert haben, sollten Sie Folgendes erkunden: + +- **Extract text image python** für die Stapelverarbeitung großer Belegordner. +- Integration der OCR‑Ausgabe mit **pandas** für Datenanalyse (`df = pd.read_csv(StringIO(extracted))`). 
+- Verwendung von **Tesseract OCR** als Rückfallback, wenn die Internetverbindung eingeschränkt ist. +- Hinzufügen von Nachbearbeitung mit **spaCy**, um Entitäten wie Daten, Beträge und Händlernamen zu identifizieren. + +Fühlen Sie sich frei zu experimentieren: probieren Sie verschiedene Bildformate, passen Sie den Kontrast an oder wechseln Sie die Sprache. Die OCR‑Landschaft ist breit, und die Fähigkeiten, die Sie gerade erworben haben, bilden eine solide Grundlage für jedes Dokument‑Automatisierungsprojekt. + +Viel Spaß beim Coden, und möge Ihr Text stets lesbar sein! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/german/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md b/ocr/german/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md new file mode 100644 index 000000000..66785853d --- /dev/null +++ b/ocr/german/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md @@ -0,0 +1,203 @@ +--- +category: general +date: 2026-03-28 +description: Erfahren Sie, wie Sie OCR auf Bildern ausführen, das Hugging Face‑Modell + automatisch herunterladen, OCR‑Text bereinigen und ein LLM‑Modell in Python mit + Aspose OCR Cloud konfigurieren. +draft: false +keywords: +- run OCR on image +- download hugging face model +- clean OCR text +- configure LLM model +language: de +og_description: Führen Sie OCR auf einem Bild aus und bereinigen Sie die Ausgabe mithilfe + eines automatisch heruntergeladenen Hugging Face‑Modells. Dieser Leitfaden zeigt, + wie man ein LLM‑Modell in Python konfiguriert. 
+og_title: OCR auf Bild ausführen – Vollständiges Aspose OCR‑Cloud‑Tutorial +tags: +- OCR +- Python +- LLM +- HuggingFace +title: OCR auf Bild mit Aspose OCR Cloud ausführen – Vollständige Schritt‑für‑Schritt‑Anleitung +url: /de/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# OCR auf Bild ausführen – Komplettes Aspose OCR Cloud Tutorial + +Haben Sie jemals OCR auf Bilddateien ausführen müssen, aber die Rohausgabe sah wie ein wirres Durcheinander aus? Meiner Erfahrung nach ist der größte Schmerzpunkt nicht die Erkennung selbst – sondern die Bereinigung. Glücklicherweise ermöglicht Aspose OCR Cloud das Anhängen eines LLM‑Post‑Processors, der *OCR‑Text* automatisch säubert. In diesem Tutorial führen wir Sie durch alles, was Sie benötigen: vom **Herunterladen eines Hugging Face‑Modells** über die Konfiguration des LLMs, das Ausführen der OCR‑Engine bis hin zur abschließenden Verfeinerung des Ergebnisses. + +Am Ende dieses Leitfadens haben Sie ein einsatzbereites Skript, das: + +1. Ein kompaktes Qwen 2.5‑Modell von Hugging Face abruft (automatisch für Sie heruntergeladen). +2. Das Modell so konfiguriert, dass ein Teil des Netzwerks auf der GPU und der Rest auf der CPU läuft. +3. Die OCR‑Engine auf einem Bild einer handschriftlichen Notiz ausführt. +4. Das LLM verwendet, um den erkannten Text zu bereinigen und Ihnen menschenlesbare Ausgabe zu liefern. + +> **Voraussetzungen** – Python 3.8+, `asposeocrcloud`‑Paket, eine GPU mit mindestens 4 GB VRAM (optional aber empfohlen) und eine Internetverbindung für den ersten Modell‑Download. + +## Was Sie benötigen + +- **Aspose OCR Cloud SDK** – Installation über `pip install asposeocrcloud`. +- **Ein Beispielbild** – z. B. `handwritten_note.jpg` in einem lokalen Ordner. 
+- **GPU‑Unterstützung** – Wenn Sie eine CUDA‑fähige GPU haben, lagert das Skript 30 Schichten aus; andernfalls fällt es automatisch auf die CPU zurück. +- **Schreibberechtigung** – Das Skript cached das Modell in `YOUR_DIRECTORY`; stellen Sie sicher, dass der Ordner existiert. + +## Schritt 1 – LLM‑Modell konfigurieren (Hugging Face‑Modell herunterladen) + +Zuerst teilen wir Aspose AI mit, wo das Modell abgerufen werden soll. Die Klasse `AsposeAIModelConfig` übernimmt den Auto‑Download, die Quantisierung und die Zuweisung von GPU‑Schichten. + +```python +import asposeocrcloud as ocr +from asposeocrcloud.ai import AsposeAI, AsposeAIModelConfig + +# ---------------------------------------------------------------------- +# Step 1: Model configuration – this will download the model if it’s missing +# ---------------------------------------------------------------------- +model_config = AsposeAIModelConfig( + allow_auto_download="true", # Enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", # Repo on Hugging Face + hugging_face_quantization="int8", # Small footprint, fast inference + gpu_layers=30, # 30 layers on GPU, rest on CPU + directory_model_path=r"YOUR_DIRECTORY" # Cache folder (optional) +) +``` + +**Warum das wichtig ist** – Die Quantisierung auf `int8` reduziert den Speicherverbrauch drastisch (≈ 4 GB gegenüber 12 GB). Das Aufteilen des Modells zwischen GPU und CPU ermöglicht das Ausführen eines 3‑Milliarden‑Parameter‑LLM selbst auf einer bescheidenen RTX 3060. Wenn Sie keine GPU haben, setzen Sie `gpu_layers=0` und das SDK hält alles auf der CPU. + +> **Tipp:** Der erste Durchlauf lädt ~ 1,5 GB herunter, geben Sie also ein paar Minuten und eine stabile Verbindung. + +## Schritt 2 – AI‑Engine mit der Modellkonfiguration initialisieren + +Jetzt starten wir die Aspose AI‑Engine und übergeben ihr die gerade erstellte Konfiguration. 
+
+```python
+# ----------------------------------------------------------------------
+# Step 2: Initialise the AI engine – pulls the model if needed
+# ----------------------------------------------------------------------
+ocr_ai = AsposeAI()
+ocr_ai.initialize(model_config)  # This call blocks until the model is ready
+```
+
+**Was im Hintergrund passiert** – Das SDK prüft `directory_model_path` auf ein vorhandenes Modell. Wenn es eine passende Version findet, wird sie sofort geladen; andernfalls wird die GGUF‑Datei von Hugging Face heruntergeladen, entpackt und die Inferenz‑Pipeline vorbereitet.
+
+## Schritt 3 – OCR‑Engine erstellen und den AI‑Post‑Processor anhängen
+
+Die OCR‑Engine übernimmt das schwere Heben beim Erkennen von Zeichen. Durch das Anhängen von `ocr_ai.run_postprocessor` erhalten wir nach der Erkennung automatisch **sauberen OCR‑Text**.
+
+```python
+# ----------------------------------------------------------------------
+# Step 3: Build the OCR engine and bind the LLM post‑processor
+# ----------------------------------------------------------------------
+ocr_engine = ocr.OcrEngine()
+ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=None)
+```
+
+**Warum einen Post‑Processor verwenden?** Roh‑OCR enthält oft Zeilenumbrüche an falschen Stellen, falsch erkannte Interpunktion oder fremde Symbole. Das LLM kann die Ausgabe in korrekte Sätze umschreiben, Rechtschreibung korrigieren und sogar fehlende Wörter ableiten – im Wesentlichen wird ein Rohdump in polierte Prosa verwandelt.
+
+## Schritt 4 – OCR auf einer Bilddatei ausführen
+
+Nachdem alles verbunden ist, ist es Zeit, ein Bild an die Engine zu übergeben.
+ +```python +# ---------------------------------------------------------------------- +# Step 4: Load the image and run OCR +# ---------------------------------------------------------------------- +input_image = ocr.Image.load(r"YOUR_DIRECTORY/handwritten_note.jpg") +raw_result = ocr_engine.recognize(input_image) # Returns an OcrResult object +``` + +**Randfall:** Wenn das Bild groß ist (> 5 MP), möchten Sie es möglicherweise zuerst verkleinern, um die Verarbeitung zu beschleunigen. Das SDK akzeptiert ein Pillow‑`Image`‑Objekt, sodass Sie bei Bedarf mit `PIL.Image.thumbnail()` vorverarbeiten können. + +## Schritt 5 – Die KI den erkannten Text bereinigen lassen und beide Versionen anzeigen + +Zum Schluss rufen wir den zuvor angehängten Post‑Processor auf. Dieser Schritt zeigt den Unterschied zwischen *vor* und *nach* der Bereinigung. + +```python +# ---------------------------------------------------------------------- +# Step 5: Clean the OCR output using the LLM and display both results +# ---------------------------------------------------------------------- +cleaned_result = ocr_engine.run_postprocessor(raw_result) + +print("=== Before AI ===") +print(raw_result.text) + +print("\n=== After AI ===") +print(cleaned_result.text) +``` + +### Erwartete Ausgabe + +``` +=== Before AI === +Th1s 1s a h@ndwr1tt3n n0te. It c0nta1ns m1st@k3s, l1n3 br3aks, & sp3c!@l ch@r@ct3rs. + +=== After AI === +This is a handwritten note. It contains mistakes, line breaks, and special characters. +``` + +Beachten Sie, wie das LLM: + +- Häufige OCR‑Fehler korrigiert hat (`Th1s` → `This`). +- Fremde Symbole entfernt hat (`&` → `and`). +- Zeilenumbrüche in korrekte Sätze normalisiert hat. 
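+
+Noch ein Wort zum oben erwähnten Randfall großer Bilder: Die folgende kleine Hilfsfunktion ist nur eine Skizze (der Name `capped_size` ist frei gewählt) und berechnet Zielabmessungen, die ein Pixelbudget einhalten – das Ergebnis können Sie anschließend z. B. an `PIL.Image.thumbnail()` übergeben.
+
+```python
+import math
+
+def capped_size(width: int, height: int, max_pixels: int = 5_000_000):
+    """Return (w, h) scaled down so that w * h stays within max_pixels."""
+    pixels = width * height
+    if pixels <= max_pixels:
+        return width, height  # already small enough – keep as is
+    scale = math.sqrt(max_pixels / pixels)  # uniform scale keeps the aspect ratio
+    return max(1, int(width * scale)), max(1, int(height * scale))
+
+# A 12-MP photo (4000x3000) is reduced to roughly 5 MP
+print(capped_size(4000, 3000))  # → (2581, 1936)
+```
+
+So bleibt das Seitenverhältnis erhalten, und die Verarbeitung wird spürbar schneller, ohne dass lesbarer Text verloren geht.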
+ +## 🎨 Visueller Überblick (OCR‑Workflow auf Bild ausführen) + +![Run OCR on image workflow](run_ocr_on_image_workflow.png "Diagram showing the run OCR on image pipeline from model download to cleaned output") + +Das obige Diagramm fasst die gesamte Pipeline zusammen: **Hugging Face‑Modell herunterladen → LLM konfigurieren → AI initialisieren → OCR‑Engine → AI‑Post‑Processor → OCR‑Text bereinigen**. + +## Häufige Fragen & Pro‑Tipps + +### Was, wenn ich keine GPU habe? + +Setzen Sie `gpu_layers=0` in `AsposeAIModelConfig`. Das Modell läuft dann vollständig auf der CPU, was langsamer, aber dennoch funktionsfähig ist. Sie können auch zu einem kleineren Modell wechseln (z. B. `Qwen/Qwen2.5-1.5B‑Instruct‑GGUF`), um die Inferenzzeit angemessen zu halten. + +### Wie ändere ich später das Modell? + +Aktualisieren Sie einfach `hugging_face_repo_id` und führen Sie `ocr_ai.initialize(model_config)` erneut aus. Das SDK erkennt die Versionsänderung, lädt das neue Modell herunter und ersetzt die zwischengespeicherten Dateien. + +### Kann ich den Prompt des Post‑Processors anpassen? + +Ja. Übergeben Sie ein Dictionary an `custom_settings` mit einem Schlüssel `prompt_template`. Zum Beispiel: + +```python +custom_prompt = { + "prompt_template": "Correct the following OCR text and keep line breaks:\n{ocr_text}" +} +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=custom_prompt) +``` + +### Sollte ich den bereinigten Text in einer Datei speichern? + +Definitiv. 
Nach der Bereinigung können Sie das Ergebnis in eine `.txt`‑ oder `.json`‑Datei für die Weiterverarbeitung schreiben:
+
+```python
+with open("cleaned_note.txt", "w", encoding="utf-8") as f:
+    f.write(cleaned_result.text)
+```
+
+## Fazit
+
+Wir haben Ihnen gerade gezeigt, wie Sie **OCR auf Bilddateien** mit Aspose OCR Cloud ausführen, automatisch ein **Hugging Face‑Modell herunterladen**, die **LLM‑Modelleinstellungen** fachkundig **konfigurieren** und schließlich **OCR‑Text** mit einem leistungsstarken LLM‑Post‑Processor bereinigen. Der gesamte Prozess passt in ein einziges, leicht auszuführendes Python‑Skript und funktioniert sowohl auf GPU‑fähigen als auch auf reinen CPU‑Maschinen.
+
+Wenn Sie mit dieser Pipeline vertraut sind, experimentieren Sie gern mit:
+
+- **Verschiedene LLMs** – probieren Sie `meta-llama/Meta-Llama-3-8B‑Instruct‑GGUF` für ein größeres Kontextfenster.
+- **Batch‑Verarbeitung** – iterieren Sie über einen Ordner mit Bildern und aggregieren Sie die bereinigten Ergebnisse in einer CSV.
+- **Benutzerdefinierte Prompts** – passen Sie die KI an Ihre Domäne an (rechtliche Dokumente, medizinische Notizen usw.).
+
+Passen Sie den Wert `gpu_layers` nach Belieben an, tauschen Sie das Modell aus oder fügen Sie Ihren eigenen Prompt ein. Der Himmel ist die Grenze, und der Code, den Sie jetzt haben, ist die Startrampe.
+
+Viel Spaß beim Programmieren, und möge Ihre OCR‑Ausgabe stets sauber sein!
🚀 + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/greek/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md b/ocr/greek/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md new file mode 100644 index 000000000..81b710c62 --- /dev/null +++ b/ocr/greek/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md @@ -0,0 +1,225 @@ +--- +category: general +date: 2026-03-28 +description: Πώς να χρησιμοποιήσετε OCR για την αναγνώριση χειρόγραφου κειμένου σε + εικόνες. Μάθετε πώς να εξάγετε το χειρόγραφο κείμενο, να μετατρέψετε την εικόνα + με χειρόγραφο και να λαμβάνετε καθαρά αποτελέσματα γρήγορα. +draft: false +keywords: +- how to use OCR +- recognize handwritten text +- extract handwritten text +- handwritten note to text +- convert handwritten image +language: el +og_description: Πώς να χρησιμοποιήσετε OCR για να αναγνωρίσετε χειρόγραφο κείμενο. + Αυτό το σεμινάριο σας δείχνει βήμα‑βήμα πώς να εξάγετε χειρόγραφο κείμενο από εικόνες + και να πετύχετε άψογα αποτελέσματα. 
+og_title: Πώς να χρησιμοποιήσετε OCR για την αναγνώριση χειρόγραφου κειμένου – Πλήρης + οδηγός +tags: +- OCR +- Handwriting Recognition +- Python +title: Πώς να χρησιμοποιήσετε OCR για την αναγνώριση χειρόγραφου κειμένου – Πλήρης + οδηγός +url: /el/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Πώς να Χρησιμοποιήσετε OCR για την Αναγνώριση Χειρόγραφου Κειμένου – Πλήρης Οδηγός + +Πώς να χρησιμοποιήσετε OCR για χειρόγραφες σημειώσεις είναι μια ερώτηση που πολλοί προγραμματιστές κάνουν όταν χρειάζεται να ψηφιοποιήσουν σκίτσα, πρακτικά συναντήσεων ή γρήγορες ιδέες. Σε αυτόν τον οδηγό θα περάσουμε από τα ακριβή βήματα για την αναγνώριση χειρόγραφου κειμένου, την εξαγωγή του και τη μετατροπή μιας χειρόγραφης εικόνας σε καθαρά, αναζητήσιμα strings. + +Αν έχετε ποτέ κοιτάξει μια φωτογραφία λίστας αγορών και αναρωτηθήκατε, “Μπορώ να μετατρέψω αυτή τη χειρόγραφη εικόνα σε κείμενο χωρίς να πληκτρολογήσω ξανά τα πάντα;” – βρίσκεστε στο σωστό μέρος. Στο τέλος θα έχετε ένα έτοιμο script που μετατρέπει μια **χειρόγραφη σημείωση σε κείμενο** σε δευτερόλεπτα. + +## Τι Θα Χρειαστείτε + +- Python 3.8+ (ο κώδικας λειτουργεί με οποιαδήποτε πρόσφατη έκδοση) +- Η βιβλιοθήκη `ocr` – εγκαταστήστε την με `pip install ocr-sdk` (αντικαταστήστε με το όνομα του πακέτου του παρόχου σας) +- Μια καθαρή φωτογραφία ενός χειρόγραφου σημειώματος (`hand_note.png` στο παράδειγμα) +- Λίγη περιέργεια και ένας καφές ☕️ (προαιρετικό αλλά συνιστάται) + +Δεν χρειάζονται βαριά frameworks, ούτε πληρωμένα κλειδιά cloud – μόνο μια τοπική μηχανή που υποστηρίζει **handwritten recognition** έτοιμη προς χρήση. + +## Βήμα 1 – Εγκατάσταση του Πακέτου OCR και Εισαγωγή του + +Πρώτα απ' όλα, ας εγκαταστήσουμε το σωστό πακέτο στο μηχάνημά σας. 
Ανοίξτε ένα τερματικό και εκτελέστε:
+
+```bash
+pip install ocr-sdk
+```
+
+Μόλις ολοκληρωθεί η εγκατάσταση, εισάγετε το module στο script σας:
+
+```python
+# Step 1: Import the OCR SDK
+import ocr
+```
+
+> **Pro tip:** Αν χρησιμοποιείτε εικονικό περιβάλλον, ενεργοποιήστε το πριν την εγκατάσταση. Έτσι το έργο σας παραμένει καθαρό και αποφεύγονται συγκρούσεις εκδόσεων.
+
+## Βήμα 2 – Δημιουργία Μηχανής OCR και Ενεργοποίηση Λειτουργίας Χειρογράφου
+
+Τώρα φτάνουμε στο πρακτικό κομμάτι του **πώς να χρησιμοποιήσετε OCR** – χρειαζόμαστε μια παρουσία μηχανής που ξέρει ότι ασχολούμαστε με καμπύλες γραμμές αντί για τυπωμένο κείμενο. Το παρακάτω απόσπασμα δημιουργεί τη μηχανή και τη θέτει σε λειτουργία χειρογράφου:
+
+```python
+# Step 2: Initialize the OCR engine for handwritten text
+ocr_engine = ocr.OcrEngine()
+ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN
+```
+
+Γιατί να ορίσουμε το `recognition_mode`; Επειδή οι περισσότερες μηχανές OCR έχουν ως προεπιλογή την ανίχνευση τυπωμένου κειμένου, που συχνά παραλείπει τους βρόχους και τις κλίσεις ενός προσωπικού σημειώματος. Η ενεργοποίηση της λειτουργίας χειρογράφου αυξάνει την ακρίβεια δραματικά.
+
+## Βήμα 3 – Φόρτωση της Εικόνας που Θέλετε να Μετατρέψετε (Convert Handwritten Image)
+
+Οι εικόνες είναι το ακατέργαστο υλικό για κάθε εργασία OCR. Βεβαιωθείτε ότι η φωτογραφία σας είναι αποθηκευμένη σε μορφή χωρίς απώλειες (PNG λειτουργεί εξαιρετικά) και ότι το κείμενο είναι λογικά αναγνώσιμο. Στη συνέχεια φορτώστε την ως εξής:
+
+```python
+# Step 3: Load the handwritten image you want to convert
+handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png")
+```
+
+Αν η εικόνα βρίσκεται δίπλα στο script, μπορείτε απλώς να χρησιμοποιήσετε `"hand_note.png"` αντί για πλήρη διαδρομή.
+
+> **Τι γίνεται αν η εικόνα είναι θολή;** Δοκιμάστε προεπεξεργασία με OpenCV (π.χ., `cv2.cvtColor` σε grayscale, `cv2.threshold` για αύξηση αντίθεσης) πριν τη δώσετε στη μηχανή OCR.
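+
+Για να φανεί τι ακριβώς κάνει αυτή η προεπεξεργασία, ορίστε ένα μικρό σκίτσο σε καθαρή Python (χωρίς εξαρτήσεις) που αναπαράγει τη λογική grayscale + threshold· σε πραγματικό κώδικα θα χρησιμοποιούσατε τις αντίστοιχες συναρτήσεις του OpenCV:
+
+```python
+def to_grayscale(pixel):
+    # Luminance-weighted mix – the same idea as cv2.cvtColor(..., COLOR_BGR2GRAY)
+    b, g, r = pixel
+    return round(0.114 * b + 0.587 * g + 0.299 * r)
+
+def binarize(gray_rows, cutoff=128):
+    # Hard threshold, like cv2.threshold(..., THRESH_BINARY): paper -> 255, ink -> 0
+    return [[255 if px > cutoff else 0 for px in row] for row in gray_rows]
+
+# A tiny 2x2 "image": light paper pixels and dark ink pixels (B, G, R)
+image = [[(250, 250, 250), (10, 10, 10)],
+         [(40, 40, 40), (230, 230, 230)]]
+gray = [[to_grayscale(px) for px in row] for row in image]
+print(binarize(gray))  # → [[255, 0], [0, 255]]
+```
+
+Το καθαρό ασπρόμαυρο αποτέλεσμα κάνει τις καμπύλες του χειρόγραφου πολύ πιο ευδιάκριτες για τη μηχανή OCR.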
+ +## Βήμα 4 – Εκτέλεση της Μηχανής Αναγνώρισης για Εξαγωγή Χειρόγραφου Κειμένου + +Με τη μηχανή έτοιμη και την εικόνα στη μνήμη, μπορούμε τελικά **να εξάγουμε χειρόγραφο κείμενο**. Η μέθοδος `recognize` επιστρέφει ένα ακατέργαστο αντικείμενο αποτελέσματος που περιέχει το κείμενο μαζί με βαθμολογίες εμπιστοσύνης. + +```python +# Step 4: Perform OCR and get the raw result +raw_result = ocr_engine.recognize(handwritten_image) +print("Raw OCR output:") +print(raw_result.text) +``` + +Το τυπικό ακατέργαστο αποτέλεσμα μπορεί να περιλαμβάνει τυχαία line breaks ή λανθασμένους χαρακτήρες, ειδικά αν το γράψιμο είναι ακατάστατο. Γι' αυτό υπάρχει το επόμενο βήμα. + +## Βήμα 5 – (Προαιρετικό) Βελτίωση του Αποτελέσματος με τον AI Post‑Processor + +Οι περισσότερες σύγχρονες OCR SDK έρχονται με έναν ελαφρύ AI post‑processor που καθαρίζει τα κενά, διορθώνει κοινά σφάλματα OCR και ομαλοποιεί τα line endings. Η εκτέλεσή του είναι τόσο απλή όσο: + +```python +# Step 5: Refine the raw OCR output (handwritten note to text) +polished_result = ocr_engine.run_postprocessor(raw_result) + +# Display the cleaned, readable text +print("\nPolished OCR output:") +print(polished_result.text) +``` + +Αν παραλείψετε αυτό το βήμα, θα έχετε ακόμα χρησιμοποιήσιμο κείμενο, αλλά η **μετατροπή χειρόγραφης σημείωσης σε κείμενο** θα φαίνεται λίγο πιο ακατέργαστη. Ο post‑processor είναι ιδιαίτερα χρήσιμος για σημειώσεις που περιέχουν κουκίδες ή μικτά κεφαλαία-μικρά γράμματα. + +## Βήμα 6 – Επαλήθευση του Αποτελέσματος και Διαχείριση Edge Cases + +Αφού εκτυπώσετε το βελτιωμένο αποτέλεσμα, ελέγξτε ξανά ότι όλα φαίνονται σωστά. Εδώ είναι ένας γρήγορος έλεγχος που μπορείτε να προσθέσετε: + +```python +# Step 6: Simple verification +if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") +else: + print("\n✅ OCR succeeded! 
You can now save or further process the text.") +``` + +**Λίστα ελέγχου edge‑case** + +| Κατάσταση | Τι να κάνετε | +|-----------|--------------| +| **Πολύ χαμηλή αντίθεση** | Αυξήστε την αντίθεση με `cv2.convertScaleAbs` πριν τη φόρτωση. | +| **Πολλαπλές γλώσσες** | Ορίστε `ocr_engine.language = ["en", "es"]` (ή τις γλώσσες‑στόχο σας). | +| **Μεγάλα έγγραφα** | Επεξεργαστείτε τις σελίδες σε παρτίδες για να αποφύγετε αυξήσεις μνήμης. | +| **Ειδικά σύμβολα** | Προσθέστε ένα προσαρμοσμένο λεξικό μέσω `ocr_engine.add_custom_words([...])`. | + +## Visual Overview + +Παρακάτω υπάρχει μια εικόνα placeholder που απεικονίζει τη ροή εργασίας—από μια φωτογραφημένη σημείωση σε καθαρό κείμενο. Το alt text περιέχει τη βασική λέξη‑κλειδί, καθιστώντας την εικόνα φιλική για SEO. + +![πώς να χρησιμοποιήσετε OCR σε μια εικόνα χειρόγραφου σημειώματος](/images/handwritten_ocr_flow.png "πώς να χρησιμοποιήσετε OCR σε μια εικόνα χειρόγραφου σημειώματος") + +## Πλήρες, Εκτελέσιμο Script + +Συνδυάζοντας όλα τα κομμάτια, εδώ είναι το πλήρες, έτοιμο για αντιγραφή‑και‑επικόλληση πρόγραμμα: + +```python +# Complete script: Convert a handwritten image to clean text using OCR + +import ocr + +def main(): + # 1️⃣ Initialize OCR engine for handwritten recognition + ocr_engine = ocr.OcrEngine() + ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN + + # 2️⃣ Load the image containing the handwritten note + handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") + + # 3️⃣ Perform OCR to extract raw text + raw_result = ocr_engine.recognize(handwritten_image) + print("Raw OCR output:") + print(raw_result.text) + + # 4️⃣ (Optional) Run AI post‑processor for cleaner output + polished_result = ocr_engine.run_postprocessor(raw_result) + + # 5️⃣ Show the polished, readable text + print("\nPolished OCR output:") + print(polished_result.text) + + # 6️⃣ Simple sanity check + if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") + 
else: + print("\n✅ OCR succeeded! You can now save or further process the text.") + +if __name__ == "__main__": + main() +``` + +**Αναμενόμενο αποτέλεσμα (παράδειγμα)** + +``` +Raw OCR output: +T0d@y I w3nt to the market +and bought 5 aplpes, 2 bananas, +and a loaf of bread. + +Polished OCR output: +Today I went to the market and bought 5 apples, 2 bananas, and a loaf of bread. +``` + +Παρατηρήστε πώς ο post‑processor διόρθωσε το τυπογραφικό σφάλμα “T0d@y” και ομαλοποίησε τα κενά. + +## Συνηθισμένα Πιθανά Προβλήματα & Συμβουλές Pro + +- **Το μέγεθος της εικόνας μετράει** – Οι μηχανές OCR συνήθως περιορίζουν το μέγεθος εισόδου στα 4 K × 4 K. Αλλάξτε το μέγεθος μεγάλων φωτογραφιών εκ των προτέρων. +- **Στυλ χειρογράφου** – Η καλλιγραφική γραφή vs. μπλοκ γράμματα μπορεί να επηρεάσει την ακρίβεια. Αν ελέγχετε την πηγή (π.χ., ψηφιακό στυλό), προτιμήστε μπλοκ γράμματα για καλύτερα αποτελέσματα. +- **Επεξεργασία παρτίδων** – Όταν διαχειρίζεστε δεκάδες σημειώσεις, τυλίξτε το script σε βρόχο και αποθηκεύστε κάθε αποτέλεσμα σε CSV ή SQLite DB. +- **Διαρροές μνήμης** – Ορισμένα SDK διατηρούν εσωτερικές προσωρινές μνήμες· καλέστε `ocr_engine.dispose()` μετά το τέλος αν παρατηρήσετε επιβράδυνση. + +## Επόμενα Βήματα – Πέρα από το Απλό OCR + +Τώρα που έχετε κατακτήσει **πώς να χρησιμοποιήσετε OCR** για μια μοναδική εικόνα, σκεφτείτε αυτές τις επεκτάσεις: + +1. **Ενσωμάτωση με αποθήκευση στο cloud** – Ανάκτηση εικόνων από AWS S3 ή Azure Blob, εκτέλεση της ίδιας διαδικασίας και αποστολή των αποτελεσμάτων πίσω. +2. **Προσθήκη ανίχνευσης γλώσσας** – Χρησιμοποιήστε `ocr_engine.detect_language()` για αυτόματη εναλλαγή λεξικών. +3. **Συνδυασμός με NLP** – Εισάγετε το καθαρισμένο κείμενο στο spaCy ή NLTK για εξαγωγή οντοτήτων, ημερομηνιών ή ενεργειών. +4. **Δημιουργία REST endpoint** – Τυλίξτε το script σε Flask ή FastAPI ώστε άλλες υπηρεσίες να μπορούν να στέλνουν POST εικόνες και να λαμβάνουν κείμενο κωδικοποιημένο σε JSON. 
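+
+Η «Επεξεργασία παρτίδων» που αναφέρθηκε στις συμβουλές παραπάνω χωράει σε λίγες γραμμές. Το παρακάτω σκίτσο είναι ενδεικτικό: η `recognize_file` είναι υποθετικό placeholder – στον πραγματικό σας κώδικα θα καλούσατε εκεί το `ocr_engine.recognize(...)` και θα επιστρέφατε το `.text`.
+
+```python
+import csv
+import pathlib
+
+def recognize_file(path):
+    # Hypothetical stand-in for: ocr_engine.recognize(ocr.Image.load(path)).text
+    return f"text from {path.name}"
+
+def batch_to_csv(folder, out_csv):
+    """OCR every PNG in `folder` and collect the results in a CSV file."""
+    rows = [(p.name, recognize_file(p))
+            for p in sorted(pathlib.Path(folder).glob("*.png"))]
+    with open(out_csv, "w", newline="", encoding="utf-8") as f:
+        writer = csv.writer(f)
+        writer.writerow(["file", "text"])
+        writer.writerows(rows)
+    return len(rows)  # how many notes were processed
+```
+
+Με αυτό το μοτίβο μπορείτε αργότερα να αντικαταστήσετε το CSV με SQLite χωρίς να αλλάξετε τη ροή.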
+ +Όλες αυτές οι ιδέες περιστρέφονται ακόμα γύρω από τις βασικές έννοιες **recognize handwritten text**, **extract handwritten text**, και **convert handwritten image**—τις ακριβείς φράσεις που πιθανότατα θα ψάξετε στη συνέχεια. + +### TL;DR + +Σας δείξαμε **πώς να χρησιμοποιήσετε OCR** για την αναγνώριση χειρόγραφου κειμένου, την εξαγωγή του και τη βελτίωση του αποτελέσματος σε ένα χρήσιμο string. Το πλήρες script είναι έτοιμο για εκτέλεση, η ροή εργασίας εξηγείται βήμα‑βήμα, και έχετε τώρα μια λίστα ελέγχου για κοινά edge cases. Πάρτε μια φωτογραφία της επόμενης σημείωσης της συνάντησής σας, τροφοδοτήστε τη στο script, και αφήστε τη μηχανή να κάνει την πληκτρολόγηση για εσάς. + +Καλό coding, και ας είναι πάντα αναγνώσιμες οι σημειώσεις σας! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/greek/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md b/ocr/greek/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md new file mode 100644 index 000000000..c9d8d9a37 --- /dev/null +++ b/ocr/greek/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md @@ -0,0 +1,187 @@ +--- +category: general +date: 2026-03-28 +description: Εκτελέστε OCR σε εικόνα και λάβετε καθαρό κείμενο με συντεταγμένες περιοριστικού + πλαισίου. Μάθετε πώς να εξάγετε OCR, να καθαρίζετε OCR και να εμφανίζετε τα αποτελέσματα + βήμα‑βήμα. +draft: false +keywords: +- perform OCR on image +- how to extract OCR +- how to clean OCR +- display bounding box coordinates +- OCR post‑processing +- OCR bounding boxes +language: el +og_description: Εκτελέστε OCR σε εικόνα, καθαρίστε το αποτέλεσμα και εμφανίστε τις + συντεταγμένες του πλαισίου σε ένα σύντομο οδηγό. 
+og_title: Πραγματοποιήστε OCR σε εικόνα – Καθαρά αποτελέσματα και πλαίσια οριοθέτησης +tags: +- OCR +- Computer Vision +- Python +title: Εκτέλεση OCR σε εικόνα – Καθαρά αποτελέσματα και εμφάνιση συντεταγμένων πλαισίου + περιορισμού +url: /el/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Εκτέλεση OCR σε Εικόνα – Καθαρά Αποτελέσματα και Εμφάνιση Συντεταγμένων Πλαισίου Περιγράμματος + +Έχετε ποτέ χρειαστεί να **εκτελέσετε OCR σε εικόνα** αλλά να λαμβάνετε ακατάστατο κείμενο και να μην ξέρετε πού βρίσκεται κάθε λέξη στην εικόνα; Δεν είστε μόνοι. Σε πολλά έργα—ψηφιοποίηση τιμολογίων, σάρωση αποδείξεων ή απλή εξαγωγή κειμένου—η λήψη του ακατέργαστου αποτελέσματος OCR είναι μόνο το πρώτο εμπόδιο. Τα καλά νέα; Μπορείτε να καθαρίσετε αυτό το αποτέλεσμα και να δείτε αμέσως τις συντεταγμένες του πλαισίου περιγράμματος κάθε περιοχής χωρίς να γράψετε πολύ κώδικα boilerplate. + +Σε αυτόν τον οδηγό θα περάσουμε από το **πώς να εξάγετε OCR**, θα τρέξουμε έναν **πώς να καθαρίσετε OCR** μετα‑επεξεργαστή, και τελικά **να εμφανίσουμε τις συντεταγμένες του πλαισίου περιγράμματος** για κάθε καθαρή περιοχή. Στο τέλος θα έχετε ένα ενιαίο, εκτελέσιμο σενάριο που μετατρέπει μια θολή φωτογραφία σε τακτικό, δομημένο κείμενο έτοιμο για επεξεργασία downstream. + +## Τι Θα Χρειαστείτε + +- Python 3.9+ (η σύνταξη παρακάτω λειτουργεί σε 3.8 και νεότερες εκδόσεις) +- Μια μηχανή OCR που υποστηρίζει `recognize(..., return_structured=True)` – για παράδειγμα, μια φανταστική βιβλιοθήκη `engine` που χρησιμοποιείται στο απόσπασμα. Αντικαταστήστε την με Tesseract, EasyOCR ή οποιοδήποτε SDK που επιστρέφει δεδομένα περιοχής. +- Βασική εξοικείωση με συναρτήσεις και βρόχους της Python +- Ένα αρχείο εικόνας που θέλετε να σαρώσετε (PNG, JPG κ.λπ.) 
+ +> **Pro tip:** Αν χρησιμοποιείτε Tesseract, η συνάρτηση `pytesseract.image_to_data` παρέχει ήδη πλαίσια περιγράμματος. Μπορείτε να τυλίξετε το αποτέλεσμα της σε έναν μικρό προσαρμογέα που μιμείται το API `engine.recognize` που φαίνεται παρακάτω. + +--- + +![παράδειγμα εκτέλεσης OCR σε εικόνα](image-placeholder.png "παράδειγμα εκτέλεσης OCR σε εικόνα") + +*Alt text: diagram showing how to perform OCR on image and visualize bounding box coordinates* + +## Βήμα 1 – Εκτέλεση OCR σε Εικόνα και Λήψη Δομημένων Περιοχών + +Το πρώτο που πρέπει να κάνετε είναι να ζητήσετε από τη μηχανή OCR να επιστρέψει όχι μόνο απλό κείμενο, αλλά μια δομημένη λίστα περιοχών κειμένου. Αυτή η λίστα περιέχει τη ακατέργαστη συμβολοσειρά και το ορθογώνιο που την περιβάλλει. + +```python +import engine # replace with your actual OCR library +from pathlib import Path + +# Load the image you want to process +image_path = Path("sample_invoice.jpg") +image = engine.load_image(image_path) + +# Step 1: Perform OCR and request a structured list of text regions +raw_result = engine.recognize(image, return_structured=True) +``` + +**Γιατί αυτό είναι σημαντικό:** +Όταν ζητάτε μόνο απλό κείμενο χάνετε το χωρικό πλαίσιο. Τα δομημένα δεδομένα σας επιτρέπουν να **εμφανίσετε τις συντεταγμένες του πλαισίου περιγράμματος**, να ευθυγραμμίσετε το κείμενο με πίνακες ή να δώσετε ακριβείς θέσεις σε ένα downstream μοντέλο. + +## Βήμα 2 – Πώς να Καθαρίσετε το Αποτέλεσμα OCR με έναν Μετα‑Επεξεργαστή + +Οι μηχανές OCR είναι εξαιρετικές στο να εντοπίζουν χαρακτήρες, αλλά συχνά αφήνουν περιττά κενά, τεχνάσματα αλλαγής γραμμής ή λανθασμένα αναγνωρισμένα σύμβολα. Ένας μετα‑επεξεργαστής κανονικοποιεί το κείμενο, διορθώνει κοινά σφάλματα OCR και αφαιρεί λευκούς χαρακτήρες. 
+ +```python +# Step 2: Clean the plain‑text of each region using the post‑processor +processed_result = engine.run_postprocessor(raw_result) +``` + +Αν δημιουργείτε τον δικό σας καθαριστή, σκεφτείτε: + +- Αφαίρεση μη‑ASCII χαρακτήρων (`re.sub(r'[^\x00-\x7F]+',' ', text)`) +- Συμπίεση πολλαπλών κενών σε ένα ενιαίο κενό +- Εφαρμογή ελεγκτή ορθογραφίας όπως `pyspellchecker` για προφανή τυπογραφικά λάθη + +**Γιατί πρέπει να σας ενδιαφέρει:** +Μια τακτική συμβολοσειρά κάνει την αναζήτηση, την ευρετηρίαση και τις downstream pipelines NLP πολύ πιο αξιόπιστες. Με άλλα λόγια, το **πώς να καθαρίσετε OCR** είναι συχνά η διαφορά μεταξύ ενός χρήσιμου συνόλου δεδομένων και ενός κεφαλαλγίας. + +## Βήμα 3 – Εμφάνιση Συντεταγμένων Πλαισίου Περιγράμματος για Κάθε Καθαρή Περιοχή + +Τώρα που το κείμενο είναι τακτοποιημένο, επαναλαμβάνουμε πάνω από κάθε περιοχή, εκτυπώνοντας το ορθογώνιο της και τη καθαρή συμβολοσειρά. Αυτό είναι το τμήμα όπου τελικά **εμφανίζουμε τις συντεταγμένες του πλαισίου περιγράμματος**. + +```python +# Step 3 – Iterate over the cleaned regions and display their bounding box and text +for text_region in processed_result.regions: + # Each region has a .bounding_box attribute (x, y, width, height) + bbox = text_region.bounding_box + print(f"[{bbox}] {text_region.text}") +``` + +**Δείγμα εξόδου** + +``` +[(34, 120, 210, 30)] Invoice #12345 +[(34, 160, 420, 28)] Date: 2026‑03‑01 +[(34, 200, 380, 28)] Total Amount: $1,254.00 +``` + +Μπορείτε τώρα να περάσετε αυτές τις συντεταγμένες σε μια βιβλιοθήκη σχεδίασης (π.χ., OpenCV) για να επικάλυψετε πλαίσια στην αρχική εικόνα, ή να τις αποθηκεύσετε σε μια βάση δεδομένων για μελλοντικά ερωτήματα. + +## Πλήρες, Έτοιμο‑για‑Εκτέλεση Σενάριο + +Παρακάτω είναι το πλήρες πρόγραμμα που ενώνει τα τρία βήματα. Αντικαταστήστε τις κλήσεις placeholder `engine` με το πραγματικό σας OCR SDK. + +```python +#!/usr/bin/env python3 +""" +Perform OCR on image → clean results → display bounding box coordinates. 
+Author: Your Name
+Date: 2026‑03‑28
+"""
+
+import engine  # <-- replace with your OCR library
+from pathlib import Path
+import sys
+
+def main(image_path: str):
+    # Load image
+    image = engine.load_image(Path(image_path))
+
+    # 1️⃣ Perform OCR and ask for structured output
+    raw_result = engine.recognize(image, return_structured=True)
+
+    # 2️⃣ Clean the raw text using the built‑in post‑processor
+    processed_result = engine.run_postprocessor(raw_result)
+
+    # 3️⃣ Show each region's bounding box and cleaned text
+    print("\n=== Cleaned OCR Regions ===")
+    for region in processed_result.regions:
+        bbox = region.bounding_box  # (x, y, w, h)
+        print(f"[{bbox}] {region.text}")
+
+if __name__ == "__main__":
+    if len(sys.argv) != 2:
+        print("Usage: python perform_ocr.py <image_path>")
+        sys.exit(1)
+    main(sys.argv[1])
+```
+
+### Πώς να Εκτελέσετε
+
+```bash
+python perform_ocr.py sample_invoice.jpg
+```
+
+Θα πρέπει να δείτε μια λίστα πλαισίων περιγράμματος συνδεδεμένων με καθαρό κείμενο, ακριβώς όπως το δείγμα εξόδου παραπάνω.
+
+## Συχνές Ερωτήσεις & Ακραίες Περιπτώσεις
+
+| Ερώτηση | Απάντηση |
+|----------|--------|
+| **Τι γίνεται αν η μηχανή OCR δεν υποστηρίζει `return_structured`;** | Γράψτε έναν ελαφρύ wrapper που μετατρέπει το ακατέργαστο αποτέλεσμα της μηχανής (συνήθως μια λίστα λέξεων με συντεταγμένες) σε αντικείμενα με ιδιότητες `text` και `bounding_box`. |
+| **Μπορώ να λάβω βαθμολογίες εμπιστοσύνης;** | Πολλά SDK εκθέτουν ένα μέτρο εμπιστοσύνης ανά περιοχή. Προσθέστε το στη δήλωση εκτύπωσης: `print(f"[{bbox}] {region.text} (conf: {region.confidence:.2f})")`. |
+| **Πώς να διαχειριστείτε κείμενο με περιστροφή;** | Προεπεξεργαστείτε την εικόνα με το `cv2.minAreaRect` του OpenCV για διόρθωση κλίσης πριν καλέσετε το `recognize`. |
+| **Τι γίνεται αν χρειάζομαι το αποτέλεσμα σε JSON;** | Σειριοποιήστε το `processed_result.regions` με `json.dumps([r.__dict__ for r in processed_result.regions], indent=2)`. |
| +| **Υπάρχει τρόπος να οπτικοποιήσετε τα πλαίσια;** | Χρησιμοποιήστε το OpenCV: `cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0), 2)` μέσα στον βρόχο, μετά `cv2.imwrite("annotated.jpg", img)`. | + +## Συμπερασματικά + +Μόλις μάθατε **πώς να εκτελέσετε OCR σε εικόνα**, να καθαρίσετε το ακατέργαστο αποτέλεσμα, και **να εμφανίσετε τις συντεταγμένες του πλαισίου περιγράμματος** για κάθε περιοχή. Η τριπλή ροή—recognize → post‑process → iterate—είναι ένα επαναχρησιμοποιήσιμο μοτίβο που μπορείτε να ενσωματώσετε σε οποιοδήποτε έργο Python που χρειάζεται αξιόπιστη εξαγωγή κειμένου. + +### Τι Ακολουθεί; + +- **Εξερευνήστε διαφορετικά OCR back‑ends** (Tesseract, EasyOCR, Google Vision) και συγκρίνετε την ακρίβεια. +- **Ενσωματώστε με μια βάση δεδομένων** για αποθήκευση δεδομένων περιοχής για αναζητήσιμα αρχεία. +- **Προσθέστε ανίχνευση γλώσσας** για να κατευθύνετε κάθε περιοχή μέσω του κατάλληλου ελεγκτή ορθογραφίας. +- **Επικάλυψη πλαισίων στην αρχική εικόνα** για οπτική επαλήθευση (δείτε το απόσπασμα OpenCV παραπάνω). + +Αν αντιμετωπίσετε ιδιομορφίες, θυμηθείτε ότι η μεγαλύτερη νίκη προέρχεται από ένα ισχυρό βήμα μετα‑επεξεργασίας· μια καθαρή συμβολοσειρά είναι πολύ πιο εύκολο να δουλέψει από μια ακατέργαστη ροή χαρακτήρων. + +Καλό προγραμματισμό, και εύχομαι οι OCR pipelines σας να είναι πάντα τακτικοί! 
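+
+Υ.Γ. – το «ισχυρό βήμα μετα‑επεξεργασίας» που αναφέρθηκε παραπάνω μπορεί να ξεκινήσει από κάτι πολύ απλό. Ακολουθεί ένα ενδεικτικό σκίτσο με τη βασική λογική του Βήματος 2 (αφαίρεση μη‑ASCII χαρακτήρων, συμπίεση κενών), αν θελήσετε να γράψετε δικό σας καθαριστή:
+
+```python
+import re
+
+def clean_ocr_text(text: str) -> str:
+    # Replace runs of non-ASCII artefacts with a single space...
+    text = re.sub(r"[^\x00-\x7F]+", " ", text)
+    # ...then collapse all whitespace (spaces, tabs, stray newlines) into one space
+    text = re.sub(r"\s+", " ", text)
+    return text.strip()
+
+print(clean_ocr_text("Invoice\n  #12345\n\nTotal:   $1,254.00"))
+# → Invoice #12345 Total: $1,254.00
+```
+
+Δεν αντικαθιστά τον post‑processor της μηχανής, αλλά είναι ένα καλό πρώτο φίλτρο πριν από οποιαδήποτε downstream επεξεργασία.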
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/greek/python/general/python-ocr-tutorial-extract-text-from-images/_index.md b/ocr/greek/python/general/python-ocr-tutorial-extract-text-from-images/_index.md new file mode 100644 index 000000000..7aa757a45 --- /dev/null +++ b/ocr/greek/python/general/python-ocr-tutorial-extract-text-from-images/_index.md @@ -0,0 +1,233 @@ +--- +category: general +date: 2026-03-28 +description: Μάθημα Python OCR που δείχνει πώς να εξάγετε κείμενο από εικόνα με το + Aspose OCR Cloud. Μάθετε πώς να φορτώνετε εικόνα για OCR και να μετατρέπετε την + εικόνα σε απλό κείμενο σε λίγα λεπτά. +draft: false +keywords: +- python ocr tutorial +- extract text image python +- ocr image to text +- load image for ocr +- convert image plain text +language: el +og_description: Το σεμινάριο Python OCR εξηγεί πώς να φορτώσετε εικόνα για OCR και + να μετατρέψετε το απλό κείμενο της εικόνας χρησιμοποιώντας το Aspose OCR Cloud. + Λάβετε τον πλήρη κώδικα και συμβουλές. +og_title: Python OCR Tutorial – Εξαγωγή κειμένου από εικόνες +tags: +- OCR +- Python +- Image Processing +title: Python OCR Tutorial – Εξαγωγή κειμένου από εικόνες +url: /el/python/general/python-ocr-tutorial-extract-text-from-images/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Python OCR Tutorial – Εξαγωγή κειμένου από εικόνες + +Έχετε αναρωτηθεί ποτέ πώς να μετατρέψετε μια ακατάστατη φωτογραφία από απόδειξη σε καθαρό, αναζητήσιμο κείμενο; Δεν είστε ο μόνος. Κατά την εμπειρία μου, το μεγαλύτερο εμπόδιο δεν είναι η μηχανή OCR, αλλά η μετατροπή της εικόνας στη σωστή μορφή και η εξαγωγή του απλού κειμένου χωρίς προβλήματα. 
+ +Αυτό το **python ocr tutorial** σας καθοδηγεί βήμα προς βήμα—φόρτωση εικόνας για OCR, εκτέλεση της αναγνώρισης, και τελικά μετατροπή του απλού κειμένου της εικόνας σε μια συμβολοσειρά Python που μπορείτε να αποθηκεύσετε ή να αναλύσετε. Στο τέλος θα μπορείτε να **extract text image python** με στυλ, και δεν θα χρειαστείτε καμία επί πληρωμή άδεια για να ξεκινήσετε. + +## Τι θα μάθετε + +- Πώς να εγκαταστήσετε και να εισάγετε το Aspose OCR Cloud SDK για Python. +- Ο ακριβής κώδικας για **load image for OCR** (PNG, JPEG, TIFF, PDF, κλπ). +- Πώς να καλέσετε τη μηχανή για να εκτελέσετε τη μετατροπή **ocr image to text**. +- Συμβουλές για τη διαχείριση κοινών edge‑cases όπως PDF πολλαπλών σελίδων ή σάρωση χαμηλής ανάλυσης. +- Τρόποι επαλήθευσης του αποτελέσματος και τι να κάνετε αν το κείμενο εμφανίζεται παραμορφωμένο. + +### Προαπαιτούμενα + +- Python 3.8+ εγκατεστημένο στο μηχάνημά σας. +- Δωρεάν λογαριασμός Aspose Cloud (η δοκιμή λειτουργεί χωρίς άδεια). +- Βασική εξοικείωση με pip και εικονικά περιβάλλοντα—τίποτα περίπλοκο. + +> **Pro tip:** Αν ήδη χρησιμοποιείτε virtualenv, ενεργοποιήστε το τώρα. Διατηρεί τις εξαρτήσεις σας οργανωμένες και αποτρέπει συγκρούσεις εκδόσεων. + +![Στιγμιότυπο οθόνης του Python OCR tutorial που δείχνει το αναγνωρισμένο κείμενο](path/to/ocr_example.png "Python OCR tutorial – εμφάνιση εξαγόμενου απλού κειμένου") + +## Step 1 – Εγκατάσταση του Aspose OCR Cloud SDK + +Πρώτα απ' όλα, χρειαζόμαστε τη βιβλιοθήκη που επικοινωνεί με την υπηρεσία OCR της Aspose. Ανοίξτε ένα τερματικό και εκτελέστε: + +```bash +pip install asposeocrcloud +``` + +Αυτή η εντολή κατεβάζει το πιο πρόσφατο SDK (επί του παρόντος έκδοση 23.12). Το πακέτο περιλαμβάνει όλα όσα χρειάζεστε—δεν απαιτούνται πρόσθετες βιβλιοθήκες επεξεργασίας εικόνας. + +## Step 2 – Αρχικοποίηση της μηχανής OCR (Primary Keyword in Action) + +Τώρα που το SDK είναι έτοιμο, μπορούμε να εκκινήσουμε τη μηχανή **python ocr tutorial**. 
Ο κατασκευαστής δεν χρειάζεται κανένα κλειδί άδειας για τη δοκιμή, κάτι που απλοποιεί τη διαδικασία.
+
+```python
+import asposeocrcloud as ocr
+
+# Initialise the OCR engine – no licence needed for trial use
+ocr_engine = ocr.OcrEngine()
+```
+
+> **Why this matters:** Η αρχικοποίηση της μηχανής μόνο μία φορά διατηρεί τις επόμενες κλήσεις γρήγορες. Αν δημιουργείτε ξανά το αντικείμενο για κάθε εικόνα, θα σπαταλήσετε δικτυακές κλήσεις.
+
+## Step 3 – Φόρτωση εικόνας για OCR
+
+Εδώ είναι που λάμπει η λέξη-κλειδί **load image for OCR**. Η μέθοδος `Image.load` του SDK δέχεται διαδρομή αρχείου ή URL, και ανιχνεύει αυτόματα τη μορφή (PNG, JPEG, TIFF, PDF, κλπ). Ας φορτώσουμε ένα δείγμα απόδειξης:
+
+```python
+# Step 3: Load the input image (PNG, JPEG, TIFF, PDF …)
+input_image = ocr.Image.load(r"YOUR_DIRECTORY/receipt.png")
+```
+
+Αν εργάζεστε με PDF πολλαπλών σελίδων, απλώς δείξτε στο αρχείο PDF· το SDK θα αντιμετωπίσει κάθε σελίδα ως ξεχωριστή εικόνα εσωτερικά.
+
+## Step 4 – Εκτέλεση μετατροπής OCR εικόνας σε κείμενο
+
+Με την εικόνα στη μνήμη, η πραγματική OCR εκτελείται σε μία γραμμή. Η μέθοδος `recognize` επιστρέφει ένα αντικείμενο `OcrResult` που περιέχει το απλό κείμενο, τις βαθμολογίες εμπιστοσύνης, και ακόμη και τα πλαίσια περιορισμού εάν τα χρειαστείτε αργότερα.
+
+```python
+# Step 4: Perform OCR on the loaded image
+ocr_result = ocr_engine.recognize(input_image)
+```
+
+> **Edge case:** Για εικόνες χαμηλής ανάλυσης (κάτω από 300 dpi) ίσως θέλετε πρώτα να αυξήσετε το μέγεθος της εικόνας. Το SDK προσφέρει βοηθητικό `Resize`, αλλά για τις περισσότερες αποδείξεις η προεπιλογή λειτουργεί καλά.
+
+## Step 5 – Μετατροπή του απλού κειμένου εικόνας σε χρήσιμη συμβολοσειρά
+
+Το τελευταίο κομμάτι του παζλ είναι η εξαγωγή του απλού κειμένου από το αντικείμενο αποτελέσματος. Αυτό είναι το βήμα **convert image plain text** που μετατρέπει το blob OCR σε κάτι που μπορείτε να εκτυπώσετε, αποθηκεύσετε ή να το δώσετε σε άλλο σύστημα.
+ +```python +# Step 5: Output the recognised plain text +print("Plain OCR:") +print(ocr_result.text) +``` + +Όταν εκτελέσετε το script, θα πρέπει να δείτε κάτι όπως: + +``` +Plain OCR: +Starbucks Coffee +Date: 2026/03/27 +Total: $4.75 +Thank you! +``` + +Αυτή η έξοδος είναι τώρα μια κανονική συμβολοσειρά Python, έτοιμη για εξαγωγή CSV, εισαγωγή σε βάση δεδομένων ή επεξεργασία φυσικής γλώσσας. + +## Διαχείριση κοινών προβλημάτων + +### 1. Κενές ή θορυβώδεις εικόνες + +Αν το `ocr_result.text` επιστρέφει κενό, ελέγξτε ξανά την ποιότητα της εικόνας. Μια γρήγορη λύση είναι να προσθέσετε ένα βήμα προεπεξεργασίας: + +```python +# Simple preprocessing – convert to grayscale and increase contrast +processed = input_image.to_grayscale().adjust_contrast(1.2) +ocr_result = ocr_engine.recognize(processed) +``` + +### 2. PDF πολλαπλών σελίδων + +Όταν δίνετε ένα PDF, το `recognize` επιστρέφει αποτελέσματα για κάθε σελίδα. Επανάληψη μέσω αυτών ως εξής: + +```python +pdf_image = ocr.Image.load("document.pdf") +pages = pdf_image.pages # collection of page images + +for i, page in enumerate(pages, start=1): + result = ocr_engine.recognize(page) + print(f"--- Page {i} ---") + print(result.text) +``` + +### 3. Υποστήριξη γλώσσας + +Το Aspose OCR υποστηρίζει πάνω από 60 γλώσσες. Για να αλλάξετε τη γλώσσα, ορίστε την ιδιότητα `language` πριν καλέσετε το `recognize`: + +```python +ocr_engine.language = "fr" # French +ocr_result = ocr_engine.recognize(input_image) +``` + +## Πλήρες λειτουργικό παράδειγμα + +Συνδυάζοντας τα πάντα, εδώ είναι ένα πλήρες script έτοιμο για αντιγραφή‑επικόλληση που καλύπτει τα πάντα από την εγκατάσταση μέχρι τη διαχείριση edge‑case: + +```python +# -*- coding: utf-8 -*- +""" +Python OCR tutorial – complete example using Aspose OCR Cloud. +Demonstrates loading an image, performing OCR, and handling multi‑page PDFs. 
+""" + +import asposeocrcloud as ocr +import os + +def ocr_file(filepath: str, language: str = "en"): + """ + Perform OCR on a given file (image or PDF) and return plain text. + """ + # Initialise engine (trial licence) + engine = ocr.OcrEngine() + engine.language = language + + # Load the file – SDK auto‑detects format + image = ocr.Image.load(filepath) + + # If it's a PDF, iterate over pages + if image.is_pdf: + all_text = [] + for page in image.pages: + result = engine.recognize(page) + all_text.append(result.text) + return "\n".join(all_text) + + # Single‑image case + result = engine.recognize(image) + return result.text + + +if __name__ == "__main__": + # Example usage – replace with your own path + sample_path = os.path.join("YOUR_DIRECTORY", "receipt.png") + + if not os.path.exists(sample_path): + raise FileNotFoundError(f"File not found: {sample_path}") + + extracted = ocr_file(sample_path) + print("=== Extracted Text ===") + print(extracted) +``` + +Εκτελέστε το script (`python ocr_demo.py`) και θα δείτε την έξοδο **ocr image to text** απευθείας στην κονσόλα σας. + +## Ανακεφαλαίωση – Τι καλύψαμε + +- Εγκαταστήσαμε το SDK **Aspose OCR Cloud** (`pip install asposeocrcloud`). +- **Initialised the OCR engine** χωρίς άδεια (τέλειο για δοκιμή). +- Δείξαμε πώς να **load image for OCR**, είτε πρόκειται για PNG, JPEG ή PDF. +- Εκτελέσαμε τη μετατροπή **ocr image to text** και **converted image plain text** σε μια χρήσιμη συμβολοσειρά Python. +- Αντιμετωπίσαμε κοινά προβλήματα όπως σάρωση χαμηλής ανάλυσης, PDF πολλαπλών σελίδων και επιλογή γλώσσας. + +## Επόμενα βήματα & Σχετικά θέματα + +Τώρα που έχετε κατακτήσει το **python ocr tutorial**, σκεφτείτε να εξερευνήσετε: + +- **Extract text image python** για επεξεργασία παρτίδας μεγάλων φακέλων αποδείξεων. +- Ενσωμάτωση της εξόδου OCR με **pandas** για ανάλυση δεδομένων (`df = pd.read_csv(StringIO(extracted))`). +- Χρήση του **Tesseract OCR** ως εναλλακτική όταν η σύνδεση στο διαδίκτυο είναι περιορισμένη. 
+- Προσθήκη post‑processing με **spaCy** για αναγνώριση οντοτήτων όπως ημερομηνίες, ποσά και ονόματα εμπόρων. + +Μη διστάσετε να πειραματιστείτε: δοκιμάστε διαφορετικές μορφές εικόνας, ρυθμίστε την αντίθεση ή αλλάξτε γλώσσες. Το πεδίο του OCR είναι ευρύ, και οι δεξιότητες που μόλις αποκτήσατε αποτελούν μια ισχυρή βάση για οποιοδήποτε έργο αυτοματοποίησης εγγράφων. + +Καλό προγραμματισμό, και εύχομαι το κείμενό σας πάντα να είναι αναγνώσιμο! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/greek/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md b/ocr/greek/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md new file mode 100644 index 000000000..bc3eedc6f --- /dev/null +++ b/ocr/greek/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md @@ -0,0 +1,221 @@ +--- +category: general +date: 2026-03-28 +description: Μάθετε πώς να εκτελείτε OCR σε εικόνα, να κατεβάζετε αυτόματα το μοντέλο + Hugging Face, να καθαρίζετε το κείμενο OCR και να διαμορφώνετε το μοντέλο LLM σε + Python χρησιμοποιώντας το Aspose OCR Cloud. +draft: false +keywords: +- run OCR on image +- download hugging face model +- clean OCR text +- configure LLM model +language: el +og_description: Εκτελέστε OCR σε εικόνα και καθαρίστε το αποτέλεσμα χρησιμοποιώντας + ένα αυτόματα ληφθέν μοντέλο Hugging Face. Αυτός ο οδηγός δείχνει πώς να διαμορφώσετε + το μοντέλο LLM στην Python. 
+og_title: Εκτέλεση OCR σε εικόνα – Πλήρης οδηγός Aspose OCR Cloud +tags: +- OCR +- Python +- LLM +- HuggingFace +title: Εκτελέστε OCR σε εικόνα με το Aspose OCR Cloud – Πλήρης οδηγός βήμα‑προς‑βήμα +url: /el/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Εκτέλεση OCR σε εικόνα – Πλήρης οδηγός Aspose OCR Cloud + +Έχετε χρειαστεί ποτέ να τρέξετε OCR σε αρχεία εικόνας αλλά το ακατέργαστο αποτέλεσμα να φαίνεται σαν ακατάστατο μπέρδεμα; Από την εμπειρία μου το μεγαλύτερο πρόβλημα δεν είναι η αναγνώριση αυτή καθαυτή—είναι ο καθαρισμός. Ευτυχώς, το Aspose OCR Cloud σας επιτρέπει να συνδέσετε έναν LLM post‑processor που μπορεί να *καθαρίσει το κείμενο OCR* αυτόματα. Σε αυτό το tutorial θα περάσουμε από όλα όσα χρειάζεστε: από **τη λήψη ενός μοντέλου Hugging Face** μέχρι τη ρύθμιση του LLM, την εκτέλεση της μηχανής OCR και, τέλος, την τελική επεξεργασία του αποτελέσματος. + +Στο τέλος αυτού του οδηγού θα έχετε ένα έτοιμο‑για‑εκτέλεση script που: + +1. Κατεβάζει ένα συμπαγές μοντέλο Qwen 2.5 από το Hugging Face (αυτόματα για εσάς). +2. Ρυθμίζει το μοντέλο ώστε να τρέχει μέρος του δικτύου στην GPU και το υπόλοιπο στην CPU. +3. Εκτελεί τη μηχανή OCR σε μια εικόνα χειρόγραφης σημείωσης. +4. Χρησιμοποιεί το LLM για να καθαρίσει το αναγνωρισμένο κείμενο, παρέχοντάς σας έξοδο αναγνώσιμη από άνθρωπο. + +> **Προαπαιτούμενα** – Python 3.8+, πακέτο `asposeocrcloud`, GPU με τουλάχιστον 4 GB VRAM (προαιρετικό αλλά συνιστάται), και σύνδεση στο internet για την πρώτη λήψη του μοντέλου. + +--- + +## Τι θα χρειαστείτε + +- **Aspose OCR Cloud SDK** – εγκαταστήστε το με `pip install asposeocrcloud`. +- **Ένα δείγμα εικόνας** – π.χ., `handwritten_note.jpg` τοποθετημένο σε τοπικό φάκελο. 
+- **Υποστήριξη GPU** – αν διαθέτετε GPU με υποστήριξη CUDA, το script θα μεταφέρει 30 στρώματα· διαφορετικά θα επιστρέψει αυτόματα στην CPU. +- **Δικαίωμα εγγραφής** – το script αποθηκεύει την cache του μοντέλου στο `YOUR_DIRECTORY`; βεβαιωθείτε ότι ο φάκελος υπάρχει. + +--- + +## Βήμα 1 – Ρύθμιση του μοντέλου LLM (λήψη μοντέλου Hugging Face) + +Το πρώτο που κάνουμε είναι να πούμε στο Aspose AI από πού να πάρει το μοντέλο. Η κλάση `AsposeAIModelConfig` διαχειρίζεται την αυτόματη λήψη, την ποσοτικοποίηση και την κατανομή των στρωμάτων στην GPU. + +```python +import asposeocrcloud as ocr +from asposeocrcloud.ai import AsposeAI, AsposeAIModelConfig + +# ---------------------------------------------------------------------- +# Step 1: Model configuration – this will download the model if it’s missing +# ---------------------------------------------------------------------- +model_config = AsposeAIModelConfig( + allow_auto_download="true", # Enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", # Repo on Hugging Face + hugging_face_quantization="int8", # Small footprint, fast inference + gpu_layers=30, # 30 layers on GPU, rest on CPU + directory_model_path=r"YOUR_DIRECTORY" # Cache folder (optional) +) +``` + +**Γιατί είναι σημαντικό** – Η ποσοτικοποίηση σε `int8` μειώνει δραστικά τη χρήση μνήμης (≈ 4 GB vs 12 GB). Ο διαχωρισμός του μοντέλου μεταξύ GPU και CPU σας επιτρέπει να τρέξετε ένα LLM 3 δισεκατομμυρίων παραμέτρων ακόμα και σε μια μέτρια RTX 3060. Αν δεν έχετε GPU, ορίστε `gpu_layers=0` και το SDK θα κρατήσει τα πάντα στην CPU. + +> **Συμβουλή:** Η πρώτη εκτέλεση θα κατεβάσει ~ 1.5 GB, οπότε δώστε του λίγα λεπτά και μια σταθερή σύνδεση. + +--- + +## Βήμα 2 – Αρχικοποίηση της AI Μηχανής με τη Ρύθμιση του Μοντέλου + +Τώρα ξεκινάμε τη μηχανή Aspose AI και της δίνουμε τη ρύθμιση που μόλις δημιουργήσαμε. 
+
+```python
+# ----------------------------------------------------------------------
+# Step 2: Initialise the AI engine – pulls the model if needed
+# ----------------------------------------------------------------------
+ocr_ai = AsposeAI()
+ocr_ai.initialize(model_config) # This call blocks until the model is ready
+```
+
+**Τι συμβαίνει στο παρασκήνιο;** Το SDK ελέγχει το `directory_model_path` για υπάρχον μοντέλο. Αν βρει μια αντίστοιχη έκδοση, το φορτώνει αμέσως· διαφορετικά, κατεβάζει το αρχείο GGUF από το Hugging Face, το αποσυμπιέζει και προετοιμάζει τη γραμμή επεξεργασίας.
+
+---
+
+## Βήμα 3 – Δημιουργία της Μηχανής OCR και Σύνδεση του AI Post‑Processor
+
+Η μηχανή OCR κάνει τη βαριά δουλειά της αναγνώρισης χαρακτήρων. Συνδέοντας το `ocr_ai.run_postprocessor` ενεργοποιούμε τον αυτόματο **καθαρισμό του κειμένου OCR** αμέσως μετά την αναγνώριση.
+
+```python
+# ----------------------------------------------------------------------
+# Step 3: Build the OCR engine and bind the LLM post‑processor
+# ----------------------------------------------------------------------
+ocr_engine = ocr.OcrEngine()
+ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=None)
+```
+
+**Γιατί να χρησιμοποιήσετε post‑processor;** Το ακατέργαστο OCR συχνά περιλαμβάνει αλλαγές γραμμής στα λάθος σημεία, λανθασμένη στίξη ή τυχαία σύμβολα. Το LLM μπορεί να ξαναγράψει το αποτέλεσμα σε σωστές προτάσεις, να διορθώσει ορθογραφικά λάθη και ακόμη να συμπληρώσει ελλιπείς λέξεις—με άλλα λόγια, μετατρέπει ένα ακατέργαστο αποτέλεσμα σε γυαλισμένο κείμενο.
+
+---
+
+## Βήμα 4 – Εκτέλεση OCR σε Αρχείο Εικόνας
+
+Με όλα συνδεδεμένα, ήρθε η ώρα να τροφοδοτήσουμε μια εικόνα στη μηχανή.
+ +```python +# ---------------------------------------------------------------------- +# Step 4: Load the image and run OCR +# ---------------------------------------------------------------------- +input_image = ocr.Image.load(r"YOUR_DIRECTORY/handwritten_note.jpg") +raw_result = ocr_engine.recognize(input_image) # Returns an OcrResult object +``` + +**Ακραία περίπτωση:** Αν η εικόνα είναι μεγάλη (> 5 MP), ίσως θελήσετε να την μειώσετε πρώτα για να επιταχύνετε την επεξεργασία. Το SDK δέχεται αντικείμενο Pillow `Image`, οπότε μπορείτε να προεπεξεργαστείτε με `PIL.Image.thumbnail()` αν χρειαστεί. + +--- + +## Βήμα 5 – Αφήστε το AI να Καθαρίσει το Αναγνωρισμένο Κείμενο και Εμφανίστε Και τις Δύο Εκδόσεις + +Τέλος, καλούμε τον post‑processor που συνδέσαμε νωρίτερα. Αυτό το βήμα δείχνει τη διαφορά μεταξύ *πριν* και *μετά* τον καθαρισμό. + +```python +# ---------------------------------------------------------------------- +# Step 5: Clean the OCR output using the LLM and display both results +# ---------------------------------------------------------------------- +cleaned_result = ocr_engine.run_postprocessor(raw_result) + +print("=== Before AI ===") +print(raw_result.text) + +print("\n=== After AI ===") +print(cleaned_result.text) +``` + +### Αναμενόμενο Αποτέλεσμα + +``` +=== Before AI === +Th1s 1s a h@ndwr1tt3n n0te. It c0nta1ns m1st@k3s, l1n3 br3aks, & sp3c!@l ch@r@ct3rs. + +=== After AI === +This is a handwritten note. It contains mistakes, line breaks, and special characters. +``` + +Δείτε πώς το LLM: + +- Διόρθωσε κοινές λανθασμένες αναγνώσεις OCR (`Th1s` → `This`). +- Αφαίρεσε τυχαία σύμβολα (`&` → `and`). +- Κανονικοποίησε τις αλλαγές γραμμής σε σωστές προτάσεις. 
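Για να ελέγξετε γρήγορα πόσο άλλαξε το κείμενο ο post‑processor, μπορείτε να συγκρίνετε τις δύο εκδόσεις με τη βιβλιοθήκη `difflib` της standard library του Python. Το παρακάτω είναι ένα μικρό σκίτσο, ανεξάρτητο από το OCR SDK· τα δύο κείμενα είναι απλώς τα ενδεικτικά παραδείγματα από πάνω:

```python
import difflib

# Example strings taken from the before/after output above
raw_text = "Th1s 1s a h@ndwr1tt3n n0te."
cleaned_text = "This is a handwritten note."

matcher = difflib.SequenceMatcher(None, raw_text, cleaned_text)
print(f"Similarity: {matcher.ratio():.2f}")

# Show the character-level corrections the post-processor effectively made
for op, i1, i2, j1, j2 in matcher.get_opcodes():
    if op != "equal":
        print(op, repr(raw_text[i1:i2]), "->", repr(cleaned_text[j1:j2]))
```

Ένα χαμηλό similarity ratio σημαίνει ότι το LLM έκανε πολλές διορθώσεις· αυτό είναι χρήσιμο αν θέλετε να καταγράφετε ποιες εικόνες παράγουν «θορυβώδες» OCR.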
+
+---
+
+## 🎨 Οπτική Επισκόπηση (Ροή εργασίας Εκτέλεσης OCR σε εικόνα)
+
+![Ροή εργασίας εκτέλεσης OCR σε εικόνα](run_ocr_on_image_workflow.png "Διάγραμμα που δείχνει τη διαδικασία εκτέλεσης OCR σε εικόνα από τη λήψη του μοντέλου μέχρι το καθαρισμένο αποτέλεσμα")
+
+Το παραπάνω διάγραμμα συνοψίζει ολόκληρη τη διαδικασία: **λήψη μοντέλου Hugging Face → ρύθμιση LLM → αρχικοποίηση AI → μηχανή OCR → AI post‑processor → καθαρό κείμενο OCR**.
+
+---
+
+## Συχνές Ερωτήσεις & Επαγγελματικές Συμβουλές
+
+### Τι κάνω αν δεν έχω GPU;
+
+Ορίστε `gpu_layers=0` στο `AsposeAIModelConfig`. Το μοντέλο θα τρέξει εξ ολοκλήρου στην CPU, κάτι που είναι πιο αργό αλλά λειτουργικό. Μπορείτε επίσης να μεταβείτε σε μικρότερο μοντέλο (π.χ., `Qwen/Qwen2.5-1.5B-Instruct-GGUF`) για να διατηρήσετε λογικό χρόνο εκτέλεσης.
+
+### Πώς αλλάζω το μοντέλο αργότερα;
+
+Απλώς ενημερώστε το `hugging_face_repo_id` και ξανατρέξτε `ocr_ai.initialize(model_config)`. Το SDK θα εντοπίσει την αλλαγή έκδοσης, θα κατεβάσει το νέο μοντέλο και θα αντικαταστήσει τα αρχεία cache.
+
+### Μπορώ να προσαρμόσω το prompt του post‑processor;
+
+Ναι. Περνάτε ένα λεξικό στο `custom_settings` με κλειδί `prompt_template`. Για παράδειγμα:
+
+```python
+custom_prompt = {
+    "prompt_template": "Correct the following OCR text and keep line breaks:\n{ocr_text}"
+}
+ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=custom_prompt)
+```
+
+### Πρέπει να αποθηκεύσω το καθαρισμένο κείμενο σε αρχείο;
+
+Απολύτως.
Μετά τον καθαρισμό μπορείτε να γράψετε το αποτέλεσμα σε αρχείο `.txt` ή `.json` για περαιτέρω επεξεργασία:
+
+```python
+with open("cleaned_note.txt", "w", encoding="utf-8") as f:
+    f.write(cleaned_result.text)
+```
+
+---
+
+## Συμπέρασμα
+
+Σας δείξαμε πώς να **εκτελέσετε OCR σε αρχεία εικόνας** με το Aspose OCR Cloud, να **κατεβάσετε αυτόματα ένα μοντέλο Hugging Face**, να **ρυθμίσετε τις παραμέτρους του μοντέλου LLM** και, τέλος, να **καθαρίσετε το κείμενο OCR** χρησιμοποιώντας έναν ισχυρό LLM post‑processor. Η όλη διαδικασία χωράει σε ένα μόνο, εύκολο‑να‑τρέξει script Python και λειτουργεί τόσο σε μηχανές με GPU όσο και σε μηχανές μόνο με CPU.
+
+Αν αισθάνεστε άνετα με αυτή τη ροή, δοκιμάστε:
+
+- **Διάφορα LLM** – δοκιμάστε το `meta-llama/Meta-Llama-3-8B-Instruct-GGUF` για μεγαλύτερο παράθυρο συμφραζομένων.
+- **Επεξεργασία παρτίδας** – κάντε βρόχο πάνω από έναν φάκελο εικόνων και συγκεντρώστε τα καθαρισμένα αποτελέσματα σε CSV.
+- **Προσαρμοσμένα prompts** – προσαρμόστε το AI στον τομέα σας (νομικά έγγραφα, ιατρικές σημειώσεις κ.λπ.).
+
+Αλλάξτε την τιμή `gpu_layers`, αντικαταστήστε το μοντέλο ή προσθέστε το δικό σας prompt. Ο ουρανός είναι το όριο, και ο κώδικας που έχετε τώρα είναι η πλατφόρμα εκτόξευσης.
+
+Καλό προγραμματισμό, και οι έξοδοι OCR σας να είναι πάντα καθαροί!
🚀 + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hindi/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md b/ocr/hindi/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md new file mode 100644 index 000000000..17fad5e13 --- /dev/null +++ b/ocr/hindi/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md @@ -0,0 +1,225 @@ +--- +category: general +date: 2026-03-28 +description: छवियों में हस्तलिखित पाठ को पहचानने के लिए OCR का उपयोग कैसे करें। हस्तलिखित + पाठ निकालना, हस्तलिखित छवि को परिवर्तित करना, और तेज़ी से साफ़ परिणाम प्राप्त करना + सीखें। +draft: false +keywords: +- how to use OCR +- recognize handwritten text +- extract handwritten text +- handwritten note to text +- convert handwritten image +language: hi +og_description: हस्तलिखित पाठ को पहचानने के लिए OCR का उपयोग कैसे करें। यह ट्यूटोरियल + आपको चरण‑दर‑चरण दिखाता है कि कैसे छवियों से हस्तलिखित पाठ निकालें और परिष्कृत परिणाम + प्राप्त करें। +og_title: हस्तलिखित पाठ को पहचानने के लिए OCR का उपयोग कैसे करें – पूर्ण मार्गदर्शिका +tags: +- OCR +- Handwriting Recognition +- Python +title: हस्तलेखित पाठ को पहचानने के लिए OCR का उपयोग कैसे करें – पूर्ण गाइड +url: /hi/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# हाथ से लिखे टेक्स्ट को पहचानने के लिए OCR का उपयोग कैसे करें – पूर्ण गाइड + +हाथ से लिखे नोट्स के लिए OCR का उपयोग कैसे करें, यह सवाल कई डेवलपर्स पूछते हैं जब उन्हें स्केच, मीटिंग मिनट्स या त्वरित विचारों को डिजिटल रूप में बदलना होता है। इस गाइड में हम ठीक‑ठीक कदम‑दर‑कदम दिखाएंगे कि हाथ से लिखे टेक्स्ट को कैसे पहचाना जाए, उसे कैसे निकाला जाए, और हाथ से 
लिखी हुई इमेज को साफ़, सर्चेबल स्ट्रिंग्स में कैसे बदला जाए।
+
+अगर आपने कभी किराने की लिस्ट की फोटो देख कर सोचा हो, “क्या मैं इस हाथ से लिखी इमेज को बिना फिर से टाइप किए टेक्स्ट में बदल सकता हूँ?” – तो आप सही जगह पर हैं। अंत तक आपके पास एक रन‑रेडी स्क्रिप्ट होगी जो **हाथ से लिखे नोट को टेक्स्ट में** सेकंडों में बदल देगी।
+
+## आपको क्या चाहिए
+
+- Python 3.8+ (कोड किसी भी हालिया संस्करण के साथ काम करता है)
+- `ocr` लाइब्रेरी – इसे `pip install ocr-sdk` से इंस्टॉल करें (अपने प्रोवाइडर के पैकेज नाम से बदलें)
+- एक स्पष्ट तस्वीर हाथ से लिखे नोट की (`hand_note.png` उदाहरण में)
+- थोड़ी जिज्ञासा और एक कप कॉफ़ी ☕️ (वैकल्पिक लेकिन अनुशंसित)
+
+कोई भारी फ्रेमवर्क नहीं, कोई पेड क्लाउड API की नहीं – सिर्फ एक लोकल इंजन जो **हाथ से लिखे टेक्स्ट की पहचान** को बॉक्स से बाहर सपोर्ट करता है।
+
+## चरण 1 – OCR पैकेज इंस्टॉल करें और इम्पोर्ट करें
+
+सबसे पहले, सही पैकेज को अपनी मशीन पर लाएँ। टर्मिनल खोलें और चलाएँ:
+
+```bash
+pip install ocr-sdk
+```
+
+इंस्टॉलेशन समाप्त होने के बाद, अपने स्क्रिप्ट में मॉड्यूल को इम्पोर्ट करें:
+
+```python
+# Step 1: Import the OCR SDK
+import ocr
+```
+
+> **प्रो टिप:** अगर आप वर्चुअल एनवायरनमेंट इस्तेमाल कर रहे हैं, तो इंस्टॉल करने से पहले उसे एक्टिवेट करें। इससे आपका प्रोजेक्ट साफ़ रहेगा और वर्ज़न टकराव नहीं होगा।
+
+## चरण 2 – OCR इंजन बनाएं और हैंडराइटन मोड सक्षम करें
+
+अब हम असली काम पर आते हैं – **OCR का उपयोग कैसे करें**: हमें एक ऐसा इंजन इंस्टेंस चाहिए जो समझे कि हम प्रिंटेड फ़ॉन्ट की बजाय कर्सिव स्ट्रोक्स के साथ काम कर रहे हैं। नीचे दिया गया स्निपेट इंजन बनाता है और उसे हैंडराइटन मोड पर स्विच करता है:
+
+```python
+# Step 2: Initialize the OCR engine for handwritten text
+ocr_engine = ocr.OcrEngine()
+ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN
+```
+
+`recognition_mode` क्यों सेट करें?
क्योंकि अधिकांश OCR इंजन डिफ़ॉल्ट रूप से प्रिंटेड‑टेक्स्ट डिटेक्शन पर होते हैं, जो अक्सर व्यक्तिगत नोट के लूप्स और स्लैंट्स को छोड़ देते हैं। हैंडराइटन मोड को एनेबल करने से सटीकता में काफी सुधार आता है। + +## चरण 3 – वह इमेज लोड करें जिसे आप कन्वर्ट करना चाहते हैं (हैंडराइटन इमेज को कन्वर्ट करें) + +इमेज OCR काम की कच्ची सामग्री है। सुनिश्चित करें कि आपकी तस्वीर लॉसलेस फॉर्मेट (PNG बहुत अच्छा काम करता है) में सेव हो और टेक्स्ट पढ़ने योग्य हो। फिर इसे इस तरह लोड करें: + +```python +# Step 3: Load the handwritten image you want to convert +handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") +``` + +अगर इमेज आपके स्क्रिप्ट के साथ ही रखी है, तो आप `"hand_note.png"` का उपयोग कर सकते हैं, पूरे पाथ की जरूरत नहीं। + +> **अगर इमेज धुंधली है तो?** OCR इंजन को फीड करने से पहले OpenCV से प्री‑प्रोसेसिंग करें (जैसे `cv2.cvtColor` से ग्रेस्केल, `cv2.threshold` से कॉन्ट्रास्ट बढ़ाएँ)। + +## चरण 4 – पहचान इंजन चलाएँ और हाथ से लिखे टेक्स्ट को एक्सट्रैक्ट करें + +इंजन तैयार है और इमेज मेमोरी में है, अब हम अंततः **हाथ से लिखे टेक्स्ट को एक्सट्रैक्ट** कर सकते हैं। `recognize` मेथड एक रॉ रिज़ल्ट ऑब्जेक्ट रिटर्न करता है जिसमें टेक्स्ट और कॉन्फिडेंस स्कोर दोनों होते हैं। + +```python +# Step 4: Perform OCR and get the raw result +raw_result = ocr_engine.recognize(handwritten_image) +print("Raw OCR output:") +print(raw_result.text) +``` + +आम तौर पर रॉ आउटपुट में अनावश्यक लाइन ब्रेक या गलत पहचान वाले कैरेक्टर हो सकते हैं, ख़ासकर अगर हैंडराइटिंग गंदा हो। इसलिए अगला चरण मौजूद है। + +## चरण 5 – (वैकल्पिक) AI पोस्ट‑प्रोसेसर से आउटपुट को पॉलिश करें + +अधिकांश आधुनिक OCR SDKs में एक हल्का AI पोस्ट‑प्रोसेसर होता है जो स्पेसिंग को ठीक करता है, सामान्य OCR त्रुटियों को सुधारता है, और लाइन एंडिंग्स को सामान्य बनाता है। इसे चलाना बहुत आसान है: + +```python +# Step 5: Refine the raw OCR output (handwritten note to text) +polished_result = ocr_engine.run_postprocessor(raw_result) + +# Display the cleaned, readable text +print("\nPolished OCR output:") +print(polished_result.text) +``` 
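अगर आपके SDK में बिल्ट‑इन पोस्ट‑प्रोसेसर उपलब्ध न हो, तो स्टैंडर्ड लाइब्रेरी के `re` मॉड्यूल से एक बेसिक क्लीनअप खुद भी लिखा जा सकता है। यह सिर्फ एक अनुमानित स्केच है और किसी OCR SDK पर निर्भर नहीं करता:

```python
import re

def basic_clean(text: str) -> str:
    """Minimal OCR cleanup: join broken lines and collapse extra spaces."""
    # Replace single line breaks (not paragraph breaks) with a space
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces/tabs into a single space
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()

raw = "Today I went\nto the  market and\nbought apples."
print(basic_clean(raw))  # → Today I went to the market and bought apples.
```

ध्यान रहे कि यह सिर्फ व्हाइटस्पेस सामान्य करता है; “T0d@y” जैसी कैरेक्टर‑स्तर की गलतियाँ ठीक करने के लिए AI पोस्ट‑प्रोसेसर ही बेहतर विकल्प है।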
+ +अगर आप इस चरण को छोड़ देते हैं तो भी आपको उपयोगी टेक्स्ट मिलेगा, लेकिन **हाथ से लिखे नोट को टेक्स्ट में** कन्वर्ज़न थोड़ा कच्चा दिखेगा। पोस्ट‑प्रोसेसर खासकर उन नोट्स के लिए उपयोगी है जिनमें बुलेट पॉइंट्स या मिक्स्ड‑केस शब्द हों। + +## चरण 6 – परिणाम की जाँच करें और एज केस हैंडल करें + +पॉलिश्ड परिणाम को प्रिंट करने के बाद, दोबारा चेक करें कि सब कुछ सही दिख रहा है। यहाँ एक त्वरित sanity चेक है जिसे आप जोड़ सकते हैं: + +```python +# Step 6: Simple verification +if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") +else: + print("\n✅ OCR succeeded! You can now save or further process the text.") +``` + +**एज‑केस चेकलिस्ट** + +| स्थिति | क्या करें | +|-----------|------------| +| **बहुत कम कॉन्ट्रास्ट** | लोड करने से पहले `cv2.convertScaleAbs` से कॉन्ट्रास्ट बढ़ाएँ। | +| **एकाधिक भाषाएँ** | `ocr_engine.language = ["en", "es"]` सेट करें (या अपनी टार्गेट भाषाएँ)। | +| **बड़ी डॉक्यूमेंट्स** | मेमोरी स्पाइक से बचने के लिए पेजेज को बैच में प्रोसेस करें। | +| **स्पेशल सिंबल्स** | `ocr_engine.add_custom_words([...])` से कस्टम डिक्शनरी जोड़ें। | + +## विज़ुअल ओवरव्यू + +नीचे एक प्लेसहोल्डर इमेज है जो वर्कफ़्लो को दर्शाती है—फ़ोटो किए हुए नोट से लेकर साफ़ टेक्स्ट तक। अल्ट टेक्स्ट में मुख्य कीवर्ड है, जिससे इमेज SEO‑फ़्रेंडली बनती है। + +![हाथ से लिखे नोट इमेज पर OCR कैसे उपयोग करें](/images/handwritten_ocr_flow.png "हाथ से लिखे नोट इमेज पर OCR कैसे उपयोग करें") + +## पूर्ण, रन‑एबल स्क्रिप्ट + +सभी हिस्सों को मिलाकर, यहाँ पूरा, कॉपी‑एंड‑पेस्ट‑रेडी प्रोग्राम है: + +```python +# Complete script: Convert a handwritten image to clean text using OCR + +import ocr + +def main(): + # 1️⃣ Initialize OCR engine for handwritten recognition + ocr_engine = ocr.OcrEngine() + ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN + + # 2️⃣ Load the image containing the handwritten note + handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") + + # 3️⃣ Perform OCR to extract raw text + raw_result = 
ocr_engine.recognize(handwritten_image) + print("Raw OCR output:") + print(raw_result.text) + + # 4️⃣ (Optional) Run AI post‑processor for cleaner output + polished_result = ocr_engine.run_postprocessor(raw_result) + + # 5️⃣ Show the polished, readable text + print("\nPolished OCR output:") + print(polished_result.text) + + # 6️⃣ Simple sanity check + if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") + else: + print("\n✅ OCR succeeded! You can now save or further process the text.") + +if __name__ == "__main__": + main() +``` + +**अपेक्षित आउटपुट (उदाहरण)** + +``` +Raw OCR output: +T0d@y I w3nt to the market +and bought 5 aplpes, 2 bananas, +and a loaf of bread. + +Polished OCR output: +Today I went to the market and bought 5 apples, 2 bananas, and a loaf of bread. +``` + +ध्यान दें कि पोस्ट‑प्रोसेसर ने “T0d@y” टाइपो को ठीक किया और स्पेसिंग को सामान्य किया। + +## सामान्य गलतियाँ और प्रो टिप्स + +- **इमेज साइज मायने रखता है** – OCR इंजन आमतौर पर इनपुट साइज को 4 K × 4 K तक सीमित रखते हैं। बड़े फ़ोटो को पहले रिसाइज़ करें। +- **हैंडराइटिंग स्टाइल** – कर्सिव बनाम ब्लॉक लेटर सटीकता को प्रभावित कर सकते हैं। अगर आप स्रोत को नियंत्रित कर सकते हैं (जैसे डिजिटल पेन), तो बेहतर परिणाम के लिए ब्लॉक लेटर का उपयोग करें। +- **बैच प्रोसेसिंग** – अगर दर्जनों नोट्स हैं, तो स्क्रिप्ट को लूप में रैप करें और प्रत्येक रिज़ल्ट को CSV या SQLite DB में स्टोर करें। +- **मेमोरी लीक्स** – कुछ SDKs इंटरनल बफ़र्स रखते हैं; अगर स्लोडाउन महसूस हो तो `ocr_engine.dispose()` कॉल करें। + +## अगले कदम – साधारण OCR से आगे बढ़ें + +अब जब आप एक इमेज के लिए **OCR का उपयोग कैसे करें** में निपुण हो गए हैं, तो इन एक्सटेंशन पर विचार करें: + +1. **क्लाउड स्टोरेज के साथ इंटीग्रेट करें** – AWS S3 या Azure Blob से इमेज लाएँ, वही पाइपलाइन चलाएँ, और परिणाम वापस पुश करें। +2. **भाषा डिटेक्शन जोड़ें** – `ocr_engine.detect_language()` का उपयोग करके स्वचालित रूप से डिक्शनरी बदलें। +3. 
**NLP के साथ मिलाएँ** – क्लीन टेक्स्ट को spaCy या NLTK में फीड करके एंटिटीज़, डेट्स या एक्शन आइटम्स निकालें। +4. **REST एंडपॉइंट बनाएं** – स्क्रिप्ट को Flask या FastAPI में रैप करें ताकि अन्य सर्विसेज इमेज POST कर सकें और JSON‑एन्कोडेड टेक्स्ट प्राप्त कर सकें। + +इन सभी विचारों का मूल अभी भी **हाथ से लिखे टेक्स्ट को पहचानना**, **हाथ से लिखे टेक्स्ट को एक्सट्रैक्ट करना**, और **हैंडराइटन इमेज को कन्वर्ट करना** पर आधारित है—वही फ़्रेज़ जिन्हें आप आगे खोजेंगे। + +--- + +### TL;DR + +हमने आपको **OCR का उपयोग कैसे करें** दिखाया ताकि आप हाथ से लिखे टेक्स्ट को पहचान सकें, उसे एक्सट्रैक्ट कर सकें, और परिणाम को उपयोगी स्ट्रिंग में पॉलिश कर सकें। पूरा स्क्रिप्ट रन‑रेडी है, वर्कफ़्लो स्टेप‑बाय‑स्टेप समझाया गया है, और सामान्य एज केस के लिए चेकलिस्ट भी है। अगली मीटिंग नोट की फोटो लें, स्क्रिप्ट में डालें, और मशीन को टाइपिंग का काम करने दें। + +हैप्पी कोडिंग, और आपकी नोट्स हमेशा पढ़ने योग्य रहें! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hindi/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md b/ocr/hindi/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md new file mode 100644 index 000000000..134f069d2 --- /dev/null +++ b/ocr/hindi/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md @@ -0,0 +1,186 @@ +--- +category: general +date: 2026-03-28 +description: छवि पर OCR करें और बाउंडिंग बॉक्स निर्देशांक के साथ साफ़ टेक्स्ट प्राप्त + करें। सीखें कि OCR कैसे निकालें, OCR को साफ़ करें, और परिणाम चरण‑दर‑चरण प्रदर्शित + करें। +draft: false +keywords: +- perform OCR on image +- how to extract OCR +- how to clean OCR +- display bounding box coordinates +- OCR post‑processing +- OCR bounding boxes +language: hi +og_description: छवि पर OCR करें, आउटपुट को साफ़ करें, और संक्षिप्त ट्यूटोरियल में + 
बाउंडिंग बॉक्स के निर्देशांक दिखाएँ। +og_title: इमेज पर OCR करें – साफ़ परिणाम और बाउंडिंग बॉक्स +tags: +- OCR +- Computer Vision +- Python +title: छवि पर OCR करें – साफ परिणाम प्राप्त करें और बाउंडिंग बॉक्स निर्देशांक दिखाएँ +url: /hi/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# इमेज पर OCR करें – परिणाम साफ़ करें और बाउंडिंग बॉक्स कॉर्डिनेट्स दिखाएँ + +क्या आपको कभी **perform OCR on image** फ़ाइलों को प्रोसेस करना पड़ा, लेकिन लगातार गड़बड़ टेक्स्ट मिल रहा था और यह नहीं पता चल रहा था कि प्रत्येक शब्द तस्वीर में कहाँ स्थित है? आप अकेले नहीं हैं। कई प्रोजेक्ट्स—इनवॉइस डिजिटाइज़ेशन, रसीद स्कैनिंग, या साधारण टेक्स्ट एक्सट्रैक्शन—में कच्चा OCR आउटपुट सिर्फ पहला कदम होता है। अच्छी खबर? आप उस आउटपुट को साफ़ कर सकते हैं और तुरंत प्रत्येक क्षेत्र के बाउंडिंग बॉक्स कॉर्डिनेट्स देख सकते हैं, बिना बहुत सारा बायलरप्लेट कोड लिखे। + +इस गाइड में हम **OCR कैसे निकालें**, एक **how to clean OCR** पोस्ट‑प्रोसेसर चलाएँ, और अंत में **display bounding box coordinates** प्रत्येक साफ़ किए गए क्षेत्र के लिए दिखाएँगे। अंत तक आपके पास एक एकल, रन करने योग्य स्क्रिप्ट होगी जो धुंधली फोटो को साफ़, संरचित टेक्स्ट में बदल देगी, जो आगे की प्रोसेसिंग के लिए तैयार है। + +## आपको क्या चाहिए + +- Python 3.9+ (नीचे दिया गया सिंटैक्स 3.8 और उसके बाद के संस्करणों पर काम करता है) +- एक OCR इंजन जो `recognize(..., return_structured=True)` को सपोर्ट करता हो – उदाहरण के लिए, नीचे के स्निपेट में उपयोग किया गया काल्पनिक `engine` लाइब्रेरी। इसे Tesseract, EasyOCR, या किसी भी SDK से बदलें जो क्षेत्र डेटा रिटर्न करता हो। +- Python फ़ंक्शन और लूप्स की बेसिक समझ +- एक इमेज फ़ाइल जिसे आप स्कैन करना चाहते हैं (PNG, JPG, आदि) + +> **Pro tip:** यदि आप Tesseract इस्तेमाल कर रहे हैं, तो `pytesseract.image_to_data` फ़ंक्शन पहले से ही बाउंडिंग बॉक्स देता है। आप उसके परिणाम को एक छोटे एडेप्टर में रैप कर सकते हैं जो नीचे दिखाए गए 
`engine.recognize` API की नकल करता हो। + +--- + +![perform OCR on image example](image-placeholder.png "perform OCR on image example") + +*Alt text: इमेज पर OCR करने और बाउंडिंग बॉक्स कॉर्डिनेट्स को विज़ुअलाइज़ करने की प्रक्रिया दिखाने वाला डायग्राम* + +## चरण 1 – इमेज पर OCR करें और संरचित क्षेत्रों को प्राप्त करें + +सबसे पहले OCR इंजन को यह बताना है कि वह केवल प्लेन टेक्स्ट नहीं, बल्कि टेक्स्ट क्षेत्रों की एक संरचित सूची रिटर्न करे। इस सूची में कच्चा स्ट्रिंग और उसे घेरने वाला आयत (रेकटैंगल) शामिल होता है। + +```python +import engine # replace with your actual OCR library +from pathlib import Path + +# Load the image you want to process +image_path = Path("sample_invoice.jpg") +image = engine.load_image(image_path) + +# Step 1: Perform OCR and request a structured list of text regions +raw_result = engine.recognize(image, return_structured=True) +``` + +**यह क्यों महत्वपूर्ण है:** +जब आप केवल प्लेन टेक्स्ट माँगते हैं तो आप स्पैटियल कॉन्टेक्स्ट खो देते हैं। संरचित डेटा आपको बाद में **display bounding box coordinates** करने, टेक्स्ट को टेबल्स के साथ अलाइन करने, या सटीक लोकेशन को डाउनस्ट्रीम मॉडल को फीड करने की सुविधा देता है। + +## चरण 2 – पोस्ट‑प्रोसेसर के साथ OCR आउटपुट को साफ़ करें + +OCR इंजन अक्षरों को पहचानने में अच्छे होते हैं, लेकिन अक्सर वे अतिरिक्त स्पेसेस, लाइन‑ब्रेक आर्टिफैक्ट्स, या गलत पहचान वाले सिम्बॉल छोड़ देते हैं। एक पोस्ट‑प्रोसेसर टेक्स्ट को सामान्यीकृत करता है, सामान्य OCR त्रुटियों को ठीक करता है, और व्हाइटस्पेस को ट्रिम करता है। + +```python +# Step 2: Clean the plain‑text of each region using the post‑processor +processed_result = engine.run_postprocessor(raw_result) +``` + +यदि आप अपना खुद का क्लीनर बना रहे हैं, तो विचार करें: + +- नॉन‑ASCII कैरेक्टर्स हटाएँ (`re.sub(r'[^\x00-\x7F]+',' ', text)`) +- कई स्पेसेस को एक ही स्पेस में बदलें +- स्पष्ट टाइपो के लिए `pyspellchecker` जैसी स्पेल‑चेकर लागू करें + +**आपको क्यों परवाह करनी चाहिए:** +एक साफ़ स्ट्रिंग सर्चिंग, इंडेक्सिंग, और डाउनस्ट्रीम NLP पाइपलाइन को बहुत अधिक भरोसेमंद बनाती है। दूसरे 
शब्दों में, **how to clean OCR** अक्सर एक उपयोगी डेटासेट और सिरदर्द के बीच का अंतर होता है।
+
+## चरण 3 – प्रत्येक साफ़ किए गए क्षेत्र के लिए बाउंडिंग बॉक्स कॉर्डिनेट्स दिखाएँ
+
+अब जब टेक्स्ट साफ़ हो गया है, तो हम प्रत्येक क्षेत्र पर इटररेट करते हैं और उसका आयत तथा साफ़ किया हुआ स्ट्रिंग प्रिंट करते हैं। यही वह हिस्सा है जहाँ हम अंततः **display bounding box coordinates** करते हैं।
+
+```python
+# Step 3 – Iterate over the cleaned regions and display their bounding box and text
+for text_region in processed_result.regions:
+    # Each region has a .bounding_box attribute (x, y, width, height)
+    bbox = text_region.bounding_box
+    print(f"[{bbox}] {text_region.text}")
+```
+
+**नमूना आउटपुट**
+
+```
+[(34, 120, 210, 30)] Invoice #12345
+[(34, 160, 420, 28)] Date: 2026‑03‑01
+[(34, 200, 380, 28)] Total Amount: $1,254.00
+```
+
+अब आप इन कॉर्डिनेट्स को किसी ड्राइंग लाइब्रेरी (जैसे OpenCV) में फीड करके मूल इमेज पर बॉक्स ओवरले कर सकते हैं, या बाद में क्वेरीज़ के लिए उन्हें डेटाबेस में स्टोर कर सकते हैं।
+
+## पूर्ण, रन‑रेडी स्क्रिप्ट
+
+नीचे पूरा प्रोग्राम दिया गया है जो तीनों चरणों को जोड़ता है। प्लेसहोल्डर `engine` कॉल्स को अपने वास्तविक OCR SDK से बदलें।
+
+```python
+#!/usr/bin/env python3
+"""
+Perform OCR on image → clean results → display bounding box coordinates.
+Author: Your Name
+Date: 2026‑03‑28
+"""
+
+import engine  # <-- replace with your OCR library
+from pathlib import Path
+import sys
+
+def main(image_path: str):
+    # Load image
+    image = engine.load_image(Path(image_path))
+
+    # 1️⃣ Perform OCR and ask for structured output
+    raw_result = engine.recognize(image, return_structured=True)
+
+    # 2️⃣ Clean the raw text using the built‑in post‑processor
+    processed_result = engine.run_postprocessor(raw_result)
+
+    # 3️⃣ Show each region's bounding box and cleaned text
+    print("\n=== Cleaned OCR Regions ===")
+    for region in processed_result.regions:
+        bbox = region.bounding_box  # (x, y, w, h)
+        print(f"[{bbox}] {region.text}")
+
+if __name__ == "__main__":
+    if len(sys.argv) != 2:
+        print("Usage: python perform_ocr.py <image_path>")
+        sys.exit(1)
+    main(sys.argv[1])
+```
+
+### कैसे चलाएँ
+
+```bash
+python perform_ocr.py sample_invoice.jpg
+```
+
+आपको बाउंडिंग बॉक्स की सूची के साथ साफ़ किया हुआ टेक्स्ट दिखेगा, बिल्कुल ऊपर के नमूना आउटपुट की तरह।
+
+## अक्सर पूछे जाने वाले प्रश्न और किनारे के केस
+
+| प्रश्न | उत्तर |
+|----------|--------|
+| **यदि OCR इंजन `return_structured` को सपोर्ट नहीं करता तो क्या करें?** | एक हल्का रैपर लिखें जो इंजन के रॉ आउटपुट (आमतौर पर शब्दों की सूची और उनके कॉर्डिनेट्स) को `text` और `bounding_box` एट्रिब्यूट वाले ऑब्जेक्ट्स में बदल दे। |
+| **क्या मैं कॉन्फिडेंस स्कोर प्राप्त कर सकता हूँ?** | कई SDK प्रत्येक क्षेत्र के लिए कॉन्फिडेंस मेट्रिक एक्सपोज़ करते हैं। इसे प्रिंट स्टेटमेंट में जोड़ें: `print(f"[{bbox}] {region.text} (conf: {region.confidence:.2f})")`। |
+| **घुमाए हुए टेक्स्ट को कैसे हैंडल करें?** | `recognize` कॉल करने से पहले OpenCV के `cv2.minAreaRect` से इमेज को डेस्क्यू करें। |
+| **यदि मुझे आउटपुट JSON में चाहिए तो?** | `processed_result.regions` को `json.dumps([r.__dict__ for r in processed_result.regions], indent=2)` से सीरियलाइज़ करें। |
+| **क्या बॉक्स को विज़ुअलाइज़ करने का कोई तरीका है?** | लूप के अंदर OpenCV उपयोग करें: `cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0), 2)` और 
फिर `cv2.imwrite("annotated.jpg", img)`। | + +## निष्कर्ष + +आपने अभी **perform OCR on image**, कच्चे आउटपुट को साफ़ करना, और प्रत्येक क्षेत्र के लिए **display bounding box coordinates** करना सीख लिया है। तीन‑स्टेप फ्लो—recognize → post‑process → iterate—एक पुन: उपयोग योग्य पैटर्न है जिसे आप किसी भी Python प्रोजेक्ट में डाल सकते हैं जिसे भरोसेमंद टेक्स्ट एक्सट्रैक्शन चाहिए। + +### आगे क्या? + +- **विभिन्न OCR बैक‑एंड्स** (Tesseract, EasyOCR, Google Vision) को एक्सप्लोर करें और उनकी एक्यूरेसी की तुलना करें। +- **डेटाबेस के साथ इंटीग्रेट** करें ताकि क्षेत्र डेटा को सर्चेबल आर्काइव्स में स्टोर किया जा सके। +- **भाषा डिटेक्शन** जोड़ें ताकि प्रत्येक क्षेत्र को उपयुक्त स्पेल‑चेकर के माध्यम से रूट किया जा सके। +- **मूल इमेज पर बॉक्स ओवरले** करें ताकि विज़ुअल वेरिफिकेशन हो (ऊपर के OpenCV स्निपेट को देखें)। + +यदि आपको कोई अजीब व्यवहार मिलता है, तो याद रखें कि सबसे बड़ा फ़ायदा एक ठोस पोस्ट‑प्रोसेसिंग स्टेप से आता है; एक साफ़ स्ट्रिंग कच्चे कैरेक्टर डंप की तुलना में बहुत आसान होती है। + +हैप्पी कोडिंग, और आपके OCR पाइपलाइन हमेशा साफ़ रहें! 
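
एक छोटा परिशिष्ट: चरण 2 में सुझाए गए DIY क्लीनर (नॉन‑ASCII कैरेक्टर्स हटाना, कई स्पेसेस को एक में बदलना) का एक सेल्फ़‑कंटेन्ड स्केच नीचे दिया गया है। ध्यान दें कि `clean_ocr_text` हमारा काल्पनिक हेल्पर नाम है — यह किसी OCR SDK का हिस्सा नहीं, केवल आइडिया का उदाहरण है:

```python
import re

def clean_ocr_text(text: str) -> str:
    # Drop non-ASCII artifacts that OCR engines sometimes emit
    text = re.sub(r'[^\x00-\x7F]+', ' ', text)
    # Collapse runs of spaces and stray line breaks into a single space
    text = re.sub(r'\s+', ' ', text)
    return text.strip()

# Noisy region text → clean, searchable string
print(clean_ocr_text("Invoice\u00a0#12345\n\nTotal:   $1,254.00"))
# → Invoice #12345 Total: $1,254.00
```

स्पेल‑चेकिंग (जैसे `pyspellchecker`) को इस फ़ंक्शन के ऊपर एक अतिरिक्त परत के रूप में जोड़ा जा सकता है।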
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hindi/python/general/python-ocr-tutorial-extract-text-from-images/_index.md b/ocr/hindi/python/general/python-ocr-tutorial-extract-text-from-images/_index.md new file mode 100644 index 000000000..78f9b28f9 --- /dev/null +++ b/ocr/hindi/python/general/python-ocr-tutorial-extract-text-from-images/_index.md @@ -0,0 +1,233 @@ +--- +category: general +date: 2026-03-28 +description: Python OCR ट्यूटोरियल जो दिखाता है कि Aspose OCR क्लाउड के साथ Python + में इमेज से टेक्स्ट कैसे निकालें। OCR के लिए इमेज लोड करना सीखें और कुछ ही मिनटों + में इमेज को साधारण टेक्स्ट में बदलें। +draft: false +keywords: +- python ocr tutorial +- extract text image python +- ocr image to text +- load image for ocr +- convert image plain text +language: hi +og_description: Python OCR ट्यूटोरियल समझाता है कि OCR के लिए इमेज कैसे लोड करें और + Aspose OCR Cloud का उपयोग करके इमेज को साधारण टेक्स्ट में कैसे बदलें। पूरा कोड और + टिप्स प्राप्त करें। +og_title: पायथन OCR ट्यूटोरियल – छवियों से टेक्स्ट निकालें +tags: +- OCR +- Python +- Image Processing +title: Python OCR ट्यूटोरियल – छवियों से पाठ निकालें +url: /hi/python/general/python-ocr-tutorial-extract-text-from-images/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Python OCR Tutorial – छवियों से टेक्स्ट निकालें + +क्या आपने कभी सोचा है कि एक गंदा रसीद फोटो को साफ, खोज योग्य टेक्स्ट में कैसे बदला जाए? 
आप अकेले नहीं हैं। मेरे अनुभव में, सबसे बड़ी बाधा OCR इंजन नहीं बल्कि इमेज को सही फॉर्मेट में लाना और बिना किसी समस्या के प्लेन टेक्स्ट निकालना है। + +यह **python ocr tutorial** आपको हर कदम से गुज़राता है—OCR के लिए इमेज लोड करना, पहचान चलाना, और अंत में इमेज प्लेन टेक्स्ट को एक Python स्ट्रिंग में बदलना जिसे आप स्टोर या विश्लेषण कर सकते हैं। अंत तक आप **extract text image python** शैली में टेक्स्ट निकाल पाएँगे, और शुरू करने के लिए आपको किसी भी पेड लाइसेंस की जरूरत नहीं होगी। + +## आप क्या सीखेंगे + +- Python के लिए Aspose OCR Cloud SDK को इंस्टॉल और इम्पोर्ट करने का तरीका। +- **load image for OCR** (PNG, JPEG, TIFF, PDF, आदि) के लिए सटीक कोड। +- **ocr image to text** रूपांतरण करने के लिए इंजन को कॉल करने का तरीका। +- मल्टी‑पेज PDFs या लो‑रेज़ोल्यूशन स्कैन जैसे सामान्य एज‑केस को संभालने के टिप्स। +- आउटपुट को वेरिफाई करने के तरीके और अगर टेक्स्ट गड़बड़ दिखे तो क्या करना है। + +### पूर्वापेक्षाएँ + +- आपके मशीन पर Python 3.8+ इंस्टॉल होना चाहिए। +- एक फ्री Aspose Cloud अकाउंट (ट्रायल लाइसेंस के बिना भी काम करता है)। +- pip और वर्चुअल एनवायरनमेंट्स की बेसिक जानकारी—कुछ भी जटिल नहीं। + +> **Pro tip:** यदि आप पहले से ही virtualenv का उपयोग कर रहे हैं, तो इसे अभी एक्टिवेट करें। यह आपके डिपेंडेंसीज़ को व्यवस्थित रखता है और वर्ज़न टकराव से बचाता है। + +![Python OCR tutorial screenshot showing recognized text](path/to/ocr_example.png "Python OCR tutorial – extracted plain text display") + +## चरण 1 – Aspose OCR Cloud SDK इंस्टॉल करें + +सबसे पहले, हमें वह लाइब्रेरी चाहिए जो Aspose के OCR सर्विस से बात करती है। टर्मिनल खोलें और चलाएँ: + +```bash +pip install asposeocrcloud +``` + +यह एकल कमांड नवीनतम SDK (वर्तमान में संस्करण 23.12) को डाउनलोड करता है। पैकेज में आपको जो कुछ भी चाहिए वह शामिल है—कोई अतिरिक्त इमेज‑प्रोसेसिंग लाइब्रेरीज़ की जरूरत नहीं। + +## चरण 2 – OCR इंजन को इनिशियलाइज़ करें (Primary Keyword in Action) + +अब जब SDK तैयार है, हम **python ocr tutorial** इंजन को शुरू कर सकते हैं। कंस्ट्रक्टर को ट्रायल के लिए किसी लाइसेंस की आवश्यकता नहीं होती, जिससे सब कुछ सरल 
रहता है। + +```python +import asposeocrcloud as ocr + +# Initialise the OCR engine – no licence needed for trial use +ocr_engine = ocr.OcrEngine() +``` + +> **Why this matters:** इंजन को केवल एक बार इनिशियलाइज़ करने से बाद के कॉल तेज़ रहते हैं। यदि आप हर इमेज के लिए ऑब्जेक्ट को फिर से बनाते हैं तो नेटवर्क राउंड‑ट्रिप्स बर्बाद होंगे। + +## चरण 3 – OCR के लिए इमेज लोड करें + +यहीं पर **load image for OCR** कीवर्ड चमकता है। SDK की `Image.load` मेथड फ़ाइल पाथ या URL को स्वीकार करती है, और स्वचालित रूप से फॉर्मेट (PNG, JPEG, TIFF, PDF, आदि) का पता लगाती है। चलिए एक सैंपल रसीद लोड करते हैं: + +```python +# Step 3: Load the input image (PNG, JPEG, TIFF, PDF …) +input_image = ocr.Image.load(r"YOUR_DIRECTORY/receipt.png") +``` + +यदि आप मल्टी‑पेज PDF से निपट रहे हैं, तो बस PDF फ़ाइल को पॉइंट करें; SDK प्रत्येक पेज को आंतरिक रूप से अलग इमेज मान लेगा। + +## चरण 4 – OCR इमेज से टेक्स्ट रूपांतरण करें + +इमेज मेमोरी में होने पर, वास्तविक OCR एक ही लाइन में होता है। `recognize` मेथड एक `OcrResult` ऑब्जेक्ट लौटाता है जिसमें प्लेन टेक्स्ट, कॉन्फिडेंस स्कोर, और यदि बाद में जरूरत पड़े तो बाउंडिंग बॉक्स भी होते हैं। + +```python +# Step 4: Perform OCR on the loaded image +ocr_result = ocr_engine.recognize(input_image) +``` + +> **Edge case:** लो‑रेज़ोल्यूशन तस्वीरों (300 dpi से कम) के लिए आप पहले इमेज को अपस्केल करना चाह सकते हैं। SDK एक `Resize` हेल्पर प्रदान करता है, लेकिन अधिकांश रसीदों के लिए डिफ़ॉल्ट ठीक काम करता है। + +## चरण 5 – इमेज प्लेन टेक्स्ट को उपयोगी स्ट्रिंग में बदलें + +पज़ल का अंतिम हिस्सा है रिज़ल्ट ऑब्जेक्ट से प्लेन टेक्स्ट निकालना। यह **convert image plain text** चरण है जो OCR ब्लॉब को ऐसी चीज़ में बदलता है जिसे आप प्रिंट, स्टोर या किसी अन्य सिस्टम में फीड कर सकते हैं। + +```python +# Step 5: Output the recognised plain text +print("Plain OCR:") +print(ocr_result.text) +``` + +जब आप स्क्रिप्ट चलाएँगे, आपको कुछ इस तरह दिखेगा: + +``` +Plain OCR: +Starbucks Coffee +Date: 2026/03/27 +Total: $4.75 +Thank you! 
+``` + +यह आउटपुट अब एक सामान्य Python स्ट्रिंग है, CSV एक्सपोर्ट, डेटाबेस इन्सर्शन, या नेचुरल‑लैंग्वेज प्रोसेसिंग के लिए तैयार। + +## सामान्य समस्याओं का समाधान + +### 1. खाली या शोर वाली इमेजेज + +यदि `ocr_result.text` खाली आता है, तो इमेज क्वालिटी दोबारा जांचें। एक त्वरित समाधान है प्री‑प्रोसेसिंग स्टेप जोड़ना: + +```python +# Simple preprocessing – convert to grayscale and increase contrast +processed = input_image.to_grayscale().adjust_contrast(1.2) +ocr_result = ocr_engine.recognize(processed) +``` + +### 2. मल्टी‑पेज PDFs + +जब आप PDF फीड करते हैं, `recognize` प्रत्येक पेज के लिए रिज़ल्ट देता है। इसे इस तरह लूप करें: + +```python +pdf_image = ocr.Image.load("document.pdf") +pages = pdf_image.pages # collection of page images + +for i, page in enumerate(pages, start=1): + result = ocr_engine.recognize(page) + print(f"--- Page {i} ---") + print(result.text) +``` + +### 3. भाषा समर्थन + +Aspose OCR 60 से अधिक भाषाओं को सपोर्ट करता है। भाषा बदलने के लिए, `recognize` कॉल करने से पहले `language` प्रॉपर्टी सेट करें: + +```python +ocr_engine.language = "fr" # French +ocr_result = ocr_engine.recognize(input_image) +``` + +## पूर्ण कार्यशील उदाहरण + +सब कुछ एक साथ रखते हुए, यहाँ एक पूर्ण, कॉपी‑पेस्ट‑रेडी स्क्रिप्ट है जो इंस्टॉलेशन से लेकर एज‑केस हैंडलिंग तक सब कुछ कवर करती है: + +```python +# -*- coding: utf-8 -*- +""" +Python OCR tutorial – complete example using Aspose OCR Cloud. +Demonstrates loading an image, performing OCR, and handling multi‑page PDFs. +""" + +import asposeocrcloud as ocr +import os + +def ocr_file(filepath: str, language: str = "en"): + """ + Perform OCR on a given file (image or PDF) and return plain text. 
+ """ + # Initialise engine (trial licence) + engine = ocr.OcrEngine() + engine.language = language + + # Load the file – SDK auto‑detects format + image = ocr.Image.load(filepath) + + # If it's a PDF, iterate over pages + if image.is_pdf: + all_text = [] + for page in image.pages: + result = engine.recognize(page) + all_text.append(result.text) + return "\n".join(all_text) + + # Single‑image case + result = engine.recognize(image) + return result.text + + +if __name__ == "__main__": + # Example usage – replace with your own path + sample_path = os.path.join("YOUR_DIRECTORY", "receipt.png") + + if not os.path.exists(sample_path): + raise FileNotFoundError(f"File not found: {sample_path}") + + extracted = ocr_file(sample_path) + print("=== Extracted Text ===") + print(extracted) +``` + +स्क्रिप्ट चलाएँ (`python ocr_demo.py`) और आपको **ocr image to text** आउटपुट सीधे आपके कंसोल में दिखेगा। + +## पुनरावलोकन – हमने क्या कवर किया + +- **Aspose OCR Cloud** SDK इंस्टॉल किया (`pip install asposeocrcloud`). +- लाइसेंस के बिना **OCR इंजन को इनिशियलाइज़ किया** (ट्रायल के लिए परफेक्ट)। +- दिखाया कि **load image for OCR** कैसे करें, चाहे वह PNG, JPEG, या PDF हो। +- **ocr image to text** रूपांतरण किया और **converted image plain text** को उपयोगी Python स्ट्रिंग में बदला। +- लो‑रेज़ोल्यूशन स्कैन, मल्टी‑पेज PDFs, और भाषा चयन जैसी सामान्य समस्याओं को हल किया। + +## अगले कदम और संबंधित विषय + +अब जब आप **python ocr tutorial** में निपुण हो गए हैं, तो विचार करें: + +- **Extract text image python** का उपयोग बड़े रसीद फ़ोल्डर्स की बैच प्रोसेसिंग के लिए करें। +- OCR आउटपुट को **pandas** के साथ इंटीग्रेट करें डेटा एनालिसिस के लिए (`df = pd.read_csv(StringIO(extracted))`). 
+- जब इंटरनेट कनेक्टिविटी सीमित हो तो **Tesseract OCR** को फॉलबैक के रूप में उपयोग करें। +- **spaCy** के साथ पोस्ट‑प्रोसेसिंग जोड़ें ताकि डेट, अमाउंट, और मर्चेंट नाम जैसी एंटिटीज़ पहचानी जा सकें। + +बिना हिचकिचाए प्रयोग करें: विभिन्न इमेज फॉर्मेट आज़माएँ, कंट्रास्ट समायोजित करें, या भाषाएँ बदलें। OCR का क्षेत्र व्यापक है, और आपने जो कौशल अभी सीखे हैं वे किसी भी डॉक्यूमेंट‑ऑटोमेशन प्रोजेक्ट के लिए एक मजबूत आधार हैं। + +कोडिंग का आनंद लें, और आपका टेक्स्ट हमेशा पढ़ने योग्य रहे! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hindi/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md b/ocr/hindi/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md new file mode 100644 index 000000000..587971c9f --- /dev/null +++ b/ocr/hindi/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md @@ -0,0 +1,203 @@ +--- +category: general +date: 2026-03-28 +description: छवि पर OCR चलाना सीखें, Hugging Face मॉडल को स्वचालित रूप से डाउनलोड + करें, OCR टेक्स्ट को साफ़ करें और Aspose OCR Cloud का उपयोग करके Python में LLM + मॉडल को कॉन्फ़िगर करें। +draft: false +keywords: +- run OCR on image +- download hugging face model +- clean OCR text +- configure LLM model +language: hi +og_description: छवि पर OCR चलाएँ और स्वचालित रूप से डाउनलोड किए गए Hugging Face मॉडल + का उपयोग करके आउटपुट को साफ़ करें। यह गाइड दिखाता है कि Python में LLM मॉडल को कैसे + कॉन्फ़िगर करें। +og_title: इमेज पर OCR चलाएँ – पूर्ण Aspose OCR क्लाउड ट्यूटोरियल +tags: +- OCR +- Python +- LLM +- HuggingFace +title: Aspose OCR क्लाउड के साथ छवि पर OCR चलाएँ – पूर्ण चरण‑दर‑चरण मार्गदर्शिका +url: /hi/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< 
blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# छवि पर OCR चलाएँ – पूर्ण Aspose OCR क्लाउड ट्यूटोरियल + +क्या आपको कभी छवि फ़ाइलों पर OCR चलाने की ज़रूरत पड़ी है लेकिन कच्चा आउटपुट एक गड़बड़ mess जैसा दिखता था? मेरे अनुभव में सबसे बड़ी समस्या पहचान नहीं है—यह सफ़ाई है। सौभाग्य से, Aspose OCR Cloud आपको एक LLM पोस्ट‑प्रोसेसर संलग्न करने देता है जो *OCR टेक्स्ट को* स्वचालित रूप से साफ़ कर सकता है। इस ट्यूटोरियल में हम सब कुछ कवर करेंगे: **Hugging Face मॉडल डाउनलोड करने** से लेकर LLM को कॉन्फ़िगर करने, OCR इंजन चलाने, और अंत में परिणाम को पॉलिश करने तक। + +इस गाइड के अंत तक आपके पास एक तैयार‑चलाने‑योग्य स्क्रिप्ट होगी जो: + +1. Hugging Face से एक कॉम्पैक्ट Qwen 2.5 मॉडल खींचती है (आपके लिए ऑटो‑डownload)। +2. मॉडल को GPU पर नेटवर्क का कुछ हिस्सा और शेष CPU पर चलाने के लिए कॉन्फ़िगर करती है। +3. हस्तलिखित नोट छवि पर OCR इंजन को निष्पादित करती है। +4. पहचाने गए टेक्स्ट को साफ़ करने के लिए LLM का उपयोग करती है, जिससे आपको मानव‑पठनीय आउटपुट मिलता है। + +> **Prerequisites** – Python 3.8+, `asposeocrcloud` पैकेज, कम से कम 4 GB VRAM वाला GPU (वैकल्पिक लेकिन अनुशंसित), और पहली मॉडल डाउनलोड के लिए इंटरनेट कनेक्शन। + +## आपको क्या चाहिए + +- **Aspose OCR Cloud SDK** – `pip install asposeocrcloud` के माध्यम से इंस्टॉल करें। +- **एक नमूना छवि** – उदाहरण के लिए, `handwritten_note.jpg` को स्थानीय फ़ोल्डर में रखें। +- **GPU समर्थन** – यदि आपके पास CUDA‑सक्षम GPU है, तो स्क्रिप्ट 30 लेयर्स को ऑफलोड करेगी; अन्यथा यह स्वचालित रूप से CPU पर फ़ॉल्बैक हो जाएगी। +- **लेखन अनुमति** – स्क्रिप्ट मॉडल को `YOUR_DIRECTORY` में कैश करती है; सुनिश्चित करें कि फ़ोल्डर मौजूद है। + +## चरण 1 – LLM मॉडल कॉन्फ़िगर करें (Hugging Face मॉडल डाउनलोड) + +पहले हम Aspose AI को बताते हैं कि मॉडल कहाँ से लाना है। `AsposeAIModelConfig` क्लास ऑटो‑डownload, क्वांटाइज़ेशन, और GPU लेयर अलोकेशन को संभालती है। + +```python +import asposeocrcloud as ocr +from asposeocrcloud.ai import AsposeAI, AsposeAIModelConfig + +# 
---------------------------------------------------------------------- +# Step 1: Model configuration – this will download the model if it’s missing +# ---------------------------------------------------------------------- +model_config = AsposeAIModelConfig( + allow_auto_download="true", # Enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", # Repo on Hugging Face + hugging_face_quantization="int8", # Small footprint, fast inference + gpu_layers=30, # 30 layers on GPU, rest on CPU + directory_model_path=r"YOUR_DIRECTORY" # Cache folder (optional) +) +``` + +**यह क्यों महत्वपूर्ण है** – `int8` में क्वांटाइज़ करने से मेमोरी उपयोग बहुत घट जाता है (≈ 4 GB बनाम 12 GB)। मॉडल को GPU और CPU में बाँटने से आप एक 3‑बिलियन‑पैरामीटर LLM को भी एक साधारण RTX 3060 पर चला सकते हैं। यदि आपके पास GPU नहीं है, तो `gpu_layers=0` सेट करें और SDK सब कुछ CPU पर रखेगा। + +> **Tip:** पहली रन में लगभग 1.5 GB डाउनलोड होगा, इसलिए कुछ मिनट और स्थिर कनेक्शन दें। + +## चरण 2 – मॉडल कॉन्फ़िगरेशन के साथ AI इंजन को इनिशियलाइज़ करें + +अब हम Aspose AI इंजन को स्पिन अप करते हैं और उसे वही कॉन्फ़िगरेशन देते हैं जो हमने अभी बनाई है। + +```python +# ---------------------------------------------------------------------- +# Step 2: Initialise the AI engine – pulls the model if needed +# ---------------------------------------------------------------------- +ocr_ai = AsposeAI() +ocr_ai.initialize(model_config) # This call blocks until the model is ready +``` + +**अंदर क्या हो रहा है?** SDK `directory_model_path` में मौजूदा मॉडल की जाँच करता है। यदि वह मिलती‑जुलती संस्करण पाता है तो तुरंत लोड करता है; अन्यथा Hugging Face से GGUF फ़ाइल डाउनलोड करता है, अनज़िप करता है, और इन्फ़रेंस पाइपलाइन तैयार करता है। + +## चरण 3 – OCR इंजन बनाएं और AI पोस्ट‑प्रोसेसर संलग्न करें + +OCR इंजन अक्षरों को पहचानने का भारी काम करता है। `ocr_ai.run_postprocessor` को संलग्न करके हम **clean OCR text** को स्वचालित रूप से पहचान के बाद सक्षम करते हैं। + +```python +# 
---------------------------------------------------------------------- +# Step 3: Build the OCR engine and bind the LLM post‑processor +# ---------------------------------------------------------------------- +ocr_engine = ocr.OcrEngine() +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=None) +``` + +**पोस्ट‑प्रोसेसर क्यों उपयोग करें?** कच्चा OCR अक्सर गलत स्थानों पर लाइन ब्रेक, गलत विराम चिह्न, या बेतरतीब प्रतीक शामिल करता है। LLM आउटपुट को सही वाक्यों में पुनर्लेखन कर सकता है, वर्तनी सुधार सकता है, और यहाँ‑तक कि गायब शब्दों का अनुमान लगा सकता है—अर्थात कच्चे डंप को पॉलिश्ड प्रोसेस में बदल देता है। + +## चरण 4 – छवि फ़ाइल पर OCR चलाएँ + +सब कुछ जोड़ने के बाद, अब छवि को इंजन में फीड करने का समय है। + +```python +# ---------------------------------------------------------------------- +# Step 4: Load the image and run OCR +# ---------------------------------------------------------------------- +input_image = ocr.Image.load(r"YOUR_DIRECTORY/handwritten_note.jpg") +raw_result = ocr_engine.recognize(input_image) # Returns an OcrResult object +``` + +**Edge case:** यदि छवि बड़ी है (> 5 MP), तो प्रोसेसिंग तेज़ करने के लिए पहले उसका आकार बदलना चाह सकते हैं। SDK Pillow `Image` ऑब्जेक्ट स्वीकार करता है, इसलिए आप आवश्यकता पड़ने पर `PIL.Image.thumbnail()` से प्री‑प्रोसेस कर सकते हैं। + +## चरण 5 – AI को पहचाने गए टेक्स्ट को साफ़ करने दें और दोनों संस्करण दिखाएँ + +अंत में हम पहले संलग्न किए गए पोस्ट‑प्रोसेसर को कॉल करते हैं। यह चरण *साफ़ करने से पहले* और *बाद* के अंतर को दर्शाता है। + +```python +# ---------------------------------------------------------------------- +# Step 5: Clean the OCR output using the LLM and display both results +# ---------------------------------------------------------------------- +cleaned_result = ocr_engine.run_postprocessor(raw_result) + +print("=== Before AI ===") +print(raw_result.text) + +print("\n=== After AI ===") +print(cleaned_result.text) +``` + +### अपेक्षित आउटपुट + +``` +=== Before AI === +Th1s 1s a 
h@ndwr1tt3n n0te. It c0nta1ns m1st@k3s, l1n3 br3aks, & sp3c!@l ch@r@ct3rs. + +=== After AI === +This is a handwritten note. It contains mistakes, line breaks, and special characters. +``` + +ध्यान दें कि LLM ने: + +- सामान्य OCR गलत‑पहचानें ठीक कीं (`Th1s` → `This`)। +- बेतरतीब प्रतीकों को हटाया (`&` → `and`)। +- लाइन ब्रेक को सही वाक्यों में सामान्यीकृत किया। + +## 🎨 विज़ुअल ओवरव्यू (Run OCR on image Workflow) + +![छवि पर OCR चलाने का वर्कफ़्लो](run_ocr_on_image_workflow.png "डायग्राम दिखाता है छवि पर OCR पाइपलाइन मॉडल डाउनलोड से लेकर साफ़ आउटपुट तक") + +ऊपर का डायग्राम पूरी पाइपलाइन का सारांश देता है: **Hugging Face मॉडल डाउनलोड → LLM कॉन्फ़िगर → AI इनिशियलाइज़ → OCR इंजन → AI पोस्ट‑प्रोसेसर → clean OCR text**। + +## सामान्य प्रश्न & प्रो टिप्स + +### अगर मेरे पास GPU नहीं है तो क्या करें? + +`AsposeAIModelConfig` में `gpu_layers=0` सेट करें। मॉडल पूरी तरह CPU पर चलेगा, जो धीमा होगा लेकिन फिर भी कार्यशील है। आप एक छोटे मॉडल (जैसे `Qwen/Qwen2.5-1.5B‑Instruct‑GGUF`) पर भी स्विच कर सकते हैं ताकि इन्फ़रेंस समय उचित रहे। + +### बाद में मॉडल कैसे बदलें? + +सिर्फ `hugging_face_repo_id` को अपडेट करें और `ocr_ai.initialize(model_config)` को फिर से चलाएँ। SDK संस्करण परिवर्तन का पता लगाएगा, नया मॉडल डाउनलोड करेगा, और कैश्ड फ़ाइलों को बदल देगा। + +### क्या मैं पोस्ट‑प्रोसेसर प्रॉम्प्ट को कस्टमाइज़ कर सकता हूँ? + +हाँ। `custom_settings` में एक डिक्शनरी पास करें जिसमें `prompt_template` कुंजी हो। उदाहरण के लिए: + +```python +custom_prompt = { + "prompt_template": "Correct the following OCR text and keep line breaks:\n{ocr_text}" +} +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=custom_prompt) +``` + +### क्या मुझे साफ़ किया हुआ टेक्स्ट फ़ाइल में स्टोर करना चाहिए? 
+ +बिल्कुल। साफ़ करने के बाद आप परिणाम को `.txt` या `.json` फ़ाइल में लिख सकते हैं ताकि डाउनस्ट्रीम प्रोसेसिंग हो सके: + +```python +with open("cleaned_note.txt", "w", encoding="utf-8") as f: + f.write(cleaned_result.text) +``` + +## निष्कर्ष + +हमने आपको दिखाया कि कैसे **छवि पर OCR चलाएँ** Aspose OCR Cloud के साथ, स्वचालित रूप से **Hugging Face मॉडल डाउनलोड करें**, कुशलता से **LLM मॉडल कॉन्फ़िगर करें**, और अंत में एक शक्तिशाली LLM पोस्ट‑प्रोसेसर का उपयोग करके **OCR टेक्स्ट साफ़ करें**। पूरा प्रोसेस एक ही आसान‑चलाने‑योग्य Python स्क्रिप्ट में फिट बैठता है और GPU‑सक्षम तथा CPU‑केवल दोनों मशीनों पर काम करता है। + +यदि आप इस पाइपलाइन में सहज हैं, तो प्रयोग करने पर विचार करें: + +- **विभिन्न LLMs** – बड़े कॉन्टेक्स्ट विंडो के लिए `meta-llama/Meta-Llama-3-8B‑Instruct‑GGUF` आज़माएँ। +- **बैच प्रोसेसिंग** – छवियों के फ़ोल्डर पर लूप चलाएँ और साफ़ किए गए परिणामों को CSV में एकत्रित करें। +- **कस्टम प्रॉम्प्ट्स** – AI को अपने डोमेन (कानूनी दस्तावेज़, मेडिकल नोट्स, आदि) के अनुसार ट्यून करें। + +`gpu_layers` मान को बदलने, मॉडल बदलने, या अपना प्रॉम्प्ट प्लग‑इन करने में स्वतंत्र महसूस करें। संभावनाएँ असीमित हैं, और आपके पास अभी जो कोड है वह लॉन्चपैड है। + +कोडिंग का आनंद लें, और आपका OCR आउटपुट हमेशा साफ़ रहे! 
🚀 + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hongkong/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md b/ocr/hongkong/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md new file mode 100644 index 000000000..9f70e41ff --- /dev/null +++ b/ocr/hongkong/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md @@ -0,0 +1,221 @@ +--- +category: general +date: 2026-03-28 +description: 如何使用 OCR 識別圖像中的手寫文字。學習提取手寫文字、轉換手寫圖像,快速獲得乾淨的結果。 +draft: false +keywords: +- how to use OCR +- recognize handwritten text +- extract handwritten text +- handwritten note to text +- convert handwritten image +language: zh-hant +og_description: 如何使用 OCR 識別手寫文字。本教學將一步一步示範如何從圖像中提取手寫文字,並獲得精緻的結果。 +og_title: 如何使用 OCR 識別手寫文字 – 完整指南 +tags: +- OCR +- Handwriting Recognition +- Python +title: 如何使用 OCR 識別手寫文字 – 完整指南 +url: /zh-hant/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# 如何使用 OCR 識別手寫文字 – 完整指南 + +如何使用 OCR 處理手寫筆記是許多開發者在需要數位化草圖、會議紀要或快速記錄想法時常問的問題。在本指南中,我們將逐步說明如何識別手寫文字、提取手寫文字,並將手寫圖像轉換為乾淨、可搜尋的字串。 + +如果你曾盯著一張雜貨清單的照片,心想「能不能把這張手寫圖像直接轉成文字,而不用重新打字?」——你來對地方了。完成後,你將擁有一個即時可執行的腳本,能在數秒內將 **handwritten note to text** 轉換完成。 + +## 你需要的條件 + +- Python 3.8+(此程式碼適用於任何較新的版本) +- `ocr` 函式庫 – 使用 `pip install ocr-sdk` 安裝(請替換為你的供應商套件名稱) +- 清晰的手寫筆記照片(範例中的 `hand_note.png`) +- 一點好奇心與一杯咖啡 ☕️(可選,但建議) + +不需要大型框架,也不需要付費雲端金鑰——只要一個本地引擎,即可直接支援 **handwritten recognition**。 + +## 步驟 1 – 安裝 OCR 套件並匯入 + +首先,先在你的機器上安裝正確的套件。打開終端機並執行: + +```bash +pip install ocr-sdk +``` + +安裝完成後,在腳本中匯入該模組: + +```python +# Step 1: Import the OCR SDK +import ocr +``` + +> **專業提示:** 
若你使用虛擬環境,請在安裝前先啟動它。這樣可保持專案整潔,避免版本衝突。 + +## 步驟 2 – 建立 OCR 引擎並啟用手寫模式 + +現在我們真正開始 **how to use OCR**——我們需要一個知道我們在處理手寫筆劃而非印刷字體的引擎實例。以下程式碼片段會建立引擎並切換至手寫模式: + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = ocr.OcrEngine() +ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN +``` + +為什麼要設定 `recognition_mode`?因為大多數 OCR 引擎預設只偵測印刷文字,往往會忽略個人筆記的迴圈與斜線。啟用手寫模式可大幅提升準確度。 + +## 步驟 3 – 載入欲轉換的圖像(Convert Handwritten Image) + +圖像是任何 OCR 工作的原始素材。確保你的照片以無損格式儲存(PNG 表現良好),且文字相對清晰。然後這樣載入: + +```python +# Step 3: Load the handwritten image you want to convert +handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") +``` + +如果圖像與腳本位於同一目錄,只需使用 `"hand_note.png"` 而不必寫完整路徑。 + +> **如果圖像模糊怎麼辦?** 嘗試使用 OpenCV 進行前處理(例如,使用 `cv2.cvtColor` 轉為灰階,`cv2.threshold` 提高對比度),再送入 OCR 引擎。 + +## 步驟 4 – 執行辨識引擎以提取手寫文字 + +引擎已就緒且圖像載入記憶體後,我們終於可以 **extract handwritten text**。`recognize` 方法會回傳一個原始結果物件,內含文字與信心分數。 + +```python +# Step 4: Perform OCR and get the raw result +raw_result = ocr_engine.recognize(handwritten_image) +print("Raw OCR output:") +print(raw_result.text) +``` + +典型的原始輸出可能會有多餘的換行或錯誤辨識的字元,特別是手寫字跡雜亂時。這也是下一步的原因所在。 + +## 步驟 5 – (可選)使用 AI 後處理器潤飾輸出 + +大多數現代 OCR SDK 內建輕量級 AI 後處理器,可清理間距、修正常見 OCR 錯誤,並正規化換行。執行方式非常簡單: + +```python +# Step 5: Refine the raw OCR output (handwritten note to text) +polished_result = ocr_engine.run_postprocessor(raw_result) + +# Display the cleaned, readable text +print("\nPolished OCR output:") +print(polished_result.text) +``` + +若略過此步驟仍能取得可用的文字,但 **handwritten note to text** 轉換的結果會稍顯粗糙。後處理器對於包含項目符號或混合大小寫字詞的筆記特別有用。 + +## 步驟 6 – 驗證結果並處理邊緣情況 + +印出潤飾後的結果後,請再次確認內容是否正確。以下是一個簡易的健全性檢查範例: + +```python +# Step 6: Simple verification +if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") +else: + print("\n✅ OCR succeeded! 
You can now save or further process the text.") +``` + +**邊緣情況檢查清單** + +| 情況 | 處理方式 | +|-----------|------------| +| **Very low contrast** | 在載入前使用 `cv2.convertScaleAbs` 提高對比度。 | +| **Multiple languages** | 設定 `ocr_engine.language = ["en", "es"]`(或你的目標語言)。 | +| **Large documents** | 分批處理頁面以避免記憶體激增。 | +| **Special symbols** | 透過 `ocr_engine.add_custom_words([...])` 新增自訂字典。 | + +## 視覺概覽 + +以下是一張示意圖,說明工作流程——從拍攝的筆記到乾淨的文字。alt 文字包含主要關鍵字,提升圖片 SEO 效果。 + +![如何在手寫筆記圖像上使用 OCR](/images/handwritten_ocr_flow.png "如何在手寫筆記圖像上使用 OCR") + +## 完整、可執行的腳本 + +將所有部件組合起來,以下是完整、可直接複製貼上的程式: + +```python +# Complete script: Convert a handwritten image to clean text using OCR + +import ocr + +def main(): + # 1️⃣ Initialize OCR engine for handwritten recognition + ocr_engine = ocr.OcrEngine() + ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN + + # 2️⃣ Load the image containing the handwritten note + handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") + + # 3️⃣ Perform OCR to extract raw text + raw_result = ocr_engine.recognize(handwritten_image) + print("Raw OCR output:") + print(raw_result.text) + + # 4️⃣ (Optional) Run AI post‑processor for cleaner output + polished_result = ocr_engine.run_postprocessor(raw_result) + + # 5️⃣ Show the polished, readable text + print("\nPolished OCR output:") + print(polished_result.text) + + # 6️⃣ Simple sanity check + if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") + else: + print("\n✅ OCR succeeded! You can now save or further process the text.") + +if __name__ == "__main__": + main() +``` + +**預期輸出(範例)** + +``` +Raw OCR output: +T0d@y I w3nt to the market +and bought 5 aplpes, 2 bananas, +and a loaf of bread. + +Polished OCR output: +Today I went to the market and bought 5 apples, 2 bananas, and a loaf of bread. 
+``` + +請注意,後處理器已修正 “T0d@y” 的錯字並正規化間距。 + +## 常見陷阱與專業提示 + +- **Image size matters** – OCR 引擎通常將輸入大小限制在 4 K × 4 K。請先縮小大型照片。 +- **Handwriting style** – 手寫體與印刷體會影響準確度。若能控制來源(例如使用數位筆),建議使用印刷體以獲得最佳效果。 +- **Batch processing** – 處理數十張筆記時,將腳本包在迴圈中,並將每筆結果儲存至 CSV 或 SQLite 資料庫。 +- **Memory leaks** – 某些 SDK 會保留內部緩衝區;若發現效能下降,請在完成後呼叫 `ocr_engine.dispose()`。 + +## 往後步驟 – 超越簡易 OCR + +既然你已掌握 **how to use OCR** 於單張圖像,請考慮以下擴充功能: + +1. **Integrate with cloud storage** – 從 AWS S3 或 Azure Blob 取得圖像,執行相同流程,並將結果回傳。 +2. **Add language detection** – 使用 `ocr_engine.detect_language()` 自動切換字典。 +3. **Combine with NLP** – 將清理過的文字輸入 spaCy 或 NLTK,以抽取實體、日期或待辦事項。 +4. **Create a REST endpoint** – 將腳本包裝於 Flask 或 FastAPI,讓其他服務能 POST 圖像並接收 JSON 編碼的文字。 + +所有這些想法仍圍繞著 **recognize handwritten text**、**extract handwritten text** 與 **convert handwritten image** 這三個核心概念——也是你接下來可能搜尋的關鍵詞。 + +--- + +### TL;DR + +我們示範了 **how to use OCR** 以識別手寫文字、提取文字,並將結果潤飾成可用的字串。完整腳本已備妥,工作流程逐步說明,且提供常見邊緣情況的檢查清單。拍下一張下次會議的筆記照片,放入腳本,即可讓機器代替你打字。 + +祝開發順利,願你的筆記永遠清晰可讀! 
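
在結束之前補充一段:上面「常見陷阱與專業提示」提到的批次處理(把每筆結果存到 CSV)可以這樣草擬。`recognize_stub` 是假設性的佔位函式,實際使用時請換成你的 OCR 引擎呼叫;重點在於迴圈、步驟 6 的健全性檢查與 CSV 輸出:

```python
import csv
import tempfile
from pathlib import Path

def recognize_stub(image_path: Path) -> str:
    # 假設性的佔位函式——實際請換成 ocr_engine.recognize(...) 的呼叫
    return f"text from {image_path.name}"

def batch_notes_to_csv(image_dir: Path, out_csv: Path) -> int:
    """Run the (stubbed) OCR over every PNG in a folder and save results to CSV."""
    rows = []
    for img in sorted(image_dir.glob("*.png")):
        text = recognize_stub(img)
        if text.strip():  # 步驟 6 的健全性檢查:略過空白結果
            rows.append((img.name, text))
    with out_csv.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "text"])
        writer.writerows(rows)
    return len(rows)

# Demo on a throwaway folder with two empty placeholder images
with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / "a.png").touch()
    (d / "b.png").touch()
    print(batch_notes_to_csv(d, d / "notes.csv"))  # → 2
```

之後只要把 CSV 換成 SQLite 插入,就能得到前文建議的可搜尋筆記資料庫。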
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hongkong/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md b/ocr/hongkong/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md new file mode 100644 index 000000000..6f086403c --- /dev/null +++ b/ocr/hongkong/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md @@ -0,0 +1,183 @@ +--- +category: general +date: 2026-03-28 +description: 對圖片執行光學字符辨識,取得帶有邊框座標的乾淨文字。學習如何提取 OCR、清理 OCR,並一步一步顯示結果。 +draft: false +keywords: +- perform OCR on image +- how to extract OCR +- how to clean OCR +- display bounding box coordinates +- OCR post‑processing +- OCR bounding boxes +language: zh-hant +og_description: 對圖像執行 OCR、清理輸出,並在簡潔教學中顯示邊框座標。 +og_title: 在圖像上執行 OCR – 清晰的結果與邊框 +tags: +- OCR +- Computer Vision +- Python +title: 對圖像執行 OCR – 清理結果並顯示邊框座標 +url: /zh-hant/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# 在圖像上執行 OCR – 清理結果並顯示邊界框座標 + +有沒有過想 **在圖像上執行 OCR** 卻得到一堆雜亂文字,且不清楚每個字在圖片中的位置?你並不孤單。無論是發票數位化、收據掃描,或是簡單的文字擷取,取得原始 OCR 輸出只是第一道關卡。好消息是,你可以清理這些輸出,並立即看到每個區域的邊界框座標,且不需要寫大量樣板程式碼。 + +在本指南中,我們將逐步說明 **如何擷取 OCR**、執行 **如何清理 OCR** 的後處理,最後 **顯示每個清理後區域的邊界框座標**。完成後,你將擁有一個可直接執行的腳本,能把模糊的照片轉換成整齊、結構化的文字,供後續處理使用。 + +## 你需要的環境 + +- Python 3.9+(以下語法在 3.8 及以上皆可執行) +- 支援 `recognize(..., return_structured=True)` 的 OCR 引擎——例如本文片段中使用的虛構 `engine` 函式庫。實際使用時可改成 Tesseract、EasyOCR 或任何會回傳區域資料的 SDK。 +- 基本的 Python 函式與迴圈概念 +- 一張想要掃描的圖像檔(PNG、JPG 等) + +> **專業小技巧:** 若使用 Tesseract,`pytesseract.image_to_data` 已經會回傳邊界框。只要把它的結果包裝成一個小型適配器,模擬下方 `engine.recognize` 的 API 即可。 + +--- + +![在圖像上執行 OCR 範例](image-placeholder.png 
"在圖像上執行 OCR 範例") + +*Alt text: 顯示如何在圖像上執行 OCR 並視覺化邊界框座標的示意圖* + +## 步驟 1 – 在圖像上執行 OCR 並取得結構化區域 + +首先,要請 OCR 引擎回傳的不僅是純文字,而是一個包含文字區域的結構化清單。此清單會同時提供原始字串與包圍它的矩形。 + +```python +import engine # replace with your actual OCR library +from pathlib import Path + +# Load the image you want to process +image_path = Path("sample_invoice.jpg") +image = engine.load_image(image_path) + +# Step 1: Perform OCR and request a structured list of text regions +raw_result = engine.recognize(image, return_structured=True) +``` + +**為什麼這很重要:** +只取得純文字會失去空間資訊。結構化資料讓你之後可以 **顯示邊界框座標**、將文字對齊到表格,或將精確位置傳給下游模型。 + +## 步驟 2 – 使用後處理器清理 OCR 輸出 + +OCR 引擎雖然擅長辨識字元,但常會留下多餘的空格、換行符或錯誤辨識的符號。後處理器會正規化文字、修正常見的 OCR 錯誤,並去除多餘的空白。 + +```python +# Step 2: Clean the plain‑text of each region using the post‑processor +processed_result = engine.run_postprocessor(raw_result) +``` + +如果你自行開發清理程式,可考慮以下做法: + +- 移除非 ASCII 字元(`re.sub(r'[^\x00-\x7F]+',' ', text)`) +- 將多個連續空格合併為單一空格 +- 使用 `pyspellchecker` 等拼字檢查工具修正明顯的錯別字 + +**為什麼你需要在意:** +乾淨的字串在搜尋、索引以及後續的 NLP 流程中會更可靠。換句話說,**如何清理 OCR** 常常是資料可用與頭痛之間的分水嶺。 + +## 步驟 3 – 為每個清理後的區域顯示邊界框座標 + +文字整理好之後,我們遍歷每個區域,印出其矩形與清理過的字串。這一步就是最終 **顯示邊界框座標** 的地方。 + +```python +# Step 3 – Iterate over the cleaned regions and display their bounding box and text +for text_region in processed_result.regions: + # Each region has a .bounding_box attribute (x, y, width, height) + bbox = text_region.bounding_box + print(f"[{bbox}] {text_region.text}") +``` + +**範例輸出** + +``` +[(34, 120, 210, 30)] Invoice #12345 +[(34, 160, 420, 28)] Date: 2026‑03‑01 +[(34, 200, 380, 28)] Total Amount: $1,254.00 +``` + +接著,你可以把這些座標傳給繪圖函式庫(例如 OpenCV)在原圖上疊加方框,或是存入資料庫以供日後查詢。 + +## 完整、可直接執行的腳本 + +以下程式碼把上述三個步驟整合成一個完整範例。只要把佔位的 `engine` 呼叫換成實際的 OCR SDK 即可。 + +```python +#!/usr/bin/env python3 +""" +Perform OCR on image → clean results → display bounding box coordinates. 
+Author: Your Name +Date: 2026‑03‑28 +""" + +import engine # <-- replace with your OCR library +from pathlib import Path +import sys + +def main(image_path: str): + # Load image + image = engine.load_image(Path(image_path)) + + # 1️⃣ Perform OCR and ask for structured output + raw_result = engine.recognize(image, return_structured=True) + + # 2️⃣ Clean the raw text using the built‑in post‑processor + processed_result = engine.run_postprocessor(raw_result) + + # 3️⃣ Show each region's bounding box and cleaned text + print("\n=== Cleaned OCR Regions ===") + for region in processed_result.regions: + bbox = region.bounding_box # (x, y, w, h) + print(f"[{bbox}] {region.text}") + +if __name__ == "__main__": + if len(sys.argv) != 2: + print("Usage: python perform_ocr.py ") + sys.exit(1) + main(sys.argv[1]) +``` + +### 如何執行 + +```bash +python perform_ocr.py sample_invoice.jpg +``` + +執行後,你應該會看到一串邊界框與清理後文字的列表,與上方範例輸出相同。 + +## 常見問題與特殊情況 + +| 問題 | 解答 | +|----------|--------| +| **如果 OCR 引擎不支援 `return_structured`,該怎麼辦?** | 寫一個薄層封裝器,將引擎的原始輸出(通常是帶座標的單字清單)轉換成具備 `text` 與 `bounding_box` 屬性的物件。 | +| **可以取得信心分數嗎?** | 許多 SDK 會提供每個區域的信心指標。只要在印出語句中加入:`print(f"[{bbox}] {region.text} (conf: {region.confidence:.2f})")` 即可。 | +| **如何處理旋轉文字?** | 在呼叫 `recognize` 前,先使用 OpenCV 的 `cv2.minAreaRect` 進行去斜處理。 | +| **如果需要 JSON 格式的輸出呢?** | 使用 `json.dumps([r.__dict__ for r in processed_result.regions], indent=2)` 序列化 `processed_result.regions`。 | +| **有沒有方法可視化這些方框?** | 在迴圈內使用 OpenCV:`cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0), 2)`,然後 `cv2.imwrite("annotated.jpg", img)`。 | + +## 結語 + +你剛剛學會了 **如何在圖像上執行 OCR**、清理原始輸出,並 **顯示每個區域的邊界框座標**。這套「辨識 → 後處理 → 迭代」的三步流程是一個可重複使用的模式,能輕鬆嵌入任何需要可靠文字擷取的 Python 專案。 + +### 接下來可以做什麼? + +- **探索不同的 OCR 後端**(Tesseract、EasyOCR、Google Vision)並比較準確度。 +- **結合資料庫**,將區域資料儲存起來,打造可搜尋的檔案庫。 +- **加入語言偵測**,讓每個區域自動走對應的拼字檢查流程。 +- **在原圖上疊加方框** 以進行視覺驗證(參考上方的 OpenCV 片段)。 + +若在實作過程中遇到怪異情況,請記得:最關鍵的勝利往往來自穩固的後處理步驟;乾淨的字串遠比一堆原始字符更易於操作。 + +祝程式開發順利,願你的 OCR 流程永遠保持整潔! 
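附帶一提,上方 FAQ 提到可以用 `json.dumps` 匯出區域資料。下面是一個可獨立執行的小示例;其中的 `Region` dataclass 只是假設性的示意結構,用來模擬 `processed_result.regions` 中物件的形狀,實際欄位名稱請以你使用的 OCR SDK 為準。

```python
import json
from dataclasses import dataclass, asdict

# 假設的區域結構,僅為示意;實際請以你的 OCR SDK 回傳的物件為準
@dataclass
class Region:
    text: str
    bounding_box: tuple  # (x, y, w, h)

def regions_to_json(regions):
    """將清理後的 OCR 區域序列化為 JSON 字串。"""
    return json.dumps([asdict(r) for r in regions], ensure_ascii=False, indent=2)

if __name__ == "__main__":
    demo = [
        Region("Invoice #12345", (34, 120, 210, 30)),
        Region("Date: 2026-03-01", (34, 160, 420, 28)),
    ]
    print(regions_to_json(demo))
```

把 `demo` 換成實際的 `processed_result.regions` 之後,輸出即可直接寫入 `.json` 檔或存進資料庫。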
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hongkong/python/general/python-ocr-tutorial-extract-text-from-images/_index.md b/ocr/hongkong/python/general/python-ocr-tutorial-extract-text-from-images/_index.md new file mode 100644 index 000000000..14a45eb91 --- /dev/null +++ b/ocr/hongkong/python/general/python-ocr-tutorial-extract-text-from-images/_index.md @@ -0,0 +1,229 @@ +--- +category: general +date: 2026-03-28 +description: Python OCR 教學示範如何使用 Aspose OCR Cloud 從圖像提取文字。學習載入圖像進行 OCR,並在數分鐘內將圖像轉換為純文字。 +draft: false +keywords: +- python ocr tutorial +- extract text image python +- ocr image to text +- load image for ocr +- convert image plain text +language: zh-hant +og_description: Python OCR 教學說明如何載入影像進行 OCR,並使用 Aspose OCR Cloud 將影像轉換為純文字。取得完整程式碼與技巧。 +og_title: Python OCR 教學 – 從圖片中提取文字 +tags: +- OCR +- Python +- Image Processing +title: Python OCR 教學 – 從圖像提取文字 +url: /zh-hant/python/general/python-ocr-tutorial-extract-text-from-images/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Python OCR 教學 – 從圖像提取文字 + +有沒有想過如何把一張凌亂的收據照片轉換成乾淨、可搜尋的文字?你並不是唯一有這個疑問的人。依我的經驗,最大障礙並非 OCR 引擎本身,而是將圖像轉成正確格式,並順利擷取純文字。 + +本 **python ocr tutorial** 會一步步帶領你——載入圖像以供 OCR、執行辨識,最後將圖像的純文字轉換成可儲存或分析的 Python 字串。完成後,你就能以 **extract text image python** 方式提取文字,且不需要任何付費授權即可開始。 + +## 你將學會 + +- 如何安裝並匯入 Aspose OCR Cloud SDK for Python。 +- 用於 **load image for OCR** 的完整程式碼(支援 PNG、JPEG、TIFF、PDF 等)。 +- 如何呼叫引擎執行 **ocr image to text** 轉換。 +- 處理常見邊緣案例的技巧,例如多頁 PDF 或低解析度掃描。 +- 驗證輸出的方法,以及當文字出現亂碼時的處理方式。 + +### 前置條件 + +- 在機器上已安裝 Python 3.8+。 +- 免費的 Aspose Cloud 帳號(試用版無需授權即可使用)。 +- 基本了解 pip 與虛擬環境——不需複雜設定。 + +> **Pro tip:** 如果你已在使用 virtualenv,現在就啟動它。這樣可以讓相依套件保持整潔,避免版本衝突。 + +![Python OCR 教學截圖,顯示已辨識文字](path/to/ocr_example.png "Python 
OCR 教學 – 提取的純文字顯示") + +## 步驟 1 – 安裝 Aspose OCR Cloud SDK + +首先,我們需要與 Aspose OCR 服務溝通的函式庫。打開終端機並執行: + +```bash +pip install asposeocrcloud +``` + +這條指令會下載最新的 SDK(目前為 23.12 版)。此套件已包含所有必需的元件,無需額外的影像處理函式庫。 + +## 步驟 2 – 初始化 OCR 引擎(主要關鍵字示範) + +SDK 準備好後,我們即可啟動 **python ocr tutorial** 引擎。建構子在試用版中不需要授權金鑰,使用上更為簡單。 + +```python +import asposeocrcloud as ocr + +# Initialise the OCR engine – no licence needed for trial use +ocr_engine = ocr.OcrEngine() +``` + +> **Why this matters:** 只初始化一次引擎即可保持後續呼叫的速度。若每張圖像都重新建立物件,會浪費網路往返次數。 + +## 步驟 3 – 載入圖像以供 OCR + +這裡正是 **load image for OCR** 關鍵字發揮作用的地方。SDK 的 `Image.load` 方法接受檔案路徑或 URL,並會自動偵測格式(PNG、JPEG、TIFF、PDF 等)。現在載入一張範例收據: + +```python +# Step 3: Load the input image (PNG, JPEG, TIFF, PDF …) +input_image = ocr.Image.load(r"YOUR_DIRECTORY/receipt.png") +``` + +如果處理的是多頁 PDF,只需指向該 PDF 檔案;SDK 會在內部將每一頁視為獨立圖像。 + +## 步驟 4 – 執行 OCR 圖像轉文字轉換 + +圖像已載入記憶體後,實際的 OCR 只需一行程式碼。`recognize` 方法會回傳一個 `OcrResult` 物件,內含純文字、信心分數,甚至在之後需要時的邊界框資訊。 + +```python +# Step 4: Perform OCR on the loaded image +ocr_result = ocr_engine.recognize(input_image) +``` + +> **Edge case:** 若圖片解析度低(低於 300 dpi),可能需要先放大圖像。SDK 提供 `Resize` 輔助工具,但對大多數收據而言,預設設定已足夠。 + +## 步驟 5 – 將圖像純文字轉換為可用字串 + +最後一步是從結果物件中擷取純文字。這就是 **convert image plain text** 步驟,將 OCR 產出的資料轉成可列印、儲存或輸入其他系統的字串。 + +```python +# Step 5: Output the recognised plain text +print("Plain OCR:") +print(ocr_result.text) +``` + +執行腳本後,應會看到類似以下的輸出: + +``` +Plain OCR: +Starbucks Coffee +Date: 2026/03/27 +Total: $4.75 +Thank you! +``` + +該輸出現在是一個普通的 Python 字串,可用於 CSV 匯出、資料庫寫入或自然語言處理。 + +## 處理常見問題 + +### 1. 空白或雜訊圖像 + +如果 `ocr_result.text` 為空,請再次檢查圖像品質。快速解決方法是加入前處理步驟: + +```python +# Simple preprocessing – convert to grayscale and increase contrast +processed = input_image.to_grayscale().adjust_contrast(1.2) +ocr_result = ocr_engine.recognize(processed) +``` + +### 2. 
多頁 PDF + +當輸入 PDF 時,`recognize` 會回傳每頁的結果。可使用以下方式迴圈處理: + +```python +pdf_image = ocr.Image.load("document.pdf") +pages = pdf_image.pages # collection of page images + +for i, page in enumerate(pages, start=1): + result = ocr_engine.recognize(page) + print(f"--- Page {i} ---") + print(result.text) +``` + +### 3. 語言支援 + +Aspose OCR 支援超過 60 種語言。要切換語言,只需在呼叫 `recognize` 前設定 `language` 屬性: + +```python +ocr_engine.language = "fr" # French +ocr_result = ocr_engine.recognize(input_image) +``` + +## 完整範例程式 + +將上述步驟整合起來,以下是一個完整、可直接複製貼上的腳本,涵蓋從安裝到邊緣案例的處理: + +```python +# -*- coding: utf-8 -*- +""" +Python OCR tutorial – complete example using Aspose OCR Cloud. +Demonstrates loading an image, performing OCR, and handling multi‑page PDFs. +""" + +import asposeocrcloud as ocr +import os + +def ocr_file(filepath: str, language: str = "en"): + """ + Perform OCR on a given file (image or PDF) and return plain text. + """ + # Initialise engine (trial licence) + engine = ocr.OcrEngine() + engine.language = language + + # Load the file – SDK auto‑detects format + image = ocr.Image.load(filepath) + + # If it's a PDF, iterate over pages + if image.is_pdf: + all_text = [] + for page in image.pages: + result = engine.recognize(page) + all_text.append(result.text) + return "\n".join(all_text) + + # Single‑image case + result = engine.recognize(image) + return result.text + + +if __name__ == "__main__": + # Example usage – replace with your own path + sample_path = os.path.join("YOUR_DIRECTORY", "receipt.png") + + if not os.path.exists(sample_path): + raise FileNotFoundError(f"File not found: {sample_path}") + + extracted = ocr_file(sample_path) + print("=== Extracted Text ===") + print(extracted) +``` + +執行腳本(`python ocr_demo.py`),即可在終端機看到 **ocr image to text** 的輸出結果。 + +## 重點回顧 – 我們學了什麼 + +- 安裝 **Aspose OCR Cloud** SDK(`pip install asposeocrcloud`)。 +- **Initialised the OCR engine** 無需授權(適合試用)。 +- 示範如何 **load image for OCR**,不論是 PNG、JPEG 或 PDF。 +- 執行 **ocr image to text** 轉換,並 **converted 
image plain text** 成可用的 Python 字串。 +- 解決常見問題,如低解析度掃描、多頁 PDF 與語言選擇。 + +## 往後步驟與相關主題 + +既然你已熟悉 **python ocr tutorial**,可以進一步探索: + +- **Extract text image python** 用於批次處理大量收據資料夾。 +- 結合 **pandas** 進行資料分析,將 OCR 輸出匯入(`df = pd.read_csv(StringIO(extracted))`)。 +- 在網路不佳時,以 **Tesseract OCR** 作為備援方案。 +- 使用 **spaCy** 進行後處理,辨識日期、金額、商家名稱等實體。 + +歡迎自行實驗:嘗試不同的圖像格式、調整對比度或切換語言。OCR 的應用範圍相當廣泛,你剛學會的技能是任何文件自動化專案的堅實基礎。 + +祝程式開發順利,願你的文字永遠清晰可讀! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hongkong/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md b/ocr/hongkong/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md new file mode 100644 index 000000000..7c7b8d536 --- /dev/null +++ b/ocr/hongkong/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md @@ -0,0 +1,218 @@ +--- +category: general +date: 2026-03-28 +description: 學習如何在圖像上執行 OCR、自動下載 Hugging Face 模型、清理 OCR 文字,並使用 Aspose OCR Cloud 在 + Python 中配置 LLM 模型。 +draft: false +keywords: +- run OCR on image +- download hugging face model +- clean OCR text +- configure LLM model +language: zh-hant +og_description: 在圖像上執行 OCR,並使用自動下載的 Hugging Face 模型清理輸出。本指南說明如何在 Python 中設定 LLM 模型。 +og_title: 在圖像上執行 OCR – 完整的 Aspose OCR 雲端教學 +tags: +- OCR +- Python +- LLM +- HuggingFace +title: 使用 Aspose OCR Cloud 於圖片執行 OCR – 完整逐步指南 +url: /zh-hant/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# 在圖像上執行 OCR – 完整 Aspose OCR Cloud 教程 + +有沒有遇過要在圖像檔案上執行 OCR,但原始輸出卻像是一團亂碼?依我之見,最大的痛點其實不是辨識本身,而是後續的清理。幸好,Aspose OCR Cloud 允許你掛接一個 LLM 後處理器,能自動 *清理 OCR 文字*。在本教學中,我們會一步步帶你完成:從 **下載 Hugging Face 模型**、設定 LLM、執行 OCR 
引擎,到最後拋光結果。 + +完成本指南後,你將擁有一個即時可執行的腳本,具備以下功能: + +1. 從 Hugging Face 取得緊湊的 Qwen 2.5 模型(會自動下載)。 +2. 設定模型在 GPU 上執行部分網路層,剩餘部分在 CPU 上執行。 +3. 在手寫筆記圖像上執行 OCR 引擎。 +4. 使用 LLM 清理辨識出的文字,產生人類可讀的輸出。 + +> **先決條件** – Python 3.8+、`asposeocrcloud` 套件、具備至少 4 GB VRAM 的 GPU(非必要但建議),以及首次下載模型時需要的網路連線。 + +--- + +## 你需要的東西 + +- **Aspose OCR Cloud SDK** – 透過 `pip install asposeocrcloud` 安裝。 +- **範例圖像** – 例如 `handwritten_note.jpg`,放在本機資料夾內。 +- **GPU 支援** – 若有支援 CUDA 的 GPU,腳本會將 30 個層級卸載至 GPU;否則會自動回退至 CPU。 +- **寫入權限** – 腳本會將模型快取於 `YOUR_DIRECTORY`,請確保該資料夾已存在。 + +--- + +## 第一步 – 設定 LLM 模型(下載 Hugging Face 模型) + +首先,我們要告訴 Aspose AI 從哪裡取得模型。`AsposeAIModelConfig` 類別負責自動下載、量化以及 GPU 層級分配。 + +```python +import asposeocrcloud as ocr +from asposeocrcloud.ai import AsposeAI, AsposeAIModelConfig + +# ---------------------------------------------------------------------- +# Step 1: Model configuration – this will download the model if it’s missing +# ---------------------------------------------------------------------- +model_config = AsposeAIModelConfig( + allow_auto_download="true", # Enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", # Repo on Hugging Face + hugging_face_quantization="int8", # Small footprint, fast inference + gpu_layers=30, # 30 layers on GPU, rest on CPU + directory_model_path=r"YOUR_DIRECTORY" # Cache folder (optional) +) +``` + +**為什麼這很重要** – 量化為 `int8` 能大幅減少記憶體使用(約 4 GB 對比 12 GB)。將模型分割至 GPU 與 CPU,可讓 30 億參數的 LLM 在一般的 RTX 3060 上也能運行。若沒有 GPU,將 `gpu_layers=0`,SDK 會全部在 CPU 上執行。 + +> **小技巧**:首次執行會下載約 1.5 GB,請預留幾分鐘並確保網路穩定。 + +--- + +## 第二步 – 使用模型設定初始化 AI 引擎 + +接著,我們啟動 Aspose AI 引擎,並將剛剛建立的設定傳入。 + +```python +# ---------------------------------------------------------------------- +# Step 2: Initialise the AI engine – pulls the model if needed +# ---------------------------------------------------------------------- +ocr_ai = AsposeAI() +ocr_ai.initialize(model_config) # This call blocks until the model is ready +``` + +**背後發生了什麼**?SDK 會檢查 `directory_model_path` 
是否已有模型。若找到相符版本,會立即載入;否則會從 Hugging Face 下載 GGUF 檔案、解壓縮,並建構推論管線。 + +--- + +## 第三步 – 建立 OCR 引擎並掛接 AI 後處理器 + +OCR 引擎負責執行文字辨識。掛接 `ocr_ai.run_postprocessor` 後,辨識完成即會自動執行 **清理 OCR 文字**。 + +```python +# ---------------------------------------------------------------------- +# Step 3: Build the OCR engine and bind the LLM post‑processor +# ---------------------------------------------------------------------- +ocr_engine = ocr.OcrEngine() +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=None) +``` + +**為什麼要使用後處理器?** 原始 OCR 常會出現錯位的換行、誤判的標點或多餘的符號。LLM 能將輸出改寫成完整句子、校正拼寫,甚至推測遺漏的字詞,等於把雜亂的資料轉變為潤飾過的文章。 + +--- + +## 第四步 – 在圖像檔案上執行 OCR + +所有元件已串接完成,現在把圖像送入引擎吧。 + +```python +# ---------------------------------------------------------------------- +# Step 4: Load the image and run OCR +# ---------------------------------------------------------------------- +input_image = ocr.Image.load(r"YOUR_DIRECTORY/handwritten_note.jpg") +raw_result = ocr_engine.recognize(input_image) # Returns an OcrResult object +``` + +**邊緣情況**:若圖像過大(> 5 MP),建議先縮小以加速處理。SDK 接受 Pillow 的 `Image` 物件,你可以先用 `PIL.Image.thumbnail()` 進行前置處理。 + +--- + +## 第五步 – 讓 AI 清理辨識文字,並同時顯示前後兩版 + +最後,我們呼叫先前掛接的後處理器。此步驟會展示清理前後的差異。 + +```python +# ---------------------------------------------------------------------- +# Step 5: Clean the OCR output using the LLM and display both results +# ---------------------------------------------------------------------- +cleaned_result = ocr_engine.run_postprocessor(raw_result) + +print("=== Before AI ===") +print(raw_result.text) + +print("\n=== After AI ===") +print(cleaned_result.text) +``` + +### 預期輸出 + +``` +=== Before AI === +Th1s 1s a h@ndwr1tt3n n0te. It c0nta1ns m1st@k3s, l1n3 br3aks, & sp3c!@l ch@r@ct3rs. + +=== After AI === +This is a handwritten note. It contains mistakes, line breaks, and special characters. 
+``` + +可以看到 LLM 已經: + +- 修正常見的 OCR 誤認(`Th1s` → `This`)。 +- 移除多餘符號(`&` → `and`)。 +- 將換行正規化為完整句子。 + +--- + +## 🎨 視覺概覽(在圖像上執行 OCR 工作流程) + +![在圖像上執行 OCR 工作流程](run_ocr_on_image_workflow.png "示意圖顯示從模型下載到清理輸出的在圖像上執行 OCR 流程") + +上圖概括了完整管線:**下載 Hugging Face 模型 → 設定 LLM → 初始化 AI → OCR 引擎 → AI 後處理器 → 清理 OCR 文字**。 + +--- + +## 常見問題與專業小技巧 + +### 沒有 GPU 該怎麼辦? + +在 `AsposeAIModelConfig` 中將 `gpu_layers=0`。模型會全程在 CPU 上執行,雖然較慢但仍可使用。你也可以改用較小的模型(例如 `Qwen/Qwen2.5-1.5B‑Instruct‑GGUF`),以維持合理的推論時間。 + +### 想之後更換模型要怎麼做? + +只要更新 `hugging_face_repo_id` 後重新執行 `ocr_ai.initialize(model_config)`。SDK 會偵測版本變更、下載新模型,並取代快取檔案。 + +### 可以自訂後處理器的提示語嗎? + +可以。將字典傳入 `custom_settings`,其中包含 `prompt_template` 鍵。例如: + +```python +custom_prompt = { + "prompt_template": "Correct the following OCR text and keep line breaks:\n{ocr_text}" +} +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=custom_prompt) +``` + +### 要把清理後的文字寫入檔案嗎? + +一定要。清理完畢後,你可以將結果寫入 `.txt` 或 `.json` 檔,以供後續處理: + +```python +with open("cleaned_note.txt", "w", encoding="utf-8") as f: + f.write(cleaned_result.text) +``` + +--- + +## 結論 + +我們已示範如何使用 Aspose OCR Cloud **在圖像上執行 OCR**,自動 **下載 Hugging Face 模型**、精細 **設定 LLM 模型**,最後透過強大的 LLM 後處理器 **清理 OCR 文字**。整個流程只需一個簡易的 Python 腳本,且同時支援 GPU 與純 CPU 環境。 + +如果你對此管線已熟悉,可進一步嘗試: + +- **不同 LLM** – 如 `meta-llama/Meta-Llama-3-8B‑Instruct‑GGUF`,以取得更大的上下文視窗。 +- **批次處理** – 迴圈遍歷資料夾內多張圖像,將清理結果匯總至 CSV。 +- **自訂提示** – 針對特定領域(法律文件、醫療筆記等)調整 AI 行為。 + +隨意調整 `gpu_layers` 數值、替換模型,或套用自己的提示語。未來的可能性無限,而你現在手上的程式碼,就是起飛的發射台。 + +祝開發順利,願你的 OCR 輸出永遠乾淨! 
🚀 + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hungarian/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md b/ocr/hungarian/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md new file mode 100644 index 000000000..c652a2b5e --- /dev/null +++ b/ocr/hungarian/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md @@ -0,0 +1,225 @@ +--- +category: general +date: 2026-03-28 +description: Hogyan használjunk OCR-t a képekben lévő kézírásos szöveg felismerésére. + Tanulja meg a kézírásos szöveg kinyerését, a kézírásos kép átalakítását, és gyorsan + tiszta eredményeket érjen el. +draft: false +keywords: +- how to use OCR +- recognize handwritten text +- extract handwritten text +- handwritten note to text +- convert handwritten image +language: hu +og_description: Hogyan használjunk OCR-t a kézírásos szöveg felismerésére. Ez az útmutató + lépésről lépésre megmutatja, hogyan lehet képekből kinyerni a kézírásos szöveget, + és kifinomult eredményeket elérni. +og_title: Hogyan használjuk az OCR-t kézírásos szöveg felismerésére – Teljes útmutató +tags: +- OCR +- Handwriting Recognition +- Python +title: Hogyan használjuk az OCR-t kézírásos szöveg felismerésére – Teljes útmutató +url: /hu/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Hogyan használjuk az OCR-t kézírásos szöveg felismerésére – Teljes útmutató + +Az OCR kézírásos jegyzetekhez való használata sok fejlesztő kérdése, amikor vázlatokat, értekezleti jegyzeteket vagy gyors ötleteket kell digitalizálni. 
Ebben az útmutatóban lépésről lépésre bemutatjuk, hogyan lehet felismerni a kézírásos szöveget, kinyerni a kézírásos szöveget, és egy kézírásos képet tiszta, kereshető karakterláncokká alakítani. + +Ha valaha is egy bevásárlólista fényképét nézted, és azon tűnődtél, hogy „Át tudom-e konvertálni ezt a kézírásos képet szöveggé anélkül, hogy mindent újra be kellene gépelni?” – jó helyen vagy. A végére egy kész‑futtatható szkriptet kapsz, amely **kézírásos jegyzetet szöveggé** alakít másodpercek alatt. + +## Amire szükséged lesz + +- Python 3.8+ (a kód bármely friss verzióval működik) +- Az `ocr` könyvtár – telepítsd a `pip install ocr-sdk` paranccsal (cseréld le a szolgáltatód csomagnevére) +- Egy tiszta kép egy kézírásos jegyzetből (`hand_note.png` a példában) +- Egy kis kíváncsiság és egy kávé ☕️ (opcionális, de ajánlott) + +Nincs nehéz keretrendszer, nincs fizetős felhőkulcs – csak egy helyi motor, amely alapból támogatja a **kézírásos felismerést**. + +## 1. lépés – Az OCR csomag telepítése és importálása + +Először is szerezzük be a megfelelő csomagot a gépedre. Nyiss egy terminált és futtasd: + +```bash +pip install ocr-sdk +``` + +A telepítés befejezése után importáld a modult a szkriptedbe: + +```python +# Step 1: Import the OCR SDK +import ocr +``` + +> **Pro tipp:** Ha virtuális környezetet használsz, aktiváld azt a telepítés előtt. Ez rendezetten tartja a projektet és elkerüli a verzióütközéseket. + +## 2. lépés – OCR motor létrehozása és kézírásos mód engedélyezése + +Most már ténylegesen **hogyan használjuk az OCR-t** – szükségünk van egy motor példányra, amely tudja, hogy folyó vonalakról van szó, nem nyomtatott betűtípusokról. A következő kódrészlet létrehozza a motort és átkapcsolja kézírásos módra: + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = ocr.OcrEngine() +ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN +``` + +Miért állítjuk be a `recognition_mode`-t? 
Mivel a legtöbb OCR motor alapértelmezés szerint a nyomtatott szöveg felismerésére van beállítva, ami gyakran kihagyja a személyes jegyzetek hurkányait és dőléseit. A kézírásos mód engedélyezése drámaian növeli a pontosságot. + +## 3. lépés – A konvertálni kívánt kép betöltése (Kézírásos kép konvertálása) + +A képek bármely OCR feladat nyers anyagai. Győződj meg róla, hogy a képed veszteségmentes formátumban van mentve (a PNG remekül működik), és a szöveg megfelelően olvasható. Ezután töltsd be így: + +```python +# Step 3: Load the handwritten image you want to convert +handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") +``` + +Ha a kép a szkript mellett helyezkedik el, egyszerűen használhatod a `"hand_note.png"`-t a teljes útvonal helyett. + +> **Mi van, ha a kép elmosódott?** Próbálj meg előfeldolgozást végezni OpenCV-vel (pl. `cv2.cvtColor` szürkeárnyalatosra, `cv2.threshold` a kontraszt növeléséhez), mielőtt az OCR motorba adod. + +## 4. lépés – A felismerő motor futtatása a kézírásos szöveg kinyeréséhez + +A motor készen áll és a kép a memóriában, végre **kivonhatjuk a kézírásos szöveget**. A `recognize` metódus egy nyers eredményobjektust ad vissza, amely a szöveget és a megbízhatósági pontszámokat tartalmazza. + +```python +# Step 4: Perform OCR and get the raw result +raw_result = ocr_engine.recognize(handwritten_image) +print("Raw OCR output:") +print(raw_result.text) +``` + +A tipikus nyers kimenet tartalmazhat felesleges sortöréseket vagy helytelenül azonosított karaktereket, különösen ha a kézírás rendezetlen. Ezért van a következő lépés. + +## 5. lépés – (Opcionális) A kimenet finomítása az AI utófeldolgozóval + +A legtöbb modern OCR SDK könnyű AI utófeldolgozóval érkezik, amely tisztítja a szóközöket, javítja a gyakori OCR hibákat, és normalizálja a sorvégeket. 
Futtatása ennyire egyszerű: + +```python +# Step 5: Refine the raw OCR output (handwritten note to text) +polished_result = ocr_engine.run_postprocessor(raw_result) + +# Display the cleaned, readable text +print("\nPolished OCR output:") +print(polished_result.text) +``` + +Ha kihagyod ezt a lépést, még mindig használható szöveget kapsz, de a **kézírásos jegyzet szöveggé** konvertálása kissé durvább lesz. Az utófeldolgozó különösen hasznos olyan jegyzeteknél, amelyek felsorolásjeleket vagy vegyes nagy- és kisbetűs szavakat tartalmaznak. + +## 6. lépés – Az eredmény ellenőrzése és a szélsőséges esetek kezelése + +A finomított eredmény kiírása után ellenőrizd le kétszer, hogy minden rendben van-e. Itt egy gyors ellenőrzés, amit hozzáadhatsz: + +```python +# Step 6: Simple verification +if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") +else: + print("\n✅ OCR succeeded! You can now save or further process the text.") +``` + +**Szélsőséges esetek ellenőrzőlista** + +| Helyzet | Mit kell tenni | +|-----------|------------| +| **Nagyon alacsony kontraszt** | Növeld a kontrasztot a `cv2.convertScaleAbs` használatával a betöltés előtt. | +| **Több nyelv** | Állítsd be a `ocr_engine.language = ["en", "es"]`-t (vagy a kívánt nyelveket). | +| **Nagy dokumentumok** | Dolgozd fel az oldalakat kötegekben a memória csúcsok elkerülése érdekében. | +| **Speciális szimbólumok** | Adj hozzá egy egyedi szótárat a `ocr_engine.add_custom_words([...])` segítségével. | + +## Vizuális áttekintés + +Az alábbi helyőrző kép szemlélteti a munkafolyamatot – egy fényképezett jegyzetből a tiszta szövegig. Az alt szöveg tartalmazza a fő kulcsszót, így a kép SEO‑barát. 
+ +![hogyan használjuk az OCR-t egy kézírásos jegyzet képen](/images/handwritten_ocr_flow.png "hogyan használjuk az OCR-t egy kézírásos jegyzet képen") + +## Teljes, futtatható szkript + +Az összes elemet összeállítva, itt a teljes, másolás‑beillesztés‑kész program: + +```python +# Complete script: Convert a handwritten image to clean text using OCR + +import ocr + +def main(): + # 1️⃣ Initialize OCR engine for handwritten recognition + ocr_engine = ocr.OcrEngine() + ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN + + # 2️⃣ Load the image containing the handwritten note + handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") + + # 3️⃣ Perform OCR to extract raw text + raw_result = ocr_engine.recognize(handwritten_image) + print("Raw OCR output:") + print(raw_result.text) + + # 4️⃣ (Optional) Run AI post‑processor for cleaner output + polished_result = ocr_engine.run_postprocessor(raw_result) + + # 5️⃣ Show the polished, readable text + print("\nPolished OCR output:") + print(polished_result.text) + + # 6️⃣ Simple sanity check + if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") + else: + print("\n✅ OCR succeeded! You can now save or further process the text.") + +if __name__ == "__main__": + main() +``` + +**Várható kimenet (példa)** + +``` +Raw OCR output: +T0d@y I w3nt to the market +and bought 5 aplpes, 2 bananas, +and a loaf of bread. + +Polished OCR output: +Today I went to the market and bought 5 apples, 2 bananas, and a loaf of bread. +``` + +Vedd észre, hogy az utófeldolgozó kijavította a „T0d@y” elírást és normalizálta a szóközöket. + +## Gyakori buktatók és pro tippek + +- **A kép mérete számít** – Az OCR motorok általában 4 K × 4 K-re korlátozzák a bemeneti méretet. Mielőtt nagy fotókat használnál, méretezd át őket. +- **Kézírás stílusa** – A folyó írás vs. blokk betűk befolyásolhatják a pontosságot. Ha a forrást (pl. 
digitális toll) irányítod, ösztönözd a blokk betűket a legjobb eredményért.
- **Kötegelt feldolgozás** – Ha tucatnyi jegyzetről van szó, csomagold a szkriptet egy ciklusba, és tárold az eredményeket CSV‑ben vagy SQLite adatbázisban.
- **Memóriaszivárgások** – Egyes SDK-k belső puffereket tartanak; hívd meg az `ocr_engine.dispose()`-t a munka befejezése után, ha lassulást észlelsz.

## Következő lépések – Túl az egyszerű OCR-on

Miután elsajátítottad a **hogyan használjuk az OCR-t** egyetlen képre, fontold meg ezeket a kiterjesztéseket:

1. **Integrálás felhő tárolóval** – Képek lekérése AWS S3‑ról vagy Azure Blob‑ról, ugyanazon csővezeték futtatása, és az eredmények visszaküldése.
2. **Nyelvfelismerés hozzáadása** – Használd az `ocr_engine.detect_language()`-t a szótárak automatikus váltásához.
3. **Kombinálás NLP‑vel** – A megtisztított szöveget add át spaCy‑nek vagy NLTK‑nek entitások, dátumok vagy feladatok kinyeréséhez.
4. **REST végpont létrehozása** – Csomagold a szkriptet Flask‑be vagy FastAPI‑ba, hogy más szolgáltatások POST‑olhassanak képeket és JSON‑kódolt szöveget kapjanak vissza.

Mindezek az ötletek továbbra is a **kézírásos szöveg felismerése**, **kézírásos szöveg kinyerése**, és **kézírásos kép konvertálása** alapfogalmak körül forognak – azok a pontos kifejezések, amelyeket valószínűleg legközelebb keresni fogsz.

---

### TL;DR

Bemutattuk, hogyan **használjuk az OCR-t** a kézírásos szöveg felismerésére, annak kinyerésére, és a végeredmény finomítására egy használható karakterláncba. A teljes szkript készen áll a futtatásra, a munkafolyamat lépésről lépésre van magyarázva, és most már van egy ellenőrzőlistád a gyakori szélsőséges esetekhez. Készíts egy fényképet a következő értekezleti jegyzetedről, add a szkriptnek, és hagyd, hogy a gép gépelje helyetted.

Boldog kódolást, és legyenek a jegyzeteid mindig olvashatóak!
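Ráadásként egy minimális vázlat a fent említett kötegelt feldolgozáshoz: egy mappa képeit végigjárva CSV‑be menti az eredményeket. A `recognize_text` függvény itt csak hipotetikus helyettesítő – a valódi szkriptben az OCR motor `recognize` hívása kerülne a helyére.

```python
import csv
from pathlib import Path

def recognize_text(image_path: Path) -> str:
    # Hypothetical placeholder – in the real script call your OCR engine here,
    # e.g. ocr_engine.recognize(ocr.Image.load(str(image_path))).text
    return f"recognized text of {image_path.name}"

def batch_to_csv(image_dir: str, csv_path: str) -> int:
    """Run OCR on every PNG/JPG in a folder and store the results in a CSV."""
    rows = [
        {"file": img.name, "text": recognize_text(img)}
        for img in sorted(Path(image_dir).iterdir())
        if img.suffix.lower() in {".png", ".jpg", ".jpeg"}
    ]
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["file", "text"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A visszatérési érték a feldolgozott képek száma, így könnyen naplózható, hány jegyzet került a CSV‑be.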
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hungarian/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md b/ocr/hungarian/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md new file mode 100644 index 000000000..9c7a3735f --- /dev/null +++ b/ocr/hungarian/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md @@ -0,0 +1,187 @@ +--- +category: general +date: 2026-03-28 +description: Képen OCR-t hajtson végre, és kapjon tiszta szöveget a körülhatároló + doboz koordinátáival. Tanulja meg, hogyan lehet kinyerni az OCR-t, megtisztítani + azt, és lépésről lépésre megjeleníteni az eredményeket. +draft: false +keywords: +- perform OCR on image +- how to extract OCR +- how to clean OCR +- display bounding box coordinates +- OCR post‑processing +- OCR bounding boxes +language: hu +og_description: Képen OCR-t végez, megtisztítja a kimenetet, és egy tömör útmutatóban + megjeleníti a körülhatároló doboz koordinátáit. +og_title: OCR végrehajtása képen – Tiszta eredmények és határoló dobozok +tags: +- OCR +- Computer Vision +- Python +title: Kép OCR végrehajtása – Tiszta eredmények és a körülhatároló doboz koordinátáinak + megjelenítése +url: /hu/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Kép OCR végrehajtása – Tiszta eredmények és a körülhatároló doboz koordináták megjelenítése + +Volt már, hogy **képen OCR-t kellett végrehajtani**, de csak rendezetlen szöveget kaptál, és nem tudtad, hol helyezkedik el az egyes szavak a képen? Nem vagy egyedül. 
Sok projektben – számlák digitalizálása, nyugták beolvasása vagy egyszerű szövegkinyerés – a nyers OCR kimenet csak az első akadály. A jó hír? Tisztíthatod ezt a kimenetet, és azonnal megtekintheted az egyes régiók körülhatároló doboz koordinátáit anélkül, hogy rengeteg sablonkódot kellene írnod. + +Ebben az útmutatóban végigvezetünk a **OCR kinyerésének**, egy **OCR tisztító** post‑processzor futtatásának, és végül minden tisztított régió **körülhatároló doboz koordinátáinak** megjelenítésének lépésein. A végére egyetlen, futtatható szkriptet kapsz, amely egy homályos fényképet rendezett, strukturált szöveggé alakít, készen állva a további feldolgozásra. + +## Amire szükséged lesz + +- Python 3.9+ (az alábbi szintaxis 3.8 és újabb verziókon is működik) +- Egy OCR motor, amely támogatja a `recognize(..., return_structured=True)` hívást – például a példakódban szereplő fiktív `engine` könyvtárat. Cseréld le Tesseract-ra, EasyOCR-ra vagy bármely SDK-ra, amely régióadatokat ad vissza. +- Alapvető ismeretek a Python függvényekről és ciklusokról +- Egy képfájl, amelyet be szeretnél olvasni (PNG, JPG, stb.) + +> **Pro tipp:** Ha Tesseract-ot használsz, a `pytesseract.image_to_data` függvény már eleve adja a körülhatároló dobozokat. Egy kis adapterrel becsomagolhatod az eredményt, hogy utánozza a lent bemutatott `engine.recognize` API-t. + +--- + +![perform OCR on image example](image-placeholder.png "perform OCR on image example") + +*Alt text: diagram, amely bemutatja, hogyan kell OCR-t végrehajtani képen, és a körülhatároló doboz koordinátákat megjeleníteni* + +## 1. lépés – OCR végrehajtása képen és strukturált régiók lekérése + +Az első dolog, hogy az OCR motorát megkérjük, ne csak egyszerű szöveget, hanem egy strukturált szövegrégió-listát adjon vissza. Ez a lista tartalmazza a nyers karakterláncot és a körülötte lévő téglalapot. 
+ +```python +import engine # replace with your actual OCR library +from pathlib import Path + +# Load the image you want to process +image_path = Path("sample_invoice.jpg") +image = engine.load_image(image_path) + +# Step 1: Perform OCR and request a structured list of text regions +raw_result = engine.recognize(image, return_structured=True) +``` + +**Miért fontos ez:** +Ha csak egyszerű szöveget kérsz, elveszíted a térbeli kontextust. A strukturált adatok lehetővé teszik, hogy később **megjelenítsd a körülhatároló doboz koordinátákat**, szöveget táblázatokhoz igazíts, vagy pontos helyzeteket adj egy downstream modellnek. + +## 2. lépés – OCR kimenet tisztítása post‑processzorral + +Az OCR motorok jól felismerik a karaktereket, de gyakran hagynak meg felesleges szóközöket, sortörés‑artifaktusokat vagy hibásan felismert szimbólumokat. Egy post‑processzor normalizálja a szöveget, javítja a gyakori OCR hibákat, és levágja a fölösleges whitespace‑t. + +```python +# Step 2: Clean the plain‑text of each region using the post‑processor +processed_result = engine.run_postprocessor(raw_result) +``` + +Ha saját tisztítót építesz, fontold meg: + +- Nem‑ASCII karakterek eltávolítása (`re.sub(r'[^\x00-\x7F]+',' ', text)`) +- Több szóköz egyetlen szóközzé összevonása +- Egy helyesírás‑ellenőrző, például a `pyspellchecker` használata nyilvánvaló elírások javításához + +**Miért érdemes foglalkozni vele:** +A rendezett karakterlánc sokkal megbízhatóbbá teszi a keresést, indexelést és a downstream NLP folyamatokat. Más szóval, a **hogyan tisztítsuk az OCR-t** gyakran a használható adatállomány és a fejfájás közti különbség. + +## 3. lépés – Körülhatároló doboz koordináták megjelenítése minden tisztított régióhoz + +Most, hogy a szöveg már tiszta, végigiterálunk minden régión, kiírva a téglalapot és a tisztított karakterláncot. Ez a rész, ahol végre **megjelenítjük a körülhatároló doboz koordinátákat**. 
+ +```python +# Step 3 – Iterate over the cleaned regions and display their bounding box and text +for text_region in processed_result.regions: + # Each region has a .bounding_box attribute (x, y, width, height) + bbox = text_region.bounding_box + print(f"[{bbox}] {text_region.text}") +``` + +**Minta kimenet** + +``` +[(34, 120, 210, 30)] Invoice #12345 +[(34, 160, 420, 28)] Date: 2026‑03‑01 +[(34, 200, 380, 28)] Total Amount: $1,254.00 +``` + +Ezeket a koordinátákat most már átadhatod egy rajzoló könyvtárnak (pl. OpenCV), hogy dobozokat helyezz az eredeti képre, vagy adatbázisban tárold későbbi lekérdezésekhez. + +## Teljes, futtatható szkript + +Az alábbiakban a teljes program látható, amely összekapcsolja a három lépést. Cseréld le a helyőrző `engine` hívásokat a saját OCR SDK-dra. + +```python +#!/usr/bin/env python3 +""" +Perform OCR on image → clean results → display bounding box coordinates. +Author: Your Name +Date: 2026‑03‑28 +""" + +import engine  # <-- replace with your OCR library +from pathlib import Path +import sys + +def main(image_path: str): +    # Load image +    image = engine.load_image(Path(image_path)) + +    # 1️⃣ Perform OCR and ask for structured output +    raw_result = engine.recognize(image, return_structured=True) + +    # 2️⃣ Clean the raw text using the built‑in post‑processor +    processed_result = engine.run_postprocessor(raw_result) + +    # 3️⃣ Show each region's bounding box and cleaned text +    print("\n=== Cleaned OCR Regions ===") +    for region in processed_result.regions: +        bbox = region.bounding_box  # (x, y, w, h) +        print(f"[{bbox}] {region.text}") + +if __name__ == "__main__": +    if len(sys.argv) != 2: +        print("Usage: python perform_ocr.py <image_path>") +        sys.exit(1) +    main(sys.argv[1]) +``` + +### Hogyan futtassuk + +```bash +python perform_ocr.py sample_invoice.jpg +``` + +A kimenet egy listát fog mutatni a körülhatároló dobozokról, párosítva a tisztított szöveggel, pontosan úgy, mint a fenti minta kimenet.
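Ha a használt OCR motor nem kínál beépített post‑processzort, a 2. lépésben említett tisztítási heurisztikák (nem‑ASCII karakterek eltávolítása, szóközök összevonása) néhány sor tiszta Pythonnal is megvalósíthatók. Az alábbi vázlat csak a standard könyvtárra épül; a `clean_ocr_text` név saját, illusztrációs célú segédfüggvény, nem a fenti `engine` API része:

```python
import re

def clean_ocr_text(text: str) -> str:
    """Apply the simple OCR cleaning heuristics described in Step 2."""
    # Drop non-ASCII characters that often show up as recognition noise
    text = re.sub(r"[^\x00-\x7F]+", " ", text)
    # Collapse runs of whitespace (including stray line breaks) into one space
    text = re.sub(r"\s+", " ", text)
    return text.strip()

print(clean_ocr_text("Inv0ice   #12345\n\nTotal:\t$1,254.00"))
# → Inv0ice #12345 Total: $1,254.00
```

Egy ilyen függvényt régiónként is meghívhatsz, mielőtt a szöveget adatbázisba írod vagy tovább elemzed.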
+ +## Gyakran ismételt kérdések és széljegyzetek + +| Kérdés | Válasz | +|----------|--------| +| **Mi a teendő, ha az OCR motor nem támogatja a `return_structured` opciót?** | Írj egy vékony wrapper‑t, amely a motor nyers kimenetét (általában szavak listája koordinátákkal) átalakítja olyan objektumokká, amelyek `text` és `bounding_box` attribútumokkal rendelkeznek. | +| **Kaphatok megbízhatósági pontszámokat?** | Sok SDK biztosít konfidencia‑metrikát régiónként. Egészítsd ki a kiírást: `print(f"[{bbox}] {region.text} (conf: {region.confidence:.2f})")`. | +| **Hogyan kezeljem a forgatott szöveget?** | Előfeldolgozásként használd az OpenCV `cv2.minAreaRect` függvényét a dőlés korrigálásához, mielőtt meghívod a `recognize`‑t. | +| **Mi a teendő, ha JSON‑formátumban szeretném a kimenetet?** | Sorosítsd a `processed_result.regions`‑t a `json.dumps([r.__dict__ for r in processed_result.regions], indent=2)` hívással. | +| **Van mód a dobozok vizualizálására?** | Használd az OpenCV‑t: `cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0), 2)` a ciklusban, majd `cv2.imwrite("annotated.jpg", img)`. | + +## Összegzés + +Most már megtanultad, **hogyan kell OCR-t végrehajtani képen**, tisztítani a nyers kimenetet, és **megjeleníteni a körülhatároló doboz koordinátákat** minden régióhoz. A háromlépéses folyamat – felismerés → post‑processzálás → iteráció – egy újrahasználható minta, amelyet bármely Python projektbe beilleszthetsz, amely megbízható szövegkinyerést igényel. + +### Mi következik? + +- **Fedezz fel különböző OCR háttérmotorokat** (Tesseract, EasyOCR, Google Vision) és hasonlítsd össze a pontosságot. +- **Integráld egy adatbázissal**, hogy a régióadatokat kereshető archívumokban tárold. +- **Adj hozzá nyelvfelismerést**, hogy minden régiót a megfelelő helyesírás‑ellenőrzőn keresztül irányítsd. +- **Helyezz dobozokat az eredeti képre** a vizuális ellenőrzéshez (lásd a fenti OpenCV kódrészletet).
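A GYIK JSON‑tippje jól kombinálható az adatbázis‑integrációval: az alábbi vázlat egy feltételezett, leegyszerűsített régióobjektummal mutatja be a sorosítást (a valódi SDK régió‑attribútumai ettől eltérhetnek):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Region:
    # Hypothetical stand-in for an SDK region: cleaned text plus (x, y, w, h)
    text: str
    bounding_box: tuple

regions = [
    Region("Invoice #12345", (34, 120, 210, 30)),
    Region("Date: 2026-03-01", (34, 160, 420, 28)),
]

# Serialise every region to JSON – ready for a file or a database column
payload = json.dumps([asdict(r) for r in regions], indent=2)
print(payload)
```

A kapott JSON közvetlenül beszúrható egy dokumentum‑adatbázisba, vagy elmenthető a kép mellé későbbi lekérdezésekhez.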
+ +Ha bármilyen furcsasággal találkozol, ne feledd, hogy a legnagyobb előny egy szilárd post‑processzálási lépésből származik; egy tiszta karakterlánc sokkal könnyebben kezelhető, mint egy nyers karakterhalmaz. + +Boldog kódolást, és legyenek mindig rendezettek az OCR csővezetékek! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hungarian/python/general/python-ocr-tutorial-extract-text-from-images/_index.md b/ocr/hungarian/python/general/python-ocr-tutorial-extract-text-from-images/_index.md new file mode 100644 index 000000000..914b4929f --- /dev/null +++ b/ocr/hungarian/python/general/python-ocr-tutorial-extract-text-from-images/_index.md @@ -0,0 +1,233 @@ +--- +category: general +date: 2026-03-28 +description: Python OCR oktató, amely bemutatja, hogyan lehet szöveget kinyerni képből + Python segítségével az Aspose OCR Cloud használatával. Tanulja meg, hogyan töltsön + be képet OCR-hez, és hogyan konvertálja a képet egyszerű szöveggé percek alatt. +draft: false +keywords: +- python ocr tutorial +- extract text image python +- ocr image to text +- load image for ocr +- convert image plain text +language: hu +og_description: A Python OCR oktatóanyag bemutatja, hogyan töltsünk be képet OCR-hez, + és hogyan konvertáljuk a képet egyszerű szöveggé az Aspose OCR Cloud segítségével. + Szerezd meg a teljes kódot és tippeket. 
+og_title: Python OCR útmutató – Szöveg kinyerése képekből +tags: +- OCR +- Python +- Image Processing +title: Python OCR útmutató – Szöveg kinyerése képekből +url: /hu/python/general/python-ocr-tutorial-extract-text-from-images/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Python OCR Bemutató – Szöveg kinyerése képekből + +Gondoltad már, hogyan lehet egy rendezetlen nyugta fényképet tiszta, kereshető szöveggé alakítani? Nem vagy egyedül. Tapasztalatom szerint a legnagyobb akadály nem maga az OCR motor, hanem a kép megfelelő formátumba hozása és a sima szöveg hibátlan kinyerése. + +Ez a **python ocr tutorial** végigvezet minden lépésen – a kép betöltésén OCR-hez, a felismerés futtatásán, és végül a kép sima szövegének Python stringgé alakításán, amelyet tárolhatsz vagy elemezhetsz. A végére képes leszel **extract text image python** stílusban szöveget kinyerni, és nem lesz szükséged fizetős licencre a kezdéshez. + +## Mit fogsz megtanulni + +- Hogyan telepítsd és importáld az Aspose OCR Cloud SDK-t Pythonhoz. +- A pontos kód a **load image for OCR** (PNG, JPEG, TIFF, PDF, stb.) +- Hogyan hívjuk meg a motort a **ocr image to text** átalakítás elvégzéséhez. +- Tippek a gyakori edge‑case-ek kezelésére, mint a többoldalas PDF-ek vagy alacsony felbontású beolvasások. +- Módszerek a kimenet ellenőrzésére és mit tegyünk, ha a szöveg összezavarodott. + +### Előfeltételek + +- Python 3.8+ telepítve a gépeden. +- Ingyenes Aspose Cloud fiók (a próbaverzió licenc nélkül működik). +- Alapvető ismeretek a pip-ről és a virtuális környezetekről – semmi bonyolult. + +> **Pro tip:** Ha már használsz virtualenv-et, aktiváld most. Ez rendezetten tartja a függőségeket és elkerüli a verzióütközéseket. + +![Python OCR tutorial képernyőkép a felismert szöveggel](path/to/ocr_example.png "Python OCR tutorial – kinyert egyszerű szöveg megjelenítése") + +## 1. 
lépés – Az Aspose OCR Cloud SDK telepítése + +Először is szükségünk van a könyvtárra, amely az Aspose OCR szolgáltatásával kommunikál. Nyiss egy terminált és futtasd: + +```bash +pip install asposeocrcloud +``` + +Ez az egyetlen parancs letölti a legújabb SDK-t (jelenleg a 23.12-es verziót). A csomag mindent tartalmaz, amire szükséged van – nincs szükség extra kép‑feldolgozó könyvtárakra. + +## 2. lépés – Az OCR motor inicializálása + +Most, hogy az SDK készen áll, elindíthatjuk az OCR motort. A konstruktor nem igényel licenckulcsot a próbaverzióhoz, ami egyszerűvé teszi a dolgokat. + +```python +import asposeocrcloud as ocr + +# Initialise the OCR engine – no licence needed for trial use +ocr_engine = ocr.OcrEngine() +``` + +> **Miért fontos ez:** A motor egyszeri inicializálása gyorsabbá teszi a későbbi hívásokat. Ha minden képhez újra létrehozod az objektumot, feleslegesen pazarolod a hálózati kéréseket. + +## 3. lépés – Kép betöltése OCR-hez + +Itt jön képbe a **load image for OCR** kulcsszó. Az SDK `Image.load` metódusa fájlútvonalat vagy URL-t fogad, és automatikusan felismeri a formátumot (PNG, JPEG, TIFF, PDF, stb.). Töltsünk be egy mintanyugtát: + +```python +# Step 3: Load the input image (PNG, JPEG, TIFF, PDF …) +input_image = ocr.Image.load(r"YOUR_DIRECTORY/receipt.png") +``` + +Ha többoldalas PDF-fel dolgozol, egyszerűen mutass a PDF fájlra; az SDK belsőleg minden oldalt külön képként kezel. + +## 4. lépés – OCR kép‑szöveg átalakítás végrehajtása + +A memóriában lévő képpel a tényleges OCR egyetlen sorban történik. A `recognize` metódus egy `OcrResult` objektumot ad vissza, amely tartalmazza a sima szöveget, a bizalmi pontszámokat, és akár a keretboxokat is, ha később szükséged van rájuk.
+ +```python +# Step 4: Perform OCR on the loaded image +ocr_result = ocr_engine.recognize(input_image) +``` + +> **Edge case:** Alacsony felbontású képek (300 dpi alatti) esetén először érdemes lehet felskálázni a képet. Az SDK egy `Resize` segédfüggvényt kínál, de a legtöbb nyugta esetén az alapértelmezett jól működik. + +## 5. lépés – Kép sima szövegének átalakítása használható stringgé + +A puzzle utolsó darabja a sima szöveg kinyerése az eredményobjektumból. Ez a **convert image plain text** lépés, amely az OCR adatblokkot olyanná alakítja, amit kiírhatsz, tárolhatsz vagy egy másik rendszerbe továbbíthatsz. + +```python +# Step 5: Output the recognised plain text +print("Plain OCR:") +print(ocr_result.text) +``` + +A szkript futtatásakor valami ilyesmit kell látnod: + +``` +Plain OCR: +Starbucks Coffee +Date: 2026/03/27 +Total: $4.75 +Thank you! +``` + +Ez a kimenet most egy szabványos Python string, készen áll CSV exportálásra, adatbázisba való beszúrásra vagy természetes nyelvi feldolgozásra. + +## Gyakori problémák és megoldások + +### 1. Üres vagy zajos képek + +Ha az `ocr_result.text` üres, ellenőrizd a kép minőségét. Egy gyors megoldás, ha hozzáadsz egy előfeldolgozási lépést: + +```python +# Simple preprocessing – convert to grayscale and increase contrast +processed = input_image.to_grayscale().adjust_contrast(1.2) +ocr_result = ocr_engine.recognize(processed) +``` + +### 2. Többoldalas PDF-ek + +Ha PDF-et adsz meg, a `recognize` minden oldalra eredményt ad vissza. Így iterálhatsz rajtuk: + +```python +pdf_image = ocr.Image.load("document.pdf") +pages = pdf_image.pages  # collection of page images + +for i, page in enumerate(pages, start=1): +    result = ocr_engine.recognize(page) +    print(f"--- Page {i} ---") +    print(result.text) +``` + +### 3. Nyelvtámogatás + +Az Aspose OCR több mint 60 nyelvet támogat.
A nyelv váltásához állítsd be a `language` tulajdonságot a `recognize` hívása előtt: + +```python +ocr_engine.language = "fr"  # French +ocr_result = ocr_engine.recognize(input_image) +``` + +## Teljes működő példa + +Összegezve, itt egy teljes, másolás‑beillesztésre kész szkript, amely mindent lefed a telepítéstől az edge‑case kezelésig: + +```python +# -*- coding: utf-8 -*- +""" +Python OCR tutorial – complete example using Aspose OCR Cloud. +Demonstrates loading an image, performing OCR, and handling multi‑page PDFs. +""" + +import asposeocrcloud as ocr +import os + +def ocr_file(filepath: str, language: str = "en"): +    """ +    Perform OCR on a given file (image or PDF) and return plain text. +    """ +    # Initialise engine (trial licence) +    engine = ocr.OcrEngine() +    engine.language = language + +    # Load the file – SDK auto‑detects format +    image = ocr.Image.load(filepath) + +    # If it's a PDF, iterate over pages +    if image.is_pdf: +        all_text = [] +        for page in image.pages: +            result = engine.recognize(page) +            all_text.append(result.text) +        return "\n".join(all_text) + +    # Single‑image case +    result = engine.recognize(image) +    return result.text + + +if __name__ == "__main__": +    # Example usage – replace with your own path +    sample_path = os.path.join("YOUR_DIRECTORY", "receipt.png") + +    if not os.path.exists(sample_path): +        raise FileNotFoundError(f"File not found: {sample_path}") + +    extracted = ocr_file(sample_path) +    print("=== Extracted Text ===") +    print(extracted) +``` + +Futtasd a szkriptet (`python ocr_demo.py`), és az **ocr image to text** kimenetet közvetlenül a konzolon fogod látni. + +## Összefoglalás – Amit áttekintettünk + +- Telepítetted az **Aspose OCR Cloud** SDK-t (`pip install asposeocrcloud`). +- **Inicializáltad az OCR motort** licenc nélkül (tökéletes a próbaverzióhoz). +- Bemutattuk, hogyan **load image for OCR**, legyen az PNG, JPEG vagy PDF.
+- Végrehajtottuk a **ocr image to text** átalakítást és a **convert image plain text** lépést, hogy használható Python stringet kapjunk. +- Megoldottuk a gyakori buktatókat, mint az alacsony felbontású beolvasások, többoldalas PDF-ek és a nyelvválasztás. + +## Következő lépések és kapcsolódó témák + +Miután elsajátítottad a **python ocr tutorial**-t, érdemes tovább kutatni: + +- **Extract text image python** nagy nyugták mappáinak kötegelt feldolgozásához. +- Az OCR kimenet integrálása a **pandas**-szal adat elemzéshez (`df = pd.read_csv(StringIO(extracted))`). +- **Tesseract OCR** használata tartalékmegoldásként, ha az internetkapcsolat korlátozott. +- Utófeldolgozás hozzáadása a **spaCy**-val, hogy azonosítsa az entitásokat, mint dátumok, összegek és kereskedőnevek. + +Nyugodtan kísérletezz: próbálj ki különböző képformátumokat, állítsd a kontrasztot, vagy válts nyelvet. Az OCR terület széleskörű, és a most megszerzett készségek szilárd alapot nyújtanak bármilyen dokumentum‑automatizálási projekthez. + +Boldog kódolást, és legyen a szöveged mindig olvasható! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/hungarian/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md b/ocr/hungarian/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md new file mode 100644 index 000000000..3cd288790 --- /dev/null +++ b/ocr/hungarian/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md @@ -0,0 +1,222 @@ +--- +category: general +date: 2026-03-28 +description: Tanulja meg, hogyan futtasson OCR-t képen, automatikusan töltse le a + Hugging Face modellt, tisztítsa meg az OCR‑szöveget, és konfigurálja az LLM modellt + Pythonban az Aspose OCR Cloud használatával. 
+draft: false +keywords: +- run OCR on image +- download hugging face model +- clean OCR text +- configure LLM model +language: hu +og_description: Futtass OCR-t a képen, és tisztítsd meg a kimenetet egy automatikusan + letöltött Hugging Face modellel. Ez az útmutató bemutatja, hogyan konfigurálj LLM + modellt Pythonban. +og_title: OCR futtatása képen – Teljes Aspose OCR Cloud útmutató +tags: +- OCR +- Python +- LLM +- HuggingFace +title: OCR futtatása képen az Aspose OCR Cloud használatával – Teljes lépésről‑lépésre + útmutató +url: /hu/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Kép OCR futtatása – Teljes Aspose OCR Cloud bemutató + +Volt már szükséged arra, hogy képfájlokon OCR-t futtass, de a nyers kimenet egy összegabalyodott kusza szövegnek tűnt? Tapasztalatom szerint a legnagyobb gond nem maga a felismerés – a tisztítás. Szerencsére az Aspose OCR Cloud lehetővé teszi, hogy egy LLM post‑processzort csatolj, amely automatikusan *tisztítja az OCR szöveget*. Ebben a bemutatóban végigvezetünk mindenen, amire szükséged van: a **Hugging Face modell letöltésétől** a LLM konfigurálásáig, az OCR motor futtatásáig, és végül az eredmény finomításáig. + +A végére egy kész‑futtatható szkriptet kapsz, amely: + +1. Letölti a kompakt Qwen 2.5 modellt a Hugging Face‑ről (automatikusan letöltve számodra). +2. Beállítja a modellt úgy, hogy a hálózat egy részét GPU-n, a maradékot CPU-n futtassa. +3. Végrehajtja az OCR motort egy kézzel írott jegyzet képen. +4. Az LLM-et használja a felismert szöveg tisztítására, emberi olvasásra alkalmas kimenetet biztosítva. + +> **Előfeltételek** – Python 3.8+, `asposeocrcloud` csomag, legalább 4 GB VRAM-mal rendelkező GPU (opcionális, de ajánlott), valamint internetkapcsolat az első modell letöltéséhez. 
+ +--- + +## Amire szükséged lesz + +- **Aspose OCR Cloud SDK** – telepítés: `pip install asposeocrcloud`. +- **Minta kép** – például `handwritten_note.jpg`, helyi mappában. +- **GPU támogatás** – ha CUDA‑támogatott GPU-d van, a script 30 réteget áthelyez a GPU-ra; egyébként automatikusan CPU-ra vált. +- **Írási jogosultság** – a script a modellt a `YOUR_DIRECTORY` könyvtárban tárolja; győződj meg róla, hogy a mappa létezik. + +--- + +## 1. lépés – Az LLM modell konfigurálása (Hugging Face modell letöltése) + +Az első dolog, amit teszünk, hogy megmondjuk az Aspose AI‑nek, honnan töltse le a modellt. A `AsposeAIModelConfig` osztály kezeli az automatikus letöltést, kvantálást és a GPU rétegek kiosztását. + +```python +import asposeocrcloud as ocr +from asposeocrcloud.ai import AsposeAI, AsposeAIModelConfig + +# ---------------------------------------------------------------------- +# Step 1: Model configuration – this will download the model if it’s missing +# ---------------------------------------------------------------------- +model_config = AsposeAIModelConfig( + allow_auto_download="true", # Enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", # Repo on Hugging Face + hugging_face_quantization="int8", # Small footprint, fast inference + gpu_layers=30, # 30 layers on GPU, rest on CPU + directory_model_path=r"YOUR_DIRECTORY" # Cache folder (optional) +) +``` + +**Miért fontos** – Az `int8` kvantálás jelentősen csökkenti a memóriahasználatot (≈ 4 GB vs 12 GB). A modell GPU és CPU közötti felosztása lehetővé teszi, hogy egy 3 milliárd paraméteres LLM-et futtass még egy közepes RTX 3060-on is. Ha nincs GPU-d, állítsd be a `gpu_layers=0` értéket, és az SDK mindent CPU-n tart. + +> **Tipp:** Az első futtatás ~ 1,5 GB-ot tölt le, ezért adj neki néhány percet és stabil kapcsolatot. + +--- + +## 2. 
lépés – Az AI motor inicializálása a modell konfigurációval + +Most elindítjuk az Aspose AI motort, és átadjuk neki a frissen létrehozott konfigurációt. + +```python +# ---------------------------------------------------------------------- +# Step 2: Initialise the AI engine – pulls the model if needed +# ---------------------------------------------------------------------- +ocr_ai = AsposeAI() +ocr_ai.initialize(model_config)  # This call blocks until the model is ready +``` + +**Mi történik a háttérben?** Az SDK ellenőrzi a `directory_model_path`-t, hogy van-e már meglévő modell. Ha megtalálja a megfelelő verziót, azonnal betölti; ellenkező esetben letölti a GGUF fájlt a Hugging Face‑ről, kicsomagolja, és előkészíti az inferencia csővezetéket. + +--- + +## 3. lépés – OCR motor létrehozása és az AI post‑processzor csatolása + +Az OCR motor végzi a karakterfelismerés nehéz munkáját. Az `ocr_ai.run_postprocessor` csatolásával automatikusan **tisztított OCR szöveget** kapunk a felismerés után. + +```python +# ---------------------------------------------------------------------- +# Step 3: Build the OCR engine and bind the LLM post‑processor +# ---------------------------------------------------------------------- +ocr_engine = ocr.OcrEngine() +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=None) +``` + +**Miért használjunk post‑processzort?** A nyers OCR gyakran tartalmaz rossz helyen lévő sortöréseket, hibásan felismert írásjeleket vagy felesleges szimbólumokat. Az LLM átírhatja a kimenetet helyes mondatokra, javíthatja a helyesírást, sőt hiányzó szavakat is kitalálhat – lényegében egy nyers dumpot átalakít szép szöveggé. + +--- + +## 4. lépés – OCR futtatása képfájlon + +Miután mindent összekapcsoltunk, itt az ideje, hogy egy képet adjunk a motornak.
+ +```python +# ---------------------------------------------------------------------- +# Step 4: Load the image and run OCR +# ---------------------------------------------------------------------- +input_image = ocr.Image.load(r"YOUR_DIRECTORY/handwritten_note.jpg") +raw_result = ocr_engine.recognize(input_image) # Returns an OcrResult object +``` + +**Szélsőséges eset:** Ha a kép nagy (> 5 MP), érdemes előbb átméretezni a feldolgozás felgyorsítása érdekében. Az SDK elfogad egy Pillow `Image` objektumot, így szükség esetén előfeldolgozhatod a `PIL.Image.thumbnail()`‑val. + +--- + +## 5. lépés – Engedjük, hogy az AI megtisztítsa a felismert szöveget és mutassa mindkét változatot + +Végül meghívjuk a korábban csatolt post‑processzort. Ez a lépés bemutatja a *tisztítás előtti* és *utáni* különbséget. + +```python +# ---------------------------------------------------------------------- +# Step 5: Clean the OCR output using the LLM and display both results +# ---------------------------------------------------------------------- +cleaned_result = ocr_engine.run_postprocessor(raw_result) + +print("=== Before AI ===") +print(raw_result.text) + +print("\n=== After AI ===") +print(cleaned_result.text) +``` + +### Várható kimenet + +``` +=== Before AI === +Th1s 1s a h@ndwr1tt3n n0te. It c0nta1ns m1st@k3s, l1n3 br3aks, & sp3c!@l ch@r@ct3rs. + +=== After AI === +This is a handwritten note. It contains mistakes, line breaks, and special characters. +``` + +Figyeld meg, hogy az LLM: + +- Kijavította a gyakori OCR hibákat (`Th1s` → `This`). +- Eltávolította a felesleges szimbólumokat (`&` → `and`). +- Normalizálta a sortöréseket helyes mondatokká. 
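Ha számszerűsíteni szeretnéd, mennyit változtatott az LLM a nyers kimeneten, a standard könyvtár `difflib` modulja gyorsan összevetheti a két változatot. Az alábbi vázlat a fenti példakimenet két sorát használja bemenetként:

```python
import difflib

raw_text = "Th1s 1s a h@ndwr1tt3n n0te."
cleaned_text = "This is a handwritten note."

# Character-level similarity between the raw OCR output and the cleaned text
ratio = difflib.SequenceMatcher(None, raw_text, cleaned_text).ratio()
print(f"Hasonlóság: {ratio:.2f}")

# Unified diff of the two versions – useful for spot-checking LLM edits
for line in difflib.unified_diff([raw_text], [cleaned_text], lineterm=""):
    print(line)
```

Egy ilyen gyors összevetés segít eldönteni, hogy a post‑processzor csak kozmetikázott-e, vagy érdemben átírta a szöveget.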
+ +--- + +## 🎨 Vizuális áttekintés (OCR futtatása képen munkafolyamat) + +![OCR futtatása képen munkafolyamat](run_ocr_on_image_workflow.png "Diagram, amely bemutatja az OCR futtatása képen csővezetékét a modell letöltésétől a tisztított kimenetig") + +A fenti diagram összefoglalja a teljes csővezetéket: **Hugging Face modell letöltése → LLM konfigurálása → AI inicializálása → OCR motor → AI post‑processzor → tisztított OCR szöveg**. + +--- + +## Gyakori kérdések és profi tippek + +### Mi van, ha nincs GPU-m? + +Állítsd be a `gpu_layers=0` értéket az `AsposeAIModelConfig`‑ban. A modell teljesen CPU-n fog futni, ami lassabb, de még mindig működőképes. Átválthatsz egy kisebb modellre is (pl. `Qwen/Qwen2.5-1.5B‑Instruct‑GGUF`), hogy az inferenciaidő ésszerű maradjon. + +### Hogyan változtathatom meg később a modellt? + +Egyszerűen frissítsd a `hugging_face_repo_id`‑t, és futtasd újra az `ocr_ai.initialize(model_config)`‑t. Az SDK észleli a verzióváltozást, letölti az új modellt, és felülírja a gyorsítótárazott fájlokat. + +### Testreszabhatom a post‑processzor promptját? + +Igen. Adj át egy szótárat a `custom_settings`‑nek egy `prompt_template` kulccsal. Például: + +```python +custom_prompt = { +    "prompt_template": "Correct the following OCR text and keep line breaks:\n{ocr_text}" +} +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=custom_prompt) +``` + +### Tároljam a tisztított szöveget fájlban? + +Határozottan. A tisztítás után az eredményt `.txt` vagy `.json` fájlba írhatod a további feldolgozáshoz: + +```python +with open("cleaned_note.txt", "w", encoding="utf-8") as f: +    f.write(cleaned_result.text) +``` + +--- + +## Összegzés + +Most bemutattuk, hogyan **futtathatsz OCR-t képfájlokon** az Aspose OCR Cloud segítségével, automatikusan **letöltheted a Hugging Face modellt**, szakszerűen **konfigurálhatod az LLM modellt**, és végül **tisztíthatod az OCR szöveget** egy erőteljes LLM post‑processzorral.
Az egész folyamat egyetlen, könnyen futtatható Python szkriptbe illeszkedik, és működik mind GPU‑val felszerelt, mind csak CPU‑t használó gépeken. + +Ha már magabiztos vagy ebben a csővezetékben, gondolkodj el a következő kísérleteken: + +- **Különböző LLM-ek** – próbáld ki a `meta-llama/Meta-Llama-3-8B‑Instruct‑GGUF`‑t egy nagyobb kontextusablakért. +- **Kötegelt feldolgozás** – iterálj egy képmappán, és gyűjtsd össze a tisztított eredményeket egy CSV-be. +- **Egyedi promptok** – szabja testre az AI-t a saját területére (jogi dokumentumok, orvosi jegyzetek stb.). + +Nyugodtan módosítsd a `gpu_layers` értékét, cseréld le a modellt, vagy csatlakoztasd a saját promptodat. A lehetőségek határtalanok, és a jelenlegi kód a kiindulópont. + +Boldog kódolást, és legyenek az OCR kimeneteid mindig tiszták! 🚀 + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/indonesian/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md b/ocr/indonesian/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md new file mode 100644 index 000000000..5ed59add9 --- /dev/null +++ b/ocr/indonesian/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md @@ -0,0 +1,225 @@ +--- +category: general +date: 2026-03-28 +description: Cara menggunakan OCR untuk mengenali teks tulisan tangan dalam gambar. + Pelajari cara mengekstrak teks tulisan tangan, mengonversi gambar tulisan tangan, + dan mendapatkan hasil yang bersih dengan cepat. +draft: false +keywords: +- how to use OCR +- recognize handwritten text +- extract handwritten text +- handwritten note to text +- convert handwritten image +language: id +og_description: Cara menggunakan OCR untuk mengenali teks tulisan tangan. 
Tutorial + ini menunjukkan langkah demi langkah cara mengekstrak teks tulisan tangan dari gambar + dan mendapatkan hasil yang halus. +og_title: Cara Menggunakan OCR untuk Mengenali Teks Tulisan Tangan – Panduan Lengkap +tags: +- OCR +- Handwriting Recognition +- Python +title: Cara Menggunakan OCR untuk Mengenali Teks Tulisan Tangan – Panduan Lengkap +url: /id/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Cara Menggunakan OCR untuk Mengenali Teks Tulisan Tangan – Panduan Lengkap + +Cara menggunakan OCR untuk catatan tulisan tangan adalah pertanyaan yang sering diajukan banyak pengembang ketika mereka perlu mendigitalkan sketsa, notulen rapat, atau ide cepat. Dalam panduan ini kami akan menjelaskan langkah‑langkah tepat untuk mengenali teks tulisan tangan, mengekstrak teks tulisan tangan, dan mengubah gambar tulisan tangan menjadi string yang bersih dan dapat dicari. + +Jika Anda pernah menatap foto daftar belanja dan bertanya, “Bisakah saya mengonversi gambar tulisan tangan ini menjadi teks tanpa mengetik semuanya lagi?” – Anda berada di tempat yang tepat. Pada akhir panduan Anda akan memiliki skrip siap‑jalankan yang mengubah **catatan tulisan tangan menjadi teks** dalam hitungan detik. + +## Apa yang Anda Butuhkan + +- Python 3.8+ (kode berfungsi dengan versi terbaru apa pun) +- Library `ocr` – instal dengan `pip install ocr-sdk` (ganti dengan nama paket penyedia Anda) +- Gambar yang jelas dari catatan tulisan tangan (`hand_note.png` dalam contoh) +- Sedikit rasa ingin tahu dan secangkir kopi ☕️ (opsional namun disarankan) + +Tanpa kerangka kerja berat, tanpa kunci cloud berbayar – hanya mesin lokal yang mendukung **pengenalan tulisan tangan** secara langsung. + +## Langkah 1 – Instal Paket OCR dan Impor + +Pertama-tama, mari pasang paket yang tepat di mesin Anda.
Buka terminal dan jalankan: + +```bash +pip install ocr-sdk +``` + +Setelah instalasi selesai, impor modul dalam skrip Anda: + +```python +# Step 1: Import the OCR SDK +import ocr +``` + +> **Pro tip:** Jika Anda menggunakan lingkungan virtual, aktifkan sebelum menginstal. Itu menjaga proyek Anda tetap rapi dan menghindari benturan versi. + +## Langkah 2 – Buat Mesin OCR dan Aktifkan Mode Tulisan Tangan + +Sekarang kita sampai pada inti **cara menggunakan OCR** – kita memerlukan instance mesin yang tahu bahwa kita berurusan dengan goresan kursif, bukan font cetak. Potongan kode berikut membuat mesin dan mengalihkannya ke mode tulisan tangan: + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = ocr.OcrEngine() +ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN +``` + +Mengapa mengatur `recognition_mode`? Karena kebanyakan mesin OCR secara default mendeteksi teks cetak, yang sering mengabaikan lingkaran dan kemiringan pada catatan pribadi. Mengaktifkan mode tulisan tangan secara dramatis meningkatkan akurasi. + +## Langkah 3 – Muat Gambar yang Ingin Anda Konversi (Konversi Gambar Tulisan Tangan) + +Gambar adalah bahan mentah untuk setiap pekerjaan OCR. Pastikan foto Anda disimpan dalam format lossless (PNG sangat cocok) dan teksnya cukup terbaca. Kemudian muat seperti ini: + +```python +# Step 3: Load the handwritten image you want to convert +handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") +``` + +Jika gambar berada di samping skrip Anda, Anda cukup menggunakan `"hand_note.png"` alih‑alih path lengkap. + +> **Bagaimana jika gambar blur?** Coba pra‑proses dengan OpenCV (misalnya, `cv2.cvtColor` ke grayscale, `cv2.threshold` untuk meningkatkan kontras) sebelum memberikannya ke mesin OCR. + +## Langkah 4 – Jalankan Mesin Pengenalan untuk Mengekstrak Teks Tulisan Tangan + +Dengan mesin siap dan gambar di memori, kita akhirnya dapat **mengekstrak teks tulisan tangan**.
Metode `recognize` mengembalikan objek hasil mentah yang berisi teks serta skor kepercayaan. + +```python +# Step 4: Perform OCR and get the raw result +raw_result = ocr_engine.recognize(handwritten_image) +print("Raw OCR output:") +print(raw_result.text) +``` + +Output mentah biasanya dapat berisi jeda baris yang tidak diinginkan atau karakter yang salah dikenali, terutama jika tulisan tangan berantakan. Itulah mengapa langkah selanjutnya ada. + +## Langkah 5 – (Opsional) Poles Output dengan AI Post‑Processor + +Sebagian besar SDK OCR modern dilengkapi dengan AI post‑processor ringan yang membersihkan spasi, memperbaiki kesalahan OCR umum, dan menormalkan akhir baris. Menjalankannya semudah: + +```python +# Step 5: Refine the raw OCR output (handwritten note to text) +polished_result = ocr_engine.run_postprocessor(raw_result) + +# Display the cleaned, readable text +print("\nPolished OCR output:") +print(polished_result.text) +``` + +Jika Anda melewatkan langkah ini, Anda masih akan mendapatkan teks yang dapat digunakan, tetapi konversi **catatan tulisan tangan menjadi teks** akan terlihat agak kasar. Post‑processor sangat berguna untuk catatan yang berisi poin bullet atau kata dengan campuran huruf besar/kecil. + +## Langkah 6 – Verifikasi Hasil dan Tangani Kasus Edge + +Setelah mencetak hasil yang dipoles, periksa kembali bahwa semuanya terlihat benar. Berikut pemeriksaan cepat yang dapat Anda tambahkan: + +```python +# Step 6: Simple verification +if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") +else: + print("\n✅ OCR succeeded! You can now save or further process the text.") +``` + +**Daftar Periksa Kasus Edge** + +| Situation | What to do | +|-----------|------------| +| **Very low contrast** | Increase contrast with `cv2.convertScaleAbs` before loading. | +| **Multiple languages** | Set `ocr_engine.language = ["en", "es"]` (or your target languages). 
| +| **Large documents** | Process pages in batches to avoid memory spikes. | +| **Special symbols** | Add a custom dictionary via `ocr_engine.add_custom_words([...])`. | + +## Gambaran Visual + +Di bawah ini adalah gambar placeholder yang menggambarkan alur kerja—dari catatan yang difoto hingga teks bersih. Teks alt berisi kata kunci utama, menjadikan gambar SEO‑friendly. + +![cara menggunakan OCR pada gambar catatan tulisan tangan](/images/handwritten_ocr_flow.png "cara menggunakan OCR pada gambar catatan tulisan tangan") + +## Skrip Lengkap yang Dapat Dijalankan + +Menggabungkan semua bagian, berikut program lengkap yang siap disalin‑dan‑tempel: + +```python +# Complete script: Convert a handwritten image to clean text using OCR + +import ocr + +def main(): + # 1️⃣ Initialize OCR engine for handwritten recognition + ocr_engine = ocr.OcrEngine() + ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN + + # 2️⃣ Load the image containing the handwritten note + handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") + + # 3️⃣ Perform OCR to extract raw text + raw_result = ocr_engine.recognize(handwritten_image) + print("Raw OCR output:") + print(raw_result.text) + + # 4️⃣ (Optional) Run AI post‑processor for cleaner output + polished_result = ocr_engine.run_postprocessor(raw_result) + + # 5️⃣ Show the polished, readable text + print("\nPolished OCR output:") + print(polished_result.text) + + # 6️⃣ Simple sanity check + if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") + else: + print("\n✅ OCR succeeded! You can now save or further process the text.") + +if __name__ == "__main__": + main() +``` + +**Output yang Diharapkan (contoh)** + +``` +Raw OCR output: +T0d@y I w3nt to the market +and bought 5 aplpes, 2 bananas, +and a loaf of bread. + +Polished OCR output: +Today I went to the market and bought 5 apples, 2 bananas, and a loaf of bread. 
+``` + +Perhatikan bagaimana post‑processor memperbaiki typo “T0d@y” dan menormalkan spasi. + +## Kesalahan Umum & Pro Tips + +- **Ukuran gambar penting** – Mesin OCR biasanya membatasi ukuran input hingga 4 K × 4 K. Ubah ukuran foto besar sebelumnya. +- **Gaya tulisan tangan** – Tulisan kursif vs. huruf blok dapat memengaruhi akurasi. Jika Anda mengontrol sumber (misalnya, pena digital), dorong penggunaan huruf blok untuk hasil terbaik. +- **Pemrosesan batch** – Saat menangani puluhan catatan, bungkus skrip dalam loop dan simpan setiap hasil ke CSV atau DB SQLite. +- **Memory leak** – Beberapa SDK menyimpan buffer internal; panggil `ocr_engine.dispose()` setelah selesai jika Anda melihat penurunan performa. + +## Langkah Selanjutnya – Melampaui OCR Sederhana + +Setelah Anda menguasai **cara menggunakan OCR** untuk satu gambar, pertimbangkan ekstensi berikut: + +1. **Integrasikan dengan penyimpanan cloud** – Ambil gambar dari AWS S3 atau Azure Blob, jalankan pipeline yang sama, dan kirim kembali hasilnya. +2. **Tambahkan deteksi bahasa** – Gunakan `ocr_engine.detect_language()` untuk secara otomatis beralih kamus. +3. **Kombinasikan dengan NLP** – Masukkan teks bersih ke spaCy atau NLTK untuk mengekstrak entitas, tanggal, atau item tindakan. +4. **Buat endpoint REST** – Bungkus skrip dalam Flask atau FastAPI sehingga layanan lain dapat POST gambar dan menerima teks berformat JSON. + +Semua ide ini tetap berpusat pada konsep inti **mengenali teks tulisan tangan**, **mengekstrak teks tulisan tangan**, dan **mengonversi gambar tulisan tangan**—frasa tepat yang kemungkinan akan Anda cari selanjutnya. + +--- + +### TL;DR + +Kami menunjukkan **cara menggunakan OCR** untuk mengenali teks tulisan tangan, mengekstraknya, dan memoles hasil menjadi string yang dapat digunakan. Skrip lengkap siap dijalankan, alur kerja dijelaskan langkah demi langkah, dan Anda kini memiliki daftar periksa untuk kasus edge umum. 
Ambil foto catatan rapat berikutnya, masukkan ke skrip, dan biarkan mesin yang mengetik untuk Anda. + +Selamat coding, semoga catatan Anda selalu terbaca! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/indonesian/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md b/ocr/indonesian/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md new file mode 100644 index 000000000..6a9960417 --- /dev/null +++ b/ocr/indonesian/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md @@ -0,0 +1,186 @@ +--- +category: general +date: 2026-03-28 +description: Lakukan OCR pada gambar dan dapatkan teks bersih dengan koordinat kotak + pembatas. Pelajari cara mengekstrak OCR, membersihkan OCR, dan menampilkan hasil + langkah demi langkah. +draft: false +keywords: +- perform OCR on image +- how to extract OCR +- how to clean OCR +- display bounding box coordinates +- OCR post‑processing +- OCR bounding boxes +language: id +og_description: Lakukan OCR pada gambar, bersihkan outputnya, dan tampilkan koordinat + kotak pembatas dalam tutorial singkat. +og_title: Lakukan OCR pada Gambar – Hasil Bersih dan Kotak Pembatas +tags: +- OCR +- Computer Vision +- Python +title: Lakukan OCR pada Gambar – Bersihkan Hasil dan Tampilkan Koordinat Kotak Pembatas +url: /id/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Lakukan OCR pada Gambar – Bersihkan Hasil dan Tampilkan Koordinat Bounding Box + +Pernahkah Anda perlu **melakukan OCR pada gambar** tetapi terus mendapatkan teks yang berantakan dan tidak yakin di mana setiap kata berada pada gambar? 
Anda tidak sendirian. Dalam banyak proyek—digitalisasi faktur, pemindaian struk, atau ekstraksi teks sederhana—mendapatkan output OCR mentah hanyalah rintangan pertama. Kabar baiknya? Anda dapat membersihkan output tersebut dan langsung melihat koordinat bounding box setiap wilayah tanpa menulis banyak kode boilerplate. + +Dalam panduan ini kami akan menjelaskan **cara mengekstrak OCR**, menjalankan post‑processor **cara membersihkan OCR**, dan akhirnya **menampilkan koordinat bounding box** untuk setiap wilayah yang telah dibersihkan. Pada akhir panduan Anda akan memiliki satu skrip yang dapat dijalankan yang mengubah foto buram menjadi teks terstruktur yang rapi siap untuk pemrosesan selanjutnya. + +## Apa yang Anda Butuhkan + +- Python 3.9+ (sintaks di bawah bekerja pada 3.8 dan lebih baru) +- Mesin OCR yang mendukung `recognize(..., return_structured=True)` – misalnya, pustaka fiktif `engine` yang digunakan dalam contoh. Ganti dengan Tesseract, EasyOCR, atau SDK apa pun yang mengembalikan data wilayah. +- Familiaritas dasar dengan fungsi dan loop Python +- File gambar yang ingin Anda pindai (PNG, JPG, dll.) + +> **Tips Pro:** Jika Anda menggunakan Tesseract, fungsi `pytesseract.image_to_data` sudah memberikan bounding box. Anda dapat membungkus hasilnya dalam adaptor kecil yang meniru API `engine.recognize` yang ditunjukkan di bawah. + +--- + +![perform OCR on image example](image-placeholder.png "perform OCR on image example") + +*Teks alternatif: diagram yang menunjukkan cara melakukan OCR pada gambar dan memvisualisasikan koordinat bounding box* + +## Langkah 1 – Lakukan OCR pada Gambar dan Dapatkan Wilayah Terstruktur + +Hal pertama adalah meminta mesin OCR untuk mengembalikan tidak hanya teks biasa tetapi daftar terstruktur dari wilayah teks. Daftar ini berisi string mentah dan persegi panjang yang mengelilinginya. 
+ +```python +import engine # replace with your actual OCR library +from pathlib import Path + +# Load the image you want to process +image_path = Path("sample_invoice.jpg") +image = engine.load_image(image_path) + +# Step 1: Perform OCR and request a structured list of text regions +raw_result = engine.recognize(image, return_structured=True) +``` + +**Mengapa ini penting:** +Ketika Anda hanya meminta teks biasa, Anda kehilangan konteks spasial. Data terstruktur memungkinkan Anda kemudian **menampilkan koordinat bounding box**, menyelaraskan teks dengan tabel, atau memberikan lokasi yang tepat ke model selanjutnya. + +## Langkah 2 – Cara Membersihkan Output OCR dengan Post‑Processor + +Mesin OCR hebat dalam mengenali karakter, tetapi sering meninggalkan spasi berlebih, artefak pemutusan baris, atau simbol yang salah dikenali. Post‑processor menormalkan teks, memperbaiki kesalahan OCR umum, dan memangkas spasi kosong. + +```python +# Step 2: Clean the plain‑text of each region using the post‑processor +processed_result = engine.run_postprocessor(raw_result) +``` + +Jika Anda membuat pembersih sendiri, pertimbangkan: + +- Menghapus karakter non‑ASCII (`re.sub(r'[^\x00-\x7F]+',' ', text)`) +- Menggabungkan beberapa spasi menjadi satu spasi +- Menggunakan pemeriksa ejaan seperti `pyspellchecker` untuk typo yang jelas + +**Mengapa Anda harus peduli:** +String yang rapi membuat pencarian, pengindeksan, dan pipeline NLP selanjutnya jauh lebih dapat diandalkan. Dengan kata lain, **cara membersihkan OCR** sering menjadi perbedaan antara dataset yang dapat digunakan dan sakit kepala. + +## Langkah 3 – Tampilkan Koordinat Bounding Box untuk Setiap Wilayah yang Dibersihkan + +Sekarang teks sudah rapi, kami mengiterasi setiap wilayah, mencetak persegi panjangnya dan string yang telah dibersihkan. Ini adalah bagian di mana kami akhirnya **menampilkan koordinat bounding box**. 
```python
# Step 3 – Iterate over the cleaned regions and display their bounding box and text
for text_region in processed_result.regions:
    # Each region has a .bounding_box attribute (x, y, width, height)
    bbox = text_region.bounding_box
    print(f"[{bbox}] {text_region.text}")
```

**Contoh output**

```
[(34, 120, 210, 30)] Invoice #12345
[(34, 160, 420, 28)] Date: 2026‑03‑01
[(34, 200, 380, 28)] Total Amount: $1,254.00
```

Anda sekarang dapat memasukkan koordinat tersebut ke dalam pustaka gambar (mis., OpenCV) untuk menambahkan kotak pada gambar asli, atau menyimpannya dalam basis data untuk kueri di kemudian hari.

## Skrip Lengkap, Siap‑Jalankan

Berikut adalah program lengkap yang menggabungkan ketiga langkah. Ganti pemanggilan placeholder `engine` dengan SDK OCR Anda yang sebenarnya.

```python
#!/usr/bin/env python3
"""
Perform OCR on image → clean results → display bounding box coordinates.
Author: Your Name
Date: 2026‑03‑28
"""

import engine # <-- replace with your OCR library
from pathlib import Path
import sys

def main(image_path: str):
    # Load image
    image = engine.load_image(Path(image_path))

    # 1️⃣ Perform OCR and ask for structured output
    raw_result = engine.recognize(image, return_structured=True)

    # 2️⃣ Clean the raw text using the built‑in post‑processor
    processed_result = engine.run_postprocessor(raw_result)

    # 3️⃣ Show each region's bounding box and cleaned text
    print("\n=== Cleaned OCR Regions ===")
    for region in processed_result.regions:
        bbox = region.bounding_box # (x, y, w, h)
        print(f"[{bbox}] {region.text}")

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python perform_ocr.py <image_path>")
        sys.exit(1)
    main(sys.argv[1])
```

### Cara Menjalankan

```bash
python perform_ocr.py sample_invoice.jpg
```

Anda akan melihat daftar bounding box yang dipasangkan dengan teks yang telah dibersihkan, persis seperti contoh output di atas.
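Tips Pro di awal artikel menyebut wrapper tipis untuk mesin OCR yang tidak mendukung output terstruktur. Sebagai gambaran saja — kelas `TextRegion`, `StructuredResult`, dan fungsi `adapt_tesseract_data` di bawah ini hipotetis, bukan bagian dari SDK mana pun — berikut sketsa mandiri yang mengubah kamus bergaya `pytesseract.image_to_data(..., output_type=Output.DICT)` menjadi objek dengan atribut `text` dan `bounding_box` seperti yang dipakai loop pada Langkah 3. Data tiruan digunakan agar contoh ini dapat dijalankan tanpa Tesseract terpasang:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class TextRegion:
    text: str
    bounding_box: Tuple[int, int, int, int]  # (x, y, w, h)

@dataclass
class StructuredResult:
    regions: List[TextRegion]

def adapt_tesseract_data(data: Dict[str, list]) -> StructuredResult:
    """Convert a pytesseract-style dict into a structured result object."""
    regions = []
    for i, word in enumerate(data["text"]):
        if not word.strip():
            continue  # Tesseract often emits empty entries – skip them
        bbox = (data["left"][i], data["top"][i], data["width"][i], data["height"][i])
        regions.append(TextRegion(text=word, bounding_box=bbox))
    return StructuredResult(regions=regions)

# Mock data with the same shape as pytesseract's Output.DICT result
fake_data = {
    "text": ["Invoice", "#12345", ""],
    "left": [34, 110, 0],
    "top": [120, 120, 0],
    "width": [70, 60, 0],
    "height": [30, 30, 0],
}

result = adapt_tesseract_data(fake_data)
for r in result.regions:
    print(f"[{r.bounding_box}] {r.text}")
```

Dengan adaptor semacam ini, loop iterasi wilayah yang sama dapat dipakai apa adanya untuk back‑end OCR apa pun.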
+ +## Pertanyaan yang Sering Diajukan & Kasus Tepi + +| Question | Answer | +|----------|--------| +| **Bagaimana jika mesin OCR tidak mendukung `return_structured`?** | Tulislah wrapper tipis yang mengonversi output mentah mesin (biasanya daftar kata dengan koordinat) menjadi objek dengan atribut `text` dan `bounding_box`. | +| **Apakah saya dapat memperoleh skor kepercayaan?** | Banyak SDK menampilkan metrik kepercayaan per wilayah. Tambahkan ke pernyataan print: `print(f"[{bbox}] {region.text} (conf: {region.confidence:.2f})")`. | +| **Bagaimana menangani teks yang diputar?** | Pra‑proses gambar dengan `cv2.minAreaRect` milik OpenCV untuk meluruskan sebelum memanggil `recognize`. | +| **Bagaimana jika saya membutuhkan output dalam format JSON?** | Serialisasikan `processed_result.regions` dengan `json.dumps([r.__dict__ for r in processed_result.regions], indent=2)`. | +| **Apakah ada cara untuk memvisualisasikan kotak?** | Gunakan OpenCV: `cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0), 2)` di dalam loop, lalu `cv2.imwrite("annotated.jpg", img)`. | + +## Kesimpulan + +Anda baru saja mempelajari **cara melakukan OCR pada gambar**, membersihkan output mentah, dan **menampilkan koordinat bounding box** untuk setiap wilayah. Alur tiga langkah—recognize → post‑process → iterate—adalah pola yang dapat digunakan kembali yang dapat Anda masukkan ke dalam proyek Python apa pun yang membutuhkan ekstraksi teks yang dapat diandalkan. + +### Apa Selanjutnya? + +- **Jelajahi berbagai back‑end OCR** (Tesseract, EasyOCR, Google Vision) dan bandingkan akurasi. +- **Integrasikan dengan basis data** untuk menyimpan data wilayah untuk arsip yang dapat dicari. +- **Tambahkan deteksi bahasa** untuk mengarahkan setiap wilayah melalui pemeriksa ejaan yang sesuai. +- **Tumpangkan kotak pada gambar asli** untuk verifikasi visual (lihat cuplikan OpenCV di atas). 
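Baris FAQ tentang output JSON di atas dapat dikembangkan menjadi sketsa kecil yang utuh. Struktur kamus di bawah ini hanya ilustrasi (sesuaikan dengan objek wilayah SDK Anda yang sebenarnya), tetapi pola round‑trip simpan‑dan‑baca‑kembali yang ditunjukkan berlaku umum:

```python
import json

# OCR regions as plain dicts (illustrative structure, not a specific SDK API)
regions = [
    {"text": "Invoice #12345", "bounding_box": [34, 120, 210, 30]},
    {"text": "Total Amount: $1,254.00", "bounding_box": [34, 200, 380, 28]},
]

# Serialize so the result can be stored or sent to another service
payload = json.dumps({"regions": regions}, indent=2, ensure_ascii=False)
print(payload)

# Read it back and confirm nothing was lost in the round trip
restored = json.loads(payload)
assert restored["regions"] == regions
```

Format JSON semacam ini memudahkan penyimpanan ke basis data dokumen atau pengiriman lewat API tanpa kehilangan informasi koordinat.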
+ +Jika Anda menemukan keanehan, ingat bahwa kemenangan terbesar datang dari langkah post‑processing yang solid; string yang bersih jauh lebih mudah dikerjakan daripada dump karakter mentah. + +Selamat coding, dan semoga pipeline OCR Anda selalu rapi! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/indonesian/python/general/python-ocr-tutorial-extract-text-from-images/_index.md b/ocr/indonesian/python/general/python-ocr-tutorial-extract-text-from-images/_index.md new file mode 100644 index 000000000..2c807c5de --- /dev/null +++ b/ocr/indonesian/python/general/python-ocr-tutorial-extract-text-from-images/_index.md @@ -0,0 +1,233 @@ +--- +category: general +date: 2026-03-28 +description: Tutorial OCR Python yang menunjukkan cara mengekstrak teks dari gambar + dengan Aspose OCR Cloud. Pelajari cara memuat gambar untuk OCR dan mengonversi gambar + menjadi teks biasa dalam hitungan menit. +draft: false +keywords: +- python ocr tutorial +- extract text image python +- ocr image to text +- load image for ocr +- convert image plain text +language: id +og_description: Tutorial OCR Python menjelaskan cara memuat gambar untuk OCR dan mengonversi + gambar menjadi teks biasa menggunakan Aspose OCR Cloud. Dapatkan kode lengkap dan + tips. +og_title: Tutorial OCR Python – Ekstrak Teks dari Gambar +tags: +- OCR +- Python +- Image Processing +title: Tutorial OCR Python – Ekstrak Teks dari Gambar +url: /id/python/general/python-ocr-tutorial-extract-text-from-images/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Tutorial OCR Python – Ekstrak Teks dari Gambar + +Pernah bertanya-tanya bagaimana mengubah foto struk yang berantakan menjadi teks bersih yang dapat dicari? Anda tidak sendirian. 
Menurut pengalaman saya, hambatan terbesar bukanlah mesin OCR itu sendiri, melainkan cara menyiapkan gambar dalam format yang tepat dan mengekstrak teks polos tanpa masalah. + +**python ocr tutorial** ini memandu Anda melalui setiap langkah—memuat gambar untuk OCR, menjalankan pengenalan, dan akhirnya mengonversi teks polos gambar menjadi string Python yang dapat Anda simpan atau analisis. Pada akhirnya Anda akan dapat **extract text image python** dengan mudah, dan tidak memerlukan lisensi berbayar untuk memulai. + +## Apa yang Akan Anda Pelajari + +- Cara menginstal dan mengimpor Aspose OCR Cloud SDK untuk Python. +- Kode tepat untuk **load image for OCR** (PNG, JPEG, TIFF, PDF, dll.). +- Cara memanggil engine untuk melakukan konversi **ocr image to text**. +- Tips menangani edge‑case umum seperti PDF multi‑halaman atau pemindaian beresolusi rendah. +- Cara memverifikasi output dan apa yang harus dilakukan jika teks terlihat kacau. + +### Prasyarat + +- Python 3.8+ terinstal di mesin Anda. +- Akun Aspose Cloud gratis (versi percobaan berfungsi tanpa lisensi). +- Familiaritas dasar dengan pip dan lingkungan virtual—tidak ada yang rumit. + +> **Pro tip:** Jika Anda sudah menggunakan virtualenv, aktifkan sekarang. Ini menjaga dependensi Anda tetap rapi dan menghindari benturan versi. + +![Python OCR tutorial screenshot showing recognized text](path/to/ocr_example.png "Python OCR tutorial – extracted plain text display") + +## Langkah 1 – Instal Aspose OCR Cloud SDK + +Pertama-tama, kita membutuhkan pustaka yang berkomunikasi dengan layanan OCR Aspose. Buka terminal dan jalankan: + +```bash +pip install asposeocrcloud +``` + +Perintah tunggal itu mengunduh SDK terbaru (saat ini versi 23.12). Paket ini mencakup semua yang Anda perlukan—tanpa kebutuhan pustaka pemrosesan gambar tambahan. + +## Langkah 2 – Inisialisasi Engine OCR (Kata Kunci Utama dalam Aksi) + +Sekarang SDK sudah siap, kita dapat memulai engine **python ocr tutorial**. 
Konstruktor tidak memerlukan kunci lisensi untuk versi percobaan, sehingga proses menjadi sederhana. + +```python +import asposeocrcloud as ocr + +# Initialise the OCR engine – no licence needed for trial use +ocr_engine = ocr.OcrEngine() +``` + +> **Why this matters:** Inisialisasi engine hanya sekali menjaga panggilan selanjutnya tetap cepat. Jika Anda membuat ulang objek untuk setiap gambar, Anda akan membuang-buang perjalanan jaringan. + +## Langkah 3 – Muat Gambar untuk OCR + +Di sinilah kata kunci **load image for OCR** bersinar. Metode `Image.load` pada SDK menerima jalur file atau URL, dan secara otomatis mendeteksi formatnya (PNG, JPEG, TIFF, PDF, dll.). Mari muat contoh struk: + +```python +# Step 3: Load the input image (PNG, JPEG, TIFF, PDF …) +input_image = ocr.Image.load(r"YOUR_DIRECTORY/receipt.png") +``` + +Jika Anda berurusan dengan PDF multi‑halaman, cukup arahkan ke file PDF; SDK akan memperlakukan setiap halaman sebagai gambar terpisah secara internal. + +## Langkah 4 – Lakukan Konversi OCR Gambar ke Teks + +Dengan gambar berada di memori, OCR sebenarnya terjadi dalam satu baris. Metode `recognize` mengembalikan objek `OcrResult` yang berisi teks polos, skor kepercayaan, dan bahkan kotak pembatas jika Anda membutuhkannya nanti. + +```python +# Step 4: Perform OCR on the loaded image +ocr_result = ocr_engine.recognize(input_image) +``` + +> **Edge case:** Untuk gambar beresolusi rendah (di bawah 300 dpi) Anda mungkin ingin memperbesar gambar terlebih dahulu. SDK menyediakan helper `Resize`, tetapi untuk kebanyakan struk, pengaturan default sudah cukup baik. + +## Langkah 5 – Konversi Teks Biasa Gambar menjadi String yang Dapat Digunakan + +Bagian terakhir dari teka‑teki adalah mengekstrak teks polos dari objek hasil. Ini adalah langkah **convert image plain text** yang mengubah blob OCR menjadi sesuatu yang dapat Anda cetak, simpan, atau masukkan ke sistem lain. 
+ +```python +# Step 5: Output the recognised plain text +print("Plain OCR:") +print(ocr_result.text) +``` + +Saat Anda menjalankan skrip, Anda akan melihat sesuatu seperti: + +``` +Plain OCR: +Starbucks Coffee +Date: 2026/03/27 +Total: $4.75 +Thank you! +``` + +Output tersebut kini menjadi string Python biasa, siap untuk diekspor ke CSV, dimasukkan ke basis data, atau diproses lebih lanjut dengan natural‑language processing. + +## Menangani Kesulitan Umum + +### 1. Gambar Kosong atau Berisik + +Jika `ocr_result.text` kembali kosong, periksa kembali kualitas gambar. Solusi cepat adalah menambahkan langkah pra‑pemrosesan: + +```python +# Simple preprocessing – convert to grayscale and increase contrast +processed = input_image.to_grayscale().adjust_contrast(1.2) +ocr_result = ocr_engine.recognize(processed) +``` + +### 2. PDF Multi‑Halaman + +Saat Anda memberi PDF, `recognize` mengembalikan hasil untuk setiap halaman. Loop melalui hasil tersebut seperti ini: + +```python +pdf_image = ocr.Image.load("document.pdf") +pages = pdf_image.pages # collection of page images + +for i, page in enumerate(pages, start=1): + result = ocr_engine.recognize(page) + print(f"--- Page {i} ---") + print(result.text) +``` + +### 3. Dukungan Bahasa + +Aspose OCR mendukung lebih dari 60 bahasa. Untuk mengganti bahasa, atur properti `language` sebelum memanggil `recognize`: + +```python +ocr_engine.language = "fr" # French +ocr_result = ocr_engine.recognize(input_image) +``` + +## Contoh Lengkap yang Berfungsi + +Menggabungkan semuanya, berikut skrip lengkap siap salin‑tempel yang mencakup semua hal mulai dari instalasi hingga penanganan edge‑case: + +```python +# -*- coding: utf-8 -*- +""" +Python OCR tutorial – complete example using Aspose OCR Cloud. +Demonstrates loading an image, performing OCR, and handling multi‑page PDFs. 
+""" + +import asposeocrcloud as ocr +import os + +def ocr_file(filepath: str, language: str = "en"): + """ + Perform OCR on a given file (image or PDF) and return plain text. + """ + # Initialise engine (trial licence) + engine = ocr.OcrEngine() + engine.language = language + + # Load the file – SDK auto‑detects format + image = ocr.Image.load(filepath) + + # If it's a PDF, iterate over pages + if image.is_pdf: + all_text = [] + for page in image.pages: + result = engine.recognize(page) + all_text.append(result.text) + return "\n".join(all_text) + + # Single‑image case + result = engine.recognize(image) + return result.text + + +if __name__ == "__main__": + # Example usage – replace with your own path + sample_path = os.path.join("YOUR_DIRECTORY", "receipt.png") + + if not os.path.exists(sample_path): + raise FileNotFoundError(f"File not found: {sample_path}") + + extracted = ocr_file(sample_path) + print("=== Extracted Text ===") + print(extracted) +``` + +Jalankan skrip (`python ocr_demo.py`) dan Anda akan melihat output **ocr image to text** langsung di konsol Anda. + +## Ringkasan – Apa yang Telah Kita Bahas + +- Menginstal SDK **Aspose OCR Cloud** (`pip install asposeocrcloud`). +- **Inisialisasi engine OCR** tanpa lisensi (sempurna untuk percobaan). +- Menunjukkan cara **load image for OCR**, baik itu PNG, JPEG, atau PDF. +- Menjalankan konversi **ocr image to text** dan **converted image plain text** menjadi string Python yang dapat digunakan. +- Menangani kesulitan umum seperti pemindaian beresolusi rendah, PDF multi‑halaman, dan pemilihan bahasa. + +## Langkah Selanjutnya & Topik Terkait + +Sekarang Anda telah menguasai **python ocr tutorial**, pertimbangkan untuk menjelajahi: + +- **Extract text image python** untuk pemrosesan batch folder besar berisi kwitansi. +- Mengintegrasikan output OCR dengan **pandas** untuk analisis data (`df = pd.read_csv(StringIO(extracted))`). 
+- Menggunakan **Tesseract OCR** sebagai cadangan ketika konektivitas internet terbatas. +- Menambahkan post‑processing dengan **spaCy** untuk mengidentifikasi entitas seperti tanggal, jumlah, dan nama pedagang. + +Silakan bereksperimen: coba format gambar yang berbeda, sesuaikan kontras, atau ganti bahasa. Lanskap OCR sangat luas, dan keterampilan yang baru Anda dapatkan merupakan fondasi yang kuat untuk proyek otomatisasi dokumen apa pun. + +Selamat coding, semoga teks Anda selalu dapat dibaca! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/indonesian/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md b/ocr/indonesian/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md new file mode 100644 index 000000000..5c86c134e --- /dev/null +++ b/ocr/indonesian/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md @@ -0,0 +1,222 @@ +--- +category: general +date: 2026-03-28 +description: Pelajari cara menjalankan OCR pada gambar, mengunduh model Hugging Face + secara otomatis, membersihkan teks OCR, dan mengonfigurasi model LLM di Python menggunakan + Aspose OCR Cloud. +draft: false +keywords: +- run OCR on image +- download hugging face model +- clean OCR text +- configure LLM model +language: id +og_description: Jalankan OCR pada gambar dan bersihkan outputnya menggunakan model + Hugging Face yang diunduh secara otomatis. Panduan ini menunjukkan cara mengonfigurasi + model LLM di Python. 
+og_title: Jalankan OCR pada Gambar – Tutorial Lengkap Aspose OCR Cloud +tags: +- OCR +- Python +- LLM +- HuggingFace +title: Jalankan OCR pada Gambar dengan Aspose OCR Cloud – Panduan Langkah-demi-Langkah + Lengkap +url: /id/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Jalankan OCR pada Gambar – Tutorial Lengkap Aspose OCR Cloud + +Pernahkah Anda perlu menjalankan OCR pada file gambar tetapi output mentahnya terlihat berantakan? Menurut pengalaman saya, titik sakit terbesar bukanlah pengenalan itu sendiri—melainkan pembersihan. Untungnya, Aspose OCR Cloud memungkinkan Anda menambahkan post‑processor LLM yang dapat *membersihkan teks OCR* secara otomatis. Dalam tutorial ini kita akan membahas semua yang Anda perlukan: mulai dari **mengunduh model Hugging Face** hingga mengonfigurasi LLM, menjalankan mesin OCR, dan akhirnya memoles hasilnya. + +Pada akhir panduan ini Anda akan memiliki skrip siap‑jalankan yang: + +1. Mengambil model Qwen 2.5 yang ringkas dari Hugging Face (diunduh otomatis untuk Anda). +2. Mengonfigurasi model untuk menjalankan sebagian jaringan di GPU dan sisanya di CPU. +3. Menjalankan mesin OCR pada gambar catatan tulisan tangan. +4. Menggunakan LLM untuk membersihkan teks yang dikenali, memberikan output yang dapat dibaca manusia. + +> **Prasyarat** – Python 3.8+, paket `asposeocrcloud`, GPU dengan setidaknya 4 GB VRAM (opsional tetapi disarankan), dan koneksi internet untuk pengunduhan model pertama. + +--- + +## Apa yang Anda Butuhkan + +- **Aspose OCR Cloud SDK** – instal melalui `pip install asposeocrcloud`. +- **Sebuah gambar contoh** – misalnya `handwritten_note.jpg` yang ditempatkan di folder lokal. +- **Dukungan GPU** – jika Anda memiliki GPU yang mendukung CUDA, skrip akan memindahkan 30 lapisan; jika tidak, akan otomatis kembali ke CPU. 
+- **Izin menulis** – skrip menyimpan cache model di `YOUR_DIRECTORY`; pastikan folder tersebut ada. + +--- + +## Langkah 1 – Mengonfigurasi Model LLM (unduh model Hugging Face) + +Hal pertama yang kami lakukan adalah memberi tahu Aspose AI dari mana mengambil model. Kelas `AsposeAIModelConfig` menangani pengunduhan otomatis, kuantisasi, dan alokasi lapisan GPU. + +```python +import asposeocrcloud as ocr +from asposeocrcloud.ai import AsposeAI, AsposeAIModelConfig + +# ---------------------------------------------------------------------- +# Step 1: Model configuration – this will download the model if it’s missing +# ---------------------------------------------------------------------- +model_config = AsposeAIModelConfig( + allow_auto_download="true", # Enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", # Repo on Hugging Face + hugging_face_quantization="int8", # Small footprint, fast inference + gpu_layers=30, # 30 layers on GPU, rest on CPU + directory_model_path=r"YOUR_DIRECTORY" # Cache folder (optional) +) +``` + +**Mengapa ini penting** – Mengkuantisasi ke `int8` secara drastis mengurangi penggunaan memori (≈ 4 GB vs 12 GB). Membagi model antara GPU dan CPU memungkinkan Anda menjalankan LLM 3 miliar parameter bahkan pada RTX 3060 yang sederhana. Jika Anda tidak memiliki GPU, setel `gpu_layers=0` dan SDK akan menjaga semuanya di CPU. + +> **Tip:** Jalankan pertama kali akan mengunduh sekitar ~ 1,5 GB, jadi beri beberapa menit dan koneksi yang stabil. + +--- + +## Langkah 2 – Menginisialisasi Mesin AI dengan Konfigurasi Model + +Sekarang kami memulai mesin Aspose AI dan memberinya konfigurasi yang baru saja dibuat. 
+ +```python +# ---------------------------------------------------------------------- +# Step 2: Initialise the AI engine – pulls the model if needed +# ---------------------------------------------------------------------- +ocr_ai = AsposeAI() +ocr_ai.initialize(model_config) # This call blocks until the model is ready +``` + +**Apa yang terjadi di balik layar?** SDK memeriksa `directory_model_path` untuk model yang sudah ada. Jika menemukan versi yang cocok, ia memuatnya secara instan; jika tidak, ia mengunduh file GGUF dari Hugging Face, mengekstraknya, dan menyiapkan pipeline inferensi. + +--- + +## Langkah 3 – Membuat Mesin OCR dan Menambahkan Post‑Processor AI + +Mesin OCR melakukan pekerjaan berat dalam mengenali karakter. Dengan menambahkan `ocr_ai.run_postprocessor` kami mengaktifkan **pembersihan teks OCR** secara otomatis setelah pengenalan. + +```python +# ---------------------------------------------------------------------- +# Step 3: Build the OCR engine and bind the LLM post‑processor +# ---------------------------------------------------------------------- +ocr_engine = ocr.OcrEngine() +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=None) +``` + +**Mengapa menggunakan post‑processor?** OCR mentah sering menyertakan pemisahan baris di tempat yang salah, tanda baca yang terdeteksi keliru, atau simbol yang tidak diinginkan. LLM dapat menulis ulang output menjadi kalimat yang tepat, memperbaiki ejaan, dan bahkan menebak kata yang hilang—pada dasarnya mengubah dump mentah menjadi prosa yang dipoles. + +--- + +## Langkah 4 – Menjalankan OCR pada File Gambar + +Setelah semuanya terhubung, saatnya memberi gambar ke mesin. 
+ +```python +# ---------------------------------------------------------------------- +# Step 4: Load the image and run OCR +# ---------------------------------------------------------------------- +input_image = ocr.Image.load(r"YOUR_DIRECTORY/handwritten_note.jpg") +raw_result = ocr_engine.recognize(input_image) # Returns an OcrResult object +``` + +**Kasus tepi:** Jika gambar berukuran besar (> 5 MP), Anda mungkin ingin mengubah ukurannya terlebih dahulu untuk mempercepat pemrosesan. SDK menerima objek Pillow `Image`, jadi Anda dapat melakukan pra‑proses dengan `PIL.Image.thumbnail()` bila diperlukan. + +--- + +## Langkah 5 – Biarkan AI Membersihkan Teks yang Dikenali dan Tampilkan Kedua Versi + +Akhirnya kami memanggil post‑processor yang telah kami lampirkan sebelumnya. Langkah ini menunjukkan kontras antara *sebelum* dan *sesudah* pembersihan. + +```python +# ---------------------------------------------------------------------- +# Step 5: Clean the OCR output using the LLM and display both results +# ---------------------------------------------------------------------- +cleaned_result = ocr_engine.run_postprocessor(raw_result) + +print("=== Before AI ===") +print(raw_result.text) + +print("\n=== After AI ===") +print(cleaned_result.text) +``` + +### Output yang Diharapkan + +``` +=== Before AI === +Th1s 1s a h@ndwr1tt3n n0te. It c0nta1ns m1st@k3s, l1n3 br3aks, & sp3c!@l ch@r@ct3rs. + +=== After AI === +This is a handwritten note. It contains mistakes, line breaks, and special characters. +``` + +Perhatikan bagaimana LLM telah: + +- Memperbaiki kesalahan pengenalan OCR umum (`Th1s` → `This`). +- Menghapus simbol yang tidak diinginkan (`&` → `and`). +- Menormalkan pemisahan baris menjadi kalimat yang tepat. 
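Jika post‑processor LLM tidak tersedia (misalnya saat bekerja offline), sebagian koreksi pada daftar di atas dapat ditiru dengan aturan deterministik. Sketsa berikut murni ilustrasi — bukan bagian dari SDK Aspose, dan tabel substitusinya jauh lebih sederhana daripada yang dilakukan LLM — tetapi menunjukkan idenya:

```python
import re

# Rule-based fallback cleaner (illustrative only, not part of any SDK).
# Maps common OCR leet-speak confusions back to letters.
SUBSTITUTIONS = {"0": "o", "1": "i", "3": "e", "@": "a"}

def rough_clean(text: str) -> str:
    # Turn '&' into the word 'and', mirroring the example output above
    text = re.sub(r"\s*&\s*", " and ", text)

    def fix_word(match: re.Match) -> str:
        word = match.group(0)
        # Only repair tokens that contain letters;
        # pure numbers (prices, dates) are left untouched
        if any(c.isalpha() for c in word):
            return "".join(SUBSTITUTIONS.get(c, c) for c in word)
        return word

    cleaned = re.sub(r"\S+", fix_word, text)
    # Collapse line breaks and repeated spaces into single spaces
    return re.sub(r"\s+", " ", cleaned).strip()

print(rough_clean("Th1s 1s a h@ndwr1tt3n n0te.\nIt c0sts 15 dollars & 3 cents."))
```

Pendekatan berbasis aturan seperti ini cepat dan dapat diprediksi, tetapi tidak menangkap kesalahan yang butuh konteks kalimat — di situlah post‑processor LLM tetap unggul.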
+ +--- + +## 🎨 Ikhtisar Visual (Alur Kerja Jalankan OCR pada Gambar) + +![Jalankan OCR pada gambar workflow](run_ocr_on_image_workflow.png "Diagram yang menunjukkan alur kerja jalankan OCR pada gambar mulai dari pengunduhan model hingga output yang sudah dibersihkan") + +Diagram di atas merangkum seluruh pipeline: **unduh model Hugging Face → konfigurasikan LLM → inisialisasi AI → mesin OCR → post‑processor AI → bersihkan teks OCR**. + +--- + +## Pertanyaan Umum & Tips Pro + +### Bagaimana jika saya tidak memiliki GPU? + +Setel `gpu_layers=0` pada `AsposeAIModelConfig`. Model akan berjalan sepenuhnya di CPU, yang lebih lambat namun tetap berfungsi. Anda juga dapat beralih ke model yang lebih kecil (misalnya `Qwen/Qwen2.5-1.5B‑Instruct‑GGUF`) agar waktu inferensi tetap wajar. + +### Bagaimana cara mengubah model nanti? + +Cukup perbarui `hugging_face_repo_id` dan jalankan kembali `ocr_ai.initialize(model_config)`. SDK akan mendeteksi perubahan versi, mengunduh model baru, dan menggantikan file cache. + +### Bisakah saya menyesuaikan prompt post‑processor? + +Ya. Kirimkan kamus ke `custom_settings` dengan kunci `prompt_template`. Contohnya: + +```python +custom_prompt = { + "prompt_template": "Correct the following OCR text and keep line breaks:\n{ocr_text}" +} +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=custom_prompt) +``` + +### Haruskah saya menyimpan teks yang sudah dibersihkan ke file? + +Tentu saja. Setelah pembersihan Anda dapat menulis hasilnya ke file `.txt` atau `.json` untuk pemrosesan lebih lanjut: + +```python +with open("cleaned_note.txt", "w", encoding="utf-8") as f: + f.write(cleaned_result.text) +``` + +--- + +## Kesimpulan + +Kami baru saja menunjukkan cara **menjalankan OCR pada gambar** dengan Aspose OCR Cloud, secara otomatis **mengunduh model Hugging Face**, secara ahli **mengonfigurasi pengaturan model LLM**, dan akhirnya **membersihkan teks OCR** menggunakan post‑processor LLM yang kuat. 
Seluruh proses dapat dimasukkan ke dalam satu skrip Python yang mudah dijalankan dan berfungsi baik pada mesin dengan GPU maupun hanya CPU. + +Jika Anda sudah nyaman dengan pipeline ini, pertimbangkan untuk bereksperimen dengan: + +- **LLM yang berbeda** – coba `meta-llama/Meta-Llama-3-8B‑Instruct‑GGUF` untuk jendela konteks yang lebih besar. +- **Pemrosesan batch** – iterasi melalui folder gambar dan gabungkan hasil yang sudah dibersihkan ke dalam CSV. +- **Prompt khusus** – sesuaikan AI dengan domain Anda (dokumen hukum, catatan medis, dll.). + +Silakan ubah nilai `gpu_layers`, ganti model, atau pasang prompt Anda sendiri. Langit adalah batasnya, dan kode yang Anda miliki sekarang adalah landasan peluncuran. + +Selamat coding, semoga output OCR Anda selalu bersih! 🚀 + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/italian/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md b/ocr/italian/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md new file mode 100644 index 000000000..5dad2cbe6 --- /dev/null +++ b/ocr/italian/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md @@ -0,0 +1,225 @@ +--- +category: general +date: 2026-03-28 +description: Come usare l'OCR per riconoscere il testo scritto a mano nelle immagini. + Impara a estrarre il testo scritto a mano, convertire l'immagine scritta a mano + e ottenere risultati puliti rapidamente. +draft: false +keywords: +- how to use OCR +- recognize handwritten text +- extract handwritten text +- handwritten note to text +- convert handwritten image +language: it +og_description: Come usare l'OCR per riconoscere il testo scritto a mano. 
Questo tutorial + ti mostra passo passo come estrarre il testo scritto a mano dalle immagini e ottenere + risultati rifiniti. +og_title: Come usare l'OCR per riconoscere il testo scritto a mano – Guida completa +tags: +- OCR +- Handwriting Recognition +- Python +title: Come utilizzare l'OCR per riconoscere il testo scritto a mano – Guida completa +url: /it/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Come usare OCR per riconoscere il testo scritto a mano – Guida completa + +Come usare OCR per appunti scritti a mano è una domanda che molti sviluppatori si pongono quando hanno bisogno di digitalizzare schizzi, verbali di riunioni o idee annotate rapidamente. In questa guida percorreremo i passaggi esatti per riconoscere il testo scritto a mano, estrarre il testo scritto a mano e trasformare un'immagine scritta a mano in stringhe pulite e ricercabili. + +Se ti sei mai trovato a fissare una foto di una lista della spesa chiedendoti “Posso convertire quest’immagine scritta a mano in testo senza dover digitare tutto di nuovo?” – sei nel posto giusto. Alla fine avrai uno script pronto all’uso che trasforma una **handwritten note to text** in pochi secondi. + +## Cosa ti servirà + +- Python 3.8+ (il codice funziona con qualsiasi versione recente) +- La libreria `ocr` – installala con `pip install ocr-sdk` (sostituisci con il nome del pacchetto del tuo provider) +- Un’immagine chiara di una nota scritta a mano (`hand_note.png` nell’esempio) +- Un po’ di curiosità e un caffè ☕️ (opzionale ma consigliato) + +Nessun framework ingombrante, nessuna chiave cloud a pagamento – solo un motore locale che supporta **handwritten recognition** out of the box. + +## Step 1 – Install the OCR Package and Import It + +First things first, let’s get the right package on your machine. 
Open a terminal and run:
+
+```bash
+pip install ocr-sdk
+```
+
+Once the installation finishes, import the module in your script:
+
+```python
+# Step 1: Import the OCR SDK
+import ocr
+```
+
+> **Pro tip:** If you’re using a virtual environment, activate it before installing. That keeps your project tidy and avoids version clashes.
+
+## Step 2 – Create an OCR Engine and Enable Handwritten Mode
+
+Now we get to the heart of **how to use OCR** – we need an engine instance that knows we’re dealing with cursive strokes rather than printed fonts. The following snippet creates the engine and switches it to handwritten mode:
+
+```python
+# Step 2: Initialize the OCR engine for handwritten text
+ocr_engine = ocr.OcrEngine()
+ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN
+```
+
+Why set `recognition_mode`? Because most OCR engines default to printed‑text detection, which often skips the loops and slants of a personal note. Enabling handwritten mode boosts accuracy dramatically.
+
+## Step 3 – Load the Image You Want to Convert (Convert Handwritten Image)
+
+Images are the raw material for any OCR job. Make sure your picture is saved in a lossless format (PNG works great) and that the text is reasonably legible. Then load it like this:
+
+```python
+# Step 3: Load the handwritten image you want to convert
+handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png")
+```
+
+If the image lives next to your script, you can simply use `"hand_note.png"` instead of a full path.
+
+> **What if the image is blurry?** Try pre‑processing with OpenCV (e.g., `cv2.cvtColor` to grayscale, `cv2.threshold` to increase contrast) before feeding it to the OCR engine.
+
+## Step 4 – Run the Recognition Engine to Extract Handwritten Text
+
+With the engine ready and the image in memory, we can finally **extract handwritten text**. The `recognize` method returns a raw result object that contains the text plus confidence scores.
+ +```python +# Step 4: Perform OCR and get the raw result +raw_result = ocr_engine.recognize(handwritten_image) +print("Raw OCR output:") +print(raw_result.text) +``` + +Typical raw output might include stray line breaks or mis‑identified characters, especially if the handwriting is messy. That’s why the next step exists. + +## Step 5 – (Optional) Polish the Output with the AI Post‑Processor + +Most modern OCR SDKs ship with a lightweight AI post‑processor that cleans up spacing, fixes common OCR errors, and normalizes line endings. Running it is as easy as: + +```python +# Step 5: Refine the raw OCR output (handwritten note to text) +polished_result = ocr_engine.run_postprocessor(raw_result) + +# Display the cleaned, readable text +print("\nPolished OCR output:") +print(polished_result.text) +``` + +If you skip this step you’ll still get usable text, but the **handwritten note to text** conversion will look a bit rougher. The post‑processor is especially handy for notes that contain bullet points or mixed‑case words. + +## Step 6 – Verify the Result and Handle Edge Cases + +After printing the polished result, double‑check that everything looks right. Here’s a quick sanity check you can add: + +```python +# Step 6: Simple verification +if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") +else: + print("\n✅ OCR succeeded! You can now save or further process the text.") +``` + +**Edge‑case checklist** + +| Situation | What to do | +|-----------|------------| +| **Very low contrast** | Increase contrast with `cv2.convertScaleAbs` before loading. | +| **Multiple languages** | Set `ocr_engine.language = ["en", "es"]` (or your target languages). | +| **Large documents** | Process pages in batches to avoid memory spikes. | +| **Special symbols** | Add a custom dictionary via `ocr_engine.add_custom_words([...])`. 
| + +## Visual Overview + +Below is a placeholder image that illustrates the workflow—from a photographed note to clean text. The alt text contains the primary keyword, making the image SEO‑friendly. + +![how to use OCR on a handwritten note image](/images/handwritten_ocr_flow.png "how to use OCR on a handwritten note image") + +## Full, Runnable Script + +Putting all the pieces together, here’s the complete, copy‑and‑paste‑ready program: + +```python +# Complete script: Convert a handwritten image to clean text using OCR + +import ocr + +def main(): + # 1️⃣ Initialize OCR engine for handwritten recognition + ocr_engine = ocr.OcrEngine() + ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN + + # 2️⃣ Load the image containing the handwritten note + handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") + + # 3️⃣ Perform OCR to extract raw text + raw_result = ocr_engine.recognize(handwritten_image) + print("Raw OCR output:") + print(raw_result.text) + + # 4️⃣ (Optional) Run AI post‑processor for cleaner output + polished_result = ocr_engine.run_postprocessor(raw_result) + + # 5️⃣ Show the polished, readable text + print("\nPolished OCR output:") + print(polished_result.text) + + # 6️⃣ Simple sanity check + if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") + else: + print("\n✅ OCR succeeded! You can now save or further process the text.") + +if __name__ == "__main__": + main() +``` + +**Expected output (example)** + +``` +Raw OCR output: +T0d@y I w3nt to the market +and bought 5 aplpes, 2 bananas, +and a loaf of bread. + +Polished OCR output: +Today I went to the market and bought 5 apples, 2 bananas, and a loaf of bread. +``` + +Notice how the post‑processor fixed the “T0d@y” typo and normalized spacing. + +## Common Pitfalls & Pro Tips + +- **Image size matters** – OCR engines usually cap input size at 4 K × 4 K. Resize large photos beforehand. +- **Handwriting style** – Cursive vs. 
block letters can affect accuracy. If you control the source (e.g., a digital pen), encourage block letters for best results. +- **Batch processing** – When dealing with dozens of notes, wrap the script in a loop and store each result in a CSV or SQLite DB. +- **Memory leaks** – Some SDKs keep internal buffers; call `ocr_engine.dispose()` after you’re done if you notice a slowdown. + +## Next Steps – Going Beyond Simple OCR + +Now that you’ve mastered **how to use OCR** for a single image, consider these extensions: + +1. **Integrate with cloud storage** – Pull images from AWS S3 or Azure Blob, run the same pipeline, and push the results back. +2. **Add language detection** – Use `ocr_engine.detect_language()` to automatically switch dictionaries. +3. **Combine with NLP** – Feed the cleaned text into spaCy or NLTK to extract entities, dates, or action items. +4. **Create a REST endpoint** – Wrap the script in Flask or FastAPI so other services can POST images and receive JSON‑encoded text. + +All of these ideas still revolve around the core concepts of **recognize handwritten text**, **extract handwritten text**, and **convert handwritten image**—the exact phrases you’ll likely search for next. + +--- + +### TL;DR + +We showed you **how to use OCR** to recognize handwritten text, extract it, and polish the result into a usable string. The full script is ready to run, the workflow is explained step‑by‑step, and you now have a checklist for common edge cases. Grab a photo of your next meeting note, plug it into the script, and let the machine do the typing for you. + +Happy coding, and may your notes always be readable! 
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/italian/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md b/ocr/italian/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md new file mode 100644 index 000000000..c26d3aa27 --- /dev/null +++ b/ocr/italian/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md @@ -0,0 +1,187 @@ +--- +category: general +date: 2026-03-28 +description: Esegui OCR sull'immagine e ottieni testo pulito con le coordinate delle + bounding box. Scopri come estrarre l'OCR, pulire l'OCR e visualizzare i risultati + passo dopo passo. +draft: false +keywords: +- perform OCR on image +- how to extract OCR +- how to clean OCR +- display bounding box coordinates +- OCR post‑processing +- OCR bounding boxes +language: it +og_description: Esegui OCR sull'immagine, pulisci l'output e mostra le coordinate + del riquadro di delimitazione in un tutorial conciso. +og_title: Esegui OCR sull'immagine – risultati puliti e riquadri +tags: +- OCR +- Computer Vision +- Python +title: Esegui OCR sull'immagine – Risultati puliti e mostra le coordinate del riquadro + di delimitazione +url: /it/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Esegui OCR su Immagine – Pulizia dei Risultati e Visualizzazione delle Coordinate delle Bounding Box + +Hai mai dovuto **eseguire OCR su file immagine** ma hai ottenuto testo confuso e non sai dove si trovi ogni parola nella foto? Non sei solo. 
In molti progetti—digitalizzazione di fatture, scansione di ricevute o semplice estrazione di testo—ottenere l'output grezzo dell'OCR è solo il primo ostacolo. La buona notizia? Puoi pulire quell'output e vedere immediatamente le coordinate della bounding box di ogni regione senza scrivere una montagna di codice boilerplate.
+
+In questa guida vedremo **come estrarre OCR**, **come pulire OCR** con un post‑processore e, infine, come **visualizzare le coordinate della bounding box** per ogni regione pulita. Alla fine avrai uno script unico, eseguibile, che trasforma una foto sfocata in testo ordinato e strutturato, pronto per l'elaborazione successiva.
+
+## Cosa Ti Serve
+
+- Python 3.8+ (la sintassi qui sotto funziona su 3.8 e versioni successive)
+- Un motore OCR che supporti `recognize(..., return_structured=True)` – ad esempio, una libreria fittizia `engine` usata nello snippet. Sostituiscila con Tesseract, EasyOCR o qualsiasi SDK che restituisca dati di regione.
+- Familiarità di base con funzioni e cicli Python
+- Un file immagine che desideri analizzare (PNG, JPG, ecc.)
+
+> **Consiglio:** Se usi Tesseract, la funzione `pytesseract.image_to_data` fornisce già le bounding box. Puoi avvolgere il suo risultato in un piccolo adattatore che imiti l'API `engine.recognize` mostrata qui sotto.
+
+---
+
+![perform OCR on image example](image-placeholder.png "perform OCR on image example")
+
+*Testo alternativo: diagramma che mostra come eseguire OCR su immagine e visualizzare le coordinate delle bounding box*
+
+## Passo 1 – Esegui OCR su Immagine e Ottieni Regioni Strutturate
+
+La prima cosa è chiedere al motore OCR di restituire non solo testo semplice ma un elenco strutturato di regioni di testo. Questo elenco contiene la stringa grezza e il rettangolo che la racchiude.
+ +```python +import engine # replace with your actual OCR library +from pathlib import Path + +# Load the image you want to process +image_path = Path("sample_invoice.jpg") +image = engine.load_image(image_path) + +# Step 1: Perform OCR and request a structured list of text regions +raw_result = engine.recognize(image, return_structured=True) +``` + +**Perché è importante:** +Quando chiedi solo il testo semplice perdi il contesto spaziale. I dati strutturati ti permettono in seguito di **visualizzare le coordinate della bounding box**, allineare il testo con tabelle o fornire posizioni precise a un modello successivo. + +## Passo 2 – Come Pulire l'Output OCR con un Post‑Processore + +I motori OCR sono ottimi nel riconoscere i caratteri, ma spesso lasciano spazi superflui, artefatti di interruzioni di riga o simboli riconosciuti erroneamente. Un post‑processore normalizza il testo, corregge gli errori OCR comuni e rimuove gli spazi bianchi inutili. + +```python +# Step 2: Clean the plain‑text of each region using the post‑processor +processed_result = engine.run_postprocessor(raw_result) +``` + +Se costruisci il tuo pulitore, considera: + +- Rimuovere i caratteri non‑ASCII (`re.sub(r'[^\x00-\x7F]+',' ', text)`) +- Collassare più spazi in un singolo spazio +- Applicare un correttore ortografico come `pyspellchecker` per errori evidenti + +**Perché dovresti interessartene:** +Una stringa ordinata rende la ricerca, l'indicizzazione e le pipeline NLP successive molto più affidabili. In altre parole, **come pulire OCR** è spesso la differenza tra un dataset utilizzabile e un mal di testa. + +## Passo 3 – Visualizza le Coordinate della Bounding Box per Ogni Regione Pulita + +Ora che il testo è pulito, iteriamo su ogni regione, stampando il suo rettangolo e la stringa pulita. Questa è la parte in cui finalmente **visualizziamo le coordinate della bounding box**. 
+
+```python
+# Step 3 – Iterate over the cleaned regions and display their bounding box and text
+for text_region in processed_result.regions:
+    # Each region has a .bounding_box attribute (x, y, width, height)
+    bbox = text_region.bounding_box
+    print(f"[{bbox}] {text_region.text}")
+```
+
+**Output di esempio**
+
+```
+[(34, 120, 210, 30)] Invoice #12345
+[(34, 160, 420, 28)] Date: 2026‑03‑01
+[(34, 200, 380, 28)] Total Amount: $1,254.00
+```
+
+Ora puoi inserire quelle coordinate in una libreria di disegno (ad es., OpenCV) per sovrapporre le box sull'immagine originale, o salvarle in un database per query successive.
+
+## Script Completo, Pronto‑da‑Eseguire
+
+Di seguito trovi il programma completo che unisce tutti e tre i passaggi. Sostituisci le chiamate placeholder `engine` con il tuo SDK OCR reale.
+
+```python
+#!/usr/bin/env python3
+"""
+Perform OCR on image → clean results → display bounding box coordinates.
+Author: Your Name
+Date: 2026‑03‑28
+"""
+
+import engine  # <-- replace with your OCR library
+from pathlib import Path
+import sys
+
+def main(image_path: str):
+    # Load image
+    image = engine.load_image(Path(image_path))
+
+    # 1️⃣ Perform OCR and ask for structured output
+    raw_result = engine.recognize(image, return_structured=True)
+
+    # 2️⃣ Clean the raw text using the built‑in post‑processor
+    processed_result = engine.run_postprocessor(raw_result)
+
+    # 3️⃣ Show each region's bounding box and cleaned text
+    print("\n=== Cleaned OCR Regions ===")
+    for region in processed_result.regions:
+        bbox = region.bounding_box  # (x, y, w, h)
+        print(f"[{bbox}] {region.text}")
+
+if __name__ == "__main__":
+    if len(sys.argv) != 2:
+        print("Usage: python perform_ocr.py <image_path>")
+        sys.exit(1)
+    main(sys.argv[1])
+```
+
+### Come Eseguire
+
+```bash
+python perform_ocr.py sample_invoice.jpg
+```
+
+Dovresti vedere un elenco di bounding box associate al testo pulito, esattamente come l'output di esempio sopra.
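Una volta ottenute le coordinate, un'operazione ricorrente è ricostruire l'ordine di lettura delle regioni (dall'alto verso il basso, da sinistra a destra). Ecco uno sketch minimale e indipendente dall'SDK: la classe `Region` qui sotto è un segnaposto ipotetico che imita soltanto gli attributi `text` e `bounding_box` usati in questo tutorial.

```python
from dataclasses import dataclass


@dataclass
class Region:
    # Hypothetical stand-in mimicking the SDK region object used above
    text: str
    bounding_box: tuple  # (x, y, width, height)


def sort_reading_order(regions, line_tolerance=12):
    """Sort regions top-to-bottom, then left-to-right within each line.

    Two regions are treated as being on the same line when their top edges
    differ by less than `line_tolerance` pixels.
    """
    rows = []  # each entry: [representative_y, [regions on that line]]
    for region in sorted(regions, key=lambda r: r.bounding_box[1]):
        y = region.bounding_box[1]
        if rows and abs(y - rows[-1][0]) < line_tolerance:
            rows[-1][1].append(region)
        else:
            rows.append([y, [region]])
    ordered = []
    for _, row in rows:
        ordered.extend(sorted(row, key=lambda r: r.bounding_box[0]))
    return ordered


if __name__ == "__main__":
    demo = [
        Region("12345", (250, 118, 90, 30)),
        Region("Invoice", (34, 120, 210, 30)),
        Region("Date:", (34, 160, 80, 28)),
    ]
    print(" ".join(r.text for r in sort_reading_order(demo)))  # Invoice 12345 Date:
```

La tolleranza di 12 pixel è arbitraria: regolala in base all'altezza media del testo nelle tue immagini.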
+ +## Domande Frequenti & Casi Limite + +| Domanda | Risposta | +|----------|----------| +| **E se il motore OCR non supporta `return_structured`?** | Scrivi un wrapper leggero che converta l'output grezzo del motore (di solito un elenco di parole con coordinate) in oggetti con attributi `text` e `bounding_box`. | +| **Posso ottenere i punteggi di confidenza?** | Molti SDK espongono una metrica di confidenza per regione. Aggiungila alla stampa: `print(f"[{bbox}] {region.text} (conf: {region.confidence:.2f})")`. | +| **Come gestire testo ruotato?** | Pre‑processa l'immagine con `cv2.minAreaRect` di OpenCV per correggere l'inclinazione prima di chiamare `recognize`. | +| **E se ho bisogno dell'output in JSON?** | Serializza `processed_result.regions` con `json.dumps([r.__dict__ for r in processed_result.regions], indent=2)`. | +| **C'è un modo per visualizzare le box?** | Usa OpenCV: `cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0), 2)` dentro il ciclo, poi `cv2.imwrite("annotated.jpg", img)`. | + +## Conclusione + +Hai appena imparato **come eseguire OCR su immagine**, pulire l'output grezzo e **visualizzare le coordinate della bounding box** per ogni regione. Il flusso a tre step—riconoscimento → post‑processo → iterazione—è un modello riutilizzabile che puoi inserire in qualsiasi progetto Python che richieda un'estrazione di testo affidabile. + +### Qual è il Prossimo Passo? + +- **Esplora diversi back‑end OCR** (Tesseract, EasyOCR, Google Vision) e confronta l'accuratezza. +- **Integra con un database** per memorizzare i dati delle regioni in archivi ricercabili. +- **Aggiungi il rilevamento della lingua** per indirizzare ogni regione al correttore ortografico appropriato. +- **Sovrapponi le box sull'immagine originale** per una verifica visiva (vedi lo snippet OpenCV sopra). 
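A proposito della prima riga della tabella FAQ qui sopra: se il tuo motore non supporta `return_structured`, un wrapper leggero basta davvero. Uno sketch ipotetico che converte l'output in stile `pytesseract.image_to_data` (un dizionario di liste parallele con chiavi `text`, `left`, `top`, `width`, `height`) in oggetti con `text` e `bounding_box`; i nomi `SimpleRegion` e `wrap_word_data` sono inventati a scopo illustrativo.

```python
from dataclasses import dataclass


@dataclass
class SimpleRegion:
    # Minimal stand-in mimicking the attributes used in this tutorial
    text: str
    bounding_box: tuple  # (x, y, width, height)


def wrap_word_data(data):
    """Convert a pytesseract-style dict of parallel lists into region objects.

    `data` is expected to carry the keys 'text', 'left', 'top', 'width',
    'height' (as returned by `pytesseract.image_to_data` with Output.DICT).
    Entries whose text is empty are skipped.
    """
    regions = []
    for word, x, y, w, h in zip(
        data["text"], data["left"], data["top"], data["width"], data["height"]
    ):
        if word.strip():
            regions.append(SimpleRegion(text=word, bounding_box=(x, y, w, h)))
    return regions
```

Con un adattatore del genere, il ciclo del Passo 3 funziona invariato su qualunque back‑end OCR.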
+ +Se incontri stranezze, ricorda che il maggior vantaggio deriva da un solido passo di post‑processing; una stringa pulita è molto più facile da gestire rispetto a un dump grezzo di caratteri. + +Buon coding, e che le tue pipeline OCR siano sempre ordinate! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/italian/python/general/python-ocr-tutorial-extract-text-from-images/_index.md b/ocr/italian/python/general/python-ocr-tutorial-extract-text-from-images/_index.md new file mode 100644 index 000000000..6f29019fb --- /dev/null +++ b/ocr/italian/python/general/python-ocr-tutorial-extract-text-from-images/_index.md @@ -0,0 +1,233 @@ +--- +category: general +date: 2026-03-28 +description: Tutorial OCR in Python che mostra come estrarre testo da un'immagine + con Aspose OCR Cloud. Impara a caricare l'immagine per l'OCR e a convertire l'immagine + in testo semplice in pochi minuti. +draft: false +keywords: +- python ocr tutorial +- extract text image python +- ocr image to text +- load image for ocr +- convert image plain text +language: it +og_description: Il tutorial OCR in Python spiega come caricare un'immagine per l'OCR + e convertire il testo semplice dell'immagine usando Aspose OCR Cloud. Ottieni il + codice completo e i consigli. +og_title: Tutorial OCR Python – Estrai il testo dalle immagini +tags: +- OCR +- Python +- Image Processing +title: Tutorial OCR Python – Estrai il testo dalle immagini +url: /it/python/general/python-ocr-tutorial-extract-text-from-images/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Tutorial OCR Python – Estrai Testo dalle Immagini + +Ti sei mai chiesto come trasformare una foto di una ricevuta confusa in testo pulito e ricercabile? Non sei il solo. 
Nella mia esperienza, l'ostacolo più grande non è il motore OCR in sé, ma ottenere l'immagine nel formato corretto ed estrarre il testo semplice senza intoppi.
+
+Questo **python ocr tutorial** ti guida passo passo—caricamento di un'immagine per OCR, esecuzione del riconoscimento e, infine, conversione del testo semplice dell'immagine in una stringa Python che puoi memorizzare o analizzare. Alla fine sarai in grado di **extract text image python** con stile, e non avrai bisogno di alcuna licenza a pagamento per iniziare.
+
+## Cosa Imparerai
+
+- Come installare e importare l'Aspose OCR Cloud SDK per Python.
+- Il codice esatto per **load image for OCR** (PNG, JPEG, TIFF, PDF, ecc.).
+- Come chiamare il motore per eseguire la conversione **ocr image to text**.
+- Suggerimenti per gestire casi limite comuni come PDF multi‑pagina o scansioni a bassa risoluzione.
+- Modi per verificare l'output e cosa fare se il testo appare distorto.
+
+### Prerequisiti
+
+- Python 3.8+ installato sulla tua macchina.
+- Un account gratuito Aspose Cloud (la versione di prova funziona senza licenza).
+- Familiarità di base con pip e ambienti virtuali—nulla di complicato.
+
+> **Pro tip:** Se stai già usando un virtualenv, attivalo ora. Mantiene le dipendenze ordinate ed evita conflitti di versione.
+
+![Screenshot del tutorial OCR Python che mostra il testo riconosciuto](path/to/ocr_example.png "Tutorial OCR Python – visualizzazione del testo estratto")
+
+## Step 1 – Install the Aspose OCR Cloud SDK
+
+Prima di tutto, abbiamo bisogno della libreria che comunica con il servizio OCR di Aspose. Apri un terminale ed esegui:
+
+```bash
+pip install asposeocrcloud
+```
+
+Quel singolo comando scarica l'SDK più recente (attualmente versione 23.12). Il pacchetto include tutto il necessario—nessuna libreria di elaborazione immagini aggiuntiva è richiesta.
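Il pro tip sugli ambienti virtuali si può anche verificare da script. Uno sketch minimale basato solo sulla libreria standard (nessuna dipendenza dall'SDK): in un venv `sys.prefix` punta alla cartella dell'ambiente, mentre `sys.base_prefix` continua a puntare all'installazione di sistema.

```python
import sys


def in_virtualenv() -> bool:
    # In a venv, sys.prefix differs from sys.base_prefix;
    # legacy virtualenv exposes sys.real_prefix instead.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix) or hasattr(sys, "real_prefix")


if __name__ == "__main__":
    if in_virtualenv():
        print("Virtualenv attivo: puoi installare l'SDK in isolamento.")
    else:
        print("Attenzione: stai per installare nell'interprete di sistema.")
```

Un controllo del genere è comodo all'inizio di script di setup condivisi con il team.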
+ +## Step 2 – Initialise the OCR Engine (Primary Keyword in Action) + +Ora che l'SDK è pronto, possiamo avviare il motore del **python ocr tutorial**. Il costruttore non richiede alcuna chiave di licenza per la versione di prova, il che semplifica le cose. + +```python +import asposeocrcloud as ocr + +# Initialise the OCR engine – no licence needed for trial use +ocr_engine = ocr.OcrEngine() +``` + +> **Why this matters:** Inizializzare il motore una sola volta mantiene le chiamate successive rapide. Se ricrei l'oggetto per ogni immagine sprecherai round‑trip di rete. + +## Step 3 – Load Image for OCR + +Ecco dove brilla la keyword **load image for OCR**. Il metodo `Image.load` dell'SDK accetta un percorso file o un URL, e rileva automaticamente il formato (PNG, JPEG, TIFF, PDF, ecc.). Carichiamo una ricevuta di esempio: + +```python +# Step 3: Load the input image (PNG, JPEG, TIFF, PDF …) +input_image = ocr.Image.load(r"YOUR_DIRECTORY/receipt.png") +``` + +Se devi gestire un PDF multi‑pagina, punta semplicemente al file PDF; l'SDK tratterà ogni pagina come un'immagine separata internamente. + +## Step 4 – Perform OCR Image to Text Conversion + +Con l'immagine in memoria, l'effettivo OCR avviene in una sola riga. Il metodo `recognize` restituisce un oggetto `OcrResult` che contiene il testo semplice, i punteggi di confidenza e persino le bounding box se ti servono in seguito. + +```python +# Step 4: Perform OCR on the loaded image +ocr_result = ocr_engine.recognize(input_image) +``` + +> **Edge case:** Per foto a bassa risoluzione (meno di 300 dpi) potresti voler ingrandire l'immagine prima. L'SDK offre un helper `Resize`, ma per la maggior parte delle ricevute l'impostazione predefinita funziona bene. + +## Step 5 – Convert Image Plain Text to a Usable String + +L'ultimo pezzo del puzzle è estrarre il testo semplice dall'oggetto risultato. 
Questo è il passaggio **convert image plain text** che trasforma il blob OCR in qualcosa che puoi stampare, memorizzare o inviare a un altro sistema. + +```python +# Step 5: Output the recognised plain text +print("Plain OCR:") +print(ocr_result.text) +``` + +Quando esegui lo script, dovresti vedere qualcosa del genere: + +``` +Plain OCR: +Starbucks Coffee +Date: 2026/03/27 +Total: $4.75 +Thank you! +``` + +Quell'output è ora una normale stringa Python, pronta per l'esportazione CSV, l'inserimento in un database o l'elaborazione del linguaggio naturale. + +## Handling Common Pitfalls + +### 1. Blank or Noisy Images + +Se `ocr_result.text` ritorna vuoto, ricontrolla la qualità dell'immagine. Una soluzione rapida è aggiungere un passaggio di pre‑elaborazione: + +```python +# Simple preprocessing – convert to grayscale and increase contrast +processed = input_image.to_grayscale().adjust_contrast(1.2) +ocr_result = ocr_engine.recognize(processed) +``` + +### 2. Multi‑Page PDFs + +Quando fornisci un PDF, `recognize` restituisce risultati per ogni pagina. Scorri i risultati così: + +```python +pdf_image = ocr.Image.load("document.pdf") +pages = pdf_image.pages # collection of page images + +for i, page in enumerate(pages, start=1): + result = ocr_engine.recognize(page) + print(f"--- Page {i} ---") + print(result.text) +``` + +### 3. Language Support + +Aspose OCR supporta oltre 60 lingue. Per cambiare lingua, imposta la proprietà `language` prima di chiamare `recognize`: + +```python +ocr_engine.language = "fr" # French +ocr_result = ocr_engine.recognize(input_image) +``` + +## Full Working Example + +Mettendo tutto insieme, ecco uno script completo, pronto per il copia‑incolla, che copre dall'installazione alla gestione dei casi limite: + +```python +# -*- coding: utf-8 -*- +""" +Python OCR tutorial – complete example using Aspose OCR Cloud. +Demonstrates loading an image, performing OCR, and handling multi‑page PDFs. 
+""" + +import asposeocrcloud as ocr +import os + +def ocr_file(filepath: str, language: str = "en"): + """ + Perform OCR on a given file (image or PDF) and return plain text. + """ + # Initialise engine (trial licence) + engine = ocr.OcrEngine() + engine.language = language + + # Load the file – SDK auto‑detects format + image = ocr.Image.load(filepath) + + # If it's a PDF, iterate over pages + if image.is_pdf: + all_text = [] + for page in image.pages: + result = engine.recognize(page) + all_text.append(result.text) + return "\n".join(all_text) + + # Single‑image case + result = engine.recognize(image) + return result.text + + +if __name__ == "__main__": + # Example usage – replace with your own path + sample_path = os.path.join("YOUR_DIRECTORY", "receipt.png") + + if not os.path.exists(sample_path): + raise FileNotFoundError(f"File not found: {sample_path}") + + extracted = ocr_file(sample_path) + print("=== Extracted Text ===") + print(extracted) +``` + +Esegui lo script (`python ocr_demo.py`) e vedrai l'output **ocr image to text** direttamente nella console. + +## Recap – What We Covered + +- Installato l'SDK **Aspose OCR Cloud** (`pip install asposeocrcloud`). +- **Inizializzato il motore OCR** senza licenza (perfetto per la prova). +- Dimostrato come **load image for OCR**, sia PNG, JPEG o PDF. +- Eseguito la conversione **ocr image to text** e **convertito image plain text** in una stringa Python utilizzabile. +- Affrontato problemi comuni come scansioni a bassa risoluzione, PDF multi‑pagina e selezione della lingua. + +## Next Steps & Related Topics + +Ora che hai padroneggiato il **python ocr tutorial**, considera di approfondire: + +- **Extract text image python** per l'elaborazione batch di grandi cartelle di ricevute. +- Integrare l'output OCR con **pandas** per l'analisi dei dati (`df = pd.read_csv(StringIO(extracted))`). +- Usare **Tesseract OCR** come fallback quando la connettività internet è limitata. 
+- Aggiungere post‑processing con **spaCy** per identificare entità come date, importi e nomi dei commercianti. + +Sentiti libero di sperimentare: prova formati immagine diversi, regola il contrasto o cambia lingua. Il panorama OCR è ampio, e le competenze appena acquisite costituiscono una solida base per qualsiasi progetto di automazione documentale. + +Happy coding, and may your text always be readable! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/italian/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md b/ocr/italian/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md new file mode 100644 index 000000000..fcefea4d4 --- /dev/null +++ b/ocr/italian/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md @@ -0,0 +1,221 @@ +--- +category: general +date: 2026-03-28 +description: Impara come eseguire l'OCR su un'immagine, scaricare automaticamente + il modello Hugging Face, pulire il testo OCR e configurare il modello LLM in Python + usando Aspose OCR Cloud. +draft: false +keywords: +- run OCR on image +- download hugging face model +- clean OCR text +- configure LLM model +language: it +og_description: Esegui OCR sull'immagine e pulisci l'output usando un modello Hugging Face + scaricato automaticamente. Questa guida mostra come configurare il modello LLM in + Python. 
og_title: Esegui OCR su un'immagine – Tutorial completo di Aspose OCR Cloud
+tags:
+- OCR
+- Python
+- LLM
+- HuggingFace
+title: Esegui OCR su immagine con Aspose OCR Cloud – Guida completa passo‑passo
+url: /it/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/
+---
+
+{{< blocks/products/pf/main-wrap-class >}}
+{{< blocks/products/pf/main-container >}}
+{{< blocks/products/pf/tutorial-page-section >}}
+
+# Esegui OCR su Immagine – Tutorial Completo di Aspose OCR Cloud
+
+Hai mai dovuto eseguire OCR su file immagine ma l'output grezzo sembrava un pasticcio? Nella mia esperienza il punto dolente più grande non è il riconoscimento in sé, ma la pulizia. Fortunatamente, Aspose OCR Cloud ti permette di collegare un post‑processore LLM che può *pulire il testo OCR* automaticamente. In questo tutorial vedremo tutto ciò di cui hai bisogno: dal **download di un modello Hugging Face** alla configurazione dell'LLM, all'esecuzione del motore OCR e, infine, alla rifinitura del risultato.
+
+Al termine di questa guida avrai uno script pronto all'uso che:
+
+1. Scarica un modello compatto Qwen 2.5 da Hugging Face (scaricato automaticamente per te).
+2. Configura il modello per eseguire parte della rete su GPU e il resto su CPU.
+3. Esegue il motore OCR su un'immagine di una nota scritta a mano.
+4. Usa l'LLM per pulire il testo riconosciuto, fornendoti un output leggibile.
+
+> **Prerequisiti** – Python 3.8+, pacchetto `asposeocrcloud`, una GPU con almeno 4 GB di VRAM (opzionale ma consigliata) e una connessione internet per il primo download del modello.
+
+---
+
+## Cosa Ti Serve
+
+- **Aspose OCR Cloud SDK** – installalo con `pip install asposeocrcloud`.
+- **Un'immagine di esempio** – ad es. `handwritten_note.jpg` posizionata in una cartella locale.
+- **Supporto GPU** – se disponi di una GPU abilitata CUDA, lo script caricherà 30 layer sulla GPU; altrimenti tornerà automaticamente alla CPU.
+- **Permessi di scrittura** – lo script memorizza nella cache il modello in `YOUR_DIRECTORY`; assicurati che la cartella esista. + +--- + +## Passo 1 – Configura il Modello LLM (download modello Hugging Face) + +La prima cosa che facciamo è dire ad Aspose AI dove recuperare il modello. La classe `AsposeAIModelConfig` gestisce l'auto‑download, la quantizzazione e l'allocazione dei layer GPU. + +```python +import asposeocrcloud as ocr +from asposeocrcloud.ai import AsposeAI, AsposeAIModelConfig + +# ---------------------------------------------------------------------- +# Step 1: Model configuration – this will download the model if it’s missing +# ---------------------------------------------------------------------- +model_config = AsposeAIModelConfig( + allow_auto_download="true", # Enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", # Repo on Hugging Face + hugging_face_quantization="int8", # Small footprint, fast inference + gpu_layers=30, # 30 layers on GPU, rest on CPU + directory_model_path=r"YOUR_DIRECTORY" # Cache folder (optional) +) +``` + +**Perché è importante** – Quantizzare a `int8` riduce drasticamente l'uso di memoria (≈ 4 GB vs 12 GB). Dividere il modello tra GPU e CPU ti permette di eseguire un LLM da 3 miliardi di parametri anche su una modesta RTX 3060. Se non hai una GPU, imposta `gpu_layers=0` e l'SDK manterrà tutto sulla CPU. + +> **Suggerimento:** La prima esecuzione scaricherà ~ 1,5 GB, quindi concedi qualche minuto e una connessione stabile. + +--- + +## Passo 2 – Inizializza il Motore AI con la Configurazione del Modello + +Ora avviamo il motore Aspose AI e gli forniamo la configurazione appena creata. 
+ +```python +# ---------------------------------------------------------------------- +# Step 2: Initialise the AI engine – pulls the model if needed +# ---------------------------------------------------------------------- +ocr_ai = AsposeAI() +ocr_ai.initialize(model_config) # This call blocks until the model is ready +``` + +**Cosa succede dietro le quinte?** L'SDK controlla `directory_model_path` per un modello già presente. Se trova una versione corrispondente la carica immediatamente; altrimenti scarica il file GGUF da Hugging Face, lo decomprime e prepara la pipeline di inferenza. + +--- + +## Passo 3 – Crea il Motore OCR e Collega il Post‑Processore AI + +Il motore OCR si occupa del riconoscimento dei caratteri. Collegando `ocr_ai.run_postprocessor` abiliti **pulizia automatica del testo OCR** subito dopo il riconoscimento. + +```python +# ---------------------------------------------------------------------- +# Step 3: Build the OCR engine and bind the LLM post‑processor +# ---------------------------------------------------------------------- +ocr_engine = ocr.OcrEngine() +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=None) +``` + +**Perché usare un post‑processore?** L'OCR grezzo spesso contiene interruzioni di riga nei posti sbagliati, punteggiatura errata o simboli estranei. L'LLM può riscrivere l'output in frasi corrette, correggere l'ortografia e persino inferire parole mancanti—trasformando un dump grezzo in prosa levigata. + +--- + +## Passo 4 – Esegui OCR su un File Immagine + +Con tutto collegato, è il momento di fornire un'immagine al motore. 
+ +```python +# ---------------------------------------------------------------------- +# Step 4: Load the image and run OCR +# ---------------------------------------------------------------------- +input_image = ocr.Image.load(r"YOUR_DIRECTORY/handwritten_note.jpg") +raw_result = ocr_engine.recognize(input_image) # Returns an OcrResult object +``` + +**Caso limite:** Se l'immagine è grande (> 5 MP), potresti volerla ridimensionare prima per velocizzare l'elaborazione. L'SDK accetta un oggetto Pillow `Image`, quindi puoi pre‑processare con `PIL.Image.thumbnail()` se necessario. + +--- + +## Passo 5 – Lascia che l'AI Pulisca il Testo Riconosciuto e Mostra Entrambe le Versioni + +Infine invochiamo il post‑processore collegato in precedenza. Questo passaggio mostra il contrasto tra *prima* e *dopo* la pulizia. + +```python +# ---------------------------------------------------------------------- +# Step 5: Clean the OCR output using the LLM and display both results +# ---------------------------------------------------------------------- +cleaned_result = ocr_engine.run_postprocessor(raw_result) + +print("=== Before AI ===") +print(raw_result.text) + +print("\n=== After AI ===") +print(cleaned_result.text) +``` + +### Output Atteso + +``` +=== Before AI === +Th1s 1s a h@ndwr1tt3n n0te. It c0nta1ns m1st@k3s, l1n3 br3aks, & sp3c!@l ch@r@ct3rs. + +=== After AI === +This is a handwritten note. It contains mistakes, line breaks, and special characters. +``` + +Nota come l'LLM abbia: + +- Corretto errori comuni di OCR (`Th1s` → `This`). +- Rimosso simboli estranei (`&` → `and`). +- Normalizzato le interruzioni di riga in frasi corrette. 
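Per farsi un'idea del tipo di correzioni che applica il post‑processore, ecco uno sketch puramente illustrativo in Python standard: non è l'LLM, solo una tabella di sostituzione per i refusi OCR mostrati sopra. Nota che una mappa così ingenua altererebbe anche le cifre legittime – è proprio per questo che un modello linguistico fa la differenza.

```python
# Toy illustration only – NOT the LLM post-processor, just a lookup table
LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "@": "a", "!": "i"})

def toy_clean(text: str) -> str:
    # Fix leet-style OCR mistakes, expand "&", and normalise whitespace/line breaks
    cleaned = text.translate(LEET_MAP).replace(" & ", " and ")
    return " ".join(cleaned.split())

raw = "Th1s 1s a h@ndwr1tt3n n0te.\nIt c0nta1ns m1st@k3s."
print(toy_clean(raw))  # → This is a handwritten note. It contains mistakes.
```

Un LLM fa molto di più (contesto, ortografia, parole mancanti), ma lo sketch rende l'idea del contrasto prima/dopo.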
+ +--- + +## 🎨 Panoramica Visiva (Workflow Esegui OCR su Immagine) + +![Run OCR on image workflow](run_ocr_on_image_workflow.png "Diagram showing the run OCR on image pipeline from model download to cleaned output") + +Il diagramma sopra riassume l'intera pipeline: **download modello Hugging Face → configura LLM → inizializza AI → motore OCR → post‑processore AI → testo OCR pulito**. + +--- + +## Domande Frequenti & Pro Tips + +### E se non ho una GPU? + +Imposta `gpu_layers=0` in `AsposeAIModelConfig`. Il modello verrà eseguito interamente su CPU, più lentamente ma comunque funzionante. Puoi anche passare a un modello più piccolo (ad es., `Qwen/Qwen2.5-1.5B‑Instruct‑GGUF`) per mantenere tempi di inferenza ragionevoli. + +### Come cambio modello in seguito? + +Basta aggiornare `hugging_face_repo_id` e rieseguire `ocr_ai.initialize(model_config)`. L'SDK rileverà il cambiamento di versione, scaricherà il nuovo modello e sostituirà i file nella cache. + +### Posso personalizzare il prompt del post‑processore? + +Sì. Passa un dizionario a `custom_settings` con la chiave `prompt_template`. Per esempio: + +```python +custom_prompt = { + "prompt_template": "Correct the following OCR text and keep line breaks:\n{ocr_text}" +} +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=custom_prompt) +``` + +### Devo salvare il testo pulito su file? + +Assolutamente. Dopo la pulizia puoi scrivere il risultato in un file `.txt` o `.json` per ulteriori elaborazioni: + +```python +with open("cleaned_note.txt", "w", encoding="utf-8") as f: + f.write(cleaned_result.text) +``` + +--- + +## Conclusione + +Ti abbiamo appena mostrato come **eseguire OCR su file immagine** con Aspose OCR Cloud, **scaricare automaticamente un modello Hugging Face**, configurare con maestria le impostazioni del **modello LLM** e infine **pulire il testo OCR** usando un potente post‑processore LLM. 
L'intero processo è racchiuso in un unico script Python facile da eseguire e funziona sia su macchine con GPU sia su quelle solo CPU. + +Se ti trovi a tuo agio con questa pipeline, sperimenta con: + +- **LLM diversi** – prova `meta-llama/Meta-Llama-3-8B‑Instruct‑GGUF` per una finestra di contesto più ampia. +- **Elaborazione batch** – itera su una cartella di immagini e aggrega i risultati puliti in un CSV. +- **Prompt personalizzati** – adatta l'AI al tuo dominio (documenti legali, note mediche, ecc.). + +Sentiti libero di modificare il valore `gpu_layers`, cambiare modello o inserire il tuo prompt. Il cielo è il limite, e il codice che hai ora è la rampa di lancio. + +Buon coding, e che i tuoi output OCR siano sempre puliti! 🚀 + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/japanese/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md b/ocr/japanese/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md new file mode 100644 index 000000000..3d8f14bd8 --- /dev/null +++ b/ocr/japanese/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md @@ -0,0 +1,221 @@ +--- +category: general +date: 2026-03-28 +description: 画像内の手書きテキストを認識するためのOCRの使い方。手書きテキストの抽出、手書き画像の変換、そして迅速にきれいな結果を得る方法を学びましょう。 +draft: false +keywords: +- how to use OCR +- recognize handwritten text +- extract handwritten text +- handwritten note to text +- convert handwritten image +language: ja +og_description: OCR を使って手書き文字を認識する方法。このチュートリアルでは、画像から手書き文字を抽出し、洗練された結果を得るまでの手順をステップバイステップで示します。 +og_title: OCRを使って手書き文字を認識する方法 – 完全ガイド +tags: +- OCR +- Handwriting Recognition +- Python +title: OCRを使って手書き文字を認識する方法 – 完全ガイド +url: /ja/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/ +--- + +{{< 
blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# 手書きテキストを認識するための OCR の使い方 – 完全ガイド + +手書きメモに OCR を使用する方法は、スケッチや会議の議事録、または簡単なアイデアを書き留めたものをデジタル化する必要がある開発者がよく抱く質問です。このガイドでは、手書きテキストを認識し、抽出し、手書き画像をクリーンで検索可能な文字列に変換する正確な手順を順に解説します。 + +もし、買い物リストの写真を見て「この手書き画像をもう一度入力せずにテキストに変換できないか?」と思ったことがあるなら、ここがその場所です。最終的には、**handwritten note to text** を数秒で実行できるスクリプトが用意できます。 + +## 必要なもの + +- Python 3.8+(コードは最新バージョンで動作します) +- `ocr` ライブラリ – `pip install ocr-sdk` でインストールします(プロバイダーのパッケージ名に置き換えてください) +- 手書きノートの鮮明な画像(例では `hand_note.png`) +- 少しの好奇心とコーヒー ☕️(任意ですが推奨) + +重いフレームワークや有料のクラウドキーは不要です – すぐに使える **handwritten recognition** をサポートするローカルエンジンだけです。 + +## Step 1 – OCR パッケージのインストールとインポート + +まずは、必要なパッケージをマシンにインストールしましょう。ターミナルを開いて次のコマンドを実行します: + +```bash +pip install ocr-sdk +``` + +インストールが完了したら、スクリプトでモジュールをインポートします: + +```python +# Step 1: Import the OCR SDK +import ocr +``` + +> **Pro tip:** 仮想環境を使用している場合は、インストール前にそれをアクティブにしてください。これによりプロジェクトが整理され、バージョン衝突を防げます。 + +## Step 2 – OCR エンジンの作成と手書きモードの有効化 + +ここで本当に **how to use OCR**(OCR の使い方)です – 印刷フォントではなく手書きの筆跡を扱うことをエンジンに認識させるインスタンスが必要です。以下のスニペットはエンジンを作成し、手書きモードに切り替えます: + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = ocr.OcrEngine() +ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN +``` + +`recognition_mode` を設定する理由は何ですか?ほとんどの OCR エンジンはデフォルトで印刷テキスト検出になっており、個人のメモのループや斜めの線を見逃しがちです。手書きモードを有効にすると、精度が大幅に向上します。 + +## Step 3 – 変換したい画像をロードする(手書き画像の変換) + +画像は OCR の原料です。画像はロスレス形式(PNG が最適)で保存し、テキストが十分に判読できることを確認してください。次のようにロードします: + +```python +# Step 3: Load the handwritten image you want to convert +handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") +``` + +画像がスクリプトと同じディレクトリにある場合は、フルパスの代わりに `"hand_note.png"` を使用できます。 + +> **What if the image is blurry?** 画像がぼやけている場合は、OCR エンジンに渡す前に OpenCV で前処理(例:`cv2.cvtColor` でグレースケール化、`cv2.threshold` でコントラスト向上)を試してください。 + +## Step 4 – 認識エンジンを実行して手書きテキストを抽出する + 
+エンジンが準備でき、画像がメモリにロードされたら、ついに **extract handwritten text**(手書きテキストの抽出)を行えます。`recognize` メソッドはテキストと信頼度スコアを含む生の結果オブジェクトを返します。 + +```python +# Step 4: Perform OCR and get the raw result +raw_result = ocr_engine.recognize(handwritten_image) +print("Raw OCR output:") +print(raw_result.text) +``` + +典型的な生データには余分な改行や誤認識文字が含まれることがあります。特に手書きが乱れている場合は顕著です。そのため次のステップが用意されています。 + +## Step 5 – (オプション)AI ポストプロセッサで出力を整える + +最新の OCR SDK の多くは、スペースを整え、一般的な OCR エラーを修正し、改行を正規化する軽量 AI ポストプロセッサを同梱しています。実行は次のように簡単です: + +```python +# Step 5: Refine the raw OCR output (handwritten note to text) +polished_result = ocr_engine.run_postprocessor(raw_result) + +# Display the cleaned, readable text +print("\nPolished OCR output:") +print(polished_result.text) +``` + +このステップを省略しても使用可能なテキストは得られますが、**handwritten note to text** の変換はやや粗くなります。ポストプロセッサは、箇条書きや大小文字が混在するノートに特に便利です。 + +## Step 6 – 結果を検証し、エッジケースに対処する + +整形された結果を出力したら、内容が正しいか二重チェックします。以下は追加できる簡単なサニティチェックです: + +```python +# Step 6: Simple verification +if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") +else: + print("\n✅ OCR succeeded! 
You can now save or further process the text.") +``` + +**エッジケースチェックリスト** + +| 状況 | 対応策 | +|-----------|------------| +| **Very low contrast** | ロード前に `cv2.convertScaleAbs` でコントラストを上げます。 | +| **Multiple languages** | `ocr_engine.language = ["en", "es"]`(または対象言語)を設定します。 | +| **Large documents** | メモリ急増を防ぐためにページをバッチ処理します。 | +| **Special symbols** | `ocr_engine.add_custom_words([...])` でカスタム辞書を追加します。 | + +## ビジュアル概要 + +以下は、撮影したノートからクリーンなテキストへと変換するワークフローを示すプレースホルダー画像です。alt テキストには主要キーワードが含まれており、画像の SEO に有利です。 + +![手書きノート画像で OCR を使用する方法](/images/handwritten_ocr_flow.png "手書きノート画像で OCR を使用する方法") + +## 完全な実行可能スクリプト + +すべてのパーツを組み合わせた、コピー&ペーストで実行できる完全なプログラムは以下です: + +```python +# Complete script: Convert a handwritten image to clean text using OCR + +import ocr + +def main(): + # 1️⃣ Initialize OCR engine for handwritten recognition + ocr_engine = ocr.OcrEngine() + ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN + + # 2️⃣ Load the image containing the handwritten note + handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") + + # 3️⃣ Perform OCR to extract raw text + raw_result = ocr_engine.recognize(handwritten_image) + print("Raw OCR output:") + print(raw_result.text) + + # 4️⃣ (Optional) Run AI post‑processor for cleaner output + polished_result = ocr_engine.run_postprocessor(raw_result) + + # 5️⃣ Show the polished, readable text + print("\nPolished OCR output:") + print(polished_result.text) + + # 6️⃣ Simple sanity check + if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") + else: + print("\n✅ OCR succeeded! You can now save or further process the text.") + +if __name__ == "__main__": + main() +``` + +**期待される出力(例)** + +``` +Raw OCR output: +T0d@y I w3nt to the market +and bought 5 aplpes, 2 bananas, +and a loaf of bread. + +Polished OCR output: +Today I went to the market and bought 5 apples, 2 bananas, and a loaf of bread. 
+``` + +ポストプロセッサが “T0d@y” のタイプミスを修正し、スペースを正規化したことに注目してください。 + +## よくある落とし穴とプロのコツ + +- **Image size matters** – OCR エンジンは通常、入力サイズを 4 K × 4 K に制限します。大きな写真は事前にリサイズしてください。 +- **Handwriting style** – Cursive とブロック文字は精度に影響します。ソースを制御できる場合(例:デジタルペン)、ベストな結果のためにブロック文字を推奨します。 +- **Batch processing** – 数十枚のノートを処理する場合は、スクリプトをループで包み、各結果を CSV または SQLite DB に保存します。 +- **Memory leaks** – 一部の SDK は内部バッファを保持します。遅延が見られたら、完了後に `ocr_engine.dispose()` を呼び出してください。 + +## 次のステップ – シンプル OCR を超えて + +単一画像に対する **how to use OCR**(OCR の使い方)を習得したので、以下の拡張を検討してください: + +1. **Integrate with cloud storage** – AWS S3 や Azure Blob から画像を取得し、同じパイプラインを実行して結果をプッシュします。 +2. **Add language detection** – `ocr_engine.detect_language()` を使用して自動的に辞書を切り替えます。 +3. **Combine with NLP** – 整形されたテキストを spaCy や NLTK に渡し、エンティティ、日付、アクション項目を抽出します。 +4. **Create a REST endpoint** – スクリプトを Flask または FastAPI でラップし、他のサービスが画像を POST して JSON 形式のテキストを受け取れるようにします。 + +これらのアイデアはすべて、**recognize handwritten text**、**extract handwritten text**、**convert handwritten image** というコア概念を中心にしています – 次に検索するであろう正確なフレーズです。 + +--- + +### TL;DR + +私たちは **how to use OCR**(OCR の使い方)で手書きテキストを認識し、抽出し、結果を使える文字列に整形する方法を示しました。完全なスクリプトはすぐに実行可能で、ワークフローはステップバイステップで説明され、一般的なエッジケースのチェックリストも用意しました。次の会議ノートの写真を撮り、スクリプトに投入すれば、機械が代わりに入力してくれます。 + +コーディングを楽しんで、ノートが常に読めますように! 
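## 付録 – バッチ処理のスケッチ

「よくある落とし穴とプロのコツ」で触れた CSV への集約は、標準ライブラリだけで次のように書けます。`recognize_note` は本文のパイプライン(認識 → 整形)を表す仮のスタブで、実際には `ocr_engine.recognize` と `run_postprocessor` の呼び出しに差し替えてください:

```python
import csv
import io
from pathlib import Path

def recognize_note(path: Path) -> str:
    # Hypothetical stub – replace with the recognise/post-process pipeline above
    return f"text from {path.name}"

def notes_to_csv(paths, out) -> None:
    # Aggregate one row per note: filename + recognised text
    writer = csv.writer(out)
    writer.writerow(["filename", "text"])
    for path in paths:
        writer.writerow([path.name, recognize_note(path)])

# Example: write to an in-memory buffer
# (for a real file, use open("notes.csv", "w", newline="") instead)
buffer = io.StringIO()
notes_to_csv([Path("note1.png"), Path("note2.png")], buffer)
print(buffer.getvalue())
```

実ファイルに書き出す場合は `newline=""` を指定してください(`csv` モジュールの改行処理のためです)。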
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/japanese/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md b/ocr/japanese/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md new file mode 100644 index 000000000..19e26425e --- /dev/null +++ b/ocr/japanese/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md @@ -0,0 +1,183 @@ +--- +category: general +date: 2026-03-28 +description: 画像に対して OCR を実行し、バウンディングボックス座標付きのクリーンなテキストを取得します。OCR の抽出、クリーンアップ、結果のステップバイステップ表示方法を学びましょう。 +draft: false +keywords: +- perform OCR on image +- how to extract OCR +- how to clean OCR +- display bounding box coordinates +- OCR post‑processing +- OCR bounding boxes +language: ja +og_description: 画像でOCRを実行し、出力をクリーンアップし、簡潔なチュートリアルでバウンディングボックスの座標を表示します。 +og_title: 画像でOCRを実行 – 結果とバウンディングボックスをクリーンに +tags: +- OCR +- Computer Vision +- Python +title: 画像でOCRを実行 – 結果をクリーンにし、バウンディングボックス座標を表示 +url: /ja/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# 画像でOCRを実行 – 結果をクリーンアップしバウンディングボックス座標を表示 + +画像ファイルで **OCRを実行** したいけれど、テキストが乱雑で画像上の各単語の位置が分からない、という経験はありませんか? 
多くのプロジェクト—請求書のデジタル化、レシートのスキャン、シンプルなテキスト抽出—で、OCR の生データを取得するだけでも最初のハードルです。 良いニュースは、出力をクリーンにして、膨大なボイラープレートコードを書かずに各領域のバウンディングボックス座標を即座に確認できることです。 + +このガイドでは **OCR の抽出方法**、**OCR をクリーンアップする方法** のポストプロセッサ、そして最終的に **バウンディングボックス座標を表示** する手順を順に解説します。 最後まで読むと、ぼやけた写真を整然とした構造化テキストに変換し、下流処理にすぐ使える単一の実行可能スクリプトが手に入ります。 + +## 必要なもの + +- Python 3.9+(以下の構文は3.8以降で動作します) +- `recognize(..., return_structured=True)` をサポートする OCR エンジン(例: スニペットで使用されている架空の `engine` ライブラリ)。Tesseract、EasyOCR、または領域データを返す任意の SDK に置き換えてください。 +- Python の関数とループに関する基本的な知識 +- スキャンしたい画像ファイル(PNG、JPG など) + +> **Pro tip:** Tesseract を使用している場合、`pytesseract.image_to_data` 関数はすでにバウンディングボックスを提供します。その結果を小さなアダプタでラップして、下記の `engine.recognize` API と同じ形にすることができます。 + +--- + +![画像でOCRを実行しバウンディングボックス座標を可視化する図](image-placeholder.png "画像でOCRを実行しバウンディングボックス座標を可視化する図") + +*Alt text: diagram showing how to perform OCR on image and visualize bounding box coordinates* + +## ステップ 1 – 画像でOCRを実行し構造化された領域を取得 + +最初に OCR エンジンに、単なるプレーンテキストではなく、テキスト領域の構造化リストを返すよう指示します。このリストは生文字列とそれを囲む矩形を含みます。 + +```python +import engine # replace with your actual OCR library +from pathlib import Path + +# Load the image you want to process +image_path = Path("sample_invoice.jpg") +image = engine.load_image(image_path) + +# Step 1: Perform OCR and request a structured list of text regions +raw_result = engine.recognize(image, return_structured=True) +``` + +**なぜ重要か:** +プレーンテキストだけを取得すると空間的コンテキストが失われます。構造化データがあれば、後で **バウンディングボックス座標を表示** したり、テキストをテーブルに合わせたり、正確な位置情報を下流モデルに渡したりできます。 + +## ステップ 2 – ポストプロセッサでOCR出力をクリーンアップする方法 + +OCR エンジンは文字を検出するのは得意ですが、余計なスペースや改行アーティファクト、誤認識シンボルが残りがちです。ポストプロセッサはテキストを正規化し、一般的な OCR エラーを修正し、余分な空白をトリムします。 + +```python +# Step 2: Clean the plain‑text of each region using the post‑processor +processed_result = engine.run_postprocessor(raw_result) +``` + +独自のクリーンアップロジックを作る場合は、以下を検討してください: + +- 非ASCII文字の除去 (`re.sub(r'[^\x00-\x7F]+',' ', text)`) +- 複数のスペースを単一のスペースに縮小 +- `pyspellchecker` のようなスペルチェッカーを適用して明らかな誤字を修正 + +**なぜ気にすべきか:** 
+整った文字列は検索、インデックス作成、下流の NLP パイプラインの信頼性を格段に向上させます。言い換えれば、 **OCR をクリーンアップする方法** が使えるデータセットと頭痛の種になるデータセットの差を決めます。
+
+## ステップ 3 – クリーンアップされた各領域のバウンディングボックス座標を表示
+
+テキストが整ったら、各領域を走査し、矩形とクリーンな文字列を出力します。ここが最終的に **バウンディングボックス座標を表示** する部分です。
+
+```python
+# Step 3 – Iterate over the cleaned regions and display their bounding box and text
+for text_region in processed_result.regions:
+    # Each region has a .bounding_box attribute (x, y, width, height)
+    bbox = text_region.bounding_box
+    print(f"[{bbox}] {text_region.text}")
+```
+
+**サンプル出力**
+
+```
+[(34, 120, 210, 30)] Invoice #12345
+[(34, 160, 420, 28)] Date: 2026‑03‑01
+[(34, 200, 380, 28)] Total Amount: $1,254.00
+```
+
+この座標を描画ライブラリ(例: OpenCV)に渡して元画像にボックスを重ね合わせたり、後でクエリできるようにデータベースに保存したりできます。
+
+## 完全な実行可能スクリプト
+
+以下は 3 つのステップをすべて結びつけた完全なプログラムです。プレースホルダーの `engine` 呼び出しを実際に使用している OCR SDK に差し替えてください。
+
+```python
+#!/usr/bin/env python3
+"""
+Perform OCR on image → clean results → display bounding box coordinates.
+Author: Your Name
+Date: 2026‑03‑28
+"""
+
+import engine  # <-- replace with your OCR library
+from pathlib import Path
+import sys
+
+def main(image_path: str):
+    # Load image
+    image = engine.load_image(Path(image_path))
+
+    # 1️⃣ Perform OCR and ask for structured output
+    raw_result = engine.recognize(image, return_structured=True)
+
+    # 2️⃣ Clean the raw text using the built‑in post‑processor
+    processed_result = engine.run_postprocessor(raw_result)
+
+    # 3️⃣ Show each region's bounding box and cleaned text
+    print("\n=== Cleaned OCR Regions ===")
+    for region in processed_result.regions:
+        bbox = region.bounding_box  # (x, y, w, h)
+        print(f"[{bbox}] {region.text}")
+
+if __name__ == "__main__":
+    if len(sys.argv) != 2:
+        print("Usage: python perform_ocr.py <image_path>")
+        sys.exit(1)
+    main(sys.argv[1])
+```
+
+### 実行方法
+
+```bash
+python perform_ocr.py sample_invoice.jpg
+```
+
+実行すると、サンプル出力と同様にクリーンテキストとバウンディングボックスの一覧が表示されます。
+
+## よくある質問とエッジケース
+
+| Question | Answer |
+|----------|--------|
+| **OCRエンジンが `return_structured` をサポートしていない場合はどうしますか?** | エンジンの生データ(通常は座標付き単語のリスト)を `text` と `bounding_box` 属性を持つオブジェクトに変換する薄いラッパーを書きます。 |
+| **信頼度スコアを取得できますか?** | 多くの SDK は領域ごとの信頼度指標を提供しています。出力文に追加してください: `print(f"[{bbox}] {region.text} (conf: {region.confidence:.2f})")`。 |
+| **回転したテキストを処理するには?** | `recognize` を呼び出す前に、OpenCV の `cv2.minAreaRect` で画像をデスキューする前処理を行います。 |
+| **出力を JSON 形式にしたい場合は?** | `processed_result.regions` を `json.dumps([r.__dict__ for r in processed_result.regions], indent=2)` でシリアライズします。 |
+| **ボックスを可視化する方法はありますか?** | ループ内で OpenCV を使用し、`cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0), 2)` と描画し、最後に `cv2.imwrite("annotated.jpg", img)` で保存します。 |
+
+## まとめ
+
+**画像でOCRを実行**し、生データをクリーンアップし、 **バウンディングボックス座標を表示** する方法を学びました。認識 → ポストプロセス → 走査という 3 ステップのフローは、信頼できるテキスト抽出が必要なあらゆる Python プロジェクトに再利用可能なパターンです。
+
+### 次にやること
+
+- **異なる OCR バックエンド**(Tesseract、EasyOCR、Google Vision)を試して精度を比較する。
+- **データベースと統合**し、領域データを検索可能なアーカイブとして保存する。
+- **言語検出を追加**して、各領域を適切なスペルチェッカーにルーティングする。
+- **元画像にボックスをオーバーレイ**して視覚的に検証する(上記の OpenCV スニペット参照)。
+
+問題が発生したら、最大の効果はしっかりしたポストプロセッシングにあることを思い出してください。クリーンな文字列は、生の文字列の塊よりもはるかに扱いやすくなります。
+
+Happy coding, and may your OCR pipelines be ever tidy!
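## 付録 – 領域データを JSON にシリアライズする

FAQ で触れた `json.dumps` のワンライナーを、仮の `Region` データクラスで自己完結な形にしたスケッチです。属性名(`text`、`bounding_box`)は本文の例に合わせた仮定なので、実際の SDK が返すオブジェクトに合わせて調整してください:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Region:
    # Hypothetical stand-in for the SDK's region object
    text: str
    bounding_box: tuple  # (x, y, width, height)

regions = [
    Region("Invoice #12345", (34, 120, 210, 30)),
    Region("Date: 2026-03-01", (34, 160, 420, 28)),
]

# Serialise the region list; tuples become JSON arrays
payload = json.dumps([asdict(r) for r in regions], ensure_ascii=False, indent=2)
print(payload)
```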
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/japanese/python/general/python-ocr-tutorial-extract-text-from-images/_index.md b/ocr/japanese/python/general/python-ocr-tutorial-extract-text-from-images/_index.md new file mode 100644 index 000000000..d9fcc0e63 --- /dev/null +++ b/ocr/japanese/python/general/python-ocr-tutorial-extract-text-from-images/_index.md @@ -0,0 +1,230 @@ +--- +category: general +date: 2026-03-28 +description: Aspose OCR Cloud を使用した Python OCR チュートリアルです。Python で画像からテキストを抽出する方法を示します。OCR + 用に画像を読み込み、数分で画像をプレーンテキストに変換する方法を学びましょう。 +draft: false +keywords: +- python ocr tutorial +- extract text image python +- ocr image to text +- load image for ocr +- convert image plain text +language: ja +og_description: Python OCRチュートリアルでは、OCR用に画像を読み込む方法と、Aspose OCR Cloudを使用して画像のプレーンテキストに変換する方法を解説しています。完全なコードとヒントを入手してください。 +og_title: Python OCRチュートリアル – 画像からテキストを抽出する +tags: +- OCR +- Python +- Image Processing +title: Python OCRチュートリアル – 画像からテキストを抽出 +url: /ja/python/general/python-ocr-tutorial-extract-text-from-images/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Python OCR チュートリアル – 画像からテキストを抽出する + +散らかった領収書の写真をきれいで検索可能なテキストに変換したいと思ったことはありませんか? 
あなただけではありません。私の経験では、最大の障壁は OCR エンジン自体ではなく、画像を正しい形式に変換し、問題なくプレーンテキストを取り出すことです。 + +この **python ocr tutorial** では、OCR 用に画像をロードし、認識を実行し、最終的に画像のプレーンテキストを Python の文字列に変換して保存や分析ができるようになるまでのすべての手順を案内します。最後まで読めば、**extract text image python** スタイルでテキストを抽出でき、開始するために有料ライセンスは不要です。 + +## 学べること + +- Aspose OCR Cloud SDK for Python のインストールとインポート方法。 +- **load image for OCR** 用の正確なコード(PNG、JPEG、TIFF、PDF など)。 +- **ocr image to text** 変換を実行するエンジンの呼び出し方。 +- マルチページ PDF や低解像度スキャンなど、一般的なエッジケースへの対処法。 +- 出力を検証する方法と、テキストが乱れた場合の対処法。 + +### 前提条件 + +- Python 3.8+ がマシンにインストールされていること。 +- 無料の Aspose Cloud アカウント(トライアルはライセンス不要で動作)。 +- pip と仮想環境の基本的な知識—特別なことは不要です。 + +> **Pro tip:** すでに virtualenv を使用している場合は、今すぐ有効化してください。依存関係が整理され、バージョン衝突を防げます。 + +![Python OCR チュートリアルのスクリーンショット(認識されたテキストが表示)](path/to/ocr_example.png "Python OCR チュートリアル – 抽出されたプレーンテキスト表示") + +## Step 1 – Install the Aspose OCR Cloud SDK + +まず最初に、Aspose の OCR サービスと通信するライブラリが必要です。ターミナルを開いて次のコマンドを実行してください。 + +```bash +pip install asposeocrcloud +``` + +この単一コマンドで最新の SDK(現在のバージョンは 23.12)を取得できます。パッケージには必要なものがすべて含まれており、追加の画像処理ライブラリは不要です。 + +## Step 2 – Initialise the OCR Engine (Primary Keyword in Action) + +SDK の準備ができたので、**python ocr tutorial** エンジンを起動できます。トライアル版ではライセンスキーは不要なので、シンプルに開始できます。 + +```python +import asposeocrcloud as ocr + +# Initialise the OCR engine – no licence needed for trial use +ocr_engine = ocr.OcrEngine() +``` + +> **Why this matters:** エンジンを一度だけ初期化すれば、以降の呼び出しが高速になります。画像ごとにオブジェクトを再作成すると、ネットワーク往復が無駄になります。 + +## Step 3 – Load Image for OCR + +ここで **load image for OCR** キーワードが活躍します。SDK の `Image.load` メソッドはファイルパスまたは URL を受け取り、形式(PNG、JPEG、TIFF、PDF など)を自動的に検出します。サンプルの領収書をロードしてみましょう。 + +```python +# Step 3: Load the input image (PNG, JPEG, TIFF, PDF …) +input_image = ocr.Image.load(r"YOUR_DIRECTORY/receipt.png") +``` + +マルチページ PDF を扱う場合は、PDF ファイルを指定するだけで、SDK が内部的に各ページを個別の画像として処理します。 + +## Step 4 – Perform OCR Image to Text Conversion + +画像がメモリ上にある状態で、実際の OCR はワンラインで完了します。`recognize` メソッドはプレーンテキスト、信頼度スコア、必要に応じてバウンディングボックスを含む 
`OcrResult` オブジェクトを返します。 + +```python +# Step 4: Perform OCR on the loaded image +ocr_result = ocr_engine.recognize(input_image) +``` + +> **Edge case:** 300 dpi 未満の低解像度画像の場合は、先に画像を拡大した方が良いかもしれません。SDK には `Resize` ヘルパーがありますが、ほとんどの領収書ではデフォルトで問題ありません。 + +## Step 5 – Convert Image Plain Text to a Usable String + +最後のピースは、結果オブジェクトからプレーンテキストを抽出することです。これが **convert image plain text** ステップで、OCR のバイナリデータを印刷・保存・他システムへの入力に使える文字列に変換します。 + +```python +# Step 5: Output the recognised plain text +print("Plain OCR:") +print(ocr_result.text) +``` + +スクリプトを実行すると、次のような出力が得られるはずです。 + +``` +Plain OCR: +Starbucks Coffee +Date: 2026/03/27 +Total: $4.75 +Thank you! +``` + +この出力は通常の Python 文字列となり、CSV エクスポート、データベースへの挿入、自然言語処理などにすぐ利用できます。 + +## Handling Common Pitfalls + +### 1. Blank or Noisy Images + +`ocr_result.text` が空の場合は、画像品質を再確認してください。簡単な対策として前処理ステップを追加できます。 + +```python +# Simple preprocessing – convert to grayscale and increase contrast +processed = input_image.to_grayscale().adjust_contrast(1.2) +ocr_result = ocr_engine.recognize(processed) +``` + +### 2. Multi‑Page PDFs + +PDF を入力すると、`recognize` は各ページごとの結果を返します。以下のようにループ処理してください。 + +```python +pdf_image = ocr.Image.load("document.pdf") +pages = pdf_image.pages # collection of page images + +for i, page in enumerate(pages, start=1): + result = ocr_engine.recognize(page) + print(f"--- Page {i} ---") + print(result.text) +``` + +### 3. Language Support + +Aspose OCR は 60 以上の言語に対応しています。言語を切り替えるには、`recognize` を呼び出す前に `language` プロパティを設定します。 + +```python +ocr_engine.language = "fr" # French +ocr_result = ocr_engine.recognize(input_image) +``` + +## Full Working Example + +すべてをまとめた、インストールからエッジケース処理まで網羅したコピー&ペースト可能な完全スクリプトをご紹介します。 + +```python +# -*- coding: utf-8 -*- +""" +Python OCR tutorial – complete example using Aspose OCR Cloud. +Demonstrates loading an image, performing OCR, and handling multi‑page PDFs. 
+""" + +import asposeocrcloud as ocr +import os + +def ocr_file(filepath: str, language: str = "en"): + """ + Perform OCR on a given file (image or PDF) and return plain text. + """ + # Initialise engine (trial licence) + engine = ocr.OcrEngine() + engine.language = language + + # Load the file – SDK auto‑detects format + image = ocr.Image.load(filepath) + + # If it's a PDF, iterate over pages + if image.is_pdf: + all_text = [] + for page in image.pages: + result = engine.recognize(page) + all_text.append(result.text) + return "\n".join(all_text) + + # Single‑image case + result = engine.recognize(image) + return result.text + + +if __name__ == "__main__": + # Example usage – replace with your own path + sample_path = os.path.join("YOUR_DIRECTORY", "receipt.png") + + if not os.path.exists(sample_path): + raise FileNotFoundError(f"File not found: {sample_path}") + + extracted = ocr_file(sample_path) + print("=== Extracted Text ===") + print(extracted) +``` + +スクリプトを実行します(`python ocr_demo.py`)と、コンソールに **ocr image to text** の出力が表示されます。 + +## Recap – What We Covered + +- **Aspose OCR Cloud** SDK をインストール(`pip install asposeocrcloud`)。 +- ライセンス不要で **Initialised the OCR engine**(トライアルに最適)。 +- PNG、JPEG、PDF など、**load image for OCR** の方法を実演。 +- **ocr image to text** 変換と **converted image plain text** を実行し、使える Python 文字列に変換。 +- 低解像度スキャン、マルチページ PDF、言語選択といった一般的な落とし穴に対処。 + +## Next Steps & Related Topics + +**python ocr tutorial** をマスターした今、次のテーマを検討してみてください。 + +- 大量の領収書フォルダーをバッチ処理する **Extract text image python**。 +- OCR 出力を **pandas** と組み合わせてデータ分析に活用(`df = pd.read_csv(StringIO(extracted))`)。 +- インターネット接続が不安定な場合の代替手段として **Tesseract OCR** を使用。 +- **spaCy** で事後処理し、日付、金額、店舗名などのエンティティを抽出。 + +自由に実験してください:異なる画像形式を試す、コントラストを調整する、言語を切り替えるなど。OCR の領域は広く、今回習得したスキルはあらゆる文書自動化プロジェクトの堅実な基盤となります。 + +Happy coding, and may your text always be readable! 
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/japanese/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md b/ocr/japanese/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md new file mode 100644 index 000000000..c506446ee --- /dev/null +++ b/ocr/japanese/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md @@ -0,0 +1,216 @@ +--- +category: general +date: 2026-03-28 +description: Aspose OCR Cloud を使用して、画像で OCR を実行し、Hugging Face モデルを自動的にダウンロードし、OCR + テキストをクリーンアップし、Python で LLM モデルを設定する方法を学びます。 +draft: false +keywords: +- run OCR on image +- download hugging face model +- clean OCR text +- configure LLM model +language: ja +og_description: 画像でOCRを実行し、Auto‑downloaded Hugging Faceモデルを使用して出力をクリーンアップします。このガイドでは、PythonでLLMモデルを設定する方法を示します。 +og_title: 画像でOCRを実行 – 完全なAspose OCRクラウドチュートリアル +tags: +- OCR +- Python +- LLM +- HuggingFace +title: Aspose OCR Cloudで画像のOCRを実行する – 完全ステップバイステップガイド +url: /ja/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# 画像でOCRを実行 – 完全 Aspose OCR Cloud チュートリアル + +画像ファイルでOCRを実行したことがありますか?しかし、出力が乱雑な文字列のように見えることはありませんか?私の経験では、最大の課題は認識そのものではなく、結果のクリーンアップです。幸い、Aspose OCR Cloud では、LLM ポストプロセッサーを添付して *OCR テキストを自動的にクリーンアップ* できるようになっています。このチュートリアルでは、**Hugging Face モデルのダウンロード**から LLM の設定、OCR エンジンの実行、そして最終的な結果の磨き上げまで、必要な手順をすべて解説します。 + +このガイドの最後までに、すぐに実行できるスクリプトが手に入ります。そのスクリプトは以下を実現します: + +1. Hugging Face からコンパクトな Qwen 2.5 モデルを取得します(自動ダウンロード)。 +2. モデルを設定し、ネットワークの一部を GPU、残りを CPU で実行します。 +3. 手書きメモ画像に対して OCR エンジンを実行します。 +4. 
LLM を使用して認識されたテキストをクリーンアップし、人間が読みやすい出力を得ます。 + +> **前提条件** – Python 3.8+、`asposeocrcloud` パッケージ、最低 4 GB VRAM を持つ GPU(オプションだが推奨)、および最初のモデルダウンロードのためのインターネット接続。 + +--- + +## 必要なもの + +- **Aspose OCR Cloud SDK** – `pip install asposeocrcloud` でインストールします。 +- **サンプル画像** – 例: `handwritten_note.jpg` をローカルフォルダーに配置します。 +- **GPU サポート** – CUDA 対応 GPU がある場合、スクリプトは 30 層をオフロードします。GPU がない場合は自動的に CPU にフォールバックします。 +- **書き込み権限** – スクリプトはモデルを `YOUR_DIRECTORY` にキャッシュします。フォルダーが存在することを確認してください。 + +--- + +## ステップ 1 – LLM モデルの設定(Hugging Face モデルのダウンロード) + +最初に行うのは、Aspose AI にモデルの取得先を指示することです。`AsposeAIModelConfig` クラスは自動ダウンロード、量子化、GPU レイヤー割り当てを処理します。 + +```python +import asposeocrcloud as ocr +from asposeocrcloud.ai import AsposeAI, AsposeAIModelConfig + +# ---------------------------------------------------------------------- +# Step 1: Model configuration – this will download the model if it’s missing +# ---------------------------------------------------------------------- +model_config = AsposeAIModelConfig( + allow_auto_download="true", # Enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", # Repo on Hugging Face + hugging_face_quantization="int8", # Small footprint, fast inference + gpu_layers=30, # 30 layers on GPU, rest on CPU + directory_model_path=r"YOUR_DIRECTORY" # Cache folder (optional) +) +``` + +**なぜ重要か** – `int8` に量子化することでメモリ使用量が大幅に削減されます(≈ 4 GB 対 12 GB)。モデルを GPU と CPU に分割することで、RTX 3060 のような控えめな GPU でも 30 億パラメータの LLM を実行できます。GPU がない場合は `gpu_layers=0` と設定すれば、SDK はすべて CPU 上で動作します。 + +> **ヒント**: 初回実行時に約 1.5 GB をダウンロードするため、数分間と安定した接続を確保してください。 + +--- + +## ステップ 2 – モデル設定で AI エンジンを初期化 + +ここで Aspose AI エンジンを起動し、先ほど作成した設定を渡します。 + +```python +# ---------------------------------------------------------------------- +# Step 2: Initialise the AI engine – pulls the model if needed +# ---------------------------------------------------------------------- +ocr_ai = AsposeAI() +ocr_ai.initialize(model_config) # This call blocks until the model is ready +``` + 
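なお、`initialize` を呼ぶ前に、前提条件の「書き込み権限」で触れたキャッシュフォルダーの存在と書き込み可否を確認しておくと安全です。以下は標準ライブラリだけを使った仮のヘルパーのスケッチです:

```python
import tempfile
from pathlib import Path

def ensure_model_cache(path_str: str) -> Path:
    # Create the cache folder if missing and verify it is writable
    cache_dir = Path(path_str)
    cache_dir.mkdir(parents=True, exist_ok=True)
    probe = cache_dir / ".write_test"
    probe.write_text("ok", encoding="utf-8")  # raises if we lack write permission
    probe.unlink()
    return cache_dir

# Example with a temporary folder (use YOUR_DIRECTORY in the real script)
with tempfile.TemporaryDirectory() as tmp:
    cache = ensure_model_cache(tmp + "/models")
    print(cache.exists())  # → True
```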
+**内部で何が起きているか** – SDK は `directory_model_path` に既存のモデルがあるか確認します。該当バージョンが見つかれば即座にロードし、なければ Hugging Face から GGUF ファイルをダウンロードし、解凍して推論パイプラインを準備します。 + +--- + +## ステップ 3 – OCR エンジンを作成し、AI ポストプロセッサーを添付 + +OCR エンジンは文字認識の重い処理を担当します。`ocr_ai.run_postprocessor` を添付することで、認識後に自動的に **clean OCR text** が有効になります。 + +```python +# ---------------------------------------------------------------------- +# Step 3: Build the OCR engine and bind the LLM post‑processor +# ---------------------------------------------------------------------- +ocr_engine = ocr.OcrEngine() +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=None) +``` + +**なぜポストプロセッサーを使うのか** – 生の OCR には、誤った改行や誤検出された句読点、不要な記号が含まれがちです。LLM は出力を適切な文に書き直し、スペルを修正し、欠落した単語を推測することさえできます。要するに、生のダンプを洗練された文章に変換します。 + +--- + +## ステップ 4 – 画像ファイルで OCR を実行 + +すべてが接続されたので、画像をエンジンに渡す時です。 + +```python +# ---------------------------------------------------------------------- +# Step 4: Load the image and run OCR +# ---------------------------------------------------------------------- +input_image = ocr.Image.load(r"YOUR_DIRECTORY/handwritten_note.jpg") +raw_result = ocr_engine.recognize(input_image) # Returns an OcrResult object +``` + +**エッジケース**: 画像が大きい(> 5 MP)場合、処理速度向上のために事前にリサイズした方が良いでしょう。SDK は Pillow の `Image` オブジェクトを受け取るので、必要に応じて `PIL.Image.thumbnail()` で前処理できます。 + +--- + +## ステップ 5 – AI に認識テキストのクリーンアップをさせ、両方のバージョンを表示 + +最後に、先ほど添付したポストプロセッサーを呼び出します。このステップで *クリーンアップ前* と *クリーンアップ後* の対比を示します。 + +```python +# ---------------------------------------------------------------------- +# Step 5: Clean the OCR output using the LLM and display both results +# ---------------------------------------------------------------------- +cleaned_result = ocr_engine.run_postprocessor(raw_result) + +print("=== Before AI ===") +print(raw_result.text) + +print("\n=== After AI ===") +print(cleaned_result.text) +``` + +### 期待される出力 + +``` +=== Before AI === +Th1s 1s a h@ndwr1tt3n n0te. It c0nta1ns m1st@k3s, l1n3 br3aks, & sp3c!@l ch@r@ct3rs. 
+ +=== After AI === +This is a handwritten note. It contains mistakes, line breaks, and special characters. +``` + +LLM が以下のように処理したことに注目してください: + +- 一般的な OCR 誤認識を修正(`Th1s` → `This`)。 +- 不要な記号を除去(`&` → `and`)。 +- 改行を正しい文に正規化。 + +--- + +## 🎨 ビジュアル概要(画像で OCR を実行するワークフロー) + +![画像で OCR を実行するワークフロー](run_ocr_on_image_workflow.png "モデルダウンロードからクリーンアップ出力までの画像 OCR パイプラインを示す図") + +上の図はフルパイプラインを要約しています:**Hugging Face モデルのダウンロード → LLM の設定 → AI の初期化 → OCR エンジン → AI ポストプロセッサー → OCR テキストのクリーンアップ**。 + +--- + +## よくある質問とプロのコツ + +### GPU がない場合はどうすればいいですか? + +`AsposeAIModelConfig` で `gpu_layers=0` を設定します。モデルは完全に CPU 上で実行され、速度は遅くなりますが機能します。推論時間を抑えるために、より小さいモデル(例: `Qwen/Qwen2.5-1.5B‑Instruct‑GGUF`)に切り替えることも可能です。 + +### 後でモデルを変更するには? + +`hugging_face_repo_id` を更新し、`ocr_ai.initialize(model_config)` を再実行するだけです。SDK はバージョン変更を検知し、新しいモデルをダウンロードしてキャッシュファイルを置き換えます。 + +### ポストプロセッサーのプロンプトをカスタマイズできますか? + +はい。`custom_settings` に `prompt_template` キーを持つ辞書を渡します。例: + +```python +custom_prompt = { + "prompt_template": "Correct the following OCR text and keep line breaks:\n{ocr_text}" +} +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=custom_prompt) +``` + +### クリーンアップしたテキストをファイルに保存すべきですか? + +ぜひ保存してください。クリーンアップ後、結果を `.txt` や `.json` ファイルに書き出して、後続の処理に利用できます。 + +```python +with open("cleaned_note.txt", "w", encoding="utf-8") as f: + f.write(cleaned_result.text) +``` + +## 結論 + +ここでは、Aspose OCR Cloud を使用して画像ファイルで **OCR を実行**し、**Hugging Face モデルを自動ダウンロード**し、LLM モデル設定を巧みに **構成**し、最後に強力な LLM ポストプロセッサーで **OCR テキストをクリーンアップ**する方法を示しました。全工程は単一の実行しやすい Python スクリプトに収まり、GPU 対応マシンでも CPU のみのマシンでも動作します。 + +このパイプラインに慣れたら、以下の点を試してみてください: + +- **異なる LLM** – より大きなコンテキストウィンドウが必要なら `meta-llama/Meta-Llama-3-8B‑Instruct‑GGUF` を試してください。 +- **バッチ処理** – 画像フォルダーをループし、クリーンアップ結果を CSV に集約します。 +- **カスタムプロンプト** – AI を特定の領域(法務文書、医療メモなど)に合わせて調整します。 + +`gpu_layers` の値を調整したり、モデルを入れ替えたり、独自のプロンプトを組み込んだりして自由にカスタマイズしてください。可能性は無限で、現在手元にあるコードが出発点です。 + +コーディングを楽しんで、OCR の出力が常にクリーンでありますように! 
🚀 + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/korean/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md b/ocr/korean/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md new file mode 100644 index 000000000..75304f8dd --- /dev/null +++ b/ocr/korean/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md @@ -0,0 +1,223 @@ +--- +category: general +date: 2026-03-28 +description: 이미지에서 손글씨 텍스트를 인식하기 위해 OCR을 사용하는 방법. 손글씨 텍스트를 추출하고, 손글씨 이미지를 변환하며, 빠르게 + 깨끗한 결과를 얻는 방법을 배워보세요. +draft: false +keywords: +- how to use OCR +- recognize handwritten text +- extract handwritten text +- handwritten note to text +- convert handwritten image +language: ko +og_description: OCR을 사용하여 손글씨를 인식하는 방법. 이 튜토리얼은 이미지에서 손글씨를 추출하고 깔끔한 결과를 얻는 과정을 단계별로 + 보여줍니다. +og_title: OCR을 사용하여 손글씨 텍스트 인식하는 방법 – 완전 가이드 +tags: +- OCR +- Handwriting Recognition +- Python +title: OCR를 사용해 손글씨 텍스트 인식하는 방법 – 완전 가이드 +url: /ko/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# 손글씨 텍스트 인식을 위한 OCR 사용 방법 – 완전 가이드 + +손글씨 노트를 위한 OCR 사용 방법은 스케치, 회의록, 혹은 빠르게 적은 아이디어를 디지털화해야 할 때 많은 개발자들이 묻는 질문입니다. 이 가이드에서는 손글씨 텍스트를 인식하고, 추출하며, 손글씨 이미지를 깔끔하고 검색 가능한 문자열로 변환하는 정확한 단계들을 안내합니다. + +식료품 목록 사진을 보면서 “이 손글씨 이미지를 다시 타이핑하지 않고 텍스트로 변환할 수 있을까?” 라고 생각해 본 적이 있다면, 바로 여기가 맞는 곳입니다. 끝까지 따라오시면 **손글씨 노트를 텍스트로 변환**하는 스크립트를 몇 초 만에 실행할 수 있게 됩니다. 
+ +## 준비물 + +- Python 3.8+ (코드는 최신 버전에서 모두 동작합니다) +- `ocr` 라이브러리 – `pip install ocr-sdk` 로 설치합니다 (귀하의 제공업체 패키지 이름으로 교체하세요) +- 손글씨 노트의 선명한 사진 (`hand_note.png` 예시 파일) +- 조금의 호기심과 커피 ☕️ (선택 사항이지만 권장합니다) + +무거운 프레임워크도, 유료 클라우드 키도 필요 없습니다 – 바로 **손글씨 인식**을 지원하는 로컬 엔진만 있으면 됩니다. + +## Step 1 – OCR 패키지 설치 및 임포트 + +먼저, 올바른 패키지를 머신에 설치합시다. 터미널을 열고 다음을 실행하세요: + +```bash +pip install ocr-sdk +``` + +설치가 완료되면 스크립트에서 모듈을 임포트합니다: + +```python +# Step 1: Import the OCR SDK +import ocr +``` + +> **Pro tip:** 가상 환경을 사용 중이라면 설치하기 전에 활성화하세요. 이렇게 하면 프로젝트가 깔끔해지고 버전 충돌을 피할 수 있습니다. + +## Step 2 – OCR 엔진 생성 및 손글씨 모드 활성화 + +이제 실제로 **OCR 사용 방법**을 적용합니다 – 인쇄된 글꼴이 아닌 필기체를 인식한다는 것을 엔진에 알려야 합니다. 다음 스니펫은 엔진을 생성하고 손글씨 모드로 전환합니다: + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = ocr.OcrEngine() +ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN +``` + +`recognition_mode`를 설정하는 이유는 대부분의 OCR 엔진이 기본적으로 인쇄 텍스트 감지를 수행하기 때문에 개인 노트의 곡선과 기울임을 놓치기 쉽기 때문입니다. 손글씨 모드를 활성화하면 정확도가 크게 향상됩니다. + +## Step 3 – 변환할 이미지 로드 (손글씨 이미지 변환) + +이미지는 모든 OCR 작업의 원시 자료입니다. 사진이 무손실 포맷(PNG 등)으로 저장되어 있고 텍스트가 충분히 읽을 수 있는지 확인하세요. 그런 다음 아래와 같이 로드합니다: + +```python +# Step 3: Load the handwritten image you want to convert +handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") +``` + +이미지가 스크립트와 같은 디렉터리에 있다면 전체 경로 대신 `"hand_note.png"` 를 사용할 수 있습니다. + +> **이미지가 흐릿하면?** OpenCV를 사용해 전처리해 보세요(예: `cv2.cvtColor` 로 그레이스케일 변환, `cv2.threshold` 로 대비 증가) 후 OCR 엔진에 전달합니다. + +## Step 4 – 인식 엔진 실행으로 손글씨 텍스트 추출 + +엔진이 준비되고 이미지가 메모리에 로드되면 이제 **손글씨 텍스트를 추출**할 수 있습니다. `recognize` 메서드는 텍스트와 신뢰도 점수를 포함한 원시 결과 객체를 반환합니다. + +```python +# Step 4: Perform OCR and get the raw result +raw_result = ocr_engine.recognize(handwritten_image) +print("Raw OCR output:") +print(raw_result.text) +``` + +일반적인 원시 출력에는 불필요한 줄 바꿈이나 잘못 인식된 문자들이 포함될 수 있습니다(특히 손글씨가 지저분한 경우). 그래서 다음 단계가 필요합니다. + +## Step 5 – (선택) AI 후처리기로 출력 다듬기 + +대부분의 최신 OCR SDK는 간단한 AI 후처리기를 제공하여 공백을 정리하고 일반적인 OCR 오류를 수정하며 줄 끝을 정규화합니다. 
실행은 다음과 같이 간단합니다: + +```python +# Step 5: Refine the raw OCR output (handwritten note to text) +polished_result = ocr_engine.run_postprocessor(raw_result) + +# Display the cleaned, readable text +print("\nPolished OCR output:") +print(polished_result.text) +``` + +이 단계를 건너뛰어도 사용 가능한 텍스트를 얻을 수 있지만 **손글씨 노트를 텍스트로 변환**하는 결과가 다소 거칠게 보일 수 있습니다. 후처리기는 특히 글머리표나 혼합 대소문자가 포함된 노트에 유용합니다. + +## Step 6 – 결과 확인 및 엣지 케이스 처리 + +다듬어진 결과를 출력한 뒤, 모든 것이 올바른지 다시 확인하세요. 아래는 간단히 추가할 수 있는 검증 코드입니다: + +```python +# Step 6: Simple verification +if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") +else: + print("\n✅ OCR succeeded! You can now save or further process the text.") +``` + +**엣지 케이스 체크리스트** + +| 상황 | 조치 | +|-----------|------------| +| **대조가 매우 낮음** | 로드하기 전에 `cv2.convertScaleAbs` 로 대비를 높이세요. | +| **다중 언어** | `ocr_engine.language = ["en", "es"]` (또는 목표 언어) 로 설정하세요. | +| **대용량 문서** | 메모리 급증을 방지하기 위해 페이지를 배치 처리하세요. | +| **특수 기호** | `ocr_engine.add_custom_words([...])` 로 사용자 사전을 추가하세요. | + +## 시각적 개요 + +아래는 워크플로우를 보여주는 플레이스홀더 이미지입니다—촬영된 노트에서 깔끔한 텍스트까지. alt 텍스트에 주요 키워드가 포함되어 있어 이미지 SEO에 유리합니다. 
+ +![손글씨 이미지에서 OCR 사용 방법](/images/handwritten_ocr_flow.png "손글씨 이미지에서 OCR 사용 방법") + +## 전체 실행 가능한 스크립트 + +모든 요소를 합쳐서, 복사‑붙여넣기만 하면 바로 실행 가능한 전체 프로그램은 다음과 같습니다: + +```python +# Complete script: Convert a handwritten image to clean text using OCR + +import ocr + +def main(): + # 1️⃣ Initialize OCR engine for handwritten recognition + ocr_engine = ocr.OcrEngine() + ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN + + # 2️⃣ Load the image containing the handwritten note + handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") + + # 3️⃣ Perform OCR to extract raw text + raw_result = ocr_engine.recognize(handwritten_image) + print("Raw OCR output:") + print(raw_result.text) + + # 4️⃣ (Optional) Run AI post‑processor for cleaner output + polished_result = ocr_engine.run_postprocessor(raw_result) + + # 5️⃣ Show the polished, readable text + print("\nPolished OCR output:") + print(polished_result.text) + + # 6️⃣ Simple sanity check + if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") + else: + print("\n✅ OCR succeeded! You can now save or further process the text.") + +if __name__ == "__main__": + main() +``` + +**예상 출력 (예시)** + +``` +Raw OCR output: +T0d@y I w3nt to the market +and bought 5 aplpes, 2 bananas, +and a loaf of bread. + +Polished OCR output: +Today I went to the market and bought 5 apples, 2 bananas, and a loaf of bread. +``` + +후처리기가 “T0d@y” 오타를 수정하고 공백을 정규화한 것을 확인하세요. + +## 흔히 발생하는 문제와 팁 + +- **이미지 크기 중요** – OCR 엔진은 보통 입력 크기를 4K×4K로 제한합니다. 큰 사진은 미리 리사이즈하세요. +- **필기 스타일** – 필기체와 블록체는 정확도에 영향을 줍니다. 소스(예: 디지털 펜)를 제어할 수 있다면 블록체를 권장합니다. +- **배치 처리** – 수십 개의 노트를 처리할 때는 스크립트를 루프에 감싸고 각 결과를 CSV 또는 SQLite DB에 저장하세요. +- **메모리 누수** – 일부 SDK는 내부 버퍼를 유지합니다; 속도가 느려지는 것을 감지하면 작업이 끝난 뒤 `ocr_engine.dispose()` 를 호출하세요. + +## 다음 단계 – 단순 OCR을 넘어 + +단일 이미지에 대한 **OCR 사용 방법**을 마스터했으니, 다음 확장 기능을 고려해 보세요: + +1. **클라우드 스토리지와 통합** – AWS S3 또는 Azure Blob에서 이미지를 가져와 동일 파이프라인을 실행하고 결과를 다시 업로드합니다. +2. 
**언어 감지 추가** – `ocr_engine.detect_language()` 를 사용해 사전을 자동 전환합니다. +3. **NLP와 결합** – 정제된 텍스트를 spaCy 또는 NLTK에 입력해 엔터티, 날짜, 작업 항목 등을 추출합니다. +4. **REST 엔드포인트 생성** – 스크립트를 Flask 또는 FastAPI로 감싸 다른 서비스가 이미지를 POST하고 JSON 형태 텍스트를 받을 수 있게 합니다. + +이 모든 아이디어는 **손글씨 텍스트 인식**, **손글씨 텍스트 추출**, **손글씨 이미지 변환**이라는 핵심 개념을 중심으로 합니다—다음에 검색할 가능성이 높은 정확한 문구들입니다. + +--- + +### TL;DR + +우리는 **OCR 사용 방법**을 보여주어 손글씨 텍스트를 인식하고 추출하며, 결과를 사용 가능한 문자열로 다듬었습니다. 전체 스크립트는 바로 실행할 수 있고, 워크플로우는 단계별로 설명되었으며, 일반적인 엣지 케이스를 위한 체크리스트도 제공합니다. 다음 회의 노트 사진을 찍어 스크립트에 넣으면 기계가 대신 타이핑해 줍니다. + +코딩 즐겁게 하시고, 노트가 언제나 읽기 쉬우길 바랍니다! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/korean/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md b/ocr/korean/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md new file mode 100644 index 000000000..7253e50b7 --- /dev/null +++ b/ocr/korean/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md @@ -0,0 +1,184 @@ +--- +category: general +date: 2026-03-28 +description: 이미지에 OCR을 수행하고 경계 상자 좌표와 함께 정제된 텍스트를 얻습니다. OCR 추출, OCR 정제 및 결과 표시를 단계별로 + 배우세요. +draft: false +keywords: +- perform OCR on image +- how to extract OCR +- how to clean OCR +- display bounding box coordinates +- OCR post‑processing +- OCR bounding boxes +language: ko +og_description: 이미지에서 OCR을 수행하고, 출력 결과를 정리한 뒤, 간결한 튜토리얼에서 경계 상자 좌표를 표시합니다. 
+og_title: 이미지에서 OCR 수행 – 깨끗한 결과와 경계 상자 +tags: +- OCR +- Computer Vision +- Python +title: 이미지에서 OCR 수행 – 결과 정리 및 바운딩 박스 좌표 표시 +url: /ko/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# 이미지에서 OCR 수행 – 결과 정리 및 경계 상자 좌표 표시 + +이미지 파일에 **OCR을 수행**하고 싶지만 텍스트가 엉망이고 각 단어가 사진의 어느 위치에 있는지 모른 적이 있나요? 혼자만 그런 것이 아닙니다. 청구서 디지털화, 영수증 스캔, 간단한 텍스트 추출 등 많은 프로젝트에서 원시 OCR 출력은 첫 번째 장벽에 불과합니다. 좋은 소식은? 그 출력을 정리하고 수많은 보일러플레이트 코드를 작성하지 않고도 각 영역의 경계 상자 좌표를 즉시 확인할 수 있습니다. + +이 가이드에서는 **OCR 추출 방법**, **OCR 정리 후처리 방법**, 그리고 최종적으로 **정리된 각 영역의 경계 상자 좌표 표시** 방법을 단계별로 살펴봅니다. 끝까지 따라오면 흐릿한 사진을 깔끔하고 구조화된 텍스트로 변환하는 단일 실행 스크립트를 얻게 됩니다. + +## 준비 사항 + +- Python 3.9+ (아래 구문은 3.8 및 최신 버전에서도 동작) +- `recognize(..., return_structured=True)` 를 지원하는 OCR 엔진 – 예시에서는 가상의 `engine` 라이브러리를 사용합니다. Tesseract, EasyOCR, 혹은 영역 데이터를 반환하는 다른 SDK로 교체하세요. +- Python 함수와 반복문에 대한 기본 지식 +- 스캔하려는 이미지 파일 (PNG, JPG 등) + +> **Pro tip:** Tesseract를 사용한다면 `pytesseract.image_to_data` 함수가 이미 경계 상자를 제공합니다. 이 결과를 아래 `engine.recognize` API와 동일하게 동작하도록 작은 어댑터로 감싸면 됩니다. + +--- + +![이미지에서 OCR 수행 예시](image-placeholder.png "이미지에서 OCR 수행 예시") + +*Alt text: 이미지에서 OCR을 수행하고 경계 상자 좌표를 시각화하는 흐름을 보여주는 다이어그램* + +## 1단계 – 이미지에서 OCR 수행 및 구조화된 영역 가져오기 + +먼저 OCR 엔진에 단순 텍스트가 아니라 구조화된 텍스트 영역 리스트를 반환하도록 요청합니다. 이 리스트는 원시 문자열과 이를 둘러싼 사각형 정보를 포함합니다. + +```python +import engine # replace with your actual OCR library +from pathlib import Path + +# Load the image you want to process +image_path = Path("sample_invoice.jpg") +image = engine.load_image(image_path) + +# Step 1: Perform OCR and request a structured list of text regions +raw_result = engine.recognize(image, return_structured=True) +``` + +**왜 중요한가:** +단순 텍스트만 요청하면 공간적 컨텍스트를 잃게 됩니다. 구조화된 데이터는 이후 **경계 상자 좌표를 표시**하거나, 텍스트를 표와 정렬하거나, 정확한 위치 정보를 다운스트림 모델에 전달하는 데 활용할 수 있습니다. 
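앞의 Pro tip에서 언급한 어댑터를 표준 라이브러리만으로 스케치하면 다음과 같습니다. `pytesseract.image_to_data`가 반환하는 열 지향 딕셔너리 형태(`text`, `left`, `top`, `width`, `height` 키)를 가정하며, 아래의 `sample` 딕셔너리는 설명용 가짜 데이터입니다:

```python
from dataclasses import dataclass

@dataclass
class TextRegion:
    text: str
    bounding_box: tuple  # (x, y, w, h)

def adapt_tesseract_data(data: dict) -> list:
    # image_to_data의 열 지향 딕셔너리를 영역 객체 리스트로 변환
    regions = []
    for i, word in enumerate(data["text"]):
        if word.strip():  # 빈 감지 결과는 건너뜀
            box = (data["left"][i], data["top"][i],
                   data["width"][i], data["height"][i])
            regions.append(TextRegion(text=word, bounding_box=box))
    return regions

# pytesseract.image_to_data(..., output_type=Output.DICT) 형태를 흉내 낸 가짜 샘플
sample = {
    "text": ["Invoice", "", "#12345"],
    "left": [34, 0, 120], "top": [120, 0, 120],
    "width": [60, 0, 55], "height": [30, 0, 30],
}
for r in adapt_tesseract_data(sample):
    print(r.bounding_box, r.text)
```

이렇게 만든 리스트는 이후 단계에서 `processed_result.regions` 와 같은 방식으로 순회할 수 있습니다.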
+ +## 2단계 – 후처리기로 OCR 출력 정리하기 + +OCR 엔진은 문자 인식에 뛰어나지만 종종 불필요한 공백, 줄바꿈 아티팩트, 잘못 인식된 기호 등을 남깁니다. 후처리기는 텍스트를 정규화하고 일반적인 OCR 오류를 수정하며 공백을 제거합니다. + +```python +# Step 2: Clean the plain‑text of each region using the post‑processor +processed_result = engine.run_postprocessor(raw_result) +``` + +직접 클리너를 구현한다면 다음을 고려하세요: + +- 비 ASCII 문자 제거 (`re.sub(r'[^\x00-\x7F]+',' ', text)`) +- 여러 개의 공백을 하나의 공백으로 축소 +- `pyspellchecker` 같은 맞춤법 검사기로 명백한 오타 교정 + +**왜 신경 써야 할까:** +정돈된 문자열은 검색, 인덱싱, 그리고 다운스트림 NLP 파이프라인의 신뢰성을 크게 높여줍니다. 다시 말해, **OCR 정리 방법**은 사용 가능한 데이터셋과 골칫거리 사이의 차이를 만들곤 합니다. + +## 3단계 – 정리된 각 영역의 경계 상자 좌표 표시 + +텍스트가 정리되었으니 이제 각 영역을 순회하면서 사각형과 정리된 문자열을 출력합니다. 바로 여기서 **경계 상자 좌표를 표시**하게 됩니다. + +```python +# Step 3 – Iterate over the cleaned regions and display their bounding box and text +for text_region in processed_result.regions: + # Each region has a .bounding_box attribute (x, y, width, height) + bbox = text_region.bounding_box + print(f"[{bbox}] {text_region.text}") +``` + +**샘플 출력** + +``` +[(34, 120, 210, 30)] Invoice #12345 +[(34, 160, 420, 28)] Date: 2026‑03‑01 +[(34, 200, 380, 28)] Total Amount: $1,254.00 +``` + +이 좌표들을 그림 라이브러리(예: OpenCV)에 전달해 원본 이미지에 박스를 오버레이하거나, 나중에 조회할 수 있도록 데이터베이스에 저장할 수 있습니다. + +## 전체 실행 가능한 스크립트 + +아래는 세 단계를 모두 연결한 완전한 프로그램입니다. 자리표시자 `engine` 호출을 실제 OCR SDK 호출로 교체하면 됩니다. + +```python +#!/usr/bin/env python3 +""" +Perform OCR on image → clean results → display bounding box coordinates. 
+Author: Your Name +Date: 2026‑03‑28 +""" + +import engine # <-- replace with your OCR library +from pathlib import Path +import sys + +def main(image_path: str): + # Load image + image = engine.load_image(Path(image_path)) + + # 1️⃣ Perform OCR and ask for structured output + raw_result = engine.recognize(image, return_structured=True) + + # 2️⃣ Clean the raw text using the built‑in post‑processor + processed_result = engine.run_postprocessor(raw_result) + + # 3️⃣ Show each region's bounding box and cleaned text + print("\n=== Cleaned OCR Regions ===") + for region in processed_result.regions: + bbox = region.bounding_box # (x, y, w, h) + print(f"[{bbox}] {region.text}") + +if __name__ == "__main__": + if len(sys.argv) != 2: + print("Usage: python perform_ocr.py ") + sys.exit(1) + main(sys.argv[1]) +``` + +### 실행 방법 + +```bash +python perform_ocr.py sample_invoice.jpg +``` + +위와 같은 샘플 출력이 화면에 표시될 것입니다. + +## 자주 묻는 질문 & 엣지 케이스 + +| 질문 | 답변 | +|----------|--------| +| **OCR 엔진이 `return_structured` 를 지원하지 않으면 어떻게 하나요?** | 엔진의 원시 출력(보통 좌표가 포함된 단어 리스트)을 `text` 와 `bounding_box` 속성을 가진 객체 리스트로 변환하는 얇은 래퍼를 작성합니다. | +| **신뢰도 점수를 얻을 수 있나요?** | 많은 SDK가 영역별 신뢰도 메트릭을 제공합니다. 출력문에 추가하세요: `print(f"[{bbox}] {region.text} (conf: {region.confidence:.2f})")`. | +| **회전된 텍스트는 어떻게 처리하나요?** | `recognize` 호출 전에 OpenCV의 `cv2.minAreaRect` 로 이미지의 기울기를 보정합니다. | +| **출력을 JSON 형태로 받고 싶어요.** | `processed_result.regions` 를 `json.dumps([r.__dict__ for r in processed_result.regions], indent=2)` 로 직렬화합니다. | +| **박스를 시각화할 방법이 있나요?** | 루프 안에서 OpenCV를 사용해 `cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0), 2)` 로 그린 뒤 `cv2.imwrite("annotated.jpg", img)` 로 저장합니다. | + +## 마무리 + +이제 **이미지에서 OCR 수행**, 원시 출력 정리, 그리고 **각 영역의 경계 상자 좌표 표시** 방법을 배웠습니다. 인식 → 후처리 → 순회라는 3단계 흐름은 신뢰할 수 있는 텍스트 추출이 필요한 모든 Python 프로젝트에 재사용 가능한 패턴이 됩니다. + +### 다음 단계는? + +- **다양한 OCR 백엔드**(Tesseract, EasyOCR, Google Vision)를 탐색하고 정확도를 비교해 보세요. +- **데이터베이스와 연동**해 영역 데이터를 저장하고 검색 가능한 아카이브를 구축하세요. 
+- **언어 감지**를 추가해 각 영역을 적절한 맞춤법 검사기로 라우팅하세요. +- **원본 이미지에 박스 오버레이**를 적용해 시각적으로 검증하세요(위 OpenCV 스니펫 참고). + +예상치 못한 문제가 발생하더라도, 가장 큰 성과는 견고한 후처리 단계에서 나온다는 점을 기억하세요. 깨끗한 문자열은 원시 문자 덤프보다 훨씬 다루기 쉽습니다. + +행복한 코딩 되시고, OCR 파이프라인이 언제나 깔끔하게 유지되길 바랍니다! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/korean/python/general/python-ocr-tutorial-extract-text-from-images/_index.md b/ocr/korean/python/general/python-ocr-tutorial-extract-text-from-images/_index.md new file mode 100644 index 000000000..471cad43e --- /dev/null +++ b/ocr/korean/python/general/python-ocr-tutorial-extract-text-from-images/_index.md @@ -0,0 +1,231 @@ +--- +category: general +date: 2026-03-28 +description: Aspose OCR Cloud를 사용하여 파이썬에서 이미지 텍스트를 추출하는 방법을 보여주는 파이썬 OCR 튜토리얼입니다. + OCR을 위해 이미지를 로드하고 몇 분 안에 이미지를 일반 텍스트로 변환하는 방법을 배워보세요. +draft: false +keywords: +- python ocr tutorial +- extract text image python +- ocr image to text +- load image for ocr +- convert image plain text +language: ko +og_description: Python OCR 튜토리얼은 OCR을 위해 이미지를 로드하고 Aspose OCR Cloud를 사용해 이미지의 평문 텍스트로 + 변환하는 방법을 설명합니다. 전체 코드와 팁을 확인하세요. +og_title: 파이썬 OCR 튜토리얼 – 이미지에서 텍스트 추출 +tags: +- OCR +- Python +- Image Processing +title: Python OCR 튜토리얼 – 이미지에서 텍스트 추출하기 +url: /ko/python/general/python-ocr-tutorial-extract-text-from-images/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Python OCR 튜토리얼 – 이미지에서 텍스트 추출 + +영수증 사진 같은 지저분한 이미지를 깔끔하고 검색 가능한 텍스트로 바꾸고 싶으신가요? 혼자만 그런 것이 아닙니다. 제 경험에 따르면 가장 큰 장애물은 OCR 엔진 자체가 아니라 이미지를 올바른 형식으로 준비하고 텍스트를 끊김 없이 추출하는 과정입니다. + +이 **python ocr tutorial**은 이미지 로드, OCR 실행, 그리고 이미지의 평문 텍스트를 Python 문자열로 변환하는 모든 단계를 자세히 안내합니다. 마지막까지 진행하면 **extract text image python** 스타일로 텍스트를 추출할 수 있게 되며, 시작하는 데 별도의 유료 라이선스가 필요하지 않습니다. 
+ +## What You’ll Learn + +- Aspose OCR Cloud SDK for Python을 설치하고 가져오는 방법. +- **load image for OCR**(PNG, JPEG, TIFF, PDF 등) 정확한 코드. +- **ocr image to text** 변환을 수행하도록 엔진을 호출하는 방법. +- 다중 페이지 PDF나 저해상도 스캔과 같은 일반적인 엣지 케이스를 처리하는 팁. +- 출력 결과를 검증하고 텍스트가 깨졌을 때 대처하는 방법. + +### Prerequisites + +- 머신에 Python 3.8+이 설치되어 있어야 합니다. +- 무료 Aspose Cloud 계정(체험판은 라이선스 없이 작동). +- pip와 가상 환경에 대한 기본적인 이해—특별한 것이 필요하지 않습니다. + +> **Pro tip:** 이미 virtualenv를 사용 중이라면 지금 활성화하세요. 의존성을 깔끔하게 관리하고 버전 충돌을 방지할 수 있습니다. + +![Python OCR 튜토리얼 스크린샷 – 인식된 텍스트가 표시된 화면](path/to/ocr_example.png "Python OCR 튜토리얼 – 추출된 평문 텍스트 표시") + +## Step 1 – Install the Aspose OCR Cloud SDK + +먼저 Aspose OCR 서비스와 통신할 라이브러리를 설치해야 합니다. 터미널을 열고 다음을 실행하세요: + +```bash +pip install asposeocrcloud +``` + +이 한 줄 명령으로 최신 SDK(현재 버전 23.12)를 가져옵니다. 패키지에는 필요한 모든 것이 포함되어 있어 별도의 이미지 처리 라이브러리를 추가로 설치할 필요가 없습니다. + +## Step 2 – Initialise the OCR Engine (Primary Keyword in Action) + +SDK가 준비되었으니 이제 **python ocr tutorial** 엔진을 초기화합니다. 트라이얼 버전은 라이선스 키가 필요 없으므로 간단합니다. + +```python +import asposeocrcloud as ocr + +# Initialise the OCR engine – no licence needed for trial use +ocr_engine = ocr.OcrEngine() +``` + +> **Why this matters:** 엔진을 한 번만 초기화하면 이후 호출이 빠르게 수행됩니다. 이미지마다 객체를 새로 만들면 네트워크 왕복이 불필요하게 늘어납니다. + +## Step 3 – Load Image for OCR + +여기서 **load image for OCR** 키워드가 빛을 발합니다. SDK의 `Image.load` 메서드는 파일 경로나 URL을 받아 자동으로 형식(PNG, JPEG, TIFF, PDF 등)을 감지합니다. 샘플 영수증을 로드해 보겠습니다: + +```python +# Step 3: Load the input image (PNG, JPEG, TIFF, PDF …) +input_image = ocr.Image.load(r"YOUR_DIRECTORY/receipt.png") +``` + +다중 페이지 PDF를 다룰 경우 PDF 파일을 지정하기만 하면 SDK가 각 페이지를 내부적으로 별도 이미지로 처리합니다. + +## Step 4 – Perform OCR Image to Text Conversion + +이미지가 메모리에 로드되면 실제 OCR은 한 줄로 수행됩니다. `recognize` 메서드는 평문 텍스트, 신뢰도 점수, 필요 시 바운딩 박스 등을 포함한 `OcrResult` 객체를 반환합니다. + +```python +# Step 4: Perform OCR on the loaded image +ocr_result = ocr_engine.recognize(input_image) +``` + +> **Edge case:** 300 dpi 이하의 저해상도 사진은 먼저 업스케일하는 것이 좋습니다. 
SDK에 `Resize` 헬퍼가 있지만 대부분 영수증은 기본 설정으로 충분합니다. + +## Step 5 – Convert Image Plain Text to a Usable String + +퍼즐의 마지막 조각은 결과 객체에서 평문 텍스트를 추출하는 것입니다. 이것이 **convert image plain text** 단계이며, OCR 결과물을 출력, 저장 또는 다른 시스템에 전달할 수 있는 문자열로 변환합니다. + +```python +# Step 5: Output the recognised plain text +print("Plain OCR:") +print(ocr_result.text) +``` + +스크립트를 실행하면 다음과 같은 출력이 나타납니다: + +``` +Plain OCR: +Starbucks Coffee +Date: 2026/03/27 +Total: $4.75 +Thank you! +``` + +이 출력은 이제 일반 Python 문자열이므로 CSV로 내보내기, 데이터베이스 삽입, 혹은 자연어 처리에 바로 사용할 수 있습니다. + +## Handling Common Pitfalls + +### 1. Blank or Noisy Images + +`ocr_result.text`가 빈 문자열이면 이미지 품질을 다시 확인하세요. 간단히 전처리 단계를 추가할 수 있습니다: + +```python +# Simple preprocessing – convert to grayscale and increase contrast +processed = input_image.to_grayscale().adjust_contrast(1.2) +ocr_result = ocr_engine.recognize(processed) +``` + +### 2. Multi‑Page PDFs + +PDF를 입력하면 `recognize`가 각 페이지별 결과를 반환합니다. 다음과 같이 반복하면 됩니다: + +```python +pdf_image = ocr.Image.load("document.pdf") +pages = pdf_image.pages # collection of page images + +for i, page in enumerate(pages, start=1): + result = ocr_engine.recognize(page) + print(f"--- Page {i} ---") + print(result.text) +``` + +### 3. Language Support + +Aspose OCR은 60개 이상의 언어를 지원합니다. 언어를 변경하려면 `recognize` 호출 전에 `language` 속성을 설정하세요: + +```python +ocr_engine.language = "fr" # French +ocr_result = ocr_engine.recognize(input_image) +``` + +## Full Working Example + +전체 과정을 한 번에 보여주는 복사‑붙여넣기 가능한 스크립트입니다. 설치부터 엣지 케이스 처리까지 모두 포함합니다: + +```python +# -*- coding: utf-8 -*- +""" +Python OCR tutorial – complete example using Aspose OCR Cloud. +Demonstrates loading an image, performing OCR, and handling multi‑page PDFs. +""" + +import asposeocrcloud as ocr +import os + +def ocr_file(filepath: str, language: str = "en"): + """ + Perform OCR on a given file (image or PDF) and return plain text. 
+ """ + # Initialise engine (trial licence) + engine = ocr.OcrEngine() + engine.language = language + + # Load the file – SDK auto‑detects format + image = ocr.Image.load(filepath) + + # If it's a PDF, iterate over pages + if image.is_pdf: + all_text = [] + for page in image.pages: + result = engine.recognize(page) + all_text.append(result.text) + return "\n".join(all_text) + + # Single‑image case + result = engine.recognize(image) + return result.text + + +if __name__ == "__main__": + # Example usage – replace with your own path + sample_path = os.path.join("YOUR_DIRECTORY", "receipt.png") + + if not os.path.exists(sample_path): + raise FileNotFoundError(f"File not found: {sample_path}") + + extracted = ocr_file(sample_path) + print("=== Extracted Text ===") + print(extracted) +``` + +스크립트를 실행(`python ocr_demo.py`)하면 콘솔에 **ocr image to text** 결과가 바로 표시됩니다. + +## Recap – What We Covered + +- **Aspose OCR Cloud** SDK 설치(`pip install asposeocrcloud`). +- 라이선스 없이 **Initialised the OCR engine**(트라이얼에 최적). +- PNG, JPEG, PDF 등 다양한 형식에 대해 **load image for OCR**하는 방법 시연. +- **ocr image to text** 변환 및 **converted image plain text**를 사용 가능한 Python 문자열로 변환. +- 저해상도 스캔, 다중 페이지 PDF, 언어 선택 등 일반적인 문제 해결 방법. + +## Next Steps & Related Topics + +**python ocr tutorial**을 마스터했으니 다음 주제들을 탐색해 보세요: + +- 대량 영수증 폴더를 처리하기 위한 **extract text image python** 배치 처리. +- OCR 결과를 **pandas**와 연계해 데이터 분석하기(`df = pd.read_csv(StringIO(extracted))`). +- 인터넷 연결이 제한될 때 대비해 **Tesseract OCR**을 백업 옵션으로 사용. +- **spaCy**를 활용해 날짜, 금액, 가맹점 이름 등 엔터티를 식별하는 후처리 추가. + +다양한 이미지 형식, 대비 조정, 언어 전환 등을 실험해 보세요. OCR 분야는 넓고, 지금 익힌 기술은 문서 자동화 프로젝트의 탄탄한 기반이 됩니다. + +Happy coding, and may your text always be readable! 
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/korean/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md b/ocr/korean/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md new file mode 100644 index 000000000..a66a2b4d5 --- /dev/null +++ b/ocr/korean/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md @@ -0,0 +1,219 @@ +--- +category: general +date: 2026-03-28 +description: Aspose OCR Cloud를 사용하여 이미지에서 OCR을 실행하고, Hugging Face 모델을 자동으로 다운로드하며, + OCR 텍스트를 정리하고, Python에서 LLM 모델을 구성하는 방법을 배워보세요. +draft: false +keywords: +- run OCR on image +- download hugging face model +- clean OCR text +- configure LLM model +language: ko +og_description: 이미지에 OCR을 실행하고 자동으로 다운로드된 Hugging Face 모델을 사용해 출력을 정리합니다. 이 가이드는 Python에서 + LLM 모델을 구성하는 방법을 보여줍니다. +og_title: 이미지에서 OCR 실행 – 완전한 Aspose OCR 클라우드 튜토리얼 +tags: +- OCR +- Python +- LLM +- HuggingFace +title: Aspose OCR Cloud를 사용해 이미지에서 OCR 수행 – 전체 단계별 가이드 +url: /ko/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# 이미지에서 OCR 실행 – 완전한 Aspose OCR Cloud 튜토리얼 + +이미지 파일에 OCR을 실행했는데 원시 출력이 뒤죽박죽이라면 언제였나요? 제 경험상 가장 큰 고통은 인식 자체가 아니라 정리 작업입니다. 다행히 Aspose OCR Cloud는 LLM 후처리기를 연결하여 *OCR 텍스트를 자동으로 정리*할 수 있게 해줍니다. 이번 튜토리얼에서는 **Hugging Face 모델 다운로드**부터 LLM 설정, OCR 엔진 실행, 최종 결과 정리까지 필요한 모든 과정을 단계별로 안내합니다. + +이 가이드를 끝까지 따라하면 다음과 같은 준비된 스크립트를 얻게 됩니다: + +1. Hugging Face에서 컴팩트한 Qwen 2.5 모델을 가져옵니다(자동 다운로드). +2. 모델을 GPU와 CPU에 부분적으로 할당하도록 구성합니다. +3. 손글씨 메모 이미지에 OCR 엔진을 실행합니다. +4. LLM을 사용해 인식된 텍스트를 정리하여 사람이 읽기 쉬운 출력으로 변환합니다. 
+ +> **Prerequisites** – Python 3.8+, `asposeocrcloud` 패키지, 최소 4 GB VRAM을 가진 GPU(선택 사항이지만 권장), 그리고 첫 모델 다운로드를 위한 인터넷 연결. + +--- + +## 필요 사항 + +- **Aspose OCR Cloud SDK** – `pip install asposeocrcloud` 로 설치합니다. +- **샘플 이미지** – 예: 로컬 폴더에 `handwritten_note.jpg` 를 배치합니다. +- **GPU 지원** – CUDA 지원 GPU가 있으면 스크립트가 30개의 레이어를 오프로드합니다; 없으면 자동으로 CPU만 사용합니다. +- **쓰기 권한** – 스크립트가 모델을 `YOUR_DIRECTORY` 에 캐시하므로 해당 폴더가 존재하는지 확인합니다. + +--- + +## Step 1 – LLM 모델 구성 (Hugging Face 모델 다운로드) + +먼저 Aspose AI에 모델을 어디서 가져올지 알려줍니다. `AsposeAIModelConfig` 클래스가 자동 다운로드, 양자화, GPU 레이어 할당을 처리합니다. + +```python +import asposeocrcloud as ocr +from asposeocrcloud.ai import AsposeAI, AsposeAIModelConfig + +# ---------------------------------------------------------------------- +# Step 1: Model configuration – this will download the model if it’s missing +# ---------------------------------------------------------------------- +model_config = AsposeAIModelConfig( + allow_auto_download="true", # Enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", # Repo on Hugging Face + hugging_face_quantization="int8", # Small footprint, fast inference + gpu_layers=30, # 30 layers on GPU, rest on CPU + directory_model_path=r"YOUR_DIRECTORY" # Cache folder (optional) +) +``` + +**Why this matters** – `int8` 로 양자화하면 메모리 사용량이 크게 줄어듭니다(≈ 4 GB vs 12 GB). 모델을 GPU와 CPU에 나누어 배치하면 RTX 3060 같은 보통 사양에서도 30억 파라미터 LLM을 실행할 수 있습니다. GPU가 없으면 `gpu_layers=0` 으로 설정하면 SDK가 모든 작업을 CPU에서 수행합니다. + +> **Tip:** 첫 실행 시 약 1.5 GB 를 다운로드하므로 몇 분 정도 기다리고 안정적인 연결을 유지하세요. + +--- + +## Step 2 – 모델 구성으로 AI 엔진 초기화 + +이제 Aspose AI 엔진을 시작하고 방금 만든 구성을 전달합니다. 
+ +```python +# ---------------------------------------------------------------------- +# Step 2: Initialise the AI engine – pulls the model if needed +# ---------------------------------------------------------------------- +ocr_ai = AsposeAI() +ocr_ai.initialize(model_config) # This call blocks until the model is ready +``` + +**What’s happening under the hood?** SDK가 `directory_model_path` 에 기존 모델이 있는지 확인합니다. 일치하는 버전을 찾으면 즉시 로드하고, 없으면 Hugging Face 에서 GGUF 파일을 다운로드하고 압축을 풀어 추론 파이프라인을 준비합니다. + +--- + +## Step 3 – OCR 엔진 생성 및 AI 후처리기 연결 + +OCR 엔진은 문자 인식이라는 무거운 작업을 수행합니다. `ocr_ai.run_postprocessor` 를 연결하면 인식 후 자동으로 **clean OCR text** 가 활성화됩니다. + +```python +# ---------------------------------------------------------------------- +# Step 3: Build the OCR engine and bind the LLM post‑processor +# ---------------------------------------------------------------------- +ocr_engine = ocr.OcrEngine() +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=None) +``` + +**Why use a post‑processor?** 원시 OCR 결과에는 잘못된 줄바꿈, 오인식된 구두점, 불필요한 기호 등이 포함될 수 있습니다. LLM은 출력을 올바른 문장으로 재작성하고, 철자를 교정하며, 누락된 단어를 추론까지 해줍니다—즉, 원시 덤프를 깔끔한 문장으로 변환합니다. + +--- + +## Step 4 – 이미지 파일에 OCR 실행 + +모든 설정이 완료되었으니 이제 이미지를 엔진에 전달할 차례입니다. + +```python +# ---------------------------------------------------------------------- +# Step 4: Load the image and run OCR +# ---------------------------------------------------------------------- +input_image = ocr.Image.load(r"YOUR_DIRECTORY/handwritten_note.jpg") +raw_result = ocr_engine.recognize(input_image) # Returns an OcrResult object +``` + +**Edge case:** 이미지가 크면(> 5 MP) 먼저 리사이즈하여 처리 속도를 높이는 것이 좋습니다. SDK는 Pillow `Image` 객체를 받으므로 필요 시 `PIL.Image.thumbnail()` 로 전처리할 수 있습니다. + +--- + +## Step 5 – AI 로 인식 텍스트 정리 및 두 버전 출력 + +마지막으로 앞서 연결한 후처리기를 호출합니다. 이 단계에서는 *정리 전*과 *정리 후* 텍스트를 비교합니다. 
+ +```python +# ---------------------------------------------------------------------- +# Step 5: Clean the OCR output using the LLM and display both results +# ---------------------------------------------------------------------- +cleaned_result = ocr_engine.run_postprocessor(raw_result) + +print("=== Before AI ===") +print(raw_result.text) + +print("\n=== After AI ===") +print(cleaned_result.text) +``` + +### Expected Output + +``` +=== Before AI === +Th1s 1s a h@ndwr1tt3n n0te. It c0nta1ns m1st@k3s, l1n3 br3aks, & sp3c!@l ch@r@ct3rs. + +=== After AI === +This is a handwritten note. It contains mistakes, line breaks, and special characters. +``` + +LLM이 수행한 작업을 확인해 보세요: + +- 일반적인 OCR 오인식 수정(`Th1s` → `This`). +- 불필요한 기호 제거(`&` → `and`). +- 줄바꿈을 적절한 문장으로 정규화. + +--- + +## 🎨 Visual Overview (Run OCR on image Workflow) + +![이미지에서 OCR 실행 워크플로우](run_ocr_on_image_workflow.png "모델 다운로드부터 정제된 출력까지 이미지에서 OCR 실행 파이프라인을 보여주는 다이어그램") + +위 다이어그램은 전체 파이프라인을 요약합니다: **Hugging Face 모델 다운로드 → LLM 구성 → AI 초기화 → OCR 엔진 → AI 후처리기 → 정제된 OCR 텍스트**. + +--- + +## Common Questions & Pro Tips + +### What if I don’t have a GPU? + +`AsposeAIModelConfig` 에서 `gpu_layers=0` 으로 설정하세요. 모델이 완전히 CPU에서 실행되며 속도는 느리지만 여전히 동작합니다. 추론 시간을 합리적으로 유지하려면 더 작은 모델(예: `Qwen/Qwen2.5-1.5B‑Instruct‑GGUF`) 로 전환할 수도 있습니다. + +### How do I change the model later? + +`hugging_face_repo_id` 를 업데이트하고 `ocr_ai.initialize(model_config)` 를 다시 실행하면 됩니다. SDK가 버전 변화를 감지하고 새 모델을 다운로드한 뒤 캐시된 파일을 교체합니다. + +### Can I customise the post‑processor prompt? + +가능합니다. `custom_settings` 에 `prompt_template` 키를 포함한 딕셔너리를 전달하면 됩니다. 예시: + +```python +custom_prompt = { + "prompt_template": "Correct the following OCR text and keep line breaks:\n{ocr_text}" +} +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=custom_prompt) +``` + +### Should I store the cleaned text to a file? + +물론입니다. 
정리 후 결과를 `.txt` 혹은 `.json` 파일에 저장하여 후속 처리에 활용할 수 있습니다: + +```python +with open("cleaned_note.txt", "w", encoding="utf-8") as f: + f.write(cleaned_result.text) +``` + +--- + +## Conclusion + +이번 튜토리얼을 통해 Aspose OCR Cloud 로 **이미지에서 OCR 실행**, **Hugging Face 모델 자동 다운로드**, **LLM 모델 설정**, 그리고 강력한 LLM 후처리기로 **OCR 텍스트 정리**까지 한 번에 수행하는 방법을 보여드렸습니다. 전체 과정은 단일 Python 스크립트에 담겨 있으며 GPU가 있든 없든 모두 작동합니다. + +이 파이프라인에 익숙해졌다면 다음을 시도해 보세요: + +- **다양한 LLM** – 더 큰 컨텍스트 윈도우가 필요한 경우 `meta-llama/Meta-Llama-3-8B‑Instruct‑GGUF` 를 사용해 보세요. +- **배치 처리** – 이미지 폴더를 순회하면서 정리된 결과를 CSV 로 집계합니다. +- **맞춤 프롬프트** – 도메인(법률 문서, 의료 기록 등)에 맞게 AI를 튜닝합니다. + +`gpu_layers` 값을 조정하거나 모델을 교체하고, 직접 만든 프롬프트를 연결해 보세요. 가능성은 무한하며, 지금 가지고 있는 코드는 출발점에 불과합니다. + +행복한 코딩 되시고, OCR 출력이 언제나 깨끗하길 바랍니다! 🚀 + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/polish/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md b/ocr/polish/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md new file mode 100644 index 000000000..52cab0fc2 --- /dev/null +++ b/ocr/polish/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md @@ -0,0 +1,224 @@ +--- +category: general +date: 2026-03-28 +description: Jak używać OCR do rozpoznawania odręcznego tekstu na obrazach. Dowiedz + się, jak wyodrębnić odręczny tekst, przekształcić obraz odręczny i szybko uzyskać + czyste wyniki. +draft: false +keywords: +- how to use OCR +- recognize handwritten text +- extract handwritten text +- handwritten note to text +- convert handwritten image +language: pl +og_description: Jak używać OCR do rozpoznawania odręcznego tekstu. Ten tutorial pokazuje + krok po kroku, jak wyodrębnić odręczny tekst z obrazów i uzyskać dopracowane wyniki. 
+og_title: Jak używać OCR do rozpoznawania tekstu odręcznego – Kompletny przewodnik +tags: +- OCR +- Handwriting Recognition +- Python +title: Jak używać OCR do rozpoznawania tekstu odręcznego – kompletny przewodnik +url: /pl/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Jak używać OCR do rozpoznawania tekstu odręcznego – Kompletny przewodnik + +Jak używać OCR do notatek odręcznych jest pytaniem, które zadaje wielu programistów, gdy muszą zdigitalizować szkice, protokoły spotkań lub szybkie pomysły. W tym przewodniku przeprowadzimy Cię przez dokładne kroki rozpoznawania tekstu odręcznego, wyodrębniania tekstu odręcznego i przekształcania obrazu odręcznego w czyste, przeszukiwalne ciągi znaków. + +Jeśli kiedykolwiek patrzyłeś na zdjęcie listy zakupów i zastanawiałeś się: „Czy mogę przekonwertować ten odręczny obraz na tekst bez ponownego przepisywania?” – jesteś we właściwym miejscu. Po zakończeniu będziesz mieć gotowy do uruchomienia skrypt, który zamienia **notatkę odręczną na tekst** w kilka sekund. + +## Czego będziesz potrzebować + +- Python 3.8+ (kod działa z każdą nowszą wersją) +- Biblioteka `ocr` – zainstaluj ją za pomocą `pip install ocr-sdk` (zastąp nazwą pakietu swojego dostawcy) +- Czytelne zdjęcie notatki odręcznej (`hand_note.png` w przykładzie) +- Odrobina ciekawości i kawa ☕️ (opcjonalnie, ale zalecane) + +Bez ciężkich frameworków, bez płatnych kluczy w chmurze – tylko lokalny silnik, który obsługuje **rozpoznawanie odręczne** od razu po instalacji. + +## Krok 1 – Zainstaluj pakiet OCR i zaimportuj go + +Na początek, pobierzmy odpowiedni pakiet na Twój komputer. 
Otwórz terminal i uruchom:

```bash
pip install ocr-sdk
```

Po zakończeniu instalacji zaimportuj moduł w swoim skrypcie:

```python
# Step 1: Import the OCR SDK
import ocr
```

> **Porada:** Jeśli używasz wirtualnego środowiska, aktywuj je przed instalacją. Dzięki temu Twój projekt będzie uporządkowany i unikniesz konfliktów wersji.

## Krok 2 – Utwórz silnik OCR i włącz tryb odręczny

Teraz przechodzimy do sedna tego, **jak używać OCR** – potrzebujemy instancji silnika, który wie, że mamy do czynienia z pismem odręcznym, a nie drukowanymi czcionkami. Poniższy fragment kodu tworzy silnik i przełącza go w tryb odręczny:

```python
# Step 2: Initialize the OCR engine for handwritten text
ocr_engine = ocr.OcrEngine()
ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN
```

Dlaczego ustawiamy `recognition_mode`? Ponieważ większość silników OCR domyślnie wykrywa tekst drukowany, co często pomija pętle i pochylenia w notatce osobistej. Włączenie trybu odręcznego znacznie zwiększa dokładność.

## Krok 3 – Załaduj obraz, który chcesz przekonwertować (Konwersja obrazu odręcznego)

Obrazy są surowym materiałem dla każdego zadania OCR. Upewnij się, że Twoje zdjęcie jest zapisane w formacie bezstratnym (PNG sprawdza się doskonale) i że tekst jest w miarę czytelny. Następnie załaduj je w ten sposób:

```python
# Step 3: Load the handwritten image you want to convert
handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png")
```

Jeśli obraz znajduje się w tym samym katalogu co skrypt, możesz po prostu użyć `"hand_note.png"` zamiast pełnej ścieżki.

> **Co zrobić, jeśli obraz jest rozmyty?** Spróbuj wstępnego przetwarzania za pomocą OpenCV (np. `cv2.cvtColor` do konwersji na odcienie szarości, `cv2.threshold` do zwiększenia kontrastu) przed przekazaniem go do silnika OCR.

## Krok 4 – Uruchom silnik rozpoznawania, aby wyodrębnić tekst odręczny

Gdy silnik jest gotowy, a obraz w pamięci, możemy w końcu **wyodrębnić tekst odręczny**. 
Metoda `recognize` zwraca surowy obiekt wyniku, który zawiera tekst oraz oceny pewności. + +```python +# Step 4: Perform OCR and get the raw result +raw_result = ocr_engine.recognize(handwritten_image) +print("Raw OCR output:") +print(raw_result.text) +``` + +Typowy surowy wynik może zawierać niechciane podziały linii lub błędnie rozpoznane znaki, szczególnie jeśli pismo jest niechlujne. Dlatego istnieje kolejny krok. + +## Krok 5 – (Opcjonalnie) Wypoleruj wynik za pomocą AI post‑procesora + +Większość nowoczesnych SDK OCR dostarcza lekki AI post‑procesor, który usuwa nadmiarowe spacje, naprawia typowe błędy OCR i normalizuje zakończenia linii. Uruchomienie go jest tak proste jak: + +```python +# Step 5: Refine the raw OCR output (handwritten note to text) +polished_result = ocr_engine.run_postprocessor(raw_result) + +# Display the cleaned, readable text +print("\nPolished OCR output:") +print(polished_result.text) +``` + +Jeśli pominiesz ten krok, nadal otrzymasz użyteczny tekst, ale konwersja **notatki odręcznej na tekst** będzie wyglądać nieco surowiej. Post‑procesor jest szczególnie przydatny w notatkach zawierających wypunktowania lub słowa z mieszanymi wielkościami liter. + +## Krok 6 – Zweryfikuj wynik i obsłuż przypadki brzegowe + +Po wydrukowaniu wypolerowanego wyniku, sprawdź podwójnie, czy wszystko wygląda poprawnie. Oto szybka kontrola, którą możesz dodać: + +```python +# Step 6: Simple verification +if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") +else: + print("\n✅ OCR succeeded! You can now save or further process the text.") +``` + +**Lista kontrolna przypadków brzegowych** + +| Sytuacja | Co zrobić | +|-----------|------------| +| **Bardzo niski kontrast** | Zwiększ kontrast za pomocą `cv2.convertScaleAbs` przed załadowaniem. | +| **Wiele języków** | Ustaw `ocr_engine.language = ["en", "es"]` (lub swoje docelowe języki). 
| +| **Duże dokumenty** | Przetwarzaj strony w partiach, aby uniknąć skoków pamięci. | +| **Specjalne symbole** | Dodaj własny słownik poprzez `ocr_engine.add_custom_words([...])`. | + +## Przegląd wizualny + +Poniżej znajduje się obraz zastępczy ilustrujący przepływ pracy — od sfotografowanej notatki do czystego tekstu. Tekst alternatywny zawiera główne słowo kluczowe, co sprawia, że obraz jest przyjazny SEO. + +![jak używać OCR na obrazie notatki odręcznej](/images/handwritten_ocr_flow.png "jak używać OCR na obrazie notatki odręcznej") + +## Pełny, gotowy do uruchomienia skrypt + +Łącząc wszystkie elementy, oto kompletny, gotowy do skopiowania i wklejenia program: + +```python +# Complete script: Convert a handwritten image to clean text using OCR + +import ocr + +def main(): + # 1️⃣ Initialize OCR engine for handwritten recognition + ocr_engine = ocr.OcrEngine() + ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN + + # 2️⃣ Load the image containing the handwritten note + handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") + + # 3️⃣ Perform OCR to extract raw text + raw_result = ocr_engine.recognize(handwritten_image) + print("Raw OCR output:") + print(raw_result.text) + + # 4️⃣ (Optional) Run AI post‑processor for cleaner output + polished_result = ocr_engine.run_postprocessor(raw_result) + + # 5️⃣ Show the polished, readable text + print("\nPolished OCR output:") + print(polished_result.text) + + # 6️⃣ Simple sanity check + if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") + else: + print("\n✅ OCR succeeded! You can now save or further process the text.") + +if __name__ == "__main__": + main() +``` + +**Oczekiwany wynik (przykład)** + +``` +Raw OCR output: +T0d@y I w3nt to the market +and bought 5 aplpes, 2 bananas, +and a loaf of bread. + +Polished OCR output: +Today I went to the market and bought 5 apples, 2 bananas, and a loaf of bread. 
+
```

Zauważ, jak post‑procesor poprawił literówkę „T0d@y” i znormalizował odstępy.

## Częste pułapki i porady

- **Rozmiar obrazu ma znaczenie** – silniki OCR zazwyczaj ograniczają rozmiar wejściowy do 4K × 4K. Najpierw zmniejsz duże zdjęcia.
- **Styl pisma** – pismo odręczne vs. litery drukowane mogą wpływać na dokładność. Jeśli kontrolujesz źródło (np. cyfrowe pióro), zachęcaj do liter drukowanych dla najlepszych rezultatów.
- **Przetwarzanie wsadowe** – przy pracy z dziesiątkami notatek otocz skrypt pętlą i zapisz każdy wynik w pliku CSV lub bazie SQLite.
- **Wycieki pamięci** – niektóre SDK utrzymują wewnętrzne bufory; wywołaj `ocr_engine.dispose()` po zakończeniu, jeśli zauważysz spowolnienie.

## Kolejne kroki – wyjście poza prosty OCR

Teraz, gdy opanowałeś **jak używać OCR** dla pojedynczego obrazu, rozważ te rozszerzenia:

1. **Integracja z przechowywaniem w chmurze** – Pobieraj obrazy z AWS S3 lub Azure Blob, uruchamiaj ten sam pipeline i odsyłaj wyniki z powrotem.
2. **Dodaj wykrywanie języka** – Użyj `ocr_engine.detect_language()`, aby automatycznie przełączać słowniki.
3. **Połączenie z NLP** – Przekaż oczyszczony tekst do spaCy lub NLTK, aby wyodrębnić encje, daty lub zadania.
4. **Utwórz endpoint REST** – Owiń skrypt w Flask lub FastAPI, aby inne usługi mogły wysyłać POST z obrazami i otrzymywać tekst zakodowany w JSON.

Wszystkie te pomysły wciąż obracają się wokół podstawowych pojęć **rozpoznawania tekstu odręcznego**, **wyodrębniania tekstu odręcznego** i **konwersji obrazu odręcznego** — dokładnych fraz, które prawdopodobnie będziesz wyszukiwać dalej.

---

### TL;DR

Pokazaliśmy Ci, **jak używać OCR**, aby rozpoznawać tekst odręczny, wyodrębniać go i wypolerować wynik do użytecznego ciągu znaków. Pełny skrypt jest gotowy do uruchomienia, przepływ pracy wyjaśniony krok po kroku, a Ty masz już listę kontrolną typowych przypadków brzegowych. 
Zrób zdjęcie swojej kolejnej notatki ze spotkania, podłącz je do skryptu i pozwól maszynie pisać za Ciebie. + +Miłego kodowania i niech Twoje notatki zawsze będą czytelne! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/polish/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md b/ocr/polish/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md new file mode 100644 index 000000000..8ac3da8bb --- /dev/null +++ b/ocr/polish/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md @@ -0,0 +1,186 @@ +--- +category: general +date: 2026-03-28 +description: Wykonaj OCR na obrazie i uzyskaj czysty tekst z współrzędnymi prostokątów + ograniczających. Dowiedz się, jak wyodrębnić OCR, oczyścić OCR i wyświetlić wyniki + krok po kroku. +draft: false +keywords: +- perform OCR on image +- how to extract OCR +- how to clean OCR +- display bounding box coordinates +- OCR post‑processing +- OCR bounding boxes +language: pl +og_description: Wykonaj OCR na obrazie, oczyść wynik i wyświetl współrzędne ramki + ograniczającej w zwięzłym samouczku. 
+og_title: Wykonaj OCR na obrazie – czyste wyniki i ramki ograniczające +tags: +- OCR +- Computer Vision +- Python +title: Wykonaj OCR na obrazie – czyste wyniki i wyświetl współrzędne ramki ograniczającej +url: /pl/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Wykonaj OCR na obrazie – Oczyść wyniki i pokaż współrzędne prostokątów ograniczających + +Czy kiedykolwiek musiałeś **wykonać OCR na plikach obrazu**, ale otrzymywałeś nieuporządkowany tekst i nie wiedziałeś, gdzie każde słowo znajduje się na zdjęciu? Nie jesteś sam. W wielu projektach — digitalizacja faktur, skanowanie paragonów czy proste wyodrębnianie tekstu — surowy wynik OCR to dopiero pierwszy krok. Dobra wiadomość? Możesz oczyścić ten wynik i natychmiast zobaczyć współrzędne prostokątów ograniczających każdą sekcję, nie pisząc mnóstwa kodu szablonowego. + +W tym przewodniku przejdziemy przez **wyodrębnianie OCR**, uruchomienie **post‑procesora czyszczenia OCR** oraz w końcu **wyświetlenie współrzędnych prostokątów ograniczających** dla każdego oczyszczonego regionu. Po zakończeniu będziesz mieć pojedynczy, gotowy do uruchomienia skrypt, który zamieni rozmyte zdjęcie w uporządkowany, strukturalny tekst gotowy do dalszego przetwarzania. + +## Czego będziesz potrzebować + +- Python 3.9+ (składnia poniżej działa na 3.8 i nowszych) +- Silnik OCR obsługujący `recognize(..., return_structured=True)` – na przykład fikcyjna biblioteka `engine` użyta w przykładzie. Zamień ją na Tesseract, EasyOCR lub dowolne SDK zwracające dane o regionach. +- Podstawowa znajomość funkcji i pętli w Pythonie +- Plik obrazu, który chcesz zeskanować (PNG, JPG, itp.) + +> **Pro tip:** Jeśli używasz Tesseract, funkcja `pytesseract.image_to_data` już zwraca prostokąty ograniczające. 
Możesz opakować jej wynik w mały adapter, który naśladuje API `engine.recognize` pokazane poniżej. + +--- + +![perform OCR on image example](image-placeholder.png "perform OCR on image example") + +*Alt text: diagram pokazujący, jak wykonać OCR na obrazie i zwizualizować współrzędne prostokątów ograniczających* + +## Krok 1 – Wykonaj OCR na obrazie i uzyskaj strukturalne regiony + +Pierwszym krokiem jest poproszenie silnika OCR o zwrócenie nie tylko zwykłego tekstu, ale także strukturalnej listy regionów tekstowych. Lista ta zawiera surowy ciąg znaków oraz prostokąt, który go otacza. + +```python +import engine # replace with your actual OCR library +from pathlib import Path + +# Load the image you want to process +image_path = Path("sample_invoice.jpg") +image = engine.load_image(image_path) + +# Step 1: Perform OCR and request a structured list of text regions +raw_result = engine.recognize(image, return_structured=True) +``` + +**Dlaczego to ważne:** +Gdy żądasz jedynie zwykłego tekstu, tracisz kontekst przestrzenny. Strukturalne dane pozwalają później **wyświetlić współrzędne prostokątów ograniczających**, dopasować tekst do tabel lub przekazać precyzyjne położenie do kolejnego modelu. + +## Krok 2 – Jak oczyścić wynik OCR przy użyciu post‑procesora + +Silniki OCR świetnie rozpoznają znaki, ale często pozostawiają zbędne spacje, artefakty podziału linii lub błędnie rozpoznane symbole. Post‑procesor normalizuje tekst, naprawia typowe błędy OCR i usuwa nadmiarowe białe znaki. + +```python +# Step 2: Clean the plain‑text of each region using the post‑processor +processed_result = engine.run_postprocessor(raw_result) +``` + +Jeśli tworzysz własny czyszczacz, rozważ: + +- Usuwanie znaków nie‑ASCII (`re.sub(r'[^\x00-\x7F]+',' ', text)`) +- Zastępowanie wielu spacji jedną +- Użycie sprawdzania pisowni, np. 
`pyspellchecker`, aby naprawić oczywiste literówki + +**Dlaczego warto się tym przejmować:** +Uporządkowany ciąg znaków sprawia, że wyszukiwanie, indeksowanie i dalsze pipeline’y NLP są znacznie bardziej niezawodne. Innymi słowy, **jak oczyścić OCR** to często różnica między użytecznym zestawem danych a koszmarem. + +## Krok 3 – Wyświetl współrzędne prostokątów ograniczających dla każdego oczyszczonego regionu + +Teraz, gdy tekst jest czysty, iterujemy po każdym regionie, wypisując jego prostokąt i oczyszczony ciąg znaków. To właśnie w tym miejscu **wyświetlamy współrzędne prostokątów ograniczających**. + +```python +# Step 3 – Iterate over the cleaned regions and display their bounding box and text +for text_region in processed_result.regions: + # Each region has a .bounding_box attribute (x, y, width, height) + bbox = text_region.bounding_box + print(f"[{bbox}] {text_region.text}") +``` + +**Przykładowy wynik** + +``` +[(34, 120, 210, 30)] Invoice #12345 +[(34, 160, 420, 28)] Date: 2026‑03‑01 +[(34, 200, 380, 28)] Total Amount: $1,254.00 +``` + +Możesz teraz przekazać te współrzędne do biblioteki rysującej (np. OpenCV), aby nałożyć prostokąty na oryginalny obraz, lub zapisać je w bazie danych do późniejszych zapytań. + +## Pełny, gotowy do uruchomienia skrypt + +Poniżej znajduje się kompletny program, który łączy wszystkie trzy kroki. Zamień wywołania `engine` na rzeczywiste wywołania Twojego SDK OCR. + +```python +#!/usr/bin/env python3 +""" +Perform OCR on image → clean results → display bounding box coordinates. 
+
Author: Your Name
Date: 2026‑03‑28
"""

import engine  # <-- replace with your OCR library
from pathlib import Path
import sys

def main(image_path: str):
    # Load image
    image = engine.load_image(Path(image_path))

    # 1️⃣ Perform OCR and ask for structured output
    raw_result = engine.recognize(image, return_structured=True)

    # 2️⃣ Clean the raw text using the built‑in post‑processor
    processed_result = engine.run_postprocessor(raw_result)

    # 3️⃣ Show each region's bounding box and cleaned text
    print("\n=== Cleaned OCR Regions ===")
    for region in processed_result.regions:
        bbox = region.bounding_box  # (x, y, w, h)
        print(f"[{bbox}] {region.text}")

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python perform_ocr.py <image_path>")
        sys.exit(1)
    main(sys.argv[1])
```

### Jak uruchomić

```bash
python perform_ocr.py sample_invoice.jpg
```

Powinieneś zobaczyć listę prostokątów ograniczających sparowanych z oczyszczonym tekstem, dokładnie taką jak w przykładowym wyniku powyżej.

## Najczęściej zadawane pytania i przypadki brzegowe

| Pytanie | Odpowiedź |
|----------|--------|
| **Co zrobić, jeśli silnik OCR nie obsługuje `return_structured`?** | Napisz cienki wrapper, który przekształci surowy wynik silnika (zwykle listę słów z koordynatami) w obiekty z atrybutami `text` i `bounding_box`. |
| **Czy mogę uzyskać wyniki wiarygodności (confidence)?** | Wiele SDK udostępnia metrykę wiarygodności dla każdego regionu. Dodaj ją do instrukcji drukowania: `print(f"[{bbox}] {region.text} (conf: {region.confidence:.2f})")`. |
| **Jak obsłużyć tekst obrócony?** | Przetwórz obraz wstępnie przy użyciu `cv2.minAreaRect` z OpenCV, aby wyrównać go przed wywołaniem `recognize`. |
| **Co jeśli potrzebuję wyniku w formacie JSON?** | Serializuj `processed_result.regions` przy pomocy `json.dumps([r.__dict__ for r in processed_result.regions], indent=2)`. 
| +| **Czy istnieje sposób na wizualizację prostokątów?** | Użyj OpenCV: `cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0), 2)` wewnątrz pętli, a potem `cv2.imwrite("annotated.jpg", img)`. | + +## Podsumowanie + +Właśnie nauczyłeś się **wykonywać OCR na obrazie**, czyścić surowy wynik i **wyświetlać współrzędne prostokątów ograniczających** dla każdego regionu. Trójstopniowy przepływ — rozpoznawanie → post‑procesowanie → iteracja — to wzorzec, który możesz wstawić do dowolnego projektu Pythona wymagającego niezawodnego wyodrębniania tekstu. + +### Co dalej? + +- **Eksploruj różne backendy OCR** (Tesseract, EasyOCR, Google Vision) i porównaj ich dokładność. +- **Zintegruj z bazą danych**, aby przechowywać dane regionów dla przeszukiwalnych archiwów. +- **Dodaj wykrywanie języka**, aby kierować każdy region do odpowiedniego sprawdzania pisowni. +- **Nałóż prostokąty na oryginalny obraz** w celu wizualnej weryfikacji (zobacz fragment OpenCV powyżej). + +Jeśli napotkasz problemy, pamiętaj, że największy zysk pochodzi z solidnego kroku post‑procesowania; czysty ciąg znaków jest znacznie łatwiejszy do pracy niż surowy zlepek znaków. + +Miłego kodowania i niech Twoje pipeline’y OCR będą zawsze schludne! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/polish/python/general/python-ocr-tutorial-extract-text-from-images/_index.md b/ocr/polish/python/general/python-ocr-tutorial-extract-text-from-images/_index.md new file mode 100644 index 000000000..976783fb5 --- /dev/null +++ b/ocr/polish/python/general/python-ocr-tutorial-extract-text-from-images/_index.md @@ -0,0 +1,232 @@ +--- +category: general +date: 2026-03-28 +description: Samouczek OCR w Pythonie pokazujący, jak wyodrębnić tekst z obrazu w + Pythonie przy użyciu Aspose OCR Cloud. 
Dowiedz się, jak wczytać obraz do OCR i w + kilka minut przekształcić obraz w zwykły tekst. +draft: false +keywords: +- python ocr tutorial +- extract text image python +- ocr image to text +- load image for ocr +- convert image plain text +language: pl +og_description: Samouczek OCR w Pythonie wyjaśnia, jak wczytać obraz do OCR i przekształcić + go w zwykły tekst przy użyciu Aspose OCR Cloud. Pobierz pełny kod i wskazówki. +og_title: Samouczek OCR w Pythonie – Wyodrębnianie tekstu z obrazów +tags: +- OCR +- Python +- Image Processing +title: Samouczek OCR w Pythonie – Wyodrębnianie tekstu z obrazów +url: /pl/python/general/python-ocr-tutorial-extract-text-from-images/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Python OCR Tutorial – Wyodrębnianie tekstu z obrazów + +Zastanawiałeś się kiedyś, jak zamienić nieczytelne zdjęcie paragonu w czysty, przeszukiwalny tekst? Nie jesteś jedyny. Z mojego doświadczenia największą przeszkodą nie jest sam silnik OCR, lecz przygotowanie obrazu w odpowiednim formacie i wyciągnięcie czystego tekstu bez problemów. + +Ten **python ocr tutorial** przeprowadzi Cię przez każdy krok — ładowanie obrazu do OCR, uruchomienie rozpoznawania i w końcu konwersję czystego tekstu obrazu do łańcucha znaków w Pythonie, który możesz przechowywać lub analizować. Po zakończeniu będziesz w stanie **extract text image python** w stylu, i nie będziesz potrzebował płatnej licencji, aby rozpocząć. + +## Czego się nauczysz + +- Jak zainstalować i zaimportować Aspose OCR Cloud SDK dla Pythona. +- Dokładny kod do **load image for OCR** (PNG, JPEG, TIFF, PDF, itp.). +- Jak wywołać silnik, aby wykonał konwersję **ocr image to text**. +- Wskazówki dotyczące obsługi typowych przypadków brzegowych, takich jak wielostronicowe PDF‑y lub skany o niskiej rozdzielczości. +- Sposoby weryfikacji wyniku i co zrobić, gdy tekst jest nieczytelny. 
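Skoro mowa o weryfikacji wyniku: prostą kontrolę jakości można napisać w czystym Pythonie, zanim jeszcze dotkniemy SDK. Poniższy szkic to wyłącznie przykładowe założenie (funkcja `looks_reasonable` i jej progi nie są częścią Aspose OCR Cloud) – odrzuca wyniki puste lub zdominowane przez śmieciowe znaki:

```python
def looks_reasonable(text: str, min_chars: int = 3) -> bool:
    """Heurystyczna kontrola wyniku OCR: odrzuca teksty puste,
    zbyt krótkie lub zdominowane przez znaki specjalne."""
    cleaned = text.strip()
    if len(cleaned) < min_chars:
        return False
    # Udział znaków alfanumerycznych w całym tekście
    alnum = sum(ch.isalnum() for ch in cleaned)
    return alnum / len(cleaned) >= 0.5

print(looks_reasonable("Total: $4.75"))  # sensowny wynik
print(looks_reasonable("@#!%  @@"))      # sam szum
```

Jeśli taka funkcja zwróci `False`, zwykle lepiej poprawić jakość skanu, niż analizować wynik dalej.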
+ +### Wymagania wstępne + +- Python 3.8+ zainstalowany na Twoim komputerze. +- Darmowe konto Aspose Cloud (wersja próbna działa bez licencji). +- Podstawowa znajomość pip i środowisk wirtualnych — nic skomplikowanego. + +> **Pro tip:** Jeśli już używasz virtualenv, aktywuj go teraz. Dzięki temu Twoje zależności będą uporządkowane i unikniesz konfliktów wersji. + +![Python OCR tutorial screenshot showing recognized text](path/to/ocr_example.png "Python OCR tutorial – extracted plain text display") + +## Krok 1 – Zainstaluj Aspose OCR Cloud SDK + +Na początek potrzebujemy biblioteki, która komunikuje się z usługą OCR Aspose. Otwórz terminal i uruchom: + +```bash +pip install asposeocrcloud +``` + +To pojedyncze polecenie pobiera najnowszy SDK (obecnie wersja 23.12). Pakiet zawiera wszystko, czego potrzebujesz — nie są wymagane dodatkowe biblioteki do przetwarzania obrazów. + +## Krok 2 – Zainicjalizuj silnik OCR (Primary Keyword in Action) + +Teraz, gdy SDK jest gotowe, możemy uruchomić silnik **python ocr tutorial**. Konstruktor nie wymaga klucza licencyjnego w wersji próbnej, co upraszcza sprawę. + +```python +import asposeocrcloud as ocr + +# Initialise the OCR engine – no licence needed for trial use +ocr_engine = ocr.OcrEngine() +``` + +> **Dlaczego to ważne:** Inicjalizacja silnika tylko raz sprawia, że kolejne wywołania są szybkie. Jeśli będziesz tworzyć obiekt dla każdego obrazu, zmarnujesz niepotrzebne połączenia sieciowe. + +## Krok 3 – Ładowanie obrazu do OCR + +Tutaj **load image for OCR** naprawdę się przydaje. Metoda `Image.load` w SDK przyjmuje ścieżkę do pliku lub URL i automatycznie wykrywa format (PNG, JPEG, TIFF, PDF, itp.). Załadujmy przykładowy paragon: + +```python +# Step 3: Load the input image (PNG, JPEG, TIFF, PDF …) +input_image = ocr.Image.load(r"YOUR_DIRECTORY/receipt.png") +``` + +Jeśli pracujesz z wielostronicowym PDF‑em, po prostu wskaż plik PDF; SDK potraktuje każdą stronę jako osobny obraz wewnętrznie. 
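Gdy SDK rozbije PDF na strony, każda z nich daje osobny tekst, który zwykle trzeba potem skleić w jeden dokument. Poniższy pomocnik to czysty Python, niezależny od SDK (nazwa `merge_page_texts` jest tu tylko przykładem):

```python
def merge_page_texts(page_texts):
    """Łączy teksty rozpoznane z kolejnych stron w jeden ciąg,
    opatrując każdą stronę nagłówkiem z jej numerem."""
    parts = []
    for i, text in enumerate(page_texts, start=1):
        parts.append(f"--- Page {i} ---\n{text.strip()}")
    return "\n\n".join(parts)
```

Taki sklejony wynik łatwo zapisać do jednego pliku `.txt` albo przekazać dalej do analizy.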
+ +## Krok 4 – Wykonaj konwersję OCR obraz na tekst + +Gdy obraz jest w pamięci, rzeczywiste OCR odbywa się w jednej linii. Metoda `recognize` zwraca obiekt `OcrResult`, który zawiera czysty tekst, wyniki pewności oraz ewentualne ramki ograniczające, jeśli będą potrzebne później. + +```python +# Step 4: Perform OCR on the loaded image +ocr_result = ocr_engine.recognize(input_image) +``` + +> **Przypadek brzegowy:** W przypadku obrazów o niskiej rozdzielczości (poniżej 300 dpi) warto najpierw zwiększyć ich rozmiar. SDK oferuje pomocniczą metodę `Resize`, ale dla większości paragonów domyślne ustawienia działają dobrze. + +## Krok 5 – Konwersja czystego tekstu obrazu na użyteczny łańcuch znaków + +Ostatnim elementem układanki jest wyodrębnienie czystego tekstu z obiektu wyniku. To krok **convert image plain text**, który przekształca blob OCR w coś, co możesz wydrukować, zapisać lub przekazać do innego systemu. + +```python +# Step 5: Output the recognised plain text +print("Plain OCR:") +print(ocr_result.text) +``` + +Po uruchomieniu skryptu powinieneś zobaczyć coś podobnego do: + +``` +Plain OCR: +Starbucks Coffee +Date: 2026/03/27 +Total: $4.75 +Thank you! +``` + +Ten wynik jest teraz zwykłym łańcuchem znaków Pythona, gotowym do eksportu CSV, wstawiania do bazy danych lub przetwarzania języka naturalnego. + +## Radzenie sobie z typowymi problemami + +### 1. Puste lub zaszumione obrazy + +Jeśli `ocr_result.text` jest pusty, sprawdź jakość obrazu. Szybkim rozwiązaniem jest dodanie kroku wstępnego przetwarzania: + +```python +# Simple preprocessing – convert to grayscale and increase contrast +processed = input_image.to_grayscale().adjust_contrast(1.2) +ocr_result = ocr_engine.recognize(processed) +``` + +### 2. Wielostronicowe PDF‑y + +Gdy podasz PDF, `recognize` zwraca wyniki dla każdej strony. 
Przejdź przez nie w pętli w ten sposób: + +```python +pdf_image = ocr.Image.load("document.pdf") +pages = pdf_image.pages # collection of page images + +for i, page in enumerate(pages, start=1): + result = ocr_engine.recognize(page) + print(f"--- Page {i} ---") + print(result.text) +``` + +### 3. Obsługa języków + +Aspose OCR obsługuje ponad 60 języków. Aby zmienić język, ustaw właściwość `language` przed wywołaniem `recognize`: + +```python +ocr_engine.language = "fr" # French +ocr_result = ocr_engine.recognize(input_image) +``` + +## Pełny działający przykład + +Łącząc wszystko razem, oto kompletny, gotowy do skopiowania skrypt, który obejmuje wszystko od instalacji po obsługę przypadków brzegowych: + +```python +# -*- coding: utf-8 -*- +""" +Python OCR tutorial – complete example using Aspose OCR Cloud. +Demonstrates loading an image, performing OCR, and handling multi‑page PDFs. +""" + +import asposeocrcloud as ocr +import os + +def ocr_file(filepath: str, language: str = "en"): + """ + Perform OCR on a given file (image or PDF) and return plain text. + """ + # Initialise engine (trial licence) + engine = ocr.OcrEngine() + engine.language = language + + # Load the file – SDK auto‑detects format + image = ocr.Image.load(filepath) + + # If it's a PDF, iterate over pages + if image.is_pdf: + all_text = [] + for page in image.pages: + result = engine.recognize(page) + all_text.append(result.text) + return "\n".join(all_text) + + # Single‑image case + result = engine.recognize(image) + return result.text + + +if __name__ == "__main__": + # Example usage – replace with your own path + sample_path = os.path.join("YOUR_DIRECTORY", "receipt.png") + + if not os.path.exists(sample_path): + raise FileNotFoundError(f"File not found: {sample_path}") + + extracted = ocr_file(sample_path) + print("=== Extracted Text ===") + print(extracted) +``` + +Uruchom skrypt (`python ocr_demo.py`), a zobaczysz wynik **ocr image to text** bezpośrednio w konsoli. 
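Zamiast tylko wypisywać wynik w konsoli, często warto go od razu utrwalić na dysku. Poniższy drobny pomocnik to czysty Python (nazwa `save_extracted_text` jest jedynie ilustracją, nie API SDK) – zapisuje rozpoznany tekst do pliku `.txt` o nazwie pochodzącej od pliku źródłowego:

```python
from pathlib import Path

def save_extracted_text(image_path: str, text: str, out_dir: str) -> Path:
    """Zapisuje wynik OCR do pliku .txt nazwanego jak obraz źródłowy."""
    out_dir_path = Path(out_dir)
    out_dir_path.mkdir(parents=True, exist_ok=True)
    out_file = out_dir_path / (Path(image_path).stem + ".txt")
    out_file.write_text(text, encoding="utf-8")
    return out_file
```

W skrypcie z tego rozdziału wystarczyłoby wywołać `save_extracted_text(sample_path, extracted, "output")` zaraz po `print(extracted)`.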
+ +## Podsumowanie – Co omówiliśmy + +- Zainstalowano **Aspose OCR Cloud** SDK (`pip install asposeocrcloud`). +- **Zainicjalizowano silnik OCR** bez licencji (idealne dla wersji próbnej). +- Zademonstrowano, jak **load image for OCR**, niezależnie czy to PNG, JPEG czy PDF. +- Wykonano konwersję **ocr image to text** i **converted image plain text** na użyteczny łańcuch znaków Pythona. +- Rozwiązano typowe problemy, takie jak skany o niskiej rozdzielczości, wielostronicowe PDF‑y i wybór języka. + +## Kolejne kroki i powiązane tematy + +Teraz, gdy opanowałeś **python ocr tutorial**, rozważ dalsze zagadnienia: + +- **Extract text image python** do przetwarzania wsadowego dużych folderów z paragonami. +- Integracja wyniku OCR z **pandas** w celu analizy danych (`df = pd.read_csv(StringIO(extracted))`). +- Użycie **Tesseract OCR** jako zapasowego rozwiązania, gdy połączenie internetowe jest ograniczone. +- Dodanie przetwarzania końcowego przy użyciu **spaCy**, aby wykrywać jednostki takie jak daty, kwoty i nazwy sprzedawców. + +Śmiało eksperymentuj: wypróbuj różne formaty obrazów, dostosuj kontrast lub zmień język. Środowisko OCR jest szerokie, a nabyte umiejętności stanowią solidną podstawę dla każdego projektu automatyzacji dokumentów. + +Szczęśliwego kodowania i niech Twój tekst zawsze będzie czytelny! 
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/polish/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md b/ocr/polish/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md new file mode 100644 index 000000000..29c6ebd2e --- /dev/null +++ b/ocr/polish/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md @@ -0,0 +1,221 @@ +--- +category: general +date: 2026-03-28 +description: Dowiedz się, jak uruchomić OCR na obrazie, automatycznie pobrać model + Hugging Face, oczyścić tekst OCR i skonfigurować model LLM w Pythonie przy użyciu + Aspose OCR Cloud. +draft: false +keywords: +- run OCR on image +- download hugging face model +- clean OCR text +- configure LLM model +language: pl +og_description: Uruchom OCR na obrazie i oczyść wynik przy użyciu automatycznie pobranego + modelu Hugging Face. Ten przewodnik pokazuje, jak skonfigurować model LLM w Pythonie. +og_title: Uruchom OCR na obrazie – Kompletny samouczek Aspose OCR Cloud +tags: +- OCR +- Python +- LLM +- HuggingFace +title: Uruchom OCR na obrazie przy użyciu Aspose OCR Cloud – Kompletny przewodnik + krok po kroku +url: /pl/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Uruchom OCR na obrazie – Kompletny samouczek Aspose OCR Cloud + +Czy kiedykolwiek musiałeś wykonać OCR na plikach graficznych, a surowy wynik wyglądał jak nieuporządkowany bałagan? Z mojego doświadczenia największym problemem nie jest samo rozpoznawanie – to czyszczenie. Na szczęście Aspose OCR Cloud pozwala podłączyć post‑procesor LLM, który może *automatycznie wyczyścić tekst OCR*. 
W tym samouczku przeprowadzimy Cię przez wszystko, czego potrzebujesz: od **pobrania modelu z Hugging Face** po skonfigurowanie LLM, uruchomienie silnika OCR i ostateczne wypolerowanie wyniku. + +Po zakończeniu tego przewodnika będziesz mieć gotowy do uruchomienia skrypt, który: + +1. Pobiera kompaktowy model Qwen 2.5 z Hugging Face (automatycznie pobierany dla Ciebie). +2. Konfiguruje model tak, aby część sieci działała na GPU, a reszta na CPU. +3. Wykonuje silnik OCR na obrazie odręcznej notatki. +4. Używa LLM do wyczyszczenia rozpoznanego tekstu, dając wynik czytelny dla człowieka. + +> **Wymagania wstępne** – Python 3.8+, pakiet `asposeocrcloud`, GPU z co najmniej 4 GB VRAM (opcjonalnie, ale zalecane) oraz połączenie internetowe do pierwszego pobrania modelu. + +--- + +## Czego będziesz potrzebować + +- **Aspose OCR Cloud SDK** – zainstaluj za pomocą `pip install asposeocrcloud`. +- **Przykładowy obraz** – np. `handwritten_note.jpg` umieszczony w lokalnym folderze. +- **Wsparcie GPU** – jeśli masz GPU obsługujące CUDA, skrypt przeniesie 30 warstw na GPU; w przeciwnym razie automatycznie przełączy się na CPU. +- **Uprawnienia do zapisu** – skrypt buforuje model w `YOUR_DIRECTORY`; upewnij się, że folder istnieje. + +--- + +## Krok 1 – Skonfiguruj model LLM (pobierz model z Hugging Face) + +Pierwszą rzeczą, którą robimy, jest poinformowanie Aspose AI, skąd pobrać model. Klasa `AsposeAIModelConfig` obsługuje automatyczne pobieranie, kwantyzację i przydział warstw GPU. 
+ +```python +import asposeocrcloud as ocr +from asposeocrcloud.ai import AsposeAI, AsposeAIModelConfig + +# ---------------------------------------------------------------------- +# Step 1: Model configuration – this will download the model if it’s missing +# ---------------------------------------------------------------------- +model_config = AsposeAIModelConfig( + allow_auto_download="true", # Enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", # Repo on Hugging Face + hugging_face_quantization="int8", # Small footprint, fast inference + gpu_layers=30, # 30 layers on GPU, rest on CPU + directory_model_path=r"YOUR_DIRECTORY" # Cache folder (optional) +) +``` + +**Dlaczego to ważne** – Kwantyzacja do `int8` dramatycznie zmniejsza zużycie pamięci (≈ 4 GB vs 12 GB). Podzielenie modelu między GPU a CPU pozwala uruchomić LLM o 3 miliardach parametrów nawet na skromnym RTX 3060. Jeśli nie masz GPU, ustaw `gpu_layers=0`, a SDK utrzyma wszystko na CPU. + +> **Wskazówka:** Pierwsze uruchomienie pobierze ~ 1,5 GB, więc daj mu kilka minut i stabilne połączenie. + +--- + +## Krok 2 – Zainicjalizuj silnik AI z konfiguracją modelu + +Teraz uruchamiamy silnik Aspose AI i przekazujemy mu właśnie stworzoną konfigurację. + +```python +# ---------------------------------------------------------------------- +# Step 2: Initialise the AI engine – pulls the model if needed +# ---------------------------------------------------------------------- +ocr_ai = AsposeAI() +ocr_ai.initialize(model_config) # This call blocks until the model is ready +``` + +**Co się dzieje pod maską?** SDK sprawdza `directory_model_path` pod kątem istniejącego modelu. Jeśli znajdzie pasującą wersję, ładuje ją natychmiast; w przeciwnym razie pobiera plik GGUF z Hugging Face, rozpakowuje go i przygotowuje pipeline inferencyjny. + +--- + +## Krok 3 – Utwórz silnik OCR i podłącz post‑procesor AI + +Silnik OCR wykonuje ciężką pracę rozpoznawania znaków. 
Podłączając `ocr_ai.run_postprocessor`, włączamy **czyszczenie tekstu OCR** automatycznie po rozpoznaniu. + +```python +# ---------------------------------------------------------------------- +# Step 3: Build the OCR engine and bind the LLM post‑processor +# ---------------------------------------------------------------------- +ocr_engine = ocr.OcrEngine() +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=None) +``` + +**Dlaczego używać post‑procesora?** Surowy OCR często zawiera nieprawidłowe podziały linii, błędnie wykryte znaki interpunkcyjne lub zbędne symbole. LLM może przekształcić wynik w poprawne zdania, skorygować pisownię i nawet domyślić się brakujących słów – w praktyce zamienia surowy zrzut w dopracowaną prozę. + +--- + +## Krok 4 – Uruchom OCR na pliku obrazu + +Po podłączeniu wszystkiego, czas przekazać obraz do silnika. + +```python +# ---------------------------------------------------------------------- +# Step 4: Load the image and run OCR +# ---------------------------------------------------------------------- +input_image = ocr.Image.load(r"YOUR_DIRECTORY/handwritten_note.jpg") +raw_result = ocr_engine.recognize(input_image) # Returns an OcrResult object +``` + +**Przypadek brzegowy:** Jeśli obraz jest duży (> 5 MP), warto go najpierw zmniejszyć, aby przyspieszyć przetwarzanie. SDK przyjmuje obiekt Pillow `Image`, więc możesz wstępnie przetworzyć go metodą `PIL.Image.thumbnail()` w razie potrzeby. + +--- + +## Krok 5 – Niech AI wyczyści rozpoznany tekst i pokaż obie wersje + +Na koniec wywołujemy wcześniej podłączony post‑procesor. Ten krok ilustruje kontrast między *przed* a *po* czyszczeniu. 
+ +```python +# ---------------------------------------------------------------------- +# Step 5: Clean the OCR output using the LLM and display both results +# ---------------------------------------------------------------------- +cleaned_result = ocr_engine.run_postprocessor(raw_result) + +print("=== Before AI ===") +print(raw_result.text) + +print("\n=== After AI ===") +print(cleaned_result.text) +``` + +### Oczekiwany wynik + +``` +=== Before AI === +Th1s 1s a h@ndwr1tt3n n0te. It c0nta1ns m1st@k3s, l1n3 br3aks, & sp3c!@l ch@r@ct3rs. + +=== After AI === +This is a handwritten note. It contains mistakes, line breaks, and special characters. +``` + +Zauważ, jak LLM: + +- Naprawił typowe błędy OCR (`Th1s` → `This`). +- Usunął zbędne symbole (`&` → `and`). +- Znormalizował podziały linii do poprawnych zdań. + +--- + +## 🎨 Przegląd wizualny (Workflow „Uruchom OCR na obrazie”) + +![Run OCR on image workflow](run_ocr_on_image_workflow.png "Diagram showing the run OCR on image pipeline from model download to cleaned output") + +Diagram powyżej podsumowuje cały pipeline: **pobranie modelu z Hugging Face → konfiguracja LLM → inicjalizacja AI → silnik OCR → post‑procesor AI → czysty tekst OCR**. + +--- + +## Często zadawane pytania i pro tipy + +### Co zrobić, jeśli nie mam GPU? + +Ustaw `gpu_layers=0` w `AsposeAIModelConfig`. Model będzie działał w pełni na CPU, co jest wolniejsze, ale wciąż funkcjonalne. Możesz także przejść na mniejszy model (np. `Qwen/Qwen2.5-1.5B‑Instruct‑GGUF`), aby utrzymać rozsądny czas inferencji. + +### Jak zmienić model później? + +Po prostu zaktualizuj `hugging_face_repo_id` i ponownie uruchom `ocr_ai.initialize(model_config)`. SDK wykryje zmianę wersji, pobierze nowy model i zastąpi zbuforowane pliki. + +### Czy mogę dostosować prompt post‑procesora? + +Tak. Przekaż słownik do `custom_settings` z kluczem `prompt_template`. 
Przykład:

```python
custom_prompt = {
    "prompt_template": "Correct the following OCR text and keep line breaks:\n{ocr_text}"
}
ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=custom_prompt)
```

### Czy powinienem zapisać wyczyszczony tekst do pliku?

Zdecydowanie. Po czyszczeniu możesz zapisać wynik do pliku `.txt` lub `.json` do dalszego przetwarzania:

```python
with open("cleaned_note.txt", "w", encoding="utf-8") as f:
    f.write(cleaned_result.text)
```

---

## Zakończenie

Pokazaliśmy, jak **uruchomić OCR na obrazach** przy użyciu Aspose OCR Cloud, automatycznie **pobrać model z Hugging Face**, precyzyjnie **skonfigurować ustawienia modelu LLM** i w końcu **wyczyścić tekst OCR** za pomocą potężnego post‑procesora LLM. Cały proces mieści się w jednym, łatwym do uruchomienia skrypcie Pythona i działa zarówno na maszynach z GPU, jak i wyłącznie na CPU.

Jeśli ten pipeline Ci odpowiada, rozważ eksperymenty z:

- **Różnymi LLM** – wypróbuj `meta-llama/Meta-Llama-3-8B-Instruct-GGUF` dla większego okna kontekstowego.
- **Przetwarzaniem wsadowym** – iteruj po folderze obrazów i agreguj wyczyszczone wyniki do CSV.
- **Niestandardowymi promptami** – dostosuj AI do swojej dziedziny (dokumenty prawne, notatki medyczne itp.).

Śmiało modyfikuj wartość `gpu_layers`, wymieniaj model lub podłącz własny prompt. Niebo jest granicą, a kod, który masz teraz, to dopiero start.

Miłego kodowania i niech Twoje wyniki OCR będą zawsze czyste! 
🚀 + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/portuguese/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md b/ocr/portuguese/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md new file mode 100644 index 000000000..67dd21f33 --- /dev/null +++ b/ocr/portuguese/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md @@ -0,0 +1,224 @@ +--- +category: general +date: 2026-03-28 +description: Como usar OCR para reconhecer texto manuscrito em imagens. Aprenda a + extrair texto manuscrito, converter imagem manuscrita e obter resultados limpos + rapidamente. +draft: false +keywords: +- how to use OCR +- recognize handwritten text +- extract handwritten text +- handwritten note to text +- convert handwritten image +language: pt +og_description: Como usar OCR para reconhecer texto manuscrito. Este tutorial mostra + passo a passo como extrair texto manuscrito de imagens e obter resultados refinados. +og_title: Como usar OCR para reconhecer texto manuscrito – Guia completo +tags: +- OCR +- Handwriting Recognition +- Python +title: Como usar OCR para reconhecer texto manuscrito – Guia completo +url: /pt/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Como Usar OCR para Reconhecer Texto Manuscrito – Guia Completo + +Como usar OCR para notas manuscritas é uma pergunta que muitos desenvolvedores fazem quando precisam digitalizar esboços, atas de reunião ou ideias rápidas. 
Neste guia, vamos percorrer os passos exatos para reconhecer texto manuscrito, extrair texto manuscrito e transformar uma imagem manuscrita em strings limpas e pesquisáveis. + +Se você já ficou olhando para uma foto de uma lista de compras e se perguntou: “Posso converter essa imagem manuscrita em texto sem digitar tudo novamente?” – você está no lugar certo. Ao final, você terá um script pronto‑para‑executar que transforma uma **nota manuscrita em texto** em segundos. + +## O Que Você Precisa + +- Python 3.8+ (o código funciona com qualquer versão recente) +- A biblioteca `ocr` – instale com `pip install ocr-sdk` (substitua pelo nome do pacote do seu provedor) +- Uma foto clara de uma nota manuscrita (`hand_note.png` no exemplo) +- Um pouco de curiosidade e um café ☕️ (opcional, mas recomendado) + +Sem frameworks pesados, sem chaves pagas de nuvem – apenas um motor local que suporta **reconhecimento de manuscrito** pronto para uso. + +## Etapa 1 – Instale o Pacote OCR e Importe‑o + +Primeiro de tudo, vamos colocar o pacote correto na sua máquina. Abra um terminal e execute: + +```bash +pip install ocr-sdk +``` + +Depois que a instalação terminar, importe o módulo no seu script: + +```python +# Step 1: Import the OCR SDK +import ocr +``` + +> **Dica profissional:** Se você estiver usando um ambiente virtual, ative‑o antes de instalar. Isso mantém seu projeto organizado e evita conflitos de versão. + +## Etapa 2 – Crie um Motor OCR e Ative o Modo Manuscrito + +Agora, realmente **como usar OCR** – precisamos de uma instância do motor que saiba que estamos lidando com traços cursivos em vez de fontes impressas. O trecho a seguir cria o motor e o muda para o modo manuscrito: + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = ocr.OcrEngine() +ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN +``` + +Por que definir `recognition_mode`? 
Porque a maioria dos motores OCR tem como padrão a detecção de texto impresso, que costuma ignorar os laços e inclinações de uma nota pessoal. Ativar o modo manuscrito aumenta a precisão drasticamente. + +## Etapa 3 – Carregue a Imagem Que Você Quer Converter (Converter Imagem Manuscrita) + +Imagens são a matéria‑prima de qualquer trabalho de OCR. Certifique‑se de que sua foto esteja salva em um formato sem perdas (PNG funciona muito bem) e que o texto seja razoavelmente legível. Então carregue‑a assim: + +```python +# Step 3: Load the handwritten image you want to convert +handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") +``` + +Se a imagem estiver ao lado do seu script, você pode simplesmente usar `"hand_note.png"` em vez de um caminho completo. + +> **E se a imagem estiver borrada?** Tente pré‑processar com OpenCV (por exemplo, `cv2.cvtColor` para tons de cinza, `cv2.threshold` para aumentar o contraste) antes de enviá‑la ao motor OCR. + +## Etapa 4 – Execute o Motor de Reconhecimento para Extrair Texto Manuscrito + +Com o motor pronto e a imagem na memória, finalmente podemos **extrair texto manuscrito**. O método `recognize` devolve um objeto de resultado bruto que contém o texto mais as pontuações de confiança. + +```python +# Step 4: Perform OCR and get the raw result +raw_result = ocr_engine.recognize(handwritten_image) +print("Raw OCR output:") +print(raw_result.text) +``` + +A saída bruta típica pode incluir quebras de linha indesejadas ou caracteres mal identificados, especialmente se a caligrafia for bagunçada. Por isso a próxima etapa existe. + +## Etapa 5 – (Opcional) Refine a Saída com o Pós‑Processador de IA + +A maioria dos SDKs OCR modernos vem com um pós‑processador de IA leve que limpa espaçamentos, corrige erros comuns de OCR e normaliza quebras de linha. 
Executá‑lo é tão simples quanto:

```python
# Step 5: Refine the raw OCR output (handwritten note to text)
polished_result = ocr_engine.run_postprocessor(raw_result)

# Display the cleaned, readable text
print("\nPolished OCR output:")
print(polished_result.text)
```

Se você pular esta etapa, ainda receberá texto utilizável, mas a conversão **de nota manuscrita para texto** ficará um pouco mais áspera. O pós‑processador é especialmente útil para notas que contêm marcadores ou palavras com maiúsculas e minúsculas misturadas.

## Etapa 6 – Verifique o Resultado e Trate Casos Limítrofes

Depois de imprimir o resultado refinado, verifique se tudo está correto. Aqui está uma verificação rápida de sanidade que você pode acrescentar:

```python
# Step 6: Simple verification
if not polished_result.text.strip():
    raise ValueError("OCR returned an empty string – check image quality.")
else:
    print("\n✅ OCR succeeded! You can now save or further process the text.")
```

**Checklist de casos‑limite**

| Situação | O que fazer |
|-----------|------------|
| **Contraste muito baixo** | Aumente o contraste com `cv2.convertScaleAbs` antes de carregar. |
| **Múltiplos idiomas** | Defina `ocr_engine.language = ["en", "es"]` (ou os idiomas de destino). |
| **Documentos grandes** | Processe as páginas em lotes para evitar picos de memória. |
| **Símbolos especiais** | Adicione um dicionário personalizado via `ocr_engine.add_custom_words([...])`. |

## Visão Geral Visual

Abaixo está uma imagem de espaço reservado que ilustra o fluxo de trabalho — de uma nota fotografada ao texto limpo. O texto alternativo contém a palavra‑chave principal, tornando a imagem amigável ao SEO. 
+ +![como usar OCR em uma imagem de nota manuscrita](/images/handwritten_ocr_flow.png "como usar OCR em uma imagem de nota manuscrita") + +## Script Completo e Executável + +Juntando todas as peças, aqui está o programa completo, pronto para copiar‑e‑colar: + +```python +# Complete script: Convert a handwritten image to clean text using OCR + +import ocr + +def main(): + # 1️⃣ Initialize OCR engine for handwritten recognition + ocr_engine = ocr.OcrEngine() + ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN + + # 2️⃣ Load the image containing the handwritten note + handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") + + # 3️⃣ Perform OCR to extract raw text + raw_result = ocr_engine.recognize(handwritten_image) + print("Raw OCR output:") + print(raw_result.text) + + # 4️⃣ (Optional) Run AI post‑processor for cleaner output + polished_result = ocr_engine.run_postprocessor(raw_result) + + # 5️⃣ Show the polished, readable text + print("\nPolished OCR output:") + print(polished_result.text) + + # 6️⃣ Simple sanity check + if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") + else: + print("\n✅ OCR succeeded! You can now save or further process the text.") + +if __name__ == "__main__": + main() +``` + +**Saída esperada (exemplo)** + +``` +Raw OCR output: +T0d@y I w3nt to the market +and bought 5 aplpes, 2 bananas, +and a loaf of bread. + +Polished OCR output: +Today I went to the market and bought 5 apples, 2 bananas, and a loaf of bread. +``` + +Observe como o pós‑processador corrigiu o erro “T0d@y” e normalizou os espaçamentos. + +## Armadilhas Comuns & Dicas Profissionais + +- **Tamanho da imagem importa** – Motores OCR geralmente limitam o tamanho de entrada a 4 K × 4 K. Redimensione fotos grandes antes. +- **Estilo de caligrafia** – Cursiva vs. letras de bloco podem afetar a precisão. 
Se você controla a fonte (por exemplo, uma caneta digital), incentive letras de bloco para melhores resultados. +- **Processamento em lote** – Quando lidar com dezenas de notas, envolva o script em um loop e armazene cada resultado em um CSV ou banco SQLite. +- **Vazamentos de memória** – Alguns SDKs mantêm buffers internos; chame `ocr_engine.dispose()` depois de terminar se notar lentidão. + +## Próximos Passos – Indo Além do OCR Simples + +Agora que você dominou **como usar OCR** para uma única imagem, considere estas extensões: + +1. **Integrar com armazenamento em nuvem** – Busque imagens do AWS S3 ou Azure Blob, execute o mesmo pipeline e envie os resultados de volta. +2. **Adicionar detecção de idioma** – Use `ocr_engine.detect_language()` para mudar dicionários automaticamente. +3. **Combinar com NLP** – Alimente o texto limpo ao spaCy ou NLTK para extrair entidades, datas ou itens de ação. +4. **Criar um endpoint REST** – Envolva o script em Flask ou FastAPI para que outros serviços possam fazer POST de imagens e receber texto codificado em JSON. + +Todas essas ideias ainda giram em torno dos conceitos centrais de **reconhecer texto manuscrito**, **extrair texto manuscrito** e **converter imagem manuscrita** — as frases exatas que você provavelmente buscará a seguir. + +--- + +### TL;DR + +Mostramos **como usar OCR** para reconhecer texto manuscrito, extraí‑lo e refinar o resultado em uma string utilizável. O script completo está pronto para executar, o fluxo foi explicado passo a passo, e agora você tem um checklist para casos‑limite comuns. Pegue uma foto da sua próxima nota de reunião, insira‑a no script e deixe a máquina digitar por você. + +Feliz codificação, e que suas notas estejam sempre legíveis! 
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/portuguese/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md b/ocr/portuguese/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md new file mode 100644 index 000000000..23f9c18c6 --- /dev/null +++ b/ocr/portuguese/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md @@ -0,0 +1,186 @@ +--- +category: general +date: 2026-03-28 +description: Execute OCR em imagem e obtenha texto limpo com coordenadas de caixa + delimitadora. Aprenda como extrair OCR, limpar OCR e exibir os resultados passo + a passo. +draft: false +keywords: +- perform OCR on image +- how to extract OCR +- how to clean OCR +- display bounding box coordinates +- OCR post‑processing +- OCR bounding boxes +language: pt +og_description: Execute OCR em uma imagem, limpe a saída e exiba as coordenadas da + caixa delimitadora em um tutorial conciso. +og_title: Realize OCR em imagem – resultados limpos e caixas delimitadoras +tags: +- OCR +- Computer Vision +- Python +title: Realizar OCR em Imagem – Resultados Limpos e Exibir Coordenadas da Caixa Delimitadora +url: /pt/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Realizar OCR em Imagem – Limpar Resultados e Mostrar Coordenadas da Caixa Delimitadora + +Já precisou **realizar OCR em imagem** mas continuava obtendo texto bagunçado e sem saber onde cada palavra está na foto? Você não está sozinho. Em muitos projetos—digitalização de faturas, escaneamento de recibos ou extração simples de texto—obter a saída bruta do OCR é apenas o primeiro obstáculo. 
A boa notícia? Você pode limpar essa saída e ver instantaneamente as coordenadas da caixa delimitadora de cada região sem escrever um monte de código boilerplate.

Neste guia vamos percorrer **como extrair OCR**, executar um pós‑processador para **limpar OCR** e, finalmente, **exibir coordenadas da caixa delimitadora** para cada região limpa. Ao final, você terá um único script executável que transforma uma foto borrada em texto estruturado e organizado, pronto para processamento posterior.

## O que Você Precisará

- Python 3.8+ (a sintaxe abaixo funciona em qualquer versão a partir da 3.8)
- Um motor de OCR que suporte `recognize(..., return_structured=True)` – por exemplo, a biblioteca fictícia `engine` usada no trecho. Substitua-a por Tesseract, EasyOCR ou qualquer SDK que retorne dados de região.
- Familiaridade básica com funções e loops em Python
- Um arquivo de imagem que você deseja escanear (PNG, JPG, etc.)

> **Dica profissional:** Se você estiver usando Tesseract, a função `pytesseract.image_to_data` já fornece caixas delimitadoras. Você pode envolver seu resultado em um pequeno adaptador que imita a API `engine.recognize` mostrada abaixo.

---

![perform OCR on image example](image-placeholder.png "perform OCR on image example")

*Alt text: diagrama mostrando como realizar OCR em imagem e visualizar coordenadas da caixa delimitadora*

## Etapa 1 – Realizar OCR em Imagem e Obter Regiões Estruturadas

A primeira coisa é solicitar ao motor de OCR que retorne não apenas texto simples, mas uma lista estruturada de regiões de texto. Essa lista contém a string bruta e o retângulo que a envolve. 
+ +```python +import engine # replace with your actual OCR library +from pathlib import Path + +# Load the image you want to process +image_path = Path("sample_invoice.jpg") +image = engine.load_image(image_path) + +# Step 1: Perform OCR and request a structured list of text regions +raw_result = engine.recognize(image, return_structured=True) +``` + +**Por que isso importa:** +Quando você pede apenas texto simples, perde o contexto espacial. Dados estruturados permitem que você mais tarde **exiba coordenadas da caixa delimitadora**, alinhe texto com tabelas ou forneça localizações precisas a um modelo posterior. + +## Etapa 2 – Como Limpar a Saída de OCR com um Pós‑Processador + +Motores de OCR são ótimos em detectar caracteres, mas frequentemente deixam espaços estranhos, artefatos de quebras de linha ou símbolos reconhecidos incorretamente. Um pós‑processador normaliza o texto, corrige erros comuns de OCR e remove espaços em branco. + +```python +# Step 2: Clean the plain‑text of each region using the post‑processor +processed_result = engine.run_postprocessor(raw_result) +``` + +Se você estiver construindo seu próprio limpador, considere: + +- Remover caracteres não‑ASCII (`re.sub(r'[^\x00-\x7F]+',' ', text)`) +- Colapsar múltiplos espaços em um único espaço +- Aplicar um corretor ortográfico como `pyspellchecker` para erros óbvios + +**Por que isso importa:** +Uma string organizada torna a busca, indexação e pipelines de NLP posteriores muito mais confiáveis. Em outras palavras, **como limpar OCR** costuma ser a diferença entre um conjunto de dados utilizável e uma dor de cabeça. + +## Etapa 3 – Exibir Coordenadas da Caixa Delimitadora para Cada Região Limpa + +Agora que o texto está organizado, iteramos sobre cada região, imprimindo seu retângulo e a string limpa. Esta é a parte onde finalmente **exibimos coordenadas da caixa delimitadora**. 

```python
# Step 3 – Iterate over the cleaned regions and display their bounding box and text
for text_region in processed_result.regions:
    # Each region has a .bounding_box attribute (x, y, width, height)
    bbox = text_region.bounding_box
    print(f"[{bbox}] {text_region.text}")
```

**Saída de exemplo**

```
[(34, 120, 210, 30)] Invoice #12345
[(34, 160, 420, 28)] Date: 2026‑03‑01
[(34, 200, 380, 28)] Total Amount: $1,254.00
```

Você pode agora alimentar essas coordenadas em uma biblioteca de desenho (por exemplo, OpenCV) para sobrepor caixas na imagem original, ou armazená‑las em um banco de dados para consultas posteriores.

## Script Completo, Pronto‑para‑Executar

A seguir está o programa completo que une as três etapas. Substitua as chamadas placeholder `engine` pelo seu SDK de OCR real.

```python
#!/usr/bin/env python3
"""
Perform OCR on image → clean results → display bounding box coordinates.
Author: Your Name
Date: 2026‑03‑28
"""

import engine  # <-- replace with your OCR library
from pathlib import Path
import sys

def main(image_path: str):
    # Load image
    image = engine.load_image(Path(image_path))

    # 1️⃣ Perform OCR and ask for structured output
    raw_result = engine.recognize(image, return_structured=True)

    # 2️⃣ Clean the raw text using the built‑in post‑processor
    processed_result = engine.run_postprocessor(raw_result)

    # 3️⃣ Show each region's bounding box and cleaned text
    print("\n=== Cleaned OCR Regions ===")
    for region in processed_result.regions:
        bbox = region.bounding_box  # (x, y, w, h)
        print(f"[{bbox}] {region.text}")

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python perform_ocr.py <image_path>")
        sys.exit(1)
    main(sys.argv[1])
```

### Como Executar

```bash
python perform_ocr.py sample_invoice.jpg
```

Você deverá ver uma lista de caixas delimitadoras emparelhadas com texto limpo, exatamente como a saída de exemplo acima. 
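
A dica profissional do início sugeria envolver o `pytesseract` em um adaptador que imita a API estruturada usada acima. Um esboço mínimo dessa ideia (assumindo `pytesseract` e Pillow instalados; os nomes `TextRegion`, `StructuredResult` e `recognize_structured` são hipotéticos, apenas para ilustração):

```python
# Esboço de adaptador: converte a saída de pytesseract.image_to_data
# no formato de regiões usado neste tutorial (nomes hipotéticos).
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class TextRegion:
    text: str
    bounding_box: Tuple[int, int, int, int]  # (x, y, w, h)

@dataclass
class StructuredResult:
    regions: List[TextRegion] = field(default_factory=list)

def recognize_structured(image_path: str) -> StructuredResult:
    """Roda o Tesseract e devolve regiões com caixas delimitadoras."""
    import pytesseract               # import adiado: dependência opcional
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(image_path), output_type=pytesseract.Output.DICT
    )
    result = StructuredResult()
    for i, word in enumerate(data["text"]):
        if word.strip():             # ignora entradas vazias do Tesseract
            bbox = (data["left"][i], data["top"][i],
                    data["width"][i], data["height"][i])
            result.regions.append(TextRegion(text=word, bounding_box=bbox))
    return result
```

Com esse adaptador, o loop da Etapa 3 funciona sem alterações sobre `recognize_structured("sample_invoice.jpg").regions`.
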
+ +## Perguntas Frequentes e Casos Limítrofes + +| Pergunta | Resposta | +|----------|----------| +| **E se o motor de OCR não suportar `return_structured`?** | Escreva um wrapper leve que converta a saída bruta do motor (geralmente uma lista de palavras com coordenadas) em objetos com atributos `text` e `bounding_box`. | +| **Posso obter pontuações de confiança?** | Muitos SDKs expõem uma métrica de confiança por região. Anexe‑a à instrução de impressão: `print(f"[{bbox}] {region.text} (conf: {region.confidence:.2f})")`. | +| **Como lidar com texto rotacionado?** | Pré‑procese a imagem com `cv2.minAreaRect` do OpenCV para corrigir a inclinação antes de chamar `recognize`. | +| **E se eu precisar da saída em JSON?** | Serialize `processed_result.regions` com `json.dumps([r.__dict__ for r in processed_result.regions], indent=2)`. | +| **Existe uma forma de visualizar as caixas?** | Use OpenCV: `cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0), 2)` dentro do loop, então `cv2.imwrite("annotated.jpg", img)`. | + +## Conclusão + +Você acabou de aprender **como realizar OCR em imagem**, limpar a saída bruta e **exibir coordenadas da caixa delimitadora** para cada região. O fluxo de três etapas—reconhecer → pós‑processar → iterar—é um padrão reutilizável que pode ser inserido em qualquer projeto Python que precise de extração de texto confiável. + +### O que vem a seguir? + +- **Explore diferentes back‑ends de OCR** (Tesseract, EasyOCR, Google Vision) e compare a precisão. +- **Integre com um banco de dados** para armazenar dados de regiões para arquivos pesquisáveis. +- **Adicione detecção de idioma** para direcionar cada região ao corretor ortográfico apropriado. +- **Sobreponha caixas na imagem original** para verificação visual (veja o snippet OpenCV acima). + +Se você encontrar peculiaridades, lembre‑se de que o maior ganho vem de um passo sólido de pós‑processamento; uma string limpa é muito mais fácil de trabalhar do que um despejo bruto de caracteres. 
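Como lembrete prático, as sugestões de limpeza da Etapa 2 (remover caracteres não‑ASCII, colapsar espaços) cabem em poucas linhas de biblioteca padrão. O esboço abaixo é genérico e não substitui o pós‑processador do seu SDK:

```python
import re

def clean_ocr_text(text: str) -> str:
    """Limpeza mínima de OCR: remove não-ASCII e normaliza espaços e quebras."""
    text = re.sub(r"[^\x00-\x7F]+", " ", text)  # descarta caracteres não-ASCII
    text = re.sub(r"\s+", " ", text)            # colapsa espaços e quebras de linha
    return text.strip()

print(clean_ocr_text("Invoice\n  #12345 \u2013 total"))  # Invoice #12345 total
```

Funções assim são fáceis de testar isoladamente, antes de ligá‑las ao pipeline de OCR.
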
+ +Feliz codificação, e que seus pipelines de OCR estejam sempre organizados! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/portuguese/python/general/python-ocr-tutorial-extract-text-from-images/_index.md b/ocr/portuguese/python/general/python-ocr-tutorial-extract-text-from-images/_index.md new file mode 100644 index 000000000..cf6427f0f --- /dev/null +++ b/ocr/portuguese/python/general/python-ocr-tutorial-extract-text-from-images/_index.md @@ -0,0 +1,233 @@ +--- +category: general +date: 2026-03-28 +description: Tutorial de OCR em Python mostrando como extrair texto de imagem com + Aspose OCR Cloud. Aprenda a carregar a imagem para OCR e converter a imagem em texto + simples em minutos. +draft: false +keywords: +- python ocr tutorial +- extract text image python +- ocr image to text +- load image for ocr +- convert image plain text +language: pt +og_description: Tutorial de OCR em Python explica como carregar imagem para OCR e + converter a imagem em texto simples usando Aspose OCR Cloud. Obtenha o código completo + e dicas. +og_title: Tutorial de OCR em Python – Extrair Texto de Imagens +tags: +- OCR +- Python +- Image Processing +title: Tutorial de OCR em Python – Extrair Texto de Imagens +url: /pt/python/general/python-ocr-tutorial-extract-text-from-images/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Tutorial de OCR em Python – Extrair Texto de Imagens + +Já se perguntou como transformar uma foto de recibo bagunçada em texto limpo e pesquisável? Você não é o único. Na minha experiência, o maior obstáculo não é o motor de OCR em si, mas colocar a imagem no formato correto e extrair o texto puro sem problemas. 
Este **python ocr tutorial** guia você por cada passo — carregando uma imagem para OCR, executando o reconhecimento e, finalmente, convertendo o texto puro da imagem em uma string Python que você pode armazenar ou analisar. Ao final, você será capaz de extrair texto de imagem em Python (**extract text image python**) sem precisar de nenhuma licença paga para começar.

## O que você aprenderá

- Como instalar e importar o Aspose OCR Cloud SDK for Python.
- O código exato para **load image for OCR** (PNG, JPEG, TIFF, PDF, etc.).
- Como chamar o motor para realizar a conversão **ocr image to text**.
- Dicas para lidar com casos de borda comuns, como PDFs de várias páginas ou digitalizações de baixa resolução.
- Formas de verificar a saída e o que fazer se o texto aparecer confuso.

### Pré-requisitos

- Python 3.8+ instalado na sua máquina.
- Uma conta gratuita Aspose Cloud (a versão de avaliação funciona sem licença).
- Familiaridade básica com pip e ambientes virtuais — nada sofisticado.

> **Pro tip:** Se você já está usando um virtualenv, ative-o agora. Ele mantém suas dependências organizadas e evita conflitos de versão.

![Captura de tela do tutorial de OCR em Python mostrando texto reconhecido](path/to/ocr_example.png "Tutorial de OCR em Python – exibição de texto puro extraído")

## Etapa 1 – Instalar o Aspose OCR Cloud SDK

Primeiro de tudo, precisamos da biblioteca que se comunica com o serviço OCR da Aspose. Abra um terminal e execute:

```bash
pip install asposeocrcloud
```

Esse único comando baixa o SDK mais recente (atualmente versão 23.12). O pacote inclui tudo que você precisa — sem bibliotecas extras de processamento de imagem necessárias.

## Etapa 2 – Inicializar o Motor OCR (Palavra‑chave Principal em Ação)

Agora que o SDK está pronto, podemos inicializar o motor de OCR, o coração deste **python ocr tutorial**. O construtor não precisa de nenhuma chave de licença para a avaliação, o que simplifica as coisas. 
+ +```python +import asposeocrcloud as ocr + +# Initialise the OCR engine – no licence needed for trial use +ocr_engine = ocr.OcrEngine() +``` + +> **Why this matters:** Inicializar o motor apenas uma vez mantém as chamadas subsequentes rápidas. Se você recriar o objeto para cada imagem, desperdiçará viagens de rede. + +## Etapa 3 – Carregar Imagem para OCR + +É aqui que a palavra‑chave **load image for OCR** se destaca. O método `Image.load` do SDK aceita um caminho de arquivo ou uma URL, e detecta automaticamente o formato (PNG, JPEG, TIFF, PDF, etc.). Vamos carregar um recibo de exemplo: + +```python +# Step 3: Load the input image (PNG, JPEG, TIFF, PDF …) +input_image = ocr.Image.load(r"YOUR_DIRECTORY/receipt.png") +``` + +Se você estiver lidando com um PDF de várias páginas, basta apontar para o arquivo PDF; o SDK tratará cada página como uma imagem separada internamente. + +## Etapa 4 – Executar a Conversão OCR de Imagem para Texto + +Com a imagem na memória, o OCR real acontece em uma única linha. O método `recognize` retorna um objeto `OcrResult` que contém o texto puro, pontuações de confiança e até caixas delimitadoras se você precisar delas depois. + +```python +# Step 4: Perform OCR on the loaded image +ocr_result = ocr_engine.recognize(input_image) +``` + +> **Edge case:** Para imagens de baixa resolução (abaixo de 300 dpi) você pode querer ampliar a imagem primeiro. O SDK oferece um auxiliar `Resize`, mas para a maioria dos recibos o padrão funciona bem. + +## Etapa 5 – Converter o Texto Puro da Imagem em uma String Utilizável + +A peça final do quebra‑cabeça é extrair o texto puro do objeto de resultado. Esta é a etapa **convert image plain text** que transforma o blob de OCR em algo que você pode imprimir, armazenar ou alimentar em outro sistema. 
+ +```python +# Step 5: Output the recognised plain text +print("Plain OCR:") +print(ocr_result.text) +``` + +Ao executar o script, você deverá ver algo como: + +``` +Plain OCR: +Starbucks Coffee +Date: 2026/03/27 +Total: $4.75 +Thank you! +``` + +Essa saída agora é uma string Python comum, pronta para exportação CSV, inserção em banco de dados ou processamento de linguagem natural. + +## Lidando com Problemas Comuns + +### 1. Imagens em Branco ou Ruidosas + +Se `ocr_result.text` retornar vazio, verifique novamente a qualidade da imagem. Uma solução rápida é adicionar uma etapa de pré‑processamento: + +```python +# Simple preprocessing – convert to grayscale and increase contrast +processed = input_image.to_grayscale().adjust_contrast(1.2) +ocr_result = ocr_engine.recognize(processed) +``` + +### 2. PDFs de Múltiplas Páginas + +Quando você fornece um PDF, `recognize` retorna resultados para cada página. Percorra‑os assim: + +```python +pdf_image = ocr.Image.load("document.pdf") +pages = pdf_image.pages # collection of page images + +for i, page in enumerate(pages, start=1): + result = ocr_engine.recognize(page) + print(f"--- Page {i} ---") + print(result.text) +``` + +### 3. Suporte a Idiomas + +Aspose OCR suporta mais de 60 idiomas. Para mudar o idioma, defina a propriedade `language` antes de chamar `recognize`: + +```python +ocr_engine.language = "fr" # French +ocr_result = ocr_engine.recognize(input_image) +``` + +## Exemplo Completo Funcional + +Juntando tudo, aqui está um script completo, pronto para copiar e colar, que cobre tudo desde a instalação até o tratamento de casos de borda: + +```python +# -*- coding: utf-8 -*- +""" +Python OCR tutorial – complete example using Aspose OCR Cloud. +Demonstrates loading an image, performing OCR, and handling multi‑page PDFs. +""" + +import asposeocrcloud as ocr +import os + +def ocr_file(filepath: str, language: str = "en"): + """ + Perform OCR on a given file (image or PDF) and return plain text. 
+ """ + # Initialise engine (trial licence) + engine = ocr.OcrEngine() + engine.language = language + + # Load the file – SDK auto‑detects format + image = ocr.Image.load(filepath) + + # If it's a PDF, iterate over pages + if image.is_pdf: + all_text = [] + for page in image.pages: + result = engine.recognize(page) + all_text.append(result.text) + return "\n".join(all_text) + + # Single‑image case + result = engine.recognize(image) + return result.text + + +if __name__ == "__main__": + # Example usage – replace with your own path + sample_path = os.path.join("YOUR_DIRECTORY", "receipt.png") + + if not os.path.exists(sample_path): + raise FileNotFoundError(f"File not found: {sample_path}") + + extracted = ocr_file(sample_path) + print("=== Extracted Text ===") + print(extracted) +``` + +Execute o script (`python ocr_demo.py`) e você verá a saída **ocr image to text** diretamente no seu console. + +## Recapitulação – O que Cobrimos + +- Instalou o SDK **Aspose OCR Cloud** (`pip install asposeocrcloud`). +- **Inicializou o motor OCR** sem licença (perfeito para avaliação). +- Demonstrou como **load image for OCR**, seja PNG, JPEG ou PDF. +- Executou a conversão **ocr image to text** e **convert image plain text** em uma string Python utilizável. +- Abordou problemas comuns como digitalizações de baixa resolução, PDFs de várias páginas e seleção de idioma. + +## Próximos Passos e Tópicos Relacionados + +Agora que você dominou o **python ocr tutorial**, considere explorar: + +- **Extract text image python** para processamento em lote de grandes pastas de recibos. +- Integrar a saída OCR com **pandas** para análise de dados (`df = pd.read_csv(StringIO(extracted))`). +- Usar **Tesseract OCR** como alternativa quando a conectividade com a internet for limitada. +- Adicionar pós‑processamento com **spaCy** para identificar entidades como datas, valores e nomes de comerciantes. 
+ +Sinta‑se à vontade para experimentar: tente diferentes formatos de imagem, ajuste o contraste ou troque de idioma. O cenário de OCR é amplo, e as habilidades que você acabou de adquirir são uma base sólida para qualquer projeto de automação de documentos. + +Feliz codificação, e que seu texto esteja sempre legível! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/portuguese/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md b/ocr/portuguese/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md new file mode 100644 index 000000000..4178d32ee --- /dev/null +++ b/ocr/portuguese/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md @@ -0,0 +1,219 @@ +--- +category: general +date: 2026-03-28 +description: Aprenda a executar OCR em imagens, baixar o modelo Hugging Face automaticamente, + limpar o texto OCR e configurar o modelo LLM em Python usando o Aspose OCR Cloud. +draft: false +keywords: +- run OCR on image +- download hugging face model +- clean OCR text +- configure LLM model +language: pt +og_description: Execute OCR em imagem e limpe a saída usando um modelo Hugging Face + baixado automaticamente. Este guia mostra como configurar o modelo LLM em Python. 
+og_title: Executar OCR em Imagem – Tutorial Completo do Aspose OCR Cloud +tags: +- OCR +- Python +- LLM +- HuggingFace +title: Execute OCR em imagem com Aspose OCR Cloud – Guia completo passo a passo +url: /pt/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Executar OCR em Imagem – Tutorial Completo do Aspose OCR Cloud + +Já precisou executar OCR em arquivos de imagem, mas a saída bruta parecia uma bagunça? Na minha experiência, o maior ponto crítico não é o reconhecimento em si — é a limpeza. Felizmente, o Aspose OCR Cloud permite anexar um pós‑processador LLM que pode *limpar o texto OCR* automaticamente. Neste tutorial, vamos percorrer tudo o que você precisa: desde **baixar um modelo do Hugging Face** até configurar o LLM, executar o motor OCR e, finalmente, polir o resultado. + +Ao final deste guia você terá um script pronto‑para‑executar que: + +1. Baixa um modelo compacto Qwen 2.5 do Hugging Face (baixado automaticamente para você). +2. Configura o modelo para executar parte da rede na GPU e o resto na CPU. +3. Executa o motor OCR em uma imagem de nota manuscrita. +4. Usa o LLM para limpar o texto reconhecido, fornecendo uma saída legível por humanos. + +> **Pré‑requisitos** – Python 3.8+, pacote `asposeocrcloud`, uma GPU com pelo menos 4 GB de VRAM (opcional, mas recomendado) e conexão à internet para o primeiro download do modelo. + +--- + +## O Que Você Precisa + +- **Aspose OCR Cloud SDK** – instale via `pip install asposeocrcloud`. +- **Uma imagem de exemplo** – por exemplo, `handwritten_note.jpg` colocada em uma pasta local. +- **Suporte a GPU** – se você possui uma GPU com CUDA, o script descarregará 30 camadas; caso contrário, ele retornará automaticamente para a CPU. 
+- **Permissão de escrita** – o script armazena em cache o modelo em `YOUR_DIRECTORY`; certifique‑se de que a pasta exista. + +--- + +## Etapa 1 – Configurar o Modelo LLM (baixar modelo Hugging Face) + +A primeira coisa que fazemos é informar ao Aspose AI onde buscar o modelo. A classe `AsposeAIModelConfig` cuida do download automático, quantização e alocação de camadas na GPU. + +```python +import asposeocrcloud as ocr +from asposeocrcloud.ai import AsposeAI, AsposeAIModelConfig + +# ---------------------------------------------------------------------- +# Step 1: Model configuration – this will download the model if it’s missing +# ---------------------------------------------------------------------- +model_config = AsposeAIModelConfig( + allow_auto_download="true", # Enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", # Repo on Hugging Face + hugging_face_quantization="int8", # Small footprint, fast inference + gpu_layers=30, # 30 layers on GPU, rest on CPU + directory_model_path=r"YOUR_DIRECTORY" # Cache folder (optional) +) +``` + +**Por que isso importa** – Quantizar para `int8` reduz drasticamente o uso de memória (≈ 4 GB vs 12 GB). Dividir o modelo entre GPU e CPU permite rodar um LLM de 3 bilhões de parâmetros mesmo em uma RTX 3060 modesta. Se você não tem GPU, defina `gpu_layers=0` e o SDK manterá tudo na CPU. + +> **Dica:** A primeira execução baixará ~ 1,5 GB, então reserve alguns minutos e uma conexão estável. + +--- + +## Etapa 2 – Inicializar o Motor de IA com a Configuração do Modelo + +Agora iniciamos o motor Aspose AI e fornecemos a configuração que acabamos de criar. 
+ +```python +# ---------------------------------------------------------------------- +# Step 2: Initialise the AI engine – pulls the model if needed +# ---------------------------------------------------------------------- +ocr_ai = AsposeAI() +ocr_ai.initialize(model_config) # This call blocks until the model is ready +``` + +**O que está acontecendo nos bastidores?** O SDK verifica `directory_model_path` em busca de um modelo existente. Se encontrar uma versão compatível, carrega-a instantaneamente; caso contrário, baixa o arquivo GGUF do Hugging Face, descompacta‑o e prepara o pipeline de inferência. + +--- + +## Etapa 3 – Criar o Motor OCR e Anexar o Pós‑Processador de IA + +O motor OCR realiza o trabalho pesado de reconhecer caracteres. Ao anexar `ocr_ai.run_postprocessor` habilitamos **limpeza automática do texto OCR** após o reconhecimento. + +```python +# ---------------------------------------------------------------------- +# Step 3: Build the OCR engine and bind the LLM post‑processor +# ---------------------------------------------------------------------- +ocr_engine = ocr.OcrEngine() +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=None) +``` + +**Por que usar um pós‑processador?** OCR bruto costuma incluir quebras de linha nos lugares errados, pontuação detectada incorretamente ou símbolos estranhos. O LLM pode reescrever a saída em frases corretas, corrigir ortografia e até inferir palavras ausentes — essencialmente transformando um dump bruto em prosa polida. + +--- + +## Etapa 4 – Executar OCR em um Arquivo de Imagem + +Com tudo conectado, é hora de alimentar uma imagem ao motor. 
+ +```python +# ---------------------------------------------------------------------- +# Step 4: Load the image and run OCR +# ---------------------------------------------------------------------- +input_image = ocr.Image.load(r"YOUR_DIRECTORY/handwritten_note.jpg") +raw_result = ocr_engine.recognize(input_image) # Returns an OcrResult object +``` + +**Caso extremo:** Se a imagem for grande (> 5 MP), pode ser interessante redimensioná‑la primeiro para acelerar o processamento. O SDK aceita um objeto Pillow `Image`, então você pode pré‑processar com `PIL.Image.thumbnail()` se necessário. + +--- + +## Etapa 5 – Deixar a IA Limpar o Texto Reconhecido e Mostrar Ambas as Versões + +Por fim, invocamos o pós‑processador que anexamos anteriormente. Esta etapa demonstra o contraste entre *antes* e *depois* da limpeza. + +```python +# ---------------------------------------------------------------------- +# Step 5: Clean the OCR output using the LLM and display both results +# ---------------------------------------------------------------------- +cleaned_result = ocr_engine.run_postprocessor(raw_result) + +print("=== Before AI ===") +print(raw_result.text) + +print("\n=== After AI ===") +print(cleaned_result.text) +``` + +### Saída Esperada + +``` +=== Before AI === +Th1s 1s a h@ndwr1tt3n n0te. It c0nta1ns m1st@k3s, l1n3 br3aks, & sp3c!@l ch@r@ct3rs. + +=== After AI === +This is a handwritten note. It contains mistakes, line breaks, and special characters. +``` + +Observe como o LLM: + +- Corrigiu erros comuns de OCR (`Th1s` → `This`). +- Removeu símbolos estranhos (`&` → `and`). +- Normalizou quebras de linha em frases adequadas. 
+ +--- + +## 🎨 Visão Geral Visual (Fluxo de Execução de OCR em Imagem) + +![Run OCR on image workflow](run_ocr_on_image_workflow.png "Diagram showing the run OCR on image pipeline from model download to cleaned output") + +O diagrama acima resume o pipeline completo: **download do modelo Hugging Face → configurar LLM → inicializar IA → motor OCR → pós‑processador de IA → texto OCR limpo**. + +--- + +## Perguntas Frequentes & Dicas Profissionais + +### E se eu não tiver GPU? + +Defina `gpu_layers=0` em `AsposeAIModelConfig`. O modelo será executado totalmente na CPU, o que é mais lento, mas ainda funcional. Você também pode mudar para um modelo menor (por exemplo, `Qwen/Qwen2.5-1.5B‑Instruct‑GGUF`) para manter o tempo de inferência razoável. + +### Como mudar o modelo depois? + +Basta atualizar `hugging_face_repo_id` e reexecutar `ocr_ai.initialize(model_config)`. O SDK detectará a mudança de versão, baixará o novo modelo e substituirá os arquivos em cache. + +### Posso personalizar o prompt do pós‑processador? + +Sim. Passe um dicionário para `custom_settings` com a chave `prompt_template`. Por exemplo: + +```python +custom_prompt = { + "prompt_template": "Correct the following OCR text and keep line breaks:\n{ocr_text}" +} +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=custom_prompt) +``` + +### Devo armazenar o texto limpo em um arquivo? + +Com certeza. Após a limpeza você pode gravar o resultado em um arquivo `.txt` ou `.json` para processamento posterior: + +```python +with open("cleaned_note.txt", "w", encoding="utf-8") as f: + f.write(cleaned_result.text) +``` + +--- + +## Conclusão + +Acabamos de mostrar como **executar OCR em arquivos de imagem** com Aspose OCR Cloud, **baixar automaticamente um modelo Hugging Face**, configurar habilmente as **configurações do modelo LLM** e, finalmente, **limpar o texto OCR** usando um poderoso pós‑processador LLM. 
Todo o processo cabe em um único script Python fácil de executar e funciona tanto em máquinas com GPU quanto apenas com CPU. + +Se você está confortável com este pipeline, experimente: + +- **Modelos diferentes** – teste `meta-llama/Meta-Llama-3-8B‑Instruct‑GGUF` para uma janela de contexto maior. +- **Processamento em lote** – percorra uma pasta de imagens e agregue os resultados limpos em um CSV. +- **Prompts personalizados** – ajuste a IA ao seu domínio (documentos legais, notas médicas, etc.). + +Sinta‑se à vontade para ajustar o valor de `gpu_layers`, trocar o modelo ou inserir seu próprio prompt. O céu é o limite, e o código que você tem agora é a plataforma de lançamento. + +Bom código, e que suas saídas OCR estejam sempre limpas! 🚀 + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/russian/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md b/ocr/russian/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md new file mode 100644 index 000000000..1584d3d86 --- /dev/null +++ b/ocr/russian/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md @@ -0,0 +1,225 @@ +--- +category: general +date: 2026-03-28 +description: Как использовать OCR для распознавания рукописного текста на изображениях. + Узнайте, как извлекать рукописный текст, преобразовывать изображение с рукописным + текстом и быстро получать чистый результат. +draft: false +keywords: +- how to use OCR +- recognize handwritten text +- extract handwritten text +- handwritten note to text +- convert handwritten image +language: ru +og_description: Как использовать OCR для распознавания рукописного текста. Этот учебник + покажет вам пошагово, как извлекать рукописный текст из изображений и получать качественные + результаты. 
+og_title: Как использовать OCR для распознавания рукописного текста – Полное руководство +tags: +- OCR +- Handwriting Recognition +- Python +title: Как использовать OCR для распознавания рукописного текста — полное руководство +url: /ru/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Как использовать OCR для распознавания рукописного текста – Полное руководство + +Как использовать OCR для рукописных заметок — вопрос, который задают многие разработчики, когда им нужно оцифровать эскизы, протоколы встреч или быстрые идеи. В этом руководстве мы пройдём по точным шагам распознавания рукописного текста, извлечения рукописного текста и преобразования изображения с рукописью в чистые, индексируемые строки. + +Если вы когда‑нибудь смотрели на фото списка покупок и думали: «Могу ли я преобразовать это рукописное изображение в текст без повторного набора?» — вы в нужном месте. К концу вы получите готовый к запуску скрипт, который за секунды превратит **рукописную заметку в текст**. + +## Что понадобится + +- Python 3.8+ (код работает с любой современной версией) +- Библиотека `ocr` — установите её командой `pip install ocr-sdk` (замените на название пакета вашего провайдера) +- Чёткое фото рукописной заметки (`hand_note.png` в примере) +- Немного любопытства и кофе ☕️ (по желанию, но рекомендуется) + +Никаких тяжёлых фреймворков, никаких платных облачных ключей — только локальный движок, поддерживающий **handwritten recognition** «из коробки». + +## Шаг 1 — Установите пакет OCR и импортируйте его + +Сначала получим нужный пакет на ваш компьютер. 
Откройте терминал и выполните: + +```bash +pip install ocr-sdk +``` + +После завершения установки импортируйте модуль в ваш скрипт: + +```python +# Step 1: Import the OCR SDK +import ocr +``` + +> **Pro tip:** Если вы используете виртуальное окружение, активируйте его перед установкой. Это сохраняет ваш проект чистым и избегает конфликтов версий. + +## Шаг 2 — Создайте OCR‑движок и включите режим рукописного ввода + +Теперь мы действительно **how to use OCR** — нам нужен экземпляр движка, который понимает, что мы имеем дело с курсивными штрихами, а не печатными шрифтами. Следующий фрагмент создаёт движок и переключает его в режим рукописного ввода: + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = ocr.OcrEngine() +ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN +``` + +Зачем устанавливать `recognition_mode`? Потому что большинство OCR‑движков по умолчанию ищут печатный текст, часто игнорируя петли и наклоны личных заметок. Включение режима рукописного ввода резко повышает точность. + +## Шаг 3 — Загрузите изображение, которое хотите конвертировать (Convert Handwritten Image) + +Изображения — сырьё любого OCR‑процесса. Убедитесь, что ваша фотография сохранена в без потерь формате (PNG отлично подходит) и текст достаточно разборчив. Затем загрузите её так: + +```python +# Step 3: Load the handwritten image you want to convert +handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") +``` + +Если изображение находится рядом со скриптом, можно просто использовать `"hand_note.png"` вместо полного пути. + +> **Что делать, если изображение размыто?** Попробуйте предварительную обработку с помощью OpenCV (например, `cv2.cvtColor` в градации серого, `cv2.threshold` для повышения контраста) перед передачей в OCR‑движок. + +## Шаг 4 — Запустите движок распознавания, чтобы извлечь рукописный текст + +Когда движок готов и изображение загружено в память, мы наконец‑то **extract handwritten text**. 
Метод `recognize` возвращает необработанный объект результата, содержащий текст и оценки уверенности. + +```python +# Step 4: Perform OCR and get the raw result +raw_result = ocr_engine.recognize(handwritten_image) +print("Raw OCR output:") +print(raw_result.text) +``` + +Типичный необработанный вывод может включать лишние разрывы строк или ошибочно распознанные символы, особенно если почерк неаккуратный. Поэтому существует следующий шаг. + +## Шаг 5 — (Опционально) Полировать вывод с помощью AI‑постпроцессора + +Большинство современных OCR‑SDK поставляются с лёгким AI‑постпроцессором, который исправляет пробелы, типичные OCR‑ошибки и нормализует окончания строк. Запустить его так же просто: + +```python +# Step 5: Refine the raw OCR output (handwritten note to text) +polished_result = ocr_engine.run_postprocessor(raw_result) + +# Display the cleaned, readable text +print("\nPolished OCR output:") +print(polished_result.text) +``` + +Если пропустить этот шаг, вы всё равно получите пригодный текст, но конверсия **handwritten note to text** будет выглядеть менее аккуратно. Постпроцессор особенно полезен для заметок с маркерами или смешанным регистром. + +## Шаг 6 — Проверьте результат и обработайте граничные случаи + +После вывода полированного результата дважды проверьте, что всё выглядит правильно. Вот быстрый sanity‑check, который можно добавить: + +```python +# Step 6: Simple verification +if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") +else: + print("\n✅ OCR succeeded! You can now save or further process the text.") +``` + +**Чек‑лист граничных случаев** + +| Ситуация | Что делать | +|-----------|------------| +| **Очень низкий контраст** | Увеличьте контраст с помощью `cv2.convertScaleAbs` перед загрузкой. | +| **Несколько языков** | Установите `ocr_engine.language = ["en", "es"]` (или ваши целевые языки). 
| +| **Большие документы** | Обрабатывайте страницы пакетами, чтобы избежать всплесков памяти. | +| **Специальные символы** | Добавьте пользовательский словарь через `ocr_engine.add_custom_words([...])`. | + +## Визуальный обзор + +Ниже размещено заполнитель‑изображение, иллюстрирующее рабочий процесс — от сфотографированной заметки до чистого текста. alt‑текст содержит основной ключевой запрос, делая изображение SEO‑дружественным. + +![how to use OCR on a handwritten note image](/images/handwritten_ocr_flow.png "how to use OCR on a handwritten note image") + +## Полный, готовый к запуску скрипт + +Собрав все части вместе, получаем полностью готовую к копированию и вставке программу: + +```python +# Complete script: Convert a handwritten image to clean text using OCR + +import ocr + +def main(): + # 1️⃣ Initialize OCR engine for handwritten recognition + ocr_engine = ocr.OcrEngine() + ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN + + # 2️⃣ Load the image containing the handwritten note + handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") + + # 3️⃣ Perform OCR to extract raw text + raw_result = ocr_engine.recognize(handwritten_image) + print("Raw OCR output:") + print(raw_result.text) + + # 4️⃣ (Optional) Run AI post‑processor for cleaner output + polished_result = ocr_engine.run_postprocessor(raw_result) + + # 5️⃣ Show the polished, readable text + print("\nPolished OCR output:") + print(polished_result.text) + + # 6️⃣ Simple sanity check + if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") + else: + print("\n✅ OCR succeeded! You can now save or further process the text.") + +if __name__ == "__main__": + main() +``` + +**Ожидаемый вывод (пример)** + +``` +Raw OCR output: +T0d@y I w3nt to the market +and bought 5 aplpes, 2 bananas, +and a loaf of bread. + +Polished OCR output: +Today I went to the market and bought 5 apples, 2 bananas, and a loaf of bread. 
+``` + +Обратите внимание, как постпроцессор исправил опечатку «T0d@y» и нормализовал пробелы. + +## Распространённые подводные камни и профессиональные советы + +- **Размер изображения имеет значение** — OCR‑движки обычно ограничивают вход до 4 K × 4 K. Предварительно уменьшайте большие фотографии. +- **Стиль рукописи** — курсив против печатных букв может влиять на точность. Если вы контролируете источник (например, цифровой перо), предпочтительно использовать печатные буквы для лучшего результата. +- **Пакетная обработка** — при работе с десятками заметок оберните скрипт в цикл и сохраняйте каждый результат в CSV или SQLite. +- **Утечки памяти** — некоторые SDK держат внутренние буферы; вызывайте `ocr_engine.dispose()` после завершения, если замечаете замедление. + +## Следующие шаги — Выход за пределы простого OCR + +Теперь, когда вы освоили **how to use OCR** для одного изображения, рассмотрите следующие расширения: + +1. **Интеграция с облачным хранилищем** — получайте изображения из AWS S3 или Azure Blob, запускайте тот же конвейер и сохраняйте результаты обратно. +2. **Добавление детекции языка** — используйте `ocr_engine.detect_language()` для автоматической смены словарей. +3. **Комбинация с NLP** — передайте очищенный текст в spaCy или NLTK для извлечения сущностей, дат или задач. +4. **Создание REST‑endpoint** — оберните скрипт в Flask или FastAPI, чтобы другие сервисы могли POST‑ить изображения и получать текст в формате JSON. + +Все эти идеи по‑прежнему опираются на ключевые концепции **recognize handwritten text**, **extract handwritten text** и **convert handwritten image** — именно те фразы, которые вы, вероятно, будете искать дальше. + +--- + +### TL;DR + +Мы показали, как **how to use OCR** для распознавания рукописного текста, его извлечения и полировки результата в пригодную строку. Полный скрипт готов к запуску, процесс объяснён шаг за шагом, и у вас есть чек‑лист для типичных граничных случаев. 
Возьмите фото следующей заметки встречи, запустите скрипт и позвольте машине выполнить набор текста за вас. + +Счастливого кодинга, и пусть ваши заметки всегда остаются разборчивыми! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/russian/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md b/ocr/russian/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md new file mode 100644 index 000000000..113b9d530 --- /dev/null +++ b/ocr/russian/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md @@ -0,0 +1,186 @@ +--- +category: general +date: 2026-03-28 +description: Выполните OCR на изображении и получите чистый текст с координатами ограничивающих + рамок. Узнайте, как извлекать OCR, очищать его и отображать результаты пошагово. +draft: false +keywords: +- perform OCR on image +- how to extract OCR +- how to clean OCR +- display bounding box coordinates +- OCR post‑processing +- OCR bounding boxes +language: ru +og_description: Выполните OCR на изображении, очистите вывод и отобразите координаты + ограничивающих рамок в кратком руководстве. 
+og_title: Выполнить OCR на изображении — чистые результаты и ограничивающие рамки +tags: +- OCR +- Computer Vision +- Python +title: Выполнить OCR на изображении — очистить результаты и отобразить координаты + ограничивающих рамок +url: /ru/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Выполнить OCR на изображении – Очистить результаты и показать координаты ограничивающих рамок + +Когда‑то вам нужно **выполнить OCR на изображении**, но вы получаете беспорядочный текст и не знаете, где каждое слово расположено на картинке? Вы не одиноки. Во многих проектах — оцифровка счетов, сканирование чеков или простое извлечение текста — получение «сырого» вывода OCR — лишь первая преграда. Хорошая новость: вы можете очистить этот вывод и мгновенно увидеть координаты ограничивающих рамок каждого региона без написания кучи шаблонного кода. + +В этом руководстве мы пройдемся по **извлечению OCR**, запустим **пост‑обработку очистки OCR**, а затем **отобразим координаты ограничивающих рамок** для каждого очищенного региона. К концу вы получите один готовый к запуску скрипт, который превратит размытое фото в аккуратный, структурированный текст, готовый к дальнейшей обработке. + +## Что понадобится + +- Python 3.9+ (синтаксис ниже работает на 3.8 и новее) +- OCR‑движок, поддерживающий `recognize(..., return_structured=True)` — например, вымышленная библиотека `engine`, используемая в примере. Замените её на Tesseract, EasyOCR или любой SDK, возвращающий данные о регионах. +- Базовое знакомство с функциями и циклами в Python +- Файл изображения, который хотите просканировать (PNG, JPG и т.д.) + +> **Pro tip:** Если вы используете Tesseract, функция `pytesseract.image_to_data` уже возвращает ограничивающие рамки. 
Вы можете обернуть её результат в небольшой адаптер, имитирующий API `engine.recognize`, показанный ниже. + +--- + +![perform OCR on image example](image-placeholder.png "perform OCR on image example") + +*Alt text: диаграмма, показывающая, как выполнить OCR на изображении и визуализировать координаты ограничивающих рамок* + +## Шаг 1 – Выполнить OCR на изображении и получить структурированные регионы + +Первое, что нужно сделать, — попросить OCR‑движок вернуть не просто простой текст, а структурированный список текстовых регионов. Этот список содержит исходную строку и прямоугольник, который её охватывает. + +```python +import engine # replace with your actual OCR library +from pathlib import Path + +# Load the image you want to process +image_path = Path("sample_invoice.jpg") +image = engine.load_image(image_path) + +# Step 1: Perform OCR and request a structured list of text regions +raw_result = engine.recognize(image, return_structured=True) +``` + +**Почему это важно:** +Когда вы запрашиваете только простой текст, вы теряете пространственный контекст. Структурированные данные позволяют позже **отобразить координаты ограничивающих рамок**, выравнивать текст с таблицами или передавать точные позиции в downstream‑модель. + +## Шаг 2 – Как очистить вывод OCR с помощью пост‑процессора + +OCR‑движки хорошо распознают символы, но часто оставляют лишние пробелы, артефакты разрывов строк или ошибочно распознанные символы. Пост‑процессор нормализует текст, исправляет типичные ошибки OCR и обрезает пробелы. 
+ +```python +# Step 2: Clean the plain‑text of each region using the post‑processor +processed_result = engine.run_postprocessor(raw_result) +``` + +Если вы создаёте собственный очиститель, учитывайте следующее: + +- Удаление не‑ASCII символов (`re.sub(r'[^\x00-\x7F]+',' ', text)`) +- Сведение нескольких пробелов к одному +- Применение проверяющего орфографии, например `pyspellchecker`, для исправления очевидных опечаток + +**Почему это важно:** +Аккуратная строка делает поиск, индексацию и последующие NLP‑конвейеры гораздо надёжнее. Иными словами, **как очистить OCR** часто является разницей между пригодным набором данных и головной болью. + +## Шаг 3 – Отобразить координаты ограничивающих рамок для каждого очищенного региона + +Теперь, когда текст уже чистый, мы проходим по каждому региону, выводя его прямоугольник и очищенную строку. Это та часть, где мы наконец **отображаем координаты ограничивающих рамок**. + +```python +# Step 3 – Iterate over the cleaned regions and display their bounding box and text +for text_region in processed_result.regions: + # Each region has a .bounding_box attribute (x, y, width, height) + bbox = text_region.bounding_box + print(f"[{bbox}] {text_region.text}") +``` + +**Пример вывода** + +``` +[(34, 120, 210, 30)] Invoice #12345 +[(34, 160, 420, 28)] Date: 2026‑03‑01 +[(34, 200, 380, 28)] Total Amount: $1,254.00 +``` + +Теперь вы можете передать эти координаты в библиотеку рисования (например, OpenCV), чтобы наложить рамки на оригинальное изображение, либо сохранить их в базе данных для последующих запросов. + +## Полный готовый к запуску скрипт + +Ниже представлена полная программа, связывающая все три шага. Замените вызовы заглушки `engine` на ваш реальный OCR‑SDK. + +```python +#!/usr/bin/env python3 +""" +Perform OCR on image → clean results → display bounding box coordinates. 
+Author: Your Name
+Date: 2026‑03‑28
+"""
+
+import engine  # <-- replace with your OCR library
+from pathlib import Path
+import sys
+
+def main(image_path: str):
+    # Load image
+    image = engine.load_image(Path(image_path))
+
+    # 1️⃣ Perform OCR and ask for structured output
+    raw_result = engine.recognize(image, return_structured=True)
+
+    # 2️⃣ Clean the raw text using the built‑in post‑processor
+    processed_result = engine.run_postprocessor(raw_result)
+
+    # 3️⃣ Show each region's bounding box and cleaned text
+    print("\n=== Cleaned OCR Regions ===")
+    for region in processed_result.regions:
+        bbox = region.bounding_box  # (x, y, w, h)
+        print(f"[{bbox}] {region.text}")
+
+if __name__ == "__main__":
+    if len(sys.argv) != 2:
+        print("Usage: python perform_ocr.py <image_path>")
+        sys.exit(1)
+    main(sys.argv[1])
+```
+
+### Как запустить
+
+```bash
+python perform_ocr.py sample_invoice.jpg
+```
+
+Вы должны увидеть список ограничивающих рамок, сопоставленных с очищенным текстом, точно как в примере вывода выше.
+
+## Часто задаваемые вопросы и особые случаи
+
+| Вопрос | Ответ |
+|----------|--------|
+| **Что делать, если OCR‑движок не поддерживает `return_structured`?** | Напишите лёгкую оболочку, которая преобразует «сырой» вывод движка (обычно список слов с координатами) в объекты с атрибутами `text` и `bounding_box`. |
+| **Можно ли получить оценки уверенности?** | Многие SDK предоставляют метрику confidence для каждого региона. Добавьте её к оператору печати: `print(f"[{bbox}] {region.text} (conf: {region.confidence:.2f})")`. |
+| **Как обрабатывать повернутый текст?** | Предобработайте изображение с помощью `cv2.minAreaRect` из OpenCV, чтобы исправить наклон перед вызовом `recognize`. |
+| **А если нужен вывод в JSON?** | Сериализуйте `processed_result.regions` через `json.dumps([r.__dict__ for r in processed_result.regions], indent=2)`. 
| +| **Есть ли способ визуализировать рамки?** | Используйте OpenCV: `cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0), 2)` внутри цикла, затем `cv2.imwrite("annotated.jpg", img)`. | + +## Подведение итогов + +Вы только что узнали, **как выполнить OCR на изображении**, очистить «сырой» вывод и **отобразить координаты ограничивающих рамок** для каждого региона. Трёхшаговый поток — распознавание → пост‑обработка → итерация — это переиспользуемый шаблон, который можно внедрить в любой Python‑проект, требующий надёжного извлечения текста. + +### Что дальше? + +- **Исследуйте разные OCR‑бэкенды** (Tesseract, EasyOCR, Google Vision) и сравните точность. +- **Интегрируйте с базой данных** для хранения данных регионов в поисковых архивах. +- **Добавьте определение языка**, чтобы направлять каждый регион в соответствующий проверяющий орфографии. +- **Наложите рамки на оригинальное изображение** для визуальной проверки (см. фрагмент кода OpenCV выше). + +Если столкнётесь с нюансами, помните, что главный выигрыш приходит от надёжного шага пост‑обработки; чистая строка гораздо проще в работе, чем «сырой» набор символов. + +Счастливого кодинга, и пусть ваши OCR‑конвейеры всегда остаются аккуратными! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/russian/python/general/python-ocr-tutorial-extract-text-from-images/_index.md b/ocr/russian/python/general/python-ocr-tutorial-extract-text-from-images/_index.md new file mode 100644 index 000000000..75bd4b632 --- /dev/null +++ b/ocr/russian/python/general/python-ocr-tutorial-extract-text-from-images/_index.md @@ -0,0 +1,233 @@ +--- +category: general +date: 2026-03-28 +description: Учебник по OCR на Python, показывающий, как извлекать текст из изображения + с помощью Aspose OCR Cloud. 
Научитесь загружать изображение для OCR и преобразовывать + его в обычный текст за считанные минуты. +draft: false +keywords: +- python ocr tutorial +- extract text image python +- ocr image to text +- load image for ocr +- convert image plain text +language: ru +og_description: Учебник по OCR на Python объясняет, как загрузить изображение для + OCR и преобразовать его в обычный текст с помощью Aspose OCR Cloud. Получите полный + код и советы. +og_title: Учебник по OCR на Python – извлечение текста из изображений +tags: +- OCR +- Python +- Image Processing +title: Учебник по OCR в Python – извлечение текста из изображений +url: /ru/python/general/python-ocr-tutorial-extract-text-from-images/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Python OCR Tutorial – Извлечение текста из изображений + +Когда‑нибудь задумывались, как превратить неаккуратную фотографию чека в чистый, пригодный для поиска текст? Вы не одиноки. По моему опыту, самая большая преграда — это не сам движок OCR, а подготовка изображения в правильный формат и извлечение чистого текста без проблем. + +Этот **python ocr tutorial** проведёт вас через каждый шаг — загрузка изображения для OCR, запуск распознавания и, наконец, преобразование полученного текста в строку Python, которую можно сохранить или проанализировать. К концу вы сможете **extract text image python**‑стилем, и вам не понадобится платная лицензия, чтобы начать. + +## Что вы узнаете + +- Как установить и импортировать Aspose OCR Cloud SDK для Python. +- Точный код для **load image for OCR** (PNG, JPEG, TIFF, PDF и др.). +- Как вызвать движок для выполнения **ocr image to text**‑конверсии. +- Советы по работе с типичными краевыми случаями, такими как многостраничные PDF или сканы низкого разрешения. +- Способы проверки результата и что делать, если текст выглядит искажённым. 
+ +### Предварительные требования + +- Python 3.8+ установленный на вашем компьютере. +- Бесплатный аккаунт Aspose Cloud (пробная версия работает без лицензии). +- Базовое знакомство с pip и виртуальными окружениями — ничего сложного. + +> **Pro tip:** Если вы уже используете virtualenv, активируйте его сейчас. Это поможет поддерживать зависимости в порядке и избежать конфликтов версий. + +![Python OCR tutorial screenshot showing recognized text](path/to/ocr_example.png "Python OCR tutorial – отображение извлечённого чистого текста") + +## Шаг 1 – Установите Aspose OCR Cloud SDK + +Первым делом нам нужна библиотека, которая общается с сервисом OCR от Aspose. Откройте терминал и выполните: + +```bash +pip install asposeocrcloud +``` + +Эта единственная команда загрузит последнюю версию SDK (в данный момент — версия 23.12). Пакет включает всё необходимое — дополнительные библиотеки для обработки изображений не требуются. + +## Шаг 2 – Инициализируйте OCR‑движок (Primary Keyword in Action) + +Теперь, когда SDK готов, мы можем запустить **python ocr tutorial**‑движок. Конструктор не требует ключа лицензии для пробной версии, что упрощает процесс. + +```python +import asposeocrcloud as ocr + +# Initialise the OCR engine – no licence needed for trial use +ocr_engine = ocr.OcrEngine() +``` + +> **Why this matters:** Инициализация движка один раз ускоряет последующие вызовы. Если создавать объект заново для каждого изображения, вы будете терять сетевые запросы. + +## Шаг 3 – Загрузите изображение для OCR + +Здесь проявляется сила ключевого слова **load image for OCR**. Метод SDK `Image.load` принимает путь к файлу или URL и автоматически определяет формат (PNG, JPEG, TIFF, PDF и др.). 
Загрузим пример чека: + +```python +# Step 3: Load the input image (PNG, JPEG, TIFF, PDF …) +input_image = ocr.Image.load(r"YOUR_DIRECTORY/receipt.png") +``` + +Если вы работаете с многостраничным PDF, просто укажите путь к PDF‑файлу; SDK будет рассматривать каждую страницу как отдельное изображение внутри. + +## Шаг 4 – Выполните OCR‑конверсию изображения в текст + +Имея изображение в памяти, само OCR происходит в одну строку. Метод `recognize` возвращает объект `OcrResult`, содержащий чистый текст, оценки уверенности и даже ограничивающие рамки, если они понадобятся позже. + +```python +# Step 4: Perform OCR on the loaded image +ocr_result = ocr_engine.recognize(input_image) +``` + +> **Edge case:** Для изображений низкого разрешения (менее 300 dpi) может потребоваться предварительно увеличить их. SDK предоставляет вспомогательный класс `Resize`, но для большинства чеков значение по умолчанию работает отлично. + +## Шаг 5 – Преобразуйте чистый текст изображения в пригодную строку + +Последний элемент головоломки — извлечение чистого текста из объекта результата. Это шаг **convert image plain text**, который превращает OCR‑блоб в строку, которую можно вывести, сохранить или передать в другую систему. + +```python +# Step 5: Output the recognised plain text +print("Plain OCR:") +print(ocr_result.text) +``` + +При запуске скрипта вы должны увидеть что‑то вроде: + +``` +Plain OCR: +Starbucks Coffee +Date: 2026/03/27 +Total: $4.75 +Thank you! +``` + +Эта строка теперь обычный Python‑строковый объект, готовый к экспорту в CSV, вставке в базу данных или обработке естественного языка. + +## Обработка распространённых проблем + +### 1. Пустые или шумные изображения + +Если `ocr_result.text` возвращает пустую строку, проверьте качество изображения. 
Быстрое решение — добавить шаг предобработки: + +```python +# Simple preprocessing – convert to grayscale and increase contrast +processed = input_image.to_grayscale().adjust_contrast(1.2) +ocr_result = ocr_engine.recognize(processed) +``` + +### 2. Многостраничные PDF + +При передаче PDF метод `recognize` возвращает результаты для каждой страницы. Обойдите их так: + +```python +pdf_image = ocr.Image.load("document.pdf") +pages = pdf_image.pages # collection of page images + +for i, page in enumerate(pages, start=1): + result = ocr_engine.recognize(page) + print(f"--- Page {i} ---") + print(result.text) +``` + +### 3. Поддержка языков + +Aspose OCR поддерживает более 60 языков. Чтобы сменить язык, задайте свойство `language` перед вызовом `recognize`: + +```python +ocr_engine.language = "fr" # French +ocr_result = ocr_engine.recognize(input_image) +``` + +## Полный рабочий пример + +Объединив всё вместе, получаем готовый к копированию скрипт, покрывающий всё — от установки до обработки краевых случаев: + +```python +# -*- coding: utf-8 -*- +""" +Python OCR tutorial – complete example using Aspose OCR Cloud. +Demonstrates loading an image, performing OCR, and handling multi‑page PDFs. +""" + +import asposeocrcloud as ocr +import os + +def ocr_file(filepath: str, language: str = "en"): + """ + Perform OCR on a given file (image or PDF) and return plain text. 
+ """ + # Initialise engine (trial licence) + engine = ocr.OcrEngine() + engine.language = language + + # Load the file – SDK auto‑detects format + image = ocr.Image.load(filepath) + + # If it's a PDF, iterate over pages + if image.is_pdf: + all_text = [] + for page in image.pages: + result = engine.recognize(page) + all_text.append(result.text) + return "\n".join(all_text) + + # Single‑image case + result = engine.recognize(image) + return result.text + + +if __name__ == "__main__": + # Example usage – replace with your own path + sample_path = os.path.join("YOUR_DIRECTORY", "receipt.png") + + if not os.path.exists(sample_path): + raise FileNotFoundError(f"File not found: {sample_path}") + + extracted = ocr_file(sample_path) + print("=== Extracted Text ===") + print(extracted) +``` + +Запустите скрипт (`python ocr_demo.py`), и вы увидите вывод **ocr image to text** прямо в консоли. + +## Итоги – Что мы рассмотрели + +- Установили **Aspose OCR Cloud** SDK (`pip install asposeocrcloud`). +- **Инициализировали OCR‑движок** без лицензии (идеально для пробной версии). +- Показали, как **load image for OCR**, будь то PNG, JPEG или PDF. +- Выполнили **ocr image to text**‑конверсию и **convert image plain text** в пригодную строку Python. +- Разобрались с типичными проблемами, такими как сканы низкого разрешения, многостраничные PDF и выбор языка. + +## Следующие шаги и смежные темы + +Теперь, когда вы освоили **python ocr tutorial**, можно изучить: + +- **Extract text image python** для пакетной обработки больших папок с чеками. +- Интеграцию OCR‑результатов с **pandas** для анализа данных (`df = pd.read_csv(StringIO(extracted))`). +- Использование **Tesseract OCR** как резервного варианта при ограниченном интернет‑соединении. +- Добавление пост‑обработки с **spaCy** для выявления сущностей, таких как даты, суммы и названия продавцов. + +Экспериментируйте: пробуйте разные форматы изображений, меняйте контраст или переключайте языки. 
Область OCR широка, а полученные навыки станут надёжным фундаментом для любого проекта автоматизации документов. + +Счастливого кодинга, и пусть ваш текст всегда остаётся читаемым! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/russian/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md b/ocr/russian/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md new file mode 100644 index 000000000..0dc8484b3 --- /dev/null +++ b/ocr/russian/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md @@ -0,0 +1,221 @@ +--- +category: general +date: 2026-03-28 +description: Узнайте, как выполнять OCR на изображении, автоматически загружать модель + Hugging Face, очищать текст OCR и настраивать модель LLM в Python с использованием + Aspose OCR Cloud. +draft: false +keywords: +- run OCR on image +- download hugging face model +- clean OCR text +- configure LLM model +language: ru +og_description: Запустите OCR на изображении и очистите вывод с помощью автоматически + загруженной модели Hugging Face. Это руководство показывает, как настроить модель + LLM в Python. +og_title: Запуск OCR на изображении – Полный учебник по Aspose OCR Cloud +tags: +- OCR +- Python +- LLM +- HuggingFace +title: Запуск OCR на изображении с помощью Aspose OCR Cloud – Полное пошаговое руководство +url: /ru/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Запуск OCR на изображении – Полный учебник Aspose OCR Cloud + +Когда‑нибудь нужно было выполнить OCR на файлах изображений, но полученный результат выглядел как набор бессвязных символов? 
По моему опыту самая большая боль – не распознавание, а последующая очистка. К счастью, Aspose OCR Cloud позволяет подключить LLM‑постпроцессор, который может *автоматически очистить OCR‑текст*. В этом руководстве мы пройдем всё необходимое: от **загрузки модели с Hugging Face** до настройки LLM, запуска OCR‑движка и финальной полировки результата. + +К концу этого руководства у вас будет готовый скрипт, который: + +1. Загружает компактную модель Qwen 2.5 с Hugging Face (автоматически скачивается для вас). +2. Настраивает модель так, чтобы часть сети работала на GPU, а остальное – на CPU. +3. Выполняет OCR‑движок на изображении с рукописной заметкой. +4. Использует LLM для очистки распознанного текста, получая человекочитаемый вывод. + +> **Prerequisites** – Python 3.8+, пакет `asposeocrcloud`, GPU с минимум 4 ГБ видеопамяти (опционально, но рекомендуется) и интернет‑соединение для первой загрузки модели. + +--- + +## Что вам понадобится + +- **Aspose OCR Cloud SDK** – установить через `pip install asposeocrcloud`. +- **Пример изображения** – например, `handwritten_note.jpg`, размещённый в локальной папке. +- **Поддержка GPU** – если у вас есть CUDA‑совместимый GPU, скрипт выгрузит 30 слоёв; иначе он автоматически переключится на CPU. +- **Разрешение на запись** – скрипт кэширует модель в `YOUR_DIRECTORY`; убедитесь, что папка существует. + +--- + +## Шаг 1 – Настройка модели LLM (загрузка модели с Hugging Face) + +Первое, что мы делаем, – сообщаем Aspose AI, откуда получать модель. Класс `AsposeAIModelConfig` отвечает за автоматическую загрузку, квантизацию и распределение слоёв по GPU. 
+ +```python +import asposeocrcloud as ocr +from asposeocrcloud.ai import AsposeAI, AsposeAIModelConfig + +# ---------------------------------------------------------------------- +# Step 1: Model configuration – this will download the model if it’s missing +# ---------------------------------------------------------------------- +model_config = AsposeAIModelConfig( + allow_auto_download="true", # Enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", # Repo on Hugging Face + hugging_face_quantization="int8", # Small footprint, fast inference + gpu_layers=30, # 30 layers on GPU, rest on CPU + directory_model_path=r"YOUR_DIRECTORY" # Cache folder (optional) +) +``` + +**Почему это важно** – квантизация до `int8` резко сокращает потребление памяти (≈ 4 ГБ против 12 ГБ). Разделение модели между GPU и CPU позволяет запускать LLM с 3‑миллиардами параметров даже на скромном RTX 3060. Если у вас нет GPU, задайте `gpu_layers=0`, и SDK оставит всё на CPU. + +> **Tip:** При первом запуске будет скачано ~ 1,5 ГБ, поэтому выделите несколько минут и обеспечьте стабильное соединение. + +--- + +## Шаг 2 – Инициализация AI‑движка с конфигурацией модели + +Теперь мы поднимаем AI‑движок Aspose и передаём ему только что созданную конфигурацию. + +```python +# ---------------------------------------------------------------------- +# Step 2: Initialise the AI engine – pulls the model if needed +# ---------------------------------------------------------------------- +ocr_ai = AsposeAI() +ocr_ai.initialize(model_config) # This call blocks until the model is ready +``` + +**Что происходит за кулисами?** SDK проверяет `directory_model_path` на наличие уже загруженной модели. Если найдено соответствующее версии, она загружается мгновенно; иначе скачивается GGUF‑файл с Hugging Face, распаковывается и готовится конвейер вывода. + +--- + +## Шаг 3 – Создание OCR‑движка и подключение AI‑постпроцессора + +OCR‑движок выполняет тяжёлую работу по распознаванию символов. 
Подключив `ocr_ai.run_postprocessor`, мы автоматически включаем **очистку OCR‑текста** после распознавания. + +```python +# ---------------------------------------------------------------------- +# Step 3: Build the OCR engine and bind the LLM post‑processor +# ---------------------------------------------------------------------- +ocr_engine = ocr.OcrEngine() +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=None) +``` + +**Зачем нужен пост‑процессор?** Сырой OCR часто содержит неверные разрывы строк, ошибочно распознанную пунктуацию или лишние символы. LLM может переписать вывод в правильные предложения, исправить орфографию и даже восстановить пропущенные слова – по сути превращая «мусор» в отшлифованный текст. + +--- + +## Шаг 4 – Запуск OCR на файле изображения + +Когда всё соединено, пришло время передать изображение в движок. + +```python +# ---------------------------------------------------------------------- +# Step 4: Load the image and run OCR +# ---------------------------------------------------------------------- +input_image = ocr.Image.load(r"YOUR_DIRECTORY/handwritten_note.jpg") +raw_result = ocr_engine.recognize(input_image) # Returns an OcrResult object +``` + +**Особый случай:** Если изображение большое (> 5 МП), имеет смысл сначала уменьшить его, чтобы ускорить обработку. SDK принимает объект Pillow `Image`, так что вы можете предварительно обработать его с помощью `PIL.Image.thumbnail()` при необходимости. + +--- + +## Шаг 5 – Позвольте ИИ очистить распознанный текст и покажите обе версии + +Наконец, вызываем ранее подключённый пост‑процессор. Этот шаг демонстрирует контраст между *до* и *после* очистки. 
+ +```python +# ---------------------------------------------------------------------- +# Step 5: Clean the OCR output using the LLM and display both results +# ---------------------------------------------------------------------- +cleaned_result = ocr_engine.run_postprocessor(raw_result) + +print("=== Before AI ===") +print(raw_result.text) + +print("\n=== After AI ===") +print(cleaned_result.text) +``` + +### Ожидаемый вывод + +``` +=== Before AI === +Th1s 1s a h@ndwr1tt3n n0te. It c0nta1ns m1st@k3s, l1n3 br3aks, & sp3c!@l ch@r@ct3rs. + +=== After AI === +This is a handwritten note. It contains mistakes, line breaks, and special characters. +``` + +Обратите внимание, как LLM: + +- Исправил типичные ошибки OCR (`Th1s` → `This`). +- Удалил лишние символы (`&` → `and`). +- Привёл разрывы строк к корректным предложениям. + +--- + +## 🎨 Визуальный обзор (рабочий процесс «Run OCR on image») + +![Запуск OCR на изображении workflow](run_ocr_on_image_workflow.png "Диаграмма, показывающая конвейер запуска OCR на изображении от загрузки модели до очищенного вывода") + +Диаграмма выше суммирует весь конвейер: **загрузка модели с Hugging Face → настройка LLM → инициализация AI → OCR‑движок → AI‑постпроцессор → очистка OCR‑текста**. + +--- + +## Часто задаваемые вопросы и профессиональные советы + +### Что делать, если нет GPU? + +Установите `gpu_layers=0` в `AsposeAIModelConfig`. Модель будет полностью работать на CPU, что медленнее, но всё равно функционально. Вы также можете переключиться на более маленькую модель (например, `Qwen/Qwen2.5-1.5B‑Instruct‑GGUF`), чтобы время вывода оставалось приемлемым. + +### Как изменить модель позже? + +Просто обновите `hugging_face_repo_id` и заново выполните `ocr_ai.initialize(model_config)`. SDK обнаружит изменение версии, скачает новую модель и заменит кэшированные файлы. + +### Можно ли настроить подсказку (prompt) пост‑процессора? + +Да. Передайте словарь в `custom_settings` с ключом `prompt_template`. 
Например: + +```python +custom_prompt = { + "prompt_template": "Correct the following OCR text and keep line breaks:\n{ocr_text}" +} +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=custom_prompt) +``` + +### Стоит ли сохранять очищенный текст в файл? + +Определённо. После очистки вы можете записать результат в файл `.txt` или `.json` для дальнейшей обработки: + +```python +with open("cleaned_note.txt", "w", encoding="utf-8") as f: + f.write(cleaned_result.text) +``` + +--- + +## Заключение + +Мы показали, как **запустить OCR на изображении** с помощью Aspose OCR Cloud, автоматически **скачать модель с Hugging Face**, профессионально **настроить параметры модели LLM** и, наконец, **очистить OCR‑текст** с помощью мощного LLM‑постпроцессора. Весь процесс укладывается в один простой Python‑скрипт и работает как на GPU‑окружениях, так и на машинах без видеокарты. + +Если вам комфортно с этим конвейером, попробуйте поэкспериментировать с: + +- **Различными LLM** – например, `meta-llama/Meta-Llama-3-8B‑Instruct‑GGUF` для более широкого контекстного окна. +- **Пакетной обработкой** – пройдитесь по папке изображений и соберите очищенные результаты в CSV. +- **Пользовательскими подсказками** – адаптируйте ИИ под вашу область (юридические документы, медицинские записи и т.д.). + +Не бойтесь менять значение `gpu_layers`, заменять модель или подключать собственную подсказку. Возможности безграничны, а полученный код – это ваша стартовая площадка. + +Счастливого кодинга, и пусть ваши OCR‑результаты всегда будут чистыми! 
🚀 + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/spanish/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md b/ocr/spanish/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md new file mode 100644 index 000000000..635008676 --- /dev/null +++ b/ocr/spanish/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md @@ -0,0 +1,224 @@ +--- +category: general +date: 2026-03-28 +description: Cómo usar OCR para reconocer texto manuscrito en imágenes. Aprende a + extraer texto manuscrito, convertir imágenes manuscritas y obtener resultados limpios + rápidamente. +draft: false +keywords: +- how to use OCR +- recognize handwritten text +- extract handwritten text +- handwritten note to text +- convert handwritten image +language: es +og_description: Cómo usar OCR para reconocer texto manuscrito. Este tutorial te muestra + paso a paso cómo extraer texto manuscrito de imágenes y obtener resultados pulidos. +og_title: Cómo usar OCR para reconocer texto manuscrito – Guía completa +tags: +- OCR +- Handwriting Recognition +- Python +title: Cómo usar OCR para reconocer texto manuscrito – Guía completa +url: /es/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Cómo usar OCR para reconocer texto manuscrito – Guía completa + +Cómo usar OCR para notas manuscritas es una pregunta que muchos desarrolladores se hacen cuando necesitan digitalizar bocetos, actas de reuniones o ideas rápidas. 
En esta guía recorreremos los pasos exactos para reconocer texto manuscrito, extraer texto manuscrito y convertir una imagen manuscrita en cadenas limpias y buscables. + +Si alguna vez has mirado una foto de una lista de la compra y te has preguntado, “¿Puedo convertir esta imagen manuscrita a texto sin volver a teclear todo?” – estás en el lugar correcto. Al final tendrás un script listo para ejecutar que convierte una **nota manuscrita a texto** en segundos. + +## Lo que necesitarás + +- Python 3.8+ (el código funciona con cualquier versión reciente) +- La biblioteca `ocr` – instálala con `pip install ocr-sdk` (reemplaza con el nombre del paquete de tu proveedor) +- Una foto clara de una nota manuscrita (`hand_note.png` en el ejemplo) +- Un poco de curiosidad y un café ☕️ (opcional pero recomendado) + +Sin frameworks pesados, sin claves de nube pagas – solo un motor local que soporta **handwritten recognition** out of the box. + +## Paso 1 – Instalar el paquete OCR e importarlo + +Primero lo primero, obtengamos el paquete correcto en tu máquina. Abre una terminal y ejecuta: + +```bash +pip install ocr-sdk +``` + +Una vez que la instalación termine, importa el módulo en tu script: + +```python +# Step 1: Import the OCR SDK +import ocr +``` + +> **Consejo profesional:** Si estás usando un entorno virtual, actívalo antes de instalar. Eso mantiene tu proyecto ordenado y evita conflictos de versiones. + +## Paso 2 – Crear un motor OCR y habilitar el modo manuscrito + +Ahora realmente **cómo usar OCR** – necesitamos una instancia del motor que sepa que estamos tratando con trazos cursivos en lugar de fuentes impresas. El siguiente fragmento crea el motor y lo cambia al modo manuscrito: + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = ocr.OcrEngine() +ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN +``` + +¿Por qué establecer `recognition_mode`? 
Porque la mayoría de los motores OCR por defecto detectan texto impreso, lo que a menudo omite los bucles y ángulos de una nota personal. Habilitar el modo manuscrito aumenta la precisión dramáticamente. + +## Paso 3 – Cargar la imagen que deseas convertir (Convertir imagen manuscrita) + +Las imágenes son la materia prima para cualquier trabajo de OCR. Asegúrate de que tu foto esté guardada en un formato sin pérdida (PNG funciona muy bien) y que el texto sea razonablemente legible. Luego cárgala así: + +```python +# Step 3: Load the handwritten image you want to convert +handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") +``` + +Si la imagen está junto a tu script, puedes simplemente usar `"hand_note.png"` en lugar de una ruta completa. + +> **¿Qué pasa si la imagen está borrosa?** Intenta pre‑procesarla con OpenCV (p.ej., `cv2.cvtColor` a escala de grises, `cv2.threshold` para aumentar el contraste) antes de pasarla al motor OCR. + +## Paso 4 – Ejecutar el motor de reconocimiento para extraer texto manuscrito + +Con el motor listo y la imagen en memoria, finalmente podemos **extraer texto manuscrito**. El método `recognize` devuelve un objeto de resultado bruto que contiene el texto más puntuaciones de confianza. + +```python +# Step 4: Perform OCR and get the raw result +raw_result = ocr_engine.recognize(handwritten_image) +print("Raw OCR output:") +print(raw_result.text) +``` + +La salida bruta típica puede incluir saltos de línea extraños o caracteres mal identificados, especialmente si la escritura es desordenada. Por eso existe el siguiente paso. + +## Paso 5 – (Opcional) Pulir la salida con el post‑procesador de IA + +La mayoría de los SDKs OCR modernos incluyen un post‑procesador de IA ligero que limpia los espacios, corrige errores comunes de OCR y normaliza los finales de línea. 
Ejecutarlo es tan fácil como: + +```python +# Step 5: Refine the raw OCR output (handwritten note to text) +polished_result = ocr_engine.run_postprocessor(raw_result) + +# Display the cleaned, readable text +print("\nPolished OCR output:") +print(polished_result.text) +``` + +Si omites este paso aún obtendrás texto utilizable, pero la conversión de **nota manuscrita a texto** se verá un poco más áspera. El post‑procesador es especialmente útil para notas que contienen viñetas o palabras con mayúsculas y minúsculas mezcladas. + +## Paso 6 – Verificar el resultado y manejar casos límite + +Después de imprimir el resultado pulido, verifica que todo se vea correcto. Aquí tienes una rápida comprobación de sanidad que puedes añadir: + +```python +# Step 6: Simple verification +if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") +else: + print("\n✅ OCR succeeded! You can now save or further process the text.") +``` + +**Lista de verificación de casos límite** + +| Situación | Qué hacer | +|-----------|----------| +| **Muy bajo contraste** | Aumenta el contraste con `cv2.convertScaleAbs` antes de cargar. | +| **Múltiples idiomas** | Establece `ocr_engine.language = ["en", "es"]` (o tus idiomas objetivo). | +| **Documentos grandes** | Procesa páginas en lotes para evitar picos de memoria. | +| **Símbolos especiales** | Añade un diccionario personalizado vía `ocr_engine.add_custom_words([...])`. | + +## Visión general visual + +A continuación hay una imagen de marcador de posición que ilustra el flujo de trabajo — desde una nota fotografiada hasta texto limpio. El texto alternativo contiene la palabra clave principal, haciendo que la imagen sea SEO‑friendly. 
+ +![how to use OCR on a handwritten note image](/images/handwritten_ocr_flow.png "how to use OCR on a handwritten note image") + +## Script completo y ejecutable + +Juntando todas las piezas, aquí tienes el programa completo, listo para copiar y pegar: + +```python +# Complete script: Convert a handwritten image to clean text using OCR + +import ocr + +def main(): + # 1️⃣ Initialize OCR engine for handwritten recognition + ocr_engine = ocr.OcrEngine() + ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN + + # 2️⃣ Load the image containing the handwritten note + handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") + + # 3️⃣ Perform OCR to extract raw text + raw_result = ocr_engine.recognize(handwritten_image) + print("Raw OCR output:") + print(raw_result.text) + + # 4️⃣ (Optional) Run AI post‑processor for cleaner output + polished_result = ocr_engine.run_postprocessor(raw_result) + + # 5️⃣ Show the polished, readable text + print("\nPolished OCR output:") + print(polished_result.text) + + # 6️⃣ Simple sanity check + if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") + else: + print("\n✅ OCR succeeded! You can now save or further process the text.") + +if __name__ == "__main__": + main() +``` + +**Salida esperada (ejemplo)** + +``` +Raw OCR output: +T0d@y I w3nt to the market +and bought 5 aplpes, 2 bananas, +and a loaf of bread. + +Polished OCR output: +Today I went to the market and bought 5 apples, 2 bananas, and a loaf of bread. +``` + +Observa cómo el post‑procesador corrigió el error tipográfico “T0d@y” y normalizó los espacios. + +## Errores comunes y consejos profesionales + +- **El tamaño de la imagen importa** – los motores OCR suelen limitar el tamaño de entrada a 4 K × 4 K. Redimensiona fotos grandes de antemano. +- **Estilo de escritura** – Cursiva vs. letras de bloque pueden afectar la precisión. 
Si controlas la fuente (p.ej., un bolígrafo digital), fomenta letras de bloque para obtener los mejores resultados. +- **Procesamiento por lotes** – Cuando manejas decenas de notas, envuelve el script en un bucle y almacena cada resultado en un CSV o base de datos SQLite. +- **Fugas de memoria** – Algunos SDKs mantienen buffers internos; llama a `ocr_engine.dispose()` después de terminar si notas una desaceleración. + +## Próximos pasos – Más allá del OCR simple + +Ahora que dominas **cómo usar OCR** para una sola imagen, considera estas extensiones: + +1. **Integrar con almacenamiento en la nube** – Obtén imágenes de AWS S3 o Azure Blob, ejecuta la misma canalización y devuelve los resultados. +2. **Añadir detección de idioma** – Usa `ocr_engine.detect_language()` para cambiar automáticamente los diccionarios. +3. **Combinar con NLP** – Alimenta el texto limpio a spaCy o NLTK para extraer entidades, fechas o acciones. +4. **Crear un endpoint REST** – Envuelve el script en Flask o FastAPI para que otros servicios puedan POSTear imágenes y recibir texto codificado en JSON. + +Todas estas ideas siguen girando en torno a los conceptos centrales de **reconocer texto manuscrito**, **extraer texto manuscrito**, y **convertir imagen manuscrita** — las frases exactas que probablemente buscarás a continuación. + +--- + +### TL;DR + +Te mostramos **cómo usar OCR** para reconocer texto manuscrito, extraerlo y pulir el resultado en una cadena utilizable. El script completo está listo para ejecutar, el flujo de trabajo está explicado paso a paso, y ahora tienes una lista de verificación para casos límite comunes. Toma una foto de tu próxima nota de reunión, introdúcela en el script y deja que la máquina haga la escritura por ti. + +¡Feliz codificación, y que tus notas siempre sean legibles! 
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/spanish/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md b/ocr/spanish/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md new file mode 100644 index 000000000..318405521 --- /dev/null +++ b/ocr/spanish/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md @@ -0,0 +1,187 @@ +--- +category: general +date: 2026-03-28 +description: Realiza OCR en la imagen y obtén texto limpio con coordenadas de los + cuadros delimitadores. Aprende cómo extraer OCR, limpiar OCR y mostrar los resultados + paso a paso. +draft: false +keywords: +- perform OCR on image +- how to extract OCR +- how to clean OCR +- display bounding box coordinates +- OCR post‑processing +- OCR bounding boxes +language: es +og_description: Realiza OCR en la imagen, limpia la salida y muestra las coordenadas + de los cuadros delimitadores en un tutorial conciso. +og_title: Realizar OCR en una imagen – resultados limpios y cajas delimitadoras +tags: +- OCR +- Computer Vision +- Python +title: Realizar OCR en la imagen – Resultados limpios y mostrar coordenadas del cuadro + delimitador +url: /es/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Realizar OCR en Imagen – Resultados Limpios y Mostrar Coordenadas de Cuadro Delimitador + +¿Alguna vez necesitaste **realizar OCR en archivos de imagen** pero obtuviste texto desordenado y no sabías dónde se encuentra cada palabra en la foto? No estás solo. 
En muchos proyectos—digitalización de facturas, escaneo de recibos o extracción simple de texto—obtener la salida bruta de OCR es solo el primer obstáculo. ¿La buena noticia? Puedes limpiar esa salida y ver instantáneamente las coordenadas del cuadro delimitador de cada región sin escribir mucho código repetitivo. + +En esta guía recorreremos **cómo extraer OCR**, ejecutar un **cómo limpiar OCR** post‑procesador y, finalmente, **mostrar coordenadas de cuadro delimitador** para cada región limpia. Al final tendrás un único script ejecutable que convierte una foto borrosa en texto estructurado y ordenado listo para el procesamiento posterior. + +## Lo que Necesitarás + +- Python 3.9+ (la sintaxis a continuación funciona en 3.8 y versiones posteriores) +- Un motor OCR que soporte `recognize(..., return_structured=True)` – por ejemplo, una biblioteca ficticia `engine` usada en el fragmento. Reemplázala con Tesseract, EasyOCR o cualquier SDK que devuelva datos de regiones. +- Familiaridad básica con funciones y bucles en Python +- Un archivo de imagen que quieras escanear (PNG, JPG, etc.) + +> **Consejo profesional:** Si usas Tesseract, la función `pytesseract.image_to_data` ya te proporciona cuadros delimitadores. Puedes envolver su resultado en un pequeño adaptador que imite la API `engine.recognize` mostrada a continuación. + +--- + +![perform OCR on image example](image-placeholder.png "ejemplo de realizar OCR en imagen") + +*Texto alternativo: diagrama que muestra cómo realizar OCR en una imagen y visualizar las coordenadas del cuadro delimitador* + +## Paso 1 – Realizar OCR en Imagen y Obtener Regiones Estructuradas + +Lo primero es pedirle al motor OCR que devuelva no solo texto plano sino una lista estructurada de regiones de texto. Esta lista contiene la cadena cruda y el rectángulo que la encierra. 
+ +```python +import engine # replace with your actual OCR library +from pathlib import Path + +# Load the image you want to process +image_path = Path("sample_invoice.jpg") +image = engine.load_image(image_path) + +# Step 1: Perform OCR and request a structured list of text regions +raw_result = engine.recognize(image, return_structured=True) +``` + +**Por qué es importante:** +Cuando solo solicitas texto plano pierdes el contexto espacial. Los datos estructurados te permiten más tarde **mostrar coordenadas de cuadro delimitador**, alinear texto con tablas o proporcionar ubicaciones precisas a un modelo posterior. + +## Paso 2 – Cómo Limpiar la Salida de OCR con un Post‑Procesador + +Los motores OCR son excelentes detectando caracteres, pero a menudo dejan espacios sobrantes, artefactos de saltos de línea o símbolos mal reconocidos. Un post‑procesador normaliza el texto, corrige errores comunes de OCR y elimina espacios en blanco. + +```python +# Step 2: Clean the plain‑text of each region using the post‑processor +processed_result = engine.run_postprocessor(raw_result) +``` + +Si construyes tu propio limpiador, considera: + +- Eliminar caracteres no ASCII (`re.sub(r'[^\x00-\x7F]+',' ', text)`) +- Colapsar múltiples espacios en uno solo +- Aplicar un corrector ortográfico como `pyspellchecker` para errores evidentes + +**Por qué deberías preocuparte:** +Una cadena ordenada hace que la búsqueda, indexación y los pipelines de NLP posteriores sean mucho más fiables. En otras palabras, **cómo limpiar OCR** suele ser la diferencia entre un conjunto de datos utilizable y un dolor de cabeza. + +## Paso 3 – Mostrar Coordenadas de Cuadro Delimitador para Cada Región Limpia + +Ahora que el texto está ordenado, iteramos sobre cada región, imprimiendo su rectángulo y la cadena limpiada. Esta es la parte donde finalmente **mostramos coordenadas de cuadro delimitador**. 
+ +```python +# Step 3 – Iterate over the cleaned regions and display their bounding box and text +for text_region in processed_result.regions: + # Each region has a .bounding_box attribute (x, y, width, height) + bbox = text_region.bounding_box + print(f"[{bbox}] {text_region.text}") +``` + +**Salida de ejemplo** + +``` +[(34, 120, 210, 30)] Invoice #12345 +[(34, 160, 420, 28)] Date: 2026‑03‑01 +[(34, 200, 380, 28)] Total Amount: $1,254.00 +``` + +Ahora puedes pasar esas coordenadas a una biblioteca de dibujo (p. ej., OpenCV) para superponer cuadros sobre la imagen original, o almacenarlas en una base de datos para consultas posteriores. + +## Script Completo y Listo para Ejecutar + +A continuación tienes el programa completo que une los tres pasos. Sustituye las llamadas de marcador `engine` por tu SDK OCR real. + +```python +#!/usr/bin/env python3 +""" +Perform OCR on image → clean results → display bounding box coordinates. +Author: Your Name +Date: 2026‑03‑28 +""" + +import engine # <-- replace with your OCR library +from pathlib import Path +import sys + +def main(image_path: str): + # Load image + image = engine.load_image(Path(image_path)) + + # 1️⃣ Perform OCR and ask for structured output + raw_result = engine.recognize(image, return_structured=True) + + # 2️⃣ Clean the raw text using the built‑in post‑processor + processed_result = engine.run_postprocessor(raw_result) + + # 3️⃣ Show each region's bounding box and cleaned text + print("\n=== Cleaned OCR Regions ===") + for region in processed_result.regions: + bbox = region.bounding_box # (x, y, w, h) + print(f"[{bbox}] {region.text}") + +if __name__ == "__main__": + if len(sys.argv) != 2: + print("Usage: python perform_ocr.py ") + sys.exit(1) + main(sys.argv[1]) +``` + +### Cómo Ejecutar + +```bash +python perform_ocr.py sample_invoice.jpg +``` + +Deberías ver una lista de cuadros delimitadores emparejados con texto limpio, exactamente como la salida de ejemplo anterior. 
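El consejo profesional del inicio sugería envolver la salida de `pytesseract.image_to_data` en un adaptador que imite la API `engine.recognize`. Este boceto (con nombres hipotéticos y datos simulados, sin depender de ningún SDK real) muestra cómo podría lucir esa capa de adaptación:

```python
from dataclasses import dataclass

@dataclass
class TextRegion:
    """Forma mínima que espera el bucle del Paso 3: .text y .bounding_box."""
    text: str
    bounding_box: tuple  # (x, y, ancho, alto)

def adapt_word_data(words):
    """Convierte dicts palabra a palabra (estilo image_to_data) en regiones."""
    regions = []
    for w in words:
        if not w["text"].strip():  # descarta entradas vacías o solo espacios
            continue
        bbox = (w["left"], w["top"], w["width"], w["height"])
        regions.append(TextRegion(text=w["text"], bounding_box=bbox))
    return regions

# Datos simulados con la misma estructura que devolvería un motor real
sample_words = [
    {"text": "Invoice", "left": 34, "top": 120, "width": 90, "height": 30},
    {"text": "   ",     "left": 0,  "top": 0,   "width": 0,  "height": 0},
    {"text": "#12345",  "left": 130, "top": 120, "width": 80, "height": 30},
]

for region in adapt_word_data(sample_words):
    print(f"[{region.bounding_box}] {region.text}")
```

Con un adaptador así, el resto del flujo funciona sin cambios, independientemente del motor OCR que uses por debajo.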
+ +## Preguntas Frecuentes y Casos Extremos + +| Pregunta | Respuesta | +|----------|-----------| +| **¿Qué pasa si el motor OCR no soporta `return_structured`?** | Escribe un contenedor ligero que convierta la salida cruda del motor (usualmente una lista de palabras con coordenadas) en objetos con atributos `text` y `bounding_box`. | +| **¿Puedo obtener puntuaciones de confianza?** | Muchos SDK exponen una métrica de confianza por región. Añádela a la instrucción de impresión: `print(f"[{bbox}] {region.text} (conf: {region.confidence:.2f})")`. | +| **¿Cómo manejar texto rotado?** | Pre‑procesa la imagen con `cv2.minAreaRect` de OpenCV para desinclinar antes de llamar a `recognize`. | +| **¿Qué pasa si necesito la salida en JSON?** | Serializa `processed_result.regions` con `json.dumps([r.__dict__ for r in processed_result.regions], indent=2)`. | +| **¿Hay una forma de visualizar los cuadros?** | Usa OpenCV: `cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0), 2)` dentro del bucle, luego `cv2.imwrite("annotated.jpg", img)`. | + +## Conclusión + +Acabas de aprender **cómo realizar OCR en imagen**, limpiar la salida cruda y **mostrar coordenadas de cuadro delimitador** para cada región. El flujo de tres pasos—reconocer → post‑procesar → iterar—es un patrón reutilizable que puedes incorporar en cualquier proyecto Python que necesite extracción de texto fiable. + +### ¿Qué Sigue? + +- **Explora diferentes back‑ends OCR** (Tesseract, EasyOCR, Google Vision) y compara precisión. +- **Integra con una base de datos** para almacenar datos de regiones y crear archivos buscables. +- **Añade detección de idioma** para encaminar cada región al corrector ortográfico adecuado. +- **Superpone cuadros sobre la imagen original** para verificación visual (consulta el fragmento de OpenCV arriba). 
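La fila de la tabla de preguntas frecuentes sobre salida JSON merece un ejemplo concreto. Este ayudante (nombre hipotético) usa solo el módulo estándar `json` y acepta tanto objetos con atributos `text`/`bounding_box` como diccionarios:

```python
import json

def regions_to_json(regions, indent=2):
    """Serializa regiones OCR a JSON, aceptando objetos o diccionarios."""
    rows = []
    for r in regions:
        if isinstance(r, dict):
            rows.append({"text": r["text"],
                         "bounding_box": list(r["bounding_box"])})
        else:  # objeto con atributos, como los del resultado post‑procesado
            rows.append({"text": r.text,
                         "bounding_box": list(r.bounding_box)})
    return json.dumps(rows, indent=indent, ensure_ascii=False)

# Ejemplo con un dict simulado
demo = [{"text": "Invoice #12345", "bounding_box": (34, 120, 210, 30)}]
print(regions_to_json(demo))
```

El resultado es JSON listo para almacenar en disco o enviar por una API, con las coordenadas convertidas a listas para que sean serializables.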
+ +Si encuentras particularidades, recuerda que la mayor ventaja proviene de un sólido paso de post‑procesamiento; una cadena limpia es mucho más fácil de manejar que un volcado bruto de caracteres. + +¡Feliz codificación, y que tus pipelines de OCR siempre estén ordenados! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/spanish/python/general/python-ocr-tutorial-extract-text-from-images/_index.md b/ocr/spanish/python/general/python-ocr-tutorial-extract-text-from-images/_index.md new file mode 100644 index 000000000..626fc98c1 --- /dev/null +++ b/ocr/spanish/python/general/python-ocr-tutorial-extract-text-from-images/_index.md @@ -0,0 +1,233 @@ +--- +category: general +date: 2026-03-28 +description: Tutorial de OCR en Python que muestra cómo extraer texto de una imagen + con Aspose OCR Cloud. Aprende a cargar una imagen para OCR y convertirla a texto + plano en minutos. +draft: false +keywords: +- python ocr tutorial +- extract text image python +- ocr image to text +- load image for ocr +- convert image plain text +language: es +og_description: El tutorial de OCR en Python explica cómo cargar una imagen para OCR + y convertir la imagen a texto plano usando Aspose OCR Cloud. Obtén el código completo + y los consejos. +og_title: Tutorial de OCR en Python – Extraer texto de imágenes +tags: +- OCR +- Python +- Image Processing +title: Tutorial de OCR en Python – Extraer texto de imágenes +url: /es/python/general/python-ocr-tutorial-extract-text-from-images/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Tutorial de OCR en Python – Extraer Texto de Imágenes + +¿Alguna vez te has preguntado cómo convertir una foto desordenada de un recibo en texto limpio y buscable? No eres el único. 
En mi experiencia, el mayor obstáculo no es el motor OCR en sí, sino conseguir que la imagen tenga el formato correcto y extraer el texto plano sin problemas. + +Este **python ocr tutorial** te guía paso a paso: cargar una imagen para OCR, ejecutar el reconocimiento y, finalmente, convertir el texto plano de la imagen en una cadena de Python que puedas almacenar o analizar. Al final podrás **extraer texto de imagen con python**, y no necesitarás ninguna licencia de pago para comenzar. + +## Lo que aprenderás + +- Cómo instalar e importar el Aspose OCR Cloud SDK para Python. +- El código exacto para **cargar imagen para OCR** (PNG, JPEG, TIFF, PDF, etc.). +- Cómo llamar al motor para realizar la conversión **ocr image to text**. +- Consejos para manejar casos límite comunes como PDFs de varias páginas o escaneos de baja resolución. +- Formas de verificar la salida y qué hacer si el texto aparece distorsionado. + +### Requisitos previos + +- Python 3.8+ instalado en tu máquina. +- Una cuenta gratuita de Aspose Cloud (la prueba funciona sin licencia). +- Familiaridad básica con pip y entornos virtuales—nada complicado. + +> **Consejo profesional:** Si ya estás usando un virtualenv, actívalo ahora. Mantiene tus dependencias ordenadas y evita conflictos de versiones. + +![Captura de pantalla del tutorial de OCR en Python que muestra el texto reconocido](path/to/ocr_example.png "Tutorial de OCR en Python – visualización del texto plano extraído") + +## Paso 1 – Instalar el Aspose OCR Cloud SDK + +Lo primero es obtener la biblioteca que se comunica con el servicio OCR de Aspose. Abre una terminal y ejecuta: + +```bash +pip install asposeocrcloud +``` + +Ese único comando descarga el SDK más reciente (actualmente versión 23.12). El paquete incluye todo lo que necesitas—no se requieren librerías adicionales de procesamiento de imágenes. 
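Antes de continuar, puede ser útil comprobar desde Python que el paquete quedó instalado en el entorno activo. Esta comprobación usa solo la biblioteca estándar (el nombre del paquete es el mismo que instalamos arriba):

```python
import importlib.util

def sdk_available(module_name: str) -> bool:
    """Devuelve True si el módulo puede importarse en el entorno actual."""
    return importlib.util.find_spec(module_name) is not None

# Comprobación rápida antes de seguir con el Paso 2
if sdk_available("asposeocrcloud"):
    print("SDK listo: continúa con el Paso 2.")
else:
    print("Falta el SDK: ejecuta `pip install asposeocrcloud` primero.")
```

Si el mensaje indica que falta el SDK, revisa que el entorno virtual activo sea el mismo donde ejecutaste `pip install`.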
+ +## Paso 2 – Inicializar el Motor OCR (Palabra clave principal en acción) + +Ahora que el SDK está listo, podemos iniciar el motor del **python ocr tutorial**. El constructor no necesita ninguna clave de licencia para la prueba, lo que simplifica las cosas. + +```python +import asposeocrcloud as ocr + +# Initialise the OCR engine – no licence needed for trial use +ocr_engine = ocr.OcrEngine() +``` + +> **Por qué importa:** Inicializar el motor solo una vez mantiene las llamadas posteriores rápidas. Si recreas el objeto para cada imagen, desperdiciarás viajes de red. + +## Paso 3 – Cargar Imagen para OCR + +Aquí es donde brilla la palabra clave **cargar imagen para OCR**. El método `Image.load` del SDK acepta una ruta de archivo o una URL, y detecta automáticamente el formato (PNG, JPEG, TIFF, PDF, etc.). Carguemos un recibo de ejemplo: + +```python +# Step 3: Load the input image (PNG, JPEG, TIFF, PDF …) +input_image = ocr.Image.load(r"YOUR_DIRECTORY/receipt.png") +``` + +Si trabajas con un PDF de varias páginas, simplemente apunta al archivo PDF; el SDK tratará cada página como una imagen separada internamente. + +## Paso 4 – Realizar la Conversión OCR de Imagen a Texto + +Con la imagen en memoria, el OCR real ocurre en una sola línea. El método `recognize` devuelve un objeto `OcrResult` que contiene el texto plano, puntuaciones de confianza e incluso cajas delimitadoras si las necesitas más adelante. + +```python +# Step 4: Perform OCR on the loaded image +ocr_result = ocr_engine.recognize(input_image) +``` + +> **Caso límite:** Para fotos de baja resolución (menos de 300 dpi) puede que quieras escalar la imagen primero. El SDK ofrece un ayudante `Resize`, pero para la mayoría de los recibos el valor predeterminado funciona bien. + +## Paso 5 – Convertir el Texto Plano de la Imagen en una Cadena Utilizable + +La pieza final del rompecabezas es extraer el texto plano del objeto de resultado. 
Este es el paso **convert image plain text** que transforma el blob OCR en algo que puedes imprimir, almacenar o pasar a otro sistema. + +```python +# Step 5: Output the recognised plain text +print("Plain OCR:") +print(ocr_result.text) +``` + +Al ejecutar el script, deberías ver algo como: + +``` +Plain OCR: +Starbucks Coffee +Date: 2026/03/27 +Total: $4.75 +Thank you! +``` + +Esa salida ahora es una cadena de Python normal, lista para exportarse a CSV, insertarse en una base de datos o procesarse con procesamiento de lenguaje natural. + +## Manejo de Problemas Comunes + +### 1. Imágenes en blanco o ruidosas + +Si `ocr_result.text` vuelve vacío, verifica la calidad de la imagen. Una solución rápida es añadir un paso de preprocesamiento: + +```python +# Simple preprocessing – convert to grayscale and increase contrast +processed = input_image.to_grayscale().adjust_contrast(1.2) +ocr_result = ocr_engine.recognize(processed) +``` + +### 2. PDFs de varias páginas + +Cuando alimentas un PDF, `recognize` devuelve resultados para cada página. Recorre los resultados así: + +```python +pdf_image = ocr.Image.load("document.pdf") +pages = pdf_image.pages # collection of page images + +for i, page in enumerate(pages, start=1): + result = ocr_engine.recognize(page) + print(f"--- Page {i} ---") + print(result.text) +``` + +### 3. Soporte de idiomas + +Aspose OCR admite más de 60 idiomas. Para cambiar el idioma, establece la propiedad `language` antes de llamar a `recognize`: + +```python +ocr_engine.language = "fr" # French +ocr_result = ocr_engine.recognize(input_image) +``` + +## Ejemplo Completo Funcional + +Juntando todo, aquí tienes un script completo listo para copiar y pegar que cubre desde la instalación hasta el manejo de casos límite: + +```python +# -*- coding: utf-8 -*- +""" +Python OCR tutorial – complete example using Aspose OCR Cloud. +Demonstrates loading an image, performing OCR, and handling multi‑page PDFs. 
+""" + +import asposeocrcloud as ocr +import os + +def ocr_file(filepath: str, language: str = "en"): + """ + Perform OCR on a given file (image or PDF) and return plain text. + """ + # Initialise engine (trial licence) + engine = ocr.OcrEngine() + engine.language = language + + # Load the file – SDK auto‑detects format + image = ocr.Image.load(filepath) + + # If it's a PDF, iterate over pages + if image.is_pdf: + all_text = [] + for page in image.pages: + result = engine.recognize(page) + all_text.append(result.text) + return "\n".join(all_text) + + # Single‑image case + result = engine.recognize(image) + return result.text + + +if __name__ == "__main__": + # Example usage – replace with your own path + sample_path = os.path.join("YOUR_DIRECTORY", "receipt.png") + + if not os.path.exists(sample_path): + raise FileNotFoundError(f"File not found: {sample_path}") + + extracted = ocr_file(sample_path) + print("=== Extracted Text ===") + print(extracted) +``` + +Ejecuta el script (`python ocr_demo.py`) y verás la salida **ocr image to text** directamente en tu consola. + +## Recapitulación – Lo que cubrimos + +- Instalamos el SDK **Aspose OCR Cloud** (`pip install asposeocrcloud`). +- **Inicializamos el motor OCR** sin licencia (perfecto para la prueba). +- Demostramos cómo **cargar imagen para OCR**, ya sea PNG, JPEG o PDF. +- Ejecutamos la conversión **ocr image to text** y **convertimos el texto plano de la imagen** en una cadena de Python utilizable. +- Abordamos problemas comunes como escaneos de baja resolución, PDFs de varias páginas y selección de idioma. + +## Próximos Pasos y Temas Relacionados + +Ahora que dominas el **python ocr tutorial**, considera explorar: + +- **Extract text image python** para procesamiento por lotes de grandes carpetas de recibos. +- Integrar la salida OCR con **pandas** para análisis de datos (`df = pd.read_csv(StringIO(extracted))`). +- Usar **Tesseract OCR** como alternativa cuando la conectividad a internet es limitada. 
+- Añadir post‑procesamiento con **spaCy** para identificar entidades como fechas, montos y nombres de comercios. + +Siéntete libre de experimentar: prueba diferentes formatos de imagen, ajusta el contraste o cambia de idioma. El mundo del OCR es amplio, y las habilidades que acabas de adquirir son una base sólida para cualquier proyecto de automatización documental. + +¡Feliz codificación, y que tu texto siempre sea legible! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/spanish/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md b/ocr/spanish/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md new file mode 100644 index 000000000..754034c2c --- /dev/null +++ b/ocr/spanish/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md @@ -0,0 +1,220 @@ +--- +category: general +date: 2026-03-28 +description: Aprende cómo ejecutar OCR en una imagen, descargar automáticamente el + modelo de Hugging Face, limpiar el texto OCR y configurar el modelo LLM en Python + usando Aspose OCR Cloud. +draft: false +keywords: +- run OCR on image +- download hugging face model +- clean OCR text +- configure LLM model +language: es +og_description: Ejecuta OCR en la imagen y limpia la salida usando un modelo de Hugging Face + descargado automáticamente. Esta guía muestra cómo configurar el modelo LLM en Python. 
+og_title: Ejecutar OCR en una imagen – Tutorial completo de Aspose OCR Cloud +tags: +- OCR +- Python +- LLM +- HuggingFace +title: Ejecuta OCR en una imagen con Aspose OCR Cloud – Guía completa paso a paso +url: /es/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Ejecutar OCR en Imagen – Tutorial Completo de Aspose OCR Cloud + +¿Alguna vez necesitaste ejecutar OCR en archivos de imagen pero la salida cruda parecía un desastre desordenado? En mi experiencia, el mayor punto doloroso no es el reconocimiento en sí, sino la limpieza. Afortunadamente, Aspose OCR Cloud te permite adjuntar un post‑procesador LLM que puede *limpiar el texto OCR* automáticamente. En este tutorial recorreremos todo lo que necesitas: desde **descargar un modelo de Hugging Face** hasta configurar el LLM, ejecutar el motor OCR y, finalmente, pulir el resultado. + +Al final de esta guía tendrás un script listo para ejecutar que: + +1. Obtiene un modelo compacto Qwen 2.5 de Hugging Face (descargado automáticamente para ti). +2. Configura el modelo para ejecutar parte de la red en GPU y el resto en CPU. +3. Ejecuta el motor OCR sobre una imagen de una nota manuscrita. +4. Usa el LLM para limpiar el texto reconocido, dándote una salida legible para humanos. + +> **Prerequisitos** – Python 3.8+, paquete `asposeocrcloud`, una GPU con al menos 4 GB de VRAM (opcional pero recomendado) y una conexión a internet para la primera descarga del modelo. + +--- + +## Lo que Necesitarás + +- **Aspose OCR Cloud SDK** – instálalo vía `pip install asposeocrcloud`. +- **Una imagen de ejemplo** – p. ej., `handwritten_note.jpg` colocada en una carpeta local. +- **Soporte GPU** – si dispones de una GPU con CUDA, el script delegará 30 capas; de lo contrario volverá a CPU automáticamente. 
+- **Permiso de escritura** – el script almacena en caché el modelo en `YOUR_DIRECTORY`; asegúrate de que la carpeta exista. + +--- + +## Paso 1 – Configurar el Modelo LLM (descargar modelo de Hugging Face) + +Lo primero que hacemos es indicar a Aspose AI dónde obtener el modelo. La clase `AsposeAIModelConfig` gestiona la descarga automática, la cuantización y la asignación de capas a GPU. + +```python +import asposeocrcloud as ocr +from asposeocrcloud.ai import AsposeAI, AsposeAIModelConfig + +# ---------------------------------------------------------------------- +# Step 1: Model configuration – this will download the model if it’s missing +# ---------------------------------------------------------------------- +model_config = AsposeAIModelConfig( + allow_auto_download="true", # Enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", # Repo on Hugging Face + hugging_face_quantization="int8", # Small footprint, fast inference + gpu_layers=30, # 30 layers on GPU, rest on CPU + directory_model_path=r"YOUR_DIRECTORY" # Cache folder (optional) +) +``` + +**Por qué es importante** – Cuantizar a `int8` reduce drásticamente el uso de memoria (≈ 4 GB vs 12 GB). Dividir el modelo entre GPU y CPU te permite ejecutar un LLM de 3 mil millones de parámetros incluso en una RTX 3060 modesta. Si no tienes GPU, establece `gpu_layers=0` y el SDK mantendrá todo en CPU. + +> **Consejo:** La primera ejecución descargará ~ 1.5 GB, así que dale unos minutos y una conexión estable. + +--- + +## Paso 2 – Inicializar el Motor AI con la Configuración del Modelo + +Ahora iniciamos el motor AI de Aspose y le pasamos la configuración que acabamos de crear. 
+ +```python +# ---------------------------------------------------------------------- +# Step 2: Initialise the AI engine – pulls the model if needed +# ---------------------------------------------------------------------- +ocr_ai = AsposeAI() +ocr_ai.initialize(model_config) # This call blocks until the model is ready +``` + +**¿Qué ocurre bajo el capó?** El SDK verifica `directory_model_path` en busca de un modelo existente. Si encuentra una versión coincidente lo carga al instante; de lo contrario descarga el archivo GGUF de Hugging Face, lo descomprime y prepara la canalización de inferencia. + +--- + +## Paso 3 – Crear el Motor OCR y Adjuntar el Post‑procesador AI + +El motor OCR realiza el trabajo pesado de reconocer caracteres. Al adjuntar `ocr_ai.run_postprocessor` habilitamos **texto OCR limpio** automáticamente después del reconocimiento. + +```python +# ---------------------------------------------------------------------- +# Step 3: Build the OCR engine and bind the LLM post‑processor +# ---------------------------------------------------------------------- +ocr_engine = ocr.OcrEngine() +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=None) +``` + +**¿Por qué usar un post‑procesador?** El OCR crudo a menudo incluye saltos de línea en lugares incorrectos, puntuación mal detectada o símbolos errantes. El LLM puede reescribir la salida en oraciones correctas, corregir la ortografía e incluso inferir palabras faltantes, esencialmente convirtiendo un volcado bruto en prosa pulida. + +--- + +## Paso 4 – Ejecutar OCR en un Archivo de Imagen + +Con todo conectado, es hora de alimentar una imagen al motor. 
+ +```python +# ---------------------------------------------------------------------- +# Step 4: Load the image and run OCR +# ---------------------------------------------------------------------- +input_image = ocr.Image.load(r"YOUR_DIRECTORY/handwritten_note.jpg") +raw_result = ocr_engine.recognize(input_image) # Returns an OcrResult object +``` + +**Caso límite:** Si la imagen es grande (> 5 MP), quizá quieras redimensionarla primero para acelerar el procesamiento. El SDK acepta un objeto `Image` de Pillow, así que puedes pre‑procesar con `PIL.Image.thumbnail()` si lo necesitas. + +--- + +## Paso 5 – Dejar que la IA Limpie el Texto Reconocido y Mostrar Ambas Versiones + +Finalmente invocamos el post‑procesador que adjuntamos antes. Este paso muestra el contraste entre *antes* y *después* de la limpieza. + +```python +# ---------------------------------------------------------------------- +# Step 5: Clean the OCR output using the LLM and display both results +# ---------------------------------------------------------------------- +cleaned_result = ocr_engine.run_postprocessor(raw_result) + +print("=== Before AI ===") +print(raw_result.text) + +print("\n=== After AI ===") +print(cleaned_result.text) +``` + +### Salida Esperada + +``` +=== Before AI === +Th1s 1s a h@ndwr1tt3n n0te. It c0nta1ns m1st@k3s, l1n3 br3aks, & sp3c!@l ch@r@ct3rs. + +=== After AI === +This is a handwritten note. It contains mistakes, line breaks, and special characters. +``` + +Observa cómo el LLM ha: + +- Corregido errores comunes de OCR (`Th1s` → `This`). +- Eliminado símbolos errantes (`&` → `and`). +- Normalizado saltos de línea en oraciones correctas. 
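Para entender qué clase de trabajo hace el post‑procesador, este limpiador ingenuo basado en reglas reproduce solo las sustituciones de caracteres del ejemplo anterior. Es una ilustración didáctica, no un sustituto del LLM, que además corrige gramática y contexto:

```python
import re

# Sustituciones típicas de confusiones OCR; mapa deliberadamente simplificado
CHAR_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "@": "a", "!": "i"})

def naive_clean(raw: str) -> str:
    """Limpieza por reglas: mapa de caracteres, '&' -> 'and' y espacios."""
    text = raw.translate(CHAR_MAP)
    text = text.replace("&", "and")
    return re.sub(r"\s+", " ", text).strip()  # colapsa saltos de línea

print(naive_clean("Th1s 1s a h@ndwr1tt3n n0te.\nIt c0nta1ns m1st@k3s."))
# → This is a handwritten note. It contains mistakes.
```

Un mapa fijo como este también dañaría números legítimos (por ejemplo, `2026` se convertiría en `2o26`); justamente por eso el post‑procesador LLM, que decide según el contexto, produce resultados mucho mejores.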
+ +--- + +## 🎨 Visión General Visual (Flujo de Ejecutar OCR en Imagen) + +![Flujo de ejecución de OCR en imagen](run_ocr_on_image_workflow.png "Diagrama que muestra el pipeline de OCR en imagen desde la descarga del modelo hasta la salida limpiada") + +El diagrama anterior resume el pipeline completo: **descargar modelo de Hugging Face → configurar LLM → inicializar AI → motor OCR → post‑procesador AI → texto OCR limpio**. + +--- + +## Preguntas Frecuentes y Consejos Profesionales + +### ¿Qué pasa si no tengo GPU? + +Establece `gpu_layers=0` en `AsposeAIModelConfig`. El modelo se ejecutará completamente en CPU, lo cual es más lento pero sigue siendo funcional. También puedes cambiar a un modelo más pequeño (p. ej., `Qwen/Qwen2.5-1.5B‑Instruct‑GGUF`) para mantener razonable el tiempo de inferencia. + +### ¿Cómo cambio el modelo más adelante? + +Simplemente actualiza `hugging_face_repo_id` y vuelve a ejecutar `ocr_ai.initialize(model_config)`. El SDK detectará el cambio de versión, descargará el nuevo modelo y reemplazará los archivos en caché. + +### ¿Puedo personalizar el prompt del post‑procesador? + +Sí. Pasa un diccionario a `custom_settings` con una clave `prompt_template`. Por ejemplo: + +```python +custom_prompt = { + "prompt_template": "Correct the following OCR text and keep line breaks:\n{ocr_text}" +} +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=custom_prompt) +``` + +### ¿Debo almacenar el texto limpio en un archivo? + +Definitivamente. 
Después de la limpieza puedes escribir el resultado en un archivo `.txt` o `.json` para procesamiento posterior: + +```python +with open("cleaned_note.txt", "w", encoding="utf-8") as f: + f.write(cleaned_result.text) +``` + +--- + +## Conclusión + +Acabamos de mostrarte cómo **ejecutar OCR en archivos de imagen** con Aspose OCR Cloud, **descargar automáticamente un modelo de Hugging Face**, configurar expertamente los **ajustes del modelo LLM** y, finalmente, **limpiar el texto OCR** usando un potente post‑procesador LLM. Todo el proceso cabe en un único script fácil de ejecutar y funciona tanto en máquinas con GPU como en aquellas solo con CPU. + +Si te sientes cómodo con este pipeline, considera experimentar con: + +- **Diferentes LLMs** – prueba `meta-llama/Meta-Llama-3-8B‑Instruct‑GGUF` para una ventana de contexto mayor. +- **Procesamiento por lotes** – recorre una carpeta de imágenes y agrega los resultados limpios a un CSV. +- **Prompts personalizados** – adapta la IA a tu dominio (documentos legales, notas médicas, etc.). + +Siéntete libre de ajustar el valor de `gpu_layers`, cambiar el modelo o conectar tu propio prompt. El cielo es el límite, y el código que tienes ahora es la plataforma de lanzamiento. + +¡Feliz codificación, y que tus salidas OCR estén siempre limpias! 
🚀 + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/swedish/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md b/ocr/swedish/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md new file mode 100644 index 000000000..24dad54fc --- /dev/null +++ b/ocr/swedish/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md @@ -0,0 +1,225 @@ +--- +category: general +date: 2026-03-28 +description: Hur man använder OCR för att känna igen handskriven text i bilder. Lär + dig att extrahera handskriven text, konvertera handskriven bild och få rena resultat + snabbt. +draft: false +keywords: +- how to use OCR +- recognize handwritten text +- extract handwritten text +- handwritten note to text +- convert handwritten image +language: sv +og_description: Hur du använder OCR för att känna igen handskriven text. Den här handledningen + visar dig steg för steg hur du extraherar handskriven text från bilder och får ett + polerat resultat. +og_title: Hur man använder OCR för att känna igen handskriven text – Komplett guide +tags: +- OCR +- Handwriting Recognition +- Python +title: Hur du använder OCR för att känna igen handskriven text – Komplett guide +url: /sv/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Hur man använder OCR för att känna igen handskriven text – Komplett guide + +Hur man använder OCR för handskrivna anteckningar är en fråga som många utvecklare ställer när de behöver digitalisera skisser, mötesprotokoll eller snabba idéer. 
I den här guiden går vi igenom de exakta stegen för att känna igen handskriven text, extrahera handskriven text och omvandla en handskriven bild till rena, sökbara strängar.

Om du någonsin har stirrat på ett foto av en inköpslista och undrat, “Kan jag konvertera den här handskrivna bilden till text utan att skriva om allt?” – så är du på rätt plats. Vid slutet har du ett färdigt skript som förvandlar en **handwritten note to text** på några sekunder.

## Vad du behöver

- Python 3.8+ (koden fungerar med alla nyare versioner)
- `ocr`-biblioteket – installera det med `pip install ocr-sdk` (byt ut mot din leverantörs paketnamn)
- En tydlig bild av en handskriven anteckning (`hand_note.png` i exemplet)
- En gnutta nyfikenhet och en kaffe ☕️ (valfritt men rekommenderas)

Inga tunga ramverk, inga betalda molnnycklar – bara en lokal motor som stödjer **handwritten recognition** direkt ur lådan.

## Steg 1 – Installera OCR-paketet och importera det

Först och främst, låt oss skaffa rätt paket på din maskin. Öppna en terminal och kör:

```bash
pip install ocr-sdk
```

När installationen är klar, importera modulen i ditt skript:

```python
# Step 1: Import the OCR SDK
import ocr
```

> **Pro tip:** Om du använder en virtuell miljö, aktivera den innan du installerar. Det håller ditt projekt prydligt och undviker versionskonflikter.

## Steg 2 – Skapa en OCR-motor och aktivera handskriftsläge

Nu ska vi omsätta **how to use OCR** i praktiken – vi behöver en motorinstans som vet att vi hanterar kursiva streck snarare än tryckt text. Följande kodsnutt skapar motorn och växlar den till handskriftsläge:

```python
# Step 2: Initialize the OCR engine for handwritten text
ocr_engine = ocr.OcrEngine()
ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN
```

Varför sätta `recognition_mode`? Eftersom de flesta OCR-motorer som standard detekterar tryckt text, vilket ofta missar slingorna och lutningarna i en personlig anteckning. 
Att aktivera handskriftsläget ökar noggrannheten dramatiskt. + +## Steg 3 – Ladda bilden du vill konvertera (Convert Handwritten Image) + +Bilder är råmaterialet för alla OCR-uppgifter. Se till att din bild sparas i ett förlustfritt format (PNG fungerar bra) och att texten är rimligt läsbar. Ladda sedan den så här: + +```python +# Step 3: Load the handwritten image you want to convert +handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") +``` + +Om bilden ligger bredvid ditt skript kan du helt enkelt använda `"hand_note.png"` istället för en fullständig sökväg. + +> **What if the image is blurry?** Försök med förbehandling med OpenCV (t.ex. `cv2.cvtColor` till gråskala, `cv2.threshold` för att öka kontrasten) innan du matar in den i OCR-motorn. + +## Steg 4 – Kör igenkänningsmotorn för att extrahera handskriven text + +Med motorn redo och bilden i minnet kan vi äntligen **extract handwritten text**. `recognize`-metoden returnerar ett råresultatobjekt som innehåller texten plus förtroendesiffror. + +```python +# Step 4: Perform OCR and get the raw result +raw_result = ocr_engine.recognize(handwritten_image) +print("Raw OCR output:") +print(raw_result.text) +``` + +Typisk råoutput kan innehålla oönskade radbrytningar eller felidentifierade tecken, särskilt om handstilen är rörig. Det är därför nästa steg finns. + +## Steg 5 – (Valfritt) Polera outputen med AI‑postprocessorn + +De flesta moderna OCR SDK:er levereras med en lättviktig AI‑postprocessor som rensar upp mellanslag, fixar vanliga OCR‑fel och normaliserar radslut. Att köra den är lika enkelt som: + +```python +# Step 5: Refine the raw OCR output (handwritten note to text) +polished_result = ocr_engine.run_postprocessor(raw_result) + +# Display the cleaned, readable text +print("\nPolished OCR output:") +print(polished_result.text) +``` + +Om du hoppar över detta steg får du fortfarande användbar text, men konverteringen **handwritten note to text** kommer att se lite grövre ut. 
Postprocessorn är särskilt praktisk för anteckningar som innehåller punktlistor eller blandade versaler.

## Steg 6 – Verifiera resultatet och hantera kantfall

Efter att ha skrivit ut det polerade resultatet, dubbelkolla att allt ser rätt ut. Här är en snabb kontroll du kan lägga till:

```python
# Step 6: Simple verification
if not polished_result.text.strip():
    raise ValueError("OCR returned an empty string – check image quality.")
else:
    print("\n✅ OCR succeeded! You can now save or further process the text.")
```

**Kantfallschecklista**

| Situation | Vad man ska göra |
|-----------|-------------------|
| **Mycket låg kontrast** | Öka kontrasten med `cv2.convertScaleAbs` innan inläsning. |
| **Flera språk** | Sätt `ocr_engine.language = ["en", "es"]` (eller dina målspråk). |
| **Stora dokument** | Bearbeta sidor i batcher för att undvika minnesspikar. |
| **Specialtecken** | Lägg till en anpassad ordlista via `ocr_engine.add_custom_words([...])`. |

## Visuell översikt

Nedan är en platshållarbild som illustrerar arbetsflödet – från en fotograferad anteckning till ren text. Alt‑texten innehåller huvudnyckelordet, vilket gör bilden SEO‑vänlig. 
+ +![hur man använder OCR på en handskriven anteckningsbild](/images/handwritten_ocr_flow.png "hur man använder OCR på en handskriven anteckningsbild") + +## Fullt, körbart skript + +När alla bitar satts ihop, här är det kompletta, kopiera‑och‑klistra‑klara programmet: + +```python +# Complete script: Convert a handwritten image to clean text using OCR + +import ocr + +def main(): + # 1️⃣ Initialize OCR engine for handwritten recognition + ocr_engine = ocr.OcrEngine() + ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN + + # 2️⃣ Load the image containing the handwritten note + handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") + + # 3️⃣ Perform OCR to extract raw text + raw_result = ocr_engine.recognize(handwritten_image) + print("Raw OCR output:") + print(raw_result.text) + + # 4️⃣ (Optional) Run AI post‑processor for cleaner output + polished_result = ocr_engine.run_postprocessor(raw_result) + + # 5️⃣ Show the polished, readable text + print("\nPolished OCR output:") + print(polished_result.text) + + # 6️⃣ Simple sanity check + if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") + else: + print("\n✅ OCR succeeded! You can now save or further process the text.") + +if __name__ == "__main__": + main() +``` + +**Förväntad output (exempel)** + +``` +Raw OCR output: +T0d@y I w3nt to the market +and bought 5 aplpes, 2 bananas, +and a loaf of bread. + +Polished OCR output: +Today I went to the market and bought 5 apples, 2 bananas, and a loaf of bread. +``` + +Lägg märke till hur postprocessorn fixade stavfelet “T0d@y” och normaliserade mellanslagen. + +## Vanliga fallgropar & Pro‑tips + +- **Image size matters** – OCR-motorer brukar begränsa inmatningsstorleken till 4 K × 4 K. Ändra storlek på stora foton i förväg. +- **Handwriting style** – Kursiv vs. blockbokstäver kan påverka noggrannheten. Om du kontrollerar källan (t.ex. 
en digital penna), uppmuntra blockbokstäver för bästa resultat. +- **Batch processing** – När du hanterar dussintals anteckningar, omslut skriptet i en loop och lagra varje resultat i en CSV‑ eller SQLite‑databas. +- **Memory leaks** – Vissa SDK:er behåller interna buffertar; anropa `ocr_engine.dispose()` när du är klar om du märker en nedgång i prestanda. + +## Nästa steg – Gå bortom enkel OCR + +Nu när du behärskar **how to use OCR** för en enskild bild, överväg dessa tillägg: + +1. **Integrate with cloud storage** – Hämta bilder från AWS S3 eller Azure Blob, kör samma pipeline och skicka tillbaka resultaten. +2. **Add language detection** – Använd `ocr_engine.detect_language()` för att automatiskt byta ordböcker. +3. **Combine with NLP** – Mata den rensade texten i spaCy eller NLTK för att extrahera entiteter, datum eller åtgärdspunkter. +4. **Create a REST endpoint** – Omslut skriptet i Flask eller FastAPI så att andra tjänster kan POST:a bilder och få JSON‑kodad text. + +Alla dessa idéer kretsar fortfarande kring kärnkoncepten **recognize handwritten text**, **extract handwritten text**, och **convert handwritten image**—de exakta fraserna du sannolikt kommer att söka efter härnäst. + +--- + +### TL;DR + +Vi visade dig **how to use OCR** för att känna igen handskriven text, extrahera den och polera resultatet till en användbar sträng. Det fullständiga skriptet är redo att köras, arbetsflödet förklaras steg‑för‑steg, och du har nu en checklista för vanliga kantfall. Ta ett foto av din nästa mötesanteckning, mata in det i skriptet, och låt maskinen göra skrivandet åt dig. + +Lycka till med kodandet, och må dina anteckningar alltid vara läsbara! 
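
Och en sista godbit: vill du se grundidén bakom en sådan polering utan något SDK alls? Nedan en minimal, regelbaserad skiss i ren standard‑Python. Funktionerna `clean_ocr_text` och `clean_token` är våra egna påhitt (de ingår inte i något OCR‑bibliotek) och rättar bara de vanligaste teckenförväxlingarna, som `0 → o` och `3 → e`:

```python
import re

# OBS: hypotetisk, regelbaserad rensare – ingår inte i något OCR-SDK
LEET_MAP = str.maketrans({"0": "o", "3": "e", "@": "a"})

def clean_token(token: str) -> str:
    # De-leeta bara tokens som blandar bokstäver med 0/3/@,
    # så att rena tal som "5" eller "2026" lämnas orörda
    if re.search(r"[A-Za-z]", token) and re.search(r"[03@]", token):
        return token.translate(LEET_MAP)
    return token

def clean_ocr_text(raw: str) -> str:
    # Slå ihop radbrytningar och upprepade mellanslag till enkla mellanslag
    flat = re.sub(r"\s+", " ", raw).strip()
    return " ".join(clean_token(t) for t in flat.split(" "))

print(clean_ocr_text("T0d@y I w3nt to the\nmarket"))  # → "Today I went to the market"
```

En riktig AI‑postprocessor gör förstås mycket mer (kontext, stavning, grammatik), men mönstret – normalisera blanksteg och ersätt kända förväxlingar – är samma grundtanke.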
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/swedish/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md b/ocr/swedish/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md new file mode 100644 index 000000000..660830e28 --- /dev/null +++ b/ocr/swedish/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md @@ -0,0 +1,185 @@ +--- +category: general +date: 2026-03-28 +description: Utför OCR på bilden och få ren text med koordinater för avgränsningsrutor. + Lär dig hur du extraherar OCR, rensar OCR och visar resultaten steg för steg. +draft: false +keywords: +- perform OCR on image +- how to extract OCR +- how to clean OCR +- display bounding box coordinates +- OCR post‑processing +- OCR bounding boxes +language: sv +og_description: Utför OCR på en bild, rensa resultatet och visa koordinater för avgränsningsrutor + i en kortfattad handledning. +og_title: Utför OCR på bild – Rena resultat och avgränsningsrutor +tags: +- OCR +- Computer Vision +- Python +title: Utför OCR på bild – Rensa resultat och visa koordinater för avgränsningsruta +url: /sv/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Utför OCR på bild – Rensa resultat och visa koordinater för avgränsningsrutor + +Har du någonsin behövt **perform OCR on image** filer men fått rörig text och är osäker på var varje ord finns på bilden? Du är inte ensam. I många projekt—fakturadigitalisering, kvittoskanning eller enkel textutvinning—är det råa OCR‑utdata bara det första hindret. Den goda nyheten? 
Du kan rensa utdatan och omedelbart se varje regions avgränsningsrutekoordinater utan att skriva en massa boilerplate‑kod.

I den här guiden går vi igenom **how to extract OCR**, visar **how to clean OCR** med en post‑processor och avslutar med att **display bounding box coordinates** för varje rensad region. I slutet har du ett enda körbart skript som förvandlar ett suddigt foto till prydlig, strukturerad text redo för vidare bearbetning.

## Vad du behöver

- Python 3.8+ (syntaxen nedan fungerar på 3.8 och nyare)
- En OCR‑motor som stödjer `recognize(..., return_structured=True)` – till exempel det fiktiva `engine`‑bibliotek som används i kodsnutten. Byt ut det mot Tesseract, EasyOCR eller någon SDK som returnerar regionsdata.
- Grundläggande kunskap om Python‑funktioner och loopar
- En bildfil du vill skanna (PNG, JPG, etc.)

> **Pro tip:** Om du använder Tesseract ger `pytesseract.image_to_data`‑funktionen dig redan avgränsningsrutor. Du kan omsluta dess resultat i en liten adapter som efterliknar `engine.recognize`‑API:t som visas nedan.

---

![utför OCR på bild exempel](image-placeholder.png "utför OCR på bild exempel")

*Alt text: diagram som visar hur man utför OCR på bild och visualiserar avgränsningsrutekoordinater*

## Steg 1 – Utför OCR på bild och hämta strukturerade regioner

Det första är att be OCR‑motorn att returnera inte bara vanlig text utan en strukturerad lista med textregioner. Denna lista innehåller den råa strängen och rektangeln som omsluter den.

```python
import engine # replace with your actual OCR library
from pathlib import Path

# Load the image you want to process
image_path = Path("sample_invoice.jpg")
image = engine.load_image(image_path)

# Step 1: Perform OCR and request a structured list of text regions
raw_result = engine.recognize(image, return_structured=True)
```

**Varför detta är viktigt:**
När du bara ber om vanlig text förlorar du den rumsliga kontexten. 
Strukturerad data låter dig senare **display bounding box coordinates**, justera text mot tabeller eller mata exakta positioner till en efterföljande modell.

## Steg 2 – Hur man rensar OCR‑utdata med en post‑processor

OCR‑motorer är bra på att identifiera tecken, men de lämnar ofta kvar oönskade mellanslag, radbrytningsartefakter eller felaktigt igenkända symboler. En post‑processor normaliserar texten, rättar vanliga OCR‑fel och tar bort onödigt blanksteg.

```python
# Step 2: Clean the plain‑text of each region using the post‑processor
processed_result = engine.run_postprocessor(raw_result)
```

Om du bygger din egen rensare, överväg:

- Ta bort icke‑ASCII‑tecken (`re.sub(r'[^\x00-\x7F]+',' ', text)`)
- Kombinera flera mellanslag till ett enda mellanslag
- Använd en stavningskontroll som `pyspellchecker` för uppenbara stavfel

**Varför du bör bry dig:**
En prydlig sträng gör sökning, indexering och efterföljande NLP‑pipelines mycket mer pålitliga. Med andra ord är **how to clean OCR** ofta skillnaden mellan ett användbart dataset och ren huvudvärk.

## Steg 3 – Visa avgränsningsrutekoordinater för varje rensad region

Nu när texten är prydlig itererar vi över varje region, skriver ut dess rektangel och den rensade strängen. Detta är delen där vi slutligen **display bounding box coordinates**.

```python
# Step 3 – Iterate over the cleaned regions and display their bounding box and text
for text_region in processed_result.regions:
    # Each region has a .bounding_box attribute (x, y, width, height)
    bbox = text_region.bounding_box
    print(f"[{bbox}] {text_region.text}")
```

**Exempel på output**

```
[(34, 120, 210, 30)] Invoice #12345
[(34, 160, 420, 28)] Date: 2026‑03‑01
[(34, 200, 380, 28)] Total Amount: $1,254.00
```

Du kan nu mata in dessa koordinater i ett ritbibliotek (t.ex. OpenCV) för att överlagra rutor på originalbilden, eller lagra dem i en databas för senare frågor. 
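
Innan vi sätter ihop helheten: koordinaterna blir extra användbara om du sorterar regionerna i läsordning och serialiserar dem till JSON. Skissen nedan är fristående och använder påhittad exempeldata i form av tupler i stället för SDK:ns regionsobjekt:

```python
import json

# Hypotetiska regioner: (text, (x, y, w, h)) – påhittad exempeldata
regions = [
    ("Total Amount: $1,254.00", (34, 200, 380, 28)),
    ("Invoice #12345", (34, 120, 210, 30)),
    ("Date: 2026-03-01", (34, 160, 420, 28)),
]

def in_reading_order(regions):
    # Sortera uppifrån och ned (y), sedan vänster till höger (x)
    return sorted(regions, key=lambda r: (r[1][1], r[1][0]))

def regions_to_json(regions):
    # Serialisera regionerna till JSON i läsordning
    return json.dumps(
        [
            {"text": text, "bbox": {"x": x, "y": y, "w": w, "h": h}}
            for text, (x, y, w, h) in in_reading_order(regions)
        ],
        indent=2,
    )

print(regions_to_json(regions))
```

Samma idé fungerar med riktiga regionsobjekt – byt bara ut tuplerna mot `region.text` och `region.bounding_box`.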

## Fullt, körklart skript

Nedan är det kompletta programmet som knyter ihop alla tre stegen. Byt ut platshållaranropen till `engine` mot ditt faktiska OCR‑SDK.

```python
#!/usr/bin/env python3
"""
Perform OCR on image → clean results → display bounding box coordinates.
Author: Your Name
Date: 2026‑03‑28
"""

import engine # <-- replace with your OCR library
from pathlib import Path
import sys

def main(image_path: str):
    # Load image
    image = engine.load_image(Path(image_path))

    # 1️⃣ Perform OCR and ask for structured output
    raw_result = engine.recognize(image, return_structured=True)

    # 2️⃣ Clean the raw text using the built‑in post‑processor
    processed_result = engine.run_postprocessor(raw_result)

    # 3️⃣ Show each region's bounding box and cleaned text
    print("\n=== Cleaned OCR Regions ===")
    for region in processed_result.regions:
        bbox = region.bounding_box  # (x, y, w, h)
        print(f"[{bbox}] {region.text}")

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python perform_ocr.py <image_path>")
        sys.exit(1)
    main(sys.argv[1])
```

### Så kör du

```bash
python perform_ocr.py sample_invoice.jpg
```

Du bör se en lista med avgränsningsrutor ihopparade med rensad text, exakt som exempelutdata ovan.

## Vanliga frågor & kantfall

| Question | Answer |
|----------|--------|
| **What if the OCR engine doesn’t support `return_structured`?** | Write a thin wrapper that converts the engine’s raw output (usually a list of words with coordinates) into objects with `text` and `bounding_box` attributes. |
| **Can I get confidence scores?** | Many SDKs expose a confidence metric per region. Append it to the print statement: `print(f"[{bbox}] {region.text} (conf: {region.confidence:.2f})")`. |
| **How to handle rotated text?** | Pre‑process the image with OpenCV’s `cv2.minAreaRect` to deskew before calling `recognize`. 
|
| **What if I need the output in JSON?** | Serialize `processed_result.regions` with `json.dumps([r.__dict__ for r in processed_result.regions], indent=2)`. |
| **Is there a way to visualize the boxes?** | Use OpenCV: `cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0), 2)` inside the loop, then `cv2.imwrite("annotated.jpg", img)`. |

## Avslutning

Du har precis lärt dig att **perform OCR on image**, rensa den råa utdatan och **display bounding box coordinates** för varje region. Det trestegsflödet – recognize → post‑process → iterate – är ett återanvändbart mönster som du kan infoga i vilket Python‑projekt som helst som behöver pålitlig textutvinning.

### Vad blir nästa?

- **Utforska olika OCR‑bakändar** (Tesseract, EasyOCR, Google Vision) och jämför noggrannhet.
- **Integrera med en databas** för att lagra regionsdata för sökbara arkiv.
- **Lägg till språkdetektering** för att dirigera varje region genom lämplig stavningskontroll.
- **Överlagra rutor på originalbilden** för visuell verifiering (se OpenCV‑snutten ovan).

Om du stöter på konstigheter, kom ihåg att den största vinsten kommer från ett solidt post‑processing‑steg; en ren sträng är mycket enklare att arbeta med än en rå dump av tecken.

Lycka till med kodandet, och må dina OCR‑pipelines alltid vara prydliga! 
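
PS – FAQ‑raden om motorer utan `return_structured` förtjänar ett konkret exempel. Nedan en minimal, hypotetisk adapter (namnen `Region` och `adapt_word_list` är våra egna påhitt) som omvandlar en rå ordlista till objekt med `text`- och `bounding_box`‑attribut, precis den form som loopen i Steg 3 förväntar sig:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Region:
    text: str
    bounding_box: Tuple[int, int, int, int]  # (x, y, w, h)

def adapt_word_list(words: List[Tuple[str, int, int, int, int]]) -> List[Region]:
    # Konvertera rå ordlista [(ord, x, y, w, h), ...] till Region-objekt
    return [Region(text=w, bounding_box=(x, y, bw, bh)) for w, x, y, bw, bh in words]

# Påhittad rå utdata, t.ex. i samma anda som pytesseract.image_to_data
raw_words = [("Invoice", 34, 120, 90, 30), ("#12345", 130, 120, 80, 30)]

for region in adapt_word_list(raw_words):
    print(f"[{region.bounding_box}] {region.text}")
```

Med en sådan adapter kan resten av skriptet lämnas orört, oavsett vilken motor som ligger bakom.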
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/swedish/python/general/python-ocr-tutorial-extract-text-from-images/_index.md b/ocr/swedish/python/general/python-ocr-tutorial-extract-text-from-images/_index.md new file mode 100644 index 000000000..549fee601 --- /dev/null +++ b/ocr/swedish/python/general/python-ocr-tutorial-extract-text-from-images/_index.md @@ -0,0 +1,232 @@ +--- +category: general +date: 2026-03-28 +description: Python OCR-handledning som visar hur man extraherar text från en bild + med Aspose OCR Cloud. Lär dig att ladda en bild för OCR och konvertera bilden till + vanlig text på några minuter. +draft: false +keywords: +- python ocr tutorial +- extract text image python +- ocr image to text +- load image for ocr +- convert image plain text +language: sv +og_description: Python OCR-handledning förklarar hur du laddar en bild för OCR och + konverterar bildens rena text med Aspose OCR Cloud. Få hela koden och tipsen. +og_title: Python OCR-handledning – Extrahera text från bilder +tags: +- OCR +- Python +- Image Processing +title: Python OCR-handledning – Extrahera text från bilder +url: /sv/python/general/python-ocr-tutorial-extract-text-from-images/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Python OCR-handledning – Extrahera text från bilder + +Har du någonsin undrat hur du kan förvandla ett rörigt kvittobild till ren, sökbar text? Du är inte ensam. Enligt min erfarenhet är det största hindret inte OCR-motorn i sig utan att få bilden i rätt format och extrahera ren text utan problem. 

Denna **python ocr tutorial** guidar dig genom varje steg – att ladda en bild för OCR, köra igenkänning och till sist konvertera bildens rena text till en Python-sträng som du kan lagra eller analysera. I slutet kommer du att kunna **extract text image python** med stil, och du behöver ingen betald licens för att komma igång.

## Vad du kommer att lära dig

- Hur du installerar och importerar Aspose OCR Cloud SDK för Python.
- Den exakta koden för att **load image for OCR** (PNG, JPEG, TIFF, PDF, etc.).
- Hur du anropar motorn för att utföra **ocr image to text**-konvertering.
- Tips för att hantera vanliga edge‑cases som flersidiga PDF:er eller lågupplösta skanningar.
- Sätt att verifiera resultatet och vad du ska göra om texten ser förvrängd ut.

### Förutsättningar

- Python 3.8+ installerat på din maskin.
- Ett gratis Aspose Cloud‑konto (provversionen fungerar utan licens).
- Grundläggande kunskap om pip och virtuella miljöer – inget avancerat.

> **Pro tip:** Om du redan använder ett virtualenv, aktivera det nu. Det håller dina beroenden organiserade och undviker versionskonflikter.

![Python OCR-handledning skärmdump som visar igenkänd text](path/to/ocr_example.png "Python OCR-handledning – visning av extraherad ren text")

## Steg 1 – Installera Aspose OCR Cloud SDK

Först och främst behöver vi biblioteket som kommunicerar med Asposes OCR-tjänst. Öppna en terminal och kör:

```bash
pip install asposeocrcloud
```

Det enkla kommandot hämtar den senaste SDK:n (för närvarande version 23.12). Paketet innehåller allt du behöver – inga extra bildbehandlingsbibliotek krävs.

## Steg 2 – Initiera OCR-motorn (Primärt nyckelord i aktion)

Nu när SDK:n är klar kan vi starta **python ocr tutorial**-motorn. Konstruktorn kräver ingen licensnyckel för provversionen, vilket gör det enkelt. 
+ +```python +import asposeocrcloud as ocr + +# Initialise the OCR engine – no licence needed for trial use +ocr_engine = ocr.OcrEngine() +``` + +> **Varför detta är viktigt:** Att initiera motorn endast en gång håller efterföljande anrop snabba. Om du återskapar objektet för varje bild slösar du nätverksrundresor. + +## Steg 3 – Ladda bild för OCR + +Här är där **load image for OCR**-nyckelordet glänser. SDK:ns `Image.load`-metod accepterar en filsökväg eller en URL, och den upptäcker automatiskt formatet (PNG, JPEG, TIFF, PDF, etc.). Låt oss ladda ett exempelkvitto: + +```python +# Step 3: Load the input image (PNG, JPEG, TIFF, PDF …) +input_image = ocr.Image.load(r"YOUR_DIRECTORY/receipt.png") +``` + +Om du hanterar en flersidig PDF, peka helt enkelt på PDF-filen; SDK:n kommer att behandla varje sida som en separat bild internt. + +## Steg 4 – Utför OCR Bild till Text-konvertering + +Med bilden i minnet sker den faktiska OCR:n i en enda rad. `recognize`-metoden returnerar ett `OcrResult`-objekt som innehåller den rena texten, förtroendesiffror och även avgränsningsrutor om du behöver dem senare. + +```python +# Step 4: Perform OCR on the loaded image +ocr_result = ocr_engine.recognize(input_image) +``` + +> **Edge case:** För lågupplösta bilder (under 300 dpi) kan du vilja först skala upp bilden. SDK:n erbjuder en `Resize`-hjälp, men för de flesta kvitton fungerar standardinställningen bra. + +## Steg 5 – Konvertera bildens rena text till en användbar sträng + +Den sista pusselbiten är att extrahera den rena texten från result-objektet. Detta är steget **convert image plain text** som omvandlar OCR-blocket till något du kan skriva ut, lagra eller föra in i ett annat system. + +```python +# Step 5: Output the recognised plain text +print("Plain OCR:") +print(ocr_result.text) +``` + +När du kör skriptet bör du se något liknande: + +``` +Plain OCR: +Starbucks Coffee +Date: 2026/03/27 +Total: $4.75 +Thank you! 
+``` + +Det resultatet är nu en vanlig Python-sträng, redo för CSV-export, databasinsättning eller naturlig språkbehandling. + +## Hantera vanliga fallgropar + +### 1. Tomma eller brusiga bilder + +Om `ocr_result.text` blir tomt, dubbelkolla bildkvaliteten. En snabb lösning är att lägga till ett förbehandlingssteg: + +```python +# Simple preprocessing – convert to grayscale and increase contrast +processed = input_image.to_grayscale().adjust_contrast(1.2) +ocr_result = ocr_engine.recognize(processed) +``` + +### 2. Flersidiga PDF:er + +När du matar in en PDF, returnerar `recognize` resultat för varje sida. Loopa igenom dem så här: + +```python +pdf_image = ocr.Image.load("document.pdf") +pages = pdf_image.pages # collection of page images + +for i, page in enumerate(pages, start=1): + result = ocr_engine.recognize(page) + print(f"--- Page {i} ---") + print(result.text) +``` + +### 3. Språkstöd + +Aspose OCR stödjer över 60 språk. För att byta språk, sätt `language`-egenskapen innan du anropar `recognize`: + +```python +ocr_engine.language = "fr" # French +ocr_result = ocr_engine.recognize(input_image) +``` + +## Fullt fungerande exempel + +Sätter ihop allt, här är ett komplett, kopiera‑och‑klistra‑klart skript som täcker allt från installation till hantering av edge‑cases: + +```python +# -*- coding: utf-8 -*- +""" +Python OCR tutorial – complete example using Aspose OCR Cloud. +Demonstrates loading an image, performing OCR, and handling multi‑page PDFs. +""" + +import asposeocrcloud as ocr +import os + +def ocr_file(filepath: str, language: str = "en"): + """ + Perform OCR on a given file (image or PDF) and return plain text. 
+ """ + # Initialise engine (trial licence) + engine = ocr.OcrEngine() + engine.language = language + + # Load the file – SDK auto‑detects format + image = ocr.Image.load(filepath) + + # If it's a PDF, iterate over pages + if image.is_pdf: + all_text = [] + for page in image.pages: + result = engine.recognize(page) + all_text.append(result.text) + return "\n".join(all_text) + + # Single‑image case + result = engine.recognize(image) + return result.text + + +if __name__ == "__main__": + # Example usage – replace with your own path + sample_path = os.path.join("YOUR_DIRECTORY", "receipt.png") + + if not os.path.exists(sample_path): + raise FileNotFoundError(f"File not found: {sample_path}") + + extracted = ocr_file(sample_path) + print("=== Extracted Text ===") + print(extracted) +``` + +Kör skriptet (`python ocr_demo.py`) så ser du **ocr image to text**-utdata direkt i din konsol. + +## Sammanfattning – Vad vi gick igenom + +- Installerade **Aspose OCR Cloud** SDK (`pip install asposeocrcloud`). +- **Initialised the OCR engine** utan licens (perfekt för provversion). +- Visade hur man **load image for OCR**, oavsett om det är en PNG, JPEG eller PDF. +- Utförde **ocr image to text**-konvertering och **convert image plain text** till en användbar Python-sträng. +- Hanterade vanliga fallgropar som lågupplösta skanningar, flersidiga PDF:er och språkval. + +## Nästa steg & relaterade ämnen + +Nu när du har bemästrat **python ocr tutorial**, överväg att utforska: + +- **Extract text image python** för batch‑bearbetning av stora mappar med kvitton. +- Integrera OCR‑utdata med **pandas** för dataanalys (`df = pd.read_csv(StringIO(extracted))`). +- Använda **Tesseract OCR** som reserv när internetanslutningen är begränsad. +- Lägga till efterbehandling med **spaCy** för att identifiera entiteter som datum, belopp och handlarens namn. + +Känn dig fri att experimentera: prova olika bildformat, justera kontrasten eller byta språk. 
OCR‑landskapet är brett, och de färdigheter du just har lärt dig är en solid grund för alla dokument‑automatiseringsprojekt. + +Lycka till med kodandet, och må din text alltid vara läsbar! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/swedish/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md b/ocr/swedish/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md new file mode 100644 index 000000000..90205d94c --- /dev/null +++ b/ocr/swedish/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md @@ -0,0 +1,201 @@ +--- +category: general +date: 2026-03-28 +description: Lär dig hur du kör OCR på en bild, laddar ner Hugging Face-modellen automatiskt, + rensar OCR‑text och konfigurerar LLM‑modellen i Python med Aspose OCR Cloud. +draft: false +keywords: +- run OCR on image +- download hugging face model +- clean OCR text +- configure LLM model +language: sv +og_description: Kör OCR på bild och rensa utdata med en automatiskt nedladdad Hugging Face-modell. + Den här guiden visar hur du konfigurerar LLM-modellen i Python. +og_title: Kör OCR på bild – Komplett Aspose OCR Cloud‑handledning +tags: +- OCR +- Python +- LLM +- HuggingFace +title: Kör OCR på bild med Aspose OCR Cloud – Fullständig steg‑för‑steg‑guide +url: /sv/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Kör OCR på bild – Komplett Aspose OCR Cloud-handledning + +Har du någonsin behövt köra OCR på bildfiler men den råa utskriften såg ut som en rörig röra? Enligt min erfarenhet är den största smärtan inte igenkänningen i sig – det är rengöringen. 
Lyckligtvis låter Aspose OCR Cloud dig bifoga en LLM‑postprocessor som automatiskt kan *rensa OCR‑text*. I den här handledningen går vi igenom allt du behöver: från **nedladdning av en Hugging Face-modell** till konfiguration av LLM, körning av OCR‑motorn och slutligen polering av resultatet.

Vid slutet av den här guiden har du ett färdigt skript som:

1. Hämtar en kompakt Qwen 2.5-modell från Hugging Face (automatiskt nedladdad åt dig).
2. Konfigurerar modellen så att en del av nätverket körs på GPU och resten på CPU.
3. Kör OCR‑motorn på en bild av en handskriven notering.
4. Använder LLM:en för att rensa den igenkända texten, vilket ger dig ett människoläsbart resultat.

> **Förutsättningar** – Python 3.8+, `asposeocrcloud`-paketet, en GPU med minst 4 GB VRAM (valfritt men rekommenderat), och en internetanslutning för den första modellnedladdningen.

## Vad du behöver

- **Aspose OCR Cloud SDK** – installera via `pip install asposeocrcloud`.
- **En exempelbild** – t.ex. `handwritten_note.jpg` placerad i en lokal mapp.
- **GPU‑stöd** – om du har en CUDA‑aktiverad GPU kommer skriptet att avlasta 30 lager; annars faller det tillbaka till CPU automatiskt.
- **Skrivbehörighet** – skriptet cachar modellen i `YOUR_DIRECTORY`; se till att mappen finns.

## Steg 1 – Konfigurera LLM-modellen (ladda ner Hugging Face-modell)

Det första vi gör är att tala om för Aspose AI var modellen ska hämtas från. Klassen `AsposeAIModelConfig` hanterar automatisk nedladdning, kvantisering och GPU‑lagerallokering. 
+ +```python +import asposeocrcloud as ocr +from asposeocrcloud.ai import AsposeAI, AsposeAIModelConfig + +# ---------------------------------------------------------------------- +# Step 1: Model configuration – this will download the model if it’s missing +# ---------------------------------------------------------------------- +model_config = AsposeAIModelConfig( + allow_auto_download="true", # Enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", # Repo on Hugging Face + hugging_face_quantization="int8", # Small footprint, fast inference + gpu_layers=30, # 30 layers on GPU, rest on CPU + directory_model_path=r"YOUR_DIRECTORY" # Cache folder (optional) +) +``` + +**Varför detta är viktigt** – Kvantisering till `int8` minskar minnesanvändningen dramatiskt (≈ 4 GB vs 12 GB). Att dela modellen mellan GPU och CPU låter dig köra en 3‑miljard‑parameter LLM även på ett modest RTX 3060. Om du inte har ett GPU, sätt `gpu_layers=0` så håller SDK allt på CPU. + +> **Tips:** Första körningen kommer att ladda ner ~ 1,5 GB, så ge den några minuter och en stabil anslutning. + +## Steg 2 – Initiera AI‑motorn med modellkonfigurationen + +Nu startar vi Aspose AI‑motorn och matar in den konfiguration vi just skapade. + +```python +# ---------------------------------------------------------------------- +# Step 2: Initialise the AI engine – pulls the model if needed +# ---------------------------------------------------------------------- +ocr_ai = AsposeAI() +ocr_ai.initialize(model_config) # This call blocks until the model is ready +``` + +**Vad händer under huven?** SDK:n kontrollerar `directory_model_path` för en befintlig modell. Om den hittar en matchande version laddas den omedelbart; annars laddas GGUF‑filen ner från Hugging Face, packas upp och förbereder inferens‑pipeline. + +## Steg 3 – Skapa OCR‑motorn och bifoga AI‑postprocessorn + +OCR‑motorn utför det tunga arbetet med att känna igen tecken. 
Genom att bifoga `ocr_ai.run_postprocessor` aktiverar vi **ren OCR‑text** automatiskt efter igenkänning. + +```python +# ---------------------------------------------------------------------- +# Step 3: Build the OCR engine and bind the LLM post‑processor +# ---------------------------------------------------------------------- +ocr_engine = ocr.OcrEngine() +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=None) +``` + +**Varför använda en post‑processor?** Rå OCR innehåller ofta radbrytningar på fel ställen, felaktig interpunktion eller lösa symboler. LLM:n kan skriva om utskriften till korrekta meningar, rätta stavning och till och med gissa saknade ord – i princip förvandla en rå dump till polerad prosa. + +## Steg 4 – Kör OCR på en bildfil + +När allt är kopplat ihop är det dags att mata in en bild till motorn. + +```python +# ---------------------------------------------------------------------- +# Step 4: Load the image and run OCR +# ---------------------------------------------------------------------- +input_image = ocr.Image.load(r"YOUR_DIRECTORY/handwritten_note.jpg") +raw_result = ocr_engine.recognize(input_image) # Returns an OcrResult object +``` + +**Edge case:** Om bilden är stor (> 5 MP) kan du vilja ändra storlek först för att snabba upp bearbetningen. SDK:n accepterar ett Pillow `Image`‑objekt, så du kan förbehandla med `PIL.Image.thumbnail()` om så behövs. + +## Steg 5 – Låt AI:n rensa den igenkända texten och visa båda versionerna + +Slutligen anropar vi post‑processorn som vi bifogade tidigare. Detta steg visar kontrasten mellan *före* och *efter* rengöring. 
+ +```python +# ---------------------------------------------------------------------- +# Step 5: Clean the OCR output using the LLM and display both results +# ---------------------------------------------------------------------- +cleaned_result = ocr_engine.run_postprocessor(raw_result) + +print("=== Before AI ===") +print(raw_result.text) + +print("\n=== After AI ===") +print(cleaned_result.text) +``` + +### Förväntad utskrift + +``` +=== Before AI === +Th1s 1s a h@ndwr1tt3n n0te. It c0nta1ns m1st@k3s, l1n3 br3aks, & sp3c!@l ch@r@ct3rs. + +=== After AI === +This is a handwritten note. It contains mistakes, line breaks, and special characters. +``` + +Observera hur LLM:n har: + +- Fixat vanliga OCR‑fel (`Th1s` → `This`). +- Tagit bort lösa symboler (`&` → `and`). +- Normaliserat radbrytningar till korrekta meningar. + +## 🎨 Visuell översikt (Kör OCR på bild‑arbetsflöde) + +![Run OCR on image workflow](run_ocr_on_image_workflow.png "Diagram showing the run OCR on image pipeline from model download to cleaned output") + +Diagrammet ovan sammanfattar hela pipeline:n: **ladda ner Hugging Face-modell → konfigurera LLM → initiera AI → OCR‑motor → AI‑postprocessor → ren OCR‑text**. + +## Vanliga frågor & pro‑tips + +### Vad händer om jag inte har ett GPU? + +Sätt `gpu_layers=0` i `AsposeAIModelConfig`. Modellen kommer att köras helt på CPU, vilket är långsammare men fortfarande funktionellt. Du kan också byta till en mindre modell (t.ex. `Qwen/Qwen2.5-1.5B‑Instruct‑GGUF`) för att hålla inferenstiden rimlig. + +### Hur ändrar jag modellen senare? + +Uppdatera bara `hugging_face_repo_id` och kör `ocr_ai.initialize(model_config)` igen. SDK:n kommer att upptäcka versionsändringen, ladda ner den nya modellen och ersätta de cachade filerna. + +### Kan jag anpassa post‑processor‑prompten? + +Ja. Passa ett dictionary till `custom_settings` med en `prompt_template`‑nyckel. 
Till exempel: + +```python +custom_prompt = { + "prompt_template": "Correct the following OCR text and keep line breaks:\n{ocr_text}" +} +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=custom_prompt) +``` + +### Bör jag spara den rensade texten till en fil? + +Definitivt. Efter rengöring kan du skriva resultatet till en `.txt`‑ eller `.json`‑fil för vidare bearbetning: + +```python +with open("cleaned_note.txt", "w", encoding="utf-8") as f: + f.write(cleaned_result.text) +``` + +## Slutsats + +Vi har just visat dig hur du **kör OCR på bild**‑filer med Aspose OCR Cloud, automatiskt **laddar ner en Hugging Face-modell**, skickligt **konfigurerar LLM-modell**‑inställningar, och slutligen **rengör OCR‑text** med en kraftfull LLM‑postprocessor. Hela processen får plats i ett enda, lätt‑att‑köra Python‑skript och fungerar både på GPU‑aktiverade och CPU‑endast maskiner. + +Om du är bekväm med detta pipeline, överväg att experimentera med: + +- **Olika LLM:er** – prova `meta-llama/Meta-Llama-3-8B‑Instruct‑GGUF` för ett större kontextfönster. +- **Batch‑bearbetning** – loopa över en mapp med bilder och samla de rensade resultaten i en CSV. +- **Anpassade prompts** – skräddarsy AI:n för ditt område (juridiska dokument, medicinska anteckningar, etc.). + +Känn dig fri att justera `gpu_layers`‑värdet, byta modell eller ansluta din egen prompt. Himlen är gränsen, och koden du har nu är startplattan. + +Lycka till med kodandet, och må dina OCR‑resultat alltid vara rena! 
🚀 + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/thai/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md b/ocr/thai/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md new file mode 100644 index 000000000..b09e847f6 --- /dev/null +++ b/ocr/thai/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md @@ -0,0 +1,222 @@ +--- +category: general +date: 2026-03-28 +description: วิธีใช้ OCR เพื่อจดจำข้อความลายมือในภาพ เรียนรู้การสกัดข้อความลายมือ, + แปลงภาพลายมือ, และได้ผลลัพธ์ที่สะอาดและรวดเร็ว +draft: false +keywords: +- how to use OCR +- recognize handwritten text +- extract handwritten text +- handwritten note to text +- convert handwritten image +language: th +og_description: วิธีใช้ OCR เพื่อจดจำข้อความที่เขียนด้วยมือ การสอนนี้จะแสดงขั้นตอนโดยละเอียดว่าคุณจะดึงข้อความที่เขียนด้วยมือจากภาพและได้ผลลัพธ์ที่เรียบหรูอย่างไร +og_title: วิธีใช้ OCR เพื่อจดจำข้อความลายมือ – คู่มือฉบับสมบูรณ์ +tags: +- OCR +- Handwriting Recognition +- Python +title: วิธีใช้ OCR เพื่อจดจำข้อความลายมือ – คู่มือฉบับสมบูรณ์ +url: /th/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# วิธีใช้ OCR เพื่อจดจำข้อความที่เขียนด้วยมือ – คู่มือฉบับสมบูรณ์ + +การใช้ OCR สำหรับโน้ตที่เขียนด้วยมือเป็นคำถามที่นักพัฒนาหลายคนถามเมื่อต้องการแปลงสเก็ตช์, รายงานการประชุม, หรือไอเดียที่จดไว้เร็ว ๆ ให้เป็นดิจิทัล ในคู่มือนี้เราจะพาคุณผ่านขั้นตอนที่แม่นยำเพื่อจดจำข้อความที่เขียนด้วยมือ, ดึงข้อความที่เขียนด้วยมือ, และแปลงภาพที่เขียนด้วยมือให้เป็นสตริงที่สะอาดและสามารถค้นหาได้ + +ถ้าคุณเคยมองภาพรายการของชำและสงสัยว่า 
“ฉันสามารถแปลงภาพที่เขียนด้วยมือนี้เป็นข้อความได้โดยไม่ต้องพิมพ์ทุกอย่างใหม่หรือไม่?” – คุณมาถูกที่แล้ว. เมื่อจบคุณจะมีสคริปต์พร้อมรันที่เปลี่ยน **handwritten note to text** ในไม่กี่วินาที + +## สิ่งที่คุณต้องเตรียม + +- Python 3.8+ (โค้ดทำงานกับเวอร์ชันล่าสุดใดก็ได้) +- ไลบรารี `ocr` – ติดตั้งด้วย `pip install ocr-sdk` (แทนที่ด้วยชื่อแพคเกจของผู้ให้บริการของคุณ) +- ภาพที่ชัดเจนของโน้ตที่เขียนด้วยมือ (`hand_note.png` ในตัวอย่าง) +- ความอยากรู้อยากเห็นเล็กน้อยและกาแฟ ☕️ (ไม่บังคับแต่แนะนำ) + +ไม่มีเฟรมเวิร์กหนัก ๆ, ไม่มีคีย์คลาวด์ที่ต้องชำระเงิน – เพียงเครื่องยนต์ในเครื่องที่รองรับ **handwritten recognition** ตั้งแต่แรก + +## ขั้นตอนที่ 1 – ติดตั้งแพคเกจ OCR และนำเข้า + +ก่อนอื่นเลย, มาติดตั้งแพคเกจที่ถูกต้องบนเครื่องของคุณกัน. เปิดเทอร์มินัลและรัน: + +```bash +pip install ocr-sdk +``` + +เมื่อการติดตั้งเสร็จสิ้น, ให้นำเข้าโมดูลในสคริปต์ของคุณ: + +```python +# Step 1: Import the OCR SDK +import ocr +``` + +> **เคล็ดลับ:** หากคุณใช้ virtual environment, ให้เปิดใช้งานก่อนติดตั้ง. สิ่งนี้จะทำให้โปรเจกต์ของคุณเป็นระเบียบและหลีกเลี่ยงการชนกันของเวอร์ชัน + +## ขั้นตอนที่ 2 – สร้าง OCR Engine และเปิดใช้โหมด Handwritten + +ตอนนี้เราจริง ๆ แล้ว **how to use OCR** – เราต้องการอินสแตนซ์ของ engine ที่รู้ว่าเรากำลังจัดการกับลายเส้นตัวเขียนแบบโค้งแทนฟอนต์พิมพ์. โค้ดต่อไปนี้สร้าง engine และสลับเป็นโหมด handwritten: + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = ocr.OcrEngine() +ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN +``` + +ทำไมต้องตั้งค่า `recognition_mode`? เพราะ OCR engine ส่วนใหญ่ตั้งค่าเริ่มต้นเป็นการตรวจจับข้อความพิมพ์, ซึ่งมักจะมองข้ามลูปและการเอียงของโน้ตส่วนบุคคล. การเปิดใช้โหมด handwritten จะเพิ่มความแม่นยำอย่างมาก + +## ขั้นตอนที่ 3 – โหลดภาพที่คุณต้องการแปลง (Convert Handwritten Image) + +ภาพคือวัสดุดิบสำหรับงาน OCR ใด ๆ. ตรวจสอบให้แน่ใจว่าภาพของคุณบันทึกในรูปแบบ lossless (PNG ทำงานได้ดี) และข้อความอ่านได้พอประมาณ. 
จากนั้นโหลดภาพดังนี้: + +```python +# Step 3: Load the handwritten image you want to convert +handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") +``` + +หากภาพอยู่ในโฟลเดอร์เดียวกับสคริปต์ของคุณ, คุณสามารถใช้ `"hand_note.png"` แทนการระบุพาธเต็มได้. + +> **ถ้าภาพเบลอล่ะ?** ลองทำการพรี‑โปรเซสด้วย OpenCV (เช่น `cv2.cvtColor` เพื่อแปลงเป็นระดับสีเทา, `cv2.threshold` เพื่อเพิ่มคอนทราสต์) ก่อนส่งให้ OCR engine + +## ขั้นตอนที่ 4 – รัน Recognition Engine เพื่อดึงข้อความที่เขียนด้วยมือ + +เมื่อ engine พร้อมและภาพอยู่ในหน่วยความจำ, เราสามารถ **extract handwritten text** ได้ในที่สุด. เมธอด `recognize` จะคืนค่าอ็อบเจ็กต์ผลลัพธ์ดิบที่มีข้อความพร้อมคะแนนความเชื่อมั่น + +```python +# Step 4: Perform OCR and get the raw result +raw_result = ocr_engine.recognize(handwritten_image) +print("Raw OCR output:") +print(raw_result.text) +``` + +ผลลัพธ์ดิบทั่วไปอาจมีการขึ้นบรรทัดใหม่ที่ไม่ต้องการหรืออักขระที่ระบุผิด, โดยเฉพาะถ้าการเขียนมือเป็นระเบียบไม่ดี. นั่นคือเหตุผลที่ขั้นตอนต่อไปมีอยู่ + +## ขั้นตอนที่ 5 – (ทางเลือก) ปรับแต่งผลลัพธ์ด้วย AI Post‑Processor + +OCR SDK สมัยใหม่ส่วนใหญ่มาพร้อมกับ AI post‑processor ที่เบา ๆ ซึ่งทำความสะอาดการเว้นวรรค, แก้ไขข้อผิดพลาด OCR ที่พบบ่อย, และทำให้การจบบรรทัดเป็นมาตรฐาน. การรันมันง่ายเพียง: + +```python +# Step 5: Refine the raw OCR output (handwritten note to text) +polished_result = ocr_engine.run_postprocessor(raw_result) + +# Display the cleaned, readable text +print("\nPolished OCR output:") +print(polished_result.text) +``` + +หากคุณข้ามขั้นตอนนี้คุณยังจะได้ข้อความที่ใช้งานได้, แต่การแปลง **handwritten note to text** จะดูหยาบกว่าเล็กน้อย. Post‑processor มีประโยชน์เป็นพิเศษสำหรับโน้ตที่มี bullet points หรือคำที่มีตัวพิมพ์ใหญ่-เล็กผสมกัน + +## ขั้นตอนที่ 6 – ตรวจสอบผลลัพธ์และจัดการ Edge Cases + +หลังจากพิมพ์ผลลัพธ์ที่ปรับแต่งแล้ว, ตรวจสอบสองครั้งว่าทุกอย่างดูถูกต้อง. 
นี่คือการตรวจสอบความสมเหตุสมผลอย่างรวดเร็วที่คุณสามารถเพิ่มได้:

```python
# Step 6: Simple verification
if not polished_result.text.strip():
    raise ValueError("OCR returned an empty string – check image quality.")
else:
    print("\n✅ OCR succeeded! You can now save or further process the text.")
```

**รายการตรวจสอบ Edge‑case**

| สถานการณ์ | สิ่งที่ต้องทำ |
|-----------|------------|
| **Very low contrast** | เพิ่มคอนทราสต์ด้วย `cv2.convertScaleAbs` ก่อนโหลด. |
| **Multiple languages** | ตั้งค่า `ocr_engine.language = ["en", "es"]` (หรือภาษาที่คุณต้องการ). |
| **Large documents** | ประมวลผลหน้าเป็นชุดเพื่อหลีกเลี่ยงการใช้หน่วยความจำสูง. |
| **Special symbols** | เพิ่มพจนานุกรมกำหนดเองผ่าน `ocr_engine.add_custom_words([...])`. |

## ภาพรวมขั้นตอนการทำงาน

ด้านล่างเป็นภาพตัวอย่างที่แสดงขั้นตอนการทำงาน — ตั้งแต่โน้ตที่ถ่ายภาพจนถึงข้อความที่สะอาด. ข้อความ alt มีคีย์เวิร์ดหลัก ทำให้ภาพเป็นมิตรต่อ SEO.

![วิธีใช้ OCR กับภาพโน้ตที่เขียนด้วยมือ](/images/handwritten_ocr_flow.png "วิธีใช้ OCR กับภาพโน้ตที่เขียนด้วยมือ")

## สคริปต์เต็มที่สามารถรันได้

รวมทุกส่วนเข้าด้วยกัน, นี่คือโปรแกรมที่พร้อมคัดลอก‑วางเต็มรูปแบบ:

```python
# Complete script: Convert a handwritten image to clean text using OCR

import ocr

def main():
    # 1️⃣ Initialize OCR engine for handwritten recognition
    ocr_engine = ocr.OcrEngine()
    ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN

    # 2️⃣ Load the image containing the handwritten note
    handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png")

    # 3️⃣ Perform OCR to extract raw text
    raw_result = ocr_engine.recognize(handwritten_image)
    print("Raw OCR output:")
    print(raw_result.text)

    # 4️⃣ (Optional) Run AI post‑processor for cleaner output
    polished_result = ocr_engine.run_postprocessor(raw_result)

    # 5️⃣ Show the polished, readable text
    print("\nPolished OCR output:")
    print(polished_result.text)

    # 6️⃣ Simple sanity check
    if not polished_result.text.strip():
        raise ValueError("OCR returned an empty string – check image quality.")
    else:
        print("\n✅ OCR succeeded! You can now save or further process the text.")

if __name__ == "__main__":
    main()
```

**ผลลัพธ์ที่คาดหวัง (ตัวอย่าง)**

```
Raw OCR output:
T0d@y I w3nt to the market
and bought 5 aplpes, 2 bananas,
and a loaf of bread.

Polished OCR output:
Today I went to the market and bought 5 apples, 2 bananas, and a loaf of bread.
```

สังเกตว่า post‑processor แก้ไขการพิมพ์ผิด “T0d@y” และทำให้การเว้นวรรคเป็นมาตรฐาน

## ข้อผิดพลาดทั่วไป & เคล็ดลับมืออาชีพ

- **ขนาดภาพสำคัญ** – OCR engine มักจำกัดขนาดอินพุตที่ 4K × 4K. ปรับขนาดรูปใหญ่ล่วงหน้า.
- **สไตล์การเขียนมือ** – การเขียนแบบ Cursive กับตัวอักษรบล็อกอาจส่งผลต่อความแม่นยำ. หากคุณควบคุมแหล่งที่มา (เช่น ปากกาดิจิทัล), แนะนำให้ใช้ตัวอักษรบล็อกเพื่อผลลัพธ์ที่ดีที่สุด.
- **การประมวลผลแบบแบตช์** – เมื่อจัดการกับหลายสิบโน้ต, ห่อสคริปต์ในลูปและเก็บผลลัพธ์แต่ละรายการใน CSV หรือ SQLite DB.
- **Memory leaks** – SDK บางตัวเก็บบัฟเฟอร์ภายใน; เรียก `ocr_engine.dispose()` หลังใช้งานเสร็จหากสังเกตว่าช้า.

## ขั้นตอนต่อไป – ไปไกลกว่า OCR แบบง่าย

ตอนนี้คุณได้เชี่ยวชาญ **how to use OCR** สำหรับภาพเดียวแล้ว, พิจารณาการขยายต่อไปนี้:

1. **เชื่อมต่อกับคลาวด์สตอเรจ** – ดึงภาพจาก AWS S3 หรือ Azure Blob, รัน pipeline เดียวกัน, แล้วผลักผลลัพธ์กลับ.
2. **เพิ่มการตรวจจับภาษา** – ใช้ `ocr_engine.detect_language()` เพื่อสลับพจนานุกรมโดยอัตโนมัติ.
3. **รวมกับ NLP** – ส่งข้อความที่ทำความสะอาดแล้วเข้า spaCy หรือ NLTK เพื่อดึง entities, วันที่, หรือรายการสิ่งที่ต้องทำ.
4. **สร้าง REST endpoint** – ห่อสคริปต์ใน Flask หรือ FastAPI เพื่อให้บริการอื่น ๆ สามารถ POST ภาพและรับข้อความที่เข้ารหัสเป็น JSON.

แนวคิดทั้งหมดนี้ยังคงหมุนรอบแนวคิดหลักของ **recognize handwritten text**, **extract handwritten text**, และ **convert handwritten image** — คำวลีที่คุณอาจค้นหาในขั้นต่อไป.

---

### สรุปย่อ

เราได้แสดงให้คุณ **how to use OCR** เพื่อจดจำข้อความที่เขียนด้วยมือ, ดึงข้อความนั้น, และปรับผลลัพธ์ให้เป็นสตริงที่ใช้งานได้. 
สคริปต์เต็มพร้อมรัน, ขั้นตอนทำงานอธิบายเป็นขั้นตอน, และคุณมีรายการตรวจสอบสำหรับ edge case ทั่วไปแล้ว. ถ่ายภาพโน้ตการประชุมครั้งต่อไปของคุณ, ใส่ลงในสคริปต์, แล้วให้เครื่องทำการพิมพ์ให้คุณ + +ขอให้เขียนโค้ดอย่างสนุก, และขอให้โน้ตของคุณอ่านได้เสมอ! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/thai/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md b/ocr/thai/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md new file mode 100644 index 000000000..16981ddd2 --- /dev/null +++ b/ocr/thai/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md @@ -0,0 +1,183 @@ +--- +category: general +date: 2026-03-28 +description: ทำ OCR บนภาพและรับข้อความที่สะอาดพร้อมพิกัดของกล่องขอบเขต เรียนรู้วิธีดึง + OCR ทำความสะอาด OCR และแสดงผลลัพธ์ขั้นตอนต่อขั้นตอน. +draft: false +keywords: +- perform OCR on image +- how to extract OCR +- how to clean OCR +- display bounding box coordinates +- OCR post‑processing +- OCR bounding boxes +language: th +og_description: ทำ OCR บนภาพ ทำความสะอาดผลลัพธ์ และแสดงพิกัดกล่องขอบเขตในบทแนะนำสั้น + ๆ +og_title: ทำ OCR บนภาพ – ผลลัพธ์ที่สะอาดและกล่องขอบเขต +tags: +- OCR +- Computer Vision +- Python +title: ทำ OCR บนภาพ – ทำความสะอาดผลลัพธ์และแสดงพิกัดกล่องขอบ +url: /th/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# ทำ OCR บนรูปภาพ – ทำความสะอาดผลลัพธ์และแสดงพิกัด Bounding Box + +เคยต้อง **perform OCR on image** แต่ผลลัพธ์เป็นข้อความที่ยุ่งยากและไม่แน่ใจว่าคำแต่ละคำอยู่ที่ไหนบนรูปหรือไม่? 
คุณไม่ได้เป็นคนเดียว ในหลายโครงการ—การแปลงใบแจ้งหนี้เป็นดิจิทัล, การสแกนใบเสร็จ, หรือการดึงข้อความอย่างง่าย—การได้ผลลัพธ์ OCR ดิบเป็นเพียงอุปสรรคแรก ข่าวดีคือ คุณสามารถทำความสะอาดผลลัพธ์นั้นและดูพิกัด Bounding Box ของแต่ละพื้นที่ได้ทันทีโดยไม่ต้องเขียนโค้ดซ้ำซ้อนมาก + +ในคู่มือนี้เราจะอธิบายขั้นตอน **how to extract OCR**, รัน **how to clean OCR** post‑processor, และสุดท้าย **display bounding box coordinates** สำหรับแต่ละพื้นที่ที่ทำความสะอาดแล้ว. เมื่อเสร็จคุณจะได้สคริปต์เดียวที่สามารถรันได้ซึ่งแปลงรูปภาพเบลอให้เป็นข้อความที่เป็นระเบียบและมีโครงสร้างพร้อมสำหรับการประมวลผลต่อไป + +## สิ่งที่คุณต้องการ + +- Python 3.9+ (ไวยากรณ์ด้านล่างทำงานบน 3.8 และใหม่กว่า) +- OCR engine ที่รองรับ `recognize(..., return_structured=True)` – ตัวอย่างเช่นไลบรารี `engine` สมมติที่ใช้ในโค้ดตัวอย่าง. แทนที่ด้วย Tesseract, EasyOCR, หรือ SDK ใด ๆ ที่คืนค่าข้อมูลพื้นที่ +- ความคุ้นเคยพื้นฐานกับฟังก์ชันและลูปของ Python +- ไฟล์รูปภาพที่คุณต้องการสแกน (PNG, JPG, ฯลฯ) + +> **เคล็ดลับ:** หากคุณใช้ Tesseract, ฟังก์ชัน `pytesseract.image_to_data` จะให้ Bounding Box อยู่แล้ว. คุณสามารถห่อผลลัพธ์นั้นด้วยอะแดปเตอร์เล็ก ๆ ที่จำลอง API `engine.recognize` ด้านล่าง + +![ตัวอย่างการทำ OCR บนรูปภาพ](image-placeholder.png "ตัวอย่างการทำ OCR บนรูปภาพ") + +*ข้อความแทน: แผนภาพแสดงวิธีทำ OCR บนรูปภาพและแสดงพิกัด Bounding Box* + +## ขั้นตอนที่ 1 – ทำ OCR บนรูปภาพและรับโครงสร้างพื้นที่ + +สิ่งแรกคือการขอให้ OCR engine คืนค่าไม่ใช่แค่ข้อความธรรมดา แต่เป็นรายการโครงสร้างของพื้นที่ข้อความ. รายการนี้ประกอบด้วยสตริงดิบและสี่เหลี่ยมที่ล้อมรอบมัน. + +```python +import engine # replace with your actual OCR library +from pathlib import Path + +# Load the image you want to process +image_path = Path("sample_invoice.jpg") +image = engine.load_image(image_path) + +# Step 1: Perform OCR and request a structured list of text regions +raw_result = engine.recognize(image, return_structured=True) +``` + +**ทำไมเรื่องนี้ถึงสำคัญ:** +เมื่อคุณขอแค่ข้อความธรรมดา คุณจะสูญเสียบริบทเชิงพื้นที่. 
ข้อมูลเชิงโครงสร้างทำให้คุณสามารถ **display bounding box coordinates** ในภายหลัง, จัดตำแหน่งข้อความกับตาราง, หรือส่งตำแหน่งที่แม่นยำให้กับโมเดลต่อไป. + +## ขั้นตอนที่ 2 – วิธีทำความสะอาดผลลัพธ์ OCR ด้วย Post‑Processor + +OCR engine มีความสามารถในการตรวจจับอักขระได้ดี, แต่บ่อยครั้งจะเหลือช่องว่างเกิน, สิ่งกีดขวางจากการตัดบรรทัด, หรือสัญลักษณ์ที่อ่านผิด. Post‑processor จะทำให้ข้อความเป็นมาตรฐาน, แก้ไขข้อผิดพลาด OCR ที่พบบ่อย, และตัดช่องว่างส่วนเกิน. + +```python +# Step 2: Clean the plain‑text of each region using the post‑processor +processed_result = engine.run_postprocessor(raw_result) +``` + +หากคุณกำลังสร้างตัวทำความสะอาดของคุณเอง, ควรพิจารณา: + +- ลบอักขระที่ไม่ใช่ ASCII (`re.sub(r'[^\x00-\x7F]+',' ', text)`) +- ทำให้หลายช่องว่างต่อเนื่องกลายเป็นช่องว่างเดียว +- ใช้ spell‑checker อย่าง `pyspellchecker` เพื่อตรวจสอบการพิมพ์ผิดที่ชัดเจน + +**ทำไมคุณควรใส่ใจ:** +สตริงที่เป็นระเบียบทำให้การค้นหา, การทำดัชนี, และ pipeline NLP ต่อไปทำงานได้เชื่อถือได้มากขึ้น. กล่าวคือ, **how to clean OCR** มักเป็นความแตกต่างระหว่างชุดข้อมูลที่ใช้ได้และปัญหาต่าง ๆ. + +## ขั้นตอนที่ 3 – แสดงพิกัด Bounding Box สำหรับแต่ละพื้นที่ที่ทำความสะอาดแล้ว + +เมื่อข้อความเป็นระเบียบแล้ว, เราจะวนลูปผ่านแต่ละพื้นที่, พิมพ์สี่เหลี่ยมและสตริงที่ทำความสะอาด. นี่คือส่วนที่เราจะ **display bounding box coordinates** ในที่สุด. + +```python +# Step 3 – Iterate over the cleaned regions and display their bounding box and text +for text_region in processed_result.regions: + # Each region has a .bounding_box attribute (x, y, width, height) + bbox = text_region.bounding_box + print(f"[{bbox}] {text_region.text}") +``` + +**ตัวอย่างผลลัพธ์** + +``` +[(34, 120, 210, 30)] Invoice #12345 +[(34, 160, 420, 28)] Date: 2026‑03‑01 +[(34, 200, 380, 28)] Total Amount: $1,254.00 +``` + +คุณสามารถนำพิกัดเหล่านั้นไปใช้กับไลบรารีการวาด (เช่น OpenCV) เพื่อวางกล่องบนรูปภาพต้นฉบับ, หรือเก็บไว้ในฐานข้อมูลเพื่อการสืบค้นในภายหลัง. + +## สคริปต์เต็มพร้อมรัน + +ด้านล่างเป็นโปรแกรมเต็มที่เชื่อมโยงขั้นตอนทั้งสามเข้าด้วยกัน. 
แทนที่การเรียก `engine` ตัวอย่างด้วย SDK OCR ของคุณจริง.

```python
#!/usr/bin/env python3
"""
Perform OCR on image → clean results → display bounding box coordinates.
Author: Your Name
Date: 2026‑03‑28
"""

import engine  # <-- replace with your OCR library
from pathlib import Path
import sys

def main(image_path: str):
    # Load image
    image = engine.load_image(Path(image_path))

    # 1️⃣ Perform OCR and ask for structured output
    raw_result = engine.recognize(image, return_structured=True)

    # 2️⃣ Clean the raw text using the built‑in post‑processor
    processed_result = engine.run_postprocessor(raw_result)

    # 3️⃣ Show each region's bounding box and cleaned text
    print("\n=== Cleaned OCR Regions ===")
    for region in processed_result.regions:
        bbox = region.bounding_box  # (x, y, w, h)
        print(f"[{bbox}] {region.text}")

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python perform_ocr.py <image_path>")
        sys.exit(1)
    main(sys.argv[1])
```

### วิธีการรัน

```bash
python perform_ocr.py sample_invoice.jpg
```

คุณควรเห็นรายการของ Bounding Box ที่จับคู่กับข้อความที่ทำความสะอาดแล้ว, เหมือนกับตัวอย่างผลลัพธ์ด้านบน.

## คำถามที่พบบ่อยและกรณีขอบ

| คำถาม | คำตอบ |
|----------|--------|
| **ถ้า OCR engine ไม่รองรับ `return_structured` จะทำอย่างไร?** | เขียน wrapper เล็ก ๆ ที่แปลงผลลัพธ์ดิบของ engine (โดยทั่วไปเป็นรายการคำพร้อมพิกัด) ให้เป็นอ็อบเจกต์ที่มีแอตทริบิวต์ `text` และ `bounding_box`. |
| **ฉันสามารถรับคะแนนความเชื่อมั่นได้หรือไม่?** | หลาย SDK จะเปิดเผยเมตริกความเชื่อมั่นต่อแต่ละพื้นที่. เพิ่มลงในคำสั่งพิมพ์: `print(f"[{bbox}] {region.text} (conf: {region.confidence:.2f})")`. |
| **จะจัดการข้อความที่หมุนได้อย่างไร?** | ทำการพรี‑โปรเซสรูปภาพด้วย `cv2.minAreaRect` ของ OpenCV เพื่อแก้ไขการเอียงก่อนเรียก `recognize`. |
| **ถ้าฉันต้องการผลลัพธ์ในรูปแบบ JSON จะทำอย่างไร?** | ทำการซีเรียลไลซ์ `processed_result.regions` ด้วย `json.dumps([r.__dict__ for r in processed_result.regions], indent=2)`. |
| **มีวิธีใดในการแสดงภาพกล่องบ้าง?** | ใช้ OpenCV: `cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0), 2)` ภายในลูป, จากนั้น `cv2.imwrite("annotated.jpg", img)`. |

## สรุป

คุณเพิ่งเรียนรู้วิธี **perform OCR on image**, ทำความสะอาดผลลัพธ์ดิบ, และ **display bounding box coordinates** สำหรับทุกพื้นที่. กระบวนการสามขั้นตอน—recognize → post‑process → iterate—เป็นแพทเทิร์นที่นำกลับมาใช้ได้ในโปรเจกต์ Python ใด ๆ ที่ต้องการการดึงข้อความที่เชื่อถือได้.

### ขั้นตอนต่อไปคืออะไร?

- **สำรวจ OCR back‑ends ต่าง ๆ** (Tesseract, EasyOCR, Google Vision) และเปรียบเทียบความแม่นยำ.
- **รวมเข้ากับฐานข้อมูล** เพื่อเก็บข้อมูลพื้นที่สำหรับคลังข้อมูลที่สามารถค้นหาได้.
- **เพิ่มการตรวจจับภาษา** เพื่อส่งแต่ละพื้นที่ผ่าน spell‑checker ที่เหมาะสม.
- **วางกล่องบนรูปภาพต้นฉบับ** เพื่อการตรวจสอบด้วยสายตา (ดูโค้ด OpenCV ด้านบน).

หากคุณเจอข้อผิดพลาด, จำไว้ว่าชัยชนะที่ใหญ่ที่สุดมาจากขั้นตอน post‑processing ที่แข็งแรง; สตริงที่สะอาดทำงานด้วยได้ง่ายกว่าการดัมพ์อักขระดิบมาก.

ขอให้สนุกกับการเขียนโค้ด, และขอให้ pipeline OCR ของคุณเป็นระเบียบเสมอ! 
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/thai/python/general/python-ocr-tutorial-extract-text-from-images/_index.md b/ocr/thai/python/general/python-ocr-tutorial-extract-text-from-images/_index.md new file mode 100644 index 000000000..9eecf39d6 --- /dev/null +++ b/ocr/thai/python/general/python-ocr-tutorial-extract-text-from-images/_index.md @@ -0,0 +1,231 @@ +--- +category: general +date: 2026-03-28 +description: บทเรียน OCR ด้วย Python แสดงวิธีดึงข้อความจากรูปภาพด้วย Aspose OCR Cloud + เรียนรู้การโหลดรูปภาพสำหรับ OCR และแปลงรูปภาพเป็นข้อความธรรมดาในไม่กี่นาที +draft: false +keywords: +- python ocr tutorial +- extract text image python +- ocr image to text +- load image for ocr +- convert image plain text +language: th +og_description: บทแนะนำ OCR ด้วย Python อธิบายวิธีโหลดภาพสำหรับ OCR และแปลงข้อความธรรมดาจากภาพโดยใช้ + Aspose OCR Cloud. รับโค้ดเต็มและเคล็ดลับทั้งหมด. +og_title: บทเรียน OCR ด้วย Python – ดึงข้อความจากภาพ +tags: +- OCR +- Python +- Image Processing +title: บทเรียน OCR ด้วย Python – ดึงข้อความจากรูปภาพ +url: /th/python/general/python-ocr-tutorial-extract-text-from-images/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# บทเรียน Python OCR – แยกข้อความจากรูปภาพ + +เคยสงสัยไหมว่าจะเปลี่ยนรูปภาพใบเสร็จที่รกเป็นข้อความที่สะอาดและสามารถค้นหาได้? คุณไม่ได้เป็นคนเดียวที่คิดแบบนี้. ตามประสบการณ์ของผม, อุปสรรคที่ใหญ่ที่สุดไม่ใช่เครื่องมือ OCR เอง แต่คือการทำให้รูปภาพอยู่ในรูปแบบที่ถูกต้องและดึงข้อความธรรมดาออกมาโดยไม่มีปัญหา. + +บทเรียน **python ocr tutorial** นี้จะพาคุณผ่านทุกขั้นตอน—การโหลดรูปภาพสำหรับ OCR, การรันการจดจำ, และสุดท้ายการแปลงข้อความธรรมดาจากรูปภาพเป็นสตริง Python ที่คุณสามารถเก็บหรือวิเคราะห์ได้. 
เมื่อจบคุณจะสามารถ **extract text image python** ได้อย่างสไตล์, และคุณไม่จำเป็นต้องมีไลเซนส์แบบจ่ายเงินเพื่อเริ่มต้น. + +## สิ่งที่คุณจะได้เรียนรู้ + +- วิธีการติดตั้งและนำเข้า Aspose OCR Cloud SDK สำหรับ Python. +- โค้ดที่แม่นยำเพื่อ **load image for OCR** (PNG, JPEG, TIFF, PDF, ฯลฯ). +- วิธีเรียกใช้ engine เพื่อทำการแปลง **ocr image to text**. +- เคล็ดลับการจัดการกับ edge‑case ที่พบบ่อยเช่น PDF หลายหน้า หรือสแกนความละเอียดต่ำ. +- วิธีตรวจสอบผลลัพธ์และทำอย่างไรหากข้อความดูเป็นอักขระผสมกัน. + +### ข้อกำหนดเบื้องต้น + +- Python 3.8+ ติดตั้งบนเครื่องของคุณ. +- บัญชี Aspose Cloud ฟรี (รุ่นทดลองทำงานได้โดยไม่ต้องมีไลเซนส์). +- ความคุ้นเคยพื้นฐานกับ pip และ virtual environments—ไม่มีอะไรซับซ้อน. + +> **Pro tip:** หากคุณกำลังใช้ virtualenv อยู่แล้ว, ให้เปิดใช้งานตอนนี้. มันช่วยให้การจัดการ dependencies ของคุณเป็นระเบียบและหลีกเลี่ยงการชนกันของเวอร์ชัน. + +![ภาพหน้าจอบทเรียน Python OCR แสดงข้อความที่จดจำได้](path/to/ocr_example.png "Python OCR tutorial – แสดงข้อความธรรมดาที่แยกได้") + +## ขั้นตอนที่ 1 – ติดตั้ง Aspose OCR Cloud SDK + +ก่อนอื่นเราต้องการไลบรารีที่สื่อสารกับบริการ OCR ของ Aspose. เปิดเทอร์มินัลและรัน: + +```bash +pip install asposeocrcloud +``` + +คำสั่งเดียวนี้จะดึง SDK ล่าสุด (ขณะนี้เป็นเวอร์ชัน 23.12). แพ็กเกจมีทุกอย่างที่คุณต้องการ—ไม่ต้องใช้ไลบรารีประมวลผลรูปภาพเพิ่มเติม. + +## ขั้นตอนที่ 2 – เริ่มต้น OCR Engine (Primary Keyword in Action) + +ตอนนี้ SDK พร้อมแล้ว, เราสามารถสปินอัป engine **python ocr tutorial** ได้. ตัวสร้างไม่ต้องการคีย์ไลเซนส์สำหรับรุ่นทดลอง, ทำให้ขั้นตอนง่ายขึ้น. + +```python +import asposeocrcloud as ocr + +# Initialise the OCR engine – no licence needed for trial use +ocr_engine = ocr.OcrEngine() +``` + +> **ทำไมเรื่องนี้ถึงสำคัญ:** การเริ่มต้น engine เพียงครั้งเดียวทำให้การเรียกต่อไปเร็วขึ้น. หากคุณสร้างอ็อบเจกต์ใหม่สำหรับแต่ละรูปภาพจะทำให้เสียเวลาเครือข่ายเพิ่มขึ้น. + +## ขั้นตอนที่ 3 – โหลดรูปภาพสำหรับ OCR + +นี่คือจุดที่คีย์เวิร์ด **load image for OCR** ส่องแสง. 
เมธอด `Image.load` ของ SDK รับพาธไฟล์หรือ URL, และจะตรวจจับรูปแบบโดยอัตโนมัติ (PNG, JPEG, TIFF, PDF, ฯลฯ). มาลองโหลดใบเสร็จตัวอย่าง: + +```python +# Step 3: Load the input image (PNG, JPEG, TIFF, PDF …) +input_image = ocr.Image.load(r"YOUR_DIRECTORY/receipt.png") +``` + +หากคุณทำงานกับ PDF หลายหน้า, เพียงชี้ไปที่ไฟล์ PDF; SDK จะถือแต่ละหน้าเป็นรูปภาพแยกภายใน. + +## ขั้นตอนที่ 4 – ทำการแปลง OCR Image to Text + +เมื่อรูปภาพอยู่ในหน่วยความจำ, OCR จริงจะเกิดขึ้นในบรรทัดเดียว. เมธอด `recognize` จะคืนค่าอ็อบเจกต์ `OcrResult` ที่มีข้อความธรรมดา, คะแนนความเชื่อมั่น, และแม้แต่ bounding boxes หากคุณต้องการใช้ต่อในภายหลัง. + +```python +# Step 4: Perform OCR on the loaded image +ocr_result = ocr_engine.recognize(input_image) +``` + +> **Edge case:** สำหรับรูปภาพความละเอียดต่ำ (ต่ำกว่า 300 dpi) คุณอาจต้องอัปสเกลรูปก่อน. SDK มี helper `Resize`, แต่สำหรับใบเสร็จส่วนใหญ่ค่าเริ่มต้นทำงานได้ดี. + +## ขั้นตอนที่ 5 – แปลงข้อความธรรมดาจากรูปภาพเป็นสตริงที่ใช้ได้ + +ส่วนสุดท้ายของปริศนาคือการดึงข้อความธรรมดาจากอ็อบเจกต์ผลลัพธ์. นี่คือขั้นตอน **convert image plain text** ที่เปลี่ยน blob OCR ให้เป็นสิ่งที่คุณสามารถพิมพ์, เก็บ, หรือส่งต่อไปยังระบบอื่นได้. + +```python +# Step 5: Output the recognised plain text +print("Plain OCR:") +print(ocr_result.text) +``` + +เมื่อคุณรันสคริปต์, คุณควรเห็นผลลัพธ์ประมาณนี้: + +``` +Plain OCR: +Starbucks Coffee +Date: 2026/03/27 +Total: $4.75 +Thank you! +``` + +ผลลัพธ์นั้นตอนนี้เป็นสตริง Python ปกติ, พร้อมสำหรับการส่งออกเป็น CSV, แทรกลงฐานข้อมูล, หรือประมวลผลภาษาธรรมชาติ. + +## การจัดการกับปัญหาทั่วไป + +### 1. รูปภาพว่างหรือมีเสียงรบกวน + +หาก `ocr_result.text` กลับมาเป็นค่าว่าง, ตรวจสอบคุณภาพของรูปภาพอีกครั้ง. วิธีแก้เร็วคือเพิ่มขั้นตอนการพรีโปรเซส: + +```python +# Simple preprocessing – convert to grayscale and increase contrast +processed = input_image.to_grayscale().adjust_contrast(1.2) +ocr_result = ocr_engine.recognize(processed) +``` + +### 2. PDF หลายหน้า + +เมื่อคุณป้อน PDF, `recognize` จะคืนค่าผลลัพธ์สำหรับแต่ละหน้า. 
วนลูปผ่านผลลัพธ์ดังนี้: + +```python +pdf_image = ocr.Image.load("document.pdf") +pages = pdf_image.pages # collection of page images + +for i, page in enumerate(pages, start=1): + result = ocr_engine.recognize(page) + print(f"--- Page {i} ---") + print(result.text) +``` + +### 3. การสนับสนุนภาษา + +Aspose OCR รองรับกว่า 60 ภาษา. เพื่อสลับภาษา, ตั้งค่า property `language` ก่อนเรียก `recognize`: + +```python +ocr_engine.language = "fr" # French +ocr_result = ocr_engine.recognize(input_image) +``` + +## ตัวอย่างทำงานเต็มรูปแบบ + +รวมทุกอย่างเข้าด้วยกัน, นี่คือสคริปต์พร้อมคัดลอก‑วางที่ครอบคลุมตั้งแต่การติดตั้งจนถึงการจัดการ edge‑case: + +```python +# -*- coding: utf-8 -*- +""" +Python OCR tutorial – complete example using Aspose OCR Cloud. +Demonstrates loading an image, performing OCR, and handling multi‑page PDFs. +""" + +import asposeocrcloud as ocr +import os + +def ocr_file(filepath: str, language: str = "en"): + """ + Perform OCR on a given file (image or PDF) and return plain text. + """ + # Initialise engine (trial licence) + engine = ocr.OcrEngine() + engine.language = language + + # Load the file – SDK auto‑detects format + image = ocr.Image.load(filepath) + + # If it's a PDF, iterate over pages + if image.is_pdf: + all_text = [] + for page in image.pages: + result = engine.recognize(page) + all_text.append(result.text) + return "\n".join(all_text) + + # Single‑image case + result = engine.recognize(image) + return result.text + + +if __name__ == "__main__": + # Example usage – replace with your own path + sample_path = os.path.join("YOUR_DIRECTORY", "receipt.png") + + if not os.path.exists(sample_path): + raise FileNotFoundError(f"File not found: {sample_path}") + + extracted = ocr_file(sample_path) + print("=== Extracted Text ===") + print(extracted) +``` + +รันสคริปต์ (`python ocr_demo.py`) แล้วคุณจะเห็นผลลัพธ์ **ocr image to text** ปรากฏในคอนโซลของคุณ. 
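
ก่อนนำสตริงที่สกัดได้ไปเก็บหรือวิเคราะห์ต่อ คุณอาจต้องการขั้นตอนทำความสะอาดเบื้องต้นเพิ่มเติม ตัวอย่างด้านล่างเป็นสเก็ตช์เล็ก ๆ ที่ไม่ขึ้นกับ SDK ใด ๆ (ฟังก์ชัน `tidy_ocr_text` เป็นตัวช่วยสมมติที่เขียนขึ้นเพื่อสาธิต ไม่ใช่ส่วนหนึ่งของ Aspose SDK) สำหรับ normalize ช่องว่างและลบอักขระควบคุมออกจากผลลัพธ์ OCR:

```python
import re
import unicodedata

def tidy_ocr_text(text: str) -> str:
    """Normalize whitespace and drop control characters from raw OCR output."""
    # Unify Unicode forms (e.g. full-width characters some engines emit)
    text = unicodedata.normalize("NFKC", text)
    # Keep newlines, drop other non-printable characters (tabs, control codes)
    text = "".join(ch for ch in text if ch == "\n" or ch.isprintable())
    # Collapse runs of spaces and trim each line
    lines = [re.sub(r" +", " ", line).strip() for line in text.split("\n")]
    # Drop the empty lines OCR often inserts between text blocks
    return "\n".join(line for line in lines if line)
```

เรียกใช้ง่าย ๆ เป็น `print(tidy_ocr_text(extracted))` ก่อนบันทึกผลลัพธ์ลงไฟล์หรือฐานข้อมูล.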

## สรุป – สิ่งที่เราได้ครอบคลุม

- ติดตั้ง **Aspose OCR Cloud** SDK (`pip install asposeocrcloud`).
- **Initialised the OCR engine** โดยไม่ต้องใช้ไลเซนส์ (เหมาะสำหรับรุ่นทดลอง).
- แสดงวิธี **load image for OCR**, ไม่ว่าจะเป็น PNG, JPEG, หรือ PDF.
- ทำการแปลง **ocr image to text** และ **converted image plain text** ให้เป็นสตริง Python ที่ใช้งานได้.
- แก้ไขปัญหาทั่วไปเช่นสแกนความละเอียดต่ำ, PDF หลายหน้า, และการเลือกภาษา.

## ขั้นตอนต่อไปและหัวข้อที่เกี่ยวข้อง

ตอนนี้คุณได้เชี่ยวชาญ **python ocr tutorial**, ลองสำรวจต่อ:

- **Extract text image python** สำหรับการประมวลผลเป็นชุดของโฟลเดอร์ใบเสร็จขนาดใหญ่.
- การรวมผลลัพธ์ OCR กับ **pandas** เพื่อวิเคราะห์ข้อมูล (`df = pd.read_csv(StringIO(extracted))`).
- ใช้ **Tesseract OCR** เป็นทางเลือกสำรองเมื่อการเชื่อมต่ออินเทอร์เน็ตจำกัด.
- เพิ่มการประมวลผลหลังจาก OCR ด้วย **spaCy** เพื่อระบุเอนทิตีเช่น วันที่, จำนวนเงิน, และชื่อผู้ขาย.

ลองทดลองได้เลย: ใช้รูปแบบไฟล์ต่างๆ, ปรับคอนทราสต์, หรือสลับภาษา. โลกของ OCR กว้างขวาง, และทักษะที่คุณเพิ่งเรียนรู้เป็นพื้นฐานที่มั่นคงสำหรับโครงการอัตโนมัติเอกสารใด ๆ.

Happy coding, and may your text always be readable!

{{< /blocks/products/pf/tutorial-page-section >}}
{{< /blocks/products/pf/main-container >}}
{{< /blocks/products/pf/main-wrap-class >}}
{{< blocks/products/products-backtop-button >}}
diff --git a/ocr/thai/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md b/ocr/thai/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md
new file mode 100644
index 000000000..209d99919
--- /dev/null
+++ b/ocr/thai/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md
@@ -0,0 +1,219 @@
---
category: general
date: 2026-03-28
description: เรียนรู้วิธีการทำ OCR บนภาพ ดาวน์โหลดโมเดล Hugging Face อัตโนมัติ ทำความสะอาดข้อความ OCR และกำหนดค่าโมเดล LLM ใน Python ด้วย Aspose OCR Cloud.
draft: false
keywords:
- run OCR on image
- download hugging face model
- clean OCR text
- configure LLM model
language: th
og_description: ทำการ OCR บนภาพและทำความสะอาดผลลัพธ์โดยใช้โมเดล Hugging Face ที่ดาวน์โหลดอัตโนมัติ คู่มือนี้แสดงวิธีตั้งค่าโมเดล LLM ใน Python.
og_title: ทำ OCR บนรูปภาพ – คู่มือ Aspose OCR Cloud อย่างครบถ้วน
tags:
- OCR
- Python
- LLM
- HuggingFace
title: เรียกใช้ OCR บนภาพด้วย Aspose OCR Cloud – คู่มือขั้นตอนเต็ม
url: /th/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/
---

{{< blocks/products/pf/main-wrap-class >}}
{{< blocks/products/pf/main-container >}}
{{< blocks/products/pf/tutorial-page-section >}}

# เรียกใช้ OCR บนภาพ – คู่มือเต็มของ Aspose OCR Cloud

เคยต้องการเรียกใช้ OCR บนไฟล์รูปภาพ แต่ผลลัพธ์ดิบที่ได้ดูยุ่งเหยิงจนอ่านไม่รู้เรื่องหรือไม่? ตามประสบการณ์ของผม จุดที่เจ็บปวดที่สุดไม่ใช่ขั้นตอนการรู้จำตัวอักษรเอง แต่เป็นการทำความสะอาดผลลัพธ์. โชคดีที่ Aspose OCR Cloud ให้คุณแนบ LLM post‑processor ที่สามารถ *ทำความสะอาดข้อความ OCR* โดยอัตโนมัติ. ในบทเรียนนี้เราจะพาคุณผ่านทุกอย่างที่คุณต้องการ: ตั้งแต่ **การดาวน์โหลดโมเดลจาก Hugging Face** ไปจนถึงการกำหนดค่า LLM, การรัน OCR engine, และสุดท้ายการขัดเกลาผลลัพธ์.

เมื่อจบคู่มือนี้คุณจะมีสคริปต์พร้อมรันที่:

1. ดึงโมเดล Qwen 2.5 ขนาดกะทัดรัดจาก Hugging Face (ดาวน์โหลดอัตโนมัติให้คุณ).
2. กำหนดค่าโมเดลให้รันบางส่วนของเครือข่ายบน GPU และส่วนที่เหลือบน CPU.
3. รัน OCR engine บนรูปภาพโน้ตที่เขียนด้วยลายมือ.
4. ใช้ LLM ทำความสะอาดข้อความที่รู้จำได้ ให้เป็นผลลัพธ์ที่มนุษย์อ่านเข้าใจง่าย.

> **Prerequisites** – Python 3.8+, แพ็กเกจ `asposeocrcloud`, GPU ที่มี VRAM อย่างน้อย 4 GB (ไม่บังคับแต่แนะนำ), และการเชื่อมต่ออินเทอร์เน็ตสำหรับการดาวน์โหลดโมเดลครั้งแรก.

---

## สิ่งที่คุณต้องการ

- **Aspose OCR Cloud SDK** – ติดตั้งด้วย `pip install asposeocrcloud`.
- **ภาพตัวอย่าง** – เช่น `handwritten_note.jpg` ที่วางไว้ในโฟลเดอร์โลคัล.
- **การสนับสนุน GPU** – หากคุณมี GPU ที่เปิดใช้งาน CUDA, สคริปต์จะย้าย 30 ชั้นไปยัง GPU; หากไม่มีจะกลับไปใช้ CPU โดยอัตโนมัติ.
- **สิทธิ์การเขียน** – สคริปต์จะแคชโมเดลใน `YOUR_DIRECTORY`; ตรวจสอบให้แน่ใจว่าโฟลเดอร์นั้นมีอยู่.

---

## Step 1 – Configure the LLM Model (download Hugging Face model)

สิ่งแรกที่เราทำคือบอก Aspose AI ว่าจะดึงโมเดลจากที่ไหน. คลาส `AsposeAIModelConfig` จัดการการดาวน์โหลดอัตโนมัติ, การควอนไทซ์, และการจัดสรรชั้นบน GPU.

```python
import asposeocrcloud as ocr
from asposeocrcloud.ai import AsposeAI, AsposeAIModelConfig

# ----------------------------------------------------------------------
# Step 1: Model configuration – this will download the model if it’s missing
# ----------------------------------------------------------------------
model_config = AsposeAIModelConfig(
    allow_auto_download="true",                            # Enables auto‑download
    hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF",  # Repo on Hugging Face
    hugging_face_quantization="int8",                      # Small footprint, fast inference
    gpu_layers=30,                                         # 30 layers on GPU, rest on CPU
    directory_model_path=r"YOUR_DIRECTORY"                 # Cache folder (optional)
)
```

**Why this matters** – การควอนไทซ์เป็น `int8` ช่วยลดการใช้หน่วยความจำอย่างมหาศาล (≈ 4 GB เทียบกับ 12 GB). การแยกโมเดลระหว่าง GPU และ CPU ทำให้คุณสามารถรัน LLM ขนาด 3 พันล้านพารามิเตอร์แม้บน RTX 3060 ที่ค่อนข้างธรรมดา. หากคุณไม่มี GPU, ตั้งค่า `gpu_layers=0` แล้ว SDK จะทำงานทั้งหมดบน CPU.

> **Tip:** การรันครั้งแรกจะดาวน์โหลดประมาณ 1.5 GB ดังนั้นให้เวลาสักครู่และใช้การเชื่อมต่อที่เสถียร.

---

## Step 2 – Initialise the AI Engine with the Model Configuration

ตอนนี้เราจะสปินอัพ Aspose AI engine และป้อนการกำหนดค่าที่เราสร้างไว้ให้มัน.
+ +```python +# ---------------------------------------------------------------------- +# Step 2: Initialise the AI engine – pulls the model if needed +# ---------------------------------------------------------------------- +ocr_ai = AsposeAI() +ocr_ai.initialize(model_config) # This call blocks until the model is ready +``` + +**What’s happening under the hood?** SDK ตรวจสอบ `directory_model_path` เพื่อหาโมเดลที่มีอยู่แล้ว. หากพบเวอร์ชันที่ตรงกันจะโหลดทันที; หากไม่พบจะดาวน์โหลดไฟล์ GGUF จาก Hugging Face, แยกแพ็คและเตรียม pipeline สำหรับการ inference. + +--- + +## Step 3 – Create the OCR Engine and Attach the AI Post‑Processor + +OCR engine ทำหน้าที่หนักในการจดจำอักขระ. โดยการแนบ `ocr_ai.run_postprocessor` เราจะเปิดใช้งาน **clean OCR text** โดยอัตโนมัติหลังการจดจำ. + +```python +# ---------------------------------------------------------------------- +# Step 3: Build the OCR engine and bind the LLM post‑processor +# ---------------------------------------------------------------------- +ocr_engine = ocr.OcrEngine() +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=None) +``` + +**Why use a post‑processor?** OCR ดิบมักมีการขึ้นบรรทัดใหม่ในตำแหน่งที่ไม่ถูกต้อง, เครื่องหมายวรรคตอนที่ตรวจจับผิด, หรือสัญลักษณ์รบกวน. LLM สามารถเขียนใหม่ให้เป็นประโยคที่สมบูรณ์, แก้ไขการสะกด, และแม้กระทั่งสรุปคำที่หายไป—โดยสรุปคือเปลี่ยนข้อมูลดิบให้เป็นข้อความที่เรียบหรู. + +--- + +## Step 4 – Run OCR on an Image File + +เมื่อทุกอย่างเชื่อมต่อกันแล้ว, ถึงเวลาป้อนภาพให้กับ engine. + +```python +# ---------------------------------------------------------------------- +# Step 4: Load the image and run OCR +# ---------------------------------------------------------------------- +input_image = ocr.Image.load(r"YOUR_DIRECTORY/handwritten_note.jpg") +raw_result = ocr_engine.recognize(input_image) # Returns an OcrResult object +``` + +**Edge case:** หากภาพมีขนาดใหญ่ (> 5 MP) คุณอาจต้องปรับขนาดก่อนเพื่อเร่งการประมวลผล. 
SDK รองรับอ็อบเจกต์ Pillow `Image`, ดังนั้นคุณสามารถทำพรี‑โปรเซสด้วย `PIL.Image.thumbnail()` หากต้องการ. + +--- + +## Step 5 – Let the AI Clean Up the Recognised Text and Show Both Versions + +สุดท้ายเราจะเรียกใช้ post‑processor ที่แนบไว้ก่อนหน้านี้. ขั้นตอนนี้แสดงความแตกต่างระหว่าง *ก่อน* และ *หลัง* การทำความสะอาด. + +```python +# ---------------------------------------------------------------------- +# Step 5: Clean the OCR output using the LLM and display both results +# ---------------------------------------------------------------------- +cleaned_result = ocr_engine.run_postprocessor(raw_result) + +print("=== Before AI ===") +print(raw_result.text) + +print("\n=== After AI ===") +print(cleaned_result.text) +``` + +### Expected Output + +``` +=== Before AI === +Th1s 1s a h@ndwr1tt3n n0te. It c0nta1ns m1st@k3s, l1n3 br3aks, & sp3c!@l ch@r@ct3rs. + +=== After AI === +This is a handwritten note. It contains mistakes, line breaks, and special characters. +``` + +สังเกตว่า LLM ได้: + +- แก้ไขการจดจำ OCR ที่พบบ่อย (`Th1s` → `This`). +- ลบสัญลักษณ์รบกวน (`&` → `and`). +- ปรับบรรทัดใหม่ให้เป็นประโยคที่สมบูรณ์. + +--- + +## 🎨 Visual Overview (Run OCR on image Workflow) + +![Run OCR on image workflow](run_ocr_on_image_workflow.png "Diagram showing the run OCR on image pipeline from model download to cleaned output") + +แผนภาพด้านบนสรุป pipeline ทั้งหมด: **download Hugging Face model → configure LLM → initialise AI → OCR engine → AI post‑processor → clean OCR text**. + +--- + +## Common Questions & Pro Tips + +### What if I don’t have a GPU? + +ตั้งค่า `gpu_layers=0` ใน `AsposeAIModelConfig`. โมเดลจะทำงานทั้งหมดบน CPU, ซึ่งช้ากว่าแต่ยังทำงานได้. คุณยังสามารถสลับไปใช้โมเดลที่เล็กลง (เช่น `Qwen/Qwen2.5-1.5B‑Instruct‑GGUF`) เพื่อให้เวลา inference อยู่ในระดับที่รับได้. + +### How do I change the model later? + +เพียงอัปเดต `hugging_face_repo_id` แล้วรัน `ocr_ai.initialize(model_config)` ใหม่. SDK จะตรวจจับการเปลี่ยนเวอร์ชัน, ดาวน์โหลดโมเดลใหม่, และแทนที่ไฟล์ที่แคชไว้. 
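จากคำถามสองข้อข้างต้น (กรณีไม่มี GPU และการสลับโมเดล) เราสามารถสรุปตรรกะการเลือกค่าคอนฟิกเป็นฟังก์ชันช่วยเล็ก ๆ ได้ ตัวอย่างด้านล่างเป็นเพียง sketch สมมุติที่คืนค่า keyword arguments สำหรับส่งต่อให้ `AsposeAIModelConfig` (ชื่อฟังก์ชันและเกณฑ์ VRAM 4 GB เป็นสิ่งที่เรากำหนดขึ้นเองตามคำแนะนำใน Prerequisites ไม่ใช่ API ของ SDK):

```python
def pick_model_settings(has_gpu: bool, vram_gb: float = 0.0) -> dict:
    """Choose a model repo and GPU layer split (hypothetical helper)."""
    if has_gpu and vram_gb >= 4:
        # Enough VRAM: offload 30 layers of the 3B model to the GPU
        return {"hugging_face_repo_id": "Qwen/Qwen2.5-3B-Instruct-GGUF",
                "gpu_layers": 30}
    # CPU-only fallback: smaller model, no GPU layers
    return {"hugging_face_repo_id": "Qwen/Qwen2.5-1.5B-Instruct-GGUF",
            "gpu_layers": 0}

print(pick_model_settings(has_gpu=True, vram_gb=8))
print(pick_model_settings(has_gpu=False))
```

จากนั้นนำ dict ที่ได้ไป unpack เข้า config ใน Step 1 ได้ เช่น `AsposeAIModelConfig(**settings, allow_auto_download="true")`.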
+ +### Can I customise the post‑processor prompt? + +ได้. ส่งพจนานุกรมไปยัง `custom_settings` พร้อมคีย์ `prompt_template`. ตัวอย่างเช่น: + +```python +custom_prompt = { + "prompt_template": "Correct the following OCR text and keep line breaks:\n{ocr_text}" +} +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=custom_prompt) +``` + +### Should I store the cleaned text to a file? + +แน่นอน. หลังจากทำความสะอาดแล้วคุณสามารถเขียนผลลัพธ์ลงไฟล์ `.txt` หรือ `.json` เพื่อการประมวลผลต่อไป: + +```python +with open("cleaned_note.txt", "w", encoding="utf-8") as f: + f.write(cleaned_result.text) +``` + +--- + +## Conclusion + +เราได้แสดงให้คุณเห็นวิธี **run OCR on image** ด้วย Aspose OCR Cloud, ดาวน์โหลดโมเดลจาก Hugging Face อัตโนมัติ, กำหนดค่า **configure LLM model** อย่างเชี่ยวชาญ, และสุดท้าย **clean OCR text** ด้วย LLM post‑processor ที่ทรงพลัง. ทั้งหมดนี้รวมอยู่ในสคริปต์ Python เพียงไฟล์เดียวที่ง่ายต่อการรันและทำงานได้ทั้งบนเครื่องที่มี GPU และเครื่องที่มีเฉพาะ CPU. + +หากคุณคุ้นเคยกับ pipeline นี้แล้ว, ลองทดลองกับ: + +- **Different LLMs** – ลอง `meta-llama/Meta-Llama-3-8B‑Instruct‑GGUF` เพื่อรับหน้าต่างบริบทที่กว้างขึ้น. +- **Batch processing** – วนลูปผ่านโฟลเดอร์ของภาพและรวมผลลัพธ์ที่ทำความสะอาดเป็น CSV. +- **Custom prompts** – ปรับแต่ง AI ให้ตรงกับโดเมนของคุณ (เอกสารกฎหมาย, บันทึกการแพทย์, ฯลฯ). + +อย่าลังเลที่จะปรับค่า `gpu_layers`, เปลี่ยนโมเดล, หรือใส่ prompt ของคุณเอง. ท้องฟ้าเป็นขอบเขต, และโค้ดที่คุณมีตอนนี้คือจุดเริ่มต้น. + +Happy coding, and may your OCR outputs be ever clean! 
🚀

{{< /blocks/products/pf/tutorial-page-section >}}
{{< /blocks/products/pf/main-container >}}
{{< /blocks/products/pf/main-wrap-class >}}
{{< blocks/products/products-backtop-button >}}
diff --git a/ocr/turkish/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md b/ocr/turkish/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md
new file mode 100644
index 000000000..07b9bc3ab
--- /dev/null
+++ b/ocr/turkish/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md
@@ -0,0 +1,225 @@
---
category: general
date: 2026-03-28
description: Görüntülerde el yazısı metni tanımak için OCR nasıl kullanılır. El yazısı metni çıkarmayı, el yazısı görüntüsünü dönüştürmeyi öğrenin ve hızlıca temiz sonuçlar elde edin.
draft: false
keywords:
- how to use OCR
- recognize handwritten text
- extract handwritten text
- handwritten note to text
- convert handwritten image
language: tr
og_description: El yazısı metni tanımak için OCR nasıl kullanılır. Bu öğretici, el yazısı metni görüntülerden adım adım nasıl çıkaracağınızı ve kusursuz sonuçlar elde edeceğinizi gösterir.
og_title: OCR'yi Kullanarak El Yazısı Metni Tanıma – Tam Kılavuz
tags:
- OCR
- Handwriting Recognition
- Python
title: El Yazısı Metni Tanımak İçin OCR Nasıl Kullanılır – Tam Rehber
url: /tr/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/
---

{{< blocks/products/pf/main-wrap-class >}}
{{< blocks/products/pf/main-container >}}
{{< blocks/products/pf/tutorial-page-section >}}

# El Yazısı Metni Tanımak İçin OCR Nasıl Kullanılır – Tam Kılavuz

El yazısı notlar için OCR kullanmak, taslakları, toplantı tutanaklarını veya hızlı notları dijitalleştirmeleri gerektiğinde birçok geliştiricinin sorduğu bir sorudur.
Bu kılavuzda el yazısı metni tanıma, el yazısı metni çıkarma ve el yazısı görüntüsünü temiz, aranabilir dizelere dönüştürme adımlarını ayrıntılı olarak göstereceğiz.

Eğer bir market listesi fotoğrafına bakıp “Bu el yazısı görüntüyü tekrar yazmadan metne dönüştürebilir miyim?” diye düşündüyseniz – doğru yerdesiniz. Sonunda, el yazısı bir notu saniyeler içinde metne çeviren (**handwritten note to text**), çalıştırmaya hazır bir betiğe sahip olacaksınız.

## Gereksinimler

- Python 3.8+ (kod, herhangi bir yeni sürümde çalışır)
- `ocr` kütüphanesi – `pip install ocr-sdk` ile kurun (sağlayıcınızın paket adıyla değiştirin)
- El yazısı notunun net bir fotoğrafı (`hand_note.png` örnekte)
- Biraz merak ve bir kahve ☕️ (isteğe bağlı ama tavsiye edilir)

Ağır çerçeveler yok, ücretli bulut anahtarları yok – sadece kutudan çıkar çıkmaz **handwritten recognition** destekleyen yerel bir motor.

## Adım 1 – OCR Paketini Kurun ve İçe Aktarın

İlk olarak, doğru paketi makinenize alalım. Bir terminal açın ve şu komutu çalıştırın:

```bash
pip install ocr-sdk
```

Kurulum tamamlandığında, modülü betiğinizde içe aktarın:

```python
# Step 1: Import the OCR SDK
import ocr
```

> **Pro tip:** Sanal ortam kullanıyorsanız, kurmadan önce etkinleştirin. Bu, projenizi düzenli tutar ve sürüm çakışmalarını önler.

## Adım 2 – Bir OCR Motoru Oluşturun ve El Yazısı Modunu Etkinleştirin

Şimdi sıra gerçekten **how to use OCR** kısmında: basılı metin yerine el yazısı darbeleriyle çalıştığımızı bilen bir motor örneğine ihtiyacımız var. Aşağıdaki kod parçacığı motoru oluşturur ve el yazısı moduna geçirir:

```python
# Step 2: Initialize the OCR engine for handwritten text
ocr_engine = ocr.OcrEngine()
ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN
```

`recognition_mode` neden ayarlanır? Çünkü çoğu OCR motoru varsayılan olarak basılı‑metin algılamasını yapar, bu da kişisel notların döngü ve eğimlerini sık sık atlar.
El yazısı modunu etkinleştirmek doğruluğu büyük ölçüde artırır. + +## Adım 3 – Dönüştürmek İstediğiniz Görüntüyü Yükleyin (El Yazısı Görüntüsünü Dönüştürme) + +Görseller, herhangi bir OCR işinin ham malzemesidir. Fotoğrafınızın kayıpsız bir formatta (PNG çok iyi çalışır) kaydedildiğinden ve metnin yeterince okunaklı olduğundan emin olun. Ardından şu şekilde yükleyin: + +```python +# Step 3: Load the handwritten image you want to convert +handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") +``` + +Görüntü betiğinizin yanında bulunuyorsa, tam yol yerine sadece `"hand_note.png"` kullanabilirsiniz. + +> **Görüntü bulanıktaysa ne yapmalı?** OCR motoruna vermeden önce OpenCV ile ön işleme yapmayı deneyin (ör. `cv2.cvtColor` ile gri tonlamaya, `cv2.threshold` ile kontrast artırmaya). + +## Adım 4 – Tanıma Motorunu Çalıştırarak El Yazısı Metni Çıkarın + +Motor hazır ve görüntü bellekte olduğunda, nihayet **extract handwritten text** yapabiliriz. `recognize` yöntemi, metin ve güven skorlarını içeren ham bir sonuç nesnesi döndürür. + +```python +# Step 4: Perform OCR and get the raw result +raw_result = ocr_engine.recognize(handwritten_image) +print("Raw OCR output:") +print(raw_result.text) +``` + +Tipik ham çıktı, özellikle el yazısı dağınık ise, rastgele satır sonları veya hatalı karakterler içerebilir. Bu yüzden bir sonraki adım vardır. + +## Adım 5 – (İsteğe Bağlı) Çıktıyı AI Post‑Processor ile Parlatın + +Çoğu modern OCR SDK, boşlukları temizleyen, yaygın OCR hatalarını düzelten ve satır sonlarını normalleştiren hafif bir AI post‑processor ile birlikte gelir. 
Çalıştırması şu kadar kolay: + +```python +# Step 5: Refine the raw OCR output (handwritten note to text) +polished_result = ocr_engine.run_postprocessor(raw_result) + +# Display the cleaned, readable text +print("\nPolished OCR output:") +print(polished_result.text) +``` + +Bu adımı atlayarak da kullanılabilir bir metin elde edersiniz, ancak **handwritten note to text** dönüşümü biraz daha kaba görünecektir. Post‑processor, madde işaretli veya karışık‑büyük‑küçük harfli kelimeler içeren notlar için özellikle kullanışlıdır. + +## Adım 6 – Sonucu Doğrulayın ve Kenar Durumlarını Ele Alın + +Parlatılmış sonucu yazdırdıktan sonra, her şeyin doğru göründüğünden iki kez kontrol edin. Ekleyebileceğiniz hızlı bir mantık kontrolü: + +```python +# Step 6: Simple verification +if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") +else: + print("\n✅ OCR succeeded! You can now save or further process the text.") +``` + +**Kenar Durumu Kontrol Listesi** + +| Durum | Ne Yapmalı | +|-----------|------------| +| **Çok düşük kontrast** | Yüklemeden önce `cv2.convertScaleAbs` ile kontrastı artırın. | +| **Birden fazla dil** | `ocr_engine.language = ["en", "es"]` olarak ayarlayın (veya hedef dilleriniz). | +| **Büyük belgeler** | Bellek dalgalanmalarını önlemek için sayfaları toplu işleyin. | +| **Özel semboller** | `ocr_engine.add_custom_words([...])` ile özel bir sözlük ekleyin. | + +## Görsel Genel Bakış + +Aşağıda, fotoğraflanmış bir nottan temiz metne kadar iş akışını gösteren bir yer tutucu görüntü bulunmaktadır. Alt metin, birincil anahtar kelimeyi içerir ve görüntüyü SEO‑dostu yapar. 
+ +![el yazısı not görüntüsü üzerinde OCR nasıl kullanılır](/images/handwritten_ocr_flow.png "el yazısı not görüntüsü üzerinde OCR nasıl kullanılır") + +## Tam, Çalıştırılabilir Betik + +Tüm parçaları bir araya getirerek, işte tamamen kopyala‑yapıştır‑hazır program: + +```python +# Complete script: Convert a handwritten image to clean text using OCR + +import ocr + +def main(): + # 1️⃣ Initialize OCR engine for handwritten recognition + ocr_engine = ocr.OcrEngine() + ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN + + # 2️⃣ Load the image containing the handwritten note + handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") + + # 3️⃣ Perform OCR to extract raw text + raw_result = ocr_engine.recognize(handwritten_image) + print("Raw OCR output:") + print(raw_result.text) + + # 4️⃣ (Optional) Run AI post‑processor for cleaner output + polished_result = ocr_engine.run_postprocessor(raw_result) + + # 5️⃣ Show the polished, readable text + print("\nPolished OCR output:") + print(polished_result.text) + + # 6️⃣ Simple sanity check + if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") + else: + print("\n✅ OCR succeeded! You can now save or further process the text.") + +if __name__ == "__main__": + main() +``` + +**Beklenen çıktı (örnek)** + +``` +Raw OCR output: +T0d@y I w3nt to the market +and bought 5 aplpes, 2 bananas, +and a loaf of bread. + +Polished OCR output: +Today I went to the market and bought 5 apples, 2 bananas, and a loaf of bread. +``` + +Post‑processor'ın “T0d@y” yazım hatasını nasıl düzelttiğine ve boşlukları nasıl normalleştirdiğine bakın. + +## Yaygın Tuzaklar ve Pro İpuçları + +- **Görüntü boyutu önemlidir** – OCR motorları genellikle giriş boyutunu 4 K × 4 K ile sınırlar. Büyük fotoğrafları önceden yeniden boyutlandırın. +- **El yazısı stili** – Cursive (el yazısı) ve blok harfler doğruluğu etkileyebilir. Kaynağı kontrol edebiliyorsanız (ör. 
dijital kalem), en iyi sonuç için blok harfleri tercih edin. +- **Toplu işleme** – Onlarca notla çalışırken, betiği bir döngüye sarın ve her sonucu bir CSV ya da SQLite DB'de saklayın. +- **Bellek sızıntıları** – Bazı SDK'lar iç tamponları tutar; yavaşlama fark ederseniz `ocr_engine.dispose()` çağırın. + +## Sonraki Adımlar – Basit OCR'ın Ötesine Geçmek + +Artık tek bir görüntü için **how to use OCR** konusunda uzmanlaştığınıza göre, şu uzantıları düşünün: + +1. **Bulut depolama ile bütünleştirme** – Görüntüleri AWS S3 veya Azure Blob'dan çekin, aynı işlem hattını çalıştırın ve sonuçları geri gönderin. +2. **Dil algılama ekleyin** – `ocr_engine.detect_language()` kullanarak sözlükleri otomatik olarak değiştirin. +3. **NLP ile birleştirin** – Temizlenmiş metni spaCy veya NLTK'ye vererek varlıkları, tarihleri veya eylem maddelerini çıkarın. +4. **REST uç noktası oluşturun** – Betiği Flask veya FastAPI ile sararak diğer hizmetlerin görüntü POST etmesini ve JSON‑kodlu metin almasını sağlayın. + +Bu fikirlerin tümü hâlâ **recognize handwritten text**, **extract handwritten text** ve **convert handwritten image** temel kavramları etrafında dönüyor — muhtemelen bir sonraki aramanızda kullanacağınız tam ifadeler. + +--- + +### TL;DR + +Size **how to use OCR**'ı gösterdik; el yazısı metni tanıma, çıkartma ve sonucu kullanılabilir bir dizeye parlatma. Tam betik çalıştırmaya hazır, iş akışı adım adım açıklandı ve artık yaygın kenar durumları için bir kontrol listeniz var. Bir sonraki toplantı notunuzun fotoğrafını çekin, betiğe ekleyin ve makinenin sizin yerinize yazmasını sağlayın. + +Kodlamaktan keyif alın, ve notlarınız her zaman okunaklı olsun! 
{{< /blocks/products/pf/tutorial-page-section >}}
{{< /blocks/products/pf/main-container >}}
{{< /blocks/products/pf/main-wrap-class >}}
{{< blocks/products/products-backtop-button >}}
diff --git a/ocr/turkish/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md b/ocr/turkish/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md
new file mode 100644
index 000000000..78b08eb4b
--- /dev/null
+++ b/ocr/turkish/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md
@@ -0,0 +1,187 @@
---
category: general
date: 2026-03-28
description: Görüntüde OCR gerçekleştir ve sınırlayıcı kutu koordinatlarıyla temiz metin al. OCR'yi nasıl çıkaracağınızı, OCR'yi nasıl temizleyeceğinizi ve sonuçları adım adım nasıl görüntüleyeceğinizi öğrenin.
draft: false
keywords:
- perform OCR on image
- how to extract OCR
- how to clean OCR
- display bounding box coordinates
- OCR post‑processing
- OCR bounding boxes
language: tr
og_description: Görüntüde OCR yapın, çıktıyı temizleyin ve sınırlayıcı kutu koordinatlarını kısa bir öğreticide gösterin.
og_title: Görselde OCR Yap – Temiz Sonuçlar ve Sınır Kutuları
tags:
- OCR
- Computer Vision
- Python
title: Görüntüde OCR Gerçekleştir – Temiz Sonuçlar ve Sınırlama Kutusu Koordinatlarını Göster
url: /tr/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/
---

{{< blocks/products/pf/main-wrap-class >}}
{{< blocks/products/pf/main-container >}}
{{< blocks/products/pf/tutorial-page-section >}}

# Görüntüde OCR Yap – Sonuçları Temizle ve Sınırlayıcı Kutu Koordinatlarını Göster

Görüntü dosyalarında **OCR yapma** ihtiyacı duydunuz, ama karşınıza dağınık metinler çıktı ve her kelimenin resimde nerede olduğunu bilemediniz mi? Tek başınıza değilsiniz.
Birçok projede—fatura dijitalleştirme, fiş tarama veya basit metin çıkarma—ham OCR çıktısını elde etmek sadece ilk engeldir. İyi haber? Bu çıktıyı temizleyebilir ve çok fazla tekrarlayan kod yazmadan her bölgenin sınırlayıcı kutu koordinatlarını anında görebilirsiniz. + +Bu rehberde **OCR çıkarımını nasıl yapacağınızı**, bir **OCR temizleme** sonrası işlemcisini nasıl çalıştıracağınızı ve sonunda her temizlenmiş bölge için **sınırlayıcı kutu koordinatlarını nasıl göstereceğinizi** adım adım inceleyeceğiz. Sonunda bulanık bir fotoğrafı, sonraki işlemler için hazır, düzenli ve yapılandırılmış metne dönüştüren tek bir çalıştırılabilir betiğe sahip olacaksınız. + +## Gereksinimler + +- Python 3.9+ (aşağıdaki sözdizimi 3.8 ve üzeri sürümlerde çalışır) +- `recognize(..., return_structured=True)` destekleyen bir OCR motoru – örneğin, kod parçacığında kullanılan kurgusal `engine` kütüphanesi. Bunu Tesseract, EasyOCR veya bölge verisi dönen herhangi bir SDK ile değiştirin. +- Python fonksiyonları ve döngülerine temel aşinalık +- Taramak istediğiniz bir görüntü dosyası (PNG, JPG, vb.) + +> **Pro ipucu:** Tesseract kullanıyorsanız, `pytesseract.image_to_data` fonksiyonu zaten sınırlayıcı kutuları verir. Sonucunu, aşağıda gösterilen `engine.recognize` API'sini taklit eden küçük bir adaptöre sarabilirsiniz. + +--- + +![görüntüde OCR yapma örneği](image-placeholder.png "görüntüde OCR yapma örneği") + +*Alt metin: görüntüde OCR yapma ve sınırlayıcı kutu koordinatlarını görselleştirme diyagramı* + +## Adım 1 – Görüntüde OCR Yap ve Yapılandırılmış Bölgeleri Al + +İlk olarak OCR motorundan yalnızca düz metin değil, aynı zamanda metin bölgelerinin yapılandırılmış bir listesini döndürmesini istemeniz gerekir. Bu liste ham dizeyi ve onu çevreleyen dikdörtgeni içerir. 
+ +```python +import engine # replace with your actual OCR library +from pathlib import Path + +# Load the image you want to process +image_path = Path("sample_invoice.jpg") +image = engine.load_image(image_path) + +# Step 1: Perform OCR and request a structured list of text regions +raw_result = engine.recognize(image, return_structured=True) +``` + +**Neden Önemli:** +Sadece düz metin istediğinizde uzamsal bağlamı kaybedersiniz. Yapılandırılmış veri, daha sonra **sınırlayıcı kutu koordinatlarını görüntülemenizi**, metni tablolarla hizalamanızı veya kesin konumları sonraki bir modele beslemenizi sağlar. + +## Adım 2 – OCR Çıktısını Bir Post‑İşlemciyle Nasıl Temizlersiniz + +OCR motorları karakterleri tespit etmede iyidir, ancak genellikle gereksiz boşluklar, satır sonu artefaktları veya hatalı tanınan semboller bırakır. Bir post‑işlemci metni normalleştirir, yaygın OCR hatalarını düzeltir ve boşlukları temizler. + +```python +# Step 2: Clean the plain‑text of each region using the post‑processor +processed_result = engine.run_postprocessor(raw_result) +``` + +Kendi temizleyicinizi oluşturuyorsanız, şunları göz önünde bulundurun: + +- ASCII olmayan karakterleri kaldırma (`re.sub(r'[^\x00-\x7F]+',' ', text)`) +- Birden fazla boşluğu tek bir boşluğa indirgeme +- `pyspellchecker` gibi bir yazım denetleyicisi uygulamak, belirgin yazım hatalarını düzeltmek için + +**Neden Önemli:** +Düzenli bir dize, arama, indeksleme ve sonraki NLP boru hatlarını çok daha güvenilir kılar. Başka bir deyişle, **OCR nasıl temizlenir** çoğu zaman kullanılabilir bir veri seti ile baş ağrısı arasındaki farktır. + +## Adım 3 – Her Temizlenmiş Bölge İçin Sınırlayıcı Kutu Koordinatlarını Göster + +Metin artık düzenli olduğuna göre, her bölgeyi döngüye alıp dikdörtgenini ve temizlenmiş dizesini yazdırıyoruz. İşte sonunda **sınırlayıcı kutu koordinatlarını gösterdiğimiz** kısım. 
+ +```python +# Step 3 – Iterate over the cleaned regions and display their bounding box and text +for text_region in processed_result.regions: + # Each region has a .bounding_box attribute (x, y, width, height) + bbox = text_region.bounding_box + print(f"[{bbox}] {text_region.text}") +``` + +**Örnek çıktı** + +``` +[(34, 120, 210, 30)] Invoice #12345 +[(34, 160, 420, 28)] Date: 2026‑03‑01 +[(34, 200, 380, 28)] Total Amount: $1,254.00 +``` + +Artık bu koordinatları bir çizim kütüphanesine (ör. OpenCV) besleyerek orijinal görüntü üzerine kutular çizebilir veya daha sonraki sorgular için bir veritabanında saklayabilirsiniz. + +## Tam, Çalıştırmaya Hazır Betik + +Aşağıda üç adımı birleştiren tam program yer alıyor. Yer tutucu `engine` çağrılarını gerçek OCR SDK'nızla değiştirin. + +```python +#!/usr/bin/env python3 +""" +Perform OCR on image → clean results → display bounding box coordinates. +Author: Your Name +Date: 2026‑03‑28 +""" + +import engine # <-- replace with your OCR library +from pathlib import Path +import sys + +def main(image_path: str): + # Load image + image = engine.load_image(Path(image_path)) + + # 1️⃣ Perform OCR and ask for structured output + raw_result = engine.recognize(image, return_structured=True) + + # 2️⃣ Clean the raw text using the built‑in post‑processor + processed_result = engine.run_postprocessor(raw_result) + + # 3️⃣ Show each region's bounding box and cleaned text + print("\n=== Cleaned OCR Regions ===") + for region in processed_result.regions: + bbox = region.bounding_box # (x, y, w, h) + print(f"[{bbox}] {region.text}") + +if __name__ == "__main__": + if len(sys.argv) != 2: + print("Usage: python perform_ocr.py ") + sys.exit(1) + main(sys.argv[1]) +``` + +### Nasıl Çalıştırılır + +```bash +python perform_ocr.py sample_invoice.jpg +``` + +Yukarıdaki örnek çıktıya benzer şekilde, temizlenmiş metinle eşleşen sınırlayıcı kutu listesini görmelisiniz. 
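Yukarıdaki Pro ipucunda geçen sarmalayıcı fikrini somutlaştırmak için küçük bir taslak: `pytesseract.image_to_data(..., output_type=Output.DICT)` biçimindeki sözlüğü, betiğin beklediği `.text` ve `.bounding_box` özniteliklerine sahip nesnelere çeviren varsayımsal bir adaptör. `SimpleRegion` ve `wrap_tesseract_data` adları bu örnek için uydurulmuştur; sözlük anahtarları ise pytesseract'ın gerçek çıktı biçimine dayanır:

```python
from dataclasses import dataclass

@dataclass
class SimpleRegion:
    text: str
    bounding_box: tuple  # (x, y, w, h)

def wrap_tesseract_data(data: dict) -> list:
    """Convert a pytesseract image_to_data dict into region objects."""
    regions = []
    for i, word in enumerate(data["text"]):
        if not word.strip():  # skip empty detections
            continue
        bbox = (data["left"][i], data["top"][i],
                data["width"][i], data["height"][i])
        regions.append(SimpleRegion(word, bbox))
    return regions

# Hand-made dict mimicking pytesseract's output shape
sample = {
    "text": ["Invoice", "", "#12345"],
    "left": [34, 0, 120], "top": [120, 0, 120],
    "width": [80, 0, 90], "height": [30, 0, 30],
}
for r in wrap_tesseract_data(sample):
    print(r.bounding_box, r.text)
```

Gerçek kullanımda `sample` yerine `pytesseract.image_to_data` çıktısını geçirirsiniz; dönen liste, yukarıdaki döngüye olduğu gibi verilebilir.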
+ +## Sık Sorulan Sorular & Kenar Durumları + +| Soru | Cevap | +|----------|--------| +| **OCR motoru `return_structured` desteklemiyorsa ne olur?** | Motorun ham çıktısını (genellikle koordinatları olan kelimeler listesi) `text` ve `bounding_box` özniteliklerine sahip nesnelere dönüştüren ince bir sarmalayıcı yazın. | +| **Güven skorları alabilir miyim?** | Birçok SDK, bölge başına bir güven metriği sunar. Bunu yazdırma ifadesine ekleyin: `print(f"[{bbox}] {region.text} (conf: {region.confidence:.2f})")`. | +| **Döndürülmüş metni nasıl ele alırsınız?** | `recognize` çağırmadan önce OpenCV'nin `cv2.minAreaRect` fonksiyonuyla görüntüyü düzleştirerek ön işleme yapın. | +| **Çıktıyı JSON formatında ihtiyacım olursa ne yapmalıyım?** | `processed_result.regions` öğesini `json.dumps([r.__dict__ for r in processed_result.regions], indent=2)` ile serileştirin. | +| **Kutuları görselleştirmenin bir yolu var mı?** | Döngü içinde OpenCV kullanın: `cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0), 2)`, ardından `cv2.imwrite("annotated.jpg", img)`. | + +## Sonuç + +Şimdi **görüntüde OCR nasıl yapılır**, ham çıktının nasıl temizlenir ve her bölge için **sınırlayıcı kutu koordinatlarının nasıl gösterilir** öğrendiniz. Tanıma → post‑işlem → yineleme adımlarından oluşan üç adımlı akış, güvenilir metin çıkarımı gerektiren herhangi bir Python projesine ekleyebileceğiniz yeniden kullanılabilir bir desendir. + +### Sıradaki Adımlar? + +- **Farklı OCR arka uçlarını keşfedin** (Tesseract, EasyOCR, Google Vision) ve doğruluklarını karşılaştırın. +- Bölge verilerini aranabilir arşivler için saklamak amacıyla **bir veritabanıyla bütünleştirin**. +- Her bölgeyi uygun yazım denetleyicisine yönlendirmek için **dil tespiti ekleyin**. +- Görsel doğrulama için **kutuları orijinal görüntünün üzerine bindirin** (yukarıdaki OpenCV kod parçacığına bakın). 
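Veritabanı senaryosu için, SSS tablosunda geçen JSON serileştirmesinin küçük ve varsayımsal bir örneği: gerçek SDK'nızın bölge nesneleri farklı alan adları kullanabilir, bu yüzden burada yer tutucu bir `Region` veri sınıfı tanımlıyoruz:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Region:
    # Stand-in for the SDK's region object (hypothetical field names)
    text: str
    bounding_box: tuple  # (x, y, w, h)
    confidence: float

regions = [
    Region("Invoice #12345", (34, 120, 210, 30), 0.97),
    Region("Total Amount: $1,254.00", (34, 200, 380, 28), 0.93),
]

# Serialize to JSON so the regions can be stored or indexed later
payload = json.dumps([asdict(r) for r in regions], indent=2)
print(payload)
```

Bu JSON dizesi doğrudan bir dosyaya yazılabilir veya bir belge veritabanına eklenebilir.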
Eğer tuhaflıklarla karşılaşırsanız, en büyük kazancın sağlam bir post‑işlem adımından geldiğini unutmayın; temiz bir dize, karakterlerin ham dökümünden çok daha kolay işlenir.

Kodlamaktan keyif alın, ve OCR boru hatlarınız her zaman düzenli olsun!

{{< /blocks/products/pf/tutorial-page-section >}}
{{< /blocks/products/pf/main-container >}}
{{< /blocks/products/pf/main-wrap-class >}}
{{< blocks/products/products-backtop-button >}}
diff --git a/ocr/turkish/python/general/python-ocr-tutorial-extract-text-from-images/_index.md b/ocr/turkish/python/general/python-ocr-tutorial-extract-text-from-images/_index.md
new file mode 100644
index 000000000..b45222887
--- /dev/null
+++ b/ocr/turkish/python/general/python-ocr-tutorial-extract-text-from-images/_index.md
@@ -0,0 +1,233 @@
---
category: general
date: 2026-03-28
description: Aspose OCR Cloud ile Python’da metin çıkarma işlemini gösteren Python OCR öğreticisi. OCR için görüntüyü nasıl yükleyeceğinizi ve görüntüyü dakikalar içinde düz metne dönüştüreceğinizi öğrenin.
draft: false
keywords:
- python ocr tutorial
- extract text image python
- ocr image to text
- load image for ocr
- convert image plain text
language: tr
og_description: Python OCR öğreticisi, OCR için görüntünün nasıl yükleneceğini ve Aspose OCR Cloud kullanarak görüntüyü düz metne nasıl dönüştüreceğinizi açıklar. Tam kodu ve ipuçlarını alın.
og_title: Python OCR Eğitimi – Görsellerden Metin Çıkarma
tags:
- OCR
- Python
- Image Processing
title: Python OCR Eğitimi – Görsellerden Metin Çıkarma
url: /tr/python/general/python-ocr-tutorial-extract-text-from-images/
---

{{< blocks/products/pf/main-wrap-class >}}
{{< blocks/products/pf/main-container >}}
{{< blocks/products/pf/tutorial-page-section >}}

# Python OCR Tutorial – Extract Text from Images

Hiç karışık bir fiş fotoğrafını temiz, aranabilir bir metne dönüştürmeyi düşündünüz mü? Tek başınıza değilsiniz.
Benim deneyimime göre en büyük engel OCR motoru değil, görüntüyü doğru formata getirip düz metni sorunsuz bir şekilde çıkarmak.

Bu **python ocr tutorial** size her adımı gösteriyor—OCR için bir görüntü yükleme, tanıma çalıştırma ve sonunda görüntünün düz metnini bir Python dizesi olarak saklayıp analiz edebileceğiniz şekilde dönüştürme. Sonuna geldiğinizde **extract text image python** tarzında metin çıkarabilecek ve başlamak için hiçbir ücretli lisansa ihtiyacınız olmayacak.

## What You’ll Learn

- Aspose OCR Cloud SDK for Python'ı nasıl kurup içe aktaracağınızı öğrenin.
- **load image for OCR** (PNG, JPEG, TIFF, PDF vb.) için kesin kodu alın.
- **ocr image to text** dönüşümünü gerçekleştirmek için motoru nasıl çağıracağınızı öğrenin.
- Çok sayfalı PDF'ler veya düşük çözünürlüklü taramalar gibi yaygın kenar durumlarını nasıl yöneteceğinize dair ipuçları.
- Çıktıyı nasıl doğrulayacağınızı ve metin bozuk göründüğünde ne yapmanız gerektiğini keşfedin.

### Prerequisites

- Makinenizde Python 3.8+ yüklü olmalı.
- Ücretsiz bir Aspose Cloud hesabı (deneme sürümü lisans gerektirmez).
- pip ve sanal ortamlar hakkında temel bilgi—karmaşık bir şey gerekmez.

> **Pro tip:** Zaten bir virtualenv kullanıyorsanız, şimdi etkinleştirin. Bağımlılıkları düzenli tutar ve sürüm çakışmalarını önler.

![Python OCR öğretici ekran görüntüsü, tanınan metni gösteriyor](path/to/ocr_example.png "Python OCR öğretici – çıkarılan düz metin gösterimi")

## Step 1 – Install the Aspose OCR Cloud SDK

İlk iş olarak, Aspose'un OCR hizmetiyle iletişim kuran kütüphaneye ihtiyacımız var. Bir terminal açın ve şu komutu çalıştırın:

```bash
pip install asposeocrcloud
```

Bu tek komut en yeni SDK'yı (şu anda sürüm 23.12) indirir. Paket ihtiyacınız olan her şeyi içerir—ekstra görüntü‑işleme kütüphanelerine gerek yoktur.

## Step 2 – Initialise the OCR Engine

SDK hazır olduğuna göre, OCR motorunu başlatabiliriz. 
Yapıcı (constructor) deneme sürümü için lisans anahtarı gerektirmez, bu da işleri basitleştirir. + +```python +import asposeocrcloud as ocr + +# Initialise the OCR engine – no licence needed for trial use +ocr_engine = ocr.OcrEngine() +``` + +> **Neden önemli:** Motoru yalnızca bir kez başlatmak sonraki çağrıları hızlı tutar. Her görüntü için nesneyi yeniden oluşturursanız ağ trafiğini boşa harcarsınız. + +## Step 3 – Load Image for OCR + +İşte **load image for OCR** anahtar kelimesinin parladığı yer. SDK'nın `Image.load` metodu bir dosya yolu ya da URL kabul eder ve formatı otomatik olarak algılar (PNG, JPEG, TIFF, PDF vb.). Örnek bir fişi yükleyelim: + +```python +# Step 3: Load the input image (PNG, JPEG, TIFF, PDF …) +input_image = ocr.Image.load(r"YOUR_DIRECTORY/receipt.png") +``` + +Çok sayfalı bir PDF ile çalışıyorsanız, sadece PDF dosyasına işaret edin; SDK her sayfayı dahili olarak ayrı bir görüntü gibi ele alır. + +## Step 4 – Perform OCR Image to Text Conversion + +Görüntü bellekteyken, gerçek OCR tek bir satırda gerçekleşir. `recognize` metodu bir `OcrResult` nesnesi döndürür; bu nesne düz metni, güven skorlarını ve isterseniz daha sonra kullanabileceğiniz sınırlama kutularını içerir. + +```python +# Step 4: Perform OCR on the loaded image +ocr_result = ocr_engine.recognize(input_image) +``` + +> **Kenar durumu:** Düşük çözünürlüklü fotoğraflar (300 dpi altında) için önce görüntüyü büyütmek isteyebilirsiniz. SDK bir `Resize` yardımcı fonksiyonu sunar, ancak çoğu fiş için varsayılan ayarlar yeterlidir. + +## Step 5 – Convert Image Plain Text to a Usable String + +Bulmacanın son parçası, sonuç nesnesinden düz metni çıkarmaktır. Bu, OCR bloğunu yazdırabileceğiniz, saklayabileceğiniz veya başka bir sisteme besleyebileceğiniz bir dizeye dönüştüren **convert image plain text** adımıdır. 
+ +```python +# Step 5: Output the recognised plain text +print("Plain OCR:") +print(ocr_result.text) +``` + +Betik çalıştırıldığında aşağıdakine benzer bir çıktı görmelisiniz: + +``` +Plain OCR: +Starbucks Coffee +Date: 2026/03/27 +Total: $4.75 +Thank you! +``` + +Bu çıktı artık normal bir Python dizesi, CSV dışa aktarımı, veritabanı ekleme veya doğal dil işleme için hazır. + +## Handling Common Pitfalls + +### 1. Blank or Noisy Images + +`ocr_result.text` boş dönüyorsa, görüntü kalitesini tekrar kontrol edin. Hızlı bir çözüm, ön işleme adımı eklemektir: + +```python +# Simple preprocessing – convert to grayscale and increase contrast +processed = input_image.to_grayscale().adjust_contrast(1.2) +ocr_result = ocr_engine.recognize(processed) +``` + +### 2. Multi‑Page PDFs + +PDF beslediğinizde, `recognize` her sayfa için sonuç döndürür. Aşağıdaki gibi döngüye alın: + +```python +pdf_image = ocr.Image.load("document.pdf") +pages = pdf_image.pages # collection of page images + +for i, page in enumerate(pages, start=1): + result = ocr_engine.recognize(page) + print(f"--- Page {i} ---") + print(result.text) +``` + +### 3. Language Support + +Aspose OCR 60'tan fazla dili destekler. Dili değiştirmek için `recognize` çağırmadan önce `language` özelliğini ayarlayın: + +```python +ocr_engine.language = "fr" # French +ocr_result = ocr_engine.recognize(input_image) +``` + +## Full Working Example + +Hepsini bir araya getiren, kurulumdan kenar‑durum yönetimine kadar her şeyi kapsayan tam bir kopyala‑yapıştır betiği: + +```python +# -*- coding: utf-8 -*- +""" +Python OCR tutorial – complete example using Aspose OCR Cloud. +Demonstrates loading an image, performing OCR, and handling multi‑page PDFs. +""" + +import asposeocrcloud as ocr +import os + +def ocr_file(filepath: str, language: str = "en"): + """ + Perform OCR on a given file (image or PDF) and return plain text. 
+ """ + # Initialise engine (trial licence) + engine = ocr.OcrEngine() + engine.language = language + + # Load the file – SDK auto‑detects format + image = ocr.Image.load(filepath) + + # If it's a PDF, iterate over pages + if image.is_pdf: + all_text = [] + for page in image.pages: + result = engine.recognize(page) + all_text.append(result.text) + return "\n".join(all_text) + + # Single‑image case + result = engine.recognize(image) + return result.text + + +if __name__ == "__main__": + # Example usage – replace with your own path + sample_path = os.path.join("YOUR_DIRECTORY", "receipt.png") + + if not os.path.exists(sample_path): + raise FileNotFoundError(f"File not found: {sample_path}") + + extracted = ocr_file(sample_path) + print("=== Extracted Text ===") + print(extracted) +``` + +Betik çalıştırın (`python ocr_demo.py`) ve **ocr image to text** çıktısını doğrudan konsolda göreceksiniz. + +## Recap – What We Covered + +- **Aspose OCR Cloud** SDK'sını kurdunuz (`pip install asposeocrcloud`). +- Lisans gerektirmeden **OCR motorunu başlattınız** (deneme için mükemmel). +- **load image for OCR** nasıl yapılır gösterildi; PNG, JPEG veya PDF fark etmez. +- **ocr image to text** dönüşümünü gerçekleştirdiniz ve **convert image plain text** adımıyla kullanılabilir bir Python dizesi elde ettiniz. +- Düşük çözünürlüklü taramalar, çok sayfalı PDF'ler ve dil seçimi gibi yaygın sorunları ele aldınız. + +## Next Steps & Related Topics + +Artık **python ocr tutorial**'ı kavradığınıza göre aşağıdakileri keşfedebilirsiniz: + +- Büyük fiş klasörlerini toplu işlemek için **extract text image python**. +- OCR çıktısını **pandas** ile veri analizi için birleştirin (`df = pd.read_csv(StringIO(extracted))`). +- İnternet bağlantısının sınırlı olduğu durumlarda **Tesseract OCR**'yi yedek olarak kullanın. +- **spaCy** ile tarih, tutar ve mağaza adı gibi varlıkları tanımlamak için son‑işleme ekleyin. 
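Bu fikirlere başlangıç olarak, çıkarılan fiş metninden alan yakalamak için yalnızca standart kütüphaneyle küçük bir son‑işleme taslağı. Aşağıdaki düzenli ifadeler, yukarıdaki örnek fiş çıktısına göre seçilmiş varsayımlardır; kendi fişlerinizin biçimine göre uyarlayın.

```python
import re

def parse_receipt(text: str) -> dict:
    """Pull the date and total out of raw OCR text with simple regexes."""
    date = re.search(r"\b(\d{4}/\d{2}/\d{2})\b", text)
    total = re.search(r"Total:\s*\$?([\d.,]+)", text)
    return {
        "date": date.group(1) if date else None,
        "total": float(total.group(1).replace(",", "")) if total else None,
    }

# Sample text shaped like the OCR output shown earlier
sample = """Starbucks Coffee
Date: 2026/03/27
Total: $4.75
Thank you!"""

print(parse_receipt(sample))  # → {'date': '2026/03/27', 'total': 4.75}
```

Bu sözlükleri doğrudan `csv.DictWriter` ile bir CSV'ye yazabilir ya da toplu işleme döngüsünde pandas'a aktarabilirsiniz.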
+ +Denemekten çekinmeyin: farklı görüntü formatları deneyin, kontrastı ayarlayın veya dilleri değiştirin. OCR dünyası geniş ve yeni edindiğiniz beceriler, herhangi bir belge‑otomasyon projesi için sağlam bir temel oluşturur. + +İyi kodlamalar, ve metniniz her zaman okunabilir olsun! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/turkish/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md b/ocr/turkish/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md new file mode 100644 index 000000000..e67f8b9f4 --- /dev/null +++ b/ocr/turkish/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md @@ -0,0 +1,205 @@ +--- +category: general +date: 2026-03-28 +description: Görüntüde OCR çalıştırmayı, Hugging Face modelini otomatik olarak indirmeyi, + OCR metnini temizlemeyi ve Aspose OCR Cloud kullanarak Python’da LLM modelini yapılandırmayı + öğrenin. +draft: false +keywords: +- run OCR on image +- download hugging face model +- clean OCR text +- configure LLM model +language: tr +og_description: Görüntüde OCR çalıştırın ve çıktıyı otomatik indirilmiş bir Hugging Face + modeli kullanarak temizleyin. Bu kılavuz, Python’da LLM modelini nasıl yapılandıracağınızı + gösterir. 
+og_title: Görüntüde OCR Çalıştır – Tam Aspose OCR Cloud Öğreticisi +tags: +- OCR +- Python +- LLM +- HuggingFace +title: Aspose OCR Cloud ile Görüntüde OCR Çalıştırma – Tam Adım Adım Kılavuz +url: /tr/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Görüntüde OCR Çalıştırma – Tam Aspose OCR Cloud Öğreticisi + +Hiç bir görüntü dosyasında OCR çalıştırmanız gerekti, ancak ham çıktı karışık bir karmaşa gibi göründü mü? Benim deneyimime göre en büyük sorun tanıma değil—temizlik. Neyse ki, Aspose OCR Cloud size *OCR metnini temizleyebilen* bir LLM post‑işlemcisi ekleme imkanı sunuyor. Bu öğreticide ihtiyacınız olan her şeyi adım adım göstereceğiz: **Hugging Face modelinin indirilmesinden** LLM yapılandırmasına, OCR motorunun çalıştırılmasına ve sonunda sonucun cilalanmasına kadar. + +Bu rehberin sonunda çalıştırmaya hazır bir betiğiniz olacak: + +1. Hugging Face'ten kompakt bir Qwen 2.5 modelini çeker (sizin için otomatik indirilir). +2. Modeli ağın bir kısmını GPU, geri kalanını CPU'da çalışacak şekilde yapılandırır. +3. El yazısı not görüntüsü üzerinde OCR motorunu yürütür. +4. Tanınan metni temizlemek için LLM'yi kullanır ve size insan tarafından okunabilir bir çıktı verir. + +> **Prerequisites** – Python 3.8+, `asposeocrcloud` paketi, en az 4 GB VRAM'li bir GPU (isteğe bağlı ama tavsiye edilir) ve ilk model indirmesi için bir internet bağlantısı. + +--- + +## İhtiyacınız Olanlar + +- **Aspose OCR Cloud SDK** – `pip install asposeocrcloud` komutu ile kurun. +- **Bir örnek görüntü** – örneğin, yerel bir klasöre yerleştirilmiş `handwritten_note.jpg`. +- **GPU desteği** – CUDA destekli bir GPU'nuz varsa, betik 30 katmanı GPU'ya aktarır; aksi takdirde otomatik olarak CPU'ya geçer. +- **Yazma izni** – betik modeli `YOUR_DIRECTORY` içinde önbelleğe alır; klasörün mevcut olduğundan emin olun. 
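"Yazma izni" maddesini pratiğe dökmek için, betiği çalıştırmadan önce önbellek klasörünün var olduğunu ve ilk indirme için yeterli boş disk alanı bulunduğunu şöyle doğrulayabilirsiniz. Küçük bir taslaktır; klasör adı ve 2 GB eşiği varsayımsaldır, kendi kurulumunuza göre değiştirin.

```python
import os
import shutil

def ensure_model_cache(path: str, required_gb: float = 2.0) -> str:
    """Create the model cache folder if needed and verify free disk space."""
    os.makedirs(path, exist_ok=True)  # no error if the folder already exists
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < required_gb:
        raise OSError(f"Only {free_gb:.1f} GB free – the model needs ~{required_gb} GB")
    return os.path.abspath(path)

# Hypothetical cache folder – point this at your YOUR_DIRECTORY path
print(ensure_model_cache("model_cache"))
```

Bu kontrol, indirme yarıda kesildiğinde ortaya çıkan bozuk model dosyalarıyla uğraşmaktan daha ucuzdur.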
+ +## Adım 1 – LLM Modelini Yapılandırma (Hugging Face modelini indirme) + +İlk olarak Aspose AI'ye modeli nereden alacağını söylememiz gerekir. `AsposeAIModelConfig` sınıfı otomatik indirme, kuantizasyon ve GPU katman tahsislerini yönetir. + +```python +import asposeocrcloud as ocr +from asposeocrcloud.ai import AsposeAI, AsposeAIModelConfig + +# ---------------------------------------------------------------------- +# Step 1: Model configuration – this will download the model if it’s missing +# ---------------------------------------------------------------------- +model_config = AsposeAIModelConfig( + allow_auto_download="true", # Enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", # Repo on Hugging Face + hugging_face_quantization="int8", # Small footprint, fast inference + gpu_layers=30, # 30 layers on GPU, rest on CPU + directory_model_path=r"YOUR_DIRECTORY" # Cache folder (optional) +) +``` + +**Neden önemli?** – `int8`'e kuantize etmek bellek kullanımını büyük ölçüde azaltır (≈ 4 GB vs 12 GB). Modeli GPU ve CPU arasında bölmek, mütevazı bir RTX 3060'da bile 3 milyar parametreli bir LLM çalıştırmanıza olanak tanır. GPU'nuz yoksa `gpu_layers=0` ayarlayın ve SDK her şeyi CPU'da tutar. + +> **İpucu:** İlk çalıştırmada ~ 1.5 GB indirilecektir, bu yüzden birkaç dakika ve stabil bir bağlantı sağlayın. + +## Adım 2 – Model Yapılandırmasıyla AI Motorunu Başlatma + +Şimdi Aspose AI motorunu başlatıyor ve az önce oluşturduğumuz yapılandırmayı ona veriyoruz. + +```python +# ---------------------------------------------------------------------- +# Step 2: Initialise the AI engine – pulls the model if needed +# ---------------------------------------------------------------------- +ocr_ai = AsposeAI() +ocr_ai.initialize(model_config) # This call blocks until the model is ready +``` + +**Arka planda ne oluyor?** SDK, mevcut bir model için `directory_model_path`'i kontrol eder. 
Eşleşen bir sürüm bulursa anında yükler; aksi takdirde Hugging Face'ten GGUF dosyasını indirir, açar ve çıkarım hattını hazırlar. + +## Adım 3 – OCR Motorunu Oluşturma ve AI Post‑İşlemcisini Ekleme + +OCR motoru karakter tanımanın ağır işini yapar. `ocr_ai.run_postprocessor`'ı ekleyerek tanımanın ardından otomatik olarak **temiz OCR metni** etkinleştiriyoruz. + +```python +# ---------------------------------------------------------------------- +# Step 3: Build the OCR engine and bind the LLM post‑processor +# ---------------------------------------------------------------------- +ocr_engine = ocr.OcrEngine() +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=None) +``` + +**Neden bir post‑işlemci kullanmalı?** Ham OCR genellikle yanlış yerlerde satır sonları, hatalı noktalama işaretleri veya gereksiz semboller içerir. LLM, çıktıyı düzgün cümlelere yeniden yazar, yazım hatalarını düzeltir ve hatta eksik kelimeleri tahmin edebilir—temelde ham dökümü cilalı bir metne dönüştürür. + +## Adım 4 – Bir Görüntü Dosyasında OCR Çalıştırma + +Her şey bağlandığına göre, motoru bir görüntü ile besleme zamanı. + +```python +# ---------------------------------------------------------------------- +# Step 4: Load the image and run OCR +# ---------------------------------------------------------------------- +input_image = ocr.Image.load(r"YOUR_DIRECTORY/handwritten_note.jpg") +raw_result = ocr_engine.recognize(input_image) # Returns an OcrResult object +``` + +**Köşe durumu:** Görüntü büyükse (> 5 MP), işleme hızını artırmak için önce yeniden boyutlandırmak isteyebilirsiniz. SDK, bir Pillow `Image` nesnesini kabul eder, bu yüzden gerektiğinde `PIL.Image.thumbnail()` ile ön işleme yapabilirsiniz. + +## Adım 5 – AI'yi Tanınan Metni Temizletmek ve Her İki Versiyonu da Gösterme + +Son olarak daha önce eklediğimiz post‑işlemciyi çağırıyoruz. Bu adım, *temizleme öncesi* ve *sonrası* arasındaki farkı gösterir. 
+ +```python +# ---------------------------------------------------------------------- +# Step 5: Clean the OCR output using the LLM and display both results +# ---------------------------------------------------------------------- +cleaned_result = ocr_engine.run_postprocessor(raw_result) + +print("=== Before AI ===") +print(raw_result.text) + +print("\n=== After AI ===") +print(cleaned_result.text) +``` + +### Beklenen Çıktı + +``` +=== Before AI === +Th1s 1s a h@ndwr1tt3n n0te. It c0nta1ns m1st@k3s, l1n3 br3aks, & sp3c!@l ch@r@ct3rs. + +=== After AI === +This is a handwritten note. It contains mistakes, line breaks, and special characters. +``` + +LLM'nin şu şekilde çalıştığını fark edin: + +- Yaygın OCR hatalarını düzeltti (`Th1s` → `This`). +- Gereksiz sembolleri kaldırdı (`&` → `and`). +- Satır sonlarını düzgün cümlelere dönüştürdü. + +## 🎨 Görsel Genel Bakış (Görüntüde OCR Çalıştırma İş Akışı) + +![Görüntüde OCR Çalıştırma iş akışı](run_ocr_on_image_workflow.png "Model indirilmesinden temizlenmiş çıktıya kadar görüntüde OCR çalıştırma sürecini gösteren diyagram") + +Yukarıdaki diyagram, tam iş akışını özetliyor: **Hugging Face modelini indir → LLM'yi yapılandır → AI'yi başlat → OCR motoru → AI post‑işlemci → temiz OCR metni**. + +## Sık Sorulan Sorular & Uzman İpuçları + +### GPU'm yoksa ne olur? + +`AsposeAIModelConfig` içinde `gpu_layers=0` ayarlayın. Model tamamen CPU'da çalışacak, bu daha yavaş ama yine de işlevsel. Ayrıca çıkarım süresini makul tutmak için daha küçük bir modele (ör. `Qwen/Qwen2.5-1.5B‑Instruct‑GGUF`) geçebilirsiniz. + +### Modeli daha sonra nasıl değiştiririm? + +`hugging_face_repo_id` değerini güncelleyin ve `ocr_ai.initialize(model_config)`'i yeniden çalıştırın. SDK sürüm değişikliğini algılayacak, yeni modeli indirecek ve önbellekteki dosyaları değiştirecektir. + +### Post‑işlemci istemini özelleştirebilir miyim? + +Evet. `custom_settings`'e `prompt_template` anahtarı içeren bir sözlük geçirin. 
Örneğin: + +```python +custom_prompt = { + "prompt_template": "Correct the following OCR text and keep line breaks:\n{ocr_text}" +} +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=custom_prompt) +``` + +### Temizlenmiş metni bir dosyaya kaydetmeli miyim? + +Kesinlikle. Temizlemeden sonra sonucu `.txt` veya `.json` dosyasına yazarak sonraki işlemler için kullanabilirsiniz: + +```python +with open("cleaned_note.txt", "w", encoding="utf-8") as f: + f.write(cleaned_result.text) +``` + +## Sonuç + +Size Aspose OCR Cloud ile **görüntü dosyalarında OCR çalıştırma**, otomatik **Hugging Face modeli indirme**, uzmanlıkla **LLM model ayarlarını yapılandırma** ve sonunda güçlü bir LLM post‑işlemci kullanarak **OCR metnini temizleme** konularını gösterdik. Tüm süreç tek bir, çalıştırması kolay Python betiğine sığar ve hem GPU destekli hem de sadece CPU'lu makinelerde çalışır. + +Eğer bu iş akışına hâkimseniz, şu konularda denemeler yapabilirsiniz: + +- **Farklı LLM'ler** – daha geniş bir bağlam penceresi için `meta-llama/Meta-Llama-3-8B‑Instruct‑GGUF` deneyin. +- **Toplu işleme** – bir klasördeki görüntüler üzerinde döngü kurup temizlenmiş sonuçları bir CSV'ye toplayın. +- **Özel istemler** – AI'yi alanınıza göre özelleştirin (hukuki belgeler, tıbbi notlar vb.). + +`gpu_layers` değerini istediğiniz gibi ayarlayabilir, modeli değiştirebilir veya kendi isteminizi ekleyebilirsiniz. Gökyüzü sınırdır ve şu an elinizdeki kod bir başlangıç noktasıdır. + +İyi kodlamalar, ve OCR çıktılarınız daima temiz olsun! 
🚀 + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/vietnamese/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md b/ocr/vietnamese/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md new file mode 100644 index 000000000..4ca90bf17 --- /dev/null +++ b/ocr/vietnamese/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/_index.md @@ -0,0 +1,225 @@ +--- +category: general +date: 2026-03-28 +description: Cách sử dụng OCR để nhận dạng văn bản viết tay trong hình ảnh. Học cách + trích xuất văn bản viết tay, chuyển đổi hình ảnh viết tay và nhận kết quả sạch nhanh + chóng. +draft: false +keywords: +- how to use OCR +- recognize handwritten text +- extract handwritten text +- handwritten note to text +- convert handwritten image +language: vi +og_description: Cách sử dụng OCR để nhận dạng văn bản viết tay. Hướng dẫn này sẽ chỉ + cho bạn từng bước cách trích xuất văn bản viết tay từ hình ảnh và đạt được kết quả + hoàn thiện. +og_title: Cách Sử Dụng OCR Để Nhận Diện Văn Bản Viết Tay – Hướng Dẫn Toàn Diện +tags: +- OCR +- Handwriting Recognition +- Python +title: Cách Sử Dụng OCR Để Nhận Dạng Văn Bản Viết Bằng Tay – Hướng Dẫn Toàn Diện +url: /vi/python/general/how-to-use-ocr-to-recognize-handwritten-text-complete-guide/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Cách Sử Dụng OCR Để Nhận Diện Văn Bản Viết Tay – Hướng Dẫn Toàn Diện + +Cách sử dụng OCR cho các ghi chú viết tay là câu hỏi mà nhiều nhà phát triển đặt ra khi họ cần số hoá các bản phác thảo, biên bản họp, hoặc những ý tưởng nhanh. 
Trong hướng dẫn này, chúng tôi sẽ hướng dẫn chi tiết các bước để nhận diện văn bản viết tay, trích xuất văn bản viết tay và chuyển đổi hình ảnh viết tay thành các chuỗi sạch, có thể tìm kiếm. + +Nếu bạn đã bao giờ nhìn chằm chằm vào một bức ảnh danh sách mua sắm và tự hỏi, “Liệu tôi có thể chuyển đổi hình ảnh viết tay này thành văn bản mà không phải gõ lại mọi thứ không?” – bạn đang ở đúng nơi. Khi kết thúc, bạn sẽ có một script sẵn sàng chạy để biến **ghi chú viết tay thành văn bản** trong vài giây. + +## Những Gì Bạn Cần + +- Python 3.8+ (mã hoạt động với bất kỳ phiên bản mới nào) +- Thư viện `ocr` – cài đặt bằng `pip install ocr-sdk` (thay bằng tên gói của nhà cung cấp của bạn) +- Một bức ảnh rõ ràng của ghi chú viết tay (`hand_note.png` trong ví dụ) +- Một chút tò mò và một tách cà phê ☕️ (tùy chọn nhưng được khuyến nghị) + +Không có framework nặng, không có khóa cloud trả phí – chỉ một engine cục bộ hỗ trợ **handwritten recognition** ngay từ đầu. + +## Bước 1 – Cài Đặt Gói OCR và Nhập Vào Script + +Đầu tiên, hãy cài đặt gói phù hợp trên máy của bạn. Mở terminal và chạy: + +```bash +pip install ocr-sdk +``` + +Khi quá trình cài đặt hoàn tất, nhập module vào script của bạn: + +```python +# Step 1: Import the OCR SDK +import ocr +``` + +> **Mẹo chuyên nghiệp:** Nếu bạn đang sử dụng môi trường ảo, hãy kích hoạt nó trước khi cài đặt. Điều này giúp dự án của bạn gọn gàng và tránh xung đột phiên bản. + +## Bước 2 – Tạo Engine OCR và Bật Chế Độ Viết Tay + +Bây giờ chúng ta thực sự **cách sử dụng OCR** – chúng ta cần một instance engine biết rằng chúng ta đang xử lý các nét chữ viết liền thay vì phông chữ in. Đoạn mã sau tạo engine và chuyển sang chế độ viết tay: + +```python +# Step 2: Initialize the OCR engine for handwritten text +ocr_engine = ocr.OcrEngine() +ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN +``` + +Tại sao phải đặt `recognition_mode`? 
Bởi vì hầu hết các engine OCR mặc định chỉ phát hiện văn bản in, thường bỏ qua các vòng và góc nghiêng của ghi chú cá nhân. Bật chế độ viết tay sẽ tăng độ chính xác đáng kể. + +## Bước 3 – Tải Ảnh Muốn Chuyển Đổi (Convert Handwritten Image) + +Ảnh là nguyên liệu thô cho bất kỳ công việc OCR nào. Đảm bảo ảnh của bạn được lưu ở định dạng không mất dữ liệu (PNG hoạt động tốt) và văn bản đủ rõ để đọc. Sau đó tải nó như sau: + +```python +# Step 3: Load the handwritten image you want to convert +handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") +``` + +Nếu ảnh nằm cùng thư mục với script, bạn có thể chỉ dùng `"hand_note.png"` thay vì đường dẫn đầy đủ. + +> **Nếu ảnh bị mờ?** Hãy thử tiền xử lý bằng OpenCV (ví dụ, `cv2.cvtColor` để chuyển sang grayscale, `cv2.threshold` để tăng độ tương phản) trước khi đưa vào engine OCR. + +## Bước 4 – Chạy Engine Nhận Diện Để Trích Xuất Văn Bản Viết Tay + +Với engine đã sẵn sàng và ảnh đã được nạp vào bộ nhớ, chúng ta cuối cùng có thể **trích xuất văn bản viết tay**. Phương thức `recognize` trả về một đối tượng kết quả thô chứa văn bản cùng với điểm tin cậy. + +```python +# Step 4: Perform OCR and get the raw result +raw_result = ocr_engine.recognize(handwritten_image) +print("Raw OCR output:") +print(raw_result.text) +``` + +Kết quả thô thường có thể chứa các dấu ngắt dòng lạc hoặc ký tự bị nhận dạng sai, đặc biệt nếu chữ viết tay lộn xộn. Đó là lý do có bước tiếp theo. + +## Bước 5 – (Tùy Chọn) Tinh Chỉnh Kết Quả Bằng Bộ Xử Lý AI Sau Khi Nhận Diện + +Hầu hết các SDK OCR hiện đại đi kèm với một bộ xử lý AI nhẹ nhàng giúp làm sạch khoảng cách, sửa các lỗi OCR phổ biến và chuẩn hoá ký tự xuống dòng. 
Chạy nó rất đơn giản: + +```python +# Step 5: Refine the raw OCR output (handwritten note to text) +polished_result = ocr_engine.run_postprocessor(raw_result) + +# Display the cleaned, readable text +print("\nPolished OCR output:") +print(polished_result.text) +``` + +Nếu bạn bỏ qua bước này, vẫn sẽ nhận được văn bản có thể sử dụng, nhưng việc **chuyển đổi ghi chú viết tay thành văn bản** sẽ có chút thô ráp. Bộ xử lý sau rất hữu ích cho các ghi chú có dấu đầu dòng hoặc từ hỗn hợp chữ hoa/chữ thường. + +## Bước 6 – Kiểm Tra Kết Quả và Xử Lý Các Trường Hợp Đặc Biệt + +Sau khi in ra kết quả đã được tinh chỉnh, hãy kiểm tra lại để chắc chắn mọi thứ đúng. Dưới đây là một kiểm tra nhanh bạn có thể thêm: + +```python +# Step 6: Simple verification +if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") +else: + print("\n✅ OCR succeeded! You can now save or further process the text.") +``` + +**Danh sách kiểm tra các trường hợp đặc biệt** + +| Tình huống | Cần làm gì | +|-----------|------------| +| **Độ tương phản rất thấp** | Tăng độ tương phản bằng `cv2.convertScaleAbs` trước khi tải. | +| **Nhiều ngôn ngữ** | Đặt `ocr_engine.language = ["en", "es"]` (hoặc các ngôn ngữ mục tiêu của bạn). | +| **Tài liệu lớn** | Xử lý các trang theo lô để tránh tăng đột biến bộ nhớ. | +| **Ký hiệu đặc biệt** | Thêm từ điển tùy chỉnh qua `ocr_engine.add_custom_words([...])`. | + +## Tổng Quan Hình Ảnh + +Dưới đây là một hình ảnh placeholder minh họa quy trình – từ một ghi chú được chụp ảnh đến văn bản sạch. Văn bản alt chứa từ khóa chính, giúp hình ảnh thân thiện với SEO. 
+ +![how to use OCR on a handwritten note image](/images/handwritten_ocr_flow.png "how to use OCR on a handwritten note image") + +## Script Đầy Đủ, Có Thể Chạy Ngay + +Kết hợp tất cả các phần lại, đây là chương trình hoàn chỉnh, sẵn sàng sao chép‑dán: + +```python +# Complete script: Convert a handwritten image to clean text using OCR + +import ocr + +def main(): + # 1️⃣ Initialize OCR engine for handwritten recognition + ocr_engine = ocr.OcrEngine() + ocr_engine.recognition_mode = ocr.RecognitionMode.HANDWRITTEN + + # 2️⃣ Load the image containing the handwritten note + handwritten_image = ocr.Image.load(r"YOUR_DIRECTORY/hand_note.png") + + # 3️⃣ Perform OCR to extract raw text + raw_result = ocr_engine.recognize(handwritten_image) + print("Raw OCR output:") + print(raw_result.text) + + # 4️⃣ (Optional) Run AI post‑processor for cleaner output + polished_result = ocr_engine.run_postprocessor(raw_result) + + # 5️⃣ Show the polished, readable text + print("\nPolished OCR output:") + print(polished_result.text) + + # 6️⃣ Simple sanity check + if not polished_result.text.strip(): + raise ValueError("OCR returned an empty string – check image quality.") + else: + print("\n✅ OCR succeeded! You can now save or further process the text.") + +if __name__ == "__main__": + main() +``` + +**Kết quả mong đợi (ví dụ)** + +``` +Raw OCR output: +T0d@y I w3nt to the market +and bought 5 aplpes, 2 bananas, +and a loaf of bread. + +Polished OCR output: +Today I went to the market and bought 5 apples, 2 bananas, and a loaf of bread. +``` + +Lưu ý cách bộ xử lý sau đã sửa lỗi “T0d@y” và chuẩn hoá khoảng cách. + +## Những Sai Lầm Thường Gặp & Mẹo Chuyên Nghiệp + +- **Kích thước ảnh quan trọng** – Các engine OCR thường giới hạn kích thước đầu vào ở 4 K × 4 K. Hãy thay đổi kích thước ảnh lớn trước. +- **Phong cách viết tay** – Chữ viết liền (cursive) so với chữ khối có thể ảnh hưởng đến độ chính xác. 
Nếu bạn kiểm soát nguồn (ví dụ, bút kỹ thuật số), khuyến khích sử dụng chữ khối để có kết quả tốt nhất. +- **Xử lý hàng loạt** – Khi làm việc với hàng chục ghi chú, hãy bọc script trong vòng lặp và lưu mỗi kết quả vào CSV hoặc cơ sở dữ liệu SQLite. +- **Rò rỉ bộ nhớ** – Một số SDK giữ bộ đệm nội bộ; gọi `ocr_engine.dispose()` sau khi hoàn thành nếu bạn nhận thấy chậm lại. + +## Các Bước Tiếp Theo – Vượt Qua OCR Cơ Bản + +Bây giờ bạn đã thành thạo **cách sử dụng OCR** cho một hình ảnh duy nhất, hãy xem xét các mở rộng sau: + +1. **Tích hợp với lưu trữ đám mây** – Lấy ảnh từ AWS S3 hoặc Azure Blob, chạy cùng quy trình và đẩy kết quả trở lại. +2. **Thêm phát hiện ngôn ngữ** – Sử dụng `ocr_engine.detect_language()` để tự động chuyển đổi từ điển. +3. **Kết hợp với NLP** – Đưa văn bản đã làm sạch vào spaCy hoặc NLTK để trích xuất thực thể, ngày tháng hoặc các mục hành động. +4. **Tạo endpoint REST** – Đóng gói script trong Flask hoặc FastAPI để các dịch vụ khác có thể POST ảnh và nhận văn bản dạng JSON. + +Tất cả những ý tưởng này vẫn xoay quanh các khái niệm cốt lõi **recognize handwritten text**, **extract handwritten text**, và **convert handwritten image**—những cụm từ bạn có thể sẽ tìm kiếm tiếp theo. + +--- + +### TL;DR + +Chúng tôi đã chỉ cho bạn **cách sử dụng OCR** để nhận diện văn bản viết tay, trích xuất nó và tinh chỉnh kết quả thành một chuỗi có thể sử dụng. Script đầy đủ đã sẵn sàng chạy, quy trình được giải thích từng bước, và bạn đã có danh sách kiểm tra cho các trường hợp đặc biệt. Hãy chụp một bức ảnh ghi chú cuộc họp tiếp theo, đưa vào script, và để máy móc thực hiện việc gõ cho bạn. + +Chúc lập trình vui vẻ, và mong các ghi chú của bạn luôn dễ đọc! 
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/vietnamese/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md b/ocr/vietnamese/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md new file mode 100644 index 000000000..600cca6c6 --- /dev/null +++ b/ocr/vietnamese/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/_index.md @@ -0,0 +1,185 @@ +--- +category: general +date: 2026-03-28 +description: Thực hiện OCR trên hình ảnh và lấy văn bản sạch cùng tọa độ hộp bao. + Học cách trích xuất OCR, làm sạch OCR và hiển thị kết quả từng bước. +draft: false +keywords: +- perform OCR on image +- how to extract OCR +- how to clean OCR +- display bounding box coordinates +- OCR post‑processing +- OCR bounding boxes +language: vi +og_description: Thực hiện OCR trên hình ảnh, làm sạch kết quả và hiển thị tọa độ hộp + bao trong một hướng dẫn ngắn gọn. +og_title: Thực hiện OCR trên hình ảnh – Kết quả sạch và hộp bao +tags: +- OCR +- Computer Vision +- Python +title: Thực hiện OCR trên hình ảnh – Làm sạch kết quả và hiển thị tọa độ hộp bao +url: /vi/python/general/perform-ocr-on-image-clean-results-and-show-bounding-box-coo/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Thực hiện OCR trên hình ảnh – Làm sạch kết quả và hiển thị tọa độ Bounding Box + +Bạn đã bao giờ cần **perform OCR on image** nhưng luôn nhận được văn bản lộn xộn và không chắc mỗi từ nằm ở đâu trên ảnh? Bạn không phải là người duy nhất. Trong nhiều dự án—số hoá hoá đơn, quét biên lai, hay chỉ đơn giản là trích xuất văn bản—việc có được đầu ra OCR thô chỉ là rào cản đầu tiên. Tin tốt là gì? 
Bạn có thể làm sạch đầu ra đó và ngay lập tức xem tọa độ bounding box của mỗi vùng mà không cần viết một loạt mã lặp lại. + +Trong hướng dẫn này, chúng ta sẽ đi qua **cách trích xuất OCR**, chạy một **cách làm sạch OCR** post‑processor, và cuối cùng **hiển thị tọa độ bounding box** cho mỗi vùng đã được làm sạch. Khi kết thúc, bạn sẽ có một script duy nhất, có thể chạy ngay, biến một bức ảnh mờ thành văn bản gọn gàng, có cấu trúc, sẵn sàng cho các bước xử lý tiếp theo. + +## Những gì bạn cần + +- Python 3.9+ (cú pháp dưới đây hoạt động trên 3.8 và mới hơn) +- Một engine OCR hỗ trợ `recognize(..., return_structured=True)` – ví dụ, một thư viện giả `engine` được dùng trong đoạn mã. Thay thế bằng Tesseract, EasyOCR, hoặc bất kỳ SDK nào trả về dữ liệu vùng. +- Kiến thức cơ bản về hàm và vòng lặp trong Python +- Một file ảnh bạn muốn quét (PNG, JPG, v.v.) + +> **Mẹo chuyên nghiệp:** Nếu bạn đang dùng Tesseract, hàm `pytesseract.image_to_data` đã cung cấp sẵn bounding box. Bạn có thể bọc kết quả của nó trong một adapter nhỏ để mô phỏng API `engine.recognize` như dưới đây. + +--- + +![diagram showing how to perform OCR on image and visualize bounding box coordinates](image-placeholder.png "diagram showing how to perform OCR on image and visualize bounding box coordinates") + +*Alt text: sơ đồ minh họa cách thực hiện OCR trên hình ảnh và hiển thị tọa độ bounding box* + +## Bước 1 – Thực hiện OCR trên hình ảnh và lấy các vùng có cấu trúc + +Điều đầu tiên là yêu cầu engine OCR trả về không chỉ văn bản thuần mà còn một danh sách có cấu trúc các vùng văn bản. Danh sách này chứa chuỗi thô và hình chữ nhật bao quanh nó. 
+ +```python +import engine # replace with your actual OCR library +from pathlib import Path + +# Load the image you want to process +image_path = Path("sample_invoice.jpg") +image = engine.load_image(image_path) + +# Step 1: Perform OCR and request a structured list of text regions +raw_result = engine.recognize(image, return_structured=True) +``` + +**Tại sao điều này quan trọng:** +Khi bạn chỉ yêu cầu văn bản thuần, bạn sẽ mất ngữ cảnh không gian. Dữ liệu có cấu trúc cho phép bạn sau này **hiển thị tọa độ bounding box**, căn chỉnh văn bản với bảng, hoặc cung cấp vị trí chính xác cho mô hình downstream. + +## Bước 2 – Cách làm sạch đầu ra OCR bằng post‑processor + +Các engine OCR giỏi trong việc nhận dạng ký tự, nhưng chúng thường để lại các khoảng trắng thừa, ký tự ngắt dòng, hoặc ký tự nhận dạng sai. Một post‑processor sẽ chuẩn hoá văn bản, sửa các lỗi OCR phổ biến, và cắt bỏ khoảng trắng thừa. + +```python +# Step 2: Clean the plain‑text of each region using the post‑processor +processed_result = engine.run_postprocessor(raw_result) +``` + +Nếu bạn tự xây dựng bộ làm sạch, hãy cân nhắc: + +- Loại bỏ các ký tự không phải ASCII (`re.sub(r'[^\x00-\x7F]+',' ', text)`) +- Gộp nhiều khoảng trắng thành một khoảng trắng duy nhất +- Áp dụng bộ kiểm tra chính tả như `pyspellchecker` để sửa các lỗi chính tả rõ ràng + +**Tại sao bạn nên quan tâm:** +Một chuỗi gọn gàng giúp việc tìm kiếm, lập chỉ mục, và các pipeline NLP downstream trở nên đáng tin cậy hơn rất nhiều. Nói cách khác, **how to clean OCR** thường là yếu tố quyết định giữa một bộ dữ liệu có thể sử dụng và một cơn đau đầu. + +## Bước 3 – Hiển thị tọa độ Bounding Box cho mỗi vùng đã được làm sạch + +Bây giờ văn bản đã sạch sẽ, chúng ta lặp qua từng vùng, in ra hình chữ nhật và chuỗi đã được làm sạch. Đây là phần mà chúng ta cuối cùng **hiển thị tọa độ bounding box**. 
+ +```python +# Step 3 – Iterate over the cleaned regions and display their bounding box and text +for text_region in processed_result.regions: + # Each region has a .bounding_box attribute (x, y, width, height) + bbox = text_region.bounding_box + print(f"[{bbox}] {text_region.text}") +``` + +**Kết quả mẫu** + +``` +[(34, 120, 210, 30)] Invoice #12345 +[(34, 160, 420, 28)] Date: 2026‑03‑01 +[(34, 200, 380, 28)] Total Amount: $1,254.00 +``` + +Bạn có thể đưa các tọa độ này vào một thư viện vẽ (ví dụ, OpenCV) để phủ các hộp lên ảnh gốc, hoặc lưu chúng vào cơ sở dữ liệu để truy vấn sau. + +## Script đầy đủ, sẵn sàng chạy + +Dưới đây là chương trình hoàn chỉnh kết hợp ba bước trên. Thay thế các lời gọi `engine` placeholder bằng SDK OCR thực tế của bạn. + +```python +#!/usr/bin/env python3 +""" +Perform OCR on image → clean results → display bounding box coordinates. +Author: Your Name +Date: 2026‑03‑28 +""" + +import engine  # <-- replace with your OCR library +from pathlib import Path +import sys + +def main(image_path: str): + # Load image + image = engine.load_image(Path(image_path)) + + # 1️⃣ Perform OCR and ask for structured output + raw_result = engine.recognize(image, return_structured=True) + + # 2️⃣ Clean the raw text using the built‑in post‑processor + processed_result = engine.run_postprocessor(raw_result) + + # 3️⃣ Show each region's bounding box and cleaned text + print("\n=== Cleaned OCR Regions ===") + for region in processed_result.regions: + bbox = region.bounding_box  # (x, y, w, h) + print(f"[{bbox}] {region.text}") + +if __name__ == "__main__": + if len(sys.argv) != 2: + print("Usage: python perform_ocr.py <image_path>") + sys.exit(1) + main(sys.argv[1]) +``` + +### Cách chạy + +```bash +python perform_ocr.py sample_invoice.jpg +``` + +Bạn sẽ thấy danh sách các bounding box kèm theo văn bản đã được làm sạch, giống như kết quả mẫu ở trên.
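Mẹo ở phần «Những gì bạn cần» có nhắc đến việc bọc `pytesseract.image_to_data` thành một adapter. Dưới đây là một bản phác thảo thuần Python theo hướng đó: hàm nhận một dict theo định dạng `image_to_data(..., output_type=Output.DICT)` của pytesseract (các khóa `text`, `left`, `top`, `width`, `height` là của pytesseract; lớp `SimpleRegion` chỉ là giả định minh họa) và trả về danh sách vùng có `text` và `bounding_box`, khớp với vòng lặp ở Bước 3:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class SimpleRegion:
    """Illustrative stand-in for the engine's region object."""
    text: str
    bounding_box: Tuple[int, int, int, int]  # (x, y, w, h)

def regions_from_tesseract(data: Dict[str, list]) -> List[SimpleRegion]:
    """Convert a pytesseract image_to_data dict into region objects.

    `data` is expected to hold parallel lists under the keys
    'text', 'left', 'top', 'width', 'height' (pytesseract's TSV layout).
    Blank entries (layout rows) are skipped.
    """
    regions = []
    for i, word in enumerate(data["text"]):
        if not word.strip():
            continue  # Tesseract emits empty strings for page/block rows
        bbox = (data["left"][i], data["top"][i],
                data["width"][i], data["height"][i])
        regions.append(SimpleRegion(text=word, bounding_box=bbox))
    return regions

# Demo with a hand-made dict – no Tesseract install needed
fake = {
    "text":   ["Invoice", "", "#12345"],
    "left":   [34, 0, 120],
    "top":    [120, 0, 120],
    "width":  [80, 0, 90],
    "height": [30, 0, 30],
}
for r in regions_from_tesseract(fake):
    print(f"[{r.bounding_box}] {r.text}")
```

Với adapter như vậy, kết quả của Tesseract có thể đi qua đúng vòng lặp hiển thị ở Bước 3 mà không cần sửa phần còn lại của script.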
+ +## Câu hỏi thường gặp & Trường hợp đặc biệt + +| Câu hỏi | Trả lời | +|----------|--------| +| **Nếu engine OCR không hỗ trợ `return_structured` thì sao?** | Viết một wrapper mỏng để chuyển đổi đầu ra thô của engine (thường là danh sách các từ kèm tọa độ) thành các đối tượng có thuộc tính `text` và `bounding_box`. | +| **Có thể lấy điểm tin cậy (confidence scores) không?** | Nhiều SDK cung cấp chỉ số confidence cho mỗi vùng. Thêm nó vào câu lệnh in: `print(f"[{bbox}] {region.text} (conf: {region.confidence:.2f})")`. | +| **Làm sao xử lý văn bản bị quay?** | Tiền xử lý ảnh bằng `cv2.minAreaRect` của OpenCV để cân chỉnh trước khi gọi `recognize`. | +| **Nếu tôi cần đầu ra ở dạng JSON?** | Serialize `processed_result.regions` bằng `json.dumps([r.__dict__ for r in processed_result.regions], indent=2)`. | +| **Có cách nào để trực quan hoá các hộp không?** | Dùng OpenCV: `cv2.rectangle(img, (x, y), (x+w, y+h), (0,255,0), 2)` trong vòng lặp, sau đó `cv2.imwrite("annotated.jpg", img)`. | + +## Kết luận + +Bạn vừa học được **cách thực hiện OCR trên hình ảnh**, làm sạch đầu ra thô, và **hiển thị tọa độ bounding box** cho mỗi vùng. Quy trình ba bước — nhận dạng → post‑process → lặp — là một mẫu có thể tái sử dụng trong bất kỳ dự án Python nào cần trích xuất văn bản đáng tin cậy. + +### Tiếp theo? + +- **Khám phá các back‑end OCR khác nhau** (Tesseract, EasyOCR, Google Vision) và so sánh độ chính xác. +- **Tích hợp với cơ sở dữ liệu** để lưu trữ dữ liệu vùng cho kho lưu trữ có thể tìm kiếm. +- **Thêm phát hiện ngôn ngữ** để định hướng mỗi vùng qua bộ kiểm tra chính tả phù hợp. +- **Phủ các hộp lên ảnh gốc** để xác minh trực quan (xem đoạn mã OpenCV ở trên). + +Nếu gặp bất kỳ vấn đề nào, hãy nhớ rằng lợi thế lớn nhất đến từ một bước post‑processing vững chắc; một chuỗi sạch sẽ dễ làm việc hơn rất nhiều so với một đống ký tự thô. + +Chúc lập trình vui vẻ, và chúc các pipeline OCR của bạn luôn gọn gàng! 
+ +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/vietnamese/python/general/python-ocr-tutorial-extract-text-from-images/_index.md b/ocr/vietnamese/python/general/python-ocr-tutorial-extract-text-from-images/_index.md new file mode 100644 index 000000000..e9e3b211f --- /dev/null +++ b/ocr/vietnamese/python/general/python-ocr-tutorial-extract-text-from-images/_index.md @@ -0,0 +1,232 @@ +--- +category: general +date: 2026-03-28 +description: Hướng dẫn OCR Python cho thấy cách trích xuất văn bản từ hình ảnh bằng + Python với Aspose OCR Cloud. Học cách tải hình ảnh để OCR và chuyển đổi hình ảnh + thành văn bản thuần trong vài phút. +draft: false +keywords: +- python ocr tutorial +- extract text image python +- ocr image to text +- load image for ocr +- convert image plain text +language: vi +og_description: Hướng dẫn OCR Python giải thích cách tải ảnh để OCR và chuyển đổi + văn bản thuần từ ảnh bằng Aspose OCR Cloud. Nhận mã đầy đủ và các mẹo. +og_title: Hướng Dẫn OCR Python – Trích Xuất Văn Bản Từ Hình Ảnh +tags: +- OCR +- Python +- Image Processing +title: Hướng dẫn OCR bằng Python – Trích xuất văn bản từ hình ảnh +url: /vi/python/general/python-ocr-tutorial-extract-text-from-images/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Python OCR Tutorial – Trích xuất văn bản từ hình ảnh + +Bạn đã bao giờ tự hỏi làm thế nào để biến một bức ảnh biên lai lộn xộn thành văn bản sạch sẽ, có thể tìm kiếm không? Bạn không phải là người duy nhất. Theo kinh nghiệm của tôi, rào cản lớn nhất không phải là engine OCR mà là việc đưa hình ảnh vào định dạng đúng và trích xuất văn bản thuần túy một cách suôn sẻ. 
+ +Bài **python ocr tutorial** này sẽ hướng dẫn bạn qua từng bước — tải hình ảnh cho OCR, chạy quá trình nhận dạng, và cuối cùng chuyển văn bản thuần từ hình ảnh thành một chuỗi Python mà bạn có thể lưu trữ hoặc phân tích. Khi hoàn thành, bạn sẽ có thể **extract text image python** theo phong cách, và bạn sẽ không cần bất kỳ giấy phép trả phí nào để bắt đầu. + +## Những gì bạn sẽ học + +- Cách cài đặt và import Aspose OCR Cloud SDK cho Python. +- Mã chính xác để **load image for OCR** (PNG, JPEG, TIFF, PDF, v.v.). +- Cách gọi engine để thực hiện chuyển đổi **ocr image to text**. +- Mẹo xử lý các trường hợp góc cạnh phổ biến như PDF đa trang hoặc ảnh có độ phân giải thấp. +- Cách xác minh đầu ra và những gì cần làm nếu văn bản bị rối. + +### Yêu cầu trước + +- Python 3.8+ đã được cài đặt trên máy của bạn. +- Một tài khoản Aspose Cloud miễn phí (bản dùng thử hoạt động mà không cần giấy phép). +- Hiểu biết cơ bản về pip và môi trường ảo — không cần gì phức tạp. + +> **Pro tip:** Nếu bạn đã sử dụng virtualenv, hãy kích hoạt nó ngay bây giờ. Nó giúp các phụ thuộc của bạn gọn gàng và tránh xung đột phiên bản. + +![Ảnh chụp màn hình Python OCR tutorial hiển thị văn bản đã nhận dạng](path/to/ocr_example.png "Python OCR tutorial – hiển thị văn bản thuần đã trích xuất") + +## Bước 1 – Cài đặt Aspose OCR Cloud SDK + +Trước hết, chúng ta cần thư viện giao tiếp với dịch vụ OCR của Aspose. Mở terminal và chạy: + +```bash +pip install asposeocrcloud +``` + +Lệnh duy nhất này sẽ tải SDK mới nhất (hiện tại phiên bản 23.12). Gói này bao gồm mọi thứ bạn cần — không cần thư viện xử lý ảnh bổ sung. + +## Bước 2 – Khởi tạo Engine OCR (Từ khóa chính đang hoạt động) + +Bây giờ SDK đã sẵn sàng, chúng ta có thể khởi động engine **python ocr tutorial**. Constructor không cần bất kỳ khóa giấy phép nào cho bản dùng thử, giúp mọi thứ đơn giản hơn. 
+ +```python +import asposeocrcloud as ocr + +# Initialise the OCR engine – no licence needed for trial use +ocr_engine = ocr.OcrEngine() +``` + +> **Why this matters:** Khởi tạo engine chỉ một lần giúp các lần gọi tiếp theo nhanh hơn. Nếu bạn tạo lại đối tượng cho mỗi hình ảnh, bạn sẽ lãng phí các lượt truyền tải mạng. + +## Bước 3 – Tải hình ảnh cho OCR + +Đây là nơi từ khóa **load image for OCR** tỏa sáng. Phương thức `Image.load` của SDK chấp nhận đường dẫn tệp hoặc URL, và tự động phát hiện định dạng (PNG, JPEG, TIFF, PDF, v.v.). Hãy tải một biên lai mẫu: + +```python +# Step 3: Load the input image (PNG, JPEG, TIFF, PDF …) +input_image = ocr.Image.load(r"YOUR_DIRECTORY/receipt.png") +``` + +Nếu bạn đang xử lý một PDF đa trang, chỉ cần chỉ tới tệp PDF; SDK sẽ xem mỗi trang như một hình ảnh riêng biệt bên trong. + +## Bước 4 – Thực hiện chuyển đổi OCR Image to Text + +Với hình ảnh đã được tải vào bộ nhớ, quá trình OCR thực tế diễn ra trong một dòng lệnh. Phương thức `recognize` trả về một đối tượng `OcrResult` chứa văn bản thuần, điểm tin cậy, và thậm chí các hộp giới hạn nếu bạn cần sau này. + +```python +# Step 4: Perform OCR on the loaded image +ocr_result = ocr_engine.recognize(input_image) +``` + +> **Edge case:** Đối với ảnh độ phân giải thấp (dưới 300 dpi) bạn có thể muốn tăng kích thước ảnh trước. SDK cung cấp một công cụ trợ giúp `Resize`, nhưng đối với hầu hết các biên lai, mặc định vẫn hoạt động tốt. + +## Bước 5 – Chuyển đổi Văn bản Thuần từ Hình ảnh thành Chuỗi có thể sử dụng + +Mảnh cuối cùng của câu đố là trích xuất văn bản thuần từ đối tượng kết quả. Đây là bước **convert image plain text** biến khối dữ liệu OCR thành thứ bạn có thể in, lưu trữ, hoặc đưa vào hệ thống khác. + +```python +# Step 5: Output the recognised plain text +print("Plain OCR:") +print(ocr_result.text) +``` + +Khi bạn chạy script, bạn sẽ thấy kết quả giống như: + +``` +Plain OCR: +Starbucks Coffee +Date: 2026/03/27 +Total: $4.75 +Thank you! 
+``` + +Kết quả đó bây giờ là một chuỗi Python thông thường, sẵn sàng cho việc xuất CSV, chèn vào cơ sở dữ liệu, hoặc xử lý ngôn ngữ tự nhiên. + +## Xử lý các vấn đề thường gặp + +### 1. Hình ảnh trống hoặc nhiễu + +Nếu `ocr_result.text` trả về rỗng, hãy kiểm tra lại chất lượng hình ảnh. Một giải pháp nhanh là thêm bước tiền xử lý: + +```python +# Simple preprocessing – convert to grayscale and increase contrast +processed = input_image.to_grayscale().adjust_contrast(1.2) +ocr_result = ocr_engine.recognize(processed) +``` + +### 2. PDF đa trang + +Khi bạn đưa vào một PDF, `recognize` trả về kết quả cho mỗi trang. Lặp qua chúng như sau: + +```python +pdf_image = ocr.Image.load("document.pdf") +pages = pdf_image.pages # collection of page images + +for i, page in enumerate(pages, start=1): + result = ocr_engine.recognize(page) + print(f"--- Page {i} ---") + print(result.text) +``` + +### 3. Hỗ trợ Ngôn ngữ + +Aspose OCR hỗ trợ hơn 60 ngôn ngữ. Để chuyển ngôn ngữ, đặt thuộc tính `language` trước khi gọi `recognize`: + +```python +ocr_engine.language = "fr" # French +ocr_result = ocr_engine.recognize(input_image) +``` + +## Ví dụ Hoạt động đầy đủ + +Kết hợp tất cả lại, đây là một script hoàn chỉnh, sẵn sàng sao chép‑dán, bao phủ mọi thứ từ cài đặt đến xử lý các trường hợp đặc biệt: + +```python +# -*- coding: utf-8 -*- +""" +Python OCR tutorial – complete example using Aspose OCR Cloud. +Demonstrates loading an image, performing OCR, and handling multi‑page PDFs. +""" + +import asposeocrcloud as ocr +import os + +def ocr_file(filepath: str, language: str = "en"): + """ + Perform OCR on a given file (image or PDF) and return plain text. 
+ """ + # Initialise engine (trial licence) + engine = ocr.OcrEngine() + engine.language = language + + # Load the file – SDK auto‑detects format + image = ocr.Image.load(filepath) + + # If it's a PDF, iterate over pages + if image.is_pdf: + all_text = [] + for page in image.pages: + result = engine.recognize(page) + all_text.append(result.text) + return "\n".join(all_text) + + # Single‑image case + result = engine.recognize(image) + return result.text + + +if __name__ == "__main__": + # Example usage – replace with your own path + sample_path = os.path.join("YOUR_DIRECTORY", "receipt.png") + + if not os.path.exists(sample_path): + raise FileNotFoundError(f"File not found: {sample_path}") + + extracted = ocr_file(sample_path) + print("=== Extracted Text ===") + print(extracted) +``` + +Chạy script (`python ocr_demo.py`) và bạn sẽ thấy đầu ra **ocr image to text** ngay trong console. + +## Tóm tắt – Những gì chúng ta đã đề cập + +- Đã cài đặt SDK **Aspose OCR Cloud** (`pip install asposeocrcloud`). +- **Khởi tạo engine OCR** mà không cần giấy phép (hoàn hảo cho bản dùng thử). +- Đã minh họa cách **load image for OCR**, dù là PNG, JPEG, hay PDF. +- Đã thực hiện chuyển đổi **ocr image to text** và **convert image plain text** thành một chuỗi Python có thể sử dụng. +- Đã giải quyết các vấn đề thường gặp như quét độ phân giải thấp, PDF đa trang, và lựa chọn ngôn ngữ. + +## Các bước tiếp theo & Chủ đề liên quan + +Bây giờ bạn đã thành thạo **python ocr tutorial**, hãy cân nhắc khám phá: + +- **Extract text image python** để xử lý hàng loạt các thư mục biên lai lớn. +- Tích hợp đầu ra OCR với **pandas** để phân tích dữ liệu (`df = pd.read_csv(StringIO(extracted))`). +- Sử dụng **Tesseract OCR** như một phương án dự phòng khi kết nối internet hạn chế. +- Thêm bước hậu xử lý với **spaCy** để nhận dạng các thực thể như ngày tháng, số tiền, và tên thương gia. + +Hãy tự do thử nghiệm: thử các định dạng hình ảnh khác nhau, điều chỉnh độ tương phản, hoặc chuyển ngôn ngữ. 
Cảnh quan OCR rất rộng, và những kỹ năng bạn vừa học được là nền tảng vững chắc cho bất kỳ dự án tự động hoá tài liệu nào. + +Chúc lập trình vui vẻ, và hy vọng văn bản của bạn luôn dễ đọc! + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file diff --git a/ocr/vietnamese/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md b/ocr/vietnamese/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md new file mode 100644 index 000000000..4f69c4d4d --- /dev/null +++ b/ocr/vietnamese/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/_index.md @@ -0,0 +1,219 @@ +--- +category: general +date: 2026-03-28 +description: Tìm hiểu cách chạy OCR trên hình ảnh, tự động tải mô hình Hugging Face, + làm sạch văn bản OCR và cấu hình mô hình LLM trong Python bằng Aspose OCR Cloud. +draft: false +keywords: +- run OCR on image +- download hugging face model +- clean OCR text +- configure LLM model +language: vi +og_description: Chạy OCR trên hình ảnh và làm sạch đầu ra bằng mô hình Hugging Face + tự tải xuống. Hướng dẫn này cho thấy cách cấu hình mô hình LLM trong Python. +og_title: Chạy OCR trên hình ảnh – Hướng dẫn đầy đủ Aspose OCR Cloud +tags: +- OCR +- Python +- LLM +- HuggingFace +title: Chạy OCR trên hình ảnh với Aspose OCR Cloud – Hướng dẫn chi tiết từng bước +url: /vi/python/general/run-ocr-on-image-with-aspose-ocr-cloud-full-step-by-step-gui/ +--- + +{{< blocks/products/pf/main-wrap-class >}} +{{< blocks/products/pf/main-container >}} +{{< blocks/products/pf/tutorial-page-section >}} + +# Chạy OCR trên Hình ảnh – Hướng dẫn đầy đủ Aspose OCR Cloud + +Bạn đã bao giờ cần chạy OCR trên các tệp hình ảnh nhưng kết quả thô trông như một mớ hỗn độn? 
Theo kinh nghiệm của tôi, vấn đề khó khăn nhất không phải là việc nhận dạng mà là việc làm sạch. May mắn là Aspose OCR Cloud cho phép bạn gắn một bộ xử lý hậu xử lý LLM có thể *tự động làm sạch văn bản OCR*. Trong hướng dẫn này, chúng tôi sẽ hướng dẫn bạn mọi thứ cần thiết: từ **tải xuống mô hình Hugging Face** đến cấu hình LLM, chạy engine OCR, và cuối cùng là tinh chỉnh kết quả. + +Khi hoàn thành hướng dẫn này, bạn sẽ có một script sẵn sàng chạy mà: + +1. Tải về mô hình Qwen 2.5 gọn nhẹ từ Hugging Face (tự động tải cho bạn). +2. Cấu hình mô hình để chạy một phần mạng trên GPU và phần còn lại trên CPU. +3. Thực thi engine OCR trên một ảnh ghi chú viết tay. +4. Sử dụng LLM để làm sạch văn bản đã nhận dạng, cung cấp đầu ra dễ đọc cho con người. + +> **Prerequisites** – Python 3.8+, gói `asposeocrcloud`, một GPU có ít nhất 4 GB VRAM (tùy chọn nhưng được khuyến nghị), và kết nối internet để tải mô hình lần đầu. + +--- + +## Bạn sẽ cần gì + +- **Aspose OCR Cloud SDK** – cài đặt qua `pip install asposeocrcloud`. +- **Một ảnh mẫu** – ví dụ: `handwritten_note.jpg` đặt trong thư mục cục bộ. +- **Hỗ trợ GPU** – nếu bạn có GPU hỗ trợ CUDA, script sẽ chuyển 30 lớp sang GPU; nếu không, nó sẽ tự động quay lại CPU. +- **Quyền ghi** – script sẽ lưu bộ nhớ đệm mô hình trong `YOUR_DIRECTORY`; hãy chắc chắn thư mục tồn tại. + +--- + +## Bước 1 – Cấu hình mô hình LLM (tải mô hình Hugging Face) + +Điều đầu tiên chúng ta làm là chỉ cho Aspose AI nơi lấy mô hình. Lớp `AsposeAIModelConfig` xử lý việc tự động tải, lượng tử hoá và phân bổ lớp GPU. 
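Danh sách chuẩn bị ở trên lưu ý rằng thư mục cache mô hình phải tồn tại trước khi chạy. Một hàm trợ giúp nhỏ (tự viết, không thuộc SDK của Aspose) có thể tạo thư mục nếu thiếu trước khi bạn đưa đường dẫn vào cấu hình:

```python
from pathlib import Path

def ensure_model_cache(path_str: str) -> Path:
    """Create the model cache folder if missing and return its Path.

    This is our own helper (not part of the Aspose SDK); call it before
    building the model config so the cache directory is writable.
    """
    cache_dir = Path(path_str).expanduser()
    cache_dir.mkdir(parents=True, exist_ok=True)  # no error if it exists
    return cache_dir

# Example: prepare the folder, then pass str(model_dir) to the config below
model_dir = ensure_model_cache("YOUR_DIRECTORY")
print(f"Model cache ready at: {model_dir}")
```

Nhờ `exist_ok=True`, gọi lại hàm này nhiều lần là vô hại, nên bạn có thể đặt nó ở đầu mọi script dùng chung thư mục cache.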
+ +```python +import asposeocrcloud as ocr +from asposeocrcloud.ai import AsposeAI, AsposeAIModelConfig + +# ---------------------------------------------------------------------- +# Step 1: Model configuration – this will download the model if it’s missing +# ---------------------------------------------------------------------- +model_config = AsposeAIModelConfig( + allow_auto_download="true", # Enables auto‑download + hugging_face_repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF", # Repo on Hugging Face + hugging_face_quantization="int8", # Small footprint, fast inference + gpu_layers=30, # 30 layers on GPU, rest on CPU + directory_model_path=r"YOUR_DIRECTORY" # Cache folder (optional) +) +``` + +**Tại sao điều này quan trọng** – Lượng tử hoá thành `int8` giảm đáng kể việc sử dụng bộ nhớ (≈ 4 GB so với 12 GB). Phân chia mô hình giữa GPU và CPU cho phép bạn chạy một LLM 3 tỷ tham số ngay trên RTX 3060 thông thường. Nếu bạn không có GPU, đặt `gpu_layers=0` và SDK sẽ giữ mọi thứ trên CPU. + +> **Tip:** Lần chạy đầu tiên sẽ tải xuống khoảng ~ 1.5 GB, vì vậy hãy kiên nhẫn trong vài phút và đảm bảo kết nối ổn định. + +--- + +## Bước 2 – Khởi tạo AI Engine với cấu hình mô hình + +Bây giờ chúng ta khởi động engine AI của Aspose và truyền vào cấu hình vừa tạo. + +```python +# ---------------------------------------------------------------------- +# Step 2: Initialise the AI engine – pulls the model if needed +# ---------------------------------------------------------------------- +ocr_ai = AsposeAI() +ocr_ai.initialize(model_config) # This call blocks until the model is ready +``` + +**Điều gì đang diễn ra phía sau?** SDK kiểm tra `directory_model_path` để tìm mô hình đã tồn tại. Nếu tìm thấy phiên bản phù hợp, nó sẽ tải ngay; nếu không, nó sẽ tải file GGUF từ Hugging Face, giải nén và chuẩn bị pipeline suy luận. + +--- + +## Bước 3 – Tạo OCR Engine và Gắn AI Post‑Processor + +OCR engine thực hiện việc nhận dạng ký tự. 
Bằng cách gắn `ocr_ai.run_postprocessor` chúng ta kích hoạt **làm sạch văn bản OCR** tự động sau khi nhận dạng. + +```python +# ---------------------------------------------------------------------- +# Step 3: Build the OCR engine and bind the LLM post‑processor +# ---------------------------------------------------------------------- +ocr_engine = ocr.OcrEngine() +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=None) +``` + +**Tại sao cần post‑processor?** Văn bản OCR thô thường có các ngắt dòng sai chỗ, dấu câu bị nhận sai, hoặc ký tự lạ. LLM có thể viết lại đầu ra thành các câu đúng, sửa lỗi chính tả và thậm chí suy luận các từ thiếu—như biến một đống dữ liệu thô thành văn bản mượt mà. + +--- + +## Bước 4 – Chạy OCR trên tệp ảnh + +Sau khi đã kết nối mọi thứ, đã đến lúc đưa ảnh vào engine. + +```python +# ---------------------------------------------------------------------- +# Step 4: Load the image and run OCR +# ---------------------------------------------------------------------- +input_image = ocr.Image.load(r"YOUR_DIRECTORY/handwritten_note.jpg") +raw_result = ocr_engine.recognize(input_image) # Returns an OcrResult object +``` + +**Trường hợp đặc biệt:** Nếu ảnh quá lớn (> 5 MP), bạn có thể muốn thu nhỏ trước để tăng tốc xử lý. SDK chấp nhận đối tượng Pillow `Image`, vì vậy bạn có thể tiền xử lý bằng `PIL.Image.thumbnail()` nếu cần. + +--- + +## Bước 5 – Để AI làm sạch văn bản đã nhận dạng và hiển thị cả hai phiên bản + +Cuối cùng chúng ta gọi post‑processor đã gắn trước đó. Bước này minh họa sự khác biệt giữa *trước* và *sau* khi làm sạch. 
+ +```python +# ---------------------------------------------------------------------- +# Step 5: Clean the OCR output using the LLM and display both results +# ---------------------------------------------------------------------- +cleaned_result = ocr_engine.run_postprocessor(raw_result) + +print("=== Before AI ===") +print(raw_result.text) + +print("\n=== After AI ===") +print(cleaned_result.text) +``` + +### Kết quả mong đợi + +``` +=== Before AI === +Th1s 1s a h@ndwr1tt3n n0te. It c0nta1ns m1st@k3s, l1n3 br3aks, & sp3c!@l ch@r@ct3rs. + +=== After AI === +This is a handwritten note. It contains mistakes, line breaks, and special characters. +``` + +Chú ý cách LLM đã: + +- Sửa các lỗi nhận dạng thường gặp (`Th1s` → `This`). +- Loại bỏ ký tự lạ (`&` → `and`). +- Chuẩn hoá các ngắt dòng thành câu hoàn chỉnh. + +--- + +## 🎨 Tổng quan trực quan (Quy trình chạy OCR trên ảnh) + +![Quy trình chạy OCR trên hình ảnh](run_ocr_on_image_workflow.png "Sơ đồ mô tả quy trình chạy OCR trên hình ảnh từ tải mô hình đến kết quả đã làm sạch") + +Sơ đồ trên tóm tắt toàn bộ pipeline: **tải mô hình Hugging Face → cấu hình LLM → khởi tạo AI → OCR engine → AI post‑processor → làm sạch văn bản OCR**. + +--- + +## Câu hỏi thường gặp & Mẹo chuyên nghiệp + +### Nếu tôi không có GPU thì sao? + +Đặt `gpu_layers=0` trong `AsposeAIModelConfig`. Mô hình sẽ chạy hoàn toàn trên CPU, chậm hơn nhưng vẫn hoạt động. Bạn cũng có thể chuyển sang mô hình nhỏ hơn (ví dụ `Qwen/Qwen2.5-1.5B‑Instruct‑GGUF`) để thời gian suy luận hợp lý. + +### Làm sao để thay đổi mô hình sau này? + +Chỉ cần cập nhật `hugging_face_repo_id` và chạy lại `ocr_ai.initialize(model_config)`. SDK sẽ phát hiện thay đổi phiên bản, tải mô hình mới và thay thế các file đã lưu. + +### Tôi có thể tùy chỉnh prompt cho post‑processor không? + +Có. Truyền một dictionary vào `custom_settings` với khóa `prompt_template`. 
Ví dụ: + +```python +custom_prompt = { + "prompt_template": "Correct the following OCR text and keep line breaks:\n{ocr_text}" +} +ocr_engine.set_post_processor(ocr_ai.run_postprocessor, custom_settings=custom_prompt) +``` + +### Nên lưu văn bản đã làm sạch vào file không? + +Chắc chắn rồi. Sau khi làm sạch, bạn có thể ghi kết quả ra file `.txt` hoặc `.json` để xử lý tiếp: + +```python +with open("cleaned_note.txt", "w", encoding="utf-8") as f: + f.write(cleaned_result.text) +``` + +--- + +## Kết luận + +Chúng ta vừa trình bày cách **chạy OCR trên ảnh** bằng Aspose OCR Cloud, tự động **tải mô hình Hugging Face**, cấu hình **cài đặt mô hình LLM** một cách chuyên nghiệp, và cuối cùng **làm sạch văn bản OCR** bằng một bộ xử lý hậu LLM mạnh mẽ. Toàn bộ quy trình được gói gọn trong một script Python dễ chạy, hoạt động trên cả máy có GPU và chỉ CPU. + +Nếu bạn đã quen thuộc với pipeline này, hãy thử: + +- **LLM khác** – thử `meta-llama/Meta-Llama-3-8B‑Instruct‑GGUF` để có cửa sổ ngữ cảnh lớn hơn. +- **Xử lý batch** – lặp qua thư mục ảnh và tổng hợp kết quả đã làm sạch vào CSV. +- **Prompt tùy chỉnh** – điều chỉnh AI cho lĩnh vực của bạn (tài liệu pháp lý, ghi chú y tế, v.v.). + +Hãy tự do thay đổi giá trị `gpu_layers`, đổi mô hình, hoặc chèn prompt của riêng bạn. Bầu trời là giới hạn, và đoạn code bạn có ngay bây giờ là bệ phóng. + +Chúc lập trình vui vẻ, và mong đầu ra OCR của bạn luôn sạch sẽ! 🚀 + +{{< /blocks/products/pf/tutorial-page-section >}} +{{< /blocks/products/pf/main-container >}} +{{< /blocks/products/pf/main-wrap-class >}} +{{< blocks/products/products-backtop-button >}} \ No newline at end of file