Skip to content

nanda-gopal-sb/doc-layout

Repository files navigation

Document Layout Analysis

A compact document-layout analysis program designed to detect and normalize visual structure in scanned or photographed documents. It targets noisy real-world inputs (blurred, deskewed, low-resolution) and supports many languages.

Features

  • Detects page elements (text blocks, tables, figures, headings) using a YOLO model trained on the DocLayNet dataset.
  • Robust to blurred, deskewed, and low-resolution inputs.
  • Multi-language support for downstream OCR and analysis.
  • Input preprocessing with OpenCV: Hough transforms and common rotation/deskew operations to normalize orientation.
  • Produces structured layout outputs (bounding boxes + labels) and enhanced images for OCR.

Implementation (brief)

  • Detection model: YOLO trained on DocLayNet — capable of identifying most document element types.
  • Preprocessing: edge/Hough-based line detection and rotation correction; basic denoising and resizing for improving detection/OCR quality.
  • Typical outputs: annotated image, normalized crops per element, layout JSON for downstream processing.

Next steps

  • Integrate a dedicated OCR model to extract text from identified regions.
  • Add an image- and text-level summarization pipeline to summarize content found across detected elements.
  • Implement a reconstruction step to produce a cleaner, higher-resolution version of hard-to-identify documents (super-resolution + layout-aware compositing).

Goal

Reconstruct hard-to-identify documents into cleaner, higher-resolution versions while producing a structured layout and text output suitable for search, archival, and downstream NLP/summarization.

Minimal dependencies

  • OpenCV (preprocessing)
  • PyTorch/TensorFlow + YOLO implementation and weights (DocLayNet-trained)
  • Optional: OCR engine (Tesseract / OCR model), super-resolution model

About

A document layout understanding program. Team Polyglot AI's submission for the AI Grand Challenge 2025

Topics

Resources

Stars

Watchers

Forks

Contributors