A compact document-layout analysis program designed to detect and normalize visual structure in scanned or photographed documents. It targets noisy real-world inputs (blurred, deskewed, low-resolution) and supports many languages.
- Detects page elements (text blocks, tables, figures, headings) using a YOLO model trained on the DocLayNet dataset.
- Robust to blurred, deskewed, and low-resolution inputs.
- Multi-language support for downstream OCR and analysis.
- Input preprocessing with OpenCV: Hough transforms and common rotation/deskew operations to normalize orientation.
- Produces structured layout outputs (bounding boxes + labels) and enhanced images for OCR.
- Detection model: YOLO trained on DocLayNet — capable of identifying most document element types.
- Preprocessing: edge/Hough-based line detection and rotation correction; basic denoising and resizing for improving detection/OCR quality.
- Typical outputs: annotated image, normalized crops per element, layout JSON for downstream processing.
- Integrate a dedicated OCR model to extract text from identified regions.
- Add an image- and text-level summarization pipeline to summarize content found across detected elements.
- Implement a reconstruction step to produce a cleaner, higher-resolution version of hard-to-identify documents (super-resolution + layout-aware compositing).
Reconstruct hard-to-identify documents into cleaner, higher-resolution versions while producing a structured layout and text output suitable for search, archival, and downstream NLP/summarization.
- OpenCV (preprocessing)
- PyTorch/TensorFlow + YOLO implementation and weights (DocLayNet-trained)
- Optional: OCR engine (Tesseract / OCR model), super-resolution model