[Feature] PDF Blank Page Detector & Remover #413

@aanyabansal-22

✨ Feature Overview

Add a PDF Blank Page Detector & Remover tool that automatically scans uploaded PDF files, identifies blank or near-blank pages, and allows users to remove them before downloading a cleaned PDF.

This feature would help users quickly clean scanned documents, merged PDFs, books, reports, and forms that often contain unnecessary blank pages.

🚀 Why is this Feature Needed?

Many PDFs generated through scanning, printing, or document merging workflows contain unwanted blank pages.

Currently, users must manually inspect documents and remove such pages using external software.

Benefits:

  • Saves time by automatically detecting blank pages.
  • Reduces PDF file size.
  • Improves document organization and readability.
  • Eliminates the need for third-party PDF editors.
  • Enhances the platform's collection of PDF productivity tools.

🎨 Visuals (If applicable)

Suggested workflow:

  1. Upload PDF
  2. Click Detect Blank Pages
  3. Display detected pages

Example:

Detected Blank Pages:
☑ Page 3
☑ Page 8
☑ Page 12

Actions:

  • Remove Selected Pages
  • Download Cleaned PDF
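
As a rough illustration of the "Remove Selected Pages" step, the sketch below maps the 1-based page numbers ticked in the UI to the 0-based pages that should be kept. The helper names `pages_to_keep` and `write_cleaned_pdf` are hypothetical and not part of the project. The sketch assumes PyMuPDF is installed and uses its `Document.select()` and `Document.save()` methods.

```python
from typing import Iterable, List


def pages_to_keep(page_count: int, selected: Iterable[int]) -> List[int]:
    """Return 0-based indices of pages to keep.

    `selected` holds the 1-based page numbers the user ticked for removal;
    numbers outside the document are ignored.
    """
    remove = {n - 1 for n in selected if 1 <= n <= page_count}
    keep = [i for i in range(page_count) if i not in remove]
    if not keep:
        # A PDF with zero pages cannot be saved, so stop early.
        raise ValueError("refusing to remove every page")
    return keep


def write_cleaned_pdf(src_path: str, dst_path: str, selected: Iterable[int]) -> None:
    """Write a copy of the PDF without the selected pages (hypothetical helper)."""
    import fitz  # PyMuPDF, imported lazily so the pure helper above has no dependency

    with fitz.open(src_path) as doc:
        doc.select(pages_to_keep(doc.page_count, selected))
        doc.save(dst_path)
```

With the example above, `pages_to_keep(12, [3, 8, 12])` keeps the nine remaining pages and drops pages 3, 8, and 12.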

🔧 Possible Implementation (Optional)

Backend

Using PyMuPDF (fitz):

  • Iterate through PDF pages.
  • Detect pages with:
      • No extractable text.
      • Very low pixel/content density.
  • Return detected blank page numbers.
  • Generate a new PDF excluding selected blank pages.
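
A minimal sketch of the detection step, assuming PyMuPDF is installed. The function names and the thresholds (`dark_below=200`, `max_ink=0.001`, 50 dpi) are illustrative guesses, not tuned values. It treats a page as blank only when both conditions hold, because a fully scanned page also has no extractable text.

```python
from typing import List


def ink_ratio(gray_samples: bytes, dark_below: int = 200) -> float:
    """Fraction of 8-bit grayscale pixels darker than `dark_below`."""
    if not gray_samples:
        return 0.0
    dark = sum(1 for value in gray_samples if value < dark_below)
    return dark / len(gray_samples)


def looks_blank(text: str, ratio: float, max_ink: float = 0.001) -> bool:
    """Blank means: no extractable text AND almost no dark pixels."""
    return not text.strip() and ratio <= max_ink


def find_blank_pages(pdf_path: str, dpi: int = 50) -> List[int]:
    """Return 1-based numbers of pages that look blank (hypothetical helper)."""
    import fitz  # PyMuPDF, imported lazily so the helpers above stay dependency-free

    zoom = dpi / 72  # PDF user space is 72 units per inch
    blank = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            # Low-resolution grayscale render: one byte per pixel, no alpha.
            pix = page.get_pixmap(
                matrix=fitz.Matrix(zoom, zoom),
                colorspace=fitz.csGRAY,
                alpha=False,
            )
            if looks_blank(page.get_text(), ink_ratio(pix.samples)):
                blank.append(page.number + 1)
    return blank
```

Rendering at a low resolution keeps the pixel scan cheap. The returned page numbers can feed directly into the checkbox list on the frontend.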

Frontend

  • Create a dedicated tool page.
  • Display detected blank pages with checkboxes.
  • Allow users to review and remove pages before downloading.

💡 Additional Notes

  • Support both text-based and scanned PDFs.
  • Allow users to manually deselect pages before removal.
  • Optionally support detection of near-blank pages containing only scanner marks or small artifacts.
  • Maintain user privacy by processing files locally or on the project's own server, without sending them to third-party services.
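
One possible way to handle near-blank pages, sketched under assumptions: ignore a margin band around the page (where scanner edge shadows usually appear) and measure ink only inside it, then compare the result against a small tolerance so isolated specks do not count as content. The function `inner_ink_ratio` and its default values are hypothetical. It expects a grayscale buffer with one byte per pixel and no row padding.

```python
def inner_ink_ratio(
    samples: bytes,
    width: int,
    height: int,
    margin: float = 0.05,
    dark_below: int = 200,
) -> float:
    """Fraction of dark pixels inside the page, skipping a margin band.

    `margin` is the fraction of width/height ignored on each side.
    """
    mx, my = int(width * margin), int(height * margin)
    total = dark = 0
    for y in range(my, height - my):
        # Slice one row, dropping `mx` pixels on the left and right.
        row = samples[y * width + mx : (y + 1) * width - mx]
        total += len(row)
        dark += sum(1 for value in row if value < dark_below)
    return dark / total if total else 0.0
```

A page whose inner ratio stays below a chosen tolerance could be listed as "near-blank" and left unchecked by default, so the user decides whether to remove it.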

🏆 Are you contributing under any open-source program?

GSSoC 2026
