feat: add get_cell_image method to TableItem #652

Open
sriharan0804 wants to merge 1 commit into docling-project:main from sriharan0804:feat-table-cell-image-clean

Conversation

@sriharan0804

Summary

Closes #449

This PR adds a get_cell_image() helper method to TableItem to enable extraction of image crops corresponding to individual table cells.

Currently, TableCell objects may contain bounding box information, but there is no direct API for retrieving the image region represented by a cell. This change provides a convenient way to obtain cell-level image crops from the underlying page image.

Changes

  • Added TableItem.get_cell_image()

  • Reused the existing image-cropping workflow from DocItem.get_image()

  • Added validation for:

    • missing cell bounding boxes
    • missing table provenance
    • missing page images
  • Returns None when image extraction is not possible
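
The validation order listed above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the function name `missing_prerequisite` and its three parameters are hypothetical stand-ins for the cell bounding box, the table provenance list, and the page image.

```python
from typing import Any, Optional, Sequence


def missing_prerequisite(
    cell_bbox: Optional[Any],
    table_prov: Sequence[Any],
    page_image: Optional[Any],
) -> Optional[str]:
    """Return the name of the first missing input, or None if all are present.

    Hypothetical helper: a caller would return None instead of an image
    whenever this function reports a missing input.
    """
    if cell_bbox is None:
        return "cell bounding box"
    if not table_prov:  # empty provenance means the page is unknown
        return "table provenance"
    if page_image is None:
        return "page image"
    return None
```

Checking the inputs up front keeps the cropping path free of None handling and makes the "returns None" cases easy to test one by one.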

Implementation Details

The method:

  1. Retrieves the page associated with the table provenance.
  2. Converts the cell bounding box to top-left origin coordinates.
  3. Scales the bounding box to the rendered page image size.
  4. Crops and returns the corresponding PIL image.

This follows the same coordinate conversion and image extraction logic already used elsewhere in the document model.
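
Steps 2 to 4 above can be sketched as below. This is a minimal sketch under stated assumptions, not the PR's implementation: `Box`, `to_pixel_box`, and `crop_cell` are hypothetical names, the cell box is assumed to use a bottom-left origin in page units, and `page_image` is assumed to be any object exposing `width`, `height`, and `crop(box)` the way a PIL image does.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass
class Box:
    """Hypothetical bounding box in page units with a bottom-left origin."""
    l: float  # left edge
    b: float  # bottom edge, measured up from the page bottom
    r: float  # right edge
    t: float  # top edge, measured up from the page bottom


def to_pixel_box(
    box: Box,
    page_width: float,
    page_height: float,
    image_width: int,
    image_height: int,
) -> Tuple[int, int, int, int]:
    """Convert a bottom-left-origin page box to a top-left-origin pixel box."""
    # Step 2: flip the y-axis so coordinates are measured from the page top.
    top = page_height - box.t
    bottom = page_height - box.b
    # Step 3: scale page units to the rendered image size.
    sx = image_width / page_width
    sy = image_height / page_height
    return (round(box.l * sx), round(top * sy), round(box.r * sx), round(bottom * sy))


def crop_cell(page_image: Any, box: Box, page_width: float, page_height: float) -> Any:
    """Step 4: crop the cell region out of the rendered page image."""
    pixel_box = to_pixel_box(
        box, page_width, page_height, page_image.width, page_image.height
    )
    # PIL's Image.crop takes a (left, upper, right, lower) tuple.
    return page_image.crop(pixel_box)
```

With Pillow installed, `page_image` can be a `PIL.Image.Image`, whose `crop` method accepts exactly this kind of pixel tuple. Keeping the coordinate math in a separate pure function makes it testable without rendering a page.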

Testing

Added a unit test covering successful table cell image extraction.

Executed:

uv run pytest test/test_docling_doc.py -k "get_cell_image" -v

Result:

1 passed

Additional verification:

uv run pytest test/test_docling_doc.py -k "table" -v

Result:

10 passed

Linting:

uv run ruff check docling_core/types/doc/document.py test/test_docling_doc.py

Result:

All checks passed

@github-actions

github-actions Bot commented Jun 22, 2026

DCO Check Passed

Thanks @sriharan0804, all your commits are properly signed off. 🎉

@mergify

mergify Bot commented Jun 22, 2026

Merge Protections

🔴 1 of 2 protections blocking · waiting on 👀 reviews

Protection | Waiting on
🔴 Require two reviewer for test updates | 👀 reviews
🟢 Enforce conventional commit |

🔴 Require two reviewer for test updates

Waiting for

  • #approved-reviews-by >= 2
This rule is failing.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2

🟢 Enforce conventional commit

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
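
For contributors who want to check a title locally before pushing, the pattern above can be tried with Python's `re` module. This is only a sketch: it assumes Python's regex engine treats this particular pattern the same way Mergify's `~=` operator does, and the helper name `is_conventional` is hypothetical.

```python
import re

# Pattern copied from the Mergify rule above.
TITLE_PATTERN = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return TITLE_PATTERN.search(title) is not None
```

Under that assumption, this PR's title, "feat: add get_cell_image method to TableItem", passes the check, as does a scoped breaking-change title such as "fix(doc)!: ...".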

sriharan0804 force-pushed the feat-table-cell-image-clean branch from e92fb65 to 7d4c7e8 on June 22, 2026 11:30
@christymanthara

@dolfim-ibm please review.

@sriharan0804

Hi @christymanthara, please take a look at this PR.

Signed-off-by: sriharan2005@Tamil-- <sriharan0804@users.noreply.github.com>
sriharan0804 force-pushed the feat-table-cell-image-clean branch from 7d4c7e8 to ec5ad1a on June 27, 2026 10:20

Development

Successfully merging this pull request may close these issues.

Feature Request: Add method to extract images from table cells

2 participants