feat: add ContentLayer.SHEET and serialize Excel sheet groups in HTML and Markdown#660

Open
samiuc wants to merge 2 commits into
docling-project:mainfrom
samiuc:sami/serialize-sheet-names

@samiuc

@samiuc samiuc commented Jun 25, 2026

Contributor

When converting a multi-sheet Excel file, there was no way to tell which table or piece of content belonged to which worksheet in the HTML or Markdown output. This PR fixes that by tagging visible worksheet groups with a dedicated content layer, so the HTML serializer can wrap each sheet's content in a labeled <section> element and the Markdown serializer can emit a heading for each sheet.
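
The output shape described above can be pictured with a small standalone sketch. This is not the code in this PR and does not use docling-core: `SheetGroup`, `sheets_to_html`, and `sheets_to_markdown` are hypothetical names, and the `aria-label` attribute and the level-2 heading are assumptions chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from html import escape


@dataclass
class SheetGroup:
    """Hypothetical stand-in for one worksheet group (not a docling-core class)."""

    name: str
    texts: list[str] = field(default_factory=list)  # plain-text content of the sheet


def sheets_to_html(sheets: list[SheetGroup]) -> str:
    """Wrap each sheet's content in a labeled <section> element."""
    parts = []
    for sheet in sheets:
        body = "\n".join(f"<p>{escape(text)}</p>" for text in sheet.texts)
        # The aria-label attribute is an assumption; the PR text only says "labeled".
        label = escape(sheet.name, quote=True)
        parts.append(f'<section aria-label="{label}">\n{body}\n</section>')
    return "\n".join(parts)


def sheets_to_markdown(sheets: list[SheetGroup], level: int = 2) -> str:
    """Emit one heading per sheet, followed by that sheet's content."""
    parts = []
    for sheet in sheets:
        parts.append(f"{'#' * level} {sheet.name}")
        parts.extend(sheet.texts)
    return "\n\n".join(parts)


if __name__ == "__main__":
    demo = [SheetGroup("Q1", ["first table"]), SheetGroup("Q2 & Q3", ["second table"])]
    print(sheets_to_html(demo))
    print(sheets_to_markdown(demo))
```

Either way, each table or paragraph stays attached to the worksheet it came from in the serialized output.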

…izers

Signed-off-by: samiuc <sami.ullah.chat@gmail.com>
@github-actions

Contributor

DCO Check Passed

Thanks @samiuc, all your commits are properly signed off. 🎉

@mergify

mergify Bot commented Jun 25, 2026

Contributor

Merge Protections

🔴 1 of 2 protections blocking · waiting on 👀 reviews

Protection Waiting on
🔴 Require two reviewer for test updates 👀 reviews
🟢 Enforce conventional commit

🔴 Require two reviewer for test updates

Waiting for

  • #approved-reviews-by >= 2
This rule is failing.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2

🟢 Enforce conventional commit

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

@samiuc samiuc changed the title from "feat: add support for SHEET content layer in HTML and Markdown serializers" to "feat: add ContentLayer.SHEET and serialize Excel sheet groups in HTML and Markdown" Jun 25, 2026
@codecov

codecov Bot commented Jun 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Signed-off-by: samiuc <sami.ullah.chat@gmail.com>
@ceberam

ceberam commented Jun 26, 2026

Member

@samiuc I don't think we should create a new content layer for the structural component derived from a workbook sheet, for the same reason that we don't have a content layer for PDF pages or EPUB chapters.
Please check the work done in docling-project/docling#3635, since you should be able to achieve what you need with the current code, at least for the Markdown serializer. We will follow the approach described in that PR thread to generalize the rendering of GroupLabel.SHEET from MsExcelMarkdownDocSerializer to the default Markdown and HTML serializers.
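
To illustrate the distinction drawn in this comment, here is a minimal standalone sketch in which a group's label, rather than its content layer, drives how a sheet is rendered. It does not use docling-core or reproduce the approach in docling-project/docling#3635: `Layer`, `Label`, `Group`, and `to_markdown` are hypothetical names, and the level-2 heading is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Layer(Enum):
    """Hypothetical content layers: where content sits, used to filter output."""

    BODY = "body"
    FURNITURE = "furniture"


class Label(Enum):
    """Hypothetical group labels: the structural role a group plays."""

    SHEET = "sheet"
    SECTION = "section"


@dataclass
class Group:
    name: str
    label: Label
    layer: Layer = Layer.BODY
    texts: list[str] = field(default_factory=list)


def to_markdown(groups: list[Group], layers: frozenset = frozenset({Layer.BODY})) -> str:
    out = []
    for group in groups:
        if group.layer not in layers:
            continue  # the layer only decides whether the content is emitted at all
        if group.label is Label.SHEET:
            out.append(f"## {group.name}")  # the label decides how the group is rendered
        out.extend(group.texts)
    return "\n\n".join(out)
```

In this sketch a sheet stays in the ordinary body layer, so layer filtering and sheet rendering remain independent concerns.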


2 participants