Fix/doclang suppress content filtered shells by nassarofficial · Pull Request #656 · docling-project/docling-core

nassarofficial · 2026-06-24T12:54:53Z

Problem

When serializing with narrow content_types (e.g. picture-only prompts), the serializer still emitted head-only shells: elements with metadata (layer, thread, location, classification label) but no actual supervised body content. That leaked spurious tags into WDS task-filtered training targets.

Fix (opt-in only, default behavior unchanged)

All new logic is gated behind suppress_empty_elements=True (the default remains False):

• Text / table / picture: if an element’s body type is filtered out and there’s no visible content, drop the element entirely instead of emitting an empty tag or metadata-only shell
• Layout exception: if add_location=True and the item has provenance, keep the element so layout supervision still gets boxes
• Picture labels: under suppression, the classification label is only emitted when picture/chart/chemistry content is actually requested (layout-only prompts get boxes, no classification label)
• New param: emit_picture_layer (default True), which lets callers omit the layer tag on pictures (we set this to False in granite for picture-classification prompts)
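The rules in the list above can be sketched as two small decision functions. This is a minimal illustration, not the docling-core implementation: `ElementState`, `should_emit`, and `picture_head_parts` are hypothetical names, and the sketch assumes that without suppression the classification label is always emitted (the real serializer also consults label_mode).

```python
from dataclasses import dataclass


@dataclass
class ElementState:
    """Hypothetical summary of one text/table/picture item (not a docling-core type)."""

    body_allowed: bool         # the body type is included in content_types
    has_visible_content: bool  # something would actually be rendered in the body
    has_provenance: bool       # the item carries location (bounding box) data


def should_emit(state, *, suppress_empty_elements=False, add_location=False):
    """Return True if the element should be serialized at all."""
    if not suppress_empty_elements:
        return True  # opt-in only: default behavior is unchanged
    if state.body_allowed or state.has_visible_content:
        return True  # there is real body content to supervise
    # Layout exception: keep the element so layout supervision still gets boxes.
    return add_location and state.has_provenance


def picture_head_parts(*, picture_content_requested, has_provenance=False,
                       suppress_empty_elements=False, add_location=False,
                       emit_picture_layer=True):
    """List which metadata parts a kept picture element would carry."""
    parts = []
    if emit_picture_layer:
        parts.append("layer")
    if add_location and has_provenance:
        parts.append("location")
    # Under suppression the label appears only when picture content is requested.
    if picture_content_requested or not suppress_empty_elements:
        parts.append("label")
    return parts


filtered = ElementState(body_allowed=False, has_visible_content=False,
                        has_provenance=True)
print(should_emit(filtered))                                # True
print(should_emit(filtered, suppress_empty_elements=True))  # False
print(should_emit(filtered, suppress_empty_elements=True,
                  add_location=True))                       # True
```

In this sketch a layout-only prompt (suppression on, add_location=True, no picture content requested) yields a picture with layer and location but no label, matching the "boxes, no classification label" behavior described above.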

When suppress_empty_elements is enabled and content_types narrows the serialized body (task-filtered training), omit text/table/picture elements that would otherwise emit metadata-only shells (layer, thread, location, classification label without allowed body). Adds emit_picture_layer and layout-only picture box behavior with regression tests.

Mirrors granite-4-docling OCR-only training path so the serializer fix does not rely on callers downgrading label_mode to AUTO.

github-actions · 2026-06-24T12:55:03Z

✅ DCO Check Passed

Thanks @nassarofficial, all your commits are properly signed off. 🎉

mergify · 2026-06-24T12:55:29Z

Merge Protections

🔴 2 of 2 protections blocking · waiting on 👀 reviews and 🙋 you

	Protection	Waiting on
🔴	Enforce conventional commit	🙋 you
🔴	Require two reviewer for test updates	👀 reviews

🔴 Enforce conventional commit

Waiting for

title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

This rule is failing.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
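To see why this rule fails for the current title, the pattern can be tried locally. This is a rough check that assumes the `~=` condition behaves like a Python regular-expression search; the second title is only a hypothetical example of a conforming form, not a title proposed in this PR.

```python
import re

# Pattern copied from the merge protection above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.search(title) is not None


# Current PR title: capitalized type and no colon after it, so no match.
print(is_conventional("Fix/doclang suppress content filtered shells"))    # False
# Hypothetical conforming title: lowercase type, optional scope, colon.
print(is_conventional("fix(doclang): suppress content-filtered shells"))  # True
```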

🔴 Require two reviewer for test updates

Waiting for

#approved-reviews-by >= 2

This rule is failing.

When test data is updated, we require two reviewers

Update content-filtered table/picture tests to reflect that suppress_empty_elements is opt-in and that add_location keeps layout boxes (without classification labels) for filtered elements.

Signed-off-by: Ahmed Nassar AHN@zurich.ibm.com <AHN@zurich.ibm.com>

…h.ibm.com>

I, Ahmed Nassar AHN@zurich.ibm.com <AHN@zurich.ibm.com>, hereby add my Signed-off-by to this commit: 323dd1d
I, Ahmed Nassar AHN@zurich.ibm.com <AHN@zurich.ibm.com>, hereby add my Signed-off-by to this commit: d12853a
I, Ahmed Nassar AHN@zurich.ibm.com <AHN@zurich.ibm.com>, hereby add my Signed-off-by to this commit: e2bfb21

Signed-off-by: Ahmed Nassar AHN@zurich.ibm.com <AHN@zurich.ibm.com>

Ahmed Nassar AHN@zurich.ibm.com added 3 commits June 23, 2026 11:40

test: cover unclassified picture suppression with label_mode ALWAYS

d12853a

Mirrors granite-4-docling OCR-only training path so the serializer fix does not rely on callers downgrading label_mode to AUTO.

suppress head-only shells under task filtering

e2bfb21

nassarofficial requested a review from vagenas June 24, 2026 12:54

Ahmed Nassar AHN@zurich.ibm.com added 2 commits June 24, 2026 12:59

Fix/doclang suppress content filtered shells #656
nassarofficial wants to merge 5 commits into main from fix/doclang-suppress-content-filtered-shells
