Skip to content

fix(pixel-layout): prevent OOM by limiting queued page futures#216

Merged
PeterStaar-IBM merged 1 commit into
mainfrom
sami/fix-pixel-layout-oom
May 28, 2026
Merged

fix(pixel-layout): prevent OOM by limiting queued page futures#216
PeterStaar-IBM merged 1 commit into
mainfrom
sami/fix-pixel-layout-oom

Conversation

@samiuc

@samiuc samiuc commented May 12, 2026

Copy link
Copy Markdown
Contributor

PixelLayoutEvaluator.__call__ submitted all futures for the entire dataset before collecting any results:

# Before: submit all, then collect all
for data in ds_selection:        # e.g. 981 documents
    futures.extend(submit(...))  # all queued in memory simultaneously

for future in as_completed(futures):
    collect(future)

Each in-flight page future holds a serialized (height × width) uint64 numpy array (~8 bytes/pixel) queued for the worker process. On datasets with large pages (e.g. OmniDocBench at ~145 M pixels/page), with concurrency=4, peak memory reaches 4 workers × 2 arrays × ~1.16 GB = ~9 GB, exhausting RAM and killing the process.

The PR:

  • Replaces the two-pass pattern with a bounded sliding window: at most concurrency × 2 futures are in-flight at once. Completed futures are drained inline before new ones are submitted, so peak memory is proportional to concurrency rather than dataset size.
  • Extract the result-accumulation logic into a _collect_page_result static method (previously an inline closure with nonlocal) for clarity and testability.

…tasets

Signed-off-by: samiuc <sami.ullah.chat@gmail.com>
@github-actions

Copy link
Copy Markdown
Contributor

DCO Check Passed

Thanks @samiuc, all your commits are properly signed off. 🎉

@mergify

mergify Bot commented May 12, 2026

Copy link
Copy Markdown

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

@samiuc samiuc requested a review from nikos-livathinos May 12, 2026 20:13
@samiuc samiuc changed the title fix(pixel-layout): bound in-flight futures to prevent OOM on large datasets fix(pixel-layout): prevent OOM by limiting queued page futures May 14, 2026

@PeterStaar-IBM PeterStaar-IBM left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@laurachiticariu

Copy link
Copy Markdown
Contributor

@PeterStaar-IBM can this be reviewed and merged?

@PeterStaar-IBM PeterStaar-IBM merged commit 835a73d into main May 28, 2026
10 checks passed
@PeterStaar-IBM PeterStaar-IBM deleted the sami/fix-pixel-layout-oom branch May 28, 2026 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants