Stream CsvDataTableStore.getRows from disk lazily and mandate closing the returned stream by jkschneider · Pull Request #8156 · openrewrite/rewrite

jkschneider · 2026-06-30T19:01:31Z

What's changed?

CsvDataTableStore.readRows(...) buffered every matching CSV file fully into a List before returning list.stream(), so reading back a large data table held all of its rows in memory at once. It now streams rows lazily from disk via a single Spliterator that keeps one file open at a time:

rows are produced on demand, so peak memory is bounded to one row rather than the whole table;
the file is closed the moment its last row is read, so a fully-drained stream — how every caller consumes one today — releases its handle with no explicit close;
closing the stream early (try-with-resources) also releases the open file.
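The three properties above can be illustrated with a minimal sketch of the general technique: a single `Spliterator` that walks a list of files, holds at most one reader open, closes it as soon as that file hits end of input, and registers an `onClose` handler so an early close also releases the open file. This is not the PR's code. `LazyCsvRows` and `FileLineSpliterator` are hypothetical names, and the sketch assumes each line is one row. It skips the CSV parsing and header handling that the real store performs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical sketch, NOT the actual CsvDataTableStore implementation.
// Assumption: one line == one row (no CSV parsing, no header handling).
public final class LazyCsvRows {

    /** Streams lines from each file in order, holding at most one file open at a time. */
    public static Stream<String> rows(List<Path> files) {
        FileLineSpliterator spliterator = new FileLineSpliterator(files);
        return StreamSupport.stream(spliterator, false)
                // Closing the stream early (e.g. try-with-resources) releases the open file.
                .onClose(spliterator::closeCurrent);
    }

    static final class FileLineSpliterator extends Spliterators.AbstractSpliterator<String> {
        private final Deque<Path> remaining;
        private BufferedReader current; // the only open handle, or null

        FileLineSpliterator(List<Path> files) {
            super(Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL);
            this.remaining = new ArrayDeque<>(files);
        }

        @Override
        public boolean tryAdvance(Consumer<? super String> action) {
            try {
                while (true) {
                    if (current == null) {
                        Path next = remaining.poll();
                        if (next == null) {
                            return false; // every file has been read
                        }
                        current = Files.newBufferedReader(next);
                    }
                    String line = current.readLine();
                    if (line != null) {
                        action.accept(line); // rows are produced one at a time, on demand
                        return true;
                    }
                    // End of this file: close it now, before opening the next one.
                    closeCurrent();
                }
            } catch (IOException e) {
                closeCurrent();
                throw new UncheckedIOException(e);
            }
        }

        void closeCurrent() {
            if (current != null) {
                try {
                    current.close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                } finally {
                    current = null;
                }
            }
        }
    }
}
```

Because the last file is closed inside `tryAdvance` when its final line has been consumed, a stream that is drained to the end has already released its handle before `close()` is called. The `onClose` hook only matters when a consumer stops early.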

Because the returned stream now owns a file handle, DataTableStore.getRows(...) is annotated @MustBeClosed. The annotation (from error_prone_annotations) is already on the compile classpath transitively via Caffeine. It's added explicitly as a compileOnly dependency because it has CLASS retention and isn't needed at runtime. As a result, IntelliJ flags any call site that consumes the stream without try-with-resources, so a leak can't slip through unnoticed. Every call site in the repo is wrapped accordingly.
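The call-site contract that `@MustBeClosed` enforces can be sketched as follows. `RowSource` and `ConsumeRows` are hypothetical stand-ins, and the real `DataTableStore.getRows(...)` signature differs. The annotation (`com.google.errorprone.annotations.MustBeClosed`) appears only in a comment so the sketch compiles without error_prone_annotations on the classpath.

```java
import java.util.stream.Stream;

// Hypothetical sketch of the call-site pattern; not OpenRewrite's actual API.
public final class ConsumeRows {

    /** Hypothetical stand-in for DataTableStore; the real getRows(...) takes arguments. */
    public interface RowSource {
        // In the PR, the real method is annotated with
        // com.google.errorprone.annotations.MustBeClosed. It is left as a comment
        // here so this sketch compiles without that dependency.
        Stream<String> getRows();
    }

    /** Full drain inside try-with-resources: the stream is closed on every exit path. */
    public static long countRows(RowSource source) {
        try (Stream<String> rows = source.getRows()) {
            return rows.count();
        }
    }

    /** Early exit: try-with-resources still runs the stream's close handlers. */
    public static String firstRow(RowSource source) {
        try (Stream<String> rows = source.getRows()) {
            return rows.findFirst().orElse(null);
        }
    }
}
```

A call such as `source.getRows().count()` without try-with-resources is the pattern the annotation is meant to flag. It would leak the open file whenever the stream is not fully drained.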

This is an alternative to #7858 ("Stream CsvDataTableStore.getRows from disk lazily instead of buffering the whole table"), which has the same goal of bounded read-back memory. The difference is the resource model. This PR uses a single owned handle that is closed deterministically at end of input, together with a mandated @MustBeClosed contract. #7858 instead uses flatMap over per-file resource-backed streams, documented with a Javadoc note.
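For contrast, the general shape of a flatMap-based resource model is sketched below. It is a hypothetical illustration of the pattern, not the code from #7858, and `FlatMapRows` is an invented name. Each file becomes its own `Files.lines` stream, which holds that file open, and `Stream.flatMap` is documented to close each mapped stream after its contents have been placed into the outer stream.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical sketch of the flatMap-over-per-file-streams pattern; not #7858's code.
public final class FlatMapRows {

    public static Stream<String> rows(List<Path> files) {
        return files.stream().flatMap(file -> {
            try {
                // Each inner stream owns one file handle; flatMap closes it
                // after its contents have been consumed.
                return Files.lines(file);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }
}
```

Both shapes stream one row at a time. The difference lies in who holds the handle. Here, handle lifetime depends on flatMap's per-inner-stream closing behavior. The single-Spliterator design keeps exactly one handle in its own state, which its `onClose` hook can always reach.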

What's your motivation?

Recipes — and the hosts that run them — can read their own data tables back, e.g. to export or aggregate them, and those tables can get very large (one row per method/class across a large repository). Buffering the entire table into a List before the consumer sees a single row makes peak memory scale with table size; streaming bounds the store side to one row at a time.

Checklist

I've added unit tests to cover both positive and negative cases
I've read and applied the recipe conventions and best practices
I've used the IntelliJ IDEA auto-formatter on affected files

… the returned stream

readRows(...) buffered every matching CSV file fully into a List before returning list.stream(), so reading back a large data table held all of its rows in memory at once. It now streams rows lazily via a single Spliterator that keeps one file open at a time: rows are produced on demand, the file is closed the moment its last row is read (so a fully-drained stream self-closes), and closing the stream early also releases the open file. Because the returned stream now owns a file handle, DataTableStore.getRows is annotated @MustBeClosed (error_prone_annotations, added as compileOnly) so callers are flagged if they consume it without try-with-resources. All call sites are wrapped accordingly. Alternative to #7858.

github-project-automation Bot moved this to In Progress in OpenRewrite Jun 30, 2026

github-project-automation Bot added this to OpenRewrite Jun 30, 2026

moderne-meeseeks Bot assigned jkschneider Jun 30, 2026

jkschneider marked this pull request as ready for review June 30, 2026 19:14

jkschneider merged commit df82349 into main Jun 30, 2026
1 check failed

jkschneider deleted the stream-data-table-rows-lazy branch June 30, 2026 19:15

github-project-automation Bot moved this from In Progress to Done in OpenRewrite Jun 30, 2026

jkschneider merged 1 commit into main from stream-data-table-rows-lazy