
Added datasets.CreateParquetReader #39

Merged
janpfeifer merged 1 commit into main from dataset-create-reader on Apr 8, 2026



@janpfeifer (Contributor)

This PR adds CreateParquetReader to the datasets package.

Summary

  • Added CreateParquetReader[T any](ds *Dataset, config, split string) (*parquet.GenericReader[T], error):
    • It groups all Parquet files for the given config and split.
    • It downloads the files if necessary.
    • It creates a single reader over all of them by combining their row groups.
    • It derives the reader's schema from the generic type T.
  • Updated ListFiles documentation to clarify the order of returned files.
  • Updated docs/CHANGELOG.md.

CreateParquetReader is useful when you need random access, or more flexibility than sequential iteration provides.

janpfeifer merged commit b374c30 into main on Apr 8, 2026
4 checks passed
