named tuple iterator, fixes for nested structures and column name handling#71
Merged
Purge protobuf and thrift conversion of Parquet schemas in preparation for moving to a named tuple representation.
The `ParFile` reader now accepts an optional `map_logical_types`.

`ParFile(path; map_logical_types) => ParFile`

`map_logical_types` can be one of:
- `false`: no mapping is done (default)
- `true`: default mappings are attempted on all columns (bytearray => String, int96 => DateTime)
- A user supplied dict mapping column names to a tuple of type and a converter function
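To make the three accepted forms of `map_logical_types` concrete, here is a minimal standalone sketch of how such an option could be resolved per column. It does not call Parquet.jl and is not the implementation in this PR: `bytes_to_string`, `int96_to_datetime`, `resolve_mapping`, `convert_value`, the use of plain `String` keys for column names, and the `:byte_array` / `:int96` symbols are all hypothetical names chosen for illustration. The int96 converter assumes the common layout of 8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian.

```julia
using Dates

# Hypothetical converters (illustrative only, not Parquet.jl functions).
bytes_to_string(raw::Vector{UInt8}) = String(copy(raw))

# Assumed int96 layout: nanoseconds of day (Int64) + Julian day (Int32), little-endian.
function int96_to_datetime(raw::Vector{UInt8})
    nanos = ltoh(reinterpret(Int64, raw[1:8])[1])
    julian_day = ltoh(reinterpret(Int32, raw[9:12])[1])
    # Julian day 2440588 corresponds to 1970-01-01.
    return DateTime(1970, 1, 1) + Day(julian_day - 2440588) + Millisecond(nanos ÷ 1_000_000)
end

# Return a (type, converter) tuple for a column, or `nothing` when no mapping applies.
function resolve_mapping(map_logical_types, colname::String, physical::Symbol)
    if map_logical_types === false
        return nothing                       # no mapping is done
    elseif map_logical_types === true
        physical === :byte_array && return (String, bytes_to_string)
        physical === :int96 && return (DateTime, int96_to_datetime)
        return nothing                       # no default mapping for other types
    else
        return get(map_logical_types, colname, nothing)  # user supplied dict
    end
end

# Apply the resolved mapping to one raw value, leaving it untouched if unmapped.
function convert_value(map_logical_types, colname::String, physical::Symbol, raw)
    mapping = resolve_mapping(map_logical_types, colname, physical)
    mapping === nothing && return raw
    T, converter = mapping
    return converter(raw)::T
end

# Usage: a user supplied dict that maps only the "name" column.
custom = Dict("name" => (String, bytes_to_string))
convert_value(custom, "name", :byte_array, Vector{UInt8}("abc"))
```

With the dict form, only the listed columns are converted; every other column falls through with its raw value, which mirrors the "mapping column names to a tuple of type and a converter function" description above.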
tanmaykm added a commit to tanmaykm/ParquetFiles.jl that referenced this pull request on May 17, 2020
This [Parquet.jl update](JuliaIO/Parquet.jl#71) will add a named tuple iterator `RecordReader`. We will be able to use that here directly, instead of wrapping over the older `RecCursor`. The new `map_logical_types` option to `ParFile` automatically converts byte arrays to strings, so we do not need to handle that here now.
I have also put up a PR for the corresponding changes needed to ParquetFiles.jl: queryverse/ParquetFiles.jl#25
Closed
`RecordCursor` that gives out named tuples for records. Did not reuse the old name `RecCursor` to avoid confusion.

Ref: #51, should now pass all the failure cases listed there.
cc: @davidanthoff does this look fine?
Some example interactions:
Will update the readme and examples. Will also do some more tests and maybe some refactor and cleanup.