Background
Currently, composite types such as tuples or structs are stored as a single value/blob. This makes access to individual fields inefficient and prevents us from leveraging the benefits of columnar storage (e.g., better compression, vectorized scans, projection pushdown).
Proposal
Adapt the storage layer to support a columnar layout for composite types. The attributes of a tuple/struct should be physically stored as multiple sub-columns, one per field, instead of a single packed record.
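To make the intended layout concrete, here is a minimal sketch of splitting packed records into one value list per leaf field. It is only an illustration, not the storage layer's actual code: the function name `shred`, the use of Python dicts as stand-ins for structs, and the dotted path keys are all assumptions, and it presumes every record has the same shape with no missing or null fields.

```python
from typing import Any


def shred(records: list[dict[str, Any]], prefix: str = "") -> dict[str, list[Any]]:
    """Split struct-like dicts into one list of values per leaf field.

    Hypothetical helper: assumes all records share the same fields and
    that no field is null or missing.
    """
    columns: dict[str, list[Any]] = {}
    first = records[0] if records else {}
    for key in first:
        path = f"{prefix}.{key}" if prefix else key
        values = [record[key] for record in records]
        if isinstance(values[0], dict):
            # Nested struct: recurse so each nested field gets its own sub-column.
            columns.update(shred(values, path))
        else:
            # Leaf field: all values of this field are stored contiguously.
            columns[path] = values
    return columns


rows = [
    {"id": 1, "point": {"x": 1.0, "y": 2.0}},
    {"id": 2, "point": {"x": 3.0, "y": 4.0}},
]
columns = shred(rows)
# A scan that only needs point.x touches a single homogeneous list.
xs = columns["point.x"]
```

With this layout a query over one field reads a single list of same-typed values instead of unpacking every record.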
Example
For a struct Person { name: string, age: int, address: Address { city: string, zip: int } }, the storage would expand into sub-columns:
Person.name
Person.age
Person.address.city
Person.address.zip
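The expansion above can be sketched as a recursive walk over the struct's schema. This is a hedged illustration: `sub_column_paths` is a hypothetical name, the schema is modeled as a nested dict mapping field names to type names, and the dotted naming simply mirrors the example (the actual naming scheme is still an open question).

```python
def sub_column_paths(name: str, schema: dict) -> list[str]:
    """Return the leaf sub-column paths for a struct schema.

    Hypothetical helper: `schema` maps each field name either to a type
    name (str) or to a nested dict describing a nested struct.
    """
    paths: list[str] = []
    for field, field_type in schema.items():
        path = f"{name}.{field}"
        if isinstance(field_type, dict):
            # Nested struct: expand its fields under the current path.
            paths.extend(sub_column_paths(path, field_type))
        else:
            paths.append(path)
    return paths


person_schema = {
    "name": "string",
    "age": "int",
    "address": {"city": "string", "zip": "int"},
}

paths = sub_column_paths("Person", person_schema)
for p in paths:
    print(p)
```

Only leaf fields produce sub-columns here; whether intermediate structs such as Person.address also need their own column (for example to record nulls) depends on how the open questions below are resolved.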
Requirements
Benefits
- Better compression ratio (homogeneous data per column)
- Faster scans when only a subset of fields is needed
- Natural fit for analytical / vectorized execution
Open Questions
- How to encode nested paths in the sub-column naming scheme?
- How to handle null/optional fields at each nesting level?
- How does this interact with existing index structures?