
Support columnar storage layout for composite types (tuple/struct) #478

@zhanglei1949

Background

Currently, composite types such as tuples and structs are stored as a single packed value (blob). Reading one field therefore requires fetching and decoding the whole record, and the packed layout prevents us from leveraging columnar storage benefits (e.g., better compression, vectorized scans, projection pushdown).

Proposal

Adapt the storage layer to support a columnar layout for composite types. The attributes of a tuple/struct should be physically stored as multiple sub-columns, one per field, instead of a single packed record.

Example

For a struct Person { name: string, age: int, address: Address { city: string, zip: int } }, the storage would expand into sub-columns:

  • Person.name
  • Person.age
  • Person.address.city
  • Person.address.zip
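A minimal sketch of the recursive decomposition in the example above, written in Python for illustration only. The `Struct` class, the string type names and the dotted-path naming are assumptions made for this sketch (the naming scheme is still an open question), not an existing API of the storage layer.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


# Hypothetical schema model: a primitive type is a string such as "string"
# or "int"; a composite type is a Struct whose fields keep declaration order.
@dataclass
class Struct:
    name: str
    fields: Dict[str, "FieldType"]


FieldType = Union[str, Struct]


def flatten(prefix: str, ty: FieldType) -> List[Tuple[str, str]]:
    """Recursively expand a type into (sub-column path, primitive type) leaves."""
    if isinstance(ty, str):
        # Primitive field: it becomes exactly one physical sub-column.
        return [(prefix, ty)]
    leaves: List[Tuple[str, str]] = []
    for field_name, field_ty in ty.fields.items():
        # Nested struct fields are decomposed again, extending the dotted path.
        leaves.extend(flatten(f"{prefix}.{field_name}", field_ty))
    return leaves


address = Struct("Address", {"city": "string", "zip": "int"})
person = Struct("Person", {"name": "string", "age": "int", "address": address})

for path, ty in flatten("Person", person):
    print(path, ty)
# Person.name string
# Person.age int
# Person.address.city string
# Person.address.zip int
```

Only leaves become physical sub-columns; `Person.address` itself exists only as a path prefix in this sketch.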

Requirements

  • Support flat tuple/struct types: each field becomes a sub-column
  • Support nested composite types: sub-columns can themselves be decomposed recursively
  • Field-level access should read only the relevant sub-column(s) (projection pushdown)
  • Handle schema evolution (adding/removing fields)
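The requirements above could be prototyped roughly as follows. This is a hypothetical sketch, not the storage layer's API: records are plain Python dicts, every record is assumed to have the same shape with no nulls, and `shred`, `subcolumns_under`, `project`, `add_field` and `drop_field` are illustrative names. It shows a projection reading only the sub-columns it needs, and schema changes touching only the affected sub-columns.

```python
from typing import Any, Dict, Iterable, List, Tuple

Columns = Dict[str, List[Any]]


def shred(records: Iterable[Dict[str, Any]], prefix: str) -> Columns:
    """Split nested dict records into one list (sub-column) per leaf path.

    Assumes every record has the same shape (no missing or null fields).
    """
    columns: Columns = {}

    def walk(path: str, value: Any) -> None:
        if isinstance(value, dict):
            # Composite value: recurse into each field, extending the path.
            for key, child in value.items():
                walk(f"{path}.{key}", child)
        else:
            # Primitive value: append it to the sub-column for this path.
            columns.setdefault(path, []).append(value)

    for record in records:
        walk(prefix, record)
    return columns


def subcolumns_under(columns: Columns, path: str) -> List[str]:
    """Resolve a leaf or nested-struct path to the sub-columns it covers."""
    return [p for p in columns if p == path or p.startswith(path + ".")]


def project(columns: Columns, paths: List[str]) -> List[Tuple[Any, ...]]:
    """Read only the requested sub-columns and stitch them back into rows."""
    return list(zip(*(columns[p] for p in paths)))


def add_field(columns: Columns, path: str, num_rows: int) -> None:
    """Adding a field creates one new sub-column, back-filled with None."""
    columns[path] = [None] * num_rows


def drop_field(columns: Columns, path: str) -> None:
    """Removing a field (or a whole nested struct) drops its sub-columns."""
    for p in subcolumns_under(columns, path):
        del columns[p]


records = [
    {"name": "Ann", "age": 31, "address": {"city": "Oslo", "zip": 150}},
    {"name": "Bo", "age": 27, "address": {"city": "Lima", "zip": 15001}},
]
cols = shred(records, "Person")
print(subcolumns_under(cols, "Person.address"))
# ['Person.address.city', 'Person.address.zip']
print(project(cols, ["Person.name", "Person.age"]))
# [('Ann', 31), ('Bo', 27)]
```

Matching on `path + "."` keeps a request for `Person.address` from accidentally selecting a sibling such as `Person.addressee`.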

Benefits

  • Better compression ratio (homogeneous data per column)
  • Faster scans when only a subset of fields is needed
  • Natural fit for analytical / vectorized execution

Open Questions

  • How to encode nested paths in the sub-column naming scheme?
  • How to handle null/optional fields at each nesting level?
  • Interaction with existing index structures?
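For the null-handling question, one possible answer, borrowed from the Apache Arrow columnar format (where a struct array carries its own validity bitmap in addition to each child's), is a validity flag at every nesting level. The sketch below assumes a single level of nesting with primitive fields; `NullableStructColumn` is a hypothetical name, and Python lists stand in for bitmaps.

```python
from typing import Any, Dict, List, Optional


class NullableStructColumn:
    """Hypothetical columnar layout for an optional struct with primitive fields.

    The struct level keeps its own validity list and every leaf sub-column
    keeps one too, so "address is null" and "address.zip is null" stay
    distinguishable. Every sub-column gets a slot for every row, so row i
    lines up across all of them.
    """

    def __init__(self, field_names: List[str]) -> None:
        self.struct_valid: List[bool] = []
        self.values: Dict[str, List[Any]] = {f: [] for f in field_names}
        self.valid: Dict[str, List[bool]] = {f: [] for f in field_names}

    def append(self, record: Optional[Dict[str, Any]]) -> None:
        self.struct_valid.append(record is not None)
        for name in self.values:
            # A null struct still writes a placeholder slot into each leaf.
            v = None if record is None else record.get(name)
            self.values[name].append(v)
            self.valid[name].append(v is not None)

    def get(self, row: int, name: str) -> Any:
        # Check the outer level first: a null struct makes every field null.
        if not self.struct_valid[row] or not self.valid[name][row]:
            return None
        return self.values[name][row]


addr = NullableStructColumn(["city", "zip"])
addr.append({"city": "Oslo", "zip": 150})
addr.append({"city": "Lima", "zip": None})  # field-level null
addr.append(None)                           # struct-level null
```

With deeper nesting, the same rule would apply recursively: a value is non-null only if every enclosing level is valid.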

Metadata

Labels: store (Storage layer)

Status: To do