Iceberg: support nanosecond timestamps #4462

@jacobmarble

Description

Let's add a materialize-iceberg task-level option to use Iceberg timestamptz_ns column types instead of timestamptz.

Current Landscape

Iceberg v3

Iceberg table format v3 adds support for nanosecond timestamps, serialized as "8-byte little-endian long".
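To make the "8-byte little-endian long" encoding concrete, here is a minimal Python sketch of the round trip. The function names are hypothetical and not part of any connector or of pyiceberg; the sketch only assumes the value is a signed count of nanoseconds since the Unix epoch.

```python
import struct


def encode_timestamp_ns(unix_nanos: int) -> bytes:
    """Pack nanoseconds since the Unix epoch as a signed 8-byte little-endian integer."""
    # "<q" = little-endian, signed 64-bit; raises struct.error if out of int64 range.
    return struct.pack("<q", unix_nanos)


def decode_timestamp_ns(raw: bytes) -> int:
    """Inverse of encode_timestamp_ns (hypothetical helper)."""
    (unix_nanos,) = struct.unpack("<q", raw)
    return unix_nanos
```

Because the value is a signed 64-bit integer, the representable range is fixed by int64, which is what narrows the supported dates.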

The range of values is narrower than the more common microsecond timestamps:

  • microsecond timestamps: 290,308 BCE -> 294,246 CE
  • nanosecond timestamps: 1677 CE -> 2262 CE
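The nanosecond range follows directly from the int64 limits. A small sketch, assuming the value counts nanoseconds from 1970-01-01T00:00:00Z; the helper name is hypothetical, and because Python's datetime only holds microseconds, the last three digits are floored away for display.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def ns_to_datetime_floor(unix_nanos: int) -> datetime:
    """Convert Unix nanoseconds to a datetime, flooring to microseconds."""
    return EPOCH + timedelta(microseconds=unix_nanos // 1000)


# Earliest and latest instants representable as int64 nanoseconds:
earliest = ns_to_datetime_floor(INT64_MIN)  # 1677-09-21T00:12:43.145224+00:00
latest = ns_to_datetime_floor(INT64_MAX)    # 2262-04-11T23:47:16.854775+00:00
```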

Blocker for the materialize-s3-iceberg path: pyiceberg ≤0.11.1 (latest) cannot write to v3 tables. write_manifest_list has no V3 writer, and data_file_statistics_from_parquet_metadata rejects ns-precision Parquet column stats, so Transaction.add_files fails at append time.

Flow

Timestamps are serialized and transferred as RFC3339 strings, which range from 1 CE to 9999 CE and technically have no precision limit. The following connectors explicitly handle nanosecond precision in code:

  • source-oracle — parses TIMESTAMP WITH TIME ZONE with 9-digit fractional seconds: replication.go:29-34, main.go:60-63
  • source-kafka — handles Avro TimestampNanos and LocalTimestampNanos logical types: pull.rs:514-536
  • materialize-snowflake — persists to TIMESTAMP_LTZ/NTZ/TZ at scale 9 via nanosecond-scaled binary decimal encoding: bdec.go:819-832

Note that "parses nanos from source" (Oracle, Kafka) and "persists nanos to a ns-precision column" (Snowflake) are distinct claims — most other materializers can carry a 9-digit RFC3339 string through opaquely but truncate when writing to the destination column type.

Notes

Out-of-range timestamp semantics

Flow's wire format allows year 0001–9999, but timestamptz_ns (int64 Unix nanos) is only valid 1677-09-21 → 2262-04-11. Out-of-range values are clamped to the ns min/max.
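A minimal sketch of the clamping rule, assuming timestamps arrive as RFC3339 strings with up to nine fractional digits. The function name and regex are hypothetical, not the connector's actual code; digits beyond the ninth are truncated, and leap seconds (":60") are not handled.

```python
import re
from datetime import datetime

NS_MIN = -(2**63)    # 1677-09-21T00:12:43.145224192Z
NS_MAX = 2**63 - 1   # 2262-04-11T23:47:16.854775807Z

_RFC3339 = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})[Tt](\d{2}):(\d{2}):(\d{2})"
    r"(?:\.(\d+))?([Zz]|[+-]\d{2}:\d{2})$"
)
_NAIVE_EPOCH = datetime(1970, 1, 1)


def rfc3339_to_clamped_nanos(value: str) -> int:
    """Parse an RFC3339 timestamp to Unix nanoseconds, clamped to the int64 range."""
    m = _RFC3339.match(value)
    if m is None:
        raise ValueError(f"not an RFC3339 timestamp: {value!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    # Pad or truncate the fraction to exactly nine digits (nanoseconds).
    frac_nanos = int((m.group(7) or "")[:9].ljust(9, "0"))
    offset = m.group(8)
    if offset in ("Z", "z"):
        offset_seconds = 0
    else:
        sign = 1 if offset[0] == "+" else -1
        offset_seconds = sign * (int(offset[1:3]) * 3600 + int(offset[4:6]) * 60)
    # Work with naive datetimes and integer seconds so that years near
    # 0001 and 9999 do not overflow during timezone conversion.
    local = datetime(year, month, day, hour, minute, second)
    whole_seconds = (local - _NAIVE_EPOCH).days * 86400 \
        + (local - _NAIVE_EPOCH).seconds - offset_seconds
    nanos = whole_seconds * 1_000_000_000 + frac_nanos
    return max(NS_MIN, min(NS_MAX, nanos))
```

For example, `rfc3339_to_clamped_nanos("0001-01-01T00:00:00Z")` returns `NS_MIN`, while an in-range value such as `"1970-01-01T00:00:00.000000001Z"` returns `1` unchanged. Clamping loses information silently, so a real implementation might also want to log or count clamped values.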

Reader compatibility

timestamptz_ns requires Iceberg v3 readers. Validate writes with a DuckDB-based unit test.

Schema evolution

Iceberg doesn't allow in-place promotion between timestamptz and timestamptz_ns (different physical encodings). The implementation must support bidirectional migration: both timestamptz → timestamptz_ns and timestamptz_ns → timestamptz.
