

@nicosuave
Member

Summary

Adds several capabilities to match dltHub's Iceberg destination:

  • Hard delete support: Rows whose _dlt_deleted_at column is set are permanently deleted during merge operations. Works with both the delete-insert strategy (atomically, in the same transaction) and the upsert strategy.

  • table_location_layout: Control directory structure for table files with patterns like {namespace}/{table_name}.

  • register_new_tables: Auto-register tables found in storage but missing from catalog (local filesystem only for now).

  • Custom partition field names: iceberg_partition.year("ts", "year_field") sets a custom partition field name instead of the auto-generated one.
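As a rough illustration of the hard-delete rule, here is an in-memory model of a delete-insert merge over Python lists of dicts. It is a sketch, not the destination's implementation: `merge_with_hard_delete` is a hypothetical name, it assumes incoming keys are unique, and it ignores Iceberg data files and transactions entirely. It only shows which rows survive: existing rows whose key appears in the new batch are removed, and incoming rows are re-inserted unless their `_dlt_deleted_at` value is set.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def merge_with_hard_delete(
    target: List[Row],
    incoming: List[Row],
    key: str,
    hard_delete_column: str = "_dlt_deleted_at",
) -> List[Row]:
    """In-memory model of a delete-insert merge with hard deletes (unique keys assumed)."""
    incoming_keys = {row[key] for row in incoming}
    # Delete step: remove every existing row whose key appears in the new batch.
    survivors = [row for row in target if row[key] not in incoming_keys]
    # Insert step: re-insert incoming rows unless the hard-delete column is set.
    survivors.extend(row for row in incoming if row.get(hard_delete_column) is None)
    return survivors


target = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
incoming = [
    {"id": 2, "v": "b2", "_dlt_deleted_at": None},
    {"id": 3, "v": "c", "_dlt_deleted_at": "2026-01-10T00:00:00Z"},
]
merged = merge_with_hard_delete(target, incoming, key="id")
print([(r["id"], r["v"]) for r in merged])  # [(1, 'a'), (2, 'b2')]
```

In this simplified model an upsert with unique keys ends with the same surviving rows; the strategies differ in how the destination applies the change, not in the final result.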

Breaking Change

The bucket and truncate partition methods now take their numeric parameter (number of buckets or truncation width) first, to match dltHub's API:

  • Before: iceberg_partition.bucket("column", 10)
  • After: iceberg_partition.bucket(10, "column")
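To make the new argument order and the optional field name concrete, the sketch below models partition helpers as static methods returning a small dataclass. Everything here is hypothetical: `PartitionTransform` is a stand-in whose fields are assumed, `IcebergPartitionSketch` is not the project's `iceberg_partition`, and the default `<column>_<transform>` naming (`ts_year`, `column_bucket`, `name_trunc`) is an assumption about what the auto-generated names look like. The type check in `bucket` and `truncate` is also an addition, showing how a call still using the old `("column", 10)` order could fail fast instead of being misread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartitionTransform:
    """Hypothetical stand-in: source column, transform kind, optional int argument, custom name."""

    column: str
    transform: str
    param: Optional[int] = None
    name: Optional[str] = None

    @property
    def field_name(self) -> str:
        # A custom name wins; otherwise use an assumed "<column>_<transform>" default.
        if self.name is not None:
            return self.name
        suffix = "trunc" if self.transform == "truncate" else self.transform
        return f"{self.column}_{suffix}"


def _require_int(value: object, arg: str, method: str) -> int:
    # Reject the old call order, e.g. bucket("column", 10), instead of silently misreading it.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{method}() takes {arg} first, e.g. {method}(10, 'column')")
    return value


class IcebergPartitionSketch:
    """Hypothetical helpers using the parameter-first order described in this PR."""

    @staticmethod
    def year(column: str, name: Optional[str] = None) -> PartitionTransform:
        return PartitionTransform(column, "year", name=name)

    @staticmethod
    def month(column: str, name: Optional[str] = None) -> PartitionTransform:
        return PartitionTransform(column, "month", name=name)

    @staticmethod
    def bucket(num_buckets: int, column: str, name: Optional[str] = None) -> PartitionTransform:
        n = _require_int(num_buckets, "num_buckets", "bucket")
        return PartitionTransform(column, "bucket", param=n, name=name)

    @staticmethod
    def truncate(width: int, column: str, name: Optional[str] = None) -> PartitionTransform:
        w = _require_int(width, "width", "truncate")
        return PartitionTransform(column, "truncate", param=w, name=name)


print(IcebergPartitionSketch.year("ts", "year_field").field_name)  # year_field
print(IcebergPartitionSketch.year("ts").field_name)  # ts_year
print(IcebergPartitionSketch.bucket(10, "column").field_name)  # column_bucket
```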

Tests

  • Added tests/test_capabilities.py with tests for hard delete, table location layout, and custom partition names
  • All 118 tests pass

…partition names

New capabilities matching dltHub's Iceberg destination:

1. Hard delete support
   - Rows whose _dlt_deleted_at column is set are permanently deleted during merge
   - Works with both delete-insert and upsert strategies
   - Delete-insert executes hard deletes atomically in the same transaction
   - Configurable via hard_delete_column config option

2. table_location_layout
   - Control directory structure for table files
   - Supports patterns: {namespace}, {dataset_name}, {table_name}
   - Relative paths are prefixed with the warehouse location

3. register_new_tables
   - Auto-register tables found in storage but missing from catalog
   - Useful for backward compatibility and recovery scenarios
   - Currently supports local filesystem only

4. Custom partition field names
   - iceberg_partition methods now accept an optional name parameter
   - Example: iceberg_partition.year("ts", "year_field")
   - API changed: bucket(num_buckets, column) and truncate(width, column)
     to match dltHub's argument order
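Items 2 and 3 above can be sketched with the standard library alone. `resolve_table_location` fills the `{namespace}`, `{dataset_name}` and `{table_name}` placeholders and prefixes relative results with the warehouse location; `find_unregistered_tables` walks a local warehouse and reports table directories that have Iceberg metadata files but no catalog entry. Both function names are hypothetical, and the sketch assumes a `<warehouse>/<namespace>/<table>/metadata/` directory layout, treats a path as absolute if it starts with `/` or contains a URI scheme, and picks the newest metadata file by sorting zero-padded file names such as `00001-<uuid>.metadata.json`.

```python
from pathlib import Path
from typing import Dict, Set


def resolve_table_location(
    layout: str, warehouse: str, namespace: str, dataset_name: str, table_name: str
) -> str:
    """Fill a table_location_layout pattern; prefix relative results with the warehouse."""
    path = layout.format(namespace=namespace, dataset_name=dataset_name, table_name=table_name)
    # Assumption: a URI with a scheme or a leading "/" counts as absolute.
    if "://" in path or path.startswith("/"):
        return path
    return f"{warehouse.rstrip('/')}/{path}"


def find_unregistered_tables(warehouse_dir: str, registered: Set[str]) -> Dict[str, str]:
    """Map "<namespace>.<table>" to its newest metadata file for tables missing from the catalog.

    Assumes a local <warehouse>/<namespace>/<table>/metadata/ layout.
    """
    found: Dict[str, str] = {}
    for metadata_dir in sorted(Path(warehouse_dir).glob("*/*/metadata")):
        if not metadata_dir.is_dir():
            continue
        # Zero-padded prefixes (00000-..., 00001-...) sort in version order.
        versions = sorted(metadata_dir.glob("*.metadata.json"))
        table_dir = metadata_dir.parent
        identifier = f"{table_dir.parent.name}.{table_dir.name}"
        if versions and identifier not in registered:
            found[identifier] = str(versions[-1])
    return found


print(resolve_table_location("{namespace}/{table_name}", "file:///tmp/warehouse", "analytics", "analytics", "events"))
# file:///tmp/warehouse/analytics/events
```

A real implementation would then register each reported metadata file with the catalog; that step depends on the catalog client and is left out of this sketch.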
- Accept a single string or a list of strings as the partition parameter
- Strings are converted to identity PartitionTransforms
- Allows mixing strings with explicit PartitionTransform objects
- Example: partition=["region", iceberg_partition.month("created_at")]
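A minimal sketch of the string normalization, assuming a simplified `PartitionTransform` dataclass (the real class's fields are not shown in this PR) and a hypothetical `normalize_partition` helper. A bare string becomes a one-element list, each string becomes an identity transform, and existing transform objects pass through unchanged; accepting a single transform object on its own is an extra assumption. The example builds the month transform directly instead of calling `iceberg_partition.month`, to keep the block self-contained.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Union


@dataclass(frozen=True)
class PartitionTransform:
    """Simplified, hypothetical stand-in for the project's PartitionTransform."""

    column: str
    transform: str = "identity"
    name: Optional[str] = None


PartitionEntry = Union[str, PartitionTransform]


def normalize_partition(
    partition: Union[PartitionEntry, Sequence[PartitionEntry]],
) -> List[PartitionTransform]:
    """Turn a string, a transform, or a mixed list into a list of transforms."""
    # A bare string (or, by assumption, a single transform) becomes a one-element list.
    if isinstance(partition, (str, PartitionTransform)):
        partition = [partition]
    spec: List[PartitionTransform] = []
    for entry in partition:
        if isinstance(entry, str):
            # Plain column names become identity transforms.
            spec.append(PartitionTransform(column=entry))
        elif isinstance(entry, PartitionTransform):
            spec.append(entry)
        else:
            raise TypeError(f"unsupported partition entry: {entry!r}")
    return spec


spec = normalize_partition(["region", PartitionTransform("created_at", "month")])
print([(p.column, p.transform) for p in spec])
# [('region', 'identity'), ('created_at', 'month')]
```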
nicosuave merged commit 5261dad into main on Jan 11, 2026
1 check passed

2 participants