varchar(8000) default in FabricColumn silently truncates strings #384

@sdebruyn

Summary

FabricColumn.TYPE_LABELS maps the generic STRING type to VARCHAR(8000), and FabricColumn.string_type() / string_size() both default to 8000 when no size is given (char_size is None). Fabric Warehouse supports varchar(MAX), so the 8000-byte cap is an unnecessary hard limit: any value longer than that in a string column produced through these code paths is silently truncated.

Evidence (HEAD 0de2190, v1.10.0)

These defaults flow into any code path that asks the adapter what type a string column should be without specifying a width — most commonly, models that copy long-text source columns (JSON payloads, free-text descriptions, serialized blobs) where the column type is inferred rather than declared in a contract.

User impact

  • Any inferred-width string column is hard-capped at a width of 8000, regardless of how wide the source data is. Long-text columns from a source (JSON payloads, free-text fields, serialized blobs) silently lose everything beyond byte 8000 when materialized into a Fabric table.
  • The truncation is silent: no warning, no error, just lost data. Users typically only notice when a downstream query returns visibly cut-off text.
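
To gauge exposure before the default changes, one could check sample values against the current cap. The following is a minimal sketch, assuming the cap is measured in UTF-8 bytes (as the "byte 8000" wording above suggests). `DEFAULT_CAP_BYTES`, `utf8_len` and `exceeds_default_cap` are hypothetical names used for illustration and are not part of the adapter.

```python
# Hypothetical helper, not part of dbt-fabric: flags values that would not
# fit in a column created with the current default width.
DEFAULT_CAP_BYTES = 8000  # the default width discussed in this issue


def utf8_len(value: str) -> int:
    """Return the length of the value in UTF-8 bytes."""
    return len(value.encode("utf-8"))


def exceeds_default_cap(value: str, cap: int = DEFAULT_CAP_BYTES) -> bool:
    """True if the value is longer than the cap (assumed to be in bytes)."""
    return utf8_len(value) > cap


if __name__ == "__main__":
    samples = {
        "short": "x" * 100,
        "ascii_long": "x" * 8001,
        "multibyte": "é" * 4001,  # 2 bytes per character in UTF-8
    }
    for name, value in samples.items():
        print(name, utf8_len(value), exceeds_default_cap(value))
```

Under that byte-based assumption, multi-byte text reaches the cap well before 8000 characters, so a character count alone can understate the risk.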

Suggested fix

Default to varchar(MAX) (Fabric supports it) and only fall back to a fixed width when the user has explicitly requested one.

TYPE_LABELS = {
    ...
    "STRING": "VARCHAR(MAX)",
    "VARCHAR": "VARCHAR(MAX)",
    "NVARCHAR": "VARCHAR(MAX)",
}

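@classmethod
def string_type(cls, size: Optional[int]) -> str:  # Optional is from typing
    # None or a non-positive size (including the -1 sentinel returned by
    # string_size()) means "no explicit width", so use MAX.
    if size is None or size <= 0:
        return "varchar(max)"
    return f"varchar({size})"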

def string_size(self) -> int:
    if not self.is_string():
        raise DbtRuntimeError("Called string_size() on non-string field!")
    if self.char_size is None:
        return -1  # sentinel for MAX
    return int(self.char_size)

can_expand_to needs a small update so the -1 sentinel compares as the widest possible size.
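
One way the sentinel comparison could look, as a rough sketch rather than the change in the linked PR: `ColumnSketch` and `MAX_SENTINEL` are hypothetical stand-ins that model only the string width, and the real `can_expand_to` on FabricColumn may check more than this (for example, whether both columns are strings at all).

```python
from dataclasses import dataclass
from typing import Optional

MAX_SENTINEL = -1  # mirrors the value string_size() returns for MAX above


@dataclass
class ColumnSketch:
    """Hypothetical stand-in for FabricColumn; models string width only."""

    char_size: Optional[int] = None

    def string_size(self) -> int:
        # No explicit width means MAX, reported as the sentinel.
        return MAX_SENTINEL if self.char_size is None else int(self.char_size)

    def can_expand_to(self, other: "ColumnSketch") -> bool:
        own, target = self.string_size(), other.string_size()
        if own == MAX_SENTINEL:
            return False  # already MAX, nothing is wider
        if target == MAX_SENTINEL:
            return True  # any fixed width can grow to MAX
        return target > own  # plain comparison for two fixed widths


if __name__ == "__main__":
    print(ColumnSketch(100).can_expand_to(ColumnSketch(None)))
    print(ColumnSketch(None).can_expand_to(ColumnSketch(8000)))
```

The explicit branches matter because a plain `target > own` comparison would treat -1 as the narrowest size, not the widest.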

A PR with the full change is linked from this issue.
