Skip to content

Incremental --full-refresh drops the target before recreating it — data loss if creation fails #397

@sdebruyn

Description

@sdebruyn

Summary

fabric__incremental (via dbt/include/fabric/macros/materializations/models/incremental/incremental.sql) calls adapter.drop_relation(target_relation) before re-creating the table. If the subsequent CREATE TABLE AS SELECT then fails for any reason — transient Fabric error, query timeout, broken model SQL, capacity issue, OOM, network hiccup — the user is left with no table at all. This is the documented data-loss anti-pattern that every other reference adapter (dbt-postgres, dbt-snowflake, dbt-spark, dbt-bigquery) avoids by using an intermediate-relation + backup-rename swap.

Evidence (HEAD 0de2190, v1.10.0)

dbt/include/fabric/macros/materializations/models/incremental/incremental.sql#L25-L42:

{% if existing_relation is none or full_refresh_mode or existing_relation.is_view %}
    ...
    {{ adapter.drop_relation(target_relation) }}

    {%- call statement('main') -%}
      {{ get_create_table_as_sql(False, target_relation, sql)}}
    {%- endcall -%}
    ...
{% endif %}

The drop happens unconditionally, before the create. There is no rollback path.

Reproduction

  1. Set up a nightly job that runs dbt build --full-refresh against a production Fabric DW.
  2. On a network hiccup, capacity throttling event, or transient Fabric error during the CTAS step, the target table is gone.
  3. Downstream BI dashboards and reports that depend on the table now point at nothing.

User impact

Silent destructive failure mode. A single transient infrastructure error can wipe a production table that downstream consumers depend on. The user only learns about it when the dashboard goes blank or when the next dbt run reports missing dependencies.

Suggested fix

Use the standard dbt-native intermediate + backup + rename swap pattern that every other reference adapter uses:

{% if full_refresh_mode %}
    {% set intermediate_relation = make_intermediate_relation(target_relation) %}
    {% set backup_relation = make_backup_relation(target_relation, target_relation.type) %}
    {% do adapter.drop_relation(intermediate_relation) %}
    {% do adapter.drop_relation(backup_relation) %}
    {% set build_sql = create_table_as(False, intermediate_relation, sql) %}
    {% do run_query(build_sql) %}
    {% if existing_relation is not none %}
        {% do adapter.rename_relation(existing_relation, backup_relation) %}
    {% endif %}
    {% do adapter.rename_relation(intermediate_relation, target_relation) %}
    {% if existing_relation is not none %}
        {% do adapter.drop_relation(backup_relation) %}
    {% endif %}
{% endif %}

This is the same pattern dbt-postgres, dbt-snowflake, and dbt-spark use. The existing target keeps its data if creation fails.

A PR with the change is linked from this issue.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions