Skip to content

[BUG] expect_column_values_to_match_regex ERRORs with Snowflake collated columns #37

@testingtony

Description

@testingtony

Is this a new bug in dbt-expectations?

  • I believe this is a new bug in dbt-expectations
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behaviour

A column with an explicit collation can not be used with the expect_column_values_to_match_regex generic test because Snowflake's own regexp_instr function will not work with collated columns (this is advice from Snowflake)

Expected Behavior

The collation should be removed from the values before passing to regexp_instr. This allows the tests to function as they would on columns with no explicit collation.

Steps To Reproduce

Run the expect_column_values_to_match_regex on a column which has a collation set on it.

Relevant log output

^[[0m05:17:06.210777 [debug] [Thread-1 (]: Began executing node test.collation.dbt_expectations_expect_column_values_to_match_regex_my_first_dbt_model_id3_ci___0_9_.ab17e08a5e                                                                             ^[[0m05:17:06.235444 [debug] [Thread-1 (]: Writing runtime sql for node "test.collation.dbt_expectations_expect_column_values_to_match_regex_my_first_dbt_model_id3_ci___0_9_.ab17e08a5e"                                                                   ^[[0m05:17:06.236834 [debug] [Thread-1 (]: Using snowflake connection "test.collation.dbt_expectations_expect_column_values_to_match_regex_my_first_dbt_model_id3_ci___0_9_.ab17e08a5e"                                                                     ^[[0m05:17:06.237422 [debug] [Thread-1 (]: On test.collation.dbt_expectations_expect_column_values_to_match_regex_my_first_dbt_model_id3_ci___0_9_.ab17e08a5e: select                                                                                             count(*) as failures,                                                                                                                                                                                                                                       count(*) != 0 as should_warn,                                                                                                                                                                                                                               count(*) != 0 as should_error                                                                                                                                                                                                                             from (                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          with grouped_expression as (                                                                                                                                                                                                                                select                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      regexp_instr(id3_ci, '^[0-9]+$', 1, 1, 0, '')                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        > 0                                                                                                                                                                                                                                                         as expression                                                                                                                                                                                                                                              

    from DBT_STUFF.sources.my_first_dbt_model


),
validation_errors as (

    select
        *
    from
        grouped_expression
    where
        not(expression = true)

)

select *
from validation_errors                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              

    ) dbt_internal_test
/* {"app": "dbt", "dbt_version": "1.10.15", "profile_name": "collation", "target_name": "dev", "node_id": "test.collation.dbt_expectations_expect_column_values_to_match_regex_my_first_dbt_model_id3_ci___0_9_.ab17e08a5e"} */
^[[0m05:17:06.338033 [debug] [Thread-1 (]: Snowflake adapter: Snowflake query id: 01c08dfd-0002-77c5-0002-d36600073eb6
^[[0m05:17:06.339635 [debug] [Thread-1 (]: Snowflake adapter: Snowflake error: 002401 (0A000): SQL compilation error: error line 22 at position 0
Function REGEXP_INSTR does not support collation.
^[[0m05:17:06.348622 [debug] [Thread-1 (]: Database Error in test dbt_expectations_expect_column_values_to_match_regex_my_first_dbt_model_id3_ci___0_9_ (models/example/schema.yml)
  002401 (0A000): SQL compilation error: error line 22 at position 0
  Function REGEXP_INSTR does not support collation.
  compiled code at target/run/collation/models/example/schema.yml/dbt_expectations_expect_column_7c0ca407cf266012603135c672999109.sql
^[[0m05:17:06.350537 [error] [Thread-1 (]: 2 of 8 ERROR dbt_expectations_expect_column_values_to_match_regex_my_first_dbt_model_id3_ci___0_9_  [^[[31mERROR^[[0m in 0.15s]

Environment

- OS: Ubuntu 24.04
- Python: 3.10.19
- dbt: 1.10.15
- dbt-expectations: 0.10.9

Which database adapter are you using with dbt?

Snowflake

Additional Context

Notebook demonstrating issue and potential fix

collation_issue.html

Patch file

Based on commit f487b93 (HEAD -> main, origin/main, origin/HEAD)
Merge: 77d3856 110d09d
Author: Guru Mahendran me@gurubavan.com
Date: Tue Sep 9 13:01:08 2025 +0100

collation.patch

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions