SNOW-3203859: 'filter' followed by withColumn(F.concat(...)) adds NaN rows and mismatches values in local testing #4105

@asyncdoggo

1. What version of Python are you using?

Python 3.11.9 (tags/v3.11.9:de54cf5, Apr  2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)]

2. What are the Snowpark Python and pandas versions in the environment?

pandas==3.0.0
snowflake-snowpark-python==1.45.0

3. What did you do?

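In local testing mode, I filtered a DataFrame and then overwrote a column using withColumn(F.concat(...)).
This appears to share a root cause with #3384 (SNOW-2116667), which was fixed in v1.33.0 for F.concat_ws.
F.concat still produces the same split NaN rows (displayed as None) on v1.45.0.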

Complete runnable reproduction:

from snowflake.snowpark import Session, functions as F
from snowflake.snowpark.functions import lit
from snowflake.snowpark.types import StringType, StructField, StructType

s = Session.builder.config("local_testing", True).create()

# 4 rows: TYPE in {A, B}.  LABEL derived from TYPE using F.when.
df = s.create_dataframe(
    [("A", "a1"), ("A", "a2"), ("B", "b1"), ("B", "b2")],
    StructType([StructField("TYPE", StringType()), StructField("NAME", StringType())]),
)
df = df.withColumn("LABEL", F.when(F.col("TYPE") == lit("A"), lit("a")).otherwise(lit("b")))

# Filter to 2 rows where LABEL == "b", then overwrite NAME with F.concat
result = (
    df
    .filter(F.col("LABEL") == lit("b"))
    .withColumn("NAME", F.concat(F.col("TYPE"), F.lit("_x")))
)
result.show()
s.close()

Actual output (4 rows instead of 2):

-------------------------------
|"TYPE"  |"LABEL"  |"NAME"   |
-------------------------------
|B       |b        |None     |
|B       |b        |None     |
|None    |None     |B_x      |
|None    |None     |B_x      |
-------------------------------
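
The shape of this output (two rows with TYPE/LABEL but no NAME, then two rows with only NAME) is what an outer join on mismatched row labels would produce. The sketch below is a plain-Python model of that idea, not the Snowpark local testing code. It assumes the filtered rows keep their original labels (2 and 3) while the F.concat result is relabeled from 0; `align_outer` is a hypothetical helper, and the original NAME column is left out because it is overwritten anyway.

```python
def align_outer(frame, columns, name, new_col):
    """Attach new_col to frame by row label, keeping every label from both sides."""
    # Labels from the frame first, then any labels that only the new column has.
    labels = list(frame) + [lab for lab in new_col if lab not in frame]
    out = []
    for lab in labels:
        row = dict.fromkeys(columns)      # every existing column starts as None
        row.update(frame.get(lab, {}))    # fill in values if the frame has this label
        row[name] = new_col.get(lab)      # None if the new column lacks this label
        out.append(row)
    return out


# Assumption: rows that survive the filter keep their original labels 2 and 3.
filtered = {2: {"TYPE": "B", "LABEL": "b"}, 3: {"TYPE": "B", "LABEL": "b"}}

# Assumption: the concat result is relabeled from 0, so no label matches.
misaligned_name = {0: "B_x", 1: "B_x"}
# The same values carrying the filtered frame's labels.
aligned_name = {2: "B_x", 3: "B_x"}

broken = align_outer(filtered, ["TYPE", "LABEL"], "NAME", misaligned_name)
fixed = align_outer(filtered, ["TYPE", "LABEL"], "NAME", aligned_name)

for row in broken:
    print(row)
```

Under these assumptions `broken` has four rows in the same pattern as the table above, while `fixed` has the two fully populated rows. Whether the local testing backend actually misaligns labels this way for F.concat is not confirmed here.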

4. What did you expect to see?

Two rows, each with NAME populated. Running the same code against an actual Snowflake connection produces:

-------------------------------
|"TYPE"  |"LABEL"  |"NAME"   |
-------------------------------
|B       |b        |B_x      |
|B       |b        |B_x      |
-------------------------------
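
For anyone turning this into a regression test, the difference between the two outputs can be checked mechanically. The following is a minimal sketch using the rows exactly as printed in this issue; `find_problems` is a hypothetical helper, and the column order (TYPE, LABEL, NAME) is taken from the tables above.

```python
def find_problems(rows, expected_count=2):
    """Return human-readable problems found in the result rows."""
    problems = []
    if len(rows) != expected_count:
        problems.append(f"expected {expected_count} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        if any(value is None for value in row):
            problems.append(f"row {i} contains None: {row}")
    return problems


# Rows as shown under "Actual output" (local testing).
local_rows = [
    ("B", "b", None),
    ("B", "b", None),
    (None, None, "B_x"),
    (None, None, "B_x"),
]
# Rows as shown for the live Snowflake connection.
live_rows = [("B", "b", "B_x"), ("B", "b", "B_x")]

print(find_problems(local_rows))
print(find_problems(live_rows))
```

The live rows yield an empty list, while the local testing rows are flagged for both the row count and the missing values. In a real test the rows would presumably come from something like `[tuple(r) for r in result.collect()]` rather than being typed in by hand.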

Labels

bug (Something isn't working); local testing (Local Testing issues/PRs); status-triage_done (Initial triage done, will be further handled by the driver team)
