
Unequal treatment of np.NaNs with operators #54

@fkroeber

Description

When evaluating operators, NaNs (np.nan) are treated inconsistently depending on whether they occur in the array x or in the comparison array y. In operators.py, operators such as "or" first replace nulls in y with zero (utils.null_as_zero(y)) and then apply np.where(pd.notnull(x), ...) without a matching pd.notnull(y) check. As a result, one-sided NaNs are retained if they occur in x, but are replaced by zero (i.e. treated as False) if they occur in y. One consequence is that Boolean operators are no longer commutative (see the reproducible example below).
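The asymmetry can be reproduced with plain NumPy arrays. This is a minimal sketch: null_as_zero below is a hypothetical stand-in for semantique's utils.null_as_zero, assumed to replace NaN with 0, and or_current mirrors the current "or" definition quoted under "Proposed solution".

```python
import numpy as np
import pandas as pd


def null_as_zero(a):
    # Hypothetical stand-in for utils.null_as_zero:
    # assumed to replace NaN with 0 and leave other values unchanged.
    return np.where(pd.isnull(a), 0, a)


def or_current(x, y):
    # Mirrors the current "or" operator: NaNs in y are zeroed,
    # but only NaNs in x are propagated to the output.
    y = null_as_zero(y)
    return np.where(pd.notnull(x), np.logical_or(x, y), np.nan)


x = np.array([1.0, 0.0, np.nan])
y = np.array([np.nan, np.nan, 1.0])

print(or_current(x, y))  # NaNs in y are treated as False and disappear
print(or_current(y, x))  # NaNs in the first operand are kept
```

Swapping the operands changes which positions end up as NaN, so the operator is not commutative.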

Reproducible example

import json
import geopandas as gpd
import semantique as sq
from semantique.processor.core import QueryProcessor

# define recipe
recipe = sq.QueryRecipe()

recipe["res_a"] = sq.reflectance("s2_band02").\
    filter_time("before", sq.time_instant("2019-12-31")).\
    evaluate("or", sq.reflectance("s2_band02"))

recipe["res_b"] = sq.reflectance("s2_band02").\
    evaluate("or", sq.reflectance("s2_band02").filter_time("before", sq.time_instant("2019-12-31")))

# create context
with open("files/mapping.json", "r") as file:
    mapping = sq.mapping.Semantique(json.load(file))

with open("files/layout_gtiff.json", "r") as file:
    dc = sq.datacube.GeotiffArchive(json.load(file), src = "files/layers_gtiff.zip")

space = sq.SpatialExtent(gpd.read_file("files/footprint.geojson"))
time = sq.TemporalExtent("2019-01-01", "2020-12-31")

context = {
    "datacube": dc, 
    "mapping": mapping,
    "space": space,
    "time": time,
    "crs": 3035, 
    "tz": "UTC", 
    "spatial_resolution": [-10, 10],
    "track_types": False
}

# execute recipe
fp = QueryProcessor.parse(recipe, **context)
response = fp.optimize().execute()

# check whether both results are equivalent -> returns False
response["res_a"].equals(response["res_b"])
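Without the demo files, the same effect can be sketched with a small pandas time series. This is a hypothetical stand-in rather than semantique's API: it assumes that filter_time masks out-of-range time steps with NaN instead of dropping them, the reflectance values are made up, and the current "or" operator is re-implemented with a hypothetical null_as_zero helper.

```python
import numpy as np
import pandas as pd


def null_as_zero(a):
    # Hypothetical stand-in for utils.null_as_zero (assumed: NaN -> 0).
    return np.where(pd.isnull(a), 0, a)


def or_current(x, y):
    # Mirrors the current "or" operator quoted in the issue.
    y = null_as_zero(y)
    return np.where(pd.notnull(x), np.logical_or(x, y), np.nan)


# Toy stand-in for the s2_band02 time series (made-up values).
time = pd.to_datetime(["2019-06-01", "2019-11-01", "2020-03-01", "2020-08-01"])
band = pd.Series([0.1, 0.0, 0.2, 0.3], index=time)

# Assumption: filter_time("before", ...) sets later time steps to NaN.
filtered = band.where(band.index < pd.Timestamp("2019-12-31"))

# res_a: filtered operand first; res_b: filtered operand second.
res_a = pd.Series(or_current(filtered.to_numpy(), band.to_numpy()), index=time)
res_b = pd.Series(or_current(band.to_numpy(), filtered.to_numpy()), index=time)

print(res_a.equals(res_b))  # False: the 2020 steps are NaN only in res_a
```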

Expected behavior

Operators should handle NaNs consistently regardless of whether they occur in x or y, so that the operator algebra stays correct (for example, Boolean operators remain commutative).

Proposed solution

Replace operator definitions of the following form

def f(x, y):
    # current: NaNs in y are set to zero, only NaNs in x propagate
    y = utils.null_as_zero(y)
    return np.where(pd.notnull(x), np.logical_or(x, y), np.nan)

with

def f(x, y):
    # proposed: the result is NaN wherever x or y is NaN
    return np.where(pd.notnull(x) & pd.notnull(y), np.logical_or(x, y), np.nan)
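A quick way to check the proposed definition is to compare f(x, y) with f(y, x) on inputs that contain one-sided NaNs. This sketch uses the proposed "or" verbatim and, as a hypothetical illustration only, applies the same masking pattern to "and"; np.array_equal with equal_nan=True treats NaNs in matching positions as equal.

```python
import numpy as np
import pandas as pd


def or_proposed(x, y):
    # Proposed "or": NaN wherever either operand is NaN,
    # so NaN handling no longer depends on operand order.
    return np.where(pd.notnull(x) & pd.notnull(y), np.logical_or(x, y), np.nan)


def and_proposed(x, y):
    # Hypothetical: the same masking pattern applied to "and".
    return np.where(pd.notnull(x) & pd.notnull(y), np.logical_and(x, y), np.nan)


x = np.array([1.0, 0.0, np.nan, 1.0])
y = np.array([np.nan, 0.0, 1.0, 1.0])

for op in (or_proposed, and_proposed):
    # Swapping the operands must give the same result, NaNs included.
    assert np.array_equal(op(x, y), op(y, x), equal_nan=True)
```

Note that this convention returns NaN even where one operand alone would decide the result (e.g. True or NaN); a three-valued (Kleene) logic would return True there instead. Both conventions are commutative; the issue proposes the stricter one.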

Metadata

Labels: bug 🐛 (Something isn't working)
