Description
When the operators are evaluated, NaN values are treated differently depending on whether they occur in the array x or in the comparison array y. The cause is in operators.py: the result is masked with np.where(pd.notnull(x), ...), but there is no complementary check on y. For example, in the `or` operator, y is first passed through utils.null_as_zero. As a result, a NaN that occurs only in x is propagated to the result, whereas a NaN that occurs only in y is replaced by 0 before the operator is applied. One consequence is that Boolean operators are no longer commutative (see the MRE below).
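The asymmetry can also be reproduced without semantique, using plain NumPy arrays. In this sketch, `null_as_zero` and `or_current` are hypothetical stand-ins. It assumes that `utils.null_as_zero` simply replaces NaN with 0, so that `or_current` mirrors the current definition of `or` quoted under "Proposed solution".

```python
import numpy as np
import pandas as pd

def null_as_zero(y):
    # Assumed stand-in for semantique's utils.null_as_zero: replace NaN with 0.
    y = np.asarray(y, dtype=float)
    return np.where(pd.notnull(y), y, 0)

def or_current(x, y):
    # Mirrors the current definition: NaNs in y are zeroed, only NaNs in x survive.
    y = null_as_zero(y)
    return np.where(pd.notnull(x), np.logical_or(x, y), np.nan)

a = np.array([1.0, 0.0, np.nan])
b = np.array([np.nan, np.nan, 0.0])

ab = or_current(a, b)  # NaN only where a is NaN
ba = or_current(b, a)  # NaN only where b is NaN
print(ab, ba)
print(np.array_equal(ab, ba, equal_nan=True))  # False: `or` is not commutative
```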
Reproducible example
import json
import geopandas as gpd
import semantique as sq
from semantique.processor.core import QueryProcessor
# define recipe
recipe = sq.QueryRecipe()
recipe["res_a"] = sq.reflectance("s2_band02").\
    filter_time("before", sq.time_instant("2019-12-31")).\
    evaluate("or", sq.reflectance("s2_band02"))
recipe["res_b"] = sq.reflectance("s2_band02").\
    evaluate("or", sq.reflectance("s2_band02").filter_time("before", sq.time_instant("2019-12-31")))
# create context
with open("files/mapping.json", "r") as file:
    mapping = sq.mapping.Semantique(json.load(file))
with open("files/layout_gtiff.json", "r") as file:
    dc = sq.datacube.GeotiffArchive(json.load(file), src = "files/layers_gtiff.zip")
space = sq.SpatialExtent(gpd.read_file("files/footprint.geojson"))
time = sq.TemporalExtent("2019-01-01", "2020-12-31")
context = {
    "datacube": dc,
    "mapping": mapping,
    "space": space,
    "time": time,
    "crs": 3035,
    "tz": "UTC",
    "spatial_resolution": [-10, 10],
    "track_types": False
}
# execute recipe
fp = QueryProcessor.parse(recipe, **context)
response = fp.optimize().execute()
# check whether both results are equal -> returns False
response["res_a"].equals(response["res_b"])
Expected behavior
NaN values should be handled consistently regardless of whether they occur in x or y, so that the algebraic properties of the operators (such as the commutativity of Boolean operators) hold.
Proposed solution
Replace operator definitions of the form
def f(x, y):
    y = utils.null_as_zero(y)
    return np.where(pd.notnull(x), np.logical_or(x, y), np.nan)
with
def f(x, y):
    return np.where(pd.notnull(x) & pd.notnull(y), np.logical_or(x, y), np.nan)
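A quick check on plain NumPy arrays suggests that the proposed definition is symmetric in x and y. In this sketch, `or_proposed` is a hypothetical name for the new `f`, tested outside semantique.

```python
import numpy as np
import pandas as pd

def or_proposed(x, y):
    # Proposed definition: the result is NaN wherever either operand is NaN.
    return np.where(pd.notnull(x) & pd.notnull(y), np.logical_or(x, y), np.nan)

a = np.array([1.0, 0.0, np.nan, 0.0])
b = np.array([np.nan, 1.0, 0.0, 0.0])

ab = or_proposed(a, b)
ba = or_proposed(b, a)
print(np.array_equal(ab, ba, equal_nan=True))  # True: order of operands no longer matters
```

With this design, NaN is strict in both operands, so `True or NaN` evaluates to NaN rather than True. The results are consistent, at the cost of discarding pixels where only one side is missing.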