Feature/correct interpolation #249
Conversation
Pull request overview
This PR updates Hercules’ input resampling behavior to account for the difference between start-of-period timestamps in input files and instantaneous values used internally, by shifting numeric input samples to interval midpoints prior to interpolation.
Changes:
- Apply midpoint correction for numeric columns during interpolation (`_interpolate_with_polars`).
- Update unit/regression tests to reflect the new interpolation semantics.
- Clarify timing conventions in documentation (inputs = start-of-period averages; internal/outputs = instantaneous).
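For illustration, the midpoint correction at the heart of this PR can be sketched with plain NumPy (a minimal example of the idea, not the Hercules implementation; the data values are invented):

```python
import numpy as np

# Period-averaged samples: each timestamp marks the START of a 1 s period.
times = np.array([0.0, 1.0, 2.0, 3.0])
values = np.array([10.0, 20.0, 30.0, 40.0])

# Midpoint correction: treat each average as the instantaneous value at
# the middle of its period before interpolating.
midpoints = np.empty_like(times)
midpoints[:-1] = (times[:-1] + times[1:]) / 2.0
midpoints[-1] = times[-1] + (times[-1] - times[-2]) / 2.0  # assume last interval width

# Interpolate onto instantaneous query times.
query = np.array([1.0, 1.5, 2.0])
instantaneous = np.interp(query, midpoints, values)
```

Querying at an interval midpoint (here 1.5) recovers that period's average exactly, while querying at a period boundary blends the two adjacent averages.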
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| hercules/utilities.py | Implements midpoint-corrected interpolation and adds `_compute_interval_midpoints`. |
| tests/utilities_test.py | Updates interpolation expectations to validate midpoint behavior. |
| tests/wind_farm_scada_power_test.py | Adjusts expected SCADA replay values at step=1 due to midpoint correction. |
| tests/wind_farm_precom_floris_test.py | Updates wind input expectations by matching internal midpoint interpolation. |
| tests/wind_farm_dynamic_floris_test.py | Updates wind input expectations and adds midpoint-based comparison logic. |
| tests/power_playback_test.py | Updates expected power playback at step=1 to midpoint-averaged value. |
| tests/test_inputs/scada_input.csv | Tweaks SCADA power value to support updated rated power / expectations. |
| tests/regression_tests/solar_pysam_pvwatts_regression_test.py | Updates expected regression arrays after interpolation semantics change. |
| tests/example_regression_tests/example_03_regression_test.py | Updates expected final wind/solar/plant totals for corrected interpolation. |
| tests/example_regression_tests/example_00b_regression_precom_test.py | Updates expected final totals for corrected interpolation. |
| tests/example_regression_tests/example_00_regression_test.py | Updates expected final totals for corrected interpolation. |
| docs/timing.md | Adds explicit "Inputs vs internal values" section and explains midpoint correction. |
| docs/wind.md | Clarifies SCADA `time_utc` is start-of-period and points to timing docs. |
| docs/solar_pv.md | Clarifies solar weather `time_utc` start-of-period interpretation and conversion. |
| docs/power_playback.md | Clarifies playback file timestamps are start-of-period and are midpoint-corrected. |
| docs/output_files.md | Clarifies output values are instantaneous (not period averages). |
| docs/hercules_input.md | Clarifies external data file `time_utc` start-of-period interpretation. |
```python
midpoints = np.empty_like(time_values, dtype=np.float64)
midpoints[:-1] = (time_values[:-1] + time_values[1:]) / 2.0
midpoints[-1] = time_values[-1] + (time_values[-1] - time_values[-2]) / 2.0
return midpoints
```
_compute_interval_midpoints assumes at least 2 time points; with a single-row input (or after upstream filtering) this will raise an IndexError at time_values[-2]. Add an explicit guard for len(time_values) < 2 (e.g., return time_values.copy() or time_values + 0.0) so interpolate_df can handle degenerate/constant inputs gracefully.
This seems like an unlikely edge case; handling by simply returning that scalar value as its own midpoint.
```python
col_values = col_data[col].to_numpy()

midpoints = _compute_interval_midpoints(time_values)
```
_interpolate_with_polars recomputes midpoints inside the loop for every numeric column, even though they only depend on the (sorted) time array. For wide input tables (e.g., many turbines/signals) this is unnecessary overhead; compute the sorted time_values and midpoints once, then reuse them for each numeric column (only extracting col_values per column).
Good suggestion, made that change
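A minimal sketch of that change, using plain NumPy arrays to stand in for the Polars columns (the data and column names here are invented; only the compute-once/reuse structure is the point):

```python
import numpy as np

# Stand-ins for the DataFrame's time axis and numeric columns.
time_values = np.array([0.0, 1.0, 2.0])
columns = {"a": np.array([1.0, 2.0, 3.0]), "b": np.array([10.0, 20.0, 30.0])}
new_time = np.array([0.5, 1.5])

def _compute_interval_midpoints(t):
    m = np.empty_like(t, dtype=np.float64)
    m[:-1] = (t[:-1] + t[1:]) / 2.0
    m[-1] = t[-1] + (t[-1] - t[-2]) / 2.0
    return m

# Midpoints depend only on the (sorted) time axis, so compute them once...
midpoints = _compute_interval_midpoints(time_values)

# ...and reuse them for every numeric column; only col_values is per-column.
result = {"time": new_time}
for col, col_values in columns.items():
    result[col] = np.interp(new_time, midpoints, col_values)
```

For wide tables (many turbines/signals) this avoids recomputing the same midpoint array once per column.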
```
Numeric columns are assumed to represent period-averaged values whose
timestamps mark the start of each period. To recover the best estimate
of the instantaneous value at a query time, each value is assigned to the
midpoint of its interval before interpolating.

Datetime columns (e.g. ``time_utc``) are instantaneous coordinates — they
map simulation time to wall-clock time directly — so they are interpolated
without the midpoint shift.
```
The midpoint-correction behavior is implemented in _interpolate_with_polars, but interpolate_df is the public entrypoint and its docstring still describes generic interpolation. Consider updating interpolate_df’s docstring (and/or module docs) to explicitly mention the midpoint shift for numeric columns so callers don’t assume start-of-period semantics are preserved.
Suggested change:
```diff
-Numeric columns are assumed to represent period-averaged values whose
-timestamps mark the start of each period. To recover the best estimate
-of the instantaneous value at a query time, each value is assigned to the
-midpoint of its interval before interpolating.
-Datetime columns (e.g. ``time_utc``) are instantaneous coordinates — they
-map simulation time to wall-clock time directly — so they are interpolated
-without the midpoint shift.
+This helper implements the interpolation semantics used by the public
+DataFrame interpolation path: numeric columns are midpoint-corrected
+before interpolation, so the returned values do not preserve pure
+start-of-period semantics from the original timestamps.
+Numeric columns are assumed to represent period-averaged values whose
+timestamps mark the start of each period. To recover the best estimate
+of the instantaneous value at a query time, each value is assigned to the
+midpoint of its interval before interpolating.
+Datetime columns (e.g. ``time_utc``) are instantaneous coordinates—they
+map simulation time to wall-clock time directly—so they are interpolated
+against the original timestamps without the midpoint shift.
```
Adding to public docstring
```python
).all()

# Assume df_input represents time stamps indicating start of period.
# Convert to instantanous values with midpoint correction as would be done
```
Typo in comment: "instantanous" → "instantaneous".
Suggested change:
```diff
-# Convert to instantanous values with midpoint correction as would be done
+# Convert to instantaneous values with midpoint correction as would be done
```
hi @genevievestarke and @misi9170 , thanks for the feedback today. Please see the updated PR with a few changes based on our discussion:
genevievestarke
left a comment
I think this is looking really nice @paulf81!! I added some suggestions for the docs, but I think the question about how we want to handle external signals is the most important!
```diff
 # Interpolate using the utility function
-df_interpolated = interpolate_df(df_ext, new_times)
+df_interpolated = interpolate_df(
```
I'm not sure we should hard code this. It makes sense that the LMP prices should be this type of interpolation, but this is also how we input power reference signals. Should those also use the zoh_to_instantaneous method?
I agree this is tricky. I think to date in Hercules we haven't explicitly tracked LMP prices, these are just external signals that end up having signal names that Hycon expects.
One idea is we can add to the Hercules input a dictionary that says how different external channels should be upsampled, with a default to this for backward compatibility. Or not: force explicit and have one more breaking change? @misi9170 any thoughts here?
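Such a per-channel dictionary could look something like this (purely hypothetical: the key `external_data_interpolation`, the channel names, and the helper function are illustrative, not an existing Hercules input option):

```python
# Hypothetical input fragment mapping external channels to an
# interpolation method, with a fallback default for unlisted channels.
external_data_interpolation = {
    "default": "instantaneous_to_instantaneous",
    "lmp_price": "zoh_preprocessed",  # step-wise prices held between updates
    "power_reference": "instantaneous_to_instantaneous",
}

def method_for_channel(channel: str) -> str:
    """Resolve the interpolation method for an external channel."""
    return external_data_interpolation.get(
        channel, external_data_interpolation["default"]
    )
```

A scheme like this would keep LMP-style signals step-wise while letting reference signals interpolate smoothly, without hard-coding either behavior.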
Americanize spelling

Co-authored-by: genevievestarke <103534902+genevievestarke@users.noreply.github.com>
Co-authored-by: genevievestarke <103534902+genevievestarke@users.noreply.github.com>
Co-authored-by: genevievestarke <103534902+genevievestarke@users.noreply.github.com>
Thank you @genevievestarke for your comments! I committed your suggested changes, and the question on upsampling the LMP is a good one, so maybe a little more discussion to go there.
ok @genevievestarke and @misi9170 I removed the ZOH option and added documentation on how ZOH should be done in this framework. Hopefully much clearer now! I think back to ready for review.
Pull request overview
Copilot reviewed 22 out of 22 changed files in this pull request and generated 6 comments.
```diff
 df_pl = pl.from_pandas(df)
-result_pl = pl.DataFrame({"time": new_time})
+# Create a Polars DataFrame for the new time points
+new_time_pl = pl.DataFrame({"time": new_time})
+
+# Start with the time column
+result_pl = new_time_pl

-# Process numeric columns using Polars' interpolation
-if numeric_cols:
-    for col in numeric_cols:
-        # Use Polars' join_asof for efficient interpolation-like behavior
-        # This is more memory efficient than pandas for large datasets
-        col_data = df_pl.select(["time", col]).sort("time")
-
-        # Perform interpolation using Polars operations
-        # Note: Polars doesn't have direct linear interpolation, so we use numpy interp
-        # but with Polars' efficient data extraction
-        time_values = col_data["time"].to_numpy()
-        col_values = col_data[col].to_numpy()
-
-        # Linear interpolation with float32 precision
-        interpolated_values = np.interp(new_time, time_values, col_values).astype(
-            hercules_float_type
-        )
-
-        # Add interpolated column to result
-        result_pl = result_pl.with_columns(pl.lit(interpolated_values).alias(col))
+time_values = df_pl["time"].to_numpy()
+
+if interpolation_method == "averaged_to_instantaneous":
+    x_coords = _compute_interval_midpoints(time_values)
+else:
+    x_coords = time_values
+
+for col in numeric_cols:
+    col_values = df_pl[col].to_numpy()
+    interpolated_values = np.interp(new_time, x_coords, col_values).astype(hercules_float_type)
+    result_pl = result_pl.with_columns(pl.lit(interpolated_values).alias(col))
```
np.interp requires the x-coordinates to be increasing, but time_values (and the corresponding col_values) are no longer sorted for numeric columns. Previously, the code sorted per-column (.sort("time")); now numeric interpolation can silently produce incorrect results or error if df["time"] isn’t strictly increasing. Fix by sorting once by "time" (and applying the same ordering to all numeric columns) before computing x_coords and calling np.interp.
good catch, fixing
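The fix could look roughly like this (a sketch of the sort-once approach the review describes, using NumPy arrays to stand in for the Polars columns; the data is invented):

```python
import numpy as np

time_raw = np.array([2.0, 0.0, 1.0])            # unsorted input times
cols_raw = {"a": np.array([30.0, 10.0, 20.0])}  # values aligned with time_raw
new_time = np.array([0.5, 1.5])

# Sort ONCE by time and apply the same ordering to every numeric column,
# so np.interp's requirement of increasing x-coordinates is satisfied.
order = np.argsort(time_raw)
time_values = time_raw[order]

# Midpoint-correct the sorted time axis.
midpoints = np.empty_like(time_values)
midpoints[:-1] = (time_values[:-1] + time_values[1:]) / 2.0
midpoints[-1] = time_values[-1] + (time_values[-1] - time_values[-2]) / 2.0

# Reuse the same ordering and x-coordinates for each column.
result = {}
for col, raw in cols_raw.items():
    result[col] = np.interp(new_time, midpoints, raw[order])
```

Without the shared `order`, `np.interp` would silently return wrong values for any input whose `time` column is not already increasing.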
```python
}


def interpolate_df(df, new_time, interpolation_method):
```
interpolate_df previously accepted two parameters and now requires interpolation_method with no default, which is a breaking API change for any downstream/internal callers not updated in this PR. If backward compatibility is needed, consider providing a default (and optionally emitting a deprecation warning when omitted) so older call sites keep working while migrating to explicit behavior.
Not truly backward compatible, but this function was not used outside of Hercules
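For reference, the backward-compatible pattern Copilot suggests would look something like this (a sketch only; the PR intentionally keeps the argument required, and the body here is a placeholder):

```python
import warnings

def interpolate_df(df, new_time, interpolation_method=None):
    """Sketch: default the new argument and warn when it is omitted."""
    if interpolation_method is None:
        warnings.warn(
            "interpolate_df() will require 'interpolation_method' in a future "
            "release; defaulting to 'instantaneous_to_instantaneous'.",
            DeprecationWarning,
            stacklevel=2,
        )
        interpolation_method = "instantaneous_to_instantaneous"
    # ... interpolation body unchanged ...
    return interpolation_method  # placeholder return for illustration
```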
```diff
 The CSV file must contain:
-- A `time_utc` column with UTC timestamps in ISO 8601 format
+- A `time_utc` column with UTC timestamps in ISO 8601 format. Each timestamp marks the **start of a reporting period**; values on that row are treated as period averages. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for how Hercules converts these to instantaneous values.
```
This description appears to conflict with HerculesModel._read_external_data_file, which explicitly interpolates external data with "instantaneous_to_instantaneous" (and docs/timing.md also documents external data as instantaneous unless preprocessed for ZOH). Update this line to reflect the external-data convention (instantaneous-to-instantaneous interpolation), and reserve the “start-of-period period-average” wording for wind/solar/SCADA/playback inputs.
Suggested change:
```diff
-- A `time_utc` column with UTC timestamps in ISO 8601 format. Each timestamp marks the **start of a reporting period**; values on that row are treated as period averages. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for how Hercules converts these to instantaneous values.
+- A `time_utc` column with UTC timestamps in ISO 8601 format. Each timestamp represents an **instantaneous sample time** for the values on that row. Hercules interpolates external data using instantaneous-to-instantaneous interpolation. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for the distinction between these external-data inputs and start-of-period period-average inputs such as wind/solar/SCADA/playback data.
```
Good catch, updated
```python
df_input["time"] = np.arange(0, df_input.shape[0], 1)
df_input["time_utc"] = pd.to_datetime(df_input["time_utc"])
df_input_interpolated = interpolate_df(
    df_input,
    np.arange(0, df_input.shape[0], 1),
```
This test mutates df_input in-place (adds "time" and converts "time_utc" dtype). That can create hard-to-debug coupling if the same df_input object is reused later in the test/module/fixture. Prefer working on a copy (e.g., df_input = df_input.copy() or creating a derived frame) before adding columns/type conversions.
Suggested change:
```diff
-df_input["time"] = np.arange(0, df_input.shape[0], 1)
-df_input["time_utc"] = pd.to_datetime(df_input["time_utc"])
-df_input_interpolated = interpolate_df(
-    df_input,
-    np.arange(0, df_input.shape[0], 1),
+df_input_interpolation = df_input.copy()
+df_input_interpolation["time"] = np.arange(0, df_input_interpolation.shape[0], 1)
+df_input_interpolation["time_utc"] = pd.to_datetime(
+    df_input_interpolation["time_utc"]
+)
+df_input_interpolated = interpolate_df(
+    df_input_interpolation,
+    np.arange(0, df_input_interpolation.shape[0], 1),
```
```python
value_points = time_points * 1.7
df = pd.DataFrame({"time": time_points, "value": value_points})

# Query at interval midpoints (0.5, 5.5, 9.5) and end points (0, 10)
```
The comment doesn’t match new_time (it mentions 0.5 and 9.5, but the array contains 5.0 and 10.5). Please update the comment to reflect the actual query points (or adjust new_time to match the comment) to keep the test intent clear.
Suggested change:
```diff
-# Query at interval midpoints (0.5, 5.5, 9.5) and end points (0, 10)
+# Query at endpoints and representative midpoint/boundary points (0.0, 5.0, 5.5, 10.0, 10.5)
```
```python
# ...
# Time 10 is in between last and second last midpoint, so value should be 9
expected_values = [0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
assert np.allclose(result["value"], expected_values), "Interpolated values should match y = x"
```
The assertion message no longer reflects the updated expectation for "averaged_to_instantaneous" (the expected output is not simply y = x anymore due to midpoint shift and clamping). Update the message to describe midpoint-corrected interpolation (or remove the message) so failures are easier to interpret.
Suggested change:
```diff
-assert np.allclose(result["value"], expected_values), "Interpolated values should match y = x"
+assert np.allclose(
+    result["value"], expected_values
+), "Interpolated values should match midpoint-corrected averaged_to_instantaneous output with edge clamping"
```
genevievestarke
left a comment
Looks good, @paulf81!
Just one suggestion for the docs and then we can merge!
```
#### Achieving zero-order-hold (ZOH) behaviour

`interpolate_df` does not provide a dedicated zero-order-hold mode. If you
need step/piecewise-constant semantics -- for example, LMP prices that
```
Suggested change:
```diff
-need step/piecewise-constant semantics -- for example, LMP prices that
+need step/piecewise-constant values -- for example, LMP prices that
```
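The preprocessing approach the ZOH docs describe can be sketched as follows (illustrative only, with invented prices): duplicating each breakpoint with the previous value just before it makes plain linear interpolation reproduce a step signal.

```python
import numpy as np

# Original step signal: each price holds until the next breakpoint.
times = np.array([0.0, 10.0, 20.0])
prices = np.array([30.0, 45.0, 25.0])

# Insert a near-duplicate time just before each breakpoint carrying the
# previous value, so linear interpolation between points yields a step.
eps = 1e-9
zoh_times = np.concatenate([times, times[1:] - eps])
zoh_prices = np.concatenate([prices, prices[:-1]])
order = np.argsort(zoh_times)
zoh_times, zoh_prices = zoh_times[order], zoh_prices[order]

# The price now holds at 30 until just before t=10, then steps to 45.
stepped = np.interp(np.array([5.0, 10.0, 15.0]), zoh_times, zoh_prices)
```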
Thank you @genevievestarke! I think it's good to go on my end now.
Issue #248 was a first flagging of a potential issue arising from the fact that solar, and really all input data files, are assumed to have the `time_utc` column marking the start of the period. As noted in #248, I was first thinking this was an issue to be corrected when calling the solar module specifically, since it directly uses `time_utc` for solar azimuth. However, I now think correcting this in the solar module is wrong, since `time_utc` is the start of the period only in the input files; it is meant to be instantaneous within Hercules.

Therefore I think the more general correction is in this pull request. Specifically, when we interpolate the input files, we need to account for the fact that we are interpolating from data files where `time_utc` marks the start of the time period onto a `time_utc` that is instantaneous time.

This pull request implements a correction where, during interpolation, the input values are moved to the midpoints of their time periods before interpolating. This change is implemented in `_interpolate_with_polars` in `utilities.py`. I believe this is more correct.

The PR also includes changes to the tests to match the new behavior, and updates to the documentation to make more explicit that `time_utc` is start-of-period in inputs, and instantaneous within Hercules and in Hercules output files.