
Feature/correct interpolation#249

Merged
genevievestarke merged 20 commits into NatLabRockies:develop from paulf81:feature/correct_interpolation
Apr 22, 2026
Conversation

@paulf81 paulf81 commented Apr 8, 2026

Issue #248 first flagged a potential problem arising from the fact that solar input files, and really all input data files, are assumed to have the time_utc column marking the start of the period. As noted in #248, I initially thought this should be corrected when calling the solar module specifically, since it directly uses time_utc for solar azimuth. However, I now think correcting this in the solar module is wrong: time_utc marks the start of the period only in the input files, while within Hercules it is meant to be instantaneous.

Therefore I think the more general correction is in this pull request. Specifically, when we interpolate the input files, we need to account for the fact that we are interpolating from data files where time_utc marks the start of the time period onto a time_utc that is an instantaneous time.

This pull request implements a correction where, during interpolation, the input values are shifted to the midpoints of their time periods before interpolating. This change is implemented in _interpolate_with_polars in utilities.py. I believe this is more correct.

The PR also includes changes to the tests to match the new behavior, and updates to the documentation to make it explicit that time_utc marks the start of the period in input files and is instantaneous within Hercules and in Hercules output files.
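The midpoint correction described above can be sketched as follows (a hypothetical standalone version for illustration; the actual change lives in _interpolate_with_polars in utilities.py and may differ in detail):

```python
import numpy as np

def interpolate_averaged_to_instantaneous(time_values, col_values, new_time):
    """Sketch: each sample is a period average stamped at the period start.
    Shift every sample to the midpoint of its interval, then interpolate
    linearly to estimate instantaneous values at the query times."""
    midpoints = np.empty_like(time_values, dtype=np.float64)
    midpoints[:-1] = (time_values[:-1] + time_values[1:]) / 2.0
    # The final interval's width is taken from the preceding interval
    midpoints[-1] = time_values[-1] + (time_values[-1] - time_values[-2]) / 2.0
    return np.interp(new_time, midpoints, col_values)

# Period averages stamped at t = 0, 1, 2 become samples at t = 0.5, 1.5, 2.5
print(interpolate_averaged_to_instantaneous(
    np.array([0.0, 1.0, 2.0]), np.array([0.0, 1.0, 2.0]), np.array([1.0])
))  # [0.5]
```

Note that np.interp clamps queries outside the midpoint range to the edge values, which matches the edge behavior exercised by the updated tests.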

@paulf81 paulf81 self-assigned this Apr 8, 2026
@paulf81 paulf81 added bug Something isn't working enhancement New feature or request labels Apr 8, 2026

Copilot AI left a comment


Pull request overview

This PR updates Hercules’ input resampling behavior to account for the difference between start-of-period timestamps in input files and instantaneous values used internally, by shifting numeric input samples to interval midpoints prior to interpolation.

Changes:

  • Apply midpoint correction for numeric columns during interpolation (_interpolate_with_polars).
  • Update unit/regression tests to reflect the new interpolation semantics.
  • Clarify timing conventions in documentation (inputs = start-of-period averages; internal/outputs = instantaneous).

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| hercules/utilities.py | Implements midpoint-corrected interpolation and adds _compute_interval_midpoints. |
| tests/utilities_test.py | Updates interpolation expectations to validate midpoint behavior. |
| tests/wind_farm_scada_power_test.py | Adjusts expected SCADA replay values at step=1 due to midpoint correction. |
| tests/wind_farm_precom_floris_test.py | Updates wind input expectations by matching internal midpoint interpolation. |
| tests/wind_farm_dynamic_floris_test.py | Updates wind input expectations and adds midpoint-based comparison logic. |
| tests/power_playback_test.py | Updates expected power playback at step=1 to midpoint-averaged value. |
| tests/test_inputs/scada_input.csv | Tweaks SCADA power value to support updated rated power / expectations. |
| tests/regression_tests/solar_pysam_pvwatts_regression_test.py | Updates expected regression arrays after interpolation semantics change. |
| tests/example_regression_tests/example_03_regression_test.py | Updates expected final wind/solar/plant totals for corrected interpolation. |
| tests/example_regression_tests/example_00b_regression_precom_test.py | Updates expected final totals for corrected interpolation. |
| tests/example_regression_tests/example_00_regression_test.py | Updates expected final totals for corrected interpolation. |
| docs/timing.md | Adds explicit "Inputs vs internal values" section and explains midpoint correction. |
| docs/wind.md | Clarifies SCADA time_utc is start-of-period and points to timing docs. |
| docs/solar_pv.md | Clarifies solar weather time_utc start-of-period interpretation and conversion. |
| docs/power_playback.md | Clarifies playback file timestamps are start-of-period and are midpoint-corrected. |
| docs/output_files.md | Clarifies output values are instantaneous (not period averages). |
| docs/hercules_input.md | Clarifies external data file time_utc start-of-period interpretation. |


Comment thread hercules/utilities.py
Comment on lines +567 to +570
```python
midpoints = np.empty_like(time_values, dtype=np.float64)
midpoints[:-1] = (time_values[:-1] + time_values[1:]) / 2.0
midpoints[-1] = time_values[-1] + (time_values[-1] - time_values[-2]) / 2.0
return midpoints
```

Copilot AI Apr 8, 2026


_compute_interval_midpoints assumes at least 2 time points; with a single-row input (or after upstream filtering) this will raise an IndexError at time_values[-2]. Add an explicit guard for len(time_values) < 2 (e.g., return time_values.copy() or time_values + 0.0) so interpolate_df can handle degenerate/constant inputs gracefully.

paulf81 (Collaborator, Author) replied:

This seems like an unlikely edge case; handling it by simply returning the scalar value as its own midpoint.

Comment thread hercules/utilities.py Outdated
Comment on lines +516 to +525
col_values = col_data[col].to_numpy()

midpoints = _compute_interval_midpoints(time_values)


Copilot AI Apr 8, 2026


_interpolate_with_polars recomputes midpoints inside the loop for every numeric column, even though they only depend on the (sorted) time array. For wide input tables (e.g., many turbines/signals) this is unnecessary overhead; compute the sorted time_values and midpoints once, then reuse them for each numeric column (only extracting col_values per column).

paulf81 (Collaborator, Author) replied:

Good suggestion, made that change

Comment thread hercules/utilities.py Outdated
Comment on lines +484 to +491
Numeric columns are assumed to represent period-averaged values whose
timestamps mark the start of each period. To recover the best estimate
of the instantaneous value at a query time, each value is assigned to the
midpoint of its interval before interpolating.

Datetime columns (e.g. ``time_utc``) are instantaneous coordinates — they
map simulation time to wall-clock time directly — so they are interpolated
without the midpoint shift.

Copilot AI Apr 8, 2026


The midpoint-correction behavior is implemented in _interpolate_with_polars, but interpolate_df is the public entrypoint and its docstring still describes generic interpolation. Consider updating interpolate_df’s docstring (and/or module docs) to explicitly mention the midpoint shift for numeric columns so callers don’t assume start-of-period semantics are preserved.

Suggested change

Removed:

Numeric columns are assumed to represent period-averaged values whose
timestamps mark the start of each period. To recover the best estimate
of the instantaneous value at a query time, each value is assigned to the
midpoint of its interval before interpolating.
Datetime columns (e.g. ``time_utc``) are instantaneous coordinates; they
map simulation time to wall-clock time directly, so they are interpolated
without the midpoint shift.

Added:

This helper implements the interpolation semantics used by the public
DataFrame interpolation path: numeric columns are midpoint-corrected
before interpolation, so the returned values do not preserve pure
start-of-period semantics from the original timestamps.
Numeric columns are assumed to represent period-averaged values whose
timestamps mark the start of each period. To recover the best estimate
of the instantaneous value at a query time, each value is assigned to the
midpoint of its interval before interpolating.
Datetime columns (e.g. ``time_utc``) are instantaneous coordinates; they
map simulation time to wall-clock time directly, so they are interpolated
against the original timestamps without the midpoint shift.

paulf81 (Collaborator, Author) replied:

Adding to public docstring

Comment thread tests/wind_farm_dynamic_floris_test.py Outdated
).all()

# Assume df_input represents time stamps indicating start of period.
# Convert to instantanous values with midpoint correction as would be done

Copilot AI Apr 8, 2026


Typo in comment: "instantanous" → "instantaneous".

Suggested change
# Convert to instantanous values with midpoint correction as would be done
# Convert to instantaneous values with midpoint correction as would be done

paulf81 (Collaborator, Author) replied:

fixed


paulf81 commented Apr 13, 2026

hi @genevievestarke and @misi9170, thanks for the feedback today. Please see the updated PR with a few changes based on our discussion:

  1. Combined the public and private interpolate functions; this just seemed needlessly complicated when we were reviewing, so I thought I'd get it done.
  2. interpolate_df now accepts an input specifying what type of interpolation should be done. This is mandatory, figuring a default just opened up opportunity for error. The wind/solar inputs use averaged_to_instantaneous while external_data uses ZOH.
  3. Added a new test for the ZOH method.
  4. Minor updates to tests to include the new required parameter in the function calls.
  5. Updates to docs to reflect new behaviors and assumptions.


@genevievestarke genevievestarke left a comment


I think this is looking really nice @paulf81!! I added some suggestions for the docs, but I think the question about how we want to handle external signals is the most important!

Comment thread hercules/utilities.py Outdated
Comment thread docs/timing.md Outdated
Comment thread docs/timing.md Outdated

# Interpolate using the utility function
df_interpolated = interpolate_df(df_ext, new_times)
df_interpolated = interpolate_df(
genevievestarke (Collaborator) commented:

I'm not sure we should hard code this. It makes sense that the LMP prices should be this type of interpolation, but this is also how we input power reference signals. Should those also use the zoh_to_instantaneous method?

paulf81 (Collaborator, Author) replied:

I agree this is tricky. I think to date in Hercules we haven't explicitly tracked LMP prices; these are just external signals that end up having signal names that Hycon expects.

One idea is we could add to the Hercules input a dictionary that says how different external channels should be upsampled, with a default for backwards compatibility. Or not, and force explicit specification with one more breaking change? @misi9170 any thoughts here?

paulf81 and others added 3 commits April 17, 2026 14:39
Americanize spelling

Co-authored-by: genevievestarke <103534902+genevievestarke@users.noreply.github.com>
Co-authored-by: genevievestarke <103534902+genevievestarke@users.noreply.github.com>
Co-authored-by: genevievestarke <103534902+genevievestarke@users.noreply.github.com>

paulf81 commented Apr 17, 2026

Thank you @genevievestarke for your comments! I committed your suggested changes. The question on upsampling the LMP is a good one, so maybe a little more discussion to go there.


paulf81 commented Apr 21, 2026

ok @genevievestarke and @misi9170, I removed the ZOH option and added documentation on how ZOH should be done in this framework. Hopefully much clearer now! I think it's back to ready for review.


Copilot AI left a comment


Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 6 comments.



Comment thread hercules/utilities.py Outdated
Comment on lines +515 to +528
Before:

```python
df_pl = pl.from_pandas(df)

# Create a Polars DataFrame for the new time points
new_time_pl = pl.DataFrame({"time": new_time})

# Start with the time column
result_pl = new_time_pl

# Process numeric columns using Polars' interpolation
if numeric_cols:
    for col in numeric_cols:
        # Use Polars' join_asof for efficient interpolation-like behavior
        # This is more memory efficient than pandas for large datasets
        col_data = df_pl.select(["time", col]).sort("time")

        # Perform interpolation using Polars operations
        # Note: Polars doesn't have direct linear interpolation, so we use numpy interp
        # but with Polars' efficient data extraction
        time_values = col_data["time"].to_numpy()
        col_values = col_data[col].to_numpy()

        # Linear interpolation with float32 precision
        interpolated_values = np.interp(new_time, time_values, col_values).astype(
            hercules_float_type
        )

        # Add interpolated column to result
        result_pl = result_pl.with_columns(pl.lit(interpolated_values).alias(col))
```

After:

```python
df_pl = pl.from_pandas(df)
result_pl = pl.DataFrame({"time": new_time})

time_values = df_pl["time"].to_numpy()

if interpolation_method == "averaged_to_instantaneous":
    x_coords = _compute_interval_midpoints(time_values)
else:
    x_coords = time_values

for col in numeric_cols:
    col_values = df_pl[col].to_numpy()
    interpolated_values = np.interp(new_time, x_coords, col_values).astype(hercules_float_type)
    result_pl = result_pl.with_columns(pl.lit(interpolated_values).alias(col))
```

Copilot AI Apr 21, 2026


np.interp requires the x-coordinates to be increasing, but time_values (and the corresponding col_values) are no longer sorted for numeric columns. Previously, the code sorted per-column (.sort("time")); now numeric interpolation can silently produce incorrect results or error if df["time"] isn’t strictly increasing. Fix by sorting once by "time" (and applying the same ordering to all numeric columns) before computing x_coords and calling np.interp.

paulf81 (Collaborator, Author) replied:

good catch, fixing
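The fix can be sketched as sorting once by time and reusing that ordering for every numeric column (shown with pandas for brevity; the actual implementation uses Polars, and sort_then_interpolate is a hypothetical name):

```python
import numpy as np
import pandas as pd

def sort_then_interpolate(df, numeric_cols, new_time):
    # Sort once so np.interp's requirement of increasing x-coordinates
    # holds, then reuse the sorted time array for every numeric column.
    df_sorted = df.sort_values("time")
    time_values = df_sorted["time"].to_numpy()
    return {
        col: np.interp(new_time, time_values, df_sorted[col].to_numpy())
        for col in numeric_cols
    }
```

Sorting the whole frame once keeps all column values aligned with the time ordering, which is what the per-column `.sort("time")` previously guaranteed.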

Comment thread hercules/utilities.py
}


def interpolate_df(df, new_time, interpolation_method):

Copilot AI Apr 21, 2026


interpolate_df previously accepted two parameters and now requires interpolation_method with no default, which is a breaking API change for any downstream/internal callers not updated in this PR. If backward compatibility is needed, consider providing a default (and optionally emitting a deprecation warning when omitted) so older call sites keep working while migrating to explicit behavior.

paulf81 (Collaborator, Author) replied:

Not truly backward compatible, but this function was not used outside of Hercules

Comment thread docs/hercules_input.md Outdated

The CSV file must contain:
- A `time_utc` column with UTC timestamps in ISO 8601 format
- A `time_utc` column with UTC timestamps in ISO 8601 format. Each timestamp marks the **start of a reporting period**; values on that row are treated as period averages. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for how Hercules converts these to instantaneous values.

Copilot AI Apr 21, 2026


This description appears to conflict with HerculesModel._read_external_data_file, which explicitly interpolates external data with "instantaneous_to_instantaneous" (and docs/timing.md also documents external data as instantaneous unless preprocessed for ZOH). Update this line to reflect the external-data convention (instantaneous-to-instantaneous interpolation), and reserve the “start-of-period period-average” wording for wind/solar/SCADA/playback inputs.

Suggested change
- A `time_utc` column with UTC timestamps in ISO 8601 format. Each timestamp marks the **start of a reporting period**; values on that row are treated as period averages. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for how Hercules converts these to instantaneous values.
- A `time_utc` column with UTC timestamps in ISO 8601 format. Each timestamp represents an **instantaneous sample time** for the values on that row. Hercules interpolates external data using instantaneous-to-instantaneous interpolation. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for the distinction between these external-data inputs and start-of-period period-average inputs such as wind/solar/SCADA/playback data.

paulf81 (Collaborator, Author) replied:

Good catch, updated

Comment on lines +59 to +63
df_input["time"] = np.arange(0, df_input.shape[0], 1)
df_input["time_utc"] = pd.to_datetime(df_input["time_utc"])
df_input_interpolated = interpolate_df(
df_input,
np.arange(0, df_input.shape[0], 1),

Copilot AI Apr 21, 2026


This test mutates df_input in-place (adds "time" and converts "time_utc" dtype). That can create hard-to-debug coupling if the same df_input object is reused later in the test/module/fixture. Prefer working on a copy (e.g., df_input = df_input.copy() or creating a derived frame) before adding columns/type conversions.

Suggested change
df_input["time"] = np.arange(0, df_input.shape[0], 1)
df_input["time_utc"] = pd.to_datetime(df_input["time_utc"])
df_input_interpolated = interpolate_df(
df_input,
np.arange(0, df_input.shape[0], 1),
df_input_interpolation = df_input.copy()
df_input_interpolation["time"] = np.arange(0, df_input_interpolation.shape[0], 1)
df_input_interpolation["time_utc"] = pd.to_datetime(
df_input_interpolation["time_utc"]
)
df_input_interpolated = interpolate_df(
df_input_interpolation,
np.arange(0, df_input_interpolation.shape[0], 1),

Comment thread tests/utilities_test.py
value_points = time_points * 1.7
df = pd.DataFrame({"time": time_points, "value": value_points})

# Query at interval midpoints (0.5, 5.5, 9.5) and end points (0, 10)

Copilot AI Apr 21, 2026


The comment doesn’t match new_time (it mentions 0.5 and 9.5, but the array contains 5.0 and 10.5). Please update the comment to reflect the actual query points (or adjust new_time to match the comment) to keep the test intent clear.

Suggested change
# Query at interval midpoints (0.5, 5.5, 9.5) and end points (0, 10)
# Query at endpoints and representative midpoint/boundary points (0.0, 5.0, 5.5, 10.0, 10.5)

Comment thread tests/utilities_test.py Outdated
# ...
# Time 10 is in between last and second last midpoint, so value should be 9
expected_values = [0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
assert np.allclose(result["value"], expected_values), "Interpolated values should match y = x"

Copilot AI Apr 21, 2026


The assertion message no longer reflects the updated expectation for "averaged_to_instantaneous" (the expected output is not simply y = x anymore due to midpoint shift and clamping). Update the message to describe midpoint-corrected interpolation (or remove the message) so failures are easier to interpret.

Suggested change
assert np.allclose(result["value"], expected_values), "Interpolated values should match y = x"
assert np.allclose(
result["value"], expected_values
), "Interpolated values should match midpoint-corrected averaged_to_instantaneous output with edge clamping"

paulf81 (Collaborator, Author) replied:

Fixing assertion


@genevievestarke genevievestarke left a comment


Looks good, @paulf81!
Just one suggestion for the docs and then we can merge!

Comment thread docs/timing.md Outdated
#### Achieving zero-order-hold (ZOH) behaviour

`interpolate_df` does not provide a dedicated zero-order-hold mode. If you
need step/piecewise-constant semantics -- for example, LMP prices that
genevievestarke (Collaborator) commented:

Suggested change
need step/piecewise-constant semantics -- for example, LMP prices that
need step/piecewise-constant values -- for example, LMP prices that

paulf81 (Collaborator, Author) replied:

fixed, thanks @genevievestarke!
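One way to get the step behavior the docs describe from a linear interpolator is to preprocess the data so each value also appears just before the next breakpoint (a sketch only; expand_for_zoh, hold_until, and the eps offset are illustrative and not part of Hercules):

```python
import numpy as np
import pandas as pd

def expand_for_zoh(df, hold_until):
    """Duplicate each row just before the next timestamp so that linear
    interpolation over the expanded samples reproduces a step signal."""
    eps = 1e-9  # offset assumed much smaller than the simulation timestep
    end_times = np.append(df["time"].to_numpy()[1:], hold_until) - eps
    df_end = df.copy()
    df_end["time"] = end_times
    return pd.concat([df, df_end]).sort_values("time").reset_index(drop=True)

# An LMP-style price held at 5 until t=10, then at 7 until t=20
prices = pd.DataFrame({"time": [0.0, 10.0], "price": [5.0, 7.0]})
expanded = expand_for_zoh(prices, hold_until=20.0)
```

Linear interpolation of the expanded frame then returns 5 anywhere in [0, 10) and 7 in [10, 20), which is the piecewise-constant behavior wanted for prices.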


paulf81 commented Apr 21, 2026

Looks good, @paulf81! Just one suggestion for the docs and then we can merge!

Thank you @genevievestarke ! I think it's good to go on my end now

@genevievestarke genevievestarke merged commit 54b94f1 into NatLabRockies:develop Apr 22, 2026
3 checks passed
