diff --git a/docs/hercules_input.md b/docs/hercules_input.md index 3f674c5a..982066ed 100644 --- a/docs/hercules_input.md +++ b/docs/hercules_input.md @@ -131,7 +131,7 @@ The old format is still supported for backward compatibility but will show a dep ### External Data File Format The CSV file must contain: -- A `time_utc` column with UTC timestamps in ISO 8601 format +- A `time_utc` column with UTC timestamps in ISO 8601 format. Unlike wind/solar/SCADA/playback inputs (which are treated as start-of-period period averages), external data values are treated as **instantaneous** samples at their timestamps and are upsampled to the simulation time grid via `"instantaneous_to_instantaneous"` (linear interpolation). If you need zero-order-hold (piecewise-constant) behaviour -- e.g. for LMP prices -- pre-process the file to include an extra row at the end of each interval carrying the previous value; see [Achieving zero-order-hold (ZOH) behaviour](timing.md#achieving-zero-order-hold-zoh-behaviour) and the [`generate_locational_marginal_price_dataframe_from_gridstatus`](../hercules/grid/grid_utilities.py) helper. - One or more data columns with external signals. Note that the names of the other columns are arbitrary; any column names will be carried forward and interpolated. However, the values must be floats. Additionally, some controllers and plotting utilities that work on external signals may require specific column names like `lmp_rt`, `lmp_da`, `wind_forecast`, etc. Example `lmp_data.csv`: diff --git a/docs/output_files.md b/docs/output_files.md index a2c33148..1f872792 100644 --- a/docs/output_files.md +++ b/docs/output_files.md @@ -2,6 +2,8 @@ Hercules generates HDF5 output files containing simulation data for analysis and visualization. This page describes the file format, available utilities for reading the data, and how HerculesModel generates these files. +All values in output files represent **instantaneous** quantities at each time step, not period averages. 
This differs from the convention used by input data files, where timestamps mark the start of a reporting period. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for details on this distinction and the midpoint correction applied during input interpolation. + ## File Format Hercules outputs simulation data in HDF5 (Hierarchical Data Format 5) format. diff --git a/docs/power_playback.md b/docs/power_playback.md index d231883c..4c68d8ba 100644 --- a/docs/power_playback.md +++ b/docs/power_playback.md @@ -32,7 +32,7 @@ power_unit_1: The input file must contain the following columns: -- `time_utc`: Timestamps in UTC (ISO 8601 format or parseable datetime strings) +- `time_utc`: Timestamps in UTC (ISO 8601 format or parseable datetime strings). Each timestamp marks the **start of a reporting period**; the power value on that row is treated as the period average. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for how Hercules converts these to instantaneous values. - `power`: Power output in kW Supported file formats: `.csv`, `.p`, `.pkl` (pickle), `.f`, `.ftr` (feather). diff --git a/docs/solar_pv.md b/docs/solar_pv.md index f30d8cf7..c587266d 100644 --- a/docs/solar_pv.md +++ b/docs/solar_pv.md @@ -12,7 +12,7 @@ Presently only one solar simulator is available Both models require an input weather file: 1. A CSV file that specifies the weather conditions (e.g. NonAnnualSimulation-sample_data-interpolated-daytime.csv). This file should include: - - timestamp (see [timing](timing.md) for time format requirements) + - timestamp (see [timing](timing.md) for time format requirements). Each `time_utc` timestamp marks the **start of a reporting period**; irradiance and weather values on that row are treated as period averages. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for how Hercules converts these to instantaneous values. 
- direct normal irradiance (DNI) - diffuse horizontal irradiance (DHI) - global horizontal irradiance (GHI) diff --git a/docs/timing.md b/docs/timing.md index 795ebbb4..29e8f5f6 100644 --- a/docs/timing.md +++ b/docs/timing.md @@ -9,6 +9,105 @@ Timing in Hercules is specified using two complementary representations: - `time` (float): Simulation time in seconds, where `time=0` corresponds to `starttime_utc` - `time_utc` (datetime): Absolute UTC timestamp +## Time Interpretation: Inputs vs. Internal Values + +### Input files: start-of-period convention + +In external data sources such as weather files, SCADA records, and resource +databases, each `time_utc` timestamp marks the **beginning** of a reporting +period and the associated values (irradiance, wind speed, power, etc.) +represent an average or aggregate over that period. For example, an hourly +weather file with a row at `2020-06-15T12:00:00Z` and GHI = 735 W/m² means +that 735 W/m² is the average GHI from 12:00 to 13:00. + +### Hercules internal values: instantaneous convention + +Inside the simulation, values at a given time step represent **instantaneous** +quantities at that moment. All Hercules output values follow this same +instantaneous convention. + +### Interpolation methods + +The `interpolate_df` function in `utilities.py` accepts a mandatory +`interpolation_method` parameter that controls how numeric columns are +resampled onto the simulation time grid. Two methods are available: + +#### `"averaged_to_instantaneous"` (wind, solar, and similar resource and power signals) + +Input values are period averages whose timestamps mark the **start** of each +period. The best single-point estimate of a period-averaged value is at the +**midpoint** of its interval, not the start. For example, the hourly average +from 12:00-13:00 is most representative of conditions at 12:30. This also ensures that an average of the signal back to the original time interval will match the original data. + +1. 
Each numeric value is assigned to the midpoint of its input interval + (using `_compute_interval_midpoints`). +2. Linear interpolation is then performed between these midpoints to produce + values at the simulation time steps. + +``` +Input file (start-of-period): + +time_utc value +12:00 100 ← average over [12:00, 13:00) +13:00 200 ← average over [13:00, 14:00) + +After midpoint correction: + +time value +12:30 100 ← midpoint of [12:00, 13:00) +13:30 200 ← midpoint of [13:00, 14:00) + +Querying at 13:00 yields 150 (halfway between midpoints). +``` + +#### `"instantaneous_to_instantaneous"` + +Input values already represent instantaneous measurements at their +timestamps. Standard linear interpolation is performed directly on the +original timestamps with no midpoint shift. + +--- + +In both methods, datetime columns (e.g. `time_utc`) are linearly +interpolated on the raw timestamps without any shift, because they are +instantaneous coordinate mappings between simulation time and wall-clock +time, not period-averaged measurements. + +#### Achieving zero-order-hold (ZOH) behaviour + +`interpolate_df` does not provide a dedicated zero-order-hold mode. If you +need step/piecewise-constant values -- for example, LMP prices that +should be held constant across each reporting interval -- pre-process your +input data to include an additional row at the end of each interval that +carries the same value as the start-of-interval row, and then use +`"instantaneous_to_instantaneous"`. Linear interpolation between each pair +of identical endpoints reproduces the ZOH shape. + +``` +Original data (start-of-interval only): + +time_utc value +12:00 100 +13:00 200 + +After inserting end-of-interval rows (just before the next start): + +time_utc value +12:00 100 +12:59:59 100 ← added endpoint +13:00 200 +13:59:59 200 ← added endpoint + +Querying at 12:30 with "instantaneous_to_instantaneous" yields 100. +Querying at 13:00 yields 200. 
+``` + +See +[`generate_locational_marginal_price_dataframe_from_gridstatus`](../hercules/grid/grid_utilities.py) +in `hercules/grid/grid_utilities.py` for a worked example of this +endpoint-insertion pattern (it shifts a copy of the data by `dt - 1` seconds +and merges it back in before handing the frame to Hercules). + ## Input Requirements All Hercules input files must specify start and end times using UTC datetime strings: @@ -113,7 +212,20 @@ For the example above, `endtime` would be 3600.0 seconds. ### Wind and Solar Input Data -Both wind and solar input CSV/Feather/Parquet files must contain a `time_utc` column with UTC timestamps: +Both wind and solar input CSV/Feather/Parquet files must contain a `time_utc` column with UTC timestamps. Each `time_utc` value marks the **start of a reporting period**; the data values on that row are treated as period averages. These are interpolated with `"averaged_to_instantaneous"`. See [Interpolation methods](#interpolation-methods) above for details. + +### External Data (LMP, etc.) + +External data files loaded via `_read_external_data_file` are upsampled onto +the simulation time grid with `"instantaneous_to_instantaneous"` (linear +interpolation between the supplied timestamps). If you want zero-order-hold +(piecewise-constant) behaviour for signals like LMP prices, pre-process the +file to include end-of-interval rows that repeat the previous value as +described in [Achieving zero-order-hold (ZOH) behaviour](#achieving-zero-order-hold-zoh-behaviour). +The helper +[`generate_locational_marginal_price_dataframe_from_gridstatus`](../hercules/grid/grid_utilities.py) +in `hercules/grid/grid_utilities.py` is a concrete example of adding those +endpoint rows for LMP data. ```text time_utc,wd_mean,ws_000,ws_001,ws_002 @@ -145,6 +257,8 @@ Key Points: ## Output Files +All values in Hercules output files represent **instantaneous** quantities at each time step, not period averages. 
See [Time Interpretation](#time-interpretation-inputs-vs-internal-values) for the distinction from input files. + Hercules output HDF5 files store: - `time` array: Simulation time points (seconds from t=0) diff --git a/docs/wind.md b/docs/wind.md index 24fb2de6..0fef6df5 100644 --- a/docs/wind.md +++ b/docs/wind.md @@ -54,7 +54,7 @@ Required parameters for WindFarmSCADAPower: **SCADA File Format:** The SCADA file must contain the following columns: -- `time_utc`: Timestamps in UTC (ISO 8601 format or parseable datetime strings) +- `time_utc`: Timestamps in UTC (ISO 8601 format or parseable datetime strings). Each timestamp marks the **start of a reporting period**; values on that row are treated as period averages. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for how Hercules converts these to instantaneous values. - `wd_mean`: Mean wind direction in degrees - `pow_###`: Power output for each turbine (e.g., `pow_000`, `pow_001`, `pow_002`) diff --git a/hercules/hercules_model.py b/hercules/hercules_model.py index dfe78b18..0b4a3f28 100644 --- a/hercules/hercules_model.py +++ b/hercules/hercules_model.py @@ -172,10 +172,23 @@ def _read_external_data_file(self, filename): """ Read and interpolate external data from a CSV, feather, or pickle file. - This method reads external data from the specified file (CSV, feather, or pickle) - and interpolates it according to the simulation time steps. The external data must - include a 'time_utc' column which will be converted to simulation time. - The interpolated data is stored in self.external_signals_all. + This method reads external data from the specified file (CSV, feather, or + pickle) and upsamples it onto the simulation time grid using + ``"instantaneous_to_instantaneous"`` (linear interpolation between the + values at the supplied timestamps). 
+ + If zero-order-hold (piecewise-constant / step) behavior is desired -- + for example, LMP prices that should be held constant across each + reporting interval -- the external data file must be pre-processed to + include an additional row at the end of each interval carrying the + same value. Linear interpolation between each pair of identical + endpoints then reproduces the ZOH shape. See + ``hercules.grid.grid_utilities.generate_locational_marginal_price_dataframe_from_gridstatus`` + for a worked example of this endpoint-insertion pattern. + + The external data must include a ``time_utc`` column which will be + converted to simulation time. The interpolated data is stored in + ``self.external_signals_all``. Args: filename (str): Path to the file containing external data. Supported formats: @@ -216,7 +229,9 @@ def _read_external_data_file(self, filename): ) # Interpolate using the utility function - df_interpolated = interpolate_df(df_ext, new_times) + df_interpolated = interpolate_df( + df_ext, new_times, interpolation_method="instantaneous_to_instantaneous" + ) # Convert interpolated DataFrame to dictionary format for col in df_interpolated.columns: diff --git a/hercules/plant_components/power_playback.py b/hercules/plant_components/power_playback.py index e12f4361..3d127ea1 100644 --- a/hercules/plant_components/power_playback.py +++ b/hercules/plant_components/power_playback.py @@ -122,7 +122,9 @@ def __init__(self, h_dict, component_name): # Interpolate df_scada on to the time steps time_steps_all = np.arange(self.starttime, self.endtime, self.dt, dtype=hercules_float_type) - df_scada = interpolate_df(df_scada, time_steps_all) + df_scada = interpolate_df( + df_scada, time_steps_all, interpolation_method="averaged_to_instantaneous" + ) # Confirm that there is a column called "power" if "power" not in df_scada.columns: diff --git a/hercules/plant_components/solar_pysam_base.py b/hercules/plant_components/solar_pysam_base.py index 01c24030..857ecc1a 100644 --- 
a/hercules/plant_components/solar_pysam_base.py +++ b/hercules/plant_components/solar_pysam_base.py @@ -126,7 +126,9 @@ def _load_solar_data(self, h_dict): # Interpolate df_solar on to the time steps time_steps_all = np.arange(self.starttime, self.endtime, self.dt, dtype=hercules_float_type) - df_solar = interpolate_df(df_solar, time_steps_all) + df_solar = interpolate_df( + df_solar, time_steps_all, interpolation_method="averaged_to_instantaneous" + ) # Can now save the input data as simple columns self.year_array = df_solar["time_utc"].dt.year.values diff --git a/hercules/plant_components/wind_farm.py b/hercules/plant_components/wind_farm.py index 4845d8f8..2cc97893 100644 --- a/hercules/plant_components/wind_farm.py +++ b/hercules/plant_components/wind_farm.py @@ -188,7 +188,9 @@ def __init__(self, h_dict, component_name): # Interpolate df_wi on to the time steps time_steps_all = np.arange(self.starttime, self.endtime, self.dt, dtype=hercules_float_type) - df_wi = interpolate_df(df_wi, time_steps_all) + df_wi = interpolate_df( + df_wi, time_steps_all, interpolation_method="averaged_to_instantaneous" + ) # INITIALIZE FLORIS BASED ON WAKE MODEL if self.wake_method == "precomputed": diff --git a/hercules/plant_components/wind_farm_scada_power.py b/hercules/plant_components/wind_farm_scada_power.py index 9ae544d7..6a80b01f 100644 --- a/hercules/plant_components/wind_farm_scada_power.py +++ b/hercules/plant_components/wind_farm_scada_power.py @@ -128,7 +128,9 @@ def __init__(self, h_dict, component_name): # Interpolate df_scada on to the time steps time_steps_all = np.arange(self.starttime, self.endtime, self.dt, dtype=hercules_float_type) - df_scada = interpolate_df(df_scada, time_steps_all) + df_scada = interpolate_df( + df_scada, time_steps_all, interpolation_method="averaged_to_instantaneous" + ) # Get a list of power columns and infer number of turbines self.power_columns = sorted([col for col in df_scada.columns if col.startswith("pow_")]) diff --git 
a/hercules/utilities.py b/hercules/utilities.py index 401e15bd..d7e7cf96 100644 --- a/hercules/utilities.py +++ b/hercules/utilities.py @@ -448,20 +448,57 @@ def close_logging(logger): logger.removeHandler(handler) -def interpolate_df(df, new_time): +_VALID_INTERPOLATION_METHODS = { + "averaged_to_instantaneous", + "instantaneous_to_instantaneous", +} + + +def interpolate_df(df, new_time, interpolation_method): """Interpolate DataFrame values to match new time axis. - Uses linear interpolation with Polars backend for better performance and memory efficiency. - Converts datetime columns to timestamps for interpolation. + The ``interpolation_method`` parameter controls how numeric columns are + resampled onto ``new_time``: + + - ``"averaged_to_instantaneous"``: Input values are period averages whose + timestamps mark the **start** of each period. Each value is assigned to + the midpoint of its interval and then linearly interpolated. Use for + wind speed, solar irradiance, and similar time-averaged signals. + - ``"instantaneous_to_instantaneous"``: Input values already represent + instantaneous measurements. Standard linear interpolation is performed + directly on the original timestamps with no midpoint shift. + + Datetime columns (e.g. ``time_utc``) are always linearly interpolated on + the raw timestamps regardless of the chosen method, because they map + simulation time to wall-clock time directly. + + Note: + A dedicated zero-order-hold (ZOH) mode is intentionally not provided. + If you need step/piecewise-constant behaviour (e.g. LMP prices that + should be held constant across each reporting interval), pre-process + the input DataFrame to include an extra row at the end of each + interval carrying the same value, and then call this function with + ``"instantaneous_to_instantaneous"``. Linear interpolation between + each pair of identical endpoints reproduces the ZOH shape. 
See + ``hercules.grid.grid_utilities.generate_locational_marginal_price_dataframe_from_gridstatus`` + for an example of this endpoint-insertion pattern. Args: df (pd.DataFrame): DataFrame with 'time' column and data columns. new_time (array-like): New time points for interpolation. + interpolation_method (str): One of ``"averaged_to_instantaneous"`` or + ``"instantaneous_to_instantaneous"``. Returns: pd.DataFrame: DataFrame with new time axis and interpolated data columns. + """ - # Convert new_time to numpy array for consistency + if interpolation_method not in _VALID_INTERPOLATION_METHODS: + raise ValueError( + f"Unknown interpolation_method '{interpolation_method}'. " + f"Must be one of {sorted(_VALID_INTERPOLATION_METHODS)}." + ) + new_time = np.asarray(new_time) # Separate datetime and non-datetime columns for different processing @@ -475,59 +512,28 @@ def interpolate_df(df, new_time): else: numeric_cols.append(col) - return _interpolate_with_polars(df, new_time, datetime_cols, numeric_cols) + # Sort by "time" once up front so that np.interp (which requires + # strictly-increasing x-coordinates) sees monotonic input for every + # column. Applying the sort in one place keeps numeric and datetime + # columns consistently ordered. + df_pl = pl.from_pandas(df).sort("time") + result_pl = pl.DataFrame({"time": new_time}) + time_values = df_pl["time"].to_numpy() -def _interpolate_with_polars(df, new_time, datetime_cols, numeric_cols): - """Interpolate using Polars backend. - - Args: - df (pd.DataFrame): Input DataFrame. - new_time (np.ndarray): New time points. - datetime_cols (list): Datetime column names. - numeric_cols (list): Numeric column names. - - Returns: - pd.DataFrame: Interpolated DataFrame. 
- """ - # Convert to Polars for efficient processing - df_pl = pl.from_pandas(df) - - # Create a Polars DataFrame for the new time points - new_time_pl = pl.DataFrame({"time": new_time}) - - # Start with the time column - result_pl = new_time_pl - - # Process numeric columns using Polars' interpolation - if numeric_cols: - for col in numeric_cols: - # Use Polars' join_asof for efficient interpolation-like behavior - # This is more memory efficient than pandas for large datasets - col_data = df_pl.select(["time", col]).sort("time") - - # Perform interpolation using Polars operations - # Note: Polars doesn't have direct linear interpolation, so we use numpy interp - # but with Polars' efficient data extraction - time_values = col_data["time"].to_numpy() - col_values = col_data[col].to_numpy() - - # Linear interpolation with float32 precision - interpolated_values = np.interp(new_time, time_values, col_values).astype( - hercules_float_type - ) + if interpolation_method == "averaged_to_instantaneous": + x_coords = _compute_interval_midpoints(time_values) + else: + x_coords = time_values - # Add interpolated column to result - result_pl = result_pl.with_columns(pl.lit(interpolated_values).alias(col)) + for col in numeric_cols: + col_values = df_pl[col].to_numpy() + interpolated_values = np.interp(new_time, x_coords, col_values).astype(hercules_float_type) + result_pl = result_pl.with_columns(pl.lit(interpolated_values).alias(col)) - # Process datetime columns + # Process datetime columns (use the same sorted frame as numeric cols) for col in datetime_cols: - # Extract datetime data using Polars - col_data = df_pl.select(["time", col]).sort("time") - time_values = col_data["time"].to_numpy() - - # Convert datetime to timestamps for interpolation - datetime_values = col_data[col].to_pandas().astype("int64").values / 10**9 + datetime_values = df_pl[col].to_pandas().astype("int64").values / 10**9 # Interpolate timestamps (datetime precision doesn't need float32 constraint) 
interpolated_timestamps = np.interp(new_time, time_values, datetime_values) @@ -540,6 +546,31 @@ def _interpolate_with_polars(df, new_time, datetime_cols, numeric_cols): return result_pl.to_pandas() +def _compute_interval_midpoints(time_values): + """Compute the midpoints of consecutive time intervals. + + For start-of-period timestamps, each value is best represented at the + center of its interval. The last interval width is assumed equal to the + preceding one. + + Args: + time_values (np.ndarray): Sorted array of start-of-period timestamps. + + Returns: + np.ndarray: Array of interval midpoints, same length as *time_values*. + """ + # Allow the edge case of a single time value by returning the time value itself + if len(time_values) < 2: + return time_values + # Compute midpoints + midpoints = np.empty_like(time_values, dtype=np.float64) + midpoints[:-1] = (time_values[:-1] + time_values[1:]) / 2.0 + midpoints[-1] = ( + time_values[-1] + (time_values[-1] - time_values[-2]) / 2.0 + ) # Last interval is equal to the previous one + return midpoints + + def find_time_utc_value(df, time_value, time_column="time", time_utc_column="time_utc"): """Return UTC timestamp at a given time value via linear interpolation or extrapolation. 
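Note for reviewers: the two interpolation methods added to `interpolate_df` above can be checked with a standalone NumPy sketch. This does not import Hercules; `compute_interval_midpoints` below is a local copy of the `_compute_interval_midpoints` logic in this diff, and the timestamps/values come from the worked example in the new timing.md section.

```python
import numpy as np


def compute_interval_midpoints(t):
    # Midpoint of each start-of-period interval; the last interval width
    # is assumed equal to the preceding one (mirrors _compute_interval_midpoints).
    if len(t) < 2:
        return t
    mid = np.empty_like(t, dtype=np.float64)
    mid[:-1] = (t[:-1] + t[1:]) / 2.0
    mid[-1] = t[-1] + (t[-1] - t[-2]) / 2.0
    return mid


# Hourly averages stamped at the start of each period (seconds after 12:00).
t_start = np.array([0.0, 3600.0])  # 12:00, 13:00
values = np.array([100.0, 200.0])

# "averaged_to_instantaneous": move values to interval midpoints, then interpolate.
mid = compute_interval_midpoints(t_start)     # 12:30 and 13:30
avg_to_inst = np.interp(3600.0, mid, values)  # query at 13:00 -> 150.0

# "instantaneous_to_instantaneous": interpolate on the raw timestamps.
inst_to_inst = np.interp(3600.0, t_start, values)  # query at 13:00 -> 200.0

print(avg_to_inst, inst_to_inst)
```

This reproduces the ASCII example in timing.md: under the averaged convention, querying at 13:00 falls halfway between the 12:30 and 13:30 midpoints and yields 150.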
diff --git a/tests/example_regression_tests/example_00_regression_test.py b/tests/example_regression_tests/example_00_regression_test.py index ca2b265f..85723fda 100644 --- a/tests/example_regression_tests/example_00_regression_test.py +++ b/tests/example_regression_tests/example_00_regression_test.py @@ -18,8 +18,8 @@ # Test configuration NUM_TIME_STEPS = 5 -EXPECTED_FINAL_WIND_POWER = 3271 # Updated after wind model changes -EXPECTED_FINAL_PLANT_POWER = 3271 # Same as wind power for wind-only case +EXPECTED_FINAL_WIND_POWER = 3265 # Updated for midpoint interpolation correction +EXPECTED_FINAL_PLANT_POWER = 3265 # Same as wind power for wind-only case # File names INPUT_FILE = "hercules_input.yaml" diff --git a/tests/example_regression_tests/example_00b_regression_precom_test.py b/tests/example_regression_tests/example_00b_regression_precom_test.py index c8c6716c..20c6cd33 100644 --- a/tests/example_regression_tests/example_00b_regression_precom_test.py +++ b/tests/example_regression_tests/example_00b_regression_precom_test.py @@ -20,8 +20,8 @@ # Test configuration NUM_TIME_STEPS = 5 -EXPECTED_FINAL_WIND_POWER = 3021 # Updated for precomputed FLORIS model -EXPECTED_FINAL_PLANT_POWER = 3021 # Same as wind power for wind-only case +EXPECTED_FINAL_WIND_POWER = 3020 # Updated for midpoint interpolation correction +EXPECTED_FINAL_PLANT_POWER = 3020 # Same as wind power for wind-only case # File names INPUT_FILE = "hercules_input.yaml" diff --git a/tests/example_regression_tests/example_03_regression_test.py b/tests/example_regression_tests/example_03_regression_test.py index b5e67777..16d74c8f 100644 --- a/tests/example_regression_tests/example_03_regression_test.py +++ b/tests/example_regression_tests/example_03_regression_test.py @@ -21,9 +21,9 @@ # Test configuration NUM_TIME_STEPS = 5 -EXPECTED_FINAL_WIND_POWER = 14322 # Updated for 9 turbines with large config -EXPECTED_FINAL_SOLAR_POWER = 20912 # Expected final solar farm power output (kW) 
-EXPECTED_FINAL_PLANT_POWER = 35234 # Wind + Solar (14322 + 20912) +EXPECTED_FINAL_WIND_POWER = 14321 # Updated for midpoint interpolation correction +EXPECTED_FINAL_SOLAR_POWER = 21054 # Updated for midpoint interpolation correction +EXPECTED_FINAL_PLANT_POWER = 35375 # Wind + Solar (14321 + 21054) # File names INPUT_FILE = "hercules_input.yaml" diff --git a/tests/power_playback_test.py b/tests/power_playback_test.py index 7448abf6..0504d309 100644 --- a/tests/power_playback_test.py +++ b/tests/power_playback_test.py @@ -46,8 +46,10 @@ def test_power_playback_step(): step_h_dict["step"] = 1 result = power_playback.step(step_h_dict) - # Verify power - assert np.isclose(result["power_playback"]["power"], 2000.0) + # With midpoint correction, the value at t=1 is the average of the + # period-0 value (1000, midpoint 0.5 s after starttime) and the + # period-1 value (2000, midpoint 1.5 s after starttime). + assert np.isclose(result["power_playback"]["power"], 1500.0) def test_power_playback_raises_on_nan_in_power_columns(): diff --git a/tests/regression_tests/solar_pysam_pvwatts_regression_test.py b/tests/regression_tests/solar_pysam_pvwatts_regression_test.py index 19c2c966..c239db88 100644 --- a/tests/regression_tests/solar_pysam_pvwatts_regression_test.py +++ b/tests/regression_tests/solar_pysam_pvwatts_regression_test.py @@ -10,16 +10,16 @@ powers_base_no_control = np.array( [ - 16528.82749492729, - 16541.958599140045, - 16555.08955834377, - 16568.220372741496, - 16581.35104253094, - 16594.481567904546, - 16607.61194537151, - 16620.74217922295, - 16633.872269119838, - 16647.002215233784, + 17092.15820312, + 17098.77539062, + 17112.00976562, + 17125.24609375, + 17138.48242188, + 17151.71679688, + 17164.94921875, + 17178.18554688, + 17191.41992188, + 17204.65625, ] ) @@ -41,30 +41,30 @@ dni_base_no_control = np.array( [ 330.86019897, - 331.19604492, - 331.53189087, - 331.86773682, - 332.20358276, - 332.53942871, - 332.87527466, - 333.21112061, - 333.54696655, - 
333.8828125, + 331.02813721, + 331.36395264, + 331.6998291, + 332.03564453, + 332.371521, + 332.70733643, + 333.04321289, + 333.37902832, + 333.71490479, ] ) aoi_base_no_control = np.array( [ - 67.82689268, - 67.82689268, - 67.82689268, - 67.82689268, - 67.82689268, - 67.82689268, - 67.82689268, - 67.82689268, - 67.82689268, - 67.82689268, + 67.82688904, + 67.82688904, + 67.82688904, + 67.82688904, + 67.82688904, + 67.82688904, + 67.82688904, + 67.82688904, + 67.82688904, + 67.82688904, ] ) diff --git a/tests/test_inputs/scada_input.csv b/tests/test_inputs/scada_input.csv index 1d934114..e6ba4499 100644 --- a/tests/test_inputs/scada_input.csv +++ b/tests/test_inputs/scada_input.csv @@ -3,7 +3,7 @@ time_utc,wd_mean,ws_000,ws_001,ws_002,pow_000,pow_001,pow_002 2018-05-10 12:31:01,185.2,9.1,9.0,9.2,3200.0,3100.0,3300.0 2018-05-10 12:31:02,190.8,7.8,7.7,7.9,2200.0,2100.0,2300.0 2018-05-10 12:31:03,175.3,6.5,6.4,6.6,1500.0,1400.0,1600.0 -2018-05-10 12:31:04,170.1,10.2,10.1,10.3,4200.0,4100.0,4300.0 +2018-05-10 12:31:04,170.1,10.2,10.1,10.3,5000.0,4100.0,4300.0 2018-05-10 12:31:05,165.7,11.5,11.4,11.6,5000.0,4900.0,5000.0 2018-05-10 12:31:06,160.4,9.8,9.7,9.9,5000.0,3800.0,4000.0 2018-05-10 12:31:07,155.9,8.7,8.6,8.8,3000.0,2900.0,3100.0 diff --git a/tests/utilities_test.py b/tests/utilities_test.py index bd39efe1..cbc5f9d1 100644 --- a/tests/utilities_test.py +++ b/tests/utilities_test.py @@ -32,18 +32,29 @@ def test_upsampling(): } ) + # Midpoints will be 1, 3, 5, 7, 9, 11 + # Create new_time with more points (upsampling) new_time = np.linspace(0, 10, 11) # [0, 1, 2, 3, ..., 10] # Interpolate - result = interpolate_df(df, new_time) + result = interpolate_df(df, new_time, interpolation_method="averaged_to_instantaneous") # Assert time is correct assert np.allclose(result["time"], new_time) # Assert values are correct - expected_values = new_time # Linear function y = x - assert np.allclose(result["value"], expected_values), "Interpolated values should match y = x" + # 
Time 0 is before the first midpoint, so the value clamps to 0 + # Time 1 is at the first midpoint, so the value is 0 + # Time 2 is halfway between the first and second midpoints, so the value is 1 + # Time 3 is at the second midpoint, so the value is 2 + # ... + # Time 10 lies between the last two midpoints (9 and 11), so the value is 9 + expected_values = [0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9] + assert np.allclose(result["value"], expected_values), ( + "Interpolated values should match midpoint-corrected " + "averaged_to_instantaneous output with edge clamping" + ) def test_downsampling(): @@ -55,21 +66,24 @@ def test_downsampling(): """ time_points = np.linspace(0, 10, 11) - df = pd.DataFrame({"time": time_points, "value": time_points * 1.7}) - - # Create new_time with fewer points (downsampling) - new_time = np.array([0, 2, 4]) - - # Interpolate - result = interpolate_df(df, new_time) - - # For our quadratic function, the interpolated values should be the square of new_time - expected_values = new_time * 1.7 + value_points = time_points * 1.7 + df = pd.DataFrame({"time": time_points, "value": value_points}) + + # Query before the first midpoint (0.0), between midpoints (5.0, 10.0), and at midpoints (5.5, 10.5) + new_time = np.array([0.0, 5.0, 5.5, 10.0, 10.5]) + + result = interpolate_df(df, new_time, interpolation_method="averaged_to_instantaneous") + + # At the midpoints we should recover the original period values + expected_values = [ + value_points[0], + (value_points[4] + value_points[5]) / 2, + value_points[5], + (value_points[-2] + value_points[-1]) / 2, + value_points[-1], + ] assert np.allclose(result["value"], expected_values) - # Check the shape is correct - assert result.shape[0] == len(new_time) - def test_datetime_interpolation(): """ @@ -98,7 +112,7 @@ def test_datetime_interpolation(): new_time = np.array([0, 2.5, 5, 7.5, 10]) # Interpolate - result = interpolate_df(df, new_time) + result = interpolate_df(df, new_time, interpolation_method="averaged_to_instantaneous") # Assert time is correct
assert np.allclose(result["time"], new_time) @@ -566,7 +580,7 @@ def test_interpolate_df_with_large_dataset(): new_time = np.linspace(0, 1000, 500) # Interpolate - result = interpolate_df(df, new_time) + result = interpolate_df(df, new_time, interpolation_method="averaged_to_instantaneous") # Verify result has the correct shape and columns assert len(result) == len(new_time) diff --git a/tests/wind_farm_dynamic_floris_test.py b/tests/wind_farm_dynamic_floris_test.py index 99064f53..7e7568cf 100644 --- a/tests/wind_farm_dynamic_floris_test.py +++ b/tests/wind_farm_dynamic_floris_test.py @@ -8,7 +8,7 @@ import pandas as pd import pytest from hercules.plant_components.wind_farm import WindFarm -from hercules.utilities import hercules_float_type +from hercules.utilities import hercules_float_type, interpolate_df from tests.test_inputs.h_dict import h_dict_wind @@ -42,14 +42,27 @@ def test_wind_farm_ws_mean(): # Test that, since individual speed are specified, ws_mean is ignored # Note that h_dict_wind specifies an end time of 10. wind_sim = WindFarm(test_h_dict, "wind_farm") - assert ( - wind_sim.ws_mat[:, 0] == df_input["ws_000"].to_numpy(dtype=hercules_float_type)[:10] - ).all() + + # Assume df_input represents time stamps indicating start of period. + # Convert to instantaneous values with midpoint correction as would be done + # internally by interpolate_df function. 
+ df_input["time"] = np.arange(0, df_input.shape[0], 1) + df_input["time_utc"] = pd.to_datetime(df_input["time_utc"]) + df_input_interpolated = interpolate_df( + df_input, + np.arange(0, df_input.shape[0], 1), + interpolation_method="averaged_to_instantaneous", + ) + + assert np.allclose( + wind_sim.ws_mat[:, 0], + df_input_interpolated["ws_000"].to_numpy(dtype=hercules_float_type)[:10], + ) assert np.allclose( wind_sim.ws_mat_mean, - (df_input[["ws_000", "ws_001", "ws_002"]].mean(axis=1)).to_numpy(dtype=hercules_float_type)[ - :10 - ], + (df_input_interpolated[["ws_000", "ws_001", "ws_002"]].mean(axis=1)).to_numpy( + dtype=hercules_float_type + )[:10], ) # Drop individual speeds and test that ws_mean is used instead diff --git a/tests/wind_farm_precom_floris_test.py b/tests/wind_farm_precom_floris_test.py index 356d0143..0a73f782 100644 --- a/tests/wind_farm_precom_floris_test.py +++ b/tests/wind_farm_precom_floris_test.py @@ -8,7 +8,7 @@ import pandas as pd import pytest from hercules.plant_components.wind_farm import WindFarm -from hercules.utilities import hercules_float_type +from hercules.utilities import hercules_float_type, interpolate_df from tests.test_inputs.h_dict import h_dict_wind @@ -52,14 +52,27 @@ def test_wind_farm_precom_floris_ws_mean(): # Test that, since individual speed are specified, ws_mean is ignored # Note that h_dict_wind_precom_floris specifies an end time of 10. wind_sim = WindFarm(test_h_dict, "wind_farm") - assert ( - wind_sim.ws_mat[:, 0] == df_input["ws_000"].to_numpy(dtype=hercules_float_type)[:10] - ).all() + + # Assume df_input represents time stamps indicating start of period. + # Convert to instantaneous values with midpoint correction as would be done + # internally by interpolate_df function. 
+ df_input["time"] = np.arange(0, df_input.shape[0], 1) + df_input["time_utc"] = pd.to_datetime(df_input["time_utc"]) + df_input_interpolated = interpolate_df( + df_input, + np.arange(0, df_input.shape[0], 1), + interpolation_method="averaged_to_instantaneous", + ) + + assert np.allclose( + wind_sim.ws_mat[:, 0], + df_input_interpolated["ws_000"].to_numpy(dtype=hercules_float_type)[:10], + ) assert np.allclose( wind_sim.ws_mat_mean, - (df_input[["ws_000", "ws_001", "ws_002"]].mean(axis=1)).to_numpy(dtype=hercules_float_type)[ - :10 - ], + (df_input_interpolated[["ws_000", "ws_001", "ws_002"]].mean(axis=1)).to_numpy( + dtype=hercules_float_type + )[:10], ) # Drop individual speeds and test that ws_mean is used instead @@ -254,8 +267,9 @@ def test_wind_farm_precom_floris_velocities_update_correctly(): "Withwakes wind speeds should have been updated" ) - # Verify the wind speeds match the expected values from the input data - expected_background = np.array([9.0, 9.5, 10.0]) # ws values for step 1 + # With midpoint correction, the value at step=1 is the average of + # period-0 and period-1 input values for each turbine. + expected_background = np.array([8.5, 9.0, 9.5]) np.testing.assert_array_equal(wind_sim.wind_speeds_background, expected_background) # Verify that wake deficits are recalculated diff --git a/tests/wind_farm_scada_power_test.py b/tests/wind_farm_scada_power_test.py index 55f11c18..f80dd507 100644 --- a/tests/wind_farm_scada_power_test.py +++ b/tests/wind_farm_scada_power_test.py @@ -87,9 +87,9 @@ def test_wind_farm_scada_power_step(): result["wind_farm"]["wind_speeds_background"], ) - # Verify turbine powers - assert np.allclose(result["wind_farm"]["turbine_powers"], [3200.0, 3100.0, 3300.0]) - assert np.isclose(result["wind_farm"]["power"], 3200.0 + 3100.0 + 3300.0) + # At step=1, midpoint correction gives the average of period-0 and period-1 values. 
+ assert np.allclose(result["wind_farm"]["turbine_powers"], [2850.0, 2750.0, 2950.0]) + assert np.isclose(result["wind_farm"]["power"], 2850.0 + 2750.0 + 2950.0) def test_wind_farm_scada_power_get_initial_conditions_and_meta_data():
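Note for reviewers: the endpoint-insertion ZOH pattern this PR documents (and that `generate_locational_marginal_price_dataframe_from_gridstatus` implements by shifting a copy of the data by `dt - 1` seconds) can be sketched in a few lines of pandas. The hour-long reporting period, `dt = 1` second, and the `lmp_rt` column name here are illustrative assumptions, not the helper's actual implementation.

```python
import numpy as np
import pandas as pd

# Hourly LMP values stamped at the start of each reporting period.
df = pd.DataFrame(
    {
        "time_utc": pd.to_datetime(
            ["2024-01-01 12:00:00", "2024-01-01 13:00:00"], utc=True
        ),
        "lmp_rt": [100.0, 200.0],
    }
)

dt = 1.0  # simulation time step in seconds (illustrative)
period = pd.Timedelta(hours=1)

# Shift a copy of the data to dt seconds before the end of each interval
# and merge it back in, so each value also appears at its interval's end.
df_end = df.copy()
df_end["time_utc"] = df_end["time_utc"] + period - pd.Timedelta(seconds=dt)
df_zoh = pd.concat([df, df_end]).sort_values("time_utc").reset_index(drop=True)

# Plain linear interpolation over the augmented rows now traces a step shape,
# which is what "instantaneous_to_instantaneous" performs downstream.
t = df_zoh["time_utc"].astype("int64").to_numpy() / 1e9  # epoch seconds
v = df_zoh["lmp_rt"].to_numpy()
t_1230 = pd.Timestamp("2024-01-01 12:30:00", tz="UTC").value / 1e9
print(np.interp(t_1230, t, v))  # held at 100.0, not 150.0
```

Between 12:00:00 and 12:59:59 both endpoints carry 100.0, so the interpolated signal is flat across the interval and jumps to 200.0 at 13:00:00.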