Merged
2 changes: 1 addition & 1 deletion docs/hercules_input.md
Original file line number Diff line number Diff line change
@@ -131,7 +131,7 @@ The old format is still supported for backward compatibility but will show a dep
### External Data File Format

The CSV file must contain:
- A `time_utc` column with UTC timestamps in ISO 8601 format
- A `time_utc` column with UTC timestamps in ISO 8601 format. Unlike wind/solar/SCADA/playback inputs (which are treated as period averages reported at the start of each period), external data values are treated as **instantaneous** samples at their timestamps and are upsampled to the simulation time grid via `"instantaneous_to_instantaneous"` (linear interpolation). If you need zero-order-hold (piecewise-constant) behaviour -- e.g. for LMP prices -- pre-process the file to include an extra row at the end of each interval carrying the previous value; see [Achieving zero-order-hold (ZOH) behaviour](timing.md#achieving-zero-order-hold-zoh-behaviour) and the [`generate_locational_marginal_price_dataframe_from_gridstatus`](../hercules/grid/grid_utilities.py) helper.
- One or more data columns with external signals. Column names are arbitrary; any columns present are carried forward and interpolated, but their values must be floats. Note that some controllers and plotting utilities that operate on external signals may require specific column names such as `lmp_rt`, `lmp_da`, or `wind_forecast`.

Example `lmp_data.csv`:
2 changes: 2 additions & 0 deletions docs/output_files.md
@@ -2,6 +2,8 @@

Hercules generates HDF5 output files containing simulation data for analysis and visualization. This page describes the file format, available utilities for reading the data, and how HerculesModel generates these files.

All values in output files represent **instantaneous** quantities at each time step, not period averages. This differs from the convention used by input data files, where timestamps mark the start of a reporting period. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for details on this distinction and the midpoint correction applied during input interpolation.

## File Format

Hercules outputs simulation data in HDF5 (Hierarchical Data Format 5) format.
2 changes: 1 addition & 1 deletion docs/power_playback.md
@@ -32,7 +32,7 @@ power_unit_1:

The input file must contain the following columns:

- `time_utc`: Timestamps in UTC (ISO 8601 format or parseable datetime strings)
- `time_utc`: Timestamps in UTC (ISO 8601 format or parseable datetime strings). Each timestamp marks the **start of a reporting period**; the power value on that row is treated as the period average. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for how Hercules converts these to instantaneous values.
- `power`: Power output in kW

Supported file formats: `.csv`, `.p`, `.pkl` (pickle), `.f`, `.ftr` (feather).
2 changes: 1 addition & 1 deletion docs/solar_pv.md
@@ -12,7 +12,7 @@ Presently only one solar simulator is available

Both models require an input weather file:
1. A CSV file that specifies the weather conditions (e.g. NonAnnualSimulation-sample_data-interpolated-daytime.csv). This file should include:
- timestamp (see [timing](timing.md) for time format requirements)
- timestamp (see [timing](timing.md) for time format requirements). Each `time_utc` timestamp marks the **start of a reporting period**; irradiance and weather values on that row are treated as period averages. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for how Hercules converts these to instantaneous values.
- direct normal irradiance (DNI)
- diffuse horizontal irradiance (DHI)
- global horizontal irradiance (GHI)
116 changes: 115 additions & 1 deletion docs/timing.md
@@ -9,6 +9,105 @@ Timing in Hercules is specified using two complementary representations:
- `time` (float): Simulation time in seconds, where `time=0` corresponds to `starttime_utc`
- `time_utc` (datetime): Absolute UTC timestamp

## Time Interpretation: Inputs vs. Internal Values

### Input files: start-of-period convention

In external data sources such as weather files, SCADA records, and resource
databases, each `time_utc` timestamp marks the **beginning** of a reporting
period and the associated values (irradiance, wind speed, power, etc.)
represent an average or aggregate over that period. For example, an hourly
weather file with a row at `2020-06-15T12:00:00Z` and GHI = 735 W/m² means
that 735 W/m² is the average GHI from 12:00 to 13:00.

### Hercules internal values: instantaneous convention

Inside the simulation, values at a given time step represent **instantaneous**
quantities at that moment. All Hercules output values follow this same
instantaneous convention.

### Interpolation methods

The `interpolate_df` function in `utilities.py` accepts a mandatory
`interpolation_method` parameter that controls how numeric columns are
resampled onto the simulation time grid. Two methods are available:

#### `"averaged_to_instantaneous"` (wind, solar, and similar resource and power signals)

Input values are period averages whose timestamps mark the **start** of each
period. The best single-point estimate of a period-averaged value is at the
**midpoint** of its interval, not the start. For example, the hourly average
from 12:00 to 13:00 is most representative of conditions at 12:30. For data that vary roughly linearly across intervals, this also means that averaging the resampled signal back over the original reporting intervals approximately recovers the original values.

1. Each numeric value is assigned to the midpoint of its input interval
(using `_compute_interval_midpoints`).
2. Linear interpolation is then performed between these midpoints to produce
values at the simulation time steps.

```
Input file (start-of-period):

time_utc value
12:00 100 ← average over [12:00, 13:00)
13:00 200 ← average over [13:00, 14:00)

After midpoint correction:

time value
12:30 100 ← midpoint of [12:00, 13:00)
13:30 200 ← midpoint of [13:00, 14:00)

Querying at 13:00 yields 150 (halfway between midpoints).
```
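The two steps above can be sketched as follows. This is a minimal illustration assuming uniformly spaced input timestamps, not the actual Hercules implementation (`_compute_interval_midpoints` is only paraphrased here by the `dt / 2` shift):

```python
import numpy as np

def averaged_to_instantaneous(t_in, v_in, t_out):
    """Resample start-of-period period averages onto new time points.

    t_in: start-of-period times in seconds, assumed uniformly spaced.
    v_in: period-average values reported at those start times.
    t_out: simulation time points to interpolate onto.
    """
    dt = t_in[1] - t_in[0]      # input reporting period
    t_mid = t_in + dt / 2.0     # shift each value to its interval midpoint
    # Linear interpolation between midpoints (np.interp holds the
    # first/last midpoint value constant beyond the ends).
    return np.interp(t_out, t_mid, v_in)

# Hourly averages: 100 over [12:00, 13:00), 200 over [13:00, 14:00)
t_in = np.array([12.0, 13.0]) * 3600.0
v_in = np.array([100.0, 200.0])
print(averaged_to_instantaneous(t_in, v_in, np.array([13.0 * 3600.0])))  # → [150.]
```

This reproduces the diagram above: the query at 13:00 lands halfway between the 12:30 and 13:30 midpoints and yields 150.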

#### `"instantaneous_to_instantaneous"`

Input values already represent instantaneous measurements at their
timestamps. Standard linear interpolation is performed directly on the
original timestamps with no midpoint shift.
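In toy form, this method is nothing more than plain linear interpolation on the original timestamps (illustrative sketch, not the Hercules code path):

```python
import numpy as np

# Instantaneous samples: 100 at 12:00, 200 at 13:00 (times in seconds).
t_in = np.array([12.0, 13.0]) * 3600.0
v_in = np.array([100.0, 200.0])

# No midpoint shift: querying at 12:30 splits the difference directly.
print(np.interp(12.5 * 3600.0, t_in, v_in))  # → 150.0
```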

---

In both methods, datetime columns (e.g. `time_utc`) are linearly
interpolated on the raw timestamps without any shift, because they are
instantaneous coordinate mappings between simulation time and wall-clock
time, not period-averaged measurements.

#### Achieving zero-order-hold (ZOH) behaviour

`interpolate_df` does not provide a dedicated zero-order-hold mode. If you
need step/piecewise-constant values -- for example, LMP prices that
should be held constant across each reporting interval -- pre-process your
input data to include an additional row at the end of each interval that
carries the same value as the start-of-interval row, and then use
`"instantaneous_to_instantaneous"`. Linear interpolation between each pair
of identical endpoints reproduces the ZOH shape.

```
Original data (start-of-interval only):

time_utc value
12:00 100
13:00 200

After inserting end-of-interval rows (just before the next start):

time_utc value
12:00 100
12:59:59 100 ← added endpoint
13:00 200
13:59:59 200 ← added endpoint

Querying at 12:30 with "instantaneous_to_instantaneous" yields 100.
Querying at 13:00 yields 200.
```
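A pandas sketch of this endpoint-insertion pattern, for a one-second simulation step. The helper name `insert_zoh_endpoints` and the `lmp_rt` column are hypothetical; the real worked example lives in `hercules/grid/grid_utilities.py`:

```python
import pandas as pd

def insert_zoh_endpoints(df, dt_seconds):
    """Duplicate each row at (start + dt - 1 s) so that linear interpolation
    between the pair of identical values reproduces a zero-order hold."""
    end_rows = df.copy()
    end_rows["time_utc"] = end_rows["time_utc"] + pd.Timedelta(seconds=dt_seconds - 1)
    return (
        pd.concat([df, end_rows])
        .sort_values("time_utc")
        .reset_index(drop=True)
    )

df = pd.DataFrame({
    "time_utc": pd.to_datetime(["2020-06-15T12:00:00Z", "2020-06-15T13:00:00Z"]),
    "lmp_rt": [100.0, 200.0],
})
df_zoh = insert_zoh_endpoints(df, dt_seconds=3600)
# time_utc rows are now 12:00:00, 12:59:59, 13:00:00, 13:59:59,
# with values 100, 100, 200, 200 -- the step shape from the diagram above.
```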

See
[`generate_locational_marginal_price_dataframe_from_gridstatus`](../hercules/grid/grid_utilities.py)
in `hercules/grid/grid_utilities.py` for a worked example of this
endpoint-insertion pattern (it shifts a copy of the data by `dt - 1` seconds
and merges it back in before handing the frame to Hercules).

## Input Requirements

All Hercules input files must specify start and end times using UTC datetime strings:
@@ -113,7 +212,20 @@ For the example above, `endtime` would be 3600.0 seconds.

### Wind and Solar Input Data

Both wind and solar input CSV/Feather/Parquet files must contain a `time_utc` column with UTC timestamps:
Both wind and solar input CSV/Feather/Parquet files must contain a `time_utc` column with UTC timestamps. Each `time_utc` value marks the **start of a reporting period**; the data values on that row are treated as period averages. These are interpolated with `"averaged_to_instantaneous"`. See [Interpolation methods](#interpolation-methods) above for details.

### External Data (LMP, etc.)

External data files loaded via `_read_external_data_file` are upsampled onto
the simulation time grid with `"instantaneous_to_instantaneous"` (linear
interpolation between the supplied timestamps). If you want zero-order-hold
(piecewise-constant) behaviour for signals like LMP prices, pre-process the
file to include end-of-interval rows that repeat the previous value as
described in [Achieving zero-order-hold (ZOH) behaviour](#achieving-zero-order-hold-zoh-behaviour).
The helper
[`generate_locational_marginal_price_dataframe_from_gridstatus`](../hercules/grid/grid_utilities.py)
in `hercules/grid/grid_utilities.py` is a concrete example of adding those
endpoint rows for LMP data.

```text
time_utc,wd_mean,ws_000,ws_001,ws_002
@@ -145,6 +257,8 @@ Key Points:

## Output Files

All values in Hercules output files represent **instantaneous** quantities at each time step, not period averages. See [Time Interpretation](#time-interpretation-inputs-vs-internal-values) for the distinction from input files.

Hercules output HDF5 files store:

- `time` array: Simulation time points (seconds from t=0)
2 changes: 1 addition & 1 deletion docs/wind.md
@@ -54,7 +54,7 @@ Required parameters for WindFarmSCADAPower:
**SCADA File Format:**

The SCADA file must contain the following columns:
- `time_utc`: Timestamps in UTC (ISO 8601 format or parseable datetime strings)
- `time_utc`: Timestamps in UTC (ISO 8601 format or parseable datetime strings). Each timestamp marks the **start of a reporting period**; values on that row are treated as period averages. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for how Hercules converts these to instantaneous values.
- `wd_mean`: Mean wind direction in degrees
- `pow_###`: Power output for each turbine (e.g., `pow_000`, `pow_001`, `pow_002`)

25 changes: 20 additions & 5 deletions hercules/hercules_model.py
@@ -172,10 +172,23 @@ def _read_external_data_file(self, filename):
"""
Read and interpolate external data from a CSV, feather, or pickle file.

This method reads external data from the specified file (CSV, feather, or pickle)
and interpolates it according to the simulation time steps. The external data must
include a 'time_utc' column which will be converted to simulation time.
The interpolated data is stored in self.external_signals_all.
This method reads external data from the specified file (CSV, feather, or
pickle) and upsamples it onto the simulation time grid using
``"instantaneous_to_instantaneous"`` (linear interpolation between the
values at the supplied timestamps).

If zero-order-hold (piecewise-constant / step) behavior is desired --
for example, LMP prices that should be held constant across each
reporting interval -- the external data file must be pre-processed to
include an additional row at the end of each interval carrying the
same value. Linear interpolation between each pair of identical
endpoints then reproduces the ZOH shape. See
``hercules.grid.grid_utilities.generate_locational_marginal_price_dataframe_from_gridstatus``
for a worked example of this endpoint-insertion pattern.

The external data must include a ``time_utc`` column which will be
converted to simulation time. The interpolated data is stored in
``self.external_signals_all``.

Args:
filename (str): Path to the file containing external data. Supported formats:
@@ -216,7 +229,9 @@ def _read_external_data_file(self, filename):
)

# Interpolate using the utility function
df_interpolated = interpolate_df(df_ext, new_times)
df_interpolated = interpolate_df(
df_ext, new_times, interpolation_method="instantaneous_to_instantaneous"
)

**Collaborator:** I'm not sure we should hard-code this. It makes sense that the LMP prices should use this type of interpolation, but this is also how we input power reference signals. Should those also use the zoh_to_instantaneous method?

**Collaborator (author):** I agree this is tricky. I think to date in Hercules we haven't explicitly tracked LMP prices; these are just external signals that end up having signal names that Hycon expects. One idea is to add to the Hercules input a dictionary that says how different external channels should be upsampled, with a default to this for backwards compatibility -- or not, force it to be explicit and take one more breaking change? @misi9170 any thoughts here?

# Convert interpolated DataFrame to dictionary format
for col in df_interpolated.columns:
4 changes: 3 additions & 1 deletion hercules/plant_components/power_playback.py
@@ -122,7 +122,9 @@ def __init__(self, h_dict, component_name):

# Interpolate df_scada on to the time steps
time_steps_all = np.arange(self.starttime, self.endtime, self.dt, dtype=hercules_float_type)
df_scada = interpolate_df(df_scada, time_steps_all)
df_scada = interpolate_df(
df_scada, time_steps_all, interpolation_method="averaged_to_instantaneous"
)

# Confirm that there is a column called "power"
if "power" not in df_scada.columns:
4 changes: 3 additions & 1 deletion hercules/plant_components/solar_pysam_base.py
@@ -126,7 +126,9 @@ def _load_solar_data(self, h_dict):

# Interpolate df_solar on to the time steps
time_steps_all = np.arange(self.starttime, self.endtime, self.dt, dtype=hercules_float_type)
df_solar = interpolate_df(df_solar, time_steps_all)
df_solar = interpolate_df(
df_solar, time_steps_all, interpolation_method="averaged_to_instantaneous"
)

# Can now save the input data as simple columns
self.year_array = df_solar["time_utc"].dt.year.values
4 changes: 3 additions & 1 deletion hercules/plant_components/wind_farm.py
@@ -188,7 +188,9 @@ def __init__(self, h_dict, component_name):

# Interpolate df_wi on to the time steps
time_steps_all = np.arange(self.starttime, self.endtime, self.dt, dtype=hercules_float_type)
df_wi = interpolate_df(df_wi, time_steps_all)
df_wi = interpolate_df(
df_wi, time_steps_all, interpolation_method="averaged_to_instantaneous"
)

# INITIALIZE FLORIS BASED ON WAKE MODEL
if self.wake_method == "precomputed":
4 changes: 3 additions & 1 deletion hercules/plant_components/wind_farm_scada_power.py
@@ -128,7 +128,9 @@ def __init__(self, h_dict, component_name):

# Interpolate df_scada on to the time steps
time_steps_all = np.arange(self.starttime, self.endtime, self.dt, dtype=hercules_float_type)
df_scada = interpolate_df(df_scada, time_steps_all)
df_scada = interpolate_df(
df_scada, time_steps_all, interpolation_method="averaged_to_instantaneous"
)

# Get a list of power columns and infer number of turbines
self.power_columns = sorted([col for col in df_scada.columns if col.startswith("pow_")])