5 changes: 5 additions & 0 deletions docs/API.rst
@@ -19,6 +19,11 @@ Plotting module
 .. automodule:: enlopy.plot
    :members:
 
+Statistics module
+-----------------
+.. automodule:: enlopy.stats
+   :members:
+
 Utilities module
 ----------------
 .. automodule:: enlopy.utils
92 changes: 92 additions & 0 deletions docs/analysis.rst
@@ -0,0 +1,92 @@
.. _analysis_module:

enlopy.analysis: Analyzing Energy Timeseries
=============================================

The ``enlopy.analysis`` module offers a collection of functions designed to
inspect, characterize, and extract meaningful insights from energy-related
timeseries data. These tools are fundamental for understanding load patterns and
variability, and for preparing data for further modeling or reporting.

Core Functionalities
--------------------

The module focuses on:

* **Data Transformation:** Reshaping timeseries for easier analysis and visualization.
* **Load Characterization:** Calculating standard metrics like Load Duration Curves and key statistics.
* **Pattern Recognition:** Identifying typical load profiles (archetypes) using clustering.
* **Data Cleaning:** Detecting outliers.

Rationale and Use Cases of Key Functions
----------------------------------------

Below is a description of key functions, their purpose, and typical use cases.
For detailed API parameters, please refer to the :ref:`API documentation <API>`.

.. contents:: Key Functions
   :local:
   :depth: 1

reshape_timeseries
~~~~~~~~~~~~~~~~~~
* **Rationale:** Timeseries data is often a long 1D array. Reshaping it into a 2D
  matrix based on time attributes (e.g., rows as hours of the day, columns as
  days of the year) allows for powerful visualizations (like heatmaps) and
  makes it easier to observe daily, weekly, or seasonal patterns.
* **Use Case:** Transforming an annual hourly electricity demand series into a
  24 (hour) x 365 (day) matrix to visualize daily load shapes across the year
  using a heatmap. This can help identify when peak loads occur or how profiles
  change seasonally.
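
A minimal pandas sketch of this reshaping, assuming an hourly series on a
``DatetimeIndex`` covering one non-leap year. The helper
``reshape_hourly_to_matrix`` is hypothetical and only illustrates the idea; it
is not the signature of ``reshape_timeseries``.

```python
import numpy as np
import pandas as pd


def reshape_hourly_to_matrix(series: pd.Series) -> pd.DataFrame:
    """Pivot an hourly series into an (hour of day) x (day of year) matrix.

    Hypothetical helper for illustration, not enlopy's reshape_timeseries.
    """
    frame = pd.DataFrame({
        "value": series.to_numpy(),
        "hour": series.index.hour,
        "day": series.index.dayofyear,
    })
    # Rows: hour of day (0-23); columns: day of year (1-365).
    return frame.pivot(index="hour", columns="day", values="value")


idx = pd.date_range("2021-01-01", periods=8760, freq="h")
load = pd.Series(np.arange(8760, dtype=float), index=idx)
matrix = reshape_hourly_to_matrix(load)
print(matrix.shape)  # (24, 365)
```

The resulting matrix can be handed to any heatmap routine, for example
matplotlib's ``imshow``.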

get_LDC
~~~~~~~
* **Rationale:** The Load Duration Curve (LDC) is a fundamental tool in power system
  analysis. It sorts load values from highest to lowest, showing the percentage
  of time the load meets or exceeds a particular level. This helps in
  understanding the utilization of generation capacity and planning new investments.
* **Use Case:** Analyzing an annual hourly load profile to determine for how many
  hours the system load is above 80% of its peak, which informs decisions about
  peaking power plant requirements. It can also be used to compare the "peakiness"
  of different load profiles.
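
The sorting step and the "hours above 80% of peak" question can be sketched with
NumPy as below. Both helpers are hypothetical illustrations, not the ``get_LDC``
signature; the duration axis here is simply rank divided by the number of
timesteps.

```python
import numpy as np


def load_duration_curve(load):
    """Return (duration, sorted_load) for a load duration curve.

    sorted_load holds the loads from highest to lowest; duration[i] is
    (i + 1) / n, so plotting sorted_load against duration gives the LDC.
    Hypothetical helper, not enlopy's get_LDC.
    """
    sorted_load = np.sort(np.asarray(load, dtype=float))[::-1]  # highest first
    duration = np.arange(1, sorted_load.size + 1) / sorted_load.size
    return duration, sorted_load


def hours_above_share_of_peak(load, share=0.8):
    """Count timesteps in which the load exceeds `share` of its peak."""
    load = np.asarray(load, dtype=float)
    return int(np.count_nonzero(load > share * load.max()))


hourly_load = np.array([50.0, 100.0, 85.0, 90.0, 60.0])
duration, ldc = load_duration_curve(hourly_load)
print(ldc.tolist())                            # [100.0, 90.0, 85.0, 60.0, 50.0]
print(hours_above_share_of_peak(hourly_load))  # 3
```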

get_load_archetypes
~~~~~~~~~~~~~~~~~~~
* **Rationale:** In a large dataset of individual load profiles (e.g., from many
  smart meters), there are often recurring typical daily or weekly patterns.
  This function uses k-means clustering to identify these "archetypes" or
  representative profiles.
* **Use Case:** Segmenting a population of residential electricity consumers based
  on their typical daily usage patterns (e.g., "night owls," "morning peaks,"
  "daytime constant") for targeted demand-side management programs or tariff design.
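
The clustering step can be sketched as plain Lloyd's k-means over daily profiles
stored as matrix rows. This NumPy version rests on assumptions (farthest-point
initialisation, Euclidean distance, no normalisation of profiles);
``get_load_archetypes`` may rely on a library k-means with different defaults,
and ``kmeans_archetypes`` is a hypothetical helper.

```python
import numpy as np


def kmeans_archetypes(profiles, k, n_iter=100, seed=0):
    """Cluster daily profiles (one per row) into k archetypes.

    Returns (centroids, labels): centroids[j] is the mean profile of
    cluster j and labels[i] the cluster assigned to profiles[i].
    """
    X = np.asarray(profiles, dtype=float)
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation: start from a random profile, then
    # repeatedly add the profile farthest from all chosen centroids.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[dist.argmax()])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assign every profile to its nearest centroid (squared Euclidean).
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels


# Synthetic example: ten "morning peak" days and ten "evening peak" days.
rng = np.random.default_rng(1)
morning, evening = np.zeros(24), np.zeros(24)
morning[7], evening[19] = 1.0, 1.0
days = np.vstack([morning + 0.01 * rng.standard_normal(24) for _ in range(10)]
                 + [evening + 0.01 * rng.standard_normal(24) for _ in range(10)])
centroids, labels = kmeans_archetypes(days, k=2)
```

Each row of ``centroids`` is then one archetype, and ``labels`` tells which
archetype each day (or each consumer's typical day) belongs to.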

get_load_stats
~~~~~~~~~~~~~~
* **Rationale:** To quickly summarize key characteristics of a load profile over
  defined periods (e.g., monthly, annually). This function computes metrics like
  peak load, average load, load factor (average/peak), base load factor, and
  total operating hours, providing a snapshot of the load's behavior. It leverages
  descriptors from the ``enlopy.stats`` module.
* **Use Case:** Calculating monthly peak demand, average demand, and load factor for
  an industrial facility to track energy efficiency improvements or to report
  to energy regulators.
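
A monthly summary of this kind can be sketched with pandas resampling. The metric
definitions are partly assumptions: load factor is mean/peak as stated above,
while base load factor is taken here as minimum/peak and operating hours as
hours with non-zero load on an hourly index. ``monthly_load_stats`` is a
hypothetical helper, not ``get_load_stats`` itself.

```python
import pandas as pd


def monthly_load_stats(load: pd.Series) -> pd.DataFrame:
    """Per-month peak, mean, load factor, base load factor and operating hours."""
    monthly = load.resample("MS")  # one group per calendar month
    stats = pd.DataFrame({
        "peak": monthly.max(),
        "mean": monthly.mean(),
        "min": monthly.min(),
        # Assumes an hourly index, so counting timesteps counts hours.
        "operating_hours": (load > 0).resample("MS").sum(),
    })
    stats["load_factor"] = stats["mean"] / stats["peak"]
    stats["base_load_factor"] = stats["min"] / stats["peak"]  # assumed definition
    return stats


idx = pd.date_range("2021-01-01", "2021-02-28 23:00", freq="h")
load = pd.Series(2.0, index=idx)
load.iloc[0] = 4.0           # a single January peak hour
load.loc["2021-02"] = 1.0    # lower, flat February
load.iloc[-1] = 0.0          # one idle hour at the end of February
stats = monthly_load_stats(load)
```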

detect_outliers
~~~~~~~~~~~~~~~
* **Rationale:** Anomalous data points (outliers) can skew statistical analyses
  and lead to incorrect conclusions or model behavior. This function identifies
  such outliers based on their deviation from a rolling median, which is itself
  robust to outliers.
* **Use Case:** Cleaning a timeseries of sensor data (e.g., temperature, power output)
  by identifying and flagging readings that are likely errors before further
  processing or analysis. The identified outliers can then be removed or imputed
  using ``enlopy.generate.remove_outliers``.
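
A sketch of rolling-median outlier flagging with pandas. The window length and
the relative criterion (a deviation larger than ``rel_tol`` times the local
median) are assumptions chosen for illustration; ``detect_outliers`` may scale
the deviation differently, and ``flag_outliers`` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def flag_outliers(series: pd.Series, window: int = 5, rel_tol: float = 0.5) -> pd.Series:
    """Boolean mask of points far from the median of their neighbourhood."""
    # Centred rolling median; min_periods=1 keeps the edges defined.
    local_median = series.rolling(window, center=True, min_periods=1).median()
    deviation = (series - local_median).abs()
    return deviation > rel_tol * local_median.abs()


idx = pd.date_range("2021-01-01", periods=48, freq="h")
readings = pd.Series(np.linspace(10.0, 20.0, 48), index=idx)
readings.iloc[20] += 15.0  # inject a sensor spike
mask = flag_outliers(readings)
print(mask.sum())  # 1
```

Because the spike is only one value inside each window, the median of the
neighbouring windows is barely affected by it, so only the spike itself is
flagged.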

countweekend_days_per_month
~~~~~~~~~~~~~~~~~~~~~~~~~~~
* **Rationale:** A utility function that counts the number of weekend days (Saturdays and Sundays)
  within each month of a given timeseries' ``DatetimeIndex``. This is useful for analyses
  that need to normalize or compare data by the number of working vs. non-working days.
* **Use Case:** Normalizing monthly energy consumption data by the number of business days in
  each month (days in the month minus weekend days, ignoring public holidays) to get a more
  comparable measure of consumption intensity across months or years.
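
Counting weekend days per month can be sketched with pandas as below.
``weekend_days_per_month`` is a hypothetical helper, and its output format (a
Series indexed by monthly periods) is an assumption, not necessarily what
``countweekend_days_per_month`` returns.

```python
import pandas as pd


def weekend_days_per_month(index: pd.DatetimeIndex) -> pd.Series:
    """Number of distinct Saturdays and Sundays per month covered by `index`."""
    days = pd.DatetimeIndex(index.normalize().unique())
    weekend = days[days.dayofweek >= 5]  # Monday=0, ..., Saturday=5, Sunday=6
    return weekend.to_series().groupby(weekend.to_period("M")).size()


idx = pd.date_range("2021-01-01", "2021-02-28 23:00", freq="h")
weekends = weekend_days_per_month(idx)
print(weekends.tolist())  # [10, 8]
```

Business days per month then follow as calendar days minus these counts,
still ignoring public holidays.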
139 changes: 139 additions & 0 deletions docs/generate.rst
@@ -0,0 +1,139 @@
.. _generate_module:

enlopy.generate: Generating Energy Timeseries
=============================================

The ``enlopy.generate`` module provides a suite of tools for creating, synthesizing,
and manipulating energy-related timeseries data. These functions are essential
for simulations, modeling alternative scenarios, data augmentation, or when
actual high-resolution data is unavailable.

Core Functionalities
--------------------

The module covers several aspects of timeseries generation:

* **Creating profiles from base data:** Generating higher-resolution series from coarser data (e.g., daily to hourly) or from typical profiles.
* **Stochastic modeling:** Creating realistic synthetic timeseries based on statistical properties.
* **Transformations:** Modifying existing timeseries by adding noise, simulating demand response, or removing outliers.
* **Specialized generation:** Creating loads from Load Duration Curves (LDCs) or Power Spectral Densities (PSDs).

Rationale and Use Cases of Key Functions
----------------------------------------

Below is a description of some key functions, their purpose, and typical use cases.
For detailed API parameters, please refer to the :ref:`API documentation <API>`.

.. contents:: Key Functions
   :local:
   :depth: 1

disag_upsample
~~~~~~~~~~~~~~
* **Rationale:** Often, energy data is available at a coarse granularity (e.g., daily consumption),
  but models or analyses require higher resolution (e.g., hourly). This function
  distributes the coarser data points into finer intervals based on a representative
  disaggregation profile, ensuring the total sum over the original period is preserved.
* **Use Case:** Converting daily household energy consumption data to hourly data using a
  standard hourly consumption profile for that type of household.
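
The sum-preserving idea can be sketched with NumPy, assuming one fixed 24-step
profile applied to every day. ``disaggregate_daily_to_hourly`` is a hypothetical
helper, not the ``disag_upsample`` signature.

```python
import numpy as np


def disaggregate_daily_to_hourly(daily_totals, hourly_profile):
    """Spread each daily total over 24 hours following a normalised profile."""
    profile = np.asarray(hourly_profile, dtype=float)
    weights = profile / profile.sum()  # 24 weights summing to 1
    daily = np.asarray(daily_totals, dtype=float)
    # Row d holds day d's hourly values; flatten to one chronological series.
    return np.outer(daily, weights).ravel()


daily_kwh = [24.0, 36.0]
shape = np.r_[np.full(7, 0.5), np.full(12, 1.5), np.full(5, 1.0)]  # 24 hours
hourly_kwh = disaggregate_daily_to_hourly(daily_kwh, shape)
print(hourly_kwh.reshape(-1, 24).sum(axis=1))  # daily sums are preserved
```

Normalising the profile first is what guarantees that each day's hourly values
add up to the original daily total, whatever scale the profile was given in.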

gen_daily_stoch_el
~~~~~~~~~~~~~~~~~~
* **Rationale:** To create realistic, synthetic daily electricity load profiles when only
  aggregate daily energy is known or when multiple variations are needed for robust analysis.
  It uses pre-defined statistical means and standard deviations (derived from analysis
  of many households) per timestep, combined with a Gauss-Markov process to introduce
  autocorrelation.
* **Use Case:** Generating diverse daily load profiles for a set of simulated households
  in an agent-based model, where each household has a total daily energy consumption target.
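
The mechanism can be sketched as follows, under explicit assumptions:
``mean_profile`` and ``std_profile`` are placeholders for the statistics the
function ships with, the noise autocorrelation ``phi`` is an assumed value, and
negative draws are clipped before rescaling to the daily energy target.
``stochastic_daily_profile`` is hypothetical.

```python
import numpy as np


def stochastic_daily_profile(mean_profile, std_profile, daily_energy, phi=0.8, seed=None):
    """One synthetic day from per-timestep statistics and AR(1) noise.

    Hypothetical helper: mean_profile/std_profile stand in for the
    statistics used by gen_daily_stoch_el; phi is an assumed lag-1
    autocorrelation of the noise.
    """
    mean = np.asarray(mean_profile, dtype=float)
    std = np.asarray(std_profile, dtype=float)
    rng = np.random.default_rng(seed)
    # Standardised Gauss-Markov noise: unit variance, lag-1 correlation phi.
    z = np.empty(mean.size)
    z[0] = rng.standard_normal()
    for i in range(1, mean.size):
        z[i] = phi * z[i - 1] + np.sqrt(1.0 - phi ** 2) * rng.standard_normal()
    profile = np.clip(mean + std * z, 0.0, None)  # no negative demand
    # Rescale so that the day's total matches the requested energy.
    return profile * daily_energy / profile.sum()


hours = np.arange(24)
mean_profile = 1.0 + 0.5 * np.sin(2 * np.pi * (hours - 12) / 24)  # placeholder, evening peak
day = stochastic_daily_profile(mean_profile, 0.2 * mean_profile, daily_energy=12.0, seed=7)
```

Calling it with different seeds gives different but equally plausible days that
all meet the same daily energy target.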

gen_load_from_daily_monthly
~~~~~~~~~~~~~~~~~~~~~~~~~~~
* **Rationale:** Constructing an annual hourly load profile when only monthly total consumption
  and typical daily profiles (for weekdays and weekends) are available. This is common
  in energy planning or when detailed historical data is scarce.
* **Use Case:** Creating a year-long hourly electricity demand forecast for a region
  based on projected monthly energy demands and established daily usage patterns for
  residential and commercial sectors.
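
A sketch of the construction, assuming one 24-value profile for Monday to Friday,
one for Saturday and Sunday, and a per-month rescaling so that each month's
hourly values add up to its total. ``hourly_from_monthly`` is a hypothetical
helper; ``gen_load_from_daily_monthly`` may take different inputs.

```python
import numpy as np
import pandas as pd


def hourly_from_monthly(monthly_energy, weekday_profile, weekend_profile, year=2021):
    """Hourly series for one year from 12 monthly totals and two daily shapes."""
    idx = pd.date_range(f"{year}-01-01", f"{year}-12-31 23:00", freq="h")
    hour = idx.hour.to_numpy()
    is_weekend = idx.dayofweek.to_numpy() >= 5
    weekday = np.asarray(weekday_profile, dtype=float)
    weekend = np.asarray(weekend_profile, dtype=float)
    # Unscaled shape: weekday or weekend value for each hour of the year.
    shape = pd.Series(np.where(is_weekend, weekend[hour], weekday[hour]), index=idx)
    # Energy of the unscaled shape in each month, repeated for every hour.
    month_shape_sum = shape.groupby(idx.month).transform("sum")
    target = np.asarray(monthly_energy, dtype=float)[idx.month.to_numpy() - 1]
    return shape / month_shape_sum * target


monthly_mwh = np.linspace(900.0, 1200.0, 12)
weekday = np.r_[np.full(7, 0.6), np.full(12, 1.4), np.full(5, 0.9)]
hourly = hourly_from_monthly(monthly_mwh, weekday, np.ones(24))
```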

gen_load_sinus
~~~~~~~~~~~~~~
* **Rationale:** To generate synthetic timeseries that exhibit clear periodic behavior
  at multiple timescales (daily, weekly, annually). This is useful for creating
  baseline profiles or test data for models that need to capture seasonality.
* **Use Case:** Creating a synthetic temperature profile or a baseline renewable energy
  generation profile that follows predictable daily and annual cycles.
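
A sum-of-sinusoids sketch of such a profile. The periods (24 h, 168 h, 8760 h),
the additive form and the phase convention are assumptions, and
``sinusoidal_load`` is a hypothetical helper rather than the ``gen_load_sinus``
signature.

```python
import numpy as np


def sinusoidal_load(n_hours, base, amplitudes, phases_h=(0.0, 0.0, 0.0)):
    """Base level plus daily, weekly and annual sine waves at hourly steps.

    amplitudes and phases_h are ordered (daily, weekly, annual);
    phases are shifts in hours.
    """
    t = np.arange(n_hours, dtype=float)
    periods_h = (24.0, 168.0, 8760.0)  # day, week, non-leap year
    load = np.full(n_hours, float(base))
    for amp, period, phase in zip(amplitudes, periods_h, phases_h):
        load += amp * np.sin(2 * np.pi * (t - phase) / period)
    return load


# A 9 h phase shift puts the daily maximum at 15:00.
year = sinusoidal_load(8760, base=100.0, amplitudes=(10.0, 5.0, 20.0), phases_h=(9.0, 0.0, 0.0))
daily_only = sinusoidal_load(48, base=0.0, amplitudes=(1.0, 0.0, 0.0), phases_h=(9.0, 0.0, 0.0))
print(daily_only[:24].argmax())  # 15
```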

gen_corr_arrays
~~~~~~~~~~~~~~~
* **Rationale:** In many energy systems, multiple variables are correlated (e.g., wind
  speed and solar irradiance at different locations, or electricity prices and demand).
  This function generates multiple arrays of random numbers that exhibit a specified
  correlation structure, which is essential for Monte Carlo simulations and for generating
  realistic multivariate inputs.
* **Use Case:** Generating correlated wind speed timeseries for several nearby wind farms
  to assess the aggregated power output variability.
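
One standard way to obtain correlated samples is to multiply independent
standard normal draws by the Cholesky factor of the target correlation matrix.
The sketch below uses that technique; ``gen_corr_arrays`` may use a different
method, and ``correlated_normals`` is a hypothetical helper.

```python
import numpy as np


def correlated_normals(corr, n, seed=None):
    """Draw n samples of len(corr) standard normal variables with correlation `corr`.

    With corr = L @ L.T (Cholesky) and z made of independent N(0, 1) draws,
    the rows of L @ z have covariance corr.
    """
    corr = np.asarray(corr, dtype=float)
    L = np.linalg.cholesky(corr)  # requires corr to be positive definite
    z = np.random.default_rng(seed).standard_normal((corr.shape[0], n))
    return L @ z  # shape (n_variables, n_samples)


corr = [[1.0, 0.8], [0.8, 1.0]]
draws = correlated_normals(corr, 100_000, seed=42)
print(np.corrcoef(draws)[0, 1])  # close to 0.8
```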

gen_load_from_LDC
~~~~~~~~~~~~~~~~~
* **Rationale:** To create a sequence of load values that statistically matches a given
  Load Duration Curve (LDC). The LDC represents the amount of time the load is at or
  above a certain level. This method uses inverse transform sampling.
* **Important Note:** This method generates values that match the LDC's distribution
  but **loses the original temporal sequence**. The output is a set of load values,
  not a chronologically realistic timeseries. It is often a precursor to ``gen_load_from_PSD``.
* **Use Case:** Generating a set of hourly load values for a year that, when sorted,
  closely reproduce a target LDC (up to sampling noise) for planning purposes.
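
Inverse transform sampling from an LDC can be sketched with ``np.interp``,
assuming the curve is given as increasing duration fractions with matching,
non-increasing load levels. ``sample_from_ldc`` is a hypothetical helper, not the
``gen_load_from_LDC`` signature. The samples follow the LDC's distribution but
carry no meaningful temporal order.

```python
import numpy as np


def sample_from_ldc(duration, load, n, seed=None):
    """Draw n load values distributed according to a load duration curve.

    duration: increasing fractions of time in [0, 1].
    load: level at or above which the load stays for that fraction of time.
    A uniform draw u is read as a duration and mapped to a load level by
    linear interpolation (inverse transform sampling).
    """
    u = np.random.default_rng(seed).uniform(0.0, 1.0, size=n)
    return np.interp(u, duration, load)


# Linear LDC: the load exceeds x MW for (100 - x) percent of the time.
samples = sample_from_ldc([0.0, 1.0], [100.0, 0.0], n=100_000, seed=1)
print((samples > 80.0).mean())  # roughly 0.2
```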

gen_load_from_PSD
~~~~~~~~~~~~~~~~~
* **Rationale:** To generate a realistic timeseries that not only matches a target
  probability distribution (often derived from an LDC via ``gen_load_from_LDC``)
  but also has specific spectral characteristics (i.e., how power is distributed
  across frequencies, which reflects temporal patterns such as ramps and cycles).
  It uses the Iterated Amplitude Adjusted Fourier Transform (IAAFT) algorithm.
* **Use Case:** Taking hourly load values generated by ``gen_load_from_LDC`` and
  "shuffling" them to create a chronologically realistic annual load profile that
  exhibits typical daily and weekly patterns (captured in the PSD).
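
A compact NumPy sketch of the IAAFT iteration, assuming the target spectrum is
supplied as the Fourier amplitudes of a reference series of the same length.
Each pass imposes the target amplitudes while keeping the current phases, then
restores the exact value distribution by rank-ordering. ``iaaft`` is a
hypothetical helper; ``gen_load_from_PSD`` may differ in inputs, stopping
criterion and number of iterations.

```python
import numpy as np


def iaaft(values, target_amplitudes, n_iter=100, seed=None):
    """Reorder `values` so that its Fourier amplitudes approach a target.

    values: samples with the desired distribution (e.g. drawn from an LDC).
    target_amplitudes: np.abs(np.fft.rfft(reference)) for a reference series
        of the same length carrying the desired spectral shape.
    The result is a permutation of `values`, so the distribution is kept exactly.
    """
    rng = np.random.default_rng(seed)
    sorted_values = np.sort(np.asarray(values, dtype=float))
    n = sorted_values.size
    s = rng.permutation(sorted_values)  # random starting order
    for _ in range(n_iter):
        # 1) Impose the target amplitudes while keeping the current phases.
        phases = np.angle(np.fft.rfft(s))
        r = np.fft.irfft(target_amplitudes * np.exp(1j * phases), n=n)
        # 2) Restore the distribution: the k-th smallest value goes to the
        #    position holding the k-th smallest value of r.
        ranks = np.argsort(np.argsort(r))
        s = sorted_values[ranks]
    return s


# Reference with a clear daily cycle; its values are shuffled to destroy timing.
t = np.arange(24 * 30)
reference = 10.0 + 3.0 * np.sin(2 * np.pi * t / 24)
shuffled = np.random.default_rng(0).permutation(reference)
synthetic = iaaft(shuffled, np.abs(np.fft.rfft(reference)), n_iter=50, seed=1)
```

``synthetic`` holds exactly the same values as ``shuffled`` but should again show
a strong 24-step periodicity, which is the point of ending each pass with the
rank-ordering step.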

gen_gauss_markov
~~~~~~~~~~~~~~~~
* **Rationale:** To generate timeseries with autoregressive properties, meaning
  future values depend on past values plus some randomness. This is useful for
  modeling systems with inertia or memory, whose values change smoothly rather
  than erratically.
* **Use Case:** Simulating short-term load fluctuations or temperature variations where
  the current value is strongly influenced by the immediately preceding values.
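
A first-order Gauss-Markov (AR(1)) process can be sketched as below. The
parameterisation by stationary mean, standard deviation and lag-1 correlation
``phi`` is an assumption; ``gauss_markov`` is a hypothetical helper and
``gen_gauss_markov`` may be parameterised differently.

```python
import numpy as np


def gauss_markov(n, mean, std, phi, seed=None):
    """AR(1) series with stationary mean `mean`, std `std` and lag-1 correlation phi.

    x[t] = mean + phi * (x[t-1] - mean) + std * sqrt(1 - phi**2) * e[t],
    with e[t] ~ N(0, 1) and |phi| < 1.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n) * std * np.sqrt(1.0 - phi ** 2)
    x = np.empty(n)
    x[0] = mean + std * rng.standard_normal()  # start from the stationary distribution
    for t in range(1, n):
        x[t] = mean + phi * (x[t - 1] - mean) + noise[t]
    return x


series = gauss_markov(200_000, mean=10.0, std=2.0, phi=0.9, seed=3)
lag1 = np.corrcoef(series[:-1], series[1:])[0, 1]
print(lag1)  # close to 0.9
```

Scaling the innovation by ``sqrt(1 - phi**2)`` keeps the variance constant over
time, so ``phi`` controls smoothness without changing the spread of the series.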

add_noise
~~~~~~~~~
* **Rationale:** To introduce variability or uncertainty into an existing timeseries.
  Real-world data is rarely perfectly smooth, and adding noise can make simulations
  more realistic or test the robustness of models.
* **Use Case:** Adding random fluctuations to a deterministic solar power generation
  profile to account for unpredictable cloud cover.

gen_analytical_LDC
~~~~~~~~~~~~~~~~~~
* **Rationale:** To quickly generate a standard Load Duration Curve shape based on
  a few key empirical parameters (peak load, capacity factor, base load factor,
  operating hours). This avoids needing full timeseries data to get an LDC.
* **Use Case:** Quickly sketching an LDC for a system where only high-level statistics
  are known, for initial capacity planning or policy analysis.

gen_demand_response
~~~~~~~~~~~~~~~~~~~
* **Rationale:** To simulate the impact of demand response programs, which aim to
  reduce peak loads either by shifting demand to off-peak hours or by curtailing
  (shaving) load during peak times.
* **Use Case:** Assessing how much a utility can reduce its peak capacity requirements
  by implementing a residential demand response program that shifts a certain percentage
  of peak load.
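
Both mechanisms can be sketched with NumPy. Shaving clips the load above a
threshold and discards the excess; shifting moves that excess into the
lowest-load timesteps with a simple greedy rule (lowest first, each filled at
most up to the threshold). The threshold rule and the greedy redistribution are
assumptions for illustration; ``gen_demand_response`` may select and
redistribute energy differently, and both helpers are hypothetical.

```python
import numpy as np


def shave_peaks(load, threshold):
    """Curtail load above `threshold`; the clipped energy is lost."""
    return np.minimum(np.asarray(load, dtype=float), threshold)


def shift_peaks(load, threshold):
    """Move energy above `threshold` into the lowest-load timesteps.

    Greedy rule: visit timesteps from lowest to highest load and fill each
    up to the threshold until the removed energy is used up. Total energy
    is preserved whenever there is enough headroom below the threshold.
    """
    load = np.asarray(load, dtype=float)
    shifted = np.minimum(load, threshold)
    excess = float(np.sum(load - shifted))
    for i in np.argsort(shifted):  # lowest-load timesteps first
        if excess <= 0:
            break
        added = min(threshold - shifted[i], excess)
        shifted[i] += added
        excess -= added
    return shifted


load = np.array([5.0, 10.0, 15.0, 2.0])
print(shave_peaks(load, 10.0).tolist())  # [5.0, 10.0, 10.0, 2.0]
print(shift_peaks(load, 10.0).tolist())  # [5.0, 10.0, 10.0, 7.0]
```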

remove_outliers
~~~~~~~~~~~~~~~
* **Rationale:** Outliers in timeseries data can distort analysis and modeling. This
  function first detects outliers (using methods from ``enlopy.analysis``) and then
  replaces them with interpolated values, providing a cleaner dataset.
* **Use Case:** Preprocessing a measured electricity demand timeseries to remove anomalous
  readings caused by sensor errors before using it for forecasting.
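
Detection and replacement can be chained as in the pandas sketch below. It
assumes a simple rolling-median rule (a point deviating from its local median by
more than ``rel_tol`` times that median is an outlier) followed by linear
interpolation; the detection rule is an illustrative assumption and
``remove_outliers_sketch`` is hypothetical.

```python
import pandas as pd


def remove_outliers_sketch(series: pd.Series, window: int = 5, rel_tol: float = 0.5) -> pd.Series:
    """Flag points far from their rolling median and interpolate over them."""
    local_median = series.rolling(window, center=True, min_periods=1).median()
    is_outlier = (series - local_median).abs() > rel_tol * local_median.abs()
    # Flagged points become NaN, then are filled by linear interpolation.
    return series.mask(is_outlier).interpolate(limit_direction="both")


demand = pd.Series([10.0, 11.0, 50.0, 13.0, 14.0])
print(remove_outliers_sketch(demand).tolist())  # [10.0, 11.0, 12.0, 13.0, 14.0]
```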
7 changes: 6 additions & 1 deletion docs/index.rst
@@ -7,8 +7,13 @@ Contents
 --------
 
 .. toctree::
-   :maxdepth: 1
+   :maxdepth: 2
 
+   generate
+   analysis
+   plot
+   stats
+   utils
    API
 
 ``enlopy`` is an open source python library with methods to generate,
92 changes: 92 additions & 0 deletions docs/plot.rst
@@ -0,0 +1,92 @@
.. _plot_module:

enlopy.plot: Visualizing Energy Timeseries
===========================================

The ``enlopy.plot`` module provides a collection of functions for visualizing
energy-related timeseries data. These plotting utilities are designed to reveal
patterns, trends, and distributions within the data, often working in conjunction
with transformations from the ``enlopy.analysis`` module.

Core Visualizations
-------------------

The module offers several types of plots common in energy analysis:

* **Heatmaps and 3D plots:** For visualizing load across two time dimensions.
* **Percentile plots:** To understand temporal variations in load distribution.
* **Boxplots:** To compare distributions across different time categories.
* **Load Duration Curve (LDC) plots:** Standard visualization for power system analysis.
* **Rug plots:** For displaying activity or comparing multiple timeseries.

Rationale and Use Cases of Key Functions
----------------------------------------

Below is a description of key plotting functions, their purpose, and typical use cases.
For detailed API parameters, please refer to the :ref:`API documentation <API>`.

.. contents:: Key Functions
   :local:
   :depth: 1

plot_heatmap
~~~~~~~~~~~~
* **Rationale:** Heatmaps are an effective way to visualize the magnitude of a variable
  across two dimensions. For timeseries, this typically involves reshaping the data
  (e.g., using ``enlopy.analysis.reshape_timeseries``) so that one time attribute
  (like hour of day) forms one axis, and another (like day of year) forms the other.
  Color intensity represents the load magnitude.
* **Use Case:** Visualizing an entire year's hourly electricity demand to quickly
  identify periods of high/low consumption, seasonal trends, and daily patterns.
  For example, bright colors may appear during summer afternoons (AC load) and winter
  evenings (heating/lighting).

plot_3d
~~~~~~~
* **Rationale:** Similar to heatmaps, 3D surface plots can represent load magnitude
  across two time dimensions, but with the load value explicitly shown on the Z-axis.
  This can sometimes offer a more intuitive grasp of peaks and valleys in the data.
* **Use Case:** Creating a 3D representation of hourly load versus day of year to
  emphasize the height of peak demand periods and the depth of low-demand troughs.

plot_percentiles
~~~~~~~~~~~~~~~~
* **Rationale:** To understand how the distribution of load values changes over a
  specific cycle (e.g., daily, weekly). This function plots user-defined percentiles
  (e.g., 5th, 25th, 50th (median), 75th, 95th) for each point in the cycle,
  showing the typical range and variability of the load.
* **Use Case:** Plotting hourly percentiles of electricity demand for each day of the
  week. This can show, for instance, that while the median load on weekends is lower,
  its variability (the spread between the 5th and 95th percentiles) may be larger
  or differently shaped than on weekdays.
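
The statistics behind such a plot can be sketched with pandas by grouping the
load by hour of day and taking quantiles; grouping additionally by
``load.index.dayofweek`` would give one set of bands per weekday.
``hourly_percentiles`` is a hypothetical helper, not part of ``enlopy.plot``.

```python
import numpy as np
import pandas as pd


def hourly_percentiles(load: pd.Series, percentiles=(5, 25, 50, 75, 95)) -> pd.DataFrame:
    """Load percentiles for each hour of the day (rows: hour 0-23)."""
    percentiles = sorted(percentiles)
    table = load.groupby(load.index.hour).quantile([p / 100 for p in percentiles]).unstack()
    table.columns = [f"p{p}" for p in percentiles]
    return table


idx = pd.date_range("2021-01-04", periods=24 * 10, freq="h")  # ten days
# Value = 100 * hour + day number, so each hour has ten distinct values.
load = pd.Series(100 * idx.hour.to_numpy() + np.arange(idx.size) // 24, index=idx)
bands = hourly_percentiles(load)
print(bands.loc[3, "p50"])  # 304.5
```

Each column of ``bands`` is one line (or band edge) of the percentile plot.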

plot_rug
~~~~~~~~
* **Rationale:** Rug plots are useful for visualizing the activity or values of multiple
  timeseries simultaneously in a compact way. Each timeseries is represented by a
  horizontal "rug." For on/off data, dashes can indicate "on" periods. For continuous
  data, the color or intensity of dashes can represent magnitude.
* **Use Case:** Displaying the operational status (on/off) of multiple appliances in a
  household over a day, or visualizing the normalized output of several renewable
  energy sources (wind, solar) over time to see their collective behavior.

plot_boxplot
~~~~~~~~~~~~
* **Rationale:** Boxplots (or box-and-whisker plots) provide a standardized way to
  display the distribution of data: the box spans the first to third quartile with a
  line at the median. In the common (Tukey) convention, which is also matplotlib's
  default, the whiskers extend to the most extreme points within 1.5 times the
  interquartile range, and points beyond them are drawn individually as outliers.
  They are excellent for comparing distributions across different categories.
* **Use Case:** Comparing the distribution of hourly electricity demand for each day
  of the week. This can clearly show differences in median load, variability (interquartile
  range), and the presence of outliers for weekdays versus weekend days.
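
The numbers a Tukey-style boxplot draws can be sketched with NumPy as below,
assuming the common 1.5 x IQR whisker rule and linearly interpolated percentiles.
``boxplot_stats`` is a hypothetical helper; the plotting library used by
``plot_boxplot`` computes these values itself.

```python
import numpy as np


def boxplot_stats(values, whis=1.5):
    """Quartiles, whisker ends and outliers under the 1.5 x IQR rule."""
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = x[(x >= low_fence) & (x <= high_fence)]
    return {
        "q1": q1, "median": median, "q3": q3,
        # Whiskers end at the most extreme data points inside the fences.
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": x[(x < low_fence) | (x > high_fence)],
    }


stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(stats["outliers"].tolist())  # [100.0]
```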

plot_LDC
~~~~~~~~
* **Rationale:** Visualizing the Load Duration Curve (LDC), which is typically generated
  by ``enlopy.analysis.get_LDC``. This plot shows the relationship between load levels
  and the duration for which those levels are met or exceeded. It's a standard tool for
  assessing power system adequacy and operational characteristics.
* **Use Case:** Plotting the LDC for a regional electricity system to visualize how many
  hours per year different levels of generation capacity are utilized. Options allow
  for plotting multiple LDCs (e.g., for different scenarios or sub-regions) and
  zooming into the peak portion of the curve.
8 changes: 4 additions & 4 deletions docs/requirements.txt
@@ -1,4 +1,4 @@
-matplotlib==2.2.0
-numpy==1.22.0
-pandas==2.0.0
-scipy==1.10.0
+matplotlib>3.5.1,<3.6
+numpy
+pandas
+scipy