kavvkon · May 21, 2025
diff --git a/docs/API.rst b/docs/API.rst
@@ -19,6 +19,11 @@ Plotting module
 .. automodule:: enlopy.plot
     :members:
 
+Statistics module
+-----------------
+.. automodule:: enlopy.stats
+    :members:
+
 Utilities module
 ----------------
 .. automodule:: enlopy.utils

diff --git a/docs/analysis.rst b/docs/analysis.rst
@@ -0,0 +1,92 @@
+.. _analysis_module:
+
+enlopy.analysis: Analyzing Energy Timeseries
+=============================================
+
+The ``enlopy.analysis`` module offers a collection of functions designed to
+inspect, characterize, and extract meaningful insights from energy-related
+timeseries data. These tools are fundamental for understanding load patterns,
+variability, and for preparing data for further modeling or reporting.
+
+Core Functionalities
+--------------------
+
+The module focuses on:
+
+*   **Data Transformation:** Reshaping timeseries for easier analysis and visualization.
+*   **Load Characterization:** Calculating standard metrics like Load Duration Curves and key statistics.
+*   **Pattern Recognition:** Identifying typical load profiles (archetypes) using clustering.
+*   **Data Cleaning:** Detecting outliers.
+
+Rationale and Use Cases of Key Functions
+----------------------------------------
+
+Below is a description of key functions, their purpose, and typical use cases.
+For detailed API parameters, please refer to the :doc:`API documentation <API>`.
+
+.. contents:: Key Functions
+   :local:
+   :depth: 1
+
+reshape_timeseries
+~~~~~~~~~~~~~~~~~~
+*   **Rationale:** Timeseries data is often a long 1D array. Reshaping it into a 2D
+    matrix based on time attributes (e.g., rows as hours of the day, columns as
+    days of the year) allows for powerful visualizations (like heatmaps) and
+    makes it easier to observe daily, weekly, or seasonal patterns.
+*   **Use Case:** Transforming an annual hourly electricity demand series into a
+    24 (hour) x 365 (day) matrix to visualize daily load shapes across the year
+    using a heatmap. This can help identify when peak loads occur or how profiles
+    change seasonally.
+
+get_LDC
+~~~~~~~
+*   **Rationale:** The Load Duration Curve (LDC) is a fundamental tool in power system
+    analysis. It sorts load values from highest to lowest, showing the percentage
+    of time the load meets or exceeds a particular level. This helps in
+    understanding the utilization of generation capacity and planning new investments.
+*   **Use Case:** Analyzing an annual hourly load profile to determine for how many
+    hours the system load is above 80% of its peak, which informs decisions about
+    peaking power plant requirements. It can also be used to compare the "peakiness"
+    of different load profiles.
+
+get_load_archetypes
+~~~~~~~~~~~~~~~~~~~
+*   **Rationale:** In a large dataset of individual load profiles (e.g., from many
+    smart meters), there are often recurring typical daily or weekly patterns.
+    This function uses k-means clustering to identify these "archetypes" or
+    representative profiles.
+*   **Use Case:** Segmenting a population of residential electricity consumers based
+    on their typical daily usage patterns (e.g., "night owls," "morning peaks,"
+    "daytime constant") for targeted demand-side management programs or tariff design.
+
+get_load_stats
+~~~~~~~~~~~~~~
+*   **Rationale:** To quickly summarize key characteristics of a load profile over
+    defined periods (e.g., monthly, annually). This function computes metrics like
+    peak load, average load, load factor (average/peak), base load factor, and
+    total operating hours, providing a snapshot of the load's behavior. It leverages
+    descriptors from the ``enlopy.stats`` module.
+*   **Use Case:** Calculating monthly peak demand, average demand, and load factor for
+    an industrial facility to track energy efficiency improvements or to report
+    to energy regulators.
+
+detect_outliers
+~~~~~~~~~~~~~~~
+*   **Rationale:** Anomalous data points (outliers) can skew statistical analyses
+    and lead to incorrect conclusions or model behavior. This function identifies
+    such outliers by their deviation from a rolling median, a reference that is
+    itself robust to outliers.
+*   **Use Case:** Cleaning a timeseries of sensor data (e.g., temperature, power output)
+    by identifying and flagging readings that are likely errors before further
+    processing or analysis. The identified outliers can then be removed or imputed
+    using ``enlopy.generate.remove_outliers``.
+
+countweekend_days_per_month
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+*   **Rationale:** A utility function that counts the number of weekend days (Saturdays and Sundays)
+    within each month of a given timeseries' DatetimeIndex. This can be useful for analyses
+    that need to normalize or compare data based on the number of working vs. non-working days.
+*   **Use Case:** Normalizing monthly energy consumption data by the number of business days in
+    each month (approximated as total days minus weekend days) to get a more comparable measure
+    of consumption intensity, especially when comparing different months or years.
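
A minimal pure-Python sketch of three ideas described in docs/analysis.rst above: flagging outliers by their deviation from a rolling median, sorting a load into a Load Duration Curve, and computing the load factor (average/peak). It does not call ``enlopy`` and is not the library's implementation. The function names and the ``window`` and ``threshold`` parameters are hypothetical and chosen only for illustration.

```python
from statistics import median


def rolling_median_outliers(series, window=5, threshold=3.0):
    """Flag points further than `threshold` from a centered rolling median.

    `window` (in samples) and `threshold` (in data units) are hypothetical.
    """
    half = window // 2
    flags = []
    for i, value in enumerate(series):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        flags.append(abs(value - median(series[lo:hi])) > threshold)
    return flags


def load_duration_curve(load):
    """Return (share_of_time, load sorted from highest to lowest)."""
    ordered = sorted(load, reverse=True)
    n = len(ordered)
    # Share of time during which the load is at or above each sorted value.
    share = [(i + 1) / n for i in range(n)]
    return share, ordered


def load_factor(load):
    """Average load divided by peak load."""
    return (sum(load) / len(load)) / max(load)


raw_load = [3.0, 4.0, 5.0, 6.0, 5.0, 4.0, 4.0, 40.0, 4.0, 3.0]

# Step 1: flag the spike, then drop flagged points before characterizing the load.
flags = rolling_median_outliers(raw_load)
clean_load = [v for v, bad in zip(raw_load, flags) if not bad]

# Step 2: characterize the cleaned load.
share, ldc = load_duration_curve(clean_load)
lf = load_factor(clean_load)
steps_above_80pct = sum(1 for v in clean_load if v >= 0.8 * max(clean_load))
```

Dropping flagged points is only one option; the docs above mention imputation via ``enlopy.generate.remove_outliers`` as the alternative.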
diff --git a/docs/generate.rst b/docs/generate.rst
@@ -0,0 +1,139 @@
+.. _generate_module:
+
+enlopy.generate: Generating Energy Timeseries
+=============================================
+
+The ``enlopy.generate`` module provides a suite of tools for creating, synthesizing,
+and manipulating energy-related timeseries data. These functions are essential
+for simulations, modeling alternative scenarios, data augmentation, or when
+actual high-resolution data is unavailable.
+
+Core Functionalities
+--------------------
+
+The module covers several aspects of timeseries generation:
+
+*   **Creating profiles from base data:** Generating higher-resolution series from coarser data (e.g., daily to hourly) or from typical profiles.
+*   **Stochastic modeling:** Creating realistic synthetic timeseries based on statistical properties.
+*   **Transformations:** Modifying existing timeseries by adding noise, simulating demand response, or removing outliers.
+*   **Specialized generation:** Creating loads from Load Duration Curves (LDCs) or Power Spectral Densities (PSDs).
+
+Rationale and Use Cases of Key Functions
+----------------------------------------
+
+Below is a description of some key functions, their purpose, and typical use cases.
+For detailed API parameters, please refer to the :doc:`API documentation <API>`.
+
+.. contents:: Key Functions
+   :local:
+   :depth: 1
+
+disag_upsample
+~~~~~~~~~~~~~~
+*   **Rationale:** Often, energy data is available at a coarse granularity (e.g., daily consumption),
+    but models or analyses require higher resolution (e.g., hourly). This function
+    distributes the coarser data points into finer intervals based on a representative
+    disaggregation profile, ensuring the total sum over the original period is preserved.
+*   **Use Case:** Converting daily household energy consumption data to hourly data using a
+    standard hourly consumption profile for that type of household.
+
+gen_daily_stoch_el
+~~~~~~~~~~~~~~~~~~
+*   **Rationale:** To create realistic, synthetic daily electricity load profiles when only
+    aggregate daily energy is known or when multiple variations are needed for robust analysis.
+    It uses pre-defined statistical means and standard deviations (derived from analysis
+    of many households) per timestep, combined with a Gauss-Markov process to introduce
+    autocorrelation.
+*   **Use Case:** Generating diverse daily load profiles for a set of simulated households
+    in an agent-based model, where each household has a total daily energy consumption target.
+
+gen_load_from_daily_monthly
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+*   **Rationale:** To construct an annual hourly load profile when only monthly total consumption
+    and typical daily profiles (for weekdays and weekends) are available. This is common
+    in energy planning or when detailed historical data is scarce.
+*   **Use Case:** Creating a year-long hourly electricity demand forecast for a region
+    based on projected monthly energy demands and established daily usage patterns for
+    residential and commercial sectors.
+
+gen_load_sinus
+~~~~~~~~~~~~~~
+*   **Rationale:** To generate synthetic timeseries that exhibit clear periodic behavior
+    at multiple timescales (daily, weekly, annually). This is useful for creating
+    baseline profiles or test data for models that need to capture seasonality.
+*   **Use Case:** Creating a synthetic temperature profile or a baseline renewable energy
+    generation profile that follows predictable daily and annual cycles.
+
+gen_corr_arrays
+~~~~~~~~~~~~~~~
+*   **Rationale:** In many energy systems, multiple variables are correlated (e.g., wind
+    speed and solar irradiance at different locations, or electricity prices and demand).
+    This function generates multiple arrays of random numbers that exhibit a specified
+    correlation structure, essential for Monte Carlo simulations or for generating
+    realistic multi-variate inputs.
+*   **Use Case:** Generating correlated wind speed timeseries for several nearby wind farms
+    to assess the aggregated power output variability.
+
+gen_load_from_LDC
+~~~~~~~~~~~~~~~~~
+*   **Rationale:** To create a sequence of load values that statistically matches a given
+    Load Duration Curve (LDC). The LDC represents the amount of time the load is at or
+    above a certain level. This method uses inverse transform sampling.
+*   **Important Note:** This method generates values that match the LDC's distribution
+    but **loses the original temporal sequence**. The output is a set of load values,
+    not a chronologically realistic timeseries. It's often a precursor to ``gen_load_from_PSD``.
+*   **Use Case:** Generating a set of hourly load values for a year that, when sorted,
+    closely approximate a target LDC for planning purposes.
+
+gen_load_from_PSD
+~~~~~~~~~~~~~~~~~
+*   **Rationale:** To generate a realistic timeseries that not only matches a target
+    probability distribution (often derived from an LDC via ``gen_load_from_LDC``)
+    but also possesses specific spectral characteristics (i.e., how power is distributed
+    across different frequencies, indicating temporal patterns like ramps, cycles).
+    It uses the Iterated Amplitude Adjusted Fourier Transform (IAAFT) algorithm.
+*   **Use Case:** Taking hourly load values generated by ``gen_load_from_LDC`` and
+    "shuffling" them to create a chronologically realistic annual load profile that
+    exhibits typical daily and weekly patterns (captured in the PSD).
+
+gen_gauss_markov
+~~~~~~~~~~~~~~~~
+*   **Rationale:** To generate timeseries that exhibit autoregressive properties, meaning
+    future values depend on past values, along with some randomness. This is useful for
+    modeling systems with inertia or memory, where values transition smoothly rather
+    than changing erratically.
+*   **Use Case:** Simulating short-term load fluctuations or temperature variations where
+    the current value is strongly influenced by the immediately preceding values.
+
+add_noise
+~~~~~~~~~
+*   **Rationale:** To introduce variability or uncertainty into an existing timeseries.
+    Real-world data is rarely perfectly smooth, and adding noise can make simulations
+    more realistic or test the robustness of models.
+*   **Use Case:** Adding random fluctuations to a deterministic solar power generation
+    profile to account for unpredictable cloud cover.
+
+gen_analytical_LDC
+~~~~~~~~~~~~~~~~~~
+*   **Rationale:** To quickly generate a standard Load Duration Curve shape based on
+    a few key empirical parameters (peak load, capacity factor, base load factor,
+    operating hours). This avoids needing full timeseries data to get an LDC.
+*   **Use Case:** Quickly sketching an LDC for a system where only high-level statistics
+    are known, for initial capacity planning or policy analysis.
+
+gen_demand_response
+~~~~~~~~~~~~~~~~~~~
+*   **Rationale:** To simulate the impact of demand response programs, which aim to
+    reduce peak loads by either shifting demand to off-peak hours or by curtailing
+    (shaving) load during peak times.
+*   **Use Case:** Assessing how much a utility can reduce its peak capacity requirements
+    by implementing a residential demand response program that shifts a certain percentage
+    of peak load.
+
+remove_outliers
+~~~~~~~~~~~~~~~
+*   **Rationale:** Outliers in timeseries data can distort analysis and modeling. This
+    function first detects outliers (using methods from ``enlopy.analysis``) and then
+    replaces them with interpolated values, providing a cleaner dataset.
+*   **Use Case:** Preprocessing a measured electricity demand timeseries to remove anomalous
+    readings caused by sensor errors before using it for forecasting.
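
A minimal pure-Python sketch of two ideas described in docs/generate.rst above: sum-preserving disaggregation of coarse data with a representative profile, and a first-order Gauss-Markov (AR(1)) series. It does not call ``enlopy``, and the recursion shown is the textbook form, which is not necessarily the exact formulation the library uses. The function names, the four-step profile, and the ``r`` and ``seed`` parameters are hypothetical.

```python
import math
import random


def disaggregate(coarse_values, profile):
    """Spread each coarse value over len(profile) finer steps, preserving its total."""
    total = sum(profile)
    weights = [p / total for p in profile]  # normalized so the weights sum to 1
    return [value * w for value in coarse_values for w in weights]


def gauss_markov(n, r, seed=None):
    """AR(1) series x[t] = r * x[t-1] + sqrt(1 - r**2) * e[t], with e[t] ~ N(0, 1).

    `r` is the lag-1 autocorrelation (0 <= r <= 1); `seed` makes the draw repeatable.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(r * x[-1] + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0))
    return x


daily_energy = [24.0, 30.0]        # two days of daily consumption
within_day_shape = [1, 2, 3, 2]    # hypothetical profile with 4 steps per day
fine = disaggregate(daily_energy, within_day_shape)

noise = gauss_markov(n=48, r=0.9, seed=1)
```

The scaling factor ``sqrt(1 - r**2)`` keeps the variance of the series at 1 regardless of ``r``, so the autocorrelation can be tuned without changing the overall spread.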
diff --git a/docs/index.rst b/docs/index.rst
@@ -7,8 +7,13 @@ Contents
 --------
 
 .. toctree::
-   :maxdepth: 1
+   :maxdepth: 2
 
+   generate
+   analysis
+   plot
+   stats
+   utils
    API
 
 ``enlopy`` is an open source python library with methods to generate,

diff --git a/docs/plot.rst b/docs/plot.rst
@@ -0,0 +1,92 @@
+.. _plot_module:
+
+enlopy.plot: Visualizing Energy Timeseries
+===========================================
+
+The ``enlopy.plot`` module provides a collection of functions for visualizing
+energy-related timeseries data. These plotting utilities are designed to reveal
+patterns, trends, and distributions within the data, often working in conjunction
+with transformations from the ``enlopy.analysis`` module.
+
+Core Visualizations
+-------------------
+
+The module offers several types of plots common in energy analysis:
+
+*   **Heatmaps and 3D plots:** For visualizing load across two time dimensions.
+*   **Percentile plots:** To understand temporal variations in load distribution.
+*   **Boxplots:** To compare distributions across different time categories.
+*   **Load Duration Curve (LDC) plots:** Standard visualization for power system analysis.
+*   **Rug plots:** For displaying activity or comparing multiple timeseries.
+
+Rationale and Use Cases of Key Functions
+----------------------------------------
+
+Below is a description of key plotting functions, their purpose, and typical use cases.
+For detailed API parameters, please refer to the :doc:`API documentation <API>`.
+
+.. contents:: Key Functions
+   :local:
+   :depth: 1
+
+plot_heatmap
+~~~~~~~~~~~~
+*   **Rationale:** Heatmaps are an effective way to visualize the magnitude of a variable
+    across two dimensions. For timeseries, this typically involves reshaping the data
+    (e.g., using ``enlopy.analysis.reshape_timeseries``) so that one time attribute
+    (like hour of day) forms one axis, and another (like day of year) forms the other.
+    Color intensity represents the load magnitude.
+*   **Use Case:** Visualizing an entire year's hourly electricity demand to quickly
+    identify periods of high/low consumption, seasonal trends, and daily patterns.
+    For example, seeing bright colors during summer afternoons (AC load) and winter
+    evenings (heating/lighting).
+
+plot_3d
+~~~~~~~
+*   **Rationale:** Similar to heatmaps, 3D surface plots can represent load magnitude
+    across two time dimensions, but with the load value explicitly shown on the Z-axis.
+    This can sometimes offer a more intuitive grasp of peaks and valleys in the data.
+*   **Use Case:** Creating a 3D representation of hourly load versus day of year to
+    emphasize the height of peak demand periods and the depth of low-demand troughs.
+
+plot_percentiles
+~~~~~~~~~~~~~~~~
+*   **Rationale:** To understand how the distribution of load values changes over a
+    specific cycle (e.g., daily, weekly). This function plots user-defined percentiles
+    (e.g., 5th, 25th, 50th (median), 75th, 95th) for each point in the cycle,
+    showing the typical range and variability of the load.
+*   **Use Case:** Plotting hourly percentiles of electricity demand for each day of the
+    week. This can show, for instance, that while median load on weekends is lower,
+    the variability (spread between 5th and 95th percentiles) might be higher or different
+    in shape compared to weekdays.
+
+plot_rug
+~~~~~~~~
+*   **Rationale:** Rug plots are useful for visualizing the activity or values of multiple
+    timeseries simultaneously in a compact way. Each timeseries is represented by a
+    horizontal "rug." For on/off data, dashes can indicate "on" periods. For continuous
+    data, the color or intensity of dashes can represent magnitude.
+*   **Use Case:** Displaying the operational status (on/off) of multiple appliances in a
+    household over a day. Or, visualizing the normalized output of several renewable
+    energy sources (wind, solar) over time to see their collective behavior.
+
+plot_boxplot
+~~~~~~~~~~~~
+*   **Rationale:** Boxplots (or box-and-whisker plots) provide a standardized way to
+    display the distribution of data based on a five-number summary (minimum, first
+    quartile, median, third quartile, maximum). They are excellent for comparing
+    distributions across different categories.
+*   **Use Case:** Comparing the distribution of hourly electricity demand for each day
+    of the week. This can clearly show differences in median load, variability (interquartile
+    range), and the presence of outliers for weekdays versus weekend days.
+
+plot_LDC
+~~~~~~~~
+*   **Rationale:** To visualize the Load Duration Curve (LDC), which is typically generated
+    by ``enlopy.analysis.get_LDC``. This plot shows the relationship between load levels
+    and the duration for which those levels are met or exceeded. It's a standard tool for
+    assessing power system adequacy and operational characteristics.
+*   **Use Case:** Plotting the LDC for a regional electricity system to visualize how many
+    hours per year different levels of generation capacity are utilized. Options allow
+    for plotting multiple LDCs (e.g., for different scenarios or sub-regions) and
+    zooming into the peak portion of the curve.
diff --git a/docs/requirements.txt b/docs/requirements.txt
@@ -1,4 +1,4 @@
-matplotlib==2.2.0
-numpy==1.22.0
-pandas==2.0.0
-scipy==1.10.0
+matplotlib>3.5.1,<3.6
+numpy
+pandas
+scipy