Dev flow control anti windup #107

Draft
lnagel wants to merge 11 commits into main from dev-flow-control-anti-windup

@lnagel lnagel commented Feb 22, 2026

Branch Summary: Flow-Rate Scheduling + Back-Calculation Anti-Windup

Problem Statement

A multi-zone UFH system controlled by PWM (pulse-width modulation of valve open time) faces two interacting constraints:

  1. Flow-rate limits: The boiler/heat pump has a minimum and maximum aggregate flow rate. Opening too many zones simultaneously exceeds the heat source capacity; opening too few leaves insufficient flow for the heat source to fire at all.

  2. PID integral windup under contention: When zones compete for limited flow slots, some zones are deferred — their valves stay closed even though the PID controller commands heating. The PID integral accumulates error during deferral. Without correction, the integral ratchets to its clamp (100%), causing aggressive overshoot when the zone finally gets its turn.

These two features must work together: flow scheduling creates the contention, and back-calculation anti-windup prevents the PID from misbehaving because of it.

Feature 1: Back-Calculation Anti-Windup

Mechanism

At each observation period boundary (every 2 hours by default), the controller compares what the PID commanded against what was actually delivered:

  • u_commanded: duty cycle derived from last_requested_duration / observation_period. The last_requested_duration is captured at convergence points — valve state transitions (TURN_ON, TURN_OFF) and period boundaries — to reflect the duty cycle at the moment the PWM decision was last re-evaluated.

  • u_actual: used_duration / observation_period. used_duration accumulates real seconds of flow (valve open and flow confirmed). When a supply temperature sensor is available, accumulation is weighted by supply coefficient (actual supply temp / target supply temp), so partial-heat delivery counts proportionally.
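As a rough illustration of the u_actual bookkeeping described above, the sketch below accumulates used_duration one time step at a time. The function name, its parameters and the 60-second step are hypothetical and not taken from the branch. Whether the real code caps the supply coefficient at 1.0 is not stated here, so the sketch leaves it uncapped.

```python
def accumulate_used_duration(used_duration, dt, valve_open, flow_confirmed,
                             supply_temp=None, target_supply_temp=None):
    """Return used_duration (seconds) after one time step of length dt."""
    # Only time with the valve open AND flow confirmed counts as delivery.
    if not (valve_open and flow_confirmed):
        return used_duration
    weight = 1.0
    # Optional supply coefficient: actual supply temp / target supply temp.
    if supply_temp is not None and target_supply_temp:
        weight = supply_temp / target_supply_temp
    return used_duration + dt * weight


# One 60 s step with 30 degC supply against a 40 degC target counts as 45 s.
used = accumulate_used_duration(0.0, 60.0, True, True, 30.0, 40.0)
u_actual = 100.0 * used / 7200.0  # percent of a 2-hour observation period
print(used, u_actual)
```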

The correction uses the standard back-calculation formula (Åström & Hägglund):

Kt = Ki / Kp           (tracking gain = 1/Ti)
integral += Kt * (u_actual - u_commanded) * observation_period

When u_actual < u_commanded (zone was deferred or partially served), the correction is negative, pulling the integral down. When u_actual > u_commanded (rare — zone got more than requested), the correction is positive.
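A minimal sketch of the correction step, assuming duty cycles are expressed in percent (0 to 100) and the integral is clamped to [0, 100] as in the tests. The function name and signature are illustrative only and do not mirror pid.py.

```python
def back_calculate(integral, u_commanded, u_actual, ki, kp, observation_period,
                   integral_min=0.0, integral_max=100.0):
    """Return the integral after one back-calculation correction.

    Duty cycles are in percent; observation_period is in seconds.
    """
    kt = ki / kp  # tracking gain, equal to 1/Ti
    corrected = integral + kt * (u_actual - u_commanded) * observation_period
    # Assumed clamp: keep the result inside the configured integral range.
    return max(integral_min, min(integral_max, corrected))


# Zone fully deferred for one 2-hour period while 100 % was commanded.
deferred = back_calculate(100.0, u_commanded=100.0, u_actual=0.0,
                          ki=0.001, kp=50.0, observation_period=7200.0)
# Zone that received slightly more than requested: positive correction.
over_served = back_calculate(50.0, u_commanded=10.0, u_actual=20.0,
                             ki=0.001, kp=50.0, observation_period=7200.0)
print(deferred, over_served)
```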

Why convergence-point tracking?

The naive approach — comparing used_duration against the current PID duty cycle — fails because the PID output changes continuously. By the time the period ends, the current duty cycle no longer reflects what was commanded when the scheduling decision was made. Convergence-point tracking snapshots requested_duration at the moment the controller last made a meaningful decision about the zone, giving a stable reference for comparison.
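The snapshot idea can be sketched as a small record that is only updated at convergence points. The field names last_action_at and last_requested_duration follow the commit messages; the class itself, its methods and the example numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommandSnapshot:
    last_action_at: Optional[float] = None  # timestamp of the last convergence point (s)
    last_requested_duration: float = 0.0    # seconds requested at that moment

    def record(self, now, requested_duration):
        """Call only at convergence points: TURN_ON, TURN_OFF, period boundary."""
        self.last_action_at = now
        self.last_requested_duration = requested_duration

    def commanded_duty(self, observation_period):
        """u_commanded in percent, derived from the last snapshot."""
        return 100.0 * self.last_requested_duration / observation_period


snap = CommandSnapshot()
snap.record(now=0.0, requested_duration=1080.0)   # period start: PID asks for 15 %
snap.record(now=600.0, requested_duration=900.0)  # TURN_ON: PID now asks for 12.5 %
# The PID may keep drifting afterwards, but without a new convergence point
# the reference used at the period boundary stays at the TURN_ON value.
print(snap.commanded_duty(7200.0))
```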

Design question for review

Tracking gain Kt = Ki/Kp: This is the textbook choice for continuous PI controllers (Kt = 1/Ti where Ti = Kp/Ki). However, this system is not continuous — it's a sampled PWM system where corrections happen at discrete 2-hour boundaries. The effective correction per period is:

Δintegral = (Ki/Kp) × (u_actual - u_commanded) × observation_period

With default Ki=0.001, Kp=50, observation_period=7200s:

Kt × dt = (0.001/50) × 7200 = 0.144

So a 100% mismatch (u_actual = 0%, u_commanded = 100%) produces a correction of 0.144 × (0 − 100) = −14.4 integral units per period. Unwinding a fully saturated integral (100) therefore takes 100 / 14.4 ≈ 7 periods, or about 14 hours.
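The arithmetic can be reproduced with a few lines of Python. This only restates the numbers quoted above; the variable names are illustrative.

```python
import math

ki, kp = 0.001, 50.0
observation_period = 7200.0  # seconds (2 hours)

gain_per_period = (ki / kp) * observation_period       # Kt * dt
correction = gain_per_period * (0.0 - 100.0)           # u_actual - u_commanded, percent
periods_to_unwind = math.ceil(100.0 / abs(correction)) # from a saturated integral of 100
hours_to_unwind = periods_to_unwind * observation_period / 3600.0

print(gain_per_period, correction, periods_to_unwind, hours_to_unwind)
```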

Is this correction speed appropriate? Too fast risks oscillation (zone gets deferred → integral drops → zone loses priority → gets deferred again). Too slow means the integral stays elevated for many hours after a zone is starved.

Feature 2: Flow-Rate Scheduling

Max-flow enforcement

When aggregate flow from zones wanting to open would exceed optimal_flow_rate_max, TURN_ON candidates are admitted by priority (remaining quota descending — zones needing the most time get first access). Excess candidates are demoted to STAY_OFF. Already-running zones (STAY_ON) are never preempted mid-run.

Never-starve rule: If a single zone is the only candidate and no other zones are running (committed_flow = 0), it is admitted regardless of whether its flow exceeds max. This prevents a high-flow zone from being permanently starved. The boiler should still be able to modulate for a single circuit.
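A simplified sketch of max-flow admission with the never-starve rule, under stated assumptions: the Zone record and the admit_by_max_flow function are hypothetical stand-ins for apply_flow_constraint, and the loop keeps trying lower-priority candidates after one is rejected, which the description above does not specify.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    zone_id: str
    action: str             # "TURN_ON", "STAY_ON" or "STAY_OFF"
    flow_rate: float        # nominal flow, L/min
    remaining_quota: float  # seconds still owed in this period


def admit_by_max_flow(zones, max_flow):
    """Demote TURN_ON candidates that would push aggregate flow above max_flow."""
    # Running zones are never preempted; their flow is already committed.
    committed = sum(z.flow_rate for z in zones if z.action == "STAY_ON")
    candidates = sorted(
        (z for z in zones if z.action == "TURN_ON"),
        key=lambda z: z.remaining_quota,
        reverse=True,  # largest remaining quota first
    )
    # Never-starve: a lone candidate with nothing else running is always admitted.
    lone_candidate = committed == 0 and len(candidates) == 1
    for z in candidates:
        if lone_candidate or committed + z.flow_rate <= max_flow:
            committed += z.flow_rate
        else:
            z.action = "STAY_OFF"
    return zones


zones = [Zone(f"z{i}", "TURN_ON", 2.0, quota)
         for i, quota in enumerate([100, 500, 300, 200, 400])]
admitted = [z.zone_id for z in admit_by_max_flow(zones, max_flow=6.0)
            if z.action == "TURN_ON"]
print(admitted)
```

With five 2 L/min zones and a 6 L/min cap, the three zones with the largest remaining quota are admitted and the other two are demoted.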

Min-flow enforcement (new in this branch)

After max-flow admission, if total prospective flow (STAY_ON + admitted TURN_ON zones) is below optimal_flow_rate_min, all TURN_ON candidates are demoted to STAY_OFF. The rationale: with insufficient flow, the boiler/heat pump won't fire. Opening valves just circulates cold water through the buffer/pipes — wasting pump energy and providing no useful heating.

STAY_ON zones are not demoted — closing a running valve mid-cycle would cause unnecessary wear and transient behavior.
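The min-flow rule can be sketched as a second pass over the actions that survive max-flow admission. The enforce_min_flow name and the dictionary-based interface are assumptions made for illustration, not the signature used in scheduler.py.

```python
def enforce_min_flow(actions, flow_rates, min_flow):
    """Demote all TURN_ON candidates when prospective flow is below min_flow.

    actions maps zone id to "TURN_ON", "STAY_ON" or "STAY_OFF";
    flow_rates maps zone id to nominal flow in L/min.
    """
    prospective = sum(flow_rates[z] for z, a in actions.items()
                      if a in ("STAY_ON", "TURN_ON"))
    if prospective >= min_flow:
        return dict(actions)
    # STAY_ON zones are left alone; only would-be openers are held back.
    return {z: ("STAY_OFF" if a == "TURN_ON" else a) for z, a in actions.items()}


flows = {"z0": 2.0, "z1": 2.0}
print(enforce_min_flow({"z0": "TURN_ON", "z1": "STAY_OFF"}, flows, min_flow=4.0))
print(enforce_min_flow({"z0": "TURN_ON", "z1": "TURN_ON"}, flows, min_flow=4.0))
print(enforce_min_flow({"z0": "STAY_ON", "z1": "STAY_OFF"}, flows, min_flow=4.0))
```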

Interaction with back-calculation

When min-flow enforcement defers a zone:

  1. The zone's valve stays closed → used_duration = 0
  2. At period end, back-calculation sees u_actual = 0 vs u_commanded > 0
  3. The integral is corrected downward, preventing windup
  4. The zone's priority (remaining quota) stays proportional to actual demand, not inflated by accumulated integral error

Without min-flow enforcement, the zone's valve would open but the boiler wouldn't fire (heat_request suppressed). The zone would accumulate used_duration from valve-open time, making u_actual ≈ u_commanded, and back-calculation would see no mismatch. The integral would ratchet to 100% unchecked.

Design question for review

Should min-flow ever be overridden? Currently, if only 1 zone has demand and the other 4 are satisfied, that zone is deferred indefinitely, until another zone develops demand. In a real installation:

  • Is it acceptable for a single zone to wait indefinitely for a companion zone? This could mean a zone stays 2°C below setpoint for hours.
  • Would it be better to apply a timeout (e.g., after N consecutive deferrals, allow the zone to open alone and rely on the boiler's own minimum-flow protection)?
  • Some boilers have built-in minimum-flow bypass valves — should the controller assume this and let the zone open regardless?

Room Thermal Model

Simulations use a lumped-capacitance model per EN ISO 13790:

dT/dt = (Q_gain - Q_loss) / C

Q_loss = U × (T_room - T_outdoor)     [W/m²]
Q_gain = P_heating  if valve open      [W/m²]
C = thermal_mass × 1000               [J/(K·m²)]
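As an illustration, the model can be integrated with a forward-Euler step. The function below is a sketch under that assumption (the harness may integrate differently), and its name and parameters are hypothetical. The example drives the borderline archetype with the valve permanently open and shows the temperature settling near 5 + 50/4.18 °C.

```python
def step_room_temp(t_room, t_outdoor, valve_open, u, p_heating,
                   thermal_mass_kj, dt):
    """Advance the room temperature by dt seconds (forward Euler)."""
    q_loss = u * (t_room - t_outdoor)          # W/m2
    q_gain = p_heating if valve_open else 0.0  # W/m2
    c = thermal_mass_kj * 1000.0               # J/(K*m2)
    return t_room + (q_gain - q_loss) / c * dt


# Borderline room: U=4.18, P=50, thermal mass 200 kJ/(K*m2), outdoor 5 degC.
t = 21.0
for _ in range(240 * 60):  # 240 hours of 60-second steps
    t = step_room_temp(t, 5.0, True, 4.18, 50.0, 200.0, 60.0)
print(round(t, 2))
```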

Three room archetypes (per EN 12831 / EN 1264):

| Archetype | Thermal mass [kJ/(K·m²)] | Heat loss coeff. [W/(K·m²)] | UFH power [W/m²] | Time constant | Notes |
|---|---|---|---|---|---|
| well_insulated | 120 | 0.56 | 30 | ~60 h | Passivhaus / current Nordic code |
| moderate | 165 | 1.65 | 75 | ~28 h | 1980s-2000s renovation |
| borderline | 200 | 4.18 | 50 | ~13 h | Pre-1960s, undersized UFH |
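The quoted time constants are consistent with τ = C / U for a first-order lumped model. The short check below recomputes them from the archetype values; the dictionary layout and function name are illustrative.

```python
ARCHETYPES = {
    # name: (thermal mass in kJ/(K*m2), heat loss coefficient in W/(K*m2))
    "well_insulated": (120.0, 0.56),
    "moderate": (165.0, 1.65),
    "borderline": (200.0, 4.18),
}


def time_constant_hours(thermal_mass_kj, u):
    """tau = C / U, converted from seconds to hours."""
    return thermal_mass_kj * 1000.0 / u / 3600.0


taus = {name: time_constant_hours(c, u) for name, (c, u) in ARCHETYPES.items()}
for name, tau in taus.items():
    print(f"{name}: {tau:.1f} h")
```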

The simulation harness models valve ramp (180s open, 90s close) and flow detection (valve position > 85% threshold).
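A sketch of how such a ramp could be modelled, assuming the valve position moves linearly between 0 and 1. The linear shape and both function names are assumptions; the harness may use a different actuator model.

```python
def step_valve_position(position, commanded_open, dt,
                        open_time=180.0, close_time=90.0):
    """Move the valve position (0.0 to 1.0) linearly towards its target."""
    if commanded_open:
        return min(1.0, position + dt / open_time)
    return max(0.0, position - dt / close_time)


def flow_detected(position, threshold=0.85):
    """Flow counts as present once the valve is more than 85 % open."""
    return position > threshold


# With 60-second steps, count how many steps pass before flow is detected.
position, steps = 0.0, 0
while not flow_detected(position):
    position = step_valve_position(position, True, 60.0)
    steps += 1
print(steps)
```

Under these assumptions the first minutes of every run deliver no detected flow, which is the ramp overhead that test_under_delivery_correction refers to.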

Test Scenarios

All simulations use 60-second time steps and 2-hour observation periods unless noted.

Anti-Windup Tests (test_anti_windup.py)

test_integral_clamps_at_max

  • Setup: Borderline room (U=4.18, P=50), outdoor=5°C, setpoint=21°C. The system cannot reach setpoint: max achievable temp ≈ 5 + 50/4.18 ≈ 17°C.
  • Duration: 24 hours.
  • Assertion: Integral clamps at 100 (not higher, not oscillating).
  • Purpose: Validates basic anti-windup clamp works. The integral should saturate cleanly at the configured maximum.

test_integral_clamped_at_zero_above_setpoint

  • Setup: Well-insulated room, outdoor=5°C, initial_temp=25°C (above setpoint=21°C).
  • Duration: 4 hours (cooling phase).
  • Assertion: Integral stays at 0 during cooling phase. Heat request is False.
  • Purpose: Negative error should not drive integral below its floor (0). No heating while room is above setpoint.

test_integral_recovers_from_clamp

  • Setup: Borderline room, outdoor=5°C. Integral saturates to 100.
  • Mutation at 12h: Outdoor warms to 20°C, setpoint stays 21°C.
  • Duration: 48 hours.
  • Assertion: Integral drops below 90 after 36h (conditions improved, demand fell).
  • Purpose: Integral must unwind when conditions improve, not stay permanently at 100.

test_under_delivery_correction

  • Setup: Well-insulated room, outdoor=17°C, setpoint=21°C. Theoretical duty = 0.56×4/30×100 ≈ 7.5%.
  • Duration: 48 hours.
  • Assertion: Integral bounded [0,100], stable (drift < 5) after 24h.
  • Purpose: Valve ramp overhead causes slight under-delivery each cycle. Back-calculation should prevent cumulative integral drift from this systematic mismatch.

test_over_delivery_tolerance

  • Setup: Well-insulated room, outdoor=18.87°C. Theoretical duty ≈ 4.0%.
  • Duration: 48 hours.
  • Assertion: Integral bounded and stable (drift < 5) after 24h.
  • Purpose: Duty near the min-run-time threshold — zone fires for 540s minimum even when PID requests less. Small over-delivery should not cause integral oscillation.

test_sustained_under_delivery

  • Setup: Well-insulated room, outdoor=19.67°C. Theoretical duty ≈ 2.5%.
  • Duration: 48 hours.
  • Assertion: Integral converges below 15 after 24h, stable (drift < 5).
  • Purpose: Duty well below min-run threshold. Zone fires sporadically (approximately every other period). Without back-calculation, integral would climb steadily. With it, integral should settle near the true steady-state duty.

test_low_kp_integral_converges

  • Setup: Well-insulated room, outdoor=19°C, Kp=10 (reduced from default 50). Theoretical duty ≈ 3.7%.
  • Duration: 48 hours.
  • Assertion: Integral converges below 15 after 24h, stable (drift < 10).
  • Purpose: Low Kp means the integral carries more of the steady-state load. Verifies back-calculation works correctly when the integral is the dominant PID term.

test_demand_transition_bounded_overshoot

  • Setup: Well-insulated room, Kp=10. Phase 1 (0–24h): outdoor=20°C, duty ≈ 1.9%. Phase 2 (24–48h): outdoor drops to 0°C, duty ≈ 39.2%.
  • Mutation at 24h: Outdoor temp drops from 20°C to 0°C.
  • Assertion: Room temperature never exceeds setpoint + 2°C during the transition.
  • Purpose: After a sudden demand increase ("cold snap"), the controller should respond aggressively without causing temperature overshoot. The integral state from the mild phase should not cause problems.
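The theoretical duty figures quoted in the scenarios above follow from the steady-state heat balance U × (setpoint − outdoor) = duty × P. The helpers below recompute them; the function names are illustrative and not part of the test suite.

```python
def theoretical_duty_percent(u, p_heating, setpoint, outdoor):
    """Steady-state duty (percent) that balances heat loss at the setpoint."""
    duty = 100.0 * u * (setpoint - outdoor) / p_heating
    return max(0.0, min(100.0, duty))


def max_achievable_temp(u, p_heating, outdoor):
    """Room temperature reached with the valve permanently open."""
    return outdoor + p_heating / u


# Well-insulated room (U=0.56, P=30), setpoint 21 degC.
duties = {outdoor: round(theoretical_duty_percent(0.56, 30.0, 21.0, outdoor), 1)
          for outdoor in (17.0, 18.87, 19.67, 19.0, 20.0, 0.0)}
print(duties)

# Borderline room (U=4.18, P=50) at outdoor 5 degC saturates at 100 % duty.
print(theoretical_duty_percent(4.18, 50.0, 21.0, 5.0))
print(round(max_achievable_temp(4.18, 50.0, 5.0), 1))
```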

Flow Control Tests (test_flow_control.py)

All flow control tests use 5 zones × 2 L/min each, min_flow=4 L/min (requires ≥ 2 zones), max_flow=6 L/min (allows ≤ 3 zones), well-insulated room archetype, Kp=30, Ki=0.001.

test_flow_limited_zones_reach_setpoint

  • Setup: 5 zones at outdoor temps [5, 10, 12, 15, 18]°C. Varying demand.
  • Duration: 72 hours.
  • Assertion: All zones within ±1°C of setpoint after 48h. Integrals bounded [0,100].
  • Purpose: Despite only 3 zones firing simultaneously, all 5 zones eventually reach setpoint through time-sharing. The scheduler's priority system (remaining quota descending) ensures fair allocation.

test_flow_limited_integral_stays_reasonable

  • Setup: 5 zones, all at outdoor=15°C (uniform demand). Theoretical duty per zone ≈ 11.2%, so aggregate demand is 5 × 11.2% = 56%; with only 3 slots, each zone gets ~67% of its requested time.
  • Duration: 72 hours.
  • Assertion: All zone integrals converge below 30 after 36h, stable (drift < 10).
  • Purpose: Uniform contention — all zones are equally deferred. Without back-calculation, integrals would all ratchet to 100. With it, they should settle at a level consistent with the actual delivered duty.

test_flow_limited_fair_allocation

  • Setup: 5 zones at outdoor temps [0, 5, 10, 15, 18]°C. Widely varying demand.
  • Duration: 72 hours.
  • Assertion: Coldest zone (outdoor=0) gets more total used_duration than warmest (outdoor=18). All zones within ±1.5°C of setpoint after 48h.
  • Purpose: The scheduler's priority system (front-loading by remaining quota) should give proportionally more runtime to zones with higher demand.

test_heat_request_requires_min_flow

  • Setup: 1 zone at outdoor=0°C (extreme demand), 4 zones at outdoor=22°C (above setpoint, no demand).
  • Duration: 24 hours.
  • Assertion: z0 valve never opens, z0 never has flow, heat_request is never True.
  • Purpose: A single high-demand zone (2 L/min) cannot satisfy min_flow (4 L/min). Min-flow enforcement prevents the valve from opening. This is the correct behavior — opening the valve would circulate cold water with no boiler firing.

test_single_demand_zone_deferred_by_min_flow

  • Setup: 1 zone at outdoor=19°C (moderate demand, 2°C below setpoint), 4 zones at outdoor=22°C (no demand).
  • Duration: 48 hours.
  • Assertion: z0 integral average stays below 20 after 24h (back-calc prevents ratcheting). z0 integral stable (drift < 20). z0 temp near outdoor (~19-20°C). z0 never has flow.
  • Purpose: The key interaction test. Min-flow defers the zone → valve stays closed → used_duration=0 → back-calculation sees u_actual=0 vs u_commanded≈60% → integral corrected ≈-14.4 per period. Without back-calculation, the integral would ratchet to 100 in ~7 periods. With it, the integral oscillates in a bounded range.
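The expected behaviour of the single deferred zone can be approximated with a toy loop, under simplifying assumptions: the error stays at a constant 2°C because no heat is delivered, u_commanded is snapshotted at each period start as Kp × error + integral, and the integral is clamped to [0, 100]. This is not the test harness, and the function name is hypothetical.

```python
def simulate_deferred_zone(periods, kp=30.0, ki=0.001, error=2.0,
                           observation_period=7200.0, back_calc=True):
    """Return (peak, end_of_period) integral values for a zone that never runs."""
    def clamp(x):
        return max(0.0, min(100.0, x))

    integral = 0.0
    history = []
    for _ in range(periods):
        # Snapshot of the commanded duty (percent) at the period start.
        u_commanded = clamp(kp * error + integral)
        # Ordinary integral accumulation over the period.
        integral = clamp(integral + ki * error * observation_period)
        peak = integral
        if back_calc:
            # Valve never opened, so u_actual is 0.
            integral = clamp(
                integral + (ki / kp) * (0.0 - u_commanded) * observation_period)
        history.append((peak, integral))
    return history


with_bc = simulate_deferred_zone(12, back_calc=True)
without_bc = simulate_deferred_zone(12, back_calc=False)
print(max(peak for peak, _ in with_bc))
print([round(end, 1) for _, end in without_bc])
```

In this toy model the accumulated +14.4 per period is cancelled by the correction, so the integral stays bounded, while the uncorrected run reaches the clamp in the seventh period.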

Unit Tests (test_scheduler.py)

Six unit tests covering edge cases in apply_flow_constraint:

  • Zone without nominal_flow_rate passes through unconstrained (max-flow)
  • Single zone exceeding max still admitted (never-starve rule)
  • Second zone demoted when exceeding max budget
  • Single TURN_ON below min-flow demoted to STAY_OFF
  • STAY_ON preserved even when below min-flow
  • Sufficient aggregate flow allows TURN_ON

Open Questions for Heating Engineer Review

  1. Min-flow threshold vs boiler protection: Most modern boilers have a built-in minimum flow switch or bypass valve. Should the controller's min-flow enforcement be considered a "soft" optimization (save pump energy) or a "hard" safety requirement (prevent boiler lockout)? This affects whether the never-starve rule should apply to min-flow as well.

  2. Single-zone deferral duration: When only one zone has demand, it is deferred indefinitely. In practice, how long is acceptable for a zone to wait? Would a 2-hour timeout (one observation period) be reasonable before overriding the min-flow constraint?

  3. Correction speed: The back-calculation corrects ~14.4 integral units per 2-hour period for a full mismatch. With integral range [0, 100], this means about 14 hours to fully unwind a saturated integral. Is this response speed appropriate for residential UFH with its long thermal time constants (13–60 hours)?

  4. Supply coefficient weighting: When a supply temperature sensor is configured, used_duration is weighted by actual_supply_temp / target_supply_temp. This means a zone receiving 30°C water when the target is 40°C accumulates at 75% rate. Is this linear approximation reasonable for UFH heat transfer, or should it be non-linear (e.g., accounting for floor surface resistance)?

  5. Valve ramp in flow calculation: Flow is detected when valve position exceeds 85%. The simulation models valve ramp (180s to open, 90s to close), but the flow constraint uses nominal_flow_rate as a binary value — a zone either contributes its full flow or zero. In practice, partially-open valves have reduced flow. Does this simplification matter for the scheduling decisions, or is the binary model adequate?

  6. Observation period alignment: Observation periods are aligned to midnight (00:00, 02:00, 04:00...). This means the first period after startup may be short. Back-calculation uses the full observation_period as dt, regardless of the actual elapsed time. Should this be the actual period duration instead?
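To make question 6 concrete, the sketch below computes midnight-aligned period bounds and compares the correction gain for the full period against the actual elapsed time of a short first period. The period_bounds helper and the 13:20 startup time are illustrative assumptions.

```python
from datetime import datetime, timedelta

OBSERVATION_PERIOD = timedelta(hours=2)


def period_bounds(now, period=OBSERVATION_PERIOD):
    """Return (start, end) of the midnight-aligned period containing now."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    index = (now - midnight) // period  # whole periods elapsed since midnight
    start = midnight + index * period
    return start, start + period


startup = datetime(2026, 2, 22, 13, 20)
start, end = period_bounds(startup)
first_period_seconds = (end - startup).total_seconds()

kt = 0.001 / 50.0  # default Ki / Kp
full_dt_gain = kt * OBSERVATION_PERIOD.total_seconds()  # uses the nominal 2 hours
actual_dt_gain = kt * first_period_seconds              # uses the elapsed 40 minutes
print(start, end, first_period_seconds, full_dt_gain, actual_dt_gain)
```

In this example the nominal dt applies three times the correction that the elapsed time would.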

lnagel and others added 11 commits February 22, 2026 13:20
Correct the PID integral at observation period boundaries by comparing
actual valve delivery against commanded duty cycle. This prevents
integral windup when the duty cycle is too small for the minimum run
time, avoiding excessive overshoot when demand later rises.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use a period-start snapshot (commanded_duty_cycle) instead of the
end-of-period PID duty cycle as u_commanded. This prevents sign errors
when the duty cycle drifts during the observation period — e.g., when
the PI outputs 15% at period start but drops to 6% as the room warms.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update data flow diagram to reflect back-calculation step during
period transition. Fix pid.py docstring that still referenced
"period end" after the snapshot refactor.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The single period-start snapshot was too coarse — the PID drifts during
the period, and when a valve delivers its quota and closes, the stale
snapshot can cause false corrections. Replace with paired fields
(last_action_at, last_requested_duration) bumped forward at PWM
convergence points (valve open, valve close, period start). These fields
are also persisted, fixing incorrect corrections after restart.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The harness was missing the mid-period convergence point tracking
that the coordinator performs after evaluate(). This ensures
TURN_ON/TURN_OFF events update last_requested_duration for
accurate back-calculation at period boundaries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- assert_integral_converged helper: checks integral average stays
  below a threshold after settling time
- test_low_kp_integral_converges: kp=10 at outdoor=19°C, verifies
  integral settles proportional to steady-state duty (~3.73%)
- test_demand_transition_bounded_overshoot: outdoor 20→0°C cold
  snap, verifies temperature stays within setpoint ± 2°C
- Strengthen test_sustained_under_delivery with integral level
  check (max_value=15 vs theoretical duty ~2.5%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add per-zone nominal flow rate configuration and two flow-rate constraints
to the heating controller:

- Flow constraint (max): Limits how many zones can be ON simultaneously by
  capping aggregate flow rate. TURN_ON candidates are prioritized by
  remaining quota (front-loading high-demand zones). Zones already ON are
  never preempted, and a single zone is never starved.

- Flow minimum gating: Suppresses the boiler heat request when aggregate
  flow from active zones is below a configured minimum threshold (latent
  heat mode), allowing residual heat to be used before firing the boiler.

Both thresholds are optional and configurable via a new "Flow Scheduling"
options menu step. Per-zone nominal flow rate is configurable in zone
entity settings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace string literals with CONF_NOMINAL_FLOW_RATE,
CONF_OPTIMAL_FLOW_RATE_MIN, and CONF_OPTIMAL_FLOW_RATE_MAX constants
in config_flow.py and coordinator.py for consistency and typo prevention.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Zones wanting TURN_ON are now demoted to STAY_OFF when total prospective
flow (STAY_ON + admitted TURN_ON) falls below optimal_flow_rate_min.
This prevents opening valves when the boiler won't fire due to
insufficient flow. Back-calculation naturally corrects the integral
since used_duration=0 when the valve stays closed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cover edge cases: zones without nominal_flow_rate passing through
unconstrained, single-zone never-starve path, and min-flow enforcement
(demote, preserve STAY_ON, sufficient flow). Brings scheduler.py diff
coverage to 100%.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
codecov Bot commented Feb 22, 2026

Codecov Report

❌ Patch coverage is 78.90625% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.38%. Comparing base (9b36085) to head (467b320).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| custom_components/ufh_controller/config_flow.py | 21.87% | 23 Missing and 2 partials ⚠️ |
| custom_components/ufh_controller/core/scheduler.py | 95.23% | 0 Missing and 2 partials ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #107      +/-   ##
==========================================
- Coverage   96.61%   95.38%   -1.24%     
==========================================
  Files          20       21       +1     
  Lines        1715     1842     +127     
  Branches      257      297      +40     
==========================================
+ Hits         1657     1757     +100     
- Misses         36       59      +23     
- Partials       22       26       +4     

| Files with missing lines | Coverage Δ |
|---|---|
| custom_components/ufh_controller/const.py | 100.00% <100.00%> (ø) |
| custom_components/ufh_controller/coordinator.py | 94.76% <100.00%> (+0.08%) ⬆️ |
| ...ustom_components/ufh_controller/core/controller.py | 100.00% <100.00%> (ø) |
| custom_components/ufh_controller/core/pid.py | 100.00% <100.00%> (ø) |
| custom_components/ufh_controller/core/zone.py | 100.00% <100.00%> (ø) |
| custom_components/ufh_controller/core/scheduler.py | 95.23% <95.23%> (ø) |
| custom_components/ufh_controller/config_flow.py | 84.35% <21.87%> (-13.62%) ⬇️ |
