Add statistics nodes (median, variance, std dev, …) #155

@xujustinj

Motivation

Beyond basic arithmetic (#153), workflows over numeric sequences commonly need summary statistics. We already have Sum and (proposed) Mean/Min/Max; this adds the rest. Python's stdlib statistics module covers most of these directly.

Proposed nodes

All take a single SequenceValue[FloatValue] input and output a FloatValue unless noted.

  • Median — middle value; for even-length sequences, define interpolation (recommend statistics.median, i.e. mean of the two middle values).
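  • Mode — most common value. Define tie behavior: statistics.mode raised StatisticsError when there was no unique mode before Python 3.8 and returns the first mode encountered since then; statistics.multimode returns all modes as a list. Pick one and document.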
  • Variance — with a population (bool) param: sample (statistics.variance, n-1) vs population (pvariance, n). Default to one and document.
  • StandardDeviation — same population param (stdev / pstdev).
  • Range — max - min → FloatValue.
  • Percentile / Quantile — a q param; pick one scale (0–100 or 0–1) and define the interpolation method (statistics.quantiles with method="exclusive" or "inclusive", or a numpy-style method).
  • Count / Length — number of elements → IntegerValue. (Overlaps with a generic sequence Length; see the sequence-utilities issue — decide whether stats reuses that or has its own.)
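
For reference, a minimal sketch of the stdlib calls the list above points to, run on a made-up sample. It only illustrates the behaviors that need a decision (even-length median, ties in mode, n-1 vs n, quantile method); it is not the proposed node code, and the sample data is invented for illustration.

```python
import statistics

# Made-up sample: 8 values with mean 5.0.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Even-length median: mean of the two middle values (4.0 and 5.0).
median = statistics.median(data)

# multimode returns every most-common value, in order of first occurrence.
tied = [1.0, 1.0, 2.0, 2.0, 3.0]
all_modes = statistics.multimode(tied)

# Sample variance divides by n - 1, population variance by n.
sample_var = statistics.variance(data)
pop_var = statistics.pvariance(data)
pop_std = statistics.pstdev(data)

# Range is not in the statistics module; it is just max - min.
value_range = max(data) - min(data)

# Quartile cut points under the two stdlib interpolation methods.
# The results can differ, which is why the method should be a documented choice.
q_exclusive = statistics.quantiles(data, n=4, method="exclusive")
q_inclusive = statistics.quantiles(data, n=4, method="inclusive")
```

On this sample the two quantile methods agree on the middle cut point but not on the upper quartile, so whichever default is chosen will be visible to users.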

Conventions / decisions

  • Empty sequence → raise NodeException (consistent with the decision in Add more built-in arithmetic nodes #153 for Min/Max/Mean).
  • Default FloatValue in/out; Integer→Float casts already handle integer sources.
  • Required title/description; auto-derived type; version 1.0.0.
  • Sample-vs-population and percentile-method defaults should be explicit params, not hidden.
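
A rough sketch of how these conventions could look in code, assuming plain Python floats in place of FloatValue. The NodeException class here is a hypothetical stand-in for the project's own exception, and the function names, the sample default for population, and the integer 1–99 scale for q are illustrative assumptions, not decisions.

```python
import statistics
from typing import Sequence


class NodeException(Exception):
    """Hypothetical stand-in for the project's NodeException."""


def _require(values: Sequence[float], minimum: int, node: str) -> None:
    # Empty (or too-short) input raises instead of returning NaN or 0.
    if len(values) < minimum:
        raise NodeException(f"{node} needs at least {minimum} value(s), got {len(values)}")


def variance(values: Sequence[float], population: bool = False) -> float:
    # The sample-vs-population choice is an explicit parameter, not hidden.
    if population:
        _require(values, 1, "Variance")
        return statistics.pvariance(values)
    _require(values, 2, "Variance")  # the n - 1 divisor needs two points
    return statistics.variance(values)


def standard_deviation(values: Sequence[float], population: bool = False) -> float:
    if population:
        _require(values, 1, "StandardDeviation")
        return statistics.pstdev(values)
    _require(values, 2, "StandardDeviation")
    return statistics.stdev(values)


def percentile(values: Sequence[float], q: int, method: str = "inclusive") -> float:
    # Assumes an integer q in 1..99; the interpolation method is explicit.
    _require(values, 2, "Percentile")
    if not 1 <= q <= 99:
        raise NodeException(f"q must be in 1..99, got {q}")
    return statistics.quantiles(values, n=100, method=method)[q - 1]
```

For example, variance([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0], population=True) gives 4.0, while variance([]) raises NodeException under either setting.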
