Reworking transforms for flexibility and fewer allocations #12

Merged
rscarson merged 1 commit into rscarson:master from scottmcm:lots-of-transform-stuff
Apr 2, 2026
@scottmcm
Contributor

Opening this more as a discussion than anything. I started out trying to do something simple, ended up way down a rabbit hole, and the result is probably bigger than makes sense to land as one thing. I figured I'd open it with everything anyway, so we can chat about whether you even want all these changes and how to split them into smaller steps.


I started just trying to make a ScaleTransform::inverse method, because after scaling the inputs to something I often wanted to un-scale the outputs, and I'm really good at getting that inverse wrong.

That required two changes:

  • Adding ∛ and √ transforms, which seem reasonable enough. (I also stuck non_exhaustive on the enum so more can be added later without another semver break.)
  • Figuring out what to do about ScaleTransform::Exponential, because its multiplicative factor can't be undone by Logarithmic's factor. (And, correspondingly, that ScaleTransform::Logarithmic's factor is kinda weird since it could always be collapsed into the base.)
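To make the second bullet concrete, here is a minimal sketch. It assumes the old two-field variants meant `factor * base^x` and `factor * log_base(x)`; the PR text doesn't spell that out, so treat both the semantics and the free-function names as hypothetical.

```rust
// Assumed meaning of the old two-field variants (not confirmed by the PR text).
fn exponential(base: f64, factor: f64, x: f64) -> f64 {
    factor * base.powf(x)
}

fn logarithmic(base: f64, factor: f64, x: f64) -> f64 {
    factor * x.log(base)
}

fn main() {
    let (base, factor, x) = (2.0_f64, 8.0, 3.0);
    let y = exponential(base, factor, x);

    // Undoing the factor needs a division *before* the logarithm,
    // i.e. log_b(y) - log_b(factor): an offset, not a multiplier.
    let recovered = (y / factor).log(base);
    assert!((recovered - x).abs() < 1e-9);

    // A factor on the logarithm is just a change of base:
    // f * log_b(x) == log_{b^(1/f)}(x).
    let collapsed_base = base.powf(1.0 / factor);
    let lhs = logarithmic(base, factor, 5.0);
    let rhs = 5.0_f64.log(collapsed_base);
    assert!((lhs - rhs).abs() < 1e-9);
}
```

Under that assumption, the exact inverse of a scaled exponential is a logarithm plus a shift, which is why a second multiplicative field on `Logarithmic` can't express it.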

So, the first possibly-controversial change: I removed the factors from Exponential and Logarithmic. TBH, I find that kinda reasonable anyway, because I always thought that (T, T) was quite unclear about exactly what it was doing, whereas with just one T it's obvious. (An alternative here would be to move to something like Exponential { base: T, factor: T, phase: T } for x' = factor * pow(base, x - phase), but that felt kinda like overkill.) I also like that it makes the size of the variants more consistent.

To make sure the old behaviour was still expressible, I had the idea that ended up being the cause of all my problems: let people put transforms in an array to compose them, so that we don't need all the pre-composed versions. That way the migration path from ScaleTransform::Exponential(b, f) is to [ScaleTransform::Exponential(b), ScaleTransform::Linear(f)]. It also opens the door to affine transformations by combining Linear+Shift, etc. (See the example on inverse_array for a worked version of this.)
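A rough sketch of the composition idea, for readers without the diff open. The `Scale` enum, `apply_all`, and `invert_chain` here are hypothetical stand-ins written for illustration; they are not the crate's `ScaleTransform` or its `inverse_array`, and the per-variant formulas are assumptions.

```rust
/// Hypothetical stand-in for the crate's scale transforms.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Scale {
    Linear(f64),      // x * factor
    Shift(f64),       // x + offset
    Exponential(f64), // base^x
    Logarithmic(f64), // log_base(x)
}

impl Scale {
    fn apply(self, x: f64) -> f64 {
        match self {
            Scale::Linear(f) => x * f,
            Scale::Shift(s) => x + s,
            Scale::Exponential(b) => b.powf(x),
            Scale::Logarithmic(b) => x.log(b),
        }
    }

    fn inverse(self) -> Scale {
        match self {
            Scale::Linear(f) => Scale::Linear(1.0 / f),
            Scale::Shift(s) => Scale::Shift(-s),
            Scale::Exponential(b) => Scale::Logarithmic(b),
            Scale::Logarithmic(b) => Scale::Exponential(b),
        }
    }
}

/// Apply each step in order, feeding the result into the next one.
fn apply_all<const N: usize>(steps: &[Scale; N], x: f64) -> f64 {
    steps.iter().fold(x, |acc, s| s.apply(acc))
}

/// Invert a chain: invert every step, then reverse the order.
fn invert_chain<const N: usize>(steps: [Scale; N]) -> [Scale; N] {
    let mut inv = steps.map(Scale::inverse);
    inv.reverse();
    inv
}

fn main() {
    // The old `Exponential(b, f)` written as two composed steps.
    let forward = [Scale::Exponential(2.0), Scale::Linear(8.0)];
    let backward = invert_chain(forward);

    let y = apply_all(&forward, 3.0);
    let x = apply_all(&backward, y);
    assert!((x - 3.0).abs() < 1e-9);
}
```

The point of the array form is that each single-parameter step has an obvious inverse, so the inverse of the whole chain falls out mechanically.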

At first that went great, but then I realized that trying to use Transform::apply_to doesn't work at all for all the impure transformations. (I'd tried implementing apply by calling each transform's apply_to on each element, which is great for ScaleTransform but totally nonsensical for the smoothing ones and such.)

Thus the giant scary change here: letting Transform::apply iterate multiple times.

That definitely comes at a cost, because the signature needs to change from the easy impl Iterator<Item = &mut T> to the more complicated for<'a> &'a mut I: IntoIterator<Item = &'a mut T>. The compiler definitely isn't as comfortable dealing with ∀'a bounds as it is for more "normal" things. If it was just for this array case, that wouldn't be worth it at all.

But one cool consequence of it is that a bunch of transforms no longer allocate! For example, MeanSubtraction can now iterate once to get the mean, then iterate again to subtract it. Strength::into_stddev no longer needs a slice, so its callers -- such as NoiseTransform::CorrelatedGaussian -- don't have to allocate. No more collecting into a Vec<&mut T>. And it's all still safe code, too.
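For anyone who hasn't used that kind of bound before, a minimal sketch of the multi-pass pattern. This is not the crate's `Transform::apply` (whose exact signature isn't shown here); `subtract_mean` is a hypothetical free function, fixed to `f64`, that only borrows the higher-ranked bound from the description above.

```rust
/// Hypothetical two-pass mean subtraction. The higher-ranked bound says
/// "any mutable borrow of `I` can be iterated", so the data can be
/// walked more than once without collecting it first.
fn subtract_mean<I: ?Sized>(data: &mut I)
where
    for<'a> &'a mut I: IntoIterator<Item = &'a mut f64>,
{
    // Pass 1: accumulate the sum and the element count.
    let mut sum = 0.0;
    let mut count = 0usize;
    for x in &mut *data {
        sum += *x;
        count += 1;
    }
    if count == 0 {
        return;
    }
    let mean = sum / count as f64;

    // Pass 2: reborrow and subtract the mean in place.
    for x in &mut *data {
        *x -= mean;
    }
}

fn main() {
    let mut v = vec![1.0, 2.0, 6.0];
    subtract_mean(&mut v); // I = Vec<f64>
    assert_eq!(v, vec![-2.0, -1.0, 3.0]);

    let mut a = [10.0, 20.0, 1.0, 3.0];
    subtract_mean(&mut a[2..]); // I = [f64], only the tail is touched
    assert_eq!(a, [10.0, 20.0, -1.0, 1.0]);
}
```

With a plain `impl Iterator<Item = &mut T>` the first loop would consume the iterator, which is why the single-pass signature pushed implementations towards collecting into a `Vec<&mut T>`.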

So that seems overall at least plausible, since so long as one normally goes through Transformable anyway you don't even see the difference; things just work better under the hood. And thus this PR 🙂


A few other notes of things I did along the way:

  • Added XTransform and YTransform wrappers to adapt transforms from working on a scalar to working on a component of a pair.
  • Changed Transformable to take an impl Transform instead of &impl Transform, and added a blanket impl<R: Transform> Transform for &R so that passing a reference to an impl Transform still works.
  • Switched ScaleTransform::Logarithmic to use T::min_positive_value() instead of T::epsilon() because there are a ton of floats smaller than epsilon. (For f64, ε is about 2e-16 and min_positive is about 2e-308.)
  • Removed the Self: Sized bound on Transformable::transformed so you can do things like array[i..j].transformed(…) (applying it to a slice) and it uses ToOwned to get you a Vec result.
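The by-value plus blanket-impl combination from the second bullet, and the slice-to-`Vec` behaviour from the last one, can be sketched like this. The trait here is a deliberately simplified, hypothetical one (slice-based rather than iterator-based), and `Shift` and `transformed` are illustrative names, not the crate's items.

```rust
/// Hypothetical, simplified trait; the crate's real `Transform` differs.
trait Transform {
    fn apply(&self, data: &mut [f64]);
}

struct Shift(f64);

impl Transform for Shift {
    fn apply(&self, data: &mut [f64]) {
        for x in data.iter_mut() {
            *x += self.0;
        }
    }
}

// Blanket impl: a reference to a transform is itself a transform,
// so APIs taking `impl Transform` by value still accept `&t`.
impl<R: Transform> Transform for &R {
    fn apply(&self, data: &mut [f64]) {
        (**self).apply(data);
    }
}

/// Takes the transform by value and works on any slice,
/// returning an owned `Vec` via `to_owned`.
fn transformed(data: &[f64], t: impl Transform) -> Vec<f64> {
    let mut out = data.to_owned();
    t.apply(&mut out);
    out
}

fn main() {
    let shift = Shift(1.0);
    let a = transformed(&[1.0, 2.0], &shift); // borrow: `shift` stays usable
    let b = transformed(&a[1..], shift); // move: a sub-slice in, a Vec out
    assert_eq!(a, vec![2.0, 3.0]);
    assert_eq!(b, vec![4.0]);
}
```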

@scottmcm scottmcm force-pushed the lots-of-transform-stuff branch from 7bb1d2b to 48a2660 Compare March 28, 2026 21:18
@rscarson
Owner

rscarson commented Apr 2, 2026

I actually really like all these changes

This seems like a massively more ergonomic structure for the transforms library, and I really, really appreciate the work that you put into it. This is fantastic.

@scottmcm
Contributor Author

scottmcm commented Apr 2, 2026

Oh, cool! Thanks! I'll get this rebased properly then.

Are you fine with it all in one change, or would you prefer I split out any of the parts into separate things?

@scottmcm scottmcm force-pushed the lots-of-transform-stuff branch from 48a2660 to 3560f89 Compare April 2, 2026 18:03
@scottmcm scottmcm force-pushed the lots-of-transform-stuff branch from 3560f89 to ef101dd Compare April 2, 2026 18:10
@rscarson
Owner

rscarson commented Apr 2, 2026

Looks good to me! Thanks again - this is a huge improvement for sure

@rscarson rscarson merged commit 17cb73b into rscarson:master Apr 2, 2026
4 checks passed
@scottmcm
Contributor Author

scottmcm commented Apr 2, 2026

While I'm touching the transforms, there was one other thing I was considering but couldn't decide whether it was a good idea: should transforms be allowed to have an output as well?

I don't think any of the ScaleTransforms need that, so they'd still have type Output = ();. But it would be handy if running NormalizationTransform::ZScore could return the affine transformation it applied (or the inverse thereof, or the average+stdev, or whatever), since it calculated it anyway and that would be convenient to un-normalize the output later. But an associated type would be awkward for that because it would need to be specified for the whole NormalizationTransform type, but the different variants would want different things. So I couldn't come up with a design that made me happy with it, so didn't do anything.

Maybe trying to do that in Transform/Transformable is the wrong approach, though, and it'd be better to just have some helpers to go from the result of polyfit::statistics::stddev_and_mean to a transform instead, or something? That could be a separate PR, though -- probably doesn't fit in this one.
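A rough sketch of what such a helper might look like, purely to illustrate the shape of the idea. `Affine` and `zscore_from_stats` are hypothetical names, and the mean and standard deviation are passed in as plain numbers rather than through the real `stddev_and_mean`, whose signature isn't shown here.

```rust
/// Hypothetical affine map: x -> x * scale + offset.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Affine {
    scale: f64,
    offset: f64,
}

impl Affine {
    fn apply(self, x: f64) -> f64 {
        x * self.scale + self.offset
    }

    /// Solve y = x * scale + offset for x.
    fn inverse(self) -> Affine {
        Affine {
            scale: 1.0 / self.scale,
            offset: -self.offset / self.scale,
        }
    }
}

/// Build the z-score map (x - mean) / stddev from precomputed statistics.
fn zscore_from_stats(mean: f64, stddev: f64) -> Affine {
    Affine {
        scale: 1.0 / stddev,
        offset: -mean / stddev,
    }
}

fn main() {
    // Statistics assumed to have been computed elsewhere.
    let normalize = zscore_from_stats(4.0, 2.0);
    let denormalize = normalize.inverse();

    let z = normalize.apply(6.0);
    assert!((z - 1.0).abs() < 1e-12);
    assert!((denormalize.apply(z) - 6.0).abs() < 1e-12);
}
```

Keeping this outside the transform trait sidesteps the associated-type problem: the caller computes the statistics once and holds on to both the forward and inverse maps.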

EDIT: Oops, you're too quick -- didn't notice this merge while I was typing 🙃

@scottmcm scottmcm deleted the lots-of-transform-stuff branch April 2, 2026 18:22
@rscarson
Owner

rscarson commented Apr 2, 2026

I've actually been playing around with overhauling the statistics module. I was never really happy with the giant-pile-of-functions structure I ended up with after it grew out of some internal functionality

I'm also not satisfied with the trait bounds they use because it's a little restrictive for my liking

I've been experimenting with a large trait offering the different statistics as methods but the trait would be unbelievably large so I don't think that's the best option either

Another issue with statistics at the moment is reusability: I have a lot of functions that return more than one statistic as a tuple so that I don't have to recalculate them, but that's pretty clumsy

But yeah I'm probably going to fundamentally alter the structure of statistics at some point and I think that's probably a better place to put it, depending on the structure that ends up there
