
Symmetry and Data Augmentation #626

Open
Scienfitz wants to merge 22 commits into dev/symmetry from feature/invariance_augmentation

Conversation

@Scienfitz
Collaborator

@Scienfitz Scienfitz commented Aug 21, 2025

Implements #621

New: A Symmetry class that is part of the BayBE surrogates. Three distinct symmetries are included. For more information, check the user guide; for a demonstration of the effect, see the new example. Data augmentation is supported for all symmetries.

I have left some initial comments on design questions that are still open or where I am kind of indifferent and just had to choose one. Feel free to leave an opinion there first so the large-scale design picture can be finalized independent of small comments.

TODO

  • CHANGELOG (after architecture is logged in)
  • Remember to compress finalized svg picture

Other Notes
Symmetries and constraints are conceptually so similar that they should probably have the same interface. The design here has been done from scratch completely ignoring the constraint interface because it is already known to be not optimal and needs refactoring.

  • There is no shared overarching `parameters` attribute or similar, because some symmetries take a single parameter and others take several. Instead, the parameters are treated the way objectives treat their target(s).
  • Contrary to the dependency constraint, the dependency symmetry can only hold one set of dependencies. The constraint should be refactored to look the same.

Unrelated Bugfix
I noticed that the permutation constraint also removed the diagonal in its filtering process. However, this seems unreasonable, since the diagonal is a set of points that are unique and have no invariant equivalent, hence nothing needs removing. It turns out the diagonal was removed automatically because internally DiscretePermutationInvarianceConstraint also always applied a DiscreteNoLabelDuplicates constraint. I think the rationale was that label duplicates don't make sense in these mixture situations, so they need removing. However, this has nothing to do with the invariance and is achieved anyway in mixture use cases by explicitly adding a no-label-duplicates constraint. So it was removed from the DiscretePermutationInvarianceConstraint, which now leads to the expected number of removed points (one of the matrix triangles).
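The counting argument can be checked with a small standalone sketch. It does not use the BayBE constraint classes; the three labels and the "keep sorted pairs" rule are illustrative assumptions standing in for a two-parameter permutation-invariant search space.

```python
from itertools import product

# Hypothetical labels for two interchangeable parameters.
values = ["A", "B", "C"]

# Full Cartesian grid over both parameters.
grid = list(product(values, repeat=2))

# Keep one representative per permutation-equivalence class (sorted labels).
# Diagonal points (x == y) have no distinct permuted partner, so they stay.
invariance_only = [p for p in grid if p[0] <= p[1]]

# Previous behaviour as described: additionally drop label duplicates.
with_no_duplicates = [p for p in invariance_only if p[0] != p[1]]

removed_by_invariance = len(grid) - len(invariance_only)
print(len(grid), len(invariance_only), len(with_no_duplicates))  # 9 6 3
```

With three labels, invariance filtering alone removes exactly one off-diagonal triangle (3 of 9 points) and keeps the diagonal; dropping label duplicates as well removes three more.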

@Scienfitz Scienfitz self-assigned this Aug 21, 2025
@Scienfitz Scienfitz added the new feature New functionality label Aug 21, 2025
@Scienfitz Scienfitz linked an issue Aug 21, 2025 that may be closed by this pull request
@AVHopp
Collaborator

AVHopp commented Aug 25, 2025

@Scienfitz just to make sure - I guess since this is marked as a draft, you do not require a PR review for now, right? Is there anything else that we can assist with?

@Scienfitz
Collaborator Author

@AVHopp yes, exactly, and it will always be like that for PRs that I open in draft: ignore until a review is requested or I ask in any other way.

@Scienfitz Scienfitz force-pushed the feature/invariance_augmentation branch from 00cfef8 to 5ba4cb1 Compare September 10, 2025 08:10
@Scienfitz Scienfitz force-pushed the feature/invariance_augmentation branch 4 times, most recently from db98f64 to aedafa7 Compare September 25, 2025 10:41
@Scienfitz Scienfitz marked this pull request as ready for review September 25, 2025 11:01
Copilot AI review requested due to automatic review settings September 25, 2025 11:01
Contributor

Copilot AI left a comment

Pull Request Overview

This PR implements automatic data augmentation for measurements when constraints support symmetry assumptions, particularly for permutation and dependency invariance constraints. This enhancement helps surrogate models better learn from symmetric relationships in the data without requiring users to manually generate augmented points.

  • Adds consider_data_augmentation flags to both surrogate models and relevant constraints to control augmentation behavior
  • Integrates augmentation logic into the Bayesian recommender workflow, applying it before model fitting when configured
  • Provides comprehensive examples and documentation showing the performance benefits of augmentation
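To make the idea of augmenting measurements under a permutation symmetry concrete, here is a minimal sketch in plain Python. The function name `augment_permutation`, the column names, and the row-as-dict layout are assumptions for illustration only; this is not the `augment_measurements` method mentioned in the file summary.

```python
from itertools import permutations


def augment_permutation(rows, columns):
    """Return the rows plus all distinct copies with values permuted across columns."""
    augmented = []
    seen = set()
    for row in rows:
        values = [row[c] for c in columns]
        for perm in permutations(values):
            # Copy the row and overwrite the symmetric columns with the permutation.
            new_row = {**row, **dict(zip(columns, perm))}
            key = tuple(sorted(new_row.items()))
            if key not in seen:  # skip duplicates, e.g. points on the diagonal
                seen.add(key)
                augmented.append(new_row)
    return augmented


# Hypothetical measurements with two interchangeable solvent slots.
measurements = [
    {"solvent_1": "water", "solvent_2": "ethanol", "yield": 0.7},
    {"solvent_1": "water", "solvent_2": "water", "yield": 0.2},
]
augmented = augment_permutation(measurements, ["solvent_1", "solvent_2"])
print(len(augmented))  # 3
```

The first measurement gains a swapped copy with the same target value, while the diagonal point produces no extra row, so a surrogate fitted on `augmented` sees both orderings of each asymmetric point.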

Reviewed Changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.

| File | Description |
|:-----|:------------|
| tests/test_measurement_augmentation.py | New test file verifying augmentation is applied when configured |
| examples/Constraints_Discrete/augmentation.py | New example demonstrating augmentation effects on optimization performance |
| docs/userguide/surrogates.md | Documentation updates explaining data augmentation feature |
| docs/userguide/constraints.md | Documentation updates for augmentation flags in constraints |
| docs/scripts/build_examples.py | Build script improvement to ignore `__pycache__` folders |
| baybe/utils/dataframe.py | Added documentation note about constraint considerations |
| baybe/utils/augmentation.py | Cleaned up duplicate example in docstring |
| baybe/surrogates/gaussian_process/core.py | Added consider_data_augmentation flag with temporary default |
| baybe/surrogates/base.py | Added base consider_data_augmentation flag to surrogate interface |
| baybe/searchspace/core.py | Core augmentation logic and augment_measurements method |
| baybe/recommenders/pure/bayesian/base.py | Integration of augmentation into Bayesian recommender workflow |
| baybe/recommenders/pure/base.py | Minor cleanup of validation logic |
| baybe/constraints/discrete.py | Added consider_data_augmentation flags to constraint classes |
| baybe/constraints/base.py | Moved augmentation flag to base constraint class |
| CHANGELOG.md | Documented new features and changes |


Comment thread baybe/surrogates/gaussian_process/core.py Outdated
Comment thread tests/test_measurement_augmentation.py Outdated
Comment thread docs/scripts/build_examples.py
@Scienfitz Scienfitz force-pushed the feature/invariance_augmentation branch from 583b7be to 98c29e8 Compare September 25, 2025 11:10
@Scienfitz

This comment was marked as outdated.

@Scienfitz Scienfitz marked this pull request as draft September 30, 2025 11:26
@Scienfitz Scienfitz changed the title Add Auto-Augmentation of Measurements in the Presence of Invariance Constraints Symmetry and Data Augmentation Oct 9, 2025
@Scienfitz Scienfitz force-pushed the feature/invariance_augmentation branch from 98c29e8 to 0e05880 Compare October 24, 2025 18:12
@Scienfitz Scienfitz force-pushed the feature/invariance_augmentation branch from 46bc49c to 859ca3b Compare October 31, 2025 18:23
Comment thread baybe/symmetries.py Outdated
Comment thread baybe/symmetries.py Outdated
# Validate compatibility of surrogate symmetries with searchspace
if hasattr(self._surrogate_model, "symmetries"):
    for s in self._surrogate_model.symmetries:
        s.validate_searchspace_context(searchspace)
Collaborator Author

Important: So far, validation is only part of the recommend call here in the recommenders. Validation has not been included in the Campaign yet. This is due to two factors:

  • To properly validate the compatibility of symmetries and searchspace, there needs to be a mechanism that can iterate over all possible recommenders of a metarecommender. Otherwise, this upfront validation already fails for the two-phase recommender if the second recommender has symmetries.
  • There would be double validation with the campaign and the recommend call, so the context info of whether validation was already performed needs to be passed somewhere. Likely fixable with the settings mechanism, which is not yet available.

Collaborator Author

@AdrianSosic I see now that the second point could be solved with the Settings mechanism, but I have no idea how to solve issue 1.

In the absence of that, it's not really possible to turn it into an upfront validation, so I would probably not change the validation for the moment unless you have a smarter idea.

Collaborator

+1 for being pragmatic and not trying to come up with something potentially convoluted right now. Even if we find a better way for the validation later, including it is just a plain improvement without negative consequences to users, so we can add it later without problems.

@Scienfitz Scienfitz marked this pull request as ready for review November 3, 2025 17:31
@Scienfitz Scienfitz requested a review from Copilot November 4, 2025 08:59
Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@Scienfitz Scienfitz requested a review from Copilot November 4, 2025 11:55

This comment was marked as resolved.

This comment was marked as outdated.

@Scienfitz Scienfitz changed the base branch from main to dev/symmetry April 2, 2026 00:41
@Scienfitz Scienfitz added the dev label Apr 2, 2026
Scienfitz and others added 22 commits April 10, 2026 20:55
The _autoreplicate converter on main wraps surrogates in a
CompositeSurrogate. Access the inner template for symmetry
validation and augmentation.
Use full module paths (e.g., baybe.symmetries.base.Symmetry)
instead of short paths via __init__.py re-exports, which
Sphinx cannot resolve.
`DiscretePermutationInvarianceConstraint` was always internally applying a DiscreteNoLabelDuplicates constraint to remove the diagonal elements, which is not correct and can always be achieved separately by explicitly using `DiscreteNoLabelDuplicates`
Co-authored-by: Alexander V. Hopp <alexander.hopp@merckgroup.com>
@Scienfitz Scienfitz force-pushed the feature/invariance_augmentation branch from ed0547b to be1c293 Compare April 10, 2026 19:12
@CLAassistant

This comment was marked as outdated.

@Scienfitz Scienfitz changed the base branch from dev/symmetry to main April 13, 2026 10:22
@Scienfitz Scienfitz changed the base branch from main to dev/symmetry April 13, 2026 10:22
Collaborator

@AVHopp AVHopp left a comment

All in all happy, only some comments but I do not see any major thing that needs change. Thank you for this cool new feature!

| Symmetry | Functional Definition | Corresponding Constraint |
|:-----------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------|
| {class}`~baybe.symmetries.permutation.PermutationSymmetry` | $f(x,y) = f(y,x)$ | {class}`~baybe.constraints.discrete.DiscretePermutationInvarianceConstraint` |
| {class}`~baybe.symmetries.dependency.DependencySymmetry` | $f(x,y) = \begin{cases}f(x,y) & \text{if }c(x) \\f(x) & \text{otherwise}\end{cases}$<br>where $c(x)$ is a condition that is either true or false | {class}`~baybe.constraints.discrete.DiscreteDependenciesConstraint` |
Collaborator

Can we maybe add some horizontal lines here between the different types of symmetries? I find it a bit hard to read (but that might be personal preference/opinion).

Image

# # Optimizing a Permutation-Invariant Function

# In this example, we explore BayBE's capabilities for handling optimization problems
# with symmetry via automatic data augmentation and / or constraint.
Collaborator

Any reason for the spaces around "/"? I'd prefer it without them.

| Symmetry | Functional Definition | Corresponding Constraint |
|:-----------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------|
| {class}`~baybe.symmetries.permutation.PermutationSymmetry` | $f(x,y) = f(y,x)$ | {class}`~baybe.constraints.discrete.DiscretePermutationInvarianceConstraint` |
| {class}`~baybe.symmetries.dependency.DependencySymmetry` | $f(x,y) = \begin{cases}f(x,y) & \text{if }c(x) \\f(x) & \text{otherwise}\end{cases}$<br>where $c(x)$ is a condition that is either true or false | {class}`~baybe.constraints.discrete.DiscreteDependenciesConstraint` |
Collaborator

Formally, one could write it down like that:
Given a function $f:X\times Y \to \mathbb{R}$ as well as a condition function $c:X\to\{\text{True}, \text{False}\}$, we say that $f$ has a dependency symmetry if and only if for all $x\in X$, it holds that $\neg c(x) \implies \forall y_1, y_2 \in Y : f(x, y_1) = f(x, y_2) =: f(x)$.

However, that is quite verbose, and I can see that this may be too much. Let me summon @AdrianSosic here; I would also be fine with using the current variant.
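For readers who prefer an executable statement of this definition, the following sketch checks it on a finite grid. The helper `has_dependency_symmetry` and the toy functions are hypothetical and only illustrate the formal condition; they are unrelated to the DependencySymmetry implementation.

```python
def has_dependency_symmetry(f, xs, ys, c):
    """Check that f(x, y) is constant in y for every x where c(x) is False."""
    for x in xs:
        if c(x):
            continue  # condition holds: y is allowed to matter
        if len({f(x, y) for y in ys}) > 1:
            return False  # y changes the value although c(x) is False
    return True


# Hypothetical example: x is a switch, y only matters when the switch is on.
def switched(x, y):
    return 2 * y if x == "on" else 0


# Counterexample: y always matters, regardless of the switch.
def always_dependent(x, y):
    return y


def condition(x):
    return x == "on"


xs = ["on", "off"]
ys = [1, 2, 3]
symmetric = has_dependency_symmetry(switched, xs, ys, condition)
not_symmetric = has_dependency_symmetry(always_dependent, xs, ys, condition)
print(symmetric, not_symmetric)  # True False
```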

| Symmetry | Functional Definition | Corresponding Constraint |
|:-----------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------|
| {class}`~baybe.symmetries.permutation.PermutationSymmetry` | $f(x,y) = f(y,x)$ | {class}`~baybe.constraints.discrete.DiscretePermutationInvarianceConstraint` |
| {class}`~baybe.symmetries.dependency.DependencySymmetry` | $f(x,y) = \begin{cases}f(x,y) & \text{if }c(x) \\f(x) & \text{otherwise}\end{cases}$<br>where $c(x)$ is a condition that is either true or false | {class}`~baybe.constraints.discrete.DiscreteDependenciesConstraint` |
Collaborator

Rendered variant for your convenience:

Image

return df_z


# Grid and dataframe for plotting
Collaborator

I think we do not need this comment as well as the "Plot the contour..." one in the next block


# Create the augmented row by mirroring the point at the mirror point.
# x_mirrored = mirror_point + (mirror_point - x) = 2*mirror_point - x
if row[column] != mirror_point:
Collaborator

Not 100% sure, but do we want an exact inequality here or an approximate one? Feel free to decide what you think makes more sense, I just want to raise awareness.

Collaborator Author

Well, that is related to the topic #779 of how floats should be compared in general.

Until that's decided, I'm not a fan of discussing this separately in every PR.
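The practical difference between the two comparison styles raised in this thread can be seen in a small sketch. The function `mirror_rows`, its `abs_tol` switch, and the sample rows are assumptions made for illustration; the sketch does not take a side on which comparison the PR should use.

```python
import math


def mirror_rows(rows, column, mirror_point=0.0, abs_tol=None):
    """Append a mirrored copy (x -> 2*mirror_point - x) of each off-mirror row."""
    augmented = [dict(r) for r in rows]
    for row in rows:
        x = row[column]
        if abs_tol is None:
            on_mirror = x == mirror_point  # exact comparison
        else:
            on_mirror = math.isclose(x, mirror_point, rel_tol=0.0, abs_tol=abs_tol)
        if not on_mirror:
            augmented.append({**row, column: 2 * mirror_point - x})
    return augmented


# The second x value is mathematically zero but not exactly 0.0 in floating point.
rows = [{"x": 1.0, "y": 5.0}, {"x": 0.1 + 0.2 - 0.3, "y": 7.0}]

exact = mirror_rows(rows, "x")
approx = mirror_rows(rows, "x", abs_tol=1e-9)
print(len(exact), len(approx))  # 4 3
```

With the exact comparison, the near-zero point is mirrored into an almost identical duplicate; with a tolerance, it is treated as lying on the mirror point and left alone.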

template = surrogates.template
if isinstance(template, Surrogate):
    return template
return None
Collaborator

If I remember our logic correctly, then we allow users to implement whatever surrogates they want as long as they implement the SurrogateProtocol. In particular, we do not enforce them to implement the Surrogate class or inherit from this. This means that this function returns None, hence no augmentation will be done. However, I do not think that the user is informed at any point that no augmentation has been done in that case. Also, I think the same happens for a CompositeSurrogate that does not have a _ReplicationMapping but actual different surrogates.

Three questions:

  1. Do I understand this logic here correctly?
  2. Do we already have some sort of safeguard/warning informing the user that no augmentation will be done in this case?
  3. If not, can/do we want to add one?

Collaborator Author

Hmm, in the case you describe there is also no moment when the user ever assigned any symmetries, because their custom surrogate doesn't have this attribute. So why would there be an expectation of applied symmetries that has to be warned about?
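The situation discussed in this thread can be reproduced with stand-in classes. Everything below (`SurrogateLike`, `BaseSurrogate`, `CustomSurrogate`, `get_template`) is hypothetical and only mirrors the shape of the logic under review, not the actual BayBE classes.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SurrogateLike(Protocol):
    """Hypothetical stand-in for a structural surrogate protocol."""

    def fit(self, data) -> None: ...


class BaseSurrogate:
    """Hypothetical stand-in for a concrete base class that carries symmetries."""

    def __init__(self, symmetries=()):
        self.symmetries = tuple(symmetries)

    def fit(self, data) -> None:
        pass


class CustomSurrogate:
    """Satisfies the protocol structurally but does not inherit from the base."""

    def fit(self, data) -> None:
        pass


def get_template(surrogate):
    """Return the surrogate only if it derives from the base class, else None."""
    if isinstance(surrogate, BaseSurrogate):
        return surrogate
    return None


custom = CustomSurrogate()
print(isinstance(custom, SurrogateLike))  # True
print(get_template(custom))               # None
print(hasattr(custom, "symmetries"))      # False
```

Both observations from the thread show up here: the custom surrogate passes the protocol check yet yields no template, and it also has no `symmetries` attribute that a user could have set in the first place.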


Labels

dev new feature New functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Data Augmentation for Invariant Contraints

6 participants