
feat: Target Conditional generation framework#39

Open
Ramlaoui wants to merge 10 commits into main from feat/conditional-generation

Conversation

@Ramlaoui (Collaborator)

Builds on the work of @andaero and @smglsn12, and expands the set of conditional metrics that can be written. This introduces a framework for both:

  • Discrete target generation, where the goal is to hit exactly one value, checked by equality
  • Continuous target generation, with the option to select the top-k structures closest to the target from the list, and to either turn off the tolerance threshold (in which case the distance itself is used directly as the metric) or use it to compute a success rate. It also supports various distance metrics, which can be extended
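The two modes described above can be sketched roughly as follows. This is an illustrative outline only, not the PR's actual classes; the function names, the L1 distance, and the `tolerance`/`top_k` parameters are assumptions based on the description.

```python
import numpy as np

def discrete_success(values, target):
    """Discrete mode: success rate of exact equality against one target value."""
    vals = np.asarray(values, dtype=float)
    return (vals == target).mean()

def continuous_metric(values, target, tolerance=None, top_k=None):
    """Continuous mode: distance to target, optionally restricted to the
    top-k closest structures, and optionally thresholded into a success rate."""
    # Absolute difference as a stand-in distance; other metrics are pluggable.
    dist = np.abs(np.asarray(values, dtype=float) - target)
    if top_k is not None:
        dist = np.sort(dist)[:top_k]      # keep only the k closest structures
    if tolerance is None:
        return dist.mean()                # tolerance off: distance is the metric
    return (dist <= tolerance).mean()     # tolerance on: success rate
```

For example, with `tolerance` unset the mean distance is reported; setting `tolerance` converts the same distances into a fraction of structures within the threshold.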

@Ramlaoui Ramlaoui changed the title feat: Conditional generation framework feat: Target Conditional generation framework Jun 30, 2025

def aggregate_results(self, values: list[float | None]) -> Dict[str, Any]:
"""Aggregate results into final metric values."""
valid_values = [v for v in values if v is not None]
Contributor

Although not critical, I would consider using arrays rather than lists for such operations.

values = np.array(values, dtype=float)
valid_values = values[~np.isnan(values)]
...
success_values = valid_values == self.config.target_value
success_rate = success_values.mean()
... # etc.
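A self-contained version of the array-based aggregation suggested above might look like the sketch below. The standalone function form and the `target_value` argument are stand-ins for the PR's method and config; only the NumPy masking pattern is the point.

```python
import numpy as np

def aggregate_results(values, target_value):
    """Aggregate per-structure values into summary metrics, skipping
    missing entries (None or NaN) via a NumPy mask instead of a list comp."""
    arr = np.array([np.nan if v is None else v for v in values], dtype=float)
    valid_values = arr[~np.isnan(arr)]            # drop missing entries
    success = valid_values == target_value        # boolean mask of exact matches
    return {
        "n_valid": int(valid_values.size),
        "success_rate": float(success.mean()) if valid_values.size else 0.0,
    }
```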


def aggregate_results(self, values: list[float]) -> Dict[str, Any]:
"""Aggregate results into final metric values."""
valid_values = [v for v in values if not np.isnan(v)]
Contributor @vict0rsch, Aug 1, 2025

Same, working with arrays would be faster.

Even such a simple manipulation is 10x faster with arrays:

In [1]: import numpy as np

In [2]: data = np.random.randint(0, 100, (10000,)).astype(float)

In [3]: data[data<5] = float("nan")

In [4]: values = data.tolist()

In [5]: %timeit [v for v in values if not np.isnan(v)]
3.73 ms ± 67.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [6]: %timeit (w:= np.array(values))[~np.isnan(w)]
267 µs ± 1.33 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
