[ENH] Follow a registry framework for data generating methods #11

@ankurankan

Description

Currently, our benchmark script maintains a hard-coded mapping of data-generating functions to the CI tests that can consume them. Every time we add a new generator, we must also update this central mapping—adding boilerplate and introducing a risk of inconsistency.

Proposal:
Build a lightweight registry framework that lets each data-generating function declare, via a decorator, which CI tests it supports. At import time, the decorator will register the function into a global lookup table keyed by test name. The benchmark runner then discovers all generators for a given test by querying this registry.

# registry.py
from collections import defaultdict

# Global lookup: test name -> list of registered generator functions.
_GENERATORS_BY_TEST = defaultdict(list)

def data_generator(*test_names):
    def decorator(fn):
        fn.supported_tests = getattr(fn, "supported_tests", []) + list(test_names)
        for name in test_names:
            _GENERATORS_BY_TEST[name].append(fn)
        return fn
    return decorator

def get_generators_for(test_name):
    return list(_GENERATORS_BY_TEST[test_name])

Example data-generating method:

# linear_gaussian.py
from registry import data_generator
import numpy as np

@data_generator("pearsonr", "pillai", "gcm")
def linear_gaussian():
    ...
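For illustration, here is a minimal, self-contained sketch of how the benchmark runner could discover generators through the registry. The registry code from above is inlined so the snippet runs on its own, and `linear_gaussian` is stubbed out (the real version would return a dataset):

```python
from collections import defaultdict

# Inlined copy of registry.py so this snippet is self-contained.
_GENERATORS_BY_TEST = defaultdict(list)

def data_generator(*test_names):
    def decorator(fn):
        fn.supported_tests = getattr(fn, "supported_tests", []) + list(test_names)
        for name in test_names:
            _GENERATORS_BY_TEST[name].append(fn)
        return fn
    return decorator

def get_generators_for(test_name):
    return list(_GENERATORS_BY_TEST[test_name])

# Stub generator; registration happens at import/definition time.
@data_generator("pearsonr", "pillai", "gcm")
def linear_gaussian():
    return "dataset"

# The benchmark runner only needs the registry to find applicable generators.
for gen in get_generators_for("pearsonr"):
    print(gen.__name__, gen.supported_tests)
```

Because registration is a side effect of the decorator, the runner must import the generator modules (or a package that imports them) before querying the registry; otherwise `get_generators_for` returns an empty list for every test.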

Benefits:

  1. Modularity: Each generator self-documents which tests it supports.
  2. Extensibility: Adding a new generator requires only writing a decorated function; no changes to the CI scripts are needed.
  3. Maintainability: Eliminates a growing, error-prone central mapping.
