[ENH] Follow a registry framework for data generating methods #11

@ankurankan

Description

Currently, our benchmark script maintains a hard-coded mapping of data-generating functions to the CI tests that can consume them. Every time we add a new generator, we must also update this central mapping—adding boilerplate and introducing a risk of inconsistency.

Proposal:
Build a lightweight registry framework that lets each data-generating function declare, via a decorator, which CI tests it supports. At import time, the decorator will register the function into a global lookup table keyed by test name. The benchmark runner then discovers all generators for a given test by querying this registry.

# registry.py
from collections import defaultdict

# Global lookup: test name -> list of registered generator functions.
_GENERATORS_BY_TEST = defaultdict(list)

def data_generator(*test_names):
    def decorator(fn):
        fn.supported_tests = getattr(fn, "supported_tests", []) + list(test_names)
        for name in test_names:
            _GENERATORS_BY_TEST[name].append(fn)
        return fn
    return decorator

def get_generators_for(test_name):
    return list(_GENERATORS_BY_TEST[test_name])

Example data-generating method:

# linear_gaussian.py
from registry import data_generator
import numpy as np

@data_generator("pearsonr", "pillai", "gcm")
def linear_gaussian():
    ...
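For illustration, here is a minimal, self-contained sketch of how the benchmark runner could discover generators through the registry. The registry code from above is inlined so the snippet runs on its own, and `linear_gaussian` is stubbed out (the real version would return a dataset):

```python
from collections import defaultdict

# Inlined copy of registry.py so this snippet is self-contained.
_GENERATORS_BY_TEST = defaultdict(list)

def data_generator(*test_names):
    def decorator(fn):
        fn.supported_tests = getattr(fn, "supported_tests", []) + list(test_names)
        for name in test_names:
            _GENERATORS_BY_TEST[name].append(fn)
        return fn
    return decorator

def get_generators_for(test_name):
    return list(_GENERATORS_BY_TEST[test_name])

# Stub generator; registration happens at import/definition time.
@data_generator("pearsonr", "pillai", "gcm")
def linear_gaussian():
    return "dataset"

# The benchmark runner only needs the registry to find applicable generators.
for gen in get_generators_for("pearsonr"):
    print(gen.__name__, gen.supported_tests)
```

Because registration is a side effect of the decorator, the runner must import the generator modules (or a package that imports them) before querying the registry; otherwise `get_generators_for` returns an empty list for every test.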

Benefits:

  1. Modularity: Each generator self-documents which tests it supports.
  2. Extensibility: Adding a new generator requires only writing a decorated function; no changes to the CI scripts are needed.
  3. Maintainability: Eliminates a growing, error-prone central mapping.
