feat(grpo): add pluggable reward functions / verifier registry #38

@abrichr

Description

reward.py currently exposes only binary_task_success and offers no extension point. RL training use cases need custom per-task verifiers and a way to compose multiple rewards.

The current reward is hardcoded at rollout_collector.py:140.

Proposed design:

  • Implement a reward function protocol matching TRL's reward_funcs pattern (list of callables)
  • Support a TaskVerifierRegistry for registering task-specific verification functions
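
A minimal sketch of how the two proposed pieces could fit together, not an implementation from this repo. It assumes reward functions follow the TRL convention of being called with `prompts` and `completions` keyword-compatible arguments plus extra dataset columns as `**kwargs`, and returning one float per completion. The `task_id` column, the `register`/`verify` method names, the example verifier, and the `combine_rewards` helper are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List, Optional, Protocol, Sequence


class RewardFunc(Protocol):
    """TRL-style reward callable: one float per completion (assumed signature)."""

    def __call__(self, prompts: List[str], completions: List[str], **kwargs) -> List[float]: ...


# Hypothetical verifier type: inspects a completion and reports success.
Verifier = Callable[[str], bool]


class TaskVerifierRegistry:
    """Maps task identifiers to task-specific verification functions."""

    def __init__(self) -> None:
        self._verifiers: Dict[str, Verifier] = {}

    def register(self, task_id: str) -> Callable[[Verifier], Verifier]:
        # Decorator form, so verifiers can be declared next to task definitions.
        def decorator(fn: Verifier) -> Verifier:
            self._verifiers[task_id] = fn
            return fn

        return decorator

    def verify(self, task_id: str, completion: str) -> bool:
        # Unknown tasks count as failures rather than raising.
        verifier = self._verifiers.get(task_id)
        return bool(verifier(completion)) if verifier is not None else False


registry = TaskVerifierRegistry()


@registry.register("open_settings")  # hypothetical task id
def verify_open_settings(completion: str) -> bool:
    return "settings" in completion.lower()


def task_success_reward(
    prompts: List[str],
    completions: List[str],
    task_id: Optional[List[str]] = None,
    **kwargs,
) -> List[float]:
    """Binary reward driven by the registry; `task_id` is an assumed dataset column."""
    task_ids = task_id if task_id is not None else [None] * len(completions)
    return [
        1.0 if t is not None and registry.verify(t, c) else 0.0
        for t, c in zip(task_ids, completions)
    ]


def brevity_reward(prompts: List[str], completions: List[str], **kwargs) -> List[float]:
    """Toy second reward, present only to illustrate composition."""
    return [1.0 if len(c) <= 80 else 0.0 for c in completions]


def combine_rewards(
    reward_funcs: Sequence[RewardFunc],
    weights: Sequence[float],
    prompts: List[str],
    completions: List[str],
    **kwargs,
) -> List[float]:
    """Weighted sum over a list of reward callables."""
    totals = [0.0] * len(completions)
    for fn, weight in zip(reward_funcs, weights):
        for i, reward in enumerate(fn(prompts, completions, **kwargs)):
            totals[i] += weight * reward
    return totals
```

With this shape, the existing binary_task_success would become one entry in the list of callables rather than a hardcoded call in the rollout collector, and new tasks would only need to register a verifier.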

Metadata

    Labels

    enhancement: New feature or request
