Refactor DPO example to parametrize dataset configuration #499

Ashish-Patnaik · 2026-01-09T16:02:41Z

Fixes #498

This PR refactors examples/dpo.py to bring it in line with the other examples (such as seq2seq.py and lora.py) and to remove leftover technical debt.

Changes:

Removed the hardcoded batch_size and max_length variables from _make_dataset.
Removed the placeholder comment # TODO(epot): !!!!.
Updated get_config() to define these hyperparameters and pass them into the dataset builder.

This improves the configurability of the script and cleans up the code style to match the rest of the repository.
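
For reviewers, here is a rough sketch of the pattern described above, not the actual diff. It assumes a keyword-only _make_dataset signature and uses a hypothetical DatasetSpec dataclass as a stand-in for the real dataset builder. The numeric values and the train_ds / eval_ds keys are illustrative only and are not taken from examples/dpo.py.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class DatasetSpec:
    """Hypothetical stand-in for whatever the real dataset builder returns."""

    training: bool
    batch_size: int
    max_length: int


def _make_dataset(*, training: bool, batch_size: int, max_length: int) -> DatasetSpec:
    # Before the refactor, batch_size and max_length were hardcoded inside
    # this function. Here they arrive as arguments instead.
    return DatasetSpec(
        training=training,
        batch_size=batch_size,
        max_length=max_length,
    )


def get_config() -> dict:
    # Hyperparameters are defined once, at the config level.
    # The values below are assumptions chosen for illustration.
    batch_size = 16
    max_length = 512

    return {
        "train_ds": _make_dataset(
            training=True, batch_size=batch_size, max_length=max_length
        ),
        "eval_ds": _make_dataset(
            training=False, batch_size=batch_size, max_length=max_length
        ),
    }
```

With this shape, changing batch_size or max_length means editing get_config() only, and every dataset built from it picks up the same values.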

Ashish-Patnaik added 2 commits January 9, 2026 19:22

Refactor classification example to avoid hardcoded label token IDs

033a936

Refactor DPO dataset construction to remove hardcoded parameters

2b177ff
