Description
In examples/dpo.py, the _make_dataset function currently relies on hardcoded values for max_length (512) and batch_size (16). There is also an outstanding TODO comment (# TODO(epot): !!!!) explicitly flagging this section for cleanup.
This differs from other examples like seq2seq.py and lora.py, where these parameters are defined in get_config() and passed down as arguments. This inconsistency makes it harder to configure the DPO training run without modifying the internal logic of the dataset builder.
Solution
I propose refactoring examples/dpo.py to match the pattern used in the other example scripts:
- Move the batch_size and max_length definitions to the top of get_config().
- Update _make_dataset to accept these values as keyword arguments (*, training, batch_size, max_length).
- Remove the placeholder TODO comment.
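For illustration, here is a minimal sketch of the proposed shape. This is a hypothetical stand-in, not the actual dpo.py code: the real example uses the library's config and dataset types, which are stubbed out with plain dicts here, and the field names in get_config() are assumptions.

```python
def get_config():
  # Shared hyperparameters defined once at the top, mirroring seq2seq.py
  # and lora.py, instead of being hardcoded inside _make_dataset.
  batch_size = 16
  max_length = 512
  return {
      "train_ds": _make_dataset(
          training=True, batch_size=batch_size, max_length=max_length
      ),
      "eval_ds": _make_dataset(
          training=False, batch_size=batch_size, max_length=max_length
      ),
  }


def _make_dataset(*, training, batch_size, max_length):
  # Values now arrive as keyword-only arguments, so a user can change
  # them in get_config() without touching the dataset builder.
  return {
      "split": "train" if training else "test",
      "batch_size": batch_size,
      "max_length": max_length,
  }
```

With this shape, changing the batch size for a DPO run is a one-line edit in get_config() rather than a change to internal logic.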
Alternatives I've considered
Leaving it as-is works functionally, but it leaves technical debt in the codebase and makes the examples inconsistent for new users trying to learn the library.
Additional context
I am happy to open a PR to standardize this example.