Hello, thank you for the great work.
I would like to reproduce the results in your paper, but I couldn't find the hyperparameter settings used for evaluation.
For example, for evaluation on the GSM8K test set, what are the specific output length, number of diffusion steps, unmasking strategy, and system prompt construction?
I noticed that LLaMA-Factory/examples/inference/llama2_full_ddm-gsm-inf.yaml specifies these values, but I'm not sure how to use this .yaml file.
Could you tell me which script or command to run so that evaluation follows this YAML configuration?
Also, may I ask how much time and how many GPUs were required to produce the GSM8K fine-tuned checkpoint?
