Hi,
When I run the fine-tuning stage of the AdaptDiffuser pipeline, no trajectories are selected. My suspicion is that task.metric_value is set too high relative to the returns it is compared against. Could you clarify the rationale behind the current setting? I was wondering whether the threshold should be rescaled, something like:
DD_RETURN_SCALE[args.task.env_name] * (args.task.metric_value / return_reward_range(env.get_dataset(), 1000)[1])
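To make the question concrete, here is a minimal sketch of what I mean. It is not the repository's code: `normalized_threshold` and `select_trajectories` are hypothetical helpers, `max_return` stands in for `return_reward_range(env.get_dataset(), 1000)[1]`, `return_scale` stands in for `DD_RETURN_SCALE[args.task.env_name]`, and all numbers are made up. It also assumes the generated trajectories are scored on a normalized scale, which is exactly the part I am unsure about.

```python
def normalized_threshold(metric_value, max_return, return_scale):
    """Rescale a raw return threshold to the (assumed) normalized return scale."""
    if max_return <= 0:
        raise ValueError("max_return must be positive")
    return return_scale * (metric_value / max_return)


def select_trajectories(returns, threshold):
    """Return the indices of trajectories whose return meets the threshold."""
    return [i for i, r in enumerate(returns) if r >= threshold]


# Illustrative values only (not taken from any config or dataset).
generated_returns = [0.5, 0.8, 0.9]  # assumed to be normalized returns
raw_metric_value = 3000.0            # hypothetical task.metric_value
max_return = 4000.0                  # hypothetical dataset maximum return
return_scale = 1.0                   # hypothetical DD_RETURN_SCALE entry

# Comparing normalized returns against the raw threshold selects nothing.
selected_raw = select_trajectories(generated_returns, raw_metric_value)

# Rescaling the threshold first lets the better trajectories through.
threshold = normalized_threshold(raw_metric_value, max_return, return_scale)
selected_scaled = select_trajectories(generated_returns, threshold)
```

If the returns really are normalized, the raw comparison would explain why the selection comes back empty in my runs.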
Additionally, I’d like to confirm whether the current pipeline and configs can reproduce results similar to those reported in the CleanDiffuser paper. If not, should I look at the lightning branch instead? That branch still seems to be under development (for example, diffusion_training_steps: 2000), so it might not be fully ready.
Any guidance would be greatly appreciated! Thank you.