We are currently reimplementing this excellent work, including AWM, DiffusionNFT, and FlowGRPO.
However, we cannot reproduce the eval curve reported in the paper. Could you share the config you used to train Flux.1-Dev?
We noticed that the current config sets the KL beta to 0, whereas the AWM paper sets it to 0.01.
We have tested both values and still cannot reproduce the paper's results. We would greatly appreciate any help in resolving this.
Our config is as follows:
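For context, here is a minimal sketch of how we understand the kl_beta term to enter a clipped GRPO-style objective. All function and variable names below are our own illustrative assumptions, not the repository's actual API; the real AWM loss may differ:

```python
import torch

def awm_style_loss(logp_new, logp_old, logp_ref, advantages,
                   kl_beta=0.01, clip_range=1.0, adv_clip_range=1.0):
    """Illustrative clipped policy-gradient loss with a KL penalty.

    logp_new / logp_old / logp_ref: per-sample log-probabilities under the
    current, behavior, and frozen reference policies (names assumed).
    """
    adv = advantages.clamp(-adv_clip_range, adv_clip_range)
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * adv
    clipped = ratio.clamp(1 - clip_range, 1 + clip_range) * adv
    pg_loss = -torch.min(unclipped, clipped).mean()
    # k3 estimator of KL(new || ref); with kl_beta = 0 this term vanishes,
    # which is why the 0 vs. 0.01 choice changes the regularization.
    log_ratio_ref = logp_ref - logp_new
    kl = (torch.exp(log_ratio_ref) - 1 - log_ratio_ref).mean()
    return pg_loss + kl_beta * kl
```

With identical current and reference policies the KL term is zero, so kl_beta = 0 and kl_beta = 0.01 coincide; they only diverge once the policy drifts from the reference.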
```yaml
# >>>>>>>>>> Environment configuration (adjust for your machine) <<<<<<<<<<
launcher: "accelerate"
config_file: deepspeed/deepspeed_zero2.yaml  # Relative to syf_exp/
num_processes: 8  # Number of GPUs; adjust to your setup
main_process_port: 29500
mixed_precision: "bf16"

# >>>>>>>>>> Data configuration (must be changed) <<<<<<<<<<
data:
  dataset_dir: "dataset/pickscore"  # Relative to syf_exp/; place the data here or create a symlink
  preprocessing_batch_size: 8
  dataloader_num_workers: 16
  force_reprocess: true
  cache_dir: "cache/datasets"  # Relative to syf_exp/
  max_dataset_size: 1024
  sampler_type: "auto"

# >>>>>>>>>> Model configuration (must be changed) <<<<<<<<<<
model:
  finetune_type: 'lora'
  lora_rank: 64
  lora_alpha: 128
  target_modules: "default"
  model_name_or_path: "./models/FLUX.1-dev"  # HuggingFace ID or local absolute path
  model_type: "flux1"
  resume_path: null
  resume_type: null

# >>>>>>>>>> Logging and saving <<<<<<<<<<
log:
  run_name: null  # Auto-generated when null; a meaningful name is recommended
  project: "PERL-SYF-TEST"
  logging_backend: "swanlab"  # Options: wandb, swanlab, tensorboard, none
  save_dir: "saves/"  # Relative to syf_exp/
  save_freq: 20
  save_model_only: true

# >>>>>>>>>> Training configuration <<<<<<<<<<
train:
  # Trainer settings
  trainer_type: 'awm'
  advantage_aggregation: 'sum'  # Options: 'sum', 'gdpo'
  off_policy: false
  awm_weighting: 'ghuber'
  ghuber_power: 0.25
  # Training timestep distribution
  num_train_timesteps: 4  # Set null to train on all steps
  time_sampling_strategy: discrete  # Options: uniform, logit_normal, discrete, discrete_with_init, discrete_wo_init
  time_shift: 3.0
  timestep_range: 0.7  # Fraction of timesteps to train on
  # Clipping
  clip_range: 1  # PPO/GRPO clipping range
  adv_clip_range: 1.0  # Advantage clipping range
  # KL divergence
  kl_weight: 'Uniform'
  kl_type: 'v-based'
  kl_beta: 0.01  # KL divergence beta
  ref_param_device: 'cuda'  # Options: cpu, cuda
  # EMA
  ema_kl_beta: 0.1  # Coefficient of the KL loss between the current policy and the EMA policy, used to stabilize training
  ema_decay_schedule: "linear"  # EMA decay schedule. Options: ['constant', 'power', 'linear', 'piecewise_linear', 'cosine', 'warmup_cosine']
  ema_decay: 0.3  # EMA decay rate (0 to disable)
  ema_update_interval: 1  # EMA update interval (in epochs)
  warmup_steps: 300
  ema_device: "cuda"  # Device on which to store the EMA model (options: cpu, cuda)
  # Sampling
  resolution: 512  # Can be an int or [height, width]
  num_inference_steps: 10  # Number of timesteps
  guidance_scale: 3.5  # Guidance scale for sampling
  # Batch and sampling
  per_device_batch_size: 8  # Batch size per device
  group_size: 16  # Group size for GRPO sampling
  global_std: true  # Use global std for advantage normalization
  unique_sample_num_per_epoch: 48  # Unique samples per group
  gradient_step_per_epoch: 1  # Gradient steps per epoch; the first step is on-policy, the rest are off-policy
  # Optimization
  learning_rate: 3.0e-4  # Initial learning rate
  adam_weight_decay: 1.0e-4  # AdamW weight decay
  adam_betas: [0.9, 0.999]  # AdamW betas
  adam_epsilon: 1.0e-8  # AdamW epsilon
  max_grad_norm: 1.0  # Max gradient norm for clipping
  # Gradient checkpointing
  enable_gradient_checkpointing: false  # Enable gradient checkpointing to save memory at extra compute cost
  # Seed
  seed: 42  # Random seed
  # Scheduler configuration
  scheduler:
    dynamics_type: "ODE"  # Options: Flow-SDE, Dance-SDE, CPS, ODE

# Evaluation settings
eval:
  resolution: 1024  # Evaluation resolution
  per_device_batch_size: 1  # Eval batch size
  guidance_scale: 3.5  # Guidance scale for sampling
  num_inference_steps: 20  # Number of eval timesteps
  eval_freq: 20  # Eval frequency in epochs (0 to disable)
  seed: 42  # Eval seed (defaults to the training seed)

# Reward model configuration
rewards:
  - name: "pick_score"
    reward_model: "PickScore"
    batch_size: 16
    device: "cuda"
    dtype: bfloat16

# Optional evaluation reward models
eval_rewards:
  - name: "pick_score"
    reward_model: "PickScore"
    batch_size: 32
    device: "cuda"
    dtype: bfloat16
```
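To make sure we interpret the `group_size` and `global_std` options correctly, here is a sketch of the GRPO-style advantage normalization we assumed. The function name and the exact semantics of `global_std` are our own guesses from the config, not the repository's actual implementation:

```python
import torch

def group_normalized_advantages(rewards, group_size=16, global_std=True):
    """Illustrative GRPO-style advantage normalization.

    rewards: 1-D tensor laid out as consecutive groups of `group_size`
    samples sharing a prompt. With global_std=True the mean is still
    per-group but the std is taken over the whole batch, which is how we
    read the `global_std: true` option (an assumption on our part).
    """
    groups = rewards.view(-1, group_size)
    mean = groups.mean(dim=1, keepdim=True)
    if global_std:
        std = rewards.std()  # one std across all groups
    else:
        std = groups.std(dim=1, keepdim=True)  # per-group std
    adv = (groups - mean) / (std + 1e-8)
    return adv.view(-1)
```

One practical consequence: with a batch-wide std, a group whose rewards are all identical simply gets zero advantages instead of a near-zero denominator.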
The resulting curve is:
