You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
During multi-image training in GRPO, I adjusted the model in the LORA section. When using both accuracy and format as rewards, the model failed to learn any content—it only learned the output format. Is there a solution to this problem?