TRPO and PPO Models don't train

@antoine-galataud @takaomoriyama 
Hello, I've been working on this project for a long time. I trained the model with the TRPO policy for over 2000 epochs, but the reward stabilised early on. I then switched to the PPO policy, which showed great progress over the first 250 epochs: the reward went from -2 lakh (-200,000) to -20,000. After that, in spite of running it for 300 more epochs, the reward did not improve any further and stabilised.
Please help me figure out why that is.

TRPO and PPO Models don't train #84