Hello! I used the original code in the baselines Docker image to train the RL model with the command `python3 -m baselines_energyplus.trpo_mpi.run_energyplus --num-timesteps 10000000`. The model was trained for 200 episodes, saved, and then used for inference with the same IDF file and weather file. During inference, the West Zone temperature fluctuates a lot (the annual mean is about 22.4 °C), as shown in the graph below:

Also, the setpoint temperature chosen by the action is almost always at the lowest possible value:

Is this the expected behavior for the episode?