First of all, thank you for providing the code.
There are two places I cannot understand. I would be grateful if you could help me.
- self.pi_loss = pi_loss = tf.reduce_mean(log_pi_sampled * tf.stop_gradient(log_pi_sampled - Q_sampled + V_S1))
- tf.reduce_sum(log_prob, axis=1) - tf.reduce_sum(tf.log(1 - tf.square(tf.tanh(u)) + EPS), axis=1)
Neither of these seems to match the paper.