agent.agents package¶
Submodules¶
agent.agents.ppo_agent module¶
class agent.agents.ppo_agent.PPOAgent(*args, **kwargs)[source]¶
Bases: agent.interfaces.partials.agents.torch_agents.actor_critic_agent.ActorCriticAgent
PPO, the Proximal Policy Optimization method.
See the __defaults__ method for the default parameters.
kl_target_stop(old_log_probs, new_log_probs, kl_target=0.03, beta_max=20, beta_min=0.05)[source]¶
A TRPO-inspired adaptive KL penalty:
    negloss = -tf.reduce_mean(self.advantages_ph * tf.exp(self.logp - self.prev_logp))
    negloss += tf.reduce_mean(self.beta_ph * self.kl_divergence)
    negloss += tf.reduce_mean(
        self.ksi_ph * tf.square(tf.maximum(0.0, self.kl_divergence - 2 * self.kl_target))
    )
    self.ksi = 10
Adaptive kl_target values: 0.01 and 0.03.
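Reconstructed as an equation, negloss above is the negative of the KL-penalized surrogate objective (a reading of the snippet, with β = beta_ph, ξ = ksi_ph fixed at 10, and d_targ = kl_target):

    L(\theta) = \hat{\mathbb{E}}_t\!\left[ e^{\log \pi_\theta - \log \pi_{\theta_{\mathrm{old}}}} \, \hat{A}_t \right] - \beta \, \hat{\mathbb{E}}_t\!\left[ \mathrm{KL}_t \right] - \xi \, \hat{\mathbb{E}}_t\!\left[ \max\!\left(0,\; \mathrm{KL}_t - 2\, d_{\mathrm{targ}}\right)^{2} \right]

A runnable sketch of the matching adaptive-β update follows the parameter list below.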
- Parameters
  - kl_target – target KL divergence between the old and new policies (default 0.03).
  - beta_max – upper bound on the adaptive KL-penalty coefficient β (default 20).
  - beta_min – lower bound on the adaptive KL-penalty coefficient β (default 0.05).
  - old_log_probs – log-probabilities of the sampled actions under the previous policy.
  - new_log_probs – log-probabilities of the same actions under the current policy.
- Returns
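The docstring does not state the return value. As a rough illustration only, here is a minimal sketch of the standard adaptive-KL early-stop rule that a method with this signature typically implements; the function name, the beta argument, the (beta, stop) return pair, and the adaptation thresholds are assumptions, not the library's confirmed behaviour.

    import torch

    def kl_target_stop_sketch(old_log_probs: torch.Tensor,
                              new_log_probs: torch.Tensor,
                              kl_target: float = 0.03,
                              beta: float = 1.0,
                              beta_max: float = 20,
                              beta_min: float = 0.05):
        # Hypothetical re-implementation, not the library's confirmed logic.
        # Monte-Carlo estimate of KL(pi_old || pi_new) from per-action
        # log-probabilities gathered during the rollout.
        kl = (old_log_probs - new_log_probs).mean().item()

        # Classic PPO-penalty adaptation: raise beta when the policy moved
        # too far, lower it when it barely moved, clamped to the given bounds.
        if kl > 1.5 * kl_target:
            beta = min(beta * 2.0, beta_max)
        elif kl < kl_target / 1.5:
            beta = max(beta / 2.0, beta_min)

        # Stop the inner optimisation epoch once the KL blows well past the
        # target, mirroring the max(0, KL - 2 * kl_target) penalty above.
        stop = kl > 2.0 * kl_target
        return beta, stop

In a training loop one would call beta, stop = kl_target_stop_sketch(old_lp, new_lp) after each update epoch and break out of the epoch loop when stop is true.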