Recurrent PPO (or any other on-policy loss) is supported. The key points to consider are similar to those in the Q-learning tutorial.
Beyond that, the loss should work out of the box on data sampled from the replay buffer, using RNN-based models. But feel free to open an issue or share code.
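For reference, the per-timestep clipped PPO objective that gets applied to such sampled sequences can be sketched in plain Python. This is an illustrative sketch only (the function name, default `clip_eps`, and test values are assumptions, not any library's API), and it omits the RNN hidden-state bookkeeping that the Q-learning tutorial covers:

```python
import math

# Sketch of the clipped PPO surrogate loss, averaged over the time steps
# of a sampled (flattened) sequence. Inputs are per-step log-probabilities
# from the current and behavior policies, plus per-step advantages.
def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    assert len(log_probs_new) == len(log_probs_old) == len(advantages)
    total = 0.0
    for lp_new, lp_old, adv in zip(log_probs_new, log_probs_old, advantages):
        # Importance ratio pi_new(a|s) / pi_old(a|s), from log-probs.
        ratio = math.exp(lp_new - lp_old)
        # Clip the ratio to [1 - eps, 1 + eps].
        clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
        # PPO maximizes the minimum of the two surrogates;
        # the loss is its negative.
        total += -min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With identical old and new log-probs the ratio is 1 everywhere and clipping is inactive; when the ratio moves outside the clip range on a positive-advantage step, the clipped surrogate caps the objective at `(1 + eps) * adv`.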
-
Hi everyone,
I read the tutorial on recurrent Q-learning; my question is whether recurrent PPO is also supported.
Are there any special caveats I should be aware of when doing recurrent PPO, in contrast to recurrent Q-learning?
I'm currently trying it out and receiving an error; if it turns out that it should be supported, I'll open an issue with more details.
Thanks for your input!