issue about RPPO #156

tingtingLiuLiu · 2023-02-18T05:09:49Z

Hello, when an LSTM network is used as the basic unit, the environment's state-transition model is used for sampling, and the sequence of states from t consecutive time steps (s1, ..., st) is used as the input of the network.
How does RecurrentPPO set this value (t)? Can you tell me where this happens in the code?
Thanks

araffin · 2023-02-18T20:46:06Z

> the environmental state transition model is used for sampling, and the state sequence of t steps consecutive moments is used as the input of the network (s1,..., st).

Do you mean where we create the sequences of observations/actions used to update the network?
This is done in the buffer: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/blob/master/sb3_contrib/common/recurrent/buffers.py#L25 (the code is complex because it keeps a fixed mini-batch size without constraining the number of envs/steps).
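A minimal sketch of the general idea in pure Python (not the sb3-contrib code): assume the rollout is stored per env as n_steps transitions together with an `episode_starts` flag per step. Sequences are cut wherever a new episode starts, so there is no fixed t: a sequence is at most n_steps long. The sequences are then padded to a common length, with a mask so that padded steps can be ignored in the loss. The names `split_into_sequences` and `pad_sequences` and the `[n_envs][n_steps]` list layout are hypothetical. As far as we understand, the real buffer also flattens the data into fixed-size mini-batches, so a mini-batch boundary may cut a sequence further, and the stored LSTM state at each sequence start is used to initialize the network.

```python
from typing import List, Tuple

# (env_idx, start, end) with end exclusive; hypothetical representation
Sequence = Tuple[int, int, int]


def split_into_sequences(episode_starts: List[List[bool]]) -> List[Sequence]:
    """Cut each env's rollout wherever a new episode starts.

    episode_starts is assumed to have shape [n_envs][n_steps].
    """
    sequences: List[Sequence] = []
    for env_idx, starts in enumerate(episode_starts):
        seq_start = 0
        # Step 0 always begins a sequence, so only check steps 1..n_steps-1
        for t in range(1, len(starts)):
            if starts[t]:
                sequences.append((env_idx, seq_start, t))
                seq_start = t
        # The last sequence runs to the end of the rollout
        sequences.append((env_idx, seq_start, len(starts)))
    return sequences


def pad_sequences(values, sequences: List[Sequence], pad_value=0.0):
    """Pad all sequences to the longest one and return (padded, mask).

    values is assumed to have shape [n_envs][n_steps]; mask is True for
    real steps and False for padding.
    """
    max_len = max(end - start for _, start, end in sequences)
    padded, mask = [], []
    for env_idx, start, end in sequences:
        seq = list(values[env_idx][start:end])
        n_pad = max_len - len(seq)
        padded.append(seq + [pad_value] * n_pad)
        mask.append([True] * len(seq) + [False] * n_pad)
    return padded, mask


if __name__ == "__main__":
    # 2 envs, n_steps = 5
    starts = [[True, False, False, True, False],
              [False, False, True, False, False]]
    seqs = split_into_sequences(starts)
    print(seqs)  # [(0, 0, 3), (0, 3, 5), (1, 0, 2), (1, 2, 5)]
    padded, mask = pad_sequences([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], seqs)
    print(padded)  # [[1, 2, 3], [4, 5, 0.0], [6, 7, 0.0], [8, 9, 10]]
```

Note that the sequence for env 1 starting at step 0 has no episode start: it continues an episode from the previous rollout, which is why a stored hidden state is needed rather than a zero state.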

There is a simpler version available in #118, but with a variable batch size.
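A minimal sketch of why a simpler approach leads to a variable batch size, assuming (hypothetically, this is not the code from #118) that each mini-batch is built from a fixed number of whole sequences. Sequences are never cut, but the number of transitions per mini-batch then depends on episode lengths. The names `minibatches_by_sequence` and `n_transitions` are illustrative only.

```python
import random
from typing import Iterator, List, Tuple

# (env_idx, start, end) with end exclusive; hypothetical representation
Sequence = Tuple[int, int, int]


def minibatches_by_sequence(
    sequences: List[Sequence], n_seq_per_batch: int, seed: int = 0
) -> Iterator[List[Sequence]]:
    """Shuffle whole sequences and group them into mini-batches.

    Each mini-batch holds up to n_seq_per_batch sequences, so its number
    of transitions varies with the sequence lengths.
    """
    rng = random.Random(seed)
    order = list(range(len(sequences)))
    rng.shuffle(order)
    for i in range(0, len(order), n_seq_per_batch):
        yield [sequences[j] for j in order[i:i + n_seq_per_batch]]


def n_transitions(batch: List[Sequence]) -> int:
    """Count the transitions (time steps) in a mini-batch."""
    return sum(end - start for _, start, end in batch)


if __name__ == "__main__":
    # Four sequences of lengths 3, 2, 2 and 3 (10 transitions in total)
    seqs = [(0, 0, 3), (0, 3, 5), (1, 0, 2), (1, 2, 5)]
    for batch in minibatches_by_sequence(seqs, n_seq_per_batch=2):
        print(len(batch), n_transitions(batch))
```

With two sequences per mini-batch, each batch contains 4, 5 or 6 transitions depending on the shuffle; the fixed-size approach in the buffer avoids this at the cost of more complex indexing.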

araffin added the "question" label (Further information is requested) on Feb 18, 2023

araffin closed this as completed on May 10, 2023
