Do you also have an LSTM implementation? #1

Open
Niels-Sch opened this issue Mar 27, 2019 · 5 comments

Comments

@Niels-Sch

I really love this implementation, and I see that LSTM is still in the TODO. Have you made any progress on this in the last two months or should I just do it myself?

@adik993 (Owner) commented Mar 27, 2019

Hi, I'm glad someone found it useful. Unfortunately, I haven't had the time to implement it yet. I definitely will one day, but I'm not sure when I'll find some time for it.

@Niels-Sch (Author) commented Mar 27, 2019

I just finished implementing it. It's still a massive mess though with lots of hackery so I won't bother you with it, but I might clean it up and let you know if you'd like :)

I really like how clear every function is in your code. You make me want to improve my own coding.

@adik993 (Owner) commented Mar 27, 2019

Heh, everything emerges from mess :) Yes, sure, I'd be happy to see your take on it; it's always nice to have some reference during coding, especially with ML, where the devil is in the details.

@Niels-Sch (Author) commented Mar 31, 2019

I will :) I'm cleaning it up while figuring out how to connect the models to Java through onnx/tensorflow/keras.

I also changed some of the algorithm in my version. For example, I'm normalizing the curiosity rewards, and instead of using .exp() on the difference of the logs I'm using an approximation that doesn't explode. I also simplified some of the hyperparameters. I'm getting full solves of Pendulum in a bit less than 20 epochs, i.e. the pendulum sticking up in the air like in your renders. By the way, your TensorBoard logs are super useful! Thanks to them I realised that Tanh activations are preferable in the agent model, because they learn more slowly than ReLUs, allowing the ICM to keep up. They're probably also less prone to jumping to conclusions, which makes them more stable.
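
In case it helps, here's roughly what I mean by those two tweaks (a simplified sketch, not my actual code; the function names, the clamp bound, and the choice of a plain standard-deviation normaliser are just for illustration):

```python
import torch

def normalize_curiosity(intrinsic_reward: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Scale the curiosity rewards by their standard deviation so their magnitude
    # stays in a sane range (illustrative choice of normalisation).
    return intrinsic_reward / (intrinsic_reward.std() + eps)

def bounded_ratio(new_log_prob: torch.Tensor, old_log_prob: torch.Tensor,
                  max_log_ratio: float = 10.0) -> torch.Tensor:
    # Keep the PPO importance ratio exp(new - old) from exploding by clamping
    # the log-difference before exponentiating. This is just one bounded
    # alternative, not necessarily the exact approximation described above.
    log_ratio = torch.clamp(new_log_prob - old_log_prob, -max_log_ratio, max_log_ratio)
    return torch.exp(log_ratio)
```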

Also, I'm not using the "recurrent" parameter yet, since it makes saving the hidden states tricky while maintaining compatibility with the run_[...].py files, but I guess I'll figure that out after further cleaning.
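
The hidden-state bookkeeping I'm talking about looks roughly like this (a minimal sketch assuming a PyTorch nn.LSTM; the class and method names are made up, not the ones from this repo):

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Toy policy head that carries its LSTM hidden state across steps."""

    def __init__(self, n_features: int, hidden_size: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.hidden = None  # (h, c), kept between forward() calls within an episode

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: [n_env, 1, n_features] -- one timestep at a time
        out, self.hidden = self.lstm(obs, self.hidden)
        return out[:, -1]

    def reset_hidden(self) -> None:
        # Call at episode boundaries so state doesn't leak between episodes.
        self.hidden = None
```

Keeping the state inside the module like this would be one way to leave the run_[...].py call sites untouched.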

@tomast95 commented Jul 12, 2019

Hi, I'm also interested in a (stateful) LSTM implementation.
Your implementation is very nice (inheritance and not-too-long files) and super useful. I even learned a new Python thing from you: type annotations in function declarations and on their return values. And also how to use TensorBoard... huge thank you! :)

So far I have changed some of your code to use a stateful LSTM and removed the multi-env part so it runs on my env in a single process (it felt easier to work with). The ICM now runs on each episode separately (instead of your [n_env, batch_size, n_features] it's [batch_size, n_timesteps, n_features]), and later the episodes are concatenated to [n_env_episodes, batch_size, n_timesteps, n_features] as the PPO training input.
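
To make the shapes concrete (the sizes are placeholders and torch.stack is just one way the concatenation could be done; this isn't my actual code):

```python
import torch

# Placeholder sizes, only to make the shapes concrete
n_env_episodes, batch_size, n_timesteps, n_features = 4, 8, 16, 3

# One tensor per finished episode: [batch_size, n_timesteps, n_features]
episodes = [torch.randn(batch_size, n_timesteps, n_features)
            for _ in range(n_env_episodes)]

# Each episode tensor can be fed to the ICM on its own
# (hypothetical icm(episode) call, one per episode).

# For PPO training the episodes are stacked back together:
ppo_input = torch.stack(episodes)  # [n_env_episodes, batch_size, n_timesteps, n_features]
assert ppo_input.shape == (n_env_episodes, batch_size, n_timesteps, n_features)
```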

But I have problems with diverging losses and rewards (see my post here). So now I'm curious whether my approach to the LSTM is correct:

  • ICM rewards (states are propagated all at once, with zeros as the initial hidden state)
  • Getting policies in PPO (one state at a time, re-using the hidden state and resetting it between env episodes)
  • Batch PPO training (re-using the hidden state and resetting it between env episodes)

The divergence persists even after reworking it to use batches everywhere the models are used (the ICM for the reward and its loss, PPO for getting the old policies and for training).
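
To make the three bullets above concrete, the pattern I mean is roughly this (a toy sketch with made-up sizes, not my actual code):

```python
import torch
import torch.nn as nn

n_timesteps, n_features, hidden_size = 16, 3, 64
lstm = nn.LSTM(n_features, hidden_size, batch_first=True)

# 1) ICM-style pass: the whole episode at once, hidden state starts at zeros
episode = torch.randn(1, n_timesteps, n_features)   # [1, n_timesteps, n_features]
icm_out, _ = lstm(episode)                           # hx omitted -> zero-initialised hidden

# 2) Policy-style pass: one step at a time, re-using the hidden state,
#    and resetting it to None (i.e. zeros) at the start of each new episode
hidden = None
for t in range(n_timesteps):
    step = episode[:, t:t + 1, :]                    # [1, 1, n_features]
    step_out, hidden = lstm(step, hidden)

# With the same weights, both passes should produce the same final output
assert torch.allclose(icm_out[:, -1], step_out[:, -1], atol=1e-5)
```

If the step-wise pass and the all-at-once pass share the same weights, they should give the same final output, which is a handy sanity check for the hidden-state handling.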
