
SAC-discrete implementation #270
Merged: 42 commits, Jan 13, 2023

Conversation

@timoklein (Collaborator) commented Aug 29, 2022

Description

Adds the SAC-discrete algorithm as discussed in #266.

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the documentation and previewed the changes via mkdocs serve.
  • I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in a performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

  • I have contacted vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
  • I have added additional documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers (if applicable).
    • I have added links to the PR related to the algorithm.
    • I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • I have added the learning curves (in PNG format with width=500 and height=300).
    • I have added links to the tracked experiments.
    • I have updated the overview sections at the docs and the repo
  • I have updated the tests accordingly (if applicable).

@vwxyzjn requested a review from dosssman on August 31, 2022
@dosssman (Collaborator) commented Sep 1, 2022

Is there any benchmark run for this? How does it perform?
Anyway, great job @timoklein!

@timoklein (Collaborator, Author) commented Sep 6, 2022

The results of the original paper are in #266. There are quite a number of games where its performance at 100k steps doesn't differ much from a random agent, e.g. Frostbite. I don't think it makes much sense to evaluate those.

Right now I've just run it a couple of times manually to check that the code actually works. A modified version of this codebase has been able to solve Catcher (PyGame) and some simple MiniGrid environments.

In general, I'm going to try to find some good hyperparameters and then run it on a few environments where performance actually differs from a random agent. I don't know for sure when I'll have time for that, though... Once that's done, I'll post a report here. Maybe I'll also run it for 200k steps to verify the results.

After that, I plan on starting on the docs :)

Thanks for helping out with this @Howuhh and @dosssman!

@timoklein (Collaborator, Author):

Posting an update:
I'm currently running 1M-step experiments on Seaquest (takes a while) with two versions of the algorithm: one with an implementation as close as possible to CleanRL's SAC implementation, and one that is closer to the paper.
The main difference is the update frequency: CleanRL does a critic update on every learning step and delays the actor updates, but compensates for the delay. As far as I understand, SAC-discrete does a single actor and critic update every four steps, without compensating for the delay.
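For illustration, a minimal sketch of the two schedules being compared. The helper functions and frequency values below are placeholders standing in for the real update code, not the PR's exact implementation:

def update_critic():
    ...  # one gradient step on the Q-networks


def update_actor_and_alpha():
    ...  # one gradient step on the policy and the entropy coefficient


policy_frequency = 2  # CleanRL-style delayed actor updates (illustrative value)
update_frequency = 4  # paper-style SAC-discrete schedule (illustrative value)

for global_step in range(1_000_000):
    # ... environment interaction and replay-buffer insertion omitted ...

    # CleanRL-style schedule: critic update every step; actor updates are delayed
    # but compensated by running them policy_frequency times when they happen.
    update_critic()
    if global_step % policy_frequency == 0:
        for _ in range(policy_frequency):
            update_actor_and_alpha()

    # Paper-style SAC-discrete schedule (alternative): a single critic and actor
    # update every update_frequency steps, with no compensation for the delay.
    # if global_step % update_frequency == 0:
    #     update_critic()
    #     update_actor_and_alpha()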

@timoklein (Collaborator, Author):

I ran some experiments on MsPacman and Seaquest.

Here's a link to a report with some results. The entropy regularization coefficient $\alpha$ has a tendency to explode when training for longer, but I couldn't find other experiments at 1M steps to verify whether that's an issue only in my code. Maybe @Howuhh and @dosssman have an idea?

This implementation doesn't quite match the results of the paper, which might be due to not using evaluation mode (i.e., a deterministic policy). If it's desired, I can implement a test loop that evaluates a deterministic policy.
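If such a test loop were added, it could look roughly like the sketch below. This is only an illustration: it assumes the Gymnasium API and an actor network that outputs action logits; the environment ID, function name, and wrapper setup are placeholders, not the PR's actual code.

import gymnasium as gym
import torch


def evaluate_deterministic(actor, env_id="MsPacmanNoFrameskip-v4", episodes=10, device="cpu"):
    """Evaluate a discrete-action policy greedily (argmax over action logits)."""
    env = gym.make(env_id)  # in practice, use the same Atari preprocessing wrappers as training
    returns = []
    for _ in range(episodes):
        obs, _ = env.reset()
        done, episode_return = False, 0.0
        while not done:
            with torch.no_grad():
                logits = actor(torch.as_tensor(obs, dtype=torch.float32, device=device).unsqueeze(0))
                action = int(torch.argmax(logits, dim=-1).item())  # deterministic action
            obs, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            episode_return += float(reward)
        returns.append(episode_return)
    env.close()
    return sum(returns) / len(returns)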

The performance does match the reported values in this other implementation I found.

Are there any other experiments you'd like me to run, e.g. specific environments or more seeds?

I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).

Are there ways I can do this without xvfb? It's not on our group's machines and I don't have sudo.

If everything's fine, I'm gonna start writing Docs :)

@vwxyzjn (Owner) commented Sep 23, 2022

Hi @timoklein, thank you! The experiments look very interesting.

This implementation doesn't quite match the results of the paper which might be due to not using evaluation mode (i.e. deterministic policy). If it's desired I can implement a test loop evaluating a deterministic policy.

Unless it makes a huge difference in the reported results, I wouldn't worry about it.

Are there any other experiments you'd like me to run, e.g. specific environments or more seeds?

Thanks for running the environments with three random seeds, which is great. For my personal interest, I would like to see results on Pong, Breakout, and BeamRider, which are the three environments we commonly use to benchmark other algorithms (see an example here).

Are there ways I can do this without xvfb? It's not on our group's machines and I don't have sudo.

If this is an issue, feel free to run the experiments without xvfb. The video recordings are nice to have, but I am not requiring every run to have them. For example, it's not practical to obtain videos in EnvPool, so I didn't do it.

That said, please save the model at the end of training so that we can visualize them later. Feel free to use the following code:

torch.save(agent.state_dict(), f"models/{run_name}/agent.pt")  # save the final model weights
if args.prod_mode:
    # upload the checkpoint to W&B when experiment tracking is enabled
    wandb.save(f"models/{run_name}/agent.pt", base_path=f"models/{run_name}", policy="now")
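For later visualization, the saved weights could be restored along these lines (a rough sketch; `Agent`, `envs`, `run_name`, and `device` stand in for whatever the training script actually defines):

# Hypothetical loading sketch for visualizing a saved policy later.
agent = Agent(envs).to(device)
agent.load_state_dict(torch.load(f"models/{run_name}/agent.pt", map_location=device))
agent.eval()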

@Howuhh (Contributor) commented Sep 23, 2022

@timoklein Sometimes the target entropy may just be very high and hard to reach, and then the loss can explode (as alpha will grow and grow), so I usually tune the scale coefficient a bit (it defaults to 0.98 in the code right now).

Some discussion about it:
toshikwa/sac-discrete.pytorch#15 (comment)
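For reference, a rough sketch of how the target entropy and the alpha loss typically interact in discrete SAC, in the spirit of the linked implementation rather than the PR's exact code; the 0.98 coefficient mentioned above corresponds to `target_entropy_scale` here:

import math

import torch
import torch.nn.functional as F


def alpha_loss_discrete(logits, log_alpha, target_entropy_scale=0.98):
    """Sketch of automatic entropy tuning for a discrete-action policy."""
    n_actions = logits.shape[-1]
    # Target entropy as a fraction of the maximum entropy (a uniform policy):
    # a scale close to 1.0 asks for a nearly uniform policy, which can be hard to reach.
    target_entropy = target_entropy_scale * math.log(n_actions)

    probs = F.softmax(logits, dim=-1)
    log_probs = F.log_softmax(logits, dim=-1)
    policy_entropy = -(probs * log_probs).sum(dim=-1)  # per-state policy entropy

    # The gradient pushes log_alpha up whenever the policy's entropy is below the
    # target, so an unreachable target makes alpha grow without bound.
    return -(log_alpha * (target_entropy - policy_entropy).detach()).mean()

Lowering `target_entropy_scale` makes the target easier to satisfy, which is one way to keep alpha bounded in practice.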

@timoklein (Collaborator, Author):

From my point of view, this is done now. I ran the new experiments and updated all plots and results. I also deleted the runs that aren't used for the final results from the openrlbenchmark project. The docs should also work.
There will surely be some inevitable bugs, but just ping me in the issue and I'll try to fix them :)

@dosssman (Collaborator):

Great contribution!
I think we can proceed with merging this, as it matches baseline results while maintaining a relatively simple implementation.
Other issues, if any exist, will come to light as other people try it out from a fresher perspective.

@dosssman closed this Dec 19, 2022
@dosssman reopened this Dec 19, 2022
@araffin left a comment:

minor: the title for the Pong figure is not the right one

@timoklein (Collaborator, Author):

minor: the title for the Pong figure is not the right one

Thanks, good catch!

@timoklein (Collaborator, Author):

Is there anything more for me to do here? One thing that might be blocking:

The codespell check complains about the name "Jimmy Ba". It doesn't seem to support inline ignores (codespell-project/codespell#1212), so I'm not quite sure how to handle it.

I can just remove the citation so that the CI error is gone.

@vwxyzjn (Owner) left a comment:

Everything LGTM. Feel free to merge after you have resolved the minor target-entropy-scale issue. You should already have contributor access. Thanks so much for this contribution, and sorry for the delay.

I can just remove the citation so that the CI error is gone.

Please keep the citation and just add ignore words in the pre-commit configs instead.

- --ignore-words-list=nd,reacher,thist,ths,magent

@timoklein (Collaborator, Author):

Everything LGTM. Feel free to merge after you have resolved the minor target-entropy-scale issue. You should already have contributor access. Thanks so much for this contribution, and sorry for the delay.

No problem. It's been a fun experience and I learned a lot. Looking forward to contributing more in the future!

@timoklein closed this Jan 13, 2023
@timoklein reopened this Jan 13, 2023
@timoklein (Collaborator, Author) commented Jan 13, 2023

@vwxyzjn
I'm not quite sure why this still fails:
https://github.com/vwxyzjn/cleanrl/actions/runs/3912194839/jobs/6686505080#step:4:92

"Ba" should be correctly added to the pre-commit config:

- --ignore-words-list=nd,reacher,thist,ths,magent,Ba

I'm going to fix it but it might be Sunday before I get around to doing it.

EDIT: It's probably a capitalization issue, see codespell-project/codespell#2137.
