Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature request] Implement SAC-Discrete #157

Closed
toshikwa opened this issue Sep 11, 2020 · 7 comments
Closed

[Feature request] Implement SAC-Discrete #157

toshikwa opened this issue Sep 11, 2020 · 7 comments
Labels
enhancement New feature or request experimental Experimental Feature
Milestone

Comments

@toshikwa
Copy link

Hi, thank you for your great work!!
I'm interested in contributing to Stable-Baselines3.

I want to implement SAC-Discrete(paper, my implementation).
Can we discuss before implementing??

@Miffyli Miffyli added the enhancement New feature or request label Sep 11, 2020
@Miffyli Miffyli added this to the v1.2 milestone Sep 11, 2020
@Miffyli
Copy link
Collaborator

Miffyli commented Sep 11, 2020

Cheers for the nice comments :).

We are (still) working on getting v1.0 out, i.e. mainly bug testing and reviewing of the code. After the release we can discuss adding new algorithms or improvements to existing algorithms. On a quick glimpse this seems simple enough that it could be added with not much extra code.

@araffin
Copy link
Member

araffin commented Sep 11, 2020

Hello,

Thanks for the suggestion =)

In principle I would be for that addition. We mostly need to discuss the advantage of it vs DQN and variants (QR-DQN, ...) in term of performance and runtime and see how much effort it requires and complexity it adds.

@Miffyli maybe a good candidate for stable-baselines3 "contrib" (same as #83 )

@toshikwa
Copy link
Author

Thank you for the response.

According to the paper, SAC-Discrete is evaluated with 100k environment steps because they are most interested in sample efficiency, not final performance.

Its results at 100k steps were not bad, but it failed to solve some simple tasks like Pong.
DQN (and its extensions) can get much better result although needs more samples.
I would say there is a trade-off. (What do you think?)

Once v1.0 is released, I can contribute to implementing QR-DQN and IQN, in addition to SAC-Discrete.

Thanks :)

@araffin araffin added the experimental Experimental Feature label Sep 20, 2020
@araffin
Copy link
Member

araffin commented Oct 30, 2020

The contrib repo is here ;) https://github.com/Stable-Baselines-Team/stable-baselines3-contrib

make sure to read the contributing guide carefully first ;).
In term of priority, I would prefer QR-DQN and IQN first. For QR-DQN, you can re-use the huber quantile loss defined in TQC.

(we don't advertise it yet as we want to check the process and not get too many request for now)

@cosmir17
Copy link

cosmir17 commented Dec 7, 2020

I was asked to post it here, @partiallytyped, regarding the following comment.
#1 (comment)

PartiallyTyped posted an academic paper link for a SAC algorithm that takes a discrete input.

I think PartiallyTyped is already aware since the main github link was mentioned on the paper page, there is a source code example for it. The author publicised his code.
https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/blob/master/agents/actor_critic_agents/SAC_Discrete.py

Hope this helps,
Sean

@araffin
Copy link
Member

araffin commented Jan 8, 2021

I would now close this one as it rather belongs the contrib repo.

PartiallyTyped posted an academic paper link for a SAC algorithm that takes a discrete input.

Academic, yes, but not peer-reviewed...

@araffin araffin closed this as completed Jan 8, 2021
@cosmir17
Copy link

cosmir17 commented Jan 8, 2021

@araffin How about the following paper?
https://arxiv.org/abs/1912.11077v1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request experimental Experimental Feature
Projects
None yet
Development

No branches or pull requests

4 participants