Multiprocessing support for off policy algorithms #439

araffin · 2021-05-17T14:34:55Z

Description

closes #179

Add multiprocessing (multiple environments) support for off-policy algorithms. Only HerReplayBuffer is not supported here (done in #654).

Motivation and Context

I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)

Checklist:

I've read the CONTRIBUTION guide (required)
I have updated the changelog accordingly (required).
My change requires a change to the documentation.
I have updated the tests accordingly (required for a bug fix or a new feature).
I have updated the documentation accordingly.
I have reformatted the code using make format (required)
I have checked the codestyle using make check-codestyle and make lint (required)
I have ensured make pytest and make type both pass. (required)
I have checked that the documentation builds using make doc (required)

Note: You can run most of the checks using make commit-checks.

Note: we are using a maximum length of 127 characters per line

araffin · 2021-11-04T13:33:59Z

@yonkshi @SonSang could you also take a look and review this PR?

Miffyli · 2021-11-04T14:51:39Z

Would you have any performance metrics to share vs. single-threaded (agent performance, training time)? I will try to take a look before end of the week :)

araffin · 2021-11-05T10:52:45Z

so I've got something qualitative so you get the general idea of the pros/cons of multi envs.
This is SAC on Pendulum on cpu with (in order):

one env
two envs
four envs
eight envs
four envs and gradient_steps=2 (instead of 1)

Sample efficiency:

Wall clock time

more envs = less sample efficient but faster in wall clock time. You can recover some sample efficiency with more gradient steps, but then you lose some speed (especially in this experiment, run on my laptop on cpu only)

the slower the env, the bigger the gap compared to using a single env
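
The trade-off described above comes down to how many gradient updates are done per collected transition. Here is a minimal sketch of that arithmetic, not SB3 code. It assumes a step-based train_freq, one transition per env per step, and that gradient_steps=-1 means "as many gradient steps as transitions collected during the rollout". The helper names are hypothetical.

```python
def effective_gradient_steps(gradient_steps: int, n_envs: int, train_freq: int = 1) -> int:
    """Gradient updates performed after one rollout phase (hypothetical helper)."""
    if gradient_steps == -1:
        # Assumption: -1 means one gradient step per collected transition.
        return n_envs * train_freq
    return gradient_steps


def update_to_data_ratio(gradient_steps: int, n_envs: int, train_freq: int = 1) -> float:
    """Gradient updates per collected transition (hypothetical helper)."""
    # Each vectorized step yields one transition per env.
    transitions = n_envs * train_freq
    return effective_gradient_steps(gradient_steps, n_envs, train_freq) / transitions


# The five configurations from the comment above: (n_envs, gradient_steps).
for n_envs, gradient_steps in [(1, 1), (2, 1), (4, 1), (8, 1), (4, 2)]:
    ratio = update_to_data_ratio(gradient_steps, n_envs)
    print(f"n_envs={n_envs} gradient_steps={gradient_steps} updates/transition={ratio}")
```

With gradient_steps fixed at 1, the ratio shrinks as envs are added, which is consistent with the lower sample efficiency. Raising gradient_steps brings the ratio back up, but each training phase then costs more compute.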

tests/test_spaces.py

docs/guide/examples.rst

stable_baselines3/common/type_aliases.py

stable_baselines3/common/off_policy_algorithm.py

Miffyli

LGTM but need to fix the typing thing :)

araffin · 2021-12-01T20:39:06Z

I reintroduced the next_obs = deepcopy(new_obs_) because it prevents many errors (and not only with dict observations)
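
To illustrate the kind of error the copy guards against, here is a minimal sketch using plain Python lists, not the actual SB3 implementation. It assumes a vectorized env that auto-resets finished envs and puts the true last observation in infos[i]["terminal_observation"]. The function name and data are hypothetical.

```python
from copy import deepcopy


def next_obs_for_buffer(new_obs, dones, infos, copy=True):
    """Build the next observation to store in the replay buffer (hypothetical helper)."""
    # Without a copy, next_obs is just another name for new_obs.
    next_obs = deepcopy(new_obs) if copy else new_obs
    for i, done in enumerate(dones):
        if done and "terminal_observation" in infos[i]:
            # The buffer should see the true final observation, not the reset one.
            next_obs[i] = infos[i]["terminal_observation"]
    return next_obs


dones = [True, False]  # env 0 just finished an episode and was reset
infos = [{"terminal_observation": [9.0, 9.0]}, {}]

# No copy: the observation used to pick the next action is overwritten too.
aliased_obs = [[0.0, 0.0], [1.0, 1.0]]
next_obs_for_buffer(aliased_obs, dones, infos, copy=False)
print(aliased_obs[0])  # no longer the reset observation

# With deepcopy: the buffer gets the terminal observation, the loop keeps the reset one.
safe_obs = [[0.0, 0.0], [1.0, 1.0]]
stored = next_obs_for_buffer(safe_obs, dones, infos, copy=True)
print(safe_obs[0], stored[0])
```

The same aliasing can happen with arrays or dict observations, since item assignment mutates the shared object in every case.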

araffin added 4 commits May 17, 2021 16:32

Add multi-env training support for SAC

353d60b

Fix for dict obs

43b4ee9

Pytype fixes

809b674

Merge branch 'master' into feat/multienv-off-policy

53a12ff

araffin added the experimental Experimental Feature label May 23, 2021

araffin added 4 commits May 23, 2021 13:27

Merge branch 'master' into feat/multienv-off-policy

a3e1ea4

Merge branch 'master' into feat/multienv-off-policy

d81cf71

Merge branch 'master' into feat/multienv-off-policy

0482607

Fix assert on number of envs

47daf8e

araffin mentioned this pull request Jun 26, 2021

[Feature request] Adding multiprocessing support for off policy algorithms #179

Closed

Merge branch 'master' into feat/multienv-off-policy

57013af

This was referenced Jul 6, 2021

Add the Bootstrapped Dual Policy Iteration algorithm for discrete action spaces Stable-Baselines-Team/stable-baselines3-contrib#35

Open

Pybullet SubprocVecEnv Multiprocessing leads to Broken Pipe Error #509

Closed

araffin added 5 commits July 16, 2021 16:50

Merge branch 'master' into feat/multienv-off-policy

6dcbe24

Merge branch 'master' into feat/multienv-off-policy

5bf096f

Merge branch 'master' into feat/multienv-off-policy

48ec0aa

Merge branch 'master' into feat/multienv-off-policy

42743ee

Merge remote-tracking branch 'origin/master' into feat/multienv-off-policy

02dd45b

SonSang mentioned this pull request Sep 13, 2021

[Question] Regarding implementation of multi env off-policy algorithm (DQN, Replaybuffer) #567

Closed

2 tasks

araffin added 11 commits September 23, 2021 15:17

Merge branch 'master' into feat/multienv-off-policy

77a79aa

Merge branch 'feat/multienv-off-policy' of github.com:DLR-RM/stable-baselines3 into feat/multienv-off-policy

350969d

Merge branch 'master' into feat/multienv-off-policy

9582d45

Merge branch 'master' into feat/multienv-off-policy

71a8d10

Merge branch 'master' into feat/multienv-off-policy

10ca6bd

Merge branch 'master' into feat/multienv-off-policy

a607acf

Merge branch 'master' of github.com:DLR-RM/stable-baselines3 into feat/multienv-off-policy

8f6e59b

Remove for loop

35438e2

Add support for Dict obs

e02925f

Merge branch 'master' into feat/multienv-off-policy

cdd4df5

Merge branch 'feat/multienv-off-policy' of github.com:DLR-RM/stable-baselines3 into feat/multienv-off-policy

5efee9d

araffin requested a review from hill-a November 4, 2021 13:31

araffin mentioned this pull request Nov 5, 2021

Multiprocessing DLR-RM/rl-baselines3-zoo#184

Closed

Bug fix with VecNormalize

d6aa23e

Miffyli reviewed Nov 8, 2021

araffin and others added 14 commits November 16, 2021 17:19

Merge branch 'master' into feat/multienv-off-policy

d4856f1

Update README table

0636dbd

Merge branch 'master' into feat/multienv-off-policy

110721f

Update variable names

e5d0a6c

Update changelog and version

3aec4ab

Update doc and fix for gradient_steps=-1

9b28760

Add test for gradient_steps=-1

8ada4a5

Disable pytype pyi errors

e98356d

Merge branch 'master' into feat/multienv-off-policy

d2b07cd

Fix for DQN

84d0aae

Merge branch 'master' into feat/multienv-off-policy

ee7b044

Merge branch 'master' into feat/multienv-off-policy

0d47ee7

Update comment on deepcopy

f3d798a

Remove episode_reward field

d16bddc

araffin requested a review from Miffyli December 1, 2021 18:39

Miffyli approved these changes Dec 1, 2021

araffin added 2 commits December 1, 2021 20:56

Fix RolloutReturn

1c8ad92

Avoid modification by reference

77bd551

Fix error message

3a6d099

araffin merged commit 507ed17 into master Dec 1, 2021

araffin deleted the feat/multienv-off-policy branch December 1, 2021 21:30

christqoh mentioned this pull request Dec 17, 2021

Off-Policy MultiProcessing Example #697

Closed

5 tasks
