
[Feature Request] Double DQN #487

Closed
NickLucche opened this issue Jun 22, 2021 · 18 comments
Labels: enhancement (New feature or request)

Comments

@NickLucche
Contributor

🚀 Feature

Add the Double DQN variant of the DQN algorithm.

Motivation

It's on the roadmap (#1).

Pitch

I suggest we go from:

with th.no_grad():
    # Compute the next Q-values using the target network
    next_q_values = self.q_net_target(replay_data.next_observations)
    # Follow greedy policy: use the one with the highest value
    next_q_values, _ = next_q_values.max(dim=1)

to:

with th.no_grad():
    # Compute the next Q-values using the target network
    next_q_values = self.q_net_target(replay_data.next_observations)
    if self.double_dqn:
        # Use the current (online) network to select the action with the maximal Q-value
        max_actions = th.argmax(self.q_net(replay_data.next_observations), dim=1)
        # Evaluate the Q-value of that action using the fixed target network
        next_q_values = th.gather(next_q_values, dim=1, index=max_actions.unsqueeze(-1))
    else:
        # Follow greedy policy: use the one with the highest value
        next_q_values, _ = next_q_values.max(dim=1)

with `double_dqn` as an additional flag passed to the DQN constructor.
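For clarity, here is a minimal, self-contained sketch of the proposed target computation as a standalone function (the function name and signature are illustrative, not part of SB3; in the actual change this logic would live inside DQN.train()):

import torch as th

def compute_td_target(q_net, q_net_target, next_observations, rewards, dones, gamma, double_dqn=True):
    # Sketch of the 1-step TD target with an optional Double DQN branch.
    with th.no_grad():
        # Next-state Q-values under the frozen target network: shape (batch, n_actions)
        next_q_values = q_net_target(next_observations)
        if double_dqn:
            # The online network selects the greedy action...
            max_actions = q_net(next_observations).argmax(dim=1, keepdim=True)
            # ...and the target network evaluates it: shape (batch, 1)
            next_q_values = th.gather(next_q_values, dim=1, index=max_actions)
        else:
            # Vanilla DQN: the target network both selects and evaluates
            next_q_values, _ = next_q_values.max(dim=1)
            next_q_values = next_q_values.reshape(-1, 1)
        # rewards and dones are assumed to have shape (batch, 1), as in replay buffer samples
        return rewards + (1.0 - dones) * gamma * next_q_values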

### Checklist

  • [x] I have checked that there is no similar issue in the repo (required)
NickLucche added the enhancement (New feature or request) label on Jun 22, 2021
@Miffyli
Collaborator

Miffyli commented Jun 22, 2021

Yup, indeed, this is one of the core enhancements we would like to see!

@araffin just checking with you: is bloating DQN code here slightly ok (with a new parameter), or should this go to contrib?

@NickLucche If you wish to work on this PR, note that we would like to see results matching reference implementations (e.g. the same results as in the original paper or in a well-established codebase). One or two Atari games should be enough.

@NickLucche
Contributor Author

Sure thing, I'll make sure to set up the experiments in a similar way.
I'll do some searching starting from the source paper.

@Miffyli
Collaborator

Miffyli commented Jun 22, 2021

I highly recommend using the zoo for that, btw, since it already has known-to-work preprocessing for Atari :). It takes a moment to get into, but it will definitely pay off with these experiments (there are many tiny details that need taking care of).

@araffin
Member

araffin commented Jun 23, 2021

@araffin just checking with you: is bloating DQN code here slightly ok (with a new parameter), or should this go to contrib?

I think Double DQN is ok, as it is only a three-line change.
However, for prioritized replay and the rest of the DQN extensions, I would prefer to have a RAINBOW implementation with everything in it. That way we keep the DQN implementation clean but still have the full-blown RAINBOW too.

@NickLucche
Contributor Author

@Miffyli Thanks for the tip.

It may take a while to run the experiments for the same number of timesteps as in "Human-level control through deep reinforcement learning" though.
e.g. I'm getting a decent improvement in mean reward after ~5.5M steps (currently 5.92e+03 vs 1.41e+04) on Atlantis-v0.
Can I cut the number of steps short, or should I match the experiments in that respect too? What kind of benchmark do you expect as a result?

@araffin
Member

araffin commented Jun 23, 2021

Can I cut the number of steps short, or should I match the experiments in that respect too? What kind of benchmark do you expect as a result?

I could help with that if needed, but basically you can check what we did for QR-DQN: Stable-Baselines-Team/stable-baselines3-contrib#13

So Breakout + Pong using 10e6 training steps and 3 seeds (40e6 frames because of frame skip), plus some benchmark on a simple gym task.
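For reference, here is a hedged sketch of launching such a run directly with SB3's Atari helpers (the RL Zoo remains the recommended route, since it also ships tuned hyperparameters; the values below are illustrative, not the zoo's):

from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# Standard Atari preprocessing (frame skip, grayscale, resize) plus 4-frame stacking
env = make_atari_env("PongNoFrameskip-v4", n_envs=1, seed=0)
env = VecFrameStack(env, n_stack=4)

model = DQN("CnnPolicy", env, buffer_size=100_000, verbose=1, seed=0)
model.learn(total_timesteps=10_000_000)  # 10e6 steps, as suggested above
model.save("dqn_pong")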

@NickLucche
Contributor Author

NickLucche commented Jun 23, 2021

Thanks a lot for your help, it is most appreciated!
I'd love to help out with this small contribution, so I'll try to mimic what was done in that issue, if that's okay with you.

So far I've been using a script very similar to this one for testing: Stable-Baselines-Team/stable-baselines3-contrib#13 (comment), as I'm not yet super familiar with the zoo.

@araffin
Member

araffin commented Jun 24, 2021

So far I've been using a script very similar to this one for testing: Stable-Baselines-Team/stable-baselines3-contrib#13 (comment), as I'm not yet super familiar with the zoo.

The RL Zoo is quite easy to use. If you want to replicate results and plot them, we have a section for that in the documentation ;)
https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html#how-to-replicate-the-results

@NickLucche
Contributor Author

Sorry for the long inactivity.
I managed to run a few experiments with the proposed change on Pong + Breakout. I'll leave the training curves here, although I can't notice much of a difference (at least that's on par with the original paper's findings).

Vanilla DQN Pong

[plot: pong_Training_Episodic_Reward]

Vanilla DQN Breakout

[plot: breakout_Training_Episodic_Reward]

Double DQN Pong

[plot: double_pong_Training_Episodic_Reward]

Double DQN Breakout

[plot: breakout_double_Training_Episodic_Reward]

@araffin
Member

araffin commented Sep 16, 2021

Thanks for the update. It would be best to have both DQN and Double DQN in the same plot (I can help you with the command if needed).
Also, you should probably try Atari games where Double DQN is known to make a difference (the paper gives some examples; I think Asterix is one of them).
Edit: you mentioned Atlantis too, no?

@NickLucche
Contributor Author

Oh yeah, I'd appreciate that. I've used the plot_train script from rl-zoo to be more standard, but I can't figure out how to put the results in the same plot using it.

Sure thing, I'll make sure to test it on a few more interesting use cases.

@araffin
Member

araffin commented Sep 17, 2021

Oh yeah, I'd appreciate that. I've used the plot_train script from rl-zoo to be more standard, but I can't figure out how to put the results in the same plot using it.

cf. https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html#how-to-replicate-the-results

You need to plot the evaluations, for instance comparing DQN and PPO on Pong and Breakout:

python scripts/all_plots.py -a dqn ppo -e Pong Breakout -f rl-trained-agents/

(you can also specify multiple folders)
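If the zoo plotting scripts don't fit, a rough manual alternative is sketched below; it assumes each run folder contains the evaluations.npz file written by SB3's EvalCallback, and the paths are illustrative:

import numpy as np
import matplotlib.pyplot as plt

runs = [
    ("DQN", "logs/dqn/PongNoFrameskip-v4_1/evaluations.npz"),
    ("Double DQN", "logs/ddqn/PongNoFrameskip-v4_1/evaluations.npz"),
]
for label, path in runs:
    data = np.load(path)
    # "results" has shape (n_evaluations, n_eval_episodes): average over episodes
    plt.plot(data["timesteps"], data["results"].mean(axis=1), label=label)

plt.xlabel("Timesteps")
plt.ylabel("Mean evaluation reward")
plt.legend()
plt.show()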

@ercoargante

ercoargante commented Oct 22, 2021

As a lecturer in an AI program, I recommend that our students use Stable Baselines for their projects due to its ease of use and clear documentation (thanks for that!). From an educational point of view, Q-learning and DQN are a good introduction to RL, so students start off using DQN.

Results using DQN of SB3 are much, much worse compared to SB2 (both with default values for the parameters). This hampers the adoption of SB3 (and the enthusiasm of students for RL). I have not yet understood/investigated the reason for this difference. Obvious candidates are the missing extensions like PER and DDQN, but of course this is an assumption.

The goal of this comment is just to mention that progress in SB3 on this topic is much appreciated. If I can be of help, for example in testing improvements, let me know. Best regards, Erco Argante

@Miffyli
Collaborator

Miffyli commented Oct 22, 2021

@ercoargante

Thank you for the kind words!

You are right that the DQN augmentations do contribute to performance; however, they should not be "make or break" level changes across games. You need to check the original publications for how much each augmentation matters in the specific game you are using (it varies a lot).

Another common point is the hyperparameters. I think we changed some of the default training parameters between SB and SB3, so you might want to double-check that they are valid. Once those are set right (and you disable the augmentations of SB DQN), you should get equal-ish performance: #110

Another source of discrepancy may be the logging of rewards, where the reported performance reflects a different number (see e.g. #181). I do not think we included a fix for that specific issue in SB.

TL;DR: the augmentations definitely contribute here, but I would double-check the parameters and the evaluation procedure (these are two points we often bring up to others seeking advice) :)
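To make the evaluation point concrete, here is a hedged sketch of evaluating a trained agent on a separately wrapped environment instead of reading training-time episode rewards (the saved model path is illustrative):

from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import VecFrameStack

# Evaluation env with the same Atari wrappers used during training
eval_env = VecFrameStack(make_atari_env("PongNoFrameskip-v4", n_envs=1, seed=1), n_stack=4)

model = DQN.load("dqn_pong")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"Mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")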

@araffin
Member

araffin commented Oct 22, 2021

Hello,

Results using DQN of SB3 are much, much worse compared to SB2 (both with default values for the parameters).

As mentioned in the docs, it is highly recommended to use the RL Zoo (https://github.com/DLR-RM/rl-baselines3-zoo), because the default hyperparameters are tuned for Atari games only. And you should compare apples to apples... (that's what we did in #110, where we deactivated the SB2 DQN extensions and matched hyperparameters).
SB2 DQN defaults are not the same as SB3 defaults...

Obvious candidates are the missing extensions like PER and DDQN, but of course this is an assumption.

In case you want to use a more recent algorithm, we have QR-DQN in our contrib repo.

The goal of this comment is just to mention that progress in SB3 on this topic is much appreciated.

My idea would be to have a RAINBOW implementation in SB3, to keep DQN simple but have all the tricks in one algorithm. I will open an issue (and contributions are welcome ;)).

@NickLucche
Contributor Author

Hi,
sorry I didn't post any update. I'd gladly accept any help with running further experiments if you're still interested in the proposed feature as a "flag argument" to the DQN class.
I've had trouble running experiments due to the long training times and (personal) GPU availability.
The one I managed to run on Atlantis didn't show any improvement over the vanilla implementation.

[plot: Results_Atlantis]

@Miffyli
Collaborator

Miffyli commented Oct 22, 2021

@NickLucche

Could you try training on the VideoPinball environment (it has the highest gap between DQN and DDQN according to the original paper)? I also recommend doing the runs with sb3-zoo, as it automatically applies all the necessary wrappers for proper evaluation (this can easily go wrong).

@araffin
Member

araffin commented Apr 11, 2022

Closing in favor of #622.
