Make exponential learning-rate decay per step. #145
Conversation
@@ -57,13 +57,13 @@ impl ExponentialDecay {
 ///
 /// If `staircase` is true, the exponent of the decay is
 /// computed using integer division. This has the effect that
-/// the learning rate only changes every `decay_epochs` epochs.
+/// the learning rate only changes every `decay_steps` steps.
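For readers skimming the diff, a minimal sketch of the behaviour this doc comment describes (assumed names and signature, not the actual sticker API):

fn exponential_decay(
    initial_lr: f32,
    decay_rate: f32,
    decay_steps: usize,
    staircase: bool,
    global_step: usize,
) -> f32 {
    // With staircase, the exponent only increases every `decay_steps` steps
    // (integer division); without it, the decay is applied continuously.
    let exponent = if staircase {
        (global_step / decay_steps) as f32
    } else {
        global_step as f32 / decay_steps as f32
    };
    initial_lr * decay_rate.powf(exponent)
}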
Not part of this change, but would generally be informative: Note that decay_steps
not only influences the width of the steps but also directly affects the decay rate.
Agreed. I'd also prefer a more descriptive name than decay_steps, which mostly makes sense in the staircase case. FWIW, the formula is included in the doc of the struct:
lr = initial_lr * decay_rate ^ (global_step / decay_steps)
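To illustrate the point about decay_steps also affecting the decay rate, with made-up numbers:

// Made-up numbers: at global_step = 1000, halving decay_steps decays twice as fast.
let (initial_lr, decay_rate): (f32, f32) = (0.01, 0.9);
let lr_wide = initial_lr * decay_rate.powf(1000.0 / 1000.0);  // decay_steps = 1000 -> 0.009
let lr_narrow = initial_lr * decay_rate.powf(1000.0 / 500.0); // decay_steps = 500  -> 0.0081
assert!(lr_narrow < lr_wide);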
Anyway, the names we have now mirror those of https://www.tensorflow.org/api_docs/python/tf/compat/v1/train/exponential_decay.
OTOH, PyTorch's exponential decay completely avoids decay_steps
by not offering staircasing:
class ExponentialLR(_LRScheduler):
    """Set the learning rate of each parameter group to the initial lr decayed
    by gamma every epoch. When last_epoch=-1, sets initial lr as lr.

    Args:
        optimizer (Optimizer): Wrapped optimizer.
        gamma (float): Multiplicative factor of learning rate decay.
        last_epoch (int): The index of last epoch. Default: -1.
    """

    def __init__(self, optimizer, gamma, last_epoch=-1):
        self.gamma = gamma
        super(ExponentialLR, self).__init__(optimizer, last_epoch)

    def get_lr(self):
        return [base_lr * self.gamma ** self.last_epoch
                for base_lr in self.base_lrs]
I am now wondering whether we should just opt for the simple version.
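For comparison, a hypothetical "simple version" in the PyTorch style would drop decay_steps and staircase entirely (a sketch, not a proposed API):

fn simple_exponential_decay(initial_lr: f32, gamma: f32, step: usize) -> f32 {
    // Hypothetical simple variant: multiply by gamma once per step.
    initial_lr * gamma.powi(step as i32)
}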
If simple means no staircasing, then no, please leave staircasing in. I had some models in the past where staircasing improved performance a bit.
With per-epoch or per-batch decay?
I guess it doesn't hurt to have it.
That was per epoch. But now it can be per batch. In the end it shouldn't matter too much: one has to adjust the hyperparameters anyway to get the same schedule as before (old steps * number of batches in the training data).
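A quick worked example of that conversion, with made-up numbers:

// Made-up numbers: a schedule that previously decayed every 10 epochs,
// with 500 batches per epoch, becomes a per-step schedule over 5000 steps.
let old_decay_epochs: usize = 10;
let batches_per_epoch: usize = 500;
let new_decay_steps = old_decay_epochs * batches_per_epoch;
assert_eq!(new_decay_steps, 5000);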
Maybe I am missing something, but right now exponential decay is only available when using sticker as a library?
Yes. Plateau replaced it as the default, but I never added an option to switch to exponential decay.
269114a to d8b5ad4: Make the decay in the ExponentialRate schedule based on the global step instead of training epoch.
d8b5ad4 to e7466e9: Make the decay in the ExponentialRate schedule based on the global step instead of the training epoch.