
Make exponential learning-rate decay per step. #145

Merged
merged 1 commit into stickeritis:master on Oct 23, 2019

Conversation

@twuebi (Collaborator) commented on Oct 22, 2019

Make the decay in the ExponentialRate schedule based on the global step instead of the training epoch.

@@ -57,13 +57,13 @@ impl ExponentialDecay {
 ///
 /// If `staircase` is true, the exponent of the decay is
 /// computed using integer division. This has the effect that
-/// the learning rate only changes every `decay_epochs` epochs.
+/// the learning rate only changes every `decay_steps` steps.
Collaborator:
Not part of this change, but would generally be informative: Note that decay_steps not only influences the width of the steps but also directly affects the decay rate.

Collaborator Author:
Agreed. I'd also prefer a more descriptive name than decay_steps, which mostly makes sense in the staircase case. FWIW, the formula is included in the doc of the struct:

lr = initial_lr * decay_rate ^ (global_step / decay_steps)
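
For concreteness, here is a minimal sketch of that formula with optional staircasing, written in Rust. This is not sticker's actual implementation; the struct and method names below are made up for illustration.

    // Hypothetical, simplified schedule; names are illustrative only.
    struct ExponentialDecaySketch {
        initial_lr: f32,
        decay_rate: f32,
        decay_steps: usize,
        staircase: bool,
    }

    impl ExponentialDecaySketch {
        /// lr = initial_lr * decay_rate ^ (global_step / decay_steps)
        fn learning_rate(&self, global_step: usize) -> f32 {
            let exponent = if self.staircase {
                // Integer division: the exponent, and thus the learning rate,
                // only changes every `decay_steps` steps.
                (global_step / self.decay_steps) as f32
            } else {
                global_step as f32 / self.decay_steps as f32
            };
            self.initial_lr * self.decay_rate.powf(exponent)
        }
    }

Note how decay_steps sits in the exponent's denominator, which is why it also scales the overall decay rate, as pointed out above.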

Collaborator Author:
Anyway, the names we have now mirror those of https://www.tensorflow.org/api_docs/python/tf/compat/v1/train/exponential_decay.

OTOH, PyTorch's exponential decay avoids decay_steps entirely by not offering staircasing:

class ExponentialLR(_LRScheduler):
    """Set the learning rate of each parameter group to the initial lr decayed
    by gamma every epoch. When last_epoch=-1, sets initial lr as lr.

    Args:
        optimizer (Optimizer): Wrapped optimizer.
        gamma (float): Multiplicative factor of learning rate decay.
        last_epoch (int): The index of last epoch. Default: -1.
    """

    def __init__(self, optimizer, gamma, last_epoch=-1):
        self.gamma = gamma
        super(ExponentialLR, self).__init__(optimizer, last_epoch)

    def get_lr(self):
        return [base_lr * self.gamma ** self.last_epoch
                for base_lr in self.base_lrs]

I am now wondering whether we should just opt for the simple version.
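
For comparison, the "simple version" without staircasing could be sketched in Rust roughly as follows (a hedged sketch, not a proposal for sticker's actual API; the names are hypothetical, and gamma plays the role of decay_rate with no decay_steps):

    // Sketch of a PyTorch-style exponential schedule; names are illustrative only.
    struct SimpleExponentialDecay {
        initial_lr: f32,
        gamma: f32,
    }

    impl SimpleExponentialDecay {
        fn learning_rate(&self, step: usize) -> f32 {
            // The learning rate is decayed by a constant factor at every step;
            // there is no decay_steps parameter and no staircasing.
            self.initial_lr * self.gamma.powi(step as i32)
        }
    }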

Member:
If simple means no staircasing, then no: please leave staircasing in. I had some models in the past where staircasing improved performance a bit.

Collaborator Author:
With per-epoch or per-batch decay?
I guess it doesn't hurt to have it.

Member:
That was per epoch, but now it can be per batch. In the end it shouldn't matter too much: one has to adjust the hyperparameters anyway to get the same schedule as before (old steps * number of batches in the training data).
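
To make that conversion concrete, a small sketch with made-up numbers (not taken from any actual configuration):

    fn main() {
        // Hypothetical values, purely for illustration.
        let initial_lr = 0.01f32;
        let decay_rate = 0.9f32;
        let batches_per_epoch = 500;

        // The old schedule counted decay_steps in epochs.
        let old_decay_epochs = 2;
        // Per-step equivalent: old steps * number of batches in the training data.
        let new_decay_steps = old_decay_epochs * batches_per_epoch; // 1000

        // At the end of epoch 4 (global step 2000) both schedules agree.
        let lr_per_epoch = initial_lr * decay_rate.powf(4.0 / old_decay_epochs as f32);
        let lr_per_step = initial_lr * decay_rate.powf(2000.0 / new_decay_steps as f32);
        assert!((lr_per_epoch - lr_per_step).abs() < 1e-6);
    }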

Collaborator Author:
Maybe I am missing something, but right now exponential decay is only available when using sticker as a library?

Member:
Yes. Plateau replaced it as the default, but I never added an option to switch to exponential decay.

sticker/src/tensorflow/lr.rs: an outdated review comment was marked as resolved.
Make the decay in the ExponentialRate schedule based on the global
step instead of training epoch.
@danieldk danieldk removed the request for review from DiveFish October 23, 2019 09:54
@danieldk danieldk merged commit 98a7490 into stickeritis:master Oct 23, 2019