This repository has been archived by the owner on Mar 19, 2021. It is now read-only.

Make the exponential decay lr schedule available #147

twuebi commented Oct 23, 2019

Right now, the exponential decay lr schedule is not available for sticker train and sticker pretrain. Once #145 is merged, it would make sense to make it available for both subcommands.

This may clutter the command line arguments a bit since we then have:

  • Plateau decay

    • lr_scale
    • lr_patience
  • Exponential decay

    • decay_rate
    • decay_exponent

Maybe it would make sense to move the learning-rate schedule related things to the config file.

@danieldk danieldk added this to the release-0.11 milestone Oct 23, 2019
danieldk commented Oct 23, 2019

No more things in the configuration file; there is too much stuff in there already that is only relevant to training. Maybe clap offers some functionality to only reveal options based on the value of some other option?

danieldk commented
There is requires; I'm not sure whether it hides an option when the requirement is not present:

https://kbknapp.github.io/clap-rs/clap/struct.Arg.html#method.requires

twuebi commented Oct 23, 2019

maybe group in conjunction with requires

danieldk commented Oct 23, 2019

Yep. It's worth trying whether they get hidden if the requirement is not given. But I guess that, at the very least, it would also group the arguments together in the usage information? (Which would go a long way towards not making it too confusing.)

twuebi commented Oct 23, 2019

https://kbknapp.github.io/clap-rs/clap/struct.ArgGroup.html

You can also do things such as name an ArgGroup as a conflict or requirement, meaning any of the arguments that belong to that group will cause a failure if present, or must be present, respectively.

Perhaps the most common use of ArgGroups is to require one and only one argument to be present out of a given set. Imagine that you had multiple arguments, and you want one of them to be required, but making all of them required isn't feasible because perhaps they conflict with each other. For example, let's say that you were building an application where one could set a given version number by supplying a string with an option argument, i.e. --set-ver v1.2.3; you also wanted to support automatically using a previous version number and simply incrementing one of the three numbers. So you create three flags --major, --minor, and --patch. All of these arguments shouldn't be used at one time, but you want to specify that at least one of them is used. For this, you can create a group.

https://kbknapp.github.io/clap-rs/clap/struct.App.html#method.arg_group

twuebi commented Nov 5, 2019

I looked a bit further into it; I'm not yet happy with it.

Examples are below.

For the grouping in the help output to work, we need to set AppSettings::DeriveDisplayOrder and call hide_default_value(true) on every argument that has a default value and should be followed by a newline. This is necessary since appending "\n " to the preceding help message was the only way to introduce a blank line for visual grouping (clap-rs/clap#1250). Arg::conflicts_with also conflicts with Arg::default_value; Arg::default_value_if can be used to get a conditional default value instead.

Grouping:

sticker-train 0.10.0
Train a sticker model

USAGE:
    sticker train [OPTIONS] <CONFIG> <TRAIN_DATA> <VALIDATION_DATA>

OPTIONS:
        --batchsize <BATCH_SIZE>    Batch size [default: 256]
        --continue <PARAMS>         Continue training from parameter files (e.g.: epoch-50)
        --lr <LR>                   Initial learning rate [default: 0.01]
        --warmup <N>                For the first N timesteps, the learning rate is linearly scaled up to LR.
                                     
        --plateau                   Plateau learning rate schedule
        --lr-patience <N>           Scale learning rate after N epochs without improvement
        --lr-scale <SCALE>          Value to scale the learning rate by
                                     
        --exponential               Exponential learning rate schedule
        --decay-rate <RATE>         coefficient of the exponential decay
        --decay-steps <STEPS>       global_step / steps is the exponent of the decay_rate
                                     
        --maxlen <N>                Ignore sentences longer than N tokens
        --shuffle_buffer <N>        Size of the buffer used for shuffling.
        --patience <N>              Maximum number of epochs without improvement [default: 15]
        --logdir <LOGDIR>           Write Tensorboard summaries to this directory.
    -h, --help                      Prints help information
    -V, --version                   Prints version information

ARGS:
    <CONFIG>             Sticker configuration
    <TRAIN_DATA>         Training data
    <VALIDATION_DATA>    Validation data

No grouping:

sticker-train 0.10.0
Train a sticker model

USAGE:
    sticker train <CONFIG> <TRAIN_DATA> <VALIDATION_DATA> <--plateau|--exponential>

OPTIONS:
        --batchsize <BATCH_SIZE>    Batch size [default: 256]
        --continue <PARAMS>         Continue training from parameter files (e.g.: epoch-50)
        --lr <LR>                   Initial learning rate [default: 0.01]
        --warmup <N>                For the first N timesteps, the learning rate is linearly scaled up to LR. [default:
                                    0]
        --plateau                   Plateau learning rate schedule
        --lr-patience <N>           Scale learning rate after N epochs without improvement
        --lr-scale <SCALE>          Value to scale the learning rate by
        --exponential               Exponential learning rate schedule
        --decay-rate <RATE>         coefficient of the exponential decay
        --decay-steps <STEPS>       global_step / steps is the exponent of the decay_rate
        --maxlen <N>                Ignore sentences longer than N tokens
        --shuffle_buffer <N>        Size of the buffer used for shuffling.
        --patience <N>              Maximum number of epochs without improvement [default: 15]
        --logdir <LOGDIR>           Write Tensorboard summaries to this directory.
    -h, --help                      Prints help information
    -V, --version                   Prints version information

ARGS:
    <CONFIG>             Sticker configuration
    <TRAIN_DATA>         Training data
    <VALIDATION_DATA>    Validation data

Making the args mutually exclusive works via conflicts_with, which can be specified on ArgGroup as well as Arg. Setting ArgGroup::multiple to true allows multiple arguments from the same group; by default it is false, which means only one argument from a group can be present.

            .group(ArgGroup::with_name(SCHEDULE_GROUP).required(true))
            .arg(
                Arg::with_name(PLATEAU)
                    .long("plateau")
                    .help("Plateau learning rate schedule")
                    .group(SCHEDULE_GROUP)
                    .requires(PLATEAU_GROUP),
            )
            .group(
                ArgGroup::with_name(PLATEAU_GROUP)
                    .multiple(true)
                    .conflicts_with_all(&[EXPONENTIAL, EXPONENTIAL_GROUP])
            )
            .arg(
                Arg::with_name(LR_PATIENCE)
                    .long("lr-patience")
                    .value_name("N")
                    .help("Scale learning rate after N epochs without improvement")
                    .group(PLATEAU_GROUP)
                    .default_value_if(PLATEAU, None, "5"),
            )
            .arg(
                Arg::with_name(LR_SCALE)
                    .long("lr-scale")
                    .value_name("SCALE")
                    .help("Value to scale the learning rate by")
                    .group(PLATEAU_GROUP)
                    .default_value_if(PLATEAU, None, "0.5"),
            )
            .arg(
                Arg::with_name(EXPONENTIAL)
                    .long("exponential")
                    .help("Exponential learning rate schedule")
                    .group(SCHEDULE_GROUP)
                    .requires(EXPONENTIAL_GROUP),
            )
            .group(
                ArgGroup::with_name(EXPONENTIAL_GROUP)
                    .multiple(true)
                    .conflicts_with_all(&[PLATEAU, PLATEAU_GROUP])
            )
            .arg(
                Arg::with_name(DECAY_RATE)
                    .long("decay-rate")
                    .value_name("RATE")
                    .help("coefficient of the exponential decay")
                    .group(EXPONENTIAL_GROUP)
                    .default_value_if(EXPONENTIAL, None, "0.998"),
            )
            .arg(
                Arg::with_name(DECAY_STEPS)
                    .long("decay-steps")
                    .value_name("STEPS")
                    .help("global_step / steps is the exponent of the decay_rate")
                    .group(EXPONENTIAL_GROUP)
                    .default_value_if(EXPONENTIAL, None, "100"),
            )

The error messages we're getting are sometimes helpful:

$ ./target/release/sticker train dep.conf ger/train.conll ger/dev.conll 
error: The following required arguments were not provided:
    <--plateau|--exponential>

USAGE:
    sticker train <CONFIG> <TRAIN_DATA> <VALIDATION_DATA> --batchsize <BATCH_SIZE> --lr <LR> --patience <N> --warmup <N> <--plateau|--exponential>

For more information try --help

Sometimes not so much:

$ ./target/release/sticker train dep.conf train.conll dev.conll --exponential --lr-patience 5 --lr-scale 0.3
error: The argument '--exponential' cannot be used with one or more of the other specified arguments

USAGE:
    sticker train <CONFIG> <TRAIN_DATA> <VALIDATION_DATA> --batchsize <BATCH_SIZE> --lr <LR> --patience <N> --warmup <N> <--decay-rate <RATE>|--decay-steps <STEPS>> <--lr-patience <N>|--lr-scale <SCALE>> <--plateau|--exponential>

For more information try --help
./target/release/sticker train dep.conf ger/train.conll ger/dev.conll --exponential --decay-rate 5 --lr-scale 0.3
error: The argument '--lr-scale <SCALE>' cannot be used with one or more of the other specified arguments

USAGE:
    sticker train <CONFIG> <TRAIN_DATA> <VALIDATION_DATA> --batchsize <BATCH_SIZE> --lr <LR> --patience <N> --warmup <N> <--decay-rate <RATE>|--decay-steps <STEPS>> <--lr-patience <N>|--lr-scale <SCALE>> <--plateau|--exponential>

For more information try --help
