
Training loop abstraction #5

Open

jimfleming opened this issue Nov 21, 2018 · 0 comments

Labels: enhancement (New feature or request)

jimfleming (Contributor) commented Nov 21, 2018

Just thinking in writing... I'll put this into an RFP when I have a clearer idea of what this should look like. For now this issue is just a placeholder.

Thinking about this from first principles: setting aside reusable utility functions (which I think we have a good plan for), most of the code I write is in data preprocessing and the training script. Data preprocessing is project-specific, so we can set that aside for now. Inference is easy and project-specific. The models themselves are easy. Loading the data is easy thanks to tf.data.Dataset. The training script really does need some good patterns, at a minimum.

Every single training script, however, involves nearly identical code.

  1. Initialize argument parsing
  2. Create artifacts directory if it doesn't exist
  3. Enable Eager
  4. Set random seeds everywhere...
  5. Initialize the environment
  6. Initialize hyperparameters and save them
  7. Initialize optimizers and global step
  8. Initialize models
  9. Initialize checkpoint saver
  10. Initialize summary writer
  11. Big training loop...

Now, I don't think I really want to abstract away any of 1–10, despite it being ~100 lines of code written for every project, because it's clear and easy to set up. But 11 ("Big training loop...") is usually a bunch of loops that do specific, varying, but similar things:

For some number of iterations:

  • (optionally) Compute some training rollouts
  • Iterate over batches in a dataset for training
  • Compute losses using the model(s) and training data
  • Compute and update gradients (maybe clipping)
  • Save training summaries (either per batch or aggregated over the dataset)
  • Save checkpoints
  • (optionally) Compute some evaluation rollouts
  • Save evaluation summaries (either per batch or aggregated over the dataset)
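The loop above can be sketched framework-agnostically. Every name here (`fit`, `loss_fn`, `writer`, `saver`) is a stand-in for project-specific code, not a proposed API; in TF Eager the loss/gradient step would use `tf.GradientTape`.

```python
def fit(model, optimizer, train_data, eval_data, n_epochs,
        loss_fn, writer=None, saver=None):
    """Skeleton of the training loop; every argument is a placeholder."""
    for epoch in range(n_epochs):
        # Iterate over batches in a dataset for training
        for inputs, targets in train_data:
            # Compute losses using the model(s) and training data
            loss, grads = loss_fn(model, inputs, targets)
            # Compute and update gradients (maybe clipping)
            optimizer(model, grads)
            # Save training summaries (per batch here)
            if writer:
                writer("train/loss", loss, epoch)
        # Save checkpoints
        if saver:
            saver(model, epoch)
        # Evaluation summaries, aggregated over the dataset
        eval_losses = [loss_fn(model, x, y)[0] for x, y in eval_data]
        if writer and eval_losses:
            writer("eval/loss", sum(eval_losses) / len(eval_losses), epoch)
    return model
```

The optional rollout phases for RL would slot in before the training and evaluation passes.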

This covers almost any algorithm we'll write, from linear regression to meta-RL. The training loop essentially does what Estimator used to do before Eager, but estimators had a terribly inflexible API designed with only supervised machine learning in mind. I want a training loop abstraction that is as flexible as the current imperative approach, with less redundancy and less chance for error.
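One possible shape, purely speculative, is a minimal driver that treats each of the phases listed above as an optional, pluggable callable, so RL rollouts, supervised epochs, or eval-only runs all fit the same loop (the class and phase names are invented for illustration):

```python
class TrainingLoop:
    """Speculative sketch: each phase is an optional callable run once
    per iteration, in a fixed order. Omitted phases are skipped."""

    PHASES = ("train_rollouts", "train_step", "train_summary",
              "checkpoint", "eval_rollouts", "eval_summary")

    def __init__(self, **phases):
        unknown = set(phases) - set(self.PHASES)
        if unknown:
            raise ValueError(f"unknown phases: {unknown}")
        self.phases = phases
        self.history = []  # (iteration, phase) records, for inspection

    def run(self, n_iterations):
        for it in range(n_iterations):
            for name in self.PHASES:
                fn = self.phases.get(name)
                if fn is not None:
                    self.history.append((it, name))
                    fn(it)
```

The flexibility claim is that, unlike Estimator, nothing here assumes supervised learning: a bandit experiment might pass only `train_rollouts` and `train_step`, while a supervised job passes `train_step`, `checkpoint`, and the summary phases.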

@jimfleming jimfleming added the enhancement New feature or request label Jan 14, 2019