This repository has been archived by the owner on Mar 19, 2021. It is now read-only.

Epoch summaries on tensorboard #144

Open
sebpuetz opened this issue Oct 22, 2019 · 3 comments

@sebpuetz
Collaborator

Add epoch-averaged (train and dev) values of the summarized metrics to tensorboard. After a few epochs it's hard to tell anything from the per-batch graphs.

@sebpuetz sebpuetz added the feature New feature or request label Oct 22, 2019
@danieldk danieldk self-assigned this Oct 23, 2019
@twuebi
Collaborator

twuebi commented Oct 23, 2019

I'd like to keep the per-batch summaries and add the per-epoch values as additional summaries.

Something along these lines:

epoch_loss_placeholder = tf.placeholder(name="epoch_loss_placeholder",
                                        dtype=tf.float32,
                                        shape=[])
epoch_acc_placeholder = tf.placeholder(name="epoch_acc_placeholder",
                                       dtype=tf.float32,
                                       shape=[])
val_epoch = tf.Variable(
    0,
    trainable=False,
    dtype=tf.int64,
    name="val_epoch")
with tf.control_dependencies([val_epoch.assign_add(1)]):
    epoch_val_summaries = [
        tf.contrib.summary.scalar(name="epoch_loss",
                                  tensor=epoch_loss_placeholder,
                                  step=val_epoch,
                                  family="val"),
        tf.contrib.summary.scalar(name="epoch_acc",
                                  tensor=epoch_acc_placeholder,
                                  step=val_epoch,
                                  family="val")]

...

with tf.compat.v1.variable_scope("summaries"):
    self.train_summaries = tf.group(train_summaries, name="train")
    self.val_summaries = tf.group(val_summaries, name="val")
    self.epoch_val_summaries = tf.group(epoch_val_summaries, name="epoch_val")

IMO the per-batch graphs can be informative for figuring out what's going on when things don't work, for instance by looking at the per-batch gradient norms or at spikes in the loss / accuracy.
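The epoch averages fed into those placeholders would have to be accumulated on the Python side during the epoch. A minimal sketch of such an accumulator (the `EpochMetric` name and API are hypothetical, not from the codebase; the commented-out `session.run` line shows where the average would be fed into the summary ops above):

```python
class EpochMetric:
    """Running mean of a per-batch metric over one epoch."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value, n=1):
        # Accumulate one batch value, weighted by the batch size n.
        self.total += value * n
        self.count += n

    def average(self):
        # Epoch average, e.g. the value for epoch_loss_placeholder.
        return self.total / self.count if self.count else 0.0


# Hypothetical per-epoch usage:
epoch_loss = EpochMetric()
for batch_loss, batch_size in [(0.5, 32), (0.3, 32), (0.4, 16)]:
    epoch_loss.update(batch_loss, batch_size)
# session.run(epoch_val_summaries,
#             feed_dict={epoch_loss_placeholder: epoch_loss.average()})
```

Weighting by batch size keeps the average correct when the last batch of an epoch is smaller than the rest.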

@danieldk
Member

At this point graph compatibility is everything. I guess the ramification here would be that we need three additional Optional ops in TaggerGraph, right?

@twuebi
Collaborator

twuebi commented Oct 23, 2019

If graph compatibility means loading an old model with a newly written graph, then we need 4 optional ops, since the variable val_epoch will be missing when calling the restore op of a new graph. So it'd also need to be a placeholder.

If it only means being able to load both graphs on the Rust side, then it's three optional ops.

All in all, it may be a good idea to rewrite the graph after training for inference, where a stable interface is needed, so that all of these compatibility problems would only apply to training graphs.
