Fine-Tuning Fails With Exception Between Epoch1 and Epoch2 #72

Open
shashankiyer opened this issue Nov 27, 2018 · 1 comment

shashankiyer commented Nov 27, 2018

I have been trying to use this code to fine-tune the network to classify images from the CIFAR-10 dataset. However, I get the following error:

Traceback (most recent call last):
File "/home/shashankiyer/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/home/shashankiyer/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/shashankiyer/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: fc6/weights_0
[[{{node fc6/weights_0}} = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](fc6/weights_0/tag, fc6/weights/read)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "finetune.py", line 202, in <module>
keep_prob: 1.})
File "/home/shashankiyer/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/shashankiyer/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/shashankiyer/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/home/shashankiyer/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: fc6/weights_0
[[node fc6/weights_0 (defined at finetune.py:137) = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](fc6/weights_0/tag, fc6/weights/read)]]

Caused by op 'fc6/weights_0', defined at:
File "finetune.py", line 137, in <module>
tf.summary.histogram(var.name, var)
File "/home/shashankiyer/anaconda3/lib/python3.6/site-packages/tensorflow/python/summary/summary.py", line 187, in histogram
tag=tag, values=values, name=scope)
File "/home/shashankiyer/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_logging_ops.py", line 284, in histogram_summary
"HistogramSummary", tag=tag, values=values, name=name)
File "/home/shashankiyer/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/shashankiyer/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/shashankiyer/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/home/shashankiyer/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Nan in summary histogram for: fc6/weights_0
[[node fc6/weights_0 (defined at finetune.py:137) = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](fc6/weights_0/tag, fc6/weights/read)]]

These are the lines in the code that cause this:

# Add gradients to summary
for gradient, var in grads_and_vars:
    tf.summary.histogram(var.name + '/gradient', gradient)

# Add the variables we train to the summary
for var in var_list:
    tf.summary.histogram(var.name, var)

I am running TensorFlow 1.12.0.
Any pointers would be greatly appreciated.

kratzert (Owner) commented Mar 7, 2019

NaN values are almost always a hint that your learning rate is too high. Try decreasing it to, e.g., 1e-3 or 1e-4.
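To illustrate why a learning rate that is too high can end in NaN, here is a minimal toy sketch in plain Python. It is not the code from finetune.py: the function name `first_non_finite_step` and the objective f(w) = w**2 are assumptions chosen only for illustration. With a large step size each update overshoots the minimum, the weight grows until it overflows to infinity, and the next update computes inf - inf, which is NaN. A histogram summary of such a weight would then fail much like the one in the traceback above.

```python
import math


def first_non_finite_step(learning_rate, steps=5000, w=1.0):
    """Run plain gradient descent on the toy objective f(w) = w**2.

    Returns the index of the first step at which w is no longer finite
    (inf or NaN), or None if w stays finite for all steps.
    This is a hypothetical helper for illustration only.
    """
    for step in range(steps):
        grad = 2.0 * w                  # derivative of w**2
        w = w - learning_rate * grad    # vanilla gradient descent update
        if not math.isfinite(w):
            return step
    return None


if __name__ == "__main__":
    # 1.5 is deliberately too high for this objective; 1e-3 and 1e-4 are
    # the values suggested in the comment above.
    for lr in (1.5, 1e-3, 1e-4):
        print("learning rate", lr, "-> first non-finite step:",
              first_non_finite_step(lr))
```

In this toy setting the high rate diverges while the two smaller rates stay finite. In a real network the threshold depends on the model and the data, so the suggested values are a starting point rather than a guarantee.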
