
OOM when allocating tensor #7
Open
drobertduke opened this issue Sep 17, 2016 · 9 comments

@drobertduke

I have a 12GB GPU but attempting to train anything with the default settings produces an OOM on the first epoch. I had to dial the batch_size and the dilation_depth way down before it would even start. What settings are you using when you train?

I tensorflow/core/common_runtime/bfc_allocator.cc:689]      Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 83 Chunks of size 256 totalling 20.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 512 totalling 512B
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 15 Chunks of size 1024 totalling 15.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 65536 totalling 64.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 59 Chunks of size 262144 totalling 14.75MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 520704 totalling 508.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 105 Chunks of size 524288 totalling 52.50MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 13 Chunks of size 67108864 totalling 832.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 2 Chunks of size 67174400 totalling 128.12MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 67239936 totalling 64.12MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 67371008 totalling 64.25MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 67633152 totalling 64.50MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 68157440 totalling 65.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 134479872 totalling 128.25MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 269484032 totalling 257.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 541065216 totalling 516.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 1090519040 totalling 1.02GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 2147483648 totalling 2.00GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 2214592512 totalling 2.06GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 3726535936 totalling 3.47GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] Sum Total of in-use chunks: 10.68GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:698] Stats:
Limit:                 11715375924
InUse:                 11472467200
MaxInUse:              11473515776
NumAllocs:                     563
MaxAllocSize:           3980291328

W tensorflow/core/common_runtime/bfc_allocator.cc:270] ****************************************************************************************xxxxxxxxxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 2.00GiB.  See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:968] Resource exhausted: OOM when allocating tensor with shape[65536,256,32,1]
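
As a side note not from the original report: the size of the failing allocation can be checked by hand, and a float32 tensor of shape [65536, 256, 32, 1] works out to exactly the 2.00GiB the allocator complains about.

# Sanity check (added for reference, not part of the original report):
# a float32 tensor with shape [65536, 256, 32, 1] needs exactly 2 GiB,
# matching the "Ran out of memory trying to allocate 2.00GiB" message.
shape = (65536, 256, 32, 1)
num_elements = 1
for dim in shape:
    num_elements *= dim
bytes_needed = num_elements * 4        # 4 bytes per float32 element
print(bytes_needed)                    # 2147483648
print(bytes_needed / 2 ** 30, "GiB")   # 2.0 GiB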
@basveeling
Owner

I haven't trained with TensorFlow yet, but I'll look into it. In the meantime, try using Theano with cnmem enabled (THEANO_FLAGS='lib.cnmem=1' KERAS_BACKEND=theano python wavenet.py)
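
Not a suggestion from this thread, just an aside: on the TensorFlow backend the closest knob is the session's GPU options. A minimal sketch, assuming the TF 1.x and older Keras APIs current at the time; it only changes how memory is reserved and cannot fix a model that genuinely needs more than 12GB.

# Hedged sketch (TF 1.x / older Keras API assumed; not code from this repo):
# tell the TensorFlow allocator to grow on demand instead of reserving the
# whole GPU up front, then hand that session to Keras.
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

config = tf.ConfigProto()
config.gpu_options.allow_growth = True   # allocate GPU memory as needed
set_session(tf.Session(config=config))   # make Keras use this session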

@ibab

ibab commented Sep 17, 2016

The reason for this might be the same as for the TensorFlow implementation here: ibab/tensorflow-wavenet#4 (comment)
(I haven't looked at how Keras implements AtrousConvolution1D, though).

@basveeling
Owner

I was wondering why Keras requires the dilation values to be equal in both dimensions when using TensorFlow; it uses tf.nn.atrous_conv2d. Thanks for the heads-up, and nice work on the fix :)
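
For context, a sketch added here rather than taken from the thread: tf.nn.atrous_conv2d in TF 1.x takes a single integer rate that is applied to both spatial dimensions, which is why a 1D dilated convolution routed through it picks up the equal-dilation constraint mentioned above.

# Illustrative sketch (TF 1.x API assumed; not code from this repository):
# a dilated 1D convolution expressed via tf.nn.atrous_conv2d by treating the
# sequence as a height-1 image. `rate` is a single integer, so the dilation
# is necessarily the same along both spatial dimensions.
import tensorflow as tf

batch, length, channels, filters, dilation = 2, 1024, 32, 64, 8
x = tf.placeholder(tf.float32, [batch, 1, length, channels])  # [N, H=1, W, C]
w = tf.get_variable("w", [1, 2, channels, filters])           # 1x2 kernel
y = tf.nn.atrous_conv2d(x, w, rate=dilation, padding="SAME")
print(y.shape)  # expect (2, 1, 1024, 64)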

@basveeling
Owner

I'm closing this on the assumption that it has probably been fixed in TensorFlow by now. If not, please let me know.

@Shoshin23

Nope, this is not fixed in TensorFlow yet. I'm getting the exact same error as the OP. I'm trying the Theano backend now to see whether it works.

@raavianvesh

Did it work with the Theano backend?

@meridion

I'd like to let you all know that this is fixed in TensorFlow 1.10; it works like a charm. I'm using the current master unmodified (well, technically I modified a single line in dataset to make the code work in Python 3.x).

@basveeling
Owner

@meridion Thanks! Would you mind sending a pull request so other users can easily benefit from your fix?

@basveeling reopened this Sep 3, 2018
@HunterHantao

This is solved with Python 2.7 and tensorflow-gpu 1.8.0.
