RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM #185

Closed · matinhosseiny opened this issue Jun 23, 2020 · 5 comments
Labels: bug (Something isn't working), Stale

@matinhosseiny

Before submitting a bug report, please be aware that your issue must be reproducible with all of the following, otherwise it is non-actionable, and we can not help you:

If this is a custom dataset/training question you must include your train*.jpg, test*.jpg and results.png figures, or we can not help you. You can generate these with utils.plot_results().
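
For reference, a minimal sketch of generating that figure; the import path below is an assumption (it may differ between repository versions) rather than something stated in this issue:

# Hedged sketch: plot_results() is referenced above; its module path here is assumed.
from utils.utils import plot_results  # assumed location; adjust for your repo version

plot_results()  # saves results.png (typically from the results.txt written during training)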

🐛 Bug

A clear and concise description of what the bug is.

To Reproduce (REQUIRED)

Input:

import torch

a = torch.tensor([5])
c = a / 0

Output:

Traceback (most recent call last):
  File "/Users/glennjocher/opt/anaconda3/envs/env1/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-be04c762b799>", line 5, in <module>
    c = a / 0
RuntimeError: ZeroDivisionError

Expected behavior

A clear and concise description of what you expected to happen.

Environment

If applicable, add screenshots to help explain your problem.

  • OS: [e.g. Ubuntu]
  • GPU [e.g. 2080 Ti]

Additional context

Add any other context about the problem here.

@matinhosseiny added the bug (Something isn't working) label on Jun 23, 2020

github-actions bot commented Jun 23, 2020

Hello @matinhosseiny, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

  • Cloud-based AI systems operating on hundreds of HD video streams in realtime.
  • Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
  • Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

@matinhosseiny (Author)

Epoch gpu_mem GIoU obj cls total targets img_size
0%| | 0/277 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 403, in <module>
    train(hyp)
  File "train.py", line 269, in train
    loss.backward()
  File "/home/matin/yolo5/lib/python3.6/site-packages/torch/tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/matin/yolo5/lib/python3.6/site-packages/torch/autograd/__init__.py", line 100, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM (operator() at /pytorch/aten/src/ATen/native/cudnn/Conv.cpp:1142)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x46 (0x7f89e9945536 in /home/matin/yolo5/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: + 0xf04ff2 (0x7f89eaca5ff2 in /home/matin/yolo5/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #2: + 0xf01f75 (0x7f89eaca2f75 in /home/matin/yolo5/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #3: + 0xf0340f (0x7f89eaca440f in /home/matin/yolo5/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #4: + 0xf07060 (0x7f89eaca8060 in /home/matin/yolo5/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #5: at::native::cudnn_convolution_backward_weight(c10::ArrayRef<long>, at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long, bool, bool) + 0x49 (0x7f89eaca82b9 in /home/matin/yolo5/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #6: + 0xf6dc10 (0x7f89ead0ec10 in /home/matin/yolo5/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0xfb1e88 (0x7f89ead52e88 in /home/matin/yolo5/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #8: at::native::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long, bool, bool, std::array<bool, 2ul>) + 0x2fc (0x7f89eaca8f6c in /home/matin/yolo5/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #9: + 0xf6d91b (0x7f89ead0e91b in /home/matin/yolo5/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #10: + 0xfb1ee4 (0x7f89ead52ee4 in /home/matin/yolo5/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #11: + 0x2c80736 (0x7f8a29234736 in /home/matin/yolo5/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #12: + 0x2ccff44 (0x7f8a29283f44 in /home/matin/yolo5/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #13: torch::autograd::generated::CudnnConvolutionBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x378 (0x7f8a28e4c908 in /home/matin/yolo5/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #14: + 0x2d89705 (0x7f8a2933d705 in /home/matin/yolo5/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #15: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x16f3 (0x7f8a2933aa03 in /home/matin/yolo5/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #16: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x3d2 (0x7f8a2933b7e2 in /home/matin/yolo5/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #17: torch::autograd::Engine::thread_init(int) + 0x39 (0x7f8a29333e59 in /home/matin/yolo5/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #18: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x7f8a35c7b5f8 in /home/matin/yolo5/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #19: + 0xbd6df (0x7f8a36b2a6df in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #20: + 0x76db (0x7f8a38f5c6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #21: clone + 0x3f (0x7f8a3929588f in /lib/x86_64-linux-gnu/libc.so.6)

@matinhosseiny changed the title from "I modified the data.yaml and" to "RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM" on Jun 23, 2020

glenn-jocher commented Jun 23, 2020

@matinhosseiny your issue seems to lack the minimum requirements for a proper response, or is insufficiently detailed for us to help you. Please note that most technical problems are due to:

  • Your changes to the default repository. If your issue is not reproducible in a fresh git clone version of this repository we can not debug it. Before going further run this code and ensure your issue persists:
sudo rm -rf yolov5  # remove existing
git clone https://github.com/ultralytics/yolov5 && cd yolov5  # clone latest
python detect.py  # verify detection
python train.py  # verify training (a few batches only)
# CODE TO REPRODUCE YOUR ISSUE HERE
  • Your custom data. If your issue is not reproducible with COCO data we can not debug it. Visit our Custom Training Tutorial for guidelines on training your custom data. Examine train_batch0.jpg and test_batch0.jpg for a sanity check of training and testing data.
  • Your environment. If your issue is not reproducible in our Google Cloud VM or Jupyter Notebook we can not debug it. Ensure you meet the requirements specified in the README: Unix, MacOS, or Windows with Python >= 3.7, PyTorch >= 1.5, etc. You can also use our Jupyter Notebook and our Docker Image to test your code in a working environment.

If none of these apply to you, we suggest you close this issue and raise a new one using the Bug Report template, providing screenshots and minimum viable code to reproduce your issue. Thank you!


github-actions bot commented Aug 1, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

tetsu-kikuchi commented Jun 2, 2021

For your information: in my case, this error occurred when there were multiple GPUs in my machine. When I added --device 0 to the python train.py command, the error went away and the code worked correctly.

It seems the problem was mixing two different GPU models. In my case I was using two GPUs:
GeForce GTX 1070 Ti
GeForce RTX 2080 Ti
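
As an illustrative sketch only (not code from this repository), the same single-GPU restriction can also be applied from Python by hiding the extra GPU before CUDA is initialized; CUDA_VISIBLE_DEVICES is a standard CUDA environment variable:

# Sketch: mirror the --device 0 workaround by exposing only GPU 0 to PyTorch.
# The environment variable must be set before torch initializes CUDA.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.device_count())  # expected: 1, since only the selected GPU is visible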
