
CUDA driver error inside Docker #306

Closed
brolinA opened this issue Jun 14, 2024 · 7 comments

Comments


brolinA commented Jun 14, 2024

Hi, I am trying to run the repository in Docker on Ubuntu 20.04.

The Docker setup was successful and I was able to run the Jackal robot as expected. Then I tried to run wild_visual_navigation_ros inside the container and got the following error.

Command used:
roslaunch wild_visual_navigation_ros wild_visual_navigation.launch

Error:
[screenshot: cuda_error]

This is the native CUDA driver on my Ubuntu 20.04 system:

[screenshot of host nvidia-smi output]

Should the CUDA version match between the Docker image and my native system, or is there something else causing the error?

mmattamala (Collaborator) commented Jun 16, 2024

Hi @brolinA, thanks for reporting this.
Can you check if you can:

  • run nvidia-smi inside the container?
  • run python3 -c "import torch; print(torch.cuda.is_available())" inside the container?

I share your concern that there could be some incompatibility or driver issue.
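
For a slightly more detailed check, something along these lines (a rough sketch, assuming a standard PyTorch install inside the container) also prints the CUDA version torch was built against:

# cuda_check.py -- quick diagnostic sketch (assumes PyTorch is installed)
import torch

print("torch version:         ", torch.__version__)
print("CUDA available:        ", torch.cuda.is_available())
print("CUDA torch built with: ", torch.version.cuda)
if torch.cuda.is_available():
    print("device:                ", torch.cuda.get_device_name(0))
    print("cuDNN version:         ", torch.backends.cudnn.version())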

brolinA (Author) commented Jun 18, 2024

Hi @mmattamala,
Thank you for the response. Unfortunately, I am not able to enter the container after restarting my system. I keep getting the following error:

[+] Running 0/0
 ⠋ Container docker-wvn_nvidia-1  Recreate                                                                                                                                                                    0.0s 
Error response from daemon: unknown or invalid runtime name: nvidia

I have made sure that both nvidia-docker2 and nvidia-container-toolkit are installed, but it still doesn't work. It was working before, but stopped working after I restarted the PC.

Any idea how to tackle this issue?

mmattamala (Collaborator) commented

I think something is messed up with the NVIDIA Docker configuration. Can you check that all the steps here were followed correctly? https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#configuring-docker

Similarly, this thread might have some tips: https://stackoverflow.com/questions/52865988/nvidia-docker-unknown-runtime-specified-nvidia
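
In particular, that error usually means the nvidia runtime was never registered with the Docker daemon. A quick sanity check (a sketch, assuming the default daemon config path /etc/docker/daemon.json) is to confirm the runtime entry is actually there:

# check_nvidia_runtime.py -- sketch: is the nvidia runtime registered with dockerd?
import json

with open("/etc/docker/daemon.json") as f:  # default Docker daemon config location
    config = json.load(f)

runtimes = config.get("runtimes", {})
if "nvidia" in runtimes:
    print("nvidia runtime registered:", runtimes["nvidia"])
else:
    print("nvidia runtime NOT registered; re-run the container-toolkit Docker configuration step")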

brolinA (Author) commented Jun 18, 2024

Hi @mmattamala,
Thank you. I was able to fix it.

Here is the output of the commands you mentioned:

[screenshot of nvidia-smi and torch.cuda.is_available() output inside the container]

mmattamala (Collaborator) commented

Good to know it helped.

Coming back to the original issue, it seems to be a mismatch between the driver in the host system and the one in the container (the nvidia-smi outputs don't match).

I'm a bit short on time at the moment to take a deeper look, but I recommend searching for similar Docker issues.

andreschreiber commented

Adding on to this -- I had the exact same issue (same error messages).
Using nvidia-smi in the container showed CUDA 12.3 whereas outside of the container it showed CUDA 12.2.
Changing the Dockerfile to use:
FROM nvidia/cuda:12.2.2-runtime-ubuntu20.04 as base
fixed the issue for me.
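
To confirm the driver and the container now line up after rebuilding the image, a rough check along these lines (a sketch, assuming nvidia-smi is on the PATH inside the container) can be run:

# compare_cuda_versions.py -- sketch: driver-reported CUDA vs. the CUDA torch was built with
import re
import subprocess

import torch

smi = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
match = re.search(r"CUDA Version:\s*([\d.]+)", smi)  # parse the banner line
driver_cuda = match.group(1) if match else "unknown"

print("driver-supported CUDA (nvidia-smi):", driver_cuda)
print("CUDA torch was built with:         ", torch.version.cuda)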

mmattamala (Collaborator) commented

Thanks @andreschreiber for the proposed fix! I'll close the issue.
