Hanging on Creating Domain #12 #39

Open
darrylmorley opened this issue Jan 30, 2024 · 1 comment

darrylmorley commented Jan 30, 2024

Hi,

I'm having the same issue here with virt-manager hanging on creating domain.

dmesg is reporting: NVRM: Attempting to remove device 0000:01:00.0 with non-zero usage count!

I have changed the graphics mode to compute as recommended in issue #12. nvidia-smi indicates that there are no processes running on the Nvidia GPU, but dmesg still reports a non-zero usage count.

The IOMMU information for the GPU is:

IOMMU Group 2 01:00.0 3D controller [0302]: NVIDIA Corporation TU117M [GeForce GTX 1650 Ti Mobile] [10de:1f95] (rev a1)
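
For anyone comparing against their own machine, the group membership shown above can be read straight from sysfs. This is a minimal sketch, assuming the standard `/sys/kernel/iommu_groups` layout; the function name `list_iommu_groups` is made up for illustration, and it prints only the group number and PCI address, not the full lspci description.

```bash
#!/bin/bash

# Print "IOMMU Group <n> <pci-address>" for every device in every group.
# The optional first argument overrides the sysfs root (useful for testing).
list_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}"
    local group dev
    for group in "$root"/*; do
        # Skip anything that is not a group directory (or an unmatched glob).
        [ -d "$group/devices" ] || continue
        for dev in "$group"/devices/*; do
            [ -e "$dev" ] || continue
            printf 'IOMMU Group %s %s\n' "${group##*/}" "${dev##*/}"
        done
    done
}
```

Calling `list_iommu_groups` with no argument reads the real sysfs tree. If other devices share the GPU's group, they generally need to be detached from their host drivers as well before passthrough works.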

I have added the scripts bind_vfio.sh & unbind_vfio.sh, and made them executable.

The contents of bind_vfio.sh are:

#!/bin/bash

## Load the config file
source "/etc/libvirt/hooks/kvm.conf"

## Load vfio
modprobe vfio
modprobe vfio_iommu_type1
modprobe vfio_pci

## Unbind gpu from nvidia and bind to vfio
virsh nodedev-detach $VIRSH_GPU_VIDEO

The contents of unbind_vfio.sh are:

#!/bin/bash

## Load the config file
source "/etc/libvirt/hooks/kvm.conf"

## Unbind gpu from vfio and bind to nvidia
virsh nodedev-reattach $VIRSH_GPU_VIDEO

## Unload vfio
modprobe -r vfio_pci
modprobe -r vfio_iommu_type1
modprobe -r vfio

kvm.conf contents are simply:

VIRSH_GPU_VIDEO=pci_0000_01_00_0
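
To see whether `virsh nodedev-detach` actually moved the device, it can help to check which kernel driver is bound before and after the hook runs. The following is a rough sketch, not part of the original scripts: `nodedev_to_pci_addr` and `bound_driver` are hypothetical helper names, and the sysfs path `/sys/bus/pci/devices` is assumed to be the standard location.

```bash
#!/bin/bash

# Convert a libvirt node device name (pci_0000_01_00_0) to a PCI address
# (0000:01:00.0). Input that does not match the pattern is printed unchanged.
nodedev_to_pci_addr() {
    printf '%s\n' "$1" | sed -E \
        's/^pci_([0-9a-fA-F]{4})_([0-9a-fA-F]{2})_([0-9a-fA-F]{2})_([0-7])$/\1:\2:\3.\4/'
}

# Print the name of the driver bound to a PCI address, or "none".
# The optional second argument overrides the sysfs root (useful for testing).
bound_driver() {
    local addr="$1" root="${2:-/sys/bus/pci/devices}" link
    # The "driver" entry is a symlink that exists only while a driver is bound.
    link=$(readlink "$root/$addr/driver" 2>/dev/null) || { echo none; return 0; }
    echo "${link##*/}"
}
```

With the kvm.conf above, `bound_driver "$(nodedev_to_pci_addr "$VIRSH_GPU_VIDEO")"` would be expected to print `nvidia` before a successful detach and `vfio-pci` afterwards.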

Where to go from here?

Thanks

P.S. I saw that newer drivers could cause issues, so I downgraded to nvidia-470 and tried again. It still hangs at the same point. dmesg:

NVRM: GPU at PCI:0000:01:00: GPU-832b7f23-880e-d656-20c0-331ca6c8873a
[  109.630956] NVRM: Xid (PCI:0000:01:00): 79, pid=3465, GPU has fallen off the bus.
[  109.630958] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
[  243.446680] VFIO - User Level meta-driver version: 0.3
[  243.464970] NVRM: Attempting to remove minor device 0 with non-zero usage count!

P.P.S. I have further information. After adding set -x at the beginning of bind_vfio.sh, there are further logs in dmesg:

[   26.485239] nvidia 0000:01:00.0: not ready 8191ms after resume; waiting
[   34.933241] nvidia 0000:01:00.0: not ready 16383ms after resume; waiting
[   52.597123] nvidia 0000:01:00.0: not ready 32767ms after resume; waiting
[   87.412701] nvidia 0000:01:00.0: not ready 65535ms after resume; giving up
[   87.412759] nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[   87.473074] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:667)
[   87.473112] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[   87.473413] nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[   87.473638] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:667)
[   87.473669] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[   87.473846] nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[   87.473949] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:667)
[   87.473980] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[   87.474025] nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[   87.474119] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:667)
[   87.474148] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[   87.474309] nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[   87.474426] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:667)
[   87.474457] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[   87.474603] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:667)
[   87.474632] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[   87.474912] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:667)
[   87.474937] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[   87.475108] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x22:0x56:667)
[   87.475137] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[  145.364413] nvidia 0000:01:00.0: not ready 1023ms after resume; waiting
[  146.420419] nvidia 0000:01:00.0: not ready 2047ms after resume; waiting
[  148.596426] nvidia 0000:01:00.0: not ready 4095ms after resume; waiting
[  152.948405] nvidia 0000:01:00.0: not ready 8191ms after resume; waiting
[  161.396346] nvidia 0000:01:00.0: not ready 16383ms after resume; waiting
[  179.572264] nvidia 0000:01:00.0: not ready 32767ms after resume; waiting
[  214.387968] nvidia 0000:01:00.0: not ready 65535ms after resume; giving up
[  214.388023] nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  214.410562] VFIO - User Level meta-driver version: 0.3
[  214.429107] nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  214.429128] nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  214.429312] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  214.429326] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  214.429365] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  214.552809] audit: type=1400 audit(1706607632.349:51): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-aab4928c-15e6-4b1b-aab1-de4fcd2b24af" pid=4628 comm="apparmor_parser"
[  214.681154] audit: type=1400 audit(1706607632.477:52): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-aab4928c-15e6-4b1b-aab1-de4fcd2b24af" pid=4631 comm="apparmor_parser"
[  214.819618] audit: type=1400 audit(1706607632.613:53): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-aab4928c-15e6-4b1b-aab1-de4fcd2b24af" pid=4640 comm="apparmor_parser"
[  216.980015] vfio-pci 0000:01:00.0: not ready 1023ms after resume; waiting
[  218.035803] vfio-pci 0000:01:00.0: not ready 2047ms after resume; waiting
[  220.275818] vfio-pci 0000:01:00.0: not ready 4095ms after resume; waiting
[  224.627829] vfio-pci 0000:01:00.0: not ready 8191ms after resume; waiting
[  233.075664] vfio-pci 0000:01:00.0: not ready 16383ms after resume; waiting
[  251.251461] vfio-pci 0000:01:00.0: not ready 32767ms after resume; waiting
[  286.067103] vfio-pci 0000:01:00.0: not ready 65535ms after resume; giving up
[  286.067158] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  286.067542] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  286.137034] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  286.137047] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  286.137052] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  286.137195] nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  286.137281] NVRM: This is a 64-bit BAR mapped above 4GB by the system
               NVRM: BIOS or the Linux kernel, but the PCI bridge
               NVRM: immediately upstream of this GPU does not define
               NVRM: a matching prefetchable memory window.
[  286.137283] NVRM: This may be due to a known Linux kernel bug.  Please
               NVRM: see the README section on 64-bit BARs for additional
               NVRM: information.
[  286.137284] nvidia: probe of 0000:01:00.0 failed with error -1
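
The log above shows the device stuck in D3cold under both the nvidia and vfio-pci drivers. A small sketch for inspecting the power-management state that sysfs exposes for the device follows. The function name `pci_power_report` is hypothetical, and the `power_state` attribute is only present on newer kernels, so missing attributes are reported rather than treated as errors.

```bash
#!/bin/bash

# Print the power-related sysfs attributes of one PCI device.
# The optional second argument overrides the sysfs root (useful for testing).
pci_power_report() {
    local addr="$1" root="${2:-/sys/bus/pci/devices}" dev attr
    dev="$root/$addr"
    for attr in power_state power/runtime_status power/control d3cold_allowed; do
        if [ -r "$dev/$attr" ]; then
            printf '%s=%s\n' "$attr" "$(cat "$dev/$attr")"
        else
            # Attribute missing on this kernel or for this device.
            printf '%s=unavailable\n' "$attr"
        fi
    done
}
```

For example, `pci_power_report 0000:01:00.0`. Writing `on` to `power/control` (as root) disables runtime power management for the device. Whether that avoids the D3cold state seen here is untested and only a guess.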

virt-manager now exits cleanly with an error, without hanging:

Unable to complete install: 'internal error: Unknown PCI header type '127' for device '0000:01:00.0''

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 72, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/createvm.py", line 2008, in _do_async_install
    installer.start_install(guest, meter=meter)
  File "/usr/share/virt-manager/virtinst/install/installer.py", line 695, in start_install
    domain = self._create_guest(
  File "/usr/share/virt-manager/virtinst/install/installer.py", line 637, in _create_guest
    domain = self.conn.createXML(initial_xml or final_xml, 0)
  File "/usr/lib/python3/dist-packages/libvirt.py", line 4400, in createXML
    raise libvirtError('virDomainCreateXML() failed')
libvirt.libvirtError: internal error: Unknown PCI header type '127' for device '0000:01:00.0'
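
A header type of 127 is the value that results when the header-type byte of PCI config space reads back as 0xFF and the top (multi-function) bit is masked off. That would be consistent with the "device inaccessible" messages in dmesg, since an unreachable PCI device typically returns all ones. The sketch below reads that byte from a config-space file. The function name `pci_header_type` is hypothetical, and applying the 0x7f mask is an assumption about how the reported number is derived.

```bash
#!/bin/bash

# Print the PCI header type (config-space byte 0x0e, low 7 bits) from a
# config-space file such as /sys/bus/pci/devices/<addr>/config.
pci_header_type() {
    local config="$1" raw
    # od: no address column, unsigned 1-byte decimal, skip 14 bytes, read 1.
    raw=$(od -An -tu1 -j14 -N1 "$config" 2>/dev/null | tr -d ' \n')
    [ -n "$raw" ] || return 1
    # Mask off bit 7, which flags a multi-function device.
    echo $(( raw & 127 ))
}
```

Running `pci_header_type /sys/bus/pci/devices/0000:01:00.0/config` on a healthy endpoint device would normally print `0`. A result of `127` suggests the device is not responding on the bus at all.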

revoltez commented May 7, 2024

I had a similar issue. In my case I was using the graphics card for rendering on my host, so it was not getting released. I had to use the iGPU for rendering and leave my dGPU idle so that it could be available.
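
One way to check whether something on the host still holds the dGPU is to look for processes with `/dev/nvidia*` device nodes open. This is a minimal sketch that scans `/proc` directly. The function name `holders_of` is made up for illustration, and the assumption that an open device node is what keeps the usage count non-zero is not confirmed in this thread.

```bash
#!/bin/bash

# Print "<pid> <path>" for every open file descriptor whose target matches
# a shell pattern. The optional second argument overrides the /proc root
# (useful for testing). Run as root to see other users' processes.
holders_of() {
    local pattern="$1" proc_root="${2:-/proc}" fd target pid
    for fd in "$proc_root"/[0-9]*/fd/*; do
        # Descriptors can vanish between globbing and readlink; skip those.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            $pattern)
                pid=${fd#"$proc_root"/}
                echo "${pid%%/*} $target"
                ;;
        esac
    done | sort -u
}
```

For example, `holders_of '/dev/nvidia*'` lists the PIDs to inspect. An empty result means no process visible to the caller has those nodes open.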
