Reattach the dGPU to the host after VM is closed

I’ve been trying to pass through the dGPU of my FW16 to a Windows VM, as I unfortunately still need Windows for some apps. I have been following this tutorial, but running the script to find the IOMMU groups and their PCI IDs only shows two IDs for the AMD GPU. I decided to modify the libvirt hooks not to use the serial and USB IDs, and I also only added the VGA and audio controllers to pass to the VM. After starting the VM, the GPU was nowhere to be found, and after closing it the host OS would become unresponsive. I am not sure where I went wrong with the setup, so if anyone could clue me in, it would be great.

There are several topics already discussing this; please take a look there first.

That said, you need to make sure that the dGPU is not available to the host OS at boot time. Usually that is done either by adding some kernel command-line arguments, or by putting a custom file in /etc/modprobe.d.
Mine contains the following:

options vfio-pci ids=1002:ab30,1002:7480 disable_vga=1 x-no-kvm-intx=on

If you choose the latter, you need to re-create your initrd for the changes to take effect.
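
If you prefer the kernel command-line route instead, the equivalent looks roughly like this (a sketch only, assuming GRUB; keep your existing arguments in place of the ..., and note that vfio-pci then has to be available early, i.e. built into the kernel or included in the initrd):

# In /etc/default/grub, add the vfio-pci options to the kernel command line
GRUB_CMDLINE_LINUX_DEFAULT="... vfio-pci.ids=1002:ab30,1002:7480 vfio-pci.disable_vga=1"

# Then regenerate the GRUB config (the exact command differs per distro)
sudo grub-mkconfig -o /boot/grub/grub.cfg    # or: sudo update-grub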

You should probably post more details if you want to get useful tips. Try including the output of lspci and lsusb, as well as the QEMU command-line arguments you are using to start the VM.
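
For example, the following gives a decent snapshot of the situation (lspci -nnk also shows which driver each device is currently bound to):

lspci -nnk
lsusb
ps -ef | grep qemu    # shows the full command line of a running VM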

The thing is that I want the dGPU to detach only when I start the VM. That is what the tutorial above was going for, as I want the GPU back on the host when I close the VM.

Once the system has booted, you can still choose to attach the dGPU to the host before starting the DE, but if you do, you may need to shut down your DE before you can boot your Windows VM.
Also, I would advise first getting it to work without the dynamic switching, to make sure you have the basics covered, and only then changing your setup so it can switch dynamically. That eases the troubleshooting.

BTW, if dynamic switching is a particular concern for you, you might want to change the title of this thread to reflect that.

I’m kinda new to all of this; do you happen to have some sort of guide I could follow?

I don’t really have a tutorial, but here’s what I usually do:

Determine what hardware needs to be passed to the VM
You can do this using the ‘lspci’ command (and ‘lsusb’ if you also want to pass through USB devices).

For my Framework 16 it lists the following entries for the dGPU:

03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 33 [Radeon RX 7700S/7600/7600S/7600M XT/PRO W7600] (rev c1)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio

To find out what IDs they use, I use ‘lspci -n -s 03:00’ (the 03:00 refers to the dGPU listed above), which gives me:

03:00.0 0300: 1002:7480 (rev c1)
03:00.1 0403: 1002:ab30

These give me 1002:ab30 for the audio device and 1002:7480 for the graphics.
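
Alternatively, a single command shows the device names and the numeric [vendor:device] IDs together, which makes them harder to mix up:

lspci -nn -s 03:00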

Hide this hardware from the host on boot
Create a file (I named it libvirt.conf) in /etc/modprobe.d containing the following options line (values taken from the output above):

options vfio-pci ids=1002:ab30,1002:7480 disable_vga=1

The disable_vga=1 option keeps the dGPU from participating in VGA arbitration at boot; you really don’t want the dGPU’s VGA extensions to be used on the host.

Additionally you should probably add the following lines to the same file:

softdep drm pre: vfio-pci
softdep amdgpu pre: vfio-pci

This makes sure that vfio-pci is loaded before the graphics drivers, thus enabling it to claim the dGPU at boot.
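
The command to rebuild the initrd depends on your distribution; it is roughly one of the following (run as root):

dracut -f                # Fedora / openSUSE
update-initramfs -u      # Debian / Ubuntu
mkinitcpio -P            # Arch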

Once you have recreated the initrd, reboot. After boot, you can check whether the dGPU is assigned to vfio-pci with the following command: lspci -v -s 03:00
This will give you a lot of info on the dGPU device, but most interestingly will show you what driver is assigned in the bottom two lines. In my case they list:

        Kernel driver in use: vfio-pci
        Kernel modules: amdgpu

for the graphics and

        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel

for the audio device.

Once you get to this point, you have isolated the dGPU from your host and are free to pass it on to a VM.
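
For completeness: with libvirt, passing the two functions to the guest boils down to adding a hostdev entry per function to the VM definition. A rough sketch (the guest name ‘win11’ and the temp file path are made up for the example; managed='no' assumes the device is already bound to vfio-pci as described above):

cat > /tmp/dgpu-video.xml <<'EOF'
<hostdev mode='subsystem' type='pci' managed='no'>
  <source>
    <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
  </source>
</hostdev>
EOF
virsh attach-device win11 /tmp/dgpu-video.xml --config
# repeat with function='0x1' (and a second file) for the audio device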

Re-assigning the dGPU to the host
When you want to re-assign your dGPU to the host (after boot, or after running a VM), you can do so using the ‘driverctl’ command:

driverctl set-override 0000:03:00.0 amdgpu
driverctl set-override 0000:03:00.1 snd_hda_intel

Now your dGPU should be re-assigned to the host.
I’m not sure how well amdgpu and Wayland/X11 respond to the new resources becoming available; perhaps you need to restart your Wayland or X11 session before they will make use of the dGPU.
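
The reverse direction, handing the dGPU back to vfio-pci before you start the VM again, is the same command with vfio-pci as the target driver (this only succeeds if nothing on the host is still holding the dGPU):

driverctl set-override 0000:03:00.0 vfio-pci
driverctl set-override 0000:03:00.1 vfio-pci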

Now, if all of this works fine when run manually, you can consider automating the device re-assignment, e.g. using the libvirt hooks.
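
As a starting point, a minimal /etc/libvirt/hooks/qemu hook could look roughly like the sketch below (the guest name ‘win11’ is an assumption; the PCI addresses and drivers are the ones from my machine above). libvirt calls the script with the guest name and an operation as its first two arguments:

#!/bin/sh
# /etc/libvirt/hooks/qemu -- sketch only, adjust names and addresses
GUEST="$1"
OP="$2"

if [ "$GUEST" = "win11" ]; then
    case "$OP" in
        prepare)
            # VM is about to start: hand the dGPU over to vfio-pci
            driverctl set-override 0000:03:00.0 vfio-pci
            driverctl set-override 0000:03:00.1 vfio-pci
            ;;
        release)
            # VM has shut down: give the dGPU back to the host drivers
            driverctl set-override 0000:03:00.0 amdgpu
            driverctl set-override 0000:03:00.1 snd_hda_intel
            ;;
    esac
fi

Don’t forget to make the hook executable and restart libvirtd once so the new hook is picked up.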