Pass the GPU (or how I learned to love IOMMU)

 

A few years ago I moved off VMware in the home datacentre.

Like a lot of folks I have embraced KVM for most things virtual around the house, but it's only recently that I really needed to virtualise a GPU for one of the guests.

It turns out there are a few little tricks to getting this right, and since it took some time to get it all working I thought I should write it down, even if it's only so I can find what I went through!

This time I am using Red Hat Enterprise Linux 10.1 so my notes may be a little specific to that platform and release, but most of this should work regardless of the distribution.

First things first

I am going to assume that you have KVM up and running and that you're comfortable with creating and managing guests.  If not, you will need to get that done first.  If you're installing the host from scratch, leave out the GUI, because a host without a desktop session is going to make your job easier later on.

Before we start installing cards, make sure your firmware settings are right.  Specifically:

  • Ensure that IOMMU is enabled (it might be called Intel VT-d or AMD-Vi).  By default it probably isn't.
  • Set your boot graphics device to be the onboard graphics.

Now, with the power off, install the GPU, and make sure that you still get a display on the onboard graphics port.

Assuming that worked, you now need to tell the kernel that it can use IOMMU.  We do this by adding a kernel argument to the command line: "intel_iommu=on" on Intel hardware, or "amd_iommu=on" on AMD.  With Red Hat/Fedora and its relatives the easiest way is to use "grubby" as follows:

sudo grubby --update-kernel=ALL --args="intel_iommu=on iommu=pt"

or

sudo grubby --update-kernel=ALL --args="amd_iommu=on iommu=pt" 

 Reboot, and run:

sudo virt-host-validate

 

You want to see a "PASS" next to "Checking if IOMMU is enabled by kernel".
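If the check doesn't pass, the first thing I'd confirm is that the argument actually made it onto the running kernel's command line.  Here is a minimal sketch, assuming a POSIX shell; the function name check_iommu_args is my own invention, not part of any tool:

```shell
# Hypothetical helper: succeeds if a kernel command line contains
# intel_iommu=on or amd_iommu=on as a whole word.
check_iommu_args() {
    # $1: a kernel command line, e.g. the contents of /proc/cmdline
    case " $1 " in
        *" intel_iommu=on "*|*" amd_iommu=on "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the kernel we are running right now.
if check_iommu_args "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "IOMMU argument is on the kernel command line"
else
    echo "IOMMU argument is missing: re-check the grubby step and reboot"
fi
```

This only tells you the argument was passed.  Whether the firmware really has the IOMMU switched on is what virt-host-validate reports.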

Now for the fun bits!

So far so good, so now we need to find the PCI address of the card we just installed, because it almost certainly has a bunch of functions.  The easiest way is to use "lspci" like so:

lspci | grep -i "vga"

 The result should look something like this:

 

We found our card, and it's at PCI 03:00.0.  But what else is in that slot we might need to worry about?  It's easy to find out:

lspci | grep -E "03:00" 

 

 So we have:

  • A Display controller (the GPU);
  • An Audio controller;
  • A USB controller; and
  • A Serial-bus controller. 

We also need to make sure that all these devices are isolated in their own IOMMU group.  They probably are, but check with:

find /sys/kernel/iommu_groups/ -type l | sort -V

 

 Fortunately for us, group 13 is exclusively used by this card.
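The raw find output is a bit hard on the eyes, so here is a small sketch that prints one "group N: address" line per device instead.  The function name list_iommu_groups and its optional argument are my own, and it assumes the usual sysfs layout of /sys/kernel/iommu_groups/<group>/devices/<address>:

```shell
# Hypothetical helper: print "group <n>: <pci-address>" for every device,
# sorted by group number. The optional argument overrides the sysfs root.
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue          # no groups: the glob stays literal
        group=${dev%/devices/*}            # strip "/devices/<address>"
        group=${group##*/}                 # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done | sort -t' ' -k2,2n -k3,3
}

list_iommu_groups
```

What you are looking for is a group whose only members are the functions of your card.  If something unrelated shares the group, it would have to be passed through along with the GPU.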

 Lastly, make sure that nothing else in the system can grab our card.  In my case

"lspci -v"

 showed me I had a bit of a problem:

 

 nouveau has grabbed the GPU,

 

intel audio grabbed the Audio controller and

 

 xhci_hcd grabbed the host controller!

 So we need to find the device IDs with:

lspci -Dnn | grep NVIDIA

 

In my case, the GPU is [10de:2184] and we need to tell vfio-pci to take control of all the devices on this card.

Create a file called:

/etc/modprobe.d/vfio.conf

containing a single line that lists the ID of every function on the card:

options vfio-pci ids=10de:2184,10de:1aeb,10de:1aec,10de:1aed

Remember to use the PCI IDs from YOUR card.  They probably won't match the examples here.

 And yes, I could have issued one big lspci command to get all of this, but it's difficult to highlight what we're looking for.
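For the record, here is roughly what that one big command could look like.  It is only a sketch: the function name vfio_ids is made up, and it assumes each "lspci -nn" line carries the vendor:device pair in square brackets near the end, as in [10de:2184].

```shell
# Hypothetical helper: read `lspci -nn` lines on stdin and print the
# vendor:device IDs as one comma-separated list.
vfio_ids() {
    # The greedy ".*" makes sed keep the last [vvvv:dddd] pair on each line;
    # the four-digit class code (e.g. [0300]) has no colon, so it is skipped.
    sed -nE 's/.*\[([0-9a-f]{4}:[0-9a-f]{4})\].*/\1/p' | paste -sd, -
}

# Usage on the host, with the slot found earlier (03:00 in my case):
#   lspci -nn -s 03:00 | vfio_ids
#   printf 'options vfio-pci ids=%s\n' "$(lspci -nn -s 03:00 | vfio_ids)"
```

I would still eyeball the result against the lspci output before putting it in a modprobe file.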

We also need to stop nouveau from grabbing the card.  Blacklist it in

/etc/modprobe.d/blacklist-gpu.conf

softdep nouveau pre: vfio-pci
blacklist nouveau
options nouveau modeset=0

With all this done we just need to update the initial ramdisk so that the kernel will make the changes we've asked for.  For recent Red Hat releases, dracut is the easiest way:

sudo dracut -f --kver $(uname -r)

 Finally, reboot and check with lspci that for each component on the card, the driver is now "vfio-pci".  With one card I tried, the USB controller stayed on "xhci_hcd", but it still worked OK.
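If you'd rather not scroll through lspci output, the bound driver can also be read straight from sysfs.  A minimal sketch, assuming the card sits at 0000:03:00 like mine and that its four functions are numbered 0 to 3; pci_driver is a name I made up:

```shell
# Hypothetical helper: print the kernel driver bound to a PCI function,
# or "none" if nothing has claimed it. $2 optionally overrides the sysfs root.
pci_driver() {
    link=${2:-/sys/bus/pci/devices}/$1/driver
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

# My card sits at 0000:03:00 (assumed functions 0 to 3); adjust for yours.
for fn in 0 1 2 3; do
    printf '0000:03:00.%s -> %s\n' "$fn" "$(pci_driver "0000:03:00.$fn")"
done
```

Every line should end in vfio-pci, apart from stragglers like the USB controller mentioned above.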

Now what?

If we've made it this far, now it's time for the big test.  I used Virtual Machine Manager from my laptop to connect to the host.  Configure a new guest as you normally would (I used Ubuntu Server because it's pretty quick to install) but choose "Customize configuration before install".

Now go to "Add Hardware", choose "PCI Host Device", and add each function of the card that we handed over to vfio-pci.
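If you want to double-check what libvirt sees before clicking through the dialog, virsh can describe each function.  libvirt names PCI node devices after their address, with the ":" and "." replaced by underscores and a "pci_" prefix.  A small sketch of that conversion (nodedev_name is my own helper, and 0000:03:00.0 is the address of my card):

```shell
# Hypothetical helper: turn a full PCI address (domain:bus:device.function)
# into the node device name libvirt uses, e.g. pci_0000_03_00_0.
nodedev_name() {
    printf 'pci_%s\n' "$1" | tr ':.' '__'
}

nodedev_name 0000:03:00.0

# Usage on the host:
#   virsh nodedev-dumpxml "$(nodedev_name 0000:03:00.0)"
```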

Then install as normal.

Log into your new guest and run

 lspci -v

You should now see the GPU presented to the guest.  In my case it was at PCI 05:00.

 

 

 There you have it.  A successful pass-through operation.  I expect you could pass through other PCI devices, such as Fibre Channel cards, in much the same way, but that's a topic for another day.

 Happy Virtualising!

 

 

 

 
