linux

iv/linux

Author	SHA1	Message	Date
Jason Gunthorpe	8e432bb015	vfio/mdev: Pass in a struct vfio_device * to vfio_pin/unpin_pages() Every caller has a readily available vfio_device pointer, use that instead of passing in a generic struct device. The struct vfio_device already contains the group we need so this avoids complexity, extra refcountings, and a confusing lifecycle model. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-11 13:12:59 -06:00
Jason Gunthorpe	09ea48efff	vfio: Make vfio_(un)register_notifier accept a vfio_device All callers have a struct vfio_device trivially available, pass it in directly and avoid calling the expensive vfio_group_get_from_dev(). Acked-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-11 13:12:58 -06:00
Robin Murphy	a77109ffca	vfio: Stop using iommu_present() IOMMU groups have been mandatory for some time now, so a device without one is necessarily a device without any usable IOMMU, therefore the iommu_present() check is redundant (or at best unhelpful). Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/537103bbd7246574f37f2c88704d7824a3a889f2.1649160714.git.robin.murphy@arm.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-11 13:12:58 -06:00
Alex Williamson	5acb6cd19d	Merge tag 'gvt-next-2022-04-29' into v5.19/vfio/next Merge GVT-g dependencies for vfio. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-05-11 13:12:05 -06:00
Alex Williamson	920df8d6ef	Improve mlx5 live migration driver From Yishai: This series improves mlx5 live migration driver in few aspects as of below. Refactor to enable running migration commands in parallel over the PF command interface. To achieve that we exposed from mlx5_core an API to let the VF be notified before that the PF command interface goes down/up. (e.g. PF reload upon health recovery). Once having the above functionality in place mlx5 vfio doesn't need any more to obtain the global PF lock upon using the command interface but can rely on the above mechanism to be in sync with the PF. This can enable parallel VFs migration over the PF command interface from kernel driver point of view. In addition, Moved to use the PF async command mode for the SAVE state command. This enables returning earlier to user space upon issuing successfully the command and improve latency by let things run in parallel. Alex, as this series touches mlx5_core we may need to send this in a pull request format to VFIO to avoid conflicts before acceptance. Link: https://lore.kernel.org/all/20220510090206.90374-1-yishaih@nvidia.com Signed-of-by: Leon Romanovsky <leonro@nvidia.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQT1m3YD37UfMCUQBNwp8NhrnBAZsQUCYntY3AAKCRAp8NhrnBAZ scRWAP0QzEqg/Xqk/geUAQ3dliFrA2DZJm8v9B3x5tA5nEAazAD9HqC17MvDzY8T 6KBP7G37JNg2NCkxnKnt2gCIT+O4lgA= =zwWT -----END PGP SIGNATURE----- Merge tag 'mlx5-lm-parallel' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into v5.19/vfio/next Improve mlx5 live migration driver From Yishai: This series improves mlx5 live migration driver in few aspects as of below. Refactor to enable running migration commands in parallel over the PF command interface. To achieve that we exposed from mlx5_core an API to let the VF be notified before that the PF command interface goes down/up. (e.g. PF reload upon health recovery). Once having the above functionality in place mlx5 vfio doesn't need any more to obtain the global PF lock upon using the command interface but can rely on the above mechanism to be in sync with the PF. This can enable parallel VFs migration over the PF command interface from kernel driver point of view. In addition, Moved to use the PF async command mode for the SAVE state command. This enables returning earlier to user space upon issuing successfully the command and improve latency by let things run in parallel. Alex, as this series touches mlx5_core we may need to send this in a pull request format to VFIO to avoid conflicts before acceptance. Link: https://lore.kernel.org/all/20220510090206.90374-1-yishaih@nvidia.com Signed-of-by: Leon Romanovsky <leonro@nvidia.com>	2022-05-11 13:08:49 -06:00
Yishai Hadas	85c205db60	vfio/mlx5: Run the SAVE state command in an async mode Use the PF asynchronous command mode for the SAVE state command. This enables returning earlier to user space upon issuing successfully the command and improve latency by let things run in parallel. Link: https://lore.kernel.org/r/20220510090206.90374-5-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2022-05-11 09:33:40 +03:00
Yishai Hadas	8580ad14f9	vfio/mlx5: Refactor to enable VFs migration in parallel Refactor to enable different VFs to run their commands over the PF command interface in parallel and to not block one each other. This is done by not using the global PF lock that was used before but relying on the VF attach/detach mechanism to sync. Link: https://lore.kernel.org/r/20220510090206.90374-4-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2022-05-11 09:33:33 +03:00
Yishai Hadas	61a2f1460f	vfio/mlx5: Manage the VF attach/detach callback from the PF Manage the VF attach/detach callback from the PF. This lets the driver to enable parallel VFs migration as will be introduced in the next patch. As part of this, reorganize the VF is migratable code to be in a separate function and rename it to be set_migratable() to match its functionality. Link: https://lore.kernel.org/r/20220510090206.90374-3-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2022-05-11 09:33:22 +03:00
Jason Gunthorpe	2aa72ec97c	vfio/mdev: Use the driver core to create the 'remove' file The device creator is supposed to use the dev.groups value to add sysfs files before device_add is called, not call sysfs_create_files() after device_add() returns. This creates a race with uevent delivery where the extra attribute will not be visible. This was being done because the groups had been co-opted by the mdev driver, now that prior patches have moved the driver's groups to the struct device_driver the dev.group is properly free for use here. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20220411141403.86980-34-hch@lst.de Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>	2022-04-21 07:36:56 -04:00
Jason Gunthorpe	6b42f491e1	vfio/mdev: Remove mdev_parent_ops The last useful member in this struct is the supported_type_groups, move it to the mdev_driver and delete mdev_parent_ops. Replace it with mdev_driver as an argument to mdev_register_device() Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20220411141403.86980-33-hch@lst.de Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>	2022-04-21 07:36:56 -04:00
Jason Gunthorpe	e6486939d8	vfio/mdev: Remove mdev_parent_ops dev_attr_groups This is only used by one sample to print a fixed string that is pointless. In general, having a device driver attach sysfs attributes to the parent is horrific. This should never happen, and always leads to some kind of liftime bug as it become very difficult for the sysfs attribute to go back to any data owned by the device driver. Remove the general mechanism to create this abuse. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20220411141403.86980-32-hch@lst.de Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>	2022-04-21 07:36:56 -04:00
Jason Gunthorpe	6c7f98b334	vfio/mdev: Remove vfio_mdev.c Now that all mdev drivers directly create their own mdev_device driver and directly register with the vfio core's vfio_device_ops this is all dead code. Delete vfio_mdev.c and the mdev_parent_ops members that are connected to it. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20220411141403.86980-31-hch@lst.de Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>	2022-04-21 07:36:56 -04:00
Jason Gunthorpe	1ef3342a93	vfio/pci: Fix vf_token mechanism when device-specific VF drivers are used get_pf_vdev() tries to check if a PF is a VFIO PF by looking at the driver: if (pci_dev_driver(physfn) != pci_dev_driver(vdev->pdev)) { However now that we have multiple VF and PF drivers this is no longer reliable. This means that security tests realted to vf_token can be skipped by mixing and matching different VFIO PCI drivers. Instead of trying to use the driver core to find the PF devices maintain a linked list of all PF vfio_pci_core_device's that we have called pci_enable_sriov() on. When registering a VF just search the list to see if the PF is present and record the match permanently in the struct. PCI core locking prevents a PF from passing pci_disable_sriov() while VF drivers are attached so the VFIO owned PF becomes a static property of the VF. In common cases where vfio does not own the PF the global list remains empty and the VF's pointer is statically NULL. This also fixes a lockdep splat from recursive locking of the vfio_group::device_lock between vfio_device_get_from_name() and vfio_device_get_from_dev(). If the VF and PF share the same group this would deadlock. Fixes: `ff53edf6d6` ("vfio/pci: Split the pci_driver code out of vfio_pci_core.c") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v3-876570980634+f2e8-vfio_vf_token_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-04-13 11:37:44 -06:00
Shameer Kolothum	4406f46c9b	hisi_acc_vfio_pci: Use its own PCI reset_done error handler Register private handler for pci_error_handlers.reset_done and update state accordingly. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Longfang Liu <liulongfang@huawei.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20220308184902.2242-10-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-03-15 11:41:32 -06:00
Longfang Liu	b0eed08590	hisi_acc_vfio_pci: Add support for VFIO live migration VMs assigned with HiSilicon ACC VF devices can now perform live migration if the VF devices are bind to the hisi_acc_vfio_pci driver. Just like ACC PF/VF drivers this VFIO driver also make use of the HiSilicon QM interface. QM stands for Queue Management which is a generic IP used by ACC devices. It provides a generic PCIe interface for the CPU and the ACC devices to share a group of queues. QM integrated into an accelerator provides queue management service. Queues can be assigned to PF and VFs, and queues can be controlled by unified mailboxes and doorbells. The QM driver (drivers/crypto/hisilicon/qm.c) provides generic interfaces to ACC drivers to manage the QM. Signed-off-by: Longfang Liu <liulongfang@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20220308184902.2242-9-shameerali.kolothum.thodi@huawei.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-03-15 11:41:32 -06:00
Shameer Kolothum	6abdce51af	hisi_acc_vfio_pci: Restrict access to VF dev BAR2 migration region HiSilicon ACC VF device BAR2 region consists of both functional register space and migration control register space. Unnecessarily exposing the migration BAR region to the Guest has the potential to prevent/corrupt the Guest migration. Hence, introduce a separate struct vfio_device_ops for migration support which will override the ioctl/read/write/mmap methods to hide the migration region and limit the Guest access only to the functional register space. This will be used in subsequent patches when we add migration support to the driver. Please note that it is OK to export the entire VF BAR if migration is not supported or required as this cannot affect the PF configurations. Reviewed-by: Longfang Liu <liulongfang@huawei.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20220308184902.2242-6-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-03-15 11:34:08 -06:00
Shameer Kolothum	ee3a5b2359	hisi_acc_vfio_pci: add new vfio_pci driver for HiSilicon ACC devices Add a vendor-specific vfio_pci driver for HiSilicon ACC devices. This will be extended in subsequent patches to add support for VFIO live migration feature. Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20220308184902.2242-5-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-03-15 11:34:08 -06:00
Yishai Hadas	5b26f2c249	vfio/mlx5: Fix to not use 0 as NULL pointer Fix sparse warning to not use plain integer (i.e. 0) as NULL pointer. Reported-by: kernel test robot <lkp@intel.com> Fixes: `6fadb02126` ("vfio/mlx5: Implement vfio_pci driver for mlx5 devices") Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/202203090703.kxvZumJg-lkp@intel.com Link: https://lore.kernel.org/r/20220309080217.94274-1-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-03-09 10:29:40 -07:00
Alex Williamson	f8a665b159	Merge branches 'v5.18/vfio/next/mlx5-migration-v10', 'v5.18/vfio/next/pm-fixes' and 'v5.18/vfio/next/uml-build-fix' into v5.18/vfio/next/next	2022-03-03 09:55:34 -07:00
Yishai Hadas	88faa5e8ea	vfio/mlx5: Use its own PCI reset_done error handler Register its own handler for pci_error_handlers.reset_done and update state accordingly. Link: https://lore.kernel.org/all/20220224142024.147653-16-yishaih@nvidia.com Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2022-03-03 13:01:19 +02:00
Yishai Hadas	915076f70e	vfio/pci: Expose vfio_pci_core_aer_err_detected() Expose vfio_pci_core_aer_err_detected() to be used by drivers as part of their pci_error_handlers structure. Next patch for mlx5 driver will use it. Link: https://lore.kernel.org/all/20220224142024.147653-15-yishaih@nvidia.com Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2022-03-03 13:01:19 +02:00
Yishai Hadas	6fadb02126	vfio/mlx5: Implement vfio_pci driver for mlx5 devices This patch adds support for vfio_pci driver for mlx5 devices. It uses vfio_pci_core to register to the VFIO subsystem and then implements the mlx5 specific logic in the migration area. The migration implementation follows the definition from uapi/vfio.h and uses the mlx5 VF->PF command channel to achieve it. This patch implements the suspend/resume flows. Link: https://lore.kernel.org/all/20220224142024.147653-14-yishaih@nvidia.com Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2022-03-03 13:01:19 +02:00
Yishai Hadas	f1d98f346e	vfio/mlx5: Expose migration commands over mlx5 device Expose migration commands over the device, it includes: suspend, resume, get vhca id, query/save/load state. As part of this adds the APIs and data structure that are needed to manage the migration data. Link: https://lore.kernel.org/all/20220224142024.147653-13-yishaih@nvidia.com Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2022-03-03 13:01:19 +02:00
Jason Gunthorpe	8cb3d83b95	vfio: Extend the device migration protocol with RUNNING_P2P The RUNNING_P2P state is designed to support multiple devices in the same VM that are doing P2P transactions between themselves. When in RUNNING_P2P the device must be able to accept incoming P2P transactions but should not generate outgoing P2P transactions. As an optional extension to the mandatory states it is defined as in between STOP and RUNNING: STOP -> RUNNING_P2P -> RUNNING -> RUNNING_P2P -> STOP For drivers that are unable to support RUNNING_P2P the core code silently merges RUNNING_P2P and RUNNING together. Unless driver support is present, the new state cannot be used in SET_STATE. Drivers that support this will be required to implement 4 FSM arcs beyond the basic FSM. 2 of the basic FSM arcs become combination transitions. Compared to the v1 clarification, NDMA is redefined into FSM states and is described in terms of the desired P2P quiescent behavior, noting that halting all DMA is an acceptable implementation. Link: https://lore.kernel.org/all/20220224142024.147653-11-yishaih@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2022-03-03 13:00:16 +02:00
Jason Gunthorpe	115dcec65f	vfio: Define device migration protocol v2 Replace the existing region based migration protocol with an ioctl based protocol. The two protocols have the same general semantic behaviors, but the way the data is transported is changed. This is the STOP_COPY portion of the new protocol, it defines the 5 states for basic stop and copy migration and the protocol to move the migration data in/out of the kernel. Compared to the clarification of the v1 protocol Alex proposed: https://lore.kernel.org/r/163909282574.728533.7460416142511440919.stgit@omen This has a few deliberate functional differences: - ERROR arcs allow the device function to remain unchanged. - The protocol is not required to return to the original state on transition failure. Instead userspace can execute an unwind back to the original state, reset, or do something else without needing kernel support. This simplifies the kernel design and should userspace choose a policy like always reset, avoids doing useless work in the kernel on error handling paths. - PRE_COPY is made optional, userspace must discover it before using it. This reflects the fact that the majority of drivers we are aware of right now will not implement PRE_COPY. - segmentation is not part of the data stream protocol, the receiver does not have to reproduce the framing boundaries. The hybrid FSM for the device_state is described as a Mealy machine by documenting each of the arcs the driver is required to implement. Defining the remaining set of old/new device_state transitions as 'combination transitions' which are naturally defined as taking multiple FSM arcs along the shortest path within the FSM's digraph allows a complete matrix of transitions. A new VFIO_DEVICE_FEATURE of VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE is defined to replace writing to the device_state field in the region. This allows returning a brand new FD whenever the requested transition opens a data transfer session. The VFIO core code implements the new feature and provides a helper function to the driver. Using the helper the driver only has to implement 6 of the FSM arcs and the other combination transitions are elaborated consistently from those arcs. A new VFIO_DEVICE_FEATURE of VFIO_DEVICE_FEATURE_MIGRATION is defined to report the capability for migration and indicate which set of states and arcs are supported by the device. The FSM provides a lot of flexibility to make backwards compatible extensions but the VFIO_DEVICE_FEATURE also allows for future breaking extensions for scenarios that cannot support even the basic STOP_COPY requirements. The VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE with the GET option (i.e. VFIO_DEVICE_FEATURE_GET) can be used to read the current migration state of the VFIO device. Data transfer sessions are now carried over a file descriptor, instead of the region. The FD functions for the lifetime of the data transfer session. read() and write() transfer the data with normal Linux stream FD semantics. This design allows future expansion to support poll(), io_uring, and other performance optimizations. The complicated mmap mode for data transfer is discarded as current qemu doesn't take meaningful advantage of it, and the new qemu implementation avoids substantially all the performance penalty of using a read() on the region. Link: https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2022-03-03 12:57:39 +02:00
Jason Gunthorpe	445ad495f0	vfio: Have the core code decode the VFIO_DEVICE_FEATURE ioctl Invoke a new device op 'device_feature' to handle just the data array portion of the command. This lifts the ioctl validation to the core code and makes it simpler for either the core code, or layered drivers, to implement their own feature values. Provide vfio_check_feature() to consolidate checking the flags/etc against what the driver supports. Link: https://lore.kernel.org/all/20220224142024.147653-9-yishaih@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2022-03-03 12:54:43 +02:00
Abhishek Sahu	26a17b12d7	vfio/pci: wake-up devices around reset functions If 'vfio_pci_core_device::needs_pm_restore' is set (PCI device does not have No_Soft_Reset bit set in its PMCSR config register), then the current PCI state will be saved locally in 'vfio_pci_core_device::pm_save' during D0->D3hot transition and same will be restored back during D3hot->D0 transition. For reset-related functionalities, vfio driver uses PCI reset API's. These API's internally change the PCI power state back to D0 first if the device power state is non-D0. This state change to D0 will happen without the involvement of vfio driver. Let's consider the following example: 1. The device is in D3hot. 2. User invokes VFIO_DEVICE_RESET ioctl. 3. pci_try_reset_function() will be called which internally invokes pci_dev_save_and_disable(). 4. pci_set_power_state(dev, PCI_D0) will be called first. 5. pci_save_state() will happen then. Now, for the devices which has NoSoftRst-, the pci_set_power_state() can trigger soft reset and the original PCI config state will be lost at step (4) and this state cannot be restored again. This original PCI state can include any setting which is performed by SBIOS or host linux kernel (for example LTR, ASPM L1 substates, etc.). When this soft reset will be triggered, then all these settings will be reset, and the device state saved at step (5) will also have this setting cleared so it cannot be restored. Since the vfio driver only exposes limited PCI capabilities to its user, so the vfio driver user also won't have the option to save and restore these capabilities state either and these original settings will be permanently lost. For pci_reset_bus() also, we can have the above situation. The other functions/devices can be in D3hot and the reset will change the power state of all devices to D0 without the involvement of vfio driver. So, before calling any reset-related API's, we need to make sure that the device state is D0. This is mainly to preserve the state around soft reset. For vfio_pci_core_disable(), we use __pci_reset_function_locked() which internally can use pci_pm_reset() for the function reset. pci_pm_reset() requires the device power state to be in D0, otherwise it returns error. This patch changes the device power state to D0 by invoking vfio_pci_set_power_state() explicitly before calling any reset related API's. Fixes: `51ef3a004b` ("vfio/pci: Restore device state on PM transition") Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220217122107.22434-3-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-02-22 13:45:20 -07:00
Abhishek Sahu	eadf88ecf6	vfio/pci: fix memory leak during D3hot to D0 transition If 'vfio_pci_core_device::needs_pm_restore' is set (PCI device does not have No_Soft_Reset bit set in its PMCSR config register), then the current PCI state will be saved locally in 'vfio_pci_core_device::pm_save' during D0->D3hot transition and same will be restored back during D3hot->D0 transition. For saving the PCI state locally, pci_store_saved_state() is being used and the pci_load_and_free_saved_state() will free the allocated memory. But for reset related IOCTLs, vfio driver calls PCI reset-related API's which will internally change the PCI power state back to D0. So, when the guest resumes, then it will get the current state as D0 and it will skip the call to vfio_pci_set_power_state() for changing the power state to D0 explicitly. In this case, the memory pointed by 'pm_save' will never be freed. In a malicious sequence, the state changing to D3hot followed by VFIO_DEVICE_RESET/VFIO_DEVICE_PCI_HOT_RESET can be run in a loop and it can cause an OOM situation. This patch frees the earlier allocated memory first before overwriting 'pm_save' to prevent the mentioned memory leak. Fixes: `51ef3a004b` ("vfio/pci: Restore device state on PM transition") Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220217122107.22434-2-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-02-22 13:45:20 -07:00
Alex Williamson	6e031ec0e5	vfio/pci: Stub vfio_pci_vga_rw when !CONFIG_VFIO_PCI_VGA Resolve build errors reported against UML build for undefined ioport_map() and ioport_unmap() functions. Without this config option a device cannot have vfio_pci_core_device.has_vga set, so the existing function would always return -EINVAL anyway. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/r/20220123125737.2658758-1-geert@linux-m68k.org Link: https://lore.kernel.org/r/164306582968.3758255.15192949639574660648.stgit@omen Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2022-02-22 13:45:02 -07:00
Linus Torvalds	c5a0b6e40d	VFIO updates for v5.17-rc1 - Fix sparse endian warnings in IGD code (Alex Williamson) - Balance kvzalloc with kvfree (Jiacheng Shi) -----BEGIN PGP SIGNATURE----- iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmHoTmkbHGFsZXgud2ls bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsicMcQALJcjkN9z2GvFpI7iFN8 6xA7P1MNZ5kU6aHy/hyY/DwukrCHvx5hLc3Xro9ng4JkmP7+2GeUfyV3AlNDUId3 gaZCGgLjmNQY9wxOfT1WiU6oDbvNuGNBL3WxrhAB/DLrjowhEM1hiYL9JBqSeZfB FapfdpBsBrTtpuAbI1daE5qsYSaVzL4NY14ulFOXVvN3sVhPU4byXhgvczGbT8o+ c/lllFG5cXxB89n8TPggvLGyBpJIvHqdKfCTcn8+MrimC3iVkoOAlu9HFWDe2wPA Nt9OrKb7hSy7Q7gBJtwS5k8I531b+ovdzguVoATw8VZfUB08tz4NMn8X5xQUCapI T1Bhma1Ua8TH11i2acl5RsktJAkSQeZ8IWt1AQ6hBeTI7Z19iUleYi+71n6+rePB N5MH7snTqOgdYIK2sLpb2QKHBsEmAn+dG8cf/g0FRD143ce24P4A8fMbuBIO9PeM 129mnlV5q8GluYc8+F9ppGX29lrD9N21TTt2yGj6X55IRzJQ8CJXg8O0y+GztQSn P634BRNQE7zw8w4y6XzaTKQPu43yD+VvB9qpIH0HB56RaJOubfpq9Fd07niE3dZR CHMfwDoihAIaknuwWqf1DnwMMbn9uzOlUKerrbI8v4DTZd5LDuSJ7chiBB7ra8g7 zDU+seQ5EI1rlNrwbTHp110R =MGt6 -----END PGP SIGNATURE----- Merge tag 'vfio-v5.17-rc1' of git://github.com/awilliam/linux-vfio Pull VFIO updates from Alex Williamson: - Fix sparse endian warnings in IGD code (Alex Williamson) - Balance kvzalloc with kvfree (Jiacheng Shi) * tag 'vfio-v5.17-rc1' of git://github.com/awilliam/linux-vfio: vfio/iommu_type1: replace kfree with kvfree vfio/pci: Resolve sparse endian warnings in IGD support	2022-01-20 13:31:46 +02:00
Jiacheng Shi	2bed2ced40	vfio/iommu_type1: replace kfree with kvfree Variables allocated by kvzalloc should not be freed by kfree. Because they may be allocated by vmalloc. So we replace kfree with kvfree here. Fixes: `d6a4c18566` ("vfio iommu: Implementation of ioctl for dirty pages tracking") Signed-off-by: Jiacheng Shi <billsjc@sjtu.edu.cn> Link: https://lore.kernel.org/r/20211212091600.2560-1-billsjc@sjtu.edu.cn Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-12-21 12:30:34 -07:00
Alex Williamson	21ab799585	vfio/pci: Resolve sparse endian warnings in IGD support Sparse warns: sparse warnings: (new ones prefixed by >>) >> drivers/vfio/pci/vfio_pci_igd.c:146:21: sparse: sparse: incorrect type in assignment (different base types) @@ expected unsigned short [addressable] [usertype] val @@ got restricted __le16 [usertype] @@ drivers/vfio/pci/vfio_pci_igd.c:146:21: sparse: expected unsigned short [addressable] [usertype] val drivers/vfio/pci/vfio_pci_igd.c:146:21: sparse: got restricted __le16 [usertype] >> drivers/vfio/pci/vfio_pci_igd.c:161:21: sparse: sparse: incorrect type in assignment (different base types) @@ expected unsigned int [addressable] [usertype] val @@ got restricted __le32 [usertype] @@ drivers/vfio/pci/vfio_pci_igd.c:161:21: sparse: expected unsigned int [addressable] [usertype] val drivers/vfio/pci/vfio_pci_igd.c:161:21: sparse: got restricted __le32 [usertype] drivers/vfio/pci/vfio_pci_igd.c:176:21: sparse: sparse: incorrect type in assignment (different base types) @@ expected unsigned short [addressable] [usertype] val @@ got restricted __le16 [usertype] @@ drivers/vfio/pci/vfio_pci_igd.c:176:21: sparse: expected unsigned short [addressable] [usertype] val drivers/vfio/pci/vfio_pci_igd.c:176:21: sparse: got restricted __le16 [usertype] These are due to trying to use an unsigned to store the result of a cpu_to_leXX() conversion. These are small variables, so pointer tricks are wasteful and casting just generates different sparse warnings. Store to and copy results from a separate little endian variable. Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/202111290026.O3vehj03-lkp@intel.com/ Link: https://lore.kernel.org/r/163840226123.138003.7668320168896210328.stgit@omen Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-12-21 12:30:31 -07:00
Thomas Gleixner	d86a6d47bc	bus: fsl-mc: fsl-mc-allocator: Rework MSI handling Storing a pointer to the MSI descriptor just to track the Linux interrupt number is daft. Just store the interrupt number and be done with it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20211210221815.207838579@linutronix.de	2021-12-16 22:16:41 +01:00
Zhenyu Wang	8704e89349	vfio/pci: Fix OpRegion read This is to fix incorrect pointer arithmetic which caused wrong OpRegion version returned, then VM driver got error to get wanted VBT block. We need to be safe to return correct data, so force pointer type for byte access. Fixes: `49ba1a2976` ("vfio/pci: Add OpRegion 2.0+ Extended VBT support.") Cc: Colin Xu <colin.xu@gmail.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dmitry Torokhov <dtor@chromium.org> Cc: "Xu, Terrence" <terrence.xu@intel.com> Cc: "Gao, Fred" <fred.gao@intel.com> Acked-by: Colin Xu <colin.xu@gmail.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Link: https://lore.kernel.org/r/20211125051328.3359902-1-zhenyuw@linux.intel.com [aw: line wrap] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-11-30 11:41:49 -07:00
Randy Dunlap	3b9a2d5793	vfio: remove all kernel-doc notation vfio.c abuses (misuses) "/", which indicates the beginning of kernel-doc notation in the kernel tree. This causes a bunch of kernel-doc complaints about this source file, so quieten all of them by changing all "/" to "/". vfio.c:236: warning: This comment starts with '/', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst IOMMU driver registration vfio.c:236: warning: missing initial short description on line: * IOMMU driver registration vfio.c:295: warning: expecting prototype for Container objects(). Prototype was for vfio_container_get() instead vfio.c:317: warning: expecting prototype for Group objects(). Prototype was for __vfio_group_get_from_iommu() instead vfio.c:496: warning: Function parameter or member 'device' not described in 'vfio_device_put' vfio.c:496: warning: expecting prototype for Device objects(). Prototype was for vfio_device_put() instead vfio.c:599: warning: This comment starts with '/*', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst Async device support vfio.c:599: warning: missing initial short description on line: * Async device support vfio.c:693: warning: This comment starts with '/*', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst VFIO driver API vfio.c:693: warning: missing initial short description on line: * VFIO driver API vfio.c:835: warning: This comment starts with '/*', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst Get a reference to the vfio_device for a device. Even if the vfio.c:835: warning: missing initial short description on line: * Get a reference to the vfio_device for a device. Even if the vfio.c:969: warning: This comment starts with '/*', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst VFIO base fd, /dev/vfio/vfio vfio.c:969: warning: missing initial short description on line: * VFIO base fd, /dev/vfio/vfio vfio.c:1187: warning: This comment starts with '/*', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst VFIO Group fd, /dev/vfio/$GROUP vfio.c:1187: warning: missing initial short description on line: * VFIO Group fd, /dev/vfio/$GROUP vfio.c:1540: warning: This comment starts with '/*', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst VFIO Device fd vfio.c:1540: warning: missing initial short description on line: * VFIO Device fd vfio.c:1615: warning: This comment starts with '/*', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst External user API, exported by symbols to be linked dynamically. vfio.c:1615: warning: missing initial short description on line: * External user API, exported by symbols to be linked dynamically. vfio.c:1663: warning: This comment starts with '/*', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst External user API, exported by symbols to be linked dynamically. vfio.c:1663: warning: missing initial short description on line: * External user API, exported by symbols to be linked dynamically. vfio.c:1742: warning: Function parameter or member 'caps' not described in 'vfio_info_cap_add' vfio.c:1742: warning: Function parameter or member 'size' not described in 'vfio_info_cap_add' vfio.c:1742: warning: Function parameter or member 'id' not described in 'vfio_info_cap_add' vfio.c:1742: warning: Function parameter or member 'version' not described in 'vfio_info_cap_add' vfio.c:1742: warning: expecting prototype for Sub(). Prototype was for vfio_info_cap_add() instead vfio.c:2276: warning: This comment starts with '/*', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst Module/class support vfio.c:2276: warning: missing initial short description on line: * Module/class support Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: kvm@vger.kernel.org Link: https://lore.kernel.org/r/38a9cb92-a473-40bf-b8f9-85cc5cfc2da4@infradead.org Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-11-30 11:41:43 -07:00
Jason Gunthorpe	9cef73918e	vfio: Use cdev_device_add() instead of device_create() Modernize how vfio is creating the group char dev and sysfs presence. These days drivers with state should use cdev_device_add() and cdev_device_del() to manage the cdev and sysfs lifetime. This API requires the driver to put the struct device and struct cdev inside its state struct (vfio_group), and then use the usual device_initialize()/cdev_device_add()/cdev_device_del() sequence. Split the code to make this possible: - vfio_group_alloc()/vfio_group_release() are pair'd functions to alloc/free the vfio_group. release is done under the struct device kref. - vfio_create_group()/vfio_group_put() are pairs that manage the sysfs/cdev lifetime. Once the uses count is zero the vfio group's userspace presence is destroyed. - The IDR is replaced with an IDA. container_of(inode->i_cdev) is used to get back to the vfio_group during fops open. The IDA assigns unique minor numbers. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-10-15 13:58:20 -06:00
Jason Gunthorpe	2b678aa2f0	vfio: Use a refcount_t instead of a kref in the vfio_group The next patch adds a struct device to the struct vfio_group, and it is confusing/bad practice to have two krefs in the same struct. This kref is controlling the period when the vfio_group is registered in sysfs, and visible in the internal lookup. Switch it to a refcount_t instead. The refcount_dec_and_mutex_lock() is still required because we need atomicity of the list searches and sysfs presence. Reviewed-by: Liu Yi L <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-10-15 13:58:20 -06:00
Jason Gunthorpe	325a31c920	vfio: Don't leak a group reference if the group already exists If vfio_create_group() searches the group list and returns an already existing group it does not put back the iommu_group reference that the caller passed in. Change the semantic of vfio_create_group() to not move the reference in from the caller, but instead obtain a new reference inside and leave the caller's reference alone. The two callers must now call iommu_group_put(). This is an unlikely race as the only caller that could hit it has already searched the group list before attempting to create the group. Fixes: `cba3345cc4` ("vfio: VFIO core") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-10-15 13:58:20 -06:00
Jason Gunthorpe	1ceabade1d	vfio: Do not open code the group list search in vfio_create_group() Split vfio_group_get_from_iommu() into __vfio_group_get_from_iommu() so that vfio_create_group() can call it to consolidate this duplicated code. Reviewed-by: Liu Yi L <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-10-15 13:58:19 -06:00
Jason Gunthorpe	63b150fde7	vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier() iommu_group_register_notifier()/iommu_group_unregister_notifier() are built using a blocking_notifier_chain which integrates a rwsem. The notifier function cannot be running outside its registration. When considering how the notifier function interacts with create/destroy of the group there are two fringe cases, the notifier starts before list_add(&vfio.group_list) and the notifier runs after the kref becomes 0. Prior to vfio_create_group() unlocking and returning we have container_users == 0 device_list == empty And this cannot change until the mutex is unlocked. After the kref goes to zero we must also have container_users == 0 device_list == empty Both are required because they are balanced operations and a 0 kref means some caller became unbalanced. Add the missing assertion that container_users must be zero as well. These two facts are important because when checking each operation we see: - IOMMU_GROUP_NOTIFY_ADD_DEVICE Empty device_list avoids the WARN_ON in vfio_group_nb_add_dev() 0 container_users ends the call - IOMMU_GROUP_NOTIFY_BOUND_DRIVER 0 container_users ends the call Finally, we have IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER, which only deletes items from the unbound list. During creation this list is empty, during kref == 0 nothing can read this list, and it will be freed soon. Since the vfio_group_release() doesn't hold the appropriate lock to manipulate the unbound_list and could race with the notifier, move the cleanup to directly before the kfree. This allows deleting all of the deferred group put code. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Liu Yi L <yi.l.liu@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-10-15 13:58:19 -06:00
Alex Williamson	48f06ca420	Merge branch 'v5.16/vfio/colin_xu_igd_opregion_2.0_v8' into v5.16/vfio/next	2021-10-12 10:26:41 -06:00
Colin Xu	49ba1a2976	vfio/pci: Add OpRegion 2.0+ Extended VBT support. Due to historical reason, some legacy shipped system doesn't follow OpRegion 2.1 spec but still stick to OpRegion 2.0, in which the extended VBT is not contiguous after OpRegion in physical address, but any location pointed by RVDA via absolute address. Also although current OpRegion 2.1+ systems appears that the extended VBT follows OpRegion, RVDA is the relative address to OpRegion head, the extended VBT location may change to non-contiguous to OpRegion. In both cases, it's impossible to map a contiguous range to hold both OpRegion and the extended VBT and expose via one vfio region. The only difference between OpRegion 2.0 and 2.1 is where extended VBT is stored: For 2.0, RVDA is the absolute address of extended VBT while for 2.1, RVDA is the relative address of extended VBT to OpRegion baes, and there is no other difference between OpRegion 2.0 and 2.1. To support the non-contiguous region case as described, the updated read op will patch OpRegion version and RVDA on-the-fly accordingly. So that from vfio igd OpRegion view, only 2.1+ with contiguous extended VBT after OpRegion is exposed, regardless the underneath host OpRegion is 2.0 or 2.1+. The mechanism makes it possible to support legacy OpRegion 2.0 extended VBT systems with on the market, and support OpRegion 2.1+ where the extended VBT isn't contiguous after OpRegion. Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Hang Yuan <hang.yuan@linux.intel.com> Cc: Swee Yee Fonn <swee.yee.fonn@intel.com> Cc: Fred Gao <fred.gao@intel.com> Signed-off-by: Colin Xu <colin.xu@intel.com> Link: https://lore.kernel.org/r/20211012124855.52463-1-colin.xu@gmail.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-10-12 10:26:08 -06:00
Alex Williamson	052493d553	Merge branch 'v5.16/vfio/diana-fsl-reset-v2' into v5.16/vfio/next	2021-10-11 15:50:49 -06:00
Alex Williamson	d9a0cd510c	Merge branch 'v5.16/vfio/hch-cleanup-vfio-iommu_group-creation-v6' into v5.16/vfio/next	2021-09-30 12:55:35 -06:00
Christoph Hellwig	3f901389fa	vfio/iommu_type1: remove IS_IOMMU_CAP_DOMAIN_IN_CONTAINER IS_IOMMU_CAP_DOMAIN_IN_CONTAINER just obsfucated the checks being performed, so open code it in the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-16-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-09-30 12:46:45 -06:00
Christoph Hellwig	296e505bad	vfio/iommu_type1: remove the "external" domain The external_domain concept rather misleading and not actually needed. Replace it with a list of emulated groups in struct vfio_iommu. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-15-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-09-30 12:46:45 -06:00
Christoph Hellwig	65cdbf1063	vfio/iommu_type1: initialize pgsize_bitmap in ->open Ensure pgsize_bitmap is always valid by initializing it to PAGE_MASK in vfio_iommu_type1_open and remove the now pointless update for the external domain case in vfio_iommu_type1_attach_group, which was just setting pgsize_bitmap to PAGE_MASK when only external domains were attached. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210924155705.4258-14-hch@lst.de [aw: s/ULONG_MAX/PAGE_MASK/ per discussion in link] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-09-30 12:46:45 -06:00
Christoph Hellwig	8986390414	vfio/spapr_tce: reject mediated devices Unlike the the type1 IOMMU backend, the SPAPR one does not contain any support for the magic non-IOMMU backed iommu_group used by mediated devices, so reject them in ->attach_group. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-13-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-09-30 12:46:44 -06:00
Christoph Hellwig	c3c0fa9d94	vfio: clean up the check for mediated device in vfio_iommu_type1 Pass the group flags to ->attach_group and remove the messy check for the bus type. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-12-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-09-30 12:46:44 -06:00
Christoph Hellwig	fda49d97f2	vfio: remove the unused mdev iommu hook The iommu_device field in struct mdev_device has never been used since it was added more than 2 years ago. This is a manual revert of commit `7bd50f0cd2` ("vfio/type1: Add domain at(de)taching group helpers"). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-11-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2021-09-30 12:46:44 -06:00

1 2 3 4 5 ...

766 Commits