linux

iv/linux

Author	SHA1	Message	Date
Sean Christopherson	f9980f56a1	vfio/type1: Fix VA->PA translation for PFNMAP VMAs in vaddr_get_pfn() commit 5cbf3264bc715e9eb384e2b68601f8c02bb9a61d upstream. Use follow_pfn() to get the PFN of a PFNMAP VMA instead of assuming that vma->vm_pgoff holds the base PFN of the VMA. This fixes a bug where attempting to do VFIO_IOMMU_MAP_DMA on an arbitrary PFNMAP'd region of memory calculates garbage for the PFN. Hilariously, this only got detected because the first "PFN" calculated by vaddr_get_pfn() is PFN 0 (vma->vm_pgoff==0), and iommu_iova_to_phys() uses PA==0 as an error, which triggers a WARN in vfio_unmap_unpin() because the translation "failed". PFN 0 is now unconditionally reserved on x86 in order to mitigate L1TF, which causes is_invalid_reserved_pfn() to return true and in turns results in vaddr_get_pfn() returning success for PFN 0. Eventually the bogus calculation runs into PFNs that aren't reserved and leads to failure in vfio_pin_map_dma(). The subsequent call to vfio_remove_dma() attempts to unmap PFN 0 and WARNs. WARNING: CPU: 8 PID: 5130 at drivers/vfio/vfio_iommu_type1.c:750 vfio_unmap_unpin+0x2e1/0x310 [vfio_iommu_type1] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio ... CPU: 8 PID: 5130 Comm: sgx Tainted: G W 5.6.0-rc5-705d787c7fee-vfio+ #3 Hardware name: Intel Corporation Mehlow UP Server Platform/Moss Beach Server, BIOS CNLSE2R1.D00.X119.B49.1803010910 03/01/2018 RIP: 0010:vfio_unmap_unpin+0x2e1/0x310 [vfio_iommu_type1] Code: <0f> 0b 49 81 c5 00 10 00 00 e9 c5 fe ff ff bb 00 10 00 00 e9 3d fe RSP: 0018:ffffbeb5039ebda8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff9a55cbf8d480 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9a52b771c200 RBP: 0000000000000000 R08: 0000000000000040 R09: 00000000fffffff2 R10: 0000000000000001 R11: ffff9a51fa896000 R12: 0000000184010000 R13: 0000000184000000 R14: 0000000000010000 R15: ffff9a55cb66ea08 FS: 00007f15d3830b40(0000) GS:ffff9a55d5600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000561cf39429e0 CR3: 000000084f75f005 CR4: 00000000003626e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: vfio_remove_dma+0x17/0x70 [vfio_iommu_type1] vfio_iommu_type1_ioctl+0x9e3/0xa7b [vfio_iommu_type1] ksys_ioctl+0x92/0xb0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x4c/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f15d04c75d7 Code: <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48 Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-05-05 19:14:39 +02:00
Eric Auger	aecc7621fe	vfio_pci: Enable memory accesses before calling pci_map_rom [ Upstream commit 0cfd027be1d6def4a462cdc180c055143af24069 ] pci_map_rom/pci_get_rom_size() performs memory access in the ROM. In case the Memory Space accesses were disabled, readw() is likely to trigger a synchronous external abort on some platforms. In case memory accesses were disabled, re-enable them before the call and disable them back again just after. Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver") Signed-off-by: Eric Auger <eric.auger@redhat.com> Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-01-29 10:24:16 +01:00
Jiang Yi	017df5edfa	vfio/pci: call irq_bypass_unregister_producer() before freeing irq commit d567fb8819162099035e546b11a736e29c2af0ea upstream. Since irq_bypass_register_producer() is called after request_irq(), we should do tear-down in reverse order: irq_bypass_unregister_producer() then free_irq(). Specifically free_irq() may release resources required by the irqbypass del_producer() callback. Notably an example provided by Marc Zyngier on arm64 with GICv4 that he indicates has the potential to wedge the hardware: free_irq(irq) __free_irq(irq) irq_domain_deactivate_irq(irq) its_irq_domain_deactivate() [unmap the VLPI from the ITS] kvm_arch_irq_bypass_del_producer(cons, prod) kvm_vgic_v4_unset_forwarding(kvm, irq, ...) its_unmap_vlpi(irq) [Unmap the VLPI from the ITS (again), remap the original LPI] Signed-off-by: Jiang Yi <giangyi@amazon.com> Cc: stable@vger.kernel.org # v4.4+ Fixes: 6d7425f109d26 ("vfio: Register/unregister irq_bypass_producer") Link: https://lore.kernel.org/kvm/20191127164910.15888-1-giangyi@amazon.com Reviewed-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> [aw: commit log] Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-12-21 10:42:31 +01:00
Alexey Kardashevskiy	95c653522a	vfio/spapr_tce: Get rid of possible infinite loop [ Upstream commit 517ad4ae8aa93dccdb9a88c27257ecb421c9e848 ] As a part of cleanup, the SPAPR TCE IOMMU subdriver releases preregistered memory. If there is a bug in memory release, the loop in tce_iommu_release() becomes infinite; this actually happened to me. This makes the loop finite and prints a warning on every failure to make the code more bug prone. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-12-05 15:34:47 +01:00
Alex Williamson	03c3736e79	vfio/pci: Mask buggy SR-IOV VF INTx support [ Upstream commit db04264fe9bc0f2b62e036629f9afb530324b693 ] The SR-IOV spec requires that VFs must report zero for the INTx pin register as VFs are precluded from INTx support. It's much easier for the host kernel to understand whether a device is a VF and therefore whether a non-zero pin register value is bogus than it is to do the same in userspace. Override the INTx count for such devices and virtualize the pin register to provide a consistent view of the device to the user. As this is clearly a spec violation, warn about it to support hardware validation, but also provide a known whitelist as it doesn't do much good to continue complaining if the hardware vendor doesn't plan to fix it. Known devices with this issue: 8086:270c Tested-by: Gage Eads <gage.eads@intel.com> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-25 09:52:55 +01:00
Li Qiang	be363e27ec	vfio/pci: Fix potential memory leak in vfio_msi_cap_len [ Upstream commit 30ea32ab1951c80c6113f300fce2c70cd12659e4 ] Free allocated vdev->msi_perm in error path. Signed-off-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-11-25 09:52:55 +01:00
hexin	89a4dab0f9	vfio_pci: Restore original state on release [ Upstream commit 92c8026854c25093946e0d7fe536fd9eac440f06 ] vfio_pci_enable() saves the device's initial configuration information with the intent that it is restored in vfio_pci_disable(). However, the commit referenced in Fixes: below replaced the call to __pci_reset_function_locked(), which is not wrapped in a state save and restore, with pci_try_reset_function(), which overwrites the restored device state with the current state before applying it to the device. Reinstate use of __pci_reset_function_locked() to return to the desired behavior. Fixes: 890ed578df82 ("vfio-pci: Use pci "try" reset interface") Signed-off-by: hexin <hexin15@baidu.com> Signed-off-by: Liu Qi <liuqi16@baidu.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-10-07 18:53:12 +02:00
Louis Taylor	53ffab7df3	vfio/pci: use correct format characters [ Upstream commit 426b046b748d1f47e096e05bdcc6fb4172791307 ] When compiling with -Wformat, clang emits the following warnings: drivers/vfio/pci/vfio_pci.c:1601:5: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~ drivers/vfio/pci/vfio_pci.c:1601:13: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~ drivers/vfio/pci/vfio_pci.c:1601:21: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~~~~ drivers/vfio/pci/vfio_pci.c:1601:32: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~~~~ drivers/vfio/pci/vfio_pci.c:1605:5: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~ drivers/vfio/pci/vfio_pci.c:1605:13: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~ drivers/vfio/pci/vfio_pci.c:1605:21: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~~~~ drivers/vfio/pci/vfio_pci.c:1605:32: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~~~~ The types of these arguments are unconditionally defined, so this patch updates the format character to the correct ones for unsigned ints. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Louis Taylor <louis@kragniz.eu> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2019-05-08 07:19:10 +02:00
Alex Williamson	4f97abd571	vfio/type1: Limit DMA mappings per container commit 492855939bdb59c6f947b0b5b44af9ad82b7e38c upstream. Memory backed DMA mappings are accounted against a user's locked memory limit, including multiple mappings of the same memory. This accounting bounds the number of such mappings that a user can create. However, DMA mappings that are not backed by memory, such as DMA mappings of device MMIO via mmaps, do not make use of page pinning and therefore do not count against the user's locked memory limit. These mappings still consume memory, but the memory is not well associated to the process for the purpose of oom killing a task. To add bounding on this use case, we introduce a limit to the total number of concurrent DMA mappings that a user is allowed to create. This limit is exposed as a tunable module option where the default value of 64K is expected to be well in excess of any reasonable use case (a large virtual machine configuration would typically only make use of tens of concurrent mappings). This fixes CVE-2019-3882. Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> [groeck: Adjust for missing upstream commit] Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-05-04 08:49:10 +02:00
Geert Uytterhoeven	c6e8116307	vfio: platform: Fix reset module leak in error path [ Upstream commit 28a68387888997e8a7fa57940ea5d55f2e16b594 ] If the IOMMU group setup fails, the reset module is not released. Fixes: b5add544d677d363 ("vfio, platform: make reset driver a requirement by default") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Acked-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-08-03 07:55:13 +02:00
Gustavo A. R. Silva	40974672ae	vfio/pci: Fix potential Spectre v1 commit 0e714d27786ce1fb3efa9aac58abc096e68b1c2a upstream. info.index can be indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: drivers/vfio/pci/vfio_pci.c:734 vfio_pci_ioctl() warn: potential spectre issue 'vdev->region' Fix this by sanitizing info.index before indirectly using it to index vdev->region Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-07-25 11:23:59 +02:00
Alex Williamson	502b50e870	vfio/pci: Virtualize Maximum Read Request Size commit cf0d53ba4947aad6e471491d5b20a567cbe92e56 upstream. MRRS defines the maximum read request size a device is allowed to make. Drivers will often increase this to allow more data transfer with a single request. Completions to this request are bound by the MPS setting for the bus. Aside from device quirks (none known), it doesn't seem to make sense to set an MRRS value less than MPS, yet this is a likely scenario given that user drivers do not have a system-wide view of the PCI topology. Virtualize MRRS such that the user can set MRRS >= MPS, but use MPS as the floor value that we'll write to hardware. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-04-24 09:34:15 +02:00
Alexey Kardashevskiy	bab8ac1b27	vfio/spapr_tce: Check kzalloc() return when preregistering memory [ Upstream commit 3393af24b665cb0aea7353b05e522b03ab1e7d73 ] This adds missing checking for kzalloc() return value. Fixes: 4b6fad7097f8 ("powerpc/mm/iommu, vfio/spapr: Put pages on VFIO container shutdown") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-03-22 09:17:51 +01:00
Alexey Kardashevskiy	5d7601dcb2	vfio/powerpc/spapr_tce: Enforce IOMMU type compatibility check [ Upstream commit 1282ba7fc28dbc66c3f0e4aaafaaa228361d1ae5 ] The existing SPAPR TCE driver advertises both VFIO_SPAPR_TCE_IOMMU and VFIO_SPAPR_TCE_v2_IOMMU types to the userspace and the userspace usually picks the v2. Normally the userspace would create a container, attach an IOMMU group to it and only then set the IOMMU type (which would normally be v2). However a specific IOMMU group may not support v2, in other words it may not implement set_window/unset_window/take_ownership/ release_ownership and such a group should not be attached to a v2 container. This adds extra checks that a new group can do what the selected IOMMU type suggests. The userspace can then test the return value from ioctl(VFIO_SET_IOMMU, VFIO_SPAPR_TCE_v2_IOMMU) and try VFIO_SPAPR_TCE_IOMMU. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-03-22 09:17:51 +01:00
Alex Williamson	76d83bfc11	vfio/pci: Virtualize Maximum Payload Size [ Upstream commit 523184972b282cd9ca17a76f6ca4742394856818 ] With virtual PCI-Express chipsets, we now see userspace/guest drivers trying to match the physical MPS setting to a virtual downstream port. Of course a lone physical device surrounded by virtual interconnects cannot make a correct decision for a proper MPS setting. Instead, let's virtualize the MPS control register so that writes through to hardware are disallowed. Userspace drivers like QEMU assume they can write anything to the device and we'll filter out anything dangerous. Since mismatched MPS can lead to AER and other faults, let's add it to the kernel side rather than relying on userspace virtualization to handle it. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-25 14:23:45 +01:00
Alexey Kardashevskiy	3e550debcf	vfio/spapr: Fix missing mutex unlock when creating a window [ Upstream commit 2da64d20a0b20046d688e44f4033efd09157e29d ] Commit d9c728949ddc ("vfio/spapr: Postpone default window creation") added an additional exit to the VFIO_IOMMU_SPAPR_TCE_CREATE case and made it possible to return from tce_iommu_ioctl() without unlocking container->lock; this fixes the issue. Fixes: d9c728949ddc ("vfio/spapr: Postpone default window creation") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-12-09 22:01:54 +01:00
Arvind Yadav	812a7df655	vfio-pci: Handle error from pci_iomap [ Upstream commit e19f32da5ded958238eac1bbe001192acef191a2 ] Here, pci_iomap can fail, handle this case release selected pci regions and return -ENOMEM. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-08-06 18:59:48 -07:00
Arnd Bergmann	c7d0c0d848	vfio-pci: use 32-bit comparisons for register address for gcc-4.5 [ Upstream commit 45e869714489431625c569d21fc952428d761476 ] Using ancient compilers (gcc-4.5 or older) on ARM, we get a link failure with the vfio-pci driver: ERROR: "__aeabi_lcmp" [drivers/vfio/pci/vfio-pci.ko] undefined! The reason is that the compiler tries to do a comparison of a 64-bit range. This changes it to convert to a 32-bit number explicitly first, as newer compilers do for themselves. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-08-06 18:59:45 -07:00
Alex Williamson	8f9dec0c2d	vfio: New external user group/file match commit 5d6dee80a1e94cc284d03e06d930e60e8d3ecf7d upstream. At the point where the kvm-vfio pseudo device wants to release its vfio group reference, we can't always acquire a new reference to make that happen. The group can be in a state where we wouldn't allow a new reference to be added. This new helper function allows a caller to match a file to a group to facilitate this. Given a file and group, report if they match. Thus the caller needs to already have a group reference to match to the file. This allows the deletion of a group without acquiring a new reference. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-07-27 15:08:03 -07:00
Alex Williamson	e91a55790d	vfio: Fix group release deadlock commit 811642d8d8a82c0cce8dc2debfdaf23c5a144839 upstream. If vfio_iommu_group_notifier() acquires a group reference and that reference becomes the last reference to the group, then vfio_group_put introduces a deadlock code path where we're trying to unregister from the iommu notifier chain from within a callout of that chain. Use a work_struct to release this reference asynchronously. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-07-27 15:08:03 -07:00
Greg Kurz	ff3b1dd026	vfio/spapr: fail tce_iommu_attach_group() when iommu_data is null [ Upstream commit bd00fdf198e2da475a2f4265a83686ab42d998a8 ] The recently added mediated VFIO driver doesn't know about powerpc iommu. It thus doesn't register a struct iommu_table_group in the iommu group upon device creation. The iommu_data pointer hence remains null. This causes a kernel oops when userspace tries to set the iommu type of a container associated with a mediated device to VFIO_SPAPR_TCE_v2_IOMMU. [ 82.585440] mtty mtty: MDEV: Registered [ 87.655522] iommu: Adding device 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 to group 10 [ 87.655527] vfio_mdev 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001: MDEV: group_id = 10 [ 116.297184] Unable to handle kernel paging request for data at address 0x00000030 [ 116.297389] Faulting instruction address: 0xd000000007870524 [ 116.297465] Oops: Kernel access of bad area, sig: 11 [#1] [ 116.297611] SMP NR_CPUS=2048 [ 116.297611] NUMA [ 116.297627] PowerNV ... [ 116.297954] CPU: 33 PID: 7067 Comm: qemu-system-ppc Not tainted 4.10.0-rc5-mdev-test #8 [ 116.297993] task: c000000e7718b680 task.stack: c000000e77214000 [ 116.298025] NIP: d000000007870524 LR: d000000007870518 CTR: 0000000000000000 [ 116.298064] REGS: c000000e77217990 TRAP: 0300 Not tainted (4.10.0-rc5-mdev-test) [ 116.298103] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> [ 116.298107] CR: 84004444 XER: 00000000 [ 116.298154] CFAR: c00000000000888c DAR: 0000000000000030 DSISR: 40000000 SOFTE: 1 GPR00: d000000007870518 c000000e77217c10 d00000000787b0ed c000000eed2103c0 GPR04: 0000000000000000 0000000000000000 c000000eed2103e0 0000000f24320000 GPR08: 0000000000000104 0000000000000001 0000000000000000 d0000000078729b0 GPR12: c00000000025b7e0 c00000000fe08400 0000000000000001 000001002d31d100 GPR16: 000001002c22c850 00003ffff315c750 0000000043145680 0000000043141bc0 GPR20: ffffffffffffffed fffffffffffff000 0000000020003b65 d000000007706018 GPR24: c000000f16cf0d98 d000000007706000 c000000003f42980 c000000003f42980 GPR28: c000000f1575ac00 c000000003f429c8 0000000000000000 c000000eed2103c0 [ 116.298504] NIP [d000000007870524] tce_iommu_attach_group+0x10c/0x360 [vfio_iommu_spapr_tce] [ 116.298555] LR [d000000007870518] tce_iommu_attach_group+0x100/0x360 [vfio_iommu_spapr_tce] [ 116.298601] Call Trace: [ 116.298610] [c000000e77217c10] [d000000007870518] tce_iommu_attach_group+0x100/0x360 [vfio_iommu_spapr_tce] (unreliable) [ 116.298671] [c000000e77217cb0] [d0000000077033a0] vfio_fops_unl_ioctl+0x278/0x3e0 [vfio] [ 116.298713] [c000000e77217d40] [c0000000002a3ebc] do_vfs_ioctl+0xcc/0x8b0 [ 116.298745] [c000000e77217de0] [c0000000002a4700] SyS_ioctl+0x60/0xc0 [ 116.298782] [c000000e77217e30] [c00000000000b220] system_call+0x38/0xfc [ 116.298812] Instruction dump: [ 116.298828] 7d3f4b78 409effc8 3d220000 e9298020 3c800140 38a00018 608480c0 e8690028 [ 116.298869] 4800249d e8410018 7c7f1b79 41820230 <e93e0030> 2fa90000 419e0114 e9090020 [ 116.298914] ---[ end trace 1e10b0ced08b9120 ]--- This patch fixes the oops. Reported-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-07-05 14:40:23 +02:00
Alexey Kardashevskiy	d536202202	vfio/spapr_tce: Set window when adding additional groups to container [ Upstream commit 930a42ded3fede7ca3acafc9153f4f2d0f56a92c ] If a container already has a group attached, attaching a new group should just program already created IOMMU tables to the hardware via the iommu_table_group_ops::set_window() callback. However commit 6f01cc692a16 ("vfio/spapr: Add a helper to create default DMA window") did not just simplify the code but also removed the set_window() calls in the case of attaching groups to a container which already has tables so it broke VFIO PCI hotplug. This reverts set_window() bits in tce_iommu_take_ownership_ddw(). Fixes: 6f01cc692a16 ("vfio/spapr: Add a helper to create default DMA window") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-06-17 06:41:51 +02:00
Alex Williamson	9f43f70dcc	vfio/type1: Remove locked page accounting workqueue commit 0cfef2b7410b64d7a430947e0b533314c4f97153 upstream. If the mmap_sem is contented then the vfio type1 IOMMU backend will defer locked page accounting updates to a workqueue task. This has a few problems and depending on which side the user tries to play, they might be over-penalized for unmaps that haven't yet been accounted or race the workqueue to enter more mappings than they're allowed. The original intent of this workqueue mechanism seems to be focused on reducing latency through the ioctl, but we cannot do so at the cost of correctness. Remove this workqueue mechanism and update the callers to allow for failure. We can also now recheck the limit under write lock to make sure we don't exceed it. vfio_pin_pages_remote() also now necessarily includes an unwind path which we can jump to directly if the consecutive page pinning finds that we're exceeding the user's memory limits. This avoids the current lazy approach which does accounting and mapping up to the fault, only to return an error on the next iteration to unwind the entire vfio_dma. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-05-20 14:28:38 +02:00
Alexey Kardashevskiy	53e18968a9	vfio/spapr: Postpone default window creation [ Upstream commit d9c728949ddc9de5734bf3b12ea906ca8a77f2a0 ] We are going to allow the userspace to configure container in one memory context and pass container fd to another so we are postponing memory allocations accounted against the locked memory limit. One of previous patches took care of it_userspace. At the moment we create the default DMA window when the first group is attached to a container; this is done for the userspace which is not DDW-aware but familiar with the SPAPR TCE IOMMU v2 in the part of memory pre-registration - such client expects the default DMA window to exist. This postpones the default DMA window allocation till one of the folliwing happens: 1. first map/unmap request arrives; 2. new window is requested; This adds noop for the case when the userspace requested removal of the default window which has not been created yet. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-03-22 12:43:38 +01:00
Alexey Kardashevskiy	2e60baca23	vfio/spapr: Add a helper to create default DMA window [ Upstream commit 6f01cc692a16405235d5c34056455b182682123c ] There is already a helper to create a DMA window which does allocate a table and programs it to the IOMMU group. However tce_iommu_take_ownership_ddw() did not use it and did these 2 calls itself to simplify error path. Since we are going to delay the default window creation till the default window is accessed/removed or new window is added, we need a helper to create a default window from all these cases. This adds tce_iommu_create_default_window(). Since it relies on a VFIO container to have at least one IOMMU group (for future use), this changes tce_iommu_attach_group() to add a group to the container first and then call the new helper. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-03-22 12:43:37 +01:00
Alexey Kardashevskiy	080eb13542	powerpc/mm/iommu, vfio/spapr: Put pages on VFIO container shutdown [ Upstream commit 4b6fad7097f883335b6d9627c883cb7f276d94c9 ] At the moment the userspace tool is expected to request pinning of the entire guest RAM when VFIO IOMMU SPAPR v2 driver is present. When the userspace process finishes, all the pinned pages need to be put; this is done as a part of the userspace memory context (MM) destruction which happens on the very last mmdrop(). This approach has a problem that a MM of the userspace process may live longer than the userspace process itself as kernel threads use userspace process MMs which was runnning on a CPU where the kernel thread was scheduled to. If this happened, the MM remains referenced until this exact kernel thread wakes up again and releases the very last reference to the MM, on an idle system this can take even hours. This moves preregistered regions tracking from MM to VFIO; insteads of using mm_iommu_table_group_mem_t::used, tce_container::prereg_list is added so each container releases regions which it has pre-registered. This changes the userspace interface to return EBUSY if a memory region is already registered in a container. However it should not have any practical effect as the only userspace tool available now does register memory region once per container anyway. As tce_iommu_register_pages/tce_iommu_unregister_pages are called under container->lock, this does not need additional locking. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-03-22 12:43:37 +01:00
Alexey Kardashevskiy	92e44bcd71	vfio/spapr: Reference mm in tce_container [ Upstream commit bc82d122ae4a0e9f971f13403995898fcfa0c09e ] In some situations the userspace memory context may live longer than the userspace process itself so if we need to do proper memory context cleanup, we better have tce_container take a reference to mm_struct and use it later when the process is gone (@current or @current->mm is NULL). This references mm and stores the pointer in the container; this is done in a new helper - tce_iommu_mm_set() - when one of the following happens: - a container is enabled (IOMMU v1); - a first attempt to pre-register memory is made (IOMMU v2); - a DMA window is created (IOMMU v2). The @mm stays referenced till the container is destroyed. This replaces current->mm with container->mm everywhere except debug prints. This adds a check that current->mm is the same as the one stored in the container to prevent userspace from making changes to a memory context of other processes. DMA map/unmap ioctls() do not check for @mm as they already check for @enabled which is set after tce_iommu_mm_set() is called. This does not reference a task as multiple threads within the same mm are allowed to ioctl() to vfio and supposedly they will have same limits and capabilities and if they do not, we'll just fail with no harm made. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-03-22 12:43:37 +01:00
Alexey Kardashevskiy	5b34666bd2	powerpc/iommu: Stop using @current in mm_iommu_xxx [ Upstream commit d7baee6901b34c4895eb78efdbf13a49079d7404 ] This changes mm_iommu_xxx helpers to take mm_struct as a parameter instead of getting it from @current which in some situations may not have a valid reference to mm. This changes helpers to receive @mm and moves all references to @current to the caller, including checks for !current and !current->mm; checks in mm_iommu_preregistered() are removed as there is no caller yet. This moves the mm_iommu_adjust_locked_vm() call to the caller as it receives mm_iommu_table_group_mem_t but it needs mm. This should cause no behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-03-22 12:43:37 +01:00
Alexey Kardashevskiy	5d8b3e7559	vfio/spapr: Postpone allocation of userspace version of TCE table [ Upstream commit 39701e56f5f16ea0cf8fc9e8472e645f8de91d23 ] The iommu_table struct manages a hardware TCE table and a vmalloc'd table with corresponding userspace addresses. Both are allocated when the default DMA window is created and this happens when the very first group is attached to a container. As we are going to allow the userspace to configure container in one memory context and pas container fd to another, we have to postpones such allocations till a container fd is passed to the destination user process so we would account locked memory limit against the actual container user constrainsts. This postpones the it_userspace array allocation till it is used first time for mapping. The unmapping patch already checks if the array is allocated. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-03-22 12:43:37 +01:00
Vlad Tsyrklevich	05692d7005	vfio/pci: Fix integer overflows, bitmask check The VFIO_DEVICE_SET_IRQS ioctl did not sufficiently sanitize user-supplied integers, potentially allowing memory corruption. This patch adds appropriate integer overflow checks, checks the range bounds for VFIO_IRQ_SET_DATA_NONE, and also verifies that only single element in the VFIO_IRQ_SET_DATA_TYPE_MASK bitmask is set. VFIO_IRQ_SET_ACTION_TYPE_MASK is already correctly checked later in vfio_pci_set_irqs_ioctl(). Furthermore, a kzalloc is changed to a kcalloc because the use of a kzalloc with an integer multiplication allowed an integer overflow condition to be reached without this patch. kcalloc checks for overflow and should prevent a similar occurrence. Signed-off-by: Vlad Tsyrklevich <vlad@tsyrklevich.net> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-10-26 13:49:29 -06:00
Christoph Hellwig	61771468e0	vfio_pci: use pci_alloc_irq_vectors Simplify the interrupt setup by using the new PCI layer helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-09-29 13:36:38 -06:00
Alex Williamson	c93a97ee05	vfio-pci: Disable INTx after MSI/X teardown The MSI/X shutdown path can gratuitously enable INTx, which is not something we want to happen if we're dealing with broken INTx device. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-09-26 13:52:19 -06:00
Alex Williamson	ddf9dc0eb5	vfio-pci: Virtualize PCIe & AF FLR We use a BAR restore trick to try to detect when a user has performed a device reset, possibly through FLR or other backdoors, to put things back into a working state. This is important for backdoor resets, but we can actually just virtualize the "front door" resets provided via PCIe and AF FLR. Set these bits as virtualized + writable, allowing the default write to set them in vconfig, then we can simply check the bit, perform an FLR of our own, and clear the bit. We don't actually have the granularity in PCI to specify the type of reset we want to do, but generally devices don't implement both PCIe and AF FLR and we'll favor these over other types of reset, so we should generally lineup. We do test whether the device provides the requested FLR type to stay consistent with hardware capabilities though. This seems to fix several instance of devices getting into bad states with userspace drivers, like dpdk, running inside a VM. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Greg Rose <grose@lightfleet.com>	2016-09-26 13:52:16 -06:00
Baoyou Xie	2e06285655	vfio: platform: mark symbols static where possible We get a few warnings when building kernel with W=1: drivers/vfio/platform/vfio_platform_common.c:76:5: warning: no previous prototype for 'vfio_platform_acpi_call_reset' [-Wmissing-prototypes] drivers/vfio/platform/vfio_platform_common.c:98:6: warning: no previous prototype for 'vfio_platform_acpi_has_reset' [-Wmissing-prototypes] drivers/vfio/platform/vfio_platform_common.c:640:5: warning: no previous prototype for 'vfio_platform_of_probe' [-Wmissing-prototypes] drivers/vfio/platform/reset/vfio_platform_amdxgbe.c:59:5: warning: no previous prototype for 'vfio_platform_amdxgbe_reset' [-Wmissing-prototypes] drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c:60:5: warning: no previous prototype for 'vfio_platform_calxedaxgmac_reset' [-Wmissing-prototypes] .... In fact, these functions are only used in the file in which they are declared and don't need a declaration, but can be made static. so this patch marks these functions with 'static'. Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-09-13 16:11:37 -06:00
Wei Jiangang	8138dabbab	vfio/pci: Fix typos in comments Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-08-29 12:39:09 -06:00
Alex Williamson	c8952a7075	vfio/pci: Fix NULL pointer oops in error interrupt setup handling There are multiple cases in vfio_pci_set_ctx_trigger_single() where we assume we can safely read from our data pointer without actually checking whether the user has passed any data via the count field. VFIO_IRQ_SET_DATA_NONE in particular is entirely broken since we attempt to pull an int32_t file descriptor out before even checking the data type. The other data types assume the data pointer contains one element of their type as well. In part this is good news because we were previously restricted from doing much sanitization of parameters because it was missed in the past and we didn't want to break existing users. Clearly DATA_NONE is completely broken, so it must not have any users and we can fix it up completely. For DATA_BOOL and DATA_EVENTFD, we'll just protect ourselves, returning error when count is zero since we previously would have oopsed. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reported-by: Chris Thompson <the_cartographer@hotmail.com> Cc: stable@vger.kernel.org Reviewed-by: Eric Auger <eric.auger@redhat.com>	2016-08-08 16:16:23 -06:00
Sinan Kaya	0991bbdbf5	vfio: platform: check reset call return code during release Release call is ignoring the return code from reset call and can potentially continue even though reset call failed. If reset_required module parameter is set, this patch is going to validate the return code and will cause stack dump with WARN_ON and warn the user of failure. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-07-19 10:54:45 -06:00
Sinan Kaya	e99442323f	vfio: platform: check reset call return code during open Open call is ignoring the return code from reset call and can potentially continue even though reset call failed. If reset_required module parameter is set, this patch is going to validate the return code and will abort open if reset fails. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Reviewed-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-07-19 10:54:45 -06:00
Sinan Kaya	b5add544d6	vfio, platform: make reset driver a requirement by default The code was allowing platform devices to be used without a supporting VFIO reset driver. The hardware can be left in some inconsistent state after a guest machine abort. The reset driver will put the hardware back to safe state and disable interrupts before returning the control back to the host machine. Adding a new reset_required kernel module option to platform VFIO drivers. The default value is true for the DT and ACPI based drivers. The reset requirement value for AMBA drivers is set to false and is unchangeable to maintain the existing functionality. New requirements are: 1. A reset function needs to be implemented by the corresponding driver via DT/ACPI. 2. The reset function needs to be discovered via DT/ACPI. The probe of the driver will fail if any of the above conditions are not satisfied. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-07-19 10:26:46 -06:00
Sinan Kaya	d30daa33ec	vfio: platform: call _RST method when using ACPI The device tree code checks for the presence of a reset driver and calls the of_reset function pointer by looking up the reset driver as a module. ACPI defines _RST method to perform device level reset. After the _RST method is executed, the OS can resume using the device. _RST method is expected to stop DMA transfers and IRQs. This patch introduces two functions as vfio_platform_acpi_has_reset and vfio_platform_acpi_call_reset. The has reset method is used to declare reset capability via the ioctl flag VFIO_DEVICE_FLAGS_RESET. The call reset function is used to execute the _RST ACPI method. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-07-19 10:26:44 -06:00
Sinan Kaya	5afec27474	vfio: platform: add extra debug info argument to call reset Getting ready to bring out extra debug information to the caller so that more verbose information can be printed when an error is observed. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Reviewed-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-07-19 10:26:43 -06:00
Sinan Kaya	a12a9368e1	vfio: platform: add support for ACPI probe The code is using the compatible DT string to associate a reset driver with the actual device itself. The compatible string does not exist on ACPI based systems. HID is the unique identifier for a device driver instead. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-07-19 10:26:41 -06:00
Sinan Kaya	dc5542fb11	vfio: platform: determine reset capability Creating a new function to determine if this driver supports reset function or not. This is an attempt to abstract device tree calls from the rest of the code. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-07-19 10:26:40 -06:00
Sinan Kaya	f084aa7495	vfio: platform: move reset call to a common function The reset call sequence seems to replicate itself multiple times across the file. Grouping them together for maintenance reasons. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-07-19 10:26:38 -06:00
Sinan Kaya	7aef80cf31	vfio: platform: rename reset function Renaming the reset function to of_reset as it is only used by the device tree based platforms. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Baptiste Reynal <b.reynal@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-07-19 10:26:36 -06:00
Ilya Lesokhin	d370c917b9	vfio: fix possible use after free of vfio group The vfio group should be released after the vfio_group_try_dissolve_container call. The code should not rely on someone else to hold a reference on the group. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-07-14 14:28:16 -06:00
Yongji Xie	05f0c03fba	vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive Current vfio-pci implementation disallows to mmap sub-page(size < PAGE_SIZE) MMIO BARs because these BARs' mmio page may be shared with other BARs. This will cause some performance issues when we passthrough a PCI device with this kind of BARs. Guest will be not able to handle the mmio accesses to the BARs which leads to mmio emulations in host. However, not all sub-page BARs will share page with other BARs. We should allow to mmap the sub-page MMIO BARs which we can make sure will not share page with other BARs. This patch adds support for this case. And we try to add a dummy resource to reserve the remainder of the page which hot-add device's BAR might be assigned into. But it's not necessary to handle the case when the BAR is not page aligned. Because we can't expect the BAR will be assigned into the same location in a page in guest when we passthrough the BAR. And it's hard to access this BAR in userspace because we have no way to get the BAR's location in a page. Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-07-08 10:06:04 -06:00
Peng Fan	9698cbf0be	vfio: platform: support No-IOMMU mode The vfio No-IOMMU mode was supported by this 'commit 03a76b60f8ba2797 ("vfio: Include No-IOMMU mode")', but it only support vfio-pci. Using vfio_iommu_group_get/put, but not iommu_group_get/put, the platform devices can be exposed to userspace with CONFIG_VFIO_NOIOMMU and the "enable_unsafe_noiommu_mode" option enabled. From 'commit 03a76b60f8ba2797 ("vfio: Include No-IOMMU mode")', "This should make it very clear that this mode is not safe. Additionally, CAP_SYS_RAWIO privileges are necessary to work with groups and containers using this mode. Groups making use of this support are named /dev/vfio/noiommu-$GROUP and can only make use of the special VFIO_NOIOMMU_IOMMU for the container. Use of this mode, specifically binding a device without a native IOMMU group to a VFIO bus driver will taint the kernel and should therefore not be considered supported." Signed-off-by: Peng Fan <van.freenix@gmail.com> Cc: Eric Auger <eric.auger@linaro.org> Cc: Baptiste Reynal <b.reynal@virtualopensystems.com> Cc: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-06-23 09:37:17 -06:00
Alex Williamson	ce7585f3c4	vfio/pci: Allow VPD short read The size of the VPD area is not necessarily 4-byte aligned, so a pci_vpd_read() might return less than 4 bytes. Zero our buffer and accept anything other than an error. Intel X710 NICs exercise this. Fixes: 4e1a635552d3 ("vfio/pci: Use kernel VPD access functions") Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-05-31 21:25:52 -06:00
Alex Williamson	089f1c6b2d	vfio/type1: Fix build warning This function cannot actually be called with npage = 0, so in practice this doesn't return an uninitialized value. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2016-05-30 07:58:10 -06:00

1 2 3 4 5 ...

273 Commits