150 Commits

Author SHA1 Message Date
Alex Williamson
d067a6b62b vfio/pci: Handle concurrent vma faults
[ Upstream commit 6a45ece4c9af473555f01f0f8b97eba56e3c7d0d ]

io_remap_pfn_range() will trigger a BUG_ON if it encounters a
populated pte within the mapping range.  This can occur because we map
the entire vma on fault and multiple faults can be blocked behind the
vma_lock.  This leads to traces like the one reported below.

We can use our vma_list to test whether a given vma is mapped to avoid
this issue.

[ 1591.733256] kernel BUG at mm/memory.c:2177!
[ 1591.739515] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 1591.747381] Modules linked in: vfio_iommu_type1 vfio_pci vfio_virqfd vfio pv680_mii(O)
[ 1591.760536] CPU: 2 PID: 227 Comm: lcore-worker-2 Tainted: G O 5.11.0-rc3+ #1
[ 1591.770735] Hardware name:  , BIOS HixxxxFPGA 1P B600 V121-1
[ 1591.778872] pstate: 40400009 (nZcv daif +PAN -UAO -TCO BTYPE=--)
[ 1591.786134] pc : remap_pfn_range+0x214/0x340
[ 1591.793564] lr : remap_pfn_range+0x1b8/0x340
[ 1591.799117] sp : ffff80001068bbd0
[ 1591.803476] x29: ffff80001068bbd0 x28: 0000042eff6f0000
[ 1591.810404] x27: 0000001100910000 x26: 0000001300910000
[ 1591.817457] x25: 0068000000000fd3 x24: ffffa92f1338e358
[ 1591.825144] x23: 0000001140000000 x22: 0000000000000041
[ 1591.832506] x21: 0000001300910000 x20: ffffa92f141a4000
[ 1591.839520] x19: 0000001100a00000 x18: 0000000000000000
[ 1591.846108] x17: 0000000000000000 x16: ffffa92f11844540
[ 1591.853570] x15: 0000000000000000 x14: 0000000000000000
[ 1591.860768] x13: fffffc0000000000 x12: 0000000000000880
[ 1591.868053] x11: ffff0821bf3d01d0 x10: ffff5ef2abd89000
[ 1591.875932] x9 : ffffa92f12ab0064 x8 : ffffa92f136471c0
[ 1591.883208] x7 : 0000001140910000 x6 : 0000000200000000
[ 1591.890177] x5 : 0000000000000001 x4 : 0000000000000001
[ 1591.896656] x3 : 0000000000000000 x2 : 0168044000000fd3
[ 1591.903215] x1 : ffff082126261880 x0 : fffffc2084989868
[ 1591.910234] Call trace:
[ 1591.914837]  remap_pfn_range+0x214/0x340
[ 1591.921765]  vfio_pci_mmap_fault+0xac/0x130 [vfio_pci]
[ 1591.931200]  __do_fault+0x44/0x12c
[ 1591.937031]  handle_mm_fault+0xcc8/0x1230
[ 1591.942475]  do_page_fault+0x16c/0x484
[ 1591.948635]  do_translation_fault+0xbc/0xd8
[ 1591.954171]  do_mem_abort+0x4c/0xc0
[ 1591.960316]  el0_da+0x40/0x80
[ 1591.965585]  el0_sync_handler+0x168/0x1b0
[ 1591.971608]  el0_sync+0x174/0x180
[ 1591.978312] Code: eb1b027f 540000c0 f9400022 b4fffe02 (d4210000)

Fixes: 11c4cd07ba11 ("vfio-pci: Fault mmaps to enable vma tracking")
Reported-by: Zeng Tao <prime.zeng@hisilicon.com>
Suggested-by: Zeng Tao <prime.zeng@hisilicon.com>
Link: https://lore.kernel.org/r/162497742783.3883260.3282953006487785034.stgit@omen
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-20 16:15:58 +02:00
Randy Dunlap
2b2ca3ee36 vfio/pci: zap_vma_ptes() needs MMU
[ Upstream commit 2a55ca37350171d9b43d561528f23d4130097255 ]

zap_vma_ptes() is only available when CONFIG_MMU is set/enabled.
Without CONFIG_MMU, vfio_pci.o has build errors, so make
VFIO_PCI depend on MMU.

riscv64-linux-ld: drivers/vfio/pci/vfio_pci.o: in function `vfio_pci_mmap_open':
vfio_pci.c:(.text+0x1ec): undefined reference to `zap_vma_ptes'
riscv64-linux-ld: drivers/vfio/pci/vfio_pci.o: in function `.L0 ':
vfio_pci.c:(.text+0x165c): undefined reference to `zap_vma_ptes'

Fixes: 11c4cd07ba11 ("vfio-pci: Fault mmaps to enable vma tracking")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210515190856.2130-1-rdunlap@infradead.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:24:04 +02:00
Zhen Lei
c953aee0d8 vfio/pci: Fix error return code in vfio_ecap_init()
[ Upstream commit d1ce2c79156d3baf0830990ab06d296477b93c26 ]

The error code returned from vfio_ext_cap_len() is stored in 'len', not
in 'ret'.

Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Message-Id: <20210515020458.6771-1-thunder.leizhen@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-10 13:24:04 +02:00
Eric Auger
706a63e840 vfio/pci: Move dummy_resources_list init in vfio_pci_probe()
[ Upstream commit 16b8fe4caf499ae8e12d2ab1b1324497e36a7b83 ]

In case an error occurs in vfio_pci_enable() before the call to
vfio_pci_probe_mmaps(), vfio_pci_disable() will  try to iterate
on an uninitialized list and cause a kernel panic.

Lets move to the initialization to vfio_pci_probe() to fix the
issue.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Fixes: 05f0c03fbac1 ("vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive")
CC: Stable <stable@vger.kernel.org> # v4.7+
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-06 14:44:59 +01:00
Jason Gunthorpe
bb3130192f vfio-pci: Use io_remap_pfn_range() for PCI IO memory
[ Upstream commit 7b06a56d468b756ad6bb43ac21b11e474ebc54a0 ]

commit f8f6ae5d077a ("mm: always have io_remap_pfn_range() set
pgprot_decrypted()") allows drivers using mmap to put PCI memory mapped
BAR space into userspace to work correctly on AMD SME systems that default
to all memory encrypted.

Since vfio_pci_mmap_fault() is working with PCI memory mapped BAR space it
should be calling io_remap_pfn_range() otherwise it will not work on SME
systems.

Fixes: 11c4cd07ba11 ("vfio-pci: Fault mmaps to enable vma tracking")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Peter Xu <peterx@redhat.com>
Tested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30 11:25:59 +01:00
Alex Williamson
f393f4a5fc vfio/pci: Clear token on bypass registration failure
[ Upstream commit 852b1beecb6ff9326f7ca4bc0fe69ae860ebdb9e ]

The eventfd context is used as our irqbypass token, therefore if an
eventfd is re-used, our token is the same.  The irqbypass code will
return an -EBUSY in this case, but we'll still attempt to unregister
the producer, where if that duplicate token still exists, results in
removing the wrong object.  Clear the token of failed producers so
that they harmlessly fall out when unregistered.

Fixes: 6d7425f109d2 ("vfio: Register/unregister irq_bypass_producer")
Reported-by: guomin chen <guomin_chen@sina.com>
Tested-by: guomin chen <guomin_chen@sina.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-30 10:38:23 +01:00
Zeng Tao
0d1682ca6d vfio/pci: fix racy on error and request eventfd ctx
[ Upstream commit b872d0640840018669032b20b6375a478ed1f923 ]

The vfio_pci_release call will free and clear the error and request
eventfd ctx while these ctx could be in use at the same time in the
function like vfio_pci_request, and it's expected to protect them under
the vdev->igate mutex, which is missing in vfio_pci_release.

This issue is introduced since commit 1518ac272e78 ("vfio/pci: fix memory
leaks of eventfd ctx"),and since commit 5c5866c593bb ("vfio/pci: Clear
error and request eventfd ctx after releasing"), it's very easily to
trigger the kernel panic like this:

[ 9513.904346] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 9513.913091] Mem abort info:
[ 9513.915871]   ESR = 0x96000006
[ 9513.918912]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 9513.924198]   SET = 0, FnV = 0
[ 9513.927238]   EA = 0, S1PTW = 0
[ 9513.930364] Data abort info:
[ 9513.933231]   ISV = 0, ISS = 0x00000006
[ 9513.937048]   CM = 0, WnR = 0
[ 9513.940003] user pgtable: 4k pages, 48-bit VAs, pgdp=0000007ec7d12000
[ 9513.946414] [0000000000000008] pgd=0000007ec7d13003, p4d=0000007ec7d13003, pud=0000007ec728c003, pmd=0000000000000000
[ 9513.956975] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 9513.962521] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio hclge hns3 hnae3 [last unloaded: vfio_pci]
[ 9513.972998] CPU: 4 PID: 1327 Comm: bash Tainted: G        W         5.8.0-rc4+ #3
[ 9513.980443] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B270.01 05/08/2020
[ 9513.989274] pstate: 80400089 (Nzcv daIf +PAN -UAO BTYPE=--)
[ 9513.994827] pc : _raw_spin_lock_irqsave+0x48/0x88
[ 9513.999515] lr : eventfd_signal+0x6c/0x1b0
[ 9514.003591] sp : ffff800038a0b960
[ 9514.006889] x29: ffff800038a0b960 x28: ffff007ef7f4da10
[ 9514.012175] x27: ffff207eefbbfc80 x26: ffffbb7903457000
[ 9514.017462] x25: ffffbb7912191000 x24: ffff007ef7f4d400
[ 9514.022747] x23: ffff20be6e0e4c00 x22: 0000000000000008
[ 9514.028033] x21: 0000000000000000 x20: 0000000000000000
[ 9514.033321] x19: 0000000000000008 x18: 0000000000000000
[ 9514.038606] x17: 0000000000000000 x16: ffffbb7910029328
[ 9514.043893] x15: 0000000000000000 x14: 0000000000000001
[ 9514.049179] x13: 0000000000000000 x12: 0000000000000002
[ 9514.054466] x11: 0000000000000000 x10: 0000000000000a00
[ 9514.059752] x9 : ffff800038a0b840 x8 : ffff007ef7f4de60
[ 9514.065038] x7 : ffff007fffc96690 x6 : fffffe01faffb748
[ 9514.070324] x5 : 0000000000000000 x4 : 0000000000000000
[ 9514.075609] x3 : 0000000000000000 x2 : 0000000000000001
[ 9514.080895] x1 : ffff007ef7f4d400 x0 : 0000000000000000
[ 9514.086181] Call trace:
[ 9514.088618]  _raw_spin_lock_irqsave+0x48/0x88
[ 9514.092954]  eventfd_signal+0x6c/0x1b0
[ 9514.096691]  vfio_pci_request+0x84/0xd0 [vfio_pci]
[ 9514.101464]  vfio_del_group_dev+0x150/0x290 [vfio]
[ 9514.106234]  vfio_pci_remove+0x30/0x128 [vfio_pci]
[ 9514.111007]  pci_device_remove+0x48/0x108
[ 9514.115001]  device_release_driver_internal+0x100/0x1b8
[ 9514.120200]  device_release_driver+0x28/0x38
[ 9514.124452]  pci_stop_bus_device+0x68/0xa8
[ 9514.128528]  pci_stop_and_remove_bus_device+0x20/0x38
[ 9514.133557]  pci_iov_remove_virtfn+0xb4/0x128
[ 9514.137893]  sriov_disable+0x3c/0x108
[ 9514.141538]  pci_disable_sriov+0x28/0x38
[ 9514.145445]  hns3_pci_sriov_configure+0x48/0xb8 [hns3]
[ 9514.150558]  sriov_numvfs_store+0x110/0x198
[ 9514.154724]  dev_attr_store+0x44/0x60
[ 9514.158373]  sysfs_kf_write+0x5c/0x78
[ 9514.162018]  kernfs_fop_write+0x104/0x210
[ 9514.166010]  __vfs_write+0x48/0x90
[ 9514.169395]  vfs_write+0xbc/0x1c0
[ 9514.172694]  ksys_write+0x74/0x100
[ 9514.176079]  __arm64_sys_write+0x24/0x30
[ 9514.179987]  el0_svc_common.constprop.4+0x110/0x200
[ 9514.184842]  do_el0_svc+0x34/0x98
[ 9514.188144]  el0_svc+0x14/0x40
[ 9514.191185]  el0_sync_handler+0xb0/0x2d0
[ 9514.195088]  el0_sync+0x140/0x180
[ 9514.198389] Code: b9001020 d2800000 52800022 f9800271 (885ffe61)
[ 9514.204455] ---[ end trace 648de00c8406465f ]---
[ 9514.212308] note: bash[1327] exited with preempt_count 1

Cc: Qian Cai <cai@lca.pw>
Cc: Alex Williamson <alex.williamson@redhat.com>
Fixes: 1518ac272e78 ("vfio/pci: fix memory leaks of eventfd ctx")
Signed-off-by: Zeng Tao <prime.zeng@hisilicon.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 13:14:49 +02:00
Alex Williamson
41a7729880 vfio/pci: Clear error and request eventfd ctx after releasing
[ Upstream commit 5c5866c593bbd444d0339ede6a8fb5f14ff66d72 ]

The next use of the device will generate an underflow from the
stale reference.

Cc: Qian Cai <cai@lca.pw>
Fixes: 1518ac272e78 ("vfio/pci: fix memory leaks of eventfd ctx")
Reported-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Tested-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 13:14:48 +02:00
Qian Cai
b7e24664cc vfio/pci: fix memory leaks of eventfd ctx
[ Upstream commit 1518ac272e789cae8c555d69951b032a275b7602 ]

Finished a qemu-kvm (-device vfio-pci,host=0001:01:00.0) triggers a few
memory leaks after a while because vfio_pci_set_ctx_trigger_single()
calls eventfd_ctx_fdget() without the matching eventfd_ctx_put() later.
Fix it by calling eventfd_ctx_put() for those memory in
vfio_pci_release() before vfio_device_release().

unreferenced object 0xebff008981cc2b00 (size 128):
  comm "qemu-kvm", pid 4043, jiffies 4294994816 (age 9796.310s)
  hex dump (first 32 bytes):
    01 00 00 00 6b 6b 6b 6b 00 00 00 00 ad 4e ad de  ....kkkk.....N..
    ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
  backtrace:
    [<00000000917e8f8d>] slab_post_alloc_hook+0x74/0x9c
    [<00000000df0f2aa2>] kmem_cache_alloc_trace+0x2b4/0x3d4
    [<000000005fcec025>] do_eventfd+0x54/0x1ac
    [<0000000082791a69>] __arm64_sys_eventfd2+0x34/0x44
    [<00000000b819758c>] do_el0_svc+0x128/0x1dc
    [<00000000b244e810>] el0_sync_handler+0xd0/0x268
    [<00000000d495ef94>] el0_sync+0x164/0x180
unreferenced object 0x29ff008981cc4180 (size 128):
  comm "qemu-kvm", pid 4043, jiffies 4294994818 (age 9796.290s)
  hex dump (first 32 bytes):
    01 00 00 00 6b 6b 6b 6b 00 00 00 00 ad 4e ad de  ....kkkk.....N..
    ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
  backtrace:
    [<00000000917e8f8d>] slab_post_alloc_hook+0x74/0x9c
    [<00000000df0f2aa2>] kmem_cache_alloc_trace+0x2b4/0x3d4
    [<000000005fcec025>] do_eventfd+0x54/0x1ac
    [<0000000082791a69>] __arm64_sys_eventfd2+0x34/0x44
    [<00000000b819758c>] do_el0_svc+0x128/0x1dc
    [<00000000b244e810>] el0_sync_handler+0xd0/0x268
    [<00000000d495ef94>] el0_sync+0x164/0x180

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01 13:14:46 +02:00
Alex Williamson
81fb345971 vfio/pci: Fix SR-IOV VF handling with MMIO blocking
commit ebfa440ce38b7e2e04c3124aa89c8a9f4094cf21 upstream.

SR-IOV VFs do not implement the memory enable bit of the command
register, therefore this bit is not set in config space after
pci_enable_device().  This leads to an unintended difference
between PF and VF in hand-off state to the user.  We can correct
this by setting the initial value of the memory enable bit in our
virtualized config space.  There's really no need however to
ever fault a user on a VF though as this would only indicate an
error in the user's management of the enable bit, versus a PF
where the same access could trigger hardware faults.

Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-09 19:04:32 +02:00
Alex Williamson
da7aea6eb5 vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
commit abafbc551fddede3e0a08dee1dcde08fc0eb8476 upstream.

Accessing the disabled memory space of a PCI device would typically
result in a master abort response on conventional PCI, or an
unsupported request on PCI express.  The user would generally see
these as a -1 response for the read return data and the write would be
silently discarded, possibly with an uncorrected, non-fatal AER error
triggered on the host.  Some systems however take it upon themselves
to bring down the entire system when they see something that might
indicate a loss of data, such as this discarded write to a disabled
memory space.

To avoid this, we want to try to block the user from accessing memory
spaces while they're disabled.  We start with a semaphore around the
memory enable bit, where writers modify the memory enable state and
must be serialized, while readers make use of the memory region and
can access in parallel.  Writers include both direct manipulation via
the command register, as well as any reset path where the internal
mechanics of the reset may both explicitly and implicitly disable
memory access, and manipulation of the MSI-X configuration, where the
MSI-X vector table resides in MMIO space of the device.  Readers
include the read and write file ops to access the vfio device fd
offsets as well as memory mapped access.  In the latter case, we make
use of our new vma list support to zap, or invalidate, those memory
mappings in order to force them to be faulted back in on access.

Our semaphore usage will stall user access to MMIO spaces across
internal operations like reset, but the user might experience new
behavior when trying to access the MMIO space while disabled via the
PCI command register.  Access via read or write while disabled will
return -EIO and access via memory maps will result in a SIGBUS.  This
is expected to be compatible with known use cases and potentially
provides better error handling capabilities than present in the
hardware, while avoiding the more readily accessible and severe
platform error responses that might otherwise occur.

Fixes: CVE-2020-12888
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
[Ajay: Regenerated the patch for v4.19]
Signed-off-by: Ajay Kaher <akaher@vmware.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-09 19:04:28 +02:00
Alex Williamson
6c7f2f24a8 vfio-pci: Fault mmaps to enable vma tracking
commit 11c4cd07ba111a09f49625f9e4c851d83daf0a22 upstream.

Rather than calling remap_pfn_range() when a region is mmap'd, setup
a vm_ops handler to support dynamic faulting of the range on access.
This allows us to manage a list of vmas actively mapping the area that
we can later use to invalidate those mappings.  The open callback
invalidates the vma range so that all tracking is inserted in the
fault handler and removed in the close handler.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
[Ajay: Regenerated the patch for v4.19]
Signed-off-by: Ajay Kaher <akaher@vmware.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-09 19:04:28 +02:00
Alex Williamson
e9aaff2b5f vfio-pci: Mask cap zero
[ Upstream commit bc138db1b96264b9c1779cf18d5a3b186aa90066 ]

The PCI Code and ID Assignment Specification changed capability ID 0
from reserved to a NULL capability in the v1.1 revision.  The NULL
capability is defined to include only the 16-bit capability header,
ie. only the ID and next pointer.  Unfortunately vfio-pci creates a
map of config space, where ID 0 is used to reserve the standard type
0 header.  Finding an actual capability with this ID therefore results
in a bogus range marked in that map and conflicts with subsequent
capabilities.  As this seems to be a dummy capability anyway and we
already support dropping capabilities, let's hide this one rather than
delving into the potentially subtle dependencies within our map.

Seen on an NVIDIA Tesla T4.

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-25 15:32:57 +02:00
Qian Cai
78914f1689 vfio/pci: fix memory leaks in alloc_perm_bits()
[ Upstream commit 3e63b94b6274324ff2e7d8615df31586de827c4e ]

vfio_pci_disable() calls vfio_config_free() but forgets to call
free_perm_bits() resulting in memory leaks,

unreferenced object 0xc000000c4db2dee0 (size 16):
  comm "qemu-kvm", pid 4305, jiffies 4295020272 (age 3463.780s)
  hex dump (first 16 bytes):
    00 00 ff 00 ff ff ff ff ff ff ff ff ff ff 00 00  ................
  backtrace:
    [<00000000a6a4552d>] alloc_perm_bits+0x58/0xe0 [vfio_pci]
    [<00000000ac990549>] vfio_config_init+0xdf0/0x11b0 [vfio_pci]
    init_pci_cap_msi_perm at drivers/vfio/pci/vfio_pci_config.c:1125
    (inlined by) vfio_msi_cap_len at drivers/vfio/pci/vfio_pci_config.c:1180
    (inlined by) vfio_cap_len at drivers/vfio/pci/vfio_pci_config.c:1241
    (inlined by) vfio_cap_init at drivers/vfio/pci/vfio_pci_config.c:1468
    (inlined by) vfio_config_init at drivers/vfio/pci/vfio_pci_config.c:1707
    [<000000006db873a1>] vfio_pci_open+0x234/0x700 [vfio_pci]
    [<00000000630e1906>] vfio_group_fops_unl_ioctl+0x8e0/0xb84 [vfio]
    [<000000009e34c54f>] ksys_ioctl+0xd8/0x130
    [<000000006577923d>] sys_ioctl+0x28/0x40
    [<000000006d7b1cf2>] system_call_exception+0x114/0x1e0
    [<0000000008ea7dd5>] system_call_common+0xf0/0x278
unreferenced object 0xc000000c4db2e330 (size 16):
  comm "qemu-kvm", pid 4305, jiffies 4295020272 (age 3463.780s)
  hex dump (first 16 bytes):
    00 ff ff 00 ff ff ff ff ff ff ff ff ff ff 00 00  ................
  backtrace:
    [<000000004c71914f>] alloc_perm_bits+0x44/0xe0 [vfio_pci]
    [<00000000ac990549>] vfio_config_init+0xdf0/0x11b0 [vfio_pci]
    [<000000006db873a1>] vfio_pci_open+0x234/0x700 [vfio_pci]
    [<00000000630e1906>] vfio_group_fops_unl_ioctl+0x8e0/0xb84 [vfio]
    [<000000009e34c54f>] ksys_ioctl+0xd8/0x130
    [<000000006577923d>] sys_ioctl+0x28/0x40
    [<000000006d7b1cf2>] system_call_exception+0x114/0x1e0
    [<0000000008ea7dd5>] system_call_common+0xf0/0x278

Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver")
Signed-off-by: Qian Cai <cai@lca.pw>
[aw: rolled in follow-up patch]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-25 15:32:50 +02:00
Eric Auger
c8fa33f831 vfio_pci: Enable memory accesses before calling pci_map_rom
[ Upstream commit 0cfd027be1d6def4a462cdc180c055143af24069 ]

pci_map_rom/pci_get_rom_size() performs memory access in the ROM.
In case the Memory Space accesses were disabled, readw() is likely
to trigger a synchronous external abort on some platforms.

In case memory accesses were disabled, re-enable them before the
call and disable them back again just after.

Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-27 14:50:24 +01:00
Jiang Yi
73155507af vfio/pci: call irq_bypass_unregister_producer() before freeing irq
commit d567fb8819162099035e546b11a736e29c2af0ea upstream.

Since irq_bypass_register_producer() is called after request_irq(), we
should do tear-down in reverse order: irq_bypass_unregister_producer()
then free_irq().

Specifically free_irq() may release resources required by the
irqbypass del_producer() callback.  Notably an example provided by
Marc Zyngier on arm64 with GICv4 that he indicates has the potential
to wedge the hardware:

 free_irq(irq)
   __free_irq(irq)
     irq_domain_deactivate_irq(irq)
       its_irq_domain_deactivate()
         [unmap the VLPI from the ITS]

 kvm_arch_irq_bypass_del_producer(cons, prod)
   kvm_vgic_v4_unset_forwarding(kvm, irq, ...)
     its_unmap_vlpi(irq)
       [Unmap the VLPI from the ITS (again), remap the original LPI]

Signed-off-by: Jiang Yi <giangyi@amazon.com>
Cc: stable@vger.kernel.org # v4.4+
Fixes: 6d7425f109d26 ("vfio: Register/unregister irq_bypass_producer")
Link: https://lore.kernel.org/kvm/20191127164910.15888-1-giangyi@amazon.com
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
[aw: commit log]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-21 10:57:37 +01:00
Alex Williamson
1ae5bfee6c vfio/pci: Mask buggy SR-IOV VF INTx support
[ Upstream commit db04264fe9bc0f2b62e036629f9afb530324b693 ]

The SR-IOV spec requires that VFs must report zero for the INTx pin
register as VFs are precluded from INTx support.  It's much easier for
the host kernel to understand whether a device is a VF and therefore
whether a non-zero pin register value is bogus than it is to do the
same in userspace.  Override the INTx count for such devices and
virtualize the pin register to provide a consistent view of the device
to the user.

As this is clearly a spec violation, warn about it to support hardware
validation, but also provide a known whitelist as it doesn't do much
good to continue complaining if the hardware vendor doesn't plan to
fix it.

Known devices with this issue: 8086:270c

Tested-by: Gage Eads <gage.eads@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-20 18:47:32 +01:00
Li Qiang
dde3433de9 vfio/pci: Fix potential memory leak in vfio_msi_cap_len
[ Upstream commit 30ea32ab1951c80c6113f300fce2c70cd12659e4 ]

Free allocated vdev->msi_perm in error path.

Signed-off-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-20 18:47:32 +01:00
hexin
9b2d2f2ad0 vfio_pci: Restore original state on release
[ Upstream commit 92c8026854c25093946e0d7fe536fd9eac440f06 ]

vfio_pci_enable() saves the device's initial configuration information
with the intent that it is restored in vfio_pci_disable().  However,
the commit referenced in Fixes: below replaced the call to
__pci_reset_function_locked(), which is not wrapped in a state save
and restore, with pci_try_reset_function(), which overwrites the
restored device state with the current state before applying it to the
device.  Reinstate use of __pci_reset_function_locked() to return to
the desired behavior.

Fixes: 890ed578df82 ("vfio-pci: Use pci "try" reset interface")
Signed-off-by: hexin <hexin15@baidu.com>
Signed-off-by: Liu Qi <liuqi16@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-07 18:56:53 +02:00
Louis Taylor
4d043d3d2b vfio/pci: use correct format characters
[ Upstream commit 426b046b748d1f47e096e05bdcc6fb4172791307 ]

When compiling with -Wformat, clang emits the following warnings:

drivers/vfio/pci/vfio_pci.c:1601:5: warning: format specifies type
      'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                vendor, device, subvendor, subdevice,
                                ^~~~~~

drivers/vfio/pci/vfio_pci.c:1601:13: warning: format specifies type
      'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                vendor, device, subvendor, subdevice,
                                        ^~~~~~

drivers/vfio/pci/vfio_pci.c:1601:21: warning: format specifies type
      'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                vendor, device, subvendor, subdevice,
                                                ^~~~~~~~~

drivers/vfio/pci/vfio_pci.c:1601:32: warning: format specifies type
      'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                vendor, device, subvendor, subdevice,
                                                           ^~~~~~~~~

drivers/vfio/pci/vfio_pci.c:1605:5: warning: format specifies type
      'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                vendor, device, subvendor, subdevice,
                                ^~~~~~

drivers/vfio/pci/vfio_pci.c:1605:13: warning: format specifies type
      'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                vendor, device, subvendor, subdevice,
                                        ^~~~~~

drivers/vfio/pci/vfio_pci.c:1605:21: warning: format specifies type
      'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                vendor, device, subvendor, subdevice,
                                                ^~~~~~~~~

drivers/vfio/pci/vfio_pci.c:1605:32: warning: format specifies type
      'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                vendor, device, subvendor, subdevice,
                                                           ^~~~~~~~~
The types of these arguments are unconditionally defined, so this patch
updates the format character to the correct ones for unsigned ints.

Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Louis Taylor <louis@kragniz.eu>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-08 07:21:49 +02:00
Linus Torvalds
b6d6a3076a VFIO updates for v4.19
- Mark switch fall throughs (Gustavo A. R. Silva)
 
  - Disable binding SR-IOV enabled PFs (Alex Williamson)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJbdaKVAAoJECObm247sIsipVMQAIIVa7zDQ1NoRooJOvy/pnYb
 OSkb0oQ+tnqjHcjOF0bJWfrH/cEmVs9qngS+mSBm1jnX9eD5PMugRXGu67pTOMOM
 4Z84KAKJKNaDaVFvMLRqNH1vU1ekF9fgcvG6e8lR9tOfNpaL02umxa8oIbyPXyrm
 tg8EzpXG/ENjKnPOXLGFz7tLwCPinElYk2aP6hHj5bwv50roE8aQFsQszSF+q8AU
 wJe8nXI6+ChqiLt3OulC5vPZzlyCMWOVslEvNSHQL3IDelx4UronoTIJ4N+Oq1Lr
 YD9jU1ec9KFAG9j13Ivls8OLF+w6jLfDT9TzentEcRMW4iWjPgyaAjiBF/eZKTE0
 L8GhmhWB1OOoltg7ibEV5mi+GmTApzHpiEBT5REWCoZbTq7tGqxw7NyAVM1+IB83
 dy02M7dbB/FztkxkgoN3r+R+jdvEi1ZvqKzYv6Di4eJA2v/5anzJCsLsImuY52iI
 +lgDrKZOcFncxebIrLJN8A/Fd3fe3MWg+A7PwqOmp5uZtn1bQDVdn+HcE6jmD0c7
 HmT/vfBNLkOWLchApacqEbJXyW6MQkNb/kPL1W2LKCyrhUZWp94dv9tTtyQibVUv
 As4bgLJRKjpmxOV2frW78eBVSWsBYIsSlrnEeEFrR2NtH3ElX2CyMQgdp1TC59F8
 PTk50WP4sJWlrjagFoKQ
 =Zb1v
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v4.19-rc1' of git://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - mark switch fall-through cases (Gustavo A. R. Silva)

 - disable binding SR-IOV enabled PFs (Alex Williamson)

* tag 'vfio-v4.19-rc1' of git://github.com/awilliam/linux-vfio:
  vfio-pci: Disable binding to PFs with SR-IOV enabled
  vfio: Mark expected switch fall-throughs
2018-08-16 10:34:05 -07:00
Linus Torvalds
4e31843f68 pci-v4.19-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlt1f9AUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxbdhAArnhRvkwOk4m4/LCuKF6HpmlxbBNC
 TjnBCenNf+lFXzWskfDFGFl/Wif4UzGbRTSCNQrwMzj3Ww3f/6R2QIq9rEJvyNC4
 VdxQnaBEZSUgN87q5UGqgdjMTo3zFvlFH6fpb5XDiQ5IX/QZeXeYqoB64w+HvKPU
 M+IsoOvnA5gb7pMcpchrGUnSfS1e6AqQbbTt6tZflore6YCEA4cH5OnpGx8qiZIp
 ut+CMBvQjQB01fHeBc/wGrVte4NwXdONrXqpUb4sHF7HqRNfEh0QVyPhvebBi+k1
 kquqoBQfPFTqgcab31VOcQhg70dEx+1qGm5/YBAwmhCpHR/g2gioFXoROsr+iUOe
 BtF6LZr+Y8cySuhJnkCrJBqWvvBaKbJLg0KMbI+7p4o9MZpod2u7LS5LFrlRDyKW
 3nz3o+b1+v3tCCKVKIhKo0ljolgkweQtR1f6KIHvq93wBODHVQnAOt9NlPfHVyks
 ryGBnOhMjoU5hvfexgIWFk9Ph9MEVQSffkI+TeFPO/tyGBfGfQyGtESiXuEaMQaH
 FGdZHX2RLkY3pWHOtWeMzRHzOnr2XjpDFcAqL3HBGPdJ30K3Umv3WOgoFe2SaocG
 0gaddPjKSwwM4Sa/VP+O5cjGuzi7QnczSDdpYjxIGZzBav32hqx4/rsnLw7bHH8y
 XkEme7cYJc8MGsA=
 =2Dmn
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci updates from Bjorn Helgaas:

 - Decode AER errors with names similar to "lspci" (Tyler Baicar)

 - Expose AER statistics in sysfs (Rajat Jain)

 - Clear AER status bits selectively based on the type of recovery (Oza
   Pawandeep)

 - Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST (Alexandru
   Gagniuc)

 - Don't clear AER status bits if we're using the "Firmware-First"
   strategy where firmware owns the registers (Alexandru Gagniuc)

 - Use sysfs_match_string() to simplify ASPM sysfs parsing (Andy
   Shevchenko)

 - Remove unnecessary includes of <linux/pci-aspm.h> (Bjorn Helgaas)

 - Defer DPC event handling to work queue (Keith Busch)

 - Use threaded IRQ for DPC bottom half (Keith Busch)

 - Print AER status while handling DPC events (Keith Busch)

 - Work around IDT switch ACS Source Validation erratum (James
   Puthukattukaran)

 - Emit diagnostics for all cases of PCIe Link downtraining (Links
   operating slower than they're capable of) (Alexandru Gagniuc)

 - Skip VFs when configuring Max Payload Size (Myron Stowe)

 - Reduce Root Port Max Payload Size if necessary when hot-adding a
   device below it (Myron Stowe)

 - Simplify SHPC existence/permission checks (Bjorn Helgaas)

 - Remove hotplug sample skeleton driver (Lukas Wunner)

 - Convert pciehp to threaded IRQ handling (Lukas Wunner)

 - Improve pciehp tolerance of missed events and initially unstable
   links (Lukas Wunner)

 - Clear spurious pciehp events on resume (Lukas Wunner)

 - Add pciehp runtime PM support, including for Thunderbolt controllers
   (Lukas Wunner)

 - Support interrupts from pciehp bridges in D3hot (Lukas Wunner)

 - Mark fall-through switch cases before enabling -Wimplicit-fallthrough
   (Gustavo A. R. Silva)

 - Move DMA-debug PCI init from arch code to PCI core (Christoph
   Hellwig)

 - Fix pci_request_irq() usage of IRQF_ONESHOT when no handler is
   supplied (Heiner Kallweit)

 - Unify PCI and DMA direction #defines (Shunyong Yang)

 - Add PCI_DEVICE_DATA() macro (Andy Shevchenko)

 - Check for VPD completion before checking for timeout (Bert Kenward)

 - Limit Netronome NFP5000 config space size to work around erratum
   (Jakub Kicinski)

 - Set IRQCHIP_ONESHOT_SAFE for PCI MSI irqchips (Heiner Kallweit)

 - Document ACPI description of PCI host bridges (Bjorn Helgaas)

 - Add "pci=disable_acs_redir=" parameter to disable ACS redirection for
   peer-to-peer DMA support (we don't have the peer-to-peer support yet;
   this is just one piece) (Logan Gunthorpe)

 - Clean up devm_of_pci_get_host_bridge_resources() resource allocation
   (Jan Kiszka)

 - Fixup resizable BARs after suspend/resume (Christian König)

 - Make "pci=earlydump" generic (Sinan Kaya)

 - Fix ROM BAR access routines to stay in bounds and check for signature
   correctly (Rex Zhu)

 - Add DMA alias quirk for Microsemi Switchtec NTB (Doug Meyer)

 - Expand documentation for pci_add_dma_alias() (Logan Gunthorpe)

 - To avoid bus errors, enable PASID only if entire path supports
   End-End TLP prefixes (Sinan Kaya)

 - Unify slot and bus reset functions and remove hotplug knowledge from
   callers (Sinan Kaya)

 - Add Function-Level Reset quirks for Intel and Samsung NVMe devices to
   fix guest reboot issues (Alex Williamson)

 - Add function 1 DMA alias quirk for Marvell 88SS9183 PCIe SSD
   Controller (Bjorn Helgaas)

 - Remove Xilinx AXI-PCIe host bridge arch dependency (Palmer Dabbelt)

 - Remove Aardvark outbound window configuration (Evan Wang)

 - Fix Aardvark bridge window sizing issue (Zachary Zhang)

 - Convert Aardvark to use pci_host_probe() to reduce code duplication
   (Thomas Petazzoni)

 - Correct the Cadence cdns_pcie_writel() signature (Alan Douglas)

 - Add Cadence support for optional generic PHYs (Alan Douglas)

 - Add Cadence power management ops (Alan Douglas)

 - Remove redundant variable from Cadence driver (Colin Ian King)

 - Add Kirin MSI support (Xiaowei Song)

 - Drop unnecessary root_bus_nr setting from exynos, imx6, keystone,
   armada8k, artpec6, designware-plat, histb, qcom, spear13xx (Shawn
   Guo)

 - Move link notification settings from DesignWare core to individual
   drivers (Gustavo Pimentel)

 - Add endpoint library MSI-X interfaces (Gustavo Pimentel)

 - Correct signature of endpoint library IRQ interfaces (Gustavo
   Pimentel)

 - Add DesignWare endpoint library MSI-X callbacks (Gustavo Pimentel)

 - Add endpoint library MSI-X test support (Gustavo Pimentel)

 - Remove unnecessary GFP_ATOMIC from Hyper-V "new child" allocation
   (Jia-Ju Bai)

 - Add more devices to Broadcom PAXC quirk (Ray Jui)

 - Work around corrupted Broadcom PAXC config space to enable SMMU and
   GICv3 ITS (Ray Jui)

 - Disable MSI parsing to work around broken Broadcom PAXC logic in some
   devices (Ray Jui)

 - Hide unconfigured functions to work around a Broadcom PAXC defect
   (Ray Jui)

 - Lower iproc log level to reduce console output during boot (Ray Jui)

 - Fix mobiveil iomem/phys_addr_t type usage (Lorenzo Pieralisi)

 - Fix mobiveil missing include file (Lorenzo Pieralisi)

 - Add mobiveil Kconfig/Makefile support (Lorenzo Pieralisi)

 - Fix mvebu I/O space remapping issues (Thomas Petazzoni)

 - Use generic pci_host_bridge in mvebu instead of ARM-specific API
   (Thomas Petazzoni)

 - Whitelist VMD devices with fast interrupt handlers to avoid sharing
   vectors with slow handlers (Keith Busch)

* tag 'pci-v4.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (153 commits)
  PCI/AER: Don't clear AER bits if error handling is Firmware-First
  PCI: Limit config space size for Netronome NFP5000
  PCI/MSI: Set IRQCHIP_ONESHOT_SAFE for PCI-MSI irqchips
  PCI/VPD: Check for VPD access completion before checking for timeout
  PCI: Add PCI_DEVICE_DATA() macro to fully describe device ID entry
  PCI: Match Root Port's MPS to endpoint's MPSS as necessary
  PCI: Skip MPS logic for Virtual Functions (VFs)
  PCI: Add function 1 DMA alias quirk for Marvell 88SS9183
  PCI: Check for PCIe Link downtraining
  PCI: Add ACS Redirect disable quirk for Intel Sunrise Point
  PCI: Add device-specific ACS Redirect disable infrastructure
  PCI: Convert device-specific ACS quirks from NULL termination to ARRAY_SIZE
  PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support
  PCI: Allow specifying devices using a base bus and path of devfns
  PCI: Make specifying PCI devices in kernel parameters reusable
  PCI: Hide ACS quirk declarations inside PCI core
  PCI: Delay after FLR of Intel DC P3700 NVMe
  PCI: Disable Samsung SM961/PM961 NVMe before FLR
  PCI: Export pcie_has_flr()
  PCI: mvebu: Drop bogus comment above mvebu_pcie_map_registers()
  ...
2018-08-16 09:21:54 -07:00
Alex Williamson
0dd0e297f0 vfio-pci: Disable binding to PFs with SR-IOV enabled
We expect to receive PFs with SR-IOV disabled, however some host
drivers leave SR-IOV enabled at unbind.  This puts us in a state where
we can potentially assign both the PF and the VF, leading to both
functionality as well as security concerns due to lack of managing the
SR-IOV state as well as vendor dependent isolation from the PF to VF.
If we were to attempt to actively disable SR-IOV on driver probe, we
risk VF bound drivers blocking, potentially risking live lock
scenarios.  Therefore simply refuse to bind to PFs with SR-IOV enabled
with a warning message indicating the issue.  Users can resolve this
by re-binding to the host driver and disabling SR-IOV before
attempting to use the device with vfio-pci.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-08-06 12:23:19 -06:00
Gustavo A. R. Silva
544c05a60a vfio: Mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-08-06 12:22:54 -06:00
Sinan Kaya
c6a44ba950 PCI: Rename pci_try_reset_bus() to pci_reset_bus()
Now that the old implementation of pci_reset_bus() is gone, replace
pci_try_reset_bus() with pci_reset_bus().

Compared to the old implementation, new code will fail immmediately with
-EAGAIN if object lock cannot be obtained.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-19 18:04:23 -05:00
Sinan Kaya
811c5cb37d PCI: Unify try slot and bus reset API
Drivers are expected to call pci_try_reset_slot() or pci_try_reset_bus() by
querying if a system supports hotplug or not.  A survey showed that most
drivers don't do this and we are leaking hotplug capability to the user.

Hide pci_try_slot_reset() from drivers and embed into pci_try_bus_reset().
Change pci_try_reset_bus() parameter from struct pci_bus to struct pci_dev.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-19 18:04:23 -05:00
Gustavo A. R. Silva
0e714d2778 vfio/pci: Fix potential Spectre v1
info.index can be indirectly controlled by user-space, hence leading
to a potential exploitation of the Spectre variant 1 vulnerability.

This issue was detected with the help of Smatch:

drivers/vfio/pci/vfio_pci.c:734 vfio_pci_ioctl()
warn: potential spectre issue 'vdev->region'

Fix this by sanitizing info.index before indirectly using it to index
vdev->region

Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].

[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-07-18 12:57:25 -06:00
Alex Williamson
08ca1b52f6 vfio/pci: Make IGD support a configurable option
Allow the code which provides extensions to support direct assignment
of Intel IGD (GVT-d) to be compiled out of the kernel if desired.  The
config option for this was previously automatically enabled on X86,
therefore the default remains Y.  This simply provides the option to
disable it even for X86.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-06-18 16:39:50 -06:00
Linus Torvalds
f605ba97fb VFIO updates for v4.17-rc1
- Adopt iommu_unmap_fast() interface to type1 backend
    (Suravee Suthikulpanit)
 
  - mdev sample driver fixup (Shunyong Yang)
 
  - More efficient PFN mapping handling in type1 backend
    (Jason Cai)
 
  - VFIO device ioeventfd interface (Alex Williamson)
 
  - Tag new vfio-platform sub-maintainer (Alex Williamson)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJax+VBAAoJECObm247sIsijnUQAI/TEvEGxZkUEXZ5DeFvcXjM
 N5ICSkApFaAyCDmrR5ljfd4u0k1OCePH9v9BR2BsfdMZtNRGCMMeQGYv0NR44Ude
 8wwJh3aitg3angZTaetaWt4o43A1SDHXg4JsjDqcSL6XR7b465gzr10OjuXCN0Wa
 9ltdlRaxEZ/SMrR7oITqJ6CGrSu6OWtQnaMUA9c2lLsNRTVUt8wyv54HhMSdBA4E
 Sm2IjdLwLbVPvStMbVzsd+Rm9nIoVNbaGuURfS7yx6FU30URTuajmbY3AtewA/1w
 BBMgTAbdGaLN7xbxxzZwAApMbHDFoiNrLGT63Y+ylEL4IPSBBksqvqpijsHDy/5g
 ASI9O32i04Wy1x7744nhSPI3XPWBL0rXdRvZHk5OIisIJS7NFk4g05S3wEz1Kfxz
 Vb0DW7AXZmunCFgPH3Oli0V41HfZrDx5F/X8FqtucnGSv2c3CVwMiHgueKDIXx96
 mtujLuXb/qrIUM+/nJ36090DOmiTVD8k5GMcetc9Wu7S4AFQlkTmmOroGQQxRyXA
 giP3rxHCt+H0OSjn0OwjjCsoB0MmMbeUD9Y9Ak0CQU2gSrj2G4/2tVpNYO8Uz8u+
 sInZWClJrRskG6vegLFBoR6um9vYbFU6/WaSb6cDPiixScmSwbm7c1hu0PVOo56p
 8WwkBomAv4iFcAxDCTQG
 =yaeE
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v4.17-rc1' of git://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Adopt iommu_unmap_fast() interface to type1 backend
   (Suravee Suthikulpanit)

 - mdev sample driver fixup (Shunyong Yang)

 - More efficient PFN mapping handling in type1 backend
   (Jason Cai)

 - VFIO device ioeventfd interface (Alex Williamson)

 - Tag new vfio-platform sub-maintainer (Alex Williamson)

* tag 'vfio-v4.17-rc1' of git://github.com/awilliam/linux-vfio:
  MAINTAINERS: vfio/platform: Update sub-maintainer
  vfio/pci: Add ioeventfd support
  vfio/pci: Use endian neutral helpers
  vfio/pci: Pull BAR mapping setup from read-write path
  vfio/type1: Improve memory pinning process for raw PFN mapping
  vfio-mdev/samples: change RDI interrupt condition
  vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs
2018-04-06 19:44:27 -07:00
Alex Williamson
30656177c4 vfio/pci: Add ioeventfd support
The ioeventfd here is actually irqfd handling of an ioeventfd such as
supported in KVM.  A user is able to pre-program a device write to
occur when the eventfd triggers.  This is yet another instance of
eventfd-irqfd triggering between KVM and vfio.  The impetus for this
is high frequency writes to pages which are virtualized in QEMU.
Enabling this near-direct write path for selected registers within
the virtualized page can improve performance and reduce overhead.
Specifically this is initially targeted at NVIDIA graphics cards where
the driver issues a write to an MMIO register within a virtualized
region in order to allow the MSI interrupt to re-trigger.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-03-26 13:22:58 -06:00
Alex Williamson
07fd7ef3a1 vfio/pci: Use endian neutral helpers
The iowriteXX/ioreadXX functions assume little endian hardware and
convert to little endian on a write and from little endian on a read.
We currently do our own explicit conversion to negate this.  Instead,
add some endian dependent defines to avoid all byte swaps.  There
should be no functional change other than big endian systems aren't
penalized with wasted swaps.

Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-03-26 13:22:58 -06:00
Alex Williamson
0d77ed3589 vfio/pci: Pull BAR mapping setup from read-write path
This creates a common helper that we'll use for ioeventfd setup.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-03-26 13:22:58 -06:00
Alex Williamson
834814e802 Revert: "vfio-pci: Mask INTx if a device is not capabable of enabling it"
This reverts commit 2170dd04316e0754cbbfa4892a25aead39d225f7

The intent of commit 2170dd04316e ("vfio-pci: Mask INTx if a device is
not capabable of enabling it") was to disallow the user from seeing
that the device supports INTx if the platform is incapable of enabling
it.  The detection of this case however incorrectly includes devices
which natively do not support INTx, such as SR-IOV VFs, and further
discussions reveal gaps even for the target use case.

Reported-by: Arjun Vynipadath <arjun@chelsio.com>
Fixes: 2170dd04316e ("vfio-pci: Mask INTx if a device is not capabable of enabling it")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-03-21 22:50:19 -06:00
Alexey Kardashevskiy
a32295c612 vfio-pci: Allow mapping MSIX BAR
By default VFIO disables mapping of MSIX BAR to the userspace as
the userspace may program it in a way allowing spurious interrupts;
instead the userspace uses the VFIO_DEVICE_SET_IRQS ioctl.
In order to eliminate guessing from the userspace about what is
mmapable, VFIO also advertises a sparse list of regions allowed to mmap.

This works fine as long as the system page size equals to the MSIX
alignment requirement which is 4KB. However with a bigger page size
the existing code prohibits mapping non-MSIX parts of a page with MSIX
structures so these parts have to be emulated via slow reads/writes on
a VFIO device fd. If these emulated bits are accessed often, this has
serious impact on performance.

This allows mmap of the entire BAR containing MSIX vector table.

This removes the sparse capability for PCI devices as it becomes useless.

As the userspace needs to know for sure whether mmapping of the MSIX
vector containing data can succeed, this adds a new capability -
VFIO_REGION_INFO_CAP_MSIX_MAPPABLE - which explicitly tells the userspace
that the entire BAR can be mmapped.

This does not touch the MSIX mangling in the BAR read/write handlers as
we are doing this just to enable direct access to non MSIX registers.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[aw - fixup whitespace, trim function name]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-12-20 10:05:06 -07:00
Alex Williamson
dda01f787d vfio: Simplify capability helper
The vfio_info_add_capability() helper requires the caller to pass a
capability ID, which it then uses to fill in header fields, assuming
hard coded versions.  This makes for an awkward and rigid interface.
The only thing we want this helper to do is allocate sufficient
space in the caps buffer and chain this capability into the list.
Reduce it to that simple task.

Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-12-20 09:53:54 -07:00
Alexey Kardashevskiy
2170dd0431 vfio-pci: Mask INTx if a device is not capabable of enabling it
At the moment VFIO rightfully assumes that INTx is supported if
the interrupt pin is not set to zero in the device config space.
However if that is not the case (the pin is not zero but pdev->irq is),
vfio_intx_enable() fails.

In order to prevent the userspace from trying to enable INTx when we know
that it cannot work, let's mask the PCI_INTERRUPT_PIN register.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-12-20 09:53:52 -07:00
Alex Williamson
cf0d53ba49 vfio/pci: Virtualize Maximum Read Request Size
MRRS defines the maximum read request size a device is allowed to
make.  Drivers will often increase this to allow more data transfer
with a single request.  Completions to this request are bound by the
MPS setting for the bus.  Aside from device quirks (none known), it
doesn't seem to make sense to set an MRRS value less than MPS, yet
this is a likely scenario given that user drivers do not have a
system-wide view of the PCI topology.  Virtualize MRRS such that the
user can set MRRS >= MPS, but use MPS as the floor value that we'll
write to hardware.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-10-02 12:39:10 -06:00
Alex Williamson
523184972b vfio/pci: Virtualize Maximum Payload Size
With virtual PCI-Express chipsets, we now see userspace/guest drivers
trying to match the physical MPS setting to a virtual downstream port.
Of course a lone physical device surrounded by virtual interconnects
cannot make a correct decision for a proper MPS setting.  Instead,
let's virtualize the MPS control register so that writes through to
hardware are disallowed.  Userspace drivers like QEMU assume they can
write anything to the device and we'll filter out anything dangerous.
Since mismatched MPS can lead to AER and other faults, let's add it
to the kernel side rather than relying on userspace virtualization to
handle it.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
2017-10-02 12:39:09 -06:00
Alex Williamson
796b755066 vfio/pci: Fix handling of RC integrated endpoint PCIe capability size
Root complex integrated endpoints do not have a link and therefore may
use a smaller PCIe capability in config space than we expect when
building our config map.  Add a case for these to avoid reporting an
erroneous overlap.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-07-27 10:39:33 -06:00
Alex Williamson
9f47803503 vfio/pci: Use pci_try_reset_function() on initial open
Device lock bites again; if a device .remove() callback races a user
calling ioctl(VFIO_GROUP_GET_DEVICE_FD), the unbind request will hold
the device lock, but the user ioctl may have already taken a vfio_device
reference.  In the case of a PCI device, the initial open will attempt
to reset the device, which again attempts to get the device lock,
resulting in deadlock.  Use the trylock PCI reset interface and return
error on the open path if reset fails due to lock contention.

Link: https://lkml.org/lkml/2017/7/25/381
Reported-by: Wen Congyang <wencongyang2@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-07-26 14:33:15 -06:00
Alex Williamson
7d57e5e933 vfio/pci: Add Intel XXV710 to hidden INTx devices
XXV710 has the same broken INTx behavior as the rest of the X/XL710
series, the interrupt status register is not wired to report pending
INTx interrupts, thus we never associate the interrupt to the device.
Extend the device IDs to include these so that we hide that the
device supports INTx at all to the user.

Reported-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
2017-06-13 09:22:57 -06:00
Arvind Yadav
e19f32da5d vfio-pci: Handle error from pci_iomap
Here, pci_iomap can fail, handle this case release selected
pci regions and return -ENOMEM.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-01-04 08:34:39 -07:00
Arnd Bergmann
45e8697144 vfio-pci: use 32-bit comparisons for register address for gcc-4.5
Using ancient compilers (gcc-4.5 or older) on ARM, we get a link
failure with the vfio-pci driver:

ERROR: "__aeabi_lcmp" [drivers/vfio/pci/vfio-pci.ko] undefined!

The reason is that the compiler tries to do a comparison of
a 64-bit range. This changes it to convert to a 32-bit number
explicitly first, as newer compilers do for themselves.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-12-30 08:13:47 -07:00
Linus Torvalds
0ab7b12c49 pci-v4.10-changes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJYUt1vAAoJEFmIoMA60/r8abgP/3R+5Lsk5/kfAHk5/2Mtqbvg
 mZ0eDUpY9GbUeMjSq84Nr2H8u7d+1AJCCu8KtDJYZCmjZpnSp2SuE2PS5JoGC7zC
 fintD24jlIF4/J5+HeVXXmbfr3xATxvpTuiSLEi8sLBRJ3KRIswhMSwoPwOyeTQw
 v/EclWKPGYcI5Zp0oigY9/Jd3q3lQ17KXppi/0dDoLh7PNOFvEHItXWzmf++u/NP
 iYT9R1xmzEsy0/HRd6hiwPT2xA8YsAXxgobhHooUgh1FWmZ02Tg1WjgDemOW4lVh
 kNIUcsLczh7wZCceogrrJ+pwb9+NyyIyKuHPv6OG3ieyz1IZdznaj1fAE5HJYiPo
 eVS7cP1S6DyV3Y5qFj5F2dSRS7T4GXdXG5mNhmeCpUHs0vfzSCG36jLmhTy8UIxs
 1rCf5oFa+uU9q0okfH8VtcGOXqWjGgyxTSGGfF71HUMLnPbsci2fxC2cO6svzIX7
 wDY0uxOzpyMIYMuQR6iz7VqvAwEaZ+7pfMIrWWdDcQ9/5tCNJ49cLuKaThPL4bVu
 juiGBQtnTLg8tjrhjDL9tQiJpuVIweVXyyQ1fvZoVXkMLlhVCF2ttirvwFUit2PB
 84OlevQZ+9QdE/qalrWbv4qzhesuiwu0avkzjGoqg6tWTF0epu2AHI2vqy6UBYEG
 tcfJPEcz1019PKZNSvWy
 =ut0k
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "PCI changes:

   - add support for PCI on ARM64 boxes with ACPI. We already had this
     for theoretical spec-compliant hardware; now we're adding quirks
     for the actual hardware (Cavium, HiSilicon, Qualcomm, X-Gene)

   - add runtime PM support for hotplug ports

   - enable runtime suspend for Intel UHCI that uses platform-specific
     wakeup signaling

   - add yet another host bridge registration interface. We hope this is
     extensible enough to subsume the others

   - expose device revision in sysfs for DRM

   - to avoid device conflicts, make sure any VF BAR updates are done
     before enabling the VF

   - avoid unnecessary link retrains for ASPM

   - allow INTx masking on Mellanox devices that support it

   - allow access to non-standard VPD for Chelsio devices

   - update Broadcom iProc support for PAXB v2, PAXC v2, inbound DMA,
     etc

   - update Rockchip support for max-link-speed

   - add NVIDIA Tegra210 support

   - add Layerscape LS1046a support

   - update R-Car compatibility strings

   - add Qualcomm MSM8996 support

   - remove some uninformative bootup messages"

* tag 'pci-v4.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (115 commits)
  PCI: Enable access to non-standard VPD for Chelsio devices (cxgb3)
  PCI: Expand "VPD access disabled" quirk message
  PCI: pciehp: Remove loading message
  PCI: hotplug: Remove hotplug core message
  PCI: Remove service driver load/unload messages
  PCI/AER: Log AER IRQ when claiming Root Port
  PCI/AER: Log errors with PCI device, not PCIe service device
  PCI/AER: Remove unused version macros
  PCI/PME: Log PME IRQ when claiming Root Port
  PCI/PME: Drop unused support for PMEs from Root Complex Event Collectors
  PCI: Move config space size macros to pci_regs.h
  x86/platform/intel-mid: Constify mid_pci_platform_pm
  PCI/ASPM: Don't retrain link if ASPM not possible
  PCI: iproc: Skip check for legacy IRQ on PAXC buses
  PCI: pciehp: Leave power indicator on when enabling already-enabled slot
  PCI: pciehp: Prioritize data-link event over presence detect
  PCI: rcar: Add gen3 fallback compatibility string for pcie-rcar
  PCI: rcar: Use gen2 fallback compatibility last
  PCI: rcar-gen2: Use gen2 fallback compatibility last
  PCI: rockchip: Move the deassert of pm/aclk/pclk after phy_init()
  ..
2016-12-15 12:46:48 -08:00
Wang Sheng-Hui
cc10385b6f PCI: Move config space size macros to pci_regs.h
Move PCI configuration space size macros (PCI_CFG_SPACE_SIZE and
PCI_CFG_SPACE_EXP_SIZE) from drivers/pci/pci.h to
include/uapi/linux/pci_regs.h so they can be used by more drivers and
eliminate duplicate definitions.

[bhelgaas: Expand comment to include PCI-X details]
Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-12-12 10:05:22 -06:00
Cao jin
f4cb410019 vfio/pci: Drop unnecessary pcibios_err_to_errno()
As of commit d97ffe236894 ("PCI: Fix return value from
pci_user_{read,write}_config_*()") it's unnecessary to call
pcibios_err_to_errno() to fixup the return value from these functions.

pcibios_err_to_errno() already does simple passthrough of -errno values,
therefore no functional change is expected.

[aw: changelog]
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-18 11:06:42 -07:00
Kirti Wankhede
ef198aaa16 vfio_pci: Updated to use vfio_set_irqs_validate_and_prepare()
Updated vfio_pci.c file to use vfio_set_irqs_validate_and_prepare()

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Neo Jia <cjia@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17 08:33:20 -07:00
Kirti Wankhede
c535d34569 vfio_pci: Update vfio_pci to use vfio_info_add_capability()
Update msix_sparse_mmap_cap() to use vfio_info_add_capability()
Update region type capability to use vfio_info_add_capability()

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Neo Jia <cjia@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-11-17 08:33:20 -07:00
Vlad Tsyrklevich
05692d7005 vfio/pci: Fix integer overflows, bitmask check
The VFIO_DEVICE_SET_IRQS ioctl did not sufficiently sanitize
user-supplied integers, potentially allowing memory corruption. This
patch adds appropriate integer overflow checks, checks the range bounds
for VFIO_IRQ_SET_DATA_NONE, and also verifies that only single element
in the VFIO_IRQ_SET_DATA_TYPE_MASK bitmask is set.
VFIO_IRQ_SET_ACTION_TYPE_MASK is already correctly checked later in
vfio_pci_set_irqs_ioctl().

Furthermore, a kzalloc is changed to a kcalloc because the use of a
kzalloc with an integer multiplication allowed an integer overflow
condition to be reached without this patch. kcalloc checks for overflow
and should prevent a similar occurrence.

Signed-off-by: Vlad Tsyrklevich <vlad@tsyrklevich.net>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-10-26 13:49:29 -06:00
Christoph Hellwig
61771468e0 vfio_pci: use pci_alloc_irq_vectors
Simplify the interrupt setup by using the new PCI layer helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-09-29 13:36:38 -06:00