IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
update_pasid() and its call chain are currently unused in the tree because
Thomas disabled the ENQCMD feature. The feature will be re-enabled shortly
using a different approach and update_pasid() and its call chain will not
be used in the new approach.
Remove the useless functions.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210920192349.2602141-1-fenghua.yu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211014053839.727419-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
pasid_mutex and dev->iommu->param->lock are held while unbinding mm is
flushing IO page fault workqueue and waiting for all page fault works to
finish. But an in-flight page fault work also need to hold the two locks
while unbinding mm are holding them and waiting for the work to finish.
This may cause an ABBA deadlock issue as shown below:
idxd 0000:00:0a.0: unbind PASID 2
======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc7+ #549 Not tainted [ 186.615245] ----------
dsa_test/898 is trying to acquire lock:
ffff888100d854e8 (¶m->lock){+.+.}-{3:3}, at:
iopf_queue_flush_dev+0x29/0x60
but task is already holding lock:
ffffffff82b2f7c8 (pasid_mutex){+.+.}-{3:3}, at:
intel_svm_unbind+0x34/0x1e0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (pasid_mutex){+.+.}-{3:3}:
__mutex_lock+0x75/0x730
mutex_lock_nested+0x1b/0x20
intel_svm_page_response+0x8e/0x260
iommu_page_response+0x122/0x200
iopf_handle_group+0x1c2/0x240
process_one_work+0x2a5/0x5a0
worker_thread+0x55/0x400
kthread+0x13b/0x160
ret_from_fork+0x22/0x30
-> #1 (¶m->fault_param->lock){+.+.}-{3:3}:
__mutex_lock+0x75/0x730
mutex_lock_nested+0x1b/0x20
iommu_report_device_fault+0xc2/0x170
prq_event_thread+0x28a/0x580
irq_thread_fn+0x28/0x60
irq_thread+0xcf/0x180
kthread+0x13b/0x160
ret_from_fork+0x22/0x30
-> #0 (¶m->lock){+.+.}-{3:3}:
__lock_acquire+0x1134/0x1d60
lock_acquire+0xc6/0x2e0
__mutex_lock+0x75/0x730
mutex_lock_nested+0x1b/0x20
iopf_queue_flush_dev+0x29/0x60
intel_svm_drain_prq+0x127/0x210
intel_svm_unbind+0xc5/0x1e0
iommu_sva_unbind_device+0x62/0x80
idxd_cdev_release+0x15a/0x200 [idxd]
__fput+0x9c/0x250
____fput+0xe/0x10
task_work_run+0x64/0xa0
exit_to_user_mode_prepare+0x227/0x230
syscall_exit_to_user_mode+0x2c/0x60
do_syscall_64+0x48/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
other info that might help us debug this:
Chain exists of:
¶m->lock --> ¶m->fault_param->lock --> pasid_mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(pasid_mutex);
lock(¶m->fault_param->lock);
lock(pasid_mutex);
lock(¶m->lock);
*** DEADLOCK ***
2 locks held by dsa_test/898:
#0: ffff888100cc1cc0 (&group->mutex){+.+.}-{3:3}, at:
iommu_sva_unbind_device+0x53/0x80
#1: ffffffff82b2f7c8 (pasid_mutex){+.+.}-{3:3}, at:
intel_svm_unbind+0x34/0x1e0
stack backtrace:
CPU: 2 PID: 898 Comm: dsa_test Not tainted 5.14.0-rc7+ #549
Hardware name: Intel Corporation Kabylake Client platform/KBL S
DDR4 UD IMM CRB, BIOS KBLSE2R1.R00.X050.P01.1608011715 08/01/2016
Call Trace:
dump_stack_lvl+0x5b/0x74
dump_stack+0x10/0x12
print_circular_bug.cold+0x13d/0x142
check_noncircular+0xf1/0x110
__lock_acquire+0x1134/0x1d60
lock_acquire+0xc6/0x2e0
? iopf_queue_flush_dev+0x29/0x60
? pci_mmcfg_read+0xde/0x240
__mutex_lock+0x75/0x730
? iopf_queue_flush_dev+0x29/0x60
? pci_mmcfg_read+0xfd/0x240
? iopf_queue_flush_dev+0x29/0x60
mutex_lock_nested+0x1b/0x20
iopf_queue_flush_dev+0x29/0x60
intel_svm_drain_prq+0x127/0x210
? intel_pasid_tear_down_entry+0x22e/0x240
intel_svm_unbind+0xc5/0x1e0
iommu_sva_unbind_device+0x62/0x80
idxd_cdev_release+0x15a/0x200
pasid_mutex protects pasid and svm data mapping data. It's unnecessary
to hold pasid_mutex while flushing the workqueue. To fix the deadlock
issue, unlock pasid_pasid during flushing the workqueue to allow the works
to be handled.
Fixes: d5b9e4bfe0 ("iommu/vt-d: Report prq to io-pgfault framework")
Reported-and-tested-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20210826215918.4073446-1-fenghua.yu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210828070622.2437559-3-baolu.lu@linux.intel.com
[joro: Removed timing information from kernel log messages]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The mm->pasid will be used in intel_svm_free_pasid() after load_pasid()
during unbinding mm. Clearing it in load_pasid() will cause PASID cannot
be freed in intel_svm_free_pasid().
Additionally mm->pasid was updated already before load_pasid() during pasid
allocation. No need to update it again in load_pasid() during binding mm.
Don't update mm->pasid to avoid the issues in both binding mm and unbinding
mm.
Fixes: 4048377414 ("iommu/vt-d: Use iommu_sva_alloc(free)_pasid() helpers")
Reported-and-tested-by: Dave Jiang <dave.jiang@intel.com>
Co-developed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20210826215918.4073446-1-fenghua.yu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210828070622.2437559-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The minimum per-IOMMU PRQ queue size is one 4K page, this is more entries
than the hardcoded limit of 32 in the current VT-d code. Some devices can
support up to 512 outstanding PRQs but underutilized by this limit of 32.
Although, 32 gives some rough fairness when multiple devices share the same
IOMMU PRQ queue, but far from optimal for customized use case. This extends
the per-IOMMU PRQ queue size to four 4K pages and let the devices have as
many outstanding page requests as they can.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210720013856.4143880-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210818134852.1847070-7-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
A PASID reference is increased whenever a device is bound to an mm (and
its PASID) successfully (i.e. the device's sdev user count is increased).
But the reference is not dropped every time the device is unbound
successfully from the mm (i.e. the device's sdev user count is decreased).
The reference is dropped only once by calling intel_svm_free_pasid() when
there isn't any device bound to the mm. intel_svm_free_pasid() drops the
reference and only frees the PASID on zero reference.
Fix the issue by dropping the PASID reference and freeing the PASID when
no reference on successful unbinding the device by calling
intel_svm_free_pasid() .
Fixes: 4048377414 ("iommu/vt-d: Use iommu_sva_alloc(free)_pasid() helpers")
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20210813181345.1870742-1-fenghua.yu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210817124321.1517985-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Replace a couple of calls to memcpy() with simple assignments in order
to fix the following out-of-bounds warning:
drivers/iommu/intel/svm.c:1198:4: warning: 'memcpy' offset [25, 32] from
the object at 'desc' is out of the bounds of referenced subobject
'qw2' with type 'long long unsigned int' at offset 16 [-Warray-bounds]
The problem is that the original code is trying to copy data into a
couple of struct members adjacent to each other in a single call to
memcpy(). This causes a legitimate compiler warning because memcpy()
overruns the length of &desc.qw2 and &resp.qw2, respectively.
This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().
Link: https://github.com/KSPP/linux/issues/109
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210414201403.GA392764@embeddedor
Link: https://lore.kernel.org/r/20210610020115.1637656-18-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The Address field of the Page Request Descriptor only keeps bit [63:12]
of the offending address. Convert it to a full address before reporting
it to device drivers.
Fixes: eb8d93ea3c ("iommu/vt-d: Report page request faults for guest SVA")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210320025415.641201-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The SVM_FLAG_PRIVATE_PASID has never been referenced in the tree, and
there's no plan to have anything to use it. So cleanup it.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210323010600.678627-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The svm_dev_ops has never been referenced in the tree, and there's no
plan to have anything to use it. Remove it to make the code neat.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210323010600.678627-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The VT-d specification (section 7.6) requires that the value in the
Private Data field of a Page Group Response Descriptor must match
the value in the Private Data field of the respective Page Request
Descriptor.
The private data field of a page group response descriptor is set then
immediately cleared in prq_event_thread(). This breaks the rule defined
by the VT-d specification. Fix it by moving clearing code up.
Fixes: 5b438f4ba3 ("iommu/vt-d: Support page request in scalable mode")
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210320024156.640798-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Page requests are originated from the user page fault. Therefore, we
shall set FAULT_FLAG_USER.
FAULT_FLAG_REMOTE indicates that we are walking an mm which is not
guaranteed to be the same as the current->mm and should not be subject
to protection key enforcement. Therefore, we should set FAULT_FLAG_REMOTE
to avoid faults when both SVM and PKEY are used.
References: commit 1b2ee1266e ("mm/core: Do not enforce PKEY permissions on remote mm access")
Reviewed-by: Raj Ashok <ashok.raj@intel.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/1614680040-1989-5-git-send-email-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When supervisor/privilige mode SVM is used, we bind init_mm.pgd with
a supervisor PASID. There should not be any page fault for init_mm.
Execution request with DMA read is also not supported.
This patch checks PRQ descriptor for both unsupported configurations,
reject them both with invalid responses.
Fixes: 1c4f88b7f1 ("iommu/vt-d: Shared virtual address in scalable mode")
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lore.kernel.org/r/1614680040-1989-4-git-send-email-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The VT-d IOMMU response RESPONSE_FAILURE for a page request in below
cases:
- When it gets a Page_Request with no PASID;
- When it gets a Page_Request with PASID that is not in use for this
device.
This is allowed by the spec, but IOMMU driver doesn't support such cases
today. When the device receives RESPONSE_FAILURE, it sends the device
state machine to HALT state. Now if we try to unload the driver, it hangs
since the device doesn't send any outbound transactions to host when the
driver is trying to clear things up. The only possible responses would be
for invalidation requests.
Let's use RESPONSE_INVALID instead for now, so that the device state
machine doesn't enter HALT state.
Suggested-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210126080730.2232859-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
It is incorrect to always clear PRO when it's set w/o first checking
whether the overflow condition has been cleared. Current code assumes
that if an overflow condition occurs it must have been cleared by earlier
loop. However since the code runs in a threaded context, the overflow
condition could occur even after setting the head to the tail under some
extreme condition. To be sane, we should read both head/tail again when
seeing a pending PRO and only clear PRO after all pending PRs have been
handled.
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/linux-iommu/MWHPR11MB18862D2EA5BD432BF22D99A48CA09@MWHPR11MB1886.namprd11.prod.outlook.com/
Link: https://lore.kernel.org/r/20210126080730.2232859-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The pasid based IOTLB and devTLB invalidation code is duplicate in
several places. Consolidate them by using the common helpers.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210114085021.717041-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The VT-d hardware will ignore those Addr bits which have been masked by
the AM field in the PASID-based-IOTLB invalidation descriptor. As the
result, if the starting address in the descriptor is not aligned with
the address mask, some IOTLB caches might not invalidate. Hence people
will see below errors.
[ 1093.704661] dmar_fault: 29 callbacks suppressed
[ 1093.704664] DMAR: DRHD: handling fault status reg 3
[ 1093.712738] DMAR: [DMA Read] Request device [7a:02.0] PASID 2
fault addr 7f81c968d000 [fault reason 113]
SM: Present bit in first-level paging entry is clear
Fix this by using aligned address for PASID-based-IOTLB invalidation.
Fixes: 1c4f88b7f1 ("iommu/vt-d: Shared virtual address in scalable mode")
Reported-and-tested-by: Guo Kaijie <Kaijie.Guo@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20201231005323.2178523-2-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
'struct intel_svm' is shared by all devices bound to a give process,
but records only a single pointer to a 'struct intel_iommu'. Consequently,
cache invalidations may only be applied to a single DMAR unit, and are
erroneously skipped for the other devices.
In preparation for fixing this, rework the structures so that the iommu
pointer resides in 'struct intel_svm_dev', allowing 'struct intel_svm'
to track them in its device list.
Fixes: 1c4f88b7f1 ("iommu/vt-d: Shared virtual address in scalable mode")
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Raj Ashok <ashok.raj@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Reported-by: Guo Kaijie <Kaijie.Guo@intel.com>
Reported-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Guo Kaijie <Kaijie.Guo@intel.com>
Signed-off-by: Xin Zeng <xin.zeng@intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Tested-by: Guo Kaijie <Kaijie.Guo@intel.com>
Cc: stable@vger.kernel.org # v5.0+
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1609949037-25291-2-git-send-email-yi.l.liu@intel.com
Signed-off-by: Will Deacon <will@kernel.org>
Lock(&iommu->lock) without disabling irq causes lockdep warnings.
========================================================
WARNING: possible irq lock inversion dependency detected
5.11.0-rc1+ #828 Not tainted
--------------------------------------------------------
kworker/0:1H/120 just changed the state of lock:
ffffffffad9ea1b8 (device_domain_lock){..-.}-{2:2}, at:
iommu_flush_dev_iotlb.part.0+0x32/0x120
but this lock took another, SOFTIRQ-unsafe lock in the past:
(&iommu->lock){+.+.}-{2:2}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&iommu->lock);
local_irq_disable();
lock(device_domain_lock);
lock(&iommu->lock);
<Interrupt>
lock(device_domain_lock);
*** DEADLOCK ***
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20201231005323.2178523-5-baolu.lu@linux.intel.com
Signed-off-by: Will Deacon <will@kernel.org>
Let IOASID users take references to existing ioasids with ioasid_get().
ioasid_put() drops a reference and only frees the ioasid when its
reference number is zero. It returns true if the ioasid was freed.
For drivers that don't call ioasid_get(), ioasid_put() is the same as
ioasid_free().
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20201106155048.997886-2-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
In prq_event_thread(), the QI_PGRP_PDP is wrongly set by
'req->pasid_present' which should be replaced to
'req->priv_data_present'.
Fixes: 5b438f4ba3 ("iommu/vt-d: Support page request in scalable mode")
Signed-off-by: Liu, Yi L <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1604025444-6954-3-git-send-email-yi.y.sun@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Should get correct sid and set it into sdev. Because we execute
'sdev->sid != req->rid' in the loop of prq_event_thread().
Fixes: eb8d93ea3c ("iommu/vt-d: Report page request faults for guest SVA")
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/1604025444-6954-2-git-send-email-yi.y.sun@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Including:
- ARM-SMMU Updates from Will:
- Continued SVM enablement, where page-table is shared with
CPU
- Groundwork to support integrated SMMU with Adreno GPU
- Allow disabling of MSI-based polling on the kernel
command-line
- Minor driver fixes and cleanups (octal permissions, error
messages, ...)
- Secure Nested Paging Support for AMD IOMMU. The IOMMU will
fault when a device tries DMA on memory owned by a guest. This
needs new fault-types as well as a rewrite of the IOMMU memory
semaphore for command completions.
- Allow broken Intel IOMMUs (wrong address widths reported) to
still be used for interrupt remapping.
- IOMMU UAPI updates for supporting vSVA, where the IOMMU can
access address spaces of processes running in a VM.
- Support for the MT8167 IOMMU in the Mediatek IOMMU driver.
- Device-tree updates for the Renesas driver to support r8a7742.
- Several smaller fixes and cleanups all over the place.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAl+Fy9MACgkQK/BELZcB
GuNxtRAA0TdYHXt6XyLWmvRAX/ySZSz6KOneZWWwpsQ9wh2/iv1PtBsrV0ltf+6g
CaX4ROZUVRbV9wPD+7maBRbzxrG3QhfEaaV+K45Q2J/QE1wjkyV8qj1eORWTUUoc
nis4FhGDKk2ER/Gsajy2Hjs4+6i43gdWG/+ghVGaCRo8mCZyoz1/6AyMQyN3deuO
NqWOv9E7hsavZjRs/w/LXG7eSE20cZwtt//kPVJF0r9eQqC6i1eJDQj48iRqJVqd
R0dwBQZaLz++qQptyKebDNlmH/3aAsb+A8nCeS7ZwHqWC1QujTWOUYWpFyPPbOmC
KVsQXzTzRfnVTDECF1Pk5d3yi45KILLU3B4zDJfUJjbL3KDYjuVUvhHF/pcGcjC3
H1LWJqHSAL8sJwHvKhpi0VtQ5SOxXnLO5fGG/CZT/Xb4QyM+mkwkFLdn1TryZTR/
M4XA+QuI96TzY7HQUJdSoEDANxoBef6gPnxdDKOnK1v4hfNsPAl7o8hZkM3w0DK8
GoFZUV+vjBhFcymGcQegSNiea28Hfi+hBe+PPHCmw+tJm47cketD5uP5jJ5NGaUe
MKU/QXWXc6oqeBTQT6ki5zJbJXKttbPa8eEmp+FrMatc9kruvBVhQoMbj7Vd3CA1
dC4zK9Awy7yj24ZhZfnAFx2DboCmBTUI3QKjDt9K5PRZyMeyoP8=
=C0Sg
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- ARM-SMMU Updates from Will:
- Continued SVM enablement, where page-table is shared with CPU
- Groundwork to support integrated SMMU with Adreno GPU
- Allow disabling of MSI-based polling on the kernel command-line
- Minor driver fixes and cleanups (octal permissions, error
messages, ...)
- Secure Nested Paging Support for AMD IOMMU. The IOMMU will fault when
a device tries DMA on memory owned by a guest. This needs new
fault-types as well as a rewrite of the IOMMU memory semaphore for
command completions.
- Allow broken Intel IOMMUs (wrong address widths reported) to still be
used for interrupt remapping.
- IOMMU UAPI updates for supporting vSVA, where the IOMMU can access
address spaces of processes running in a VM.
- Support for the MT8167 IOMMU in the Mediatek IOMMU driver.
- Device-tree updates for the Renesas driver to support r8a7742.
- Several smaller fixes and cleanups all over the place.
* tag 'iommu-updates-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (57 commits)
iommu/vt-d: Gracefully handle DMAR units with no supported address widths
iommu/vt-d: Check UAPI data processed by IOMMU core
iommu/uapi: Handle data and argsz filled by users
iommu/uapi: Rename uapi functions
iommu/uapi: Use named union for user data
iommu/uapi: Add argsz for user filled data
docs: IOMMU user API
iommu/qcom: add missing put_device() call in qcom_iommu_of_xlate()
iommu/arm-smmu-v3: Add SVA device feature
iommu/arm-smmu-v3: Check for SVA features
iommu/arm-smmu-v3: Seize private ASID
iommu/arm-smmu-v3: Share process page tables
iommu/arm-smmu-v3: Move definitions to a header
iommu/io-pgtable-arm: Move some definitions to a header
iommu/arm-smmu-v3: Ensure queue is read after updating prod pointer
iommu/amd: Re-purpose Exclusion range registers to support SNP CWWB
iommu/amd: Add support for RMP_PAGE_FAULT and RMP_HW_ERR
iommu/amd: Use 4K page for completion wait write-back semaphore
iommu/tegra-smmu: Allow to group clients in same swgroup
iommu/tegra-smmu: Fix iova->phys translation
...
IOMMU generic layer already does sanity checks on UAPI data for version
match and argsz range based on generic information.
This patch adjusts the following data checking responsibilities:
- removes the redundant version check from VT-d driver
- removes the check for vendor specific data size
- adds check for the use of reserved/undefined flags
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/1601051567-54787-7-git-send-email-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
IOMMU UAPI data size is filled by the user space which must be validated
by the kernel. To ensure backward compatibility, user data can only be
extended by either re-purpose padding bytes or extend the variable sized
union at the end. No size change is allowed before the union. Therefore,
the minimum size is the offset of the union.
To use offsetof() on the union, we must make it named.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/linux-iommu/20200611145518.0c2817d6@x1.home/
Link: https://lore.kernel.org/r/1601051567-54787-4-git-send-email-jacob.jun.pan@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
A PASID is allocated for an "mm" the first time any thread binds to an
SVA-capable device and is freed from the "mm" when the SVA is unbound
by the last thread. It's possible for the "mm" to have different PASID
values in different binding/unbinding SVA cycles.
The mm's PASID (non-zero for valid PASID or 0 for invalid PASID) is
propagated to a per-thread PASID MSR for all threads within the mm
through IPI, context switch, or inherited. This is done to ensure that a
running thread has the right PASID in the MSR matching the mm's PASID.
[ bp: s/SVM/SVA/g; massage. ]
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/1600187413-163670-10-git-send-email-fenghua.yu@intel.com
"flags" passed to intel_svm_bind_mm() is a bit mask and should be
defined as "unsigned int" instead of "int".
Change its type to "unsigned int".
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/1600187413-163670-3-git-send-email-fenghua.yu@intel.com
PASID is defined as a few different types in iommu including "int",
"u32", and "unsigned int". To be consistent and to match with uapi
definitions, define PASID and its variations (e.g. max PASID) as "u32".
"u32" is also shorter and a little more explicit than "unsigned int".
No PASID type change in uapi although it defines PASID as __u64 in
some places.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/1600187413-163670-2-git-send-email-fenghua.yu@intel.com
Merge more updates from Andrew Morton:
- most of the rest of MM (memcg, hugetlb, vmscan, proc, compaction,
mempolicy, oom-kill, hugetlbfs, migration, thp, cma, util,
memory-hotplug, cleanups, uaccess, migration, gup, pagemap),
- various other subsystems (alpha, misc, sparse, bitmap, lib, bitops,
checkpatch, autofs, minix, nilfs, ufs, fat, signals, kmod, coredump,
exec, kdump, rapidio, panic, kcov, kgdb, ipc).
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (164 commits)
mm/gup: remove task_struct pointer for all gup code
mm: clean up the last pieces of page fault accountings
mm/xtensa: use general page fault accounting
mm/x86: use general page fault accounting
mm/sparc64: use general page fault accounting
mm/sparc32: use general page fault accounting
mm/sh: use general page fault accounting
mm/s390: use general page fault accounting
mm/riscv: use general page fault accounting
mm/powerpc: use general page fault accounting
mm/parisc: use general page fault accounting
mm/openrisc: use general page fault accounting
mm/nios2: use general page fault accounting
mm/nds32: use general page fault accounting
mm/mips: use general page fault accounting
mm/microblaze: use general page fault accounting
mm/m68k: use general page fault accounting
mm/ia64: use general page fault accounting
mm/hexagon: use general page fault accounting
mm/csky: use general page fault accounting
...
Patch series "mm: Page fault accounting cleanups", v5.
This is v5 of the pf accounting cleanup series. It originates from Gerald
Schaefer's report on an issue a week ago regarding to incorrect page fault
accountings for retried page fault after commit 4064b98270 ("mm: allow
VM_FAULT_RETRY for multiple times"):
https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/
What this series did:
- Correct page fault accounting: we do accounting for a page fault
(no matter whether it's from #PF handling, or gup, or anything else)
only with the one that completed the fault. For example, page fault
retries should not be counted in page fault counters. Same to the
perf events.
- Unify definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf
event is used in an adhoc way across different archs.
Case (1): for many archs it's done at the entry of a page fault
handler, so that it will also cover e.g. errornous faults.
Case (2): for some other archs, it is only accounted when the page
fault is resolved successfully.
Case (3): there're still quite some archs that have not enabled
this perf event.
Since this series will touch merely all the archs, we unify this
perf event to always follow case (1), which is the one that makes most
sense. And since we moved the accounting into handle_mm_fault, the
other two MAJ/MIN perf events are well taken care of naturally.
- Unify definition of "major faults": the definition of "major
fault" is slightly changed when used in accounting (not
VM_FAULT_MAJOR). More information in patch 1.
- Always account the page fault onto the one that triggered the page
fault. This does not matter much for #PF handlings, but mostly for
gup. More information on this in patch 25.
Patchset layout:
Patch 1: Introduced the accounting in handle_mm_fault(), not enabled.
Patch 2-23: Enable the new accounting for arch #PF handlers one by one.
Patch 24: Enable the new accounting for the rest outliers (gup, iommu, etc.)
Patch 25: Cleanup GUP task_struct pointer since it's not needed any more
This patch (of 25):
This is a preparation patch to move page fault accountings into the
general code in handle_mm_fault(). This includes both the per task
flt_maj/flt_min counters, and the major/minor page fault perf events. To
do this, the pt_regs pointer is passed into handle_mm_fault().
PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault
handlers.
So far, all the pt_regs pointer that passed into handle_mm_fault() is
NULL, which means this patch should have no intented functional change.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com
Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After page requests are handled, software must respond to the device
which raised the page request with the result. This is done through
the iommu ops.page_response if the request was reported to outside of
vendor iommu driver through iommu_report_device_fault(). This adds the
VT-d implementation of page_response ops.
Co-developed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Co-developed-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20200724014925.15523-12-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
A pasid might be bound to a page table from a VM guest via the iommu
ops.sva_bind_gpasid. In this case, when a DMA page fault is detected
on the physical IOMMU, we need to inject the page fault request into
the guest. After the guest completes handling the page fault, a page
response need to be sent back via the iommu ops.page_response().
This adds support to report a page request fault. Any external module
which is interested in handling this fault should regiester a notifier
with iommu_register_device_fault_handler().
Co-developed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Co-developed-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20200724014925.15523-11-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
There are several places in the code that need to get the pointers of
svm and sdev according to a pasid and device. Add a helper to achieve
this for code consolidation and readability.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20200724014925.15523-10-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
It is refactored in two ways:
- Make it global so that it could be used in other files.
- Make bus/devfn optional so that callers could ignore these two returned
values when they only want to get the coresponding iommu pointer.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20200724014925.15523-9-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
For the unlikely use case where multiple aux domains from the same pdev
are attached to a single guest and then bound to a single process
(thus same PASID) within that guest, we cannot easily support this case
by refcounting the number of users. As there is only one SL page table
per PASID while we have multiple aux domains thus multiple SL page tables
for the same PASID.
Extra unbinding guest PASID can happen due to race between normal and
exception cases. Termination of one aux domain may affect others unless
we actively track and switch aux domains to ensure the validity of SL
page tables and TLB states in the shared PASID entry.
Support for sharing second level PGDs across domains can reduce the
complexity but this is not available due to the limitations on VFIO
container architecture. We can revisit this decision once sharing PGDs
are available.
Overall, the complexity and potential glitch do not warrant this unlikely
use case thereby removed by this patch.
Fixes: 56722a4398 ("iommu/vt-d: Add bind guest PASID support")
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20200724014925.15523-8-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
- Move the Intel and AMD IOMMU drivers into their own
subdirectory. Both drivers consist of several files by now and
giving them their own directory unclutters the IOMMU top-level
directory a bit.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAl7jme4ACgkQK/BELZcB
GuNNMw//U7AL3Qq6J8DqU+Ay+gIblxKUhWtYLVHad1+agSWmcbfy4E6iV8FqXLbP
HnCSmA7ScgEMN+3GAve/WpWccMI3aeAgp4xI4MElz/6p4QeJXfNu9COrllif+OX7
4fDpxXyd0fhKev4lPGZFRY8yGgvgP5ZHvDG0juoxi3bKCqiC2bkAga3itC9RPCQb
8kBefKIb7/q+UUGGVppTvVIW0mrqWLQ1TcnfKf0hovU7yZs4i4RO+8br6Q5eNUcB
Vb64vCV3qkQ/zPdr4vK6rvuZTPRMKkCgY4+MJr/g2/JQWuZxF1O+q+TsTYI1ISAS
qNPRdxgNrZbSBDowg2QfQtPBHPpq3m4eNDeD+ewyQkrVt0/Eneg6Np0FG9j3tGAG
+IS64r2E25O0tGtBIQ9Mi2TC68S0C7VtMbzx55zVcTGF0JH9T2YW4sSdRcTjVdW6
WBFqu5fXEKk63ln3h/8JEP7zPWGp+Q3cuOChDvcmIMjCxQ84k5jOB5AIZppGIgJ9
0nGf45t8YCvIXMbNKufYqjesJZOC2bd+Swi1MZXVlO/gSVv19O40UW+F1X0e7YOp
MHOzsV44rE2posS/huHOLR4q0AQTdc9O1mywCCGDxNW8tlwIBHsLLJ8b9C9raIRn
mZkq94QZQXta+WYtoGvbk6nHQ89FtBOOdEH2TSlEbvvYowpjZZE=
=gX8z
-----END PGP SIGNATURE-----
Merge tag 'iommu-drivers-move-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu driver directory structure cleanup from Joerg Roedel:
"Move the Intel and AMD IOMMU drivers into their own subdirectory.
Both drivers consist of several files by now and giving them their own
directory unclutters the IOMMU top-level directory a bit"
* tag 'iommu-drivers-move-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Move Intel IOMMU driver into subdirectory
iommu/amd: Move AMD IOMMU driver into subdirectory
Move all files related to the Intel IOMMU driver into its own
subdirectory.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20200609130303.26974-3-joro@8bytes.org