1870 Commits

Author SHA1 Message Date
Linus Torvalds
0d519f2d1e pci-v4.14-changes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZsr8cAAoJEFmIoMA60/r8lXYQAKViYIRMJDD4n3NhjMeLOsnJ
 vwaBmWlLRjSFIEpag5kMjS1RJE17qAvmkBZnDvSNZ6cT28INkkZnVM2IW96WECVq
 64MIvDijVPcvqGuWePCfWdDiSXApiDWwJuw55BOhmvV996wGy0gYgzpPY+1g0Knh
 XzH9IOzDL79hZleLfsxX0MLV6FGBVtOsr0jvQ04k4IgEMIxEDTlbw85rnrvzQUtc
 0Vj2koaxWIESZsq7G/wiZb2n6ekaFdXO/VlVvvhmTSDLCBaJ63Hb/gfOhwMuVkS6
 B3cVprNrCT0dSzWmU4ZXf+wpOyDpBexlemW/OR/6CQUkC6AUS6kQ5si1X44dbGmJ
 nBPh414tdlm/6V4h/A3UFPOajSGa/ZWZ/uQZPfvKs1R6WfjUerWVBfUpAzPbgjam
 c/mhJ19HYT1J7vFBfhekBMeY2Px3JgSJ9rNsrFl48ynAALaX5GEwdpo4aqBfscKz
 4/f9fU4ysumopvCEuKD2SsJvsPKd5gMQGGtvAhXM1TxvAoQ5V4cc99qEetAPXXPf
 h2EqWm4ph7YP4a+n/OZBjzluHCmZJn1CntH5+//6wpUk6HnmzsftGELuO9n12cLE
 GGkreI3T9ctV1eOkzVVa0l0QTE1X/VLyEyKCtb9obXsDaG4Ud7uKQoZgB19DwyTJ
 EG76ridTolUFVV+wzJD9
 =9cLP
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

 - add enhanced Downstream Port Containment support, which prints more
   details about Root Port Programmed I/O errors (Dongdong Liu)

 - add Layerscape ls1088a and ls2088a support (Hou Zhiqiang)

 - add MediaTek MT2712 and MT7622 support (Ryder Lee)

 - add MediaTek MT2712 and MT7622 MSI support (Honghui Zhang)

 - add Qualcom IPQ8074 support (Varadarajan Narayanan)

 - add R-Car r8a7743/5 device tree support (Biju Das)

 - add Rockchip per-lane PHY support for better power management (Shawn
   Lin)

 - fix IRQ mapping for hot-added devices by replacing the
   pci_fixup_irqs() boot-time design with a host bridge hook called at
   probe-time (Lorenzo Pieralisi, Matthew Minter)

 - fix race when enabling two devices that results in upstream bridge
   not being enabled correctly (Srinath Mannam)

 - fix pciehp power fault infinite loop (Keith Busch)

 - fix SHPC bridge MSI hotplug events by enabling bus mastering
   (Aleksandr Bezzubikov)

 - fix a VFIO issue by correcting PCIe capability sizes (Alex
   Williamson)

 - fix an INTD issue on Xilinx and possibly other drivers by unifying
   INTx IRQ domain support (Paul Burton)

 - avoid IOMMU stalls by marking AMD Stoney GPU ATS as broken (Joerg
   Roedel)

 - allow APM X-Gene device assignment to guests by adding an ACS quirk
   (Feng Kan)

 - fix driver crashes by disabling Extended Tags on Broadcom HT2100
   (Extended Tags support is required for PCIe Receivers but not
   Requesters, and we now enable them by default when Requesters support
   them) (Sinan Kaya)

 - fix MSIs for devices that use phantom RIDs for DMA by assuming MSIs
   use the real Requester ID (not a phantom RID) (Robin Murphy)

 - prevent assignment of Intel VMD children to guests (which may be
   supported eventually, but isn't yet) by not associating an IOMMU with
   them (Jon Derrick)

 - fix Intel VMD suspend/resume by releasing IRQs on suspend (Scott
   Bauer)

 - fix a Function-Level Reset issue with Intel 750 NVMe by waiting
   longer (up to 60sec instead of 1sec) for device to become ready
   (Sinan Kaya)

 - fix a Function-Level Reset issue on iProc Stingray by working around
   hardware defects in the CRS implementation (Oza Pawandeep)

 - fix an issue with Intel NVMe P3700 after an iProc reset by adding a
   delay during shutdown (Oza Pawandeep)

 - fix a Microsoft Hyper-V lockdep issue by polling instead of blocking
   in compose_msi_msg() (Stephen Hemminger)

 - fix a wireless LAN driver timeout by clearing DesignWare MSI
   interrupt status after it is handled, not before (Faiz Abbas)

 - fix DesignWare ATU enable checking (Jisheng Zhang)

 - reduce Layerscape dependencies on the bootloader by doing more
   initialization in the driver (Hou Zhiqiang)

 - improve Intel VMD performance allowing allocation of more IRQ vectors
   than present CPUs (Keith Busch)

 - improve endpoint framework support for initial DMA mask, different
   BAR sizes, configurable page sizes, MSI, test driver, etc (Kishon
   Vijay Abraham I, Stan Drozd)

 - rework CRS support to add periodic messages while we poll during
   enumeration and after Function-Level Reset and prepare for possible
   other uses of CRS (Sinan Kaya)

 - clean up Root Port AER handling by removing unnecessary code and
   moving error handler methods to struct pcie_port_service_driver
   (Christoph Hellwig)

 - clean up error handling paths in various drivers (Bjorn Andersson,
   Fabio Estevam, Gustavo A. R. Silva, Harunobu Kurokawa, Jeffy Chen,
   Lorenzo Pieralisi, Sergei Shtylyov)

 - clean up SR-IOV resource handling by disabling VF decoding before
   updating the corresponding resource structs (Gavin Shan)

 - clean up DesignWare-based drivers by unifying quirks to update Class
   Code and Interrupt Pin and related handling of write-protected
   registers (Hou Zhiqiang)

 - clean up by adding empty generic pcibios_align_resource() and
   pcibios_fixup_bus() and removing empty arch-specific implementations
   (Palmer Dabbelt)

 - request exclusive reset control for several drivers to allow cleanup
   elsewhere (Philipp Zabel)

 - constify various structures (Arvind Yadav, Bhumika Goyal)

 - convert from full_name() to %pOF (Rob Herring)

 - remove unused variables from iProc, HiSi, Altera, Keystone (Shawn
   Lin)

* tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (170 commits)
  PCI: xgene: Clean up whitespace
  PCI: xgene: Define XGENE_PCI_EXP_CAP and use generic PCI_EXP_RTCTL offset
  PCI: xgene: Fix platform_get_irq() error handling
  PCI: xilinx-nwl: Fix platform_get_irq() error handling
  PCI: rockchip: Fix platform_get_irq() error handling
  PCI: altera: Fix platform_get_irq() error handling
  PCI: spear13xx: Fix platform_get_irq() error handling
  PCI: artpec6: Fix platform_get_irq() error handling
  PCI: armada8k: Fix platform_get_irq() error handling
  PCI: dra7xx: Fix platform_get_irq() error handling
  PCI: exynos: Fix platform_get_irq() error handling
  PCI: iproc: Clean up whitespace
  PCI: iproc: Rename PCI_EXP_CAP to IPROC_PCI_EXP_CAP
  PCI: iproc: Add 500ms delay during device shutdown
  PCI: Fix typos and whitespace errors
  PCI: Remove unused "res" variable from pci_resource_io()
  PCI: Correct kernel-doc of pci_vpd_srdt_size(), pci_vpd_srdt_tag()
  PCI/AER: Reformat AER register definitions
  iommu/vt-d: Prevent VMD child devices from being remapping targets
  x86/PCI: Use is_vmd() rather than relying on the domain number
  ...
2017-09-08 15:47:43 -07:00
Linus Torvalds
b1b6f83ac9 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Ingo Molnar:
 "PCID support, 5-level paging support, Secure Memory Encryption support

  The main changes in this cycle are support for three new, complex
  hardware features of x86 CPUs:

   - Add 5-level paging support, which is a new hardware feature on
     upcoming Intel CPUs allowing up to 128 PB of virtual address space
     and 4 PB of physical RAM space - a 512-fold increase over the old
     limits. (Supercomputers of the future forecasting hurricanes on an
     ever warming planet can certainly make good use of more RAM.)

     Many of the necessary changes went upstream in previous cycles,
     v4.14 is the first kernel that can enable 5-level paging.

     This feature is activated via CONFIG_X86_5LEVEL=y - disabled by
     default.

     (By Kirill A. Shutemov)

   - Add 'encrypted memory' support, which is a new hardware feature on
     upcoming AMD CPUs ('Secure Memory Encryption', SME) allowing system
     RAM to be encrypted and decrypted (mostly) transparently by the
     CPU, with a little help from the kernel to transition to/from
     encrypted RAM. Such RAM should be more secure against various
     attacks like RAM access via the memory bus and should make the
     radio signature of memory bus traffic harder to intercept (and
     decrypt) as well.

     This feature is activated via CONFIG_AMD_MEM_ENCRYPT=y - disabled
     by default.

     (By Tom Lendacky)

   - Enable PCID optimized TLB flushing on newer Intel CPUs: PCID is a
     hardware feature that attaches an address space tag to TLB entries
     and thus allows to skip TLB flushing in many cases, even if we
     switch mm's.

     (By Andy Lutomirski)

  All three of these features were in the works for a long time, and
  it's coincidence of the three independent development paths that they
  are all enabled in v4.14 at once"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (65 commits)
  x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)
  x86/mm: Use pr_cont() in dump_pagetable()
  x86/mm: Fix SME encryption stack ptr handling
  kvm/x86: Avoid clearing the C-bit in rsvd_bits()
  x86/CPU: Align CR3 defines
  x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages
  acpi, x86/mm: Remove encryption mask from ACPI page protection type
  x86/mm, kexec: Fix memory corruption with SME on successive kexecs
  x86/mm/pkeys: Fix typo in Documentation/x86/protection-keys.txt
  x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y
  x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID
  x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y
  x86/mm: Allow userspace have mappings above 47-bit
  x86/mm: Prepare to expose larger address space to userspace
  x86/mpx: Do not allow MPX if we have mappings above 47-bit
  x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit()
  x86/xen: Redefine XEN_ELFNOTE_INIT_P2M using PUD_SIZE * PTRS_PER_PUD
  x86/mm/dump_pagetables: Fix printout of p4d level
  x86/mm/dump_pagetables: Generalize address normalization
  x86/boot: Fix memremap() related build failure
  ...
2017-09-04 12:21:28 -07:00
Jérôme Glisse
30ef7d2c05 iommu/intel: update to new mmu_notifier semantic
Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and are now bracketed by calls to
mmu_notifier_invalidate_range_start()/end()

Remove now useless invalidate_page callback.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: iommu@lists.linux-foundation.org
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31 16:12:59 -07:00
Jérôme Glisse
f0d1c713d6 iommu/amd: update to new mmu_notifier semantic
Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and are now bracketed by calls to
mmu_notifier_invalidate_range_start()/end()

Remove now useless invalidate_page callback.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31 16:12:59 -07:00
Jon Derrick
5823e330b5 iommu/vt-d: Prevent VMD child devices from being remapping targets
VMD child devices must use the VMD endpoint's ID as the requester.  Because
of this, there needs to be a way to link the parent VMD endpoint's IOMMU
group and associated mappings to the VMD child devices such that attaching
and detaching child devices modify the endpoint's mappings, while
preventing early detaching on a singular device removal or unbinding.

The reassignment of individual VMD child devices devices to VMs is outside
the scope of VMD, but may be implemented in the future. For now it is best
to prevent any such attempts.

Prevent VMD child devices from returning an IOMMU, which prevents it from
exposing an iommu_group sysfs directory and allowing subsequent binding by
userspace-access drivers such as VFIO.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-08-30 16:43:58 -05:00
Ingo Molnar
413d63d71b Merge branch 'linus' into x86/mm to pick up fixes and to fix conflicts
Conflicts:
	arch/x86/kernel/head64.c
	arch/x86/mm/mmap.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-26 09:19:13 +02:00
Joerg Roedel
2926a2aa5c iommu: Fix wrong freeing of iommu_device->dev
The struct iommu_device has a 'struct device' embedded into
it, not as a pointer, but the whole struct. In the
conversion of the iommu drivers to use struct iommu_device
it was forgotten that the relase function for that struct
device simply calls kfree() on the pointer.

This frees memory that was never allocated and causes memory
corruption.

To fix this issue, use a pointer to struct device instead of
embedding the whole struct. This needs some updates in the
iommu sysfs code as well as the Intel VT-d and AMD IOMMU
driver.

Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Fixes: 39ab9555c241 ('iommu: Add sysfs bindings for struct iommu_device')
Cc: stable@vger.kernel.org # >= v4.11
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-08-15 13:58:48 +02:00
Artem Savkov
a7990c647b iommu/arm-smmu: fix null-pointer dereference in arm_smmu_add_device
Commit c54451a "iommu/arm-smmu: Fix the error path in arm_smmu_add_device"
removed fwspec assignment in legacy_binding path as redundant which is
wrong. It needs to be updated after fwspec initialisation in
arm_smmu_register_legacy_master() as it is dereferenced later. Without
this there is a NULL-pointer dereference panic during boot on some hosts.

Signed-off-by: Artem Savkov <asavkov@redhat.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-08-11 16:56:51 +02:00
Joerg Roedel
74ddda71f4 iommu/amd: Fix schedule-while-atomic BUG in initialization code
The register_syscore_ops() function takes a mutex and might
sleep. In the IOMMU initialization code it is invoked during
irq-remapping setup already, where irqs are disabled.

This causes a schedule-while-atomic bug:

 BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
 in_atomic(): 0, irqs_disabled(): 1, pid: 1, name: swapper/0
 no locks held by swapper/0/1.
 irq event stamp: 304
 hardirqs last  enabled at (303): [<ffffffff818a87b6>] _raw_spin_unlock_irqrestore+0x36/0x60
 hardirqs last disabled at (304): [<ffffffff8235d440>] enable_IR_x2apic+0x79/0x196
 softirqs last  enabled at (36): [<ffffffff818ae75f>] __do_softirq+0x35f/0x4ec
 softirqs last disabled at (31): [<ffffffff810c1955>] irq_exit+0x105/0x120
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc2.1.el7a.test.x86_64.debug #1
 Hardware name:          PowerEdge C6145 /040N24, BIOS 3.5.0 10/28/2014
 Call Trace:
  dump_stack+0x85/0xca
  ___might_sleep+0x22a/0x260
  __might_sleep+0x4a/0x80
  __mutex_lock+0x58/0x960
  ? iommu_completion_wait.part.17+0xb5/0x160
  ? register_syscore_ops+0x1d/0x70
  ? iommu_flush_all_caches+0x120/0x150
  mutex_lock_nested+0x1b/0x20
  register_syscore_ops+0x1d/0x70
  state_next+0x119/0x910
  iommu_go_to_state+0x29/0x30
  amd_iommu_enable+0x13/0x23

Fix it by moving the register_syscore_ops() call to the next
initialization step, which runs with irqs enabled.

Reported-by: Artem Savkov <asavkov@redhat.com>
Tested-by: Artem Savkov <asavkov@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Fixes: 2c0ae1720c09 ('iommu/amd: Convert iommu initialization to state machine')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-07-26 15:39:14 +02:00
Suravee Suthikulpanit
efe6f24160 iommu/amd: Enable ga_log_intr when enabling guest_mode
IRTE[GALogIntr] bit should set when enabling guest_mode, which enables
IOMMU to generate entry in GALog when IRTE[IsRun] is not set, and send
an interrupt to notify IOMMU driver.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: stable@vger.kernel.org # v4.9+
Fixes: d98de49a53e48 ('iommu/amd: Enable vAPIC interrupt remapping mode by default')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-07-25 15:03:55 +02:00
Robin Murphy
7655739143 iommu/io-pgtable: Sanitise map/unmap addresses
It may be an egregious error to attempt to use addresses outside the
range of the pagetable format, but that still doesn't mean we should
merrily wreak havoc by silently mapping/unmapping whatever truncated
portions of them might happen to correspond to real addresses.

Add some up-front checks to sanitise our inputs so that buggy callers
don't invite potential memory corruption.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-07-20 10:30:28 +01:00
Vivek Gautam
c54451a5e2 iommu/arm-smmu: Fix the error path in arm_smmu_add_device
fwspec->iommu_priv is available only after arm_smmu_master_cfg
instance has been allocated. We shouldn't free it before that.
Also it's logical to free the master cfg itself without
checking for fwspec.

Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
[will: remove redundant assignment to fwspec]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-07-20 10:30:28 +01:00
Robin Murphy
2984f7f3bb Revert "iommu/io-pgtable: Avoid redundant TLB syncs"
The tlb_sync_pending flag was necessary for correctness in the Mediatek
M4U driver, but since it offered a small theoretical optimisation for
all io-pgtable users it was implemented as a high-level thing. However,
now that some users may not be using a synchronising lock, there are
several ways this flag can go wrong for them, and at worst it could
result in incorrect behaviour.

Since we've addressed the correctness issue within the Mediatek driver
itself, and fixing the optimisation aspect to be concurrency-safe would
be quite a headache (and impose extra overhead on every operation for
the sake of slightly helping one case which will virtually never happen
in typical usage), let's just retire it.

This reverts commit 88492a4700360a086e55d8874ad786105a5e8b0f.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-07-20 10:30:27 +01:00
Robin Murphy
98a8f63e56 iommu/mtk: Avoid redundant TLB syncs locally
Under certain circumstances, the io-pgtable code may end up issuing two
TLB sync operations without any intervening invalidations. This goes
badly for the M4U hardware, since it means the second sync ends up
polling for a non-existent operation to finish, and as a result times
out and warns. The io_pgtable_tlb_* helpers implement a high-level
optimisation to avoid issuing the second sync at all in such cases, but
in order to work correctly that requires all pagetable operations to be
serialised under a lock, thus is no longer applicable to all io-pgtable
users.

Since we're the only user actually relying on this flag for correctness,
let's reimplement it locally to avoid the headache of trying to make the
high-level version concurrency-safe for other users.

CC: Yong Wu <yong.wu@mediatek.com>
CC: Matthias Brugger <matthias.bgg@gmail.com>
Tested-by: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-07-20 10:30:27 +01:00
Will Deacon
8e517e762a iommu/arm-smmu: Reintroduce locking around TLB sync operations
Commit 523d7423e21b ("iommu/arm-smmu: Remove io-pgtable spinlock")
removed the locking used to serialise map/unmap calls into the io-pgtable
code from the ARM SMMU driver. This is good for performance, but opens
us up to a nasty race with TLB syncs because the TLB sync register is
shared within a context bank (or even globally for stage-2 on SMMUv1).

There are two cases to consider:

  1. A CPU can be spinning on the completion of a TLB sync, take an
     interrupt which issues a subsequent TLB sync, and then report a
     timeout on return from the interrupt.

  2. A CPU can be spinning on the completion of a TLB sync, but other
     CPUs can continuously issue additional TLB syncs in such a way that
     the backoff logic reports a timeout.

Rather than fix this by spinning for completion of prior TLB syncs before
issuing a new one (which may suffer from fairness issues on large systems),
instead reintroduce locking around TLB sync operations in the ARM SMMU
driver.

Fixes: 523d7423e21b ("iommu/arm-smmu: Remove io-pgtable spinlock")
Cc: Robin Murphy <robin.murphy@arm.com>
Reported-by: Ray Jui <ray.jui@broadcom.com>
Tested-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-07-20 10:30:26 +01:00
Tom Lendacky
2543a786aa iommu/amd: Allow the AMD IOMMU to work with memory encryption
The IOMMU is programmed with physical addresses for the various tables
and buffers that are used to communicate between the device and the
driver. When the driver allocates this memory it is encrypted. In order
for the IOMMU to access the memory as encrypted the encryption mask needs
to be included in these physical addresses during configuration.

The PTE entries created by the IOMMU should also include the encryption
mask so that when the device behind the IOMMU performs a DMA, the DMA
will be performed to encrypted memory.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
Cc: <iommu@lists.linux-foundation.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Toshimitsu Kani <toshi.kani@hpe.com>
Cc: kasan-dev@googlegroups.com
Cc: kvm@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/3053631ea25ba8b1601c351cb7c541c496f6d9bc.1500319216.git.thomas.lendacky@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-18 11:38:03 +02:00
Linus Torvalds
fb4e3beeff IOMMU Updates for Linux v4.13
This update comes with:
 
 	* Support for lockless operation in the ARM io-pgtable code.
 	  This is an important step to solve the scalability problems in
 	  the common dma-iommu code for ARM
 
 	* Some Errata workarounds for ARM SMMU implemenations
 
 	* Rewrite of the deferred IO/TLB flush code in the AMD IOMMU
 	  driver. The code suffered from very high flush rates, with the
 	  new implementation the flush rate is down to ~1% of what it
 	  was before
 
 	* Support for amd_iommu=off when booting with kexec. Problem
 	  here was that the IOMMU driver bailed out early without
 	  disabling the iommu hardware, if it was enabled in the old
 	  kernel
 
 	* The Rockchip IOMMU driver is now available on ARM64
 
 	* Align the return value of the iommu_ops->device_group
 	  call-backs to not miss error values
 
 	* Preempt-disable optimizations in the Intel VT-d and common
 	  IOVA code to help Linux-RT
 
 	* Various other small cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJZZgddAAoJECvwRC2XARrjurgQANO338GIBr2ZkA0oectidDpZ
 Y4yu7W9RH6NyhupJG/Xooya7daBWFjbaA1AVJ3ZZNlMERh69AmehVfRUfVMzF2w+
 buma58HQgiJWN1zFD8xdeMzYKms9P77whA88C/9QvrK/klB3LipWP2SC0yvvvyxJ
 mMCDpgt+D+CGnIDqbRuyLDQoRu3yjAkAvYb6OzL8DPJVP1Y5oLffGwGnHzJbJnOf
 eWJwYHM5ai0uF/Qqy6RNNekacObjVaOLihjugGvokH6ipXfOrSSNriXW9pZiWR5m
 S91898YTP3KuWWsJM+N93UAjvc6pL9PqL/OvbB9zdYpzu+5PtUpFXHYcOebKyEEO
 4j9CaRzubsWFTFjbYItJnR4WgXQRf4NKOGfTfHMHA+dY8aODYnlXNVdQDAA2aFgn
 TUBvHq5xb0zZ3nbPwtTDyW06oDMVfBBarLx2yFI1aQSSh+eg/GtIi5KP28gyFZNz
 4gWj0q3g/e3y7WEwNbYV7L3TS0d/p8VUYFtUp7PUCddnWoY+4cJzgidub5xIViZD
 Ql0nZzga9pXXIE/kE5Pf74WqrG7JJzZsvK2ABy4+XGrMq6RclJf+0pXbSqiXDpXL
 quw8t0oXw0ZEeavQ31Za8mjXBvo5ocM5iintl1wrl2BujHEO3oKqbGsIOaRcLnlN
 Ukehbl4OEKzZpD3oLPPk
 =pmBf
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:
 "This update comes with:

   - Support for lockless operation in the ARM io-pgtable code.

     This is an important step to solve the scalability problems in the
     common dma-iommu code for ARM

   - Some Errata workarounds for ARM SMMU implemenations

   - Rewrite of the deferred IO/TLB flush code in the AMD IOMMU driver.

     The code suffered from very high flush rates, with the new
     implementation the flush rate is down to ~1% of what it was before

   - Support for amd_iommu=off when booting with kexec.

     The problem here was that the IOMMU driver bailed out early without
     disabling the iommu hardware, if it was enabled in the old kernel

   - The Rockchip IOMMU driver is now available on ARM64

   - Align the return value of the iommu_ops->device_group call-backs to
     not miss error values

   - Preempt-disable optimizations in the Intel VT-d and common IOVA
     code to help Linux-RT

   - Various other small cleanups and fixes"

* tag 'iommu-updates-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (60 commits)
  iommu/vt-d: Constify intel_dma_ops
  iommu: Warn once when device_group callback returns NULL
  iommu/omap: Return ERR_PTR in device_group call-back
  iommu: Return ERR_PTR() values from device_group call-backs
  iommu/s390: Use iommu_group_get_for_dev() in s390_iommu_add_device()
  iommu/vt-d: Don't disable preemption while accessing deferred_flush()
  iommu/iova: Don't disable preempt around this_cpu_ptr()
  iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #126
  iommu/arm-smmu-v3: Enable ACPI based HiSilicon CMD_PREFETCH quirk(erratum 161010701)
  iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74
  ACPI/IORT: Fixup SMMUv3 resource size for Cavium ThunderX2 SMMUv3 model
  iommu/arm-smmu-v3, acpi: Add temporary Cavium SMMU-V3 IORT model number definitions
  iommu/io-pgtable-arm: Use dma_wmb() instead of wmb() when publishing table
  iommu/io-pgtable: depend on !GENERIC_ATOMIC64 when using COMPILE_TEST with LPAE
  iommu/arm-smmu-v3: Remove io-pgtable spinlock
  iommu/arm-smmu: Remove io-pgtable spinlock
  iommu/io-pgtable-arm-v7s: Support lockless operation
  iommu/io-pgtable-arm: Support lockless operation
  iommu/io-pgtable: Introduce explicit coherency
  iommu/io-pgtable-arm-v7s: Refactor split_blk_unmap
  ...
2017-07-12 10:00:04 -07:00
Linus Torvalds
f72e24a124 This is the first pull request for the new dma-mapping subsystem
In this new subsystem we'll try to properly maintain all the generic
 code related to dma-mapping, and will further consolidate arch code
 into common helpers.
 
 This pull request contains:
 
  - removal of the DMA_ERROR_CODE macro, replacing it with calls
    to ->mapping_error so that the dma_map_ops instances are
    more self contained and can be shared across architectures (me)
  - removal of the ->set_dma_mask method, which duplicates the
    ->dma_capable one in terms of functionality, but requires more
    duplicate code.
  - various updates for the coherent dma pool and related arm code
    (Vladimir)
  - various smaller cleanups (me)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlldmw0LHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYOiKA/+Ln1mFLSf3nfTzIHa24Bbk8ZTGr0B8TD4Vmyyt8iG
 oO3AeaTLn3d6ugbH/uih/tPz8PuyXsdiTC1rI/ejDMiwMTSjW6phSiIHGcStSR9X
 VFNhmMFacp7QpUpvxceV0XZYKDViAoQgHeGdp3l+K5h/v4AYePV/v/5RjQPaEyOh
 YLbCzETO+24mRWdJxdAqtTW4ovYhzj6XsiJ+pAjlV0+SWU6m5L5E+VAPNi1vqv1H
 1O2KeCFvVYEpcnfL3qnkw2timcjmfCfeFAd9mCUAc8mSRBfs3QgDTKw3XdHdtRml
 LU2WuA5cpMrOdBO4mVra2plo8E2szvpB1OZZXoKKdCpK3VGwVpVHcTvClK2Ks/3B
 GDLieroEQNu2ZIUIdWXf/g2x6le3BcC9MmpkAhnGPqCZ7skaIBO5Cjpxm0zTJAPl
 PPY3CMBBEktAvys6DcudOYGixNjKUuAm5lnfpcfTEklFdG0AjhdK/jZOplAFA6w4
 LCiy0rGHM8ZbVAaFxbYoFCqgcjnv6EjSiqkJxVI4fu/Q7v9YXfdPnEmE0PJwCVo5
 +i7aCLgrYshTdHr/F3e5EuofHN3TDHwXNJKGh/x97t+6tt326QMvDKX059Kxst7R
 rFukGbrYvG8Y7yXwrSDbusl443ta0Ht7T1oL4YUoJTZp0nScAyEluDTmrH1JVCsT
 R4o=
 =0Fso
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping infrastructure from Christoph Hellwig:
 "This is the first pull request for the new dma-mapping subsystem

  In this new subsystem we'll try to properly maintain all the generic
  code related to dma-mapping, and will further consolidate arch code
  into common helpers.

  This pull request contains:

   - removal of the DMA_ERROR_CODE macro, replacing it with calls to
     ->mapping_error so that the dma_map_ops instances are more self
     contained and can be shared across architectures (me)

   - removal of the ->set_dma_mask method, which duplicates the
     ->dma_capable one in terms of functionality, but requires more
     duplicate code.

   - various updates for the coherent dma pool and related arm code
     (Vladimir)

   - various smaller cleanups (me)"

* tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping: (56 commits)
  ARM: dma-mapping: Remove traces of NOMMU code
  ARM: NOMMU: Set ARM_DMA_MEM_BUFFERABLE for M-class cpus
  ARM: NOMMU: Introduce dma operations for noMMU
  drivers: dma-mapping: allow dma_common_mmap() for NOMMU
  drivers: dma-coherent: Introduce default DMA pool
  drivers: dma-coherent: Account dma_pfn_offset when used with device tree
  dma: Take into account dma_pfn_offset
  dma-mapping: replace dmam_alloc_noncoherent with dmam_alloc_attrs
  dma-mapping: remove dmam_free_noncoherent
  crypto: qat - avoid an uninitialized variable warning
  au1100fb: remove a bogus dma_free_nonconsistent call
  MAINTAINERS: add entry for dma mapping helpers
  powerpc: merge __dma_set_mask into dma_set_mask
  dma-mapping: remove the set_dma_mask method
  powerpc/cell: use the dma_supported method for ops switching
  powerpc/cell: clean up fixed mapping dma_ops initialization
  tile: remove dma_supported and mapping_error methods
  xen-swiotlb: remove xen_swiotlb_set_dma_mask
  arm: implement ->dma_supported instead of ->set_dma_mask
  mips/loongson64: implement ->dma_supported instead of ->set_dma_mask
  ...
2017-07-06 19:20:54 -07:00
Linus Torvalds
03ffbcdd78 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
 "The irq department delivers:

   - Expand the generic infrastructure handling the irq migration on CPU
     hotplug and convert X86 over to it. (Thomas Gleixner)

     Aside of consolidating code this is a preparatory change for:

   - Finalizing the affinity management for multi-queue devices. The
     main change here is to shut down interrupts which are affine to a
     outgoing CPU and reenabling them when the CPU comes online again.
     That avoids moving interrupts pointlessly around and breaking and
     reestablishing affinities for no value. (Christoph Hellwig)

     Note: This contains also the BLOCK-MQ and NVME changes which depend
     on the rework of the irq core infrastructure. Jens acked them and
     agreed that they should go with the irq changes.

   - Consolidation of irq domain code (Marc Zyngier)

   - State tracking consolidation in the core code (Jeffy Chen)

   - Add debug infrastructure for hierarchical irq domains (Thomas
     Gleixner)

   - Infrastructure enhancement for managing generic interrupt chips via
     devmem (Bartosz Golaszewski)

   - Constification work all over the place (Tobias Klauser)

   - Two new interrupt controller drivers for MVEBU (Thomas Petazzoni)

   - The usual set of fixes, updates and enhancements all over the
     place"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (112 commits)
  irqchip/or1k-pic: Fix interrupt acknowledgement
  irqchip/irq-mvebu-gicp: Allocate enough memory for spi_bitmap
  irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity
  nvme: Allocate queues for all possible CPUs
  blk-mq: Create hctx for each present CPU
  blk-mq: Include all present CPUs in the default queue mapping
  genirq: Avoid unnecessary low level irq function calls
  genirq: Set irq masked state when initializing irq_desc
  genirq/timings: Add infrastructure for estimating the next interrupt arrival time
  genirq/timings: Add infrastructure to track the interrupt timings
  genirq/debugfs: Remove pointless NULL pointer check
  irqchip/gic-v3-its: Don't assume GICv3 hardware supports 16bit INTID
  irqchip/gic-v3-its: Add ACPI NUMA node mapping
  irqchip/gic-v3-its-platform-msi: Make of_device_ids const
  irqchip/gic-v3-its: Make of_device_ids const
  irqchip/irq-mvebu-icu: Add new driver for Marvell ICU
  irqchip/irq-mvebu-gicp: Add new driver for Marvell GICP
  dt-bindings/interrupt-controller: Add DT binding for the Marvell ICU
  genirq/irqdomain: Remove auto-recursive hierarchy support
  irqchip/MSI: Use irq_domain_update_bus_token instead of an open coded access
  ...
2017-07-03 16:50:31 -07:00
Linus Torvalds
9bd42183b9 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Add the SYSTEM_SCHEDULING bootup state to move various scheduler
     debug checks earlier into the bootup. This turns silent and
     sporadically deadly bugs into nice, deterministic splats. Fix some
     of the splats that triggered. (Thomas Gleixner)

   - A round of restructuring and refactoring of the load-balancing and
     topology code (Peter Zijlstra)

   - Another round of consolidating ~20 of incremental scheduler code
     history: this time in terms of wait-queue nomenclature. (I didn't
     get much feedback on these renaming patches, and we can still
     easily change any names I might have misplaced, so if anyone hates
     a new name, please holler and I'll fix it.) (Ingo Molnar)

   - sched/numa improvements, fixes and updates (Rik van Riel)

   - Another round of x86/tsc scheduler clock code improvements, in hope
     of making it more robust (Peter Zijlstra)

   - Improve NOHZ behavior (Frederic Weisbecker)

   - Deadline scheduler improvements and fixes (Luca Abeni, Daniel
     Bristot de Oliveira)

   - Simplify and optimize the topology setup code (Lauro Ramos
     Venancio)

   - Debloat and decouple scheduler code some more (Nicolas Pitre)

   - Simplify code by making better use of llist primitives (Byungchul
     Park)

   - ... plus other fixes and improvements"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits)
  sched/cputime: Refactor the cputime_adjust() code
  sched/debug: Expose the number of RT/DL tasks that can migrate
  sched/numa: Hide numa_wake_affine() from UP build
  sched/fair: Remove effective_load()
  sched/numa: Implement NUMA node level wake_affine()
  sched/fair: Simplify wake_affine() for the single socket case
  sched/numa: Override part of migrate_degrades_locality() when idle balancing
  sched/rt: Move RT related code from sched/core.c to sched/rt.c
  sched/deadline: Move DL related code from sched/core.c to sched/deadline.c
  sched/cpuset: Only offer CONFIG_CPUSETS if SMP is enabled
  sched/fair: Spare idle load balancing on nohz_full CPUs
  nohz: Move idle balancer registration to the idle path
  sched/loadavg: Generalize "_idle" naming to "_nohz"
  sched/core: Drop the unused try_get_task_struct() helper function
  sched/fair: WARN() and refuse to set buddy when !se->on_rq
  sched/debug: Fix SCHED_WARN_ON() to return a value on !CONFIG_SCHED_DEBUG as well
  sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming
  sched/wait: Move bit_wait_table[] and related functionality from sched/core.c to sched/wait_bit.c
  sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h>
  sched/wait: Re-adjust macro line continuation backslashes in <linux/wait.h>
  ...
2017-07-03 13:08:04 -07:00
Linus Torvalds
81e3e04489 UUID/GUID updates:
- introduce the new uuid_t/guid_t types that are going to replace
    the somewhat confusing uuid_be/uuid_le types and make the terminology
    fit the various specs, as well as the userspace libuuid library.
    (me, based on a previous version from Amir)
  - consolidated generic uuid/guid helper functions lifted from XFS
    and libnvdimm (Amir and me)
  - conversions to the new types and helpers (Amir, Andy and me)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAllZfmILHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYMvyg/9EvWHOOsSdeDykCK3KdH2uIqnxwpl+m7ljccaGJIc
 MmaH0KnsP9p/Cuw5hESh2tYlmCYN7pmYziNXpf/LRS65/HpEYbs4oMqo8UQsN0UM
 2IXHfXY0HnCoG5OixH8RNbFTkxuGphsTY8meaiDr6aAmqChDQI2yGgQLo3WM2/Qe
 R9N1KoBWH/bqY6dHv+urlFwtsREm2fBH+8ovVma3TO73uZCzJGLJBWy3anmZN+08
 uYfdbLSyRN0T8rqemVdzsZ2SrpHYkIsYGUZV43F581vp8e/3OKMoMxpWRRd9fEsa
 MXmoaHcLJoBsyVSFR9lcx3axKrhAgBPZljASbbA0h49JneWXrzghnKBQZG2SnEdA
 ktHQ2sE4Yb5TZSvvWEKMQa3kXhEfIbTwgvbHpcDr5BUZX8WvEw2Zq8e7+Mi4+KJw
 QkvFC1S96tRYO2bxdJX638uSesGUhSidb+hJ/edaOCB/GK+sLhUdDTJgwDpUGmyA
 xVXTF51ramRS2vhlbzN79x9g33igIoNnG4/PV0FPvpCTSqxkHmPc5mK6Vals1lqt
 cW6XfUjSQECq5nmTBtYDTbA/T+8HhBgSQnrrvmferjJzZUFGr/7MXl+Evz2x4CjX
 OBQoAMu241w6Vp3zoXqxzv+muZ/NLar52M/zbi9TUjE0GvvRNkHvgCC4NmpIlWYJ
 Sxg=
 =J/4P
 -----END PGP SIGNATURE-----

Merge tag 'uuid-for-4.13' of git://git.infradead.org/users/hch/uuid

Pull uuid subsystem from Christoph Hellwig:
 "This is the new uuid subsystem, in which Amir, Andy and I have started
  consolidating our uuid/guid helpers and improving the types used for
  them. Note that various other subsystems have pulled in this tree, so
  I'd like it to go in early.

  UUID/GUID summary:

   - introduce the new uuid_t/guid_t types that are going to replace the
     somewhat confusing uuid_be/uuid_le types and make the terminology
     fit the various specs, as well as the userspace libuuid library.
     (me, based on a previous version from Amir)

   - consolidated generic uuid/guid helper functions lifted from XFS and
     libnvdimm (Amir and me)

   - conversions to the new types and helpers (Amir, Andy and me)"

* tag 'uuid-for-4.13' of git://git.infradead.org/users/hch/uuid: (34 commits)
  ACPI: hns_dsaf_acpi_dsm_guid can be static
  mmc: sdhci-pci: make guid intel_dsm_guid static
  uuid: Take const on input of uuid_is_null() and guid_is_null()
  thermal: int340x_thermal: fix compile after the UUID API switch
  thermal: int340x_thermal: Switch to use new generic UUID API
  acpi: always include uuid.h
  ACPI: Switch to use generic guid_t in acpi_evaluate_dsm()
  ACPI / extlog: Switch to use new generic UUID API
  ACPI / bus: Switch to use new generic UUID API
  ACPI / APEI: Switch to use new generic UUID API
  acpi, nfit: Switch to use new generic UUID API
  MAINTAINERS: add uuid entry
  tmpfs: generate random sb->s_uuid
  scsi_debug: switch to uuid_t
  nvme: switch to uuid_t
  sysctl: switch to use uuid_t
  partitions/ldm: switch to use uuid_t
  overlayfs: use uuid_t instead of uuid_be
  fs: switch ->s_uuid to uuid_t
  ima/policy: switch to use uuid_t
  ...
2017-07-03 09:55:26 -07:00
Christoph Hellwig
5860acc1a9 x86: remove arch specific dma_supported implementation
And instead wire it up as method for all the dma_map_ops instances.

Note that this also means the arch specific check will be fully instead
of partially applied in the AMD iommu driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-06-28 06:54:46 -07:00
Christoph Hellwig
a869572c31 iommu/amd: implement ->mapping_error
DMA_ERROR_CODE is going to go away, so don't rely on it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-06-28 06:54:30 -07:00
Joerg Roedel
6a7086431f Merge branches 'iommu/fixes', 'arm/rockchip', 'arm/renesas', 'arm/smmu', 'arm/core', 'x86/vt-d', 'x86/amd', 's390' and 'core' into next 2017-06-28 14:45:02 +02:00
Suravee Suthikulpanit
84a21dbdef iommu/amd: Fix interrupt remapping when disable guest_mode
Pass-through devices to VM guest can get updated IRQ affinity
information via irq_set_affinity() when not running in guest mode.
Currently, AMD IOMMU driver in GA mode ignores the updated information
if the pass-through device is setup to use vAPIC regardless of guest_mode.
This could cause invalid interrupt remapping.

Also, the guest_mode bit should be set and cleared only when
SVM updates posted-interrupt interrupt remapping information.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Joerg Roedel <jroedel@suse.de>
Fixes: d98de49a53e48 ('iommu/amd: Enable vAPIC interrupt remapping mode by default')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-06-28 14:44:56 +02:00
Arvind Yadav
01e1932a17 iommu/vt-d: Constify intel_dma_ops
Most dma_map_ops structures are never modified. Constify these
structures such that these can be write-protected.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-06-28 14:17:12 +02:00
Joerg Roedel
72dcac6334 iommu: Warn once when device_group callback returns NULL
This callback should never return NULL. Print a warning if
that happens so that we notice and can fix it.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-06-28 13:29:46 +02:00
Joerg Roedel
8faf5e5a12 iommu/omap: Return ERR_PTR in device_group call-back
Make sure that the device_group callback returns an ERR_PTR
instead of NULL.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-06-28 13:29:45 +02:00
Joerg Roedel
7f7a2304aa iommu: Return ERR_PTR() values from device_group call-backs
The generic device_group call-backs in iommu.c return NULL
in case of error. Since they are getting ERR_PTR values from
iommu_group_alloc(), just pass them up instead.

Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-06-28 13:29:45 +02:00
Joerg Roedel
0929deca40 iommu/s390: Use iommu_group_get_for_dev() in s390_iommu_add_device()
The iommu_group_get_for_dev() function also attaches the
device to its group, so this code doesn't need to be in the
iommu driver.

Further by using this function the driver can make use of
default domains in the future.

Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-06-28 12:29:00 +02:00
Sebastian Andrzej Siewior
58c4a95f90 iommu/vt-d: Don't disable preemption while accessing deferred_flush()
get_cpu() disables preemption and returns the current CPU number. The
CPU number is only used once while retrieving the address of the local's
CPU deferred_flush pointer.
We can instead use raw_cpu_ptr() while we remain preemptible. The worst
thing that can happen is that flush_unmaps_timeout() is invoked multiple
times: once by taskA after seeing HIGH_WATER_MARK and then preempted to
another CPU and then by taskB which saw HIGH_WATER_MARK on the same CPU
as taskA. It is also likely that ->size got from HIGH_WATER_MARK to 0
right after its read because another CPU invoked flush_unmaps_timeout()
for this CPU.
The access to flush_data is protected by a spinlock so even if we get
migrated to another CPU or preempted - the data structure is protected.

While at it, I marked deferred_flush static since I can't find a
reference to it outside of this file.

Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: iommu@lists.linux-foundation.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-06-28 12:24:40 +02:00
Sebastian Andrzej Siewior
aaffaa8a3b iommu/iova: Don't disable preempt around this_cpu_ptr()
Commit 583248e6620a ("iommu/iova: Disable preemption around use of
this_cpu_ptr()") disables preemption while accessing a per-CPU variable.
This does keep lockdep quiet. However I don't see the point why it is
bad if we get migrated after its access to another CPU.
__iova_rcache_insert() and __iova_rcache_get() immediately locks the
variable after obtaining it - before accessing its members.
_If_ we get migrated away after retrieving the address of cpu_rcache
before taking the lock then the *other* task on the same CPU will
retrieve the same address of cpu_rcache and will spin on the lock.

alloc_iova_fast() disables preemption while invoking
free_cpu_cached_iovas() on each CPU. The function itself uses
per_cpu_ptr() which does not trigger a warning (like this_cpu_ptr()
does). It _could_ make sense to use get_online_cpus() instead but the we
have a hotplug notifier for CPU down (and none for up) so we are good.

Cc: Joerg Roedel <joro@8bytes.org>
Cc: iommu@lists.linux-foundation.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-06-28 12:23:01 +02:00
Joerg Roedel
0b25635bd4 Merge branch 'for-joerg/arm-smmu/updates' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into arm/smmu 2017-06-28 10:42:54 +02:00
Geetha Sowjanya
f935448acf iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #126
Cavium ThunderX2 SMMU doesn't support MSI and also doesn't have unique irq
lines for gerror, eventq and cmdq-sync.

New named irq "combined" is set as a errata workaround, which allows to
share the irq line by register single irq handler for all the interrupts.

Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Geetha sowjanya <gakula@caviumnetworks.com>
[will: reworked irq equality checking and added SPI check]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:58:04 +01:00
shameer
99caf177f6 iommu/arm-smmu-v3: Enable ACPI based HiSilicon CMD_PREFETCH quirk(erratum 161010701)
HiSilicon SMMUv3 on Hip06/Hip07 platforms doesn't support CMD_PREFETCH
command. The dt based support for this quirk is already present in the
driver(hisilicon,broken-prefetch-cmd). This adds ACPI support for the
quirk using the IORT smmu model number.

Signed-off-by: shameer <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: hanjun <guohanjun@huawei.com>
[will: rewrote patch]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:58:04 +01:00
Linu Cherian
e5b829de05 iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74
Cavium ThunderX2 SMMU implementation doesn't support page 1 register space
and PAGE0_REGS_ONLY option is enabled as an errata workaround.
This option when turned on, replaces all page 1 offsets used for
EVTQ_PROD/CONS, PRIQ_PROD/CONS register access with page 0 offsets.

SMMU resource size checks are now based on SMMU option PAGE0_REGS_ONLY,
since resource size can be either 64k/128k.
For this, arm_smmu_device_dt_probe/acpi_probe has been moved before
platform_get_resource call, so that SMMU options are set beforehand.

Signed-off-by: Linu Cherian <linu.cherian@cavium.com>
Signed-off-by: Geetha Sowjanya <geethasowjanya.akula@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:58:03 +01:00
Robert Richter
12275bf0a4 iommu/arm-smmu-v3, acpi: Add temporary Cavium SMMU-V3 IORT model number definitions
The model number is already defined in acpica and we are actually
waiting for the acpi maintainers to include it:

 https://github.com/acpica/acpica/commit/d00a4eb86e64

Adding those temporary definitions until the change makes it into
include/acpi/actbl2.h. Once that is done this patch can be reverted.

Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:58:03 +01:00
Will Deacon
77f3445866 iommu/io-pgtable-arm: Use dma_wmb() instead of wmb() when publishing table
When writing a new table entry, we must ensure that the contents of the
table is made visible to the SMMU page table walker before the updated
table entry itself.

This is currently achieved using wmb(), which expands to an expensive and
unnecessary DSB instruction. Ideally, we'd just use cmpxchg64_release when
writing the table entry, but this doesn't have memory ordering semantics
on !SMP systems.

Instead, use dma_wmb(), which emits DMB OSHST. Strictly speaking, this
does more than we require (since it targets the outer-shareable domain),
but it's likely to be significantly faster than the DSB approach.

Reported-by: Linu Cherian <linu.cherian@cavium.com>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:58:02 +01:00
Will Deacon
c1004803b4 iommu/io-pgtable: depend on !GENERIC_ATOMIC64 when using COMPILE_TEST with LPAE
The LPAE/ARMv8 page table format relies on the ability to read and write
64-bit page table entries in an atomic fashion. With the move to a lockless
implementation, we also need support for cmpxchg64 to resolve races when
installing table entries concurrently.

Unfortunately, not all architectures support cmpxchg64, so the code can
fail to compiler when building for these architectures using COMPILE_TEST.
Rather than disable COMPILE_TEST altogether, instead check that
GENERIC_ATOMIC64 is not selected, which is a reasonable indication that
the architecture has support for 64-bit cmpxchg.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:58:02 +01:00
Robin Murphy
58188afeb7 iommu/arm-smmu-v3: Remove io-pgtable spinlock
As for SMMUv2, take advantage of io-pgtable's newfound tolerance for
concurrency. Unfortunately in this case the command queue lock remains a
point of serialisation for the unmap path, but there may be a little
more we can do to ameliorate that in future.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:58:01 +01:00
Robin Murphy
523d7423e2 iommu/arm-smmu: Remove io-pgtable spinlock
With the io-pgtable code now robust against (valid) races, we no longer
need to serialise all operations with a lock. This might make broken
callers who issue concurrent operations on overlapping addresses go even
more wrong than before, but hey, they already had little hope of useful
or deterministic results.

We do however still have to keep a lock around to serialise the ATS1*
translation ops, as parallel iova_to_phys() calls could lead to
unpredictable hardware behaviour otherwise.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:58:01 +01:00
Robin Murphy
119ff305b0 iommu/io-pgtable-arm-v7s: Support lockless operation
Mirroring the LPAE implementation, rework the v7s code to be robust
against concurrent operations. The same two potential races exist, and
are solved in the same manner, with the fixed 2-level structure making
life ever so slightly simpler.

What complicates matters compared to LPAE, however, is large page
entries, since we can't update a block of 16 PTEs atomically, nor assume
available software bits to do clever things with. As most users are
never likely to do partial unmaps anyway (due to DMA API rules), it
doesn't seem unreasonable for this case to remain behind a serialising
lock; we just pull said lock down into the bowels of the implementation
so it's well out of the way of the normal call paths.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:58:00 +01:00
Robin Murphy
2c3d273eab iommu/io-pgtable-arm: Support lockless operation
For parallel I/O with multiple concurrent threads servicing the same
device (or devices, if several share a domain), serialising page table
updates becomes a massive bottleneck. On reflection, though, we don't
strictly need to do that - for valid IOMMU API usage, there are in fact
only two races that we need to guard against: multiple map requests for
different blocks within the same region, when the intermediate-level
table for that region does not yet exist; and multiple unmaps of
different parts of the same block entry. Both of those are fairly easily
solved by using a cmpxchg to install the new table, such that if we then
find that someone else's table got there first, we can simply free ours
and continue.

Make the requisite changes such that we can withstand being called
without the caller maintaining a lock. In theory, this opens up a few
corners in which wildly misbehaving callers making nonsensical
overlapping requests might lead to crashes instead of just unpredictable
results, but correct code really does not deserve to pay a significant
performance cost for the sake of masking bugs in theoretical broken code.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:58:00 +01:00
Robin Murphy
81b3c25218 iommu/io-pgtable: Introduce explicit coherency
Once we remove the serialising spinlock, a potential race opens up for
non-coherent IOMMUs whereby a caller of .map() can be sure that cache
maintenance has been performed on their new PTE, but will have no
guarantee that such maintenance for table entries above it has actually
completed (e.g. if another CPU took an interrupt immediately after
writing the table entry, but before initiating the DMA sync).

Handling this race safely will add some potentially non-trivial overhead
to installing a table entry, which we would much rather avoid on
coherent systems where it will be unnecessary, and where we are stirivng
to minimise latency by removing the locking in the first place.

To that end, let's introduce an explicit notion of cache-coherency to
io-pgtable, such that we will be able to avoid penalising IOMMUs which
know enough to know when they are coherent.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:58:00 +01:00
Robin Murphy
b9f1ef30ac iommu/io-pgtable-arm-v7s: Refactor split_blk_unmap
Whilst the short-descriptor format's split_blk_unmap implementation has
no need to be recursive, it followed the pattern of the LPAE version
anyway for the sake of consistency. With the latter now reworked for
both efficiency and future scalability improvements, tweak the former
similarly, not least to make it less obtuse.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:57:59 +01:00
Robin Murphy
fb3a95795d iommu/io-pgtable-arm: Improve split_blk_unmap
The current split_blk_unmap implementation suffers from some inscrutable
pointer trickery for creating the tables to replace the block entry, but
more than that it also suffers from hideous inefficiency. For example,
the most pathological case of unmapping a level 3 page from a level 1
block will allocate 513 lower-level tables to remap the entire block at
page granularity, when only 2 are actually needed (the rest can be
covered by level 2 block entries).

Also, we would like to be able to relax the spinlock requirement in
future, for which the roll-back-and-try-again logic for race resolution
would be pretty hideous under the current paradigm.

Both issues can be resolved most neatly by turning things sideways:
instead of repeatedly recursing into __arm_lpae_map() map to build up an
entire new sub-table depth-first, we can directly replace the block
entry with a next-level table of block/page entries, then repeat by
unmapping at the next level if necessary. With a little refactoring of
some helper functions, the code ends up not much bigger than before, but
considerably easier to follow and to adapt in future.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:57:59 +01:00
Robin Murphy
9db829d281 iommu/io-pgtable-arm-v7s: Check table PTEs more precisely
Whilst we don't support the PXN bit at all, so should never encounter a
level 1 section or supersection PTE with it set, it would still be wise
to check both table type bits to resolve any theoretical ambiguity.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:57:58 +01:00
Arvind Yadav
5c2d021829 iommu: arm-smmu: Handle return of iommu_device_register.
iommu_device_register returns an error code and, although it currently
never fails, we should check its return value anyway.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
[will: adjusted to follow arm-smmu.c]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:57:58 +01:00
Arvind Yadav
ebdd13c93f iommu: arm-smmu-v3: make of_device_ids const
of_device_ids are not supposed to change at runtime. All functions
working with of_device_ids provided by <linux/of.h> work with const
of_device_ids. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:57:57 +01:00
Robin Murphy
84c24379a7 iommu/arm-smmu: Plumb in new ACPI identifiers
Revision C of IORT now allows us to identify ARM MMU-401 and the Cavium
ThunderX implementation. Wire them up so that we can probe these models
once firmware starts using the new codes in place of generic ones, and
so that the appropriate features and quirks get enabled when we do.

For the sake of backports and mitigating sychronisation problems with
the ACPICA headers, we'll carry a backup copy of the new definitions
locally for the short term to make life simpler.

CC: stable@vger.kernel.org # 4.10
Acked-by: Robert Richter <rrichter@cavium.com>
Tested-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:57:57 +01:00