36301 Commits

Author SHA1 Message Date
Juergen Gross
56415c4c3d x86/xen: eliminate xen-asm_64.S
With 32-bit pv-guest support removed xen-asm_64.S can be merged with
xen-asm.S

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2020-08-11 08:26:58 +02:00
Juergen Gross
a13f2ef168 x86/xen: remove 32-bit Xen PV guest support
Xen is requiring 64-bit machines today and since Xen 4.14 it can be
built without 32-bit PV guest support. There is no need to carry the
burden of 32-bit PV guest support in the kernel any longer, as new
guests can be either HVM or PVH, or they can use a 64 bit kernel.

Remove the 32-bit Xen PV support from the kernel.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2020-08-11 08:26:48 +02:00
Linus Torvalds
97d052ea3f A set of locking fixes and updates:
- Untangle the header spaghetti which causes build failures in various
     situations caused by the lockdep additions to seqcount to validate that
     the write side critical sections are non-preemptible.
 
   - The seqcount associated lock debug addons which were blocked by the
     above fallout.
 
     seqcount writers contrary to seqlock writers must be externally
     serialized, which usually happens via locking - except for strict per
     CPU seqcounts. As the lock is not part of the seqcount, lockdep cannot
     validate that the lock is held.
 
     This new debug mechanism adds the concept of associated locks.
     sequence count has now lock type variants and corresponding
     initializers which take a pointer to the associated lock used for
     writer serialization. If lockdep is enabled the pointer is stored and
     write_seqcount_begin() has a lockdep assertion to validate that the
     lock is held.
 
     Aside of the type and the initializer no other code changes are
     required at the seqcount usage sites. The rest of the seqcount API is
     unchanged and determines the type at compile time with the help of
     _Generic which is possible now that the minimal GCC version has been
     moved up.
 
     Adding this lockdep coverage unearthed a handful of seqcount bugs which
     have been addressed already independent of this.
 
     While generaly useful this comes with a Trojan Horse twist: On RT
     kernels the write side critical section can become preemtible if the
     writers are serialized by an associated lock, which leads to the well
     known reader preempts writer livelock. RT prevents this by storing the
     associated lock pointer independent of lockdep in the seqcount and
     changing the reader side to block on the lock when a reader detects
     that a writer is in the write side critical section.
 
  - Conversion of seqcount usage sites to associated types and initializers.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8xmPYTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoTuQEACyzQCjU8PgehPp9oMqWzaX2fcVyuZO
 QU2yw6gmz2oTz3ZHUNwdW8UnzGh2OWosK3kDruoD9FtSS51lER1/ISfSPCGfyqxC
 KTjOcB1Kvxwq/3LcCx7Zi3ZxWApat74qs3EhYhKtEiQ2Y9xv9rLq8VV1UWAwyxq0
 eHpjlIJ6b6rbt+ARslaB7drnccOsdK+W/roNj4kfyt+gezjBfojGRdMGQNMFcpnv
 shuTC+vYurAVIiVA/0IuizgHfwZiXOtVpjVoEWaxg6bBH6HNuYMYzdSa/YrlDkZs
 n/aBI/Xkvx+Eacu8b1Zwmbzs5EnikUK/2dMqbzXKUZK61eV4hX5c2xrnr1yGWKTs
 F/juh69Squ7X6VZyKVgJ9RIccVueqwR2EprXWgH3+RMice5kjnXH4zURp0GHALxa
 DFPfB6fawcH3Ps87kcRFvjgm6FBo0hJ1AxmsW1dY4ACFB9azFa2euW+AARDzHOy2
 VRsUdhL9CGwtPjXcZ/9Rhej6fZLGBXKr8uq5QiMuvttp4b6+j9FEfBgD4S6h8csl
 AT2c2I9LcbWqyUM9P4S7zY/YgOZw88vHRuDH7tEBdIeoiHfrbSBU7EQ9jlAKq/59
 f+Htu2Io281c005g7DEeuCYvpzSYnJnAitj5Lmp/kzk2Wn3utY1uIAVszqwf95Ul
 81ppn2KlvzUK8g==
 =7Gj+
 -----END PGP SIGNATURE-----

Merge tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Thomas Gleixner:
 "A set of locking fixes and updates:

   - Untangle the header spaghetti which causes build failures in
     various situations caused by the lockdep additions to seqcount to
     validate that the write side critical sections are non-preemptible.

   - The seqcount associated lock debug addons which were blocked by the
     above fallout.

     seqcount writers contrary to seqlock writers must be externally
     serialized, which usually happens via locking - except for strict
     per CPU seqcounts. As the lock is not part of the seqcount, lockdep
     cannot validate that the lock is held.

     This new debug mechanism adds the concept of associated locks.
     sequence count has now lock type variants and corresponding
     initializers which take a pointer to the associated lock used for
     writer serialization. If lockdep is enabled the pointer is stored
     and write_seqcount_begin() has a lockdep assertion to validate that
     the lock is held.

     Aside of the type and the initializer no other code changes are
     required at the seqcount usage sites. The rest of the seqcount API
     is unchanged and determines the type at compile time with the help
     of _Generic which is possible now that the minimal GCC version has
     been moved up.

     Adding this lockdep coverage unearthed a handful of seqcount bugs
     which have been addressed already independent of this.

     While generally useful this comes with a Trojan Horse twist: On RT
     kernels the write side critical section can become preemtible if
     the writers are serialized by an associated lock, which leads to
     the well known reader preempts writer livelock. RT prevents this by
     storing the associated lock pointer independent of lockdep in the
     seqcount and changing the reader side to block on the lock when a
     reader detects that a writer is in the write side critical section.

   - Conversion of seqcount usage sites to associated types and
     initializers"

* tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  locking/seqlock, headers: Untangle the spaghetti monster
  locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header
  x86/headers: Remove APIC headers from <asm/smp.h>
  seqcount: More consistent seqprop names
  seqcount: Compress SEQCNT_LOCKNAME_ZERO()
  seqlock: Fold seqcount_LOCKNAME_init() definition
  seqlock: Fold seqcount_LOCKNAME_t definition
  seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
  hrtimer: Use sequence counter with associated raw spinlock
  kvm/eventfd: Use sequence counter with associated spinlock
  userfaultfd: Use sequence counter with associated spinlock
  NFSv4: Use sequence counter with associated spinlock
  iocost: Use sequence counter with associated spinlock
  raid5: Use sequence counter with associated spinlock
  vfs: Use sequence counter with associated spinlock
  timekeeping: Use sequence counter with associated raw spinlock
  xfrm: policy: Use sequence counters with associated lock
  netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
  netfilter: conntrack: Use sequence counter with associated spinlock
  sched: tasks: Use sequence counter with associated spinlock
  ...
2020-08-10 19:07:44 -07:00
Jon Doron
99b48ecc8e x86/kvm/hyper-v: Synic default SCONTROL MSR needs to be enabled
Based on an analysis of the HyperV firmwares (Gen1 and Gen2) it seems
like the SCONTROL is not being set to the ENABLED state as like we have
thought.

Also from a test done by Vitaly Kuznetsov, running a nested HyperV it
was concluded that the first access to the SCONTROL MSR with a read
resulted with the value of 0x1, aka HV_SYNIC_CONTROL_ENABLE.

It's important to note that this diverges from the value states in the
HyperV TLFS of 0.

Signed-off-by: Jon Doron <arilou@gmail.com>
Message-Id: <20200717125238.1103096-2-arilou@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-10 13:22:10 -04:00
Linus Torvalds
fc80c51fd4 Kbuild updates for v5.9
- run the checker (e.g. sparse) after the compiler
 
  - remove unneeded cc-option tests for old compiler flags
 
  - fix tar-pkg to install dtbs
 
  - introduce ccflags-remove-y and asflags-remove-y syntax
 
  - allow to trace functions in sub-directories of lib/
 
  - introduce hostprogs-always-y and userprogs-always-y syntax
 
  - various Makefile cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl8wJXEVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGMGEP/0jDq/WafbfPN0aU83EqEWLt/sKg
 bluzmf/6HGx3XVRnuAzsHNNqysUx77WJiDsU/jbC/zdH8Iox3Sc1diE2sELLNAfY
 iJmQ8NBPggyU74aYG3OJdpDjz8T9EX/nVaYrjyFlbuXElM+Qvo8Z4Fz6NpWqKWlA
 gU+yGxEPPdX6MLHcSPSIu1hGWx7UT4fgfx3zDFTI2qvbQgQjKtzyTjAH5Cm3o87h
 rfomvHSSoAUg+Fh1LediRh1tJlkdVO+w7c+LNwCswmdBtkZuxecj1bQGUTS8GaLl
 CCWOKYfWp0KsVf1veXNNNaX/ecbp+Y34WErFq3V9Fdq5RmVlp+FPSGMyjDMRiQ/p
 LGvzbJLPpG586MnK8of0dOj6Es6tVPuq6WH2HuvsyTGcZJDpFTTxRcK3HDkE8ig6
 ZtuM3owB/Mep8IzwY2yWQiDrc7TX5Fz8S4hzGPU1zG9cfj4VT6TBqHGAy1Eql/0l
 txj6vJpnbQSdXiIX8MIU3yH35Y7eW3JYWgspTZH5Woj1S/wAWwuG93Fuuxq6mQIJ
 q6LSkMavtOfuCjOA9vJBZewpKXRU6yo0CzWNL/5EZ6z/r/I+DGtfb/qka8oYUDjX
 9H0cecL37AQxDHRPTxCZDQF0TpYiFJ6bmnMftK9NKNuIdvsk9DF7UBa3EdUNIj38
 yKS3rI7Lw55xWuY3
 =bkNQ
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - run the checker (e.g. sparse) after the compiler

 - remove unneeded cc-option tests for old compiler flags

 - fix tar-pkg to install dtbs

 - introduce ccflags-remove-y and asflags-remove-y syntax

 - allow to trace functions in sub-directories of lib/

 - introduce hostprogs-always-y and userprogs-always-y syntax

 - various Makefile cleanups

* tag 'kbuild-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: stop filtering out $(GCC_PLUGINS_CFLAGS) from cc-option base
  kbuild: include scripts/Makefile.* only when relevant CONFIG is enabled
  kbuild: introduce hostprogs-always-y and userprogs-always-y
  kbuild: sort hostprogs before passing it to ifneq
  kbuild: move host .so build rules to scripts/gcc-plugins/Makefile
  kbuild: Replace HTTP links with HTTPS ones
  kbuild: trace functions in subdirectories of lib/
  kbuild: introduce ccflags-remove-y and asflags-remove-y
  kbuild: do not export LDFLAGS_vmlinux
  kbuild: always create directories of targets
  powerpc/boot: add DTB to 'targets'
  kbuild: buildtar: add dtbs support
  kbuild: remove cc-option test of -ffreestanding
  kbuild: remove cc-option test of -fno-stack-protector
  Revert "kbuild: Create directory for target DTB"
  kbuild: run the checker after the compiler
2020-08-09 14:10:26 -07:00
Paolo Bonzini
43bd9ef42b x86: Expose SERIALIZE for supported cpuid
The SERIALIZE instruction is supported by Tntel processors, like
Sapphire Rapids.  SERIALIZE is a faster serializing instruction which
does not modify registers, arithmetic flags or memory, will not cause VM
exit. It's availability is indicated by CPUID.(EAX=7,ECX=0):ECX[bit 14].

Expose it in KVM supported CPUID.  This way, KVM could pass this
information to guests and they can make use of these features accordingly.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-09 13:08:54 -04:00
Sean Christopherson
05487215e6 KVM: x86: Don't attempt to load PDPTRs when 64-bit mode is enabled
Don't attempt to load PDPTRs if EFER.LME=1, i.e. if 64-bit mode is
enabled.  A recent change to reload the PDTPRs when CR0.CD or CR0.NW is
toggled botched the EFER.LME handling and sends KVM down the PDTPR path
when is_paging() is true, i.e. when the guest toggles CD/NW in 64-bit
mode.

Split the CR0 checks for 64-bit vs. 32-bit PAE into separate paths.  The
64-bit path is specifically checking state when paging is toggled on,
i.e. CR0.PG transititions from 0->1.  The PDPTR path now needs to run if
the new CR0 state has paging enabled, irrespective of whether paging was
already enabled.  Trying to shave a few cycles to make the PDPTR path an
"else if" case is a mess.

Fixes: d42e3fae6faed ("kvm: x86: Read PDPTEs on CR0.CD and CR0.NW changes")
Cc: Jim Mattson <jmattson@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Peter Shier <pshier@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200714015732.32426-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-09 12:51:50 -04:00
Linus Torvalds
049eb096da pci-v5.9-changes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl8sdUkUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vwH2Q/7Brcm1uyLORSzseGsaXSGMncBs2YB
 aKbfhyy4BPsDIZRLnzcfRZzgKo3f4jlLH9dJ6nBukbNXCvS/g7oYCXtNKVuB70MD
 IgBH3OJxLmqsYgDkoQmj1fZBCBhdqMgGbRmeIPLqiIBrWOJkBpGHXKpb0XtyXAas
 CpD0Tvr0JBeHMluZq6Uay09jBDKexeCFrT5HCoVaRMXT/C/iB5K1oMrUczzITsdi
 jB9xesDjh32rYtaePKfuL8itbRT7jtqOwQlk7sCtnMNamaOOaYO/s6hL5v/4GxMh
 rtWa1knOxxA1nOsnEkUEHi0Fj/+9zXDIdb7v6thRDo0ZgWQxl7l3nshvmPcxX421
 tpCm3HqmvHzGqSI85Rtr3p4XKm9e+IjgE2EA/J6Y8Q6Grrb0EGJituhO4meL2Ciq
 6mxdhu7InxDJ2p3TLGas3fB/1hrCO0Fc0pQoBJx7YgqA1ANyld9DYCkDN6IDoZBI
 uUjKgkE1dfbW/pGjotjhBsmz3dycZHkurIFdt1iX/Xtt5KKdPAzu9yM2U03iIS2R
 im1wZ/THiS/YCOlgL/J8+DHTY0ZvXjAdbiSPjTFfwb9XTh8aHVWtFaaZON1jRIjg
 xMpIY0SxfshpLx631ThZdDTDiOwE8D3B+1n/kMwps6HOLpxOoJZeSGTRCt9wGP40
 j58DTtLm5FKpdYc=
 =moI9
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "Enumeration:
   - Fix pci_cfg_wait queue locking problem (Bjorn Helgaas)
   - Convert PCIe capability PCIBIOS errors to errno (Bolarinwa Olayemi
     Saheed)
   - Align PCIe capability and PCI accessor return values (Bolarinwa
     Olayemi Saheed)
   - Fix pci_create_slot() reference count leak (Qiushi Wu)
   - Announce device after early fixups (Tiezhu Yang)

  PCI device hotplug:
   - Make rpadlpar functions static (Wei Yongjun)

  Driver binding:
   - Add device even if driver attach failed (Rajat Jain)

  Virtualization:
   - xen: Remove redundant initialization of irq (Colin Ian King)

  IOMMU:
   - Add pci_pri_supported() to check device or associated PF (Ashok Raj)
   - Release IVRS table in AMD ACS quirk (Hanjun Guo)
   - Mark AMD Navi10 GPU rev 0x00 ATS as broken (Kai-Heng Feng)
   - Treat "external-facing" devices themselves as internal (Rajat Jain)

  MSI:
   - Forward MSI-X error code in pci_alloc_irq_vectors_affinity() (Piotr
     Stankiewicz)

  Error handling:
   - Clear PCIe Device Status errors only if OS owns AER (Jonathan
     Cameron)
   - Log correctable errors as warning, not error (Matt Jolly)
   - Use 'pci_channel_state_t' instead of 'enum pci_channel_state' (Luc
     Van Oostenryck)

  Peer-to-peer DMA:
   - Allow P2PDMA on AMD Zen and newer CPUs (Logan Gunthorpe)

  ASPM:
   - Add missing newline in sysfs 'policy' (Xiongfeng Wang)

  Native PCIe controllers:
   - Convert to devm_platform_ioremap_resource_byname() (Dejin Zheng)
   - Convert to devm_platform_ioremap_resource() (Dejin Zheng)
   - Remove duplicate error message from devm_pci_remap_cfg_resource()
     callers (Dejin Zheng)
   - Fix runtime PM imbalance on error (Dinghao Liu)
   - Remove dev_err() when handing an error from platform_get_irq()
     (Krzysztof Wilczyński)
   - Use pci_host_bridge.windows list directly instead of splicing in a
     temporary list for cadence, mvebu, host-common (Rob Herring)
   - Use pci_host_probe() instead of open-coding all the pieces for
     altera, brcmstb, iproc, mobiveil, rcar, rockchip, tegra, v3,
     versatile, xgene, xilinx, xilinx-nwl (Rob Herring)
   - Default host bridge parent device to the platform device (Rob
     Herring)
   - Use pci_is_root_bus() instead of tracking root bus number
     separately in aardvark, designware (imx6, keystone,
     designware-host), mobiveil, xilinx-nwl, xilinx, rockchip, rcar (Rob
     Herring)
   - Set host bridge bus number in pci_scan_root_bus_bridge() instead of
     each driver for aardvark, designware-host, host-common, mediatek,
     rcar, tegra, v3-semi (Rob Herring)
   - Move DT resource setup into devm_pci_alloc_host_bridge() (Rob
     Herring)
   - Set bridge map_irq and swizzle_irq to default functions; drivers
     that don't support legacy IRQs (iproc) need to undo this (Rob
     Herring)

  ARM Versatile PCIe controller driver:
   - Drop flag PCI_ENABLE_PROC_DOMAINS (Rob Herring)

  Cadence PCIe controller driver:
   - Use "dma-ranges" instead of "cdns,no-bar-match-nbits" property
     (Kishon Vijay Abraham I)
   - Remove "mem" from reg binding (Kishon Vijay Abraham I)
   - Fix cdns_pcie_{host|ep}_setup() error path (Kishon Vijay Abraham I)
   - Convert all r/w accessors to perform only 32-bit accesses (Kishon
     Vijay Abraham I)
   - Add support to start link and verify link status (Kishon Vijay
     Abraham I)
   - Allow pci_host_bridge to have custom pci_ops (Kishon Vijay Abraham I)
   - Add new *ops* for CPU addr fixup (Kishon Vijay Abraham I)
   - Fix updating Vendor ID and Subsystem Vendor ID register (Kishon
     Vijay Abraham I)
   - Use bridge resources for outbound window setup (Rob Herring)
   - Remove private bus number and range storage (Rob Herring)

  Cadence PCIe endpoint driver:
   - Add MSI-X support (Alan Douglas)

  HiSilicon PCIe controller driver:
   - Remove non-ECAM HiSilicon hip05/hip06 driver (Rob Herring)

  Intel VMD host bridge driver:
   - Use Shadow MEMBAR registers for QEMU/KVM guests (Jon Derrick)

  Loongson PCIe controller driver:
   - Use DECLARE_PCI_FIXUP_EARLY for bridge_class_quirk() (Tiezhu Yang)

  Marvell Aardvark PCIe controller driver:
   - Indicate error in 'val' when config read fails (Pali Rohár)
   - Don't touch PCIe registers if no card connected (Pali Rohár)

  Marvell MVEBU PCIe controller driver:
   - Setup BAR0 in order to fix MSI (Shmuel Hazan)

  Microsoft Hyper-V host bridge driver:
   - Fix a timing issue which causes kdump to fail occasionally (Wei Hu)
   - Make some functions static (Wei Yongjun)

  NVIDIA Tegra PCIe controller driver:
   - Revert tegra124 raw_violation_fixup (Nicolas Chauvet)
   - Remove PLL power supplies (Thierry Reding)

  Qualcomm PCIe controller driver:
   - Change duplicate PCI reset to phy reset (Abhishek Sahu)
   - Add missing ipq806x clocks in PCIe driver (Ansuel Smith)
   - Add missing reset for ipq806x (Ansuel Smith)
   - Add ext reset (Ansuel Smith)
   - Use bulk clk API and assert on error (Ansuel Smith)
   - Add support for tx term offset for rev 2.1.0 (Ansuel Smith)
   - Define some PARF params needed for ipq8064 SoC (Ansuel Smith)
   - Add ipq8064 rev2 variant (Ansuel Smith)
   - Support PCI speed set for ipq806x (Sham Muthayyan)

  Renesas R-Car PCIe controller driver:
   - Use devm_pci_alloc_host_bridge() (Rob Herring)
   - Use struct pci_host_bridge.windows list directly (Rob Herring)
   - Convert rcar-gen2 to use modern host bridge probe functions (Rob
     Herring)

  TI J721E PCIe driver:
   - Add TI J721E PCIe host and endpoint driver (Kishon Vijay Abraham I)

  Xilinx Versal CPM PCIe controller driver:
   - Add Versal CPM Root Port driver and YAML schema (Bharat Kumar
     Gogada)

  MicroSemi Switchtec management driver:
   - Add missing __iomem and __user tags to fix sparse warnings (Logan
     Gunthorpe)

  Miscellaneous:
   - Replace http:// links with https:// (Alexander A. Klimov)
   - Replace lkml.org, spinics, gmane with lore.kernel.org (Bjorn
     Helgaas)
   - Remove unused pci_lost_interrupt() (Heiner Kallweit)
   - Move PCI_VENDOR_ID_REDHAT definition to pci_ids.h (Huacai Chen)
   - Fix kerneldoc warnings (Krzysztof Kozlowski)"

* tag 'pci-v5.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (113 commits)
  PCI: Fix kerneldoc warnings
  PCI: xilinx-cpm: Add Versal CPM Root Port driver
  PCI: xilinx-cpm: Add YAML schemas for Versal CPM Root Port
  PCI: Set bridge map_irq and swizzle_irq to default functions
  PCI: Move DT resource setup into devm_pci_alloc_host_bridge()
  PCI: rcar-gen2: Convert to use modern host bridge probe functions
  PCI: Remove dev_err() when handing an error from platform_get_irq()
  MAINTAINERS: Add Kishon Vijay Abraham I for TI J721E SoC PCIe
  misc: pci_endpoint_test: Add J721E in pci_device_id table
  PCI: j721e: Add TI J721E PCIe driver
  PCI: switchtec: Add missing __iomem tag to fix sparse warnings
  PCI: switchtec: Add missing __iomem and __user tags to fix sparse warnings
  PCI: rpadlpar: Make functions static
  PCI/P2PDMA: Allow P2PDMA on AMD Zen and newer CPUs
  PCI: Release IVRS table in AMD ACS quirk
  PCI: Announce device after early fixups
  PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken
  PCI: Remove unused pci_lost_interrupt()
  dt-bindings: PCI: Add EP mode dt-bindings for TI's J721E SoC
  dt-bindings: PCI: Add host mode dt-bindings for TI's J721E SoC
  ...
2020-08-07 18:48:15 -07:00
Linus Torvalds
32663c78c1 Tracing updates for 5.9
- The biggest news in that the tracing ring buffer can now time events that
    interrupted other ring buffer events. Before this change, if an interrupt
    came in while recording another event, and that interrupt also had an
    event, those events would all have the same time stamp as the event it
    interrupted. Now, with the new design, those events will have a unique time
    stamp and rightfully display the time for those events that were recorded
    while interrupting another event.
 
  - Bootconfig how has an "override" operator that lets the users have a
    default config, but then add options to override the default.
 
  - A fix was made to properly filter function graph tracing to the ftrace
    PIDs. This came in at the end of the -rc cycle, and needs to be backported.
 
  - Several clean ups, performance updates, and minor fixes as well.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXy3GOBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qphsAP9ci1jtrC2+cMBMCNKb/AFpA/nDaKsD
 hpsDzvD0YPOmCAEA9QbZset8wUNG49R4FexP7egQ8Ad2S6Oa5f60jWleDQY=
 =lH+q
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

 - The biggest news in that the tracing ring buffer can now time events
   that interrupted other ring buffer events.

   Before this change, if an interrupt came in while recording another
   event, and that interrupt also had an event, those events would all
   have the same time stamp as the event it interrupted.

   Now, with the new design, those events will have a unique time stamp
   and rightfully display the time for those events that were recorded
   while interrupting another event.

 - Bootconfig how has an "override" operator that lets the users have a
   default config, but then add options to override the default.

 - A fix was made to properly filter function graph tracing to the
   ftrace PIDs. This came in at the end of the -rc cycle, and needs to
   be backported.

 - Several clean ups, performance updates, and minor fixes as well.

* tag 'trace-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (39 commits)
  tracing: Add trace_array_init_printk() to initialize instance trace_printk() buffers
  kprobes: Fix compiler warning for !CONFIG_KPROBES_ON_FTRACE
  tracing: Use trace_sched_process_free() instead of exit() for pid tracing
  bootconfig: Fix to find the initargs correctly
  Documentation: bootconfig: Add bootconfig override operator
  tools/bootconfig: Add testcases for value override operator
  lib/bootconfig: Add override operator support
  kprobes: Remove show_registers() function prototype
  tracing/uprobe: Remove dead code in trace_uprobe_register()
  kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler
  ftrace: Fix ftrace_trace_task return value
  tracepoint: Use __used attribute definitions from compiler_attributes.h
  tracepoint: Mark __tracepoint_string's __used
  trace : Have tracing buffer info use kvzalloc instead of kzalloc
  tracing: Remove outdated comment in stack handling
  ftrace: Do not let direct or IPMODIFY ftrace_ops be added to module and set trampolines
  ftrace: Setup correct FTRACE_FL_REGS flags for module
  tracing/hwlat: Honor the tracing_cpumask
  tracing/hwlat: Drop the duplicate assignment in start_kthread()
  tracing: Save one trace_event->type by using __TRACE_LAST_TYPE
  ...
2020-08-07 18:29:15 -07:00
Linus Torvalds
0f43283be7 Merge branch 'work.fdpic' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull fdpick coredump update from Al Viro:
 "Switches fdpic coredumps away from original aout dumping primitives to
  the same kind of regset use as regular elf coredumps do"

* 'work.fdpic' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  [elf-fdpic] switch coredump to regsets
  [elf-fdpic] use elf_dump_thread_status() for the dumper thread as well
  [elf-fdpic] move allocation of elf_thread_status into elf_dump_thread_status()
  [elf-fdpic] coredump: don't bother with cyclic list for per-thread objects
  kill elf_fpxregs_t
  take fdpic-related parts of elf_prstatus out
  unexport linux/elfcore.h
2020-08-07 13:29:39 -07:00
Linus Torvalds
81e11336d9 Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:

 - a few MM hotfixes

 - kthread, tools, scripts, ntfs and ocfs2

 - some of MM

Subsystems affected by this patch series: kthread, tools, scripts, ntfs,
ocfs2 and mm (hofixes, pagealloc, slab-generic, slab, slub, kcsan,
debug, pagecache, gup, swap, shmem, memcg, pagemap, mremap, mincore,
sparsemem, vmalloc, kasan, pagealloc, hugetlb and vmscan).

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits)
  mm: vmscan: consistent update to pgrefill
  mm/vmscan.c: fix typo
  khugepaged: khugepaged_test_exit() check mmget_still_valid()
  khugepaged: retract_page_tables() remember to test exit
  khugepaged: collapse_pte_mapped_thp() protect the pmd lock
  khugepaged: collapse_pte_mapped_thp() flush the right range
  mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible
  mm: thp: replace HTTP links with HTTPS ones
  mm/page_alloc: fix memalloc_nocma_{save/restore} APIs
  mm/page_alloc.c: skip setting nodemask when we are in interrupt
  mm/page_alloc: fallbacks at most has 3 elements
  mm/page_alloc: silence a KASAN false positive
  mm/page_alloc.c: remove unnecessary end_bitidx for [set|get]_pfnblock_flags_mask()
  mm/page_alloc.c: simplify pageblock bitmap access
  mm/page_alloc.c: extract the common part in pfn_to_bitidx()
  mm/page_alloc.c: replace the definition of NR_MIGRATETYPE_BITS with PB_migratetype_bits
  mm/shuffle: remove dynamic reconfiguration
  mm/memory_hotplug: document why shuffle_zone() is relevant
  mm/page_alloc: remove nr_free_pagecache_pages()
  mm: remove vm_total_pages
  ...
2020-08-07 11:39:33 -07:00
Mike Rapoport
c89ab04feb mm/sparse: cleanup the code surrounding memory_present()
After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP we have two equivalent
functions that call memory_present() for each region in memblock.memory:
sparse_memory_present_with_active_regions() and membocks_present().

Moreover, all architectures have a call to either of these functions
preceding the call to sparse_init() and in the most cases they are called
one after the other.

Mark the regions from memblock.memory as present during sparce_init() by
making sparse_init() call memblocks_present(), make memblocks_present()
and memory_present() functions static and remove redundant
sparse_memory_present_with_active_regions() function.

Also remove no longer required HAVE_MEMORY_PRESENT configuration option.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200712083130.22919-1-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:27 -07:00
Anshuman Khandual
56993b4e14 mm/sparsemem: enable vmem_altmap support in vmemmap_alloc_block_buf()
There are many instances where vmemap allocation is often switched between
regular memory and device memory just based on whether altmap is available
or not.  vmemmap_alloc_block_buf() is used in various platforms to
allocate vmemmap mappings.  Lets also enable it to handle altmap based
device memory allocation along with existing regular memory allocations.
This will help in avoiding the altmap based allocation switch in many
places.  To summarize there are two different methods to call
vmemmap_alloc_block_buf().

vmemmap_alloc_block_buf(size, node, NULL)   /* Allocate from system RAM */
vmemmap_alloc_block_buf(size, node, altmap) /* Allocate from altmap */

This converts altmap_alloc_block_buf() into a static function, drops it's
entry from the header and updates Documentation/vm/memory-model.rst.

Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Jia He <justin.he@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Will Deacon <will@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yu Zhao <yuzhao@google.com>
Link: http://lkml.kernel.org/r/1594004178-8861-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:27 -07:00
Anshuman Khandual
1d9cfee753 mm/sparsemem: enable vmem_altmap support in vmemmap_populate_basepages()
Patch series "arm64: Enable vmemmap mapping from device memory", v4.

This series enables vmemmap backing memory allocation from device memory
ranges on arm64.  But before that, it enables vmemmap_populate_basepages()
and vmemmap_alloc_block_buf() to accommodate struct vmem_altmap based
alocation requests.

This patch (of 3):

vmemmap_populate_basepages() is used across platforms to allocate backing
memory for vmemmap mapping.  This is used as a standard default choice or
as a fallback when intended huge pages allocation fails.  This just
creates entire vmemmap mapping with base pages (PAGE_SIZE).

On arm64 platforms, vmemmap_populate_basepages() is called instead of the
platform specific vmemmap_populate() when ARM64_SWAPPER_USES_SECTION_MAPS
is not enabled as in case for ARM64_16K_PAGES and ARM64_64K_PAGES configs.

At present vmemmap_populate_basepages() does not support allocating from
driver defined struct vmem_altmap while trying to create vmemmap mapping
for a device memory range.  It prevents ARM64_16K_PAGES and
ARM64_64K_PAGES configs on arm64 from supporting device memory with
vmemap_altmap request.

This enables vmem_altmap support in vmemmap_populate_basepages() unlocking
device memory allocation for vmemap mapping on arm64 platforms with 16K or
64K base page configs.

Each architecture should evaluate and decide on subscribing device memory
based base page allocation through vmemmap_populate_basepages().  Hence
lets keep it disabled on all archs in order to preserve the existing
semantics.  A subsequent patch enables it on arm64.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Jia He <justin.he@arm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Yu Zhao <yuzhao@google.com>
Link: http://lkml.kernel.org/r/1594004178-8861-1-git-send-email-anshuman.khandual@arm.com
Link: http://lkml.kernel.org/r/1594004178-8861-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:27 -07:00
Mike Rapoport
f9cb654cb5 asm-generic: pgalloc: provide generic pgd_free()
Most architectures define pgd_free() as a wrapper for free_page().

Provide a generic version in asm-generic/pgalloc.h and enable its use for
most architectures.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Matthew Wilcox <willy@infradead.org>
Link: http://lkml.kernel.org/r/20200627143453.31835-7-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:26 -07:00
Mike Rapoport
d9e8b92967 asm-generic: pgalloc: provide generic pud_alloc_one() and pud_free_one()
Several architectures define pud_alloc_one() as a wrapper for
__get_free_page() and pud_free() as a wrapper for free_page().

Provide a generic implementation in asm-generic/pgalloc.h and use it where
appropriate.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Matthew Wilcox <willy@infradead.org>
Link: http://lkml.kernel.org/r/20200627143453.31835-6-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:26 -07:00
Mike Rapoport
1355c31eeb asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one()
For most architectures that support >2 levels of page tables,
pmd_alloc_one() is a wrapper for __get_free_pages(), sometimes with
__GFP_ZERO and sometimes followed by memset(0) instead.

More elaborate versions on arm64 and x86 account memory for the user page
tables and call to pgtable_pmd_page_ctor() as the part of PMD page
initialization.

Move the arm64 version to include/asm-generic/pgalloc.h and use the
generic version on several architectures.

The pgtable_pmd_page_ctor() is a NOP when ARCH_ENABLE_SPLIT_PMD_PTLOCK is
not enabled, so there is no functional change for most architectures
except of the addition of __GFP_ACCOUNT for allocation of user page
tables.

The pmd_free() is a wrapper for free_page() in all the cases, so no
functional change here.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Link: http://lkml.kernel.org/r/20200627143453.31835-5-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:26 -07:00
Mike Rapoport
ca15ca406f mm: remove unneeded includes of <asm/pgalloc.h>
Patch series "mm: cleanup usage of <asm/pgalloc.h>"

Most architectures have very similar versions of pXd_alloc_one() and
pXd_free_one() for intermediate levels of page table.  These patches add
generic versions of these functions in <asm-generic/pgalloc.h> and enable
use of the generic functions where appropriate.

In addition, functions declared and defined in <asm/pgalloc.h> headers are
used mostly by core mm and early mm initialization in arch and there is no
actual reason to have the <asm/pgalloc.h> included all over the place.
The first patch in this series removes unneeded includes of
<asm/pgalloc.h>

In the end it didn't work out as neatly as I hoped and moving
pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require
unnecessary changes to arches that have custom page table allocations, so
I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
to mm/.

This patch (of 8):

In most cases <asm/pgalloc.h> header is required only for allocations of
page table memory.  Most of the .c files that include that header do not
use symbols declared in <asm/pgalloc.h> and do not require that header.

As for the other header files that used to include <asm/pgalloc.h>, it is
possible to move that include into the .c file that actually uses symbols
from <asm/pgalloc.h> and drop the include from the header file.

The process was somewhat automated using

	sed -i -E '/[<"]asm\/pgalloc\.h/d' \
                $(grep -L -w -f /tmp/xx \
                        $(git grep -E -l '[<"]asm/pgalloc\.h'))

where /tmp/xx contains all the symbols defined in
arch/*/include/asm/pgalloc.h.

[rppt@linux.ibm.com: fix powerpc warning]

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org
Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:26 -07:00
Waiman Long
453431a549 mm, treewide: rename kzfree() to kfree_sensitive()
As said by Linus:

  A symmetric naming is only helpful if it implies symmetries in use.
  Otherwise it's actively misleading.

  In "kzalloc()", the z is meaningful and an important part of what the
  caller wants.

  In "kzfree()", the z is actively detrimental, because maybe in the
  future we really _might_ want to use that "memfill(0xdeadbeef)" or
  something. The "zero" part of the interface isn't even _relevant_.

The main reason that kzfree() exists is to clear sensitive information
that should not be leaked to other future users of the same memory
objects.

Rename kzfree() to kfree_sensitive() to follow the example of the recently
added kvfree_sensitive() and make the intention of the API more explicit.
In addition, memzero_explicit() is used to clear the memory to make sure
that it won't get optimized away by the compiler.

The renaming is done by using the command sequence:

  git grep -w --name-only kzfree |\
  xargs sed -i 's/kzfree/kfree_sensitive/'

followed by some editing of the kfree_sensitive() kerneldoc and adding
a kzfree backward compatibility macro in slab.h.

[akpm@linux-foundation.org: fs/crypto/inline_crypt.c needs linux/slab.h]
[akpm@linux-foundation.org: fix fs/crypto/inline_crypt.c some more]

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Joe Perches <joe@perches.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
Link: http://lkml.kernel.org/r/20200616154311.12314-3-longman@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:22 -07:00
Linus Torvalds
e51418191f xen: branch for v5.9-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCXyzfRgAKCRCAXGG7T9hj
 vtYfAQCiqT8MBvwTpqsiu2si0ceyBZ7y8cuEqOMvqPTH0N1bYQD+JD1mMN2WjR0d
 nKWwy/i8lXudBsDqY4N81WWBh+FMPA4=
 =Lb6P
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.9-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:

 - two trivial comment fixes

 - a small series for the Xen balloon driver fixing some issues

 - a series of the Xen privcmd driver targeting elimination of using
   get_user_pages*() in this driver

 - a series for the Xen swiotlb driver cleaning it up and adding support
   for letting the kernel run as dom0 on Rpi4

* tag 'for-linus-5.9-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/arm: call dma_to_phys on the dma_addr_t parameter of dma_cache_maint
  xen/arm: introduce phys/dma translations in xen_dma_sync_for_*
  swiotlb-xen: introduce phys_to_dma/dma_to_phys translations
  swiotlb-xen: remove XEN_PFN_PHYS
  swiotlb-xen: add struct device * parameter to is_xen_swiotlb_buffer
  swiotlb-xen: add struct device * parameter to xen_dma_sync_for_device
  swiotlb-xen: add struct device * parameter to xen_dma_sync_for_cpu
  swiotlb-xen: add struct device * parameter to xen_bus_to_phys
  swiotlb-xen: add struct device * parameter to xen_phys_to_bus
  swiotlb-xen: remove start_dma_addr
  swiotlb-xen: use vmalloc_to_page on vmalloc virt addresses
  Revert "xen/balloon: Fix crash when ballooning on x86 32 bit PAE"
  xen/balloon: make the balloon wait interruptible
  xen/balloon: fix accounting in alloc_xenballooned_pages error path
  xen: hypercall.h: fix duplicated word
  xen/gntdev: gntdev.h: drop a duplicated word
  xen/privcmd: Convert get_user_pages*() to pin_user_pages*()
  xen/privcmd: Mark pages as dirty
  xen/privcmd: Corrected error handling path
2020-08-07 10:53:02 -07:00
Linus Torvalds
e1ec517e18 Merge branch 'hch.init_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull init and set_fs() cleanups from Al Viro:
 "Christoph's 'getting rid of ksys_...() uses under KERNEL_DS' series"

* 'hch.init_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (50 commits)
  init: add an init_dup helper
  init: add an init_utimes helper
  init: add an init_stat helper
  init: add an init_mknod helper
  init: add an init_mkdir helper
  init: add an init_symlink helper
  init: add an init_link helper
  init: add an init_eaccess helper
  init: add an init_chmod helper
  init: add an init_chown helper
  init: add an init_chroot helper
  init: add an init_chdir helper
  init: add an init_rmdir helper
  init: add an init_unlink helper
  init: add an init_umount helper
  init: add an init_mount helper
  init: mark create_dev as __init
  init: mark console_on_rootfs as __init
  init: initialize ramdisk_execute_command at compile time
  devtmpfs: refactor devtmpfsd()
  ...
2020-08-07 09:40:34 -07:00
Linus Torvalds
19b39c38ab Merge branch 'work.regset' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull ptrace regset updates from Al Viro:
 "Internal regset API changes:

   - regularize copy_regset_{to,from}_user() callers

   - switch to saner calling conventions for ->get()

   - kill user_regset_copyout()

  The ->put() side of things will have to wait for the next cycle,
  unfortunately.

  The balance is about -1KLoC and replacements for ->get() instances are
  a lot saner"

* 'work.regset' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits)
  regset: kill user_regset_copyout{,_zero}()
  regset(): kill ->get_size()
  regset: kill ->get()
  csky: switch to ->regset_get()
  xtensa: switch to ->regset_get()
  parisc: switch to ->regset_get()
  nds32: switch to ->regset_get()
  nios2: switch to ->regset_get()
  hexagon: switch to ->regset_get()
  h8300: switch to ->regset_get()
  openrisc: switch to ->regset_get()
  riscv: switch to ->regset_get()
  c6x: switch to ->regset_get()
  ia64: switch to ->regset_get()
  arc: switch to ->regset_get()
  arm: switch to ->regset_get()
  sh: convert to ->regset_get()
  arm64: switch to ->regset_get()
  mips: switch to ->regset_get()
  sparc: switch to ->regset_get()
  ...
2020-08-07 09:29:25 -07:00
Joerg Roedel
995909a4e2 x86/mm/64: Do not dereference non-present PGD entries
The code for preallocate_vmalloc_pages() was written under the
assumption that the p4d_offset() and pud_offset() functions will perform
present checks before dereferencing the parent entries.

This assumption is wrong an leads to a bug in the code which causes the
physical address found in the PGD be used as a page-table page, even if
the PGD is not present.

So the code flow currently is:

	pgd = pgd_offset_k(addr);
	p4d = p4d_offset(pgd, addr);
	if (p4d_none(*p4d))
		p4d = p4d_alloc(&init_mm, pgd, addr);

This lacks a check for pgd_none() at least, the correct flow would be:

	pgd = pgd_offset_k(addr);
	if (pgd_none(*pgd))
		p4d = p4d_alloc(&init_mm, pgd, addr);
	else
		p4d = p4d_offset(pgd, addr);

But this is the same flow that the p4d_alloc() and the pud_alloc()
functions use internally, so there is no need to duplicate them.

Remove the p?d_none() checks from the function and just call into
p4d_alloc() and pud_alloc() to correctly pre-allocate the PGD entries.

Reported-and-tested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Fixes: 6eb82f994026 ("x86/mm: Pre-allocate P4D/PUD pages for vmalloc area")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 09:20:57 -07:00
Pawan Gupta
f29dfa53cc x86/bugs/multihit: Fix mitigation reporting when VMX is not in use
On systems that have virtualization disabled or unsupported, sysfs
mitigation for X86_BUG_ITLB_MULTIHIT is reported incorrectly as:

  $ cat /sys/devices/system/cpu/vulnerabilities/itlb_multihit
  KVM: Vulnerable

System is not vulnerable to DoS attack from a rogue guest when
virtualization is disabled or unsupported in the hardware. Change the
mitigation reporting for these cases.

Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
Reported-by: Nelson Dsouza <nelson.dsouza@linux.intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/0ba029932a816179b9d14a30db38f0f11ef1f166.1594925782.git.pawan.kumar.gupta@linux.intel.com
2020-08-07 01:32:00 +02:00
Kan Liang
76d10256a9 x86/fpu/xstate: Fix an xstate size check warning with architectural LBRs
An xstate size check warning is triggered on machines which support
Architectural LBRs.

    XSAVE consistency problem, dumping leaves
    WARNING: CPU: 0 PID: 0 at arch/x86/kernel/fpu/xstate.c:649 fpu__init_system_xstate+0x4d4/0xd0e
    Modules linked in:
    CPU: 0 PID: 0 Comm: swapper Not tainted intel-arch_lbr+
    RIP: 0010:fpu__init_system_xstate+0x4d4/0xd0e

The xstate size check routine, init_xstate_size(), compares the size
retrieved from the hardware with the size of task->fpu, which is
calculated by the software.

The size from the hardware is the total size of the enabled xstates in
XCR0 | IA32_XSS. Architectural LBR state is a dynamic supervisor
feature, which sets the corresponding bit in the IA32_XSS at boot time.
The size from the hardware includes the size of the Architectural LBR
state.

However, a dynamic supervisor feature doesn't allocate a buffer in the
task->fpu. The size of task->fpu doesn't include the size of the
Architectural LBR state. The mismatch will trigger the warning.

Three options as below were considered to fix the issue:

- Correct the size from the hardware by subtracting the size of the
  dynamic supervisor features.
  The purpose of the check is to compare the size CPU told with the size
  of the XSAVE buffer, which is calculated by the software. If the
  software mucks with the number from hardware, it removes the value of
  the check.
  This option is not a good option.

- Prevent the hardware from counting the size of the dynamic supervisor
  feature by temporarily removing the corresponding bits in IA32_XSS.
  Two extra MSR writes are required to flip the IA32_XSS. The option is
  not pretty, but it is workable. The check is only called once at early
  boot time. The synchronization or context-switching doesn't need to be
  worried.
  This option is implemented here.

- Remove the check entirely, because the check hasn't found any real
  problems. The option may be an alternative as option 2.
  This option is not implemented here.

Add a new function, get_xsaves_size_no_dynamic(), which retrieves the
total size without the dynamic supervisor features from the hardware.
The size will be used to compare with the size of task->fpu.

Fixes: f0dccc9da4c0 ("x86/fpu/xstate: Support dynamic supervisor feature for LBR")
Reported-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lore.kernel.org/r/1595253051-75374-1-git-send-email-kan.liang@linux.intel.com
2020-08-07 01:32:00 +02:00
Pingfan Liu
52416ffcf8 x86/purgatory: Don't generate debug info for purgatory.ro
Purgatory.ro is a standalone binary that is not linked against the rest of
the kernel.  Its image is copied into an array that is linked to the
kernel, and from there kexec relocates it wherever it desires.

Unlike the debug info for vmlinux, which can be used for analyzing crash
such info is useless in purgatory.ro. And discarding them can save about
200K space.

 Original:
   259080  kexec-purgatory.o
 Stripped debug info:
    29152  kexec-purgatory.o

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Acked-by: Dave Young <dyoung@redhat.com>
Link: https://lore.kernel.org/r/1596433788-3784-1-git-send-email-kernelfans@gmail.com
2020-08-07 01:32:00 +02:00
Dilip Kota
7d98585860 x86/tsr: Fix tsc frequency enumeration bug on Lightning Mountain SoC
Frequency descriptor of Lightning Mountain SoC doesn't have all the
frequency entries so resulting in the below failure causing a kernel hang:

    Error MSR_FSB_FREQ index 15 is unknown
    tsc: Fast TSC calibration failed

So, add all the frequency entries in the Lightning Mountain SoC frequency
descriptor.

Fixes: 0cc5359d8fd45 ("x86/cpu: Update init data for new Airmont CPU model")
Fixes: 812c2d7506fd ("x86/tsc_msr: Use named struct initializers")
Signed-off-by: Dilip Kota <eswara.kota@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/211c643ae217604b46cbec43a2c0423946dc7d2d.1596440057.git.eswara.kota@linux.intel.com
2020-08-07 01:32:00 +02:00
Lianbo Jiang
a3e1c3bb24 x86/crash: Correct the address boundary of function parameters
Let's carefully handle the boundary of the function parameter to make
sure that the arguments passed doesn't exceed the address range.

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Young <dyoung@redhat.com>
Link: https://lore.kernel.org/r/20200804044933.1973-2-lijiang@redhat.com
2020-08-07 01:32:00 +02:00
Shuo Liu
4c7bfa383e x86/acrn: Remove redundant chars from ACRN signature
hypervisor_cpuid_base() only handles 12 chars of the hypervisor
signature string but is provided with 14 chars.

Remove the redundancy. Additionally, replace the user space uint32_t
with preferred kernel type u32.

Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20200806114111.9448-1-shuo.a.liu@intel.com
2020-08-07 01:32:00 +02:00
Shuo Liu
86d709ce30 x86/acrn: Allow ACRN guest to use X2APIC mode
The ACRN Hypervisor did not support x2APIC and thus x2APIC support was
disabled by always returning false when VM checked for x2APIC support.

ACRN received full support of x2APIC and exports the capability through
CPUID feature bits.

Let VM decide if it needs to switch to x2APIC mode according to CPUID
features.

Originally-by: Yakui Zhao <yakui.zhao@intel.com>
Signed-off-by: Shuo Liu <shuo.a.liu@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20200806113802.9325-1-shuo.a.liu@intel.com
2020-08-07 01:32:00 +02:00
Linus Torvalds
921d2597ab s390: implement diag318
x86:
 * Report last CPU for debugging
 * Emulate smaller MAXPHYADDR in the guest than in the host
 * .noinstr and tracing fixes from Thomas
 * nested SVM page table switching optimization and fixes
 
 Generic:
 * Unify shadow MMU cache data structures across architectures
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl8pC+oUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNcOwgAjomqtEqQNlp7DdZT7VyyklzbxX1/
 ud7v+oOJ8K4sFlf64lSthjPo3N9rzZCcw+yOXmuyuITngXOGc3tzIwXpCzpLtuQ1
 WO1Ql3B/2dCi3lP5OMmsO1UAZqy9pKLg1dfeYUPk48P5+p7d/NPmk+Em5kIYzKm5
 JsaHfCp2EEXomwmljNJ8PQ1vTjIQSSzlgYUBZxmCkaaX7zbEUMtxAQCStHmt8B84
 33LczwXBm3viSWrzsoBV37I70+tseugiSGsCfUyupXOvq55d6D9FCqtCb45Hn4Vh
 Ik8ggKdalsk/reiGEwNw1/3nr6mRMkHSbl+Mhc4waOIFf9dn0urgQgOaDg==
 =YVx0
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "s390:
   - implement diag318

  x86:
   - Report last CPU for debugging
   - Emulate smaller MAXPHYADDR in the guest than in the host
   - .noinstr and tracing fixes from Thomas
   - nested SVM page table switching optimization and fixes

  Generic:
   - Unify shadow MMU cache data structures across architectures"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
  KVM: SVM: Fix sev_pin_memory() error handling
  KVM: LAPIC: Set the TDCR settable bits
  KVM: x86: Specify max TDP level via kvm_configure_mmu()
  KVM: x86/mmu: Rename max_page_level to max_huge_page_level
  KVM: x86: Dynamically calculate TDP level from max level and MAXPHYADDR
  KVM: VXM: Remove temporary WARN on expected vs. actual EPTP level mismatch
  KVM: x86: Pull the PGD's level from the MMU instead of recalculating it
  KVM: VMX: Make vmx_load_mmu_pgd() static
  KVM: x86/mmu: Add separate helper for shadow NPT root page role calc
  KVM: VMX: Drop a duplicate declaration of construct_eptp()
  KVM: nSVM: Correctly set the shadow NPT root level in its MMU role
  KVM: Using macros instead of magic values
  MIPS: KVM: Fix build error caused by 'kvm_run' cleanup
  KVM: nSVM: remove nonsensical EXITINFO1 adjustment on nested NPF
  KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support
  KVM: VMX: optimize #PF injection when MAXPHYADDR does not match
  KVM: VMX: Add guest physical address check in EPT violation and misconfig
  KVM: VMX: introduce vmx_need_pf_intercept
  KVM: x86: update exception bitmap on CPUID changes
  KVM: x86: rename update_bp_intercept to update_exception_bitmap
  ...
2020-08-06 12:59:31 -07:00
Linus Torvalds
7b4ea9456d Revert "x86/mm/64: Do not sync vmalloc/ioremap mappings"
This reverts commit 8bb9bf242d1fee925636353807c511d54fde8986.

It seems the vmalloc page tables aren't always preallocated in all
situations, because Jason Donenfeld reports an oops with this commit:

  BUG: unable to handle page fault for address: ffffe8ffffd00608
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT SMP
  CPU: 2 PID: 22 Comm: kworker/2:0 Not tainted 5.8.0+ #154
  RIP: process_one_work+0x2c/0x2d0
  Code: 41 56 41 55 41 54 55 48 89 f5 53 48 89 fb 48 83 ec 08 48 8b 06 4c 8b 67 40 49 89 c6 45 30 f6 a8 04 b8 00 00 00 00 4c 0f 44 f0 <49> 8b 46 08 44 8b a8 00 01 05
  Call Trace:
   worker_thread+0x4b/0x3b0
   ? rescuer_thread+0x360/0x360
   kthread+0x116/0x140
   ? __kthread_create_worker+0x110/0x110
   ret_from_fork+0x1f/0x30
  CR2: ffffe8ffffd00608

and that page fault address is right in that vmalloc space, and we
clearly don't have a PGD/P4D entry for it.

Looking at the "Code:" line, the actual fault seems to come from the
'pwq->wq' dereference at the top of the process_one_work() function:

        struct pool_workqueue *pwq = get_work_pwq(work);
        struct worker_pool *pool = worker->pool;
        bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;

so 'struct pool_workqueue *pwq' is the allocation that hasn't been
synchronized across CPUs.

Just revert for now, while Joerg figures out the cause.

Reported-and-bisected-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-06 12:02:58 -07:00
Arvind Sankar
76167e5c54 x86/kaslr: Replace strlen() with strnlen()
strnlen is safer in case the command line is not NUL-terminated.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200803011534.730645-2-nivedita@alum.mit.edu
2020-08-06 17:03:19 +02:00
Thomas Gleixner
0099808553 x86: Select POSIX_CPU_TIMERS_TASK_WORK
Move POSIX CPU timer expiry and signal delivery into task context.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20200730102337.888613724@linutronix.de
2020-08-06 16:50:59 +02:00
Peter Zijlstra
0cd39f4600 locking/seqlock, headers: Untangle the spaghetti monster
By using lockdep_assert_*() from seqlock.h, the spaghetti monster
attacked.

Attack back by reducing seqlock.h dependencies from two key high level headers:

 - <linux/seqlock.h>:               -Remove <linux/ww_mutex.h>
 - <linux/time.h>:                  -Remove <linux/seqlock.h>
 - <linux/sched.h>:                 +Add    <linux/seqlock.h>

The price was to add it to sched.h ...

Core header fallout, we add direct header dependencies instead of gaining them
parasitically from higher level headers:

 - <linux/dynamic_queue_limits.h>:  +Add <asm/bug.h>
 - <linux/hrtimer.h>:               +Add <linux/seqlock.h>
 - <linux/ktime.h>:                 +Add <asm/bug.h>
 - <linux/lockdep.h>:               +Add <linux/smp.h>
 - <linux/sched.h>:                 +Add <linux/seqlock.h>
 - <linux/videodev2.h>:             +Add <linux/kernel.h>

Arch headers fallout:

 - PARISC: <asm/timex.h>:           +Add <asm/special_insns.h>
 - SH:     <asm/io.h>:              +Add <asm/page.h>
 - SPARC:  <asm/timer_64.h>:        +Add <uapi/asm/asi.h>
 - SPARC:  <asm/vvar.h>:            +Add <asm/processor.h>, <asm/barrier.h>
                                    -Remove <linux/seqlock.h>
 - X86:    <asm/fixmap.h>:          +Add <asm/pgtable_types.h>
                                    -Remove <asm/acpi.h>

There's also a bunch of parasitic header dependency fallout in .c files, not listed
separately.

[ mingo: Extended the changelog, split up & fixed the original patch. ]

Co-developed-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200804133438.GK2674@hirez.programming.kicks-ass.net
2020-08-06 16:13:13 +02:00
Ingo Molnar
13c01139b1 x86/headers: Remove APIC headers from <asm/smp.h>
The APIC headers are relatively complex and bring in additional
header dependencies - while smp.h is a relatively simple header
included from high level headers.

Remove the dependency and add in the missing #include's in .c
files where they gained it indirectly before.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-08-06 16:13:09 +02:00
Thomas Gleixner
4c5a116ada vdso/treewide: Add vdso_data pointer argument to __arch_get_hw_counter()
MIPS already uses and S390 will need the vdso data pointer in
__arch_get_hw_counter().

This works nicely as long as the architecture does not support time
namespaces in the VDSO. With time namespaces enabled the regular
accessor to the vdso data pointer __arch_get_vdso_data() will return the
namespace specific VDSO data page for tasks which are part of a
non-root time namespace. This would cause the architectures which need
the vdso data pointer in __arch_get_hw_counter() to access the wrong
vdso data page.

Add a vdso_data pointer argument to __arch_get_hw_counter() and hand it in
from the call sites in the core code. For architectures which do not need
the data pointer in their counter accessor function the compiler will just
optimize it out.

Fix up all existing architecture implementations and make MIPS utilize the
pointer instead of invoking the accessor function.

No functional change and no change in the resulting object code (except
MIPS).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/draft-87wo2ekuzn.fsf@nanos.tec.linutronix.de
2020-08-06 10:57:30 +02:00
Ingo Molnar
a703f3633f Merge branch 'WIP.locking/seqlocks' into locking/urgent
Pick up the full seqlock series PeterZ is working on.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-08-06 10:16:38 +02:00
Linus Torvalds
47ec5303d7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:

 1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan.

 2) Support UDP segmentation in code TSO code, from Eric Dumazet.

 3) Allow flashing different flash images in cxgb4 driver, from Vishal
    Kulkarni.

 4) Add drop frames counter and flow status to tc flower offloading,
    from Po Liu.

 5) Support n-tuple filters in cxgb4, from Vishal Kulkarni.

 6) Various new indirect call avoidance, from Eric Dumazet and Brian
    Vazquez.

 7) Fix BPF verifier failures on 32-bit pointer arithmetic, from
    Yonghong Song.

 8) Support querying and setting hardware address of a port function via
    devlink, use this in mlx5, from Parav Pandit.

 9) Support hw ipsec offload on bonding slaves, from Jarod Wilson.

10) Switch qca8k driver over to phylink, from Jonathan McDowell.

11) In bpftool, show list of processes holding BPF FD references to
    maps, programs, links, and btf objects. From Andrii Nakryiko.

12) Several conversions over to generic power management, from Vaibhav
    Gupta.

13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry
    Yakunin.

14) Various https url conversions, from Alexander A. Klimov.

15) Timestamping and PHC support for mscc PHY driver, from Antoine
    Tenart.

16) Support bpf iterating over tcp and udp sockets, from Yonghong Song.

17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov.

18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan.

19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several
    drivers. From Luc Van Oostenryck.

20) XDP support for xen-netfront, from Denis Kirjanov.

21) Support receive buffer autotuning in MPTCP, from Florian Westphal.

22) Support EF100 chip in sfc driver, from Edward Cree.

23) Add XDP support to mvpp2 driver, from Matteo Croce.

24) Support MPTCP in sock_diag, from Paolo Abeni.

25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic
    infrastructure, from Jakub Kicinski.

26) Several pci_ --> dma_ API conversions, from Christophe JAILLET.

27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel.

28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki.

29) Refactor a lot of networking socket option handling code in order to
    avoid set_fs() calls, from Christoph Hellwig.

30) Add rfc4884 support to icmp code, from Willem de Bruijn.

31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei.

32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin.

33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin.

34) Support TCP syncookies in MPTCP, from Flowian Westphal.

35) Fix several tricky cases of PMTU handling wrt. briding, from Stefano
    Brivio.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits)
  net: thunderx: initialize VF's mailbox mutex before first usage
  usb: hso: remove bogus check for EINPROGRESS
  usb: hso: no complaint about kmalloc failure
  hso: fix bailout in error case of probe
  ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM
  selftests/net: relax cpu affinity requirement in msg_zerocopy test
  mptcp: be careful on subflow creation
  selftests: rtnetlink: make kci_test_encap() return sub-test result
  selftests: rtnetlink: correct the final return value for the test
  net: dsa: sja1105: use detected device id instead of DT one on mismatch
  tipc: set ub->ifindex for local ipv6 address
  ipv6: add ipv6_dev_find()
  net: openvswitch: silence suspicious RCU usage warning
  Revert "vxlan: fix tos value before xmit"
  ptp: only allow phase values lower than 1 period
  farsync: switch from 'pci_' to 'dma_' API
  wan: wanxl: switch from 'pci_' to 'dma_' API
  hv_netvsc: do not use VF device if link is down
  dpaa2-eth: Fix passing zero to 'PTR_ERR' warning
  net: macb: Properly handle phylink on at91sam9x
  ...
2020-08-05 20:13:21 -07:00
Linus Torvalds
8186749621 drm next for 5.9-rc1
core:
 - add user def flag to cmd line modes
 - dma_fence_wait added might_sleep
 - dma-fence lockdep annotations
 - indefinite fences are bad documentation
 - gem CMA functions used in more drivers
 - struct mutex removal
 - more drm_ debug macro usage
 - set/drop master api fixes
 - fix for drm/mm hole size comparison
 - drm/mm remove invalid entry optimization
 - optimise drm/mm hole handling
 - VRR debugfs added
 - uncompressed AFBC modifier support
 - multiple display id blocks in EDID
 - multiple driver sg handling fixes
 - __drm_atomic_helper_crtc_reset in all drivers
 - managed vram helpers
 
 ttm:
 - ttm_mem_reg handling cleanup
 - remove bo offset field
 - drop CMA memtype flag
 - drop mappable flag
 
 xilinx:
 - New Xilinx ZynqMP DisplayPort Subsystem driver
 
 nouveau:
 - add CRC support
 - start using NVIDIA published class header files
 - convert all push buffer emission to new macros
 - Proper push buffer space management for EVO/NVD channels.
 - firmware loading fixes
 - 2MiB system memory pages support on Pascal and newer
 
 vkms:
 - larget cursor support
 
 i915:
 - Rocketlake platform enablement
 - Early DG1 enablement
 - Numerous GEM refactorings
 - DP MST fixes
 - FBC, PSR, Cursor, Color, Gamma fixes
 - TGL, RKL, EHL workaround updates
 - TGL 8K display support fixes
 - SDVO/HDMI/DVI fixes
 
 amdgpu:
 - Initial support for Sienna Cichlid GPU
 - Initial support for Navy Flounder GPU
 - SI UVD/VCE support
 - expose rotation property
 - Add support for unique id on Arcturus
 - Enable runtime PM on vega10 boards that support BACO
 - Skip BAR resizing if the bios already did id
 - Major swSMU code cleanup
 - Fixes for DCN bandwidth calculations
 
 amdkfd:
 - Track SDMA usage per process
 - SMI events interface
 
 radeon:
 - Default to on chip GART for AGP boards on all arches
 - Runtime PM reference count fixes
 
 msm:
 - headers regenerated causing churn
 - a650/a640 display and GPU enablement
 - dpu dither support for 6bpc panels
 - dpu cursor fix
 - dsi/mdp5 enablement for sdm630/sdm636/sdm66
 
 tegra:
 - video capture prep support
 - reflection support
 
 mediatek:
 - convert mtk_dsi to bridge API
 
 meson:
 - FBC support
 
 sun4i:
 - iommu support
 
 rockchip:
 - register locking fix
 - per-pixel alpha support PX30 VOP
 
 -
 mgag200:
 - ported to simple and shmem helpers
 - device init cleanups
 - use managed pci functions
 - dropped hw cursor support
 
 ast:
 - use managed pci functions
 - use managed VRAM helpers
 - rework cursor support
 
 malidp:
 - dev_groups support
 
 hibmc:
 - refactor hibmc_drv_vdac:
 
 vc4:
 - create TXP CRTC
 
 imx:
 - error path fixes and cleanups
 
 etnaviv:
 - clock handling and error handling cleanups
 - use pin_user_pages
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJfK1atAAoJEAx081l5xIa+vDkQAJvl/mjbEA7fDy8Ysa0cgPLI
 8nI4Bo/MaxkyRfUcP8+f/n3QQrRME37C0xa/Mn6SG1oFAdlovPwDqmDr5kjhkrMI
 geo8oJb2Q+AsrJr+ejpuF+iq0FxWi64bLbwJFJ2nBet+lHTMzoPceWeq3gG1Vvfl
 h6PV4B/9TjrnbhcKLIQSEmJ0kZp9uMkDBF/iynVn4+AKAkG1rQNjigdTH48IFPoz
 28KuqG0B4NWu648zYXhjsN0kD3Dxjv3YOH+FsoWQpQa9icCTySYbySsQ7l0/XvA3
 4BPtP3rWMhU46FHTBkWF72WQR4F0B4wm5DJJKMeG4vb1mXakOqAKcAq7JAbka+wL
 PBIiU+AcAKRSiwHmYDuDWtDoSpvYncuo0p3IvNP5hhih+7hqCnLIULDWS+V8AUzW
 39usS/DXsVKk/HGYIYC89cRwsqWYD4c7edzOBdPQxW4LNYCD2gXezLJ5TeeR2lih
 y9JIVnPiluWleOovs4W3BoZNRuLc1rHBO6COToXjlme/48Z+sRHBAoge6UZurqRN
 jr+e60cS7n/DOeJQuNf4UHZnK48Pc24+3kVfMHlX+OKn8VuKPGr+USkeHV/NYL/B
 USiKCAxkkZM0dxerSb1/Ra9kGnchf0QBpA6Fsem8kV61Z4GVc+K6xJWg7KXB6n/3
 7ZyalUKLwlOCz9sYsCCe
 =Yvtd
 -----END PGP SIGNATURE-----

Merge tag 'drm-next-2020-08-06' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Dave Airlie:
 "New xilinx displayport driver, AMD support for two new GPUs (more
  header files), i915 initial support for RocketLake and some work on
  their DG1 (discrete chip).

  The core also grew some lockdep annotations to try and constrain what
  drivers do with dma-fences, and added some documentation on why the
  idea of indefinite fences doesn't work.

  The long list is below.

  I do have some fixes trees outstanding, but I'll follow up with those
  later.

  core:
   - add user def flag to cmd line modes
   - dma_fence_wait added might_sleep
   - dma-fence lockdep annotations
   - indefinite fences are bad documentation
   - gem CMA functions used in more drivers
   - struct mutex removal
   - more drm_ debug macro usage
   - set/drop master api fixes
   - fix for drm/mm hole size comparison
   - drm/mm remove invalid entry optimization
   - optimise drm/mm hole handling
   - VRR debugfs added
   - uncompressed AFBC modifier support
   - multiple display id blocks in EDID
   - multiple driver sg handling fixes
   - __drm_atomic_helper_crtc_reset in all drivers
   - managed vram helpers

  ttm:
   - ttm_mem_reg handling cleanup
   - remove bo offset field
   - drop CMA memtype flag
   - drop mappable flag

  xilinx:
   - New Xilinx ZynqMP DisplayPort Subsystem driver

  nouveau:
   - add CRC support
   - start using NVIDIA published class header files
   - convert all push buffer emission to new macros
   - Proper push buffer space management for EVO/NVD channels.
   - firmware loading fixes
   - 2MiB system memory pages support on Pascal and newer

  vkms:
   - larger cursor support

  i915:
   - Rocketlake platform enablement
   - Early DG1 enablement
   - Numerous GEM refactorings
   - DP MST fixes
   - FBC, PSR, Cursor, Color, Gamma fixes
   - TGL, RKL, EHL workaround updates
   - TGL 8K display support fixes
   - SDVO/HDMI/DVI fixes

  amdgpu:
   - Initial support for Sienna Cichlid GPU
   - Initial support for Navy Flounder GPU
   - SI UVD/VCE support
   - expose rotation property
   - Add support for unique id on Arcturus
   - Enable runtime PM on vega10 boards that support BACO
   - Skip BAR resizing if the bios already did id
   - Major swSMU code cleanup
   - Fixes for DCN bandwidth calculations

  amdkfd:
   - Track SDMA usage per process
   - SMI events interface

  radeon:
   - Default to on chip GART for AGP boards on all arches
   - Runtime PM reference count fixes

  msm:
   - headers regenerated causing churn
   - a650/a640 display and GPU enablement
   - dpu dither support for 6bpc panels
   - dpu cursor fix
   - dsi/mdp5 enablement for sdm630/sdm636/sdm66

  tegra:
   - video capture prep support
   - reflection support

  mediatek:
   - convert mtk_dsi to bridge API

  meson:
   - FBC support

  sun4i:
   - iommu support

  rockchip:
   - register locking fix
   - per-pixel alpha support PX30 VOP

  mgag200:
   - ported to simple and shmem helpers
   - device init cleanups
   - use managed pci functions
   - dropped hw cursor support

  ast:
   - use managed pci functions
   - use managed VRAM helpers
   - rework cursor support

  malidp:
   - dev_groups support

  hibmc:
   - refactor hibmc_drv_vdac:

  vc4:
   - create TXP CRTC

  imx:
   - error path fixes and cleanups

  etnaviv:
   - clock handling and error handling cleanups
   - use pin_user_pages"

* tag 'drm-next-2020-08-06' of git://anongit.freedesktop.org/drm/drm: (1747 commits)
  drm/msm: use kthread_create_worker instead of kthread_run
  drm/msm/mdp5: Add MDP5 configuration for SDM636/660
  drm/msm/dsi: Add DSI configuration for SDM660
  drm/msm/mdp5: Add MDP5 configuration for SDM630
  drm/msm/dsi: Add phy configuration for SDM630/636/660
  drm/msm/a6xx: add A640/A650 hwcg
  drm/msm/a6xx: hwcg tables in gpulist
  drm/msm/dpu: add SM8250 to hw catalog
  drm/msm/dpu: add SM8150 to hw catalog
  drm/msm/dpu: intf timing path for displayport
  drm/msm/dpu: set missing flush bits for INTF_2 and INTF_3
  drm/msm/dpu: don't use INTF_INPUT_CTRL feature on sdm845
  drm/msm/dpu: move some sspp caps to dpu_caps
  drm/msm/dpu: update UBWC config for sm8150 and sm8250
  drm/msm/dpu: use right setup_blend_config for sm8150 and sm8250
  drm/msm/a6xx: set ubwc config for A640 and A650
  drm/msm/adreno: un-open-code some packets
  drm/msm: sync generated headers
  drm/msm/a6xx: add build_bw_table for A640/A650
  drm/msm/a6xx: fix crashstate capture for A650
  ...
2020-08-05 19:50:06 -07:00
Bjorn Helgaas
6585a1a14e Merge branch 'pci/virtualization'
- Remove redundant variable init in xen (Colin Ian King)

- Add pci_pri_supported() to check device or associated PF for PRI support
  (Ashok Raj)

- Mark AMD Navi10 GPU rev 0x00 ATS as broken (Kai-Heng Feng)

- Release IVRS table in AMD ACS quirk (Hanjun Guo)

* pci/virtualization:
  PCI: Release IVRS table in AMD ACS quirk
  PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken
  PCI/ATS: Add pci_pri_supported() to check device or associated PF
  xen: Remove redundant initialization of irq
2020-08-05 18:24:17 -05:00
Zhu Lingshan
2edd9cb79f kvm: detect assigned device via irqbypass manager
vDPA devices has dedicated backed hardware like
passthrough-ed devices. Then it is possible to setup irq
offloading to vCPU for vDPA devices. Thus this patch tries to
manipulated assigned device counters by
kvm_arch_start/end_assignment() in irqbypass manager, so that
assigned devices could be detected in update_pi_irte()

We will increase/decrease the assigned device counter in kvm/x86.
Both vDPA and VFIO would go through this code path.

Only X86 uses these counters and kvm_arch_start/end_assignment(),
so this code path only affect x86 for now.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20200731065533.4144-3-lingshan.zhu@intel.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-08-05 11:08:42 -04:00
Linus Torvalds
2324d50d05 It's been a busy cycle for documentation - hopefully the busiest for a
while to come.  Changes include:
 
  - Some new Chinese translations
 
  - Progress on the battle against double words words and non-HTTPS URLs
 
  - Some block-mq documentation
 
  - More RST conversions from Mauro.  At this point, that task is
    essentially complete, so we shouldn't see this kind of churn again for a
    while.  Unless we decide to switch to asciidoc or something...:)
 
  - Lots of typo fixes, warning fixes, and more.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl8oVkwPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YoW8H/jJ/xnXFn7tkgVPQAlL3k5HCnK7A5nDP9RVR
 cg1pTx1cEFdjzxPlJyExU6/v+AImOvtweHXC+JDK7YcJ6XFUNYXJI3LxL5KwUXbY
 BL/xRFszDSXH2C7SJF5GECcFYp01e/FWSLN3yWAh+g+XwsKiTJ8q9+CoIDkHfPGO
 7oQsHKFu6s36Af0LfSgxk4sVB7EJbo8e4psuPsP5SUrl+oXRO43Put0rXkR4yJoH
 9oOaB51Do5fZp8I4JVAqGXvpXoExyLMO4yw0mASm6YSZ3KyjR8Fae+HD9Cq4ZuwY
 0uzb9K+9NEhqbfwtyBsi99S64/6Zo/MonwKwevZuhtsDTK4l4iU=
 =JQLZ
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.9' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It's been a busy cycle for documentation - hopefully the busiest for a
  while to come. Changes include:

   - Some new Chinese translations

   - Progress on the battle against double words words and non-HTTPS
     URLs

   - Some block-mq documentation

   - More RST conversions from Mauro. At this point, that task is
     essentially complete, so we shouldn't see this kind of churn again
     for a while. Unless we decide to switch to asciidoc or
     something...:)

   - Lots of typo fixes, warning fixes, and more"

* tag 'docs-5.9' of git://git.lwn.net/linux: (195 commits)
  scripts/kernel-doc: optionally treat warnings as errors
  docs: ia64: correct typo
  mailmap: add entry for <alobakin@marvell.com>
  doc/zh_CN: add cpu-load Chinese version
  Documentation/admin-guide: tainted-kernels: fix spelling mistake
  MAINTAINERS: adjust kprobes.rst entry to new location
  devices.txt: document rfkill allocation
  PCI: correct flag name
  docs: filesystems: vfs: correct flag name
  docs: filesystems: vfs: correct sync_mode flag names
  docs: path-lookup: markup fixes for emphasis
  docs: path-lookup: more markup fixes
  docs: path-lookup: fix HTML entity mojibake
  CREDITS: Replace HTTP links with HTTPS ones
  docs: process: Add an example for creating a fixes tag
  doc/zh_CN: add Chinese translation prefer section
  doc/zh_CN: add clearing-warn-once Chinese version
  doc/zh_CN: add admin-guide index
  doc:it_IT: process: coding-style.rst: Correct __maybe_unused compiler label
  futex: MAINTAINERS: Re-add selftests directory
  ...
2020-08-04 22:47:54 -07:00
Linus Torvalds
4da9f33026 Support for FSGSBASE. Almost 5 years after the first RFC to support it,
this has been brought into a shape which is maintainable and actually
 works.
 
 This final version was done by Sasha Levin who took it up after Intel
 dropped the ball. Sasha discovered that the SGX (sic!) offerings out there
 ship rogue kernel modules enabling FSGSBASE behind the kernels back which
 opens an instantanious unpriviledged root hole.
 
 The FSGSBASE instructions provide a considerable speedup of the context
 switch path and enable user space to write GSBASE without kernel
 interaction. This enablement requires careful handling of the exception
 entries which go through the paranoid entry path as they cannot longer rely
 on the assumption that user GSBASE is positive (as enforced via prctl() on
 non FSGSBASE enabled systemn). All other entries (syscalls, interrupts and
 exceptions) can still just utilize SWAPGS unconditionally when the entry
 comes from user space. Converting these entries to use FSGSBASE has no
 benefit as SWAPGS is only marginally slower than WRGSBASE and locating and
 retrieving the kernel GSBASE value is not a free operation either. The real
 benefit of RD/WRGSBASE is the avoidance of the MSR reads and writes.
 
 The changes come with appropriate selftests and have held up in field
 testing against the (sanitized) Graphene-SGX driver.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8pGnoTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoTYJD/9873GkwvGcc/Vq/dJH1szGTgFftPyZ
 c/Y9gzx7EGBPLo25BS820L+ZlynzXHDxExKfCEaD10TZfe5XIc1vYNR0J74M2NmK
 IBgEDstJeW93ai+rHCFRXIevhpzU4GgGYJ1MeeOgbVMN3aGU1g6HfzMvtF0fPn8Y
 n6fsLZa43wgnoTdjwjjikpDTrzoZbaL1mbODBzBVPAaTbim7IKKTge6r/iCKrOjz
 Uixvm3g9lVzx52zidJ9kWa8esmbOM1j0EPe7/hy3qH9DFo87KxEzjHNH3T6gY5t6
 NJhRAIfY+YyTHpPCUCshj6IkRudE6w/qjEAmKP9kWZxoJrvPCTWOhCzelwsFS9b9
 gxEYfsnaKhsfNhB6fi0PtWlMzPINmEA7SuPza33u5WtQUK7s1iNlgHfvMbjstbwg
 MSETn4SG2/ZyzUrSC06lVwV8kh0RgM3cENc/jpFfIHD0vKGI3qfka/1RY94kcOCG
 AeJd0YRSU2RqL7lmxhHyG8tdb8eexns41IzbPCLXX2sF00eKNkVvMRYT2mKfKLFF
 q8v1x7yuwmODdXfFR6NdCkGm9IU7wtL6wuQ8Nhu9UraFmcXo6X6FLJC18FqcvSb9
 jvcRP4XY/8pNjjf44JB8yWfah0xGQsaMIKQGP4yLv4j6Xk1xAQKH1MqcC7l1D2HN
 5Z24GibFqSK/vA==
 =QaAN
 -----END PGP SIGNATURE-----

Merge tag 'x86-fsgsbase-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fsgsbase from Thomas Gleixner:
 "Support for FSGSBASE. Almost 5 years after the first RFC to support
  it, this has been brought into a shape which is maintainable and
  actually works.

  This final version was done by Sasha Levin who took it up after Intel
  dropped the ball. Sasha discovered that the SGX (sic!) offerings out
  there ship rogue kernel modules enabling FSGSBASE behind the kernels
  back which opens an instantanious unpriviledged root hole.

  The FSGSBASE instructions provide a considerable speedup of the
  context switch path and enable user space to write GSBASE without
  kernel interaction. This enablement requires careful handling of the
  exception entries which go through the paranoid entry path as they
  can no longer rely on the assumption that user GSBASE is positive (as
  enforced via prctl() on non FSGSBASE enabled systemn).

  All other entries (syscalls, interrupts and exceptions) can still just
  utilize SWAPGS unconditionally when the entry comes from user space.
  Converting these entries to use FSGSBASE has no benefit as SWAPGS is
  only marginally slower than WRGSBASE and locating and retrieving the
  kernel GSBASE value is not a free operation either. The real benefit
  of RD/WRGSBASE is the avoidance of the MSR reads and writes.

  The changes come with appropriate selftests and have held up in field
  testing against the (sanitized) Graphene-SGX driver"

* tag 'x86-fsgsbase-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  x86/fsgsbase: Fix Xen PV support
  x86/ptrace: Fix 32-bit PTRACE_SETREGS vs fsbase and gsbase
  selftests/x86/fsgsbase: Add a missing memory constraint
  selftests/x86/fsgsbase: Fix a comment in the ptrace_write_gsbase test
  selftests/x86: Add a syscall_arg_fault_64 test for negative GSBASE
  selftests/x86/fsgsbase: Test ptracer-induced GS base write with FSGSBASE
  selftests/x86/fsgsbase: Test GS selector on ptracer-induced GS base write
  Documentation/x86/64: Add documentation for GS/FS addressing mode
  x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2
  x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit
  x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit
  x86/entry/64: Introduce the FIND_PERCPU_BASE macro
  x86/entry/64: Switch CR3 before SWAPGS in paranoid entry
  x86/speculation/swapgs: Check FSGSBASE in enabling SWAPGS mitigation
  x86/process/64: Use FSGSBASE instructions on thread copy and ptrace
  x86/process/64: Use FSBSBASE in switch_to() if available
  x86/process/64: Make save_fsgs_for_kvm() ready for FSGSBASE
  x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions
  x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions
  x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
  ...
2020-08-04 21:16:22 -07:00
Linus Torvalds
125cfa0d4d The conversion of X86 syscall, interrupt and exception entry/exit handling
to the generic code. Pretty much a straight forward 1:1 conversion plus the
 consolidation of the KVM handling of pending work before entering guest
 mode.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8pEFgTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYocEwD/474Eb7LzZ8yahyUBirWJP3k3qzgs9j
 dZUxqB6LNuDOstEyTGLPdx1dmQP2vHbFfjoM7YBOH37EGcHsqjGliLvn2Y05ZD7O
 6kYwjz6qVnJcm3IMtfSUn/8LkfO5pGUdKd3U5ngDmPLpkeaQ4nPKqiO0uIb0wzwa
 cO7l10tG4YjMCWQxPNIaOh8kncLieQBediJPFjkQjV+Fh33kSU3LWTl3fccz6b5+
 mgSUFL0qjQpp+Nl7lCaDQQiAop9GTUETfDtximRydZauiM2NpCfz+QBmQzq50Xv1
 G3DWZoBIZBjmWJUgfSmS/s4GOYkBTBnT/fUcZmIDcgdRwvtEvRzIhcP87/wn7P3N
 UKpLdHqmvA0BFDXZbNZgS362++29pj5Lnb+u3QbWSKQ9UqHN0NUlSY4wzfTLXsGp
 Mzpp4TW0u/8kyOlo7wK3lVDgNJaPG31aiNVuDPgLe4cEluO5cq7/7g2GcFBqF1Ly
 SqNGD1IccteNQTNvDopczPy7qUl5Lal+Ia06szNSPR48gLrvhSWdyYr2i1sD7vx4
 hAhR0Hsi9dacGv46TrRw1OdDzq9bOW68G8GIgLJgDXaayPXLnx6TQEUjzQtIkE/i
 ydTPUarp5QOFByt+RBjI90ZcW4RuLgMTOEVONPXtSn8IoCP2Kdg9u3gD9AmUW3Q2
 JFkKMiSiJPGxlw==
 =84y7
 -----END PGP SIGNATURE-----

Merge tag 'x86-entry-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 conversion to generic entry code from Thomas Gleixner:
 "The conversion of X86 syscall, interrupt and exception entry/exit
  handling to the generic code.

  Pretty much a straight-forward 1:1 conversion plus the consolidation
  of the KVM handling of pending work before entering guest mode"

* tag 'x86-entry-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kvm: Use __xfer_to_guest_mode_work_pending() in kvm_run_vcpu()
  x86/kvm: Use generic xfer to guest work function
  x86/entry: Cleanup idtentry_enter/exit
  x86/entry: Use generic interrupt entry/exit code
  x86/entry: Cleanup idtentry_entry/exit_user
  x86/entry: Use generic syscall exit functionality
  x86/entry: Use generic syscall entry function
  x86/ptrace: Provide pt_regs helper for entry/exit
  x86/entry: Move user return notifier out of loop
  x86/entry: Consolidate 32/64 bit syscall entry
  x86/entry: Consolidate check_user_regs()
  x86: Correct noinstr qualifiers
  x86/idtentry: Remove stale comment
2020-08-04 21:05:46 -07:00
Linus Torvalds
2ed90dbbf7 dma-mapping updates for 5.9
- make support for dma_ops optional
  - move more code out of line
  - add generic support for a dma_ops bypass mode
  - misc cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl8oGscLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYNfEhAAmFwd6BBHGwAhXUchoIue5vdNnuY3GiBFRzUdz67W
 zRYYgZYiPjl+MwflRmwPcoWEnGzmweRa2s6OnyDostiCRauioa8BuQfGqJasf1yZ
 D36dFNVHGW0o6pRDUQkd688k/4A6szwuwpq83qi4e8X2I9QzAITHtW8izjfPM923
 FlJzxEFggbB2TvwfUXOZhmpuG4Dog8S7VZ1Uz4QAg0Z/5FDqIKAAG2aZMqCXBbiX
 01E8tr0AqU/jn2xpc8O+DJGFiYIRhqhyNxQbH6qz1Q3xGFSokcLYm3YqkqVOgpn1
 DLs2UFDxWkly/F+wGnYtju7OD9VGPywzOcW125/LIsApYN5R/rYrtQzK41eq7Mp5
 HY3tqgNTIMdnl4so7QXeU4Vxj+lUdPlI26NZGszcM5AVftdTX8KjGdS+0+PBza6i
 i7trwG7J5/DnwiBCvEKoul7Ul1psUMTSvYwINTXRqsU4mZXhhx/mwyXbtruELnkj
 3agM98u6hoalLNjd2aueh+NjMZi1r+MchTrfRvTcxJ+yQ5BoR5kF+iz7eT/LtZ72
 AqWwimsPGNkLHUa0TrqWql5tv90cdDkBZzWXVbixwxRfgynWYLE6jugeIy8hwjFf
 GjO5XKbBwnWPjdSzFsVMPeuNpmr7ZjVHHewy2Q/jWQAIOyeof0VztEl23LN5yUkx
 pc8=
 =90UK
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-5.9' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - make support for dma_ops optional

 - move more code out of line

 - add generic support for a dma_ops bypass mode

 - misc cleanups

* tag 'dma-mapping-5.9' of git://git.infradead.org/users/hch/dma-mapping:
  dma-contiguous: cleanup dma_alloc_contiguous
  dma-debug: use named initializers for dir2name
  powerpc: use the generic dma_ops_bypass mode
  dma-mapping: add a dma_ops_bypass flag to struct device
  dma-mapping: make support for dma ops optional
  dma-mapping: inline the fast path dma-direct calls
  dma-mapping: move the remaining DMA API calls out of line
2020-08-04 17:29:57 -07:00
Linus Torvalds
4f30a60aa7 close-range-v5.9
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXygcpgAKCRCRxhvAZXjc
 ogPeAQDv1ncqtNroFAC4pJ4tQhH7JSjW0OltiMk/AocY/J2SdQD9GJ15luYJ0/om
 697q/Z68sndRynhdoZlMuf3oYuBlHQw=
 =3ZhE
 -----END PGP SIGNATURE-----

Merge tag 'close-range-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull close_range() implementation from Christian Brauner:
 "This adds the close_range() syscall. It allows to efficiently close a
  range of file descriptors up to all file descriptors of a calling
  task.

  This is coordinated with the FreeBSD folks which have copied our
  version of this syscall and in the meantime have already merged it in
  April 2019:

    https://reviews.freebsd.org/D21627
    https://svnweb.freebsd.org/base?view=revision&revision=359836

  The syscall originally came up in a discussion around the new mount
  API and making new file descriptor types cloexec by default. During
  this discussion, Al suggested the close_range() syscall.

  First, it helps to close all file descriptors of an exec()ing task.
  This can be done safely via (quoting Al's example from [1] verbatim):

        /* that exec is sensitive */
        unshare(CLONE_FILES);
        /* we don't want anything past stderr here */
        close_range(3, ~0U);
        execve(....);

  The code snippet above is one way of working around the problem that
  file descriptors are not cloexec by default. This is aggravated by the
  fact that we can't just switch them over without massively regressing
  userspace. For a whole class of programs having an in-kernel method of
  closing all file descriptors is very helpful (e.g. demons, service
  managers, programming language standard libraries, container managers
  etc.).

  Second, it allows userspace to avoid implementing closing all file
  descriptors by parsing through /proc/<pid>/fd/* and calling close() on
  each file descriptor and other hacks. From looking at various
  large(ish) userspace code bases this or similar patterns are very
  common in service managers, container runtimes, and programming
  language runtimes/standard libraries such as Python or Rust.

  In addition, the syscall will also work for tasks that do not have
  procfs mounted and on kernels that do not have procfs support compiled
  in. In such situations the only way to make sure that all file
  descriptors are closed is to call close() on each file descriptor up
  to UINT_MAX or RLIMIT_NOFILE, OPEN_MAX trickery.

  Based on Linus' suggestion close_range() also comes with a new flag
  CLOSE_RANGE_UNSHARE to more elegantly handle file descriptor dropping
  right before exec. This would usually be expressed in the sequence:

        unshare(CLONE_FILES);
        close_range(3, ~0U);

  as pointed out by Linus it might be desirable to have this be a part
  of close_range() itself under a new flag CLOSE_RANGE_UNSHARE which
  gets especially handy when we're closing all file descriptors above a
  certain threshold.

  Test-suite as always included"

* tag 'close-range-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  tests: add CLOSE_RANGE_UNSHARE tests
  close_range: add CLOSE_RANGE_UNSHARE
  tests: add close_range() tests
  arch: wire-up close_range()
  open: add close_range()
2020-08-04 15:12:02 -07:00
Linus Torvalds
9ba27414f2 fork-v5.9
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXyge/QAKCRCRxhvAZXjc
 oildAQCCWpnTeXm6hrIE3VZ36X5npFtbaEthdBVAUJM7mo0FYwEA8+Wbnubg6jCw
 mztkXCnTfU7tApUdhKtQzcpEws45/Qk=
 =REE/
 -----END PGP SIGNATURE-----

Merge tag 'fork-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull fork cleanups from Christian Brauner:
 "This is cleanup series from when we reworked a chunk of the process
  creation paths in the kernel and switched to struct
  {kernel_}clone_args.

  High-level this does two main things:

   - Remove the double export of both do_fork() and _do_fork() where
     do_fork() used the incosistent legacy clone calling convention.

     Now we only export _do_fork() which is based on struct
     kernel_clone_args.

   - Remove the copy_thread_tls()/copy_thread() split making the
     architecture specific HAVE_COYP_THREAD_TLS config option obsolete.

  This switches all remaining architectures to select
  HAVE_COPY_THREAD_TLS and thus to the copy_thread_tls() calling
  convention. The current split makes the process creation codepaths
  more convoluted than they need to be. Each architecture has their own
  copy_thread() function unless it selects HAVE_COPY_THREAD_TLS then it
  has a copy_thread_tls() function.

  The split is not needed anymore nowadays, all architectures support
  CLONE_SETTLS but quite a few of them never bothered to select
  HAVE_COPY_THREAD_TLS and instead simply continued to use copy_thread()
  and use the old calling convention. Removing this split cleans up the
  process creation codepaths and paves the way for implementing clone3()
  on such architectures since it requires the copy_thread_tls() calling
  convention.

  After having made each architectures support copy_thread_tls() this
  series simply renames that function back to copy_thread(). It also
  switches all architectures that call do_fork() directly over to
  _do_fork() and the struct kernel_clone_args calling convention. This
  is a corollary of switching the architectures that did not yet support
  it over to copy_thread_tls() since do_fork() is conditional on not
  supporting copy_thread_tls() (Mostly because it lacks a separate
  argument for tls which is trivial to fix but there's no need for this
  function to exist.).

  The do_fork() removal is in itself already useful as it allows to to
  remove the export of both do_fork() and _do_fork() we currently have
  in favor of only _do_fork(). This has already been discussed back when
  we added clone3(). The legacy clone() calling convention is - as is
  probably well-known - somewhat odd:

    #
    # ABI hall of shame
    #
    config CLONE_BACKWARDS
    config CLONE_BACKWARDS2
    config CLONE_BACKWARDS3

  that is aggravated by the fact that some architectures such as sparc
  follow the CLONE_BACKWARDSx calling convention but don't really select
  the corresponding config option since they call do_fork() directly.

  So do_fork() enforces a somewhat arbitrary calling convention in the
  first place that doesn't really help the individual architectures that
  deviate from it. They can thus simply be switched to _do_fork()
  enforcing a single calling convention. (I really hope that any new
  architectures will __not__ try to implement their own calling
  conventions...)

  Most architectures already have made a similar switch (m68k comes to
  mind).

  Overall this removes more code than it adds even with a good portion
  of added comments. It simplifies a chunk of arch specific assembly
  either by moving the code into C or by simply rewriting the assembly.

  Architectures that have been touched in non-trivial ways have all been
  actually boot and stress tested: sparc and ia64 have been tested with
  Debian 9 images. They are the two architectures which have been
  touched the most. All non-trivial changes to architectures have seen
  acks from the relevant maintainers. nios2 with a custom built
  buildroot image. h8300 I couldn't get something bootable to test on
  but the changes have been fairly automatic and I'm sure we'll hear
  people yell if I broke something there.

  All other architectures that have been touched in trivial ways have
  been compile tested for each single patch of the series via git rebase
  -x "make ..." v5.8-rc2. arm{64} and x86{_64} have been boot tested
  even though they have just been trivially touched (removal of the
  HAVE_COPY_THREAD_TLS macro from their Kconfig) because well they are
  basically "core architectures" and since it is trivial to get your
  hands on a useable image"

* tag 'fork-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  arch: rename copy_thread_tls() back to copy_thread()
  arch: remove HAVE_COPY_THREAD_TLS
  unicore: switch to copy_thread_tls()
  sh: switch to copy_thread_tls()
  nds32: switch to copy_thread_tls()
  microblaze: switch to copy_thread_tls()
  hexagon: switch to copy_thread_tls()
  c6x: switch to copy_thread_tls()
  alpha: switch to copy_thread_tls()
  fork: remove do_fork()
  h8300: select HAVE_COPY_THREAD_TLS, switch to kernel_clone_args
  nios2: enable HAVE_COPY_THREAD_TLS, switch to kernel_clone_args
  ia64: enable HAVE_COPY_THREAD_TLS, switch to kernel_clone_args
  sparc: unconditionally enable HAVE_COPY_THREAD_TLS
  sparc: share process creation helpers between sparc and sparc64
  sparc64: enable HAVE_COPY_THREAD_TLS
  fork: fold legacy_clone_args_valid() into _do_fork()
2020-08-04 14:47:45 -07:00
Linus Torvalds
0a72761b27 threads-v5.9
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXygcLwAKCRCRxhvAZXjc
 ohajAP4n5E3BmN0jpIviXT4eNhP62jzxJtxlVXtgGT3D8b1mpQEA5n8NSOlQLoAh
 yUGsjtwR9xDcHMcrhXD3yN6eYJSK0A8=
 =tn4R
 -----END PGP SIGNATURE-----

Merge tag 'threads-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull thread updates from Christian Brauner:
 "This contains the changes to add the missing support for attaching to
  time namespaces via pidfds.

  Last cycle setns() was changed to support attaching to multiple
  namespaces atomically. This requires all namespaces to have a point of
  no return where they can't fail anymore.

  Specifically, <namespace-type>_install() is allowed to perform
  permission checks and install the namespace into the new struct nsset
  that it has been given but it is not allowed to make visible changes
  to the affected task. Once <namespace-type>_install() returns,
  anything that the given namespace type additionally requires to be
  setup needs to ideally be done in a function that can't fail or if it
  fails the failure must be non-fatal.

  For time namespaces the relevant functions that fell into this
  category were timens_set_vvar_page() and vdso_join_timens(). The
  latter could still fail although it didn't need to. This function is
  only implemented for vdso_join_timens() in current mainline. As
  discussed on-list (cf. [1]), in order to make setns() support time
  namespaces when attaching to multiple namespaces at once properly we
  changed vdso_join_timens() to always succeed. So vdso_join_timens()
  replaces the mmap_write_lock_killable() with mmap_read_lock().

  Please note that arm is about to grow vdso support for time namespaces
  (possibly this merge window). We've synced on this change and arm64
  also uses mmap_read_lock(), i.e. makes vdso_join_timens() a function
  that can't fail. Once the changes here and the arm64 changes have
  landed, vdso_join_timens() should be turned into a void function so
  it's obvious to callers and implementers on other architectures that
  the expectation is that it can't fail.

  We didn't do this right away because it would've introduced
  unnecessary merge conflicts between the two trees for no major gain.

  As always, tests included"

[1]: https://lore.kernel.org/lkml/20200611110221.pgd3r5qkjrjmfqa2@wittgenstein

* tag 'threads-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  tests: add CLONE_NEWTIME setns tests
  nsproxy: support CLONE_NEWTIME with setns()
  timens: add timens_commit() helper
  timens: make vdso_join_timens() always succeed
2020-08-04 14:40:07 -07:00
Linus Torvalds
3950e97543 Merge branch 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull execve updates from Eric Biederman:
 "During the development of v5.7 I ran into bugs and quality of
  implementation issues related to exec that could not be easily fixed
  because of the way exec is implemented. So I have been diggin into
  exec and cleaning up what I can.

  This cycle I have been looking at different ideas and different
  implementations to see what is possible to improve exec, and cleaning
  the way exec interfaces with in kernel users. Only cleaning up the
  interfaces of exec with rest of the kernel has managed to stabalize
  and make it through review in time for v5.9-rc1 resulting in 2 sets of
  changes this cycle.

   - Implement kernel_execve

   - Make the user mode driver code a better citizen

  With kernel_execve the code size got a little larger as the copying of
  parameters from userspace and copying of parameters from userspace is
  now separate. The good news is kernel threads no longer need to play
  games with set_fs to use exec. Which when combined with the rest of
  Christophs set_fs changes should security bugs with set_fs much more
  difficult"

* 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (23 commits)
  exec: Implement kernel_execve
  exec: Factor bprm_stack_limits out of prepare_arg_pages
  exec: Factor bprm_execve out of do_execve_common
  exec: Move bprm_mm_init into alloc_bprm
  exec: Move initialization of bprm->filename into alloc_bprm
  exec: Factor out alloc_bprm
  exec: Remove unnecessary spaces from binfmts.h
  umd: Stop using split_argv
  umd: Remove exit_umh
  bpfilter: Take advantage of the facilities of struct pid
  exit: Factor thread_group_exited out of pidfd_poll
  umd: Track user space drivers with struct pid
  bpfilter: Move bpfilter_umh back into init data
  exec: Remove do_execve_file
  umh: Stop calling do_execve_file
  umd: Transform fork_usermode_blob into fork_usermode_driver
  umd: Rename umd_info.cmdline umd_info.driver_name
  umd: For clarity rename umh_info umd_info
  umh: Separate the user mode driver and the user mode helper support
  umh: Remove call_usermodehelper_setup_file.
  ...
2020-08-04 14:27:25 -07:00