6372 Commits

Author SHA1 Message Date
Dan Williams
a4574f63ed mm/memremap_pages: convert to 'struct range'
The 'struct resource' in 'struct dev_pagemap' is only used for holding
resource span information.  The other fields, 'name', 'flags', 'desc',
'parent', 'sibling', and 'child' are all unused wasted space.

This is in preparation for introducing a multi-range extension of
devm_memremap_pages().

The bulk of this change is unwinding all the places internal to libnvdimm
that used 'struct resource' unnecessarily, and replacing instances of
'struct dev_pagemap'.res with 'struct dev_pagemap'.range.

P2PDMA had a minor usage of the resource flags field, but only to report
failures with "%pR".  That is replaced with an open coded print of the
range.

[dan.carpenter@oracle.com: mm/hmm/test: use after free in dmirror_allocate_chunk()]
  Link: https://lkml.kernel.org/r/20200926121402.GA7467@kadam

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	[xen]
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Yan <yanaijie@huawei.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/159643103173.4062302.768998885691711532.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106115761.30709.13539840236873663620.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:28 -07:00
Dan Williams
f5516ec5ef device-dax: make pgmap optional for instance creation
The passed in dev_pagemap is only required in the pmem case as the
libnvdimm core may have reserved a vmem_altmap for dev_memremap_pages() to
place the memmap in pmem directly.  In the hmem case there is no agent
reserving an altmap so it can all be handled by a core internal default.

Pass the resource range via a new @range property of 'struct
dev_dax_data'.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Yan <yanaijie@huawei.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/159643099958.4062302.10379230791041872886.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106110513.30709.4303239334850606031.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13 18:38:28 -07:00
Linus Torvalds
8b05418b25 seccomp updates for v5.10-rc1
- heavily refactor seccomp selftests (and clone3 selftests dependency) to
   fix powerpc (Kees Cook, Thadeu Lima de Souza Cascardo)
 - fix style issue in selftests (Zou Wei)
 - upgrade "unknown action" from KILL_THREAD to KILL_PROCESS (Rich Felker)
 - replace task_pt_regs(current) with current_pt_regs() (Denis Efremov)
 - fix corner-case race in USER_NOTIF (Jann Horn)
 - make CONFIG_SECCOMP no longer per-arch (YiFei Zhu)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl+E1LAWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJgRfD/0cq7W51+o34719vefC+oZaMjJJ
 Bd5HYshmr6NRpMqn0OhtT9kVi6OeV0sK0VJeNxSISDIaGNJ8xCI9YhnXwzY+7myK
 +IQu3i2Hv7dlWvTaXWFLL+mvfk6WopLntFGGJQ8KPMnP2gcfH2AZmOeAKGFGhBDe
 NwpAUZ9zriXg9JCQp6u0FzPJgk8KfgfHjUY6Hsa095gg0aPSJhc8bWEUNBQwjCe6
 uIcxDP/zK2WWaEhO9BfHt6/VTcXw7QgTLS3yM+pwBCgR1JHs7HMhtgcwPT410qES
 LmYD8OiHmv5AZhDjcCcNipKEv3ZnxkLnpU/6hfaKM4zn/DoaR/zbfjO9U017rcNV
 9gf7k5siAP7DH48IFlqf4Erzd3xyF0OJDnVfC7NiPtggPfO9aWOHJJZCuJRQOdrN
 qPMjkaQzFb02qb501PLEn55F24OLDjz1vFOqpkJm2/XamOBVV4uiRKmfpNEo/MOf
 QkhSvzvwEFErWwzPH95uFyVhs42stwnM3ppnwtya2+U5kxXdNvbAR8N5leH7siaU
 ab+YJIHW59+BxXTlKgXIcqBP/6RqJWJtuT9OqGs0K2A7FhQSexh5MOm+9vvGgIwZ
 Qjyijku8dB3aV94BNGnlJq6BV+4Hc6EGadh7h3b8GiRAUTYo0pk5G/iKL6Ii+R6p
 0msJENqalKFtNCr70w==
 =a4u2
 -----END PGP SIGNATURE-----

Merge tag 'seccomp-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp updates from Kees Cook:
 "The bulk of the changes are with the seccomp selftests to accommodate
  some powerpc-specific behavioral characteristics. Additional cleanups,
  fixes, and improvements are also included:

   - heavily refactor seccomp selftests (and clone3 selftests
     dependency) to fix powerpc (Kees Cook, Thadeu Lima de Souza
     Cascardo)

   - fix style issue in selftests (Zou Wei)

   - upgrade "unknown action" from KILL_THREAD to KILL_PROCESS (Rich
     Felker)

   - replace task_pt_regs(current) with current_pt_regs() (Denis
     Efremov)

   - fix corner-case race in USER_NOTIF (Jann Horn)

   - make CONFIG_SECCOMP no longer per-arch (YiFei Zhu)"

* tag 'seccomp-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (23 commits)
  seccomp: Make duplicate listener detection non-racy
  seccomp: Move config option SECCOMP to arch/Kconfig
  selftests/clone3: Avoid OS-defined clone_args
  selftests/seccomp: powerpc: Set syscall return during ptrace syscall exit
  selftests/seccomp: Allow syscall nr and ret value to be set separately
  selftests/seccomp: Record syscall during ptrace entry
  selftests/seccomp: powerpc: Fix seccomp return value testing
  selftests/seccomp: Remove SYSCALL_NUM_RET_SHARE_REG in favor of SYSCALL_RET_SET
  selftests/seccomp: Avoid redundant register flushes
  selftests/seccomp: Convert REGSET calls into ARCH_GETREG/ARCH_SETREG
  selftests/seccomp: Convert HAVE_GETREG into ARCH_GETREG/ARCH_SETREG
  selftests/seccomp: Remove syscall setting #ifdefs
  selftests/seccomp: mips: Remove O32-specific macro
  selftests/seccomp: arm64: Define SYSCALL_NUM_SET macro
  selftests/seccomp: arm: Define SYSCALL_NUM_SET macro
  selftests/seccomp: mips: Define SYSCALL_NUM_SET macro
  selftests/seccomp: Provide generic syscall setting macro
  selftests/seccomp: Refactor arch register macros to avoid xtensa special case
  selftests/seccomp: Use __NR_mknodat instead of __NR_mknod
  selftests/seccomp: Use bitwise instead of arithmetic operator for flags
  ...
2020-10-13 16:33:43 -07:00
Jakub Kicinski
ccdf7fae3a Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-10-12

The main changes are:

1) The BPF verifier improvements to track register allocation pattern, from Alexei and Yonghong.

2) libbpf relocation support for different size load/store, from Andrii.

3) bpf_redirect_peer() helper and support for inner map array with different max_entries, from Daniel.

4) BPF support for per-cpu variables, form Hao.

5) sockmap improvements, from John.
====================

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-12 16:16:50 -07:00
Linus Torvalds
edaa5ddf38 Scheduler changes for v5.10:
- Reorganize & clean up the SD* flags definitions and add a bunch
    of sanity checks. These new checks caught quite a few bugs or at
    least inconsistencies, resulting in another set of patches.
 
  - Rseq updates, add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
 
  - Add a new tracepoint to improve CPU capacity tracking
 
  - Improve overloaded SMP system load-balancing behavior
 
  - Tweak SMT balancing
 
  - Energy-aware scheduling updates
 
  - NUMA balancing improvements
 
  - Deadline scheduler fixes and improvements
 
  - CPU isolation fixes
 
  - Misc cleanups, simplifications and smaller optimizations.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+EWRERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hV8A/7BB0nt/zYVZ8Z3Di8V0b9hMtr0d1xtRM5
 ZAvg4hcZl/fVgobFndxBw6KdlK8lSce9Mcq+bTTWeD46CS13cK5Vrpiaf7x7Q00P
 m8YHeYEH13ME0pbBrhDoRCR4XzfXukzjkUl7LiyrTekAvRUtFikJ/uKl8MeJtYGZ
 gANEkadqforxUW0v45iUEGepmCWAl8hSlSMb2mDKsVhw4DFMD+px0EBmmA0VDqjE
 e0rkh6dEoUVNqlic2KoaXULld1rLg1xiaOcLUbTAXnucfhmuv5p/H11AC4ABuf+s
 7d0zLrLEfZrcLJkthYxfMHs7DYMtARiQM9Db/a5hAq9Af4Z2bvvVAaHt3gCGvkV1
 llB6BB2yWCki9Qv7oiGOAhANnyJHG/cU4r6WwMuHdlYi4dFT/iN5qkOMUL1IrDgi
 a6ZzvECChXBeisQXHSlMd8Y5O+j0gRvDR7E18z2q0/PlmO8PGJq4w34mEWveWIg3
 LaVF16bmvaARuNFJTQH/zaHhjqVQANSMx5OIv9swp0OkwvQkw21ICYHG0YxfzWCr
 oa/FESEpOL9XdYp8UwMPI0bmVIsEfx79pmDMF3zInYTpJpwMUhV2yjHE8uYVMqEf
 7U8rZv7gdbZ2us38Gjf2l73hY+recp/GrgZKnk0R98OUeMk1l/iVP6dwco6ITUV5
 czGmKlIB1ec=
 =bXy6
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

 - reorganize & clean up the SD* flags definitions and add a bunch of
   sanity checks. These new checks caught quite a few bugs or at least
   inconsistencies, resulting in another set of patches.

 - rseq updates, add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ

 - add a new tracepoint to improve CPU capacity tracking

 - improve overloaded SMP system load-balancing behavior

 - tweak SMT balancing

 - energy-aware scheduling updates

 - NUMA balancing improvements

 - deadline scheduler fixes and improvements

 - CPU isolation fixes

 - misc cleanups, simplifications and smaller optimizations

* tag 'sched-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
  sched/deadline: Unthrottle PI boosted threads while enqueuing
  sched/debug: Add new tracepoint to track cpu_capacity
  sched/fair: Tweak pick_next_entity()
  rseq/selftests: Test MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
  rseq/selftests,x86_64: Add rseq_offset_deref_addv()
  rseq/membarrier: Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
  sched/fair: Use dst group while checking imbalance for NUMA balancer
  sched/fair: Reduce busy load balance interval
  sched/fair: Minimize concurrent LBs between domain level
  sched/fair: Reduce minimal imbalance threshold
  sched/fair: Relax constraint on task's load during load balance
  sched/fair: Remove the force parameter of update_tg_load_avg()
  sched/fair: Fix wrong cpu selecting from isolated domain
  sched: Remove unused inline function uclamp_bucket_base_value()
  sched/rt: Disable RT_RUNTIME_SHARE by default
  sched/deadline: Fix stale throttling on de-/boosted tasks
  sched/numa: Use runnable_avg to classify node
  sched/topology: Move sd_flag_debug out of #ifdef CONFIG_SYSCTL
  MAINTAINERS: Add myself as SCHED_DEADLINE reviewer
  sched/topology: Move SD_DEGENERATE_GROUPS_MASK out of linux/sched/topology.h
  ...
2020-10-12 12:56:01 -07:00
Linus Torvalds
87194efe7e * Misc minor cleanups and corrections to the fsgsbase code and
respective selftests.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+EMr8ACgkQEsHwGGHe
 VUpoBQ/+Pe7MaMbS7d+IATzhedKW7PNKcQr2g734rnZkz0BgG+1sBkCQPRAKBZoa
 SIvt3gURM3aEeD4Dp3am4nLElyhldZrlwKoGKGv6AYv2BpLPQM9PG0fHJGUyYBze
 ekKMdPu5YK0hYqoWctrY8h+qbExNdkfAvM7bJMFJMBqypVicm0n5wlgfZthGz0DU
 tkD34WZNE2GGAfsi/NNJ2H+hcZo8bQVrqW98bkgdzIA7+KI3cyZZ132VbKxb03tK
 A69C7+J4B20q/traWFlb4mTcFy/a1Txrt3cJXIv/Xer74gDMqNYcciGgnTJdhryY
 gzBmWNTxuQr1EC8DYJaxjlQbBp6VSAwhELlyc6UeRxLAViEpMxyPfBVMOYqraImc
 sZ8QKGgI02PggInN9yo4qCbtUWAGMCHV7HGGW8stVBrh1lia7o6Dy9jhO+nmTnzV
 EEe/vEoSsp/ydnkgFNjaRwjFLp+vDX2lAf513ZuZukpt+IGQ0nAO5phzgcZVAyH4
 qzr9uXdM3j+NtlXZgLttNppWEvHxzIpkri3Ly46VUFYOqTuKYPmS8A7stEfqx3NO
 T8g38+dDirFfKCoJz8NJBUUs+1KXer8QmJvogfHx5fsZ01Q2qz6AmOmsmMfe7Wqm
 +C9mvZJOJnW/7NunGWGVsuAZJOPx6o2oivjVpxxOyl8AeQnE6fg=
 =itDm
 -----END PGP SIGNATURE-----

Merge tag 'x86_fsgsbase_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fsgsbase updates from Borislav Petkov:
 "Misc minor cleanups and corrections to the fsgsbase code and
  respective selftests"

* tag 'x86_fsgsbase_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  selftests/x86/fsgsbase: Test PTRACE_PEEKUSER for GSBASE with invalid LDT GS
  selftests/x86/fsgsbase: Reap a forgotten child
  x86/fsgsbase: Replace static_cpu_has() with boot_cpu_has()
  x86/entry/64: Correct the comment over SAVE_AND_SET_GSBASE
2020-10-12 10:44:24 -07:00
Linus Torvalds
ca1b66922a * Extend the recovery from MCE in kernel space also to processes which
encounter an MCE in kernel space but while copying from user memory by
 sending them a SIGBUS on return to user space and umapping the faulty
 memory, by Tony Luck and Youquan Song.
 
 * memcpy_mcsafe() rework by splitting the functionality into
 copy_mc_to_user() and copy_mc_to_kernel(). This, as a result, enables
 support for new hardware which can recover from a machine check
 encountered during a fast string copy and makes that the default and
 lets the older hardware which does not support that advance recovery,
 opt in to use the old, fragile, slow variant, by Dan Williams.
 
 * New AMD hw enablement, by Yazen Ghannam and Akshay Gupta.
 
 * Do not use MSR-tracing accessors in #MC context and flag any fault
 while accessing MCA architectural MSRs as an architectural violation
 with the hope that such hw/fw misdesigns are caught early during the hw
 eval phase and they don't make it into production.
 
 * Misc fixes, improvements and cleanups, as always.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+EIpUACgkQEsHwGGHe
 VUouoBAAgwb+NkWZtIqGImV4f+LOyFjhTR/r/7ZyiijXdbhOIuAdc/jQM31mQxug
 sX2jxaRYnf1n6SLA0ggX99gwr2deRQ/hsNf5Abw55GC+Z1dOxpGL0k59A3ELl1IR
 H9KYmCAFQIHvzfk38qcdND73XHcgthQoXFBOG9wAPAdgDWnaiWt6lcLAq8OiJTmp
 D8pInAYhcnL8YXwMGyQQ1KkFn9HwydoWDsK5Ff2shaw2/+dMQqd1zetenbVtjhLb
 iNYGvV7Bi/RQ8PyMbzmtTWa4kwQJAHC2gptkGxty//2ADGVBbqUQdqF9TjIWCNy5
 V6Ldv5zo0/1s7DOzji3htzqkSs/K1Ea6d2LtZjejkJipHKV5x068UC6Fu+PlfS2D
 VZfcICeapU4G2F3Zvks2DlZ7dVTbHCvoI78Qi7bBgczPUVmk6iqah4xuQaiHyBJc
 kTFDA4Nnf/026GpoWRiFry9vqdnHBZyLet5A6Y+SoWF0FbhYnCVPpq4MnussYoav
 lUIi9ZZav6X2RZp9DDM1f9d5xubtKq0DKt93wvzqAhjK0T2DikckJ+riOYkI6N8t
 fHCBNUkdfgyMzJUTBPAzYQ7RmjbjKWJi7xWP0oz6+GqOJkQfSTVC5/2yEffbb3ya
 whYRS6iklbl7yshzaOeecXsZcAeK2oGPfoHg34WkHFgXdF5mNgA=
 =u1Wg
 -----END PGP SIGNATURE-----

Merge tag 'ras_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS updates from Borislav Petkov:

 - Extend the recovery from MCE in kernel space also to processes which
   encounter an MCE in kernel space but while copying from user memory
   by sending them a SIGBUS on return to user space and umapping the
   faulty memory, by Tony Luck and Youquan Song.

 - memcpy_mcsafe() rework by splitting the functionality into
   copy_mc_to_user() and copy_mc_to_kernel(). This, as a result, enables
   support for new hardware which can recover from a machine check
   encountered during a fast string copy and makes that the default and
   lets the older hardware which does not support that advance recovery,
   opt in to use the old, fragile, slow variant, by Dan Williams.

 - New AMD hw enablement, by Yazen Ghannam and Akshay Gupta.

 - Do not use MSR-tracing accessors in #MC context and flag any fault
   while accessing MCA architectural MSRs as an architectural violation
   with the hope that such hw/fw misdesigns are caught early during the
   hw eval phase and they don't make it into production.

 - Misc fixes, improvements and cleanups, as always.

* tag 'ras_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Allow for copy_mc_fragile symbol checksum to be generated
  x86/mce: Decode a kernel instruction to determine if it is copying from user
  x86/mce: Recover from poison found while copying from user space
  x86/mce: Avoid tail copy when machine check terminated a copy from user
  x86/mce: Add _ASM_EXTABLE_CPY for copy user access
  x86/mce: Provide method to find out the type of an exception handler
  x86/mce: Pass pointer to saved pt_regs to severity calculation routines
  x86/copy_mc: Introduce copy_mc_enhanced_fast_string()
  x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()
  x86/mce: Drop AMD-specific "DEFERRED" case from Intel severity rule list
  x86/mce: Add Skylake quirk for patrol scrub reported errors
  RAS/CEC: Convert to DEFINE_SHOW_ATTRIBUTE()
  x86/mce: Annotate mce_rd/wrmsrl() with noinstr
  x86/mce/dev-mcelog: Do not update kflags on AMD systems
  x86/mce: Stop mce_reign() from re-computing severity for every CPU
  x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR
  x86/mce: Increase maximum number of banks to 64
  x86/mce: Delay clearing IA32_MCG_STATUS to the end of do_machine_check()
  x86/MCE/AMD, EDAC/mce_amd: Remove struct smca_hwid.xec_bitmap
  RAS/CEC: Fix cec_init() prototype
2020-10-12 10:14:38 -07:00
Linus Torvalds
6734e20e39 arm64 updates for 5.10
- Userspace support for the Memory Tagging Extension introduced by Armv8.5.
   Kernel support (via KASAN) is likely to follow in 5.11.
 
 - Selftests for MTE, Pointer Authentication and FPSIMD/SVE context
   switching.
 
 - Fix and subsequent rewrite of our Spectre mitigations, including the
   addition of support for PR_SPEC_DISABLE_NOEXEC.
 
 - Support for the Armv8.3 Pointer Authentication enhancements.
 
 - Support for ASID pinning, which is required when sharing page-tables with
   the SMMU.
 
 - MM updates, including treating flush_tlb_fix_spurious_fault() as a no-op.
 
 - Perf/PMU driver updates, including addition of the ARM CMN PMU driver and
   also support to handle CPU PMU IRQs as NMIs.
 
 - Allow prefetchable PCI BARs to be exposed to userspace using normal
   non-cacheable mappings.
 
 - Implementation of ARCH_STACKWALK for unwinding.
 
 - Improve reporting of unexpected kernel traps due to BPF JIT failure.
 
 - Improve robustness of user-visible HWCAP strings and their corresponding
   numerical constants.
 
 - Removal of TEXT_OFFSET.
 
 - Removal of some unused functions, parameters and prototypes.
 
 - Removal of MPIDR-based topology detection in favour of firmware
   description.
 
 - Cleanups to handling of SVE and FPSIMD register state in preparation
   for potential future optimisation of handling across syscalls.
 
 - Cleanups to the SDEI driver in preparation for support in KVM.
 
 - Miscellaneous cleanups and refactoring work.
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl+AUXMQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNFc1B/4q2Kabe+pPu7s1f58Q+OTaEfqcr3F1qh27
 F1YpFZUYxg0GPfPsFrnbJpo5WKo7wdR9ceI9yF/GHjs7A/MSoQJis3pG6SlAd9c0
 nMU5tCwhg9wfq6asJtl0/IPWem6cqqhdzC6m808DjeHuyi2CCJTt0vFWH3OeHEhG
 cfmLfaSNXOXa/MjEkT8y1AXJ/8IpIpzkJeCRA1G5s18PXV9Kl5bafIo9iqyfKPLP
 0rJljBmoWbzuCSMc81HmGUQI4+8KRp6HHhyZC/k0WEVgj3LiumT7am02bdjZlTnK
 BeNDKQsv2Jk8pXP2SlrI3hIUTz0bM6I567FzJEokepvTUzZ+CVBi
 =9J8H
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "There's quite a lot of code here, but much of it is due to the
  addition of a new PMU driver as well as some arm64-specific selftests
  which is an area where we've traditionally been lagging a bit.

  In terms of exciting features, this includes support for the Memory
  Tagging Extension which narrowly missed 5.9, hopefully allowing
  userspace to run with use-after-free detection in production on CPUs
  that support it. Work is ongoing to integrate the feature with KASAN
  for 5.11.

  Another change that I'm excited about (assuming they get the hardware
  right) is preparing the ASID allocator for sharing the CPU page-table
  with the SMMU. Those changes will also come in via Joerg with the
  IOMMU pull.

  We do stray outside of our usual directories in a few places, mostly
  due to core changes required by MTE. Although much of this has been
  Acked, there were a couple of places where we unfortunately didn't get
  any review feedback.

  Other than that, we ran into a handful of minor conflicts in -next,
  but nothing that should post any issues.

  Summary:

   - Userspace support for the Memory Tagging Extension introduced by
     Armv8.5. Kernel support (via KASAN) is likely to follow in 5.11.

   - Selftests for MTE, Pointer Authentication and FPSIMD/SVE context
     switching.

   - Fix and subsequent rewrite of our Spectre mitigations, including
     the addition of support for PR_SPEC_DISABLE_NOEXEC.

   - Support for the Armv8.3 Pointer Authentication enhancements.

   - Support for ASID pinning, which is required when sharing
     page-tables with the SMMU.

   - MM updates, including treating flush_tlb_fix_spurious_fault() as a
     no-op.

   - Perf/PMU driver updates, including addition of the ARM CMN PMU
     driver and also support to handle CPU PMU IRQs as NMIs.

   - Allow prefetchable PCI BARs to be exposed to userspace using normal
     non-cacheable mappings.

   - Implementation of ARCH_STACKWALK for unwinding.

   - Improve reporting of unexpected kernel traps due to BPF JIT
     failure.

   - Improve robustness of user-visible HWCAP strings and their
     corresponding numerical constants.

   - Removal of TEXT_OFFSET.

   - Removal of some unused functions, parameters and prototypes.

   - Removal of MPIDR-based topology detection in favour of firmware
     description.

   - Cleanups to handling of SVE and FPSIMD register state in
     preparation for potential future optimisation of handling across
     syscalls.

   - Cleanups to the SDEI driver in preparation for support in KVM.

   - Miscellaneous cleanups and refactoring work"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
  Revert "arm64: initialize per-cpu offsets earlier"
  arm64: random: Remove no longer needed prototypes
  arm64: initialize per-cpu offsets earlier
  kselftest/arm64: Check mte tagged user address in kernel
  kselftest/arm64: Verify KSM page merge for MTE pages
  kselftest/arm64: Verify all different mmap MTE options
  kselftest/arm64: Check forked child mte memory accessibility
  kselftest/arm64: Verify mte tag inclusion via prctl
  kselftest/arm64: Add utilities and a test to validate mte memory
  perf: arm-cmn: Fix conversion specifiers for node type
  perf: arm-cmn: Fix unsigned comparison to less than zero
  arm64: dbm: Invalidate local TLB when setting TCR_EL1.HD
  arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op
  arm64: Add support for PR_SPEC_DISABLE_NOEXEC prctl() option
  arm64: Pull in task_stack_page() to Spectre-v4 mitigation code
  KVM: arm64: Allow patching EL2 vectors even with KASLR is not enabled
  arm64: Get rid of arm64_ssbd_state
  KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state()
  KVM: arm64: Get rid of kvm_arm_have_ssbd()
  KVM: arm64: Simplify handling of ARCH_WORKAROUND_2
  ...
2020-10-12 10:00:51 -07:00
John Fastabend
a24fb420a5 bpf, selftests: Add three new sockmap tests for verdict only programs
Here we add three new tests for sockmap to test having a verdict program
without setting the parser program.

The first test covers the most simply case,

   sender         proxy_recv proxy_send      recv
     |                |                       |
     |              verdict -----+            |
     |                |          |            |
     +----------------+          +------------+

We load the verdict program on the proxy_recv socket without a
parser program. It then does a redirect into the send path of the
proxy_send socket using sendpage_locked().

Next we test the drop case to ensure if we kfree_skb as a result of
the verdict program everything behaves as expected.

Next we test the same configuration above, but with ktls and a
redirect into socket ingress queue. Shown here

   tls                                       tls
   sender         proxy_recv proxy_send      recv
     |                |                       |
     |              verdict ------------------+
     |                |      redirect_ingress
     +----------------+

Also to set up ping/pong test

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/160239302638.8495.17125996694402793471.stgit@john-Precision-5820-Tower
2020-10-11 18:09:44 -07:00
John Fastabend
cdf43c4bfa bpf, selftests: Add option to test_sockmap to omit adding parser program
Add option to allow running without a parser program in place. To test
with ping/pong program use,

 # test_sockmap -t ping --txmsg_omit_skb_parser

this will send packets between two socket bouncing through a proxy
socket that does not use a parser program.

   (ping)                                    (pong)
   sender         proxy_recv proxy_send      recv
     |                |                       |
     |              verdict -----+            |
     |                |          |            |
     +----------------+          +------------+

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/160239300387.8495.11908295143121563076.stgit@john-Precision-5820-Tower
2020-10-11 18:09:44 -07:00
Florian Westphal
ea2f7da179 selftests: netfilter: extend nfqueue test case
add a test with re-queueing: usespace doesn't pass accept verdict,
but tells to re-queue to another nf_queue instance.

Also, make the second nf-queue program use non-gso mode, kernel will
have to perform software segmentation.

Lastly, do not queue every packet, just one per second, and add delay
when re-injecting the packet to the kernel.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-12 01:59:40 +02:00
Vladimir Oltean
82c200be7c selftests: net: mscc: ocelot: add test for VLAN modify action
Create a test that changes a VLAN ID from 200 to 300.

We also need to modify the preferences of the filters installed for the
other rules so that they are unique, because we now install the "tc-vlan
modify" filter in VCAP IS1 only temporarily, and we need to perform the
deletion by filter preference number.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11 11:19:04 -07:00
Daniel Borkmann
9f4c53ca23 bpf, selftests: Add redirect_peer selftest
Extend the test_tc_redirect test and add a small test that exercises the new
redirect_peer() helper for the IPv4 and IPv6 case.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201010234006.7075-7-daniel@iogearbox.net
2020-10-11 10:21:04 -07:00
Daniel Borkmann
57a73fe7c1 bpf, selftests: Make redirect_neigh test more extensible
Rename into test_tc_redirect.sh and move setup and test code into separate
functions so they can be reused for newly added tests in here. Also remove
the crude hack to override ifindex inside the object file via xxd and sed
and just use a simple map instead. Map given iproute2 does not support BTF
fully and therefore neither global data at this point.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201010234006.7075-6-daniel@iogearbox.net
2020-10-11 10:21:04 -07:00
Daniel Borkmann
6775dab73b bpf, selftests: Add test for different array inner map size
Extend the "diff_size" subtest to also include a non-inlined array map variant
where dynamic inner #elems are possible.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201010234006.7075-5-daniel@iogearbox.net
2020-10-11 10:21:04 -07:00
Brendan Higgins
45dcbb6f5e kunit: test: add test plan to KUnit TAP format
TAP 14 allows an optional test plan to be emitted before the start of
the start of testing[1]; this is valuable because it makes it possible
for a test harness to detect whether the number of tests run matches the
number of tests expected to be run, ensuring that no tests silently
failed.

Link[1]: https://github.com/isaacs/testanything.github.io/blob/tap14/tap-version-14-specification.md#the-plan
Signed-off-by: Brendan Higgins <brendanhiggins@google.com>
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-10-09 14:37:49 -06:00
Daniel Latypov
1abdd39f14 kunit: tool: fix display of make errors
CalledProcessError stores the output of the failed process as `bytes`,
not a `str`.

So when we log it on build error, the make output is all crammed into
one line with "\n" instead of actually printing new lines.

After this change, we get readable output with new lines, e.g.
>   CC      lib/kunit/kunit-example-test.o
> In file included from ../lib/kunit/test.c:9:
> ../include/kunit/test.h:22:1: error: unknown type name ‘invalid_type_that_causes_compile’
>    22 | invalid_type_that_causes_compile errors;
>       | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> make[3]: *** [../scripts/Makefile.build:283: lib/kunit/test.o] Error 1

Secondly, trying to concat exceptions to strings will fail with
> TypeError: can only concatenate str (not "OSError") to str
so fix this with an explicit cast to str.

Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-10-09 14:04:09 -06:00
Alexei Starovoitov
54fada41e8 selftests/bpf: Asm tests for the verifier regalloc tracking.
Add asm tests for register allocator tracking logic.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201009011240.48506-5-alexei.starovoitov@gmail.com
2020-10-09 22:03:06 +02:00
Alexei Starovoitov
03d4d13fab selftests/bpf: Add profiler test
The main purpose of the profiler test to check different llvm generation
patterns to make sure the verifier can load these large programs.

Note that profiler.inc.h test doesn't follow strict kernel coding style.
The code was formatted in the kernel style, but variable declarations are
kept as-is to preserve original llvm IR pattern.

profiler1.c should pass with older and newer llvm

profiler[23].c may fail on older llvm that don't have:
https://reviews.llvm.org/D85570
because llvm may do speculative code motion optimization that
will generate code like this:

// r9 is a pointer to map_value
// r7 is a scalar
17:       bf 96 00 00 00 00 00 00 r6 = r9
18:       0f 76 00 00 00 00 00 00 r6 += r7
19:       a5 07 01 00 01 01 00 00 if r7 < 257 goto +1
20:       bf 96 00 00 00 00 00 00 r6 = r9
// r6 is used here

The verifier will reject such code with the error:
"math between map_value pointer and register with unbounded min value is not allowed"
At insn 18 the r7 is indeed unbounded. The later insn 19 checks the bounds and
the insn 20 undoes map_value addition. It is currently impossible for the
verifier to understand such speculative pointer arithmetic. Hence llvm D85570
addresses it on the compiler side.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201009011240.48506-4-alexei.starovoitov@gmail.com
2020-10-09 22:03:06 +02:00
Alexei Starovoitov
75748837b7 bpf: Propagate scalar ranges through register assignments.
The llvm register allocator may use two different registers representing the
same virtual register. In such case the following pattern can be observed:
1047: (bf) r9 = r6
1048: (a5) if r6 < 0x1000 goto pc+1
1050: ...
1051: (a5) if r9 < 0x2 goto pc+66
1052: ...
1053: (bf) r2 = r9 /* r2 needs to have upper and lower bounds */

This is normal behavior of greedy register allocator.
The slides 137+ explain why regalloc introduces such register copy:
http://llvm.org/devmtg/2018-04/slides/Yatsina-LLVM%20Greedy%20Register%20Allocator.pdf
There is no way to tell llvm 'not to do this'.
Hence the verifier has to recognize such patterns.

In order to track this information without backtracking allocate ID
for scalars in a similar way as it's done for find_good_pkt_pointers().

When the verifier encounters r9 = r6 assignment it will assign the same ID
to both registers. Later if either register range is narrowed via conditional
jump propagate the register state into the other register.

Clear register ID in adjust_reg_min_max_vals() for any alu instruction. The
register ID is ignored for scalars in regsafe() and doesn't affect state
pruning. mark_reg_unknown() clears the ID. It's used to process call, endian
and other instructions. Hence ID is explicitly cleared only in
adjust_reg_min_max_vals() and in 32-bit mov.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20201009011240.48506-2-alexei.starovoitov@gmail.com
2020-10-09 22:03:06 +02:00
Jakub Kicinski
d3b2dc9472 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter selftests fixes from
Fabian Frederick:

1) Extend selftest nft_meta.sh to check for meta cpu.

2) Fix selftest nft_meta.sh error reporting.

3) Fix shellcheck warnings in selftest nft_meta.sh.

4) Extend selftest nft_meta.sh to check for meta time.
====================

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09 12:18:16 -07:00
Nikita V. Shirokov
eca43ee6c4 bpf: Add tcp_notsent_lowat bpf setsockopt
Adding support for TCP_NOTSENT_LOWAT sockoption (https://lwn.net/Articles/560082/)
in tcp bpf programs.

Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201009070325.226855-1-tehnerd@tehnerd.com
2020-10-09 17:12:03 +02:00
Bill Wendling
a968433723 kbuild: explicitly specify the build id style
ld's --build-id defaults to "sha1" style, while lld defaults to "fast".
The build IDs are very different between the two, which may confuse
programs that reference them.

Signed-off-by: Bill Wendling <morbo@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-10-09 23:57:30 +09:00
Christian Brauner
01361b665a
tests: remove O_NONBLOCK before waiting for WSTOPPED
Naresh reported that selftests: pidfd: pidfd_wait hangs on linux next kernel on
x86_64, i386 and arm64 Juno-r2
These devices are using NFS mounted rootfs.
I have tested pidfd testcases independently and all test PASS.

The Hang or exit from test run noticed when run by run_kselftest.sh

pidfd_wait.c:208:wait_nonblock:Expected sys_waitid(P_PIDFD, pidfd,
&info, WSTOPPED, NULL) (-1) == 0 (0)
wait_nonblock: Test terminated by assertion

metadata:
  git branch: master
  git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
  git commit: e64997027d5f171148687e58b78c8b3c869a6158
  git describe: next-20200922
  make_kernelversion: 5.9.0-rc6
  kernel-config:
http://snapshots.linaro.org/openembedded/lkft/lkft/sumo/intel-core2-32/lkft/linux-next/865/config

The reason for this is a simple race in the selftests, that I overlooked and
which is more likely to hit when there's a lot of processes running on the
system. Basically the child process hasn't SIGSTOPed itself yet but the parent
is already calling waitid() on a O_NONBLOCK pidfd. Since it doesn't find a
WSTOPPED process it returns -EAGAIN correctly.

The fix for this is to move the line where we're removing the O_NONBLOCK
property from the fd before the waitid() WSTOPPED call so we hang until the
child becomes stopped.

Fixes: cd89597bbe5a ("tests: add waitid() tests for non-blocking pidfds")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Link: https://lkft.validation.linaro.org/scheduler/job/1813223
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-10-09 11:56:51 +02:00
Ingo Molnar
b36c830f8c Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull v5.10 RCU changes from Paul E. McKenney:

- Debugging for smp_call_function().

- Strict grace periods for KASAN.  The point of this series is to find
  RCU-usage bugs, so the corresponding new RCU_STRICT_GRACE_PERIOD
  Kconfig option depends on both DEBUG_KERNEL and RCU_EXPERT, and is
  further disabled by dfefault.  Finally, the help text includes
  a goodly list of scary caveats.

- New smp_call_function() torture test.

- Torture-test updates.

- Documentation updates.

- Miscellaneous fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-10-09 08:21:56 +02:00
Matthieu Baerts
036dfd8322 selftests: mptcp: interpret \n as a new line
In case of errors, this message was printed:

  (...)
  balanced bwidth with unbalanced delay       5233 max 5005  [ fail ]
  client exit code 0, server 0
  \nnetns ns3-0-EwnkPH socket stat for 10003:
  (...)

Obviously, the idea was to add a new line before the socket stat and not
print "\nnetns".

The commit 8b974778f998 ("selftests: mptcp: interpret \n as a new line")
is very similar to this one. But the modification in simult_flows.sh was
missed because this commit above was done in parallel to one here below.

Fixes: 1a418cb8e888 ("mptcp: simult flow self-tests")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-08 17:55:24 -07:00
Kees Cook
e953aeaa91 selftests/clone3: Avoid OS-defined clone_args
As the UAPI headers start to appear in distros, we need to avoid
outdated versions of struct clone_args to be able to test modern
features, named "struct __clone_args". Additionally update the struct
size macro names to match UAPI names.

Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/lkml/20200921075432.u4gis3s2o5qrsb5g@wittgenstein/
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-10-08 13:17:25 -07:00
Kees Cook
a39caac02f selftests/seccomp: powerpc: Set syscall return during ptrace syscall exit
Some archs (like powerpc) only support changing the return code during
syscall exit when ptrace is used. Test entry vs exit phases for which
portions of the syscall number and return values need to be set at which
different phases. For non-powerpc, all changes are made during ptrace
syscall entry, as before. For powerpc, the syscall number is changed at
ptrace syscall entry and the syscall return value is changed on ptrace
syscall exit.

Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Suggested-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Link: https://lore.kernel.org/linux-kselftest/20200911181012.171027-1-cascardo@canonical.com/
Fixes: 58d0a862f573 ("seccomp: add tests for ptrace hole")
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/lkml/20200921075300.7iylzof2w5vrutah@wittgenstein/
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-10-08 13:16:52 -07:00
Kees Cook
bef71f86b6 selftests/seccomp: Allow syscall nr and ret value to be set separately
In preparation for setting syscall nr and ret values separately, refactor
the helpers to take a pointer to a value, so that a NULL can indicate
"do not change this respective value". This is done to keep the regset
read/write happening once and in one code path.

Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/lkml/20200921075031.j4gruygeugkp2zwd@wittgenstein/
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-10-08 13:16:27 -07:00
Kees Cook
71c87fbe72 selftests/seccomp: Record syscall during ptrace entry
In preparation for performing actions during ptrace syscall exit, save
the syscall number during ptrace syscall entry. Some architectures do
no have the syscall number available during ptrace syscall exit.

Suggested-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Link: https://lore.kernel.org/linux-kselftest/20200911181012.171027-1-cascardo@canonical.com/
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/lkml/20200921074354.6shkt2e5yhzhj3sn@wittgenstein/
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-10-08 13:16:00 -07:00
Tom Zanussi
cbcd9c8369 selftests/ftrace: Add test case for synthetic event dynamic strings
Add a selftest that defines and traces a synthetic event that uses a
dynamic string event field.

Link: https://lkml.kernel.org/r/74445afb005046d76d59fb06696a2ceaa164dec9.1601848695.git.zanussi@kernel.org

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-10-08 15:28:41 -04:00
Andrii Nakryiko
888d83b961 selftests/bpf: Validate libbpf's auto-sizing of LD/ST/STX instructions
Add selftests validating libbpf's auto-resizing of load/store instructions
when used with CO-RE relocations. An explicit and manual approach with using
bpf_core_read() is also demonstrated and tested. Separate BPF program is
supposed to fail due to using signed integers of sizes that differ from
kernel's sizes.

To reliably simulate 32-bit BTF (i.e., the one with sizeof(long) ==
sizeof(void *) == 4), selftest generates its own custom BTF and passes it as
a replacement for real kernel BTF. This allows to test 32/64-bitness mix on
all architectures.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201008001025.292064-5-andrii@kernel.org
2020-10-07 18:50:27 -07:00
Kees Cook
5da1918446 selftests/run_kselftest.sh: Make each test individually selectable
Currently with run_kselftest.sh there is no way to choose which test
we could run. All the tests listed in kselftest-list.txt are all run
every time. This patch enhanced the run_kselftest.sh to make the test
collections (or tests) individually selectable. e.g.:

$ ./run_kselftest.sh -c seccomp -t timers:posix_timers -t timers:nanosleep

Additionally adds a way to list all known tests with "-l", usage
with "-h", and perform a dry run without running tests with "-n".

Co-developed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-10-07 07:59:15 -06:00
Kees Cook
f0f0a5df4e selftests: Extract run_kselftest.sh and generate stand-alone test list
Instead of building a script on the fly (which just repeats the same
thing for each test collection), move the script out of the Makefile and
into run_kselftest.sh, which reads kselftest-list.txt.

Adjust the emit_tests target to report each test on a separate line so
that test running tools (e.g. LAVA) can easily remove individual
tests (for example, as seen in [1]).

[1] 2e7b62155e

Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-10-07 07:58:54 -06:00
Will Deacon
a82e4ef041 Merge branch 'for-next/late-arrivals' into for-next/core
Late patches for 5.10: MTE selftests, minor KCSAN preparation and removal
of some unused prototypes.

(Amit Daniel Kachhap and others)
* for-next/late-arrivals:
  arm64: random: Remove no longer needed prototypes
  arm64: initialize per-cpu offsets earlier
  kselftest/arm64: Check mte tagged user address in kernel
  kselftest/arm64: Verify KSM page merge for MTE pages
  kselftest/arm64: Verify all different mmap MTE options
  kselftest/arm64: Check forked child mte memory accessibility
  kselftest/arm64: Verify mte tag inclusion via prctl
  kselftest/arm64: Add utilities and a test to validate mte memory
2020-10-07 14:36:24 +01:00
Matthew Wilcox (Oracle)
a219b856a2 ida: Free allocated bitmap in error path
If a bitmap needs to be allocated, and then by the time the thread
is scheduled to be run again all the indices which would satisfy the
allocation have been allocated then we would leak the allocation.  Almost
impossible to hit in practice, but a trivial fix.  Found by Coverity.

Fixes: f32f004cddf8 ("ida: Convert to XArray")
Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2020-10-07 09:11:33 -04:00
Matthew Wilcox (Oracle)
dd841a749d radix tree test suite: Fix compilation
Introducing local_lock broke compilation; fix it all up.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2020-10-07 09:07:49 -04:00
Hao Luo
bf88a80a04 selftests/bpf: Fix test_verifier after introducing resolve_pseudo_ldimm64
Commit 4976b718c355 ("bpf: Introduce pseudo_btf_id") switched
the order of check_subprogs() and resolve_pseudo_ldimm() in
the verifier. Now an empty prog expects to see the error "last
insn is not an the prog of a single invalid ldimm exit or jmp"
instead, because the check for subprogs comes first. It's now
pointless to validate that half of ldimm64 won't be the last
instruction.

Tested:
 # ./test_verifier
 Summary: 1129 PASSED, 537 SKIPPED, 0 FAILED
 and the full set of bpf selftests.

Fixes: 4976b718c355 ("bpf: Introduce pseudo_btf_id")
Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201007022857.2791884-1-haoluo@google.com
2020-10-06 20:16:57 -07:00
Hangbin Liu
44c4aa2bd1 selftest/bpf: Test pinning map with reused map fd
This add a test to make sure that we can still pin maps with
reused map fd.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201006021345.3817033-4-liuhangbin@gmail.com
2020-10-06 11:10:20 -07:00
Andrew Donnellan
dc9af82ea0 selftests/powerpc: Add a rtas_filter selftest
Add a selftest to test the basic functionality of CONFIG_RTAS_FILTER.

Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
[mpe: Change rmo_start/end to 32-bit to avoid build errors on ppc64]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200820044512.7543-2-ajd@linux.ibm.com
2020-10-06 23:22:27 +11:00
Dan Williams
ec6347bb43 x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()
In reaction to a proposal to introduce a memcpy_mcsafe_fast()
implementation Linus points out that memcpy_mcsafe() is poorly named
relative to communicating the scope of the interface. Specifically what
addresses are valid to pass as source, destination, and what faults /
exceptions are handled.

Of particular concern is that even though x86 might be able to handle
the semantics of copy_mc_to_user() with its common copy_user_generic()
implementation other archs likely need / want an explicit path for this
case:

  On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
  >
  > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote:
  > >
  > > However now I see that copy_user_generic() works for the wrong reason.
  > > It works because the exception on the source address due to poison
  > > looks no different than a write fault on the user address to the
  > > caller, it's still just a short copy. So it makes copy_to_user() work
  > > for the wrong reason relative to the name.
  >
  > Right.
  >
  > And it won't work that way on other architectures. On x86, we have a
  > generic function that can take faults on either side, and we use it
  > for both cases (and for the "in_user" case too), but that's an
  > artifact of the architecture oddity.
  >
  > In fact, it's probably wrong even on x86 - because it can hide bugs -
  > but writing those things is painful enough that everybody prefers
  > having just one function.

Replace a single top-level memcpy_mcsafe() with either
copy_mc_to_user(), or copy_mc_to_kernel().

Introduce an x86 copy_mc_fragile() name as the rename for the
low-level x86 implementation formerly named memcpy_mcsafe(). It is used
as the slow / careful backend that is supplanted by a fast
copy_mc_generic() in a follow-on patch.

One side-effect of this reorganization is that separating copy_mc_64.S
to its own file means that perf no longer needs to track dependencies
for its memcpy_64.S benchmarks.

 [ bp: Massage a bit. ]

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: <stable@vger.kernel.org>
Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com
2020-10-06 11:18:04 +02:00
David S. Miller
8b0308fe31 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Rejecting non-native endian BTF overlapped with the addition
of support for it.

The rest were more simple overlapping changes, except the
renesas ravb binding update, which had to follow a file
move as well as a YAML conversion.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-05 18:40:01 -07:00
Maor Gottlieb
07da1223ec lib/scatterlist: Add support in dynamic allocation of SG table from pages
Extend __sg_alloc_table_from_pages to support dynamic allocation of
SG table from pages. It should be used by drivers that can't supply
all the pages at one time.

This function returns the last populated SGE in the table. Users should
pass it as an argument to the function from the second call and forward.
As before, nents will be equal to the number of populated SGEs (chunks).

With this new extension, drivers can benefit the optimization of merging
contiguous pages without a need to allocate all pages in advance and
hold them in a large buffer.

E.g. with the Infiniband driver that allocates a single page for hold the
pages. For 1TB memory registration, the temporary buffer would consume only
4KB, instead of 2GB.

Link: https://lore.kernel.org/r/20201004154340.1080481-2-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-05 20:45:45 -03:00
Tvrtko Ursulin
29d88681fb tools/testing/scatterlist: Show errors in human readable form
Instead of just asserting dump some more useful info about what the test
saw versus what it expected to see.

Link: https://lore.kernel.org/r/20201004154340.1080481-4-leon@kernel.org
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-05 20:45:45 -03:00
Tvrtko Ursulin
efc5b2e73c tools/testing/scatterlist: Rejuvenate bit-rotten test
A couple small tweaks are needed to make the test build and run
on current kernels.

Link: https://lore.kernel.org/r/20201004154340.1080481-3-leon@kernel.org
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-05 20:45:44 -03:00
Amit Daniel Kachhap
4dafc08d0b kselftest/arm64: Check mte tagged user address in kernel
Add a testcase to check that user address with valid/invalid
mte tag works in kernel mode. This test verifies that the kernel
API's __arch_copy_from_user/__arch_copy_to_user works by considering
if the user pointer has valid/invalid allocation tags.

In MTE sync mode, file memory read/write and other similar interfaces
fails if a user memory with invalid tag is accessed in kernel. In async
mode no such failure occurs.

Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201002115630.24683-7-amit.kachhap@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-10-05 18:52:17 +01:00
Amit Daniel Kachhap
f981d8fa26 kselftest/arm64: Verify KSM page merge for MTE pages
Add a testcase to check that KSM should not merge pages containing
same data with same/different MTE tag values.

This testcase has one positive tests and passes if page merging
happens according to the above rule. It also saves and restores
any modified ksm sysfs entries.

Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201002115630.24683-6-amit.kachhap@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-10-05 18:52:17 +01:00
Amit Daniel Kachhap
53ec81d232 kselftest/arm64: Verify all different mmap MTE options
This testcase checks the different unsupported/supported options for mmap
if used with PROT_MTE memory protection flag. These checks are,

* Either pstate.tco enable or prctl PR_MTE_TCF_NONE option should not cause
  any tag mismatch faults.
* Different combinations of anonymous/file memory mmap, mprotect,
  sync/async error mode and private/shared mappings should work.
* mprotect should not be able to clear the PROT_MTE page property.

Co-developed-by: Gabor Kertesz <gabor.kertesz@arm.com>
Signed-off-by: Gabor Kertesz <gabor.kertesz@arm.com>
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201002115630.24683-5-amit.kachhap@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-10-05 18:52:17 +01:00
Amit Daniel Kachhap
dfe537cf47 kselftest/arm64: Check forked child mte memory accessibility
This test covers the mte memory behaviour of the forked process with
different mapping properties and flags. It checks that all bytes of
forked child memory are accessible with the same tag as that of the
parent and memory accessed outside the tag range causes fault to
occur.

Co-developed-by: Gabor Kertesz <gabor.kertesz@arm.com>
Signed-off-by: Gabor Kertesz <gabor.kertesz@arm.com>
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201002115630.24683-4-amit.kachhap@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-10-05 18:52:17 +01:00
Amit Daniel Kachhap
f3b2a26ca7 kselftest/arm64: Verify mte tag inclusion via prctl
This testcase verifies that the tag generated with "irg" instruction
contains only included tags. This is done via prtcl call.

This test covers 4 scenarios,
* At least one included tag.
* More than one included tags.
* All included.
* None included.

Co-developed-by: Gabor Kertesz <gabor.kertesz@arm.com>
Signed-off-by: Gabor Kertesz <gabor.kertesz@arm.com>
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201002115630.24683-3-amit.kachhap@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2020-10-05 18:52:17 +01:00