3027 Commits

Author SHA1 Message Date
Linus Torvalds
e0152e7481 RISC-V Patches for the 6.6 Merge Window, Part 1
* Support for the new "riscv,isa-extensions" and "riscv,isa-base" device
   tree interfaces for probing extensions.
 * Support for userspace access to the performance counters.
 * Support for more instructions in kprobes.
 * Crash kernels can be allocated above 4GiB.
 * Support for KCFI.
 * Support for ELFs in !MMU configurations.
 * ARCH_KMALLOC_MINALIGN has been reduced to 8.
 * mmap() defaults to sv48-sized addresses, with longer addresses hidden
   behind a hint (similar to Arm and Intel).
 * Also various fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmTx96kTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiVjRD/9DYVLlkQ/OEDJjPaEcYCP49xgIVUUU
 lhs3XbSs2VNHBaiG114f6Q0AaT/uNi+uqSej3CeTmEot2kZkBk/f2yu+UNIriPZ9
 GQiZsdyXhu921C+5VFtiI47KDWOVZ+Jpy3M1ll61IWt3yPSQHr1xOP0AOiyHHqe3
 cmqpNnzjajlfVDoXPc2mGGzUJt/7ar4thcwnMNi98raXR5Qh7SP6rrHjoQhE1oFk
 LMP3CHqEAcHE2tE4CxZVpc6HOQ5m0LpQIOK7ypufGMyoIYESm5dt/JOT4MlhTtDw
 6JzyVKtiM7lartUnUaW3ZoX4trQYT5gbXxWrJ2gCnUGy3VulikoXr1Rpz0qfdeOR
 XN8OLkVAqHfTGFI7oKk24f9Adw96R5NPZcdCay90h4J/kMfCiC7ZyUUI1XIa5iy1
 np5pZCkf8HNcdywML7qcFd5n2O0wchyFnRLFZo6kJP9Ls5cEi6kBx/1jSdTcNgx/
 fUKXyoEcriGoQiiwn29+4RZnU69gJV3zqQNLPpuwDQ5F/Q1zHTlrr+dqzezKkzcO
 dRTV2d2Q4A5vIDXPptzNNLlRQdrc8qxPJ1lxQVkPIU4/mtqczmZBwlyY2u9zwPyS
 sehJgJZnoAf+jm71NgQAKLck4MUBsMnMogOWunhXkVRCoZlbbkUWX4ECZYwPKsVk
 W7zVPmLvSM0l5g==
 =/tXb
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.6-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - Support for the new "riscv,isa-extensions" and "riscv,isa-base"
   device tree interfaces for probing extensions

 - Support for userspace access to the performance counters

 - Support for more instructions in kprobes

 - Crash kernels can be allocated above 4GiB

 - Support for KCFI

 - Support for ELFs in !MMU configurations

 - ARCH_KMALLOC_MINALIGN has been reduced to 8

 - mmap() defaults to sv48-sized addresses, with longer addresses hidden
   behind a hint (similar to Arm and Intel)

 - Also various fixes and cleanups

* tag 'riscv-for-linus-6.6-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits)
  lib/Kconfig.debug: Restrict DEBUG_INFO_SPLIT for RISC-V
  riscv: support PREEMPT_DYNAMIC with static keys
  riscv: Move create_tmp_mapping() to init sections
  riscv: Mark KASAN tmp* page tables variables as static
  riscv: mm: use bitmap_zero() API
  riscv: enable DEBUG_FORCE_FUNCTION_ALIGN_64B
  riscv: remove redundant mv instructions
  RISC-V: mm: Document mmap changes
  RISC-V: mm: Update pgtable comment documentation
  RISC-V: mm: Add tests for RISC-V mm
  RISC-V: mm: Restrict address space for sv39,sv48,sv57
  riscv: enable DMA_BOUNCE_UNALIGNED_KMALLOC for !dma_coherent
  riscv: allow kmalloc() caches aligned to the smallest value
  riscv: support the elf-fdpic binfmt loader
  binfmt_elf_fdpic: support 64-bit systems
  riscv: Allow CONFIG_CFI_CLANG to be selected
  riscv/purgatory: Disable CFI
  riscv: Add CFI error handling
  riscv: Add ftrace_stub_graph
  riscv: Add types to indirectly called assembly functions
  ...
2023-09-01 08:09:48 -07:00
Linus Torvalds
df57721f9a Add x86 shadow stack support
Convert IBT selftest to asm to fix objtool warning
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmTv1QQACgkQaDWVMHDJ
 krAUwhAAn6TOwHJK8BSkHeiQhON1nrlP3c5cv0AyZ2NP8RYDrZrSZvhpYBJ6wgKC
 Cx5CGq5nn9twYsYS3KsktLKDfR3lRdsQ7K9qtyFtYiaeaVKo+7gEKl/K+klwai8/
 gninQWHk0zmSCja8Vi77q52WOMkQKapT8+vaON9EVDO8dVEi+CvhAIfPwMafuiwO
 Rk4X86SzoZu9FP79LcCg9XyGC/XbM2OG9eNUTSCKT40qTTKm5y4gix687NvAlaHR
 ko5MTsdl0Wfp6Qk0ohT74LnoA2c1g/FluvZIM33ci/2rFpkf9Hw7ip3lUXqn6CPx
 rKiZ+pVRc0xikVWkraMfIGMJfUd2rhelp8OyoozD7DB7UZw40Q4RW4N5tgq9Fhe9
 MQs3p1v9N8xHdRKl365UcOczUxNAmv4u0nV5gY/4FMC6VjldCl2V9fmqYXyzFS4/
 Ogg4FSd7c2JyGFKPs+5uXyi+RY2qOX4+nzHOoKD7SY616IYqtgKoz5usxETLwZ6s
 VtJOmJL0h//z0A7tBliB0zd+SQ5UQQBDC2XouQH2fNX2isJMn0UDmWJGjaHgK6Hh
 8jVp6LNqf+CEQS387UxckOyj7fu438hDky1Ggaw4YqowEOhQeqLVO4++x+HITrbp
 AupXfbJw9h9cMN63Yc0gVxXQ9IMZ+M7UxLtZ3Cd8/PVztNy/clA=
 =3UUm
 -----END PGP SIGNATURE-----

Merge tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 shadow stack support from Dave Hansen:
 "This is the long awaited x86 shadow stack support, part of Intel's
  Control-flow Enforcement Technology (CET).

  CET consists of two related security features: shadow stacks and
  indirect branch tracking. This series implements just the shadow stack
  part of this feature, and just for userspace.

  The main use case for shadow stack is providing protection against
  return oriented programming attacks. It works by maintaining a
  secondary (shadow) stack using a special memory type that has
  protections against modification. When executing a CALL instruction,
  the processor pushes the return address to both the normal stack and
  to the special permission shadow stack. Upon RET, the processor pops
  the shadow stack copy and compares it to the normal stack copy.

  For more information, refer to the links below for the earlier
  versions of this patch set"

Link: https://lore.kernel.org/lkml/20220130211838.8382-1-rick.p.edgecombe@intel.com/
Link: https://lore.kernel.org/lkml/20230613001108.3040476-1-rick.p.edgecombe@intel.com/

* tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
  x86/shstk: Change order of __user in type
  x86/ibt: Convert IBT selftest to asm
  x86/shstk: Don't retry vm_munmap() on -EINTR
  x86/kbuild: Fix Documentation/ reference
  x86/shstk: Move arch detail comment out of core mm
  x86/shstk: Add ARCH_SHSTK_STATUS
  x86/shstk: Add ARCH_SHSTK_UNLOCK
  x86: Add PTRACE interface for shadow stack
  selftests/x86: Add shadow stack test
  x86/cpufeatures: Enable CET CR4 bit for shadow stack
  x86/shstk: Wire in shadow stack interface
  x86: Expose thread features in /proc/$PID/status
  x86/shstk: Support WRSS for userspace
  x86/shstk: Introduce map_shadow_stack syscall
  x86/shstk: Check that signal frame is shadow stack mem
  x86/shstk: Check that SSP is aligned on sigreturn
  x86/shstk: Handle signals for shadow stack
  x86/shstk: Introduce routines modifying shstk
  x86/shstk: Handle thread shadow stack
  x86/shstk: Add user-mode shadow stack support
  ...
2023-08-31 12:20:12 -07:00
Paolo Bonzini
e43ae8b689 KVM/riscv changes for 6.6
- Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for Guest/VM
 - Added ONE_REG interface for SATP mode
 - Added ONE_REG interface to enable/disable multiple ISA extensions
 - Improved error codes returned by ONE_REG interfaces
 - Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V
 - Added get-reg-list selftest for KVM RISC-V
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmTlo4gACgkQrUjsVaLH
 LAfuxA/+KzZJ4o6dM5c/VltYNdYwb3y17QJa/FE5abl+mBsGApv+fmefXOKeoj21
 W72XCkGBhpHoaEAIyN63eP0IIs4LvaJOfyGkkINeIR1we9Ws++x8PBaCf/oiclHA
 xwTSOHLj0OYMyR/zeusTqz1moHBwKFQ5Zx3EsBk+HPVThgoejbyQS37+8dG3paRU
 UYaHKz1SJg120G7Xbl/WUpM7e8wQzFZTx/cGuNIWad3U8TM9chYcg2q/HTgEq8pz
 tIwV04g+8bYT9Y/kb1LdqUprHPQgdrYphOnIyf/916goYwIhHkf2PVO2BajzmWjU
 12gZAAOSD5JSS2Rel47dmM2AYAT5nii9E5x9/56YeqHdYG6LqtYdLDAkXvrsIDIh
 EnZdNj7svoWpe5IQ9CqZRgy6uNhXx1u6JlUA6Euuhd0vEvQWRNj9BaPaF7pjFu7E
 6KyRxBhpmueJQ1exBQsnOMV97UB00SYH+8HiQE5kma0yBqKHFTgZbhPDI+VEMfrG
 sqbNJ49Xpm+2Ac1+H00WB+8xtyxGab0z+8PCA2nrkgtquGeGzkwuWQvF3wGctDds
 PXxgS6qaXSXCa3j5KlCShwLgi4wpKiwZ9Nz70NI4TQWWiQBMbxJ+Y2R+5BE5ugk5
 8tFKWO0jLDanNINgg+uJmxa7oQSsZMBJ0+CLqPYc76rC0ivKbKI=
 =ZNgs
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-6.6-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv changes for 6.6

- Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for Guest/VM
- Added ONE_REG interface for SATP mode
- Added ONE_REG interface to enable/disable multiple ISA extensions
- Improved error codes returned by ONE_REG interfaces
- Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V
- Added get-reg-list selftest for KVM RISC-V
2023-08-31 13:25:55 -04:00
Paolo Bonzini
0d15bf966d Common KVM changes for 6.6:
- Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier events to pass
    action specific data without needing to constantly update the main handlers.
 
  - Drop unused function declarations
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmTudpYSHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5xJUQAKnMVEV+7gRtfV5KCJFRTNAMxo4zSIt/
 K6QX+x/SwUriXj4nTAlvAtju1xz4nwTYBABKj3bXEaLpVjIUIbnEzEGuTKKK6XY9
 UyJKVgafwLuKLWPYN/5Zv5SCO7DmVC9W3lVMtchgt7gFcRxtZhmEn53boHhrhan0
 /2L5XD6N9rd81Zmd/rQkJNRND7XY3HkvDSnfmsRI/rfFUglCUHBDp4c2Wkmz+Dnb
 ux7N37si5OTbRVp18VzbLg1jalstDEm36ZQ7tLkvIbNbZV6pV93/ZmcTmsOruTeN
 gHVr6/RXmKKwgO3wtZ9DKL6oMcoh20yoT+vqhbaihVssLPGPusk7S2cCQ7529u8/
 Oda+w67MMdbE46N9CmB56fkpwNvn9nLCoQFhMhXBWhPJVNmorpiR6drHKqLy5zCq
 lrsWGqXU/DXA2PwdsztfIIMVeALawzExHu9ayppcKwb4S8TLJhLma7dT+EvwUxuV
 hoswaIT7Tq2ptZ34Fo5/vEz+90u2wi7LynHrNdTs7NLsW+WI/jab7KxKc+mf5WYh
 KuMzqmmPXmWRFupFeDa61YY5PCvMddDeac/jCYL/2cr73RA8bUItivwt5mEg5nOW
 9NEU+cLbl1s8g2KfxwhvodVkbhiNGf8MkVpE5skHHh9OX8HYzZUa/s6uUZO1O0eh
 XOk+fa9KWabt
 =n819
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-generic-6.6' of https://github.com/kvm-x86/linux into HEAD

Common KVM changes for 6.6:

 - Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier events to pass
   action specific data without needing to constantly update the main handlers.

 - Drop unused function declarations
2023-08-31 13:19:55 -04:00
Paolo Bonzini
e0fb12c673 KVM/arm64 updates for Linux 6.6
- Add support for TLB range invalidation of Stage-2 page tables,
   avoiding unnecessary invalidations. Systems that do not implement
   range invalidation still rely on a full invalidation when dealing
   with large ranges.
 
 - Add infrastructure for forwarding traps taken from a L2 guest to
   the L1 guest, with L0 acting as the dispatcher, another baby step
   towards the full nested support.
 
 - Simplify the way we deal with the (long deprecated) 'CPU target',
   resulting in a much needed cleanup.
 
 - Fix another set of PMU bugs, both on the guest and host sides,
   as we seem to never have any shortage of those...
 
 - Relax the alignment requirements of EL2 VA allocations for
   non-stack allocations, as we were otherwise wasting a lot of that
   precious VA space.
 
 - The usual set of non-functional cleanups, although I note the lack
   of spelling fixes...
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmTsXrUPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDZpIQAJUM1rNEOJ8ExYRfoG1LaTfcOm5TD6D1IWlO
 uCUx4xLMBudw/55HusmUSdiomQ3Xg5UdRaU7vX5OYwPbdoWebjEUfgdP3jCA/TiW
 mZTMv3x9hOvp+EOS/UnS469cERvg1/KfwcdOQsWL0HsCFZnu2XmQHWPD++vovLNp
 F1892ij875mC6C6mOR60H2nyjIiCuqWh/8eKBkp65CARCbFDYxWhqBnmcmTvoquh
 E87pQDPdtgXc0KlOWCABh5bYOu1WGVEXE5f3ixtdY9cQakkSI3NkFKw27/mIWS4q
 TCsagByNnPFDXTglb1dJopNdluLMFi1iXhRJX78R/PYaHxf4uFafWcQk1U7eDdLg
 1kPANggwYe4KNAQZUvRhH7lIPWHCH0r4c1qHV+FsiOZVoDOSKHo4RW1ZFtirJSNW
 LNJMdk+8xyae0S7z164EpZB/tpFttX4gl3YvUT/T+4gH8+CRFAaoAlK39CoGDPpk
 f+P2GE1Z5YupF16YjpZtBnan55KkU1b6eORl5zpnAtoaz5WGXqj1t4qo0Q6e9WB9
 X4rdDVhH7vRUmhjmSP6PuEygb84hnITLdGpkH2BmWj/4uYuCN+p+U2B2o/QdMJoo
 cPxdflLOU/+1gfAFYPtHVjVKCqzhwbw3iLXQpO12gzRYqE13rUnAr7RuGDf5fBVC
 LW7Pv81o
 =DKhx
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 6.6

- Add support for TLB range invalidation of Stage-2 page tables,
  avoiding unnecessary invalidations. Systems that do not implement
  range invalidation still rely on a full invalidation when dealing
  with large ranges.

- Add infrastructure for forwarding traps taken from a L2 guest to
  the L1 guest, with L0 acting as the dispatcher, another baby step
  towards the full nested support.

- Simplify the way we deal with the (long deprecated) 'CPU target',
  resulting in a much needed cleanup.

- Fix another set of PMU bugs, both on the guest and host sides,
  as we seem to never have any shortage of those...

- Relax the alignment requirements of EL2 VA allocations for
  non-stack allocations, as we were otherwise wasting a lot of that
  precious VA space.

- The usual set of non-functional cleanups, although I note the lack
  of spelling fixes...
2023-08-31 13:18:53 -04:00
Palmer Dabbelt
94f00388c2
Merge patch series "RISC-V: mm: Make SV48 the default address space"
Charlie Jenkins <charlie@rivosinc.com> says:

Make sv48 the default address space for mmap as some applications
currently depend on this assumption. Users can now select a
desired address space using a non-zero hint address to mmap. Previously,
requesting the default address space from mmap by passing zero as the hint
address would result in using the largest address space possible. Some
applications depend on empty bits in the virtual address space, like Go and
Java, so this patch provides more flexibility for application developers.

* b4-shazam-merge:
  RISC-V: mm: Document mmap changes
  RISC-V: mm: Update pgtable comment documentation
  RISC-V: mm: Add tests for RISC-V mm
  RISC-V: mm: Restrict address space for sv39,sv48,sv57

Link: https://lore.kernel.org/r/20230809232218.849726-1-charlie@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31 00:18:36 -07:00
Palmer Dabbelt
52b77c2806
Merge patch series "riscv: Reduce ARCH_KMALLOC_MINALIGN to 8"
Jisheng Zhang <jszhang@kernel.org> says:

Currently, riscv defines ARCH_DMA_MINALIGN as L1_CACHE_BYTES, I.E
64Bytes, if CONFIG_RISCV_DMA_NONCOHERENT=y. To support unified kernel
Image, usually we have to enable CONFIG_RISCV_DMA_NONCOHERENT, thus
it brings some bad effects to coherent platforms:

Firstly, it wastes memory, kmalloc-96, kmalloc-32, kmalloc-16 and
kmalloc-8 slab caches don't exist any more, they are replaced with
either kmalloc-128 or kmalloc-64.

Secondly, larger than necessary kmalloc aligned allocations results
in unnecessary cache/TLB pressure.

This issue also exists on arm64 platforms. From last year, Catalin
tried to solve this issue by decoupling ARCH_KMALLOC_MINALIGN from
ARCH_DMA_MINALIGN, limiting kmalloc() minimum alignment to
dma_get_cache_alignment() and replacing ARCH_KMALLOC_MINALIGN usage
in various drivers with ARCH_DMA_MINALIGN etc.[1]

One fact we can make use of for riscv: if the CPU doesn't support
ZICBOM or T-HEAD CMO, we know the platform is coherent. Based on
Catalin's work and above fact, we can easily solve the kmalloc align
issue for riscv: we can override dma_get_cache_alignment(), then let
it return ARCH_DMA_MINALIGN at the beginning and return 1 once we know
the underlying HW neither supports ZICBOM nor supports T-HEAD CMO.

So what about if the CPU supports ZICBOM or T-HEAD CMO, but all the
devices are dma coherent? Well, we use ARCH_DMA_MINALIGN as the
kmalloc minimum alignment, nothing changed in this case. This case
can be improved in the future once we see such platforms in mainline.

After this patch, a simple test of booting to a small buildroot rootfs
on qemu shows:

kmalloc-96           5041    5041     96  ...
kmalloc-64           9606    9606     64  ...
kmalloc-32           5128    5128     32  ...
kmalloc-16           7682    7682     16  ...
kmalloc-8           10246   10246      8  ...

So we save about 1268KB memory. The saving will be much larger in normal
OS env on real HW platforms.

patch1 allows kmalloc() caches aligned to the smallest value.
patch2 enables DMA_BOUNCE_UNALIGNED_KMALLOC.

After this series:

As for coherent platforms, kmalloc-{8,16,32,96} caches come back on
coherent both RV32 and RV64 platforms, I.E !ZICBOM and !THEAD_CMO.

As for noncoherent RV32 platforms, nothing changed.

As for noncoherent RV64 platforms, I.E either ZICBOM or THEAD_CMO, the
above kmalloc caches also come back if > 4GB memory or users pass
"swiotlb=mmnn,force" to force swiotlb creation if <= 4GB memory. How
much mmnn should be depends on the specific platform, it needs to be
tried and tested all possible usage case on the specific hardware. For
example, I can use the minimal I/O TLB slabs on Sipeed M1S Dock.

* b4-shazam-merge:
  riscv: enable DMA_BOUNCE_UNALIGNED_KMALLOC for !dma_coherent
  riscv: allow kmalloc() caches aligned to the smallest value

Link: https://lore.kernel.org/linux-arm-kernel/20230524171904.3967031-1-catalin.marinas@arm.com/ [1]
Link: https://lore.kernel.org/r/20230718152214.2907-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31 00:18:35 -07:00
Jisheng Zhang
4e90d0522a
riscv: support PREEMPT_DYNAMIC with static keys
Currently, each architecture can support PREEMPT_DYNAMIC through
either static calls or static keys. To support PREEMPT_DYNAMIC on
riscv, we face three choices:

1. only add static calls support to riscv
As Mark pointed out in commit 99cf983cc8bc ("sched/preempt: Add
PREEMPT_DYNAMIC using static keys"), static keys "...should have
slightly lower overhead than non-inline static calls, as this
effectively inlines each trampoline into the start of its callee. This
may avoid redundant work, and may integrate better with CFI schemes."
So even we add static calls(without inline static calls) to riscv,
static keys is still a better choice.

2. add static calls and inline static calls to riscv
Per my understanding, inline static calls requires objtool support
which is not easy.

3. use static keys

While riscv doesn't have static calls support, it supports static keys
perfectly. So this patch selects HAVE_PREEMPT_DYNAMIC_KEY to enable
support for PREEMPT_DYNAMIC on riscv, so that the preemption model can
be chosen at boot time. It also patches asm-generic/preempt.h, mainly
to add __preempt_schedule() and __preempt_schedule_notrace() macros
for PREEMPT_DYNAMIC case. Other architectures which use generic
preempt.h can also benefit from this patch by simply selecting
HAVE_PREEMPT_DYNAMIC_KEY to enable PREEMPT_DYNAMIC if they supports
static keys.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230716164925.1858-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31 00:18:34 -07:00
Palmer Dabbelt
150e3c92a1
Merge patch series "riscv: support ELF format binaries in nommu mode"
Greg Ungerer <gerg@kernel.org> says:

The following changes add the ability to run ELF format binaries when
running RISC-V in nommu mode. That support is actually part of the
ELF-FDPIC loader, so these changes are all about making that work on
RISC-V.

The first issue to deal with is making the ELF-FDPIC loader capable of
handling 64-bit ELF files. As coded right now it only supports 32-bit
ELF files.

Secondly some changes are required to enable and compile the ELF-FDPIC
loader on RISC-V and to pass the ELF-FDPIC mapping addresses through to
user space when execing the new program.

These changes have not been used to run actual ELF-FDPIC binaries.
It is used to load and run normal ELF - compiled -pie format. Though the
underlying changes are expected to work with full ELF-FDPIC binaries if
or when that is supported on RISC-V in gcc.

To avoid needing changes to the C-library (tested with uClibc-ng
currently) there is a simple runtime dynamic loader (interpreter)
available to do the final relocations, https://github.com/gregungerer/uldso.
The nice thing about doing it this way is that the same program
binary can also be loaded with the usual ELF loader in MMU linux.

The motivation here is to provide an easy to use alternative to the
flat format binaries normally used for RISC-V nommu based systems.

* b4-shazam-merge:
  riscv: support the elf-fdpic binfmt loader
  binfmt_elf_fdpic: support 64-bit systems

Link: https://lore.kernel.org/r/20230711130754.481209-1-gerg@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31 00:18:33 -07:00
Palmer Dabbelt
7f7d3ea6eb
Merge patch series "riscv: KCFI support"
Sami Tolvanen <samitolvanen@google.com> says:

This series adds KCFI support for RISC-V. KCFI is a fine-grained
forward-edge control-flow integrity scheme supported in Clang >=16,
which ensures indirect calls in instrumented code can only branch to
functions whose type matches the function pointer type, thus making
code reuse attacks more difficult.

Patch 1 implements a pt_regs based syscall wrapper to address
function pointer type mismatches in syscall handling. Patches 2 and 3
annotate indirectly called assembly functions with CFI types. Patch 4
implements error handling for indirect call checks. Patch 5 disables
CFI for arch/riscv/purgatory. Patch 6 finally allows CONFIG_CFI_CLANG
to be enabled for RISC-V.

Note that Clang 16 has a generic architecture-agnostic KCFI
implementation, which does work with the kernel, but doesn't produce
a stable code sequence for indirect call checks, which means
potential failures just trap and won't result in informative error
messages. Clang 17 includes a RISC-V specific back-end implementation
for KCFI, which emits a predictable code sequence for the checks and a
.kcfi_traps section with locations of the traps, which patch 5 uses to
produce more useful errors.

The type mismatch fixes and annotations in the first three patches
also become necessary in future if the kernel decides to support
fine-grained CFI implemented using the hardware landing pad
feature proposed in the in-progress Zicfisslp extension. Once the
specification is ratified and hardware support emerges, implementing
runtime patching support that replaces KCFI instrumentation with
Zicfisslp landing pads might also be feasible (similarly to KCFI to
FineIBT patching on x86_64), allowing distributions to ship a unified
kernel binary for all devices.

* b4-shazam-merge:
  riscv: Allow CONFIG_CFI_CLANG to be selected
  riscv/purgatory: Disable CFI
  riscv: Add CFI error handling
  riscv: Add ftrace_stub_graph
  riscv: Add types to indirectly called assembly functions
  riscv: Implement syscall wrappers

Link: https://lore.kernel.org/r/20230710183544.999540-8-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31 00:18:32 -07:00
Alexandre Ghiti
9bdd924803
riscv: Move create_tmp_mapping() to init sections
This function is only used at boot time so mark it as __init.

Fixes: 96f9d4daf745 ("riscv: Rework kasan population functions")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230704074357.233982-2-alexghiti@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31 00:18:31 -07:00
Alexandre Ghiti
dd7664d67b
riscv: Mark KASAN tmp* page tables variables as static
tmp_pg_dir, tmp_p4d and tmp_pud are only used in kasan_init.c so they
should be declared as static.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306282202.bODptiGE-lkp@intel.com/
Fixes: 96f9d4daf745 ("riscv: Rework kasan population functions")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230704074357.233982-1-alexghiti@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31 00:18:30 -07:00
Ye Xingchen
665c51f644
riscv: mm: use bitmap_zero() API
bitmap_zero() is faster than bitmap_clear(), so use bitmap_zero()
instead of bitmap_clear().

Signed-off-by: Ye Xingchen <ye.xingchen@zte.com.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/202305061711417142802@zte.com.cn
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31 00:18:29 -07:00
Palmer Dabbelt
9389e6715f
Merge patch series "support allocating crashkernel above 4G explicitly on riscv"
Chen Jiahao <chenjiahao16@huawei.com> says:

On riscv, the current crash kernel allocation logic is trying to
allocate within 32bit addressible memory region by default, if
failed, try to allocate without 4G restriction.

In need of saving DMA zone memory while allocating a relatively large
crash kernel region, allocating the reserved memory top down in
high memory, without overlapping the DMA zone, is a mature solution.
Hence this patchset introduces the parameter option crashkernel=X,[high,low].

One can reserve the crash kernel from high memory above DMA zone range
by explicitly passing "crashkernel=X,high"; or reserve a memory range
below 4G with "crashkernel=X,low". Besides, there are few rules need
to take notice:
1. "crashkernel=X,[high,low]" will be ignored if "crashkernel=size"
   is specified.
2. "crashkernel=X,low" is valid only when "crashkernel=X,high" is passed
   and there is enough memory to be allocated under 4G.
3. When allocating crashkernel above 4G and no "crashkernel=X,low" is
   specified, a 128M low memory will be allocated automatically for
   swiotlb bounce buffer.
See Documentation/admin-guide/kernel-parameters.txt for more information.

To verify loading the crashkernel, adapted kexec-tools is attached below:
https://github.com/chenjh005/kexec-tools/tree/build-test-riscv-v2

Following test cases have been performed as expected:
1) crashkernel=256M                          //low=256M
2) crashkernel=1G                            //low=1G
3) crashkernel=4G                            //high=4G, low=128M(default)
4) crashkernel=4G crashkernel=256M,high      //high=4G, low=128M(default), high is ignored
5) crashkernel=4G crashkernel=256M,low       //high=4G, low=128M(default), low is ignored
6) crashkernel=4G,high                       //high=4G, low=128M(default)
7) crashkernel=256M,low                      //low=0M, invalid
8) crashkernel=4G,high crashkernel=256M,low  //high=4G, low=256M
9) crashkernel=4G,high crashkernel=4G,low    //high=0M, low=0M, invalid
10) crashkernel=512M@0xd0000000              //low=512M
11) crashkernel=1G,high crashkernel=0M,low   //high=1G, low=0M

* b4-shazam-merge:
  docs: kdump: Update the crashkernel description for riscv
  riscv: kdump: Implement crashkernel=X,[high,low]

Link: https://lore.kernel.org/r/20230726175000.2536220-1-chenjiahao16@huawei.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31 00:18:28 -07:00
Palmer Dabbelt
82dfb5fde6
Merge patch series "riscv: kprobes: simulate some instructions"
Nam Cao <namcaov@gmail.com> says:

Simulate some currently rejected instructions. Still to be simulated are:
    - c.jal
    - c.ebreak

* b4-shazam-merge:
  riscv: kprobes: simulate c.beqz and c.bnez
  riscv: kprobes: simulate c.jr and c.jalr instructions
  riscv: kprobes: simulate c.j instruction

Link: https://lore.kernel.org/r/cover.1690704360.git.namcaov@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31 00:18:27 -07:00
Nam Cao
6b289a3ffa
riscv: remove redundant mv instructions
Some mv instructions were useful when first introduced to preserve a0 and
a1 before function calls. However the code has changed and they are now
redundant. Remove them.

Signed-off-by: Nam Cao <namcaov@gmail.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230725053835.138910-1-namcaov@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31 00:18:25 -07:00
Linus Torvalds
ef2a0b7cdb Devicetree include cleanups for v6.6:
These are the remaining few clean-ups of DT related includes which
 didn't get applied to subsystem trees.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmTucUoACgkQ+vtdtY28
 YcOYoQ//RwIPeWc74PHQbOb6eQR95eTHDcDE1MR9Fw8amqxFaomGlSMpbyVyP4ag
 8p82c6qfJIZautyEikbKFO+iYjFMua0KuOTMVuDxHErQOl6ym4P4Uk3+1h5stVSj
 IdfK4CACtMKxKBOPAcyxJU6HKoWcUtMKsKV6OLdDh7M2Fy/G4RCjv4w1Xf3VAn59
 VOa0KF7FhHU3dhIB/tGsj0t13+3e3kF5+l4+pdoMoZWhR4gac5FJRxiR5dMZG6jr
 VY8i9FZb7DW2VtY78FVVOaYDDVf4vNrc+0kqnCbWUaKACHPgNXC375LvS7jFGXvc
 HYVN3teqhFxNOyoSehn2bdBVwJxjQFgy2gTt2vRWTa/CaUDES90cue2R9GT2Sz0b
 eBc3DQtNeT5m8mrLkuEfZrJjKjaEy2Pr6FjNDhNcmkJak7dkMMgkG/Y/SpNmpZOe
 2C3T6i4i6FUxni/2/rWHSVLnYBGfhPNdwWAZcQOi8rqtzp3tF46wVa345+Ev3VDG
 ECDndH8Qk3gtOmGyeTIvPc51yDP6Hpuh7+0jydtehkXHB+cUJtR+g0efIGf7BDgo
 sQpa1vRxkOolrCxyzKwcogEY7jjeccv/FM7BwaZQKXEibiKGkxeDuahdwbfvDuVq
 br16Uj9VzG8Jl6KK0gexV7kzZAAdw1y3JqPGUZf7hn4zmk099ow=
 =eLMf
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-header-cleanups-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree include cleanups from Rob Herring:
 "These are the remaining few clean-ups of DT related includes which
  didn't get applied to subsystem trees"

* tag 'devicetree-header-cleanups-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  ipmi: Explicitly include correct DT includes
  tpm: Explicitly include correct DT includes
  lib/genalloc: Explicitly include correct DT includes
  parport: Explicitly include correct DT includes
  sbus: Explicitly include correct DT includes
  mux: Explicitly include correct DT includes
  macintosh: Explicitly include correct DT includes
  hte: Explicitly include correct DT includes
  EDAC: Explicitly include correct DT includes
  clocksource: Explicitly include correct DT includes
  sparc: Explicitly include correct DT includes
  riscv: Explicitly include correct DT includes
2023-08-30 17:04:28 -07:00
Linus Torvalds
0e72db7767 ARM: devicetree updates for 6.6
These are the devicetree updates for Arm and RISC-V based SoCs,
 mainly from Qualcomm, NXP/Freescale, Aspeed, TI, Rockchips,
 Samsung, ST and Starfive.
 
 Only a few new SoC got added:
 
  - TI AM62P5, a variant of the existing Sitara AM62x family
 
  - Intel Agilex5, an FPGFA platform that includes an
    Cortex-A76/A55 SoC.
 
  - Qualcomm ipq5018 is used in wireless access points
 
  - Qualcomm SM4450 (Snapdragon 4 Gen 2) is a new low-end mobile
    phone platform.
 
 In total, 29 machines get added, which is low because of the summer
 break. These cover SoCs from Aspeed, Broadcom, NXP, Samsung, ST,
 Allwinner, Amlogic, Intel, Qualcomm, Rockchip, TI and T-Head.  Most of
 these are development and reference boards.
 
 Despite not adding a lot of new machines, there are over 700 patches in
 total, most of which are cleanups and minor fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmTuZOQACgkQYKtH/8kJ
 UieJGw/5AWNde1VYY/3uhdHSNeXtF+1LICZYoLQiSTWbpnlEoLhQseymRCybWRhd
 882TLCuEMrHV+ftP5z+kdYS+QJE3K8GNRxp4vAgLTRwnZItEE3qD8/jNeTtK5BqT
 7h9OYZJQhfczsIXISIljsvez2T3n5fA1glq2drc2fLl9AbEYvlByDCtCNmzqMYli
 II/S03vJqT3eFnOwut70EdFtnW3FIJ+HJLWu46zWLhrx2XJVS8hNaWqhQ3uKZ58p
 Q9cWAltu4biaILVpD/0vZST+m77HqcrPtVSwR9uHvzEzDEzYk/GYzn7DDlWgdVbt
 DJMkgK1pLJq67KpsefMqgTioCh2FBiWfNQ6FjsgJ06ykXgThLJfbzeeB4zU4fnDT
 YmcV/8gR6Np/wNeSBlPNLI6BZ1EF4h0Lkm6p//QgayYhdMVbE59K+SNwq4wps24l
 JU3dcxxblwpVmAaKSL5p0lbYTn8VKuz9rcXYIziQm1m8zaCwjq863bDFJKz8JsP8
 0tQ/azvS4Off9fKIMbUE4fiFmDGhgLTi0XL+GIlOFJF6JS6ToD2nL4FGRBJZPWNn
 iPNEV0F/dDnonB7Jfu92NodULY6B0mXs5/q+dPwde6oSpIDU2ORyNRb6Zk4fGC0l
 C+iPdb3BErf+GQYBLnRqQaMuV0sA6mN89lC6KlzWwwHK0UUoohg=
 =bO3j
 -----END PGP SIGNATURE-----

Merge tag 'soc-dt-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM devicetree updates from Arnd Bergmann:
 "These are the devicetree updates for Arm and RISC-V based SoCs, mainly
  from Qualcomm, NXP/Freescale, Aspeed, TI, Rockchips, Samsung, ST and
  Starfive.

  Only a few new SoC got added:

   - TI AM62P5, a variant of the existing Sitara AM62x family

   - Intel Agilex5, an FPGFA platform that includes an Cortex-A76/A55
     SoC.

   - Qualcomm ipq5018 is used in wireless access points

   - Qualcomm SM4450 (Snapdragon 4 Gen 2) is a new low-end mobile phone
     platform.

  In total, 29 machines get added, which is low because of the summer
  break. These cover SoCs from Aspeed, Broadcom, NXP, Samsung, ST,
  Allwinner, Amlogic, Intel, Qualcomm, Rockchip, TI and T-Head. Most of
  these are development and reference boards.

  Despite not adding a lot of new machines, there are over 700 patches
  in total, most of which are cleanups and minor fixes"

* tag 'soc-dt-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (735 commits)
  arm64: dts: use capital "OR" for multiple licenses in SPDX
  ARM: dts: use capital "OR" for multiple licenses in SPDX
  arm64: dts: qcom: sdm845-db845c: Mark cont splash memory region as reserved
  ARM: dts: qcom: apq8064: add support to gsbi4 uart
  riscv: dts: change TH1520 files to dual license
  riscv: dts: thead: add BeagleV Ahead board device tree
  dt-bindings: riscv: Add BeagleV Ahead board compatibles
  ARM: dts: stm32: add SCMI PMIC regulators on stm32mp135f-dk board
  ARM: dts: stm32: STM32MP13x SoC exposes SCMI regulators
  dt-bindings: rcc: stm32: add STM32MP13 SCMI regulators IDs
  ARM: dts: stm32: support display on stm32f746-disco board
  ARM: dts: stm32: rename mmc_vcard to vcc-3v3 on stm32f746-disco
  ARM: dts: stm32: add pin map for LTDC on stm32f7
  ARM: dts: stm32: add ltdc support on stm32f746 MCU
  arm64: dts: qcom: sm6350: Hook up PDC as wakeup-parent of TLMM
  arm64: dts: qcom: sdm670: Hook up PDC as wakeup-parent of TLMM
  arm64: dts: qcom: sa8775p: Hook up PDC as wakeup-parent of TLMM
  arm64: dts: qcom: sc8280xp: Hook up PDC as wakeup-parent of TLMM
  arm64: dts: qcom: sdm670: Add PDC
  riscv: dts: starfive: fix jh7110 qspi sort order
  ...
2023-08-30 16:53:46 -07:00
Linus Torvalds
d68b4b6f30 - An extensive rework of kexec and crash Kconfig from Eric DeVolder
("refactor Kconfig to consolidate KEXEC and CRASH options").
 
 - kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a
   couple of macros to args.h").
 
 - gdb feature work from Kuan-Ying Lee ("Add GDB memory helper
   commands").
 
 - vsprintf inclusion rationalization from Andy Shevchenko
   ("lib/vsprintf: Rework header inclusions").
 
 - Switch the handling of kdump from a udev scheme to in-kernel handling,
   by Eric DeVolder ("crash: Kernel handling of CPU and memory hot
   un/plug").
 
 - Many singleton patches to various parts of the tree
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZO2GpAAKCRDdBJ7gKXxA
 juW3AQD1moHzlSN6x9I3tjm5TWWNYFoFL8af7wXDJspp/DWH/AD/TO0XlWWhhbYy
 QHy7lL0Syha38kKLMXTM+bN6YQHi9AU=
 =WJQa
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

 - An extensive rework of kexec and crash Kconfig from Eric DeVolder
   ("refactor Kconfig to consolidate KEXEC and CRASH options")

 - kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a
   couple of macros to args.h")

 - gdb feature work from Kuan-Ying Lee ("Add GDB memory helper
   commands")

 - vsprintf inclusion rationalization from Andy Shevchenko
   ("lib/vsprintf: Rework header inclusions")

 - Switch the handling of kdump from a udev scheme to in-kernel
   handling, by Eric DeVolder ("crash: Kernel handling of CPU and memory
   hot un/plug")

 - Many singleton patches to various parts of the tree

* tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (81 commits)
  document while_each_thread(), change first_tid() to use for_each_thread()
  drivers/char/mem.c: shrink character device's devlist[] array
  x86/crash: optimize CPU changes
  crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
  crash: hotplug support for kexec_load()
  x86/crash: add x86 crash hotplug support
  crash: memory and CPU hotplug sysfs attributes
  kexec: exclude elfcorehdr from the segment digest
  crash: add generic infrastructure for crash hotplug support
  crash: move a few code bits to setup support of crash hotplug
  kstrtox: consistently use _tolower()
  kill do_each_thread()
  nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse
  scripts/bloat-o-meter: count weak symbol sizes
  treewide: drop CONFIG_EMBEDDED
  lockdep: fix static memory detection even more
  lib/vsprintf: declare no_hash_pointers in sprintf.h
  lib/vsprintf: split out sprintf() and friends
  kernel/fork: stop playing lockless games for exe_file replacement
  adfs: delete unused "union adfs_dirtail" definition
  ...
2023-08-29 14:53:51 -07:00
Linus Torvalds
b96a3e9142 - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list")
- Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
   reduces the special-case code for handling hugetlb pages in GUP.  It
   also speeds up GUP handling of transparent hugepages.
 
 - Peng Zhang provides some maple tree speedups ("Optimize the fast path
   of mas_store()").
 
 - Sergey Senozhatsky has improved te performance of zsmalloc during
   compaction (zsmalloc: small compaction improvements").
 
 - Domenico Cerasuolo has developed additional selftest code for zswap
   ("selftests: cgroup: add zswap test program").
 
 - xu xin has doe some work on KSM's handling of zero pages.  These
   changes are mainly to enable the user to better understand the
   effectiveness of KSM's treatment of zero pages ("ksm: support tracking
   KSM-placed zero-pages").
 
 - Jeff Xu has fixes the behaviour of memfd's
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").
 
 - David Howells has fixed an fscache optimization ("mm, netfs, fscache:
   Stop read optimisation when folio removed from pagecache").
 
 - Axel Rasmussen has given userfaultfd the ability to simulate memory
   poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD").
 
 - Miaohe Lin has contributed some routine maintenance work on the
   memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
   check").
 
 - Peng Zhang has contributed some maintenance work on the maple tree
   code ("Improve the validation for maple tree and some cleanup").
 
 - Hugh Dickins has optimized the collapsing of shmem or file pages into
   THPs ("mm: free retracted page table by RCU").
 
 - Jiaqi Yan has a patch series which permits us to use the healthy
   subpages within a hardware poisoned huge page for general purposes
   ("Improve hugetlbfs read on HWPOISON hugepages").
 
 - Kemeng Shi has done some maintenance work on the pagetable-check code
   ("Remove unused parameters in page_table_check").
 
 - More folioification work from Matthew Wilcox ("More filesystem folio
   conversions for 6.6"), ("Followup folio conversions for zswap").  And
   from ZhangPeng ("Convert several functions in page_io.c to use a
   folio").
 
 - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").
 
 - Baoquan He has converted some architectures to use the GENERIC_IOREMAP
   ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take
   GENERIC_IOREMAP way").
 
 - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
   batched/deferred tlb shootdown during page reclamation/migration").
 
 - Better maple tree lockdep checking from Liam Howlett ("More strict
   maple tree lockdep").  Liam also developed some efficiency improvements
   ("Reduce preallocations for maple tree").
 
 - Cleanup and optimization to the secondary IOMMU TLB invalidation, from
   Alistair Popple ("Invalidate secondary IOMMU TLB on permission
   upgrade").
 
 - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
   for arm64").
 
 - Kemeng Shi provides some maintenance work on the compaction code ("Two
   minor cleanups for compaction").
 
 - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most
   file-backed faults under the VMA lock").
 
 - Aneesh Kumar contributes code to use the vmemmap optimization for DAX
   on ppc64, under some circumstances ("Add support for DAX vmemmap
   optimization for ppc64").
 
 - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
   data in page_ext"), ("minor cleanups to page_ext header").
 
 - Some zswap cleanups from Johannes Weiner ("mm: zswap: three
   cleanups").
 
 - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").
 
 - VMA handling cleanups from Kefeng Wang ("mm: convert to
   vma_is_initial_heap/stack()").
 
 - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
   implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
   address ranges and DAMON monitoring targets").
 
 - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").
 
 - Liam Howlett has improved the maple tree node replacement code
   ("maple_tree: Change replacement strategy").
 
 - ZhangPeng has a general code cleanup - use the K() macro more widely
   ("cleanup with helper macro K()").
 
 - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap
   on memory feature on ppc64").
 
 - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
   in page_alloc"), ("Two minor cleanups for get pageblock migratetype").
 
 - Vishal Moola introduces a memory descriptor for page table tracking,
   "struct ptdesc" ("Split ptdesc from struct page").
 
 - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
   for vm.memfd_noexec").
 
 - MM include file rationalization from Hugh Dickins ("arch: include
   asm/cacheflush.h in asm/hugetlb.h").
 
 - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
   output").
 
 - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
   object_cache instead of kmemleak_initialized").
 
 - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
   and _folio_order").
 
 - A VMA locking scalability improvement from Suren Baghdasaryan
   ("Per-VMA lock support for swap and userfaults").
 
 - pagetable handling cleanups from Matthew Wilcox ("New page table range
   API").
 
 - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
   using page->private on tail pages for THP_SWAP + cleanups").
 
 - Cleanups and speedups to the hugetlb fault handling from Matthew
   Wilcox ("Change calling convention for ->huge_fault").
 
 - Matthew Wilcox has also done some maintenance work on the MM subsystem
   documentation ("Improve mm documentation").
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZO1JUQAKCRDdBJ7gKXxA
 jrMwAP47r/fS8vAVT3zp/7fXmxaJYTK27CTAM881Gw1SDhFM/wEAv8o84mDenCg6
 Nfio7afS1ncD+hPYT8947UnLxTgn+ww=
 =Afws
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Some swap cleanups from Ma Wupeng ("fix WARN_ON in
   add_to_avail_list")

 - Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
   reduces the special-case code for handling hugetlb pages in GUP. It
   also speeds up GUP handling of transparent hugepages.

 - Peng Zhang provides some maple tree speedups ("Optimize the fast path
   of mas_store()").

 - Sergey Senozhatsky has improved te performance of zsmalloc during
   compaction (zsmalloc: small compaction improvements").

 - Domenico Cerasuolo has developed additional selftest code for zswap
   ("selftests: cgroup: add zswap test program").

 - xu xin has doe some work on KSM's handling of zero pages. These
   changes are mainly to enable the user to better understand the
   effectiveness of KSM's treatment of zero pages ("ksm: support
   tracking KSM-placed zero-pages").

 - Jeff Xu has fixes the behaviour of memfd's
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").

 - David Howells has fixed an fscache optimization ("mm, netfs, fscache:
   Stop read optimisation when folio removed from pagecache").

 - Axel Rasmussen has given userfaultfd the ability to simulate memory
   poisoning ("add UFFDIO_POISON to simulate memory poisoning with
   UFFD").

 - Miaohe Lin has contributed some routine maintenance work on the
   memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
   check").

 - Peng Zhang has contributed some maintenance work on the maple tree
   code ("Improve the validation for maple tree and some cleanup").

 - Hugh Dickins has optimized the collapsing of shmem or file pages into
   THPs ("mm: free retracted page table by RCU").

 - Jiaqi Yan has a patch series which permits us to use the healthy
   subpages within a hardware poisoned huge page for general purposes
   ("Improve hugetlbfs read on HWPOISON hugepages").

 - Kemeng Shi has done some maintenance work on the pagetable-check code
   ("Remove unused parameters in page_table_check").

 - More folioification work from Matthew Wilcox ("More filesystem folio
   conversions for 6.6"), ("Followup folio conversions for zswap"). And
   from ZhangPeng ("Convert several functions in page_io.c to use a
   folio").

 - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").

 - Baoquan He has converted some architectures to use the
   GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert
   architectures to take GENERIC_IOREMAP way").

 - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
   batched/deferred tlb shootdown during page reclamation/migration").

 - Better maple tree lockdep checking from Liam Howlett ("More strict
   maple tree lockdep"). Liam also developed some efficiency
   improvements ("Reduce preallocations for maple tree").

 - Cleanup and optimization to the secondary IOMMU TLB invalidation,
   from Alistair Popple ("Invalidate secondary IOMMU TLB on permission
   upgrade").

 - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
   for arm64").

 - Kemeng Shi provides some maintenance work on the compaction code
   ("Two minor cleanups for compaction").

 - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle
   most file-backed faults under the VMA lock").

 - Aneesh Kumar contributes code to use the vmemmap optimization for DAX
   on ppc64, under some circumstances ("Add support for DAX vmemmap
   optimization for ppc64").

 - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
   data in page_ext"), ("minor cleanups to page_ext header").

 - Some zswap cleanups from Johannes Weiner ("mm: zswap: three
   cleanups").

 - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").

 - VMA handling cleanups from Kefeng Wang ("mm: convert to
   vma_is_initial_heap/stack()").

 - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
   implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
   address ranges and DAMON monitoring targets").

 - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").

 - Liam Howlett has improved the maple tree node replacement code
   ("maple_tree: Change replacement strategy").

 - ZhangPeng has a general code cleanup - use the K() macro more widely
   ("cleanup with helper macro K()").

 - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for
   memmap on memory feature on ppc64").

 - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
   in page_alloc"), ("Two minor cleanups for get pageblock
   migratetype").

 - Vishal Moola introduces a memory descriptor for page table tracking,
   "struct ptdesc" ("Split ptdesc from struct page").

 - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
   for vm.memfd_noexec").

 - MM include file rationalization from Hugh Dickins ("arch: include
   asm/cacheflush.h in asm/hugetlb.h").

 - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
   output").

 - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
   object_cache instead of kmemleak_initialized").

 - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
   and _folio_order").

 - A VMA locking scalability improvement from Suren Baghdasaryan
   ("Per-VMA lock support for swap and userfaults").

 - pagetable handling cleanups from Matthew Wilcox ("New page table
   range API").

 - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
   using page->private on tail pages for THP_SWAP + cleanups").

 - Cleanups and speedups to the hugetlb fault handling from Matthew
   Wilcox ("Change calling convention for ->huge_fault").

 - Matthew Wilcox has also done some maintenance work on the MM
   subsystem documentation ("Improve mm documentation").

* tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits)
  maple_tree: shrink struct maple_tree
  maple_tree: clean up mas_wr_append()
  secretmem: convert page_is_secretmem() to folio_is_secretmem()
  nios2: fix flush_dcache_page() for usage from irq context
  hugetlb: add documentation for vma_kernel_pagesize()
  mm: add orphaned kernel-doc to the rst files.
  mm: fix clean_record_shared_mapping_range kernel-doc
  mm: fix get_mctgt_type() kernel-doc
  mm: fix kernel-doc warning from tlb_flush_rmaps()
  mm: remove enum page_entry_size
  mm: allow ->huge_fault() to be called without the mmap_lock held
  mm: move PMD_ORDER to pgtable.h
  mm: remove checks for pte_index
  memcg: remove duplication detection for mem_cgroup_uncharge_swap
  mm/huge_memory: work on folio->swap instead of page->private when splitting folio
  mm/swap: inline folio_set_swap_entry() and folio_swap_entry()
  mm/swap: use dedicated entry for swap in folio
  mm/swap: stop using page->private on tail pages for THP_SWAP
  selftests/mm: fix WARNING comparing pointer to 0
  selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check
  ...
2023-08-29 14:25:26 -07:00
Linus Torvalds
bd6c11bc43 Networking changes for 6.6.
Core
 ----
 
  - Increase size limits for to-be-sent skb frag allocations. This
    allows tun, tap devices and packet sockets to better cope with large
    writes operations.
 
  - Store netdevs in an xarray, to simplify iterating over netdevs.
 
  - Refactor nexthop selection for multipath routes.
 
  - Improve sched class lifetime handling.
 
  - Add backup nexthop ID support for bridge.
 
  - Implement drop reasons support in openvswitch.
 
  - Several data races annotations and fixes.
 
  - Constify the sk parameter of routing functions.
 
  - Prepend kernel version to netconsole message.
 
 Protocols
 ---------
 
  - Implement support for TCP probing the peer being under memory
    pressure.
 
  - Remove hard coded limitation on IPv6 specific info placement
    inside the socket struct.
 
  - Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated
    per socket scaling factor.
 
  - Scaling-up the IPv6 expired route GC via a separated list of
    expiring routes.
 
  - In-kernel support for the TLS alert protocol.
 
  - Better support for UDP reuseport with connected sockets.
 
  - Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR
    header size.
 
  - Get rid of additional ancillary per MPTCP connection struct socket.
 
  - Implement support for BPF-based MPTCP packet schedulers.
 
  - Format MPTCP subtests selftests results in TAP.
 
  - Several new SMC 2.1 features including unique experimental options,
    max connections per lgr negotiation, max links per lgr negotiation.
 
 BPF
 ---
 
  - Multi-buffer support in AF_XDP.
 
  - Add multi uprobe BPF links for attaching multiple uprobes
    and usdt probes, which is significantly faster and saves extra fds.
 
  - Implement an fd-based tc BPF attach API (TCX) and BPF link support on
    top of it.
 
  - Add SO_REUSEPORT support for TC bpf_sk_assign.
 
  - Support new instructions from cpu v4 to simplify the generated code and
    feature completeness, for x86, arm64, riscv64.
 
  - Support defragmenting IPv(4|6) packets in BPF.
 
  - Teach verifier actual bounds of bpf_get_smp_processor_id()
    and fix perf+libbpf issue related to custom section handling.
 
  - Introduce bpf map element count and enable it for all program types.
 
  - Add a BPF hook in sys_socket() to change the protocol ID
    from IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy.
 
  - Introduce bpf_me_mcache_free_rcu() and fix OOM under stress.
 
  - Add uprobe support for the bpf_get_func_ip helper.
 
  - Check skb ownership against full socket.
 
  - Support for up to 12 arguments in BPF trampoline.
 
  - Extend link_info for kprobe_multi and perf_event links.
 
 Netfilter
 ---------
 
  - Speed-up process exit by aborting ruleset validation if a
    fatal signal is pending.
 
  - Allow NLA_POLICY_MASK to be used with BE16/BE32 types.
 
 Driver API
 ----------
 
  - Page pool optimizations, to improve data locality and cache usage.
 
  - Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the need
    for raw ioctl() handling in drivers.
 
  - Simplify genetlink dump operations (doit/dumpit) providing them
    the common information already populated in struct genl_info.
 
  - Extend and use the yaml devlink specs to [re]generate the split ops.
 
  - Introduce devlink selective dumps, to allow SF filtering SF based on
    handle and other attributes.
 
  - Add yaml netlink spec for netlink-raw families, allow route, link and
    address related queries via the ynl tool.
 
  - Remove phylink legacy mode support.
 
  - Support offload LED blinking to phy.
 
  - Add devlink port function attributes for IPsec.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - Broadcom ASP 2.0 (72165) ethernet controller
    - MediaTek MT7988 SoC
    - Texas Instruments AM654 SoC
    - Texas Instruments IEP driver
    - Atheros qca8081 phy
    - Marvell 88Q2110 phy
    - NXP TJA1120 phy
 
  - WiFi:
    - MediaTek mt7981 support
 
  - Can:
    - Kvaser SmartFusion2 PCI Express devices
    - Allwinner T113 controllers
    - Texas Instruments tcan4552/4553 chips
 
  - Bluetooth:
    - Intel Gale Peak
    - Qualcomm WCN3988 and WCN7850
    - NXP AW693 and IW624
    - Mediatek MT2925
 
 Drivers
 -------
 
  - Ethernet NICs:
    - nVidia/Mellanox:
      - mlx5:
        - support UDP encapsulation in packet offload mode
        - IPsec packet offload support in eswitch mode
        - improve aRFS observability by adding new set of counters
        - extends MACsec offload support to cover RoCE traffic
        - dynamic completion EQs
      - mlx4:
        - convert to use auxiliary bus instead of custom interface logic
    - Intel
      - ice:
        - implement switchdev bridge offload, even for LAG interfaces
        - implement SRIOV support for LAG interfaces
      - igc:
        - add support for multiple in-flight TX timestamps
    - Broadcom:
      - bnxt:
        - use the unified RX page pool buffers for XDP and non-XDP
        - use the NAPI skb allocation cache
    - OcteonTX2:
      - support Round Robin scheduling HTB offload
      - TC flower offload support for SPI field
    - Freescale:
      -  add XDP_TX feature support
    - AMD:
      - ionic: add support for PCI FLR event
      - sfc:
        - basic conntrack offload
        - introduce eth, ipv4 and ipv6 pedit offloads
    - ST Microelectronics:
      - stmmac: maximze PTP timestamping resolution
 
  - Virtual NICs:
    - Microsoft vNIC:
      - batch ringing RX queue doorbell on receiving packets
      - add page pool for RX buffers
    - Virtio vNIC:
      - add per queue interrupt coalescing support
    - Google vNIC:
      - add queue-page-list mode support
 
  - Ethernet high-speed switches:
    - nVidia/Mellanox (mlxsw):
      - add port range matching tc-flower offload
      - permit enslavement to netdevices with uppers
 
  - Ethernet embedded switches:
    - Marvell (mv88e6xxx):
      - convert to phylink_pcs
    - Renesas:
      - r8A779fx: add speed change support
      - rzn1: enables vlan support
 
  - Ethernet PHYs:
    - convert mv88e6xxx to phylink_pcs
 
  - WiFi:
    - Qualcomm Wi-Fi 7 (ath12k):
      - extremely High Throughput (EHT) PHY support
    - RealTek (rtl8xxxu):
      - enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU),
        RTL8192EU and RTL8723BU
    - RealTek (rtw89):
      - Introduce Time Averaged SAR (TAS) support
 
  - Connector:
    - support for event filtering
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmTt1ZoSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkgFUP/REFaYWdWUvAzmWeezyx9dqgZMfSOjWq
 9QvySiA94OAOcjIYkb7wfzQ5BBAZqaBQ/f8XqWwS1EDDDEBs8sP1cxmABKwW7Hsr
 qFRu2sOqLzKBk223d0jIgEocfQaFpGbF71gXoTlDivBjBi5UxWm9bF0XnbYWcKgO
 /QEvzNosi9uNdi85Fzmv62J6YzAdidEpwGsM7X2CfejwNRmStxAEg/NwvRR0Hyiq
 OJCo97omEgTRaUle8nc64PDx33u4h5kQ1BkaeHEv0rbE3hftFC2YPKn/InmqSFGz
 6ew2xnrGPR37LCuAiCcIIv6yR7K0eu0iYJ7jXwZxBDqxGavEPuwWGBoCP6qFiitH
 ZLWhIrAUrdmSbySkTOCONhJ475qFAuQoYHYpZnX/bJZUHlSsb/9lwDJYJQGpVfd1
 /daqJVSb7lhaifmNO1iNd/ibCIXq9zapwtkRwA897M8GkZBTsnVvazFld1Em+Se3
 Bx6DSDUVBqVQ9fpZG2IAGD6odDwOzC1lF2IoceFvK9Ff6oE0psI+A0qNLMkHxZbW
 Qlo7LsNe53hpoCC+yHTfXX7e/X8eNt0EnCGOQJDusZ0Nr3K7H4LKFA0i8UBUK05n
 4lKnnaSQW7GQgdofLWt103OMDR9GoDxpFsm7b1X9+AEk6Fz6tq50wWYeMZETUKYP
 DCW8VGFOZjZM
 =9CsR
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
 "Core:

   - Increase size limits for to-be-sent skb frag allocations. This
     allows tun, tap devices and packet sockets to better cope with
     large writes operations

   - Store netdevs in an xarray, to simplify iterating over netdevs

   - Refactor nexthop selection for multipath routes

   - Improve sched class lifetime handling

   - Add backup nexthop ID support for bridge

   - Implement drop reasons support in openvswitch

   - Several data races annotations and fixes

   - Constify the sk parameter of routing functions

   - Prepend kernel version to netconsole message

  Protocols:

   - Implement support for TCP probing the peer being under memory
     pressure

   - Remove hard coded limitation on IPv6 specific info placement inside
     the socket struct

   - Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per
     socket scaling factor

   - Scaling-up the IPv6 expired route GC via a separated list of
     expiring routes

   - In-kernel support for the TLS alert protocol

   - Better support for UDP reuseport with connected sockets

   - Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR
     header size

   - Get rid of additional ancillary per MPTCP connection struct socket

   - Implement support for BPF-based MPTCP packet schedulers

   - Format MPTCP subtests selftests results in TAP

   - Several new SMC 2.1 features including unique experimental options,
     max connections per lgr negotiation, max links per lgr negotiation

  BPF:

   - Multi-buffer support in AF_XDP

   - Add multi uprobe BPF links for attaching multiple uprobes and usdt
     probes, which is significantly faster and saves extra fds

   - Implement an fd-based tc BPF attach API (TCX) and BPF link support
     on top of it

   - Add SO_REUSEPORT support for TC bpf_sk_assign

   - Support new instructions from cpu v4 to simplify the generated code
     and feature completeness, for x86, arm64, riscv64

   - Support defragmenting IPv(4|6) packets in BPF

   - Teach verifier actual bounds of bpf_get_smp_processor_id() and fix
     perf+libbpf issue related to custom section handling

   - Introduce bpf map element count and enable it for all program types

   - Add a BPF hook in sys_socket() to change the protocol ID from
     IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy

   - Introduce bpf_me_mcache_free_rcu() and fix OOM under stress

   - Add uprobe support for the bpf_get_func_ip helper

   - Check skb ownership against full socket

   - Support for up to 12 arguments in BPF trampoline

   - Extend link_info for kprobe_multi and perf_event links

  Netfilter:

   - Speed-up process exit by aborting ruleset validation if a fatal
     signal is pending

   - Allow NLA_POLICY_MASK to be used with BE16/BE32 types

  Driver API:

   - Page pool optimizations, to improve data locality and cache usage

   - Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the
     need for raw ioctl() handling in drivers

   - Simplify genetlink dump operations (doit/dumpit) providing them the
     common information already populated in struct genl_info

   - Extend and use the yaml devlink specs to [re]generate the split ops

   - Introduce devlink selective dumps, to allow SF filtering SF based
     on handle and other attributes

   - Add yaml netlink spec for netlink-raw families, allow route, link
     and address related queries via the ynl tool

   - Remove phylink legacy mode support

   - Support offload LED blinking to phy

   - Add devlink port function attributes for IPsec

  New hardware / drivers:

   - Ethernet:
      - Broadcom ASP 2.0 (72165) ethernet controller
      - MediaTek MT7988 SoC
      - Texas Instruments AM654 SoC
      - Texas Instruments IEP driver
      - Atheros qca8081 phy
      - Marvell 88Q2110 phy
      - NXP TJA1120 phy

   - WiFi:
      - MediaTek mt7981 support

   - Can:
      - Kvaser SmartFusion2 PCI Express devices
      - Allwinner T113 controllers
      - Texas Instruments tcan4552/4553 chips

   - Bluetooth:
      - Intel Gale Peak
      - Qualcomm WCN3988 and WCN7850
      - NXP AW693 and IW624
      - Mediatek MT2925

  Drivers:

   - Ethernet NICs:
      - nVidia/Mellanox:
         - mlx5:
            - support UDP encapsulation in packet offload mode
            - IPsec packet offload support in eswitch mode
            - improve aRFS observability by adding new set of counters
            - extends MACsec offload support to cover RoCE traffic
            - dynamic completion EQs
         - mlx4:
            - convert to use auxiliary bus instead of custom interface
              logic
      - Intel
         - ice:
            - implement switchdev bridge offload, even for LAG
              interfaces
            - implement SRIOV support for LAG interfaces
         - igc:
            - add support for multiple in-flight TX timestamps
      - Broadcom:
         - bnxt:
            - use the unified RX page pool buffers for XDP and non-XDP
            - use the NAPI skb allocation cache
      - OcteonTX2:
         - support Round Robin scheduling HTB offload
         - TC flower offload support for SPI field
      - Freescale:
         - add XDP_TX feature support
      - AMD:
         - ionic: add support for PCI FLR event
         - sfc:
            - basic conntrack offload
            - introduce eth, ipv4 and ipv6 pedit offloads
      - ST Microelectronics:
         - stmmac: maximze PTP timestamping resolution

   - Virtual NICs:
      - Microsoft vNIC:
         - batch ringing RX queue doorbell on receiving packets
         - add page pool for RX buffers
      - Virtio vNIC:
         - add per queue interrupt coalescing support
      - Google vNIC:
         - add queue-page-list mode support

   - Ethernet high-speed switches:
      - nVidia/Mellanox (mlxsw):
         - add port range matching tc-flower offload
         - permit enslavement to netdevices with uppers

   - Ethernet embedded switches:
      - Marvell (mv88e6xxx):
         - convert to phylink_pcs
      - Renesas:
         - r8A779fx: add speed change support
         - rzn1: enables vlan support

   - Ethernet PHYs:
      - convert mv88e6xxx to phylink_pcs

   - WiFi:
      - Qualcomm Wi-Fi 7 (ath12k):
         - extremely High Throughput (EHT) PHY support
      - RealTek (rtl8xxxu):
         - enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU),
           RTL8192EU and RTL8723BU
      - RealTek (rtw89):
         - Introduce Time Averaged SAR (TAS) support

   - Connector:
      - support for event filtering"

* tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1806 commits)
  net: ethernet: mtk_wed: minor change in wed_{tx,rx}info_show
  net: ethernet: mtk_wed: add some more info in wed_txinfo_show handler
  net: stmmac: clarify difference between "interface" and "phy_interface"
  r8152: add vendor/device ID pair for D-Link DUB-E250
  devlink: move devlink_notify_register/unregister() to dev.c
  devlink: move small_ops definition into netlink.c
  devlink: move tracepoint definitions into core.c
  devlink: push linecard related code into separate file
  devlink: push rate related code into separate file
  devlink: push trap related code into separate file
  devlink: use tracepoint_enabled() helper
  devlink: push region related code into separate file
  devlink: push param related code into separate file
  devlink: push resource related code into separate file
  devlink: push dpipe related code into separate file
  devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper
  devlink: push shared buffer related code into separate file
  devlink: push port related code into separate file
  devlink: push object register/unregister notifications into separate helpers
  inet: fix IP_TRANSPARENT error handling
  ...
2023-08-29 11:33:01 -07:00
Linus Torvalds
d7dd9b449f EFI updates for v6.6
- one bugfix for x86 mixed mode that did not make it into v6.5
 - first pass of cleanup for the EFI runtime wrappers
 - some cosmetic touchups
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQQm/3uucuRGn1Dmh0wbglWLn0tXAUCZOx+LAAKCRAwbglWLn0t
 XKzLAPwK8wIZ14NlC55NCtdvKEzK3N/muxKRqg2MAHfrbnREFgD9GgUxSbIEU2gz
 BXQM9GLaP86qCXkZlPYktjP6RVfDYAk=
 =e2mx
 -----END PGP SIGNATURE-----

Merge tag 'efi-next-for-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:
 "This primarily covers some cleanup work on the EFI runtime wrappers,
  which are shared between all EFI architectures except Itanium, and
  which provide some level of isolation to prevent faults occurring in
  the firmware code (which runs at the same privilege level as the
  kernel) from bringing down the system.

  Beyond that, there is a fix that did not make it into v6.5, and some
  doc fixes and dead code cleanup.

   - one bugfix for x86 mixed mode that did not make it into v6.5

   - first pass of cleanup for the EFI runtime wrappers

   - some cosmetic touchups"

* tag 'efi-next-for-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  x86/efistub: Fix PCI ROM preservation in mixed mode
  efi/runtime-wrappers: Clean up white space and add __init annotation
  acpi/prmt: Use EFI runtime sandbox to invoke PRM handlers
  efi/runtime-wrappers: Don't duplicate setup/teardown code
  efi/runtime-wrappers: Remove duplicated macro for service returning void
  efi/runtime-wrapper: Move workqueue manipulation out of line
  efi/runtime-wrappers: Use type safe encapsulation of call arguments
  efi/riscv: Move EFI runtime call setup/teardown helpers out of line
  efi/arm64: Move EFI runtime call setup/teardown helpers out of line
  efi/riscv: libstub: Fix comment about absolute relocation
  efi: memmap: Remove kernel-doc warnings
  efi: Remove unused extern declaration efi_lookup_mapped_addr()
2023-08-28 16:25:45 -07:00
Rob Herring
c893884691 riscv: Explicitly include correct DT includes
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it was merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.

Link: https://lore.kernel.org/r/20230714174043.4040561-1-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
2023-08-28 13:30:52 -05:00
Jakub Kicinski
bebfbf07c7 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZOjkTAAKCRDbK58LschI
 gx32AP9gaaHFBtOYBfoenKTJfMgv1WhtQHIBas+WN9ItmBx9MAEA4gm/VyQ6oD7O
 EBjJKJQ2CZ/QKw7cNacXw+l5jF7/+Q0=
 =8P7g
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-08-25

We've added 87 non-merge commits during the last 8 day(s) which contain
a total of 104 files changed, 3719 insertions(+), 4212 deletions(-).

The main changes are:

1) Add multi uprobe BPF links for attaching multiple uprobes
   and usdt probes, which is significantly faster and saves extra fds,
   from Jiri Olsa.

2) Add support BPF cpu v4 instructions for arm64 JIT compiler,
   from Xu Kuohai.

3) Add support BPF cpu v4 instructions for riscv64 JIT compiler,
   from Pu Lehui.

4) Fix LWT BPF xmit hooks wrt their return values where propagating
   the result from skb_do_redirect() would trigger a use-after-free,
   from Yan Zhai.

5) Fix a BPF verifier issue related to bpf_kptr_xchg() with local kptr
   where the map's value kptr type and locally allocated obj type
   mismatch, from Yonghong Song.

6) Fix BPF verifier's check_func_arg_reg_off() function wrt graph
   root/node which bypassed reg->off == 0 enforcement,
   from Kumar Kartikeya Dwivedi.

7) Lift BPF verifier restriction in networking BPF programs to treat
   comparison of packet pointers not as a pointer leak,
   from Yafang Shao.

8) Remove unmaintained XDP BPF samples as they are maintained
   in xdp-tools repository out of tree, from Toke Høiland-Jørgensen.

9) Batch of fixes for the tracing programs from BPF samples in order
   to make them more libbpf-aware, from Daniel T. Lee.

10) Fix a libbpf signedness determination bug in the CO-RE relocation
    handling logic, from Andrii Nakryiko.

11) Extend libbpf to support CO-RE kfunc relocations. Also follow-up
    fixes for bpf_refcount shared ownership implementation,
    both from Dave Marchevsky.

12) Add a new bpf_object__unpin() API function to libbpf,
    from Daniel Xu.

13) Fix a memory leak in libbpf to also free btf_vmlinux
    when the bpf_object gets closed, from Hao Luo.

14) Small error output improvements to test_bpf module, from Helge Deller.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (87 commits)
  selftests/bpf: Add tests for rbtree API interaction in sleepable progs
  bpf: Allow bpf_spin_{lock,unlock} in sleepable progs
  bpf: Consider non-owning refs to refcounted nodes RCU protected
  bpf: Reenable bpf_refcount_acquire
  bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
  bpf: Consider non-owning refs trusted
  bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire
  selftests/bpf: Enable cpu v4 tests for RV64
  riscv, bpf: Support unconditional bswap insn
  riscv, bpf: Support signed div/mod insns
  riscv, bpf: Support 32-bit offset jmp insn
  riscv, bpf: Support sign-extension mov insns
  riscv, bpf: Support sign-extension load insns
  riscv, bpf: Fix missing exception handling and redundant zext for LDX_B/H/W
  samples/bpf: Add note to README about the XDP utilities moved to xdp-tools
  samples/bpf: Cleanup .gitignore
  samples/bpf: Remove the xdp_sample_pkts utility
  samples/bpf: Remove the xdp1 and xdp2 utilities
  samples/bpf: Remove the xdp_rxq_info utility
  samples/bpf: Remove the xdp_redirect* utilities
  ...
====================

Link: https://lore.kernel.org/r/20230825194319.12727-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-25 18:40:15 -07:00
Linus Torvalds
6f0edbb833 18 hotfixes. 13 are cc:stable and the remainder pertain to post-6.4 issues
or aren't considered suitable for a -stable backport.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZOjuGgAKCRDdBJ7gKXxA
 jkLlAQDY9sYxhQZp1PFLirUIPeOBjEyifVy6L6gCfk9j0snLggEA2iK+EtuJt2Dc
 SlMfoTq29zyU/YgfKKwZEVKtPJZOHQU=
 =oTcj
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2023-08-25-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "18 hotfixes. 13 are cc:stable and the remainder pertain to post-6.4
  issues or aren't considered suitable for a -stable backport"

* tag 'mm-hotfixes-stable-2023-08-25-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  shmem: fix smaps BUG sleeping while atomic
  selftests: cachestat: catch failing fsync test on tmpfs
  selftests: cachestat: test for cachestat availability
  maple_tree: disable mas_wr_append() when other readers are possible
  madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check
  madvise:madvise_free_huge_pmd(): don't use mapcount() against large folio for sharing check
  madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check
  mm: multi-gen LRU: don't spin during memcg release
  mm: memory-failure: fix unexpected return value in soft_offline_page()
  radix tree: remove unused variable
  mm: add a call to flush_cache_vmap() in vmap_pfn()
  selftests/mm: FOLL_LONGTERM need to be updated to 0x100
  nilfs2: fix general protection fault in nilfs_lookup_dirty_data_buffers()
  mm/gup: handle cont-PTE hugetlb pages correctly in gup_must_unshare() via GUP-fast
  selftests: cgroup: fix test_kmem_basic less than error
  mm: enable page walking API to lock vmas during the walk
  smaps: use vm_normal_page_pmd() instead of follow_trans_huge_pmd()
  mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
2023-08-25 11:44:43 -07:00
Linus Torvalds
4942fed84b RISC-V Fixes for 6.5-rc8
* The vector ucontext extension has been extended with vlenb.
 * The vector registers ELF core dump note type has been changed to avoid
   aliasing with the CSR type used in embedded systems.
 * Support for accessing vector registers via ptrace() has been reverted.
 * Another build fix for the ISA spec changes around Zifencei/Zicsr that
   manifests on some systems built with binutils-2.37 and gcc-11.2.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmTooW4THHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiXB5EACCEaqKDGfITAIII+KJAZFfvoL9UqgU
 iyywubMbFpcpmiqOM9KmRazfbLhxEU6arU5ZOPbwwf03wcS7/dyIn/fDV7wd/lDx
 K+tN8XE0BoQkehDMKfSGAT2WZSIfzBjoa3zkNIUzKCc9DgXdDe0TPrGuQdft5oaf
 /KqE18CHQmqHrWfbt0Mc+Dpq8YXhw9pOKNA994k2aX5GR9/+wphRoA3JmNa4dzHm
 rkBoOQpirWEz1F12JpGilscdFIJOeTs3WB20rt/zisUOfEZCfjzmdx5amviR+e4X
 ENPDo1TzJVYhKfJfigyYO1pPMJ8EOB3t58sVkGjbfEmy7xa4rz3DVml2rn9CYdf/
 FeazMMo7R74DukQrSOMtiBhIlCNTIz0VKIeL24N9sTNXn7HaDzq45mQL6WVI4JxJ
 RBhvdHl3sOzMfFhB8fdebgAGtRcgBZw+joqCPBu7V37Ros2w1hv8c7Ec2q4gX5Yl
 wdtbV9JLmq4DoIrMnxxr8dgMt4QGc8io0UjvK82qBOQ5tHvSv430OSydcFbicBaU
 mLtxuI3SmlqFIURBrUPjk18B/3RZvSCtoRYgz8wyKU5DKUj7CTP6p+6sKqxM3y9G
 I+rg3SlteAqKWdNk3Tc2qExSIL6hWkOXXYeXr53uSweig7TmC2uutHs7w3hThMMp
 9/iByBaT8H2+dQ==
 =bvaa
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:
 "This is obviously not ideal, particularly for something this late in
  the cycle.

  Unfortunately we found some uABI issues in the vector support while
  reviewing the GDB port, which has triggered a revert -- probably a
  good sign we should have reviewed GDB before merging this, I guess I
  just dropped the ball because I was so worried about the context
  extension and libc suff I forgot. Hence the late revert.

  There's some risk here as we're still exposing the vector context for
  signal handlers, but changing that would have meant reverting all of
  the vector support. The issues we've found so far have been fixed
  already and they weren't absolute showstoppers, so we're essentially
  just playing it safe by holding ptrace support for another release (or
  until we get through a proper userspace code review).

  Summary:

   - The vector ucontext extension has been extended with vlenb

   - The vector registers ELF core dump note type has been changed to
     avoid aliasing with the CSR type used in embedded systems

   - Support for accessing vector registers via ptrace() has been
     reverted

   - Another build fix for the ISA spec changes around Zifencei/Zicsr
     that manifests on some systems built with binutils-2.37 and
     gcc-11.2"

* tag 'riscv-for-linus-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Fix build errors using binutils2.37 toolchains
  RISC-V: vector: export VLENB csr in __sc_riscv_v_state
  RISC-V: Remove ptrace support for vectors
2023-08-25 09:29:47 -07:00
Matthew Wilcox (Oracle)
864609c6a0 riscv: implement the new page table range API
Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().  Change
the PG_dcache_clean flag from being per-page to per-folio.

Link: https://lkml.kernel.org/r/20230802151406.3735276-23-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24 16:20:23 -07:00
Matthew Wilcox (Oracle)
a379322022 mm: convert page_table_check_pte_set() to page_table_check_ptes_set()
Tell the page table check how many PTEs & PFNs we want it to check.

Link: https://lkml.kernel.org/r/20230802151406.3735276-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24 16:20:18 -07:00
Suren Baghdasaryan
4089eef0e6 mm: drop per-VMA lock when returning VM_FAULT_RETRY or VM_FAULT_COMPLETED
handle_mm_fault returning VM_FAULT_RETRY or VM_FAULT_COMPLETED means
mmap_lock has been released.  However with per-VMA locks behavior is
different and the caller should still release it.  To make the rules
consistent for the caller, drop the per-VMA lock when returning
VM_FAULT_RETRY or VM_FAULT_COMPLETED.  Currently the only path returning
VM_FAULT_RETRY under per-VMA locks is do_swap_page and no path returns
VM_FAULT_COMPLETED for now.

[willy@infradead.org: fix riscv]
  Link: https://lkml.kernel.org/r/CAJuCfpE6GWEx1rPBmNpUfoD5o-gNFz9-UFywzCE2PbEGBiVz7g@mail.gmail.com
Link: https://lkml.kernel.org/r/20230630211957.1341547-4-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hillf Danton <hdanton@sina.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michel Lespinasse <michel@lespinasse.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24 16:20:17 -07:00
Mingzheng Xing
ef21fa7c19
riscv: Fix build errors using binutils2.37 toolchains
When building the kernel with binutils 2.37 and GCC-11.1.0/GCC-11.2.0,
the following error occurs:

  Assembler messages:
  Error: cannot find default versions of the ISA extension `zicsr'
  Error: cannot find default versions of the ISA extension `zifencei'

The above error originated from this commit of binutils[0], which has been
resolved and backported by GCC-12.1.0[1] and GCC-11.3.0[2].

So fix this by change the GCC version in
CONFIG_TOOLCHAIN_NEEDS_OLD_ISA_SPEC to GCC-11.3.0.

Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=f0bae2552db1dd4f1995608fbf6648fcee4e9e0c [0]
Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=ca2bbb88f999f4d3cc40e89bc1aba712505dd598 [1]
Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d29f5d6ab513c52fd872f532c492e35ae9fd6671 [2]
Fixes: ca09f772ccca ("riscv: Handle zicsr/zifencei issue between gcc and binutils")
Reported-by: Conor Dooley <conor.dooley@microchip.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn>
Link: https://lore.kernel.org/r/20230824190852.45470-1-xingmingzheng@iscas.ac.cn
Closes: https://lore.kernel.org/all/20230823-captive-abdomen-befd942a4a73@wendy/
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-24 12:35:20 -07:00
Jakub Kicinski
57ce6427e0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

include/net/inet_sock.h
  f866fbc842de ("ipv4: fix data-races around inet->inet_id")
  c274af224269 ("inet: introduce inet->inet_flags")
https://lore.kernel.org/all/679ddff6-db6e-4ff6-b177-574e90d0103d@tessares.net/

Adjacent changes:

drivers/net/bonding/bond_alb.c
  e74216b8def3 ("bonding: fix macvlan over alb bond support")
  f11e5bd159b0 ("bonding: support balance-alb with openvswitch")

drivers/net/ethernet/broadcom/bgmac.c
  d6499f0b7c7c ("net: bgmac: Return PTR_ERR() for fixed_phy_register()")
  23a14488ea58 ("net: bgmac: Fix return value check for fixed_phy_register()")

drivers/net/ethernet/broadcom/genet/bcmmii.c
  32bbe64a1386 ("net: bcmgenet: Fix return value check for fixed_phy_register()")
  acf50d1adbf4 ("net: bcmgenet: Return PTR_ERR() for fixed_phy_register()")

net/sctp/socket.c
  f866fbc842de ("ipv4: fix data-races around inet->inet_id")
  b09bde5c3554 ("inet: move inet->mc_loop to inet->inet_frags")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-24 10:51:39 -07:00
Pu Lehui
83cc63afab riscv, bpf: Support unconditional bswap insn
Add support unconditional bswap instruction. Since riscv is always
little-endian, just treat the unconditional scenario the same as
big-endian conversion.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20230824095001.3408573-7-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-24 09:13:08 -07:00
Pu Lehui
3e18ff4bce riscv, bpf: Support signed div/mod insns
Add support signed div/mod instructions for RV64.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20230824095001.3408573-6-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-24 09:13:08 -07:00
Pu Lehui
d9839f16c1 riscv, bpf: Support 32-bit offset jmp insn
Add support 32-bit offset jmp instruction for RV64.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20230824095001.3408573-5-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-24 09:13:08 -07:00
Pu Lehui
694896ad3c riscv, bpf: Support sign-extension mov insns
Add support sign-extension mov instructions for RV64.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20230824095001.3408573-4-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-24 09:13:08 -07:00
Pu Lehui
3d06d8163f riscv, bpf: Support sign-extension load insns
Add Support sign-extension load instructions for RV64.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20230824095001.3408573-3-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-24 09:13:08 -07:00
Pu Lehui
469fb2c3c1 riscv, bpf: Fix missing exception handling and redundant zext for LDX_B/H/W
For LDX_B/H/W, when zext has been inserted by verifier, it'll return 1,
and no exception handling will continue. Also, when the offset is 12-bit
value, the redundant zext inserted by the verifier is not removed. Fix
both scenarios by moving down the removal of redundant zext.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20230824095001.3408573-2-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-24 09:13:08 -07:00
Charlie Jenkins
26eee2bfc4
RISC-V: mm: Update pgtable comment documentation
sv57 is supported in the kernel so pgtable.h should reflect that.

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230809232218.849726-4-charlie@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23 14:54:14 -07:00
Charlie Jenkins
add2cc6b65
RISC-V: mm: Restrict address space for sv39,sv48,sv57
Make sv48 the default address space for mmap as some applications
currently depend on this assumption. A hint address passed to mmap will
cause the largest address space that fits entirely into the hint to be
used. If the hint is less than or equal to 1<<38, an sv39 address will
be used. An exception is that if the hint address is 0, then a sv48
address will be used. After an address space is completely full, the next
smallest address space will be used.

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20230809232218.849726-2-charlie@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23 14:54:12 -07:00
Jisheng Zhang
f51f7a0fc2
riscv: enable DMA_BOUNCE_UNALIGNED_KMALLOC for !dma_coherent
With the DMA bouncing of unaligned kmalloc() buffers now in place,
enable it for riscv when RISCV_DMA_NONCOHERENT=y to allow the
kmalloc-{8,16,32,96} caches. Since RV32 doesn't enable SWIOTLB
yet, and I didn't see any dma noncoherent RV32 platforms in the
mainline, so skip RV32 now by only enabling
DMA_BOUNCE_UNALIGNED_KMALLOC if SWIOTLB is available. Once we see
such requirement on RV32, we can enable it then.

NOTE: we didn't force to create the swiotlb buffer even when the
end of RAM is within the 32-bit physical address range. That's to
say:
For RV64 with > 4GB memory, the feature is enabled.
For RV64 with <= 4GB memory, the feature isn't enabled by default. We
rely on users to pass "swiotlb=mmnn,force" where mmnn is the Number of
I/O TLB slabs, see kernel-parameters.txt for details.

Tested on Sipeed Lichee Pi 4A with 8GB DDR and Sipeed M1S BL808 Dock
board.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230718152214.2907-3-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23 14:22:01 -07:00
Jisheng Zhang
2926715163
riscv: allow kmalloc() caches aligned to the smallest value
Currently, riscv defines ARCH_DMA_MINALIGN as L1_CACHE_BYTES, I.E
64Bytes, if CONFIG_RISCV_DMA_NONCOHERENT=y. To support unified kernel
Image, usually we have to enable CONFIG_RISCV_DMA_NONCOHERENT, thus
it brings some bad effects to coherent platforms:

Firstly, it wastes memory, kmalloc-96, kmalloc-32, kmalloc-16 and
kmalloc-8 slab caches don't exist any more, they are replaced with
either kmalloc-128 or kmalloc-64.

Secondly, larger than necessary kmalloc aligned allocations results
in unnecessary cache/TLB pressure.

This issue also exists on arm64 platforms. From last year, Catalin
tried to solve this issue by decoupling ARCH_KMALLOC_MINALIGN from
ARCH_DMA_MINALIGN, limiting kmalloc() minimum alignment to
dma_get_cache_alignment() and replacing ARCH_KMALLOC_MINALIGN usage
in various drivers with ARCH_DMA_MINALIGN etc.[1]

One fact we can make use of for riscv: if the CPU doesn't support
ZICBOM or T-HEAD CMO, we know the platform is coherent. Based on
Catalin's work and above fact, we can easily solve the kmalloc align
issue for riscv: we can override dma_get_cache_alignment(), then let
it return ARCH_DMA_MINALIGN at the beginning and return 1 once we know
the underlying HW neither supports ZICBOM nor supports T-HEAD CMO.

So what about if the CPU supports ZICBOM or T-HEAD CMO, but all the
devices are dma coherent? Well, we use ARCH_DMA_MINALIGN as the
kmalloc minimum alignment, nothing changed in this case. This case
can be improved in the future.

After this patch, a simple test of booting to a small buildroot rootfs
on qemu shows:

kmalloc-96           5041    5041     96  ...
kmalloc-64           9606    9606     64  ...
kmalloc-32           5128    5128     32  ...
kmalloc-16           7682    7682     16  ...
kmalloc-8           10246   10246      8  ...

So we save about 1268KB memory. The saving will be much larger in normal
OS env on real HW platforms.

Link: https://lore.kernel.org/linux-arm-kernel/20230524171904.3967031-1-catalin.marinas@arm.com/ [1]

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230718152214.2907-2-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23 14:22:00 -07:00
Greg Ungerer
9549fb354e
riscv: support the elf-fdpic binfmt loader
Add support for enabling and using the binfmt_elf_fdpic program loader
on RISC-V platforms. The most important change is to setup registers
during program load to pass the mapping addresses to the new process.

One of the interesting features of the elf-fdpic loader is that it
also allows appropriately compiled ELF format binaries to be loaded on
nommu systems. Appropriate being those compiled with -pie.

Signed-off-by: Greg Ungerer <gerg@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230711130754.481209-3-gerg@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23 14:17:43 -07:00
Sami Tolvanen
74f8fc31fe
riscv: Allow CONFIG_CFI_CLANG to be selected
Select ARCH_SUPPORTS_CFI_CLANG to allow CFI_CLANG to be selected
on riscv.

Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230710183544.999540-14-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23 14:16:41 -07:00
Sami Tolvanen
a72ab03611
riscv/purgatory: Disable CFI
Filter out CC_FLAGS_CFI when CONFIG_CFI_CLANG.

Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230710183544.999540-13-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23 14:16:40 -07:00
Sami Tolvanen
af0ead42f6
riscv: Add CFI error handling
With CONFIG_CFI_CLANG, the compiler injects a type preamble immediately
before each function and a check to validate the target function type
before indirect calls:

  ; type preamble
    .word <id>
  function:
    ...
  ; indirect call check
    lw      t1, -4(a0)
    lui     t2, <hi20>
    addiw   t2, t2, <lo12>
    beq     t1, t2, .Ltmp0
    ebreak
  .Ltmp0:
    jarl    a0

Implement error handling code for the ebreak traps emitted for the
checks. This produces the following oops on a CFI failure (generated
using lkdtm):

[   21.177245] CFI failure at lkdtm_indirect_call+0x22/0x32 [lkdtm]
(target: lkdtm_increment_int+0x0/0x18 [lkdtm]; expected type: 0x3ad55aca)
[   21.178483] Kernel BUG [#1]
[   21.178671] Modules linked in: lkdtm
[   21.179037] CPU: 1 PID: 104 Comm: sh Not tainted
6.3.0-rc6-00037-g37d5ec6297ab #1
[   21.179511] Hardware name: riscv-virtio,qemu (DT)
[   21.179818] epc : lkdtm_indirect_call+0x22/0x32 [lkdtm]
[   21.180106]  ra : lkdtm_CFI_FORWARD_PROTO+0x48/0x7c [lkdtm]
[   21.180426] epc : ffffffff01387092 ra : ffffffff01386f14 sp : ff20000000453cf0
[   21.180792]  gp : ffffffff81308c38 tp : ff6000000243f080 t0 : ff20000000453b78
[   21.181157]  t1 : 000000003ad55aca t2 : 000000007e0c52a5 s0 : ff20000000453d00
[   21.181506]  s1 : 0000000000000001 a0 : ffffffff0138d170 a1 : ffffffff013870bc
[   21.181819]  a2 : b5fea48dd89aa700 a3 : 0000000000000001 a4 : 0000000000000fff
[   21.182169]  a5 : 0000000000000004 a6 : 00000000000000b7 a7 : 0000000000000000
[   21.182591]  s2 : ff20000000453e78 s3 : ffffffffffffffea s4 : 0000000000000012
[   21.183001]  s5 : ff600000023c7000 s6 : 0000000000000006 s7 : ffffffff013882a0
[   21.183653]  s8 : 0000000000000008 s9 : 0000000000000002 s10: ffffffff0138d878
[   21.184245]  s11: ffffffff0138d878 t3 : 0000000000000003 t4 : 0000000000000000
[   21.184591]  t5 : ffffffff8133df08 t6 : ffffffff8133df07
[   21.184858] status: 0000000000000120 badaddr: 0000000000000000
cause: 0000000000000003
[   21.185415] [<ffffffff01387092>] lkdtm_indirect_call+0x22/0x32 [lkdtm]
[   21.185772] [<ffffffff01386f14>] lkdtm_CFI_FORWARD_PROTO+0x48/0x7c [lkdtm]
[   21.186093] [<ffffffff01383552>] lkdtm_do_action+0x22/0x34 [lkdtm]
[   21.186445] [<ffffffff0138350c>] direct_entry+0x128/0x13a [lkdtm]
[   21.186817] [<ffffffff8033ed8c>] full_proxy_write+0x58/0xb2
[   21.187352] [<ffffffff801d4fe8>] vfs_write+0x14c/0x33a
[   21.187644] [<ffffffff801d5328>] ksys_write+0x64/0xd4
[   21.187832] [<ffffffff801d53a6>] sys_write+0xe/0x1a
[   21.188171] [<ffffffff80003996>] ret_from_syscall+0x0/0x2
[   21.188595] Code: 0513 0f65 a303 ffc5 53b7 7e0c 839b 2a53 0363 0073 (9002) 9582
[   21.189178] ---[ end trace 0000000000000000 ]---
[   21.189590] Kernel panic - not syncing: Fatal exception

Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com> # ISA bits
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230710183544.999540-12-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23 14:16:39 -07:00
Sami Tolvanen
f3a0c23f25
riscv: Add ftrace_stub_graph
Commit 883bbbffa5a4 ("ftrace,kcfi: Separate ftrace_stub() and
ftrace_stub_graph()") added a separate ftrace_stub_graph function for
CFI_CLANG. Add the stub to fix FUNCTION_GRAPH_TRACER compatibility
with CFI.

Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230710183544.999540-11-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23 14:16:38 -07:00
Sami Tolvanen
5f59c6855b
riscv: Add types to indirectly called assembly functions
With CONFIG_CFI_CLANG, assembly functions indirectly called
from C code must be annotated with type identifiers to pass CFI
checking. Use the SYM_TYPED_START macro to add types to the
relevant functions.

Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230710183544.999540-10-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23 14:16:37 -07:00
Sami Tolvanen
08d0ce30e0
riscv: Implement syscall wrappers
Commit f0bddf50586d ("riscv: entry: Convert to generic entry") moved
syscall handling to C code, which exposed function pointer type
mismatches that trip fine-grained forward-edge Control-Flow Integrity
(CFI) checks as syscall handlers are all called through the same
syscall_t pointer type. To fix the type mismatches, implement pt_regs
based syscall wrappers similarly to x86 and arm64.

This patch is based on arm64 syscall wrappers added in commit
4378a7d4be30 ("arm64: implement syscall wrappers"), where the main goal
was to minimize the risk of userspace-controlled values being used
under speculation. This may be a concern for riscv in future as well.

Following other architectures, the syscall wrappers generate three
functions for each syscall; __riscv_<compat_>sys_<name> takes a pt_regs
pointer and extracts arguments from registers, __se_<compat_>sys_<name>
is a sign-extension wrapper that casts the long arguments to the
correct types for the real syscall implementation, which is named
__do_<compat_>sys_<name>.

Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230710183544.999540-9-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23 14:16:36 -07:00
Palmer Dabbelt
e2de1646f7
Merge patch series "riscv: fix ptrace and export VLENB"
Andy Chiu <andy.chiu@sifive.com> says:

We add a vlenb field in Vector context and save it with the
riscv_vstate_save() macro. It should not cause performance regression as
VLENB is a design-time constant and is frequently used by hardware.
Also, adding this field into the __sc_riscv_v_state may benifit us on a
future compatibility issue becuse a hardware may have writable VLENB.

Adding and saving VLENB have an immediate benifit as it gives ptrace a
better view of the Vector extension and makes it possible to reconstruct
Vector register files from the dump without doing an additional csr read.

This patchset also sync the number of note types between us and gdb for
riscv to solve a conflicting note.

This is not an ABI break given that 6.5 has not been released yet.

* b4-shazam-merge:
  RISC-V: vector: export VLENB csr in __sc_riscv_v_state
  RISC-V: Remove ptrace support for vectors

Link: https://lore.kernel.org/r/20230816155450.26200-1-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23 12:35:02 -07:00
Björn Töpel
9f944d2e0a
riscv: Require FRAME_POINTER for some configurations
Some V configurations implicitly turn on '-fno-omit-frame-pointer',
but leaving FRAME_POINTER disabled. This makes it hard to reason about
the FRAME_POINTER config, and also triggers build failures introduced
in by the commit in the Fixes: tag.

Select FRAME_POINTER explicitly for these configurations.

Fixes: ebc9cb03b21e ("riscv: stack: Fixup independent softirq stack for CONFIG_FRAME_POINTER=n")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230823082845.354839-1-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-23 09:07:26 -07:00