3257 Commits

Author SHA1 Message Date
Palmer Dabbelt
2bf8acbf54
Merge patch series "Fix XIP boot and make XIP testable in QEMU"
Frederik Haxel <haxel@fzi.de> says:

XIP boot seems to be broken for some time now. A likely reason why no one
seems to have noticed this is that XIP is more difficult to test, as it is
currently not easily testable with QEMU.

These patches fix the XIP boot and allow an XIP build without BUILTIN_DTB,
which in turn makes it easier to test an image with the QEMU virt machine.

* b4-shazam-merge:
  riscv: Allow disabling of BUILTIN_DTB for XIP
  riscv: Fixed wrong register in XIP_FIXUP_FLASH_OFFSET macro
  riscv: Make XIP bootable again

Link: https://lore.kernel.org/r/20231212130116.848530-1-haxel@fzi.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09 20:10:39 -08:00
Song Shuai
a7565f4d06
riscv: Remove SHADOW_OVERFLOW_STACK_SIZE macro
The commit be97d0db5f44 ("riscv: VMAP_STACK overflow
detection thread-safe") got rid of `shadow_stack`,
so SHADOW_OVERFLOW_STACK_SIZE should be removed too.

Fixes: be97d0db5f44 ("riscv: VMAP_STACK overflow detection thread-safe")
Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20231211110331.359534-1-songshuaishuai@tinylab.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09 20:10:38 -08:00
Palmer Dabbelt
5c89186a32
Merge remote-tracking branch 'palmer/fixes' into for-next
I don't usually merge these in, but I missed sending a PR due to the
holidays.

* palmer/fixes:
  riscv: Fix set_direct_map_default_noflush() to reset _PAGE_EXEC
  riscv: Fix module_alloc() that did not reset the linear mapping permissions
  riscv: Fix wrong usage of lm_alias() when splitting a huge linear mapping
  riscv: Check if the code to patch lies in the exit section
  riscv: errata: andes: Probe for IOCP only once in boot stage
  riscv: Fix SMP when shadow call stacks are enabled
  dt-bindings: perf: riscv,pmu: drop unneeded quotes
  riscv: fix misaligned access handling of C.SWSP and C.SDSP
  RISC-V: hwprobe: Always use u64 for extension bits
  Support rv32 ULEB128 test
  riscv: Correct type casting in module loading
  riscv: Safely remove entries from relocation list

Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09 20:10:32 -08:00
Ben Dooks
869436dae7
riscv; fix __user annotation in save_v_state()
The save_v_state() is technically sending a __user pointer through
__put_user() and thus is generating a sparse warning so force the
value to be "void *" to fix:

arch/riscv/kernel/signal.c:94:16: warning: incorrect type in initializer (different address spaces)
arch/riscv/kernel/signal.c:94:16: expected void *__val
arch/riscv/kernel/signal.c:94:16: got void [noderef] __user *[assigned] datap

Fixes: 8ee0b41898fa26f66e32 ("riscv: signal: Add sigcontext save/restore for vector")
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Link: https://lore.kernel.org/r/20231123142708.261733-1-ben.dooks@codethink.co.uk
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09 20:10:18 -08:00
Ben Dooks
ca0e433b41
riscv: fix __user annotation in traps_misaligned.c
The instruction reading code can read from either user or kernel addresses
and thus the use of __user on pointers to instructions depends on which
context. Fix a few sparse warnings by using __user for user-accesses and
remove it when not.

Fixes:

arch/riscv/kernel/traps_misaligned.c:361:21: warning: dereference of noderef expression
arch/riscv/kernel/traps_misaligned.c:373:21: warning: dereference of noderef expression
arch/riscv/kernel/traps_misaligned.c:381:21: warning: dereference of noderef expression
arch/riscv/kernel/traps_misaligned.c:322:24: warning: incorrect type in initializer (different address spaces)
arch/riscv/kernel/traps_misaligned.c:322:24:    expected unsigned char const [noderef] __user *__gu_ptr
arch/riscv/kernel/traps_misaligned.c:322:24:    got unsigned char const [usertype] *addr
arch/riscv/kernel/traps_misaligned.c:361:21: warning: dereference of noderef expression
arch/riscv/kernel/traps_misaligned.c:373:21: warning: dereference of noderef expression
arch/riscv/kernel/traps_misaligned.c:381:21: warning: dereference of noderef expression
arch/riscv/kernel/traps_misaligned.c:332:24: warning: incorrect type in initializer (different address spaces)
arch/riscv/kernel/traps_misaligned.c:332:24:    expected unsigned char [noderef] __user *__gu_ptr
arch/riscv/kernel/traps_misaligned.c:332:24:    got unsigned char [usertype] *addr

Fixes: 7c83232161f60 ("riscv: add support for misaligned trap handling in S-mode")
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Link: https://lore.kernel.org/r/20231123141617.259591-1-ben.dooks@codethink.co.uk
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09 20:10:17 -08:00
Jisheng Zhang
cfbc4f81c9
riscv: Select ARCH_WANTS_NO_INSTR
As said in the help of ARCH_WANTS_NO_INSTR entry in arch/Kconfig:
"An architecture should select this if the noinstr macro is being used on
functions to denote that the toolchain should avoid instrumenting such
functions and is required for correctness."

Select ARCH_WANTS_NO_INSTR for correctness.

PS: The reason we didn't find any issue so far is that the
CC_HAS_NO_PROFILE_FN_ATTR is true.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20231123142223.1787-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09 20:10:16 -08:00
Palmer Dabbelt
5634d9c280
Merge patch series "riscv: CPU operations cleanup"
Samuel Holland <samuel.holland@sifive.com> says:

This series cleans up some duplicated and dead code around the RISC-V
CPU operations, that was copied from arm64 but is not needed here. The
result is a bit of memory savings and removal of a few SBI calls during
boot, with no functional change.

* b4-shazam-merge:
  riscv: Use the same CPU operations for all CPUs
  riscv: Remove unused members from struct cpu_operations
  riscv: Deduplicate code in setup_smp()

Link: https://lore.kernel.org/r/20231121234736.3489608-1-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09 20:10:15 -08:00
Samuel Holland
b7b4e4d79f
riscv: Remove obsolete rv32_defconfig file
This file is not used since commit 72f045d19f25 ("riscv: Fixup
difference with defconfig"), where it was replaced by the
32-bit.config fragment. Delete the old file to avoid any confusion.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20231121225320.3430550-1-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09 20:10:14 -08:00
Palmer Dabbelt
7a4749739c
Merge patch series "RISC-V: hwprobe: Introduce which-cpus"
Andrew Jones <ajones@ventanamicro.com> says:

This series introduces a flag for the hwprobe syscall which effectively
reverses its behavior from getting the values of keys for a set of cpus
to getting the cpus for a set of key-value pairs.

* b4-shazam-merge:
  RISC-V: selftests: Add which-cpus hwprobe test
  RISC-V: hwprobe: Introduce which-cpus flag
  RISC-V: Move the hwprobe syscall to its own file
  RISC-V: hwprobe: Clarify cpus size parameter

Link: https://lore.kernel.org/r/20231122164700.127954-6-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09 20:10:13 -08:00
Frederik Haxel
6c4a2f6329
riscv: Allow disabling of BUILTIN_DTB for XIP
This enables, among other things, testing with the QEMU virt machine.

To build an XIP kernel for the QEMU virt machine, configure the
the kernel as desired and apply the following configuration
```
CONFIG_NONPORTABLE=y
CONFIG_XIP_KERNEL=y
CONFIG_XIP_PHYS_ADDR=0x20000000
CONFIG_PHYS_RAM_BASE=0x80200000
CONFIG_BUILTIN_DTB=n
```

Since the QEMU virt flash memory expects a 32 MB file, the built image
must be padded. For example, with
`truncate -s 32M arch/riscv/boot/xipImage`

The kernel can be started using the following command in QEMU (v8+)
```
qemu-system-riscv64 -M virt,pflash0=pflash0 \
 -blockdev node-name=pflash0,driver=file,read-only=on,\
filename=arch/riscv/boot/xipImage <optional parameters>
```

Signed-off-by: Frederik Haxel <haxel@fzi.de>
Link: https://lore.kernel.org/r/20231212130116.848530-4-haxel@fzi.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09 19:33:22 -08:00
Frederik Haxel
5daa372641
riscv: Fixed wrong register in XIP_FIXUP_FLASH_OFFSET macro
During the refactoring, a bug was introduced in the rarly used
XIP_FIXUP_FLASH_OFFSET macro.

Fixes: bee7fbc38579 ("RISC-V CPU Idle Support")
Fixes: e7681beba992 ("RISC-V: Split out the XIP fixups into their own file")

Signed-off-by: Frederik Haxel <haxel@fzi.de>
Link: https://lore.kernel.org/r/20231212130116.848530-3-haxel@fzi.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09 19:33:21 -08:00
Frederik Haxel
66f1e68093
riscv: Make XIP bootable again
Currently, the XIP kernel seems to fail to boot due to missing
XIP_FIXUP and a wrong page_offset value. A superfluous XIP_FIXUP
has also been removed.

Signed-off-by: Frederik Haxel <haxel@fzi.de>
Link: https://lore.kernel.org/r/20231212130116.848530-2-haxel@fzi.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09 19:33:20 -08:00
Linus Torvalds
9f2a635235 Quite a lot of kexec work this time around. Many singleton patches in
many places.  The notable patch series are:
 
 - nilfs2 folio conversion from Matthew Wilcox in "nilfs2: Folio
   conversions for file paths".
 
 - Additional nilfs2 folio conversion from Ryusuke Konishi in "nilfs2:
   Folio conversions for directory paths".
 
 - IA64 remnant removal in Heiko Carstens's "Remove unused code after
   IA-64 removal".
 
 - Arnd Bergmann has enabled the -Wmissing-prototypes warning everywhere
   in "Treewide: enable -Wmissing-prototypes".  This had some followup
   fixes:
 
   - Nathan Chancellor has cleaned up the hexagon build in the series
     "hexagon: Fix up instances of -Wmissing-prototypes".
 
   - Nathan also addressed some s390 warnings in "s390: A couple of
     fixes for -Wmissing-prototypes".
 
   - Arnd Bergmann addresses the same warnings for MIPS in his series
     "mips: address -Wmissing-prototypes warnings".
 
 - Baoquan He has made kexec_file operate in a top-down-fitting manner
   similar to kexec_load in the series "kexec_file: Load kernel at top of
   system RAM if required"
 
 - Baoquan He has also added the self-explanatory "kexec_file: print out
   debugging message if required".
 
 - Some checkstack maintenance work from Tiezhu Yang in the series
   "Modify some code about checkstack".
 
 - Douglas Anderson has disentangled the watchdog code's logging when
   multiple reports are occurring simultaneously.  The series is "watchdog:
   Better handling of concurrent lockups".
 
 - Yuntao Wang has contributed some maintenance work on the crash code in
   "crash: Some cleanups and fixes".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZZ2R6AAKCRDdBJ7gKXxA
 juCVAP4t76qUISDOSKugB/Dn5E4Nt9wvPY9PcufnmD+xoPsgkQD+JVl4+jd9+gAV
 vl6wkJDiJO5JZ3FVtBtC3DFA/xHtVgk=
 =kQw+
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2024-01-09-10-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:
 "Quite a lot of kexec work this time around. Many singleton patches in
  many places. The notable patch series are:

   - nilfs2 folio conversion from Matthew Wilcox in 'nilfs2: Folio
     conversions for file paths'.

   - Additional nilfs2 folio conversion from Ryusuke Konishi in 'nilfs2:
     Folio conversions for directory paths'.

   - IA64 remnant removal in Heiko Carstens's 'Remove unused code after
     IA-64 removal'.

   - Arnd Bergmann has enabled the -Wmissing-prototypes warning
     everywhere in 'Treewide: enable -Wmissing-prototypes'. This had
     some followup fixes:

      - Nathan Chancellor has cleaned up the hexagon build in the series
        'hexagon: Fix up instances of -Wmissing-prototypes'.

      - Nathan also addressed some s390 warnings in 's390: A couple of
        fixes for -Wmissing-prototypes'.

      - Arnd Bergmann addresses the same warnings for MIPS in his series
        'mips: address -Wmissing-prototypes warnings'.

   - Baoquan He has made kexec_file operate in a top-down-fitting manner
     similar to kexec_load in the series 'kexec_file: Load kernel at top
     of system RAM if required'

   - Baoquan He has also added the self-explanatory 'kexec_file: print
     out debugging message if required'.

   - Some checkstack maintenance work from Tiezhu Yang in the series
     'Modify some code about checkstack'.

   - Douglas Anderson has disentangled the watchdog code's logging when
     multiple reports are occurring simultaneously. The series is
     'watchdog: Better handling of concurrent lockups'.

   - Yuntao Wang has contributed some maintenance work on the crash code
     in 'crash: Some cleanups and fixes'"

* tag 'mm-nonmm-stable-2024-01-09-10-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (157 commits)
  crash_core: fix and simplify the logic of crash_exclude_mem_range()
  x86/crash: use SZ_1M macro instead of hardcoded value
  x86/crash: remove the unused image parameter from prepare_elf_headers()
  kdump: remove redundant DEFAULT_CRASH_KERNEL_LOW_SIZE
  scripts/decode_stacktrace.sh: strip unexpected CR from lines
  watchdog: if panicking and we dumped everything, don't re-enable dumping
  watchdog/hardlockup: use printk_cpu_sync_get_irqsave() to serialize reporting
  watchdog/softlockup: use printk_cpu_sync_get_irqsave() to serialize reporting
  watchdog/hardlockup: adopt softlockup logic avoiding double-dumps
  kexec_core: fix the assignment to kimage->control_page
  x86/kexec: fix incorrect end address passed to kernel_ident_mapping_init()
  lib/trace_readwrite.c:: replace asm-generic/io with linux/io
  nilfs2: cpfile: fix some kernel-doc warnings
  stacktrace: fix kernel-doc typo
  scripts/checkstack.pl: fix no space expression between sp and offset
  x86/kexec: fix incorrect argument passed to kexec_dprintk()
  x86/kexec: use pr_err() instead of kexec_dprintk() when an error occurs
  nilfs2: add missing set_freezable() for freezable kthread
  kernel: relay: remove relay_file_splice_read dead code, doesn't work
  docs: submit-checklist: remove all of "make namespacecheck"
  ...
2024-01-09 11:46:20 -08:00
Linus Torvalds
fb46e22a9e Many singleton patches against the MM code. The patch series which
are included in this merge do the following:
 
 - Peng Zhang has done some mapletree maintainance work in the
   series
 
 	"maple_tree: add mt_free_one() and mt_attr() helpers"
 	"Some cleanups of maple tree"
 
 - In the series "mm: use memmap_on_memory semantics for dax/kmem"
   Vishal Verma has altered the interworking between memory-hotplug
   and dax/kmem so that newly added 'device memory' can more easily
   have its memmap placed within that newly added memory.
 
 - Matthew Wilcox continues folio-related work (including a few
   fixes) in the patch series
 
 	"Add folio_zero_tail() and folio_fill_tail()"
 	"Make folio_start_writeback return void"
 	"Fix fault handler's handling of poisoned tail pages"
 	"Convert aops->error_remove_page to ->error_remove_folio"
 	"Finish two folio conversions"
 	"More swap folio conversions"
 
 - Kefeng Wang has also contributed folio-related work in the series
 
 	"mm: cleanup and use more folio in page fault"
 
 - Jim Cromie has improved the kmemleak reporting output in the
   series "tweak kmemleak report format".
 
 - In the series "stackdepot: allow evicting stack traces" Andrey
   Konovalov to permits clients (in this case KASAN) to cause
   eviction of no longer needed stack traces.
 
 - Charan Teja Kalla has fixed some accounting issues in the page
   allocator's atomic reserve calculations in the series "mm:
   page_alloc: fixes for high atomic reserve caluculations".
 
 - Dmitry Rokosov has added to the samples/ dorectory some sample
   code for a userspace memcg event listener application.  See the
   series "samples: introduce cgroup events listeners".
 
 - Some mapletree maintanance work from Liam Howlett in the series
   "maple_tree: iterator state changes".
 
 - Nhat Pham has improved zswap's approach to writeback in the
   series "workload-specific and memory pressure-driven zswap
   writeback".
 
 - DAMON/DAMOS feature and maintenance work from SeongJae Park in
   the series
 
 	"mm/damon: let users feed and tame/auto-tune DAMOS"
 	"selftests/damon: add Python-written DAMON functionality tests"
 	"mm/damon: misc updates for 6.8"
 
 - Yosry Ahmed has improved memcg's stats flushing in the series
   "mm: memcg: subtree stats flushing and thresholds".
 
 - In the series "Multi-size THP for anonymous memory" Ryan Roberts
   has added a runtime opt-in feature to transparent hugepages which
   improves performance by allocating larger chunks of memory during
   anonymous page faults.
 
 - Matthew Wilcox has also contributed some cleanup and maintenance
   work against eh buffer_head code int he series "More buffer_head
   cleanups".
 
 - Suren Baghdasaryan has done work on Andrea Arcangeli's series
   "userfaultfd move option".  UFFDIO_MOVE permits userspace heap
   compaction algorithms to move userspace's pages around rather than
   UFFDIO_COPY'a alloc/copy/free.
 
 - Stefan Roesch has developed a "KSM Advisor", in the series
   "mm/ksm: Add ksm advisor".  This is a governor which tunes KSM's
   scanning aggressiveness in response to userspace's current needs.
 
 - Chengming Zhou has optimized zswap's temporary working memory
   use in the series "mm/zswap: dstmem reuse optimizations and
   cleanups".
 
 - Matthew Wilcox has performed some maintenance work on the
   writeback code, both code and within filesystems.  The series is
   "Clean up the writeback paths".
 
 - Andrey Konovalov has optimized KASAN's handling of alloc and
   free stack traces for secondary-level allocators, in the series
   "kasan: save mempool stack traces".
 
 - Andrey also performed some KASAN maintenance work in the series
   "kasan: assorted clean-ups".
 
 - David Hildenbrand has gone to town on the rmap code.  Cleanups,
   more pte batching, folio conversions and more.  See the series
   "mm/rmap: interface overhaul".
 
 - Kinsey Ho has contributed some maintenance work on the MGLRU
   code in the series "mm/mglru: Kconfig cleanup".
 
 - Matthew Wilcox has contributed lruvec page accounting code
   cleanups in the series "Remove some lruvec page accounting
   functions".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZZyF2wAKCRDdBJ7gKXxA
 jjWjAP42LHvGSjp5M+Rs2rKFL0daBQsrlvy6/jCHUequSdWjSgEAmOx7bc5fbF27
 Oa8+DxGM9C+fwqZ/7YxU2w/WuUmLPgU=
 =0NHs
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Many singleton patches against the MM code. The patch series which are
  included in this merge do the following:

   - Peng Zhang has done some mapletree maintainance work in the series

	'maple_tree: add mt_free_one() and mt_attr() helpers'
	'Some cleanups of maple tree'

   - In the series 'mm: use memmap_on_memory semantics for dax/kmem'
     Vishal Verma has altered the interworking between memory-hotplug
     and dax/kmem so that newly added 'device memory' can more easily
     have its memmap placed within that newly added memory.

   - Matthew Wilcox continues folio-related work (including a few fixes)
     in the patch series

	'Add folio_zero_tail() and folio_fill_tail()'
	'Make folio_start_writeback return void'
	'Fix fault handler's handling of poisoned tail pages'
	'Convert aops->error_remove_page to ->error_remove_folio'
	'Finish two folio conversions'
	'More swap folio conversions'

   - Kefeng Wang has also contributed folio-related work in the series

	'mm: cleanup and use more folio in page fault'

   - Jim Cromie has improved the kmemleak reporting output in the series
     'tweak kmemleak report format'.

   - In the series 'stackdepot: allow evicting stack traces' Andrey
     Konovalov to permits clients (in this case KASAN) to cause eviction
     of no longer needed stack traces.

   - Charan Teja Kalla has fixed some accounting issues in the page
     allocator's atomic reserve calculations in the series 'mm:
     page_alloc: fixes for high atomic reserve caluculations'.

   - Dmitry Rokosov has added to the samples/ dorectory some sample code
     for a userspace memcg event listener application. See the series
     'samples: introduce cgroup events listeners'.

   - Some mapletree maintanance work from Liam Howlett in the series
     'maple_tree: iterator state changes'.

   - Nhat Pham has improved zswap's approach to writeback in the series
     'workload-specific and memory pressure-driven zswap writeback'.

   - DAMON/DAMOS feature and maintenance work from SeongJae Park in the
     series

	'mm/damon: let users feed and tame/auto-tune DAMOS'
	'selftests/damon: add Python-written DAMON functionality tests'
	'mm/damon: misc updates for 6.8'

   - Yosry Ahmed has improved memcg's stats flushing in the series 'mm:
     memcg: subtree stats flushing and thresholds'.

   - In the series 'Multi-size THP for anonymous memory' Ryan Roberts
     has added a runtime opt-in feature to transparent hugepages which
     improves performance by allocating larger chunks of memory during
     anonymous page faults.

   - Matthew Wilcox has also contributed some cleanup and maintenance
     work against eh buffer_head code int he series 'More buffer_head
     cleanups'.

   - Suren Baghdasaryan has done work on Andrea Arcangeli's series
     'userfaultfd move option'. UFFDIO_MOVE permits userspace heap
     compaction algorithms to move userspace's pages around rather than
     UFFDIO_COPY'a alloc/copy/free.

   - Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm:
     Add ksm advisor'. This is a governor which tunes KSM's scanning
     aggressiveness in response to userspace's current needs.

   - Chengming Zhou has optimized zswap's temporary working memory use
     in the series 'mm/zswap: dstmem reuse optimizations and cleanups'.

   - Matthew Wilcox has performed some maintenance work on the writeback
     code, both code and within filesystems. The series is 'Clean up the
     writeback paths'.

   - Andrey Konovalov has optimized KASAN's handling of alloc and free
     stack traces for secondary-level allocators, in the series 'kasan:
     save mempool stack traces'.

   - Andrey also performed some KASAN maintenance work in the series
     'kasan: assorted clean-ups'.

   - David Hildenbrand has gone to town on the rmap code. Cleanups, more
     pte batching, folio conversions and more. See the series 'mm/rmap:
     interface overhaul'.

   - Kinsey Ho has contributed some maintenance work on the MGLRU code
     in the series 'mm/mglru: Kconfig cleanup'.

   - Matthew Wilcox has contributed lruvec page accounting code cleanups
     in the series 'Remove some lruvec page accounting functions'"

* tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits)
  mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
  mm, treewide: introduce NR_PAGE_ORDERS
  selftests/mm: add separate UFFDIO_MOVE test for PMD splitting
  selftests/mm: skip test if application doesn't has root privileges
  selftests/mm: conform test to TAP format output
  selftests: mm: hugepage-mmap: conform to TAP format output
  selftests/mm: gup_test: conform test to TAP format output
  mm/selftests: hugepage-mremap: conform test to TAP format output
  mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING
  mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large
  mm/memcontrol: remove __mod_lruvec_page_state()
  mm/khugepaged: use a folio more in collapse_file()
  slub: use a folio in __kmalloc_large_node
  slub: use folio APIs in free_large_kmalloc()
  slub: use alloc_pages_node() in alloc_slab_page()
  mm: remove inc/dec lruvec page state functions
  mm: ratelimit stat flush from workingset shrinker
  kasan: stop leaking stack trace handles
  mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE
  mm/mglru: add dummy pmd_dirty()
  ...
2024-01-09 11:18:47 -08:00
Alexandre Ghiti
b8b2711336
riscv: Fix set_direct_map_default_noflush() to reset _PAGE_EXEC
When resetting the linear mapping permissions, we must make sure that we
clear the X bit so that do not end up with WX mappings (since we set
PAGE_KERNEL).

Fixes: 395a21ff859c ("riscv: add ARCH_HAS_SET_DIRECT_MAP support")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20231213134027.155327-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09 10:59:08 -08:00
Alexandre Ghiti
749b94b080
riscv: Fix module_alloc() that did not reset the linear mapping permissions
After unloading a module, we must reset the linear mapping permissions,
see the example below:

Before unloading a module:

0xffffaf809d65d000-0xffffaf809d6dc000    0x000000011d65d000       508K PTE .   ..     ..   D A G . . W R V
0xffffaf809d6dc000-0xffffaf809d6dd000    0x000000011d6dc000         4K PTE .   ..     ..   D A G . . . R V
0xffffaf809d6dd000-0xffffaf809d6e1000    0x000000011d6dd000        16K PTE .   ..     ..   D A G . . W R V
0xffffaf809d6e1000-0xffffaf809d6e7000    0x000000011d6e1000        24K PTE .   ..     ..   D A G . X . R V

After unloading a module:

0xffffaf809d65d000-0xffffaf809d6e1000    0x000000011d65d000       528K PTE .   ..     ..   D A G . . W R V
0xffffaf809d6e1000-0xffffaf809d6e7000    0x000000011d6e1000        24K PTE .   ..     ..   D A G . X W R V

The last mapping is not reset and we end up with WX mappings in the linear
mapping.

So add VM_FLUSH_RESET_PERMS to our module_alloc() definition.

Fixes: 0cff8bff7af8 ("riscv: avoid the PIC offset of static percpu data in module beyond 2G limits")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20231213134027.155327-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09 10:59:07 -08:00
Alexandre Ghiti
c29fc621e1
riscv: Fix wrong usage of lm_alias() when splitting a huge linear mapping
lm_alias() can only be used on kernel mappings since it explicitly uses
__pa_symbol(), so simply fix this by checking where the address belongs
to before.

Fixes: 311cd2f6e253 ("riscv: Fix set_memory_XX() and set_direct_map_XX() by splitting huge linear mappings")
Reported-by: syzbot+afb726d49f84c8d95ee1@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-riscv/000000000000620dd0060c02c5e1@google.com/
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20231212195400.128457-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09 10:59:06 -08:00
Alexandre Ghiti
420370f3ae
riscv: Check if the code to patch lies in the exit section
Otherwise we fall through to vmalloc_to_page() which panics since the
address does not lie in the vmalloc region.

Fixes: 043cb41a85de ("riscv: introduce interfaces to patch kernel code")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20231214091926.203439-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09 10:58:59 -08:00
Linus Torvalds
bfe8eb3b85 Scheduler changes for v6.8:
- Energy scheduling:
 
     - Consolidate how the max compute capacity is
       used in the scheduler and how we calculate
       the frequency for a level of utilization.
 
     - Rework interface between the scheduler and
       the schedutil governor
 
     - Simplify the util_est logic
 
  - Deadline scheduler:
 
     - Work more towards reducing SCHED_DEADLINE
       starvation of low priority tasks (e.g., SCHED_OTHER)
       tasks when higher priority tasks monopolize CPU
       cycles, via the introduction of 'deadline servers'
       (nested/2-level scheduling).
       "Fair servers" to make use of this facility are
       not introduced yet.
 
  - EEVDF:
 
     - Introduce O(1) fastpath for EEVDF task selection
 
  - NUMA balancing:
 
     - Tune the NUMA-balancing vma scanning logic some more,
       to better distribute the probability
       of a particular vma getting scanned.
 
  - Plus misc fixes, cleanups and updates.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmWcASMRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jLbg/+NOwF18M6klF1/3jUaV1PU09vRzYnnA7w
 oF7Tru7JLV+/vZK+rwI1zxzj5Nj3sVBQPIyp1embEHx7Z/QH8MIaIVpcSFsDDCYY
 Q8n6ZVRB+lKWEo5+Ti6JEJftDAWuLHXwFWDa57oWPuR0Tc736+zYHUfj7jdKk0RI
 nT/lnOT6hXU8q26O4QFrBrrhvCCxc4byo7buKPQfqie0bDA70ppIWkFQoQME6mvQ
 US9jvOyUipOiPV06DPwFvPDJUQBGq2VdJNk+5zCEtcqEfLREuo/Xq1Ww1x1BWaZI
 761532EuDo73iMK4IFZrvVmj1ioz957qbje11MSSkDdKj692xxjXyvnY0NBvZuho
 Ueog/jQ4D4I2qu7pPSCF8UfnI/Hw4Q+KJ89j3pcywRm4hmCTf9k3MGpAaVLVxH7G
 e5REZ5MSsFZi4Cs+zF87Of5KCKLhTr1qSetNtShinKahg06WZ+MZ8tW4jb52qy0j
 F8PMlvfBI3f7SOtA8s2P26mDGQ21YQehN2d5P+Fbwj/U3fjIlSTOyx6NwLpFwYaS
 Vf+fctchGFV1Sh7c2JjCh+ecYfXx3ghT/pvyPOImJtxtCKSRUQ8c26ApC1OsWfOE
 FdHv4f2dPqcyswCZzIv/2fyDXc9eaS2E05EMDNqVuMCGnzidzSs81n7hBioNMrnH
 ZgHK90TmEbw=
 =wTVh
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "Energy scheduling:

   - Consolidate how the max compute capacity is used in the scheduler
     and how we calculate the frequency for a level of utilization.

   - Rework interface between the scheduler and the schedutil governor

   - Simplify the util_est logic

  Deadline scheduler:

   - Work more towards reducing SCHED_DEADLINE starvation of low
     priority tasks (e.g., SCHED_OTHER) tasks when higher priority tasks
     monopolize CPU cycles, via the introduction of 'deadline servers'
     (nested/2-level scheduling).

     "Fair servers" to make use of this facility are not introduced yet.

  EEVDF:

   - Introduce O(1) fastpath for EEVDF task selection

  NUMA balancing:

   - Tune the NUMA-balancing vma scanning logic some more, to better
     distribute the probability of a particular vma getting scanned.

  Plus misc fixes, cleanups and updates"

* tag 'sched-core-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
  sched/fair: Fix tg->load when offlining a CPU
  sched/fair: Remove unused 'next_buddy_marked' local variable in check_preempt_wakeup_fair()
  sched/fair: Use all little CPUs for CPU-bound workloads
  sched/fair: Simplify util_est
  sched/fair: Remove SCHED_FEAT(UTIL_EST_FASTUP, true)
  arm64/amu: Use capacity_ref_freq() to set AMU ratio
  cpufreq/cppc: Set the frequency used for computing the capacity
  cpufreq/cppc: Move and rename cppc_cpufreq_{perf_to_khz|khz_to_perf}()
  energy_model: Use a fixed reference frequency
  cpufreq/schedutil: Use a fixed reference frequency
  cpufreq: Use the fixed and coherent frequency for scaling capacity
  sched/topology: Add a new arch_scale_freq_ref() method
  freezer,sched: Clean saved_state when restoring it during thaw
  sched/fair: Update min_vruntime for reweight_entity() correctly
  sched/doc: Update documentation after renames and synchronize Chinese version
  sched/cpufreq: Rework iowait boost
  sched/cpufreq: Rework schedutil governor performance estimation
  sched/pelt: Avoid underestimation of task utilization
  sched/timers: Explain why idle task schedules out on remote timer enqueue
  sched/cpuidle: Comment about timers requirements VS idle handler
  ...
2024-01-08 19:49:17 -08:00
Paolo Bonzini
fb872da8e7 Common KVM changes for 6.8:
- Use memdup_array_user() to harden against overflow.
 
  - Unconditionally advertise KVM_CAP_DEVICE_CTRL for all architectures.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmWW8F4SHHNlYW5qY0Bn
 b29nbGUuY29tAAoJEGCRIgFNDBL5urcP/Rex6Too26aHJXelUVHlFOGw3hfOnvbq
 Wr/P3kPqB/1Mncx3aiYTpEvUxFjVTvIkMB5dWba39Eq/G1BbOT2CAHCunlvKJrXy
 L83YgOl17QtZZJS1KmLTRCj1umfl4Z0c+GEIH+P1FOuOmllNXlLJ1+GWmolP6LLf
 u4DF2/tyVZf8JXXeJWYITHsU0YQQ0MhHgYL8/aMYJK8epNFpR3wKIqT3428ASxV3
 Ru4WH7jpYkFF7PaKbvjKdepr+1wyVt4PXJDDpciCScz45/8eebgfylLJbMglpsR1
 JSUTzd6KdCbekgzp51NnRdoIxP+MXgKA3dIuzXKyIDzm2Xq6tna87ve/aWDGw8JC
 nUMkP/vAuaKT+/QTOwskGAvK2GYDQD1UwVcFNLi12Iis50H0qPwcxsUionQuZgUC
 ykCmY4N31rSX4DhPg1WLiqsvC/EeDhfXprYrfSd4HQq08NgD45orRJw0Kov+shcS
 xijIlE1e3aVJMRrbfoSWyc4m79AcooxjYwojQC1Ayqsq0ZTTzzIpd6rqjmY+LbLL
 aP/wNz8hCfMhFekUV7dDk9rMdZY+bBnTiolyKAN66E6EnPYfl2EdrDEGnZOCPXF4
 L/O/kMCXHE90cszzrmiR40yNHLkPelij8sK+ligE4JpqteQ7ia/knh8YAiPBxDw6
 XcIfftXMm5XG
 =wpT4
 -----END PGP SIGNATURE-----

Merge tag 'kvm-x86-generic-6.8' of https://github.com/kvm-x86/linux into HEAD

Common KVM changes for 6.8:

 - Use memdup_array_user() to harden against overflow.

 - Unconditionally advertise KVM_CAP_DEVICE_CTRL for all architectures.
2024-01-08 08:09:57 -05:00
Paolo Bonzini
caadf876bb KVM: introduce CONFIG_KVM_COMMON
CONFIG_HAVE_KVM is currently used by some architectures to either
enabled the KVM config proper, or to enable host-side code that is
not part of the KVM module.  However, CONFIG_KVM's "select" statement
in virt/kvm/Kconfig corresponds to a third meaning, namely to
enable common Kconfigs required by all architectures that support
KVM.

These three meanings can be replaced respectively by an
architecture-specific Kconfig, by IS_ENABLED(CONFIG_KVM), or by
a new Kconfig symbol that is in turn selected by the
architecture-specific "config KVM".

Start by introducing such a new Kconfig symbol, CONFIG_KVM_COMMON.
Unlike CONFIG_HAVE_KVM, it is selected by CONFIG_KVM, not by
architecture code, and it brings in all dependencies of common
KVM code.  In particular, INTERVAL_TREE was missing in loongarch
and riscv, so that is another thing that is fixed.

Fixes: 8132d887a702 ("KVM: remove CONFIG_HAVE_KVM_EVENTFD", 2023-12-08)
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Closes: https://lore.kernel.org/all/44907c6b-c5bd-4e4a-a921-e4d3825539d8@infradead.org/
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-01-08 08:09:38 -05:00
Ingo Molnar
cdb3033e19 Merge branch 'sched/urgent' into sched/core, to pick up pending v6.7 fixes for the v6.8 merge window
This fix didn't make it upstream in time, pick it up
for the v6.8 merge window.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-01-08 12:57:28 +01:00
Linus Torvalds
95c8a35f1c 12 hotfixes. 2 are cc:stable and the remainder either address post-6.7
issues or aren't considered necessary for earlier kernel versions.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZZhbAwAKCRDdBJ7gKXxA
 jiFeAQD20H8wGT9hbMUYr/PxE4rOxyDXlhnv/mZ0Av3neve4SQD/YlgCFWYpEu8G
 F5rEAtq89UYE13qlgS3o9KjYOPzhtgQ=
 =vZ0P
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2024-01-05-11-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc mm fixes from Andrew Morton:
 "12 hotfixes.

  Two are cc:stable and the remainder either address post-6.7 issues or
  aren't considered necessary for earlier kernel versions"

* tag 'mm-hotfixes-stable-2024-01-05-11-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: shrinker: use kvzalloc_node() from expand_one_shrinker_info()
  mailmap: add entries for Mathieu Othacehe
  MAINTAINERS: change vmware.com addresses to broadcom.com
  arch/mm/fault: fix major fault accounting when retrying under per-VMA lock
  mm/mglru: skip special VMAs in lru_gen_look_around()
  MAINTAINERS: hand over hwpoison maintainership to Miaohe Lin
  MAINTAINERS: remove hugetlb maintainer Mike Kravetz
  mm: fix unmap_mapping_range high bits shift bug
  mm: memcg: fix split queue list crash when large folio migration
  mm: fix arithmetic for max_prop_frac when setting max_ratio
  mm: fix arithmetic for bdi min_ratio
  mm: align larger anonymous mappings on THP boundaries
2024-01-05 13:46:18 -08:00
Kinsey Ho
533c67e635 mm/mglru: add dummy pmd_dirty()
Add dummy pmd_dirty() for architectures that don't provide it.
This is similar to commit 6617da8fb565 ("mm: add dummy pmd_young()
for architectures not having it").

Link: https://lkml.kernel.org/r/20231227141205.2200125-5-kinseyho@google.com
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202312210606.1Etqz3M4-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202312210042.xQEiqlEh-lkp@intel.com/
Signed-off-by: Kinsey Ho <kinseyho@google.com>
Suggested-by: Yu Zhao <yuzhao@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Donet Tom <donettom@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05 10:17:44 -08:00
Jakub Kicinski
e63c1822ac Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/broadcom/bnxt/bnxt.c
  e009b2efb7a8 ("bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()")
  0f2b21477988 ("bnxt_en: Fix compile error without CONFIG_RFS_ACCEL")
https://lore.kernel.org/all/20240105115509.225aa8a2@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 18:06:46 -08:00
Samuel Holland
62ff262227
riscv: Use the same CPU operations for all CPUs
RISC-V provides no binding (ACPI or DT) to describe per-cpu start/stop
operations, so cpu_set_ops() will always detect the same operations for
every CPU. Replace the cpu_ops array with a single pointer to save space
and reduce boot time.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20231121234736.3489608-4-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-04 15:03:07 -08:00
Samuel Holland
79093f3ec3
riscv: Remove unused members from struct cpu_operations
name is not used anywhere at all. cpu_prepare and cpu_disable do nothing
and always return 0 if implemented.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20231121234736.3489608-3-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-04 15:03:06 -08:00
Samuel Holland
a4166aec11
riscv: Deduplicate code in setup_smp()
Both the ACPI and DT implementations contain some of the same code.
Move it to the calling function so it is not duplicated.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20231121234736.3489608-2-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-04 15:03:05 -08:00
Andrew Jones
e178bf146e
RISC-V: hwprobe: Introduce which-cpus flag
Introduce the first flag for the hwprobe syscall. The flag basically
reverses its behavior, i.e. instead of populating the values of keys
for a given set of cpus, the set of cpus after the call is the result
of finding a set which supports the values of the keys. In order to
do this, we implement a pair compare function which takes the type of
value (a single value vs. a bitmask of booleans) into consideration.
We also implement vdso support for the new flag.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20231122164700.127954-9-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-03 03:36:49 -08:00
Andrew Jones
53b2b22850
RISC-V: Move the hwprobe syscall to its own file
As Palmer says, hwprobe is "sort of its own thing now, and it's only
going to get bigger..."

Suggested-by: Palmer Dabbelt <palmer@dabbelt.com>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20231122164700.127954-8-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-03 03:36:48 -08:00
Andrew Jones
36d842d654
RISC-V: hwprobe: Clarify cpus size parameter
The "count" parameter associated with the 'cpus' parameter of the
hwprobe syscall is the size in bytes of 'cpus'. Naming it 'cpu_count'
may mislead users (it did me) to think it's the number of CPUs that
are or can be represented by 'cpus' instead. This is particularly
easy (IMO) to get wrong since 'cpus' is documented to be defined by
CPU_SET(3) and CPU_SET(3) also documents a CPU_COUNT() (the number
of CPUs in set) macro. CPU_SET(3) refers to the size of cpu sets
with 'setsize'. Adopt 'cpusetsize' for the hwprobe parameter and
specifically state it is in bytes in Documentation/riscv/hwprobe.rst
to clarify.

Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20231122164700.127954-7-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-03 03:36:47 -08:00
Palmer Dabbelt
cbc911392c
RISC-V: Remove the removed single-letter extensions
There were a few single-letter extensions that we had references to
floating around in the kernel, but that never ended up as actual ISA
specs and have mostly been replaced by multi-letter extensions.  This
removes the references to those extensions.

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20231110175903.2631-1-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-03 03:28:49 -08:00
Joerg Roedel
75f74f85a4 Merge branches 'apple/dart', 'arm/rockchip', 'arm/smmu', 'virtio', 'x86/vt-d', 'x86/amd' and 'core' into next 2024-01-03 09:59:32 +01:00
Paolo Bonzini
9cc52627c7 KVM/riscv changes for 6.8 part #1
- KVM_GET_REG_LIST improvement for vector registers
 - Generate ISA extension reg_list using macros in get-reg-list selftest
 - Steal time account support along with selftest
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmWQ+cgACgkQrUjsVaLH
 LAckBA//R4X9L5ugfPdDunp3ntjZXmNtBS5pM2jD+UvaoFn2kOA1o5kOD5mXluuh
 0imNjVuzlrX7XoAATQ4BoeoXg0whDbnv/8TE13KqSl1PfNziH2p5YD2DuHXPST3B
 V2VHrGACZ4wN074Whztl0oYI72obGnmpcsiglifkeCvRPerghHuDu40eUaWvCmgD
 DPwT+bjBgxkDZ4IheyytUrPql6reALh1Qo1vfj0FsJAgj+MAqQeD8n6rixSPnOdE
 9XXa4jdu7ycDp675VX/DVEWsNBQGPrsRK/uCiMksO36td+wLCKpAkvX95mE0w/L8
 qFJ+dN1c+1ZkjooHdVLfq2MjxaIRwmIowk7SeJbpvGIf/zG3r7eany7eXJT0+NjO
 22j5FY2z1NqcSG6Fazx76Qp2vVBVbxHShP9h7d6VTZYS7XENjmV6IWHpTSuSF8+n
 puj8Nf5C7WuqbySirSgQndDuKawn9myqfXXEoAuSiZ+kVyYEl8QnXm2gAIcxRDHX
 x+NDPMv0DpMBRO9qa/tXeqgNue/XOTJwgbmXzAlCNff3U7hPIHJ/5aZiJ/Re5TeE
 DxiU9AmIsNN2Bh0csS/wQbdScIqkOdOiDYEwT1DXOJWpmhiyCW7vR8ltaIuMJ4vP
 DtlfuUlSe4aml957nAiqqyjQAY/7gqmpoaGwu+lmrOX1K7fdtF0=
 =FeiG
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-6.8-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv changes for 6.8 part #1

- KVM_GET_REG_LIST improvement for vector registers
- Generate ISA extension reg_list using macros in get-reg-list selftest
- Steal time account support along with selftest
2024-01-02 13:19:40 -05:00
Paolo Bonzini
136292522e LoongArch KVM changes for v6.8
1. Optimization for memslot hugepage checking.
 2. Cleanup and fix some HW/SW timer issues.
 3. Add LSX/LASX (128bit/256bit SIMD) support.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmWGu+0WHGNoZW5odWFj
 YWlAa2VybmVsLm9yZwAKCRAChivD8uImesO7D/wOdYP96R+mRzpLBeuTtFxU8e4A
 3n2luxOeP8v1WYtQ9H8M01Wgly+9u6cJ2pgAlv79BQHfmCfC0aWQLmpnCZmk/mYW
 wtQ75ASA3Qg6zOBWEksCkA0LUdPDHfQuaaUXT7RYZ7QtHKSNkkhsw2nMCq6fgrXU
 RnZjGctjuxgYSqQtwzfYO2AjSBAfAq1MjSzCTULJ0KkE8o5Bg0KOoGj8ijC1U+ua
 QWBnqTNzeKmYmqAFfhXoiiFYcuBUq7DEk5RtwDU7SeqqJEV3a8AbbsrWfz+wMemG
 gri95uRxvnhpPZ+6/PrVjIezqexPJmQ9+tjY6mxh/bPRnS5ICFygjV3lt050JUK8
 xIaJEFvl7g88RIz5mnTeM9tU4ibIsCLgA9zj33ps2H7QP5NazUm1dzk1YGAgqPdw
 m5hjwtTFQEujQM6cz1DLfhoi15VDNcYUonJIvGFZMhl7InitDpB3u9sI+AVGIVUG
 yKzBkqGB1L1vbJGnuWmspEqSUo7Z9iYzuVGbOnjc9LKQ/8OpLxj0brymYheA+CKG
 CIdULximQFVEHc2lbE+H+bW4hnrFP4sN9hlTng7KN7ommCIg+FltisM8Nt5NLWID
 9ywLj4Qa0Qrc5vB3FJ8+ksuDe2nD83uVLj247R7B0wxQcYw4ocyW/YU+gayF4EjY
 6azutwllW5ZB+I3hyw==
 =phol
 -----END PGP SIGNATURE-----

Merge tag 'loongarch-kvm-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD

LoongArch KVM changes for v6.8

1. Optimization for memslot hugepage checking.
2. Cleanup and fix some HW/SW timer issues.
3. Add LSX/LASX (128bit/256bit SIMD) support.
2024-01-02 13:16:29 -05:00
Andrew Jones
e9f12b5fff RISC-V: KVM: Implement SBI STA extension
Add a select SCHED_INFO to the KVM config in order to get run_delay
info. Then implement SBI STA's set-steal-time-shmem function and
kvm_riscv_vcpu_record_steal_time() to provide the steal-time info
to guests.

Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30 11:26:38 +05:30
Andrew Jones
f61ce890b1 RISC-V: KVM: Add support for SBI STA registers
KVM userspace needs to be able to save and restore the steal-time
shared memory address. Provide the address through the get/set-one-reg
interface with two ulong-sized SBI STA extension registers (lo and hi).
64-bit KVM userspace must not set the hi register to anything other
than zero and is allowed to completely neglect saving/restoring it.

Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30 11:26:35 +05:30
Andrew Jones
5b9e41321b RISC-V: KVM: Add support for SBI extension registers
Some SBI extensions have state that needs to be saved / restored
when migrating the VM. Provide a get/set-one-reg register type
for SBI extension registers. Each SBI extension that uses this type
will have its own subtype. There are currently no subtypes defined.
The next patch introduces the first one.

Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30 11:26:33 +05:30
Andrew Jones
38b3390ee4 RISC-V: KVM: Add SBI STA info to vcpu_arch
KVM's implementation of SBI STA needs to track the address of each
VCPU's steal-time shared memory region as well as the amount of
stolen time. Add a structure to vcpu_arch to contain this state
and make sure that the address is always set to INVALID_GPA on
vcpu reset. And, of course, ensure KVM won't try to update steal-
time when the shared memory address is invalid.

Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30 11:26:26 +05:30
Andrew Jones
2a1f6bf079 RISC-V: KVM: Add steal-update vcpu request
Add a new vcpu request to inform a vcpu that it should record its
steal-time information. The request is made each time it has been
detected that the vcpu task was not assigned a cpu for some time,
which is easy to do by making the request from vcpu-load. The record
function is just a stub for now and will be filled in with the rest
of the steal-time support functions in following patches.

Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30 11:26:22 +05:30
Andrew Jones
5fed84a800 RISC-V: KVM: Add SBI STA extension skeleton
Add the files and functions needed to support the SBI STA
(steal-time accounting) extension. In the next patches we'll
complete the functions to fully enable SBI STA support.

Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30 11:26:20 +05:30
Andrew Jones
fdf68acccf RISC-V: paravirt: Implement steal-time support
When the SBI STA extension exists we can use it to implement
paravirt steal-time support. Fill in the empty pv-time functions
with an SBI STA implementation and add the Kconfig knobs allowing
it to be enabled.

Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30 11:26:04 +05:30
Andrew Jones
6cfc624576 RISC-V: Add SBI STA extension definitions
The SBI STA extension enables steal-time accounting. Add the
definitions it specifies.

Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30 11:25:09 +05:30
Andrew Jones
323925ed6d RISC-V: paravirt: Add skeleton for pv-time support
Add the files and functions needed to support paravirt time on
RISC-V. Also include the common code needed for the first
application of pv-time, which is steal-time. In the next
patches we'll complete the functions to fully enable steal-time
support.

Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30 11:25:03 +05:30
Suren Baghdasaryan
46e714c729 arch/mm/fault: fix major fault accounting when retrying under per-VMA lock
A test [1] in Android test suite started failing after [2] was merged.  It
turns out that after handling a major fault under per-VMA lock, the
process major fault counter does not register that fault as major.  Before
[2] read faults would be done under mmap_lock, in which case
FAULT_FLAG_TRIED flag is set before retrying.  That in turn causes
mm_account_fault() to account the fault as major once retry completes. 
With per-VMA locks we often retry because a fault can't be handled without
locking the whole mm using mmap_lock.  Therefore such retries do not set
FAULT_FLAG_TRIED flag.  This logic does not work after [2] because we can
now handle read major faults under per-VMA lock and upon retry the fact
there was a major fault gets lost.  Fix this by setting FAULT_FLAG_TRIED
after retrying under per-VMA lock if VM_FAULT_MAJOR was returned.  Ideally
we would use an additional VM_FAULT bit to indicate the reason for the
retry (could not handle under per-VMA lock vs other reason) but this
simpler solution seems to work, so keeping it simple.

[1] https://cs.android.com/android/platform/superproject/+/master:test/vts-testcase/kernel/api/drop_caches_prop/drop_caches_test.cpp
[2] https://lore.kernel.org/all/20231006195318.4087158-6-willy@infradead.org/

Link: https://lkml.kernel.org/r/20231226214610.109282-1-surenb@google.com
Fixes: 12214eba1992 ("mm: handle read faults under the VMA lock")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29 11:06:49 -08:00
Anup Patel
4c460eb369 RISC-V: KVM: Fix indentation in kvm_riscv_vcpu_set_reg_csr()
The indentation of "break" in kvm_riscv_vcpu_set_reg_csr() is
inconsistent hence let us fix it.

Fixes: c04913f2b54e ("RISCV: KVM: Add sstateen0 to ONE_REG")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202312190719.kBuYl6oJ-lkp@intel.com/
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29 12:31:58 +05:30
Daniel Henrique Barboza
3975525e55 RISC-V: KVM: add vector registers and CSRs in KVM_GET_REG_LIST
Add all vector registers and CSRs (vstart, vl, vtype, vcsr, vlenb) in
get-reg-list.

Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29 12:31:56 +05:30
Daniel Henrique Barboza
2fa290372d RISC-V: KVM: add 'vlenb' Vector CSR
Userspace requires 'vlenb' to be able to encode it in reg ID. Otherwise
it is not possible to retrieve any vector reg since we're returning
EINVAL if reg_size isn't vlenb (see kvm_riscv_vcpu_vreg_addr()).

Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29 12:31:54 +05:30
Daniel Henrique Barboza
197bd237b6 RISC-V: KVM: set 'vlenb' in kvm_riscv_vcpu_alloc_vector_context()
'vlenb', added to riscv_v_ext_state by commit c35f3aa34509 ("RISC-V:
vector: export VLENB csr in __sc_riscv_v_state"), isn't being
initialized in guest_context. If we export 'vlenb' as a KVM CSR,
something we want to do in the next patch, it'll always return 0.

Set 'vlenb' to riscv_v_size/32.

Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29 12:31:53 +05:30
Andrew Jones
23e1dc4502 RISC-V: KVM: Make SBI uapi consistent with ISA uapi
When an SBI extension cannot be enabled, that's a distinct state vs.
enabled and disabled. Modify enum kvm_riscv_sbi_ext_status to
accommodate it, which allows KVM userspace to tell the difference
in state too, as the SBI extension register will disappear when it
cannot be enabled, i.e. accesses to it return ENOENT. get-reg-list is
updated as well to only add SBI extension registers to the list which
may be enabled. Returning ENOENT for SBI extension registers which
cannot be enabled makes them consistent with ISA extension registers.
Any SBI extensions which were enabled by default are still enabled by
default, if they can be enabled at all.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29 12:31:44 +05:30