IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
In the current code, gdb can set the watchpoint successfully through
ptrace interface, but watchpoint will not be triggered.
When debugging the following code using gdb.
lihui@bogon:~$ cat test.c
#include <stdio.h>
int a = 0;
int main()
{
a = 1;
printf("a = %d\n", a);
return 0;
}
lihui@bogon:~$ gcc -g test.c -o test
lihui@bogon:~$ gdb test
...
(gdb) watch a
...
(gdb) r
...
a = 1
[Inferior 1 (process 4650) exited normally]
No watchpoints were triggered, the root causes are:
1. Kernel uses perf_event and hw_breakpoint framework to control
watchpoint, but the perf_event corresponding to watchpoint is
not enabled. So it needs to be enabled according to MWPnCFG3
or FWPnCFG3 PLV bit field in ptrace_hbp_set_ctrl(), and privilege
is set according to the monitored addr in hw_breakpoint_control().
Furthermore, add a judgment in ptrace_hbp_set_addr() to ensure
kernel-space addr cannot be monitored in user mode.
2. The global enable control for all watchpoints is the WE bit of
CSR.CRMD, and hardware sets the value to 0 when an exception is
triggered. When the ERTN instruction is executed to return, the
hardware restores the value of the PWE field of CSR.PRMD here.
So, before a thread containing watchpoints be scheduled, the PWE
field of CSR.PRMD needs to be set to 1. Add this modification in
hw_breakpoint_control().
All changes according to the LoongArch Reference Manual:
https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#control-and-status-registers-related-to-watchpointshttps://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#basic-control-and-status-registers
With this patch:
lihui@bogon:~$ gdb test
...
(gdb) watch a
Hardware watchpoint 1: a
(gdb) r
...
Hardware watchpoint 1: a
Old value = 0
New value = 1
main () at test.c:6
6 printf("a = %d\n", a);
(gdb) c
Continuing.
a = 1
[Inferior 1 (process 775) exited normally]
Cc: stable@vger.kernel.org
Signed-off-by: Hui Li <lihui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
In the current code, when debugging the following code using gdb,
"invalid argument ..." message will be displayed.
lihui@bogon:~$ cat test.c
#include <stdio.h>
int a = 0;
int main()
{
a = 1;
return 0;
}
lihui@bogon:~$ gcc -g test.c -o test
lihui@bogon:~$ gdb test
...
(gdb) watch a
Hardware watchpoint 1: a
(gdb) r
...
Invalid argument setting hardware debug registers
There are mainly two types of issues.
1. Some incorrect judgment condition existed in user_watch_state
argument parsing, causing -EINVAL to be returned.
When setting up a watchpoint, gdb uses the ptrace interface,
ptrace(PTRACE_SETREGSET, tid, NT_LOONGARCH_HW_WATCH, (void *) &iov)).
Register values in user_watch_state as follows:
addr[0] = 0x0, mask[0] = 0x0, ctrl[0] = 0x0
addr[1] = 0x0, mask[1] = 0x0, ctrl[1] = 0x0
addr[2] = 0x0, mask[2] = 0x0, ctrl[2] = 0x0
addr[3] = 0x0, mask[3] = 0x0, ctrl[3] = 0x0
addr[4] = 0x0, mask[4] = 0x0, ctrl[4] = 0x0
addr[5] = 0x0, mask[5] = 0x0, ctrl[5] = 0x0
addr[6] = 0x0, mask[6] = 0x0, ctrl[6] = 0x0
addr[7] = 0x12000803c, mask[7] = 0x0, ctrl[7] = 0x610
In arch_bp_generic_fields(), return -EINVAL when ctrl.len is
LOONGARCH_BREAKPOINT_LEN_8(0b00). So delete the incorrect judgment here.
In ptrace_hbp_fill_attr_ctrl(), when note_type is NT_LOONGARCH_HW_WATCH
and ctrl[0] == 0x0, if ((type & HW_BREAKPOINT_RW) != type) will return
-EINVAL. Here ctrl.type should be set based on note_type, and unnecessary
judgments can be removed.
2. The watchpoint argument was not set correctly due to unnecessary
offset and alignment_mask.
Modify ptrace_hbp_fill_attr_ctrl() and hw_breakpoint_arch_parse(), which
ensure the watchpont argument is set correctly.
All changes according to the LoongArch Reference Manual:
https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#control-and-status-registers-related-to-watchpoints
Cc: stable@vger.kernel.org
Signed-off-by: Hui Li <lihui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
In JUMP_VIRT_ADDR we are performing an or calculation on address value
directly from pcaddi.
This will only work if we are currently running from direct 1:1 mapping
addresses or firmware's DMW is configured exactly same as kernel. Still,
we should not rely on such assumption.
Fix by overriding higher bits in address comes from pcaddi, so we can
get rid of or operator.
Cc: stable@vger.kernel.org
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
NUMA enabled kernel on FDT based machine fails to boot because CPUs
are all in NUMA_NO_NODE and mm subsystem won't accept that.
Fix by adding them to default NUMA node at FDT parsing phase and move
numa_add_cpu(0) to a later point.
Cc: stable@vger.kernel.org
Fixes: 88d4d957edc7 ("LoongArch: Add FDT booting support from efi system table")
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Bergmann which enables a number of additional build-time warnings. We
fixed all the fallout which we could find, there may still be a few
stragglers.
- Samuel Holland has developed the series "Unified cross-architecture
kernel-mode FPU API". This does a lot of consolidation of
per-architecture kernel-mode FPU usage and enables the use of newer AMD
GPUs on RISC-V.
- Tao Su has fixed some selftests build warnings in the series
"Selftests: Fix compilation warnings due to missing _GNU_SOURCE
definition".
- This pull also includes a nilfs2 fixup from Ryusuke Konishi.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZk6OSAAKCRDdBJ7gKXxA
jpTGAP9hQaZ+g7CO38hKQAtEI8rwcZJtvUAP84pZEGMjYMGLxQD/S8z1o7UHx61j
DUbnunbOkU/UcPx3Fs/gp4KcJARMEgs=
=EPi9
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more non-mm updates from Andrew Morton:
- A series ("kbuild: enable more warnings by default") from Arnd
Bergmann which enables a number of additional build-time warnings. We
fixed all the fallout which we could find, there may still be a few
stragglers.
- Samuel Holland has developed the series "Unified cross-architecture
kernel-mode FPU API". This does a lot of consolidation of
per-architecture kernel-mode FPU usage and enables the use of newer
AMD GPUs on RISC-V.
- Tao Su has fixed some selftests build warnings in the series
"Selftests: Fix compilation warnings due to missing _GNU_SOURCE
definition".
- This pull also includes a nilfs2 fixup from Ryusuke Konishi.
* tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits)
nilfs2: make block erasure safe in nilfs_finish_roll_forward()
selftests/harness: use 1024 in place of LINE_MAX
Revert "selftests/harness: remove use of LINE_MAX"
selftests/fpu: allow building on other architectures
selftests/fpu: move FP code to a separate translation unit
drm/amd/display: use ARCH_HAS_KERNEL_FPU_SUPPORT
drm/amd/display: only use hard-float, not altivec on powerpc
riscv: add support for kernel-mode FPU
x86: implement ARCH_HAS_KERNEL_FPU_SUPPORT
powerpc: implement ARCH_HAS_KERNEL_FPU_SUPPORT
LoongArch: implement ARCH_HAS_KERNEL_FPU_SUPPORT
lib/raid6: use CC_FLAGS_FPU for NEON CFLAGS
arm64: crypto: use CC_FLAGS_FPU for NEON CFLAGS
arm64: implement ARCH_HAS_KERNEL_FPU_SUPPORT
ARM: crypto: use CC_FLAGS_FPU for NEON CFLAGS
ARM: implement ARCH_HAS_KERNEL_FPU_SUPPORT
arch: add ARCH_HAS_KERNEL_FPU_SUPPORT
x86/fpu: fix asm/fpu/types.h include guard
kbuild: enable -Wcast-function-type-strict unconditionally
kbuild: enable -Wformat-truncation on clang
...
1, Select some options in Kconfig;
2, Give a chance to build with !CONFIG_SMP;
3, Switch to use built-in rustc target;
4, Add new supported device nodes to dts;
5, Some bug fixes and other small changes;
6, Update the default config file.
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmZKCycWHGNoZW5odWFj
YWlAa2VybmVsLm9yZwAKCRAChivD8uImeoXWD/9pFhbbJj49T1xiwc2j/XgQL8HI
s88/h4z5AXEbHFO8XIG1Cpw/Z3a1DsCiWBsOkCogagILzYuN0r7UqcrI02ZoeY6N
fbuDatB3i+hJWCBzcl1HPkFy/9av4j4EktZs0+X/wVgKkd0aIh78qs8+1RwKhshf
FoOv+cMu7zFS8Jrt+w16diNCY1JsDv7TCkCVhvJxAodrtGg4oo2NPfrGOrKAP8Dq
LClvFEqDcXq1kKcipw3Q7BwDlBpJEvLZ0iAl19BnLAmBzI3Wfze9ouoYv8WiUyaY
br0GPShGf16I3DKtTdHsHH/zmayQ7JSmFzZ9JEHzcBrE4AprfWLuwsUjd2WXDD6U
wK+p4tWd0AUFf+/h4u1yQB9/rlt+JZ2ny/A2u4YR/BPtthiYqp8SDSH62vpCSFOE
dByDeTbfjTdJsWr+bsI2gOO0sVwDYpph9SJfAyBn4miKw7v8w+2rI1oqo/ZQkP59
0SczM9C9jzpgXSGDc4yQbnqoA4KA9U6zljd12mYL5HV/AjhD19va3FmENgByZUuE
Z7A0RZsiU5T401xEZiOUhwzy9m/USc1O2ivCmeowx9kP/gWic0KeAsmlMiro0jeR
y9jthci8iOgjjLmCEVC06GWGUojP2roXI/38We6enVevy2GXbEEDRa1QGbQ5ndoJ
MEPm4NvW1wsBgWIYmg==
=WR15
-----END PGP SIGNATURE-----
Merge tag 'loongarch-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- Select some options in Kconfig
- Give a chance to build with !CONFIG_SMP
- Switch to use built-in rustc target
- Add new supported device nodes to dts
- Some bug fixes and other small changes
- Update the default config file
* tag 'loongarch-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: Update Loongson-3 default config file
LoongArch: dts: Add new supported device nodes to Loongson-2K2000
LoongArch: dts: Add new supported device nodes to Loongson-2K0500
LoongArch: dts: Remove "disabled" state of clock controller node
LoongArch: rust: Switch to use built-in rustc target
LoongArch: Fix callchain parse error with kernel tracepoint events again
LoongArch: Give a chance to build with !CONFIG_SMP
LoongArch: Select THP_SWAP if HAVE_ARCH_TRANSPARENT_HUGEPAGE
LoongArch: Select ARCH_WANT_DEFAULT_BPF_JIT
LoongArch: Select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
LoongArch: Select ARCH_HAS_FAST_MULTIPLIER
These are a few cross-architecture cleanup patches:
- Thomas Zimmermann works on separating fbdev support from the asm/video.h
contents that may be used by either the old fbdev drivers or the
newer drm display code.
- Thorsten Blum contributes cleanups for the generic bitops code
and asm-generic/bug.h
- I remove the orphaned include/asm-generic/page.h header that used to
included by long-removed mmu-less architectures.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmZLvewACgkQYKtH/8kJ
UicUEQ//b5WVLOVXkFGlQvAaZkagOLEF8xSTnchA7aKrWQ/C6hSwLN6CQU6MAY7j
Fe54jYQtjwBwpVIj3jn20xiXP/pZbQp9aldkOx4v8YoGnjNF5UWLHm5510DV1ecE
0LF/2YIH25vIXGY6MVm6sFq+nkDgWZee6fBFNc3GsCu2y0biD1Gob9xH/ngCHjIj
tw9KS/j6MivPy/9vJ/Ml2YeutV6+pUA9hNmSrbSVlXSWFh3Wq6IZ+j6bNEftqtZY
xdnYwdVfReOCIayq6hSHhAgIp/uw8JOqLuE2JNwG/9sSF4zp4ZHLvTaMhqEoCpyB
3kZYd1qQTwV3eL5PyYtRcW03KvbhfZpMPzZT+wbl9SNPUljC2MSVeSFF30Uqatgb
yUJ9d/vlb1ynu1yQrFfTZ/kK+U0pPByydwLybcMtEIZ6Hrb1h/eRicvHhUx7bKUB
H9z/FN/TxGY+tPradx2lqm3J1wNu0ox8DUreXjtlJijKIUZQeAkJrGJgr6i6XLBz
crwgKzuQUClzEjBcoWzuTVUB7v19jaDuHMsaBBu8O9f1g5FnEIJlItqnXf1J0Dno
rJy68Mxsg4Dzt4YI3lpOJGDDDPhpOTBXfgsjkuru2MrdFMgZQh+DYLl3qOkJ4DJe
rdiEJb9PygBaGGQnoXO71oOLf5yQuenj+Fg5GIe9AQrci5fXwRQ=
=riCs
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic cleanups from Arnd Bergmann:
"These are a few cross-architecture cleanup patches:
- separate out fbdev support from the asm/video.h contents that may
be used by either the old fbdev drivers or the newer drm display
code (Thomas Zimmermann)
- cleanups for the generic bitops code and asm-generic/bug.h
(Thorsten Blum)
- remove the orphaned include/asm-generic/page.h header that used to
be included by long-removed mmu-less architectures (me)"
* tag 'asm-generic-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
arch: Fix name collision with ACPI's video.o
bug: Improve comment
asm-generic: remove unused asm-generic/page.h
arch: Rename fbdev header and source files
arch: Remove struct fb_info from video helpers
arch: Select fbdev helpers with CONFIG_VIDEO
bitops: Change function return types from long to int
LoongArch already provides kernel_fpu_begin() and kernel_fpu_end() in
asm/fpu.h, so it only needs to add kernel_fpu_available() and export the
CFLAGS adjustments.
Link: https://lkml.kernel.org/r/20240329072441.591471-8-samuel.holland@sifive.com
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Acked-by: WANG Xuerui <git@xen0n.name>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
documented (hopefully adequately) in the respective changelogs. Notable
series include:
- Lucas Stach has provided some page-mapping
cleanup/consolidation/maintainability work in the series "mm/treewide:
Remove pXd_huge() API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one
test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being allocated:
number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in largely
similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene" Johannes
Weiner has fixed the page allocator's handling of migratetype requests,
with resulting improvements in compaction efficiency.
- In the series "make the hugetlb migration strategy consistent" Baolin
Wang has fixed a hugetlb migration issue, which should improve hugetlb
allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when memory
almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting" Kairui
Song has optimized pagecache insertion, yielding ~10% performance
improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initializatio code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various page->flags
cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb
functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series
"mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs. This
is a simple first-cut implementation for now. The series is "support
multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in the
series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts in
the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call it
GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to
use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes
the initialization code so that migration between different memory types
works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant driver
in the series "mm: follow_pte() improvements and acrn follow_pte()
fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to folio
in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size THP's
in the series "mm: add per-order mTHP alloc and swpout counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled
and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His series
"mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes
the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback instrumentation
in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series "Fix
and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the
series "Improve anon_vma scalability for anon VMAs". Intel's test bot
reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests ni the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZkgQYwAKCRDdBJ7gKXxA
jrdKAP9WVJdpEcXxpoub/vVE0UWGtffr8foifi9bCwrQrGh5mgEAx7Yf0+d/oBZB
nvA4E0DcPrUAFy144FNM0NTCb7u9vAw=
=V3R/
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
"The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs.
Notable series include:
- Lucas Stach has provided some page-mapping cleanup/consolidation/
maintainability work in the series "mm/treewide: Remove pXd_huge()
API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
one test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being
allocated: number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in
largely similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene"
Johannes Weiner has fixed the page allocator's handling of
migratetype requests, with resulting improvements in compaction
efficiency.
- In the series "make the hugetlb migration strategy consistent"
Baolin Wang has fixed a hugetlb migration issue, which should
improve hugetlb allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when
memory almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting"
Kairui Song has optimized pagecache insertion, yielding ~10%
performance improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initializatio code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various
page->flags cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series:
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert
hugetlb functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
series "mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs.
This is a simple first-cut implementation for now. The series is
"support multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in
the series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts
in the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call
it GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
path to use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes".
Fixes the initialization code so that migration between different
memory types works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant
driver in the series "mm: follow_pte() improvements and acrn
follow_pte() fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to
folio in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size
THP's in the series "mm: add per-order mTHP alloc and swpout
counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap
same-filled and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His
series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
optimizes the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback
instrumentation in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series
"Fix and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
the series "Improve anon_vma scalability for anon VMAs". Intel's
test bot reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests ni the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as
XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking""
* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
selftests: cgroup: add tests to verify the zswap writeback path
mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
mm/damon/core: fix return value from damos_wmark_metric_value
mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
selftests: cgroup: remove redundant enabling of memory controller
Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
Docs/mm/damon/design: use a list for supported filters
Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
selftests/damon: classify tests for functionalities and regressions
selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
selftests/damon: add a test for DAMOS quota goal
...
With commit d3119bc985fb645 ("LoongArch: Fix callchain parse error with
kernel tracepoint events"), perf can parse kernel callchain, but not
complete and sometimes maybe error. The reason is LoongArch's unwinders
(guess, prologue and orc) don't really need fp (i.e., regs[22]), and
they use sp (i.e., regs[3]) as the frame address rather than the current
stack pointer.
Fix that by removing the assignment of regs[22], and instead assign the
__builtin_frame_address(0) to regs[3].
Without fix:
Children Self Command Shared Object Symbol
........ ........ ............. ................. ................
33.91% 33.91% swapper [kernel.vmlinux] [k] __schedule
|
|--33.04%--__schedule
|
--0.87%--__arch_cpu_idle
__schedule
With this fix:
Children Self Command Shared Object Symbol
........ ........ ............. ................. ................
31.16% 31.16% swapper [kernel.vmlinux] [k] __schedule
|
|--20.63%--smpboot_entry
| cpu_startup_entry
| schedule_idle
| __schedule
|
--10.53%--start_kernel
cpu_startup_entry
schedule_idle
__schedule
Fixes: d3119bc985fb645 ("LoongArch: Fix callchain parse error with kernel tracepoint events")
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
In the current code, SMP is selected in Kconfig for LoongArch, the users
can not unset it, this is reasonable for a multi-processor machine. But
as the help info of config SMP said, if you have a system with only one
CPU, say N. On a uni-processor machine, the kernel will run faster if you
say N here.
Loongson-2K0500 is a single-core CPU for applications like industrial
control, printing terminals, and BMC (Baseboard Management Controller),
there are many development boards, products and solutions on the market,
so it is better and necessary to give a chance to build with !CONFIG_SMP
for a uni-processor machine.
First of all, do not select SMP for config LOONGARCH in Kconfig to make
it possible to unset CONFIG_SMP. Then, do some changes to fix warnings
and errors if CONFIG_SMP is not set.
(1) Define get_ipi_irq() only if CONFIG_SMP is set to fix the warning:
arch/loongarch/kernel/irq.c:90:19: warning: 'get_ipi_irq' defined but not used [-Wunused-function]
(2) Add "#ifdef CONFIG_SMP" in asm/smp.h to fix the warning:
./arch/loongarch/include/asm/smp.h:49:9: warning: "raw_smp_processor_id" redefined
49 | #define raw_smp_processor_id raw_smp_processor_id
| ^~~~~~~~~~~~~~~~~~~~
./include/linux/smp.h:198:9: note: this is the location of the previous definition
198 | #define raw_smp_processor_id() 0
(3) Define machine_shutdown() as empty under !CONFIG_SMP to fix the error:
arch/loongarch/kernel/machine_kexec.c: In function 'machine_shutdown':
arch/loongarch/kernel/machine_kexec.c:233:25: error: implicit declaration of function 'cpu_device_up'; did you mean 'put_device'? [-Wimplicit-function-declaration]
(4) Make config SCHED_SMT depends on SMP to fix many errors such as:
kernel/sched/core.c: In function 'sched_core_find':
kernel/sched/core.c:310:43: error: 'struct rq' has no member named 'cpu'
(5) Define cpu_logical_map(cpu) as 0 under !CONFIG_SMP in asm/smp.h,
then include asm/smp.h in asm/acpi.h (because acpi.h is included in
linux/irq.h indirectly) to fix many build errors under drivers/irqchip
such as:
drivers/irqchip/irq-loongson-eiointc.c: In function 'cpu_to_eio_node':
drivers/irqchip/irq-loongson-eiointc.c:59:16: error: implicit declaration of function 'cpu_logical_map' [-Wimplicit-function-declaration]
(6) Do not write per_cpu_offset(0) to PERCPU_BASE_KS when resume because
the per_cpu_offset(x) macro is defined as (__per_cpu_offset[x]) only
under CONFIG_SMP in include/asm-generic/percpu.h. Just save the value of
PERCPU_BASE_KS when suspend and restore it when resume to fix the error:
arch/loongarch/power/suspend.c: In function 'loongarch_common_resume':
arch/loongarch/power/suspend.c:47:21: error: implicit declaration of function 'per_cpu_offset' [-Wimplicit-function-declaration]
(7) Fix huge page handling under !CONFIG_SMP in tlbex.S.
When running the UnixBench tests with "-c 1" single-streamed pass, the
improvement of performance is about 9 percent with this patch.
By the way, it is helpful to debug and analysis the kernel issues of
multi-processor system under !CONFIG_SMP.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
This allows compiling a full 128-bit product of two 64-bit integers as a
mul/mulh pair, instead of a nasty long sequence of 20+ instructions.
However, after selecting ARCH_SUPPORTS_INT128, when optimizing for size
the compiler generates calls to __ashlti3, __ashrti3, and __lshrti3 for
shifting __int128 values, causing a link failure:
loongarch64-unknown-linux-gnu-ld: kernel/sched/fair.o: in
function `mul_u64_u32_shr':
<PATH>/include/linux/math64.h:161:(.text+0x5e4): undefined
reference to `__lshrti3'
So provide the implementation of these functions if ARCH_SUPPORTS_INT128.
Closes: https://lore.kernel.org/loongarch/CAAhV-H5EZ=7OF7CSiYyZ8_+wWuenpo=K2WT8-6mAT4CvzUC_4g@mail.gmail.com/
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
When VM runs in kvm mode, system will not exit to host mode when
executing a general software breakpoint instruction such as INSN_BREAK,
trap exception happens in guest mode rather than host mode. In order to
debug guest kernel on host side, one mechanism should be used to let VM
exit to host mode.
Here a hypercall instruction with a special code is used for software
breakpoint usage. VM exits to host mode and kvm hypervisor identifies
the special hypercall code and sets exit_reason with KVM_EXIT_DEBUG. And
then let qemu handle it.
Idea comes from ppc kvm, one api KVM_REG_LOONGARCH_DEBUG_INST is added
to get the hypercall code. VMM needs get sw breakpoint instruction with
this api and set the corresponding sw break point for guest kernel.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
PARAVIRT config option and PV IPI is added for the guest side, function
pv_ipi_init() is used to add IPI sending and IPI receiving hooks. This
function firstly checks whether system runs in VM mode, and if kernel
runs in VM mode, it will call function kvm_para_available() to detect
the current hypervirsor type (now only KVM type detection is supported).
The paravirt functions can work only if current hypervisor type is KVM,
since there is only KVM supported on LoongArch now.
PV IPI uses virtual IPI sender and virtual IPI receiver functions. With
virtual IPI sender, IPI message is stored in memory rather than emulated
HW. IPI multicast is also supported, and 128 vcpus can received IPIs
at the same time like X86 KVM method. Hypercall method is used for IPI
sending.
With virtual IPI receiver, HW SWI0 is used rather than real IPI HW.
Since VCPU has separate HW SWI0 like HW timer, there is no trap in IPI
interrupt acknowledge. Since IPI message is stored in memory, there is
no trap in getting IPI message.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
On LoongArch system, IPI hw uses iocsr registers. There are one iocsr
register access on IPI sending, and two iocsr access on IPI receiving
for the IPI interrupt handler. In VM mode all iocsr accessing will cause
VM to trap into hypervisor. So with one IPI hw notification there will
be three times of trap.
In this patch PV IPI is added for VM, hypercall instruction is used for
IPI sender, and hypervisor will inject an SWI to the destination vcpu.
During the SWI interrupt handler, only CSR.ESTAT register is written to
clear irq. CSR.ESTAT register access will not trap into hypervisor, so
with PV IPI supported, there is one trap with IPI sender, and no trap
with IPI receiver, there is only one trap with IPI notification.
Also this patch adds IPI multicast support, the method is similar with
x86. With IPI multicast support, IPI notification can be sent to at
most 128 vcpus at one time. It greatly reduces the times of trapping
into hypervisor.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Physical CPUID is used for interrupt routing for irqchips such as ipi,
msgint and eiointc interrupt controllers. Physical CPUID is stored at
the CSR register LOONGARCH_CSR_CPUID, it can not be changed once vcpu
is created and the physical CPUIDs of two vcpus cannot be the same.
Different irqchips have different size declaration about physical CPUID,
the max CPUID value for CSR LOONGARCH_CSR_CPUID on Loongson-3A5000 is
512, the max CPUID supported by IPI hardware is 1024, while for eiointc
irqchip is 256, and for msgint irqchip is 65536.
The smallest value from all interrupt controllers is selected now, and
the max cpuid size is defines as 256 by KVM which comes from the eiointc
irqchip.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Instruction cpucfg can be used to get processor features. And there
is a trap exception when it is executed in VM mode, and also it can be
used to provide cpu features to VM. On real hardware cpucfg area 0 - 20
is used by now. Here one specified area 0x40000000 -- 0x400000ff is used
for KVM hypervisor to provide PV features, and the area can be extended
for other hypervisors in future. This area will never be used for real
HW, it is only used by software.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
On LoongArch system, there is a hypercall instruction special for
virtualization. When system executes this instruction on host side,
there is an illegal instruction exception reported, however it will
trap into host when it is executed in VM mode.
When hypercall is emulated, A0 register is set with value
KVM_HCALL_INVALID_CODE, rather than inject EXCCODE_INE invalid
instruction exception. So VM can continue to executing the next code.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Refine the ipi handling on LoongArch platform, there are three
modifications:
1. Add generic function get_percpu_irq(), replacing some percpu irq
functions such as get_ipi_irq()/get_pmc_irq()/get_timer_irq() with
get_percpu_irq().
2. Change definition about parameter action called by function
loongson_send_ipi_single() and loongson_send_ipi_mask(), and it is
defined as decimal encoding format at ipi sender side. Normal decimal
encoding is used rather than binary bitmap encoding for ipi action, ipi
hw sender uses decimal encoding code, and ipi receiver will get binary
bitmap encoding, the ipi hw will convert it into bitmap in ipi message
buffer.
3. Add a structure smp_ops on LoongArch platform so that pv ipi can be
used later.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
The per-architecture fbdev code has no dependencies on fbdev and can
be used for any video-related subsystem. Rename the files to 'video'.
Use video-sti.c on parisc as the source file depends on CONFIG_STI_CORE.
On arc, arm, arm64, sh, and um the asm header file is an empty wrapper
around the file in asm-generic. Let Kbuild generate the file. The build
system does this automatically. Only um needs to generate video.h
explicitly, so that it overrides the host architecture's header. The
latter would otherwise interfere with the build.
Further update all includes statements, include guards, and Makefiles.
Also update a few strings and comments to refer to video instead of
fbdev.
v3:
- arc, arm, arm64, sh: generate asm header via build system (Sam,
Helge, Arnd)
- um: rename fb.h to video.h
- fix typo in commit message (Sam)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
With LLVM=1 and W=1 we get:
./include/asm-generic/tlb.h:629:10: error: parameter 'ptep' set
but not used [-Werror,-Wunused-but-set-parameter]
We fixed a similar issue via Arnd in the introducing commit, missed the
LoongArch variant. Turns out, there is no need for LoongArch to have a
custom variant, so let's just drop it and rely on the asm-generic one.
Fixes: 4d5bf0b6183f ("mm/mmu_gather: add tlb_remove_tlb_entries()")
Closes: https://lkml.kernel.org/r/CANiq72mQh3O9S4umbvrKBgMMorty48UMwS01U22FR0mRyd3cyQ@mail.gmail.com
Reported-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Tested-by: Miguel Ojeda <ojeda@kernel.org>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
In commit 85fcde402db191b5 ("kexec: split crashkernel reservation code
out from crash_core.c"), crashkernel reservation code is split out from
crash_core.c, and add CRASH_RESERVE to control it. And also rename each
ARCH's <asm/crash_core.h> to <asm/crash_reserve.h> accordingly.
But the relevant part in LoongArch is missed. Do it now.
Fixes: 85fcde402db1 ("kexec: split crashkernel reservation code out from crash_core.c")
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
The .change_pte() MMU notifier callback was intended as an
optimization. The original point of it was that KSM could tell KVM to flip
its secondary PTE to a new location without having to first zap it. At
the time there was also an .invalidate_page() callback; both of them were
*not* bracketed by calls to mmu_notifier_invalidate_range_{start,end}(),
and .invalidate_page() also doubled as a fallback implementation of
.change_pte().
Later on, however, both callbacks were changed to occur within an
invalidate_range_start/end() block.
In the case of .change_pte(), commit 6bdb913f0a70 ("mm: wrap calls to
set_pte_at_notify with invalidate_range_start and invalidate_range_end",
2012-10-09) did so to remove the fallback from .invalidate_page() to
.change_pte() and allow sleepable .invalidate_page() hooks.
This however made KVM's usage of the .change_pte() callback completely
moot, because KVM unmaps the sPTEs during .invalidate_range_start()
and therefore .change_pte() has no hope of finding a sPTE to change.
Drop the generic KVM code that dispatches to kvm_set_spte_gfn(), as
well as all the architecture specific implementations.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Anup Patel <anup@brainfault.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Message-ID: <20240405115815.3226315-2-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
LoongArch's include/asm/addrspace.h uses SZ_32M and SZ_16K, so add
<linux/sizes.h> to provide those macros to prevent build errors:
In file included from ../arch/loongarch/include/asm/io.h:11,
from ../include/linux/io.h:13,
from ../include/linux/io-64-nonatomic-lo-hi.h:5,
from ../drivers/cxl/pci.c:4:
../include/asm-generic/io.h: In function 'ioport_map':
../arch/loongarch/include/asm/addrspace.h:124:25: error: 'SZ_32M' undeclared (first use in this function); did you mean 'PS_32M'?
124 | #define PCI_IOSIZE SZ_32M
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
KFENCE changes virt_to_page() to be able to translate tlb mapped virtual
addresses, but forget to change virt_to_phys()/phys_to_virt() and other
translation functions as well. This patch fix it, otherwise some drivers
(such as nvme and virtio-blk) cannot work with KFENCE.
All {virt, phys, page, pfn} translation functions are updated:
1, virt_to_pfn()/pfn_to_virt();
2, virt_to_page()/page_to_virt();
3, virt_to_phys()/phys_to_virt().
DMW/TLB mapped addresses are distinguished by comparing the vaddress
with vm_map_base in virt_to_xyz(), and we define WANT_PAGE_VIRTUAL in
the KFENCE case for the reverse translations, xyz_to_virt().
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
1, Add objtool support for LoongArch;
2, Add ORC stack unwinder support for LoongArch;
3, Add kernel livepatching support for LoongArch;
4, Select ARCH_HAS_CURRENT_STACK_POINTER in Kconfig;
5, Select HAVE_ARCH_USERFAULTFD_MINOR in Kconfig;
6, Some bug fixes and other small changes.
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmX69KsWHGNoZW5odWFj
YWlAa2VybmVsLm9yZwAKCRAChivD8uImerjFD/9rAzm1+G4VxFvFzlOiJXEqquNJ
+Vz2fAZLU3lJhBlx0uUKXFVijvLe8s/DnoLrM9e/M6gk4ivT9eszy3DnqT3NjGDX
njYFkPUWZhZGACmbkoVk9St80R8sPIdZrwXtW3q7g3T0bC7LXUXrJw52Sh4gmbYx
RqLsE6GoEWGY0zhhWqeeAM9LkKDuLxxyjH4fYE4g623EhQt7A7hP5okyaC+xHzp+
qp/4dPFLu61LeqIfeBUKK7nQ6uzno3EWLiME2eHEHiuelYfzmh+BtNMcX9Ugb/En
j0vLGNsoDGmEYw7xGa6OSRaCR/nCwVJz4SvuH33wbbbHhVAiUKUBVNFR3gmAtLlc
BSa2dDZbKhHkiWSUCM9K2ihr7WiQNuraTK1kKHwBgfa+RbEVOTu1q8yokAB9XCaT
T7lijJ8MKQmzHpMvgev7nN41baDB6V5bPIni0Ueh+NhQJKZ2/IxtYA3XzV5D0UgL
TBovVgYB/VNThS9gzOrlenKuDX9hT+kCQgyudErXaoIo645P6dsPFowOZRQxCEIv
WnLskZatLTCA8xWl1XyC1bqtGxhp34Gbhg0ZcvUqlNE20luaK/qi8wtW9Mv1Utp+
aXFO3i7d93z99oAcUT0oc1N83T0x0M/p69Z+rL/2+L0sYQgBf1cwUEiDNRW4OCdI
h15289rRTxjeL7NZPw==
=lSkY
-----END PGP SIGNATURE-----
Merge tag 'loongarch-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- Add objtool support for LoongArch
- Add ORC stack unwinder support for LoongArch
- Add kernel livepatching support for LoongArch
- Select ARCH_HAS_CURRENT_STACK_POINTER in Kconfig
- Select HAVE_ARCH_USERFAULTFD_MINOR in Kconfig
- Some bug fixes and other small changes
* tag 'loongarch-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch/crypto: Clean up useless assignment operations
LoongArch: Define the __io_aw() hook as mmiowb()
LoongArch: Remove superfluous flush_dcache_page() definition
LoongArch: Move {dmw,tlb}_virt_to_page() definition to page.h
LoongArch: Change __my_cpu_offset definition to avoid mis-optimization
LoongArch: Select HAVE_ARCH_USERFAULTFD_MINOR in Kconfig
LoongArch: Select ARCH_HAS_CURRENT_STACK_POINTER in Kconfig
LoongArch: Add kernel livepatching support
LoongArch: Add ORC stack unwinder support
objtool: Check local label in read_unwind_hints()
objtool: Check local label in add_dead_ends()
objtool/LoongArch: Enable orc to be built
objtool/x86: Separate arch-specific and generic parts
objtool/LoongArch: Implement instruction decoder
objtool/LoongArch: Enable objtool to be built
Commit fb24ea52f78e0d595852e ("drivers: Remove explicit invocations of
mmiowb()") remove all mmiowb() in drivers, but it says:
"NOTE: mmiowb() has only ever guaranteed ordering in conjunction with
spin_unlock(). However, pairing each mmiowb() removal in this patch with
the corresponding call to spin_unlock() is not at all trivial, so there
is a small chance that this change may regress any drivers incorrectly
relying on mmiowb() to order MMIO writes between CPUs using lock-free
synchronisation."
The mmio in radeon_ring_commit() is protected by a mutex rather than a
spinlock, but in the mutex fastpath it behaves similar to spinlock. We
can add mmiowb() calls in the radeon driver but the maintainer says he
doesn't like such a workaround, and radeon is not the only example of
mutex protected mmio.
So we should extend the mmiowb tracking system from spinlock to mutex,
and maybe other locking primitives. This is not easy and error prone, so
we solve it in the architectural code, by simply defining the __io_aw()
hook as mmiowb(). And we no longer need to override queued_spin_unlock()
so use the generic definition.
Without this, we get such an error when run 'glxgears' on weak ordering
architectures such as LoongArch:
radeon 0000:04:00.0: ring 0 stalled for more than 10324msec
radeon 0000:04:00.0: ring 3 stalled for more than 10240msec
radeon 0000:04:00.0: GPU lockup (current fence id 0x000000000001f412 last fence id 0x000000000001f414 on ring 3)
radeon 0000:04:00.0: GPU lockup (current fence id 0x000000000000f940 last fence id 0x000000000000f941 on ring 0)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
radeon 0000:04:00.0: scheduling IB failed (-35).
[drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
Link: https://lore.kernel.org/dri-devel/29df7e26-d7a8-4f67-b988-44353c4270ac@amd.com/T/#t
Link: https://lore.kernel.org/linux-arch/20240301130532.3953167-1-chenhuacai@loongson.cn/T/#t
Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
LoongArch doesn't have cache aliases, so flush_dcache_page() is a no-op.
There is a generic implementation for this case in include/asm-generic/
cacheflush.h. So remove the superfluous flush_dcache_page() definition,
which also silences such build warnings:
In file included from crypto/scompress.c:12:
include/crypto/scatterwalk.h: In function 'scatterwalk_pagedone':
include/crypto/scatterwalk.h:76:30: warning: variable 'page' set but not used [-Wunused-but-set-variable]
76 | struct page *page;
| ^~~~
crypto/scompress.c: In function 'scomp_acomp_comp_decomp':
>> crypto/scompress.c:174:38: warning: unused variable 'dst_page' [-Wunused-variable]
174 | struct page *dst_page = sg_page(req->dst);
|
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202403091614.NeUw5zcv-lkp@intel.com/
Suggested-by: Barry Song <baohua@kernel.org>
Acked-by: Barry Song <baohua@kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
These two functions are implemented in pgtable.c, and they are needed
only by the virt_to_page() macro in page.h. Having the prototypes in
pgtable.h causes a circular dependency between page.h and pgtable.h,
because the virt_to_page() macro in page.h needs pgtable.h for these
two functions, while pgtable.h needs various definitions from page.h
(e.g. pte_t and pgt_t).
Let's avoid this circular dependency by moving the function prototypes
to page.h.
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
From GCC commit 3f13154553f8546a ("df-scan: remove ad-hoc handling of
global regs in asms"), global registers will no longer be forced to add
to the def-use chain. Then current_thread_info(), current_stack_pointer
and __my_cpu_offset may be lifted out of the loop because they are no
longer treated as "volatile variables".
This optimization is still correct for the current_thread_info() and
current_stack_pointer usages because they are associated to a thread.
However it is wrong for __my_cpu_offset because it is associated to a
CPU rather than a thread: if the thread migrates to a different CPU in
the loop, __my_cpu_offset should be changed.
Change __my_cpu_offset definition to treat it as a "volatile variable",
in order to avoid such a mis-optimization.
Cc: stable@vger.kernel.org
Reported-by: Xiaotian Wu <wuxiaotian@loongson.cn>
Reported-by: Miao Wang <shankerwangmiao@gmail.com>
Signed-off-by: Xing Li <lixing@loongson.cn>
Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
Signed-off-by: Rui Wang <wangrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
* Changes to FPU handling came in via the main s390 pull request
* Only deliver to the guest the SCLP events that userspace has
requested.
* More virtual vs physical address fixes (only a cleanup since
virtual and physical address spaces are currently the same).
* Fix selftests undefined behavior.
x86:
* Fix a restriction that the guest can't program a PMU event whose
encoding matches an architectural event that isn't included in the
guest CPUID. The enumeration of an architectural event only says
that if a CPU supports an architectural event, then the event can be
programmed *using the architectural encoding*. The enumeration does
NOT say anything about the encoding when the CPU doesn't report support
the event *in general*. It might support it, and it might support it
using the same encoding that made it into the architectural PMU spec.
* Fix a variety of bugs in KVM's emulation of RDPMC (more details on
individual commits) and add a selftest to verify KVM correctly emulates
RDMPC, counter availability, and a variety of other PMC-related
behaviors that depend on guest CPUID and therefore are easier to
validate with selftests than with custom guests (aka kvm-unit-tests).
* Zero out PMU state on AMD if the virtual PMU is disabled, it does not
cause any bug but it wastes time in various cases where KVM would check
if a PMC event needs to be synthesized.
* Optimize triggering of emulated events, with a nice ~10% performance
improvement in VM-Exit microbenchmarks when a vPMU is exposed to the
guest.
* Tighten the check for "PMI in guest" to reduce false positives if an NMI
arrives in the host while KVM is handling an IRQ VM-Exit.
* Fix a bug where KVM would report stale/bogus exit qualification information
when exiting to userspace with an internal error exit code.
* Add a VMX flag in /proc/cpuinfo to report 5-level EPT support.
* Rework TDP MMU root unload, free, and alloc to run with mmu_lock held for
read, e.g. to avoid serializing vCPUs when userspace deletes a memslot.
* Tear down TDP MMU page tables at 4KiB granularity (used to be 1GiB). KVM
doesn't support yielding in the middle of processing a zap, and 1GiB
granularity resulted in multi-millisecond lags that are quite impolite
for CONFIG_PREEMPT kernels.
* Allocate write-tracking metadata on-demand to avoid the memory overhead when
a kernel is built with i915 virtualization support but the workloads use
neither shadow paging nor i915 virtualization.
* Explicitly initialize a variety of on-stack variables in the emulator that
triggered KMSAN false positives.
* Fix the debugregs ABI for 32-bit KVM.
* Rework the "force immediate exit" code so that vendor code ultimately decides
how and when to force the exit, which allowed some optimization for both
Intel and AMD.
* Fix a long-standing bug where kvm_has_noapic_vcpu could be left elevated if
vCPU creation ultimately failed, causing extra unnecessary work.
* Cleanup the logic for checking if the currently loaded vCPU is in-kernel.
* Harden against underflowing the active mmu_notifier invalidation
count, so that "bad" invalidations (usually due to bugs elsehwere in the
kernel) are detected earlier and are less likely to hang the kernel.
x86 Xen emulation:
* Overlay pages can now be cached based on host virtual address,
instead of guest physical addresses. This removes the need to
reconfigure and invalidate the cache if the guest changes the
gpa but the underlying host virtual address remains the same.
* When possible, use a single host TSC value when computing the deadline for
Xen timers in order to improve the accuracy of the timer emulation.
* Inject pending upcall events when the vCPU software-enables its APIC to fix
a bug where an upcall can be lost (and to follow Xen's behavior).
* Fall back to the slow path instead of warning if "fast" IRQ delivery of Xen
events fails, e.g. if the guest has aliased xAPIC IDs.
RISC-V:
* Support exception and interrupt handling in selftests
* New self test for RISC-V architectural timer (Sstc extension)
* New extension support (Ztso, Zacas)
* Support userspace emulation of random number seed CSRs.
ARM:
* Infrastructure for building KVM's trap configuration based on the
architectural features (or lack thereof) advertised in the VM's ID
registers
* Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to
x86's WC) at stage-2, improving the performance of interacting with
assigned devices that can tolerate it
* Conversion of KVM's representation of LPIs to an xarray, utilized to
address serialization some of the serialization on the LPI injection
path
* Support for _architectural_ VHE-only systems, advertised through the
absence of FEAT_E2H0 in the CPU's ID register
* Miscellaneous cleanups, fixes, and spelling corrections to KVM and
selftests
LoongArch:
* Set reserved bits as zero in CPUCFG.
* Start SW timer only when vcpu is blocking.
* Do not restart SW timer when it is expired.
* Remove unnecessary CSR register saving during enter guest.
* Misc cleanups and fixes as usual.
Generic:
* cleanup Kconfig by removing CONFIG_HAVE_KVM, which was basically always
true on all architectures except MIPS (where Kconfig determines the
available depending on CPU capabilities). It is replaced either by
an architecture-dependent symbol for MIPS, and IS_ENABLED(CONFIG_KVM)
everywhere else.
* Factor common "select" statements in common code instead of requiring
each architecture to specify it
* Remove thoroughly obsolete APIs from the uapi headers.
* Move architecture-dependent stuff to uapi/asm/kvm.h
* Always flush the async page fault workqueue when a work item is being
removed, especially during vCPU destruction, to ensure that there are no
workers running in KVM code when all references to KVM-the-module are gone,
i.e. to prevent a very unlikely use-after-free if kvm.ko is unloaded.
* Grab a reference to the VM's mm_struct in the async #PF worker itself instead
of gifting the worker a reference, so that there's no need to remember
to *conditionally* clean up after the worker.
Selftests:
* Reduce boilerplate especially when utilize selftest TAP infrastructure.
* Add basic smoke tests for SEV and SEV-ES, along with a pile of library
support for handling private/encrypted/protected memory.
* Fix benign bugs where tests neglect to close() guest_memfd files.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmX0iP8UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroND7wf+JZoNvwZ+bmwWe/4jn/YwNoYi/C5z
eypn8M1gsWEccpCpqPBwznVm9T29rF4uOlcMvqLEkHfTpaL1EKUUjP1lXPz/ileP
6a2RdOGxAhyTiFC9fjy+wkkjtLbn1kZf6YsS0hjphP9+w0chNbdn0w81dFVnXryd
j7XYI8R/bFAthNsJOuZXSEjCfIHxvTTG74OrTf1B1FEBB+arPmrgUeJftMVhffQK
Sowgg8L/Ii/x6fgV5NZQVSIyVf1rp8z7c6UaHT4Fwb0+RAMW8p9pYv9Qp1YkKp8y
5j0V9UzOHP7FRaYimZ5BtwQoqiZXYylQ+VuU/Y2f4X85cvlLzSqxaEMAPA==
=mqOV
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"S390:
- Changes to FPU handling came in via the main s390 pull request
- Only deliver to the guest the SCLP events that userspace has
requested
- More virtual vs physical address fixes (only a cleanup since
virtual and physical address spaces are currently the same)
- Fix selftests undefined behavior
x86:
- Fix a restriction that the guest can't program a PMU event whose
encoding matches an architectural event that isn't included in the
guest CPUID. The enumeration of an architectural event only says
that if a CPU supports an architectural event, then the event can
be programmed *using the architectural encoding*. The enumeration
does NOT say anything about the encoding when the CPU doesn't
report support the event *in general*. It might support it, and it
might support it using the same encoding that made it into the
architectural PMU spec
- Fix a variety of bugs in KVM's emulation of RDPMC (more details on
individual commits) and add a selftest to verify KVM correctly
emulates RDMPC, counter availability, and a variety of other
PMC-related behaviors that depend on guest CPUID and therefore are
easier to validate with selftests than with custom guests (aka
kvm-unit-tests)
- Zero out PMU state on AMD if the virtual PMU is disabled, it does
not cause any bug but it wastes time in various cases where KVM
would check if a PMC event needs to be synthesized
- Optimize triggering of emulated events, with a nice ~10%
performance improvement in VM-Exit microbenchmarks when a vPMU is
exposed to the guest
- Tighten the check for "PMI in guest" to reduce false positives if
an NMI arrives in the host while KVM is handling an IRQ VM-Exit
- Fix a bug where KVM would report stale/bogus exit qualification
information when exiting to userspace with an internal error exit
code
- Add a VMX flag in /proc/cpuinfo to report 5-level EPT support
- Rework TDP MMU root unload, free, and alloc to run with mmu_lock
held for read, e.g. to avoid serializing vCPUs when userspace
deletes a memslot
- Tear down TDP MMU page tables at 4KiB granularity (used to be
1GiB). KVM doesn't support yielding in the middle of processing a
zap, and 1GiB granularity resulted in multi-millisecond lags that
are quite impolite for CONFIG_PREEMPT kernels
- Allocate write-tracking metadata on-demand to avoid the memory
overhead when a kernel is built with i915 virtualization support
but the workloads use neither shadow paging nor i915 virtualization
- Explicitly initialize a variety of on-stack variables in the
emulator that triggered KMSAN false positives
- Fix the debugregs ABI for 32-bit KVM
- Rework the "force immediate exit" code so that vendor code
ultimately decides how and when to force the exit, which allowed
some optimization for both Intel and AMD
- Fix a long-standing bug where kvm_has_noapic_vcpu could be left
elevated if vCPU creation ultimately failed, causing extra
unnecessary work
- Cleanup the logic for checking if the currently loaded vCPU is
in-kernel
- Harden against underflowing the active mmu_notifier invalidation
count, so that "bad" invalidations (usually due to bugs elsehwere
in the kernel) are detected earlier and are less likely to hang the
kernel
x86 Xen emulation:
- Overlay pages can now be cached based on host virtual address,
instead of guest physical addresses. This removes the need to
reconfigure and invalidate the cache if the guest changes the gpa
but the underlying host virtual address remains the same
- When possible, use a single host TSC value when computing the
deadline for Xen timers in order to improve the accuracy of the
timer emulation
- Inject pending upcall events when the vCPU software-enables its
APIC to fix a bug where an upcall can be lost (and to follow Xen's
behavior)
- Fall back to the slow path instead of warning if "fast" IRQ
delivery of Xen events fails, e.g. if the guest has aliased xAPIC
IDs
RISC-V:
- Support exception and interrupt handling in selftests
- New self test for RISC-V architectural timer (Sstc extension)
- New extension support (Ztso, Zacas)
- Support userspace emulation of random number seed CSRs
ARM:
- Infrastructure for building KVM's trap configuration based on the
architectural features (or lack thereof) advertised in the VM's ID
registers
- Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to
x86's WC) at stage-2, improving the performance of interacting with
assigned devices that can tolerate it
- Conversion of KVM's representation of LPIs to an xarray, utilized
to address serialization some of the serialization on the LPI
injection path
- Support for _architectural_ VHE-only systems, advertised through
the absence of FEAT_E2H0 in the CPU's ID register
- Miscellaneous cleanups, fixes, and spelling corrections to KVM and
selftests
LoongArch:
- Set reserved bits as zero in CPUCFG
- Start SW timer only when vcpu is blocking
- Do not restart SW timer when it is expired
- Remove unnecessary CSR register saving during enter guest
- Misc cleanups and fixes as usual
Generic:
- Clean up Kconfig by removing CONFIG_HAVE_KVM, which was basically
always true on all architectures except MIPS (where Kconfig
determines the available depending on CPU capabilities). It is
replaced either by an architecture-dependent symbol for MIPS, and
IS_ENABLED(CONFIG_KVM) everywhere else
- Factor common "select" statements in common code instead of
requiring each architecture to specify it
- Remove thoroughly obsolete APIs from the uapi headers
- Move architecture-dependent stuff to uapi/asm/kvm.h
- Always flush the async page fault workqueue when a work item is
being removed, especially during vCPU destruction, to ensure that
there are no workers running in KVM code when all references to
KVM-the-module are gone, i.e. to prevent a very unlikely
use-after-free if kvm.ko is unloaded
- Grab a reference to the VM's mm_struct in the async #PF worker
itself instead of gifting the worker a reference, so that there's
no need to remember to *conditionally* clean up after the worker
Selftests:
- Reduce boilerplate especially when utilize selftest TAP
infrastructure
- Add basic smoke tests for SEV and SEV-ES, along with a pile of
library support for handling private/encrypted/protected memory
- Fix benign bugs where tests neglect to close() guest_memfd files"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
selftests: kvm: remove meaningless assignments in Makefiles
KVM: riscv: selftests: Add Zacas extension to get-reg-list test
RISC-V: KVM: Allow Zacas extension for Guest/VM
KVM: riscv: selftests: Add Ztso extension to get-reg-list test
RISC-V: KVM: Allow Ztso extension for Guest/VM
RISC-V: KVM: Forward SEED CSR access to user space
KVM: riscv: selftests: Add sstc timer test
KVM: riscv: selftests: Change vcpu_has_ext to a common function
KVM: riscv: selftests: Add guest helper to get vcpu id
KVM: riscv: selftests: Add exception handling support
LoongArch: KVM: Remove unnecessary CSR register saving during enter guest
LoongArch: KVM: Do not restart SW timer when it is expired
LoongArch: KVM: Start SW timer only when vcpu is blocking
LoongArch: KVM: Set reserved bits as zero in CPUCFG
KVM: selftests: Explicitly close guest_memfd files in some gmem tests
KVM: x86/xen: fix recursive deadlock in timer injection
KVM: pfncache: simplify locking and make more self-contained
KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery
KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled
KVM: x86/xen: improve accuracy of Xen timers
...
The kernel CONFIG_UNWINDER_ORC option enables the ORC unwinder, which is
similar in concept to a DWARF unwinder. The difference is that the format
of the ORC data is much simpler than DWARF, which in turn allows the ORC
unwinder to be much simpler and faster.
The ORC data consists of unwind tables which are generated by objtool.
After analyzing all the code paths of a .o file, it determines information
about the stack state at each instruction address in the file and outputs
that information to the .orc_unwind and .orc_unwind_ip sections.
The per-object ORC sections are combined at link time and are sorted and
post-processed at boot time. The unwinder uses the resulting data to
correlate instruction addresses with their stack states at run time.
Most of the logic are similar with x86, in order to get ra info before ra
is saved into stack, add ra_reg and ra_offset into orc_entry. At the same
time, modify some arch-specific code to silence the objtool warnings.
Co-developed-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Co-developed-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
- Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to
avoid creating ABI that KVM can't sanely support.
- Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
clear that such VMs are purely a development and testing vehicle, and
come with zero guarantees.
- Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan
is to support confidential VMs with deterministic private memory (SNP
and TDX) only in the TDP MMU.
- Fix a bug in a GUEST_MEMFD negative test that resulted in false passes
when verifying that KVM_MEM_GUEST_MEMFD memslots can't be dirty logged.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKTobbabEP7vbhhN9OlYIJqCjN/0FAmXZB/8ACgkQOlYIJqCj
N/3XlQ//RIsvqr38k7kELSKhCMyWgF4J57itABrHpMqAZu3gaAo5sETX8AGcHEe5
mxmquxyNQSf4cthhWy1kzxjGCy6+fk+Z0Z7wzfz0Yd5D+FI6vpo3HhkjovLb2gpt
kSrHuhJyuj2vkftNvdaz0nHX1QalVyIEnXnR3oqTmxUUsg6lp1x/zr5SP0KBXjo8
ZzJtyFd0fkRXWpA792T7XPRBWrzPV31HYZBLX8sPlYmJATcbIx9rYSThgCN6XuVN
bfE6wATsC+mwv5BpCoDFpCKmFcqSqamag9NGe5qE5mOby5DQGYTCRMCQB8YXXBR0
97ppaY9ZJV4nOVjrYJn6IMOSMVNfoG7nTRFfcd0eFP4tlPEgHwGr5BGDaBtQPkrd
KcgWJw8nS02eCA2iOE+FtCXvGJwKhTTjQ45w7rU4EcfUk603L5J4GO1ddmjMhPcP
upGGcWDK9vCGrSUFTm8pyWp/NKRJPvAQEiQd/BweSk9+isQHTX2RYCQgPAQnwlTS
wTg7ZPNSLoUkRYmd6r+TUT32ELJGNc8GLftMnxIwweq6V7AgNMi0HE60eMovuBNO
7DAWWzfBEZmJv+0mNNZPGXczHVv4YvMWysRdKkhztBc3+sO7P3AL1zWIDlm5qwoG
LpFeeI3qo3o5ZNaqGzkSop2pUUGNGpWCH46WmP0AG7RpzW/Natw=
=M0td
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-guest_memfd_fixes-6.8' of https://github.com/kvm-x86/linux into HEAD
KVM GUEST_MEMFD fixes for 6.8:
- Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to
avoid creating ABI that KVM can't sanely support.
- Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
clear that such VMs are purely a development and testing vehicle, and
come with zero guarantees.
- Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan
is to support confidential VMs with deterministic private memory (SNP
and TDX) only in the TDP MMU.
- Fix a bug in a GUEST_MEMFD negative test that resulted in false passes
when verifying that KVM_MEM_GUEST_MEMFD memslots can't be dirty logged.
These four architectures define the same Kconfig symbols for configuring
the page size. Move the logic into a common place where it can be shared
with all other architectures.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
We've had issues with gcc and 'asm goto' before, and we created a
'asm_volatile_goto()' macro for that in the past: see commits
3f0116c3238a ("compiler/gcc4: Add quirk for 'asm goto' miscompilation
bug") and a9f180345f53 ("compiler/gcc4: Make quirk for
asm_volatile_goto() unconditional").
Then, much later, we ended up removing the workaround in commit
43c249ea0b1e ("compiler-gcc.h: remove ancient workaround for gcc PR
58670") because we no longer supported building the kernel with the
affected gcc versions, but we left the macro uses around.
Now, Sean Christopherson reports a new version of a very similar
problem, which is fixed by re-applying that ancient workaround. But the
problem in question is limited to only the 'asm goto with outputs'
cases, so instead of re-introducing the old workaround as-is, let's
rename and limit the workaround to just that much less common case.
It looks like there are at least two separate issues that all hit in
this area:
(a) some versions of gcc don't mark the asm goto as 'volatile' when it
has outputs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420
which is easy to work around by just adding the 'volatile' by hand.
(b) Internal compiler errors:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422
which are worked around by adding the extra empty 'asm' as a
barrier, as in the original workaround.
but the problem Sean sees may be a third thing since it involves bad
code generation (not an ICE) even with the manually added 'volatile'.
but the same old workaround works for this case, even if this feels a
bit like voodoo programming and may only be hiding the issue.
Reported-and-tested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Andrew Pinski <quic_apinski@quicinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KVM uses __KVM_HAVE_* symbols in the architecture-dependent uapi/asm/kvm.h to mask
unused definitions in include/uapi/linux/kvm.h. __KVM_HAVE_READONLY_MEM however
was nothing but a misguided attempt to define KVM_CAP_READONLY_MEM only on
architectures where KVM_CHECK_EXTENSION(KVM_CAP_READONLY_MEM) could possibly
return nonzero. This however does not make sense, and it prevented userspace
from supporting this architecture-independent feature without recompilation.
Therefore, these days __KVM_HAVE_READONLY_MEM does not mask anything and
is only used in virt/kvm/kvm_main.c. Userspace does not need to test it
and there should be no need for it to exist. Remove it and replace it
with a Kconfig symbol within Linux source code.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The stubs for kvm_own/lsx()/kvm_own_lasx() when CONFIG_CPU_HAS_LSX or
CONFIG_CPU_HAS_LASX is not defined should have a return value since they
return an int, so add "return -EINVAL;" to the stubs.
Fixes the build error:
In file included from ../arch/loongarch/include/asm/kvm_csr.h:12,
from ../arch/loongarch/kvm/interrupt.c:8:
../arch/loongarch/include/asm/kvm_vcpu.h: In function 'kvm_own_lasx':
../arch/loongarch/include/asm/kvm_vcpu.h:73:39: error: no return statement in function returning non-void [-Werror=return-type]
73 | static inline int kvm_own_lasx(struct kvm_vcpu *vcpu) { }
Fixes: db1ecca22edf ("LoongArch: KVM: Add LSX (128bit SIMD) support")
Fixes: 118e10cd893d ("LoongArch: KVM: Add LASX (256bit SIMD) support")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
1, Raise minimum clang version to 18.0.0;
2, Enable initial Rust support for LoongArch;
3, Add built-in dtb support for LoongArch;
4, Use generic interface to support crashkernel=X,[high,low];
5, Some bug fixes and other small changes;
6, Update the default config file.
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmWnW9cWHGNoZW5odWFj
YWlAa2VybmVsLm9yZwAKCRAChivD8uImel3CD/0Wnd2VOhoPubJkCXd+v7SdPDFB
+BlkevAdmKQXkxNVXHRwfirsEBnUdQTfSN/5hMd69ZWUTayYq3WFxOcaPs27AAyn
cXmGAzxfCjanSj+zxK8Gcmef5kppx3PRSbFdnWgc42Povu0xTOH3M31HXx5WXGtv
hZK439DspNGHlF1Bsbs3J8xbS76jc/HDZAqnIjLuefQUaWM8nhsYxJIwVeGKUX1T
IyEgBwhHhsY9ho/86yk8VXgordAN4dnMVmAHbR63HqjLo/8sck4IiPNxWKFCHex8
vgxp0zGxfBBts284EfSofDQHrSrrWl4+e2fW2QJ81BBDSS0wPCs4TAnzH+x9X7Wb
MJuh8WIJqhfXdPFxs5fdnUeykEm1V/oWFfkWORk4jbQkpY9aZbk/iv6uxsmRhmhv
2WPWvjF+7B2zSXtMcjgm71ymb/nU95W2FZO02GlwTnbGJRKA2xLkjn9rCXoHWjd3
IlxgIgZJ1vkPvFPS/sbekaTUEG+6/qTPGGa2Ol3Q5ZTTLk9serfDa8ay1xCZeOny
+fRBgLsuQAOGO2pvxfXjs+uvboZNUHeKrAi7XeR61GcbNpQDkjuwNJXQMiMQ+f66
jWM6H+hV+6sQ/W43KVrGCyBqTX4J9PSN/gX/Cq0PL74Yheop6neYXZTl5uDNYDe9
WYxiS9j/FoYgj8lxYQ==
=GzFR
-----END PGP SIGNATURE-----
Merge tag 'loongarch-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- Raise minimum clang version to 18.0.0
- Enable initial Rust support for LoongArch
- Add built-in dtb support for LoongArch
- Use generic interface to support crashkernel=X,[high,low]
- Some bug fixes and other small changes
- Update the default config file.
* tag 'loongarch-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (22 commits)
MAINTAINERS: Add BPF JIT for LOONGARCH entry
LoongArch: Update Loongson-3 default config file
LoongArch: BPF: Prevent out-of-bounds memory access
LoongArch: BPF: Support 64-bit pointers to kfuncs
LoongArch: Fix definition of ftrace_regs_set_instruction_pointer()
LoongArch: Use generic interface to support crashkernel=X,[high,low]
LoongArch: Fix and simplify fcsr initialization on execve()
LoongArch: Let cores_io_master cover the largest NR_CPUS
LoongArch: Change SHMLBA from SZ_64K to PAGE_SIZE
LoongArch: Add a missing call to efi_esrt_init()
LoongArch: Parsing CPU-related information from DTS
LoongArch: dts: DeviceTree for Loongson-2K2000
LoongArch: dts: DeviceTree for Loongson-2K1000
LoongArch: dts: DeviceTree for Loongson-2K0500
LoongArch: Allow device trees be built into the kernel
dt-bindings: interrupt-controller: loongson,liointc: Fix dtbs_check warning for interrupt-names
dt-bindings: interrupt-controller: loongson,liointc: Fix dtbs_check warning for reg-names
dt-bindings: loongarch: Add Loongson SoC boards compatibles
dt-bindings: loongarch: Add CPU bindings for LoongArch
LoongArch: Enable initial Rust support
...
- Use memdup_array_user() to harden against overflow.
- Unconditionally advertise KVM_CAP_DEVICE_CTRL for all architectures.
- Clean up Kconfigs that all KVM architectures were selecting
- New functionality around "guest_memfd", a new userspace API that
creates an anonymous file and returns a file descriptor that refers
to it. guest_memfd files are bound to their owning virtual machine,
cannot be mapped, read, or written by userspace, and cannot be resized.
guest_memfd files do however support PUNCH_HOLE, which can be used to
switch a memory area between guest_memfd and regular anonymous memory.
- New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
per-page attributes for a given page of guest memory; right now the
only attribute is whether the guest expects to access memory via
guest_memfd or not, which in Confidential SVMs backed by SEV-SNP,
TDX or ARM64 pKVM is checked by firmware or hypervisor that guarantees
confidentiality (AMD PSP, Intel TDX module, or EL2 in the case of pKVM).
x86:
- Support for "software-protected VMs" that can use the new guest_memfd
and page attributes infrastructure. This is mostly useful for testing,
since there is no pKVM-like infrastructure to provide a meaningfully
reduced TCB.
- Fix a relatively benign off-by-one error when splitting huge pages during
CLEAR_DIRTY_LOG.
- Fix a bug where KVM could incorrectly test-and-clear dirty bits in non-leaf
TDP MMU SPTEs if a racing thread replaces a huge SPTE with a non-huge SPTE.
- Use more generic lockdep assertions in paths that don't actually care
about whether the caller is a reader or a writer.
- let Xen guests opt out of having PV clock reported as "based on a stable TSC",
because some of them don't expect the "TSC stable" bit (added to the pvclock
ABI by KVM, but never set by Xen) to be set.
- Revert a bogus, made-up nested SVM consistency check for TLB_CONTROL.
- Advertise flush-by-ASID support for nSVM unconditionally, as KVM always
flushes on nested transitions, i.e. always satisfies flush requests. This
allows running bleeding edge versions of VMware Workstation on top of KVM.
- Sanity check that the CPU supports flush-by-ASID when enabling SEV support.
- On AMD machines with vNMI, always rely on hardware instead of intercepting
IRET in some cases to detect unmasking of NMIs
- Support for virtualizing Linear Address Masking (LAM)
- Fix a variety of vPMU bugs where KVM fail to stop/reset counters and other state
prior to refreshing the vPMU model.
- Fix a double-overflow PMU bug by tracking emulated counter events using a
dedicated field instead of snapshotting the "previous" counter. If the
hardware PMC count triggers overflow that is recognized in the same VM-Exit
that KVM manually bumps an event count, KVM would pend PMIs for both the
hardware-triggered overflow and for KVM-triggered overflow.
- Turn off KVM_WERROR by default for all configs so that it's not
inadvertantly enabled by non-KVM developers, which can be problematic for
subsystems that require no regressions for W=1 builds.
- Advertise all of the host-supported CPUID bits that enumerate IA32_SPEC_CTRL
"features".
- Don't force a masterclock update when a vCPU synchronizes to the current TSC
generation, as updating the masterclock can cause kvmclock's time to "jump"
unexpectedly, e.g. when userspace hotplugs a pre-created vCPU.
- Use RIP-relative address to read kvm_rebooting in the VM-Enter fault paths,
partly as a super minor optimization, but mostly to make KVM play nice with
position independent executable builds.
- Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
CONFIG_HYPERV as a minor optimization, and to self-document the code.
- Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV "emulation"
at build time.
ARM64:
- LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB
base granule sizes. Branch shared with the arm64 tree.
- Large Fine-Grained Trap rework, bringing some sanity to the
feature, although there is more to come. This comes with
a prefix branch shared with the arm64 tree.
- Some additional Nested Virtualization groundwork, mostly
introducing the NV2 VNCR support and retargetting the NV
support to that version of the architecture.
- A small set of vgic fixes and associated cleanups.
Loongarch:
- Optimization for memslot hugepage checking
- Cleanup and fix some HW/SW timer issues
- Add LSX/LASX (128bit/256bit SIMD) support
RISC-V:
- KVM_GET_REG_LIST improvement for vector registers
- Generate ISA extension reg_list using macros in get-reg-list selftest
- Support for reporting steal time along with selftest
s390:
- Bugfixes
Selftests:
- Fix an annoying goof where the NX hugepage test prints out garbage
instead of the magic token needed to run the test.
- Fix build errors when a header is delete/moved due to a missing flag
in the Makefile.
- Detect if KVM bugged/killed a selftest's VM and print out a helpful
message instead of complaining that a random ioctl() failed.
- Annotate the guest printf/assert helpers with __printf(), and fix the
various bugs that were lurking due to lack of said annotation.
There are two non-KVM patches buried in the middle of guest_memfd support:
fs: Rename anon_inode_getfile_secure() and anon_inode_getfd_secure()
mm: Add AS_UNMOVABLE to mark mapping as completely unmovable
The first is small and mostly suggested-by Christian Brauner; the second
a bit less so but it was written by an mm person (Vlastimil Babka).
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmWcMWkUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroO15gf/WLmmg3SET6Uzw9iEq2xo28831ZA+
6kpILfIDGKozV5safDmMvcInlc/PTnqOFrsKyyN4kDZ+rIJiafJdg/loE0kPXBML
wdR+2ix5kYI1FucCDaGTahskBDz8Lb/xTpwGg9BFLYFNmuUeHc74o6GoNvr1uliE
4kLZL2K6w0cSMPybUD+HqGaET80ZqPwecv+s1JL+Ia0kYZJONJifoHnvOUJ7DpEi
rgudVdgzt3EPjG0y1z6MjvDBXTCOLDjXajErlYuZD3Ej8N8s59Dh2TxOiDNTLdP4
a4zjRvDmgyr6H6sz+upvwc7f4M4p+DBvf+TkWF54mbeObHUYliStqURIoA==
=66Ws
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"Generic:
- Use memdup_array_user() to harden against overflow.
- Unconditionally advertise KVM_CAP_DEVICE_CTRL for all
architectures.
- Clean up Kconfigs that all KVM architectures were selecting
- New functionality around "guest_memfd", a new userspace API that
creates an anonymous file and returns a file descriptor that refers
to it. guest_memfd files are bound to their owning virtual machine,
cannot be mapped, read, or written by userspace, and cannot be
resized. guest_memfd files do however support PUNCH_HOLE, which can
be used to switch a memory area between guest_memfd and regular
anonymous memory.
- New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
per-page attributes for a given page of guest memory; right now the
only attribute is whether the guest expects to access memory via
guest_memfd or not, which in Confidential SVMs backed by SEV-SNP,
TDX or ARM64 pKVM is checked by firmware or hypervisor that
guarantees confidentiality (AMD PSP, Intel TDX module, or EL2 in
the case of pKVM).
x86:
- Support for "software-protected VMs" that can use the new
guest_memfd and page attributes infrastructure. This is mostly
useful for testing, since there is no pKVM-like infrastructure to
provide a meaningfully reduced TCB.
- Fix a relatively benign off-by-one error when splitting huge pages
during CLEAR_DIRTY_LOG.
- Fix a bug where KVM could incorrectly test-and-clear dirty bits in
non-leaf TDP MMU SPTEs if a racing thread replaces a huge SPTE with
a non-huge SPTE.
- Use more generic lockdep assertions in paths that don't actually
care about whether the caller is a reader or a writer.
- let Xen guests opt out of having PV clock reported as "based on a
stable TSC", because some of them don't expect the "TSC stable" bit
(added to the pvclock ABI by KVM, but never set by Xen) to be set.
- Revert a bogus, made-up nested SVM consistency check for
TLB_CONTROL.
- Advertise flush-by-ASID support for nSVM unconditionally, as KVM
always flushes on nested transitions, i.e. always satisfies flush
requests. This allows running bleeding edge versions of VMware
Workstation on top of KVM.
- Sanity check that the CPU supports flush-by-ASID when enabling SEV
support.
- On AMD machines with vNMI, always rely on hardware instead of
intercepting IRET in some cases to detect unmasking of NMIs
- Support for virtualizing Linear Address Masking (LAM)
- Fix a variety of vPMU bugs where KVM fail to stop/reset counters
and other state prior to refreshing the vPMU model.
- Fix a double-overflow PMU bug by tracking emulated counter events
using a dedicated field instead of snapshotting the "previous"
counter. If the hardware PMC count triggers overflow that is
recognized in the same VM-Exit that KVM manually bumps an event
count, KVM would pend PMIs for both the hardware-triggered overflow
and for KVM-triggered overflow.
- Turn off KVM_WERROR by default for all configs so that it's not
inadvertantly enabled by non-KVM developers, which can be
problematic for subsystems that require no regressions for W=1
builds.
- Advertise all of the host-supported CPUID bits that enumerate
IA32_SPEC_CTRL "features".
- Don't force a masterclock update when a vCPU synchronizes to the
current TSC generation, as updating the masterclock can cause
kvmclock's time to "jump" unexpectedly, e.g. when userspace
hotplugs a pre-created vCPU.
- Use RIP-relative address to read kvm_rebooting in the VM-Enter
fault paths, partly as a super minor optimization, but mostly to
make KVM play nice with position independent executable builds.
- Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
CONFIG_HYPERV as a minor optimization, and to self-document the
code.
- Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV
"emulation" at build time.
ARM64:
- LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base
granule sizes. Branch shared with the arm64 tree.
- Large Fine-Grained Trap rework, bringing some sanity to the
feature, although there is more to come. This comes with a prefix
branch shared with the arm64 tree.
- Some additional Nested Virtualization groundwork, mostly
introducing the NV2 VNCR support and retargetting the NV support to
that version of the architecture.
- A small set of vgic fixes and associated cleanups.
Loongarch:
- Optimization for memslot hugepage checking
- Cleanup and fix some HW/SW timer issues
- Add LSX/LASX (128bit/256bit SIMD) support
RISC-V:
- KVM_GET_REG_LIST improvement for vector registers
- Generate ISA extension reg_list using macros in get-reg-list
selftest
- Support for reporting steal time along with selftest
s390:
- Bugfixes
Selftests:
- Fix an annoying goof where the NX hugepage test prints out garbage
instead of the magic token needed to run the test.
- Fix build errors when a header is delete/moved due to a missing
flag in the Makefile.
- Detect if KVM bugged/killed a selftest's VM and print out a helpful
message instead of complaining that a random ioctl() failed.
- Annotate the guest printf/assert helpers with __printf(), and fix
the various bugs that were lurking due to lack of said annotation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (185 commits)
x86/kvm: Do not try to disable kvmclock if it was not enabled
KVM: x86: add missing "depends on KVM"
KVM: fix direction of dependency on MMU notifiers
KVM: introduce CONFIG_KVM_COMMON
KVM: arm64: Add missing memory barriers when switching to pKVM's hyp pgd
KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache
RISC-V: KVM: selftests: Add get-reg-list test for STA registers
RISC-V: KVM: selftests: Add steal_time test support
RISC-V: KVM: selftests: Add guest_sbi_probe_extension
RISC-V: KVM: selftests: Move sbi_ecall to processor.c
RISC-V: KVM: Implement SBI STA extension
RISC-V: KVM: Add support for SBI STA registers
RISC-V: KVM: Add support for SBI extension registers
RISC-V: KVM: Add SBI STA info to vcpu_arch
RISC-V: KVM: Add steal-update vcpu request
RISC-V: KVM: Add SBI STA extension skeleton
RISC-V: paravirt: Implement steal-time support
RISC-V: Add SBI STA extension definitions
RISC-V: paravirt: Add skeleton for pv-time support
RISC-V: KVM: Fix indentation in kvm_riscv_vcpu_set_reg_csr()
...
The current definition of ftrace_regs_set_instruction_pointer() is not
correct. Obviously, this function is used to set instruction pointer but
not return value, so it should call instruction_pointer_set() instead of
regs_set_return_value().
There is no side effect by now because it is only used for kernel live-
patching which is not supported, so fix it to avoid failure when testing
livepatch in the future.
Fixes: 6fbff14a6382 ("LoongArch: ftrace: Abstract DYNAMIC_FTRACE_WITH_ARGS accesses")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
LoongArch already supports two crashkernel regions in kexec-tools, so we
can directly use the common interface to support crashkernel=X,[high,low]
after commit 0ab97169aa0517079b ("crash_core: add generic function to do
reservation").
With the help of newly changed function parse_crashkernel() and generic
reserve_crashkernel_generic(), crashkernel reservation can be simplified
by steps:
1) Add a new header file <asm/crash_core.h>, then define CRASH_ALIGN,
CRASH_ADDR_LOW_MAX and CRASH_ADDR_HIGH_MAX and in <asm/crash_core.h>;
2) Add arch_reserve_crashkernel() to call parse_crashkernel() and
reserve_crashkernel_generic();
3) Add ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION Kconfig in
arch/loongarch/Kconfig.
One can reserve the crash kernel from high memory above DMA zone range
by explicitly passing "crashkernel=X,high"; or reserve a memory range
below 4G with "crashkernel=X,low". Besides, there are few rules need to
take notice:
1) "crashkernel=X,[high,low]" will be ignored if "crashkernel=size" is
specified.
2) "crashkernel=X,low" is valid only when "crashkernel=X,high" is passed
and there is enough memory to be allocated under 4G.
3) When allocating crashkernel above 4G and no "crashkernel=X,low" is
specified, a 128M low memory will be allocated automatically for
swiotlb bounce buffer.
See Documentation/admin-guide/kernel-parameters.txt for more information.
Following test cases have been performed as expected:
1) crashkernel=256M //low=256M
2) crashkernel=1G //low=1G
3) crashkernel=4G //high=4G, low=128M(default)
4) crashkernel=4G crashkernel=256M,high //high=4G, low=128M(default), high is ignored
5) crashkernel=4G crashkernel=256M,low //high=4G, low=128M(default), low is ignored
6) crashkernel=4G,high //high=4G, low=128M(default)
7) crashkernel=256M,low //low=0M, invalid
8) crashkernel=4G,high crashkernel=256M,low //high=4G, low=256M
9) crashkernel=4G,high crashkernel=4G,low //high=0M, low=0M, invalid
10) crashkernel=512M@2560M //low=512M
11) crashkernel=1G,high crashkernel=0M,low //high=1G, low=0M
Recommended usage in general:
1) In the case of small memory: crashkernel=512M
2) In the case of large memory: crashkernel=1024M,high crashkernel=128M,low
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
There has been a lingering bug in LoongArch Linux systems causing some
GCC tests to intermittently fail (see Closes link). I've made a minimal
reproducer:
zsh% cat measure.s
.align 4
.globl _start
_start:
movfcsr2gr $a0, $fcsr0
bstrpick.w $a0, $a0, 16, 16
beqz $a0, .ok
break 0
.ok:
li.w $a7, 93
syscall 0
zsh% cc mesaure.s -o measure -nostdlib
zsh% echo $((1.0/3))
0.33333333333333331
zsh% while ./measure; do ; done
This while loop should not stop as POSIX is clear that execve must set
fenv to the default, where FCSR should be zero. But in fact it will
just stop after running for a while (normally less than 30 seconds).
Note that "$((1.0/3))" is needed to reproduce this issue because it
raises FE_INVALID and makes fcsr0 non-zero.
The problem is we are currently relying on SET_PERSONALITY2() to reset
current->thread.fpu.fcsr. But SET_PERSONALITY2() is executed before
start_thread which calls lose_fpu(0). We can see if kernel preempt is
enabled, we may switch to another thread after SET_PERSONALITY2() but
before lose_fpu(0). Then bad thing happens: during the thread switch
the value of the fcsr0 register is stored into current->thread.fpu.fcsr,
making it dirty again.
The issue can be fixed by setting current->thread.fpu.fcsr after
lose_fpu(0) because lose_fpu() clears TIF_USEDFPU, then the thread
switch won't touch current->thread.fpu.fcsr.
The only other architecture setting FCSR in SET_PERSONALITY2() is MIPS.
I've ran a similar test on MIPS with mainline kernel and it turns out
MIPS is buggy, too. Anyway MIPS do this for supporting different FP
flavors (NaN encodings, etc.) which do not exist on LoongArch. So for
LoongArch, we can simply remove the current->thread.fpu.fcsr setting
from SET_PERSONALITY2() and do it in start_thread(), after lose_fpu(0).
The while loop failing with the mainline kernel has survived one hour
after this change on LoongArch.
Fixes: 803b0fc5c3f2baa ("LoongArch: Add process management")
Closes: https://github.com/loongson-community/discussions/issues/7
Link: https://lore.kernel.org/linux-mips/7a6aa1bbdbbe2e63ae96ff163fab0349f58f1b9e.camel@xry111.site/
Cc: stable@vger.kernel.org
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Now loongson_system_configuration::cores_io_master only covers 64 cpus,
if NR_CPUS > 64 there will be memory corruption. So let cores_io_master
cover the largest NR_CPUS (256).
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
LoongArch has hardware page coloring for L1 Cache, so we don't have
cache aliases. But SFB (Store Fill Buffer) still has aliases. So we
define SHMLBA to SZ_64K previously. But there are losts of applications
use PAGE_SIZE rather than SHMLBA to mmap() file pages and shared pages.
Of course we can fix them one by one, but not easy.
On the other hand, we can simply disable SFB for 4KB page size to fix
cache alias (there will be performance decrease, but acceptable), and
in future we will fix SFB in hardware. So we can safely define SHMLBA to
PAGE_SIZE (use the generic shmparam.h) to make life easier.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
- Fix a syzbot reported issue in efivarfs where concurrent accesses to
the file system resulted in list corruption
- Add support for accessing EFI variables via the TEE subsystem (and a
trusted application in the secure world) instead of via EFI runtime
firmware running in the OS's execution context
- Avoid linker tricks to discover the image base on LoongArch
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQQm/3uucuRGn1Dmh0wbglWLn0tXAUCZYVaHQAKCRAwbglWLn0t
XPm/AQDzX9A6TND00eOLYYWw91kybHnzrVd8GRKOv2EIxGz33AEAgW6nXIJlBRax
MBq6S/sXdyknuCC3sO7H9FexdD4BzQM=
=MZUx
-----END PGP SIGNATURE-----
Merge tag 'efi-next-for-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
- Fix a syzbot reported issue in efivarfs where concurrent accesses to
the file system resulted in list corruption
- Add support for accessing EFI variables via the TEE subsystem (and a
trusted application in the secure world) instead of via EFI runtime
firmware running in the OS's execution context
- Avoid linker tricks to discover the image base on LoongArch
* tag 'efi-next-for-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: memmap: fix kernel-doc warnings
efi/loongarch: Directly position the loaded image file
efivarfs: automatically update super block flag
efi: Add tee-based EFI variable driver
efi: Add EFI_ACCESS_DENIED status code
efi: expose efivar generic ops register function
efivarfs: Move efivarfs list into superblock s_fs_info
efivarfs: Free s_fs_info on unmount
efivarfs: Move efivar availability check into FS context init
efivarfs: force RO when remounting if SetVariable is not supported