9813 Commits

Author SHA1 Message Date
Baoquan He
865e2acd3e s390, crash: wrap crash dumping code into crash related ifdefs
Now that the crash code under the kernel/ folder has been split out from
the kexec code, crash dumping can be separated from kexec reboot in the
config items on s390 with some adjustments.

Here, wrap the crash dumping code in CONFIG_CRASH_DUMP ifdeffery.
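A minimal sketch of the pattern being applied; this is illustrative only
(the function name is hypothetical, not an actual hunk from this patch):

  /* header pattern: crash dumping code compiled out without CONFIG_CRASH_DUMP */
  #ifdef CONFIG_CRASH_DUMP
  void do_crash_dump_setup(void);
  #else
  static inline void do_crash_dump_setup(void) { }
  #endif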

Link: https://lkml.kernel.org/r/20240124051254.67105-10-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Pingfan Liu <piliu@redhat.com>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Michael Kelley <mhklinux@outlook.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23 17:48:23 -08:00
Linus Torvalds
5efa18e862 s390 fixes for 6.8-rc6
- Fix invalid -EBUSY on ccw_device_start() which can lead to failing
   device initialization
 
 - Add missing multiplication by 8 in __iowrite64_copy() to get the
   correct byte length before calling zpci_memcpy_toio()
 
 - Various config updates
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmXYvSsACgkQIg7DeRsp
 bsIkmw/+JABfYGH1/z0E3g7PSUrsbQkRiINyzmsv57KmIzMVNAhQoYSWiRiCqjVs
 /XB2u2vgkVSZL+29vJCd178MnKK/LiN7VavbmNrYJxpVmgSsM0cc/CjGGf71lh2b
 z25YbmtXGOA6Dgg+1KY0sFouMDzCT82imvKj5dvE0+NB4PBevslNGdY7qZroUGXg
 mcvAA44qkfv61TaaMkt7o/9Mz5SJZ6byR3kacz6twpyGfbIboeQqCkejovit/Yb2
 IlqtbMoWkCmO0ULRr96Pl156Ja8guCYiGvTjjL51DDeLymkiN9z6lNblOqRy0GP6
 +YZKtbU3vdmdaVgk1Bp/ez1frjeteLAFbhRejie39Wd9TibuJp00DouRLJRJnaep
 PywZd82oyMv94x2fy+nC4aBb6MxiOmsl5I2z/IilU0+O0gfToEgAXUb/fRAFFr1H
 1P/5jouS7mMpgaf9nHBWh740oOs5XJX8xGe9S8+BFLTElKnBbAEknQU+e5NOr39K
 OjMxG9OSC4UqiodYFgJtbhqmLV6BWN3m3sY/6SWZep3JbyiDwKB7G2a/4Q5SvypU
 3e5WTVdx9EuTRNgXtA7bkwnNt8XG9JubAhInlNAt29bSkIvt8YODDwPPSpMNMVhC
 vzotbMAX/cuncX6jumO3qkqimIZ1oSBBUJgq5RmYuLMvGQz7RH8=
 =RG5H
 -----END PGP SIGNATURE-----

Merge tag 's390-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Heiko Carstens:

 - Fix invalid -EBUSY on ccw_device_start() which can lead to failing
   device initialization

 - Add missing multiplication by 8 in __iowrite64_copy() to get the
   correct byte length before calling zpci_memcpy_toio()

 - Various config updates

* tag 's390-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/cio: fix invalid -EBUSY on ccw_device_start
  s390: use the correct count for __iowrite64_copy()
  s390/configs: update default configurations
  s390/configs: enable INIT_STACK_ALL_ZERO in all configurations
  s390/configs: provide compat topic configuration target
2024-02-23 09:54:13 -08:00
Nathan Chancellor
2947a4567f treewide: update LLVM Bugzilla links
LLVM moved their issue tracker from their own Bugzilla instance to GitHub
issues.  While all of the links are still valid, they may not necessarily
show the most up to date information around the issues, as all updates
will occur on GitHub, not Bugzilla.

Another complication is that the Bugzilla issue number is not always the
same as the GitHub issue number.  Thankfully, LLVM maintains this mapping
through two shortlinks:

  https://llvm.org/bz<num> -> https://bugs.llvm.org/show_bug.cgi?id=<num>
  https://llvm.org/pr<num> -> https://github.com/llvm/llvm-project/issues/<mapped_num>

Switch all "https://bugs.llvm.org/show_bug.cgi?id=<num>" links to the
"https://llvm.org/pr<num>" shortlink so that the links show the most up to
date information.  Each migrated issue links back to the Bugzilla entry,
so there should be no loss of fidelity of information here.

Link: https://lkml.kernel.org/r/20240109-update-llvm-links-v1-3-eb09b59db071@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Fangrui Song <maskray@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Mykola Lysenko <mykolal@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22 15:38:51 -08:00
David Hildenbrand
d7f861b9c4 mm/mmu_gather: add __tlb_remove_folio_pages()
Add __tlb_remove_folio_pages(), which will remove multiple consecutive
pages that belong to the same large folio, instead of only a single page. 
We'll be using this function when optimizing unmapping/zapping of large
folios that are mapped by PTEs.

We're using the remaining spare bit in an encoded_page to indicate that
the next encoded page in an array actually contains the shifted "nr_pages".
Teach the swap/freeing code about putting multiple folio references, and
the delayed rmap handling about removing page ranges of a folio.

This extension still allows gathering almost as many small folios as
before (-1, because we have to prepare for a possibly bigger next entry),
while additionally allowing consecutive pages that belong to the same
large folio to be gathered.

Note that we don't pass the folio pointer, because it is not required for
now.  Further, we don't support page_size != PAGE_SIZE; it won't be
required for simple PTE batching.

We have to provide a separate s390 implementation, but it's fairly
straightforward.

Another, more invasive and likely more expensive, approach would be to use
folio+range or a PFN range instead of page+nr_pages.  But, we should do
that consistently for the whole mmu_gather.  For now, let's keep it simple
and add "nr_pages" only.

Note that it is now possible to gather significantly more pages: In the
past, we were able to gather ~10000 pages; now we can also gather ~5000
folio fragments that span multiple pages.  A folio fragment on x86-64 can
span up to 512 pages (2 MiB THP) and, on arm64 with a 64k page size, in
theory 8192 pages (512 MiB THP).  Gathering more memory is not considered
something we should worry about, especially because these are already
corner cases.

While we can gather more total memory, we won't free more folio fragments.
As long as page freeing time primarily only depends on the number of
involved folios, there is no effective change for !preempt configurations.
However, we'll adjust tlb_batch_pages_flush() separately to handle corner
cases where page freeing time grows proportionally with the actual memory
size.
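A hedged sketch of the added interface as described above; the exact
parameter order is an assumption, see the patch for the real prototype:

  /* remove 'nr_pages' consecutive pages of the same large folio from the
     mmu_gather; 'delay_rmap' requests rmap removal after the TLB flush */
  bool __tlb_remove_folio_pages(struct mmu_gather *tlb, struct page *page,
                                unsigned int nr_pages, bool delay_rmap);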

Link: https://lkml.kernel.org/r/20240214204435.167852-9-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22 15:27:17 -08:00
David Hildenbrand
c30d6bc8d0 mm/mmu_gather: pass "delay_rmap" instead of encoded page to __tlb_remove_page_size()
We have two bits available in the encoded page pointer to store additional
information.  Currently, we use one bit to request delay of the rmap
removal until after a TLB flush.

We want to make use of the remaining bit internally for batching of
multiple pages of the same folio, specifying that the next encoded page
pointer in an array is actually "nr_pages".  So pass page + delay_rmap
flag instead of an encoded page, to handle the encoding internally.
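A hedged sketch of the changed signature (shown for illustration; details
may differ from the actual patch):

  /* before: took an encoded_page; now takes the plain page plus the flag,
     and the encoding is done internally by the mmu_gather code */
  bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page,
                              bool delay_rmap, int page_size);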

Link: https://lkml.kernel.org/r/20240214204435.167852-6-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22 15:27:17 -08:00
David Hildenbrand
4555ac8b3c s390/pgtable: define PFN_PTE_SHIFT
We want to make use of pte_next_pfn() outside of set_ptes().  Let's simply
define PFN_PTE_SHIFT, required by pte_next_pfn().
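For context, a minimal sketch of how the generic pte_next_pfn() uses this
define; the generic helper shown here is an approximation, and PAGE_SHIFT
as the s390 value is an assumption:

  #define PFN_PTE_SHIFT   PAGE_SHIFT      /* s390: PFN starts at the page shift */

  /* generic fallback, roughly: advance a PTE value to the next PFN */
  static inline pte_t pte_next_pfn(pte_t pte)
  {
          return __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
  }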

Link: https://lkml.kernel.org/r/20240129124649.189745-7-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Tested-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Russell King (Oracle) <linux@armlinux.org.uk>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22 10:24:50 -08:00
Christophe Leroy
6cdc82db0c mm: ptdump: have ptdump_check_wx() return bool
Have ptdump_check_wx() return true when the check is successful or false
otherwise.

[akpm@linux-foundation.org: fix a couple of build issues (x86_64 allmodconfig)]
Link: https://lkml.kernel.org/r/7943149fe955458cb7b57cd483bf41a3aad94684.1706610398.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Greg KH <greg@kroah.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Phong Tran <tranmanphong@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Steven Price <steven.price@arm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22 10:24:47 -08:00
Christophe Leroy
592e15f62f powerpc,s390: ptdump: define ptdump_check_wx() regardless of CONFIG_DEBUG_WX
The following patch will use ptdump_check_wx() regardless of CONFIG_DEBUG_WX,
so define it at all times on powerpc and s390 just like other
architectures.  Though keep the WARN_ON_ONCE() only when CONFIG_DEBUG_WX
is set.

Link: https://lkml.kernel.org/r/07bfb04c7fec58e84413e91d2533581be357a696.1706610398.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Greg KH <greg@kroah.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Phong Tran <tranmanphong@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Steven Price <steven.price@arm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22 10:24:47 -08:00
Christophe Leroy
a5e8131a03 arm64, powerpc, riscv, s390, x86: ptdump: refactor CONFIG_DEBUG_WX
All architectures using the core ptdump functionality also implement
CONFIG_DEBUG_WX, and they all do it more or less the same way, with a
function called debug_checkwx() that is called by mark_rodata_ro(), which
calls ptdump_check_wx() when CONFIG_DEBUG_WX is set and is a no-op
otherwise.

Refactor by defining debug_checkwx() centrally in linux/ptdump.h and
calling it immediately after mark_rodata_ro() instead of at the end of
every mark_rodata_ro().

On x86_32, mark_rodata_ro() first checks __supported_pte_mask has _PAGE_NX
before calling debug_checkwx().  Now the check is inside the callee
ptdump_walk_pgd_level_checkwx().

On powerpc_64, mark_rodata_ro() bails out early before calling
ptdump_check_wx() when the MMU doesn't have KERNEL_RO feature.  The check
is now also done in ptdump_check_wx() as it is called outside
mark_rodata_ro().
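A minimal sketch of the centralized helper, assuming an implementation
along these lines in linux/ptdump.h:

  /* called right after mark_rodata_ro(); compiled away without CONFIG_DEBUG_WX */
  static inline void debug_checkwx(void)
  {
          if (IS_ENABLED(CONFIG_DEBUG_WX))
                  ptdump_check_wx();
  }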

Link: https://lkml.kernel.org/r/a59b102d7964261d31ead0316a9f18628e4e7a8e.1706610398.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Greg KH <greg@kroah.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Phong Tran <tranmanphong@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Steven Price <steven.price@arm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22 10:24:47 -08:00
Nathan Chancellor
7f115ff4fc s390/boot: workaround current 'llvm-objdump -t -j ...' behavior
When building with OBJDUMP=llvm-objdump, there are a series of warnings
from the section comparisons that arch/s390/boot/Makefile performs
between vmlinux and arch/s390/boot/vmlinux:

  llvm-objdump: warning: section '.boot.preserved.data' mentioned in a -j/--section option, but not found in any input file
  llvm-objdump: warning: section '.boot.data' mentioned in a -j/--section option, but not found in any input file
  llvm-objdump: warning: section '.boot.preserved.data' mentioned in a -j/--section option, but not found in any input file
  llvm-objdump: warning: section '.boot.data' mentioned in a -j/--section option, but not found in any input file

The warning is a little misleading, as these sections do exist in the
input files. It is really pointing out that llvm-objdump does not match
GNU objdump's behavior of respecting '-j' / '--section' in combination
with '-t' / '--syms':

  $ s390x-linux-gnu-objdump -t -j .boot.data vmlinux.full

  vmlinux.full:     file format elf64-s390

  SYMBOL TABLE:
  0000000001951000 l     O .boot.data     0000000000003000 sclp_info_sccb
  00000000019550e0 l     O .boot.data     0000000000000001 sclp_info_sccb_valid
  00000000019550e2 g     O .boot.data     0000000000001000 early_command_line
  ...

  $ llvm-objdump -t -j .boot.data vmlinux.full

  vmlinux.full:   file format elf64-s390

  SYMBOL TABLE:
  0000000000100040 l     O .text  0000000000000010 dw_psw
  0000000000000000 l    df *ABS*  0000000000000000 main.c
  00000000001001b0 l     F .text  00000000000000c6 trace_event_raw_event_initcall_level
  0000000000100280 l     F .text  0000000000000100 perf_trace_initcall_level
  ...

It may be possible to change llvm-objdump's behavior to match GNU
objdump's behavior, but the difficulty of that task has not yet been
explored. The combination of '$(OBJDUMP) -t -j' is not common in the
kernel tree as a whole, so work around this tool difference by grepping
for the sections in the full symbol table output in a similar manner to
the sed invocation. This results in no visible change for GNU objdump
users while fixing the warnings for OBJDUMP=llvm-objdump, further
enabling use of LLVM=1 for ARCH=s390 with versions of LLVM that have
support for s390 in ld.lld and llvm-objcopy.

Reported-by: Heiko Carstens <hca@linux.ibm.com>
Closes: https://lore.kernel.org/20240219113248.16287-C-hca@linux.ibm.com/
Link: https://github.com/ClangBuiltLinux/linux/issues/859
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240220-s390-work-around-llvm-objdump-t-j-v1-1-47bb0366a831@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-22 16:06:56 +01:00
Eric Farman
01be7f53df KVM: s390: fix access register usage in ioctls
The routine ar_translation() can be reached by both the instruction
intercept path (where the access registers had been loaded with the
guest register contents) and the MEM_OP ioctls (which hadn't).
Since this routine saves the current registers to vcpu->run, it
erroneously saves host registers into the guest space in the latter case.

Introduce a boolean in the kvm_vcpu_arch struct to indicate whether
the registers contain guest contents. If they do (the instruction
intercept path), the save can be performed and the AR translation
is done just as it is today. If they don't (the MEM_OP path), the
AR can be read from vcpu->run without stashing the current contents.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Link: https://lore.kernel.org/r/20240220211211.3102609-2-farman@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-22 16:06:56 +01:00
Eric Farman
85a19b3054 KVM: s390: only deliver the set service event bits
The SCLP driver code masks off the last two bits of the parameter [1]
to determine if a read is required, but doesn't care about the
contents of those bits. Meanwhile, the KVM code that delivers
event interrupts masks off those two bits but sends both to the
guest, even if only one was specified by userspace [2].

This works for the driver code, but it means any nuances of those
bits get lost. Use the event pending mask as an actual mask, and
only send the bit(s) that were specified in the pending interrupt.

[1] Linux: sclp_interrupt_handler() (drivers/s390/char/sclp.c:658)
[2] QEMU: service_interrupt() (hw/s390x/sclp.c:360..363)

Fixes: 0890ddea1a90 ("KVM: s390: protvirt: Add SCLP interrupt handling")
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20240205214300.1018522-1-farman@linux.ibm.com
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-Id: <20240205214300.1018522-1-farman@linux.ibm.com>
2024-02-22 10:36:42 +01:00
Kefeng Wang
a23f517b0e mm: convert mm_counter() to take a folio
Now that all callers of mm_counter() have a folio, convert mm_counter() to
take a folio.  Saves a call to compound_head() hidden inside PageAnon().
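A hedged sketch of the converted helper; the file-backed branch below is
an assumption about the surrounding series:

  static inline int mm_counter(struct folio *folio)
  {
          if (folio_test_anon(folio))
                  return MM_ANONPAGES;
          return mm_counter_file(folio);  /* assumes the folio-taking variant */
  }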

Link: https://lkml.kernel.org/r/20240111152429.3374566-10-willy@infradead.org
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-21 16:00:03 -08:00
Kefeng Wang
0601ac883a s390: use pfn_swap_entry_folio() in ptep_zap_swap_entry()
Call pfn_swap_entry_folio() in ptep_zap_swap_entry() as preparation for
converting mm counter functions to take a folio.

Link: https://lkml.kernel.org/r/20240111152429.3374566-5-willy@infradead.org
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-21 16:00:03 -08:00
Sumanth Korikkar
9eda317c15 s390: enable MHP_MEMMAP_ON_MEMORY
Enable MHP_MEMMAP_ON_MEMORY to support "memmap on memory".
The memory_hotplug.memmap_on_memory=true kernel parameter should be set
on the kernel command line to enable the feature.

Link: https://lkml.kernel.org/r/20240108132747.3238763-6-sumanthk@linux.ibm.com
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-21 16:00:02 -08:00
Sumanth Korikkar
1a65b73ae9 s390/mm: allocate vmemmap pages from self-contained memory range
Allocate the memory map (struct pages array) from the hotplugged memory
range, rather than using system memory. The change addresses the issue
where standby memory, when configured to be much larger than online
memory, could potentially lead to IPL failure due to memory map
allocation from online memory. For example, 16MB of memory map
allocation is needed for a memory block size of 1GB; when standby
memory is configured much larger than online memory, this could lead to
IPL failure.

To address this issue, the solution involves introducing "memmap on
memory" using the vmem_altmap structure on s390.  Architectures that
want to implement it should pass the altmap to the vmemmap_populate()
function and its associated callchain. This enhancement is discussed in
commit 4b94ffdc4163 ("x86, mm: introduce vmem_altmap to augment
vmemmap_populate()")

Provide "memmap on memory" support for s390 by passing the altmap in
vmemmap_populate() and its callchain. The allocation path is described
as follows:
* When altmap is NULL in vmemmap_populate(), memory map allocation
  occurs using the existing vmemmap_alloc_block_buf().
* When altmap is not NULL in vmemmap_populate(), memory map allocation
  still uses vmemmap_alloc_block_buf(), but this function internally
  calls altmap_alloc_block_buf().

For deallocation, the process is outlined as follows:
* When altmap is NULL in vmemmap_free(), memory map deallocation happens
  through free_pages().
* When altmap is not NULL in vmemmap_free(), memory map deallocation
  occurs via vmem_altmap_free().

While memory map allocation is primarily handled through the
self-contained memory map range, there might still be a small amount of
system memory allocation required for vmemmap pagetables. To mitigate
this impact, this feature will be limited to machines with EDAT1
support.
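A one-line sketch of the allocation call described above (illustrative;
the surrounding populate loop is omitted):

  /* with altmap == NULL this is a regular vmemmap allocation; with a
     non-NULL altmap it goes through altmap_alloc_block_buf() internally */
  void *p = vmemmap_alloc_block_buf(PAGE_SIZE, node, altmap);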

Link: https://lkml.kernel.org/r/20240108132747.3238763-3-sumanthk@linux.ibm.com
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-21 16:00:02 -08:00
Janosch Frank
4a59932874 KVM: s390: introduce kvm_s390_fpu_(store|load)
It's a bit nicer than having multiple lines and will help if there's
another rework, since we'll only have to change one location.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-21 15:09:13 +01:00
Jason Gunthorpe
723a2cc8d6 s390: use the correct count for __iowrite64_copy()
The signature for __iowrite64_copy() requires the number of 64-bit
quantities, not bytes. Multiply by 8 to get to a byte length before
invoking zpci_memcpy_toio().
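A hedged sketch of the corrected call, assuming the s390 wrapper has
roughly this shape:

  void __iowrite64_copy(void __iomem *to, const void *from, size_t count)
  {
          /* count is in 64-bit quantities; zpci_memcpy_toio() expects bytes */
          zpci_memcpy_toio(to, from, count * 8);
  }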

Fixes: 87bc359b9822 ("s390/pci: speed up __iowrite64_copy by using pci store block insn")
Acked-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/0-v1-9223d11a7662+1d7785-s390_iowrite64_jgg@nvidia.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-21 15:07:46 +01:00
Anna-Maria Behnsen
cb3444cfdb s390/vdso: Use generic union vdso_data_store
There is already a generic union definition for vdso_data_store in the vdso
datapage header.

Use this definition to prevent code duplication.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20240219153939.75719-8-anna-maria@linutronix.de
2024-02-20 20:56:00 +01:00
Anna-Maria Behnsen
3ebacc96f8 s390/vdso/data: Drop unnecessary header include
vdso/datapage.h includes the architecture specific vdso/data.h header
file. So there is no need to also do it the other way round and include
the generic vdso/datapage.h header file inside the architecture specific
data.h header file.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20240219153939.75719-3-anna-maria@linutronix.de
2024-02-20 20:56:00 +01:00
Sean Christopherson
9e7325acb3 KVM: s390: Refactor kvm_is_error_gpa() into kvm_is_gpa_in_memslot()
Rename kvm_is_error_gpa() to kvm_is_gpa_in_memslot() and invert the
polarity accordingly in order to (a) free up kvm_is_error_gpa() to match
with kvm_is_error_{hva,page}(), and (b) to make it more obvious that the
helper is doing a memslot lookup, i.e. not simply checking for INVALID_GPA.

No functional change intended.
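A minimal sketch of what the renamed helper amounts to (the memslot
lookup shown below is an assumption):

  static inline bool kvm_is_gpa_in_memslot(struct kvm *kvm, gpa_t gpa)
  {
          /* true if the guest physical address is backed by a memslot */
          return !!gfn_to_memslot(kvm, gpa_to_gfn(gpa));
  }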

Link: https://lore.kernel.org/r/20240215152916.1158-9-paul@xen.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-20 07:37:45 -08:00
Josh Poimboeuf
778666df60 s390: compile relocatable kernel without -fPIE
On s390, the kernel currently uses the '-fPIE' compiler flag for compiling
vmlinux.  This has a few problems:

  - It uses dynamic symbols (.dynsym), for which the linker refuses to
    allow more than 64k sections.  This can break features which use
    '-ffunction-sections' and '-fdata-sections', including kpatch-build
    [1] and Function Granular KASLR.

  - It unnecessarily uses GOT relocations, adding an extra layer of
    indirection for many memory accesses.

Instead of using '-fPIE', resolve all the relocations at link time and
then manually adjust any absolute relocations (R_390_64) during boot.

This is done by first telling the linker to preserve all relocations
during the vmlinux link.  (Note this is harmless: they are later
stripped in the vmlinux.bin link.)

Then use the 'relocs' tool to find all absolute relocations (R_390_64)
which apply to allocatable sections.  The offsets of those relocations
are saved in a special section which is then used to adjust the
relocations during boot.

(Note: For some reason, Clang occasionally creates a GOT reference, even
without '-fPIE'.  So Clang-compiled kernels have a GOT, which needs to
be adjusted.)

On my mostly-defconfig kernel, this reduces kernel text size by ~1.3%.

[1] https://github.com/dynup/kpatch/issues/1284
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622872.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/625986.html

Compiler consideration:

Gcc recently implemented an optimization [2] for loading symbols without
explicit alignment, aligning with the IBM Z ELF ABI. This ABI mandates
symbols to reside on a 2-byte boundary, enabling the use of the larl
instruction. However, kernel linker scripts may still generate unaligned
symbols. To address this, a new -munaligned-symbols option has been
introduced [3] in recent gcc versions. This option has to be used with
future gcc versions.

Older Clang lacks support for handling unaligned symbols generated
by kernel linker scripts when the kernel is built without -fPIE. However,
future versions of Clang will include support for the -munaligned-symbols
option. When the support is unavailable, compile the kernel with -fPIE
to maintain the existing behavior.

In addition to that:
move vmlinux.relocs to a safe location

When the kernel is built with CONFIG_KERNEL_UNCOMPRESSED, the entire
uncompressed vmlinux.bin is positioned in the bzImage decompressor
image at the default kernel LMA of 0x100000, enabling it to be executed
in-place. However, the size of .vmlinux.relocs could be large enough to
cause an overlap with the uncompressed kernel at the address 0x100000.
To address this issue, .vmlinux.relocs is positioned after the
.rodata.compressed in the bzImage. Nevertheless, in this configuration,
vmlinux.relocs will overlap with the .bss section of vmlinux.bin. To
overcome that, move vmlinux.relocs to a safe location before clearing
.bss and handling relocs.

Compile warning fix from Sumanth Korikkar:

When the kernel is built with CONFIG_LD_ORPHAN_WARN and -fno-PIE, there are
several warnings:

ld: warning: orphan section `.rela.iplt' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
ld: warning: orphan section `.rela.head.text' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
ld: warning: orphan section `.rela.init.text' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
ld: warning: orphan section `.rela.rodata.cst8' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'

Orphan sections are sections that exist in an object file but don't have
a corresponding output section in the final executable. ld raises a
warning when it identifies such sections.

Eliminate the warnings by placing all .rela orphan sections in .rela.dyn
and raising an error when the size of .rela.dyn is greater than zero,
i.e. don't just neglect orphan sections.

This is similar to the adjustment performed on x86, where the kernel is
built with -fno-PIE, see
commit 5354e84598f2 ("x86/build: Add asserts for unwanted sections").

[sumanthk@linux.ibm.com: rebased Josh Poimboeuf patches and move
 vmlinux.relocs to safe location]
[hca@linux.ibm.com: merged compile warning fix from Sumanth]
Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Link: https://lore.kernel.org/r/20240219132734.22881-4-sumanthk@linux.ibm.com
Link: https://lore.kernel.org/r/20240219132734.22881-5-sumanthk@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 14:37:33 +01:00
Josh Poimboeuf
55dc65b460 s390: add relocs tool
This 'relocs' tool is copied from the x86 version, ported for s390, and
greatly simplified to remove unnecessary features.

It reads vmlinux and outputs assembly to create a .vmlinux.relocs_64
section which contains the offsets of all R_390_64 relocations which
apply to allocatable sections.

Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Link: https://lore.kernel.org/r/20240219132734.22881-3-sumanthk@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 14:37:33 +01:00
Sumanth Korikkar
8192a1b380 s390/vdso64: filter out munaligned-symbols flag for vdso
Gcc recently implemented an optimization [1] for loading symbols without
explicit alignment, aligning with the IBM Z ELF ABI. This ABI mandates
symbols to reside on a 2-byte boundary, enabling the use of the larl
instruction. However, kernel linker scripts may still generate unaligned
symbols. To address this, a new -munaligned-symbols option has been
introduced [2] in recent gcc versions.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622872.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/625986.html

However, when -munaligned-symbols is used in vdso code, it leads to the
following compilation error:
`.data.rel.ro.local' referenced in section `.text' of
arch/s390/kernel/vdso64/vdso64_generic.o: defined in discarded section
`.data.rel.ro.local' of arch/s390/kernel/vdso64/vdso64_generic.o

The vdso linker script discards the .data section to keep the vdso
lightweight.  However, with -munaligned-symbols the vdso object files
reference the literal pool and access _vdso_data. Hence, compile the vdso
code without -munaligned-symbols.  This means that, in the future, vdso
code should deal with the alignment of newly introduced unaligned linker
symbols.

Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Link: https://lore.kernel.org/r/20240219132734.22881-2-sumanthk@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 14:37:33 +01:00
Nathan Chancellor
9ea30fd166 s390/boot: add 'alloc' to info.bin .vmlinux.info section flags
When attempting to boot a kernel compiled with OBJCOPY=llvm-objcopy,
there is a crash right at boot:

  Out of memory allocating 6d7800 bytes 8 aligned in range 0:20000000
  Reserved memory ranges:
  0000000000000000 a394c3c30d90cdaf DECOMPRESSOR
  Usable online memory ranges (info source: sclp read info [3]):
  0000000000000000 0000000020000000
  Usable online memory total: 20000000 Reserved: a394c3c30d90cdaf Free: 0
  Call Trace:
  (sp:0000000000033e90 [<0000000000012fbc>] physmem_alloc_top_down+0x5c/0x104)
   sp:0000000000033f00 [<0000000000011d56>] startup_kernel+0x3a6/0x77c
   sp:0000000000033f60 [<00000000000100f4>] startup_normal+0xd4/0xd4

GNU objcopy does not have any issues. Looking at differences between the
object files in each build reveals info.bin does not get properly
populated with llvm-objcopy, which results in an empty .vmlinux.info
section.

  $ file {gnu,llvm}-objcopy/arch/s390/boot/info.bin
  gnu-objcopy/arch/s390/boot/info.bin:  data
  llvm-objcopy/arch/s390/boot/info.bin: empty

  $ llvm-readelf --section-headers {gnu,llvm}-objcopy/arch/s390/boot/vmlinux | rg 'File:|\.vmlinux\.info|\.decompressor\.syms'
  File: gnu-objcopy/arch/s390/boot/vmlinux
    [12] .vmlinux.info     PROGBITS        0000000000034000 035000 000078 00  WA  0   0  1
    [13] .decompressor.syms PROGBITS       0000000000034078 035078 000b00 00  WA  0   0  1
  File: llvm-objcopy/arch/s390/boot/vmlinux
    [12] .vmlinux.info     PROGBITS        0000000000034000 035000 000000 00  WA  0   0  1
    [13] .decompressor.syms PROGBITS       0000000000034000 035000 000b00 00  WA  0   0  1

Ulrich points out that llvm-objcopy only copies sections marked as alloc
with a binary output target, whereas the .vmlinux.info section is only
marked as load. Add 'alloc' in addition to 'load', so that both objcopy
implementations work properly:

  $ file {gnu,llvm}-objcopy/arch/s390/boot/info.bin
  gnu-objcopy/arch/s390/boot/info.bin:  data
  llvm-objcopy/arch/s390/boot/info.bin: data

  $ llvm-readelf --section-headers {gnu,llvm}-objcopy/arch/s390/boot/vmlinux | rg 'File:|\.vmlinux\.info|\.decompressor\.syms'
  File: gnu-objcopy/arch/s390/boot/vmlinux
    [12] .vmlinux.info     PROGBITS        0000000000034000 035000 000078 00  WA  0   0  1
    [13] .decompressor.syms PROGBITS       0000000000034078 035078 000b00 00  WA  0   0  1
  File: llvm-objcopy/arch/s390/boot/vmlinux
    [12] .vmlinux.info     PROGBITS        0000000000034000 035000 000078 00  WA  0   0  1
    [13] .decompressor.syms PROGBITS       0000000000034078 035078 000b00 00  WA  0   0  1

Closes: https://github.com/ClangBuiltLinux/linux/issues/1996
Link: 3c02cb7492
Suggested-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240216-s390-fix-boot-with-llvm-objcopy-v1-1-0ac623daf42b@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 14:37:33 +01:00
Gerd Bayer
d0c8fd2100 s390/pci: fix three typos in comments
Found and fixed these while working on synchronizing the state
handling of zpci_dev's.

Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 14:37:32 +01:00
Gerd Bayer
6ee600bfbe s390/pci: remove hotplug slot when releasing the device
Centralize the removal so all paths are covered and the hotplug slot
will remain active until the device is really destroyed.

Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 14:37:32 +01:00
Gerd Bayer
bcb5d6c769 s390/pci: introduce lock to synchronize state of zpci_dev's
There are a number of tasks that need the state of a zpci device
to be stable, while other tasks need to be synchronized as they change the
state.

State changes could be generated by the system as availability or error
events, or be requested by the user through manipulations in sysfs.
Some other actions accessible through sysfs - like device resets - need the
state to be stable.

Unsynchronized state handling could lead to unusable devices. This has
been observed in cases of concurrent state changes through systemd udev
rules and DPM boot control. Some breakage can be provoked by artificial
tests, e.g. through repetitively injecting "recover" on a PCI function
through sysfs while running a "hotplug remove/add" in a loop through a
PCI slot's "power" attribute in sysfs. After a few iterations this could
result in a kernel oops.

So introduce a new mutex "state_lock" to guard the state property of the
struct zpci_dev. Acquire this lock in all tasks that modify the state:

- hotplug add and remove, through the PCI hotplug slot entry,
- availability events, as reported by the platform,
- error events, as reported by the platform,
- during device resets, explicitly through sysfs requests or
  implicitly through the common PCI layer.

Break an inner _do_recover() routine out of recover_store() to
separate the necessary synchronization from the actual manipulations of
the zpci_dev required for the reset.

With the following changes I was able to run the inject loops for hours
without hitting an error.
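A hedged sketch of the intended locking pattern; the surrounding recover
path is simplified, and names other than state_lock and _do_recover() are
assumptions:

  mutex_lock(&zdev->state_lock);
  if (zdev->state == ZPCI_FN_STATE_CONFIGURED)
          rc = _do_recover(pdev, zdev);   /* broken-out reset helper */
  mutex_unlock(&zdev->state_lock);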

Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 14:37:32 +01:00
Gerd Bayer
0d48566d4b s390/pci: rename lock member in struct zpci_dev
Since this guards only the Function Measurement Block, rename the generic
'lock' to 'fmb_lock' in preparation for introducing another lock that
guards the state member.

Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 14:37:32 +01:00
Thomas Richter
29f6fe17f3 s390/pai: adjust whitespace indentation
Adjust whitespace indentation. No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 14:37:32 +01:00
Thomas Richter
82cb9b6185 s390/pai: simplify event start function for perf stat
When an event is started, read the current value of the
PAI counter. This value is saved in event::hw.prev_count.
When an event is stopped, this value is subtracted from the current
value read out at event stop time. The difference is the delta
of this counter.

Simplify the logic and read the event value every time the event is
started. This scheme is identical to that of other device drivers.
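A rough sketch of the start/delta bookkeeping described above;
pai_read_counter() is a placeholder name, not the driver's actual function:

  static void pai_event_start(struct perf_event *event, int flags)
  {
          /* snapshot the counter when the event is (re)started */
          local64_set(&event->hw.prev_count, pai_read_counter(event));
  }

  static void pai_event_update(struct perf_event *event)
  {
          u64 prev = local64_read(&event->hw.prev_count);

          /* delta = current value - value at start time */
          local64_add(pai_read_counter(event) - prev, &event->count);
  }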

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 14:37:32 +01:00
Thomas Richter
fe861b0c8d s390/pai: save PAI counter value page in event structure
When the PAI events ALL_CRYPTO or ALL_NNPA are created
for system wide sampling, all PAI counters are monitored.
On each process schedule out, the values of all PAI counters
are investigated. Non-zero values are saved in the event's ring
buffer as raw data. This scheme expects the start value of each counter
to be reset to zero after each read operation performed by the PAI
PMU device driver. This allows for only one active event at any one
time as it relies on the start value of counters to be reset to zero.

Create a save area for each installed PAI XXXX_ALL event and save all
PAI counter values in this save area. Instead of clearing the
PAI counter lowcore area to zero after each read operation,
copy them from the lowcore area to the event's save area at process
schedule out time.
The delta of each PAI counter is calculated by subtracting the
old counter's value stored in the event's save area from the current
value stored in the lowcore area.

With this scheme, multiple events of the PAI counters XXXX_ALL
can be handled at the same time. This will be addressed in a
follow-on patch.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 14:37:31 +01:00
Heiko Carstens
03325e9b64 s390/crc32le: convert to C
Convert CRC-32 LE variants to C.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:18 +01:00
Heiko Carstens
c59bf4de01 s390/crc32be: convert to C
Convert CRC-32 BE variant to C.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:18 +01:00
Heiko Carstens
37346951a8 s390/fpu: add vector instruction inline assemblies for crc32
Provide various vector instruction inline assemblies for crc32
calculations.

This is just preparation to keep the conversion of the existing crc32
implementations from assembly to C small.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:18 +01:00
Heiko Carstens
ea8b75d289 s390/sysinfo: convert bogomips calculation to C
Provide several one-instruction fpu inline assemblies and use them to
implement the bogomips calculation in a C-like style. This is more for
illustration purposes, showing how kernel fpu code can be written in C.

This has the advantage that the author only has to take care of the
floating point instructions, but doesn't need to take care of general
purpose register allocation (if needed), or the semantics of all other
instructions not related to the fpu.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:17 +01:00
Heiko Carstens
c8dde11df1 s390/raid6: convert to use standard fpu_*() inline assemblies
Move the s390 specific raid6 inline assemblies, make them generic, and
reuse them for the raid6 gen/xor implementation.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:17 +01:00
Heiko Carstens
dcd3e1de9d s390/checksum: provide csum_partial_copy_nocheck()
With csum_partial(), which reads all bytes into registers, it is easy to
also implement csum_partial_copy_nocheck(), which copies the buffer while
calculating its checksum.

For a 512-byte buffer this reduces the runtime by 19%. Compared to the old
generic variant (memcpy() + cksm instruction) the runtime is reduced by 42%.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:17 +01:00
Heiko Carstens
cb2a1dd589 s390/checksum: provide vector register variant of csum_partial()
Provide a faster variant of csum_partial() which uses vector registers
instead of the cksm instruction.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:17 +01:00
Heiko Carstens
3a74f44de2 s390/checksum: provide and use cksm() inline assembly
Convert those callers of csum_partial() which are either very early or in
critical paths, like panic/dump, to use the cksm instruction directly, so
they don't have to rely on a working kernel infrastructure, which will
be introduced with a subsequent patch.
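A hedged sketch of such a cksm() inline assembly, assuming a register-pair
operand and a loop on partial completion of the instruction:

  static inline __wsum cksm(const void *buff, int len, __wsum sum)
  {
          union register_pair rp = {
                  .even = (unsigned long)buff,
                  .odd  = (unsigned long)len,
          };

          asm volatile(
                  "0:     cksm    %[sum],%[rp]\n"
                  "       jo      0b\n"           /* branch while cksm is incomplete */
                  : [sum] "+&d" (sum), [rp] "+&d" (rp.pair)
                  :
                  : "cc", "memory");
          return sum;
  }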

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:17 +01:00
Heiko Carstens
4ce69fcf17 s390/checksum: call instrument_read() instead of kasan_check_read()
Call instrument_read() from csum_partial() instead of kasan_check_read().
instrument_read() covers all memory access instrumentation methods.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:17 +01:00
Heiko Carstens
2c6b96762f s390/fpu: remove TIF_FPU
TIF_FPU is unused - remove it.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:16 +01:00
Heiko Carstens
8c09871a95 s390/fpu: limit save and restore to used registers
The first invocation of kernel_fpu_begin() after switching from user to
kernel context will save all vector registers, even if only parts of the
vector registers are used within the kernel fpu context. Given that save
and restore of all vector registers is quite expensive, change the current
approach in several ways:

- Instead of saving and restoring all user registers, limit this to those
  registers which are actually used within a kernel fpu context.

- On context switch save all remaining user fpu registers, so they can be
  restored when the task is rescheduled.

- Saving user registers within kernel_fpu_begin() is done without disabling
  and enabling interrupts - which also slightly reduces runtime. In the
  worst case (e.g. interrupt context uses the same registers) this may lead
  to the situation that registers are saved several times; however, the
  assumption is that this will not happen frequently, so that the new
  method is faster in nearly all cases.

- save_user_fpu_regs() can still be called from all contexts and saves all
  (or all remaining) user registers to a task's ufpu user fpu save area.

Overall this reduces the time required to save and restore the user fpu
context for nearly all cases.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:16 +01:00
Heiko Carstens
066c40918b s390/fpu: decrease stack usage for some cases
The kernel_fpu structure has a quite large size of 520 bytes. In order to
reduce stack footprint introduce several kernel fpu structures with
different and also smaller sizes. This way every kernel fpu user must use
the correct variant. A compile time check verifies that the correct variant
is used.

There are several users which use only 16 instead of all 32 vector
registers. For those users the new kernel_fpu_16 structure with a size of
only 266 bytes can be used.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:16 +01:00
Heiko Carstens
cad8c3abaa s390/fpu: let fpu_vlm() and fpu_vstm() return number of registers
Let the fpu_vlm() and fpu_vstm() macros return the number of registers
saved / loaded. This helps to write easy-to-read code in case there are
several subsequent fpu_vlm() or fpu_vstm() calls:

	__vector128 *vxrs = ....

	vxrs += fpu_vstm(0, 15, vxrs);
	vxrs += fpu_vstm(16, 31, vxrs);

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:16 +01:00
Heiko Carstens
bdbd3acb33 s390/fpu: remove anonymous union from struct fpu
The anonymous union within struct fpu contains a floating point register
array and a vector register array. Given that the vector register array is
always present, remove the floating point register array. For
configurations without vector registers, save the floating point register
contents within their corresponding vector register location.

This allows removing the union, and also simplifies ptrace and perf code.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:16 +01:00
Heiko Carstens
9cbff7f221 s390/fpu: remove regs member from struct fpu
KVM was the only user which modified the regs pointer in struct fpu. Remove
the pointer and convert the rest of the core fpu code to directly access
the save area embedded within struct fpu.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:16 +01:00
Heiko Carstens
ed3a0a011a s390/kvm: convert to regular kernel fpu user
KVM modifies the kernel fpu's regs pointer to its own area to implement its
custom version of preemptible kernel fpu context. With general support for
preemptible kernel fpu context there is no need for the extra complexity in
KVM code anymore.

Therefore convert KVM to a regular kernel fpu user. In particular this
means that all TIF_FPU checks can be removed, since the fpu register
context will never be changed by other kernel fpu users, and also the fpu
register context will be restored if a thread is preempted.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:16 +01:00
Heiko Carstens
4eed43de9b s390/fpu: make kernel fpu context preemptible
Make the kernel fpu context preemptible. Add another fpu structure to the
thread_struct, and use it to save and restore the kernel fpu context if its
task uses fpu registers when it is preempted.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:15 +01:00
Heiko Carstens
c038b984a9 s390/fpu: change type of fpu mask from u32 to int
Change type of fpu mask consistently from u32 to int. This is a
prerequisite to make the kernel fpu usage preemptible. Upcoming code
uses __atomic* ops which work with int pointers.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:15 +01:00