- Fix ftrace bug on boot caused by exit text sections with -fpatchable-function-entry.
- Fix accuracy of stolen time on pseries since the switch to VIRT_CPU_ACCOUNTING_GEN.
- Fix a crash in the IOMMU code when doing DLPAR remove.
- Set pt_regs->link on scv entry to fix BPF stack unwinding.
- Add missing PPC_FEATURE_BOOKE on 64-bit e5500/e6500, which broke gdb.
- Fix boot on some 6xx platforms with STRICT_KERNEL_RWX enabled.
- Fix build failures with KASAN enabled and 32KB stack size.
- Some other minor fixes.
Thanks to: Arnd Bergmann, Benjamin Gray, Christophe Leroy, David Engraf, Gaurav
Batra, Jason Gunthorpe, Jiangfeng Xiao, Matthias Schiffer, Nathan Lynch, Naveen
N Rao, Nicholas Piggin, Nysal Jan K.A, R Nageswara Sastry, Shivaprasad G Bhat,
Shrikanth Hegde, Spoorthy, Srikar Dronamraju, Venkat Rao Bagalkote.
Merge tag 'powerpc-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"This is a bit of a big batch for rc4, but just due to holiday hangover
and because I didn't send any fixes last week due to a late revert
request. I think next week should be back to normal.
- Fix ftrace bug on boot caused by exit text sections with
'-fpatchable-function-entry'
- Fix accuracy of stolen time on pseries since the switch to
VIRT_CPU_ACCOUNTING_GEN
- Fix a crash in the IOMMU code when doing DLPAR remove
- Set pt_regs->link on scv entry to fix BPF stack unwinding
- Add missing PPC_FEATURE_BOOKE on 64-bit e5500/e6500, which broke
gdb
- Fix boot on some 6xx platforms with STRICT_KERNEL_RWX enabled
- Fix build failures with KASAN enabled and 32KB stack size
- Some other minor fixes
Thanks to Arnd Bergmann, Benjamin Gray, Christophe Leroy, David
Engraf, Gaurav Batra, Jason Gunthorpe, Jiangfeng Xiao, Matthias
Schiffer, Nathan Lynch, Naveen N Rao, Nicholas Piggin, Nysal Jan K.A,
R Nageswara Sastry, Shivaprasad G Bhat, Shrikanth Hegde, Spoorthy,
Srikar Dronamraju, and Venkat Rao Bagalkote"
* tag 'powerpc-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/iommu: Fix the missing iommu_group_put() during platform domain attach
powerpc/pseries: fix accuracy of stolen time
powerpc/ftrace: Ignore ftrace locations in exit text sections
powerpc/cputable: Add missing PPC_FEATURE_BOOKE on PPC64 Book-E
powerpc/kasan: Limit KASAN thread size increase to 32KB
Revert "powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add"
powerpc: 85xx: mark local functions static
powerpc: udbg_memcons: mark functions static
powerpc/kasan: Fix addr error caused by page alignment
powerpc/6xx: set High BAT Enable flag on G2_LE cores
selftests/powerpc/papr_vpd: Check devfd before get_system_loc_code()
powerpc/64: Set task pt_regs->link to the LR value on scv entry
powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add
powerpc/pseries/papr-sysparm: use u8 arrays for payloads
Michael reported that we are seeing an ftrace bug on bootup when KASAN
is enabled and we are using -fpatchable-function-entry:
ftrace: allocating 47780 entries in 18 pages
ftrace-powerpc: 0xc0000000020b3d5c: No module provided for non-kernel address
------------[ ftrace bug ]------------
ftrace faulted on modifying
[<c0000000020b3d5c>] 0xc0000000020b3d5c
Initializing ftrace call sites
ftrace record flags: 0
(0)
expected tramp: c00000000008cef4
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:2180 ftrace_bug+0x3c0/0x424
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0-rc3-00120-g0f71dcfb4aef #860
Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries
NIP: c0000000003aa81c LR: c0000000003aa818 CTR: 0000000000000000
REGS: c0000000033cfab0 TRAP: 0700 Not tainted (6.5.0-rc3-00120-g0f71dcfb4aef)
MSR: 8000000002021033 <SF,VEC,ME,IR,DR,RI,LE> CR: 28028240 XER: 00000000
CFAR: c0000000002781a8 IRQMASK: 3
...
NIP [c0000000003aa81c] ftrace_bug+0x3c0/0x424
LR [c0000000003aa818] ftrace_bug+0x3bc/0x424
Call Trace:
ftrace_bug+0x3bc/0x424 (unreliable)
ftrace_process_locs+0x5f4/0x8a0
ftrace_init+0xc0/0x1d0
start_kernel+0x1d8/0x484
With CONFIG_FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY=y and
CONFIG_KASAN=y, the compiler emits nops in functions that it generates for
registering and unregistering global variables (unlike with -pg and
-mprofile-kernel where calls to _mcount() are not generated in those
functions). Those functions then end up in INIT_TEXT and EXIT_TEXT
respectively. We don't expect to see any profiled functions in
EXIT_TEXT, so ftrace_init_nop() assumes that all addresses that aren't
in the core kernel text belong to a module. Since these functions do
not match that criterion, we see the above bug.
Address this by having ftrace ignore all locations in the text exit
sections of vmlinux.
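The classification described above can be pictured with a small user-space sketch. This is not the kernel's code: the range struct, the enum and the function names are hypothetical, and the sketch only models the three cases the message talks about (core kernel text, exit text in vmlinux, and everything else, which is assumed to belong to a module).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical half-open address range [start, end). */
struct text_range {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical outcome of classifying one ftrace location. */
enum loc_kind {
	LOC_CORE_KERNEL,	/* patched as usual */
	LOC_EXIT_TEXT,		/* ignored: not expected to be traced */
	LOC_MODULE,		/* anything else is assumed to be module text */
};

static bool in_range(const struct text_range *r, uint64_t addr)
{
	return addr >= r->start && addr < r->end;
}

/* Check the exit text range first so it is never mistaken for a module. */
static enum loc_kind classify_loc(const struct text_range *core,
				  const struct text_range *exit_text,
				  uint64_t addr)
{
	assert(core && exit_text);

	if (in_range(exit_text, addr))
		return LOC_EXIT_TEXT;
	if (in_range(core, addr))
		return LOC_CORE_KERNEL;
	return LOC_MODULE;
}
```

Without the exit text check, an address in that range would fall through to the module case, which is the situation the "No module provided for non-kernel address" message above reports.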
Fixes: 0f71dcfb4aef ("powerpc/ftrace: Add support for -fpatchable-function-entry")
Cc: stable@vger.kernel.org # v6.6+
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Reviewed-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240213175410.1091313-1-naveen@kernel.org
Commit e320a76db4b0 ("powerpc/cputable: Split cpu_specs[] out of
cputable.h") moved the cpu_specs to separate header files. Previously
PPC_FEATURE_BOOKE was enabled by CONFIG_PPC_BOOK3E_64. The definition in
cpu_specs_e500mc.h for PPC64 no longer enables PPC_FEATURE_BOOKE.
This breaks user space that reads the ELF hwcaps and expects
PPC_FEATURE_BOOKE. Debugging an application with gdb no longer
works on e5500/e6500 because the 64-bit detection relies on
PPC_FEATURE_BOOKE for Book-E.
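For illustration, a user-space check of this hwcap bit might look like the sketch below. It is not taken from gdb; the helper names are hypothetical, and the constant is a local copy of what is, to my understanding, the PPC_FEATURE_BOOKE value in the powerpc uapi cputable.h. The bit only has this meaning on powerpc.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/auxv.h>

/* Local copy of the bit (assumed value of PPC_FEATURE_BOOKE). */
#define SKETCH_PPC_FEATURE_BOOKE 0x00008000UL

/* Pure helper: does this AT_HWCAP word advertise Book-E? */
static bool hwcap_has_booke(unsigned long hwcap)
{
	return (hwcap & SKETCH_PPC_FEATURE_BOOKE) != 0;
}

/* Reads the hwcap word the kernel passed in the ELF auxiliary vector. */
static bool running_on_booke(void)
{
	return hwcap_has_booke(getauxval(AT_HWCAP));
}
```

With the regression described here, running_on_booke() would return false on e5500/e6500 even though the core is Book-E, which is the kind of misdetection the fix addresses.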
Fixes: e320a76db4b0 ("powerpc/cputable: Split cpu_specs[] out of cputable.h")
Cc: stable@vger.kernel.org # v6.1+
Signed-off-by: David Engraf <david.engraf@sysgo.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240207092758.1058893-1-david.engraf@sysgo.com
We've had issues with gcc and 'asm goto' before, and we created a
'asm_volatile_goto()' macro for that in the past: see commits
3f0116c3238a ("compiler/gcc4: Add quirk for 'asm goto' miscompilation
bug") and a9f180345f53 ("compiler/gcc4: Make quirk for
asm_volatile_goto() unconditional").
Then, much later, we ended up removing the workaround in commit
43c249ea0b1e ("compiler-gcc.h: remove ancient workaround for gcc PR
58670") because we no longer supported building the kernel with the
affected gcc versions, but we left the macro uses around.
Now, Sean Christopherson reports a new version of a very similar
problem, which is fixed by re-applying that ancient workaround. But the
problem in question is limited to only the 'asm goto with outputs'
cases, so instead of re-introducing the old workaround as-is, let's
rename and limit the workaround to just that much less common case.
It looks like there are at least two separate issues that all hit in
this area:
(a) some versions of gcc don't mark the asm goto as 'volatile' when it
has outputs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420
which is easy to work around by just adding the 'volatile' by hand.
(b) Internal compiler errors:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422
which are worked around by adding the extra empty 'asm' as a
barrier, as in the original workaround.
The problem Sean sees may be a third thing, since it involves bad
code generation (not an ICE) even with the manually added 'volatile'.
Still, the same old workaround works for this case, even if this feels a
bit like voodoo programming and may only be hiding the issue.
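As a rough sketch of the shape of such a workaround (the macro and function names are made up for this example, and it assumes a compiler that supports asm goto with outputs, roughly GCC 11 or Clang 11 and later), a wrapper can add 'volatile' by hand and follow the statement with an empty asm as a barrier:

```c
#include <assert.h>

/*
 * Hypothetical wrapper, named for this sketch only: add 'volatile' to an
 * asm goto that has outputs, then emit an empty asm statement after it.
 */
#define SKETCH_ASM_GOTO_OUTPUT(...) \
	do { __asm__ __volatile__ goto(__VA_ARGS__); __asm__(""); } while (0)

/* Copies 'in' to '*out' through an asm goto with one output operand. */
static int copy_via_asm_goto(int in, int *out)
{
	int tmp;

	assert(out);

	/* Empty template; the "0" constraint ties the input to output 0. */
	SKETCH_ASM_GOTO_OUTPUT("" : "=r"(tmp) : "0"(in) : : fail);
	*out = tmp;
	return 0;
fail:
	/* Not reached in this sketch: the empty template never jumps. */
	return -1;
}
```

The example only shows the syntax. Whether the extra empty asm changes code generation depends on the compiler version, which is the uncertainty the message itself points out.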
Reported-and-tested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Andrew Pinski <quic_apinski@quicinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
MMU_FTR_USE_HIGH_BATS is set for G2_LE cores and derivatives like e300cX,
but the high BATs need to be enabled in HID2 to work. Add register
definitions and add the needed setup to __setup_cpu_603.
This fixes boot on CPUs like the MPC5200B with STRICT_KERNEL_RWX enabled
on systems where the flag has not been set by the bootloader already.
Fixes: e4d6654ebe6e ("powerpc/mm/32s: rework mmu_mapin_ram()")
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240124103838.43675-1-matthias.schiffer@ew.tq-group.com
Nysal reported that userspace backtraces are missing in the offcputime
bcc tool. As an example:
$ sudo ./bcc/tools/offcputime.py -uU
Tracing off-CPU time (us) of user threads by user stack... Hit Ctrl-C to end.
^C
write
- python (9107)
8
write
- sudo (9105)
9
mmap
- python (9107)
16
clock_nanosleep
- multipathd (697)
3001604
The offcputime bcc tool attaches a bpf program to a kprobe on
finish_task_switch(), which is usually hit on a syscall from userspace.
With the switch to system call vectored, we started setting
pt_regs->link to zero. This is because system call vectored behaves like
a function call with LR pointing to the system call return address, and
with no modification to SRR0/SRR1. The LR value does indicate our next
instruction, so it is being saved as pt_regs->nip, and pt_regs->link is
being set to zero. This is not a problem by itself, but BPF uses perf
callchain infrastructure for capturing stack traces, and that stores LR
as the second entry in the stack trace. perf has code to cope with the
second entry being zero, and skips over it. However, generic userspace
unwinders assume that a zero entry indicates end of the stack trace,
resulting in a truncated userspace stack trace.
Rather than fixing all userspace unwinders to ignore/skip past the
second entry, store the real LR value in pt_regs->link so that there
continues to be a valid, though duplicate entry in the stack trace.
With this change:
$ sudo ./bcc/tools/offcputime.py -uU
Tracing off-CPU time (us) of user threads by user stack... Hit Ctrl-C to end.
^C
write
write
[unknown]
[unknown]
[unknown]
[unknown]
[unknown]
PyObject_VectorcallMethod
[unknown]
[unknown]
PyObject_CallOneArg
PyFile_WriteObject
PyFile_WriteString
[unknown]
[unknown]
PyObject_Vectorcall
_PyEval_EvalFrameDefault
PyEval_EvalCode
[unknown]
[unknown]
[unknown]
_PyRun_SimpleFileObject
_PyRun_AnyFileObject
Py_RunMain
[unknown]
Py_BytesMain
[unknown]
__libc_start_main
- python (1293)
7
write
write
[unknown]
sudo_ev_loop_v1
sudo_ev_dispatch_v1
[unknown]
[unknown]
[unknown]
[unknown]
__libc_start_main
- sudo (1291)
7
syscall
syscall
bpf_open_perf_buffer_opts
[unknown]
[unknown]
[unknown]
[unknown]
_PyObject_MakeTpCall
PyObject_Vectorcall
_PyEval_EvalFrameDefault
PyEval_EvalCode
[unknown]
[unknown]
[unknown]
_PyRun_SimpleFileObject
_PyRun_AnyFileObject
Py_RunMain
[unknown]
Py_BytesMain
[unknown]
__libc_start_main
- python (1293)
11
clock_nanosleep
clock_nanosleep
nanosleep
sleep
[unknown]
[unknown]
__clone
- multipathd (698)
3001661
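The truncation described in this message can be modelled in a few lines. This is only an illustration with made-up addresses and a hypothetical helper name: a generic unwinder that treats a zero entry as the end of the trace sees one frame when the second entry (the LR slot) is zero, and the whole trace once that slot carries a real, if duplicate, value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model of a generic userspace unwinder: walk the recorded entries and
 * stop at the first zero, which is taken as the end of the stack trace.
 */
static size_t frames_until_zero(const uint64_t *trace, size_t len)
{
	size_t n = 0;

	assert(trace || len == 0);

	while (n < len && trace[n] != 0)
		n++;
	return n;
}
```

For example, with the entries { 0x1000, 0, 0x2000, 0x3000 } the helper reports one frame, while { 0x1000, 0x1000, 0x2000, 0x3000 } yields all four, matching the duplicated first entry visible in the output above.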
Fixes: 7fa95f9adaee ("powerpc/64s: system call support for scv/rfscv instructions")
Cc: stable@vger.kernel.org
Reported-by: "Nysal Jan K.A" <nysal@linux.ibm.com>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240202154316.395276-1-naveen@kernel.org
Commit 2ad56efa80db ("powerpc/iommu: Setup a default domain and
remove set_platform_dma_ops") refactored the code, removing
set_platform_dma_ops(). It missed the table group
release_ownership() call, which would otherwise have been called
during guest shutdown via vfio_group_detach_container(). On
PPC64, this particular call actually sets up the 32-bit TCE table
and enables the 64-bit DMA bypass, etc. Now, after guest shutdown,
the subsequent host driver (e.g. megaraid-sas) probe after unbinding
from vfio-pci fails like this:
megaraid_sas 0031:01:00.0: Warning: IOMMU dma not supported: mask 0x7fffffffffffffff, table unavailable
megaraid_sas 0031:01:00.0: Warning: IOMMU dma not supported: mask 0xffffffff, table unavailable
megaraid_sas 0031:01:00.0: Failed to set DMA mask
megaraid_sas 0031:01:00.0: Failed from megasas_init_fw 6539
The patch brings back the call to table_group release_ownership()
when switching back to the PLATFORM domain from BLOCKED, and
also separates the domain_ops for the two.
Fixes: 2ad56efa80db ("powerpc/iommu: Setup a default domain and remove set_platform_dma_ops")
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/170628173462.3742.18330000394415935845.stgit@ltcd48-lp2.aus.stglab.ibm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The goal is to get sched.h down to a type only header, so the main thing
happening in this patchset is splitting out various _types.h headers and
dependency fixups, as well as moving some things out of sched.h to
better locations.
This is prep work for the memory allocation profiling patchset which
adds new sched.h interdependencies.
Testing - it's been in -next, and fixes from pretty much all
architectures have percolated in - nothing major.
Merge tag 'header_cleanup-2024-01-10' of https://evilpiepirate.org/git/bcachefs
Pull header cleanups from Kent Overstreet:
"The goal is to get sched.h down to a type only header, so the main
thing happening in this patchset is splitting out various _types.h
headers and dependency fixups, as well as moving some things out of
sched.h to better locations.
This is prep work for the memory allocation profiling patchset which
adds new sched.h interdependencies"
* tag 'header_cleanup-2024-01-10' of https://evilpiepirate.org/git/bcachefs: (51 commits)
Kill sched.h dependency on rcupdate.h
kill unnecessary thread_info.h include
Kill unnecessary kernel.h include
preempt.h: Kill dependency on list.h
rseq: Split out rseq.h from sched.h
LoongArch: signal.c: add header file to fix build error
restart_block: Trim includes
lockdep: move held_lock to lockdep_types.h
sem: Split out sem_types.h
uidgid: Split out uidgid_types.h
seccomp: Split out seccomp_types.h
refcount: Split out refcount_types.h
uapi/linux/resource.h: fix include
x86/signal: kill dependency on time.h
syscall_user_dispatch.h: split out *_types.h
mm_types_task.h: Trim dependencies
Split out irqflags_types.h
ipc: Kill bogus dependency on spinlock.h
shm: Slim down dependencies
workqueue: Split out workqueue_types.h
...
- btree write buffer rewrite: instead of adding keys to the btree write
buffer at transaction commit time, we now journal them with a
different journal entry type and copy them from the journal to the
write buffer just prior to journal write.
This reduces the number of atomic operations on shared cachelines
in the transaction commit path and is a significant performance
improvement on some workloads: multithreaded 4k random writes went
from ~650k iops to ~850k iops.
- Bring back optimistic spinning for six locks: the new implementation
doesn't use osq locks; instead we add to the lock waitlist as normal,
and then spin on the lock_acquired bit in the waitlist entry, _not_
the lock itself.
- BCH_IOCTL_DEV_USAGE_V2, which allows for new data types
- BCH_IOCTL_OFFLINE_FSCK, which runs the kernel implementation of fsck
but without mounting: useful for transparently using the kernel
version of fsck from 'bcachefs fsck' when the kernel version is a
better match for the on disk filesystem.
- BCH_IOCTL_ONLINE_FSCK: online fsck. Not all passes are supported yet,
but the passes that are supported are fully featured - errors may be
corrected as normal.
The new ioctls use the new 'thread_with_file' abstraction for kicking
off a kthread that's tied to a file descriptor returned to userspace
via the ioctl.
- btree_paths within a btree_trans are now dynamically growable,
instead of being limited to 64. This is important for the
check_directory_structure phase of fsck, and also fixes some issues
we were having with btree path overflow in the reflink btree.
- Trigger refactoring; prep work for the upcoming disk space accounting
rewrite
- Numerous bugfixes :)
Merge tag 'bcachefs-2024-01-10' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs updates from Kent Overstreet:
- btree write buffer rewrite: instead of adding keys to the btree write
buffer at transaction commit time, we now journal them with a
different journal entry type and copy them from the journal to the
write buffer just prior to journal write.
This reduces the number of atomic operations on shared cachelines in
the transaction commit path and is a significant performance
improvement on some workloads: multithreaded 4k random writes went
from ~650k iops to ~850k iops.
- Bring back optimistic spinning for six locks: the new implementation
doesn't use osq locks; instead we add to the lock waitlist as normal,
and then spin on the lock_acquired bit in the waitlist entry, _not_
the lock itself.
- New ioctls:
- BCH_IOCTL_DEV_USAGE_V2, which allows for new data types
- BCH_IOCTL_OFFLINE_FSCK, which runs the kernel implementation of
fsck but without mounting: useful for transparently using the
kernel version of fsck from 'bcachefs fsck' when the kernel
version is a better match for the on disk filesystem.
- BCH_IOCTL_ONLINE_FSCK: online fsck. Not all passes are supported
yet, but the passes that are supported are fully featured - errors
may be corrected as normal.
The new ioctls use the new 'thread_with_file' abstraction for kicking
off a kthread that's tied to a file descriptor returned to userspace
via the ioctl.
- btree_paths within a btree_trans are now dynamically growable,
instead of being limited to 64. This is important for the
check_directory_structure phase of fsck, and also fixes some issues
we were having with btree path overflow in the reflink btree.
- Trigger refactoring; prep work for the upcoming disk space accounting
rewrite
- Numerous bugfixes :)
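The optimistic-spinning item in the list above (spin on a flag in the waiter's own waitlist entry rather than on the lock word) can be sketched with C11 atomics. This is not bcachefs code: the struct, the field layout and the function names are hypothetical, and a real implementation would also handle queueing, sleeping and wakeups.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical waitlist entry owned by one waiting thread. */
struct sketch_waiter {
	atomic_bool lock_acquired;	/* set by the thread handing off the lock */
};

static void sketch_waiter_init(struct sketch_waiter *w)
{
	assert(w);
	atomic_init(&w->lock_acquired, false);
}

/*
 * Spin for a bounded number of iterations on this waiter's own flag.
 * Returns true if the lock was handed over while spinning; on false the
 * caller would fall back to sleeping.
 */
static bool sketch_spin_on_waiter(struct sketch_waiter *w, unsigned int max_spins)
{
	assert(w);

	for (unsigned int i = 0; i < max_spins; i++) {
		if (atomic_load_explicit(&w->lock_acquired, memory_order_acquire))
			return true;
	}
	return false;
}

/* Called by the unlocking side: transfer ownership to this waiter. */
static void sketch_handoff(struct sketch_waiter *w)
{
	assert(w);
	atomic_store_explicit(&w->lock_acquired, true, memory_order_release);
}
```

Because each waiter polls only its own entry, spinning threads in this sketch do not repeatedly read the shared lock word.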
* tag 'bcachefs-2024-01-10' of https://evilpiepirate.org/git/bcachefs: (226 commits)
bcachefs: eytzinger0_find() search should be const
bcachefs: move "ptrs not changing" optimization to bch2_trigger_extent()
bcachefs: fix simulateously upgrading & downgrading
bcachefs: Restart recovery passes more reliably
bcachefs: bch2_dump_bset() doesn't choke on u64s == 0
bcachefs: improve checksum error messages
bcachefs: improve validate_bset_keys()
bcachefs: print sb magic when relevant
bcachefs: __bch2_sb_field_to_text()
bcachefs: %pg is banished
bcachefs: Improve would_deadlock trace event
bcachefs: fsck_err()s don't need to manually check c->sb.version anymore
bcachefs: Upgrades now specify errors to fix, like downgrades
bcachefs: no thread_with_file in userspace
bcachefs: Don't autofix errors we can't fix
bcachefs: add missing bch2_latency_acct() call
bcachefs: increase max_active on io_complete_wq
bcachefs: add time_stats for btree_node_read_done()
bcachefs: don't clear accessed bit in btree node fill
bcachefs: Add an option to control btree node prefetching
...
Merge tag 'lsm-pr-20240105' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull security module updates from Paul Moore:
- Add three new syscalls: lsm_list_modules(), lsm_get_self_attr(), and
lsm_set_self_attr().
The first syscall simply lists the LSMs enabled, while the second and
third get and set the current process' LSM attributes. Yes, these
syscalls may provide similar functionality to what can be found under
/proc or /sys, but they were designed to support multiple,
simultaneous (stacked) LSMs from the start as opposed to the current
/proc based solutions which were created at a time when only one LSM
was allowed to be active at a given time.
We have spent considerable time discussing ways to extend the
existing /proc interfaces to support multiple, simultaneous LSMs and
even our best ideas have been far too ugly to support as a kernel
API; after +20 years in the kernel, I felt the LSM layer had
established itself enough to justify a handful of syscalls.
Support amongst the individual LSM developers has been nearly
unanimous, with a single objection coming from Tetsuo (TOMOYO) as he
is worried that the LSM_ID_XXX token concept will make it more
difficult for out-of-tree LSMs to survive. Several members of the LSM
community have demonstrated the ability for out-of-tree LSMs to
continue to exist by picking high/unused LSM_ID values as well as
pointing out that many kernel APIs rely on integer identifiers, e.g.
syscalls (!), but unfortunately Tetsuo's objections remain.
My personal opinion is that while I have no interest in penalizing
out-of-tree LSMs, I'm not going to penalize in-tree development to
support out-of-tree development, and I view this as a necessary step
forward to support the push for expanded LSM stacking and reduce our
reliance on /proc and /sys which has occasionally been problematic
for some container users. Finally, we have included the linux-api
folks on (all?) recent revisions of the patchset and addressed all of
their concerns.
- Add a new security_file_ioctl_compat() LSM hook to handle the 32-bit
ioctls on 64-bit systems problem.
This patch includes support for all of the existing LSMs which
provide ioctl hooks, although it turns out only SELinux actually
cares about the individual ioctls. It is worth noting that while
Casey (Smack) and Tetsuo (TOMOYO) did not give explicit ACKs to this
patch, they did both indicate they are okay with the changes.
- Fix a potential memory leak in the CALIPSO code when IPv6 is disabled
at boot.
While it's good that we are fixing this, I doubt this is something
users are seeing in the wild as you need to both disable IPv6 and
then attempt to configure IPv6 labeled networking via
NetLabel/CALIPSO; that just doesn't make much sense.
Normally this would go through netdev, but Jakub asked me to take
this patch and of all the trees I maintain, the LSM tree seemed like
the best fit.
- Update the LSM MAINTAINERS entry with additional information about
our process docs, patchwork, bug reporting, etc.
I also noticed that the Lockdown LSM is missing a dedicated
MAINTAINERS entry so I've added that to the pull request. I've been
working with one of the major Lockdown authors/contributors to see if
they are willing to step up and assume a Lockdown maintainer role;
hopefully that will happen soon, but in the meantime I'll continue to
look after it.
- Add a handful of mailmap entries for Serge Hallyn and myself.
* tag 'lsm-pr-20240105' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (27 commits)
lsm: new security_file_ioctl_compat() hook
lsm: Add a __counted_by() annotation to lsm_ctx.ctx
calipso: fix memory leak in netlbl_calipso_add_pass()
selftests: remove the LSM_ID_IMA check in lsm/lsm_list_modules_test
MAINTAINERS: add an entry for the lockdown LSM
MAINTAINERS: update the LSM entry
mailmap: add entries for Serge Hallyn's dead accounts
mailmap: update/replace my old email addresses
lsm: mark the lsm_id variables are marked as static
lsm: convert security_setselfattr() to use memdup_user()
lsm: align based on pointer length in lsm_fill_user_ctx()
lsm: consolidate buffer size handling into lsm_fill_user_ctx()
lsm: correct error codes in security_getselfattr()
lsm: cleanup the size counters in security_getselfattr()
lsm: don't yet account for IMA in LSM_CONFIG_COUNT calculation
lsm: drop LSM_ID_IMA
LSM: selftests for Linux Security Module syscalls
SELinux: Add selfattr hooks
AppArmor: Add selfattr hooks
Smack: implement setselfattr and getselfattr hooks
...
- Add initial support to recognise the HeXin C2000 processor.
- Add papr-vpd and papr-sysparm character device drivers for VPD & sysparm
retrieval, so userspace tools can be adapted to avoid doing raw firmware
calls from userspace.
- Sched domains optimisations for shared processor partitions on P9/P10.
- A series of optimisations for KVM running as a nested HV under PowerVM.
- Other small features and fixes.
Thanks to: Aditya Gupta, Aneesh Kumar K.V, Arnd Bergmann, Christophe Leroy,
Colin Ian King, Dario Binacchi, David Heidelberg, Geoff Levand, Gustavo A.
R. Silva, Haoran Liu, Jordan Niethe, Kajol Jain, Kevin Hao, Kunwu Chan, Li
kunyu, Li zeming, Masahiro Yamada, Michal Suchánek, Nathan Lynch, Naveen N Rao,
Nicholas Piggin, Randy Dunlap, Sathvika Vasireddy, Srikar Dronamraju, Stephen
Rothwell, Vaibhav Jain, Zhao Ke.
Merge tag 'powerpc-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Add initial support to recognise the HeXin C2000 processor.
- Add papr-vpd and papr-sysparm character device drivers for VPD &
sysparm retrieval, so userspace tools can be adapted to avoid doing
raw firmware calls from userspace.
- Sched domains optimisations for shared processor partitions on
P9/P10.
- A series of optimisations for KVM running as a nested HV under
PowerVM.
- Other small features and fixes.
Thanks to Aditya Gupta, Aneesh Kumar K.V, Arnd Bergmann, Christophe
Leroy, Colin Ian King, Dario Binacchi, David Heidelberg, Geoff Levand,
Gustavo A. R. Silva, Haoran Liu, Jordan Niethe, Kajol Jain, Kevin Hao,
Kunwu Chan, Li kunyu, Li zeming, Masahiro Yamada, Michal Suchánek,
Nathan Lynch, Naveen N Rao, Nicholas Piggin, Randy Dunlap, Sathvika
Vasireddy, Srikar Dronamraju, Stephen Rothwell, Vaibhav Jain, and
Zhao Ke.
* tag 'powerpc-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (96 commits)
powerpc/ps3_defconfig: Disable PPC64_BIG_ENDIAN_ELF_ABI_V2
powerpc/86xx: Drop unused CONFIG_MPC8610
powerpc/powernv: Add error handling to opal_prd_range_is_valid
selftests/powerpc: Fix spelling mistake "EACCESS" -> "EACCES"
powerpc/hvcall: Reorder Nestedv2 hcall opcodes
powerpc/ps3: Add missing set_freezable() for ps3_probe_thread()
powerpc/mpc83xx: Use wait_event_freezable() for freezable kthread
powerpc/mpc83xx: Add the missing set_freezable() for agent_thread_fn()
powerpc/fsl: Fix fsl,tmu-calibration to match the schema
powerpc/smp: Dynamically build Powerpc topology
powerpc/smp: Avoid asym packing within thread_group of a core
powerpc/smp: Add __ro_after_init attribute
powerpc/smp: Disable MC domain for shared processor
powerpc/smp: Enable Asym packing for cores on shared processor
powerpc/sched: Cleanup vcpu_is_preempted()
powerpc: add cpu_spec.cpu_features to vmcoreinfo
powerpc/imc-pmu: Add a null pointer check in update_events_in_group()
powerpc/powernv: Add a null pointer check in opal_powercap_init()
powerpc/powernv: Add a null pointer check in opal_event_init()
powerpc/powernv: Add a null pointer check to scom_debug_init_one()
...
Merge tag 'vfs-6.8.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:
"This contains the work to retrieve detailed information about mounts
via two new system calls. This is hopefully the beginning of the end
of the saga that started with fsinfo() years ago.
The LWN articles in [1] and [2] can serve as a summary so we can avoid
rehashing everything here.
At LSFMM in May 2022 we got into a room and agreed on what we want to
do about fsinfo(). Basically, split it into pieces. This is the first
part of that agreement. Specifically, it is concerned with retrieving
information about mounts. So this only concerns the mount information
retrieval, not the mount table change notification, or the extended
filesystem specific mount option work. That is separate work.
Currently mounts have a 32bit id. Mount ids are already in heavy use
by libmount and other low-level userspace but they can't be relied
upon because they're recycled very quickly. We agreed that mounts
should carry a unique 64bit id by which they can be referenced
directly. This is now implemented as part of this work.
The new 64bit mount id is exposed in statx() through the new
STATX_MNT_ID_UNIQUE flag. If the flag isn't raised the old mount id is
returned. If it is raised and the kernel supports the new 64bit mount
id the flag is raised in the result mask and the new 64bit mount id is
returned. New and old mount ids do not overlap so they cannot be
conflated.
Two new system calls are introduced that operate on the 64bit mount
id: statmount() and listmount(). A summary of the api and usage can be
found on LWN as well (cf. [3]) but of course, I'll provide a summary
here as well.
Both system calls rely on struct mnt_id_req, which is the request
struct used to pass the 64bit mount id identifying the mount to
operate on. It is extensible to allow for the addition of new
parameters and for future use in other apis that make use of mount
ids.
statmount() mimics the semantics of statx() and exposes a set of flags
that userspace may raise in mnt_id_req to request specific information
to be retrieved. A statmount() call returns a struct statmount filled
in with information about the requested mount. Supported requests are
indicated by raising the request flag passed in struct mnt_id_req in
the @mask argument in struct statmount.
Currently we do support:
- STATMOUNT_SB_BASIC:
Basic filesystem info
- STATMOUNT_MNT_BASIC
Mount information (mount id, parent mount id, mount attributes etc)
- STATMOUNT_PROPAGATE_FROM
Propagation from what mount in current namespace
- STATMOUNT_MNT_ROOT
Path of the root of the mount (e.g., mount --bind /bla /mnt returns /bla)
- STATMOUNT_MNT_POINT
Path of the mount point (e.g., mount --bind /bla /mnt returns /mnt)
- STATMOUNT_FS_TYPE
Name of the filesystem type as the magic number isn't enough due to submounts
The string options STATMOUNT_MNT_{ROOT,POINT} and STATMOUNT_FS_TYPE
are appended to the end of the struct. Userspace can use the offsets
in @fs_type, @mnt_root, and @mnt_point to reference those strings
easily.
The struct statmount reserves quite a bit of space currently for
future extensibility. This isn't really a problem and if this bothers
us we can just send a follow-up pull request during this cycle.
listmount() is given a 64bit mount id via mnt_id_req just as
statmount(). It takes a buffer and a size to return an array of the
64bit ids of the child mounts of the requested mount. Userspace can
thus choose to either retrieve child mounts for a mount in batches or
iterate through the child mounts. For most use-cases it will be
sufficient to just leave space for a few child mounts. But for big
mount tables having an iterator is really helpful. Iterating through a
mount table works by setting @param in mnt_id_req to the mount id of
the last child mount retrieved in the previous listmount() call"
Link: https://lwn.net/Articles/934469 [1]
Link: https://lwn.net/Articles/829212 [2]
Link: https://lwn.net/Articles/950569 [3]
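The batch iteration described for listmount() can be sketched as a small userspace model. This is not the real system call: model_listmount(), count_children() and the fixed 16-entry buffer are hypothetical names and choices made for illustration, and the model assumes that a cursor value of 0 means "start from the first child". It only shows how the id of the last child returned becomes the cursor (@param) for the next call.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for listmount(): copies up to bufsize child mount
 * ids that follow `last` in the child list. last == 0 is assumed to mean
 * "start from the first child".
 */
static size_t model_listmount(const uint64_t *children, size_t nchildren,
                              uint64_t last, uint64_t *buf, size_t bufsize)
{
    size_t i = 0, n = 0;

    if (last) {
        /* Skip up to and including the entry returned last time. */
        while (i < nchildren && children[i] != last)
            i++;
        if (i < nchildren)
            i++;
    }
    while (i < nchildren && n < bufsize)
        buf[n++] = children[i++];
    return n;
}

/* Walk all children in batches and return how many were seen. */
static size_t count_children(const uint64_t *children, size_t nchildren,
                             size_t batch)
{
    uint64_t buf[16];
    uint64_t last = 0;
    size_t total = 0, n;

    if (batch == 0 || batch > 16)
        batch = 16;
    while ((n = model_listmount(children, nchildren, last, buf, batch)) > 0) {
        total += n;
        last = buf[n - 1];  /* cursor for the next call, like @param */
    }
    return total;
}
```

A caller with a small mount table can pass a buffer large enough for all children and make a single call; the loop only matters for large tables.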
* tag 'vfs-6.8.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
add selftest for statmount/listmount
fs: keep struct mnt_id_req extensible
wire up syscalls for statmount/listmount
add listmount(2) syscall
statmount: simplify string option retrieval
statmount: simplify numeric option retrieval
add statmount(2) syscall
namespace: extract show_path() helper
mounts: keep list of mounts in an rbtree
add unique mount ID
bcachefs's six locks need kvm_guest, via
owner_on_cpu() -> vcpu_is_preempted() -> is_kvm_guest()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: linuxppc-dev@lists.ozlabs.org
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
We're trying to get sched.h down to more or less just types only, not
code - rseq can live in its own header.
This helps us kill the dependency on preempt.h in sched.h.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Merge a branch containing SMP topology updates from Srikar, purely so we can
include the cover letter which has a lot of good detail here:
PowerVM systems configured in shared processors mode have some unique
challenges. Some device-tree properties will be missing on a shared
processor. Hence some sched domains may not make sense for shared processor
systems.
Most shared processor systems are over-provisioned. Underlying PowerVM
Hypervisor would schedule at a Big Core (SMT8) granularity. The most recent
power processors support two almost independent cores. In a lightly loaded
condition, it helps the overall system performance if we pack to a smaller number
of Big Cores.
Since each thread-group is independent, running threads on both the
thread-groups of a SMT8 core, should have a minimal adverse impact in
non over provisioned scenarios. The changes in this patchset will not
affect the over provisioned scenario. If there are more threads than
SMT domains, then asym_packing will not kick in.
System Configuration
type=Shared mode=Uncapped smt=8 lcpu=96 mem=1066409344 kB cpus=96 ent=64.00
So *64 Entitled cores/ 96 Virtual processor* Scenario
lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 768
On-line CPU(s) list: 0-767
Model name: POWER10 (architected), altivec supported
Model: 2.0 (pvr 0080 0200)
Thread(s) per core: 8
Core(s) per socket: 16
Socket(s): 6
Hypervisor vendor: pHyp
Virtualization type: para
L1d cache: 6 MiB (192 instances)
L1i cache: 9 MiB (192 instances)
NUMA node(s): 6
NUMA node0 CPU(s): 0-7,32-39,80-87,128-135,176-183,224-231,272-279,320-327,368-375,416-423,464-471,512-519,560-567,608-615,656-663,704-711,752-759
NUMA node1 CPU(s): 8-15,40-47,88-95,136-143,184-191,232-239,280-287,328-335,376-383,424-431,472-479,520-527,568-575,616-623,664-671,712-719,760-767
NUMA node4 CPU(s): 64-71,112-119,160-167,208-215,256-263,304-311,352-359,400-407,448-455,496-503,544-551,592-599,640-647,688-695,736-743
NUMA node5 CPU(s): 16-23,48-55,96-103,144-151,192-199,240-247,288-295,336-343,384-391,432-439,480-487,528-535,576-583,624-631,672-679,720-727
NUMA node6 CPU(s): 72-79,120-127,168-175,216-223,264-271,312-319,360-367,408-415,456-463,504-511,552-559,600-607,648-655,696-703,744-751
NUMA node7 CPU(s): 24-31,56-63,104-111,152-159,200-207,248-255,296-303,344-351,392-399,440-447,488-495,536-543,584-591,632-639,680-687,728-735
ebizzy -t 32 -S 200 (5 iterations) Records per second. (Higher is better)
Kernel N Min Max Median Avg Stddev %Change
6.6.0-rc3 5 3840178 4059268 3978042 3973936.6 84264.456
+patch 5 3768393 3927901 3874994 3854046 71532.926 -3.01692
From lparstat (when the workload stabilized)
Kernel %user %sys %wait %idle physc %entc lbusy app vcsw phint
6.6.0-rc3 4.16 0.00 0.00 95.84 26.06 40.72 4.16 69.88 276906989 578
+patch 4.16 0.00 0.00 95.83 17.70 27.66 4.17 78.26 70436663 119
ebizzy -t 128 -S 200 (5 iterations) Records per second. (Higher is better)
Kernel N Min Max Median Avg Stddev %Change
6.6.0-rc3 5 5520692 5981856 5717709 5727053.2 176093.2
+patch 5 5305888 6259610 5854590 5843311 375917.03 2.02998
From lparstat (when the workload stabilized)
Kernel %user %sys %wait %idle physc %entc lbusy app vcsw phint
6.6.0-rc3 16.66 0.00 0.00 83.33 45.49 71.08 16.67 50.50 288778533 581
+patch 16.65 0.00 0.00 83.35 30.15 47.11 16.65 65.76 85196150 133
ebizzy -t 512 -S 200 (5 iterations) Records per second. (Higher is better)
Kernel N Min Max Median Avg Stddev %Change
6.6.0-rc3 5 19563921 20049955 19701510 19728733 198295.18
+patch 5 19455992 20176445 19718427 19832017 304094.05 0.523521
From lparstat (when the workload stabilized)
Kernel %user %sys %wait %idle physc %entc lbusy app vcsw phint
6.6.0-rc3 66.44 0.01 0.00 33.55 94.14 147.09 66.45 1.33 313345175 621
+patch 66.44 0.01 0.00 33.55 94.15 147.11 66.45 1.33 109193889 309
System Configuration
type=Shared mode=Uncapped smt=8 lcpu=40 mem=1067539392 kB cpus=96 ent=40.00
So *40 Entitled cores/ 40 Virtual processor* Scenario
lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 320
On-line CPU(s) list: 0-319
Model name: POWER10 (architected), altivec supported
Model: 2.0 (pvr 0080 0200)
Thread(s) per core: 8
Core(s) per socket: 10
Socket(s): 4
Hypervisor vendor: pHyp
Virtualization type: para
L1d cache: 2.5 MiB (80 instances)
L1i cache: 3.8 MiB (80 instances)
NUMA node(s): 4
NUMA node0 CPU(s): 0-7,32-39,64-71,96-103,128-135,160-167,192-199,224-231,256-263,288-295
NUMA node1 CPU(s): 8-15,40-47,72-79,104-111,136-143,168-175,200-207,232-239,264-271,296-303
NUMA node4 CPU(s): 16-23,48-55,80-87,112-119,144-151,176-183,208-215,240-247,272-279,304-311
NUMA node5 CPU(s): 24-31,56-63,88-95,120-127,152-159,184-191,216-223,248-255,280-287,312-319
ebizzy -t 32 -S 200 (5 iterations) Records per second. (Higher is better)
Kernel N Min Max Median Avg Stddev %Change
6.6.0-rc3 5 3535518 3864532 3745967 3704233.2 130216.76
+patch 5 3608385 3708026 3649379 3651596.6 37862.163 -1.42099
Kernel %user %sys %wait %idle physc %entc lbusy app vcsw phint
6.6.0-rc3 10.00 0.01 0.00 89.99 22.98 57.45 10.01 41.01 1135139 262
+patch 10.00 0.00 0.00 90.00 16.95 42.37 10.00 47.05 925561 19
ebizzy -t 64 -S 200 (5 iterations) Records per second. (Higher is better)
Kernel N Min Max Median Avg Stddev %Change
6.6.0-rc3 5 4434984 4957281 4548786 4591298.2 211770.2
+patch 5 4461115 4835167 4544716 4607795.8 151474.85 0.359323
Kernel %user %sys %wait %idle physc %entc lbusy app vcsw phint
6.6.0-rc3 20.01 0.00 0.00 79.99 38.22 95.55 20.01 25.77 1287553 265
+patch 19.99 0.00 0.00 80.01 25.55 63.88 19.99 38.44 1077341 20
ebizzy -t 256 -S 200 (5 iterations) Records per second. (Higher is better)
Kernel N Min Max Median Avg Stddev %Change
6.6.0-rc3 5 8850648 8982659 8951911 8936869.2 52278.031
+patch 5 8751038 9060510 8981409 8942268.4 117070.6 0.0604149
Kernel %user %sys %wait %idle physc %entc lbusy app vcsw phint
6.6.0-rc3 80.02 0.01 0.01 19.96 40.00 100.00 80.03 24.00 1597665 276
+patch 80.02 0.01 0.01 19.96 40.00 100.00 80.03 23.99 1383921 63
Observation:
We are able to see an improvement in ebizzy throughput even with lower
core utilization (almost half the core utilization) in low utilization
scenarios while still retaining throughput in mid and higher utilization
scenarios.
Note: The numbers are with Uncapped + no-noise case. In the Capped and/or
noise case, due to contention on the Cores, the numbers are expected to
further improve.
Note: The numbers include the patch (sched/fair: Enable group_asym_packing in find_idlest_group)
https://lore.kernel.org/all/20231018155036.2314342-1-srikar@linux.vnet.ibm.com/
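The %Change column in the ebizzy tables above appears to be the relative difference between the two Avg values. A minimal sketch of that arithmetic, assuming this reading of the column; pct_change() is a hypothetical helper name, not something taken from the patchset:

```c
/* Percentage change of `patched` relative to `baseline`. */
static double pct_change(double baseline, double patched)
{
    return (patched - baseline) / baseline * 100.0;
}
```

Feeding it the Avg values from the first table (3973936.6 for 6.6.0-rc3 and 3854046 for +patch) gives roughly -3.017, which agrees with the reported -3.01692.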
Currently there are four Powerpc specific sched topologies. These are
all statically defined. However not all these topologies are used by
all Powerpc systems.
To avoid unnecessary degenerations by the scheduler, masks and flags
are compared. However if the sched topologies are built dynamically then
the code is simpler and there are greater chances of avoiding
degenerations.
Note:
Even X86 builds its sched topologies dynamically and proposed changes
are very similar to the way X86 is building its topologies.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231214180720.310852-6-srikar@linux.vnet.ibm.com
PowerVM Hypervisor will schedule at a core granularity. However each
core can have more than one thread_group. For better utilization in
case of a shared processor, it's preferable for the scheduler to pack to
the lowest core. However there is no benefit in moving a thread between
two thread groups of the same core.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231214180720.310852-5-srikar@linux.vnet.ibm.com
There are some variables that are only updated at boot time.
So add __ro_after_init attribute to such variables
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231214180720.310852-4-srikar@linux.vnet.ibm.com
Like L2-cache info, coregroup information which is used to determine MC
sched domains is only present on dedicated LPARs. i.e. PowerVM doesn't
export coregroup information for shared processor LPARs. Hence disable
creating MC domains on shared LPAR Systems.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231214180720.310852-3-srikar@linux.vnet.ibm.com
If there are shared processor LPARs, underlying Hypervisor can have more
virtual cores to handle than actual physical cores.
Starting with Power 9, a big core (aka SMT8 core) has 2 nearly
independent thread groups. On shared processor LPARs, it helps to
pack threads onto fewer cores so that the overall system
performance and utilization improve. PowerVM schedules at a big core
level. Hence packing to fewer cores helps.
Since each thread-group is independent, running threads on both the
thread-groups of a SMT8 core, should have a minimal adverse impact in
non over provisioned scenarios. The changes in this patchset will not
affect the over provisioned scenario. If there are more threads than
SMT domains, then asym_packing will not kick in.
For example: Let's say there are two 8-core Shared LPARs that are
actually sharing an 8-core shared physical pool, each running 8 threads.
Then consolidating 8 threads to 4 cores on each LPAR would help
them to perform better. This is because each LPAR will get
100% time to run applications and there will be no switching required by
the Hypervisor.
To achieve this, enable SD_ASYM_PACKING flag at CACHE, MC and DIE level
when the system is running in shared processor mode and has big cores.
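The arithmetic behind the two-LPAR example can be written out as a short sketch. It assumes one runnable thread per thread group and two thread groups per SMT8 big core, as described above; packed_cores() is a hypothetical helper used only for illustration and is not part of the patch.

```c
/*
 * Number of big cores needed to run `threads` runnable threads when each
 * core offers `groups_per_core` nearly independent thread groups and each
 * group runs one thread. Returns 0 for an invalid group count.
 */
static unsigned int packed_cores(unsigned int threads,
                                 unsigned int groups_per_core)
{
    if (groups_per_core == 0)
        return 0;
    /* Round up: a partly used core still counts as one core. */
    return (threads + groups_per_core - 1) / groups_per_core;
}
```

With 8 threads and 2 thread groups per core, each LPAR needs 4 cores, so the two LPARs together fit in the 8-core physical pool without the hypervisor having to switch between them.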
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231214180720.310852-2-srikar@linux.vnet.ibm.com
If the function descriptor has a populated lock member, then callers
are required to hold it across calls. Now that the firmware activation
sequence is appropriately guarded, we can warn when the requirement
isn't satisfied.
__do_enter_rtas_trace() gets reorganized a bit as a result of
performing the function descriptor lookup unconditionally now.
Reviewed-by: "Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org>
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-8-e9eafd0c8c6c@linux.ibm.com
Use rtas_ibm_activate_firmware_lock to prevent interleaving call
sequences of the ibm,activate-firmware RTAS function, which typically
requires multiple calls to complete the update. While the spec does
not specifically prohibit interleaved sequences, there's almost
certainly no advantage to allowing them.
Reviewed-by: "Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org>
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-7-e9eafd0c8c6c@linux.ibm.com
On RTAS platforms there is a general restriction that the OS must not
enter RTAS on more than one CPU at a time. This low-level
serialization requirement is satisfied by holding a spin
lock (rtas_lock) across most RTAS function invocations.
However, some pseries RTAS functions require multiple successive calls
to complete a logical operation. Beginning a new call sequence for such a
function may disrupt any other sequences of that function already in
progress. Safe and reliable use of these functions effectively
requires higher-level serialization beyond what is already done at the
level of RTAS entry and exit.
Where a sequence-based RTAS function is invoked only through
sys_rtas(), with no in-kernel users, there is no issue as far as the
kernel is concerned. User space is responsible for appropriately
serializing its call sequences. (Whether user space code actually
takes measures to prevent sequence interleaving is another matter.)
Examples of such functions currently include ibm,platform-dump and
ibm,get-vpd.
But where a sequence-based RTAS function has both user space and
in-kernel users, there is a hazard. Even if the in-kernel call sites
of such a function serialize their sequences correctly, a user of
sys_rtas() can invoke the same function at any time, potentially
disrupting a sequence in progress.
So in order to prevent disruption of kernel-based RTAS call sequences,
they must serialize not only with themselves but also with sys_rtas()
users, somehow. Preferably without adding more function-specific hacks
to sys_rtas(). This is a prerequisite for adding an in-kernel call
sequence of ibm,get-vpd, which is in a change to follow.
Note that it has never been feasible for the kernel to prevent
sys_rtas()-based sequences from being disrupted because control
returns to user space on every call. sys_rtas()-based users of these
functions have always been, and continue to be, responsible for
coordinating their call sequences with other users, even those which
may invoke the RTAS functions through less direct means than
sys_rtas(). This is an unavoidable consequence of exposing
sequence-based RTAS functions through sys_rtas().
* Add an optional mutex member to struct rtas_function.
* Statically define a mutex for each RTAS function with known call
sequence serialization requirements, and assign its address to the
.lock member of the corresponding function table entry, along with
justifying commentary.
* In sys_rtas(), if the table entry for the RTAS function being
called has a populated lock member, acquire it before taking
rtas_lock and entering RTAS.
* Kernel-based RTAS call sequences are expected to access the
appropriate mutex explicitly by name. For example, a user of the
ibm,activate-firmware RTAS function would do:
int token = rtas_function_token(RTAS_FN_IBM_ACTIVATE_FIRMWARE);
int fwrc;
mutex_lock(&rtas_ibm_activate_firmware_lock);
do {
fwrc = rtas_call(token, 0, 1, NULL);
} while (rtas_busy_delay(fwrc));
mutex_unlock(&rtas_ibm_activate_firmware_lock);
There should be no perceivable change introduced here except that
concurrent callers of the same RTAS function via sys_rtas() may block
on a mutex instead of spinning on rtas_lock.
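The sys_rtas() side of this scheme (the third bullet above) can be modelled in userspace with POSIX mutexes. This is only a sketch under stated assumptions: entry_lock stands in for rtas_lock (a spinlock in the kernel), and struct fn_model, fn_model_table and model_sys_call() are hypothetical names, not the kernel's definitions.

```c
#include <pthread.h>
#include <stddef.h>

/* Stand-in for the global rtas_lock. */
static pthread_mutex_t entry_lock = PTHREAD_MUTEX_INITIALIZER;
/* Stand-in for a per-function call sequence lock. */
static pthread_mutex_t activate_fw_lock = PTHREAD_MUTEX_INITIALIZER;

struct fn_model {
    const char *name;
    pthread_mutex_t *lock;  /* NULL when no sequence serialization is needed */
};

static const struct fn_model fn_model_table[] = {
    { "ibm,activate-firmware", &activate_fw_lock },
    { "get-time-of-day", NULL },
};

/* Returns the number of locks that were held while the call ran. */
static int model_sys_call(const struct fn_model *fn)
{
    int held = 0;

    /* Take the optional per-function lock first, then the entry lock. */
    if (fn->lock) {
        pthread_mutex_lock(fn->lock);
        held++;
    }
    pthread_mutex_lock(&entry_lock);
    held++;

    /* ... firmware entry would happen here ... */

    /* Release in the reverse order. */
    pthread_mutex_unlock(&entry_lock);
    if (fn->lock)
        pthread_mutex_unlock(fn->lock);
    return held;
}
```

Taking the per-function lock before the entry lock keeps the ordering consistent for every caller, and a waiter on the per-function lock sleeps rather than spins.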
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-6-e9eafd0c8c6c@linux.ibm.com
The rtas system call handler sys_rtas() delegates certain input
validation steps to a helper function: block_rtas_call(). One of these
steps ensures that the user-supplied token value maps to a known RTAS
function. This is done by performing a "reverse" token-to-function
lookup via rtas_token_to_function_untrusted() to obtain an
rtas_function object.
In changes to come, sys_rtas() itself will need the function
descriptor for the token. To prepare:
* Move the lookup and validation up into sys_rtas() and pass the
resulting rtas_function pointer to block_rtas_call(), which is
otherwise unconcerned with the token value.
* Change block_rtas_call() to report the RTAS function name instead of
the token value on validation failures, since it can now rely on
having a valid function descriptor.
One behavior change is that sys_rtas() now silently errors out when
passed a bad token, before calling block_rtas_call(). So we will no
longer log "RTAS call blocked - exploit attempt?" on invalid
tokens. This is consistent with how sys_rtas() currently handles other
"metadata" (nargs and nret), while block_rtas_call() is primarily
concerned with validating the arguments to be passed to specific RTAS
functions.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-5-e9eafd0c8c6c@linux.ibm.com
Enabling any of the powerpc:rtas_* tracepoints at boot is likely to
result in an oops on RTAS platforms. For example, booting a QEMU
pseries model with 'trace_event=powerpc:rtas_input' in the command
line leads to:
BUG: Kernel NULL pointer dereference on read at 0x00000008
Oops: Kernel access of bad area, sig: 7 [#1]
NIP [c00000000004231c] do_enter_rtas+0x1bc/0x460
LR [c00000000004231c] do_enter_rtas+0x1bc/0x460
Call Trace:
do_enter_rtas+0x1bc/0x460 (unreliable)
rtas_call+0x22c/0x4a0
rtas_get_boot_time+0x80/0x14c
read_persistent_clock64+0x124/0x150
read_persistent_wall_and_boot_offset+0x28/0x58
timekeeping_init+0x70/0x348
start_kernel+0xa0c/0xc1c
start_here_common+0x1c/0x20
(This is preceded by a warning for the failed lookup in
rtas_token_to_function().)
This happens when __do_enter_rtas_trace() attempts a token to function
descriptor lookup before the xarray containing the mappings has been
set up.
Fall back to linear scan of the table if rtas_token_to_function_xarray
is empty.
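The fallback can be illustrated with a small userspace model. It assumes a plain array in place of the kernel's xarray, and the names (struct tok_fn, tok_table, tok_index, tok_build_index(), tok_to_fn()) and token values are made up for the example.

```c
#include <stddef.h>

struct tok_fn {
    const char *name;
    int token;
};

/* Hypothetical function table with made-up token values. */
static struct tok_fn tok_table[] = {
    { "get-time-of-day", 3 },
    { "ibm,get-vpd", 17 },
    { "ibm,activate-firmware", 42 },
};

#define TOK_COUNT (sizeof(tok_table) / sizeof(tok_table[0]))
#define TOK_MAX   64

/* Fast index, standing in for the xarray; unusable until built. */
static struct tok_fn *tok_index[TOK_MAX];
static int tok_index_ready;

static void tok_build_index(void)
{
    size_t i;

    for (i = 0; i < TOK_COUNT; i++)
        tok_index[tok_table[i].token] = &tok_table[i];
    tok_index_ready = 1;
}

static struct tok_fn *tok_to_fn(int token)
{
    size_t i;

    if (token < 0 || token >= TOK_MAX)
        return NULL;
    if (tok_index_ready)
        return tok_index[token];

    /* Index not set up yet (early boot): fall back to a linear scan. */
    for (i = 0; i < TOK_COUNT; i++)
        if (tok_table[i].token == token)
            return &tok_table[i];
    return NULL;
}
```

A lookup made before tok_build_index() runs returns the same descriptor as one made afterwards, only more slowly.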
Fixes: 24098f580e2b ("powerpc/rtas: add tracepoints around RTAS entry")
Reviewed-by: "Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org>
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-3-e9eafd0c8c6c@linux.ibm.com
Add a convenience macro for iterating over every element of the
internal function table and convert the one site that can use it. An
additional user of the macro is anticipated in changes to follow.
Reviewed-by: "Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org>
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-2-e9eafd0c8c6c@linux.ibm.com
rtas_token_to_function() WARNs when passed an invalid token; it's
meant to catch bugs in kernel-based users of RTAS functions. However,
user space controls the token value passed to rtas_token_to_function()
by block_rtas_call(), so user space with sufficient privilege to use
sys_rtas() can trigger the warnings at will:
unexpected failed lookup for token 2048
WARNING: CPU: 20 PID: 2247 at arch/powerpc/kernel/rtas.c:556
rtas_token_to_function+0xfc/0x110
...
NIP rtas_token_to_function+0xfc/0x110
LR rtas_token_to_function+0xf8/0x110
Call Trace:
rtas_token_to_function+0xf8/0x110 (unreliable)
sys_rtas+0x188/0x880
system_call_exception+0x268/0x530
system_call_common+0x160/0x2c4
It's desirable to continue warning on bogus tokens in
rtas_token_to_function(). Currently it is used to look up RTAS
function descriptors when tracing, where we know there has to have
been a successful descriptor lookup by different means already, and it
would be a serious inconsistency for the reverse lookup to fail.
So instead of weakening rtas_token_to_function()'s contract by
removing the warnings, introduce rtas_token_to_function_untrusted(),
which has no opinion on failed lookups. Convert block_rtas_call() and
rtas_token_to_function() to use it.
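The split between a silent lookup and a warning wrapper can be sketched as follows. This is a userspace model with hypothetical names (struct lk_fn, lk_lookup_untrusted(), lk_lookup()) and made-up token values; the warning is reduced to a counter plus a message on stderr in place of the kernel's WARN.

```c
#include <stddef.h>
#include <stdio.h>

struct lk_fn {
    const char *name;
    int token;
};

/* Hypothetical table with made-up token values. */
static const struct lk_fn lk_table[] = {
    { "get-time-of-day", 3 },
    { "ibm,get-vpd", 17 },
};

static int lk_warn_count;

/* No opinion on failed lookups: suitable for user-controlled tokens. */
static const struct lk_fn *lk_lookup_untrusted(int token)
{
    size_t i;

    for (i = 0; i < sizeof(lk_table) / sizeof(lk_table[0]); i++)
        if (lk_table[i].token == token)
            return &lk_table[i];
    return NULL;
}

/* For in-kernel callers: a failed lookup indicates a bug, so warn. */
static const struct lk_fn *lk_lookup(int token)
{
    const struct lk_fn *fn = lk_lookup_untrusted(token);

    if (!fn) {
        lk_warn_count++;
        fprintf(stderr, "unexpected failed lookup for token %d\n", token);
    }
    return fn;
}
```

Only the wrapper warns, so a bad token arriving from user space through the untrusted path produces no log output.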
Fixes: 8252b88294d2 ("powerpc/rtas: improve function information lookups")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-1-e9eafd0c8c6c@linux.ibm.com
The vdso Makefile adds -U$(ARCH) to CPPFLAGS for the vdso64.lds linker
script. ARCH is always powerpc, so it becomes -Upowerpc, which means
undefine the "powerpc" symbol.
But the 64-bit compiler doesn't define powerpc in the first place,
compare:
$ gcc-5.1.0-nolibc/powerpc64-linux/bin/powerpc64-linux-gcc -m32 -E -dM - </dev/null | grep -w powerpc
#define powerpc 1
$ gcc-5.1.0-nolibc/powerpc64-linux/bin/powerpc64-linux-gcc -m64 -E -dM - </dev/null | grep -w powerpc
$
So there's no need to undefine it for the 64-bit linker script.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231206115548.1466874-2-mpe@ellerman.id.au
Commit 41a506ef71eb ("powerpc/ftrace: Create a dummy stackframe to fix
stack unwind") added use of a new stack frame on ftrace entry to fix
stack unwind. However, the commit missed updating the offset used while
tearing down the ftrace stack when ftrace is disabled. Fix the same.
In addition, the commit missed saving the correct stack pointer in
pt_regs. Update the same.
Fixes: 41a506ef71eb ("powerpc/ftrace: Create a dummy stackframe to fix stack unwind")
Cc: stable@vger.kernel.org # v6.5+
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231130065947.2188860-1-naveen@kernel.org
With CONFIG_GENERIC_BUG=n the build fails with:
arch/powerpc/kernel/traps.c:1442:5: error: no previous prototype for ‘is_valid_bugaddr’ [-Werror=missing-prototypes]
1442 | int is_valid_bugaddr(unsigned long addr)
| ^~~~~~~~~~~~~~~~
The prototype is only defined, and the function is only needed, when
CONFIG_GENERIC_BUG=y, so move the implementation under that.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231130114433.3053544-2-mpe@ellerman.id.au
With HIBERNATION=y the build breaks with:
arch/powerpc/kernel/swsusp_64.c:14:6: error: no previous prototype for ‘do_after_copyback’ [-Werror=missing-prototypes]
14 | void do_after_copyback(void)
| ^~~~~~~~~~~~~~~~~
do_after_copyback() is only called from asm, so there is no prototype,
nor any header where it makes sense to place one. Just add a prototype
in the C file to fix the build error.
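The shape of this fix can be shown with a minimal sketch. model_after_copyback() and its counter are hypothetical stand-ins for the real function; the point is only that a declaration placed before the definition of a non-static function satisfies -Wmissing-prototypes.

```c
/*
 * Prototype placed in the C file itself: in the scenario modelled here the
 * function is referenced only from assembly, so no header declares it.
 */
int model_after_copyback(void);

static int copyback_calls;

/* Without the declaration above, -Wmissing-prototypes would flag this. */
int model_after_copyback(void)
{
    return ++copyback_calls;
}
```

Marking the function static is not an option in this situation, because the assembly caller needs the symbol to have external linkage.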
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231129131919.2528517-1-mpe@ellerman.id.au
Before running a guest, the host process (e.g., QEMU) FP/VEC registers
are saved if they were being used, similarly to when the kernel uses FP
registers. The guest values are then loaded into regs, and the host
process registers will be restored lazily when it uses FP/VEC.
KVM HV has a bug here: the host process registers do get saved, but the
user MSR bits remain enabled, which indicates the registers are valid
for the process. After they are clobbered by running the guest, this
valid indication causes the host process to take on the FP/VEC register
values of the guest.
Fixes: 34e119c96b2b ("KVM: PPC: Book3S HV P9: Reduce mtmsrd instructions required to save host SPRs")
Cc: stable@vger.kernel.org # v5.17+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231122025811.2973-1-npiggin@gmail.com
During floating point and vector save to thread data f0/vs0 are
clobbered by the FPSCR/VSCR store routine. This has been observed to
lead to userspace register corruption and application data corruption
with io-uring.
Fix it by restoring f0/vs0 after FPSCR/VSCR store has completed for
all the FP, altivec, VMX register save paths.
Tested under QEMU in kvm mode, running on a Talos II workstation with
dual POWER9 DD2.2 CPUs.
Additional detail (mpe):
Typically save_fpu() is called from __giveup_fpu() which saves the FP
regs and also *turns off FP* in the task's MSR, meaning the kernel will
reload the FP regs from the thread struct before letting the task use FP
again. So in that case save_fpu() is free to clobber f0 because the FP
regs no longer hold live values for the task.
There is another case though, which is the path via:
sys_clone()
...
copy_process()
dup_task_struct()
arch_dup_task_struct()
flush_all_to_thread()
save_all()
That path saves the FP regs but leaves them live. That's meant as an
optimisation for a process that's using FP/VSX and then calls fork():
leaving the regs live means the parent process doesn't have to take a
fault after the fork to get its FP regs back. The optimisation was added
in commit 8792468da5e1 ("powerpc: Add the ability to save FPU without
giving it up").
That path does clobber f0, but f0 is volatile across function calls,
and typically programs reach copy_process() from userspace via a syscall
wrapper function. So in normal usage f0 being clobbered across a
syscall doesn't cause visible data corruption.
But there is now a new path, because io-uring can call copy_process()
via create_io_thread() from the signal handling path. That's OK if the
signal is handled as part of syscall return, but it's not OK if the
signal is handled due to some other interrupt.
That path is:
interrupt_return_srr_user()
interrupt_exit_user_prepare()
interrupt_exit_user_prepare_main()
do_notify_resume()
get_signal()
task_work_run()
create_worker_cb()
create_io_worker()
copy_process()
dup_task_struct()
arch_dup_task_struct()
flush_all_to_thread()
save_all()
if (tsk->thread.regs->msr & MSR_FP)
save_fpu()
# f0 is clobbered and potentially live in userspace
Note the above discussion applies equally to save_altivec().
Fixes: 8792468da5e1 ("powerpc: Add the ability to save FPU without giving it up")
Cc: stable@vger.kernel.org # v4.6+
Closes: https://lore.kernel.org/all/480932026.45576726.1699374859845.JavaMail.zimbra@raptorengineeringinc.com/
Closes: https://lore.kernel.org/linuxppc-dev/480221078.47953493.1700206777956.JavaMail.zimbra@raptorengineeringinc.com/
Tested-by: Timothy Pearson <tpearson@raptorengineering.com>
Tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
[mpe: Reword change log to describe exact path of corruption & other minor tweaks]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/1921539696.48534988.1700407082933.JavaMail.zimbra@raptorengineeringinc.com
The rtas_read_config() and rtas_write_config() functions in
kernel/rtas_pci.c have external linkage and two users in arch/powerpc:
the rtas_pci code itself and the pseries platform's "enhanced error
handling" (EEH) support code.
The prototypes for these functions in asm/ppc-pci.h have until now
been guarded by CONFIG_EEH since the only external caller is the
pseries EEH code. However, this presumably has always generated
warnings when built with !CONFIG_EEH and -Wmissing-prototypes:
arch/powerpc/kernel/rtas_pci.c:46:5: error: no previous prototype for
function 'rtas_read_config' [-Werror,-Wmissing-prototypes]
46 | int rtas_read_config(struct pci_dn *pdn, int where,
int size, u32 *val)
arch/powerpc/kernel/rtas_pci.c:98:5: error: no previous prototype for
function 'rtas_write_config' [-Werror,-Wmissing-prototypes]
98 | int rtas_write_config(struct pci_dn *pdn, int where,
int size, u32 val)
The introduction of commit c6345dfa6e3e ("Makefile.extrawarn: turn on
missing-prototypes globally") forces the issue.
The efika and chrp platform code have (static) functions with the same
names but different signatures. We may as well eliminate the potential
for conflicts and confusion by renaming the globally visible versions
as their prototypes get moved out of the CONFIG_EEH-guarded region;
their current names are too generic anyway. Since they operate on
objects of the type 'struct pci_dn *', give them the slightly more
verbose prefix "rtas_pci_dn_" and fix up all the call sites.
Fixes: c6345dfa6e3e ("Makefile.extrawarn: turn on missing-prototypes globally")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Closes: https://lore.kernel.org/linuxppc-dev/CA+G9fYt0LLXtjSz+Hkf3Fhm-kf0ZQanrhUS+zVZGa3O+Wt2+vg@mail.gmail.com/
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231127-rtas-pci-rw-config-v1-1-385d29ace3df@linux.ibm.com
Commit fb5a515704d7 ("powerpc: Remove platforms/wsp and associated
pieces") removed the A2 CPU support, but missed removal of reg_a2.h.
None of the defines contained in it are used, with the exception of the
SPRN_TEN* values, but they are also defined in reg_booke.h.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231113043947.1931831-1-mpe@ellerman.id.au
The cpu_spec is a struct holding various information about the CPU the
kernel is executing on. It's populated early in boot and must not change
after that.
In particular the cpu_features and mmu_features hold the set of
discovered CPU/MMU features and are used to set static keys for each
feature, and do binary patching of assembly. So any change to the
cpu_features/mmu_features later in boot will not be reflected in
the state of the static keys or patched code.
There is already logic to check that cpu_features/mmu_features don't
change, see check_features() in feature-fixups.c.
But as another layer of protection the entire cpu_spec should be
read-only after init, so annotate it as such.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231025012452.1985680-1-mpe@ellerman.id.au
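A user-space sketch of the idea, under stated assumptions: in the kernel, `__ro_after_init` places an object in a section that is remapped read-only once init completes, so a late write faults. Here the macro is defined empty, and a software flag stands in for the page protection. The struct and all `sketch_` names are hypothetical and are not the kernel's definitions.

```c
#include <stdbool.h>

/* Defined empty here; the kernel's version selects a special section. */
#ifndef __ro_after_init
#define __ro_after_init
#endif

/* Hypothetical, heavily reduced stand-in for struct cpu_spec. */
struct sketch_cpu_spec {
	unsigned long cpu_features;
	unsigned long mmu_features;
};

static struct sketch_cpu_spec sketch_spec __ro_after_init;
static bool sketch_init_done;

/* Populate the spec; only permitted while "boot" is still running. */
int sketch_set_features(unsigned long cpu, unsigned long mmu)
{
	if (sketch_init_done)
		return -1;	/* the kernel would fault on the write instead */
	sketch_spec.cpu_features = cpu;
	sketch_spec.mmu_features = mmu;
	return 0;
}

/* Emulates the point at which init memory protections are applied. */
void sketch_mark_init_done(void)
{
	sketch_init_done = true;
}

unsigned long sketch_cpu_features(void)
{
	return sketch_spec.cpu_features;
}
```

The flag only models the intent: values set during early boot stay fixed afterwards, so static keys and patched code derived from them cannot go stale.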
rtas_service_present() has no more users.
rtas_function_implemented() is now the appropriate API for determining
whether a given RTAS function is available to call.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231106-rtas-trivial-v1-4-61847655c51f@linux.ibm.com
- Finish a refactor of pgprot_framebuffer() which depended on some changes
that were merged via the drm tree.
- Fix some kernel-doc warnings to quieten the bots.
Thanks to: Nathan Lynch, Thomas Zimmermann.
Merge tag 'powerpc-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Finish a refactor of pgprot_framebuffer() which depended
on some changes that were merged via the drm tree
- Fix some kernel-doc warnings to quieten the bots
Thanks to Nathan Lynch and Thomas Zimmermann.
* tag 'powerpc-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/rtas: Fix ppc_rtas_rmo_buf_show() kernel-doc
powerpc/pseries/rtas-work-area: Fix rtas_work_area_reserve_arena() kernel-doc
powerpc/fb: Call internal __phys_mem_access_prot() in fbdev code
powerpc: Remove file parameter from phys_mem_access_prot()
powerpc/machdep: Remove trailing whitespaces
Including:
- Core changes:
- Make default-domains mandatory for all IOMMU drivers
- Remove group refcounting
- Add generic_single_device_group() helper and consolidate
drivers
- Clean up map/unmap ops
- Scaling improvements for the IOVA rcache depot
- Convert dart & iommufd to the new domain_alloc_paging()
- ARM-SMMU:
- Device-tree binding update:
- Add qcom,sm7150-smmu-v2 for Adreno on SM7150 SoC
- SMMUv2:
- Support for Qualcomm SDM670 (MDSS) and SM7150 SoCs
- SMMUv3:
- Large refactoring of the context descriptor code to
move the CD table into the master, paving the way
for '->set_dev_pasid()' support on non-SVA domains
- Minor cleanups to the SVA code
- Intel VT-d:
- Enable debugfs to dump domain attached to a pasid
- Remove an unnecessary inline function.
- AMD IOMMU:
- Initial patches for SVA support (not complete yet)
- S390 IOMMU:
- DMA-API conversion and optimized IOTLB flushing
- Some smaller fixes and improvements
Merge tag 'iommu-updates-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
"Core changes:
- Make default-domains mandatory for all IOMMU drivers
- Remove group refcounting
- Add generic_single_device_group() helper and consolidate drivers
- Clean up map/unmap ops
- Scaling improvements for the IOVA rcache depot
- Convert dart & iommufd to the new domain_alloc_paging()
ARM-SMMU:
- Device-tree binding update:
- Add qcom,sm7150-smmu-v2 for Adreno on SM7150 SoC
- SMMUv2:
- Support for Qualcomm SDM670 (MDSS) and SM7150 SoCs
- SMMUv3:
- Large refactoring of the context descriptor code to move the CD
table into the master, paving the way for '->set_dev_pasid()'
support on non-SVA domains
- Minor cleanups to the SVA code
Intel VT-d:
- Enable debugfs to dump domain attached to a pasid
- Remove an unnecessary inline function
AMD IOMMU:
- Initial patches for SVA support (not complete yet)
S390 IOMMU:
- DMA-API conversion and optimized IOTLB flushing
And some smaller fixes and improvements"
* tag 'iommu-updates-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (102 commits)
iommu/dart: Remove the force_bypass variable
iommu/dart: Call apple_dart_finalize_domain() as part of alloc_paging()
iommu/dart: Convert to domain_alloc_paging()
iommu/dart: Move the blocked domain support to a global static
iommu/dart: Use static global identity domains
iommufd: Convert to alloc_domain_paging()
iommu/vt-d: Use ops->blocked_domain
iommu/vt-d: Update the definition of the blocking domain
iommu: Move IOMMU_DOMAIN_BLOCKED global statics to ops->blocked_domain
Revert "iommu/vt-d: Remove unused function"
iommu/amd: Remove DMA_FQ type from domain allocation path
iommu: change iommu_map_sgtable to return signed values
iommu/virtio: Add __counted_by for struct viommu_request and use struct_size()
iommu/vt-d: debugfs: Support dumping a specified page table
iommu/vt-d: debugfs: Create/remove debugfs file per {device, pasid}
iommu/vt-d: debugfs: Dump entry pointing to huge page
iommu/vt-d: Remove unused function
iommu/arm-smmu-v3-sva: Remove bond refcount
iommu/arm-smmu-v3-sva: Remove unused iommu_sva handle
iommu/arm-smmu-v3: Rename cdcfg to cd_table
...
From a W=1 build:
>> arch/powerpc/kernel/rtas-proc.c:771: warning: Function parameter or member 'm' not described in
>> 'ppc_rtas_rmo_buf_show'
>> arch/powerpc/kernel/rtas-proc.c:771: warning: Function parameter or member 'v' not described in
>> 'ppc_rtas_rmo_buf_show'
Add the missing parameter descriptions.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202309211645.1Lvwmbv4-lkp@intel.com/
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231106-rtas-trivial-v1-2-61847655c51f@linux.ibm.com
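For readers unfamiliar with the warning, a kernel-doc comment that describes every parameter looks roughly like the sketch below. The function, the `struct seq_sketch` type, and the printed values are hypothetical stand-ins chosen so the fragment compiles in user space. It is not the text added to rtas-proc.c.

```c
#include <stdio.h>

/* Hypothetical stand-in for the kernel's struct seq_file. */
struct seq_sketch {
	char buf[64];
};

/**
 * rmo_buf_show_sketch() - Describe a buffer region for user space.
 * @m: Output target that receives the formatted text.
 * @v: Unused.
 *
 * Every parameter has an "@name:" line, which is what kernel-doc
 * checks when it emits "not described" warnings in a W=1 build.
 *
 * Return: 0 on success.
 */
int rmo_buf_show_sketch(struct seq_sketch *m, void *v)
{
	(void)v;
	/* Placeholder address and size, formatted as hex. */
	snprintf(m->buf, sizeof(m->buf), "%016lx %x\n", 0x1000UL, 0x10000u);
	return 0;
}
```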
Remove 'file' parameter from struct machdep_calls.phys_mem_access_prot
and its implementation in pci_phys_mem_access_prot(). The file is not
used on PowerPC. By removing it, a later patch can simplify fbdev's
mmap code, which uses phys_mem_access_prot() on PowerPC.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
[mpe: Rebase on unrelated changes to phys_mem_access_prot()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230922080636.26762-5-tzimmermann@suse.de
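The shape of the change can be sketched as below. This is a simplified illustration under assumptions: `pgprot_t` is a local stand-in type, the `_sketch` names are hypothetical, and the "no cache" bit value is invented for the example. The point is only that the callback and its wrapper no longer take a leading `struct file *` argument.

```c
/* Stand-in for the kernel's pgprot_t. */
typedef struct { unsigned long pgprot; } pgprot_t;

struct machdep_calls_sketch {
	/*
	 * Before the change the callback took a leading
	 * 'struct file *file' argument that was never used.
	 */
	pgprot_t (*phys_mem_access_prot)(unsigned long pfn,
					 unsigned long size,
					 pgprot_t vma_prot);
};

/* Hypothetical platform hook: mark the mapping non-cached. */
pgprot_t noncached_prot_sketch(unsigned long pfn, unsigned long size,
			       pgprot_t vma_prot)
{
	(void)pfn; (void)size;
	vma_prot.pgprot |= 0x1UL;	/* invented 'no cache' bit */
	return vma_prot;
}

/* Wrapper: use the platform hook if present, else keep vma_prot. */
pgprot_t phys_mem_access_prot_sketch(const struct machdep_calls_sketch *md,
				     unsigned long pfn, unsigned long size,
				     pgprot_t vma_prot)
{
	if (md->phys_mem_access_prot)
		return md->phys_mem_access_prot(pfn, size, vma_prot);
	return vma_prot;
}
```

With no file argument, a caller such as fbdev mmap code needs only the page frame number, the size, and the current protection bits.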
Here is the big set of tty/serial driver changes for 6.7-rc1. Included
in here are:
- console/vgacon cleanups and removals from Arnd
- tty core and n_tty cleanups from Jiri
- lots of 8250 driver updates and cleanups
- sc16is7xx serial driver updates
- dt binding updates
- first set of port lock wrappers from Thomas for the printk fixes
coming in future releases
- other small serial and tty core cleanups and updates
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'tty-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty and serial updates from Greg KH:
"Here is the big set of tty/serial driver changes for 6.7-rc1. Included
in here are:
- console/vgacon cleanups and removals from Arnd
- tty core and n_tty cleanups from Jiri
- lots of 8250 driver updates and cleanups
- sc16is7xx serial driver updates
- dt binding updates
- first set of port lock wrappers from Thomas for the printk fixes
coming in future releases
- other small serial and tty core cleanups and updates
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (193 commits)
serdev: Replace custom code with device_match_acpi_handle()
serdev: Simplify devm_serdev_device_open() function
serdev: Make use of device_set_node()
tty: n_gsm: add copyright Siemens Mobility GmbH
tty: n_gsm: fix race condition in status line change on dead connections
serial: core: Fix runtime PM handling for pending tx
vgacon: fix mips/sibyte build regression
dt-bindings: serial: drop unsupported samsung bindings
tty: serial: samsung: drop earlycon support for unsupported platforms
tty: 8250: Add note for PX-835
tty: 8250: Fix IS-200 PCI ID comment
tty: 8250: Add Brainboxes Oxford Semiconductor-based quirks
tty: 8250: Add support for Intashield IX cards
tty: 8250: Add support for additional Brainboxes PX cards
tty: 8250: Fix up PX-803/PX-857
tty: 8250: Fix port count of PX-257
tty: 8250: Add support for Intashield IS-100
tty: 8250: Add support for Brainboxes UP cards
tty: 8250: Add support for additional Brainboxes UC cards
tty: 8250: Remove UC-257 and UC-431
...