IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZlRqlgAKCRCRxhvAZXjc
os5tAQC6o3f2X39FooKv4bbbQkBXx5x8GqjUZyfnYjbm+Mak7wD/cf8tm4LLvVLt
1g7FbakWkEyQKhPRBMhtngX1GdKiuQI=
=Isax
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.10-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix io_uring based write-through after converting cifs to use the
netfs library
- Fix aio error handling when doing write-through via netfs library
- Fix performance regression in iomap when used with non-large folio
mappings
- Fix signalfd error code
- Remove obsolete comment in signalfd code
- Fix async request indication in netfs_perform_write() by raising
BDP_ASYNC when IOCB_NOWAIT is set
- Yield swap device immediately to prevent spurious EBUSY errors
- Don't cross a .backup mountpoint from backup volumes in afs to avoid
infinite loops
- Fix a race between umount and async request completion in 9p after 9p
was converted to use the netfs library
* tag 'vfs-6.10-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
netfs, 9p: Fix race between umount and async request completion
afs: Don't cross .backup mountpoint from backup volume
swap: yield device immediately
netfs: Fix setting of BDP_ASYNC from iocb flags
signalfd: drop an obsolete comment
signalfd: fix error return code
iomap: fault in smaller chunks for non-large folio mappings
filemap: add helper mapping_max_folio_size()
netfs: Fix AIO error handling when doing write-through
netfs: Fix io_uring based write-through
btf_find_struct_member() might return NULL or an error via the
ERR_PTR() macro. However, its caller in parse_btf_field() only checks
for the NULL condition. Fix this by using IS_ERR() and returning the
error up the stack.
Link: https://lore.kernel.org/all/20240527094351.15687-1-clopez@suse.de/
Fixes: c440adfbe3025 ("tracing/probes: Support BTF based data structure field access")
Signed-off-by: Carlos López <clopez@suse.de>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
- Fix x86 IRQ vector leak caused by a CPU offlining race
- Fix build failure in the riscv-imsic irqchip driver
caused by an API-change semantic conflict
- Fix use-after-free in irq_find_at_or_after()
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmZRwMURHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1h/zQ//TTrgyXi6+1xXY4R0LDU45j+wavMTMkq3
kM3eUeyXgy+FDtvLRVaYgEAYbtuR4LGFN9qmVuEHJPZQwpi3AFlnGFUFjFUvyE43
xJuOtHoxFv3mj09VgRGsjZvzp8bxYSkEn3h0ryTWGUHzR+QmoQmYWrU6HExgXw3R
+s8pvi14g6R/+PAy05cF0k1J7aeSsYaOfd38D/XnpyhuhXvPMS2eHgovV6I5Qhk4
5lV6rzJv8XlKxVr7bOYJkRePE3z0HMtx0G7eo8eYERBQapHede18V8imv4OpUiua
vmG8cFhF4Lq9KFdEtiVuf1X9/XH3PoEKTGA81oqQ9lLN9USx7ME/Peg6U5ezvEkp
YmQx2LS12DWqYp5PZQTN0CHnfmMLgksmyGELM3JE/dFFCVh4HdpMrh+2wLwWGRJ3
JLzAJh3YwcPhayLpNVgsSF9AtLKTkDoS0bHd43mHnB6VaEKkus8zbeuCxYAsUeMJ
5wCZw3xQjTZEaMMNd1hJN5O/9TX2of+T6Z4C4cacMBmwpD7vX5oXmDYLE/wUHw6m
9Z67fvOvTdIf3MkYSqjGXFKD1JobL/PmwCfaaGUQFVJkbX5WVNDk6C1zgs5FhmuY
U/AcYfadbNdLVXrN3VLnX6Gmb7gFPShOAE1GgXGeszSReI4pbOUy2zopRGAEWSZS
fRu8nyveGjw=
=vxJh
-----END PGP SIGNATURE-----
Merge tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
- Fix x86 IRQ vector leak caused by a CPU offlining race
- Fix build failure in the riscv-imsic irqchip driver
caused by an API-change semantic conflict
- Fix use-after-free in irq_find_at_or_after()
* tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/irqdesc: Prevent use-after-free in irq_find_at_or_after()
genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline
irqchip/riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict
get_pid_task() internally already calls rcu_read_lock() and
rcu_read_unlock(), so there is no point to do this one extra time.
This is a drive-by improvement and has no correctness implications.
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240521163401.3005045-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Current implementation of PID filtering logic for multi-uprobes in
uprobe_prog_run() is filtering down to exact *thread*, while the intent
for PID filtering it to filter by *process* instead. The check in
uprobe_prog_run() also differs from the analogous one in
uprobe_multi_link_filter() for some reason. The latter is correct,
checking task->mm, not the task itself.
Fix the check in uprobe_prog_run() to perform the same task->mm check.
While doing this, we also update get_pid_task() use to use PIDTYPE_TGID
type of lookup, given the intent is to get a representative task of an
entire process. This doesn't change behavior, but seems more logical. It
would hold task group leader task now, not any random thread task.
Last but not least, given multi-uprobe support is half-broken due to
this PID filtering logic (depending on whether PID filtering is
important or not), we need to make it easy for user space consumers
(including libbpf) to easily detect whether PID filtering logic was
already fixed.
We do it here by adding an early check on passed pid parameter. If it's
negative (and so has no chance of being a valid PID), we return -EINVAL.
Previous behavior would eventually return -ESRCH ("No process found"),
given there can't be any process with negative PID. This subtle change
won't make any practical change in behavior, but will allow applications
to detect PID filtering fixes easily. Libbpf fixes take advantage of
this in the next patch.
Cc: stable@vger.kernel.org
Acked-by: Jiri Olsa <jolsa@kernel.org>
Fixes: b733eeade420 ("bpf: Add pid filter support for uprobe_multi link")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240521163401.3005045-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Otherwise we can cause spurious EBUSY issues when trying to mount the
rootfs later on.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218845
Reported-by: Petri Kaukasoina <petri.kaukasoina@tuni.fi>
Signed-off-by: Christian Brauner <brauner@kernel.org>
irq_find_at_or_after() dereferences the interrupt descriptor which is
returned by mt_find() while neither holding sparse_irq_lock nor RCU read
lock, which means the descriptor can be freed between mt_find() and the
dereference:
CPU0 CPU1
desc = mt_find()
delayed_free_desc(desc)
irq_desc_get_irq(desc)
The use-after-free is reported by KASAN:
Call trace:
irq_get_next_irq+0x58/0x84
show_stat+0x638/0x824
seq_read_iter+0x158/0x4ec
proc_reg_read_iter+0x94/0x12c
vfs_read+0x1e0/0x2c8
Freed by task 4471:
slab_free_freelist_hook+0x174/0x1e0
__kmem_cache_free+0xa4/0x1dc
kfree+0x64/0x128
irq_kobj_release+0x28/0x3c
kobject_put+0xcc/0x1e0
delayed_free_desc+0x14/0x2c
rcu_do_batch+0x214/0x720
Guard the access with a RCU read lock section.
Fixes: 721255b9826b ("genirq: Use a maple tree for interrupt descriptor management")
Signed-off-by: dicken.ding <dicken.ding@mediatek.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240524091739.31611-1-dicken.ding@mediatek.com
Patch series "Introduce mseal", v10.
This patchset proposes a new mseal() syscall for the Linux kernel.
In a nutshell, mseal() protects the VMAs of a given virtual memory range
against modifications, such as changes to their permission bits.
Modern CPUs support memory permissions, such as the read/write (RW) and
no-execute (NX) bits. Linux has supported NX since the release of kernel
version 2.6.8 in August 2004 [1]. The memory permission feature improves
the security stance on memory corruption bugs, as an attacker cannot
simply write to arbitrary memory and point the code to it. The memory
must be marked with the X bit, or else an exception will occur.
Internally, the kernel maintains the memory permissions in a data
structure called VMA (vm_area_struct). mseal() additionally protects the
VMA itself against modifications of the selected seal type.
Memory sealing is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system. For example,
such an attacker primitive can break control-flow integrity guarantees
since read-only memory that is supposed to be trusted can become writable
or .text pages can get remapped. Memory sealing can automatically be
applied by the runtime loader to seal .text and .rodata pages and
applications can additionally seal security critical data at runtime. A
similar feature already exists in the XNU kernel with the
VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the mimmutable syscall
[4]. Also, Chrome wants to adopt this feature for their CFI work [2] and
this patchset has been designed to be compatible with the Chrome use case.
Two system calls are involved in sealing the map: mmap() and mseal().
The new mseal() is an syscall on 64 bit CPU, and with following signature:
int mseal(void addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.
mseal() blocks following operations for the given memory range.
1> Unmapping, moving to another location, and shrinking the size,
via munmap() and mremap(), can leave an empty space, therefore can
be replaced with a VMA with a new set of attributes.
2> Moving or expanding a different VMA into the current location,
via mremap().
3> Modifying a VMA via mmap(MAP_FIXED).
4> Size expansion, via mremap(), does not appear to pose any specific
risks to sealed VMAs. It is included anyway because the use case is
unclear. In any case, users can rely on merging to expand a sealed VMA.
5> mprotect() and pkey_mprotect().
6> Some destructive madvice() behaviors (e.g. MADV_DONTNEED) for anonymous
memory, when users don't have write permission to the memory. Those
behaviors can alter region contents by discarding pages, effectively a
memset(0) for anonymous memory.
The idea that inspired this patch comes from Stephen Röttger’s work in
V8 CFI [5]. Chrome browser in ChromeOS will be the first user of this
API.
Indeed, the Chrome browser has very specific requirements for sealing,
which are distinct from those of most applications. For example, in the
case of libc, sealing is only applied to read-only (RO) or read-execute
(RX) memory segments (such as .text and .RELRO) to prevent them from
becoming writable, the lifetime of those mappings are tied to the lifetime
of the process.
Chrome wants to seal two large address space reservations that are managed
by different allocators. The memory is mapped RW- and RWX respectively
but write access to it is restricted using pkeys (or in the future ARM
permission overlay extensions). The lifetime of those mappings are not
tied to the lifetime of the process, therefore, while the memory is
sealed, the allocators still need to free or discard the unused memory.
For example, with madvise(DONTNEED).
However, always allowing madvise(DONTNEED) on this range poses a security
risk. For example if a jump instruction crosses a page boundary and the
second page gets discarded, it will overwrite the target bytes with zeros
and change the control flow. Checking write-permission before the discard
operation allows us to control when the operation is valid. In this case,
the madvise will only succeed if the executing thread has PKEY write
permissions and PKRU changes are protected in software by control-flow
integrity.
Although the initial version of this patch series is targeting the Chrome
browser as its first user, it became evident during upstream discussions
that we would also want to ensure that the patch set eventually is a
complete solution for memory sealing and compatible with other use cases.
The specific scenario currently in mind is glibc's use case of loading and
sealing ELF executables. To this end, Stephen is working on a change to
glibc to add sealing support to the dynamic linker, which will seal all
non-writable segments at startup. Once this work is completed, all
applications will be able to automatically benefit from these new
protections.
In closing, I would like to formally acknowledge the valuable
contributions received during the RFC process, which were instrumental in
shaping this patch:
Jann Horn: raising awareness and providing valuable insights on the
destructive madvise operations.
Liam R. Howlett: perf optimization.
Linus Torvalds: assisting in defining system call signature and scope.
Theo de Raadt: sharing the experiences and insight gained from
implementing mimmutable() in OpenBSD.
MM perf benchmarks
==================
This patch adds a loop in the mprotect/munmap/madvise(DONTNEED) to
check the VMAs’ sealing flag, so that no partial update can be made,
when any segment within the given memory range is sealed.
To measure the performance impact of this loop, two tests are developed.
[8]
The first is measuring the time taken for a particular system call,
by using clock_gettime(CLOCK_MONOTONIC). The second is using
PERF_COUNT_HW_REF_CPU_CYCLES (exclude user space). Both tests have
similar results.
The tests have roughly below sequence:
for (i = 0; i < 1000, i++)
create 1000 mappings (1 page per VMA)
start the sampling
for (j = 0; j < 1000, j++)
mprotect one mapping
stop and save the sample
delete 1000 mappings
calculates all samples.
Below tests are performed on Intel(R) Pentium(R) Gold 7505 @ 2.00GHz,
4G memory, Chromebook.
Based on the latest upstream code:
The first test (measuring time)
syscall__ vmas t t_mseal delta_ns per_vma %
munmap__ 1 909 944 35 35 104%
munmap__ 2 1398 1502 104 52 107%
munmap__ 4 2444 2594 149 37 106%
munmap__ 8 4029 4323 293 37 107%
munmap__ 16 6647 6935 288 18 104%
munmap__ 32 11811 12398 587 18 105%
mprotect 1 439 465 26 26 106%
mprotect 2 1659 1745 86 43 105%
mprotect 4 3747 3889 142 36 104%
mprotect 8 6755 6969 215 27 103%
mprotect 16 13748 14144 396 25 103%
mprotect 32 27827 28969 1142 36 104%
madvise_ 1 240 262 22 22 109%
madvise_ 2 366 442 76 38 121%
madvise_ 4 623 751 128 32 121%
madvise_ 8 1110 1324 215 27 119%
madvise_ 16 2127 2451 324 20 115%
madvise_ 32 4109 4642 534 17 113%
The second test (measuring cpu cycle)
syscall__ vmas cpu cmseal delta_cpu per_vma %
munmap__ 1 1790 1890 100 100 106%
munmap__ 2 2819 3033 214 107 108%
munmap__ 4 4959 5271 312 78 106%
munmap__ 8 8262 8745 483 60 106%
munmap__ 16 13099 14116 1017 64 108%
munmap__ 32 23221 24785 1565 49 107%
mprotect 1 906 967 62 62 107%
mprotect 2 3019 3203 184 92 106%
mprotect 4 6149 6569 420 105 107%
mprotect 8 9978 10524 545 68 105%
mprotect 16 20448 21427 979 61 105%
mprotect 32 40972 42935 1963 61 105%
madvise_ 1 434 497 63 63 115%
madvise_ 2 752 899 147 74 120%
madvise_ 4 1313 1513 200 50 115%
madvise_ 8 2271 2627 356 44 116%
madvise_ 16 4312 4883 571 36 113%
madvise_ 32 8376 9319 943 29 111%
Based on the result, for 6.8 kernel, sealing check adds
20-40 nano seconds, or around 50-100 CPU cycles, per VMA.
In addition, I applied the sealing to 5.10 kernel:
The first test (measuring time)
syscall__ vmas t tmseal delta_ns per_vma %
munmap__ 1 357 390 33 33 109%
munmap__ 2 442 463 21 11 105%
munmap__ 4 614 634 20 5 103%
munmap__ 8 1017 1137 120 15 112%
munmap__ 16 1889 2153 263 16 114%
munmap__ 32 4109 4088 -21 -1 99%
mprotect 1 235 227 -7 -7 97%
mprotect 2 495 464 -30 -15 94%
mprotect 4 741 764 24 6 103%
mprotect 8 1434 1437 2 0 100%
mprotect 16 2958 2991 33 2 101%
mprotect 32 6431 6608 177 6 103%
madvise_ 1 191 208 16 16 109%
madvise_ 2 300 324 24 12 108%
madvise_ 4 450 473 23 6 105%
madvise_ 8 753 806 53 7 107%
madvise_ 16 1467 1592 125 8 108%
madvise_ 32 2795 3405 610 19 122%
The second test (measuring cpu cycle)
syscall__ nbr_vma cpu cmseal delta_cpu per_vma %
munmap__ 1 684 715 31 31 105%
munmap__ 2 861 898 38 19 104%
munmap__ 4 1183 1235 51 13 104%
munmap__ 8 1999 2045 46 6 102%
munmap__ 16 3839 3816 -23 -1 99%
munmap__ 32 7672 7887 216 7 103%
mprotect 1 397 443 46 46 112%
mprotect 2 738 788 50 25 107%
mprotect 4 1221 1256 35 9 103%
mprotect 8 2356 2429 72 9 103%
mprotect 16 4961 4935 -26 -2 99%
mprotect 32 9882 10172 291 9 103%
madvise_ 1 351 380 29 29 108%
madvise_ 2 565 615 49 25 109%
madvise_ 4 872 933 61 15 107%
madvise_ 8 1508 1640 132 16 109%
madvise_ 16 3078 3323 245 15 108%
madvise_ 32 5893 6704 811 25 114%
For 5.10 kernel, sealing check adds 0-15 ns in time, or 10-30
CPU cycles, there is even decrease in some cases.
It might be interesting to compare 5.10 and 6.8 kernel
The first test (measuring time)
syscall__ vmas t_5_10 t_6_8 delta_ns per_vma %
munmap__ 1 357 909 552 552 254%
munmap__ 2 442 1398 956 478 316%
munmap__ 4 614 2444 1830 458 398%
munmap__ 8 1017 4029 3012 377 396%
munmap__ 16 1889 6647 4758 297 352%
munmap__ 32 4109 11811 7702 241 287%
mprotect 1 235 439 204 204 187%
mprotect 2 495 1659 1164 582 335%
mprotect 4 741 3747 3006 752 506%
mprotect 8 1434 6755 5320 665 471%
mprotect 16 2958 13748 10790 674 465%
mprotect 32 6431 27827 21397 669 433%
madvise_ 1 191 240 49 49 125%
madvise_ 2 300 366 67 33 122%
madvise_ 4 450 623 173 43 138%
madvise_ 8 753 1110 357 45 147%
madvise_ 16 1467 2127 660 41 145%
madvise_ 32 2795 4109 1314 41 147%
The second test (measuring cpu cycle)
syscall__ vmas cpu_5_10 c_6_8 delta_cpu per_vma %
munmap__ 1 684 1790 1106 1106 262%
munmap__ 2 861 2819 1958 979 327%
munmap__ 4 1183 4959 3776 944 419%
munmap__ 8 1999 8262 6263 783 413%
munmap__ 16 3839 13099 9260 579 341%
munmap__ 32 7672 23221 15549 486 303%
mprotect 1 397 906 509 509 228%
mprotect 2 738 3019 2281 1140 409%
mprotect 4 1221 6149 4929 1232 504%
mprotect 8 2356 9978 7622 953 423%
mprotect 16 4961 20448 15487 968 412%
mprotect 32 9882 40972 31091 972 415%
madvise_ 1 351 434 82 82 123%
madvise_ 2 565 752 186 93 133%
madvise_ 4 872 1313 442 110 151%
madvise_ 8 1508 2271 763 95 151%
madvise_ 16 3078 4312 1234 77 140%
madvise_ 32 5893 8376 2483 78 142%
From 5.10 to 6.8
munmap: added 250-550 ns in time, or 500-1100 in cpu cycle, per vma.
mprotect: added 200-750 ns in time, or 500-1200 in cpu cycle, per vma.
madvise: added 33-50 ns in time, or 70-110 in cpu cycle, per vma.
In comparison to mseal, which adds 20-40 ns or 50-100 CPU cycles, the
increase from 5.10 to 6.8 is significantly larger, approximately ten times
greater for munmap and mprotect.
When I discuss the mm performance with Brian Makin, an engineer who worked
on performance, it was brought to my attention that such performance
benchmarks, which measuring millions of mm syscall in a tight loop, may
not accurately reflect real-world scenarios, such as that of a database
service. Also this is tested using a single HW and ChromeOS, the data
from another HW or distribution might be different. It might be best to
take this data with a grain of salt.
This patch (of 5):
Wire up mseal syscall for all architectures.
Link: https://lkml.kernel.org/r/20240415163527.626541-1-jeffxu@chromium.org
Link: https://lkml.kernel.org/r/20240415163527.626541-2-jeffxu@chromium.org
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Jann Horn <jannh@google.com> [Bug #2]
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Amer Al Shanawany <amer.shanawany@gmail.com>
Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Recent changes made uprobe_cpu_buffer preparation lazy, and moved it
deeper into __uprobe_trace_func(). This is problematic because
__uprobe_trace_func() is called inside rcu_read_lock()/rcu_read_unlock()
block, which then calls prepare_uprobe_buffer() -> uprobe_buffer_get() ->
mutex_lock(&ucb->mutex), leading to a splat about using mutex under
non-sleepable RCU:
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 98231, name: stress-ng-sigq
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
...
Call Trace:
<TASK>
dump_stack_lvl+0x3d/0xe0
__might_resched+0x24c/0x270
? prepare_uprobe_buffer+0xd5/0x1d0
__mutex_lock+0x41/0x820
? ___perf_sw_event+0x206/0x290
? __perf_event_task_sched_in+0x54/0x660
? __perf_event_task_sched_in+0x54/0x660
prepare_uprobe_buffer+0xd5/0x1d0
__uprobe_trace_func+0x4a/0x140
uprobe_dispatcher+0x135/0x280
? uprobe_dispatcher+0x94/0x280
uprobe_notify_resume+0x650/0xec0
? atomic_notifier_call_chain+0x21/0x110
? atomic_notifier_call_chain+0xf8/0x110
irqentry_exit_to_user_mode+0xe2/0x1e0
asm_exc_int3+0x35/0x40
RIP: 0033:0x7f7e1d4da390
Code: 33 04 00 0f 1f 80 00 00 00 00 f3 0f 1e fa b9 01 00 00 00 e9 b2 fc ff ff 66 90 f3 0f 1e fa 31 c9 e9 a5 fc ff ff 0f 1f 44 00 00 <cc> 0f 1e fa b8 27 00 00 00 0f 05 c3 0f 1f 40 00 f3 0f 1e fa b8 6e
RSP: 002b:00007ffd2abc3608 EFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000076d325f1 RCX: 0000000000000000
RDX: 0000000076d325f1 RSI: 000000000000000a RDI: 00007ffd2abc3690
RBP: 000000000000000a R08: 00017fb700000000 R09: 00017fb700000000
R10: 00017fb700000000 R11: 0000000000000246 R12: 0000000000017ff2
R13: 00007ffd2abc3610 R14: 0000000000000000 R15: 00007ffd2abc3780
</TASK>
Luckily, it's easy to fix by moving prepare_uprobe_buffer() to be called
slightly earlier: into uprobe_trace_func() and uretprobe_trace_func(), outside
of RCU locked section. This still keeps this buffer preparation lazy and helps
avoid the overhead when it's not needed. E.g., if there is only BPF uprobe
handler installed on a given uprobe, buffer won't be initialized.
Note, the other user of prepare_uprobe_buffer(), __uprobe_perf_func(), is not
affected, as it doesn't prepare buffer under RCU read lock.
Link: https://lore.kernel.org/all/20240521053017.3708530-1-andrii@kernel.org/
Fixes: 1b8f85defbc8 ("uprobes: prepare uprobe args buffer lazily")
Reported-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
The absence of IRQD_MOVE_PCNTXT prevents immediate effectiveness of
interrupt affinity reconfiguration via procfs. Instead, the change is
deferred until the next instance of the interrupt being triggered on the
original CPU.
When the interrupt next triggers on the original CPU, the new affinity is
enforced within __irq_move_irq(). A vector is allocated from the new CPU,
but the old vector on the original CPU remains and is not immediately
reclaimed. Instead, apicd->move_in_progress is flagged, and the reclaiming
process is delayed until the next trigger of the interrupt on the new CPU.
Upon the subsequent triggering of the interrupt on the new CPU,
irq_complete_move() adds a task to the old CPU's vector_cleanup list if it
remains online. Subsequently, the timer on the old CPU iterates over its
vector_cleanup list, reclaiming old vectors.
However, a rare scenario arises if the old CPU is outgoing before the
interrupt triggers again on the new CPU.
In that case irq_force_complete_move() is not invoked on the outgoing CPU
to reclaim the old apicd->prev_vector because the interrupt isn't currently
affine to the outgoing CPU, and irq_needs_fixup() returns false. Even
though __vector_schedule_cleanup() is later called on the new CPU, it
doesn't reclaim apicd->prev_vector; instead, it simply resets both
apicd->move_in_progress and apicd->prev_vector to 0.
As a result, the vector remains unreclaimed in vector_matrix, leading to a
CPU vector leak.
To address this issue, move the invocation of irq_force_complete_move()
before the irq_needs_fixup() call to reclaim apicd->prev_vector, if the
interrupt is currently or used to be affine to the outgoing CPU.
Additionally, reclaim the vector in __vector_schedule_cleanup() as well,
following a warning message, although theoretically it should never see
apicd->move_in_progress with apicd->prev_cpu pointing to an offline CPU.
Fixes: f0383c24b485 ("genirq/cpuhotplug: Add support for cleaning up move in progress")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240522220218.162423-1-dongli.zhang@oracle.com
- Fix a very tight race between the ring buffer readers and resizing
the ring buffer.
- Correct some stale comments in the ring buffer code.
- Fix kernel-doc in the rv code.
- Add a MODULE_DESCRIPTION to preemptirq_delay_test
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZk6PYBQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qrn2AP4//ghUBbEtOJTXOocvyofTGZNQrZ+3
YEAkwmtB4BS0OwEAqR9N1ov6K7r0K10W8x/wNJyfkKsMWa3MwftHqQklvgQ=
=fNlg
-----END PGP SIGNATURE-----
Merge tag 'trace-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
"Minor last minute fixes:
- Fix a very tight race between the ring buffer readers and resizing
the ring buffer
- Correct some stale comments in the ring buffer code
- Fix kernel-doc in the rv code
- Add a MODULE_DESCRIPTION to preemptirq_delay_test"
* tag 'trace-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
rv: Update rv_en(dis)able_monitor doc to match kernel-doc
tracing: Add MODULE_DESCRIPTION() to preemptirq_delay_test
ring-buffer: Fix a race between readers and resize checks
ring-buffer: Correct stale comments related to non-consuming readers
The __assign_str() macro logic of the TRACE_EVENT() macro was optimized so
that it no longer needs the second argument. The __assign_str() is always
matched with __string() field that takes a field name and the source for
that field:
__string(field, source)
The TRACE_EVENT() macro logic will save off the source value and then use
that value to copy into the ring buffer via the __assign_str(). Before
commit c1fa617caeb0 ("tracing: Rework __assign_str() and __string() to not
duplicate getting the string"), the __assign_str() needed the second
argument which would perform the same logic as the __string() source
parameter did. Not only would this add overhead, but it was error prone as
if the __assign_str() source produced something different, it may not have
allocated enough for the string in the ring buffer (as the __string()
source was used to determine how much to allocate)
Now that the __assign_str() just uses the same string that was used in
__string() it no longer needs the source parameter. It can now be removed.
-----BEGIN PGP SIGNATURE-----
iIkEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZk9RMBQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qur+AP9jbSYaGhzZdJ7a3HGA8M4l6JNju8nC
GcX1JpJT4z1qvgD3RkoNvP87etDAUAqmbVhVWnUHCY/vTqr9uB/gqmG6Ag==
=Y+6f
-----END PGP SIGNATURE-----
Merge tag 'trace-assign-str-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing cleanup from Steven Rostedt:
"Remove second argument of __assign_str()
The __assign_str() macro logic of the TRACE_EVENT() macro was
optimized so that it no longer needs the second argument. The
__assign_str() is always matched with __string() field that takes a
field name and the source for that field:
__string(field, source)
The TRACE_EVENT() macro logic will save off the source value and then
use that value to copy into the ring buffer via the __assign_str().
Before commit c1fa617caeb0 ("tracing: Rework __assign_str() and
__string() to not duplicate getting the string"), the __assign_str()
needed the second argument which would perform the same logic as the
__string() source parameter did. Not only would this add overhead, but
it was error prone as if the __assign_str() source produced something
different, it may not have allocated enough for the string in the ring
buffer (as the __string() source was used to determine how much to
allocate)
Now that the __assign_str() just uses the same string that was used in
__string() it no longer needs the source parameter. It can now be
removed"
* tag 'trace-assign-str-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/treewide: Remove second parameter of __assign_str()
Several new features here:
- virtio-net is finally supported in vduse.
- Virtio (balloon and mem) interaction with suspend is improved
- vhost-scsi now handles signals better/faster.
Fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmZN570PHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRp2JUH/1K3fZOHymop6Y5Z3USFS7YdlF+dniedY/vg
TKyWERkXOlxq1d9DVxC0mN7tk72DweuWI0YJjLXofrEW1VuW29ecSbyFXxpeWJls
b7ErffxDAFRas5jkMCngD8TuFnbEegU0mGP5kbiHpEndBydQ2hH99Gg0x7swW+cE
xsvU5zonCCLwLGIP2DrVrn9qGOHtV6o8eZfVKDVXfvicn3lFBkUSxlwEYsO9RMup
aKxV4FT2Pb1yBicwBK4TH1oeEXqEGy1YLEn+kAHRbgoC/5L0/LaiqrkzwzwwOIPj
uPGkacf8CIbX0qZo5EzD8kvfcYL1xhU3eT9WBmpp2ZwD+4bINd4=
=nax1
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"Several new features here:
- virtio-net is finally supported in vduse
- virtio (balloon and mem) interaction with suspend is improved
- vhost-scsi now handles signals better/faster
And fixes, cleanups all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits)
virtio-pci: Check if is_avq is NULL
virtio: delete vq in vp_find_vqs_msix() when request_irq() fails
MAINTAINERS: add Eugenio Pérez as reviewer
vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API
vp_vdpa: don't allocate unused msix vectors
sound: virtio: drop owner assignment
fuse: virtio: drop owner assignment
scsi: virtio: drop owner assignment
rpmsg: virtio: drop owner assignment
nvdimm: virtio_pmem: drop owner assignment
wifi: mac80211_hwsim: drop owner assignment
vsock/virtio: drop owner assignment
net: 9p: virtio: drop owner assignment
net: virtio: drop owner assignment
net: caif: virtio: drop owner assignment
misc: nsm: drop owner assignment
iommu: virtio: drop owner assignment
drm/virtio: drop owner assignment
gpio: virtio: drop owner assignment
firmware: arm_scmi: virtio: drop owner assignment
...
cpumask_of_node() can be called for NUMA_NO_NODE inside do_map_benchmark()
resulting in the following sanitizer report:
UBSAN: array-index-out-of-bounds in ./arch/x86/include/asm/topology.h:72:28
index -1 is out of range for type 'cpumask [64][1]'
CPU: 1 PID: 990 Comm: dma_map_benchma Not tainted 6.9.0-rc6 #29
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:117)
ubsan_epilogue (lib/ubsan.c:232)
__ubsan_handle_out_of_bounds (lib/ubsan.c:429)
cpumask_of_node (arch/x86/include/asm/topology.h:72) [inline]
do_map_benchmark (kernel/dma/map_benchmark.c:104)
map_benchmark_ioctl (kernel/dma/map_benchmark.c:246)
full_proxy_unlocked_ioctl (fs/debugfs/file.c:333)
__x64_sys_ioctl (fs/ioctl.c:890)
do_syscall_64 (arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Use cpumask_of_node() in place when binding a kernel thread to a cpuset
of a particular node.
Note that the provided node id is checked inside map_benchmark_ioctl().
It's just a NUMA_NO_NODE case which is not handled properly later.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 65789daa8087 ("dma-mapping: add benchmark support for streaming DMA APIs")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Acked-by: Barry Song <baohua@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
While validating node ids in map_benchmark_ioctl(), node_possible() may
be provided with invalid argument outside of [0,MAX_NUMNODES-1] range
leading to:
BUG: KASAN: wild-memory-access in map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
Read of size 8 at addr 1fffffff8ccb6398 by task dma_map_benchma/971
CPU: 7 PID: 971 Comm: dma_map_benchma Not tainted 6.9.0-rc6 #37
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:117)
kasan_report (mm/kasan/report.c:603)
kasan_check_range (mm/kasan/generic.c:189)
variable_test_bit (arch/x86/include/asm/bitops.h:227) [inline]
arch_test_bit (arch/x86/include/asm/bitops.h:239) [inline]
_test_bit at (include/asm-generic/bitops/instrumented-non-atomic.h:142) [inline]
node_state (include/linux/nodemask.h:423) [inline]
map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
full_proxy_unlocked_ioctl (fs/debugfs/file.c:333)
__x64_sys_ioctl (fs/ioctl.c:890)
do_syscall_64 (arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Compare node ids with sane bounds first. NUMA_NO_NODE is considered a
special valid case meaning that benchmarking kthreads won't be bound to a
cpuset of a given node.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 65789daa8087 ("dma-mapping: add benchmark support for streaming DMA APIs")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
If do_map_benchmark() has failed, there is nothing useful to copy back
to userspace.
Suggested-by: Barry Song <21cnbao@gmail.com>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
kthread creation failure is invalidly handled inside do_map_benchmark().
The put_task_struct() calls on the error path are supposed to balance the
get_task_struct() calls which only happen after all the kthreads are
successfully created. Rollback using kthread_stop() for already created
kthreads in case of such failure.
In normal situation call kthread_stop_put() to gracefully stop kthreads
and put their task refcounts. This should be done for all started
kthreads.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 65789daa8087 ("dma-mapping: add benchmark support for streaming DMA APIs")
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
With the rework of how the __string() handles dynamic strings where it
saves off the source string in field in the helper structure[1], the
assignment of that value to the trace event field is stored in the helper
value and does not need to be passed in again.
This means that with:
__string(field, mystring)
Which use to be assigned with __assign_str(field, mystring), no longer
needs the second parameter and it is unused. With this, __assign_str()
will now only get a single parameter.
There's over 700 users of __assign_str() and because coccinelle does not
handle the TRACE_EVENT() macro I ended up using the following sed script:
git grep -l __assign_str | while read a ; do
sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file;
mv /tmp/test-file $a;
done
I then searched for __assign_str() that did not end with ';' as those
were multi line assignments that the sed script above would fail to catch.
Note, the same updates will need to be done for:
__assign_str_len()
__assign_rel_str()
__assign_rel_str_len()
I tested this with both an allmodconfig and an allyesconfig (build only for both).
[1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/
Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts.
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for
Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs
Tested-by: Guenter Roeck <linux@roeck-us.net>
Here is the small set of driver core and kernfs changes for 6.10-rc1.
Nothing major here at all, just a small set of changes for some driver
core apis, and minor fixups. Included in here are:
- sysfs_bin_attr_simple_read() helper added and used
- device_show_string() helper added and used
All usages of these were acked by the various maintainers. Also in here
are:
- kernfs minor cleanup
- removed unused functions
- typo fix in documentation
- pay attention to sysfs_create_link() failures in module.c finally.
All of these have been in linux-next for a very long time with no
reported problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZk3+hQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylfTwCfUyHWkDZuZ7ehdtjzfmcd4EKZBK8An3AAV99G
ox8PXMxuFTaUEdT/69FQ
=2sEo
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the small set of driver core and kernfs changes for 6.10-rc1.
Nothing major here at all, just a small set of changes for some driver
core apis, and minor fixups. Included in here are:
- sysfs_bin_attr_simple_read() helper added and used
- device_show_string() helper added and used
All usages of these were acked by the various maintainers. Also in
here are:
- kernfs minor cleanup
- removed unused functions
- typo fix in documentation
- pay attention to sysfs_create_link() failures in module.c finally
All of these have been in linux-next for a very long time with no
reported problems"
* tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
device property: Fix a typo in the description of device_get_child_node_count()
kernfs: mount: Remove unnecessary ‘NULL’ values from knparent
scsi: Use device_show_string() helper for sysfs attributes
platform/x86: Use device_show_string() helper for sysfs attributes
perf: Use device_show_string() helper for sysfs attributes
IB/qib: Use device_show_string() helper for sysfs attributes
hwmon: Use device_show_string() helper for sysfs attributes
driver core: Add device_show_string() helper for sysfs attributes
treewide: Use sysfs_bin_attr_simple_read() helper
sysfs: Add sysfs_bin_attr_simple_read() helper
module: don't ignore sysfs_create_link() failures
driver core: Remove unused platform_notify, platform_notify_remove
Here is the big set of tty/serial driver changes for 6.10-rc1. Included
in here are:
- Usual good set of api cleanups and evolution by Jiri Slaby to make
the serial interfaces move out of the 1990's by using kfifos instead
of hand-rolling their own logic.
- 8250_exar driver updates
- max3100 driver updates
- sc16is7xx driver updates
- exar driver updates
- sh-sci driver updates
- tty ldisc api addition to help refuse bindings
- other smaller serial driver updates
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZk4Cvg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymqpwCgnHU1NeBBUsvoSDOLk5oApIQ4jVgAn102jWlw
3dNDhA4i3Ay/mZdv8/Kj
=TI+P
-----END PGP SIGNATURE-----
Merge tag 'tty-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial updates from Greg KH:
"Here is the big set of tty/serial driver changes for 6.10-rc1.
Included in here are:
- Usual good set of api cleanups and evolution by Jiri Slaby to make
the serial interfaces move out of the 1990's by using kfifos
instead of hand-rolling their own logic.
- 8250_exar driver updates
- max3100 driver updates
- sc16is7xx driver updates
- exar driver updates
- sh-sci driver updates
- tty ldisc api addition to help refuse bindings
- other smaller serial driver updates
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (113 commits)
serial: Clear UPF_DEAD before calling tty_port_register_device_attr_serdev()
serial: imx: Raise TX trigger level to 8
serial: 8250_pnp: Simplify "line" related code
serial: sh-sci: simplify locking when re-issuing RXDMA fails
serial: sh-sci: let timeout timer only run when DMA is scheduled
serial: sh-sci: describe locking requirements for invalidating RXDMA
serial: sh-sci: protect invalidating RXDMA on shutdown
tty: add the option to have a tty reject a new ldisc
serial: core: Call device_set_awake_path() for console port
dt-bindings: serial: brcm,bcm2835-aux-uart: convert to dtschema
tty: serial: uartps: Add support for uartps controller reset
arm64: zynqmp: Add resets property for UART nodes
dt-bindings: serial: cdns,uart: Add optional reset property
serial: 8250_pnp: Switch to DEFINE_SIMPLE_DEV_PM_OPS()
serial: 8250_exar: Keep the includes sorted
serial: 8250_exar: Make type of bit the same in exar_ee_*_bit()
serial: 8250_exar: Use BIT() in exar_ee_read()
serial: 8250_exar: Switch to use dev_err_probe()
serial: 8250_exar: Return directly from switch-cases
serial: 8250_exar: Decrease indentation level
...
- Ensure seldom updated triggers have a brightness value before first update
- New Device Support
- Add support for Simatic IPC Device BX_59A to IPC LEDs Core
- Add support for Qualcomm PMI8950 PWM to LPG Core
- New Functionality
- Add a bunch of new LED function identifiers
- Add support for High Resolution Timers in LED Trigger Patten
- Fix-ups
- Shift out Audio Trigger to the Sound subsystem
- Convert suitable calls to devm_* managed resources
- Device Tree binding adaptions/conversions/creation
- Remove superfluous code/variables/attributes and simplify overall
- Use/convert to new/better APIs/helpers/MACROs instead of hand-rolling implementations
- Bug Fixes
- Repair enabling Torch Mode from V4L2 on the second LED
- Ensure PWM is disabled when suspending
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEdrbJNaO+IJqU8IdIUa+KL4f8d2EFAmZNwdoACgkQUa+KL4f8
d2FUSw/8Dxzt29Ad326K6q7ePlGgdm4JcFeg+1B2WD+5IOCelecXD3QcVvH47Wz3
50gjHdo3qoBja6IZDwgl0ZFYj6VLKbrEmqjtM9BscdG2gaND1VGYTPtve4EqIyOX
WXv2InA69QfFNmk/n+AxDa/xYOunsK1S1RB1cuPkoVJii/P2rHiEv2LpG/TS8Sxk
I7EN45Ebh3Z6hHLpmfEIbVLLlFuyGFnSFnHOiOSAlFh1Ri0DdHBwgK4IdDYtq2dx
LB9ICem0+6PQxPKpf/ozUS2oV+jd8oS48TDJjx9n8DjblV8zh0IKYi5HEHYZifBb
03xF/XZ62MYtp6jHwiaNE1WgoARu13RZIcFbpQFgC5+N2gwpe4BGH+6nXXrimsK/
opVed2UYPdCDlHVVpScgMMUWYrnCWks4/6Iusd5K9YN6At35xuCqQh5laPOF1Cj+
jKzgxZ6gMzWTTpFQSkYpNn5wFC+p7VosdGvh/d6L5ltVb7bINgZ7mUbVEwRLZaPj
v+ZS/iWjTptMA9bHk6f/4duSJjJy15Ghdqd1CuvX/VAL9DqSz6O7hf+vj9yvOtjf
pmI4ZUBaDYX7Ut0CHcjhbg2fSYv3VMg58fLF5mUUIVgRdSHdhrpRUbYphuodhoHu
zWde2gKEkAqNTaQk56jzLi9K8Knu3PkIOnKZ9SgHfIIkmMLMNOM=
=rF2l
-----END PGP SIGNATURE-----
Merge tag 'leds-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds
Pull LED updates from Lee Jones:
"Core Frameworks:
- Ensure seldom updated triggers have a brightness value before first
update
New Device Support:
- Add support for Simatic IPC Device BX_59A to IPC LEDs Core
- Add support for Qualcomm PMI8950 PWM to LPG Core
New Functionality:
- Add a bunch of new LED function identifiers
- Add support for High Resolution Timers in LED Trigger Patten
Fix-ups:
- Shift out Audio Trigger to the Sound subsystem
- Convert suitable calls to devm_* managed resources
- Device Tree binding adaptions/conversions/creation
- Remove superfluous code/variables/attributes and simplify overall
- Use/convert to new/better APIs/helpers/MACROs instead of
hand-rolling implementations
Bug Fixes:
- Repair enabling Torch Mode from V4L2 on the second LED
- Ensure PWM is disabled when suspending"
* tag 'leds-next-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds: (28 commits)
leds: mt6370: Remove unused field 'reg_cfgs' from 'struct mt6370_priv'
leds: lp50xx: Remove unused field 'num_of_banked_leds' from 'struct lp50xx'
leds: lp50xx: Remove unused field 'bank_modules' from 'struct lp50xx_led'
leds: aat1290: Remove unused field 'torch_brightness' from 'struct aat1290_led'
leds: sun50i-a100: Use match_string() helper to simplify the code
leds: pwm: Disable PWM when going to suspend
leds: trigger: pattern: Add support for hrtimer
leds: mt6360: Fix the second LED can not enable torch mode by V4L2
dt-bindings: leds: leds-qcom-lpg: Add support for PMI8950 PWM
leds: qcom-lpg: Add support for PMI8950 PWM
leds: apu: Remove duplicate DMI lookup data
leds: trigger: netdev: Remove not needed call to led_set_brightness in deactivate
dt-bindings: leds: Add LED_FUNCTION_SPEED_* for link speed on LAN/WAN
dt-bindings: leds: Add LED_FUNCTION_MOBILE for mobile network
leds: simatic-ipc-leds-gpio: Add support for module BX-59A
dt-bindings: leds: qcom-lpg: Document PM6150L compatible
dt-bindings: leds: pca963x: Convert text bindings to YAML
leds: an30259a: Use devm_mutex_init() for mutex initialization
leds: mlxreg: Use devm_mutex_init() for mutex initialization
leds: nic78bx: Use devm API to cleanup module's resources
...
* Support for byte/half-word compare-and-exchange, emulated via LR/SC
loops.
* Support for Rust.
* Support for Zihintpause in hwprobe.
* Support for the PR_RISCV_SET_ICACHE_FLUSH_CTX prctl().
* Support for lockless lockrefs.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmZN/hcTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiVrGEACUT3gsbTx1q7fa11iQNxOjVkpl66Qn
7+kI+V9xt5+GuH2EjJk6AsSNHPKeQ8totbSTA8AZjINFvgVjXslN+DPpcjCFKvnh
NN5/Lyd64X0PZMsxGWlN9SHTFWf2b7lalCnY51BlX/IpBbHWc/no9XUsPSVixx6u
9q+JoS3D1DDV92nGcA/UK9ICCsDcf4omWgZW7KbjnVWnuY9jt4ctTy11jtF2RM9R
Z9KAWh0RqPzjz0vNbBBf9Iw7E4jt/Px6HDYPfZAiE2dVsCTHjdsC7TcGRYXzKt6F
4q9zg8kzwvUG5GaBl7/XprXO1vaeOUmPcTVoE7qlRkSdkknRH/iBz1P4hk+r0fze
f+h5ZUV/oJP7vDb+vHm/BExtGufgLuJ2oMA2Bp9qI17EMcMsGiRMt7DsBMEafWDk
bNrFcJdqqYBz6HxfTwzNH5ErxfS/59PuwYl913BTSOH//raCZCFXOfyrSICH7qXd
UFOLLmBpMuApLa8ayFeI9Mp3flWfbdQHR52zLRLiUvlpWNEDKrNQN417juVwTXF0
DYkjJDhFPLfFOr/sJBboftOMOUdA9c/CJepY9o4kPvBXUvPtRHN1jdXDNSCVDZRb
nErnsJ9rv0PzfxQU7Xjhd2QmCMeMlbCQDpXAKKETyyimpTbgF33rovN0i5ixX3m4
KG6RvKDubOzZdA==
=YLoD
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.10-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Add byte/half-word compare-and-exchange, emulated via LR/SC loops
- Support for Rust
- Support for Zihintpause in hwprobe
- Add PR_RISCV_SET_ICACHE_FLUSH_CTX prctl()
- Support lockless lockrefs
* tag 'riscv-for-linus-6.10-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (42 commits)
riscv: defconfig: Enable CONFIG_CLK_SOPHGO_CV1800
riscv: select ARCH_HAS_FAST_MULTIPLIER
riscv: mm: still create swiotlb buffer for kmalloc() bouncing if required
riscv: Annotate pgtable_l{4,5}_enabled with __ro_after_init
riscv: Remove redundant CONFIG_64BIT from pgtable_l{4,5}_enabled
riscv: mm: Always use an ASID to flush mm contexts
riscv: mm: Preserve global TLB entries when switching contexts
riscv: mm: Make asid_bits a local variable
riscv: mm: Use a fixed layout for the MM context ID
riscv: mm: Introduce cntx2asid/cntx2version helper macros
riscv: Avoid TLB flush loops when affected by SiFive CIP-1200
riscv: Apply SiFive CIP-1200 workaround to single-ASID sfence.vma
riscv: mm: Combine the SMP and UP TLB flush code
riscv: Only send remote fences when some other CPU is online
riscv: mm: Broadcast kernel TLB flushes only when needed
riscv: Use IPIs for remote cache/TLB flushes by default
riscv: Factor out page table TLB synchronization
riscv: Flush the instruction cache during SMP bringup
riscv: hwprobe: export Zihintpause ISA extension
riscv: misaligned: remove CONFIG_RISCV_M_MODE specific code
...
This removes the signal/coredump hacks added for vhost_tasks in:
Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
When that patch was added vhost_tasks did not handle SIGKILL and would
try to ignore/clear the signal and continue on until the device's close
function was called. In the previous patches vhost_tasks and the vhost
drivers were converted to support SIGKILL by cleaning themselves up and
exiting. The hacks are no longer needed so this removes them.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20240316004707.45557-10-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Instead of lingering until the device is closed, this has us handle
SIGKILL by:
1. marking the worker as killed so we no longer try to use it with
new virtqueues and new flush operations.
2. setting the virtqueue to worker mapping so no new works are queued.
3. running all the exiting works.
Suggested-by: Edward Adam Davis <eadavis@qq.com>
Reported-and-tested-by: syzbot+98edc2df894917b3431f@syzkaller.appspotmail.com
Message-Id: <tencent_546DA49414E876EEBECF2C78D26D242EE50A@qq.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20240316004707.45557-9-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The patch updates the function documentation comment for
rv_en(dis)able_monitor to adhere to the kernel-doc specification.
Link: https://lore.kernel.org/linux-trace-kernel/20240520054239.61784-1-yang.lee@linux.alibaba.com
Fixes: 102227b970a15 ("rv: Add Runtime Verification (RV) interface")
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The reader code in rb_get_reader_page() swaps a new reader page into the
ring buffer by doing cmpxchg on old->list.prev->next to point it to the
new page. Following that, if the operation is successful,
old->list.next->prev gets updated too. This means the underlying
doubly-linked list is temporarily inconsistent, page->prev->next or
page->next->prev might not be equal back to page for some page in the
ring buffer.
The resize operation in ring_buffer_resize() can be invoked in parallel.
It calls rb_check_pages() which can detect the described inconsistency
and stop further tracing:
[ 190.271762] ------------[ cut here ]------------
[ 190.271771] WARNING: CPU: 1 PID: 6186 at kernel/trace/ring_buffer.c:1467 rb_check_pages.isra.0+0x6a/0xa0
[ 190.271789] Modules linked in: [...]
[ 190.271991] Unloaded tainted modules: intel_uncore_frequency(E):1 skx_edac(E):1
[ 190.272002] CPU: 1 PID: 6186 Comm: cmd.sh Kdump: loaded Tainted: G E 6.9.0-rc6-default #5 158d3e1e6d0b091c34c3b96bfd99a1c58306d79f
[ 190.272011] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
[ 190.272015] RIP: 0010:rb_check_pages.isra.0+0x6a/0xa0
[ 190.272023] Code: [...]
[ 190.272028] RSP: 0018:ffff9c37463abb70 EFLAGS: 00010206
[ 190.272034] RAX: ffff8eba04b6cb80 RBX: 0000000000000007 RCX: ffff8eba01f13d80
[ 190.272038] RDX: ffff8eba01f130c0 RSI: ffff8eba04b6cd00 RDI: ffff8eba0004c700
[ 190.272042] RBP: ffff8eba0004c700 R08: 0000000000010002 R09: 0000000000000000
[ 190.272045] R10: 00000000ffff7f52 R11: ffff8eba7f600000 R12: ffff8eba0004c720
[ 190.272049] R13: ffff8eba00223a00 R14: 0000000000000008 R15: ffff8eba067a8000
[ 190.272053] FS: 00007f1bd64752c0(0000) GS:ffff8eba7f680000(0000) knlGS:0000000000000000
[ 190.272057] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 190.272061] CR2: 00007f1bd6662590 CR3: 000000010291e001 CR4: 0000000000370ef0
[ 190.272070] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 190.272073] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 190.272077] Call Trace:
[ 190.272098] <TASK>
[ 190.272189] ring_buffer_resize+0x2ab/0x460
[ 190.272199] __tracing_resize_ring_buffer.part.0+0x23/0xa0
[ 190.272206] tracing_resize_ring_buffer+0x65/0x90
[ 190.272216] tracing_entries_write+0x74/0xc0
[ 190.272225] vfs_write+0xf5/0x420
[ 190.272248] ksys_write+0x67/0xe0
[ 190.272256] do_syscall_64+0x82/0x170
[ 190.272363] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 190.272373] RIP: 0033:0x7f1bd657d263
[ 190.272381] Code: [...]
[ 190.272385] RSP: 002b:00007ffe72b643f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 190.272391] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1bd657d263
[ 190.272395] RDX: 0000000000000002 RSI: 0000555a6eb538e0 RDI: 0000000000000001
[ 190.272398] RBP: 0000555a6eb538e0 R08: 000000000000000a R09: 0000000000000000
[ 190.272401] R10: 0000555a6eb55190 R11: 0000000000000246 R12: 00007f1bd6662500
[ 190.272404] R13: 0000000000000002 R14: 00007f1bd6667c00 R15: 0000000000000002
[ 190.272412] </TASK>
[ 190.272414] ---[ end trace 0000000000000000 ]---
Note that ring_buffer_resize() calls rb_check_pages() only if the parent
trace_buffer has recording disabled. Recent commit d78ab792705c
("tracing: Stop current tracer when resizing buffer") causes that it is
now always the case which makes it more likely to experience this issue.
The window to hit this race is nonetheless very small. To help
reproducing it, one can add a delay loop in rb_get_reader_page():
ret = rb_head_page_replace(reader, cpu_buffer->reader_page);
if (!ret)
goto spin;
for (unsigned i = 0; i < 1U << 26; i++) /* inserted delay loop */
__asm__ __volatile__ ("" : : : "memory");
rb_list_head(reader->list.next)->prev = &cpu_buffer->reader_page->list;
.. and then run the following commands on the target system:
echo 1 > /sys/kernel/tracing/events/sched/sched_switch/enable
while true; do
echo 16 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1
echo 8 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1
done &
while true; do
for i in /sys/kernel/tracing/per_cpu/*; do
timeout 0.1 cat $i/trace_pipe; sleep 0.2
done
done
To fix the problem, make sure ring_buffer_resize() doesn't invoke
rb_check_pages() concurrently with a reader operating on the same
ring_buffer_per_cpu by taking its cpu_buffer->reader_lock.
Link: https://lore.kernel.org/linux-trace-kernel/20240517134008.24529-3-petr.pavlu@suse.com
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 659f451ff213 ("ring-buffer: Add integrity check at end of iter read")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
[ Fixed whitespace ]
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Adjust the following code documentation:
* Kernel-doc comments for ring_buffer_read_prepare() and
ring_buffer_read_finish() mention that recording to the ring buffer is
disabled when the read is active. Remove mention of this restriction
because it was already lifted in commit 1039221cc278 ("ring-buffer: Do
not disable recording when there is an iterator").
* Function ring_buffer_read_finish() performs a self-check of the
ring-buffer by locking cpu_buffer->reader_lock and then calling
rb_check_pages(). The preceding comment explains that the lock is
needed because rb_check_pages() clears the HEAD flag required by
readers which might be running in parallel. Remove this explanation
because commit 8843e06f67b1 ("ring-buffer: Handle race between
rb_move_tail and rb_check_pages") simplified the function so it no
longer resets the mentioned flag. Nonetheless, the lock is still
needed because a reader swapping a page into the ring buffer can make
the underlying doubly-linked list temporarily inconsistent.
This is a non-functional change.
Link: https://lore.kernel.org/linux-trace-kernel/20240517134008.24529-2-petr.pavlu@suse.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Hi Linus,
Please pull patches for 6.10. This includes:
- topology_span_sane() optimization from Kyle Meyer;
- fns() rework from Kuan-Wei Chiu (used in
cpumask_local_spread() and other places); and
- headers cleanup from Andy.
This also adds a MAINTAINERS record for bitops API as it's unattended,
and I'd like to follow it closer.
Thanks,
Yury
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmZKh/kACgkQsUSA/Tof
vshtSQv/eT5+KyXg5qCY3fLaIjWYD0uch5jxkdqtib5BncfIrUMsFpZBon+E2x9C
fWu7K/nfxUjKZF0Sfgl9gVns6K0rC4F24WzHjzWRVVV7+g4idXwMC1kxSX733KQC
o+D2065Dx9EmhnzypBbmNsGQsQ09WXP1GsJLf8qSGCw0lT1zNtgqsAD5sSogFGGn
ca9ZsndThuzTst5lXPXipt1W/c26frchh6SgjVTPjzALCDAf5r9Ls5np3AL1AW8X
yR8cuV9UphT1ysBplzPbBET/Fy/AGbZl1g4u72M6NvGy/nVkQ5Ic4HZj0zIem0Ic
C60PokY8lg6hQ7tWN8da12/g6WZINgZcfUfuodKiQAzryBGUJlW0aDzDUZPcCqB/
gmV/Op4RPJeQr9sibQ6nIFx73ydKVQEmZRliahzXR0p33HJCOLTATOeYqLTXQMdi
ZwhYCqG5fNEUK0VMBy8S4+tEsUAoykU21hFD04b/Ur8A49bxxJ9RDlAUC0IEc1Pj
fiU0VPFx
=H6BQ
-----END PGP SIGNATURE-----
Merge tag 'bitmap-for-6.10v2' of https://github.com/norov/linux
Pull bitmap updates from Yury Norov:
- topology_span_sane() optimization from Kyle Meyer
- fns() rework from Kuan-Wei Chiu (used in cpumask_local_spread() and
other places)
- headers cleanup from Andy
- add a MAINTAINERS record for bitops API
* tag 'bitmap-for-6.10v2' of https://github.com/norov/linux:
usercopy: Don't use "proxy" headers
bitops: Move aligned_byte_mask() to wordpart.h
MAINTAINERS: add BITOPS API record
bitmap: relax find_nth_bit() limitation on return value
lib: make test_bitops compilable into the kernel image
bitops: Optimize fns() for improved performance
lib/test_bitops: Add benchmark test for fns()
Compiler Attributes: Add __always_used macro
sched/topology: Optimize topology_span_sane()
cpumask: Add for_each_cpu_from()
to struct file * and verifying that caller has device
opened exclusively.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZkwkfQAKCRBZ7Krx/gZQ
62C3AQDW5vuXNx2+KDPma5YStjFpPLC0xtSyAS5D3YANjtyRFgD/TOcCarq7rvBt
KubxHVFsfW+eu6ASeaoMRB83w5OIzwk=
=Liix
-----END PGP SIGNATURE-----
Merge tag 'pull-set_blocksize' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs blocksize updates from Al Viro:
"This gets rid of bogus set_blocksize() uses, switches it over
to be based on a 'struct file *' and verifies that the caller
has the device opened exclusively"
* tag 'pull-set_blocksize' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
make set_blocksize() fail unless block device is opened exclusive
set_blocksize(): switch to passing struct file *
btrfs_get_bdev_and_sb(): call set_blocksize() only for exclusive opens
swsusp: don't bother with setting block size
zram: don't bother with reopening - just use O_EXCL for open
swapon(2): open swap with O_EXCL
swapon(2)/swapoff(2): don't bother with block size
pktcdvd: sort set_blocksize() calls out
bcache_register(): don't bother with set_blocksize()
Currently, worker ID formatting is open coded in create_worker(),
init_rescuer() and worker_thread() (for %WORKER_DIE case). The formatted ID
is saved into task->comm and wq_worker_comm() uses it as the base name to
append extra information to when generating the name to be shown to
userspace.
However, TASK_COMM_LEN is only 16 leading to badly truncated names for
rescuers. For example, the rescuer for the inet_frag_wq workqueue becomes:
$ ps -ef | grep '[k]worker/R-inet'
root 483 2 0 Apr26 ? 00:00:00 [kworker/R-inet_]
Even for non-rescue workers, it's easy to run over 15 characters on
moderately large machines.
Fit it by consolidating worker ID formatting into a new helper
format_worker_id() and calling it from wq_worker_comm() to obtain the
untruncated worker ID string.
$ ps -ef | grep '[k]worker/R-inet'
root 60 2 0 12:10 ? 00:00:00 [kworker/R-inet_frag_wq]
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Jan Engelhardt <jengelh@inai.de>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
- The vfio fsl-mc bus driver has become orphaned. We'll consider
removing it in future releases if a new maintainer isn't found.
(Alex Williamson)
- Improved usage of opaque data in vfio-pci INTx handling,
avoiding lookups of the eventfd through the interrupt and
irqfd runtime paths. (Alex Williamson)
- Resolve an error path memory leak introduced in vfio-pci
interrupt code. (Ye Bin)
- Addition of interrupt support for vfio devices exposed on the
CDX bus, including a new MSI allocation helper and export of
existing helpers for MSI alloc and free. (Nipun Gupta)
- A new vfio-pci variant driver supporting migration of Intel
QAT VF devices for the GEN4 PFs. (Xin Zeng & Yahui Cao)
- Resolve a possibly circular locking dependency in vfio-pci
by avoiding copy_to_user() from a PCI bus walk callback.
(Alex Williamson)
- Trivial docs update to remove a duplicate semicolon.
(Foryun Ma)
-----BEGIN PGP SIGNATURE-----
iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmZLhtUbHGFsZXgud2ls
bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsikU4P/jzHWOU9OvpP30c1r6me
ez8V7JIGmAtLI0ci69uqn0B86h1nLAAmLg8QvcTco9s0a+4Pb3QGUmLfA6niZLUV
Ji7Z4c3Df4v6Kxzjg4e2Sb8rSvdzehV+WNB+kQ4lEGPyx7OvfiR6lHi2WYzAjm4M
lcmZCH5Y0URQ+wMSEHZcuom4OOSfHULvOovHuvN9CFyuZfEpVmA57MhAGiCNhXcD
Nr2KMADt7K2xDtfCv84ezx2kw6MP3mTQiWOwN1HHLEI5IW+pnv3DTaPnEn6KdTcn
zRHDu9a3uUnE4/HsuiAkMeOX046NYLHhZRls4IjligcjB8Es53nA3iSVm1sJL9RT
Nos/FubSuZ2TJ9AEkiqLRujSJiq40ALRC1qccjyN4a6pgmWSBe/3lbOHukPjAQ2K
6BmmO3tB/3wLSSbSumojar385NvyzGOQCOVHKTXgoqK7KFJpTQqsxT9GqwMdOQ+O
6nSOzfcnliTGQZ5GFuUVieFeOb6R2U7dQLT42pgBPIvToidjdfEcBRvL0SlvQbQe
HuyQ/Rx4XQ9tHHjSlOw6GEsiNsgY8TsmX+lqrCEc4G15nRLCHMp7RRh7gWz08y+g
/JqeB872zsKNiIlgnaskxmDA5iRZjPLdCu+85H7pZzegLC1NVhVrJJehR3LgleDQ
3WGxxjFNl1gKOGubhiUgd/B7
=EFpj
-----END PGP SIGNATURE-----
Merge tag 'vfio-v6.10-rc1' of https://github.com/awilliam/linux-vfio
Pull vfio updates from Alex Williamson:
- The vfio fsl-mc bus driver has become orphaned. We'll consider
removing it in future releases if a new maintainer isn't found (Alex
Williamson)
- Improved usage of opaque data in vfio-pci INTx handling, avoiding
lookups of the eventfd through the interrupt and irqfd runtime paths
(Alex Williamson)
- Resolve an error path memory leak introduced in vfio-pci interrupt
code (Ye Bin)
- Addition of interrupt support for vfio devices exposed on the CDX
bus, including a new MSI allocation helper and export of existing
helpers for MSI alloc and free (Nipun Gupta)
- A new vfio-pci variant driver supporting migration of Intel QAT VF
devices for the GEN4 PFs (Xin Zeng & Yahui Cao)
- Resolve a possibly circular locking dependency in vfio-pci by
avoiding copy_to_user() from a PCI bus walk callback (Alex
Williamson)
- Trivial docs update to remove a duplicate semicolon (Foryun Ma)
* tag 'vfio-v6.10-rc1' of https://github.com/awilliam/linux-vfio:
vfio/pci: Restore zero affected bus reset devices warning
vfio: remove an extra semicolon
vfio/pci: Collect hot-reset devices to local buffer
vfio/qat: Add vfio_pci driver for Intel QAT SR-IOV VF devices
vfio/cdx: add interrupt support
genirq/msi: Add MSI allocation helper and export MSI functions
vfio/pci: fix potential memory leak in vfio_intx_enable()
vfio/pci: Pass eventfd context object through irqfd
vfio/pci: Pass eventfd context to IRQ handler
MAINTAINERS: Orphan vfio fsl-mc bus driver
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmZLJS0ACgkQnJ2qBz9k
QNmFlggAlIg5oDZfOhJur6h3Icldrl2DsnKer0CAP7TFK+GfkFTEb25paoydBEu4
Y0VzZ3n3EqhmsJ8P515k1UPPPXlqqZwSRWGAek0FDhQCXhqEYxiWwf9U343hJNBS
rya4Rnwc1pxqmJU2hrY5R5kEbugUFAIL+qNXzhhLpWonYiy/ya7P5n/qz5F5HJH2
FufRRaPHcHFfk1u0+PvFrk019AS9C6Y3bkcUGtbpdwmFsuN3D4HKuLEkr1+C9Apb
NmkoAwCiSobQhAxGDr6Szqu6r1VCuM+n/O9fqLknnL9u0jm95AmGdIMOdQ/ofx6d
xn3mfRp8gUbPD8PubHhQsMjCmSjGwg==
=kwWW
-----END PGP SIGNATURE-----
Merge tag 'fsnotify_for_v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
- reduce overhead of fsnotify infrastructure when no permission events
are in use
- a few small cleanups
* tag 'fsnotify_for_v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: fix UAF from FS_ERROR event on a shutting down filesystem
fsnotify: optimize the case of no permission event watchers
fsnotify: use an enum for group priority constants
fsnotify: move s_fsnotify_connectors into fsnotify_sb_info
fsnotify: lazy attach fsnotify_sb_info state to sb
fsnotify: create helper fsnotify_update_sb_watchers()
fsnotify: pass object pointer and type to fsnotify mark helpers
fanotify: merge two checks regarding add of ignore mark
fsnotify: create a wrapper fsnotify_find_inode_mark()
fsnotify: create helpers to get sb and connp from object
fsnotify: rename fsnotify_{get,put}_sb_connectors()
fsnotify: Avoid -Wflex-array-member-not-at-end warning
fanotify: remove unneeded sub-zero check for unsigned value
- optimize DMA sync calls when they are no-ops (Alexander Lobakin)
- fix swiotlb padding for untrusted devices (Michael Kelley)
- add documentation for swiotb (Michael Kelley)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmZLV+gLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYPO7hAAlKuXigzwcrVEUnfRGRdaZ28xbmffyC1dPfw8HRZe
xJqvD51aJ/VOoOCcUyt3hNLEQHwtjEk4eM0xGcAASMdwceU58doJCcDJBpbbgbDK
CPKJgBLQBC1JfAJUpRiJkV4RsudRhAyndIzUPVgkz0WObpEgDpfO0ClHRF/0Pavy
1sBFVFMbB1ewb/D8ffpp+DWfwrwu0oMC3A2LkYu2F5SQFWuVOpbNemrnZ6K2ckPt
2mcLpJ308+sti8Ka/LrI2akU8JCLYMYDQnue/44v3X3Gm63cMcEx/fj5M5x6m71n
P+cxAkjsGDHybnfjbUvR842to8msRsH4CI4Zbb69+5HDlWSadM8JhQd74oeii6o6
RiGPrrFEk7vCxFOkUsqGFYMykEX+71wXfQ1Mpp/b4QgdqBLkxW4ozQ3Ya7ASUs2z
TLLmQvIXtYKGnyU+RdOkvS6piHjd4wVHOhuGVdXqVT7WrbaPeovY4TNSTV2ZA1gE
9Y5RCdrX9xeGGNjsYXKwsWGvXVsm6UTQmQVUsatQb3ic+K3S6tQR9pwzk0HmhMuM
BscWHSAEL7T8ZZ5Ydph45Cw/6xdH7LggD+nRtLcdAuzCika12eabZHsO0DrF533n
qXYOjZOgsMEZWICynxq6+EGQKGWY+F+GyKDMU2w2Es5OgMa9Bqb40aSF+Q887s96
xwI=
=Pa8W
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-6.10-2024-05-20' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- optimize DMA sync calls when they are no-ops (Alexander Lobakin)
- fix swiotlb padding for untrusted devices (Michael Kelley)
- add documentation for swiotb (Michael Kelley)
* tag 'dma-mapping-6.10-2024-05-20' of git://git.infradead.org/users/hch/dma-mapping:
dma: fix DMA sync for drivers not calling dma_set_mask*()
xsk: use generic DMA sync shortcut instead of a custom one
page_pool: check for DMA sync shortcut earlier
page_pool: don't use driver-set flags field directly
page_pool: make sure frag API fields don't span between cachelines
iommu/dma: avoid expensive indirect calls for sync operations
dma: avoid redundant calls for sync operations
dma: compile-out DMA sync op calls when not used
iommu/dma: fix zeroing of bounce buffer padding used by untrusted devices
swiotlb: remove alloc_size argument to swiotlb_tbl_map_single()
Documentation/core-api: add swiotlb documentation
Notable series include:
- Some maintenance and performance work for ocfs2 in Heming Zhao's
series "improve write IO performance when fragmentation is high".
- Some ocfs2 bugfixes from Su Yue in the series "ocfs2 bugs fixes
exposed by fstests".
- kfifo header rework from Andy Shevchenko in the series "kfifo: Clean
up kfifo.h".
- GDB script fixes from Florian Rommel in the series "scripts/gdb: Fixes
for $lx_current and $lx_per_cpu".
- After much discussion, a coding-style update from Barry Song
explaining one reason why inline functions are preferred over macros.
The series is "codingstyle: avoid unused parameters for a function-like
macro".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZkpLYQAKCRDdBJ7gKXxA
jo9NAQDctSD3TMXqxqCHLaEpCaYTYzi6TGAVHjgkqGzOt7tYjAD/ZIzgcmRwthjP
R7SSiSgZ7UnP9JRn16DQILmFeaoG1gs=
=lYhr
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2024-05-19-11-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-mm updates from Andrew Morton:
"Mainly singleton patches, documented in their respective changelogs.
Notable series include:
- Some maintenance and performance work for ocfs2 in Heming Zhao's
series "improve write IO performance when fragmentation is high".
- Some ocfs2 bugfixes from Su Yue in the series "ocfs2 bugs fixes
exposed by fstests".
- kfifo header rework from Andy Shevchenko in the series "kfifo:
Clean up kfifo.h".
- GDB script fixes from Florian Rommel in the series "scripts/gdb:
Fixes for $lx_current and $lx_per_cpu".
- After much discussion, a coding-style update from Barry Song
explaining one reason why inline functions are preferred over
macros. The series is "codingstyle: avoid unused parameters for a
function-like macro""
* tag 'mm-nonmm-stable-2024-05-19-11-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (62 commits)
fs/proc: fix softlockup in __read_vmcore
nilfs2: convert BUG_ON() in nilfs_finish_roll_forward() to WARN_ON()
scripts: checkpatch: check unused parameters for function-like macro
Documentation: coding-style: ask function-like macros to evaluate parameters
nilfs2: use __field_struct() for a bitwise field
selftests/kcmp: remove unused open mode
nilfs2: remove calls to folio_set_error() and folio_clear_error()
kernel/watchdog_perf.c: tidy up kerneldoc
watchdog: allow nmi watchdog to use raw perf event
watchdog: handle comma separated nmi_watchdog command line
nilfs2: make superblock data array index computation sparse friendly
squashfs: remove calls to set the folio error flag
squashfs: convert squashfs_symlink_read_folio to use folio APIs
scripts/gdb: fix detection of current CPU in KGDB
scripts/gdb: make get_thread_info accept pointers
scripts/gdb: fix parameter handling in $lx_per_cpu
scripts/gdb: fix failing KGDB detection during probe
kfifo: don't use "proxy" headers
media: stih-cec: add missing io.h
media: rc: add missing io.h
...
Nine patches this cycle and they split into just three topics:
1. Adopt coccinelle's recommendation to adopt str_plural().
2. A set of seven patches to refactor kdb_read() to improve both code clarity
and it's discipline with respect to fixed size buffers. This isn't just a
refactor. Between them these also fix a cursor movement redraw problem and
two buffer overflows (one latent and one real, albeit difficult to
tickle).
3. Fix an NMI-safety problem when enqueuing kdb's keyboard reset code.
I wrote eight of the nine patches in this collection so many thanks to Doug
Anderson for the reviews. The changes that affects drivers/tty/serial is
acked by Greg KH.
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEELzVBU1D3lWq6cKzwfOMlXTn3iKEFAmZIx28ACgkQfOMlXTn3
iKHxGg//VS1Q7Hrr+AdJyAg3oo9KbyRRutvAgEI8zT0zaXxBmalK2H616x2JpN4O
OQm3/bIs/3qTPx3BC+a4btDJ8+b4R9U5HW928dY35mpaOvVF0IRHK57LIiksFRXD
tEWFMf5CB0MfYzR3ytAhZPOBkk5Qwm1T7T54ZXcnA/V6Xh8eBC3yap8DlDcYL6FB
VFqcVhQ6lpvE1gpfC5zq814d3wNM+rL9sCPee90fQr62Gz4FJWQGBrNgj2PwWfWI
65K0KAWyyAwShVF3eZT19KdyibfRsCaatA1wMBrnSmlaO5XyTXLeeyh9sL2opgdK
3Qrbm8u0ZU/OfIJ+yVejEB8PnUH2PNQTCNduayds8BHuUJFVW+C7q/UTdWEzVr/l
0RsX33WYsgge1chFRRVV+Tsj3ye0D7MSovzB/UqHaA0kJc75A3hUVAenEdXEwGky
ho9zQF0GwXE+xusrG6nW8ATO++9akLSkMHQyBuZ9x+apgVVk8rOsDHcxD5Pry4xL
Wz7xa2jTo7vDq0NuP5DCke/fBFD49m8OwmIsCDjIxN/vkxZIKfJLHqMeIfS/KPZX
2zh+0REsGdidndChB/wSHT24BlD45G0nMsJEbiMkHqMA+4uAFjF6clSfW52OU80J
4u/+LNh1GGQVpOK7fCrr+zlYFCYieFui3Xch/+MRGgGqt8z1JtU=
=vVUC
-----END PGP SIGNATURE-----
Merge tag 'kgdb-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux
Pull kgdb updates from Daniel Thompson:
"Nine patches this cycle and they split into just three topics:
- Adopt coccinelle's recommendation to adopt str_plural()
- A set of seven patches to refactor kdb_read() to improve both code
clarity and its discipline with respect to fixed size buffers.
This isn't just a refactor. Between them these also fix a cursor
movement redraw problem and two buffer overflows (one latent and
one real, albeit difficult to tickle).
- Fix an NMI-safety problem when enqueuing kdb's keyboard reset code
I wrote eight of the nine patches in this collection so many thanks to
Doug Anderson for the reviews. The changes that affects
drivers/tty/serial is acked by Greg KH"
* tag 'kgdb-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
serial: kgdboc: Fix NMI-safety problems from keyboard reset code
kdb: Simplify management of tmpbuffer in kdb_read()
kdb: Replace double memcpy() with memmove() in kdb_read()
kdb: Use format-specifiers rather than memset() for padding in kdb_read()
kdb: Merge identical case statements in kdb_read()
kdb: Fix console handling when editing and tab-completing commands
kdb: Use format-strings rather than '\0' injection in kdb_read()
kdb: Fix buffer overflow during tab-complete
kdb: Use str_plural() to fix Coccinelle warning
documented (hopefully adequately) in the respective changelogs. Notable
series include:
- Lucas Stach has provided some page-mapping
cleanup/consolidation/maintainability work in the series "mm/treewide:
Remove pXd_huge() API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one
test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being allocated:
number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in largely
similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene" Johannes
Weiner has fixed the page allocator's handling of migratetype requests,
with resulting improvements in compaction efficiency.
- In the series "make the hugetlb migration strategy consistent" Baolin
Wang has fixed a hugetlb migration issue, which should improve hugetlb
allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when memory
almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting" Kairui
Song has optimized pagecache insertion, yielding ~10% performance
improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initializatio code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various page->flags
cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb
functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series
"mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs. This
is a simple first-cut implementation for now. The series is "support
multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in the
series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts in
the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call it
GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to
use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes
the initialization code so that migration between different memory types
works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant driver
in the series "mm: follow_pte() improvements and acrn follow_pte()
fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to folio
in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size THP's
in the series "mm: add per-order mTHP alloc and swpout counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled
and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His series
"mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes
the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback instrumentation
in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series "Fix
and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the
series "Improve anon_vma scalability for anon VMAs". Intel's test bot
reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests ni the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZkgQYwAKCRDdBJ7gKXxA
jrdKAP9WVJdpEcXxpoub/vVE0UWGtffr8foifi9bCwrQrGh5mgEAx7Yf0+d/oBZB
nvA4E0DcPrUAFy144FNM0NTCb7u9vAw=
=V3R/
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
"The usual shower of singleton fixes and minor series all over MM,
documented (hopefully adequately) in the respective changelogs.
Notable series include:
- Lucas Stach has provided some page-mapping cleanup/consolidation/
maintainability work in the series "mm/treewide: Remove pXd_huge()
API".
- In the series "Allow migrate on protnone reference with
MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
one test.
- In their series "Memory allocation profiling" Kent Overstreet and
Suren Baghdasaryan have contributed a means of determining (via
/proc/allocinfo) whereabouts in the kernel memory is being
allocated: number of calls and amount of memory.
- Matthew Wilcox has provided the series "Various significant MM
patches" which does a number of rather unrelated things, but in
largely similar code sites.
- In his series "mm: page_alloc: freelist migratetype hygiene"
Johannes Weiner has fixed the page allocator's handling of
migratetype requests, with resulting improvements in compaction
efficiency.
- In the series "make the hugetlb migration strategy consistent"
Baolin Wang has fixed a hugetlb migration issue, which should
improve hugetlb allocation reliability.
- Liu Shixin has hit an I/O meltdown caused by readahead in a
memory-tight memcg. Addressed in the series "Fix I/O high when
memory almost met memcg limit".
- In the series "mm/filemap: optimize folio adding and splitting"
Kairui Song has optimized pagecache insertion, yielding ~10%
performance improvement in one test.
- Baoquan He has cleaned up and consolidated the early zone
initialization code in the series "mm/mm_init.c: refactor
free_area_init_core()".
- Baoquan has also redone some MM initializatio code in the series
"mm/init: minor clean up and improvement".
- MM helper cleanups from Christoph Hellwig in his series "remove
follow_pfn".
- More cleanups from Matthew Wilcox in the series "Various
page->flags cleanups".
- Vlastimil Babka has contributed maintainability improvements in the
series "memcg_kmem hooks refactoring".
- More folio conversions and cleanups in Matthew Wilcox's series:
"Convert huge_zero_page to huge_zero_folio"
"khugepaged folio conversions"
"Remove page_idle and page_young wrappers"
"Use folio APIs in procfs"
"Clean up __folio_put()"
"Some cleanups for memory-failure"
"Remove page_mapping()"
"More folio compat code removal"
- David Hildenbrand chipped in with "fs/proc/task_mmu: convert
hugetlb functions to work on folis".
- Code consolidation and cleanup work related to GUP's handling of
hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".
- Rick Edgecombe has developed some fixes to stack guard gaps in the
series "Cover a guard gap corner case".
- Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
series "mm/ksm: fix ksm exec support for prctl".
- Baolin Wang has implemented NUMA balancing for multi-size THPs.
This is a simple first-cut implementation for now. The series is
"support multi-size THP numa balancing".
- Cleanups to vma handling helper functions from Matthew Wilcox in
the series "Unify vma_address and vma_pgoff_address".
- Some selftests maintenance work from Dev Jain in the series
"selftests/mm: mremap_test: Optimizations and style fixes".
- Improvements to the swapping of multi-size THPs from Ryan Roberts
in the series "Swap-out mTHP without splitting".
- Kefeng Wang has significantly optimized the handling of arm64's
permission page faults in the series
"arch/mm/fault: accelerate pagefault when badaccess"
"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"
- GUP cleanups from David Hildenbrand in "mm/gup: consistently call
it GUP-fast".
- hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
path to use struct vm_fault".
- selftests build fixes from John Hubbard in the series "Fix
selftests/mm build without requiring "make headers"".
- Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
series "Improved Memory Tier Creation for CPUless NUMA Nodes".
Fixes the initialization code so that migration between different
memory types works as intended.
- David Hildenbrand has improved follow_pte() and fixed an errant
driver in the series "mm: follow_pte() improvements and acrn
follow_pte() fixes".
- David also did some cleanup work on large folio mapcounts in his
series "mm: mapcount for large folios + page_mapcount() cleanups".
- Folio conversions in KSM in Alex Shi's series "transfer page to
folio in KSM".
- Barry Song has added some sysfs stats for monitoring multi-size
THP's in the series "mm: add per-order mTHP alloc and swpout
counters".
- Some zswap cleanups from Yosry Ahmed in the series "zswap
same-filled and limit checking cleanups".
- Matthew Wilcox has been looking at buffer_head code and found the
documentation to be lacking. The series is "Improve buffer head
documentation".
- Multi-size THPs get more work, this time from Lance Yang. His
series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
optimizes the freeing of these things.
- Kemeng Shi has added more userspace-visible writeback
instrumentation in the series "Improve visibility of writeback".
- Kemeng Shi then sent some maintenance work on top in the series
"Fix and cleanups to page-writeback".
- Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
the series "Improve anon_vma scalability for anon VMAs". Intel's
test bot reported an improbable 3x improvement in one test.
- SeongJae Park adds some DAMON feature work in the series
"mm/damon: add a DAMOS filter type for page granularity access recheck"
"selftests/damon: add DAMOS quota goal test"
- Also some maintenance work in the series
"mm/damon/paddr: simplify page level access re-check for pageout"
"mm/damon: misc fixes and improvements"
- David Hildenbrand has disabled some known-to-fail selftests ni the
series "selftests: mm: cow: flag vmsplice() hugetlb tests as
XFAIL".
- memcg metadata storage optimizations from Shakeel Butt in "memcg:
reduce memory consumption by memcg stats".
- DAX fixes and maintenance work from Vishal Verma in the series
"dax/bus.c: Fixups for dax-bus locking""
* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
selftests: cgroup: add tests to verify the zswap writeback path
mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
mm/damon/core: fix return value from damos_wmark_metric_value
mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
selftests: cgroup: remove redundant enabling of memory controller
Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
Docs/mm/damon/design: use a list for supported filters
Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
selftests/damon: classify tests for functionalities and regressions
selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
selftests/damon: add a test for DAMOS quota goal
...
- Avoid 'constexpr', which is a keyword in C23
- Allow 'dtbs_check' and 'dt_compatible_check' run independently of
'dt_binding_check'
- Fix weak references to avoid GOT entries in position-independent
code generation
- Convert the last use of 'optional' property in arch/sh/Kconfig
- Remove support for the 'optional' property in Kconfig
- Remove support for Clang's ThinLTO caching, which does not work with
the .incbin directive
- Change the semantics of $(src) so it always points to the source
directory, which fixes Makefile inconsistencies between upstream and
downstream
- Fix 'make tar-pkg' for RISC-V to produce a consistent package
- Provide reasonable default coverage for objtool, sanitizers, and
profilers
- Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc.
- Remove the last use of tristate choice in drivers/rapidio/Kconfig
- Various cleanups and fixes in Kconfig
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmZFlGcVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsG8voQALC8NtFpduWVfLRj2Qg6Ll/xf1vX
2igcTJEOFHkeqXLGoT8dTDKLEipUBUvKyguPq66CGwVTe2g6zy/nUSXeVtFrUsIa
msLTi8FqhqUo5lodNvGMRf8qqmuqcvnXoiQwIocF92jtsFy14bhiFY+n4HfcFNjj
GOKwqBZYQUwY/VVb090efc7RfS9c7uwABJSBelSoxg3AGZriwjGy7Pw5aSKGgVYi
inqL1eR6qwPP6z7CgQWM99soP+zwybFZmnQrsD9SniRBI4rtAat8Ih5jQFaSUFUQ
lk2w0NQBRFN88/uR2IJ2GWuIlQ74WeJ+QnCqVuQ59tV5zw90wqSmLzngfPD057Dv
JjNuhk0UyXVtpIg3lRtd4810ppNSTe33b9OM4O2H846W/crju5oDRNDHcflUXcwm
Rmn5ho1rb5QVzDVejJbgwidnUInSgJ9PZcvXQ/RJVZPhpgsBzAY9pQexG1G3hviw
y9UDrt6KP6bF9tHjmolmtdIes9Pj0c4dN6/Rdj4HS4hIQ/GDar0tnwvOvtfUctNL
orJlBsA6GeMmDVXKkR0ytOCWRYqWWbyt8g70RVKQJfuHX7/hGyAQPaQ2/u4mQhC2
aevYfbNJMj0VDfGz81HDBKFtkc5n+Ite8l157dHEl2LEabkOkRdNVcn7SNbOvZmd
ZCSnZ31h7woGfNho
=D5B/
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Avoid 'constexpr', which is a keyword in C23
- Allow 'dtbs_check' and 'dt_compatible_check' run independently of
'dt_binding_check'
- Fix weak references to avoid GOT entries in position-independent code
generation
- Convert the last use of 'optional' property in arch/sh/Kconfig
- Remove support for the 'optional' property in Kconfig
- Remove support for Clang's ThinLTO caching, which does not work with
the .incbin directive
- Change the semantics of $(src) so it always points to the source
directory, which fixes Makefile inconsistencies between upstream and
downstream
- Fix 'make tar-pkg' for RISC-V to produce a consistent package
- Provide reasonable default coverage for objtool, sanitizers, and
profilers
- Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc.
- Remove the last use of tristate choice in drivers/rapidio/Kconfig
- Various cleanups and fixes in Kconfig
* tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (46 commits)
kconfig: use sym_get_choice_menu() in sym_check_prop()
rapidio: remove choice for enumeration
kconfig: lxdialog: remove initialization with A_NORMAL
kconfig: m/nconf: merge two item_add_str() calls
kconfig: m/nconf: remove dead code to display value of bool choice
kconfig: m/nconf: remove dead code to display children of choice members
kconfig: gconf: show checkbox for choice correctly
kbuild: use GCOV_PROFILE and KCSAN_SANITIZE in scripts/Makefile.modfinal
Makefile: remove redundant tool coverage variables
kbuild: provide reasonable defaults for tool coverage
modules: Drop the .export_symbol section from the final modules
kconfig: use menu_list_for_each_sym() in sym_check_choice_deps()
kconfig: use sym_get_choice_menu() in conf_write_defconfig()
kconfig: add sym_get_choice_menu() helper
kconfig: turn defaults and additional prompt for choice members into error
kconfig: turn missing prompt for choice members into error
kconfig: turn conf_choice() into void function
kconfig: use linked list in sym_set_changed()
kconfig: gconf: use MENU_CHANGED instead of SYMBOL_CHANGED
kconfig: gconf: remove debug code
...
Commit 1a7d0890dd4a ("kprobe/ftrace: bail out if ftrace was killed")
introduced a bad K&R function definition, which we haven't accepted in a
long long time.
Gcc seems to let it slide, but clang notices with the appropriate error:
kernel/kprobes.c:1140:24: error: a function declaration without a prototype is deprecated in all >
1140 | void kprobe_ftrace_kill()
| ^
| void
but this commit was apparently never in linux-next before it was sent
upstream, so it didn't get the appropriate build test coverage.
Fixes: 1a7d0890dd4a kprobe/ftrace: bail out if ftrace was killed
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
the rest is unremarkable.
Current release - regressions:
- virtio_net: fix missed error path rtnl_unlock after control queue
locking rework
Current release - new code bugs:
- bpf: fix KASAN slab-out-of-bounds in percpu_array_map_gen_lookup,
caused by missing nested map handling
- drv: dsa: correct initialization order for KSZ88x3 ports
Previous releases - regressions:
- af_packet: do not call packet_read_pending() from tpacket_destruct_skb()
fix performance regression
- ipv6: fix route deleting failure when metric equals 0, don't assume
0 means not set / default in this case
Previous releases - always broken:
- bridge: couple of syzbot-driven fixes
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmZHtJQACgkQMUZtbf5S
Irsfyw//ZhCFzvXKLENNalHHMXwq7DsuXe6KWlljEOzLHH0/9QqqNC9ggYKRI5OE
unB//YC3sqtAUczQnED+UOh553Pu4Kvq9334LTX5m4HJQTYLLq1aGM/UZplsBTHx
3MsXUApYFth8pqCZvIcKOZcOddeViBfzEQ9jEAsgIyaqFy3XaiH4Zf6pJAAMyUbE
19CRiK/1TYNrX01XPOeV/9vYGj9rzepo6S5zpHKsWsFZArCcRPBsea/KWYYfLjW7
ExA2Cb+eUnPkNL4bTeH6dwgQGVL8Jo/OsKmsa/tdQffnj1pshdePXtP3TBEynMJF
jSSwwUMq56yE+uok4karE3wIhciUEYvTwfgt5FErYVqfqDiX1+7AZGtdZVDX/mgH
F0etKHDhX9F1zxHVMFwOMA4rLN6cvfpe7Pg+dt4B9E0o18SyNekOM1Ngdu/1ALtd
QV41JFHweHInDRrMLdj4aWW4EYPR5SUuvg66Pec4T7x5hAAapzIJySS+RIydC+ND
guPztYxO5cn5Q7kug1FyUBSXUXZxuCNRACb6zD4/4wbVRZhz7l3OTcd13QADCiwv
Tr61r2bS1Bp/HZ3iIHBY85JnKMvpdwNXN2SPsYQQwVrv9FLj9iskH9kjwqVDG4ja
W3ivZZM+CcZbnB81JynK7Ge54PT+SiPy3Nw4RIVxFl1QlzXC21E=
=7eys
-----END PGP SIGNATURE-----
Merge tag 'net-6.10-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Current release - regressions:
- virtio_net: fix missed error path rtnl_unlock after control queue
locking rework
Current release - new code bugs:
- bpf: fix KASAN slab-out-of-bounds in percpu_array_map_gen_lookup,
caused by missing nested map handling
- drv: dsa: correct initialization order for KSZ88x3 ports
Previous releases - regressions:
- af_packet: do not call packet_read_pending() from
tpacket_destruct_skb() fix performance regression
- ipv6: fix route deleting failure when metric equals 0, don't assume
0 means not set / default in this case
Previous releases - always broken:
- bridge: couple of syzbot-driven fixes"
* tag 'net-6.10-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (30 commits)
selftests: net: local_termination: annotate the expected failures
net: dsa: microchip: Correct initialization order for KSZ88x3 ports
MAINTAINERS: net: Update reviewers for TI's Ethernet drivers
dt-bindings: net: ti: Update maintainers list
l2tp: fix ICMP error handling for UDP-encap sockets
net: txgbe: fix to control VLAN strip
net: wangxun: match VLAN CTAG and STAG features
net: wangxun: fix to change Rx features
af_packet: do not call packet_read_pending() from tpacket_destruct_skb()
virtio_net: Fix missed rtnl_unlock
netrom: fix possible dead-lock in nr_rt_ioctl()
idpf: don't skip over ethtool tcp-data-split setting
dt-bindings: net: qcom: ethernet: Allow dma-coherent
bonding: fix oops during rmmod
net/ipv6: Fix route deleting failure when metric equals 0
selftests/net: reduce xfrm_policy test time
selftests/bpf: Adjust btf_dump test to reflect recent change in file_operations
selftests/bpf: Adjust test_access_variable_array after a kernel function name change
selftests/net/lib: no need to record ns name if it already exist
net: qrtr: ns: Fix module refcnt
...
- Minor update to the user_events interface
The ABI of creating a user event states that the fields
are separated by semicolons, and spaces should be ignored.
But the parsing expected at least one space to be there (which was incorrect).
Fix the reading of the string to handle fields separated by
semicolons but no space between them.
This does extend the API sightly as now "field;field" will now be
parsed and not cause an error. But it should not cause any regressions
as no logic should expect it to fail.
Note, that the logic that parses the event fields to create the
trace_event works with no spaces after the semi-colon. It is
the logic that tests against existing events that is inconsistent.
This causes registering an event without using spaces to succeed
if it doesn't exist, but makes the same call that tries to register
to the same event, but doesn't use spaces, fail.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZkZN1hQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qvCXAQDO8b2GeCuAMa2SW7PMFdpB2Tc2F5v4
WPBEKaLb0TU+7AEAwR0rCm22p9rpke754lcpZDz7xJNcyiyMkyXeJWCauQA=
=PYwP
-----END PGP SIGNATURE-----
Merge tag 'trace-user-events-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing user-event updates from Steven Rostedt:
- Minor update to the user_events interface
The ABI of creating a user event states that the fields are separated
by semicolons, and spaces should be ignored.
But the parsing expected at least one space to be there (which was
incorrect). Fix the reading of the string to handle fields separated
by semicolons but no space between them.
This does extend the API sightly as now "field;field" will now be
parsed and not cause an error. But it should not cause any regressions
as no logic should expect it to fail.
Note, that the logic that parses the event fields to create the
trace_event works with no spaces after the semi-colon. It is
the logic that tests against existing events that is inconsistent.
This causes registering an event without using spaces to succeed
if it doesn't exist, but makes the same call that tries to register
to the same event, but doesn't use spaces, fail.
* tag 'trace-user-events-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
selftests/user_events: Add non-spacing separator check
tracing/user_events: Fix non-spaced field matching
- Add ring_buffer memory mappings
The tracing ring buffer was created based on being mostly used with the
splice system call. It is broken up into page ordered sub-buffers and the
reader swaps a new sub-buffer with an existing sub-buffer that's part
of the write buffer. It then has total access to the swapped out
sub-buffer and can do copyless movements of the memory into other mediums
(file system, network, etc).
The buffer is great for passing around the ring buffer contents in the
kernel, but is not so good for when the consumer is the user space task
itself.
A new interface is added that allows user space to memory map the ring
buffer. It will get all the write sub-buffers as well as reader sub-buffer
(that is not written to). It can send an ioctl to change which sub-buffer
is the new reader sub-buffer.
The ring buffer is read only to user space. It only needs to call the
ioctl when it is finished with a sub-buffer and needs a new sub-buffer
that the writer will not write over.
A self test program was also created for testing and can be used as
an example for the interface to user space. The libtracefs (external
to the kernel) also has code that interacts with this, although it is
disabled until the interface is in a official release. It can be enabled
by compiling the library with a special flag. This was used for testing
applications that perform better with the buffer being mapped.
Memory mapped buffers have limitations. The main one is that it can not be
used with the snapshot logic. If the buffer is mapped, snapshots will be
disabled. If any logic is set to trigger snapshots on a buffer, that
buffer will not be allowed to be mapped.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZkYzDRQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qttNAQCj3I0OpeI1vms85ShIa7Eha2qes5uC
Yml2fnapkmRSwAEAp5UTGxtDctycWOk9B9PA7/oJmLgATaQwRKoEeTUwfAA=
=TyEB
-----END PGP SIGNATURE-----
Merge tag 'trace-ringbuffer-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing ring buffer updates from Steven Rostedt:
"Add ring_buffer memory mappings.
The tracing ring buffer was created based on being mostly used with
the splice system call. It is broken up into page ordered sub-buffers
and the reader swaps a new sub-buffer with an existing sub-buffer
that's part of the write buffer. It then has total access to the
swapped out sub-buffer and can do copyless movements of the memory
into other mediums (file system, network, etc).
The buffer is great for passing around the ring buffer contents in the
kernel, but is not so good for when the consumer is the user space
task itself.
A new interface is added that allows user space to memory map the ring
buffer. It will get all the write sub-buffers as well as reader
sub-buffer (that is not written to). It can send an ioctl to change
which sub-buffer is the new reader sub-buffer.
The ring buffer is read only to user space. It only needs to call the
ioctl when it is finished with a sub-buffer and needs a new sub-buffer
that the writer will not write over.
A self test program was also created for testing and can be used as an
example for the interface to user space. The libtracefs (external to
the kernel) also has code that interacts with this, although it is
disabled until the interface is in a official release. It can be
enabled by compiling the library with a special flag. This was used
for testing applications that perform better with the buffer being
mapped.
Memory mapped buffers have limitations. The main one is that it can
not be used with the snapshot logic. If the buffer is mapped,
snapshots will be disabled. If any logic is set to trigger snapshots
on a buffer, that buffer will not be allowed to be mapped"
* tag 'trace-ringbuffer-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ring-buffer: Add cast to unsigned long addr passed to virt_to_page()
ring-buffer: Have mmapped ring buffer keep track of missed events
ring-buffer/selftest: Add ring-buffer mapping test
Documentation: tracing: Add ring-buffer mapping
tracing: Allow user-space mapping of the ring-buffer
ring-buffer: Introducing ring-buffer mapping functions
ring-buffer: Allocate sub-buffers with __GFP_COMP
- Removed unused ftrace_direct_funcs variables
- Fix a possible NULL pointer dereference race in eventfs
- Update do_div() usage in trace event benchmark test
- Speedup direct function registration with asynchronous RCU callback.
The synchronization was done in the registration code and this
caused delays when registering direct callbacks. Move the freeing
to a call_rcu() that will prevent delaying of the registering.
- Replace simple_strtoul() usage with kstrtoul()
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZkYrphQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qnNbAP0TCG5dLbHlcUtXFCG3AdOufOteyJZ4
efbRjFq0QY/RvQD7Bh1BNLSBsG0ptKPC7ch377A55xsgxZTr0mEarVTOQwg=
=GKXv
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
- Remove unused ftrace_direct_funcs variables
- Fix a possible NULL pointer dereference race in eventfs
- Update do_div() usage in trace event benchmark test
- Speedup direct function registration with asynchronous RCU callback.
The synchronization was done in the registration code and this caused
delays when registering direct callbacks. Move the freeing to a
call_rcu() that will prevent delaying of the registering.
- Replace simple_strtoul() usage with kstrtoul()
* tag 'trace-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Fix a possible null pointer dereference in eventfs_find_events()
ftrace: Fix possible use-after-free issue in ftrace_location()
ftrace: Remove unused global 'ftrace_direct_func_count'
ftrace: Remove unused list 'ftrace_direct_funcs'
tracing: Improve benchmark test performance by using do_div()
ftrace: Use asynchronous grace period for register_ftrace_direct()
ftrace: Replaces simple_strtoul in ftrace
- tracing/probes: Adding new pseudo-types %pd and %pD support for dumping
dentry name from 'struct dentry *' and file name from 'struct file *'.
- uprobes: Some performance optimizations have been done.
. Speed up the BPF uprobe event by delaying the fetching of the uprobe
event arguments that are not used in BPF.
. Avoid locking by speculatively checking whether uprobe event is valid.
. Reduce lock contention by using read/write_lock instead of spinlock for
uprobe list operation. This improved BPF uprobe benchmark result 43% on
average.
- rethook: Removes non-fatal warning messages when tracing stack from BPF
and skip rcu_is_watching() validation in rethook if possible.
- objpool: Optimizing objpool (which is used by kretprobes and fprobe as
rethook backend storage) by inlining functions and avoid caching nr_cpu_ids
because it is a const value.
- fprobe: Add entry/exit callbacks types (code cleanup)
- kprobes: Check ftrace was killed in kprobes if it uses ftrace.
-----BEGIN PGP SIGNATURE-----
iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmZFUxsbHG1hc2FtaS5o
aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8b+fIH/A96/SeC5WRLhXmHfTCM
IvKUea2n0b0oV/2pVfHqfkCBTICuUZ97Opd9VH9jLtjBOTh0fUOGZ2DNVGdSYfWm
IIkS5dhuZxHXrSHEVYykwLHI3AOL7Q6Ny9EmOg1CNMidUkPMNtBvppsBYPlFU/B/
qQJAvOdkVOnNITCaas0+MNgepoVVKdJzdNQ1I4WrGyG8isCZBaCYKo2QcGyheCNN
y8NXvnVHgmgHQ8nTaeE5AawclFzFnhwHfPQPe1kiyGrx15b8K+VYmaZxPKv33A1a
KT3TKJ1Ep7s7iWFh2iPVJzIwOXCmSnvNTKfNx/MDuKtO7UVfFwytoMEaekbmv3bG
VqM=
=n/mW
-----END PGP SIGNATURE-----
Merge tag 'probes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes updates from Masami Hiramatsu:
- tracing/probes: Add new pseudo-types %pd and %pD support for dumping
dentry name from 'struct dentry *' and file name from 'struct file *'
- uprobes performance optimizations:
- Speed up the BPF uprobe event by delaying the fetching of the
uprobe event arguments that are not used in BPF
- Avoid locking by speculatively checking whether uprobe event is
valid
- Reduce lock contention by using read/write_lock instead of
spinlock for uprobe list operation. This improved BPF uprobe
benchmark result 43% on average
- rethook: Remove non-fatal warning messages when tracing stack from
BPF and skip rcu_is_watching() validation in rethook if possible
- objpool: Optimize objpool (which is used by kretprobes and fprobe as
rethook backend storage) by inlining functions and avoid caching
nr_cpu_ids because it is a const value
- fprobe: Add entry/exit callbacks types (code cleanup)
- kprobes: Check ftrace was killed in kprobes if it uses ftrace
* tag 'probes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
kprobe/ftrace: bail out if ftrace was killed
selftests/ftrace: Fix required features for VFS type test case
objpool: cache nr_possible_cpus() and avoid caching nr_cpu_ids
objpool: enable inlining objpool_push() and objpool_pop() operations
rethook: honor CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING in rethook_try_get()
ftrace: make extra rcu_is_watching() validation check optional
uprobes: reduce contention on uprobes_tree access
rethook: Remove warning messages printed for finding return address of a frame.
fprobe: Add entry/exit callbacks types
selftests/ftrace: add fprobe test cases for VFS type "%pd" and "%pD"
selftests/ftrace: add kprobe test cases for VFS type "%pd" and "%pD"
Documentation: tracing: add new type '%pd' and '%pD' for kprobe
tracing/probes: support '%pD' type for print struct file's name
tracing/probes: support '%pd' type for print struct dentry's name
uprobes: add speculative lockless system-wide uprobe filter check
uprobes: prepare uprobe args buffer lazily
uprobes: encapsulate preparation of uprobe args buffer
Summary
* Removed sentinel elements from ctl_table structs in kernel/*
Removing sentinels in ctl_table arrays reduces the build time size and
runtime memory consumed by ~64 bytes per array. Removals for net/, io_uring/,
mm/, ipc/ and security/ are set to go into mainline through their respective
subsystems making the next release the most likely place where the final
series that removes the check for proc_name == NULL will land. This PR adds
to removals already in arch/, drivers/ and fs/.
* Adjusted ctl_table definitions and references to allow constification
Adjustments:
- Removing unused ctl_table function arguments
- Moving non-const elements from ctl_table to ctl_table_header
- Making ctl_table pointers const in ctl_table_root structure
Making the static ctl_table structs const will increase safety by keeping the
pointers to proc_handler functions in .rodata. Though no ctl_tables where
made const in this PR, the ground work for making that possible has started
with these changes sent by Thomas Weißschuh.
Testing
* These changes went into linux-next after v6.9-rc4; giving it a good month of
testing.
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmZFvBMACgkQupfNUreW
QU/eGAv9EWeiXKxr3EVSMAsb9MWbJq7C99I/pd5hMf+qH4PgJpKDH7w/sb2e8h8+
unGiW83ikgrtph7OS4/xM3Y9r3Nvzd6C/OztqgMnNKeRFdMgP7wu9HaSNs05ordb
CqJdhvL93quc5HxrGTS9sdLK/wLJWOHwuWMXhX4qS44JNxTdPV2q10Rb7DZyHZ6O
C9qp61L2Q2CrnOBKIx8MoeCh20ynJQAo3b0pTN63ZYF4D0vqCcnYNNTPkge4ID8/
ULJoP5hAQY0vJ4g4fC4Gmooa5GECpm8MfZUf3SdgPyauqM/sm3dVdsLXAWD4Phcp
TsG2a/5KMYwnLHrUGwDW7bFfEemRU88h0Iam56+SKMl1kMlEpWaLL9ApQXoHFayG
e10izS+i/nlQiqYIHtuczCoTimT4/LGnonCLcdA//C3XzBT5MnOd7xsjuaQSpFWl
/CV9SZa4ABwzX7u2jty8ik90iihLCFQyKj1d9m1mDVbgb6r3iUOxVuHBgMtY7MF7
eyaEmV7l
=/rQW
-----END PGP SIGNATURE-----
Merge tag 'sysctl-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl updates from Joel Granados:
- Remove sentinel elements from ctl_table structs in kernel/*
Removing sentinels in ctl_table arrays reduces the build time size
and runtime memory consumed by ~64 bytes per array. Removals for
net/, io_uring/, mm/, ipc/ and security/ are set to go into mainline
through their respective subsystems making the next release the most
likely place where the final series that removes the check for
proc_name == NULL will land.
This adds to removals already in arch/, drivers/ and fs/.
- Adjust ctl_table definitions and references to allow constification
- Remove unused ctl_table function arguments
- Move non-const elements from ctl_table to ctl_table_header
- Make ctl_table pointers const in ctl_table_root structure
Making the static ctl_table structs const will increase safety by
keeping the pointers to proc_handler functions in .rodata. Though no
ctl_tables where made const in this PR, the ground work for making
that possible has started with these changes sent by Thomas
Weißschuh.
* tag 'sysctl-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
sysctl: drop now unnecessary out-of-bounds check
sysctl: move sysctl type to ctl_table_header
sysctl: drop sysctl_is_perm_empty_ctl_table
sysctl: treewide: constify argument ctl_table_root::permissions(table)
sysctl: treewide: drop unused argument ctl_table_root::set_ownership(table)
bpf: Remove the now superfluous sentinel elements from ctl_table array
delayacct: Remove the now superfluous sentinel elements from ctl_table array
kprobes: Remove the now superfluous sentinel elements from ctl_table array
printk: Remove the now superfluous sentinel elements from ctl_table array
scheduler: Remove the now superfluous sentinel elements from ctl_table array
seccomp: Remove the now superfluous sentinel elements from ctl_table array
timekeeping: Remove the now superfluous sentinel elements from ctl_table array
ftrace: Remove the now superfluous sentinel elements from ctl_table array
umh: Remove the now superfluous sentinel elements from ctl_table array
kernel misc: Remove the now superfluous sentinel elements from ctl_table array
- Enable BPF Kernel Functions (kfuncs) in the powerpc BPF JIT.
- Allow per-process DEXCR (Dynamic Execution Control Register) settings via
prctl, notably NPHIE which controls hashst/hashchk for ROP protection.
- Install powerpc selftests in sub-directories. Note this changes the way
run_kselftest.sh needs to be invoked for powerpc selftests.
- Change fadump (Firmware Assisted Dump) to better handle memory add/remove.
- Add support for passing additional parameters to the fadump kernel.
- Add support for updating the kdump image on CPU/memory add/remove events.
- Other small features, cleanups and fixes.
Thanks to: Andrew Donnellan, Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann,
Benjamin Gray, Bjorn Helgaas, Christian Zigotzky, Christophe Jaillet, Christophe
Leroy, Colin Ian King, Cédric Le Goater, Dr. David Alan Gilbert, Erhard Furtner,
Frank Li, GUO Zihua, Ganesh Goudar, Geoff Levand, Ghanshyam Agrawal, Greg Kurz,
Hari Bathini, Joel Stanley, Justin Stitt, Kunwu Chan, Li Yang, Lidong Zhong,
Madhavan Srinivasan, Mahesh Salgaonkar, Masahiro Yamada, Matthias Schiffer,
Naresh Kamboju, Nathan Chancellor, Nathan Lynch, Naveen N Rao, Nicholas
Miehlbradt, Ran Wang, Randy Dunlap, Ritesh Harjani, Sachin Sant, Shirisha Ganta,
Shrikanth Hegde, Sourabh Jain, Stephen Rothwell, sundar, Thorsten Blum, Vaibhav
Jain, Xiaowei Bao, Yang Li, Zhao Chenhui.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmZHLtwTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgCGdD/0cqQkYl6+E0/K68Y7jnAWF+l0LNFlm
/4jZ+zKXPiPhSdaQq4xo2ZjEooUPsm3c+AHidmrAtOMBULvv4pyciu61hrVu4Y2b
aAudkBMUc+i/Lfaz7fq1KnN4LDFVm7xZZ+i/ju9tOBLMpOZ3YZ+YoOGA6nqsshJF
XuB5h0T+H55he1wBpvyyrsUUyss53Mp3IsajxdwBOsUDDp0fSAg8SLEyhoiK3BsQ
EjEa6iEqJSBheqFEXPvqsMuqM3k51CHe/pCOMODjo7P+u/MNrClZUscZKXGB5xq9
Bu3SPxIYfRmU4XE53517faElEPmlxSBrjQGCD1EGEVXGsjn6r7TD6R5voow3SoUq
CLTy90KNNrS1cIqeomu6bJ/anzYrViqTdekImA7Vb+Ol8f+uT9l+l1D75eYOKPQ3
N0AHoa4rnWIb5kjCAjHaZ54O+B2q2tPlQqFUmt+BrvZyKS13zjE36stnArxP3MPC
Xw6y3huX3AkZiJ4mQYRiBn//xGOLwrRCd/EoTDnoe08yq0Hoor6qIm4uEy2Nu3Kf
0mBsEOxMsmQd6NEq43B/sFgVbbxKhAyxfZ9gHqxDQZcgoxXcMesyj/n4+jM5sRYK
zmavLlykM2Tjlh1evs8+e0mCEwDjDn2GRlqstJQTrmnGhbMKi3jvw9I7gGtZVqbS
kAflTXzsIXvxBA==
=GoCV
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Enable BPF Kernel Functions (kfuncs) in the powerpc BPF JIT.
- Allow per-process DEXCR (Dynamic Execution Control Register) settings
via prctl, notably NPHIE which controls hashst/hashchk for ROP
protection.
- Install powerpc selftests in sub-directories. Note this changes the
way run_kselftest.sh needs to be invoked for powerpc selftests.
- Change fadump (Firmware Assisted Dump) to better handle memory
add/remove.
- Add support for passing additional parameters to the fadump kernel.
- Add support for updating the kdump image on CPU/memory add/remove
events.
- Other small features, cleanups and fixes.
Thanks to Andrew Donnellan, Andy Shevchenko, Aneesh Kumar K.V, Arnd
Bergmann, Benjamin Gray, Bjorn Helgaas, Christian Zigotzky, Christophe
Jaillet, Christophe Leroy, Colin Ian King, Cédric Le Goater, Dr. David
Alan Gilbert, Erhard Furtner, Frank Li, GUO Zihua, Ganesh Goudar, Geoff
Levand, Ghanshyam Agrawal, Greg Kurz, Hari Bathini, Joel Stanley, Justin
Stitt, Kunwu Chan, Li Yang, Lidong Zhong, Madhavan Srinivasan, Mahesh
Salgaonkar, Masahiro Yamada, Matthias Schiffer, Naresh Kamboju, Nathan
Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Miehlbradt, Ran Wang,
Randy Dunlap, Ritesh Harjani, Sachin Sant, Shirisha Ganta, Shrikanth
Hegde, Sourabh Jain, Stephen Rothwell, sundar, Thorsten Blum, Vaibhav
Jain, Xiaowei Bao, Yang Li, and Zhao Chenhui.
* tag 'powerpc-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (85 commits)
powerpc/fadump: Fix section mismatch warning
powerpc/85xx: fix compile error without CONFIG_CRASH_DUMP
powerpc/fadump: update documentation about bootargs_append
powerpc/fadump: pass additional parameters when fadump is active
powerpc/fadump: setup additional parameters for dump capture kernel
powerpc/pseries/fadump: add support for multiple boot memory regions
selftests/powerpc/dexcr: Fix spelling mistake "predicition" -> "prediction"
KVM: PPC: Book3S HV nestedv2: Fix an error handling path in gs_msg_ops_kvmhv_nestedv2_config_fill_info()
KVM: PPC: Fix documentation for ppc mmu caps
KVM: PPC: code cleanup for kvmppc_book3s_irqprio_deliver
KVM: PPC: Book3S HV nestedv2: Cancel pending DEC exception
powerpc/xmon: Check cpu id in commands "c#", "dp#" and "dx#"
powerpc/code-patching: Use dedicated memory routines for patching
powerpc/code-patching: Test patch_instructions() during boot
powerpc64/kasan: Pass virtual addresses to kasan_init_phys_region()
powerpc: rename SPRN_HID2 define to SPRN_HID2_750FX
powerpc: Fix typos
powerpc/eeh: Fix spelling of the word "auxillary" and update comment
macintosh/ams: Fix unused variable warning
powerpc/Makefile: Remove bits related to the previous use of -mcmodel=large
...
In the cgroup v2 CPU subsystem, assuming we have a
cgroup named 'test', and we set cpu.max and cpu.max.burst:
# echo 1000000 > /sys/fs/cgroup/test/cpu.max
# echo 1000000 > /sys/fs/cgroup/test/cpu.max.burst
then we check cpu.max and cpu.max.burst:
# cat /sys/fs/cgroup/test/cpu.max
1000000 100000
# cat /sys/fs/cgroup/test/cpu.max.burst
1000000
Next we set cpu.max again and check cpu.max and
cpu.max.burst:
# echo 2000000 > /sys/fs/cgroup/test/cpu.max
# cat /sys/fs/cgroup/test/cpu.max
2000000 100000
# cat /sys/fs/cgroup/test/cpu.max.burst
1000
... we find that the cpu.max.burst value changed unexpectedly.
In cpu_max_write(), the unit of the burst value returned
by tg_get_cfs_burst() is microseconds, while in cpu_max_write(),
the burst unit used for calculation should be nanoseconds,
which leads to the bug.
To fix it, get the burst value directly from tg->cfs_bandwidth.burst.
Fixes: f4183717b370 ("sched/fair: Introduce the burstable CFS controller")
Reported-by: Qixin Liao <liaoqixin@huawei.com>
Signed-off-by: Cheng Yu <serein.chengyu@huawei.com>
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20240424132438.514720-1-serein.chengyu@huawei.com
On 05/03/2024 15:05, Vincent Guittot wrote:
I'm fine with either and that was my first thought here, too, but it did seem like
the comment was mostly placed there to justify the 'unexpected' high utilization
when explicitly passing FREQUENCY_UTIL and the need to clamp it then.
So removing did feel slightly more natural to me anyway.
So alternatively:
From: Christian Loehle <christian.loehle@arm.com>
Date: Tue, 5 Mar 2024 09:34:41 +0000
Subject: [PATCH] sched/fair: Remove stale FREQUENCY_UTIL mention
effective_cpu_util() flags were removed, so remove mentioning of the
flag.
commit 9c0b4bb7f6303 ("sched/cpufreq: Rework schedutil governor performance estimation")
reworked effective_cpu_util() removing enum cpu_util_type. Modify the
comment accordingly.
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/0e2833ee-0939-44e0-82a2-520a585a0153@arm.com