41256 Commits

Author SHA1 Message Date
Linus Torvalds
d3feaff4d9 Char/Misc driver fixes for 6.2-rc7
Here are a number of small char/misc/whatever driver fixes for 6.2-rc7.
 They include:
   - IIO driver fixes for some reported problems
   - nvmem driver fixes
   - fpga driver fixes
   - debugfs memory leak fix in the hv_balloon and irqdomain code
     (irqdomain change was acked by the maintainer.)
 
 All have been in linux-next with no reported problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY9+Xkw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yl92QCfSTATuad8nUXmtBDsveUyAV3G6pkAoJFMWAIj
 otmAl9XOGWxdwAURdovs
 =tSdo
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are a number of small char/misc/whatever driver fixes. They
  include:

   - IIO driver fixes for some reported problems

   - nvmem driver fixes

   - fpga driver fixes

   - debugfs memory leak fix in the hv_balloon and irqdomain code
     (irqdomain change was acked by the maintainer)

  All have been in linux-next with no reported problems"

* tag 'char-misc-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (33 commits)
  kernel/irq/irqdomain.c: fix memory leak with using debugfs_lookup()
  HV: hv_balloon: fix memory leak with using debugfs_lookup()
  nvmem: qcom-spmi-sdam: fix module autoloading
  nvmem: core: fix return value
  nvmem: core: fix cell removal on error
  nvmem: core: fix device node refcounting
  nvmem: core: fix registration vs use race
  nvmem: core: fix cleanup after dev_set_name()
  nvmem: core: remove nvmem_config wp_gpio
  nvmem: core: initialise nvmem->id early
  nvmem: sunxi_sid: Always use 32-bit MMIO reads
  nvmem: brcm_nvram: Add check for kzalloc
  iio: imu: fxos8700: fix MAGN sensor scale and unit
  iio: imu: fxos8700: remove definition FXOS8700_CTRL_ODR_MIN
  iio: imu: fxos8700: fix failed initialization ODR mode assignment
  iio: imu: fxos8700: fix incorrect ODR mode readback
  iio: light: cm32181: Fix PM support on system with 2 I2C resources
  iio: hid: fix the retval in gyro_3d_capture_sample
  iio: hid: fix the retval in accel_3d_capture_sample
  iio: imu: st_lsm6dsx: fix build when CONFIG_IIO_TRIGGERED_BUFFER=m
  ...
2023-02-05 11:52:23 -08:00
Linus Torvalds
de506eec89 - Lock the proper critical section when dealing with perf event context
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPfdlAACgkQEsHwGGHe
 VUrG1BAAvbH5AHHgjiF2WkfPpJ7v4/GhYerks/YTq3uKgnAtCOnsDBS18oRVj63A
 iDy6VzZUOQ3NcarJoz+eGLjSnLQ4xZY9qm42uHVGKol1Nz9Weu2loIOOSUsINe7S
 6qNE6HASM4GHUGJ1uuMxnOt0I0o8d01Eo9ZPd6ieAsmsGc4GLNOgC+h8eKDDlvOz
 gSTWzQUF29DSIY2JVyZ9lc5pIZ6E+gHnjIUjPdwAbYSgMpjGekNFn/OTkB4ly5G4
 ehoXUudTHG/fXQ0fKXmQt4aGbJaplVxf86f/9hpuCaHP8/48Zq/eNf5udNrlhzVU
 HAkpZcWomtGIeu+y5dyXsh1jm3tQOc5MCSV/LI7+pVl/5jMMn48lyL7HT8K2gJzd
 XNFrO1KxE0Sk3d1CZKgBXjLSaV5ey8uphlpAEQpbv7zbEYlInpo+SGvUmapCkyYp
 JFNDK7cCmP1vSaS4DkYbK3YxiGfWgbN/o7tRAFO8yHRl/yjsjNqz0BESpM8AsDz6
 UbrluPbjfbkV4HYXEXHlKg+qfgUX4qaTHNNk1m2JUVkRvVgwF5aFEBrZ6IVtNT9S
 8KXrOfjXruRSWtcJP9pIeMN/d4Uq7ldkcRHu/yyHHTJqifYk8z8jT/kGs2AQqecO
 Thh7Iruu3b6HUz2nLRmdeBIRsZn6oAqI+vNLs42l7og2BjQ4QU0=
 =Yway
 -----END PGP SIGNATURE-----

Merge tag 'perf_urgent_for_v6.2_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fix from Borislav Petkov:

 - Lock the proper critical section when dealing with perf event context

* tag 'perf_urgent_for_v6.2_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix perf_event_pmu_context serialization
2023-02-05 11:03:56 -08:00
Anup Patel
835a486cd9 genirq: Add mechanism to multiplex a single HW IPI
All RISC-V platforms have a single HW IPI provided by the INTC local
interrupt controller. The HW method to trigger INTC IPI can be through
external irqchip (e.g. RISC-V AIA), through platform specific device
(e.g. SiFive CLINT timer), or through firmware (e.g. SBI IPI call).

To support multiple IPIs on RISC-V, add a generic IPI multiplexing
mechanism which help us create multiple virtual IPIs using a single
HW IPI. This generic IPI multiplexing is inspired by the Apple AIC
irqchip driver and it is shared by various RISC-V irqchip drivers.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Hector Martin <marcan@marcan.st>
Tested-by: Hector Martin <marcan@marcan.st>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230103141221.772261-4-apatel@ventanamicro.com
2023-02-05 10:57:55 +00:00
Christoph Hellwig
f05837ed73 blk-cgroup: store a gendisk to throttle in struct task_struct
Switch from a request_queue pointer and reference to a gendisk once
for the throttle information in struct task_struct.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Link: https://lore.kernel.org/r/20230203150400.3199230-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03 08:20:05 -07:00
Song Liu
0c05e7bd2d livepatch,x86: Clear relocation targets on a module removal
Josh reported a bug:

  When the object to be patched is a module, and that module is
  rmmod'ed and reloaded, it fails to load with:

  module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
  livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
  livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'

  The livepatch module has a relocation which references a symbol
  in the _previous_ loading of nfsd. When apply_relocate_add()
  tries to replace the old relocation with a new one, it sees that
  the previous one is nonzero and it errors out.

He also proposed three different solutions. We could remove the error
check in apply_relocate_add() introduced by commit eda9cec4c9a1
("x86/module: Detect and skip invalid relocations"). However the check
is useful for detecting corrupted modules.

We could also deny the patched modules to be removed. If it proved to be
a major drawback for users, we could still implement a different
approach. The solution would also complicate the existing code a lot.

We thus decided to reverse the relocation patching (clear all relocation
targets on x86_64). The solution is not
universal and is too much arch-specific, but it may prove to be simpler
in the end.

Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
Originally-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Joe Lawrence <joe.lawrence@redhat.com>
Tested-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230125185401.279042-2-song@kernel.org
2023-02-03 11:28:22 +01:00
Greg Kroah-Hartman
55bf243c51 kernel/printk/index.c: fix memory leak with using debugfs_lookup()
When calling debugfs_lookup() the result must have dput() called on it,
otherwise the memory will leak over time.  To make things simpler, just
call debugfs_lookup_and_remove() instead which handles all of the logic
at once.

Cc: Chris Down <chris@chrisdown.name>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230202151411.2308576-1-gregkh@linuxfoundation.org
2023-02-03 10:42:02 +01:00
Ricardo Ribalda
a42aaad2e4 kexec: introduce sysctl parameters kexec_load_limit_*
kexec allows replacing the current kernel with a different one.  This is
usually a source of concerns for sysadmins that want to harden a system.

Linux already provides a way to disable loading new kexec kernel via
kexec_load_disabled, but that control is very coard, it is all or nothing
and does not make distinction between a panic kexec and a normal kexec.

This patch introduces new sysctl parameters, with finer tuning to specify
how many times a kexec kernel can be loaded.  The sysadmin can set
different limits for kexec panic and kexec reboot kernels.  The value can
be modified at runtime via sysctl, but only with a stricter value.

With these new parameters on place, a system with loadpin and verity
enabled, using the following kernel parameters:
sysctl.kexec_load_limit_reboot=0 sysct.kexec_load_limit_panic=1 can have a
good warranty that if initrd tries to load a panic kernel, a malitious
user will have small chances to replace that kernel with a different one,
even if they can trigger timeouts on the disk where the panic kernel
lives.

Link: https://lkml.kernel.org/r/20221114-disable-kexec-reset-v6-3-6a8531a09b9a@chromium.org
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Philipp Rudo <prudo@redhat.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02 22:50:05 -08:00
Ricardo Ribalda
7e99f8b69c kexec: factor out kexec_load_permitted
Both syscalls (kexec and kexec_file) do the same check, let's factor it
out.

Link: https://lkml.kernel.org/r/20221114-disable-kexec-reset-v6-2-6a8531a09b9a@chromium.org
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Guilherme G. Piccoli <gpiccoli@igalia.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Philipp Rudo <prudo@redhat.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02 22:50:04 -08:00
Randy Dunlap
e227db4d4f userns: fix a struct's kernel-doc notation
Use the 'struct' keyword for a struct's kernel-doc notation to avoid a
kernel-doc warning:

kernel/user_namespace.c:232: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * idmap_key struct holds the information necessary to find an idmapping in a

Link: https://lkml.kernel.org/r/20230108021243.16683-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02 22:50:04 -08:00
Zqiang
eb79fa7ea7 kthread_worker: check all delayed works when destroy kthread worker
When destroying a kthread worker warn if there are still some pending
delayed works.  This indicates that the caller should clear all pending
delayed works before destroying the kthread worker.

Link: https://lkml.kernel.org/r/20230104144230.938521-1-qiang1.zhang@intel.com
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02 22:50:02 -08:00
Greg Kroah-Hartman
d83d7ed260 kernel/irq/irqdomain.c: fix memory leak with using debugfs_lookup()
When calling debugfs_lookup() the result must have dput() called on it,
otherwise the memory will leak over time.  To make things simpler, just
call debugfs_lookup_and_remove() instead which handles all of the logic
at once.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable <stable@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230202151554.2310273-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-03 07:45:46 +01:00
Joey Gouly
b507808ebc mm: implement memory-deny-write-execute as a prctl
Patch series "mm: In-kernel support for memory-deny-write-execute (MDWE)",
v2.

The background to this is that systemd has a configuration option called
MemoryDenyWriteExecute [2], implemented as a SECCOMP BPF filter.  Its aim
is to prevent a user task from inadvertently creating an executable
mapping that is (or was) writeable.  Since such BPF filter is stateless,
it cannot detect mappings that were previously writeable but subsequently
changed to read-only.  Therefore the filter simply rejects any
mprotect(PROT_EXEC).  The side-effect is that on arm64 with BTI support
(Branch Target Identification), the dynamic loader cannot change an ELF
section from PROT_EXEC to PROT_EXEC|PROT_BTI using mprotect().  For
libraries, it can resort to unmapping and re-mapping but for the main
executable it does not have a file descriptor.  The original bug report in
the Red Hat bugzilla - [3] - and subsequent glibc workaround for libraries
- [4].

This series adds in-kernel support for this feature as a prctl
PR_SET_MDWE, that is inherited on fork().  The prctl denies PROT_WRITE |
PROT_EXEC mappings.  Like the systemd BPF filter it also denies adding
PROT_EXEC to mappings.  However unlike the BPF filter it only denies it if
the mapping didn't previous have PROT_EXEC.  This allows to PROT_EXEC ->
PROT_EXEC | PROT_BTI with mprotect(), which is a problem with the BPF
filter.


This patch (of 2):

The aim of such policy is to prevent a user task from creating an
executable mapping that is also writeable.

An example of mmap() returning -EACCESS if the policy is enabled:

	mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, 0, 0);

Similarly, mprotect() would return -EACCESS below:

	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
	mprotect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);

The BPF filter that systemd MDWE uses is stateless, and disallows
mprotect() with PROT_EXEC completely. This new prctl allows PROT_EXEC to
be enabled if it was already PROT_EXEC, which allows the following case:

	addr = mmap(0, size, PROT_READ | PROT_EXEC, flags, 0, 0);
	mprotect(addr, size, PROT_READ | PROT_EXEC | PROT_BTI);

where PROT_BTI enables branch tracking identification on arm64.

Link: https://lkml.kernel.org/r/20230119160344.54358-1-joey.gouly@arm.com
Link: https://lkml.kernel.org/r/20230119160344.54358-2-joey.gouly@arm.com
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Mark Brown <broonie@kernel.org>
Cc: nd <nd@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Szabolcs Nagy <szabolcs.nagy@arm.com>
Cc: Topi Miettinen <toiwoton@gmail.com>
Cc: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02 22:33:24 -08:00
Matthew Wilcox (Oracle)
672aa27d0b mm: remove munlock_vma_page()
All callers now have a folio and can call munlock_vma_folio().  Update the
documentation to refer to munlock_vma_folio().

Link: https://lkml.kernel.org/r/20230116192827.2146732-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02 22:33:20 -08:00
Matthew Wilcox (Oracle)
1c5509be58 mm: remove 'First tail page' members from struct page
All former users now use the folio equivalents, so remove them from the
definition of struct page.

Link: https://lkml.kernel.org/r/20230111142915.1001531-23-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02 22:32:59 -08:00
Alistair Popple
7d4a8be0c4 mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only export
mmu_notifier_range_update_to_read_only() was originally introduced in
commit c6d23413f81b ("mm/mmu_notifier:
mmu_notifier_range_update_to_read_only() helper") as an optimisation for
device drivers that know a range has only been mapped read-only.  However
there are no users of this feature so remove it.  As it is the only user
of the struct mmu_notifier_range.vma field remove that also.

Link: https://lkml.kernel.org/r/20230110025722.600912-1-apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-02 22:32:54 -08:00
Lorenzo Bianconi
b9d460c924 bpf: devmap: check XDP features in __xdp_enqueue routine
Check if the destination device implements ndo_xdp_xmit callback relying
on NETDEV_XDP_ACT_NDO_XMIT flags. Moreover, check if the destination device
supports XDP non-linear frame in __xdp_enqueue and is_valid_dst routines.
This patch allows to perform XDP_REDIRECT on non-linear XDP buffers.

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/26a94c33520c0bfba021b3fbb2cb8c1e69bf53b8.1675245258.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-02 20:48:24 -08:00
Tobias Klauser
158e5e9eea bpf: Drop always true do_idr_lock parameter to bpf_map_free_id
The do_idr_lock parameter to bpf_map_free_id was introduced by commit
bd5f5f4ecb78 ("bpf: Add BPF_MAP_GET_FD_BY_ID"). However, all callers set
do_idr_lock = true since commit 1e0bd5a091e5 ("bpf: Switch bpf_map ref
counter to atomic64_t so bpf_map_inc() never fails").

While at it also inline __bpf_map_put into its only caller bpf_map_put
now that do_idr_lock can be dropped from its signature.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Link: https://lore.kernel.org/r/20230202141921.4424-1-tklauser@distanz.ch
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-02 20:26:12 -08:00
Paul E. McKenney
bba8d3d17d Merge branch 'stall.2023.01.09a' into HEAD
stall.2023.01.09a: RCU CPU stall-warning updates.
2023-02-02 16:40:07 -08:00
Paul E. McKenney
8e1704b6a8 Merge branches 'doc.2023.01.05a', 'fixes.2023.01.23a', 'kvfree.2023.01.03a', 'srcu.2023.01.03a', 'srcu-always.2023.02.02a', 'tasks.2023.01.03a', 'torture.2023.01.05a' and 'torturescript.2023.01.03a' into HEAD
doc.2023.01.05a: Documentation update.
fixes.2023.01.23a: Miscellaneous fixes.
kvfree.2023.01.03a: kvfree_rcu() updates.
srcu.2023.01.03a: SRCU updates.
srcu-always.2023.02.02a: Finish making SRCU be unconditionally available.
tasks.2023.01.03a: Tasks-RCU updates.
torture.2023.01.05a: Torture-test updates.
torturescript.2023.01.03a: Torture-test scripting updates.
2023-02-02 16:33:43 -08:00
Paul E. McKenney
5634469360 kernel/notifier: Remove CONFIG_SRCU
Now that the SRCU Kconfig option is unconditionally selected, there is
no longer any point in conditional compilation based on CONFIG_SRCU.
Therefore, remove the #ifdef.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Cc: Borislav Petkov <bp@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
2023-02-02 16:26:06 -08:00
Jakub Kicinski
82b4a9412b Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
net/core/gro.c
  7d2c89b32587 ("skb: Do mix page pool and page referenced frags in GRO")
  b1a78b9b9886 ("net: add support for ipv4 big tcp")
https://lore.kernel.org/all/20230203094454.5766f160@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-02 14:49:55 -08:00
Linus Torvalds
edb9b8f380 Including fixes from bpf, can and netfilter.
Current release - regressions:
 
  - phy: fix null-deref in phy_attach_direct
 
  - mac802154: fix possible double free upon parsing error
 
 Previous releases - regressions:
 
  - bpf: preserve reg parent/live fields when copying range info,
    prevent mis-verification of programs as safe
 
  - ip6: fix GRE tunnels not generating IPv6 link local addresses
 
  - phy: dp83822: fix null-deref on DP83825/DP83826 devices
 
  - sctp: do not check hb_timer.expires when resetting hb_timer
 
  - eth: mtk_sock: fix SGMII configuration after phylink conversion
 
 Previous releases - always broken:
 
  - eth: xdp: execute xdp_do_flush() before napi_complete_done()
 
  - skb: do not mix page pool and page referenced frags in GRO
 
  - bpf:
    - fix a possible task gone issue with bpf_send_signal[_thread]()
    - fix an off-by-one bug in bpf_mem_cache_idx() to select
      the right cache
    - add missing btf_put to register_btf_id_dtor_kfuncs
    - sockmap: fon't let sock_map_{close,destroy,unhash} call itself
 
  - gso: fix null-deref in skb_segment_list()
 
  - mctp: purge receive queues on sk destruction
 
  - fix UaF caused by accept on already connected socket in exotic
    socket families
 
  - tls: don't treat list head as an entry in tls_is_tx_ready()
 
  - netfilter: br_netfilter: disable sabotage_in hook after first
    suppression
 
  - wwan: t7xx: fix runtime PM implementation
 
 Misc:
 
  - MAINTAINERS: spring cleanup of networking maintainers
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmPcKlkACgkQMUZtbf5S
 IrtHgxAAxvuXeN9kQ+/gDYbxTa4Fc3gwCAA7SdJiBdfxP9xJAUEBs9TNfh5MPynb
 iIWl9tTe1OOQtWJTnp5CKMuegUj4hlJVYuLu4MK36hW56Q41cbCFvhsDk/1xoFTF
 2mtmEr5JiXZfgEdf9/tEITTGKgfANsbXOAzL4b5HcV9qtvwvEd2aFxll3VO9N2Xn
 k6o59Lr2mff7NnXQsrVkTLBxXRK9oir8VwyTnCs7j+T7E8Qe4hIkx2LDWcDiRC1e
 vSxfoWFoe+uOGTQMxlarMOAgAOt7nvOQngwCaaVpMBa/OLzGo51Clf98cpPm+cSi
 18ZjmA8r5owGghd75p2PtxcND8U1vXeeRJW+FKK60K8qznMc0D4s7HX+wHBfevnV
 U649F0OHsNsX0Y75Z5sqA1eSmJbTR9QeyiRbuS6nfOnT7ZKDw6vPJrKGJRN789y0
 erzG/JwrfnrCDavXfNNEgk0mMlP0squemffWjuH6FfL4Pt4/gUz63Nokr5vRFSns
 yskHZgXVs4gq++yrbZDZjKfQaPDkA1P/z6VADVRWh4Y5m1m6GtvDUy6vJxK7/hUs
 5giqmhbAAQq0U+2L5RIaoYGfjk2UaDcunNpqTCgjdCdMXx9Yz41QBloFgBjRshp/
 3kg6ElYPzflXXCYK4pGRrzqO4Gqz1Hrvr9YuT+kka+wrpZPDtWc=
 =OGU7
 -----END PGP SIGNATURE-----

Merge tag 'net-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf, can and netfilter.

  Current release - regressions:

   - phy: fix null-deref in phy_attach_direct

   - mac802154: fix possible double free upon parsing error

  Previous releases - regressions:

   - bpf: preserve reg parent/live fields when copying range info,
     prevent mis-verification of programs as safe

   - ip6: fix GRE tunnels not generating IPv6 link local addresses

   - phy: dp83822: fix null-deref on DP83825/DP83826 devices

   - sctp: do not check hb_timer.expires when resetting hb_timer

   - eth: mtk_sock: fix SGMII configuration after phylink conversion

  Previous releases - always broken:

   - eth: xdp: execute xdp_do_flush() before napi_complete_done()

   - skb: do not mix page pool and page referenced frags in GRO

   - bpf:
      - fix a possible task gone issue with bpf_send_signal[_thread]()
      - fix an off-by-one bug in bpf_mem_cache_idx() to select the right
        cache
      - add missing btf_put to register_btf_id_dtor_kfuncs
      - sockmap: fon't let sock_map_{close,destroy,unhash} call itself

   - gso: fix null-deref in skb_segment_list()

   - mctp: purge receive queues on sk destruction

   - fix UaF caused by accept on already connected socket in exotic
     socket families

   - tls: don't treat list head as an entry in tls_is_tx_ready()

   - netfilter: br_netfilter: disable sabotage_in hook after first
     suppression

   - wwan: t7xx: fix runtime PM implementation

  Misc:

   - MAINTAINERS: spring cleanup of networking maintainers"

* tag 'net-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
  mtk_sgmii: enable PCS polling to allow SFP work
  net: mediatek: sgmii: fix duplex configuration
  net: mediatek: sgmii: ensure the SGMII PHY is powered down on configuration
  MAINTAINERS: update SCTP maintainers
  MAINTAINERS: ipv6: retire Hideaki Yoshifuji
  mailmap: add John Crispin's entry
  MAINTAINERS: bonding: move Veaceslav Falico to CREDITS
  net: openvswitch: fix flow memory leak in ovs_flow_cmd_new
  net: ethernet: mtk_eth_soc: disable hardware DSA untagging for second MAC
  virtio-net: Keep stop() to follow mirror sequence of open()
  selftests: net: udpgso_bench_tx: Cater for pending datagrams zerocopy benchmarking
  selftests: net: udpgso_bench: Fix racing bug between the rx/tx programs
  selftests: net: udpgso_bench_rx/tx: Stop when wrong CLI args are provided
  selftests: net: udpgso_bench_rx: Fix 'used uninitialized' compiler warning
  can: mcp251xfd: mcp251xfd_ring_set_ringparam(): assign missing tx_obj_num_coalesce_irq
  can: isotp: split tx timer into transmission and timeout
  can: isotp: handle wait_event_interruptible() return values
  can: raw: fix CAN FD frame transmissions over CAN XL devices
  can: j1939: fix errant WARN_ON_ONCE in j1939_session_deactivate
  hv_netvsc: Fix missed pagebuf entries in netvsc_dma_map/unmap()
  ...
2023-02-02 14:03:31 -08:00
Shiju Jose
3e46d910d8 tracing: Fix poll() and select() do not work on per_cpu trace_pipe and trace_pipe_raw
poll() and select() on per_cpu trace_pipe and trace_pipe_raw do not work
since kernel 6.1-rc6. This issue is seen after the commit
42fb0a1e84ff525ebe560e2baf9451ab69127e2b ("tracing/ring-buffer: Have
polling block on watermark").

This issue is firstly detected and reported, when testing the CXL error
events in the rasdaemon and also erified using the test application for poll()
and select().

This issue occurs for the per_cpu case, when calling the ring_buffer_poll_wait(),
in kernel/trace/ring_buffer.c, with the buffer_percent > 0 and then wait until the
percentage of pages are available. The default value set for the buffer_percent is 50
in the kernel/trace/trace.c.

As a fix, allow userspace application could set buffer_percent as 0 through
the buffer_percent_fops, so that the task will wake up as soon as data is added
to any of the specific cpu buffer.

Link: https://lore.kernel.org/linux-trace-kernel/20230202182309.742-2-shiju.jose@huawei.com

Cc: <mhiramat@kernel.org>
Cc: <mchehab@kernel.org>
Cc: <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org
Fixes: 42fb0a1e84ff5 ("tracing/ring-buffer: Have polling block on watermark")
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-02-02 16:15:53 -05:00
Greg Kroah-Hartman
d45fed4ff6 coresight: Updates for v6.3
- Dynamic TraceID allocation scheme for CoreSight trace source. Allows systems
    with > 44 CPUs to use the ETMs. TraceID is advertised via AUX_OUTPUT_HWID
    packets in perf.data. Also allows allocating trace-ids for non-CPU bound trace
    components (e.g., Qualcomm TPDA).
 
 - Support for Qualcomm TPDA and TPDM CoreSight devices.
 
 - Support for Ultrasoc SMB CoreSight Sink buffer.
 
 - Fixes for HiSilicon PTT driver
 
 - MAINTAINERS update: Add Reviewer for HiSilicon PTT driver
 
 - Bug fixes for CTI power management and sysfs mode
 
 - Fix CoreSight ETM4x TRCSEQRSTEVRn access
 
 Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEuFy0byloRoXZHaWBxcXRZPKyBqEFAmPY/2QACgkQxcXRZPKy
 BqHfYRAAsO3uJ+Q4dRdNIftKKVppbmDGehzmRorJKXPgc7eosXVWQlFkiEFLUGVt
 VOwh3eZzW9g1PYZ2HoB2u5aW1/mXL9SLa0lepEovqi2BVxzrpoBh/cw8KT2C0UtU
 jfkirNk4xaJl3kiwy8WTZ3vFCPa8SRkoxXQEZYD/8NWfgWC4JwOi5yRrCIDvves8
 0aMHGuU9benT/CBUNhRE558WILG+71QGQoYzekSp/4CustEk1AM0DpnI0wfYRHB1
 GvDNoTUGoD1GhDvfaiggBj6YWPmrdwgim3xLSLU7OsEK3hRWy3QwAeVN49W0wAtm
 zhrkuHsETT7c81I+IFh170imueCj42aDgJS9Y5Z/xWfI0B5kzaKVQdYLIO0Lyu6p
 SZkdvQXG2YlrYlmUOWepbglhMnZmw0DVI2lt0Pwns/7ebqL5nJMAD6c8iyWV0if+
 9BJ9gqNpFAJBbjXJtFVe/T7eaYkeBBO2Q5ysLRaE60JgN7nvrmvIxZ11yIeauEAD
 fahlepOT4bxOkG88pn+9lXXzXe4Xyq/kU33TtbRtL1OxIXKws+vw/4gUUeZHGnxT
 1kUTcHHfT6SAogRa2mYzVAHXNVLrc5J79BvdYQ/zNgQ1hQgJ1cghHGNim5JlvuYz
 AZZO2DfX8XRXJ/5f3AibM77SPR+PDhfGnpPYm7ZHP1FAavP/o9k=
 =FS9z
 -----END PGP SIGNATURE-----

Merge tag 'coresight-next-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux into char-misc-next

Suzuki writes:

coresight: Updates for v6.3

 - Dynamic TraceID allocation scheme for CoreSight trace source. Allows systems
   with > 44 CPUs to use the ETMs. TraceID is advertised via AUX_OUTPUT_HWID
   packets in perf.data. Also allows allocating trace-ids for non-CPU bound trace
   components (e.g., Qualcomm TPDA).

- Support for Qualcomm TPDA and TPDM CoreSight devices.

- Support for Ultrasoc SMB CoreSight Sink buffer.

- Fixes for HiSilicon PTT driver

- MAINTAINERS update: Add Reviewer for HiSilicon PTT driver

- Bug fixes for CTI power management and sysfs mode

- Fix CoreSight ETM4x TRCSEQRSTEVRn access

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

* tag 'coresight-next-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux: (35 commits)
  coresight: tmc: Don't enable TMC when it's not ready.
  coresight: tpda: fix return value check in tpda_probe()
  Coresight: tpda/tpdm: remove incorrect __exit annotation
  coresight: perf: Output trace id only once
  coresight: Fix uninitialised variable use in coresight_disable
  Documentation: coresight: tpdm: Add dummy comment after sysfs list
  Documentation: coresight: Extend title heading syntax in TPDM and TPDA documentation
  Documentation: trace: Add documentation for TPDM and TPDA
  dt-bindings: arm: Adds CoreSight TPDA hardware definitions
  Coresight: Add TPDA link driver
  coresight-tpdm: Add integration test support
  coresight-tpdm: Add DSB dataset support
  dt-bindings: arm: Add CoreSight TPDM hardware
  Coresight: Add coresight TPDM source driver
  coresight: core: Use IDR for non-cpu bound sources' paths.
  coresight: trace-id: Add debug & test macros to Trace ID allocation
  coresight: events: PERF_RECORD_AUX_OUTPUT_HW_ID used for Trace ID
  kernel: events: Export perf_report_aux_output_id()
  coresight: trace id: Remove legacy get trace ID function.
  coresight: etmX.X: stm: Remove trace_id() callback
  ...
2023-02-02 16:56:14 +01:00
David Vernet
400031e05a bpf: Add __bpf_kfunc tag to all kfuncs
Now that we have the __bpf_kfunc tag, we should use add it to all
existing kfuncs to ensure that they'll never be elided in LTO builds.

Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230201173016.342758-4-void@manifault.com
2023-02-02 00:25:14 +01:00
Waiman Long
e5ae880384 cgroup/cpuset: Fix wrong check in update_parent_subparts_cpumask()
It was found that the check to see if a partition could use up all
the cpus from the parent cpuset in update_parent_subparts_cpumask()
was incorrect. As a result, it is possible to leave parent with no
effective cpu left even if there are tasks in the parent cpuset. This
can lead to system panic as reported in [1].

Fix this probem by updating the check to fail the enabling the partition
if parent's effective_cpus is a subset of the child's cpus_allowed.

Also record the error code when an error happens in update_prstate()
and add a test case where parent partition and child have the same cpu
list and parent has task. Enabling partition in the child will fail in
this case.

[1] https://www.spinics.net/lists/cgroups/msg36254.html

Fixes: f0af1bfc27b5 ("cgroup/cpuset: Relax constraints to partition & cpus changes")
Cc: stable@vger.kernel.org # v6.1
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-31 12:14:02 -10:00
James Clark
4f64a6c9f6 perf: Fix perf_event_pmu_context serialization
Syzkaller triggered a WARN in put_pmu_ctx().

  WARNING: CPU: 1 PID: 2245 at kernel/events/core.c:4925 put_pmu_ctx+0x1f0/0x278

This is because there is no locking around the access of "if
(!epc->ctx)" in find_get_pmu_context() and when it is set to NULL in
put_pmu_ctx().

The decrement of the reference count in put_pmu_ctx() also happens
outside of the spinlock, leading to the possibility of this order of
events, and the context being cleared in put_pmu_ctx(), after its
refcount is non zero:

 CPU0                                   CPU1
 find_get_pmu_context()
   if (!epc->ctx) == false
                                        put_pmu_ctx()
                                        atomic_dec_and_test(&epc->refcount) == true
                                        epc->refcount == 0
     atomic_inc(&epc->refcount);
     epc->refcount == 1
                                        list_del_init(&epc->pmu_ctx_entry);
	                                      epc->ctx = NULL;

Another issue is that WARN_ON for no active PMU events in put_pmu_ctx()
is outside of the lock. If the perf_event_pmu_context is an embedded
one, even after clearing it, it won't be deleted and can be re-used. So
the warning can trigger. For this reason it also needs to be moved
inside the lock.

The above warning is very quick to trigger on Arm by running these two
commands at the same time:

  while true; do perf record -- ls; done
  while true; do perf record -- ls; done

[peterz: atomic_dec_and_raw_lock*()]
Fixes: bd2756811766 ("perf: Rewrite core context handling")
Reported-by: syzbot+697196bc0265049822bd@syzkaller.appspotmail.com
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20230127143141.1782804-2-james.clark@arm.com
2023-01-31 20:37:18 +01:00
Peter Zijlstra
776f22913b sched/clock: Make local_clock() noinstr
With sched_clock() noinstr, provide a noinstr implementation of
local_clock().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230126151323.760767043@infradead.org
2023-01-31 15:01:47 +01:00
Peter Zijlstra
3017ba4b83 cpuidle: tracing, preempt: Squash _rcuidle tracing
Extend/fix commit:

  9aedeaed6fc6 ("tracing, hardirq: No moar _rcuidle() tracing")

... to also cover trace_preempt_{on,off}() which were mysteriously
untouched.

Fixes: 9aedeaed6fc6 ("tracing, hardirq: No moar _rcuidle() tracing")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lkml.kernel.org/r/Y9D5AfnOukWNOZ5q@hirez.programming.kicks-ass.net
Link: https://lore.kernel.org/r/Y9jWXKgkxY5EZVwW@hirez.programming.kicks-ass.net
2023-01-31 15:01:46 +01:00
Peter Zijlstra
5a5d7e9bad cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG
In order to avoid WARN/BUG from generating nested or even recursive
warnings, force rcu_is_watching() true during
WARN/lockdep_rcu_suspicious().

Notably things like unwinding the stack can trigger rcu_dereference()
warnings, which then triggers more unwinding which then triggers more
warnings etc..

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230126151323.408156109@infradead.org
2023-01-31 15:01:45 +01:00
Ingo Molnar
57a30218fa Linux 6.2-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmPW7E8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGf7MIAI0JnHN9WvtEukSZ
 E6j6+cEGWxsvD6q0g3GPolaKOCw7hlv0pWcFJFcUAt0jebspMdxV2oUGJ8RYW7Lg
 nCcHvEVswGKLAQtQSWw52qotW6fUfMPsNYYB5l31sm1sKH4Cgss0W7l2HxO/1LvG
 TSeNHX53vNAZ8pVnFYEWCSXC9bzrmU/VALF2EV00cdICmfvjlgkELGXoLKJJWzUp
 s63fBHYGGURSgwIWOKStoO6HNo0j/F/wcSMx8leY8qDUtVKHj4v24EvSgxUSDBER
 ch3LiSQ6qf4sw/z7pqruKFthKOrlNmcc0phjiES0xwwGiNhLv0z3rAhc4OM2cgYh
 SDc/Y/c=
 =zpaD
 -----END PGP SIGNATURE-----

Merge tag 'v6.2-rc6' into sched/core, to pick up fixes

Pick up fixes before merging another batch of cpuidle updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-01-31 15:01:20 +01:00
Davidlohr Bueso
0c52310f26 hrtimer: Ignore slack time for RT tasks in schedule_hrtimeout_range()
While in theory the timer can be triggered before expires + delta, for the
cases of RT tasks they really have no business giving any lenience for
extra slack time, so override any passed value by the user and always use
zero for schedule_hrtimeout_range() calls. Furthermore, this is similar to
what the nanosleep(2) family already does with current->timer_slack_ns.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230123173206.6764-3-dave@stgolabs.net
2023-01-31 11:23:07 +01:00
Davidlohr Bueso
c14fd3dcac hrtimer: Rely on rt_task() for DL tasks too
Checking dl_task() is redundant as rt_task() returns true for deadline
tasks too.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230123173206.6764-2-dave@stgolabs.net
2023-01-31 11:23:07 +01:00
Linus Torvalds
ab072681ea - Cleanup the firmware node for the new IRQ MSI domain properly, to
avoid leaking memory
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPWd70ACgkQEsHwGGHe
 VUpTgxAAjKo7kjhyE9PjKA8tJ5pv6I/21tieu/JAmRyb8AVkhi21elXJaRoEPW4k
 2TuSd0WMxZeoRJ1N366HcGPZcsldKN45v3xXEv1e3XSL6ZTAIFsYvwni6adwVocS
 ztnZBrMuHf1KZW/6ES956qGmWsum0rYcU2sr0kvnS9ihPiXJsbeLfNF1VXPwaH2y
 DTnTxgl/QtuipSQYOUvLSk1XBduvE+Wx8kE1S6TrdlWE9qsD/Jn1Jn7btaLQeHfs
 27AXTNGYnVHeRZSVy6d38dIwU4Gh6LcLMFfTtNENIqTVCUfc/QcAb/gItDF0NvIr
 fgKbkdmI7jc0rIfLGlBUM8WZ0OaxopFwvchVywjcogDpCfLvJbgR7JplEKTfltgz
 62IjvTF6vXUCcjySwu/C48EE2G7CWL0FtIcJofZsI0yTeHESkwFf1r5eOjnTV2ao
 M2QYGeToRtk7z9vb+pa0ZttD55TKXQKZDBv+lBRKLTy8SeIzlExJswIAuGTpWuKR
 ntwrdtekyM2HNir/l9ad1X2mjVblivEEv1hLMpsxPsthigWXDr+PZtQIxHCao8wj
 aKY4bI0F1AXByuKhQAwX/RVHUmB+AGrzJ/M5Fdj+ealQ6Nk+xCrPyTX2qDmveiSF
 NlQZpGCxV+gI/8q9YU4coJm1AioT8TSFsMxaufGZCreqqaWBXl0=
 =qrIZ
 -----END PGP SIGNATURE-----

Merge tag 'irq_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fix from Borislav Petkov:

 - Cleanup the firmware node for the new IRQ MSI domain properly, to
   avoid leaking memory

* tag 'irq_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/msi: Free the fwnode created by msi_create_device_irq_domain()
2023-01-29 11:26:49 -08:00
Ilya Leoshkevich
49f67f393f bpf: btf: Add BTF_FMODEL_SIGNED_ARG flag
s390x eBPF JIT needs to know whether a function return value is signed
and which function arguments are signed, in order to generate code
compliant with the s390x ABI.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230128000650.1516334-26-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-28 12:45:15 -08:00
Ilya Leoshkevich
0f0e5f5bd5 bpf: iterators: Split iterators.lskel.h into little- and big- endian versions
iterators.lskel.h is little-endian, therefore bpf iterator is currently
broken on big-endian systems. Introduce a big-endian version and add
instructions regarding its generation. Unfortunately bpftool's
cross-endianness capabilities are limited to BTF right now, so the
procedure requires access to a big-endian machine or a configured
emulator.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230128000650.1516334-25-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-28 12:45:15 -08:00
David Vernet
cb4a21ea59 bpf: Build-time assert that cpumask offset is zero
The first element of a struct bpf_cpumask is a cpumask_t. This is done
to allow struct bpf_cpumask to be cast to a struct cpumask. If this
element were ever moved to another field, any BPF program passing a
struct bpf_cpumask * to a kfunc expecting a const struct cpumask * would
immediately fail to load. Add a build-time assertion so this is
assumption is captured and verified.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230128141537.100777-1-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-28 12:21:40 -08:00
Jakub Kicinski
2d104c390f bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY9RqJgAKCRDbK58LschI
 gw2IAP9G5uhFO5abBzYLupp6SY3T5j97MUvPwLfFqUEt7EXmuwEA2lCUEWeW0KtR
 QX+QmzCa6iHxrW7WzP4DUYLue//FJQY=
 =yYqA
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
bpf-next 2023-01-28

We've added 124 non-merge commits during the last 22 day(s) which contain
a total of 124 files changed, 6386 insertions(+), 1827 deletions(-).

The main changes are:

1) Implement XDP hints via kfuncs with initial support for RX hash and
   timestamp metadata kfuncs, from Stanislav Fomichev and
   Toke Høiland-Jørgensen.
   Measurements on overhead: https://lore.kernel.org/bpf/875yellcx6.fsf@toke.dk

2) Extend libbpf's bpf_tracing.h support for tracing arguments of
   kprobes/uprobes and syscall as a special case, from Andrii Nakryiko.

3) Significantly reduce the search time for module symbols by livepatch
   and BPF, from Jiri Olsa and Zhen Lei.

4) Enable cpumasks to be used as kptrs, which is useful for tracing
   programs tracking which tasks end up running on which CPUs
   in different time intervals, from David Vernet.

5) Fix several issues in the dynptr processing such as stack slot liveness
   propagation, missing checks for PTR_TO_STACK variable offset, etc,
   from Kumar Kartikeya Dwivedi.

6) Various performance improvements, fixes, and introduction of more
   than just one XDP program to XSK selftests, from Magnus Karlsson.

7) Big batch to BPF samples to reduce deprecated functionality,
   from Daniel T. Lee.

8) Enable struct_ops programs to be sleepable in verifier,
   from David Vernet.

9) Reduce pr_warn() noise on BTF mismatches when they are expected under
   the CONFIG_MODULE_ALLOW_BTF_MISMATCH config anyway, from Connor O'Brien.

10) Describe modulo and division by zero behavior of the BPF runtime
    in BPF's instruction specification document, from Dave Thaler.

11) Several improvements to libbpf API documentation in libbpf.h,
    from Grant Seltzer.

12) Improve resolve_btfids header dependencies related to subcmd and add
    proper support for HOSTCC, from Ian Rogers.

13) Add ipip6 and ip6ip decapsulation support for bpf_skb_adjust_room()
    helper along with BPF selftests, from Ziyang Xuan.

14) Simplify the parsing logic of structure parameters for BPF trampoline
    in the x86-64 JIT compiler, from Pu Lehui.

15) Get BTF working for kernels with CONFIG_RUST enabled by excluding
    Rust compilation units with pahole, from Martin Rodriguez Reboredo.

16) Get bpf_setsockopt() working for kTLS on top of TCP sockets,
    from Kui-Feng Lee.

17) Disable stack protection for BPF objects in bpftool given BPF backends
    don't support it, from Holger Hoffstätte.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (124 commits)
  selftest/bpf: Make crashes more debuggable in test_progs
  libbpf: Add documentation to map pinning API functions
  libbpf: Fix malformed documentation formatting
  selftests/bpf: Properly enable hwtstamp in xdp_hw_metadata
  selftests/bpf: Calls bpf_setsockopt() on a ktls enabled socket.
  bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt().
  bpf/selftests: Verify struct_ops prog sleepable behavior
  bpf: Pass const struct bpf_prog * to .check_member
  libbpf: Support sleepable struct_ops.s section
  bpf: Allow BPF_PROG_TYPE_STRUCT_OPS programs to be sleepable
  selftests/bpf: Fix vmtest static compilation error
  tools/resolve_btfids: Alter how HOSTCC is forced
  tools/resolve_btfids: Install subcmd headers
  bpf/docs: Document the nocast aliasing behavior of ___init
  bpf/docs: Document how nested trusted fields may be defined
  bpf/docs: Document cpumask kfuncs in a new file
  selftests/bpf: Add selftest suite for cpumask kfuncs
  selftests/bpf: Add nested trust selftests suite
  bpf: Enable cpumasks to be queried and used as kptrs
  bpf: Disallow NULLable pointers for trusted kfuncs
  ...
====================

Link: https://lore.kernel.org/r/20230128004827.21371-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-28 00:00:14 -08:00
Jakub Kicinski
0548c5f26a bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY9RCwAAKCRDbK58LschI
 g7drAQDfMPc1Q2CE4LZ9oh2wu1Nt2/85naTDK/WirlCToKs0xwD+NRQOyO3hcoJJ
 rOCwfjOlAm+7uqtiwwodBvWgTlDgVAM=
 =0fC+
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
bpf 2023-01-27

We've added 10 non-merge commits during the last 9 day(s) which contain
a total of 10 files changed, 170 insertions(+), 59 deletions(-).

The main changes are:

1) Fix preservation of register's parent/live fields when copying
   range-info, from Eduard Zingerman.

2) Fix an off-by-one bug in bpf_mem_cache_idx() to select the right
   cache, from Hou Tao.

3) Fix stack overflow from infinite recursion in sock_map_close(),
   from Jakub Sitnicki.

4) Fix missing btf_put() in register_btf_id_dtor_kfuncs()'s error path,
   from Jiri Olsa.

5) Fix a splat from bpf_setsockopt() via lsm_cgroup/socket_sock_rcv_skb,
   from Kui-Feng Lee.

6) Fix bpf_send_signal[_thread]() helpers to hold a reference on the task,
   from Yonghong Song.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Fix the kernel crash caused by bpf_setsockopt().
  selftests/bpf: Cover listener cloning with progs attached to sockmap
  selftests/bpf: Pass BPF skeleton to sockmap_listen ops tests
  bpf, sockmap: Check for any of tcp_bpf_prots when cloning a listener
  bpf, sockmap: Don't let sock_map_{close,destroy,unhash} call itself
  bpf: Add missing btf_put to register_btf_id_dtor_kfuncs
  selftests/bpf: Verify copy_register_state() preserves parent/live fields
  bpf: Fix to preserve reg parent/live fields when copying range info
  bpf: Fix a possible task gone issue with bpf_send_signal[_thread]() helpers
  bpf: Fix off-by-one error in bpf_mem_cache_idx()
====================

Link: https://lore.kernel.org/r/20230127215820.4993-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-27 23:32:03 -08:00
Jakub Kicinski
b568d3072a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

drivers/net/ethernet/intel/ice/ice_main.c
  418e53401e47 ("ice: move devlink port creation/deletion")
  643ef23bd9dd ("ice: Introduce local var for readability")
https://lore.kernel.org/all/20230127124025.0dacef40@canb.auug.org.au/
https://lore.kernel.org/all/20230124005714.3996270-1-anthony.l.nguyen@intel.com/

drivers/net/ethernet/engleder/tsnep_main.c
  3d53aaef4332 ("tsnep: Fix TX queue stop/wake for multiple queues")
  25faa6a4c5ca ("tsnep: Replace TX spin_lock with __netif_tx_lock")
https://lore.kernel.org/all/20230127123604.36bb3e99@canb.auug.org.au/

net/netfilter/nf_conntrack_proto_sctp.c
  13bd9b31a969 ("Revert "netfilter: conntrack: add sctp DATA_SENT state"")
  a44b7651489f ("netfilter: conntrack: unify established states for SCTP paths")
  f71cb8f45d09 ("netfilter: conntrack: sctp: use nf log infrastructure for invalid packets")
https://lore.kernel.org/all/20230127125052.674281f9@canb.auug.org.au/
https://lore.kernel.org/all/d36076f3-6add-a442-6d4b-ead9f7ffff86@tessares.net/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-27 22:56:18 -08:00
Linus Torvalds
d786f0fe5e Tracing updates for 6.2:
- Fix filter memory leak by calling ftrace_free_filter()
 
 - Initialize trace_printk() earlier so that ftrace_dump_on_oops shows data
   on early crashes.
 
 - Update the outdated instructions in scripts/tracing/ftrace-bisect.sh
 
 - Add lockdep_is_held() to fix lockdep warning
 
 - Add allocation failure check in create_hist_field()
 
 - Don't initialize pointer that gets set right away in enabled_monitors_write()
 
 - Update MAINTAINER entries
 
 - Fix help messages in Kconfigs
 
 - Fix kernel-doc header for update_preds()
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY9MaCBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qqb4AP4p78mTVP/Rw5oLCRUM4E7JGb8hmxuZ
 HD0RR9fbvyDaQAD/ffkar/KmvGCfCvBNrkGvvx98dwtGIssIB1dShoEZPAU=
 =lBwh
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix filter memory leak by calling ftrace_free_filter()

 - Initialize trace_printk() earlier so that ftrace_dump_on_oops shows
   data on early crashes.

 - Update the outdated instructions in scripts/tracing/ftrace-bisect.sh

 - Add lockdep_is_held() to fix lockdep warning

 - Add allocation failure check in create_hist_field()

 - Don't initialize pointer that gets set right away in enabled_monitors_write()

 - Update MAINTAINER entries

 - Fix help messages in Kconfigs

 - Fix kernel-doc header for update_preds()

* tag 'trace-v6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  bootconfig: Update MAINTAINERS file to add tree and mailing list
  rv: remove redundant initialization of pointer ptr
  ftrace: Maintain samples/ftrace
  tracing/filter: fix kernel-doc warnings
  lib: Kconfig: fix spellos
  trace_events_hist: add check for return value of 'create_hist_field'
  tracing/osnoise: Use built-in RCU list checking
  tracing: Kconfig: Fix spelling/grammar/punctuation
  ftrace/scripts: Update the instructions for ftrace-bisect.sh
  tracing: Make sure trace_printk() can output as soon as it can be used
  ftrace: Export ftrace_free_filter() to modules
2023-01-27 16:03:32 -08:00
Kui-Feng Lee
5416c9aea8 bpf: Fix the kernel crash caused by bpf_setsockopt().
The kernel crash was caused by a BPF program attached to the
"lsm_cgroup/socket_sock_rcv_skb" hook, which performed a call to
`bpf_setsockopt()` in order to set the TCP_NODELAY flag as an
example. Flags like TCP_NODELAY can prompt the kernel to flush a
socket's outgoing queue, and this hook
"lsm_cgroup/socket_sock_rcv_skb" is frequently triggered by
softirqs. The issue was that in certain circumstances, when
`tcp_write_xmit()` was called to flush the queue, it would also allow
BH (bottom-half) to run. This could lead to our program attempting to
flush the same socket recursively, which caused a `skbuff` to be
unlinked twice.

`security_sock_rcv_skb()` is triggered by `tcp_filter()`. This occurs
before the sock ownership is checked in `tcp_v4_rcv()`. Consequently,
if a bpf program runs on `security_sock_rcv_skb()` while under softirq
conditions, it may not possess the lock needed for `bpf_setsockopt()`,
thus presenting an issue.

The patch fixes this issue by ensuring that a BPF program attached to
the "lsm_cgroup/socket_sock_rcv_skb" hook is not allowed to call
`bpf_setsockopt()`.

The differences from v1 are
 - changing commit log to explain holding the lock of the sock,
 - emphasizing that TCP_NODELAY is not the only flag, and
 - adding the fixes tag.

v1: https://lore.kernel.org/bpf/20230125000244.1109228-1-kuifeng@meta.com/

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Fixes: 9113d7e48e91 ("bpf: expose bpf_{g,s}etsockopt to lsm cgroup")
Link: https://lore.kernel.org/r/20230127001732.4162630-1-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-26 23:26:40 -08:00
Waiman Long
1d61659ced locking/rwsem: Disable preemption in all down_write*() and up_write() code paths
The previous patch has disabled preemption in all the down_read() and
up_read() code paths. For symmetry, this patch extends commit:

  48dfb5d2560d ("locking/rwsem: Disable preemption while trying for rwsem lock")

... to have preemption disabled in all the down_write() and up_write()
code paths, including downgrade_write().

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230126003628.365092-4-longman@redhat.com
2023-01-26 11:46:46 +01:00
Waiman Long
3f5245538a locking/rwsem: Disable preemption in all down_read*() and up_read() code paths
Commit:

  91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically spin on owner")

... assumes that when the owner field is changed to NULL, the lock will
become free soon. But commit:

  48dfb5d2560d ("locking/rwsem: Disable preemption while trying for rwsem lock")

... disabled preemption when acquiring rwsem for write.

However, preemption has not yet been disabled when acquiring a read lock
on a rwsem.  So a reader can add a RWSEM_READER_BIAS to count without
setting owner to signal a reader, got preempted out by a RT task which
then spins in the writer slowpath as owner remains NULL leading to live lock.

One easy way to fix this problem is to disable preemption at all the
down_read*() and up_read() code paths as implemented in this patch.

Fixes: 91d2a812dfb9 ("locking/rwsem: Make handoff writer optimistically spin on owner")
Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230126003628.365092-3-longman@redhat.com
2023-01-26 11:46:46 +01:00
Waiman Long
b613c7f314 locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
A non-first waiter can potentially spin in the for loop of
rwsem_down_write_slowpath() without sleeping but fail to acquire the
lock even if the rwsem is free if the following sequence happens:

  Non-first RT waiter    First waiter      Lock holder
  -------------------    ------------      -----------
  Acquire wait_lock
  rwsem_try_write_lock():
    Set handoff bit if RT or
      wait too long
    Set waiter->handoff_set
  Release wait_lock
                         Acquire wait_lock
                         Inherit waiter->handoff_set
                         Release wait_lock
					   Clear owner
                                           Release lock
  if (waiter.handoff_set) {
    rwsem_spin_on_owner(();
    if (OWNER_NULL)
      goto trylock_again;
  }
  trylock_again:
  Acquire wait_lock
  rwsem_try_write_lock():
     if (first->handoff_set && (waiter != first))
	return false;
  Release wait_lock

A non-first waiter cannot really acquire the rwsem even if it mistakenly
believes that it can spin on OWNER_NULL value. If that waiter happens
to be an RT task running on the same CPU as the first waiter, it can
block the first waiter from acquiring the rwsem leading to live lock.
Fix this problem by making sure that a non-first waiter cannot spin in
the slowpath loop without sleeping.

Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Mukesh Ojha <quic_mojha@quicinc.com>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230126003628.365092-2-longman@redhat.com
2023-01-26 11:46:45 +01:00
Ingo Molnar
6397859c8e Linux 6.2-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmPMgtUeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGv7wH/3J6vjENrW+Jg/3W
 Vhr8bDEBtZ0+myheNZQTjR+/VNLI88UPxp9AV4a5qH1jlPmmqEC1AJcCN5sg9PMB
 Kqr8Jr6gOLI8zVKP7fAhbiJVL0bjtZv4G/IEk5y30jNyO40YBVqwhPwUD8Umhooy
 DAHDXyDF3laWk6bMEmJODahM0rdVmPLqvH+X+ALcliRh0SK61t+bMG/4Ox/j81HO
 7d61Kd0ttXuzsPdwWmZWiDFMc+KxQ4004vCemU4dZwOaDW3RJU7Uc84+fCBH6nJC
 HEzcIaAu9FdmSBY0U1fH8BNIiaBAo+nye2LgV5mZ9LQ77MfuMdDo0I+rWjtBmEp0
 hayPOPI=
 =aO6E
 -----END PGP SIGNATURE-----

Merge tag 'v6.2-rc5' into locking/core, to pick up fixes

Refresh this branch with the latest locking fixes, before
applying a new series of changes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-01-26 11:46:29 +01:00
Christophe JAILLET
fbed4fea64 module: Use kstrtobool() instead of strtobool()
strtobool() is the same as kstrtobool().
However, the latter is more used within the kernel.

In order to remove strtobool() and slightly simplify kstrtox.h, switch to
the other function name.

While at it, include the corresponding header file (<linux/kstrtox.h>)

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Aaron Tomlin <atomlin@atomlin.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-01-25 14:07:21 -08:00
Christophe JAILLET
def7b92efd kernel/params.c: Use kstrtobool() instead of strtobool()
strtobool() is the same as kstrtobool().
However, the latter is more used within the kernel.

In order to remove strtobool() and slightly simplify kstrtox.h, switch to
the other function name.

While at it, include the corresponding header file (<linux/kstrtox.h>)

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-01-25 14:07:21 -08:00
David Vernet
51a52a29eb bpf: Pass const struct bpf_prog * to .check_member
The .check_member field of struct bpf_struct_ops is currently passed the
member's btf_type via const struct btf_type *t, and a const struct
btf_member *member. This allows the struct_ops implementation to check
whether e.g. an ops is supported, but it would be useful to also enforce
that the struct_ops prog being loaded for that member has other
qualities, like being sleepable (or not). This patch therefore updates
the .check_member() callback to also take a const struct bpf_prog *prog
argument.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125164735.785732-4-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25 10:25:57 -08:00
David Vernet
1e12d3ef47 bpf: Allow BPF_PROG_TYPE_STRUCT_OPS programs to be sleepable
BPF struct_ops programs currently cannot be marked as sleepable. This
need not be the case -- struct_ops programs can be sleepable, and e.g.
invoke kfuncs that export the KF_SLEEPABLE flag. So as to allow future
struct_ops programs to invoke such kfuncs, this patch updates the
verifier to allow struct_ops programs to be sleepable. A follow-on patch
will add support to libbpf for specifying struct_ops.s as a sleepable
struct_ops program, and then another patch will add testcases to the
dummy_st_ops selftest suite which test sleepable struct_ops behavior.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125164735.785732-2-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25 10:25:57 -08:00