1247977 Commits

Author SHA1 Message Date
Pablo Neira Ayuso
3ce67e3793 netfilter: nf_tables: do not allow mismatch field size and set key length
The set description provides the size of each field in the set whose sum
should not mismatch the set key length, bail out otherwise.

I did not manage to crash nft_set_pipapo with mismatch fields and set key
length so far, but this is UB which must be disallowed.

Fixes: f3a2181e16f1 ("netfilter: nf_tables: Support for sets with multiple ranged fields")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-17 12:02:50 +01:00
Pablo Neira Ayuso
b1db244ffd netfilter: nf_tables: check if catch-all set element is active in next generation
When deactivating the catch-all set element, check the state in the next
generation that represents this transaction.

This bug uncovered after the recent removal of the element busy mark
a2dd0233cbc4 ("netfilter: nf_tables: remove busy mark and gc batch API").

Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support")
Cc: stable@vger.kernel.org
Reported-by: lonial con <kongln9170@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-17 12:02:49 +01:00
Pavel Tikhomirov
9874808878 netfilter: bridge: replace physindev with physinif in nf_bridge_info
An skb can be added to a neigh->arp_queue while waiting for an arp
reply. Where original skb's skb->dev can be different to neigh's
neigh->dev. For instance in case of bridging dnated skb from one veth to
another, the skb would be added to a neigh->arp_queue of the bridge.

As skb->dev can be reset back to nf_bridge->physindev and used, and as
there is no explicit mechanism that prevents this physindev from been
freed under us (for instance neigh_flush_dev doesn't cleanup skbs from
different device's neigh queue) we can crash on e.g. this stack:

arp_process
  neigh_update
    skb = __skb_dequeue(&neigh->arp_queue)
      neigh_resolve_output(..., skb)
        ...
          br_nf_dev_xmit
            br_nf_pre_routing_finish_bridge_slow
              skb->dev = nf_bridge->physindev
              br_handle_frame_finish

Let's use plain ifindex instead of net_device link. To peek into the
original net_device we will use dev_get_by_index_rcu(). Thus either we
get device and are safe to use it or we don't get it and drop skb.

Fixes: c4e70a87d975 ("netfilter: bridge: rename br_netfilter.c to br_netfilter_hooks.c")
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-17 12:02:49 +01:00
Pavel Tikhomirov
a54e721970 netfilter: propagate net to nf_bridge_get_physindev
This is a preparation patch for replacing physindev with physinif on
nf_bridge_info structure. We will use dev_get_by_index_rcu to resolve
device, when needed, and it requires net to be available.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-17 12:02:48 +01:00
Pavel Tikhomirov
aeaa44075f netfilter: nf_queue: remove excess nf_bridge variable
We don't really need nf_bridge variable here. And nf_bridge_info_exists
is better replacement for nf_bridge_info_get in case we are only
checking for existence.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-17 12:02:48 +01:00
Pavel Tikhomirov
c3f9fd54cd netfilter: nfnetlink_log: use proper helper for fetching physinif
We don't use physindev in __build_packet_message except for getting
physinif from it. So let's switch to nf_bridge_get_physinif to get what
we want directly.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-17 12:02:47 +01:00
Pablo Neira Ayuso
91a139cee1 netfilter: nft_limit: do not ignore unsupported flags
Bail out if userspace provides unsupported flags, otherwise future
extensions to the limit expression will be silently ignored by the
kernel.

Fixes: c7862a5f0de5 ("netfilter: nft_limit: allow to invert matching criteria")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-17 12:02:47 +01:00
Pablo Neira Ayuso
3c13725f43 netfilter: nf_tables: bail out if stateful expression provides no .clone
All existing NFT_EXPR_STATEFUL provide a .clone interface, remove
fallback to copy content of stateful expression since this is never
exercised and bail out if .clone interface is not defined.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-17 12:02:46 +01:00
Pablo Neira Ayuso
65b3bd600e netfilter: nf_tables: validate .maxattr at expression registration
struct nft_expr_info allows to store up to NFT_EXPR_MAXATTR (16)
attributes when parsing netlink attributes.

Rise a warning in case there is ever a nft expression whose .maxattr
goes beyond this number of expressions, in such case, struct nft_expr_info
needs to be updated.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-17 12:02:46 +01:00
Pablo Neira Ayuso
0617c3de9b netfilter: nf_tables: reject invalid set policy
Report -EINVAL in case userspace provides a unsupported set backend
policy.

Fixes: c50b960ccc59 ("netfilter: nf_tables: implement proper set selection")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-17 12:02:45 +01:00
Jakub Kicinski
ea937f7720 net: netdevsim: don't try to destroy PHC on VFs
PHC gets initialized in nsim_init_netdevsim(), which
is only called if (nsim_dev_port_is_pf()).

Create a counterpart of nsim_init_netdevsim() and
move the mock_phc_destroy() there.

This fixes a crash trying to destroy netdevsim with
VFs instantiated, as caught by running the devlink.sh test:

    BUG: kernel NULL pointer dereference, address: 00000000000000b8
    RIP: 0010:mock_phc_destroy+0xd/0x30
    Call Trace:
     <TASK>
     nsim_destroy+0x4a/0x70 [netdevsim]
     __nsim_dev_port_del+0x47/0x70 [netdevsim]
     nsim_dev_reload_destroy+0x105/0x120 [netdevsim]
     nsim_drv_remove+0x2f/0xb0 [netdevsim]
     device_release_driver_internal+0x1a1/0x210
     bus_remove_device+0xd5/0x120
     device_del+0x159/0x490
     device_unregister+0x12/0x30
     del_device_store+0x11a/0x1a0 [netdevsim]
     kernfs_fop_write_iter+0x130/0x1d0
     vfs_write+0x30b/0x4b0
     ksys_write+0x69/0xf0
     do_syscall_64+0xcc/0x1e0
     entry_SYSCALL_64_after_hwframe+0x6f/0x77

Fixes: b63e78fca889 ("net: netdevsim: use mock PHC driver")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-17 10:56:44 +00:00
Paolo Abeni
c0f5aec28e mptcp: relax check on MPC passive fallback
While testing the blamed commit below, I was able to miss (!)
packetdrill failures in the fastopen test-cases.

On passive fastopen the child socket is created by incoming TCP MPC syn,
allow for both MPC_SYN and MPC_ACK header.

Fixes: 724b00c12957 ("mptcp: refine opt_mp_capable determination")
Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-17 10:55:54 +00:00
Romain Gantois
c2945c435c net: stmmac: Prevent DSA tags from breaking COE
Some DSA tagging protocols change the EtherType field in the MAC header
e.g.  DSA_TAG_PROTO_(DSA/EDSA/BRCM/MTK/RTL4C_A/SJA1105). On TX these tagged
frames are ignored by the checksum offload engine and IP header checker of
some stmmac cores.

On RX, the stmmac driver wrongly assumes that checksums have been computed
for these tagged packets, and sets CHECKSUM_UNNECESSARY.

Add an additional check in the stmmac TX and RX hotpaths so that COE is
deactivated for packets with ethertypes that will not trigger the COE and
IP header checks.

Fixes: 6b2c6e4a938f ("net: stmmac: propagate feature flags to vlan")
Cc:  <stable@vger.kernel.org>
Reported-by: Richard Tresidder <rtresidd@electromag.com.au>
Link: https://lore.kernel.org/netdev/e5c6c75f-2dfa-4e50-a1fb-6bf4cdb617c2@electromag.com.au/
Reported-by: Romain Gantois <romain.gantois@bootlin.com>
Link: https://lore.kernel.org/netdev/c57283ed-6b9b-b0e6-ee12-5655c1c54495@bootlin.com/
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Romain Gantois <romain.gantois@bootlin.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-17 10:03:29 +00:00
Bartosz Golaszewski
efb8235bfd gpiolib: revert the attempt to protect the GPIO device list with an rwsem
This reverts commits 1979a2807547 ("gpiolib: replace the GPIO device
mutex with a read-write semaphore") and 65a828bab158 ("gpiolib: use
a mutex to protect the list of GPIO devices").

Unfortunately the legacy GPIO API that's still used in older code has to
translate numbers from the global GPIO numberspace to descriptors. This
results in a GPIO device lookup in every call to legacy functions. Some
of those functions - like gpio_set/get_value() - can be called from
atomic context so taking a sleeping lock that is an RW semaphore results
in an error.

We'll probably have to protect this list with SRCU.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/linux-wireless/f7b5ff1e-8f34-4d98-a7be-b826cb897dc8@moroto.mountain/
Fixes: 1979a2807547 ("gpiolib: replace the GPIO device mutex with a read-write semaphore")
Fixes: 65a828bab158 ("gpiolib: use a mutex to protect the list of GPIO devices")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-01-17 09:52:37 +01:00
Nicolas Dichtel
e9ce7ededf selftests: rtnetlink: use setup_ns in bonding test
This is a follow-up of commit a159cbe81d3b ("selftests: rtnetlink: check
enslaving iface in a bond") after the merge of net-next into net.

The goal is to follow the new convention,
see commit d3b6b1116127 ("selftests/net: convert rtnetlink.sh to run it in
unique namespace") for more details.

Let's use also the generic dummy name instead of defining a new one.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20240115135922.3662648-1-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-16 17:49:54 -08:00
Russell King (Oracle)
97eb5d51b4 net: sfp-bus: fix SFP mode detect from bitrate
The referenced commit moved the setting of the Autoneg and pause bits
early in sfp_parse_support(). However, we check whether the modes are
empty before using the bitrate to set some modes. Setting these bits
so early causes that test to always be false, preventing this working,
and thus some modules that used to work no longer do.

Move them just before the call to the quirk.

Fixes: 8110633db49d ("net: sfp-bus: allow SFP quirks to override Autoneg and pause bits")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://lore.kernel.org/r/E1rPMJW-001Ahf-L0@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-16 17:49:49 -08:00
Kunwu Chan
776dac5a66 net: dsa: vsc73xx: Add null pointer check to vsc73xx_gpio_probe
devm_kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure.

Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver")
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240111072018.75971-1-chentao@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-16 17:48:27 -08:00
Erick Archer
1057066009 eventfs: Use kcalloc() instead of kzalloc()
As noted in the "Deprecated Interfaces, Language Features, Attributes,
and Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead
to values wrapping around and a smaller allocation being made than the
caller was expecting. Using those allocations could lead to linear
overflows of heap memory and other misbehaviors.

So, use the purpose specific kcalloc() function instead of the argument
size * count in the kzalloc() function.

[1] https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments

Link: https://lore.kernel.org/linux-trace-kernel/20240115181658.4562-1-erick.archer@gmx.com

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://github.com/KSPP/linux/issues/162
Signed-off-by: Erick Archer <erick.archer@gmx.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-01-16 17:52:33 -05:00
Steven Rostedt (Google)
852e46e239 eventfs: Do not create dentries nor inodes in iterate_shared
The original eventfs code added a wrapper around the dcache_readdir open
callback and created all the dentries and inodes at open, and increment
their ref count. A wrapper was added around the dcache_readdir release
function to decrement all the ref counts of those created inodes and
dentries. But this proved to be buggy[1] for when a kprobe was created
during a dir read, it would create a dentry between the open and the
release, and because the release would decrement all ref counts of all
files and directories, that would include the kprobe directory that was
not there to have its ref count incremented in open. This would cause the
ref count to go to negative and later crash the kernel.

To solve this, the dentries and inodes that were created and had their ref
count upped in open needed to be saved. That list needed to be passed from
the open to the release, so that the release would only decrement the ref
counts of the entries that were incremented in the open.

Unfortunately, the dcache_readdir logic was already using the
file->private_data, which is the only field that can be used to pass
information from the open to the release. What was done was the eventfs
created another descriptor that had a void pointer to save the
dcache_readdir pointer, and it wrapped all the callbacks, so that it could
save the list of entries that had their ref counts incremented in the
open, and pass it to the release. The wrapped callbacks would just put
back the dcache_readdir pointer and call the functions it used so it could
still use its data[2].

But Linus had an issue with the "hijacking" of the file->private_data
(unfortunately this discussion was on a security list, so no public link).
Which we finally agreed on doing everything within the iterate_shared
callback and leave the dcache_readdir out of it[3]. All the information
needed for the getents() could be created then.

But this ended up being buggy too[4]. The iterate_shared callback was not
the right place to create the dentries and inodes. Even Christian Brauner
had issues with that[5].

An attempt was to go back to creating the inodes and dentries at
the open, create an array to store the information in the
file->private_data, and pass that information to the other callbacks.[6]

The difference between that and the original method, is that it does not
use dcache_readdir. It also does not up the ref counts of the dentries and
pass them. Instead, it creates an array of a structure that saves the
dentry's name and inode number. That information is used in the
iterate_shared callback, and the array is freed in the dir release. The
dentries and inodes created in the open are not used for the iterate_share
or release callbacks. Just their names and inode numbers.

Linus did not like that either[7] and just wanted to remove the dentries
being created in iterate_shared and use the hard coded inode numbers.

[ All this while Linus enjoyed an unexpected vacation during the merge
  window due to lack of power. ]

[1] https://lore.kernel.org/linux-trace-kernel/20230919211804.230edf1e@gandalf.local.home/
[2] https://lore.kernel.org/linux-trace-kernel/20230922163446.1431d4fa@gandalf.local.home/
[3] https://lore.kernel.org/linux-trace-kernel/20240104015435.682218477@goodmis.org/
[4] https://lore.kernel.org/all/202401152142.bfc28861-oliver.sang@intel.com/
[5] https://lore.kernel.org/all/20240111-unzahl-gefegt-433acb8a841d@brauner/
[6] https://lore.kernel.org/all/20240116114711.7e8637be@gandalf.local.home/
[7] https://lore.kernel.org/all/20240116170154.5bf0a250@gandalf.local.home/

Link: https://lore.kernel.org/linux-trace-kernel/20240116211353.573784051@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al  Viro <viro@ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Fixes: 493ec81a8fb8 ("eventfs: Stop using dcache_readdir() for getdents()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202401152142.bfc28861-oliver.sang@intel.com
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-01-16 17:48:19 -05:00
Steven Rostedt (Google)
53c41052ba eventfs: Have the inodes all for files and directories all be the same
The dentries and inodes are created in the readdir for the sole purpose of
getting a consistent inode number. Linus stated that is unnecessary, and
that all inodes can have the same inode number. For a virtual file system
they are pretty meaningless.

Instead use a single unique inode number for all files and one for all
directories.

Link: https://lore.kernel.org/all/20240116133753.2808d45e@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20240116211353.412180363@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al  Viro <viro@ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-01-16 16:27:47 -05:00
Hans de Goede
58f65f9db7 Input: atkbd - use ab83 as id when skipping the getid command
Barnabás reported that the change to skip the getid command
when the controller is in translated mode on laptops caused
the Version field of his "AT Translated Set 2 keyboard"
input device to change from ab83 to abba, breaking a custom
hwdb entry for this keyboard.

Use the standard ab83 id for keyboards when getid is skipped
(rather then that getid fails) to avoid reporting a different
Version to userspace then before skipping the getid.

Fixes: 936e4d49ecbc ("Input: atkbd - skip ATKBD_CMD_GETID in translated mode")
Reported-by: Barnabás Pőcze <pobrn@protonmail.com>
Closes: https://lore.kernel.org/linux-input/W1ydwoG2fYv85Z3C3yfDOJcVpilEvGge6UGa9kZh8zI2-qkHXp7WLnl2hSkFz63j-c7WupUWI5TLL6n7Lt8DjRuU-yJBwLYWrreb1hbnd6A=@protonmail.com/
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240116204325.7719-1-hdegoede@redhat.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2024-01-16 13:16:10 -08:00
Hao Sun
33772ff3b8 selftests/bpf: Add test for alu on PTR_TO_FLOW_KEYS
Add a test case for PTR_TO_FLOW_KEYS alu. Testing if alu with variable
offset on flow_keys is rejected. For the fixed offset success case, we
already have C code coverage to verify (e.g. via bpf_flow.c).

Signed-off-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240115082028.9992-2-sunhao.th@gmail.com
2024-01-16 17:12:48 +01:00
Hao Sun
22c7fa171a bpf: Reject variable offset alu on PTR_TO_FLOW_KEYS
For PTR_TO_FLOW_KEYS, check_flow_keys_access() only uses fixed off
for validation. However, variable offset ptr alu is not prohibited
for this ptr kind. So the variable offset is not checked.

The following prog is accepted:

  func#0 @0
  0: R1=ctx() R10=fp0
  0: (bf) r6 = r1                       ; R1=ctx() R6_w=ctx()
  1: (79) r7 = *(u64 *)(r6 +144)        ; R6_w=ctx() R7_w=flow_keys()
  2: (b7) r8 = 1024                     ; R8_w=1024
  3: (37) r8 /= 1                       ; R8_w=scalar()
  4: (57) r8 &= 1024                    ; R8_w=scalar(smin=smin32=0,
  smax=umax=smax32=umax32=1024,var_off=(0x0; 0x400))
  5: (0f) r7 += r8
  mark_precise: frame0: last_idx 5 first_idx 0 subseq_idx -1
  mark_precise: frame0: regs=r8 stack= before 4: (57) r8 &= 1024
  mark_precise: frame0: regs=r8 stack= before 3: (37) r8 /= 1
  mark_precise: frame0: regs=r8 stack= before 2: (b7) r8 = 1024
  6: R7_w=flow_keys(smin=smin32=0,smax=umax=smax32=umax32=1024,var_off
  =(0x0; 0x400)) R8_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=1024,
  var_off=(0x0; 0x400))
  6: (79) r0 = *(u64 *)(r7 +0)          ; R0_w=scalar()
  7: (95) exit

This prog loads flow_keys to r7, and adds the variable offset r8
to r7, and finally causes out-of-bounds access:

  BUG: unable to handle page fault for address: ffffc90014c80038
  [...]
  Call Trace:
   <TASK>
   bpf_dispatcher_nop_func include/linux/bpf.h:1231 [inline]
   __bpf_prog_run include/linux/filter.h:651 [inline]
   bpf_prog_run include/linux/filter.h:658 [inline]
   bpf_prog_run_pin_on_cpu include/linux/filter.h:675 [inline]
   bpf_flow_dissect+0x15f/0x350 net/core/flow_dissector.c:991
   bpf_prog_test_run_flow_dissector+0x39d/0x620 net/bpf/test_run.c:1359
   bpf_prog_test_run kernel/bpf/syscall.c:4107 [inline]
   __sys_bpf+0xf8f/0x4560 kernel/bpf/syscall.c:5475
   __do_sys_bpf kernel/bpf/syscall.c:5561 [inline]
   __se_sys_bpf kernel/bpf/syscall.c:5559 [inline]
   __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:5559
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0x3f/0x110 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x63/0x6b

Fix this by rejecting ptr alu with variable offset on flow_keys.
Applying the patch rejects the program with "R7 pointer arithmetic
on flow_keys prohibited".

Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
Signed-off-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240115082028.9992-1-sunhao.th@gmail.com
2024-01-16 17:12:29 +01:00
Jakub Kicinski
03fb8565c8 selftests: bonding: add missing build configs
bonding tests also try to create bridge, veth and dummy
interfaces. These are not currently listed in config.

Fixes: bbb774d921e2 ("net: Add tests for bonding and team address list management")
Fixes: c078290a2b76 ("selftests: include bonding tests into the kselftest infra")
Acked-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Link: https://lore.kernel.org/r/20240116020201.1883023-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-16 06:58:53 -08:00
Rafael J. Wysocki
5b5268cd49 Merge branches 'pnp', 'acpi-resource' and 'acpica'
Merge a PNP change, new ACPI IRQ management quirks and a small ACPICA
code update for 6.8-rc1:

 - Make pnp_bus_type const (Greg Kroah-Hartman).

 - Add ACPI IRQ management quirks for ASUS ExpertBook B1502CGA and ASUS
   Vivobook E1504GA and E1504GAB (Ben Mayo, Michael Maltsev).

 - Add new MADT GICC/GICR/ITS non-coherent flags and GICC online capable
   bit handling to ACPICA (Lorenzo Pieralisi).

* pnp:
  PNP: make pnp_bus_type const

* acpi-resource:
  ACPI: resource: Skip IRQ override on ASUS ExpertBook B1502CGA
  ACPI: resource: Add DMI quirks for ASUS Vivobook E1504GA and E1504GAB

* acpica:
  ACPICA: MADT: Add new MADT GICC/GICR/ITS non-coherent flags handling
  ACPICA: MADT: Add GICC online capable bit handling
2024-01-16 12:44:35 +01:00
Jakub Kicinski
4697381bd0 selftests: netdevsim: correct expected FEC strings
ethtool CLI has changed its output. Make the test compatible.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240114224748.1210578-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-16 12:42:29 +01:00
Rafael J. Wysocki
dd75558b2d Merge branches 'thermal-core' and 'thermal-intel'
Merge additional updates for 6.8-rc1 in the thermal core and in the
Intel HFI thermal driver:

 - Add debugfs-based diagnostics support to the thermal core (Daniel
   Lezcano, Dan Carpenter).

 - Fix a power allocator thermal governor issue preventing it from
   resetting cooling devices sometimes (Di Shen).

 - Simplify the thermal netlink API and clean up related code (Rafael J.
   Wysocki).

 - Make the Intel HFI driver support hibernation and deep suspend
   properly (Ricardo Neri).

* thermal-core:
  thermal/debugfs: Unlock on error path in thermal_debug_tz_trip_up()
  thermal: gov_power_allocator: avoid inability to reset a cdev
  thermal: helpers: Rearrange thermal_cdev_set_cur_state()
  thermal: netlink: Rework notify API for cooling devices
  thermal: core: Use kstrdup_const() during cooling device registration
  thermal/debugfs: Add thermal debugfs information for mitigation episodes
  thermal/debugfs: Add thermal cooling device debugfs information
  thermal: netlink: Pass thermal zone pointer to notify routines
  thermal: netlink: Drop thermal_notify_tz_trip_add/delete()
  thermal: netlink: Pass pointers to thermal_notify_tz_trip_up/down()
  thermal: netlink: Pass pointers to thermal_notify_tz_trip_change()

* thermal-intel:
  thermal: intel: hfi: Add syscore callbacks for system-wide PM
2024-01-16 12:33:15 +01:00
Rafael J. Wysocki
9223614ea7 Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-qos' into pm
* pm-sleep:
  PM: sleep: Restore asynchronous device resume optimization

* pm-cpufreq:
  Documentation: admin-guide: PM: Fix two typos
  cpufreq: intel_pstate: Update hybrid scaling factor for Meteor Lake

* pm-qos:
  PM: QoS: Use kcalloc() instead of kzalloc()
2024-01-16 12:23:24 +01:00
Jakub Kicinski
2c4ca79772 selftests: netdevsim: sprinkle more udevadm settle
Number of tests are failing when netdev renaming is active
on the system. Add udevadm settle in logic determining
the names.

Fixes: 242aaf03dc9b ("selftests: add a test for ethtool pause stats")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240114224726.1210532-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-16 12:16:04 +01:00
Vincent Guittot
e37617c8e5 sched/fair: Fix frequency selection for non-invariant case
Linus reported a ~50% performance regression on single-threaded
workloads on his AMD Ryzen system, and bisected it to:

  9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor performance estimation")

When frequency invariance is not enabled, get_capacity_ref_freq(policy)
is supposed to return the current frequency and the performance margin
applied by map_util_perf(), enabling the utilization to go above the
maximum compute capacity and to select a higher frequency than the current one.

After the changes in 9c0b4bb7f630, the performance margin was applied
earlier in the path to take into account utilization clampings and
we couldn't get a utilization higher than the maximum compute capacity,
and the CPU remained 'stuck' at lower frequencies.

To fix this, we must use a frequency above the current frequency to
get a chance to select a higher OPP when the current one becomes fully used.
Apply the same margin and return a frequency 25% higher than the current
one in order to switch to the next OPP before we fully use the CPU
at the current one.

[ mingo: Clarified the changelog. ]

Fixes: 9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor performance estimation")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Bisected-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Wyes Karny <wkarny@gmail.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Wyes Karny <wkarny@gmail.com>
Link: https://lore.kernel.org/r/20240114183600.135316-1-vincent.guittot@linaro.org
2024-01-16 10:41:25 +01:00
Qiang Ma
a23aa04042 net: stmmac: ethtool: Fixed calltrace caused by unbalanced disable_irq_wake calls
We found the following dmesg calltrace when testing the GMAC NIC notebook:

[9.448656] ------------[ cut here ]------------
[9.448658] Unbalanced IRQ 43 wake disable
[9.448673] WARNING: CPU: 3 PID: 1083 at kernel/irq/manage.c:688 irq_set_irq_wake+0xe0/0x128
[9.448717] CPU: 3 PID: 1083 Comm: ethtool Tainted: G           O      4.19 #1
[9.448773]         ...
[9.448774] Call Trace:
[9.448781] [<9000000000209b5c>] show_stack+0x34/0x140
[9.448788] [<9000000000d52700>] dump_stack+0x98/0xd0
[9.448794] [<9000000000228610>] __warn+0xa8/0x120
[9.448797] [<9000000000d2fb60>] report_bug+0x98/0x130
[9.448800] [<900000000020a418>] do_bp+0x248/0x2f0
[9.448805] [<90000000002035f4>] handle_bp_int+0x4c/0x78
[9.448808] [<900000000029ea40>] irq_set_irq_wake+0xe0/0x128
[9.448813] [<9000000000a96a7c>] stmmac_set_wol+0x134/0x150
[9.448819] [<9000000000be6ed0>] dev_ethtool+0x1368/0x2440
[9.448824] [<9000000000c08350>] dev_ioctl+0x1f8/0x3e0
[9.448827] [<9000000000bb2a34>] sock_ioctl+0x2a4/0x450
[9.448832] [<900000000046f044>] do_vfs_ioctl+0xa4/0x738
[9.448834] [<900000000046f778>] ksys_ioctl+0xa0/0xe8
[9.448837] [<900000000046f7d8>] sys_ioctl+0x18/0x28
[9.448840] [<9000000000211ab4>] syscall_common+0x20/0x34
[9.448842] ---[ end trace 40c18d9aec863c3e ]---

Multiple disable_irq_wake() calls will keep decreasing the IRQ
wake_depth, When wake_depth is 0, calling disable_irq_wake() again,
will report the above calltrace.

Due to the need to appear in pairs, we cannot call disable_irq_wake()
without calling enable_irq_wake(). Fix this by making sure there are
no unbalanced disable_irq_wake() calls.

Fixes: 3172d3afa998 ("stmmac: support wake up irq from external sources (v3)")
Signed-off-by: Qiang Ma <maqianga@uniontech.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240112021249.24598-1-maqianga@uniontech.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-16 09:37:38 +01:00
Paolo Abeni
915805b505 Merge branch 'selftests-net-small-fixes'
Benjamin Poirier says:

====================
selftests: net: Small fixes

From: Benjamin Poirier <benjamin.poirier@gmail.com>

Two small fixes for net selftests.

These patches were carved out of the following RFC series:
https://lore.kernel.org/netdev/20231222135836.992841-1-bpoirier@nvidia.com/

I'm planning to send the rest of the series to net-next after it opens up.
====================

Link: https://lore.kernel.org/r/20240110141436.157419-1-bpoirier@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-16 09:06:18 +01:00
Benjamin Poirier
49078c1b80 selftests: forwarding: Remove executable bits from lib.sh
The lib.sh script is meant to be sourced from other scripts, not executed
directly. Therefore, remove the executable bits from lib.sh's permissions.

Fixes: fe32dffdcd33 ("selftests: forwarding: add TCPDUMP_EXTRA_FLAGS to lib.sh")
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-16 09:06:15 +01:00
Benjamin Poirier
c2518da8e6 selftests: bonding: Change script interpreter
The tests changed by this patch, as well as the scripts they source, use
features which are not part of POSIX sh (ex. 'source' and 'local'). As a
result, these tests fail when /bin/sh is dash such as on Debian. Change the
interpreter to bash so that these tests can run successfully.

Fixes: d43eff0b85ae ("selftests: bonding: up/down delay w/ slave link flapping")
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-16 09:06:15 +01:00
Antoniu Miclaus
dedaf03b99 rtc: max31335: add driver support
RTC driver for MAX31335 ±2ppm Automotive Real-Time Clock with
Integrated MEMS Resonator.

Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Antoniu Miclaus <antoniu.miclaus@analog.com>
Link: https://lore.kernel.org/r/20231120120114.48657-2-antoniu.miclaus@analog.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-01-15 23:27:59 +01:00
Antoniu Miclaus
5905777847 dt-bindings: rtc: max31335: add max31335 bindings
Document the Analog Devices MAX31335 device tree bindings.

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Antoniu Miclaus <antoniu.miclaus@analog.com>
Link: https://lore.kernel.org/r/20231120120114.48657-1-antoniu.miclaus@analog.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-01-15 23:27:14 +01:00
Randy Dunlap
62b38f30a0 gpio: EN7523: fix kernel-doc warnings
Add "struct" keyword and explain the @dir array differently to
prevent kernel-doc warnings:

gpio-en7523.c:22: warning: cannot understand function prototype: 'struct airoha_gpio_ctrl '
gpio-en7523.c:27: warning: Function parameter or struct member 'dir' not described in 'airoha_gpio_ctrl'
gpio-en7523.c:27: warning: Excess struct member 'dir0' description in 'airoha_gpio_ctrl'
gpio-en7523.c:27: warning: Excess struct member 'dir1' description in 'airoha_gpio_ctrl'

Fixes: 0868ad385aff ("gpio: Add support for Airoha EN7523 GPIO controller")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-01-15 23:15:35 +01:00
Alexandre Belloni
83c0711453 rtc: rv8803: add wakeup-source support
The RV8803 can be wired directly to a PMIC that can wake up an SoC without
the CPU getting interrupts.

Link: https://lore.kernel.org/r/20240108004357.602918-1-alexandre.belloni@bootlin.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-01-15 22:29:48 +01:00
Randy Dunlap
eea7615b68 rtc: ac100: remove misuses of kernel-doc
Prevent kernel-doc warnings by changing "/**" to common comment
format "/*" in non-kernel-doc comments:

drivers/rtc/rtc-ac100.c:103: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Clock controls for 3 clock output pins
drivers/rtc/rtc-ac100.c:382: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * RTC related bits

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc:  <linux-rtc@vger.kernel.org>
Link: https://lore.kernel.org/r/20240114231320.31437-1-rdunlap@infradead.org
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-01-15 22:27:35 +01:00
Bjorn Helgaas
7119ca35ee Merge branch 'pci/misc'
- Drop unused struct pci_driver.node member (Mathias Krause)

- Fix documentation typos (Attreyee Mukherjee)

- Use a unique test pattern for each BAR in the pci_endpoint_test to make
  it easier to debug address translation issues (Niklas Cassel)

- Fix kernel-doc issues (Bjorn Helgaas)

* pci/misc:
  PCI: Fix kernel-doc issues
  misc: pci_endpoint_test: Use a unique test pattern for each BAR
  docs: PCI: Fix typos
  PCI: Remove unused 'node' member from struct pci_driver
2024-01-15 12:10:41 -06:00
Bjorn Helgaas
9946bbb3ed Merge branch 'pci/dt-bindings'
- Increase qcom iommu-map maxItems to accommodate SDX55 (five entries) and
  SDM845 (sixteen entries) (Krzysztof Kozlowski)

- Describe qcom,pcie-sc8180x clocks and resets accurately (Krzysztof
  Kozlowski)

- Describe qcom,pcie-sm8150 clocks and resets accurately (Krzysztof
  Kozlowski)

- Correct the qcom "reset-name" property, previously incorrectly called
  "reset-names" (Krzysztof Kozlowski)

- Document rockchip optional PCIe reference clock input (Heiko Stuebner)

- Document qcom,pcie-sm8650, based on qcom,pcie-sm8550 (Neil Armstrong)

* pci/dt-bindings:
  dt-bindings: PCI: qcom: Document the SM8650 PCIe Controller
  dt-bindings: PCI: dwc: rockchip: Document optional PCIe reference clock input
  dt-bindings: PCI: qcom: Correct reset-names property
  dt-bindings: PCI: qcom: Correct clocks for SM8150
  dt-bindings: PCI: qcom: Correct clocks for SC8180x
  dt-bindings: PCI: qcom: Adjust iommu-map for different SoC
2024-01-15 12:10:41 -06:00
Bjorn Helgaas
eddcaefa5f Merge branch 'pci/remove-old-api'
- In dw-xdata-pcie, pci_endpoint_test, and vmd, replace usage of deprecated
  ida_simple_*() API with ida_alloc() and ida_free() (Christophe JAILLET)

* pci/remove-old-api:
  dw-xdata: Remove usage of the deprecated ida_simple_*() API
  misc: pci_endpoint_test: Remove usage of the deprecated ida_simple_*() API
  PCI: vmd: Remove usage of the deprecated ida_simple_*() API
2024-01-15 12:10:41 -06:00
Bjorn Helgaas
d43e4239f0 Merge branch 'pci/endpoint'
- Make struct pci_epc_event_ops and struct pci_epf_ops instances const
  (Lars-Peter Clausen)

* pci/endpoint:
  PCI: endpoint: pci-epf-test: Make struct pci_epf_ops const
  PCI: endpoint: pci-epf-vntb: Make struct pci_epf_ops const
  PCI: endpoint: pci-epf-ntb: Make struct pci_epf_ops const
  PCI: endpoint: pci-epf-mhi: Make structs pci_epf_ops and pci_epf_event_ops const
  PCI: endpoint: Make struct pci_epf_ops in pci_epf_driver const
2024-01-15 12:10:40 -06:00
Bjorn Helgaas
dc14155d46 Merge branch 'pci/irq-clean-up'
- Rename PCI_IRQ_LEGACY to PCI_IRQ_INTX to be more explicit and match spec
  terminology (Bjorn Helgaas)

- Use existing PCI_IRQ_INTX, PCI_IRQ_MSI, PCI_IRQ_MSIX in artpec6, cadence,
  designware, designware-plat, dra7xx, imx6, keembay, keystone, layerscape,
  mhi, ntb, qcom, rcar, rcar-gen4, rockchip, tegra194, uniphier, vntb; drop
  the redundant pci_epc_irq_type enum with the same values (Damien Le Moal)

- Use "intx" instead of "leg" or "legacy" when describing INTx interrupts
  in endpoint core, endpoint tests, cadence, dra7xx, designware,
  dw-rockchip, dwc core, imx6, keystone, layerscape, qcom, rcar-gen4,
  rockchip, tegra194, uniphier, xilinx-nwl (Damien Le Moal)

* pci/irq-clean-up:
  PCI: xilinx-nwl: Use INTX instead of legacy
  PCI: rockchip-host: Rename rockchip_pcie_legacy_int_handler()
  PCI: rockchip-ep: Use INTX instead of legacy
  PCI: uniphier: Use INTX instead of legacy
  PCI: tegra194: Use INTX instead of legacy
  PCI: dw-rockchip: Rename rockchip_pcie_legacy_int_handler()
  PCI: keystone: Use INTX instead of legacy
  PCI: dwc: Rename dw_pcie_ep_raise_legacy_irq()
  PCI: cadence: Use INTX instead of legacy
  PCI: dra7xx: Rename dra7xx_pcie_raise_legacy_irq()
  misc: pci_endpoint_test: Use INTX instead of LEGACY
  PCI: endpoint: Rename LEGACY to INTX in test function driver
  PCI: endpoint: Use INTX instead of legacy
  PCI: endpoint: Drop PCI_EPC_IRQ_XXX definitions
  PCI: Rename PCI_IRQ_LEGACY to PCI_IRQ_INTX
2024-01-15 12:10:40 -06:00
Bjorn Helgaas
eb30ad4141 Merge branch 'pci/controller/remove-void-return'
- Convert exynos, keystone, kirin from .remove() to .remove_new(), which
  returns void instead of int (Uwe Kleine-König)

* pci/controller/remove-void-return:
  PCI: kirin: Convert to platform remove callback returning void
  PCI: keystone: Convert to platform remove callback returning void
  PCI: exynos: Convert to platform remove callback returning void
2024-01-15 12:10:40 -06:00
Bjorn Helgaas
fd286a1de5 Merge branch 'pci/controller/xilinx'
- Remove redundant dev_err(), since platform_get_irq() and
  platform_get_irq_byname() already log errors (Yang Li)

- Fix uninitialized symbols in xilinx_pl_dma_pcie_setup_irq() (Krzysztof
  Wilczyński)

- Fix xilinx_pl_dma_pcie_init_irq_domain() error return when
  irq_domain_add_linear() fails (Harshit Mogalapalli)

* pci/controller/xilinx:
  PCI: xilinx-xdma: Fix error code in xilinx_pl_dma_pcie_init_irq_domain()
  PCI: xilinx-xdma: Fix uninitialized symbols in xilinx_pl_dma_pcie_setup_irq()
  PCI: xilinx-xdma: Remove redundant dev_err()
2024-01-15 12:10:39 -06:00
Bjorn Helgaas
161d42df9a Merge branch 'pci/controller/vmd'
- Use ida_alloc() instead of deprecated ida_simple_get() (Christophe JAILLET)

* pci/controller/vmd:
  PCI: vmd: Remove usage of the deprecated ida_simple_xx() API
2024-01-15 12:10:39 -06:00
Bjorn Helgaas
67b9ef22c6 Merge branch 'pci/controller/rcar'
- Replace of_device.h with explicit of.h include to untangle header usage
  (Rob Herring)

- Add DT and driver support for optional miniPCIe 1.5v and 3.3v regulators
  on KingFisher (Wolfram Sang)

* pci/controller/rcar:
  PCI: rcar-host: Add support for optional regulators
  dt-bindings: PCI: rcar-pci-host: Add optional regulators
  PCI: rcar-gen4: Replace of_device.h with explicit of.h include
2024-01-15 12:10:38 -06:00
Bjorn Helgaas
1b6069f51e Merge branch 'pci/controller/mediatek'
- Clear MSI interrupt status before handler to avoid missing MSIs that
  occur after the handler (qizhong cheng)

- Update mediatek-gen3 translation window setup to handle MMIO space that
  is not a power of two in size (Jianjun Wang)

* pci/controller/mediatek:
  PCI: mediatek-gen3: Fix translation window size calculation
  PCI: mediatek: Clear interrupt status before dispatching handler
2024-01-15 12:10:38 -06:00
Bjorn Helgaas
1800c660b0 Merge branch 'pci/controller/layerscape'
- Add suspend/resume support for Layerscape LS1043a, including
  software-managed PME_Turn_Off and transitions between L0, L2/L3_Ready
  Link states (Frank Li)

* pci/controller/layerscape:
  PCI: layerscape: Add suspend/resume for ls1043a
  PCI: layerscape(ep): Rename pf_* as pf_lut_*
  PCI: layerscape: Add suspend/resume for ls1021a
  PCI: layerscape: Add function pointer for exit_from_l2()
2024-01-15 12:10:38 -06:00