linux

iv/linux

Author	SHA1	Message	Date
Jakub Kicinski	99845220d3	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2022-01-19 We've added 12 non-merge commits during the last 8 day(s) which contain a total of 12 files changed, 262 insertions(+), 64 deletions(-). The main changes are: 1) Various verifier fixes mainly around register offset handling when passed to helper functions, from Daniel Borkmann. 2) Fix XDP BPF link handling to assert program type, from Toke Høiland-Jørgensen. 3) Fix regression in mount parameter handling for BPF fs, from Yafang Shao. 4) Fix incorrect integer literal when marking scratched stack slots in verifier, from Christy Lee. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, selftests: Add ringbuf memory type confusion test bpf, selftests: Add various ringbuf tests with invalid offset bpf: Fix ringbuf memory type confusion when passing to helpers bpf: Fix out of bounds access for ringbuf helpers bpf: Generally fix helper register offset check bpf: Mark PTR_TO_FUNC register initially with zero offset bpf: Generalize check_ctx_reg for reuse with other types bpf: Fix incorrect integer literal used for marking scratched stack. bpf/selftests: Add check for updating XDP bpf_link with wrong program type bpf/selftests: convert xdp_link test to ASSERT_* macros xdp: check prog type before updating BPF link bpf: Fix mount source show for bpffs ==================== Link: https://lore.kernel.org/r/20220119011825.9082-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-01-18 19:28:29 -08:00
Daniel Borkmann	37c8d4807d	bpf, selftests: Add ringbuf memory type confusion test Add two tests, one which asserts that ring buffer memory can be passed to other helpers for populating its entry area, and another one where verifier rejects different type of memory passed to bpf_ringbuf_submit(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>	2022-01-19 01:27:03 +01:00
Daniel Borkmann	722e4db3ae	bpf, selftests: Add various ringbuf tests with invalid offset Assert that the verifier is rejecting invalid offsets on the ringbuf entries: # ./test_verifier \| grep ring #947/u ringbuf: invalid reservation offset 1 OK #947/p ringbuf: invalid reservation offset 1 OK #948/u ringbuf: invalid reservation offset 2 OK #948/p ringbuf: invalid reservation offset 2 OK Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>	2022-01-19 01:21:49 +01:00
Daniel Borkmann	a672b2e36a	bpf: Fix ringbuf memory type confusion when passing to helpers The bpf_ringbuf_submit() and bpf_ringbuf_discard() have ARG_PTR_TO_ALLOC_MEM in their bpf_func_proto definition as their first argument, and thus both expect the result from a prior bpf_ringbuf_reserve() call which has a return type of RET_PTR_TO_ALLOC_MEM_OR_NULL. While the non-NULL memory from bpf_ringbuf_reserve() can be passed to other helpers, the two sinks (bpf_ringbuf_submit(), bpf_ringbuf_discard()) right now only enforce a register type of PTR_TO_MEM. This can lead to potential type confusion since it would allow other PTR_TO_MEM memory to be passed into the two sinks which did not come from bpf_ringbuf_reserve(). Add a new MEM_ALLOC composable type attribute for PTR_TO_MEM, and enforce that: - bpf_ringbuf_reserve() returns NULL or PTR_TO_MEM \| MEM_ALLOC - bpf_ringbuf_submit() and bpf_ringbuf_discard() only take PTR_TO_MEM \| MEM_ALLOC but not plain PTR_TO_MEM arguments via ARG_PTR_TO_ALLOC_MEM - however, other helpers might treat PTR_TO_MEM \| MEM_ALLOC as plain PTR_TO_MEM to populate the memory area when they use ARG_PTR_TO_{UNINIT_,}MEM in their func proto description Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>	2022-01-19 01:21:46 +01:00
Daniel Borkmann	64620e0a1e	bpf: Fix out of bounds access for ringbuf helpers Both bpf_ringbuf_submit() and bpf_ringbuf_discard() have ARG_PTR_TO_ALLOC_MEM in their bpf_func_proto definition as their first argument. They both expect the result from a prior bpf_ringbuf_reserve() call which has a return type of RET_PTR_TO_ALLOC_MEM_OR_NULL. Meaning, after a NULL check in the code, the verifier will promote the register type in the non-NULL branch to a PTR_TO_MEM and in the NULL branch to a known zero scalar. Generally, pointer arithmetic on PTR_TO_MEM is allowed, so the latter could have an offset. The ARG_PTR_TO_ALLOC_MEM expects a PTR_TO_MEM register type. However, the non- zero result from bpf_ringbuf_reserve() must be fed into either bpf_ringbuf_submit() or bpf_ringbuf_discard() but with the original offset given it will then read out the struct bpf_ringbuf_hdr mapping. The verifier missed to enforce a zero offset, so that out of bounds access can be triggered which could be used to escalate privileges if unprivileged BPF was enabled (disabled by default in kernel). Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it") Reported-by: <tr3e.wang@gmail.com> (SecCoder Security Lab) Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>	2022-01-19 01:21:39 +01:00
Daniel Borkmann	6788ab2350	bpf: Generally fix helper register offset check Right now the assertion on check_ptr_off_reg() is only enforced for register types PTR_TO_CTX (and open coded also for PTR_TO_BTF_ID), however, this is insufficient since many other PTR_TO_* register types such as PTR_TO_FUNC do not handle/expect register offsets when passed to helper functions. Given this can slip-through easily when adding new types, make this an explicit allow-list and reject all other current and future types by default if this is encountered. Also, extend check_ptr_off_reg() to handle PTR_TO_BTF_ID as well instead of duplicating it. For PTR_TO_BTF_ID, reg->off is used for BTF to match expected BTF ids if struct offset is used. This part still needs to be allowed, but the dynamic off from the tnum must be rejected. Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper") Fixes: eaa6bcb71ef6 ("bpf: Introduce bpf_per_cpu_ptr()") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>	2022-01-19 01:21:34 +01:00
Daniel Borkmann	d400a6cf1c	bpf: Mark PTR_TO_FUNC register initially with zero offset Similar as with other pointer types where we use ldimm64, clear the register content to zero first, and then populate the PTR_TO_FUNC type and subprogno number. Currently this is not done, and leads to reuse of stale register tracking data. Given for special ldimm64 cases we always clear the register offset, make it common for all cases, so it won't be forgotten in future. Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>	2022-01-19 01:21:29 +01:00
Daniel Borkmann	be80a1d3f9	bpf: Generalize check_ctx_reg for reuse with other types Generalize the check_ctx_reg() helper function into a more generic named one so that it can be reused for other register types as well to check whether their offset is non-zero. No functional change. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>	2022-01-19 01:21:24 +01:00
Eric Dumazet	2836615aa2	netns: add schedule point in ops_exit_list() When under stress, cleanup_net() can have to dismantle netns in big numbers. ops_exit_list() currently calls many helpers [1] that have no schedule point, and we can end up with soft lockups, particularly on hosts with many cpus. Even for moderate amount of netns processed by cleanup_net() this patch avoids latency spikes. [1] Some of these helpers like fib_sync_up() and fib_sync_down_dev() are very slow because net/ipv4/fib_semantics.c uses host-wide hash tables, and ifindex is used as the only input of two hash functions. ifindexes tend to be the same for all netns (lo.ifindex==1 per instance) This will be fixed in a separate patch. Fixes: 72ad937abd0a ("net: Add support for batching network namespace cleanups") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-18 13:40:58 +00:00
Russell King (Oracle)	5765cee119	net: sfp: fix high power modules without diagnostic monitoring Commit 7cfa9c92d0a3 ("net: sfp: avoid power switch on address-change modules") unintetionally changed the semantics for high power modules without the digital diagnostics monitoring. We repeatedly attempt to read the power status from the non-existing 0xa2 address in a futile hope this failure is temporary: [ 8.856051] sfp sfp-eth3: module NTT 0000000000000000 rev 0000 sn 0000000000000000 dc 160408 [ 8.865843] mvpp2 f4000000.ethernet eth3: switched to inband/1000base-x link mode [ 8.873469] sfp sfp-eth3: Failed to read EEPROM: -5 [ 8.983251] sfp sfp-eth3: Failed to read EEPROM: -5 [ 9.103250] sfp sfp-eth3: Failed to read EEPROM: -5 We previosuly assumed such modules were powered up in the correct mode, continuing without further configuration as long as the required power class was supported by the host. Restore this behaviour, while preserving the intent of subsequent patches to avoid the "Address Change Sequence not supported" warning if we are not going to be accessing the DDM address. Fixes: 7cfa9c92d0a3 ("net: sfp: avoid power switch on address-change modules") Reported-by: 照山周一郎 <teruyama@springboard-inc.jp> Tested-by: 照山周一郎 <teruyama@springboard-inc.jp> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-17 16:30:44 +00:00
David S. Miller	9ea674d7ca	Merge branch 'skb-leak-fixes' Gal Pressman says: ==================== net: Couple of skb memory leak fixes As discussed in: https://lore.kernel.org/netdev/20220102081253.9123-1-gal@nvidia.com/ These are the two followup suggestions from Eric and Jakub. Patch #1 adds a sk_defer_free_flush() call to the kTLS splice_read handler. Patch #2 verifies the defer list is empty on socket destroy, and calls a defer free flush as well. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-17 13:07:47 +00:00
Gal Pressman	79074a72d3	net: Flush deferred skb free on socket destroy The cited Fixes patch moved to a deferred skb approach where the skbs are not freed immediately under the socket lock. Add a WARN_ON_ONCE() to verify the deferred list is empty on socket destroy, and empty it to prevent potential memory leaks. Fixes: f35f821935d8 ("tcp: defer skb freeing after socket lock is released") Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-17 13:07:47 +00:00
Gal Pressman	db094aa814	net/tls: Fix another skb memory leak when running kTLS traffic This patch is a followup to commit ffef737fd037 ("net/tls: Fix skb memory leak when running kTLS traffic") Which was missing another sk_defer_free_flush() call in tls_sw_splice_read(). Fixes: f35f821935d8 ("tcp: defer skb freeing after socket lock is released") Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-17 13:07:47 +00:00
Horatiu Vultur	c0b7f7d7e0	net: ocelot: Fix the call to switchdev_bridge_port_offload In the blamed commit, the call to the function switchdev_bridge_port_offload was passing the wrong argument for atomic_nb. It was ocelot_netdevice_nb instead of ocelot_swtchdev_nb. This patch fixes this issue. Fixes: 4e51bf44a03af6 ("net: bridge: move the switchdev object replay helpers to "push" mode") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-17 13:04:25 +00:00
Moshe Tal	429e3d123d	bonding: Fix extraction of ports from the packet headers Wrong hash sends single stream to multiple output interfaces. The offset calculation was relative to skb->head, fix it to be relative to skb->data. Fixes: a815bde56b15 ("net, bonding: Refactor bond_xmit_hash for use with xdp_buff") Reviewed-by: Jussi Maki <joamaki@gmail.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Moshe Tal <moshet@nvidia.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-16 23:38:59 +00:00
Wen Gu	56d99e81ec	net/smc: Fix hung_task when removing SMC-R devices A hung_task is observed when removing SMC-R devices. Suppose that a link group has two active links(lnk_A, lnk_B) associated with two different SMC-R devices(dev_A, dev_B). When dev_A is removed, the link group will be removed from smc_lgr_list and added into lgr_linkdown_list. lnk_A will be cleared and smcibdev(A)->lnk_cnt will reach to zero. However, when dev_B is removed then, the link group can't be found in smc_lgr_list and lnk_B won't be cleared, making smcibdev->lnk_cnt never reaches zero, which causes a hung_task. This patch fixes this issue by restoring the implementation of smc_smcr_terminate_all() to what it was before commit 349d43127dac ("net/smc: fix kernel panic caused by race of smc_sock"). The original implementation also satisfies the intention that make sure QP destroy earlier than CQ destroy because we will always wait for smcibdev->lnk_cnt reaches zero, which guarantees QP has been destroyed. Fixes: 349d43127dac ("net/smc: fix kernel panic caused by race of smc_sock") Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-16 12:30:28 +00:00
Eric Dumazet	0a6e6b3c7d	ipv4: update fib_info_cnt under spinlock protection In the past, free_fib_info() was supposed to be called under RTNL protection. This eventually was no longer the case. Instead of enforcing RTNL it seems we simply can move fib_info_cnt changes to occur when fib_info_lock is held. v2: David Laight suggested to update fib_info_cnt only when an entry is added/deleted to/from the hash table, as fib_info_cnt is used to make sure hash table size is optimal. BUG: KCSAN: data-race in fib_create_info / free_fib_info write to 0xffffffff86e243a0 of 4 bytes by task 26429 on cpu 0: fib_create_info+0xe78/0x3440 net/ipv4/fib_semantics.c:1428 fib_table_insert+0x148/0x10c0 net/ipv4/fib_trie.c:1224 fib_magic+0x195/0x1e0 net/ipv4/fib_frontend.c:1087 fib_add_ifaddr+0xd0/0x2e0 net/ipv4/fib_frontend.c:1109 fib_netdev_event+0x178/0x510 net/ipv4/fib_frontend.c:1466 notifier_call_chain kernel/notifier.c:83 [inline] raw_notifier_call_chain+0x53/0xb0 kernel/notifier.c:391 __dev_notify_flags+0x1d3/0x3b0 dev_change_flags+0xa2/0xc0 net/core/dev.c:8872 do_setlink+0x810/0x2410 net/core/rtnetlink.c:2719 rtnl_group_changelink net/core/rtnetlink.c:3242 [inline] __rtnl_newlink net/core/rtnetlink.c:3396 [inline] rtnl_newlink+0xb10/0x13b0 net/core/rtnetlink.c:3506 rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5571 netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2496 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5589 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x5fc/0x6c0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x726/0x840 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg net/socket.c:724 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2409 ___sys_sendmsg net/socket.c:2463 [inline] __sys_sendmsg+0x195/0x230 net/socket.c:2492 __do_sys_sendmsg net/socket.c:2501 [inline] __se_sys_sendmsg net/socket.c:2499 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2499 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffffffff86e243a0 of 4 bytes by task 31505 on cpu 1: free_fib_info+0x35/0x80 net/ipv4/fib_semantics.c:252 fib_info_put include/net/ip_fib.h:575 [inline] nsim_fib4_rt_destroy drivers/net/netdevsim/fib.c:294 [inline] nsim_fib4_rt_replace drivers/net/netdevsim/fib.c:403 [inline] nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:431 [inline] nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline] nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline] nsim_fib_event_work+0x15ca/0x2cf0 drivers/net/netdevsim/fib.c:1477 process_one_work+0x3fc/0x980 kernel/workqueue.c:2298 process_scheduled_works kernel/workqueue.c:2361 [inline] worker_thread+0x7df/0xa70 kernel/workqueue.c:2447 kthread+0x2c7/0x2e0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 value changed: 0x00000d2d -> 0x00000d2e Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 31505 Comm: kworker/1:21 Not tainted 5.16.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: events nsim_fib_event_work Fixes: 48bb9eb47b27 ("netdevsim: fib: Add dummy implementation for FIB offload") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Ido Schimmel <idosch@mellanox.com> Cc: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-16 12:25:54 +00:00
Wen Gu	9404bc1e58	net/smc: Remove unused function declaration The declaration of smc_wr_tx_dismiss_slots() is unused. So remove it. Fixes: 349d43127dac ("net/smc: fix kernel panic caused by race of smc_sock") Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-15 22:57:21 +00:00
Slark Xiao	f542cdfa30	net: wwan: Fix MRU mismatch issue which may lead to data connection lost In pci_generic.c there is a 'mru_default' in struct mhi_pci_dev_info. This value shall be used for whole mhi if it's given a value for a specific product. But in function mhi_net_rx_refill_work(), it's still using hard code value MHI_DEFAULT_MRU. 'mru_default' shall have higher priority than MHI_DEFAULT_MRU. And after checking, this change could help fix a data connection lost issue. Fixes: 5c2c85315948 ("bus: mhi: pci-generic: configurable network interface MRU") Signed-off-by: Shujun Wang <wsj20369@163.com> Signed-off-by: Slark Xiao <slark_xiao@163.com> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-15 22:40:52 +00:00
Mohammad Athari Bin Ismail	020a45aff1	net: phy: marvell: add Marvell specific PHY loopback Existing genphy_loopback() is not applicable for Marvell PHY. Besides configuring bit-6 and bit-13 in Page 0 Register 0 (Copper Control Register), it is also required to configure same bits in Page 2 Register 21 (MAC Specific Control Register 2) according to speed of the loopback is operating. Tested working on Marvell88E1510 PHY for all speeds (1000/100/10Mbps). FIXME: Based on trial and error test, it seem 1G need to have delay between soft reset and loopback enablement. Fixes: 014068dcb5b1 ("net: phy: genphy_loopback: add link speed configuration") Cc: <stable@vger.kernel.org> # 5.15.x Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-15 22:38:27 +00:00
Christophe JAILLET	9a9acdccdf	net: ethernet: sun4i-emac: Fix an error handling path in emac_probe() A dma_request_chan() call is hidden in emac_configure_dma(). It must be released in the probe if an error occurs, as already done in the remove function. Add the corresponding dma_release_channel() call. Fixes: 47869e82c8b8 ("sun4i-emac.c: add dma support") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-15 22:34:52 +00:00
Tom Rix	214b3369ab	net: ethernet: mtk_eth_soc: fix error checking in mtk_mac_config() Clang static analysis reports this problem mtk_eth_soc.c:394:7: warning: Branch condition evaluates to a garbage value if (err) ^~~ err is not initialized and only conditionally set. So intitialize err. Fixes: 7e538372694b ("net: ethernet: mediatek: Re-add support SGMII") Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-15 22:33:17 +00:00
Vladimir Oltean	80f15f3bef	net: mscc: ocelot: don't dereference NULL pointers with shared tc filters The following command sequence: tc qdisc del dev swp0 clsact tc qdisc add dev swp0 ingress_block 1 clsact tc qdisc add dev swp1 ingress_block 1 clsact tc filter add block 1 flower action drop tc qdisc del dev swp0 clsact produces the following NPD: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000014 pc : vcap_entry_set+0x14/0x70 lr : ocelot_vcap_filter_del+0x198/0x234 Call trace: vcap_entry_set+0x14/0x70 ocelot_vcap_filter_del+0x198/0x234 ocelot_cls_flower_destroy+0x94/0xe4 felix_cls_flower_del+0x70/0x84 dsa_slave_setup_tc_block_cb+0x13c/0x60c dsa_slave_setup_tc_block_cb_ig+0x20/0x30 tc_setup_cb_reoffload+0x44/0x120 fl_reoffload+0x280/0x320 tcf_block_playback_offloads+0x6c/0x184 tcf_block_unbind+0x80/0xe0 tcf_block_setup+0x174/0x214 tcf_block_offload_cmd.isra.0+0x100/0x13c tcf_block_offload_unbind+0x5c/0xa0 __tcf_block_put+0x54/0x174 tcf_block_put_ext+0x5c/0x74 clsact_destroy+0x40/0x60 qdisc_destroy+0x4c/0x150 qdisc_put+0x70/0x90 qdisc_graft+0x3f0/0x4c0 tc_get_qdisc+0x1cc/0x364 rtnetlink_rcv_msg+0x124/0x340 The reason is that the driver isn't prepared to receive two tc filters with the same cookie. It unconditionally creates a new struct ocelot_vcap_filter for each tc filter, and it adds all filters with the same identifier (cookie) to the ocelot_vcap_block. The problem is here, in ocelot_vcap_filter_del(): /* Gets index of the filter / index = ocelot_vcap_block_get_filter_index(block, filter); if (index < 0) return index; / Delete filter / ocelot_vcap_block_remove_filter(ocelot, block, filter); / Move up all the blocks over the deleted filter / for (i = index; i < block->count; i++) { struct ocelot_vcap_filter tmp; tmp = ocelot_vcap_block_find_filter_by_index(block, i); vcap_entry_set(ocelot, i, tmp); } what will happen is ocelot_vcap_block_get_filter_index() will return the index (@index) of the first filter found with that cookie. This is _not_ the index of _this_ filter, but the other one with the same cookie, because ocelot_vcap_filter_equal() gets fooled. Then later, ocelot_vcap_block_remove_filter() is coded to remove all filters that are ocelot_vcap_filter_equal() with the passed @filter. So unexpectedly, both filters get deleted from the list. Then ocelot_vcap_filter_del() will attempt to move all the other filters up, again finding them by index (@i). The block count is 2, @index was 0, so it will attempt to move up filter @i=0 and @i=1. It assigns tmp = ocelot_vcap_block_find_filter_by_index(block, i), which is now a NULL pointer because ocelot_vcap_block_remove_filter() has removed more than one filter. As far as I can see, this problem has been there since the introduction of tc offload support, however I cannot test beyond the blamed commit due to hardware availability. In any case, any fix cannot be backported that far, due to lots of changes to the code base. Therefore, let's go for the correct solution, which is to not call ocelot_vcap_filter_add() and ocelot_vcap_filter_del(), unless the filter is actually unique and not shared. For the shared filters, we should just modify the ingress port mask and call ocelot_vcap_filter_replace(), a function introduced by commit 95706be13b9f ("net: mscc: ocelot: create a function that replaces an existing VCAP filter"). This way, block->rules will only contain filters with unique cookies, by design. Fixes: 07d985eef073 ("net: dsa: felix: Wire up the ocelot cls_flower methods") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-15 22:31:50 +00:00
Eric Dumazet	9d6d7f1cb6	af_unix: annote lockless accesses to unix_tot_inflight & gc_in_progress wait_for_unix_gc() reads unix_tot_inflight & gc_in_progress without synchronization. Adds READ_ONCE()/WRITE_ONCE() and their associated comments to better document the intent. BUG: KCSAN: data-race in unix_inflight / wait_for_unix_gc write to 0xffffffff86e2b7c0 of 4 bytes by task 9380 on cpu 0: unix_inflight+0x1e8/0x260 net/unix/scm.c:63 unix_attach_fds+0x10c/0x1e0 net/unix/scm.c:121 unix_scm_to_skb net/unix/af_unix.c:1674 [inline] unix_dgram_sendmsg+0x679/0x16b0 net/unix/af_unix.c:1817 unix_seqpacket_sendmsg+0xcc/0x110 net/unix/af_unix.c:2258 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg net/socket.c:724 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2409 ___sys_sendmsg net/socket.c:2463 [inline] __sys_sendmmsg+0x267/0x4c0 net/socket.c:2549 __do_sys_sendmmsg net/socket.c:2578 [inline] __se_sys_sendmmsg net/socket.c:2575 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2575 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffffffff86e2b7c0 of 4 bytes by task 9375 on cpu 1: wait_for_unix_gc+0x24/0x160 net/unix/garbage.c:196 unix_dgram_sendmsg+0x8e/0x16b0 net/unix/af_unix.c:1772 unix_seqpacket_sendmsg+0xcc/0x110 net/unix/af_unix.c:2258 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg net/socket.c:724 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2409 ___sys_sendmsg net/socket.c:2463 [inline] __sys_sendmmsg+0x267/0x4c0 net/socket.c:2549 __do_sys_sendmmsg net/socket.c:2578 [inline] __se_sys_sendmmsg net/socket.c:2575 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2575 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0x00000002 -> 0x00000004 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 9375 Comm: syz-executor.1 Not tainted 5.16.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: 9915672d4127 ("af_unix: limit unix_tot_inflight") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20220114164328.2038499-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-01-14 18:31:37 -08:00
Kai-Heng Feng	d90d0c175c	net: stmmac: Fix "Unbalanced pm_runtime_enable!" warning If the device is PCI based like intel-eth-pci, pm_runtime_enable() is already called by pci_pm_init(). So only pm_runtime_enable() when it's not already enabled. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-14 11:27:34 +00:00
Miaoqian Lin	99218cbf81	lib82596: Fix IRQ check in sni_82596_probe platform_get_irq() returns negative error number instead 0 on failure. And the doc of platform_get_irq() provides a usage example: int irq = platform_get_irq(pdev, 0); if (irq < 0) return irq; Fix the check of return value to catch errors correctly. Fixes: 115978859272 ("i825xx: Move the Intel 82586/82593/82596 based drivers") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-14 11:24:03 +00:00
Michael Ellerman	ea93824855	net: apple: bmac: Fix build since dev_addr constification Since commit adeef3e32146 ("net: constify netdev->dev_addr") the bmac driver no longer builds with the following errors (pmac32_defconfig): linux/drivers/net/ethernet/apple/bmac.c: In function ‘bmac_probe’: linux/drivers/net/ethernet/apple/bmac.c:1287:20: error: assignment of read-only location ‘*(dev->dev_addr + (sizetype)j)’ 1287 \| dev->dev_addr[j] = rev ? bitrev8(addr[j]): addr[j]; \| ^ Fix it by making the modifications to a local macaddr variable and then passing that to eth_hw_addr_set(). We don't use the existing addr variable because the bitrev8() would mutate it, but it is already used unreversed later in the function. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-14 11:22:57 +00:00
Michael Ellerman	6c8dc12cd9	net: apple: mace: Fix build since dev_addr constification Since commit adeef3e32146 ("net: constify netdev->dev_addr") the mace driver no longer builds with various errors (pmac32_defconfig): linux/drivers/net/ethernet/apple/mace.c: In function ‘mace_probe’: linux/drivers/net/ethernet/apple/mace.c:170:20: error: assignment of read-only location ‘(dev->dev_addr + (sizetype)j)’ 170 \| dev->dev_addr[j] = rev ? bitrev8(addr[j]): addr[j]; \| ^ linux/drivers/net/ethernet/apple/mace.c: In function ‘mace_reset’: linux/drivers/net/ethernet/apple/mace.c:349:32: warning: passing argument 2 of ‘__mace_set_address’ discards ‘const’ qualifier from pointer target type 349 \| __mace_set_address(dev, dev->dev_addr); \| ~~~^~~~~~~~~~ linux/drivers/net/ethernet/apple/mace.c:93:62: note: expected ‘void ’ but argument is of type ‘const unsigned char ’ 93 \| static void __mace_set_address(struct net_device dev, void addr); \| ~~~~~~^~~~ linux/drivers/net/ethernet/apple/mace.c: In function ‘__mace_set_address’: linux/drivers/net/ethernet/apple/mace.c:388:36: error: assignment of read-only location ‘(dev->dev_addr + (sizetype)i)’ 388 \| out_8(&mb->padr, dev->dev_addr[i] = p[i]); \| ^ Fix it by making the modifications to a local macaddr variable and then passing that to eth_hw_addr_set(), as well as adding some missing const qualifiers. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-14 11:22:57 +00:00
Li Zhijian	2255634100	kselftests/net: list all available tests in usage() So that users can run/query them easily. $ ./fcnal-test.sh -h usage: fcnal-test.sh OPTS -4 IPv4 tests only -6 IPv6 tests only -t <test> Test name/set to run -p Pause on fail -P Pause after each test -v Be verbose Tests: ipv4_ping ipv4_tcp ipv4_udp ipv4_bind ipv4_runtime ipv4_netfilter ipv6_ping ipv6_tcp ipv6_udp ipv6_bind ipv6_runtime ipv6_netfilter use_cases Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-14 11:21:55 +00:00
Markus Reichl	0bf3885324	net: usb: Correct reset handling of smsc95xx On boards with LAN9514 and no preconfigured MAC address we don't get an ip address from DHCP after commit a049a30fc27c ("net: usb: Correct PHY handling of smsc95xx") anymore. Adding an explicit reset before starting the phy fixes the issue. [1] https://lore.kernel.org/netdev/199eebbd6b97f52b9119c9fa4fd8504f8a34de18.camel@collabora.com/ From: Gabriel Hojda <ghojda@yo2urs.ro> Fixes: a049a30fc27c ("net: usb: Correct PHY handling of smsc95xx") Signed-off-by: Gabriel Hojda <ghojda@yo2urs.ro> Signed-off-by: Markus Reichl <m.reichl@fivetechno.de> Tested-by: Alexander Stein <alexander.stein@ew.tq-group.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-14 11:21:06 +00:00
Sergey Shtylyov	9deb48b53e	bcmgenet: add WOL IRQ check The driver neglects to check the result of platform_get_irq_optional()'s call and blithely passes the negative error codes to devm_request_irq() (which takes unsigned IRQ #), causing it to fail with -EINVAL. Stop calling devm_request_irq() with the invalid IRQ #s. Fixes: 8562056f267d ("net: bcmgenet: request Wake-on-LAN interrupt") Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-14 11:20:06 +00:00
Kevin Bracey	fb80445c43	net_sched: restore "mpu xxx" handling commit 56b765b79e9a ("htb: improved accuracy at high rates") broke "overhead X", "linklayer atm" and "mpu X" attributes. "overhead X" and "linklayer atm" have already been fixed. This restores the "mpu X" handling, as might be used by DOCSIS or Ethernet shaping: tc class add ... htb rate X overhead 4 mpu 64 The code being fixed is used by htb, tbf and act_police. Cake has its own mpu handling. qdisc_calculate_pkt_len still uses the size table containing values adjusted for mpu by user space. iproute2 tc has always passed mpu into the kernel via a tc_ratespec structure, but the kernel never directly acted on it, merely stored it so that it could be read back by `tc class show`. Rather, tc would generate length-to-time tables that included the mpu (and linklayer) in their construction, and the kernel used those tables. Since v3.7, the tables were no longer used. Along with "mpu", this also broke "overhead" and "linklayer" which were fixed in 01cb71d2d47b ("net_sched: restore "overhead xxx" handling", v3.10) and 8a8e3d84b171 ("net_sched: restore "linklayer atm" handling", v3.11). "overhead" was fixed by simply restoring use of tc_ratespec::overhead - this had originally been used by the kernel but was initially omitted from the new non-table-based calculations. "linklayer" had been handled in the table like "mpu", but the mode was not originally passed in tc_ratespec. The new implementation was made to handle it by getting new versions of tc to pass the mode in an extended tc_ratespec, and for older versions of tc the table contents were analysed at load time to deduce linklayer. As "mpu" has always been given to the kernel in tc_ratespec, accompanying the mpu-based table, we can restore system functionality with no userspace change by making the kernel act on the tc_ratespec value. Fixes: 56b765b79e9a ("htb: improved accuracy at high rates") Signed-off-by: Kevin Bracey <kevin@bracey.fi> Cc: Eric Dumazet <edumazet@google.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Vimalkumar <j.vimal@gmail.com> Link: https://lore.kernel.org/r/20220112170210.1014351-1-kevin@bracey.fi Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-01-13 11:06:42 -08:00
Kyoungkyu Park	a6fadfd757	net: qmi_wwan: Add Hucom Wireless HM-211S/K The Hucom Wireless HM-211S/K is an LTE module based on Qualcomm MDM9207. This module supports LTE Band 1, 3, 5, 7, 8 and WCDMA Band 1. Manual testing showed that only interface number two replies to QMI messages. T: Bus=01 Lev=02 Prnt=02 Port=01 Cnt=01 Dev#= 3 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=22de ProdID=9051 Rev= 3.18 S: Manufacturer=Android S: Product=Android S: SerialNumber=0123456789ABCDEF C:* #Ifs= 4 Cfg#= 1 Atr=80 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none) E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none) E: Ad=83(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan E: Ad=85(I) Atr=03(Int.) MxPS= 8 Ivl=32ms E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none) E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms Signed-off-by: Kyoungkyu Park <choryu.park@choryu.space> Acked-by: Bjørn Mork <bjorn@mork.no> Link: https://lore.kernel.org/r/Yd+nxAA6KorDpQFv@choryu-tfx5470h Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-01-13 09:24:51 -08:00
Wen Gu	20c9398d33	net/smc: Resolve the race between SMC-R link access and clear We encountered some crashes caused by the race between SMC-R link access and link clear that triggered by abnormal link group termination, such as port error. Here is an example of this kind of crashes: BUG: kernel NULL pointer dereference, address: 0000000000000000 Workqueue: smc_hs_wq smc_listen_work [smc] RIP: 0010:smc_llc_flow_initiate+0x44/0x190 [smc] Call Trace: <TASK> ? __smc_buf_create+0x75a/0x950 [smc] smcr_lgr_reg_rmbs+0x2a/0xbf [smc] smc_listen_work+0xf72/0x1230 [smc] ? process_one_work+0x25c/0x600 process_one_work+0x25c/0x600 worker_thread+0x4f/0x3a0 ? process_one_work+0x600/0x600 kthread+0x15d/0x1a0 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x1f/0x30 </TASK> smc_listen_work() __smc_lgr_terminate() --------------------------------------------------------------- \| smc_lgr_free() \| \|- smcr_link_clear() \| \|- memset(lnk, 0) smc_listen_rdma_reg() \| \|- smcr_lgr_reg_rmbs() \| \|- smc_llc_flow_initiate() \| \|- access lnk->lgr (panic) \| These crashes are similarly caused by clearing SMC-R link resources when some functions is still accessing to them. This patch tries to fix the issue by introducing reference count of SMC-R links and ensuring that the sensitive resources of links won't be cleared until reference count reaches zero. The operation to the SMC-R link reference count can be concluded as follows: object [hold or initialized as 1] [put] -------------------------------------------------------------------- links smcr_link_init() smcr_link_clear() connections smc_conn_create() smc_conn_free() Through this way, the clear of SMC-R links is later than the free of all the smc connections above it, thus avoiding the unsafe reference to SMC-R links. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-13 13:14:53 +00:00
Wen Gu	ea89c6c098	net/smc: Introduce a new conn->lgr validity check helper It is no longer suitable to identify whether a smc connection is registered in a link group through checking if conn->lgr is NULL, because conn->lgr won't be reset even the connection is unregistered from a link group. So this patch introduces a new helper smc_conn_lgr_valid() and replaces all the check of conn->lgr in original implementation with the new helper to judge if conn->lgr is valid to use. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-13 13:14:53 +00:00
Eric Dumazet	91341fa000	inet: frags: annotate races around fqdir->dead and fqdir->high_thresh Both fields can be read/written without synchronization, add proper accessors and documentation. Fixes: d5dd88794a13 ("inet: fix various use-after-free in defrags units") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-13 13:06:05 +00:00
David S. Miller	3ba8c6258e	Merge branch 'smc-race-fixes' Wen Gu says: ==================== net/smc: Fixes for race in smc link group termination We encountered some crashes recently and they are caused by the race between the access and free of link/link group in abnormal smc link group termination. The crashes can be reproduced in frequent abnormal link group termination, like setting RNICs up/down. This set of patches tries to fix this by extending the life cycle of link/link group to ensure that they won't be referred to after cleared or freed. v1 -> v2: - Improve some comments. - Move codes of waking up lgrs_deleted wait queue from smc_lgr_free() to __smc_lgr_free(). - Move codes of waking up links_deleted wait queue from smcr_link_clear() to __smcr_link_clear(). - Move codes of smc_ibdev_cnt_dec() and put_device() from smcr_link_clear() to __smcr_link_clear() - Move smc_lgr_put() to the end of __smcr_link_clear(). - Call smc_lgr_put() after 'out' tag in smcr_link_init() when link initialization fails. - Modify the location where smc connection holds the lgr or link. before: * hold lgr in smc_lgr_register_conn(). * hold link in smcr_lgr_conn_assign_link(). after: * hold both lgr and link in smc_conn_create(). Modify the location to symmetrical with the place where smc connections put the lgr or link, which is smc_conn_free(). - Initialize conn->freed as zero in smc_conn_create(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-13 12:55:40 +00:00
Wen Gu	61f434b028	net/smc: Resolve the race between link group access and termination We encountered some crashes caused by the race between the access and the termination of link groups. Here are some of panic stacks we met: 1) Race between smc_clc_wait_msg() and __smc_lgr_terminate() BUG: kernel NULL pointer dereference, address: 00000000000002f0 Workqueue: smc_hs_wq smc_listen_work [smc] RIP: 0010:smc_clc_wait_msg+0x3eb/0x5c0 [smc] Call Trace: <TASK> ? smc_clc_send_accept+0x45/0xa0 [smc] ? smc_clc_send_accept+0x45/0xa0 [smc] smc_listen_work+0x783/0x1220 [smc] ? finish_task_switch+0xc4/0x2e0 ? process_one_work+0x1ad/0x3c0 process_one_work+0x1ad/0x3c0 worker_thread+0x4c/0x390 ? rescuer_thread+0x320/0x320 kthread+0x149/0x190 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x1f/0x30 </TASK> smc_listen_work() abnormal case like port error --------------------------------------------------------------- \| __smc_lgr_terminate() \| \|- smc_conn_kill() \| \|- smc_lgr_unregister_conn() \| \|- set conn->lgr = NULL smc_clc_wait_msg() \| \|- access conn->lgr (panic) \| 2) Race between smc_setsockopt() and __smc_lgr_terminate() BUG: kernel NULL pointer dereference, address: 00000000000002e8 RIP: 0010:smc_setsockopt+0x17a/0x280 [smc] Call Trace: <TASK> __sys_setsockopt+0xfc/0x190 __x64_sys_setsockopt+0x20/0x30 do_syscall_64+0x34/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> smc_setsockopt() abnormal case like port error -------------------------------------------------------------- \| __smc_lgr_terminate() \| \|- smc_conn_kill() \| \|- smc_lgr_unregister_conn() \| \|- set conn->lgr = NULL mod_delayed_work() \| \|- access conn->lgr (panic) \| There are some other panic places and they are caused by the similar reason as described above, which is accessing link group after termination, thus getting a NULL pointer or invalid resource. Currently, there seems to be no synchronization between the link group access and a sudden termination of it. This patch tries to fix this by introducing reference count of link group and not freeing link group until reference count is zero. Link group might be referred to by links or smc connections. So the operation to the link group reference count can be concluded as follows: object [hold or initialized as 1] [put] ------------------------------------------------------------------- link group smc_lgr_create() smc_lgr_free() connections smc_conn_create() smc_conn_free() links smcr_link_init() smcr_link_clear() Througth this way, we extend the life cycle of link group and ensure it is longer than the life cycle of connections and links above it, so that avoid invalid access to link group after its termination. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-13 12:55:40 +00:00
Li Zhijian	de0e444706	kselftests/net: adapt the timeout to the largest runtime timeout in settings is used by each case under the same directory, so it should adapt to the maximum runtime. A normally running net/fib_nexthops.sh may be killed by this unsuitable timeout. Furthermore, since the defect[1] of kselftests framework, net/fib_nexthops.sh which might take at least (300 * 4) seconds would block the whole kselftests framework previously. $ git grep -w 'sleep 300' tools/testing/selftests/net tools/testing/selftests/net/fib_nexthops.sh: sleep 300 tools/testing/selftests/net/fib_nexthops.sh: sleep 300 tools/testing/selftests/net/fib_nexthops.sh: sleep 300 tools/testing/selftests/net/fib_nexthops.sh: sleep 300 Enlarge the timeout by plus 300 based on the obvious largest runtime to avoid the blocking. [1]: https://www.spinics.net/lists/kernel/msg4185370.html Signed-off-by: Zhou Jie <zhoujie2011@fujitsu.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-13 12:53:22 +00:00
Vladimir Oltean	33cb0ff30c	net: mscc: ocelot: don't let phylink re-enable TX PAUSE on the NPI port Since commit b39648079db4 ("net: mscc: ocelot: disable flow control on NPI interface"), flow control should be disabled on the DSA CPU port when used in NPI mode. However, the commit blamed in the Fixes: tag below broke this, because it allowed felix_phylink_mac_link_up() to overwrite SYS_PAUSE_CFG_PAUSE_ENA for the DSA CPU port. This issue became noticeable since the device tree update from commit 8fcea7be5736 ("arm64: dts: ls1028a: mark internal links between Felix and ENETC as capable of flow control"). The solution is to check whether this is the currently configured NPI port from ocelot_phylink_mac_link_up(), and to not modify the statically disabled PAUSE frame transmission if it is. When the port is configured for lossless mode as opposed to tail drop mode, but the link partner (DSA master) doesn't observe the transmitted PAUSE frames, the switch termination throughput is much worse, as can be seen below. Before: root@debian:~# iperf3 -c 192.168.100.2 Connecting to host 192.168.100.2, port 5201 [ 5] local 192.168.100.1 port 37504 connected to 192.168.100.2 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 28.4 MBytes 238 Mbits/sec 357 22.6 KBytes [ 5] 1.00-2.00 sec 33.6 MBytes 282 Mbits/sec 426 19.8 KBytes [ 5] 2.00-3.00 sec 34.0 MBytes 285 Mbits/sec 343 21.2 KBytes [ 5] 3.00-4.00 sec 32.9 MBytes 276 Mbits/sec 354 22.6 KBytes [ 5] 4.00-5.00 sec 32.3 MBytes 271 Mbits/sec 297 18.4 KBytes ^C[ 5] 5.00-5.06 sec 2.05 MBytes 270 Mbits/sec 45 19.8 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-5.06 sec 163 MBytes 271 Mbits/sec 1822 sender [ 5] 0.00-5.06 sec 0.00 Bytes 0.00 bits/sec receiver After: root@debian:~# iperf3 -c 192.168.100.2 Connecting to host 192.168.100.2, port 5201 [ 5] local 192.168.100.1 port 49470 connected to 192.168.100.2 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 112 MBytes 941 Mbits/sec 259 143 KBytes [ 5] 1.00-2.00 sec 110 MBytes 920 Mbits/sec 329 144 KBytes [ 5] 2.00-3.00 sec 112 MBytes 936 Mbits/sec 255 144 KBytes [ 5] 3.00-4.00 sec 110 MBytes 927 Mbits/sec 355 105 KBytes [ 5] 4.00-5.00 sec 110 MBytes 926 Mbits/sec 350 156 KBytes [ 5] 5.00-6.00 sec 110 MBytes 925 Mbits/sec 305 148 KBytes [ 5] 6.00-7.00 sec 110 MBytes 924 Mbits/sec 320 143 KBytes [ 5] 7.00-8.00 sec 110 MBytes 925 Mbits/sec 273 97.6 KBytes [ 5] 8.00-9.00 sec 109 MBytes 913 Mbits/sec 299 141 KBytes [ 5] 9.00-10.00 sec 110 MBytes 922 Mbits/sec 287 146 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 1.08 GBytes 926 Mbits/sec 3032 sender [ 5] 0.00-10.00 sec 1.08 GBytes 925 Mbits/sec receiver Fixes: de274be32cb2 ("net: dsa: felix: set TX flow control according to the phylink_mac_link_up resolution") Reported-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-13 12:52:15 +00:00
Colin Ian King	d7b4303411	atm: iphase: remove redundant pointer skb The pointer skb is redundant, it is assigned a value that is never read and hence can be removed. Cleans up clang scan warning: drivers/atm/iphase.c:205:18: warning: Although the value stored to 'skb' is used in the enclosing expression, the value is never actually read from 'skb' [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-13 12:50:48 +00:00
Maxim Mikityanskiy	de2d807b29	sch_api: Don't skip qdisc attach on ingress The attach callback of struct Qdisc_ops is used by only a few qdiscs: mq, mqprio and htb. qdisc_graft() contains the following logic (pseudocode): if (!qdisc->ops->attach) { if (ingress) do ingress stuff; else do egress stuff; } if (!ingress) { ... if (qdisc->ops->attach) qdisc->ops->attach(qdisc); } else { ... } As we see, the attach callback is not called if the qdisc is being attached to ingress (TC_H_INGRESS). That wasn't a problem for mq and mqprio, since they contain a check that they are attached to TC_H_ROOT, and they can't be attached to TC_H_INGRESS anyway. However, the commit cited below added the attach callback to htb. It is needed for the hardware offload, but in the non-offload mode it simulates the "do egress stuff" part of the pseudocode above. The problem is that when htb is attached to ingress, neither "do ingress stuff" nor attach() is called. It results in an inconsistency, and the following message is printed to dmesg: unregister_netdevice: waiting for lo to become free. Usage count = 2 This commit addresses the issue by running "do ingress stuff" in the ingress flow even in the attach callback is present, which is fine, because attach isn't going to be called afterwards. The bug was found by syzbot and reported by Eric. Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reported-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-13 12:34:59 +00:00
Pawel Dembicki	078c6a1cbd	net: qmi_wwan: add ZTE MF286D modem 19d2:1485 Modem from ZTE MF286D is an Qualcomm MDM9250 based 3G/4G modem. T: Bus=02 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 3 Spd=5000 MxCh= 0 D: Ver= 3.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 9 #Cfgs= 1 P: Vendor=19d2 ProdID=1485 Rev=52.87 S: Manufacturer=ZTE,Incorporated S: Product=ZTE Technologies MSM S: SerialNumber=MF286DZTED000000 C:* #Ifs= 7 Cfg#= 1 Atr=80 MxPwr=896mA A: FirstIf#= 0 IfCount= 2 Cls=02(comm.) Sub=06 Prot=00 I:* If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=02 Prot=ff Driver=rndis_host E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=rndis_host E: Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option E: Ad=83(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option E: Ad=85(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=84(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option E: Ad=87(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=86(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms I:* If#= 5 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan E: Ad=88(I) Atr=03(Int.) MxPS= 8 Ivl=32ms E: Ad=8e(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=0f(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms I:* If#= 6 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=usbfs E: Ad=05(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=89(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-01-13 12:34:08 +00:00
Ignat Korchagin	ed6ae5ca43	sit: allow encapsulated IPv6 traffic to be delivered locally While experimenting with FOU encapsulation Amir noticed that encapsulated IPv6 traffic fails to be delivered, if the peer IP address is configured locally. It can be easily verified by creating a sit interface like below: $ sudo ip link add name fou_test type sit remote 127.0.0.1 encap fou encap-sport auto encap-dport 1111 $ sudo ip link set fou_test up and sending some IPv4 and IPv6 traffic to it $ ping -I fou_test -c 1 1.1.1.1 $ ping6 -I fou_test -c 1 fe80::d0b0:dfff:fe4c:fcbc "tcpdump -i any udp dst port 1111" will confirm that only the first IPv4 ping was encapsulated and attempted to be delivered. This seems like a limitation: for example, in a cloud environment the "peer" service may be arbitrarily scheduled on any server within the cluster, where all nodes are trying to send encapsulated traffic. And the unlucky node will not be able to. Moreover, delivering encapsulated IPv4 traffic locally is allowed. But I may not have all the context about this restriction and this code predates the observable git history. Reported-by: Amir Razmjou <arazmjou@cloudflare.com> Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220107123842.211335-1-ignat@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-01-12 13:56:07 -08:00
Yevhen Orlov	e179f045f9	net: marvell: prestera: Fix deinit sequence for router * Add missed call prestera_router_fini in prestera_switch_fini * Add prestera_router_hw_fini, which verify lists are empty Fixes: 69204174cc5c ("net: marvell: prestera: Add prestera router infra") Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu> Link: https://lore.kernel.org/r/20220111011129.5457-1-yevhen.orlov@plvision.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-01-12 10:17:11 -08:00
Yevhen Orlov	32d098bb2e	net: marvell: prestera: Refactor router functions * Reverse xmas tree variables order * User friendly messages on error paths * Refactor __prestera_inetaddr_event to use early return Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu> Link: https://lore.kernel.org/r/20220111011051.4941-1-yevhen.orlov@plvision.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-01-12 10:17:04 -08:00
Yevhen Orlov	6a1ba8758f	net: marvell: prestera: Refactor get/put VR functions * Use refcount, instead of uint * Increment/decrement recount inside get/put * Fix error path in __prestera_vr_create. Remove unnecessary kfree. * Make __prestera_vr_destroy symmetric to "create" Fixes: bca5859bc6c6 ("net: marvell: prestera: add hardware router objects accounting") Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu> Link: https://lore.kernel.org/r/20220111011014.4418-1-yevhen.orlov@plvision.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-01-12 10:16:55 -08:00
Yevhen Orlov	9c0c2c7aa2	net: marvell: prestera: Cleanup router struct Field "aborted" was added in 69204174cc5c ("net: marvell: prestera: Add prestera router infra"). It will not be used. So remove. Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu> Link: https://lore.kernel.org/r/20220111010826.3779-1-yevhen.orlov@plvision.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-01-12 10:16:47 -08:00
Jakub Kicinski	2716a5271d	Merge branch 'arm-ox810se-add-ethernet-support' Neil Armstrong says: ==================== ARM: ox810se: Add Ethernet support This adds support for the Synopsys DWMAC controller found in the OX820SE SoC, by using almost the same glue code as the OX820. ==================== Link: https://lore.kernel.org/r/20220104145646.135877-1-narmstrong@baylibre.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-01-12 09:05:06 -08:00
Neil Armstrong	72f1f7e46c	net: stmmac: dwmac-oxnas: Add support for OX810SE Add support for OX810SE dwmac glue setup, which is a simplified version of the OX820 introduced later with more control on the PHY interface. Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-01-12 09:05:02 -08:00

1 2 3 4 5 ...

1065598 Commits