linux

iv/linux

Author	SHA1	Message	Date
Jakub Kicinski	cd101f40a4	MAINTAINERS: update SCTP maintainers Vlad has stepped away from SCTP related duties. Move him to CREDITS and add Xin Long. Subsystem SCTP PROTOCOL Changes 237 / 629 (37%) Last activity: 2022-12-12 Vlad Yasevich <vyasevich@gmail.com>: Neil Horman <nhorman@tuxdriver.com>: Author 20a785aa52c8 2020-05-19 00:00:00 4 Tags 20a785aa52c8 2020-05-19 00:00:00 84 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>: Author 557fb5862c92 2021-07-28 00:00:00 41 Tags da05cecc4939 2022-12-12 00:00:00 197 Top reviewers: [15]: lucien.xin@gmail.com INACTIVE MAINTAINER Vlad Yasevich <vyasevich@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-02 11:35:33 -08:00
Jakub Kicinski	c71a70c267	MAINTAINERS: ipv6: retire Hideaki Yoshifuji We very rarely hear from Hideaki Yoshifuji and the IPv4/IPv6 entry covers a lot of code. Asking people to CC someone who rarely responds feels wrong. Note that Hideaki Yoshifuji already has an entry in CREDITS for IPv6 so not adding another one. Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-02 11:35:33 -08:00
Jakub Kicinski	a359656256	mailmap: add John Crispin's entry John has not been CCed on some of the fixes which perhaps resulted in the lack of review tags: Subsystem MEDIATEK ETHERNET DRIVER Changes 50 / 295 (16%) Last activity: 2023-01-17 Felix Fietkau <nbd@nbd.name>: Author 8bd8dcc5e47f 2022-11-18 00:00:00 33 Tags 8bd8dcc5e47f 2022-11-18 00:00:00 38 John Crispin <john@phrozen.org>: Sean Wang <sean.wang@mediatek.com>: Author 880c2d4b2fdf 2019-06-03 00:00:00 7 Tags a5d75538295b 2020-04-07 00:00:00 10 Mark Lee <Mark-MC.Lee@mediatek.com>: Author 8d66a8183d0c 2019-11-14 00:00:00 4 Tags 8d66a8183d0c 2019-11-14 00:00:00 4 Lorenzo Bianconi <lorenzo@kernel.org>: Author 08a764a7c51b 2023-01-17 00:00:00 68 Tags 08a764a7c51b 2023-01-17 00:00:00 74 Top reviewers: [12]: leonro@nvidia.com [6]: f.fainelli@gmail.com [6]: andrew@lunn.ch INACTIVE MAINTAINER John Crispin <john@phrozen.org> map his old address to the up to date one. Acked-by: John Crispin <john@phrozen.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-02 11:35:33 -08:00
Jakub Kicinski	57b24f8c30	MAINTAINERS: bonding: move Veaceslav Falico to CREDITS Veaceslav has stepped away from netdev: Subsystem BONDING DRIVER Changes 96 / 319 (30%) Last activity: 2022-12-01 Jay Vosburgh <j.vosburgh@gmail.com>: Author 4f5d33f4f798 2022-08-11 00:00:00 3 Tags e5214f363dab 2022-12-01 00:00:00 48 Veaceslav Falico <vfalico@gmail.com>: Andy Gospodarek <andy@greyhouse.net>: Tags 47f706262f1d 2019-02-24 00:00:00 4 Top reviewers: [42]: jay.vosburgh@canonical.com [18]: jiri@nvidia.com [10]: jtoppins@redhat.com INACTIVE MAINTAINER Veaceslav Falico <vfalico@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-02 11:35:33 -08:00
Fedor Pchelkin	0c598aed44	net: openvswitch: fix flow memory leak in ovs_flow_cmd_new Syzkaller reports a memory leak of new_flow in ovs_flow_cmd_new() as it is not freed when an allocation of a key fails. BUG: memory leak unreferenced object 0xffff888116668000 (size 632): comm "syz-executor231", pid 1090, jiffies 4294844701 (age 18.871s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000defa3494>] kmem_cache_zalloc include/linux/slab.h:654 [inline] [<00000000defa3494>] ovs_flow_alloc+0x19/0x180 net/openvswitch/flow_table.c:77 [<00000000c67d8873>] ovs_flow_cmd_new+0x1de/0xd40 net/openvswitch/datapath.c:957 [<0000000010a539a8>] genl_family_rcv_msg_doit+0x22d/0x330 net/netlink/genetlink.c:739 [<00000000dff3302d>] genl_family_rcv_msg net/netlink/genetlink.c:783 [inline] [<00000000dff3302d>] genl_rcv_msg+0x328/0x590 net/netlink/genetlink.c:800 [<000000000286dd87>] netlink_rcv_skb+0x153/0x430 net/netlink/af_netlink.c:2515 [<0000000061fed410>] genl_rcv+0x24/0x40 net/netlink/genetlink.c:811 [<000000009dc0f111>] netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline] [<000000009dc0f111>] netlink_unicast+0x545/0x7f0 net/netlink/af_netlink.c:1339 [<000000004a5ee816>] netlink_sendmsg+0x8e7/0xde0 net/netlink/af_netlink.c:1934 [<00000000482b476f>] sock_sendmsg_nosec net/socket.c:651 [inline] [<00000000482b476f>] sock_sendmsg+0x152/0x190 net/socket.c:671 [<00000000698574ba>] ____sys_sendmsg+0x70a/0x870 net/socket.c:2356 [<00000000d28d9e11>] ___sys_sendmsg+0xf3/0x170 net/socket.c:2410 [<0000000083ba9120>] __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439 [<00000000c00628f8>] do_syscall_64+0x30/0x40 arch/x86/entry/common.c:46 [<000000004abfdcf4>] entry_SYSCALL_64_after_hwframe+0x61/0xc6 To fix this the patch rearranges the goto labels to reflect the order of object allocations and adds appropriate goto statements on the error paths. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: 68bb10101e6b ("openvswitch: Fix flow lookup to use unmasked key") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230201210218.361970-1-pchelkin@ispras.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-02 11:32:51 -08:00
Arınç ÜNAL	a1f47752fd	net: ethernet: mtk_eth_soc: disable hardware DSA untagging for second MAC According to my tests on MT7621AT and MT7623NI SoCs, hardware DSA untagging won't work on the second MAC. Therefore, disable this feature when the second MAC of the MT7621 and MT7623 SoCs is being used. Fixes: 2d7605a72906 ("net: ethernet: mtk_eth_soc: enable hardware DSA untagging") Link: https://lore.kernel.org/netdev/6249fc14-b38a-c770-36b4-5af6d41c21d3@arinc9.com/ Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Link: https://lore.kernel.org/r/20230128094232.2451947-1-arinc.unal@arinc9.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-02 11:30:41 -08:00
Parav Pandit	63b114042d	virtio-net: Keep stop() to follow mirror sequence of open() Cited commit in fixes tag frees rxq xdp info while RQ NAPI is still enabled and packet processing may be ongoing. Follow the mirror sequence of open() in the stop() callback. This ensures that when rxq info is unregistered, no rx packet processing is ongoing. Fixes: 754b8a21a96d ("virtio_net: setup xdp_rxq_info") Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Parav Pandit <parav@nvidia.com> Link: https://lore.kernel.org/r/20230202163516.12559-1-parav@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-02 10:10:53 -08:00
Bo Liu	b18ea3d9d2	net: dsa: Use sysfs_emit() to instead of sprintf() Follow the advice of the Documentation/filesystems/sysfs.rst and show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: Bo Liu <liubo03@inspur.com> Link: https://lore.kernel.org/r/20230201081438.3151-1-liubo03@inspur.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-02 15:28:59 +01:00
Paolo Abeni	886d2278a6	Merge branch 'amd-xgbe-add-support-for-2-5gbe-and-rx-adaptation' Raju Rangoju says: ==================== amd-xgbe: add support for 2.5GbE and rx-adaptation This patch series adds support for 2.GbE in 10GBaseT mode and rx-adaptation support for Yellow Carp devices. 1) Support for 2.5GbE: Add the necessary changes to the driver to fully recognize and enable 2.5GbE speed in 10GBaseT mode. 2) Support for rx-adaptation: In order to support the 10G backplane mode without Auto-negotiation and to support the longer-length DAC cables, it requires PHY to perform RX Adaptation sequence as mentioned in the Synopsys databook. Add the necessary changes to Yellow Carp devices to ensure seamless RX Adaptation for 10G-SFI (LONG DAC), and 10G-KR modes without Auto-Negotiation (CL72 not present) ==================== Link: https://lore.kernel.org/r/20230201054932.212700-1-Raju.Rangoju@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-02 15:17:22 +01:00
Raju Rangoju	4f3b20bfbb	amd-xgbe: add support for rx-adaptation The existing implementation for non-Autonegotiation 10G speed modes does not enable RX adaptation in the Driver and FW. The RX Equalization settings (AFE settings alone) are manually configured and the existing link-up sequence in the driver does not perform rx adaptation process as mentioned in the Synopsys databook. There's a customer request for 10G backplane mode without Auto-negotiation and for the DAC cables of more significant length that follow the non-Autonegotiation mode. These modes require PHY to perform RX Adaptation. The proposed logic adds the necessary changes to Yellow Carp devices to ensure seamless RX Adaptation for 10G-SFI (LONG DAC) and 10G-KR without AN (CL72 not present). The RX adaptation core algorithm is executed by firmware, however, to achieve that a new mailbox sub-command is required to be sent by the driver. Co-developed-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-02 15:17:19 +01:00
Raju Rangoju	3ee217c47b	amd-xgbe: add 2.5GbE support to 10G BaseT mode Add support to the driver to fully recognize and enable 2.5GbE speed in 10GBaseT mode. Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-02 15:17:19 +01:00
Andrei Gherzan	329c9cd769	selftests: net: udpgso_bench_tx: Cater for pending datagrams zerocopy benchmarking The test tool can check that the zerocopy number of completions value is valid taking into consideration the number of datagram send calls. This can catch the system into a state where the datagrams are still in the system (for example in a qdisk, waiting for the network interface to return a completion notification, etc). This change adds a retry logic of computing the number of completions up to a configurable (via CLI) timeout (default: 2 seconds). Fixes: 79ebc3c26010 ("net/udpgso_bench_tx: options to exercise TX CMSG") Signed-off-by: Andrei Gherzan <andrei.gherzan@canonical.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20230201001612.515730-4-andrei.gherzan@canonical.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-02 13:29:51 +01:00
Andrei Gherzan	dafe93b9ee	selftests: net: udpgso_bench: Fix racing bug between the rx/tx programs "udpgro_bench.sh" invokes udpgso_bench_rx/udpgso_bench_tx programs subsequently and while doing so, there is a chance that the rx one is not ready to accept socket connections. This racing bug could fail the test with at least one of the following: ./udpgso_bench_tx: connect: Connection refused ./udpgso_bench_tx: sendmsg: Connection refused ./udpgso_bench_tx: write: Connection refused This change addresses this by making udpgro_bench.sh wait for the rx program to be ready before firing off the tx one - up to a 10s timeout. Fixes: 3a687bef148d ("selftests: udp gso benchmark") Signed-off-by: Andrei Gherzan <andrei.gherzan@canonical.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20230201001612.515730-3-andrei.gherzan@canonical.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-02 13:29:51 +01:00
Andrei Gherzan	db9b47ee9f	selftests: net: udpgso_bench_rx/tx: Stop when wrong CLI args are provided Leaving unrecognized arguments buried in the output, can easily hide a CLI/script typo. Avoid this by exiting when wrong arguments are provided to the udpgso_bench test programs. Fixes: 3a687bef148d ("selftests: udp gso benchmark") Signed-off-by: Andrei Gherzan <andrei.gherzan@canonical.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20230201001612.515730-2-andrei.gherzan@canonical.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-02 13:29:51 +01:00
Andrei Gherzan	c03c80e3a0	selftests: net: udpgso_bench_rx: Fix 'used uninitialized' compiler warning This change fixes the following compiler warning: /usr/include/x86_64-linux-gnu/bits/error.h:40:5: warning: ‘gso_size’ may be used uninitialized [-Wmaybe-uninitialized] 40 \| __error_noreturn (__status, __errnum, __format, __va_arg_pack ()); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ udpgso_bench_rx.c: In function ‘main’: udpgso_bench_rx.c:253:23: note: ‘gso_size’ was declared here 253 \| int ret, len, gso_size, budget = 256; Fixes: 3327a9c46352 ("selftests: add functionals test for UDP GRO") Signed-off-by: Andrei Gherzan <andrei.gherzan@canonical.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20230201001612.515730-1-andrei.gherzan@canonical.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-02 13:29:51 +01:00
Paolo Abeni	8b6f322e47	Merge branch 'net-sched-transition-act_pedit-to-rcu-and-percpu-stats' Pedro Tammela says: ==================== net/sched: transition act_pedit to rcu and percpu stats The software pedit action didn't get the same love as some of the other actions and it's still using spinlocks and shared stats. Therefore, transition the action to rcu and percpu stats which improves the action's performance. We test this change with a very simple packet forwarding setup: tc filter add dev ens2f0 ingress protocol ip matchall \ action pedit ex munge eth src set b8:ce:f6:4b:68:35 pipe \ action pedit ex munge eth dst set ac:1f:6b:e4:ff:93 pipe \ action mirred egress redirect dev ens2f1 tc filter add dev ens2f1 ingress protocol ip matchall \ action pedit ex munge eth src set b8:ce:f6:4b:68:34 pipe \ action pedit ex munge eth dst set ac:1f:6b:e4:ff:92 pipe \ action mirred egress redirect dev ens2f0 Using TRex with a http-like profile, in our setup with a 25G NIC and a 26 cores Intel CPU, we observe the following in perf: before: 11.59% 2.30% [kernel] [k] tcf_pedit_act 2.55% tcf_pedit_act 8.38% _raw_spin_lock 6.43% native_queued_spin_lock_slowpath after: 1.46% 1.46% [kernel] [k] tcf_pedit_act tdc results for pedit after the patch: 1..69 ok 1 319a - Add pedit action that mangles IP TTL ok 2 7e67 - Replace pedit action with invalid goto chain ok 3 377e - Add pedit action with RAW_OP offset u32 ok 4 a0ca - Add pedit action with RAW_OP offset u32 (INVALID) ok 5 dd8a - Add pedit action with RAW_OP offset u16 u16 ok 6 53db - Add pedit action with RAW_OP offset u16 (INVALID) ok 7 5c7e - Add pedit action with RAW_OP offset u8 add value ok 8 2893 - Add pedit action with RAW_OP offset u8 quad ok 9 3a07 - Add pedit action with RAW_OP offset u8-u16-u8 ok 10 ab0f - Add pedit action with RAW_OP offset u16-u8-u8 ok 11 9d12 - Add pedit action with RAW_OP offset u32 set u16 clear u8 invert ok 12 ebfa - Add pedit action with RAW_OP offset overflow u32 (INVALID) ok 13 f512 - Add pedit action with RAW_OP offset u16 at offmask shift set ok 14 c2cb - Add pedit action with RAW_OP offset u32 retain value ok 15 1762 - Add pedit action with RAW_OP offset u8 clear value ok 16 bcee - Add pedit action with RAW_OP offset u8 retain value ok 17 e89f - Add pedit action with RAW_OP offset u16 retain value ok 18 c282 - Add pedit action with RAW_OP offset u32 clear value ok 19 c422 - Add pedit action with RAW_OP offset u16 invert value ok 20 d3d3 - Add pedit action with RAW_OP offset u32 invert value ok 21 57e5 - Add pedit action with RAW_OP offset u8 preserve value ok 22 99e0 - Add pedit action with RAW_OP offset u16 preserve value ok 23 1892 - Add pedit action with RAW_OP offset u32 preserve value ok 24 4b60 - Add pedit action with RAW_OP negative offset u16/u32 set value ok 25 a5a7 - Add pedit action with LAYERED_OP eth set src ok 26 86d4 - Add pedit action with LAYERED_OP eth set src & dst ok 27 f8a9 - Add pedit action with LAYERED_OP eth set dst ok 28 c715 - Add pedit action with LAYERED_OP eth set src (INVALID) ok 29 8131 - Add pedit action with LAYERED_OP eth set dst (INVALID) ok 30 ba22 - Add pedit action with LAYERED_OP eth type set/clear sequence ok 31 dec4 - Add pedit action with LAYERED_OP eth set type (INVALID) ok 32 ab06 - Add pedit action with LAYERED_OP eth add type ok 33 918d - Add pedit action with LAYERED_OP eth invert src ok 34 a8d4 - Add pedit action with LAYERED_OP eth invert dst ok 35 ee13 - Add pedit action with LAYERED_OP eth invert type ok 36 7588 - Add pedit action with LAYERED_OP ip set src ok 37 0fa7 - Add pedit action with LAYERED_OP ip set dst ok 38 5810 - Add pedit action with LAYERED_OP ip set src & dst ok 39 1092 - Add pedit action with LAYERED_OP ip set ihl & dsfield ok 40 02d8 - Add pedit action with LAYERED_OP ip set ttl & protocol ok 41 3e2d - Add pedit action with LAYERED_OP ip set ttl (INVALID) ok 42 31ae - Add pedit action with LAYERED_OP ip ttl clear/set ok 43 486f - Add pedit action with LAYERED_OP ip set duplicate fields ok 44 e790 - Add pedit action with LAYERED_OP ip set ce, df, mf, firstfrag, nofrag fields ok 45 cc8a - Add pedit action with LAYERED_OP ip set tos ok 46 7a17 - Add pedit action with LAYERED_OP ip set precedence ok 47 c3b6 - Add pedit action with LAYERED_OP ip add tos ok 48 43d3 - Add pedit action with LAYERED_OP ip add precedence ok 49 438e - Add pedit action with LAYERED_OP ip clear tos ok 50 6b1b - Add pedit action with LAYERED_OP ip clear precedence ok 51 824a - Add pedit action with LAYERED_OP ip invert tos ok 52 106f - Add pedit action with LAYERED_OP ip invert precedence ok 53 6829 - Add pedit action with LAYERED_OP beyond ip set dport & sport ok 54 afd8 - Add pedit action with LAYERED_OP beyond ip set icmp_type & icmp_code ok 55 3143 - Add pedit action with LAYERED_OP beyond ip set dport (INVALID) ok 56 815c - Add pedit action with LAYERED_OP ip6 set src ok 57 4dae - Add pedit action with LAYERED_OP ip6 set dst ok 58 fc1f - Add pedit action with LAYERED_OP ip6 set src & dst ok 59 6d34 - Add pedit action with LAYERED_OP ip6 dst retain value (INVALID) ok 60 94bb - Add pedit action with LAYERED_OP ip6 traffic_class ok 61 6f5e - Add pedit action with LAYERED_OP ip6 flow_lbl ok 62 6795 - Add pedit action with LAYERED_OP ip6 set payload_len, nexthdr, hoplimit ok 63 1442 - Add pedit action with LAYERED_OP tcp set dport & sport ok 64 b7ac - Add pedit action with LAYERED_OP tcp sport set (INVALID) ok 65 cfcc - Add pedit action with LAYERED_OP tcp flags set ok 66 3bc4 - Add pedit action with LAYERED_OP tcp set dport, sport & flags fields ok 67 f1c8 - Add pedit action with LAYERED_OP udp set dport & sport ok 68 d784 - Add pedit action with mixed RAW/LAYERED_OP #1 ok 69 70ca - Add pedit action with mixed RAW/LAYERED_OP #2 ==================== Link: https://lore.kernel.org/r/20230131190512.3805897-1-pctammela@mojatatu.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-02 13:19:07 +01:00
Pedro Tammela	95b0693823	net/sched: simplify tcf_pedit_act Remove the check for a negative number of keys as this cannot ever happen Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-02 13:19:02 +01:00
Pedro Tammela	52cf89f78c	net/sched: transition act_pedit to rcu and percpu stats The software pedit action didn't get the same love as some of the other actions and it's still using spinlocks and shared stats in the datapath. Transition the action to rcu and percpu stats as this improves the action's performance dramatically on multiple cpu deployments. Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-02 13:19:02 +01:00
Paolo Abeni	a8248fc4ad	rxrpc development -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmPZRFAACgkQ+7dXa6fL C2sf2w/+JPdtpughFCFQEztVYrw6pDr3EU4iDGrbOW1GDGHppTP+l6SaGXIG3uLD ahd/3PrTCDNwSEu7acY7RADKtse8DT2nzEutoOStTSFISpD49d/K1h7jWfvCaDO8 p18rmRPnn/MTHmWOBazivBuKiA7W8tiCm+CSIEXldfuBZ1cHTY2qjR/MdSjEnMQX T2cuCONkb7yuT0K8BCFm2Oj3LPi7EoMorie18dY9zdEsBK22unqssQ/hWzbYHxX9 JC2Xqx8G76xCDKymejpf75V3NrO/OYjr0X8JItb77uc6Fc7i944cK5WM+1ikPnox kFfPauT/vIOZFBm0bXxzVN7x4VOtfO6wc8VNNtuCYKKJFwqwbbT82bLXfVGwEFNf ZC7KeU6y3dN2iEqcPpytYIocIUE3SkYwEk8xBWGK3Ufm4MK06QqqcFowUPJbDAnQ 9RXdO6JPdswUmDtrgaqtx9K/+BWH7eRFtmbySG2+ZTIVEfthFPPr0ZGtFSGukdBr 0VGF5oUywAyW36pPGgTosOysC+wZlDdKKKlVzhWxbzhvF/+U+QGW7W0BZH6aqN8C ZslReMY4xauzluAMdyyX/ZewBCjdlX2wkOPIgieZPO0LL+ptB+gpCXXwmzVzO6ln 2i9+se4MOiWMSVENdvYRVN5tcZwf9u/r8asX4QsqtRBhjdNI5n8= =s0Uq -----END PGP SIGNATURE----- Merge tag 'rxrpc-next-20230131' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== Here's the fifth part of patches in the process of moving rxrpc from doing a lot of its stuff in softirq context to doing it in an I/O thread in process context and thereby making it easier to support a larger SACK table. The full description is in the description for the first part[1] which is now upstream. The second and third parts are also upstream[2]. A subset of the original fourth part[3] got applied as a fix for a race[4]. The fifth part includes some cleanups: (1) Miscellaneous trace header cleanups: fix a trace string, display the security index in rx_packet rather than displaying the type twice, remove some whitespace to make checkpatch happier and remove some excess tabulation. (2) Convert ->recvmsg_lock to a spinlock as it's only ever locked exclusively. (3) Make ->ackr_window and ->ackr_nr_unacked non-atomic as they're only used in the I/O thread. (4) Don't use call->tx_lock to access ->tx_buffer as that is only accessed inside the I/O thread. sendmsg() loads onto ->tx_sendmsg and the I/O thread decants from that to the buffer. (5) Remove local->defrag_sem as DATA packets are transmitted serially by the I/O thread. (6) Remove the service connection bundle is it was only used for its channel_lock - which has now gone. And some more significant changes: (7) Add a debugging option to allow a delay to be injected into packet reception to help investigate the behaviour over longer links than just a few cm. (8) Generate occasional PING ACKs to probe for RTT information during a receive heavy call. (9) Simplify the SACK table maintenance and ACK generation. Now that both parts are done in the same thread, there's no possibility of a race and no need to try and be cunning to avoid taking a BH spinlock whilst copying the SACK table (which in the future will be up to 2K) and no need to rotate the copy to fit the ACK packet table. (10) Use SKB_CONSUMED when freeing received DATA packets (stop dropwatch complaining). * tag 'rxrpc-next-20230131' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: rxrpc: Kill service bundle rxrpc: Change rx_packet tracepoint to display securityIndex not type twice rxrpc: Show consumed and freed packets as non-dropped in dropwatch rxrpc: Remove local->defrag_sem rxrpc: Don't lock call->tx_lock to access call->tx_buffer rxrpc: Simplify ACK handling rxrpc: De-atomic call->ackr_window and call->ackr_nr_unacked rxrpc: Generate extra pings for RTT during heavy-receive call rxrpc: Allow a delay to be injected into packet reception rxrpc: Convert call->recvmsg_lock to a spinlock rxrpc: Shrink the tabulation in the rxrpc trace header a bit rxrpc: Remove whitespace before ')' in trace header rxrpc: Fix trace string ==================== Link: https://lore.kernel.org/all/20230131171227.3912130-1-dhowells@redhat.com/ Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-02 12:47:28 +01:00
Hans de Goede	eecf2acd4a	platform/x86: touchscreen_dmi: Add Chuwi Vi8 (CWI501) DMI match Add a DMI match for the CWI501 version of the Chuwi Vi8 tablet, pointing to the same chuwi_vi8_data as the existing CWI506 version DMI match. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20230202103413.331459-1-hdegoede@redhat.com	2023-02-02 11:34:38 +01:00
Marc Kleine-Budde	1613fff7a3	can: mcp251xfd: mcp251xfd_ring_set_ringparam(): assign missing tx_obj_num_coalesce_irq If the a new ring layout is set, the max coalesced frames for RX and TX are re-calculated, too. Add the missing assignment of the newly calculated TX max coalesced frames. Fixes: 656fc12ddaf8 ("can: mcp251xfd: add TX IRQ coalescing ethtool support") Link: https://lore.kernel.org/all/20230130154334.1578518-1-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-02-02 10:33:27 +01:00
Oliver Hartkopp	4f027cba82	can: isotp: split tx timer into transmission and timeout The timer for the transmission of isotp PDUs formerly had two functions: 1. send two consecutive frames with a given time gap 2. monitor the timeouts for flow control frames and the echo frames This led to larger txstate checks and potentially to a problem discovered by syzbot which enabled the panic_on_warn feature while testing. The former 'txtimer' function is split into 'txfrtimer' and 'txtimer' to handle the two above functionalities with separate timer callbacks. The two simplified timers now run in one-shot mode and make the state transitions (especially with isotp_rcv_echo) better understandable. Fixes: 866337865f37 ("can: isotp: fix tx state handling for echo tx processing") Reported-by: syzbot+5aed6c3aaba661f5b917@syzkaller.appspotmail.com Cc: stable@vger.kernel.org # >= v6.0 Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20230104145701.2422-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-02-02 10:33:26 +01:00
Oliver Hartkopp	823b2e4272	can: isotp: handle wait_event_interruptible() return values When wait_event_interruptible() has been interrupted by a signal the tx.state value might not be ISOTP_IDLE. Force the state machines into idle state to inhibit the timer handlers to continue working. Fixes: 866337865f37 ("can: isotp: fix tx state handling for echo tx processing") Cc: stable@vger.kernel.org Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20230112192347.1944-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-02-02 10:33:26 +01:00
Oliver Hartkopp	3793301cba	can: raw: fix CAN FD frame transmissions over CAN XL devices A CAN XL device is always capable to process CAN FD frames. The former check when sending CAN FD frames relied on the existence of a CAN FD device and did not check for a CAN XL device that would be correct too. With this patch the CAN FD feature is enabled automatically when CAN XL is switched on - and CAN FD cannot be switch off while CAN XL is enabled. This precondition also leads to a clean up and reduction of checks in the hot path in raw_rcv() and raw_sendmsg(). Some conditions are reordered to handle simple checks first. changes since v1: https://lore.kernel.org/all/20230131091012.50553-1-socketcan@hartkopp.net - fixed typo: devive -> device changes since v2: https://lore.kernel.org/all/20230131091824.51026-1-socketcan@hartkopp.net/ - reorder checks in if statements to handle simple checks first Fixes: 626332696d75 ("can: raw: add CAN XL support") Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20230131105613.55228-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-02-02 10:33:26 +01:00
Ziyang Xuan	d0553680f9	can: j1939: fix errant WARN_ON_ONCE in j1939_session_deactivate The conclusion "j1939_session_deactivate() should be called with a session ref-count of at least 2" is incorrect. In some concurrent scenarios, j1939_session_deactivate can be called with the session ref-count less than 2. But there is not any problem because it will check the session active state before session putting in j1939_session_deactivate_locked(). Here is the concurrent scenario of the problem reported by syzbot and my reproduction log. cpu0 cpu1 j1939_xtp_rx_eoma j1939_xtp_rx_abort_one j1939_session_get_by_addr [kref == 2] j1939_session_get_by_addr [kref == 3] j1939_session_deactivate [kref == 2] j1939_session_put [kref == 1] j1939_session_completed j1939_session_deactivate WARN_ON_ONCE(kref < 2) ===================================================== WARNING: CPU: 1 PID: 21 at net/can/j1939/transport.c:1088 j1939_session_deactivate+0x5f/0x70 CPU: 1 PID: 21 Comm: ksoftirqd/1 Not tainted 5.14.0-rc7+ #32 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014 RIP: 0010:j1939_session_deactivate+0x5f/0x70 Call Trace: j1939_session_deactivate_activate_next+0x11/0x28 j1939_xtp_rx_eoma+0x12a/0x180 j1939_tp_recv+0x4a2/0x510 j1939_can_recv+0x226/0x380 can_rcv_filter+0xf8/0x220 can_receive+0x102/0x220 ? process_backlog+0xf0/0x2c0 can_rcv+0x53/0xf0 __netif_receive_skb_one_core+0x67/0x90 ? process_backlog+0x97/0x2c0 __netif_receive_skb+0x22/0x80 Fixes: 0c71437dd50d ("can: j1939: j1939_session_deactivate(): clarify lifetime of session object") Reported-by: syzbot+9981a614060dcee6eeca@syzkaller.appspotmail.com Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/all/20210906094200.95868-1-william.xuanziyang@huawei.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-02-02 10:33:26 +01:00
Michael Kelley	99f1c46011	hv_netvsc: Fix missed pagebuf entries in netvsc_dma_map/unmap() netvsc_dma_map() and netvsc_dma_unmap() currently check the cp_partial flag and adjust the page_count so that pagebuf entries for the RNDIS portion of the message are skipped when it has already been copied into a send buffer. But this adjustment has already been made by code in netvsc_send(). The duplicate adjustment causes some pagebuf entries to not be mapped. In a normal VM, this doesn't break anything because the mapping doesn’t change the PFN. But in a Confidential VM, dma_map_single() does bounce buffering and provides a different PFN. Failing to do the mapping causes the wrong PFN to be passed to Hyper-V, and various errors ensue. Fix this by removing the duplicate adjustment in netvsc_dma_map() and netvsc_dma_unmap(). Fixes: 846da38de0e8 ("net: netvsc: Add Isolation VM support for netvsc driver") Cc: stable@vger.kernel.org Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://lore.kernel.org/r/1675135986-254490-1-git-send-email-mikelley@microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2023-02-02 09:15:30 +01:00
Sunil Goutham	609aa68d60	octeontx2-af: Removed unnecessary debug messages. NPC exact match feature is supported only on one silicon variant, removed debug messages which print that this feature is not available on all other silicon variants. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230201040301.1034843-1-rkannoth@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 21:33:08 -08:00
Heng Qi	981f14d42a	virtio-net: fix possible unsigned integer overflow When the single-buffer xdp is loaded and after xdp_linearize_page() is called, num_buf becomes 0 and (num_buf - 1) may overflow into a large integer in virtnet_build_xdp_buff_mrg(), resulting in unexpected packet dropping. Fixes: ef75cb51f139 ("virtio-net: build xdp_buff with multi buffers") Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20230131085004.98687-1-hengqi@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 21:31:56 -08:00
Ratheesh Kannoth	917d5e04d4	octeontx2-af: Fix devlink unregister Exact match feature is only available in CN10K-B. Unregister exact match devlink entry only for this silicon variant. Fixes: 87e4ea29b030 ("octeontx2-af: Debugsfs support for exact match.") Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230131061659.1025137-1-rkannoth@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 21:22:41 -08:00
Tom Rix	a2df8463e1	igc: return an error if the mac type is unknown in igc_ptp_systim_to_hwtstamp() clang static analysis reports drivers/net/ethernet/intel/igc/igc_ptp.c:673:3: warning: The left operand of '+' is a garbage value [core.UndefinedBinaryOperatorResult] ktime_add_ns(shhwtstamps.hwtstamp, adjust); ^ ~~~~~~~~~~~~~~~~~~~~ igc_ptp_systim_to_hwtstamp() silently returns without setting the hwtstamp if the mac type is unknown. This should be treated as an error. Fixes: 81b055205e8b ("igc: Add support for RX timestamping") Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230131215437.1528994-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 21:19:57 -08:00
Leon Romanovsky	028fb19c6b	netlink: provide an ability to set default extack message In netdev common pattern, extack pointer is forwarded to the drivers to be filled with error message. However, the caller can easily overwrite the filled message. Instead of adding multiple "if (!extack->_msg)" checks before any NL_SET_ERR_MSG() call, which appears after call to the driver, let's add new macro to common code. [1] https://lore.kernel.org/all/Y9Irgrgf3uxOjwUm@unreal Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/6993fac557a40a1973dfa0095107c3d03d40bec1.1675171790.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 21:04:09 -08:00
Brian Haley	62e395f82d	neighbor: fix proxy_delay usage when it is zero When set to zero, the neighbor sysctl proxy_delay value does not cause an immediate reply for ARP/ND requests as expected, it instead causes a random delay between [0, U32_MAX). Looking at this comment from __get_random_u32_below() explains the reason: /* * This function is technically undefined for ceil == 0, and in fact * for the non-underscored constant version in the header, we build bug * on that. But for the non-constant case, it's convenient to have that * evaluate to being a straight call to get_random_u32(), so that * get_random_u32_inclusive() can work over its whole range without * undefined behavior. */ Added helper function that does not call get_random_u32_below() if proxy_delay is zero and just uses the current value of jiffies instead, causing pneigh_enqueue() to respond immediately. Also added definition of proxy_delay to ip-sysctl.txt since it was missing. Signed-off-by: Brian Haley <haleyb.dev@gmail.com> Link: https://lore.kernel.org/r/20230130171428.367111-1-haleyb.dev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 21:02:54 -08:00
Jakub Kicinski	983f507c30	Merge branch 'net-support-ipv4-big-tcp' Xin Long says: ==================== net: support ipv4 big tcp This is similar to the BIG TCP patchset added by Eric for IPv6: https://lwn.net/Articles/895398/ Different from IPv6, IPv4 tot_len is 16-bit long only, and IPv4 header doesn't have exthdrs(options) for the BIG TCP packets' length. To make it simple, as David and Paolo suggested, we set IPv4 tot_len to 0 to indicate this might be a BIG TCP packet and use skb->len as the real IPv4 total length. This will work safely, as all BIG TCP packets are GSO/GRO packets and processed on the same host as they were created; There is no padding in GSO/GRO packets, and skb->len - network_offset is exactly the IPv4 packet total length; Also, before implementing the feature, all those places that may get iph tot_len from BIG TCP packets are taken care with some new APIs: Patch 1 adds some APIs for iph tot_len setting and getting, which are used in all these places where IPv4 BIG TCP packets may reach in Patch 2-7, Patch 8 adds a GSO_TCP tp_status for af_packet users, and Patch 9 add new netlink attributes to make IPv4 BIG TCP independent from IPv6 BIG TCP on configuration, and Patch 10 implements this feature. Note that the similar change as in Patch 2-6 are also needed for IPv6 BIG TCP packets, and will be addressed in another patchset. The similar performance test is done for IPv4 BIG TCP with 25Gbit NIC and 1.5K MTU: No BIG TCP: for i in {1..10}; do netperf -t TCP_RR -H 192.168.100.1 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT\|tail -1; done 168 322 337 3776.49 143 236 277 4654.67 128 258 288 4772.83 171 229 278 4645.77 175 228 243 4678.93 149 239 279 4599.86 164 234 268 4606.94 155 276 289 4235.82 180 255 268 4418.95 168 241 249 4417.82 Enable BIG TCP: ip link set dev ens1f0np0 gro_ipv4_max_size 128000 gso_ipv4_max_size 128000 for i in {1..10}; do netperf -t TCP_RR -H 192.168.100.1 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT\|tail -1; done 161 241 252 4821.73 174 205 217 5098.28 167 208 220 5001.43 164 228 249 4883.98 150 233 249 4914.90 180 233 244 4819.66 154 208 219 5004.92 157 209 247 4999.78 160 218 246 4842.31 174 206 217 5080.99 Thanks for the feedback from Eric and David Ahern. ==================== Link: https://lore.kernel.org/r/cover.1674921359.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 20:54:30 -08:00
Xin Long	b1a78b9b98	net: add support for ipv4 big tcp Similar to Eric's IPv6 BIG TCP, this patch is to enable IPv4 BIG TCP. Firstly, allow sk->sk_gso_max_size to be set to a value greater than GSO_LEGACY_MAX_SIZE by not trimming gso_max_size in sk_trim_gso_size() for IPv4 TCP sockets. Then on TX path, set IP header tot_len to 0 when skb->len > IP_MAX_MTU in __ip_local_out() to allow to send BIG TCP packets, and this implies that skb->len is the length of a IPv4 packet; On RX path, use skb->len as the length of the IPv4 packet when the IP header tot_len is 0 and skb->len > IP_MAX_MTU in ip_rcv_core(). As the API iph_set_totlen() and skb_ip_totlen() are used in __ip_local_out() and ip_rcv_core(), we only need to update these APIs. Also in GRO receive, add the check for ETH_P_IP/IPPROTO_TCP, and allows the merged packet size >= GRO_LEGACY_MAX_SIZE in skb_gro_receive(). In GRO complete, set IP header tot_len to 0 when the merged packet size greater than IP_MAX_MTU in iph_set_totlen() so that it can be processed on RX path. Note that by checking skb_is_gso_tcp() in API iph_totlen(), it makes this implementation safe to use iph->len == 0 indicates IPv4 BIG TCP packets. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 20:54:27 -08:00
Xin Long	9eefedd58a	net: add gso_ipv4_max_size and gro_ipv4_max_size per device This patch introduces gso_ipv4_max_size and gro_ipv4_max_size per device and adds netlink attributes for them, so that IPV4 BIG TCP can be guarded by a separate tunable in the next patch. To not break the old application using "gso/gro_max_size" for IPv4 GSO packets, this patch updates "gso/gro_ipv4_max_size" in netif_set_gso/gro_max_size() if the new size isn't greater than GSO_LEGACY_MAX_SIZE, so that nothing will change even if userspace doesn't realize the new netlink attributes. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 20:54:27 -08:00
Xin Long	8e08bb75b6	packet: add TP_STATUS_GSO_TCP for tp_status Introduce TP_STATUS_GSO_TCP tp_status flag to tell the af_packet user that this is a TCP GSO packet. When parsing IPv4 BIG TCP packets in tcpdump/libpcap, it can use tp_len as the IPv4 packet len when this flag is set, as iph tot_len is set to 0 for IPv4 BIG TCP packets. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 20:54:27 -08:00
Xin Long	50e6fb5c6e	ipvlan: use skb_ip_totlen in ipvlan_get_L3_hdr ipvlan devices calls netif_inherit_tso_max() to get the tso_max_size/segs from the lower device, so when lower device supports BIG TCP, the ipvlan devices support it too. We also should consider its iph tot_len accessing. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 20:54:27 -08:00
Xin Long	7eb072be41	cipso_ipv4: use iph_set_totlen in skbuff_setattr It may process IPv4 TCP GSO packets in cipso_v4_skbuff_setattr(), so the iph->tot_len update should use iph_set_totlen(). Note that for these non GSO packets, the new iph tot_len with extra iph option len added may become greater than 65535, the old process will cast it and set iph->tot_len to it, which is a bug. In theory, iph options shouldn't be added for these big packets in here, a fix may be needed here in the future. For now this patch is only to set iph->tot_len to 0 when it happens. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 20:54:27 -08:00
Xin Long	a13fbf5ed5	netfilter: use skb_ip_totlen and iph_totlen There are also quite some places in netfilter that may process IPv4 TCP GSO packets, we need to replace them too. In length_mt(), we have to use u_int32_t/int to accept skb_ip_totlen() return value, otherwise it may overflow and mismatch. This change will also help us add selftest for IPv4 BIG TCP in the following patch. Note that we don't need to replace the one in tcpmss_tg4(), as it will return if there is data after tcphdr in tcpmss_mangle_packet(). The same in mangle_contents() in nf_nat_helper.c, it returns false when skb->len + extra > 65535 in enlarge_skb(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 20:54:27 -08:00
Xin Long	043e397e48	net: sched: use skb_ip_totlen and iph_totlen There are 1 action and 1 qdisc that may process IPv4 TCP GSO packets and access iph->tot_len, replace them with skb_ip_totlen() and iph_totlen() accordingly. Note that we don't need to replace the one in tcf_csum_ipv4(), as it will return for TCP GSO packets in tcf_csum_ipv4_tcp(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 20:54:27 -08:00
Xin Long	ec84c955a0	openvswitch: use skb_ip_totlen in conntrack IPv4 GSO packets may get processed in ovs_skb_network_trim(), and we need to use skb_ip_totlen() to get iph totlen. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 20:54:27 -08:00
Xin Long	46abd17302	bridge: use skb_ip_totlen in br netfilter These 3 places in bridge netfilter are called on RX path after GRO and IPv4 TCP GSO packets may come through, so replace iph tot_len accessing with skb_ip_totlen() in there. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 20:54:27 -08:00
Xin Long	058a8f7f73	net: add a couple of helpers for iph tot_len This patch adds three APIs to replace the iph->tot_len setting and getting in all places where IPv4 BIG TCP packets may reach, they will be used in the following patches. Note that iph_totlen() will be used when iph is not in linear data of the skb. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 20:54:27 -08:00
Jakub Kicinski	d8673afbf5	Merge branch 'virtio_net-vdpa-update-mac-address-when-it-is-generated-by-virtio-net' Laurent Vivier says: ==================== virtio_net: vdpa: update MAC address when it is generated by virtio-net When the MAC address is not provided by the vdpa device virtio_net driver assigns a random one without notifying the device. The consequence, in the case of mlx5_vdpa, is the internal routing tables of the device are not updated and this can block the communication between two namespaces. To fix this problem, use virtnet_send_command(VIRTIO_NET_CTRL_MAC) to set the address from virtnet_probe() when the MAC address is not provided by the device. ==================== Link: https://lore.kernel.org/r/20230127204500.51930-1-lvivier@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 20:53:06 -08:00
Laurent Vivier	9f62d221a4	virtio_net: notify MAC address change on device initialization In virtnet_probe(), if the device doesn't provide a MAC address the driver assigns a random one. As we modify the MAC address we need to notify the device to allow it to update all the related information. The problem can be seen with vDPA and mlx5_vdpa driver as it doesn't assign a MAC address by default. The virtio_net device uses a random MAC address (we can see it with "ip link"), but we can't ping a net namespace from another one using the virtio-vdpa device because the new MAC address has not been provided to the hardware: RX packets are dropped since they don't go through the receive filters, TX packets go through unaffected. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 20:53:04 -08:00
Laurent Vivier	7c06458c10	virtio_net: disable VIRTIO_NET_F_STANDBY if VIRTIO_NET_F_MAC is not set failover relies on the MAC address to pair the primary and the standby devices: "[...] the hypervisor needs to enable VIRTIO_NET_F_STANDBY feature on the virtio-net interface and assign the same MAC address to both virtio-net and VF interfaces." Documentation/networking/net_failover.rst This patch disables the STANDBY feature if the MAC address is not provided by the hypervisor. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 20:53:04 -08:00
Huayu Chen	ca3daf437d	nfp: correct cleanup related to DCB resources This patch corrects two oversights relating to releasing resources and DCB initialisation. 1. If mapping of the dcbcfg_tbl area fails: an error should be propagated, allowing partial initialisation (probe) to be unwound. 2. Conversely, if where dcbcfg_tbl is successfully mapped: it should be unmapped in nfp_nic_dcb_clean() which is called via various error cleanup paths, and shutdown or removal of the PCIE device. Fixes: 9b7fe8046d74 ("nfp: add DCB IEEE support") Signed-off-by: Huayu Chen <huayu.chen@corigine.com> Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230131163033.981937-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 19:57:30 -08:00
Yanguo Li	9c6b9cbafd	nfp: flower: avoid taking mutex in atomic context A mutex may sleep, which is not permitted in atomic context. Avoid a case where this may arise by moving the to nfp_flower_lag_get_info_from_netdev() in nfp_tun_write_neigh() spinlock. Fixes: abc210952af7 ("nfp: flower: tunnel neigh support bond offload") Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Yanguo Li <yanguo.li@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230131080313.2076060-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 19:56:11 -08:00
Jiapeng Chong	bc61761394	ipv6: ICMPV6: Use swap() instead of open coding it Swap is a function interface that provides exchange function. To avoid code duplication, we can use swap function. ./net/ipv6/icmp.c:344:25-26: WARNING opportunity for swap(). Reported-by: Abaci Robot <abaci@linux.alibaba.com> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3896 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230131063456.76302-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 19:55:41 -08:00
Jakub Kicinski	cca6e9ff22	Merge branch 'ip-ip6_gre-fix-gre-tunnels-not-generating-ipv6-link-local-addresses' Thomas Winter says: ==================== ip/ip6_gre: Fix GRE tunnels not generating IPv6 link local addresses For our point-to-point GRE tunnels, they have IN6_ADDR_GEN_MODE_NONE when they are created then we set IN6_ADDR_GEN_MODE_EUI64 when they come up to generate the IPv6 link local address for the interface. Recently we found that they were no longer generating IPv6 addresses. Also, non-point-to-point tunnels were not generating any IPv6 link local address and instead generating an IPv6 compat address, breaking IPv6 communication on the tunnel. These failures were caused by commit e5dd729460ca and this patch set aims to resolve these issues. ==================== Link: https://lore.kernel.org/r/20230131034646.237671-1-Thomas.Winter@alliedtelesis.co.nz Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-01 19:52:35 -08:00

... 2 3 4 5 6 ...

1156246 Commits