68749 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Jakub Kicinski
|
03eb7daec5 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Incorrect comparison in bitmask .reduce, from Jeremy Sowden. 2) Missing GFP_KERNEL_ACCOUNT for dynamically allocated objects, from Vasily Averin. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: memcg accounting for dynamically allocated objects netfilter: bitwise: fix reduce comparisons ==================== Link: https://lore.kernel.org/r/20220405100923.7231-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
David Ahern
|
1158f79f82 |
ipv6: Fix stats accounting in ip6_pkt_drop
VRF devices are the loopbacks for VRFs, and a loopback can not be assigned to a VRF. Accordingly, the condition in ip6_pkt_drop should be '||' not '&&'. Fixes: 1d3fd8a10bed ("vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach") Reported-by: Pudak, Filip <Filip.Pudak@windriver.com> Reported-by: Xiao, Jiguang <Jiguang.Xiao@windriver.com> Signed-off-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220404150908.2937-1-dsahern@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
Vasily Averin
|
42193ffd79 |
netfilter: nf_tables: memcg accounting for dynamically allocated objects
nft_*.c files whose NFT_EXPR_STATEFUL flag is set on need to use __GFP_ACCOUNT flag for objects that are dynamically allocated from the packet path. Such objects are allocated inside nft_expr_ops->init() callbacks executed in task context while processing netlink messages. In addition, this patch adds accounting to nft_set_elem_expr_clone() used for the same purposes. Signed-off-by: Vasily Averin <vvs@openvz.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> |
||
Jamie Bainbridge
|
e3d37210df |
sctp: count singleton chunks in assoc user stats
Singleton chunks (INIT, HEARTBEAT PMTU probes, and SHUTDOWN- COMPLETE) are not counted in SCTP_GET_ASOC_STATS "sas_octrlchunks" counter available to the assoc owner. These are all control chunks so they should be counted as such. Add counting of singleton chunks so they are properly accounted for. Fixes: 196d67593439 ("sctp: Add support to per-association statistics via a new SCTP_GET_ASSOC_STATS call") Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/c9ba8785789880cf07923b8a5051e174442ea9ee.1649029663.git.jamie.bainbridge@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
Nikolay Aleksandrov
|
6bf92d70e6 |
net: ipv4: fix route with nexthop object delete warning
FRR folks have hit a kernel warning[1] while deleting routes[2] which is caused by trying to delete a route pointing to a nexthop id without specifying nhid but matching on an interface. That is, a route is found but we hit a warning while matching it. The warning is from fib_info_nh() in include/net/nexthop.h because we run it on a fib_info with nexthop object. The call chain is: inet_rtm_delroute -> fib_table_delete -> fib_nh_match (called with a nexthop fib_info and also with fc_oif set thus calling fib_info_nh on the fib_info and triggering the warning). The fix is to not do any matching in that branch if the fi has a nexthop object because those are managed separately. I.e. we should match when deleting without nh spec and should fail when deleting a nexthop route with old-style nh spec because nexthop objects are managed separately, e.g.: $ ip r show 1.2.3.4/32 1.2.3.4 nhid 12 via 192.168.11.2 dev dummy0 $ ip r del 1.2.3.4/32 $ ip r del 1.2.3.4/32 nhid 12 <both should work> $ ip r del 1.2.3.4/32 dev dummy0 <should fail with ESRCH> [1] [ 523.462226] ------------[ cut here ]------------ [ 523.462230] WARNING: CPU: 14 PID: 22893 at include/net/nexthop.h:468 fib_nh_match+0x210/0x460 [ 523.462236] Modules linked in: dummy rpcsec_gss_krb5 xt_socket nf_socket_ipv4 nf_socket_ipv6 ip6table_raw iptable_raw bpf_preload xt_statistic ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_mark nf_tables xt_nat veth nf_conntrack_netlink nfnetlink xt_addrtype br_netfilter overlay dm_crypt nfsv3 nfs fscache netfs vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack 8021q garp mrp ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bridge stp llc rfcomm snd_seq_dummy snd_hrtimer rpcrdma rdma_cm iw_cm ib_cm ib_core ip6table_filter xt_comment ip6_tables vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) qrtr bnep binfmt_misc xfs vfat fat squashfs loop nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(POE) nvidia(POE) intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi btusb btrtl iwlmvm uvcvideo btbcm snd_hda_intel edac_mce_amd [ 523.462274] videobuf2_vmalloc videobuf2_memops btintel snd_intel_dspcfg videobuf2_v4l2 snd_intel_sdw_acpi bluetooth snd_usb_audio snd_hda_codec mac80211 snd_usbmidi_lib joydev snd_hda_core videobuf2_common kvm_amd snd_rawmidi snd_hwdep snd_seq videodev ccp snd_seq_device libarc4 ecdh_generic mc snd_pcm kvm iwlwifi snd_timer drm_kms_helper snd cfg80211 cec soundcore irqbypass rapl wmi_bmof i2c_piix4 rfkill k10temp pcspkr acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc drm zram ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel nvme sp5100_tco r8169 nvme_core wmi ipmi_devintf ipmi_msghandler fuse [ 523.462300] CPU: 14 PID: 22893 Comm: ip Tainted: P OE 5.16.18-200.fc35.x86_64 #1 [ 523.462302] Hardware name: Micro-Star International Co., Ltd. MS-7C37/MPG X570 GAMING EDGE WIFI (MS-7C37), BIOS 1.C0 10/29/2020 [ 523.462303] RIP: 0010:fib_nh_match+0x210/0x460 [ 523.462304] Code: 7c 24 20 48 8b b5 90 00 00 00 e8 bb ee f4 ff 48 8b 7c 24 20 41 89 c4 e8 ee eb f4 ff 45 85 e4 0f 85 2e fe ff ff e9 4c ff ff ff <0f> 0b e9 17 ff ff ff 3c 0a 0f 85 61 fe ff ff 48 8b b5 98 00 00 00 [ 523.462306] RSP: 0018:ffffaa53d4d87928 EFLAGS: 00010286 [ 523.462307] RAX: 0000000000000000 RBX: ffffaa53d4d87a90 RCX: ffffaa53d4d87bb0 [ 523.462308] RDX: ffff9e3d2ee6be80 RSI: ffffaa53d4d87a90 RDI: ffffffff920ed380 [ 523.462309] RBP: ffff9e3d2ee6be80 R08: 0000000000000064 R09: 0000000000000000 [ 523.462310] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000031 [ 523.462310] R13: 0000000000000020 R14: 0000000000000000 R15: ffff9e3d331054e0 [ 523.462311] FS: 00007f245517c1c0(0000) GS:ffff9e492ed80000(0000) knlGS:0000000000000000 [ 523.462313] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 523.462313] CR2: 000055e5dfdd8268 CR3: 00000003ef488000 CR4: 0000000000350ee0 [ 523.462315] Call Trace: [ 523.462316] <TASK> [ 523.462320] fib_table_delete+0x1a9/0x310 [ 523.462323] inet_rtm_delroute+0x93/0x110 [ 523.462325] rtnetlink_rcv_msg+0x133/0x370 [ 523.462327] ? _copy_to_iter+0xb5/0x6f0 [ 523.462330] ? rtnl_calcit.isra.0+0x110/0x110 [ 523.462331] netlink_rcv_skb+0x50/0xf0 [ 523.462334] netlink_unicast+0x211/0x330 [ 523.462336] netlink_sendmsg+0x23f/0x480 [ 523.462338] sock_sendmsg+0x5e/0x60 [ 523.462340] ____sys_sendmsg+0x22c/0x270 [ 523.462341] ? import_iovec+0x17/0x20 [ 523.462343] ? sendmsg_copy_msghdr+0x59/0x90 [ 523.462344] ? __mod_lruvec_page_state+0x85/0x110 [ 523.462348] ___sys_sendmsg+0x81/0xc0 [ 523.462350] ? netlink_seq_start+0x70/0x70 [ 523.462352] ? __dentry_kill+0x13a/0x180 [ 523.462354] ? __fput+0xff/0x250 [ 523.462356] __sys_sendmsg+0x49/0x80 [ 523.462358] do_syscall_64+0x3b/0x90 [ 523.462361] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 523.462364] RIP: 0033:0x7f24552aa337 [ 523.462365] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 523.462366] RSP: 002b:00007fff7f05a838 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 523.462368] RAX: ffffffffffffffda RBX: 000000006245bf91 RCX: 00007f24552aa337 [ 523.462368] RDX: 0000000000000000 RSI: 00007fff7f05a8a0 RDI: 0000000000000003 [ 523.462369] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 [ 523.462370] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000001 [ 523.462370] R13: 00007fff7f05ce08 R14: 0000000000000000 R15: 000055e5dfdd1040 [ 523.462373] </TASK> [ 523.462374] ---[ end trace ba537bc16f6bf4ed ]--- [2] https://github.com/FRRouting/frr/issues/6412 Fixes: 4c7e8084fd46 ("ipv4: Plumb support for nexthop object in a fib_info") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Matt Johnston
|
4a9dda1c1d |
mctp: Use output netdev to allocate skb headroom
Previously the skb was allocated with headroom MCTP_HEADER_MAXLEN, but that isn't sufficient if we are using devs that are not MCTP specific. This also adds a check that the smctp_halen provided to sendmsg for extended addressing is the correct size for the netdev. Fixes: 833ef3b91de6 ("mctp: Populate socket implementation") Reported-by: Matthew Rinaldi <mjrinal@g.clemson.edu> Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Matt Johnston
|
60be976ac4 |
mctp: Fix check for dev_hard_header() result
dev_hard_header() returns the length of the header, so we need to test for negative errors rather than non-zero. Fixes: 889b7da23abf ("mctp: Add initial routing framework") Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vladimir Oltean
|
066dfc4290 |
Revert "net: dsa: stop updating master MTU from master.c"
This reverts commit a1ff94c2973c43bc1e2677ac63ebb15b1d1ff846. Switch drivers that don't implement ->port_change_mtu() will cause the DSA master to remain with an MTU of 1500, since we've deleted the other code path. In turn, this causes a regression for those systems, where MTU-sized traffic can no longer be terminated. Revert the change taking into account the fact that rtnl_lock() is now taken top-level from the callers of dsa_master_setup() and dsa_master_teardown(). Also add a comment in order for it to be absolutely clear why it is still needed. Fixes: a1ff94c2973c ("net: dsa: stop updating master MTU from master.c") Reported-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Jean-Philippe Brucker
|
1effe8ca4e |
skbuff: fix coalescing for page_pool fragment recycling
Fix a use-after-free when using page_pool with page fragments. We encountered this problem during normal RX in the hns3 driver: (1) Initially we have three descriptors in the RX queue. The first one allocates PAGE1 through page_pool, and the other two allocate one half of PAGE2 each. Page references look like this: RX_BD1 _______ PAGE1 RX_BD2 _______ PAGE2 RX_BD3 _________/ (2) Handle RX on the first descriptor. Allocate SKB1, eventually added to the receive queue by tcp_queue_rcv(). (3) Handle RX on the second descriptor. Allocate SKB2 and pass it to netif_receive_skb(): netif_receive_skb(SKB2) ip_rcv(SKB2) SKB3 = skb_clone(SKB2) SKB2 and SKB3 share a reference to PAGE2 through skb_shinfo()->dataref. The other ref to PAGE2 is still held by RX_BD3: SKB2 ---+- PAGE2 SKB3 __/ / RX_BD3 _________/ (3b) Now while handling TCP, coalesce SKB3 with SKB1: tcp_v4_rcv(SKB3) tcp_try_coalesce(to=SKB1, from=SKB3) // succeeds kfree_skb_partial(SKB3) skb_release_data(SKB3) // drops one dataref SKB1 _____ PAGE1 \____ SKB2 _____ PAGE2 / RX_BD3 _________/ In skb_try_coalesce(), __skb_frag_ref() takes a page reference to PAGE2, where it should instead have increased the page_pool frag reference, pp_frag_count. Without coalescing, when releasing both SKB2 and SKB3, a single reference to PAGE2 would be dropped. Now when releasing SKB1 and SKB2, two references to PAGE2 will be dropped, resulting in underflow. (3c) Drop SKB2: af_packet_rcv(SKB2) consume_skb(SKB2) skb_release_data(SKB2) // drops second dataref page_pool_return_skb_page(PAGE2) // drops one pp_frag_count SKB1 _____ PAGE1 \____ PAGE2 / RX_BD3 _________/ (4) Userspace calls recvmsg() Copies SKB1 and releases it. Since SKB3 was coalesced with SKB1, we release the SKB3 page as well: tcp_eat_recv_skb(SKB1) skb_release_data(SKB1) page_pool_return_skb_page(PAGE1) page_pool_return_skb_page(PAGE2) // drops second pp_frag_count (5) PAGE2 is freed, but the third RX descriptor was still using it! In our case this causes IOMMU faults, but it would silently corrupt memory if the IOMMU was disabled. Change the logic that checks whether pp_recycle SKBs can be coalesced. We still reject differing pp_recycle between 'from' and 'to' SKBs, but in order to avoid the situation described above, we also reject coalescing when both 'from' and 'to' are pp_recycled and 'from' is cloned. The new logic allows coalescing a cloned pp_recycle SKB into a page refcounted one, because in this case the release (4) will drop the right reference, the one taken by skb_try_coalesce(). Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool") Suggested-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Ziyang Xuan
|
9381fe8c84 |
net/tls: fix slab-out-of-bounds bug in decrypt_internal
The memory size of tls_ctx->rx.iv for AES128-CCM is 12 setting in tls_set_sw_offload(). The return value of crypto_aead_ivsize() for "ccm(aes)" is 16. So memcpy() require 16 bytes from 12 bytes memory space will trigger slab-out-of-bounds bug as following: ================================================================== BUG: KASAN: slab-out-of-bounds in decrypt_internal+0x385/0xc40 [tls] Read of size 16 at addr ffff888114e84e60 by task tls/10911 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x5e/0x5db ? decrypt_internal+0x385/0xc40 [tls] kasan_report+0xab/0x120 ? decrypt_internal+0x385/0xc40 [tls] kasan_check_range+0xf9/0x1e0 memcpy+0x20/0x60 decrypt_internal+0x385/0xc40 [tls] ? tls_get_rec+0x2e0/0x2e0 [tls] ? process_rx_list+0x1a5/0x420 [tls] ? tls_setup_from_iter.constprop.0+0x2e0/0x2e0 [tls] decrypt_skb_update+0x9d/0x400 [tls] tls_sw_recvmsg+0x3c8/0xb50 [tls] Allocated by task 10911: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 tls_set_sw_offload+0x2eb/0xa20 [tls] tls_setsockopt+0x68c/0x700 [tls] __sys_setsockopt+0xfe/0x1b0 Replace the crypto_aead_ivsize() with prot->iv_size + prot->salt_size when memcpy() iv value in TLS_1_3_VERSION scenario. Fixes: f295b3ae9f59 ("net/tls: Add support of AES128-CCM based ciphers") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Linus Torvalds
|
2975dbdc39 |
Networking fixes for 5.18-rc1 and rethook patches.
Features: - kprobes: rethook: x86: replace kretprobe trampoline with rethook Current release - regressions: - sfc: avoid null-deref on systems without NUMA awareness in the new queue sizing code Current release - new code bugs: - vxlan: do not feed vxlan_vnifilter_dump_dev with non-vxlan devices - eth: lan966x: fix null-deref on PHY pointer in timestamp ioctl when interface is down Previous releases - always broken: - openvswitch: correct neighbor discovery target mask field in the flow dump - wireguard: ignore v6 endpoints when ipv6 is disabled and fix a leak - rxrpc: fix call timer start racing with call destruction - rxrpc: fix null-deref when security type is rxrpc_no_security - can: fix UAF bugs around echo skbs in multiple drivers Misc: - docs: move netdev-FAQ to the "process" section of the documentation Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmJF3S0ACgkQMUZtbf5S IruIvA/+NZx+c+fBBrbjOh63avRL7kYIqIDREf+v6lh4ZXmbrp22xalcjIdxgWeK vAiYfYmzZblWAGkilcvPG3blCBc+9b+YE+pPJXFe60Huv3eYpjKfgTKwQOg/lIeM 8MfPP7eBwcJ/ltSTRtySRl9LYgyVcouP9rAVJavFVYrvuDYunwhfChswVfGCYon8 42O4nRwrtkTE1MjHD8HS3YxvwGlo+iIyhsxgG/gWx8F2xeIG22H6adzjDXcCQph8 air/awrJ4enYkVMRokGNfNppK9Z3vjJDX5xha3CREpvXNPe0F24cAE/L8XqyH7+r /bXP5y9VC9mmEO7x4Le3VmDhOJGbCOtR89gTlevftDRdSIrbNHffZhbPW48tR7o8 NJFlhiSJb4HEMN0q7BmxnWaKlbZUlvLEXLuU5ytZE/G7i+nETULlunfZrCD4eNYH gBGYhiob2I/XotJA9QzG/RDyaFwDaC/VARsyv37PSeBAl/yrEGAeP7DsKkKX/ayg LM9ItveqHXK30J0xr3QJA8s49EkIYejjYR3l0hQ9esf9QvGK99dE/fo44Apf3C3A Lz6XpnRc5Xd7tZ9Aopwb3FqOH6WR9Hq9Qlbk0qifsL/2sRbatpuZbbDK6L3CR3Ir WFNcOoNbbqv85kCKFXFjj0jdpoNa9Yej8XFkMkVSkM3sHImYmYQ= =5Bvy -----END PGP SIGNATURE----- Merge tag 'net-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull more networking updates from Jakub Kicinski: "Networking fixes and rethook patches. Features: - kprobes: rethook: x86: replace kretprobe trampoline with rethook Current release - regressions: - sfc: avoid null-deref on systems without NUMA awareness in the new queue sizing code Current release - new code bugs: - vxlan: do not feed vxlan_vnifilter_dump_dev with non-vxlan devices - eth: lan966x: fix null-deref on PHY pointer in timestamp ioctl when interface is down Previous releases - always broken: - openvswitch: correct neighbor discovery target mask field in the flow dump - wireguard: ignore v6 endpoints when ipv6 is disabled and fix a leak - rxrpc: fix call timer start racing with call destruction - rxrpc: fix null-deref when security type is rxrpc_no_security - can: fix UAF bugs around echo skbs in multiple drivers Misc: - docs: move netdev-FAQ to the 'process' section of the documentation" * tag 'net-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits) vxlan: do not feed vxlan_vnifilter_dump_dev with non vxlan devices openvswitch: Add recirc_id to recirc warning rxrpc: fix some null-ptr-deref bugs in server_key.c rxrpc: Fix call timer start racing with call destruction net: hns3: fix software vlan talbe of vlan 0 inconsistent with hardware net: hns3: fix the concurrency between functions reading debugfs docs: netdev: move the netdev-FAQ to the process pages docs: netdev: broaden the new vs old code formatting guidelines docs: netdev: call out the merge window in tag checking docs: netdev: add missing back ticks docs: netdev: make the testing requirement more stringent docs: netdev: add a question about re-posting frequency docs: netdev: rephrase the 'should I update patchwork' question docs: netdev: rephrase the 'Under review' question docs: netdev: shorten the name and mention msgid for patch status docs: netdev: note that RFC postings are allowed any time docs: netdev: turn the net-next closed into a Warning docs: netdev: move the patch marking section up docs: netdev: minor reword docs: netdev: replace references to old archives ... |
||
Stéphane Graber
|
ea07af2e71 |
openvswitch: Add recirc_id to recirc warning
When hitting the recirculation limit, the kernel would currently log something like this: [ 58.586597] openvswitch: ovs-system: deferred action limit reached, drop recirc action Which isn't all that useful to debug as we only have the interface name to go on but can't track it down to a specific flow. With this change, we now instead get: [ 58.586597] openvswitch: ovs-system: deferred action limit reached, drop recirc action (recirc_id=0x9e) Which can now be correlated with the flow entries from OVS. Suggested-by: Frode Nordahl <frode.nordahl@canonical.com> Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Tested-by: Stephane Graber <stgraber@ubuntu.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/20220330194244.3476544-1-stgraber@ubuntu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Jakub Kicinski
|
46b556205d |
linux-can-fixes-for-5.18-20220331
-----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEBsvAIBsPu6mG7thcrX5LkNig010FAmJFXq0THG1rbEBwZW5n dXRyb25peC5kZQAKCRCtfkuQ2KDTXc+GB/9hPO+0SN80Fz93rDE8M6wI1rY6famJ mqcaNmSDGuX9jMxC3CLugw8rckfXKb0kicjq8xLqAU7PX67h4nShnZCmHq+4RnJW AZIV4sUOgeMcDohMPY0b8BIyS1JDwjR2L9vrFjeJm/BlDC2mkbRC2pXr6Pq9Xg4Y aBaNA94v9UMQvTdPK8UmHOvl3M6gPV3ggBiCRM5xxnYGkrQFApgtCsXMvizbBUGS tNLEOGYD/YAZKAiZfEnaTKqklUCGPdm4vuiP+4F9+2ovBK38CiiJSx/fF6aLkZy+ RdLpjrsqjva1cPSxaE3tAm45KOF5vm3hrBGWgslJz+0CN+tBWy/Q2G4s =yO1B -----END PGP SIGNATURE----- Merge tag 'linux-can-fixes-for-5.18-20220331' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2022-03-31 The first patch is by Oliver Hartkopp and fixes MSG_PEEK feature in the CAN ISOTP protocol (broken in net-next for v5.18 only). Tom Rix's patch for the mcp251xfd driver fixes the propagation of an error value in case of an error. A patch by me for the m_can driver fixes a use-after-free in the xmit handler for m_can IP cores v3.0.x. Hangyu Hua contributes 3 patches fixing the same double free in the error path of the xmit handler in the ems_usb, usb_8dev and mcba_usb USB CAN driver. Pavel Skripkin contributes a patch for the mcba_usb driver to properly check the endpoint type. The last patch is by me and fixes a mem leak in the gs_usb, which was introduced in net-next for v5.18. * tag 'linux-can-fixes-for-5.18-20220331' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: gs_usb: gs_make_candev(): fix memory leak for devices with extended bit timing configuration can: mcba_usb: properly check endpoint type can: mcba_usb: mcba_usb_start_xmit(): fix double dev_kfree_skb in error path can: usb_8dev: usb_8dev_start_xmit(): fix double dev_kfree_skb() in error path can: ems_usb: ems_usb_start_xmit(): fix double dev_kfree_skb() in error path can: m_can: m_can_tx_handler(): fix use after free of skb can: mcp251xfd: mcp251xfd_register_get_dev_id(): fix return of error value can: isotp: restore accidentally removed MSG_PEEK feature ==================== Link: https://lore.kernel.org/r/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Xiaolong Huang
|
ff8376ade4 |
rxrpc: fix some null-ptr-deref bugs in server_key.c
Some function calls are not implemented in rxrpc_no_security, there are preparse_server_key, free_preparse_server_key and destroy_server_key. When rxrpc security type is rxrpc_no_security, user can easily trigger a null-ptr-deref bug via ioctl. So judgment should be added to prevent it The crash log: user@syzkaller:~$ ./rxrpc_preparse_s [ 37.956878][T15626] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 37.957645][T15626] #PF: supervisor instruction fetch in kernel mode [ 37.958229][T15626] #PF: error_code(0x0010) - not-present page [ 37.958762][T15626] PGD 4aadf067 P4D 4aadf067 PUD 4aade067 PMD 0 [ 37.959321][T15626] Oops: 0010 [#1] PREEMPT SMP [ 37.959739][T15626] CPU: 0 PID: 15626 Comm: rxrpc_preparse_ Not tainted 5.17.0-01442-gb47d5a4f6b8d #43 [ 37.960588][T15626] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014 [ 37.961474][T15626] RIP: 0010:0x0 [ 37.961787][T15626] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. [ 37.962480][T15626] RSP: 0018:ffffc9000d9abdc0 EFLAGS: 00010286 [ 37.963018][T15626] RAX: ffffffff84335200 RBX: ffff888012a1ce80 RCX: 0000000000000000 [ 37.963727][T15626] RDX: 0000000000000000 RSI: ffffffff84a736dc RDI: ffffc9000d9abe48 [ 37.964425][T15626] RBP: ffffc9000d9abe48 R08: 0000000000000000 R09: 0000000000000002 [ 37.965118][T15626] R10: 000000000000000a R11: f000000000000000 R12: ffff888013145680 [ 37.965836][T15626] R13: 0000000000000000 R14: ffffffffffffffec R15: ffff8880432aba80 [ 37.966441][T15626] FS: 00007f2177907700(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000 [ 37.966979][T15626] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 37.967384][T15626] CR2: ffffffffffffffd6 CR3: 000000004aaf1000 CR4: 00000000000006f0 [ 37.967864][T15626] Call Trace: [ 37.968062][T15626] <TASK> [ 37.968240][T15626] rxrpc_preparse_s+0x59/0x90 [ 37.968541][T15626] key_create_or_update+0x174/0x510 [ 37.968863][T15626] __x64_sys_add_key+0x139/0x1d0 [ 37.969165][T15626] do_syscall_64+0x35/0xb0 [ 37.969451][T15626] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 37.969824][T15626] RIP: 0033:0x43a1f9 Signed-off-by: Xiaolong Huang <butterflyhuangxx@gmail.com> Tested-by: Xiaolong Huang <butterflyhuangxx@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: http://lists.infradead.org/pipermail/linux-afs/2022-March/005069.html Fixes: 12da59fcab5a ("rxrpc: Hand server key parsing off to the security class") Link: https://lore.kernel.org/r/164865013439.2941502.8966285221215590921.stgit@warthog.procyon.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
David Howells
|
4a7f62f919 |
rxrpc: Fix call timer start racing with call destruction
The rxrpc_call struct has a timer used to handle various timed events relating to a call. This timer can get started from the packet input routines that are run in softirq mode with just the RCU read lock held. Unfortunately, because only the RCU read lock is held - and neither ref or other lock is taken - the call can start getting destroyed at the same time a packet comes in addressed to that call. This causes the timer - which was already stopped - to get restarted. Later, the timer dispatch code may then oops if the timer got deallocated first. Fix this by trying to take a ref on the rxrpc_call struct and, if successful, passing that ref along to the timer. If the timer was already running, the ref is discarded. The timer completion routine can then pass the ref along to the call's work item when it queues it. If the timer or work item where already queued/running, the extra ref is discarded. Fixes: a158bdd3247b ("rxrpc: Fix call timeouts") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: http://lists.infradead.org/pipermail/linux-afs/2022-March/005073.html Link: https://lore.kernel.org/r/164865115696.2943015.11097991776647323586.stgit@warthog.procyon.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
Oliver Hartkopp
|
e382fea8ae |
can: isotp: restore accidentally removed MSG_PEEK feature
In commit 42bf50a1795a ("can: isotp: support MSG_TRUNC flag when reading from socket") a new check for recvmsg flags has been introduced that only checked for the flags that are handled in isotp_recvmsg() itself. This accidentally removed the MSG_PEEK feature flag which is processed later in the call chain in __skb_try_recv_from_queue(). Add MSG_PEEK to the set of valid flags to restore the feature. Fixes: 42bf50a1795a ("can: isotp: support MSG_TRUNC flag when reading from socket") Link: https://github.com/linux-can/can-utils/issues/347#issuecomment-1079554254 Link: https://lore.kernel.org/all/20220328113611.3691-1-socketcan@hartkopp.net Reported-by: Derek Will <derekrobertwill@gmail.com> Suggested-by: Derek Will <derekrobertwill@gmail.com> Tested-by: Derek Will <derekrobertwill@gmail.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> |
||
Jakub Kicinski
|
77c9387c0c |
Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says: ==================== pull-request: bpf 2022-03-29 We've added 16 non-merge commits during the last 1 day(s) which contain a total of 24 files changed, 354 insertions(+), 187 deletions(-). The main changes are: 1) x86 specific bits of fprobe/rethook, from Masami and Peter. 2) ice/xsk fixes, from Maciej and Magnus. 3) Various small fixes, from Andrii, Yonghong, Geliang and others. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Fix clang compilation errors ice: xsk: Fix indexing in ice_tx_xsk_pool() ice: xsk: Stop Rx processing when ntc catches ntu ice: xsk: Eliminate unnecessary loop iteration xsk: Do not write NULL in SW ring at allocation failure x86,kprobes: Fix optprobe trampoline to generate complete pt_regs x86,rethook: Fix arch_rethook_trampoline() to generate a complete pt_regs x86,rethook,kprobes: Replace kretprobe with rethook on x86 kprobes: Use rethook for kretprobe if possible bpftool: Fix generated code in codegen_asserts selftests/bpf: fix selftest after random: Urandom_read tracepoint removal bpf: Fix maximum permitted number of arguments check bpf: Sync comments for bpf_get_stack fprobe: Fix sparse warning for acccessing __rcu ftrace_hash fprobe: Fix smatch type mismatch warning bpf/bpftool: Add unprivileged_bpf_disabled check against value of 2 ==================== Link: https://lore.kernel.org/r/20220329234924.39053-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Linus Torvalds
|
965181d7ef |
NFS client updates for Linux 5.18
Highlights include: Features: - Switch NFS to use readahead instead of the obsolete readpages. - Readdir fixes to improve cacheability of large directories when there are multiple readers and writers. - Readdir performance improvements when doing a seekdir() immediately after opening the directory (common when re-exporting NFS). - NFS swap improvements from Neil Brown. - Loosen up memory allocation to permit direct reclaim and write back in cases where there is no danger of deadlocking the writeback code or NFS swap. - Avoid sillyrename when the NFSv4 server claims to support the necessary features to recover the unlinked but open file after reboot. Bugfixes: - Patch from Olga to add a mount option to control NFSv4.1 session trunking discovery, and default it to being off. - Fix a lockup in nfs_do_recoalesce(). - Two fixes for list iterator variables being used when pointing to the list head. - Fix a kernel memory scribble when reading from a non-socket transport in /sys/kernel/sunrpc. - Fix a race where reconnecting to a server could leave the TCP socket stuck forever in the connecting state. - Patch from Neil to fix a shutdown race which can leave the SUNRPC transport timer primed after we free the struct xprt itself. - Patch from Xin Xiong to fix reference count leaks in the NFSv4.2 copy offload. - Sunrpc patch from Olga to avoid resending a task on an offlined transport. Cleanups: - Patches from Dave Wysochanski to clean up the fscache code -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmJDLLUACgkQZwvnipYK APJZ5Q/9FH5S38RcUOxpFS+XaVdmxltJTMQv9Bz5mcyljq6PwRjTn73Qs7MfBQfD k90O9Pn/EXgQpFWNW4o/saQGLAO1MbVaesdNNF92Le9NziMz7lr95fsFXOqpg9EQ rEVOImWinv/N0+P0yv5tgjx6Fheu70L4dnwsxepnmtv2R1VjXOVchK7WjsV+9qey Xvn15OzhtVCGhRvvKfmGuGW+2I0D29tRNn2X0Pi+LjxAEQ0We/rcsJeGDMGYtbeC 2U4ADk7KB0kJq04WYCH8k8tTrljL3PUONzO6QP1lltmAZ0FOwXf/AjyaD3SML2Xu A1yc2bEn3/S6wCQ3BiMxoE7UCSewg/ZyQD7B9GjZ/nWYNXREhLKvtVZt+ULTCA6o UKAI2vlEHljzQI2YfbVVZFyL8KCZqi+BFcJ/H/dtFKNazIkiefkaF+703Pflazis HjhA5G6IT0Pk6qZ6OBp+Z/4kCDYZYXXi9ysiNuMPWCiYEfZzt7ocUjm44uRAi+vR Qpf0V01Rg4iplTj3YFXqb9TwxrudzC7AcQ7hbLJia0kD/8djG+aGZs3SJWeO+VuE Xa41oxAx9jqOSRvSKPvj7nbexrwn8PtQI2NtDRqjPXlg6+Jbk23u8lIEVhsCH877 xoMUHtTKLyqECskzRN4nh7qoBD3QfiLuI74aDcPtGEuXzU1+Ox8= =jeGt -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Highlights include: Features: - Switch NFS to use readahead instead of the obsolete readpages. - Readdir fixes to improve cacheability of large directories when there are multiple readers and writers. - Readdir performance improvements when doing a seekdir() immediately after opening the directory (common when re-exporting NFS). - NFS swap improvements from Neil Brown. - Loosen up memory allocation to permit direct reclaim and write back in cases where there is no danger of deadlocking the writeback code or NFS swap. - Avoid sillyrename when the NFSv4 server claims to support the necessary features to recover the unlinked but open file after reboot. Bugfixes: - Patch from Olga to add a mount option to control NFSv4.1 session trunking discovery, and default it to being off. - Fix a lockup in nfs_do_recoalesce(). - Two fixes for list iterator variables being used when pointing to the list head. - Fix a kernel memory scribble when reading from a non-socket transport in /sys/kernel/sunrpc. - Fix a race where reconnecting to a server could leave the TCP socket stuck forever in the connecting state. - Patch from Neil to fix a shutdown race which can leave the SUNRPC transport timer primed after we free the struct xprt itself. - Patch from Xin Xiong to fix reference count leaks in the NFSv4.2 copy offload. - Sunrpc patch from Olga to avoid resending a task on an offlined transport. Cleanups: - Patches from Dave Wysochanski to clean up the fscache code" * tag 'nfs-for-5.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (91 commits) NFSv4/pNFS: Fix another issue with a list iterator pointing to the head NFS: Don't loop forever in nfs_do_recoalesce() SUNRPC: Don't return error values in sysfs read of closed files SUNRPC: Do not dereference non-socket transports in sysfs NFSv4.1: don't retry BIND_CONN_TO_SESSION on session error SUNRPC don't resend a task on an offlined transport NFS: replace usage of found with dedicated list iterator variable SUNRPC: avoid race between mod_timer() and del_timer_sync() pNFS/files: Ensure pNFS allocation modes are consistent with nfsiod pNFS/flexfiles: Ensure pNFS allocation modes are consistent with nfsiod NFSv4/pnfs: Ensure pNFS allocation modes are consistent with nfsiod NFS: Avoid writeback threads getting stuck in mempool_alloc() NFS: nfsiod should not block forever in mempool_alloc() SUNRPC: Make the rpciod and xprtiod slab allocation modes consistent SUNRPC: Fix unx_lookup_cred() allocation NFS: Fix memory allocation in rpc_alloc_task() NFS: Fix memory allocation in rpc_malloc() SUNRPC: Improve accuracy of socket ENOBUFS determination SUNRPC: Replace internal use of SOCKWQ_ASYNC_NOSPACE SUNRPC: Fix socket waits for write buffer space ... |
||
Jeremy Sowden
|
3181821317 |
netfilter: bitwise: fix reduce comparisons
The `nft_bitwise_reduce` and `nft_bitwise_fast_reduce` functions should compare the bitwise operation in `expr` with the tracked operation associated with the destination register of `expr`. However, instead of being called on `expr` and `track->regs[priv->dreg].selector`, `nft_expr_priv` is called on `expr` twice, so both reduce functions return true even when the operations differ. Fixes: be5650f8f47e ("netfilter: nft_bitwise: track register operations") Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> |
||
Duoming Zhou
|
82e31755e5 |
ax25: Fix UAF bugs in ax25 timers
There are race conditions that may lead to UAF bugs in ax25_heartbeat_expiry(), ax25_t1timer_expiry(), ax25_t2timer_expiry(), ax25_t3timer_expiry() and ax25_idletimer_expiry(), when we call ax25_release() to deallocate ax25_dev. One of the UAF bugs caused by ax25_release() is shown below: (Thread 1) | (Thread 2) ax25_dev_device_up() //(1) | ... | ax25_kill_by_device() ax25_bind() //(2) | ax25_connect() | ... ax25_std_establish_data_link() | ax25_start_t1timer() | ax25_dev_device_down() //(3) mod_timer(&ax25->t1timer,..) | | ax25_release() (wait a time) | ... | ax25_dev_put(ax25_dev) //(4)FREE ax25_t1timer_expiry() | ax25->ax25_dev->values[..] //USE| ... ... | We increase the refcount of ax25_dev in position (1) and (2), and decrease the refcount of ax25_dev in position (3) and (4). The ax25_dev will be freed in position (4) and be used in ax25_t1timer_expiry(). The fail log is shown below: ============================================================== [ 106.116942] BUG: KASAN: use-after-free in ax25_t1timer_expiry+0x1c/0x60 [ 106.116942] Read of size 8 at addr ffff88800bda9028 by task swapper/0/0 [ 106.116942] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.17.0-06123-g0905eec574 [ 106.116942] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-14 [ 106.116942] Call Trace: ... [ 106.116942] ax25_t1timer_expiry+0x1c/0x60 [ 106.116942] call_timer_fn+0x122/0x3d0 [ 106.116942] __run_timers.part.0+0x3f6/0x520 [ 106.116942] run_timer_softirq+0x4f/0xb0 [ 106.116942] __do_softirq+0x1c2/0x651 ... This patch adds del_timer_sync() in ax25_release(), which could ensure that all timers stop before we deallocate ax25_dev. Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
Duoming Zhou
|
5352a76130 |
ax25: fix UAF bug in ax25_send_control()
There are UAF bugs in ax25_send_control(), when we call ax25_release() to deallocate ax25_dev. The possible race condition is shown below: (Thread 1) | (Thread 2) ax25_dev_device_up() //(1) | | ax25_kill_by_device() ax25_bind() //(2) | ax25_connect() | ... ax25->state = AX25_STATE_1 | ... | ax25_dev_device_down() //(3) (Thread 3) ax25_release() | ax25_dev_put() //(4) FREE | case AX25_STATE_1: | ax25_send_control() | alloc_skb() //USE | The refcount of ax25_dev increases in position (1) and (2), and decreases in position (3) and (4). The ax25_dev will be freed before dereference sites in ax25_send_control(). The following is part of the report: [ 102.297448] BUG: KASAN: use-after-free in ax25_send_control+0x33/0x210 [ 102.297448] Read of size 8 at addr ffff888009e6e408 by task ax25_close/602 [ 102.297448] Call Trace: [ 102.303751] ax25_send_control+0x33/0x210 [ 102.303751] ax25_release+0x356/0x450 [ 102.305431] __sock_release+0x6d/0x120 [ 102.305431] sock_close+0xf/0x20 [ 102.305431] __fput+0x11f/0x420 [ 102.305431] task_work_run+0x86/0xd0 [ 102.307130] get_signal+0x1075/0x1220 [ 102.308253] arch_do_signal_or_restart+0x1df/0xc00 [ 102.308253] exit_to_user_mode_prepare+0x150/0x1e0 [ 102.308253] syscall_exit_to_user_mode+0x19/0x50 [ 102.308253] do_syscall_64+0x48/0x90 [ 102.308253] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 102.308253] RIP: 0033:0x405ae7 This patch defers the free operation of ax25_dev and net_device after all corresponding dereference sites in ax25_release() to avoid UAF. Fixes: 9fd75b66b8f6 ("ax25: Fix refcount leaks caused by ax25_cb_del()") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
Martin Varghese
|
f19c44452b |
openvswitch: Fixed nd target mask field in the flow dump.
IPv6 nd target mask was not getting populated in flow dump. In the function __ovs_nla_put_key the icmp code mask field was checked instead of icmp code key field to classify the flow as neighbour discovery. ufid:bdfbe3e5-60c2-43b0-a5ff-dfcac1c37328, recirc_id(0),dp_hash(0/0), skb_priority(0/0),in_port(ovs-nm1),skb_mark(0/0),ct_state(0/0), ct_zone(0/0),ct_mark(0/0),ct_label(0/0), eth(src=00:00:00:00:00:00/00:00:00:00:00:00, dst=00:00:00:00:00:00/00:00:00:00:00:00), eth_type(0x86dd), ipv6(src=::/::,dst=::/::,label=0/0,proto=58,tclass=0/0,hlimit=0/0,frag=no), icmpv6(type=135,code=0), nd(target=2001::2/::, sll=00:00:00:00:00:00/00:00:00:00:00:00, tll=00:00:00:00:00:00/00:00:00:00:00:00), packets:10, bytes:860, used:0.504s, dp:ovs, actions:ovs-nm2 Fixes: e64457191a25 (openvswitch: Restructure datapath.c and flow.c) Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Link: https://lore.kernel.org/r/20220328054148.3057-1-martinvarghesenokia@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
Magnus Karlsson
|
a95a4d9b39 |
xsk: Do not write NULL in SW ring at allocation failure
For the case when xp_alloc_batch() is used but the batched allocation cannot be used, there is a slow path that uses the non-batched xp_alloc(). When it fails to allocate an entry, it returns NULL. The current code wrote this NULL into the entry of the provided results array (pointer to the driver SW ring usually) and returned. This might not be what the driver expects and to make things simpler, just write successfully allocated xdp_buffs into the SW ring,. The driver might have information in there that is still important after an allocation failure. Note that at this point in time, there are no drivers using xp_alloc_batch() that could trigger this slow path. But one might get added. Fixes: 47e4075df300 ("xsk: Batched buffer allocation for the pool") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220328142123.170157-2-maciej.fijalkowski@intel.com |
||
Linus Torvalds
|
d717e4cae0 |
Networking fixes, including fixes from netfilter.
Current release - regressions: - llc: only change llc->dev when bind() succeeds, fix null-deref Current release - new code bugs: - smc: fix a memory leak in smc_sysctl_net_exit() - dsa: realtek: make interface drivers depend on OF Previous releases - regressions: - sched: act_ct: fix ref leak when switching zones Previous releases - always broken: - netfilter: egress: report interface as outgoing - vsock/virtio: enable VQs early on probe and finish the setup before using them Misc: - memcg: enable accounting for nft objects Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmJCSMMACgkQMUZtbf5S Irt4eQ//fTAC/7mBmT8uoUJMZlrRckSDnJ/Y1ukgOQrjbwcgeRi0PK1cy2oGmU4w mRZ8zhskVpmzodPuduCIdmsdE2PaWTCFoVRC52QH1HffCRbj1mRK9vrf94q0TP9+ jqzaIOhKyWKGMgYQGObIFbojnF4H1wm+tIXcEVWzxivS/2yY4W/3hdBIblBO++r5 c9vxO//qzGH1kGDCWfahuJSTvZBpQ3HTmjGLC1F8xTh8RkR7MGQyGCQ984j+DClb PJJQXeV/Zoyxvrzv14MU5Ms9+lsgH2pyBdVzvN8p2QSwSaU8CsbbM05I4lB5mT/b tGBYNreMmuQbXRxNVoxaZOTgQqEtTgH+AKJ9L0f2Es6Ftp5TTrFZZA97lO0/qzMj NGbxa0p7tlNyOGKDxyUw6SB1+kqqgR2a6skk4XnQ6CAH7AvxSFOxvt63mjeJfCY7 +j5Lxtm+a/RpVt6Djsvpwq12lKiootcbEyMoUKxKeQ+4I08z6W6hoS1zjUDeMDM6 q8eDXsxpZgGF6k7x3eKkOWKLMVeQ1cv0CjGaCoTXCqtGZTixRll3v6I6/Oh405Gw 18fZkIC4TjRdXyfA23n7MzyukjOjmbzn5Kx01lfiMYFeiS/tMwFt/W+ka836j0R6 gzUdEHLEZdPN699WP4fRrxmIjGlGpEpl02WDEFrP+LDdFCYHCzY= =sOIu -----END PGP SIGNATURE----- Merge tag 'net-5.18-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter. Current release - regressions: - llc: only change llc->dev when bind() succeeds, fix null-deref Current release - new code bugs: - smc: fix a memory leak in smc_sysctl_net_exit() - dsa: realtek: make interface drivers depend on OF Previous releases - regressions: - sched: act_ct: fix ref leak when switching zones Previous releases - always broken: - netfilter: egress: report interface as outgoing - vsock/virtio: enable VQs early on probe and finish the setup before using them Misc: - memcg: enable accounting for nft objects" * tag 'net-5.18-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (39 commits) Revert "selftests: net: Add tls config dependency for tls selftests" net/smc: Send out the remaining data in sndbuf before close net: move net_unlink_todo() out of the header net: dsa: bcm_sf2_cfp: fix an incorrect NULL check on list iterator net: bnxt_ptp: fix compilation error selftests: net: Add tls config dependency for tls selftests memcg: enable accounting for nft objects net/sched: act_ct: fix ref leak when switching zones net/smc: fix a memory leak in smc_sysctl_net_exit() selftests: tls: skip cmsg_to_pipe tests with TLS=n octeontx2-af: initialize action variable net: sparx5: switchdev: fix possible NULL pointer dereference net/x25: Fix null-ptr-deref caused by x25_disconnect qlcnic: dcb: default to returning -EOPNOTSUPP net: sparx5: depends on PTP_1588_CLOCK_OPTIONAL net: hns3: fix phy can not link up when autoneg off and reset net: hns3: add NULL pointer check for hns3_set/get_ringparam() net: hns3: add netdev reset check for hns3_set_tunable() net: hns3: clean residual vf config after disable sriov net: hns3: add max order judgement for tx spare buffer ... |
||
Wen Gu
|
906b3d6491 |
net/smc: Send out the remaining data in sndbuf before close
The current autocork algorithms will delay the data transmission in BH context to smc_release_cb() when sock_lock is hold by user. So there is a possibility that when connection is being actively closed (sock_lock is hold by user now), some corked data still remains in sndbuf, waiting to be sent by smc_release_cb(). This will cause: - smc_close_stream_wait(), which is called under the sock_lock, has a high probability of timeout because data transmission is delayed until sock_lock is released. - Unexpected data sends may happen after connction closed and use the rtoken which has been deleted by remote peer through LLC_DELETE_RKEY messages. So this patch will try to send out the remaining corked data in sndbuf before active close process, to ensure data integrity and avoid unexpected data transmission after close. Reported-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Fixes: 6b88af839d20 ("net/smc: don't send in the BH context if sock_owned_by_user") Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/1648447836-111521-1-git-send-email-guwen@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Johannes Berg
|
f32404ae1b |
net: move net_unlink_todo() out of the header
There's no reason for this to be in netdevice.h, it's all just used in dev.c. Also make it no longer inline and let the compiler decide to do that by itself. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20220325225023.f49b9056fe1c.I6b901a2df00000837a9bd251a8dd259bd23f5ded@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Linus Torvalds
|
a701f370b5 |
xen: branch for v5.18-rc1
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYkF9UwAKCRCAXGG7T9hj vsXpAPwKXI4WIQcvnVCdULQfuXpA1TbD5XZuS9OuiN/OxWHbzAEA1VHWTmS+tpZ1 ptOyoGhAWhTGeplToobDSGz5qTXEPAI= =FaKX -----END PGP SIGNATURE----- Merge tag 'for-linus-5.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - A bunch of minor cleanups - A fix for kexec in Xen dom0 when executed on a high cpu number - A fix for resuming after suspend of a Xen guest with assigned PCI devices - A fix for a crash due to not disabled preemption when resuming as Xen dom0 * tag 'for-linus-5.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: fix is_xen_pmu() xen: don't hang when resuming PCI device arch:x86:xen: Remove unnecessary assignment in xen_apic_read() xen/grant-table: remove readonly parameter from functions xen/grant-table: remove gnttab_*transfer*() functions drivers/xen: use helper macro __ATTR_RW x86/xen: Fix kerneldoc warning xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32 xen: use time_is_before_eq_jiffies() instead of open coding it |
||
David S. Miller
|
2aa2f88c97 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Incorrect output device in nf_egress hook, from Phill Sutter. 2) Preserve liberal flag in TCP conntrack state, reported by Sven Auhagen. 3) Use GFP_KERNEL_ACCOUNT flag for nf_tables objects, from Vasily Averin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Vasily Averin
|
33758c8914 |
memcg: enable accounting for nft objects
nftables replaces iptables, but it lacks memcg accounting. This patch account most of the memory allocation associated with nft and should protect the host from misusing nft inside a memcg restricted container. Signed-off-by: Vasily Averin <vvs@openvz.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> |
||
Marcelo Ricardo Leitner
|
bcb74e132a |
net/sched: act_ct: fix ref leak when switching zones
When switching zones or network namespaces without doing a ct clear in between, it is now leaking a reference to the old ct entry. That's because tcf_ct_skb_nfct_cached() returns false and tcf_ct_flow_table_lookup() may simply overwrite it. The fix is to, as the ct entry is not reusable, free it already at tcf_ct_skb_nfct_cached(). Reported-by: Florian Westphal <fw@strlen.de> Fixes: 2f131de361f6 ("net/sched: act_ct: Fix flow table lookup after ct clear or switching zones") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Eric Dumazet
|
5ae6acf1d0 |
net/smc: fix a memory leak in smc_sysctl_net_exit()
Recently added smc_sysctl_net_exit() forgot to free the memory allocated from smc_sysctl_net_init() for non initial network namespace. Fixes: 462791bbfa35 ("net/smc: add sysctl interface for SMC") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Tony Lu <tonylu@linux.alibaba.com> Cc: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Duoming Zhou
|
7781607938 |
net/x25: Fix null-ptr-deref caused by x25_disconnect
When the link layer is terminating, x25->neighbour will be set to NULL in x25_disconnect(). As a result, it could cause null-ptr-deref bugs in x25_sendmsg(),x25_recvmsg() and x25_connect(). One of the bugs is shown below. (Thread 1) | (Thread 2) x25_link_terminated() | x25_recvmsg() x25_kill_by_neigh() | ... x25_disconnect() | lock_sock(sk) ... | ... x25->neighbour = NULL //(1) | ... | x25->neighbour->extended //(2) The code sets NULL to x25->neighbour in position (1) and dereferences x25->neighbour in position (2), which could cause null-ptr-deref bug. This patch adds lock_sock() in x25_kill_by_neigh() in order to synchronize with x25_sendmsg(), x25_recvmsg() and x25_connect(). What`s more, the sock held by lock_sock() is not NULL, because it is extracted from x25_list and uses x25_list_lock to synchronize. Fixes: 4becb7ee5b3d ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reviewed-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Eric Dumazet
|
2d327a79ee |
llc: only change llc->dev when bind() succeeds
My latest patch, attempting to fix the refcount leak in a minimal way turned out to add a new bug. Whenever the bind operation fails before we attempt to grab a reference count on a device, we might release the device refcount of a prior successful bind() operation. syzbot was not happy about this [1]. Note to stable teams: Make sure commit b37a46683739 ("netdevice: add the case if dev is NULL") is already present in your trees. [1] general protection fault, probably for non-canonical address 0xdffffc0000000070: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000380-0x0000000000000387] CPU: 1 PID: 3590 Comm: syz-executor361 Tainted: G W 5.17.0-syzkaller-04796-g169e77764adc #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:llc_ui_connect+0x400/0xcb0 net/llc/af_llc.c:500 Code: 80 3c 02 00 0f 85 fc 07 00 00 4c 8b a5 38 05 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d bc 24 80 03 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 a9 07 00 00 49 8b b4 24 80 03 00 00 4c 89 f2 48 RSP: 0018:ffffc900038cfcc0 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff8880756eb600 RCX: 0000000000000000 RDX: 0000000000000070 RSI: ffffc900038cfe3e RDI: 0000000000000380 RBP: ffff888015ee5000 R08: 0000000000000001 R09: ffff888015ee5535 R10: ffffed1002bdcaa6 R11: 0000000000000000 R12: 0000000000000000 R13: ffffc900038cfe37 R14: ffffc900038cfe38 R15: ffff888015ee5012 FS: 0000555555acd300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000280 CR3: 0000000077db6000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __sys_connect_file+0x155/0x1a0 net/socket.c:1900 __sys_connect+0x161/0x190 net/socket.c:1917 __do_sys_connect net/socket.c:1927 [inline] __se_sys_connect net/socket.c:1924 [inline] __x64_sys_connect+0x6f/0xb0 net/socket.c:1924 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f016acb90b9 Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffd417947f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f016acb90b9 RDX: 0000000000000010 RSI: 0000000020000140 RDI: 0000000000000003 RBP: 00007f016ac7d0a0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f016ac7d130 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:llc_ui_connect+0x400/0xcb0 net/llc/af_llc.c:500 Fixes: 764f4eb6846f ("llc: fix netdevice reference leaks in llc_ui_bind()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: 赵子轩 <beraphin@gmail.com> Cc: Stoyan Manolov <smanolov@suse.de> Link: https://lore.kernel.org/r/20220325035827.360418-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Trond Myklebust
|
ebbe788731 |
SUNRPC: Don't return error values in sysfs read of closed files
Instead of returning an error value, which ends up being the return value for the read() system call, it is more elegant to simply return the error as a string value. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> |
||
Trond Myklebust
|
421ab1be43 |
SUNRPC: Do not dereference non-socket transports in sysfs
Do not cast the struct xprt to a sock_xprt unless we know it is a UDP or TCP transport. Otherwise the call to lock the mutex will scribble over whatever structure is actually there. This has been seen to cause hard system lockups when the underlying transport was RDMA. Fixes: b49ea673e119 ("SUNRPC: lock against ->sock changing during sysfs read") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> |
||
Stefano Garzarella
|
88704454ef |
vsock/virtio: enable VQs early on probe
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after probe returns, but virtio-vsock driver uses VQs in the probe function to fill rx and event VQs with new buffers. Let's fix this, calling virtio_device_ready() before using VQs in the probe function. Fixes: 0ea9e1d3a9e3 ("VSOCK: Introduce virtio_transport.ko") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Stefano Garzarella
|
c1011c0b3a |
vsock/virtio: read the negotiated features before using VQs
Complete the driver configuration, reading the negotiated features, before using the VQs in the virtio_vsock_probe(). Fixes: 53efbba12cc7 ("virtio/vsock: enable SEQPACKET for transport") Suggested-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Stefano Garzarella
|
4b5f1ad556 |
vsock/virtio: initialize vdev->priv before using VQs
When we fill VQs with empty buffers and kick the host, it may send an interrupt. `vdev->priv` must be initialized before this since it is used in the virtqueue callbacks. Fixes: 0deab087b16a ("vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock") Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Linus Torvalds
|
85c7000fda |
The highlights are:
- several changes to how snap context and snap realms are tracked (Xiubo Li). In particular, this should resolve a long-standing issue of high kworker CPU usage and various stalls caused by needless iteration over all inodes in the snap realm. - async create fixes to address hangs in some edge cases (Jeff Layton) - support for getvxattr MDS op for querying server-side xattrs, such as file/directory layouts and ephemeral pins (Milind Changire) - average latency is now maintained for all metrics (Venky Shankar) - some tweaks around handling inline data to make it fit better with netfs helper library (David Howells) Also a couple of memory leaks got plugged along with a few assorted fixups. Last but not least, Xiubo has stepped up to serve as a CephFS co-maintainer. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmI8qE4THGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHzi9UvB/kBZt4mkyjqRC+KJ5rukw5q7lyAYGC1 QYVbuTuIJydyDvqQp9pXYFndlj10pb7ULnUlQlcBfntVAr9s7xx7ZKrKciE48MPT vLiJmq3MpEedM4oE4FgcJbmHtltDgZWvOxXB7renpHNeHuPeezNpKzaKQXGUHUDo +7cX5XWBzZk+AYbEvxQUsjDozcgDp31qf015mAX3r0P7XFkBB7xwZA7sb7Cw1GEr S6ZdlNoFcWUq0ULUdh2C7l5a2mKQnVnpOO3TMjE6tSqJ74iozRy4tO9aFgj99NEn D1rQbCLr3JPfY//JFyqEOIDYf3hepOMmEkoGHOFukckFKUe3yfJJxa3r =ESr5 -----END PGP SIGNATURE----- Merge tag 'ceph-for-5.18-rc1' of https://github.com/ceph/ceph-client Pull ceph updates from Ilya Dryomov: "The highlights are: - several changes to how snap context and snap realms are tracked (Xiubo Li). In particular, this should resolve a long-standing issue of high kworker CPU usage and various stalls caused by needless iteration over all inodes in the snap realm. - async create fixes to address hangs in some edge cases (Jeff Layton) - support for getvxattr MDS op for querying server-side xattrs, such as file/directory layouts and ephemeral pins (Milind Changire) - average latency is now maintained for all metrics (Venky Shankar) - some tweaks around handling inline data to make it fit better with netfs helper library (David Howells) Also a couple of memory leaks got plugged along with a few assorted fixups. Last but not least, Xiubo has stepped up to serve as a CephFS co-maintainer" * tag 'ceph-for-5.18-rc1' of https://github.com/ceph/ceph-client: (27 commits) ceph: fix memory leak in ceph_readdir when note_last_dentry returns error ceph: uninitialized variable in debug output ceph: use tracked average r/w/m latencies to display metrics in debugfs ceph: include average/stdev r/w/m latency in mds metrics ceph: track average r/w/m latency ceph: use ktime_to_timespec64() rather than jiffies_to_timespec64() ceph: assign the ci only when the inode isn't NULL ceph: fix inode reference leakage in ceph_get_snapdir() ceph: misc fix for code style and logs ceph: allocate capsnap memory outside of ceph_queue_cap_snap() ceph: do not release the global snaprealm until unmounting ceph: remove incorrect and unused CEPH_INO_DOTDOT macro MAINTAINERS: add Xiubo Li as cephfs co-maintainer ceph: eliminate the recursion when rebuilding the snap context ceph: do not update snapshot context when there is no new snapshot ceph: zero the dir_entries memory when allocating it ceph: move to a dedicated slabcache for ceph_cap_snap ceph: add getvxattr op libceph: drop else branches in prepare_read_data{,_cont} ceph: fix comments mentioning i_mutex ... |
||
Linus Torvalds
|
169e77764a |
Networking changes for 5.18.
Core ---- - Introduce XDP multi-buffer support, allowing the use of XDP with jumbo frame MTUs and combination with Rx coalescing offloads (LRO). - Speed up netns dismantling (5x) and lower the memory cost a little. Remove unnecessary per-netns sockets. Scope some lists to a netns. Cut down RCU syncing. Use batch methods. Allow netdev registration to complete out of order. - Support distinguishing timestamp types (ingress vs egress) and maintaining them across packet scrubbing points (e.g. redirect). - Continue the work of annotating packet drop reasons throughout the stack. - Switch netdev error counters from an atomic to dynamically allocated per-CPU counters. - Rework a few preempt_disable(), local_irq_save() and busy waiting sections problematic on PREEMPT_RT. - Extend the ref_tracker to allow catching use-after-free bugs. BPF --- - Introduce "packing allocator" for BPF JIT images. JITed code is marked read only, and used to be allocated at page granularity. Custom allocator allows for more efficient memory use, lower iTLB pressure and prevents identity mapping huge pages from getting split. - Make use of BTF type annotations (e.g. __user, __percpu) to enforce the correct probe read access method, add appropriate helpers. - Convert the BPF preload to use light skeleton and drop the user-mode-driver dependency. - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling its use as a packet generator. - Allow local storage memory to be allocated with GFP_KERNEL if called from a hook allowed to sleep. - Introduce fprobe (multi kprobe) to speed up mass attachment (arch bits to come later). - Add unstable conntrack lookup helpers for BPF by using the BPF kfunc infra. - Allow cgroup BPF progs to return custom errors to user space. - Add support for AF_UNIX iterator batching. - Allow iterator programs to use sleepable helpers. - Support JIT of add, and, or, xor and xchg atomic ops on arm64. - Add BTFGen support to bpftool which allows to use CO-RE in kernels without BTF info. - Large number of libbpf API improvements, cleanups and deprecations. Protocols --------- - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev. - Adjust TSO packet sizes based on min_rtt, allowing very low latency links (data centers) to always send full-sized TSO super-frames. - Make IPv6 flow label changes (AKA hash rethink) more configurable, via sysctl and setsockopt. Distinguish between server and client behavior. - VxLAN support to "collect metadata" devices to terminate only configured VNIs. This is similar to VLAN filtering in the bridge. - Support inserting IPv6 IOAM information to a fraction of frames. - Add protocol attribute to IP addresses to allow identifying where given address comes from (kernel-generated, DHCP etc.) - Support setting socket and IPv6 options via cmsg on ping6 sockets. - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS. Define dscp_t and stop taking ECN bits into account in fib-rules. - Add support for locked bridge ports (for 802.1X). - tun: support NAPI for packets received from batched XDP buffs, doubling the performance in some scenarios. - IPv6 extension header handling in Open vSwitch. - Support IPv6 control message load balancing in bonding, prevent neighbor solicitation and advertisement from using the wrong port. Support NS/NA monitor selection similar to existing ARP monitor. - SMC - improve performance with TCP_CORK and sendfile() - support auto-corking - support TCP_NODELAY - MCTP (Management Component Transport Protocol) - add user space tag control interface - I2C binding driver (as specified by DMTF DSP0237) - Multi-BSSID beacon handling in AP mode for WiFi. - Bluetooth: - handle MSFT Monitor Device Event - add MGMT Adv Monitor Device Found/Lost events - Multi-Path TCP: - add support for the SO_SNDTIMEO socket option - lots of selftest cleanups and improvements - Increase the max PDU size in CAN ISOTP to 64 kB. Driver API ---------- - Add HW counters for SW netdevs, a mechanism for devices which offload packet forwarding to report packet statistics back to software interfaces such as tunnels. - Select the default NIC queue count as a fraction of number of physical CPU cores, instead of hard-coding to 8. - Expose devlink instance locks to drivers. Allow device layer of drivers to use that lock directly instead of creating their own which always runs into ordering issues in devlink callbacks. - Add header/data split indication to guide user space enabling of TCP zero-copy Rx. - Allow configuring completion queue event size. - Refactor page_pool to enable fragmenting after allocation. - Add allocation and page reuse statistics to page_pool. - Improve Multiple Spanning Trees support in the bridge to allow reuse of topologies across VLANs, saving HW resources in switches. - DSA (Distributed Switch Architecture): - replay and offload of host VLAN entries - offload of static and local FDB entries on LAG interfaces - FDB isolation and unicast filtering New hardware / drivers ---------------------- - Ethernet: - LAN937x T1 PHYs - Davicom DM9051 SPI NIC driver - Realtek RTL8367S, RTL8367RB-VB switch and MDIO - Microchip ksz8563 switches - Netronome NFP3800 SmartNICs - Fungible SmartNICs - MediaTek MT8195 switches - WiFi: - mt76: MediaTek mt7916 - mt76: MediaTek mt7921u USB adapters - brcmfmac: Broadcom BCM43454/6 - Mobile: - iosm: Intel M.2 7360 WWAN card Drivers ------- - Convert many drivers to the new phylink API built for split PCS designs but also simplifying other cases. - Intel Ethernet NICs: - add TTY for GNSS module for E810T device - improve AF_XDP performance - GTP-C and GTP-U filter offload - QinQ VLAN support - Mellanox Ethernet NICs (mlx5): - support xdp->data_meta - multi-buffer XDP - offload tc push_eth and pop_eth actions - Netronome Ethernet NICs (nfp): - flow-independent tc action hardware offload (police / meter) - AF_XDP - Other Ethernet NICs: - at803x: fiber and SFP support - xgmac: mdio: preamble suppression and custom MDC frequencies - r8169: enable ASPM L1.2 if system vendor flags it as safe - macb/gem: ZynqMP SGMII - hns3: add TX push mode - dpaa2-eth: software TSO - lan743x: multi-queue, mdio, SGMII, PTP - axienet: NAPI and GRO support - Mellanox Ethernet switches (mlxsw): - source and dest IP address rewrites - RJ45 ports - Marvell Ethernet switches (prestera): - basic routing offload - multi-chain TC ACL offload - NXP embedded Ethernet switches (ocelot & felix): - PTP over UDP with the ocelot-8021q DSA tagging protocol - basic QoS classification on Felix DSA switch using dcbnl - port mirroring for ocelot switches - Microchip high-speed industrial Ethernet (sparx5): - offloading of bridge port flooding flags - PTP Hardware Clock - Other embedded switches: - lan966x: PTP Hardward Clock - qca8k: mdio read/write operations via crafted Ethernet packets - Qualcomm 802.11ax WiFi (ath11k): - add LDPC FEC type and 802.11ax High Efficiency data in radiotap - enable RX PPDU stats in monitor co-exist mode - Intel WiFi (iwlwifi): - UHB TAS enablement via BIOS - band disablement via BIOS - channel switch offload - 32 Rx AMPDU sessions in newer devices - MediaTek WiFi (mt76): - background radar detection - thermal management improvements on mt7915 - SAR support for more mt76 platforms - MBSSID and 6 GHz band on mt7915 - RealTek WiFi: - rtw89: AP mode - rtw89: 160 MHz channels and 6 GHz band - rtw89: hardware scan - Bluetooth: - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS) - Microchip CAN (mcp251xfd): - multiple RX-FIFOs and runtime configurable RX/TX rings - internal PLL, runtime PM handling simplification - improve chip detection and error handling after wakeup Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmI7YBcACgkQMUZtbf5S IrveSBAAmSNJlUK6vPsnNzs7IhsZnfI/AUjm2TCLZnlhKttbpI4A/4Pohk33V7RS FGX7f8kjEfhUwrIiLDgeCnztNHRECrCmk6aZc/jLEvecmTauJ+f6kjShkDY/wix+ AkPHmrZnQeLPAEVuljDdV+sL6ik08+zQL7PazIYHsaSKKC0MGQptRwcri8PLRAKE KPBAhVhleq2rAZ/ntprSN52F4Af6rpFTrPIWuN8Bqdbc9dy5094LT0mpOOWYvgr3 /DLvvAPuLemwyIQkjWknVKBRUAQcmNPC+BY3J8K3LRaiNhekGqOFan46BfqP+k2J 6DWu0Qrp2yWt4BMOeEToZR5rA6v5suUAMIBu8PRZIDkINXQMlIxHfGjZyNm0rVfw 7edNri966yus9OdzwPa32MIG3oC6PnVAwYCJAjjBMNS8sSIkp7wgHLkgWN4UFe2H K/e6z8TLF4UQ+zFM0aGI5WZ+9QqWkTWEDF3R3OhdFpGrznna0gxmkOeV2YvtsgxY cbS0vV9Zj73o+bYzgBKJsw/dAjyLdXoHUGvus26VLQ78S/VGunVKtItwoxBAYmZo krW964qcC89YofzSi8RSKLHuEWtNWZbVm8YXr75u6jpr5GhMBu0CYefLs+BuZcxy dw8c69cGneVbGZmY2J3rBhDkchbuICl8vdUPatGrOJAoaFdYKuw= =ELpe -----END PGP SIGNATURE----- Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "The sprinkling of SPI drivers is because we added a new one and Mark sent us a SPI driver interface conversion pull request. Core ---- - Introduce XDP multi-buffer support, allowing the use of XDP with jumbo frame MTUs and combination with Rx coalescing offloads (LRO). - Speed up netns dismantling (5x) and lower the memory cost a little. Remove unnecessary per-netns sockets. Scope some lists to a netns. Cut down RCU syncing. Use batch methods. Allow netdev registration to complete out of order. - Support distinguishing timestamp types (ingress vs egress) and maintaining them across packet scrubbing points (e.g. redirect). - Continue the work of annotating packet drop reasons throughout the stack. - Switch netdev error counters from an atomic to dynamically allocated per-CPU counters. - Rework a few preempt_disable(), local_irq_save() and busy waiting sections problematic on PREEMPT_RT. - Extend the ref_tracker to allow catching use-after-free bugs. BPF --- - Introduce "packing allocator" for BPF JIT images. JITed code is marked read only, and used to be allocated at page granularity. Custom allocator allows for more efficient memory use, lower iTLB pressure and prevents identity mapping huge pages from getting split. - Make use of BTF type annotations (e.g. __user, __percpu) to enforce the correct probe read access method, add appropriate helpers. - Convert the BPF preload to use light skeleton and drop the user-mode-driver dependency. - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling its use as a packet generator. - Allow local storage memory to be allocated with GFP_KERNEL if called from a hook allowed to sleep. - Introduce fprobe (multi kprobe) to speed up mass attachment (arch bits to come later). - Add unstable conntrack lookup helpers for BPF by using the BPF kfunc infra. - Allow cgroup BPF progs to return custom errors to user space. - Add support for AF_UNIX iterator batching. - Allow iterator programs to use sleepable helpers. - Support JIT of add, and, or, xor and xchg atomic ops on arm64. - Add BTFGen support to bpftool which allows to use CO-RE in kernels without BTF info. - Large number of libbpf API improvements, cleanups and deprecations. Protocols --------- - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev. - Adjust TSO packet sizes based on min_rtt, allowing very low latency links (data centers) to always send full-sized TSO super-frames. - Make IPv6 flow label changes (AKA hash rethink) more configurable, via sysctl and setsockopt. Distinguish between server and client behavior. - VxLAN support to "collect metadata" devices to terminate only configured VNIs. This is similar to VLAN filtering in the bridge. - Support inserting IPv6 IOAM information to a fraction of frames. - Add protocol attribute to IP addresses to allow identifying where given address comes from (kernel-generated, DHCP etc.) - Support setting socket and IPv6 options via cmsg on ping6 sockets. - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS. Define dscp_t and stop taking ECN bits into account in fib-rules. - Add support for locked bridge ports (for 802.1X). - tun: support NAPI for packets received from batched XDP buffs, doubling the performance in some scenarios. - IPv6 extension header handling in Open vSwitch. - Support IPv6 control message load balancing in bonding, prevent neighbor solicitation and advertisement from using the wrong port. Support NS/NA monitor selection similar to existing ARP monitor. - SMC - improve performance with TCP_CORK and sendfile() - support auto-corking - support TCP_NODELAY - MCTP (Management Component Transport Protocol) - add user space tag control interface - I2C binding driver (as specified by DMTF DSP0237) - Multi-BSSID beacon handling in AP mode for WiFi. - Bluetooth: - handle MSFT Monitor Device Event - add MGMT Adv Monitor Device Found/Lost events - Multi-Path TCP: - add support for the SO_SNDTIMEO socket option - lots of selftest cleanups and improvements - Increase the max PDU size in CAN ISOTP to 64 kB. Driver API ---------- - Add HW counters for SW netdevs, a mechanism for devices which offload packet forwarding to report packet statistics back to software interfaces such as tunnels. - Select the default NIC queue count as a fraction of number of physical CPU cores, instead of hard-coding to 8. - Expose devlink instance locks to drivers. Allow device layer of drivers to use that lock directly instead of creating their own which always runs into ordering issues in devlink callbacks. - Add header/data split indication to guide user space enabling of TCP zero-copy Rx. - Allow configuring completion queue event size. - Refactor page_pool to enable fragmenting after allocation. - Add allocation and page reuse statistics to page_pool. - Improve Multiple Spanning Trees support in the bridge to allow reuse of topologies across VLANs, saving HW resources in switches. - DSA (Distributed Switch Architecture): - replay and offload of host VLAN entries - offload of static and local FDB entries on LAG interfaces - FDB isolation and unicast filtering New hardware / drivers ---------------------- - Ethernet: - LAN937x T1 PHYs - Davicom DM9051 SPI NIC driver - Realtek RTL8367S, RTL8367RB-VB switch and MDIO - Microchip ksz8563 switches - Netronome NFP3800 SmartNICs - Fungible SmartNICs - MediaTek MT8195 switches - WiFi: - mt76: MediaTek mt7916 - mt76: MediaTek mt7921u USB adapters - brcmfmac: Broadcom BCM43454/6 - Mobile: - iosm: Intel M.2 7360 WWAN card Drivers ------- - Convert many drivers to the new phylink API built for split PCS designs but also simplifying other cases. - Intel Ethernet NICs: - add TTY for GNSS module for E810T device - improve AF_XDP performance - GTP-C and GTP-U filter offload - QinQ VLAN support - Mellanox Ethernet NICs (mlx5): - support xdp->data_meta - multi-buffer XDP - offload tc push_eth and pop_eth actions - Netronome Ethernet NICs (nfp): - flow-independent tc action hardware offload (police / meter) - AF_XDP - Other Ethernet NICs: - at803x: fiber and SFP support - xgmac: mdio: preamble suppression and custom MDC frequencies - r8169: enable ASPM L1.2 if system vendor flags it as safe - macb/gem: ZynqMP SGMII - hns3: add TX push mode - dpaa2-eth: software TSO - lan743x: multi-queue, mdio, SGMII, PTP - axienet: NAPI and GRO support - Mellanox Ethernet switches (mlxsw): - source and dest IP address rewrites - RJ45 ports - Marvell Ethernet switches (prestera): - basic routing offload - multi-chain TC ACL offload - NXP embedded Ethernet switches (ocelot & felix): - PTP over UDP with the ocelot-8021q DSA tagging protocol - basic QoS classification on Felix DSA switch using dcbnl - port mirroring for ocelot switches - Microchip high-speed industrial Ethernet (sparx5): - offloading of bridge port flooding flags - PTP Hardware Clock - Other embedded switches: - lan966x: PTP Hardward Clock - qca8k: mdio read/write operations via crafted Ethernet packets - Qualcomm 802.11ax WiFi (ath11k): - add LDPC FEC type and 802.11ax High Efficiency data in radiotap - enable RX PPDU stats in monitor co-exist mode - Intel WiFi (iwlwifi): - UHB TAS enablement via BIOS - band disablement via BIOS - channel switch offload - 32 Rx AMPDU sessions in newer devices - MediaTek WiFi (mt76): - background radar detection - thermal management improvements on mt7915 - SAR support for more mt76 platforms - MBSSID and 6 GHz band on mt7915 - RealTek WiFi: - rtw89: AP mode - rtw89: 160 MHz channels and 6 GHz band - rtw89: hardware scan - Bluetooth: - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS) - Microchip CAN (mcp251xfd): - multiple RX-FIFOs and runtime configurable RX/TX rings - internal PLL, runtime PM handling simplification - improve chip detection and error handling after wakeup" * tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits) llc: fix netdevice reference leaks in llc_ui_bind() drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool ice: don't allow to run ice_send_event_to_aux() in atomic ctx ice: fix 'scheduling while atomic' on aux critical err interrupt net/sched: fix incorrect vlan_push_eth dest field net: bridge: mst: Restrict info size queries to bridge ports net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init() drivers: net: xgene: Fix regression in CRC stripping net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT net: dsa: fix missing host-filtered multicast addresses net/mlx5e: Fix build warning, detected write beyond size of field iwlwifi: mvm: Don't fail if PPAG isn't supported selftests/bpf: Fix kprobe_multi test. Revert "rethook: x86: Add rethook x86 implementation" Revert "arm64: rethook: Add arm64 rethook implementation" Revert "powerpc: Add rethook support" Revert "ARM: rethook: Add rethook arm implementation" netdevice: add missing dm_private kdoc net: bridge: mst: prevent NULL deref in br_mst_info_size() selftests: forwarding: Use same VRF for port and VLAN upper ... |
||
Olga Kornievskaia
|
82ee41b85c |
SUNRPC don't resend a task on an offlined transport
When a task is being retried, due to an NFS error, if the assigned transport has been put offline and the task is relocatable pick a new transport. Fixes: 6f081693e7b2b ("sunrpc: remove an offlined xprt using sysfs") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> |
||
Pablo Neira Ayuso
|
f2dd495a8d |
netfilter: nf_conntrack_tcp: preserve liberal flag in tcp options
Do not reset IP_CT_TCP_FLAG_BE_LIBERAL flag in out-of-sync scenarios coming before the TCP window tracking, otherwise such connections will fail in the window check. Update tcp_options() to leave this flag in place and add a new helper function to reset the tcp window state. Based on patch from Sven Auhagen. Fixes: c4832c7bbc3f ("netfilter: nf_ct_tcp: improve out-of-sync situation in TCP tracking") Tested-by: Sven Auhagen <sven.auhagen@voleatech.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> |
||
Linus Torvalds
|
194dfe88d6 |
asm-generic updates for 5.18
There are three sets of updates for 5.18 in the asm-generic tree: - The set_fs()/get_fs() infrastructure gets removed for good. This was already gone from all major architectures, but now we can finally remove it everywhere, which loses some particularly tricky and error-prone code. There is a small merge conflict against a parisc cleanup, the solution is to use their new version. - The nds32 architecture ends its tenure in the Linux kernel. The hardware is still used and the code is in reasonable shape, but the mainline port is not actively maintained any more, as all remaining users are thought to run vendor kernels that would never be updated to a future release. There are some obvious conflicts against changes to the removed files. - A series from Masahiro Yamada cleans up some of the uapi header files to pass the compile-time checks. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmI69BsACgkQmmx57+YA GNn/zA//f4d5VTT0ThhRxRWTu9BdThGHoB8TUcY7iOhbsWu0X/913NItRC3UeWNl IdmisaXgVtirg1dcC2pWUmrcHdoWOCEGfK4+Zr2NhSWfuZDWvODHK9pGWk4WLnhe cQgUNBvIuuAMryGtrOBwHPO4TpfCyy2ioeVP36ZfcsWXdDxTrqfaq/56mk3sxIP6 sUTk1UEjut9NG4C9xIIvcSU50R3l6LryQE/H9kyTLtaSvfvTOvprcVYCq0GPmSzo DtQ1Wwa9zbJ+4EqoMiP5RrgQwWvOTg2iRByLU8ytwlX3e/SEF0uihvMv1FQbL8zG G8RhGUOKQSEhaBfc3lIkm8GpOVPh0uHzB6zhn7daVmAWtazRD2Nu59BMjipa+ims a8Z58iHH7jRAnKeEkVZqXKb1CEiUxaQx/IeVPzN4QlwMhDtwrI76LY7ZJ1zCqTGY ENG0yRLav1XselYBslOYXGtOEWcY5EZPWqLyWbp4P9vz2g0Fe0gZxoIOvPmNQc89 QnfXpCt7vm/DGkyO255myu08GOLeMkisVqUIzLDB9avlym5mri7T7vk9abBa2YyO CRpTL5gl1/qKPWuH1UI5mvhT+sbbBE2SUHSuy84btns39ZKKKynwCtdu+hSQkKLE h9pV30Gf1cLTD4JAE0RWlUgOmbBLVp34loTOexQj4MrLM1noOnw= =vtCN -----END PGP SIGNATURE----- Merge tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "There are three sets of updates for 5.18 in the asm-generic tree: - The set_fs()/get_fs() infrastructure gets removed for good. This was already gone from all major architectures, but now we can finally remove it everywhere, which loses some particularly tricky and error-prone code. There is a small merge conflict against a parisc cleanup, the solution is to use their new version. - The nds32 architecture ends its tenure in the Linux kernel. The hardware is still used and the code is in reasonable shape, but the mainline port is not actively maintained any more, as all remaining users are thought to run vendor kernels that would never be updated to a future release. - A series from Masahiro Yamada cleans up some of the uapi header files to pass the compile-time checks" * tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (27 commits) nds32: Remove the architecture uaccess: remove CONFIG_SET_FS ia64: remove CONFIG_SET_FS support sh: remove CONFIG_SET_FS support sparc64: remove CONFIG_SET_FS support lib/test_lockup: fix kernel pointer check for separate address spaces uaccess: generalize access_ok() uaccess: fix type mismatch warnings from access_ok() arm64: simplify access_ok() m68k: fix access_ok for coldfire MIPS: use simpler access_ok() MIPS: Handle address errors for accesses above CPU max virtual user address uaccess: add generic __{get,put}_kernel_nofault nios2: drop access_ok() check from __put_user() x86: use more conventional access_ok() definition x86: remove __range_not_ok() sparc64: add __{get,put}_kernel_nofault() nds32: fix access_ok() checks in get/put_user uaccess: fix nios2 and microblaze get_user_8() sparc64: fix building assembly files ... |
||
NeilBrown
|
3848e96edf |
SUNRPC: avoid race between mod_timer() and del_timer_sync()
xprt_destory() claims XPRT_LOCKED and then calls del_timer_sync(). Both xprt_unlock_connect() and xprt_release() call ->release_xprt() which drops XPRT_LOCKED and *then* xprt_schedule_autodisconnect() which calls mod_timer(). This may result in mod_timer() being called *after* del_timer_sync(). When this happens, the timer may fire long after the xprt has been freed, and run_timer_softirq() will probably crash. The pairing of ->release_xprt() and xprt_schedule_autodisconnect() is always called under ->transport_lock. So if we take ->transport_lock to call del_timer_sync(), we can be sure that mod_timer() will run first (if it runs at all). Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> |
||
Jakub Kicinski
|
89695196f0 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Merge in overtime fixes, no conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Eric Dumazet
|
764f4eb684 |
llc: fix netdevice reference leaks in llc_ui_bind()
Whenever llc_ui_bind() and/or llc_ui_autobind() took a reference on a netdevice but subsequently fail, they must properly release their reference or risk the infamous message from unregister_netdevice() at device dismantle. unregister_netdevice: waiting for eth0 to become free. Usage count = 3 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: 赵子轩 <beraphin@gmail.com> Reported-by: Stoyan Manolov <smanolov@suse.de> Link: https://lore.kernel.org/r/20220323004147.1990845-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Tobias Waldekranz
|
a911ad18a5 |
net: bridge: mst: Restrict info size queries to bridge ports
Ensure that no bridge masters are ever considered for MST info dumping. MST states are only supported on bridge ports, not bridge masters - which br_mst_info_size relies on. Fixes: 122c29486e1f ("net: bridge: mst: Support setting and reporting MST port states") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Link: https://lore.kernel.org/r/20220322133001.16181-1-tobias@waldekranz.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Vladimir Oltean
|
5077e2c8cf |
net: dsa: fix missing host-filtered multicast addresses
DSA ports are stacked devices, so they use dev_mc_add() to sync their address list to their lower interface (DSA master). But they are also hardware devices, so they program those addresses to hardware using the __dev_mc_add() sync and unsync callbacks. Unfortunately both cannot work at the same time, and it seems that the multicast addresses which are already present on the DSA master, like 33:33:00:00:00:01 (added by addrconf.c as in6addr_linklocal_allnodes) are synced to the master via dev_mc_sync(), but not to hardware by __dev_mc_sync(). This happens because both the dev_mc_sync() -> __hw_addr_sync_one() code path, as well as __dev_mc_sync() -> __hw_addr_sync_dev(), operate on the same variable: ha->sync_cnt, in a way that causes the "sync" method (dsa_slave_sync_mc) to no longer be called. To fix the issue we need to work with the API in the way in which it was intended to be used, and therefore, call dev_uc_add() and friends for each individual hardware address, from the sync and unsync callbacks. Fixes: 5e8a1e03aa4d ("net: dsa: install secondary unicast and multicast addresses as host FDB/MDB") Link: https://lore.kernel.org/netdev/20220321163213.lrn5sk7m6grighbl@skbuf/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220322003701.2056895-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Linus Torvalds
|
3bf03b9a08 |
Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton: - A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs - Most the MM patches which precede the patches in Willy's tree: kasan, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb, userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp, cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap, zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (227 commits) mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release() Docs/ABI/testing: add DAMON sysfs interface ABI document Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface selftests/damon: add a test for DAMON sysfs interface mm/damon/sysfs: support DAMOS stats mm/damon/sysfs: support DAMOS watermarks mm/damon/sysfs: support schemes prioritization mm/damon/sysfs: support DAMOS quotas mm/damon/sysfs: support DAMON-based Operation Schemes mm/damon/sysfs: support the physical address space monitoring mm/damon/sysfs: link DAMON for virtual address spaces monitoring mm/damon: implement a minimal stub for sysfs-based DAMON interface mm/damon/core: add number of each enum type values mm/damon/core: allow non-exclusive DAMON start/stop Docs/damon: update outdated term 'regions update interval' Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling Docs/vm/damon: call low level monitoring primitives the operations mm/damon: remove unnecessary CONFIG_DAMON option mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}() mm/damon/dbgfs-test: fix is_target_id() change ... |
||
Muchun Song
|
fd60b28842 |
fs: allocate inode by using alloc_inode_sb()
The inode allocation is supposed to use alloc_inode_sb(), so convert kmem_cache_alloc() of all filesystems to alloc_inode_sb(). Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Theodore Ts'o <tytso@mit.edu> [ext4] Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Alex Shi <alexs@kernel.org> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kari Argillander <kari.argillander@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |