1a27fc0170
16883 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Linus Torvalds
|
73be7fb14e |
Including fixes from netfilter and bpf.
Current release - regressions: - eth: stmmac: fix failure to probe without MAC interface specified Current release - new code bugs: - docs: netlink: fix missing classic_netlink doc reference Previous releases - regressions: - deal with integer overflows in kmalloc_reserve() - use sk_forward_alloc_get() in sk_get_meminfo() - bpf_sk_storage: fix the missing uncharge in sk_omem_alloc - fib: avoid warn splat in flow dissector after packet mangling - skb_segment: call zero copy functions before using skbuff frags - eth: sfc: check for zero length in EF10 RX prefix Previous releases - always broken: - af_unix: fix msg_controllen test in scm_pidfd_recv() for MSG_CMSG_COMPAT - xsk: fix xsk_build_skb() dereferencing possible ERR_PTR() - netfilter: - nft_exthdr: fix non-linear header modification - xt_u32, xt_sctp: validate user space input - nftables: exthdr: fix 4-byte stack OOB write - nfnetlink_osf: avoid OOB read - one more fix for the garbage collection work from last release - igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU - bpf, sockmap: fix preempt_rt splat when using raw_spin_lock_t - handshake: fix null-deref in handshake_nl_done_doit() - ip: ignore dst hint for multipath routes to ensure packets are hashed across the nexthops - phy: micrel: - correct bit assignments for cable test errata - disable EEE according to the KSZ9477 errata Misc: - docs/bpf: document compile-once-run-everywhere (CO-RE) relocations - Revert "net: macsec: preserve ingress frame ordering", it appears to have been developed against an older kernel, problem doesn't exist upstream Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmT6R6wACgkQMUZtbf5S IrsmTg//TgmRjxSZ0lrPQtJwZR/eN3ZR2oQG3rwnssCx+YgHEGGxQsfT4KHEMacR ZgGDZVTpthUJkkACBPi8ZMoy++RdjEmlCcanfeDkGHoYGtiX1lhkofhLMn1KUHbI rIbP9EdNKxQT0SsBlw/U28pD5jKyqOgL23QobEwmcjLTdMpamb+qIsD6/xNv9tEj Tu4BdCIkhjxnBD622hsE3pFTG7oSn2WM6rf5NT1E43mJ3W8RrMcydSB27J7Oryo9 l3nYMAhz0vQINS2WQ9eCT1/7GI6gg1nDtxFtrnV7ASvxayRBPIUr4kg1vT+Tixsz CZMnwVamEBIYl9agmj7vSji7d5nOUgXPhtWhwWUM2tRoGdeGw3vSi1pgDvRiUCHE PJ4UHv7goa2AgnOlOQCFtRybAu+9nmSGm7V+GkeGLnH7xbFsEa5smQ/+FSPJs8Dn Yf4q5QAhdN8tdnofRlrN/nCssoDF3cfmBsTJ7wo5h71gW+BWhsP58eDCJlXd/r8k +Qnvoe2kw27ktFR1tjsUDZ0AcSmeVARNwmXCOBYZsG4tEek8pLyj008mDvJvdfyn PGPn7Eo5DyaERlHVmPuebHXSyniDEPe2GLTmlHcGiRpGspoUHbB+HRiDAuRLMB9g pkL8RHpNfppnuUXeUoNy3rgEkYwlpTjZX0QHC6N8NQ76ccB6CNM= =YpmE -----END PGP SIGNATURE----- Merge tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking updates from Jakub Kicinski: "Including fixes from netfilter and bpf. Current release - regressions: - eth: stmmac: fix failure to probe without MAC interface specified Current release - new code bugs: - docs: netlink: fix missing classic_netlink doc reference Previous releases - regressions: - deal with integer overflows in kmalloc_reserve() - use sk_forward_alloc_get() in sk_get_meminfo() - bpf_sk_storage: fix the missing uncharge in sk_omem_alloc - fib: avoid warn splat in flow dissector after packet mangling - skb_segment: call zero copy functions before using skbuff frags - eth: sfc: check for zero length in EF10 RX prefix Previous releases - always broken: - af_unix: fix msg_controllen test in scm_pidfd_recv() for MSG_CMSG_COMPAT - xsk: fix xsk_build_skb() dereferencing possible ERR_PTR() - netfilter: - nft_exthdr: fix non-linear header modification - xt_u32, xt_sctp: validate user space input - nftables: exthdr: fix 4-byte stack OOB write - nfnetlink_osf: avoid OOB read - one more fix for the garbage collection work from last release - igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU - bpf, sockmap: fix preempt_rt splat when using raw_spin_lock_t - handshake: fix null-deref in handshake_nl_done_doit() - ip: ignore dst hint for multipath routes to ensure packets are hashed across the nexthops - phy: micrel: - correct bit assignments for cable test errata - disable EEE according to the KSZ9477 errata Misc: - docs/bpf: document compile-once-run-everywhere (CO-RE) relocations - Revert "net: macsec: preserve ingress frame ordering", it appears to have been developed against an older kernel, problem doesn't exist upstream" * tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits) net: enetc: distinguish error from valid pointers in enetc_fixup_clear_rss_rfs() Revert "net: team: do not use dynamic lockdep key" net: hns3: remove GSO partial feature bit net: hns3: fix the port information display when sfp is absent net: hns3: fix invalid mutex between tc qdisc and dcb ets command issue net: hns3: fix debugfs concurrency issue between kfree buffer and read net: hns3: fix byte order conversion issue in hclge_dbg_fd_tcam_read() net: hns3: Support query tx timeout threshold by debugfs net: hns3: fix tx timeout issue net: phy: Provide Module 4 KSZ9477 errata (DS80000754C) netfilter: nf_tables: Unbreak audit log reset netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID netfilter: nfnetlink_osf: avoid OOB read netfilter: nftables: exthdr: fix 4-byte stack OOB write selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_alloc bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc bpf: bpf_sk_storage: Fix invalid wait context lockdep report s390/bpf: Pass through tail call counter in trampolines ... |
||
Eric Dumazet
|
9b271ebaf9 |
ip_tunnels: use DEV_STATS_INC()
syzbot/KCSAN reported data-races in iptunnel_xmit_stats() [1]
This can run from multiple cpus without mutual exclusion.
Adopt SMP safe DEV_STATS_INC() to update dev->stats fields.
[1]
BUG: KCSAN: data-race in iptunnel_xmit / iptunnel_xmit
read-write to 0xffff8881353df170 of 8 bytes by task 30263 on cpu 1:
iptunnel_xmit_stats include/net/ip_tunnels.h:493 [inline]
iptunnel_xmit+0x432/0x4a0 net/ipv4/ip_tunnel_core.c:87
ip_tunnel_xmit+0x1477/0x1750 net/ipv4/ip_tunnel.c:831
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:662
__netdev_start_xmit include/linux/netdevice.h:4889 [inline]
netdev_start_xmit include/linux/netdevice.h:4903 [inline]
xmit_one net/core/dev.c:3544 [inline]
dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3560
__dev_queue_xmit+0xeee/0x1de0 net/core/dev.c:4340
dev_queue_xmit include/linux/netdevice.h:3082 [inline]
__bpf_tx_skb net/core/filter.c:2129 [inline]
__bpf_redirect_no_mac net/core/filter.c:2159 [inline]
__bpf_redirect+0x723/0x9c0 net/core/filter.c:2182
____bpf_clone_redirect net/core/filter.c:2453 [inline]
bpf_clone_redirect+0x16c/0x1d0 net/core/filter.c:2425
___bpf_prog_run+0xd7d/0x41e0 kernel/bpf/core.c:1954
__bpf_prog_run512+0x74/0xa0 kernel/bpf/core.c:2195
bpf_dispatcher_nop_func include/linux/bpf.h:1181 [inline]
__bpf_prog_run include/linux/filter.h:609 [inline]
bpf_prog_run include/linux/filter.h:616 [inline]
bpf_test_run+0x15d/0x3d0 net/bpf/test_run.c:423
bpf_prog_test_run_skb+0x77b/0xa00 net/bpf/test_run.c:1045
bpf_prog_test_run+0x265/0x3d0 kernel/bpf/syscall.c:3996
__sys_bpf+0x3af/0x780 kernel/bpf/syscall.c:5353
__do_sys_bpf kernel/bpf/syscall.c:5439 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5437 [inline]
__x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5437
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read-write to 0xffff8881353df170 of 8 bytes by task 30249 on cpu 0:
iptunnel_xmit_stats include/net/ip_tunnels.h:493 [inline]
iptunnel_xmit+0x432/0x4a0 net/ipv4/ip_tunnel_core.c:87
ip_tunnel_xmit+0x1477/0x1750 net/ipv4/ip_tunnel.c:831
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:662
__netdev_start_xmit include/linux/netdevice.h:4889 [inline]
netdev_start_xmit include/linux/netdevice.h:4903 [inline]
xmit_one net/core/dev.c:3544 [inline]
dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3560
__dev_queue_xmit+0xeee/0x1de0 net/core/dev.c:4340
dev_queue_xmit include/linux/netdevice.h:3082 [inline]
__bpf_tx_skb net/core/filter.c:2129 [inline]
__bpf_redirect_no_mac net/core/filter.c:2159 [inline]
__bpf_redirect+0x723/0x9c0 net/core/filter.c:2182
____bpf_clone_redirect net/core/filter.c:2453 [inline]
bpf_clone_redirect+0x16c/0x1d0 net/core/filter.c:2425
___bpf_prog_run+0xd7d/0x41e0 kernel/bpf/core.c:1954
__bpf_prog_run512+0x74/0xa0 kernel/bpf/core.c:2195
bpf_dispatcher_nop_func include/linux/bpf.h:1181 [inline]
__bpf_prog_run include/linux/filter.h:609 [inline]
bpf_prog_run include/linux/filter.h:616 [inline]
bpf_test_run+0x15d/0x3d0 net/bpf/test_run.c:423
bpf_prog_test_run_skb+0x77b/0xa00 net/bpf/test_run.c:1045
bpf_prog_test_run+0x265/0x3d0 kernel/bpf/syscall.c:3996
__sys_bpf+0x3af/0x780 kernel/bpf/syscall.c:5353
__do_sys_bpf kernel/bpf/syscall.c:5439 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5437 [inline]
__x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5437
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x0000000000018830 -> 0x0000000000018831
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 30249 Comm: syz-executor.4 Not tainted 6.5.0-syzkaller-11704-g3f86ed6ec0b3 #0
Fixes:
|
||
Kuniyuki Iwashima
|
718e6b5129 |
af_unix: Fix msg_controllen test in scm_pidfd_recv() for MSG_CMSG_COMPAT.
Heiko Carstens reported that SCM_PIDFD does not work with MSG_CMSG_COMPAT
because scm_pidfd_recv() always checks msg_controllen against sizeof(struct
cmsghdr).
We need to use sizeof(struct compat_cmsghdr) for the compat case.
Fixes:
|
||
Sriram Yagnaraman
|
6ac66cb03a |
ipv4: ignore dst hint for multipath routes
Route hints when the nexthop is part of a multipath group causes packets
in the same receive batch to be sent to the same nexthop irrespective of
the multipath hash of the packet. So, do not extract route hint for
packets whose destination is part of a multipath group.
A new SKB flag IPSKB_MULTIPATH is introduced for this purpose, set the
flag when route is looked up in ip_mkroute_input() and use it in
ip_extract_route_hint() to check for the existence of the flag.
Fixes:
|
||
Eric Dumazet
|
e3390b30a5 |
net: annotate data-races around sk->sk_tsflags
sk->sk_tsflags can be read locklessly, add corresponding annotations.
Fixes:
|
||
Eric Dumazet
|
5e6300e7b3 |
net: annotate data-races around sk->sk_forward_alloc
Every time sk->sk_forward_alloc is read locklessly,
add a READ_ONCE().
Add sk_forward_alloc_add() helper to centralize updates,
to reduce number of WRITE_ONCE().
Fixes:
|
||
Florian Westphal
|
8aae7625ff |
net: fib: avoid warn splat in flow dissector
New skbs allocated via nf_send_reset() have skb->dev == NULL.
fib*_rules_early_flow_dissect helpers already have a 'struct net'
argument but its not passed down to the flow dissector core, which
will then WARN as it can't derive a net namespace to use:
WARNING: CPU: 0 PID: 0 at net/core/flow_dissector.c:1016 __skb_flow_dissect+0xa91/0x1cd0
[..]
ip_route_me_harder+0x143/0x330
nf_send_reset+0x17c/0x2d0 [nf_reject_ipv4]
nft_reject_inet_eval+0xa9/0xf2 [nft_reject_inet]
nft_do_chain+0x198/0x5d0 [nf_tables]
nft_do_chain_inet+0xa4/0x110 [nf_tables]
nf_hook_slow+0x41/0xc0
ip_local_deliver+0xce/0x110
..
Cc: Stanislav Fomichev <sdf@google.com>
Cc: David Ahern <dsahern@kernel.org>
Cc: Ido Schimmel <idosch@nvidia.com>
Fixes:
|
||
Linus Torvalds
|
adfd671676 |
sysctl-6.6-rc1
Long ago we set out to remove the kitchen sink on kernel/sysctl.c arrays and placings sysctls to their own sybsystem or file to help avoid merge conflicts. Matthew Wilcox pointed out though that if we're going to do that we might as well also *save* space while at it and try to remove the extra last sysctl entry added at the end of each array, a sentintel, instead of bloating the kernel by adding a new sentinel with each array moved. Doing that was not so trivial, and has required slowing down the moves of kernel/sysctl.c arrays and measuring the impact on size by each new move. The complex part of the effort to help reduce the size of each sysctl is being done by the patient work of el señor Don Joel Granados. A lot of this is truly painful code refactoring and testing and then trying to measure the savings of each move and removing the sentinels. Although Joel already has code which does most of this work, experience with sysctl moves in the past shows is we need to be careful due to the slew of odd build failures that are possible due to the amount of random Kconfig options sysctls use. To that end Joel's work is split by first addressing the major housekeeping needed to remove the sentinels, which is part of this merge request. The rest of the work to actually remove the sentinels will be done later in future kernel releases. At first I was only going to send his first 7 patches of his patch series, posted 1 month ago, but in retrospect due to the testing the changes have received in linux-next and the minor changes they make this goes with the entire set of patches Joel had planned: just sysctl house keeping. There are networking changes but these are part of the house keeping too. The preliminary math is showing this will all help reduce the overall build time size of the kernel and run time memory consumed by the kernel by about ~64 bytes per array where we are able to remove each sentinel in the future. That also means there is no more bloating the kernel with the extra ~64 bytes per array moved as no new sentinels are created. Most of this has been in linux-next for about a month, the last 7 patches took a minor refresh 2 week ago based on feedback. -----BEGIN PGP SIGNATURE----- iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmTuVnMSHG1jZ3JvZkBr ZXJuZWwub3JnAAoJEM4jHQowkoinIckP/imvRlfkO6L0IP7MmJBRPtwY01rsTAKO Q14dZ//bG4DVQeGl1FdzrF6hhuLgekU0qW1YDFIWiCXO7CbaxaNBPSUkeW6ReVoC R/VHNUPxSR1PWQy1OTJV2t4XKri2sB7ijmUsfsATtISwhei9bggTHEysShtP4tv+ U87DzhoqMnbYIsfMo49KCqOa1Qm7TmjC1a7WAp6Fph3GJuXAzZR5pXpsd0NtOZ9x Ud5RT22icnQpMl7K+yPsqY6XcS5JkgBe/WbSzMAUkYZvBZFBq9t2D+OW5h9TZMhw piJWQ9X0Rm7qI2D15mJfXwaOhhyDhWci391hzdJmS6DI0prf6Ma2NFdAWOt/zomI uiRujS4bGeBUaK5F4TX2WQ1+jdMtAZ+0FncFnzt4U8q7dzUc91uVCm6iHW3gcfAb N7OEg2ZL0gkkgCZHqKxN8wpNQiC2KwnNk+HLAbnL2a/oJYfBtdopQmlxWfrN2hpF xxROiENqk483BRdMXDq6DR/gyDZmZWCobXIglSzlqCOjCOcLbDziIJ7pJk83ok09 h/QnXTYHf9protBq9OIQesgh2pwNzBBLifK84KZLKcb7IbdIKjpQrW5STp04oNGf wcGJzEz8tXUe0UKyMM47AcHQGzIy6cdXNLjyF8a+m7rnZzr1ndnMqZyRStZzuQin AUg2VWHKPmW9 =sq2p -----END PGP SIGNATURE----- Merge tag 'sysctl-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull sysctl updates from Luis Chamberlain: "Long ago we set out to remove the kitchen sink on kernel/sysctl.c arrays and placings sysctls to their own sybsystem or file to help avoid merge conflicts. Matthew Wilcox pointed out though that if we're going to do that we might as well also *save* space while at it and try to remove the extra last sysctl entry added at the end of each array, a sentintel, instead of bloating the kernel by adding a new sentinel with each array moved. Doing that was not so trivial, and has required slowing down the moves of kernel/sysctl.c arrays and measuring the impact on size by each new move. The complex part of the effort to help reduce the size of each sysctl is being done by the patient work of el señor Don Joel Granados. A lot of this is truly painful code refactoring and testing and then trying to measure the savings of each move and removing the sentinels. Although Joel already has code which does most of this work, experience with sysctl moves in the past shows is we need to be careful due to the slew of odd build failures that are possible due to the amount of random Kconfig options sysctls use. To that end Joel's work is split by first addressing the major housekeeping needed to remove the sentinels, which is part of this merge request. The rest of the work to actually remove the sentinels will be done later in future kernel releases. The preliminary math is showing this will all help reduce the overall build time size of the kernel and run time memory consumed by the kernel by about ~64 bytes per array where we are able to remove each sentinel in the future. That also means there is no more bloating the kernel with the extra ~64 bytes per array moved as no new sentinels are created" * tag 'sysctl-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: sysctl: Use ctl_table_size as stopping criteria for list macro sysctl: SIZE_MAX->ARRAY_SIZE in register_net_sysctl vrf: Update to register_net_sysctl_sz networking: Update to register_net_sysctl_sz netfilter: Update to register_net_sysctl_sz ax.25: Update to register_net_sysctl_sz sysctl: Add size to register_net_sysctl function sysctl: Add size arg to __register_sysctl_init sysctl: Add size to register_sysctl sysctl: Add a size arg to __register_sysctl_table sysctl: Add size argument to init_header sysctl: Add ctl_table_size to ctl_table_header sysctl: Use ctl_table_header in list_for_each_table_entry sysctl: Prefer ctl_table_header in proc_sysctl |
||
Linus Torvalds
|
b96a3e9142 |
- Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list")
- Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which reduces the special-case code for handling hugetlb pages in GUP. It also speeds up GUP handling of transparent hugepages. - Peng Zhang provides some maple tree speedups ("Optimize the fast path of mas_store()"). - Sergey Senozhatsky has improved te performance of zsmalloc during compaction (zsmalloc: small compaction improvements"). - Domenico Cerasuolo has developed additional selftest code for zswap ("selftests: cgroup: add zswap test program"). - xu xin has doe some work on KSM's handling of zero pages. These changes are mainly to enable the user to better understand the effectiveness of KSM's treatment of zero pages ("ksm: support tracking KSM-placed zero-pages"). - Jeff Xu has fixes the behaviour of memfd's MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED"). - David Howells has fixed an fscache optimization ("mm, netfs, fscache: Stop read optimisation when folio removed from pagecache"). - Axel Rasmussen has given userfaultfd the ability to simulate memory poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD"). - Miaohe Lin has contributed some routine maintenance work on the memory-failure code ("mm: memory-failure: remove unneeded PageHuge() check"). - Peng Zhang has contributed some maintenance work on the maple tree code ("Improve the validation for maple tree and some cleanup"). - Hugh Dickins has optimized the collapsing of shmem or file pages into THPs ("mm: free retracted page table by RCU"). - Jiaqi Yan has a patch series which permits us to use the healthy subpages within a hardware poisoned huge page for general purposes ("Improve hugetlbfs read on HWPOISON hugepages"). - Kemeng Shi has done some maintenance work on the pagetable-check code ("Remove unused parameters in page_table_check"). - More folioification work from Matthew Wilcox ("More filesystem folio conversions for 6.6"), ("Followup folio conversions for zswap"). And from ZhangPeng ("Convert several functions in page_io.c to use a folio"). - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext"). - Baoquan He has converted some architectures to use the GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take GENERIC_IOREMAP way"). - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support batched/deferred tlb shootdown during page reclamation/migration"). - Better maple tree lockdep checking from Liam Howlett ("More strict maple tree lockdep"). Liam also developed some efficiency improvements ("Reduce preallocations for maple tree"). - Cleanup and optimization to the secondary IOMMU TLB invalidation, from Alistair Popple ("Invalidate secondary IOMMU TLB on permission upgrade"). - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes for arm64"). - Kemeng Shi provides some maintenance work on the compaction code ("Two minor cleanups for compaction"). - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most file-backed faults under the VMA lock"). - Aneesh Kumar contributes code to use the vmemmap optimization for DAX on ppc64, under some circumstances ("Add support for DAX vmemmap optimization for ppc64"). - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client data in page_ext"), ("minor cleanups to page_ext header"). - Some zswap cleanups from Johannes Weiner ("mm: zswap: three cleanups"). - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan"). - VMA handling cleanups from Kefeng Wang ("mm: convert to vma_is_initial_heap/stack()"). - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes: implement DAMOS tried total bytes file"), ("Extend DAMOS filters for address ranges and DAMON monitoring targets"). - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction"). - Liam Howlett has improved the maple tree node replacement code ("maple_tree: Change replacement strategy"). - ZhangPeng has a general code cleanup - use the K() macro more widely ("cleanup with helper macro K()"). - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap on memory feature on ppc64"). - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list in page_alloc"), ("Two minor cleanups for get pageblock migratetype"). - Vishal Moola introduces a memory descriptor for page table tracking, "struct ptdesc" ("Split ptdesc from struct page"). - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups for vm.memfd_noexec"). - MM include file rationalization from Hugh Dickins ("arch: include asm/cacheflush.h in asm/hugetlb.h"). - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text output"). - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use object_cache instead of kmemleak_initialized"). - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor and _folio_order"). - A VMA locking scalability improvement from Suren Baghdasaryan ("Per-VMA lock support for swap and userfaults"). - pagetable handling cleanups from Matthew Wilcox ("New page table range API"). - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop using page->private on tail pages for THP_SWAP + cleanups"). - Cleanups and speedups to the hugetlb fault handling from Matthew Wilcox ("Change calling convention for ->huge_fault"). - Matthew Wilcox has also done some maintenance work on the MM subsystem documentation ("Improve mm documentation"). -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZO1JUQAKCRDdBJ7gKXxA jrMwAP47r/fS8vAVT3zp/7fXmxaJYTK27CTAM881Gw1SDhFM/wEAv8o84mDenCg6 Nfio7afS1ncD+hPYT8947UnLxTgn+ww= =Afws -----END PGP SIGNATURE----- Merge tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") - Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which reduces the special-case code for handling hugetlb pages in GUP. It also speeds up GUP handling of transparent hugepages. - Peng Zhang provides some maple tree speedups ("Optimize the fast path of mas_store()"). - Sergey Senozhatsky has improved te performance of zsmalloc during compaction (zsmalloc: small compaction improvements"). - Domenico Cerasuolo has developed additional selftest code for zswap ("selftests: cgroup: add zswap test program"). - xu xin has doe some work on KSM's handling of zero pages. These changes are mainly to enable the user to better understand the effectiveness of KSM's treatment of zero pages ("ksm: support tracking KSM-placed zero-pages"). - Jeff Xu has fixes the behaviour of memfd's MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED"). - David Howells has fixed an fscache optimization ("mm, netfs, fscache: Stop read optimisation when folio removed from pagecache"). - Axel Rasmussen has given userfaultfd the ability to simulate memory poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD"). - Miaohe Lin has contributed some routine maintenance work on the memory-failure code ("mm: memory-failure: remove unneeded PageHuge() check"). - Peng Zhang has contributed some maintenance work on the maple tree code ("Improve the validation for maple tree and some cleanup"). - Hugh Dickins has optimized the collapsing of shmem or file pages into THPs ("mm: free retracted page table by RCU"). - Jiaqi Yan has a patch series which permits us to use the healthy subpages within a hardware poisoned huge page for general purposes ("Improve hugetlbfs read on HWPOISON hugepages"). - Kemeng Shi has done some maintenance work on the pagetable-check code ("Remove unused parameters in page_table_check"). - More folioification work from Matthew Wilcox ("More filesystem folio conversions for 6.6"), ("Followup folio conversions for zswap"). And from ZhangPeng ("Convert several functions in page_io.c to use a folio"). - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext"). - Baoquan He has converted some architectures to use the GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take GENERIC_IOREMAP way"). - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support batched/deferred tlb shootdown during page reclamation/migration"). - Better maple tree lockdep checking from Liam Howlett ("More strict maple tree lockdep"). Liam also developed some efficiency improvements ("Reduce preallocations for maple tree"). - Cleanup and optimization to the secondary IOMMU TLB invalidation, from Alistair Popple ("Invalidate secondary IOMMU TLB on permission upgrade"). - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes for arm64"). - Kemeng Shi provides some maintenance work on the compaction code ("Two minor cleanups for compaction"). - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most file-backed faults under the VMA lock"). - Aneesh Kumar contributes code to use the vmemmap optimization for DAX on ppc64, under some circumstances ("Add support for DAX vmemmap optimization for ppc64"). - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client data in page_ext"), ("minor cleanups to page_ext header"). - Some zswap cleanups from Johannes Weiner ("mm: zswap: three cleanups"). - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan"). - VMA handling cleanups from Kefeng Wang ("mm: convert to vma_is_initial_heap/stack()"). - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes: implement DAMOS tried total bytes file"), ("Extend DAMOS filters for address ranges and DAMON monitoring targets"). - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction"). - Liam Howlett has improved the maple tree node replacement code ("maple_tree: Change replacement strategy"). - ZhangPeng has a general code cleanup - use the K() macro more widely ("cleanup with helper macro K()"). - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap on memory feature on ppc64"). - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list in page_alloc"), ("Two minor cleanups for get pageblock migratetype"). - Vishal Moola introduces a memory descriptor for page table tracking, "struct ptdesc" ("Split ptdesc from struct page"). - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups for vm.memfd_noexec"). - MM include file rationalization from Hugh Dickins ("arch: include asm/cacheflush.h in asm/hugetlb.h"). - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text output"). - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use object_cache instead of kmemleak_initialized"). - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor and _folio_order"). - A VMA locking scalability improvement from Suren Baghdasaryan ("Per-VMA lock support for swap and userfaults"). - pagetable handling cleanups from Matthew Wilcox ("New page table range API"). - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop using page->private on tail pages for THP_SWAP + cleanups"). - Cleanups and speedups to the hugetlb fault handling from Matthew Wilcox ("Change calling convention for ->huge_fault"). - Matthew Wilcox has also done some maintenance work on the MM subsystem documentation ("Improve mm documentation"). * tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits) maple_tree: shrink struct maple_tree maple_tree: clean up mas_wr_append() secretmem: convert page_is_secretmem() to folio_is_secretmem() nios2: fix flush_dcache_page() for usage from irq context hugetlb: add documentation for vma_kernel_pagesize() mm: add orphaned kernel-doc to the rst files. mm: fix clean_record_shared_mapping_range kernel-doc mm: fix get_mctgt_type() kernel-doc mm: fix kernel-doc warning from tlb_flush_rmaps() mm: remove enum page_entry_size mm: allow ->huge_fault() to be called without the mmap_lock held mm: move PMD_ORDER to pgtable.h mm: remove checks for pte_index memcg: remove duplication detection for mem_cgroup_uncharge_swap mm/huge_memory: work on folio->swap instead of page->private when splitting folio mm/swap: inline folio_set_swap_entry() and folio_swap_entry() mm/swap: use dedicated entry for swap in folio mm/swap: stop using page->private on tail pages for THP_SWAP selftests/mm: fix WARNING comparing pointer to 0 selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check ... |
||
Sabrina Dubroca
|
fd0fc6fdd8 |
tls: move tls_cipher_size_desc to net/tls/tls.h
It's only used in net/tls/*, no need to bloat include/net/tls.h. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/dd9fad80415e5b3575b41f56b331871038362eab.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Dima Chumak
|
390a24cbc3 |
devlink: Expose port function commands to control IPsec packet offloads
Expose port function commands to enable / disable IPsec packet offloads, this is used to control the port IPsec capabilities. When IPsec packet is disabled for a function of the port (default), function cannot offload IPsec packet operations (encapsulation and XFRM policy offload). When enabled, IPsec packet operations can be offloaded by the function of the port, which includes crypto operation (Encrypt/Decrypt), IPsec encapsulation and XFRM state and policy offload. Example of a PCI VF port which supports IPsec packet offloads: $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_packet disable $ devlink port function set pci/0000:06:00.0/1 ipsec_packet enable $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_packet enable Signed-off-by: Dima Chumak <dchumak@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-3-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Dima Chumak
|
62b6442c58 |
devlink: Expose port function commands to control IPsec crypto offloads
Expose port function commands to enable / disable IPsec crypto offloads, this is used to control the port IPsec capabilities. When IPsec crypto is disabled for a function of the port (default), function cannot offload any IPsec crypto operations (Encrypt/Decrypt and XFRM state offloading). When enabled, IPsec crypto operations can be offloaded by the function of the port. Example of a PCI VF port which supports IPsec crypto offloads: $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto disable $ devlink port function set pci/0000:06:00.0/1 ipsec_crypto enable $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable ipsec_crypto enable Signed-off-by: Dima Chumak <dchumak@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-2-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Jakub Kicinski
|
bebfbf07c7 |
bpf-next-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZOjkTAAKCRDbK58LschI gx32AP9gaaHFBtOYBfoenKTJfMgv1WhtQHIBas+WN9ItmBx9MAEA4gm/VyQ6oD7O EBjJKJQ2CZ/QKw7cNacXw+l5jF7/+Q0= =8P7g -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-08-25 We've added 87 non-merge commits during the last 8 day(s) which contain a total of 104 files changed, 3719 insertions(+), 4212 deletions(-). The main changes are: 1) Add multi uprobe BPF links for attaching multiple uprobes and usdt probes, which is significantly faster and saves extra fds, from Jiri Olsa. 2) Add support BPF cpu v4 instructions for arm64 JIT compiler, from Xu Kuohai. 3) Add support BPF cpu v4 instructions for riscv64 JIT compiler, from Pu Lehui. 4) Fix LWT BPF xmit hooks wrt their return values where propagating the result from skb_do_redirect() would trigger a use-after-free, from Yan Zhai. 5) Fix a BPF verifier issue related to bpf_kptr_xchg() with local kptr where the map's value kptr type and locally allocated obj type mismatch, from Yonghong Song. 6) Fix BPF verifier's check_func_arg_reg_off() function wrt graph root/node which bypassed reg->off == 0 enforcement, from Kumar Kartikeya Dwivedi. 7) Lift BPF verifier restriction in networking BPF programs to treat comparison of packet pointers not as a pointer leak, from Yafang Shao. 8) Remove unmaintained XDP BPF samples as they are maintained in xdp-tools repository out of tree, from Toke Høiland-Jørgensen. 9) Batch of fixes for the tracing programs from BPF samples in order to make them more libbpf-aware, from Daniel T. Lee. 10) Fix a libbpf signedness determination bug in the CO-RE relocation handling logic, from Andrii Nakryiko. 11) Extend libbpf to support CO-RE kfunc relocations. Also follow-up fixes for bpf_refcount shared ownership implementation, both from Dave Marchevsky. 12) Add a new bpf_object__unpin() API function to libbpf, from Daniel Xu. 13) Fix a memory leak in libbpf to also free btf_vmlinux when the bpf_object gets closed, from Hao Luo. 14) Small error output improvements to test_bpf module, from Helge Deller. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (87 commits) selftests/bpf: Add tests for rbtree API interaction in sleepable progs bpf: Allow bpf_spin_{lock,unlock} in sleepable progs bpf: Consider non-owning refs to refcounted nodes RCU protected bpf: Reenable bpf_refcount_acquire bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes bpf: Consider non-owning refs trusted bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire selftests/bpf: Enable cpu v4 tests for RV64 riscv, bpf: Support unconditional bswap insn riscv, bpf: Support signed div/mod insns riscv, bpf: Support 32-bit offset jmp insn riscv, bpf: Support sign-extension mov insns riscv, bpf: Support sign-extension load insns riscv, bpf: Fix missing exception handling and redundant zext for LDX_B/H/W samples/bpf: Add note to README about the XDP utilities moved to xdp-tools samples/bpf: Cleanup .gitignore samples/bpf: Remove the xdp_sample_pkts utility samples/bpf: Remove the xdp1 and xdp2 utilities samples/bpf: Remove the xdp_rxq_info utility samples/bpf: Remove the xdp_redirect* utilities ... ==================== Link: https://lore.kernel.org/r/20230825194319.12727-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Jakub Kicinski
|
1fa6ffad12 |
wireless-next patches for v6.6
The second pull request for v6.6, this time with both stack and driver changes. Unusually we have only one major new feature but lots of small cleanup all over, I guess this is due to people have been on vacation the last month. Major changes: rtw89 * Introduce Time Averaged SAR (TAS) support -----BEGIN PGP SIGNATURE----- iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmToqosRHGt2YWxvQGtl cm5lbC5vcmcACgkQbhckVSbrbZv9XQf9HDq9smbuWLvwzNjbbS31hHFLmnfhN8Zp +Zzn47gpMCle9ahGLQyw8lcfNPWCMyqOu4sGQ6hyyuH+YXoxZryuq9QDwWo9L/b1 5Cpm4IaBYBMm0ZoOkWw2lQSzGyNrXgvCEKRVC+pYQMvr5V2aEWxT/kT4guiou9D5 OXPRFN2iqZP0Q3TKcfKWRnWn3S0Ok3kZCFuXcWkL0sgwjqP/wbAPO1XNI1IImKNM xUd0zT4vK/layYq7i20y8blglI5kcp/aKCFEwYpQC2WPeZ3Wtl1G9PQ8eze5Gc2Q NTw3xfr6tENIcAmYoLdBdKbUq6e6pwLwXlojlZ2beR6s7LHM30AinQ== =2Hja -----END PGP SIGNATURE----- Merge tag 'wireless-next-2023-08-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.6 The second pull request for v6.6, this time with both stack and driver changes. Unusually we have only one major new feature but lots of small cleanup all over, I guess this is due to people have been on vacation the last month. Major changes: rtw89 - Introduce Time Averaged SAR (TAS) support * tag 'wireless-next-2023-08-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (114 commits) wifi: rtlwifi: rtl8723: Remove unused function rtl8723_cmd_send_packet() wifi: rtw88: usb: kill and free rx urbs on probe failure wifi: rtw89: Fix clang -Wimplicit-fallthrough in rtw89_query_sar() wifi: rtw89: phy: modify register setting of ENV_MNTR, PHYSTS and DIG wifi: rtw89: phy: add phy_gen_def::cr_base to support WiFi 7 chips wifi: rtw89: mac: define register address of rx_filter to generalize code wifi: rtw89: mac: define internal memory address for WiFi 7 chip wifi: rtw89: mac: generalize code to indirectly access WiFi internal memory wifi: rtw89: mac: add mac_gen_def::band1_offset to map MAC band1 register address wifi: wlcore: sdio: Use module_sdio_driver macro to simplify the code wifi: rtw89: initialize multi-channel handling wifi: rtw89: provide functions to configure NoA for beacon update wifi: rtw89: call rtw89_chan_get() by vif chanctx if aware of vif wifi: rtw89: sar: let caller decide the center frequency to query wifi: rtw89: refine rtw89_correct_cck_chan() by rtw89_hw_to_nl80211_band() wifi: rtw89: add function prototype for coex request duration Fix nomenclature for USB and PCI wireless devices wifi: ath: Use is_multicast_ether_addr() to check multicast Ether address wifi: ath12k: Remove unused declarations wifi: ath12k: add check max message length while scanning with extraie ... ==================== Link: https://lore.kernel.org/r/20230825132230.A0833C433C8@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Luiz Augusto von Dentz
|
253f3399f4 |
Bluetooth: HCI: Introduce HCI_QUIRK_BROKEN_LE_CODED
This introduces HCI_QUIRK_BROKEN_LE_CODED which is used to indicate
that LE Coded PHY shall not be used, it is then set for some Intel
models that claim to support it but when used causes many problems.
Cc: stable@vger.kernel.org # 6.4.y+
Link: https://github.com/bluez/bluez/issues/577
Link: https://github.com/bluez/bluez/issues/582
Link: https://lore.kernel.org/linux-bluetooth/CABBYNZKco-v7wkjHHexxQbgwwSz-S=GZ=dZKbRE1qxT1h4fFbQ@mail.gmail.com/T/#
Fixes:
|
||
Claudia Draghicescu
|
9c0826310b |
Bluetooth: ISO: Add support for periodic adv reports processing
In the case of a Periodic Synchronized Receiver, the PA report received from a Broadcaster contains the BASE, which has information about codec and other parameters of a BIG. This isnformation is stored and the application can retrieve it using getsockopt(BT_ISO_BASE). Signed-off-by: Claudia Draghicescu <claudia.rosu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> |
||
Luiz Augusto von Dentz
|
db08722fc7 |
Bluetooth: hci_core: Fix missing instances using HCI_MAX_AD_LENGTH
There a few instances still using HCI_MAX_AD_LENGTH instead of using
max_adv_len which takes care of detecting what is the actual maximum
length depending on if the controller supports EA or not.
Fixes:
|
||
Iulia Tanasescu
|
fbdc4bc472 |
Bluetooth: ISO: Use defer setup to separate PA sync and BIG sync
This commit implements defer setup support for the Broadcast Sink scenario: By setting defer setup on a broadcast socket before calling listen, the user is able to trigger the PA sync and BIG sync procedures separately. This is useful if the user first wants to synchronize to the periodic advertising transmitted by a Broadcast Source, and trigger the BIG sync procedure later on. If defer setup is set, once a PA sync established event arrives, a new hcon is created and notified to the ISO layer. A child socket associated with the PA sync connection will be added to the accept queue of the listening socket. Once the accept call returns the fd for the PA sync child socket, the user should call read on that fd. This will trigger the BIG create sync procedure, and the PA sync socket will become a listening socket itself. When the BIG sync established event is notified to the ISO layer, the bis connections will be added to the accept queue of the PA sync parent. The user should call accept on the PA sync socket to get the final bis connections. Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> |
||
Jakub Kicinski
|
3c5066c6b0 |
Merge branch 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Leon Romanovsky says: ==================== mlx5 MACsec RoCEv2 support From Patrisious: This series extends previously added MACsec offload support to cover RoCE traffic either. In order to achieve that, we need configure MACsec with offload between the two endpoints, like below: REMOTE_MAC=10:70:fd:43:71:c0 * ip addr add 1.1.1.1/16 dev eth2 * ip link set dev eth2 up * ip link add link eth2 macsec0 type macsec encrypt on * ip macsec offload macsec0 mac * ip macsec add macsec0 tx sa 0 pn 1 on key 00 dffafc8d7b9a43d5b9a3dfbbf6a30c16 * ip macsec add macsec0 rx port 1 address $REMOTE_MAC * ip macsec add macsec0 rx port 1 address $REMOTE_MAC sa 0 pn 1 on key 01 ead3664f508eb06c40ac7104cdae4ce5 * ip addr add 10.1.0.1/16 dev macsec0 * ip link set dev macsec0 up And in a similar manner on the other machine, while noting the keys order would be reversed and the MAC address of the other machine. RDMA traffic is separated through relevant GID entries and in case of IP ambiguity issue - meaning we have a physical GIDs and a MACsec GIDs with the same IP/GID, we disable our physical GID in order to force the user to only use the MACsec GID. v0: https://lore.kernel.org/netdev/20230813064703.574082-1-leon@kernel.org/ * 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: RDMA/mlx5: Handles RoCE MACsec steering rules addition and deletion net/mlx5: Add RoCE MACsec steering infrastructure in core net/mlx5: Configure MACsec steering for ingress RoCEv2 traffic net/mlx5: Configure MACsec steering for egress RoCEv2 traffic IB/core: Reorder GID delete code for RoCE net/mlx5: Add MACsec priorities in RDMA namespaces RDMA/mlx5: Implement MACsec gid addition and deletion net/mlx5: Maintain fs_id xarray per MACsec device inside macsec steering net/mlx5: Remove netdevice from MACsec steering net/mlx5e: Move MACsec flow steering and statistics database from ethernet to core net/mlx5e: Rename MACsec flow steering functions/parameters to suit core naming style net/mlx5: Remove dependency of macsec flow steering on ethernet net/mlx5e: Move MACsec flow steering operations to be used as core library macsec: add functions to get macsec real netdevice and check offload ==================== Link: https://lore.kernel.org/r/20230821073833.59042-1-leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Jakub Kicinski
|
57ce6427e0 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR. Conflicts: include/net/inet_sock.h |
||
Paolo Abeni
|
8938fc0c7e |
netfilter pull request 2023-08-23
-----BEGIN PGP SIGNATURE----- iQJBBAABCAArFiEEgKkgxbID4Gn1hq6fcJGo2a1f9gAFAmTmI1cNHGZ3QHN0cmxl bi5kZQAKCRBwkajZrV/2AKBEEACACRkBNJ38IZoNhRdDWWVpoGiBL08BBZ/9Fdhh Cc/iZ0d/XWcAS8qmPlABk82rwZ7EwW0l+9VGai4easY37S6SC0qLKZQYScZj5Fpl hUMRiEn/Hd1fYjgGPCPG7dCFHYmh0JzXDFDDrBE9eRJmo7JdU/M9amLxYa2q1La7 vvC6f9MO7+zUeCl5KLOpCBl3/kLDadHSA0FBaPIWP3K+Pd1wR2QJpNoy8U7XzZJP 0+oS6kqqaOhAKImCzct2de1xfY4djnMzYYxAqxAUdd60/2dLiT+NJK03LA+FMKFX 7bZY/CnoqWZzXbWcMAC/fg7nbj7zSS1HIgOft3zbj1sGZrhZmINC3hTjiIeSwyZV /n0fbV3IQaGCWx3dAGUQpuuCk3FwpIsw4NyRM8v43mnbFeaon/dBtMycXsWP+xiH VMc0j+BJl5zWNynZVTF1PYuNwkX9uubhDVrgtkqZZD+9RzE8i6DiRf7deOBLsI3N XlJpuc34hgGKe3s+Wn1FOY7jMO4FG6OEjB67t0tpjgAxg4mnuxGncXPV+dbTDq9k fgwntbo5RAL9R4itb2Qfy0cg4NiFF1Nqjyzxo+bBMMByst1hlsrAX/V7LInKF9Hi VI4X8YRdV2b8cQVFpqBigJS/k7wRUH7pdgd7YA6QSDVrBSp5mLf49+L7gaGOTJ6i hag4pg== =EVaB -----END PGP SIGNATURE----- Merge tag 'nf-23-08-23' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter updates for net This PR contains nf_tables updates for your *net* tree. First patch fixes table validation, I broke this in 6.4 when tracking validation state per table, reported by Pablo, fixup from myself. Second patch makes sure objects waiting for memory release have been released, this was broken in 6.1, patch from Pablo Neira Ayuso. Patch three is a fix-for-fix from previous PR: In case a transaction gets aborted, gc sequence counter needs to be incremented so pending gc requests are invalidated, from Pablo. Same for patch 4: gc list needs to use gc list lock, not destroy lock, also from Pablo. Patch 5 fixes a UaF in a set backend, but this should only occur when failslab is enabled for GFP_KERNEL allocations, broken since feature was added in 5.6, from myself. Patch 6 fixes a double-free bug that was also added via previous PR: We must not schedule gc work if the previous batch is still queued. netfilter pull request 2023-08-23 * tag 'nf-23-08-23' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: defer gc run if previous batch is still pending netfilter: nf_tables: fix out of memory error handling netfilter: nf_tables: use correct lock to protect gc_list netfilter: nf_tables: GC transaction race with abort path netfilter: nf_tables: flush pending destroy work before netlink notifier netfilter: nf_tables: validate all pending tables ==================== Link: https://lore.kernel.org/r/20230823152711.15279-1-fw@strlen.de Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
Hangbin Liu
|
e74216b8de |
bonding: fix macvlan over alb bond support
The commit |
||
Florian Westphal
|
8e51830e29 |
netfilter: nf_tables: defer gc run if previous batch is still pending
Don't queue more gc work, else we may queue the same elements multiple
times.
If an element is flagged as dead, this can mean that either the previous
gc request was invalidated/discarded by a transaction or that the previous
request is still pending in the system work queue.
The latter will happen if the gc interval is set to a very low value,
e.g. 1ms, and system work queue is backlogged.
The sets refcount is 1 if no previous gc requeusts are queued, so add
a helper for this and skip gc run if old requests are pending.
Add a helper for this and skip the gc run in this case.
Fixes:
|
||
Florian Westphal
|
4b80ced971 |
netfilter: nf_tables: validate all pending tables
We have to validate all tables in the transaction that are in
VALIDATE_DO state, the blamed commit below did not move the break
statement to its right location so we only validate one table.
Moreover, we can't init table->validate to _SKIP when a table object
is allocated.
If we do, then if a transcaction creates a new table and then
fails the transaction, nfnetlink will loop and nft will hang until
user cancels the command.
Add back the pernet state as a place to stash the last state encountered.
This is either _DO (we hit an error during commit validation) or _SKIP
(transaction passed all checks).
Fixes:
|
||
Geliang Tang
|
740ebe35bd |
mptcp: add struct mptcp_sched_ops
This patch defines struct mptcp_sched_ops, which has three struct members, name, owner and list, and four function pointers: init(), release() and get_subflow(). The scheduler function get_subflow() have a struct mptcp_sched_data parameter, which contains a reinject flag for retrans or not, a subflows number and a mptcp_subflow_context array. Add the scheduler registering, unregistering and finding functions to add, delete and find a packet scheduler on the global list mptcp_sched_list. Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230821-upstream-net-next-20230818-v1-3-0c860fb256a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Arnd Bergmann
|
a7a2ef0c4b |
mac80211: make ieee80211_tx_info padding explicit
While looking at a bug, I got rather confused by the layout of the 'status' field in ieee80211_tx_info. Apparently, the intention is that status_driver_data[] is used for driver specific data, and fills up the size of the union to 40 bytes, just like the other ones. This is indeed what actually happens, but only because of the combination of two mistakes: - "void *status_driver_data[18 / sizeof(void *)];" is intended to be 18 bytes long but is actually two bytes shorter because of rounding-down in the division, to a multiple of the pointer size (4 bytes or 8 bytes). - The other fields combined are intended to be 22 bytes long, but are actually 24 bytes because of padding in front of the unaligned tx_time member, and in front of the pointer array. The two mistakes cancel out. so the size ends up fine, but it seems more helpful to make this explicit, by having a multiple of 8 bytes in the size calculation and explicitly describing the padding. Fixes: |
||
Randy Dunlap
|
c6662a4b3e |
wifi: mac80211: fix kernel-doc notation warning
Add description for struct member 'agg' to prevent a kernel-doc
warning.
mac80211.h:2289: warning: Function parameter or member 'agg' not described in 'ieee80211_link_sta'
Fixes:
|
||
Randy Dunlap
|
266a5cd768 |
wifi: radiotap: fix kernel-doc notation warnings
Fix a typo (82011 -> 80211) to prevent a kernel-doc warning.
Add one missing function parameter description to prevent a
kernel-doc warning.
ieee80211_radiotap.h:52: warning: expecting prototype for struct ieee82011_radiotap_header. Prototype was for struct ieee80211_radiotap_header instead
ieee80211_radiotap.h:581: warning: Function parameter or member 'data' not described in 'ieee80211_get_radiotap_len'
Fixes:
|
||
Randy Dunlap
|
a49a0d4e57 |
wifi: cfg80211: remove dead/unused enum value
Drop an unused (extra) enum value to prevent a kernel-doc warning.
cfg80211.h:1492: warning: Excess enum value 'STATION_PARAM_APPLY_STA_TXPOWER' description in 'station_parameters_apply_mask'
Fixes:
|
||
Dmitry Antipov
|
1dcf396b42 |
wifi: cfg80211: improve documentation for flag fields
Fix and hopefully improve documentation for 'flag' fields of a few types by adding references to relevant enumerations. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Link: https://lore.kernel.org/r/20230713132957.275859-1-dmantipov@yandex.ru Signed-off-by: Johannes Berg <johannes.berg@intel.com> |
||
Yue Haibing
|
7bdfda42f0 |
wifi: wext: Remove unused declaration dev_get_wireless_info()
Commit |
||
Jakub Kicinski
|
1a8660546b |
Two fixes:
* reorder buffer filter checks can cause bad shift/UBSAN warning with newer HW, avoid the check (mac80211) * add Kconfig dependency for iwlwifi for PTP clock usage -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmTkrKMACgkQ10qiO8sP aAD6/A//bR8/98qCxJqeZurTw83JwWDo7J5Twthk4ik4cpi35s+7bqVYc34a1vIT poIiIxffZRUSsHOoMXpTP6xd7gLP9Hz6Ba8Jd7X9NG+/lfdpJlWGLDFri3JpNREi Jjqq8XOzB7+c+TwrK85j7nxY1JXLqLzXOxetAtuIZEqVrmC+T6+nnAw1ITLfCVL/ fVaAFx3mDxTLi/qbAYnYPBKY5Kq1KH0F3j389vfIVPzTXBmLcYppZ17Lz38io0pZ MC0NO3W0U4MFX3i5P2Q/yjlUsuwJJROqJmnAOcy9+McQU9nFHrIAV3Na9G1mZbyS xmdcbjokhcE95ht8JI9yHESAbtACaSM4jq4W03zlgvvj3TrVt+1bDjjXLexqb6kc YinRoVc3Dn+Nvw8DrDa/1PMuO1YAlVg3ZYXRyjlL4dcLbz0SOiEHFyTIjIKflZYZ TbstNDbygIxBBtmVx/aJBZZoFo3G6e5FGQrVZ0uvDPqVaaFvs3ESv+ooXek9gl06 OlzngnrTJO52Ky4quCTtmR16+J3GeAUjZNUsQKqNHu28zzBsF+CP8j+OCUTPQTe0 YZuqBqnSDPUcllBxXEDp2pPm102q0DvHeKXg7cugdY2zyzvjgrxBSBwh5xwUA4sT RMI55Gok6gcdzQ1GHHRzLK6UkKzgmAbTjMypJkVV6Qh7dG6RNJo= =edA4 -----END PGP SIGNATURE----- Merge tag 'wireless-2023-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Two fixes: - reorder buffer filter checks can cause bad shift/UBSAN warning with newer HW, avoid the check (mac80211) - add Kconfig dependency for iwlwifi for PTP clock usage * tag 'wireless-2023-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning wifi: iwlwifi: mvm: add dependency for PTP clock ==================== Link: https://lore.kernel.org/r/20230822124206.43926-2-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Zhengchao Shao
|
43c2817225 |
net: remove unnecessary input parameter 'how' in ifdown function
When the ifdown function in the dst_ops structure is referenced, the input parameter 'how' is always true. In the current implementation of the ifdown interface, ip6_dst_ifdown does not use the input parameter 'how', xfrm6_dst_ifdown and xfrm4_dst_ifdown functions use the input parameter 'unregister'. But false judgment on 'unregister' in xfrm6_dst_ifdown and xfrm4_dst_ifdown is false, so remove the input parameter 'how' in ifdown function. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230821084104.3812233-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
||
Kees Cook
|
545d3523df |
wifi: cfg80211: Annotate struct cfg80211_tid_config with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct cfg80211_tid_config. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Johannes Berg <johannes@sipsolutions.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20230817211531.4193219-7-keescook@chromium.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> |
||
Kees Cook
|
e3eac9f32e |
wifi: cfg80211: Annotate struct cfg80211_scan_request with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct cfg80211_scan_request. Additionally, since the element count member must be set before accessing the annotated flexible array member, move its initialization earlier. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Johannes Berg <johannes@sipsolutions.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20230817211531.4193219-6-keescook@chromium.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> |
||
Kees Cook
|
7b6d708703 |
wifi: cfg80211: Annotate struct cfg80211_rnr_elems with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct cfg80211_rnr_elems. Additionally, since the element count member must be set before accessing the annotated flexible array member, move its initialization earlier. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Johannes Berg <johannes@sipsolutions.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20230817211531.4193219-5-keescook@chromium.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> |
||
Kees Cook
|
342bc7c9e8 |
wifi: cfg80211: Annotate struct cfg80211_pmsr_request with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct cfg80211_pmsr_request. Additionally, since the element count member must be set before accessing the annotated flexible array member, move its initialization earlier. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Johannes Berg <johannes@sipsolutions.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20230817211531.4193219-4-keescook@chromium.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> |
||
Kees Cook
|
c14679d700 |
wifi: cfg80211: Annotate struct cfg80211_mbssid_elems with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct cfg80211_mbssid_elems. Additionally, since the element count member must be set before accessing the annotated flexible array member, move its initialization earlier. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Johannes Berg <johannes@sipsolutions.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20230817211531.4193219-3-keescook@chromium.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> |
||
Kees Cook
|
93ca82447c |
wifi: cfg80211: Annotate struct cfg80211_acl_data with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct cfg80211_acl_data. Additionally, since the element count member must be set before accessing the annotated flexible array member, move its initialization earlier. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Johannes Berg <johannes@sipsolutions.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20230817211531.4193219-1-keescook@chromium.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> |
||
Ping-Ke Shih
|
b98c16107c |
wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning
The commit
|
||
Eric Dumazet
|
f866fbc842 |
ipv4: fix data-races around inet->inet_id
UDP sendmsg() is lockless, so ip_select_ident_segs()
can very well be run from multiple cpus [1]
Convert inet->inet_id to an atomic_t, but implement
a dedicated path for TCP, avoiding cost of a locked
instruction (atomic_add_return())
Note that this patch will cause a trivial merge conflict
because we added inet->flags in net-next tree.
v2: added missing change in
drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
(David Ahern)
[1]
BUG: KCSAN: data-race in __ip_make_skb / __ip_make_skb
read-write to 0xffff888145af952a of 2 bytes by task 7803 on cpu 1:
ip_select_ident_segs include/net/ip.h:542 [inline]
ip_select_ident include/net/ip.h:556 [inline]
__ip_make_skb+0x844/0xc70 net/ipv4/ip_output.c:1446
ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560
udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260
inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg net/socket.c:748 [inline]
____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494
___sys_sendmsg net/socket.c:2548 [inline]
__sys_sendmmsg+0x269/0x500 net/socket.c:2634
__do_sys_sendmmsg net/socket.c:2663 [inline]
__se_sys_sendmmsg net/socket.c:2660 [inline]
__x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff888145af952a of 2 bytes by task 7804 on cpu 0:
ip_select_ident_segs include/net/ip.h:541 [inline]
ip_select_ident include/net/ip.h:556 [inline]
__ip_make_skb+0x817/0xc70 net/ipv4/ip_output.c:1446
ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560
udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260
inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg net/socket.c:748 [inline]
____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494
___sys_sendmsg net/socket.c:2548 [inline]
__sys_sendmmsg+0x269/0x500 net/socket.c:2634
__do_sys_sendmmsg net/socket.c:2663 [inline]
__se_sys_sendmmsg net/socket.c:2660 [inline]
__x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x184d -> 0x184e
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 7804 Comm: syz-executor.1 Not tainted 6.5.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
==================================================================
Fixes:
|
||
Jakub Kicinski
|
f534f6581e |
net: validate veth and vxcan peer ifindexes
veth and vxcan need to make sure the ifindexes of the peer are not negative, core does not validate this. Using iproute2 with user-space-level checking removed: Before: # ./ip link add index 10 type veth peer index -1 # ip link show 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000 link/ether 52:54:00:74:b2:03 brd ff:ff:ff:ff:ff:ff 10: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 8a:90:ff:57:6d:5d brd ff:ff:ff:ff:ff:ff -1: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether ae:ed:18:e6:fa:7f brd ff:ff:ff:ff:ff:ff Now: $ ./ip link add index 10 type veth peer index -1 Error: ifindex can't be negative. This problem surfaced in net-next because an explicit WARN() was added, the root cause is older. Fixes: |
||
Patrisious Haddad
|
f132fdd9dc |
macsec: add functions to get macsec real netdevice and check offload
Given a macsec net_device add two functions to return the real net_device for that device, and check if that macsec device is offloaded or not. This is needed for auxiliary drivers that implement MACsec offload, but have flows which are triggered over the macsec net_device, this allows the drivers in such cases to verify if the device is offloaded or not, and to access the real device of that macsec device, which would belong to the driver, and would be needed for the offload procedure. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> |
||
Eric Dumazet
|
4025d3e73a |
net: add skb_queue_purge_reason and __skb_queue_purge_reason
skb_queue_purge() and __skb_queue_purge() become wrappers around the new generic functions. New SKB_DROP_REASON_QUEUE_PURGE drop reason is added, but users can start adding more specific reasons. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Eric Dumazet
|
76f33296d2 |
sock: annotate data-races around prot->memory_pressure
*prot->memory_pressure is read/writen locklessly, we need to add proper annotations. A recent commit added a new race, it is time to audit all accesses. Fixes: |
||
Jakub Kicinski
|
7ff57803d2 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/sfc/tc.c |
||
Matthew Wilcox (Oracle)
|
350f6bbca1 |
mm: allow per-VMA locks on file-backed VMAs
Remove the TCP layering violation by allowing per-VMA locks on all VMAs. The fault path will immediately fail in handle_mm_fault(). There may be a small performance reduction from this patch as a little unnecessary work will be done on each page fault. See later patches for the improvement. Link: https://lkml.kernel.org/r/20230724185410.1124082-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Yan Zhai
|
a171fbec88 |
lwt: Check LWTUNNEL_XMIT_CONTINUE strictly
LWTUNNEL_XMIT_CONTINUE is implicitly assumed in ip(6)_finish_output2,
such that any positive return value from a xmit hook could cause
unexpected continue behavior, despite that related skb may have been
freed. This could be error-prone for future xmit hook ops. One of the
possible errors is to return statuses of dst_output directly.
To make the code safer, redefine LWTUNNEL_XMIT_CONTINUE value to
distinguish from dst_output statuses and check the continue
condition explicitly.
Fixes:
|
||
Abel Wu
|
2d0c88e84e |
sock: Fix misuse of sk_under_memory_pressure()
The status of global socket memory pressure is updated when:
a) __sk_mem_raise_allocated():
enter: sk_memory_allocated(sk) > sysctl_mem[1]
leave: sk_memory_allocated(sk) <= sysctl_mem[0]
b) __sk_mem_reduce_allocated():
leave: sk_under_memory_pressure(sk) &&
sk_memory_allocated(sk) < sysctl_mem[0]
So the conditions of leaving global pressure are inconstant, which
may lead to the situation that one pressured net-memcg prevents the
global pressure from being cleared when there is indeed no global
pressure, thus the global constrains are still in effect unexpectedly
on the other sockets.
This patch fixes this by ignoring the net-memcg's pressure when
deciding whether should leave global memory pressure.
Fixes:
|
||
Jakub Kicinski
|
f54a2a132a |
bpf-next-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZN0eNgAKCRDbK58LschI gwhhAQCwbrEgA3LslDlk22eqyfRH04D+9d7Kc3ISQssyjlr9swD+NfwfDvYqopwj Dp67QkHdluixf2/NMPTEvg/CA4mlmww= =4BwF -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-08-16 We've added 17 non-merge commits during the last 6 day(s) which contain a total of 20 files changed, 1179 insertions(+), 37 deletions(-). The main changes are: 1) Add a BPF hook in sys_socket() to change the protocol ID from IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy applications, from Geliang Tang. 2) Follow-up/fallout fix from the SO_REUSEPORT + bpf_sk_assign work to fix a splat on non-fullsock sks in inet[6]_steal_sock, from Lorenz Bauer. 3) Improvements to struct_ops links to avoid forcing presence of update/validate callbacks. Also add bpf_struct_ops fields documentation, from David Vernet. 4) Ensure libbpf sets close-on-exec flag on gzopen, from Marco Vedovati. 5) Several new tcx selftest additions and bpftool link show support for tcx and xdp links, from Daniel Borkmann. 6) Fix a smatch warning on uninitialized symbol in bpf_perf_link_fill_kprobe, from Yafang Shao. 7) BPF selftest fixes e.g. misplaced break in kfunc_call test, from Yipeng Zou. 8) Small cleanup to remove unused declaration bpf_link_new_file, from Yue Haibing. 9) Small typo fix to bpftool's perf help message, from Daniel T. Lee. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: selftests/bpf: Add mptcpify test selftests/bpf: Fix error checks of mptcp open_and_load selftests/bpf: Add two mptcp netns helpers bpf: Add update_socket_protocol hook bpftool: Implement link show support for xdp bpftool: Implement link show support for tcx selftests/bpf: Add selftest for fill_link_info bpf: Fix uninitialized symbol in bpf_perf_link_fill_kprobe() net: Fix slab-out-of-bounds in inet[6]_steal_sock bpf: Document struct bpf_struct_ops fields bpf: Support default .validate() and .update() behavior for struct_ops links selftests/bpf: Add various more tcx test cases selftests/bpf: Clean up fmod_ret in bench_rename test script selftests/bpf: Fix repeat option when kfunc_call verification fails libbpf: Set close-on-exec flag on gzopen bpftool: fix perf help message bpf: Remove unused declaration bpf_link_new_file() ==================== Link: https://lore.kernel.org/r/20230816212840.1539-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |