From 0e03c643dc9389e61fa484562dae58c8d6e96d63 Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Wed, 17 Jul 2024 13:25:06 +0200 Subject: [PATCH 01/16] eth: fbnic: fix s390 build. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Building the fbnic nn s390, yield a build bug: In function ‘fbnic_config_drop_mode_rcq’, inlined from ‘fbnic_enable’ at drivers/net/ethernet/meta/fbnic/fbnic_txrx.c:1836:4: ././include/linux/compiler_types.h:510:45: error: call to ‘__compiletime_assert_919’ declared with attribute error: FIELD_PREP: value too large for the field The relevant mask is 9 bits wide, and the related value is the cacheline aligned size of struct skb_shared_info. On s390 the cacheline size is 256 bytes, and skb_shared_info minimum size on 64 bits system is 320 bytes. Avoid building the driver for such arch. Reported-by: kernel test robot Link: https://lore.kernel.org/202407170432.dYJQOWVz-lkp@intel.com/ Reported-by: Nathan Chancellor Fixes: 0cb4c0a13723 ("eth: fbnic: Implement Rx queue alloc/start/stop/free") Signed-off-by: Paolo Abeni Link: https://patch.msgid.link/5dfefd3e90e77828f38e68854b171a5b8b8c6ede.1721215379.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/meta/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/meta/Kconfig b/drivers/net/ethernet/meta/Kconfig index d8f5e9f9bb33..a9f078212c78 100644 --- a/drivers/net/ethernet/meta/Kconfig +++ b/drivers/net/ethernet/meta/Kconfig @@ -20,6 +20,7 @@ if NET_VENDOR_META config FBNIC tristate "Meta Platforms Host Network Interface" depends on X86_64 || COMPILE_TEST + depends on S390=n depends on PCI_MSI select PHYLINK help From 782161895eb4ac45cf7cfa8db375bd4766cb8299 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Sat, 13 Jul 2024 16:47:38 +0200 Subject: [PATCH 02/16] netfilter: ctnetlink: use helper function to calculate expect ID Delete expectation path is missing a call to the nf_expect_get_id() helper function to calculate the expectation ID, otherwise LSB of the expectation object address is leaked to userspace. Fixes: 3c79107631db ("netfilter: ctnetlink: don't use conntrack/expect object addresses as id") Reported-by: zdi-disclosures@trendmicro.com Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nf_conntrack_netlink.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index 3b846cbdc050..4cbf71d0786b 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -3420,7 +3420,8 @@ static int ctnetlink_del_expect(struct sk_buff *skb, if (cda[CTA_EXPECT_ID]) { __be32 id = nla_get_be32(cda[CTA_EXPECT_ID]); - if (ntohl(id) != (u32)(unsigned long)exp) { + + if (id != nf_expect_get_id(exp)) { nf_ct_expect_put(exp); return -ENOENT; } From 791a615b7ad2258c560f91852be54b0480837c93 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Mon, 15 Jul 2024 13:54:03 +0200 Subject: [PATCH 03/16] netfilter: nf_set_pipapo: fix initial map fill The initial buffer has to be inited to all-ones, but it must restrict it to the size of the first field, not the total field size. After each round in the map search step, the result and the fill map are swapped, so if we have a set where f->bsize of the first element is smaller than m->bsize_max, those one-bits are leaked into future rounds result map. This makes pipapo find an incorrect matching results for sets where first field size is not the largest. Followup patch adds a test case to nft_concat_range.sh selftest script. Thanks to Stefano Brivio for pointing out that we need to zero out the remainder explicitly, only correcting memset() argument isn't enough. Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Reported-by: Yi Chen Cc: Stefano Brivio Signed-off-by: Florian Westphal Reviewed-by: Stefano Brivio Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nft_set_pipapo.c | 4 ++-- net/netfilter/nft_set_pipapo.h | 21 +++++++++++++++++++++ net/netfilter/nft_set_pipapo_avx2.c | 10 ++++++---- 3 files changed, 29 insertions(+), 6 deletions(-) diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c index 15a236bebb46..eb4c4a4ac7ac 100644 --- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -434,7 +434,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set, res_map = scratch->map + (map_index ? m->bsize_max : 0); fill_map = scratch->map + (map_index ? 0 : m->bsize_max); - memset(res_map, 0xff, m->bsize_max * sizeof(*res_map)); + pipapo_resmap_init(m, res_map); nft_pipapo_for_each_field(f, i, m) { bool last = i == m->field_count - 1; @@ -542,7 +542,7 @@ static struct nft_pipapo_elem *pipapo_get(const struct net *net, goto out; } - memset(res_map, 0xff, m->bsize_max * sizeof(*res_map)); + pipapo_resmap_init(m, res_map); nft_pipapo_for_each_field(f, i, m) { bool last = i == m->field_count - 1; diff --git a/net/netfilter/nft_set_pipapo.h b/net/netfilter/nft_set_pipapo.h index 0d2e40e10f7f..4a2ff85ce1c4 100644 --- a/net/netfilter/nft_set_pipapo.h +++ b/net/netfilter/nft_set_pipapo.h @@ -278,4 +278,25 @@ static u64 pipapo_estimate_size(const struct nft_set_desc *desc) return size; } +/** + * pipapo_resmap_init() - Initialise result map before first use + * @m: Matching data, including mapping table + * @res_map: Result map + * + * Initialize all bits covered by the first field to one, so that after + * the first step, only the matching bits of the first bit group remain. + * + * If other fields have a large bitmap, set remainder of res_map to 0. + */ +static inline void pipapo_resmap_init(const struct nft_pipapo_match *m, unsigned long *res_map) +{ + const struct nft_pipapo_field *f = m->f; + int i; + + for (i = 0; i < f->bsize; i++) + res_map[i] = ULONG_MAX; + + for (i = f->bsize; i < m->bsize_max; i++) + res_map[i] = 0ul; +} #endif /* _NFT_SET_PIPAPO_H */ diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c index d08407d589ea..8910a5ac7ed1 100644 --- a/net/netfilter/nft_set_pipapo_avx2.c +++ b/net/netfilter/nft_set_pipapo_avx2.c @@ -1036,6 +1036,7 @@ nothing: /** * nft_pipapo_avx2_lookup_slow() - Fallback function for uncommon field sizes + * @mdata: Matching data, including mapping table * @map: Previous match result, used as initial bitmap * @fill: Destination bitmap to be filled with current match result * @f: Field, containing lookup and mapping tables @@ -1051,7 +1052,8 @@ nothing: * Return: -1 on no match, rule index of match if @last, otherwise first long * word index to be checked next (i.e. first filled word). */ -static int nft_pipapo_avx2_lookup_slow(unsigned long *map, unsigned long *fill, +static int nft_pipapo_avx2_lookup_slow(const struct nft_pipapo_match *mdata, + unsigned long *map, unsigned long *fill, const struct nft_pipapo_field *f, int offset, const u8 *pkt, bool first, bool last) @@ -1060,7 +1062,7 @@ static int nft_pipapo_avx2_lookup_slow(unsigned long *map, unsigned long *fill, int i, ret = -1, b; if (first) - memset(map, 0xff, bsize * sizeof(*map)); + pipapo_resmap_init(mdata, map); for (i = offset; i < bsize; i++) { if (f->bb == 8) @@ -1186,7 +1188,7 @@ next_match: } else if (f->groups == 16) { NFT_SET_PIPAPO_AVX2_LOOKUP(8, 16); } else { - ret = nft_pipapo_avx2_lookup_slow(res, fill, f, + ret = nft_pipapo_avx2_lookup_slow(m, res, fill, f, ret, rp, first, last); } @@ -1202,7 +1204,7 @@ next_match: } else if (f->groups == 32) { NFT_SET_PIPAPO_AVX2_LOOKUP(4, 32); } else { - ret = nft_pipapo_avx2_lookup_slow(res, fill, f, + ret = nft_pipapo_avx2_lookup_slow(m, res, fill, f, ret, rp, first, last); } From 0935ee6032dfe68bd1f8ddf4c43b618d7beafc69 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Mon, 15 Jul 2024 13:55:29 +0200 Subject: [PATCH 04/16] selftests: netfilter: add test case for recent mismatch bug Without 'netfilter: nf_set_pipapo: fix initial map fill' this fails: TEST: reported issues Add two elements, flush, re-add 1s [ OK ] net,mac with reload 1s [ OK ] net,port,proto 1s [FAIL] post-add: should have returned 10.5.8.0/24 . 51-60 . 6-17 but got table inet filter { set test { type ipv4_addr . inet_service . inet_proto flags interval,timeout elements = { 10.5.7.0/24 . 51-60 . 6-17 } } } The other sets defined in the selftest do not trigger this bug, it only occurs if the first field group bitsize is smaller than the largest group bitsize. For each added element, check 'get' works and actually returns the requested range. After map has been filled, check all added ranges can still be retrieved. For each deleted element, check that 'get' fails. Based on a reproducer script from Yi Chen. Signed-off-by: Florian Westphal Reviewed-by: Stefano Brivio Signed-off-by: Pablo Neira Ayuso --- .../net/netfilter/nft_concat_range.sh | 76 ++++++++++++++++++- 1 file changed, 75 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/netfilter/nft_concat_range.sh b/tools/testing/selftests/net/netfilter/nft_concat_range.sh index 6d66240e149c..47088b005390 100755 --- a/tools/testing/selftests/net/netfilter/nft_concat_range.sh +++ b/tools/testing/selftests/net/netfilter/nft_concat_range.sh @@ -27,7 +27,7 @@ TYPES="net_port port_net net6_port port_proto net6_port_mac net6_port_mac_proto net6_port_net6_port net_port_mac_proto_net" # Reported bugs, also described by TYPE_ variables below -BUGS="flush_remove_add reload" +BUGS="flush_remove_add reload net_port_proto_match" # List of possible paths to pktgen script from kernel tree for performance tests PKTGEN_SCRIPT_PATHS=" @@ -371,6 +371,22 @@ race_repeat 0 perf_duration 0 " +TYPE_net_port_proto_match=" +display net,port,proto +type_spec ipv4_addr . inet_service . inet_proto +chain_spec ip daddr . udp dport . meta l4proto +dst addr4 port proto +src +start 1 +count 9 +src_delta 9 +tools sendip bash +proto udp + +race_repeat 0 + +perf_duration 0 +" # Set template for all tests, types and rules are filled in depending on test set_template=' flush ruleset @@ -1555,6 +1571,64 @@ test_bug_reload() { nft flush ruleset } +# - add ranged element, check that packets match it +# - delete element again, check it is gone +test_bug_net_port_proto_match() { + setup veth send_"${proto}" set || return ${ksft_skip} + rstart=${start} + + range_size=1 + for i in $(seq 1 10); do + for j in $(seq 1 20) ; do + elem=$(printf "10.%d.%d.0/24 . %d1-%d0 . 6-17 " ${i} ${j} ${i} "$((i+1))") + + nft "add element inet filter test { $elem }" || return 1 + nft "get element inet filter test { $elem }" | grep -q "$elem" + if [ $? -ne 0 ];then + local got=$(nft "get element inet filter test { $elem }") + err "post-add: should have returned $elem but got $got" + return 1 + fi + done + done + + # recheck after set was filled + for i in $(seq 1 10); do + for j in $(seq 1 20) ; do + elem=$(printf "10.%d.%d.0/24 . %d1-%d0 . 6-17 " ${i} ${j} ${i} "$((i+1))") + + nft "get element inet filter test { $elem }" | grep -q "$elem" + if [ $? -ne 0 ];then + local got=$(nft "get element inet filter test { $elem }") + err "post-fill: should have returned $elem but got $got" + return 1 + fi + done + done + + # random del and re-fetch + for i in $(seq 1 10); do + for j in $(seq 1 20) ; do + local rnd=$((RANDOM%10)) + local got="" + + elem=$(printf "10.%d.%d.0/24 . %d1-%d0 . 6-17 " ${i} ${j} ${i} "$((i+1))") + if [ $rnd -gt 0 ];then + continue + fi + + nft "delete element inet filter test { $elem }" + got=$(nft "get element inet filter test { $elem }" 2>/dev/null) + if [ $? -eq 0 ];then + err "post-delete: query for $elem returned $got instead of error." + return 1 + fi + done + done + + nft flush ruleset +} + test_reported_issues() { eval test_bug_"${subtest}" } From cbd070a4ae62f119058973f6d2c984e325bce6e7 Mon Sep 17 00:00:00 2001 From: Chen Hanxiao Date: Thu, 27 Jun 2024 14:15:15 +0800 Subject: [PATCH 05/16] ipvs: properly dereference pe in ip_vs_add_service Use pe directly to resolve sparse warning: net/netfilter/ipvs/ip_vs_ctl.c:1471:27: warning: dereference of noderef expression Fixes: 39b972231536 ("ipvs: handle connections started by real-servers") Signed-off-by: Chen Hanxiao Acked-by: Julian Anastasov Acked-by: Simon Horman Signed-off-by: Pablo Neira Ayuso --- net/netfilter/ipvs/ip_vs_ctl.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c index 78a1cc72dc38..706c2b52a1ac 100644 --- a/net/netfilter/ipvs/ip_vs_ctl.c +++ b/net/netfilter/ipvs/ip_vs_ctl.c @@ -1459,18 +1459,18 @@ ip_vs_add_service(struct netns_ipvs *ipvs, struct ip_vs_service_user_kern *u, if (ret < 0) goto out_err; - /* Bind the ct retriever */ - RCU_INIT_POINTER(svc->pe, pe); - pe = NULL; - /* Update the virtual service counters */ if (svc->port == FTPPORT) atomic_inc(&ipvs->ftpsvc_counter); else if (svc->port == 0) atomic_inc(&ipvs->nullsvc_counter); - if (svc->pe && svc->pe->conn_out) + if (pe && pe->conn_out) atomic_inc(&ipvs->conn_out_counter); + /* Bind the ct retriever */ + RCU_INIT_POINTER(svc->pe, pe); + pe = NULL; + /* Count only IPv4 services for old get/setsockopt interface */ if (svc->af == AF_INET) ipvs->num_services++; From 03b54bad26f3c78bb1f90410ec3e4e7fe197adc9 Mon Sep 17 00:00:00 2001 From: Joshua Washington Date: Tue, 16 Jul 2024 10:10:41 -0700 Subject: [PATCH 06/16] gve: Fix XDP TX completion handling when counters overflow In gve_clean_xdp_done, the driver processes the TX completions based on a 32-bit NIC counter and a 32-bit completion counter stored in the tx queue. Fix the for loop so that the counter wraparound is handled correctly. Fixes: 75eaae158b1b ("gve: Add XDP DROP and TX support for GQI-QPL format") Signed-off-by: Joshua Washington Signed-off-by: Praveen Kaligineedi Reviewed-by: Simon Horman Link: https://patch.msgid.link/20240716171041.1561142-1-pkaligineedi@google.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/google/gve/gve_tx.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/google/gve/gve_tx.c b/drivers/net/ethernet/google/gve/gve_tx.c index 24a64ec1073e..e7fb7d6d283d 100644 --- a/drivers/net/ethernet/google/gve/gve_tx.c +++ b/drivers/net/ethernet/google/gve/gve_tx.c @@ -158,15 +158,16 @@ static int gve_clean_xdp_done(struct gve_priv *priv, struct gve_tx_ring *tx, u32 to_do) { struct gve_tx_buffer_state *info; - u32 clean_end = tx->done + to_do; u64 pkts = 0, bytes = 0; size_t space_freed = 0; u32 xsk_complete = 0; u32 idx; + int i; - for (; tx->done < clean_end; tx->done++) { + for (i = 0; i < to_do; i++) { idx = tx->done & tx->mask; info = &tx->info[idx]; + tx->done++; if (unlikely(!info->xdp.size)) continue; From 1f038d5897fe6b439039fc28420842abcc0d126b Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Wed, 17 Jul 2024 10:15:46 +0200 Subject: [PATCH 07/16] net: airoha: fix error branch in airoha_dev_xmit and airoha_set_gdm_ports Fix error case management in airoha_dev_xmit routine since we need to DMA unmap pending buffers starting from q->head. Moreover fix a typo in error case branch in airoha_set_gdm_ports routine. Fixes: 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC") Signed-off-by: Lorenzo Bianconi Reviewed-by: Simon Horman Link: https://patch.msgid.link/b628871bc8ae4861b5e2ab4db90aaf373cbb7cee.1721203880.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/mediatek/airoha_eth.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/mediatek/airoha_eth.c b/drivers/net/ethernet/mediatek/airoha_eth.c index 7967a92803c2..cc1a69522f61 100644 --- a/drivers/net/ethernet/mediatek/airoha_eth.c +++ b/drivers/net/ethernet/mediatek/airoha_eth.c @@ -977,7 +977,7 @@ static int airoha_set_gdm_ports(struct airoha_eth *eth, bool enable) return 0; error: - for (i--; i >= 0; i++) + for (i--; i >= 0; i--) airoha_set_gdm_port(eth, port_list[i], false); return err; @@ -2431,9 +2431,11 @@ static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb, return NETDEV_TX_OK; error_unmap: - for (i--; i >= 0; i++) - dma_unmap_single(dev->dev.parent, q->entry[i].dma_addr, - q->entry[i].dma_len, DMA_TO_DEVICE); + for (i--; i >= 0; i--) { + index = (q->head + i) % q->ndesc; + dma_unmap_single(dev->dev.parent, q->entry[index].dma_addr, + q->entry[index].dma_len, DMA_TO_DEVICE); + } spin_unlock_bh(&q->lock); error: From c14112a5574ff5cf3de198ab6eeff53ac1234068 Mon Sep 17 00:00:00 2001 From: Shay Drory Date: Wed, 17 Jul 2024 10:29:16 -0700 Subject: [PATCH 08/16] driver core: auxiliary bus: Fix documentation of auxiliary_device Fix the documentation of the below field of struct auxiliary_device include/linux/auxiliary_bus.h:150: warning: Function parameter or struct member 'sysfs' not described in 'auxiliary_device' include/linux/auxiliary_bus.h:150: warning: Excess struct member 'irqs' description in 'auxiliary_device' include/linux/auxiliary_bus.h:150: warning: Excess struct member 'lock' description in 'auxiliary_device' include/linux/auxiliary_bus.h:150: warning: Excess struct member 'irq_dir_exists' description in 'auxiliary_device' Fixes: a808878308a8 ("driver core: auxiliary bus: show auxiliary device IRQs") Reported-by: Stephen Rothwell Signed-off-by: Shay Drory Signed-off-by: Saeed Mahameed Link: https://patch.msgid.link/20240717172916.595808-1-saeed@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/auxiliary_bus.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/include/linux/auxiliary_bus.h b/include/linux/auxiliary_bus.h index 3ba4487c9cd9..1539bbd263d2 100644 --- a/include/linux/auxiliary_bus.h +++ b/include/linux/auxiliary_bus.h @@ -58,9 +58,10 @@ * in * @name: Match name found by the auxiliary device driver, * @id: unique identitier if multiple devices of the same name are exported, - * @irqs: irqs xarray contains irq indices which are used by the device, - * @lock: Synchronize irq sysfs creation, - * @irq_dir_exists: whether "irqs" directory exists, + * @sysfs: embedded struct which hold all sysfs related fields, + * @sysfs.irqs: irqs xarray contains irq indices which are used by the device, + * @sysfs.lock: Synchronize irq sysfs creation, + * @sysfs.irq_dir_exists: whether "irqs" directory exists, * * An auxiliary_device represents a part of its parent device's functionality. * It is given a name that, combined with the registering drivers From 120f1c857a73e52132e473dee89b340440cb692b Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Mon, 15 Jul 2024 16:14:42 +0200 Subject: [PATCH 09/16] net: flow_dissector: use DEBUG_NET_WARN_ON_ONCE The following splat is easy to reproduce upstream as well as in -stable kernels. Florian Westphal provided the following commit: d1dab4f71d37 ("net: add and use __skb_get_hash_symmetric_net") but this complementary fix has been also suggested by Willem de Bruijn and it can be easily backported to -stable kernel which consists in using DEBUG_NET_WARN_ON_ONCE instead to silence the following splat given __skb_get_hash() is used by the nftables tracing infrastructure to to identify packets in traces. [69133.561393] ------------[ cut here ]------------ [69133.561404] WARNING: CPU: 0 PID: 43576 at net/core/flow_dissector.c:1104 __skb_flow_dissect+0x134f/ [...] [69133.561944] CPU: 0 PID: 43576 Comm: socat Not tainted 6.10.0-rc7+ #379 [69133.561959] RIP: 0010:__skb_flow_dissect+0x134f/0x2ad0 [69133.561970] Code: 83 f9 04 0f 84 b3 00 00 00 45 85 c9 0f 84 aa 00 00 00 41 83 f9 02 0f 84 81 fc ff ff 44 0f b7 b4 24 80 00 00 00 e9 8b f9 ff ff <0f> 0b e9 20 f3 ff ff 41 f6 c6 20 0f 84 e4 ef ff ff 48 8d 7b 12 e8 [69133.561979] RSP: 0018:ffffc90000006fc0 EFLAGS: 00010246 [69133.561988] RAX: 0000000000000000 RBX: ffffffff82f33e20 RCX: ffffffff81ab7e19 [69133.561994] RDX: dffffc0000000000 RSI: ffffc90000007388 RDI: ffff888103a1b418 [69133.562001] RBP: ffffc90000007310 R08: 0000000000000000 R09: 0000000000000000 [69133.562007] R10: ffffc90000007388 R11: ffffffff810cface R12: ffff888103a1b400 [69133.562013] R13: 0000000000000000 R14: ffffffff82f33e2a R15: ffffffff82f33e28 [69133.562020] FS: 00007f40f7131740(0000) GS:ffff888390800000(0000) knlGS:0000000000000000 [69133.562027] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [69133.562033] CR2: 00007f40f7346ee0 CR3: 000000015d200001 CR4: 00000000001706f0 [69133.562040] Call Trace: [69133.562044] [69133.562049] ? __warn+0x9f/0x1a0 [ 1211.841384] ? __skb_flow_dissect+0x107e/0x2860 [...] [ 1211.841496] ? bpf_flow_dissect+0x160/0x160 [ 1211.841753] __skb_get_hash+0x97/0x280 [ 1211.841765] ? __skb_get_hash_symmetric+0x230/0x230 [ 1211.841776] ? mod_find+0xbf/0xe0 [ 1211.841786] ? get_stack_info_noinstr+0x12/0xe0 [ 1211.841798] ? bpf_ksym_find+0x56/0xe0 [ 1211.841807] ? __rcu_read_unlock+0x2a/0x70 [ 1211.841819] nft_trace_init+0x1b9/0x1c0 [nf_tables] [ 1211.841895] ? nft_trace_notify+0x830/0x830 [nf_tables] [ 1211.841964] ? get_stack_info+0x2b/0x80 [ 1211.841975] ? nft_do_chain_arp+0x80/0x80 [nf_tables] [ 1211.842044] nft_do_chain+0x79c/0x850 [nf_tables] Fixes: 9b52e3f267a6 ("flow_dissector: handle no-skb use case") Suggested-by: Willem de Bruijn Signed-off-by: Pablo Neira Ayuso Reviewed-by: Willem de Bruijn Link: https://patch.msgid.link/20240715141442.43775-1-pablo@netfilter.org Signed-off-by: Paolo Abeni --- net/core/flow_dissector.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c index ada1e39b557e..0e638a37aa09 100644 --- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -1117,7 +1117,7 @@ bool __skb_flow_dissect(const struct net *net, } } - WARN_ON_ONCE(!net); + DEBUG_NET_WARN_ON_ONCE(!net); if (net) { enum netns_bpf_attach_type type = NETNS_BPF_FLOW_DISSECTOR; struct bpf_prog_array *run_array; From 338bb57e4c2a1c2c6fc92f9c0bd35be7587adca7 Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Mon, 15 Jul 2024 17:23:53 +0300 Subject: [PATCH 10/16] ipv4: Fix incorrect TOS in route get reply The TOS value that is returned to user space in the route get reply is the one with which the lookup was performed ('fl4->flowi4_tos'). This is fine when the matched route is configured with a TOS as it would not match if its TOS value did not match the one with which the lookup was performed. However, matching on TOS is only performed when the route's TOS is not zero. It is therefore possible to have the kernel incorrectly return a non-zero TOS: # ip link add name dummy1 up type dummy # ip address add 192.0.2.1/24 dev dummy1 # ip route get 192.0.2.2 tos 0xfc 192.0.2.2 tos 0x1c dev dummy1 src 192.0.2.1 uid 0 cache Fix by adding a DSCP field to the FIB result structure (inside an existing 4 bytes hole), populating it in the route lookup and using it when filling the route get reply. Output after the patch: # ip link add name dummy1 up type dummy # ip address add 192.0.2.1/24 dev dummy1 # ip route get 192.0.2.2 tos 0xfc 192.0.2.2 dev dummy1 src 192.0.2.1 uid 0 cache Fixes: 1a00fee4ffb2 ("ipv4: Remove rt_key_{src,dst,tos} from struct rtable.") Signed-off-by: Ido Schimmel Reviewed-by: David Ahern Reviewed-by: Guillaume Nault Signed-off-by: Paolo Abeni --- include/net/ip_fib.h | 1 + net/ipv4/fib_trie.c | 1 + net/ipv4/route.c | 14 +++++++------- 3 files changed, 9 insertions(+), 7 deletions(-) diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h index 6e7984bfb986..72af2f223e59 100644 --- a/include/net/ip_fib.h +++ b/include/net/ip_fib.h @@ -173,6 +173,7 @@ struct fib_result { unsigned char type; unsigned char scope; u32 tclassid; + dscp_t dscp; struct fib_nh_common *nhc; struct fib_info *fi; struct fib_table *table; diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c index f474106464d2..8f30e3f00b7f 100644 --- a/net/ipv4/fib_trie.c +++ b/net/ipv4/fib_trie.c @@ -1629,6 +1629,7 @@ set_result: res->nhc = nhc; res->type = fa->fa_type; res->scope = fi->fib_scope; + res->dscp = fa->fa_dscp; res->fi = fi; res->table = tb; res->fa_head = &n->leaf; diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 54512acbead7..82b8fe8faeb7 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2867,9 +2867,9 @@ EXPORT_SYMBOL_GPL(ip_route_output_flow); /* called with rcu_read_lock held */ static int rt_fill_info(struct net *net, __be32 dst, __be32 src, - struct rtable *rt, u32 table_id, struct flowi4 *fl4, - struct sk_buff *skb, u32 portid, u32 seq, - unsigned int flags) + struct rtable *rt, u32 table_id, dscp_t dscp, + struct flowi4 *fl4, struct sk_buff *skb, u32 portid, + u32 seq, unsigned int flags) { struct rtmsg *r; struct nlmsghdr *nlh; @@ -2885,7 +2885,7 @@ static int rt_fill_info(struct net *net, __be32 dst, __be32 src, r->rtm_family = AF_INET; r->rtm_dst_len = 32; r->rtm_src_len = 0; - r->rtm_tos = fl4 ? fl4->flowi4_tos : 0; + r->rtm_tos = inet_dscp_to_dsfield(dscp); r->rtm_table = table_id < 256 ? table_id : RT_TABLE_COMPAT; if (nla_put_u32(skb, RTA_TABLE, table_id)) goto nla_put_failure; @@ -3035,7 +3035,7 @@ static int fnhe_dump_bucket(struct net *net, struct sk_buff *skb, goto next; err = rt_fill_info(net, fnhe->fnhe_daddr, 0, rt, - table_id, NULL, skb, + table_id, 0, NULL, skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, flags); if (err) @@ -3358,8 +3358,8 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, err = fib_dump_info(skb, NETLINK_CB(in_skb).portid, nlh->nlmsg_seq, RTM_NEWROUTE, &fri, 0); } else { - err = rt_fill_info(net, dst, src, rt, table_id, &fl4, skb, - NETLINK_CB(in_skb).portid, + err = rt_fill_info(net, dst, src, rt, table_id, res.dscp, &fl4, + skb, NETLINK_CB(in_skb).portid, nlh->nlmsg_seq, 0); } if (err < 0) From f036e68212c11e5a7edbb59b5e25299341829485 Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Mon, 15 Jul 2024 17:23:54 +0300 Subject: [PATCH 11/16] ipv4: Fix incorrect TOS in fibmatch route get reply The TOS value that is returned to user space in the route get reply is the one with which the lookup was performed ('fl4->flowi4_tos'). This is fine when the matched route is configured with a TOS as it would not match if its TOS value did not match the one with which the lookup was performed. However, matching on TOS is only performed when the route's TOS is not zero. It is therefore possible to have the kernel incorrectly return a non-zero TOS: # ip link add name dummy1 up type dummy # ip address add 192.0.2.1/24 dev dummy1 # ip route get fibmatch 192.0.2.2 tos 0xfc 192.0.2.0/24 tos 0x1c dev dummy1 proto kernel scope link src 192.0.2.1 Fix by instead returning the DSCP field from the FIB result structure which was populated during the route lookup. Output after the patch: # ip link add name dummy1 up type dummy # ip address add 192.0.2.1/24 dev dummy1 # ip route get fibmatch 192.0.2.2 tos 0xfc 192.0.2.0/24 dev dummy1 proto kernel scope link src 192.0.2.1 Extend the existing selftests to not only verify that the correct route is returned, but that it is also returned with correct "tos" value (or without it). Fixes: b61798130f1b ("net: ipv4: RTM_GETROUTE: return matched fib result when requested") Signed-off-by: Ido Schimmel Reviewed-by: David Ahern Reviewed-by: Guillaume Nault Signed-off-by: Paolo Abeni --- net/ipv4/route.c | 2 +- tools/testing/selftests/net/fib_tests.sh | 24 ++++++++++++------------ 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 82b8fe8faeb7..5090912533d6 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -3331,7 +3331,7 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, fri.tb_id = table_id; fri.dst = res.prefix; fri.dst_len = res.prefixlen; - fri.dscp = inet_dsfield_to_dscp(fl4.flowi4_tos); + fri.dscp = res.dscp; fri.type = rt->rt_type; fri.offload = 0; fri.trap = 0; diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh index 73895711cdf4..5f3c28fc8624 100755 --- a/tools/testing/selftests/net/fib_tests.sh +++ b/tools/testing/selftests/net/fib_tests.sh @@ -1737,53 +1737,53 @@ ipv4_rt_dsfield() # DSCP 0x10 should match the specific route, no matter the ECN bits $IP route get fibmatch 172.16.102.1 dsfield 0x10 | \ - grep -q "via 172.16.103.2" + grep -q "172.16.102.0/24 tos 0x10 via 172.16.103.2" log_test $? 0 "IPv4 route with DSCP and ECN:Not-ECT" $IP route get fibmatch 172.16.102.1 dsfield 0x11 | \ - grep -q "via 172.16.103.2" + grep -q "172.16.102.0/24 tos 0x10 via 172.16.103.2" log_test $? 0 "IPv4 route with DSCP and ECN:ECT(1)" $IP route get fibmatch 172.16.102.1 dsfield 0x12 | \ - grep -q "via 172.16.103.2" + grep -q "172.16.102.0/24 tos 0x10 via 172.16.103.2" log_test $? 0 "IPv4 route with DSCP and ECN:ECT(0)" $IP route get fibmatch 172.16.102.1 dsfield 0x13 | \ - grep -q "via 172.16.103.2" + grep -q "172.16.102.0/24 tos 0x10 via 172.16.103.2" log_test $? 0 "IPv4 route with DSCP and ECN:CE" # Unknown DSCP should match the generic route, no matter the ECN bits $IP route get fibmatch 172.16.102.1 dsfield 0x14 | \ - grep -q "via 172.16.101.2" + grep -q "172.16.102.0/24 via 172.16.101.2" log_test $? 0 "IPv4 route with unknown DSCP and ECN:Not-ECT" $IP route get fibmatch 172.16.102.1 dsfield 0x15 | \ - grep -q "via 172.16.101.2" + grep -q "172.16.102.0/24 via 172.16.101.2" log_test $? 0 "IPv4 route with unknown DSCP and ECN:ECT(1)" $IP route get fibmatch 172.16.102.1 dsfield 0x16 | \ - grep -q "via 172.16.101.2" + grep -q "172.16.102.0/24 via 172.16.101.2" log_test $? 0 "IPv4 route with unknown DSCP and ECN:ECT(0)" $IP route get fibmatch 172.16.102.1 dsfield 0x17 | \ - grep -q "via 172.16.101.2" + grep -q "172.16.102.0/24 via 172.16.101.2" log_test $? 0 "IPv4 route with unknown DSCP and ECN:CE" # Null DSCP should match the generic route, no matter the ECN bits $IP route get fibmatch 172.16.102.1 dsfield 0x00 | \ - grep -q "via 172.16.101.2" + grep -q "172.16.102.0/24 via 172.16.101.2" log_test $? 0 "IPv4 route with no DSCP and ECN:Not-ECT" $IP route get fibmatch 172.16.102.1 dsfield 0x01 | \ - grep -q "via 172.16.101.2" + grep -q "172.16.102.0/24 via 172.16.101.2" log_test $? 0 "IPv4 route with no DSCP and ECN:ECT(1)" $IP route get fibmatch 172.16.102.1 dsfield 0x02 | \ - grep -q "via 172.16.101.2" + grep -q "172.16.102.0/24 via 172.16.101.2" log_test $? 0 "IPv4 route with no DSCP and ECN:ECT(0)" $IP route get fibmatch 172.16.102.1 dsfield 0x03 | \ - grep -q "via 172.16.101.2" + grep -q "172.16.102.0/24 via 172.16.101.2" log_test $? 0 "IPv4 route with no DSCP and ECN:CE" } From a1a305375dc356865b22a43cbe09869bf8fae9ca Mon Sep 17 00:00:00 2001 From: Jack Wu Date: Tue, 16 Jul 2024 10:49:02 +0800 Subject: [PATCH 12/16] net: wwan: t7xx: add support for Dell DW5933e add support for Dell DW5933e (0x14c0, 0x4d75) Signed-off-by: Jack Wu Link: https://patch.msgid.link/20240716024902.16054-1-wojackbb@gmail.com Signed-off-by: Paolo Abeni --- drivers/net/wwan/t7xx/t7xx_pci.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/wwan/t7xx/t7xx_pci.c b/drivers/net/wwan/t7xx/t7xx_pci.c index e0b1e7a616ca..10a8c1080b10 100644 --- a/drivers/net/wwan/t7xx/t7xx_pci.c +++ b/drivers/net/wwan/t7xx/t7xx_pci.c @@ -852,6 +852,7 @@ static void t7xx_pci_remove(struct pci_dev *pdev) static const struct pci_device_id t7xx_pci_table[] = { { PCI_DEVICE(PCI_VENDOR_ID_MEDIATEK, 0x4d75) }, + { PCI_DEVICE(0x14c0, 0x4d75) }, // Dell DW5933e { } }; MODULE_DEVICE_TABLE(pci, t7xx_pci_table); From 4e076ff6ad5302c015617da30d877b4cdcbdf613 Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Wed, 17 Jul 2024 10:47:19 +0200 Subject: [PATCH 13/16] net: airoha: Fix NULL pointer dereference in airoha_qdma_cleanup_rx_queue() Move page_pool_get_dma_dir() inside the while loop of airoha_qdma_cleanup_rx_queue routine in order to avoid possible NULL pointer dereference if airoha_qdma_init_rx_queue() fails before properly allocating the page_pool pointer. Fixes: 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC") Signed-off-by: Lorenzo Bianconi Reviewed-by: Simon Horman Link: https://patch.msgid.link/7330a41bba720c33abc039955f6172457a3a34f0.1721205981.git.lorenzo@kernel.org Signed-off-by: Paolo Abeni --- drivers/net/ethernet/mediatek/airoha_eth.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mediatek/airoha_eth.c b/drivers/net/ethernet/mediatek/airoha_eth.c index cc1a69522f61..16761fde6c6c 100644 --- a/drivers/net/ethernet/mediatek/airoha_eth.c +++ b/drivers/net/ethernet/mediatek/airoha_eth.c @@ -1585,7 +1585,6 @@ static int airoha_qdma_init_rx_queue(struct airoha_eth *eth, static void airoha_qdma_cleanup_rx_queue(struct airoha_queue *q) { - enum dma_data_direction dir = page_pool_get_dma_dir(q->page_pool); struct airoha_eth *eth = q->eth; while (q->queued) { @@ -1593,7 +1592,7 @@ static void airoha_qdma_cleanup_rx_queue(struct airoha_queue *q) struct page *page = virt_to_head_page(e->buf); dma_sync_single_for_cpu(eth->dev, e->dma_addr, e->dma_len, - dir); + page_pool_get_dma_dir(q->page_pool)); page_pool_put_full_page(q->page_pool, page, false); q->tail = (q->tail + 1) % q->ndesc; q->queued--; From 66b6095c264e1b4e0a441c6329861806504e06c6 Mon Sep 17 00:00:00 2001 From: Martin Willi Date: Wed, 17 Jul 2024 11:08:19 +0200 Subject: [PATCH 14/16] net: dsa: mv88e6xxx: Limit chip-wide frame size config to CPU ports Marvell chips not supporting per-port jumbo frame size configurations use a chip-wide frame size configuration. In the commit referenced with the Fixes tag, the setting is applied just for the last port changing its MTU. While configuring CPU ports accounts for tagger overhead, user ports do not. When setting the MTU for a user port, the chip-wide setting is reduced to not include the tagger overhead, resulting in an potentially insufficient maximum frame size for the CPU port. Specifically, sending full-size frames from the CPU port on a MV88E6097 having a user port MTU of 1500 bytes results in dropped frames. As, by design, the CPU port MTU is adjusted for any user port change, apply the chip-wide setting only for CPU ports. Fixes: 1baf0fac10fb ("net: dsa: mv88e6xxx: Use chip-wide max frame size for MTU") Suggested-by: Vladimir Oltean Signed-off-by: Martin Willi Reviewed-by: Vladimir Oltean Signed-off-by: Paolo Abeni --- drivers/net/dsa/mv88e6xxx/chip.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 07c897b13de1..5b4e2ce5470d 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -3626,7 +3626,8 @@ static int mv88e6xxx_change_mtu(struct dsa_switch *ds, int port, int new_mtu) mv88e6xxx_reg_lock(chip); if (chip->info->ops->port_set_jumbo_size) ret = chip->info->ops->port_set_jumbo_size(chip, port, new_mtu); - else if (chip->info->ops->set_max_frame_size) + else if (chip->info->ops->set_max_frame_size && + dsa_is_cpu_port(ds, port)) ret = chip->info->ops->set_max_frame_size(chip, new_mtu); mv88e6xxx_reg_unlock(chip); From c5118072e228e7e4385fc5ac46b2e31cf6c4f2d3 Mon Sep 17 00:00:00 2001 From: Martin Willi Date: Wed, 17 Jul 2024 11:08:20 +0200 Subject: [PATCH 15/16] net: dsa: b53: Limit chip-wide jumbo frame config to CPU ports Broadcom switches supported by the b53 driver use a chip-wide jumbo frame configuration. In the commit referenced with the Fixes tag, the setting is applied just for the last port changing its MTU. While configuring CPU ports accounts for tagger overhead, user ports do not. When setting the MTU for a user port, the chip-wide setting is reduced to not include the tagger overhead, resulting in an potentially insufficient chip-wide maximum frame size for the CPU port. As, by design, the CPU port MTU is adjusted for any user port change, apply the chip-wide setting only for CPU ports. This aligns the driver to the behavior of other switch drivers. Fixes: 6ae5834b983a ("net: dsa: b53: add MTU configuration support") Suggested-by: Vladimir Oltean Signed-off-by: Martin Willi Reviewed-by: Vladimir Oltean Signed-off-by: Paolo Abeni --- drivers/net/dsa/b53/b53_common.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c index 8f50abe739b7..0783fc121bbb 100644 --- a/drivers/net/dsa/b53/b53_common.c +++ b/drivers/net/dsa/b53/b53_common.c @@ -2256,6 +2256,9 @@ static int b53_change_mtu(struct dsa_switch *ds, int port, int mtu) if (is5325(dev) || is5365(dev)) return -EOPNOTSUPP; + if (!dsa_is_cpu_port(ds, port)) + return 0; + enable_jumbo = (mtu >= JMS_MIN_SIZE); allow_10_100 = (dev->chip_id == BCM583XX_DEVICE_ID); From 4359836129d931fc424370249a1fcdec139fe407 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 17 Jul 2024 09:15:59 -0700 Subject: [PATCH 16/16] eth: fbnic: don't build the driver when skb has more than 21 frags Similarly to commit 0e03c643dc93 ("eth: fbnic: fix s390 build."), the driver won't build if skb_shared_info has more than 25 frags assuming a 64B cache line and 21 frags assuming a 128B cache line. (512 - 48 - 64) / 16 = 25 (512 - 48 - 128) / 16 = 21 Fixes: 0cb4c0a13723 ("eth: fbnic: Implement Rx queue alloc/start/stop/free") Signed-off-by: Jakub Kicinski Link: https://patch.msgid.link/20240717161600.1291544-1-kuba@kernel.org Signed-off-by: Paolo Abeni --- drivers/net/ethernet/meta/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/meta/Kconfig b/drivers/net/ethernet/meta/Kconfig index a9f078212c78..86034ea4ba5b 100644 --- a/drivers/net/ethernet/meta/Kconfig +++ b/drivers/net/ethernet/meta/Kconfig @@ -21,6 +21,7 @@ config FBNIC tristate "Meta Platforms Host Network Interface" depends on X86_64 || COMPILE_TEST depends on S390=n + depends on MAX_SKB_FRAGS < 22 depends on PCI_MSI select PHYLINK help