struct ip6_sf_socklist and ipv6_mc_socklist are per-socket MLD data.
These data are protected by the rtnl lock, the socket lock, and RCU, so
when they are dereferenced, lockdep verifies that the rtnl lock is held.
ip6_mc_msfget() is called by do_ipv6_getsockopt(), but the caller does
not acquire the rtnl lock, so lockdep warns when these data are used in
ip6_mc_msfget(). Accessing them is actually safe, however, because the
socket lock was acquired by do_ipv6_getsockopt().
So, change the lockdep annotation from the rtnl lock to the socket lock
(rtnl_dereference -> sock_dereference).
The locking graph for mld data looks like this:

When writing mld data:
  do_ipv6_setsockopt()
    rtnl_lock
    lock_sock
      (mld functions)
        idev->mc_lock (if per-interface mld data is modified)

When reading mld data:
  do_ipv6_getsockopt()
    lock_sock
      ip6_mc_msfget()
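A minimal sketch of the kind of annotation change described above (the
surrounding code is illustrative only; the helper name comes from the
message, and an equivalent open-coded form is shown for clarity):

    /* before: lockdep expects the rtnl lock to be held */
    psl = rtnl_dereference(pmc->sflist);

    /* after: rely on the socket lock held by do_ipv6_getsockopt(),
     * i.e. the equivalent of:
     */
    psl = rcu_dereference_protected(pmc->sflist, lockdep_sock_is_held(sk));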
Splat looks like:
=============================
WARNING: suspicious RCU usage
5.12.0-rc4+ #503 Not tainted
-----------------------------
net/ipv6/mcast.c:610 suspicious rcu_dereference_protected() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by mcast-listener-/923:
#0: ffff888007958a70 (sk_lock-AF_INET6){+.+.}-{0:0}, at:
ipv6_get_msfilter+0xaf/0x190
stack backtrace:
CPU: 1 PID: 923 Comm: mcast-listener- Not tainted 5.12.0-rc4+ #503
Call Trace:
dump_stack+0xa4/0xe5
ip6_mc_msfget+0x553/0x6c0
? ipv6_sock_mc_join_ssm+0x10/0x10
? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
? mark_held_locks+0xb7/0x120
? lockdep_hardirqs_on_prepare+0x27c/0x3e0
? __local_bh_enable_ip+0xa5/0xf0
? lock_sock_nested+0x82/0xf0
ipv6_get_msfilter+0xc3/0x190
? compat_ipv6_get_msfilter+0x300/0x300
? lock_downgrade+0x690/0x690
do_ipv6_getsockopt.isra.6.constprop.13+0x1809/0x29e0
? do_ipv6_mcast_group_source+0x150/0x150
? register_lock_class+0x1750/0x1750
? kvm_sched_clock_read+0x14/0x30
? sched_clock+0x5/0x10
? sched_clock_cpu+0x18/0x170
? find_held_lock+0x3a/0x1c0
? lock_downgrade+0x690/0x690
? ipv6_getsockopt+0xdb/0x1b0
ipv6_getsockopt+0xdb/0x1b0
[ ... ]
Fixes: 88e2ca3080 ("mld: convert ifmcaddr6 to RCU")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some lines of code can be merged into an equivalent 'skb_add_rx_frag()'
call, which is less verbose.
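For illustration, the kind of open-coded sequence this refers to versus
the merged call (variable names are illustrative):

    /* before: several separate steps */
    skb_fill_page_desc(skb, 0, page, offset, len);
    skb->len += len;
    skb->data_len += len;
    skb->truesize += truesize;

    /* after: one equivalent, less verbose call */
    skb_add_rx_frag(skb, 0, page, offset, len, truesize);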
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This ++ is confusing. It looks like a duplicate of the increment already
performed in 'skb_fill_page_desc()'.
In fact, it is harmless: 'nr_frags' is written twice with the same value,
once because of the nr_frags++ and once because of the 'nr_frags = i + 1'
in 'skb_fill_page_desc()'.
So axe this post-increment to avoid confusion.
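A simplified illustration of the pattern being described (not necessarily
the driver's exact code):

    /* before: the post-increment sets nr_frags to i + 1, and
     * skb_fill_page_desc() then writes nr_frags = i + 1 again
     */
    skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags++, page, off, len);

    /* after: let skb_fill_page_desc() update nr_frags on its own */
    skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags, page, off, len);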
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some lines of code can be merged into an equivalent 'skb_add_rx_frag()'
call, which is less verbose.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
'page_address(skb_frag_page()) + skb_frag_off()' can be replaced by the
equivalent 'skb_frag_address()', which is less verbose.
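For reference, the equivalence being used ('frag' and 'va' are
illustrative names):

    /* before */
    va = page_address(skb_frag_page(frag)) + skb_frag_off(frag);

    /* after: equivalent, less verbose */
    va = skb_frag_address(frag);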
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need to use 'list_for_each_entry_safe' here, as nothing is
removed from the list in the 'for' loop.
Use 'list_for_each_entry' instead; it is slightly less verbose.
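A generic illustration of the difference ('pos', 'tmp', 'head' and 'list'
are illustrative names):

    /* the 'safe' variant is only needed when entries may be removed
     * while iterating
     */
    list_for_each_entry_safe(pos, tmp, &head, list)
            do_something(pos);

    /* nothing is removed here, so the plain iterator is enough */
    list_for_each_entry(pos, &head, list)
            do_something(pos);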
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
X.25 Layer 3 (the Packet Layer) expects layer 2 to provide a reliable
datalink service such that no packets are reordered or dropped. And
X.25 Layer 2 (the LAPB layer) is indeed designed to provide such service.
However, this reliability is not preserved when a driver calls "netif_rx"
to deliver the received packets to layer 3, because "netif_rx" will put
the packets into per-CPU queues before they are delivered to layer 3.
If there are multiple CPUs, the order of the packets may not be preserved.
The per-CPU queues may also drop packets if there are too many.
Therefore, we should not call "netif_rx" to let it queue the packets.
Instead, we should use our own queue that won't reorder or drop packets.
This patch changes all X.25 drivers to use their own queues instead of
calling "netif_rx". The patch also documents this requirement in the
"x25-iface" documentation.
Cc: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'skb_push()'/'skb_postpush_rcsum()' can be replaced by the equivalent
'skb_push_rcsum()', which is less verbose.
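For reference, the equivalence being used:

    /* before */
    skb_push(skb, len);
    skb_postpush_rcsum(skb, skb->data, len);

    /* after: equivalent, less verbose */
    skb_push_rcsum(skb, len);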
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'mlx5-updates-2021-04-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2021-04-02
This series provides trivial updates and cleanup to mlx5 driver
1) Support for matching on ct_state inv and rel flag in connection tracking
2) Reject TC rules that redirect from a VF to itself
3) Parav provided some E-Switch cleanups that could be summarized to:
3.1) Packing and Reduce structure sizes
3.2) Dynamic allocation of rate limit tables and structures
4) Vu Makes the netdev arfs and vlan tables allocation dynamic.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ong Boon Leong says:
====================
stmmac: Add XDP support
This is the v4 patch series for adding XDP native support to stmmac.
Changes in v4:
5/6: Move the TX clean timer setup to the end of the NAPI RX process and
     group it under stmmac_finalize_xdp_rx().
     Also, fix stmmac_xdp_xmit_back() to return STMMAC_XDP_CONSUMED
     if the XDP buffer to XDP frame conversion fails.
6/6: Move xdp_do_flush() into stmmac_finalize_xdp_rx() and combine
     the XDP verdicts of XDP TX and XDP REDIRECT.
I retested the patch series with the 'xdp2' and 'xdp_redirect' samples
affected by the changes above and found the results satisfactory.
History of previous patch series:
v3: https://patchwork.kernel.org/project/netdevbpf/cover/20210331154135.8507-1-boon.leong.ong@intel.com/
v2: https://patchwork.kernel.org/project/netdevbpf/list/?series=457757
v1: https://patchwork.kernel.org/project/netdevbpf/list/?series=457139
It would be great if the community could help test or review the v4
series and provide any input.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for XDP_REDIRECT to another remote CPU for
further action. It also implements the ndo_xdp_xmit ops, enabling the
driver to transmit packets forwarded to it by an XDP program running on
another interface.
This patch has been tested using "xdp_redirect_cpu" for XDP_REDIRECT
+ drop testing. It has also been tested with the "xdp_redirect" sample
app, which can be used to exercise the ndo_xdp_xmit ops. The burst
traffic is generated using pktgen_sample03_burst_single_flow.sh in the
samples/pktgen directory.
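As background, the usual shape of XDP_REDIRECT handling in a driver's RX
path looks roughly like the sketch below (generic, not stmmac's exact
code; 'prog', 'xdp' and 'redirect_done' are illustrative names):

    act = bpf_prog_run_xdp(prog, &xdp);
    if (act == XDP_REDIRECT) {
            /* the core picks the target: a CPU map, another device, etc. */
            if (xdp_do_redirect(dev, &xdp, prog) < 0)
                    act = XDP_DROP;         /* recycle the buffer locally */
            else
                    redirect_done = true;
    }

    /* once per NAPI poll, after the RX loop has finished */
    if (redirect_done)
            xdp_do_flush();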
v4: Moved xdp_do_flush() processing into stmmac_finalize_xdp_rx() and
    combined the XDP verdicts of XDP TX and REDIRECT.
v3: Added 'nq->trans_start = jiffies' to avoid TX time-out as we are
    sharing the TX queue between the slow path and XDP. Thanks to
    Jakub Kicinski for pointing this out.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for XDP_TX action which enables XDP program to
transmit back received frames.
This patch has been tested with the "xdp2" app located in samples/bpf
dir. The DUT receives burst traffic packet generated using pktgen script
'pktgen_sample03_burst_single_flow.sh'.
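For context, XDP_TX handling in a driver generally converts the xdp_buff
into an xdp_frame and queues it on a TX ring; a rough sketch (function
and return-code names are illustrative, not the driver's exact ones):

    static int xdp_xmit_back(struct my_priv *priv, struct xdp_buff *xdp)
    {
            struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);

            /* conversion can fail (e.g. not enough headroom for the
             * frame metadata); treat the buffer as consumed/dropped
             */
            if (unlikely(!xdpf))
                    return MY_XDP_CONSUMED;

            if (my_xmit_xdpf(priv, xdpf))       /* queued on a TX ring */
                    return MY_XDP_TX;

            return MY_XDP_CONSUMED;             /* ring full: drop */
    }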
v4: Moved stmmac_tx_timer_arm() to be done once at the end of NAPI RX.
Fixed stmmac_xdp_xmit_back() to return STMMAC_XDP_CONSUMED if
XDP buffer to frame conversion fails. Thanks to Jakub's input.
v3: Added 'nq->trans_start = jiffies' to avoid TX time-out as we are
    sharing the TX queue between the slow path and XDP. Thanks to
    Jakub Kicinski for pointing this out.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the initial XDP support to stmmac driver. It supports
XDP_PASS, XDP_DROP and XDP_ABORTED actions. Upcoming patches will add
support for XDP_TX and XDP_REDIRECT.
To support XDP headroom, this patch adds page_offset to the RX buffer and
changes the dma_sync_single_for_{device,cpu}() handling. The DMA
addresses used for RX operations are changed to take page_offset into
account too. As page_pool can handle dma_sync_single_for_device() on
behalf of the driver with the PP_FLAG_DMA_SYNC_DEV flag, we skip doing
that in the stmmac driver.
The current stmmac driver supports split header (SPH) in RX, but the
flexibility of splitting header and payload at different positions makes
it very complex to support for XDP processing. In addition, jumbo frames
are not supported in XDP to keep the initial code simple.
This patch has been tested with the sample app "xdp1" located in the
samples/bpf directory for both SKB and Native (XDP) mode. The burst
traffic is generated using pktgen_sample03_burst_single_flow.sh in the
samples/pktgen directory.
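A rough sketch of the RX-side pattern being described: reserve XDP
headroom in the buffer, build the xdp_buff around it, and map the
verdict. Names other than the core XDP helpers are illustrative:

    struct xdp_buff xdp;

    xdp_init_buff(&xdp, buf_size, &rx_q->xdp_rxq);
    /* page_offset leaves XDP_PACKET_HEADROOM in front of the frame */
    xdp_prepare_buff(&xdp, page_address(page), page_offset, len, false);

    switch (bpf_prog_run_xdp(prog, &xdp)) {
    case XDP_PASS:
            /* account for any head/tail adjustment done by the program */
            len = xdp.data_end - xdp.data;
            break;
    case XDP_DROP:
    case XDP_ABORTED:
    default:
            /* recycle the page back to the page_pool */
            break;
    }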
Changes in v3:
- factor in xdp header and tail adjustment done by XDP program.
Thanks to Jakub Kicinski for pointing out the gap in v2.
Changes in v2:
- fix for "warning: variable 'len' set but not used" reported by lkp.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch organizes the TX tail pointer update into a new function
called stmmac_flush_tx_descriptors() so that we can reuse it in
stmmac_xmit(), stmmac_tso_xmit() and the upcoming XDP implementation.
Changes to v2:
- Fix for warning: unused variable ‘desc_size’
https://patchwork.hopto.org/static/nipa/457321/12170149/build_32bit/stderr
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SPH functionality splits header and payload according to the split mode
and offset fields (SPLM and SPLOFST). It is beneficial for Linux network
stack RX processing; however, it adds a lot of complexity to XDP
processing.
So, this patch stores the split-header (SPH) capability of the controller
in "priv->sph_cap", and the enabling/disabling of SPH is decided by
"priv->sph".
This prepares the initial XDP enabling for stmmac, which disables the use
of SPH whenever XDP is enabled.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Certain platforms like Intel mGBE have independent hardware IRQ resources
for TX and RX DMA operation. In preparation for supporting XDP TX, we add
an IRQ affinity hint to group both the RX and TX queues of the same queue
ID onto the same CPU.
Changes in v2:
- The IRQ affinity hint needs to be set to NULL before the IRQ is
  released. Thanks to Song, Yoong Siang for reporting the issue.
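A small sketch of the affinity-hint pattern described above (queue and
IRQ variable names are illustrative):

    /* group RX and TX IRQs of the same queue ID onto the same CPU */
    cpumask_t mask;

    cpumask_clear(&mask);
    cpumask_set_cpu(queue_id % num_online_cpus(), &mask);
    irq_set_affinity_hint(rx_irq, &mask);
    irq_set_affinity_hint(tx_irq, &mask);

    /* and before free_irq(): the hint must be cleared again */
    irq_set_affinity_hint(rx_irq, NULL);
    irq_set_affinity_hint(tx_irq, NULL);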
Reported-by: Song, Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dynamically allocate the vlan table in mlx5e_priv for the EN netdev
when needed. Don't allocate it for the representor netdev.
Signed-off-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Dynamically allocate the arfs table in mlx5e_priv for the EN netdev
when needed. Don't allocate it for the representor netdev.
Signed-off-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Since there are self loopback prevention mechanisms at the
VF level, offloading such rules which redirect from a VF
to itself in the eswitch will break the datapath since the
packets will be dropped once they go back to the vport they
came from.
Therefore, offloading such rules will be rejected and left to
be handled by SW.
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The ida_simple_get() and ida_simple_remove() functions are deprecated.
Related change:
commit 3264ceec8f ("lib/idr.c: document that ida_simple_{get,remove}() are deprecated")
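For reference, the replacement suggested by that documentation
(illustrative usage):

    /* deprecated */
    id = ida_simple_get(&ida, 0, end, GFP_KERNEL);  /* range [0, end) */
    ida_simple_remove(&ida, id);

    /* preferred: note ida_alloc_max() takes an inclusive maximum */
    id = ida_alloc_max(&ida, end - 1, GFP_KERNEL);
    ida_free(&ida, id);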
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Function QoS related fields are already defined in the qos struct, but
the min and max rate were left out in the mlx5_vport_info struct.
Move them to the existing qos struct.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add the missing mutex_destroy() to pair with mutex_init().
mutex_destroy() should be done only when the table is initialized; hence,
perform mutex_init() only when the table is initialized.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
A device supports 128 rate limiters. A static table allocation consumes
8KB of memory even when no rate is configured.
Instead, allocate the table only when at least one rate is configured.
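A generic sketch of the on-demand allocation pattern described (structure
and field names are illustrative, not mlx5's):

    /* allocate the table only when the first rate is configured */
    if (!table->rl_entry) {
            table->rl_entry = kcalloc(table->max_size,
                                      sizeof(*table->rl_entry),
                                      GFP_KERNEL);
            if (!table->rl_entry)
                    return -ENOMEM;
    }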
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Rate limit entry refcount can be incremented uniformly when it is newly
allocated or reused.
So simplify the code to increment refcount at one place.
Use decrement refcount helper in two routines.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Use helper routines to allocate and free rate limit table entries.
A subsequent patch extends the use of these helpers to do the allocation
during the rate entry allocation callback.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Table max_size, min rate and max rate are constants initialized when the
table is created. Reading them doesn't require holding the table mutex,
hence read them without holding it.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Fix the warning due to missing int.
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
+ unsigned free_count;
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Add support for matching on ct_state inv and rel flags.
Currently the support is only for match on -inv and -rel.
Matching on +inv and +rel will be rejected.
Example:
$ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \
ct_state -est-rel+trk \
action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_1 ingress prio 1 chain 1 proto ip flower \
ct_state +trk+est-inv \
action mirred egress redirect dev ens1f0_0
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Group all the often used fields in the first cache line,
to reduce cache line misses.
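The general idea, as a sketch (struct and field names are illustrative):

    struct example_stats {
            /* hot fields touched on every packet: first cache line */
            u64     rx_packets;
            u64     rx_bytes;
            u64     tx_packets;
            u64     tx_bytes;

            /* rarely used fields pushed further down */
            u64     rx_errors;
            u64     collisions;
    } ____cacheline_aligned_in_smp;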
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Order fields to increase locality for the most used protocols. udplite
and icmp are moved to the end, as is proc_net_devsnmp6, which is not used
in the fast path.
This potentially saves one cache line miss for typical TCP/UDP over
IPv4/IPv6.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the "type_a->nfcid_len" is too large then it would lead to memory
corruption in pn533_target_found_type_a() when we do:
memcpy(nfc_tgt->nfcid1, tgt_type_a->nfcid_data, nfc_tgt->nfcid1_len);
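A sketch of the kind of bounds check that prevents this (the exact
placement in the driver's validity checks is illustrative):

    /* nfcid1[] in struct nfc_target holds at most NFC_NFCID1_MAXSIZE bytes */
    if (type_a->nfcid_len > NFC_NFCID1_MAXSIZE)
            return false;   /* reject the frame as invalid */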
Fixes: c3b1e1e8a7 ("NFC: Export NFCID1 from pn533")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei says:
====================
dpaa2-eth: add rx copybreak support
DMA unmapping, allocating a new buffer and DMA mapping it back on the
refill path is really not that efficient. Proper buffer recycling (page
pool, flipping the page and using the other half) cannot be done for
DPAA2 since it's not a ring-based controller; rather, it deals with
multiple queues which all get their buffers from the same buffer pool on
Rx.
To circumvent these limitations, add support for Rx copybreak in
dpaa2-eth.
Below you can find a summary of the tests that were run to end up
with the default rx copybreak value of 512.
A bit about the setup - an LS2088A SoC, 8 x Cortex A72 @ 1.8GHz, IPfwd
zero-loss test @ 20Gbit/s throughput. I tested multiple frame sizes to
get an idea of where the break-even point is.
Here are 2 sets of results: (1) is the baseline and (2) is just
allocating a new skb for all frame sizes received (as if the copybreak
were equal to the MTU). All numbers are in Mpps.
64 128 256 512 640 768 896
(1) 3.23 3.23 3.24 3.21 3.1 2.76 2.71
(2) 3.95 3.88 3.79 3.62 3.3 3.02 2.65
It seems that even for 512-byte frames it's comfortably better to
allocate a new skb. After that, we see diminishing returns or even worse
results.
Changes in v2:
- properly marked dpaa2_eth_copybreak as static
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
It's useful, especially for debugging purposes, to have the Rx copybreak
value changeable at runtime. Export it as an ethtool tunable.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
DMA unmapping, allocating a new buffer and DMA mapping it back on the
refill path is really not that efficient. Proper buffer recycling (page
pool, flipping the page and using the other half) cannot be done for
DPAA2 since it's not a ring-based controller; rather, it deals with
multiple queues which all get their buffers from the same buffer pool on
Rx.
To circumvent these limitations, add support for Rx copybreak. For
small-sized packets, instead of building an skb around the buffer in
which the frame was received, allocate a new skb altogether, copy the
contents of the frame, and release the initial page back into the buffer
pool.
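A condensed sketch of the copybreak decision on the RX path (helper and
variable names are illustrative, not the driver's exact ones):

    if (fd_length <= priv->rx_copybreak) {
            /* small frame: copy into a freshly allocated skb and give
             * the original buffer straight back to the pool
             */
            skb = napi_alloc_skb(napi, fd_length);
            if (skb) {
                    skb_put_data(skb, fd_vaddr + fd_offset, fd_length);
                    recycle_buf(priv, ch, fd_addr);
                    return skb;
            }
    }

    /* large frame (or allocation failure): build the skb around the buffer */
    return build_linear_skb(ch, fd, fd_vaddr);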
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename the dpaa2_eth_xdp_release_buf function to dpaa2_eth_recycle_buf
since in the next patches we'll be using the same recycle mechanism for
the normal stack path besides XDP_DROP.
Also, rename the array which holds the buffers to be recycled so that it
does not have any reference to XDP.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mat Martineau says:
====================
MPTCP: Miscellaneous changes
Here is a collection of patches from the MPTCP tree:
Patches 1 and 2 add some helpful MIB counters for connection
information.
Patch 3 cleans up some unnecessary checks.
Patch 4 is a new feature, support for the MP_TCPRST option. This option
is used when resetting one subflow within an MPTCP connection, and
provides a reason code that the recipient can use when deciding how to
adapt to the lost subflow.
Patches 5-7 update the existing MPTCP selftests to improve timeout
handling and to share better information when tests fail.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Very occasionally, MPTCP selftests fail. Yeah, I saw that at least once!
Here we provide more details in case of errors with the mptcp_join.sh
script, like it was done with mptcp_connect.sh; see
commit 767389c8dd ("selftests: mptcp: dump more info on errors")
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is done so as not to be impacted by packets sent between sub-tests.
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'mptcp_connect' already has a timeout for poll() but in some cases, it
is not enough.
With "timeout" tool, we will force the command to fail if it doesn't
finish on time. Thanks to that, the script will continue and display
details about the current state before marking the test as failed.
Displaying this state is very important to be able to understand the
issue. It is better to have our CI report the issue than just say "the
test hung".
Note that in mptcp_connect.sh, we were using a long timeout to validate
the fact that we cannot create a socket if a sysctl is set. We don't need
this timeout.
In diag.sh, we want to send signals to mptcp_connect instances that have
been started in the netns. But we cannot send this signal to 'timeout'
otherwise that will stop the timeout and messages telling us SIGUSR1 has
been received will be printed. Instead of trying to find the right PIDs
and storing them in an array, we can simply use the output of
'ip netns pids', which is all the PIDs we want to send the signal to.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/160
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MPTCP reset option allows carrying an MPTCP-specific error code that
provides more information on the nature of a connection reset.
Reset option data received gets stored in the subflow context so it can
be sent to userspace via the 'subflow closed' netlink event.
When a subflow is closed, the desired error code that should be sent to
the peer is also placed in the subflow context structure.
If a reset is sent before subflow establishment could complete, e.g. on
HMAC failure during an MP_JOIN operation, the mptcp skb extension is
used to store the reset information.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we explicitly check for the first subflow being NULL in a
couple of places, even though we don't need any special action in such a
scenario.
Just drop the unneeded checks to avoid confusion.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are not currently tracking the active MPTCP connection
attempts. Let's add the related counters.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the MPTCP protocol is unable to create a new token, the socket falls
back to plain TCP; let's keep track of such events via a specific MIB
counter.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson says:
====================
ionic: add PTP and hw clock support
This patchset adds support for accessing the DSC hardware clock and
for offloading PTP timestamping.
Tx packet timestamping happens through a separate Tx queue set up with
expanded completion descriptors that can report the timestamp.
Rx timestamping can happen either on all queues, or on a separate
timestamping queue when specific filtering is requested. Again, the
timestamps are reported with the expanded completion descriptors.
The timestamping offload ability is advertised but not enabled until an
OS service asks for it. At that time the driver's queues are reconfigured
to use the different completion descriptors and the private processing
queues as needed.
Reading the raw clock value comes through a new pair of values in the
device info registers in BAR0. These high and low values are interpreted
with help from new clock mask, mult, and shift values in the device
identity information.
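The described conversion follows the kernel's usual cyclecounter pattern;
roughly (register and field names here are illustrative, not the actual
ionic identifiers):

    /* read the split 64-bit counter; re-read if the high word rolled over */
    do {
            hi = ioread32(&regs->hwstamp_hi);
            lo = ioread32(&regs->hwstamp_lo);
    } while (hi != ioread32(&regs->hwstamp_hi));

    cycles = (((u64)hi << 32) | lo) & ident->hwstamp_mask;

    /* scale raw ticks to nanoseconds with the advertised mult/shift */
    ns = (cycles * ident->hwstamp_mult) >> ident->hwstamp_shift;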
First we add the ability to detect new queue features, then the handling
of the new descriptor sizes. After adding the new interface structures,
we start adding the support code, saving the advertising to the stack
for last.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Let the network stack know we've got support for timestamping
the packets.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the new hwstamp stats to our ethtool stats output.
Signed-off-by: Allen Hubbe <allenbh@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>