linux

iv/linux

Author	SHA1	Message	Date
Parav Pandit	fcf8ec54b0	net/mlx5: E-switch, Remove vport enabled check An eswitch vport of the devlink port is always enabled before a devlink port is registered. And a eswitch vport is always disabled after a devlink port is unregistered. Hence avoid the vport enabled check in the devlink callback routine. Such check is only applicable in the legacy SR-IOV callbacks. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Sunil Sudhakar Rani <sunrani@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-11-16 20:31:51 -08:00
Chris Mi	819c319c8c	net/mlx5e: Specify out ifindex when looking up decap route There is a use case that the local and remote VTEPs are in the same host. Currently, the out ifindex is not specified when looking up the decap route for offloads. So in this case, a local route is returned and the route dev is lo. Actual tunnel interface can be created with a parameter "dev" [1], which specifies the physical device to use for tunnel endpoint communication. Pass this parameter to driver when looking up decap route for offloads. So that a unicast route will be returned. [1] ip link add name vxlan1 type vxlan id 100 dev enp4s0f0 remote 1.1.1.1 dstport 4789 Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-11-16 20:31:51 -08:00
Roi Dayan	fc3a879aea	net/mlx5e: TC, Move comment about mod header flag to correct place Move the comment to the correct place where the driver actually removes the flag and not in the check that maybe pedit actions exists. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-11-16 20:31:50 -08:00
Roi Dayan	88d9748604	net/mlx5e: TC, Move kfree() calls after destroying all resources When deleting fdb/nic flow rules first release all resources and then call the kfree() calls instead of sparse them around the function. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-11-16 20:31:50 -08:00
Roi Dayan	972fe492e8	net/mlx5e: TC, Destroy nic flow counter if exists Counter is only added if counter flag exists. So check the counter fag exists for deleting the counter. This is the same as in add/del fdb flow. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-11-16 20:31:49 -08:00
Yihao Han	0164a9bd9d	net/mlx5: TC, using swap() instead of tmp variable swap() was used instead of the tmp variable to swap values Signed-off-by: Yihao Han <hanyihao@vivo.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-11-16 20:31:49 -08:00
Paul Blakey	1cfd3490f2	net/mlx5: CT: Allow static allocation of mod headers As each CT rule uses at least 4 modify header actions, each rule causes at least 3 reallocations by the mod header actions api. Allow initial static allocation of the mod acts array, and use it for CT rules. If the static allocation is exceeded go back to dynamic allocation. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com>	2021-11-16 20:31:48 -08:00
Paul Blakey	2c0e5cf520	net/mlx5e: Refactor mod header management API For all mod hdr related functions to reside in a single self contained component (mod_hdr.c), refactor alloc() and add get_id() so that user won't rely on internal implementation, and move both to mod_hdr component. Rename the prefix to mlx5e_mod_hdr_* as other mod hdr functions. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-11-16 20:31:48 -08:00
Aya Levin	f28a14c1dc	net/mlx5: Avoid printing health buffer when firmware is unavailable Use firmware version field as an indication to health buffer's sanity. When firmware version is 0xFFFFFFFF, deduce that firmware is unavailable and avoid printing the health buffer to dmesg as it doesn't provide debug info. Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-11-16 20:31:48 -08:00
Saeed Mahameed	aef0f8c67d	net/mlx5: Fix format-security build warnings Treat the string as an argument to avoid this. drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:482:5: error: format string is not a string literal (potentially insecure) name); ^~~~ drivers/net/ethernet/mellanox/mlx5/core/en_stats.c:2079:4: error: format string is not a string literal (potentially insecure) ptp_ch_stats_desc[i].format); ^~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com>	2021-11-16 20:31:47 -08:00
Saeed Mahameed	bc541621f8	net/mlx5e: Support ethtool cq mode Add support for ethtool coalesce cq mode set and get. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>	2021-11-16 20:31:47 -08:00
Russell King (Oracle)	b9241f5413	net: document SMII and correct phylink's new validation mechanism SMII has not been documented in the kernel, but information on this PHY interface mode has been recently found. Document it, and correct the recently introduced phylink handling for this interface mode. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/E1mmfVl-0075nP-14@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-11-16 19:22:30 -08:00
Jakub Kicinski	be0f6c4100	Merge branch 'r8169-disable-detection-of-further-chip-versions-that-didn-t-make-it-to-the-mass-market' Heiner Kallweit says: ==================== r8169: disable detection of further chip versions that didn't make it to the mass market There's no sign of life from further chip versions. Seems they didn't make it to the mass market. Let's disable detection and if nobody complains remove support a few kernel versions later. ==================== Link: https://lore.kernel.org/r/7708d13a-4a2b-090d-fadf-ecdd0fff5d2e@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-11-16 19:10:35 -08:00
Heiner Kallweit	364ef1f378	r8169: disable detection of chip version 41 It seems this chip version never made it to the wild. Therefore disable detection and if nobody complains remove support completely later. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-11-16 19:10:32 -08:00
Heiner Kallweit	6c8a5cf97c	r8169: disable detection of chip version 45 It seems this chip version never made it to the wild. Therefore disable detection and if nobody complains remove support completely later. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-11-16 19:10:31 -08:00
Heiner Kallweit	2d6600c754	r8169: disable detection of chip versions 49 and 50 It seems these chip versions never made it to the wild. Therefore disable detection and if nobody complains remove support completely later. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-11-16 19:10:31 -08:00
Heiner Kallweit	4b5f82f6aa	r8169: enable ASPM L1/L1.1 from RTL8168h With newer chip versions ASPM-related issues seem to occur only if L1.2 is enabled. I have a test system with RTL8168h that gives a number of rx_missed errors when running iperf and L1.2 is enabled. With L1.2 disabled (and L1 + L1.1 active) everything is fine. See also [0]. Can't test this, but L1 + L1.1 being active should be sufficient to reach higher package power saving states. [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1942830 Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/36feb8c4-a0b6-422a-899c-e61f2e869dfe@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-11-16 19:10:17 -08:00
Jakub Kicinski	c60c34a910	Merge branch 'net-better-packing-of-global-vars' Eric Dumazet says: ==================== net: better packing of global vars First two patches avoid holes in data section, and last patch makes sure some siphash keys are contained in a single cache line. ==================== Link: https://lore.kernel.org/r/20211115172303.3732746-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-11-16 19:07:58 -08:00
Eric Dumazet	49ecc2e9c3	net: align static siphash keys siphash keys use 16 bytes. Define siphash_aligned_key_t macro so that we can make sure they are not crossing a cache line boundary. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-11-16 19:07:54 -08:00
Eric Dumazet	7071732c26	net: use .data.once section in netdev_level_once() Same rationale than prior patch : using the dedicated section avoid holes and pack all these bool values. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-11-16 19:07:53 -08:00
Eric Dumazet	c2c60ea37e	once: use __section(".data.once") .data.once contains nicely packed bool variables. It is used already by DO_ONCE_LITE(). Using it also in DO_ONCE() removes holes in .data section. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-11-16 19:07:53 -08:00
David S. Miller	62803fec52	Merge branch 'inuse-cleanups' Eric Dumazet says: ==================== net: prot_inuse and sock_inuse cleanups Small series cleaning and optimizing sock_prot_inuse_add() and sock_inuse_add(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:20:45 +00:00
Eric Dumazet	b3cb764aa1	net: drop nopreempt requirement on sock_prot_inuse_add() This is distracting really, let's make this simpler, because many callers had to take care of this by themselves, even if on x86 this adds more code than really needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:20:45 +00:00
Eric Dumazet	4199bae10c	net: merge net->core.prot_inuse and net->core.sock_inuse net->core.sock_inuse is a per cpu variable (int), while net->core.prot_inuse is another per cpu variable of 64 integers. per cpu allocator tend to place them in very different places. Grouping them together makes sense, since it makes updates potentially faster, if hitting the same cache line. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:20:45 +00:00
Eric Dumazet	d477eb9004	net: make sock_inuse_add() available MPTCP hard codes it, let us instead provide this helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:20:45 +00:00
Eric Dumazet	2a12ae5d43	net: inline sock_prot_inuse_add() sock_prot_inuse_add() is very small, we can inline it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:20:45 +00:00
David S. Miller	abc3342a09	Merge branch 'gro-out-of-core-files' Eric Dumazet says: ==================== gro: get out of core files Move GRO related content into net/core/gro.c and include/net/gro.h. This reduces GRO scope to where it is really needed, and shrinks too big files (include/linux/netdevice.h and net/core/dev.c) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:16:54 +00:00
Eric Dumazet	587652bbdd	net: gro: populate net/core/gro.c Move gro code and data from net/core/dev.c to net/core/gro.c to ease maintenance. gro_normal_list() and gro_normal_one() are inlined because they are called from both files. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:16:54 +00:00
Eric Dumazet	e456a18a39	net: gro: move skb_gro_receive into net/core/gro.c net/core/gro.c will contain all core gro functions, to shrink net/core/skbuff.c and net/core/dev.c Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:16:54 +00:00
Eric Dumazet	0b935d7f8c	net: gro: move skb_gro_receive_list to udp_offload.c This helper is used once, no need to keep it in fat net/core/skbuff.c Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:16:54 +00:00
Eric Dumazet	4721031c35	net: move gro definitions to include/net/gro.h include/linux/netdevice.h became too big, move gro stuff into include/net/gro.h Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:16:54 +00:00
David S. Miller	6fcc06205c	Merge branch 'tcp-optimizations' Eric Dumazet says: ==================== tcp: optimizations for linux-5.17 Mostly small improvements in this series. The notable change is in "defer skb freeing after socket lock is released" in recvmsg() (and RX zerocopy) The idea is to try to let skb freeing to BH handler, whenever possible, or at least perform the freeing outside of the socket lock section, for much improved performance. This idea can probably be extended to other protocols. Tests on a 100Gbit NIC Max throughput for one TCP_STREAM flow, over 10 runs. MTU : 1500 (1428 bytes of TCP payload per MSS) Before: 55 Gbit After: 66 Gbit MTU : 4096+ (4096 bytes of TCP payload, plus TCP/IPv6 headers) Before: 82 Gbit After: 95 Gbit ==================== Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:10:35 +00:00
Eric Dumazet	43f51df417	net: move early demux fields close to sk_refcnt sk_rx_dst/sk_rx_dst_ifindex/sk_rx_dst_cookie are read in early demux, and currently spans two cache lines. Moving them close to sk_refcnt makes more sense, as only one cache line is needed. New layout for this hot cache line is : struct sock { struct sock_common __sk_common; /* 0 0x88 / / --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- / struct dst_entry sk_rx_dst; /* 0x88 0x8 / int sk_rx_dst_ifindex; / 0x90 0x4 / u32 sk_rx_dst_cookie; / 0x94 0x4 / socket_lock_t sk_lock; / 0x98 0x20 / atomic_t sk_drops; / 0xb8 0x4 / int sk_rcvlowat; / 0xbc 0x4 / / --- cacheline 3 boundary (192 bytes) --- */ Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:10:35 +00:00
Eric Dumazet	29fbc26e6d	tcp: do not call tcp_cleanup_rbuf() if we have a backlog Under pressure, tcp recvmsg() has logic to process the socket backlog, but calls tcp_cleanup_rbuf() right before. Avoiding sending ACK right before processing new segments makes a lot of sense, as this decrease the number of ACK packets, with no impact on effective ACK clocking. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:10:35 +00:00
Eric Dumazet	8bd172b787	tcp: check local var (timeo) before socket fields in one test Testing timeo before sk_err/sk_state/sk_shutdown makes more sense. Modern applications use non-blocking IO, while a socket is terminated only once during its life time. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:10:35 +00:00
Eric Dumazet	f35f821935	tcp: defer skb freeing after socket lock is released tcp recvmsg() (or rx zerocopy) spends a fair amount of time freeing skbs after their payload has been consumed. A typical ~64KB GRO packet has to release ~45 page references, eventually going to page allocator for each of them. Currently, this freeing is performed while socket lock is held, meaning that there is a high chance that BH handler has to queue incoming packets to tcp socket backlog. This can cause additional latencies, because the user thread has to process the backlog at release_sock() time, and while doing so, additional frames can be added by BH handler. This patch adds logic to defer these frees after socket lock is released, or directly from BH handler if possible. Being able to free these skbs from BH handler helps a lot, because this avoids the usual alloc/free assymetry, when BH handler and user thread do not run on same cpu or NUMA node. One cpu can now be fully utilized for the kernel->user copy, and another cpu is handling BH processing and skb/page allocs/frees (assuming RFS is not forcing use of a single CPU) Tested: 100Gbit NIC Max throughput for one TCP_STREAM flow, over 10 runs MTU : 1500 Before: 55 Gbit After: 66 Gbit MTU : 4096+(headers) Before: 82 Gbit After: 95 Gbit Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:10:35 +00:00
Eric Dumazet	3df684c1a3	tcp: avoid indirect calls to sock_rfree TCP uses sk_eat_skb() when skbs can be removed from receive queue. However, the call to skb_orphan() from __kfree_skb() incurs an indirect call so sock_rfee(), which is more expensive than a direct call, especially for CONFIG_RETPOLINE=y. Add tcp_eat_recv_skb() function to make the call before __kfree_skb(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:10:35 +00:00
Eric Dumazet	b96c51bd3b	tcp: tp->urg_data is unlikely to be set Use some unlikely() hints in the fast path. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:10:34 +00:00
Eric Dumazet	7b6a893a59	tcp: annotate races around tp->urg_data tcp_poll() and tcp_ioctl() are reading tp->urg_data without socket lock owned. Also, it is faster to first check tp->urg_data in tcp_poll(), then tp->urg_seq == tp->copied_seq, because tp->urg_seq is located in a different/cold cache line. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:10:34 +00:00
Eric Dumazet	0307a0b74b	tcp: annotate data-races on tp->segs_in and tp->data_segs_in tcp_segs_in() can be called from BH, while socket spinlock is held but socket owned by user, eventually reading these fields from tcp_get_info() Found by code inspection, no need to backport this patch to older kernels. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:10:34 +00:00
Eric Dumazet	d2489c7b6d	tcp: add RETPOLINE mitigation to sk_backlog_rcv Use INDIRECT_CALL_INET() to avoid an indirect call when/if CONFIG_RETPOLINE=y Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:10:34 +00:00
Eric Dumazet	93afcfd1db	tcp: small optimization in tcp recvmsg() When reading large chunks of data, incoming packets might be added to the backlog from BH. tcp recvmsg() detects the backlog queue is not empty, and uses a release_sock()/lock_sock() pair to process this backlog. We now have __sk_flush_backlog() to perform this a bit faster. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:10:34 +00:00
Eric Dumazet	91b6d32563	net: cache align tcp_memory_allocated, tcp_sockets_allocated tcp_memory_allocated and tcp_sockets_allocated often share a common cache line, source of false sharing. Also take care of udp_memory_allocated and mptcp_sockets_allocated. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:10:34 +00:00
Eric Dumazet	6c302e799a	net: forward_alloc_get depends on CONFIG_MPTCP (struct proto)->sk_forward_alloc is currently only used by MPTCP. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:10:34 +00:00
Eric Dumazet	1ace2b4d2b	net: shrink struct sock by 8 bytes Move sk_bind_phc next to sk_peer_lock to fill a hole. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:10:34 +00:00
Eric Dumazet	1b31debca8	ipv6: shrink struct ipcm6_cookie gso_size can be moved after tclass, to use an existing hole. (8 bytes saved on 64bit arches) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:10:34 +00:00
Eric Dumazet	aba546565b	net: remove sk_route_nocaps Instead of using a full netdev_features_t, we can use a single bit, as sk_route_nocaps is only used to remove NETIF_F_GSO_MASK from sk->sk_route_cap. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:10:34 +00:00
Eric Dumazet	d0d598ca86	net: remove sk_route_forced_caps We were only using one bit, and we can replace it by sk_is_tcp() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:10:34 +00:00
Eric Dumazet	42f67eea3b	net: use sk_is_tcp() in more places Move sk_is_tcp() to include/net/sock.h and use it where we can. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:10:34 +00:00
Eric Dumazet	3735440200	tcp: small optimization in tcp_v6_send_check() For TCP flows, inet6_sk(sk)->saddr has the same value than sk->sk_v6_rcv_saddr. Using sk->sk_v6_rcv_saddr increases data locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-16 13:10:34 +00:00

1 2 3 4 5 ...

1057854 Commits