To record the stats of header-split packets, three stats are added to
the driver's ethtool stats.
- rx_hsplit_pkt is the count of packets split with header split
- rx_hsplit_bytes is the count of header bytes received with header split
- rx_hsplit_unsplit_pkt is the count of packets left unsplit due to header
buffer overflow or zero header length when header split is enabled
Currently, the stats_update critical section is entered more than once
per packet. A future change will avoid that by letting all the
stats updates happen in one place at the end of
`gve_rx_poll_dqo`.
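As a rough sketch of the accounting (the field names mirror the stats
above, but the exact gve layout is an assumption), each split decision
bumps the ring's counters under its u64_stats_sync:

        /* Per-ring accounting for header split, guarded by the ring's
         * u64_stats_sync; hdr_len is the device-reported header length. */
        u64_stats_update_begin(&rx->statss);
        if (hdr_len && !hdr_overflow) {
                rx->rx_hsplit_pkt++;
                rx->rx_hsplit_bytes += hdr_len;
        } else {
                rx->rx_hsplit_unsplit_pkt++;
        }
        u64_stats_update_end(&rx->statss);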
Co-developed-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add header buffers and ethtool support to enable header split via the
tcp-data-split flag in ethtool's ringparam config. Coherent DMA memory
is allocated for the header buffers. There is one header buffer per ring
entry, each located by computing its offset from the header buffers'
starting address.
The header buffer is always copied directly into the skb, and the payload
is always added as frags. On a header buffer overflow, or when the
header length is 0, the driver places the whole unsplit packet in frags.
When toggling header split, the driver calls gve_adjust_config to
set up its queues appropriately. If header split is enabled by the user and
the max packet buffer size is no less than 4KB, the driver sets the
packet buffer size to 4KB to support TCP_ZEROCOPY_RECEIVE. Otherwise the
driver uses the default 2KB packet buffer size.
`ethtool -G <dev> tcp-data-split on/off` is the command to toggle header
split.
`ethtool -g <dev>` will show the status of header split in the
`tcp-data-split` field.
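A minimal sketch of the reporting side, using the kernel's ethtool
ringparam extension; the priv field name is an assumption:

static void gve_get_ringparam(struct net_device *netdev,
                              struct ethtool_ringparam *cmd,
                              struct kernel_ethtool_ringparam *kernel_cmd,
                              struct netlink_ext_ack *extack)
{
        struct gve_priv *priv = netdev_priv(netdev);

        /* Report header-split state via the tcp-data-split attribute. */
        kernel_cmd->tcp_data_split = priv->header_split_enabled ?
                                     ETHTOOL_TCP_DATA_SPLIT_ENABLED :
                                     ETHTOOL_TCP_DATA_SPLIT_DISABLED;
}

The set path would similarly read kernel_cmd->tcp_data_split and route
the new setting through gve_adjust_config.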
Co-developed-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To enable header split via ethtool, we first need to query the device for
the max rx buffer size and the header buffer size. Add a device option
to get these values and store them in the driver. If the header buffer
size received from the device is non-zero, header split is
supported by the device.
Currently the max rx buffer size is only used when header split is
enabled, in which case data_buffer_size_dqo is set to the max rx buffer
size. Also change data_buffer_size_dqo from int to u16, since we are
modifying it anyway and it should be consistent with max_rx_buffer_size.
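Illustrative shape of the option handling, a sketch under assumed struct
and field names (the wire format is big-endian, as elsewhere in the admin
queue):

static void gve_process_buffer_sizes_option(struct gve_priv *priv,
                const struct gve_device_option_buffer_sizes *opt)
{
        priv->max_rx_buffer_size = be16_to_cpu(opt->max_rx_buffer_size);
        priv->header_buf_size = be16_to_cpu(opt->header_buf_size);
        /* A non-zero header buffer size advertises header-split support. */
        priv->header_split_supported = priv->header_buf_size != 0;
}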
Co-developed-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, each caller of gve_rx_alloc_buffer had to increment the
failure counter itself, and as a result one caller was not tracking those
failures. Increment the counter at a single common location so callers
don't have to duplicate code or risk missing the counter update.
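A simplified sketch of the idea; gve_alloc_page_for_rx() is a hypothetical
stand-in for the condensed allocation call, and the counter name is
assumed from the driver's existing rx stats:

static int gve_rx_alloc_buffer(struct gve_priv *priv, struct gve_rx_ring *rx)
{
        int err = gve_alloc_page_for_rx(priv, rx);      /* condensed */

        if (err) {
                /* Account the failure here so no caller can forget to. */
                u64_stats_update_begin(&rx->statss);
                rx->rx_buf_alloc_fail++;
                u64_stats_update_end(&rx->statss);
        }
        return err;
}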
Signed-off-by: Ankit Garg <nktgrg@google.com>
Link: https://lore.kernel.org/r/20240124205435.1021490-1-nktgrg@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
gve_open is rewritten to be composed of two funcs: gve_queues_mem_alloc
and gve_queues_start. The former only allocates queue resources without
doing anything to install the queues, which is taken up by the latter.
Similarly gve_close is split into gve_queues_stop and
gve_queues_mem_free.
Separating the act of allocating queue resources from that of making the
queues live helps with subsequent changes that aim to not take down the
datapath when applying new configurations.
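A sketch of the resulting composition (error handling condensed; the cfg
parameters stand in for the new allocation-config interfaces):

static int gve_open(struct net_device *dev)
{
        struct gve_priv *priv = netdev_priv(dev);
        struct gve_tx_alloc_rings_cfg tx_cfg = {0};
        struct gve_rx_alloc_rings_cfg rx_cfg = {0};
        int err;

        err = gve_queues_mem_alloc(priv, &tx_cfg, &rx_cfg);
        if (err)
                return err;

        /* Queues only become visible to the stack and device here. */
        err = gve_queues_start(priv, &tx_cfg, &rx_cfg);
        if (err)
                gve_queues_mem_free(priv, &tx_cfg, &rx_cfg);
        return err;
}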
Signed-off-by: Shailend Chand <shailend@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Link: https://lore.kernel.org/r/20240122182632.1102721-5-shailend@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The new config-aware functions will help achieve the goal of being able
to allocate resources for new queues while there already are active
queues serving traffic.
These new functions work off of arbitrary queue allocation configs
rather than just the currently active config in priv, and they return
the newly allocated resources instead of writing them into priv.
Signed-off-by: Shailend Chand <shailend@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240122182632.1102721-4-shailend@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Queue allocation functions can currently only allocate into priv and
free memory in priv. These new structs will be passed into the queue
functions in a subsequent change to make them capable of returning newly
allocated resources rather than just writing them into priv. They also
make it possible to allocate resources for queues with a different config
than that of the currently active queues.
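Illustrative shape of one such struct; the fields are assumptions about
what a queue-allocation config needs to carry:

struct gve_rx_alloc_rings_cfg {
        struct gve_queue_config *qcfg;  /* desired rx queue counts */
        u16 ring_size;                  /* descriptors per ring */
        bool raw_addressing;            /* queue format in use */
        struct gve_rx_ring *rx;         /* output: the allocated rings */
};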
Signed-off-by: Shailend Chand <shailend@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240122182632.1102721-2-shailend@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This register is required on platforms with page sizes greater than 4k.
This is because the tx side of the driver vmaps the entire queue page
list of pages into a single flat address space, then uses the entire
space. Without communicating the guest page size to the backend, the
backend will only access the first 4k of each page in the queue page list.
Signed-off-by: Jordan Kimbrough <jrkim@google.com>
Signed-off-by: John Fraker <jfraker@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20231128002648.320892-5-jfraker@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
adminq_pfn assumes a page size of 4k, causing this mechanism to break in
kernels compiled with different page sizes. A new PCI device revision was
needed for the device to be able to communicate with the driver how to
set up the admin queue prior to having access to the admin queue.
Signed-off-by: Jordan Kimbrough <jrkim@google.com>
Signed-off-by: John Fraker <jfraker@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20231128002648.320892-3-jfraker@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It is suboptimal to attempt skb linearization from ndo_start_xmit()
if a gso skb has a pathological layout, or if the host stack does not have
access to the payload (TCP direct). Linearization of large skbs
can also fail under memory pressure.
We should instead have an ndo_features_check() so that we can
fall back to GSO, which is supported even for TCP direct,
and is generally much more efficient (no payload copy).
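A condensed sketch of the approach; gve_can_send_tso() stands in for
whatever layout check the driver applies:

static netdev_features_t gve_features_check(struct sk_buff *skb,
                                            struct net_device *dev,
                                            netdev_features_t features)
{
        /* If the gso skb's layout cannot be handled by the hardware,
         * drop the GSO features for this skb so the stack falls back
         * to software segmentation instead of linearization. */
        if (skb_is_gso(skb) && !gve_can_send_tso(skb))
                return features & ~NETIF_F_GSO_MASK;
        return features;
}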
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Bailey Forrest <bcf@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jeroen de Borst <jeroendb@google.com>
Cc: Praveen Kaligineedi <pkaligineedi@google.com>
Cc: Shailend Chand <shailend@google.com>
Cc: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netpoll will explicitly pass the polling call a budget of 0 to
indicate it's clearing the Tx path only. gve_rx_poll and
gve_xdp_poll were mistakenly taking a 0 budget as an indication
to do all the work. Add checks to avoid the rx path and xdp path being
called when the budget is 0, and also avoid calling napi_complete_done
when the budget is 0 for netpoll.
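A simplified sketch of the poll handler after the fix; the helper names
follow the driver, the body is condensed, and gve_irq_rearm_dqo() is a
hypothetical stand-in for the interrupt re-arm step:

static int gve_napi_poll_dqo(struct napi_struct *napi, int budget)
{
        struct gve_notify_block *block =
                container_of(napi, struct gve_notify_block, napi);
        bool reschedule = false;
        int work_done = 0;

        if (block->tx)
                reschedule |= gve_tx_poll_dqo(block, /*do_clean=*/true);

        /* A budget of 0 means netpoll is cleaning the Tx path only:
         * do no rx work and never call napi_complete_done(). */
        if (!budget)
                return 0;

        if (block->rx) {
                work_done = gve_rx_poll_dqo(block, budget);
                reschedule |= work_done == budget;
        }

        if (reschedule)
                return budget;

        if (likely(napi_complete_done(napi, work_done)))
                gve_irq_rearm_dqo(block);       /* hypothetical helper */
        return work_done;
}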
Fixes: f5cedc84a3 ("gve: Add transmit and receive support")
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Link: https://lore.kernel.org/r/20231114004144.2022268-1-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If, for any reason, `tx_stats_num + rx_stats_num` wraps around, the
protection that struct_size() adds against potential integer overflows
is defeated. Fix this by hardening the call to struct_size() with size_add().
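The hardened call site (a sketch; the destination field name is assumed):
size_add() saturates at SIZE_MAX on overflow, so struct_size() can no
longer be handed a wrapped-around count.

        priv->stats_report_len = struct_size(priv->stats_report, stats,
                                             size_add(tx_stats_num,
                                                      rx_stats_num));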
Fixes: 691f4077d5 ("gve: Replace zero-length array with flexible-array member")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
gve_rx_append_frags() is able to build skbs chained with frag_list,
like GRO engine.
Problem is that shinfo->frag_list should only be used
for the head of the chain.
All other links should use skb->next pointer.
Otherwise, built skbs are not valid and can cause crashes.
Equivalent code in GRO (skb_gro_receive()) is:

        if (NAPI_GRO_CB(p)->last == p)
                skb_shinfo(p)->frag_list = skb;
        else
                NAPI_GRO_CB(p)->last->next = skb;
        NAPI_GRO_CB(p)->last = skb;
Fixes: 9b8dd5e5ea ("gve: DQO: Add RX path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Bailey Forrest <bcf@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Catherine Sullivan <csully@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RX path allocates the QPL page pool at queue creation and
tries to reuse these pages through page recycling. This patch
ensures that on refill no non-QPL pages are posted to the device.
When the driver is running low on free buffers, an on-demand
allocation step kicks in that allocates a non-QPL page for
the SKB, freeing up the QPL page in use.
gve_try_recycle_buf was moved to gve_rx_append_frags so that the driver
does not attempt to mark a buffer as used if a non-QPL page was allocated
on demand.
Signed-off-by: Rushil Gupta <rushilg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Bailey Forrest <bcf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each QPL page is divided into GVE_TX_BUFS_PER_PAGE_DQO buffers.
When a packet needs to be transmitted, we break it into chunks of at
most GVE_TX_BUF_SIZE_DQO bytes and transmit each chunk using a TX
descriptor.
We allocate the TX buffers from the free list in dqo_tx and store their
indices in an array in the pending_packet structure.
The TX buffers are returned to the free list in dqo_compl after the
packet completion is received, or when packets are removed from the
miss-completions list.
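A sketch of the buffer math; the 2 KB buffer size is an assumption and
gve_num_tx_bufs_needed() is a hypothetical helper, the driver's macros
carry the authoritative values:

#define GVE_TX_BUF_SHIFT_DQO     11     /* assumed: 2 KB buffers */
#define GVE_TX_BUF_SIZE_DQO      (1UL << GVE_TX_BUF_SHIFT_DQO)
#define GVE_TX_BUFS_PER_PAGE_DQO (PAGE_SIZE >> GVE_TX_BUF_SHIFT_DQO)

/* Free-list buffers (and TX descriptors) consumed by one packet. */
static inline int gve_num_tx_bufs_needed(u32 pkt_len)
{
        return DIV_ROUND_UP(pkt_len, GVE_TX_BUF_SIZE_DQO);
}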
Signed-off-by: Rushil Gupta <rushilg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Bailey Forrest <bcf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GVE supports QPL ("queue-page-list") mode, where
all data is communicated through a set of pre-registered
pages. Add this mode to the DQO descriptor format.
Add checks, ABI changes and device options to support
QPL mode for DQO in addition to GQI. Also, use the
pages-per-qpl value supplied by the device option to control the
size of the queue-page-list.
Signed-off-by: Rushil Gupta <rushilg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Bailey Forrest <bcf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Spotted this trivial spelling mistake while casually reading
the Google GVE driver code.
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current codebase contains usages of two different names for this
driver (i.e., `gvnic` and `gve`), which is quite unfriendly for users,
especially when trying to bind or unbind the driver manually.
The corresponding kernel module is registered under the name `gve`.
It's more reasonable to align the driver's name with the module's.
Fixes: 893ce44df5 ("gve: Add basic driver framework for Compute Engine Virtual NIC")
Cc: csully@google.com
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The duplex mode was previously unset in the driver, resulting in the
default value of 0 being reported, which corresponds to half duplex. This
might mislead users into incorrect expectations about the driver's
transmission capabilities.
Set the default duplex configuration to full, as the driver runs in
full duplex mode at this point.
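The fix amounts to one line in the link-settings callback (a sketch with
the other fields condensed):

static int gve_get_link_ksettings(struct net_device *netdev,
                                  struct ethtool_link_ksettings *cmd)
{
        struct gve_priv *priv = netdev_priv(netdev);

        cmd->base.speed = priv->link_speed;
        cmd->base.duplex = DUPLEX_FULL; /* the driver is full duplex only */
        return 0;
}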
Fixes: 7e074d5a76 ("gve: Enable Link Speed Reporting in the driver.")
Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Message-ID: <20230706044128.2726747-1-junfeng.guo@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for using IPv6 Big TCP on DQ, which can handle large TSO/GRO
packets. See https://lwn.net/Articles/895398/. This can improve
throughput and CPU usage.
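Enabling Big TCP largely comes down to raising the netdev's TSO size cap
via the core helper; the DQO_TX_MAX value here is an assumption inferred
from the tso_max_size shown in the output below:

        netif_set_tso_max_size(priv->dev, DQO_TX_MAX);  /* assumed 0x3FFFF (262143) */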
Perf test result:

        ip -d link show $DEV
        gso_max_size 185000 gso_max_segs 65535 tso_max_size 262143 tso_max_segs 65535 gro_max_size 185000
For performance, tested with neper using 9k MTU on hardware that supports 200Gb/s line rate.
In single streams when line rate is not saturated, we expect throughput improvements.
When the networking is performing at line rate, we expect cpu usage improvements.
Tcp_stream (unidirectional stream test, T=thread, F=flow):
skb=180kb, T=1, F=1, no zerocopy: throughput average=64576.88 Mb/s, sender stime=8.3, receiver stime=10.68
skb=64kb, T=1, F=1, no zerocopy: throughput average=64862.54 Mb/s, sender stime=9.96, receiver stime=12.67
skb=180kb, T=1, F=1, yes zerocopy: throughput average=146604.97 Mb/s, sender stime=10.61, receiver stime=5.52
skb=64kb, T=1, F=1, yes zerocopy: throughput average=131357.78 Mb/s, sender stime=12.11, receiver stime=12.25
skb=180kb, T=20, F=100, no zerocopy: throughput average=182411.37 Mb/s, sender stime=41.62, receiver stime=79.4
skb=64kb, T=20, F=100, no zerocopy: throughput average=182892.02 Mb/s, sender stime=57.39, receiver stime=72.69
skb=180kb, T=20, F=100, yes zerocopy: throughput average=182337.65 Mb/s, sender stime=27.94, receiver stime=39.7
skb=64kb, T=20, F=100, yes zerocopy: throughput average=182144.20 Mb/s, sender stime=47.06, receiver stime=39.01
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Coco Li <lixiaoyan@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230522201552.3585421-1-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Clearing the PBA bit from the driver is race-prone and may cause
interrupt events to be dropped, potentially halting traffic
completely.
Fixes: 5e8c5adf95 ("gve: DQO: Add core netdev features")
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Non-GSO TCP packets whose SKBs' linear portion did not include the
entire TCP header were not populating the first Tx descriptor with
as many bytes as the vNIC expected. This change ensures that all
TCP packets populate the first descriptor with the correct number of
bytes.
Fixes: 893ce44df5 ("gve: Add basic driver framework for Compute Engine Virtual NIC")
Signed-off-by: Shailend Chand <shailend@google.com>
Link: https://lore.kernel.org/r/20230403172809.2939306-1-shailend@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The link speed is never changed for the uptime of a VM, and the current
implementation sends an admin queue command for each call. Admin queue
command invocations have nontrivial overhead (e.g., VM exits), which can
be disruptive to users if triggered frequently. Our telemetry data shows
that there are VMs that make frequent calls to this admin queue command.
Caching the result of the original admin queue command would eliminate
the need to send multiple admin queue commands on subsequent calls to
retrieve link speed.
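A sketch of the caching pattern; the cached field name is an assumption,
and gve_adminq_report_link_speed() fills it from the device:

static int gve_report_link_speed_cached(struct gve_priv *priv)
{
        /* Only the first call pays for an admin queue command (VM exit);
         * later calls return the cached value. */
        if (priv->link_speed)
                return 0;
        return gve_adminq_report_link_speed(priv);
}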
Fixes: 7e074d5a76 ("gve: Enable Link Speed Reporting in the driver.")
Signed-off-by: Joshua Washington <joshwash@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230321172332.91678-1-joshwash@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add AF_XDP zero-copy support.
Note: although these changes support AF_XDP sockets in zero-copy
mode, there is still a copy happening within the driver between the
XSK buffer pool and the QPL bounce buffers in GQI-QPL format.
In the GQI-QPL queue format, the driver needs to allocate a fixed-size
memory region, with the size specified by the vNIC device, for RX/TX, and
register this memory as a bounce buffer with the vNIC device when a queue
is created. The number of pages in the bounce buffer is limited, and the
pages need to be made available to the vNIC by copying the RX data out,
to prevent head-of-line blocking. Therefore, we cannot pass the XSK
buffer pool to the vNIC.
The number of copies on the RX path from the bounce buffer to the XSK
buffer is 2 for AF_XDP copy mode (bounce buffer -> allocated page frag ->
XSK buffer) and 1 for AF_XDP zero-copy mode (bounce buffer -> XSK buffer).
This patch contains the following changes:
1) Enable and disable the XSK buffer pool
2) Copy XDP packets from the QPL bounce buffers to the XSK buffer on rx
3) Copy XDP packets from the XSK buffer to the QPL bounce buffers and
ring the doorbell as part of the XDP TX napi poll
4) ndo_xsk_wakeup callback support (see the sketch below)
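A condensed sketch of the wakeup callback; gve_xsk_tx_napi() is a
hypothetical stand-in for locating the XDP TX queue's napi context:

static int gve_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags)
{
        struct gve_priv *priv = netdev_priv(dev);

        /* Kick the XDP TX napi so it drains the XSK tx ring into the
         * QPL bounce buffers and rings the doorbell. */
        if (flags & XDP_WAKEUP_TX)
                napi_schedule(gve_xsk_tx_napi(priv, queue_id));
        return 0;
}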
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch contains the following changes:
1) Support for XDP REDIRECT action on rx
2) ndo_xdp_xmit callback support
In the GQI-QPL queue format, the driver needs to allocate a fixed-size
memory region, with the size specified by the vNIC device, for RX/TX, and
register this memory as a bounce buffer with the vNIC device when a queue
is created. The number of pages in the bounce buffer is limited, and the
pages need to be made available to the vNIC by copying the RX data out,
to prevent head-of-line blocking. XDP_REDIRECT packets are therefore
immediately copied to a newly allocated page.
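A sketch of that copy-then-redirect path: copy the frame out of the QPL
page first, then hand the copy to the core redirect machinery (the
page_cache field name is an assumption, the core XDP helpers are real):

static int gve_xdp_redirect(struct net_device *dev, struct gve_rx_ring *rx,
                            struct xdp_buff *orig, struct bpf_prog *xdp_prog)
{
        int total_len, len = orig->data_end - orig->data;
        int headroom = XDP_PACKET_HEADROOM;
        struct xdp_buff new;
        void *frame;

        total_len = headroom + SKB_DATA_ALIGN(len) +
                    SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
        frame = page_frag_alloc(&rx->page_cache, total_len, GFP_ATOMIC);
        if (!frame)
                return -ENOMEM;

        /* Build a fresh xdp_buff over the copy so the QPL page can be
         * reposted to the device immediately. */
        xdp_init_buff(&new, total_len, &rx->xdp_rxq);
        xdp_prepare_buff(&new, frame, headroom, len, false);
        memcpy(new.data, orig->data, len);

        return xdp_do_redirect(dev, &new, xdp_prog);
}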
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for XDP PASS, DROP and TX actions.
This patch contains the following changes:
1) Support installing/uninstalling an XDP program
2) Add dedicated XDP TX queues
3) Add support for the XDP DROP action
4) Add support for the XDP TX action (verdict handling is sketched below)
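A condensed sketch of the verdict handling on rx; gve_xdp_tx_one() and
gve_recycle_buffer() are hypothetical stand-ins for the driver's TX
enqueue and buffer recycling:

static void gve_xdp_done(struct gve_priv *priv, struct gve_rx_ring *rx,
                         struct xdp_buff *xdp, struct bpf_prog *xprog,
                         int xdp_act)
{
        switch (xdp_act) {
        case XDP_PASS:
                break;  /* continue on the regular skb receive path */
        case XDP_TX:
                gve_xdp_tx_one(priv, rx, xdp);  /* hypothetical helper */
                break;
        case XDP_ABORTED:
                trace_xdp_exception(priv->dev, xprog, xdp_act);
                fallthrough;
        case XDP_DROP:
        default:
                gve_recycle_buffer(rx);         /* hypothetical helper */
                break;
        }
}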
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changes to enable adding and removing TX queues without calling
gve_close() and gve_open().
Made the following changes:
1) The priv->tx, priv->rx and priv->qpls arrays are allocated based on
the max number of tx queues and the max number of rx queues
2) Changed gve_adminq_create_tx_queues(), gve_adminq_destroy_tx_queues(),
gve_tx_alloc_rings() and gve_tx_free_rings() to add/remove a
subset of TX queues rather than all the TX queues
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IRQs are currently requested before the netdevice is registered
and a proper name is assigned to the device. Change the interrupt
name to avoid using the format string in the name.
Interrupt name before change: eth%d-ntfy-block.<blk_id>
Interrupt name after change: gve-ntfy-blk<blk_id>@pci:<pci_name>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Jeroen de Borst <jeroendb@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The virtual NIC has two ways of indicating a miss-path
completion. This change handles the alternate one.
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, even if just one of the many fragments of a 9k packet
required a copy, we'd copy the whole packet into a freshly-allocated
9k-sized linear SKB, which led to performance issues.
With a pool of pages to copy into, each fragment can be
handled independently, reducing the incidence of
allocation and copy.
Signed-off-by: Shailend Chand <shailend@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>