38384 Commits

Bailey Forrest
e8192476de gve: Fix warnings reported for DQO patchset
https://patchwork.kernel.org/project/netdevbpf/list/?series=506637&state=*

- Remove unused variable
- Use correct integer type for string formatting
- Remove `inline` in C files

Fixes: 9c1a59a2f4bc ("gve: DQO: Add ring allocation and initialization")
Fixes: a57e5de476be ("gve: DQO: Add TX path")
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 15:38:29 -07:00
Bailey Forrest
9b8dd5e5ea gve: DQO: Add RX path
The RX queue has an array of `gve_rx_buf_state_dqo` objects. All
allocated pages have an associated buf_state object. When a buffer is
posted on the RX buffer queue, the buffer ID will be the buf_state's
index into the RX queue's array.

On packet reception, the RX queue will have one descriptor for each
buffer associated with a received packet. Each RX descriptor will have
a buffer_id that was posted on the buffer queue.

Notable mentions:

- We use a default buffer size of 2048 bytes. Based on page size, we
  may post separate sections of a single page as separate buffers.

- The driver holds an extra reference on pages passed up the receive
  path with an skb and keeps these pages on a list. When posting new
  buffers to the NIC, we check whether any of these pages holds only our
  reference, or whether another buffer-sized segment of the page has no
  references. If so, the page is free to reuse. This page-recycling
  approach is a common netdev optimization that reduces page alloc/free
  calls.

- Pages in the free list carry a page_count bias to avoid an atomic
  increment of the page count every time we attempt to reuse a page
  (see the sketch after this list):
  # references = page_count() - bias

- In order to track when a page is safe to reuse, we keep track of the
  last offset which had a single SKB reference. When this occurs, it
  implies that every single other offset is reusable. Otherwise, we
  don't know if offsets can be safely reused.

- We maintain two free lists of pages. List #1 (recycled_buf_states)
  contains pages we know can be reused right away. List #2
  (used_buf_states) contains pages which cannot be used right away. We
  only attempt to get pages from list #2 when list #1 is empty, and we
  only try a small fixed number of pages from list #2 before giving up
  and allocating a new page. Both lists are FIFOs, in the hope that by
  the time we attempt to reuse a page, its extra references have been
  dropped.
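
A minimal sketch of the reuse check (illustrative only; the real
driver's structures and helpers differ):

  #include <linux/mm.h>   /* page_count() */

  static bool page_is_reusable(struct page *page, int pagecnt_bias)
  {
          /* references = page_count() - bias: a result of 1 means only
           * the driver's own reference remains, so the page is free to
           * be reposted to the NIC.
           */
          return page_count(page) - pagecnt_bias == 1;
  }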

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00
Bailey Forrest
a57e5de476 gve: DQO: Add TX path
TX SKBs will have their buffers DMA mapped with the device. Each buffer
will have at least one TX descriptor associated. Each SKB will also have
a metadata descriptor.

Each TX queue maintains an array of `gve_tx_pending_packet_dqo` objects.
Every TX SKB will have an associated pending_packet object. A TX SKB's
descriptors will use its pending_packet's index as the completion tag,
which will be returned on the TX completion queue.

The device implements a "flow-miss model". Most packets will simply
receive a packet completion. The flow-miss system may choose to process
a packet based on its contents. A TX packet which experiences a flow
miss would receive a miss completion followed by a later reinjection
completion. The miss completion is received when the packet starts to be
processed by the flow-miss system, and the reinjection completion is
received when the flow-miss system completes processing the packet and
sends it on the wire.

Notable mentions:

- Buffers may be freed after receiving the miss-completion, but in order
  to avoid packet reordering, we do not complete the SKB until receiving
  the reinjection completion.

- The driver must robustly handle the unlikely scenario where a miss
  completion does not have an associated reinjection completion. This is
  accomplished by maintaining a list of packets which have a pending
  reinjection completion. After a short timeout (5 seconds), the
  SKB and buffers are released and the pending_packet is moved to a
  second list which has a longer timeout (60 seconds), where the
  pending_packet will not be reused. When the longer timeout elapses,
  the driver may assume the reinjection completion will never be
  received, and the pending_packet may be reused.

- Completion handling is triggered by an interrupt and is done in the
  NAPI poll function. Because the TX path and completion handling run in
  different threading contexts, they maintain their own lists of free
  pending_packet objects. The TX path uses a lock-free approach to steal
  the list from the completion path (sketched after this list).

- Both the TSO context and general context descriptors have metadata
  bytes. The device requires that if multiple descriptors contain the
  same field, each descriptor must have the same value set for that
  field.
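
The stealing pattern is the same one the kernel's lock-less llist
primitives provide; a sketch using llist for illustration (the driver's
actual lists are index-based and the names here are hypothetical):

  #include <linux/llist.h>

  /* Completion path (producer): return a freed pending_packet node. */
  static void compl_free_packet(struct llist_head *freed,
                                struct llist_node *node)
  {
          llist_add(node, freed);
  }

  /* TX path (consumer): atomically steal everything freed so far. */
  static struct llist_node *tx_steal_free_list(struct llist_head *freed)
  {
          return llist_del_all(freed);
  }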

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00
Bailey Forrest
0dcc144a79 gve: DQO: Configure interrupts on device up
When interrupts are first enabled, we also set the interrupt rate
limits, which remain static for the entire usage of the device.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00
Bailey Forrest
9c1a59a2f4 gve: DQO: Add ring allocation and initialization
Allocate the buffer and completion ring structures. Do not populate the
rings yet; that will happen in the respective RX and TX datapath
follow-on patches.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00
Bailey Forrest
5e8c5adf95 gve: DQO: Add core netdev features
Add NAPI and netdev registration, interrupt handling, and initial TX
and RX polling stubs. The stubs will be filled in by follow-on patches.

Also:
- LRO feature advertisement and handling
- Ethtool logic updates

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00
Bailey Forrest
1f6228e459 gve: Update adminq commands to support DQO queues
DQO queue creation requires additional parameters:
- TX completion/RX buffer queue size
- TX completion/RX buffer queue address
- TX/RX queue size
- RX buffer size

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00
Bailey Forrest
a4aa1f1e69 gve: Add DQO fields for core data structures
- Add new DQO datapath structures:
  - `gve_rx_buf_queue_dqo`
  - `gve_rx_compl_queue_dqo`
  - `gve_rx_buf_state_dqo`
  - `gve_tx_desc_dqo`
  - `gve_tx_pending_packet_dqo`

- Incorporate these into the existing ring data structures:
  - `gve_rx_ring`
  - `gve_tx_ring`

Noteworthy mentions:

- `gve_rx_buf_state_dqo` represents an RX buffer which was posted to HW.
  Each RX queue has an array of these objects, and the index into the
  array is used as the buffer_id when posted to HW.

- `gve_tx_pending_packet_dqo` is treated similarly for TX queues. The
  completion_tag is the index into the array.

- These two structures have links for linked lists which are represented
  by 16b indexes into a contiguous array of these structures, reducing
  the memory footprint compared to 64b pointers (see the sketch after
  this list).

- We use unions for the writeable datapath structures to reduce cache
  footprint. GQI-specific members will be renamed like the DQO members
  in a future patch.
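
A sketch of the index-based list idea (hypothetical names):

  #include <linux/types.h>

  struct pending_packet {
          s16 next;       /* index of the next element, -1 == end */
          /* ... payload fields ... */
  };

  struct index_list {
          s16 head;       /* -1 == empty list */
          s16 tail;
  };

  /* Traversal indexes into one contiguous array instead of chasing
   * 64-bit pointers:
   */
  static struct pending_packet *list_next(struct pending_packet *array,
                                          struct pending_packet *elem)
  {
          return elem->next == -1 ? NULL : &array[elem->next];
  }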

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00
Bailey Forrest
223198183f gve: Add dqo descriptors
General description of rings and descriptors:

TX ring is used for sending TX packet buffers to the NIC. It has the
following descriptors:
- `gve_tx_pkt_desc_dqo` - Data buffer descriptor
- `gve_tx_tso_context_desc_dqo` - TSO context descriptor
- `gve_tx_general_context_desc_dqo` - Generic metadata descriptor

Metadata is a collection of 12 bytes. We define `gve_tx_metadata_dqo`,
which represents the logical interpretation of the metadata bytes. It's
helpful to define this structure because the metadata bytes exist in
multiple descriptor types (including `gve_tx_tso_context_desc_dqo`),
and the device requires that the same field carry the same value in all
descriptors.

The TX completion ring is used to receive completions from the NIC.
Having a separate ring allows completions to arrive out of order. The
completion descriptor `gve_tx_compl_desc` has several different types;
the most important are packet and descriptor completions. Descriptor
completions notify the driver when descriptors sent on the TX ring are
done being consumed; they are only used to signal that space has been
cleared in the TX ring. A packet completion is received when a packet
transmitted on the TX queue has finished being transmitted.

In addition there are "miss" and "reinjection" completions. The device
implements a "flow-miss model". Most packets will simply receive a
packet completion. The flow-miss system may choose to process a packet
based on its contents. A TX packet which experiences a flow miss would
receive a miss completion followed by a later reinjection completion.
The miss completion is received when the packet starts to be processed
by the flow-miss system, and the reinjection completion is received when
the flow-miss system completes processing the packet and sends it on the
wire.

The RX buffer ring is used to send buffers to HW via the
`gve_rx_desc_dqo` descriptor.

Received packets are put into the RX queue by the device, which
populates the `gve_rx_compl_desc_dqo` descriptor. The RX descriptors
refer to buffers posted by the buffer queue. Received buffers may be
returned out of order, such as when HW LRO is enabled.

Important concepts:
- "TX" and "RX buffer" queues, which send descriptors to the device, use
  MMIO doorbells to notify the device of new descriptors.

- "RX" and "TX completion" queues, which receive descriptors from the
  device, use a "generation bit" to know when a descriptor was populated
  by the device. The driver initializes all bits with the "current
  generation". The device will populate received descriptors with the
  "next generation" which is inverted from the current generation. When
  the ring wraps, the current/next generation are swapped.

- It's the driver's responsibility to ensure that the RX and TX
  completion queues are not overrun. This can be accomplished by
  limiting the number of descriptors posted to HW.

- TX packets have a 16 bit completion_tag and RX buffers have a 16 bit
  buffer_id. These will be returned on the TX completion and RX queues
  respectively to let the driver know which packet/buffer was completed.
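
A minimal sketch of the generation-bit handling (illustrative only; the
structures and the handle_completion() helper below are assumptions,
not the driver's actual code):

  struct compl_desc { u8 generation; /* ... descriptor body ... */ };

  struct compl_ring {
          struct compl_desc *desc;
          u32 head, size;
          u8 cur_gen;     /* seeded into every descriptor at init */
  };

  static void process_completions(struct compl_ring *ring)
  {
          /* A descriptor is new when its generation bit differs from
           * the value the driver seeded.
           */
          while (READ_ONCE(ring->desc[ring->head].generation) != ring->cur_gen) {
                  dma_rmb();      /* read the body after the generation bit */
                  handle_completion(&ring->desc[ring->head]);
                  if (++ring->head == ring->size) {
                          ring->head = 0;
                          ring->cur_gen ^= 1;     /* wrap: swap current/next */
                  }
          }
  }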

Bitfields are used to describe descriptor fields. This notation is more
concise and readable than shift-and-mask. It is possible because the
driver is restricted to little-endian platforms.
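
For example, a packet descriptor can be declared along these lines (the
fields shown are illustrative, not the exact hardware layout):

  /* Valid only on little-endian platforms: bit order within each byte
   * must match the device's layout.
   */
  struct tx_pkt_desc {
          __le64 buf_addr;
          u8 dtype: 5;
          u8 end_of_packet: 1;
          u8 checksum_offload_enable: 1;
          u8 report_event: 1;
          u8 reserved;
          __le16 buf_size;
  } __packed;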

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Bailey Forrest
c4b87ac876 gve: Add support for DQO RX PTYPE map
Unlike GQI, DQO RX descriptors do not contain the L3 and L4 type of the
packet. L3 and L4 types are necessary in order to set the hash and csum
on RX SKBs correctly.

DQO RX descriptors instead contain a 10 bit PTYPE index. The PTYPE map
enables the device to tell the driver how to map from PTYPE index to
L3/L4 type.

The device doesn't provide any guarantees about the range of possible
PTYPEs, so we just use a 1024-entry array to implement a fast mapping
structure.
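
A sketch of such a structure (names are illustrative):

  #define NUM_PTYPES      1024    /* 10-bit index => at most 1024 entries */

  struct ptype {
          u8 l3_type;     /* e.g. unknown / IPv4 / IPv6 */
          u8 l4_type;     /* e.g. unknown / TCP / UDP / ICMP */
  };

  struct ptype_lut {
          struct ptype ptypes[NUM_PTYPES];
  };

  /* RX path: O(1) lookup straight from the descriptor's PTYPE index:
   * ptype = &lut->ptypes[compl_desc->packet_type];
   */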

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Bailey Forrest
5ca2265eef gve: adminq: DQO specific device descriptor logic
- In addition to TX and RX queues, DQO has TX completion and RX buffer
  queues.
  - TX completions are received when the device has completed sending a
    packet on the wire.
  - RX buffers are posted on a separate queue from the RX completions.
- DQO descriptor rings are allowed to be smaller than PAGE_SIZE.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Bailey Forrest
a5886ef4f4 gve: Introduce per netdev enum gve_queue_format
The currently supported queue formats are:
- GQI_RDA - GQI with raw DMA addressing
- GQI_QPL - GQI with queue page list
- DQO_RDA - DQO with raw DMA addressing

The old `gve_priv.raw_addressing` value is only used for GQI_RDA, so we
remove it in favor of just checking the queue format against GQI_RDA.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Bailey Forrest
8a39d3e0da gve: Introduce a new model for device options
The current model uses an integer ID and a fixed size struct for the
parameters of each device option.

The new model allows the device option structs to grow in size over
time. A driver may assume that changes to device option structs will
always be appended.

New device options will also generally have a
`supported_features_mask` so that the driver knows which fields within a
particular device option are enabled.

`gve_device_option.feat_mask` is changed to `required_features_mask`,
and it is a bitmask which must match the value expected by the driver.
This gives the device the ability to break backwards compatibility with
old drivers for certain features by blocking the old drivers from trying
to use the feature.
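
A sketch of the resulting check (hypothetical helper; big-endian adminq
fields are assumed here):

  static bool option_is_usable(const struct gve_device_option *opt,
                               u32 expected_mask)
  {
          /* The mask must match exactly: an unfamiliar required bit
           * means the device intends to block old drivers from using
           * this feature.
           */
          return be32_to_cpu(opt->required_features_mask) == expected_mask;
  }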

We maintain ABI compatibility with the old model for
GVE_DEV_OPT_ID_RAW_ADDRESSING in case a driver is using a device which
does not support the new model.

This patch introduces some new terminology:

RDA - Raw DMA Addressing - Buffers associated with SKBs are directly DMA
      mapped and read/updated by the device.

QPL - Queue Page Lists - Driver uses bounce buffers which are DMA mapped
      with the device for read/write and data is copied from/to SKBs.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Bailey Forrest
920fb45193 gve: Make gve_rx_slot_page_info.page_offset an absolute offset
Using `page_offset` like a boolean means a page may only be split into
two sections. With page sizes larger than 4k, this can be very wasteful.
Future commits in this patchset use `struct gve_rx_slot_page_info` in a
way which supports a fixed buffer size and a variable page size.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Bailey Forrest
35f9b2f43f gve: gve_rx_copy: Move padding to an argument
Future use cases will have a different padding value.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Bailey Forrest
dbdaa67540 gve: Move some static functions to a common file
These functions will be shared by the GQI and DQO variants of the GVNIC
driver as of follow-up patches in this series.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Steen Hegelund
af4b11022e net: sparx5: add ethtool configuration and statistics support
This adds statistics counters for the network interfaces provided
by the driver. It also adds CPU port counters (which are not
exposed by ethtool).
This also adds support for configuring the network interface
parameters via ethtool: speed, duplex, autonegotiation, etc.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:13 -07:00
Steen Hegelund
0a9d48ad0d net: sparx5: add calendar bandwidth allocation support
This configures the Sparx5 calendars according to the bandwidth
requested in the Device Tree nodes.
It also checks that the total requested bandwidth is within the
limits of the detected Sparx5 model.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:13 -07:00
Steen Hegelund
d6fce51419 net: sparx5: add switching support
This adds SwitchDev support by offloading the software bridge to
hardware.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:13 -07:00
Steen Hegelund
78eab33bb6 net: sparx5: add vlan support
This adds Sparx5 VLAN support.

Sparx5 has more VLAN features than provided here, but these will be added
in later series. For now we only add the basic L2 features.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:13 -07:00
Steen Hegelund
b37a1bae74 net: sparx5: add mactable support
This adds the Sparx5 MAC tables: listening for MAC table updates and
updating on request.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:13 -07:00
Steen Hegelund
946e7fd505 net: sparx5: add port module support
This adds configuration of the Sparx5 port module instances.

Sparx5 has in total 65 logical ports (denoted D0 to D64) and 33
physical SerDes connections (S0 to S32). The 65th port (D64) is
permanently allocated to SerDes0 (S0). The remaining 64 ports can, in
various multiplexing scenarios, be connected to the remaining 32 SerDes
using QSGMII, USGMII, or USXGMII extenders. 32 of the ports can have a
1:1 mapping to the 32 SerDes.

Some additional ports (D65 to D69) are internal to the device and do not
connect to port modules or SerDes macros. For example, internal ports are
used for frame injection and extraction to the CPU queues.

The 65 logical ports are split up into the following blocks.

- 13 x 5G ports (D0-D11, D64)
- 32 x 2G5 ports (D16-D47)
- 12 x 10G ports (D12-D15, D48-D55)
- 8 x 25G ports (D56-D63)

Each logical port supports different line speeds, and depending on the
speeds supported, different port modules (MAC+PCS) are needed. A port
supporting 5 Gbps, 10 Gbps, or 25 Gbps as its maximum line speed will
have a DEV5G, DEV10G, or DEV25G module to support the 5 Gbps, 10 Gbps
(incl. 5 Gbps), or 25 Gbps (including 10 Gbps and 5 Gbps) speeds. In
addition, it will have a shadow DEV2G5 port module to support the lower
speeds (10/100/1000/2500Mbps). When a port needs to operate at a lower
speed, the shadow DEV2G5 is connected to its corresponding SerDes.

Not all interface modes are supported in this series, but will be added at
a later stage.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:12 -07:00
Steen Hegelund
f3cad2611a net: sparx5: add hostmode with phylink support
This patch adds netdevs and phylink support for the ports in the switch.
It also adds register based injection and extraction for these ports.

Frame DMA support for injection and extraction will be added in a later
series.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:12 -07:00
Steen Hegelund
3cfa11bac9 net: sparx5: add the basic sparx5 driver
This adds the Sparx5 basic SwitchDev driver framework with IO range
mapping, switch device detection and core clock configuration.

Support for ports, phylink, netdev, mactable etc. are in the following
patches.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:12 -07:00
David Wilder
7525de2516 ibmveth: Set CHECKSUM_PARTIAL if NULL TCP CSUM.
TCP checksums on received packets may be set to NULL by the sender if CSO
is enabled. The hypervisor flags these packets as checksum-ok and the
skb is then flagged CHECKSUM_UNNECESSARY. If these packets are then
forwarded, the sender will not request CSO due to the CHECKSUM_UNNECESSARY
flag. The result is a TCP packet sent with a bad checksum. This change
sets CHECKSUM_PARTIAL on these packets, causing the sender to correctly
request CSUM offload.
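
The shape of the fix, roughly (illustrative, not the driver's exact
code):

  static void flag_null_tcp_csum(struct sk_buff *skb, struct tcphdr *tcph)
  {
          if (tcph->check == 0) {
                  /* Make the stack treat the checksum as still to be
                   * computed, so a forwarding path requests CSO again.
                   */
                  skb->ip_summed = CHECKSUM_PARTIAL;
                  /* csum_start is an offset from skb->head */
                  skb->csum_start = (unsigned char *)tcph - skb->head;
                  skb->csum_offset = offsetof(struct tcphdr, check);
          }
  }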

Signed-off-by: David Wilder <dwilder@us.ibm.com>
Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Tested-by: Cristobal Forno <cforno12@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23 12:54:11 -07:00
David S. Miller
fe87797bf2 mlx5-net-next-2021-06-22

Merge tag 'mlx5-net-next-2021-06-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-net-next-2021-06-22

1) Various minor cleanups and fixes from net-next branch
2) Optimize mlx5 feature check on tx and
   a fix to allow Vxlan with Ipsec offloads
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23 12:48:07 -07:00
Huy Nguyen
f1267798c9 net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload
The packet is a VXLAN packet over an IPsec transport mode tunnel,
which has the following format: [IP1 | ESP | UDP | VXLAN | IP2 | TCP].
The NVIDIA ConnectX card cannot do checksum offload for two L4 headers.
The solution is to use checksum partial offload, similar to the
VXLAN | TCP packet case. Hardware calculates the IP1, IP2 and TCP
checksums and software calculates the UDP checksum. However, unlike the
VXLAN | TCP case, IPsec's mlx5 driver cannot access the inner plaintext
IP protocol type. Therefore, inner_ipproto is added to the sec_path
structure to provide this information. Also, utilize the skb's
csum_start to program the L4 inner checksum offset.

While at it, remove the call to mlx5e_set_eseg_swp and set up the
software parser fields directly in mlx5e_ipsec_set_swp.
mlx5e_set_eseg_swp is not needed, as the two features (GENEVE and
IPsec) are different, and adding this sharing layer creates unnecessary
complexity and affects performance.

For the case of a VXLAN packet over an IPsec tunnel mode tunnel,
checksum offload is disabled because the hardware does not support
checksum offload for three L3 (IP) headers.

Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-22 15:24:35 -07:00
Huy Nguyen
dd7cf00f87 net/mlx5: Optimize mlx5e_feature_checks for non IPsec packet
mlx5e_ipsec_feature_check belongs to mlx5e_tunnel_features_check.
Also, IPsec is not the default configuration so it should be
checked at the end instead of the beginning of mlx5e_features_check.

Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-22 15:24:29 -07:00
caihuoqing
5bf3ee97f4 net/mlx5: remove "default n" from Kconfig
remove "default n" and "No" is default

Signed-off-by: caihuoqing <caihuoqing@baidu.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-22 15:24:26 -07:00
Colin Ian King
2cc7dad75d net/mlx5: Fix spelling mistake "enught" -> "enough"
There is a spelling mistake in a mlx5_core_err error message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-22 15:24:24 -07:00
Nathan Chancellor
d4472a4b8c net/mlx5: Use cpumask_available() in mlx5_eq_create_generic()
When CONFIG_CPUMASK_OFFSTACK is unset, cpumask_var_t is not a pointer
but a single element array, meaning its address in a structure cannot be
NULL as long as it is not the first element, which it is not. This
results in a clang warning:

drivers/net/ethernet/mellanox/mlx5/core/eq.c:715:14: warning: address of
array 'param->affinity' will always evaluate to 'true'
[-Wpointer-bool-conversion]
        if (!param->affinity)
            ~~~~~~~~^~~~~~~~
1 warning generated.

The helper cpumask_available was added in commit f7e30f01a9e2 ("cpumask:
Add helper cpumask_available()") to handle situations like this so use
it to keep the meaning of the code the same while resolving the warning.
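
The shape of the change (the surrounding error handling is
illustrative):

  /* Before: with CONFIG_CPUMASK_OFFSTACK unset, param->affinity is an
   * array member whose address is never NULL, so this check is dead:
   */
  if (!param->affinity)
          return ERR_PTR(-EINVAL);

  /* After: correct for both on- and off-stack cpumask configurations */
  if (!cpumask_available(param->affinity))
          return ERR_PTR(-EINVAL);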

Fixes: e4e3f24b822f ("net/mlx5: Provide cpumask at EQ creation phase")
Link: https://github.com/ClangBuiltLinux/linux/issues/1400
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-22 15:24:20 -07:00
Jiapeng Chong
9201ab5f55 net/mlx5: Fix missing error code in mlx5_init_fs()
The error code is missing in this code scenario; add the error code
'-ENOMEM' to the return value 'err'.

Eliminate the follow smatch warning:

drivers/net/ethernet/mellanox/mlx5/core/fs_core.c:2973 mlx5_init_fs()
warn: missing error code 'err'.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Fixes: 4a98544d1827 ("net/mlx5: Move chains ft pool to be used by all firmware steering")
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-22 15:24:17 -07:00
Lorenzo Bianconi
aff0824dc4 net: marvell: return csum computation result from mvneta_rx_csum/mvpp2_rx_csum
This is a preliminary patch to add HW csum hint support to the
mvneta/mvpp2 XDP implementations.

Tested-by: Matteo Croce <mcroce@linux.microsoft.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:55:05 -07:00
Dan Carpenter
b0e03950dd stmmac: dwmac-loongson: fix uninitialized variable in loongson_dwmac_probe()
The "mdio" variable is never set to false.  Also it should be a bool
type instead of int.

Fixes: 30bba69d7db4 ("stmmac: pci: Add dwmac support for Loongson")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:44:50 -07:00
Kees Cook
ee8e7622e0 octeontx2-af: Avoid field-overflowing memcpy()
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally writing across neighboring fields.

To avoid having memcpy() think a u64 "prof" is being written past its
bounds, adjust the prof member type by adding struct nix_bandprof_s to
the union to match the other structs. This silences the following
future warning:

In file included from ./include/linux/string.h:253,
                 from ./include/linux/bitmap.h:10,
                 from ./include/linux/cpumask.h:12,
                 from ./arch/x86/include/asm/cpumask.h:5,
                 from ./arch/x86/include/asm/msr.h:11,
                 from ./arch/x86/include/asm/processor.h:22,
                 from ./arch/x86/include/asm/timex.h:5,
                 from ./include/linux/timex.h:65,
                 from ./include/linux/time32.h:13,
                 from ./include/linux/time.h:60,
                 from ./include/linux/stat.h:19,
                 from ./include/linux/module.h:13,
                 from drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:11:
In function '__fortify_memcpy_chk',
    inlined from '__fortify_memcpy' at ./include/linux/fortify-string.h:310:2,
    inlined from 'rvu_nix_blk_aq_enq_inst' at drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:910:5:
./include/linux/fortify-string.h:268:4: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); please use struct_group() [-Wattribute-warning]
  268 |    __write_overflow_field();
      |    ^~~~~~~~~~~~~~~~~~~~~~~~

drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c:
...
                        else if (req->ctype == NIX_AQ_CTYPE_BANDPROF)
                                memcpy(&rsp->prof, ctx,
                                       sizeof(struct nix_bandprof_s));
...

Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:19:01 -07:00
Marcin Wojtas
8d909440ab net: mvpp2: remove unused 'has_phy' field
The 'has_phy' field from struct mvpp2_port is no longer used.
Remove it.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 09:54:55 -07:00
Marcin Wojtas
dfce1bab8f net: mvpp2: enable using phylink with ACPI
Now that MDIO and phylink are supported in the ACPI world, enable
their use in the mvpp2 driver. Ensure backward compatibility with
firmware whose ACPI description does not contain the necessary
elements for proper PHY handling, and fall back to relying on the
link interrupts instead.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 09:54:55 -07:00
Marcin Wojtas
c54da4c1ac net: mvmdio: add ACPI support
This patch introduces ACPI support in the mvmdio driver by adding an
acpi_match_table with two entries:

* "MRVL0100" for the SMI operation
* "MRVL0101" for the XSMI mode

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 09:54:55 -07:00
Marcin Wojtas
33fc11f098 net/fsl: switch to fwnode_mdiobus_register
Utilize the newly added helper routine for registering the MDIO bus
via the fwnode_ interface.

Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 09:54:55 -07:00
Kees Cook
ef2c3ddaa4 ibmvnic: Use strscpy() instead of strncpy()
Since these strings are expected to be NUL-terminated and the buffers
are exactly sized (in vnic_client_data_len()) with no padding, strncpy()
can be safely replaced with strscpy() here, as strncpy() on
NUL-terminated strings is considered deprecated[1]. This has the
side-effect of silencing a -Warray-bounds warning due to the compiler
being confused about the vlcd incrementing:

In file included from ./include/linux/string.h:253,
                 from ./include/linux/bitmap.h:10,
                 from ./include/linux/cpumask.h:12,
                 from ./include/linux/mm_types_task.h:14,
                 from ./include/linux/mm_types.h:5,
                 from ./include/linux/buildid.h:5,
                 from ./include/linux/module.h:14,
                 from drivers/net/ethernet/ibm/ibmvnic.c:35:
In function '__fortify_strncpy',
    inlined from 'vnic_add_client_data' at drivers/net/ethernet/ibm/ibmvnic.c:3919:2:
./include/linux/fortify-string.h:39:30: warning: '__builtin_strncpy' offset 12 from the object at 'vlcd' is out of the bounds of referenced subobject 'name' with type 'char[]' at offset 12 [-Warray-bounds]
   39 | #define __underlying_strncpy __builtin_strncpy
      |                              ^
./include/linux/fortify-string.h:51:9: note: in expansion of macro '__underlying_strncpy'
   51 |  return __underlying_strncpy(p, q, size);
      |         ^~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/ibm/ibmvnic.c: In function 'vnic_add_client_data':
drivers/net/ethernet/ibm/ibmvnic.c:3883:7: note: subobject 'name' declared here
 3883 |  char name[];
      |       ^~~~

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
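
The change is of this shape ('name' and 'len' stand in for the source
string and the exactly-sized buffer length; illustrative):

  /* Before: strncpy() does not guarantee NUL-termination and confuses
   * the compiler's bounds tracking here.
   */
  strncpy(vlcd->name, name, len);

  /* After: strscpy() always NUL-terminates its destination (and
   * returns -E2BIG if the source would be truncated).
   */
  strscpy(vlcd->name, name, len);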

Cc: Dany Madden <drt@linux.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Cc: Thomas Falcon <tlfalcon@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21 14:52:16 -07:00
Esben Haabendal
ce03b94ba6 net: ll_temac: Remove left-over debug message
Fixes: f63963411942 ("net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY")
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21 14:44:09 -07:00
Christophe JAILLET
b40d7af798 net: hns3: Fix a memory leak in an error handling path in 'hclge_handle_error_info_log()'
If this 'kzalloc()' fails we must free some resources as in all the other
error handling paths of this function.

Fixes: 2e2deee7618b ("net: hns3: add the RAS compatibility adaptation solution")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21 14:30:15 -07:00
Fugang Duan
52c4a1a85f net: fec: add ndo_select_queue to fix TX bandwidth fluctuations
As we know, AVB is enabled by default, and the ENET IP is designed with
queue 0 for best effort and queues 1&2 for AVB Class A&B. The driver
sets the bandwidth of each of queues 1&2 to 50%, so TX bandwidth
fluctuates when TX queues are selected randomly while the
FEC_QUIRK_HAS_AVB quirk is available.

This patch adds an ndo_select_queue callback to select queues for
transmitting, fixing this issue. It always returns queue 0 if this is
not a VLAN packet, and returns queue 1 or 2 based on the priority of a
VLAN packet.

You may object that we then only use a single queue for transmitting
when not targeting a VLAN. True, but it seems we have no choice: since
AVB is enabled when the driver is probed, we cannot switch this feature
dynamically, and comparing multiple queues to a single queue shows
almost no TX throughput improvement.

One way we could implement this is to configure the driver for multiple
queues with a round-robin scheme by default, then add an ndo_setup_tc
callback to let users enable/disable the AVB feature. Unfortunately, the
ENET AVB IP does not seem to follow the standard 802.1Qav spec. We can
only program the DMAnCFG[IDLE_SLOPE] field to calculate the bandwidth
fraction, and the idle slope is restricted to certain values (19 in
total). That is far from the CBS qdisc implemented in the Linux TC
framework. If that approach were strongly preferred, we could only
support a limited number of bandwidths and reject all others, which
would be really ugly and weird.

With this patch, VLAN-tagged packets route to queue 0/1/2 based on
VLAN priority; untagged packets route to queue 0.
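
A sketch of such a callback (the priority-to-queue thresholds are
illustrative, not the driver's exact mapping):

  #include <linux/if_vlan.h>

  static u16 fec_select_queue(struct net_device *ndev, struct sk_buff *skb,
                              struct net_device *sb_dev)
  {
          u16 vlan_tci;

          /* Untagged traffic always goes to best-effort queue 0. */
          if (vlan_get_tag(skb, &vlan_tci))
                  return 0;

          /* Map the VLAN priority (PCP, 0-7) onto queues 0/1/2. */
          switch (vlan_tci >> VLAN_PRIO_SHIFT) {
          case 0 ... 3:
                  return 0;
          case 4 ... 5:
                  return 1;
          default:
                  return 2;
          }
  }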

Tested-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Reported-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21 14:24:20 -07:00
Joakim Zhang
471ff4455d net: fec: add FEC_QUIRK_HAS_MULTI_QUEUES represents i.MX6SX ENET IP
Frieder Schrempf reported a TX throughput issue [1]: it happens quite
often that the measured bandwidth in the TX direction drops from its
expected/nominal value to something like ~50% (for 100M) or ~67% (for
1G) connections.

[1] https://lore.kernel.org/linux-arm-kernel/421cc86c-b66f-b372-32f7-21e59f9a98bc@kontron.de/

The issue becomes clear after digging into it: the net core selects
queues when transmitting packets. Since FEC has not implemented the
ndo_select_queue callback yet, the core calls netdev_pick_tx, which
selects queues effectively at random.

For the i.MX6SX ENET IP with AVB support, the driver enables this
feature by default. According to the settings of the QOS/RCMRn/DMAnCFG
registers, AVB is configured with the credit-based scheme, with 50%
bandwidth for each of queues 1&2.

The following tests clarify the picture:
1) With the FEC_QUIRK_HAS_AVB quirk, the TX bandwidth fluctuation issue can be reproduced.
2) Without the FEC_QUIRK_HAS_AVB quirk, the issue cannot be reproduced.

The relevant difference with or without the FEC_QUIRK_HAS_AVB quirk is
whether we program the FTYPE field of the TxBD. As described above, the
AVB feature is enabled by default. With the FEC_QUIRK_HAS_AVB quirk,
frames in queue 0 are marked as non-AVB, and frames in queues 1&2 are
marked as AVB Class A&B. This is unreasonable if frames in queues 1&2
are not required to be time-sensitive: when the net core selects TX
queues randomly, the credit-based scheme kicks in and TX bandwidth
fluctuates. On the other hand, without the FEC_QUIRK_HAS_AVB quirk,
frames in queues 1&2 are all marked as non-AVB, so the credit-based
scheme does not take effect.

So how can we fix this TX throughput issue? Simply removing the
FEC_QUIRK_HAS_AVB quirk works if you hit this with time-insensitive
networking. However, this quirk is also used to identify the i.MX6SX,
and other settings depend on it. So this patch adds a new quirk,
FEC_QUIRK_HAS_MULTI_QUEUES, to represent the i.MX6SX, which makes it
safe to remove the FEC_QUIRK_HAS_AVB quirk.

The FEC_QUIRK_HAS_AVB quirk is set by default in the driver, and users
who do not know the driver's details would waste effort finding the
root cause; that is not what we want. The following patch implements a
fix so that users do not need to modify the driver.

Tested-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Reported-by: Frieder Schrempf <frieder.schrempf@kontron.de>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21 14:24:20 -07:00
Ido Schimmel
1e27b9e408 mlxsw: core: Add support for module EEPROM read by page
Add support for ethtool_ops::get_module_eeprom_by_page(), which allows
user space to read transceiver module EEPROM based on the passed
parameters.

The I2C address is not validated, in order to avoid module-specific
code. In case of a wrong address, an error will be returned by the
device's firmware.

Tested by comparing the output with that of the legacy (ioctl) method.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21 12:33:05 -07:00
Ido Schimmel
cecefb3a6e mlxsw: reg: Document possible MCIA status values
Will be used to emit meaningful messages to user space via extack in a
subsequent patch.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21 12:33:04 -07:00
Ido Schimmel
d51ea60e01 mlxsw: reg: Add bank number to MCIA register
Add bank number to MCIA (Management Cable Info Access) register in order
to allow access to banked pages on EEPROMs using CMIS (Common Management
Interface Specification) memory map.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21 12:33:04 -07:00
Dan Carpenter
43c9a81116 nfp: flower-ct: check for error in nfp_fl_ct_offload_nft_flow()
The nfp_fl_ct_add_flow() function can fail so we need to check for
failure.

Fixes: 95255017e0a8 ("nfp: flower-ct: add nft flows to nft list")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21 12:20:02 -07:00
Dan Carpenter
753ba09aa3 net: qualcomm: rmnet: fix two pointer math bugs
We recently changed these two pointers from void pointers to struct
pointers, which breaks the pointer math, so now "txphdr" points
beyond the end of the buffer.

Fixes: 56a967c4f7e5 ("net: qualcomm: rmnet: Remove some unneeded casts")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21 12:19:19 -07:00
Dan Carpenter
956c3ae411 net: hns3: fix a double shift bug
These flags are used to set and test bits like this:

	if (!test_bit(HCLGE_PTP_FLAG_TX_EN, &ptp->flags) ||

The issue is that test_bit() takes a bit number like 1, but we are
passing BIT(1) instead, so it's testing BIT(BIT(1)). This does not
cause a problem in practice because it is always done consistently and
the bit values are very small.
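
The bug pattern, spelled out (the define is representative, not copied
from the driver):

  #define HCLGE_PTP_FLAG_TX_EN    BIT(1)  /* defined as a mask ... */

  /* ... but test_bit()/set_bit() take a bit *number*, so this tests
   * bit number BIT(1) == 2, i.e. the mask BIT(2), not BIT(1):
   */
  if (!test_bit(HCLGE_PTP_FLAG_TX_EN, &ptp->flags))
          return;

  /* The fix is to define the flag as a plain bit number: */
  #define HCLGE_PTP_FLAG_TX_EN    1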

Fixes: 0bf5eb788512 ("net: hns3: add support for PTP")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21 12:14:11 -07:00