38400 Commits

Author SHA1 Message Date
Raed Salem
5589b8f1a2 net/mlx5e: Add IPsec support to uplink representor
Add the xfrm xdo and ipsec_init/cleanup to uplink representor to
support IPsec in SRIOV switchdev mode.

Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-26 00:31:25 -07:00
Tariq Toukan
e8c8276145 net/mlx5e: kTLS, Add stats for number of deleted kTLS TX offloaded connections
Expose ethtool SW counter for the number of kTLS device-offloaded
TX connections that are finished and deleted.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-26 00:31:23 -07:00
Eli Cohen
5bd8cee2b9 net/mlx5: SF, Improve performance in SF allocation
Avoid second traversal on the SF table by recording the first free entry
and using it in case the looked up entry was not found in the table.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-26 00:31:20 -07:00
Ariel Levkovich
6cdc686aa3 net/mlx5: Increase hairpin buffer size
The max packet size a hairpin queue is able to handle
is determined by the total hairpin buffer size divided
by 4.

Currently the buffer size is set to 32KB which makes
the max packet size to be 8KB and doesn't support
jumbo frames of size 9KB.

This change increases the buffer size to 64KB to increase
the max frame size and support 9KB frames.

Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-26 00:31:18 -07:00
Yevgeny Kliteynik
1ab6dc35e9 net/mlx5: DR, Add support for flow sampler offload
Add SW steering support for sFlow / flow sampler action.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-26 00:31:15 -07:00
Yevgeny Kliteynik
6f8515568e net/mlx5: Compare sampler flow destination ID in fs_core
When comparing sampler flow destinations,
in fs_core, consider sampler ID as well.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-26 00:31:13 -07:00
David S. Miller
ff8744b5eb Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2021-06-25

This series contains updates to ice driver only.

Jesse adds support for tracepoints to aide in debugging.

Maciej adds support for PTP auxiliary pin support.

Victor removes the VSI info from the old aggregator when moving the VSI
to another aggregator.

Tony removes an unnecessary VSI assignment.

Christophe Jaillet fixes a memory leak for failed allocation in
ice_pf_dcb_cfg().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-25 11:59:11 -07:00
Marcin Wojtas
ac53c26433 net: mdiobus: withdraw fwnode_mdbiobus_register
The newly implemented fwnode_mdbiobus_register turned out to be
problematic - in case the fwnode_/of_/acpi_mdio are built as
modules, a dependency cycle can be observed during the depmod phase of
modules_install, eg.:

depmod: ERROR: Cycle detected: fwnode_mdio -> of_mdio -> fwnode_mdio
depmod: ERROR: Found 2 modules in dependency cycles!

OR:

depmod: ERROR: Cycle detected: acpi_mdio -> fwnode_mdio -> acpi_mdio
depmod: ERROR: Found 2 modules in dependency cycles!

A possible solution could be to rework fwnode_mdiobus_register,
so that to merge the contents of acpi_mdiobus_register and
of_mdiobus_register. However feasible, such change would
be very intrusive and affect huge amount of the of_mdiobus_register
users.

Since there are currently 2 users of ACPI and MDIO
(xgmac_mdio and mvmdio), withdraw the fwnode_mdbiobus_register
and roll back to a simple 'if' condition in affected drivers.

Fixes: 62a6ef6a996f ("net: mdiobus: Introduce fwnode_mdbiobus_register()")
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-25 11:46:29 -07:00
Christophe JAILLET
b81c191c46 ice: Fix a memory leak in an error handling path in 'ice_pf_dcb_cfg()'
If this 'kzalloc()' fails we must free some resources as in all the other
error handling paths of this function.

Fixes: 348048e724a0 ("ice: Implement iidc operations")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-25 11:30:50 -07:00
Tony Nguyen
70fa0a0780 ice: remove unnecessary VSI assignment
ice_get_vf_vsi() is being called twice for the same VSI. Remove the
unnecessary call/assignment.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
2021-06-25 11:30:50 -07:00
Victor Raj
37c592062b ice: remove the VSI info from previous agg
Remove the VSI info from previous aggregator after moving the VSI to a
new aggregator.

Signed-off-by: Victor Raj <victor.raj@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-25 11:30:49 -07:00
Maciej Machnikowski
172db5f91d ice: add support for auxiliary input/output pins
The E810 device supports programmable pins for enabling both input and
output events related to the PTP hardware clock. This includes both
output signals with programmable period, as well as timestamping of
events on input pins.

Add support for enabling these using the CONFIG_PTP_1588_CLOCK
interface.

This allows programming the software defined pins to take advantage of
the hardware clock features.

Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-25 11:30:49 -07:00
David Thompson
f92e1869d7 Add Mellanox BlueField Gigabit Ethernet driver
This patch adds build and driver logic for the "mlxbf_gige"
Ethernet driver from Mellanox Technologies. The second
generation BlueField SoC from Mellanox supports an
out-of-band GigaBit Ethernet management port to the Arm
subsystem.  This driver supports TCP/IP network connectivity
for that port, and provides back-end routines to handle
basic ethtool requests.

The driver interfaces to the Gigabit Ethernet block of
BlueField SoC via MMIO accesses to registers, which contain
control information or pointers describing transmit and
receive resources.  There is a single transmit queue, and
the port supports transmit ring sizes of 4 to 256 entries.
There is a single receive queue, and the port supports
receive ring sizes of 32 to 32K entries. The transmit and
receive rings are allocated from DMA coherent memory. There
is a 16-bit producer and consumer index per ring to denote
software ownership and hardware ownership, respectively.

The main driver logic such as probe(), remove(), and netdev
ops are in "mlxbf_gige_main.c".  Logic in "mlxbf_gige_rx.c"
and "mlxbf_gige_tx.c" handles the packet processing for
receive and transmit respectively.

The logic in "mlxbf_gige_ethtool.c" supports the handling
of some basic ethtool requests: get driver info, get ring
parameters, get registers, and get statistics.

The logic in "mlxbf_gige_mdio.c" is the driver controlling
the Mellanox BlueField hardware that interacts with a PHY
device via MDIO/MDC pins.  This driver does the following:
  - At driver probe time, it configures several BlueField MDIO
    parameters such as sample rate, full drive, voltage and MDC
  - It defines functions to read and write MDIO registers and
    registers the MDIO bus.
  - It defines the phy interrupt handler reporting a
    link up/down status change
  - This driver's probe is invoked from the main driver logic
    while the phy interrupt handler is registered in ndo_open.

Driver limitations
  - Only supports 1Gbps speed
  - Only supports GMII protocol
  - Supports maximum packet size of 2KB
  - Does not support scatter-gather buffering

Testing
  - Successful build of kernel for ARM64, ARM32, X86_64
  - Tested ARM64 build on FastModels & Palladium
  - Tested ARM64 build on several Mellanox boards that are built with
    the BlueField-2 SoC.  The testing includes coverage in the areas
    of networking (e.g. ping, iperf, ifconfig, route), file transfers
    (e.g. SCP), and various ethtool options relevant to this driver.

Signed-off-by: David Thompson <davthompson@nvidia.com>
Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
Reviewed-by: Liming Sun <limings@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-25 11:20:23 -07:00
Jesse Brandeburg
3089cf6d3c ice: add tracepoints
This patch is modeled after one by Scott Peterson for i40e.

Add tracepoints to the driver, via a new file ice_trace.h and some new
trace calls added in interesting places in the driver. Add some tracing
for DIMLIB to help debug interrupt moderation problems.

Performance should not be affected, and this can be very useful
for debugging and adding new trace events to paths in the future.

Note eBPF programs can attach to these events, as well as perf
can count them since we're attaching to the events subsystem
in the kernel.

Co-developed-by: Ben Shelton <benjamin.h.shelton@intel.com>
Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-25 08:32:18 -07:00
Jian-Hong Pan
19938bafa7 net: bcmgenet: Add mdio-bcm-unimac soft dependency
The Broadcom UniMAC MDIO bus from mdio-bcm-unimac module comes too late.
So, GENET cannot find the ethernet PHY on UniMAC MDIO bus. This leads
GENET fail to attach the PHY as following log:

bcmgenet fd580000.ethernet: GENET 5.0 EPHY: 0x0000
...
could not attach to PHY
bcmgenet fd580000.ethernet eth0: failed to connect to PHY
uart-pl011 fe201000.serial: no DMA platform data
libphy: bcmgenet MII bus: probed
...
unimac-mdio unimac-mdio.-19: Broadcom UniMAC MDIO bus

It is not just coming too late, there is also no way for the module
loader to figure out the dependency between GENET and its MDIO bus
driver unless we provide this MODULE_SOFTDEP hint.

This patch adds the soft dependency to load mdio-bcm-unimac module
before genet module to fix this issue.

Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=213485
Fixes: 9a4e79697009 ("net: bcmgenet: utilize generic Broadcom UniMAC MDIO controller driver")
Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 15:51:26 -07:00
Ido Schimmel
911bd1b1f0 mlxsw: core_env: Avoid unnecessary memcpy()s
Simply get a pointer to the data in the register payload instead of
copying it to a temporary buffer.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 15:41:55 -07:00
Bailey Forrest
e8192476de gve: Fix warnings reported for DQO patchset
https://patchwork.kernel.org/project/netdevbpf/list/?series=506637&state=*

- Remove unused variable
- Use correct integer type for string formatting.
- Remove `inline` in C files

Fixes: 9c1a59a2f4bc ("gve: DQO: Add ring allocation and initialization")
Fixes: a57e5de476be ("gve: DQO: Add TX path")
Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 15:38:29 -07:00
Bailey Forrest
9b8dd5e5ea gve: DQO: Add RX path
The RX queue has an array of `gve_rx_buf_state_dqo` objects. All
allocated pages have an associated buf_state object. When a buffer is
posted on the RX buffer queue, the buffer ID will be the buf_state's
index into the RX queue's array.

On packet reception, the RX queue will have one descriptor for each
buffer associated with a received packet. Each RX descriptor will have
a buffer_id that was posted on the buffer queue.

Notable mentions:

- We use a default buffer size of 2048 bytes. Based on page size, we
  may post separate sections of a single page as separate buffers.

- The driver holds an extra reference on pages passed up the receive
  path with an skb and keeps these pages on a list. When posting new
  buffers to the NIC, we check if any of these pages has only our
  reference, or another buffer sized segment of the page has no
  references. If so, it is free to reuse. This page recycling approach
  is a common netdev optimization that reduces page alloc/free calls.

- Pages in the free list have a page_count bias in order to avoid an
  atomic increment of pagecount every time we attempt to reuse a page.
  # references = page_count() - bias

- In order to track when a page is safe to reuse, we keep track of the
  last offset which had a single SKB reference. When this occurs, it
  implies that every single other offset is reusable. Otherwise, we
  don't know if offsets can be safely reused.

- We maintain two free lists of pages. List #1 (recycled_buf_states)
  contains pages we know can be reused right away. List #2
  (used_buf_states) contains pages which cannot be used right away. We
  only attempt to get pages from list #2 when list #1 is empty. We only
  attempt to use a small fixed number pages from list #2 before giving
  up and allocating a new page. Both lists are FIFOs in hope that by the
  time we attempt to reuse a page, the references were dropped.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00
Bailey Forrest
a57e5de476 gve: DQO: Add TX path
TX SKBs will have their buffers DMA mapped with the device. Each buffer
will have at least one TX descriptor associated. Each SKB will also have
a metadata descriptor.

Each TX queue maintains an array of `gve_tx_pending_packet_dqo` objects.
Every TX SKB will have an associated pending_packet object. A TX SKB's
descriptors will use its pending_packet's index as the completion tag,
which will be returned on the TX completion queue.

The device implements a "flow-miss model". Most packets will simply
receive a packet completion. The flow-miss system may choose to process
a packet based on its contents. A TX packet which experiences a flow
miss would receive a miss completion followed by a later reinjection
completion. The miss-completion is received when the packet starts to be
processed by the flow-miss system and the reinjection completion is
received when the flow-miss system completes processing the packet and
sends it on the wire.

Notable mentions:

- Buffers may be freed after receiving the miss-completion, but in order
  to avoid packet reordering, we do not complete the SKB until receiving
  the reinjection completion.

- The driver must robustly handle the unlikely scenario where a miss
  completion does not have an associated reinjection completion. This is
  accomplished by maintaining a list of packets which have a pending
  reinjection completion. After a short timeout (5 seconds), the
  SKB and buffers are released and the pending_packet is moved to a
  second list which has a longer timeout (60 seconds), where the
  pending_packet will not be reused. When the longer timeout elapses,
  the driver may assume the reinjection completion would never be
  received and the pending_packet may be reused.

- Completion handling is triggered by an interrupt and is done in the
  NAPI poll function. Because the TX path and completion exist in
  different threading contexts they maintain their own lists for free
  pending_packet objects. The TX path uses a lock-free approach to steal
  the list from the completion path.

- Both the TSO context and general context descriptors have metadata
  bytes. The device requires that if multiple descriptors contain the
  same field, each descriptor must have the same value set for that
  field.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00
Bailey Forrest
0dcc144a79 gve: DQO: Configure interrupts on device up
When interrupts are first enabled, we also set the ratelimits, which
will be static for the entire usage of the device.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00
Bailey Forrest
9c1a59a2f4 gve: DQO: Add ring allocation and initialization
Allocate the buffer and completion ring structures. Do not populate the
rings yet. That will happen in the respective rx and tx datapath
follow-on patches

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00
Bailey Forrest
5e8c5adf95 gve: DQO: Add core netdev features
Add napi netdev device registration, interrupt handling and initial tx
and rx polling stubs. The stubs will be filled in follow-on patches.

Also:
- LRO feature advertisement and handling
- Also update ethtool logic

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00
Bailey Forrest
1f6228e459 gve: Update adminq commands to support DQO queues
DQO queue creation requires additional parameters:
- TX completion/RX buffer queue size
- TX completion/RX buffer queue address
- TX/RX queue size
- RX buffer size

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00
Bailey Forrest
a4aa1f1e69 gve: Add DQO fields for core data structures
- Add new DQO datapath structures:
  - `gve_rx_buf_queue_dqo`
  - `gve_rx_compl_queue_dqo`
  - `gve_rx_buf_state_dqo`
  - `gve_tx_desc_dqo`
  - `gve_tx_pending_packet_dqo`

- Incorporate these into the existing ring data structures:
  - `gve_rx_ring`
  - `gve_tx_ring`

Noteworthy mentions:

- `gve_rx_buf_state` represents an RX buffer which was posted to HW.
  Each RX queue has an array of these objects and the index into the
  array is used as the buffer_id when posted to HW.

- `gve_tx_pending_packet_dqo` is treated similarly for TX queues. The
  completion_tag is the index into the array.

- These two structures have links for linked lists which are represented
  by 16b indexes into a contiguous array of these structures.
  This reduces memory footprint compared to 64b pointers.

- We use unions for the writeable datapath structures to reduce cache
  footprint. GQI specific members will renamed like DQO members in a
  future patch.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:38 -07:00
Bailey Forrest
223198183f gve: Add dqo descriptors
General description of rings and descriptors:

TX ring is used for sending TX packet buffers to the NIC. It has the
following descriptors:
- `gve_tx_pkt_desc_dqo` - Data buffer descriptor
- `gve_tx_tso_context_desc_dqo` - TSO context descriptor
- `gve_tx_general_context_desc_dqo` - Generic metadata descriptor

Metadata is a collection of 12 bytes. We define `gve_tx_metadata_dqo`
which represents the logical interpetation of the metadata bytes. It's
helpful to define this structure because the metadata bytes exist in
multiple descriptor types (including `gve_tx_tso_context_desc_dqo`),
and the device requires same field has the same value in all
descriptors.

The TX completion ring is used to receive completions from the NIC.
Having a separate ring allows for completions to be out of order. The
completion descriptor `gve_tx_compl_desc` has several different types,
most important are packet and descriptor completions. Descriptor
completions are used to notify the driver when descriptors sent on the
TX ring are done being consumed. The descriptor completion is only used
to signal that space is cleared in the TX ring. A packet completion will
be received when a packet transmitted on the TX queue is done being
transmitted.

In addition there are "miss" and "reinjection" completions. The device
implements a "flow-miss model". Most packets will simply receive a
packet completion. The flow-miss system may choose to process a packet
based on its contents. A TX packet which experiences a flow miss would
receive a miss completion followed by a later reinjection completion.
The miss-completion is received when the packet starts to be processed
by the flow-miss system and the reinjection completion is received when
the flow-miss system completes processing the packet and sends it on the
wire.

The RX buffer ring is used to send buffers to HW via the
`gve_rx_desc_dqo` descriptor.

Received packets are put into the RX queue by the device, which
populates the `gve_rx_compl_desc_dqo` descriptor. The RX descriptors
refer to buffers posted by the buffer queue. Received buffers may be
returned out of order, such as when HW LRO is enabled.

Important concepts:
- "TX" and "RX buffer" queues, which send descriptors to the device, use
  MMIO doorbells to notify the device of new descriptors.

- "RX" and "TX completion" queues, which receive descriptors from the
  device, use a "generation bit" to know when a descriptor was populated
  by the device. The driver initializes all bits with the "current
  generation". The device will populate received descriptors with the
  "next generation" which is inverted from the current generation. When
  the ring wraps, the current/next generation are swapped.

- It's the driver's responsibility to ensure that the RX and TX
  completion queues are not overrun. This can be accomplished by
  limiting the number of descriptors posted to HW.

- TX packets have a 16 bit completion_tag and RX buffers have a 16 bit
  buffer_id. These will be returned on the TX completion and RX queues
  respectively to let the driver know which packet/buffer was completed.

Bitfields are used to describe descriptor fields. This notation is more
concise and readable than shift-and-mask. It is possible because the
driver is restricted to little endian platforms.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Bailey Forrest
c4b87ac876 gve: Add support for DQO RX PTYPE map
Unlike GQI, DQO RX descriptors do not contain the L3 and L4 type of the
packet. L3 and L4 types are necessary in order to set the hash and csum
on RX SKBs correctly.

DQO RX descriptors instead contain a 10 bit PTYPE index. The PTYPE map
enables the device to tell the driver how to map from PTYPE index to
L3/L4 type.

The device doesn't provide any guarantees about the range of possible
PTYPEs, so we just use a 1024 entry array to implement a fast mapping
structure.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Bailey Forrest
5ca2265eef gve: adminq: DQO specific device descriptor logic
- In addition to TX and RX queues, DQO has TX completion and RX buffer
  queues.
  - TX completions are received when the device has completed sending a
    packet on the wire.
  - RX buffers are posted on a separate queue form the RX completions.
- DQO descriptor rings are allowed to be smaller than PAGE_SIZE.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Bailey Forrest
a5886ef4f4 gve: Introduce per netdev enum gve_queue_format
The currently supported queue formats are:
- GQI_RDA - GQI with raw DMA addressing
- GQI_QPL - GQI with queue page list
- DQO_RDA - DQO with raw DMA addressing

The old `gve_priv.raw_addressing` value is only used for GQI_RDA, so we
remove it in favor of just checking against GQI_RDA

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Bailey Forrest
8a39d3e0da gve: Introduce a new model for device options
The current model uses an integer ID and a fixed size struct for the
parameters of each device option.

The new model allows the device option structs to grow in size over
time. A driver may assume that changes to device option structs will
always be appended.

New device options will also generally have a
`supported_features_mask` so that the driver knows which fields within a
particular device option are enabled.

`gve_device_option.feat_mask` is changed to `required_features_mask`,
and it is a bitmask which must match the value expected by the driver.
This gives the device the ability to break backwards compatibility with
old drivers for certain features by blocking the old drivers from trying
to use the feature.

We maintain ABI compatibility with the old model for
GVE_DEV_OPT_ID_RAW_ADDRESSING in case a driver is using a device which
does not support the new model.

This patch introduces some new terminology:

RDA - Raw DMA Addressing - Buffers associated with SKBs are directly DMA
      mapped and read/updated by the device.

QPL - Queue Page Lists - Driver uses bounce buffers which are DMA mapped
      with the device for read/write and data is copied from/to SKBs.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Bailey Forrest
920fb45193 gve: Make gve_rx_slot_page_info.page_offset an absolute offset
Using `page_offset` like a boolean means a page may only be split into
two sections. With page sizes larger than 4k, this can be very wasteful.
Future commits in this patchset use `struct gve_rx_slot_page_info` in a
way which supports a fixed buffer size and a variable page size.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Bailey Forrest
35f9b2f43f gve: gve_rx_copy: Move padding to an argument
Future use cases will have a different padding value.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Bailey Forrest
dbdaa67540 gve: Move some static functions to a common file
These functions will be shared by the GQI and DQO variants of the GVNIC
driver as of follow-up patches in this series.

Signed-off-by: Bailey Forrest <bcf@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 12:47:37 -07:00
Steen Hegelund
af4b11022e net: sparx5: add ethtool configuration and statistics support
This adds statistic counters for the network interfaces provided
by the driver.  It also adds CPU port counters (which are not
exposed by ethtool).
This also adds support for configuring the network interface
parameters via ethtool: speed, duplex, aneg etc.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:13 -07:00
Steen Hegelund
0a9d48ad0d net: sparx5: add calendar bandwidth allocation support
This configures the Sparx5 calendars according to the bandwidth
requested in the Device Tree nodes.
It also checks if the total requested bandwidth is within the
specs of the detected Sparx5 models limits.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:13 -07:00
Steen Hegelund
d6fce51419 net: sparx5: add switching support
This adds SwitchDev support by hardware offloading the
software bridge.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:13 -07:00
Steen Hegelund
78eab33bb6 net: sparx5: add vlan support
This adds Sparx5 VLAN support.

Sparx5 has more VLAN features than provided here, but these will be added
in later series. For now we only add the basic L2 features.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:13 -07:00
Steen Hegelund
b37a1bae74 net: sparx5: add mactable support
This adds the Sparx5 MAC tables: listening for MAC table updates and
updating on request.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:13 -07:00
Steen Hegelund
946e7fd505 net: sparx5: add port module support
This add configuration of the Sparx5 port module instances.

Sparx5 has in total 65 logical ports (denoted D0 to D64) and 33
physical SerDes connections (S0 to S32).  The 65th port (D64) is fixed
allocated to SerDes0 (S0). The remaining 64 ports can in various
multiplexing scenarios be connected to the remaining 32 SerDes using
QSGMII, or USGMII or USXGMII extenders. 32 of the ports can have a 1:1
mapping to the 32 SerDes.

Some additional ports (D65 to D69) are internal to the device and do not
connect to port modules or SerDes macros. For example, internal ports are
used for frame injection and extraction to the CPU queues.

The 65 logical ports are split up into the following blocks.

- 13 x 5G ports (D0-D11, D64)
- 32 x 2G5 ports (D16-D47)
- 12 x 10G ports (D12-D15, D48-D55)
- 8 x 25G ports (D56-D63)

Each logical port supports different line speeds, and depending on the
speeds supported, different port modules (MAC+PCS) are needed. A port
supporting 5 Gbps, 10 Gbps, or 25 Gbps as maximum line speed, will have a
DEV5G, DEV10G, or DEV25G module to support the 5 Gbps, 10 Gbps (incl 5
Gbps), or 25 Gbps (including 10 Gbps and 5 Gbps) speeds. As well as, it
will have a shadow DEV2G5 port module to support the lower speeds
(10/100/1000/2500Mbps). When a port needs to operate at lower speed and the
shadow DEV2G5 needs to be connected to its corresponding SerDes

Not all interface modes are supported in this series, but will be added at
a later stage.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:12 -07:00
Steen Hegelund
f3cad2611a net: sparx5: add hostmode with phylink support
This patch adds netdevs and phylink support for the ports in the switch.
It also adds register based injection and extraction for these ports.

Frame DMA support for injection and extraction will be added in a later
series.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:12 -07:00
Steen Hegelund
3cfa11bac9 net: sparx5: add the basic sparx5 driver
This adds the Sparx5 basic SwitchDev driver framework with IO range
mapping, switch device detection and core clock configuration.

Support for ports, phylink, netdev, mactable etc. are in the following
patches.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: Lars Povlsen <lars.povlsen@microchip.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-24 11:28:12 -07:00
David Wilder
7525de2516 ibmveth: Set CHECKSUM_PARTIAL if NULL TCP CSUM.
TCP checksums on received packets may be set to NULL by the sender if CSO
is enabled. The hypervisor flags these packets as check-sum-ok and the
skb is then flagged CHECKSUM_UNNECESSARY. If these packets are then
forwarded the sender will not request CSO due to the CHECKSUM_UNNECESSARY
flag. The result is a TCP packet sent with a bad checksum. This change
sets up CHECKSUM_PARTIAL on these packets causing the sender to correctly
request CSUM offload.

Signed-off-by: David Wilder <dwilder@us.ibm.com>
Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Tested-by: Cristobal Forno <cforno12@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23 12:54:11 -07:00
David S. Miller
fe87797bf2 mlx5-net-next-2021-06-22
1) Various minor cleanups and fixes from net-next branch
 2) Optimize mlx5 feature check on tx and
    a fix to allow Vxlan with Ipsec offloads
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmDSYyYACgkQSD+KveBX
 +j45Bgf+M7Vg3OK4fA7ZURrbxu2EiCQB4XprO/it7YSzpiZx634DNNRzWXQ2mLJD
 jx5TcmAFVUKkGmx/qrPYe/Y9c9l5s6JMjwACL5aEawXtvPzI/q1KBx/n5L+CoFYw
 lO1IpBPpkwqIbeIl9cwata7IJ6aTeOGjqfQ//Fodfwga063Aaggig6sEcPkr0Ewe
 wcuJmblnw/qOkSI2BlSOyixYiVjPRDF7cAVRTBK4/DCDFiGJTiaj8w0JgfdS2zVs
 3xMrYajdz7qArMcqbuQe59KojeYj4hxALUSs+s9ks8qeIrbM+hZ9sH5m0dJgM4P6
 7Pg87LD6PFDV31DWF0XM3KgdqfiZxw==
 =aszX
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-net-next-2021-06-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-net-next-2021-06-22

1) Various minor cleanups and fixes from net-next branch
2) Optimize mlx5 feature check on tx and
   a fix to allow Vxlan with Ipsec offloads
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23 12:48:07 -07:00
Huy Nguyen
f1267798c9 net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload
The packet is VXLAN packet over IPsec transport mode tunnel
which has the following format: [IP1 | ESP | UDP | VXLAN | IP2 | TCP]
NVIDIA ConnectX card cannot do checksum offload for two L4 headers.
The solution is using the checksum partial offload similar to
VXLAN | TCP packet. Hardware calculates IP1, IP2 and TCP checksums and
software calculates UDP checksum. However, unlike VXLAN | TCP case,
IPsec's mlx5 driver cannot access the inner plaintext IP protocol type.
Therefore, inner_ipproto is added in the sec_path structure
to provide this information. Also, utilize the skb's csum_start to
program L4 inner checksum offset.

While at it, remove the call to mlx5e_set_eseg_swp and setup software parser
fields directly in mlx5e_ipsec_set_swp. mlx5e_set_eseg_swp is not
needed as the two features (GENEVE and IPsec) are different and adding
this sharing layer creates unnecessary complexity and affect
performance.

For the case VXLAN packet over IPsec tunnel mode tunnel, checksum offload
is disabled because the hardware does not support checksum offload for
three L3 (IP) headers.

Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-22 15:24:35 -07:00
Huy Nguyen
dd7cf00f87 net/mlx5: Optimize mlx5e_feature_checks for non IPsec packet
mlx5e_ipsec_feature_check belongs to mlx5e_tunnel_features_check.
Also, IPsec is not the default configuration so it should be
checked at the end instead of the beginning of mlx5e_features_check.

Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-22 15:24:29 -07:00
caihuoqing
5bf3ee97f4 net/mlx5: remove "default n" from Kconfig
remove "default n" and "No" is default

Signed-off-by: caihuoqing <caihuoqing@baidu.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-22 15:24:26 -07:00
Colin Ian King
2cc7dad75d net/mlx5: Fix spelling mistake "enught" -> "enough"
There is a spelling mistake in a mlx5_core_err error message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-22 15:24:24 -07:00
Nathan Chancellor
d4472a4b8c net/mlx5: Use cpumask_available() in mlx5_eq_create_generic()
When CONFIG_CPUMASK_OFFSTACK is unset, cpumask_var_t is not a pointer
but a single element array, meaning its address in a structure cannot be
NULL as long as it is not the first element, which it is not. This
results in a clang warning:

drivers/net/ethernet/mellanox/mlx5/core/eq.c:715:14: warning: address of
array 'param->affinity' will always evaluate to 'true'
[-Wpointer-bool-conversion]
        if (!param->affinity)
            ~~~~~~~~^~~~~~~~
1 warning generated.

The helper cpumask_available was added in commit f7e30f01a9e2 ("cpumask:
Add helper cpumask_available()") to handle situations like this so use
it to keep the meaning of the code the same while resolving the warning.

Fixes: e4e3f24b822f ("net/mlx5: Provide cpumask at EQ creation phase")
Link: https://github.com/ClangBuiltLinux/linux/issues/1400
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-22 15:24:20 -07:00
Jiapeng Chong
9201ab5f55 net/mlx5: Fix missing error code in mlx5_init_fs()
The error code is missing in this code scenario, add the error code
'-ENOMEM' to the return value 'err'.

Eliminate the follow smatch warning:

drivers/net/ethernet/mellanox/mlx5/core/fs_core.c:2973 mlx5_init_fs()
warn: missing error code 'err'.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Fixes: 4a98544d1827 ("net/mlx5: Move chains ft pool to be used by all firmware steering").
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-22 15:24:17 -07:00
Lorenzo Bianconi
aff0824dc4 net: marvell: return csum computation result from mvneta_rx_csum/mvpp2_rx_csum
This is a preliminary patch to add hw csum hint support to
mvneta/mvpp2 xdp implementation

Tested-by: Matteo Croce <mcroce@linux.microsoft.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:55:05 -07:00
Dan Carpenter
b0e03950dd stmmac: dwmac-loongson: fix uninitialized variable in loongson_dwmac_probe()
The "mdio" variable is never set to false.  Also it should be a bool
type instead of int.

Fixes: 30bba69d7db4 ("stmmac: pci: Add dwmac support for Loongson")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22 10:44:50 -07:00