41037 Commits

Author SHA1 Message Date
Arthur Kiyanovski
e344546980 net: ena: Move reset completion print to the reset function
The print that indicates that device reset has finished is
currently called from ena_restore_device().

Move it to ena_fw_reset_device() as it is the more natural
location for it.

Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-07 19:25:51 -08:00
Arthur Kiyanovski
09f8676eae net: ena: Remove redundant return code check
The ena_com_indirect_table_fill_entry() function only returns -EINVAL
or 0, no need to check for -EOPNOTSUPP.

Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-07 19:25:50 -08:00
Arthur Kiyanovski
394c48e08b net: ena: Change ENI stats support check to use capabilities field
Use the capabilities field to query the device for ENI stats
support.

This replaces the previous method that tried to get the ENI stats
during ena_probe() and used the success or failure as an indication
for support by the device.

Remove eni_stats_supported field from struct ena_adapter. This field
was used for the previous method of queriying for ENI stats support.

Change the severity level of the print in case of
ena_com_get_eni_stats() failure from info to error.
With the previous method of querying form ENI stats support, failure
to get ENI stats was normal for devices that don't support it.
With the use of the capabilities field such a failure is unexpected,
as it is called only if the device reported that it supports ENI
stats.

Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-07 19:25:50 -08:00
Arthur Kiyanovski
a2d5d6a70f net: ena: Add capabilities field with support for ENI stats capability
This bitmask field indicates what capabilities are supported by the
device.

The capabilities field differs from the 'supported_features' field which
indicates what sub-commands for the set/get feature commands are
supported. The sub-commands are specified in the 'feature_id' field of
the 'ena_admin_set_feat_cmd' struct in the following way:

        struct ena_admin_set_feat_cmd cmd;

        cmd.aq_common_descriptor.opcode = ENA_ADMIN_SET_FEATURE;
        cmd.feat_common.feature_

The 'capabilities' field, on the other hand, specifies different
capabilities of the device. For example, whether the device supports
querying of ENI stats.

Also add an enumerator which contains all the capabilities. The
first added capability macro is for ENI stats feature.

Capabilities are queried along with the other device attributes (in
ena_com_get_dev_attr_feat()) during device initialization and are stored
in the ena_com_dev struct. They can be later queried using the
ena_com_get_cap() helper function.

Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-07 19:25:50 -08:00
Arthur Kiyanovski
7dcf922152 net: ena: Change return value of ena_calc_io_queue_size() to void
ena_calc_io_queue_size() always returns 0, therefore make it a
void function and update the calling function to stop checking
the return value.

Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-07 19:25:50 -08:00
Sunil Goutham
6dc9a23e29 octeontx2-af: Fix interrupt name strings
Fixed interrupt name string logic which currently results
in wrong memory location being accessed while dumping
/proc/interrupts.

Fixes: 4826090719d4 ("octeontx2-af: Enable CPT HW interrupts")
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://lore.kernel.org/r/1641538505-28367-1-git-send-email-sbhatta@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-07 19:07:06 -08:00
Vladimir Oltean
5cad43a52e net: dsa: felix: add port fast age support
Add support for flushing the MAC table on a given port in the ocelot
switch library, and use this functionality in the felix DSA driver.

This operation is needed when a port leaves a bridge to become
standalone, and when the learning is disabled, and when the STP state
changes to a state where no FDB entry should be present.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220107144229.244584-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-07 18:58:25 -08:00
Vladimir Oltean
a14e6b69f3 net: mscc: ocelot: fix incorrect balancing with down LAG ports
Assuming the test setup described here:
https://patchwork.kernel.org/project/netdevbpf/cover/20210205130240.4072854-1-vladimir.oltean@nxp.com/
(swp1 and swp2 are in bond0, and bond0 is in a bridge with swp0)

it can be seen that when swp1 goes down (on either board A or B), then
traffic that should go through that port isn't forwarded anywhere.

A dump of the PGID table shows the following:

PGID_DST[0] = ports 0
PGID_DST[1] = ports 1
PGID_DST[2] = ports 2
PGID_DST[3] = ports 3
PGID_DST[4] = ports 4
PGID_DST[5] = ports 5
PGID_DST[6] = no ports
PGID_AGGR[0] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[1] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[2] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[3] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[4] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[5] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[6] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[7] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[8] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[9] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[10] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[11] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[12] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[13] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[14] = ports 0, 1, 2, 3, 4, 5
PGID_AGGR[15] = ports 0, 1, 2, 3, 4, 5
PGID_SRC[0] = ports 1, 2
PGID_SRC[1] = ports 0
PGID_SRC[2] = ports 0
PGID_SRC[3] = no ports
PGID_SRC[4] = no ports
PGID_SRC[5] = no ports
PGID_SRC[6] = ports 0, 1, 2, 3, 4, 5

Whereas a "good" PGID configuration for that setup should have looked
like this:

PGID_DST[0] = ports 0
PGID_DST[1] = ports 1, 2
PGID_DST[2] = ports 1, 2
PGID_DST[3] = ports 3
PGID_DST[4] = ports 4
PGID_DST[5] = ports 5
PGID_DST[6] = no ports
PGID_AGGR[0] = ports 0, 2, 3, 4, 5
PGID_AGGR[1] = ports 0, 2, 3, 4, 5
PGID_AGGR[2] = ports 0, 2, 3, 4, 5
PGID_AGGR[3] = ports 0, 2, 3, 4, 5
PGID_AGGR[4] = ports 0, 2, 3, 4, 5
PGID_AGGR[5] = ports 0, 2, 3, 4, 5
PGID_AGGR[6] = ports 0, 2, 3, 4, 5
PGID_AGGR[7] = ports 0, 2, 3, 4, 5
PGID_AGGR[8] = ports 0, 2, 3, 4, 5
PGID_AGGR[9] = ports 0, 2, 3, 4, 5
PGID_AGGR[10] = ports 0, 2, 3, 4, 5
PGID_AGGR[11] = ports 0, 2, 3, 4, 5
PGID_AGGR[12] = ports 0, 2, 3, 4, 5
PGID_AGGR[13] = ports 0, 2, 3, 4, 5
PGID_AGGR[14] = ports 0, 2, 3, 4, 5
PGID_AGGR[15] = ports 0, 2, 3, 4, 5
PGID_SRC[0] = ports 1, 2
PGID_SRC[1] = ports 0
PGID_SRC[2] = ports 0
PGID_SRC[3] = no ports
PGID_SRC[4] = no ports
PGID_SRC[5] = no ports
PGID_SRC[6] = ports 0, 1, 2, 3, 4, 5

In other words, in the "bad" configuration, the attempt is to remove the
inactive swp1 from the destination ports via PGID_DST. But when a MAC
table entry is learned, it is learned towards PGID_DST 1, because that
is the logical port id of the LAG itself (it is equal to the lowest
numbered member port). So when swp1 becomes inactive, if we set
PGID_DST[1] to contain just swp1 and not swp2, the packet will not have
any chance to reach the destination via swp2.

The "correct" way to remove swp1 as a destination is via PGID_AGGR
(remove swp1 from the aggregation port groups for all aggregation
codes). This means that PGID_DST[1] and PGID_DST[2] must still contain
both swp1 and swp2. This makes the MAC table still treat packets
destined towards the single-port LAG as "multicast", and the inactive
ports are removed via the aggregation code tables.

The change presented here is a design one: the ocelot_get_bond_mask()
function used to take an "only_active_ports" argument. We don't need
that. The only call site that specifies only_active_ports=true,
ocelot_set_aggr_pgids(), must retrieve the entire bonding mask, because
it must program that into PGID_DST. Additionally, it must also clear the
inactive ports from the bond mask here, which it can't do if bond_mask
just contains the active ports:

	ac = ocelot_read_rix(ocelot, ANA_PGID_PGID, i);
	ac &= ~bond_mask;  <---- here
	/* Don't do division by zero if there was no active
	 * port. Just make all aggregation codes zero.
	 */
	if (num_active_ports)
		ac |= BIT(aggr_idx[i % num_active_ports]);
	ocelot_write_rix(ocelot, ac, ANA_PGID_PGID, i);

So it becomes the responsibility of ocelot_set_aggr_pgids() to take
ocelot_port->lag_tx_active into consideration when populating the
aggr_idx array.

Fixes: 23ca3b727ee6 ("net: mscc: ocelot: rebalance LAGs on link up/down events")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220107164332.402133-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-07 18:54:59 -08:00
Jason Wang
5322c68e58 iavf: remove an unneeded variable
The variable `ret_code' used for returning is never changed in function
`iavf_shutdown_adminq'. So that it can be removed and just return its
initial value 0 at the end of `iavf_shutdown_adminq' function.

Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-07 09:04:21 -08:00
Yang Li
a127adf2fc i40e: remove variables set but not used
The code that uses variables pe_cntx_size and pe_filt_size
has been removed, so they should be removed as well.

Eliminate the following clang warnings:
drivers/net/ethernet/intel/i40e/i40e_common.c:4139:20:
warning: variable 'pe_filt_size' set but not used.
drivers/net/ethernet/intel/i40e/i40e_common.c:4139:6:
warning: variable 'pe_cntx_size' set but not used.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-07 09:04:21 -08:00
Mateusz Palczewski
17b33d4319 i40e: Remove non-inclusive language
Remove non-inclusive language from the driver.

Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-07 09:04:21 -08:00
Mateusz Palczewski
9c83ca8a63 i40e: Update FW API version
Update FW API versions to the newest supported NVM images.

Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-07 09:04:21 -08:00
Jedrzej Jagielski
ef39584ddb i40e: Minimize amount of busy-waiting during AQ send
The i40e_asq_send_command will now use a non blocking usleep_range if
possible (non-atomic context), instead of busy-waiting udelay. The
usleep_range function uses hrtimers to provide better performance and
removes the negative impact of busy-waiting in time-critical
environments.

1. Rename i40e_asq_send_command to i40e_asq_send_command_atomic
   and add 5th parameter to inform if called from an atomic context.
   Call inside usleep_range (if non-atomic) or udelay (if atomic).

2. Change i40e_asq_send_command to invoke
   i40e_asq_send_command_atomic(..., false).

3. Change two functions:
    - i40e_aq_set_vsi_uc_promisc_on_vlan
    - i40e_aq_set_vsi_mc_promisc_on_vlan
   to explicitly use i40e_asq_send_command_atomic(..., true)
   instead of i40e_asq_send_command, as they use spinlocks and do some
   work in an atomic context.
   All other calls to i40e_asq_send_command remain unchanged.

Signed-off-by: Dawid Lukwinski <dawid.lukwinski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-07 09:04:21 -08:00
Karen Sornek
cfb1d572c9 i40e: Add ensurance of MacVlan resources for every trusted VF
Trusted VF can use up every resource available, leaving nothing
to other trusted VFs.
Introduce define, which calculates MacVlan resources available based
on maximum available MacVlan resources, bare minimum for each VF and
number of currently allocated VFs.

Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Karen Sornek <karen.sornek@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-07 09:03:44 -08:00
Rakesh Babu Saladi
eabd0f88b0 octeontx2-nicvf: Free VF PTP resources.
When a VF is removed respective PTP resources are not
being freed currently. This patch fixes it.

Fixes: 43510ef4ddad ("octeontx2-nicvf: Add PTP hardware clock support to NIX VF")
Signed-off-by: Rakesh Babu Saladi <rsaladi2@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07 14:04:19 +00:00
Subbaraya Sundeep
93440f4888 octeontx2-af: Increment ptp refcount before use
Before using the ptp pci device by AF driver increment
the reference count of it.

Fixes: a8b90c9d26d6 ("octeontx2-af: Add PTP device id for CN10K and 95O silcons")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07 14:04:19 +00:00
David S. Miller
26abf15c49 mlx5-updates-2022-01-06
1) Expose FEC per lane block counters via ethtool
 
 2) Trivial fixes/updates/cleanup to mlx5e netdev driver
 
 3) Fix htmldoc build warning
 
 4) Spread mlx5 SFs (sub-functions) to all available CPU cores: Commits 1..5
 
 Shay Drory Says:
 ================
 Before this patchset, mlx5 subfunction shared the same IRQs (MSI-X) with
 their peers subfunctions, causing them to use same CPU cores.
 
 In large scale, this is very undesirable, SFs use small number of cpu
 cores and all of them will be packed on the same CPU cores, not
 utilizing all CPU cores in the system.
 
 In this patchset we want to achieve two things.
  a) Spread IRQs used by SFs to all cpu cores
  b) Pack less SFs in the same IRQ, will result in multiple IRQs per core.
 
 In this patchset, we spread SFs over all online cpus available to mlx5
 irqs in Round-Robin manner. e.g.: Whenever a SF is created, pick the next
 CPU core with least number of SF IRQs bound to it, SFs will share IRQs on
 the same core until a certain limit, when such limit is reached, we
 request a new IRQ and add it to that CPU core IRQ pool, when out of IRQs,
 pick any IRQ with least number of SF users.
 
 This enhancement is done in order to achieve a better distribution of
 the SFs over all the available CPUs, which reduces application latency,
 as shown bellow.
 
 Machine details:
 Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz with 56 cores.
 PCI Express 3 with BW of 126 Gb/s.
 ConnectX-5 Ex; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; PCIe4.0
 x16.
 
 Base line test description:
 Single SF on the system. One instance of netperf is running on-top the
 SF.
 Numbers: latency = 15.136 usec, CPU Util = 35%
 
 Test description:
 There are 250 SFs on the system. There are 3 instances of netperf
 running, on-top three different SFs, in parallel.
 
 Perf numbers:
  # netperf     SFs         latency(usec)     latency    CPU utilization
    affinity    affinity    (lower is better) increase %
  1 cpu=0       cpu={0}     ~23 (app 1-3)     35%        75%
  2 cpu=0,2,4   cpu={0}     app 1: 21.625     30%        68% (CPU 0)
                            app 2-3: 16.5     9%         15% (CPU 2,4)
  3 cpu=0       cpu={0,2,4} app 1: ~16        7%         84% (CPU 0)
                            app 2-3: ~17.9    14%        22% (CPU 2,4)
  4 cpu=0,2,4   cpu={0,2,4} 15.2 (app 1-3)    0%         33% (CPU 0,2,4)
 
  - The first two entries (#1 and #2) show current state. e.g.: SFs are
    using the same CPU. The last two entries (#3 and #4) shows the latency
    reduction improvement of this patch. e.g.: SFs are on different CPUs.
  - Whenever we use several CPUs, in case there is a different CPU
    utilization, write the utilization of each CPU separately.
  - Whenever the latency result of the netperf instances were different,
    write the latency of each netperf instances separately.
 
 Commands:
  - for netperf CPU=0:
 $ for i in {1..3}; do taskset -c 0 netperf -H 1${i}.1.1.1 -t TCP_RR  -- \
   -o RT_LATENCY -r8 & done
 
  - for netperf CPU=0,2,4
 $ for i in {1..3}; do taskset -c $(( ($i - 1) * 2  )) netperf -H \
   1${i}.1.1.1 -t TCP_RR  -- -o RT_LATENCY -r8 & done
 
 ================
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmHXh+AACgkQSD+KveBX
 +j68fQgAghUX4TFS2JSwa7+XSCtzz7GIu2Xrz8aWTAnydRLlNXuFuuHYcNed6I0l
 7DaVOZwHp1tp3tnx3WMGPUU6ujDPEgasaDDblvG2UXix5LPVEHDXY44ittQX8mpC
 SC8Yj9mNo6DSfOMUZklFDMbw57XuLJ+HEGnwnrOEEyLX7ruDXGEViUmVBd4IoC3B
 F2fJHBkdTJfHWTJRB4pWbZD1dw7WbKd0RyPla3OkoHugEUCKnbjii8cMwNM64Bbp
 Pjz/SiShVy+NTotqPzRNjcx7y4tHOXCYt33zt1VlGtdUxs5eCA5jkjHFz0jb12Lu
 rvfHaBaU+elMKTw5G/WMGJxZQx0kEQ==
 =VBWY
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2022-01-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2022-01-06

1) Expose FEC per lane block counters via ethtool

2) Trivial fixes/updates/cleanup to mlx5e netdev driver

3) Fix htmldoc build warning

4) Spread mlx5 SFs (sub-functions) to all available CPU cores: Commits 1..5

Shay Drory Says:
================
Before this patchset, mlx5 subfunction shared the same IRQs (MSI-X) with
their peers subfunctions, causing them to use same CPU cores.

In large scale, this is very undesirable, SFs use small number of cpu
cores and all of them will be packed on the same CPU cores, not
utilizing all CPU cores in the system.

In this patchset we want to achieve two things.
 a) Spread IRQs used by SFs to all cpu cores
 b) Pack less SFs in the same IRQ, will result in multiple IRQs per core.

In this patchset, we spread SFs over all online cpus available to mlx5
irqs in Round-Robin manner. e.g.: Whenever a SF is created, pick the next
CPU core with least number of SF IRQs bound to it, SFs will share IRQs on
the same core until a certain limit, when such limit is reached, we
request a new IRQ and add it to that CPU core IRQ pool, when out of IRQs,
pick any IRQ with least number of SF users.

This enhancement is done in order to achieve a better distribution of
the SFs over all the available CPUs, which reduces application latency,
as shown bellow.

Machine details:
Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz with 56 cores.
PCI Express 3 with BW of 126 Gb/s.
ConnectX-5 Ex; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; PCIe4.0
x16.

Base line test description:
Single SF on the system. One instance of netperf is running on-top the
SF.
Numbers: latency = 15.136 usec, CPU Util = 35%

Test description:
There are 250 SFs on the system. There are 3 instances of netperf
running, on-top three different SFs, in parallel.

Perf numbers:
 # netperf     SFs         latency(usec)     latency    CPU utilization
   affinity    affinity    (lower is better) increase %
 1 cpu=0       cpu={0}     ~23 (app 1-3)     35%        75%
 2 cpu=0,2,4   cpu={0}     app 1: 21.625     30%        68% (CPU 0)
                           app 2-3: 16.5     9%         15% (CPU 2,4)
 3 cpu=0       cpu={0,2,4} app 1: ~16        7%         84% (CPU 0)
                           app 2-3: ~17.9    14%        22% (CPU 2,4)
 4 cpu=0,2,4   cpu={0,2,4} 15.2 (app 1-3)    0%         33% (CPU 0,2,4)

 - The first two entries (#1 and #2) show current state. e.g.: SFs are
   using the same CPU. The last two entries (#3 and #4) shows the latency
   reduction improvement of this patch. e.g.: SFs are on different CPUs.
 - Whenever we use several CPUs, in case there is a different CPU
   utilization, write the utilization of each CPU separately.
 - Whenever the latency result of the netperf instances were different,
   write the latency of each netperf instances separately.

Commands:
 - for netperf CPU=0:
$ for i in {1..3}; do taskset -c 0 netperf -H 1${i}.1.1.1 -t TCP_RR  -- \
  -o RT_LATENCY -r8 & done

 - for netperf CPU=0,2,4
$ for i in {1..3}; do taskset -c $(( ($i - 1) * 2  )) netperf -H \
  1${i}.1.1.1 -t TCP_RR  -- -o RT_LATENCY -r8 & done

================

====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07 11:10:57 +00:00
Jakub Kicinski
e4a3d6a6a1 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2022-01-06

Victor adds restoring of advanced rules after reset.

Wojciech improves usage of switchdev control VSI by utilizing the
device's advanced rules for forwarding.

Christophe Jaillet removes some unneeded calls to zero bitmaps, changes
some bitmap operations that don't need to be atomic, and converts a
kfree() to a more appropriate bitmap_free().

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: Use bitmap_free() to free bitmap
  ice: Optimize a few bitmap operations
  ice: Slightly simply ice_find_free_recp_res_idx
  ice: improve switchdev's slow-path
  ice: replay advanced rules after reset
====================

Link: https://lore.kernel.org/r/20220106183013.3777622-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06 20:06:32 -08:00
Amit Cohen
4735402173 mlxsw: spectrum: Extend to support Spectrum-4 ASIC
Extend existing driver for Spectrum, Spectrum-2 and Spectrum-3 ASICs
to support Spectrum-4 ASIC as well.

Currently there is no released firmware version for Spectrum-4, so the
driver is not enforcing a minimum version.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06 20:00:46 -08:00
Amit Cohen
852ee4191d mlxsw: spectrum_acl_bloom_filter: Add support for Spectrum-4 calculation
Spectrum-4 will calculate hash function for bloom filter differently
from the existing ASICs.

First, two hash functions will be used to calculate 16 bits result.
The final result will be combination of the two results - 6 bits which
are result of CRC-6 will be used as MSB and 10 bits which are result of
CRC-10 will be used as LSB.

Second, while in Spectrum{2,3}, there is a padding in each chunk, so the
chunks use a sequence of whole bytes, in Spectrum-4 there is no padding,
so each chunk use 20 bytes minus 2 bits, so it is necessary to align the
chunks to be without holes.

Add dedicated 'mlxsw_sp_acl_bf_ops' for Spectrum-4 and add the required
tables for CRC calculations.

All the details are documented as part of the code for future use.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06 20:00:45 -08:00
Amit Cohen
58723d2f77 mlxsw: Add operations structure for bloom filter calculation
Spectrum-4 will calculate hash function for bloom filter differently from
the existing ASICs.

There are two changes:
1. Instead of using one hash function to calculate 16 bits output (CRC-16),
   two functions will be used.
2. The chunks will be built differently, without padding.

As preparation for support of Spectrum-4 bloom filter, add 'ops'
structure to allow handling different calculation for different ASICs.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06 20:00:45 -08:00
Amit Cohen
29409f363e mlxsw: spectrum_acl_bloom_filter: Rename Spectrum-2 specific objects for future use
Spectrum-4 will calculate hash function for bloom filter differently from
the existing ASICs.

There are two changes:
1. Instead of using one hash function to calculate 16 bits output (CRC-16),
   two functions will be used.
2. The chunks will be built differently, without padding.

As preparation for support of Spectrum-4 bloom filter, rename CRC table
to include "sp2" prefix and "crc16", as next patch will add two additional
tables. In addition, rename all the dedicated functions and defines for
Spectrum-{2,3} to include "sp2" prefix.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06 20:00:45 -08:00
Amit Cohen
5d5c3ba9e4 mlxsw: spectrum_acl_bloom_filter: Make mlxsw_sp_acl_bf_key_encode() more flexible
Spectrum-4 will calculate hash function for bloom filter differently from
the existing ASICs.

One of the changes is related to the way that the chunks will be build -
without padding.

As preparation for support of Spectrum-4 bloom filter, make
mlxsw_sp_acl_bf_key_encode() more flexible, so it will be able to use it
for Spectrum-4 as well.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06 20:00:45 -08:00
Amit Cohen
4711671297 mlxsw: spectrum_acl_bloom_filter: Reorder functions to make the code more aesthetic
Currently, mlxsw_sp_acl_bf_rule_count_index_get() is implemented before
mlxsw_sp_acl_bf_index_get() but is used after it.

Adding a new function for Spectrum-4 would make them further apart still.
Fix by moving them around.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06 20:00:44 -08:00
Amit Cohen
07ff135958 mlxsw: Introduce flex key elements for Spectrum-4
Spectrum-4 ASIC will support more virtual routers and local ports
compared to the existing ASICs. Therefore, the virtual router and local
port ACL key elements need to be increased.

Introduce new key elements for Spectrum-4 to be aligned with the elements
used already for other Spectrum ASICs.

The key blocks layout is the same for Spectrum-4, so use the existing
code for encode_block() and clear_block(), just create separate blocks.

Note that size of `VIRT_ROUTER_MSB` is 4 bits in Spectrum-4,
therefore declare it using `MLXSW_AFK_ELEMENT_INST_U32()`, in order to
be able to set `.avoid_size_check` to true.
Otherwise, `mlxsw_afk_blocks_check()` will fail and warn.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06 20:00:44 -08:00
Amit Cohen
6d5d8ebb88 mlxsw: Rename virtual router flex key element
In Spectrum-4, the size of the virtual router ACL key element increased
from 11 bits to 12 bits.

In order to reuse the existing virtual router ACL key element
enumerators for Spectrum-4, rename 'VIRT_ROUTER_8_10' and
'VIRT_ROUTER_0_7' to 'VIRT_ROUTER_MSB' and 'VIRT_ROUTER_LSB',
respectively.

No functional changes intended.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06 20:00:44 -08:00
Ioana Ciornei
d1a9b84183 dpaa2-switch: check if the port priv is valid
Before accessing the port private structure make sure that there is
still a non-NULL pointer there. A NULL pointer access can happen when we
are on the remove path, some switch ports are unregistered and some are
in the process of unregistering.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06 19:49:10 -08:00
Ioana Ciornei
4e30e98c4b dpaa2-mac: return -EPROBE_DEFER from dpaa2_mac_open in case the fwnode is not set
We could get into a situation when the fwnode of the parent device is
not yet set because its probe didn't yet finish. When this happens, any
caller of the dpaa2_mac_open() will not have the fwnode available, thus
cause problems at the PHY connect time.

Avoid this by just returning -EPROBE_DEFER from the dpaa2_mac_open when
this happens.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06 19:49:10 -08:00
Robert-Ionut Alexa
5b1e38c079 dpaa2-mac: bail if the dpmacs fwnode is not found
The parent pointer node handler must be declared with a NULL
initializer. Before using it, a check must be performed to make
sure that a valid address has been assigned to it.

Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06 19:49:10 -08:00
Moshe Shemesh
4f6626b0e1 Revert "net/mlx5: Add retry mechanism to the command entry index allocation"
This reverts commit 410bd754cd73c4a2ac3856d9a03d7b08f9c906bf.

The reverted commit had added a retry mechanism to the command entry
index allocation. The previous patch ensures that there is a free
command entry index once the command work handler holds the command
semaphore. Thus the retry mechanism is not needed.

Fixes: 410bd754cd73 ("net/mlx5: Add retry mechanism to the command entry index allocation")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06 16:55:42 -08:00
Moshe Shemesh
8e715cd613 net/mlx5: Set command entry semaphore up once got index free
Avoid a race where command work handler may fail to allocate command
entry index, by holding the command semaphore down till command entry
index is being freed.

Fixes: 410bd754cd73 ("net/mlx5: Add retry mechanism to the command entry index allocation")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06 16:55:42 -08:00
Maor Dickman
07f6dc4024 net/mlx5e: Sync VXLAN udp ports during uplink representor profile change
Currently during NIC profile disablement all VXLAN udp ports offloaded to the
HW are flushed and during its enablement the driver send notification to
the stack to inform the core that the entire UDP tunnel port state has been
lost, uplink representor doesn't have the same behavior which can cause
VXLAN udp ports offload to be in bad state while moving between modes while
VXLAN interface exist.

Fixed by aligning the uplink representor profile behavior to the NIC behavior.

Fixes: 84db66124714 ("net/mlx5e: Move set vxlan nic info to profile init")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06 16:55:41 -08:00
Shay Drory
a1c7c49c20 net/mlx5: Fix access to sf_dev_table on allocation failure
Even when SF devices are supported, the SF device table allocation
can still fail.
In such case mlx5_sf_dev_supported still reports true, but SF device
table is invalid. This can result in NULL table access.

Hence, fix it by adding NULL table check.

Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06 16:55:41 -08:00
Paul Blakey
b6dfff21a1 net/mlx5e: Fix matching on modified inner ip_ecn bits
Tunnel device follows RFC 6040, and during decapsulation inner
ip_ecn might change depending on inner and outer ip_ecn as follows:

 +---------+----------------------------------------+
 |Arriving |         Arriving Outer Header          |
 |   Inner +---------+---------+---------+----------+
 |  Header | Not-ECT | ECT(0)  | ECT(1)  |   CE     |
 +---------+---------+---------+---------+----------+
 | Not-ECT | Not-ECT | Not-ECT | Not-ECT | <drop>   |
 |  ECT(0) |  ECT(0) | ECT(0)  | ECT(1)  |   CE*    |
 |  ECT(1) |  ECT(1) | ECT(1)  | ECT(1)* |   CE*    |
 |    CE   |   CE    |  CE     | CE      |   CE     |
 +---------+---------+---------+---------+----------+

Cells marked above are changed from original inner packet ip_ecn value.

Tc then matches on the modified inner ip_ecn, but hw offload which
matches the inner ip_ecn value before decap, will fail.

Fix that by mapping all the cases of outer and inner ip_ecn matching,
and only supporting cases where we know inner wouldn't be changed by
decap, or in the outer ip_ecn=CE case, inner ip_ecn didn't matter.

Fixes: bcef735c59f2 ("net/mlx5e: Offload TC matching on tos/ttl for ip tunnels")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06 16:55:41 -08:00
Aya Levin
01c3fd113e Revert "net/mlx5e: Block offload of outer header csum for GRE tunnel"
This reverts commit 54e1217b90486c94b26f24dcee1ee5ef5372f832.

Although the NIC doesn't support offload of outer header CSUM, using
gso_partial_features allows offloading the tunnel's segmentation. The
driver relies on the stack CSUM calculation of the outer header. For
this, NETIF_F_GSO_GRE_CSUM must be a member of the device's features.

Fixes: 54e1217b9048 ("net/mlx5e: Block offload of outer header csum for GRE tunnel")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06 16:55:40 -08:00
Aya Levin
64050cdad0 Revert "net/mlx5e: Block offload of outer header csum for UDP tunnels"
This reverts commit 6d6727dddc7f93fcc155cb8d0c49c29ae0e71122.

Although the NIC doesn't support offload of outer header CSUM, using
gso_partial_features allows offloading the tunnel's segmentation. The
driver relies on the stack CSUM calculation of the outer header. For
this, NETIF_F_GSO_UDP_TUNNEL_CSUM must be a member of the device's
features.

Fixes: 6d6727dddc7f ("net/mlx5e: Block offload of outer header csum for UDP tunnels")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06 16:55:40 -08:00
Maor Dickman
9e72a55a3c net/mlx5e: Don't block routes with nexthop objects in SW
Routes with nexthop objects is currently not supported by multipath offload
and any attempts to use it is blocked, however this also block adding SW
routes with nexthop.

Resolve this by returning NOTIFY_DONE instead of an error which will allow such
a route to be created in SW but not offloaded.

This fix also solve an issue which block adding such routes on different devices
due to missing check if the route FIB device is one of multipath devices.

Fixes: 6a87afc072c3 ("mlx5: Fail attempts to use routes with nexthop objects")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06 16:55:40 -08:00
Maor Dickman
885751eb1b net/mlx5e: Fix wrong usage of fib_info_nh when routes with nexthop objects are used
Creating routes with nexthop objects while in switchdev mode leads to access to
un-allocated memory and trigger bellow call trace due to hitting WARN_ON.
This is caused due to illegal usage of fib_info_nh in TC tunnel FIB event handling to
resolve the FIB device while fib_info built in with nexthop.

Fixed by ignoring attempts to use nexthop objects with routes until support can be
properly added.

WARNING: CPU: 1 PID: 1724 at include/net/nexthop.h:468 mlx5e_tc_tun_fib_event+0x448/0x570 [mlx5_core]
CPU: 1 PID: 1724 Comm: ip Not tainted 5.15.0_for_upstream_min_debug_2021_11_09_02_04 #1
RIP: 0010:mlx5e_tc_tun_fib_event+0x448/0x570 [mlx5_core]
RSP: 0018:ffff8881349f7910 EFLAGS: 00010202
RAX: ffff8881492f1980 RBX: ffff8881349f79e8 RCX: 0000000000000000
RDX: ffff8881349f79e8 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff8881349f7950 R08: 00000000000000fe R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88811e9d0000
R13: ffff88810eb62000 R14: ffff888106710268 R15: 0000000000000018
FS:  00007f1d5ca6e800(0000) GS:ffff88852c880000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffedba44ff8 CR3: 0000000129808004 CR4: 0000000000370ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 atomic_notifier_call_chain+0x42/0x60
 call_fib_notifiers+0x21/0x40
 fib_table_insert+0x479/0x6d0
 ? try_charge_memcg+0x480/0x6d0
 inet_rtm_newroute+0x65/0xb0
 rtnetlink_rcv_msg+0x2af/0x360
 ? page_add_file_rmap+0x13/0x130
 ? do_set_pte+0xcd/0x120
 ? rtnl_calcit.isra.0+0x120/0x120
 netlink_rcv_skb+0x4e/0xf0
 netlink_unicast+0x1ee/0x2b0
 netlink_sendmsg+0x22e/0x460
 sock_sendmsg+0x33/0x40
 ____sys_sendmsg+0x1d1/0x1f0
 ___sys_sendmsg+0xab/0xf0
 ? __mod_memcg_lruvec_state+0x40/0x60
 ? __mod_lruvec_page_state+0x95/0xd0
 ? page_add_new_anon_rmap+0x4e/0xf0
 ? __handle_mm_fault+0xec6/0x1470
 __sys_sendmsg+0x51/0x90
 ? internal_get_user_pages_fast+0x480/0xa10
 do_syscall_64+0x3d/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 8914add2c9e5 ("net/mlx5e: Handle FIB events to update tunnel endpoint device")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06 16:55:39 -08:00
Dima Chumak
de31854ece net/mlx5e: Fix nullptr on deleting mirroring rule
Deleting a Tc rule with multiple outputs, one of which is internal port,
like this one:

  tc filter del dev enp8s0f0_0 ingress protocol ip pref 5 flower \
      dst_mac 0c:42:a1:d1:d0:88 \
      src_mac e4:ea:09:08:00:02 \
      action tunnel_key  set \
          src_ip 0.0.0.0 \
          dst_ip 7.7.7.8 \
          id 8 \
          dst_port 4789 \
      action mirred egress mirror dev vxlan_sys_4789 pipe \
      action mirred egress redirect dev enp8s0f0_1

Triggers a call trace:

  BUG: kernel NULL pointer dereference, address: 0000000000000230
  RIP: 0010:del_sw_hw_rule+0x2b/0x1f0 [mlx5_core]
  Call Trace:
   tree_remove_node+0x16/0x30 [mlx5_core]
   mlx5_del_flow_rules+0x51/0x160 [mlx5_core]
   __mlx5_eswitch_del_rule+0x4b/0x170 [mlx5_core]
   mlx5e_tc_del_fdb_flow+0x295/0x550 [mlx5_core]
   mlx5e_flow_put+0x1f/0x70 [mlx5_core]
   mlx5e_delete_flower+0x286/0x390 [mlx5_core]
   tc_setup_cb_destroy+0xac/0x170
   fl_hw_destroy_filter+0x94/0xc0 [cls_flower]
   __fl_delete+0x15e/0x170 [cls_flower]
   fl_delete+0x36/0x80 [cls_flower]
   tc_del_tfilter+0x3a6/0x6e0
   rtnetlink_rcv_msg+0xe5/0x360
   ? rtnl_calcit.isra.0+0x110/0x110
   netlink_rcv_skb+0x46/0x110
   netlink_unicast+0x16b/0x200
   netlink_sendmsg+0x202/0x3d0
   sock_sendmsg+0x33/0x40
   ____sys_sendmsg+0x1c3/0x200
   ? copy_msghdr_from_user+0xd6/0x150
   ___sys_sendmsg+0x88/0xd0
   ? ___sys_recvmsg+0x88/0xc0
   ? do_futex+0x10c/0x460
   __sys_sendmsg+0x59/0xa0
   do_syscall_64+0x48/0x140
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix by disabling offloading for flows matching
esw_is_chain_src_port_rewrite() which have more than one output.

Fixes: 10742efc20a4 ("net/mlx5e: VF tunnel TX traffic offloading")
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06 16:55:39 -08:00
Aya Levin
0b7cfa4082 net/mlx5e: Fix page DMA map/unmap attributes
Driver initiates DMA sync, hence it may skip CPU sync. Add
DMA_ATTR_SKIP_CPU_SYNC as input attribute both to dma_map_page and
dma_unmap_page to avoid redundant sync with the CPU.
When forcing the device to work with SWIOTLB, the extra sync might cause
data corruption. The driver unmaps the whole page while the hardware
used just a part of the bounce buffer. So syncing overrides the entire
page with bounce buffer that only partially contains real data.

Fixes: bc77b240b3c5 ("net/mlx5e: Add fragmented memory support for RX multi packet WQE")
Fixes: db05815b36cb ("net/mlx5e: Add XSK zero-copy support")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06 16:55:39 -08:00
Gal Pressman
5dd29f40b2 net/mlx5e: Add recovery flow in case of error CQE
The rep legacy RQ completion handling was missing the appropriate
handling of error CQEs (dump the CQE and queue a recover work), fix it
by calling trigger_report() when needed.

Since all CQE handling flows do the exact same error CQE handling,
extract it to a common helper function.

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06 16:22:55 -08:00
Roi Dayan
68511b48bf net/mlx5e: TC, Remove redundant error logging
Remove redundant and trivial error logging when trying to
offload mirred device with unsupported devices.
Using OVS could hit those a lot and the errors are still
logged in extack.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06 16:22:55 -08:00
Saeed Mahameed
be23511eb5 net/mlx5e: Refactor set_pflag_cqe_based_moder
Rearrange the code and use cqe_mode_to_period_mode() helper.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06 16:22:54 -08:00
Gal Pressman
b5f4290370 net/mlx5e: Move HW-GRO and CQE compression check to fix features flow
Feature dependencies should be resolved in fix features rather than in
set features flow. Move the check that disables HW-GRO in case CQE
compression is enabled from set_feature_hw_gro() to
mlx5e_fix_features().

Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06 16:22:54 -08:00
Aya Levin
bc2a7b5c6b net/mlx5e: Fix feature check per profile
Remove redundant space when constructing the feature's enum. Validate
against the indented enum value.

Fixes: 6c72cb05d4b8 ("net/mlx5e: Use bitmap field for profile features")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06 16:22:54 -08:00
Maor Dickman
7846665d35 net/mlx5e: Unblock setting vid 0 for VF in case PF isn't eswitch manager
When using libvirt to passthrough VF to VM it will always set the VF vlan
to 0 even if user didn’t request it, this will cause libvirt to fail to
boot in case the PF isn't eswitch owner.

Example of such case is the DPU host PF which isn't eswitch manager, so
any attempt to passthrough VF of it using libvirt will fail.

Fix it by not returning error in case set VF vlan is called with vid 0.

Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06 16:22:53 -08:00
Lama Kayal
0a1498ebfa net/mlx5e: Expose FEC counters via ethtool
Add FEC counters' statistics of corrected_blocks and
uncorrectable_blocks, along with their lanes via ethtool.

HW supports corrected_blocks and uncorrectable_blocks counters both for
RS-FEC mode and FC-FEC mode. In FC mode these counters are accumulated
per lane, while in RS mode the correction method crosses lanes, thus
only total corrected_blocks and uncorrectable_blocks are reported in
this mode.

Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06 16:22:53 -08:00
Maher Sanalla
f79a609ea6 net/mlx5: Update log_max_qp value to FW max capability
log_max_qp in driver's default profile #2 was set to 18, but FW actually
supports 17 at the most - a situation that led to the concerning print
when the driver is loaded:
"log_max_qp value in current profile is 18, changing to HCA capabaility
limit (17)"

The expected behavior from mlx5_profile #2 is to match the maximum FW
capability in regards to log_max_qp. Thus, log_max_qp in profile #2 is
initialized to a defined static value (0xff) - which basically means that
when loading this profile, log_max_qp value  will be what the currently
installed FW supports at most.

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06 16:22:52 -08:00
Shay Drory
061f5b2358 net/mlx5: SF, Use all available cpu for setting cpu affinity
Currently all SFs are using the same CPUs. Spreading SF over CPUs, in
round-robin manner, in order to achieve better distribution of the SFs
over available CPUs.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06 16:22:52 -08:00
Shay Drory
79b60ca83b net/mlx5: Introduce API for bulk request and release of IRQs
Currently IRQs are requested one by one. To balance spreading IRQs
among cpus using such scheme requires remembering cpu mask for the
cpus used for a given device. This complicates the IRQ allocation
scheme in subsequent patch.

Hence, prepare the code for bulk IRQs allocation. This enables
spreading IRQs among cpus in subsequent patch.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06 16:22:52 -08:00