4436 Commits

Author SHA1 Message Date
Saeed Mahameed
f2f3df5501 net/mlx5: EQ, Privatize eq_table and friends
Move unnecessary EQ table structures and declaration from the
public include/linux/mlx5/driver.h into the private area of mlx5_core
and into eq.c/eq.h.

Introduce new mlx5 EQ APIs:

mlx5_comp_vectors_count(dev);
mlx5_comp_irq_get_affinity_mask(dev, vector);

And use them from mlx5_ib or mlx5e netdevice instead of direct access to
mlx5_core internal structures.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20 20:06:54 +02:00
Saeed Mahameed
d674a9aa43 net/mlx5: EQ, irq_info and rmap belong to eq_table
irq_info and rmap are EQ properties of the driver, and only needed for
EQ objects, move them to the eq_table EQs database structure.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20 20:06:48 +02:00
Saeed Mahameed
c8e21b3b57 net/mlx5: EQ, Create all EQs in one place
Instead of creating the EQ table in three steps at driver load,
 - allocate irq vectors
 - allocate async EQs
 - allocate completion EQs
Gather all of the procedures into one function in eq.c and call it from
driver load.

This will help us reduce the EQ and EQ table private structures
visibility to eq.c in downstream refactoring.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20 20:06:42 +02:00
Saeed Mahameed
ca828cb468 net/mlx5: EQ, Move all EQ logic to eq.c
Move completion EQs flows from main.c to eq.c, reasons:
1) It is where this logic belongs.
2) It will help centralize the EQ logic in one file for downstream
refactoring, and future extensions/updates.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20 20:06:37 +02:00
Saeed Mahameed
aaa553a644 net/mlx5: EQ, Remove redundant completion EQ list lock
Completion EQs list is only modified on driver load/unload, locking is
not required, remove it.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20 20:06:23 +02:00
Saeed Mahameed
2883f35257 net/mlx5: EQ, No need to store eq index as a field
eq->index is used only for completion EQs and is assigned to be
the completion eq index, it is used only when traversing the completion
eqs list, and it can be calculated dynamically, thus remove the
eq->index field.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20 20:06:18 +02:00
Saeed Mahameed
4de45c7586 net/mlx5: EQ, Remove unused fields and structures
Some fields and structures are not referenced nor used by the driver,
remove them.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20 20:06:07 +02:00
Saeed Mahameed
1e86ace4c1 net/mlx5: EQ, Use the right place to store/read IRQ affinity hint
Currently the cpu affinity hint mask for completion EQs is stored and
read from the wrong place, since reading and storing is done from the
same index, there is no actual issue with that, but internal irq_info
for completion EQs stars at MLX5_EQ_VEC_COMP_BASE offset in irq_info
array, this patch changes the code to use the correct offset to store
and read the IRQ affinity hint.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20 20:05:59 +02:00
David S. Miller
1359f25106 mlx5-fixes-2018-11-19
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJb80jFAAoJEEg/ir3gV/o+eUEH/0/bUhiaxaN88VFZXA24WlU2
 v5jaeMg1P3ZuWYNS/AxGWedYWYr7oOgin79/4LIBKJMFdTdLGS2Oa0cTi0ycSNpx
 0WMhVchClX3/UNTn6u/G23VsHeYav/AP+uaqnI+Tbtv4o8tc9ecpFgCXo4pMJaW/
 osUOJC3MmU68yRAo91uwhhSHWMsl+8tceMdoS9N1Q6faM2v9XcJrMvKHMNpDvPpf
 VXvC18fTX/qi/loJ6oKepTXn6lNVKDIgKmCjjudgwfRHPiw9R5uMlunaOoHO1wnH
 T8ttdg/Dn94BLOOmHlOoW9TT7LVFLk7w2y5NoefUY0wUKSjByMxzyzqmdI+qXZU=
 =SSaw
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2018-11-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2018-11-19

The following fixes are for mlx5 core and netdev driver.

For -stable v4.16
bc7fda7d4637 ('net/mlx5e: IPoIB, Reset QP after channels are closed')

For -stable v4.17
36917a270395 ('net/mlx5: IPSec, Fix the SA context hash key')

For -stable v4.18
6492a432be3a ('net/mlx5e: Always use the match level enum when parsing TC rule match')
c3f81be236b1 ('net/mlx5e: Removed unnecessary warnings in FEC caps query')
c5ce2e736b64 ('net/mlx5e: Fix selftest for small MTUs')

For -stable v4.19
effcd896b25e ('net/mlx5e: Adjust to max number of channles when re-attaching')
394cbc5acd68 ('net/mlx5e: RX, verify received packet size in Linear Striding RQ')
447cbb3613c8 ('net/mlx5e: Don't match on vlan non-existence if ethertype is wildcarded')
c223c1574612 ('net/mlx5e: Claim TC hw offloads support only under a proper build config')

Please pull and let me know if there's any problem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 19:04:33 -08:00
Shay Agroskin
9184e51b5b net/mlx5e: Fix failing ethtool query on FEC query error
If FEC caps query fails when executing 'ethtool <interface>'
the whole callback fails unnecessarily, fixed that by replacing the
error return code with debug logging only.

Fixes: 6cfa94605091 ("net/mlx5e: Ethtool driver callback for query/set FEC policy")
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19 15:33:31 -08:00
Shay Agroskin
64e2833484 net/mlx5e: Removed unnecessary warnings in FEC caps query
Querying interface FEC caps with 'ethtool [int]' after link reset
throws warning regading link speed.
This warning is not needed as there is already an indication in
user space that the link is not up.

Fixes: 0696d60853d5 ("net/mlx5e: Receive buffer configuration")
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19 15:33:31 -08:00
Shay Agroskin
febd72f27c net/mlx5e: Fix wrong field name in FEC related functions
This bug would result in reading wrong FEC capabilities for 10G/40G.

Fixes: 2095b2641477 ("net/mlx5e: Add port FEC get/set functions")
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19 15:33:31 -08:00
Shay Agroskin
9cdeaab3b7 net/mlx5e: Fix a bug in turning off FEC policy in unsupported speeds
Some speeds don't support turning FEC policy off. In case a requested
FEC policy is not supported for a speed (including current speed), its new
FEC policy would be:
	no FEC - if disabling FEC is supported for that speed
	unchanged - else

Fixes: 2095b2641477 ("net/mlx5e: Add port FEC get/set functions")
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19 15:33:31 -08:00
Valentine Fatiev
228c4cd04d net/mlx5e: Fix selftest for small MTUs
Loopback test had fixed packet size, which can be bigger than configured
MTU. Shorten the loopback packet size to be bigger than minimal MTU
allowed by the device. Text field removed from struct 'mlx5ehdr'
as redundant to allow send small packets as minimal allowed MTU.

Fixes: d605d66 ("net/mlx5e: Add support for ethtool self diagnostics test")
Signed-off-by: Valentine Fatiev <valentinef@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19 14:35:04 -08:00
Moshe Shemesh
0073c8f727 net/mlx5e: RX, verify received packet size in Linear Striding RQ
In case of striding RQ, we use  MPWRQ (Multi Packet WQE RQ), which means
that WQE (RX descriptor) can be used for many packets and so the WQE is
much bigger than MTU.  In virtualization setups where the port mtu can
be larger than the vf mtu, if received packet is bigger than MTU, it
won't be dropped by HW on too small receive WQE. If we use linear SKB in
striding RQ, since each stride has room for mtu size payload and skb
info, an oversized packet can lead to crash for crossing allocated page
boundary upon the call to build_skb. So driver needs to check packet
size and drop it.

Introduce new SW rx counter, rx_oversize_pkts_sw_drop, which counts the
number of packets dropped by the driver for being too large.

As a new field is added to the RQ struct, re-open the channels whenever
this field is being used in datapath (i.e., in the case of linear
Striding RQ).

Fixes: 619a8f2a42f1 ("net/mlx5e: Use linear SKB in Striding RQ")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19 14:35:04 -08:00
Roi Dayan
1392f44bba net/mlx5e: Apply the correct check for supporting TC esw rules split
The mirror and not the output count is the one denoting a split.
Fix to condition the offload attempt on the mirror count being > 0
along the firmware to have the related capability.

Fixes: 592d36515969 ("net/mlx5e: Parse mirroring action for offloaded TC eswitch flows")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Yossi Kuperman <yossiku@mellanox.com>
Reviewed-by: Chris Mi <chrism@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19 14:35:04 -08:00
Yuval Avnery
a1f240f180 net/mlx5e: Adjust to max number of channles when re-attaching
When core driver enters deattach/attach flow after pci reset,
Number of logical CPUs may have changed.
As a result we need to update the cpu affiliated resource tables.
	1. indirect rqt list
	2. eq table

Reproduction (PowerPC):
	echo 1000 > /sys/kernel/debug/powerpc/eeh_max_freezes
	ppc64_cpu --smt=on
	# Restart driver
	modprobe -r ... ; modprobe ...
	# Link up
	ifconfig ...
	# Only physical CPUs
	ppc64_cpu --smt=off
	# Inject PCI errors so PCI will reset - calling the pci error handler
	echo 0x8000000000000000 > /sys/kernel/debug/powerpc/<PCI BUS>/err_injct_inboundA

Call trace when trying to add non-existing rqs to an indirect rqt:
	mlx5e_redirect_rqt+0x84/0x260 [mlx5_core] (unreliable)
	mlx5e_redirect_rqts+0x188/0x190 [mlx5_core]
	mlx5e_activate_priv_channels+0x488/0x570 [mlx5_core]
	mlx5e_open_locked+0xbc/0x140 [mlx5_core]
	mlx5e_open+0x50/0x130 [mlx5_core]
	mlx5e_nic_enable+0x174/0x1b0 [mlx5_core]
	mlx5e_attach_netdev+0x154/0x290 [mlx5_core]
	mlx5e_attach+0x88/0xd0 [mlx5_core]
	mlx5_attach_device+0x168/0x1e0 [mlx5_core]
	mlx5_load_one+0x1140/0x1210 [mlx5_core]
	mlx5_pci_resume+0x6c/0xf0 [mlx5_core]

Create cq will fail when trying to use non-existing EQ.

Fixes: 89d44f0a6c73 ("net/mlx5_core: Add pci error handlers to mlx5_core driver")
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19 14:35:04 -08:00
Or Gerlitz
83621b7df6 net/mlx5e: Always use the match level enum when parsing TC rule match
We get the match level (none, l2, l3, l4) while going over the match
dissectors of an offloaded tc rule. When doing this, the match level
enum and the not min inline enum values should be used, fix that.

This worked accidentally b/c both enums have the same numerical values.

Fixes: d708f902989b ('net/mlx5e: Get the required HW match level while parsing TC flow matches')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19 14:35:04 -08:00
Or Gerlitz
077ecd785d net/mlx5e: Claim TC hw offloads support only under a proper build config
Currently, we are only supporting tc hw offloads when the eswitch
support is compiled in, but we are not gating the adevertizment
of the NETIF_F_HW_TC feature on this config being set.

Fix it, and while doing that, also avoid dealing with the feature
on ethtool when the config is not set.

Fixes: e8f887ac6a45 ('net/mlx5e: Introduce tc offload support')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19 14:35:04 -08:00
Or Gerlitz
d3a80bb5a3 net/mlx5e: Don't match on vlan non-existence if ethertype is wildcarded
For the "all" ethertype we should not care whether the packet has
vlans. Besides being wrong, the way we did it caused FW error
for rules such as:

tc filter add dev eth0 protocol all parent ffff: \
	prio 1 flower skip_sw action drop

b/c the matching meta-data (outer headers bit in struct mlx5_flow_spec)
wasn't set. Fix that by matching on vlan non-existence only if we were
also told to match on the ethertype.

Fixes: cee26487620b ('net/mlx5e: Set vlan masks for all offloaded TC rules')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Slava Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19 14:35:04 -08:00
Denis Drozdov
acf3766b36 net/mlx5e: IPoIB, Reset QP after channels are closed
The mlx5e channels should be closed before mlx5i_uninit_underlay_qp
puts the QP into RST (reset) state during mlx5i_close. Currently QP
state incorrectly set to RST before channels got deactivated and closed,
since mlx5_post_send request expects QP in RTS (Ready To Send) state.

The fix is to keep QP in RTS state until mlx5e channels get closed
and to reset QP afterwards.

Also this fix is simply correct in order to keep the open/close flow
symmetric, i.e mlx5i_init_underlay_qp() is called first thing at open,
the correct thing to do is to call mlx5i_uninit_underlay_qp() last thing
at close, which is exactly what this patch is doing.

Fixes: dae37456c8ac ("net/mlx5: Support for attaching multiple underlay QPs to root flow table")
Signed-off-by: Denis Drozdov <denisd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19 14:35:04 -08:00
Raed Salem
f2b18732ee net/mlx5: IPSec, Fix the SA context hash key
The commit "net/mlx5: Refactor accel IPSec code" introduced a
bug where asynchronous short time change in hash key value
by create/release SA context might happen during an asynchronous
hash resize operation this could cause a subsequent remove SA
context operation to fail as the key value used during resize is
not the same key value used when remove SA context operation is
invoked.

This commit fixes the bug by defining the SA context hash key
such that it includes only fields that never change during the
lifetime of the SA context object.

Fixes: d6c4f0298cec ("net/mlx5: Refactor accel IPSec code")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19 14:35:04 -08:00
David S. Miller
f2be6d710d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-11-19 10:55:00 -08:00
Shalom Toledo
bae4e10983 mlxsw: spectrum: Expose discard counters via ethtool
Expose packets discard counters via ethtool to help with debugging.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-18 19:02:07 -08:00
Aya Levin
a463146e67 net/mlx4: Fix UBSAN warning of signed integer overflow
UBSAN: Undefined behavior in
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:626:29
signed integer overflow: 1802201963 + 1802201963 cannot be represented
in type 'int'

The union of res_reserved and res_port_rsvd[MLX4_MAX_PORTS] monitors
granting of reserved resources. The grant operation is calculated and
protected, thus both members of the union cannot be negative.  Changed
type of res_reserved and of res_port_rsvd[MLX4_MAX_PORTS] from signed
int to unsigned int, allowing large value.

Fixes: 5a0d0a6161ae ("mlx4: Structures and init/teardown for VF resource quotas")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 16:09:31 -08:00
Tariq Toukan
3ea7e7ea53 net/mlx4_core: Fix uninitialized variable compilation warning
Initialize the uid variable to zero to avoid the compilation warning.

Fixes: 7a89399ffad7 ("net/mlx4: Add mlx4_bitmap zone allocator")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 16:09:31 -08:00
Jack Morgenstein
bd85fbc203 net/mlx4_core: Zero out lkey field in SW2HW_MPT fw command
When re-registering a user mr, the mpt information for the
existing mr when running SRIOV is obtained via the QUERY_MPT
fw command. The returned information includes the mpt's lkey.

This retrieved mpt information is used to move the mpt back
to hardware ownership in the rereg flow (via the SW2HW_MPT
fw command when running SRIOV).

The fw API spec states that for SW2HW_MPT, the lkey field
must be zero. Any ConnectX-3 PF driver which checks for strict spec
adherence will return failure for SW2HW_MPT if the lkey field is not
zero (although the fw in practice ignores this field for SW2HW_MPT).

Thus, in order to conform to the fw API spec, set the lkey field to zero
before invoking SW2HW_MPT when running SRIOV.

Fixes: e630664c8383 ("mlx4_core: Add helper functions to support MR re-registration")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 16:09:30 -08:00
Jiri Pirko
c22291f7cf mlxsw: spectrum: acl: Implement delta for ERP
Allow ERP sharing for multiple mask. Do it by properly implementing
delta_create() objagg object. Use the computed delta info for inserting
rules in A-TCAM.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 14:43:43 -08:00
Jiri Pirko
c293ba3403 mlxsw: spectrum: acl: Push code related to num_ctcam_erps inc/dec into separate helpers
Later on the same code is going to be needed for deltas as well. So push
the procedures related to increment and decrement of num_ctcam_erps
into a separate helpers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 14:43:43 -08:00
Jiri Pirko
59600844cf mlxsw: spectrum: acl: Remove mlxsw_afk_encode() block range args and key/mask check
Since two remaining users of mlxsw_afk_encode() do not specify
block ranges to work on, remove the args. Also, key/mask is always
non-NULL now, so skip the checks.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 14:43:43 -08:00
Jiri Pirko
b1ce60e621 mlxsw: spectrum: acl: Don't encode the key again in mlxsw_sp_acl_atcam_12kb_lkey_id_get()
No need to do key encoding again in
mlxsw_sp_acl_atcam_12kb_lkey_id_get(). Instead of that, introduce
a new helper that would just clear unused blocks.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 14:43:43 -08:00
Jiri Pirko
3bc6f3858a mlxsw: core_acl: Change order of args of ops->encode_block()
Change order so it is aligned with the usual case where the "write_to"
buffer comes as the first arg.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 14:43:43 -08:00
Jiri Pirko
d07cd66060 mlxsw: spectrum: acl: Pass key pointer to master_mask_set/clear
The device requires that the master mask of each region will be
composed from a logical OR between all the unmasked bits in the region.
Currently, this is just a logical OR between all the eRPs used in the
region, but the next patch is going to introduce delta bits support
which need to be taken into account as well.

Since the eRP does not include the delta bits, pass the key pointer to
mlxsw_sp_acl_erp_master_mask_set/clear instead. Convert key->mask to
the bitmap on fly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 14:43:43 -08:00
Jiri Pirko
c71abd7d94 mlxsw: spectrum: acl_erp: Convert to use objagg for tracking ERPs
Currently the ERPs are tracked internally in a hashtable. Benefit from
the newly introduced objagg library and use it to track ERPs. At this
point, there is no nesting of objects done, as the delta_create callback
always returns -EOPNOTSUPP. On the way, add "mask" into ERP mask get and
set functions and struct names.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 14:43:43 -08:00
Moni Shoua
90290db766 net/mlx5: Use multi threaded workqueue for page fault handling
Page fault events are processed in a workqueue context. Since each QP
can have up to two concurrent unrelated page-faults, one for requester
and one for responder, page-fault handling can be done in parallel.
Achieve this by changing the workqueue to be multi-threaded.
The number of threads is the same as the number of command interface
channels to avoid command interface bottlenecks.

In addition to multi-threads, change the workqueue flags to give it high
priority.

Stress benchmark shows that before this change 85% of page faults were
waiting in queue 8 seconds or more while after the change 98% of page
faults were waiting in queue 64 milliseconds or less. The number of threads
was chosen as the number of channels to the command interface.

Fixes: d9aaed838765 ("{net,IB}/mlx5: Refactor page fault handling")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-12 22:21:32 +02:00
Moni Shoua
ef90c5e975 net/mlx5: Return success for PAGE_FAULT_RESUME in internal error state
When the device is in internal error state, command interface isn't
accessible and the driver decides which commands to fail and which to
pass.
Move the PAGE_FAULT_RESUME command to the pass list in order to avoid
redundant failure messages.

Fixes: 89d44f0a6c73 ("net/mlx5_core: Add pci error handlers to mlx5_core driver")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-12 22:21:17 +02:00
Moni Shoua
27e95603f4 net/mlx5: Add interface to hold and release core resources
Sometimes upper layers may want to prevent the destruction of a core
resource for a period of time while work on that resource is in
progress.  Add API to support this.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-12 22:20:29 +02:00
Moni Shoua
698114968a net/mlx5: Release resource on error flow
Fix reference counting leakage when the event handler aborts due to an
unsupported event for the resource type.

Fixes: a14c2d4beee5 ("net/mlx5_core: Warn on unsupported events of QP/RQ/SQ")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-12 22:20:13 +02:00
Michał Mirosław
4b17f9fe48 mlx4: use __vlan_hwaccel helpers
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 20:45:04 -08:00
Shalom Toledo
96801552f8 mlxsw: spectrum: Fix IP2ME CPU policer configuration
The CPU policer used to police packets being trapped via a local route
(IP2ME) was incorrectly configured to police based on bytes per second
instead of packets per second.

Change the policer to police based on packets per second and avoid
packet loss under certain circumstances.

Fixes: 9148e7cf73ce ("mlxsw: spectrum: Add policers for trap groups")
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-03 19:31:42 -07:00
Eric Dumazet
c297344435 net/mlx4_en: use __netdev_tx_sent_queue()
doorbell only depends on xmit_more and netif_tx_queue_stopped()

Using __netdev_tx_sent_queue() avoids messing with BQL stop flag,
and is more generic.

This patch increases performance on GSO workload by keeping
doorbells to the minimum required.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-03 15:40:01 -07:00
Petr Machata
0fe6402316 mlxsw: spectrum: Set minimum shaper on MC TCs
An MC-aware mode was introduced in commit 7b8195306694 ("mlxsw:
spectrum: Configure MC-aware mode on mlxsw ports"). In MC-aware mode,
BUM traffic gets a special treatment by being assigned to a separate set
of traffic classes 8..15. Pairs of TCs 0 and 8, 1 and 9, etc., are then
configured to strictly prioritize the lower-numbered ones. The intention
is to prevent BUM traffic from flooding the switch and push out all UC
traffic, which would otherwise happen, and instead give UC traffic
precedence.

However strictly prioritizing UC traffic has the effect that UC overload
pushes out all BUM traffic, such as legitimate ARP queries. These
packets are kept in queues for a while, but under sustained UC overload,
their lifetime eventually expires and these packets are dropped. That is
detrimental to network performance as well.

Therefore configure the MC TCs (8..15) with minimum shaper of 200Mbps (a
minimum permitted value) to allow a trickle of necessary control traffic
to get through.

Fixes: 7b8195306694 ("mlxsw: spectrum: Configure MC-aware mode on mlxsw ports")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:56:58 -07:00
Petr Machata
8b931821aa mlxsw: reg: QEEC: Add minimum shaper fields
Add QEEC.mise (minimum shaper enable) and QEEC.min_shaper_rate to enable
configuration of minimum shaper.

Increase the QEEC length to 0x20 as well: that's the length that the
register has had for a long time now, but with the configurations that
mlxsw typically exercises, the firmware tolerated 0x1C-sized packets.
With mise=true however, FW rejects packets unless they have the full
required length.

Fixes: b9b7cee40579 ("mlxsw: reg: Add QoS ETS Element Configuration register")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:56:58 -07:00
Eric Dumazet
d48051c5b8 net/mlx5e: fix csum adjustments caused by RXFCS
As shown by Dmitris, we need to use csum_block_add() instead of csum_add()
when adding the FCS contribution to skb csum.

Before 4.18 (more exactly commit 88078d98d1bb "net: pskb_trim_rcsum()
and CHECKSUM_COMPLETE are friends"), the whole skb csum was thrown away,
so RXFCS changes were ignored.

Then before commit d55bef5059dd ("net: fix pskb_trim_rcsum_slow() with
odd trim offset") both mlx5 and pskb_trim_rcsum_slow() bugs were canceling
each other.

Now we fixed pskb_trim_rcsum_slow() we need to fix mlx5.

Note that this patch also rewrites mlx5e_get_fcs() to :

- Use skb_header_pointer() instead of reinventing it.
- Use __get_unaligned_cpu32() to avoid possible non aligned accesses
  as Dmitris pointed out.

Fixes: 902a545904c7 ("net/mlx5e: When RXFCS is set, add FCS data into checksum calculation")
Reported-by: Paweł Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eran Ben Elisha <eranbe@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Dimitris Michailidis <dmichail@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Paweł Staszewski <pstaszewski@itcare.pl>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Tested-By: Maria Pasechnik <mariap@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-31 12:40:52 -07:00
Eric Dumazet
3aa8029e1a net/mlx4_en: add a missing <net/ip.h> include
Abdul Haleem reported a build error on ppc :

drivers/net/ethernet/mellanox/mlx4/en_rx.c:582:18: warning: `struct
iphdr` declared inside parameter list [enabled by default]
           struct iphdr *iph)
                  ^
drivers/net/ethernet/mellanox/mlx4/en_rx.c:582:18: warning: its scope is
only this definition or declaration, which is probably not what you want
[enabled by default]
drivers/net/ethernet/mellanox/mlx4/en_rx.c: In function
get_fixed_ipv4_csum:
drivers/net/ethernet/mellanox/mlx4/en_rx.c:586:20: error: dereferencing
pointer to incomplete type
  __u8 ipproto = iph->protocol;
                    ^

Fixes: 55469bc6b577 ("drivers: net: remove <net/busy_poll.h> inclusion when not needed")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-30 11:18:59 -07:00
Shalom Toledo
a22712a962 mlxsw: core: Fix devlink unregister flow
After a failed reload, the driver is still registered to devlink, its
devlink instance is still allocated and the 'reload_fail' flag is set.
Then, in the next reload try, the driver's allocated devlink instance will
be freed without unregistering from devlink and its components (e.g,
resources). This scenario can cause a use-after-free if the user tries to
execute command via devlink user-space tool.

Fix by not freeing the devlink instance during reload (failed or not).

Fixes: 24cc68ad6c46 ("mlxsw: core: Add support for reload")
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-29 20:48:00 -07:00
Petr Machata
ad0b9d9418 mlxsw: spectrum_switchdev: Don't ignore deletions of learned MACs
Demands to remove FDB entries should be honored even if the FDB entry in
question was originally learned, and not added by the user. Therefore
ignore the added_by_user datum for SWITCHDEV_FDB_DEL_TO_DEVICE.

Fixes: 816a3bed9549 ("switchdev: Add fdb.added_by_user to switchdev notifications")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Suggested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-29 20:48:00 -07:00
Eric Dumazet
55469bc6b5 drivers: net: remove <net/busy_poll.h> inclusion when not needed
Drivers using generic NAPI interface no longer need to include
<net/busy_poll.h>, since busy polling was moved to core networking
stack long ago.

See commit 79e7fff47b7b ("net: remove support for per driver
ndo_busy_poll()") for reference.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-25 16:20:02 -07:00
Dan Carpenter
cc3a4cd3f0 net/mlx5: Allocate enough space for the FDB sub-namespaces
FDB_MAX_CHAIN is three.  We wanted to allocate enough memory to hold four
structs but there are missing parentheses so we only allocate enough
memory for three structs and the first byte of the fourth one.

Fixes: 328edb499f99 ("net/mlx5: Split FDB fast path prio to multiple namespaces")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-22 19:46:34 -07:00
David S. Miller
2e2d6f0342 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
net/sched/cls_api.c has overlapping changes to a call to
nlmsg_parse(), one (from 'net') added rtm_tca_policy instead of NULL
to the 5th argument, and another (from 'net-next') added cb->extack
instead of NULL to the 6th argument.

net/ipv4/ipmr_base.c is a case of a bug fix in 'net' being done to
code which moved (to mr_table_dump)) in 'net-next'.  Thanks to David
Ahern for the heads up.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-19 11:03:06 -07:00