Commit Graph

1046339 Commits

Author SHA1 Message Date
David S. Miller
72b93a8685 Merge branch 'mlxsw-rif-mac-prefixes'
Ido Schimmel says:

====================
mlxsw: Support multiple RIF MAC prefixes

Currently, mlxsw enforces that all the netdevs used as router interfaces
(RIFs) have the same MAC prefix (e.g., same 38 MSBs in Spectrum-1).
Otherwise, an error is returned to user space with extack. This patchset
relaxes the limitation through the use of RIF MAC profiles.

A RIF MAC profile is a hardware entity that represents a particular MAC
prefix which multiple RIFs can reference. Therefore, the number of
possible MAC prefixes is no longer one, but the number of profiles
supported by the device.

The ability to change the MAC of a particular netdev is useful, for
example, for users who use the netdev to connect to an upstream provider
that performs MAC filtering. Currently, such users are either forced to
negotiate with the provider or change the MAC address of all other
netdevs so that they share the same prefix.

Patchset overview:

Patches #1-#3 are preparations.

Patch #4 adds actual support for RIF MAC profiles.

Patch #5 exposes RIF MAC profiles as a devlink resource, so that user
space has visibility into the maximum number of profiles and current
occupancy. Useful for debugging and testing (next 3 patches).

Patches #6-#8 add both scale and functional tests.

Patch #9 removes tests that validated the previous limitation. It is now
covered by patch #6 for devices that support a single profile.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 13:35:59 +01:00
Danielle Ratson
c24dbf3d4f selftests: mlxsw: Remove deprecated test cases
After adding the previous patches, the constraint that all the router
interface MAC addresses have the same prefix is no longer relevant.

Remove the test cases that validated that this constraint is honored.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 13:35:58 +01:00
Danielle Ratson
20d446db61 selftests: Add an occupancy test for RIF MAC profiles
When all the RIF MAC profiles are in use, test that it is possible to
change the MAC of a netdev (i.e., a RIF) when its MAC profile is not
shared with other RIFs. Test that replacement fails when the MAC profile
is shared.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 13:35:58 +01:00
Danielle Ratson
a10b7bacde selftests: mlxsw: Add forwarding test for RIF MAC profiles
Verify that MAC profile changes are indeed applied and that packets are
forwarded with the correct source MAC.

Output example:

$ ./rif_mac_profiles.sh
TEST: h1->h2: new mac profile                                       [ OK ]
TEST: h2->h1: new mac profile                                       [ OK ]
TEST: h1->h2: edit mac profile                                      [ OK ]
TEST: h2->h1: edit mac profile                                      [ OK ]

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 13:35:58 +01:00
Danielle Ratson
152f98e7c5 selftests: mlxsw: Add a scale test for RIF MAC profiles
Query the maximum number of supported RIF MAC profiles using
devlink-resource and verify that all available MAC profiles can be utilized
and that an error is generated when user space tries to exceed this number.

Output example in Spectrum-2:

$ TESTS='rif_mac_profile' ./resource_scale.sh
TEST: 'rif_mac_profile' 4                                           [ OK ]
TEST: 'rif_mac_profile' overflow 5                                  [ OK ]

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 13:35:58 +01:00
Danielle Ratson
1c375ffb2e mlxsw: spectrum_router: Expose RIF MAC profiles to devlink resource
Expose via devlink-resource the maximum number of RIF MAC profiles and
their current occupancy, so it can be used for debug and writing generic
tests, like in the next patch.

Example for Spectrum-2 output:

$ devlink resource show pci/0000:06:00.0
...
  name rif_mac_profiles size 4 occ 0 unit entry dpipe_tables none

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 13:35:57 +01:00
Danielle Ratson
605d25cd78 mlxsw: spectrum_router: Add RIF MAC profiles support
Currently, mlxsw enforces that all the router interfaces (RIFs) have the
same MAC prefix.

Relax this limitation by using RIF MAC profiles. Each profile is
associated with a particular MAC prefix and multiple RIFs can use the
same profile. Therefore, the number of possible MAC prefixes is no
longer one, but the number of profiles supported by the device.

Store the profiles in an IDR and reference count them according to the
number of RIFs using them.

Associate a RIF with a profile when the RIF is created and remove the
association when the RIF is deleted.

Change the association following 'NETDEV_CHANGEADDR' events, except when
only one RIF is using the profile. In which case, change the MAC prefix
of the profile itself instead of associating the RIF with a new profile.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 13:35:57 +01:00
Danielle Ratson
26029225d9 mlxsw: spectrum_router: Propagate extack further
The next patch will set the MAC profile of a router interface (RIF) as
part of its configure() callback. The operation can fail in case the
maximum number of profiles was exceeded.

Add extack to mlxsw_sp_rif_ops::configure() in order to communicate such
failures to user space.

In addition, the MAC profile of a RIF can change following a
'NETDEV_CHANGEADDR' notification. Propagate extack to
mlxsw_sp_router_port_change_event() so that failures could be
communicated in this path as well.

No functional changes intended.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 13:35:57 +01:00
Danielle Ratson
a8428e5045 mlxsw: resources: Add resource identifier for RIF MAC profiles
Add a resource identifier for maximum RIF MAC profiles so that it could
be later used to query the information from firmware.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 13:35:56 +01:00
Danielle Ratson
d25d7fc31e mlxsw: reg: Add MAC profile ID field to RITR register
Add MAC profile ID field to RITR register so that it could be used for
associating a RIF with a MAC profile ID by a later patch.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 13:35:56 +01:00
David S. Miller
be34892644 Merge branch 'netfilter-vrf-rework'
Florian Westphal says:

====================
vrf: rework interaction with netfilter/conntrack

V2:
- fix 'plain integer as null pointer' warning
- reword commit message in patch 2 to clarify loss of 'ct set untracked'

This patch series aims to solve the to-be-reverted change 09e856d54b
("vrf: Reset skb conntrack connection on VRF rcv") in a different way.

Rather than have skbs pass through conntrack and nat hooks twice, suppress
conntrack invocation if the conntrack/nat hook is called from the vrf driver.

First patch deals with 'incoming connection' case:
1. suppress NAT transformations
2. skip conntrack confirmation

NAT and conntrack confirmation is done when ip/ipv6 stack calls
the postrouting hook.

Second patch deals with local packets:
in vrf driver, mark the skbs as 'untracked', so conntrack output
hook ignores them.  This skips all nat hooks as well.

Afterwards, remove the untracked state again so the second
round will pick them up.

One alternative to the chosen implementation would be to add a 'caller
id' field to 'struct nf_hook_state' and then use that, these patches
use the more straightforward check of VRF flag on the state->out device.

The two patches apply to both net and net-next, i am targeting -next
because I think that since snat did not work correctly for so long that
we can take the longer route.  If you disagree, apply to net at your
discretion.

The patches apply both with 09e856d54b reverted or still
in-place, but only with the revert in place ingress conntrack settings
(zone, notrack etc) start working again.

I've already submitted selftests for vrf+nfqueue and conntrack+vrf.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 13:21:10 +01:00
Florian Westphal
8c9c296adf vrf: run conntrack only in context of lower/physdev for locally generated packets
The VRF driver invokes netfilter for output+postrouting hooks so that users
can create rules that check for 'oif $vrf' rather than lower device name.

This is a problem when NAT rules are configured.

To avoid any conntrack involvement in round 1, tag skbs as 'untracked'
to prevent conntrack from picking them up.

This gets cleared before the packet gets handed to the ip stack so
conntrack will be active on the second iteration.

One remaining issue is that a rule like

  output ... oif $vrfname notrack

won't propagate to the second round because we can't tell
'notrack set via ruleset' and 'notrack set by vrf driver' apart.
However, this isn't a regression: the 'notrack' removal happens
instead of unconditional nf_reset_ct().
I'd also like to avoid leaking more vrf specific conditionals into the
netfilter infra.

For ingress, conntrack has already been done before the packet makes it
to the vrf driver, with this patch egress does connection tracking with
lower/physical device as well.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 13:21:10 +01:00
Florian Westphal
8e0538d8ee netfilter: conntrack: skip confirmation and nat hooks in postrouting for vrf
The VRF driver invokes netfilter for output+postrouting hooks so that users
can create rules that check for 'oif $vrf' rather than lower device name.

Afterwards, ip stack calls those hooks again.

This is a problem when conntrack is used with IP masquerading.
masquerading has an internal check that re-validates the output
interface to account for route changes.

This check will trigger in the vrf case.

If the -j MASQUERADE rule matched on the first iteration, then round 2
finds state->out->ifindex != nat->masq_index: the latter is the vrf
index, but out->ifindex is the lower device.

The packet gets dropped and the conntrack entry is invalidated.

This change makes conntrack postrouting skip the nat hooks.
Also skip confirmation.  This allows the second round
(postrouting invocation from ipv4/ipv6) to create nat bindings.

This also prevents the second round from seeing packets that had their
source address changed by the nat hook.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 13:21:09 +01:00
David S. Miller
4900a76915 mlx5-updates-2021-10-25
Misc updates for mlx5 driver:
 
 1) Misc updates and cleanups:
  - Don't write directly to netdev->dev_addr, From Jakub Kicinski
  - Remove unnecessary checks for slow path flag in tc module
  - Fix unused function warning of mlx5i_flow_type_mask
  - Bridge, support replacing existing FDB entry
 
 2) Sub Functions, Reduction in memory usage:
  - Reduce flow counters bulk query buffer size
  - Implement max_macs devlink parameter
  - Add devlink vendor params to control Event Queue sizes
  - Added SF life cycle trace points by Parav/
 
 3) From Aya, Firmware health buffer reporting improvements
  - Print health buffer by log level and more missing information
  - Periodic update of host time to firmware
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmF3GMkACgkQSD+KveBX
 +j5sZggAnuZLX9nUwT0hj/1Qu102RzDFjWrqG4DkWFbkCb4/yI9wckSjXW1J9c7v
 /KyVU87SM58MQGwcaNn6t8DMJTtpieGL+emYW5P1raemmYPvRsL8X9Bb7qiQM5G0
 tupLpzwmBuSJASWMC743VDGrnWUdD1woIbo6LXuU1nDVoxfOxIwUZPni5kOpo6TP
 Ui6mrSYsT2HWUrem0lXrY5hCHkCinwpSLDr6J4Djv0PJcnwt5zTXbleH904Kdtxu
 qMazklHCEh5njFQ/ukl5PNon+uUIpdCE9/Gn5IcmhbUEHGH5yLgmBaOfDY8M6TTV
 4VPjhuGaNK8eyjoWsFPlxlCLWn1IqA==
 =gv0F
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2021-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-10-25

Misc updates for mlx5 driver:

1) Misc updates and cleanups:
 - Don't write directly to netdev->dev_addr, From Jakub Kicinski
 - Remove unnecessary checks for slow path flag in tc module
 - Fix unused function warning of mlx5i_flow_type_mask
 - Bridge, support replacing existing FDB entry

2) Sub Functions, Reduction in memory usage:
 - Reduce flow counters bulk query buffer size
 - Implement max_macs devlink parameter
 - Add devlink vendor params to control Event Queue sizes
 - Added SF life cycle trace points by Parav/

3) From Aya, Firmware health buffer reporting improvements
 - Print health buffer by log level and more missing information
 - Periodic update of host time to firmware
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 13:17:45 +01:00
Jon Maxwell
cf12e6f912 tcp: don't free a FIN sk_buff in tcp_remove_empty_skb()
v1: Implement a more general statement as recommended by Eric Dumazet. The
sequence number will be advanced, so this check will fix the FIN case and
other cases.

A customer reported sockets stuck in the CLOSING state. A Vmcore revealed that
the write_queue was not empty as determined by tcp_write_queue_empty() but the
sk_buff containing the FIN flag had been freed and the socket was zombied in
that state. Corresponding pcaps show no FIN from the Linux kernel on the wire.

Some instrumentation was added to the kernel and it was found that there is a
timing window where tcp_sendmsg() can run after tcp_send_fin().

tcp_sendmsg() will hit an error, for example:

1269 ▹       if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))↩
1270 ▹       ▹       goto do_error;↩

tcp_remove_empty_skb() will then free the FIN sk_buff as "skb->len == 0". The
TCP socket is now wedged in the FIN-WAIT-1 state because the FIN is never sent.

If the other side sends a FIN packet the socket will transition to CLOSING and
remain that way until the system is rebooted.

Fix this by checking for the FIN flag in the sk_buff and don't free it if that
is the case. Testing confirmed that fixed the issue.

Fixes: fdfc5c8594 ("tcp: remove empty skb from write queue in error cases")
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Reported-by: Monir Zouaoui <Monir.Zouaoui@mail.schwarz>
Reported-by: Simon Stier <simon.stier@mail.schwarz>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 13:10:04 +01:00
Jakub Kicinski
36d935a0a6 Merge branch 'small-fixes-for-true-expression-checks'
Jean Sacren says:

====================
Small fixes for true expression checks

This series fixes checks of true !rc expression.
====================

Link: https://lore.kernel.org/r/cover.1634974124.git.sakiwit@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 19:11:17 -07:00
Jean Sacren
036f590fe5 net: qed_dev: fix check of true !rc expression
Remove the check of !rc in (!rc && !resc_lock_params.b_granted) since it
is always true.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 19:11:13 -07:00
Jean Sacren
165f8e82c2 net: qed_ptp: fix check of true !rc expression
Remove the check of !rc in (!rc && !params.b_granted) since it is always
true.

We should also use constant 0 for return.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 19:11:13 -07:00
Jakub Kicinski
e43b76abf7 Merge branch 'tcp-receive-path-optimizations'
Eric Dumazet says:

====================
tcp: receive path optimizations

This series aims to reduce cache line misses in RX path.

I am still working on better cache locality in tcp_sock but
this will wait few more weeks.
====================

Link: https://lore.kernel.org/r/20211025164825.259415-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 18:02:17 -07:00
Eric Dumazet
12c8691de3 ipv6/tcp: small drop monitor changes
Two kfree_skb() calls must be replaced by consume_skb()
for skbs that are not technically dropped.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 18:02:14 -07:00
Eric Dumazet
020e71a3cf ipv4: guard IP_MINTTL with a static key
RFC 5082 IP_MINTTL option is rarely used on hosts.

Add a static key to remove from TCP fast path useless code,
and potential cache line miss to fetch inet_sk(sk)->min_ttl

Note that once ip4_min_ttl static key has been enabled,
it stays enabled until next boot.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 18:02:14 -07:00
Eric Dumazet
14834c4f4e ipv4: annotate data races arount inet->min_ttl
No report yet from KCSAN, yet worth documenting the races.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 18:02:13 -07:00
Eric Dumazet
790eb67374 ipv6: guard IPV6_MINHOPCOUNT with a static key
RFC 5082 IPV6_MINHOPCOUNT is rarely used on hosts.

Add a static key to remove from TCP fast path useless code,
and potential cache line miss to fetch tcp_inet6_sk(sk)->min_hopcount

Note that once ip6_min_hopcount static key has been enabled,
it stays enabled until next boot.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 18:02:13 -07:00
Eric Dumazet
cc17c3c8e8 ipv6: annotate data races around np->min_hopcount
No report yet from KCSAN, yet worth documenting the races.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 18:02:13 -07:00
Eric Dumazet
09b8984667 net: annotate accesses to sk->sk_rx_queue_mapping
sk->sk_rx_queue_mapping can be modified locklessly,
add a couple of READ_ONCE()/WRITE_ONCE() to document this fact.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 18:02:13 -07:00
Eric Dumazet
342159ee39 net: avoid dirtying sk->sk_rx_queue_mapping
sk_rx_queue_mapping is located in a cache line that should be kept read mostly.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 18:02:13 -07:00
Eric Dumazet
2b13af8ade net: avoid dirtying sk->sk_napi_id
sk_napi_id is located in a cache line that can be kept read mostly.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 18:02:12 -07:00
Eric Dumazet
ef57c1610d ipv6: move inet6_sk(sk)->rx_dst_cookie to sk->sk_rx_dst_cookie
Increase cache locality by moving rx_dst_coookie next to sk->sk_rx_dst

This removes one or two cache line misses in IPv6 early demux (TCP/UDP)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 18:02:12 -07:00
Eric Dumazet
0c0a5ef809 tcp: move inet->rx_dst_ifindex to sk->sk_rx_dst_ifindex
Increase cache locality by moving rx_dst_ifindex next to sk->sk_rx_dst

This is part of an effort to reduce cache line misses in TCP fast path.

This removes one cache line miss in early demux.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 18:02:12 -07:00
Alexander Lobakin
fd559a943e ax88796c: fix fetching error stats from percpu containers
rx_dropped, tx_dropped, rx_frame_errors and rx_crc_errors are being
wrongly fetched from the target container rather than source percpu
ones.
No idea if that goes from the vendor driver or was brainoed during
the refactoring, but fix it either way.

Fixes: a97c69ba4f ("net: ax88796c: ASIX AX88796C SPI Ethernet Adapter Driver")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Acked-by: Łukasz Stelmach <l.stelmach@samsung.com>
Link: https://lore.kernel.org/r/20211023121148.113466-1-alobakin@pm.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 17:33:16 -07:00
Parav Pandit
d67ab0a8c1 net/mlx5: SF_DEV Add SF device trace points
Add SF device add and delete specific trace points.

echo mlx5:mlx5_sf_dev_add >> /sys/kernel/debug/tracing/set_event
echo mlx5:mlx5_sf_dev_del >> /sys/kernel/debug/tracing/set_event
echo mlx5:mlx5_sf_vhca_event >> /sys/kernel/debug/tracing/set_event

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25 13:51:21 -07:00
Parav Pandit
b3ccada68b net/mlx5: SF, Add SF trace points
Add support for trace events for SFs to improve debugging.
This covers
(a) port add and free trace points
(b) device level trace points
(c) SF hardware context add, free trace points.
(d) SF function activate/deacticate and state trace points

SF events examples:
echo mlx5:mlx5_sf_add >> /sys/kernel/debug/tracing/set_event
echo mlx5:mlx5_sf_free >> /sys/kernel/debug/tracing/set_event
echo mlx5:mlx5_sf_hwc_alloc >> /sys/kernel/debug/tracing/set_event
echo mlx5:mlx5_sf_hwc_free >> /sys/kernel/debug/tracing/set_event
echo mlx5:mlx5_sf_hwc_deferred_free >> /sys/kernel/debug/tracing/set_event
echo mlx5:mlx5_sf_update_state >> /sys/kernel/debug/tracing/set_event
echo mlx5:mlx5_sf_activate >> /sys/kernel/debug/tracing/set_event
echo mlx5:mlx5_sf_deactivate >> /sys/kernel/debug/tracing/set_event

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25 13:51:21 -07:00
Shay Drory
5546040619 net/mlx5: Let user configure max_macs param
Currently, max_macs is taking 70Kbytes of memory per function. This
size is not needed in all use cases, and is critical with large scale.
Hence, allow user to configure the number of max_macs.

For example, to reduce the number of max_macs to 1, execute::
$ devlink dev param set pci/0000:00:0b.0 name max_macs value 1 \
              cmode driverinit
$ devlink dev reload pci/0000:00:0b.0

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25 13:51:21 -07:00
Shay Drory
a6cb08daa3 net/mlx5: Let user configure event_eq_size param
Event EQ is an EQ which received the notification of almost all the
events generated by the NIC.
Currently, each event EQ is taking 512KB of memory. This size is not
needed in most use cases, and is critical with large scale. Hence,
allow user to configure the size of the event EQ.

For example to reduce event EQ size to 64, execute::
$ devlink resource set pci/0000:00:0b.0 path /event_eq_size/ size 64
$ devlink dev reload pci/0000:00:0b.0

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25 13:51:20 -07:00
Shay Drory
46ae40b94d net/mlx5: Let user configure io_eq_size param
Currently, each I/O EQ is taking 128KB of memory. This size
is not needed in all use cases, and is critical with large scale.
Hence, allow user to configure the size of I/O EQs.

For example, to reduce I/O EQ size to 64, execute:
$ devlink resource set pci/0000:00:0b.0 path /io_eq_size/ size 64
$ devlink dev reload pci/0000:00:0b.0

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25 13:51:20 -07:00
Vlad Buslov
3518c83fc9 net/mlx5: Bridge, support replacing existing FDB entry
The SWITCHDEV_FDB_ADD_TO_DEVICE is used for both adding new and replacing
existing entry. Implement support for replacing existing FDB entries in
mlx5 offload code.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25 13:51:20 -07:00
Vlad Buslov
2deda2f1bf net/mlx5: Bridge, extract code to lookup and del/notify entry
Following two patterns in bridge code are used in multiple places where
similar code is duplicated:

- Lookup FDB entry from hashtable by address+vid pair.

- Notify software bridge and then delete existing FDB entry.

In order to improve code quality and prepare for following patch series
that also uses described patterns, extract the codes to dedicated helper
functions.

This commit doesn't change functionality.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25 13:51:20 -07:00
Aya Levin
5a1023deee net/mlx5: Add periodic update of host time to firmware
Firmware logs its asserts also to non-volatile memory. In order to
reduce drift between the NIC and the host, the driver sets the host
epoch-time to the firmware every hour.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25 13:51:19 -07:00
Aya Levin
b87ef75cb5 net/mlx5: Print health buffer by log level
Add log macro which gets log level as a parameter. Use the severity
read from the health buffer and the new log macro to log the health buffer
with severity as log level.  Prior to this patch, health buffer was
printed in error log level regardless of its severity. Now the user may
filter dmesg (--level) or change kernel log level to focus on different
severity levels of firmware errors.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25 13:51:19 -07:00
Aya Levin
cb464ba53c net/mlx5: Extend health buffer dump
Enhance health buffer to include:
 - assert_var5: expose the 6'th assert variable.
 - time: error's time-stamp in seconds (epoch time).
 - rfr: Recovery Flow Requiered. When set, indicates that the error
        cannot be recovered without flow involving reset.
 - severity: error's severity value, ranging from emergency to debug.
Expose them in the health buffer dump (dmesg and devlink fw reporter).

Health buffer in dmesg:
mlx5_core 0000:08:00.0: print_health_info:425:(pid 912): Health issue observed, firmware internal error, severity(3) ERROR:
mlx5_core 0000:08:00.0: print_health_info:429:(pid 912): assert_var[0] 0x08040700
mlx5_core 0000:08:00.0: print_health_info:429:(pid 912): assert_var[1] 0x00000000
mlx5_core 0000:08:00.0: print_health_info:429:(pid 912): assert_var[2] 0x00000000
mlx5_core 0000:08:00.0: print_health_info:429:(pid 912): assert_var[3] 0x00000000
mlx5_core 0000:08:00.0: print_health_info:429:(pid 912): assert_var[4] 0x00000000
mlx5_core 0000:08:00.0: print_health_info:429:(pid 912): assert_var[5] 0x00000000
mlx5_core 0000:08:00.0: print_health_info:432:(pid 912): assert_exit_ptr 0x00aaf800
mlx5_core 0000:08:00.0: print_health_info:434:(pid 912): assert_callra 0x00aaf70c
mlx5_core 0000:08:00.0: print_health_info:436:(pid 912): fw_ver 16.32.492
mlx5_core 0000:08:00.0: print_health_info:437:(pid 912): time 1634819758
mlx5_core 0000:08:00.0: print_health_info:438:(pid 912): hw_id 0x0000020d
mlx5_core 0000:08:00.0: print_health_info:439:(pid 912): rfr 0
mlx5_core 0000:08:00.0: print_health_info:440:(pid 912): severity 3 (ERROR)
mlx5_core 0000:08:00.0: print_health_info:441:(pid 912): irisc_index 9
mlx5_core 0000:08:00.0: print_health_info:442:(pid 912): synd 0x1: firmware internal error
mlx5_core 0000:08:00.0: print_health_info:444:(pid 912): ext_synd 0x802b
mlx5_core 0000:08:00.0: print_health_info:445:(pid 912): raw fw_ver 0x102001ec

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25 13:51:19 -07:00
Avihai Horon
2fdeb4f4c2 net/mlx5: Reduce flow counters bulk query buffer size for SFs
Currently, the flow counters bulk query buffer takes a little more than
512KB of memory, which is aligned to the next power of 2, to 1MB.

The buffer size determines the maximum number of flow counters that can
be queried at a time. Thus, having a bigger buffer can improve
performance for users that need to query many flow counters.

SFs don't use many flow counters and don't need a big buffer. Since this
size is critical with large scale, reduce the size of the bulk query
buffer for SFs.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25 13:51:18 -07:00
Shay Drory
038e5e4718 net/mlx5: Fix unused function warning of mlx5i_flow_type_mask
The cited commit is causing unused-function warning[1] when
CONFIG_MLX5_EN_RXNFC is not set.
Fix this by moving the function into the ifdef, where it's only used

[1]
warning: ‘mlx5i_flow_type_mask’ defined but not used [-Wunused-function]

Fixes: 9fbe1c25ec ("net/mlx5i: Enable Rx steering for IPoIB via ethtool")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25 13:51:18 -07:00
Paul Blakey
a64c5edbd2 net/mlx5: Remove unnecessary checks for slow path flag
After previous changes, caller (mlx5e_tc_offload_fdb_rules()) already
checks for the slow path flag, and if set won't call offload/unoffload
sample.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25 13:51:18 -07:00
Jakub Kicinski
537e4d2e6f net/mlx5e: don't write directly to netdev->dev_addr
Use a local buffer and eth_hw_addr_set()

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-25 13:51:17 -07:00
Jakub Kicinski
dcd63d4326 Merge branch 'bluetooth-don-t-write-directly-to-netdev-dev_addr'
Jakub Kicinski says:

====================
bluetooth: don't write directly to netdev->dev_addr

The usual conversions.
====================

Link: https://lore.kernel.org/r/20211022231834.2710245-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 11:01:33 -07:00
Jakub Kicinski
a1916d3446 bluetooth: use dev_addr_set()
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.

Reviewed-by: Marcel Holtmann <marcel@holtmann.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 11:01:29 -07:00
Jakub Kicinski
08c181f052 bluetooth: use eth_hw_addr_set()
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.

Convert bluetooth from memcpy(... ETH_ADDR) to eth_hw_addr_set():

  @@
  expression dev, np;
  @@
  - memcpy(dev->dev_addr, np, ETH_ALEN)
  + eth_hw_addr_set(dev, np)

Reviewed-by: Marcel Holtmann <marcel@holtmann.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 11:01:24 -07:00
Jakub Kicinski
a0c8c3372b fddi: defza: add missing pointer type cast
hw_addr is a uint AKA unsigned int. dev_addr_set() takes
a u8 *.

  drivers/net/fddi/defza.c:1383:27: error: passing argument 2 of 'dev_addr_set' from incompatible pointer type [-Werror=incompatible-pointer-types]

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 1e9258c389 ("fddi: defxx,defza: use dev_addr_set()")
Acked-by: Maciej W. Rozycki <macro@orcam.me.uk>
Link: https://lore.kernel.org/r/20211025160000.2803818-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25 10:28:47 -07:00
Tianjia Zhang
3fb59a5de5 net/tls: getsockopt supports complete algorithm list
AES_CCM_128 and CHACHA20_POLY1305 are already supported by tls,
similar to setsockopt, getsockopt also needs to support these
two algorithms.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25 15:55:30 +01:00
Tianjia Zhang
39d8fb96e3 net/tls: tls_crypto_context add supported algorithms context
tls already supports the SM4 GCM/CCM algorithms. It is also necessary
to add support for these two algorithms in tls_crypto_context to avoid
potential issues caused by forced type conversion.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25 15:54:46 +01:00