1074769 Commits

Author SHA1 Message Date
Minghao Chi
51ae468aa7 can: softing: softing_netdev_open(): remove redundant ret variable
Return value from softing_startstop() directly instead of taking this
in another redundant variable.

Link: https://lore.kernel.org/all/20220112080629.667191-1-chi.minghao@zte.com.cn
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: CGEL ZTE <cgel.zte@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-02-24 08:26:04 +01:00
Marc Kleine-Budde
8d0a82e1f4 can: c_can: ethtool: use default drvinfo
The ethtool core implements a default drvinfo.

There's no need to replicate this in the driver, no additional
information is added, so remove this and rely on the default.

Link: https://lore.kernel.org/all/20220124215642.3474154-10-mkl@pengutronix.de
Cc: Dario Binacchi <dariobin@libero.it>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-02-24 08:26:04 +01:00
Marc Kleine-Budde
1c256e3a2c can: kvaser_usb: kvaser_usb_send_cmd(): remove redundant variable actual_len
The function usb_bulk_msg() can be called with a NULL pointer as the
"actual_length" parameter. This patch removes this variable.

Link: https://lore.kernel.org/all/20220124215642.3474154-9-mkl@pengutronix.de
Cc: Jimmy Assarsson <extja@kvaser.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-02-24 08:26:04 +01:00
Marc Kleine-Budde
5597f082fc can: bittiming: mark function arguments and local variables as const
This patch marks the arguments of some functions as well as some local
variables as constant.

Link: https://lore.kernel.org/all/20220124215642.3474154-7-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-02-24 08:26:03 +01:00
Marc Kleine-Budde
5b60d334e4 can: bittiming: can_validate_bitrate(): simplify bit rate checking
This patch simplifies the validation of the fixed bit rates. If a
supported bit rate is found, directly return 0.

If no valid bit rate is found return -EINVAL;

Link: https://lore.kernel.org/all/20220124215642.3474154-6-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-02-24 08:26:03 +01:00
Eric Dumazet
181d444790 can: gw: use call_rcu() instead of costly synchronize_rcu()
Commit fb8696ab14ad ("can: gw: synchronize rcu operations
before removing gw job entry") added three synchronize_rcu() calls
to make sure one rcu grace period was observed before freeing
a "struct cgw_job" (which are tiny objects).

This should be converted to call_rcu() to avoid adding delays
in device / network dismantles.

Use the rcu_head that was already in struct cgw_job,
not yet used.

Link: https://lore.kernel.org/all/20220207190706.1499190-1-eric.dumazet@gmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-02-24 08:26:03 +01:00
Marc Kleine-Budde
58212e03e5 dt-binding: can: m_can: include common CAN controller bindings
Since commit

| 1f9234401ce0 ("dt-bindings: can: add can-controller.yaml")

there is a common CAN controller binding. Add this to the m_can
binding.

Link: https://lore.kernel.org/all/20220124220653.3477172-4-mkl@pengutronix.de
Reviewed-by: Chandrasekar Ramakrishnan <rcsekar@samsung.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-02-24 08:26:03 +01:00
Marc Kleine-Budde
bffd5217ca dt-binding: can: m_can: fix indention of table in bosch,mram-cfg description
This patch fixes the indention of the table in the description of the
bosch,mram-cfg property.

Link: https://lore.kernel.org/all/20220217101111.2291151-1-mkl@pengutronix.de
Reviewed-by: Chandrasekar Ramakrishnan <rcsekar@samsung.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-02-24 08:26:03 +01:00
Marc Kleine-Budde
edd056a109 dt-binding: can: m_can: list Chandrasekar Ramakrishnan as maintainer
Since Sriram Dash's email bounces, change the maintainer entry to
Chandrasekar Ramakrishnan. Chandrasekar Ramakrishnan is already listed
as a maintainer in the MAINTAINERS file.

Link: https://lore.kernel.org/all/20220217113839.2311417-1-mkl@pengutronix.de
Cc: Chandrasekar Ramakrishnan <rcsekar@samsung.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-02-24 08:26:03 +01:00
Marc Kleine-Budde
d931686dc2 dt-binding: can: sun4i_can: include common CAN controller bindings
Since commit

| 1f9234401ce0 ("dt-bindings: can: add can-controller.yaml")

there is a common CAN controller binding. Add this to the sun4i_can
binding.

Link: https://lore.kernel.org/all/20220124220653.3477172-3-mkl@pengutronix.de
Cc: Evgeny Boger <boger@wirenboard.com>
Cc: Gerhard Bertelsmann <info@gerhard-bertelsmann.de>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-02-24 08:26:03 +01:00
Marc Kleine-Budde
66224f6656 dt-binding: can: mcp251xfd: include common CAN controller bindings
Since commit

| 1f9234401ce0 ("dt-bindings: can: add can-controller.yaml")

there is a common CAN controller binding. Add this to the mcp251xfd
binding.

Link: https://lore.kernel.org/all/20220124220653.3477172-2-mkl@pengutronix.de
Cc: Manivannan Sadhasivam <mani@kernel.org>
Cc: Thomas Kopp <thomas.kopp@microchip.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-02-24 08:26:03 +01:00
Jakub Kicinski
e422eef268 Merge branch 'add-ethtool-support-for-completion-queue-event-size'
Subbaraya Sundeep says:

====================
Add ethtool support for completion queue event size

After a packet is sent or received by NIC then NIC posts
a completion queue event which consists of transmission status
(like send success or error) and received status(like
pointers to packet fragments). These completion events may
also use a ring similar to rx and tx rings. This patchset
introduces cqe-size ethtool parameter to modify the size
of the completion queue event if NIC hardware has that capability.
A bigger completion queue event can have more receive buffer pointers
inturn NIC can transfer a bigger frame from wire as long as
hardware(MAC) receive frame size limit is not exceeded.

Patch 1 adds support setting/getting cqe-size via
ethtool -G and ethtool -g.

Patch 2 includes octeontx2 driver changes to use
completion queue event size set from ethtool -G.
====================

Link: https://lore.kernel.org/r/1645555153-4932-1-git-send-email-sbhatta@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-23 20:33:16 -08:00
Subbaraya Sundeep
68258596cb octeontx2-pf: Vary completion queue event size
Completion Queue Entry(CQE) is a descriptor written
by hardware to notify software about the send and
receive completion status. The CQE can be of size
128 or 512 bytes. A 512 bytes CQE can hold more receive
fragments pointers compared to 128 bytes CQE. This
patch enables to modify CQE size using:
<ethtool -G cqe-size N>.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-23 20:33:06 -08:00
Subbaraya Sundeep
1241e329ce ethtool: add support to set/get completion queue event size
Add support to set completion queue event size via ethtool -G
parameter and get it via ethtool -g parameter.

~ # ./ethtool -G eth0 cqe-size 512
~ # ./ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX:             1048576
RX Mini:        n/a
RX Jumbo:       n/a
TX:             1048576
Current hardware settings:
RX:             256
RX Mini:        n/a
RX Jumbo:       n/a
TX:             4096
RX Buf Len:             2048
CQE Size:                128

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-23 20:33:05 -08:00
Xin Long
6a47cdc381 Revert "vlan: move dev_put into vlan_dev_uninit"
This reverts commit d6ff94afd90b0ce8d1715f8ef77d4347d7a7f2c0.

Since commit faab39f63c1f ("net: allow out-of-order netdev unregistration")
fixed the issue in a better way, this patch is to revert the previous fix,
as it might bring back the old problem fixed by commit 563bcbae3ba2 ("net:
vlan: fix a UAF in vlan_dev_real_dev()").

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/563c0a6e48510ccbff9ef4715de37209695e9fc4.1645592097.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-23 15:21:13 -08:00
Sebastian Andrzej Siewior
167053f8dd net: Correct wrong BH disable in hard-interrupt.
I missed the obvious case where netif_ix() is invoked from hard-IRQ
context.

Disabling bottom halves is only needed in process context. This ensures
that the code remains on the current CPU and that the soft-interrupts
are processed at local_bh_enable() time.
In hard- and soft-interrupt context this is already the case and the
soft-interrupts will be processed once the context is left (at irq-exit
time).

Disable bottom halves if neither hard-interrupts nor soft-interrupts are
disabled. Update the kernel-doc, mention that interrupts must be enabled
if invoked from process context.

Fixes: baebdf48c3600 ("net: dev: Makes sure netif_rx() can be invoked in any context.")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/Yg05duINKBqvnxUc@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-23 08:04:27 -08:00
David S. Miller
6ce71687d4 Merge branch 'locked-bridge-ports'
Hans Schultz says:

====================
Add support for locked bridge ports (for 802.1X)

This series starts by adding support for SA filtering to the bridge,
which is then allowed to be offloaded to switchdev devices. Furthermore
an offloading implementation is supplied for the mv88e6xxx driver.

Public Local Area Networks are often deployed such that there is a
risk of unauthorized or unattended clients getting access to the LAN.
To prevent such access we introduce SA filtering, such that ports
designated as secure ports are set in locked mode, so that only
authorized source MAC addresses are given access by adding them to
the bridges forwarding database. Incoming packets with source MAC
addresses that are not in the forwarding database of the bridge are
discarded. It is then the task of user space daemons to populate the
bridge's forwarding database with static entries of authorized entities.

The most common approach is to use the IEEE 802.1X protocol to take
care of the authorization of allowed users to gain access by opening
for the source address of the authorized host.

With the current use of the bridge parameter in hostapd, there is
a limitation in using this for IEEE 802.1X port authentication. It
depends on hostapd attaching the port on which it has a successful
authentication to the bridge, but that only allows for a single
authentication per port. This patch set allows for the use of
IEEE 802.1X port authentication in a more general network context with
multiple 802.1X aware hosts behind a single port as depicted, which is
a commonly used commercial use-case, as it is only the number of
available entries in the forwarding database that limits the number of
authenticated clients.

      +--------------------------------+
      |                                |
      |      Bridge/Authenticator      |
      |                                |
      +-------------+------------------+
       802.1X port  |
                    |
                    |
             +------+-------+
             |              |
             |  Hub/Switch  |
             |              |
             +-+----------+-+
               |          |
            +--+--+    +--+--+
            |     |    |     |
    Hosts   |  a  |    |  b  |   . . .
            |     |    |     |
            +-----+    +-----+

The 802.1X standard involves three different components, a Supplicant
(Host), an Authenticator (Network Access Point) and an Authentication
Server which is typically a Radius server. This patch set thus enables
the bridge module together with an authenticator application to serve
as an Authenticator on designated ports.

For the bridge to become an IEEE 802.1X Authenticator, a solution using
hostapd with the bridge driver can be found at
https://github.com/westermo/hostapd/tree/bridge_driver .

The relevant components work transparently in relation to if it is the
bridge module or the offloaded switchcore case that is in use.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:52:35 +00:00
Hans Schultz
b2b681a412 selftests: forwarding: tests of locked port feature
These tests check that the basic locked port feature works, so that
no 'host' can communicate (ping) through a locked port unless the
MAC address of the 'host' interface is in the forwarding database of
the bridge.

Signed-off-by: Hans Schultz <schultz.hans+netdev@gmail.com>
Acked-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:52:34 +00:00
Hans Schultz
34ea415f92 net: dsa: mv88e6xxx: Add support for bridge port locked mode
Supporting bridge ports in locked mode using the drop on lock
feature in Marvell mv88e6xxx switchcores is described in the
'88E6096/88E6097/88E6097F Datasheet', sections 4.4.6, 4.4.7 and
5.1.2.1 (Drop on Lock).

This feature is implemented here facilitated by the locked port flag.

Signed-off-by: Hans Schultz <schultz.hans+netdev@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:52:34 +00:00
Hans Schultz
b9e8b58fd2 net: dsa: Include BR_PORT_LOCKED in the list of synced brport flags
Ensures that the DSA switch driver gets notified of changes to the
BR_PORT_LOCKED flag as well, for the case when a DSA port joins or
leaves a LAG that is a bridge port.

Signed-off-by: Hans Schultz <schultz.hans+netdev@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:52:34 +00:00
Hans Schultz
fa1c833429 net: bridge: Add support for offloading of locked port flag
Various switchcores support setting ports in locked mode, so that
clients behind locked ports cannot send traffic through the port
unless a fdb entry is added with the clients MAC address.

Signed-off-by: Hans Schultz <schultz.hans+netdev@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:52:34 +00:00
Hans Schultz
a21d9a670d net: bridge: Add support for bridge port in locked mode
In a 802.1X scenario, clients connected to a bridge port shall not
be allowed to have traffic forwarded until fully authenticated.
A static fdb entry of the clients MAC address for the bridge port
unlocks the client and allows bidirectional communication.

This scenario is facilitated with setting the bridge port in locked
mode, which is also supported by various switchcore chipsets.

Signed-off-by: Hans Schultz <schultz.hans+netdev@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:52:34 +00:00
Eric Dumazet
b26ef81c46 drop_monitor: remove quadratic behavior
drop_monitor is using an unique list on which all netdevices in
the host have an element, regardless of their netns.

This scales poorly, not only at device unregister time (what I
caught during my netns dismantle stress tests), but also at packet
processing time whenever trace_napi_poll_hit() is called.

If the intent was to avoid adding one pointer in 'struct net_device'
then surely we prefer O(1) behavior.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:39:58 +00:00
David S. Miller
503310a5d4 Merge branch 'mlxsw-next'
Ido Schimmel says:

====================
mlxsw: Various updates

This patchset contains miscellaneous updates to mlxsw gathered over
time.

Patches #1-#2 fix recent regressions present in net-next.

Patches #3-#11 are small cleanups performed while adding line card
support in mlxsw.

Patch #12 adds the SFF-8024 Identifier Value of OSFP transceiver in
order to be able to dump their EEPROM contents over the ethtool IOCTL
interface.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:38:17 +00:00
Danielle Ratson
f881c4ab37 mlxsw: core: Add support for OSFP transceiver modules
The driver can already dump the EEPROM contents of QSFP-DD transceiver
modules via its ethtool_ops::get_module_info() and
ethtool_ops::get_module_eeprom() callbacks.

Add support for OSFP transceiver modules by adding their SFF-8024
Identifier Value (0x19).

This is required for future NVIDIA Spectrum-4 based systems that will be
equipped with OSFP transceivers.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:38:17 +00:00
Ido Schimmel
cc4d3de990 mlxsw: Remove resource query check
Since SwitchX-2 support was removed in commit b0d80c013b04 ("mlxsw:
Remove Mellanox SwitchX-2 ASIC support"), all the ASICs supported by
mlxsw support the resource query command.

Therefore, remove the resource query check and always query resources
from the device.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:38:17 +00:00
Vadim Pasternak
902992d18f mlxsw: core: Unify method of trap support validation
Currently there are several different features defined in 'mlxsw_driver'
for trap support validation. There is no reason to have dedicated
features for specific traps. Perform validation of all of them by
testing feature 'MLXSW_BUS_F_TXRX'.

Remove trap capability validation from 'core_env.c' which is redundant
after validation has been added to mlxsw_core_trap_register().

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:38:17 +00:00
Jiri Pirko
8b5f555be8 mlxsw: spectrum: Remove SP{1,2,3} defines for FW minor and subminor
The FW minor and subminor versions are the same for all generations of
Spectrum ASICs. Unify them into a single set of defines.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:38:17 +00:00
Vadim Pasternak
af9911c569 mlxsw: core: Remove unnecessary asserts
Remove unnecessary asserts for module index validation. Leave only one
that is actually necessary in mlxsw_env_pmpe_listener_func() where the
module index is directly read from the firmware event.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:38:17 +00:00
Vadim Pasternak
719fc0662c mlxsw: reg: Add "mgpir_" prefix to MGPIR fields comments
Do the same as for other registers and have "mgpir_" prefix for the
MGPIR fields.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:38:17 +00:00
Vadim Pasternak
bfb82c9cce mlxsw: core_thermal: Remove obsolete API for query resource
Remove obsolete API mlxsw_core_res_query_enabled(), which is only
relevant for end-of-life SwitchX-2 ASICs. Support for these ASICs was
removed in commit b0d80c013b04 ("mlxsw: Remove Mellanox SwitchX-2 ASIC
support").

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:38:17 +00:00
Vadim Pasternak
009da9fad5 mlxsw: core_thermal: Rename labels according to naming convention
Rename labels for error flow handling in order to align with naming
convention used in rest of 'mlxsw' code.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:38:17 +00:00
Vadim Pasternak
bed8f4197c mlxsw: core_hwmon: Fix variable names for hwmon attributes
Replace all local variables 'mlwsw_hwmon_attr' by 'mlxsw_hwmon_attr'.
All variable prefixes should start with 'mlxsw' according to the naming
convention, so 'mlwsw' is changed to 'mlxsw'.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:38:16 +00:00
Vadim Pasternak
f8a36880f4 mlxsw: core_thermal: Avoid creation of virtual hwmon objects by thermal module
The driver registers with both the hwmon and thermal subsystems.
Therefore, there is no need for the thermal subsystem to automatically
create hwmon entries upon registration of a thermal zone, as this
results in duplicate information.

Avoid creation of virtual hwmon objects by thermal subsystem by
registering a thermal zone with 'no_hwmon' set to 'true'.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:38:16 +00:00
Ido Schimmel
42c9135fef mlxsw: spectrum_span: Ignore VLAN entries not used by the bridge in mirroring
Only VLAN entries installed on the bridge device itself should be
considered when checking whether a packet with a specific VLAN can be
mirrored via a bridge device. VLAN entries only used to keep context
(i.e., entries with 'BRIDGE_VLAN_INFO_BRENTRY' unset) should be ignored.

Fix this by preventing mirroring when the VLAN entry does not have the
'BRIDGE_VLAN_INFO_BRENTRY' flag set.

Fixes: ddaff5047003 ("mlxsw: spectrum: remove guards against !BRIDGE_VLAN_INFO_BRENTRY")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:38:16 +00:00
Vadim Pasternak
c035ea76c4 mlxsw: core: Prevent trap group setting if driver does not support EMAD
Avoid trap group setting if driver is not capable of EMAD support.
For example, "mlxsw_minimal" driver works over I2C bus, overs which
EMADs cannot be sent.
Validation is performed by testing feature 'MLXSW_BUS_F_TXRX'.

Fixes: 74e0494d35ac ("mlxsw: core: Move basic_trap_groups_set() call out of EMAD init code")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:38:16 +00:00
Matt Johnston
8d783197f0 mctp: Fix warnings reported by clang-analyzer
net/mctp/device.c:140:11: warning: Assigned value is garbage or undefined
[clang-analyzer-core.uninitialized.Assign]
        mcb->idx = idx;

- Not a real problem due to how the callback runs, fix the warning.

net/mctp/route.c:458:4: warning: Value stored to 'msk' is never read
[clang-analyzer-deadcode.DeadStores]
        msk = container_of(key->sk, struct mctp_sock, sk);

- 'msk' dead assignment can be removed here.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:31:39 +00:00
David S. Miller
3185485cfa Merge branch 'mctp-incorrect-addr-refs'
Matt Johnston says:

====================
mctp: Fix incorrect refs for extended addr

This fixes an incorrect netdev unref and also addresses the race
condition identified by Jakub in v2. Thanks for the review.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:29:15 +00:00
Matt Johnston
e297db3ead mctp: Fix incorrect netdev unref for extended addr
In the extended addressing local route output codepath
dev_get_by_index_rcu() doesn't take a dev_hold() so we shouldn't
dev_put().

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:29:15 +00:00
Matt Johnston
dc121c0084 mctp: make __mctp_dev_get() take a refcount hold
Previously there was a race that could allow the mctp_dev refcount
to hit zero:

rcu_read_lock();
mdev = __mctp_dev_get(dev);
// mctp_unregister() happens here, mdev->refs hits zero
mctp_dev_hold(dev);
rcu_read_unlock();

Now we make __mctp_dev_get() take the hold itself. It is safe to test
against the zero refcount because __mctp_dev_get() is called holding
rcu_read_lock and mctp_dev uses kfree_rcu().

Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:29:15 +00:00
David S. Miller
4767b7e2ed Merge branch 'dsa-realtek-phy-read-corruption'
Alvin Šipraga says:

====================
net: dsa: realtek: fix PHY register read corruption

These two patches fix the issue reported by Arınç where PHY register
reads sometimes return garbage data.

v1 -> v2:

- no code changes
- just update the commit message of patch 2 to reflect the conclusion
  of further investigation requested by Vladimir
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:24:29 +00:00
Alvin Šipraga
2796728460 net: dsa: realtek: rtl8365mb: serialize indirect PHY register access
Realtek switches in the rtl8365mb family can access the PHY registers of
the internal PHYs via the switch registers. This method is called
indirect access. At a high level, the indirect PHY register access
method involves reading and writing some special switch registers in a
particular sequence. This works for both SMI and MDIO connected
switches.

Currently the rtl8365mb driver does not take any care to serialize the
aforementioned access to the switch registers. In particular, it is
permitted for other driver code to access other switch registers while
the indirect PHY register access is ongoing. Locking is only done at the
regmap level. This, however, is a bug: concurrent register access, even
to unrelated switch registers, risks corrupting the PHY register value
read back via the indirect access method described above.

Arınç reported that the switch sometimes returns nonsense data when
reading the PHY registers. In particular, a value of 0 causes the
kernel's PHY subsystem to think that the link is down, but since most
reads return correct data, the link then flip-flops between up and down
over a period of time.

The aforementioned bug can be readily observed by:

 1. Enabling ftrace events for regmap and mdio
 2. Polling BSMR PHY register for a connected port;
    it should always read the same (e.g. 0x79ed)
 3. Wait for step 2 to give a different value

Example command for step 2:

    while true; do phytool read swp2/2/0x01; done

On my i.MX8MM, the above steps will yield a bogus value for the BSMR PHY
register within a matter of seconds. The interleaved register access it
then evident in the trace log:

 kworker/3:4-70      [003] .......  1927.139849: regmap_reg_write: ethernet-switch reg=1004 val=bd
     phytool-16816   [002] .......  1927.139979: regmap_reg_read: ethernet-switch reg=1f01 val=0
 kworker/3:4-70      [003] .......  1927.140381: regmap_reg_read: ethernet-switch reg=1005 val=0
     phytool-16816   [002] .......  1927.140468: regmap_reg_read: ethernet-switch reg=1d15 val=a69
 kworker/3:4-70      [003] .......  1927.140864: regmap_reg_read: ethernet-switch reg=1003 val=0
     phytool-16816   [002] .......  1927.140955: regmap_reg_write: ethernet-switch reg=1f02 val=2041
 kworker/3:4-70      [003] .......  1927.141390: regmap_reg_read: ethernet-switch reg=1002 val=0
     phytool-16816   [002] .......  1927.141479: regmap_reg_write: ethernet-switch reg=1f00 val=1
 kworker/3:4-70      [003] .......  1927.142311: regmap_reg_write: ethernet-switch reg=1004 val=be
     phytool-16816   [002] .......  1927.142410: regmap_reg_read: ethernet-switch reg=1f01 val=0
 kworker/3:4-70      [003] .......  1927.142534: regmap_reg_read: ethernet-switch reg=1005 val=0
     phytool-16816   [002] .......  1927.142618: regmap_reg_read: ethernet-switch reg=1f04 val=0
     phytool-16816   [002] .......  1927.142641: mdio_access: SMI-0 read  phy:0x02 reg:0x01 val:0x0000 <- ?!
 kworker/3:4-70      [003] .......  1927.143037: regmap_reg_read: ethernet-switch reg=1001 val=0
 kworker/3:4-70      [003] .......  1927.143133: regmap_reg_read: ethernet-switch reg=1000 val=2d89
 kworker/3:4-70      [003] .......  1927.143213: regmap_reg_write: ethernet-switch reg=1004 val=be
 kworker/3:4-70      [003] .......  1927.143291: regmap_reg_read: ethernet-switch reg=1005 val=0
 kworker/3:4-70      [003] .......  1927.143368: regmap_reg_read: ethernet-switch reg=1003 val=0
 kworker/3:4-70      [003] .......  1927.143443: regmap_reg_read: ethernet-switch reg=1002 val=6

The kworker here is polling MIB counters for stats, as evidenced by the
register 0x1004 that we are writing to (RTL8365MB_MIB_ADDRESS_REG). This
polling is performed every 3 seconds, but is just one example of such
unsynchronized access. In Arınç's case, the driver was not using the
switch IRQ, so the PHY subsystem was itself doing polling analogous to
phytool in the above example.

A test module was created [see second Link] to simulate such spurious
switch register accesses while performing indirect PHY register reads
and writes. Realtek was also consulted to confirm whether this is a
known issue or not. The conclusion of these lines of inquiry is as
follows:

1. Reading of PHY registers via indirect access will be aborted if,
   after executing the read operation (via a write to the
   INDIRECT_ACCESS_CTRL_REG), any register is accessed, other than
   INDIRECT_ACCESS_STATUS_REG.

2. The PHY register indirect read is only complete when
   INDIRECT_ACCESS_STATUS_REG reads zero.

3. The INDIRECT_ACCESS_DATA_REG, which is read to get the result of the
   PHY read, will contain the result of the last successful read
   operation. If there was spurious register access and the indirect
   read was aborted, then this register is not guaranteed to hold
   anything meaningful and the PHY read will silently fail.

4. PHY writes do not appear to be affected by this mechanism.

5. Other similar access routines, such as for MIB counters, although
   similar to the PHY indirect access method, are actually table access.
   Table access is not affected by spurious reads or writes of other
   registers. However, concurrent table access is not allowed. Currently
   this is protected via mib_lock, so there is nothing to fix.

The above statements are corroborated both via the test module and
through consultation with Realtek. In particular, Realtek states that
this is simply a property of the hardware design and is not a hardware
bug.

To fix this problem, one must guard against regmap access while the
PHY indirect register read is executing. Fix this by using the newly
introduced "nolock" regmap in all PHY-related functions, and by aquiring
the regmap mutex at the top level of the PHY register access callbacks.
Although no issue has been observed with PHY register _writes_, this
change also serializes the indirect access method there. This is done
purely as a matter of convenience and for reasons of symmetry.

Fixes: 4af2950c50c8 ("net: dsa: realtek-smi: add rtl8365mb subdriver for RTL8365MB-VC")
Link: https://lore.kernel.org/netdev/CAJq09z5FCgG-+jVT7uxh1a-0CiiFsoKoHYsAWJtiKwv7LXKofQ@mail.gmail.com/
Link: https://lore.kernel.org/netdev/871qzwjmtv.fsf@bang-olufsen.dk/
Reported-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reported-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:24:29 +00:00
Alvin Šipraga
907e772f6f net: dsa: realtek: allow subdrivers to externally lock regmap
Currently there is no way for Realtek DSA subdrivers to serialize
consecutive regmap accesses. In preparation for a bugfix relating to
indirect PHY register access - which involves a series of regmap
reads and writes - add a facility for subdrivers to serialize their
regmap access.

Specifically, a mutex is added to the driver private data structure and
the standard regmap is initialized with custom lock/unlock ops which use
this mutex. Then, a "nolock" variant of the regmap is added, which is
functionally equivalent to the existing regmap except that regmap
locking is disabled. Functions that wish to serialize a sequence of
regmap accesses may then lock the newly introduced driver-owned mutex
before using the nolock regmap.

Doing things this way means that subdriver code that doesn't care about
serialized register access - i.e. the vast majority of code - needn't
worry about synchronizing register access with an external lock: it can
just continue to use the original regmap.

Another advantage of this design is that, while regmaps with locking
disabled do not expose a debugfs interface for obvious reasons, there
still exists the original regmap which does expose this interface. This
interface remains safe to use even combined with driver codepaths that
use the nolock regmap, because said codepaths will use the same mutex
to synchronize access.

With respect to disadvantages, it can be argued that having
near-duplicate regmaps is confusing. However, the naming is rather
explicit, and examples will abound.

Finally, while we are at it, rename realtek_smi_mdio_regmap_config to
realtek_smi_regmap_config. This makes it consistent with the naming
realtek_mdio_regmap_config in realtek-mdio.c.

Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:24:29 +00:00
Vladimir Oltean
acd8df5880 net: switchdev: avoid infinite recursion from LAG to bridge with port object handler
The logic from switchdev_handle_port_obj_add_foreign() is directly
adapted from switchdev_handle_fdb_event_to_device(), which already
detects events on foreign interfaces and reoffloads them towards the
switchdev neighbors.

However, when we have a simple br0 <-> bond0 <-> swp0 topology and the
switchdev_handle_port_obj_add_foreign() gets called on bond0, we get
stuck into an infinite recursion:

1. bond0 does not pass check_cb(), so we attempt to find switchdev
   neighbor interfaces. For that, we recursively call
   __switchdev_handle_port_obj_add() for bond0's bridge, br0.

2. __switchdev_handle_port_obj_add() recurses through br0's lowers,
   essentially calling __switchdev_handle_port_obj_add() for bond0

3. Go to step 1.

This happens because switchdev_handle_fdb_event_to_device() and
switchdev_handle_port_obj_add_foreign() are not exactly the same.
The FDB event helper special-cases LAG interfaces with its lag_mod_cb(),
so this is why we don't end up in an infinite loop - because it doesn't
attempt to treat LAG interfaces as potentially foreign bridge ports.

The problem is solved by looking ahead through the bridge's lowers to
see whether there is any switchdev interface that is foreign to the @dev
we are currently processing. This stops the recursion described above at
step 1: __switchdev_handle_port_obj_add(bond0) will not create another
call to __switchdev_handle_port_obj_add(br0). Going one step upper
should only happen when we're starting from a bridge port that has been
determined to be "foreign" to the switchdev driver that passes the
foreign_dev_check_cb().

Fixes: c4076cdd21f8 ("net: switchdev: introduce switchdev_handle_port_obj_{add,del} for foreign interfaces")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:12:12 +00:00
Shannon Nelson
922ea87ff6 ionic: use vmalloc include
The ever-vigilant Linux kernel test robot reminded us that
we need to use the correct include files to be sure that
all the build variations will work correctly.  Adding the
vmalloc.h include takes care of declaring our use of vzalloc()
and vfree().

drivers/net/ethernet/pensando/ionic/ionic_lif.c:396:17: error: implicit
declaration of function 'vfree'; did you mean 'kvfree'?

drivers/net/ethernet/pensando/ionic/ionic_lif.c:531:21: warning:
assignment to 'struct ionic_desc_info *' from 'int' makes pointer from
integer without a cast

Fixes: 116dce0ff047 ("ionic: Use vzalloc for large per-queue related buffers")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Link: https://lore.kernel.org/r/20220223015731.22025-1-snelson@pensando.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-22 19:53:14 -08:00
Jakub Kicinski
fa4fad40d5 Merge branch 'tcp-take-care-of-another-syzbot-issue'
Eric Dumazet says:

====================
tcp: take care of another syzbot issue

This is a minor issue: It took months for syzbot to find a C repro,
and even with it, I had to spend a lot of time to understand KFENCE
was a prereq. With the default kfence 500ms interval, I had to be
very patient to trigger the kernel warning and perform my analysis.

This series targets net-next tree, because I added a new generic helper
in the first patch, then fixed the issue in the second one.
They can be backported once proven solid.
====================

Link: https://lore.kernel.org/r/20220222032113.4005821-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-22 19:44:07 -08:00
Eric Dumazet
2b88cba558 net: preserve skb_end_offset() in skb_unclone_keeptruesize()
syzbot found another way to trigger the infamous WARN_ON_ONCE(delta < len)
in skb_try_coalesce() [1]

I was able to root cause the issue to kfence.

When kfence is in action, the following assertion is no longer true:

int size = xxxx;
void *ptr1 = kmalloc(size, gfp);
void *ptr2 = kmalloc(size, gfp);

if (ptr1 && ptr2)
	ASSERT(ksize(ptr1) == ksize(ptr2));

We attempted to fix these issues in the blamed commits, but forgot
that TCP was possibly shifting data after skb_unclone_keeptruesize()
has been used, notably from tcp_retrans_try_collapse().

So we not only need to keep same skb->truesize value,
we also need to make sure TCP wont fill new tailroom
that pskb_expand_head() was able to get from a
addr = kmalloc(...) followed by ksize(addr)

Split skb_unclone_keeptruesize() into two parts:

1) Inline skb_unclone_keeptruesize() for the common case,
   when skb is not cloned.

2) Out of line __skb_unclone_keeptruesize() for the 'slow path'.

WARNING: CPU: 1 PID: 6490 at net/core/skbuff.c:5295 skb_try_coalesce+0x1235/0x1560 net/core/skbuff.c:5295
Modules linked in:
CPU: 1 PID: 6490 Comm: syz-executor161 Not tainted 5.17.0-rc4-syzkaller-00229-g4f12b742eb2b #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:skb_try_coalesce+0x1235/0x1560 net/core/skbuff.c:5295
Code: bf 01 00 00 00 0f b7 c0 89 c6 89 44 24 20 e8 62 24 4e fa 8b 44 24 20 83 e8 01 0f 85 e5 f0 ff ff e9 87 f4 ff ff e8 cb 20 4e fa <0f> 0b e9 06 f9 ff ff e8 af b2 95 fa e9 69 f0 ff ff e8 95 b2 95 fa
RSP: 0018:ffffc900063af268 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 00000000ffffffd5 RCX: 0000000000000000
RDX: ffff88806fc05700 RSI: ffffffff872abd55 RDI: 0000000000000003
RBP: ffff88806e675500 R08: 00000000ffffffd5 R09: 0000000000000000
R10: ffffffff872ab659 R11: 0000000000000000 R12: ffff88806dd554e8
R13: ffff88806dd9bac0 R14: ffff88806dd9a2c0 R15: 0000000000000155
FS:  00007f18014f9700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020002000 CR3: 000000006be7a000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 tcp_try_coalesce net/ipv4/tcp_input.c:4651 [inline]
 tcp_try_coalesce+0x393/0x920 net/ipv4/tcp_input.c:4630
 tcp_queue_rcv+0x8a/0x6e0 net/ipv4/tcp_input.c:4914
 tcp_data_queue+0x11fd/0x4bb0 net/ipv4/tcp_input.c:5025
 tcp_rcv_established+0x81e/0x1ff0 net/ipv4/tcp_input.c:5947
 tcp_v4_do_rcv+0x65e/0x980 net/ipv4/tcp_ipv4.c:1719
 sk_backlog_rcv include/net/sock.h:1037 [inline]
 __release_sock+0x134/0x3b0 net/core/sock.c:2779
 release_sock+0x54/0x1b0 net/core/sock.c:3311
 sk_wait_data+0x177/0x450 net/core/sock.c:2821
 tcp_recvmsg_locked+0xe28/0x1fd0 net/ipv4/tcp.c:2457
 tcp_recvmsg+0x137/0x610 net/ipv4/tcp.c:2572
 inet_recvmsg+0x11b/0x5e0 net/ipv4/af_inet.c:850
 sock_recvmsg_nosec net/socket.c:948 [inline]
 sock_recvmsg net/socket.c:966 [inline]
 sock_recvmsg net/socket.c:962 [inline]
 ____sys_recvmsg+0x2c4/0x600 net/socket.c:2632
 ___sys_recvmsg+0x127/0x200 net/socket.c:2674
 __sys_recvmsg+0xe2/0x1a0 net/socket.c:2704
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: c4777efa751d ("net: add and use skb_unclone_keeptruesize() helper")
Fixes: 097b9146c0e2 ("net: fix up truesize of cloned skb in skb_prepare_for_shift()")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-22 19:43:50 -08:00
Eric Dumazet
763087dab9 net: add skb_set_end_offset() helper
We have multiple places where this helper is convenient,
and plan using it in the following patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-22 19:43:04 -08:00
Eric Dumazet
0ebea8f9b8 ipv6: tcp: consistently use MAX_TCP_HEADER
All other skbs allocated for TCP tx are using MAX_TCP_HEADER already.

MAX_HEADER can be too small for some cases (like eBPF based encapsulation),
so this can avoid extra pskb_expand_head() in lower stacks.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220222031115.4005060-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-22 17:27:18 -08:00
Maciek Machnikowski
f64ae40de5 testptp: add option to shift clock by nanoseconds
Add option to shift the clock by a specified number of nanoseconds.

The new argument -n will specify the number of nanoseconds to add to the
ptp clock. Since the API doesn't support negative shifts those needs to
be calculated by subtracting full seconds and adding a nanosecond offset.

Signed-off-by: Maciek Machnikowski <maciek@machnikowski.net>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20220221200637.125595-1-maciek@machnikowski.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-22 17:03:40 -08:00