2138 Commits

Author SHA1 Message Date
DENG Qingfang
342afce10d net: dsa: mt7530: correct ds->num_ports
Setting ds->num_ports to DSA_MAX_PORTS made DSA core allocate unnecessary
dsa_port's and call mt7530_port_disable for non-existent ports.

Set it to MT7530_NUM_PORTS to fix that, and dsa_is_user_port check in
port_enable/disable is no longer required.

Cc: stable@vger.kernel.org
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18 13:22:21 +01:00
Aleksander Jan Bajkowski
66d262804a net: dsa: lantiq_gswip: fix register definition
I compared the register definitions with the D-Link DWR-966
GPL sources and found that the PUAFD field definition was
incorrect. This definition is unused and causes no issues.

Fixes: 14fceff4771e ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18 13:21:49 +01:00
Vladimir Oltean
8d5f7954b7 net: dsa: felix: break at first CPU port during init and teardown
The NXP LS1028A switch has two Ethernet ports towards the CPU, but only
one of them is capable of acting as an NPI port at a time (inject and
extract packets using DSA tags).

However, using the alternative ocelot-8021q tagging protocol, it should
be possible to use both CPU ports symmetrically, but for that we need to
mark both ports in the device tree as DSA masters.

In the process of doing that, it can be seen that traffic to/from the
network stack gets broken, and this is because the Felix driver iterates
through all DSA CPU ports and configures them as NPI ports. But since
there can only be a single NPI port, we effectively end up in a
situation where DSA thinks the default CPU port is the first one, but
the hardware port configured to be an NPI is the last one.

I would like to treat this as a bug, because if the updated device trees
are going to start circulating, it would be really good for existing
kernels to support them, too.

Fixes: adb3dccf090b ("net: dsa: felix: convert to the new .change_tag_protocol DSA API")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 17:35:19 -07:00
Vladimir Oltean
1328a88325 net: dsa: felix: purge skb from TX timestamping queue if it cannot be sent
At present, when a PTP packet which requires TX timestamping gets
dropped under congestion by the switch, things go downhill very fast.
The driver keeps a clone of that skb in a queue of packets awaiting TX
timestamp interrupts, but interrupts will never be raised for the
dropped packets.

Moreover, matching timestamped packets to timestamps is done by a 2-bit
timestamp ID, and this can wrap around and we can match on the wrong skb.

Since with the default NPI-based tagging protocol, we get no notification
about packet drops, the best we can do is eventually recover from the
drop of a PTP frame: its skb will be dead memory until another skb which
was assigned the same timestamp ID happens to find it.

However, with the ocelot-8021q tagger which injects packets using the
manual register interface, it appears that we can check for more
information, such as:

- whether the input queue has reached the high watermark or not
- whether the injection group's FIFO can accept additional data or not

so we know that a PTP frame is likely to get dropped before actually
sending it, and drop it ourselves (because DSA uses NETIF_F_LLTX, so it
can't return NETDEV_TX_BUSY to ask the qdisc to requeue the packet).

But when we do that, we can also remove the skb from the timestamping
queue, because there surely won't be any timestamp that matches it.

Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 17:35:18 -07:00
Vladimir Oltean
49f885b2d9 net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib
Michael reported that when using the "ocelot-8021q" tagging protocol,
the switch driver module must be manually loaded before the tagging
protocol can be loaded/is available.

This appears to be the same problem described here:
https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
where due to the fact that DSA tagging protocols make use of symbols
exported by the switch drivers, circular dependencies appear and this
breaks module autoloading.

The ocelot_8021q driver needs the ocelot_can_inject() and
ocelot_port_inject_frame() functions from the switch library. Previously
the wrong approach was taken to solve that dependency: shims were
provided for the case where the ocelot switch library was compiled out,
but that turns out to be insufficient, because the dependency when the
switch lib _is_ compiled is problematic too.

We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as
static inline functions, because these access I/O functions like
__ocelot_write_ix() which is called by ocelot_write_rix(). Making those
static inline basically means exposing the whole guts of the ocelot
switch library, not ideal...

We already have one tagging protocol driver which calls into the switch
driver during xmit but not using any exported symbol: sja1105_defer_xmit.
We can do the same thing here: create a kthread worker and one work item
per skb, and let the switch driver itself do the register accesses to
send the skb, and then consume it.

Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
Reported-by: Michael Walle <michael@walle.cc>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 17:35:18 -07:00
Vladimir Oltean
52849bcf00 net: mscc: ocelot: avoid overflowing the PTP timestamp FIFO
PTP packets with 2-step TX timestamp requests are matched to packets
based on the egress port number and a 6-bit timestamp identifier.
All PTP timestamps are held in a common FIFO that is 128 entry deep.

This patch ensures that back-to-back timestamping requests cannot exceed
the hardware FIFO capacity. If that happens, simply send the packets
without requesting a TX timestamp to be taken (in the case of felix,
since the DSA API has a void return code in ds->ops->port_txtstamp) or
drop them (in the case of ocelot).

I've moved the ts_id_lock from a per-port basis to a per-switch basis,
because we need separate accounting for both numbers of PTP frames in
flight. And since we need locking to inc/dec the per-switch counter,
that also offers protection for the per-port counter and hence there is
no reason to have a per-port counter anymore.

Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 17:35:17 -07:00
Vladimir Oltean
4ac0567e40 net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch driver
It's nice to be able to test a tagging protocol with dsa_loop, but not
at the cost of losing the ability of building the tagging protocol and
switch driver as modules, because as things stand, there is a circular
dependency between the two. Tagging protocol drivers cannot depend on
switch drivers, that is a hard fact.

The reasoning behind the blamed patch was that accessing dp->priv should
first make sure that the structure behind that pointer is what we really
think it is.

Currently the "sja1105" and "sja1110" tagging protocols only operate
with the sja1105 switch driver, just like any other tagging protocol and
switch combination. The only way to mix and match them is by modifying
the code, and this applies to dsa_loop as well (by default that uses
DSA_TAG_PROTO_NONE). So while in principle there is an issue, in
practice there isn't one.

Until we extend dsa_loop to allow user space configuration, treat the
problem as a non-issue and just say that DSA ports found by tag_sja1105
are always sja1105 ports, which is in fact true. But keep the
dsa_port_is_sja1105 function so that it's easy to patch it during
testing, and rely on dead code elimination.

Fixes: 994d2cbb08ca ("net: dsa: tag_sja1105: be dsa_loop-safe")
Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 17:33:36 -07:00
Vladimir Oltean
28da0555c3 net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver
The problem is that DSA tagging protocols really must not depend on the
switch driver, because this creates a circular dependency at insmod
time, and the switch driver will effectively not load when the tagging
protocol driver is missing.

The code was structured in the way it was for a reason, though. The DSA
driver-facing API for PTP timestamping relies on the assumption that
two-step TX timestamps are provided by the hardware in an out-of-band
manner, typically by raising an interrupt and making that timestamp
available inside some sort of FIFO which is to be accessed over
SPI/MDIO/etc.

So the API puts .port_txtstamp into dsa_switch_ops, because it is
expected that the switch driver needs to save some state (like put the
skb into a queue until its TX timestamp arrives).

On SJA1110, TX timestamps are provided by the switch as Ethernet
packets, so this makes them be received and processed by the tagging
protocol driver. This in itself is great, because the timestamps are
full 64-bit and do not require reconstruction, and since Ethernet is the
fastest I/O method available to/from the switch, PTP timestamps arrive
very quickly, no matter how bottlenecked the SPI connection is, because
SPI interaction is not needed at all.

DSA's code structure and strict isolation between the tagging protocol
driver and the switch driver break the natural code organization.

When the tagging protocol driver receives a packet which is classified
as a metadata packet containing timestamps, it passes those timestamps
one by one to the switch driver, which then proceeds to compare them
based on the recorded timestamp ID that was generated in .port_txtstamp.

The communication between the tagging protocol and the switch driver is
done through a method exported by the switch driver, sja1110_process_meta_tstamp.
To satisfy build requirements, we force a dependency to build the
tagging protocol driver as a module when the switch driver is a module.
However, as explained in the first paragraph, that causes the circular
dependency.

To solve this, move the skb queue from struct sja1105_private :: struct
sja1105_ptp_data to struct sja1105_private :: struct sja1105_tagger_data.
The latter is a data structure for which hacks have already been put
into place to be able to create persistent storage per switch that is
accessible from the tagging protocol driver (see sja1105_setup_ports).

With the skb queue directly accessible from the tagging protocol driver,
we can now move sja1110_process_meta_tstamp into the tagging driver
itself, and avoid exporting a symbol.

Fixes: 566b18c8b752 ("net: dsa: sja1105: implement TX timestamping for SJA1110")
Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 17:33:36 -07:00
Arun Ramadoss
ef1100ef20 net: dsa: microchip: Added the condition for scheduling ksz_mib_read_work
When the ksz module is installed and removed using rmmod, kernel crashes
with null pointer dereferrence error. During rmmod, ksz_switch_remove
function tries to cancel the mib_read_workqueue using
cancel_delayed_work_sync routine and unregister switch from dsa.

During dsa_unregister_switch it calls ksz_mac_link_down, which in turn
reschedules the workqueue since mib_interval is non-zero.
Due to which queue executed after mib_interval and it tries to access
dp->slave. But the slave is unregistered in the ksz_switch_remove
function. Hence kernel crashes.

To avoid this crash, before canceling the workqueue, resetted the
mib_interval to 0.

v1 -> v2:
-Removed the if condition in ksz_mib_read_work

Fixes: 469b390e1ba3 ("net: dsa: microchip: use delayed_work instead of timer + work")
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12 11:35:53 +01:00
Maarten Zanders
4a3e0aeddf net: dsa: mv88e6xxx: don't use PHY_DETECT on internal PHY's
mv88e6xxx_port_ppu_updates() interpretes data in the PORT_STS
register incorrectly for internal ports (ie no PPU). In these
cases, the PHY_DETECT bit indicates link status. This results
in forcing the MAC state whenever the PHY link goes down which
is not intended. As a side effect, LED's configured to show
link status stay lit even though the physical link is down.

Add a check in mac_link_down and mac_link_up to see if it
concerns an external port and only then, look at PPU status.

Fixes: 5d5b231da7ac (net: dsa: mv88e6xxx: use PHY_DETECT in mac_link_up/mac_link_down)
Reported-by: Maarten Zanders <m.zanders@televic.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Maarten Zanders <maarten.zanders@mind.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12 11:32:14 +01:00
Vladimir Oltean
5bded8259e net: dsa: mv88e6xxx: isolate the ATU databases of standalone and bridged ports
Similar to commit 6087175b7991 ("net: dsa: mt7530: use independent VLAN
learning on VLAN-unaware bridges"), software forwarding between an
unoffloaded LAG port (a bonding interface with an unsupported policy)
and a mv88e6xxx user port directly under a bridge is broken.

We adopt the same strategy, which is to make the standalone ports not
find any ATU entry learned on a bridge port.

Theory: the mv88e6xxx ATU is looked up by FID and MAC address. There are
as many FIDs as VIDs (4096). The FID is derived from the VID when
possible (the VTU maps a VID to a FID), with a fallback to the port
based default FID value when not (802.1Q Mode is disabled on the port,
or the classified VID isn't present in the VTU).

The mv88e6xxx driver makes the following use of FIDs and VIDs:

- the port's DefaultVID (to which untagged & pvid-tagged packets get
  classified) is 0 and is absent from the VTU, so this kind of packets is
  processed in FID 0, the default FID assigned by mv88e6xxx_setup_port.

- every time a bridge VLAN is created, mv88e6xxx_port_vlan_join() ->
  mv88e6xxx_atu_new() associates a FID with that VID which increases
  linearly starting from 1. Like this:

  bridge vlan add dev lan0 vid 100 # FID 1
  bridge vlan add dev lan1 vid 100 # still FID 1
  bridge vlan add dev lan2 vid 1024 # FID 2

The FID allocation made by the driver is sub-optimal for the following
reasons:

(a) A standalone port has a DefaultPVID of 0 and a default FID of 0 too.
    A VLAN-unaware bridged port has a DefaultPVID of 0 and a default FID
    of 0 too. The difference is that the bridged ports may learn ATU
    entries, while the standalone port has the requirement that it must
    not, and must not find them either. Standalone ports must not use
    the same FID as ports belonging to a bridge. All standalone ports
    can use the same FID, since the ATU will never have an entry in
    that FID.

(b) Multiple VLAN-unaware bridges will all use a DefaultPVID of 0 and a
    default FID of 0 on all their ports. The FDBs will not be isolated
    between these bridges. Every VLAN-unaware bridge must use the same
    FID on all its ports, different from the FID of other bridge ports.

(c) Each bridge VLAN uses a unique FID which is useful for Independent
    VLAN Learning, but the same VLAN ID on multiple VLAN-aware bridges
    will result in the same FID being used by mv88e6xxx_atu_new().
    The correct behavior is for VLAN 1 in br0 to have a different FID
    compared to VLAN 1 in br1.

This patch cannot fix all the above. Traditionally the DSA framework did
not care about this, and the reality is that DSA core involvement is
needed for the aforementioned issues to be solved. The only thing we can
solve here is an issue which does not require API changes, and that is
issue (a), aka use a different FID for standalone ports vs ports under
VLAN-unaware bridges.

The first step is deciding what VID and FID to use for standalone ports,
and what VID and FID for bridged ports. The 0/0 pair for standalone
ports is what they used up till now, let's keep using that. For bridged
ports, there are 2 cases:

- VLAN-aware ports will never end up using the port default FID, because
  packets will always be classified to a VID in the VTU or dropped
  otherwise. The FID is the one associated with the VID in the VTU.

- On VLAN-unaware ports, we _could_ leave their DefaultVID (pvid) at
  zero (just as in the case of standalone ports), and just change the
  port's default FID from 0 to a different number (say 1).

However, Tobias points out that there is one more requirement to cater to:
cross-chip bridging. The Marvell DSA header does not carry the FID in
it, only the VID. So once a packet crosses a DSA link, if it has a VID
of zero it will get classified to the default FID of that cascade port.
Relying on a port default FID for upstream cascade ports results in
contradictions: a default FID of 0 breaks ATU isolation of bridged ports
on the downstream switch, a default FID of 1 breaks standalone ports on
the downstream switch.

So not only must standalone ports have different FIDs compared to
bridged ports, they must also have different DefaultVID values.
IEEE 802.1Q defines two reserved VID values: 0 and 4095. So we simply
choose 4095 as the DefaultVID of ports belonging to VLAN-unaware
bridges, and VID 4095 maps to FID 1.

For the xmit operation to look up the same ATU database, we need to put
VID 4095 in DSA tags sent to ports belonging to VLAN-unaware bridges
too. All shared ports are configured to map this VID to the bridging
FID, because they are members of that VLAN in the VTU. Shared ports
don't need to have 802.1QMode enabled in any way, they always parse the
VID from the DSA header, they don't need to look at the 802.1Q header.

We install VID 4095 to the VTU in mv88e6xxx_setup_port(), with the
mention that mv88e6xxx_vtu_setup() which was located right below that
call was flushing the VTU so those entries wouldn't be preserved.
So we need to relocate the VTU flushing prior to the port initialization
during ->setup(). Also note that this is why it is safe to assume that
VID 4095 will get associated with FID 1: the user ports haven't been
created, so there is no avenue for the user to create a bridge VLAN
which could otherwise race with the creation of another FID which would
otherwise use up the non-reserved FID value of 1.

[ Currently mv88e6xxx_port_vlan_join() doesn't have the option of
  specifying a preferred FID, it always calls mv88e6xxx_atu_new(). ]

mv88e6xxx_port_db_load_purge() is the function to access the ATU for
FDB/MDB entries, and it used to determine the FID to use for
VLAN-unaware FDB entries (VID=0) using mv88e6xxx_port_get_fid().
But the driver only called mv88e6xxx_port_set_fid() once, during probe,
so no surprises, the port FID was always 0, the call to get_fid() was
redundant. As much as I would have wanted to not touch that code, the
logic is broken when we add a new FID which is not the port-based
default. Now the port-based default FID only corresponds to standalone
ports, and FDB/MDB entries belong to the bridging service. So while in
the future, when the DSA API will support FDB isolation, we will have to
figure out the FID based on the bridge number, for now there's a single
bridging FID, so hardcode that.

Lastly, the tagger needs to check, when it is transmitting a VLAN
untagged skb, whether it is sending it towards a bridged or a standalone
port. When we see it is bridged we assume the bridge is VLAN-unaware.
Not because it cannot be VLAN-aware but:

- if we are transmitting from a VLAN-aware bridge we are likely doing so
  using TX forwarding offload. That code path guarantees that skbs have
  a vlan hwaccel tag in them, so we would not enter the "else" branch
  of the "if (skb->protocol == htons(ETH_P_8021Q))" condition.

- if we are transmitting on behalf of a VLAN-aware bridge but with no TX
  forwarding offload (no PVT support, out of space in the PVT, whatever),
  we would indeed be transmitting with VLAN 4095 instead of the bridge
  device's pvid. However we would be injecting a "From CPU" frame, and
  the switch won't learn from that - it only learns from "Forward" frames.
  So it is inconsequential for address learning. And VLAN 4095 is
  absolutely enough for the frame to exit the switch, since we never
  remove that VLAN from any port.

Fixes: 57e661aae6a8 ("net: dsa: mv88e6xxx: Link aggregation support")
Reported-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-08 15:47:46 -07:00
Vladimir Oltean
8b6836d824 net: dsa: mv88e6xxx: keep the pvid at 0 when VLAN-unaware
The VLAN support in mv88e6xxx has a loaded history. Commit 2ea7a679ca2a
("net: dsa: Don't add vlans when vlan filtering is disabled") noticed
some issues with VLAN and decided the best way to deal with them was to
make the DSA core ignore VLANs added by the bridge while VLAN awareness
is turned off. Those issues were never explained, just presented as
"at least one corner case".

That approach had problems of its own, presented by
commit 54a0ed0df496 ("net: dsa: provide an option for drivers to always
receive bridge VLANs") for the DSA core, followed by
commit 1fb74191988f ("net: dsa: mv88e6xxx: fix vlan setup") which
applied ds->configure_vlan_while_not_filtering = true for mv88e6xxx in
particular.

We still don't know what corner case Andrew saw when he wrote
commit 2ea7a679ca2a ("net: dsa: Don't add vlans when vlan filtering is
disabled"), but Tobias now reports that when we use TX forwarding
offload, pinging an external station from the bridge device is broken if
the front-facing DSA user port has flooding turned off. The full
description is in the link below, but for short, when a mv88e6xxx port
is under a VLAN-unaware bridge, it inherits that bridge's pvid.
So packets ingressing a user port will be classified to e.g. VID 1
(assuming that value for the bridge_default_pvid), whereas when
tag_dsa.c xmits towards a user port, it always sends packets using a VID
of 0 if that port is standalone or under a VLAN-unaware bridge - or at
least it did so prior to commit d82f8ab0d874 ("net: dsa: tag_dsa:
offload the bridge forwarding process").

In any case, when there is a conversation between the CPU and a station
connected to a user port, the station's MAC address is learned in VID 1
but the CPU tries to transmit through VID 0. The packets reach the
intended station, but via flooding and not by virtue of matching the
existing ATU entry.

DSA has established (and enforced in other drivers: sja1105, felix,
mt7530) that a VLAN-unaware port should use a private pvid, and not
inherit the one from the bridge. The bridge's pvid should only be
inherited when that bridge is VLAN-aware, so all state transitions need
to be handled. On the other hand, all bridge VLANs should sit in the VTU
starting with the moment when the bridge offloads them via switchdev,
they are just not used.

This solves the problem that Tobias sees because packets ingressing on
VLAN-unaware user ports now get classified to VID 0, which is also the
VID used by tag_dsa.c on xmit.

Fixes: d82f8ab0d874 ("net: dsa: tag_dsa: offload the bridge forwarding process")
Link: https://patchwork.kernel.org/project/netdevbpf/patch/20211003222312.284175-2-vladimir.oltean@nxp.com/#24491503
Reported-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-08 15:47:46 -07:00
Andrew Lunn
b9c587fed6 dsa: mv88e6xxx: Include tagger overhead when setting MTU for DSA and CPU ports
Same members of the Marvell Ethernet switches impose MTU restrictions
on ports used for connecting to the CPU or another switch for DSA. If
the MTU is set too low, tagged frames will be discarded. Ensure the
worst case tagger overhead is included in setting the MTU for DSA and
CPU ports.

Fixes: 1baf0fac10fb ("net: dsa: mv88e6xxx: Use chip-wide max frame size for MTU")
Reported by: 曹煜 <cao88yu@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27 13:31:10 +01:00
Andrew Lunn
b92ce2f54c dsa: mv88e6xxx: Fix MTU definition
The MTU passed to the DSA driver is the payload size, typically 1500.
However, the switch uses the frame size when applying restrictions.
Adjust the MTU with the size of the Ethernet header and the frame
checksum. The VLAN header also needs to be included when the frame
size it per port, but not when it is global.

Fixes: 1baf0fac10fb ("net: dsa: mv88e6xxx: Use chip-wide max frame size for MTU")
Reported by: 曹煜 <cao88yu@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27 13:31:10 +01:00
Andrew Lunn
fe23036192 dsa: mv88e6xxx: 6161: Use chip wide MAX MTU
The datasheets suggests the 6161 uses a per port setting for jumbo
frames. Testing has however shown this is not correct, it uses the old
style chip wide MTU control. Change the ops in the 6161 structure to
reflect this.

Fixes: 1baf0fac10fb ("net: dsa: mv88e6xxx: Use chip-wide max frame size for MTU")
Reported by: 曹煜 <cao88yu@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27 13:31:10 +01:00
Vladimir Oltean
74b6d7d133 net: dsa: realtek: register the MDIO bus under devres
The Linux device model permits both the ->shutdown and ->remove driver
methods to get called during a shutdown procedure. Example: a DSA switch
which sits on an SPI bus, and the SPI bus driver calls this on its
->shutdown method:

spi_unregister_controller
-> device_for_each_child(&ctlr->dev, NULL, __unregister);
   -> spi_unregister_device(to_spi_device(dev));
      -> device_del(&spi->dev);

So this is a simple pattern which can theoretically appear on any bus,
although the only other buses on which I've been able to find it are
I2C:

i2c_del_adapter
-> device_for_each_child(&adap->dev, NULL, __unregister_client);
   -> i2c_unregister_device(client);
      -> device_unregister(&client->dev);

The implication of this pattern is that devices on these buses can be
unregistered after having been shut down. The drivers for these devices
might choose to return early either from ->remove or ->shutdown if the
other callback has already run once, and they might choose that the
->shutdown method should only perform a subset of the teardown done by
->remove (to avoid unnecessary delays when rebooting).

So in other words, the device driver may choose on ->remove to not
do anything (therefore to not unregister an MDIO bus it has registered
on ->probe), because this ->remove is actually triggered by the
device_shutdown path, and its ->shutdown method has already run and done
the minimally required cleanup.

This used to be fine until the blamed commit, but now, the following
BUG_ON triggers:

void mdiobus_free(struct mii_bus *bus)
{
	/* For compatibility with error handling in drivers. */
	if (bus->state == MDIOBUS_ALLOCATED) {
		kfree(bus);
		return;
	}

	BUG_ON(bus->state != MDIOBUS_UNREGISTERED);
	bus->state = MDIOBUS_RELEASED;

	put_device(&bus->dev);
}

In other words, there is an attempt to free an MDIO bus which was not
unregistered. The attempt to free it comes from the devres release
callbacks of the SPI device, which are executed after the device is
unregistered.

I'm not saying that the fact that MDIO buses allocated using devres
would automatically get unregistered wasn't strange. I'm just saying
that the commit didn't care about auditing existing call paths in the
kernel, and now, the following code sequences are potentially buggy:

(a) devm_mdiobus_alloc followed by plain mdiobus_register, for a device
    located on a bus that unregisters its children on shutdown. After
    the blamed patch, either both the alloc and the register should use
    devres, or none should.

(b) devm_mdiobus_alloc followed by plain mdiobus_register, and then no
    mdiobus_unregister at all in the remove path. After the blamed
    patch, nobody unregisters the MDIO bus anymore, so this is even more
    buggy than the previous case which needs a specific bus
    configuration to be seen, this one is an unconditional bug.

In this case, the Realtek drivers fall under category (b). To solve it,
we can register the MDIO bus under devres too, which restores the
previous behavior.

Fixes: ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()")
Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Reported-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-21 13:52:16 +01:00
Vladimir Oltean
fd292c189a net: dsa: tear down devlink port regions when tearing down the devlink port on error
Commit 86f8b1c01a0a ("net: dsa: Do not make user port errors fatal")
decided it was fine to ignore errors on certain ports that fail to
probe, and go on with the ports that do probe fine.

Commit fb6ec87f7229 ("net: dsa: Fix type was not set for devlink port")
noticed that devlink_port_type_eth_set(dlp, dp->slave); does not get
called, and devlink notices after a timeout of 3600 seconds and prints a
WARN_ON. So it went ahead to unregister the devlink port. And because
there exists an UNUSED port flavour, we actually re-register the devlink
port as UNUSED.

Commit 08156ba430b4 ("net: dsa: Add devlink port regions support to
DSA") added devlink port regions, which are set up by the driver and not
by DSA.

When we trigger the devlink port deregistration and reregistration as
unused, devlink now prints another WARN_ON, from here:

devlink_port_unregister:
	WARN_ON(!list_empty(&devlink_port->region_list));

So the port still has regions, which makes sense, because they were set
up by the driver, and the driver doesn't know we're unregistering the
devlink port.

Somebody needs to tear them down, and optionally (actually it would be
nice, to be consistent) set them up again for the new devlink port.

But DSA's layering stays in our way quite badly here.

The options I've considered are:

1. Introduce a function in devlink to just change a port's type and
   flavour. No dice, devlink keeps a lot of state, it really wants the
   port to not be registered when you set its parameters, so changing
   anything can only be done by destroying what we currently have and
   recreating it.

2. Make DSA cache the parameters passed to dsa_devlink_port_region_create,
   and the region returned, keep those in a list, then when the devlink
   port unregister needs to take place, the existing devlink regions are
   destroyed by DSA, and we replay the creation of new regions using the
   cached parameters. Problem: mv88e6xxx keeps the region pointers in
   chip->ports[port].region, and these will remain stale after DSA frees
   them. There are many things DSA can do, but updating mv88e6xxx's
   private pointers is not one of them.

3. Just let the driver do it (i.e. introduce a very specific method
   called ds->ops->port_reinit_as_unused, which unregisters its devlink
   port devlink regions, then the old devlink port, then registers the
   new one, then the devlink port regions for it). While it does work,
   as opposed to the others, it's pretty horrible from an API
   perspective and we can do better.

4. Introduce a new pair of methods, ->port_setup and ->port_teardown,
   which in the case of mv88e6xxx must register and unregister the
   devlink port regions. Call these 2 methods when the port must be
   reinitialized as unused.

Naturally, I went for the 4th approach.

Fixes: 08156ba430b4 ("net: dsa: Add devlink port regions support to DSA")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-19 13:05:44 +01:00
Vladimir Oltean
a68e9da485 net: dsa: xrs700x: be compatible with masters which unregister on shutdown
Since commit 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA
master to get rid of lockdep warnings"), DSA gained a requirement which
it did not fulfill, which is to unlink itself from the DSA master at
shutdown time.

Since the Arrow SpeedChips XRS700x driver was introduced after the bad
commit, it has never worked with DSA masters which decide to unregister
their net_device on shutdown, effectively hanging the reboot process.
To fix that, we need to call dsa_switch_shutdown.

These devices can be connected by I2C or by MDIO, and if I search for
I2C or MDIO bus drivers that implement their ->shutdown by redirecting
it to ->remove I don't see any, however this does not mean it would not
be possible. To be compatible with that pattern, it is necessary to
implement an "if this then not that" scheme, to avoid ->remove and
->shutdown from being called both for the same struct device.

Fixes: ee00b24f32eb ("net: dsa: add Arrow SpeedChips XRS700x driver")
Link: https://lore.kernel.org/netdev/20210909095324.12978-1-LinoSanfilippo@gmx.de/
Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-19 12:08:37 +01:00
Vladimir Oltean
fe4053078c net: dsa: microchip: ksz8863: be compatible with masters which unregister on shutdown
Since commit 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA
master to get rid of lockdep warnings"), DSA gained a requirement which
it did not fulfill, which is to unlink itself from the DSA master at
shutdown time.

Since the Microchip sub-driver for KSZ8863 was introduced after the bad
commit, it has never worked with DSA masters which decide to unregister
their net_device on shutdown, effectively hanging the reboot process.
To fix that, we need to call dsa_switch_shutdown.

Since this driver expects the MDIO bus to be backed by mdio_bitbang, I
don't think there is currently any MDIO bus driver which implements its
->shutdown by redirecting it to ->remove, but in any case, to be
compatible with that pattern, it is necessary to implement an "if this
then not that" scheme, to avoid ->remove and ->shutdown from being
called both for the same struct device.

Fixes: 60a364760002 ("net: dsa: microchip: Add Microchip KSZ8863 SMI based driver support")
Link: https://lore.kernel.org/netdev/20210909095324.12978-1-LinoSanfilippo@gmx.de/
Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-19 12:08:37 +01:00
Vladimir Oltean
46baae56e1 net: dsa: hellcreek: be compatible with masters which unregister on shutdown
Since commit 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA
master to get rid of lockdep warnings"), DSA gained a requirement which
it did not fulfill, which is to unlink itself from the DSA master at
shutdown time.

Since the hellcreek driver was introduced after the bad commit, it has
never worked with DSA masters which decide to unregister their
net_device on shutdown, effectively hanging the reboot process.

Hellcreek is a platform device driver, so we probably cannot have the
oddities of ->shutdown and ->remove getting both called for the exact
same struct device. But to be in line with the pattern from the other
device drivers which are on slow buses, implement the same "if this then
not that" pattern of either running the ->shutdown or the ->remove hook.
The driver's current ->remove implementation makes that very easy
because it already zeroes out its device_drvdata on ->remove.

Fixes: e4b27ebc780f ("net: dsa: Add DSA driver for Hirschmann Hellcreek switches")
Link: https://lore.kernel.org/netdev/20210909095324.12978-1-LinoSanfilippo@gmx.de/
Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-19 12:08:37 +01:00
Vladimir Oltean
0650bf52b3 net: dsa: be compatible with masters which unregister on shutdown
Lino reports that on his system with bcmgenet as DSA master and KSZ9897
as a switch, rebooting or shutting down never works properly.

What does the bcmgenet driver have special to trigger this, that other
DSA masters do not? It has an implementation of ->shutdown which simply
calls its ->remove implementation. Otherwise said, it unregisters its
network interface on shutdown.

This message can be seen in a loop, and it hangs the reboot process there:

unregister_netdevice: waiting for eth0 to become free. Usage count = 3

So why 3?

A usage count of 1 is normal for a registered network interface, and any
virtual interface which links itself as an upper of that will increment
it via dev_hold. In the case of DSA, this is the call path:

dsa_slave_create
-> netdev_upper_dev_link
   -> __netdev_upper_dev_link
      -> __netdev_adjacent_dev_insert
         -> dev_hold

So a DSA switch with 3 interfaces will result in a usage count elevated
by two, and netdev_wait_allrefs will wait until they have gone away.

Other stacked interfaces, like VLAN, watch NETDEV_UNREGISTER events and
delete themselves, but DSA cannot just vanish and go poof, at most it
can unbind itself from the switch devices, but that must happen strictly
earlier compared to when the DSA master unregisters its net_device, so
reacting on the NETDEV_UNREGISTER event is way too late.

It seems that it is a pretty established pattern to have a driver's
->shutdown hook redirect to its ->remove hook, so the same code is
executed regardless of whether the driver is unbound from the device, or
the system is just shutting down. As Florian puts it, it is quite a big
hammer for bcmgenet to unregister its net_device during shutdown, but
having a common code path with the driver unbind helps ensure it is well
tested.

So DSA, for better or for worse, has to live with that and engage in an
arms race of implementing the ->shutdown hook too, from all individual
drivers, and do something sane when paired with masters that unregister
their net_device there. The only sane thing to do, of course, is to
unlink from the master.

However, complications arise really quickly.

The pattern of redirecting ->shutdown to ->remove is not unique to
bcmgenet or even to net_device drivers. In fact, SPI controllers do it
too (see dspi_shutdown -> dspi_remove), and presumably, I2C controllers
and MDIO controllers do it too (this is something I have not researched
too deeply, but even if this is not the case today, it is certainly
plausible to happen in the future, and must be taken into consideration).

Since DSA switches might be SPI devices, I2C devices, MDIO devices, the
insane implication is that for the exact same DSA switch device, we
might have both ->shutdown and ->remove getting called.

So we need to do something with that insane environment. The pattern
I've come up with is "if this, then not that", so if either ->shutdown
or ->remove gets called, we set the device's drvdata to NULL, and in the
other hook, we check whether the drvdata is NULL and just do nothing.
This is probably not necessary for platform devices, just for devices on
buses, but I would really insist for consistency among drivers, because
when code is copy-pasted, it is not always copy-pasted from the best
sources.

So depending on whether the DSA switch's ->remove or ->shutdown will get
called first, we cannot really guarantee even for the same driver if
rebooting will result in the same code path on all platforms. But
nonetheless, we need to do something minimally reasonable on ->shutdown
too to fix the bug. Of course, the ->remove will do more (a full
teardown of the tree, with all data structures freed, and this is why
the bug was not caught for so long). The new ->shutdown method is kept
separate from dsa_unregister_switch not because we couldn't have
unregistered the switch, but simply in the interest of doing something
quick and to the point.

The big question is: does the DSA switch's ->shutdown get called earlier
than the DSA master's ->shutdown? If not, there is still a risk that we
might still trigger the WARN_ON in unregister_netdevice that says we are
attempting to unregister a net_device which has uppers. That's no good.
Although the reference to the master net_device won't physically go away
even if DSA's ->shutdown comes afterwards, remember we have a dev_hold
on it.

The answer to that question lies in this comment above device_link_add:

 * A side effect of the link creation is re-ordering of dpm_list and the
 * devices_kset list by moving the consumer device and all devices depending
 * on it to the ends of these lists (that does not happen to devices that have
 * not been registered when this function is called).

so the fact that DSA uses device_link_add towards its master is not
exactly for nothing. device_shutdown() walks devices_kset from the back,
so this is our guarantee that DSA's shutdown happens before the master's
shutdown.

Fixes: 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings")
Link: https://lore.kernel.org/netdev/20210909095324.12978-1-LinoSanfilippo@gmx.de/
Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-19 12:08:37 +01:00
Florian Fainelli
02319bf15a net: dsa: bcm_sf2: Fix array overrun in bcm_sf2_num_active_ports()
After d12e1c464988 ("net: dsa: b53: Set correct number of ports in the
DSA struct") we stopped setting dsa_switch::num_ports to DSA_MAX_PORTS,
which created an off by one error between the statically allocated
bcm_sf2_priv::port_sts array (of size DSA_MAX_PORTS). When
dsa_is_cpu_port() is used, we end-up accessing an out of bounds member
and causing a NPD.

Fix this by iterating with the appropriate port count using
ds->num_ports.

Fixes: d12e1c464988 ("net: dsa: b53: Set correct number of ports in the DSA struct")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-17 14:21:46 +01:00
Vladimir Oltean
3c9cfb5269 net: update NXP copyright text
NXP Legal insists that the following are not fine:

- Saying "NXP Semiconductors" instead of "NXP", since the company's
  registered name is "NXP"

- Putting a "(c)" sign in the copyright string

- Putting a comma in the copyright string

The only accepted copyright string format is "Copyright <year-range> NXP".

This patch changes the copyright headers in the networking files that
were sent by me, or derived from code sent by me.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-17 13:52:17 +01:00
Aleksander Jan Bajkowski
111b64e35e net: dsa: lantiq_gswip: Add 200ms assert delay
The delay is especially needed by the xRX300 and xRX330 SoCs. Without
this patch, some phys are sometimes not properly detected.

The patch was tested on BT Home Hub 5A and D-Link DWR-966.

Fixes: a09d042b0862 ("net: dsa: lantiq: allow to use all GPHYs on xRX300 and xRX330")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-13 13:01:44 +01:00
Ansuel Smith
ce062a0adb net: dsa: qca8k: fix kernel panic with legacy mdio mapping
When the mdio legacy mapping is used the mii_bus priv registered by DSA
refer to the dsa switch struct instead of the qca8k_priv struct and
causes a kernel panic. Create dedicated function when the internal
dedicated mdio driver is used to properly handle the 2 different
implementation.

Fixes: 759bafb8a322 ("net: dsa: qca8k: add support for internal phy and internal mdio")
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-12 10:47:31 +01:00
Rafał Miłecki
63f8428b40 net: dsa: b53: Fix IMP port setup on BCM5301x
Broadcom's b53 switches have one IMP (Inband Management Port) that needs
to be programmed using its own designed register. IMP port may be
different than CPU port - especially on devices with multiple CPU ports.

For that reason it's required to explicitly note IMP port index and
check for it when choosing a register to use.

This commit fixes BCM5301x support. Those switches use CPU port 5 while
their IMP port is 8. Before this patch b53 was trying to program port 5
with B53_PORT_OVERRIDE_CTRL instead of B53_GMII_PORT_OVERRIDE_CTRL(5).

It may be possible to also replace "cpu_port" usages with
dsa_is_cpu_port() but that is out of the scope of thix BCM5301x fix.

Fixes: 967dd82ffc52 ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-05 19:02:38 +01:00
Rafał Miłecki
d12e1c4649 net: dsa: b53: Set correct number of ports in the DSA struct
Setting DSA_MAX_PORTS caused DSA to call b53 callbacks (e.g.
b53_disable_port() during dsa_register_switch()) for invalid
(non-existent) ports. That made b53 modify unrelated registers and is
one of reasons for a broken BCM5301x support.

This problem exists for years but DSA_MAX_PORTS usage has changed few
times. It seems the most accurate to reference commit dropping
dsa_switch_alloc() in the Fixes tag.

Fixes: 7e99e3470172 ("net: dsa: remove dsa_switch_alloc helper")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-02 13:32:24 +01:00
Rafał Miłecki
cdb067d31c net: dsa: b53: Fix calculating number of switch ports
It isn't true that CPU port is always the last one. Switches BCM5301x
have 9 ports (port 6 being inactive) and they use port 5 as CPU by
default (depending on design some other may be CPU ports too).

A more reliable way of determining number of ports is to check for the
last set bit in the "enabled_ports" bitfield.

This fixes b53 internal state, it will allow providing accurate info to
the DSA and is required to fix BCM5301x support.

Fixes: 967dd82ffc52 ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-02 13:32:24 +01:00
Jan Hoffmann
552799f8b3 net: dsa: lantiq_gswip: fix maximum frame length
Currently, outgoing packets larger than 1496 bytes are dropped when
tagged VLAN is used on a switch port.

Add the frame check sequence length to the value of the register
GSWIP_MAC_FLEN to fix this. This matches the lantiq_ppa vendor driver,
which uses a value consisting of 1518 bytes for the MAC frame, plus the
lengths of special tag and VLAN tags.

Fixes: 14fceff4771e ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Cc: stable@vger.kernel.org
Signed-off-by: Jan Hoffmann <jan@3e8.eu>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-02 11:34:33 +01:00
Jakub Kicinski
97c78d0af5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/wwan/mhi_wwan_mbim.c - drop the extra arg.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26 17:57:57 -07:00
Kurt Kanzenbach
b7658ed35a net: dsa: hellcreek: Adjust schedule look ahead window
Traffic schedules can only be started up to eight seconds within the
future. Therefore, the driver periodically checks every two seconds whether the
admin base time provided by the user is inside that window. If so the schedule
is started. Otherwise the check is deferred.

However, according to the programming manual the look ahead window size should
be four - not eight - seconds. By using the proposed value of four seconds
starting a schedule at a specified admin base time actually works as expected.

Fixes: 24dfc6eb39b2 ("net: dsa: hellcreek: Add TAPRIO offloading support")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26 10:26:06 +01:00
Kurt Kanzenbach
a7db5ed863 net: dsa: hellcreek: Fix incorrect setting of GCL
Currently the gate control list which is programmed into the hardware is
incorrect resulting in wrong traffic schedules. The problem is the loop
variables are incremented before they are referenced. Therefore, move the
increment to the end of the loop.

Fixes: 24dfc6eb39b2 ("net: dsa: hellcreek: Add TAPRIO offloading support")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-26 10:26:05 +01:00
Vladimir Oltean
8ded916092 net: dsa: tag_sja1105: stop asking the sja1105 driver in sja1105_xmit_tpid
Introduced in commit 38b5beeae7a4 ("net: dsa: sja1105: prepare tagger
for handling DSA tags and VLAN simultaneously"), the sja1105_xmit_tpid
function solved quite a different problem than our needs are now.

Then, we used best-effort VLAN filtering and we were using the xmit_tpid
to tunnel packets coming from an 8021q upper through the TX VLAN allocated
by tag_8021q to that egress port. The need for a different VLAN protocol
depending on switch revision came from the fact that this in itself was
more of a hack to trick the hardware into accepting tunneled VLANs in
the first place.

Right now, we deny 8021q uppers (see sja1105_prechangeupper). Even if we
supported them again, we would not do that using the same method of
{tunneling the VLAN on egress, retagging the VLAN on ingress} that we
had in the best-effort VLAN filtering mode. It seems rather simpler that
we just allocate a VLAN in the VLAN table that is simply not used by the
bridge at all, or by any other port.

Anyway, I have 2 gripes with the current sja1105_xmit_tpid:

1. When sending packets on behalf of a VLAN-aware bridge (with the new
   TX forwarding offload framework) plus untagged (with the tag_8021q
   VLAN added by the tagger) packets, we can see that on SJA1105P/Q/R/S
   and later (which have a qinq_tpid of ETH_P_8021AD), some packets sent
   through the DSA master have a VLAN protocol of 0x8100 and others of
   0x88a8. This is strange and there is no reason for it now. If we have
   a bridge and are therefore forced to send using that bridge's TPID,
   we can as well blend with that bridge's VLAN protocol for all packets.

2. The sja1105_xmit_tpid introduces a dependency on the sja1105 driver,
   because it looks inside dp->priv. It is desirable to keep as much
   separation between taggers and switch drivers as possible. Now it
   doesn't do that anymore.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25 11:14:34 +01:00
Vladimir Oltean
b0b8c67eaa net: dsa: sja1105: drop untagged packets on the CPU and DSA ports
The sja1105 driver is a bit special in its use of VLAN headers as DSA
tags. This is because in VLAN-aware mode, the VLAN headers use an actual
TPID of 0x8100, which is understood even by the DSA master as an actual
VLAN header.

Furthermore, control packets such as PTP and STP are transmitted with no
VLAN header as a DSA tag, because, depending on switch generation, there
are ways to steer these control packets towards a precise egress port
other than VLAN tags. Transmitting control packets as untagged means
leaving a door open for traffic in general to be transmitted as untagged
from the DSA master, and for it to traverse the switch and exit a random
switch port according to the FDB lookup.

This behavior is a bit out of line with other DSA drivers which have
native support for DSA tagging. There, it is to be expected that the
switch only accepts DSA-tagged packets on its CPU port, dropping
everything that does not match this pattern.

We perhaps rely a bit too much on the switches' hardware dropping on the
CPU port, and place no other restrictions in the kernel data path to
avoid that. For example, sja1105 is also a bit special in that STP/PTP
packets are transmitted using "management routes"
(sja1105_port_deferred_xmit): when sending a link-local packet from the
CPU, we must first write a SPI message to the switch to tell it to
expect a packet towards multicast MAC DA 01-80-c2-00-00-0e, and to route
it towards port 3 when it gets it. This entry expires as soon as it
matches a packet received by the switch, and it needs to be reinstalled
for the next packet etc. All in all quite a ghetto mechanism, but it is
all that the sja1105 switches offer for injecting a control packet.
The driver takes a mutex for serializing control packets and making the
pairs of SPI writes of a management route and its associated skb atomic,
but to be honest, a mutex is only relevant as long as all parties agree
to take it. With the DSA design, it is possible to open an AF_PACKET
socket on the DSA master net device, and blast packets towards
01-80-c2-00-00-0e, and whatever locking the DSA switch driver might use,
it all goes kaput because management routes installed by the driver will
match skbs sent by the DSA master, and not skbs generated by the driver
itself. So they will end up being routed on the wrong port.

So through the lens of that, maybe it would make sense to avoid that
from happening by doing something in the network stack, like: introduce
a new bit in struct sk_buff, like xmit_from_dsa. Then, somewhere around
dev_hard_start_xmit(), introduce the following check:

	if (netdev_uses_dsa(dev) && !skb->xmit_from_dsa)
		kfree_skb(skb);

Ok, maybe that is a bit drastic, but that would at least prevent a bunch
of problems. For example, right now, even though the majority of DSA
switches drop packets without DSA tags sent by the DSA master (and
therefore the majority of garbage that user space daemons like avahi and
udhcpcd and friends create), it is still conceivable that an aggressive
user space program can open an AF_PACKET socket and inject a spoofed DSA
tag directly on the DSA master. We have no protection against that; the
packet will be understood by the switch and be routed wherever user
space says. Furthermore: there are some DSA switches where we even have
register access over Ethernet, using DSA tags. So even user space
drivers are possible in this way. This is a huge hole.

However, the biggest thing that bothers me is that udhcpcd attempts to
ask for an IP address on all interfaces by default, and with sja1105, it
will attempt to get a valid IP address on both the DSA master as well as
on sja1105 switch ports themselves. So with IP addresses in the same
subnet on multiple interfaces, the routing table will be messed up and
the system will be unusable for traffic until it is configured manually
to not ask for an IP address on the DSA master itself.

It turns out that it is possible to avoid that in the sja1105 driver, at
least very superficially, by requesting the switch to drop VLAN-untagged
packets on the CPU port. With the exception of control packets, all
traffic originated from tag_sja1105.c is already VLAN-tagged, so only
STP and PTP packets need to be converted. For that, we need to uphold
the equivalence between an untagged and a pvid-tagged packet, and to
remember that the CPU port of sja1105 uses a pvid of 4095.

Now that we drop untagged traffic on the CPU port, non-aggressive user
space applications like udhcpcd stop bothering us, and sja1105 effectively
becomes just as vulnerable to the aggressive kind of user space programs
as other DSA switches are (ok, users can also create 8021q uppers on top
of the DSA master in the case of sja1105, but in future patches we can
easily deny that, but it still doesn't change the fact that VLAN-tagged
packets can still be injected over raw sockets).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25 11:14:33 +01:00
Vladimir Oltean
73ceab8326 net: dsa: sja1105: prevent tag_8021q VLANs from being received on user ports
Currently it is possible for an attacker to craft packets with a fake
DSA tag and send them to us, and our user ports will accept them and
preserve that VLAN when transmitting towards the CPU. Then the tagger
will be misled into thinking that the packets came on a different port
than they really came on.

Up until recently there wasn't a good option to prevent this from
happening. In SJA1105P and later, the MAC Configuration Table introduced
two options called:
- DRPSITAG: Drop Single Inner Tagged Frames
- DRPSOTAG: Drop Single Outer Tagged Frames

Because the sja1105 driver classifies all VLANs as "outer VLANs" (S-Tags),
it would be in principle possible to enable the DRPSOTAG bit on ports
using tag_8021q, and drop on ingress all packets which have a VLAN tag.
When the switch is VLAN-unaware, this works, because it uses a custom
TPID of 0xdadb, so any "tagged" packets received on a user port are
probably a spoofing attempt. But when the switch overall is VLAN-aware,
and some ports are standalone (therefore they use tag_8021q), the TPID
is 0x8100, and the port can receive a mix of untagged and VLAN-tagged
packets. The untagged ones will be classified to the tag_8021q pvid, and
the tagged ones to the VLAN ID from the packet header. Yes, it is true
that since commit 4fbc08bd3665 ("net: dsa: sja1105: deny 8021q uppers on
ports") we no longer support this mixed mode, but that is a temporary
limitation which will eventually be lifted. It would be nice to not
introduce one more restriction via DRPSOTAG, which would make the
standalone ports of a VLAN-aware switch drop genuinely VLAN-tagged
packets.

Also, the DRPSOTAG bit is not available on the first generation of
switches (SJA1105E, SJA1105T). So since one of the key features of this
driver is compatibility across switch generations, this makes it an even
less desirable approach.

The breakthrough comes from commit bef0746cf4cc ("net: dsa: sja1105:
make sure untagged packets are dropped on ingress ports with no pvid"),
where it became obvious that untagged packets are not dropped even if
the ingress port is not in the VMEMB_PORT vector of that port's pvid.
However, VLAN-tagged packets are subject to VLAN ingress
checking/dropping. This means that instead of using the catch-all
DRPSOTAG bit introduced in SJA1105P, we can drop tagged packets on a
per-VLAN basis, and this is already compatible with SJA1105E/T.

This patch adds an "allowed_ingress" argument to sja1105_vlan_add(), and
we call it with "false" for tag_8021q VLANs on user ports. The tag_8021q
VLANs still need to be allowed, of course, on ingress to DSA ports and
CPU ports.

We also need to refine the drop_untagged check in sja1105_commit_pvid to
make it not freak out about this new configuration. Currently it will
try to keep the configuration consistent between untagged and pvid-tagged
packets, so if the pvid of a port is 1 but VLAN 1 is not in VMEMB_PORT,
packets tagged with VID 1 will behave the same as untagged packets, and
be dropped. This behavior is what we want for ports under a VLAN-aware
bridge, but for the ports with a tag_8021q pvid, we want untagged
packets to be accepted, but packets tagged with a header recognized by
the switch as a tag_8021q VLAN to be dropped. So only restrict the
drop_untagged check to apply to the bridge_pvid, not to the tag_8021q_pvid.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25 11:14:33 +01:00
DENG Qingfang
1ca8a193ca net: dsa: mt7530: manually set up VLAN ID 0
The driver was relying on dsa_slave_vlan_rx_add_vid to add VLAN ID 0. After
the blamed commit, VLAN ID 0 won't be set up anymore, breaking software
bridging fallback on VLAN-unaware bridges.

Manually set up VLAN ID 0 to fix this.

Fixes: 06cfb2df7eb0 ("net: dsa: don't advertise 'rx-vlan-filter' when not needed")
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25 11:09:31 +01:00
Nathan Rossi
3b0720ba00 net: dsa: mv88e6xxx: Update mv88e6393x serdes errata
In early erratas this issue only covered port 0 when changing from
[x]MII (rev A 3.6). In subsequent errata versions this errata changed to
cover the additional "Hardware reset in CPU managed mode" condition, and
removed the note specifying that it only applied to port 0.

In designs where the device is configured with CPU managed mode
(CPU_MGD), on reset all SERDES ports (p0, p9, p10) have a stuck power
down bit and require this initial power up procedure. As such apply this
errata to all three SERDES ports of the mv88e6393x.

Signed-off-by: Nathan Rossi <nathan.rossi@digi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-24 10:48:46 +01:00
Vladimir Oltean
58adf9dcb1 net: dsa: let drivers state that they need VLAN filtering while standalone
As explained in commit e358bef7c392 ("net: dsa: Give drivers the chance
to veto certain upper devices"), the hellcreek driver uses some tricks
to comply with the network stack expectations: it enforces port
separation in standalone mode using VLANs. For untagged traffic,
bridging between ports is prevented by using different PVIDs, and for
VLAN-tagged traffic, it never accepts 8021q uppers with the same VID on
two ports, so packets with one VLAN cannot leak from one port to another.

That is almost fine*, and has worked because hellcreek relied on an
implicit behavior of the DSA core that was changed by the previous
patch: the standalone ports declare the 'rx-vlan-filter' feature as 'on
[fixed]'. Since most of the DSA drivers are actually VLAN-unaware in
standalone mode, that feature was actually incorrectly reflecting the
hardware/driver state, so there was a desire to fix it. This leaves the
hellcreek driver in a situation where it has to explicitly request this
behavior from the DSA framework.

We configure the ports as follows:

- Standalone: 'rx-vlan-filter' is on. An 8021q upper on top of a
  standalone hellcreek port will go through dsa_slave_vlan_rx_add_vid
  and will add a VLAN to the hardware tables, giving the driver the
  opportunity to refuse it through .port_prechangeupper.

- Bridged with vlan_filtering=0: 'rx-vlan-filter' is off. An 8021q upper
  on top of a bridged hellcreek port will not go through
  dsa_slave_vlan_rx_add_vid, because there will not be any attempt to
  offload this VLAN. The driver already disables VLAN awareness, so that
  upper should receive the traffic it needs.

- Bridged with vlan_filtering=1: 'rx-vlan-filter' is on. An 8021q upper
  on top of a bridged hellcreek port will call dsa_slave_vlan_rx_add_vid,
  and can again be vetoed through .port_prechangeupper.

*It is not actually completely fine, because if I follow through
correctly, we can have the following situation:

ip link add br0 type bridge vlan_filtering 0
ip link set lan0 master br0 # lan0 now becomes VLAN-unaware
ip link set lan0 nomaster # lan0 fails to become VLAN-aware again, therefore breaking isolation

This patch fixes that corner case by extending the DSA core logic, based
on this requested attribute, to change the VLAN awareness state of the
switch (port) when it leaves the bridge.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-24 09:30:58 +01:00
Vladimir Oltean
3b95d1b293 net: mscc: ocelot: transmit the VLAN filtering restrictions via extack
We need to transmit more restrictions in future patches, convert this
one to netlink extack.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-20 14:39:52 +01:00
Vladimir Oltean
01af940e9b net: mscc: ocelot: transmit the "native VLAN" error via extack
We need to reject some more configurations in future patches, convert
the existing one to netlink extack.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-20 14:39:52 +01:00
Jakub Kicinski
f444fea789 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/ptp/Kconfig:
  55c8fca1dae1 ("ptp_pch: Restore dependency on PCI")
  e5f31552674e ("ethernet: fix PTP_1588_CLOCK dependencies")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-19 18:09:18 -07:00
Vladimir Oltean
994d2cbb08 net: dsa: tag_sja1105: be dsa_loop-safe
Add support for tag_sja1105 running on non-sja1105 DSA ports, by making
sure that every time we dereference dp->priv, we check the switch's
dsa_switch_ops (otherwise we access a struct sja1105_port structure that
is in fact something else).

This adds an unconditional build-time dependency between sja1105 being
built as module => tag_sja1105 must also be built as module. This was
there only for PTP before.

Some sane defaults must also take place when not running on sja1105
hardware. These are:

- sja1105_xmit_tpid: the sja1105 driver uses different VLAN protocols
  depending on VLAN awareness and switch revision (when an encapsulated
  VLAN must be sent). Default to 0x8100.

- sja1105_rcv_meta_state_machine: this aggregates PTP frames with their
  metadata timestamp frames. When running on non-sja1105 hardware, don't
  do that and accept all frames unmodified.

- sja1105_defer_xmit: calls sja1105_port_deferred_xmit in sja1105_main.c
  which writes a management route over SPI. When not running on sja1105
  hardware, bypass the SPI write and send the frame as-is.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18 10:33:15 +01:00
Vladimir Oltean
ed5d2937a6 net: dsa: sja1105: fix use-after-free after calling of_find_compatible_node, or worse
It seems that of_find_compatible_node has a weird calling convention in
which it calls of_node_put() on the "from" node argument, instead of
leaving that up to the caller. This comes from the fact that
of_find_compatible_node with a non-NULL "from" argument it only supposed
to be used as the iterator function of for_each_compatible_node(). OF
iterator functions call of_node_get on the next OF node and of_node_put()
on the previous one.

When of_find_compatible_node calls of_node_put, it actually never
expects the refcount to drop to zero, because the call is done under the
atomic devtree_lock context, and when the refcount drops to zero it
triggers a kobject and a sysfs file deletion, which assume blocking
context.

So any driver call to of_find_compatible_node is probably buggy because
an unexpected of_node_put() takes place.

What should be done is to use the of_get_compatible_child() function.

Fixes: 5a8f09748ee7 ("net: dsa: sja1105: register the MDIO buses for 100base-T1 and 100base-TX")
Link: https://lore.kernel.org/netdev/20210814010139.kzryimmp4rizlznt@skbuf/
Suggested-by: Frank Rowand <frowand.list@gmail.com>
Suggested-by: Rob Herring <robh+dt@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-18 10:21:01 +01:00
Vladimir Oltean
022522aca4 net: dsa: sja1105: reorganize probe, remove, setup and teardown ordering
The sja1105 driver's initialization and teardown sequence is a chaotic
mess that has gathered a lot of cruft over time. It works because there
is no strict dependency between the functions, but it could be improved.

The basic principle that teardown should be the exact reverse of setup
is obviously not held. We have initialization steps (sja1105_tas_setup,
sja1105_flower_setup) in the probe method that are torn down in the DSA
.teardown method instead of driver unbind time.

We also have code after the dsa_register_switch() call, which implicitly
means after the .setup() method has finished, which is pretty unusual.

Also, sja1105_teardown() has calls set up in a different order than the
error path of sja1105_setup(): see the reversed ordering between
sja1105_ptp_clock_unregister and sja1105_mdiobus_unregister.

Also, sja1105_static_config_load() is called towards the end of
sja1105_setup(), but sja1105_static_config_free() is also towards the
end of the error path and teardown path. The static_config_load() call
should be earlier.

Also, making and breaking the connections between struct sja1105_port
and struct dsa_port could be refactored into dedicated functions, makes
the code easier to follow.

We move some code from the DSA .setup() method into the probe method,
like the device tree parsing, and we move some code from the probe
method into the DSA .setup() method to be symmetric with its placement
in the DSA .teardown() method, which is nice because the unbind function
has a single call to dsa_unregister_switch(). Example of the latter type
of code movement are the connections between ports mentioned above, they
are now in the .setup() method.

Finally, due to fact that the kthread_init_worker() call is no longer
in sja1105_probe() - located towards the bottom of the file - but in
sja1105_setup() - located much higher - there is an inverse ordering
with the worker function declaration, sja1105_port_deferred_xmit. To
avoid that, the entire sja1105_setup() and sja1105_teardown() functions
are moved towards the bottom of the file.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16 11:24:53 +01:00
Vladimir Oltean
e6e12df625 net: mscc: ocelot: convert to phylink
The felix DSA driver, which is a wrapper over the same hardware class as
ocelot, is integrated with phylink, but ocelot is using the plain PHY
library. It makes sense to bring together the two implementations, which
is what this patch achieves.

This is a large patch and hard to break up, but it does the following:

The existing ocelot_adjust_link writes some registers, and
felix_phylink_mac_link_up writes some registers, some of them are
common, but both functions write to some registers to which the other
doesn't.

The main reasons for this are:
- Felix switches so far have used an NXP PCS so they had no need to
  write the PCS1G registers that ocelot_adjust_link writes
- Felix switches have the MAC fixed at 1G, so some of the MAC speed
  changes actually break the link and must be avoided.

The naming conventions for the functions introduced in this patch are:
- vsc7514_phylink_{mac_config,validate} are specific to the Ocelot
  instantiations and placed in ocelot_net.c which is built only for the
  ocelot switchdev driver.
- ocelot_phylink_mac_link_{up,down} are shared between the ocelot
  switchdev driver and the felix DSA driver (they are put in the common
  lib).

One by one, the registers written by ocelot_adjust_link are:

DEV_MAC_MODE_CFG - felix_phylink_mac_link_up had no need to write this
                   register since its out-of-reset value was fine and
                   did not need changing. The write is moved to the
                   common ocelot_phylink_mac_link_up and on felix it is
                   guarded by a quirk bit that makes the written value
                   identical with the out-of-reset one
DEV_PORT_MISC - runtime invariant, was moved to vsc7514_phylink_mac_config
PCS1G_MODE_CFG - same as above
PCS1G_SD_CFG - same as above
PCS1G_CFG - same as above
PCS1G_ANEG_CFG - same as above
PCS1G_LB_CFG - same as above
DEV_MAC_ENA_CFG - both ocelot_adjust_link and ocelot_port_disable
                  touched this. felix_phylink_mac_link_{up,down} also
                  do. We go with what felix does and put it in
                  ocelot_phylink_mac_link_up.
DEV_CLOCK_CFG - ocelot_adjust_link and felix_phylink_mac_link_up both
                write this, but to different values. Move to the common
                ocelot_phylink_mac_link_up and make sure via the quirk
                that the old values are preserved for both.
ANA_PFC_PFC_CFG - ocelot_adjust_link wrote this, felix_phylink_mac_link_up
                  did not. Runtime invariant, speed does not matter since
                  PFC is disabled via the RX_PFC_ENA bits which are cleared.
                  Move to vsc7514_phylink_mac_config.
QSYS_SWITCH_PORT_MODE_PORT_ENA - both ocelot_adjust_link and
                                 felix_phylink_mac_link_{up,down} wrote
                                 this. Ocelot also wrote this register
                                 from ocelot_port_disable. Keep what
                                 felix did, move in ocelot_phylink_mac_link_{up,down}
                                 and delete ocelot_port_disable.
ANA_POL_FLOWC - same as above
SYS_MAC_FC_CFG - same as above, except slight behavior change. Whereas
                 ocelot always enabled RX and TX flow control, felix
                 listened to phylink (for the most part, at least - see
                 the 2500base-X comment).

The registers which only felix_phylink_mac_link_up wrote are:

SYS_PAUSE_CFG_PAUSE_ENA - this is why I am not sure that flow control
                          worked on ocelot. Not it should, since the
                          code is shared with felix where it does.
ANA_PORT_PORT_CFG - this is a Frame Analyzer block register, phylink
                    should be the one touching them, deleted.

Other changes:

- The old phylib registration code was in mscc_ocelot_init_ports. It is
  hard to work with 2 levels of indentation already in, and with hard to
  follow teardown logic. The new phylink registration code was moved
  inside ocelot_probe_port(), right between alloc_etherdev() and
  register_netdev(). It could not be done before (=> outside of)
  ocelot_probe_port() because ocelot_probe_port() allocates the struct
  ocelot_port which we then use to assign ocelot_port->phy_mode to. It
  is more preferable to me to have all PHY handling logic inside the
  same function.
- On the same topic: struct ocelot_port_private :: serdes is only used
  in ocelot_port_open to set the SERDES protocol to Ethernet. This is
  logically a runtime invariant and can be done just once, when the port
  registers with phylink. We therefore don't even need to keep the
  serdes reference inside struct ocelot_port_private, or to use the devm
  variant of of_phy_get().
- Phylink needs a valid phy-mode for phylink_create() to succeed, and
  the existing device tree bindings in arch/mips/boot/dts/mscc/ocelot_pcb120.dts
  don't define one for the internal PHY ports. So we patch
  PHY_INTERFACE_MODE_NA into PHY_INTERFACE_MODE_INTERNAL.
- There was a strategically placed:

	switch (priv->phy_mode) {
	case PHY_INTERFACE_MODE_NA:
	        continue;

  which made the code skip the serdes initialization for the internal
  PHY ports. Frankly that is not all that obvious, so now we explicitly
  initialize the serdes under an "if" condition and not rely on code
  jumps, so everything is clearer.
- There was a write of OCELOT_SPEED_1000 to DEV_CLOCK_CFG for QSGMII
  ports. Since that is in fact the default value for the register field
  DEV_CLOCK_CFG_LINK_SPEED, I can only guess the intention was to clear
  the adjacent fields, MAC_TX_RST and MAC_RX_RST, aka take the port out
  of reset, which does match the comment. I don't even want to know why
  this code is placed there, but if there is indeed an issue that all
  ports that share a QSGMII lane must all be up, then this logic is
  already buggy, since mscc_ocelot_init_ports iterates using
  for_each_available_child_of_node, so nobody prevents the user from
  putting a 'status = "disabled";' for some QSGMII ports which would
  break the driver's assumption.
  In any case, in the eventuality that I'm right, we would have yet
  another issue if ocelot_phylink_mac_link_down would reset those ports
  and that would be forbidden, so since the ocelot_adjust_link logic did
  not do that (maybe for a reason), add another quirk to preserve the
  old logic.

The ocelot driver teardown goes through all ports in one fell swoop.
When initialization of one port fails, the ocelot->ports[port] pointer
for that is reset to NULL, and teardown is done only for non-NULL ports,
so there is no reason to do partial teardowns, let the central
mscc_ocelot_release_ports() do its job.

Tested bind, unbind, rebind, link up, link down, speed change on mock-up
hardware (modified the driver to probe on Felix VSC9959). Also
regression tested the felix DSA driver. Could not test the Ocelot
specific bits (PCS1G, SERDES, device tree bindings).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16 11:19:34 +01:00
Vladimir Oltean
46efe4efb9 net: dsa: felix: stop calling ocelot_port_{enable,disable}
ocelot_port_enable touches ANA_PORT_PORT_CFG, which has the following
fields:

- LOCKED_PORTMOVE_CPU, LEARNDROP, LEARNCPU, LEARNAUTO, RECV_ENA, all of
  which are written with their hardware default values, also runtime
  invariants. So it makes no sense to write these during every .ndo_open.

- PORTID_VAL: this field has an out-of-reset value of zero for all ports
  and must be initialized by software. Additionally, the
  ocelot_setup_logical_port_ids() code path sets up different logical
  port IDs for the ports in a hardware LAG, and we absolutely don't want
  .ndo_open to interfere there and reset those values.

So in fact the write from ocelot_port_enable can better be moved to
ocelot_init_port, and the .ndo_open hook deleted.

ocelot_port_disable touches DEV_MAC_ENA_CFG and QSYS_SWITCH_PORT_MODE_PORT_ENA,
in an attempt to undo what ocelot_adjust_link did. But since .ndo_stop
does not get called each time the link falls (i.e. this isn't a
substitute for .phylink_mac_link_down), felix already does better at
this by writing those registers already in felix_phylink_mac_link_down.

So keep ocelot_port_disable (for now, until ocelot is converted to
phylink too), and just delete the felix call to it, which is not
necessary.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16 11:19:34 +01:00
Arnd Bergmann
e5f3155267 ethernet: fix PTP_1588_CLOCK dependencies
The 'imply' keyword does not do what most people think it does, it only
politely asks Kconfig to turn on another symbol, but does not prevent
it from being disabled manually or built as a loadable module when the
user is built-in. In the ICE driver, the latter now causes a link failure:

aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_eth_ioctl':
ice_main.c:(.text+0x13b0): undefined reference to `ice_ptp_get_ts_config'
ice_main.c:(.text+0x13b0): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_get_ts_config'
aarch64-linux-ld: ice_main.c:(.text+0x13bc): undefined reference to `ice_ptp_set_ts_config'
ice_main.c:(.text+0x13bc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_set_ts_config'
aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_prepare_for_reset':
ice_main.c:(.text+0x31fc): undefined reference to `ice_ptp_release'
ice_main.c:(.text+0x31fc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_release'
aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_rebuild':

This is a recurring problem in many drivers, and we have discussed
it several times befores, without reaching a consensus. I'm providing
a link to the previous email thread for reference, which discusses
some related problems.

To solve the dependency issue better than the 'imply' keyword, introduce a
separate Kconfig symbol "CONFIG_PTP_1588_CLOCK_OPTIONAL" that any driver
can depend on if it is able to use PTP support when available, but works
fine without it. Whenever CONFIG_PTP_1588_CLOCK=m, those drivers are
then prevented from being built-in, the same way as with a 'depends on
PTP_1588_CLOCK || !PTP_1588_CLOCK' dependency that does the same trick,
but that can be rather confusing when you first see it.

Since this should cover the dependencies correctly, the IS_REACHABLE()
hack in the header is no longer needed now, and can be turned back
into a normal IS_ENABLED() check. Any driver that gets the dependency
wrong will now cause a link time failure rather than being unable to use
PTP support when that is in a loadable module.

However, the two recently added ptp_get_vclocks_index() and
ptp_convert_timestamp() interfaces are only called from builtin code with
ethtool and socket timestamps, so keep the current behavior by stubbing
those out completely when PTP is in a loadable module. This should be
addressed properly in a follow-up.

As Richard suggested, we may want to actually turn PTP support into a
'bool' option later on, preventing it from being a loadable module
altogether, which would be one way to solve the problem with the ethtool
interface.

Fixes: 06c16d89d2cb ("ice: register 1588 PTP clock device object for E810 devices")
Link: https://lore.kernel.org/netdev/20210804121318.337276-1-arnd@kernel.org/
Link: https://lore.kernel.org/netdev/CAK8P3a06enZOf=XyZ+zcAwBczv41UuCTz+=0FMf2gBz1_cOnZQ@mail.gmail.com/
Link: https://lore.kernel.org/netdev/CAK8P3a3=eOxE-K25754+fB_-i_0BZzf9a9RfPTX3ppSwu9WZXw@mail.gmail.com/
Link: https://lore.kernel.org/netdev/20210726084540.3282344-1-arnd@kernel.org/
Acked-by: Shannon Nelson <snelson@pensando.io>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20210812183509.1362782-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-13 17:49:05 -07:00
Jakub Kicinski
f4083a752a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h
  9e26680733d5 ("bnxt_en: Update firmware call to retrieve TX PTP timestamp")
  9e518f25802c ("bnxt_en: 1PPS functions to configure TSIO pins")
  099fdeda659d ("bnxt_en: Event handler for PPS events")

kernel/bpf/helpers.c
include/linux/bpf-cgroup.h
  a2baf4e8bb0f ("bpf: Fix potentially incorrect results with bpf_get_local_storage()")
  c7603cfa04e7 ("bpf: Add ambient BPF runtime context stored in current")

drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
  5957cc557dc5 ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray")
  2d0b41a37679 ("net/mlx5: Refcount mlx5_irq with integer")

MAINTAINERS
  7b637cd52f02 ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo")
  7d901a1e878a ("net: phy: add Maxlinear GPY115/21x/24x driver")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-13 06:41:22 -07:00
Vladimir Oltean
700fa08da4 net: dsa: sja1105: unregister the MDIO buses during teardown
The call to sja1105_mdiobus_unregister is present in the error path but
absent from the main driver unbind path.

Fixes: 5a8f09748ee7 ("net: dsa: sja1105: register the MDIO buses for 100base-T1 and 100base-TX")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-12 10:53:40 +01:00
DENG Qingfang
7428022b50 net: dsa: mt7530: fix VLAN traffic leaks again
When a port leaves a VLAN-aware bridge, the current code does not clear
other ports' matrix field bit. If the bridge is later set to VLAN-unaware
mode, traffic in the bridge may leak to that port.

Remove the VLAN filtering check in mt7530_port_bridge_leave.

Fixes: 474a2ddaa192 ("net: dsa: mt7530: fix VLAN traffic leaks")
Fixes: 83163f7dca56 ("net: dsa: mediatek: add VLAN support for MT7530")
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-12 10:52:58 +01:00