2019-10-14 21:51:20 +05:30
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright (c) 2018, Sensor-Technik Wiedemann GmbH
2019-05-02 23:23:30 +03:00
 * Copyright (c) 2018-2019, Vladimir Oltean <olteanv@gmail.com>
*/
#ifndef _SJA1105_H
#define _SJA1105_H
net: dsa: sja1105: Add support for the PTP clock
The design of this PHC driver is influenced by the switch's behavior
w.r.t. timestamping. It exposes two PTP counters, one free-running
(PTPTSCLK) and the other offset- and frequency-corrected in hardware
through PTPCLKVAL, PTPCLKADD and PTPCLKRATE. The MACs can sample either
of these for frame timestamps.
However, the user manual warns that taking timestamps based on the
corrected clock is less than useful, as the switch can deliver corrupted
timestamps in a variety of circumstances.
Therefore, this PHC uses the free-running PTPTSCLK together with a
timecounter/cyclecounter structure that translates it into a software
time domain. Thus, the settime/adjtime and adjfine callbacks are
hardware no-ops.
The timestamps (introduced in a further patch) will also be translated
to the correct time domain before being handed over to the userspace PTP
stack.
The introduction of a second set of PHC operations that operate on the
hardware PTPCLKVAL/PTPCLKADD/PTPCLKRATE in the future is somewhat
unavoidable, as the TTEthernet core uses the corrected PTP time domain.
However, the free-running counter + timecounter structure combination
will suffice for now, as the resulting timestamps yield a sub-50 ns
synchronization offset in steady state using linuxptp.
For this patch, in absence of frame timestamping, the operations of the
switch PHC were tested by syncing it to the system time as a local slave
clock with:
phc2sys -s CLOCK_REALTIME -c swp2 -O 0 -m -S 0.01
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
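The free-running counter plus timecounter arrangement described above can be sketched in plain C. This is a minimal userspace model, not the kernel's timecounter/cyclecounter code: the struct, the function names and the 8 ns tick used in the usage note are assumptions made for illustration. It shows why settime/adjtime never touch the hardware: only the software offset changes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a timecounter: it translates a
 * free-running cycle counter into nanoseconds purely in software.
 */
struct sw_timecounter {
	uint64_t cycle_last;	/* last raw counter value that was read */
	uint64_t nsec;		/* software time at cycle_last */
	uint32_t mult;		/* ns per cycle, scaled by 2^shift */
	uint32_t shift;
};

/* Advance the software clock to the given raw counter reading. */
static uint64_t sw_tc_read(struct sw_timecounter *tc, uint64_t cycle_now)
{
	uint64_t delta = cycle_now - tc->cycle_last;

	assert(tc->shift < 64);
	tc->nsec += (delta * tc->mult) >> tc->shift;
	tc->cycle_last = cycle_now;
	return tc->nsec;
}

/* adjtime equivalent: shift the software time, leave the hardware alone. */
static void sw_tc_adjtime(struct sw_timecounter *tc, int64_t delta_ns)
{
	tc->nsec += (uint64_t)delta_ns;
}
```

Assuming a counter that ticks every 8 ns (mult = 8 << 28, shift = 28), reading the counter at cycle 1000 yields 8000 ns, and a subsequent adjtime of -500 ns moves the software clock to 7500 ns without any register write.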
2019-06-08 15:04:34 +03:00
#include <linux/ptp_clock_kernel.h>
#include <linux/timecounter.h>
2019-05-02 23:23:30 +03:00
#include <linux/dsa/sja1105.h>
net: dsa: sja1105: implement cross-chip bridging operations
sja1105 uses dsa_8021q for DSA tagging, a format which is VLAN at heart
and which is compatible with cascading. A complete description of this
tagging format is in net/dsa/tag_8021q.c, but a quick summary is that
each external-facing port tags incoming frames with a unique pvid, and
this special VLAN is transmitted as tagged towards the inside of the
system, and as untagged towards the exterior. The tag encodes the switch
id and the source port index.
This means that cross-chip bridging for dsa_8021q only entails adding
the dsa_8021q pvids of one switch to the RX filter of the other
switches. Everything else falls naturally into place, as long as the
bottom end of ports (the leaves in the tree) consists exclusively of
dsa_8021q-compatible switches (i.e. sja1105). Otherwise, there would be
a chance that a front-panel switch transmits a packet tagged with a
dsa_8021q header which it would not be able to remove, and which
would hence "leak" out.
The only use case I tested (due to lack of board availability) was when
the sja1105 switches are part of disjoint trees (however, this doesn't
change the fact that multiple sja1105 switches still need unique switch
identifiers in such a system). But in principle, even "true" single-tree
setups (with DSA links) should work just as well, except for a small
change which I can't test: dsa_towards_port should be used instead of
dsa_upstream_port (I made the assumption that the routing port that any
sja1105 should use towards its neighbours is the CPU port. That might
not hold true in other setups).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
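The idea that a pvid encodes the switch id and source port, and that cross-chip bridging only means extending the RX filter of the other switches, can be sketched as follows. The bit layout, the names and the bitmap filter are hypothetical and chosen for illustration; the real encoding is the one documented in net/dsa/tag_8021q.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only:
 * bit 10 marks an RX VLAN, bits 8:6 hold the switch id, bits 3:0 the port.
 */
#define RX_VID_BASE		0x400u
#define RX_VID_SWITCH_SHIFT	6
#define RX_VID_SWITCH_MASK	0x7u
#define RX_VID_PORT_MASK	0xfu

static uint16_t rx_vid(unsigned int switch_id, unsigned int port)
{
	assert(switch_id <= RX_VID_SWITCH_MASK && port <= RX_VID_PORT_MASK);
	return (uint16_t)(RX_VID_BASE | (switch_id << RX_VID_SWITCH_SHIFT) | port);
}

static unsigned int rx_vid_switch_id(uint16_t vid)
{
	return (vid >> RX_VID_SWITCH_SHIFT) & RX_VID_SWITCH_MASK;
}

static unsigned int rx_vid_port(uint16_t vid)
{
	return vid & RX_VID_PORT_MASK;
}

/* RX filter of one switch: one bit per VLAN ID. */
struct rx_filter {
	uint8_t allowed[4096 / 8];
};

static void rx_filter_add(struct rx_filter *f, uint16_t vid)
{
	f->allowed[vid / 8] |= (uint8_t)(1u << (vid % 8));
}

static bool rx_filter_accepts(const struct rx_filter *f, uint16_t vid)
{
	return (f->allowed[vid / 8] & (1u << (vid % 8))) != 0;
}

/* Cross-chip bridging: accept the pvids of every port of the other switch. */
static void rx_filter_join(struct rx_filter *f, unsigned int other_switch_id,
			   unsigned int other_num_ports)
{
	unsigned int port;

	for (port = 0; port < other_num_ports; port++)
		rx_filter_add(f, rx_vid(other_switch_id, port));
}
```

Under this assumed layout, port 2 of switch 1 tags with VID 0x442, and a filter that joined switch 1 (5 ports) accepts that VID while still rejecting the pvids of switch 0.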
2020-05-10 19:37:43 +03:00
#include <linux/dsa/8021q.h>
2019-05-02 23:23:30 +03:00
#include <net/dsa.h>
2019-05-05 13:19:27 +03:00
#include <linux/mutex.h>
2019-05-02 23:23:30 +03:00
#include "sja1105_static_config.h"
#define SJA1105ET_FDB_BIN_SIZE		4
2019-05-02 23:23:36 +03:00
/* The hardware value is in multiples of 10 ms.
 * The passed parameter is in multiples of 1 ms.
 */
#define SJA1105_AGEING_TIME_MS(ms)	((ms) / 10)
net: dsa: sja1105: add support for the SJA1110 switch family
The SJA1110 is basically an SJA1105 with more ports, some integrated
PHYs (100base-T1 and 100base-TX) and an embedded microcontroller which
can be disabled, and the switch core can be controlled by a host running
Linux, over SPI.
This patch contains:
- the static and dynamic config packing functions, for the tables that
are common with SJA1105
- one more static config table which is "unique" to the SJA1110
(actually it is a rehash of stuff that was placed somewhere else in
SJA1105): the PCP Remapping Table
- a reset and clock configuration procedure for the SJA1110 switch.
This resets just the switch subsystem, and gates off the clock which
powers on the embedded microcontroller.
- an RGMII delay configuration procedure for SJA1110, which is very
similar to SJA1105, but different enough for us to be unable to reuse
it (this is a pattern that repeats itself)
- some adaptations to dynamic config table entries which are no longer
programmed in the same way. For example, to delete a VLAN, you used to
write an entry through the dynamic reconfiguration interface with the
desired VLAN ID, and with the VALIDENT bit set to false. Now, the VLAN
table entries contain a TYPE_ENTRY field, which must be set to zero
(in a backwards-incompatible way) in order for the entry to be deleted,
or to some other entry for the VLAN to match "inner tagged" or "outer
tagged" packets.
- a similar thing for the static config: the xMII Mode Parameters Table
encoding for SGMII and MII (the latter just when attached to a
100base-TX PHY) just isn't what it used to be in SJA1105. They are
identical, except there is an extra "special" bit which needs to be
set. Set it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-08 12:25:36 +03:00
#define SJA1105_NUM_L2_POLICERS		SJA1110_MAX_L2_POLICING_COUNT
2019-05-02 23:23:30 +03:00
net: dsa: sja1105: parse {rx, tx}-internal-delay-ps properties for RGMII delays
This change does not fix any functional issue or address any real life
use case that wasn't possible before. It is just a small step in the
process of standardizing the way in which Ethernet MAC drivers may apply
RGMII delays (traditionally these have been applied by PHYs, with no
clear definition of what to do in the case of a fixed-link).
The sja1105 driver used to apply MAC-level RGMII delays on the RX data
lines when in fixed-link mode and using a phy-mode of "rgmii-rxid" or
"rgmii-id" and on the TX data lines when using "rgmii-txid" or "rgmii-id".
But the standard definitions don't say anything about behaving
differently when the port is in fixed-link vs when it isn't, and the new
device tree bindings are about having a way of applying the delays in a
way that is independent of the phy-mode and of the fixed-link property.
When the {rx,tx}-internal-delay-ps properties are present, use them,
otherwise fall back to the old behavior and warn.
One other thing to note is that the SJA1105 hardware applies a delay
value in degrees rather than in picoseconds (the delay in ps changes
depending on the frequency of the RGMII clock - 125 MHz at 1G, 25 MHz at
100M, 2.5MHz at 10M). I assume that is fine, we calculate the phase
shift of the internal delay lines assuming that the device tree meant
gigabit, and we let the hardware scale those according to the link speed.
Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210723173108.459770-6-prasanna.vengateshan@microchip.com/
Link: https://patchwork.ozlabs.org/project/netdev/patch/20200616074955.GA9092@laureti-dev/#2461123
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18 22:29:52 +03:00
/* Calculated assuming 1Gbps, where the clock has 125 MHz (8 ns period)
 * To avoid floating point operations, we'll multiply the degrees by 10
 * to get a "phase" and get 1 decimal point precision.
 */
#define SJA1105_RGMII_DELAY_PS_TO_PHASE(ps) \
	(((ps) * 360) / 800)
#define SJA1105_RGMII_DELAY_PHASE_TO_PS(phase) \
	((800 * (phase)) / 360)
#define SJA1105_RGMII_DELAY_PHASE_TO_HW(phase) \
	(((phase) - 738) / 9)
#define SJA1105_RGMII_DELAY_PS_TO_HW(ps) \
	SJA1105_RGMII_DELAY_PHASE_TO_HW(SJA1105_RGMII_DELAY_PS_TO_PHASE(ps))
/* Valid range in degrees is a value between 73.8 and 101.7
 * in 0.9 degree increments
 */
#define SJA1105_RGMII_DELAY_MIN_PS \
	SJA1105_RGMII_DELAY_PHASE_TO_PS(738)
#define SJA1105_RGMII_DELAY_MAX_PS \
	SJA1105_RGMII_DELAY_PHASE_TO_PS(1017)
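As a worked example of the delay arithmetic, the sketch below repeats the same integer formulas under shortened names and adds a range check. The helper rgmii_delay_ps_to_hw() and the DELAY_* names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Same arithmetic as the SJA1105_RGMII_DELAY_* macros, shortened names.
 * A "phase" is degrees multiplied by 10, at a 125 MHz (8 ns) clock.
 */
#define DELAY_PS_TO_PHASE(ps)	(((ps) * 360) / 800)
#define DELAY_PHASE_TO_PS(ph)	((800 * (ph)) / 360)
#define DELAY_PHASE_TO_HW(ph)	(((ph) - 738) / 9)
#define DELAY_MIN_PS		DELAY_PHASE_TO_PS(738)
#define DELAY_MAX_PS		DELAY_PHASE_TO_PS(1017)

/* Hypothetical helper: convert a delay in picoseconds to the hardware
 * step index, rejecting values outside the supported range.
 */
static bool rgmii_delay_ps_to_hw(int ps, int *hw)
{
	assert(hw);
	if (ps < DELAY_MIN_PS || ps > DELAY_MAX_PS)
		return false;
	*hw = DELAY_PHASE_TO_HW(DELAY_PS_TO_PHASE(ps));
	return true;
}
```

With these formulas the valid range works out to 1640 ps through 2260 ps. A request of 2000 ps is a phase of 900 (90.0 degrees) and maps to hardware step 18; the two ends of the range map to steps 0 and 31.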
2019-11-12 02:11:53 +02:00
typedef enum {
	SPI_READ = 0,
	SPI_WRITE = 1,
} sja1105_spi_rw_mode_t;
2019-09-15 05:00:02 +03:00
#include "sja1105_tas.h"
2019-10-12 02:18:15 +03:00
#include "sja1105_ptp.h"
2019-09-15 05:00:02 +03:00
net: dsa: sja1105: don't use burst SPI reads for port statistics
The current internal sja1105 driver API is optimized for retrieving many
statistics counters at once. But the switch does not do atomic snapshotting
for them anyway.
In case we start reporting the hardware port counters through
ndo_get_stats64 as well, not just ethtool, it would be good to be able
to read individual port counters and not all of them.
Additionally, since Arnd Bergmann's commit ae1804de93f6 ("dsa: sja1105:
dynamically allocate stats structure"), sja1105_get_ethtool_stats
allocates memory dynamically, since struct sja1105_port_status was
deemed to consume too much stack memory. That is not ideal.
The large structure is only needed because of the burst read.
If we read statistics one by one, we can consume less memory, and
we can avoid dynamic allocation.
Additionally, latency-sensitive interfaces such as PTP operations (for
phc2sys) might suffer if the SPI mutex is being held for too long, which
happens in the case of SPI burst reads. By reading counters one by one,
we give a chance for higher priority processes to preempt and take the
SPI bus mutex for accessing the PTP clock.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-21 16:16:08 +03:00
enum sja1105_stats_area {
	MAC,
	HL1,
	HL2,
	ETHER,
	__MAX_SJA1105_STATS_AREA,
};
2019-05-02 23:23:30 +03:00
/* Keeps the different addresses between E/T and P/Q/R/S */
struct sja1105_regs {
	u64 device_id;
	u64 prod_id;
	u64 status;
2019-05-02 23:23:37 +03:00
	u64 port_control;
2019-05-02 23:23:30 +03:00
	u64 rgu;
net: dsa: sja1105: implement tc-gate using time-triggered virtual links
Restrict the TTEthernet hardware support on this switch to operate as
closely as possible to IEEE 802.1Qci. This means that it can
perform PTP-time-based ingress admission control on streams identified
by {DMAC, VID, PCP}, which is useful when trying to ensure the
determinism of traffic scheduled via IEEE 802.1Qbv.
The oddity comes from the fact that in hardware (and in TTEthernet at
large), virtual links always need a full-blown action, including not
only the type of policing, but also the list of destination ports. So in
practice, a single tc-gate action will result in all packets getting
dropped. Additional actions (either "trap" or "redirect") need to be
specified in the same filter rule such that the conforming packets are
actually forwarded somewhere.
Apart from the VL Lookup, Policing and Forwarding tables which need to
be programmed for each flow (virtual link), the Schedule engine also
needs to be told to open/close the admission gates for each individual
virtual link. A fairly accurate (and detailed) description of how that
works is already present in sja1105_tas.c, since it is already used to
trigger the egress gates for the tc-taprio offload (IEEE 802.1Qbv). The
key point to remember here is that the schedule engine supports 8
"subschedules" (execution threads that iterate through the global
schedule in parallel, with the constraint that no 2 hardware threads
may execute a schedule entry at the same time). For tc-taprio, each
egress port used one of these 8 subschedules, leaving a total of 4
subschedules unused.
In principle we could have allocated 1 subschedule for the tc-gate
offload of each ingress port, but actually the schedules of all virtual
links installed on each ingress port would have needed to be merged
together, before they could have been programmed to hardware. So
simplify our life and just merge the entire tc-gate configuration, for
all virtual links on all ingress ports, into a single subschedule. Be
sure to check that against the usual hardware scheduling conflicts, and
program it to hardware alongside any tc-taprio subschedule that may be
present.
The following scenarios were tested:
1. Quantitative testing:
tc qdisc add dev swp2 clsact
tc filter add dev swp2 ingress flower skip_sw \
dst_mac 42:be:24:9b:76:20 \
action gate index 1 base-time 0 \
sched-entry OPEN 1200 -1 -1 \
sched-entry CLOSE 1200 -1 -1 \
action trap
ping 192.168.1.2 -f
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
.............................
--- 192.168.1.2 ping statistics ---
948 packets transmitted, 467 received, 50.7384% packet loss, time 9671ms
2. Qualitative testing (with a phase-aligned schedule - the clocks are
synchronized by ptp4l, not shown here):
Receiver (sja1105):
tc qdisc add dev swp2 clsact
now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \
sec=$(echo $now | awk -F. '{print $1}') && \
base_time="$(((sec + 2) * 1000000000))" && \
echo "base time ${base_time}"
tc filter add dev swp2 ingress flower skip_sw \
dst_mac 42:be:24:9b:76:20 \
action gate base-time ${base_time} \
sched-entry OPEN 60000 -1 -1 \
sched-entry CLOSE 40000 -1 -1 \
action trap
Sender (enetc):
now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \
sec=$(echo $now | awk -F. '{print $1}') && \
base_time="$(((sec + 2) * 1000000000))" && \
echo "base time ${base_time}"
tc qdisc add dev eno0 parent root taprio \
num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time ${base_time} \
sched-entry S 01 50000 \
sched-entry S 00 50000 \
flags 2
ping -A 192.168.1.1
PING 192.168.1.1 (192.168.1.1): 56 data bytes
...
^C
--- 192.168.1.1 ping statistics ---
1425 packets transmitted, 1424 packets received, 0% packet loss
round-trip min/avg/max = 0.322/0.361/0.990 ms
And just for comparison, with the tc-taprio schedule deleted:
ping -A 192.168.1.1
PING 192.168.1.1 (192.168.1.1): 56 data bytes
...
^C
--- 192.168.1.1 ping statistics ---
33 packets transmitted, 19 packets received, 42% packet loss
round-trip min/avg/max = 0.336/0.464/0.597 ms
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
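The OPEN/CLOSE schedules used in the tests above can be modelled as a cyclic list of intervals anchored at base-time. The sketch below is only a model of that semantics in plain C, not of the hardware schedule engine; the type and function names are hypothetical, and treating the gate as closed before base-time is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One sched-entry: gate state and how long it lasts. */
struct gate_entry {
	bool open;
	uint64_t interval_ns;
};

/* Is the gate open at PTP time "now", for a schedule that starts at
 * "base_time" and repeats forever?
 */
static bool gate_is_open(const struct gate_entry *entries, size_t num_entries,
			 uint64_t base_time, uint64_t now)
{
	uint64_t cycle_ns = 0, offset;
	size_t i;

	assert(entries || num_entries == 0);
	for (i = 0; i < num_entries; i++)
		cycle_ns += entries[i].interval_ns;

	/* Assumption: nothing is admitted before the schedule starts. */
	if (cycle_ns == 0 || now < base_time)
		return false;

	/* Position inside the current cycle. */
	offset = (now - base_time) % cycle_ns;
	for (i = 0; i < num_entries; i++) {
		if (offset < entries[i].interval_ns)
			return entries[i].open;
		offset -= entries[i].interval_ns;
	}
	return false;
}
```

For the receiver schedule from the qualitative test (OPEN 60000, CLOSE 40000), the cycle is 100 us: the gate is open for offsets 0 to 59999 ns after base-time, closed until 99999 ns, and open again at 100000 ns.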
2020-05-05 22:20:56 +03:00
	u64 vl_status;
2019-05-02 23:23:30 +03:00
	u64 config;
	u64 rmii_pll1;
net: dsa: sja1105: configure the PTP_CLK pin as EXT_TS or PER_OUT
The SJA1105 switch family has a PTP_CLK pin which emits a signal with
fixed 50% duty cycle, but variable frequency and programmable start time.
On the second generation (P/Q/R/S) switches, this pin supports even more
functionality. The use case described by the hardware documents talks
about synchronization via oneshot pulses: given 2 sja1105 switches,
arbitrarily designated as a master and a slave, the master emits a
single pulse on PTP_CLK, while the slave is configured to timestamp this
pulse received on its PTP_CLK pin (which must obviously be configured as
input). The difference between the timestamps then exactly becomes the
slave offset to the master.
The only trouble with the above is that the hardware is very much tied
into this use case only, and not very generic beyond that:
- When emitting a oneshot pulse, instead of being told when to emit it,
the switch just does it "now" and tells you later what time it was,
via the PTPSYNCTS register. [ Incidentally, this is the same register
that the slave uses to collect the ext_ts timestamp from, too. ]
- On the sync slave, there is no interrupt mechanism on reception of a
new extts, and no FIFO to buffer them, because in the foreseen use
case, software is in control of both the master and the slave pins,
so it "knows" when there's something to collect.
These 2 problems mean that:
- We don't support (at least yet) the quirky oneshot mode exposed by
the hardware, just normal periodic output.
- We abuse the hardware a little bit when we expose generic extts.
Because there's no interrupt mechanism, we need to poll at double the
frequency we expect to receive a pulse. Currently that means a
non-configurable "twice a second".
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
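Polling for external timestamps without an interrupt or FIFO can be sketched as follows. This is a hedged illustration of the idea only: the struct, the function names and the "value changed since last poll" detection are assumptions, and the register value is passed in as a parameter instead of being read over SPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical poller state: the last PTPSYNCTS value that was seen. */
struct extts_poller {
	uint64_t last_ts;
};

/* Poll at double the expected pulse frequency, i.e. half the period. */
static unsigned int extts_poll_interval_ms(unsigned int pulse_period_ms)
{
	return pulse_period_ms / 2;
}

/* Called once per poll with the current PTPSYNCTS value. Reports a new
 * event only when the value differs from the previous poll, since there
 * is no other indication that a pulse was captured.
 */
static bool extts_poll(struct extts_poller *p, uint64_t ptpsyncts_now,
		       uint64_t *event_ts)
{
	assert(p && event_ts);
	if (ptpsyncts_now == p->last_ts)
		return false;

	p->last_ts = ptpsyncts_now;
	*event_ts = ptpsyncts_now;
	return true;
}
```

For a one pulse per second input, this gives the "twice a second" polling mentioned above (a 500 ms interval), and each captured pulse is reported exactly once even though the register is read twice per period.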
2020-03-24 00:59:24 +02:00
	u64 ptppinst;
	u64 ptppindur;
2019-06-08 15:04:34 +03:00
	u64 ptp_control;
2019-10-16 21:41:02 +03:00
	u64 ptpclkval;
2019-06-08 15:04:34 +03:00
	u64 ptpclkrate;
2019-11-12 02:11:54 +02:00
	u64 ptpclkcorp;
2020-03-24 00:59:24 +02:00
	u64 ptpsyncts;
2019-11-12 02:11:54 +02:00
	u64 ptpschtm;
2021-05-24 16:14:15 +03:00
	u64 ptpegr_ts[SJA1105_MAX_NUM_PORTS];
	u64 pad_mii_tx[SJA1105_MAX_NUM_PORTS];
	u64 pad_mii_rx[SJA1105_MAX_NUM_PORTS];
	u64 pad_mii_id[SJA1105_MAX_NUM_PORTS];
	u64 cgu_idiv[SJA1105_MAX_NUM_PORTS];
	u64 mii_tx_clk[SJA1105_MAX_NUM_PORTS];
	u64 mii_rx_clk[SJA1105_MAX_NUM_PORTS];
	u64 mii_ext_tx_clk[SJA1105_MAX_NUM_PORTS];
	u64 mii_ext_rx_clk[SJA1105_MAX_NUM_PORTS];
	u64 rgmii_tx_clk[SJA1105_MAX_NUM_PORTS];
	u64 rmii_ref_clk[SJA1105_MAX_NUM_PORTS];
	u64 rmii_ext_tx_clk[SJA1105_MAX_NUM_PORTS];
	u64 stats[__MAX_SJA1105_STATS_AREA][SJA1105_MAX_NUM_PORTS];
2021-06-08 12:25:38 +03:00
	u64 mdio_100base_tx;
	u64 mdio_100base_t1;
2021-06-11 23:05:29 +03:00
	u64 pcs_base[SJA1105_MAX_NUM_PORTS];
2021-06-08 12:25:38 +03:00
};
struct sja1105_mdio_private {
	struct sja1105_private *priv;
2019-05-02 23:23:30 +03:00
};
2021-05-31 01:59:37 +03:00
enum {
	SJA1105_SPEED_AUTO,
	SJA1105_SPEED_10MBPS,
	SJA1105_SPEED_100MBPS,
	SJA1105_SPEED_1000MBPS,
	SJA1105_SPEED_2500MBPS,
	SJA1105_SPEED_MAX,
};
2021-06-08 12:25:38 +03:00
enum sja1105_internal_phy_t {
	SJA1105_NO_PHY = 0,
	SJA1105_PHY_BASE_TX,
	SJA1105_PHY_BASE_T1,
};
2019-05-02 23:23:30 +03:00
struct sja1105_info {
	u64 device_id;
	/* Needed for distinction between P and R, and between Q and S
	 * (since the parts with/without SGMII share the same
	 * switch core and device_id)
	 */
	u64 part_no;
2019-06-08 15:04:35 +03:00
	/* E/T and P/Q/R/S have partial timestamps of different sizes.
	 * They must be reconstructed on both families anyway to get the full
	 * 64-bit values back.
	 */
	int ptp_ts_bits;
	/* Also SPI commands are of different sizes to retrieve
	 * the egress timestamps.
	 */
	int ptpegr_ts_bytes;
2020-05-28 03:27:58 +03:00
	int num_cbs_shapers;
2021-05-24 16:14:21 +03:00
	int max_frame_mem;
2021-06-08 12:25:36 +03:00
	int num_ports;
net: dsa: sja1105: allow RX timestamps to be taken on all ports for SJA1110
On SJA1105, there is support for a cascade port which is presumably
connected to a downstream SJA1105 switch. The upstream one does not take
PTP timestamps for packets received on this port, presumably because the
downstream switch already did (and for PTP, it only makes sense for the
leaf nodes in a DSA switch tree to do that).
I haven't been able to validate that feature in a fully assembled setup,
so I am disabling the feature by setting the cascade port to an unused
port value (ds->num_ports).
In SJA1110, multiple cascade ports are supported, and CASC_PORT became
a bit mask from a port number. So when CASC_PORT is set to ds->num_ports
(which is 11 on SJA1110), it is actually set to 0b1011, so ports 3, 1
and 0 are configured as cascade ports and we cannot take RX timestamps
on them.
So we need to introduce a check for SJA1110 and set things differently
(to zero there), so that the cascading feature is properly disabled and
RX timestamps can be taken on all ports.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
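The CASC_PORT pitfall described above comes down to reading the same value either as a port number or as a bit mask. A minimal sketch, with hypothetical helper names, of how the "disabled" value would be chosen and how a mask is decoded:

```c
#include <assert.h>
#include <stdbool.h>

/* Value that disables cascading: an unused port number where CASC_PORT
 * is a port number, zero where it is a bit mask.
 */
static unsigned int casc_port_disabled_value(bool multiple_cascade_ports,
					     unsigned int num_ports)
{
	return multiple_cascade_ports ? 0 : num_ports;
}

/* Interpret CASC_PORT as a bit mask and test one port. */
static bool port_is_cascade(unsigned int casc_port_mask, unsigned int port)
{
	assert(port < 32);
	return ((casc_port_mask >> port) & 1u) != 0;
}
```

Writing 11 (the SJA1110 port count) into a mask field gives 0b1011, so ports 0, 1 and 3 read back as cascade ports while port 2 does not; writing 0 leaves no port marked.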
2021-06-11 22:01:23 +03:00
	bool multiple_cascade_ports;
2023-09-06 00:53:38 +03:00
	/* Every {port, TXQ} has its own CBS shaper */
	bool fixed_cbs_mapping;
net: dsa: add support for the SJA1110 native tagging protocol
The SJA1110 has improved a few things compared to SJA1105:
- To send a control packet from the host port with SJA1105, one needed
to program a one-shot "management route" over SPI. This is no longer
true with SJA1110, you can actually send "in-band control extensions"
in the packets sent by DSA, these are in fact DSA tags which contain
the destination port and switch ID.
- When receiving a control packet from the switch with SJA1105, the
source port and switch ID were written in bytes 3 and 4 of the
destination MAC address of the frame (which was a very poor shot at a
DSA header). If the control packet also had an RX timestamp, that
timestamp was sent in an actual follow-up packet, so there were
reordering concerns on multi-core/multi-queue DSA masters, where the
metadata frame with the RX timestamp might get processed before the
actual packet to which that timestamp belonged (there is no way to
pair a packet to its timestamp other than the order in which they were
received). On SJA1110, this is no longer true, control packets have
the source port, switch ID and timestamp all in the DSA tags.
- Timestamps from the switch were partial: to get a 64-bit timestamp as
required by PTP stacks, one would need to take the partial 24-bit or
32-bit timestamp from the packet, then read the current PTP time very
quickly, and then patch in the high bits of the current PTP time into
the captured partial timestamp, to reconstruct what the full 64-bit
timestamp must have been. That is awful because packet processing is
done in NAPI context, but reading the current PTP time is done over
SPI and therefore needs sleepable context.
But it also aggravated a few things:
- Not only is there a DSA header in SJA1110, but there is a DSA trailer
in fact, too. So DSA needs to be extended to support taggers which
have both a header and a trailer. Very unconventional - my understanding
is that the trailer exists because the timestamps couldn't be prepared
in time for putting them in the header area.
- Like SJA1105, not all packets sent to the CPU have the DSA tag added
to them, only control packets do:
* the ones which match the destination MAC filters/traps in
MAC_FLTRES1 and MAC_FLTRES0
* the ones which match FDB entries which have TRAP or TAKETS bits set
So we could in theory hack something up to request the switch to take
timestamps for all packets that reach the CPU, and those would be
DSA-tagged and contain the source port / switch ID by virtue of the
fact that there needs to be a timestamp trailer provided. BUT:
- The SJA1110 does not parse its own DSA tags in a way that is useful
for routing in cross-chip topologies, a la Marvell. And the sja1105
driver already supports cross-chip bridging from the SJA1105 days.
It does that by automatically setting up the DSA links as VLAN trunks
which contain all the necessary tag_8021q RX VLANs that must be
communicated between the switches that span the same bridge. So when
using tag_8021q on sja1105, it is possible to have 2 switches with
ports sw0p0, sw0p1, sw1p0, sw1p1, and 2 VLAN-unaware bridges br0 and
br1, and br0 can take sw0p0 and sw1p0, and br1 can take sw0p1 and
sw1p1, and forwarding will happen according to the expected rules of
the Linux bridge.
We like that, and we don't want that to go away, so as a matter of
fact, the SJA1110 tagger still needs to support tag_8021q.
So the sja1110 tagger is a hybrid between tag_8021q for data packets,
and the native hardware support for control packets.
On RX, packets have a 13-byte trailer if they contain an RX timestamp.
That trailer is padded in such a way that its byte 8 (the start of the
"residence time" field - not parsed by Linux because we don't care) is
aligned on a 16 byte boundary. So the padding has a variable length
between 0 and 15 bytes. The DSA header contains the offset of the
beginning of the padding relative to the beginning of the frame (and the
end of the padding is obviously the end of the packet minus 13 bytes,
the length of the trailer). So we discard it.
Packets which don't have a trailer contain the source port and switch ID
information in the header (they are "trap-to-host" packets). Packets
which have a trailer contain the source port and switch ID in the trailer.
On TX, the destination port mask and switch ID is always in the trailer,
so we always need to say in the header that a trailer is present.
The header needs a custom EtherType and this was chosen as 0xdadc, after
0xdada which is for Marvell and 0xdadb which is for VLANs in
VLAN-unaware mode on SJA1105 (and SJA1110 in fact too).
Because we use tag_8021q in concert with the native tagging protocol,
control packets will have 2 DSA tags.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11 22:01:29 +03:00
enum dsa_tag_protocol tag_proto ;
2019-05-02 23:23:30 +03:00
const struct sja1105_dynamic_table_ops * dyn_ops ;
const struct sja1105_table_ops * static_ops ;
const struct sja1105_regs * regs ;
net: dsa: sja1105: offload bridge port flags to device
The chip can configure unicast flooding, broadcast flooding and learning.
Learning is per port, while flooding is per {ingress, egress} port pair
and we need to configure the same value for all possible ingress ports
towards the requested one.
While multicast flooding is not officially supported, we can hack it by
using a feature of the second generation (P/Q/R/S) devices, which is that
FDB entries are maskable, and multicast addresses always have an odd
first octet. So by putting a match-all for 01:00:00:00:00:00 addr and
01:00:00:00:00:00 mask at the end of the FDB, we make sure that it is
always checked last, and does not take precedence over any other
MDB entry. So it behaves effectively as an unknown multicast entry.
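
A minimal sketch of how such a masked match behaves, assuming MAC addresses are held in the low 48 bits of a u64 and that only the group bit (the least significant bit of the first octet, i.e. 01:00:00:00:00:00) is compared. The structure and function names are illustrative and do not reflect the hardware table layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Group (multicast) bit: least significant bit of the first octet. */
#define MAC_GROUP_BIT 0x010000000000ULL

/* Illustrative maskable entry; not the hardware table layout. */
struct masked_fdb_entry {
	uint64_t addr;
	uint64_t mask;
};

/* An entry matches when the masked bits of the DMAC equal the masked key. */
static bool masked_fdb_match(const struct masked_fdb_entry *e, uint64_t dmac)
{
	return (dmac & e->mask) == (e->addr & e->mask);
}
```

With both addr and mask set to MAC_GROUP_BIT, every group address matches (broadcast included) and no unicast address does, which is why the entry only works as a catch-all if it is looked up last.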
For the first generation switches, this feature is not available, so
unknown multicast will always be treated the same as unknown unicast.
So the only thing we can do is request the user to offload the settings
for these 2 flags in tandem, i.e.
ip link set swp2 type bridge_slave flood off
Error: sja1105: This chip cannot configure multicast flooding independently of unicast.
ip link set swp2 type bridge_slave flood off mcast_flood off
ip link set swp2 type bridge_slave mcast_flood on
Error: sja1105: This chip cannot configure multicast flooding independently of unicast.
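
The "in tandem" constraint can be sketched as a small validation step. The function below is hypothetical; only the can_limit_mcast_flood name is taken from struct sja1105_info.

```c
#include <stdbool.h>

/*
 * Hypothetical check: returns 0 if the requested flags can be offloaded,
 * -1 otherwise. can_limit_mcast_flood mirrors the field of the same name
 * in struct sja1105_info.
 */
static int check_flood_flags(bool can_limit_mcast_flood,
			     bool ucast_flood, bool mcast_flood)
{
	/* First generation: the two flags must hold the same value. */
	if (!can_limit_mcast_flood && ucast_flood != mcast_flood)
		return -1;

	return 0;
}
```

This mirrors the three commands above: changing only one of the two flags is rejected on a chip that cannot limit multicast flooding, while changing both together is accepted.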
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12 17:16:00 +02:00
bool can_limit_mcast_flood ;
2019-11-13 00:16:41 +02:00
int ( * reset_cmd ) ( struct dsa_switch * ds ) ;
net: dsa: sja1105: Error out if RGMII delays are requested in DT
Documentation/devicetree/bindings/net/ethernet.txt is confusing because
it says what the MAC should not do, but not what it *should* do:
* "rgmii-rxid" (RGMII with internal RX delay provided by the PHY, the MAC
should not add an RX delay in this case)
The gap in semantics is threefold:
1. Is it illegal for the MAC to apply the Rx internal delay by itself,
and simplify the phy_mode (mask off "rgmii-rxid" into "rgmii") before
passing it to of_phy_connect? The documentation would suggest yes.
2. For "rgmii-rxid", while the situation with the Rx clock skew is more
or less clear (needs to be added by the PHY), what should the MAC
driver do about the Tx delays? Is it an implicit wild card for the
MAC to apply delays in the Tx direction if it can? What if those were
already added as serpentine PCB traces, how could that be made more
obvious through DT bindings so that the MAC doesn't attempt to add
them twice and again potentially break the link?
3. If the interface is a fixed-link and therefore the PHY object is
fixed (a purely software entity that obviously cannot add clock
skew), what is the meaning of the above property?
So an interpretation of the RGMII bindings was chosen that hopefully
does not contradict their intention but also makes them more practical.
The SJA1105 driver acts upon "rgmii-*id" phy-mode bindings only
if the port is in the PHY role (either explicitly, or if it is a
fixed-link). Otherwise it always passes the duty of setting up delays to
the PHY driver.
The error behavior that this patch adds is required on SJA1105E/T where
the MAC really cannot apply internal delays. If the other end of the
fixed-link cannot apply RGMII delays either (this would be specified
through its own DT bindings), then the situation requires PCB delays.
For SJA1105P/Q/R/S, this is however hardware supported and the error is
thus only temporary. I created a stub function pointer for configuring
delays per-port on RXC and TXC, and will implement it when I have access
to a board with this hardware setup.
Meanwhile do not allow the user to select an invalid configuration.
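
A sketch of the chosen interpretation, under stated assumptions: the types and names are invented for illustration, and mac_can_delay abstracts "the hardware has a usable delay hook", which is false on SJA1105E/T according to the text above.

```c
#include <stdbool.h>

enum rgmii_mode { RGMII_PLAIN, RGMII_ID, RGMII_RXID, RGMII_TXID };

/* Illustrative per-port description; not a structure from the driver. */
struct rgmii_port {
	enum rgmii_mode mode;
	bool phy_role;      /* PHY role, either explicit or implied by fixed-link */
	bool mac_can_delay; /* false on SJA1105E/T, per the text above */
};

/*
 * Decide which internal delays the MAC should apply. Returns 0 on success
 * and -1 when delays are requested but cannot be applied in hardware.
 */
static int rgmii_resolve_delays(const struct rgmii_port *p,
				bool *rx_delay, bool *tx_delay)
{
	*rx_delay = false;
	*tx_delay = false;

	/* In the MAC role, setting up delays is the PHY driver's duty. */
	if (!p->phy_role)
		return 0;

	*rx_delay = p->mode == RGMII_ID || p->mode == RGMII_RXID;
	*tx_delay = p->mode == RGMII_ID || p->mode == RGMII_TXID;

	if ((*rx_delay || *tx_delay) && !p->mac_can_delay)
		return -1;

	return 0;
}
```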
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-02 23:23:32 +03:00
int ( * setup_rgmii_delay ) ( const void * ctx , int port ) ;
2019-06-03 00:11:57 +03:00
/* Prototypes from include/net/dsa.h */
int ( * fdb_add_cmd ) ( struct dsa_switch * ds , int port ,
const unsigned char * addr , u16 vid ) ;
int ( * fdb_del_cmd ) ( struct dsa_switch * ds , int port ,
const unsigned char * addr , u16 vid ) ;
2019-11-12 02:11:53 +02:00
void ( * ptp_cmd_packing ) ( u8 * buf , struct sja1105_ptp_cmd * cmd ,
enum packing_op op ) ;
2021-06-11 22:01:30 +03:00
bool ( * rxtstamp ) ( struct dsa_switch * ds , int port , struct sk_buff * skb ) ;
net: dsa: sja1105: implement TX timestamping for SJA1110
The TX timestamping procedure for SJA1105 is a bit unconventional
because the transmit procedure itself is unconventional.
Control packets (and therefore PTP as well) are transmitted to a
specific port in SJA1105 using "management routes" which must be written
over SPI to the switch. These are one-shot rules that match by
destination MAC address on traffic coming from the CPU port, and select
the precise destination port for that packet. So to transmit a packet
from NET_TX softirq context, we actually need to defer to a process
context so that we can perform that SPI write before we send the packet.
The DSA master dev_queue_xmit() runs in process context, and we poll
until the switch confirms it took the TX timestamp, then we annotate the
skb clone with that TX timestamp. This is why the sja1105 driver does
not need an skb queue for TX timestamping.
But the SJA1110 is a bit (not much!) more conventional, and you can
request 2-step TX timestamping through the DSA header, as well as give
the switch a cookie (timestamp ID) which it will give back to you when
it has the timestamp. So now we do need a queue for keeping the skb
clones until their TX timestamps become available.
The interesting part is that the metadata frames from SJA1105 haven't
disappeared completely. On SJA1105 they were used as follow-ups which
contained RX timestamps, but on SJA1110 they are actually TX completion
packets, which contain a variable-length array of up to 32 timestamps.
Why an array? Because:
- not only is the TX timestamp on the egress port being communicated,
but also the RX timestamp on the CPU port. Nice, but we don't care
about that, so we ignore it.
- because a packet could be multicast to multiple egress ports, each
port takes its own timestamp, and the TX completion packet contains
the individual timestamps on each port.
This is unconventional because switches typically have a timestamping
FIFO and raise an interrupt, but this one doesn't. So the tagger needs
to detect and parse meta frames, and call into the main switch driver,
which pairs the timestamps with the skbs in the TX timestamping queue
which are waiting for one.
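
The pairing of timestamp IDs with waiting skb clones can be pictured with a toy queue. This sketch uses a fixed array instead of an skb queue, and every name in it is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_PENDING 8 /* arbitrary size for the sketch */

/* One packet waiting for its TX timestamp. */
struct pending_tx {
	bool in_use;
	uint8_t ts_id; /* cookie handed to the switch in the DSA header */
};

struct tx_queue {
	struct pending_tx slot[MAX_PENDING];
};

/* Remember a packet by its timestamp ID; returns the slot or -1 if full. */
static int txq_add(struct tx_queue *q, uint8_t ts_id)
{
	for (int i = 0; i < MAX_PENDING; i++) {
		if (!q->slot[i].in_use) {
			q->slot[i].in_use = true;
			q->slot[i].ts_id = ts_id;
			return i;
		}
	}
	return -1;
}

/*
 * Called when a TX completion reports ts_id: release the matching entry
 * and return its slot, or -1 if nothing was waiting for that ID.
 */
static int txq_complete(struct tx_queue *q, uint8_t ts_id)
{
	for (int i = 0; i < MAX_PENDING; i++) {
		if (q->slot[i].in_use && q->slot[i].ts_id == ts_id) {
			q->slot[i].in_use = false;
			return i;
		}
	}
	return -1;
}
```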
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-11 22:01:31 +03:00
void ( * txtstamp ) ( struct dsa_switch * ds , int port , struct sk_buff * skb ) ;
2021-05-24 16:14:17 +03:00
int ( * clocking_setup ) ( struct sja1105_private * priv ) ;
2023-01-17 00:52:25 +01:00
int ( * pcs_mdio_read_c45 ) ( struct mii_bus * bus , int phy , int mmd ,
int reg ) ;
int ( * pcs_mdio_write_c45 ) ( struct mii_bus * bus , int phy , int mmd ,
int reg , u16 val ) ;
net: dsa: sja1105: properly power down the microcontroller clock for SJA1110
It turns out that powering down the BASE_TIMER_CLK does not turn off the
microcontroller, just its timers, including the one for the watchdog.
So the embedded microcontroller is still running, and potentially still
doing things.
To prevent unwanted interference, we should power down the BASE_MCSS_CLK
as well (MCSS = microcontroller subsystem).
The trouble is that currently we turn off the BASE_TIMER_CLK for SJA1110
from the .clocking_setup() method, mostly because this is a Clock
Generation Unit (CGU) setting which was traditionally configured in that
method for SJA1105. But in SJA1105, the CGU was used for bringing up the
port clocks at the proper speeds, and in SJA1110 it's not (but rather
for initial configuration), so it's best that we rebrand the
sja1110_clocking_setup() method into what it really is - an implementation
of the .disable_microcontroller() method.
Since disabling the microcontroller only needs to be done once, at probe
time, we can choose the best place to do that as being in sja1105_setup(),
before we upload the static config to the device. This guarantees that
the static config being used by the switch afterwards is really ours.
Note that the procedure to upload a static config necessarily resets the
switch. This already did not reset the microcontroller, only the switch
core, so since the .disable_microcontroller() method is guaranteed to be
called by that point, if it's disabled, it remains disabled. Add a
comment to make that clear.
With the code movement for SJA1110 from .clocking_setup() to
.disable_microcontroller(), both methods are optional and are guarded by
"if" conditions.
Tested by enabling in the device tree the rev-mii switch port 0 that
goes towards the microcontroller, and flashing a firmware that would
have networking. Without this patch, the microcontroller can be pinged,
with this patch it cannot.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18 14:52:54 +03:00
int ( * disable_microcontroller ) ( struct sja1105_private * priv ) ;
2019-05-02 23:23:30 +03:00
const char * name ;
net: dsa: sja1105: add a PHY interface type compatibility matrix
On the SJA1105, all ports support the parallel "xMII" protocols (MII,
RMII, RGMII) except for port 4 on SJA1105R/S which supports only SGMII.
This was relatively easy to model, by special-casing the SGMII port.
On the SJA1110, certain ports can be pinmuxed between SGMII and xMII, or
between SGMII and an internal 100base-TX PHY. This creates problems,
because the driver's assumption so far was that if a port supports
SGMII, it uses SGMII.
We allow the device tree to tell us how the port pinmuxing is done, and
check that against a PHY interface type compatibility matrix for
plausibility.
The other big change is that instead of doing SGMII configuration based
on what the port supports, we do it based on the configured phy_mode of
the port.
The 2500base-x support added in this patch is not complete.
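
A plausibility check against such a matrix might look like the sketch below. The supports_* array names follow the fields added to struct sja1105_info; the enum, the port count and the function are assumptions for illustration.

```c
#include <stdbool.h>

#define NUM_PORTS 5 /* illustrative port count */

enum port_mode { MODE_MII, MODE_RMII, MODE_RGMII, MODE_SGMII, MODE_2500BASEX };

/* Per-port capability table, one bool array per interface type. */
struct compat_matrix {
	bool supports_mii[NUM_PORTS];
	bool supports_rmii[NUM_PORTS];
	bool supports_rgmii[NUM_PORTS];
	bool supports_sgmii[NUM_PORTS];
	bool supports_2500basex[NUM_PORTS];
};

/* Return true if the mode requested in the device tree is plausible. */
static bool port_mode_supported(const struct compat_matrix *m, int port,
				enum port_mode mode)
{
	if (port < 0 || port >= NUM_PORTS)
		return false;

	switch (mode) {
	case MODE_MII:       return m->supports_mii[port];
	case MODE_RMII:      return m->supports_rmii[port];
	case MODE_RGMII:     return m->supports_rgmii[port];
	case MODE_SGMII:     return m->supports_sgmii[port];
	case MODE_2500BASEX: return m->supports_2500basex[port];
	}
	return false;
}
```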
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-31 01:59:36 +03:00
bool supports_mii [ SJA1105_MAX_NUM_PORTS ] ;
bool supports_rmii [ SJA1105_MAX_NUM_PORTS ] ;
bool supports_rgmii [ SJA1105_MAX_NUM_PORTS ] ;
bool supports_sgmii [ SJA1105_MAX_NUM_PORTS ] ;
bool supports_2500basex [ SJA1105_MAX_NUM_PORTS ] ;
2021-06-08 12:25:38 +03:00
enum sja1105_internal_phy_t internal_phy [ SJA1105_MAX_NUM_PORTS ] ;
2021-05-31 01:59:37 +03:00
const u64 port_speed [ SJA1105_SPEED_MAX ] ;
2019-05-02 23:23:30 +03:00
} ;
2020-05-05 22:20:54 +03:00
enum sja1105_key_type {
	SJA1105_KEY_BCAST,
	SJA1105_KEY_TC,
	SJA1105_KEY_VLAN_UNAWARE_VL,
	SJA1105_KEY_VLAN_AWARE_VL,
};
struct sja1105_key {
	enum sja1105_key_type type;
	union {
		/* SJA1105_KEY_TC */
		struct {
			int pcp;
		} tc;
		/* SJA1105_KEY_VLAN_UNAWARE_VL */
		/* SJA1105_KEY_VLAN_AWARE_VL */
		struct {
			u64 dmac;
			u16 vid;
			u16 pcp;
		} vl;
	};
};
2020-03-29 14:52:02 +03:00
enum sja1105_rule_type {
SJA1105_RULE_BCAST_POLICER ,
SJA1105_RULE_TC_POLICER ,
2020-05-05 22:20:55 +03:00
SJA1105_RULE_VL ,
} ;
enum sja1105_vl_type {
SJA1105_VL_NONCRITICAL ,
SJA1105_VL_RATE_CONSTRAINED ,
SJA1105_VL_TIME_TRIGGERED ,
2020-03-29 14:52:02 +03:00
} ;
struct sja1105_rule {
struct list_head list ;
unsigned long cookie ;
unsigned long port_mask ;
2020-05-05 22:20:54 +03:00
struct sja1105_key key ;
2020-03-29 14:52:02 +03:00
enum sja1105_rule_type type ;
2020-05-05 22:20:55 +03:00
/* Action */
2020-03-29 14:52:02 +03:00
union {
/* SJA1105_RULE_BCAST_POLICER */
struct {
int sharindx ;
} bcast_pol ;
/* SJA1105_RULE_TC_POLICER */
struct {
int sharindx ;
} tc_pol ;
2020-05-05 22:20:55 +03:00
/* SJA1105_RULE_VL */
struct {
enum sja1105_vl_type type ;
net: dsa: sja1105: implement tc-gate using time-triggered virtual links
Restrict the TTEthernet hardware support on this switch to operate as
closely as possible to IEEE 802.1Qci. This means that it can
perform PTP-time-based ingress admission control on streams identified
by {DMAC, VID, PCP}, which is useful when trying to ensure the
determinism of traffic scheduled via IEEE 802.1Qbv.
The oddity comes from the fact that in hardware (and in TTEthernet at
large), virtual links always need a full-blown action, including not
only the type of policing, but also the list of destination ports. So in
practice, a single tc-gate action will result in all packets getting
dropped. Additional actions (either "trap" or "redirect") need to be
specified in the same filter rule such that the conforming packets are
actually forwarded somewhere.
Apart from the VL Lookup, Policing and Forwarding tables which need to
be programmed for each flow (virtual link), the Schedule engine also
needs to be told to open/close the admission gates for each individual
virtual link. A fairly accurate (and detailed) description of how that
works is already present in sja1105_tas.c, since it is already used to
trigger the egress gates for the tc-taprio offload (IEEE 802.1Qbv). The
key point here is that the schedule engine supports 8 "subschedules"
(execution threads that iterate through the global schedule in
parallel, with the constraint that no 2 hardware threads may execute a
schedule entry at the same time). For tc-taprio, each egress port used
one of these 8 subschedules, leaving a total of 4 subschedules unused.
In principle we could have allocated 1 subschedule for the tc-gate
offload of each ingress port, but actually the schedules of all virtual
links installed on each ingress port would have needed to be merged
together, before they could have been programmed to hardware. So
simplify our life and just merge the entire tc-gate configuration, for
all virtual links on all ingress ports, into a single subschedule. Be
sure to check that against the usual hardware scheduling conflicts, and
program it to hardware alongside any tc-taprio subschedule that may be
present.
The following scenarios were tested:
1. Quantitative testing:
tc qdisc add dev swp2 clsact
tc filter add dev swp2 ingress flower skip_sw \
dst_mac 42:be:24:9b:76:20 \
action gate index 1 base-time 0 \
sched-entry OPEN 1200 -1 -1 \
sched-entry CLOSE 1200 -1 -1 \
action trap
ping 192.168.1.2 -f
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
.............................
--- 192.168.1.2 ping statistics ---
948 packets transmitted, 467 received, 50.7384% packet loss, time 9671ms
2. Qualitative testing (with a phase-aligned schedule - the clocks are
synchronized by ptp4l, not shown here):
Receiver (sja1105):
tc qdisc add dev swp2 clsact
now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \
sec=$(echo $now | awk -F. '{print $1}') && \
base_time="$(((sec + 2) * 1000000000))" && \
echo "base time ${base_time}"
tc filter add dev swp2 ingress flower skip_sw \
dst_mac 42:be:24:9b:76:20 \
action gate base-time ${base_time} \
sched-entry OPEN 60000 -1 -1 \
sched-entry CLOSE 40000 -1 -1 \
action trap
Sender (enetc):
now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \
sec=$(echo $now | awk -F. '{print $1}') && \
base_time="$(((sec + 2) * 1000000000))" && \
echo "base time ${base_time}"
tc qdisc add dev eno0 parent root taprio \
num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time ${base_time} \
sched-entry S 01 50000 \
sched-entry S 00 50000 \
flags 2
ping -A 192.168.1.1
PING 192.168.1.1 (192.168.1.1): 56 data bytes
...
^C
--- 192.168.1.1 ping statistics ---
1425 packets transmitted, 1424 packets received, 0% packet loss
round-trip min/avg/max = 0.322/0.361/0.990 ms
And just for comparison, with the tc-taprio schedule deleted:
ping -A 192.168.1.1
PING 192.168.1.1 (192.168.1.1): 56 data bytes
...
^C
--- 192.168.1.1 ping statistics ---
33 packets transmitted, 19 packets received, 42% packet loss
round-trip min/avg/max = 0.336/0.464/0.597 ms
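
To make the gate schedules used in these tests concrete, here is a minimal sketch of how a cyclic list of sched-entry intervals maps a point in time to an open or closed gate. It is a software model only, not how the hardware Schedule engine is programmed; the names are invented, and treating the gate as closed before base-time is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One sched-entry: gate state and how long it lasts. */
struct gate_entry {
	bool open;
	uint64_t interval_ns;
};

/* Return the gate state at time now_ns for a schedule that repeats forever. */
static bool gate_is_open(const struct gate_entry *entries, size_t n,
			 uint64_t base_time_ns, uint64_t now_ns)
{
	uint64_t cycle_ns = 0, offset, acc = 0;
	size_t i;

	/* The cycle time is the sum of all entry intervals. */
	for (i = 0; i < n; i++)
		cycle_ns += entries[i].interval_ns;

	/* Assumption: nothing is admitted before the schedule starts. */
	if (cycle_ns == 0 || now_ns < base_time_ns)
		return false;

	offset = (now_ns - base_time_ns) % cycle_ns;
	for (i = 0; i < n; i++) {
		acc += entries[i].interval_ns;
		if (offset < acc)
			return entries[i].open;
	}
	return false;
}
```

With the OPEN 1200 / CLOSE 1200 schedule from the quantitative test, the gate is open for half of every 2400 ns cycle, which is consistent with the roughly 50% packet loss reported there.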
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-05 22:20:56 +03:00
unsigned long destports ;
int sharindx ;
int maxlen ;
int ipv ;
u64 base_time ;
u64 cycle_time ;
int num_entries ;
struct action_gate_entry * entries ;
struct flow_stats stats ;
2020-05-05 22:20:55 +03:00
} vl ;
2020-03-29 14:52:02 +03:00
} ;
} ;
struct sja1105_flow_block {
struct list_head rules ;
bool l2_policer_used [ SJA1105_NUM_L2_POLICERS ] ;
2020-05-05 22:20:55 +03:00
int num_virtual_links ;
2020-03-29 14:52:02 +03:00
} ;
2019-05-02 23:23:30 +03:00
struct sja1105_private {
struct sja1105_static_config static_config ;
net: dsa: sja1105: parse {rx, tx}-internal-delay-ps properties for RGMII delays
This change does not fix any functional issue or address any real life
use case that wasn't possible before. It is just a small step in the
process of standardizing the way in which Ethernet MAC drivers may apply
RGMII delays (traditionally these have been applied by PHYs, with no
clear definition of what to do in the case of a fixed-link).
The sja1105 driver used to apply MAC-level RGMII delays on the RX data
lines when in fixed-link mode and using a phy-mode of "rgmii-rxid" or
"rgmii-id" and on the TX data lines when using "rgmii-txid" or "rgmii-id".
But the standard definitions don't say anything about behaving
differently when the port is in fixed-link vs when it isn't, and the new
device tree bindings are about having a way of applying the delays in a
way that is independent of the phy-mode and of the fixed-link property.
When the {rx,tx}-internal-delay-ps properties are present, use them,
otherwise fall back to the old behavior and warn.
One other thing to note is that the SJA1105 hardware applies a delay
value in degrees rather than in picoseconds (the delay in ps changes
depending on the frequency of the RGMII clock - 125 MHz at 1G, 25 MHz at
100M, 2.5 MHz at 10M). I assume that is fine: we calculate the phase
shift of the internal delay lines assuming that the device tree meant
gigabit, and we let the hardware scale those according to the link speed.
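
As a worked example of the picoseconds-to-degrees conversion, assuming the 125 MHz gigabit clock (period 8 ns = 8000 ps): phase = delay_ps * 360 / 8000. The sketch below returns tenths of a degree to stay in integer arithmetic. The function name is hypothetical, and rounding to whatever steps the hardware delay lines support is not modeled.

```c
#include <stdint.h>

#define RGMII_1G_PERIOD_PS 8000 /* one period of the 125 MHz clock */

/* Convert a delay in picoseconds to a phase shift in tenths of a degree. */
static uint32_t rgmii_delay_ps_to_decidegrees(uint32_t delay_ps)
{
	return (uint32_t)(((uint64_t)delay_ps * 3600) / RGMII_1G_PERIOD_PS);
}
```

For instance, 2000 ps maps to 900, i.e. 90.0 degrees. If the hardware keeps that phase constant, the same setting amounts to 10 ns at 25 MHz.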
Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210723173108.459770-6-prasanna.vengateshan@microchip.com/
Link: https://patchwork.ozlabs.org/project/netdev/patch/20200616074955.GA9092@laureti-dev/#2461123
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18 22:29:52 +03:00
int rgmii_rx_delay_ps [ SJA1105_MAX_NUM_PORTS ] ;
int rgmii_tx_delay_ps [ SJA1105_MAX_NUM_PORTS ] ;
2021-05-31 01:59:35 +03:00
phy_interface_t phy_mode [ SJA1105_MAX_NUM_PORTS ] ;
2021-06-04 17:01:49 +03:00
bool fixed_link [ SJA1105_MAX_NUM_PORTS ] ;
2021-02-16 13:41:19 +02:00
unsigned long ucast_egress_floods ;
unsigned long bcast_egress_floods ;
2021-12-10 01:34:41 +02:00
unsigned long hwts_tx_en ;
2023-07-04 01:05:45 +03:00
unsigned long hwts_rx_en ;
2019-05-02 23:23:30 +03:00
const struct sja1105_info * info ;
2021-05-21 00:16:57 +03:00
size_t max_xfer_len ;
2019-05-02 23:23:30 +03:00
struct spi_device * spidev ;
struct dsa_switch * ds ;
net: dsa: sja1105: delete vlan delta save/restore logic
With the best_effort_vlan_filtering mode now gone, the driver does not
have 3 operating modes anymore (VLAN-unaware, VLAN-aware and best effort),
but only 2.
The idea is that we will gain support for network stack I/O through a
VLAN-aware bridge, using the data plane offload framework (imprecise RX,
imprecise TX). So the VLAN-aware use case will be more functional.
But standalone ports that are part of the same switch when some other
ports are under a VLAN-aware bridge should work too. Termination on
those should work through the tag_8021q RX VLAN and TX VLAN.
This was not possible using the old logic, because:
- in VLAN-unaware mode, only the tag_8021q VLANs were committed to hw
- in VLAN-aware mode, only the bridge VLANs were committed to hw
- in best-effort VLAN mode, both the tag_8021q and bridge VLANs were
committed to hw
The strategy for the new VLAN-aware mode is to allow the bridge and the
tag_8021q VLANs to coexist in the VLAN table at the same time.
[ yes, we need to make sure that the bridge cannot install a tag_8021q
VLAN, but ]
This means that the save/restore logic introduced by commit ec5ae61076d0
("net: dsa: sja1105: save/restore VLANs using a delta commit method")
does not serve a purpose any longer. We can delete it and restore the
old code that simply adds a VLAN to the VLAN table and calls it a day.
Note that we keep the sja1105_commit_pvid() function from those days,
but adapt it slightly. Ports that are under a VLAN-aware bridge use the
bridge's pvid, ports that are standalone or under a VLAN-unaware bridge
use the tag_8021q pvid, for local termination or VLAN-unaware forwarding.
Now, when the vlan_filtering property is toggled for the bridge, the
pvid of the ports beneath it is the only thing that's changing, we no
longer delete some VLANs and restore others.
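
The pvid selection rule described above reduces to a two-way choice. The sketch below is illustrative: the bridge_pvid and tag_8021q_pvid names follow the per-port arrays added to struct sja1105_private, while the structure and function are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative per-port state; not a structure from the driver. */
struct port_vlan_state {
	bool bridged;        /* port is under a bridge */
	bool vlan_filtering; /* that bridge is VLAN-aware */
	uint16_t bridge_pvid;
	uint16_t tag_8021q_pvid;
};

/* Pick the pvid that should be committed to hardware for this port. */
static uint16_t port_effective_pvid(const struct port_vlan_state *p)
{
	if (p->bridged && p->vlan_filtering)
		return p->bridge_pvid;

	/* Standalone, or under a VLAN-unaware bridge. */
	return p->tag_8021q_pvid;
}
```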
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-26 19:55:31 +03:00
u16 bridge_pvid [ SJA1105_MAX_NUM_PORTS ] ;
u16 tag_8021q_pvid [ SJA1105_MAX_NUM_PORTS ] ;
2020-03-29 14:52:02 +03:00
struct sja1105_flow_block flow_block ;
2019-05-05 13:19:27 +03:00
/* Serializes transmission of management frames so that
* the switch doesn't confuse them with one another.
*/
struct mutex mgmt_lock ;
2021-12-10 01:34:43 +02:00
/* PTP two-step TX timestamp ID, and its serialization lock */
spinlock_t ts_id_lock ;
u8 ts_id ;
2021-10-24 20:17:50 +03:00
/* Serializes access to the dynamic config interface */
struct mutex dynamic_config_lock ;
2020-09-26 02:04:20 +03:00
struct devlink_region * * regions ;
2020-05-28 03:27:58 +03:00
struct sja1105_cbs_entry * cbs ;
2021-06-08 12:25:38 +03:00
struct mii_bus * mdio_base_t1 ;
struct mii_bus * mdio_base_tx ;
2021-06-11 23:05:28 +03:00
struct mii_bus * mdio_pcs ;
struct dw_xpcs * xpcs [ SJA1105_MAX_NUM_PORTS ] ;
2019-10-12 02:18:15 +03:00
struct sja1105_ptp_data ptp_data ;
2019-09-15 05:00:02 +03:00
struct sja1105_tas_data tas_data ;
2019-05-02 23:23:30 +03:00
} ;
# include "sja1105_dynamic_config.h"
struct sja1105_spi_message {
	u64 access;
	u64 read_count;
	u64 address;
};
2019-09-15 05:00:02 +03:00
/* From sja1105_main.c */
2019-11-12 23:22:00 +02:00
enum sja1105_reset_reason {
SJA1105_VLAN_FILTERING = 0 ,
SJA1105_AGEING_TIME ,
SJA1105_SCHEDULING ,
2020-03-27 21:55:45 +02:00
SJA1105_BEST_EFFORT_POLICING ,
2020-05-05 22:20:55 +03:00
SJA1105_VIRTUAL_LINKS ,
2019-11-12 23:22:00 +02:00
} ;
int sja1105_static_config_reload ( struct sja1105_private * priv ,
enum sja1105_reset_reason reason ) ;
2021-02-13 22:43:19 +02:00
int sja1105_vlan_filtering ( struct dsa_switch * ds , int port , bool enabled ,
struct netlink_ext_ack * extack ) ;
2020-05-12 20:20:37 +03:00
void sja1105_frame_memory_partitioning ( struct sja1105_private * priv ) ;
2021-06-08 12:25:38 +03:00
/* From sja1105_mdio.c */
int sja1105_mdiobus_register ( struct dsa_switch * ds ) ;
void sja1105_mdiobus_unregister ( struct dsa_switch * ds ) ;
2023-01-17 00:52:25 +01:00
int sja1105_pcs_mdio_read_c45 ( struct mii_bus * bus , int phy , int mmd , int reg ) ;
int sja1105_pcs_mdio_write_c45 ( struct mii_bus * bus , int phy , int mmd , int reg ,
u16 val ) ;
int sja1110_pcs_mdio_read_c45 ( struct mii_bus * bus , int phy , int mmd , int reg ) ;
int sja1110_pcs_mdio_write_c45 ( struct mii_bus * bus , int phy , int mmd , int reg ,
u16 val ) ;
2021-06-08 12:25:38 +03:00
2020-09-26 02:04:19 +03:00
/* From sja1105_devlink.c */
int sja1105_devlink_setup ( struct dsa_switch * ds ) ;
void sja1105_devlink_teardown ( struct dsa_switch * ds ) ;
2020-09-26 02:04:21 +03:00
int sja1105_devlink_info_get ( struct dsa_switch * ds ,
struct devlink_info_req * req ,
struct netlink_ext_ack * extack ) ;
2020-09-26 02:04:19 +03:00
2019-05-02 23:23:30 +03:00
/* From sja1105_spi.c */
2019-10-01 22:18:01 +03:00
int sja1105_xfer_buf ( const struct sja1105_private * priv ,
sja1105_spi_rw_mode_t rw , u64 reg_addr ,
net: dsa: sja1105: Switch to scatter/gather API for SPI
This reworks the SPI transfer implementation to make use of more of the
SPI core features. The main benefit is to avoid the memcpy in
sja1105_xfer_buf().
The memcpy was only needed because the function was transferring a
single buffer at a time. So it needed to copy the caller-provided buffer
at buf + 4, to store the SPI message header in the "headroom" area.
But the SPI core supports scatter-gather messages, comprised of multiple
transfers. We can actually use those to break apart every SPI message
into 2 transfers: one for the header and one for the actual payload.
To keep the behavior the same regarding the chip select signal, it is
necessary to tell the SPI core to de-assert the chip select after each
chunk. This was not needed before, because each spi_message contained
only 1 single transfer.
The meaning of the per-transfer cs_change=1 is:
- If the transfer is the last one of the message, keep CS asserted
- Otherwise, deassert CS
We need to deassert CS in the "otherwise" case, which was implicit
before.
Avoiding the memcpy creates yet another opportunity. The device can't
process more than 256 bytes of SPI payload at a time, so the
sja1105_xfer_long_buf() function used to exist, to split the larger
caller buffer into chunks.
But these chunks couldn't be used as scatter/gather buffers for
spi_message until now, because of that memcpy (we would have needed more
memory for each chunk). So we can now remove the sja1105_xfer_long_buf()
function and have a single implementation for long and short buffers.
Another benefit is lower usage of stack memory. Previously we had to
store 2 SPI buffers for each chunk. Due to the elimination of the
memcpy, we can now send pointers to the actual chunks from the
caller-supplied buffer to the SPI core.
Since the patch merges two functions into a rewritten implementation,
the function prototype was also changed, mainly for cosmetic consistency
with the structures used within it.
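
The chunking arithmetic can be sketched as follows, assuming the 256-byte limit applies to the payload of each chunk and that each chunk is sent as 2 transfers (header plus payload). The helper names are hypothetical and no SPI core API is used here.

```c
#include <stddef.h>

#define SPI_MAX_PAYLOAD 256 /* payload limit per chunk, stated above */

/* Number of chunks needed for a caller buffer of len bytes. */
static size_t spi_num_chunks(size_t len)
{
	return (len + SPI_MAX_PAYLOAD - 1) / SPI_MAX_PAYLOAD;
}

/* Payload length of chunk idx; 0 if idx is past the end of the buffer. */
static size_t spi_chunk_len(size_t len, size_t idx)
{
	size_t off = idx * SPI_MAX_PAYLOAD;

	if (off >= len)
		return 0;

	return (len - off < SPI_MAX_PAYLOAD) ? len - off : SPI_MAX_PAYLOAD;
}

/* Each chunk is one header transfer plus one payload transfer. */
static size_t spi_num_transfers(size_t len)
{
	return 2 * spi_num_chunks(len);
}
```

Because the payload transfers can point straight into the caller's buffer at offset idx * SPI_MAX_PAYLOAD, no per-chunk copy is needed.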
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-12 01:31:15 +03:00
u8 * buf , size_t len ) ;
2019-10-01 22:18:00 +03:00
int sja1105_xfer_u32 ( const struct sja1105_private * priv ,
2019-11-09 13:32:22 +02:00
sja1105_spi_rw_mode_t rw , u64 reg_addr , u32 * value ,
struct ptp_system_timestamp * ptp_sts ) ;
2019-10-01 22:18:00 +03:00
int sja1105_xfer_u64 ( const struct sja1105_private * priv ,
2019-11-09 13:32:22 +02:00
sja1105_spi_rw_mode_t rw , u64 reg_addr , u64 * value ,
struct ptp_system_timestamp * ptp_sts ) ;
2020-09-26 02:04:20 +03:00
int static_config_buf_prepare_for_upload ( struct sja1105_private * priv ,
void * config_buf , int buf_len ) ;
2019-05-02 23:23:30 +03:00
int sja1105_static_config_upload ( struct sja1105_private * priv ) ;
2019-06-08 16:03:43 +03:00
int sja1105_inhibit_tx ( const struct sja1105_private * priv ,
unsigned long port_bitmap , bool tx_inhibited ) ;
2019-05-02 23:23:30 +03:00
2020-06-20 20:18:32 +03:00
extern const struct sja1105_info sja1105e_info ;
extern const struct sja1105_info sja1105t_info ;
extern const struct sja1105_info sja1105p_info ;
extern const struct sja1105_info sja1105q_info ;
extern const struct sja1105_info sja1105r_info ;
extern const struct sja1105_info sja1105s_info ;
net: dsa: sja1105: add support for the SJA1110 switch family
The SJA1110 is basically an SJA1105 with more ports, some integrated
PHYs (100base-T1 and 100base-TX) and an embedded microcontroller which
can be disabled, and the switch core can be controlled by a host running
Linux, over SPI.
This patch contains:
- the static and dynamic config packing functions, for the tables that
are common with SJA1105
- one more static config table which is "unique" to the SJA1110
(actually it is a rehash of stuff that was placed somewhere else in
SJA1105): the PCP Remapping Table
- a reset and clock configuration procedure for the SJA1110 switch.
This resets just the switch subsystem, and gates off the clock which
powers on the embedded microcontroller.
- an RGMII delay configuration procedure for SJA1110, which is very
similar to SJA1105, but different enough for us to be unable to reuse
it (this is a pattern that repeats itself)
- some adaptations to dynamic config table entries which are no longer
programmed in the same way. For example, to delete a VLAN, you used to
write an entry through the dynamic reconfiguration interface with the
desired VLAN ID, and with the VALIDENT bit set to false. Now, the VLAN
table entries contain a TYPE_ENTRY field, which must be set to zero
(in a backwards-incompatible way) in order for the entry to be deleted,
or to some other entry for the VLAN to match "inner tagged" or "outer
tagged" packets.
- a similar thing for the static config: the xMII Mode Parameters Table
encoding for SGMII and MII (the latter just when attached to a
100base-TX PHY) just isn't what it used to be in SJA1105. They are
identical, except there is an extra "special" bit which needs to be
set. Set it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-08 12:25:36 +03:00
extern const struct sja1105_info sja1110a_info ;
extern const struct sja1105_info sja1110b_info ;
extern const struct sja1105_info sja1110c_info ;
extern const struct sja1105_info sja1110d_info ;
2019-05-02 23:23:30 +03:00
/* From sja1105_clocking.c */
typedef enum {
XMII_MAC = 0 ,
XMII_PHY = 1 ,
} sja1105_mii_role_t ;
typedef enum {
XMII_MODE_MII = 0 ,
XMII_MODE_RMII = 1 ,
XMII_MODE_RGMII = 2 ,
2020-03-20 13:29:37 +02:00
XMII_MODE_SGMII = 3 ,
2019-05-02 23:23:30 +03:00
} sja1105_phy_interface_t ;
2019-06-08 19:12:28 +03:00
int sja1105pqrs_setup_rgmii_delay ( const void * ctx , int port ) ;
net: dsa: sja1105: add support for the SJA1110 switch family
The SJA1110 is basically an SJA1105 with more ports, some integrated
PHYs (100base-T1 and 100base-TX) and an embedded microcontroller which
can be disabled, and the switch core can be controlled by a host running
Linux, over SPI.
This patch contains:
- the static and dynamic config packing functions, for the tables that
are common with SJA1105
- one more static config tables which is "unique" to the SJA1110
(actually it is a rehash of stuff that was placed somewhere else in
SJA1105): the PCP Remapping Table
- a reset and clock configuration procedure for the SJA1110 switch.
This resets just the switch subsystem, and gates off the clock which
powers on the embedded microcontroller.
- an RGMII delay configuration procedure for SJA1110, which is very
similar to SJA1105, but different enough for us to be unable to reuse
it (this is a pattern that repeats itself)
- some adaptations to dynamic config table entries which are no longer
programmed in the same way. For example, to delete a VLAN, you used to
write an entry through the dynamic reconfiguration interface with the
desired VLAN ID, and with the VALIDENT bit set to false. Now, the VLAN
table entries contain a TYPE_ENTRY field, which must be set to zero
(in a backwards-incompatible way) in order for the entry to be deleted,
or to some other entry for the VLAN to match "inner tagged" or "outer
tagged" packets.
- a similar thing for the static config: the xMII Mode Parameters Table
encoding for SGMII and MII (the latter just when attached to a
100base-TX PHY) is no longer what it was in SJA1105. The encodings are
otherwise identical, except that there is an extra "special" bit which
needs to be set. Set it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-08 12:25:36 +03:00
int sja1110_setup_rgmii_delay(const void *ctx, int port);
2019-05-02 23:23:30 +03:00
int sja1105_clocking_setup_port(struct sja1105_private *priv, int port);
int sja1105_clocking_setup(struct sja1105_private *priv);
net: dsa: sja1105: properly power down the microcontroller clock for SJA1110
It turns out that powering down the BASE_TIMER_CLK does not turn off the
microcontroller, just its timers, including the one for the watchdog.
So the embedded microcontroller is still running, and potentially still
doing things.
To prevent unwanted interference, we should power down the BASE_MCSS_CLK
as well (MCSS = microcontroller subsystem).
The trouble is that currently we turn off the BASE_TIMER_CLK for SJA1110
from the .clocking_setup() method, mostly because this is a Clock
Generation Unit (CGU) setting which was traditionally configured in that
method for SJA1105. But in SJA1105, the CGU was used for bringing up the
port clocks at the proper speeds, and in SJA1110 it's not (but rather
for initial configuration), so it's best that we rebrand the
sja1110_clocking_setup() method into what it really is - an implementation
of the .disable_microcontroller() method.
Since disabling the microcontroller only needs to be done once, at probe
time, the best place to do that is in sja1105_setup(), before we upload
the static config to the device. This guarantees that the static config
being used by the switch afterwards is really ours.
Note that the procedure to upload a static config necessarily resets the
switch. That reset has never affected the microcontroller, only the
switch core. Since the .disable_microcontroller() method is guaranteed
to have been called by that point, a disabled microcontroller remains
disabled. Add a comment to make that clear.
With the code movement for SJA1110 from .clocking_setup() to
.disable_microcontroller(), both methods are optional and are guarded by
"if" conditions.
Tested by enabling in the device tree the rev-mii switch port 0 that
goes towards the microcontroller, and flashing a firmware that would
have networking. Without this patch, the microcontroller can be pinged,
with this patch it cannot.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18 14:52:54 +03:00
int sja1110_disable_microcontroller(struct sja1105_private *priv);
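The ordering described in the commit message above (optional per-chip
methods guarded by "if" conditions, with the microcontroller disabled
before the static config upload) can be sketched as follows. This is a
minimal illustration and not the driver code: struct example_info,
struct example_priv and every example_* function are hypothetical, and
running the clocking setup after the upload is an assumption.

```c
struct example_priv;

/* Hypothetical per-chip operations. Either callback may be NULL. */
struct example_info {
	int (*disable_microcontroller)(struct example_priv *priv);
	int (*clocking_setup)(struct example_priv *priv);
};

/* Hypothetical driver state, reduced to what the sketch needs to observe. */
struct example_priv {
	const struct example_info *info;
	int mcu_disabled;
	int mcu_disabled_at_upload;
	int clocks_configured;
};

/* Stand-in for an SJA1110-style callback that gates off the MCU clocks. */
int example_disable_mcu(struct example_priv *priv)
{
	priv->mcu_disabled = 1;
	return 0;
}

/* Stand-in for an SJA1105-style callback that brings up the port clocks. */
int example_clocking_setup(struct example_priv *priv)
{
	priv->clocks_configured = 1;
	return 0;
}

/* Stand-in for the static config upload; records the MCU state it saw. */
int example_static_config_upload(struct example_priv *priv)
{
	priv->mcu_disabled_at_upload = priv->mcu_disabled;
	return 0;
}

/* Probe-time setup: both per-chip methods are optional. */
int example_setup(struct example_priv *priv)
{
	int rc;

	/* Disable the microcontroller first, so the uploaded config is ours. */
	if (priv->info->disable_microcontroller) {
		rc = priv->info->disable_microcontroller(priv);
		if (rc < 0)
			return rc;
	}

	rc = example_static_config_upload(priv);
	if (rc < 0)
		return rc;

	/* Assumption: clock setup runs after the upload. */
	if (priv->info->clocking_setup) {
		rc = priv->info->clocking_setup(priv);
		if (rc < 0)
			return rc;
	}

	return 0;
}
```

A chip description that only fills in .disable_microcontroller skips the
clocking step entirely, and one that only fills in .clocking_setup skips
the microcontroller step.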
2019-05-02 23:23:30 +03:00
2019-05-02 23:23:35 +03:00
/* From sja1105_ethtool.c */
void sja1105_get_ethtool_stats(struct dsa_switch *ds, int port, u64 *data);
void sja1105_get_strings(struct dsa_switch *ds, int port,
			 u32 stringset, u8 *data);
int sja1105_get_sset_count(struct dsa_switch *ds, int port, int sset);
2019-05-02 23:23:30 +03:00
2019-05-02 23:23:35 +03:00
/* From sja1105_dynamic_config.c */
2019-05-02 23:23:30 +03:00
int sja1105_dynamic_config_read(struct sja1105_private *priv,
				enum sja1105_blk_idx blk_idx,
				int index, void *entry);
int sja1105_dynamic_config_write(struct sja1105_private *priv,
				 enum sja1105_blk_idx blk_idx,
				 int index, void *entry, bool keep);
2019-06-03 00:15:45 +03:00
enum sja1105_iotag {
	SJA1105_C_TAG = 0, /* Inner VLAN header */
	SJA1105_S_TAG = 1, /* Outer VLAN header */
};
2021-06-08 12:25:36 +03:00
enum sja1110_vlan_type {
	SJA1110_VLAN_INVALID = 0,
	SJA1110_VLAN_C_TAG = 1, /* Single inner VLAN tag */
	SJA1110_VLAN_S_TAG = 2, /* Single outer VLAN tag */
	SJA1110_VLAN_D_TAG = 3, /* Double tagged, use outer tag for lookup */
};
enum sja1110_shaper_type {
	SJA1110_LEAKY_BUCKET_SHAPER = 0,
	SJA1110_CBS_SHAPER = 1,
};
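The VLAN deletion change described in the SJA1110 commit message can be
sketched as follows: on SJA1105 an entry is deleted by writing it with
VALIDENT set to false, while on SJA1110 it is deleted by writing
TYPE_ENTRY as zero. This is only an illustration under stated
assumptions. struct example_vlan_write and example_vlan_write_build()
are hypothetical and do not reflect the driver's real structures or
packing code; only the enum values are copied from this header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from enum sja1110_vlan_type in this header. */
enum sja1110_vlan_type {
	SJA1110_VLAN_INVALID = 0,
	SJA1110_VLAN_C_TAG = 1, /* Single inner VLAN tag */
	SJA1110_VLAN_S_TAG = 2, /* Single outer VLAN tag */
	SJA1110_VLAN_D_TAG = 3, /* Double tagged, use outer tag for lookup */
};

/* Hypothetical, simplified view of one dynamic VLAN table write. */
struct example_vlan_write {
	uint16_t vlanid;
	bool valident;		/* meaningful on SJA1105 only */
	uint8_t type_entry;	/* meaningful on SJA1110 only */
};

/*
 * Build a write that either keeps or deletes VLAN 'vid'.
 * 'wanted' is the TYPE_ENTRY value to use when keeping an SJA1110 entry.
 */
struct example_vlan_write
example_vlan_write_build(bool is_sja1110, uint16_t vid, bool keep,
			 enum sja1110_vlan_type wanted)
{
	struct example_vlan_write w = { 0 };

	w.vlanid = vid;

	if (is_sja1110) {
		/* Deletion is expressed through TYPE_ENTRY == 0. */
		w.type_entry = keep ? (uint8_t)wanted : SJA1110_VLAN_INVALID;
	} else {
		/* Deletion is expressed through VALIDENT == false. */
		w.valident = keep;
	}

	return w;
}
```

The same "keep" request therefore ends up in two different fields
depending on the switch family, which is why the commit message calls
the change backwards-incompatible.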
2019-06-03 00:11:57 +03:00
u8 sja1105et_fdb_hash(struct sja1105_private *priv, const u8 *addr, u16 vid);
int sja1105et_fdb_add(struct dsa_switch *ds, int port,
		      const unsigned char *addr, u16 vid);
int sja1105et_fdb_del(struct dsa_switch *ds, int port,
		      const unsigned char *addr, u16 vid);
int sja1105pqrs_fdb_add(struct dsa_switch *ds, int port,
			const unsigned char *addr, u16 vid);
int sja1105pqrs_fdb_del(struct dsa_switch *ds, int port,
			const unsigned char *addr, u16 vid);
2019-05-02 23:23:31 +03:00
2020-03-29 14:52:02 +03:00
/* From sja1105_flower.c */
int sja1105_cls_flower_del(struct dsa_switch *ds, int port,
			   struct flow_cls_offload *cls, bool ingress);
int sja1105_cls_flower_add(struct dsa_switch *ds, int port,
			   struct flow_cls_offload *cls, bool ingress);
net: dsa: sja1105: implement tc-gate using time-triggered virtual links
Restrict the TTEthernet hardware support on this switch to operate as
closely as possible to IEEE 802.1Qci. This means that it can
perform PTP-time-based ingress admission control on streams identified
by {DMAC, VID, PCP}, which is useful when trying to ensure the
determinism of traffic scheduled via IEEE 802.1Qbv.
The oddity comes from the fact that in hardware (and in TTEthernet at
large), virtual links always need a full-blown action, including not
only the type of policing, but also the list of destination ports. So in
practice, a single tc-gate action will result in all packets getting
dropped. Additional actions (either "trap" or "redirect") need to be
specified in the same filter rule such that the conforming packets are
actually forwarded somewhere.
Apart from the VL Lookup, Policing and Forwarding tables which need to
be programmed for each flow (virtual link), the Schedule engine also
needs to be told to open/close the admission gates for each individual
virtual link. A fairly accurate (and detailed) description of how that
works is already present in sja1105_tas.c, since it is already used to
trigger the egress gates for the tc-taprio offload (IEEE 802.1Qbv). The
key point to remember is that the schedule engine supports 8
"subschedules" (execution threads that iterate through the global
schedule in parallel), and that no 2 hardware threads may execute a
schedule entry at the same time. For tc-taprio, each egress port used
one of these 8 subschedules, leaving a total of 4 subschedules unused.
In principle we could have allocated 1 subschedule for the tc-gate
offload of each ingress port, but actually the schedules of all virtual
links installed on each ingress port would have needed to be merged
together, before they could have been programmed to hardware. So
simplify our life and just merge the entire tc-gate configuration, for
all virtual links on all ingress ports, into a single subschedule. Be
sure to check that against the usual hardware scheduling conflicts, and
program it to hardware alongside any tc-taprio subschedule that may be
present.
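The constraint that no 2 hardware threads may execute a schedule entry
at the same time can be illustrated with a small check. The sketch below
is not the algorithm used in sja1105_tas.c. It assumes that both
subschedules share the same base time and that each entry is described
by its trigger time relative to the start of its cycle. Under those
assumptions, two periodic triggers with periods c1 and c2 coincide at
some point exactly when their offsets differ by a multiple of
gcd(c1, c2). All example_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one subschedule. */
struct example_subschedule {
	uint64_t cycle_ns;		/* cycle length, must be non-zero */
	size_t num_entries;
	const uint64_t *start_ns;	/* trigger times, each < cycle_ns */
};

static uint64_t example_gcd(uint64_t a, uint64_t b)
{
	while (b) {
		uint64_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* Return true if any entry of 'a' ever triggers together with one of 'b'. */
bool example_subschedules_conflict(const struct example_subschedule *a,
				   const struct example_subschedule *b)
{
	uint64_t g;
	size_t i, j;

	if (!a->cycle_ns || !b->cycle_ns)
		return false; /* nothing scheduled, nothing to collide with */

	g = example_gcd(a->cycle_ns, b->cycle_ns);

	for (i = 0; i < a->num_entries; i++) {
		for (j = 0; j < b->num_entries; j++) {
			uint64_t ta = a->start_ns[i];
			uint64_t tb = b->start_ns[j];
			uint64_t diff = ta > tb ? ta - tb : tb - ta;

			/* ta + x * ca == tb + y * cb has a solution */
			if (diff % g == 0)
				return true;
		}
	}

	return false;
}
```

A merged tc-gate subschedule would be checked this way against each
tc-taprio subschedule before anything is programmed to hardware.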
The following scenarios were tested:
1. Quantitative testing:
tc qdisc add dev swp2 clsact
tc filter add dev swp2 ingress flower skip_sw \
dst_mac 42:be:24:9b:76:20 \
action gate index 1 base-time 0 \
sched-entry OPEN 1200 -1 -1 \
sched-entry CLOSE 1200 -1 -1 \
action trap
ping 192.168.1.2 -f
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
.............................
--- 192.168.1.2 ping statistics ---
948 packets transmitted, 467 received, 50.7384% packet loss, time 9671ms
2. Qualitative testing (with a phase-aligned schedule - the clocks are
synchronized by ptp4l, not shown here):
Receiver (sja1105):
tc qdisc add dev swp2 clsact
now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \
sec=$(echo $now | awk -F. '{print $1}') && \
base_time="$(((sec + 2) * 1000000000))" && \
echo "base time ${base_time}"
tc filter add dev swp2 ingress flower skip_sw \
dst_mac 42:be:24:9b:76:20 \
action gate base-time ${base_time} \
sched-entry OPEN 60000 -1 -1 \
sched-entry CLOSE 40000 -1 -1 \
action trap
Sender (enetc):
now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \
sec=$(echo $now | awk -F. '{print $1}') && \
base_time="$(((sec + 2) * 1000000000))" && \
echo "base time ${base_time}"
tc qdisc add dev eno0 parent root taprio \
num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time ${base_time} \
sched-entry S 01 50000 \
sched-entry S 00 50000 \
flags 2
ping -A 192.168.1.1
PING 192.168.1.1 (192.168.1.1): 56 data bytes
...
^C
--- 192.168.1.1 ping statistics ---
1425 packets transmitted, 1424 packets received, 0% packet loss
round-trip min/avg/max = 0.322/0.361/0.990 ms
And just for comparison, with the tc-taprio schedule deleted:
ping -A 192.168.1.1
PING 192.168.1.1 (192.168.1.1): 56 data bytes
...
^C
--- 192.168.1.1 ping statistics ---
33 packets transmitted, 19 packets received, 42% packet loss
round-trip min/avg/max = 0.336/0.464/0.597 ms
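As a rough cross-check of the quantitative test (scenario 1 above): if
packets arrive with no phase alignment to the gate schedule, the
expected drop ratio is approximately the fraction of the cycle during
which the gate is closed. The helper below computes that fraction. It
is a back-of-the-envelope sketch, not driver code, and struct
example_gate_entry and example_gate_closed_fraction() are hypothetical
names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of one "sched-entry OPEN|CLOSE <interval>" item. */
struct example_gate_entry {
	int open;		/* 1 for OPEN, 0 for CLOSE */
	uint64_t interval_ns;
};

/*
 * Fraction of the cycle during which the gate is closed, in [0, 1].
 * Returns -1.0 for an empty or zero-length cycle.
 */
double example_gate_closed_fraction(const struct example_gate_entry *entries,
				    size_t n)
{
	uint64_t total = 0, closed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		total += entries[i].interval_ns;
		if (!entries[i].open)
			closed += entries[i].interval_ns;
	}

	if (total == 0)
		return -1.0;

	return (double)closed / (double)total;
}
```

For the OPEN 1200 / CLOSE 1200 schedule this gives 0.5, which is in the
same range as the roughly 50.7% loss reported by the flood ping.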
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-05 22:20:56 +03:00
int sja1105_cls_flower_stats(struct dsa_switch *ds, int port,
			     struct flow_cls_offload *cls, bool ingress);
2020-03-29 14:52:02 +03:00
void sja1105_flower_setup(struct dsa_switch *ds);
void sja1105_flower_teardown(struct dsa_switch *ds);
2020-05-05 22:20:55 +03:00
struct sja1105_rule *sja1105_rule_find(struct sja1105_private *priv,
				       unsigned long cookie);
2020-03-29 14:52:02 +03:00
2019-05-02 23:23:30 +03:00
#endif