338 lines
9.0 KiB
C
Raw Normal View History

/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright (c) 2018, Sensor-Technik Wiedemann GmbH
* Copyright (c) 2018-2019, Vladimir Oltean <olteanv@gmail.com>
*/
#ifndef _SJA1105_H
#define _SJA1105_H
net: dsa: sja1105: Add support for the PTP clock The design of this PHC driver is influenced by the switch's behavior w.r.t. timestamping. It exposes two PTP counters, one free-running (PTPTSCLK) and the other offset- and frequency-corrected in hardware through PTPCLKVAL, PTPCLKADD and PTPCLKRATE. The MACs can sample either of these for frame timestamps. However, the user manual warns that taking timestamps based on the corrected clock is less than useful, as the switch can deliver corrupted timestamps in a variety of circumstances. Therefore, this PHC uses the free-running PTPTSCLK together with a timecounter/cyclecounter structure that translates it into a software time domain. Thus, the settime/adjtime and adjfine callbacks are hardware no-ops. The timestamps (introduced in a further patch) will also be translated to the correct time domain before being handed over to the userspace PTP stack. The introduction of a second set of PHC operations that operate on the hardware PTPCLKVAL/PTPCLKADD/PTPCLKRATE in the future is somewhat unavoidable, as the TTEthernet core uses the corrected PTP time domain. However, the free-running counter + timecounter structure combination will suffice for now, as the resulting timestamps yield a sub-50 ns synchronization offset in steady state using linuxptp. For this patch, in absence of frame timestamping, the operations of the switch PHC were tested by syncing it to the system time as a local slave clock with: phc2sys -s CLOCK_REALTIME -c swp2 -O 0 -m -S 0.01 Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-08 15:04:34 +03:00
#include <linux/ptp_clock_kernel.h>
#include <linux/timecounter.h>
#include <linux/dsa/sja1105.h>
net: dsa: sja1105: implement cross-chip bridging operations sja1105 uses dsa_8021q for DSA tagging, a format which is VLAN at heart and which is compatible with cascading. A complete description of this tagging format is in net/dsa/tag_8021q.c, but a quick summary is that each external-facing port tags incoming frames with a unique pvid, and this special VLAN is transmitted as tagged towards the inside of the system, and as untagged towards the exterior. The tag encodes the switch id and the source port index. This means that cross-chip bridging for dsa_8021q only entails adding the dsa_8021q pvids of one switch to the RX filter of the other switches. Everything else falls naturally into place, as long as the bottom-end of ports (the leaves in the tree) is comprised exclusively of dsa_8021q-compatible (i.e. sja1105 switches). Otherwise, there would be a chance that a front-panel switch transmits a packet tagged with a dsa_8021q header, header which it wouldn't be able to remove, and which would hence "leak" out. The only use case I tested (due to lack of board availability) was when the sja1105 switches are part of disjoint trees (however, this doesn't change the fact that multiple sja1105 switches still need unique switch identifiers in such a system). But in principle, even "true" single-tree setups (with DSA links) should work just as fine, except for a small change which I can't test: dsa_towards_port should be used instead of dsa_upstream_port (I made the assumption that the routing port that any sja1105 should use towards its neighbours is the CPU port. That might not hold true in other setups). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-05-10 19:37:43 +03:00
#include <linux/dsa/8021q.h>
#include <net/dsa.h>
#include <linux/mutex.h>
#include "sja1105_static_config.h"
#define SJA1105_NUM_PORTS 5
#define SJA1105_NUM_TC 8
#define SJA1105ET_FDB_BIN_SIZE 4
/* The hardware value is in multiples of 10 ms.
* The passed parameter is in multiples of 1 ms.
*/
#define SJA1105_AGEING_TIME_MS(ms) ((ms) / 10)
#define SJA1105_NUM_L2_POLICERS 45
typedef enum {
SPI_READ = 0,
SPI_WRITE = 1,
} sja1105_spi_rw_mode_t;
net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio offload This qdisc offload is the closest thing to what the SJA1105 supports in hardware for time-based egress shaping. The switch core really is built around SAE AS6802/TTEthernet (a TTTech standard) but can be made to operate similarly to IEEE 802.1Qbv with some constraints: - The gate control list is a global list for all ports. There are 8 execution threads that iterate through this global list in parallel. I don't know why 8, there are only 4 front-panel ports. - Care must be taken by the user to make sure that two execution threads never get to execute a GCL entry simultaneously. I created a O(n^4) checker for this hardware limitation, prior to accepting a taprio offload configuration as valid. - The spec says that if a GCL entry's interval is shorter than the frame length, you shouldn't send it (and end up in head-of-line blocking). Well, this switch does anyway. - The switch has no concept of ADMIN and OPER configurations. Because it's so simple, the TAS settings are loaded through the static config tables interface, so there isn't even place for any discussion about 'graceful switchover between ADMIN and OPER'. You just reset the switch and upload a new OPER config. - The switch accepts multiple time sources for the gate events. Right now I am using the standalone clock source as opposed to PTP. So the base time parameter doesn't really do much. Support for the PTP clock source will be added in a future series. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-15 05:00:02 +03:00
#include "sja1105_tas.h"
#include "sja1105_ptp.h"
net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio offload This qdisc offload is the closest thing to what the SJA1105 supports in hardware for time-based egress shaping. The switch core really is built around SAE AS6802/TTEthernet (a TTTech standard) but can be made to operate similarly to IEEE 802.1Qbv with some constraints: - The gate control list is a global list for all ports. There are 8 execution threads that iterate through this global list in parallel. I don't know why 8, there are only 4 front-panel ports. - Care must be taken by the user to make sure that two execution threads never get to execute a GCL entry simultaneously. I created a O(n^4) checker for this hardware limitation, prior to accepting a taprio offload configuration as valid. - The spec says that if a GCL entry's interval is shorter than the frame length, you shouldn't send it (and end up in head-of-line blocking). Well, this switch does anyway. - The switch has no concept of ADMIN and OPER configurations. Because it's so simple, the TAS settings are loaded through the static config tables interface, so there isn't even place for any discussion about 'graceful switchover between ADMIN and OPER'. You just reset the switch and upload a new OPER config. - The switch accepts multiple time sources for the gate events. Right now I am using the standalone clock source as opposed to PTP. So the base time parameter doesn't really do much. Support for the PTP clock source will be added in a future series. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-15 05:00:02 +03:00
/* Keeps the different addresses between E/T and P/Q/R/S */
struct sja1105_regs {
u64 device_id;
u64 prod_id;
u64 status;
u64 port_control;
u64 rgu;
net: dsa: sja1105: implement tc-gate using time-triggered virtual links Restrict the TTEthernet hardware support on this switch to operate as closely as possible to IEEE 802.1Qci as possible. This means that it can perform PTP-time-based ingress admission control on streams identified by {DMAC, VID, PCP}, which is useful when trying to ensure the determinism of traffic scheduled via IEEE 802.1Qbv. The oddity comes from the fact that in hardware (and in TTEthernet at large), virtual links always need a full-blown action, including not only the type of policing, but also the list of destination ports. So in practice, a single tc-gate action will result in all packets getting dropped. Additional actions (either "trap" or "redirect") need to be specified in the same filter rule such that the conforming packets are actually forwarded somewhere. Apart from the VL Lookup, Policing and Forwarding tables which need to be programmed for each flow (virtual link), the Schedule engine also needs to be told to open/close the admission gates for each individual virtual link. A fairly accurate (and detailed) description of how that works is already present in sja1105_tas.c, since it is already used to trigger the egress gates for the tc-taprio offload (IEEE 802.1Qbv). Key point here, we remember that the schedule engine supports 8 "subschedules" (execution threads that iterate through the global schedule in parallel, and that no 2 hardware threads must execute a schedule entry at the same time). For tc-taprio, each egress port used one of these 8 subschedules, leaving a total of 4 subschedules unused. In principle we could have allocated 1 subschedule for the tc-gate offload of each ingress port, but actually the schedules of all virtual links installed on each ingress port would have needed to be merged together, before they could have been programmed to hardware. So simplify our life and just merge the entire tc-gate configuration, for all virtual links on all ingress ports, into a single subschedule. Be sure to check that against the usual hardware scheduling conflicts, and program it to hardware alongside any tc-taprio subschedule that may be present. The following scenarios were tested: 1. Quantitative testing: tc qdisc add dev swp2 clsact tc filter add dev swp2 ingress flower skip_sw \ dst_mac 42:be:24:9b:76:20 \ action gate index 1 base-time 0 \ sched-entry OPEN 1200 -1 -1 \ sched-entry CLOSE 1200 -1 -1 \ action trap ping 192.168.1.2 -f PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data. ............................. --- 192.168.1.2 ping statistics --- 948 packets transmitted, 467 received, 50.7384% packet loss, time 9671ms 2. Qualitative testing (with a phase-aligned schedule - the clocks are synchronized by ptp4l, not shown here): Receiver (sja1105): tc qdisc add dev swp2 clsact now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \ sec=$(echo $now | awk -F. '{print $1}') && \ base_time="$(((sec + 2) * 1000000000))" && \ echo "base time ${base_time}" tc filter add dev swp2 ingress flower skip_sw \ dst_mac 42:be:24:9b:76:20 \ action gate base-time ${base_time} \ sched-entry OPEN 60000 -1 -1 \ sched-entry CLOSE 40000 -1 -1 \ action trap Sender (enetc): now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \ sec=$(echo $now | awk -F. '{print $1}') && \ base_time="$(((sec + 2) * 1000000000))" && \ echo "base time ${base_time}" tc qdisc add dev eno0 parent root taprio \ num_tc 8 \ map 0 1 2 3 4 5 6 7 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ base-time ${base_time} \ sched-entry S 01 50000 \ sched-entry S 00 50000 \ flags 2 ping -A 192.168.1.1 PING 192.168.1.1 (192.168.1.1): 56 data bytes ... ^C --- 192.168.1.1 ping statistics --- 1425 packets transmitted, 1424 packets received, 0% packet loss round-trip min/avg/max = 0.322/0.361/0.990 ms And just for comparison, with the tc-taprio schedule deleted: ping -A 192.168.1.1 PING 192.168.1.1 (192.168.1.1): 56 data bytes ... ^C --- 192.168.1.1 ping statistics --- 33 packets transmitted, 19 packets received, 42% packet loss round-trip min/avg/max = 0.336/0.464/0.597 ms Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-05 22:20:56 +03:00
u64 vl_status;
u64 config;
net: dsa: sja1105: Add support for the SGMII port SJA1105 switches R and S have one SerDes port with an 802.3z quasi-compatible PCS, hardwired on port 4. The other ports are still MII/RMII/RGMII. The PCS performs rate adaptation to lower link speeds; the MAC on this port is hardwired at gigabit. Only full duplex is supported. The SGMII port can be configured as part of the static config tables, as well as through a dedicated SPI address region for its pseudo-clause-22 registers. However it looks like the static configuration is not able to change some out-of-reset values (like the value of MII_BMCR), so at the end of the day, having code for it is utterly pointless. We are just going to use the pseudo-C22 interface. Because the PCS gets reset when the switch resets, we have to add even more restoration logic to sja1105_static_config_reload, otherwise the SGMII port breaks after operations such as enabling PTP timestamping which require a switch reset. >From PHYLINK perspective, the switch supports *only* SGMII (it doesn't support 1000Base-X). It also doesn't expose access to the raw config word for in-band AN in registers MII_ADV/MII_LPA. It is able to work in the following modes: - Forced speed - SGMII in-band AN slave (speed received from PHY) - SGMII in-band AN master (acting as a PHY) The latter mode is not supported by this patch. It is even unclear to me how that would be described. There is some code for it left in the patch, but 'an_master' is always passed as false. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-20 13:29:37 +02:00
u64 sgmii;
u64 rmii_pll1;
net: dsa: sja1105: configure the PTP_CLK pin as EXT_TS or PER_OUT The SJA1105 switch family has a PTP_CLK pin which emits a signal with fixed 50% duty cycle, but variable frequency and programmable start time. On the second generation (P/Q/R/S) switches, this pin supports even more functionality. The use case described by the hardware documents talks about synchronization via oneshot pulses: given 2 sja1105 switches, arbitrarily designated as a master and a slave, the master emits a single pulse on PTP_CLK, while the slave is configured to timestamp this pulse received on its PTP_CLK pin (which must obviously be configured as input). The difference between the timestamps then exactly becomes the slave offset to the master. The only trouble with the above is that the hardware is very much tied into this use case only, and not very generic beyond that: - When emitting a oneshot pulse, instead of being told when to emit it, the switch just does it "now" and tells you later what time it was, via the PTPSYNCTS register. [ Incidentally, this is the same register that the slave uses to collect the ext_ts timestamp from, too. ] - On the sync slave, there is no interrupt mechanism on reception of a new extts, and no FIFO to buffer them, because in the foreseen use case, software is in control of both the master and the slave pins, so it "knows" when there's something to collect. These 2 problems mean that: - We don't support (at least yet) the quirky oneshot mode exposed by the hardware, just normal periodic output. - We abuse the hardware a little bit when we expose generic extts. Because there's no interrupt mechanism, we need to poll at double the frequency we expect to receive a pulse. Currently that means a non-configurable "twice a second". Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-24 00:59:24 +02:00
u64 ptppinst;
u64 ptppindur;
net: dsa: sja1105: Add support for the PTP clock The design of this PHC driver is influenced by the switch's behavior w.r.t. timestamping. It exposes two PTP counters, one free-running (PTPTSCLK) and the other offset- and frequency-corrected in hardware through PTPCLKVAL, PTPCLKADD and PTPCLKRATE. The MACs can sample either of these for frame timestamps. However, the user manual warns that taking timestamps based on the corrected clock is less than useful, as the switch can deliver corrupted timestamps in a variety of circumstances. Therefore, this PHC uses the free-running PTPTSCLK together with a timecounter/cyclecounter structure that translates it into a software time domain. Thus, the settime/adjtime and adjfine callbacks are hardware no-ops. The timestamps (introduced in a further patch) will also be translated to the correct time domain before being handed over to the userspace PTP stack. The introduction of a second set of PHC operations that operate on the hardware PTPCLKVAL/PTPCLKADD/PTPCLKRATE in the future is somewhat unavoidable, as the TTEthernet core uses the corrected PTP time domain. However, the free-running counter + timecounter structure combination will suffice for now, as the resulting timestamps yield a sub-50 ns synchronization offset in steady state using linuxptp. For this patch, in absence of frame timestamping, the operations of the switch PHC were tested by syncing it to the system time as a local slave clock with: phc2sys -s CLOCK_REALTIME -c swp2 -O 0 -m -S 0.01 Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-08 15:04:34 +03:00
u64 ptp_control;
u64 ptpclkval;
net: dsa: sja1105: Add support for the PTP clock The design of this PHC driver is influenced by the switch's behavior w.r.t. timestamping. It exposes two PTP counters, one free-running (PTPTSCLK) and the other offset- and frequency-corrected in hardware through PTPCLKVAL, PTPCLKADD and PTPCLKRATE. The MACs can sample either of these for frame timestamps. However, the user manual warns that taking timestamps based on the corrected clock is less than useful, as the switch can deliver corrupted timestamps in a variety of circumstances. Therefore, this PHC uses the free-running PTPTSCLK together with a timecounter/cyclecounter structure that translates it into a software time domain. Thus, the settime/adjtime and adjfine callbacks are hardware no-ops. The timestamps (introduced in a further patch) will also be translated to the correct time domain before being handed over to the userspace PTP stack. The introduction of a second set of PHC operations that operate on the hardware PTPCLKVAL/PTPCLKADD/PTPCLKRATE in the future is somewhat unavoidable, as the TTEthernet core uses the corrected PTP time domain. However, the free-running counter + timecounter structure combination will suffice for now, as the resulting timestamps yield a sub-50 ns synchronization offset in steady state using linuxptp. For this patch, in absence of frame timestamping, the operations of the switch PHC were tested by syncing it to the system time as a local slave clock with: phc2sys -s CLOCK_REALTIME -c swp2 -O 0 -m -S 0.01 Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-08 15:04:34 +03:00
u64 ptpclkrate;
net: dsa: sja1105: Implement state machine for TAS with PTP clock source Tested using the following bash script and the tc from iproute2-next: #!/bin/bash set -e -u -o pipefail NSEC_PER_SEC="1000000000" gatemask() { local tc_list="$1" local mask=0 for tc in ${tc_list}; do mask=$((${mask} | (1 << ${tc}))) done printf "%02x" ${mask} } if ! systemctl is-active --quiet ptp4l; then echo "Please start the ptp4l service" exit fi now=$(phc_ctl /dev/ptp1 get | gawk '/clock time is/ { print $5; }') # Phase-align the base time to the start of the next second. sec=$(echo "${now}" | gawk -F. '{ print $1; }') base_time="$(((${sec} + 1) * ${NSEC_PER_SEC}))" tc qdisc add dev swp5 parent root handle 100 taprio \ num_tc 8 \ map 0 1 2 3 5 6 7 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ base-time ${base_time} \ sched-entry S $(gatemask 7) 100000 \ sched-entry S $(gatemask "0 1 2 3 4 5 6") 400000 \ clockid CLOCK_TAI flags 2 The "state machine" is a workqueue invoked after each manipulation command on the PTP clock (reset, adjust time, set time, adjust frequency) which checks over the state of the time-aware scheduler. So it is not monitored periodically, only in reaction to a PTP command typically triggered from a userspace daemon (linuxptp). Otherwise there is no reason for things to go wrong. Now that the timecounter/cyclecounter has been replaced with hardware operations on the PTP clock, the TAS Kconfig now depends upon PTP and the standalone clocksource operating mode has been removed. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-12 02:11:54 +02:00
u64 ptpclkcorp;
net: dsa: sja1105: configure the PTP_CLK pin as EXT_TS or PER_OUT The SJA1105 switch family has a PTP_CLK pin which emits a signal with fixed 50% duty cycle, but variable frequency and programmable start time. On the second generation (P/Q/R/S) switches, this pin supports even more functionality. The use case described by the hardware documents talks about synchronization via oneshot pulses: given 2 sja1105 switches, arbitrarily designated as a master and a slave, the master emits a single pulse on PTP_CLK, while the slave is configured to timestamp this pulse received on its PTP_CLK pin (which must obviously be configured as input). The difference between the timestamps then exactly becomes the slave offset to the master. The only trouble with the above is that the hardware is very much tied into this use case only, and not very generic beyond that: - When emitting a oneshot pulse, instead of being told when to emit it, the switch just does it "now" and tells you later what time it was, via the PTPSYNCTS register. [ Incidentally, this is the same register that the slave uses to collect the ext_ts timestamp from, too. ] - On the sync slave, there is no interrupt mechanism on reception of a new extts, and no FIFO to buffer them, because in the foreseen use case, software is in control of both the master and the slave pins, so it "knows" when there's something to collect. These 2 problems mean that: - We don't support (at least yet) the quirky oneshot mode exposed by the hardware, just normal periodic output. - We abuse the hardware a little bit when we expose generic extts. Because there's no interrupt mechanism, we need to poll at double the frequency we expect to receive a pulse. Currently that means a non-configurable "twice a second". Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-24 00:59:24 +02:00
u64 ptpsyncts;
net: dsa: sja1105: Implement state machine for TAS with PTP clock source Tested using the following bash script and the tc from iproute2-next: #!/bin/bash set -e -u -o pipefail NSEC_PER_SEC="1000000000" gatemask() { local tc_list="$1" local mask=0 for tc in ${tc_list}; do mask=$((${mask} | (1 << ${tc}))) done printf "%02x" ${mask} } if ! systemctl is-active --quiet ptp4l; then echo "Please start the ptp4l service" exit fi now=$(phc_ctl /dev/ptp1 get | gawk '/clock time is/ { print $5; }') # Phase-align the base time to the start of the next second. sec=$(echo "${now}" | gawk -F. '{ print $1; }') base_time="$(((${sec} + 1) * ${NSEC_PER_SEC}))" tc qdisc add dev swp5 parent root handle 100 taprio \ num_tc 8 \ map 0 1 2 3 5 6 7 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ base-time ${base_time} \ sched-entry S $(gatemask 7) 100000 \ sched-entry S $(gatemask "0 1 2 3 4 5 6") 400000 \ clockid CLOCK_TAI flags 2 The "state machine" is a workqueue invoked after each manipulation command on the PTP clock (reset, adjust time, set time, adjust frequency) which checks over the state of the time-aware scheduler. So it is not monitored periodically, only in reaction to a PTP command typically triggered from a userspace daemon (linuxptp). Otherwise there is no reason for things to go wrong. Now that the timecounter/cyclecounter has been replaced with hardware operations on the PTP clock, the TAS Kconfig now depends upon PTP and the standalone clocksource operating mode has been removed. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-12 02:11:54 +02:00
u64 ptpschtm;
u64 ptpegr_ts[SJA1105_NUM_PORTS];
u64 pad_mii_tx[SJA1105_NUM_PORTS];
u64 pad_mii_rx[SJA1105_NUM_PORTS];
u64 pad_mii_id[SJA1105_NUM_PORTS];
u64 cgu_idiv[SJA1105_NUM_PORTS];
u64 mii_tx_clk[SJA1105_NUM_PORTS];
u64 mii_rx_clk[SJA1105_NUM_PORTS];
u64 mii_ext_tx_clk[SJA1105_NUM_PORTS];
u64 mii_ext_rx_clk[SJA1105_NUM_PORTS];
u64 rgmii_tx_clk[SJA1105_NUM_PORTS];
u64 rmii_ref_clk[SJA1105_NUM_PORTS];
u64 rmii_ext_tx_clk[SJA1105_NUM_PORTS];
u64 mac[SJA1105_NUM_PORTS];
u64 mac_hl1[SJA1105_NUM_PORTS];
u64 mac_hl2[SJA1105_NUM_PORTS];
u64 ether_stats[SJA1105_NUM_PORTS];
u64 qlevel[SJA1105_NUM_PORTS];
};
struct sja1105_info {
u64 device_id;
/* Needed for distinction between P and R, and between Q and S
* (since the parts with/without SGMII share the same
* switch core and device_id)
*/
u64 part_no;
/* E/T and P/Q/R/S have partial timestamps of different sizes.
* They must be reconstructed on both families anyway to get the full
* 64-bit values back.
*/
int ptp_ts_bits;
/* Also SPI commands are of different sizes to retrieve
* the egress timestamps.
*/
int ptpegr_ts_bytes;
int num_cbs_shapers;
const struct sja1105_dynamic_table_ops *dyn_ops;
const struct sja1105_table_ops *static_ops;
const struct sja1105_regs *regs;
net: dsa: sja1105: prepare tagger for handling DSA tags and VLAN simultaneously In VLAN-unaware mode, sja1105 uses VLAN tags with a custom TPID of 0xdadb. While in the yet-to-be introduced best_effort_vlan_filtering mode, it needs to work with normal VLAN TPID values. A complication arises when we must transmit a VLAN-tagged packet to the switch when it's in VLAN-aware mode. We need to construct a packet with 2 VLAN tags, and the switch will use the outer header for routing and pop it on egress. But sadly, here the 2 hardware generations don't behave the same: - E/T switches won't pop an ETH_P_8021AD tag on egress, it seems (packets will remain double-tagged). - P/Q/R/S switches will drop a packet with 2 ETH_P_8021Q tags (it looks like it tries to prevent VLAN hopping). But looks like the reverse is also true: - E/T switches have no problem popping the outer tag from packets with 2 ETH_P_8021Q tags. - P/Q/R/S will have no problem popping a single tag even if that is ETH_P_8021AD. So it is clear that if we want the hardware to work with dsa_8021q tagging in VLAN-aware mode, we need to send different TPIDs depending on revision. Keep that information in priv->info->qinq_tpid. The per-port tagger structure will hold an xmit_tpid value that depends not only upon the qinq_tpid, but also upon the VLAN awareness state itself (in case we must transmit using 0xdadb). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-12 20:20:32 +03:00
/* Both E/T and P/Q/R/S have quirks when it comes to popping the S-Tag
* from double-tagged frames. E/T will pop it only when it's equal to
* TPID from the General Parameters Table, while P/Q/R/S will only
* pop it when it's equal to TPID2.
*/
u16 qinq_tpid;
int (*reset_cmd)(struct dsa_switch *ds);
net: dsa: sja1105: Error out if RGMII delays are requested in DT Documentation/devicetree/bindings/net/ethernet.txt is confusing because it says what the MAC should not do, but not what it *should* do: * "rgmii-rxid" (RGMII with internal RX delay provided by the PHY, the MAC should not add an RX delay in this case) The gap in semantics is threefold: 1. Is it illegal for the MAC to apply the Rx internal delay by itself, and simplify the phy_mode (mask off "rgmii-rxid" into "rgmii") before passing it to of_phy_connect? The documentation would suggest yes. 1. For "rgmii-rxid", while the situation with the Rx clock skew is more or less clear (needs to be added by the PHY), what should the MAC driver do about the Tx delays? Is it an implicit wild card for the MAC to apply delays in the Tx direction if it can? What if those were already added as serpentine PCB traces, how could that be made more obvious through DT bindings so that the MAC doesn't attempt to add them twice and again potentially break the link? 3. If the interface is a fixed-link and therefore the PHY object is fixed (a purely software entity that obviously cannot add clock skew), what is the meaning of the above property? So an interpretation of the RGMII bindings was chosen that hopefully does not contradict their intention but also makes them more applied. The SJA1105 driver understands to act upon "rgmii-*id" phy-mode bindings if the port is in the PHY role (either explicitly, or if it is a fixed-link). Otherwise it always passes the duty of setting up delays to the PHY driver. The error behavior that this patch adds is required on SJA1105E/T where the MAC really cannot apply internal delays. If the other end of the fixed-link cannot apply RGMII delays either (this would be specified through its own DT bindings), then the situation requires PCB delays. For SJA1105P/Q/R/S, this is however hardware supported and the error is thus only temporary. I created a stub function pointer for configuring delays per-port on RXC and TXC, and will implement it when I have access to a board with this hardware setup. Meanwhile do not allow the user to select an invalid configuration. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-02 23:23:32 +03:00
int (*setup_rgmii_delay)(const void *ctx, int port);
/* Prototypes from include/net/dsa.h */
int (*fdb_add_cmd)(struct dsa_switch *ds, int port,
const unsigned char *addr, u16 vid);
int (*fdb_del_cmd)(struct dsa_switch *ds, int port,
const unsigned char *addr, u16 vid);
void (*ptp_cmd_packing)(u8 *buf, struct sja1105_ptp_cmd *cmd,
enum packing_op op);
const char *name;
};
enum sja1105_key_type {
SJA1105_KEY_BCAST,
SJA1105_KEY_TC,
SJA1105_KEY_VLAN_UNAWARE_VL,
SJA1105_KEY_VLAN_AWARE_VL,
};
struct sja1105_key {
enum sja1105_key_type type;
union {
/* SJA1105_KEY_TC */
struct {
int pcp;
} tc;
/* SJA1105_KEY_VLAN_UNAWARE_VL */
/* SJA1105_KEY_VLAN_AWARE_VL */
struct {
u64 dmac;
u16 vid;
u16 pcp;
} vl;
};
};
enum sja1105_rule_type {
SJA1105_RULE_BCAST_POLICER,
SJA1105_RULE_TC_POLICER,
SJA1105_RULE_VL,
};
enum sja1105_vl_type {
SJA1105_VL_NONCRITICAL,
SJA1105_VL_RATE_CONSTRAINED,
SJA1105_VL_TIME_TRIGGERED,
};
struct sja1105_rule {
struct list_head list;
unsigned long cookie;
unsigned long port_mask;
struct sja1105_key key;
enum sja1105_rule_type type;
/* Action */
union {
/* SJA1105_RULE_BCAST_POLICER */
struct {
int sharindx;
} bcast_pol;
/* SJA1105_RULE_TC_POLICER */
struct {
int sharindx;
} tc_pol;
/* SJA1105_RULE_VL */
struct {
enum sja1105_vl_type type;
net: dsa: sja1105: implement tc-gate using time-triggered virtual links Restrict the TTEthernet hardware support on this switch to operate as closely as possible to IEEE 802.1Qci as possible. This means that it can perform PTP-time-based ingress admission control on streams identified by {DMAC, VID, PCP}, which is useful when trying to ensure the determinism of traffic scheduled via IEEE 802.1Qbv. The oddity comes from the fact that in hardware (and in TTEthernet at large), virtual links always need a full-blown action, including not only the type of policing, but also the list of destination ports. So in practice, a single tc-gate action will result in all packets getting dropped. Additional actions (either "trap" or "redirect") need to be specified in the same filter rule such that the conforming packets are actually forwarded somewhere. Apart from the VL Lookup, Policing and Forwarding tables which need to be programmed for each flow (virtual link), the Schedule engine also needs to be told to open/close the admission gates for each individual virtual link. A fairly accurate (and detailed) description of how that works is already present in sja1105_tas.c, since it is already used to trigger the egress gates for the tc-taprio offload (IEEE 802.1Qbv). Key point here, we remember that the schedule engine supports 8 "subschedules" (execution threads that iterate through the global schedule in parallel, and that no 2 hardware threads must execute a schedule entry at the same time). For tc-taprio, each egress port used one of these 8 subschedules, leaving a total of 4 subschedules unused. In principle we could have allocated 1 subschedule for the tc-gate offload of each ingress port, but actually the schedules of all virtual links installed on each ingress port would have needed to be merged together, before they could have been programmed to hardware. So simplify our life and just merge the entire tc-gate configuration, for all virtual links on all ingress ports, into a single subschedule. Be sure to check that against the usual hardware scheduling conflicts, and program it to hardware alongside any tc-taprio subschedule that may be present. The following scenarios were tested: 1. Quantitative testing: tc qdisc add dev swp2 clsact tc filter add dev swp2 ingress flower skip_sw \ dst_mac 42:be:24:9b:76:20 \ action gate index 1 base-time 0 \ sched-entry OPEN 1200 -1 -1 \ sched-entry CLOSE 1200 -1 -1 \ action trap ping 192.168.1.2 -f PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data. ............................. --- 192.168.1.2 ping statistics --- 948 packets transmitted, 467 received, 50.7384% packet loss, time 9671ms 2. Qualitative testing (with a phase-aligned schedule - the clocks are synchronized by ptp4l, not shown here): Receiver (sja1105): tc qdisc add dev swp2 clsact now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \ sec=$(echo $now | awk -F. '{print $1}') && \ base_time="$(((sec + 2) * 1000000000))" && \ echo "base time ${base_time}" tc filter add dev swp2 ingress flower skip_sw \ dst_mac 42:be:24:9b:76:20 \ action gate base-time ${base_time} \ sched-entry OPEN 60000 -1 -1 \ sched-entry CLOSE 40000 -1 -1 \ action trap Sender (enetc): now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \ sec=$(echo $now | awk -F. '{print $1}') && \ base_time="$(((sec + 2) * 1000000000))" && \ echo "base time ${base_time}" tc qdisc add dev eno0 parent root taprio \ num_tc 8 \ map 0 1 2 3 4 5 6 7 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ base-time ${base_time} \ sched-entry S 01 50000 \ sched-entry S 00 50000 \ flags 2 ping -A 192.168.1.1 PING 192.168.1.1 (192.168.1.1): 56 data bytes ... ^C --- 192.168.1.1 ping statistics --- 1425 packets transmitted, 1424 packets received, 0% packet loss round-trip min/avg/max = 0.322/0.361/0.990 ms And just for comparison, with the tc-taprio schedule deleted: ping -A 192.168.1.1 PING 192.168.1.1 (192.168.1.1): 56 data bytes ... ^C --- 192.168.1.1 ping statistics --- 33 packets transmitted, 19 packets received, 42% packet loss round-trip min/avg/max = 0.336/0.464/0.597 ms Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-05 22:20:56 +03:00
unsigned long destports;
int sharindx;
int maxlen;
int ipv;
u64 base_time;
u64 cycle_time;
int num_entries;
struct action_gate_entry *entries;
struct flow_stats stats;
} vl;
};
};
struct sja1105_flow_block {
struct list_head rules;
bool l2_policer_used[SJA1105_NUM_L2_POLICERS];
int num_virtual_links;
};
net: dsa: sja1105: save/restore VLANs using a delta commit method Managing the VLAN table that is present in hardware will become very difficult once we add a third operating state (best_effort_vlan_filtering). That is because correct cleanup (not too little, not too much) becomes virtually impossible, when VLANs can be added from the bridge layer, from dsa_8021q for basic tagging, for cross-chip bridging, as well as retagging rules for sub-VLANs and cross-chip sub-VLANs. So we need to rethink VLAN interaction with the switch in a more scalable way. In preparation for that, use the priv->expect_dsa_8021q boolean to classify any VLAN request received through .port_vlan_add or .port_vlan_del towards either one of 2 internal lists: bridge VLANs and dsa_8021q VLANs. Then, implement a central sja1105_build_vlan_table method that creates a VLAN configuration from scratch based on the 2 lists of VLANs kept by the driver, and based on the VLAN awareness state. Currently, if we are VLAN-unaware, install the dsa_8021q VLANs, otherwise the bridge VLANs. Then, implement a delta commit procedure that identifies which VLANs from this new configuration are actually different from the config previously committed to hardware. We apply the delta through the dynamic configuration interface (we don't reset the switch). The result is that the hardware should see the exact sequence of operations as before this patch. This also helps remove the "br" argument passed to dsa_8021q_crosschip_bridge_join, which it was only using to figure out whether it should commit the configuration back to us or not, based on the VLAN awareness state of the bridge. We can simplify that, by always allowing those VLANs inside of our dsa_8021q_vlans list, and committing those to hardware when necessary. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-12 20:20:29 +03:00
struct sja1105_bridge_vlan {
struct list_head list;
int port;
u16 vid;
bool pvid;
bool untagged;
};
enum sja1105_vlan_state {
SJA1105_VLAN_UNAWARE,
SJA1105_VLAN_BEST_EFFORT,
SJA1105_VLAN_FILTERING_FULL,
};
struct sja1105_private {
struct sja1105_static_config static_config;
net: dsa: sja1105: Error out if RGMII delays are requested in DT Documentation/devicetree/bindings/net/ethernet.txt is confusing because it says what the MAC should not do, but not what it *should* do: * "rgmii-rxid" (RGMII with internal RX delay provided by the PHY, the MAC should not add an RX delay in this case) The gap in semantics is threefold: 1. Is it illegal for the MAC to apply the Rx internal delay by itself, and simplify the phy_mode (mask off "rgmii-rxid" into "rgmii") before passing it to of_phy_connect? The documentation would suggest yes. 1. For "rgmii-rxid", while the situation with the Rx clock skew is more or less clear (needs to be added by the PHY), what should the MAC driver do about the Tx delays? Is it an implicit wild card for the MAC to apply delays in the Tx direction if it can? What if those were already added as serpentine PCB traces, how could that be made more obvious through DT bindings so that the MAC doesn't attempt to add them twice and again potentially break the link? 3. If the interface is a fixed-link and therefore the PHY object is fixed (a purely software entity that obviously cannot add clock skew), what is the meaning of the above property? So an interpretation of the RGMII bindings was chosen that hopefully does not contradict their intention but also makes them more applied. The SJA1105 driver understands to act upon "rgmii-*id" phy-mode bindings if the port is in the PHY role (either explicitly, or if it is a fixed-link). Otherwise it always passes the duty of setting up delays to the PHY driver. The error behavior that this patch adds is required on SJA1105E/T where the MAC really cannot apply internal delays. If the other end of the fixed-link cannot apply RGMII delays either (this would be specified through its own DT bindings), then the situation requires PCB delays. For SJA1105P/Q/R/S, this is however hardware supported and the error is thus only temporary. I created a stub function pointer for configuring delays per-port on RXC and TXC, and will implement it when I have access to a board with this hardware setup. Meanwhile do not allow the user to select an invalid configuration. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-02 23:23:32 +03:00
bool rgmii_rx_delay[SJA1105_NUM_PORTS];
bool rgmii_tx_delay[SJA1105_NUM_PORTS];
bool best_effort_vlan_filtering;
const struct sja1105_info *info;
struct gpio_desc *reset_gpio;
struct spi_device *spidev;
struct dsa_switch *ds;
net: dsa: sja1105: save/restore VLANs using a delta commit method Managing the VLAN table that is present in hardware will become very difficult once we add a third operating state (best_effort_vlan_filtering). That is because correct cleanup (not too little, not too much) becomes virtually impossible, when VLANs can be added from the bridge layer, from dsa_8021q for basic tagging, for cross-chip bridging, as well as retagging rules for sub-VLANs and cross-chip sub-VLANs. So we need to rethink VLAN interaction with the switch in a more scalable way. In preparation for that, use the priv->expect_dsa_8021q boolean to classify any VLAN request received through .port_vlan_add or .port_vlan_del towards either one of 2 internal lists: bridge VLANs and dsa_8021q VLANs. Then, implement a central sja1105_build_vlan_table method that creates a VLAN configuration from scratch based on the 2 lists of VLANs kept by the driver, and based on the VLAN awareness state. Currently, if we are VLAN-unaware, install the dsa_8021q VLANs, otherwise the bridge VLANs. Then, implement a delta commit procedure that identifies which VLANs from this new configuration are actually different from the config previously committed to hardware. We apply the delta through the dynamic configuration interface (we don't reset the switch). The result is that the hardware should see the exact sequence of operations as before this patch. This also helps remove the "br" argument passed to dsa_8021q_crosschip_bridge_join, which it was only using to figure out whether it should commit the configuration back to us or not, based on the VLAN awareness state of the bridge. We can simplify that, by always allowing those VLANs inside of our dsa_8021q_vlans list, and committing those to hardware when necessary. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-12 20:20:29 +03:00
struct list_head dsa_8021q_vlans;
struct list_head bridge_vlans;
struct sja1105_flow_block flow_block;
struct sja1105_port ports[SJA1105_NUM_PORTS];
/* Serializes transmission of management frames so that
* the switch doesn't confuse them with one another.
*/
struct mutex mgmt_lock;
net: dsa: tag_8021q: add a context structure While working on another tag_8021q driver implementation, some things became apparent: - It is not mandatory for a DSA driver to offload the tag_8021q VLANs by using the VLAN table per se. For example, it can add custom TCAM rules that simply encapsulate RX traffic, and redirect & decapsulate rules for TX traffic. For such a driver, it makes no sense to receive the tag_8021q configuration through the same callback as it receives the VLAN configuration from the bridge and the 8021q modules. - Currently, sja1105 (the only tag_8021q user) sets a priv->expect_dsa_8021q variable to distinguish between the bridge calling, and tag_8021q calling. That can be improved, to say the least. - The crosschip bridging operations are, in fact, stateful already. The list of crosschip_links must be kept by the caller and passed to the relevant tag_8021q functions. So it would be nice if the tag_8021q configuration was more self-contained. This patch attempts to do that. Create a struct dsa_8021q_context which encapsulates a struct dsa_switch, and has 2 function pointers for adding and deleting a VLAN. These will replace the previous channel to the driver, which was through the .port_vlan_add and .port_vlan_del callbacks of dsa_switch_ops. Also put the list of crosschip_links into this dsa_8021q_context. Drivers that don't support cross-chip bridging can simply omit to initialize this list, as long as they dont call any cross-chip function. The sja1105_vlan_add and sja1105_vlan_del functions are refactored into a smaller sja1105_vlan_add_one, which now has 2 entry points: - sja1105_vlan_add, from struct dsa_switch_ops - sja1105_dsa_8021q_vlan_add, from the tag_8021q ops But even this change is fairly trivial. It just reflects the fact that for sja1105, the VLANs from these 2 channels end up in the same hardware table. However that is not necessarily true in the general sense (and that's the reason for making this change). The rest of the patch is mostly plain refactoring of "ds" -> "ctx". The dsa_8021q_context structure needs to be propagated because adding a VLAN is now done through the ops function pointers inside of it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10 19:48:56 +03:00
struct dsa_8021q_context *dsa_8021q_ctx;
enum sja1105_vlan_state vlan_state;
struct sja1105_cbs_entry *cbs;
struct sja1105_tagger_data tagger_data;
struct sja1105_ptp_data ptp_data;
net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio offload This qdisc offload is the closest thing to what the SJA1105 supports in hardware for time-based egress shaping. The switch core really is built around SAE AS6802/TTEthernet (a TTTech standard) but can be made to operate similarly to IEEE 802.1Qbv with some constraints: - The gate control list is a global list for all ports. There are 8 execution threads that iterate through this global list in parallel. I don't know why 8, there are only 4 front-panel ports. - Care must be taken by the user to make sure that two execution threads never get to execute a GCL entry simultaneously. I created a O(n^4) checker for this hardware limitation, prior to accepting a taprio offload configuration as valid. - The spec says that if a GCL entry's interval is shorter than the frame length, you shouldn't send it (and end up in head-of-line blocking). Well, this switch does anyway. - The switch has no concept of ADMIN and OPER configurations. Because it's so simple, the TAS settings are loaded through the static config tables interface, so there isn't even place for any discussion about 'graceful switchover between ADMIN and OPER'. You just reset the switch and upload a new OPER config. - The switch accepts multiple time sources for the gate events. Right now I am using the standalone clock source as opposed to PTP. So the base time parameter doesn't really do much. Support for the PTP clock source will be added in a future series. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-15 05:00:02 +03:00
struct sja1105_tas_data tas_data;
};
#include "sja1105_dynamic_config.h"
struct sja1105_spi_message {
u64 access;
u64 read_count;
u64 address;
};
net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio offload This qdisc offload is the closest thing to what the SJA1105 supports in hardware for time-based egress shaping. The switch core really is built around SAE AS6802/TTEthernet (a TTTech standard) but can be made to operate similarly to IEEE 802.1Qbv with some constraints: - The gate control list is a global list for all ports. There are 8 execution threads that iterate through this global list in parallel. I don't know why 8, there are only 4 front-panel ports. - Care must be taken by the user to make sure that two execution threads never get to execute a GCL entry simultaneously. I created a O(n^4) checker for this hardware limitation, prior to accepting a taprio offload configuration as valid. - The spec says that if a GCL entry's interval is shorter than the frame length, you shouldn't send it (and end up in head-of-line blocking). Well, this switch does anyway. - The switch has no concept of ADMIN and OPER configurations. Because it's so simple, the TAS settings are loaded through the static config tables interface, so there isn't even place for any discussion about 'graceful switchover between ADMIN and OPER'. You just reset the switch and upload a new OPER config. - The switch accepts multiple time sources for the gate events. Right now I am using the standalone clock source as opposed to PTP. So the base time parameter doesn't really do much. Support for the PTP clock source will be added in a future series. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-15 05:00:02 +03:00
/* From sja1105_main.c */
enum sja1105_reset_reason {
SJA1105_VLAN_FILTERING = 0,
SJA1105_RX_HWTSTAMPING,
SJA1105_AGEING_TIME,
SJA1105_SCHEDULING,
SJA1105_BEST_EFFORT_POLICING,
SJA1105_VIRTUAL_LINKS,
};
int sja1105_static_config_reload(struct sja1105_private *priv,
enum sja1105_reset_reason reason);
net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio offload This qdisc offload is the closest thing to what the SJA1105 supports in hardware for time-based egress shaping. The switch core really is built around SAE AS6802/TTEthernet (a TTTech standard) but can be made to operate similarly to IEEE 802.1Qbv with some constraints: - The gate control list is a global list for all ports. There are 8 execution threads that iterate through this global list in parallel. I don't know why 8, there are only 4 front-panel ports. - Care must be taken by the user to make sure that two execution threads never get to execute a GCL entry simultaneously. I created a O(n^4) checker for this hardware limitation, prior to accepting a taprio offload configuration as valid. - The spec says that if a GCL entry's interval is shorter than the frame length, you shouldn't send it (and end up in head-of-line blocking). Well, this switch does anyway. - The switch has no concept of ADMIN and OPER configurations. Because it's so simple, the TAS settings are loaded through the static config tables interface, so there isn't even place for any discussion about 'graceful switchover between ADMIN and OPER'. You just reset the switch and upload a new OPER config. - The switch accepts multiple time sources for the gate events. Right now I am using the standalone clock source as opposed to PTP. So the base time parameter doesn't really do much. Support for the PTP clock source will be added in a future series. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-15 05:00:02 +03:00
void sja1105_frame_memory_partitioning(struct sja1105_private *priv);
/* From sja1105_spi.c */
int sja1105_xfer_buf(const struct sja1105_private *priv,
sja1105_spi_rw_mode_t rw, u64 reg_addr,
net: dsa: sja1105: Switch to scatter/gather API for SPI This reworks the SPI transfer implementation to make use of more of the SPI core features. The main benefit is to avoid the memcpy in sja1105_xfer_buf(). The memcpy was only needed because the function was transferring a single buffer at a time. So it needed to copy the caller-provided buffer at buf + 4, to store the SPI message header in the "headroom" area. But the SPI core supports scatter-gather messages, comprised of multiple transfers. We can actually use those to break apart every SPI message into 2 transfers: one for the header and one for the actual payload. To keep the behavior the same regarding the chip select signal, it is necessary to tell the SPI core to de-assert the chip select after each chunk. This was not needed before, because each spi_message contained only 1 single transfer. The meaning of the per-transfer cs_change=1 is: - If the transfer is the last one of the message, keep CS asserted - Otherwise, deassert CS We need to deassert CS in the "otherwise" case, which was implicit before. Avoiding the memcpy creates yet another opportunity. The device can't process more than 256 bytes of SPI payload at a time, so the sja1105_xfer_long_buf() function used to exist, to split the larger caller buffer into chunks. But these chunks couldn't be used as scatter/gather buffers for spi_message until now, because of that memcpy (we would have needed more memory for each chunk). So we can now remove the sja1105_xfer_long_buf() function and have a single implementation for long and short buffers. Another benefit is lower usage of stack memory. Previously we had to store 2 SPI buffers for each chunk. Due to the elimination of the memcpy, we can now send pointers to the actual chunks from the caller-supplied buffer to the SPI core. Since the patch merges two functions into a rewritten implementation, the function prototype was also changed, mainly for cosmetic consistency with the structures used within it. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-12 01:31:15 +03:00
u8 *buf, size_t len);
int sja1105_xfer_u32(const struct sja1105_private *priv,
net: dsa: sja1105: Implement the .gettimex64 system call for PTP Through the PTP_SYS_OFFSET_EXTENDED ioctl, it is possible for userspace applications (i.e. phc2sys) to compensate for the delays incurred while reading the PHC's time. The task itself of taking the software timestamp is delegated to the SPI subsystem, through the newly introduced API in struct spi_transfer. The goal is to cross-timestamp I/O operations on the switch's PTP clock with values in the local system clock (CLOCK_REALTIME). For that we need to understand a bit of the hardware internals. The 'read PTP time' message is a 12 byte structure, first 4 bytes of which represent the SPI header, and the last 8 bytes represent the 64-bit PTP time. The switch itself starts processing the command immediately after receiving the last bit of the address, i.e. at the middle of byte 3 (last byte of header). The PTP time is shadowed to a buffer register in the switch, and retrieved atomically during the subsequent SPI frames. A similar thing goes on for the 'write PTP time' message, although in that case the switch waits until the 64-bit PTP time becomes fully available before taking any action. So the byte that needs to be software-timestamped is byte 11 (last) of the transfer. The patch creates a common (and local) sja1105_xfer implementation for the SPI I/O, and offers 3 front-ends: - sja1105_xfer_u32 and sja1105_xfer_u64: these are capable of optionally requesting a PTP timestamp - sja1105_xfer_buf: this is for large transfers (e.g. the static config buffer) and other misc data, and there is no point in giving timestamping capabilities to this. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-09 13:32:22 +02:00
sja1105_spi_rw_mode_t rw, u64 reg_addr, u32 *value,
struct ptp_system_timestamp *ptp_sts);
int sja1105_xfer_u64(const struct sja1105_private *priv,
net: dsa: sja1105: Implement the .gettimex64 system call for PTP Through the PTP_SYS_OFFSET_EXTENDED ioctl, it is possible for userspace applications (i.e. phc2sys) to compensate for the delays incurred while reading the PHC's time. The task itself of taking the software timestamp is delegated to the SPI subsystem, through the newly introduced API in struct spi_transfer. The goal is to cross-timestamp I/O operations on the switch's PTP clock with values in the local system clock (CLOCK_REALTIME). For that we need to understand a bit of the hardware internals. The 'read PTP time' message is a 12 byte structure, first 4 bytes of which represent the SPI header, and the last 8 bytes represent the 64-bit PTP time. The switch itself starts processing the command immediately after receiving the last bit of the address, i.e. at the middle of byte 3 (last byte of header). The PTP time is shadowed to a buffer register in the switch, and retrieved atomically during the subsequent SPI frames. A similar thing goes on for the 'write PTP time' message, although in that case the switch waits until the 64-bit PTP time becomes fully available before taking any action. So the byte that needs to be software-timestamped is byte 11 (last) of the transfer. The patch creates a common (and local) sja1105_xfer implementation for the SPI I/O, and offers 3 front-ends: - sja1105_xfer_u32 and sja1105_xfer_u64: these are capable of optionally requesting a PTP timestamp - sja1105_xfer_buf: this is for large transfers (e.g. the static config buffer) and other misc data, and there is no point in giving timestamping capabilities to this. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-09 13:32:22 +02:00
sja1105_spi_rw_mode_t rw, u64 reg_addr, u64 *value,
struct ptp_system_timestamp *ptp_sts);
int sja1105_static_config_upload(struct sja1105_private *priv);
int sja1105_inhibit_tx(const struct sja1105_private *priv,
unsigned long port_bitmap, bool tx_inhibited);
extern const struct sja1105_info sja1105e_info;
extern const struct sja1105_info sja1105t_info;
extern const struct sja1105_info sja1105p_info;
extern const struct sja1105_info sja1105q_info;
extern const struct sja1105_info sja1105r_info;
extern const struct sja1105_info sja1105s_info;
/* From sja1105_clocking.c */
typedef enum {
XMII_MAC = 0,
XMII_PHY = 1,
} sja1105_mii_role_t;
typedef enum {
XMII_MODE_MII = 0,
XMII_MODE_RMII = 1,
XMII_MODE_RGMII = 2,
net: dsa: sja1105: Add support for the SGMII port SJA1105 switches R and S have one SerDes port with an 802.3z quasi-compatible PCS, hardwired on port 4. The other ports are still MII/RMII/RGMII. The PCS performs rate adaptation to lower link speeds; the MAC on this port is hardwired at gigabit. Only full duplex is supported. The SGMII port can be configured as part of the static config tables, as well as through a dedicated SPI address region for its pseudo-clause-22 registers. However it looks like the static configuration is not able to change some out-of-reset values (like the value of MII_BMCR), so at the end of the day, having code for it is utterly pointless. We are just going to use the pseudo-C22 interface. Because the PCS gets reset when the switch resets, we have to add even more restoration logic to sja1105_static_config_reload, otherwise the SGMII port breaks after operations such as enabling PTP timestamping which require a switch reset. >From PHYLINK perspective, the switch supports *only* SGMII (it doesn't support 1000Base-X). It also doesn't expose access to the raw config word for in-band AN in registers MII_ADV/MII_LPA. It is able to work in the following modes: - Forced speed - SGMII in-band AN slave (speed received from PHY) - SGMII in-band AN master (acting as a PHY) The latter mode is not supported by this patch. It is even unclear to me how that would be described. There is some code for it left in the patch, but 'an_master' is always passed as false. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-20 13:29:37 +02:00
XMII_MODE_SGMII = 3,
} sja1105_phy_interface_t;
typedef enum {
SJA1105_SPEED_10MBPS = 3,
SJA1105_SPEED_100MBPS = 2,
SJA1105_SPEED_1000MBPS = 1,
SJA1105_SPEED_AUTO = 0,
} sja1105_speed_t;
int sja1105pqrs_setup_rgmii_delay(const void *ctx, int port);
int sja1105_clocking_setup_port(struct sja1105_private *priv, int port);
int sja1105_clocking_setup(struct sja1105_private *priv);
/* From sja1105_ethtool.c */
void sja1105_get_ethtool_stats(struct dsa_switch *ds, int port, u64 *data);
void sja1105_get_strings(struct dsa_switch *ds, int port,
u32 stringset, u8 *data);
int sja1105_get_sset_count(struct dsa_switch *ds, int port, int sset);
/* From sja1105_dynamic_config.c */
int sja1105_dynamic_config_read(struct sja1105_private *priv,
enum sja1105_blk_idx blk_idx,
int index, void *entry);
int sja1105_dynamic_config_write(struct sja1105_private *priv,
enum sja1105_blk_idx blk_idx,
int index, void *entry, bool keep);
enum sja1105_iotag {
SJA1105_C_TAG = 0, /* Inner VLAN header */
SJA1105_S_TAG = 1, /* Outer VLAN header */
};
u8 sja1105et_fdb_hash(struct sja1105_private *priv, const u8 *addr, u16 vid);
int sja1105et_fdb_add(struct dsa_switch *ds, int port,
const unsigned char *addr, u16 vid);
int sja1105et_fdb_del(struct dsa_switch *ds, int port,
const unsigned char *addr, u16 vid);
int sja1105pqrs_fdb_add(struct dsa_switch *ds, int port,
const unsigned char *addr, u16 vid);
int sja1105pqrs_fdb_del(struct dsa_switch *ds, int port,
const unsigned char *addr, u16 vid);
/* From sja1105_flower.c */
int sja1105_cls_flower_del(struct dsa_switch *ds, int port,
struct flow_cls_offload *cls, bool ingress);
int sja1105_cls_flower_add(struct dsa_switch *ds, int port,
struct flow_cls_offload *cls, bool ingress);
net: dsa: sja1105: implement tc-gate using time-triggered virtual links Restrict the TTEthernet hardware support on this switch to operate as closely as possible to IEEE 802.1Qci as possible. This means that it can perform PTP-time-based ingress admission control on streams identified by {DMAC, VID, PCP}, which is useful when trying to ensure the determinism of traffic scheduled via IEEE 802.1Qbv. The oddity comes from the fact that in hardware (and in TTEthernet at large), virtual links always need a full-blown action, including not only the type of policing, but also the list of destination ports. So in practice, a single tc-gate action will result in all packets getting dropped. Additional actions (either "trap" or "redirect") need to be specified in the same filter rule such that the conforming packets are actually forwarded somewhere. Apart from the VL Lookup, Policing and Forwarding tables which need to be programmed for each flow (virtual link), the Schedule engine also needs to be told to open/close the admission gates for each individual virtual link. A fairly accurate (and detailed) description of how that works is already present in sja1105_tas.c, since it is already used to trigger the egress gates for the tc-taprio offload (IEEE 802.1Qbv). Key point here, we remember that the schedule engine supports 8 "subschedules" (execution threads that iterate through the global schedule in parallel, and that no 2 hardware threads must execute a schedule entry at the same time). For tc-taprio, each egress port used one of these 8 subschedules, leaving a total of 4 subschedules unused. In principle we could have allocated 1 subschedule for the tc-gate offload of each ingress port, but actually the schedules of all virtual links installed on each ingress port would have needed to be merged together, before they could have been programmed to hardware. So simplify our life and just merge the entire tc-gate configuration, for all virtual links on all ingress ports, into a single subschedule. Be sure to check that against the usual hardware scheduling conflicts, and program it to hardware alongside any tc-taprio subschedule that may be present. The following scenarios were tested: 1. Quantitative testing: tc qdisc add dev swp2 clsact tc filter add dev swp2 ingress flower skip_sw \ dst_mac 42:be:24:9b:76:20 \ action gate index 1 base-time 0 \ sched-entry OPEN 1200 -1 -1 \ sched-entry CLOSE 1200 -1 -1 \ action trap ping 192.168.1.2 -f PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data. ............................. --- 192.168.1.2 ping statistics --- 948 packets transmitted, 467 received, 50.7384% packet loss, time 9671ms 2. Qualitative testing (with a phase-aligned schedule - the clocks are synchronized by ptp4l, not shown here): Receiver (sja1105): tc qdisc add dev swp2 clsact now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \ sec=$(echo $now | awk -F. '{print $1}') && \ base_time="$(((sec + 2) * 1000000000))" && \ echo "base time ${base_time}" tc filter add dev swp2 ingress flower skip_sw \ dst_mac 42:be:24:9b:76:20 \ action gate base-time ${base_time} \ sched-entry OPEN 60000 -1 -1 \ sched-entry CLOSE 40000 -1 -1 \ action trap Sender (enetc): now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \ sec=$(echo $now | awk -F. '{print $1}') && \ base_time="$(((sec + 2) * 1000000000))" && \ echo "base time ${base_time}" tc qdisc add dev eno0 parent root taprio \ num_tc 8 \ map 0 1 2 3 4 5 6 7 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ base-time ${base_time} \ sched-entry S 01 50000 \ sched-entry S 00 50000 \ flags 2 ping -A 192.168.1.1 PING 192.168.1.1 (192.168.1.1): 56 data bytes ... ^C --- 192.168.1.1 ping statistics --- 1425 packets transmitted, 1424 packets received, 0% packet loss round-trip min/avg/max = 0.322/0.361/0.990 ms And just for comparison, with the tc-taprio schedule deleted: ping -A 192.168.1.1 PING 192.168.1.1 (192.168.1.1): 56 data bytes ... ^C --- 192.168.1.1 ping statistics --- 33 packets transmitted, 19 packets received, 42% packet loss round-trip min/avg/max = 0.336/0.464/0.597 ms Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-05 22:20:56 +03:00
int sja1105_cls_flower_stats(struct dsa_switch *ds, int port,
struct flow_cls_offload *cls, bool ingress);
void sja1105_flower_setup(struct dsa_switch *ds);
void sja1105_flower_teardown(struct dsa_switch *ds);
struct sja1105_rule *sja1105_rule_find(struct sja1105_private *priv,
unsigned long cookie);
#endif