linux/net
Vladimir Oltean 2c0b03258b net: dsa: give preference to local CPU ports
Be there an "H" switch topology, where there are 2 switches connected as
follows:

         eth0                                                     eth1
          |                                                        |
       CPU port                                                CPU port
          |                        DSA link                        |
 sw0p0  sw0p1  sw0p2  sw0p3  sw0p4 -------- sw1p4  sw1p3  sw1p2  sw1p1  sw1p0
   |             |      |                            |      |             |
 user          user   user                         user   user          user
 port          port   port                         port   port          port

basically one where each switch has its own CPU port for termination,
but there is also a DSA link in case packets need to be forwarded in
hardware between one switch and another.

DSA insists to see this as a daisy chain topology, basically registering
all network interfaces as sw0p0@eth0, ... sw1p0@eth0 and disregarding
eth1 as a valid DSA master.

This is only half the story, since when asked using dsa_port_is_cpu(),
DSA will respond that sw1p1 is a CPU port, however one which has no
dp->cpu_dp pointing to it. So sw1p1 is enabled, but not used.

Furthermore, be there a driver for switches which support only one
upstream port. This driver iterates through its ports and checks using
dsa_is_upstream_port() whether the current port is an upstream one.
For switch 1, two ports pass the "is upstream port" checks:

- sw1p4 is an upstream port because it is a routing port towards the
  dedicated CPU port assigned using dsa_tree_setup_default_cpu()

- sw1p1 is also an upstream port because it is a CPU port, albeit one
  that is disabled. This is because dsa_upstream_port() returns:

	if (!cpu_dp)
		return port;

  which means that if @dp does not have a ->cpu_dp pointer (which is a
  characteristic of CPU ports themselves as well as unused ports), then
  @dp is its own upstream port.

So the driver for switch 1 rightfully says: I have two upstream ports,
but I don't support multiple upstream ports! So let me error out, I
don't know which one to choose and what to do with the other one.

Generally I am against enforcing any default policy in the kernel in
terms of user to CPU port assignment (like round robin or such) but this
case is different. To solve the conundrum, one would have to:

- Disable sw1p1 in the device tree or mark it as "not a CPU port" in
  order to comply with DSA's view of this topology as a daisy chain,
  where the termination traffic from switch 1 must pass through switch 0.
  This is counter-productive because it wastes 1Gbps of termination
  throughput in switch 1.
- Disable the DSA link between sw0p4 and sw1p4 and do software
  forwarding between switch 0 and 1, and basically treat the switches as
  part of disjoint switch trees. This is counter-productive because it
  wastes 1Gbps of autonomous forwarding throughput between switch 0 and 1.
- Treat sw0p4 and sw1p4 as user ports instead of DSA links. This could
  work, but it makes cross-chip bridging impossible. In this setup we
  would need to have 2 separate bridges, br0 spanning the ports of
  switch 0, and br1 spanning the ports of switch 1, and the "DSA links
  treated as user ports" sw0p4 (part of br0) and sw1p4 (part of br1) are
  the gateway ports between one bridge and another. This is hard to
  manage from a user's perspective, who wants to have a unified view of
  the switching fabric and the ability to transparently add ports to the
  same bridge. VLANs would also need to be explicitly managed by the
  user on these gateway ports.

So it seems that the only reasonable thing to do is to make DSA prefer
CPU ports that are local to the switch. Meaning that by default, the
user and DSA ports of switch 0 will get assigned to the CPU port from
switch 0 (sw0p1) and the user and DSA ports of switch 1 will get
assigned to the CPU port from switch 1.

The way this solves the problem is that sw1p4 is no longer an upstream
port as far as switch 1 is concerned (it no longer views sw0p1 as its
dedicated CPU port).

So here we are, the first multi-CPU port that DSA supports is also
perhaps the most uneventful one: the individual switches don't support
multiple CPUs, however the DSA switch tree as a whole does have multiple
CPU ports. No user space assignment of user ports to CPU ports is
desirable, necessary, or possible.

Ports that do not have a local CPU port (say there was an extra switch
hanging off of sw0p0) default to the standard implementation of getting
assigned to the first CPU port of the DSA switch tree. Is that good
enough? Probably not (if the downstream switch was hanging off of switch
1, we would most certainly prefer its CPU port to be sw1p1), but in
order to support that use case too, we would need to traverse the
dst->rtable in search of an optimum dedicated CPU port, one that has the
smallest number of hops between dp->ds and dp->cpu_dp->ds. At the
moment, the DSA routing table structure does not keep the number of hops
between dl->dp and dl->link_dp, and while it is probably deducible,
there is zero justification to write that code now. Let's hope DSA will
never have to support that use case.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05 11:05:48 +01:00
..
6lowpan
9p 9p/trans_virtio: Fix spelling mistakes 2021-06-02 14:01:55 -07:00
802 net/802/garp: fix memleak in garp_request_join() 2021-07-01 11:21:57 -07:00
8021q dev_ioctl: split out ndo_eth_ioctl 2021-07-27 20:11:45 +01:00
appletalk net: socket: rework compat_ifreq_ioctl() 2021-07-23 14:20:25 +01:00
atm atm: Use list_for_each_entry() to simplify code in resources.c 2021-06-10 14:08:09 -07:00
ax25 ax25: use skb_expand_head 2021-08-03 11:21:39 +01:00
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-18 19:47:02 -07:00
bluetooth TTY / Serial patches for 5.14-rc1 2021-07-05 14:08:24 -07:00
bpf Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2021-07-31 11:23:26 -07:00
bpfilter bpfilter: Specify the log level for the kmsg message 2021-06-25 13:13:50 +02:00
bridge net: make switchdev_bridge_port_{,unoffload} loosely coupled with the bridge 2021-08-04 12:35:07 +01:00
caif net: fix uninit-value in caif_seqpkt_sendmsg 2021-07-15 11:08:33 -07:00
can can: j1939: extend UAPI to notify about RX status 2021-08-04 12:11:52 +02:00
ceph Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
core net: Replace deprecated CPU-hotplug functions. 2021-08-04 13:47:50 -07:00
dcb net: dcb: Return the correct errno code 2021-06-01 17:01:33 -07:00
dccp memcg: enable accounting for inet_bin_bucket cache 2021-07-20 06:00:38 -07:00
decnet net: decnet: Fix refcount warning for new dn_fib_info 2021-08-03 14:23:22 -07:00
dns_resolver
dsa net: dsa: give preference to local CPU ports 2021-08-05 11:05:48 +01:00
ethernet move netdev_boot_setup into Space.c 2021-08-03 13:05:26 +01:00
ethtool ethtool: runtime-resume netdev parent in ethnl_ops_begin 2021-08-03 12:58:22 +01:00
hsr net: hsr: don't check sequence number if tag removal is offloaded 2021-06-16 12:13:01 -07:00
ieee802154 net: socket: rework compat_ifreq_ioctl() 2021-07-23 14:20:25 +01:00
ife
ipv4 net: add extack arg for link ops 2021-08-04 10:01:26 +01:00
ipv6 ipv6: exthdrs: get rid of indirect calls in ip6_parse_tlv() 2021-08-04 10:34:40 +01:00
iucv s390: iucv: Avoid field over-reading memcpy() 2021-07-01 15:54:01 -07:00
kcm net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
key net: Remove unnecessary variables 2021-05-26 07:03:39 +02:00
l2tp l2tp: Fix spelling mistakes 2021-06-07 14:08:30 -07:00
l3mdev
lapb net: lapb: Use list_for_each_entry() to simplify code in lapb_iface.c 2021-06-08 16:31:25 -07:00
llc net: llc: fix skb_over_panic 2021-07-27 13:05:56 +01:00
mac80211 mac80211: fix enabling 4-address mode on a sta vif after assoc 2021-07-23 10:34:13 +02:00
mac802154 net: mac802154: Fix general protection fault 2021-04-06 22:42:16 +02:00
mctp mctp: remove duplicated assignment of pointer hdr 2021-08-05 10:56:01 +01:00
mpls mpls: defer ttl decrement in mpls_forward() 2021-07-23 17:17:56 +01:00
mptcp net: Use nlmsg_unicast() instead of netlink_unicast() 2021-07-13 09:28:29 -07:00
ncsi net/ncsi: add dummy response handler for Intel boards 2021-07-08 14:16:39 -07:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-07-31 09:14:46 -07:00
netlabel net: cipso: fix warnings in netlbl_cipsov4_add_std 2021-07-27 20:58:30 +01:00
netlink net: netlink: Remove unused function 2021-07-30 18:35:47 +02:00
netrom netrom: Decrease sock refcount when sock timers expire 2021-07-18 09:48:59 -07:00
nfc nfc: hci: pass callback data param as pointer in nci_request() 2021-08-02 15:11:37 +01:00
nsh
openvswitch openvswitch: fix sparse warning incorrect type 2021-07-27 11:48:43 +01:00
packet Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
phonet phonet: use siocdevprivate 2021-07-27 20:11:43 +01:00
psample
qrtr Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-07-31 09:14:46 -07:00
rds Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
rfkill Another set of updates, all over the map: 2021-04-20 16:44:04 -07:00
rose
rxrpc Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
sched net_sched: refactor TC action init API 2021-08-02 10:24:38 +01:00
sctp sctp: fix return value check in __sctp_rcv_asconf_lookup 2021-07-28 09:25:18 +01:00
smc net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
strparser net: sock: introduce sk_error_report 2021-06-29 11:28:21 -07:00
sunrpc NFS client updates for Linux 5.14 2021-07-09 09:43:57 -07:00
switchdev net: make switchdev_bridge_port_{,unoffload} loosely coupled with the bridge 2021-08-04 12:35:07 +01:00
tipc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-07-31 09:14:46 -07:00
tls Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-29 15:45:27 -07:00
unix af_unix: Add OOB support 2021-08-04 09:55:52 +01:00
vmw_vsock Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
wireless cfg80211: Fix possible memory leak in function cfg80211_bss_update 2021-07-23 10:38:18 +02:00
x25 net: x25: Use list_for_each_entry() to simplify code in x25_route.c 2021-06-10 14:08:09 -07:00
xdp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-29 15:45:27 -07:00
xfrm Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
compat.c net: Return the correct errno code 2021-06-03 15:13:56 -07:00
devres.c net: devres: Correct a grammatical error 2021-06-11 12:55:28 -07:00
Kconfig mctp: Add MCTP base 2021-07-29 15:06:49 +01:00
Makefile mctp: Add MCTP base 2021-07-29 15:06:49 +01:00
socket.c mctp: Add MCTP base 2021-07-29 15:06:49 +01:00
sysctl_net.c net: Ensure net namespace isolation of sysctls 2021-04-12 13:27:11 -07:00