linux/net/dsa/switch.c
Vladimir Oltean 161ca59d39 net: dsa: reference count the MDB entries at the cross-chip notifier level
Ever since the cross-chip notifiers were introduced, the design was
meant to be simplistic and just get the job done without worrying too
much about dangling resources left behind.

For example, somebody installs an MDB entry on sw0p0 in this daisy chain
topology. It gets installed using ds->ops->port_mdb_add() on sw0p0,
sw1p4 and sw2p4.

                                                    |
           sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
        [  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
        [   x   ] [       ] [       ] [       ] [       ]
                                          |
                                          +---------+
                                                    |
           sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
        [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
        [       ] [       ] [       ] [       ] [   x   ]
                                          |
                                          +---------+
                                                    |
           sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
        [  user ] [  user ] [  user ] [  user ] [  dsa  ]
        [       ] [       ] [       ] [       ] [   x   ]

Then the same person deletes that MDB entry. The cross-chip notifier for
deletion only matches sw0p0:

                                                    |
           sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
        [  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
        [   x   ] [       ] [       ] [       ] [       ]
                                          |
                                          +---------+
                                                    |
           sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
        [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
        [       ] [       ] [       ] [       ] [       ]
                                          |
                                          +---------+
                                                    |
           sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
        [  user ] [  user ] [  user ] [  user ] [  dsa  ]
        [       ] [       ] [       ] [       ] [       ]

Why?

Because the DSA links are 'trunk' ports, if we just go ahead and delete
the MDB from sw1p4 and sw2p4 directly, we might delete those multicast
entries when they are still needed. Just consider the fact that somebody
does:

- add a multicast MAC address towards sw0p0 [ via the cross-chip
  notifiers it gets installed on the DSA links too ]
- add the same multicast MAC address towards sw0p1 (another port of that
  same switch)
- delete the same multicast MAC address from sw0p0.

At this point, if we deleted the MAC address from the DSA links, it
would be flooded, even though there is still an entry on switch 0 which
needs it not to.

So that is why deletions only match the targeted source port and nothing
on DSA links. Of course, dangling resources means that the hardware
tables will eventually run out given enough additions/removals, but hey,
at least it's simple.

But there is a bigger concern which needs to be addressed, and that is
our support for SWITCHDEV_OBJ_ID_HOST_MDB. DSA simply translates such an
object into a dsa_port_host_mdb_add() which ends up as ds->ops->port_mdb_add()
on the upstream port, and a similar thing happens on deletion:
dsa_port_host_mdb_del() will trigger ds->ops->port_mdb_del() on the
upstream port.

When there are 2 VLAN-unaware bridges spanning the same switch (which is
a use case DSA proudly supports), each bridge will install its own
SWITCHDEV_OBJ_ID_HOST_MDB entries. But upon deletion, DSA goes ahead and
emits a DSA_NOTIFIER_MDB_DEL for dp->cpu_dp, which is shared between the
user ports enslaved to br0 and the user ports enslaved to br1. Not good.
The host-trapped multicast addresses installed by br1 will be deleted
when any state changes in br0 (IGMP timers expire, or ports leave, etc).

To avoid this, we could of course go the route of the zero-sum game and
delete the DSA_NOTIFIER_MDB_DEL call for dp->cpu_dp. But the better
design is to just admit that on shared ports like DSA links and CPU
ports, we should be reference counting calls, even if this consumes some
dynamic memory which DSA has traditionally avoided. On the flip side,
the hardware tables of switches are limited in size, so it would be good
if the OS managed them properly instead of having them eventually
overflow.

To address the memory usage concern, we only apply the refcounting of
MDB entries on ports that are really shared (CPU ports and DSA links)
and not on user ports. In a typical single-switch setup, this means only
the CPU port (and the host MDB entries are not that many, really).

The name of the newly introduced data structures (dsa_mac_addr) is
chosen in such a way that will be reusable for host FDB entries (next
patch).

With this change, we can finally have the same matching logic for the
MDB additions and deletions, as well as for their host-trapped variants.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-29 10:46:23 -07:00

641 lines
16 KiB
C

// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Handling of a single switch chip, part of a switch fabric
*
* Copyright (c) 2017 Savoir-faire Linux Inc.
* Vivien Didelot <vivien.didelot@savoirfairelinux.com>
*/
#include <linux/if_bridge.h>
#include <linux/netdevice.h>
#include <linux/notifier.h>
#include <linux/if_vlan.h>
#include <net/switchdev.h>
#include "dsa_priv.h"
static unsigned int dsa_switch_fastest_ageing_time(struct dsa_switch *ds,
unsigned int ageing_time)
{
int i;
for (i = 0; i < ds->num_ports; ++i) {
struct dsa_port *dp = dsa_to_port(ds, i);
if (dp->ageing_time && dp->ageing_time < ageing_time)
ageing_time = dp->ageing_time;
}
return ageing_time;
}
static int dsa_switch_ageing_time(struct dsa_switch *ds,
struct dsa_notifier_ageing_time_info *info)
{
unsigned int ageing_time = info->ageing_time;
if (ds->ageing_time_min && ageing_time < ds->ageing_time_min)
return -ERANGE;
if (ds->ageing_time_max && ageing_time > ds->ageing_time_max)
return -ERANGE;
/* Program the fastest ageing time in case of multiple bridges */
ageing_time = dsa_switch_fastest_ageing_time(ds, ageing_time);
if (ds->ops->set_ageing_time)
return ds->ops->set_ageing_time(ds, ageing_time);
return 0;
}
static bool dsa_switch_mtu_match(struct dsa_switch *ds, int port,
struct dsa_notifier_mtu_info *info)
{
if (ds->index == info->sw_index && port == info->port)
return true;
/* Do not propagate to other switches in the tree if the notifier was
* targeted for a single switch.
*/
if (info->targeted_match)
return false;
if (dsa_is_dsa_port(ds, port) || dsa_is_cpu_port(ds, port))
return true;
return false;
}
static int dsa_switch_mtu(struct dsa_switch *ds,
struct dsa_notifier_mtu_info *info)
{
int port, ret;
if (!ds->ops->port_change_mtu)
return -EOPNOTSUPP;
for (port = 0; port < ds->num_ports; port++) {
if (dsa_switch_mtu_match(ds, port, info)) {
ret = ds->ops->port_change_mtu(ds, port, info->mtu);
if (ret)
return ret;
}
}
return 0;
}
static int dsa_switch_bridge_join(struct dsa_switch *ds,
struct dsa_notifier_bridge_info *info)
{
struct dsa_switch_tree *dst = ds->dst;
if (dst->index == info->tree_index && ds->index == info->sw_index &&
ds->ops->port_bridge_join)
return ds->ops->port_bridge_join(ds, info->port, info->br);
if ((dst->index != info->tree_index || ds->index != info->sw_index) &&
ds->ops->crosschip_bridge_join)
return ds->ops->crosschip_bridge_join(ds, info->tree_index,
info->sw_index,
info->port, info->br);
return 0;
}
static int dsa_switch_bridge_leave(struct dsa_switch *ds,
struct dsa_notifier_bridge_info *info)
{
bool unset_vlan_filtering = br_vlan_enabled(info->br);
struct dsa_switch_tree *dst = ds->dst;
struct netlink_ext_ack extack = {0};
int err, port;
if (dst->index == info->tree_index && ds->index == info->sw_index &&
ds->ops->port_bridge_join)
ds->ops->port_bridge_leave(ds, info->port, info->br);
if ((dst->index != info->tree_index || ds->index != info->sw_index) &&
ds->ops->crosschip_bridge_join)
ds->ops->crosschip_bridge_leave(ds, info->tree_index,
info->sw_index, info->port,
info->br);
/* If the bridge was vlan_filtering, the bridge core doesn't trigger an
* event for changing vlan_filtering setting upon slave ports leaving
* it. That is a good thing, because that lets us handle it and also
* handle the case where the switch's vlan_filtering setting is global
* (not per port). When that happens, the correct moment to trigger the
* vlan_filtering callback is only when the last port leaves the last
* VLAN-aware bridge.
*/
if (unset_vlan_filtering && ds->vlan_filtering_is_global) {
for (port = 0; port < ds->num_ports; port++) {
struct net_device *bridge_dev;
bridge_dev = dsa_to_port(ds, port)->bridge_dev;
if (bridge_dev && br_vlan_enabled(bridge_dev)) {
unset_vlan_filtering = false;
break;
}
}
}
if (unset_vlan_filtering) {
err = dsa_port_vlan_filtering(dsa_to_port(ds, info->port),
false, &extack);
if (extack._msg)
dev_err(ds->dev, "port %d: %s\n", info->port,
extack._msg);
if (err && err != EOPNOTSUPP)
return err;
}
return 0;
}
/* Matches for all upstream-facing ports (the CPU port and all upstream-facing
* DSA links) that sit between the targeted port on which the notifier was
* emitted and its dedicated CPU port.
*/
static bool dsa_switch_host_address_match(struct dsa_switch *ds, int port,
int info_sw_index, int info_port)
{
struct dsa_port *targeted_dp, *cpu_dp;
struct dsa_switch *targeted_ds;
targeted_ds = dsa_switch_find(ds->dst->index, info_sw_index);
targeted_dp = dsa_to_port(targeted_ds, info_port);
cpu_dp = targeted_dp->cpu_dp;
if (dsa_switch_is_upstream_of(ds, targeted_ds))
return port == dsa_towards_port(ds, cpu_dp->ds->index,
cpu_dp->index);
return false;
}
static struct dsa_mac_addr *dsa_mac_addr_find(struct list_head *addr_list,
const unsigned char *addr,
u16 vid)
{
struct dsa_mac_addr *a;
list_for_each_entry(a, addr_list, list)
if (ether_addr_equal(a->addr, addr) && a->vid == vid)
return a;
return NULL;
}
static int dsa_switch_do_mdb_add(struct dsa_switch *ds, int port,
const struct switchdev_obj_port_mdb *mdb)
{
struct dsa_port *dp = dsa_to_port(ds, port);
struct dsa_mac_addr *a;
int err;
/* No need to bother with refcounting for user ports */
if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp)))
return ds->ops->port_mdb_add(ds, port, mdb);
a = dsa_mac_addr_find(&dp->mdbs, mdb->addr, mdb->vid);
if (a) {
refcount_inc(&a->refcount);
return 0;
}
a = kzalloc(sizeof(*a), GFP_KERNEL);
if (!a)
return -ENOMEM;
err = ds->ops->port_mdb_add(ds, port, mdb);
if (err) {
kfree(a);
return err;
}
ether_addr_copy(a->addr, mdb->addr);
a->vid = mdb->vid;
refcount_set(&a->refcount, 1);
list_add_tail(&a->list, &dp->mdbs);
return 0;
}
static int dsa_switch_do_mdb_del(struct dsa_switch *ds, int port,
const struct switchdev_obj_port_mdb *mdb)
{
struct dsa_port *dp = dsa_to_port(ds, port);
struct dsa_mac_addr *a;
int err;
/* No need to bother with refcounting for user ports */
if (!(dsa_port_is_cpu(dp) || dsa_port_is_dsa(dp)))
return ds->ops->port_mdb_del(ds, port, mdb);
a = dsa_mac_addr_find(&dp->mdbs, mdb->addr, mdb->vid);
if (!a)
return -ENOENT;
if (!refcount_dec_and_test(&a->refcount))
return 0;
err = ds->ops->port_mdb_del(ds, port, mdb);
if (err) {
refcount_inc(&a->refcount);
return err;
}
list_del(&a->list);
kfree(a);
return 0;
}
static int dsa_switch_fdb_add(struct dsa_switch *ds,
struct dsa_notifier_fdb_info *info)
{
int port = dsa_towards_port(ds, info->sw_index, info->port);
if (!ds->ops->port_fdb_add)
return -EOPNOTSUPP;
return ds->ops->port_fdb_add(ds, port, info->addr, info->vid);
}
static int dsa_switch_fdb_del(struct dsa_switch *ds,
struct dsa_notifier_fdb_info *info)
{
int port = dsa_towards_port(ds, info->sw_index, info->port);
if (!ds->ops->port_fdb_del)
return -EOPNOTSUPP;
return ds->ops->port_fdb_del(ds, port, info->addr, info->vid);
}
static int dsa_switch_hsr_join(struct dsa_switch *ds,
struct dsa_notifier_hsr_info *info)
{
if (ds->index == info->sw_index && ds->ops->port_hsr_join)
return ds->ops->port_hsr_join(ds, info->port, info->hsr);
return -EOPNOTSUPP;
}
static int dsa_switch_hsr_leave(struct dsa_switch *ds,
struct dsa_notifier_hsr_info *info)
{
if (ds->index == info->sw_index && ds->ops->port_hsr_leave)
return ds->ops->port_hsr_leave(ds, info->port, info->hsr);
return -EOPNOTSUPP;
}
static int dsa_switch_lag_change(struct dsa_switch *ds,
struct dsa_notifier_lag_info *info)
{
if (ds->index == info->sw_index && ds->ops->port_lag_change)
return ds->ops->port_lag_change(ds, info->port);
if (ds->index != info->sw_index && ds->ops->crosschip_lag_change)
return ds->ops->crosschip_lag_change(ds, info->sw_index,
info->port);
return 0;
}
static int dsa_switch_lag_join(struct dsa_switch *ds,
struct dsa_notifier_lag_info *info)
{
if (ds->index == info->sw_index && ds->ops->port_lag_join)
return ds->ops->port_lag_join(ds, info->port, info->lag,
info->info);
if (ds->index != info->sw_index && ds->ops->crosschip_lag_join)
return ds->ops->crosschip_lag_join(ds, info->sw_index,
info->port, info->lag,
info->info);
return 0;
}
static int dsa_switch_lag_leave(struct dsa_switch *ds,
struct dsa_notifier_lag_info *info)
{
if (ds->index == info->sw_index && ds->ops->port_lag_leave)
return ds->ops->port_lag_leave(ds, info->port, info->lag);
if (ds->index != info->sw_index && ds->ops->crosschip_lag_leave)
return ds->ops->crosschip_lag_leave(ds, info->sw_index,
info->port, info->lag);
return 0;
}
static int dsa_switch_mdb_add(struct dsa_switch *ds,
struct dsa_notifier_mdb_info *info)
{
int port = dsa_towards_port(ds, info->sw_index, info->port);
if (!ds->ops->port_mdb_add)
return -EOPNOTSUPP;
return dsa_switch_do_mdb_add(ds, port, info->mdb);
}
static int dsa_switch_mdb_del(struct dsa_switch *ds,
struct dsa_notifier_mdb_info *info)
{
int port = dsa_towards_port(ds, info->sw_index, info->port);
if (!ds->ops->port_mdb_del)
return -EOPNOTSUPP;
return dsa_switch_do_mdb_del(ds, port, info->mdb);
}
static int dsa_switch_host_mdb_add(struct dsa_switch *ds,
struct dsa_notifier_mdb_info *info)
{
int err = 0;
int port;
if (!ds->ops->port_mdb_add)
return -EOPNOTSUPP;
for (port = 0; port < ds->num_ports; port++) {
if (dsa_switch_host_address_match(ds, port, info->sw_index,
info->port)) {
err = dsa_switch_do_mdb_add(ds, port, info->mdb);
if (err)
break;
}
}
return err;
}
static int dsa_switch_host_mdb_del(struct dsa_switch *ds,
struct dsa_notifier_mdb_info *info)
{
int err = 0;
int port;
if (!ds->ops->port_mdb_del)
return -EOPNOTSUPP;
for (port = 0; port < ds->num_ports; port++) {
if (dsa_switch_host_address_match(ds, port, info->sw_index,
info->port)) {
err = dsa_switch_do_mdb_del(ds, port, info->mdb);
if (err)
break;
}
}
return err;
}
static bool dsa_switch_vlan_match(struct dsa_switch *ds, int port,
struct dsa_notifier_vlan_info *info)
{
if (ds->index == info->sw_index && port == info->port)
return true;
if (dsa_is_dsa_port(ds, port))
return true;
return false;
}
static int dsa_switch_vlan_add(struct dsa_switch *ds,
struct dsa_notifier_vlan_info *info)
{
int port, err;
if (!ds->ops->port_vlan_add)
return -EOPNOTSUPP;
for (port = 0; port < ds->num_ports; port++) {
if (dsa_switch_vlan_match(ds, port, info)) {
err = ds->ops->port_vlan_add(ds, port, info->vlan,
info->extack);
if (err)
return err;
}
}
return 0;
}
static int dsa_switch_vlan_del(struct dsa_switch *ds,
struct dsa_notifier_vlan_info *info)
{
if (!ds->ops->port_vlan_del)
return -EOPNOTSUPP;
if (ds->index == info->sw_index)
return ds->ops->port_vlan_del(ds, info->port, info->vlan);
/* Do not deprogram the DSA links as they may be used as conduit
* for other VLAN members in the fabric.
*/
return 0;
}
static int dsa_switch_change_tag_proto(struct dsa_switch *ds,
struct dsa_notifier_tag_proto_info *info)
{
const struct dsa_device_ops *tag_ops = info->tag_ops;
int port, err;
if (!ds->ops->change_tag_protocol)
return -EOPNOTSUPP;
ASSERT_RTNL();
for (port = 0; port < ds->num_ports; port++) {
if (!dsa_is_cpu_port(ds, port))
continue;
err = ds->ops->change_tag_protocol(ds, port, tag_ops->proto);
if (err)
return err;
dsa_port_set_tag_protocol(dsa_to_port(ds, port), tag_ops);
}
/* Now that changing the tag protocol can no longer fail, let's update
* the remaining bits which are "duplicated for faster access", and the
* bits that depend on the tagger, such as the MTU.
*/
for (port = 0; port < ds->num_ports; port++) {
if (dsa_is_user_port(ds, port)) {
struct net_device *slave;
slave = dsa_to_port(ds, port)->slave;
dsa_slave_setup_tagger(slave);
/* rtnl_mutex is held in dsa_tree_change_tag_proto */
dsa_slave_change_mtu(slave, slave->mtu);
}
}
return 0;
}
static int dsa_switch_mrp_add(struct dsa_switch *ds,
struct dsa_notifier_mrp_info *info)
{
if (!ds->ops->port_mrp_add)
return -EOPNOTSUPP;
if (ds->index == info->sw_index)
return ds->ops->port_mrp_add(ds, info->port, info->mrp);
return 0;
}
static int dsa_switch_mrp_del(struct dsa_switch *ds,
struct dsa_notifier_mrp_info *info)
{
if (!ds->ops->port_mrp_del)
return -EOPNOTSUPP;
if (ds->index == info->sw_index)
return ds->ops->port_mrp_del(ds, info->port, info->mrp);
return 0;
}
static int
dsa_switch_mrp_add_ring_role(struct dsa_switch *ds,
struct dsa_notifier_mrp_ring_role_info *info)
{
if (!ds->ops->port_mrp_add)
return -EOPNOTSUPP;
if (ds->index == info->sw_index)
return ds->ops->port_mrp_add_ring_role(ds, info->port,
info->mrp);
return 0;
}
static int
dsa_switch_mrp_del_ring_role(struct dsa_switch *ds,
struct dsa_notifier_mrp_ring_role_info *info)
{
if (!ds->ops->port_mrp_del)
return -EOPNOTSUPP;
if (ds->index == info->sw_index)
return ds->ops->port_mrp_del_ring_role(ds, info->port,
info->mrp);
return 0;
}
static int dsa_switch_event(struct notifier_block *nb,
unsigned long event, void *info)
{
struct dsa_switch *ds = container_of(nb, struct dsa_switch, nb);
int err;
switch (event) {
case DSA_NOTIFIER_AGEING_TIME:
err = dsa_switch_ageing_time(ds, info);
break;
case DSA_NOTIFIER_BRIDGE_JOIN:
err = dsa_switch_bridge_join(ds, info);
break;
case DSA_NOTIFIER_BRIDGE_LEAVE:
err = dsa_switch_bridge_leave(ds, info);
break;
case DSA_NOTIFIER_FDB_ADD:
err = dsa_switch_fdb_add(ds, info);
break;
case DSA_NOTIFIER_FDB_DEL:
err = dsa_switch_fdb_del(ds, info);
break;
case DSA_NOTIFIER_HSR_JOIN:
err = dsa_switch_hsr_join(ds, info);
break;
case DSA_NOTIFIER_HSR_LEAVE:
err = dsa_switch_hsr_leave(ds, info);
break;
case DSA_NOTIFIER_LAG_CHANGE:
err = dsa_switch_lag_change(ds, info);
break;
case DSA_NOTIFIER_LAG_JOIN:
err = dsa_switch_lag_join(ds, info);
break;
case DSA_NOTIFIER_LAG_LEAVE:
err = dsa_switch_lag_leave(ds, info);
break;
case DSA_NOTIFIER_MDB_ADD:
err = dsa_switch_mdb_add(ds, info);
break;
case DSA_NOTIFIER_MDB_DEL:
err = dsa_switch_mdb_del(ds, info);
break;
case DSA_NOTIFIER_HOST_MDB_ADD:
err = dsa_switch_host_mdb_add(ds, info);
break;
case DSA_NOTIFIER_HOST_MDB_DEL:
err = dsa_switch_host_mdb_del(ds, info);
break;
case DSA_NOTIFIER_VLAN_ADD:
err = dsa_switch_vlan_add(ds, info);
break;
case DSA_NOTIFIER_VLAN_DEL:
err = dsa_switch_vlan_del(ds, info);
break;
case DSA_NOTIFIER_MTU:
err = dsa_switch_mtu(ds, info);
break;
case DSA_NOTIFIER_TAG_PROTO:
err = dsa_switch_change_tag_proto(ds, info);
break;
case DSA_NOTIFIER_MRP_ADD:
err = dsa_switch_mrp_add(ds, info);
break;
case DSA_NOTIFIER_MRP_DEL:
err = dsa_switch_mrp_del(ds, info);
break;
case DSA_NOTIFIER_MRP_ADD_RING_ROLE:
err = dsa_switch_mrp_add_ring_role(ds, info);
break;
case DSA_NOTIFIER_MRP_DEL_RING_ROLE:
err = dsa_switch_mrp_del_ring_role(ds, info);
break;
default:
err = -EOPNOTSUPP;
break;
}
if (err)
dev_dbg(ds->dev, "breaking chain for DSA event %lu (%d)\n",
event, err);
return notifier_from_errno(err);
}
int dsa_switch_register_notifier(struct dsa_switch *ds)
{
ds->nb.notifier_call = dsa_switch_event;
return raw_notifier_chain_register(&ds->dst->nh, &ds->nb);
}
void dsa_switch_unregister_notifier(struct dsa_switch *ds)
{
int err;
err = raw_notifier_chain_unregister(&ds->dst->nh, &ds->nb);
if (err)
dev_err(ds->dev, "failed to unregister notifier (%d)\n", err);
}