03db3a2d81
RoCE GIDs are based on IP addresses configured on Ethernet net-devices which relate to the RDMA (RoCE) device port. Currently, each of the low-level drivers that support RoCE (ocrdma, mlx4) manages its own RoCE port GID table. As there's nothing which is essentially vendor specific, we generalize that, and enhance the RDMA core GID cache to do this job. In order to populate the GID table, we listen for events: (a) netdev up/down/change_addr events - if a netdev is built onto our RoCE device, we need to add/delete its IPs. This involves adding all GIDs related to this ndev, add default GIDs, etc. (b) inet events - add new GIDs (according to the IP addresses) to the table. For programming the port RoCE GID table, providers must implement the add_gid and del_gid callbacks. RoCE GID management requires us to state the associated net_device alongside the GID. This information is necessary in order to manage the GID table. For example, when a net_device is removed, its associated GIDs need to be removed as well. RoCE mandates generating a default GID for each port, based on the related net-device's IPv6 link local. In contrast to the GID based on the regular IPv6 link-local (as we generate GID per IP address), the default GID is also available when the net device is down (in order to support loopback). Locking is done as follows: The patch modify the GID table code both for new RoCE drivers implementing the add_gid/del_gid callbacks and for current RoCE and IB drivers that do not. The flows for updating the table are different, so the locking requirements are too. While updating RoCE GID table, protection against multiple writers is achieved via mutex_lock(&table->lock). Since writing to a table requires us to find an entry (possible a free entry) in the table and then modify it, this mutex protects both the find_gid and write_gid ensuring the atomicity of the action. Each entry in the GID cache is protected by rwlock. In RoCE, writing (usually results from netdev notifier) involves invoking the vendor's add_gid and del_gid callbacks, which could sleep. Therefore, an invalid flag is added for each entry. Updates for RoCE are done via a workqueue, thus sleeping is permitted. In IB, updates are done in write_lock_irq(&device->cache.lock), thus write_gid isn't allowed to sleep and add_gid/del_gid are not called. When passing net-device into/out-of the GID cache, the device is always passed held (dev_hold). The code uses a single work item for updating all RDMA devices, following a netdev or inet notifier. The patch moves the cache from being a client (which was incorrect, as the cache is part of the IB infrastructure) to being explicitly initialized/freed when a device is registered/removed. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> |
||
---|---|---|
.. | ||
addr.c | ||
agent.c | ||
agent.h | ||
cache.c | ||
cm_msgs.h | ||
cm.c | ||
cma.c | ||
core_priv.h | ||
device.c | ||
fmr_pool.c | ||
iwcm.c | ||
iwcm.h | ||
iwpm_msg.c | ||
iwpm_util.c | ||
iwpm_util.h | ||
mad_priv.h | ||
mad_rmpp.c | ||
mad_rmpp.h | ||
mad.c | ||
Makefile | ||
multicast.c | ||
netlink.c | ||
opa_smi.h | ||
packer.c | ||
roce_gid_mgmt.c | ||
sa_query.c | ||
sa.h | ||
smi.c | ||
smi.h | ||
sysfs.c | ||
ucm.c | ||
ucma.c | ||
ud_header.c | ||
umem_odp.c | ||
umem_rbtree.c | ||
umem.c | ||
user_mad.c | ||
uverbs_cmd.c | ||
uverbs_main.c | ||
uverbs_marshall.c | ||
uverbs.h | ||
verbs.c |