IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Fix typo of the cited commit that calls to mlx5_create_ttc_table, instead
of mlx5_create_inner_ttc_table.
Fixes: f4b45940e9b9 ("net/mlx5: Embed mlx5_ttc_table")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Enable user to disable VDPA net auxiliary device so that when it is not
required, user can disable it.
For example,
$ devlink dev param set pci/0000:06:00.0 \
name enable_vnet value false cmode driverinit
$ devlink dev reload pci/0000:06:00.0
At this point devlink instance do not create auxiliary device
mlx5_core.vnet.2 for the VDPA net functionality.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable user to disable RDMA auxiliary device so that when it is not
required, user can disable it.
For example,
$ devlink dev param set pci/0000:06:00.0 \
name enable_rdma value false cmode driverinit
$ devlink dev reload pci/0000:06:00.0
At this point devlink instance do not create auxiliary device
mlx5_core.rdma.2 for the RDMA functionality.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable user to disable Ethernet auxiliary device so that when it is not
required, user can disable it.
For example,
$ devlink dev param set pci/0000:06:00.0 \
name enable_eth value false cmode driverinit
$ devlink dev reload pci/0000:06:00.0
At this point devlink instance do not create mlx5_core.eth.2 auxiliary
device for the Ethernet functionality.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cleanup routine missed to unpublish the parameters. Add it.
Fixes: e890acd5ff18 ("net/mlx5: Add devlink flow_steering_mode parameter")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed says:
====================
pull-request: mlx5-next 2020-08-9
This pulls mlx5-next branch which includes patches already reviewed on
net-next and rdma mailing lists.
1) mlx5 single E-Switch FDB for lag
2) IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq
3) Add DCS caps & fields support
[1] https://patchwork.kernel.org/project/netdevbpf/cover/20210803231959.26513-1-saeed@kernel.org/
[2] 0e3364dab7.1626609184.git.leonro@nvidia.com/
[3] 55e1d69bef.1624258894.git.leonro@nvidia.com/
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Lag, Create shared FDB when in switchdev mode
net/mlx5: E-Switch, add logic to enable shared FDB
net/mlx5: Lag, move lag destruction to a workqueue
net/mlx5: Lag, properly lock eswitch if needed
net/mlx5: Add send to vport rules on paired device
net/mlx5: E-Switch, Add event callback for representors
net/mlx5e: Use shared mappings for restoring from metadata
net/mlx5e: Add an option to create a shared mapping
net/mlx5: E-Switch, set flow source for send to uplink rule
RDMA/mlx5: Add shared FDB support
{net, RDMA}/mlx5: Extend send to vport rules
RDMA/mlx5: Fill port info based on the relevant eswitch
net/mlx5: Lag, add initial logic for shared FDB
net/mlx5: Return mdev from eswitch
IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq
net/mlx5: Add DCS caps & fields support
====================
Link: https://lore.kernel.org/r/20210809202522.316930-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch adds skb's frag page recycling support based on
the frag page support in page pool.
The performance improves above 10~20% for single thread iperf
TCP flow with IOMMU disabled when iperf server and irq/NAPI
have a different CPU.
The performance improves about 135%(14Gbit to 33Gbit) for single
thread iperf TCP flow when IOMMU is in strict mode and iperf
server shares the same cpu with irq/NAPI.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, page->pp is cleared and set everytime the page
is recycled, which is unnecessary.
So only set the page->pp when the page is added to the page
pool and only clear it when the page is released from the
page pool.
This is also a preparation to support allocating frag page
in page pool.
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
All kernel devlink implementations call to devlink_alloc() during
initialization routine for specific device which is used later as
a parent device for devlink_register().
Such late device assignment causes to the situation which requires us to
call to device_register() before setting other parameters, but that call
opens devlink to the world and makes accessible for the netlink users.
Any attempt to move devlink_register() to be the last call generates the
following error due to access to the devlink->dev pointer.
[ 8.758862] devlink_nl_param_fill+0x2e8/0xe50
[ 8.760305] devlink_param_notify+0x6d/0x180
[ 8.760435] __devlink_params_register+0x2f1/0x670
[ 8.760558] devlink_params_register+0x1e/0x20
The simple change of API to set devlink device in the devlink_alloc()
instead of devlink_register() fixes all this above and ensures that
prior to call to devlink_register() everything already set.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Devlink port already has pointer to the devlink instance and all API
calls that forward these devlink ports to the drivers perform same
"devlink_port->devlink" assignment before actual call.
This patch removes useless parameter and allows us in the future
to create specific devlink_port_ops to manage user space access with
reliable ops assignment.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When compiling with clang in certain configurations, an objtool warning
appears:
drivers/net/ethernet/stmicro/stmmac/dwmac-ipq806x.o: warning: objtool:
ipq806x_gmac_probe() falls through to next function phy_modes()
This happens because the unreachable annotation in the third switch
statement is not eliminated. The compiler should know that the first
default case would prevent the second and third from being reached as
the comment notes but sanitizer options can make it harder for the
compiler to reason this out.
Help the compiler out by eliminating the unreachable() annotation and
unifying the default case error handling so that there is no objtool
warning, the meaning of the code stays the same, and there is less
duplication.
Reported-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The true check on the variable startable in the ternary operator
is always false because the previous if statement handles the true
condition for startable. Hence the ternary check is dead code and
can be removed.
Addresses-Coverity: ("Logically dead code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch enables support for hard irqs deferral feature from Eric Dumazet
[1] for TI K3 CPSW driver by using napi_complete_done() in TX completion
path.
Depending on gro_flush_timeout and napi_defer_hard_irqs at gives up to 30%
CPU utilization reduction:
gro_flush_timeout=50000
napi_defer_hard_irqs=2
netperf -l 10 -H 192.168.1.1 -t UDP_STREAM -c -C -- -m 1470
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET
Socket Message Elapsed Messages CPU Service
Size Size Time Okay Errors Throughput Util Demand
bytes bytes secs # # 10^6bits/sec % SS us/KB
before:
212992 1470 10.00 809632 0 952.0 42.98 14.792
212992 10.00 809630 952.0 50.66 8.719
after:
212992 1470 10.00 813686 0 956.8 32.14 11.009
212992 10.00 813686 956.8 50.05 8.570
[1] https://lore.kernel.org/netdev/20200422161329.56026-1-edumazet@google.com/
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On TI K3 am64x platform the issue with RX IRQ is observed - it's become
disabled forever after .ndo_stop(). The K3 CPSW driver manipulates RX IRQ
by using standard Linux enable_irq()/disable_irq_nosync() API as there is
no IRQ enable/disable options in CPSW HW itself, as result during
.ndo_stop() following sequence happens
phy_stop()
teardown TX/RX channels
wait for TX tdown complete
napi_disable(TX)
clean up TX channels
(a)
napi_disable(RX)
At point (a) it's not possible to predict if RX IRQ was triggered or not.
if RX IRQ was triggered then it also not possible to definitely say if RX
NAPI was run or only scheduled and immediately canceled by
napi_disable(RX). Actually the last case causes RX IRQ to be permanently
disabled.
Another observed issue is that RX IRQ enable counter become unbalanced if
(gro_flush_timeout =! 0) while (napi_defer_hard_irqs == 0):
Unbalanced enable for IRQ 44
WARNING: CPU: 0 PID: 10 at ../kernel/irq/manage.c:776 __enable_irq+0x38/0x80
__enable_irq+0x38/0x80
enable_irq+0x54/0xb0
am65_cpsw_nuss_rx_poll+0x2f4/0x368
__napi_poll+0x34/0x1b8
net_rx_action+0xe4/0x220
_stext+0x11c/0x284
run_ksoftirqd+0x4c/0x60
To avoid above issues introduce flag indicating if RX was actually disabled
before enabling it in am65_cpsw_nuss_rx_poll() and restore RX IRQ state in
.ndo_open()
Fixes: 4f7cce272403 ("net: ethernet: ti: am65-cpsw: add support for am64x cpsw3g")
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hence all users of davinci_cpdma switched to skb_put_padto() the frame
padding can be removed from it.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use skb_put_padto() instead of skb_padto() so skb->len also got updated, as
preparation for further removing frame padding from cpdma.
It also makes xmit path more clear and linear.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use skb_put_padto() instead of skb_padto() so skb->len also got updated, as
preparation for further removing frame padding from cpdma.
It also makes xmit path more clear and linear.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Build failure in drivers/net/wwan/mhi_wwan_mbim.c:
add missing parameter (0, assuming we don't want buffer pre-alloc).
Conflict in drivers/net/dsa/sja1105/sja1105_main.c between:
589918df9322 ("net: dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too")
0fac6aa098ed ("net: dsa: sja1105: delete the best_effort_vlan_filtering mode")
Follow the instructions from the commit message of the former commit
- removed the if conditions. When looking at commit 589918df9322 ("net:
dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too")
note that the mask_iotag fields get removed by the following patch.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If both eswitches are in switchdev mode and the uplink representors
are enslaved to the same bond device create a shared FDB configuration.
When moving to shared FDB mode not only the hardware needs be configured
but the RDMA driver needs to reconfigure itself.
When such change is done, unload the RDMA devices, configure the hardware
and load the RDMA representors.
When destroying the lag (can happen if a PCI function is unbinded,
driver is unloaded or by just removing a netdev from the bond) make sure
to restore the system to the previous state only if possible.
For example, if a PCI function is unbinded there is no need to load the
representors as the device is going away.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Shared FDB allows to direct traffic from all the vports in the HCA to a
single eswitch. In order to do that three things are needed.
1) Point the ingress ACL of the slave uplink to that of the master.
With this, wire traffic from both uplinks will reach the same eswitch
with the same metadata where a single steering rule can catch traffic
from both ports.
2) Set the FDB root flow table of the slave's eswitch to that of the
master. As this flow table can change dynamically make sure to
sync it on any set root flow table FDB command.
This will make sure traffic from SFs, VFs, ECPFs and PFs reach the
master eswitch.
3) Split wire traffic at the eswitch manager egress ACL so that it's
directed to the native eswitch manager. We only treat wire traffic
from both ports the same at the eswitch level. If such traffic wasn't
handled in the eswitch it needs to reach the right representor to be
processed by software. For example LACP packets should *always*
reach the right uplink representor for correct operation.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
If a netdev is removed from the lag the lag should be destroyed.
With downstream patches this might trigger a reconfiguration of
representors on a different eswitch and such we don't have the proper
locking to so from this path. Move the destruction to be done by the
workqueue.
As the destruction won't affect the netdev side it okay to do so.
The RDMA side will be reconfigured and it already coded to handle such
reconfiguration.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently when doing hardware lag we check the eswitch mode
but as this isn't done under a lock the check isn't valid.
As the code needs to sync between two different devices an extra
care is needed.
- When going to change eswitch mode, if hardware lag is active destroy it.
- While changing eswitch modes block any hardware bond creation.
- Delay handling bonding events until there are no mode changes in
progress.
- When attaching a new mdev to lag, block until there is no mode change
in progress. In order for the mode change to finish the interface lock
will have to be taken. Release the lock and sleep for 100ms to
allow forward progress. As this is a very rare condition (can happen if
the user unbinds and binds a PCI function while also changing eswitch
mode of the other PCI function) it has no real world impact.
As taking multiple eswitch mode locks is now required lockdep will
complain about a possible deadlock. Register a key per eswitch to make
lockdep happy.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When two mlx5 devices are paired in switchdev mode, always offload the
send-to-vport rule to the peer E-Switch. This allows to abstract
the logic when this is really necessary (single FDB) and combine
the logic of both cases into one.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This callback will allow to notify representors about relevant events
when in OFFLOADS mode. In downstream patches, this will be used to notify
about PAIR/UNPAIR devcom events.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
FTEs are added with mapped metadata which is saved per eswitch.
When uplink reps are bonded and we are in a single FDB mode,
we could fail to find metadata which was stored on one eswitch mapping
but not the other or with a different id.
To resolve this issue use shared mapping between eswitch ports.
We do not have any conflict using a single mapping, for a type,
between the ports.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Set the flow source param to local vport for the uplink rep
send-to-vport rule.
This will comply with the recent changes in SW steering that
use the flow source as an indication for the rule type - rx or tx.
Since the uplink send-to-vport rule is forwarding traffic to the wire
it has to indicate that it is an sx rule and can't use the any port
value in the flow source.
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
In shared FDB there is only one eswitch which is active and it receives
traffic from all representors and all vports in the HCA.
While the Ethernet representor will always reside on its native PF
the IB representor will not. Extend send to vport rule creation to
support such flows. Need to account for source vport that sends the
traffic (on which the representors resides) and the target eswitch
the traffic which reach.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
As shared FDB requires changes in two subsystems first expose the needed
core functions so the RDMA side can be changed.
mlx5_lag_is_master(): return true if a given mlx5 device is the lag master.
mlx5_lag_is_shared_fdb(): Returns true if the lag mode is shared FDB.
mlx5_lag_get_peer_mdev(): Return the peer mdev in lag.
The mentioned functions will be used by downstream patches in order
to add support for shared FDB for the RDMA side.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Export a function so users can retrieve the mellanox device that manages
the eswitch from the eswitch device.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Smatch says:
drivers/net/ethernet/neterion/vxge/vxge-main.c:3518 vxge_device_unregister() error: Using vdev after free_{netdev,candev}(dev);
drivers/net/ethernet/neterion/vxge/vxge-main.c:3518 vxge_device_unregister() error: Using vdev after free_{netdev,candev}(dev);
drivers/net/ethernet/neterion/vxge/vxge-main.c:3520 vxge_device_unregister() error: Using vdev after free_{netdev,candev}(dev);
drivers/net/ethernet/neterion/vxge/vxge-main.c:3520 vxge_device_unregister() error: Using vdev after free_{netdev,candev}(dev);
Since vdev pointer is netdev private data accessing it after free_netdev()
call can cause use-after-free bug. Fix it by moving free_netdev() call at
the end of the function
Fixes: 6cca200362b4 ("vxge: cleanup probe error paths")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Smatch says:
drivers/net/ethernet/freescale/fec_main.c:3994 fec_drv_remove() error: Using fep after free_{netdev,candev}(ndev);
drivers/net/ethernet/freescale/fec_main.c:3995 fec_drv_remove() error: Using fep after free_{netdev,candev}(ndev);
Since fep pointer is netdev private data, accessing it after free_netdev()
call can cause use-after-free bug. Fix it by moving free_netdev() call at
the end of the function
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: a31eda65ba21 ("net: fec: fix clock count mis-match")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Reviewed-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The am65_cpsw_port_offload_fwd_mark_update() causes NULL exception crash
when there is at least one disabled port and any other port added to the
bridge first time.
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000858
pc : am65_cpsw_port_offload_fwd_mark_update+0x54/0x68
lr : am65_cpsw_netdevice_event+0x8c/0xf0
Call trace:
am65_cpsw_port_offload_fwd_mark_update+0x54/0x68
notifier_call_chain+0x54/0x98
raw_notifier_call_chain+0x14/0x20
call_netdevice_notifiers_info+0x34/0x78
__netdev_upper_dev_link+0x1c8/0x290
netdev_master_upper_dev_link+0x1c/0x28
br_add_if+0x3f0/0x6d0 [bridge]
Fix it by adding proper check for port->ndev != NULL.
Fixes: 2934db9bcb30 ("net: ti: am65-cpsw-nuss: Add netdevice notifiers")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the error code if bnx2x_alloc_fw_stats_mem() fails. The current
code returns success.
Fixes: ad5afc89365e ("bnx2x: Separate VF and PF logic")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit b0e81817629a496854ff1799f6cbd89597db65fd. Explicit
driver dependency on the bridge is no longer needed since
switchdev_bridge_port_{,un}offload() is no longer implemented by the
bridge driver but by switchdev.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the introduction of explicit offloading API in switchdev in commit
2f5dc00f7a3e ("net: bridge: switchdev: let drivers inform which bridge
ports are offloaded"), we started having Ethernet switch drivers calling
directly into a function exported by net/bridge/br_switchdev.c, which is
a function exported by the bridge driver.
This means that drivers that did not have an explicit dependency on the
bridge before, like cpsw and am65-cpsw, now do - otherwise it is not
possible to call a symbol exported by a driver that can be built as
module unless you are a module too.
There was an attempt to solve the dependency issue in the form of commit
b0e81817629a ("net: build all switchdev drivers as modules when the
bridge is a module"). Grygorii Strashko, however, says about it:
| In my opinion, the problem is a bit bigger here than just fixing the
| build :(
|
| In case, of ^cpsw the switchdev mode is kinda optional and in many
| cases (especially for testing purposes, NFS) the multi-mac mode is
| still preferable mode.
|
| There were no such tight dependency between switchdev drivers and
| bridge core before and switchdev serviced as independent, notification
| based layer between them, so ^cpsw still can be "Y" and bridge can be
| "M". Now for mostly every kernel build configuration the CONFIG_BRIDGE
| will need to be set as "Y", or we will have to update drivers to
| support build with BRIDGE=n and maintain separate builds for
| networking vs non-networking testing. But is this enough? Wouldn't
| it cause 'chain reaction' required to add more and more "Y" options
| (like CONFIG_VLAN_8021Q)?
|
| PS. Just to be sure we on the same page - ARM builds will be forced
| (with this patch) to have CONFIG_TI_CPSW_SWITCHDEV=m and so all our
| automation testing will just fail with omap2plus_defconfig.
In the light of this, it would be desirable for some configurations to
avoid dependencies between switchdev drivers and the bridge, and have
the switchdev mode as completely optional within the driver.
Arnd Bergmann also tried to write a patch which better expressed the
build time dependency for Ethernet switch drivers where the switchdev
support is optional, like cpsw/am65-cpsw, and this made the drivers
follow the bridge (compile as module if the bridge is a module) only if
the optional switchdev support in the driver was enabled in the first
place:
https://patchwork.kernel.org/project/netdevbpf/patch/20210802144813.1152762-1-arnd@kernel.org/
but this still did not solve the fact that cpsw and am65-cpsw now must
be built as modules when the bridge is a module - it just expressed
correctly that optional dependency. But the new behavior is an apparent
regression from Grygorii's perspective.
So to support the use case where the Ethernet driver is built-in,
NET_SWITCHDEV (a bool option) is enabled, and the bridge is a module, we
need a framework that can handle the possible absence of the bridge from
the running system, i.e. runtime bloatware as opposed to build-time
bloatware.
Luckily we already have this framework, since switchdev has been using
it extensively. Events from the bridge side are transmitted to the
driver side using notifier chains - this was originally done so that
unrelated drivers could snoop for events emitted by the bridge towards
ports that are implemented by other drivers (think of a switch driver
with LAG offload that listens for switchdev events on a bonding/team
interface that it offloads).
There are also events which are transmitted from the driver side to the
bridge side, which again are modeled using notifiers.
SWITCHDEV_FDB_ADD_TO_BRIDGE is an example of this, and deals with
notifying the bridge that a MAC address has been dynamically learned.
So there is a precedent we can use for modeling the new framework.
The difference compared to SWITCHDEV_FDB_ADD_TO_BRIDGE is that the work
that the bridge needs to do when a port becomes offloaded is blocking in
its nature: replay VLANs, MDBs etc. The calling context is indeed
blocking (we are under rtnl_mutex), but the existing switchdev
notification chain that the bridge is subscribed to is only the atomic
one. So we need to subscribe the bridge to the blocking switchdev
notification chain too.
This patch:
- keeps the driver-side perception of the switchdev_bridge_port_{,un}offload
unchanged
- moves the implementation of switchdev_bridge_port_{,un}offload from
the bridge module into the switchdev module.
- makes everybody that is subscribed to the switchdev blocking notifier
chain "hear" offload & unoffload events
- makes the bridge driver subscribe and handle those events
- moves the bridge driver's handling of those events into 2 new
functions called br_switchdev_port_{,un}offload. These functions
contain in fact the core of the logic that was previously in
switchdev_bridge_port_{,un}offload, just that now we go through an
extra indirection layer to reach them.
Unlike all the other switchdev notification structures, the structure
used to carry the bridge port information, struct
switchdev_notifier_brport_info, does not contain a "bool handled".
This is because in the current usage pattern, we always know that a
switchdev bridge port offloading event will be handled by the bridge,
because the switchdev_bridge_port_offload() call was initiated by a
NETDEV_CHANGEUPPER event in the first place, where info->upper_dev is a
bridge. So if the bridge wasn't loaded, then the CHANGEUPPER event
couldn't have happened.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid reconfig problems due to failures in netif_set_real_num_tx_queues()
by using netif_set_real_num_queues().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Devlink trap group is registered but not released in error flow,
add the missing devlink_trap_groups_unregister() call.
Fixes: 0a9003f45e91 ("net: marvell: prestera: devlink: add traps/groups implementation")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a switch port is connected to a MAC, use the common dpaa2-mac support
for exporting the available MAC statistics.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the next patch, we'll add support for also exporting the MAC
statistics in the ethtool stats. Annotate already present HW stats with
a suggestive prefix.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Integrate the common MAC endpoint management support into the
dpaa2-switch driver as well. Nothing special happens here, just that the
already available dpaa2-mac functions are also called from dpaa2-switch.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of a switch DPAA2 object, the interface ID is also needed when
querying for the object endpoint. Extend fsl_mc_get_endpoint() so that
users can also pass the interface ID that are interested in.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The call to dpaa2_switch_port_link_state_update is a leftover from the
time when on DPAA2 platforms the PHYs were started at boot time so when
an ifconfig was issued on the associated interface, the link status
needed to be checked directly from the ndo_open() callback. This is not
needed anymore since we are now properly integrated with the PHY layer
thus a link interrupt will come directly from the PHY eventually without
the need to call the sync function.
Fix this up by removing the call to dpaa2_switch_port_link_state_update.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should not enable the switch interfaces at probe time since this is
trigged by the open callback. Remove the call dpsw_enable() which does
exactly this.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MC firmware supplies us the switch interface index for which an
interrupt was triggered. Use this to our advantage instead of looping
through all the switch ports and doing unnecessary work.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Request all interrupt sources to be read and then cleared on the DPSW
object. In the next patches we'll also add support for treating other
interrupts.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch intends to fix MAC internal delay doesn't work, due to use
of_property_read_u32() incorrectly, and improve this feature a bit:
1) check the delay value if valid, only program register when it's 2000ps.
2) only enable "enet_2x_txclk" clock when require MAC internal delay.
Fixes: fc539459e900 ("net: fec: add MAC internal delayed clock feature support")
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Link: https://lore.kernel.org/r/20210803052424.19008-1-qiangqing.zhang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>