linux

iv/linux

Author	SHA1	Message	Date
Alexandra Winter	780b6e7db2	s390/qeth: implement ndo_bridge_getlink for learning_sync Documentation/networking/switchdev.txt and 'man bridge' indicate that the learning_sync bridge attribute is used to indicate whether a given device will sync MAC addresses learned on its device port to a master bridge FDB. learning_sync attribute can not be read while interface is offline (down). See 'commit e6e771b3d897 ("s390/qeth: detach netdevice while card is offline")' We return EOPNOTSUPP and not EONODEV in this case, because EONOTSUPP is the only rc that is tolerated by 'bridge -d link show'. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:21:47 -07:00
Alexandra Winter	817741a8ea	s390/qeth: Reset address notification in case of buffer overflow In case hardware sends more device-to-bridge-address-change notfications than the qeth-l2 driver can handle, the hardware will send an overflow event and then stop sending any events. It expects software to flush its FDB and start over again. Re-enabling address-change-notification will report all current addresses. In order to re-enable address-change-notification this patch defines the functions qeth_l2_dev2br_an_set() and qeth_l2_dev2br_an_set_cb to enable or disable dev-to-bridge-address-notification. A following patch will use the learning_sync bridgeport flag to trigger enabling or disabling of address-change-notification, so we define priv->brport_features to store the current setting. BRIDGE_INFO and ADDR_INFO functionality are mutually exclusive, whereas ADDR_INFO and qeth_l2_vnicc* can be used together. Alternative implementations to handle buffer overflow: Just re-enabling notification and adding all newly reported addresses would cover any lost 'add' events, but not the lost 'delete' events. Then these invalid addresses would stay in the bridge FDB as long as the device exists. Setting the net device down and up, would be an alternative, but is a bit drastic. If the net device has many secondary addresses this will create many delete/add events at its peers which could de-stabilize the network segment. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:21:47 -07:00
Alexandra Winter	d05e8e68b0	bridge: Add SWITCHDEV_FDB_FLUSH_TO_BRIDGE notifier so the switchdev can notifiy the bridge to flush non-permanent fdb entries for this port. This is useful whenever the hardware fdb of the switchdev is reset, but the netdev and the bridgeport are not deleted. Note that this has the same effect as the IFLA_BRPORT_FLUSH attribute. CC: Jiri Pirko <jiri@resnulli.us> CC: Ivan Vecera <ivecera@redhat.com> CC: Roopa Prabhu <roopa@nvidia.com> CC: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:21:47 -07:00
Alexandra Winter	10a6cfc0fc	s390/qeth: Translate address events into switchdev notifiers A qeth-l2 HiperSockets card can show switch-ish behaviour in the sense, that it can report all MACs that are reachable via this interface. Just like a switch device, it can notify the software bridge about changes to its fdb. This patch exploits this device-to-bridge-notification and extracts the relevant information from the hardware events to generate notifications to an attached software bridge. There are 2 sources for this information: 1) The reply message of Perform-Network-Subchannel-Operations (PNSO) (operation code ADDR_INFO) reports all addresses that are currently reachable (implemented in a later patch). 2) As long as device-to-bridge-notification is enabled, hardware will generate address change notification events, whenever the content of the hardware fdb changes (this patch). The bridge_hostnotify feature (PNSO operation code BRIDGE_INFO) uses the same address change notification events. We need to distinguish between qeth_pnso_mode QETH_PNSO_BRIDGEPORT and QETH_PNSO_ADDR_INFO and call a different handler. In both cases deadlocks must be prevented, if the workqueue is drained under lock and QETH_PNSO_NONE, when notification is disabled. bridge_hostnotify generates udev events, there is no intend to do the same for dev2br. Instead this patch will generate SWITCHDEV_FDB_ADD_TO_BRIDGE and SWITCHDEV_FDB_DEL_TO_BRIDGE notifications, that will cause the software bridge to add (or delete) entries to its fdb as 'extern_learn offload'. Documentation/networking/switchdev.txt proposes to add "depends NET_SWITCHDEV" to driver's Kconfig. This is not done here, so even in absence of the NET_SWITCHDEV module, the QETH_L2 module will still be built, but then the switchdev notifiers will have no effect. No VLAN filtering is done on the entries and VLAN information is not passed on to the bridge fdb entries. This could be added later. For now VLAN interfaces can be defined on the upper bridge interface. Multicast entries are not passed on to the bridge fdb. This could be added later. For now mcast flooding can be used in the bridge. The card reports all MACs that are in its FDB, but we must not pass on MACs that are registered for this interface. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:21:47 -07:00
Alexandra Winter	fa115adff2	s390/qeth: Detect PNSO OC3 capability This patch detects whether device-to-bridge-notification, provided by the Perform Network Subchannel Operation (PNSO) operation code ADDR_INFO (OC3), is supported by this card. A following patch will map this to the learning_sync bridgeport flag, so we store it in priv->brport_hw_features in bridgeport flag format. Only IQD cards provide PNSO. There is a feature bit to indicate whether the machine provides OC3, unfortunately it is not set on old machines. So PNSO is called to find out. As this will disable notification and is exclusive with bridgeport_notification, this must be done during card initialisation before previous settings are restored. PNSO functionality requires some configuration values that are added to the qeth_card.info structure. Some helper functions are defined to fill them out when the card is brought online and some other places are adapted, that can also benefit from these fields. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:21:47 -07:00
Alexandra Winter	b983aa1f7d	s390/cio: Helper functions to read CSSID, IID, and CHID Add helper functions to expose Channel Subsystem ID (CSSID), MIF Image Id (IID), Channel ID (CHID) and Channel Path ID (CHPID). These values are required by the qeth driver's exploitation of network- address-change-notifications to determine which entries belong to this interface. Store the Partition identifier in System log, as this may be used to map a Linux view to a Hardware view for debugging purpose. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:21:46 -07:00
Alexandra Winter	4fea49a79e	s390/cio: Add new Operation Code OC3 to PNSO Add support for operation code 3 (OC3) of the Perform-Network-Subchannel-Operations (PNSO) function of the Channel-Subsystem-Call (CHSC) instruction. PNSO provides 2 operation codes: OC0 - BRIDGE_INFO OC3 - ADDR_INFO (new) Extend the function calls to pnso to pass the OC and add new response code 0108. Support for OC3 is indicated by a flag in the css_general_characteristics. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:21:46 -07:00
Soheil Hassas Yeganeh	afb83012cc	tcp: schedule EPOLLOUT after a partial sendmsg For EPOLLET, applications must call sendmsg until they get EAGAIN. Otherwise, there is no guarantee that EPOLLOUT is sent if there was a failure upon memory allocation. As a result on high-speed NICs, userspace observes multiple small sendmsgs after a partial sendmsg until EAGAIN, since TCP can send 1-2 TSOs in between two sendmsg syscalls: // One large partial send due to memory allocation failure. sendmsg(20MB) = 2MB // Many small sends until EAGAIN. sendmsg(18MB) = 64KB sendmsg(17.9MB) = 128KB sendmsg(17.8MB) = 64KB ... sendmsg(...) = EAGAIN // At this point, userspace can assume an EPOLLOUT. To fix this, set the SOCK_NOSPACE on all partial sendmsg scenarios to guarantee that we send EPOLLOUT after partial sendmsg. After this commit userspace can assume that it will receive an EPOLLOUT after the first partial sendmsg. This EPOLLOUT will benefit from sk_stream_write_space() logic delaying the EPOLLOUT until significant space is available in write queue. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 16:58:24 -07:00
Soheil Hassas Yeganeh	8ba3c9d1c6	tcp: return EPOLLOUT from tcp_poll only when notsent_bytes is half the limit If there was any event available on the TCP socket, tcp_poll() will be called to retrieve all the events. In tcp_poll(), we call sk_stream_is_writeable() which returns true as long as we are at least one byte below notsent_lowat. This will result in quite a few spurious EPLLOUT and frequent tiny sendmsg() calls as a result. Similar to sk_stream_write_space(), use __sk_stream_is_writeable with a wake value of 1, so that we set EPOLLOUT only if half the space is available for write. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 16:58:24 -07:00
Shannon Nelson	ed6d9b0228	ionic: fix up debugfs after queue swap Clean and rebuild the debugfs info for the queues being swapped. Fixes: a34e25ab977c ("ionic: change the descriptor ring length without full reset") Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 16:55:54 -07:00
Vladimir Oltean	b14a9fc452	__netif_receive_skb_core: don't untag vlan from skb on DSA master A DSA master interface has upper network devices, each representing an Ethernet switch port attached to it. Demultiplexing the source ports and setting skb->dev accordingly is done through the catch-all ETH_P_XDSA packet_type handler. Catch-all because DSA vendors have various header implementations, which can be placed anywhere in the frame: before the DMAC, before the EtherType, before the FCS, etc. So, the ETH_P_XDSA handler acts like an rx_handler more than anything. It is unlikely for the DSA master interface to have any other upper than the DSA switch interfaces themselves. Only maybe a bridge upper, but it is very likely that the DSA master will have no 8021q upper. So __netif_receive_skb_core() will try to untag the VLAN, despite the fact that the DSA switch interface might have an 8021q upper. So the skb will never reach that. So far, this hasn't been a problem because most of the possible placements of the DSA switch header mentioned in the first paragraph will displace the VLAN header when the DSA master receives the frame, so __netif_receive_skb_core() will not actually execute any VLAN-specific code for it. This only becomes a problem when the DSA switch header does not displace the VLAN header (for example with a tail tag). What the patch does is it bypasses the untagging of the skb when there is a DSA switch attached to this net device. So, DSA is the only packet_type handler which requires seeing the VLAN header. Once skb->dev will be changed, __netif_receive_skb_core() will be invoked again and untagging, or delivery to an 8021q upper, will happen in the RX of the DSA switch interface itself. see commit 9eb8eff0cf2f ("net: bridge: allow enslaving some DSA master network devices". This is actually the reason why I prefer keeping DSA as a packet_type handler of ETH_P_XDSA rather than converting to an rx_handler. Currently the rx_handler code doesn't support chaining, and this is a problem because a DSA master might be bridged. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 16:34:18 -07:00
David S. Miller	0ca6d8b7d6	Merge branch 'net-next-dsa-mt7530-add-support-for-MT7531' Landen Chao says: ==================== net-next: dsa: mt7530: add support for MT7531 This patch series adds support for MT7531. MT7531 is the next generation of MT7530 which could be found on Mediatek router platforms such as MT7622 or MT7629. It is also a 7-ports switch with 5 giga embedded phys, 2 cpu ports, and the same MAC logic of MT7530. Cpu port 6 only supports SGMII interface. Cpu port 5 supports either RGMII or SGMII in different HW SKU, but cannot be muxed to PHY of port 0/4 like mt7530. Due to support for SGMII interface, pll, and pad setting are different from MT7530. MT7531 SGMII interface can be configured in following mode: - 'SGMII AN mode' with in-band negotiation capability which is compatible with PHY_INTERFACE_MODE_SGMII. - 'SGMII force mode' without in-band negotiation which is compatible with 10B/8B encoding of PHY_INTERFACE_MODE_1000BASEX with fixed full-duplex and fixed pause. - 2.5 times faster clocked 'SGMII force mode' without in-band negotiation which is compatible with 10B/8B encoding of PHY_INTERFACE_MODE_2500BASEX with fixed full-duplex and fixed pause. v4 -> v5 - Add fixed-link node to dsa cpu port in dts file by suggestion of Vladimir Oltean. v3 -> v4 - Adjust the coding style by suggestion of Jakub Kicinski. Remove unnecessary jumping label, merge continuous numeric 'switch cases' into one line, and keep the variables longest to shortest (reverse xmas tree). v2 -> v3 - Keep the same setup logic of mt7530/mt7621 because these series of patches is for adding mt7531 hardware. - Do not adjust rgmii delay when vendor phy driver presents in order to prevent double adjustment by suggestion of Andrew Lunn. - Remove redundant 'Example 4' from dt-bindings by suggestion of Rob Herring. - Fix typo. v1 -> v2 - change phylink_validate callback function to support full-duplex gigabit only to match hardware capability. - add description of SGMII interface. - configure mt7531 cpu port in fastest speed by default. - parse SGMII control word for in-band negotiation mode. - configure RGMII delay based on phy.rst. - Rename the definition in the header file to avoid potential conflicts. - Add wrapper function for mdio read/write to support both C22 and C45. - correct fixed-link speed of 2500base-x in dts. - add MT7531 port mirror setting. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 16:30:39 -07:00
Landen Chao	79a675e6b1	arm64: dts: mt7622: add mt7531 dsa to bananapi-bpi-r64 board Add mt7531 dsa to bananapi-bpi-r64 board for 5 giga Ethernet ports support. Signed-off-by: Landen Chao <landen.chao@mediatek.com> Tested-By: Frank Wunderlich <frank-w@public-files.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 16:30:39 -07:00
Landen Chao	6af064486b	arm64: dts: mt7622: add mt7531 dsa to mt7622-rfb1 board Add mt7531 dsa to mt7622-rfb1 board for 5 giga Ethernet ports support. mt7622 only supports 1 sgmii interface, so either gmac0 or gmac1 can be configured as sgmii interface. In this patch, change to connect mt7622 gmac0 and mt7531 port6 through sgmii interface. Signed-off-by: Landen Chao <landen.chao@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 16:30:39 -07:00
Landen Chao	c288575f78	net: dsa: mt7530: Add the support of MT7531 switch Add new support for MT7531: MT7531 is the next generation of MT7530. It is also a 7-ports switch with 5 giga embedded phys, 2 cpu ports, and the same MAC logic of MT7530. Cpu port 6 only supports SGMII interface. Cpu port 5 supports either RGMII or SGMII in different HW sku, but cannot be muxed to PHY of port 0/4 like mt7530. Due to SGMII interface support, pll, and pad setting are different from MT7530. This patch adds different initial setting, and SGMII phylink handlers of MT7531. MT7531 SGMII interface can be configured in following mode: - 'SGMII AN mode' with in-band negotiation capability which is compatible with PHY_INTERFACE_MODE_SGMII. - 'SGMII force mode' without in-band negotiation which is compatible with 10B/8B encoding of PHY_INTERFACE_MODE_1000BASEX with fixed full-duplex and fixed pause. - 2.5 times faster clocked 'SGMII force mode' without in-band negotiation which is compatible with 10B/8B encoding of PHY_INTERFACE_MODE_2500BASEX with fixed full-duplex and fixed pause. Signed-off-by: Landen Chao <landen.chao@mediatek.com> Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 16:30:39 -07:00
Landen Chao	27834b0223	dt-bindings: net: dsa: add new MT7531 binding to support MT7531 Add devicetree binding to support the compatible mt7531 switch as used in the MediaTek MT7531 switch. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: Landen Chao <landen.chao@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 16:30:39 -07:00
Landen Chao	88bdef8be9	net: dsa: mt7530: Extend device data ready for adding a new hardware Add a structure holding required operations for each device such as device initialization, PHY port read or write, a checker whether PHY interface is supported on a certain port, MAC port setup for either bus pad or a specific PHY interface. The patch is done for ready adding a new hardware MT7531, and keep the same setup logic of existing hardware. Signed-off-by: Landen Chao <landen.chao@mediatek.com> Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 16:30:38 -07:00
Landen Chao	dc8ef938c9	net: dsa: mt7530: Refine message in Kconfig Refine message in Kconfig with fixing typo and an explicit MT7621 support. Signed-off-by: Landen Chao <landen.chao@mediatek.com> Signed-off-by: Sean Wang <sean.wang@mediatek.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 16:30:38 -07:00
Xie He	4b46838535	drivers/net/wan/x25_asy: Remove an unnecessary x25_type_trans call x25_type_trans only needs to be called before we call netif_rx to pass the skb to upper layers. It does not need to be called before lapb_data_received. The LAPB module does not need the fields that are set by calling it. In the other two X.25 drivers - lapbether and hdlc_x25. x25_type_trans is only called before netif_rx and not before lapb_data_received. Cc: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Xie He <xie.he.0141@gmail.com> Acked-by: Martin Schiller <ms@dev.tdt.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 14:41:02 -07:00
Paolo Abeni	2de79ee27f	net: try to avoid unneeded backlog flush flush_all_backlogs() may cause deadlock on systems running processes with FIFO scheduling policy. The above is critical in -RT scenarios, where user-space specifically ensure no network activity is scheduled on the CPU running the mentioned FIFO process, but still get stuck. This commit tries to address the problem checking the backlog status on the remote CPUs before scheduling the flush operation. If the backlog is empty, we can skip it. v1 -> v2: - explicitly clear flushed cpu mask - Eric Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 14:39:00 -07:00
David S. Miller	7b2d1b8d9d	Merge branch 'mlxsw-Derive-SBIB-from-maximum-port-speed-and-MTU' Ido Schimmel says: ==================== mlxsw: Derive SBIB from maximum port speed & MTU Petr says: Internal buffer is a part of port headroom used for packets that are mirrored due to triggers that the Spectrum ASIC considers "egress". Besides ACL mirroring on port egresss this includes also packets mirrored due to ECN marking. This patchset changes the way the internal mirroring buffer is reserved. Currently the buffer reflects port MTU and speed accurately. In the future, mlxsw should support dcbnl_setbuffer hook to allow the users to set buffer sizes by hand. In that case, there might not be enough space for growth of the internal mirroring buffer due to MTU and speed changes. While vetoing MTU changes would be merely confusing, port speed changes cannot be vetoed, and such change would simply lead to issues in packet mirroring. For these reasons, with these patches the internal mirroring buffer is derived from maximum MTU and maximum speed achievable on the port. Patches #1 and #2 introduce a new callback to determine the maximum speed a given port can achieve. With patches #3 and #4, the information about, respectively, maximum MTU and maximum port speed, is kept in struct mlxsw_sp_port. In patch #5, maximum MTU and maximum speed are used to determine the size of the internal buffer. MTU update and speed update hooks are dropped, because they are no longer necessary. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 14:37:30 -07:00
Petr Machata	532b49e41e	mlxsw: spectrum_span: Derive SBIB from maximum port speed & MTU The SBIB register configures the size of an internal buffer that the Spectrum ASICs use when mirroring traffic on egress. This size should be taken into account when validating that the port headroom buffers are not larger than the chip can handle. Up until now this was not done, which is incidentally not a problem, because the priority group buffers that mlxsw auto-configures are small enough that the boundary condition could not be violated. However when dcbnl_setbuffer is implemented, the user has control over sizes of PG buffers, and they might overshoot the headroom capacity. However the size of the SBIB buffer depends on port speed, and that cannot be vetoed. Therefore SBIB size should be deduced from maximum port speed. Additionally, once the buffers are configured by hand, the user could get into an uncomfortable situation where their MTU change requests get vetoed, because the SBIB does not fit anymore. Therefore derive SBIB size from maximum permissible MTU as well. Remove all the code that adjusted the SBIB size whenever speed or MTU changed. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 14:37:30 -07:00
Petr Machata	3232e8c66e	mlxsw: spectrum: Keep maximum speed around The maximum port speed depends on link modes supported by the port, and for Ethernet ports is constant. The maximum speed will be handy when setting SBIB, the internal buffer used for traffic mirroring. Therefore, keep it in struct mlxsw_sp_port for easy access. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 14:37:30 -07:00
Petr Machata	2ecf87ae6c	mlxsw: spectrum: Keep maximum MTU around The maximum port MTU depends on port type. On Spectrum, mlxsw configures all ports as Ethernet ports, and the maximum MTU therefore never changes. Besides checking MTU configuration, maximum MTU will also be handy when setting SBIB, the internal buffer used for traffic mirroring. Therefore, keep it in struct mlxsw_sp_port for easy access. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 14:37:30 -07:00
Petr Machata	60fbc52184	mlxsw: spectrum_ethtool: Introduce ptys_max_speed callback The SBIB register configures the size of an internal buffer that the Spectrum ASICs use when mirroring traffic on egress. This size should be taken into account when validating that the port headroom buffers are not larger than the chip can handle. Up until now this was not done, which is incidentally not a problem, because the priority group buffers that mlxsw auto-configures are small enough that the boundary condition could not be violated. When dcbnl_setbuffer is implemented, the user gets control over sizes of PG buffers, and they might overshoot the headroom capacity. However the size of the SBIB buffer depends on port speed, which cannot be vetoed. There is obviously no way to retroactively push back on requests for overlarge PG buffers, or reject an overlarge MTU, or cancel losslessness of a certain PG. Therefore, instead of taking into account the current speed when calculating SBIB buffer size, take into account the maximum speed that a port with given Ethernet protocol capabilities can have. To that end, add a new ethtool callback, ptys_max_speed, which determines this maximum speed. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 14:37:30 -07:00
Petr Machata	d24ca6c0a7	mlxsw: spectrum_ethtool: Extract a helper to get Ethernet attributes In order to allow reusing the logic, extract from mlxsw_sp_port_get_link_ksettings() the code to obtain Ethernet protocol attributes, mlxsw_sp_port_ptys_query(). Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 14:37:30 -07:00
David S. Miller	7952d7edf3	Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Tony Nguyen says: ==================== 40GbE Intel Wired LAN Driver Updates 2020-09-14 This series contains updates to i40e driver only. Li RongQing removes binding affinity mask to a fixed CPU and sets prefetch of Rx buffer page to occur conditionally. Björn provides AF_XDP performance improvements by not prefetching HW descriptors, using 16 byte descriptors, and moving buffer allocation out of Rx processing loop. v2: Define prefetch_page_address in a common header for patch 2. Dropped, previous, patch 5 as it is being reworked to be more generalized. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 14:07:58 -07:00
David S. Miller	e0d9ae699e	RxRPC development fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl9fipcACgkQ+7dXa6fL C2sNIhAAjnqKckjbLtzy2ZhO3nyEMlABYtGcDi8a1x3H42Ncsqca5GiKjjY54n90 rLe2iyX/5ncURjrkVVUJFTlkhQrha40dOp/DYHHbwj4ko9P625QrsPn0h5zo/Ben UUeOVqibyAOoqXWCqhRgLF1BhPmg/22TtHiqbcRul+nss9vcjuFcjOEIhNVZDUfu VPjeitxF9Tuz9FEH00UJs23LWONBCvNWDtCjAj/hf328Mk+TptSiFPTNVEuPrbje 1IbBy3PjBzeL2CFtp0OQs3uibAz+7C9IY4i53tdBPQNE5uW1FE/Wm7ixK3Oseq8X hkAP3phNG669tZzE+49g0X1AfqHJr9F0dGbdIqOYC4seyC6NXROuvnzX3HdV7gYd MwCyIcjWxw2B6dhjk2sDncFjr7Tima6KRvWHsf8cEk645gbMEltvNxJi1KCK/sj/ wpiiQrPZZ82e+RfIfQ5l5cuMEROceZ1LpUKRK5rc4Gc49xuFbanoOYh4iBChmABb ULKVRHb/HFRIY9Y8boxw+0iDzDYQugoH6IsEEBdH87UBonEfPaJpcRTljcFU4LVh ppNeOXFu0p+CQwDaLDhTILDVoFDjMfVAjOC42TMfiTLEarWz5cpRPu96tOerpSgk Ulmh6m2cGNYDOIuCdVyRJFf5F9+Mj3VIBygven4GuWUqkZ18ooc= =0qvR -----END PGP SIGNATURE----- Merge tag 'rxrpc-next-20200914' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Fixes for the connection manager rewrite Here are some fixes for the connection manager rewrite: (1) Fix a goto to the wrong place in error handling. (2) Fix a missing NULL pointer check. (3) The stored allocation error needs to be stored signed. (4) Fix a leak of connection bundle when clearing connections due to net namespace exit. (5) Fix an overget of the bundle when setting up a new client conn. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 14:03:38 -07:00
Luo bin	33acd755f4	hinic: add vxlan segmentation and cs offload support Add NETIF_F_GSO_UDP_TUNNEL and NETIF_F_GSO_UDP_TUNNEL_CSUM features to support vxlan segmentation and checksum offload. Ipip and ipv6 tunnel packets are regarded as non-tunnel pkt for hw and as for other type of tunnel pkts, checksum offload is disabled. Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:59:15 -07:00
Zhang Changzhong	f3694707ad	net: qlcnic: remove unused variable 'val' in qlcnic_83xx_cam_unlock() Fixes the following W=1 kernel build warning(s): drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c:661:6: warning: variable 'val' set but not used [-Wunused-but-set-variable] 661 \| u32 val; \| ^~~ After commit 7f9664525f9c ("qlcnic: 83xx memory map and HW access routines"), variable 'val' is never used in qlcnic_83xx_cam_unlock(), so removing it to avoid build warning. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:44:12 -07:00
Zhang Changzhong	f7ab0f04a0	net: pxa168_eth: remove unused variable 'retval' int pxa168_eth_change_mtu() Fixes the following W=1 kernel build warning(s): drivers/net/ethernet/marvell/pxa168_eth.c:1190:6: warning: variable 'retval' set but not used [-Wunused-but-set-variable] 1190 \| int retval; \| ^~~~~~ Function pxa168_eth_change_mtu() always return zero, so variable 'retval' is redundant, just remove it. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:43:38 -07:00
Zhang Changzhong	992bae7e42	net: fec: ptp: remove unused variable 'ns' in fec_time_keep() Fixes the following W=1 kernel build warning(s): drivers/net/ethernet/freescale/fec_ptp.c:523:6: warning: variable 'ns' set but not used [-Wunused-but-set-variable] 523 \| u64 ns; \| ^~ After commit 6605b730c061 ("FEC: Add time stamping code and a PTP hardware clock"), variable 'ns' is never used in fec_time_keep(), so removing it to avoid build warning. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:42:50 -07:00
Zhang Changzhong	85743cead5	net: dnet: remove unused variable 'tx_status 'in dnet_start_xmit() Fixes the following W=1 kernel build warning(s): drivers/net/ethernet/dnet.c:510:6: warning: variable 'tx_status' set but not used [-Wunused-but-set-variable] u32 tx_status, irq_enable; ^~~~~~~~~ After commit 4796417417a6 ("dnet: Dave DNET ethernet controller driver (updated)"), variable 'tx_status' is never used in dnet_start_xmit(), so removing it to avoid build warning. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:42:09 -07:00
Eric Dumazet	0cbe6a8f08	tcp: remove SOCK_QUEUE_SHRUNK SOCK_QUEUE_SHRUNK is currently used by TCP as a temporary state that remembers if some room has been made in the rtx queue by an incoming ACK packet. This is later used from tcp_check_space() before considering to send EPOLLOUT. Problem is: If we receive SACK packets, and no packet is removed from RTX queue, we can send fresh packets, thus moving them from write queue to rtx queue and eventually empty the write queue. This stall can happen if TCP_NOTSENT_LOWAT is used. With this fix, we no longer risk stalling sends while holes are repaired, and we can fully use socket sndbuf. This also removes a cache line dirtying for typical RPC workloads. Fixes: c9bee3b7fdec ("tcp: TCP_NOTSENT_LOWAT socket option") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:36:00 -07:00
Xie He	b4c5881446	net/packet: Fix a comment about hard_header_len and headroom allocation This comment is outdated and no longer reflects the actual implementation of af_packet.c. Reasons for the new comment: 1. In af_packet.c, the function packet_snd first reserves a headroom of length (dev->hard_header_len + dev->needed_headroom). Then if the socket is a SOCK_DGRAM socket, it calls dev_hard_header, which calls dev->header_ops->create, to create the link layer header. If the socket is a SOCK_RAW socket, it "un-reserves" a headroom of length (dev->hard_header_len), and checks if the user has provided a header sized between (dev->min_header_len) and (dev->hard_header_len) (in dev_validate_header). This shows the developers of af_packet.c expect hard_header_len to be consistent with header_ops. 2. In af_packet.c, the function packet_sendmsg_spkt has a FIXME comment. That comment states that prepending an LL header internally in a driver is considered a bug. I believe this bug can be fixed by setting hard_header_len to 0, making the internal header completely invisible to af_packet.c (and requesting the headroom in needed_headroom instead). 3. There is a commit for a WiFi driver: commit 9454f7a895b8 ("mwifiex: set needed_headroom, not hard_header_len") According to the discussion about it at: https://patchwork.kernel.org/patch/11407493/ The author tried to set the WiFi driver's hard_header_len to the Ethernet header length, and request additional header space internally needed by setting needed_headroom. This means this usage is already adopted by driver developers. Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Brian Norris <briannorris@chromium.org> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Xie He <xie.he.0141@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:34:39 -07:00
David S. Miller	b91c06c5df	Merge branch 'mptcp-introduce-support-for-real-multipath-xmit' Paolo Abeni says: ==================== mptcp: introduce support for real multipath xmit This series enable MPTCP socket to transmit data on multiple subflows concurrently in a load balancing scenario. First the receive code path is refactored to better deal with out-of-order data (patches 1-7). An RB-tree is introduced to queue MPTCP-level out-of-order data, closely resembling the TCP level OoO handling. When data is sent on multiple subflows, the peer can easily see OoO - "future" data at the MPTCP level, especially if speeds, delay, or jitter are not symmetric. The other major change regards the netlink PM, which is extended to allow creating non backup subflows in patches 9-11. There are a few smaller additions, like the introduction of OoO related mibs, send buffer autotuning and better ack handling. Finally a bunch of new self-tests is introduced. The new feature is tested ensuring that the B/W used by an MPTCP socket using multiple subflows matches the link aggregated B/W - we use low B/W virtual links, to ensure the tests are not CPU bounded. v1 -> v2: - fix 32 bit build breakage - fix a bunch of checkpatch issues ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:28:03 -07:00
Paolo Abeni	1a418cb8e8	mptcp: simult flow self-tests Add a bunch of test-cases for multiple subflow xmit: create multiple subflows simulating different links condition via netem and verify that the msk is able to use completely the aggregated bandwidth. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:28:03 -07:00
Paolo Abeni	c76c695656	mptcp: call tcp_cleanup_rbuf on subflows That is needed to let the subflows announce promptly when new space is available in the receive buffer. tcp_cleanup_rbuf() is currently a static function, drop the scope modifier and add a declaration in the TCP header. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:28:02 -07:00
Paolo Abeni	d5f49190de	mptcp: allow picking different xmit subflows Update the scheduler to less trivial heuristic: cache the last used subflow, and try to send on it a reasonably long burst of data. When the burst or the subflow send space is exhausted, pick the subflow with the lower ratio between write space and send buffer - that is, the subflow with the greater relative amount of free space. v1 -> v2: - fix 32 bit build breakage due to 64bits div - fix checkpath issues (uint64_t -> u64) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:28:02 -07:00
Paolo Abeni	4596a2c1b7	mptcp: allow creating non-backup subflows Currently the 'backup' attribute of local endpoint is ignored. Let's use it for the MP_JOIN handshake Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:28:02 -07:00
Paolo Abeni	ef0da3b8a2	mptcp: move address attribute into mptcp_addr_info So that can be accessed easily from the subflow creation helper. No functional change intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:28:02 -07:00
Paolo Abeni	06242e44b9	mptcp: add OoO related mibs Add a bunch of MPTCP mibs related to MPTCP OoO data processing. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:28:02 -07:00
Paolo Abeni	04e4cd4f7c	mptcp: cleanup mptcp_subflow_discard_data() There is no need to use the tcp_read_sock(), we can simply drop the skb. Additionally try to look at the next buffer for in order data. This both simplifies the code and avoid unneeded indirect calls. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:28:02 -07:00
Paolo Abeni	ab174ad8ef	mptcp: move ooo skbs into msk out of order queue. Add an RB-tree to cope with OoO (at MPTCP level) data. __mptcp_move_skb() insert into the RB tree "future" data, eventually coalescing skb as allowed by the MPTCP DSN. To simplify sequence accounting, move the DSN inside the cb. After successfully enqueuing in sequence data, check if we can use any data from the RB tree. Additionally move the data_fin check after spooling data from the OoO tree, otherwise we could miss shutdown events. The RB tree code is copied as verbatim as possible from tcp_data_queue_ofo(), with a few simplifications due to the fact that MPTCP doesn't need to cope with sacks. All bugs here are added by me. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:28:02 -07:00
Paolo Abeni	8268ed4c9d	mptcp: introduce and use mptcp_try_coalesce() Factor-out existing code, will be re-used by the next patch. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:28:02 -07:00
Paolo Abeni	da51aef5fe	mptcp: basic sndbuf autotuning Let the msk sendbuf track the size of the larger subflow's send window, so that we ensure mptcp_sendmsg() does not exceed MPTCP-level send window. The update is performed just before try to send any data. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:28:02 -07:00
Paolo Abeni	6719331c2f	mptcp: trigger msk processing even for OoO data This is a prerequisite to allow receiving data from multiple subflows without re-injection. Instead of dropping the OoO - "future" data in subflow_check_data_avail(), call into __mptcp_move_skbs() and let the msk drop that. To avoid code duplication factor out the mptcp_subflow_discard_data() helper. Note that __mptcp_move_skbs() can now find multiple subflows with data avail (comprising to-be-discarded data), so must update the byte counter incrementally. v1 -> v2: - fix checkpatch issues (unsigned -> unsigned int) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:28:02 -07:00
Paolo Abeni	47bebdf365	mptcp: set data_ready status bit in subflow_check_data_avail() This simplify mptcp_subflow_data_available() and will made follow-up patches simpler. Additionally remove the unneeded checks on subflow copied_seq: we always whole skbs out of subflows. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:28:02 -07:00
Paolo Abeni	63561a403c	mptcp: rethink 'is writable' conditional Currently, when checking for the 'msk is writable' condition, we look at the individual subflows write space. That works well while we send data via a single subflow, but will not as soon as we will enable concurrent xmit on multiple subflows. With this change msk becomes writable when the following conditions hold: - the socket has some free write space - there is at least a subflow with write free space Additionally we need to set the NOSPACE bit on all subflows before blocking. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:28:02 -07:00
David S. Miller	26cdb8f72a	Merge branch 'ethernet-convert-tasklets-to-use-new-tasklet_setup-API' Allen Pais says: ==================== ethernet: convert tasklets to use new tasklet_setup API Commit 12cc923f1ccc ("tasklet: Introduce new initialization API")' introduced a new tasklet initialization API. This series converts all the crypto modules to use the new tasklet_setup() API This series is based on v5.9-rc5 v3: fix subject prefix use backpointer instead of fragile priv to netdev. v2: fix kdoc reported by Jakub Kicinski. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-14 13:02:38 -07:00

1 2 3 4 5 ...

951046 Commits