linux

iv/linux

Author	SHA1	Message	Date
Jakub Kicinski	f261aa15b2	Merge branch 'selftests-drv-net-rss_ctx-add-tests-for-rss-contexts' Jakub Kicinski says: ==================== selftests: drv-net: rss_ctx: add tests for RSS contexts Add a few tests exercising RSS context API. In addition to basic sanity checks, tests add RSS contexts, n-tuple rule to direct traffic to them (based on dst port), and qstats to make sure traffic landed where we expected. v2 adds a test for removing contexts out of order. When testing bnxt - either the new test or running more tests after the overlap test makes the device act strangely. To the point where it may start giving out ntuple IDs of 0 for all rules.. $ export NETIF=eth0 REMOTE_... $ ./drivers/net/hw/rss_ctx.py KTAP version 1 1..8 ok 1 rss_ctx.test_rss_key_indir ok 2 rss_ctx.test_rss_context ok 3 rss_ctx.test_rss_context4 # Increasing queue count 44 -> 66 # Failed to create context 32, trying to test what we got ok 4 rss_ctx.test_rss_context32 # SKIP Tested only 31 contexts, wanted 32 ok 5 rss_ctx.test_rss_context_overlap ok 6 rss_ctx.test_rss_context_overlap2 # .. sprays traffic like a headless chicken .. not ok 7 rss_ctx.test_rss_context_out_of_order ok 8 rss_ctx.test_rss_context4_create_with_cfg # Totals: pass:6 fail:1 xfail:0 xpass:0 skip:1 error:0 v2: https://lore.kernel.org/all/20240625010210.2002310-1-kuba@kernel.org v1: https://lore.kernel.org/all/20240620232902.1343834-1-kuba@kernel.org ==================== Link: https://patch.msgid.link/20240626012456.2326192-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-26 19:07:29 -07:00
Jakub Kicinski	f898c16a06	selftests: drv-net: rss_ctx: add tests for RSS configuration and contexts Add tests focusing on indirection table configuration and creating extra RSS contexts in drivers which support it. $ export NETIF=eth0 REMOTE_... $ ./drivers/net/hw/rss_ctx.py KTAP version 1 1..8 ok 1 rss_ctx.test_rss_key_indir ok 2 rss_ctx.test_rss_context ok 3 rss_ctx.test_rss_context4 # Increasing queue count 44 -> 66 # Failed to create context 32, trying to test what we got ok 4 rss_ctx.test_rss_context32 # SKIP Tested only 31 contexts, wanted 32 ok 5 rss_ctx.test_rss_context_overlap ok 6 rss_ctx.test_rss_context_overlap2 # .. sprays traffic like a headless chicken .. not ok 7 rss_ctx.test_rss_context_out_of_order ok 8 rss_ctx.test_rss_context4_create_with_cfg # Totals: pass:6 fail:1 xfail:0 xpass:0 skip:1 error:0 Note that rss_ctx.test_rss_context_out_of_order fails with the device I tested with, but it seems to be a device / driver bug. Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240626012456.2326192-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-26 19:07:16 -07:00
Jakub Kicinski	94fecaa6dc	selftests: drv-net: add ability to wait for at least N packets to load gen Teach the load generator how to wait for at least given number of packets to be received. This will be useful for filtering where we'll want to send a non-trivial number of packets and make sure they landed in right queues. Reviewed-by: Breno Leitao <leitao@debian.org> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240626012456.2326192-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-26 19:06:03 -07:00
Jakub Kicinski	af8e51644a	selftests: drv-net: add helper to wait for HW stats to sync Some devices DMA stats to the host periodically. Add a helper which can wait for that to happen, based on frequency reported by the driver in ethtool. Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240626012456.2326192-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-26 19:06:03 -07:00
Jakub Kicinski	8b8fe28015	selftests: drv-net: try to check if port is in use We use random ports for communication. As Willem predicted this leads to occasional failures. Try to check if port is already in use by opening a socket and binding to that port. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240626012456.2326192-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-26 19:06:03 -07:00
Jakub Kicinski	2e2de714d6	Merge branch 'mlxsw-reduce-memory-footprint-of-mlxsw-driver' Petr Machata says: ==================== mlxsw: Reduce memory footprint of mlxsw driver Amit Cohen writes: A previous patch-set used page pool to allocate buffers, to simplify the change, we first used one continuous buffer, which was allocated with order > 0. This set improves page pool usage to allocate the exact number of pages which are required for packet. This change requires using fragmented SKB, till now all the buffer was in the linear part. Note that 'skb->truesize' is decreased for small packets. This set significantly reduces memory consumption of mlxsw driver. The footprint is reduced by 26%. Patch set overview: Patch #1 calculates number of scatter/gather entries and stores the value Patch #2 converts the driver to use fragmented buffers ==================== Link: https://patch.msgid.link/cover.1719321422.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-26 07:43:59 -07:00
Amit Cohen	36437f469d	mlxsw: pci: Use fragmented buffers WQE (Work Queue Element) includes 3 scatter/gather entries for buffers. The buffer can be split into 3 parts, software should set address and byte count of each part. A previous patch-set used page pool to allocate buffers, to simplify the change, we first used one continuous buffer, which was allocated with order > 0. This patch improves page pool usage to allocate the exact number of pages which are required for packet. As part of init, fill WQE.address[x] and WQE.byte_count* with pages which are allocated from the pool. Fill x entries according to number of scatter/gather entries which are required for maximum packet size. When a packet is received, check the actual size and replace only the used pages. Save bytes for software overhead only as part of the first entry. This change also requires using fragmented SKB, till now all the buffer was in the linear part. Note that 'skb->truesize' is decreased for small packets. For now the maximum buffer size is 3 * PAGE_SIZE which is enough, in case that the driver will support larger MTU, we can use 'order' to allocate more than one page per scatter/gather entry. This change significantly reduces memory consumption of mlxsw driver. The footprint is reduced by 26%. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/ee38898c692e7f644a7f3ea4d33aeddb4dd917d2.1719321422.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-26 07:43:57 -07:00
Amit Cohen	8f8cea8f3d	mlxsw: pci: Store number of scatter/gather entries for maximum packet size A previous patch-set used page pool for Rx buffers allocations. To simplify the change, we first used page pool for one allocation per packet - one continuous buffer is allocated for each packet. This can be improved by using fragmented buffers, then memory consumption will be significantly reduced. WQE (Work Queue Element) includes up to 3 scatter/gather entries for data. As preparation for fragmented buffer usage, calculate number of scatter/gather entries which are required for packet according to maximum MTU and store it for future use. For now use PAGE_SIZE for each entry, which means that maximum buffer size is 3 * PAGE_SIZE. This is enough for the maximum MTU which is supported in the driver now (10K). Warn in an unlikely case of maximum MTU which requires more than 3 pages, for now this warn should not happen with standard page size (>=4K) and maximum MTU (10K). Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/98c3e3adb7e727e571ac538faf67cef262cec4fc.1719321422.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-26 07:43:57 -07:00
Uwe Kleine-König	a6a6a98094	net: Drop explicit initialization of struct i2c_device_id::driver_data to 0 These drivers don't use the driver_data member of struct i2c_device_id, so don't explicitly initialize this member. This prepares putting driver_data in an anonymous union which requires either no initialization or named designators. But it's also a nice cleanup on its own. While add it, also remove commas after the sentinel entries. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Reviewed-by: Petr Machata <petrm@nvidia.com> # For mlxsw Reviewed-by: Kory Maincent <Kory.maincent@bootlin.com> Reviewed-by: Jeremy Kerr <jk@codeconstruct.com.au> # for mctp-i2c Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20240625083853.2205977-2-u.kleine-koenig@baylibre.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-26 07:28:08 -07:00
Jakub Kicinski	50b70845fc	Merge branch 'add-ethernet-driver-for-tehuti-networks-tn40xx-chips' FUJITA Tomonori says: ==================== add ethernet driver for Tehuti Networks TN40xx chips This patchset adds a new 10G ethernet driver for Tehuti Networks TN40xx chips. Note in mainline, there is a driver for Tehuti Networks (drivers/net/ethernet/tehuti/tehuti.[hc]), which supports TN30xx chips. Multiple vendors (DLink, Asus, Edimax, QNAP, etc) developed adapters based on TN40xx chips. Tehuti Networks went out of business but the drivers are still distributed under GPL2 with some of the hardware (and also available on some sites). With some changes, I try to upstream this driver with a new PHY driver in Rust. The major change is replacing the PHY abstraction layer in the original driver with phylink. TN40xx chips are used with various PHY hardware (AMCC QT2025, TI TLK10232, Aqrate AQR105, and Marvell MV88X3120, MV88X3310, and MV88E2010). I've also been working on a new PHY driver for QT2025 in Rust [1]. For now, I enable only adapters using QT2025 PHY in the PCI ID table of this driver. I've tested this driver and the QT2025 PHY driver with Edimax EN-9320 10G adapter and 10G-SR SFP+. In mainline, there are PHY drivers for AQR105 and Marvell PHYs, which could work for some TN40xx adapters with this driver. To make reviewing easier, this patchset has only basic functions. Once merged, I'll submit features like ethtool support. v11: https://lore.kernel.org/netdev/20240618051608.95208-7-fujita.tomonori@gmail.com/ v10: https://lore.kernel.org/netdev/20240611045217.78529-7-fujita.tomonori@gmail.com/ v9: https://lore.kernel.org/netdev/20240605232608.65471-1-fujita.tomonori@gmail.com/ v8: https://lore.kernel.org/netdev/20240603064955.58327-1-fujita.tomonori@gmail.com/ v7: https://lore.kernel.org/netdev/20240527203928.38206-7-fujita.tomonori@gmail.com/ v6: https://lore.kernel.org/netdev/20240512085611.79747-2-fujita.tomonori@gmail.com/ v5: https://lore.kernel.org/netdev/20240508113947.68530-1-fujita.tomonori@gmail.com/ v4: https://lore.kernel.org/netdev/20240501230552.53185-1-fujita.tomonori@gmail.com/ v3: https://lore.kernel.org/netdev/20240429043827.44407-1-fujita.tomonori@gmail.com/ v2: https://lore.kernel.org/netdev/20240425010354.32605-1-fujita.tomonori@gmail.com/ v1: https://lore.kernel.org/netdev/20240415104352.4685-1-fujita.tomonori@gmail.com/ [1] https://lore.kernel.org/netdev/20240415104701.4772-1-fujita.tomonori@gmail.com/ ==================== Link: https://patch.msgid.link/20240623235507.108147-1-fujita.tomonori@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 18:44:22 -07:00
FUJITA Tomonori	3082412242	net: tn40xx: add phylink support This patch adds supports for multiple PHY hardware with phylink. The adapters with TN40xx chips use multiple PHY hardware; AMCC QT2025, TI TLK10232, Aqrate AQR105, and Marvell 88X3120, 88X3310, and MV88E2010. For now, the PCI ID table of this driver enables adapters using only QT2025 PHY. I've tested this driver and the QT2025 PHY driver (SFP+ 10G SR) with Edimax EN-9320 10G adapter. Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Reviewed-by: Hans-Frieder Vogt <hfdevel@gmx.net> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20240623235507.108147-8-fujita.tomonori@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 18:44:19 -07:00
FUJITA Tomonori	7fdbd2f2bb	net: tn40xx: add mdio bus support This patch adds supports for mdio bus. A later path adds PHYLIB support on the top of this. Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Link: https://patch.msgid.link/20240623235507.108147-7-fujita.tomonori@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 18:44:19 -07:00
FUJITA Tomonori	37c4947af4	net: tn40xx: add basic Rx handling This patch adds basic Rx handling. The Rx logic uses three major data structures; two ring buffers with NIC and one database. One ring buffer is used to send information to NIC about memory to be stored packets to be received. The other is used to get information from NIC about received packets. The database is used to keep the information about DMA mapping. After a packet arrived, the db is used to pass the packet to the network stack. Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Reviewed-by: Hans-Frieder Vogt <hfdevel@gmx.net> Link: https://patch.msgid.link/20240623235507.108147-6-fujita.tomonori@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 18:44:19 -07:00
FUJITA Tomonori	dd2a0ff554	net: tn40xx: add basic Tx handling This patch adds device specific structures to initialize the hardware with basic Tx handling. The original driver loads the embedded firmware in the header file. This driver is implemented to use the firmware APIs. The Tx logic uses three major data structures; two ring buffers with NIC and one database. One ring buffer is used to send information about packets to be sent for NIC. The other is used to get information from NIC about packet that are sent. The database is used to keep the information about DMA mapping. After a packet is sent, the db is used to free the resource used for the packet. Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Link: https://patch.msgid.link/20240623235507.108147-5-fujita.tomonori@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 18:44:19 -07:00
FUJITA Tomonori	ffa28c748b	net: tn40xx: add register defines This adds several defines to handle registers in Tehuti Networks TN40xx chips for later patches. Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Reviewed-by: Hans-Frieder Vogt <hfdevel@gmx.net> Link: https://patch.msgid.link/20240623235507.108147-4-fujita.tomonori@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 18:44:19 -07:00
FUJITA Tomonori	ab61adc600	net: tn40xx: add pci driver for Tehuti Networks TN40xx chips This just adds the scaffolding for an ethernet driver for Tehuti Networks TN40xx chips. Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20240623235507.108147-3-fujita.tomonori@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 18:44:19 -07:00
FUJITA Tomonori	eee5528890	PCI: Add Edimax Vendor ID to pci_ids.h Add the Edimax Vendor ID (0x1432) for an ethernet driver for Tehuti Networks TN40xx chips. This ID can be used for Realtek 8180 and Ralink rt28xx wireless drivers. Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20240623235507.108147-2-fujita.tomonori@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 18:44:19 -07:00
Chris Packham	c0c68e4d52	dt-bindings: net: dsa: mediatek,mt7530: Minor wording fixes Update the mt7530 binding with some minor updates that make the document easier to read. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/20240624211858.1990601-1-chris.packham@alliedtelesis.co.nz Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 17:54:53 -07:00
Jakub Kicinski	a425a973e9	Merge branch 'gve-add-flow-steering-support' Ziwei Xiao says: ==================== gve: Add flow steering support To support flow steering in GVE driver, there are two adminq changes need to be made in advance. The first one is adding adminq mutex lock, which is to allow the incoming flow steering operations to be able to temporarily drop the rtnl_lock to reduce the latency for registering flow rules among several NICs at the same time. This could be achieved by the future changes to reduce the drivers' dependencies on the rtnl lock for particular ethtool ops. The second one is to add the extended adminq command so that we can support larger adminq command such as configure_flow_rule command. In that patch, there is a new added function called gve_adminq_execute_extended_cmd with the attribute of __maybe_unused. That attribute will be removed in the third patch of this series where it will use the previously unused function. And the other three patches are needed for the actual flow steering feature support in driver. ==================== Link: https://patch.msgid.link/20240625001232.1476315-1-ziweixiao@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 17:48:35 -07:00
Jeroen de Borst	6f3bc48756	gve: Add flow steering ethtool support Implement the ethtool commands that can be used to configure and query flow-steering rules. A large part of this change consists of translating the ethtool representation of 'ntuples' to our internal gve_flow_rule and vice-versa in the new created gve_flow_rule.c Considering the possible large amount of flow rules, the driver doesn't store all the rules locally. When the user runs 'ethtool -n <nic>' to check the registered rules, the driver will send adminq command to query a limited amount of rules/rule ids(that filled in a 4096 bytes dma memory) at a time as a cache for the ethtool queries. The adminq query commands will be repeated for several times until the ethtool has queried all the needed rules. Signed-off-by: Jeroen de Borst <jeroendb@google.com> Co-developed-by: Ziwei Xiao <ziweixiao@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240625001232.1476315-6-ziweixiao@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 17:48:33 -07:00
Jeroen de Borst	57718b60df	gve: Add flow steering adminq commands Add new adminq commands for the driver to configure and query flow rules that are stored in the device. Flow steering rules are assigned with a location that determines the relative order of the rules. Flow rules can run up to an order of millions. In such cases, storing a full copy of the rules in the driver to prepare for the ethtool query is infeasible while querying them from the device is better. That needs to be optimized too so that we don't send a lot of adminq commands. The solution here is to store a limited number of rules/rule ids in the driver in a cache. Use dma_pool to allocate 4k bytes which lets device write at most 46 flow rules(4096/88) or 1024 rule ids(4096/4) at a time. For configuring flow rules, there are 3 sub-commands: - ADD which adds a rule at the location supplied - DEL which deletes the rule at the location supplied - RESET which clears all currently active rules in the device For querying flow rules, there are also 3 sub-commands: - QUERY_RULES corresponds to ETHTOOL_GRXCLSRULE. It fills the rules in the allocated cache after querying the device - QUERY_RULES_IDS corresponds to ETHTOOL_GRXCLSRLALL. It fills the rule_ids in the allocated cache after querying the device - QUERY_RULES_STATS corresponds to ETHTOOL_GRXCLSRLCNT. It queries the device's current flow rule number and the supported max flow rule limit Signed-off-by: Jeroen de Borst <jeroendb@google.com> Co-developed-by: Ziwei Xiao <ziweixiao@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240625001232.1476315-5-ziweixiao@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 17:48:33 -07:00
Jeroen de Borst	3519c00557	gve: Add flow steering device option Add a new device option to signal to the driver that the device supports flow steering. This device option also carries the maximum number of flow steering rules that the device can store. Signed-off-by: Jeroen de Borst <jeroendb@google.com> Co-developed-by: Ziwei Xiao <ziweixiao@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240625001232.1476315-4-ziweixiao@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 17:48:33 -07:00
Jeroen de Borst	fcfe6318db	gve: Add adminq extended command The adminq command is limited to 64 bytes per entry and it's 56 bytes for the command itself at maximum. To support larger commands, we need to dma_alloc a separate memory to put the command in that memory and send the dma memory address instead of the actual command. Introduce an extended adminq command to wrap the real command with the inner opcode and the allocated dma memory address specified. Once the device receives it, it can get the real command from the given dma memory address. As designed with the device, all the extended commands will use inner opcode larger than 0xFF. Signed-off-by: Jeroen de Borst <jeroendb@google.com> Co-developed-by: Ziwei Xiao <ziweixiao@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240625001232.1476315-3-ziweixiao@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 17:48:33 -07:00
Ziwei Xiao	1108566ca5	gve: Add adminq mutex lock We were depending on the rtnl_lock to make sure there is only one adminq command running at a time. But some commands may take too long to hold the rtnl_lock, such as the upcoming flow steering operations. For such situations, it can temporarily drop the rtnl_lock, and replace it for these operations with a new adminq lock, which can ensure the adminq command execution to be thread-safe. Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240625001232.1476315-2-ziweixiao@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 17:48:33 -07:00
Jakub Kicinski	63173885cc	Merge branch 'ethtool-provide-the-dim-profile-fine-tuning-channel' Heng Qi says: ==================== ethtool: provide the dim profile fine-tuning channel The NetDIM library provides excellent acceleration for many modern network cards. However, the default profiles of DIM limits its maximum capabilities for different NICs, so providing a way which the NIC can be custom configured is necessary. Currently, the way is based on the commonly used "ethtool -C". For example, on the server side, the virtio-net NIC with rx dim enabled has 8 queues and runs nginx. The client uses the following command to send traffic to the server: ./wrk http://server_ip:80 -c 64 -t 5 -d 30 Then adjust the default rx-profile for server dim to {.usec = 1, .pkts = 256, .comps = n/a,}, {.usec = 8, .pkts = 256, .comps = n/a,}, {.usec = 30, .pkts = 256, .comps = n/a,}, {.usec = 64, .pkts = 256, .comps = n/a,}, {.usec = 128, .pkts = 256, .comps = n/a,} The server PPS is improved by 20%+. ==================== Link: https://patch.msgid.link/20240621101353.107425-1-hengqi@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 17:15:10 -07:00
Heng Qi	dcb67f6a9e	virtio-net: support dim profile fine-tuning Virtio-net has different types of back-end device implementations. In order to effectively optimize the dim library's gains for different device implementations, let's use the new interface params to initialize and query dim results from a customized profile list. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240621101353.107425-6-hengqi@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 17:15:06 -07:00
Heng Qi	13ba28c5cd	dim: add new interfaces for initialization and getting results DIM-related mode and work have been collected in one same place, so new interfaces are added to provide convenience. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240621101353.107425-5-hengqi@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 17:15:06 -07:00
Heng Qi	f750dfe825	ethtool: provide customized dim profile management The NetDIM library, currently leveraged by an array of NICs, delivers excellent acceleration benefits. Nevertheless, NICs vary significantly in their dim profile list prerequisites. Specifically, virtio-net backends may present diverse sw or hw device implementation, making a one-size-fits-all parameter list impractical. On Alibaba Cloud, the virtio DPU's performance under the default DIM profile falls short of expectations, partly due to a mismatch in parameter configuration. I also noticed that ice/idpf/ena and other NICs have customized profilelist or placed some restrictions on dim capabilities. Motivated by this, I tried adding new params for "ethtool -C" that provides a per-device control to modify and access a device's interrupt parameters. Usage ======== The target NIC is named ethx. Assume that ethx only declares support for rx profile setting (with DIM_PROFILE_RX flag set in profile_flags) and supports modification of usec and pkt fields. 1. Query the currently customized list of the device $ ethtool -c ethx ... rx-profile: {.usec = 1, .pkts = 256, .comps = n/a,}, {.usec = 8, .pkts = 256, .comps = n/a,}, {.usec = 64, .pkts = 256, .comps = n/a,}, {.usec = 128, .pkts = 256, .comps = n/a,}, {.usec = 256, .pkts = 256, .comps = n/a,} tx-profile: n/a 2. Tune $ ethtool -C ethx rx-profile 1,1,n_2,n,n_3,3,n_4,4,n_n,5,n "n" means do not modify this field. $ ethtool -c ethx ... rx-profile: {.usec = 1, .pkts = 1, .comps = n/a,}, {.usec = 2, .pkts = 256, .comps = n/a,}, {.usec = 3, .pkts = 3, .comps = n/a,}, {.usec = 4, .pkts = 4, .comps = n/a,}, {.usec = 256, .pkts = 5, .comps = n/a,} tx-profile: n/a 3. Hint If the device does not support some type of customized dim profiles, the corresponding "n/a" will display. If the "n/a" field is being modified, -EOPNOTSUPP will be reported. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240621101353.107425-4-hengqi@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 17:15:06 -07:00
Heng Qi	b65e697a7c	dim: make DIMLIB dependent on NET DIMLIB's capabilities are supplied by the dim, net_dim, and rdma_dim objects, and dim's interfaces solely act as a base for net_dim and rdma_dim and are not explicitly used anywhere else. rdma_dim is utilized by the infiniband driver, while net_dim is for network devices, excluding the soc/fsl driver. In this patch, net_dim relies on some NET's interfaces, thus DIMLIB needs to explicitly depend on the NET Kconfig. The soc/fsl driver uses the functions provided by net_dim, so it also needs to depend on NET. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240621101353.107425-3-hengqi@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 17:15:06 -07:00
Heng Qi	0e942053e4	linux/dim: move useful macros to .h file Useful macros will be used effectively elsewhere. These will be utilized in subsequent patches. Signed-off-by: Heng Qi <hengqi@linux.alibaba.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240621101353.107425-2-hengqi@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 17:15:06 -07:00
Jakub Kicinski	c84f93243e	Merge branch 'ravb-add-mii-support-for-r-car-v4m' Geert Uytterhoeven says: ==================== ravb: Add MII support for R-Car V4M All EtherAVB instances on R-Car Gen3/Gen4 SoCs support the RGMII interface. In addition, the first two EtherAVB instances on R-Car V4M also support the MII interface, but this is not yet supported by the driver. This patch series adds support for MII on R-Car Gen4, after the customary cleanup. The corresponding pin control support is available in [1]. Compile-tested only, as all AVB interfaces on the Gray Hawk Single development board are connected to RGMII PHYs. No regressions on R-Car V4H. [1] "[PATCH/RFC] pinctrl: renesas: r8a779h0: Add AVB MII pins and groups" https://lore.kernel.org/4a0a12227f2145ef53b18bc08f45b19dcd745fc6.1718378739.git.geert+renesas@glider.be/ v1: https://lore.kernel.org/f0ef3e00aec461beb33869ab69ccb44a23d78f51.1718378166.git.geert+renesas@glider.be ==================== Link: https://patch.msgid.link/cover.1719234830.git.geert+renesas@glider.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 17:07:06 -07:00
Geert Uytterhoeven	6e0713cc82	ravb: Add MII support for R-Car V4M All EtherAVB instances on R-Car Gen3/Gen4 SoCs support the RGMII interface. In addition, the first two EtherAVB instances on R-Car V4M also support the MII interface, but this is not yet supported by the driver. Add support for MII on R-Car Gen4 by adding an R-Car Gen4-specific EMAC initialization function that selects the MII clock instead of the RGMII clock when the PHY interface is MII. Note that all implementations of EtherAVB on R-Car Gen4 SoCs have the APSR register, but only MII-capable instances are documented to have the MIISELECT bit, which has a documented value of zero when reserved. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://patch.msgid.link/3a21d1d6680864aa85afff9260234c2b8054020a.1719234830.git.geert+renesas@glider.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 17:07:04 -07:00
Geert Uytterhoeven	8d653d26ff	ravb: Improve ravb_hw_info instance order Move ravb_gen2_hw_info before ravb_gen3_hw_info to match ravb_match_table[] order. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://patch.msgid.link/a76febe3737e26365a784e9193da9363f22aa550.1719234830.git.geert+renesas@glider.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 17:07:04 -07:00
Li RongQing	d891317fe4	virtio_net: Remove u64_stats_update_begin()/end() for stats fetch This place is fetching the stats, u64_stats_update_begin()/end() should not be used, and the fetcher of stats is in the same context as the updater of the stats, so don't need any protection Suggested-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Link: https://lore.kernel.org/20240621094552.53469-1-lirongqing@baidu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 16:42:26 -07:00
Yujie Liu	c4532232fa	selftests: net: remove unneeded IP_GRE config It seems that there is no definition for config IP_GRE, and it is not a dependency of other configs, so remove it. linux$ find -name Kconfig \| xargs grep "IP_GRE" <-- nothing There is a IPV6_GRE config defined in net/ipv6/Kconfig. It only depends on NET_IPGRE_DEMUX but not IP_GRE. Signed-off-by: Yujie Liu <yujie.liu@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20240624055539.2092322-1-yujie.liu@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 08:37:55 -07:00
James Chapman	a8a8d89dbd	l2tp: remove incorrect __rcu attribute This fixes a sparse warning. Fixes: d18d3f0a24fc ("l2tp: replace hlist with simple list for per-tunnel session list") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202406220754.evK8Hrjw-lkp@intel.com/ Signed-off-by: James Chapman <jchapman@katalix.com> Link: https://patch.msgid.link/20240624082945.1925009-1-jchapman@katalix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-25 08:29:42 -07:00
Elad Yifee	73cfd947db	net: ethernet: mtk_eth_soc: ppe: prevent ppe update for non-mtk devices Introduce an additional validation to ensure that the PPE index is modified exclusively for mtk_eth ingress devices. This primarily addresses the issue related to WED operation with multiple PPEs. Fixes: dee4dd10c79a ("net: ethernet: mtk_eth_soc: ppe: add support for multiple PPEs") Signed-off-by: Elad Yifee <eladwf@gmail.com> Link: https://lore.kernel.org/r/20240623175113.24437-1-eladwf@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 15:35:53 +02:00
Paolo Abeni	1d70687592	Merge branch 'net-macb-wol-enhancements' Vineeth Karumanchi says: ==================== net: macb: WOL enhancements - Add provisioning for queue tie-off and queue disable during suspend. - Add support for ARP packet types to WoL. - Advertise WoL attributes by default. - Extend MACB supported WoL modes to the PHY supported WoL modes. - Deprecate magic-packet property. v6: https://lore.kernel.org/netdev/20240617070413.2291511-1-vineeth.karumanchi@amd.com/ v5: https://lore.kernel.org/netdev/20240611162827.887162-1-vineeth.karumanchi@amd.com/ v4: https://lore.kernel.org/lkml/20240610053936.622237-1-vineeth.karumanchi@amd.com/ v3: https://lore.kernel.org/netdev/20240605102457.4050539-1-vineeth.karumanchi@amd.com/ v2: https://lore.kernel.org/netdev/20240222153848.2374782-1-vineeth.karumanchi@amd.com/ v1: https://lore.kernel.org/lkml/20240130104845.3995341-1-vineeth.karumanchi@amd.com/#t ==================== Link: https://lore.kernel.org/r/20240621045735.3031357-1-vineeth.karumanchi@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:53:09 +02:00
Vineeth Karumanchi	783bfe279e	dt-bindings: net: cdns,macb: Deprecate magic-packet property WOL modes such as magic-packet should be an OS policy. By default, advertise supported modes and use ethtool to activate the required mode. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Vineeth Karumanchi <vineeth.karumanchi@amd.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:53:07 +02:00
Vineeth Karumanchi	0cb8de39a7	net: macb: Add ARP support to WOL Extend wake-on LAN support with an ARP packet. Currently, if PHY supports WOL, ethtool ignores the modes supported by MACB. This change extends the WOL modes with MACB supported modes. Advertise wake-on LAN supported modes by default without relying on dt node. By default, wake-on LAN will be in disabled state. Using ethtool, users can enable/disable or choose packet types. For wake-on LAN via ARP, ensure the IP address is assigned and report an error otherwise. Co-developed-by: Harini Katakam <harini.katakam@amd.com> Signed-off-by: Harini Katakam <harini.katakam@amd.com> Signed-off-by: Vineeth Karumanchi <vineeth.karumanchi@amd.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Claudiu Beznea <claudiu.beznea@tuxon.dev> Tested-by: Claudiu Beznea <claudiu.beznea@tuxon.dev> # on SAMA7G5 Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:53:07 +02:00
Vineeth Karumanchi	3650a8cc5b	net: macb: Enable queue disable Enable queue disable for Versal devices. Signed-off-by: Vineeth Karumanchi <vineeth.karumanchi@amd.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Claudiu Beznea <claudiu.beznea@tuxon.dev> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:53:07 +02:00
Vineeth Karumanchi	759cc793eb	net: macb: queue tie-off or disable during WOL suspend When GEM is used as a wake device, it is not mandatory for the RX DMA to be active. The RX engine in IP only needs to receive and identify a wake packet through an interrupt. The wake packet is of no further significance; hence, it is not required to be copied into memory. By disabling RX DMA during suspend, we can avoid unnecessary DMA processing of any incoming traffic. During suspend, perform either of the below operations: - tie-off/dummy descriptor: Disable unused queues by connecting them to a looped descriptor chain without free slots. - queue disable: The newer IP version allows disabling individual queues. Co-developed-by: Harini Katakam <harini.katakam@amd.com> Signed-off-by: Harini Katakam <harini.katakam@amd.com> Signed-off-by: Vineeth Karumanchi <vineeth.karumanchi@amd.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Claudiu Beznea <claudiu.beznea@tuxon.dev> Tested-by: Claudiu Beznea <claudiu.beznea@tuxon.dev> # on SAMA7G5 Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:53:07 +02:00
Paolo Abeni	7e7c714a36	Merge branch 'af_unix-remove-spin_lock_nested-and-convert-to-lock_cmp_fn' Kuniyuki Iwashima says: ==================== af_unix: Remove spin_lock_nested() and convert to lock_cmp_fn. This series removes spin_lock_nested() in AF_UNIX and instead defines the locking orders as functions tied to each lock by lockdep_set_lock_cmp_fn(). When the defined function returns a negative value, lockdep considers it will not cause deadlock. (See ->cmp_fn() in check_deadlock() and check_prev_add().) When we cannot define the total ordering, we return -1 for the allowed ordering and otherwise 0 as undefined. [0] [0]: https://lore.kernel.org/netdev/thzkgbuwuo3knevpipu4rzsh5qgmwhklihypdgziiruabvh46f@uwdkpcfxgloo/ Changes: v4: * Patch 4 * Make unix_state_lock_cmp_fn() symmetric. v3: https://lore.kernel.org/netdev/20240614200715.93150-1-kuniyu@amazon.com/ * Patch 3 * Cache sk->sk_state * s/unix_state_lock()/unix_state_unlock()/ * Patch 8 * Add embryo -> listener locking order v2: https://lore.kernel.org/netdev/20240611222905.34695-1-kuniyu@amazon.com/ * Patch 1 & 2 * Use (((l) > (r)) - ((l) < (r))) for comparison v1: https://lore.kernel.org/netdev/20240610223501.73191-1-kuniyu@amazon.com/ ==================== Link: https://lore.kernel.org/r/20240620205623.60139-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:10:21 +02:00
Kuniyuki Iwashima	22e5751b05	af_unix: Don't use spin_lock_nested() in copy_peercred(). When (AF_UNIX, SOCK_STREAM) socket connect()s to a listening socket, the listener's sk_peer_pid/sk_peer_cred are copied to the client in copy_peercred(). Then, two sk_peer_locks are held there; one is client's and another is listener's. However, the latter is not needed because we hold the listner's unix_state_lock() there and unix_listen() cannot update the cred concurrently. Let's drop the unnecessary spin_lock() and use the bare spin_lock() for the client to protect concurrent read by getsockopt(SO_PEERCRED). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:10:19 +02:00
Kuniyuki Iwashima	e4bd881d98	af_unix: Remove put_pid()/put_cred() in copy_peercred(). When (AF_UNIX, SOCK_STREAM) socket connect()s to a listening socket, the listener's sk_peer_pid/sk_peer_cred are copied to the client in copy_peercred(). Then, the client's sk_peer_pid and sk_peer_cred are always NULL, so we need not call put_pid() and put_cred() there. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:10:18 +02:00
Kuniyuki Iwashima	faf489e689	af_unix: Set sk_peer_pid/sk_peer_cred locklessly for new socket. init_peercred() is called in 3 places: 1. socketpair() : both sockets 2. connect() : child socket 3. listen() : listening socket The first two need not hold sk_peer_lock because no one can touch the socket. Let's set cred/pid without holding lock for the two cases and rename the old init_peercred() to update_peercred() to properly reflect the use case. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:10:18 +02:00
Kuniyuki Iwashima	8647ece481	af_unix: Define locking order for U_RECVQ_LOCK_EMBRYO in unix_collect_skb(). While GC is cleaning up cyclic references by SCM_RIGHTS, unix_collect_skb() collects skb in the socket's recvq. If the socket is TCP_LISTEN, we need to collect skb in the embryo's queue. Then, both the listener's recvq lock and the embroy's one are held. The locking is always done in the listener -> embryo order. Let's define it as unix_recvq_lock_cmp_fn() instead of using spin_lock_nested(). Note that the reverse order is defined for consistency. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:10:18 +02:00
Kuniyuki Iwashima	7202cb5916	af_unix: Remove U_LOCK_GC_LISTENER. Commit 1971d13ffa84 ("af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().") added U_LOCK_GC_LISTENER for the old GC, but it's no longer needed for the new GC. Let's remove U_LOCK_GC_LISTENER and unix_state_lock_nested() as there's no user. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:10:18 +02:00
Kuniyuki Iwashima	c4da4661d9	af_unix: Remove U_LOCK_DIAG. sk_diag_dump_icons() acquires embryo's lock by unix_state_lock_nested() to fetch its peer. The embryo's ->peer is set to NULL only when its parent listener is close()d. Then, unix_release_sock() is called for each embryo after unlinking skb by skb_dequeue(). In sk_diag_dump_icons(), we hold the parent's recvq lock, so we need not acquire unix_state_lock_nested(), and peer is always non-NULL. Let's remove unnecessary unix_state_lock_nested() and non-NULL test for peer. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:10:18 +02:00
Kuniyuki Iwashima	b380b18102	af_unix: Don't acquire unix_state_lock() for sock_i_ino(). sk_diag_dump_peer() and sk_diag_dump() call unix_state_lock() for sock_i_ino() which reads SOCK_INODE(sk->sk_socket)->i_ino, but it's protected by sk->sk_callback_lock. Let's remove unnecessary unix_state_lock(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-25 11:10:18 +02:00

1 2 3 4 5 ...

1281276 Commits