linux

iv/linux

Author	SHA1	Message	Date
Jiawen Wu	b501d261a5	net: txgbe: add FDIR ATR support Add flow director ATR filter. ATR mode is enabled by default to filter TCP packets. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-20 13:18:53 +02:00
Jakub Kicinski	7e8fcb8154	Merge branch 'ionic-rework-fix-for-doorbell-miss' Shannon Nelson says: ==================== ionic: rework fix for doorbell miss A latency test in a scaled out setting (many VMs with many queues) has uncovered an issue with our missed doorbell fix from commit b69585bfcece ("ionic: missed doorbell workaround") As a refresher, the Elba ASIC has an issue where once in a blue moon it might miss/drop a queue doorbell notification from the driver. This can result in Tx timeouts and potential Rx buffer misses. The basic problem with the original solution is that we're delaying things with a timer for every single queue, periodically using mod_timer() to reset to reset the alarm, and mod_timer() becomes a more and more expensive thing as there are more and more VFs and queues each with their own timer. A ping-pong latency test tends to exacerbate the effect such that every napi is doing a mod_timer() in every cycle. An alternative has been worked out to replace this using periodic workqueue items outside the napi cycle to request a napi_schedule driven by a single delayed-workqueue per device rather than a timer for every queue. Also, now that newer firmware is actually reporting its ASIC type, we can restrict this to the appropriate chip. The testing scenario used 128 VFs in UP state, 16 queues per VF, and latency tests were done using TCP_RR with adaptive interrupt coalescing enabled, running on 1 VF. We would see 99th percentile latencies of up to 900us range, with some max fliers as much as 4ms. With these fixes the 99th percentile latencies are typically well under 50us with the occasional max under 500us. v1: https://lore.kernel.org/netdev/20240610230706.34883-1-shannon.nelson@amd.com/ ==================== Link: https://lore.kernel.org/r/20240619003257.6138-1-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-19 18:31:49 -07:00
Brett Creeley	da0262c2c9	ionic: Only run the doorbell workaround for certain asic_type If the doorbell workaround isn't required for a certain asic_type then there is no need to run the associated code. Since newer FW versions are finally reporting their asic_type we can use a flag to determine whether or not to do the workaround. Signed-off-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240619003257.6138-9-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-19 18:31:49 -07:00
Brett Creeley	f703d56c03	ionic: Use an u16 for rx_copybreak We only support (u16)-1 size for rx_copybreak, so we can reduce the field size and move a couple other fields around to save a little space in the ionic_lif struct. Before: /* size: 17440, cachelines: 273, members: 56 / / sum members: 17403, holes: 9, sum holes: 37 / / last cacheline: 32 bytes / After: / size: 17424, cachelines: 273, members: 56 / / sum members: 17401, holes: 7, sum holes: 23 / / last cacheline: 16 bytes */ Signed-off-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240619003257.6138-8-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-19 18:31:49 -07:00
Shannon Nelson	55a3982ec7	ionic: check for queue deadline in doorbell_napi_work Check the deadline against the last time run and only schedule a new napi if we haven't been run recently. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240619003257.6138-7-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-19 18:31:48 -07:00
Shannon Nelson	d7f9bc6859	ionic: add per-queue napi_schedule for doorbell check Add a work item for each queue that will be run on the queue's preferred cpu and will schedule another napi. This napi is run in case the device missed a doorbell and didn't process a packet. This is a problem for the Elba asic that happens very rarely. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240619003257.6138-6-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-19 18:31:48 -07:00
Shannon Nelson	4ded136c78	ionic: add work item for missed-doorbell check Add the first queued work for checking on the missed doorbell. This is a delayed work item that reschedules itself every cycle starting at probe. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240619003257.6138-5-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-19 18:31:48 -07:00
Shannon Nelson	9e25450da7	ionic: add private workqueue per-device Instead of using the system's default workqueue, add a private workqueue for the device to use for its little jobs. This is to better support the new work items we will be adding in the next patches for PF and VF specific jobs, without inundating the system workqueue in a couple of customer cases where our devices get scaled out to 100-200 VFs. Signed-off-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240619003257.6138-4-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-19 18:31:48 -07:00
Brett Creeley	d458d4b4fd	ionic: Keep interrupt affinity up to date Currently the driver either sets the initial interrupt affinity for its adminq and tx/rx queues on probe or resets it on various down/up/reconfigure flows. If any user and/or user process (i.e. irqbalance) changes IRQ affinity for any of the driver's interrupts that will be reset to driver defaults whenever any down/up/reconfigure operation happens. This is incorrect and is fixed by making 2 changes: 1. Allocate an array of cpumasks that's only allocated on probe and destroyed on remove. 2. Update the cpumask(s) for interrupts that are in use by registering for affinity notifiers. Signed-off-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240619003257.6138-3-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-19 18:31:33 -07:00
Shannon Nelson	4aaa49a282	ionic: remove missed doorbell per-queue timer Remove the timer-per-queue mechanics from the missed doorbell check in preparation for the new missed doorbell fix. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240619003257.6138-2-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-19 18:30:54 -07:00
Jakub Kicinski	6f80fcdfbc	Merge branch 'mlxsw-use-page-pool-for-rx-buffers-allocation' Petr Machata says: ==================== mlxsw: Use page pool for Rx buffers allocation Amit Cohen writes: After using NAPI to process events from hardware, the next step is to use page pool for Rx buffers allocation, which is also enhances performance. To simplify this change, first use page pool to allocate one continuous buffer for each packet, later memory consumption can be improved by using fragmented buffers. This set significantly enhances mlxsw driver performance, CPU can handle about 370% of the packets per second it previously handled. The next planned improvement is using XDP to optimize telemetry. Patch set overview: Patches #1-#2 are small preparations for page pool usage Patch #3 initializes page pool, but do not use it Patch #4 converts the driver to use page pool for buffers allocations Patch #5 is an optimization for buffer access Patch #6 cleans up an unused structure Patch #7 uses napi_consume_skb() as part of Tx completion ==================== Link: https://lore.kernel.org/r/cover.1718709196.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-19 17:38:13 -07:00
Amit Cohen	d94ae6415b	mlxsw: pci: Use napi_consume_skb() to free SKB as part of Tx completion Currently, as part of Tx completion, the driver calls dev_kfree_skb_any() to free the SKB. For this flow, the correct function is napi_consume_skb(). This function and dev_consume_skb_any() were added to be used for consumed SKBs, which were not dropped, so the skb:kfree_skb tracepoint is not triggered, and we can get better diagnostics about dropped packets. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/a9f9f3dc884c0d1be4bd4c9d72030c88c7ac004f.1718709196.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-19 17:38:11 -07:00
Amit Cohen	e8441b1f6b	mlxsw: pci: Do not store SKB for RDQ elements The previous patch used page pool to allocate buffers for RDQ. With this change, 'elem_info->u.rdq.skb' is not used anymore, as we do not allocate SKB before getting the packet, we hold page pointer and build the SKB around it once packet is received. Remove the union and store SKB pointer for SDQ only. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/23a531008936dc9a1a298643fb1e4f9a7b8e6eb3.1718709196.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-19 17:38:11 -07:00
Amit Cohen	0f3cd437a1	mlxsw: pci: Optimize data buffer access Before accessing data buffer, call net_prefetch() to load it into the cache. This change improves driver performance, CPU can handle about 7.1% more packets per second. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/1fa07c510890866a6f201163ab7e78890ba28b3b.1718709196.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-19 17:38:11 -07:00
Amit Cohen	b5b60bb491	mlxsw: pci: Use page pool for Rx buffers allocation As part of driver init, all Rx queues are filled with buffers for hardware usage. Later, when a packet is received, a new buffer should be allocated to be used by hardware instead of the received buffer. Packet's processing time includes allocation time, which can be improved using page pool. Using page pool, DMA mapping is done only for first allocation of buffers. As subsequent buffers allocation avoid DMA mapping, it results in performance improvement. The purpose of page pool is to allocate pages fast from cache without locking. This lockless guarantee naturally comes from running under a NAPI. Use page pool to allocate the data buffer only, so hardware will use it to fill the packet. At completion time, attach the data buffer (now filled with packet payload) to new SKB which is allocated around the received buffer. SKB building at completion time prevents cache miss for each packet, as now the SKB is allocated right before packets will be handled by networking stack. Page pool for each Rx queue enhances Rx side performance by reclaiming buffers back to each queue specific pool. This change significantly improves driver performance, CPU can handle about 345% of the packets per second it previously handled. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/1cf788a8f43c70aae6d526018ef77becb27ad6d3.1718709196.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-19 17:38:10 -07:00
Amit Cohen	5642c6a086	mlxsw: pci: Initialize page pool per CQ Next patch will use page pool to allocate buffers for RDQ. Initialize page pool for each CQ, which is mapped 1:1 to RDQ. Page pool for each Rx queue enhances Rx side performance by reclaiming buffers back to each queue specific pool. When only one NAPI instance is the consumer of pages from page pool, it is recommended to pass it as part of 'page_pool_params', then page pool APIs will be done without special locks. mlxsw driver holds NAPI instance per CQ, so add page pool per CQ and use the existing NAPI instance. For now, pages are not allocated from the pool, next patch will use it. Some notes regarding 'page_pool_params': * Use PP_FLAG_DMA_MAP to allow page pool handles DMA mapping, for now do not use sync flag, as only the device writes to this memory and we read it only when it finishes writing there. This will probably be changed when we will support XDP. * Define 'order' according to maximum MTU and take into account software overhead. Some round up are done, which means that we allocate more pages than we really need. This can be improved later by using fragmented buffers. * Use pool_size = MLXSW_PCI_WQE_COUNT. This will be the size of 'ptr_ring', and should be the maximum amount of packets that page pool will allocate memory for. In our case, this is the queue size, defined as MLXSW_PCI_WQE_COUNT. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/02e5856ae7c572d4293ce6bb92c286ee6cfec800.1718709196.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-19 17:38:10 -07:00
Amit Cohen	7555b7f338	mlxsw: pci: Store CQ pointer as part of RDQ structure Next patches will add support for page pool in mlxsw driver. Page pool will be used to allocate buffers for RDQ and will use NAPI instance of the appropriate CQ (RDQ is mapped 1:1 to CQ). To allow pool initialization as part of CQ init, when NAPI is initialized, page_pool structure will be as part of CQ structure. Later, the allocations for RDQ will be done from the pool in the appropriate CQ. To allow access to the appropriate pool, set CQ pointer as part of RDQ initialization. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/d60918ca1e142a554af1df9c1152cdac83854a3b.1718709196.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-19 17:38:10 -07:00
Amit Cohen	39fa294f58	mlxsw: pci: Split NAPI setup/teardown into two steps mlxsw_pci_cq_napi_setup() includes both NAPI initialization and enablement, similar to teardown function. Next patches will add support for page pool in mlxsw driver, then we use NAPI instance for page pool. Page pool initialization should be done before NAPI enablement, same for page pool destruction which should be done after NAPI disablement. As preparation, split NAPI setup/teardown into two steps, then page pool setup will be done between the phases. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/8dbf37e859f07247498fca17109b8858ff2b0498.1718709196.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-19 17:38:10 -07:00
Lukasz Majewski	89f5e60777	net: hsr: cosmetic: Remove extra white space This change just removes extra (i.e. not needed) white space in prp_drop_frame() function. No functional changes. Signed-off-by: Lukasz Majewski <lukma@denx.de> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20240618125817.1111070-1-lukma@denx.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-19 17:32:57 -07:00
Jiri Pirko	c8bd1f7f3e	virtio_net: add support for Byte Queue Limits Add support for Byte Queue Limits (BQL). Tested on qemu emulated virtio_net device with 1, 2 and 4 queues. Tested with fq_codel and pfifo_fast. Super netperf with 50 threads is running in background. Netperf TCP_RR results: NOBQL FQC 1q: 159.56 159.33 158.50 154.31 agv: 157.925 NOBQL FQC 2q: 184.64 184.96 174.73 174.15 agv: 179.62 NOBQL FQC 4q: 994.46 441.96 416.50 499.56 agv: 588.12 NOBQL PFF 1q: 148.68 148.92 145.95 149.48 agv: 148.2575 NOBQL PFF 2q: 171.86 171.20 170.42 169.42 agv: 170.725 NOBQL PFF 4q: 1505.23 1137.23 2488.70 3507.99 agv: 2159.7875 BQL FQC 1q: 1332.80 1297.97 1351.41 1147.57 agv: 1282.4375 BQL FQC 2q: 768.30 817.72 864.43 974.40 agv: 856.2125 BQL FQC 4q: 945.66 942.68 878.51 822.82 agv: 897.4175 BQL PFF 1q: 149.69 151.49 149.40 147.47 agv: 149.5125 BQL PFF 2q: 2059.32 798.74 1844.12 381.80 agv: 1270.995 BQL PFF 4q: 1871.98 4420.02 4916.59 13268.16 agv: 6119.1875 Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20240618144456.1688998-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-19 17:28:19 -07:00
Jeff Johnson	2b0cd6b727	net: smc9194: add missing MODULE_DESCRIPTION() macro With ARCH=m68k, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/smsc/smc9194.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240618-md-m68k-drivers-net-ethernet-smsc-v1-1-ad3d7200421e@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-19 17:22:33 -07:00
Jeff Johnson	5e736135ad	net: ethernet: mac89x0: add missing MODULE_DESCRIPTION() macro With ARCH=m68k, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/cirrus/mac89x0.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240618-md-m68k-drivers-net-ethernet-cirrus-v1-1-07f5bd0b64cb@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-19 17:22:00 -07:00
Jeff Johnson	deb9d57662	net: amd: add missing MODULE_DESCRIPTION() macros With ARCH=m68k, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/a2065.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/ariadne.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/atarilance.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/hplance.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/7990.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/mvme147.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/sun3lance.o Add the missing invocation of the MODULE_DESCRIPTION() macro to all files which have a MODULE_LICENSE(). This includes drivers/net/ethernet/amd/lance.c which, although it did not produce a warning with the m68k allmodconfig configuration, may cause this warning with other configurations. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240618-md-m68k-drivers-net-ethernet-amd-v1-1-50ee7a9ad50e@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-19 17:21:40 -07:00
Jeff Johnson	e6fed01554	net: arcnet: com20020-isa: add missing MODULE_DESCRIPTION() macro With ARCH=m68k, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/com20020-isa.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240618-md-m68k-drivers-net-arcnet-v1-1-90e42bc58102@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-19 17:21:12 -07:00
Matthias Schiffer	3d49ee2127	net: dsa: mt7530: add support for bridge port isolation Remove a pair of ports from the port matrix when both ports have the isolated flag set. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Arınç ÜNAL <arinc.unal@arinc9.com> Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-19 13:10:01 +01:00
Matthias Schiffer	c25c961fc7	net: dsa: mt7530: factor out bridge join/leave logic As preparation for implementing bridge port isolation, move the logic to add and remove bits in the port matrix into a new helper mt7530_update_port_member(), which is called from mt7530_port_bridge_join() and mt7530_port_bridge_leave(). Another part of the preparation is using dsa_port_offloads_bridge_dev() instead of dsa_port_offloads_bridge() to check for bridge membership, as we don't have a struct dsa_bridge in mt7530_port_bridge_flags(). The port matrix setting is slightly streamlined, now always first setting the mt7530_port's pm field and then writing the port matrix from that field into the hardware register, instead of duplicating the bit manipulation for both the struct field and the register. mt7530_port_bridge_join() was previously using \|= to update the port matrix with the port bitmap, which was unnecessary, as pm would only have the CPU port set before joining a bridge; a simple assignment can be used for both joining and leaving (and will also work when individual bits are added/removed in port_bitmap with regard to the previous port matrix, which is what happens with port isolation). No functional change intended. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Arınç ÜNAL <arinc.unal@arinc9.com> Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-19 13:10:01 +01:00
David S. Miller	65e4efa049	Merge branch 'net-drop-rx-socket-tracepoint' Yan Zhai says: ==================== net: pass receive socket to drop tracepoint We set up our production packet drop monitoring around the kfree_skb tracepoint. While this tracepoint is extremely valuable for diagnosing critical problems, it also has some limitation with drops on the local receive path: this tracepoint can only inspect the dropped skb itself, but such skb might not carry enough information to: 1. determine in which netns/container this skb gets dropped 2. determine by which socket/service this skb oughts to be received The 1st issue is because skb->dev is the only member field with valid netns reference. But skb->dev can get cleared or reused. For example, tcp_v4_rcv will clear skb->dev and in later processing it might be reused for OFO tree. The 2nd issue is because there is no reference on an skb that reliably points to a receiving socket. skb->sk usually points to the local sending socket, and it only points to a receive socket briefly after early demux stage, yet the socket can get stolen later. For certain drop reason like TCP OFO_MERGE, Zerowindow, UDP at PROTO_MEM error, etc, it is hard to infer which receiving socket is impacted. This cannot be overcome by simply looking at the packet header, because of complications like sk lookup programs. In the past, single purpose tracepoints like trace_udp_fail_queue_rcv_skb, trace_sock_rcvqueue_full, etc are added as needed to provide more visibility. This could be handled in a more generic way. In this change set we propose a new 'sk_skb_reason_drop' call as a drop-in replacement for kfree_skb_reason at various local input path. It accepts an extra receiving socket argument. Both issues above can be resolved via this new argument. V4->V5: rename rx_skaddr to rx_sk to be more clear visually, suggested by Jesper Dangaard Brouer. V3->V4: adjusted the TP_STRUCT field order to align better, suggested by Steven Rostedt. V2->V3: fixed drop_monitor function signatures; fixed a few uninitialized sks; Added a few missing report tags from test bots (also noticed by Dan Carpenter and Simon Horman). V1->V2: instead of using skb->cb, directly add the needed argument to trace_kfree_skb tracepoint. Also renamed functions as Eric Dumazet suggested. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-19 12:44:23 +01:00
Yan Zhai	e2e7d78d9a	af_packet: use sk_skb_reason_drop to free rx packets Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving socket to the tracepoint. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202406011859.Aacus8GV-lkp@intel.com/ Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-19 12:44:22 +01:00
Yan Zhai	fc0cc92488	udp: use sk_skb_reason_drop to free rx packets Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving socket to the tracepoint. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202406011751.NpVN0sSk-lkp@intel.com/ Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-19 12:44:22 +01:00
Yan Zhai	46a02aa357	tcp: use sk_skb_reason_drop to free rx packets Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving socket to the tracepoint. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202406011539.jhwBd7DX-lkp@intel.com/ Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-19 12:44:22 +01:00
Yan Zhai	ce9a2424e9	net: raw: use sk_skb_reason_drop to free rx packets Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving socket to the tracepoint. Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-19 12:44:22 +01:00
Yan Zhai	7467de1763	ping: use sk_skb_reason_drop to free rx packets Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving socket to the tracepoint. Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-19 12:44:22 +01:00
Yan Zhai	ba8de796ba	net: introduce sk_skb_reason_drop function Long used destructors kfree_skb and kfree_skb_reason do not pass receiving socket to packet drop tracepoints trace_kfree_skb. This makes it hard to track packet drops of a certain netns (container) or a socket (user application). The naming of these destructors are also not consistent with most sk/skb operating functions, i.e. functions named "sk_xxx" or "skb_xxx". Introduce a new functions sk_skb_reason_drop as drop-in replacement for kfree_skb_reason on local receiving path. Callers can now pass receiving sockets to the tracepoints. kfree_skb and kfree_skb_reason are still usable but they are now just inline helpers that call sk_skb_reason_drop. Note it is not feasible to do the same to consume_skb. Packets not dropped can flow through multiple receive handlers, and have multiple receiving sockets. Leave it untouched for now. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-19 12:44:22 +01:00
Yan Zhai	c53795d48e	net: add rx_sk to trace_kfree_skb skb does not include enough information to find out receiving sockets/services and netns/containers on packet drops. In theory skb->dev tells about netns, but it can get cleared/reused, e.g. by TCP stack for OOO packet lookup. Similarly, skb->sk often identifies a local sender, and tells nothing about a receiver. Allow passing an extra receiving socket to the tracepoint to improve the visibility on receiving drops. Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-19 12:44:22 +01:00
David S. Miller	6f46fc9bc2	Merge branch 'am65x-ptp' Diogo Ivo says ==================== Enable PTP timestamping/PPS for AM65x SR1.0 devices This patch series enables support for PTP in AM65x SR1.0 devices. This feature relies heavily on the Industrial Ethernet Peripheral (IEP) hardware module, which implements a hardware counter through which time is kept. This hardware block is the basis for exposing a PTP hardware clock to userspace and for issuing timestamps for incoming/outgoing packets, allowing for time synchronization. The IEP also has compare registers that fire an interrupt when the counter reaches the value stored in a compare register. This feature allows us to support PPS events in the kernel. The changes are separated into five patches: - PATCH 01/05: Register SR1.0 devices with the IEP infrastructure to expose a PHC clock to userspace, allowing time to be adjusted using standard PTP tools. The code for issuing/ collecting packet timestamps is already present in the current state of the driver, so only this needs to be done. - PATCH 02/05: Remove unnecessary spinlock synchronization. - PATCH 03/05: Document IEP interrupt in DT binding. - PATCH 04/05: Add support for IEP compare event/interrupt handling to enable PPS events. - PATCH 05/05: Add the interrupts to the IOT2050 device tree. Currently every compare event generates two interrupts, the first corresponding to the actual event and the second being a spurious but otherwise harmless interrupt. The root cause of this has been identified and has been solved in the platform's SDK. A forward port of the SDK's patches also fixes the problem in upstream but is not included here since it's upstreaming is out of the scope of this series. If someone from TI would be willing to chime in and help get the interrupt changes upstream that would be great! Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> --- Changes in v4: - Remove unused 'flags' variables in patch 02/05 - Add patch 03/05 describing IEP interrupt in DT binding - Link to v3: https://lore.kernel.org/r/20240607-iep-v3-0-4824224105bc@siemens.com Changes in v3: - Collect Reviewed-by tags - Add patch 02/04 removing spinlocks from IEP driver - Use mutex-based synchronization when accessing HW registers - Link to v2: https://lore.kernel.org/r/20240604-iep-v2-0-ea8e1c0a5686@siemens.com Changes in v2: - Collect Reviewed-by tags - PATCH 01/03: Limit line length to 80 characters - PATCH 02/03: Proceed with limited functionality if getting IRQ fails, limit line length to 80 characters - Link to v1: https://lore.kernel.org/r/20240529-iep-v1-0-7273c07592d3@siemens.com ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-19 10:56:10 +01:00
Diogo Ivo	71be1189c9	arm64: dts: ti: iot2050: Add IEP interrupts for SR1.0 devices Add the interrupts needed for PTP Hardware Clock support via IEP in SR1.0 devices. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-19 10:56:10 +01:00
Diogo Ivo	f18ad402cd	net: ti: icss-iep: Enable compare events The IEP module supports compare events, in which a value is written to a hardware register and when the IEP counter reaches the written value an interrupt is generated. Add handling for this interrupt in order to support PPS events. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-19 10:56:10 +01:00
Diogo Ivo	5056860cf8	dt-bindings: net: Add IEP interrupt The IEP interrupt is used in order to support both capture events, where an incoming external signal gets timestamped on arrival, and compare events, where an interrupt is generated internally when the IEP counter reaches a programmed value. Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-19 10:56:10 +01:00
Diogo Ivo	5758e03cf6	net: ti: icss-iep: Remove spinlock-based synchronization As all sources of concurrency in hardware register access occur in non-interrupt context eliminate spinlock-based synchronization and rely on the mutex-based synchronization that is already present. Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-19 10:56:10 +01:00
Diogo Ivo	5e1e43893b	net: ti: icssg-prueth: Enable PTP timestamping support for SR1.0 devices Enable PTP support for AM65x SR1.0 devices by registering with the IEP infrastructure in order to expose a PTP clock to userspace. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-19 10:56:09 +01:00
Hongfu Li	9f1f70dd85	rds:Simplify the allocation of slab caches Use the new KMEM_CACHE() macro instead of direct kmem_cache_create to simplify the creation of SLAB caches. Signed-off-by: Hongfu Li <lihongfu@kylinos.cn> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-06-19 10:47:40 +01:00
Haiyang Zhang	382d1741b5	net: mana: Add support for page sizes other than 4KB on ARM64 As defined by the MANA Hardware spec, the queue size for DMA is 4KB minimal, and power of 2. And, the HWC queue size has to be exactly 4KB. To support page sizes other than 4KB on ARM64, define the minimal queue size as a macro separately from the PAGE_SIZE, which we always assumed it to be 4KB before supporting ARM64. Also, add MANA specific macros and update code related to size alignment, DMA region calculations, etc. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/1718655446-6576-1-git-send-email-haiyangz@microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-18 18:21:18 -07:00
Jakub Kicinski	2c6a4b969c	Merge branch 'net-mlx4_en-use-ethtool_puts-sprintf' Kamal Heib says: ==================== net/mlx4_en: Use ethtool_puts/sprintf This patchset updates the mlx4_en driver to use the ethtool_puts and ethtool_sprintf helper functions. Signed-off-by: Kamal Heib <kheib@redhat.com> ==================== Link: https://lore.kernel.org/r/20240617172329.239819-1-kheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-18 18:19:23 -07:00
Kamal Heib	6c7dd432dc	net/mlx4_en: Use ethtool_puts/sprintf to fill stats strings Use the ethtool_puts/ethtool_sprintf helper to print the stats strings into the ethtool strings interface. Signed-off-by: Kamal Heib <kheib@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240617172329.239819-4-kheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-18 18:19:18 -07:00
Kamal Heib	4454929c34	net/mlx4_en: Use ethtool_puts to fill selftest strings Use the ethtool_puts helper to print the selftest strings into the ethtool strings interface. Signed-off-by: Kamal Heib <kheib@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240617172329.239819-3-kheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-18 18:19:18 -07:00
Kamal Heib	e52e010395	net/mlx4_en: Use ethtool_puts to fill priv flags strings Use the ethtool_puts helper to print the priv flags strings into the ethtool strings interface. Signed-off-by: Kamal Heib <kheib@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240617172329.239819-2-kheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-18 18:19:18 -07:00
Christophe JAILLET	8c379e3ce4	net: microchip: Constify struct vcap_operations "struct vcap_operations" are not modified in these drivers. Constifying this structure moves some data to a read-only section, so increase overall security. In order to do it, "struct vcap_control" also needs to be adjusted to this new const qualifier. As an example, on a x86_64, with allmodconfig: Before: ====== text data bss dec hex filename 15176 1094 16 16286 3f9e drivers/net/ethernet/microchip/lan966x/lan966x_vcap_impl.o After: ===== text data bss dec hex filename 15268 998 16 16282 3f9a drivers/net/ethernet/microchip/lan966x/lan966x_vcap_impl.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Link: https://lore.kernel.org/r/d8e76094d2e98ebb5bfc8205799b3a9db0b46220.1718524644.git.christophe.jaillet@wanadoo.fr Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-18 15:26:39 +02:00
Paolo Abeni	e845bb84fb	Merge branch 'introduce-phy-mode-10g-qxgmii' Luo Jie says: ==================== Introduce PHY mode 10G-QXGMII This patch series adds 10G-QXGMII mode for PHY driver. The patch series is split from the QCA8084 PHY driver patch series below. https://lore.kernel.org/all/20231215074005.26976-1-quic_luoj@quicinc.com/ Per Andrew Lunn’s advice, submitting this patch series for acceptance as they already include the necessary 'Reviewed-by:' tags. This way, they need not wait for QCA8084 series patches to conclude review. Changes in v2: * remove PHY_INTERFACE_MODE_10G_QXGMII from workaround of validation in the phylink_validate_phy. 10G_QXGMII will be set into phy->possible_interfaces in its .config_init method of PHY driver that supports it. ==================== Link: https://lore.kernel.org/r/20240615120028.2384732-1-quic_luoj@quicinc.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-18 13:28:29 +02:00
Vladimir Oltean	5dfabcdd76	dt-bindings: net: ethernet-controller: add 10g-qxgmii mode Add the new interface mode 10g-qxgmii, which is similar to usxgmii but extend to 4 channels to support maximum of 4 ports with the link speed 10M/100M/1G/2.5G. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Luo Jie <quic_luoj@quicinc.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-18 13:28:26 +02:00
Vladimir Oltean	777b8afb81	net: phy: introduce core support for phy-mode = "10g-qxgmii" 10G-QXGMII is a MAC-to-PHY interface defined by the USXGMII multiport specification. It uses the same signaling as USXGMII, but it multiplexes 4 ports over the link, resulting in a maximum speed of 2.5G per port. Some in-tree SoCs like the NXP LS1028A use "usxgmii" when they mean either the single-port USXGMII or the quad-port 10G-QXGMII variant, and they could get away just fine with that thus far. But there is a need to distinguish between the 2 as far as SerDes drivers are concerned. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Luo Jie <quic_luoj@quicinc.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-06-18 13:28:26 +02:00

1 2 3 4 5 ...

1280826 Commits