linux

iv/linux

Author	SHA1	Message	Date
Matthieu Baerts	c4192967e6	selftests: mptcp: lib: format subtests results in TAP The current selftests infrastructure formats the results in TAP 13. This version doesn't support subtests and only the end result of each selftest is taken into account. It means that a single issue in a subtest of a selftest containing multiple subtests forces the whole selftest to be marked as failed. It also means that subtest results are not tracked by CIs executing selftests. MPTCP selftests run hundreds of various subtests. It is then important to track each of them and not one result per selftest. It is particularly interesting to do that when validating stable kernels with the latest version of the test suite: tests might fail because a feature is not supported but the test didn't skip that part. In this case, if subtests are not tracked, the whole selftest will be marked as failed, making the other subtests useless because their results are ignored. This patch adds some helpers in mptcp_lib.sh to be able to easily format subtest results in TAP in the different MPTCP selftests. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 11:10:52 +01:00
Matthieu Baerts	d8463d8165	selftests: mptcp: userspace_pm: reduce dup code around printf In this selftest, "printf" is always used with "stdbuf". With a new helper, it is possible to call "stdbuf" only from one place. This makes the code a bit clearer to read. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 11:10:52 +01:00
Matthieu Baerts	e198ad7592	selftests: mptcp: userspace_pm: uniform results printing There are a few reasons to do that: - When the tabs are not printed as 8 spaces, some results were not properly aligned - Some lines printing the test name were very long due to the use of a lot of spaces/tabs at the end and stdbuf at the beginning. - To reduce duplicated code, e.g. to print what has failed and set the status But by centralising how the test results are printed, this also prepares future commits to avoid more duplicated code and ease the tracking of the different subtests. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 11:10:52 +01:00
Matthieu Baerts	8320b1387a	selftests: mptcp: userspace_pm: fix shellcheck warnings shellcheck recently helped to find an issue where a wrong variable name was used. It is then good to fix the other harmless issues in order to spot "real" ones later. Here, three categories of warnings are ignored: - SC2317: Command appears to be unreachable. The cleanup() function is invoked indirectly via the EXIT trap. - SC2034: Variable appears unused. The check_expected_one() function takes the name of the variable as an argument but it ends up reading the content: indirect usage. - SC2086: Double quote to prevent globbing and word splitting. This is recommended but the current usage is correct and there is no need to do all these modifications to be compliant with this rule. One error has been fixed with SC2181: Check exit code directly with e.g. 'if ! mycmd;', not indirectly with $?. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 11:10:52 +01:00
Matthieu Baerts	e141c1e8e4	selftests: mptcp: userspace pm: don't stop if error No more tests were executed after a failure but it is still interesting to get results for all the tests to better understand what's still OK and what's not after a modification. Now we only exit earlier if the two connections cannot be established. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 11:10:52 +01:00
Matthieu Baerts	edbc16c43b	selftests: mptcp: connect: don't stop if error No more tests were executed after a failure but it is still interesting to get results for all the tests to better understand what's still OK and what's not after a modification. Now we only exit earlier if the basic tests are failing: no ping going through namespaces or unable to transfer data on the loopback interface. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 11:10:52 +01:00
Rohan G Thomas	47448ff2d5	net: stmmac: xgmac: Fix L3L4 filter count Get the exact count of L3L4 filters when the L3L4FNUM field of the HW_FEATURE1 register is >= 8. If L3L4FNUM < 8, then the number of L3L4 filters supported by XGMAC is equal to L3L4FNUM. From L3L4FNUM >= 8, the number of L3L4 filters follows the sequence 8, 16, 32, ... The current maximum value of L3L4FNUM is 10. Also, fix the XGMAC_IDDR bitmask of the L3L4_ADDR_CTRL register. The IDDR field starts at bit 8 of the L3L4_ADDR_CTRL register. IDDR[3:0] indicates the type of L3L4 filter register, while IDDR[8:4] indicates the filter number (0 to 31). So overall 9 bits are used for IDDR (i.e. L3L4_ADDR_CTRL[16:8]) to address the registers of all the filters. Currently, XGMAC_IDDR is GENMASK(15,8), which prevents access to L3L4 filters above 15 on XGMACs configured with more than 16 L3L4 filters. Signed-off-by: Rohan G Thomas <rohan.g.thomas@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 11:08:39 +01:00
Wang Ming	daa751444f	net: ipv4: Use kfree_sensitive instead of kfree The key buffer might contain private key material, so it is better to free it with kfree_sensitive. Fixes: 38320c70d282 ("[IPSEC]: Use crypto_aead and authenc in ESP") Signed-off-by: Wang Ming <machel@vivo.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 11:03:03 +01:00
David S. Miller	b3f937f15c	Merge branch 'backup-nexthop-ID' Ido Schimmel says:

====================
Add backup nexthop ID support

tl;dr
=====

This patchset adds a new bridge port attribute specifying the nexthop object ID to attach to a redirected skb as tunnel metadata. The ID is used by the VXLAN driver to choose the target VTEP for the skb. This is useful for EVPN multi-homing, where we want to redirect local (intra-rack) traffic upon carrier loss through one of the other VTEPs (ES peers) connected to the target host.

Background
==========

In a typical EVPN multi-homing setup each host is multi-homed using a set of links called ES (Ethernet Segment, i.e., LAG) to multiple leaf switches in a rack. These switches act as VTEPs and are not directly connected (as opposed to MLAG), but can communicate with each other (as well as with VTEPs in remote racks) via spine switches over L3.

The control plane uses Type 1 routes [1] to create a mapping between an ES and VTEPs where the ES has active links. In addition, the control plane uses Type 2 routes [2] to create a mapping between {MAC, VLAN} and an ES. These tables are then used by the control plane to instruct VTEPs how to reach remote hosts. For example, assume that {MAC X, VLAN Y} is accessible via ES1 and that this ES has active links to VTEP1 and VTEP2. The control plane will program the following entries on a remote VTEP:

 # ip nexthop add id 1 via $VTEP1_IP fdb
 # ip nexthop add id 2 via $VTEP2_IP fdb
 # ip nexthop add id 10 group 1/2 fdb
 # bridge fdb add $MAC_X dev vx0 master extern_learn vlan $VLAN_Y
 # bridge fdb add $MAC_X dev vx0 self extern_learn nhid 10 src_vni $VNI_Y

Remote traffic towards the host will be load balanced between VTEP1 and VTEP2.
If the control plane notices a carrier loss on the ES1 link connected to VTEP1, it will issue a Type 1 route withdraw, prompting remote VTEPs to remove the affected nexthop from the group:

 # ip nexthop replace id 10 group 2 fdb

Motivation
==========

While remote traffic can be redirected to a VTEP with an active ES link by withdrawing a Type 1 route, the same is not true for local traffic. A host that is multi-homed to VTEP1 and VTEP2 via another ES (e.g., ES2) will send its traffic to {MAC X, VLAN Y} via one of these two switches, according to its LAG hash algorithm, which is not under our control. If the traffic arrives at VTEP1, which no longer has an active ES1 link, it will be dropped due to the carrier loss.

In MLAG setups, the above problem is solved by redirecting the traffic through the peer link upon carrier loss. This is achieved by defining the peer link as the backup port of the host-facing bond. For example:

 # bridge link set dev bond0 backup_port bond_peer

Unlike MLAG, there is no peer link between the leaf switches in EVPN. Instead, upon carrier loss, local traffic should be redirected through one of the active ES peers. This can be achieved by defining the VXLAN port as the backup port of the host-facing bonds. For example:

 # bridge link set dev es1_bond backup_port vx0

However, the VXLAN driver is not programmed with FDB entries for locally attached hosts and therefore does not know to which VTEP to redirect the traffic. This will result in the traffic being replicated to all the VTEPs (potentially hundreds) in the network and each VTEP dropping the traffic, except for the active ES peer.

Avoiding the flooding by programming local FDB entries in the VXLAN driver is not a viable solution, as it requires significantly increasing the number of programmed FDB entries.
Implementation
==============

The proposed solution is to create an FDB nexthop group for each ES with the IP addresses of the active ES peers and set this ID as the backup nexthop ID (new bridge port attribute) of the ES link. For example, on VTEP1:

 # ip nexthop add id 1 via $VTEP2_IP fdb
 # ip nexthop add id 10 group 1 fdb
 # bridge link set dev es1_bond backup_nhid 10
 # bridge link set dev es1_bond backup_port vx0

When the ES link loses its carrier, traffic will be redirected to the VXLAN port, but instead of only attaching the tunnel ID (i.e., VNI) as tunnel metadata to the skb, the backup nexthop ID will be attached as well. The VXLAN driver will then use this information to forward the skb via the nexthop object associated with the ID, as if the skb hit an FDB entry associated with this ID.

Testing
=======

A test for both the existing backup port attribute as well as the new backup nexthop ID attribute is added in patch #4.

Patchset overview
=================

Patch #1 extends the tunnel key structure with the new nexthop ID field.

Patch #2 uses the new field in the VXLAN driver to forward packets via the specified nexthop ID.

Patch #3 adds the new backup nexthop ID bridge port attribute and adjusts the bridge driver to attach the ID as tunnel metadata upon redirection.

Patch #4 adds a selftest.

iproute2 patches can be found here [3].

Changelog
=========

Since RFC [4]:

* Added Nik's tags.

[1] https://datatracker.ietf.org/doc/html/rfc7432#section-7.1
[2] https://datatracker.ietf.org/doc/html/rfc7432#section-7.2
[3] https://github.com/idosch/iproute2/tree/submit/backup_nhid_v1
[4] https://lore.kernel.org/netdev/20230713070925.3955850-1-idosch@nvidia.com/
====================

Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 10:53:49 +01:00
Ido Schimmel	b408453053	selftests: net: Add bridge backup port and backup nexthop ID test Add test cases for bridge backup port and backup nexthop ID, testing both good and bad flows. Example truncated output:

 # ./test_bridge_backup_port.sh
 [...]
 Tests passed: 83
 Tests failed: 0

Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 10:53:49 +01:00
Ido Schimmel	29cfb2aaa4	bridge: Add backup nexthop ID support Add a new bridge port attribute that allows attaching a nexthop object ID to an skb that is redirected to a backup bridge port with VLAN tunneling enabled. Specifically, when redirecting a known unicast packet, read the backup nexthop ID from the bridge port that lost its carrier and set it in the bridge control block of the skb before forwarding it via the backup port. Note that reading the ID from the bridge port should not result in a cache miss as the ID is added next to the 'backup_port' field that was already accessed. After this change, the 'state' field still stays on the first cache line, together with other data path related fields such as 'flags' and 'vlgrp':

struct net_bridge_port {
        struct net_bridge *        br;                   /*     0     8 */
        struct net_device *        dev;                  /*     8     8 */
        netdevice_tracker          dev_tracker;          /*    16     0 */
        struct list_head           list;                 /*    16    16 */
        long unsigned int          flags;                /*    32     8 */
        struct net_bridge_vlan_group * vlgrp;            /*    40     8 */
        struct net_bridge_port *   backup_port;          /*    48     8 */
        u32                        backup_nhid;          /*    56     4 */
        u8                         priority;             /*    60     1 */
        u8                         state;                /*    61     1 */
        u16                        port_no;              /*    62     2 */

        /* --- cacheline 1 boundary (64 bytes) --- */
[...]
} __attribute__((__aligned__(8)));

When forwarding an skb via a bridge port that has VLAN tunneling enabled, check if the backup nexthop ID stored in the bridge control block is valid (i.e., not zero). If so, instead of attaching the pre-allocated metadata (that only has the tunnel key set), allocate new metadata, set both the tunnel key and the nexthop object ID and attach it to the skb. By default, do not dump the new attribute to user space as a value of zero is an invalid nexthop object ID. The above is useful for EVPN multihoming. When one of the links composing an Ethernet Segment (ES) fails, traffic needs to be redirected towards the host via one of the other ES peers.
For example, if a host is multihomed to three different VTEPs, the backup port of each ES link needs to be set to the VXLAN device and the backup nexthop ID needs to point to an FDB nexthop group that includes the IP addresses of the other two VTEPs. The VXLAN driver will extract the ID from the metadata of the redirected skb, calculate its flow hash and forward it towards one of the other VTEPs. If the ID does not exist, or represents an invalid nexthop object, the VXLAN driver will drop the skb. This relieves the bridge driver from the need to validate the ID. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 10:53:49 +01:00
Ido Schimmel	d977e1c8e3	vxlan: Add support for nexthop ID metadata VXLAN FDB entries can point to FDB nexthop objects. Each such object includes the IP address(es) of remote VTEP(s) via which the target host is accessible. Example:

 # ip nexthop add id 1 via 192.0.2.1 fdb
 # ip nexthop add id 2 via 192.0.2.17 fdb
 # ip nexthop add id 1000 group 1/2 fdb
 # bridge fdb add 00:11:22:33:44:55 dev vx0 self static nhid 1000 src_vni 10020

This is useful for EVPN multihoming, where a single host can be connected to multiple VTEPs. The source VTEP will calculate the flow hash of the skb and forward it towards the IP address of one of the VTEPs that are members of the nexthop group. There are cases where an external entity (e.g., the bridge driver) can provide not only the tunnel ID (i.e., VNI) of the skb, but also the ID of the nexthop object via which the skb should be forwarded. Therefore, in order to support such cases, when the VXLAN device is in external / collect metadata mode and the tunnel info attached to the skb is of bridge type, extract the nexthop ID from the tunnel info. If the ID is valid (i.e., non-zero), forward the skb via the nexthop object associated with the ID, as if the skb hit an FDB entry associated with this ID. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 10:53:48 +01:00
Ido Schimmel	8bb5e82589	ip_tunnels: Add nexthop ID field to ip_tunnel_key Extend the ip_tunnel_key structure with a field indicating the ID of the nexthop object via which the skb should be routed. The field is going to be populated in subsequent patches by the bridge driver in order to indicate to the VXLAN driver which FDB nexthop object to use in order to reach the target host. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 10:53:48 +01:00
Mao Zhu	03df47c1bb	can: ucan: Remove repeated word Delete one occurrence of the repeated word 'information' in a comment. Signed-off-by: Mao Zhu <zhumao001@208suo.com> Link: https://lore.kernel.org/all/20230718163718.461137-1-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-07-19 09:04:36 +02:00
Marc Kleine-Budde	b38eb89112	Merge patch series "can: kvaser_pciefd: Add support for new Kvaser PCI Express devices" Jimmy Assarsson <extja@kvaser.com> says: This patch series adds support for a range of new Kvaser PCI Express devices based on the SmartFusion2 SoC, to the kvaser_pciefd driver. In the first patch, the hardware specific constants and functions are moved into a driver_data struct. In the second patch, we add the new devices and their hardware specific constants and functions. Changes in v2: - Rebased on can: kvaser_pciefd: Fixes and improvements https://lore.kernel.org/all/20230529134248.752036-1-extja@kvaser.com - Dropped can: kvaser_pciefd: Wrap register read and writes with macros https://lore.kernel.org/linux-can/20230523094354.83792-17-extja@kvaser.com since the driver became a lot cleaner when using FIELD_{GET,PREP} and GENMASK. Moved some parts of the patch into can: kvaser_pciefd: Move hardware specific constants and functions into a driver_data struct Removed macros reading/writing registers. - Link to v1: https://lore.kernel.org/all/20230523094354.83792-14-extja@kvaser.com Link: https://lore.kernel.org/all/20230622151153.294844-1-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-07-19 09:04:21 +02:00
Jimmy Assarsson	f33ad6776b	can: kvaser_pciefd: Add support for new Kvaser pciefd devices Add support for new Kvaser pciefd devices, based on SmartFusion2 SoC. Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://lore.kernel.org/all/20230622151153.294844-3-extja@kvaser.com [mkl: mark structs as static] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-07-19 09:03:10 +02:00
Jimmy Assarsson	c2ad812956	can: kvaser_pciefd: Move hardware specific constants and functions into a driver_data struct Move hardware-specific address offsets, interrupt masks and the DMA mapping function into struct kvaser_pciefd_driver_data, as a step towards adding new devices based on different hardware. Co-developed-by: Martin Jocic <majoc@kvaser.com> Signed-off-by: Martin Jocic <majoc@kvaser.com> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://lore.kernel.org/all/20230622151153.294844-2-extja@kvaser.com [mkl: mark structs as static] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-07-19 09:01:21 +02:00
Marc Kleine-Budde	2e12d79f56	Merge patch series "can: xilinx_can: Add support for reset" Michal Simek <michal.simek@amd.com> says: The IP core has an optional reset line that can be wired up, so add support for an optional reset. Changes in v2: - Add Conor's ACK - Fix use-after-free in xcan_remove reported by Marc. - Link to v1: https://lore.kernel.org/all/cover.1689084227.git.michal.simek@amd.com Link: https://lore.kernel.org/all/cover.1689164442.git.michal.simek@amd.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-07-19 08:57:05 +02:00
Rob Herring	22d8e8d633	can: Explicitly include correct DT includes The DT of_device.h and of_platform.h date back to the separate of_platform_bus_type before it was merged into the regular platform bus. As part of that merge prepping Arm DT support 13 years ago, they "temporarily" include each other. They also include platform_device.h and of.h. As a result, there's a pretty much random mix of those include files used throughout the tree. In order to detangle these headers and replace the implicit includes with struct declarations, users need to explicitly include the correct includes. Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/all/20230714174757.4060748-1-robh@kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-07-19 08:55:28 +02:00
Srinivas Neeli	25000fc785	can: xilinx_can: Add support for controller reset Add support for an optional reset for the CAN controller using the reset driver. If the CAN node contains the "resets" property, then this logic will perform a CAN controller reset. Signed-off-by: Srinivas Neeli <srinivas.neeli@amd.com> Signed-off-by: Michal Simek <michal.simek@amd.com> Link: https://lore.kernel.org/all/ab7e6503aa3343e39ead03c1797e765be6c50de2.1689164442.git.michal.simek@amd.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-07-19 08:55:28 +02:00
Michal Simek	62bd0232d7	dt-bindings: can: xilinx_can: Add reset description The IP core has an input for a reset signal that can be connected, so describe the optional reset property. Signed-off-by: Michal Simek <michal.simek@amd.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/all/bfaed896cc51af02fe5f290675313ab4dcab0d33.1689164442.git.michal.simek@amd.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2023-07-19 08:55:28 +02:00
Jakub Kicinski	7f5acea727	Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-07-17 (iavf) This series contains updates to the iavf driver only. Ding Hui fixes a use-after-free issue by calling netif_napi_del() for all allocated q_vectors. He also resolves an out-of-bounds issue by not updating to new values when a timeout is encountered. Marcin and Ahmed change the way resets are handled so that the callback operating under the RTNL lock will wait for the reset to finish, and the rtnl_lock-sensitive functions in the reset flow will schedule the netdev update for later in order to remove the circular dependency with the critical lock.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  iavf: fix reset task race with iavf_remove()
  iavf: fix a deadlock caused by rtnl and driver's lock circular dependencies
  Revert "iavf: Do not restart Tx queues after reset task failure"
  Revert "iavf: Detach device during reset task"
  iavf: Wait for reset in callbacks which trigger it
  iavf: use internal state to free traffic IRQs
  iavf: Fix out-of-bounds when setting channels on remove
  iavf: Fix use-after-free in free_netdev
==================== Link: https://lore.kernel.org/r/20230717175205.3217774-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 19:49:08 -07:00
Jakub Kicinski	e9b2bd96af	Merge branch 'tcp-annotate-data-races-in-tcp_rsk-req' Eric Dumazet says: ==================== tcp: annotate data-races in tcp_rsk(req) Small series addressing two syzbot reports around tcp_rsk(req) ==================== Link: https://lore.kernel.org/r/20230717144445.653164-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 19:45:30 -07:00
Eric Dumazet	eba20811f3	tcp: annotate data-races around tcp_rsk(req)->ts_recent TCP request sockets are lockless; tcp_rsk(req)->ts_recent can change while being read by another CPU, as syzbot noticed. This is harmless, but we should annotate the known races. Note that tcp_check_req() changes req->ts_recent a bit early; we might change this in the future.

BUG: KCSAN: data-race in tcp_check_req / tcp_check_req

write to 0xffff88813c8afb84 of 4 bytes by interrupt on cpu 1:
 tcp_check_req+0x694/0xc70 net/ipv4/tcp_minisocks.c:762
 tcp_v4_rcv+0x12db/0x1b70 net/ipv4/tcp_ipv4.c:2071
 ip_protocol_deliver_rcu+0x356/0x6d0 net/ipv4/ip_input.c:205
 ip_local_deliver_finish+0x13c/0x1a0 net/ipv4/ip_input.c:233
 NF_HOOK include/linux/netfilter.h:303 [inline]
 ip_local_deliver+0xec/0x1c0 net/ipv4/ip_input.c:254
 dst_input include/net/dst.h:468 [inline]
 ip_rcv_finish net/ipv4/ip_input.c:449 [inline]
 NF_HOOK include/linux/netfilter.h:303 [inline]
 ip_rcv+0x197/0x270 net/ipv4/ip_input.c:569
 __netif_receive_skb_one_core net/core/dev.c:5493 [inline]
 __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5607
 process_backlog+0x21f/0x380 net/core/dev.c:5935
 __napi_poll+0x60/0x3b0 net/core/dev.c:6498
 napi_poll net/core/dev.c:6565 [inline]
 net_rx_action+0x32b/0x750 net/core/dev.c:6698
 __do_softirq+0xc1/0x265 kernel/softirq.c:571
 do_softirq+0x7e/0xb0 kernel/softirq.c:472
 __local_bh_enable_ip+0x64/0x70 kernel/softirq.c:396
 local_bh_enable+0x1f/0x20 include/linux/bottom_half.h:33
 rcu_read_unlock_bh include/linux/rcupdate.h:843 [inline]
 __dev_queue_xmit+0xabb/0x1d10 net/core/dev.c:4271
 dev_queue_xmit include/linux/netdevice.h:3088 [inline]
 neigh_hh_output include/net/neighbour.h:528 [inline]
 neigh_output include/net/neighbour.h:542 [inline]
 ip_finish_output2+0x700/0x840 net/ipv4/ip_output.c:229
 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:317
 NF_HOOK_COND include/linux/netfilter.h:292 [inline]
 ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:431
 dst_output include/net/dst.h:458 [inline]
 ip_local_out net/ipv4/ip_output.c:126 [inline]
 __ip_queue_xmit+0xa4d/0xa70 net/ipv4/ip_output.c:533
 ip_queue_xmit+0x38/0x40 net/ipv4/ip_output.c:547
 __tcp_transmit_skb+0x1194/0x16e0 net/ipv4/tcp_output.c:1399
 tcp_transmit_skb net/ipv4/tcp_output.c:1417 [inline]
 tcp_write_xmit+0x13ff/0x2fd0 net/ipv4/tcp_output.c:2693
 __tcp_push_pending_frames+0x6a/0x1a0 net/ipv4/tcp_output.c:2877
 tcp_push_pending_frames include/net/tcp.h:1952 [inline]
 __tcp_sock_set_cork net/ipv4/tcp.c:3336 [inline]
 tcp_sock_set_cork+0xe8/0x100 net/ipv4/tcp.c:3343
 rds_tcp_xmit_path_complete+0x3b/0x40 net/rds/tcp_send.c:52
 rds_send_xmit+0xf8d/0x1420 net/rds/send.c:422
 rds_send_worker+0x42/0x1d0 net/rds/threads.c:200
 process_one_work+0x3e6/0x750 kernel/workqueue.c:2408
 worker_thread+0x5f2/0xa10 kernel/workqueue.c:2555
 kthread+0x1d7/0x210 kernel/kthread.c:379
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308

read to 0xffff88813c8afb84 of 4 bytes by interrupt on cpu 0:
 tcp_check_req+0x32a/0xc70 net/ipv4/tcp_minisocks.c:622
 tcp_v4_rcv+0x12db/0x1b70 net/ipv4/tcp_ipv4.c:2071
 ip_protocol_deliver_rcu+0x356/0x6d0 net/ipv4/ip_input.c:205
 ip_local_deliver_finish+0x13c/0x1a0 net/ipv4/ip_input.c:233
 NF_HOOK include/linux/netfilter.h:303 [inline]
 ip_local_deliver+0xec/0x1c0 net/ipv4/ip_input.c:254
 dst_input include/net/dst.h:468 [inline]
 ip_rcv_finish net/ipv4/ip_input.c:449 [inline]
 NF_HOOK include/linux/netfilter.h:303 [inline]
 ip_rcv+0x197/0x270 net/ipv4/ip_input.c:569
 __netif_receive_skb_one_core net/core/dev.c:5493 [inline]
 __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5607
 process_backlog+0x21f/0x380 net/core/dev.c:5935
 __napi_poll+0x60/0x3b0 net/core/dev.c:6498
 napi_poll net/core/dev.c:6565 [inline]
 net_rx_action+0x32b/0x750 net/core/dev.c:6698
 __do_softirq+0xc1/0x265 kernel/softirq.c:571
 run_ksoftirqd+0x17/0x20 kernel/softirq.c:939
 smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164
 kthread+0x1d7/0x210 kernel/kthread.c:379
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308

value changed: 0x1cd237f1 -> 0x1cd237f2
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230717144445.653164-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 19:45:27 -07:00
Eric Dumazet	5e5265522a	tcp: annotate data-races around tcp_rsk(req)->txhash TCP request sockets are lockless; some of their fields can change while being read by another CPU, as syzbot noticed. This is usually harmless, but we should annotate the known races. This patch takes care of tcp_rsk(req)->txhash; a separate one is needed for tcp_rsk(req)->ts_recent.

BUG: KCSAN: data-race in tcp_make_synack / tcp_rtx_synack

write to 0xffff8881362304bc of 4 bytes by task 32083 on cpu 1:
 tcp_rtx_synack+0x9d/0x2a0 net/ipv4/tcp_output.c:4213
 inet_rtx_syn_ack+0x38/0x80 net/ipv4/inet_connection_sock.c:880
 tcp_check_req+0x379/0xc70 net/ipv4/tcp_minisocks.c:665
 tcp_v6_rcv+0x125b/0x1b20 net/ipv6/tcp_ipv6.c:1673
 ip6_protocol_deliver_rcu+0x92f/0xf30 net/ipv6/ip6_input.c:437
 ip6_input_finish net/ipv6/ip6_input.c:482 [inline]
 NF_HOOK include/linux/netfilter.h:303 [inline]
 ip6_input+0xbd/0x1b0 net/ipv6/ip6_input.c:491
 dst_input include/net/dst.h:468 [inline]
 ip6_rcv_finish+0x1e2/0x2e0 net/ipv6/ip6_input.c:79
 NF_HOOK include/linux/netfilter.h:303 [inline]
 ipv6_rcv+0x74/0x150 net/ipv6/ip6_input.c:309
 __netif_receive_skb_one_core net/core/dev.c:5452 [inline]
 __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5566
 netif_receive_skb_internal net/core/dev.c:5652 [inline]
 netif_receive_skb+0x4a/0x310 net/core/dev.c:5711
 tun_rx_batched+0x3bf/0x400
 tun_get_user+0x1d24/0x22b0 drivers/net/tun.c:1997
 tun_chr_write_iter+0x18e/0x240 drivers/net/tun.c:2043
 call_write_iter include/linux/fs.h:1871 [inline]
 new_sync_write fs/read_write.c:491 [inline]
 vfs_write+0x4ab/0x7d0 fs/read_write.c:584
 ksys_write+0xeb/0x1a0 fs/read_write.c:637
 __do_sys_write fs/read_write.c:649 [inline]
 __se_sys_write fs/read_write.c:646 [inline]
 __x64_sys_write+0x42/0x50 fs/read_write.c:646
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

read to 0xffff8881362304bc of 4 bytes by task 32078 on cpu 0:
 tcp_make_synack+0x367/0xb40 net/ipv4/tcp_output.c:3663
 tcp_v6_send_synack+0x72/0x420 net/ipv6/tcp_ipv6.c:544
 tcp_conn_request+0x11a8/0x1560 net/ipv4/tcp_input.c:7059
 tcp_v6_conn_request+0x13f/0x180 net/ipv6/tcp_ipv6.c:1175
 tcp_rcv_state_process+0x156/0x1de0 net/ipv4/tcp_input.c:6494
 tcp_v6_do_rcv+0x98a/0xb70 net/ipv6/tcp_ipv6.c:1509
 tcp_v6_rcv+0x17b8/0x1b20 net/ipv6/tcp_ipv6.c:1735
 ip6_protocol_deliver_rcu+0x92f/0xf30 net/ipv6/ip6_input.c:437
 ip6_input_finish net/ipv6/ip6_input.c:482 [inline]
 NF_HOOK include/linux/netfilter.h:303 [inline]
 ip6_input+0xbd/0x1b0 net/ipv6/ip6_input.c:491
 dst_input include/net/dst.h:468 [inline]
 ip6_rcv_finish+0x1e2/0x2e0 net/ipv6/ip6_input.c:79
 NF_HOOK include/linux/netfilter.h:303 [inline]
 ipv6_rcv+0x74/0x150 net/ipv6/ip6_input.c:309
 __netif_receive_skb_one_core net/core/dev.c:5452 [inline]
 __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5566
 netif_receive_skb_internal net/core/dev.c:5652 [inline]
 netif_receive_skb+0x4a/0x310 net/core/dev.c:5711
 tun_rx_batched+0x3bf/0x400
 tun_get_user+0x1d24/0x22b0 drivers/net/tun.c:1997
 tun_chr_write_iter+0x18e/0x240 drivers/net/tun.c:2043
 call_write_iter include/linux/fs.h:1871 [inline]
 new_sync_write fs/read_write.c:491 [inline]
 vfs_write+0x4ab/0x7d0 fs/read_write.c:584
 ksys_write+0xeb/0x1a0 fs/read_write.c:637
 __do_sys_write fs/read_write.c:649 [inline]
 __se_sys_write fs/read_write.c:646 [inline]
 __x64_sys_write+0x42/0x50 fs/read_write.c:646
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0x91d25731 -> 0xe79325cd

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 32078 Comm: syz-executor.4 Not tainted 6.5.0-rc1-syzkaller-00033-geb26cbb1a754 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023

Fixes: 58d607d3e52f ("tcp: provide skb->hash to synack packets") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230717144445.653164-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 19:45:27 -07:00
Subbaraya Sundeep	e7002b3b20	octeontx2-pf: mcs: Generate hash key using ecb(aes) Hardware generated encryption and ICV tags are found to be wrong when tested with IEEE MACSEC test vectors. This is because, as per the HRM, the hash key (derived by AES-ECB block encryption of an all-zeros block with the SAK) has to be programmed by software in the MCSX_RS_MCS_CPM_TX_SLAVE_SA_PLCY_MEM_4X register. Hence, fix this by generating the hash key in software and programming it into the hardware. Fixes: c54ffc73601c ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://lore.kernel.org/r/1689574603-28093-1-git-send-email-sbhatta@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 19:21:56 -07:00
Jakub Kicinski	3223eeaf05	Merge branch 'remove-unnecessary-void-conversions' Wu Yunchuan says: ==================== Remove unnecessary (void) conversions Remove (void) conversions under "drivers/net" directory. PATCH v2 link: https://lore.kernel.org/all/20230710063828.172593-1-suhui@nfschina.com/ PATCH v1 link: https://lore.kernel.org/all/20230628024121.1439149-1-yunchuan@nfschina.com/ ==================== Link: https://lore.kernel.org/r/20230717030937.53818-1-yunchuan@nfschina.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 19:01:07 -07:00
Wu Yunchuan	1d5123efdb	net: bna: Remove unnecessary (void) conversions No need to cast (void) to (struct bnad_tx_info *) or (struct bnad_rx_info *). Signed-off-by: Wu Yunchuan <yunchuan@nfschina.com> Link: https://lore.kernel.org/r/20230717031229.55169-1-yunchuan@nfschina.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 19:01:07 -07:00
Wu Yunchuan	9235e3bcc6	can: ems_pci: Remove unnecessary (void) conversions No need to cast (void) to (struct ems_pci_card *). Signed-off-by: Wu Yunchuan <yunchuan@nfschina.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/r/20230717031221.55073-1-yunchuan@nfschina.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 19:01:02 -07:00
Wu Yunchuan	04115debed	net: mdio: Remove unnecessary (void) conversions No need to cast (void) to (struct xgene_mdio_pdata *). Signed-off-by: Wu Yunchuan <yunchuan@nfschina.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230717031212.54991-1-yunchuan@nfschina.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 19:00:47 -07:00
Wu Yunchuan	099090c6ef	ethernet: smsc: remove unnecessary (void) conversions No need to cast (void) to (struct smsc911x_data *) or (struct smsc9420_pdata *). Signed-off-by: Wu Yunchuan <yunchuan@nfschina.com> Link: https://lore.kernel.org/r/20230717031204.54912-1-yunchuan@nfschina.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 19:00:47 -07:00
Wu Yunchuan	c59cc2679a	ice: remove unnecessary (void) conversions No need to cast (void) to (struct ice_ring_container *). Signed-off-by: Wu Yunchuan <yunchuan@nfschina.com> Link: https://lore.kernel.org/r/20230717031154.54740-1-yunchuan@nfschina.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 19:00:47 -07:00
Wu Yunchuan	406eb9cf6f	net: hns: Remove unnecessary (void) conversions No need to cast (void) to (struct hns_mdio_device *). Signed-off-by: Wu Yunchuan <yunchuan@nfschina.com> Reviewed-by: Hao Lan <lanhao@huawei.com> Link: https://lore.kernel.org/r/20230717031137.54639-1-yunchuan@nfschina.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 19:00:47 -07:00
Wu Yunchuan	14fbcad00f	net: hns3: remove unnecessary (void) conversions. No need to cast (void) to (struct hns3_nic_priv *). Signed-off-by: Wu Yunchuan <yunchuan@nfschina.com> Reviewed-by: Hao Lan <lanhao@huawei.com> Link: https://lore.kernel.org/r/20230717031128.54557-1-yunchuan@nfschina.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 19:00:47 -07:00
Wu Yunchuan	89c04d6c49	net: ppp: Remove unnecessary (void) conversions No need to cast (void) to (struct sock *). Signed-off-by: Wu Yunchuan <yunchuan@nfschina.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Link: https://lore.kernel.org/r/20230717031115.54432-1-yunchuan@nfschina.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 19:00:47 -07:00
Wu Yunchuan	f15fbe46f5	net: atlantic: Remove unnecessary (void) conversions No need to cast (void) to (struct hw_atl2_priv *). Signed-off-by: Wu Yunchuan <yunchuan@nfschina.com> Link: https://lore.kernel.org/r/20230717031055.54266-1-yunchuan@nfschina.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 19:00:47 -07:00
Florian Kauer	78adb4bcf9	igc: Prevent garbled TX queue with XDP ZEROCOPY In normal operation, each populated queue item has next_to_watch pointing to the last TX desc of the packet, while each cleaned item has it set to 0. In particular, next_to_use that points to the next (necessarily clean) item to use has next_to_watch set to 0. When the TX queue is used both by an application using AF_XDP with ZEROCOPY as well as a second non-XDP application generating high traffic, the queue pointers can get in an invalid state where next_to_use points to an item where next_to_watch is NOT set to 0. However, the implementation assumes at several places that this is never the case, so if it does hold, bad things happen. In particular, within the loop inside of igc_clean_tx_irq(), next_to_clean can overtake next_to_use. Finally, this prevents any further transmission via this queue and it never gets unblocked or signaled. Secondly, if the queue is in this garbled state, the inner loop of igc_clean_tx_ring() will never terminate, completely hogging a CPU core. The reason is that igc_xdp_xmit_zc() reads next_to_use before acquiring the lock, and writes it back (potentially unmodified) later. If it got modified before locking, the outdated next_to_use is written back, pointing to an item that was already used elsewhere (and thus next_to_watch got written). Fixes: 9acf59a752d4 ("igc: Enable TX via AF_XDP zero-copy") Signed-off-by: Florian Kauer <florian.kauer@linutronix.de> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230717175444.3217831-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 18:43:39 -07:00
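A simplified Python model of the race described in the igc commit, assuming a single shared next_to_use index protected by a lock. The names TxRing, other_producer, xmit_stale and xmit_fixed are hypothetical and do not mirror the igc driver code; the sketch only shows why reading the index before taking the lock lets a concurrent producer's update be overwritten, moving next_to_use back onto slots that are already in use.

```python
import threading


class TxRing:
    """Toy TX ring (hypothetical): only the producer index is modelled."""

    def __init__(self, size: int):
        self.size = size
        self.next_to_use = 0
        self.lock = threading.Lock()


def other_producer(ring: TxRing, nframes: int) -> None:
    # A second sender that correctly reads and updates the index under the lock.
    with ring.lock:
        ring.next_to_use = (ring.next_to_use + nframes) % ring.size


def xmit_stale(ring: TxRing, nframes: int, interleave=None) -> None:
    ntu = ring.next_to_use  # BUG: snapshot taken before acquiring the lock
    if interleave is not None:
        interleave()  # another producer runs between the read and the lock
    with ring.lock:
        # Writes back a value derived from the stale snapshot
        # (unmodified when nframes == 0).
        ring.next_to_use = (ntu + nframes) % ring.size


def xmit_fixed(ring: TxRing, nframes: int, interleave=None) -> None:
    if interleave is not None:
        interleave()  # same interleaving point as in xmit_stale
    with ring.lock:
        ntu = ring.next_to_use  # read and write-back both happen under the lock
        ring.next_to_use = (ntu + nframes) % ring.size


bad = TxRing(16)
xmit_stale(bad, 2, lambda: other_producer(bad, 3))
print(bad.next_to_use)   # 2: moved back onto slots the other producer used

good = TxRing(16)
xmit_fixed(good, 2, lambda: other_producer(good, 3))
print(good.next_to_use)  # 5
```

With the read moved inside the critical section, no other producer can change next_to_use between the read and the write-back, so the update can no longer be lost.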
Eric Dumazet	dfa2f04833	tcp: get rid of sysctl_tcp_adv_win_scale With modern NIC drivers shifting to full page allocations per received frame, we face the following issue: TCP has one per-netns sysctl used to tweak how to translate a memory use into an expected payload (RWIN), in RX path. The tcp_win_from_space() implementation is limited to a few cases. For hosts dealing with various MSS, we either underestimate or overestimate the RWIN we send to the remote peers. For instance, with the default sysctl_tcp_adv_win_scale value, we expect to store 50% of payload per allocated chunk of memory. For the typical use of MTU=1500 traffic, and order-0 pages allocations by NIC drivers, we are sending too big RWIN, leading to potential tcp collapse operations, which are extremely expensive and source of latency spikes. This patch makes sysctl_tcp_adv_win_scale obsolete, and instead uses a per socket scaling factor, so that we can precisely adjust the RWIN based on effective skb->len/skb->truesize ratio. This patch alone can double TCP receive performance when receivers are too slow to drain their receive queue, or by allowing a bigger RWIN when MSS is close to PAGE_SIZE. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Link: https://lore.kernel.org/r/20230717152917.751987-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 18:41:18 -07:00
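To make the difference in the tcp_win_from_space() commit concrete, here is a Python sketch of both translations from buffer space to advertised window. The legacy function follows the sysctl_tcp_adv_win_scale semantics (the default of 1 keeps 50% of the space, as the message says). The per-socket version assumes the skb->len/skb->truesize ratio is kept in 8-bit fixed point; the constant name RATIO_SHIFT and the 1448/4096 byte sizes are illustrative assumptions, not values taken from the patch.

```python
RATIO_SHIFT = 8  # assumed fixed-point precision of the per-socket ratio


def win_from_space_legacy(space: int, adv_win_scale: int = 1) -> int:
    """Pre-patch style: a fixed fraction of the memory, chosen by a sysctl."""
    if adv_win_scale <= 0:
        return space >> (-adv_win_scale)
    return space - (space >> adv_win_scale)


def scaling_ratio(skb_len: int, skb_truesize: int) -> int:
    """Payload/memory ratio of a received skb in fixed point, never zero."""
    val = (skb_len << RATIO_SHIFT) // skb_truesize
    return val if val else 1


def win_from_space_scaled(space: int, ratio: int) -> int:
    """Per-socket style: scale the space by the measured ratio."""
    return (space * ratio) >> RATIO_SHIFT


space = 1_000_000
ratio = scaling_ratio(1448, 4096)  # hypothetical: 1448 payload bytes per page
print(win_from_space_legacy(space))         # 500000
print(win_from_space_scaled(space, ratio))  # 351562
```

With these hypothetical sizes only about 35% of the allocated memory holds payload, so the fixed 50% estimate advertises a noticeably larger RWIN than the receive buffer can actually back, which is the overestimate the commit describes.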
Jakub Kicinski	936fd2c50b	linux-can-fixes-for-6.5-20230717 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEDs2BvajyNKlf9TJQvlAcSiqKBOgFAmS1gXMTHG1rbEBwZW5n dXRyb25peC5kZQAKCRC+UBxKKooE6LzRB/0bvAxT3Xm63DzSKtSwv4l0RimpzqCN T105KT82Y24/AUfVJfEHVUETSeOOR6xNTHYzijJq8kF+/9bSYEkfkVcq/gN5QSdd 7hdRFOQbTwfRUGHF5E6XFua/NcbjIrY9vZtttiI9scC7jLO8vz5wtdNFYUMSY1Bh 3hE079AHCtDxHX2wIM5dQN5P847QmJAykaErxRftJ+0dWpHbon4WyfT+6MAVCN5s 3VSwJ3p1fhKl5h3JmRYqSXeaJ6wV1vfGawj4HoyfENPWlMmTOuQ7uobrPuNrzE92 f1S21O+woe0eaypOkaJLeYg1X7ifDqyccT8sSNwSzys/9Ay6xmxQ9NNl =nSGE -----END PGP SIGNATURE----- Merge tag 'linux-can-fixes-for-6.5-20230717' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2023-07-17 The 1st patch is by Ziyang Xuan and fixes a possible memory leak in the receiver handling in the CAN RAW protocol. YueHaibing contributes a fix for a use after free in bcm_proc_show() of the Broad Cast Manager (BCM) CAN protocol. The next 2 patches are by me and fix a possible null pointer dereference in the RX path of the gs_usb driver with activated hardware timestamps and the candlelight firmware. The last patch is by Fedor Ross, Marek Vasut and me and targets the mcp251xfd driver. The polling timeout of __mcp251xfd_chip_set_mode() is increased to fix bus joining on busy CAN buses and very low bit rate. * tag 'linux-can-fixes-for-6.5-20230717' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: mcp251xfd: __mcp251xfd_chip_set_mode(): increase poll timeout can: gs_usb: fix time stamp counter initialization can: gs_usb: gs_can_open(): improve error handling can: bcm: Fix UAF in bcm_proc_show() can: raw: fix receiver memory leak ==================== Link: https://lore.kernel.org/r/20230717180938.230816-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 18:33:34 -07:00
John Fastabend	195e903b34	mailmap: Add entry for old intel email Fix old email to avoid bouncing email from net/drivers and older netdev work. Anyways my @intel email hasn't been active for years. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20230717173306.38407-1-john.fastabend@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 18:27:59 -07:00
Jakub Kicinski	63c8778d91	Merge branch 'net-mana-fix-doorbell-access-for-receive-queues' Long Li says: ==================== net: mana: Fix doorbell access for receive queues This patchset fixes the issues discovered during 200G physical link tests. It fixes doorbell usage and WQE format for receive queues. ==================== Link: https://lore.kernel.org/r/1689622539-5334-1-git-send-email-longli@linuxonhyperv.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 18:00:21 -07:00
Long Li	f5e39b5712	net: mana: Use the correct WQE count for ringing RQ doorbell The hardware specification specifies that WQE_COUNT should be set to 0 for the Receive Queue. Although the hardware currently doesn't enforce the check, future releases may check this value. Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Long Li <longli@microsoft.com> Link: https://lore.kernel.org/r/1689622539-5334-3-git-send-email-longli@linuxonhyperv.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 18:00:18 -07:00
Long Li	da4e864807	net: mana: Batch ringing RX queue doorbell on receiving packets It's inefficient to ring the doorbell page every time a WQE is posted to the receive queue. Excessive MMIO writes result in the CPU spending more time waiting on LOCK instructions (atomic operations), resulting in poor scaling performance. Move the code that rings the doorbell page so that it runs after all WQEs have been posted to the receive queue during a callback from napi_poll(). With this change, tests showed an improvement from 120G/s to 160G/s on a 200G physical link, with 16 or 32 hardware queues. Tests showed no regression in network latency benchmarks on a single connection. Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Long Li <longli@microsoft.com> Link: https://lore.kernel.org/r/1689622539-5334-2-git-send-email-longli@linuxonhyperv.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 18:00:13 -07:00
Shannon Nelson	d1998e505a	mailmap: add entries for past lives Update old emails for my current work email. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20230717193242.43670-1-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 17:57:43 -07:00
Minjie Du	f8e343326c	net: mvpp2: debugfs: remove redundant parameter check in three functions As per the comment above debugfs_create_dir(), it is not expected to return an error, so an extra error check is not needed. Drop the return check of debugfs_create_dir() in mvpp2_dbgfs_c2_entry_init(), mvpp2_dbgfs_flow_tbl_entry_init() and mvpp2_dbgfs_cls_init(). Signed-off-by: Minjie Du <duminjie@vivo.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20230717025538.2848-1-duminjie@vivo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 17:53:11 -07:00
Jiawen Wu	9843814fc6	net: txgbe: change LAN reset mode The old way to do LAN reset is sending a reset command to firmware. Once firmware performs the reset, it reconfigures what it needs. In the new firmware versions, a veto bit is introduced for NCSI/LLDP to block the PHY domain in LAN reset. At this point, writing the LAN reset register directly has the same effect as the old way. And it does not reset the MNG domain, so the veto bit does not change. Since the veto bit was never used, the old firmware is compatible with the driver before and after this change. The new firmware must be used with the driver after this change to implement the new feature; otherwise it behaves the same as the old firmware. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230717021333.94181-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 17:45:09 -07:00
Mahmoud Maatuq	3645c71b58	selftests/net: replace manual array size calc with ARRAYSIZE macro. fixes coccinelle WARNING: Use ARRAY_SIZE Signed-off-by: Mahmoud Maatuq <mahmoudmatook.mm@gmail.com> Link: https://lore.kernel.org/r/20230716184349.2124858-1-mahmoudmatook.mm@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 17:43:51 -07:00
Geliang Tang	8daf847714	bpf: Drop useless btf_vmlinux in bpf_tcp_ca The code using btf_vmlinux in bpf_tcp_ca is removed by the commit 9f0265e921de ("bpf: Require only one of cong_avoid() and cong_control() from a TCP CC") so drop this useless btf_vmlinux declaration. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Link: https://lore.kernel.org/r/4d38da4eadaba476bd92ffcd7a5a03a5e28745c0.1689582557.git.geliang.tang@suse.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-18 17:31:10 -07:00
Anh Tuan Phan	89dc4037dd	samples/bpf: README: Update build dependencies required Update samples/bpf/README.rst to add pahole to the build dependencies list. Add the reference to "Documentation/process/changes.rst" for minimum version required so that the version required will not be outdated in the future. Signed-off-by: Anh Tuan Phan <tuananhlfc@gmail.com> Link: https://lore.kernel.org/r/aecaf7a2-9100-cd5b-5cf4-91e5dbb2c90d@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-18 17:29:43 -07:00
Alexei Starovoitov	4b3ccca5c8	Merge branch 'bpf-refcount-followups-2-owner-field' Dave Marchevsky says: ==================== BPF Refcount followups 2: owner field This series adds an 'owner' field to bpf_{list,rb}_node structs, to be used by the runtime to determine whether insertion or removal operations are valid in shared ownership scenarios. Both the races which the series fixes and the fix itself are inspired by Kumar's suggestions in [0]. Aside from insertion and removal having more reasons to fail, there are no user-facing changes as a result of this series. * Patch 1 reverts disabling of bpf_refcount_acquire so that the fixed logic can be exercised by CI. It should _not_ be applied. * Patch 2 adds internal definitions of bpf_{rb,list}_node so that their fields are easier to access. * Patch 3 is the meat of the series - it adds 'owner' field and enforcement of correct owner to insertion and removal helpers. * Patch 4 adds a test based on Kumar's examples. * Patch 5 disables the test until bpf_refcount_acquire is re-enabled. * Patch 6 reverts disabling of the test added in this series so that the logic can be exercised by CI. It should _not_ be applied. [0]: https://lore.kernel.org/bpf/d7hyspcow5wtjcmw4fugdgyp3fwhljwuscp3xyut5qnwivyeru@ysdq543otzv2/ Changelog: v1 -> v2: lore.kernel.org/bpf/20230711175945.3298231-1-davemarchevsky@fb.com/ Patch 2 ("Introduce internal definitions for UAPI-opaque bpf_{rb,list}_node") * Rename bpf_{rb,list}_node_internal -> bpf_{list,rb}_node_kern (Alexei) Patch 3 ("bpf: Add 'owner' field to bpf_{list,rb}_node") * WARN_ON_ONCE in __bpf_list_del when node has wrong owner. This shouldn't happen, but worth checking regardless (Alexei, offline convo) * Continue previous patch's renaming changes ==================== Link: https://lore.kernel.org/r/20230718083813.3416104-1-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-18 17:23:10 -07:00
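The ownership rule that the BPF refcount series enforces in its insertion and removal helpers can be modelled roughly as follows. This is a single-threaded Python sketch with hypothetical Node and OwnedList types; the real bpf_{list,rb}_node_kern helpers run under BPF spin locks and handle races this toy ignores. Insertion is refused when a node already has an owner, and removal is refused unless the collection doing the removal is the recorded owner.

```python
class Node:
    """Hypothetical stand-in for a list/rbtree node with an owner field."""

    def __init__(self, value):
        self.value = value
        self.owner = None  # collection currently holding this node, if any


class OwnedList:
    """Hypothetical collection that checks node ownership on every operation."""

    def __init__(self):
        self._items = []

    def push(self, node: Node) -> bool:
        # Refuse insertion if the node is already owned by any collection.
        if node.owner is not None:
            return False
        node.owner = self
        self._items.append(node)
        return True

    def remove(self, node: Node) -> bool:
        # Refuse removal unless this collection is the recorded owner.
        if node.owner is not self:
            return False
        self._items.remove(node)
        node.owner = None
        return True

    def __len__(self) -> int:
        return len(self._items)


a, b = OwnedList(), OwnedList()
n = Node(1)
print(a.push(n))    # True
print(b.push(n))    # False: already owned by a
print(b.remove(n))  # False: b is not the owner
print(a.remove(n))  # True
print(b.push(n))    # True: ownership was released
```

Recording the owner on the node itself makes "is this node in that collection?" a constant-time check, which is what lets both operations simply fail instead of corrupting a structure that another reference has already modified.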