linux

iv/linux

Author	SHA1	Message	Date
Petr Machata	d47230a348	net: bridge: Add a tracepoint for MDB overflows The following patch will add two more maximum MDB allowances to the global one, mcast_hash_max, that exists today. In all these cases, attempts to add MDB entries above the configured maximums through netlink, fail noisily and obviously. Such visibility is missing when adding entries through the control plane traffic, by IGMP or MLD packets. To improve visibility in those cases, add a trace point that reports the violation, including the relevant netdevice (be it a slave or the bridge itself), and the MDB entry parameters: # perf record -e bridge:br_mdb_full & # [...] # perf script \| cut -d: -f4- dev v2 af 2 src ::ffff:0.0.0.0 grp ::ffff:239.1.1.112/00:00:00:00:00:00 vid 0 dev v2 af 10 src :: grp ff0e::112/00:00:00:00:00:00 vid 0 dev v2 af 2 src ::ffff:0.0.0.0 grp ::ffff:239.1.1.112/00:00:00:00:00:00 vid 10 dev v2 af 10 src 2001:db8:1::1 grp ff0e::1/00:00:00:00:00:00 vid 10 dev v2 af 2 src ::ffff:192.0.2.1 grp ::ffff:239.1.1.1/00:00:00:00:00:00 vid 10 CC: Steven Rostedt <rostedt@goodmis.org> CC: linux-trace-kernel@vger.kernel.org Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 08:48:25 +00:00
Petr Machata	eceb30854f	net: bridge: Change a cleanup in br_multicast_new_port_group() to goto This function is getting more to clean up in the following patches. Structuring the cleanups in one labeled block will allow reusing the same cleanup from several places. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 08:48:25 +00:00
Petr Machata	976b3858dd	net: bridge: Add br_multicast_del_port_group() Since cleaning up the effects of br_multicast_new_port_group() just consists of delisting and freeing the memory, the function br_mdb_add_group_star_g() inlines the corresponding code. In the following patches, number of per-port and per-port-VLAN MDB entries is going to be maintained, and that counter will have to be updated. Because that logic is going to be hidden in the br_multicast module, introduce a new hook intended to again remove a newly-created group. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 08:48:25 +00:00
Petr Machata	1c85b80b20	net: bridge: Move extack-setting to br_multicast_new_port_group() Now that br_multicast_new_port_group() takes an extack argument, move setting the extack there. The downside is that the error messages end up being less specific (the function cannot distinguish between (S,G) and (*,G) groups). However, the alternative is to check in the caller whether the callee set the extack, and if it didn't, set it. But that is only done when the callee is not exactly known. (E.g. in case of a notifier invocation.) Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 08:48:25 +00:00
Petr Machata	60977a0c63	net: bridge: Add extack to br_multicast_new_port_group() Make it possible to set an extack in br_multicast_new_port_group(). Eventually, this function will check for per-port and per-port-vlan MDB maximums, and will use the extack to communicate the reason for the bounce. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 08:48:25 +00:00
Petr Machata	c00041cf1c	net: bridge: Set strict_start_type at two policies Make any attributes newly-added to br_port_policy or vlan_tunnel_policy parsed strictly, to prevent userspace from passing garbage. Note that this patchset only touches the former policy. The latter was adjusted for completeness' sake. There do not appear to be other _deprecated calls with non-NULL policies. Suggested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 08:48:25 +00:00
David S. Miller	8b7018fa71	Merge branch 'sparx5-PSFP-support' Daniel Machon says: ==================== net: Add support for PSFP in Sparx5 ================================================================================ Add support for Per-Stream Filtering and Policing (802.1Q-2018, 8.6.5.1). ================================================================================ The VCAP CLM (VCAP IS0 ingress classifier) classifies streams, identified by ISDX (Ingress Service Index, frame metadata), and maps ISDX to streams. Flow meters are also classified by ISDX, and implemented using service policers (Service Dual Leacky Buckets, SDLB). Leacky buckets are linked together in a leak chain of a leak group. Leak groups a preconfigured to serve buckets within a certain rate interval. Stream gates are time-based policers used by PSFP. Frames are dropped based on the gate state (OPEN/ CLOSE), whose state will be altered based on the Gate Control List (GCL) and current PTP time. Apart from time-based policing, stream gates can alter egress queue selection for the frames that pass through the Gate. This is done through Internal Priority Selector (IPS). Stream gates are mapped from stream filters. Support for tc actions gate and police, have been added to the VCAP IS0 set of supported actions. Examples: // tc filter with gate action $ tc filter add dev eth1 ingress chain 1100000 prio 1 handle 1001 protocol \ 802.1q flower skip_sw vlan_id 100 action gate base-time 0 sched-entry open \ 700000 7 8m sched-entry close 300000 action goto chain 1200000 // tc filter with police action $ tc filter add dev eth1 ingress chain 1100000 prio 1 handle 1002 protocol \ 802.1q flower skip_sw vlan_id 100 action police rate 1gbit burst 8096 \ conform-exceed drop action goto chain 1200000 ================================================================================ Patches ================================================================================ Patch #1: Adds new register needed for PSFP. Patch #2: Adds resource pools to control PSFP needed chip resources. Patch #3: Adds support for SDLB's needed for flow-meters. Patch #4: Adds support for service policers. Patch #5: Adds support for PSFP flow-meters, using service policers. Patch #6: Adds a new function to calculate basetime, required by flow-meters. Patch #7: Adds support for PSFP stream gates. Patch #8: Adds support for PSFP stream filters. Patch #9: Adds a function to initialize flow-meters, stream gates and stream filters. Patch #10: Adds the required flower code to configure PSFP using the tc command. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 08:26:26 +00:00
Daniel Machon	6ebf182bfd	sparx5: add support for configuring PSFP via tc Add support for tc actions gate and police, in order to implement support for configuring PSFP through tc. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 08:26:26 +00:00
Daniel Machon	e116b19db2	net: microchip: sparx5: initialize PSFP Initialize the SDLB's, stream gates and stream filters. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 08:26:26 +00:00
Daniel Machon	ae3e691f34	net: microchip: sparx5: add support for PSFP stream filters Add support for configuring PSFP stream filters (IEEE 802.1Q-2018, 8.6.5.1.1). The VCAP CLM (VCAP IS0 ingress classifier) classifies streams, identified by ISDX (Ingress Service Index, frame metadata), and maps ISDX to streams. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 08:26:26 +00:00
Daniel Machon	c70a5e2c3d	net: microchip: sparx5: add support for PSFP stream gates Add support for configuring PSFP stream gates (IEEE 802.1Q-2018, 8.6.5.1.2). Stream gates are time-based policers used by PSFP. Frames are dropped based on the gate state (OPEN/ CLOSE), whose state will be altered based on the Gate Control List (GCL) and current PTP time. Apart from time-based policing, stream gates can alter egress queue selection for the frames that pass through the Gate. This is done through Internal Priority Selector (IPS). Stream gates are mapped from stream filters. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 08:26:26 +00:00
Daniel Machon	9e02131ec2	net: microchip: sparx5: add function for calculating PTP basetime Add a new function for calculating PTP basetime, required by the stream gate scheduler to calculate gate state (open / close). Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 08:26:26 +00:00
Daniel Machon	d2185e79ba	net: microchip: sparx5: add support for PSFP flow-meters Add support for configuring PSFP flow-meters (IEEE 802.1Q-2018, 8.6.5.1.3). The VCAP CLM (VCAP IS0 ingress classifier) classifies streams, identified by ISDX (Ingress Service Index, frame metadata), and maps ISDX to flow-meters. SDLB's provide the flow-meter parameters. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 08:26:25 +00:00
Daniel Machon	1db82abf19	net: microchip: sparx5: add support for service policers Add initial API for configuring policers. This patch add support for service policers. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 08:26:25 +00:00
Daniel Machon	9bf5088989	net: microchip: sparx5: add support for Service Dual Leacky Buckets Add support for Service Dual Leacky Buckets (SDLB), used to implement PSFP flow-meters. Buckets are linked together in a leak chain of a leak group. Leak groups a preconfigured to serve buckets within a certain rate interval. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 08:26:25 +00:00
Daniel Machon	bb535c0dbb	net: microchip: sparx5: add resource pools Add resource pools and accessor functions. These pools can be queried by the driver, whenever a finite resource is required. Some resources can be reused, in which case an index and a reference count is used to keep track of users. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 08:26:25 +00:00
Daniel Machon	edad83e2ba	net: microchip: add registers needed for PSFP Add registers needed for PSFP. This patch also renames a single register, shortening its name (SYS_CLK_PER_100PS). Uses have been update accordingly. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-06 08:26:25 +00:00
David S. Miller	042b7858d5	Merge branch 'net-smc-parallelism' D. Wythe says: ==================== net/smc: optimize the parallelism of SMC-R connections This patch set attempts to optimize the parallelism of SMC-R connections, mainly to reduce unnecessary blocking on locks, and to fix exceptions that occur after thoses optimization. According to Off-CPU graph, SMC worker's off-CPU as that: smc_close_passive_work (1.09%) smcr_buf_unuse (1.08%) smc_llc_flow_initiate (1.02%) smc_listen_work (48.17%) __mutex_lock.isra.11 (47.96%) An ideal SMC-R connection process should only block on the IO events of the network, but it's quite clear that the SMC-R connection now is queued on the lock most of the time. The goal of this patchset is to achieve our ideal situation where network IO events are blocked for the majority of the connection lifetime. There are three big locks here: 1. smc_client_lgr_pending & smc_server_lgr_pending 2. llc_conf_mutex 3. rmbs_lock & sndbufs_lock And an implementation issue: 1. confirm/delete rkey msg can't be sent concurrently while protocol allows indeed. Unfortunately,The above problems together affect the parallelism of SMC-R connection. If any of them are not solved. our goal cannot be achieved. After this patch set, we can get a quite ideal off-CPU graph as following: smc_close_passive_work (41.58%) smcr_buf_unuse (41.57%) smc_llc_do_delete_rkey (41.57%) smc_listen_work (39.10%) smc_clc_wait_msg (13.18%) tcp_recvmsg_locked (13.18) smc_listen_find_device (25.87%) smcr_lgr_reg_rmbs (25.87%) smc_llc_do_confirm_rkey (25.87%) We can see that most of the waiting times are waiting for network IO events. This also has a certain performance improvement on our short-lived conenction wrk/nginx benchmark test: +--------------+------+------+-------+--------+------+--------+ \|conns/qps \|c4 \| c8 \| c16 \| c32 \| c64 \| c200 \| +--------------+------+------+-------+--------+------+--------+ \|SMC-R before \|9.7k \| 10k \| 10k \| 9.9k \| 9.1k \| 8.9k \| +--------------+------+------+-------+--------+------+--------+ \|SMC-R now \|13k \| 19k \| 18k \| 16k \| 15k \| 12k \| +--------------+------+------+-------+--------+------+--------+ \|TCP \|15k \| 35k \| 51k \| 80k \| 100k \| 162k \| +--------------+------+------+-------+--------+------+--------+ The reason why the benefit is not obvious after the number of connections has increased dues to workqueue. If we try to change workqueue to UNBOUND, we can obtain at least 4-5 times performance improvement, reach up to half of TCP. However, this is not an elegant solution, the optimization of it will be much more complicated. But in any case, we will submit relevant optimization patches as soon as possible. Please note that the premise here is that the lock related problem must be solved first, otherwise, no matter how we optimize the workqueue, there won't be much improvement. Because there are a lot of related changes to the code, if you have any questions or suggestions, please let me know. Thanks D. Wythe v1 -> v2: 1. Fix panic in SMC-D scenario 2. Fix lnkc related hashfn calculation exception, caused by operator priority 3. Only wake up one connection if the lnk is not active 4. Delete obsolete unlock logic in smc_listen_work() 5. PATCH format, do Reverse Christmas tree 6. PATCH format, change all xxx_lnk_xxx function to xxx_link_xxx 7. PATCH format, add correct fix tag for the patches for fixes. 8. PATCH format, fix some spelling error 9. PATCH format, rename slow to do_slow v2 -> v3: 1. add SMC-D support, remove the concept of link cluster since SMC-D has no link at all. Replace it by lgr decision maker, who provides suggestions to SMC-D and SMC-R on whether to create new link group. 2. Fix the corruption problem described by PATCH 'fix application data exception' on SMC-D. v3 -> v4: 1. Fix panic caused by uninitialization map. v4 -> v5: 1. Make SMC-D buf creation be serial to avoid Potential error 2. Add a flag to synchronize the success of the first contact with the ready of the link group, including SMC-D and SMC-R. 3. Fixed possible reference count leak in smc_llc_flow_start(). 4. reorder the patch, make bugfix PATCH be ahead. v5 -> v6: 1. Separate the bugfix patches to make it independent. 2. Merge patch 'fix SMC_CLC_DECL_ERR_REGRMB without smc_server_lgr_pending' with patch 'remove locks smc_client_lgr_pending and smc_server_lgr_pending' 3. Format code styles, including alignment and reverse christmas tree style. 4. Fix a possible memory leak in smc_llc_rmt_delete_rkey() and smc_llc_rmt_conf_rkey(). v6 -> v7: 1. Discard patch attempting to remove global locks 2. Discard patch attempting make confirm/delete rkey process concurrently ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-04 09:48:19 +00:00
D. Wythe	aff7bfed90	net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore It's clear that rmbs_lock and sndbufs_lock are aims to protect the rmbs list or the sndbufs list. During connection establieshment, smc_buf_get_slot() will always be invoked, and it only performs read semantics in rmbs list and sndbufs list. Based on the above considerations, we replace mutex with rw_semaphore. Only smc_buf_get_slot() use down_read() to allow smc_buf_get_slot() run concurrently, other part use down_write() to keep exclusive semantics. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-04 09:48:19 +00:00
D. Wythe	4da687448d	net/smc: reduce unnecessary blocking in smcr_lgr_reg_rmbs() Unlike smc_buf_create() and smcr_buf_unuse(), smcr_lgr_reg_rmbs() is exclusive when assigned rmb_desc was not registered, although it can be executed in parallel when assigned rmb_desc was registered already and only performs read semtamics on it. Hence, we can not simply replace it with read semaphore. The idea here is that if the assigned rmb_desc was registered already, use read semaphore to protect the critical section, once the assigned rmb_desc was not registered, keep using keep write semaphore still to keep its exclusivity. Thanks to the reusable features of rmb_desc, which allows us to execute in parallel in most cases. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-04 09:48:19 +00:00
D. Wythe	f6421014e8	net/smc: use read semaphores to reduce unnecessary blocking in smc_buf_create() & smcr_buf_unuse() Following is part of Off-CPU graph during frequent SMC-R short-lived processing: process_one_work (51.19%) smc_close_passive_work (28.36%) smcr_buf_unuse (28.34%) rwsem_down_write_slowpath (28.22%) smc_listen_work (22.83%) smc_clc_wait_msg (1.84%) smc_buf_create (20.45%) smcr_buf_map_usable_links rwsem_down_write_slowpath (20.43%) smcr_lgr_reg_rmbs (0.53%) rwsem_down_write_slowpath (0.43%) smc_llc_do_confirm_rkey (0.08%) We can clearly see that during the connection establishment time, waiting time of connections is not on IO, but on llc_conf_mutex. What is more important, the core critical area (smcr_buf_unuse() & smc_buf_create()) only perfroms read semantics on links, we can easily replace it with read semaphore. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-04 09:48:19 +00:00
D. Wythe	b5dd4d6981	net/smc: llc_conf_mutex refactor, replace it with rw_semaphore llc_conf_mutex was used to protect links and link related configurations in the same link group, for example, add or delete links. However, in most cases, the protected critical area has only read semantics and with no write semantics at all, such as obtaining a usable link or an available rmb_desc. This patch do simply code refactoring, replace mutex with rw_semaphore, replace mutex_lock with down_write and replace mutex_unlock with up_write. Theoretically, this replacement is equivalent, but after this patch, we can distinguish lock granularity according to different semantics of critical areas. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-04 09:48:19 +00:00
Jakub Kicinski	88c940cccb	Merge branch 'updates-to-enetc-txq-management' Vladimir Oltean says: ==================== Updates to ENETC TXQ management The set ensures that the number of TXQs given by enetc to the network stack (mqprio or TX hashing) + the number of TXQs given to XDP never exceeds the number of available TXQs. These are the first 4 patches of series "[v5,net-next,00/17] ENETC mqprio/taprio cleanup" from here: https://patchwork.kernel.org/project/netdevbpf/cover/20230202003621.2679603-1-vladimir.oltean@nxp.com/ There is no change in this version compared to there. I split them off because this contains a fix for net-next and it would be good if it could go in quickly. I also did it to reduce the patch count of that other series, if I need to respin it again. ==================== Link: https://lore.kernel.org/r/20230203001116.3814809-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-03 20:05:59 -08:00
Vladimir Oltean	800db2d125	net: enetc: ensure we always have a minimum number of TXQs for stack Currently it can happen that an mqprio qdisc is installed with num_tc 8, and this will reserve 8 (out of 8) TXQs for the network stack. Then we can attach an XDP program, and this will crop 2 TXQs, leaving just 6 for mqprio. That's not what the user requested, and we should fail it. On the other hand, if mqprio isn't requested, we still give the 8 TXQs to the network stack (with hashing among a single traffic class), but then, cropping 2 TXQs for XDP is fine, because the user didn't explicitly ask for any number of TXQs, so no expectations are violated. Simply put, the logic that mqprio should impose a minimum number of TXQs for the network never existed. Let's say (more or less arbitrarily) that without mqprio, the driver expects a minimum number of TXQs equal to the number of CPUs (on NXP LS1028A, that is either 1, or 2). And with mqprio, mqprio gives the minimum required number of TXQs. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-03 20:05:57 -08:00
Vladimir Oltean	4ea1dd743e	net: enetc: recalculate num_real_tx_queues when XDP program attaches Since the blamed net-next commit, enetc_setup_xdp_prog() no longer goes through enetc_open(), and therefore, the function which was supposed to detect whether a BPF program exists (in order to crop some TX queues from network stack usage), enetc_num_stack_tx_queues(), no longer gets called. We can move the netif_set_real_num_rx_queues() call to enetc_alloc_msix() (probe time), since it is a runtime invariant. We can do the same thing with netif_set_real_num_tx_queues(), and let enetc_reconfigure_xdp_cb() explicitly recalculate and change the number of stack TX queues. Fixes: `c33bfaf91c` ("net: enetc: set up XDP program under enetc_reconfigure()") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-03 20:05:57 -08:00
Vladimir Oltean	46a0ecf93b	net: enetc: allow the enetc_reconfigure() callback to fail enetc_reconfigure() was modified in commit `c33bfaf91c` ("net: enetc: set up XDP program under enetc_reconfigure()") to take an optional callback that runs while the netdev is down, but this callback currently cannot fail. Code up the error handling so that the interface is restarted with the old resources if the callback fails. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-03 20:05:57 -08:00
Vladimir Oltean	1c81a9b3aa	net: enetc: simplify enetc_num_stack_tx_queues() We keep a pointer to the xdp_prog in the private netdev structure as well; what's replicated per RX ring is done so just for more convenient access from the NAPI poll procedure. Simplify enetc_num_stack_tx_queues() by looking at priv->xdp_prog rather than iterating through the information replicated per RX ring. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-03 20:05:57 -08:00
Jakub Kicinski	8788260e8f	Merge branch 'raw-add-drop-reasons-and-use-another-hash-function' Eric Dumazet says: ==================== raw: add drop reasons and use another hash function Two first patches add drop reasons to raw input processing. Last patch spreads RAW sockets in the shared hash tables to avoid long hash buckets in some cases. ==================== Link: https://lore.kernel.org/r/20230202094100.3083177-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-03 19:56:26 -08:00
Eric Dumazet	6579f5bacc	raw: use net_hash_mix() in hash function Some applications seem to rely on RAW sockets. If they use private netns, we can avoid piling all RAW sockets bound to a given protocol into a single bucket. Also place (struct raw_hashinfo).lock into its own cache line to limit false sharing. Alternative would be to have per-netns hashtables, but this seems too expensive for most netns where RAW sockets are not used. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-03 19:56:23 -08:00
Eric Dumazet	42186e6c00	ipv4: raw: add drop reasons Use existing helpers and drop reason codes for RAW input path. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-03 19:56:23 -08:00
Eric Dumazet	8d8ebd77f5	ipv6: raw: add drop reasons Use existing helpers and drop reason codes for RAW input path. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-03 19:56:23 -08:00
Jakub Kicinski	dfefcb0c04	Merge branch 'devlink-move-devlink-dev-code-to-a-separate-file' Moshe Shemesh says: ==================== devlink: Move devlink dev code to a separate file This patchset is moving code from the file leftover.c to new file dev.c. About 1.3K lines are moved by this patchset covering most of the devlink dev object callbacks and functionality: reload, eswitch, info, flash and selftest. ==================== Link: https://lore.kernel.org/r/1675349226-284034-1-git-send-email-moshe@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-03 19:25:28 -08:00
Moshe Shemesh	7c976c7cfc	devlink: Move devlink dev selftest code to dev Move devlink dev selftest callbacks and related code from leftover.c to file dev.c. No functional change in this patch. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-03 19:25:26 -08:00
Moshe Shemesh	ec4a0ce92e	devlink: Move devlink_info_req struct to be local As all users of the struct devlink_info_req are already in dev.c, move this struct from devl_internal.c to be local in dev.c. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-03 19:25:26 -08:00
Moshe Shemesh	a13aab66cb	devlink: Move devlink dev flash code to dev Move devlink dev flash callbacks, helpers and other related code from leftover.c to dev.c. No functional change in this patch. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-03 19:25:26 -08:00
Moshe Shemesh	d60191c46e	devlink: Move devlink dev info code to dev Move devlink dev info callbacks, related drivers helpers functions and other related code from leftover.c to dev.c. No functional change in this patch. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-03 19:25:26 -08:00
Moshe Shemesh	af2f8c1f82	devlink: Move devlink dev eswitch code to dev Move devlink dev eswitch callbacks and related code from leftover.c to file dev.c. No functional change in this patch. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-03 19:25:25 -08:00
Moshe Shemesh	c6ed7d6ef9	devlink: Move devlink dev reload code to dev Move devlink dev reload callback and related code from leftover.c to file dev.c. No functional change in this patch. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-03 19:25:25 -08:00
Moshe Shemesh	dbeeca81bd	devlink: Split out dev get and dump code Move devlink dev get and dump callbacks and related dev code to new file dev.c. This file shall include all callbacks that are specific on devlink dev object. No functional change in this patch. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-03 19:25:25 -08:00
Vladimir Oltean	d795527d50	net: dsa: use NL_SET_ERR_MSG_WEAK_MOD() more consistently Now that commit `028fb19c6b` ("netlink: provide an ability to set default extack message") provides a weak function that doesn't override an existing extack message provided by the driver, it makes sense to use it also for LAG and HSR offloading, not just for bridge offloading. Also consistently put the message string on a separate line, to reduce line length from 92 to 84 characters. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230202140354.3158129-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-02-03 19:23:32 -08:00
David S. Miller	8065c0e13f	Merge branch 'yt8531-support' Frank Sae says: ==================== net: add dts for yt8521 and yt8531s, add driver for yt8531 Add dts for yt8521 and yt8531s, add driver for yt8531. These patches have been verified on our AM335x platform (motherboard) which has one integrated yt8521 and one RGMII interface. It can connect to daughter boards like yt8531s or yt8531 board. v5: - change the compatible of yaml - change the maintainers of yaml from "frank sae" to "Frank Sae" v4: - change default tx delay from 150ps to 1950ps - add compatible for yaml v3: - change default rx delay from 1900ps to 1950ps - moved ytphy_rgmii_clk_delay_config_with_lock from yt8521's patch to yt8531's patch - removed unnecessary checks of phydev->attached_dev->dev_addr v2: - split BIT macro as one patch - split "dts for yt8521/yt8531s ... " patch as two patches - use standard rx-internal-delay-ps and tx-internal-delay-ps, removed motorcomm,sds-tx-amplitude - removed ytphy_parse_dt, ytphy_probe_helper and ytphy_config_init_helper - not store dts arg to yt8521_priv ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-03 09:34:51 +00:00
Frank Sae	4ac94f728a	net: phy: Add driver for Motorcomm yt8531 gigabit ethernet phy Add a driver for the motorcomm yt8531 gigabit ethernet phy. We have verified the driver on AM335x platform with yt8531 board. On the board, yt8531 gigabit ethernet phy works in utp mode, RGMII interface, supports 1000M/100M/10M speeds, and wol(magic package). Signed-off-by: Frank Sae <Frank.Sae@motor-comm.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-03 09:34:51 +00:00
Frank Sae	36152f87dd	net: phy: Add dts support for Motorcomm yt8531s gigabit ethernet phy Add dts support for Motorcomm yt8531s gigabit ethernet phy. Change yt8521_probe to support clk config of yt8531s. Becase yt8521_probe does the things which yt8531s is needed, so removed yt8531s function. This patch has been verified on AM335x platform with yt8531s board. Signed-off-by: Frank Sae <Frank.Sae@motor-comm.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-03 09:34:51 +00:00
Frank Sae	a6e68f0f87	net: phy: Add dts support for Motorcomm yt8521 gigabit ethernet phy Add dts support for Motorcomm yt8521 gigabit ethernet phy. Add ytphy_rgmii_clk_delay_config function to support dst config for the delay of rgmii clk. This funciont is common for yt8521, yt8531s and yt8531. This patch has been verified on AM335x platform. Signed-off-by: Frank Sae <Frank.Sae@motor-comm.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-03 09:34:51 +00:00
Frank Sae	4869a146cd	net: phy: Add BIT macro for Motorcomm yt8521/yt8531 gigabit ethernet phy Add BIT macro for Motorcomm yt8521/yt8531 gigabit ethernet phy. This is a preparatory patch. Add BIT macro for 0xA012 reg, and supplement for 0xA001 and 0xA003 reg. These will be used to support dts. Signed-off-by: Frank Sae <Frank.Sae@motor-comm.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-03 09:34:50 +00:00
Frank Sae	cf08dfe8ae	dt-bindings: net: Add Motorcomm yt8xxx ethernet phy Add a YAML binding document for the Motorcomm yt8xxx Ethernet phy. Signed-off-by: Frank Sae <Frank.Sae@motor-comm.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-03 09:34:50 +00:00
David S. Miller	18390581d0	Merge branch 'act_ct-UDP-NEW' Vlad Buslov says: ==================== net: Allow offloading of UDP NEW connections via act_ct Currently only bidirectional established connections can be offloaded via act_ct. Such approach allows to hardcode a lot of assumptions into act_ct, flow_table and flow_offload intermediate layer codes. In order to enabled offloading of unidirectional UDP NEW connections start with incrementally changing the following assumptions: - Drivers assume that only established connections are offloaded and don't support updating existing connections. Extract ctinfo from meta action cookie and refuse offloading of new connections in the drivers. - Fix flow_table offload fixup algorithm to calculate flow timeout according to current connection state instead of hardcoded "established" value. - Add new flow_table flow flag that designates bidirectional connections instead of assuming it and hardcoding hardware offload of every flow in both directions. - Add new flow_table flow flag that designates connections that are offloaded to hardware as "established" instead of assuming it. This allows some optimizations in act_ct and prevents spamming the flow_table workqueue with redundant tasks. With all the necessary infrastructure in place modify act_ct to offload UDP NEW as unidirectional connection. Pass reply direction traffic to CT and promote connection to bidirectional when UDP connection state changes to "assured". Rely on refresh mechanism to propagate connection state change to supporting drivers. Note that early drop algorithm that is designed to free up some space in connection tracking table when it becomes full (by randomly deleting up to 5% of non-established connections) currently ignores connections marked as "offloaded". Now, with UDP NEW connections becoming "offloaded" it could allow malicious user to perform DoS attack by filling the table with non-droppable UDP NEW connections by sending just one packet in single direction. To prevent such scenario change early drop algorithm to also consider "offloaded" connections for deletion. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-03 09:31:25 +00:00
Vlad Buslov	df25455e5a	netfilter: nf_conntrack: allow early drop of offloaded UDP conns Both synchronous early drop algorithm and asynchronous gc worker completely ignore connections with IPS_OFFLOAD_BIT status bit set. With new functionality that enabled UDP NEW connection offload in action CT malicious user can flood the conntrack table with offloaded UDP connections by just sending a single packet per 5tuple because such connections can no longer be deleted by early drop algorithm. To mitigate the issue allow both early drop and gc to consider offloaded UDP connections for deletion. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-03 09:31:24 +00:00
Vlad Buslov	6a9bad0069	net/sched: act_ct: offload UDP NEW connections Modify the offload algorithm of UDP connections to the following: - Offload NEW connection as unidirectional. - When connection state changes to ESTABLISHED also update the hardware flow. However, in order to prevent act_ct from spamming offload add wq for every packet coming in reply direction in this state verify whether connection has already been updated to ESTABLISHED in the drivers. If that it the case, then skip flow_table and let conntrack handle such packets which will also allow conntrack to potentially promote the connection to ASSURED. - When connection state changes to ASSURED set the flow_table flow NF_FLOW_HW_BIDIRECTIONAL flag which will cause refresh mechanism to offload the reply direction. All other protocols have their offload algorithm preserved and are always offloaded as bidirectional. Note that this change tries to minimize the load on flow_table add workqueue. First, it tracks the last ctinfo that was offloaded by using new flow 'NF_FLOW_HW_ESTABLISHED' flag and doesn't schedule the refresh for reply direction packets when the offloads have already been updated with current ctinfo. Second, when 'add' task executes on workqueue it always update the offload with current flow state (by checking 'bidirectional' flow flag and obtaining actual ctinfo/cookie through meta action instead of caching any of these from the moment of scheduling the 'add' work) preventing the need from scheduling more updates if state changed concurrently while the 'add' work was pending on workqueue. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-03 09:31:24 +00:00
Vlad Buslov	d5774cb6c5	net/sched: act_ct: set ctinfo in meta action depending on ct state Currently tcf_ct_flow_table_fill_actions() function assumes that only established connections can be offloaded and always sets ctinfo to either IP_CT_ESTABLISHED or IP_CT_ESTABLISHED_REPLY strictly based on direction without checking actual connection state. To enable UDP NEW connection offload set the ctinfo, metadata cookie and NF_FLOW_HW_ESTABLISHED flow_offload flags bit based on ct->status value. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-02-03 09:31:24 +00:00

1 2 3 4 5 ...

1156172 Commits