linux

iv/linux

Author	SHA1	Message	Date
David Howells	c336a79983	nvmet-tcp: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpage When transmitting data, call down into TCP using a single sendmsg with MSG_SPLICE_PAGES to indicate that content should be spliced rather than copied instead of calling sendpage. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Willem de Bruijn <willemb@google.com> cc: Keith Busch <kbusch@kernel.org> cc: Jens Axboe <axboe@fb.com> cc: Christoph Hellwig <hch@lst.de> cc: Chaitanya Kulkarni <kch@nvidia.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: linux-nvme@lists.infradead.org Link: https://lore.kernel.org/r/20230623225513.2732256-9-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:50:12 -07:00
David Howells	7769887817	nvme-tcp: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpage When transmitting data, call down into TCP using a sendmsg with MSG_SPLICE_PAGES instead of sendpage. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Willem de Bruijn <willemb@google.com> cc: Keith Busch <kbusch@kernel.org> cc: Jens Axboe <axboe@fb.com> cc: Christoph Hellwig <hch@lst.de> cc: Chaitanya Kulkarni <kch@nvidia.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: linux-nvme@lists.infradead.org Link: https://lore.kernel.org/r/20230623225513.2732256-8-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:50:12 -07:00
David Howells	a1a5e87527	dlm: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage When transmitting data, call down a layer using a single sendmsg with MSG_SPLICE_PAGES to indicate that content should be spliced rather using sendpage. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells <dhowells@redhat.com> cc: Christine Caulfield <ccaulfie@redhat.com> cc: David Teigland <teigland@redhat.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: cluster-devel@redhat.com Link: https://lore.kernel.org/r/20230623225513.2732256-7-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:50:12 -07:00
David Howells	572efade27	rds: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage When transmitting data, call down into TCP using a single sendmsg with MSG_SPLICE_PAGES to indicate that content should be spliced. To make this work, the data is assembled in a bio_vec array and attached to a BVEC-type iterator. Signed-off-by: David Howells <dhowells@redhat.com> cc: Santosh Shilimkar <santosh.shilimkar@oracle.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: rds-devel@oss.oracle.com Link: https://lore.kernel.org/r/20230623225513.2732256-6-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:50:12 -07:00
David Howells	fa094ccae1	ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage() Use sendmsg() and MSG_SPLICE_PAGES rather than sendpage in ceph when transmitting data. For the moment, this can only transmit one page at a time because of the architecture of net/ceph/, but if write_partial_message_data() can be given a bvec[] at a time by the iteration code, this would allow pages to be sent in a batch. Signed-off-by: David Howells <dhowells@redhat.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Xiubo Li <xiubli@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Link: https://lore.kernel.org/r/20230623225513.2732256-5-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:50:12 -07:00
David Howells	40a8c17aa7	ceph: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage Use sendmsg() and MSG_SPLICE_PAGES rather than sendpage in ceph when transmitting data. For the moment, this can only transmit one page at a time because of the architecture of net/ceph/, but if write_partial_message_data() can be given a bvec[] at a time by the iteration code, this would allow pages to be sent in a batch. Signed-off-by: David Howells <dhowells@redhat.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Xiubo Li <xiubli@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Link: https://lore.kernel.org/r/20230623225513.2732256-4-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:50:12 -07:00
David Howells	c729ed6f5b	net: Use sendmsg(MSG_SPLICE_PAGES) not sendpage in skb_send_sock() Use sendmsg() with MSG_SPLICE_PAGES rather than sendpage in skb_send_sock(). This causes pages to be spliced from the source iterator if possible. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Note that this could perhaps be improved to fill out a bvec array with all the frags and then make a single sendmsg call, possibly sticking the header on the front also. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Link: https://lore.kernel.org/r/20230623225513.2732256-3-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:50:12 -07:00
David Howells	f8dd95b29d	tcp_bpf, smc, tls, espintcp, siw: Reduce MSG_SENDPAGE_NOTLAST usage As MSG_SENDPAGE_NOTLAST is being phased out along with sendpage(), don't use it further in than the sendpage methods, but rather translate it to MSG_MORE and use that instead. Signed-off-by: David Howells <dhowells@redhat.com> cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> cc: Bernard Metzler <bmt@zurich.ibm.com> cc: Jason Gunthorpe <jgg@ziepe.ca> cc: Leon Romanovsky <leon@kernel.org> cc: John Fastabend <john.fastabend@gmail.com> cc: Jakub Sitnicki <jakub@cloudflare.com> cc: David Ahern <dsahern@kernel.org> cc: Karsten Graul <kgraul@linux.ibm.com> cc: Wenjia Zhang <wenjia@linux.ibm.com> cc: Jan Karcher <jaka@linux.ibm.com> cc: "D. Wythe" <alibuda@linux.alibaba.com> cc: Tony Lu <tonylu@linux.alibaba.com> cc: Wen Gu <guwen@linux.alibaba.com> cc: Boris Pismenny <borisp@nvidia.com> cc: Steffen Klassert <steffen.klassert@secunet.com> cc: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/20230623225513.2732256-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:50:12 -07:00
Jakub Kicinski	b545a13ca9	mlx5-updates-2023-06-21 mlx5 driver minor cleanup and fixes to net-next -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmSV8icACgkQSD+KveBX +j6WgAf+Ph5w2dAfuadjlAcQlK2WKXyS1OliVpvCuglepgsP9Zree11WVoyiAKef 70Xd5Vu4xAQTdpCsw4DGM7sh6xW1SxvZFP1A/FVk7UU1E0zL2SXzKyEHK29I1MH5 nej2hRf/W1dwYtxfwNOTKAFco3wr0e1vLgMDEqZBZbJXzcUetDRADkgWrx9U+pno lBPhVMZWK5R0GzOjlWZXoedaXx2RIcm+5U02ov5S6d1y8AA+sE93tiYxrP9z/2lj nml1KeQQl0Ku/y+e8RMSUd9mPdomQZi6CVHMD1wV5DNv6dnrT1bPrUdt4torSXEI KAzkVie979XP2jvkDKY2nyXZ8dn7cw== =woKn -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2023-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-06-21 mlx5 driver minor cleanup and fixes to net-next * tag 'mlx5-updates-2023-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Remove pointless vport lookup from mlx5_esw_check_port_type() net/mlx5: Remove redundant check from mlx5_esw_query_vport_vhca_id() net/mlx5: Remove redundant is_mdev_switchdev_mode() check from is_ib_rep_supported() net/mlx5: Remove redundant MLX5_ESWITCH_MANAGER() check from is_ib_rep_supported() net/mlx5e: E-Switch, Fix shared fdb error flow net/mlx5e: Remove redundant comment net/mlx5e: E-Switch, Pass other_vport flag if vport is not 0 net/mlx5e: E-Switch, Use xarray for devcom paired device index net/mlx5e: E-Switch, Add peer fdb miss rules for vport manager or ecpf net/mlx5e: Use vhca_id for device index in vport rx rules net/mlx5: Lag, Remove duplicate code checking lag is supported net/mlx5: Fix error code in mlx5_is_reset_now_capable() net/mlx5: Fix reserved at offset in hca_cap register net/mlx5: Fix SFs kernel documentation error net/mlx5: Fix UAF in mlx5_eswitch_cleanup() ==================== Link: https://lore.kernel.org/r/20230623192907.39033-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:48:04 -07:00
Jakub Kicinski	35bf34b078	Merge branch 'netlink-add-display-hint-to-ynl' Donald Hunter says: ==================== netlink: add display-hint to ynl Add a display-hint property to the netlink schema, to be used by generic netlink clients as hints about how to display attribute values. A display-hint on an attribute definition is intended for letting a client such as ynl know that, for example, a u32 should be rendered as an ipv4 address. The display-hint enumeration includes a small number of networking domain-specific value types. ==================== Link: https://lore.kernel.org/r/20230623201928.14275-1-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:45:52 -07:00
Donald Hunter	334f39ce17	netlink: specs: add display hints to ovs_flow Add display hints for mac, ipv4, ipv6, hex and uuid to the ovs_flow schema. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20230623201928.14275-4-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:45:49 -07:00
Donald Hunter	d8eea68d91	tools: ynl: add display-hint support to ynl Add support to the ynl tool for rendering output based on display-hint properties. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20230623201928.14275-3-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:45:49 -07:00
Donald Hunter	737eab775d	netlink: specs: add display-hint to schema definitions Add a display-hint property to the netlink schema that is for providing optional hints to generic netlink clients about how to display attribute values. A display-hint on an attribute definition is intended for letting a client such as ynl know that, for example, a u32 should be rendered as an ipv4 address. The display-hint enumeration includes a small number of networking domain-specific value types. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20230623201928.14275-2-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:45:49 -07:00
Jakub Kicinski	2ffecf1a42	Core WPAN changes: * Support for active scans * Support for answering BEACON_REQ * Specific MLME handling for limited devices WPAN driver changes: * ca8210: - Flag the devices as limited - Remove stray gpiod_unexport() call -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEE9HuaYnbmDhq/XIDIJWrqGEe9VoQFAmSV2A0ACgkQJWrqGEe9 VoSFxgf+MH6U+kmOU9rAikG12t+qTVBKzUkxUgCXcNKSb8oo2/5xF4jWpPd0Oa0W Ztneu7N/1o8FFTLJiI5l0KkidU2Nq9jBgYJPZPBen1TucrDapyMvQ1UzfA3LF8Vj rnBWXiz9oDWAg33KuSGZwgagvYe+UawQTzsvM+HKfai5VMyCzno0oJ8RXAJ/6jgY GkB9e3I5nkZUT9wVzeFKQXEnqGRjZRS2WoC/a76fRz0larGoVeHY3IWWCkQYamU6 vro18m2yiv/DkZaT5L0YQlYJ8LWojkaCgUVmJlP/9ngMO9WXqriVytMJBpcmFIhp 1us6PVS5ue65C1aVS3uHQSe47WM0SA== =sP6V -----END PGP SIGNATURE----- Merge tag 'ieee802154-for-net-next-2023-06-23' of gitolite.kernel.org:pub/scm/linux/kernel/git/wpan/wpan-next Miquel Raynal says: ==================== Core WPAN changes: - Support for active scans - Support for answering BEACON_REQ - Specific MLME handling for limited devices WPAN driver changes: - ca8210: - Flag the devices as limited - Remove stray gpiod_unexport() call * tag 'ieee802154-for-net-next-2023-06-23' of gitolite.kernel.org:pub/scm/linux/kernel/git/wpan/wpan-next: ieee802154: ca8210: Remove stray gpiod_unexport() call ieee802154: ca8210: Flag the driver as being limited net: ieee802154: Handle limited devices with only datagram support mac802154: Handle received BEACON_REQ ieee802154: Add support for allowing to answer BEACON_REQ mac802154: Handle active scanning ieee802154: Add support for user active scan requests ==================== Link: https://lore.kernel.org/r/20230623195506.40b87b5f@xps-13 Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:41:46 -07:00
Jakub Kicinski	14fd5e0d48	Merge branch 'selftests-mptcp-refactoring-and-minor-fixes' Mat Martineau says: ==================== selftests: mptcp: Refactoring and minor fixes Patch 1 moves code around for clarity and improved code reuse. Patch 2 makes use of new MPTCP info that consolidates MPTCP-level and subflow-level information. Patches 3-7 refactor code to favor limited-scope environment vars over optional parameters. Patch 8: typo fix ==================== Link: https://lore.kernel.org/r/20230623-send-net-next-20230623-v1-0-a883213c8ba9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:38:02 -07:00
Yueh-Shun Li	e6b8a78ea2	selftests: mptcp: connect: fix comment typo Spell "transmissions" properly. Found by searching for keyword "tranm". Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Yueh-Shun Li <shamrocklee@posteo.net> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230623-send-net-next-20230623-v1-8-a883213c8ba9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:37:57 -07:00
Geliang Tang	9e9d176df8	selftests: mptcp: add pm_nl_set_endpoint helper This patch moves endpoint settings out of do_transfer() into a new helper pm_nl_set_endpoint(). And invoke this helper in do_transfer(). This makes the code much more clearer. Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230623-send-net-next-20230623-v1-7-a883213c8ba9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:37:57 -07:00
Geliang Tang	1534f87ee0	selftests: mptcp: drop sflags parameter run_tests() accepts too many optional parameters. Before this modification, it was required to set all of then when only the last one had to be changed. That's not clear to see all these 0 and it makes the maintenance harder: run_tests $ns1 $ns2 10.0.1.1 1 2 3 slow Instead, the parameter can be set as an env var with a limited scope: foo=1 bar=2 next=3 \ run_tests $ns1 $ns2 10.0.1.1 slow This patch switches to key/value "sflags=*" instead of positional parameter sflags of do_transfer() and run_tests(). Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230623-send-net-next-20230623-v1-6-a883213c8ba9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:37:57 -07:00
Geliang Tang	595ef566a2	selftests: mptcp: drop addr_nr_ns1/2 parameters run_tests() accepts too many optional parameters. Before this modification, it was required to set all of then when only the last one had to be changed. That's not clear to see all these 0 and it makes the maintenance harder: run_tests $ns1 $ns2 10.0.1.1 1 2 3 slow Instead, the parameter can be set as an env var with a limited scope: foo=1 bar=2 next=3 \ run_tests $ns1 $ns2 10.0.1.1 slow This patch switches to key/value "addr_nr_ns1=, addr_nr_ns2=" instead of positional parameters addr_nr_ns1 and addr_nr_ns2 of do_transfer() and run_tests(). Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230623-send-net-next-20230623-v1-5-a883213c8ba9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:37:57 -07:00
Geliang Tang	0c93af1f89	selftests: mptcp: drop test_linkfail parameter run_tests() accepts too many optional parameters. Before this modification, it was required to set all of then when only the last one had to be changed. That's not clear to see all these 0 and it makes the maintenance harder: run_tests $ns1 $ns2 10.0.1.1 1 2 3 slow Instead, the parameter can be set as an env var with a limited scope: foo=1 bar=2 next=3 \ run_tests $ns1 $ns2 10.0.1.1 slow This patch switches to key/value "test_linkfail=*" instead of positional parameter test_linkfail of do_transfer() and run_tests(). Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230623-send-net-next-20230623-v1-4-a883213c8ba9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:37:57 -07:00
Geliang Tang	be7e9786c9	selftests: mptcp: set FAILING_LINKS in run_tests Set FAILING_LINKS as an env var with a limited scope only when calling run_tests(). Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230623-send-net-next-20230623-v1-3-a883213c8ba9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:37:57 -07:00
Geliang Tang	d7ced753aa	selftests: mptcp: check subflow and addr infos New MPTCP info are being checked in multiple places to improve the code coverage when using the userspace PM. This patch makes chk_mptcp_info() more generic to be able to check subflows, add_addr_signal and add_addr_accepted info (and even more later). New arguments are now required to get different infos from the two namespaces because some counters are specific to the client or the server. Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230623-send-net-next-20230623-v1-2-a883213c8ba9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:37:57 -07:00
Geliang Tang	4369c198e5	selftests: mptcp: test userspace pm out of transfer This patch moves userspace pm tests out of do_transfer(). Move add address test into a new function userspace_pm_add_addr(), and remove address test into userspace_pm_rm_sf_addr_ns1(). Move add subflow test into userspace_pm_add_sf() and remove subflow into userspace_pm_rm_sf_addr_ns2(). Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20230623-send-net-next-20230623-v1-1-a883213c8ba9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:37:57 -07:00
Jakub Kicinski	c4015bbee9	Merge branch 'net-stmmac-introduce-devres-helpers-for-stmmac-platform-drivers' Bartosz Golaszewski says: ==================== net: stmmac: introduce devres helpers for stmmac platform drivers The goal of this series is two-fold: to make the API for stmmac platforms more logically correct (by providing functions that acquire resources with release counterparts that undo only their actions and nothing more) and to provide devres variants of commonly use registration functions that allows to significantly simplify the platform drivers. The current pattern for stmmac platform drivers is to call stmmac_probe_config_dt(), possibly the platform's init() callback and then call stmmac_drv_probe(). The resources allocated by these calls will then be released by calling stmmac_pltfr_remove(). This goes against the commonly accepted way of providing each function that allocated a resource with a function that frees it. First: provide wrappers around platform's init() and exit() callbacks that allow users to skip checking if the callbacks exist manually. Second: provide stmmac_pltfr_probe() which calls the platform init() callback and then calls stmmac_drv_probe() together with a variant of stmmac_pltfr_remove() that DOES NOT call stmmac_remove_config_dt(). For now this variant is called stmmac_pltfr_remove_no_dt() but once all users of the old stmmac_pltfr_remove() are converted to the devres helper, it will be renamed back to stmmac_pltfr_remove() and the no_dt function removed. Finally use the devres helpers in dwmac-qco-ethqos to show how much simplier the driver's probe() becomes. This series obviously just starts the conversion process and other platform drivers will need to be converted once the helpers land in net/. ==================== Link: https://lore.kernel.org/r/20230623100417.93592-1-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:36:06 -07:00
Bartosz Golaszewski	4194f32a4b	net: stmmac: dwmac-qcom-ethqos: use devm_stmmac_pltfr_probe() Use the devres variant of stmmac_pltfr_probe() and finally drop the remove() callback entirely. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230623100417.93592-12-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:36:00 -07:00
Bartosz Golaszewski	fc9ee2ac4f	net: stmmac: platform: provide devm_stmmac_pltfr_probe() Provide a devres variant of stmmac_pltfr_probe() which allows users to skip calling stmmac_pltfr_remove() at driver detach. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230623100417.93592-11-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:36:00 -07:00
Bartosz Golaszewski	061425d933	net: stmmac: dwmac-qco-ethqos: use devm_stmmac_probe_config_dt() Significantly simplify the driver's probe() function by using the devres variant of stmmac_probe_config_dt(). This allows to drop the goto jumps entirely. The remove_new() callback now needs to be switched to stmmac_pltfr_remove_no_dt(). Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230623100417.93592-10-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:36:00 -07:00
Bartosz Golaszewski	d740654273	net: stmmac: platform: provide devm_stmmac_probe_config_dt() Provide a devres variant of stmmac_probe_config_dt() that allows users to skip calling stmmac_remove_config_dt() at driver detach. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230623100417.93592-9-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:36:00 -07:00
Bartosz Golaszewski	1be0c9d65e	net: stmmac: platform: provide stmmac_pltfr_remove_no_dt() Add a variant of stmmac_pltfr_remove() that only frees resources allocated by stmmac_pltfr_probe() and - unlike stmmac_pltfr_remove() - does not call stmmac_remove_config_dt(). Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230623100417.93592-8-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:35:59 -07:00
Bartosz Golaszewski	0a68a59493	net: stmmac: dwmac-generic: use stmmac_pltfr_probe() Shrink the code and remove labels by using the new stmmac_pltfr_probe() function. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230623100417.93592-7-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:35:59 -07:00
Bartosz Golaszewski	3d5bf75d76	net: stmmac: platform: provide stmmac_pltfr_probe() Implement stmmac_pltfr_probe() which is the logical API counterpart for stmmac_pltfr_remove(). It calls the platform's init() callback and then probes the stmmac device. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230623100417.93592-6-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:35:59 -07:00
Bartosz Golaszewski	40db9f1ddf	net: stmmac: dwmac-generic: use stmmac_pltfr_exit() Shrink the code in dwmac-generic by using the new stmmac_pltfr_exit() helper. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230623100417.93592-5-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:35:59 -07:00
Bartosz Golaszewski	5b0acf8dd2	net: stmmac: platform: provide stmmac_pltfr_exit() Provide a helper wrapper around calling the platform's exit() callback. This allows users to skip checking if the callback exists. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230623100417.93592-4-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:35:59 -07:00
Bartosz Golaszewski	4450e7d423	net: stmmac: dwmac-generic: use stmmac_pltfr_init() Shrink the code in dwmac-generic by using the new stmmac_pltfr_init() helper. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230623100417.93592-3-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:35:59 -07:00
Bartosz Golaszewski	97117eb51e	net: stmmac: platform: provide stmmac_pltfr_init() Provide a helper wrapper around calling the platform's init() callback. This allows users to skip checking if the callback exists. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20230623100417.93592-2-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:35:59 -07:00
Jakub Kicinski	cfd40b82a5	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-06-22 (ice) This series contains updates to ice driver only. Jake adds a slight wait on control queue send to reduce wait time for responses that occur within normal times. Maciej allows for hot-swapping XDP programs. Przemek removes unnecessary checks when enabling SR-IOV and freeing allocated memory. Christophe Jaillet converts a managed memory allocation to a regular one. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: use ice_down_up() where applicable ice: Remove managed memory usage in ice_get_fw_log_cfg() ice: remove null checks before devm_kfree() calls ice: clean up freeing SR-IOV VFs ice: allow hot-swapping XDP programs ice: reduce initial wait for control queue messages ==================== Link: https://lore.kernel.org/r/20230622183601.2406499-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:32:18 -07:00
Pavel Begunkov	2fe11c9d36	net/tcp: optimise locking for blocking splice Even when tcp_splice_read() reads all it was asked for, for blocking sockets it'll release and immediately regrab the socket lock, loop around and break on the while check. Check tss.len right after we adjust it, and return if we're done. That saves us one release_sock(); lock_sock(); pair per successful blocking splice read. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/80736a2cc6d478c383ea565ba825eaf4d1abd876.1687523671.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:24:01 -07:00
Kuniyuki Iwashima	3f5f118bb6	af_unix: Call scm_recv() only after scm_set_cred(). syzkaller hit a WARN_ON_ONCE(!scm->pid) in scm_pidfd_recv(). In unix_stream_read_generic(), if there is no skb in the queue, we could bail out the do-while loop without calling scm_set_cred(): 1. No skb in the queue 2. sk is non-blocking or shutdown(sk, RCV_SHUTDOWN) is called concurrently or peer calls close() If the socket is configured with SO_PASSCRED or SO_PASSPIDFD, scm_recv() would populate cmsg with garbage. Let's not call scm_recv() unless there is skb to receive. WARNING: CPU: 1 PID: 3245 at include/net/scm.h:138 scm_pidfd_recv include/net/scm.h:138 [inline] WARNING: CPU: 1 PID: 3245 at include/net/scm.h:138 scm_recv.constprop.0+0x754/0x850 include/net/scm.h:177 Modules linked in: CPU: 1 PID: 3245 Comm: syz-executor.1 Not tainted 6.4.0-rc5-01219-gfa0e21fa4443 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:scm_pidfd_recv include/net/scm.h:138 [inline] RIP: 0010:scm_recv.constprop.0+0x754/0x850 include/net/scm.h:177 Code: 67 fd e9 55 fd ff ff e8 4a 70 67 fd e9 7f fd ff ff e8 40 70 67 fd e9 3e fb ff ff e8 36 70 67 fd e9 02 fd ff ff e8 8c 3a 20 fd <0f> 0b e9 fe fb ff ff e8 50 70 67 fd e9 2e f9 ff ff e8 46 70 67 fd RSP: 0018:ffffc90009af7660 EFLAGS: 00010216 RAX: 00000000000000a1 RBX: ffff888041e58a80 RCX: ffffc90003852000 RDX: 0000000000040000 RSI: ffffffff842675b4 RDI: 0000000000000007 RBP: ffffc90009af7810 R08: 0000000000000007 R09: 0000000000000013 R10: 00000000000000f8 R11: 0000000000000001 R12: ffffc90009af7db0 R13: 0000000000000000 R14: ffff888041e58a88 R15: 1ffff9200135eecc FS: 00007f6b7113f640(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f6b7111de38 CR3: 0000000012a6e002 CR4: 0000000000770ee0 PKRU: 55555554 Call Trace: <TASK> unix_stream_read_generic+0x5fe/0x1f50 net/unix/af_unix.c:2830 unix_stream_recvmsg+0x194/0x1c0 net/unix/af_unix.c:2880 sock_recvmsg_nosec net/socket.c:1019 [inline] sock_recvmsg+0x188/0x1d0 net/socket.c:1040 ____sys_recvmsg+0x210/0x610 net/socket.c:2712 ___sys_recvmsg+0xff/0x190 net/socket.c:2754 do_recvmmsg+0x25d/0x6c0 net/socket.c:2848 __sys_recvmmsg net/socket.c:2927 [inline] __do_sys_recvmmsg net/socket.c:2950 [inline] __se_sys_recvmmsg net/socket.c:2943 [inline] __x64_sys_recvmmsg+0x224/0x290 net/socket.c:2943 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7f6b71da2e5d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48 RSP: 002b:00007f6b7113ecc8 EFLAGS: 00000246 ORIG_RAX: 000000000000012b RAX: ffffffffffffffda RBX: 00000000004bc050 RCX: 00007f6b71da2e5d RDX: 0000000000000007 RSI: 0000000020006600 RDI: 000000000000000b RBP: 00000000004bc050 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000120 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000006e R14: 00007f6b71e03530 R15: 0000000000000000 </TASK> Fixes: `5e2ff6704a` ("scm: add SO_PASSPIDFD and SCM_PIDFD") Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Alexander Mikhalitsyn <alexander@mihalicyn.com> Link: https://lore.kernel.org/r/20230622184351.91544-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:15:01 -07:00
Jakub Kicinski	1c78eb8760	Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-06-22 (iavf) This series contains updates to iavf driver only. Przemek defers removing, previous, primary MAC address until after getting result of adding its replacement. He also does some cleanup by removing unused functions and making applicable functions static. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: iavf: make functions static where possible iavf: remove some unused functions and pointless wrappers iavf: fix err handling for MAC replace ==================== Link: https://lore.kernel.org/r/20230622165914.2203081-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:12:05 -07:00
Randy Dunlap	6a11af7c21	revert "s390/net: lcs: use IS_ENABLED() for kconfig detection" The referenced patch is causing build errors when ETHERNET=y and FDDI=m. While we work out the preferred patch(es), revert this patch to make the pain go away. Fixes: `1282723361` ("s390/net: lcs: use IS_ENABLED() for kconfig detection") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Link: lore.kernel.org/r/202306202129.pl0AqK8G-lkp@intel.com Cc: Alexandra Winter <wintera@linux.ibm.com> Cc: Wenjia Zhang <wenjia@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230622155409.27311-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:10:49 -07:00
Giulio Benetti	28e219aea0	net: phy: broadcom: drop brcm_phy_setbits() and use phy_set_bits() instead Linux provides phy_set_bits() helper so let's drop brcm_phy_setbits() and use phy_set_bits() in its place. Signed-off-by: Giulio Benetti <giulio.benetti@benettiengineering.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20230622184721.24368-1-giulio.benetti@benettiengineering.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 15:05:40 -07:00
Jakub Kicinski	a685d0df75	bpf-next-for-netdev -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZJX+ygAKCRDbK58LschI g0/2AQDHg12smf9mPfK9wOFDNRIIX8r2iufB8LUFQMzCwltN6gEAkAdkAyfbof7P TMaNUiHABijAFtChxoSI35j3OOSRrwE= =GJgN -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-06-23 We've added 49 non-merge commits during the last 24 day(s) which contain a total of 70 files changed, 1935 insertions(+), 442 deletions(-). The main changes are: 1) Extend bpf_fib_lookup helper to allow passing the route table ID, from Louis DeLosSantos. 2) Fix regsafe() in verifier to call check_ids() for scalar registers, from Eduard Zingerman. 3) Extend the set of cpumask kfuncs with bpf_cpumask_first_and() and a rework of bpf_cpumask_any() kfuncs. Additionally, add selftests, from David Vernet. 4) Fix socket lookup BPF helpers for tc/XDP to respect VRF bindings, from Gilad Sever. 5) Change bpf_link_put() to use workqueue unconditionally to fix it under PREEMPT_RT, from Sebastian Andrzej Siewior. 6) Follow-ups to address issues in the bpf_refcount shared ownership implementation, from Dave Marchevsky. 7) A few general refactorings to BPF map and program creation permissions checks which were part of the BPF token series, from Andrii Nakryiko. 8) Various fixes for benchmark framework and add a new benchmark for BPF memory allocator to BPF selftests, from Hou Tao. 9) Documentation improvements around iterators and trusted pointers, from Anton Protopopov. 10) Small cleanup in verifier to improve allocated object check, from Daniel T. Lee. 11) Improve performance of bpf_xdp_pointer() by avoiding access to shared_info when XDP packet does not have frags, from Jesper Dangaard Brouer. 12) Silence a harmless syzbot-reported warning in btf_type_id_size(), from Yonghong Song. 13) Remove duplicate bpfilter_umh_cleanup in favor of umd_cleanup_helper, from Jarkko Sakkinen. 14) Fix BPF selftests build for resolve_btfids under custom HOSTCFLAGS, from Viktor Malik. tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (49 commits) bpf, docs: Document existing macros instead of deprecated bpf, docs: BPF Iterator Document selftests/bpf: Fix compilation failure for prog vrf_socket_lookup selftests/bpf: Add vrf_socket_lookup tests bpf: Fix bpf socket lookup from tc/xdp to respect socket VRF bindings bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC hookpoint bpf: Factor out socket lookup functions for the TC hookpoint. selftests/bpf: Set the default value of consumer_cnt as 0 selftests/bpf: Ensure that next_cpu() returns a valid CPU number selftests/bpf: Output the correct error code for pthread APIs selftests/bpf: Use producer_cnt to allocate local counter array xsk: Remove unused inline function xsk_buff_discard() bpf: Keep BPF_PROG_LOAD permission checks clear of validations bpf: Centralize permissions checks for all BPF map types bpf: Inline map creation logic in map_create() function bpf: Move unprivileged checks into map_create() and bpf_prog_load() bpf: Remove in_atomic() from bpf_link_put(). selftests/bpf: Verify that check_ids() is used for scalars in regsafe() bpf: Verify scalar ids mapping in regsafe() using check_ids() selftests/bpf: Check if mark_chain_precision() follows scalar ids ... ==================== Link: https://lore.kernel.org/r/20230623211256.8409-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-24 14:52:28 -07:00
Jakub Kicinski	d1d29a42f7	Merge branch 'mlxsw-maintain-candidate-rifs' Petr Machata says: ==================== mlxsw: Maintain candidate RIFs The mlxsw driver currently makes the assumption that the user applies configuration in a bottom-up manner. Thus netdevices need to be added to the bridge before IP addresses are configured on that bridge or SVI added on top of it. Enslaving a netdevice to another netdevice that already has uppers is in fact forbidden by mlxsw for this reason. Despite this safety, it is rather easy to get into situations where the offloaded configuration is just plain wrong. As an example, take a front panel port, configure an IP address: it gets a RIF. Now enslave the port to the bridge, and the RIF is gone. Remove the port from the bridge again, but the RIF never comes back. There is a number of similar situations, where changing the configuration there and back utterly breaks the offload. The situation is going to be made better by implementing a range of replays and post-hoc offloads. This patch set lays the ground for replay of next hops. The particular issue that it deals with is that currently, driver-specific bookkeeping for next hops is hooked off RIF objects, which come and go across the lifetime of a netdevice. We would rather keep these objects at an entity that mirrors the lifetime of the netdevice itself. That way they are at hand and can be offloaded when a RIF is eventually created. To that end, with this patchset, mlxsw keeps a hash table of CRIFs: candidate RIFs, persistent handles for netdevices that mlxsw deems potentially interesting. The lifetime of a CRIF matches that of the underlying netdevice, and thus a RIF can always assume a CRIF exists. A CRIF is where next hops are kept, and when RIF is created, these next hops can be easily offloaded. (Previously only the next hops created after the RIF was created were offloaded.) - Patches #1 and #2 are minor adjustments. - In patches #3 and #4, add CRIF bookkeeping. - In patch #5, link CRIFs to RIFs such that given a netdevice-backed RIF, the corresponding CRIF is easy to look up. - Patch #6 is a clean-up allowed by the previous patches - Patches #7 and #8 move next hop tracking to CRIFs No observable effects are intended as of yet. This will be useful once there is support for RIF creation for netdevices that become mlxsw uppers, which will come in following patch sets. ==================== Link: https://lore.kernel.org/r/cover.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-23 19:01:56 -07:00
Petr Machata	9464a3d68e	mlxsw: spectrum_router: Track next hops at CRIFs Move the list of next hops from struct mlxsw_sp_rif to mlxsw_sp_crif. The reason is that eventually, next hops for mlxsw uppers should be offloaded and unoffloaded on demand as a netdevice becomes an upper, or stops being one. Currently, next hops are tracked at RIFs, but RIFs do not exist when a netdevice is not an mlxsw uppers. CRIFs are kept track of throughout the netdevice lifetime. Correspondingly, track at each next hop not its RIF, but its CRIF (from which a RIF can always be deduced). Note that now that next hops are tracked at a CRIF, it is not necessary to move each over to a new RIF when it is necessary to edit a RIF. Therefore drop mlxsw_sp_nexthop_rif_migrate() and have mlxsw_sp_rif_migrate_destroy() call mlxsw_sp_nexthop_rif_update() directly. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/e7c1c0a7dd13883b0f09aeda12c4fcf4d63a70e3.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-23 19:01:56 -07:00
Petr Machata	a285d66423	mlxsw: spectrum_router: Split nexthop finalization to two stages Nexthop finalization consists of two steps: the part where the offload is removed, because the backing RIF is now gone; and the part where the association to the RIF is severed. Extract from mlxsw_sp_nexthop_type_fini() a helper that covers the unoffloading part, mlxsw_sp_nexthop_type_rif_gone(), so that it can later be called independently. Note that this swaps around the ordering of mlxsw_sp_nexthop_ipip_fini() vs. mlxsw_sp_nexthop_rif_fini(). The current ordering is more of a historical happenstance than a conscious decision. The two cleanups do not depend on each other, and this change should have no observable effects. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/7134559534c5f5c4807c3a1569fae56f8887e763.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-23 19:01:56 -07:00
Petr Machata	bdc0b78e79	mlxsw: spectrum_router: Use router.lb_crif instead of .lb_rif_index A previous patch added a pointer to loopback CRIF to the router data structure. That makes the loopback RIF index redundant, as everything necessary can be derived from the CRIF. Drop the field and adjust the code accordingly. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/8637bf959bc5b6c9d5184b9bd8a0cd53c5132835.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-23 19:01:56 -07:00
Petr Machata	aa21242b07	mlxsw: spectrum_router: Link CRIFs to RIFs When a RIF is about to be created, the registration of the netdevice that it should be associated with must have been seen in the past, and a CRIF created. Therefore make this a hard requirement by looking up the CRIF during RIF creation, and complaining loudly when there isn't one. This then allows to keep a link between a RIF and its corresponding CRIF (and back, as the relationship is one-to-at-most-one), which do. The CRIF will later be useful as the objects tracked there will be offloaded lazily as a result of RIF creation. CRIFs are created when an "interesting" netdevice is registered, and destroyed after such device is unregistered. CRIFs are supposed to already exist when a RIF creation request arises, and exist at least as long as that RIF exists. This makes for a simple invariant: it is always safe to dereference CRIF pointer from "its" RIF. To guarantee this, CRIFs cannot be removed immediately when the UNREGISTER event is delivered. The reason is that if a RIF's netdevices has an IPv6 address, removal of this address is notified in an atomic block. To remove the RIF, the IPv6 removal handler schedules a work item. It must be safe for this work item to access the associated CRIF as well. Thus when a netdevice that backs the CRIF is removed, if it still has a RIF, do not actually free the CRIF, only toggle its can_destroy flag, which this patch adds. Later on, mlxsw_sp_rif_destroy() collects the CRIF. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/68c8e33afa6b8c03c431b435e1685ffdff752e63.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-23 19:01:56 -07:00
Petr Machata	78126cfd5d	mlxsw: spectrum_router: Maintain CRIF for fallback loopback RIF CRIFs are generally not maintained for loopback RIFs. However, the RIF for the default VRF is used for offloading of blackhole nexthops. Nexthops expect to have a valid CRIF. Therefore in this patch, add code to maintain CRIF for the loopback RIF as well. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/7f2b2fcc98770167ed1254a904c3f7f585ba43f0.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-23 19:01:56 -07:00
Petr Machata	4796c287b7	mlxsw: spectrum_router: Maintain a hash table of CRIFs CRIFs are objects that mlxsw maintains for netdevices that may not have an associated RIF (i.e. they may not have been instantiated in the ASIC), but if indeed they do not, it is quite possible they will in the future. These netdevices are candidate RIFs, hence CRIFs. Netdevices for which CRIFs are created include e.g. bridges, LAGs, or front panel ports. The idea is that next hops would be kept at CRIFs, not RIFs, and thus it would be easier to offload and unoffload the entities that have been added before the RIF was created. In this patch, add the code for low-level CRIF maintenance: create and destroy, and keep in a table keyed by the netdevice pointer for easy recall. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/186d44e399c475159da20689f2c540719f2d1ed0.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-23 19:01:56 -07:00
Petr Machata	f3c85eed1a	mlxsw: spectrum_router: Use mlxsw_sp_ul_rif_get() to get main VRF LB RIF The current function, mlxsw_sp_router_ul_rif_get(), is a wrapper around the function mentioned in the subject. As such it forms an external interface of the router code. In future patches we will want to maintain connection between RIFs and the CRIFs (introduced in the next patch) that back them. That will not hold for the VRF-based loopback netdevices, so the whole CRIF business can be kept hidden from the rest of mlxsw. But for the main VRF loopback RIF we do want to keep the RIF-CRIF connection, because that RIF is used for blackhole next hops, and the next hop code can be kept simpler for assuming rif->crif is valid. Hence, instead, call mlxsw_sp_ul_rif_get() to create the main VRF loopback RIF. This being an internal function will take the CRIF argument anyway. Furthermore, the function does not lock, which is not necessary at this point in code yet. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Link: https://lore.kernel.org/r/7a39a011a02a84164cd7f5da7985ec5b2ae01ba5.1687438411.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-06-23 19:01:56 -07:00

1 2 3 4 5 ...

1188920 Commits