iv/linux

Author	SHA1	Message	Date
Romain Gantois	0f671b3b6e	net: pcs: rzn1-miic: Init RX clock early if MAC requires it The GMAC1 controller in the RZN1 IP requires the RX MII clock signal to be started before it initializes its own hardware, thus before it calls phylink_start. Implement the pcs_pre_init() callback so that the RX clock signal can be enabled early if necessary. Reported-by: Clément Léger <clement.leger@bootlin.com> Link: https://lore.kernel.org/linux-arm-kernel/20230116103926.276869-4-clement.leger@bootlin.com/ Signed-off-by: Romain Gantois <romain.gantois@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240326-rxc_bugfix-v6-7-24a74e5c761f@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 19:21:34 -07:00
Russell King (Oracle)	30dc587396	net: phy: qcom: at803x: Avoid hibernating if MAC requires RX clock Stmmac controllers connected to an at803x PHY cannot resume properly after suspend when WoL is enabled. This happens because the MAC requires an RX clock generated by the PHY to initialize its hardware properly. But the RX clock is cut when the PHY suspends and isn't brought up until the MAC driver resumes the phylink. Prevent the at803x PHY driver from going into suspend if the attached MAC driver always requires an RX clock signal. Reported-by: Clark Wang <xiaoning.wang@nxp.com> Link: https://lore.kernel.org/all/20230202081559.3553637-1-xiaoning.wang@nxp.com/ Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [rgantois: commit log] Signed-off-by: Romain Gantois <romain.gantois@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240326-rxc_bugfix-v6-6-24a74e5c761f@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 19:21:34 -07:00
Romain Gantois	58329b03a5	net: stmmac: Signal to PHY/PCS drivers to keep RX clock on There is a recurring issue with stmmac controllers where the MAC fails to initialize its hardware if an RX clock signal isn't provided on the MAC/PHY link. This causes issues when PHY or PCS devices either go into suspend while cutting the RX clock or do not bring the clock signal up early enough for the MAC to initialize successfully. Set the mac_requires_rxc flag in the stmmac phylink config so that PHY/PCS drivers know to keep the RX clock up at all times. Reported-by: Clark Wang <xiaoning.wang@nxp.com> Link: https://lore.kernel.org/all/20230202081559.3553637-1-xiaoning.wang@nxp.com/ Reported-by: Clément Léger <clement.leger@bootlin.com> Link: https://lore.kernel.org/linux-arm-kernel/20230116103926.276869-4-clement.leger@bootlin.com/ Co-developed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Romain Gantois <romain.gantois@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240326-rxc_bugfix-v6-5-24a74e5c761f@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 19:21:34 -07:00
Romain Gantois	f7bff228a6	net: stmmac: Support a generic PCS field in mac_device_info Global stmmac support for early initialization of PCS devices requires a generic PCS reference that can be passed to phylink_pcs_pre_init(). Currently, the mac_device_info struct contains only one PCS field, which is specific to the Lynx model. As PCS models are hardware-specific, it is more appropriate to have a generic PCS field in the mac_device_info struct. Refactor the lynx_pcs field into a generic phylink_pcs field. Signed-off-by: Romain Gantois <romain.gantois@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240326-rxc_bugfix-v6-4-24a74e5c761f@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 19:21:33 -07:00
Maxime Chevallier	10658e99d9	net: stmmac: don't rely on lynx_pcs presence to check for a PHY When initializing attached PHYs, there are some cases where we don't expect any PHY to be connected. The logic uses conditions based on various local PCS configurations, but also calls into phylink_expects_phy() via stmmac_init_phy(), which is enough to ensure we don't try to initialize a PHY when using a Lynx PCS, as long as we have the phy_interface set to an 802.3z mode and are using inband negotiation. Drop the lynx check, making the stmmac generic code more pcs_lynx-agnostic. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> [rgantois: commit log] Signed-off-by: Romain Gantois <romain.gantois@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240326-rxc_bugfix-v6-3-24a74e5c761f@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 19:21:33 -07:00
Romain Gantois	dceb393a0a	net: phylink: add rxc_always_on flag to phylink_pcs Some MAC drivers (e.g. stmmac) require a continuous receive clock signal to be generated by a PCS that is handled by a standalone PCS driver. Such a PCS driver does not have access to a PHY device, thus cannot check the PHY_F_RXC_ALWAYS_ON flag. They cannot check mac_requires_rxc in the phylink config either, since it is a private member. Therefore, a new flag is needed to signal to the PCS that it should keep the RX clock signal up at all times. Co-developed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Romain Gantois <romain.gantois@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240326-rxc_bugfix-v6-2-24a74e5c761f@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 19:21:33 -07:00
Russell King (Oracle)	21d9ba5bc5	net: phylink: add PHY_F_RXC_ALWAYS_ON to PHY dev flags Some MAC controllers (e.g. stmmac) require their connected PHY to continuously provide a receive clock signal. This can cause issues in two cases: 1. The clock signal hasn't been started yet by the time the MAC driver initializes its hardware. This can make the initialization fail, as in the case of the rzn1 GMAC1 driver. 2. The clock signal is cut during a power saving event. By the time the MAC is brought back up, the clock signal is still not active since phylink_start hasn't been called yet. This brings us back to case 1. If a PHY driver reads this flag, it should ensure that the receive clock signal is started as soon as possible, and that it isn't brought down when the PHY goes into suspend. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [rgantois: commit log] Signed-off-by: Romain Gantois <romain.gantois@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240326-rxc_bugfix-v6-1-24a74e5c761f@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 19:21:33 -07:00
Jakub Kicinski	af352c3b66	Merge branch 'compiler_types-add-endianness-dependent-__counted_by_-le-be' Alexander Lobakin says: ==================== compiler_types: add Endianness-dependent __counted_by_{le,be} Some structures contain flexible arrays at the end and the counter for them, but the counter has explicit Endianness and thus __counted_by() can't be used directly. To increase test coverage for potential problems without breaking anything, introduce __counted_by_{le,be} defined depending on platform's Endianness to either __counted_by() when applicable or noop otherwise. The first user will be virtchnl2.h from idpf just as example with 9 flex structures having Little Endian counters. Maybe it would be a good idea to introduce such attributes on compiler level if possible, but for now let's stop on what we have. ==================== Link: https://lore.kernel.org/r/20240327142241.1745989-1-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:50:51 -07:00
Alexander Lobakin	93d24acfa0	idpf: sprinkle __counted_by{,_le}() in the virtchnl2 header Both virtchnl2.h and its consumer idpf_virtchnl.c are very error-prone. There are 10 structures with flexible arrays at the end, but 9 of them have their flex member counter in Little Endian. Make the code a bit more robust by applying __counted_by_le() to those 9. LE platforms are the main target for this driver, so they would receive additional protection. While we're here, add __counted_by() to virtchnl2_ptype::proto_id, as its counter is `u8` regardless of the Endianness. Compile test on x86_64 (LE) didn't reveal any new issues after applying the attributes. Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20240327142241.1745989-4-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:50:47 -07:00
Alexander Lobakin	c00d33f1fc	idpf: make virtchnl2.h self-contained To ease maintaining of virtchnl2.h, which already is messy enough, make it self-contained by adding missing if_ether.h include due to %ETH_ALEN usage. At the same time, virtchnl2_lan_desc.h is not used anywhere in the file, so move this include to idpf_txrx.h to speed up C preprocessing. Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20240327142241.1745989-3-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:50:47 -07:00
Alexander Lobakin	ca7e324e8a	compiler_types: add Endianness-dependent __counted_by_{le,be} Some structures contain flexible arrays at the end and the counter for them, but the counter has explicit Endianness and thus __counted_by() can't be used directly. To increase test coverage for potential problems without breaking anything, introduce __counted_by_{le,be}() defined depending on platform's Endianness to either __counted_by() when applicable or noop otherwise. Maybe it would be a good idea to introduce such attributes on compiler level if possible, but for now let's stop on what we have. Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20240327142241.1745989-2-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:50:47 -07:00
Jakub Kicinski	6e9b01909a	net: remove gfp_mask from napi_alloc_skb() __napi_alloc_skb() is napi_alloc_skb() with the added flexibility of choosing gfp_mask. This is a NAPI function, so GFP_ATOMIC is implied. The only practical choice the caller has is whether to set __GFP_NOWARN. But that's a false choice, too, allocation failures in atomic context will happen, and printing warnings in logs, effectively for a packet drop, is both too much and very likely non-actionable. This leads me to a conclusion that most uses of napi_alloc_skb() are simply misguided, and should use __GFP_NOWARN in the first place. We also have a "standard" way of reporting allocation failures via the queue stat API (qstats::rx-alloc-fail). The direct motivation for this patch is that one of the drivers used at Meta calls napi_alloc_skb() (so prior to this patch without __GFP_NOWARN), and the resulting OOM warning is the top networking warning in our fleet. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240327040213.3153864-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:30:40 -07:00
Bjorn Helgaas	49d665b853	qed: Drop useless pci_params.pm_cap qed_init_pci() used pci_params.pm_cap only to cache the pci_dev.pm_cap. Drop the cache and use pci_dev.pm_cap directly. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240325224931.1462051-1-helgaas@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:22:28 -07:00
John Fraker	3bcbc67be1	gve: Add counter adminq_get_ptype_map_cnt to stats report This counter counts the number of times get_ptype_map is executed on the admin queue, and was previously missing from the stats report. Signed-off-by: John Fraker <jfraker@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240325223308.618671-1-jfraker@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:21:05 -07:00
Jakub Kicinski	c602f4ca13	Merge branch 'ravb-support-describing-the-mdio-bus' Niklas Söderlund says: ==================== ravb: Support describing the MDIO bus This series adds support to the binding and driver of the Renesas Ethernet AVB to describe the MDIO bus. Currently the driver uses the OF node of the device itself when registering the MDIO bus. This forces any MDIO bus properties the MDIO core should react on to be set on the device OF node. This is confusing and none of the MDIO bus properties are described in the Ethernet AVB bindings. Patch 1/2 extends the bindings with an optional mdio child-node to the device that can be used to contain the MDIO bus settings. Patch 2/2 changes the driver to use this node (if present) when registering the MDIO bus. If the new optional mdio child-node is not present the driver falls back to the old behavior and uses the device OF node like before. This change is fully backward compatible with existing usage of the bindings. ==================== Link: https://lore.kernel.org/r/20240325153451.2366083-1-niklas.soderlund+renesas@ragnatech.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:17:54 -07:00
Niklas Söderlund	2c60c4c008	ravb: Add support for an optional MDIO mode The driver used the DT node of the device itself when registering the MDIO bus. While this works, it creates a problem: it forces any MDIO bus properties to also be set on the device's DT node. This mixes the properties of two distinctly different things and is confusing. This change adds support for an optional mdio node to be defined as a child to the device DT node. The child node can then be used to describe MDIO bus properties that the MDIO core can act on when registering the bus. If no mdio child node is found the driver falls back to the old behavior and registers the MDIO bus using the device DT node. This change is backward compatible with old bindings in use. Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240325153451.2366083-3-niklas.soderlund+renesas@ragnatech.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:17:52 -07:00
Niklas Söderlund	a87590c45c	dt-bindings: net: renesas,etheravb: Add optional MDIO bus node The Renesas Ethernet AVB bindings do not allow the MDIO bus to be described. This has not been needed as only a single PHY is supported and no MDIO bus properties have been needed. Add an optional mdio node to the binding which allows the MDIO bus to be described and allow bus properties to be set. Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20240325153451.2366083-2-niklas.soderlund+renesas@ragnatech.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:17:52 -07:00
Jakub Kicinski	fb984d17fd	Merge branch 'doc-netlink-specs-add-vlan-support' Hangbin Liu says: ==================== doc/netlink/specs: Add vlan support Add vlan support in rt_link spec. ==================== Link: https://lore.kernel.org/r/20240327123130.1322921-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:07:10 -07:00
Hangbin Liu	782c1084b9	doc/netlink/specs: Add vlan attr in rt_link spec With command: # ./tools/net/ynl/cli.py \ --spec Documentation/netlink/specs/rt_link.yaml \ --do getlink --json '{"ifname": "eno1.2"}' --output-json | \ jq -C '.linkinfo' Before: Exception: No message format for 'vlan' in sub-message spec 'linkinfo-data-msg' After: { "kind": "vlan", "data": { "protocol": "8021q", "id": 2, "flag": { "flags": [ "reorder-hdr" ], "mask": "0xffffffff" }, "egress-qos": { "mapping": [ { "from": 1, "to": 2 }, { "from": 4, "to": 4 } ] } } } Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240327123130.1322921-3-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:07:08 -07:00
Hangbin Liu	b334f5ed3d	ynl: support hex display_hint for integer Sometimes it would be convenient to read an integer as hex, for example a mask value. Suggested-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://lore.kernel.org/r/20240327123130.1322921-2-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:07:08 -07:00
Jakub Kicinski	51cf49f626	Merge branch 'selftests-fixes-for-kernel-ci' Petr Machata says: ==================== selftests: Fixes for kernel CI As discussed on the bi-weekly call on Jan 30, and in mailing around kernel CI effort, some changes are desirable in the suite of forwarding selftests the better to work with the CI tooling. Namely: - The forwarding selftests use a configuration file where names of interfaces are defined and various variables can be overridden. There is also forwarding.config.sample that users can use as a template to refer to when creating the config file. What happens a fair bit is that users either do not know about this at all, or simply forget, and are confused by cryptic failures about interfaces that cannot be created. In patches #1 - #3 have lib.sh just be the single source of truth with regards to which variables exist. That includes the topology variables which were previously only in the sample file, and any "tweak variables", such as what tools to use, sleep times, etc. forwarding.config.sample then becomes just a placeholder with a couple examples. Unless specific HW should be exercised, or specific tools used, the defaults are usually just fine. - Several net/forwarding/ selftests (and one net/ one) cannot be run on veth pairs, they need an actual HW interface to run on. They are generic in the sense that any capable HW should pass them, which is why they have been put to net/forwarding/ as opposed to drivers/net/, but they do not generalize to veth. The fact that these tests are in net/forwarding/, but still complaining when run, is confusing. In patches #4 - #6 move these tests to a new directory drivers/net/hw. - The following patches extend the codebase to handle well test results other than pass and fail. Patch #7 is preparatory. It converts several log_test_skip to XFAIL, so that tests do not spuriously end up returning non-0 when they are not supposed to. 
In patches #8 - #10, introduce some missing ksft constants, then support having those constants in RET, and then finally in EXIT_STATUS. - The traffic scheduler tests generate a large amount of network traffic to test the behavior of the scheduler. This demands a relatively high-performance computer. On slow machines, such as with a debugging kernel, the test would spuriously fail. It can still be useful to "go through the motions" though, to possibly catch bugs in setup of the scheduler graph and passing packets around. Thus we still want to run the tests, just with lowered demands. To that end, in patches #11 - #12, introduce an environment variable KSFT_MACHINE_SLOW, with obvious meaning. Tests can then make checks more lenient, such as mark failures as XFAIL. A helper, xfail_on_slow, is provided to mark performance-sensitive parts of the selftest. - In patch #13, use a similar mechanism to mark a NH group stats selftest to XFAIL HW stats tests when run on VETH pairs. - All these changes complicate the hitherto straightforward logging and checking logic, so in patch #14, add a selftest that checks this functionality in lib.sh. v1 (vs. an RFC circulated through linux-kselftest): - Patch #9: - Clarify intended usage by s/set_ret/ret_set_ksft_status/, s/nret/ksft_status/ ==================== Link: https://lore.kernel.org/r/cover.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:03:46 -07:00
Petr Machata	8ff2d7abfb	selftests: forwarding: Add a test for testing lib.sh functionality Rerunning various scenarios to make sure lib.sh changes do not impact the observable behavior is no fun. Add a selftest at least for the bare basics -- the mechanics of setting RET, retmsg, and EXIT_STATUS. Since the selftest itself uses lib.sh, it would be possible to break lib.sh in such a way that invalidates result of the selftest. Since the metatest only uses the bare basics (just pass/fail), hopefully such fundamental breakages would be noticed. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/6d25cedbf2d4b83614944809a34fe023fbe8db38.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:03:43 -07:00
Petr Machata	6db870bbf7	selftests: forwarding: router_mpath_nh_lib: Don't skip, xfail on veth When the NH group stats tests are currently run on a veth topology, the HW-stats leg of each test is SKIP'ped. But kernel networking CI interprets skips as a sign that tooling is missing, and prompts maintainer investigation. Lack of capability to pass a test should be expressed as XFAIL. Selftests that require HW should normally be put in drivers/net/hw, but doing so for the NH counter selftests would just lead to a lot of duplication. So instead, introduce a helper, xfail_on_veth(), which can be used to mark selftests that should XFAIL instead of FAILing when run on a veth topology. On non-veth topology, they don't do anything. Use the helper in the HW-stats part of router_mpath_nh_lib selftest. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/15f0ab9637aa0497f164ec30e83c1c8f53d53719.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:03:43 -07:00
Petr Machata	e10391092a	selftests: forwarding: Mark performance-sensitive tests When run on a slow machine, the scheduler traffic tests can be expected to fail, and should be reported as XFAIL in that case. Therefore run these tests through the perf_sensitive wrapper. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/9a357f8cf34f5ececac08d43a3eb023008996035.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:03:43 -07:00
Petr Machata	e16a8d209c	selftests: forwarding: Support for performance sensitive tests Several tests in the suite use large amounts of traffic to e.g. cause congestion and evaluate RED or shaper performance. These tests will not run well on a slow machine, be it one with heavy debug kernel, or a VM, or e.g. a single-board computer. Allow users to specify an environment variable, KSFT_MACHINE_SLOW=yes, to indicate that the tests are being run on one such machine. Performance sensitive tests can then use a new helper, xfail_on_slow(), to mark parts of the test that are sensitive to low-performance machines. The helper can be used to just mark the whole suite, like so: xfail_on_slow tests_run ... or, on the other side of the granularity spectrum, to override individual checks: xfail_on_slow check_err $? "Expected much, got little." Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/99a376a2d2ffdaeee7752b1910cb0c3ea5d80fbe.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:03:43 -07:00
Petr Machata	a923af1cee	selftests: forwarding: Convert log_test() to recognize RET values In a previous patch, the interpretation of RET value was changed to mean the kselftest framework constant with the test outcome: $ksft_pass, $ksft_xfail, etc. Update log_test() to recognize the various possible RET values. Then have EXIT_STATUS track the RET value of the current test. This differs subtly from the way RET tracks the value: while for RET we want to recognize XFAIL as a separate status, for purposes of exit code, we want to conflate XFAIL and PASS, because they both communicate non-failure. Thus add a new helper, ksft_exit_status_merge(). With this log_test_skip() and log_test_xfail() can be reexpressed as thin wrappers around log_test. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/e5f807cb5476ab795fd14ac74da53a731a9fc432.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:03:43 -07:00
Petr Machata	596c8819cb	selftests: forwarding: Have RET track kselftest framework constants The variable RET keeps track of whether the test under execution has so far failed or not. Currently it works in binary fashion: zero means everything is fine, non-zero means something failed. log_test() then uses the value to give a human-readable message. In order to allow log_test() to report skips and xfails, the semantics of RET need to be more fine-grained. Therefore have RET value be one of kselftest framework constants: $ksft_fail, $ksft_xfail, etc. The current logic in check_err() is such that first non-zero value of RET trumps all those that follow. But that is not right when RET has more fine-grained value semantics. Different outcomes have different weights. The results of PASS and XFAIL are mostly the same: they both communicate a test that did not go wrong. SKIP communicates lack of tooling, which the user should go and try to fix, and as such should not be overridden by the passes. So far, the higher-numbered statuses can be considered weightier. But FAIL should be the weightiest. Add a helper, ksft_status_merge(), which merges two statuses in a way that respects the above conditions. Express it in a generic manner, because exit status merge is subtly different, and we want to reuse the same logic. Use the new helper when setting RET in check_err(). Re-express check_fail() in terms of check_err() to avoid duplication. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/7dfff51cc925c7a3ac879b9050a0d6a327c8d21f.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:03:42 -07:00
Petr Machata	51ccf267be	selftests: lib: Define more kselftest exit codes The following patches will operate with more exit codes besides ksft_skip. Add them here. Additionally, move a duplicated skip exit code definition from forwarding/tc_tunnel_key.sh. Keep a similar duplicate in forwarding/devlink_lib.sh, because even though lib.sh will have been sourced in all cases where devlink_lib is, the inclusion is not visible in the file itself, and relying on it would be confusing. Cc: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/545a03046c7aca0628a51a389a9b81949ab288ce.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:03:42 -07:00
Petr Machata	677f394956	selftests: forwarding: Change inappropriate log_test_skip() calls The SKIP return should be used for cases where tooling of the machine under test is lacking. For cases where HW is lacking, the appropriate outcome is XFAIL. This is the case with ethtool_rmon and mlxsw_lib. For these, introduce a new helper, log_test_xfail(). Do the same for router_mpath_nh_lib. Note that it will be fixed using a more reusable way in a following patch. For the two resource_scale selftests, the log should simply not be written, because there is no problem. Cc: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/3d668d8fb6fa0d9eeb47ce6d9e54114348c7c179.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:03:42 -07:00
Petr Machata	0c499a3517	selftests: forwarding: Ditch skip_on_veth() Since the selftests that are not supposed to run on veth pairs are now in their own dedicated directory, the skip_on_veth logic can go away. Drop it from the selftests, and from lib.sh. Cc: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/63b470e10d65270571ee7de709b31672ce314872.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:03:42 -07:00
Petr Machata	40d269c000	selftests: forwarding: Move several selftests The tests in net/forwarding are generally expected to be HW-independent. There are however several tests that, while not depending on any HW in particular, nevertheless depend on being used on HW interfaces. Placing these selftests to net/forwarding is confusing, because the selftest will just report it can't be run on veth pairs. At the same time, placing them to a particular driver's selftests subdirectory would be wrong. Instead, add a new directory, drivers/net/hw, where these generic but HW independent selftests should be placed. Move over several such tests including one helper library. Since typically these tests will not be expected to run, omit the directory drivers/net/hw from the TARGETS list in selftests/Makefile. Retain a Makefile in the new directory itself, so that a user can make -C into that directory and act on those tests explicitly. Cc: Roger Quadros <rogerq@kernel.org> Cc: Tobias Waldekranz <tobias@waldekranz.com> Cc: Danielle Ratson <danieller@nvidia.com> Cc: Davide Caratti <dcaratti@redhat.com> Cc: Johannes Nixdorf <jnixdorf-oss@avm.de> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/e11dae1f62703059e9fc2240004288ac7cc15756.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:03:42 -07:00
Petr Machata	0faa565bc4	selftests: forwarding: ipip_lib: Do not import lib.sh This library is always sourced in the context where lib.sh has already been sourced as well. Therefore drop the explicit sourcing and expect the client to already have done it. This will simplify moving some of the clients to a different directory. Signed-off-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/a4da5e9cd42a34cbace917a048ca71081719d6ac.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:03:42 -07:00
Petr Machata	0cb862871f	selftests: forwarding: README: Document customization That any sort of customization is possible at all, let alone how it should be done, is currently not at all clear. Document the whats and hows in README. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com> Link: https://lore.kernel.org/r/e819623af6aaeea49e9dc36cecd95694fad73bb8.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:03:42 -07:00
Petr Machata	fd36fd26ae	selftests: forwarding.config.sample: Move overrides to lib.sh forwarding.config.sample, net/lib.sh and net/forwarding/lib.sh contain definitions and redefinitions of some of the same variables. The overlap between net/forwarding/lib.sh and forwarding.config.sample is especially large. This duplication is a potential source of confusion and problems. It would be overall less error-prone if each variable were defined in one place only. In this patch set, that place is the library itself. Therefore move all comments from forwarding.config.sample to net/forwarding/lib.sh. Also move over a definition of TC_FLAG, which was missing from lib.sh entirely. Additionally, add to lib.sh a default definition of the topology variables. The logic behind this is that forgetting to specify forwarding.config was a frequent source of frustration for the selftest users. But really, most of the time the default veth-based topology is just fine. We considered just sourcing forwarding.config.sample instead if forwarding.config is not available, but this is a cleaner solution. That means the syntax of the forwarding.config.sample override has to change to an array assignment, so that the whole variable is overwritten, not just individual keys, which could leave the value of some keys unchanged. Do the same in lib.sh for any cut'n'pasters out there. The config file is then given a sort of carte blanche to redefine whatever variables it sees fit from the libraries. This is described in a comment in the file. Only a handful of variables are left behind, to illustrate the customization. The fact that the variables are now missing from forwarding.config.sample, and therefore would also be missing from a forwarding.config derived from that file, should not change anything. This is just the sample file. Users that keep their own forwarding.config would retain it as before. 
The only observable change is the introduction of TC_FLAG to lib.sh, because installing the filters to the HW datapath would now no longer be attempted. For veth pairs this does not change anything. For HW deployments, users presumably have forwarding.config with this value overridden. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com> Link: https://lore.kernel.org/r/b9b8a11a22821a7aa532211ff461a34f596e26bf.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:03:41 -07:00
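The difference between a per-key override and a whole-array assignment, which the commit message above relies on, can be sketched in bash as follows. This is only an illustration: the key names and interface names are made up, and the snippet is not taken from lib.sh or forwarding.config.sample. It needs bash, since it uses associative arrays.

```shell
# Assumed library defaults (illustrative values, not the real lib.sh contents).
declare -A NETIFS=( [p1]=veth0 [p2]=veth1 [p3]=veth2 )

# Per-key override: only p1 changes; p2 and p3 silently keep the defaults.
NETIFS[p1]=swp1
per_key_count=${#NETIFS[@]}
per_key_p3=${NETIFS[p3]}

# Whole-array assignment: the variable is replaced, so keys that are not
# listed here no longer exist afterwards.
NETIFS=( [p1]=swp1 [p2]=swp2 )
whole_count=${#NETIFS[@]}
whole_p3=${NETIFS[p3]:-<unset>}

echo "per-key override: $per_key_count keys, p3=$per_key_p3"
echo "array assignment: $whole_count keys, p3=$whole_p3"
```

With the array form, a config file that lists two interfaces ends up with exactly two, rather than inheriting leftover defaults for the keys it did not mention.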
Petr Machata	fa61e9aec9	selftests: net: libs: Change variable fallback syntax The current syntax of X=${X:=Y} first evaluates the ${X:=Y} expression, which either uses the existing value of $X if there is one, or uses the value of "Y" as a fallback, and assigns it to X. The expression is then replaced with the now-current value of $X. Assigning that value to X once more is meaningless. So avoid the outer X=... bit, and instead express the same idea through the do-nothing ":" built-in as : "${X:=Y}". This also cleans up the block nicely and makes it more readable. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com> Link: https://lore.kernel.org/r/1890ddc58420c2c0d5ba3154c87ecc6d9faf6947.1711464583.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 18:03:41 -07:00
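A minimal sketch of the two fallback spellings discussed in this commit, to show that they leave the variable in the same state. The variable names (DEMO_OLD, DEMO_NEW, DEMO_PRESET, DEMO_EMPTY) are hypothetical and do not come from the selftest libraries.

```shell
unset DEMO_OLD DEMO_NEW

# Old spelling: ${VAR:=word} already assigns, so the outer VAR=... is redundant.
DEMO_OLD=${DEMO_OLD:=fallback}

# New spelling: ":" is a no-op builtin; only the expansion's side effect matters.
: "${DEMO_NEW:=fallback}"

# A value set beforehand (e.g. by a config file) is left alone.
DEMO_PRESET=from-config
: "${DEMO_PRESET:=fallback}"

# Note that ":=" (with the colon) also replaces a variable that is set but empty.
DEMO_EMPTY=
: "${DEMO_EMPTY:=fallback}"

echo "$DEMO_OLD $DEMO_NEW $DEMO_PRESET $DEMO_EMPTY"
```

Quoting the expansion keeps the shell from word-splitting or globbing the resulting value before ":" discards it.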
Jakub Kicinski	5e47fbe5ce	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. No conflicts, or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-03-28 17:25:57 -07:00
Linus Torvalds	50108c352d	Including fixes from bpf, WiFi and netfilter. Current release - regressions: - ipv6: fix address dump when IPv6 is disabled on an interface Current release - new code bugs: - bpf: temporarily disable atomic operations in BPF arena - nexthop: fix uninitialized variable in nla_put_nh_group_stats() Previous releases - regressions: - bpf: protect against int overflow for stack access size - hsr: fix the promiscuous mode in offload mode - wifi: don't always use FW dump trig - tls: adjust recv return with async crypto and failed copy to userspace - tcp: properly terminate timers for kernel sockets - ice: fix memory corruption bug with suspend and rebuild - at803x: fix kernel panic with at8031_probe - qeth: handle deferred cc1 Previous releases - always broken: - bpf: fix bug in BPF_LDX_MEMSX - netfilter: reject table flag and netdev basechain updates - inet_defrag: prevent sk release while still in use - wifi: pick the version of SESSION_PROTECTION_NOTIF - wwan: t7xx: split 64bit accesses to fix alignment issues - mlxbf_gige: call request_irq() after NAPI initialized - hns3: fix kernel crash when devlink reload during pf initialization Signed-off-by: Paolo Abeni <pabeni@redhat.com> Merge tag 'net-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf, WiFi and netfilter. Current release - regressions: - ipv6: fix address dump when IPv6 is disabled on an interface Current release - new code bugs: - bpf: temporarily disable atomic operations in BPF arena - nexthop: fix uninitialized variable in nla_put_nh_group_stats() Previous releases - regressions: - bpf: protect against int overflow for stack access size - hsr: fix the promiscuous mode in offload mode - wifi: don't always use FW dump trig - tls: adjust recv return with async crypto and failed copy to userspace - tcp: properly terminate timers for kernel sockets - ice: fix memory corruption bug with suspend and rebuild - at803x: fix kernel panic with at8031_probe - qeth: handle deferred cc1 Previous releases - always broken: - bpf: fix bug in BPF_LDX_MEMSX - netfilter: reject table flag and netdev basechain updates - inet_defrag: prevent sk release while still in use - wifi: pick the version of SESSION_PROTECTION_NOTIF - wwan: t7xx: split 64bit accesses to fix alignment issues - mlxbf_gige: call request_irq() after NAPI initialized - hns3: fix kernel crash when devlink reload during pf initialization" * tag 'net-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits) inet: inet_defrag: prevent sk release while still in use Octeontx2-af: fix pause frame configuration in GMP mode net: lan743x: Add set RFE read fifo threshold for PCI1x1x chips net: bcmasp: Remove phy_{suspend/resume} net: bcmasp: Bring up unimac after PHY link up net: phy: qcom: at803x: fix kernel panic with at8031_probe netfilter: arptables: Select NETFILTER_FAMILY_ARP when building arp_tables.c netfilter: nf_tables: skip netdev hook unregistration if table is dormant netfilter: nf_tables: reject table flag and netdev basechain updates netfilter: nf_tables: reject destroy command to remove basechain hooks bpf: update BPF LSM designated reviewer list bpf: Protect against int overflow for stack access size bpf: Check bloom filter map value size bpf: fix warning for crash_kexec selftests: netdevsim: set test timeout to 10 minutes net: wan: framer: Add missing static inline qualifiers mlxbf_gige: call request_irq() after NAPI initialized tls: get psock ref after taking rxlock to avoid leak selftests: tls: add test with a partially invalid iov tls: adjust recv return with async crypto and failed copy to userspace ...	2024-03-28 13:09:37 -07:00
Florian Westphal	18685451fc	inet: inet_defrag: prevent sk release while still in use ip_local_out() and other functions can pass skb->sk as function argument. If the skb is a fragment and reassembly happens before such function call returns, the sk must not be released. This affects skb fragments reassembled via netfilter or similar modules, e.g. openvswitch or ct_act.c, when run as part of tx pipeline. Eric Dumazet made an initial analysis of this bug. Quoting Eric: Calling ip_defrag() in output path is also implying skb_orphan(), which is buggy because output path relies on sk not disappearing. A relevant old patch about the issue was : 8282f27449bf ("inet: frag: Always orphan skbs inside ip_defrag()") [..] net/ipv4/ip_output.c depends on skb->sk being set, and probably to an inet socket, not an arbitrary one. If we orphan the packet in ipvlan, then downstream things like FQ packet scheduler will not work properly. We need to change ip_defrag() to only use skb_orphan() when really needed, ie whenever frag_list is going to be used. Eric suggested to stash sk in fragment queue and made an initial patch. However there is a problem with this: If skb is refragmented again right after, ip_do_fragment() will copy head->sk to the new fragments, and sets up destructor to sock_wfree. IOW, we have no choice but to fix up sk_wmem accounting to reflect the fully reassembled skb, else wmem will underflow. This change moves the orphan down into the core, to last possible moment. As ip_defrag_offset is aliased with sk_buff->sk member, we must move the offset into the FRAG_CB, else skb->sk gets clobbered. This allows delaying the orphaning long enough to learn if the skb has to be queued or if the skb is completing the reasm queue. In the former case, things work as before, skb is orphaned. This is safe because skb gets queued/stolen and won't continue past reasm engine. 
In the latter case, we will steal the skb->sk reference, reattach it to the head skb, and fix up wmem accounting when inet_frag inflates truesize. Fixes: 7026b1ddb6b8 ("netfilter: Pass socket pointer down through okfn().") Diagnosed-by: Eric Dumazet <edumazet@google.com> Reported-by: xingwei lee <xrivendell7@gmail.com> Reported-by: yue sun <samsun1006219@gmail.com> Reported-by: syzbot+e5167d7144a62715044c@syzkaller.appspotmail.com Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240326101845.30836-1-fw@strlen.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-28 12:06:22 +01:00
Hariprasad Kelam	40d4b4807c	Octeontx2-af: fix pause frame configuration in GMP mode The Octeontx2 MAC block (CGX) has separate data paths (SMU and GMP) for different speeds, allowing for efficient data transfer. The previous patch, which added pause frame configuration, has a bug that keeps the pause frame feature from working in GMP mode. This patch fixes the issue by configuring the appropriate registers. Fixes: f7e086e754fe ("octeontx2-af: Pause frame configuration at cgx") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240326052720.4441-1-hkelam@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-28 11:56:47 +01:00
Raju Lakkaraju	e4a58989f5	net: lan743x: Add set RFE read fifo threshold for PCI1x1x chips PCI11x1x Rev B0 devices might drop packets when receiving back to back frames at 2.5G link speed. Change the B0 Rev device's Receive filtering Engine FIFO threshold parameter from its hardware default of 4 to 3 dwords to prevent the problem. Rev C0 and later hardware already defaults to 3 dwords. Fixes: bb4f6bffe33c ("net: lan743x: Add PCI11010 / PCI11414 device IDs") Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240326065805.686128-1-Raju.Lakkaraju@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-28 11:36:10 +01:00
Paolo Abeni	eb67cdb33f	Merge branch 'net-bcmasp-phy-managements-fixes' Justin Chen says: ==================== net: bcmasp: phy managements fixes Fix two issues. - The unimac may be put in a bad state if PHY RX clk doesn't exist during reset. Work around this by bringing the unimac out of reset during phy up. - Remove redundant phy_{suspend/resume} ==================== Link: https://lore.kernel.org/r/20240325193025.1540737-1-justin.chen@broadcom.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-28 10:46:39 +01:00
Justin Chen	4494c10e00	net: bcmasp: Remove phy_{suspend/resume} phy_{suspend/resume} is redundant. It gets called from phy_{stop/start}. Fixes: 490cb412007d ("net: bcmasp: Add support for ASP2.0 Ethernet controller") Signed-off-by: Justin Chen <justin.chen@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-28 10:46:38 +01:00
Justin Chen	dfd222e2ae	net: bcmasp: Bring up unimac after PHY link up The unimac requires the PHY RX clk during reset or it may be put into a bad state. Bring up the unimac after link up to ensure the PHY RX clk exists. Fixes: 490cb412007d ("net: bcmasp: Add support for ASP2.0 Ethernet controller") Signed-off-by: Justin Chen <justin.chen@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-28 10:46:38 +01:00
Christian Marangi	6a4aee2777	net: phy: qcom: at803x: fix kernel panic with at8031_probe When the at803x driver was reworked and split, the function split for the at803x PHYs introduced a NULL dereference bug: priv is referenced before it is actually allocated, and is then written to for the is_1000basex and is_fiber variables in the case of at8031, writing to the wrong address. Fix this by setting the priv local variable only after at803x_probe has been called and has actually allocated priv in the phydev struct. Reported-by: William Wortel <wwortel@dorpstraat.com> Cc: <stable@vger.kernel.org> Fixes: 25d2ba94005f ("net: phy: at803x: move specific at8031 probe mode check to dedicated probe") Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240325190621.2665-1-ansuelsmth@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-28 10:42:22 +01:00
Paolo Abeni	005e528c24	netfilter pull request 24-03-28 Merge tag 'nf-24-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: Patch #1 reject destroy chain command to delete device hooks in netdev family, hence, only delchain commands are allowed. Patch #2 reject table flag update interference with netdev basechain hook updates, this can leave hooks in inconsistent registration/unregistration state. Patch #3 do not unregister netdev basechain hooks if table is dormant. Otherwise, splat with double unregistration is possible. Patch #4 fixes Kconfig to allow to restore IP_NF_ARPTABLES, from Kuniyuki Iwashima. There are more fixes still in progress on my side that need more work. * tag 'nf-24-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: arptables: Select NETFILTER_FAMILY_ARP when building arp_tables.c netfilter: nf_tables: skip netdev hook unregistration if table is dormant netfilter: nf_tables: reject table flag and netdev basechain updates netfilter: nf_tables: reject destroy command to remove basechain hooks ==================== Link: https://lore.kernel.org/r/20240328031855.2063-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-28 10:23:03 +01:00
Paolo Abeni	7e6f4b2af5	for-net Merge tag 'for-net' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Alexei Starovoitov says: ==================== pull-request: bpf 2024-03-27 The following pull-request contains BPF updates for your net tree. We've added 4 non-merge commits during the last 1 day(s) which contain a total of 5 files changed, 26 insertions(+), 3 deletions(-). The main changes are: 1) Fix bloom filter value size validation and protect the verifier against such mistakes, from Andrei. 2) Fix build due to CONFIG_KEXEC_CORE/CRASH_DUMP split, from Hari. 3) Update bpf_lsm maintainers entry, from Matt. * tag 'for-net' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: update BPF LSM designated reviewer list bpf: Protect against int overflow for stack access size bpf: Check bloom filter map value size bpf: fix warning for crash_kexec ==================== Link: https://lore.kernel.org/r/20240328012938.24249-1-alexei.starovoitov@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2024-03-28 10:08:00 +01:00
Linus Torvalds	8d025e2092	Changes since last update: - Add a new reviewer Sandeep Dhavale to build a healthier community; - Drop experimental warning for FSDAX. Merge tag 'erofs-for-6.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: - Add a new reviewer Sandeep Dhavale to build a healthier community - Drop experimental warning for FSDAX * tag 'erofs-for-6.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: MAINTAINERS: erofs: add myself as reviewer erofs: drop experimental warning for FSDAX	2024-03-27 20:24:09 -07:00
Kuniyuki Iwashima	15fba562f7	netfilter: arptables: Select NETFILTER_FAMILY_ARP when building arp_tables.c syzkaller started to report a warning below [0] after consuming the commit 4654467dc7e1 ("netfilter: arptables: allow xtables-nft only builds"). The change accidentally removed the dependency on NETFILTER_FAMILY_ARP from IP_NF_ARPTABLES. If NF_TABLES_ARP is not enabled on Kconfig, NETFILTER_FAMILY_ARP will be removed and some code necessary for arptables will not be compiled. $ grep -E "(NETFILTER_FAMILY_ARP\|IP_NF_ARPTABLES\|NF_TABLES_ARP)" .config CONFIG_NETFILTER_FAMILY_ARP=y # CONFIG_NF_TABLES_ARP is not set CONFIG_IP_NF_ARPTABLES=y $ make olddefconfig $ grep -E "(NETFILTER_FAMILY_ARP\|IP_NF_ARPTABLES\|NF_TABLES_ARP)" .config # CONFIG_NF_TABLES_ARP is not set CONFIG_IP_NF_ARPTABLES=y So, when nf_register_net_hooks() is called for arptables, it will trigger the splat below. Now IP_NF_ARPTABLES is only enabled by IP_NF_ARPFILTER, so let's restore the dependency on NETFILTER_FAMILY_ARP in IP_NF_ARPFILTER. 
[0]: WARNING: CPU: 0 PID: 242 at net/netfilter/core.c:316 nf_hook_entry_head+0x1e1/0x2c0 net/netfilter/core.c:316 Modules linked in: CPU: 0 PID: 242 Comm: syz-executor.0 Not tainted 6.8.0-12821-g537c2e91d354 #10 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:nf_hook_entry_head+0x1e1/0x2c0 net/netfilter/core.c:316 Code: 83 fd 04 0f 87 bc 00 00 00 e8 5b 84 83 fd 4d 8d ac ec a8 0b 00 00 e8 4e 84 83 fd 4c 89 e8 5b 5d 41 5c 41 5d c3 e8 3f 84 83 fd <0f> 0b e8 38 84 83 fd 45 31 ed 5b 5d 4c 89 e8 41 5c 41 5d c3 e8 26 RSP: 0018:ffffc90000b8f6e8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000003 RCX: ffffffff83c42164 RDX: ffff888106851180 RSI: ffffffff83c42321 RDI: 0000000000000005 RBP: 0000000000000000 R08: 0000000000000005 R09: 000000000000000a R10: 0000000000000003 R11: ffff8881055c2f00 R12: ffff888112b78000 R13: 0000000000000000 R14: ffff8881055c2f00 R15: ffff8881055c2f00 FS: 00007f377bd78800(0000) GS:ffff88811b000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000496068 CR3: 000000011298b003 CR4: 0000000000770ef0 PKRU: 55555554 Call Trace: <TASK> __nf_register_net_hook+0xcd/0x7a0 net/netfilter/core.c:428 nf_register_net_hook+0x116/0x170 net/netfilter/core.c:578 nf_register_net_hooks+0x5d/0xc0 net/netfilter/core.c:594 arpt_register_table+0x250/0x420 net/ipv4/netfilter/arp_tables.c:1553 arptable_filter_table_init+0x41/0x60 net/ipv4/netfilter/arptable_filter.c:39 xt_find_table_lock+0x2e9/0x4b0 net/netfilter/x_tables.c:1260 xt_request_find_table_lock+0x2b/0xe0 net/netfilter/x_tables.c:1285 get_info+0x169/0x5c0 net/ipv4/netfilter/arp_tables.c:808 do_arpt_get_ctl+0x3f9/0x830 net/ipv4/netfilter/arp_tables.c:1444 nf_getsockopt+0x76/0xd0 net/netfilter/nf_sockopt.c:116 ip_getsockopt+0x17d/0x1c0 net/ipv4/ip_sockglue.c:1777 tcp_getsockopt+0x99/0x100 net/ipv4/tcp.c:4373 do_sock_getsockopt+0x279/0x360 net/socket.c:2373 __sys_getsockopt+0x115/0x1e0 
net/socket.c:2402 __do_sys_getsockopt net/socket.c:2412 [inline] __se_sys_getsockopt net/socket.c:2409 [inline] __x64_sys_getsockopt+0xbd/0x150 net/socket.c:2409 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x46/0x4e RIP: 0033:0x7f377beca6fe Code: 1f 44 00 00 48 8b 15 01 97 0a 00 f7 d8 64 89 02 b8 ff ff ff ff eb b8 0f 1f 44 00 00 f3 0f 1e fa 49 89 ca b8 37 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 0a c3 66 0f 1f 84 00 00 00 00 00 48 8b 15 c9 RSP: 002b:00000000005df728 EFLAGS: 00000246 ORIG_RAX: 0000000000000037 RAX: ffffffffffffffda RBX: 00000000004966e0 RCX: 00007f377beca6fe RDX: 0000000000000060 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 000000000042938a R08: 00000000005df73c R09: 00000000005df800 R10: 00000000004966e8 R11: 0000000000000246 R12: 0000000000000003 R13: 0000000000496068 R14: 0000000000000003 R15: 00000000004bc9d8 </TASK> Fixes: 4654467dc7e1 ("netfilter: arptables: allow xtables-nft only builds") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2024-03-28 03:54:02 +01:00
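The broken configuration described in this commit (IP_NF_ARPTABLES enabled while NETFILTER_FAMILY_ARP is not) can be detected in a .config file with a small shell check. The sketch below is not part of the kernel tree; the helper names config_enabled and arptables_config_broken are hypothetical and used for illustration only.

```shell
# Succeeds if option $2 (given without the CONFIG_ prefix) is set to y or m
# in the kernel config file $1.
config_enabled() {
	grep -Eq "^CONFIG_$2=(y|m)\$" "$1"
}

# Succeeds if the config has arptables enabled without the ARP hook family,
# i.e. the combination that triggers the warning quoted above.
arptables_config_broken() {
	config_enabled "$1" IP_NF_ARPTABLES &&
		! config_enabled "$1" NETFILTER_FAMILY_ARP
}
```

A typical use would be `arptables_config_broken .config && echo "NETFILTER_FAMILY_ARP is missing"` after running `make olddefconfig`.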
Pablo Neira Ayuso	216e7bf740	netfilter: nf_tables: skip netdev hook unregistration if table is dormant Skip hook unregistration when adding or deleting devices from an existing netdev basechain. Otherwise, the commit/abort path tries to unregister hooks that are not enabled. Fixes: b9703ed44ffb ("netfilter: nf_tables: support for adding new devices to an existing netdev chain") Fixes: 7d937b107108 ("netfilter: nf_tables: support for deleting devices in an existing netdev chain") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2024-03-28 03:54:01 +01:00
Pablo Neira Ayuso	1e1fb6f00f	netfilter: nf_tables: reject table flag and netdev basechain updates netdev basechain updates are stored in the transaction object hook list. When the table dormant flag is set, the code iterates over the existing hooks in the basechain, thus skipping the hooks that are being added/deleted in this transaction, which leaves hook registration in an inconsistent state. Reject table flag updates in combination with netdev basechain updates in the same batch: - Update table flags and add/delete basechain: Check from basechain update path if there are pending flag updates for this table. - add/delete basechain and update table flags: Iterate over the transaction list to search for basechain updates from the table update path. In both cases, the batch is rejected. Based on suggestion from Florian Westphal. Fixes: b9703ed44ffb ("netfilter: nf_tables: support for adding new devices to an existing netdev chain") Fixes: 7d937b107108f ("netfilter: nf_tables: support for deleting devices in an existing netdev chain") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2024-03-28 03:54:01 +01:00


1264777 Commits