linux

iv/linux

Author	SHA1	Message	Date
Christian Marangi	4fd1b6d47a	leds: trigger: netdev: introduce check for possible hw control Introduce function to check if the requested mode can use hw control in preparation for hw control support. Currently everything is handled in software so can_hw_control will always return false. Add knob with the new value hw_control in trigger_data struct to set hw control possible. Useful for future implementation to implement in set_baseline_state() the required function to set the requested mode using LEDs hw control ops and in other function to reject set if hw control is currently active. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-05-31 09:42:08 +01:00
Andrew Lunn	28a6a2ef18	leds: trigger: netdev: refactor code setting device name Move the code into a helper, ready for it to be called at other times. No intended behaviour change. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-05-31 09:42:08 +01:00
Christian Marangi	8aa2fd7b66	Documentation: leds: leds-class: Document new Hardware driven LEDs APIs Document new Hardware driven LEDs APIs. Some LEDs can be programmed to be driven by hardware. This is not limited to blink but also to turn off or on autonomously. To support this feature, a LED needs to implement various additional ops and needs to declare specific support for the supported triggers. Add documentation for each required value and API to make hw control possible and implementable by both LEDs and triggers. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-05-31 09:42:08 +01:00
Andrew Lunn	052c38eb17	leds: add API to get attached device for LED hw control Some specific LED triggers blink the LED based on events from a device or subsystem. For example, an LED could be blinked to indicate a network device is receiving packets, or a disk is reading blocks. To correctly enable and request the hw control of the LED, the trigger has to check if the network interface or block device configured via a /sys/class/led file match the one the LED driver provide for hw control for. Provide an API call to get the device which the LED blinks for. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-05-31 09:42:08 +01:00
Christian Marangi	ed554d3f94	leds: add APIs for LEDs hw control Add an option to permit LED driver to declare support for a specific trigger to use hw control and setup the LED to blink based on specific provided modes. Add APIs for LEDs hw control. These functions will be used to activate hardware control where a LED will use the provided flags, from an unique defined supported trigger, to setup the LED to be driven by hardware. Add hw_control_is_supported() to ask the LED driver if the requested mode by the trigger are supported and the LED can be setup to follow the requested modes. Deactivate hardware blink control by setting brightness to LED_OFF via the brightness_set() callback. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-05-31 09:42:08 +01:00
Pedro Tammela	f4e4534850	net/netlink: fix NETLINK_LIST_MEMBERSHIPS length report The current code for the length calculation wrongly truncates the reported length of the groups array, causing an under report of the subscribed groups. To fix this, use 'BITS_TO_BYTES()' which rounds up the division by 8. Fixes: b42be38b2778 ("netlink: add API to retrieve all group memberships") Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230529153335.389815-1-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-31 00:02:24 -07:00
Xin Long	6cd8ec58c1	tipc: delete tipc_mtu_bad from tipc_udp_enable Since commit a4dfa72d0acd ("tipc: set default MTU for UDP media"), it's been no longer using dev->mtu for b->mtu, and the issue described in commit 3de81b758853 ("tipc: check minimum bearer MTU") doesn't exist in UDP bearer any more. Besides, dev->mtu can still be changed to a too small mtu after the UDP bearer is created even with tipc_mtu_bad() check in tipc_udp_enable(). Note that NETDEV_CHANGEMTU event processing in tipc_l2_device_event() doesn't really work for UDP bearer. So this patch deletes the unnecessary tipc_mtu_bad from tipc_udp_enable. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Link: https://lore.kernel.org/r/282f1f5cc40e6cad385aa1c60569e6c5b70e2fb3.1685371933.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-31 00:01:11 -07:00
Jakub Kicinski	c23515ad4e	Merge branch 'net-dsa-mv88e6xxx-add-88e6361-support' Alexis Lothoré says: ==================== net: dsa: mv88e6xxx: add 88E6361 support This series brings initial support for Marvell 88E6361 switch. MV88E6361 is a 8 ports switch with 5 integrated Gigabit PHYs and 3 2.5Gigabit SerDes interfaces. It is in fact a new variant in the 88E639X/88E6193X/88E6191X family with a subset of existing features: - port 0: MII, RMII, RGMII, 1000BaseX, 2500BaseX - port 3 to 7: triple speed internal phys - port 9 and 10: 1000BaseX, 25000BaseX Since said family is already well supported in mv88e6xxx driver, adding initial support for this new switch mostly consists in finding the ID exposed in its identification register, adding a proper description in switch description tables in mv88e6xxx driver, and enforcing 88E6361 specificities in mv88e6393x_XXX methods. - first 4 commits introduce an internal phy offset field for switches which have internal phys but not starting from port 0 - 5th commit is a fix on existing switches based on first commits - 6th commit is a slight modification to prepare 886361 support - last commit introduces 88E6361 support in 88E6393X family This initial support has been tested with two samples of a custom board with the following hardware configuration: - a main CPU connected to MV88E6361 using port 0 as CPU port - port 9 wired to a SFP cage - port 10 wired to a G.Hn transceiver The following setup was used: PC <-ethernet-> (copper SFP) - Board 1 - (G.hn) <-phone line(RJ11)-> (G.hn) Board 2 The unit 1 has been configured to bridge SFP port and G.hn port together, which allowed to successfully ping Board 2 from PC. ==================== Link: https://lore.kernel.org/r/20230529080246.82953-1-alexis.lothore@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 23:54:35 -07:00
Alexis Lothoré	12899f2998	net: dsa: mv88e6xxx: enable support for 88E6361 switch Marvell 88E6361 is an 8-port switch derived from the 88E6393X/88E9193X/88E6191X switches family. It can benefit from the existing mv88e6xxx driver by simply adding the proper switch description in the driver. Main differences with other switches from this family are: - 8 ports exposed (instead of 11): ports 1, 2 and 8 not available - No 5GBase-x nor SFI/USXGMII support Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 23:54:33 -07:00
Alexis Lothoré	18e1b7422d	net: dsa: mv88e6xxx: pass mv88e6xxx_chip structure to port_max_speed_mode Some switches families have minor differences on supported link speed for ports. Instead of redefining a new port_max_speed_mode for each different configuration, allow to pass mv88e6xxx_chip structure to allow differentiating those chips by known chip id Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 23:54:33 -07:00
Alexis Lothoré	2f93493970	net: dsa: mv88e6xxx: fix 88E6393X family internal phys layout 88E6393X/88E6193X/88E6191X switches have in fact 8 internal PHYs, but those are not present starting at port 0: supported ports go from 1 to 8 Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 23:54:33 -07:00
Alexis Lothoré	3ba89b28ad	net: dsa: mv88e6xxx: add field to specify internal phys layout mv88e6xxx currently assumes that switch equipped with internal phys have those phys mapped contiguously starting from port 0 (see mv88e6xxx_phy_is_internal). However, some switches have internal PHYs but NOT starting from port 0. For example 88e6393X, 88E6193X and 88E6191X have integrated PHYs available on ports 1 to 8 To properly support this offset, add a new field to allow specifying an internal PHYs layout. If field is not set, default layout is assumed (start at port 0) Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 23:54:33 -07:00
Alexis Lothoré	7a2dd00be8	net: dsa: mv88e6xxx: use mv88e6xxx_phy_is_internal in mv88e6xxx_port_ppu_updates Make sure to use existing helper to get internal PHYs count instead of redoing it manually Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 23:54:33 -07:00
Alexis Lothoré	ca34593190	net: dsa: mv88e6xxx: pass directly chip structure to mv88e6xxx_phy_is_internal Since this function is a simple helper, we do not need to pass a full dsa_switch structure, we can directly pass the mv88e6xxx_chip structure. Doing so will allow to share this function with any other function not manipulating dsa_switch structure but needing info about number of internal phys Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 23:54:33 -07:00
Alexis Lothoré	9229a9483d	dt-bindings: net: dsa: marvell: add MV88E6361 switch to compatibility list Marvell MV88E6361 is an 8-port switch derived from the 88E6393X/88E9193X/88E6191X switches family. Since its functional behavior is very close to switches from this family, it can benefit from existing drivers for this family, so add it to the list of compatible switches Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 23:54:33 -07:00
Jakub Kicinski	e180a33cf4	Merge branch 'add-layer-2-miss-indication-and-filtering' Ido Schimmel says: ==================== Add layer 2 miss indication and filtering tl;dr ===== This patchset adds a single bit to the tc skb extension to indicate that a packet encountered a layer 2 miss in the bridge and extends flower to match on this metadata. This is required for non-DF (Designated Forwarder) filtering in EVPN multi-homing which prevents decapsulated BUM packets from being forwarded multiple times to the same multi-homed host. Background ========== In a typical EVPN multi-homing setup each host is multi-homed using a set of links called ES (Ethernet Segment, i.e., LAG) to multiple leaf switches in a rack. These switches act as VTEPs and are not directly connected (as opposed to MLAG), but can communicate with each other (as well as with VTEPs in remote racks) via spine switches over L3. When a host sends a BUM packet over ES1 to VTEP1, the VTEP will flood it to other VTEPs in the network, including those connected to the host over ES1. The receiving VTEPs must drop the packet and not forward it back to the host. This is called "split-horizon filtering" (SPH) [1]. FRR configures SPH filtering using two tc filters. The first, an ingress filter that matches on packets received from VTEP1 and marks them using a fwmark (firewall mark). The second, an egress filter configured on the LAG interface connected to the host that matches on the fwmark and drops the packets. Example: # tc filter add dev vxlan0 ingress pref 1 proto all flower enc_src_ip $VTEP1_IP action skbedit mark 101 # tc filter add dev bond0 egress pref 1 handle 101 fw action drop Motivation ========== For each ES, only one VTEP is elected by the control plane as the DF. The DF is responsible for forwarding decapsulated BUM traffic to the host over the ES. The non-DF VTEPs must drop such traffic as otherwise the host will receive multiple copies of BUM traffic. This is called "non-DF filtering" [2]. Filtering of multicast and broadcast traffic can be achieved using the following flower filter: # tc filter add dev bond0 egress pref 1 proto all flower indev vxlan0 dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 action drop Unlike broadcast and multicast traffic, it is not currently possible to filter unknown unicast traffic. The classification into unknown unicast is performed by the bridge driver, but is not visible to other layers. Implementation ============== The proposed solution is to add a single bit to the tc skb extension that is set by the bridge for packets that encountered an FDB or MDB miss. The flower classifier is extended to be able to match on this new metadata bit in a similar fashion to existing metadata options such as 'indev'. A bit that is set for every flooded packet would also work, but it does not allow us to differentiate between registered and unregistered multicast traffic which might be useful in the future. A relatively generic name is chosen for this bit - 'l2_miss' - to allow its use to be extended to other layer 2 devices such as VXLAN, should a use case arise. With the above, the control plane can implement a non-DF filter using the following tc filters: # tc filter add dev bond0 egress pref 1 proto all flower indev vxlan0 dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 action drop # tc filter add dev bond0 egress pref 2 proto all flower indev vxlan0 l2_miss true action drop The first drops broadcast and multicast traffic and the second drops unknown unicast traffic. Testing ======= A test exercising the different permutations of the 'l2_miss' bit is added in patch #8. Patchset overview ================= Patch #1 adds the new bit to the tc skb extension and sets it in the bridge driver for packets that encountered a miss. The marking of the packets and the use of this extension is protected by the 'tc_skb_ext_tc' static key in order to keep performance impact to a minimum when the feature is not in use. Patch #2 extends the flow dissector to dissect this information from the tc skb extension into the 'FLOW_DISSECTOR_KEY_META' key. Patch #3 extends the flower classifier to be able to match on the new layer 2 miss metadata. The classifier enables the 'tc_skb_ext_tc' static key upon the installation of the first filter that matches on 'l2_miss' and disables the key upon the removal of the last filter that matches on it. Patch #4 rejects matching on the new metadata in drivers that already support the 'FLOW_DISSECTOR_KEY_META' key. Patches #5-#6 are small preparations in mlxsw. Patch #7 extends mlxsw to be able to match on layer 2 miss. Patch #8 adds a selftest. iproute2 patches can be found here [3]. [1] https://datatracker.ietf.org/doc/html/rfc7432#section-8.3 [2] https://datatracker.ietf.org/doc/html/rfc7432#section-8.5 [3] https://github.com/idosch/iproute2/tree/submit/non_df_filter_v1 [4] https://lore.kernel.org/netdev/20230518113328.1952135-1-idosch@nvidia.com/ [5] https://lore.kernel.org/netdev/20230509070446.246088-1-idosch@nvidia.com/ ==================== Link: https://lore.kernel.org/r/20230529114835.372140-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 23:37:02 -07:00
Ido Schimmel	8c33266ae2	selftests: forwarding: Add layer 2 miss test cases Add test cases to verify that the bridge driver correctly marks layer 2 misses only when it should and that the flower classifier can match on this metadata. Example output: # ./tc_flower_l2_miss.sh TEST: L2 miss - Unicast [ OK ] TEST: L2 miss - Multicast (IPv4) [ OK ] TEST: L2 miss - Multicast (IPv6) [ OK ] TEST: L2 miss - Link-local multicast (IPv4) [ OK ] TEST: L2 miss - Link-local multicast (IPv6) [ OK ] TEST: L2 miss - Broadcast [ OK ] Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 23:37:01 -07:00
Ido Schimmel	caa4c58ab5	mlxsw: spectrum_flower: Add ability to match on layer 2 miss Add the 'fdb_miss' key element to supported key blocks and make use of it to match on layer 2 miss. The key is only supported on Spectrum-{2,3,4}. An error is returned for Spectrum-1 since the key element is not present in any of its key blocks. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 23:37:00 -07:00
Ido Schimmel	0b9cd74b8d	mlxsw: spectrum_flower: Do not force matching on iif Currently, mlxsw only supports the 'ingress_ifindex' field in the 'FLOW_DISSECTOR_KEY_META' key, but subsequent patches are going to add support for the 'l2_miss' field as well. It is valid to only match on 'l2_miss' without 'ingress_ifindex', so do not force matching on it. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 23:37:00 -07:00
Ido Schimmel	d04e265096	mlxsw: spectrum_flower: Split iif parsing to a separate function Currently, mlxsw only supports the 'ingress_ifindex' field in the 'FLOW_DISSECTOR_KEY_META' key, but subsequent patches are going to add support for the 'l2_miss' field as well. Split the parsing of the 'ingress_ifindex' field to a separate function to avoid nesting. No functional changes intended. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 23:37:00 -07:00
Ido Schimmel	f4356947f0	flow_offload: Reject matching on layer 2 miss Adjust drivers that support the 'FLOW_DISSECTOR_KEY_META' key to reject filters that try to match on the newly added layer 2 miss field. Add an extack message to clearly communicate the failure reason to user space. The following users were not patched: 1. mtk_flow_offload_replace(): Only checks that the key is present, but does not do anything with it. 2. mlx5_tc_ct_set_tuple_match(): Used as part of netfilter offload, which does not make use of the new field, unlike tc. 3. get_netdev_from_rule() in nfp: Likewise. Example: # tc filter add dev swp1 egress pref 1 proto all flower skip_sw l2_miss true action drop Error: mlxsw_spectrum: Can't match on "l2_miss". We have an error talking to the kernel Acked-by: Elad Nachman <enachman@marvell.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 23:37:00 -07:00
Ido Schimmel	1a432018c0	net/sched: flower: Allow matching on layer 2 miss Add the 'TCA_FLOWER_L2_MISS' netlink attribute that allows user space to match on packets that encountered a layer 2 miss. The miss indication is set as metadata in the tc skb extension by the bridge driver upon FDB or MDB lookup miss and dissected by the flow dissector to the 'FLOW_DISSECTOR_KEY_META' key. The use of this skb extension is guarded by the 'tc_skb_ext_tc' static key. As such, enable / disable this key when filters that match on layer 2 miss are added / deleted. Tested: # cat tc_skb_ext_tc.py #!/usr/bin/env -S drgn -s vmlinux refcount = prog["tc_skb_ext_tc"].key.enabled.counter.value_() print(f"tc_skb_ext_tc reference count is {refcount}") # ./tc_skb_ext_tc.py tc_skb_ext_tc reference count is 0 # tc filter add dev swp1 egress proto all handle 101 pref 1 flower src_mac 00:11:22:33:44:55 action drop # tc filter add dev swp1 egress proto all handle 102 pref 2 flower src_mac 00:11:22:33:44:55 l2_miss true action drop # tc filter add dev swp1 egress proto all handle 103 pref 3 flower src_mac 00:11:22:33:44:55 l2_miss false action drop # ./tc_skb_ext_tc.py tc_skb_ext_tc reference count is 2 # tc filter replace dev swp1 egress proto all handle 102 pref 2 flower src_mac 00:01:02:03:04:05 l2_miss false action drop # ./tc_skb_ext_tc.py tc_skb_ext_tc reference count is 2 # tc filter del dev swp1 egress proto all handle 103 pref 3 flower # tc filter del dev swp1 egress proto all handle 102 pref 2 flower # tc filter del dev swp1 egress proto all handle 101 pref 1 flower # ./tc_skb_ext_tc.py tc_skb_ext_tc reference count is 0 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 23:37:00 -07:00
Ido Schimmel	d5ccfd90df	flow_dissector: Dissect layer 2 miss from tc skb extension Extend the 'FLOW_DISSECTOR_KEY_META' key with a new 'l2_miss' field and populate it from a field with the same name in the tc skb extension. This field is set by the bridge driver for packets that incur an FDB or MDB miss. The next patch will extend the flower classifier to be able to match on layer 2 misses. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 23:37:00 -07:00
Ido Schimmel	7b4858df3b	skbuff: bridge: Add layer 2 miss indication For EVPN non-DF (Designated Forwarder) filtering we need to be able to prevent decapsulated traffic from being flooded to a multi-homed host. Filtering of multicast and broadcast traffic can be achieved using the following flower filter: # tc filter add dev bond0 egress pref 1 proto all flower indev vxlan0 dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 action drop Unlike broadcast and multicast traffic, it is not currently possible to filter unknown unicast traffic. The classification into unknown unicast is performed by the bridge driver, but is not visible to other layers such as tc. Solve this by adding a new 'l2_miss' bit to the tc skb extension. Clear the bit whenever a packet enters the bridge (received from a bridge port or transmitted via the bridge) and set it if the packet did not match an FDB or MDB entry. If there is no skb extension and the bit needs to be cleared, then do not allocate one as no extension is equivalent to the bit being cleared. The bit is not set for broadcast packets as they never perform a lookup and therefore never incur a miss. A bit that is set for every flooded packet would also work for the current use case, but it does not allow us to differentiate between registered and unregistered multicast traffic, which might be useful in the future. To keep the performance impact to a minimum, the marking of packets is guarded by the 'tc_skb_ext_tc' static key. When 'false', the skb is not touched and an skb extension is not allocated. Instead, only a 5 bytes nop is executed, as demonstrated below for the call site in br_handle_frame(). Before the patch: ``` memset(skb->cb, 0, sizeof(struct br_input_skb_cb)); c37b09: 49 c7 44 24 28 00 00 movq $0x0,0x28(%r12) c37b10: 00 00 p = br_port_get_rcu(skb->dev); c37b12: 49 8b 44 24 10 mov 0x10(%r12),%rax memset(skb->cb, 0, sizeof(struct br_input_skb_cb)); c37b17: 49 c7 44 24 30 00 00 movq $0x0,0x30(%r12) c37b1e: 00 00 c37b20: 49 c7 44 24 38 00 00 movq $0x0,0x38(%r12) c37b27: 00 00 ``` After the patch (when static key is disabled): ``` memset(skb->cb, 0, sizeof(struct br_input_skb_cb)); c37c29: 49 c7 44 24 28 00 00 movq $0x0,0x28(%r12) c37c30: 00 00 c37c32: 49 8d 44 24 28 lea 0x28(%r12),%rax c37c37: 48 c7 40 08 00 00 00 movq $0x0,0x8(%rax) c37c3e: 00 c37c3f: 48 c7 40 10 00 00 00 movq $0x0,0x10(%rax) c37c46: 00 #ifdef CONFIG_HAVE_JUMP_LABEL_HACK static __always_inline bool arch_static_branch(struct static_key *key, bool branch) { asm_volatile_goto("1:" c37c47: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) br_tc_skb_miss_set(skb, false); p = br_port_get_rcu(skb->dev); c37c4c: 49 8b 44 24 10 mov 0x10(%r12),%rax ``` Subsequent patches will extend the flower classifier to be able to match on the new 'l2_miss' bit and enable / disable the static key when filters that match on it are added / deleted. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 23:37:00 -07:00
Zhengchao Shao	36eec020fa	net: sched: fix NULL pointer dereference in mq_attach When use the following command to test: 1)ip link add bond0 type bond 2)ip link set bond0 up 3)tc qdisc add dev bond0 root handle ffff: mq 4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq The kernel reports NULL pointer dereference issue. The stack information is as follows: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Internal error: Oops: 0000000096000006 [#1] SMP Modules linked in: pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : mq_attach+0x44/0xa0 lr : qdisc_graft+0x20c/0x5cc sp : ffff80000e2236a0 x29: ffff80000e2236a0 x28: ffff0000c0e59d80 x27: ffff0000c0be19c0 x26: ffff0000cae3e800 x25: 0000000000000010 x24: 00000000fffffff1 x23: 0000000000000000 x22: ffff0000cae3e800 x21: ffff0000c9df4000 x20: ffff0000c9df4000 x19: 0000000000000000 x18: ffff80000a934000 x17: ffff8000f5b56000 x16: ffff80000bb08000 x15: 0000000000000000 x14: 0000000000000000 x13: 6b6b6b6b6b6b6b6b x12: 6b6b6b6b00000001 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : ffff0000c0be0730 x7 : bbbbbbbbbbbbbbbb x6 : 0000000000000008 x5 : ffff0000cae3e864 x4 : 0000000000000000 x3 : 0000000000000001 x2 : 0000000000000001 x1 : ffff8000090bc23c x0 : 0000000000000000 Call trace: mq_attach+0x44/0xa0 qdisc_graft+0x20c/0x5cc tc_modify_qdisc+0x1c4/0x664 rtnetlink_rcv_msg+0x354/0x440 netlink_rcv_skb+0x64/0x144 rtnetlink_rcv+0x28/0x34 netlink_unicast+0x1e8/0x2a4 netlink_sendmsg+0x308/0x4a0 sock_sendmsg+0x64/0xac ____sys_sendmsg+0x29c/0x358 ___sys_sendmsg+0x90/0xd0 __sys_sendmsg+0x7c/0xd0 __arm64_sys_sendmsg+0x2c/0x38 invoke_syscall+0x54/0x114 el0_svc_common.constprop.1+0x90/0x174 do_el0_svc+0x3c/0xb0 el0_svc+0x24/0xec el0t_64_sync_handler+0x90/0xb4 el0t_64_sync+0x174/0x178 This is because when mq is added for the first time, qdiscs in mq is set to NULL in mq_attach(). Therefore, when replacing mq after adding mq, we need to initialize qdiscs in the mq before continuing to graft. Otherwise, it will couse NULL pointer dereference issue in mq_attach(). And the same issue will occur in the attach functions of mqprio, taprio and htb. ffff:fff1 means that the repalce qdisc is ingress. Ingress does not allow any qdisc to be attached. Therefore, ffff:fff1 is incorrectly used, and the command should be dropped. Fixes: 6ec1c69a8f64 ("net_sched: add classful multiqueue dummy scheduler") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Tested-by: Peilin Ye <peilin.ye@bytedance.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20230527093747.3583502-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 23:31:40 -07:00
Jakub Kicinski	bb50f12c69	Merge branch 'net-sched-fixes-for-sch_ingress-and-sch_clsact' Peilin Ye says: ==================== net/sched: Fixes for sch_ingress and sch_clsact These are v6 fixes for ingress and clsact Qdiscs, including only first 4 patches (already tested and reviewed) from v5. Patch 5 and 6 from previous versions are still under discussion and will be sent separately. [a] https://syzkaller.appspot.com/bug?extid=b53a9c0d1ea4ad62da8b Link to v5: https://lore.kernel.org/r/cover.1684887977.git.peilin.ye@bytedance.com/ Link to v4: https://lore.kernel.org/r/cover.1684825171.git.peilin.ye@bytedance.com/ Link to v3 (incomplete): https://lore.kernel.org/r/cover.1684821877.git.peilin.ye@bytedance.com/ Link to v2: https://lore.kernel.org/r/cover.1684796705.git.peilin.ye@bytedance.com/ Link to v1: https://lore.kernel.org/r/cover.1683326865.git.peilin.ye@bytedance.com/ ==================== Link: https://lore.kernel.org/r/cover.1685388545.git.peilin.ye@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 23:31:06 -07:00
Peilin Ye	9de95df5d1	net/sched: Prohibit regrafting ingress or clsact Qdiscs Currently, after creating an ingress (or clsact) Qdisc and grafting it under TC_H_INGRESS (TC_H_CLSACT), it is possible to graft it again under e.g. a TBF Qdisc: $ ip link add ifb0 type ifb $ tc qdisc add dev ifb0 handle 1: root tbf rate 20kbit buffer 1600 limit 3000 $ tc qdisc add dev ifb0 clsact $ tc qdisc link dev ifb0 handle ffff: parent 1:1 $ tc qdisc show dev ifb0 qdisc tbf 1: root refcnt 2 rate 20Kbit burst 1600b lat 560.0ms qdisc clsact ffff: parent ffff:fff1 refcnt 2 ^^^^^^^^ clsact's refcount has increased: it is now grafted under both TC_H_CLSACT and 1:1. ingress and clsact Qdiscs should only be used under TC_H_INGRESS (TC_H_CLSACT). Prohibit regrafting them. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Fixes: 1f211a1b929c ("net, sched: add clsact qdisc") Tested-by: Pedro Tammela <pctammela@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 23:31:05 -07:00
Peilin Ye	f85fa45d4a	net/sched: Reserve TC_H_INGRESS (TC_H_CLSACT) for ingress (clsact) Qdiscs Currently it is possible to add e.g. an HTB Qdisc under ffff:fff1 (TC_H_INGRESS, TC_H_CLSACT): $ ip link add name ifb0 type ifb $ tc qdisc add dev ifb0 parent ffff:fff1 htb $ tc qdisc add dev ifb0 clsact Error: Exclusivity flag on, cannot modify. $ drgn ... >>> ifb0 = netdev_get_by_name(prog, "ifb0") >>> qdisc = ifb0.ingress_queue.qdisc_sleeping >>> print(qdisc.ops.id.string_().decode()) htb >>> qdisc.flags.value_() # TCQ_F_INGRESS 2 Only allow ingress and clsact Qdiscs under ffff:fff1. Return -EINVAL for everything else. Make TCQ_F_INGRESS a static flag of ingress and clsact Qdiscs. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Fixes: 1f211a1b929c ("net, sched: add clsact qdisc") Tested-by: Pedro Tammela <pctammela@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 23:31:05 -07:00
Peilin Ye	5eeebfe6c4	net/sched: sch_clsact: Only create under TC_H_CLSACT clsact Qdiscs are only supposed to be created under TC_H_CLSACT (which equals TC_H_INGRESS). Return -EOPNOTSUPP if 'parent' is not TC_H_CLSACT. Fixes: 1f211a1b929c ("net, sched: add clsact qdisc") Tested-by: Pedro Tammela <pctammela@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 23:31:05 -07:00
Peilin Ye	c7cfbd1150	net/sched: sch_ingress: Only create under TC_H_INGRESS ingress Qdiscs are only supposed to be created under TC_H_INGRESS. Return -EOPNOTSUPP if 'parent' is not TC_H_INGRESS, similar to mq_init(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/0000000000006cf87705f79acf1a@google.com/ Tested-by: Pedro Tammela <pctammela@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 23:31:04 -07:00
Linus Torvalds	48b1320a67	for-6.4-rc4-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmR2TDwACgkQxWXV+ddt WDsMvQ/+KgUXW+Liu5BaOyD5UzPL4BgHWiPTmJyRpsWTkGm8LE/yRCRoxqp1XbU+ nOjQpjkxI+ziRgKpDTAGFK/w51TV9ECM5wyZiXx93TO6iaTOuYCtSnSsWylzEC1H q9I3znLJSWrnBPTktwTZ29rvKvXj1k3th8ypyI9ho7N+3H0Uzt2VIPxrH2oVXZNz f2vkjSX9pKGN5zxM2ahd3Nde4Ma6yAlJLD+pnlYK20zH/30cAXdJsUCsUqQLXDL1 sUR++Br7qym3Wqn9Qa5R71IPJ1FieW2NaHgAz4dBBFfqe5PR7YCGL/Md6G+CFJ1E qLLFOWpELpqkeQdvivBnMZWqgpw+54Pdfuqxg7VylEmUc1y6CK4ab5XctpXIf75h 6bK0RPZ7D9jZl6JukkWftoS4XnW2cseyEfHneDMZDty4v1bxwR6g7i4ZTym413Gx Td1Z+G6BN5O5ih0Pc0CgSS3QnndWTUl3LAHiuxRErrK4dxpeuQlDTGWWY7YVyRPJ O9yC24GbHyWYBYHtNACEn6/GlXQjtswhjlHxqONmQfnstZL7Fz8si9EQEOWwssJE PIlb022a1mvR42yHr64TE0SzpDZbMY8mnULAsSrWgPXh3IAt1ztUuJajcFs84MZr qWewi4F/3wDAB0m1lUbAOmeBbpAw5gSGHhwBrjdK3EWJr2kxQ50= =viyP -----END PGP SIGNATURE----- Merge tag 'for-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "One bug fix and two build warning fixes: - call proper end bio callback for metadata RAID0 in a rare case of an unaligned block - fix uninitialized variable (reported by gcc 10.2) - fix warning about potential access beyond array bounds on mips64 with 64k pages (runtime check would not allow that)" * tag 'for-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix csum_tree_block page iteration to avoid tripping on -Werror=array-bounds btrfs: fix an uninitialized variable warning in btrfs_log_inode btrfs: call btrfs_orig_bbio_end_io in btrfs_end_bio_work	2023-05-30 17:23:50 -04:00
Linus Torvalds	afead42fdf	perf tools fixes for v6.4: 2nd batch - Fix BPF CO-RE naming convention for checking the availability of fields on 'union perf_mem_data_src' on the running kernel. - Remove the use of llvm-strip on BPF skel object files, not needed, fixes a build breakage when the llvm package, that contains it in most distros, isn't installed. - Fix tools that use both evsel->{bpf_counter_list,bpf_filters}, removing them from a union. - Remove extra "--" from the 'perf ftrace latency' --use-nsec option, previously it was working only when using the '-n' alternative. - Don't stop building when both binutils-devel and a C++ compiler isn't available to compile the alternative C++ demangle support code, disable that feature instead. - Sync the linux/in.h and coresight-pmu.h header copies with the kernel sources. - Fix relative include path to cs-etm.h. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZHY9egAKCRCyPKLppCJ+ JzbBAQCv+i0j/+garWKCTSe33yztzQGS6jxu/YzQrbQO427DMwEA9fJgp/r2OQC5 wMM5gng2fPaHe6Hs4cnPL/SzMxLC2gQ= =fHRR -----END PGP SIGNATURE----- Merge tag 'perf-tools-fixes-for-v6.4-2-2023-05-30' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix BPF CO-RE naming convention for checking the availability of fields on 'union perf_mem_data_src' on the running kernel - Remove the use of llvm-strip on BPF skel object files, not needed, fixes a build breakage when the llvm package, that contains it in most distros, isn't installed - Fix tools that use both evsel->{bpf_counter_list,bpf_filters}, removing them from a union - Remove extra "--" from the 'perf ftrace latency' --use-nsec option, previously it was working only when using the '-n' alternative - Don't stop building when both binutils-devel and a C++ compiler isn't available to compile the alternative C++ demangle support code, disable that feature instead - Sync the linux/in.h and coresight-pmu.h header copies with the kernel sources - Fix relative include path to cs-etm.h * tag 'perf-tools-fixes-for-v6.4-2-2023-05-30' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf evsel: Separate bpf_counter_list and bpf_filters, can be used at the same time tools headers UAPI: Sync the linux/in.h with the kernel sources perf cs-etm: Copy kernel coresight-pmu.h header perf bpf: Do not use llvm-strip on BPF binary perf build: Don't compile demangle-cxx.cpp if not necessary perf arm: Fix include path to cs-etm.h perf bpf filter: Fix a broken perf sample data naming for BPF CO-RE perf ftrace latency: Remove unnecessary "--" from --use-nsec option	2023-05-30 17:12:13 -04:00
Linus Torvalds	1683c329b6	regmap: Fixes for v6.4 The most important fix here is for missing dropping of the RCU read lock when syncing maple tree register caches, the physical devices I have that use the code don't do any syncing so I'd only ever tested this with virtual devices and missed the fact that we need to drop the lock in order to write to buses that need to sleep. Otherwise there's a fix for an edge case when splitting up large batch writes which has been lurking for a long time, a check to make sure nobody writes new drivers with a bug that was found in several SoundWire drivers and a tweak to the way the new kunit tests are enabled to ensure they don't cause regmap to be enabled when it wouldn't otherwise be. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmR15dQACgkQJNaLcl1U h9BGKgf+NFD98ZNPsvOoFqAdG33yTDWISTf8FtIakM074vhz0xrBTFolqf/wmBGF 8VEabgfpmLSyuGEizjK/YLDK4f143XBgivyXUH04vGeAQUx4MfBaf0DSxBlQLh4K K3LebH5QNQqdv9Qx7xMy7Y06r3KiQUxUvM8AINukspY0pmY8LtxyGLvv8T3kSpjz WU28umwPrNsr36TcSuowncvZmMJ7Y1kSEnNjer1cdVlJVn4cFzkl/Sashmy8Rg3g 5Qmbg6Kuc6mLetoshtsculYntoLUJeCt6oX01Iq3cxVqkNi+mpOgwMOgjFFlwtrc fela1GXUFEKe31FKyQPE0DRKv697eg== =iAaD -----END PGP SIGNATURE----- Merge tag 'regmap-fix-v6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap fixes from Mark Brown: "The most important fix here is for missing dropping of the RCU read lock when syncing maple tree register caches, the physical devices I have that use the code don't do any syncing so I'd only ever tested this with virtual devices and missed the fact that we need to drop the lock in order to write to buses that need to sleep. Otherwise there's a fix for an edge case when splitting up large batch writes which has been lurking for a long time, a check to make sure nobody writes new drivers with a bug that was found in several SoundWire drivers and a tweak to the way the new kunit tests are enabled to ensure they don't cause regmap to be enabled when it wouldn't otherwise be" * tag 'regmap-fix-v6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: maple: Drop the RCU read lock while syncing registers regmap: sdw: check for invalid multi-register writes config regmap: Account for register length when chunking regmap: REGMAP_KUNIT should not select REGMAP	2023-05-30 17:07:25 -04:00
Linus Torvalds	6d86b56f54	modules-6.4-rc5 A fix is provided for ia64. Even though ia64 is on life support it helps to fix issues if we can. Thanks to Linus for doing tons of the ia64 debugging. -----BEGIN PGP SIGNATURE----- iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmR2JPsSHG1jZ3JvZkBr ZXJuZWwub3JnAAoJEM4jHQowkoinwGIQAIrm0qpwVgDvh98B/anYDVbPpdn1U+/I s9LBUMZCMIVEHGoOUU7YV3QGMh27OFvsNW72ggrgPuCbOgPzAfyxLJZoNY1nLURO ZWSi2Jg4Om82BTP/Vw79yDjikLjJWAUKT/nHuyJxOnLKVsjKrlKuX+UAPXCqMv1O yukZnCoQx6c57iRFLrpGq/+OM4Y/vZ2w8zeb5/HOSvVglkIITlXvcMUXmk0JxtS5 Rr5R14F59BZnpQD5F3hYYvIWycM2DYNdHA5FFPLt1US6TAXjYlk/hf6jgEmHxvHI jN6U3qG0Wm8VO9ZlPXzwKYTPmHhc5llXqSlkXILuYke79w1dS1BINJb7dg3LcGna eq2IX0ZwxC4SicvDDFWZGmriwIIYbR7jWrSmtTr7W1HIb5mICIrXSuG0150xaCYv fc79brpBRm5me+py+WYkFKqT3DKkGHdB85+c7iXPToZeHY8v6581t3R4XQSHzPF6 JY0Uhlsb17Korbsl8Iw2FFQ8K8DRjLpeeQnDU7iwEuGpZrPSx872o8pDNH1rnmIp wIOjfnjWc5arlCPjWWc0Ga/np7ewBP4Zw4Ff4AFyZG1ISrTmzFBWQI7TqyemjC9v RTIBN5QmJIW6Zai/7oL5xNapRJCO250+dg1G8uOka2dklfnLdKxIBafWBa+/Znmn ZpmhFljq1w3y =h/VF -----END PGP SIGNATURE----- Merge tag 'modules-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull modules fix from Luis Chamberlain: "A fix is provided for ia64. Even though ia64 is on life support it helps to fix issues if we can. Thanks to Linus for doing tons of the ia64 debugging" * tag 'modules-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: module: fix module load for ia64	2023-05-30 15:49:40 -04:00
Theodore Ts'o	eb1f822c76	ext4: enable the lazy init thread when remounting read/write In commit a44be64bbecb ("ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled") we defer clearing tyhe SB_RDONLY flag in struct super. However, we didn't defer when we checked sb_rdonly() to determine the lazy itable init thread should be enabled, with the next result that the lazy inode table initialization would not be properly started. This can cause generic/231 to fail in ext4's nojournal mode. Fix this by moving when we decide to start or stop the lazy itable init thread to after we clear the SB_RDONLY flag when we are remounting the file system read/write. Fixes a44be64bbecb ("ext4: don't clear SB_RDONLY when remounting r/w until...") Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20230527035729.1001605-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2023-05-30 15:33:57 -04:00
Jan Kara	1077b2d53e	ext4: fix fsync for non-directories Commit e360c6ed7274 ("ext4: Drop special handling of journalled data from ext4_sync_file()") simplified ext4_sync_file() by dropping special handling of journalled data mode as it was not needed anymore. However that branch was also used for directories and symlinks and since the fastcommit code does not track metadata changes to non-regular files, the change has caused e.g. fsync(2) on directories to not commit transaction as it should. Fix the problem by adding handling for non-regular files. Fixes: e360c6ed7274 ("ext4: Drop special handling of journalled data from ext4_sync_file()") Reported-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/all/ZFqO3xVnmhL7zv1x@debian-BULLSEYE-live-builder-AMD64 Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/r/20230524104453.8734-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2023-05-30 15:33:57 -04:00
Theodore Ts'o	aff3bea953	ext4: add lockdep annotations for i_data_sem for ea_inode's Treat i_data_sem for ea_inodes as being in their own lockdep class to avoid lockdep complaints about ext4_setattr's use of inode_lock() on normal inodes potentially causing lock ordering with i_data_sem on ea_inodes in ext4_xattr_inode_write(). However, ea_inodes will be operated on by ext4_setattr(), so this isn't a problem. Cc: stable@kernel.org Link: https://syzkaller.appspot.com/bug?extid=298c5d8fb4a128bc27b0 Reported-by: syzbot+298c5d8fb4a128bc27b0@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20230524034951.779531-5-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2023-05-30 15:33:57 -04:00
Theodore Ts'o	2bc7e7c1a3	ext4: disallow ea_inodes with extended attributes An ea_inode stores the value of an extended attribute; it can not have extended attributes itself, or this will cause recursive nightmares. Add a check in ext4_iget() to make sure this is the case. Cc: stable@kernel.org Reported-by: syzbot+e44749b6ba4d0434cd47@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20230524034951.779531-4-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2023-05-30 15:33:57 -04:00
Theodore Ts'o	b928dfdcb2	ext4: set lockdep subclass for the ea_inode in ext4_xattr_inode_cache_find() If the ea_inode has been pushed out of the inode cache while there is still a reference in the mb_cache, the lockdep subclass will not be set on the inode, which can lead to some lockdep false positives. Fixes: 33d201e0277b ("ext4: fix lockdep warning about recursive inode locking") Cc: stable@kernel.org Reported-by: syzbot+d4b971e744b1f5439336@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20230524034951.779531-3-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2023-05-30 15:33:50 -04:00
Jakub Kicinski	2e246bca98	Merge branch 'devlink-move-port-ops-into-separate-structure' Jiri Pirko says: ==================== devlink: move port ops into separate structure In devlink, some of the objects have separate ops registered alongside with the object itself. Port however have ops in devlink_ops structure. For drivers what register multiple kinds of ports with different ops this is not convenient. This patchset changes does following changes: 1) Introduces devlink_port_ops with functions that allow devlink port to be registered passing a pointer to driver port ops. (patch #1) 2) Converts drivers to define port_ops and register ports passing the ops pointer. (patches #2, #3, #4, #6, #8, and #9) 3) Moves ops from devlink_ops struct to devlink_port_ops. (patches #5, #7, #10-15) No functional changes. ==================== Link: https://lore.kernel.org/r/20230526102841.2226553-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 10:32:22 -07:00
Jiri Pirko	4b5ed2b5a1	devlink: save devlink_port_ops into a variable in devlink_port_function_validate() Now when the original ops variable is removed, introduce it again but this time for devlink_port_ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 10:32:20 -07:00
Jiri Pirko	216ba9f4ad	devlink: move port_del() to devlink_port_ops Move port_del() from devlink_ops into newly introduced devlink_port_ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 10:32:20 -07:00
Jiri Pirko	216aa67f3e	devlink: move port_fn_state_get/set() to devlink_port_ops Move port_fn_state_get/set() from devlink_ops into newly introduced devlink_port_ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 10:32:20 -07:00
Jiri Pirko	4a490d7154	devlink: move port_fn_migratable_get/set() to devlink_port_ops Move port_fn_migratable_get/set() from devlink_ops into newly introduced devlink_port_ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 10:32:20 -07:00
Jiri Pirko	933c13275c	devlink: move port_fn_roce_get/set() to devlink_port_ops Move port_fn_roce_get/set() from devlink_ops into newly introduced devlink_port_ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 10:32:20 -07:00
Jiri Pirko	71c93e37cf	devlink: move port_fn_hw_addr_get/set() to devlink_port_ops Move port_fn_hw_addr_get/set() from devlink_ops into newly introduced devlink_port_ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 10:32:20 -07:00
Jiri Pirko	aa3aff8264	mlx5: register devlink ports with ops Use newly introduce devlink port registration function variant and register devlink port passing ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 10:32:20 -07:00
Jiri Pirko	7bfb3d0a83	sfc: register devlink port with ops Use newly introduce devlink port registration function variant and register devlink port passing ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 10:32:20 -07:00
Jiri Pirko	65a4c44bf9	devlink: move port_type_set() op into devlink_port_ops Move port_type_set() from devlink_ops into newly introduced devlink_port_ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 10:32:20 -07:00
Jiri Pirko	8a756d91d2	mlx4: register devlink port with ops Use newly introduce devlink port registration function variant and register devlink port passing ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-30 10:32:20 -07:00

... 3 4 5 6 7 ...

1186969 Commits