linux

Author	SHA1	Message	Date
Vladimir Oltean	cf1c39d3b3	net: dsa: avoid one dsa_to_port() in dsa_slave_change_mtu We could retrieve the cpu_dp pointer directly from the "dp" we already have, no need to resort to dsa_to_port(ds, port). This change also removes the need for an "int port", so that is also deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-20 10:34:34 +01:00
Vladimir Oltean	b2033a05a7	net: dsa: use dsa_tree_for_each_user_port in dsa_slave_change_mtu Use the more conventional iterator over user ports instead of explicitly ignoring them, and use the more conventional name "other_dp" instead of "dp_iter", for readability. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-20 10:34:34 +01:00
Vladimir Oltean	726816a129	net: dsa: make cross-chip notifiers more efficient for host events To determine whether a given port should react to the port targeted by the notifier, dsa_port_host_vlan_match() and dsa_port_host_address_match() look at the position of the switch port currently executing the notifier relative to the switch port for which the notifier was emitted. To maintain stylistic compatibility with the other match functions from switch.c, the host address and host VLAN match functions take the notifier information about the targeted port, switch and tree indices as arguments. However, these functions only use that information to retrieve the struct dsa_port *targeted_dp, which is invariant across the outer loop that calls them. So it makes more sense to calculate the targeted dp only once, and pass it to them as an argument. Furthermore, the targeted dp is actually known at the time the call to dsa_port_notify() is made. It is just that we decide to save only the indices of the port, switch and tree in the notifier structure, only to retrace our steps and find the dp again using dsa_switch_find() and dsa_to_port(). But both of these functions are relatively expensive, since they need to iterate through lists. It appears more straightforward to make all notifiers pass the targeted dp inside their info structure, and have the code that needs the indices look at info->dp->index instead of info->port, info->dp->ds->index instead of info->sw_index, or info->dp->ds->dst->index instead of info->tree_index. For the sake of consistency, all cross-chip notifiers are converted to pass the "dp" directly. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-20 10:34:34 +01:00
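A minimal C sketch of the data-structure change described in the entry above. The toy_* structs, the two notifier info structs and lookup_targeted_dp() are hypothetical stand-ins, not the kernel's definitions; the linear scan only imitates the list walks that dsa_switch_find() and dsa_to_port() perform. It shows that once the info carries the port pointer, every index the old code read from the info becomes a dereference through it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the DSA tree/switch/port hierarchy (not the kernel's structs). */
struct toy_tree { int index; };
struct toy_switch { int index; struct toy_tree *dst; };
struct toy_port { int index; struct toy_switch *ds; };

/* Before: the notifier carried only indices. */
struct old_notifier_info { int tree_index; int sw_index; int port; };

/* After: the notifier carries the targeted port itself. */
struct new_notifier_info { const struct toy_port *dp; };

/* A receiver of an old-style notifier has to search for the port again;
 * this linear scan stands in for the list walks of the real lookups. */
static const struct toy_port *lookup_targeted_dp(const struct toy_port *ports, size_t n,
						 const struct old_notifier_info *info)
{
	for (size_t i = 0; i < n; i++) {
		const struct toy_port *dp = &ports[i];

		if (dp->index == info->port && dp->ds->index == info->sw_index &&
		    dp->ds->dst->index == info->tree_index)
			return dp;
	}
	return NULL;
}

/* With the pointer available, each index is a plain dereference. */
static int info_port(const struct new_notifier_info *info)
{
	assert(info->dp != NULL);
	return info->dp->index;            /* was info->port */
}

static int info_sw_index(const struct new_notifier_info *info)
{
	assert(info->dp != NULL);
	return info->dp->ds->index;        /* was info->sw_index */
}

static int info_tree_index(const struct new_notifier_info *info)
{
	assert(info->dp != NULL);
	return info->dp->ds->dst->index;   /* was info->tree_index */
}
```

The lookup is paid once by whoever emits the notifier (in the sketch, by the caller of lookup_targeted_dp()), instead of once per receiving switch.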
Vladimir Oltean	8e9e678e47	net: dsa: move reset of VLAN filtering to dsa_port_switchdev_unsync_attrs In dsa_port_switchdev_unsync_attrs() there is a comment that resetting the VLAN filtering isn't done where it is expected. And since commit 108dc8741c20 ("net: dsa: Avoid cross-chip syncing of VLAN filtering"), there is no reason to handle this in switch.c either. Therefore, move the logic to port.c, and adapt it slightly to the data structures and naming conventions from there. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-20 10:34:34 +01:00
Nicolas Dichtel	c6a4254c18	doc/ip-sysctl: add bc_forwarding Let's describe this sysctl. Fixes: 5cbf777cfdf6 ("route: add support for directed broadcast forwarding") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-20 10:31:43 +01:00
Song Liu	559089e0a9	vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Huge page backed vmalloc memory could benefit performance in many cases. However, some users of vmalloc may not be ready to handle huge pages for various reasons: hardware constraints, potential page splits, etc. VM_NO_HUGE_VMAP was introduced to allow vmalloc users to opt out of huge pages. However, it is not easy to track down all the users that require the opt-out, as the allocations are passed through different stacks and may cause issues in different layers. To address this issue, replace VM_NO_HUGE_VMAP with an opt-in flag, VM_ALLOW_HUGE_VMAP, so that users that benefit from huge pages can ask for them specifically. Also, remove vmalloc_no_huge() and add the opt-in helper vmalloc_huge(). Fixes: fac54e2bfb5b ("x86/Kconfig: Select HAVE_ARCH_HUGE_VMALLOC with HAVE_ARCH_HUGE_VMAP") Link: https://lore.kernel.org/netdev/14444103-d51b-0fb3-ee63-c3f182f0b546@molgen.mpg.de/ Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org> Reviewed-by: Rik van Riel <riel@surriel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-04-19 12:08:57 -07:00
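A hedged C model of the opt-out versus opt-in policy change described in the vmalloc entry above. The SKETCH_* flag values and the two policy functions are illustrative assumptions, not vmalloc's actual code or flag values; the point is that a caller passing no flags at all gets huge mappings under the old scheme and small ones under the new.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; not the kernel's VM_* values. */
#define SKETCH_NO_HUGE_VMAP    (1u << 0) /* old scheme: callers opt out */
#define SKETCH_ALLOW_HUGE_VMAP (1u << 1) /* new scheme: callers opt in */

static_assert((SKETCH_NO_HUGE_VMAP & SKETCH_ALLOW_HUGE_VMAP) == 0,
	      "flags must be distinct bits");

/* Old policy: any caller that forgot the opt-out flag gets huge pages. */
static bool old_policy_uses_huge(unsigned int flags, bool arch_has_huge_vmap)
{
	return arch_has_huge_vmap && !(flags & SKETCH_NO_HUGE_VMAP);
}

/* New policy: only callers that explicitly asked get huge pages;
 * everyone else keeps small pages by default. */
static bool new_policy_uses_huge(unsigned int flags, bool arch_has_huge_vmap)
{
	return arch_has_huge_vmap && (flags & SKETCH_ALLOW_HUGE_VMAP);
}
```

With the opt-in default, a caller that was never audited can no longer be handed huge pages by accident, which is the tracking problem the commit message describes.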
Linus Torvalds	b7f73403a3	spi: Fixes for v5.18 A few more fixes for SPI, plus one new PCI ID for another Intel chipset. All device specific stuff. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmJe030ACgkQJNaLcl1U h9BTnwf/VXTuq7Qn+xrWaA6tGJuYijgyc8vVUylutqJl6LaAMnk7NyPS5KWd1qx2 0Yejp8R961YxG6xjduiUJgWnAU0MPu0JOVnR8s0mt2MBWDuLh53aQBgWKvm247Sz kN1mkSJcOWjzW0kVoY6XO8WW/Nofa8POtTR0CSxuwNByp6AGPay67BpyL586wej2 D/wUXnU7FMIOgE/GQ0OQJrbPQVPaqEGLjJrAVyszZxqfROdR6CHUODE2KtZ+EuGU Je36I5W2F6eXiyMLNl9bdfXU7qMxIg66MPTFeeIM7QcfpscqTZVeo+0zMEaGXGvc cg1ezRSq5wE8LisA8ZFUwVmHtSYw5A== =7AEL -----END PGP SIGNATURE----- Merge tag 'spi-fix-v5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A few more fixes for SPI, plus one new PCI ID for another Intel chipset. All device specific stuff" * tag 'spi-fix-v5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: atmel-quadspi: Fix the buswidth adjustment between spi-mem and controller spi: cadence-quadspi: fix incorrect supports_op() return value spi: intel: Add support for Raptor Lake-S SPI serial flash spi: spi-mtk-nor: initialize spi controller after resume	2022-04-19 10:30:43 -07:00
Christian Brauner	705191b03d	fs: fix acl translation Last cycle we extended the idmapped mounts infrastructure to support idmapped mounts of idmapped filesystems (no such filesystem exists yet). Since then, an idmapped mount means a mount whose idmapping is different from the filesystem's idmapping. While doing that work we failed to adapt the acl translation helpers. They still assume that checking for the identity mapping is enough, but they need to use the no_idmapping() helper instead. Note, POSIX ACLs are always translated right at the userspace-kernel boundary using the caller's current idmapping and the initial idmapping. The order depends on whether we're coming from or going to userspace. The filesystem's idmapping doesn't matter at the border. Consequently, if a non-idmapped mount is passed we need to make sure to always pass the initial idmapping as the mount's idmapping and not the filesystem idmapping. Since the filesystem idmapping is irrelevant here, passing it would yield invalid ids and prevent setting acls for filesystems that are mountable in a userns and support posix acls (tmpfs and fuse). I verified the regression reported in [1] and verified that this patch fixes it. A regression test will be added to xfstests in parallel. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215849 [1] Fixes: bd303368b776 ("fs: support mapped mounts of mapped filesystems") Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> # 5.17 Cc: <regressions@lists.linux.dev> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-04-19 10:19:02 -07:00
Pavel Pisa	cfdb2f365c	MAINTAINERS: Add maintainers for CTU CAN FD IP core driver This patch adds an entry for the CTU CAN FD IP to the maintainers file. Link: https://lore.kernel.org/all/2cc77e2999d9688bed155e4c7f7807e46d1bf9e3.1647904780.git.pisa@cmp.felk.cvut.cz Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-04-19 17:12:15 +02:00
Pavel Pisa	c3a0addefb	docs: ctucanfd: CTU CAN FD open-source IP core documentation. CTU CAN FD IP core documentation based on Martin Jeřábek's diploma thesis Open-source and Open-hardware CAN FD Protocol Support https://dspace.cvut.cz/handle/10467/80366 . Link: https://lore.kernel.org/all/692b965999ff6c272239df0fe1c76b68d02b134d.1647932262.git.pisa@cmp.felk.cvut.cz Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz> Signed-off-by: Martin Jerabek <martin.jerabek01@gmail.com> Signed-off-by: Ondrej Ille <ondrej.ille@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-04-19 17:12:14 +02:00
Pavel Pisa	e8f0c23a24	can: ctucanfd: CTU CAN FD open-source IP core - platform/SoC support. Platform bus adaptation for CTU CAN FD open-source IP core. The core has been tested together with OpenCores SJA1000 modified to be CAN FD frame tolerant on MicroZed Zynq based MZ_APO education kits designed by Petr Porazil from PiKRON.com company. FPGA design https://gitlab.fel.cvut.cz/canbus/zynq/zynq-can-sja1000-top. The kit description at the Computer Architectures course pages https://cw.fel.cvut.cz/wiki/courses/b35apo/documentation/mz_apo/start . Kit carrier board and mechanics design source files https://gitlab.com/pikron/projects/mz_apo/microzed_apo The work is documented in Martin Jeřábek's diploma thesis Open-source and Open-hardware CAN FD Protocol Support https://dspace.cvut.cz/handle/10467/80366 . Link: https://lore.kernel.org/all/4d5c53499bafe7717815f948801bd5aedaa05c12.1647904780.git.pisa@cmp.felk.cvut.cz Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz> Signed-off-by: Martin Jerabek <martin.jerabek01@gmail.com> Signed-off-by: Ondrej Ille <ondrej.ille@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-04-19 17:12:14 +02:00
Pavel Pisa	792a5b678e	can: ctucanfd: CTU CAN FD open-source IP core - PCI bus support. PCI bus adaptation for CTU CAN FD open-source IP core. The project providing FPGA design for Intel EP4CGX15 based DB4CGX15 PCIe board with PiKRON.com designed transceiver riser shield is available at https://gitlab.fel.cvut.cz/canbus/pcie-ctucanfd . Link: https://lore.kernel.org/all/a81333e206a9bcf9434797f6f54d8664775542e2.1647904780.git.pisa@cmp.felk.cvut.cz Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz> Signed-off-by: Martin Jerabek <martin.jerabek01@gmail.com> Signed-off-by: Ondrej Ille <ondrej.ille@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-04-19 17:12:14 +02:00
Martin Jerabek	2dcb8e8782	can: ctucanfd: add support for CTU CAN FD open-source IP core - bus independent part. This driver adds support for the CTU CAN FD open-source IP core. More documentation and core sources at project page (https://gitlab.fel.cvut.cz/canbus/ctucanfd_ip_core). The core integration to Xilinx Zynq system as platform driver is available (https://gitlab.fel.cvut.cz/canbus/zynq/zynq-can-sja1000-top). Implementation on Intel FPGA based PCI Express board is available from project (https://gitlab.fel.cvut.cz/canbus/pcie-ctucanfd). More about CAN bus related projects used and developed at CTU FEE at https://canbus.pages.fel.cvut.cz/ . Link: https://lore.kernel.org/all/1906e4941560ae2ce4b8d181131fd4963aa31611.1647904780.git.pisa@cmp.felk.cvut.cz Signed-off-by: Martin Jerabek <martin.jerabek01@gmail.com> Signed-off-by: Ondrej Ille <ondrej.ille@gmail.com> Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-04-19 17:12:14 +02:00
Pavel Pisa	1da9d6e35b	dt-bindings: net: can: binding for CTU CAN FD open-source IP core. The device-tree bindings for open-source/open-hardware CAN FD IP core designed at the Czech Technical University in Prague. CTU CAN FD IP core and other CTU CAN bus related projects listing and documentation page http://canbus.pages.fel.cvut.cz/ Link: https://lore.kernel.org/all/c5a37fc470ae065b21e79caa65863539393c0d7c.1647904780.git.pisa@cmp.felk.cvut.cz Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz> Reviewed-by: Rob Herring <robh@kernel.org> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-04-19 17:12:14 +02:00
Pavel Pisa	fb23e43a0a	dt-bindings: vendor-prefix: add prefix for the Czech Technical University in Prague. The Czech Technical University in Prague (CTU) is one of the biggest and oldest (founded 1707) technical universities in Europe. The abbreviation in Czech language is ČVUT according to official name in Czech language České vysoké učení technické v Praze The English translation The Czech Technical University in Prague The university pages in English https://www.cvut.cz/en Link: https://lore.kernel.org/all/ff3a7216114fcd83530e70b994ef0e4277ddf000.1647904780.git.pisa@cmp.felk.cvut.cz Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz> Acked-by: Rob Herring <robh@kernel.org> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-04-19 17:12:14 +02:00
Marc Kleine-Budde	c6f2a617a0	can: mcp251xfd: add support for mcp251863 The MCP251863 device is a CAN-FD controller (MCP2518FD) with an integrated transceiver (ATA6563). This patch adds support for the new device. Link: https://lore.kernel.org/all/20220419072805.2840340-3-mkl@pengutronix.de Cc: Thomas Kopp <thomas.kopp@microchip.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-04-19 17:12:13 +02:00
Marc Kleine-Budde	6211197648	dt-binding: can: mcp251xfd: add binding information for mcp251863 The MCP251863 device is a CAN-FD controller (MCP2518FD) with an integrated Transceiver (ATA6563). Add the microchip,mcp251863 as a new compatible to the binding. Link: https://lore.kernel.org/all/20220419072805.2840340-2-mkl@pengutronix.de Cc: devicetree@vger.kernel.org Cc: Thomas Kopp <thomas.kopp@microchip.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-04-19 17:12:13 +02:00
Wolfram Sang	44b6b105dd	dt-bindings: can: renesas,rcar-canfd: document r8a77961 support This patch adds documentation for the r8a77961 to the renesas,rcar-canfd binding. Link: https://lore.kernel.org/all/20220401153743.77871-1-wsa+renesas@sang-engineering.com Cc: devicetree@vger.kernel.org Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-04-19 17:12:13 +02:00
Marc Kleine-Budde	ae38fda029	can: xilinx_can: mark bit timing constants as const This patch marks the bit timing constants as const. Fixes: c223da689324 ("can: xilinx_can: Add support for CANFD FD frames") Link: https://lore.kernel.org/all/20220317203119.792552-1-mkl@pengutronix.de Cc: Appana Durga Kedareswara rao <appana.durga.rao@xilinx.com> Cc: Naga Sureshkumar Relli <naga.sureshkumar.relli@xilinx.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-04-19 17:12:13 +02:00
Lukas Bulwahn	badea4fc70	MAINTAINERS: rectify entry for XILINX CAN DRIVER Commit 7843d3c8e5e6 ("dt-bindings: can: xilinx_can: Convert Xilinx CAN binding to YAML") converts xilinx_can.txt to xilinx,can.yaml, but did not adjust its reference in MAINTAINERS. Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a broken reference. Repair this file reference in XILINX CAN DRIVER. Fixes: 7843d3c8e5e6 ("dt-bindings: can: xilinx_can: Convert Xilinx CAN binding to YAML") Link: https://lore.kernel.org/all/20220321122840.17841-1-lukas.bulwahn@gmail.com Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-04-19 17:12:13 +02:00
Minghao Chi	e6ec837905	can: flexcan: using pm_runtime_resume_and_get instead of pm_runtime_get_sync Using pm_runtime_resume_and_get is more appropriate for simplifying code Link: https://lore.kernel.org/all/20220419081449.2574026-1-chi.minghao@zte.com.cn Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-04-19 17:12:12 +02:00
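A sketch of the error-handling pattern that pm_runtime_resume_and_get() wraps, using a toy device instead of struct device. pm_runtime_get_sync() increments the usage counter even when resuming fails, so every caller had to drop the reference on its error path; the helper does that itself and returns 0 on success. The toy_* names are hypothetical and only model the counter.

```c
#include <assert.h>

/* Toy device: usage_count mimics the runtime-PM usage counter; resume_ret is
 * what the simulated resume returns (negative errno on failure, 0 or a
 * positive value on success). */
struct toy_dev {
	int usage_count;
	int resume_ret;
};

/* Like pm_runtime_get_sync(): the counter is bumped even if resume fails. */
static int toy_get_sync(struct toy_dev *d)
{
	d->usage_count++;
	return d->resume_ret;
}

/* Like pm_runtime_put_noidle(): drop the reference without idling. */
static void toy_put_noidle(struct toy_dev *d)
{
	d->usage_count--;
	assert(d->usage_count >= 0);
}

/* The pattern the helper packages: drop the reference on failure and
 * normalise success to 0, so callers need only "if (ret) return ret;". */
static int toy_resume_and_get(struct toy_dev *d)
{
	int ret = toy_get_sync(d);

	if (ret < 0) {
		toy_put_noidle(d);
		return ret;
	}
	return 0;
}
```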
Christophe Leroy	bb75e352d7	can: mscan: mpc5xxx_can: Prepare cleanup of powerpc's asm/prom.h powerpc's asm/prom.h brings some headers that it doesn't need itself. In order to clean it up, first add missing headers in users of asm/prom.h Link: https://lore.kernel.org/all/878888f9057ad2f66ca0621a0007472bf57f3e3d.1648833432.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-04-19 17:12:12 +02:00
Kris Bahnsen	20c7258980	can: Fix Links to Technologic Systems web resources Technologic Systems has rebranded as embeddedTS with the current domain eventually going offline. Update web/doc URLs to correct resource locations. Link: https://lore.kernel.org/all/20220329201229.16279-1-kris@embeddedTS.com Signed-off-by: Kris Bahnsen <kris@embeddedTS.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-04-19 17:12:12 +02:00
Marc Kleine-Budde	85d4eb2a3d	can: bittiming: can_calc_bittiming(): prefer small bit rate pre-scalers over larger ones The CiA (CAN in Automation) lists several recommendations in the article "Recommendation for the CAN FD bit-timing" [1] in their Newsletter 1/2018; one of them is:
| Recommendation 3: Choose BRPA and BRPD as low as possible
[1] https://can-newsletter.org/uploads/media/raw/f6a36d1461371a2f86ef0011a513712c.pdf
With the current bit timing algorithm, Srinivas Neeli noticed that on the Xilinx Versal ACAP board the CAN data bit timing parameters are not calculated optimally. For most bit rates, the bit rate prescaler (BRP) is != 1, although it's possible to configure the requested bit rate with a prescaler of 1:
| Data Bit timing parameters for xilinx_can_fd2i with 79.999999 MHz ref clock (cmd-line) using algo 'v4.8'
|  nominal                                  real Bitrt   nom  real SampP
|  Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP  Bitrate Error SampP SampP Error
| 12000000     12   2    2    2   1   1 11428571  4.8% 75.0% 71.4%  4.8%
| 10000000     25   1    1    1   1   2  9999999  0.0% 75.0% 75.0%  0.0%
|  8000000     12   3    3    3   1   1  7999999  0.0% 75.0% 70.0%  6.7%
|  5000000     50   1    1    1   1   4  4999999  0.0% 75.0% 75.0%  0.0%
|  4000000     62   1    1    1   1   5  3999999  0.0% 75.0% 75.0%  0.0%
|  2000000    125   1    1    1   1  10  1999999  0.0% 75.0% 75.0%  0.0%
|  1000000    250   1    1    1   1  20   999999  0.0% 75.0% 75.0%  0.0%
The bit timing parameter calculation algorithm effectively iterates from low to high BRP values. It selects a new best parameter set if the sample point error of the current parameter set is equal to or less than that of the old best parameter set. If the given hardware constraints (clock rate and bit timing parameter constants) don't allow a sample point error of 0, the algorithm will first find a valid bit timing parameter set with a low BRP, but will then accept parameter sets with higher BRPs that have the same sample point error. This patch changes the algorithm to accept a new parameter set only if the resulting sample point error is lower.
This leads to the following data bit timing parameters for the Versal ACAP board:
| Data Bit timing parameters for xilinx_can_fd2i with 79.999999 MHz ref clock (cmd-line) using algo 'can-next'
|  nominal                                  real Bitrt   nom  real SampP
|  Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP  Bitrate Error SampP SampP Error
| 12000000     12   2    2    2   1   1 11428571  4.8% 75.0% 71.4%  4.8%
| 10000000     12   2    3    2   1   1  9999999  0.0% 75.0% 75.0%  0.0%
|  8000000     12   3    3    3   1   1  7999999  0.0% 75.0% 70.0%  6.7%
|  5000000     12   5    6    4   1   1  4999999  0.0% 75.0% 75.0%  0.0%
|  4000000     12   7    7    5   1   1  3999999  0.0% 75.0% 75.0%  0.0%
|  2000000     12  14   15   10   1   1  1999999  0.0% 75.0% 75.0%  0.0%
|  1000000     25  14   15   10   1   2   999999  0.0% 75.0% 75.0%  0.0%
Note: Due to HW constraints, a data bit rate of 1 MBit/s with BRP = 1 is not possible.
Link: https://lore.kernel.org/all/20220318144913.873614-1-mkl@pengutronix.de Link: https://lore.kernel.org/all/20220113203004.jf2rqj2pirhgx72i@pengutronix.de Cc: Srinivas Neeli <sneeli@xilinx.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-04-19 17:12:05 +02:00
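The tie-breaking change in the bit timing entry above can be illustrated with a simplified C model. It is an assumption-laden sketch, not can_calc_bittiming(): it uses an 80 MHz clock instead of 79.999999 MHz, considers only prescalers that give the exact bit rate, places the sample point by rounding toward a 75.0% target, and assumes a hypothetical hardware limit of 40 time quanta per bit. Under those assumptions, walking BRP upward and replacing the best candidate on an equal error (old rule) versus only on a strictly lower error (new rule) yields BRP 2 versus 1 at 10 Mbit/s, 4 versus 1 at 5 Mbit/s, and 20 versus 2 at 1 Mbit/s, which agrees with the BRP columns of the two tables.

```c
#include <assert.h>
#include <stdbool.h>

/* Result of the toy search (not struct can_bittiming). */
struct bt_choice {
	unsigned int brp;          /* bit rate prescaler */
	unsigned int tq_per_bit;   /* 1 sync TQ + tseg1 + tseg2 */
	unsigned int sp_permille;  /* achieved sample point, per mille */
};

/* Sample point for a bit of n TQ: round toward the target, keeping
 * tseg1 >= 1 and tseg2 >= 1. */
static unsigned int sample_point_permille(unsigned int n, unsigned int target_permille)
{
	unsigned int before;       /* TQ before the sample point, incl. sync */

	assert(n >= 3);
	before = (n * target_permille + 500) / 1000;
	if (before < 2)
		before = 2;
	if (before > n - 1)
		before = n - 1;
	return before * 1000 / n;
}

/* Walk BRP upward. strict == true: replace the best set only on a strictly
 * lower sample point error (new rule). strict == false: an equal error also
 * wins (old rule), so ties drift toward larger BRPs. */
static bool pick_bittiming(unsigned long clock_hz, unsigned long bitrate,
			   unsigned int brp_max, unsigned int tq_max,
			   unsigned int target_permille, bool strict,
			   struct bt_choice *best)
{
	unsigned int best_err = ~0u;
	bool found = false;

	for (unsigned int brp = 1; brp <= brp_max; brp++) {
		unsigned long divisor = (unsigned long)brp * bitrate;
		unsigned long n = clock_hz / divisor;
		unsigned int sp, err;

		if (n < 3)
			break;             /* larger BRPs only leave fewer TQ */
		if (clock_hz % divisor)
			continue;          /* bit rate not exact with this BRP */
		if (n > tq_max)
			continue;          /* more TQ per bit than the hardware allows */

		sp = sample_point_permille((unsigned int)n, target_permille);
		err = sp > target_permille ? sp - target_permille : target_permille - sp;

		if (strict ? err < best_err : err <= best_err) {
			best_err = err;
			best->brp = brp;
			best->tq_per_bit = (unsigned int)n;
			best->sp_permille = sp;
			found = true;
		}
	}
	return found;
}
```

The only difference between the two rules is `<` versus `<=` in the replacement test; keeping the first of several equally good candidates is what makes the upward BRP walk prefer small prescalers.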
Marc Kleine-Budde	eb38c2053b	can: rx-offload: rename can_rx_offload_queue_sorted() -> can_rx_offload_queue_timestamp() This patch renames the function can_rx_offload_queue_sorted() to can_rx_offload_queue_timestamp(). This better describes what the function does: it adds a newly RX'ed skb to the queue, sorted by its timestamp. Link: https://lore.kernel.org/all/20220417194327.2699059-1-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-04-19 16:58:04 +02:00
Eric Dumazet	99c07327ae	netlink: reset network and mac headers in netlink_dump() netlink_dump() allocates an skb and reserves space in it, but forgets to reset the network header. This allows a BPF program, invoked later from sk_filter(), to access uninitialized kernel memory from the reserved space. Theoretically, the mac header reset could be omitted, because it is set to a special initial value. However, bpf_internal_load_pointer_neg_helper() calls skb_mac_header() without checking skb_mac_header_was_set(), and relying on skb->len not being too big seems fragile. We could also add a sanity check in bpf_internal_load_pointer_neg_helper() to avoid surprises in the future. The syzbot report was: BUG: KMSAN: uninit-value in ___bpf_prog_run+0xa22b/0xb420 kernel/bpf/core.c:1637 ___bpf_prog_run+0xa22b/0xb420 kernel/bpf/core.c:1637 __bpf_prog_run32+0x121/0x180 kernel/bpf/core.c:1796 bpf_dispatcher_nop_func include/linux/bpf.h:784 [inline] __bpf_prog_run include/linux/filter.h:626 [inline] bpf_prog_run include/linux/filter.h:633 [inline] __bpf_prog_run_save_cb+0x168/0x580 include/linux/filter.h:756 bpf_prog_run_save_cb include/linux/filter.h:770 [inline] sk_filter_trim_cap+0x3bc/0x8c0 net/core/filter.c:150 sk_filter include/linux/filter.h:905 [inline] netlink_dump+0xe0c/0x16c0 net/netlink/af_netlink.c:2276 netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002 sock_recvmsg_nosec net/socket.c:948 [inline] sock_recvmsg net/socket.c:966 [inline] sock_read_iter+0x5a9/0x630 net/socket.c:1039 do_iter_readv_writev+0xa7f/0xc70 do_iter_read+0x52c/0x14c0 fs/read_write.c:786 vfs_readv fs/read_write.c:906 [inline] do_readv+0x432/0x800 fs/read_write.c:943 __do_sys_readv fs/read_write.c:1034 [inline] __se_sys_readv fs/read_write.c:1031 [inline] __x64_sys_readv+0xe5/0x120 fs/read_write.c:1031 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81 entry_SYSCALL_64_after_hwframe+0x44/0xae Uninit was stored to memory at: ___bpf_prog_run+0x96c/0xb420
kernel/bpf/core.c:1558 __bpf_prog_run32+0x121/0x180 kernel/bpf/core.c:1796 bpf_dispatcher_nop_func include/linux/bpf.h:784 [inline] __bpf_prog_run include/linux/filter.h:626 [inline] bpf_prog_run include/linux/filter.h:633 [inline] __bpf_prog_run_save_cb+0x168/0x580 include/linux/filter.h:756 bpf_prog_run_save_cb include/linux/filter.h:770 [inline] sk_filter_trim_cap+0x3bc/0x8c0 net/core/filter.c:150 sk_filter include/linux/filter.h:905 [inline] netlink_dump+0xe0c/0x16c0 net/netlink/af_netlink.c:2276 netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002 sock_recvmsg_nosec net/socket.c:948 [inline] sock_recvmsg net/socket.c:966 [inline] sock_read_iter+0x5a9/0x630 net/socket.c:1039 do_iter_readv_writev+0xa7f/0xc70 do_iter_read+0x52c/0x14c0 fs/read_write.c:786 vfs_readv fs/read_write.c:906 [inline] do_readv+0x432/0x800 fs/read_write.c:943 __do_sys_readv fs/read_write.c:1034 [inline] __se_sys_readv fs/read_write.c:1031 [inline] __x64_sys_readv+0xe5/0x120 fs/read_write.c:1031 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81 entry_SYSCALL_64_after_hwframe+0x44/0xae Uninit was created at: slab_post_alloc_hook mm/slab.h:737 [inline] slab_alloc_node mm/slub.c:3244 [inline] __kmalloc_node_track_caller+0xde3/0x14f0 mm/slub.c:4972 kmalloc_reserve net/core/skbuff.c:354 [inline] __alloc_skb+0x545/0xf90 net/core/skbuff.c:426 alloc_skb include/linux/skbuff.h:1158 [inline] netlink_dump+0x30f/0x16c0 net/netlink/af_netlink.c:2242 netlink_recvmsg+0x1129/0x1c80 net/netlink/af_netlink.c:2002 sock_recvmsg_nosec net/socket.c:948 [inline] sock_recvmsg net/socket.c:966 [inline] sock_read_iter+0x5a9/0x630 net/socket.c:1039 do_iter_readv_writev+0xa7f/0xc70 do_iter_read+0x52c/0x14c0 fs/read_write.c:786 vfs_readv fs/read_write.c:906 [inline] do_readv+0x432/0x800 fs/read_write.c:943 __do_sys_readv fs/read_write.c:1034 [inline] __se_sys_readv fs/read_write.c:1031 [inline] __x64_sys_readv+0xe5/0x120 fs/read_write.c:1031 do_syscall_x64 
arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:81 entry_SYSCALL_64_after_hwframe+0x44/0xae CPU: 0 PID: 3470 Comm: syz-executor751 Not tainted 5.17.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: db65a3aaf29e ("netlink: Trim skb to alloc size to avoid MSG_TRUNC") Fixes: 9063e21fb026 ("netlink: autosize skb lengthes") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20220415181442.551228-1-eric.dumazet@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-04-19 15:05:03 +02:00
Paolo Abeni	cc4bdef26e	Merge branch 'rtnetlink-improve-alt_ifname-config-and-fix-dangerous-group-usage' Florent Fourcot says: ==================== rtnetlink: improve ALT_IFNAME config and fix dangerous GROUP usage The first commit forbids dangerous calls when both IFNAME and GROUP are given, since they can introduce unexpected behaviour when IFNAME does not match any interface. The second patch achieves the primary goal of this patchset, fixing/improving the IFLA_ALT_IFNAME attribute, since the previous code never worked for newlink/setlink. The ip-link command probably obtains the interface index beforehand, and so was not using this feature. The last two patches improve the error codes in corner cases. Changes in v2: * Remove ifname argument in rtnl_dev_get/do_setlink functions (simplify code) * Use a boolean to avoid condition duplication in __rtnl_newlink Changes in v3: * Simplify rtnl_dev_get signature Changes in v4: * Rename link_lookup to link_specified Changes in v5: * Re-order patches ==================== Link: https://lore.kernel.org/r/20220415165330.10497-1-florent.fourcot@wifirst.fr Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-04-19 13:39:01 +02:00
Florent Fourcot	b6177d3240	rtnetlink: return EINVAL when request cannot succeed A request without interface name/interface index/interface group cannot work. We should return EINVAL Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-04-19 13:38:55 +02:00
Florent Fourcot	dee04163e9	rtnetlink: return ENODEV when IFLA_ALT_IFNAME is used in dellink If IFLA_ALT_IFNAME is set and the given interface is not found, we should return ENODEV and be consistent with IFLA_IFNAME behaviour. This commit extends the feature of commit 76c9ac0ee878, "net: rtnetlink: add possibility to use alternative names as message handle" CC: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-04-19 13:38:50 +02:00
Florent Fourcot	5ea08b5286	rtnetlink: enable alt_ifname for setlink/newlink The buffer called "ifname" given to the function rtnl_dev_get() is always valid when called by setlink/newlink, but contains only an empty string when IFLA_IFNAME is not given, so IFLA_ALT_IFNAME is always ignored. This patch fixes the rtnl_dev_get() function by removing the ifname argument, and moves the ifname copy into do_setlink() where required. It extends the feature of commit 76c9ac0ee878, "net: rtnetlink: add possibility to use alternative names as message handle" CC: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-04-19 13:38:43 +02:00
Florent Fourcot	ef2a7c9065	rtnetlink: return ENODEV when ifname does not exist and group is given When the interface does not exist and a group is given, the given parameters are applied to all interfaces of the given group, and the given IFNAME/ALT_IF_NAME are ignored. That can be dangerous, since a typo (or a deleted interface) can produce weird side effects for the caller: Case 1: IFLA_IFNAME=valid_interface IFLA_GROUP=1 MTU=1234 Case 1 will update the MTU and group of the given interface "valid_interface". Case 2: IFLA_IFNAME=doesnotexist IFLA_GROUP=1 MTU=1234 Case 2 will update the MTU of all interfaces in group 1; IFLA_IFNAME is ignored in this case. This behaviour is inconsistent and dangerous. In order to fix this issue, we now return ENODEV when the given IFNAME does not exist. Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-04-19 13:38:31 +02:00
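The device selection described in the rtnetlink entries above can be sketched in C as follows. The toy_* types and toy_select_target() are hypothetical, not rtnetlink's code; the link_specified name is borrowed from the patchset notes, and the -EINVAL case reflects the "return EINVAL when request cannot succeed" entry in this log. The key point is that a request naming a missing device fails with -ENODEV instead of falling back to the whole group, so Case 2 from the commit message no longer touches group 1.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model of an interface table and a setlink-style request. */
struct toy_iface { int ifindex; const char *name; int group; };
struct toy_link_req {
	int ifindex;            /* 0 if not given */
	const char *ifname;     /* NULL or "" if not given */
	bool has_group;
	int group;
};

enum { TARGET_ONE = 1, TARGET_GROUP = 2 };

/* Returns TARGET_ONE (and sets *dev), TARGET_GROUP, or a negative errno. */
static int toy_select_target(const struct toy_iface *ifs, size_t n,
			     const struct toy_link_req *req,
			     const struct toy_iface **dev)
{
	bool link_specified;

	assert(req != NULL && dev != NULL);
	*dev = NULL;
	link_specified = req->ifindex > 0 ||
			 (req->ifname != NULL && req->ifname[0] != '\0');

	if (link_specified) {
		for (size_t i = 0; i < n; i++) {
			bool match = req->ifindex > 0 ?
				     ifs[i].ifindex == req->ifindex :
				     strcmp(ifs[i].name, req->ifname) == 0;
			if (match) {
				*dev = &ifs[i];
				return TARGET_ONE;
			}
		}
		/* A named but missing device is an error, even when a group
		 * is present: never silently fall back to the group. */
		return -ENODEV;
	}

	if (req->has_group)
		return TARGET_GROUP;    /* no device named: apply to the group */

	return -EINVAL;                 /* no index, name or group: cannot succeed */
}
```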
Paolo Abeni	8b11c35d97	Merge branch 'net-sched-allow-user-to-select-txqueue' Tonghao Zhang says: ==================== net: sched: allow user to select txqueue From: Tonghao Zhang <xiangxia.m.yue@gmail.com> Patch 1 allows users to select the txqueue in the clsact hook. Patch 2 supports using skbhash to select the txqueue. ==================== Link: https://lore.kernel.org/r/20220415164046.26636-1-xiangxia.m.yue@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-04-19 12:20:48 +02:00
Tonghao Zhang	38a6f08657	net: sched: support hash selecting tx queue This patch allows users to pick the queue_mapping from a range A to B, so that packets can be load balanced across tx queues A to B. The range bounds are unsigned 16-bit values in decimal format.
$ tc filter ... action skbedit queue_mapping skbhash A B
"skbedit queue_mapping QUEUE_MAPPING" (from "man 8 tc-skbedit") is enhanced with flags: SKBEDIT_F_TXQ_SKBHASH
+----+      +----+      +----+
| P1 |      | P2 |      | Pn |
+----+      +----+      +----+
  |           |           |
  +-----------+-----------+
              |
              | clsact/skbedit
              |      MQ
              v
  +-----------+-----------+
  | q0        | qn        | qm
  v           v           v
HTB/FQ      FIFO   ...   FIFO
For example, if P1 sends packets to different pods on another host and we want to distribute its flows across qn - qm, we can use skb->hash as the hash.
Setup commands:
$ NETDEV=eth0
$ ip netns add n1
$ ip link add ipv1 link $NETDEV type ipvlan mode l2
$ ip link set ipv1 netns n1
$ ip netns exec n1 ifconfig ipv1 2.2.2.100/24 up
$ tc qdisc add dev $NETDEV clsact
$ tc filter add dev $NETDEV egress protocol ip prio 1 \
    flower skip_hw src_ip 2.2.2.100 action skbedit queue_mapping skbhash 2 6
$ tc qdisc add dev $NETDEV handle 1: root mq
$ tc qdisc add dev $NETDEV parent 1:1 handle 2: htb
$ tc class add dev $NETDEV parent 2: classid 2:1 htb rate 100kbit
$ tc class add dev $NETDEV parent 2: classid 2:2 htb rate 200kbit
$ tc qdisc add dev $NETDEV parent 1:2 tbf rate 100mbit burst 100mb latency 1
$ tc qdisc add dev $NETDEV parent 1:3 pfifo
$ tc qdisc add dev $NETDEV parent 1:4 pfifo
$ tc qdisc add dev $NETDEV parent 1:5 pfifo
$ tc qdisc add dev $NETDEV parent 1:6 pfifo
$ tc qdisc add dev $NETDEV parent 1:7 pfifo
$ ip netns exec n1 iperf3 -c 2.2.2.1 -i 1 -t 10 -P 10
Picking txqueues from 2 - 6:
$ ethtool -S $NETDEV | grep -i tx_queue_[0-9]_bytes
tx_queue_0_bytes: 42
tx_queue_1_bytes: 0
tx_queue_2_bytes: 11442586444
tx_queue_3_bytes: 7383615334
tx_queue_4_bytes: 3981365579
tx_queue_5_bytes: 3983235051
tx_queue_6_bytes: 6706236461
tx_queue_7_bytes: 42
tx_queue_8_bytes: 0
tx_queue_9_bytes: 0
Txqueues 2 - 6 are mapped to classid 1:3 - 1:7:
$ tc -s class show dev $NETDEV
...
class mq 1:3 root leaf 8002:
 Sent 11949133672 bytes 7929798 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
class mq 1:4 root leaf 8003:
 Sent 7710449050 bytes 5117279 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
class mq 1:5 root leaf 8004:
 Sent 4157648675 bytes 2758990 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
class mq 1:6 root leaf 8005:
 Sent 4159632195 bytes 2759990 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
class mq 1:7 root leaf 8006:
 Sent 7003169603 bytes 4646912 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
...
Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jonathan Lemon <jonathan.lemon@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexander Lobakin <alobakin@pm.me> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Talal Ahmad <talalahmad@google.com> Cc: Kevin Hao <haokexin@gmail.com> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org> Cc: Kees Cook <keescook@chromium.org> Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com> Cc: Antoine Tenart <atenart@kernel.org> Cc: Wei Wang <weiwan@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-04-19 12:20:45 +02:00
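A minimal C sketch of mapping a flow hash onto the queue range from the skbhash entry above, assuming the queue is chosen as the first queue of the range plus the hash modulo the range width. skbhash_pick_txq() is a hypothetical name and this is not act_skbedit's code; the sketch also does not check the range against the device's real number of tx queues.

```c
#include <assert.h>
#include <stdint.h>

/* Map a flow hash onto the inclusive tx queue range [first, last]. Every
 * packet of a flow carries the same hash, so a flow stays on one queue
 * while different flows spread across the range. */
static uint16_t skbhash_pick_txq(uint32_t hash, uint16_t first, uint16_t last)
{
	uint32_t width;

	assert(first <= last);
	width = (uint32_t)last - first + 1;     /* number of queues in the range */
	return (uint16_t)(first + hash % width);
}
```

With the "skbhash 2 6" filter from the setup commands, width is 5, so every hash lands on one of tx queues 2 to 6, consistent with the ethtool counters in which only tx_queue_2 through tx_queue_6 carry the iperf3 traffic.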
Tonghao Zhang	2f1e85b1ae	net: sched: use queue_mapping to pick tx queue This patch fixes the following issue: * If we install tc filters with act_skbedit in the clsact hook, they don't work, because netdev_core_pick_tx() overwrites queue_mapping. $ tc filter ... action skbedit queue_mapping 1 This patch is also useful because: * We can use FQ + EDT to implement efficient policies. Tx queues are picked by xps, the ndo_select_queue of the netdev driver, or the skb hash in netdev_core_pick_tx(). In fact, the netdev driver and the skb hash are _not_ under our control. xps uses the CPU map to select Tx queues, but in most cases we can't figure out which pod/container's task_struct is running on a given CPU. We can use clsact filters to classify one pod/container's traffic to one Tx queue. Why? In a container networking environment, there are two kinds of pod/container/net-namespace. For one kind (e.g. P1, P2), high throughput is key. But to avoid running out of network resources, the outbound traffic of these pods is limited, using or sharing dedicated Tx queues assigned an HTB/TBF/FQ Qdisc. For the other kind of pods (e.g. Pn), low latency of data access is key, and the traffic is not limited. These pods use or share other dedicated Tx queues assigned a FIFO Qdisc. This choice provides two benefits. First, contention on the HTB/FQ Qdisc lock is significantly reduced since fewer CPUs contend for the same queue. More importantly, Qdisc contention can be eliminated completely if each CPU has its own FIFO Qdisc for the second kind of pods. There must be a mechanism in place to support classifying traffic based on pods/containers to different Tx queues. Note that clsact is outside of the Qdisc, while a Qdisc can run a classifier to select a sub-queue under the lock. In general, recording the decision in the skb seems a little heavy-handed. This patch introduces a per-CPU variable, suggested by Eric. The xmit.skip_txqueue flag is first cleared in __dev_queue_xmit().
- The Tx Qdisc may install such skbedit actions; the xmit.skip_txqueue flag is then set in qdisc->enqueue() even though the tx queue has already been selected in netdev_tx_queue_mapping() or netdev_core_pick_tx(). Clearing that flag first in __dev_queue_xmit() is useful to: - Avoid picking the Tx queue with netdev_tx_queue_mapping() in the next netdev in a case such as eth0 macvlan - eth0.3 vlan - eth0 ixgbe-phy. For example, eth0 (a macvlan in a pod, whose root Qdisc installs skbedit queue_mapping) sends packets to eth0.3 (a vlan in the host). In __dev_queue_xmit() of eth0.3 the flag is cleared, so the tx queue is not selected according to skb->queue_mapping, because there are no filters in the clsact or tx Qdisc of this netdev. The same happens on eth0 (ixgbe in the host). - Avoid picking the Tx queue for the next packet. If we set xmit.skip_txqueue in the tx Qdisc (qdisc->enqueue()), the proper way to clear it is in __dev_queue_xmit() when processing the next packet. For performance reasons, a static key is used. If the user does not configure NET_EGRESS, this code is not compiled.
   +----+      +----+      +----+
   | P1 |      | P2 |      | Pn |
   +----+      +----+      +----+
     |           |           |
     +-----------+-----------+
                 |
                 | clsact/skbedit
                 |      MQ
                 v
     +-----------+-----------+
     | q0        | q1        | qn
     v           v           v
   HTB/FQ      HTB/FQ  ...  FIFO
Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jonathan Lemon <jonathan.lemon@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexander Lobakin <alobakin@pm.me> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Talal Ahmad <talalahmad@google.com> Cc: Kevin Hao <haokexin@gmail.com> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org> Cc: Kees Cook <keescook@chromium.org> Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com> Cc: Antoine Tenart <atenart@kernel.org> Cc: Wei Wang <weiwan@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-04-19 12:20:45 +02:00
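The ordering described in the commit above (clear the flag, run the egress classifier, then honour queue_mapping only if the flag was set) can be modelled in user-space C. This is a single-threaded sketch under stated assumptions: the per-CPU variable is replaced by one global, and `struct pkt`, `default_pick()`, `egress_classify()` and `model_dev_queue_xmit()` are hypothetical stand-ins, not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the per-CPU xmit.skip_txqueue flag (one global here). */
static bool skip_txqueue;

struct pkt {
	uint16_t queue_mapping; /* queue chosen by an skbedit action, if any */
	uint32_t hash;          /* flow hash used by the default picker */
	bool has_skbedit;       /* does a filter on the current device match? */
};

/* Stand-in for the default picker (xps, ndo_select_queue or skb hash). */
static uint16_t default_pick(const struct pkt *p, uint16_t num_txq)
{
	return (uint16_t)(p->hash % num_txq);
}

/* Stand-in for the egress hook: a matching skbedit action records the
 * queue in the packet and sets the skip flag. */
static void egress_classify(struct pkt *p, uint16_t edit_queue)
{
	if (p->has_skbedit) {
		p->queue_mapping = edit_queue;
		skip_txqueue = true;
	}
}

/* One device's transmit path: clear the flag first, classify, then use
 * queue_mapping only if this device's own filter set the flag. */
static uint16_t model_dev_queue_xmit(struct pkt *p, uint16_t edit_queue,
				     uint16_t num_txq)
{
	skip_txqueue = false;
	egress_classify(p, edit_queue);
	if (skip_txqueue)
		return p->queue_mapping;
	return default_pick(p, num_txq);
}
```

Calling `model_dev_queue_xmit()` twice on the same packet, first with a matching filter and then without one, mimics the macvlan to vlan case: the second device clears the flag and falls back to its own picker instead of reusing the stale queue_mapping.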
Vladimir Oltean	4cf35a2b62	net: mscc: ocelot: fix broken IP multicast flooding When the user runs: bridge link set dev $br_port mcast_flood on this command should affect not only L2 multicast, but also IPv4 and IPv6 multicast. In the Ocelot switch, unknown multicast is flooded to different PGIDs depending on its type, and PGID_MC only handles L2 multicast. Therefore, by leaving PGID_MCIPV4 and PGID_MCIPV6 at their default value of 0, unknown IP multicast traffic is never flooded. Fixes: 421741ea5672 ("net: mscc: ocelot: offload bridge port flags to device") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220415151950.219660-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-04-19 10:33:33 +02:00
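The idea behind the fix above can be pictured with a simplified model of per-PGID port masks: one bridge flag must update all three multicast flooding PGIDs. The PGID indices, the 32-port bitmask and `set_mcast_flood()` are hypothetical; the real driver programs switch registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PGID indices; the real values are hardware-specific. */
enum { PGID_MC_IDX, PGID_MCIPV4_IDX, PGID_MCIPV6_IDX, NUM_MC_PGIDS };

/* One port bitmask per multicast flooding PGID (ports 0..31 assumed). */
static uint32_t pgid_mask[NUM_MC_PGIDS];

/* Apply a bridge "mcast_flood" setting for one port to every multicast
 * flooding PGID, not only the L2 one, so that unknown IPv4 and IPv6
 * multicast follow the same setting. */
static void set_mcast_flood(int port, bool enable)
{
	for (int i = 0; i < NUM_MC_PGIDS; i++) {
		if (enable)
			pgid_mask[i] |= UINT32_C(1) << port;
		else
			pgid_mask[i] &= ~(UINT32_C(1) << port);
	}
}
```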
Kurt Kanzenbach	0763120b09	net: dsa: hellcreek: Calculate checksums in tagger In case the checksum calculation is offloaded to the DSA master network interface, it will include the switch trailing tag. As soon as the switch strips that tag on egress, the calculated checksum is wrong. Therefore, add the checksum calculation to the tagger (if required) before adding the switch tag. This way, the hellcreek code works with all DSA master interfaces regardless of their declared feature set. Fixes: 01ef09caad66 ("net: dsa: Add tag handling for Hirschmann Hellcreek switches") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220415103320.90657-1-kurt@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-04-19 09:49:38 +02:00
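The ordering the tagger fix above relies on can be sketched in user-space C: finish the checksum over the frame first, then append the trailer. The RFC 1071 one's-complement sum is used here as an assumption for illustration; `tag_after_csum()` is hypothetical and does not model sk_buff or hardware offload.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* RFC 1071 Internet checksum over a byte buffer (big-endian 16-bit words). */
static uint16_t inet_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)(buf[i] << 8 | buf[i + 1]);
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8; /* pad the odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16); /* fold carries */
	return (uint16_t)~sum;
}

/* Hypothetical tagger order: compute the checksum over the frame first,
 * then append the trailer, so the trailer is never covered by it.
 * `frame` must have room for tag_len more bytes; returns the new length. */
static size_t tag_after_csum(uint8_t *frame, size_t len, const uint8_t *tag,
			     size_t tag_len, uint16_t *csum_out)
{
	*csum_out = inet_csum(frame, len);
	memcpy(frame + len, tag, tag_len);
	return len + tag_len;
}
```

A checksum computed after the trailer is appended generally differs from the one computed before it, which is why the order matters once the switch strips the tag on egress.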
Manuel Ullmann	cbe6c3a8f8	net: atlantic: invert deep par in pm functions, preventing null derefs This will reset deeply on freeze and thaw instead of suspend and resume and prevent null pointer dereferences of the uninitialized ring 0 buffer while thawing. The impact is an indefinitely hanging kernel. You can't switch consoles after this and the only possible user interaction is SysRq. BUG: kernel NULL pointer dereference RIP: 0010:aq_ring_rx_fill+0xcf/0x210 [atlantic] aq_vec_init+0x85/0xe0 [atlantic] aq_nic_init+0xf7/0x1d0 [atlantic] atl_resume_common+0x4f/0x100 [atlantic] pci_pm_thaw+0x42/0xa0 resolves in aq_ring.o to
```
0000000000000ae0 <aq_ring_rx_fill>:
{
        /* ... */
 baf:   48 8b 43 08     mov 0x8(%rbx),%rax
                buff->flags = 0U; /* buff is NULL */
```
The bug has been present since the introduction of the new pm code in 8aaa112a57c1 ("net: atlantic: refactoring pm logic") and was hidden until 8ce84271697a ("net: atlantic: changes for multi-TC support"), which refactored the aq_vec_{free,alloc} functions into aq_vec_{,ring}_{free,alloc}, but is technically not wrong. The original functions just always reinitialized the buffers on S3/S4. If the interface is down before freezing, the bug does not occur. It does not matter whether the initrd contains and loads the module before thawing. So the fix is to invert the boolean parameter deep in all pm function calls, which was clearly intended to be set like that. The first report was on GitHub [1]; the bug there has to be inferred from the resume logs in the posted dmesg snippet. Recently I posted one on Bugzilla [2], since I did not have an AQC device so far.
#regzbot introduced: 8ce84271697a #regzbot from: koo5 <kolman.jindrich@gmail.com> #regzbot monitor: https://github.com/Aquantia/AQtion/issues/32 Fixes: 8aaa112a57c1 ("net: atlantic: refactoring pm logic") Link: https://github.com/Aquantia/AQtion/issues/32 [1] Link: https://bugzilla.kernel.org/show_bug.cgi?id=215798 [2] Cc: stable@vger.kernel.org Reported-by: koo5 <kolman.jindrich@gmail.com> Signed-off-by: Manuel Ullmann <labre@posteo.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-18 13:34:36 +01:00
Luiz Angelo Daros de Luca	a997157e42	docs: net: dsa: describe issues with checksum offload DSA tags before IP header (categories 1 and 2) or after the payload (3) might introduce offload checksum issues. Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-18 13:29:02 +01:00
David S. Miller	2a38de067b	Merge branch 'mlxsw-line-card' Ido Schimmel says: ==================== mlxsw: Introduce line card support for modular switch Jiri says: This patchset introduces support for modular switch systems and also introduces mlxsw support for the NVIDIA Mellanox SN4800 modular switch. It contains 8 slots to accommodate line cards - replaceable PHY modules which may contain gearboxes. Currently supported line card: 16X 100GbE (QSFP28) Other line cards that are going to be supported: 8X 200GbE (QSFP56) 4X 400GbE (QSFP-DD) There may be other types of line cards added in the future. To be consistent with the port split configuration (splitter cables), the line card entities are treated in a similar way. The nature of a line card is not "a pluggable device", but "a pluggable PHY module". A concept of "provisioning" is introduced. The user may "provision" a certain slot with a line card type. The driver then creates all instances (devlink ports, netdevices, etc) related to this line card type. It does not matter if the line card is plugged in at the time. The user is able to configure netdevices, devlink ports, set up port splitters, etc. From the perspective of the switch ASIC, all is present and can be configured. The carrier of the netdevices stays down if the line card is not plugged in. Once the line card is inserted and activated, the carrier of the related netdevices reflects the physical line state, same as for an ordinary fixed port. Once the user no longer wants to use the line card related instances, they can "unprovision" the slot. The driver then removes the instances. Patches 1-4 extend the devlink driver API and UAPI in order to register, show, dump, provision and activate the line card. Patches 5-17 implement the introduced API in mlxsw. The last patch adds a selftest for mlxsw line cards.
Example:
  $ devlink port
  # No ports are listed
  $ devlink lc
  pci/0000:01:00.0:
    lc 1 state unprovisioned supported_types: 16x100G
    lc 2 state unprovisioned supported_types: 16x100G
    lc 3 state unprovisioned supported_types: 16x100G
    lc 4 state unprovisioned supported_types: 16x100G
    lc 5 state unprovisioned supported_types: 16x100G
    lc 6 state unprovisioned supported_types: 16x100G
    lc 7 state unprovisioned supported_types: 16x100G
    lc 8 state unprovisioned supported_types: 16x100G
Note that the driver exposes the list of supported line card types. Currently there is only one: "16x100G". To provision slot #8:
  $ devlink lc set pci/0000:01:00.0 lc 8 type 16x100G
  $ devlink lc show pci/0000:01:00.0 lc 8
  pci/0000:01:00.0:
    lc 8 state active type 16x100G supported_types: 16x100G
  $ devlink port
  pci/0000:01:00.0/0: type notset flavour cpu port 0 splittable false
  pci/0000:01:00.0/53: type eth netdev enp1s0nl8p1 flavour physical lc 8 port 1 splittable true lanes 4
  pci/0000:01:00.0/54: type eth netdev enp1s0nl8p2 flavour physical lc 8 port 2 splittable true lanes 4
  pci/0000:01:00.0/55: type eth netdev enp1s0nl8p3 flavour physical lc 8 port 3 splittable true lanes 4
  pci/0000:01:00.0/56: type eth netdev enp1s0nl8p4 flavour physical lc 8 port 4 splittable true lanes 4
  pci/0000:01:00.0/57: type eth netdev enp1s0nl8p5 flavour physical lc 8 port 5 splittable true lanes 4
  pci/0000:01:00.0/58: type eth netdev enp1s0nl8p6 flavour physical lc 8 port 6 splittable true lanes 4
  pci/0000:01:00.0/59: type eth netdev enp1s0nl8p7 flavour physical lc 8 port 7 splittable true lanes 4
  pci/0000:01:00.0/60: type eth netdev enp1s0nl8p8 flavour physical lc 8 port 8 splittable true lanes 4
  pci/0000:01:00.0/61: type eth netdev enp1s0nl8p9 flavour physical lc 8 port 9 splittable true lanes 4
  pci/0000:01:00.0/62: type eth netdev enp1s0nl8p10 flavour physical lc 8 port 10 splittable true lanes 4
  pci/0000:01:00.0/63: type eth netdev enp1s0nl8p11 flavour physical lc 8 port 11 splittable true lanes 4
  pci/0000:01:00.0/64: type eth netdev enp1s0nl8p12 flavour physical lc 8 port 12 splittable true lanes 4
  pci/0000:01:00.0/125: type eth netdev enp1s0nl8p13 flavour physical lc 8 port 13 splittable true lanes 4
  pci/0000:01:00.0/126: type eth netdev enp1s0nl8p14 flavour physical lc 8 port 14 splittable true lanes 4
  pci/0000:01:00.0/127: type eth netdev enp1s0nl8p15 flavour physical lc 8 port 15 splittable true lanes 4
  pci/0000:01:00.0/128: type eth netdev enp1s0nl8p16 flavour physical lc 8 port 16 splittable true lanes 4
To unprovision slot #8:
  $ devlink lc set pci/0000:01:00.0 lc 8 notype
==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-18 11:00:19 +01:00
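The slot life cycle described in the merge message above (provision creates the instances with carrier down, activation lets carrier follow the line, unprovision removes them) can be sketched as a small state machine. This is a simplified, hypothetical model: the enum, struct and function names are illustrative only, and the real devlink UAPI defines additional intermediate states.

```c
#include <stdbool.h>

/* Simplified slot states; names are hypothetical, not devlink's. */
enum lc_state { LC_UNPROVISIONED, LC_PROVISIONED, LC_ACTIVE };

struct lc_slot {
	enum lc_state state;
	bool ports_created; /* devlink ports / netdevs exist */
	bool carrier_up;    /* netdev carrier follows the physical line */
};

/* "devlink lc set ... type X": create instances even if no card is plugged. */
static void lc_provision(struct lc_slot *s)
{
	if (s->state == LC_UNPROVISIONED) {
		s->state = LC_PROVISIONED;
		s->ports_created = true;
		s->carrier_up = false; /* carrier stays down until active */
	}
}

/* Card inserted and activated: carrier reflects the physical line state. */
static void lc_activate(struct lc_slot *s, bool link_up)
{
	if (s->state == LC_PROVISIONED) {
		s->state = LC_ACTIVE;
		s->carrier_up = link_up;
	}
}

/* "devlink lc set ... notype": remove the instances again. */
static void lc_unprovision(struct lc_slot *s)
{
	s->state = LC_UNPROVISIONED;
	s->ports_created = false;
	s->carrier_up = false;
}
```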
Jiri Pirko	e1fad9517f	selftests: mlxsw: Introduce devlink line card provision/unprovision/activation tests Introduce basic line card manipulation which consists of provisioning, unprovisioning and activation of a line card. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-18 11:00:19 +01:00
Jiri Pirko	6445eef0f6	mlxsw: spectrum: Add port to linecard mapping For each port get slot_index using PMLP register. For ports residing on a linecard, identify it with the linecard by setting mapping using devlink_port_linecard_set() helper. Use linecard slot index for PMTDB register queries. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-18 11:00:19 +01:00
Jiri Pirko	45bf3b7267	mlxsw: core: Extend driver ops by remove selected ports op In case of line card implementation, the core has to have a way to remove relevant ports manually. Extend the Spectrum driver ops by an op that implements port removal of selected ports upon request. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-18 11:00:19 +01:00
Jiri Pirko	ee7a70fa67	mlxsw: core_linecards: Implement line card activation process Allow processing of events generated when a line card becomes "ready" and "active". When a DSDSC event with the "ready" bit set is delivered, the line card is powered up. Use the MDDC register to push the line card to the active state. Once FW is done with that, a DSDSC event with the "active" bit set is delivered. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-18 11:00:19 +01:00
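The event sequence described in the commit above (DSDSC "ready", then an MDDC write from the host, then DSDSC "active") can be modelled as a tiny event handler. The struct and function names are hypothetical and do not correspond to mlxsw code; a real driver would issue the MDDC register write where the flag is set here.

```c
#include <stdbool.h>

/* Hypothetical view of the two DSDSC status bits used by the commit. */
struct dsdsc_event {
	bool ready;  /* line card powered up */
	bool active; /* FW finished activation */
};

struct lc_fw_state {
	bool mddc_activate_sent; /* host asked FW (via MDDC) to activate */
	bool active;
};

static void handle_dsdsc(struct lc_fw_state *lc, struct dsdsc_event ev)
{
	if (ev.ready && !lc->mddc_activate_sent) {
		/* Powered up: push the line card to active state via MDDC
		 * (modelled as a flag; only sent once). */
		lc->mddc_activate_sent = true;
	}
	if (ev.active)
		lc->active = true;
}
```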
Jiri Pirko	b217127e5e	mlxsw: core_linecards: Add line card objects and implement provisioning Introduce objects for line cards and an infrastructure around that. Use devlink_linecard_create/destroy() to register the line card with devlink core. Implement provisioning ops with a list of supported line cards. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-18 11:00:19 +01:00
Jiri Pirko	5bade5aa4a	mlxsw: reg: Add Management Binary Code Transfer Register The MBCT register allows transferring binary INI code from the host to the management FW in chunks of at most 1KB. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-18 11:00:19 +01:00
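Splitting an INI blob into transfers of at most 1KB, as the commit above describes, can be sketched as a simple chunking loop. The callback type, `transfer_in_chunks()` and the example `count_chunk()` are hypothetical; each callback invocation stands in for one MBCT register write.

```c
#include <stddef.h>
#include <stdint.h>

#define MBCT_MAX_CHUNK 1024u /* "chunks of maximum 1KB" per the commit */

/* Hypothetical per-chunk writer standing in for one MBCT register write. */
typedef int (*chunk_write_fn)(const uint8_t *data, size_t len, size_t offset,
			      void *ctx);

/* Hand the blob to write_chunk in pieces of at most MBCT_MAX_CHUNK bytes.
 * Returns 0 on success or the first non-zero error from write_chunk. */
static int transfer_in_chunks(const uint8_t *blob, size_t len,
			      chunk_write_fn write_chunk, void *ctx)
{
	for (size_t off = 0; off < len; off += MBCT_MAX_CHUNK) {
		size_t n = len - off < MBCT_MAX_CHUNK ? len - off : MBCT_MAX_CHUNK;
		int err = write_chunk(blob + off, n, off, ctx);

		if (err)
			return err;
	}
	return 0;
}

/* Example callback: count chunks and remember the largest one. */
struct chunk_stats {
	size_t chunks;
	size_t max_len;
};

static int count_chunk(const uint8_t *data, size_t len, size_t offset, void *ctx)
{
	struct chunk_stats *st = ctx;

	(void)data;
	(void)offset;
	st->chunks++;
	if (len > st->max_len)
		st->max_len = len;
	return 0;
}
```

For a 2500-byte blob this produces three transfers of 1024, 1024 and 452 bytes.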
Jiri Pirko	5290a8ff2e	mlxsw: reg: Add Management DownStream Device Control Register The MDDC register allows controlling downstream devices and line cards. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-18 11:00:19 +01:00
Jiri Pirko	505f524dc6	mlxsw: reg: Add Management DownStream Device Query Register The MDDQ register allows querying the DownStream device properties. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-18 11:00:19 +01:00
Jiri Pirko	b0ec003e9a	mlxsw: spectrum: Introduce port mapping change event processing Register PMLPE trap and process the port mapping changes delivered by it by creating related ports. Note that this happens after provisioning. The INI of the linecard is processed and merged by FW. PMLPE is generated for each port. Process this mapping change. Layout of PMLPE is the same as layout of PMLP. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-18 11:00:19 +01:00
Jiri Pirko	adc6462376	mlxsw: Narrow the critical section of devl_lock during ports creation/removal No need to hold the lock for alloc and freecpu, so narrow the critical section. A follow-up patch will benefit from this by adding more code to these functions, which will be outside the critical section as well. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-18 11:00:18 +01:00
Jiri Pirko	ebf0c53417	mlxsw: reg: Add Ports Mapping Event Configuration Register The PMECR register is used to enable/disable event triggering in case of local port mapping change. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-04-18 11:00:18 +01:00


1090001 Commits