iptables/nftables support responding to tcp packets with tcp resets.
The generated tcp reset packet passes through both output and postrouting
netfilter hooks, but conntrack will never see them because the generated
skb has its ->nfct pointer copied over from the packet that triggered the
reset rule.
If the reset rule is used for established connections, this
may result in the conntrack entry remaining around for a very long
time (default timeout is 5 days).
One way to avoid this would be to not copy the nf_conn pointer
so that the reset packet passes through conntrack too.
The problem is that output rules might not have the same conntrack
zone setup as the prerouting ones, so it's possible that the
reset skb won't find the correct entry. Generating a template
entry for the skb seems error prone as well.
Add an explicit "closing" function that switches a confirmed
conntrack entry to closed state and wire this up for tcp.
If the entry isn't confirmed, no action is needed because
the conntrack entry will never be committed to the table.
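As a rough sketch of the approach (the helper name and body below are
illustrative only, not the actual symbols added by this patch), the new
closing hook boils down to something like:

static void example_ct_tcp_set_closing(struct nf_conn *ct)
{
        if (!nf_ct_is_confirmed(ct))
                return;         /* never committed to the table, nothing to do */

        spin_lock_bh(&ct->lock);
        ct->proto.tcp.state = TCP_CONNTRACK_CLOSE;      /* short timeout now applies */
        spin_unlock_bh(&ct->lock);
}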
Reported-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Avoid a possible synchronize_rcu() as part of the kfree_rcu() call
when the second argument is not provided.
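For illustration, the difference between the one- and two-argument forms
(struct and field names below are made up for the example):

struct foo {
        struct rcu_head rcu;
        /* ... payload ... */
};

static void foo_free(struct foo *f)
{
        /* kfree_rcu(f);      one argument: may fall back to synchronize_rcu() */
        kfree_rcu(f, rcu);    /* two arguments: always queued via call_rcu() */
}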
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The static 'seq_print_acct' function always returns 0.
Change the return value to 'void' and remove unnecessary checks.
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
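A hedged sketch of the conversion (the body is abbreviated; the real
function prints the conntrack accounting counters):

static void seq_print_acct(struct seq_file *s, const struct nf_conn *ct,
                           int dir)
{
        struct nf_conn_acct *acct;

        acct = nf_conn_acct_find(ct);
        if (!acct)
                return;         /* previously "return 0", which no caller checked */

        /* ... seq_printf() the packet/byte counters for @dir ... */
}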
Fixes: 1ca9e41770cb ("netfilter: Remove uses of seq_<foo> return values")
Signed-off-by: Ilia.Gavrilov <Ilia.Gavrilov@infotecs.ru>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
IPS_SEEN_REPLY_BIT is only useful with the test_bit() API.
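For illustration (assuming a struct nf_conn *ct in scope):
IPS_SEEN_REPLY_BIT is a bit index, IPS_SEEN_REPLY the corresponding mask.

bool seen_reply;
unsigned long status = READ_ONCE(ct->status);

seen_reply = status & IPS_SEEN_REPLY_BIT;               /* wrong: index used as a mask */
seen_reply = status & IPS_SEEN_REPLY;                   /* correct: mask form */
seen_reply = test_bit(IPS_SEEN_REPLY_BIT, &ct->status); /* correct: bit-index form */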
Fixes: 4883ec512c17 ("netfilter: conntrack: avoid reload of ct->status")
Reported-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
It should be 'chain' passed to PTR_ERR() in the error path
after calling nft_chain_lookup() in nf_tables_delrule().
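A hedged sketch of the corrected error path, simplified from
nf_tables_delrule() (the surrounding destroy-operation handling is omitted):

chain = nft_chain_lookup(net, table, nla[NFTA_RULE_CHAIN], genmask);
if (IS_ERR(chain)) {
        NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_CHAIN]);
        return PTR_ERR(chain);  /* previously PTR_ERR() of the wrong pointer */
}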
Fixes: f80a612dd77c ("netfilter: nf_tables: add support to destroy operation")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
A static analyzer detected a possible NULL pointer dereference for 'type':
__nft_obj_type_get() can return NULL, which needs to be handled.
If type is a NULL pointer, return -ENOENT.
This is a theoretical issue, since an existing object has a type, but
better add this failsafe check.
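A hedged sketch of the check (the argument list of __nft_obj_type_get()
shown here may differ from the source):

type = __nft_obj_type_get(objtype, family);
if (!type)
        return -ENOENT;         /* theoretical: an existing object always has a type */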
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Alex Elder says:
====================
net: ipa: remaining IPA v5.0 support
This series includes almost all remaining IPA code changes required
to support IPA v5.0. IPA register definitions and configuration
data for IPA v5.0 will be sent later (soon). Note that the GSI
register definitions still require work. GSI for IPA v5.0 supports
up to 256 (rather than 32) channels, and this changes the way GSI
register offsets are calculated. A few GSI register fields also
change.
The first patch in this series increases the number of IPA endpoints
supported by the driver, from 32 to 36. The next updates the width
of the destination field for the IP_PACKET_INIT immediate command so
it can represent up to 256 endpoints rather than just 32. The next
adds a few definitions of some IPA registers and fields that are
first available in IPA v5.0.
The next two patches update the code that handles router and filter
table caches. Previously these were referred to as "hashed" tables,
and the IPv4 and IPv6 tables are now combined into one "unified"
table. The sixth and seventh patches add support for a new pulse
generator, which allows time periods to be specified with a wider
range of clock resolution. And the last patch just defines two new
memory regions that were not previously used.
====================
Link: https://lore.kernel.org/r/20230130210158.4126129-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
IPA v5.0 uses two memory regions not previously used. Define them
and treat them as valid only for IPA v5.0.
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The AP has a third pulse generator available starting with IPA v5.0.
Redefine ipa_qtime_val() to support that possibility. Pass the IPA
pointer as an argument so the version can be determined. And stop
using the sign of the returned tick count to indicate which of two
pulse generators to use.
Instead, have the caller provide the address of a variable that will
hold the selected pulse generator for the Qtime value. And for
version 5.0, check whether the third pulse generator best represents
the time period.
Add code in ipa_qtime_config() to configure the fourth pulse
generator for IPA v5.0+; in that case configure both the third and
fourth pulse generators to use 10 msec granularity.
Consistently use "ticks" for local variables that represent a tick
count.
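A hedged sketch of the reworked interface described above; names, types
and the exact granularities are illustrative, not the driver's actual code:

static u32 ipa_qtime_val(struct ipa *ipa, u32 microseconds, u32 max,
                         u32 *select)
{
        u32 which = 0;
        u32 ticks;

        /* 100 usec granularity (pulse generator 0) */
        ticks = DIV_ROUND_CLOSEST(microseconds, 100);
        if (ticks > max) {
                /* 1 msec granularity (pulse generator 1) */
                ticks = DIV_ROUND_CLOSEST(microseconds, 1000);
                which = 1;
        }
        if (ticks > max && ipa->version >= IPA_VERSION_5_0) {
                /* 10 msec granularity (third pulse generator) */
                ticks = DIV_ROUND_CLOSEST(microseconds, 10000);
                which = 2;
        }

        *select = which;        /* caller-provided selection variable */
        return ticks;           /* always a plain tick count now */
}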
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Starting with IPA v5.0, the head-of-line blocking timer has more
than two pulse generators available to define timer granularity.
To prepare for that, change the way the field value is encoded
to use ipa_reg_encode() rather than ipa_reg_bit().
The aggregation granularity selection could (in principle) also use
an additional pulse generator starting with IPA v5.0. Encode the
AGGR_GRAN_SEL field differently to allow that as well.
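Roughly (with an illustrative field name; the actual register and field
identifiers in the driver may differ):

/* before: the granularity selector was treated as a single bit */
val |= ipa_reg_bit(reg, TIMER_GRAN_SEL);
/* after: encode a possibly multi-bit pulse-generator selection */
val |= ipa_reg_encode(reg, TIMER_GRAN_SEL, gran_sel);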
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
IPA v5.0+ separates the configuration of entries in the cached
(previously "hashed") routing and filtering tables into distinct
registers. Previously a single "filter and router" register updated
entries in both tables at once; now the routing and filter table
caches have separate registers that define their content.
This patch updates the code that zeroes entries in the cached filter
and router tables to support IPA versions including v5.0+.
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Update the code that causes filter and router table caches to be
flushed so that it supports IPA v5.0+. Add a comment in
ipa_hardware_config_hashing() explaining that caching does not
need to be enabled, just as before, because it's enabled by default.
(For the record, the FILT_ROUT_CACHE_CFG register would have been
used if we wanted to explicitly enable these.)
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Define some new registers that appear starting with IPA v5.0, along
with enumerated types identifying their fields. Code that uses
these will be added by upcoming patches.
Most of the new registers are related to filter and routing tables,
and in particular, their "hashed" variant. These tables are better
described as "cached", where a hash value determines which entries
are cached. From now on, naming related to this functionality will
use "cache" instead of "hash", and that is reflected in these new
register names. Some registers for managing these caches and their
contents have changed as well.
A few other new field definitions for registers (unrelated to table
caches) are also defined.
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The IP_PACKET_INIT immediate command defines the destination
endpoint to which a packet should be sent. Prior to IPA v5.0, a
5 bit field in that command represents the endpoint, but starting
with IPA v5.0, the field is extended to 8 bits to support more than
32 endpoints.
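Illustratively, and with made-up macro names, the widening amounts to:

/* before IPA v5.0: a 5-bit destination endpoint field (up to 32 endpoints) */
#define EXAMPLE_PACKET_INIT_DEST_ENDPOINT_FMASK         GENMASK(4, 0)
/* IPA v5.0+: widened to 8 bits, supporting up to 256 endpoints */
#define EXAMPLE_PACKET_INIT_DEST_ENDPOINT_WIDE_FMASK    GENMASK(7, 0)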
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Increase the number of endpoints supported by the driver to 36,
which IPA v5.0 supports. This makes it impossible to check at build
time whether the supported number is too big to fit within the
(5-bit) PACKET_INIT destination endpoint field. Instead, convert
the build time check to compare against what fits in 8 bits.
Add a check in ipa_endpoint_config() to also ensure the hardware
reports an endpoint count that's in the expected range. Just
open-code 32 as the limit (the PACKET_INIT field mask is not
available where we'd want to use it).
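A hedged sketch of the two checks (names and error handling are
illustrative; the driver's actual code differs in detail):

#define IPA_ENDPOINT_MAX        36      /* no longer fits the 5-bit field */

/* build time: the limit must still fit the widened 8-bit field */
static_assert(IPA_ENDPOINT_MAX - 1 <= U8_MAX);

/* run time, roughly what ipa_endpoint_config() now verifies */
static int example_check_endpoint_count(struct device *dev, u32 count)
{
        if (count > IPA_ENDPOINT_MAX) {
                dev_err(dev, "unexpected endpoint count %u\n", count);
                return -EINVAL;
        }

        return 0;
}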
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'mlx5-updates-2023-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2023-01-30
Add fast update encryption key
Jianbo Liu says:
================
Data encryption keys (DEKs) are the keys used for data encryption and
decryption operations. Starting from version 22.33.0783, firmware is
optimized to accelerate the update of user keys into a DEK object in
hardware. Support for bulk allocation and destruction of DEK
objects is added, and the bulk-allocated DEKs are uninitialized, as
the bulk creation requires no input key. When offloading
encryption/decryption, the user gets one object from a bulk and updates
its key with a new "modify DEK" command. This command is the same as create
DEK object, but requires no heavy context memory allocation in
firmware, which consumes most of the CPU cycles of the create DEK command.
DEKs are cached internally by the NIC, so invalidating internal NIC
caches is required before reusing DEKs. The SYNC_CRYPTO command is
added to support this. A DEK object can be reused, and the keys in it
can be updated, after this command is executed.
This patchset enhances the key creation and destruction flow to make
use of this new feature. Any user, for example ktls, ipsec and
macsec, can use it to offload keys. But only ktls uses it, as the others
don't need many keys, and caching too many DEKs in the pool is wasteful.
There are two new data structs added:
a. DEK pool. One pool is created for each key type. The bulks of that
type are placed in the pool's different bulk lists, according to
the number of available and in_used DEKs in the bulk.
b. DEK bulk. All DEKs from one bulk allocation are stored here. There
are two bitmaps to indicate the state of each DEK.
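As a rough illustration of the two objects (field names below are made
up for the sketch, not the actual mlx5 definitions):

struct example_dek_bulk {
        struct list_head entry;         /* linkage into one of the pool lists */
        u32 base_obj_id;                /* first DEK object id in the bulk */
        unsigned long *need_sync;       /* bitmap: DEK must be synced before reuse */
        unsigned long *in_use;          /* bitmap: DEK currently handed to a user */
};

struct example_dek_pool {
        int key_type;                   /* one pool per key type */
        struct list_head partial_list;  /* bulks with some DEKs available */
        struct list_head full_list;     /* bulks with no DEKs available */
        struct list_head avail_list;    /* bulks with all DEKs available */
        struct list_head sync_list;     /* bulks waiting for SYNC_CRYPTO */
};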
New APIs are then added. When a user needs a DEK object,
a. Fetch one bulk with available DEKs from the partial_list or
avail_list; otherwise create a new one.
b. Pick one DEK, and set its need_sync and in_used bits to 1.
Move the bulk to full_list if no more keys are available, or put it on
partial_list if the bulk is newly created.
c. Update DEK object's key with user key, by the "modify DEK"
command.
d. Return the DEK struct to the user, who then gets the object id and
fills it into the offload commands.
When a user frees a DEK,
a. Set in_use bit to 0. If all need_sync bits are 1 and all in_use
bits of this bulk are 0, move it to sync_list.
b. If the number of DEKs freed by users exceeds the
threshold (128), schedule a workqueue to do the sync process.
For the sync process, the SYNC_CRYPTO command is executed first. Then,
for each bulk in partial_list, full_list and sync_list, reset the
need_sync bits of the freed DEK objects. If all need_sync bits in one
bulk are zero, move it to avail_list.
We already support a TIS pool to recycle the TISes. With this series
and the TIS pool, TLS CPS performance is improved greatly.
We tested HTTPS on the following system:
CPU: dual AMD EPYC 7763 64-Core processors
RAM: 512G
DEV: ConnectX-6 DX, with FW ver 22.33.0838 and TLS_OPTIMISE=true
TLS CPS performance numbers are:
Before: 11k connections/sec
After: 101k connections/sec
================
* tag 'mlx5-updates-2023-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: kTLS, Improve connection rate by using fast update encryption key
net/mlx5: Keep only one bulk of full available DEKs
net/mlx5: Add async garbage collector for DEK bulk
net/mlx5: Reuse DEKs after executing SYNC_CRYPTO command
net/mlx5: Use bulk allocation for fast update encryption key
net/mlx5: Add bulk allocation and modify_dek operation
net/mlx5: Add support SYNC_CRYPTO command
net/mlx5: Add new APIs for fast update encryption key
net/mlx5: Refactor the encryption key creation
net/mlx5: Add const to the key pointer of encryption key creation
net/mlx5: Prepare for fast crypto key update if hardware supports it
net/mlx5: Change key type to key purpose
net/mlx5: Add IFC bits and enums for crypto key
net/mlx5: Add IFC bits for general obj create param
net/mlx5: Header file for crypto
====================
Link: https://lore.kernel.org/r/20230131031201.35336-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN: Remove redundant Device Control Error Reporting Enable
Bjorn Helgaas says:
Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"),
the PCI core sets the Device Control bits that enable error reporting for
PCIe devices.
This series removes redundant calls to pci_enable_pcie_error_reporting()
that do the same thing from several NIC drivers.
There are several more drivers where this should be removed; I started with
just the Intel drivers here.
* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ixgbe: Remove redundant pci_enable_pcie_error_reporting()
igc: Remove redundant pci_enable_pcie_error_reporting()
igb: Remove redundant pci_enable_pcie_error_reporting()
ice: Remove redundant pci_enable_pcie_error_reporting()
iavf: Remove redundant pci_enable_pcie_error_reporting()
i40e: Remove redundant pci_enable_pcie_error_reporting()
fm10k: Remove redundant pci_enable_pcie_error_reporting()
e1000e: Remove redundant pci_enable_pcie_error_reporting()
====================
Link: https://lore.kernel.org/r/20230130192519.686446-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata says:
====================
selftests: mlxsw: Convert to iproute2 dcb
There is a dedicated tool for configuration of DCB in iproute2. Use it
in the selftests instead of lldpad.
Patches #1-#3 convert three tests. Patch #4 drops the now-unnecessary
lldpad helpers.
====================
Link: https://lore.kernel.org/r/cover.1675096231.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The existing users of these helpers have been converted to iproute2 dcb.
Drop the helpers.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Set up default port priority through the iproute2 dcb tool, which is easier
to understand and manage.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Set up DSCP prioritization through the iproute2 dcb tool, which is easier
to understand and manage.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Set up DSCP prioritization through the iproute2 dcb tool, which is easier
to understand and manage.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jerome Brunet says:
====================
net: mdio: add amlogic gxl mdio mux support
Add support for the MDIO multiplexer found in the Amlogic GXL SoC family.
This multiplexer allows choosing between the external (SoC pins) MDIO bus
and the internal one leading to the integrated 10/100M PHY.
This multiplexer has been handled with the mdio-mux-mmioreg generic driver
so far. When it was added, it was thought the logic was handled by a
single register.
It turns out more than a single register needs to be properly set.
As long as the device uses the Amlogic vendor bootloader, or upstream
u-boot with net support, things work fine since the kernel inherits
the bootloader settings. Without net support in the bootloader, this glue
is left unset in the kernel and only the external path may operate properly.
With this driver (and the associated change in
arch/arm64/boot/dts/amlogic/meson-gxl.dtsi), the kernel no longer relies
on the bootloader to set things up, fixing the problem.
====================
Link: https://lore.kernel.org/r/20230130151616.375168-1-jbrunet@baylibre.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for the mdio mux and internal phy glue of the GXL SoC
family.
Reported-by: Da Xue <da@lessconfused.com>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add documentation for the MDIO bus multiplexer found on the Amlogic GXL
SoC family.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski says:
====================
tools: ynl: more docs and basic ethtool support
I got discouraged from supporting ethtool in specs, because
generating the user space C code seems a little tricky.
The messages are ID'ed in a "directional" way (to and from
kernel are separate ID "spaces"). There is value, however,
in having the spec and being able to for example use it
in Python.
After paying off some technical debt - add a partial
ethtool spec. Partial because the header for ethtool is almost
1000 LoC, so converting in one sitting is tough. But adding
new commands should be trivial now.
Last but not least, I add more docs: I realized that I've been
sending a similar "instructions" email to people working on
new families. It's now intro-specs.rst.
====================
Link: https://lore.kernel.org/r/20230131023354.1732677-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The scripts require Python 3 and some distros are dropping
Python 2 support.
Reported-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We have a bit of documentation about the internals of Netlink
and the specs, but really the goal is for most people to not
worry about those. Add a practical guide for beginners who
want to poke at the specs.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ethtool is one of the most actively developed families.
With the changes to the CLI it should be possible to use
the YNL based code for easy prototyping and development.
Add a partial family definition. I've tested the string
set and rings. I don't have any MAC Merge implementation
to test with, but I added the definition for it, anyway,
because it's last. New commands can simply be added at
the end without having to worry about manually providing
IDs / values.
Set (with notification support - None is the response,
the data is from the notification):
$ sudo ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ethtool.yaml \
--do rings-set \
--json '{"header":{"dev-name":"enp0s31f6"}, "rx":129}' \
--subscribe monitor
None
[{'msg': {'header': {'dev-index': 2, 'dev-name': 'enp0s31f6'},
'rx': 136,
'rx-max': 4096,
'tx': 256,
'tx-max': 4096,
'tx-push': 0},
'name': 'rings-ntf'}]
Do / dump (yes, the kernel requires that even for dump and even
if empty - the "header" nest must be there):
$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ethtool.yaml \
--do rings-get \
--json '{"header":{"dev-index": 2}}'
{'header': {'dev-index': 2, 'dev-name': 'enp0s31f6'},
'rx': 136,
'rx-max': 4096,
'tx': 256,
'tx-max': 4096,
'tx-push': 0}
$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ethtool.yaml \
--dump rings-get \
--json '{"header":{}}'
[{'header': {'dev-index': 2, 'dev-name': 'enp0s31f6'},
'rx': 136,
'rx-max': 4096,
'tx': 256,
'tx-max': 4096,
'tx-push': 0},
{'header': {'dev-index': 3, 'dev-name': 'wlp0s20f3'}, 'tx-push': 0},
{'header': {'dev-index': 19, 'dev-name': 'enp58s0u1u1'},
'rx': 100,
'rx-max': 4096,
'tx-push': 0}]
And error reporting:
$ ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/ethtool.yaml \
--dump rings-get \
--json '{"header":{"flags":5}}'
Netlink error: Invalid argument
nl_len = 68 (52) nl_flags = 0x300 nl_type = 2
error: -22 extack: {'msg': 'reserved bit set',
'bad-attr-offs': 24,
'bad-attr': '.header.flags'}
None
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
I had a (bright?) idea of introducing the concept of enum-models
to account for all the weird ways families enumerate their messages.
I've never finished it because generating C code for each of them
is pretty daunting. But for languages which can use ID values directly
the support is simple enough, so clean this up a bit.
"unified" model is what I recommend going forward.
"directional" model is what ethtool uses.
"notify-split" is used by the proposed DPLL code, but we can just
make them use "unified", it hasn't been merged :)
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The CLI script tries to validate jsonschema by default.
It seems better to validate too many times than too few.
However, when copying the scripts to random servers having
to install jsonschema is tedious. Load jsonschema via
importlib, and let the user opt out.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When I wrote the first version of the Python code I was quite
excited that we can generate class methods directly from the
spec. Unfortunately we need to use valid identifiers for method
names (specifically no dashes are allowed). Don't reuse those
names on the CLI, it's much more natural to use the operation
names exactly as listed in the spec.
Instead of:
./cli --do rings_get
use:
./cli --do rings-get
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
One of my favorite features of the Netlink specs is that they
make decoding structured extack a ton easier.
Implement pretty printing bad attribute names in YNL.
For example it will now say:
'bad-attr': '.header.flags'
rather than the useless:
'bad-attr-offs': 32
Proof:
$ ./cli.py --spec ethtool.yaml --do rings_get \
--json '{"header":{"dev-index":1, "flags":4}}'
Netlink error: Invalid argument
nl_len = 68 (52) nl_flags = 0x300 nl_type = 2
error: -22 extack: {'msg': 'reserved bit set',
'bad-attr': '.header.flags'}
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There's a lot of copy and pasting going on between the "cli"
and code gen when it comes to representing the parsed spec.
Create a library which both can use.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move the CLI code out of samples/ and the library part
of it into tools/net/ynl/lib/. This way we can start
sharing some code with the code gen.
Initially I thought that code gen is too C-specific to
share anything but basic stuff like calculating values
for enums can easily be shared.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
An earlier fix tried to address generated code jumping around from
one code-gen run to another. Turns out dict()s are already
ordered since Python 3.7, the problem is that we iterate over
operation modes using a set(). Sets are unordered in Python.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace the enable_irq_wake() call with one to dev_pm_set_wake_irq()
instead. This will let the dev PM framework automatically manage
the wakeup capability of the ipa IRQ and ensure that userspace requests
to enable/disable wakeup for the IPA via sysfs are respected.
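Roughly (error handling and the surrounding probe code are omitted;
this is a sketch, not the exact diff):

#include <linux/pm_wakeirq.h>

/* before: enable_irq_wake(irq); */
ret = dev_pm_set_wake_irq(dev, irq);    /* dev PM now owns the wake capability */
if (ret)
        dev_err(dev, "error %d registering wake irq\n", ret);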
Signed-off-by: Caleb Connolly <caleb.connolly@linaro.org>
Reviewed-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20230127202758.2913612-1-caleb.connolly@linaro.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When NET_DSA_MICROCHIP_KSZ_COMMON is built-in but PTP is a loadable
module, the ksz_ptp support still causes a link failure:
ld.lld-16: error: undefined symbol: ptp_clock_index
>>> referenced by ksz_ptp.c
>>> drivers/net/dsa/microchip/ksz_ptp.o:(ksz_get_ts_info) in archive vmlinux.a
This can happen if NET_DSA_MICROCHIP_KSZ8863_SMI is enabled, or
even if none of the KSZ9477_I2C/KSZ_SPI/KSZ8863_SMI ones are active
but only the common module is.
The most straightforward way to address this is to move the
dependency to NET_DSA_MICROCHIP_KSZ_PTP itself, which can now
only be enabled if PTP_1588_CLOCK support is reachable
from NET_DSA_MICROCHIP_KSZ_COMMON. Alternatively, one could make
NET_DSA_MICROCHIP_KSZ_COMMON a hidden Kconfig symbol and extend the
PTP_1588_CLOCK_OPTIONAL dependency to NET_DSA_MICROCHIP_KSZ8863_SMI as
well, but that is a little more fragile.
Fixes: eac1ea20261e ("net: dsa: microchip: ptp: add the posix clock support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230130131808.1084796-1-arnd@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Correct spelling problems for Documentation/networking/ as reported
by codespell.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: Jiri Pirko <jiri@nvidia.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20230129231053.20863-5-rdunlap@infradead.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Previously, ibmvnic IRQs were assigned to CPU numbers by assigning all
the IRQs for transmit queues then assigning all the IRQs for receive
queues. With multi-threaded processors, in a heavy RX or TX environment,
physical cores would either be overloaded or underutilized (due to the
IRQ assignment algorithm). This approach is sub-optimal because IRQs for
the same subprocess (RX or TX) would be bound to adjacent CPU numbers,
meaning they were more likely to be contending for the same core.
For example, in a system with 64 CPUs and 32 queues, the IRQs would
be bound to CPU in the following pattern:
IRQ type | CPU number
-----------------------
TX0 | 0-1
TX1 | 2-3
<etc>
RX0 | 32-33
RX1 | 34-35
<etc>
Observe that in SMT-8, the first 4 TX queues would be sharing the
same core.
A more optimal algorithm would balance the number of RX and TX IRQs
across the physical cores. Therefore, to increase performance, distribute
RX and TX IRQs across cores by alternating between assigning IRQs for RX
and TX queues to CPUs.
With a system with 64 CPUs and 32 queues, this results in the following
pattern:
IRQ type | CPU number
-----------------------
TX0 | 0-1
RX0 | 2-3
TX1 | 4-5
RX1 | 6-7
<etc>
Observe that in SMT-8, there is an equal distribution of RX and TX IRQs
per core. In the above case, each core handles 2 TX and 2 RX IRQs.
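A hedged sketch of the alternating layout (the helpers assign_irq_to_cpu(),
tx_irq() and rx_irq() are hypothetical; the real ibmvnic code walks CPU
masks rather than plain indices):

static void example_distribute_irqs(int num_queues, int cpus_per_queue)
{
        int cpu = 0;
        int i;

        for (i = 0; i < num_queues; i++) {
                assign_irq_to_cpu(tx_irq(i), cpu);      /* TXi */
                cpu += cpus_per_queue;
                assign_irq_to_cpu(rx_irq(i), cpu);      /* RXi */
                cpu += cpus_per_queue;
        }
}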
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Haren Myneni <haren@linux.ibm.com>
Link: https://lore.kernel.org/r/20230127214358.318152-1-nnac123@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Colin Foster says:
====================
add support for the vsc7512 internal copper phys
This patch series is a continuation to add support for the VSC7512:
https://patchwork.kernel.org/project/netdevbpf/list/?series=674168&state=*
That series added the framework and initial functionality for the
VSC7512 chip. Several of these patches grew during the initial
development of the framework, which is why v1 will include changelogs.
It was during v9 of that original MFD patch set that these were dropped.
With that out of the way, the VSC7512 is mainly a subset of the VSC7514
chip. The 7512 lacks an internal MIPS processor, but otherwise many of
the register definitions are identical. That is why several of these
patches are simply to expose common resources from
drivers/net/ethernet/mscc/*.
This patch only adds support for the first four ports (swp0-swp3). The
remaining ports require more significant changes to the felix driver,
and will be handled in the future.
====================
Link: https://lore.kernel.org/r/20230127193559.1001051-1-colin.foster@in-advantage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Utilize the existing ocelot MFD interface to add switch functionality to
the Microsemi VSC7512 chip.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Acked-for-MFD-by: Lee Jones <lee@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add control of an external VSC7512 chip.
Currently the four copper phy ports are fully functional. Communication to
external phys is also functional, but the SGMII / QSGMII interfaces are
currently non-functional.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The main purpose of the Ocelot chips is Ethernet switching
functionality. Document the support for these features.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The VSC7511, VSC7512, VSC7513 and VSC7514 all have the ability to be
controlled either internally by a memory-mapped CPU, or externally via
interfaces like SPI and PCIe. The internal CPUs of the VSC7511 and 7512
don't have the resources to run Linux, so they must be controlled via these
external interfaces in a DSA configuration.
Add mscc,vsc7512-switch compatible string to indicate that the chips are
being controlled externally in a DSA configuration.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The *_RES_SIZE macros are initially <= 0x100. Future resource sizes will be
upwards of 0x200000.
To keep things clean, fully align the RES_SIZE macros to 32 bits; this does
nothing more than make the code more consistent.
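For illustration only (register names and values below are made up):

/* before: short form */
#define VSC7512_EXAMPLE_RES_SIZE_OLD    0x100
/* after: padded to a consistent 32-bit width, like the larger regions */
#define VSC7512_EXAMPLE_RES_SIZE        0x00000100
#define VSC7512_LARGE_RES_SIZE          0x00200000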
Signed-off-by: Colin Foster <colin.foster@in-advantage.com>
Acked-for-MFD-by: Lee Jones <lee@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # regression
Signed-off-by: Jakub Kicinski <kuba@kernel.org>