iv/linux

Author	SHA1	Message	Date
Liu Shixin	b8f6b0522c	netlabel: Fix memory leak in netlbl_mgmt_add_common Hulk Robot reported a memory leak in netlbl_mgmt_add_common. The problem is that the map is not freed when netlbl_domhsh_add() fails. BUG: memory leak unreferenced object 0xffff888100ab7080 (size 96): comm "syz-executor537", pid 360, jiffies 4294862456 (age 22.678s) hex dump (first 32 bytes): 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ fe 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 ................ backtrace: [<0000000008b40026>] netlbl_mgmt_add_common.isra.0+0xb2a/0x1b40 [<000000003be10950>] netlbl_mgmt_add+0x271/0x3c0 [<00000000c70487ed>] genl_family_rcv_msg_doit.isra.0+0x20e/0x320 [<000000001f2ff614>] genl_rcv_msg+0x2bf/0x4f0 [<0000000089045792>] netlink_rcv_skb+0x134/0x3d0 [<0000000020e96fdd>] genl_rcv+0x24/0x40 [<0000000042810c66>] netlink_unicast+0x4a0/0x6a0 [<000000002e1659f0>] netlink_sendmsg+0x789/0xc70 [<000000006e43415f>] sock_sendmsg+0x139/0x170 [<00000000680a73d7>] ____sys_sendmsg+0x658/0x7d0 [<0000000065cbb8af>] ___sys_sendmsg+0xf8/0x170 [<0000000019932b6c>] __sys_sendmsg+0xd3/0x190 [<00000000643ac172>] do_syscall_64+0x37/0x90 [<000000009b79d6dc>] entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: `63c4168874` ("netlabel: Add network address selectors to the NetLabel/LSM domain mapping") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 11:19:04 -07:00
David S. Miller	f0c227c7df	Merge tag 'mlx5-updates-2021-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2021-06-14 1) Trivial Lag refactoring in preparation for the upcoming Single FDB lag feature - First 3 patches 2) Scalable IRQ distribution for Sub-functions A subfunction (SF) is a lightweight function that has a parent PCI function (PF) on which it is deployed. Currently, mlx5 subfunctions share the IRQs (MSI-X) with their parent PCI function. Before this series the PF allocates enough IRQs to cover all the cores in a system, and newly created SFs re-use all the IRQs that the PF has allocated for itself. Hence, the more SFs are created, the more EQs there are per IRQ. Therefore, whenever we handle an interrupt, we need to poll all SF EQs and PF EQs, instead of only the PF EQs as on a system without SFs. This has a severe impact on the performance of SFs and PF. For example, on a machine with: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz with 56 cores. PCI Express 3 with BW of 126 Gb/s. ConnectX-5 Ex; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; PCIe4.0 x16. Test case: iperf TX BW single CPU, affinity of app and IRQ are the same. PF only: no SFs on the system, 56 IRQs. SF (before): 250 SFs sharing the same 56 IRQs. SF (now): 250 SFs + 255 available IRQs for the NIC (please see IRQ spread scheme below). Results (application iperf TX, SF-IRQ and channel affinity all cpu={0}): PF only: 79 Gb/sec, 8200 interrupts/sec; SF (before): 51.3 Gb/sec (-35%), 9500 interrupts/sec; SF (now): 78 Gb/sec (-2%), 8200 interrupts/sec. command: $ taskset -c 0 iperf -c 11.1.1.1 -P 3 -i 6 -t 30 | grep SUM The difference between the SF examples is that before this series we allocated num_cpus (56) IRQs, and all of them were shared among the PF and the SFs. After this series, we allocate 255 IRQs and spread the SFs among them. This has significantly decreased the load on each IRQ, and the number of EQs per IRQ is down by 95% (251->11). In this patchset the proposed solution is to have a dedicated IRQ pool for SFs to use. The pool will allocate a large number of IRQs for SFs to grab from in order to minimize IRQ sharing between the different SFs. IRQs will not be requested from the OS until they are first requested by an SF consumer, and will eventually be released when the last SF consumer releases them. For the detailed IRQ spread and allocation scheme please see the last patch: ("net/mlx5: Round-Robin EQs over IRQs") ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 11:14:21 -07:00
David S. Miller	08ab4d7441	Merge branch 'occteontx2-rate-limit-offload' Subbaraya Sundeep says: ==================== octeontx2: Add ingress ratelimit offload This patchset adds ingress rate limiting hardware offload support for CN10K silicons. Police actions are added for TC matchall and flower filters. CN10K has an ingress rate limiting feature where a receive queue is mapped to a bandwidth profile and the profile is configured with rate and burst parameters by software. CN10K hardware supports three levels of ingress policing or ratelimiting. Multiple leaf profiles can point to a single mid level profile and multiple mid level profiles can point to a single top level one. Only leaf level profiles are used for configuring rate limiting. Patch 1 adds the new bandwidth profile contexts in AF driver similar to other hardware contexts Patch 2 adds the debugfs changes to dump bandwidth profile contexts Patch 3 adds support for police action with TC matchall filter Patch 4 uses NL_SET_ERR_MSG_MOD for tc code Patch 5 adds support for police action with TC flower filter ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 11:11:05 -07:00
Subbaraya Sundeep	68fbff68db	octeontx2-pf: Add police action for TC flower Added police action for ingress TC flower hardware offload. With this rate limiting can be done per flow. Since rate limiting is tied to RQs in hardware the number of TC flower filters with action as police is limited to number of receive queues of the interface. Both bps and pps modes are supported. Examples to rate limit a flow: $ ethtool -K eth0 hw-tc-offload on $ tc qdisc add dev eth0 ingress $ tc filter add dev eth0 parent ffff: protocol ip \ flower ip_proto udp dst_port 80 action \ police rate 100Mbit burst 32Kbit $ tc filter add dev eth0 parent ffff: \ protocol ip flower dst_mac 5e:b2:34:ee:29:49 \ action police pkts_rate 5000 pkts_burst 2048 Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 11:11:05 -07:00
Subbaraya Sundeep	5d2fdd86d5	octeontx2-pf: Use NL_SET_ERR_MSG_MOD for TC This patch modifies all netdev_err messages in tc code to NL_SET_ERR_MSG_MOD. NL_SET_ERR_MSG_MOD does not support format specifiers yet hence netdev_err messages with only strings are modified. Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 11:11:05 -07:00
Sunil Goutham	2ca89a2c37	octeontx2-pf: TC_MATCHALL ingress ratelimiting offload Add TC_MATCHALL ingress ratelimiting offload support with POLICE action for entire traffic coming into the interface. Eg: To ratelimit ingress traffic to 100Mbps $ ethtool -K eth0 hw-tc-offload on $ tc qdisc add dev eth0 clsact $ tc filter add dev eth0 ingress matchall skip_sw \ action police rate 100Mbit burst 32Kbit To support this, a leaf level bandwidth profile is allocated and all RQs' contexts used by this interface are updated to point to it. And the leaf level bandwidth profile is configured with user specified rate and burst sizes. Co-developed-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 11:11:05 -07:00
Sunil Goutham	e7d8971763	octeontx2-af: cn10k: Debugfs support for bandwidth profiles Added support for dumping current resource status of bandwidth profiles and contexts of allocated profiles via debugfs. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 11:11:05 -07:00
Sunil Goutham	e8e095b3b3	octeontx2-af: cn10k: Bandwidth profiles config support CN10K silicons support hierarchical ingress packet ratelimiting. There are 3 levels of profilers supported: leaf, mid and top. Ratelimiting is done after the packet forwarding decision is taken and a NIXLF's RQ is identified to DMA the packet. The RQ's context points to a leaf bandwidth profile which can be configured to achieve the desired ratelimit. This patch adds logic for management of these bandwidth profiles, i.e. profile alloc, free, context update etc. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 11:11:05 -07:00
David S. Miller	ad5645d7b9	Merge branch 'pci200syn-cleanups' Peng Li says: ==================== net: pci200syn: clean up some code style issues This patchset cleans up some code style issues. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 11:03:17 -07:00
Peng Li	6855d301e9	net: pci200syn: fix the comments style issue Networking block comments don't use an empty /* line, use /* Comment... This patch fixes the comments style issues. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 11:03:17 -07:00
Peng Li	8e7680c102	net: pci200syn: add necessary () to macro argument Macro argument 'card' may be better as '(card)' to avoid precedence issues. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 11:03:17 -07:00
Peng Li	2b63744668	net: pci200syn: add some required spaces Add spaces required after that close brace '}'. Add spaces required before the open parenthesis '('. Add spaces required after that ','. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 11:03:17 -07:00
Peng Li	b9282333ef	net: pci200syn: replace comparison to NULL with "!card" According to checkpatch.pl, comparison to NULL could be written "!card". Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 11:03:17 -07:00
Peng Li	f9a03eae28	net: pci200syn: add blank line after declarations This patch fixes the checkpatch error about missing a blank line after declarations. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 11:03:17 -07:00
Peng Li	bbcb2840b0	net: pci200syn: remove redundant blank lines This patch removes some redundant blank lines. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 11:03:16 -07:00
David S. Miller	5938b227ca	Merge branch 'z85230-cleanups' Peng Li says: ==================== net: z85230: clean up some code style issues This patchset cleans up some code style issues. --- Change Log: V1 -> V2: 1, fix the comments from Andrew, add commit message to [patch 04/11] about removing volatile. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 10:55:18 -07:00
Peng Li	2b28b711ac	net: z85230: remove unnecessary out of memory message This patch removes unnecessary out of memory message, to fix the following checkpatch.pl warning: "WARNING: Possible unnecessary 'out of memory' message" Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 10:55:18 -07:00
Peng Li	00a580db9e	net: z85230: fix the code style issue about open brace { This patch fixes the code style issue according to checkpatch.pl error: "ERROR: that open brace { should be on the previous line". Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 10:55:18 -07:00
Peng Li	b87a5cf656	net: z85230: add some required spaces Add space required before the open parenthesis '(' and '{'. Add space required after that close brace '}' and ','. Add spaces required around that '=', '&', '*', '|', '+', '/' and '-'. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 10:55:18 -07:00
Peng Li	a04544ffe8	net: z85230: remove trailing whitespaces This patch removes trailing whitespaces. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 10:55:18 -07:00
Peng Li	57b6de35cf	net: z85230: fix the code style issue about "if..else.." According to checkpatch.pl, else should follow close brace '}', and braces {} should be used on all arms of this statement. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 10:55:18 -07:00
Peng Li	c6c3ba4578	net: z85230: fix the comments style issue Networking block comments don't use an empty /* line, use /* Comment... Block comments use * on subsequent lines. Block comments use a trailing */ on a separate line. This patch fixes the comments style issues. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 10:55:18 -07:00
Peng Li	b55932bcfa	net: z85230: replace comparison to NULL with "!skb" According to checkpatch.pl, comparison to NULL could be written "!skb". Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 10:55:18 -07:00
Peng Li	e07a1f9cbd	net: z85230: fix the code style issue about EXPORT_SYMBOL(foo) According to checkpatch.pl, EXPORT_SYMBOL(foo); should immediately follow its function/variable. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 10:55:18 -07:00
Peng Li	61312d78e1	net: z85230: add blank line after declarations This patch fixes the checkpatch error about missing a blank line after declarations. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 10:55:18 -07:00
Peng Li	336bac5eda	net: z85230: remove redundant blank lines This patch removes some redundant blank lines. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 10:55:18 -07:00
Boris Sukholitko	0dca2c7404	net/sched: cls_flower: Remove match on n_proto The following flower filters fail to match packets: tc filter add dev eth0 ingress protocol 0x8864 flower \ action simple sdata hi64 tc filter add dev eth0 ingress protocol 802.1q flower \ vlan_ethtype 0x8864 action simple sdata "hi vlan" The protocol 0x8864 (ETH_P_PPP_SES) is a tunnel protocol. As such, it is being dissected by __skb_flow_dissect and its internal protocol is being set as key->basic.n_proto. IOW, the existence of the ETH_P_PPP_SES tunnel is transparent to the callers of __skb_flow_dissect. OTOH, in the filters above, cls_flower configures its key->basic.n_proto to the ETH_P_PPP_SES value configured by the user. Matching on this key fails because of the __skb_flow_dissect "transparency" mentioned above. In the following, I would argue that the problem lies with cls_flower unnecessarily attempting the key->basic.n_proto match. There are 3 close places in fl_set_key in cls_flower setting up mask->basic.n_proto. They are (in reverse order of appearance in the code) due to: (a) No vlan is given: use TCA_FLOWER_KEY_ETH_TYPE parameter (b) One vlan tag is given: use TCA_FLOWER_KEY_VLAN_ETH_TYPE (c) Two vlans are given: use TCA_FLOWER_KEY_CVLAN_ETH_TYPE The match in case (a) is unneeded because flower has no eth_type parameter of its own. It was removed by Jamal Hadi Salim in commit 488b41d020fb06428b90289f70a41210718f52b7 in iproute2. For TCA_FLOWER_KEY_ETH_TYPE the userspace uses the generic tc filter protocol field. Therefore the match for case (a) is done by tc itself. The matches in cases (b) and (c) are unneeded because the protocol will appear in and will be matched by flow_dissector_key_vlan.vlan_tpid. Therefore in the best case, key->basic.n_proto will try to repeat the vlan key match again. The below patch removes the mask->basic.n_proto setting and resets it to 0 in case (c). Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 10:26:51 -07:00
Matteo Croce	a955318fe6	stmmac: align RX buffers On RX an SKB is allocated and the received buffer is copied into it. But on some architectures, the memcpy() needs the source and destination buffers to have the same alignment to be efficient. This is not our case, because SKB data pointer is misaligned by two bytes to compensate the ethernet header. Align the RX buffer the same way as the SKB one, so the copy is faster. An iperf3 RX test gives a decent improvement on a RISC-V machine: before: [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 733 MBytes 615 Mbits/sec 88 sender [ 5] 0.00-10.01 sec 730 MBytes 612 Mbits/sec receiver after: [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 1.10 GBytes 942 Mbits/sec 0 sender [ 5] 0.00-10.00 sec 1.09 GBytes 940 Mbits/sec receiver And the memcpy() overhead during the RX drops dramatically. before: Overhead Shared O Symbol 43.35% [kernel] [k] memcpy 33.77% [kernel] [k] __asm_copy_to_user 3.64% [kernel] [k] sifive_l2_flush64_range after: Overhead Shared O Symbol 45.40% [kernel] [k] __asm_copy_to_user 28.09% [kernel] [k] memcpy 4.27% [kernel] [k] sifive_l2_flush64_range Signed-off-by: Matteo Croce <mcroce@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-15 10:25:18 -07:00
Shay Drory	c36326d38d	net/mlx5: Round-Robin EQs over IRQs Whenever users provide affinity for an EQ creation request, map the EQ to a matching IRQ. Matching IRQ = an IRQ with the same affinity and type (completion/control) as the EQ being created. This mapping is done in an aggressive dedicated IRQ allocation scheme, which is described below. First, we check whether there is a matching IRQ whose min threshold is not exhausted. - min_eqs_threshold = 3 for control EQ. - min_eqs_threshold = 1 for completion EQ. In case no matching IRQ was found, try to request a new IRQ. In case we can't request a new IRQ, reuse the least-used matching IRQ. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-06-14 20:58:00 -07:00
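The three-step IRQ selection described in c36326d38d can be sketched roughly as follows. This is an illustrative Python model, not the driver code: the names `Irq`, `IrqPool`, `map_eq` and `capacity` are hypothetical, and "threshold not exhausted" is assumed here to mean that the IRQ currently serves fewer EQs than `min_eqs_threshold` for its type.

```python
from dataclasses import dataclass, field
from typing import List

# Thresholds quoted in the commit message.
MIN_EQS_THRESHOLD = {"control": 3, "completion": 1}


@dataclass
class Irq:
    affinity: frozenset  # CPUs this IRQ is bound to
    kind: str            # "control" or "completion"
    eqs: int = 0         # number of EQs currently mapped to this IRQ


@dataclass
class IrqPool:
    capacity: int                      # hypothetical cap on IRQs in the pool
    irqs: List[Irq] = field(default_factory=list)

    def map_eq(self, affinity, kind) -> Irq:
        affinity = frozenset(affinity)
        matching = [i for i in self.irqs
                    if i.affinity == affinity and i.kind == kind]
        # Step 1: a matching IRQ whose threshold is not exhausted (assumed: eqs < threshold).
        irq = next((i for i in matching if i.eqs < MIN_EQS_THRESHOLD[kind]), None)
        # Step 2: otherwise try to request a new IRQ.
        if irq is None and len(self.irqs) < self.capacity:
            irq = Irq(affinity, kind)
            self.irqs.append(irq)
        # Step 3: otherwise reuse the least-used matching IRQ.
        if irq is None:
            if not matching:
                raise RuntimeError("no IRQ available for this affinity/type")
            irq = min(matching, key=lambda i: i.eqs)
        irq.eqs += 1
        return irq


pool = IrqPool(capacity=2)
for _ in range(7):
    pool.map_eq({0}, "control")
print([i.eqs for i in pool.irqs])  # [4, 3]
```

With a pool capped at two IRQs, the first three control EQs share one IRQ, the next three go to a newly requested IRQ, and the seventh falls back to the least-used existing one.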
Shay Drory	c8ea212bfd	net/mlx5: Separate between public and private API of sf.h Move mlx5_sf_max_functions() and friends from the private sf/sf.h to the public lib/sf.h. This is done in order to have one direction include paths. Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-06-14 20:58:00 -07:00
Shay Drory	3af26495a2	net/mlx5: Enlarge interrupt field in CREATE_EQ FW is now supporting more than 256 MSI-X per PF (up to 2K). Hence, enlarge interrupt field in CREATE_EQ to make use of the new MSI-X's. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-06-14 20:58:00 -07:00
Shay Drory	71e084e264	net/mlx5: Allocating a pool of MSI-X vectors for SFs SFs (Sub Functions) currently use IRQs from the global IRQ table of their parent Physical Function. In order to scale better, we need to allocate more IRQs and share them between different SFs. The driver will maintain 3 separate IRQ pools: 1. A pool that serves the PF consumers (PF's netdev, rdma stacks), similar to what the driver had before this patch, i.e. this pool will share IRQs between rdma and netdev, and will keep the IRQ indexes and allocation order. The latter is important for PF netdev rmap (aRFS). 2. A pool of control IRQs for SFs. The size of this pool is the number of SFs that can be created divided by SFS_PER_IRQ. This pool will serve the control path EQs of the SFs. 3. A pool of completion data path IRQs for SFs transport queues. The size of this pool is: num_irqs_allocated - pf_pool_size - sf_ctrl_pool_size. This pool will serve netdev and rdma stacks. Moreover, rmap is not supported on SFs. The sharing methodology of the SF pools is explained in the next patch. Important note: rmap is not supported on SFs because rmap mapping cannot function correctly for IRQs that are shared between different core/netdev RX rings. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-06-14 20:58:00 -07:00
Shay Drory	fc63dd2a85	net/mlx5: Change IRQ storage logic from static to dynamic Store newly created IRQs in the xarray DB instead of a static array, so we will be able to store only IRQs which are being used. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-06-14 20:57:59 -07:00
Shay Drory	2d74524c01	net/mlx5: Moving rmap logic to EQs IRQs are being simplified in order to ease their sharing and any feature specific object will be moved to upper layer. Hence we move rmap object into eq_table. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-06-14 20:57:59 -07:00
Shay Drory	e8abebb3a4	net/mlx5: Extend mlx5_irq_request to request IRQ from the kernel Extend mlx5_irq_request so that IRQs will be requested upon EQ creation, and not on driver boot. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-06-14 20:57:59 -07:00
Shay Drory	2de6153837	net/mlx5: Removing rmap per IRQ In next patches, IRQs will be requested according to demand, instead of statically on driver boot. Also, currently, rmap is managed by the IRQ layer. rmap management will move out from the IRQ layer in future patches. Therefore, we want to remove the IRQ from the rmap, when IRQ is destroyed, instead of removing all the IRQs from the rmap when irq_table is destroyed. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-06-14 20:57:58 -07:00
Leon Romanovsky	652e3581f2	net/mlx5: Clean license text in eq.[c|h] files The eq.[c|h] files are under major rewrite, so use this opportunity to update their copyright and license texts. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-06-14 20:57:58 -07:00
Leon Romanovsky	e4e3f24b82	net/mlx5: Provide cpumask at EQ creation phase The users of EQ are running their code on different CPUs and with various affinity patterns. Move the cpumask setting close to their actual usage. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-06-14 20:57:57 -07:00
Shay Drory	3b43190b2f	net/mlx5: Introduce API for request and release IRQs Introduce new API that will allow IRQs users to hold a pointer to mlx5_irq. In the end of this series, IRQs will be allocated on demand. Hence, this will allow us to properly manage and use IRQs. Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-06-14 20:57:57 -07:00
Leon Romanovsky	c38421abcf	net/mlx5: Delay IRQ destruction till all users are gone Shared IRQ are consumed by multiple EQ users and in order to properly initialize and later release such IRQs, we add kref counting of IRQ structure. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-06-14 20:57:57 -07:00
Mark Bloch	8a66e45859	net/mlx5: Change ownership model for lag Lag is used to combine two PCI functions of the same HCA into a single logical unit. This is a core functionality and as such should be managed by the core driver. Currently this isn't the case. While we store the lag software structure inside the lower device, its lifetime (creation / destruction) is dictated by the mlx5e part. Change the ownership model so lag is tied to the lifetime of the lower level driver instead of the mlx5e part. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-06-14 20:57:56 -07:00
Mark Bloch	8ed19471fd	net/mlx5: Lag, Don't rescan if the device is going down If MLX5_PRIV_FLAGS_DISABLE_ALL_ADEV is set it means the device is going down and mlx5_rescan_drivers_locked() shouldn't be called. With this patch and the previous one in the series, unbinding a PCI function when its netdev is part of a bond works and leaves the system in a working state. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-06-14 20:57:56 -07:00
Mark Bloch	8c22ad36ee	net/mlx5: Lag, refactor disable flow When a net device is removed (can happen if the PCI function is unbound from the system) it's not enough to destroy the hardware lag. The system should recreate the original devices that were present before the lag. As the same flow is done when a net device is removed from the bond, refactor and reuse the code. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2021-06-14 20:57:56 -07:00
Loic Poulain	89212e160b	net: wwan: Fix WWAN config symbols There is no strong reason to have both WWAN and WWAN_CORE symbols. Let's build the WWAN core framework when WWAN is selected, in the same way as for other subsystems. This fixes an issue with mhi_net selecting WWAN_CORE without WWAN, reported by kernel test robot: Kconfig warnings: (for reference only) WARNING: unmet direct dependencies detected for WWAN_CORE Depends on NETDEVICES && WWAN Selected by - MHI_NET && NETDEVICES && NET_CORE && MHI_BUS Fixes: `9a44c1cc63` ("net: Add a WWAN subsystem") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-14 13:17:10 -07:00
Vladimir Oltean	ec13357263	net: flow_dissector: fix RPS on DSA masters After the blamed patch, __skb_flow_dissect() on the DSA master stopped adjusting for the length of the DSA headers. This is because it was told to adjust only if the needed_headroom is zero, aka if there is no DSA header. Of course, the adjustment should be done only if there _is_ a DSA header. Modify the comment too so it is clearer. Fixes: `4e50025129` ("net: dsa: generalize overhead for taggers that use both headers and trailers") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-14 13:15:22 -07:00
Vladimir Oltean	3009e8aa85	net: dsa: sja1105: constify the sja1105_regs structures The struct sja1105_regs tables are not modified during the runtime of the driver, so they can be made constant. In fact, struct sja1105_info already holds a const pointer to these. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-14 13:14:24 -07:00
David S. Miller	0b703008b5	Merge branch 'tja1103-improvewmentsa' Vladimir Oltean says: ==================== Fixes and improvements to TJA1103 PHY driver This series contains: - an erratum workaround for the TJA1103 PHY integrated in SJA1110 - an adaptation of the driver so it prints less unnecessary information when probing on SJA1110 - a PTP RX timestamping bug fix and a clarification patch Targeting net-next since the PHY support is currently in net-next only. Changes in v3: Added one more patch which improves the readability of nxp_c45_reconstruct_ts. Changes in v2: Added a comment to the hardware workaround procedure. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-14 13:12:59 -07:00
Vladimir Oltean	0b5f0f29b1	net: phy: nxp-c45-tja11xx: enable MDIO write access to the master/slave registers The SJA1110 switch integrates TJA1103 PHYs, but in SJA1110 switch rev B silicon, there is a bug in that the registers for selecting the 100base-T1 autoneg master/slave roles are not writable. To enable write access to the master/slave registers, these additional PHY writes are necessary during initialization. The issue has been corrected in later SJA1110 silicon versions and is not present in the standalone PHY variants, but applying the workaround unconditionally in the driver should not do any harm. Suggested-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-14 13:12:59 -07:00
Vladimir Oltean	109258ed62	net: phy: nxp-c45-tja11xx: fix potential RX timestamp wraparound The reconstruction procedure for partial timestamps reads the current PTP time and fills in the low 2 bits of the second portion, as well as the nanoseconds portion, from the actual hardware packet timestamp. Critically, the reconstruction procedure works because it assumes that the current PTP time is strictly larger than the hardware timestamp was: it detects a 2-bit wraparound of the 'seconds' portion by checking whether the 'seconds' portion of the partial hardware timestamp is larger than the 'seconds' portion of the current time. That can only happen if the hardware timestamp was captured by the PHY during the last phase of a 'modulo 4 seconds' interval, and the current PTP time was read by the driver during the initial phase of the next 'modulo 4 seconds' interval. The partial RX timestamps are added to priv->rx_queue in nxp_c45_rxtstamp() and they are processed potentially in parallel by the aux worker thread in nxp_c45_do_aux_work(). This means that it is possible for nxp_c45_do_aux_work() to process more than one RX timestamp during the same schedule. There is one premature optimization that will cause issues: for RX timestamping, the driver reads the current time only once, and it uses that to reconstruct all PTP RX timestamps in the queue. For the second and later timestamps, this will be an issue if we are processing two RX timestamps which are to the left and to the right, respectively, of a 2-bit wraparound of the 'seconds' portion of the PTP time, and the current PTP time is also pre-wraparound. [Timeline diagram: time axis from 0.000000000 to 12.000000000 with a marker every 4 seconds; events in time order: hwts 1, read current PTP time, hwts 2, process hwts 1 and hwts 2] What will happen in that case is that hwts 2 (post-wraparound) will use a stale current PTP time that is pre-wraparound. But nxp_c45_reconstruct_ts will not detect this condition, because it is not coded up for it, so it will reconstruct hwts 2 with a current time from the previous 4 second interval (i.e. 0.something instead of 4.something). This is solvable by making sure that the full 64-bit current time is always read after the PHY has taken the partial RX timestamp. We do this by reading the current PTP time for every timestamp in the RX queue. Fixes: `514def5dd3` ("phy: nxp-c45-tja11xx: add timestamping support") Cc: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-14 13:12:59 -07:00
Vladimir Oltean	661fef5698	net: phy: nxp-c45-tja11xx: express timestamp wraparound interval in terms of TS_SEC_MASK nxp_c45_reconstruct_ts() takes a partial hardware timestamp in @hwts, with 2 bits of the 'seconds' portion, and a full PTP time in @ts. It patches in the lower bits of @hwts into @ts, and to ensure that the reconstructed timestamp is correct, it checks whether the lower 2 bits of @hwts are not in fact higher than the lower 2 bits of @ts. This is not logically possible because, according to the calling convention, @ts was collected later in time than @hwts, but due to two's complement arithmetic it can actually happen, because the current PTP time might have wrapped around between when @hwts was collected and when @ts was, yielding the lower 2 bits of @ts smaller than those of @hwts. To correct for that situation which is expected to happen under normal conditions, the driver subtracts exactly one wraparound interval from the reconstructed timestamp, since the upper bits of that need to correspond to what the upper bits of @hwts were, not to what the upper bits of @ts were. Readers might be confused because the driver denotes the amount of bits that the partial hardware timestamp has to offer as TS_SEC_MASK (timestamp mask for seconds). But it subtracts a seemingly unrelated BIT(2), which is in fact more subtle: if the hardware timestamp provides 2 bits of partial 'seconds' timestamp, then the wraparound interval is 2^2 == BIT(2). But nonetheless, it is better to express the wraparound interval in terms of a definition we already have, so replace BIT(2) with 1 + TS_SEC_MASK which produces the same result but is clearer. Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Cc: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-14 13:12:59 -07:00
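
The timestamp reconstruction described in the two nxp-c45-tja11xx commits above can be sketched in user-space C as follows. This is a minimal illustration, not the driver code: `struct ts` and `reconstruct_ts()` are hypothetical stand-ins for the kernel's types and for nxp_c45_reconstruct_ts(), and the value of `TS_SEC_MASK` is an assumption that follows the "2 bits of the 'seconds' portion" wording in the commit messages.

```c
#include <stdint.h>

/* Assumption: the hardware provides 2 bits of partial 'seconds'. */
#define TS_SEC_MASK 0x3u

/* Hypothetical stand-in for a full PTP time. */
struct ts {
    int64_t sec;
    uint32_t nsec;
};

/*
 * Patch the partial hardware timestamp (hw_sec, hw_nsec) into the full
 * current time 'now', which must have been read AFTER the hardware
 * timestamp was captured.
 */
static struct ts reconstruct_ts(struct ts now, uint8_t hw_sec, uint32_t hw_nsec)
{
    struct ts out = now;

    out.nsec = hw_nsec;
    /* Low bits of 'now' smaller than those of the hardware timestamp:
     * the clock wrapped in between, so go back one wraparound interval. */
    if ((now.sec & TS_SEC_MASK) < (hw_sec & TS_SEC_MASK))
        out.sec -= TS_SEC_MASK + 1;
    /* Keep the upper bits, take the low bits from the hardware. */
    out.sec &= ~(int64_t)TS_SEC_MASK;
    out.sec |= hw_sec & TS_SEC_MASK;
    return out;
}
```

With a current time of 5 s and a partial timestamp whose low 'seconds' bits are 3, the sketch yields 3 s, because the wraparound is detected. With a stale current time of 3 s and a partial timestamp captured at 4.2 s (low bits 0), it yields 0 s. That is the "0.something instead of 4.something" failure the fix avoids by reading the current time for every queued timestamp.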

1015708 Commits