Propagate nullness information for branches of register to register
equality compare instructions. The following rules are used:
- suppose register A may be null
- suppose register B is not null
- for JNE A, B, ... - A is not null in the false branch
- for JEQ A, B, ... - A is not null in the true branch
E.g. for a program like the one below:
r6 = skb->sk;
r7 = sk_fullsock(r6);
r0 = sk_fullsock(r6);
if (r0 == 0) return 0; (a)
if (r0 != r7) return 0; (b)
*r7->type; (c)
return 0;
It is safe to dereference r7 at point (c), because of (a) and (b).
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221115224859.2452988-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
For queueing packets in XDP we want to add a new redirect map type with
support for 64-bit indexes. To prepare for this, expand the width of the
'key' argument to the bpf_redirect_map() helper. Since BPF registers are
always 64-bit, this should be safe to do after the fact.
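A hedged sketch of the resulting helper prototype as seen from BPF
program C code (the exact header wording may differ):

    /* Before the change the key was a 32-bit index:
     *   long bpf_redirect_map(void *map, __u32 key, __u64 flags);
     * After widening, the key occupies the full 64-bit register: */
    long bpf_redirect_map(void *map, __u64 key, __u64 flags);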
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221108140601.149971-3-toke@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Move the received_rps counter value next to the other RPS-related members
in softnet_data. This closes two four-byte holes in the structure, making
room for another pointer in the first two cache lines without bumping the
xmit struct to its own line.
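As a generic illustration of the effect (made-up field names, not the
actual softnet_data layout), grouping two 4-byte members together
closes the padding holes:

    struct before {
            void *p1;        /* 8 bytes */
            unsigned int a;  /* 4 bytes + 4-byte hole */
            void *p2;        /* 8 bytes */
            unsigned int b;  /* 4 bytes + 4-byte tail hole */
    };                       /* 32 bytes, two holes */

    struct after {
            void *p1;        /* 8 bytes */
            unsigned int a;  /* 4 bytes */
            unsigned int b;  /* 4 bytes, holes closed */
            void *p2;        /* 8 bytes */
    };                       /* 24 bytes, no holes */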
Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221108140601.149971-2-toke@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This change documents the process for running the BPF CI before
submitting a patch to the upstream mailing list, similar to what happens
if a patch is sent to bpf@vger.kernel.org: it builds the kernel and
selftests and runs the latter on different architectures (but it notably
does not cover stylistic checks such as cover letter verification).
Running BPF CI this way can help achieve better test coverage ahead of
patch submission than merely running locally (say, using
tools/testing/selftests/bpf/vmtest.sh), as additional architectures may
be covered as well.
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221114211501.2068684-1-deso@posteo.net
Instead of having to pass multiple arguments that describe the register,
pass the bpf_reg_state into the btf_struct_access callback. Currently,
all call sites simply reuse the btf and btf_id of the reg they want to
check the access of. The only exception to this pattern is the callsite
in check_ptr_to_map_access, hence for that case create a dummy reg to
simulate PTR_TO_BTF_ID access.
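A hedged sketch of the reshaped callback (the exact parameter list in
the tree may differ slightly):

    /* The callback now receives the register state itself instead of
     * separate btf/btf_id arguments describing it: */
    int (*btf_struct_access)(struct bpf_verifier_log *log,
                             const struct bpf_reg_state *reg,
                             int off, int size,
                             enum bpf_access_type atype,
                             u32 *next_btf_id,
                             enum bpf_type_flag *flag);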
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221114191547.1694267-8-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, the verifier uses the MEM_ALLOC type tag to specially tag
memory returned from the bpf_ringbuf_reserve helper. However, the tag
is only used for this purpose, and there is an implicit assumption that
it only refers to ringbuf memory (e.g. the check for
ARG_PTR_TO_ALLOC_MEM in check_func_arg_reg_off).
Hence, rename MEM_ALLOC to MEM_RINGBUF to indicate this special
relationship and instead open the use of MEM_ALLOC for more generic
allocations made for user types.
Also, since ARG_PTR_TO_ALLOC_MEM_OR_NULL is unused, simply drop it.
Finally, update selftests using 'alloc_' verifier string to 'ringbuf_'.
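A hedged sketch of the resulting flag split (bit positions are
illustrative, not the tree's actual values):

    /* MEM_RINGBUF now tags bpf_ringbuf_reserve() memory, while
     * MEM_ALLOC is freed up for generic allocated user types. */
    enum {
            MEM_RINGBUF_SKETCH = (1 << 2), /* ringbuf-reserved memory */
            MEM_ALLOC_SKETCH   = (1 << 3), /* generic allocated object */
    };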
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221114191547.1694267-7-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, the verifier has two return types, RET_PTR_TO_ALLOC_MEM and
RET_PTR_TO_ALLOC_MEM_OR_NULL; however, the former is confusingly named
to imply that it carries MEM_ALLOC, while only the latter does. This
causes confusion during code review, leading to conclusions such as
that the return value of RET_PTR_TO_DYNPTR_MEM_OR_NULL (which is
RET_PTR_TO_ALLOC_MEM | PTR_MAYBE_NULL) may be consumable by
bpf_ringbuf_{submit,commit}.
Rename it to make it clear that MEM_ALLOC needs to be tacked on top of
RET_PTR_TO_MEM.
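Assuming the verifier's usual flag composition, the post-rename
relationship looks roughly like:

    /* The base return type carries no MEM_* flag by itself; any
     * memory kind and nullability are tacked on explicitly: */
    #define RET_PTR_TO_RINGBUF_MEM_OR_NULL \
            (PTR_MAYBE_NULL | MEM_RINGBUF | RET_PTR_TO_MEM)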
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221114191547.1694267-6-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add the support on the map side to parse, recognize, verify, and build
metadata table for a new special field of the type struct bpf_list_head.
To parameterize the bpf_list_head for a certain value type and the
list_node member it will accept in that value type, we use BTF
declaration tags.
The definition of bpf_list_head in a map value will be done as follows:
struct foo {
        struct bpf_list_node node;
        int data;
};

struct map_value {
        struct bpf_list_head head __contains(foo, node);
};
Then, the bpf_list_head only allows adding to the list 'head' using the
bpf_list_node 'node' for the type struct foo.
The 'contains' annotation is a BTF declaration tag composed of three
parts, "contains:name:node", where the name is used to look up the type
in the map BTF, with its kind hardcoded to BTF_KIND_STRUCT during the
lookup. The node part names the member in this type that has the type
struct bpf_list_node, which is actually used for linking into the
linked list. For now, the type kind is implicitly hardcoded as struct.
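For illustration, a __contains macro along these lines (as the
selftests presumably define it) produces such a tag:

    /* Expands __contains(foo, node) into the declaration tag
     * "contains:foo:node" attached to the bpf_list_head member. */
    #define __contains(name, node) \
            __attribute__((btf_decl_tag("contains:" #name ":" #node)))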
This allows building intrusive linked lists in BPF, using container_of
to obtain pointer to entry, while being completely type safe from the
perspective of the verifier. The verifier knows exactly the type of the
nodes, and knows that list helpers return that type at some fixed offset
where the bpf_list_node member used for this list exists. The verifier
also uses this information to disallow adding types that are not
accepted by a certain list.
For now, no elements can be added to such lists. Support for that is
coming in future patches, hence draining and freeing of items is left
as a TODO that will be resolved in a future patch.
Note that the bpf_list_head_free function moves the list out to a local
variable under the lock and releases it, doing the actual draining of
the list items outside the lock. Besides avoiding holding the lock for
too long and pessimizing other concurrent list operations, this is also
necessary for deadlock prevention: unless every function called in the
critical section were notrace, a fentry/fexit program could attach and
call bpf_map_update_elem again on the map, leading to the same lock
being acquired if the key matches, and hence to a deadlock.
While this requires some special effort on the part of the BPF
programmer to trigger and is highly unlikely to occur in practice, it
is always better if we can avoid such a condition.
While notrace would prevent this, doing the draining outside the lock
has advantages of its own, hence it is used to also fix the
deadlock-related problem.
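A hedged sketch of the splice-out pattern described above (simplified;
names partly assumed):

    void bpf_list_head_free_sketch(struct bpf_list_head *head,
                                   spinlock_t *lock)
    {
            LIST_HEAD(drain);

            spin_lock(lock);
            /* Move all entries to a local list under the lock... */
            list_splice_init((struct list_head *)head, &drain);
            spin_unlock(lock);

            /* ...and drain them outside it; freeing each entry here
             * is the TODO the message above refers to. */
    }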
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221114191547.1694267-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The current offset needs to also skip over the already copied region in
addition to the size of the next field. This case manifests where there
are gaps between adjacent special fields.
It was observed that for a map value with size 48, having fields at:
off: 0, 16, 32
size: 4, 16, 16
The current code does:
memcpy(dst + 0, src + 0, 0)
memcpy(dst + 4, src + 4, 12)
memcpy(dst + 20, src + 20, 12)
memcpy(dst + 36, src + 36, 12)
With the fix, it is done correctly as:
memcpy(dst + 0, src + 0, 0)
memcpy(dst + 4, src + 4, 12)
memcpy(dst + 32, src + 32, 0)
memcpy(dst + 48, src + 48, 0)
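A hedged sketch of the corrected copy loop (btf_field_offs member names
as described above; treat as illustrative). The key change is advancing
cur_off past both the copied gap and the field itself:

    u32 cur_off = 0, i;

    for (i = 0; i < foffs->cnt; i++) {
            u32 next_off = foffs->field_off[i];
            u32 sz = next_off - cur_off;

            memcpy(dst + cur_off, src + cur_off, sz);
            /* Skip the copied region in addition to the field size. */
            cur_off += sz + foffs->field_sz[i];
    }
    /* Copy the tail after the last special field. */
    memcpy(dst + cur_off, src + cur_off, value_size - cur_off);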
Fixes: 4d7d7f69f4b1 ("bpf: Adapt copy_map_value for multiple offset case")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221114191547.1694267-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In f71b2f64177a ("bpf: Refactor map->off_arr handling"), map->off_arr
was refactored into btf_field_offs. The number of field offsets is
equal to the maximum possible number of fields, limited by
BTF_FIELDS_MAX. Hence, reuse BTF_FIELDS_MAX, as spin_lock and timer no
longer need to be handled specially for offset sorting; fix the
comment, and remove the incorrect WARN_ON, as rec->cnt can never exceed
this value. The reason to keep a separate constant was that it was
always 2 more than the total number of kptrs. This is no longer the
case.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221114191547.1694267-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
We don't want to commit to a specific name for these. Simply call them
allocated objects coming from bpf_obj_new, which is completely clear in
itself.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221114191547.1694267-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Kang Minchul says:
====================
This patch series contains various checkpatch fixes
in btf.c, libbpf.c, ringbuf.c.
I know these are trivial but some issues are hard to ignore
and I think these checkpatch issues are accumulating.
v1 -> v2: changed cover letter message.
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Fixed following checkpatch issues:
WARNING: Block comments use a trailing */ on a separate line
+ * other BPF program's BTF object */
WARNING: Possible repeated word: 'be'
+ * name. This is important to be be able to find corresponding BTF
ERROR: switch and case should be at the same indent
+ switch (ext->kcfg.sz) {
+ case 1: *(__u8 *)ext_val = value; break;
+ case 2: *(__u16 *)ext_val = value; break;
+ case 4: *(__u32 *)ext_val = value; break;
+ case 8: *(__u64 *)ext_val = value; break;
+ default:
ERROR: trailing statements should be on next line
+ case 1: *(__u8 *)ext_val = value; break;
ERROR: trailing statements should be on next line
+ case 2: *(__u16 *)ext_val = value; break;
ERROR: trailing statements should be on next line
+ case 4: *(__u32 *)ext_val = value; break;
ERROR: trailing statements should be on next line
+ case 8: *(__u64 *)ext_val = value; break;
ERROR: code indent should use tabs where possible
+ }$
WARNING: please, no spaces at the start of a line
+ }$
WARNING: Block comments use a trailing */ on a separate line
+ * for faster search */
ERROR: code indent should use tabs where possible
+^I^I^I^I^I^I &ext->kcfg.is_signed);$
WARNING: braces {} are not necessary for single statement blocks
+ if (err) {
+ return err;
+ }
ERROR: code indent should use tabs where possible
+^I^I^I^I sizeof(*obj->btf_modules), obj->btf_module_cnt + 1);$
Signed-off-by: Kang Minchul <tegongkang@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221113190648.38556-3-tegongkang@gmail.com
Fixup bpf_map_update_elem() declaration to use a single line.
Reported-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Akira Yokosawa <akiyks@gmail.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221113103327.3287482-1-mtahhan@redhat.com
GCC 11.3.0 fails to compile btf_dump.c due to the following error,
which seems to originate in btf_dump_struct_data where the returned
value would be uninitialized if btf_vlen returns zero.
btf_dump.c: In function ‘btf_dump_dump_type_data’:
btf_dump.c:2363:12: error: ‘err’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
2363 | if (err < 0)
| ^
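Presumably the fix is simply to give err a defined initial value for
the zero-member case; a minimal sketch:

    /* In btf_dump_struct_data: a struct with no members (btf_vlen()
     * returning zero) must report success, not an indeterminate err. */
    int err = 0;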
Fixes: 920d16af9b42 ("libbpf: BTF dumper support for typed data")
Signed-off-by: David Michael <fedora.dm0@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/87zgcu60hq.fsf@gmail.com
Merge tag 'mlx5-updates-2022-11-12' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2022-11-12
Misc updates to mlx5 driver:
1) Support enhanced CQE compression on ConnectX-6 Dx to reduce IRQ
   rate, CPU utilization, and latency.
2) Connection tracking: optimize the pre_ct table lookup for rules
   installed on chain 0.
3) Implement ethtool get_link_ext_stats for PHY down events.
4) Expose device vhca_id to debugfs.
5) Misc cleanups and trivial changes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add XDP and page pool statistics.
To keep the implementation simple and compatible, the patch uses 32-bit
integers to record the XDP statistics.
Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund says:
====================
net: Add support for sorted VCAP rules in Sparx5
This provides support for adding Sparx5 VCAP rules in sorted order, VCAP
rule counters and TC filter matching on ARP frames.
It builds on top of the initial IS2 VCAP support found in these series:
https://lore.kernel.org/all/20221020130904.1215072-1-steen.hegelund@microchip.com/
https://lore.kernel.org/all/20221109114116.3612477-1-steen.hegelund@microchip.com/
Functionality
=============
When a new VCAP rule is added the driver will now ensure that the rule is
inserted in sorted order, and when a rule is removed, the remaining rules
will be moved to keep the sorted order and remove any gaps in the VCAP
address space.
A VCAP rule is ordered using these 3 values:
- Rule size: the count of VCAP addresses used by the rule. The largest
  rule has the highest priority.
- Rule user: the rules are ordered by the user enumeration.
- Priority: the priority provided in the flower filter. The lowest
  value has the highest priority.
A VCAP instance may contain the counter as part of the VCAP cache area, and
this counter may be one or more bits in width. This type of counter
automatically increments its value when the rule is hit.
Other VCAP instances have a dedicated counter area outside of the VCAP and
in this case the rule must contain the counter id to be able to locate the
counter value and cause the counter to be incremented. In this case there
must also be a VCAP rule action that sets the counter id.
The Sparx5 IS2 VCAP uses a dedicated counter area with 32-bit counters.
This series adds support for getting VCAP rule counters and provides
these via the TC statistics interface.
Only packet counters are supported, not byte counters.
Finally the series adds support for the ARP frame dissector and configures
the Sparx5 IS2 VCAP to generate the ARP keyset when ARP traffic is
received.
Delivery:
=========
This is the current plan for delivering the full VCAP feature set of Sparx5:
- DebugFS support for inspecting rules
- TC protocol all support
- Sparx5 IS0 VCAP support
- TC policer and drop action support (depends on the Sparx5 QoS support
upstreamed separately)
- Sparx5 ES0 VCAP support
- TC flower template support
- TC matchall filter support for mirroring and policing ports
- TC flower filter mirror action support
- Sparx5 ES2 VCAP support
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This tests the insertion, moving and deletion of rules and checks that
the unused VCAP addresses are initialized correctly.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This provides flower filter packet statistics (bytes are not supported) via
the dedicated IS2 counter feature.
All rules having the same TC cookie will contribute to the packet
statistics for the filter as they are considered to be part of the same TC
flower filter.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds API methods to set and get a rule counter.
A VCAP instance may contain the counter as part of the VCAP cache area, and
this counter may be one or more bits in width. This type of counter
automatically increments its value when the rule is hit.
Other VCAP instances have a dedicated counter area outside of the VCAP and
in this case the rule must contain the counter id to be able to locate the
counter value. In this case there must also be a rule action that updates
the counter using the rule id when the rule is hit.
The Sparx5 IS2 VCAP uses a dedicated counter area.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds sorting criteria to rule insertion and deletion.
The criteria are (in the listed order):
- Rule size (largest size first)
- User (based on an enumerated user value)
- Priority (highest priority first, aka lowest value)
When a rule is deleted, the other rules may need to be moved to fill the
gap so that the available VCAP address space is used in the best possible
way.
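A hedged sketch of a comparator implementing this order (struct and
field names are assumptions for illustration):

    struct vcap_rule_sketch {
            int size;     /* count of VCAP addresses used */
            int user;     /* enumerated user value */
            int priority; /* flower filter priority */
    };

    /* Returns true if 'a' must be placed before 'b'. */
    static bool vcap_rule_before(const struct vcap_rule_sketch *a,
                                 const struct vcap_rule_sketch *b)
    {
            if (a->size != b->size)
                    return a->size > b->size;  /* largest size first */
            if (a->user != b->user)
                    return a->user < b->user;  /* user enumeration order */
            return a->priority < b->priority; /* lowest value first */
    }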
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support to Sparx5 for dissecting TC ARP flower filter keys and
sets up the Sparx5 IS2 VCAP to generate the ARP keyset for ARP frames.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a new flow_rule_match_arp function that allows drivers
to be able to dissect ARP frames.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The value of 'st->state' has already been verified to be
TCP_SEQ_STATE_LISTENING, so assigning TCP_SEQ_STATE_LISTENING to it is
unnecessary; remove the assignment.
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nick Child says:
====================
ibmvnic: Introduce affinity hint support
This is a patchset to do 3 things to improve ibmvnic performance:
1. Assign affinity hints to ibmvnic queue irq's
2. Update affinity hints on cpu hotplug events
3. Introduce transmit packet steering (XPS)
NOTE: If irqbalance is running, you need to stop it from overriding
our affinity hints. To do this you can do one of:
- systemctl stop irqbalance
- ban the ibmvnic module irqs:
  - you must have the latest irqbalance v9.2; the banmod argument was
    broken before this
  - in /etc/sysconfig/irqbalance, set IRQBALANCE_ARGS="--banmod=ibmvnic"
  - systemctl restart irqbalance
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Transmit Packet Steering (XPS) maps CPU numbers to transmit
queues. By running the same connection on the same set of CPUs,
contention for the queue and the cache miss rate can be minimized.
When assigning a CPU mask for a transmit queue's IRQ number, assign
the same CPU mask as the set of CPUs that XPS should use for that
queue.
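A hedged sketch of the pairing described above using generic kernel
APIs (the ibmvnic-specific plumbing is omitted):

    /* Give the IRQ and XPS the same CPU mask for tx queue 'i'. */
    static void set_queue_affinity_sketch(struct net_device *dev,
                                          unsigned int irq, u16 i,
                                          const struct cpumask *mask)
    {
            irq_set_affinity_hint(irq, mask);  /* hint for irqbalance */
            netif_set_xps_queue(dev, mask, i); /* XPS: queue <- cpus */
    }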
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com>
Reviewed-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When CPUs are added and removed, ibmvnic devices will reassign
hint values. Introduce a new CPU hotplug state CPUHP_IBMVNIC_DEAD
to signal to ibmvnic devices that a CPU has been removed and it
is time to reset affinity hint assignments. On the other hand,
when CPUs are being added, add a state instance to
CPUHP_AP_ONLINE_DYN which will trigger a reassignment of affinity
hints once the new CPUs are online. This implementation is based
on the virtio_net driver.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com>
Reviewed-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Assign affinity hints to ibmvnic device queue interrupts.
Affinity hints are assigned and removed during sub-crq init and
teardown, respectively. This update should improve latency if the
hints are honored, as interrupt lines and processing are more equally
distributed among CPUs. This implementation is based on the
virtio_net driver.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com>
Reviewed-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for 2 virtchnl msgs:
VIRTCHNL_OP_SET_RSS_HENA
VIRTCHNL_OP_GET_RSS_HENA_CAPS
The first one allows VFs to clear all previously programmed
RSS configuration and customize it. The second one returns
the RSS HENA bits allowed by the hardware.
Introduce ice_err_to_virt_err, which converts kernel-specific
errors to virtchnl errors.
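A hedged sketch of the kind of mapping such a helper performs (the
exact error cases handled by the driver may differ):

    static enum virtchnl_status_code ice_err_to_virt_err_sketch(int err)
    {
            switch (err) {
            case 0:
                    return VIRTCHNL_STATUS_SUCCESS;
            case -EINVAL:
                    return VIRTCHNL_STATUS_ERR_PARAM;
            case -ENOMEM:
                    return VIRTCHNL_STATUS_ERR_NO_MEMORY;
            default:
                    return VIRTCHNL_STATUS_ERR_NOT_SUPPORTED;
            }
    }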
Signed-off-by: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The error handling in tun_get_user is very scattered.
This patch unifies error handling, reduces duplication of code, and
makes the logic clearer.
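A generic sketch of the single-exit idiom such a cleanup typically
converges on (hypothetical helpers, not the actual tun_get_user code):

    static int step_one(void) { return 0; } /* hypothetical */
    static int step_two(void) { return 0; } /* hypothetical */

    static int do_something_sketch(void)
    {
            void *buf;
            int err;

            buf = kmalloc(64, GFP_KERNEL);
            if (!buf)
                    return -ENOMEM;

            err = step_one();
            if (err)
                    goto free_buf;

            err = step_two();
            /* Success and failure share the same unwind path. */
    free_buf:
            kfree(buf);
            return err;
    }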
Signed-off-by: Chuang Wang <nashuiliang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement ethtool_op get_link_ext_stats for PHY down events
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
The pre_ct table realizes in hardware the act_ct cache logic, bypassing
the CT table if the ct state was already set by a previous ct lookup.
As such, the pre_ct table will always miss for chain 0 filters.
Optimize the pre_ct table lookup for rules installed on chain 0.
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
A single async context object is sufficient to wait for the completions
of many callbacks. Switch to using one instance per bulk of commands.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Waiting on a completion object for each callback before cleaning up their
async contexts is not necessary, as this is already implied in the
mlx5_cmd_cleanup_async_ctx() API.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The 'work' field in struct mlx5e_async_ctx is not used. Remove it.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The case where the packet is not offloaded, needs to be restored to the
slow path, and the expected tunnel information cannot be found should
not dump a call trace to the user; there is already a debug call for it.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Up until now, the return value of update_rx was ignored, so the flow
continued even if it failed. Add an error flow for when update_rx fails
in mlx5e_open_locked, mlx5i_open and mlx5i_pkey_open.
Signed-off-by: Guy Truzman <gtruzman@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The params info print was meant to be printed once on load.
Over time, new calls to mlx5e_init_rq_type_params and
mlx5e_build_rq_params were added, mistakenly printing
the params once again.
Move the print to where it belongs, in mlx5e_probe.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The CQE compression feature improves performance by reducing the PCI
bandwidth bottleneck on CQE writes.
Enhanced CQE compression, introduced in ConnectX-6, aims to reduce the
CPU utilization of SW-side packet decompression: the ownership bit,
whose rewrite is likely to cost a cache miss, is replaced by a validity
byte handled solely by HW.
Another advantage of the enhanced feature is that session packets are
available to SW as soon as a single CQE slot is filled, instead of
waiting for the session to close; this improves packet latency from NIC
to host.
Performance:
Following are the tested scenarios and results comparing basic and
enhanced CQE compression.
setup: IXIA 100GbE connected directly to port 0 and port 1 of
ConnectX-6 Dx 100GbE dual port.
Case #1 RX only, single flow goes to single queue:
IRQ rate reduced by ~ 30%, CPU utilization improved by 2%.
Case #2 IP forwarding from port 1 to port 0 single flow goes to
single queue:
Avg latency improved from 60us to 21us, frame loss improved from 0.5% to 0.0%.
Case #3 IP forwarding from port 1 to port 0 Max Throughput IXIA sends
100%, 8192 UDP flows, goes to 24 queues:
Enhanced is equal or slightly better than basic.
Testing the basic compression feature with this patch shows no
performance degradation of the basic compression feature.
Signed-off-by: Ofer Levi <oferle@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Replace the min/max operations with a single clamp.
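For illustration, a change of this shape (types and bounds
hypothetical):

    /* Before: nested min/max obscure the intent. */
    val = max_t(u16, min_t(u16, val, hi), lo);
    /* After: one clamp with both bounds visible at a glance. */
    val = clamp_t(u16, val, lo, hi);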
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
This is never used, and probably something that was intended to be used
before per-protocol hash tables were chosen instead.
Signed-off-by: Anisse Astier <anisse@astier.eu>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
vhca_id is an identifier of an mlx5_core instance within the hardware.
This identifier may be required for troubleshooting.
Expose it to debugfs.
Example:
$ cat /sys/kernel/debug/mlx5/mlx5_core.sf.2/vhca_id
0x12
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Before this patch, devlink traps are registered only on full driver
probe and unregistered on driver removal. As devlink traps are not
usable once driver functionality is unloaded, they should also be
unregistered on flows that unload the driver and registered again when
it is loaded back, e.g. the devlink reload flow.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
There is a spelling mistake in an error message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
There is no need for the warn if the entry was already removed.
Use a debug print like in the update flow.
Also update the messages so the user can identify whether it's
from the update flow or the remove flow.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Use try_cmpxchg() (instead of cmpxchg()) in a more readable way.
oval = smp_load_acquire(&sk->sk_tsq_flags);
do {
        ...
} while (!try_cmpxchg(&sk->sk_tsq_flags, &oval, nval));
Reduce indentation level.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20221110190239.3531280-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
try_cmpxchg() is slightly more efficient (at least on x86),
and smp_load_acquire(&sk->sk_tsq_flags) could avoid a KCSAN report.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20221110174829.3403442-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>