linux

iv/linux

Author	SHA1	Message	Date
Martin KaFai Lau	273b7f0fb4	bpf: Change bpf_getsockopt(SOL_TCP) to reuse do_tcp_getsockopt() This patch changes bpf_getsockopt(SOL_TCP) to reuse do_tcp_getsockopt(). It removes the duplicated code from bpf_getsockopt(SOL_TCP). Before this patch, there were some optnames available to bpf_setsockopt(SOL_TCP) but missing in bpf_getsockopt(SOL_TCP). For example, TCP_NODELAY, TCP_MAXSEG, TCP_KEEPIDLE, TCP_KEEPINTVL, and a few more. It surprises users from time to time. This patch automatically closes this gap without duplicating more code. bpf_getsockopt(TCP_SAVED_SYN) does not free the saved_syn, so it stays in sol_tcp_sockopt(). For string name value like TCP_CONGESTION, bpf expects it is always null terminated, so sol_tcp_sockopt() decrements optlen by one before calling do_tcp_getsockopt() and the 'if (optlen < saved_optlen) memset(..,0,..);' in __bpf_getsockopt() will always do a null termination. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220902002918.2894511-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-09-02 20:34:32 -07:00
Martin KaFai Lau	65ddc82d3b	bpf: Change bpf_getsockopt(SOL_SOCKET) to reuse sk_getsockopt() This patch changes bpf_getsockopt(SOL_SOCKET) to reuse sk_getsockopt(). It removes all duplicated code from bpf_getsockopt(SOL_SOCKET). Before this patch, there were some optnames available to bpf_setsockopt(SOL_SOCKET) but missing in bpf_getsockopt(SOL_SOCKET). It surprises users from time to time. For example, SO_REUSEADDR, SO_KEEPALIVE, SO_RCVLOWAT, and SO_MAX_PACING_RATE. This patch automatically closes this gap without duplicating more code. The only exception is SO_BINDTODEVICE because it needs to acquire a blocking lock. Thus, SO_BINDTODEVICE is not supported. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220902002912.2894040-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-09-02 20:34:31 -07:00
Martin KaFai Lau	c2b063ca34	bpf: Embed kernel CONFIG check into the if statement in bpf_getsockopt This patch moves the "#ifdef CONFIG_XXX" check into the "if/else" statement itself. The change is done for the bpf_getsockopt() function only. It will make the latter patches easier to follow without the surrounding ifdef macro. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220902002906.2893572-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-09-02 20:34:31 -07:00
Martin KaFai Lau	0f95f7d426	bpf: net: Avoid do_ipv6_getsockopt() taking sk lock when called from bpf Similar to the earlier patch that changes sk_getsockopt() to use sockopt_{lock,release}_sock() such that it can avoid taking the lock when called from bpf. This patch also changes do_ipv6_getsockopt() to use sockopt_{lock,release}_sock() such that bpf_getsockopt(SOL_IPV6) can reuse do_ipv6_getsockopt(). Although bpf_getsockopt(SOL_IPV6) currently does not support optname that requires lock_sock(), using sockopt_{lock,release}_sock() consistently across *_getsockopt() will make future optname addition harder to miss the sockopt_{lock,release}_sock() usage. eg. when adding new optname that requires a lock and the new optname is needed in bpf_getsockopt() also. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220902002859.2893064-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-09-02 20:34:31 -07:00
Martin KaFai Lau	6dadbe4bac	bpf: net: Change do_ipv6_getsockopt() to take the sockptr_t argument Similar to the earlier patch that changes sk_getsockopt() to take the sockptr_t argument . This patch also changes do_ipv6_getsockopt() to take the sockptr_t argument such that a latter patch can make bpf_getsockopt(SOL_IPV6) to reuse do_ipv6_getsockopt(). Note on the change in ip6_mc_msfget(). This function is to return an array of sockaddr_storage in optval. This function is shared between ipv6_get_msfilter() and compat_ipv6_get_msfilter(). However, the sockaddr_storage is stored at different offset of the optval because of the difference between group_filter and compat_group_filter. Thus, a new 'ss_offset' argument is added to ip6_mc_msfget(). Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220902002853.2892532-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-09-02 20:34:31 -07:00
Martin KaFai Lau	9c3f9707de	net: Add a len argument to compat_ipv6_get_msfilter() Pass the len to the compat_ipv6_get_msfilter() instead of compat_ipv6_get_msfilter() getting it again from optlen. Its counter part ipv6_get_msfilter() is also taking the len from do_ipv6_getsockopt(). Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220902002846.2892091-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-09-02 20:34:31 -07:00
Martin KaFai Lau	75f2397988	net: Remove unused flags argument from do_ipv6_getsockopt The 'unsigned int flags' argument is always 0, so it can be removed. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220902002840.2891763-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-09-02 20:34:31 -07:00
Martin KaFai Lau	1985320c54	bpf: net: Avoid do_ip_getsockopt() taking sk lock when called from bpf Similar to the earlier commit that changed sk_setsockopt() to use sockopt_{lock,release}_sock() such that it can avoid taking lock when called from bpf. This patch also changes do_ip_getsockopt() to use sockopt_{lock,release}_sock() such that a latter patch can make bpf_getsockopt(SOL_IP) to reuse do_ip_getsockopt(). Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220902002834.2891514-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-09-02 20:34:31 -07:00
Martin KaFai Lau	728f064cd7	bpf: net: Change do_ip_getsockopt() to take the sockptr_t argument Similar to the earlier patch that changes sk_getsockopt() to take the sockptr_t argument. This patch also changes do_ip_getsockopt() to take the sockptr_t argument such that a latter patch can make bpf_getsockopt(SOL_IP) to reuse do_ip_getsockopt(). Note on the change in ip_mc_gsfget(). This function is to return an array of sockaddr_storage in optval. This function is shared between ip_get_mcast_msfilter() and compat_ip_get_mcast_msfilter(). However, the sockaddr_storage is stored at different offset of the optval because of the difference between group_filter and compat_group_filter. Thus, a new 'ss_offset' argument is added to ip_mc_gsfget(). Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220902002828.2890585-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-09-02 20:34:31 -07:00
Martin KaFai Lau	d51bbff2ab	bpf: net: Avoid do_tcp_getsockopt() taking sk lock when called from bpf Similar to the earlier commit that changed sk_setsockopt() to use sockopt_{lock,release}_sock() such that it can avoid taking lock when called from bpf. This patch also changes do_tcp_getsockopt() to use sockopt_{lock,release}_sock() such that a latter patch can make bpf_getsockopt(SOL_TCP) to reuse do_tcp_getsockopt(). Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220902002821.2889765-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-09-02 20:34:31 -07:00
Martin KaFai Lau	34704ef024	bpf: net: Change do_tcp_getsockopt() to take the sockptr_t argument Similar to the earlier patch that changes sk_getsockopt() to take the sockptr_t argument . This patch also changes do_tcp_getsockopt() to take the sockptr_t argument such that a latter patch can make bpf_getsockopt(SOL_TCP) to reuse do_tcp_getsockopt(). Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220902002815.2889332-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-09-02 20:34:30 -07:00
Martin KaFai Lau	2c5b6bf5cd	bpf: net: Avoid sk_getsockopt() taking sk lock when called from bpf Similar to the earlier commit that changed sk_setsockopt() to use sockopt_{lock,release}_sock() such that it can avoid taking lock when called from bpf. This patch also changes sk_getsockopt() to use sockopt_{lock,release}_sock() such that a latter patch can make bpf_getsockopt(SOL_SOCKET) to reuse sk_getsockopt(). Only sk_get_filter() requires this change and it is used by the optname SO_GET_FILTER. The '.getname' implementations in sock->ops->getname() is not changed also since bpf does not always have the sk->sk_socket pointer and cannot support SO_PEERNAME. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220902002809.2888981-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-09-02 20:34:30 -07:00
Martin KaFai Lau	4ff09db1b7	bpf: net: Change sk_getsockopt() to take the sockptr_t argument This patch changes sk_getsockopt() to take the sockptr_t argument such that it can be used by bpf_getsockopt(SOL_SOCKET) in a latter patch. security_socket_getpeersec_stream() is not changed. It stays with the __user ptr (optval.user and optlen.user) to avoid changes to other security hooks. bpf_getsockopt(SOL_SOCKET) also does not support SO_PEERSEC. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220902002802.2888419-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-09-02 20:34:30 -07:00
Martin KaFai Lau	ba74a7608d	net: Change sock_getsockopt() to take the sk ptr instead of the sock ptr A latter patch refactors bpf_getsockopt(SOL_SOCKET) with the sock_getsockopt() to avoid code duplication and code drift between the two duplicates. The current sock_getsockopt() takes sock ptr as the argument. The very first thing of this function is to get back the sk ptr by 'sk = sock->sk'. bpf_getsockopt() could be called when the sk does not have the sock ptr created. Meaning sk->sk_socket is NULL. For example, when a passive tcp connection has just been established but has yet been accept()-ed. Thus, it cannot use the sock_getsockopt(sk->sk_socket) or else it will pass a NULL ptr. This patch moves all sock_getsockopt implementation to the newly added sk_getsockopt(). The new sk_getsockopt() takes a sk ptr and immediately gets the sock ptr by 'sock = sk->sk_socket' The existing sock_getsockopt(sock) is changed to call sk_getsockopt(sock->sk). All existing callers have both sock->sk and sk->sk_socket pointer. The latter patch will make bpf_getsockopt(SOL_SOCKET) call sk_getsockopt(sk) directly. The bpf_getsockopt(SOL_SOCKET) does not use the optnames that require sk->sk_socket, so it will be safe. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220902002756.2887884-1-kafai@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-09-02 20:34:30 -07:00
Gal Pressman	8254393663	net: ieee802154: Fix compilation error when CONFIG_IEEE802154_NL802154_EXPERIMENTAL is disabled When CONFIG_IEEE802154_NL802154_EXPERIMENTAL is disabled, NL802154_CMD_DEL_SEC_LEVEL is undefined and results in a compilation error: net/ieee802154/nl802154.c:2503:19: error: 'NL802154_CMD_DEL_SEC_LEVEL' undeclared here (not in a function); did you mean 'NL802154_CMD_SET_CCA_ED_LEVEL'? 2503 \| .resv_start_op = NL802154_CMD_DEL_SEC_LEVEL + 1, \| ^~~~~~~~~~~~~~~~~~~~~~~~~~ \| NL802154_CMD_SET_CCA_ED_LEVEL Unhide the experimental commands, having them defined in an enum makes no difference. Fixes: 9c5d03d36251 ("genetlink: start to validate reserved header bytes") Signed-off-by: Gal Pressman <gal@nvidia.com> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Link: https://lore.kernel.org/r/20220902030620.2737091-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-02 19:59:08 -07:00
Ian Rogers	af515a5587	selftests/xsk: Avoid use-after-free on ctx The put lowers the reference count to 0 and frees ctx, reading it afterwards is invalid. Move the put after the uses and determine the last use by the reference count being 1. Fixes: 39e940d4abfa ("selftests/xsk: Destroy BPF resources only when ctx refcount drops to 0") Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220901202645.1463552-1-irogers@google.com	2022-09-02 15:57:18 +02:00
Daniel Müller	afef88e655	selftests/bpf: Store BPF object files with .bpf.o extension BPF object files are, in a way, the final artifact produced as part of the ahead-of-time compilation process. That makes them somewhat special compared to "regular" object files, which are a intermediate build artifacts that can typically be removed safely. As such, it can make sense to name them differently to make it easier to spot this difference at a glance. Among others, libbpf-bootstrap [0] has established the extension .bpf.o for BPF object files. It seems reasonable to follow this example and establish the same denomination for selftest build artifacts. To that end, this change adjusts the corresponding part of the build system and the test programs loading BPF object files to work with .bpf.o files. [0] https://github.com/libbpf/libbpf-bootstrap Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220901222253.1199242-1-deso@posteo.net	2022-09-02 15:55:37 +02:00
Maciej Fijalkowski	fe2ad08e1e	selftests/xsk: Add support for zero copy testing Introduce new mode to xdpxceiver responsible for testing AF_XDP zero copy support of driver that serves underlying physical device. When setting up test suite, determine whether driver has ZC support or not by trying to bind XSK ZC socket to the interface. If it succeeded, interpret it as ZC support being in place and do softirq and busy poll tests for zero copy mode. Note that Rx dropped tests are skipped since ZC path is not touching rx_dropped stat at all. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220901114813.16275-7-maciej.fijalkowski@intel.com	2022-09-02 15:38:08 +02:00
Maciej Fijalkowski	c29fe883de	selftests/xsk: Make sure single threaded test terminates For single threaded poll tests call pthread_kill() from main thread so that we are sure worker thread has finished its job and it is possible to proceed with next test types from test suite. It was observed that on some platforms it takes a bit longer for worker thread to exit and next test case sees device as busy in this case. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220901114813.16275-6-maciej.fijalkowski@intel.com	2022-09-02 15:38:03 +02:00
Maciej Fijalkowski	a693ff3ed5	selftests/xsk: Add support for executing tests on physical device Currently, architecture of xdpxceiver is designed strictly for conducting veth based tests. Veth pair is created together with a network namespace and one of the veth interfaces is moved to the mentioned netns. Then, separate threads for Tx and Rx are spawned which will utilize described setup. Infrastructure described in the paragraph above can not be used for testing AF_XDP support on physical devices. That testing will be conducted on a single network interface and same queue. Xskxceiver needs to be extended to distinguish between veth tests and physical interface tests. Since same iface/queue id pair will be used by both Tx/Rx threads for physical device testing, Tx thread, which happen to run after the Rx thread, is going to create XSK socket with shared umem flag. In order to track this setting throughout the lifetime of spawned threads, introduce 'shared_umem' boolean variable to struct ifobject and set it to true when xdpxceiver is run against physical device. In such case, UMEM size needs to be doubled, so half of it will be used by Rx thread and other half by Tx thread. For two step based test types, value of XSKMAP element under key 0 has to be updated as there is now another socket for the second step. Also, to avoid race conditions when destroying XSK resources, move this activity to the main thread after spawned Rx and Tx threads have finished its job. This way it is possible to gracefully remove shared umem without introducing synchronization mechanisms. To run xsk selftests suite on physical device, append "-i $IFACE" when invoking test_xsk.sh. For veth based tests, simply skip it. When "-i $IFACE" is in place, under the hood test_xsk.sh will use $IFACE for both interfaces supplied to xdpxceiver, which in turn will interpret that this execution of test suite is for a physical device. Note that currently this makes it possible only to test SKB and DRV mode (in case underlying device has native XDP support). ZC testing support is added in a later patch. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220901114813.16275-5-maciej.fijalkowski@intel.com	2022-09-02 15:37:57 +02:00
Maciej Fijalkowski	24037ba7c4	selftests/xsk: Increase chars for interface name to 16 So that "enp240s0f0" or such name can be used against xskxceiver. While at it, also extend character count for netns name. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220901114813.16275-4-maciej.fijalkowski@intel.com	2022-09-02 15:37:53 +02:00
Maciej Fijalkowski	1adef0643b	selftests/xsk: Introduce default Rx pkt stream In order to prepare xdpxceiver for physical device testing, let us introduce default Rx pkt stream. Reason for doing it is that physical device testing will use a UMEM with a doubled size where half of it will be used by Tx and other half by Rx. This means that pkt addresses will differ for Tx and Rx streams. Rx thread will initialize the xsk_umem_info::base_addr that is added here so that pkt_set(), when working on Rx UMEM will add this offset and second half of UMEM space will be used. Note that currently base_addr is 0 on both sides. Future commit will do the mentioned initialization. Previously, veth based testing worked on separate UMEMs, so single default stream was fine. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220901114813.16275-3-maciej.fijalkowski@intel.com	2022-09-02 15:37:45 +02:00
Maciej Fijalkowski	0d68e6fe12	selftests/xsk: Query for native XDP support Currently, xdpxceiver assumes that underlying device supports XDP in native mode - it is fine by now since tests can run only on a veth pair. Future commit is going to allow running test suite against physical devices, so let us query the device if it is capable of running XDP programs in native mode. This way xdpxceiver will not try to run TEST_MODE_DRV if device being tested is not supporting it. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220901114813.16275-2-maciej.fijalkowski@intel.com	2022-09-02 15:37:37 +02:00
Shmulik Ladkani	8cc61b7a64	selftests/bpf: Amend test_tunnel to exercise BPF_F_TUNINFO_FLAGS Get the tunnel flags in {ipv6}vxlan_get_tunnel_src and ensure they are aligned with tunnel params set at {ipv6}vxlan_set_tunnel_dst. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220831144010.174110-2-shmulik.ladkani@gmail.com	2022-09-02 15:21:03 +02:00
Shmulik Ladkani	44c51472be	bpf: Support getting tunnel flags Existing 'bpf_skb_get_tunnel_key' extracts various tunnel parameters (id, ttl, tos, local and remote) but does not expose ip_tunnel_info's tun_flags to the BPF program. It makes sense to expose tun_flags to the BPF program. Assume for example multiple GRE tunnels maintained on a single GRE interface in collect_md mode. The program expects origins to initiate over GRE, however different origins use different GRE characteristics (e.g. some prefer to use GRE checksum, some do not; some pass a GRE key, some do not, etc..). A BPF program getting tun_flags can therefore remember the relevant flags (e.g. TUNNEL_CSUM, TUNNEL_SEQ...) for each initiating remote. In the reply path, the program can use 'bpf_skb_set_tunnel_key' in order to correctly reply to the remote, using similar characteristics, based on the stored tunnel flags. Introduce BPF_F_TUNINFO_FLAGS flag for bpf_skb_get_tunnel_key. If specified, 'bpf_tunnel_key->tunnel_flags' is set with the tun_flags. Decided to use the existing unused 'tunnel_ext' as the storage for the 'tunnel_flags' in order to avoid changing bpf_tunnel_key's layout. Also, the following has been considered during the design: 1. Convert the "interesting" internal TUNNEL_xxx flags back to BPF_F_yyy and place into the new 'tunnel_flags' field. This has 2 drawbacks: - The BPF_F_yyy flags are from set_tunnel_key enumeration space, e.g. BPF_F_ZERO_CSUM_TX. It is awkward that it is "returned" into tunnel_flags from a get_tunnel_key call. - Not all "interesting" TUNNEL_xxx flags can be mapped to existing BPF_F_yyy flags, and it doesn't make sense to create new BPF_F_yyy flags just for purposes of the returned tunnel_flags. 2. Place key.tun_flags into 'tunnel_flags' but mask them, keeping only "interesting" flags. That's ok, but the drawback is that what's "interesting" for my usecase might be limiting for other usecases. Therefore I decided to expose what's in key.tun_flags as is, which seems most flexible. The BPF user can just choose to ignore bits he's not interested in. The TUNNEL_xxx are also UAPI, so no harm exposing them back in the get_tunnel_key call. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220831144010.174110-1-shmulik.ladkani@gmail.com	2022-09-02 15:20:55 +02:00
Shung-Hsi Yu	dc84dbbcc9	bpf, tnums: Warn against the usage of tnum_in(tnum_range(), ...) Commit a657182a5c51 ("bpf: Don't use tnum_range on array range checking for poke descriptors") has shown that using tnum_range() as argument to tnum_in() can lead to misleading code that looks like tight bound check when in fact the actual allowed range is much wider. Document such behavior to warn against its usage in general, and suggest some scenario where result can be trusted. Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/984b37f9fdf7ac36831d2137415a4a915744c1b6.1661462653.git.daniel@iogearbox.net Link: https://www.openwall.com/lists/oss-security/2022/08/26/1 Link: https://lore.kernel.org/bpf/20220831031907.16133-3-shung-hsi.yu@suse.com Link: https://lore.kernel.org/bpf/20220831031907.16133-2-shung-hsi.yu@suse.com	2022-09-02 14:44:54 +02:00
Jakub Kicinski	c3f760ef12	net: remove netif_tx_napi_add() All callers are now gone. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-02 12:41:43 +01:00
Eric Dumazet	977f1aa5e4	net: bql: add more documentation Add some documentation for netdev_tx_sent_queue() and netdev_tx_completed_queue() Stating that netdev_tx_completed_queue() must be called once per TX completion round is apparently not obvious for everybody. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-02 12:27:46 +01:00
David S. Miller	25de4a0b7b	Merge branch 'net-ipa-transaction-state-IDs' Alex Elder says: ==================== net: ipa: use IDs to track transaction state This series is the first of three groups of changes that simplify the way the IPA driver tracks the state of its transactions. Each GSI channel has a fixed number of transactions allocated at initialization time. The number allocated matches the number of TREs in the transfer ring associated with the channel. This is because the transfer ring limits the number of transfers that can ever be underway, and in the worst case, each transaction represents a single TRE. Transactions go through various states during their lifetime. Currently a set of lists keeps track of which transactions are in each state. Initially, all transactions are free. An allocated transaction is placed on the allocated list. Once an allocated transaction is committed, it is moved from the allocated to the committed list. When a committed transaction is sent to hardware (via a doorbell) it is moved to the pending list. When hardware signals that some work has completed, transactions are moved to the completed list. Finally, when a completed transaction is polled it's moved to the polled list before being removed when it becomes free. Changing a transaction's state thus normally involves manipulating two lists, and to prevent corruption a spinlock is held while the lists are updated. Transactions move through their states in a well-defined sequence though, and they do so strictly in order. So transaction 0 is always allocated before transaction 1; transaction 0 is always committed before transaction 1; and so on, through completion, polling, and becoming free. Because of this, it's sufficient to just keep track of which transaction is the first in each state. The rest of the transactions in a given state can be derived from the first transaction in an "adjacent" state. As a result, we can track the state of all transactions with a set of indexes, and can update these without the need for a spinlock. This first group of patches just defines the set of indexes that will be used for this new way of tracking transaction state. Two more groups of patches will follow. I've broken the 17 patches into these three groups to facilitate review. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-02 12:08:44 +01:00
Alex Elder	fd3bd0398a	net: ipa: track polled transactions with an ID Add a transaction ID to track the first element in the transaction array that has been polled. Advance the ID when we are releasing a transaction. Temporarily add warnings that verify that the first polled transaction tracked by the ID matches the first element on the polled list, both when polling and freeing. Remove the temporary warnings added by the previous commit. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-02 12:08:44 +01:00
Alex Elder	949cd0b5c2	net: ipa: track completed transactions with an ID Add a transaction ID field to track the first element in the transaction array that has completed but has not yet been polled. Advance the ID when we are processing a transaction in the NAPI polling loop (where completed transactions become polled). Temporarily add warnings that verify that the first completed transaction tracked by the ID matches the first element on the completed list, both when pending and completing. Remove the temporary warnings added by the previous commit. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-02 12:08:44 +01:00
Alex Elder	eeff7c14e0	net: ipa: track pending transactions with an ID Add a transaction ID field to track the first element in the transaction array that is pending (sent to hardware) but not yet complete. Advance the ID when a completion event for a channel indicates that transactions have completed. Temporarily add warnings that verify that the first pending transaction tracked by the ID matches the first element on the pending list, both when pending and completing, as well as when resetting the channel. Remove the temporary warnings added by the previous commit. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-02 12:08:44 +01:00
Alex Elder	fc95d958e2	net: ipa: track committed transactions with an ID Add a transaction ID field to track the first element in a channel's transaction array that has been committed, but not yet passed to the hardware. Advance the ID when the hardware is notified via doorbell that TREs from a transaction are ready for consumption. Temporarily add warnings that verify that the first committed transaction tracked by the ID matches the first element on the committed list, both when committing and pending (at doorbell). Remove the temporary warnings added by the previous commit. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-02 12:08:44 +01:00
Alex Elder	41e2a2c054	net: ipa: track allocated transactions with an ID Transactions for a channel are now managed in an array, with a free transaction ID indicating which is the next one free. Add another transaction ID field to track the first element in the array that has been allocated. Advance it when a transaction is committed (because that is when that transaction leaves allocated state). Temporarily add warnings that verify that the first allocated transaction tracked by the ID matches the first element on the allocated list, both when allocating and committing a transaction. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-02 12:08:44 +01:00
Alex Elder	12382d1167	net: ipa: use an array for transactions Transactions are always allocated one at a time. The maximum number of them we could ever need occurs if each TRE is assigned to a transaction. So a channel requires no more transactions than the number of TREs in its transfer ring. That number is known to be a power-of-2 less than 65536. The transaction pool abstraction is used for other things, but for transactions we can use a simple array of transaction structures, and use a free index to indicate which entry in the array is the next one free for allocation. By having the number of elements in the array be a power-of-2, we can use an ever-incrementing 16-bit free index, and use it modulo the array size. Distinguish a "trans_id" (whose value can exceed the number of entries in the transaction array) from a "trans_index" (which is less than the number of entries). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-02 12:08:44 +01:00
David S. Miller	a01105f174	Merge branch 'lan966x-make-reset-optional' Michael Walle says: ==================== net: lan966x: make reset optional This is the remaining part of the reset rework on the LAN966x targetting the netdev tree. The former series can be found at: https://lore.kernel.org/lkml/20220826115607.1148489-1-michael@walle.cc/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-02 11:37:27 +01:00
Michael Walle	f4c1f51cea	net: lan966x: make reset optional There is no dedicated reset for just the switch core. The reset which is used up until now, is more of a global reset, resetting almost the whole SoC and cause spurious errors by doing so. Make it possible to handle the reset elsewhere and make the reset optional. Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-02 11:37:27 +01:00
Michael Walle	baa6a9b590	dt-bindings: net: sparx5: don't require a reset line Make the reset line optional. It turns out, there is no dedicated reset for the switch. Instead, the reset which was used up until now, was kind of a global reset. This is now handled elsewhere, thus don't require a reset. Signed-off-by: Michael Walle <michael@walle.cc> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-02 11:37:27 +01:00
Wolfram Sang	bf99f11df4	wifi: move from strlcpy with unused retval to strscpy Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220830201457.7984-2-wsa+renesas@sang-engineering.com	2022-09-02 11:47:22 +03:00
Jinpeng Cui	1dc13236ef	wifi: wilc1000: remove redundant ret variable Return value from cfg80211_rx_mgmt() directly instead of taking this in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Jinpeng Cui <cui.jinpeng2@zte.com.cn> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220830105505.287564-1-cui.jinpeng2@zte.com.cn	2022-09-02 11:46:32 +03:00
Yang Yingliang	b0ea758b30	wifi: rtw88: add missing destroy_workqueue() on error path in rtw_core_init() Add the missing destroy_workqueue() before return from rtw_core_init() in error path. Fixes: fe101716c7c9 ("rtw88: replace tx tasklet with work queue") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220826023817.3908255-1-yangyingliang@huawei.com	2022-09-02 11:45:30 +03:00
Dan Carpenter	f97c81f5b7	wifi: wfx: prevent underflow in wfx_send_pds() This does a "chunk_len - 4" subtraction later when it calls: ret = wfx_hif_configuration(wdev, buf + 4, chunk_len - 4); so check for "chunk_len" is less than 4. Fixes: dcbecb497908 ("staging: wfx: allow new PDS format") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jérôme Pouiller <jerome.pouiller@silabs.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/Yv8eX7Xv2ubUOvW7@kili	2022-09-02 11:44:35 +03:00
Dan Carpenter	620d5eaeb9	wifi: rtl8xxxu: tighten bounds checking in rtl8xxxu_read_efuse() There some bounds checking to ensure that "map_addr" is not out of bounds before the start of the loop. But the checking needs to be done as we iterate through the loop because "map_addr" gets larger as we iterate. Fixes: 26f1fad29ad9 ("New driver: rtl8xxxu (mac80211)") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Jes Sorensen <Jes.Sorensen@gmail.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/Yv8eGLdBslLAk3Ct@kili	2022-09-02 11:43:50 +03:00
Ping-Ke Shih	fec11dee17	wifi: rtw89: declare to support beamformee above bandwidth 80MHz Declare this to tell AP we can support beamformee over bandwidth 160M, and then yield better performance in field. Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220826061011.9037-3-pkshih@realtek.com	2022-09-02 11:37:31 +03:00
Ping-Ke Shih	ad275d0a82	wifi: rtw89: correct polling address of address CAM Writing address to kick hardware to initialize address CAM, and then poll ready bit to determine completed. Old wrong code poll wrong register address, so it can lead error and fail to bring up interface. Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220826061011.9037-2-pkshih@realtek.com	2022-09-02 11:37:31 +03:00
Ping-Ke Shih	0d466f0526	wifi: rtw89: no HTC field if TX rate might fallback to legacy Packets containing HTC field with legacy rate could be dropped by AP. If TX rate of report is lower than MCS2, hardware might fall back rate to legacy. Therefore, add a checking rule to avoid HTC field in this situation. Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220826061011.9037-1-pkshih@realtek.com	2022-09-02 11:37:31 +03:00
Ping-Ke Shih	4a29213cd7	wifi: rtw89: pci: correct TX resource checking in low power mode Number of TX resource must be minimum of TX_BD and TX_WD. Only considering TX_BD could drop TX packets pulled from mac80211 if TX_WD is unavailable. Fixes: 52edbb9fb78a ("rtw89: ps: access TX/RX rings via another registers in low power mode") Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220824063312.15784-2-pkshih@realtek.com	2022-09-02 11:37:03 +03:00
Ping-Ke Shih	b7e715d3dc	wifi: rtw89: pci: fix interrupt stuck after leaving low power mode We turn off interrupt in ISR, and re-enable interrupt in threadfn or napi_poll according to the mode it stays. If we are turning off interrupt, rtwpci->running flag is unset and interrupt handler stop processing even if it was called, so disallow to re-enable interrupt in this situation. Or, wifi chip doesn't trigger interrupt events anymore because interrupt status (ISR) isn't clear by interrupt handler anymore. Fixes: c83dcd0508e2 ("rtw89: pci: add a separate interrupt handler for low power mode") Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220824063312.15784-1-pkshih@realtek.com	2022-09-02 11:37:03 +03:00
Cheng-Chieh Hsieh	9bea576175	wifi: rtw89: enlarge the CFO tracking boundary The calibration value of XTAL offset may be too large in some wifi modules, that the CFO tracking mechanism under the existing tracking boundary can not adjust the CFO to the tolerable range. So we enlarge it. Signed-off-by: Cheng-Chieh Hsieh <cj.hsieh@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220824061425.13764-1-pkshih@realtek.com	2022-09-02 11:36:31 +03:00
Chin-Yen Lee	9e3d242fd3	wifi: rtw89: pci: correct suspend/resume setting for variant chips We find that suspend/resume tests cause 8852CE lost, because some pci registers are changed for 8852CE. So, correct them accordingly. Signed-off-by: Chin-Yen Lee <timlee@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220819064811.37700-6-pkshih@realtek.com	2022-09-02 11:35:52 +03:00

... 3 4 5 6 7 ...

1123158 Commits