linux

iv/linux

Author	SHA1	Message	Date
Vladimir Oltean	fcbf979a5b	Revert "net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver" This reverts commit `6d709cadfd`. The above change was done to avoid calling symbols exported by the switch driver from the tagging protocol driver. With the tagger-owned storage model, we have a new option on our hands, and that is for the switch driver to provide a data consumer handler in the form of a function pointer inside the ->connect_tag_protocol() method. Having a function pointer avoids the problems of the exported symbols approach. By creating a handler for metadata frames holding TX timestamps on SJA1110, we are able to eliminate an skb queue from the tagger data, and replace it with a simple, and stateless, function pointer. This skb queue is now handled exclusively by sja1105_ptp.c, which makes the code easier to follow, as it used to be before the reverted patch. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:51:34 +00:00
Vladimir Oltean	c79e84866d	net: dsa: tag_sja1105: convert to tagger-owned data Currently, struct sja1105_tagger_data is a part of struct sja1105_private, and is used by the sja1105 driver to populate dp->priv. With the movement towards tagger-owned storage, the sja1105 driver should not be the owner of this memory. This change implements the connection between the sja1105 switch driver and its tagging protocol, which means that sja1105_tagger_data no longer stays in dp->priv but in ds->tagger_data, and that the sja1105 driver now only populates the sja1105_port_deferred_xmit callback pointer. The kthread worker is now the responsibility of the tagger. The sja1105 driver also alters the tagger's state some more, especially with regard to the PTP RX timestamping state. This will be fixed up a bit in further changes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:51:33 +00:00
Vladimir Oltean	22ee9f8e40	net: dsa: sja1105: move ts_id from sja1105_tagger_data The TX timestamp ID is incremented by the SJA1110 PTP timestamping callback (->port_tx_timestamp) for every packet, when cloning it. It isn't used by the tagger at all, even though it sits inside the struct sja1105_tagger_data. Also, serialization to this structure is currently done through tagger_data->meta_lock, which is a cheap hack because the meta_lock isn't used for anything else on SJA1110 (sja1105_rcv_meta_state_machine isn't called). This change moves ts_id from sja1105_tagger_data to sja1105_private and introduces a dedicated spinlock for it, also in sja1105_private. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:51:33 +00:00
Vladimir Oltean	bfcf142522	net: dsa: sja1105: make dp->priv point directly to sja1105_tagger_data The design of the sja1105 tagger dp->priv is that each port has a separate struct sja1105_port, and the sp->data pointer points to a common struct sja1105_tagger_data. We have removed all per-port members accessible by the tagger, and now only struct sja1105_tagger_data remains. Make dp->priv point directly to this. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:51:33 +00:00
Vladimir Oltean	6f6770ab1c	net: dsa: sja1105: remove hwts_tx_en from tagger data This tagger property is in fact not used at all by the tagger, only by the switch driver. Therefore it makes sense to be moved to sja1105_private. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:51:33 +00:00
Vladimir Oltean	d38049bbe7	net: dsa: sja1105: bring deferred xmit implementation in line with ocelot-8021q When the ocelot-8021q driver was converted to deferred xmit as part of commit `8d5f7954b7` ("net: dsa: felix: break at first CPU port during init and teardown"), the deferred implementation was deliberately made subtly different from what sja1105 has. The implementation differences lied on the following observations: - There might be a race between these two lines in tag_sja1105.c: skb_queue_tail(&sp->xmit_queue, skb_get(skb)); kthread_queue_work(sp->xmit_worker, &sp->xmit_work); and the skb dequeue logic in sja1105_port_deferred_xmit(). For example, the xmit_work might be already queued, however the work item has just finished walking through the skb queue. Because we don't check the return code from kthread_queue_work, we don't do anything if the work item is already queued. However, nobody will take that skb and send it, at least until the next timestampable skb is sent. This creates additional (and avoidable) TX timestamping latency. To close that race, what the ocelot-8021q driver does is it doesn't keep a single work item per port, and a skb timestamping queue, but rather dynamically allocates a work item per packet. - It is also unnecessary to have more than one kthread that does the work. So delete the per-port kthread allocations and replace them with a single kthread which is global to the switch. This change brings the two implementations in line by applying those observations to the sja1105 driver as well. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:51:33 +00:00
Vladimir Oltean	a3d74295d7	net: dsa: sja1105: let deferred packets time out when sent to ports going down This code is not necessary and complicates the conversion of this driver to tagger-owned memory. If there is a PTP packet that is sent concurrently with the port getting disabled, the deferred xmit mechanism is robust enough to time out when it sees that it hasn't been delivered, and recovers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:51:33 +00:00
Vladimir Oltean	35d9768021	net: dsa: tag_ocelot: convert to tagger-owned data The felix driver makes very light use of dp->priv, and the tagger is effectively stateless. dp->priv is practically only needed to set up a callback to perform deferred xmit of PTP and STP packets using the ocelot-8021q tagging protocol (the main ocelot tagging protocol makes no use of dp->priv, although this driver sets up dp->priv irrespective of actual tagging protocol in use). struct felix_port (what used to be pointed to by dp->priv) is removed and replaced with a two-sided structure. The public side of this structure, visible to the switch driver, is ocelot_8021q_tagger_data. The private side is ocelot_8021q_tagger_private, and the latter structure physically encapsulates the former. The public half of the tagger data structure can be accessed through a helper of the same name (ocelot_8021q_tagger_data) which also sanity-checks the protocol currently in use by the switch. The public/private split was requested by Andrew Lunn. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:51:33 +00:00
Vladimir Oltean	dc452a471d	net: dsa: introduce tagger-owned storage for private and shared data Ansuel is working on register access over Ethernet for the qca8k switch family. This requires the qca8k tagging protocol driver to receive frames which aren't intended for the network stack, but instead for the qca8k switch driver itself. The dp->priv is currently the prevailing method for passing data back and forth between the tagging protocol driver and the switch driver. However, this method is riddled with caveats. The DSA design allows in principle for any switch driver to return any protocol it desires in ->get_tag_protocol(). The dsa_loop driver can be modified to do just that. But in the current design, the memory behind dp->priv has to be allocated by the switch driver, so if the tagging protocol is paired to an unexpected switch driver, we may end up in NULL pointer dereferences inside the kernel, or worse (a switch driver may allocate dp->priv according to the expectations of a different tagger). The latter possibility is even more plausible considering that DSA switches can dynamically change tagging protocols in certain cases (dsa <-> edsa, ocelot <-> ocelot-8021q), and the current design lends itself to mistakes that are all too easy to make. This patch proposes that the tagging protocol driver should manage its own memory, instead of relying on the switch driver to do so. After analyzing the different in-tree needs, it can be observed that the required tagger storage is per switch, therefore a ds->tagger_data pointer is introduced. In principle, per-port storage could also be introduced, although there is no need for it at the moment. Future changes will replace the current usage of dp->priv with ds->tagger_data. We define a "binding" event between the DSA switch tree and the tagging protocol. During this binding event, the tagging protocol's ->connect() method is called first, and this may allocate some memory for each switch of the tree. Then a cross-chip notifier is emitted for the switches within that tree, and they are given the opportunity to fix up the tagger's memory (for example, they might set up some function pointers that represent virtual methods for consuming packets). Because the memory is owned by the tagger, there exists a ->disconnect() method for the tagger (which is the place to free the resources), but there doesn't exist a ->disconnect() method for the switch driver. This is part of the design. The switch driver should make minimal use of the public part of the tagger data, and only after type-checking it using the supplied "proto" argument. In the code there are in fact two binding events, one is the initial event in dsa_switch_setup_tag_protocol(). At this stage, the cross chip notifier chains aren't initialized, so we call each switch's connect() method by hand. Then there is dsa_tree_bind_tag_proto() during dsa_tree_change_tag_proto(), and here we have an old protocol and a new one. We first connect to the new one before disconnecting from the old one, to simplify error handling a bit and to ensure we remain in a valid state at all times. Co-developed-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:51:33 +00:00
Tobias Waldekranz	e0068620e5	net: dsa: mv88e6xxx: Add tx fwd offload PVT on intermediate devices In a typical mv88e6xxx switch tree like this: CPU \| .----. .--0--. \| .--0--. \| sw0 \| \| \| sw1 \| '-1-2-' \| '-1-2-' '---' If sw1p{1,2} are added to a bridge that sw0p1 is not a part of, sw0 still needs to add a crosschip PVT entry for the virtual DSA device assigned to represent the bridge. Fixes: `ce5df6894a` ("net: dsa: mv88e6xxx: map virtual bridges with forwarding offload in the PVT") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:38:23 +00:00
xu xin	8c8b7aa7fb	net: Enable neighbor sysctls that is save for userns root Inside netns owned by non-init userns, sysctls about ARP/neighbor is currently not visible and configurable. For the attributes these sysctls correspond to, any modifications make effects on the performance of networking(ARP, especilly) only in the scope of netns, which does not affect other netns. Actually, some tools via netlink can modify these attribute. iproute2 is an example. see as follows: $ unshare -ur -n $ cat /proc/sys/net/ipv4/neigh/lo/retrans_time cat: can't open '/proc/sys/net/ipv4/neigh/lo/retrans_time': No such file or directory $ ip ntable show dev lo inet arp_cache dev lo refcnt 1 reachable 19494 base_reachable 30000 retrans 1000 gc_stale 60000 delay_probe 5000 queue 101 app_probes 0 ucast_probes 3 mcast_probes 3 anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 1000 inet6 ndisc_cache dev lo refcnt 1 reachable 42394 base_reachable 30000 retrans 1000 gc_stale 60000 delay_probe 5000 queue 101 app_probes 0 ucast_probes 3 mcast_probes 3 anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 0 $ ip ntable change name arp_cache dev <if> retrans 2000 inet arp_cache dev lo refcnt 1 reachable 22917 base_reachable 30000 retrans 2000 gc_stale 60000 delay_probe 5000 queue 101 app_probes 0 ucast_probes 3 mcast_probes 3 anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 1000 inet6 ndisc_cache dev lo refcnt 1 reachable 35524 base_reachable 30000 retrans 1000 gc_stale 60000 delay_probe 5000 queue 101 app_probes 0 ucast_probes 3 mcast_probes 3 anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 0 Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: xu xin <xu.xin16@zte.com.cn> Acked-by: Joanne Koong <joannekoong@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-12-12 12:34:38 +00:00
Jakub Kicinski	77ab714f00	Merge branch 'add-fdma-support-on-ocelot-switch-driver' Clément Léger says: ==================== Add FDMA support on ocelot switch driver This series adds support for the Frame DMA present on the VSC7514 switch. The FDMA is able to extract and inject packets on the various ethernet interfaces present on the switch. ==================== Link: https://lore.kernel.org/r/20211209154911.3152830-1-clement.leger@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-10 20:57:00 -08:00
Clément Léger	753a026cfe	net: ocelot: add FDMA support Ethernet frames can be extracted or injected autonomously to or from the device’s DDR3/DDR3L memory and/or PCIe memory space. Linked list data structures in memory are used for injecting or extracting Ethernet frames. The FDMA generates interrupts when frame extraction or injection is done and when the linked lists need updating. The FDMA is shared between all the ethernet ports of the switch and uses a linked list of descriptors (DCB) to inject and extract packets. Before adding descriptors, the FDMA channels must be stopped. It would be inefficient to do that each time a descriptor would be added so the channels are restarted only once they stopped. Both channels uses ring-like structure to feed the DCBs to the FDMA. head and tail are never touched by hardware and are completely handled by the driver. On top of that, page recycling has been added and is mostly taken from gianfar driver. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Co-developed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Clément Léger <clement.leger@bootlin.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-10 20:56:58 -08:00
Clément Léger	de5841e1c9	net: ocelot: add support for ndo_change_mtu This commit adds support for changing MTU for the ocelot register based interface. For ocelot, JUMBO frame size can be set up to 25000 bytes but has been set to 9000 which is a saner value and allows for maximum gain of performance with FDMA. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Clément Léger <clement.leger@bootlin.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-10 20:56:57 -08:00
Clément Léger	b471a71e52	net: ocelot: add and export ocelot_ptp_rx_timestamp() In order to support PTP in FDMA, PTP handling code is needed. Since this is the same as for register-based extraction, export it with a new ocelot_ptp_rx_timestamp() function. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Clément Léger <clement.leger@bootlin.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-10 20:56:57 -08:00
Clément Léger	e5150f0072	net: ocelot: export ocelot_ifh_port_set() to setup IFH FDMA will need this code to prepare the injection frame header when sending SKBs. Move this code into ocelot_ifh_port_set() and add conditional IFH setting for vlan and rew op if they are not set. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Clément Léger <clement.leger@bootlin.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-10 20:56:57 -08:00
Jakub Kicinski	1868d997cf	Merge branch 'net-wwan-iosm-improvements' M Chetan Kumar says: ==================== net: wwan: iosm: improvements This patch series brings in IOSM driver improvments. PATCH1: Set tx queue len. PATCH2: Release data channel if there is no active IP session. PATCH3: Removes dead code. PATCH4: Correct open parenthesis alignment. ==================== Link: https://lore.kernel.org/r/20211209143230.3054755-1-m.chetan.kumar@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-10 20:53:59 -08:00
M Chetan Kumar	dd464f145c	net: wwan: iosm: correct open parenthesis alignment Fix checkpatch warning in iosm_ipc_mmio.c - Alignment should match open parenthesis Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com> Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-10 20:53:56 -08:00
M Chetan Kumar	8a7ed60050	net: wwan: iosm: removed unused function decl ipc_wwan_tx_flowctrl() is declared in iosm_ipc_wwan.h but is not defined. Removed the dead code. Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com> Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-10 20:53:56 -08:00
M Chetan Kumar	da633aa316	net: wwan: iosm: release data channel in case no active IP session If there is no active IP session (interface up & running) then release the data channel. Use nr_sessions variable to track current active IP sessions. If the count drops to 0, then send pipe close ctrl message to release the data channel. Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com> Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-10 20:53:56 -08:00
M Chetan Kumar	5d710dc331	net: wwan: iosm: set tx queue len Set wwan net dev tx queue len to DEFAULT_TX_QUEUE_LEN. Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com> Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-10 20:53:56 -08:00
Colin Foster	840ece19e9	net: ocelot: fix missed include in the vsc7514_regs.h file commit `32ecd22ba6` ("net: mscc: ocelot: split register definitions to a separate file") left out an include for <soc/mscc/ocelot_vcap.h>. It was missed because the only consumer was ocelot_vsc7514.h, which already included ocelot_vcap. Fixes: `32ecd22ba6` ("net: mscc: ocelot: split register definitions to a separate file") Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20211209074010.1813010-1-colin.foster@in-advantage.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-10 19:52:10 -08:00
Erik Ekman	7adf905333	net: bna: Update supported link modes The BR-series installation guide from https://driverdownloads.qlogic.com/ mentions the cards support 10Gbase-SR/LR as well as direct attach cables. The cards only have SFP+ ports, so 10000baseT is not the right mode. Switch to using more specific link modes added in commit `5711a98221` ("net: ethtool: add support for 1000BaseX and missing 10G link modes"). Only compile tested. Signed-off-by: Erik Ekman <erik@kryo.se> Link: https://lore.kernel.org/r/20211208230022.153496-1-erik@kryo.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-10 19:50:40 -08:00
Kuniyuki Iwashima	33d60fbd21	sock: Use sock_owned_by_user_nocheck() instead of sk_lock.owned. This patch moves sock_release_ownership() down in include/net/sock.h and replaces some sk_lock.owned tests with sock_owned_by_user_nocheck(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Link: https://lore.kernel.org/r/20211208062158.54132-1-kuniyu@amazon.co.jp Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-10 19:43:00 -08:00
Jakub Kicinski	be3158290d	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Andrii Nakryiko says: ==================== bpf-next 2021-12-10 v2 We've added 115 non-merge commits during the last 26 day(s) which contain a total of 182 files changed, 5747 insertions(+), 2564 deletions(-). The main changes are: 1) Various samples fixes, from Alexander Lobakin. 2) BPF CO-RE support in kernel and light skeleton, from Alexei Starovoitov. 3) A batch of new unified APIs for libbpf, logging improvements, version querying, etc. Also a batch of old deprecations for old APIs and various bug fixes, in preparation for libbpf 1.0, from Andrii Nakryiko. 4) BPF documentation reorganization and improvements, from Christoph Hellwig and Dave Tucker. 5) Support for declarative initialization of BPF_MAP_TYPE_PROG_ARRAY in libbpf, from Hengqi Chen. 6) Verifier log fixes, from Hou Tao. 7) Runtime-bounded loops support with bpf_loop() helper, from Joanne Koong. 8) Extend branch record capturing to all platforms that support it, from Kajol Jain. 9) Light skeleton codegen improvements, from Kumar Kartikeya Dwivedi. 10) bpftool doc-generating script improvements, from Quentin Monnet. 11) Two libbpf v0.6 bug fixes, from Shuyi Cheng and Vincent Minet. 12) Deprecation warning fix for perf/bpf_counter, from Song Liu. 13) MAX_TAIL_CALL_CNT unification and MIPS build fix for libbpf, from Tiezhu Yang. 14) BTF_KING_TYPE_TAG follow-up fixes, from Yonghong Song. 15) Selftests fixes and improvements, from Ilya Leoshkevich, Jean-Philippe Brucker, Jiri Olsa, Maxim Mikityanskiy, Tirthendu Sarkar, Yucong Sun, and others. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (115 commits) libbpf: Add "bool skipped" to struct bpf_map libbpf: Fix typo in btf__dedup@LIBBPF_0.0.2 definition bpftool: Switch bpf_object__load_xattr() to bpf_object__load() selftests/bpf: Remove the only use of deprecated bpf_object__load_xattr() selftests/bpf: Add test for libbpf's custom log_buf behavior selftests/bpf: Replace all uses of bpf_load_btf() with bpf_btf_load() libbpf: Deprecate bpf_object__load_xattr() libbpf: Add per-program log buffer setter and getter libbpf: Preserve kernel error code and remove kprobe prog type guessing libbpf: Improve logging around BPF program loading libbpf: Allow passing user log setting through bpf_object_open_opts libbpf: Allow passing preallocated log_buf when loading BTF into kernel libbpf: Add OPTS-based bpf_btf_load() API libbpf: Fix bpf_prog_load() log_buf logic for log_level 0 samples/bpf: Remove unneeded variable bpf: Remove redundant assignment to pointer t selftests/bpf: Fix a compilation warning perf/bpf_counter: Use bpf_map_create instead of bpf_create_map samples: bpf: Fix 'unknown warning group' build warning on Clang samples: bpf: Fix xdp_sample_user.o linking with Clang ... ==================== Link: https://lore.kernel.org/r/20211210234746.2100561-1-andrii@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-10 15:56:13 -08:00
Shuyi Cheng	229fae38d0	libbpf: Add "bool skipped" to struct bpf_map Fix error: "failed to pin map: Bad file descriptor, path: /sys/fs/bpf/_rodata_str1_1." In the old kernel, the global data map will not be created, see [0]. So we should skip the pinning of the global data map to avoid bpf_object__pin_maps returning error. Therefore, when the map is not created, we mark “map->skipped" as true and then check during relocation and during pinning. Fixes: `16e0c35c6f` ("libbpf: Load global data maps lazily on legacy kernels") Signed-off-by: Shuyi Cheng <chengshuyi@linux.alibaba.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2021-12-10 15:35:30 -08:00
Vincent Minet	b69c5c07a6	libbpf: Fix typo in btf__dedup@LIBBPF_0.0.2 definition The btf__dedup_deprecated name was misspelled in the definition of the compat symbol for btf__dedup. This leads it to be missing from the shared library. This fixes it. Fixes: `957d350a8b` ("libbpf: Turn btf_dedup_opts into OPTS-based struct") Signed-off-by: Vincent Minet <vincent@vincent-minet.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211210063112.80047-1-vincent@vincent-minet.net	2021-12-10 15:35:29 -08:00
Alexei Starovoitov	bd6b3b355a	Merge branch 'Enhance and rework logging controls in libbpf' Andrii Nakryiko says: ==================== Add new open options and per-program setters to control BTF and program loading log verboseness and allow providing custom log buffers to capture logs of interest. Note how custom log_buf and log_level are orthogonal, which matches previous (alas less customizable) behavior of libbpf, even though it sort of worked by accident: if someone specified log_level = 1 in bpf_object__load_xattr(), first attempt to load any BPF program resulted in wasted bpf() syscall with -EINVAL due to !!log_buf != !!log_level. Then on retry libbpf would allocated log_buffer and try again, after which prog loading would succeed and libbpf would print verbose program loading log through its print callback. This behavior is now documented and made more efficient, not wasting unnecessary syscall. But additionally, log_level can be controlled globally on a per-bpf_object level through bpf_object_open_opts, as well as on a per-program basis with bpf_program__set_log_buf() and bpf_program__set_log_level() APIs. Now that we have a more future-proof way to set log_level, deprecate bpf_object__load_xattr(). v2->v3: - added log_buf selftests for bpf_prog_load() and bpf_btf_load(); - fix !log_buf in bpf_prog_load (John); - fix log_level==0 in bpf_btf_load (thanks selftest!); v1->v2: - fix log_level == 0 handling of bpf_prog_load, add as patch #1 (Alexei); - add comments explaining log_buf_size overflow prevention (Alexei). ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2021-12-10 15:35:14 -08:00
Andrii Nakryiko	b59e4ce8bc	bpftool: Switch bpf_object__load_xattr() to bpf_object__load() Switch all the uses of to-be-deprecated bpf_object__load_xattr() into a simple bpf_object__load() calls with optional log_level passed through open_opts.kernel_log_level, if -d option is specified. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211209193840.1248570-13-andrii@kernel.org	2021-12-10 15:29:18 -08:00
Andrii Nakryiko	3fc5fdcca1	selftests/bpf: Remove the only use of deprecated bpf_object__load_xattr() Switch from bpf_object__load_xattr() to bpf_object__load() and kernel_log_level in bpf_object_open_opts. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211209193840.1248570-12-andrii@kernel.org	2021-12-10 15:29:18 -08:00
Andrii Nakryiko	57e889269a	selftests/bpf: Add test for libbpf's custom log_buf behavior Add a selftest that validates that per-program and per-object log_buf overrides work as expected. Also test same logic for low-level bpf_prog_load() and bpf_btf_load() APIs. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211209193840.1248570-11-andrii@kernel.org	2021-12-10 15:29:18 -08:00
Andrii Nakryiko	dc94121b5c	selftests/bpf: Replace all uses of bpf_load_btf() with bpf_btf_load() Switch all selftests uses of to-be-deprecated bpf_load_btf() with equivalent bpf_btf_load() calls. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211209193840.1248570-10-andrii@kernel.org	2021-12-10 15:29:18 -08:00
Andrii Nakryiko	e7b924ca71	libbpf: Deprecate bpf_object__load_xattr() Deprecate non-extensible bpf_object__load_xattr() in v0.8 ([0]). With log_level control through bpf_object_open_opts or bpf_program__set_log_level(), we are finally at the point where bpf_object__load_xattr() doesn't provide any functionality that can't be accessed through other (better) ways. The other feature, target_btf_path, is also controllable through bpf_object_open_opts. [0] Closes: https://github.com/libbpf/libbpf/issues/289 Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211209193840.1248570-9-andrii@kernel.org	2021-12-10 15:29:18 -08:00
Andrii Nakryiko	b3ce907950	libbpf: Add per-program log buffer setter and getter Allow to set user-provided log buffer on a per-program basis ([0]). This gives great deal of flexibility in terms of which programs are loaded with logging enabled and where corresponding logs go. Log buffer set with bpf_program__set_log_buf() overrides kernel_log_buf and kernel_log_size settings set at bpf_object open time through bpf_object_open_opts, if any. Adjust bpf_object_load_prog_instance() logic to not perform own log buf allocation and load retry if custom log buffer is provided by the user. [0] Closes: https://github.com/libbpf/libbpf/issues/418 Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211209193840.1248570-8-andrii@kernel.org	2021-12-10 15:29:18 -08:00
Andrii Nakryiko	2eda2145eb	libbpf: Preserve kernel error code and remove kprobe prog type guessing Instead of rewriting error code returned by the kernel of prog load with libbpf-sepcific variants pass through the original error. There is now also no need to have a backup generic -LIBBPF_ERRNO__LOAD fallback error as bpf_prog_load() guarantees that errno will be properly set no matter what. Also drop a completely outdated and pretty useless BPF_PROG_TYPE_KPROBE guess logic. It's not necessary and neither it's helpful in modern BPF applications. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211209193840.1248570-7-andrii@kernel.org	2021-12-10 15:29:18 -08:00
Andrii Nakryiko	ad9a7f9644	libbpf: Improve logging around BPF program loading Add missing "prog '%s': " prefixes in few places and use consistently markers for beginning and end of program load logs. Here's an example of log output: libbpf: prog 'handler': BPF program load failed: Permission denied libbpf: -- BEGIN PROG LOAD LOG --- arg#0 reference type('UNKNOWN ') size cannot be determined: -22 ; out1 = in1; 0: (18) r1 = 0xffffc9000cdcc000 2: (61) r1 = (u32 )(r1 +0) ... 81: (63) (u32 )(r4 +0) = r5 R1_w=map_value(id=0,off=16,ks=4,vs=20,imm=0) R4=map_value(id=0,off=400,ks=4,vs=16,imm=0) invalid access to map value, value_size=16 off=400 size=4 R4 min value is outside of the allowed memory range processed 63 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 -- END PROG LOAD LOG -- libbpf: failed to load program 'handler' libbpf: failed to load object 'test_skeleton' The entire verifier log, including BEGIN and END markers are now always youtput during a single print callback call. This should make it much easier to post-process or parse it, if necessary. It's not an explicit API guarantee, but it can be reasonably expected to stay like that. Also __bpf_object__open is renamed to bpf_object_open() as it's always an adventure to find the exact function that implements bpf_object's open phase, so drop the double underscored and use internal libbpf naming convention. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211209193840.1248570-6-andrii@kernel.org	2021-12-10 15:29:17 -08:00
Andrii Nakryiko	e0e3ea888c	libbpf: Allow passing user log setting through bpf_object_open_opts Allow users to provide their own custom log_buf, log_size, and log_level at bpf_object level through bpf_object_open_opts. This log_buf will be used during BTF loading. Subsequent patch will use same log_buf during BPF program loading, unless overriden at per-bpf_program level. When such custom log_buf is provided, libbpf won't be attempting retrying loading of BTF to try to provide its own log buffer to capture kernel's error log output. User is responsible to provide big enough buffer, otherwise they run a risk of getting -ENOSPC error from the bpf() syscall. See also comments in bpf_object_open_opts regarding log_level and log_buf interactions. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211209193840.1248570-5-andrii@kernel.org	2021-12-10 15:29:17 -08:00
Andrii Nakryiko	1a190d1e8e	libbpf: Allow passing preallocated log_buf when loading BTF into kernel Add libbpf-internal btf_load_into_kernel() that allows to pass preallocated log_buf and custom log_level to be passed into kernel during BPF_BTF_LOAD call. When custom log_buf is provided, btf_load_into_kernel() won't attempt an retry with automatically allocated internal temporary buffer to capture BTF validation log. It's important to note the relation between log_buf and log_level, which slightly deviates from stricter kernel logic. From kernel's POV, if log_buf is specified, log_level has to be > 0, and vice versa. While kernel has good reasons to request such "sanity, this, in practice, is a bit unconvenient and restrictive for libbpf's high-level bpf_object APIs. So libbpf will allow to set non-NULL log_buf and log_level == 0. This is fine and means to attempt to load BTF without logging requested, but if it failes, retry the load with custom log_buf and log_level 1. Similar logic will be implemented for program loading. In practice this means that users can provide custom log buffer just in case error happens, but not really request slower verbose logging all the time. This is also consistent with libbpf behavior when custom log_buf is not set: libbpf first tries to load everything with log_level=0, and only if error happens allocates internal log buffer and retries with log_level=1. Also, while at it, make BTF validation log more obvious and follow the log pattern libbpf is using for dumping BPF verifier log during BPF_PROG_LOAD. BTF loading resulting in an error will look like this: libbpf: BTF loading error: -22 libbpf: -- BEGIN BTF LOAD LOG --- magic: 0xeb9f version: 1 flags: 0x0 hdr_len: 24 type_off: 0 type_len: 1040 str_off: 1040 str_len: 2063598257 btf_total_size: 1753 Total section length too long -- END BTF LOAD LOG -- libbpf: Error loading .BTF into kernel: -22. BTF is optional, ignoring. This makes it much easier to find relevant parts in libbpf log output. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211209193840.1248570-4-andrii@kernel.org	2021-12-10 15:29:17 -08:00
Andrii Nakryiko	0ed08d6725	libbpf: Add OPTS-based bpf_btf_load() API Similar to previous bpf_prog_load() and bpf_map_create() APIs, add bpf_btf_load() API which is taking optional OPTS struct. Schedule bpf_load_btf() for deprecation in v0.8 ([0]). This makes naming consistent with BPF_BTF_LOAD command, sets up an API for extensibility in the future, moves options parameters (log-related fields) into optional options, and also allows to pass log_level directly. It also removes log buffer auto-allocation logic from low-level API (consistent with bpf_prog_load() behavior), but preserves a special treatment of log_level == 0 with non-NULL log_buf, which matches low-level bpf_prog_load() and high-level libbpf APIs for BTF and program loading behaviors. [0] Closes: https://github.com/libbpf/libbpf/issues/419 Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211209193840.1248570-3-andrii@kernel.org	2021-12-10 15:29:17 -08:00
Andrii Nakryiko	4cf23a3c63	libbpf: Fix bpf_prog_load() log_buf logic for log_level 0 To unify libbpf APIs behavior w.r.t. log_buf and log_level, fix bpf_prog_load() to follow the same logic as bpf_btf_load() and high-level bpf_object__load() API will follow in the subsequent patches: - if log_level is 0 and non-NULL log_buf is provided by a user, attempt load operation initially with no log_buf and log_level set; - if successful, we are done, return new FD; - on error, retry the load operation with log_level bumped to 1 and log_buf set; this way verbose logging will be requested only when we are sure that there is a failure, but will be fast in the common/expected success case. Of course, user can still specify log_level > 0 from the very beginning to force log collection. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211209193840.1248570-2-andrii@kernel.org	2021-12-10 15:29:17 -08:00
Ye Guojin	db10415448	selftests: mptcp: remove duplicate include in mptcp_inq.c 'sys/ioctl.h' included in 'mptcp_inq.c' is duplicated. Reported-by: ZealRobot <zealci@zte.com.cn> Signed-off-by: Ye Guojin <ye.guojin@zte.com.cn> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20211210071424.425773-1-ye.guojin@zte.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-10 11:00:32 -08:00
Eric Dumazet	e1b539bd73	xfrm: add net device refcount tracker to struct xfrm_state_offload Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Link: https://lore.kernel.org/r/20211209154451.4184050-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-10 08:16:33 -08:00
Jakub Kicinski	3d20408dff	Merge branch 'net-netns-refcount-tracking-base-series' Eric Dumazet says: ==================== net: netns refcount tracking, base series We have 100+ syzbot reports about netns being dismantled too soon, still unresolved as of today. We think a missing get_net() or an extra put_net() is the root cause. In order to find the bug(s), and be able to spot future ones, this patch adds CONFIG_NET_NS_REFCNT_TRACKER and new helpers to precisely pair all put_net() with corresponding get_net(). To use these helpers, each data structure owning a refcount should also use a "netns_tracker" to pair the get() and put(). Small sections of codes where the get()/put() are in sight do not need to have a tracker, because they are short lived, but in theory it is also possible to declare an on-stack tracker. v2: Include core networking patches only. ==================== Link: https://lore.kernel.org/r/20211210074426.279563-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-10 06:38:30 -08:00
Eric Dumazet	11b311a867	ppp: add netns refcount tracker Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-10 06:38:27 -08:00
Eric Dumazet	285ec2fef4	l2tp: add netns refcount tracker to l2tp_dfs_seq_data Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-10 06:38:27 -08:00
Eric Dumazet	dbdcda634c	net: sched: add netns refcount tracker to struct tcf_exts Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-10 06:38:26 -08:00
Eric Dumazet	04a931e58d	net: add netns refcount tracker to struct seq_net_private Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-10 06:38:26 -08:00
Eric Dumazet	ffa84b5ffb	net: add netns refcount tracker to struct sock Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-10 06:38:26 -08:00
Eric Dumazet	9ba74e6c9e	net: add networking namespace refcount tracker We have 100+ syzbot reports about netns being dismantled too soon, still unresolved as of today. We think a missing get_net() or an extra put_net() is the root cause. In order to find the bug(s), and be able to spot future ones, this patch adds CONFIG_NET_NS_REFCNT_TRACKER and new helpers to precisely pair all put_net() with corresponding get_net(). To use these helpers, each data structure owning a refcount should also use a "netns_tracker" to pair the get and put. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-10 06:38:26 -08:00
Geert Uytterhoeven	e5d75fc20b	sh_eth: Use dev_err_probe() helper Use the dev_err_probe() helper, instead of open-coding the same operation. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com> Link: https://lore.kernel.org/r/2576cc15bdbb5be636640f491bcc087a334e2c02.1638959463.git.geert+renesas@glider.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-12-09 19:10:23 -08:00

1 2 3 4 5 ...

1060369 Commits