linux

iv/linux

Author	SHA1	Message	Date
David Vernet	57e7c169cd	bpf: Add __bpf_kfunc tag for marking kernel functions as kfuncs kfuncs are functions defined in the kernel, which may be invoked by BPF programs. They may or may not also be used as regular kernel functions, implying that they may be static (in which case the compiler could e.g. inline it away, or elide one or more arguments), or it could have external linkage, but potentially be elided in an LTO build if a function is observed to never be used, and is stripped from the final kernel binary. This has already resulted in some issues, such as those discussed in [0] wherein changes in DWARF that identify when a parameter has been optimized out can break BTF encodings (and in general break the kfunc). [0]: https://lore.kernel.org/all/1675088985-20300-2-git-send-email-alan.maguire@oracle.com/ We therefore require some convenience macro that kfunc developers can use just add to their kfuncs, and which will prevent all of the above issues from happening. This is in contrast with what we have today, where some kfunc definitions have "noinline", some have "__used", and others are static and have neither. Note that longer term, this mechanism may be replaced by a macro that more closely resembles EXPORT_SYMBOL_GPL(), as described in [1]. For now, we're going with this shorter-term approach to fix existing issues in kfuncs. [1]: https://lore.kernel.org/lkml/Y9AFT4pTydKh+PD3@maniforge.lan/ Note as well that checkpatch complains about this patch with the following: ERROR: Macros with complex values should be enclosed in parentheses +#define __bpf_kfunc __used noinline There seems to be a precedent for using this pattern in other places such as compiler_types.h (see e.g. __randomize_layout and noinstr), so it seems appropriate. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20230201173016.342758-2-void@manifault.com	2023-02-02 00:25:13 +01:00
Daniel Borkmann	10d1b0e4da	Merge branch 'xdp-ice-mbuf' Maciej Fijalkowski says: ==================== Although this work started as an effort to add multi-buffer XDP support to ice driver, as usual it turned out that some other side stuff needed to be addressed, so let me give you an overview. First patch adjusts legacy-rx in a way that it will be possible to refer to skb_shared_info being at the end of the buffer when gathering up frame fragments within xdp_buff. Then, patches 2-9 prepare ice driver in a way that actual multi-buffer patches will be easier to swallow. 10 and 11 are the meat. What is worth mentioning is that this set actually fixes things as patch 11 removes the logic based on next_dd/rs and we previously stepped away from this for ice_xmit_zc(). Currently, AF_XDP ZC XDP_TX workload is off as there are two cleaning sides that can be triggered and two of them work on different internal logic. This set unifies that and allows us to improve the performance by 2x with a trick on the last (13) patch. 12th is a simple cleanup of no longer fields from Tx ring. I might be wrong but I have not seen anyone reporting performance impact among patches that add XDP multi-buffer support to a particular driver. Numbers below were gathered via xdp_rxq_info and xdp_redirect_map on 1500 MTU: XDP_DROP +1% XDP_PASS -1,2% XDP_TX -0,5% XDP_REDIRECT -3,3% Cherry on top, which is not directly related to mbuf support (last patch): XDP_TX ZC +126% Target the we agreed on was to not degrade performance for any action by anything that would be over 5%, so our goal was met. Basically this set keeps the performance where it was. Redirect is slower due to more frequent tail bumps. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>	2023-02-01 23:31:28 +01:00
Maciej Fijalkowski	a24b4c6e9a	ice: xsk: Do not convert to buff to frame for XDP_TX Let us store pointer to xdp_buff that came from xsk_buff_pool on tx_buf so that it will be possible to recycle it via xsk_buff_free() on Tx cleaning side. This way it is not necessary to do expensive copy to another xdp_buff backed by a newly allocated page. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-14-maciej.fijalkowski@intel.com	2023-02-01 23:30:27 +01:00
Maciej Fijalkowski	f4db7b314d	ice: Remove next_{dd,rs} fields from ice_tx_ring Now that both ZC and standard XDP data paths stopped using Tx logic based on next_dd and next_rs fields, we can safely remove these fields and shrink Tx ring structure. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-13-maciej.fijalkowski@intel.com	2023-02-01 23:30:27 +01:00
Maciej Fijalkowski	3246a10752	ice: Add support for XDP multi-buffer on Tx side Similarly as for Rx side in previous patch, logic on XDP Tx in ice driver needs to be adjusted for multi-buffer support. Specifically, the way how HW Tx descriptors are produced and cleaned. Currently, XDP_TX works on strict ring boundaries, meaning it sets RS bit (on producer side) / looks up DD bit (on consumer/cleaning side) every quarter of the ring. It means that if for example multi buffer frame would span across the ring quarter boundary (say that frame consists of 4 frames and we start from 62 descriptor where ring is sized to 256 entries), RS bit would be produced in the middle of multi buffer frame, which would be a broken behavior as it needs to be set on the last descriptor of the frame. To make it work, set RS bit at the last descriptor from the batch of frames that XDP_TX action was used on and make the first entry remember the index of last descriptor with RS bit set. This way, cleaning side can take the index of descriptor with RS bit, look up DD bit's presence and clean from first entry to last. In order to clean up the code base introduce the common ice_set_rs_bit() which will return index of descriptor that got RS bit produced on so that standard driver can store this within proper ice_tx_buf and ZC driver can simply ignore return value. Co-developed-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-12-maciej.fijalkowski@intel.com	2023-02-01 23:30:27 +01:00
Maciej Fijalkowski	2fba7dc515	ice: Add support for XDP multi-buffer on Rx side Ice driver needs to be a bit reworked on Rx data path in order to support multi-buffer XDP. For skb path, it currently works in a way that Rx ring carries pointer to skb so if driver didn't manage to combine fragmented frame at current NAPI instance, it can restore the state on next instance and keep looking for last fragment (so descriptor with EOP bit set). What needs to be achieved is that xdp_buff needs to be combined in such way (linear + frags part) in the first place. Then skb will be ready to go in case of XDP_PASS or BPF program being not present on interface. If BPF program is there, it would work on multi-buffer XDP. At this point xdp_buff resides directly on Rx ring, so given the fact that skb will be built straight from xdp_buff, there will be no further need to carry skb on Rx ring. Besides removing skb pointer from Rx ring, lots of members have been moved around within ice_rx_ring. First and foremost reason was to place rx_buf with xdp_buff on the same cacheline. This means that once we touch rx_buf (which is a preceding step before touching xdp_buff), xdp_buff will already be hot in cache. Second thing was that xdp_rxq is used rather rarely and it occupies a separate cacheline, so maybe it is better to have it at the end of ice_rx_ring. Other change that affects ice_rx_ring is the introduction of ice_rx_ring::first_desc. Its purpose is twofold - first is to propagate rx_buf->act to all the parts of current xdp_buff after running XDP program, so that ice_put_rx_buf() that got moved out of the main Rx processing loop will be able to tak an appriopriate action on each buffer. Second is for ice_construct_skb(). ice_construct_skb() has a copybreak mechanism which had an explicit impact on xdp_buff->skb conversion in the new approach when legacy Rx flag is toggled. It works in a way that linear part is 256 bytes long, if frame is bigger than that, remaining bytes are going as a frag to skb_shared_info. This means while memcpying frags from xdp_buff to newly allocated skb, care needs to be taken when picking the destination frag array entry. Upon the time ice_construct_skb() is called, when dealing with fragmented frame, current rx_buf points to the last fragment, but copybreak needs to be done against the first one. That's where ice_rx_ring::first_desc helps. When frame building spans across NAPI polls (DD bit is not set on current descriptor and xdp->data is not NULL) with current Rx buffer handling state there might be some problems. Since calls to ice_put_rx_buf() were pulled out of the main Rx processing loop and were scoped from cached_ntc to current ntc, remember that now mentioned function relies on rx_buf->act, which is set within ice_run_xdp(). ice_run_xdp() is called when EOP bit was found, so currently we could put Rx buffer with rx_buf->act being uninitialized. To address this, change scoping to rely on first_desc on both boundaries instead. This also implies that cleaned_count which is used as an input to ice_alloc_rx_buffers() and tells how many new buffers should be refilled has to be adjusted. If it stayed as is, what could happen is a case where ntc would go over ntu. Therefore, remove cleaned_count altogether and use against allocing routine newly introduced ICE_RX_DESC_UNUSED() macro which is an equivalent of ICE_DESC_UNUSED() dedicated for Rx side and based on struct ice_rx_ring::first_desc instead of next_to_clean. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-11-maciej.fijalkowski@intel.com	2023-02-01 23:30:27 +01:00
Maciej Fijalkowski	8a11b334ec	ice: Use xdp->frame_sz instead of recalculating truesize SKB path calculates truesize on three different functions, which could be avoided as xdp_buff carries the already calculated truesize under xdp_buff::frame_sz. If ice_add_rx_frag() is adjusted to take the xdp_buff as an input just like functions responsible for creating sk_buff initially, codebase could be simplified by removing these redundant recalculations and rely on xdp_buff::frame_sz instead. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-10-maciej.fijalkowski@intel.com	2023-02-01 23:30:27 +01:00
Maciej Fijalkowski	9070fe3da0	ice: Do not call ice_finalize_xdp_rx() unnecessarily Currently ice_finalize_xdp_rx() is called only when xdp_prog is present on VSI, which is a good thing. However, this optimization can be enhanced and check only if any of the XDP_TX/XDP_REDIRECT took place in current Rx processing. Non-zero value of @xdp_xmit indicates that xdp_prog is present on VSI. This way XDP_DROP-based workloads will not suffer from unnecessary calls to external function. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-9-maciej.fijalkowski@intel.com	2023-02-01 23:30:27 +01:00
Maciej Fijalkowski	60bc72b3c4	ice: Use ice_max_xdp_frame_size() in ice_xdp_setup_prog() This should have been used in there from day 1, let us address that before introducing XDP multi-buffer support for Rx side. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-8-maciej.fijalkowski@intel.com	2023-02-01 23:30:27 +01:00
Maciej Fijalkowski	1dc1a7e7f4	ice: Centrallize Rx buffer recycling Currently calls to ice_put_rx_buf() are sprinkled through ice_clean_rx_irq() - first place is for explicit flow director's descriptor handling, second is after running XDP prog and the last one is after taking care of skb. 1st callsite was actually only for ntc bump purpose, as Rx buffer to be recycled is not even passed to a function. It is possible to walk through Rx buffers processed in particular NAPI cycle by caching ntc from beginning of the ice_clean_rx_irq(). To do so, let us store XDP verdict inside ice_rx_buf, so action we need to take on will be known. For XDP prog absence, just store ICE_XDP_PASS as a verdict. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-7-maciej.fijalkowski@intel.com	2023-02-01 23:30:26 +01:00
Maciej Fijalkowski	e44f4790a2	ice: Inline eop check This might be in future used by ZC driver and might potentially yield a minor performance boost. While at it, constify arguments that ice_is_non_eop() takes, since they are pointers and this will help compiler while generating asm. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-6-maciej.fijalkowski@intel.com	2023-02-01 23:30:26 +01:00
Maciej Fijalkowski	d7956d81f1	ice: Pull out next_to_clean bump out of ice_put_rx_buf() Plan is to move ice_put_rx_buf() to the end of ice_clean_rx_irq() so in order to keep the ability of walking through HW Rx descriptors, pull out next_to_clean handling out of ice_put_rx_buf(). Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-5-maciej.fijalkowski@intel.com	2023-02-01 23:30:26 +01:00
Maciej Fijalkowski	ac07533911	ice: Store page count inside ice_rx_buf This will allow us to avoid carrying additional auxiliary array of page counts when dealing with XDP multi buffer support. Previously combining fragmented frame to skb was not affected in the same way as XDP would be as whole frame is needed to be in place before executing XDP prog. Therefore, when going through HW Rx descriptors one-by-one, calls to ice_put_rx_buf() need to be taken after running XDP prog on a potentially multi buffered frame, so some additional storage of page count is needed. By adding page count to rx buf, it will make it easier to walk through processed entries at the end of rx cleaning routine and decide whether or not buffers should be recycled. While at it, bump ice_rx_buf::pagecnt_bias from u16 up to u32. It was proven many times that calculations on variables smaller than standard register size are harmful. This was also the case during experiments with embedding page count to ice_rx_buf - when this was added as u16 it had a performance impact. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-4-maciej.fijalkowski@intel.com	2023-02-01 23:30:26 +01:00
Maciej Fijalkowski	cb0473e0e9	ice: Add xdp_buff to ice_rx_ring struct In preparation for XDP multi-buffer support, let's store xdp_buff on Rx ring struct. This will allow us to combine fragmented frames across separate NAPI cycles in the same way as currently skb fragments are handled. This means that skb pointer on Rx ring will become redundant and will be removed. For now it is kept and layout of Rx ring struct was not inspected, some member movement will be needed later on so that will be the time to take care of it. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-3-maciej.fijalkowski@intel.com	2023-02-01 23:30:26 +01:00
Maciej Fijalkowski	c61bcebde7	ice: Prepare legacy-rx for upcoming XDP multi-buffer support Rx path is going to be modified in a way that fragmented frame will be gathered within xdp_buff in the first place. This approach implies that underlying buffer has to provide tailroom for skb_shared_info. This is currently the case when ring uses build_skb but not when legacy-rx knob is turned on. This case configures 2k Rx buffers and has no way to provide either headroom or tailroom - FWIW it currently has XDP_PACKET_HEADROOM which is broken and in here it is removed. 2k Rx buffers were used so driver in this setting was able to support 9k MTU as it can chain up to 5 Rx buffers. With offset configuring HW writing 2k of a data was passing the half of the page which broke the assumption of our internal page recycling tricks. Now if above got fixed and legacy-rx path would be left as is, when referring to skb_shared_info via xdp_get_shared_info_from_buff(), packet's content would be corrupted again. Hence size of Rx buffer needs to be lowered and therefore supported MTU. This operation will allow us to keep the unified data path and with 8k MTU users (if any of legacy-rx) would still be good to go. However, tendency is to drop the support for this code path at some point. Add ICE_RXBUF_1664 as vsi::rx_buf_len and ICE_MAX_FRAME_LEGACY_RX (8320) as vsi::max_frame for legacy-rx. For bigger page sizes configure 3k Rx buffers, not 2k. Since headroom support is removed, disable data_meta support on legacy-rx. When preparing XDP buff, rely on ice_rx_ring::rx_offset setting when deciding whether to support data_meta or not. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-2-maciej.fijalkowski@intel.com	2023-02-01 23:30:26 +01:00
Alexei Starovoitov	c1a3daf736	Merge branch 'Support bpf trampoline for s390x' Ilya Leoshkevich says: ==================== v2: https://lore.kernel.org/bpf/20230128000650.1516334-1-iii@linux.ibm.com/#t v2 -> v3: - Make __arch_prepare_bpf_trampoline static. (Reported-by: kernel test robot <lkp@intel.com>) - Support both old- and new- style map definitions in sk_assign. (Alexei) - Trim DENYLIST.s390x. (Alexei) - Adjust s390x vmlinux path in vmtest.sh. - Drop merged fixes. v1: https://lore.kernel.org/bpf/20230125213817.1424447-1-iii@linux.ibm.com/#t v1 -> v2: - Fix core_read_macros, sk_assign, test_profiler, test_bpffs (24/31; I'm not quite happy with the fix, but don't have better ideas), and xdp_synproxy. (Andrii) - Prettify liburandom_read and verify_pkcs7_sig fixes. (Andrii) - Fix bpf_usdt_arg using barrier_var(); prettify barrier_var(). (Andrii) - Change BPF_MAX_TRAMP_LINKS to enum and query it using BTF. (Andrii) - Improve bpf_jit_supports_kfunc_call() description. (Alexei) - Always check sign_extend() return value. - Cc: Alexander Gordeev. Hi, This series implements poke, trampoline, kfunc, and mixing subprogs and tailcalls on s390x. The following failures still remain: #82 get_stack_raw_tp:FAIL get_stack_print_output:FAIL:user_stack corrupted user stack Known issue: We cannot reliably unwind userspace on s390x without DWARF. #101 ksyms_module:FAIL address of kernel function bpf_testmod_test_mod_kfunc is out of range Known issue: Kernel and modules are too far away from each other on s390x. #190 stacktrace_build_id:FAIL Known issue: We cannot reliably unwind userspace on s390x without DWARF. #281 xdp_metadata:FAIL See patch 6. None of these seem to be due to the new changes. Best regards, Ilya ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-29 19:16:29 -08:00
Ilya Leoshkevich	ee105d5a50	selftests/bpf: Trim DENYLIST.s390x Now that trampoline is implemented, enable a number of tests on s390x. 18 of the remaining failures have to do with either lack of rethook (fixed by [1]) or syscall symbols missing from BTF (fixed by [2]). Do not re-classify the remaining failures for now; wait until the s390/for-next fixes are merged and re-classify only the remaining few. [1] https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=for-next&id=1a280f48c0e403903cf0b4231c95b948e664f25a [2] https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git/commit/?h=for-next&id=2213d44e140f979f4b60c3c0f8dd56d151cc8692 Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230129190501.1624747-9-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-29 19:16:29 -08:00
Ilya Leoshkevich	af320fb7dd	selftests/bpf: Fix s390x vmlinux path After commit edd4a8667355 ("s390/boot: get rid of startup archive") there is no more compressed/ subdirectory. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230129190501.1624747-8-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-29 19:16:29 -08:00
Ilya Leoshkevich	63d7b53ab5	s390/bpf: Implement bpf_jit_supports_kfunc_call() Implement calling kernel functions from eBPF. In general, the eBPF ABI is fairly close to that of s390x, with one important difference: on s390x callers should sign-extend signed arguments. Handle that by using information returned by bpf_jit_find_kfunc_model(). Here is an example of how sign extensions works. Suppose we need to call the following function from BPF: ; long noinline bpf_kfunc_call_test4(signed char a, short b, int c, long d) 0000000000936a78 <bpf_kfunc_call_test4>: 936a78: c0 04 00 00 00 00 jgnop bpf_kfunc_call_test4 ; return (long)a + (long)b + (long)c + d; 936a7e: b9 08 00 45 agr %r4,%r5 936a82: b9 08 00 43 agr %r4,%r3 936a86: b9 08 00 24 agr %r2,%r4 936a8a: c0 f4 00 1e 3b 27 jg <__s390_indirect_jump_r14> As per the s390x ABI, bpf_kfunc_call_test4() has the right to assume that a, b and c are sign-extended by the caller, which results in using 64-bit additions (agr) without any additional conversions. Without sign extension we would have the following on the JITed code side: ; tmp = bpf_kfunc_call_test4(-3, -30, -200, -1000); ; 5: b4 10 00 00 ff ff ff fd w1 = -3 0x3ff7fdcdad4: llilf %r2,0xfffffffd ; 6: b4 20 00 00 ff ff ff e2 w2 = -30 0x3ff7fdcdada: llilf %r3,0xffffffe2 ; 7: b4 30 00 00 ff ff ff 38 w3 = -200 0x3ff7fdcdae0: llilf %r4,0xffffff38 ; 8: b7 40 00 00 ff ff fc 18 r4 = -1000 0x3ff7fdcdae6: lgfi %r5,-1000 0x3ff7fdcdaec: mvc 64(4,%r15),160(%r15) 0x3ff7fdcdaf2: lgrl %r1,bpf_kfunc_call_test4@GOT 0x3ff7fdcdaf8: brasl %r14,__s390_indirect_jump_r1 This first 3 llilfs are 32-bit loads, that need to be sign-extended to 64 bits. Note: at the moment bpf_jit_find_kfunc_model() does not seem to play nicely with XDP metadata functions: add_kfunc_call() adds an "abstract" bpf_*() version to kfunc_btf_tab, but then fixup_kfunc_call() puts the concrete version into insn->imm, which bpf_jit_find_kfunc_model() cannot find. But this seems to be a common code problem. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230129190501.1624747-7-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-29 19:16:29 -08:00
Ilya Leoshkevich	dd691e847d	s390/bpf: Implement bpf_jit_supports_subprog_tailcalls() Allow mixing subprogs and tail calls by passing the current tail call count to subprogs. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230129190501.1624747-6-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-29 19:16:29 -08:00
Ilya Leoshkevich	528eb2cb87	s390/bpf: Implement arch_prepare_bpf_trampoline() arch_prepare_bpf_trampoline() is used for direct attachment of eBPF programs to various places, bypassing kprobes. It's responsible for calling a number of eBPF programs before, instead and/or after whatever they are attached to. Add a s390x implementation, paying attention to the following: - Reuse the existing JIT infrastructure, where possible. - Like the existing JIT, prefer making multiple passes instead of backpatching. Currently 2 passes is enough. If literal pool is introduced, this needs to be raised to 3. However, at the moment adding literal pool only makes the code larger. If branch shortening is introduced, the number of passes needs to be increased even further. - Support both regular and ftrace calling conventions, depending on the trampoline flags. - Use expolines for indirect calls. - Handle the mismatch between the eBPF and the s390x ABIs. - Sign-extend fmod_ret return values. invoke_bpf_prog() produces about 120 bytes; it might be possible to slightly optimize this, but reaching 50 bytes, like on x86_64, looks unrealistic: just loading cookie, __bpf_prog_enter, bpf_func, insnsi and __bpf_prog_exit as literals already takes at least 5 * 12 = 60 bytes, and we can't use relative addressing for most of them. Therefore, lower BPF_MAX_TRAMP_LINKS on s390x. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230129190501.1624747-5-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-29 19:16:28 -08:00
Ilya Leoshkevich	f1d5df84cd	s390/bpf: Implement bpf_arch_text_poke() bpf_arch_text_poke() is used to hotpatch eBPF programs and trampolines. s390x has a very strict hotpatching restriction: the only thing that is allowed to be hotpatched is conditional branch mask. Take the same approach as commit de5012b41e5c ("s390/ftrace: implement hotpatching"): create a conditional jump to a "plt", which loads the target address from memory and jumps to it; then first patch this address, and then the mask. Trampolines (introduced in the next patch) respect the ftrace calling convention: the return address is in %r0, and %r1 is clobbered. With that in mind, bpf_arch_text_poke() does not differentiate between jumps and calls. However, there is a simple optimization for jumps (for the epilogue_ip case): if a jump already points to the destination, then there is no "plt" and we can just flip the mask. For simplicity, the "plt" template is defined in assembly, and its size is used to define C arrays. There doesn't seem to be a way to convey this size to C as a constant, so it's hardcoded and double-checked during runtime. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230129190501.1624747-4-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-29 19:16:28 -08:00
Ilya Leoshkevich	bb4ef8fc3d	s390/bpf: Add expoline to tail calls All the indirect jumps in the eBPF JIT already use expolines, except for the tail call one. Fixes: de5cb6eb514e ("s390: use expoline thunks in the BPF JIT") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230129190501.1624747-3-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-29 19:16:28 -08:00
Ilya Leoshkevich	7ce878ca81	selftests/bpf: Fix sk_assign on s390x sk_assign is failing on an s390x machine running Debian "bookworm" for 2 reasons: legacy server_map definition and uninitialized addrlen in recvfrom() call. Fix by adding a new-style server_map definition and dropping addrlen (recvfrom() allows NULL values for src_addr and addrlen). Since the test should support tc built without libbpf, build the prog twice: with the old-style definition and with the new-style definition, then select the right one at runtime. This could be done at compile time too, but this would not be cross-compilation friendly. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230129190501.1624747-2-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-29 19:16:28 -08:00
Ilya Leoshkevich	07dcbd7325	s390/bpf: Fix a typo in a comment "desription" should be "description". Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-27-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:45:15 -08:00
Ilya Leoshkevich	49f67f393f	bpf: btf: Add BTF_FMODEL_SIGNED_ARG flag s390x eBPF JIT needs to know whether a function return value is signed and which function arguments are signed, in order to generate code compliant with the s390x ABI. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-26-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:45:15 -08:00
Ilya Leoshkevich	0f0e5f5bd5	bpf: iterators: Split iterators.lskel.h into little- and big- endian versions iterators.lskel.h is little-endian, therefore bpf iterator is currently broken on big-endian systems. Introduce a big-endian version and add instructions regarding its generation. Unfortunately bpftool's cross-endianness capabilities are limited to BTF right now, so the procedure requires access to a big-endian machine or a configured emulator. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-25-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:45:15 -08:00
Ilya Leoshkevich	42fae973c2	libbpf: Fix BPF_PROBE_READ{_STR}_INTO() on s390x BPF_PROBE_READ_INTO() and BPF_PROBE_READ_STR_INTO() should map to bpf_probe_read() and bpf_probe_read_str() respectively in order to work correctly on architectures with !ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-24-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:45:14 -08:00
Ilya Leoshkevich	25c76ed428	libbpf: Fix unbounded memory access in bpf_usdt_arg() Loading programs that use bpf_usdt_arg() on s390x fails with: ; if (arg_num >= BPF_USDT_MAX_ARG_CNT \|\| arg_num >= spec->arg_cnt) 128: (79) r1 = (u64 )(r10 -24) ; frame1: R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 129: (25) if r1 > 0xb goto pc+83 ; frame1: R1_w=scalar(umax=11,var_off=(0x0; 0xf)) ... ; arg_spec = &spec->args[arg_num]; 135: (79) r1 = (u64 )(r10 -24) ; frame1: R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 ... ; switch (arg_spec->arg_type) { 139: (61) r1 = (u32 )(r2 +8) R2 unbounded memory access, make sure to bounds check any such access The reason is that, even though the C code enforces that arg_num < BPF_USDT_MAX_ARG_CNT, the verifier cannot propagate this constraint to the arg_spec assignment yet. Help it by forcing r1 back to stack after comparison. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-23-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:45:14 -08:00
Ilya Leoshkevich	e85465e420	libbpf: Simplify barrier_var() Use a single "+r" constraint instead of the separate "=r" and "0". Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-22-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:45:14 -08:00
Ilya Leoshkevich	1b5e385325	selftests/bpf: Fix profiler on s390x Use bpf_probe_read_kernel() and bpf_probe_read_kernel_str() instead of bpf_probe_read() and bpf_probe_read_kernel(). Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-21-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:45:14 -08:00
Ilya Leoshkevich	438a2edf26	selftests/bpf: Fix xdp_synproxy/tc on s390x Use the correct datatype for the values map values; currently the test works by accident, since on little-endian machines it is sometimes acceptable to access u64 as u32. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-20-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:45:14 -08:00
Ilya Leoshkevich	d504270a23	selftests/bpf: Fix vmlinux test on s390x Use a syscall macro to access the nanosleep()'s first argument; currently the code uses gprs[2] instead of orig_gpr2. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-18-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:30:09 -08:00
Ilya Leoshkevich	26e8a01494	selftests/bpf: Fix test_xdp_adjust_tail_grow2 on s390x s390x cache line size is 256 bytes, so skb_shared_info must be aligned on a much larger boundary than for x86. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-17-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:30:09 -08:00
Ilya Leoshkevich	207612eb12	selftests/bpf: Fix test_lsm on s390x Use syscall macros to access the setdomainname() arguments; currently the code uses gprs[2] instead of orig_gpr2 for the first argument. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-16-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:30:09 -08:00
Ilya Leoshkevich	be6b5c10ec	selftests/bpf: Add a sign-extension test for kfuncs s390x ABI requires the caller to zero- or sign-extend the arguments. eBPF already deals with zero-extension (by definition of its ABI), but not with sign-extension. Add a test to cover that potentially problematic area. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-15-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:30:09 -08:00
Ilya Leoshkevich	80a611904e	selftests/bpf: Increase SIZEOF_BPF_LOCAL_STORAGE_ELEM on s390x sizeof(struct bpf_local_storage_elem) is 512 on s390x: struct bpf_local_storage_elem { struct hlist_node map_node; /* 0 16 / struct hlist_node snode; / 16 16 / struct bpf_local_storage local_storage; /* 32 8 / struct callback_head rcu __attribute__((__aligned__(8))); / 40 16 / / XXX 200 bytes hole, try to pack / / --- cacheline 1 boundary (256 bytes) --- / struct bpf_local_storage_data sdata __attribute__((__aligned__(256))); / 256 8 / / size: 512, cachelines: 2, members: 5 / / sum members: 64, holes: 1, sum holes: 200 / / padding: 248 / / forced alignments: 2, forced holes: 1, sum forced holes: 200 */ } __attribute__((__aligned__(256))); As the existing comment suggests, use a larger number in order to be future-proof. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-14-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:30:08 -08:00
Ilya Leoshkevich	2934565f04	selftests/bpf: Check stack_mprotect() return value If stack_mprotect() succeeds, errno is not changed. This can produce misleading error messages, that show stale errno. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-13-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:30:08 -08:00
Ilya Leoshkevich	06cea99e68	selftests/bpf: Fix cgrp_local_storage on s390x Sync the definition of socket_cookie between the eBPF program and the test. Currently the test works by accident, since on little-endian it is sometimes acceptable to access u64 as u32. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-12-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:30:08 -08:00
Ilya Leoshkevich	06c1865b0b	selftests/bpf: Fix xdp_do_redirect on s390x s390x cache line size is 256 bytes, so skb_shared_info must be aligned on a much larger boundary than for x86. This makes the maximum packet size smaller. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-11-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:30:08 -08:00
Ilya Leoshkevich	56e1a50483	selftests/bpf: Fix verify_pkcs7_sig on s390x Use bpf_probe_read_kernel() instead of bpf_probe_read(), which is not defined on all architectures. While at it, improve the error handling: do not hide the verifier log, and check the return values of bpf_probe_read_kernel() and bpf_copy_from_user(). Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-10-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:30:08 -08:00
Ilya Leoshkevich	98e13848cf	selftests/bpf: Fix decap_sanity_ns cleanup decap_sanity prints the following on the 1st run: decap_sanity: sh: 1: Syntax error: Bad fd number and the following on the 2nd run: Cannot create namespace file "/run/netns/decap_sanity_ns": File exists The problem is that the cleanup command has a typo and does nothing. Fix the typo. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-9-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:30:08 -08:00
Ilya Leoshkevich	804acdd251	selftests/bpf: Set errno when urand_spawn() fails The result of urand_spawn() is checked with ASSERT_OK_PTR, which treats NULL as success if errno == 0. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-8-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:30:08 -08:00
Ilya Leoshkevich	31da9be64a	selftests/bpf: Fix kfree_skb on s390x h_proto is big-endian; use htons() in order to make comparison work on both little- and big-endian machines. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-7-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:30:08 -08:00
Ilya Leoshkevich	6eab2370d1	selftests/bpf: Fix symlink creation error When building with O=, the following error occurs: ln: failed to create symbolic link 'no_alu32/bpftool': No such file or directory Adjust the code to account for $(OUTPUT). Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-6-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:30:08 -08:00
Ilya Leoshkevich	b14b01f281	selftests/bpf: Fix liburandom_read.so linker error When building with O=, the following linker error occurs: clang: error: no such file or directory: 'liburandom_read.so' Fix by adding $(OUTPUT) to the linker search path. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-5-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:30:08 -08:00
Ilya Leoshkevich	8fb9fb2f17	selftests/bpf: Query BPF_MAX_TRAMP_LINKS using BTF Do not hard-code the value, since for s390x it will be smaller than for x86. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-4-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:29:41 -08:00
Ilya Leoshkevich	390a07a921	bpf: Change BPF_MAX_TRAMP_LINKS to enum This way it's possible to query its value from testcases using BTF. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-3-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:27:12 -08:00
Ilya Leoshkevich	bf3849755a	bpf: Use ARG_CONST_SIZE_OR_ZERO for 3rd argument of bpf_tcp_raw_gen_syncookie_ipv{4,6}() These functions already check that th_len < sizeof(*th), and propagating the lower bound (th_len > 0) may be challenging in complex code, e.g. as is the case with xdp_synproxy test on s390x [1]. Switch to ARG_CONST_SIZE_OR_ZERO in order to make the verifier accept code where it cannot prove that th_len > 0. [1] https://lore.kernel.org/bpf/CAEf4Bzb3uiSHtUbgVWmkWuJ5Sw1UZd4c_iuS4QXtUkXmTTtXuQ@mail.gmail.com/ Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-2-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:27:12 -08:00
Randy Dunlap	1d3cab43f4	Documentation: bpf: correct spelling Correct spelling problems for Documentation/bpf/ as reported by codespell. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: bpf@vger.kernel.org Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://lore.kernel.org/r/20230128195046.13327-1-rdunlap@infradead.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-28 12:22:20 -08:00

1 2 3 4 5 ...

1155732 Commits