Puranjay Mohan
20a759df3b riscv, bpf: make some atomic operations fully ordered
The BPF atomic operations with the BPF_FETCH modifier, along with
BPF_XCHG and BPF_CMPXCHG, are fully ordered, but the RISC-V JIT
implements all atomic operations except BPF_CMPXCHG with relaxed
ordering.

Section 8.1 of "The RISC-V Instruction Set Manual Volume I:
Unprivileged ISA" [1], titled "Specifying Ordering of Atomic
Instructions", says:

| To provide more efficient support for release consistency [5], each
| atomic instruction has two bits, aq and rl, used to specify additional
| memory ordering constraints as viewed by other RISC-V harts.

and

| If only the aq bit is set, the atomic memory operation is treated as
| an acquire access.
| If only the rl bit is set, the atomic memory operation is treated as a
| release access.
|
| If both the aq and rl bits are set, the atomic memory operation is
| sequentially consistent.

Fix this by setting both the aq and rl bits to 1 for operations with
BPF_FETCH and BPF_XCHG.

[1] https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf
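
For reference, here is a minimal stand-alone sketch of where the aq and
rl bits land in the AMO instruction encoding. The encoder below is only
illustrative (it is not the kernel's JIT helpers), and the amoadd.d
example with registers a0/a1/a2 is arbitrary:

  #include <stdint.h>
  #include <stdio.h>

  /* AMO encoding per the unprivileged ISA:
   * funct5 | aq | rl | rs2 | rs1 | funct3 | rd | opcode(0101111) */
  static uint32_t rv_amo(uint32_t funct5, uint32_t aq, uint32_t rl,
                         uint32_t rs2, uint32_t rs1, uint32_t funct3,
                         uint32_t rd)
  {
          return funct5 << 27 | (aq & 1) << 26 | (rl & 1) << 25 |
                 rs2 << 20 | rs1 << 15 | funct3 << 12 | rd << 7 | 0x2f;
  }

  int main(void)
  {
          /* amoadd.d a0, a1, (a2): relaxed vs. fully ordered (aq=rl=1) */
          printf("relaxed: 0x%08x  aqrl: 0x%08x\n",
                 rv_amo(0x00, 0, 0, 11, 12, 0x3, 10),
                 rv_amo(0x00, 1, 1, 11, 12, 0x3, 10));
          return 0;
  }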

Fixes: dd642ccb45ec ("riscv, bpf: Implement more atomic operations for RV64")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240505201633.123115-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-12 17:00:06 -07:00
Xiao Wang
80c5a07ae6 riscv, bpf: Fix typo in comment
We can use either "instruction" or "insn" in the comment.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Link: https://lore.kernel.org/r/20240507111618.437121-1-xiao.w.wang@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-12 16:56:43 -07:00
Ilya Leoshkevich
68378982f0 s390/bpf: Emit a barrier for BPF_FETCH instructions
BPF_ATOMIC_OP() macro documentation states that "BPF_ADD | BPF_FETCH"
should be the same as atomic_fetch_add(), which is currently not the
case on s390x: the serialization instruction "bcr 14,0" is missing.
This applies to "and", "or" and "xor" variants too.

s390x is allowed to reorder stores with subsequent fetches from
different addresses, so code relying on BPF_FETCH acting as a barrier,
for example:

  stw [%r0], 1
  afadd [%r1], %r2
  ldxw %r3, [%r4]

may be broken. Fix it by emitting "bcr 14,0".

Note that a separate serialization instruction is not needed for
BPF_XCHG and BPF_CMPXCHG, because COMPARE AND SWAP performs
serialization itself.
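
As a user-space C11 illustration of the required semantics (this is not
the JIT code itself): a fully ordered fetch-add also orders the
surrounding plain loads and stores, which is what the extra "bcr 14,0"
restores on s390x:

  #include <stdatomic.h>

  static _Atomic long counter;

  /* BPF_ADD | BPF_FETCH must behave like atomic_fetch_add(): a seq_cst
   * RMW. No earlier store may be reordered past it and no later load may
   * be reordered before it, so a stw/ldxw pair around it stays ordered. */
  long fetch_add_fully_ordered(long v)
  {
          return atomic_fetch_add_explicit(&counter, v, memory_order_seq_cst);
  }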

Fixes: ba3b86b9cef0 ("s390/bpf: Implement new atomic ops")
Reported-by: Puranjay Mohan <puranjay12@gmail.com>
Closes: https://lore.kernel.org/bpf/mb61p34qvq3wf.fsf@kernel.org/
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20240507000557.12048-1-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-12 16:55:57 -07:00
Alexei Starovoitov
55302bc1ca Merge branch 'bpf-inline-helpers-in-arm64-and-riscv-jits'
Puranjay Mohan says:

====================
bpf: Inline helpers in arm64 and riscv JITs

Changes in v5 -> v6:
arm64 v5: https://lore.kernel.org/all/20240430234739.79185-1-puranjay@kernel.org/
riscv v2: https://lore.kernel.org/all/20240430175834.33152-1-puranjay@kernel.org/
- Combine riscv and arm64 changes in single series
- Some coding style fixes

Changes in v4 -> v5:
v4: https://lore.kernel.org/all/20240429131647.50165-1-puranjay@kernel.org/
- Implement the inlining of the bpf_get_smp_processor_id() in the JIT.

NOTE: This needs to be based on:
https://lore.kernel.org/all/20240430175834.33152-1-puranjay@kernel.org/
to be built.

Manual run of bpf-ci with this series rebased on above:
https://github.com/kernel-patches/bpf/pull/6929

Changes in v3 -> v4:
v3: https://lore.kernel.org/all/20240426121349.97651-1-puranjay@kernel.org/
- Fix coding style issue related to C89 standards.

Changes in v2 -> v3:
v2: https://lore.kernel.org/all/20240424173550.16359-1-puranjay@kernel.org/
- Fixed the xlated dump of percpu mov to "r0 = &(void __percpu *)(r0)"
- Made ARM64 and x86-64 use the same code for inlining. The only difference
  that remains is the per-cpu address of the cpu_number.

Changes in v1 -> v2:
v1: https://lore.kernel.org/all/20240405091707.66675-1-puranjay12@gmail.com/
- Add a patch to inline bpf_get_smp_processor_id()
- Fix an issue in MRS instruction encoding as pointed out by Will
- Remove CONFIG_SMP check because arm64 kernel always compiles with CONFIG_SMP

This series adds support for internal-only per-CPU instructions and
inlines the bpf_get_smp_processor_id() helper call for the ARM64 and
RISC-V BPF JITs.

Here is an example of calls to bpf_get_smp_processor_id() and
percpu_array_map_lookup_elem() before and after this series on ARM64.

                                         BPF
                                        =====
              BEFORE                                       AFTER
             --------                                     -------

int cpu = bpf_get_smp_processor_id();           int cpu = bpf_get_smp_processor_id();
(85) call bpf_get_smp_processor_id#229032       (85) call bpf_get_smp_processor_id#8

p = bpf_map_lookup_elem(map, &zero);            p = bpf_map_lookup_elem(map, &zero);
(18) r1 = map[id:78]                            (18) r1 = map[id:153]
(18) r2 = map[id:82][0]+65536                   (18) r2 = map[id:157][0]+65536
(85) call percpu_array_map_lookup_elem#313512   (07) r1 += 496
                                                (61) r0 = *(u32 *)(r2 +0)
                                                (35) if r0 >= 0x1 goto pc+5
                                                (67) r0 <<= 3
                                                (0f) r0 += r1
                                                (79) r0 = *(u64 *)(r0 +0)
                                                (bf) r0 = &(void __percpu *)(r0)
                                                (05) goto pc+1
                                                (b7) r0 = 0

                                      ARM64 JIT
                                     ===========

              BEFORE                                       AFTER
             --------                                     -------

int cpu = bpf_get_smp_processor_id();           int cpu = bpf_get_smp_processor_id();
mov     x10, #0xfffffffffffff4d0                mrs     x10, sp_el0
movk    x10, #0x802b, lsl #16                   ldr     w7, [x10, #24]
movk    x10, #0x8000, lsl #32
blr     x10
add     x7, x0, #0x0

p = bpf_map_lookup_elem(map, &zero);            p = bpf_map_lookup_elem(map, &zero);
mov     x0, #0xffff0003ffffffff                 mov     x0, #0xffff0003ffffffff
movk    x0, #0xce5c, lsl #16                    movk    x0, #0xe0f3, lsl #16
movk    x0, #0xca00                             movk    x0, #0x7c00
mov     x1, #0xffff8000ffffffff                 mov     x1, #0xffff8000ffffffff
movk    x1, #0x8bdb, lsl #16                    movk    x1, #0xb0c7, lsl #16
movk    x1, #0x6000                             movk    x1, #0xe000
mov     x10, #0xffffffffffff3ed0                add     x0, x0, #0x1f0
movk    x10, #0x802d, lsl #16                   ldr     w7, [x1]
movk    x10, #0x8000, lsl #32                   cmp     x7, #0x1
blr     x10                                     b.cs    0x0000000000000090
add     x7, x0, #0x0                            lsl     x7, x7, #3
                                                add     x7, x7, x0
                                                ldr     x7, [x7]
                                                mrs     x10, tpidr_el1
                                                add     x7, x7, x10
                                                b       0x0000000000000094
                                                mov     x7, #0x0

              Performance improvement found using benchmark[1]

./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc

  +---------------+-------------------+-------------------+--------------+
  |      Name     |      Before       |        After      |   % change   |
  |---------------+-------------------+-------------------+--------------|
  | glob-arr-inc  | 23.380 ± 1.675M/s | 25.893 ± 0.026M/s |   + 10.74%   |
  | arr-inc       | 23.928 ± 0.034M/s | 25.213 ± 0.063M/s |   + 5.37%    |
  | hash-inc      | 12.352 ± 0.005M/s | 12.609 ± 0.013M/s |   + 2.08%    |
  +---------------+-------------------+-------------------+--------------+

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

             RISCV64 JIT output for `call bpf_get_smp_processor_id`
            =======================================================

                  Before                           After
                 --------                         -------

           auipc   t1,0x848c                  ld    a5,32(tp)
           jalr    604(t1)
           mv      a5,a0

  Benchmark using [1] on Qemu.

  ./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc

  +---------------+------------------+------------------+--------------+
  |      Name     |     Before       |       After      |   % change   |
  |---------------+------------------+------------------+--------------|
  | glob-arr-inc  | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s |   + 24.04%   |
  | arr-inc       | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s |   + 23.56%   |
  | hash-inc      | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s |   + 32.18%   |
  +---------------+------------------+------------------+--------------+
====================

Link: https://lore.kernel.org/r/20240502151854.9810-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-12 16:54:35 -07:00
Puranjay Mohan
75fe4c0b3e bpf, arm64: inline bpf_get_smp_processor_id() helper
Inline calls to bpf_get_smp_processor_id() helper in the JIT by emitting
a read from struct thread_info. The SP_EL0 system register holds the
pointer to the task_struct and thread_info is the first member of this
struct. We can read the cpu number from the thread_info.

Here is how the ARM64 JITed assembly changes after this commit:

                                      ARM64 JIT
                                     ===========

              BEFORE                                    AFTER
             --------                                  -------

int cpu = bpf_get_smp_processor_id();        int cpu = bpf_get_smp_processor_id();

mov     x10, #0xfffffffffffff4d0             mrs     x10, sp_el0
movk    x10, #0x802b, lsl #16                ldr     w7, [x10, #24]
movk    x10, #0x8000, lsl #32
blr     x10
add     x7, x0, #0x0
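
A rough C rendering of what those two instructions compute (the struct
below is a stand-in; the #24 offset in the disassembly corresponds to
offsetof(struct thread_info, cpu) for this particular configuration):

  /* SP_EL0 points at the current task_struct, whose first member is
   * struct thread_info, so the cpu field sits at a fixed offset. */
  struct thread_info_sketch {
          unsigned long flags;
          /* ... other fields; the exact layout is config-dependent ... */
          unsigned int cpu;
  };

  static inline unsigned int read_smp_processor_id(const void *sp_el0)
  {
          const struct thread_info_sketch *ti = sp_el0; /* mrs x10, sp_el0 */
          return ti->cpu;                         /* ldr w7, [x10, #cpu_off] */
  }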

               Performance improvement using benchmark[1]

./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc

+---------------+-------------------+-------------------+--------------+
|      Name     |      Before       |        After      |   % change   |
|---------------+-------------------+-------------------+--------------|
| glob-arr-inc  | 23.380 ± 1.675M/s | 25.893 ± 0.026M/s |   + 10.74%   |
| arr-inc       | 23.928 ± 0.034M/s | 25.213 ± 0.063M/s |   + 5.37%    |
| hash-inc      | 12.352 ± 0.005M/s | 12.609 ± 0.013M/s |   + 2.08%    |
+---------------+-------------------+-------------------+--------------+

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-5-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-12 16:54:34 -07:00
Puranjay Mohan
7a4c32222b arm64, bpf: add internal-only MOV instruction to resolve per-CPU addrs
Support an instruction for resolving absolute addresses of per-CPU
data from their per-CPU offsets. This instruction is internal-only and
users are not allowed to use it directly. It will only be used for
internal inlining optimizations, for now between the BPF verifier and
BPF JITs.

Since commit 7158627686f0 ("arm64: percpu: implement optimised pcpu
access using tpidr_el1"), the per-cpu offset for the CPU is stored in
the tpidr_el1/2 register of that CPU.

To support this BPF instruction in the ARM64 JIT, the following ARM64
instructions are emitted:

mov dst, src		// Move src to dst, if src != dst
mrs tmp, tpidr_el1/2	// Move per-cpu offset of the current cpu in tmp.
add dst, dst, tmp	// Add the per cpu offset to the dst.

To measure the performance improvement provided by this change, the
benchmark in [1] was used:

Before:
glob-arr-inc   :   23.597 ± 0.012M/s
arr-inc        :   23.173 ± 0.019M/s
hash-inc       :   12.186 ± 0.028M/s

After:
glob-arr-inc   :   23.819 ± 0.034M/s
arr-inc        :   23.285 ± 0.017M/s
hash-inc       :   12.419 ± 0.011M/s

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-4-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-12 16:54:34 -07:00
Puranjay Mohan
2ddec2c80b riscv, bpf: inline bpf_get_smp_processor_id()
Inline the calls to bpf_get_smp_processor_id() in the riscv bpf jit.

RISCV saves the pointer to the CPU's task_struct in the TP (thread
pointer) register. This makes it trivial to get the CPU's processor id.
As thread_info is the first member of task_struct, we can read the
processor id from TP + offsetof(struct thread_info, cpu).

          RISCV64 JIT output for `call bpf_get_smp_processor_id`
	  ======================================================

                Before                           After
               --------                         -------

         auipc   t1,0x848c                  ld    a5,32(tp)
         jalr    604(t1)
         mv      a5,a0

Benchmark using [1] on Qemu.

./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc

+---------------+------------------+------------------+--------------+
|      Name     |     Before       |       After      |   % change   |
|---------------+------------------+------------------+--------------|
| glob-arr-inc  | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s |   + 24.04%   |
| arr-inc       | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s |   + 23.56%   |
| hash-inc      | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s |   + 32.18%   |
+---------------+------------------+------------------+--------------+

NOTE: This benchmark includes changes from this patch and the previous
      patch that implemented the per-cpu insn.

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-12 16:54:34 -07:00
Puranjay Mohan
19c56d4e5b riscv, bpf: add internal-only MOV instruction to resolve per-CPU addrs
Support an instruction for resolving absolute addresses of per-CPU
data from their per-CPU offsets. This instruction is internal-only and
users are not allowed to use it directly. It will only be used for
internal inlining optimizations, for now between the BPF verifier and
BPF JITs.

RISC-V uses the generic per-CPU implementation, where the offsets for
the CPUs are kept in an array called __per_cpu_offset[cpu_number].
RISC-V stores the address of the task_struct in the TP register. The
first member of task_struct is struct thread_info, and we can get the
cpu number by reading from the TP register + offsetof(struct
thread_info, cpu).

Once we have the cpu number in a register, we read the offset for that
cpu from the address &__per_cpu_offset + (cpu_number << 3). Then we add
this offset to the destination register.
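
A user-space sketch of that address arithmetic (the array and function
names below are stand-ins, not the kernel symbols emitted by the JIT):

  #include <stdint.h>

  extern uintptr_t per_cpu_offset_table[];  /* stand-in for __per_cpu_offset[] */

  static inline uintptr_t percpu_to_absolute(uintptr_t percpu_addr,
                                             unsigned int cpu)
  {
          /* offset = *(&table + (cpu << 3)); entries are 8 bytes on RV64,
           * so this is the same as indexing table[cpu] */
          uintptr_t off = *(uintptr_t *)((uintptr_t)per_cpu_offset_table +
                                         ((uintptr_t)cpu << 3));
          return percpu_addr + off;  /* add the offset to the destination */
  }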

To measure the improvement from this change, the benchmark in [1] was
used on Qemu:

Before:
glob-arr-inc   :    1.127 ± 0.013M/s
arr-inc        :    1.121 ± 0.004M/s
hash-inc       :    0.681 ± 0.052M/s

After:
glob-arr-inc   :    1.138 ± 0.011M/s
arr-inc        :    1.366 ± 0.006M/s
hash-inc       :    0.676 ± 0.001M/s

[1] https://github.com/anakryiko/linux/commit/8dec900975ef

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-12 16:54:34 -07:00
Shahab Vahedi
f122668ddc ARC: Add eBPF JIT support
This will add eBPF JIT support to the 32-bit ARCv2 processors. The
implementation is qualified by running the BPF tests on a Synopsys HSDK
board with "ARC HS38 v2.1c at 500 MHz" as the 4-core CPU.

The test_bpf.ko module reports 2- to 10-fold improvements in the
execution time of its tests. For instance:

test_bpf: #33 tcpdump port 22 jited:0 704 1766 2104 PASS
test_bpf: #33 tcpdump port 22 jited:1 120  224  260 PASS

test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:0 238 PASS
test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:1  23 PASS

test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:0 2034681 PASS
test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:1 1020022 PASS

Deployment and structure
------------------------
The related code is added under "arch/arc/net":

- bpf_jit.h       -- The interface that a back-end translator must provide
- bpf_jit_core.c  -- Knows how to handle the input eBPF byte stream
- bpf_jit_arcv2.c -- The back-end code that knows the translation logic

bpf_int_jit_compile() at the end of bpf_jit_core.c is the entry point
to the whole process. Normally, the translation is done in one pass,
namely the "normal pass". In case some relocations are not known during
this pass, some data (arc_jit_data) is allocated for the next pass.
This possible next (and last) pass is called the "extra pass".

1. Normal pass       # The necessary pass
     1a. Dry run       # Get the whole JIT length, epilogue offset, etc.
     1b. Emit phase    # Allocate memory and start emitting instructions
2. Extra pass        # Only needed if there are relocations to be fixed
     2a. Patch relocations

Support status
--------------
The JIT compiler supports BPF instructions up to "cpu=v4". However, it
does not yet provide support for:

- Tail calls
- Atomic operations
- 64-bit division/remainder
- BPF_PROBE_MEM* (exception table)

The result of "test_bpf" test suite on an HSDK board is:

hsdk-lnx# insmod test_bpf.ko test_suite=test_bpf

  test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]

All the failing test cases are the ones that were not JIT'ed. By
category, they can be represented as:

  .-----------.------------.-------------.
  | test type |   opcodes  | # of cases  |
  |-----------+------------+-------------|
  | atomic    | 0xC3, 0xDB |         149 |
  | div64     | 0x37, 0x3F |          22 |
  | mod64     | 0x97, 0x9F |          15 |
  `-----------^------------+-------------|
                           | (total) 186 |
                           `-------------'

Setup: build config
-------------------
The following configs must be set to have a working JIT test:

  CONFIG_BPF_JIT=y
  CONFIG_BPF_JIT_ALWAYS_ON=y
  CONFIG_TEST_BPF=m

The following options are not necessary for the tests module,
but are good to have:

  CONFIG_DEBUG_INFO=y             # prerequisite for below
  CONFIG_DEBUG_INFO_BTF=y         # so bpftool can generate vmlinux.h

  CONFIG_FTRACE=y                 #
  CONFIG_BPF_SYSCALL=y            # all these options lead to
  CONFIG_KPROBE_EVENTS=y          # having CONFIG_BPF_EVENTS=y
  CONFIG_PERF_EVENTS=y            #

Some BPF programs provide data through /sys/kernel/debug:
  CONFIG_DEBUG_FS=y
arc# mount -t debugfs debugfs /sys/kernel/debug

Setup: elfutils
---------------
The libdw.{so,a} library that is used by pahole for processing
the final binary must come from elfutils 0.189 or newer. Support
for ARCv2 [1] has been added in that version.

[1]
https://sourceware.org/git/?p=elfutils.git;a=commit;h=de3d46b3e7

Setup: pahole
-------------
The line below in linux/scripts/Makefile.btf must be commented out:

pahole-flags-$(call test-ge, $(pahole-ver), 121) += --btf_gen_floats

Otherwise, the build will fail:

$ make V=1
  ...
  BTF     .btf.vmlinux.bin.o
pahole -J --btf_gen_floats                    \
       -j --lang_exclude=rust                 \
       --skip_encoding_btf_inconsistent_proto \
       --btf_gen_optimized .tmp_vmlinux.btf
Complex, interval and imaginary float types are not supported
Encountered error while encoding BTF.
  ...
  BTFIDS  vmlinux
./tools/bpf/resolve_btfids/resolve_btfids vmlinux
libbpf: failed to find '.BTF' ELF section in vmlinux
FAILED: load BTF from vmlinux: No data available

This is because the ARC toolchains generate "complex float" DIE
entries in libgcc and, at the moment, pahole can't handle such
entries.

Running the tests
-----------------
host$ scp /bld/linux/lib/test_bpf.ko arc:
arc # sysctl net.core.bpf_jit_enable=1
arc # insmod test_bpf.ko test_suite=test_bpf
      ...
      test_bpf: #1048 Staggered jumps: JMP32_JSLE_X jited:1 697811 PASS
      test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed]

Acknowledgments
---------------
- Claudiu Zissulescu for his unwavering support
- Yuriy Kolerov for testing and troubleshooting
- Vladimir Isaev for the pahole workaround
- Sergey Matyukevich for paving the road by adding the interpreter support

Signed-off-by: Shahab Vahedi <shahab@synopsys.com>
Link: https://lore.kernel.org/r/20240430145604.38592-1-list+bpf@vahedi.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-12 16:51:36 -07:00
Jakub Kicinski
cddd2dc639 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-05-08 (most Intel drivers)

This series contains updates to i40e, iavf, ice, igb, igc, e1000e, and ixgbe
drivers.

Asbjørn Sloth Tønnesen adds checks against supported flower control flags
for i40e, iavf, ice, and igb drivers.

Michal corrects which filters are removed during eswitch release for ice.

Corinna Vinschen defers PTP initialization to later in probe so that
the netdev log entry is initialized on igc.

Ilpo Järvinen removes a couple of unused, duplicate defines on
e1000e and ixgbe.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  net: e1000e & ixgbe: Remove PCI_HEADER_TYPE_MFD duplicates
  igc: fix a log entry using uninitialized netdev
  ice: remove correct filters during eswitch release
  igb: flower: validate control flags
  ice: flower: validate control flags
  iavf: flower: validate control flags
  i40e: flower: validate control flags
====================

Link: https://lore.kernel.org/r/20240508173342.2760994-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:33:52 -07:00
Jakub Kicinski
24e28b60b0 Merge branch 'net-qede-convert-filter-code-to-use-extack'
Asbjørn Sloth Tønnesen says:

====================
net: qede: convert filter code to use extack

This series converts the filter code in the qede driver
to use NL_SET_ERR_MSG_*(extack, ...) for error handling.

Patches 1-12 convert qede_parse_flow_attr() to use extack,
along with all of its static helper functions.

qede_parse_flow_attr() is used in two places:
- qede_add_tc_flower_fltr()
- qede_flow_spec_to_rule()

At the latter call site, extack is faked in the same way as
is done in mlxsw (patch 12).

While the conversion is going on, some error messages are silenced
between patches 1-12. If wanted, I could squash patches 1-12 in a v3,
but I felt that it would be easier to review them as 12 more trivial
patches.

Patches 13 and 14 finish up by converting qede_parse_actions() and
ensuring that extack is propagated to it in both call contexts.

v1: https://lore.kernel.org/netdev/20240507104421.1628139-1-ast@fiberby.net/
====================

Link: https://lore.kernel.org/r/20240508143404.95901-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:30:02 -07:00
Asbjørn Sloth Tønnesen
841548793b net: qede: use extack in qede_parse_actions()
Convert DP_NOTICE/DP_INFO to NL_SET_ERR_MSG_MOD.

Keep edev around for use with QEDE_RSS_COUNT().

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-15-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:30:00 -07:00
Asbjørn Sloth Tønnesen
d2a437efd0 net: qede: propagate extack through qede_flow_spec_validate()
Pass extack to qede_flow_spec_validate() when called in
qede_flow_spec_to_rule().

Pass extack to qede_parse_actions().

Not converting qede_flow_spec_validate() to use extack for
errors, as it's only called from qede_flow_spec_to_rule(),
where extack is faked into a DP_NOTICE anyway, so opting to
keep DP_VERBOSE/DP_NOTICE usage.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-14-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:30:00 -07:00
Asbjørn Sloth Tønnesen
eb705d7345 net: qede: use faked extack in qede_flow_spec_to_rule()
Since qede_parse_flow_attr() now does error reporting
through extack, give it a fake extack and extract the
error message afterwards if one was set.

The extracted error message is then passed on through
DP_NOTICE(), including messages that were earlier issued
with DP_INFO().

This fake extack approach is already used by
mlxsw_env_linecard_modules_power_mode_apply() in
drivers/net/ethernet/mellanox/mlxsw/core_env.c

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-13-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:30:00 -07:00
Asbjørn Sloth Tønnesen
d6883bceb2 net: qede: use extack in qede_parse_flow_attr()
Convert qede_parse_flow_attr() to take extack,
and drop the edev argument.

Convert DP_NOTICE calls to use NL_SET_ERR_MSG_* instead.

Pass extack in calls to qede_flow_parse_{tcp,udp}_v{4,6}().

In calls to qede_parse_flow_attr(), if extack is
unavailable, then use NULL for now, until a
subsequent patch makes extack available.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-12-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:30:00 -07:00
Asbjørn Sloth Tønnesen
f833a6555e net: qede: add extack in qede_add_tc_flower_fltr()
Define extack locally, to reduce line lengths and aid future users.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-11-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:30:00 -07:00
Asbjørn Sloth Tønnesen
9c8f5ed884 net: qede: use extack in qede_flow_parse_udp_v4()
Convert qede_flow_parse_udp_v4() to take extack,
and drop the edev argument.

Pass extack in call to qede_flow_parse_v4_common().

In call to qede_flow_parse_udp_v4(), use NULL as extack
for now, until a subsequent patch makes extack available.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-10-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:30:00 -07:00
Asbjørn Sloth Tønnesen
b73ad5c7a7 net: qede: use extack in qede_flow_parse_udp_v6()
Convert qede_flow_parse_udp_v6() to take extack,
and drop the edev argument.

Pass extack in call to qede_flow_parse_v6_common().

In call to qede_flow_parse_udp_v6(), use NULL as extack
for now, until a subsequent patch makes extack available.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-9-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:30:00 -07:00
Asbjørn Sloth Tønnesen
f84d52776c net: qede: use extack in qede_flow_parse_tcp_v4()
Convert qede_flow_parse_tcp_v4() to take extack,
and drop the edev argument.

Pass extack in call to qede_flow_parse_v4_common().

In call to qede_flow_parse_tcp_v4(), use NULL as extack
for now, until a subsequent patch makes extack available.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-8-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:29:59 -07:00
Asbjørn Sloth Tønnesen
b1a18d5781 net: qede: use extack in qede_flow_parse_tcp_v6()
Convert qede_flow_parse_tcp_v6() to take extack,
and drop the edev argument.

Pass extack in call to qede_flow_parse_v6_common().

In call to qede_flow_parse_tcp_v6(), use NULL as extack
for now, until a subsequent patch makes extack available.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-7-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:29:59 -07:00
Asbjørn Sloth Tønnesen
f2f993835b net: qede: use extack in qede_flow_parse_v4_common()
Convert qede_flow_parse_v4_common() to take extack,
and drop the edev argument.

Convert DP_NOTICE call to use NL_SET_ERR_MSG_MOD instead.

Pass extack in calls to qede_flow_parse_ports() and
qede_set_v4_tuple_to_profile().

In calls to qede_flow_parse_v4_common(), use NULL as extack
for now, until a subsequent patch makes extack available.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-6-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:29:59 -07:00
Asbjørn Sloth Tønnesen
a62944d11a net: qede: use extack in qede_flow_parse_v6_common()
Convert qede_flow_parse_v6_common() to take extack,
and drop the edev argument.

Convert DP_NOTICE call to use NL_SET_ERR_MSG_MOD instead.

Pass extack in calls to qede_flow_parse_ports() and
qede_set_v6_tuple_to_profile().

In calls to qede_flow_parse_v6_common(), use NULL as extack
for now, until a subsequent patch makes extack available.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-5-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:29:59 -07:00
Asbjørn Sloth Tønnesen
f63a9dc507 net: qede: use extack in qede_set_v4_tuple_to_profile()
Convert qede_set_v4_tuple_to_profile() to take extack,
and drop the edev argument.

Convert DP_INFO call to use NL_SET_ERR_MSG_MOD instead.

In calls to qede_set_v4_tuple_to_profile(), use NULL as extack
for now, until a subsequent patch makes extack available.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-4-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:29:59 -07:00
Asbjørn Sloth Tønnesen
6f88f1257a net: qede: use extack in qede_set_v6_tuple_to_profile()
Convert qede_set_v6_tuple_to_profile() to take extack,
and drop the edev argument.

Convert DP_INFO call to use NL_SET_ERR_MSG_MOD instead.

In calls to qede_set_v6_tuple_to_profile(), use NULL as extack
for now, until a subsequent patch makes extack available.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-3-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:29:59 -07:00
Asbjørn Sloth Tønnesen
a7c9540e96 net: qede: use extack in qede_flow_parse_ports()
Convert qede_flow_parse_ports() to use extack,
and drop the edev argument.

Convert DP_NOTICE call to use NL_SET_ERR_MSG_MOD instead.

In calls to qede_flow_parse_ports(), use NULL as extack
for now, until a subsequent patch makes extack available.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508143404.95901-2-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:29:59 -07:00
Eric Dumazet
d50729f1d6 net: usb: smsc95xx: stop lying about skb->truesize
Some USB drivers try to set a small skb->truesize and break the
core networking stacks.

In this patch, I removed one of the skb->truesize overrides.

I also replaced one skb_clone() with an allocation of a fresh
and small skb, to get minimally sized skbs, like we did
in commit 1e2c61172342 ("net: cdc_ncm: reduce skb truesize
in rx path") and 4ce62d5b2f7a ("net: usb: ax88179_178a:
stop lying about skb->truesize").

v3: also fix a sparse error ( https://lore.kernel.org/oe-kbuild-all/202405091310.KvncIecx-lkp@intel.com/ )
v2: leave the skb_trim() game because smsc95xx_rx_csum_offload()
    needs the csum part. (Jakub)
    While we are at it, use get_unaligned() in smsc95xx_rx_csum_offload().

Fixes: 2f7ca802bdae ("net: Add SMSC LAN9500 USB2.0 10/100 ethernet adapter driver")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steve Glendinning <steve.glendinning@shawell.net>
Cc: UNGLinuxDriver@microchip.com
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240509083313.2113832-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:22:53 -07:00
Colin Ian King
089507a679 net: dsa: microchip: Fix spelling mistake "configur" -> "configure"
There is a spelling mistake in a dev_err message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240509065023.3033397-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 19:21:22 -07:00
Kuniyuki Iwashima
7172dc93d6 af_unix: Add dead flag to struct scm_fp_list.
Commit 1af2dface5d2 ("af_unix: Don't access successor in unix_del_edges()
during GC.") fixed a use-after-free by avoiding access to edge->successor
while GC is in progress.

However, there could be a small race window where another process could
call unix_del_edges() while gc_in_progress is true and __skb_queue_purge()
is on the way.

So, we need another marker for struct scm_fp_list which indicates if the
skb is garbage-collected.

This patch adds a dead flag to struct scm_fp_list and sets it to true
before calling __skb_queue_purge().

Fixes: 1af2dface5d2 ("af_unix: Don't access successor in unix_del_edges() during GC.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240508171150.50601-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 18:52:45 -07:00
Andy Shevchenko
84c8b7ad5e net: ethernet: adi: adin1110: Replace linux/gpio.h by proper one
linux/gpio.h is deprecated and subject to removal.
The driver doesn't use it directly; replace it
with what is really being used.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508114519.972082-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 18:51:34 -07:00
Hariprasad Kelam
04fb71cc5f octeontx2-pf: Reuse Transmit queue/Send queue index of HTB class
The real number of transmit queues is incremented when the user enables
an HTB class, and vice versa. Depending on the SKB priority, the driver
returns a transmit queue (Txq). Transmit queues and send queues are
mapped one-to-one.

In a few scenarios, the driver returns a transmit queue value that is
greater than the real number of transmit queues; the stack detects this
as an error and overwrites the transmit queue value.

For example, the user has added two classes, and the real number of
queues is incremented accordingly:
- tc class add dev eth1 parent 1: classid 1:1 htb
      rate 100Mbit ceil 100Mbit prio 1 quantum 1024
- tc class add dev eth1 parent 1: classid 1:2 htb
      rate 100Mbit ceil 200Mbit prio 7 quantum 1024

Now, if the user deletes the class with id 1:1, the driver decrements
the real number of queues:
- tc class del dev eth1 classid 1:1

But for the class with id 1:2, the driver returns a transmit queue
value that is higher than the real number of transmit queues, leading
to the error below:

eth1 selects TX queue x, but real number of TX queues is x

This patch solves the problem by assigning the deleted class's transmit
queue/send queue to an active class.

Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508070935.11501-1-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 18:49:15 -07:00
Jakub Kicinski
9c1bbc7ea1 Merge branch 'gve-minor-cleanups'
Simon Horman says:

====================
gve: Minor cleanups

This short patchset provides two minor cleanups for the gve driver.

These were found by tooling as mentioned in each patch,
and otherwise by inspection.

No change in run time behaviour is intended.
Each patch is compile tested only.

v1: https://lore.kernel.org/r/20240503-gve-comma-v1-0-b50f965694ef@kernel.org
====================

Link: https://lore.kernel.org/r/20240508-gve-comma-v2-0-1ac919225f13@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 18:40:52 -07:00
Simon Horman
ba8bcb012b gve: Use ethtool_sprintf/puts() to fill stats strings
Make use of standard helpers to simplify filling in stats strings.

The first two ethtool_puts() changes address the following fortification
warnings flagged by W=1 builds with clang-18. (The last ethtool_puts
change does not because the warning relates to writing beyond the first
element of an array, and gve_gstrings_priv_flags only has one element.)

.../fortify-string.h:562:4: warning: call to '__read_overflow2_field' declared with 'warning' attribute: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
  562 |                         __read_overflow2_field(q_size_field, size);
      |                         ^
.../fortify-string.h:562:4: warning: call to '__read_overflow2_field' declared with 'warning' attribute: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Wattribute-warning]

Likewise, the same changes resolve the same problems flagged by Smatch.

.../gve_ethtool.c:100 gve_get_strings() error: __builtin_memcpy() '*gve_gstrings_main_stats' too small (32 vs 576)
.../gve_ethtool.c:120 gve_get_strings() error: __builtin_memcpy() '*gve_gstrings_adminq_stats' too small (32 vs 512)

Compile tested only.

Reviewed-by: Shailend Chand <shailend@google.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Simon Horman <horms@kernel.org>
Acked-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20240508-gve-comma-v2-2-1ac919225f13@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 18:40:49 -07:00
Simon Horman
ebb8308eac gve: Avoid unnecessary use of comma operator
Although it does not seem to have any untoward side effects, the use
of ';' to separate the two assignments seems more appropriate than ','.

Flagged by clang-18 -Wcomma

No functional change intended.
Compile tested only.

Reviewed-by: Shailend Chand <shailend@google.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240508-gve-comma-v2-1-1ac919225f13@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 18:40:49 -07:00
Jakub Kicinski
b9d5f5711d selftests: net: increase the delay for relative cmsg_time.sh test
Slow machines can delay scheduling of the packets for milliseconds.
Increase the delay to 8ms if KSFT_MACHINE_SLOW. Try to limit the
variability by moving setsockopts earlier (before we read time).

This fixes the "TXTIME rel" failures on debug kernels, like:

  Case ICMPv4  - TXTIME rel returned '', expected 'OK'

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240510005705.43069-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 18:22:10 -07:00
Jakub Kicinski
2d3b8dfd82 selftests: net: fix timestamp not arriving in cmsg_time.sh
On slow machines the SND timestamp sometimes doesn't arrive before
we quit. The test only waits as long as the packet delay, so it's
easy for a race condition to happen.

Double the wait but do a bit of polling, once the SND timestamp
arrives there's no point to wait any longer.

This fixes the "TXTIME abs" failures on debug kernels, like:

   Case ICMPv4  - TXTIME abs returned '', expected 'OK'

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240510005705.43069-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 18:22:00 -07:00
Daniel Jurgens
b49bd37f0b virtio_net: Fix memory leak in virtnet_rx_mod_work
The pointer declaration was missing the __free(kfree) annotation.
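
For context, here is a user-space sketch of the scoped-cleanup pattern
that __free(kfree) from linux/cleanup.h provides; the macro and helper
below are stand-ins, not the kernel API:

  #include <stdlib.h>

  static void free_ptr(void **p) { free(*p); }
  #define cleanup_free __attribute__((cleanup(free_ptr)))

  int demo(void)
  {
          void *buf cleanup_free = malloc(64); /* freed on every return path */

          if (!buf)
                  return -1;
          /* ... use buf; no explicit free() is needed before returning ... */
          return 0;
  }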

Fixes: ff7c7d9f5261 ("virtio_net: Remove command data from control_buf")
Reported-by: Jens Axboe <axboe@kernel.dk>
Closes: https://lore.kernel.org/netdev/0674ca1b-020f-4f93-94d0-104964566e3f@kernel.dk/
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20240509183634.143273-1-danielj@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 18:20:07 -07:00
Vadim Fedorenko
38155539a1 bnxt_en: silence clang build warning
A clang build produces the following warning:

    ../drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c:133:12: warning:
    comparison of distinct pointer types ('typeof (tmo_us) *' (aka 'unsigned
    int *') and 'typeof (65535) *' (aka 'int *'))
    [-Wcompare-distinct-pointer-types]
      133 |                 tmo_us = min(tmo_us, BNXT_PTP_QTS_MAX_TMO_US);
          |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix it by specifying a proper type for BNXT_PTP_QTS_MAX_TMO_US.
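
A stand-alone illustration of this class of warning (the macro below
mimics the type check; it is not the kernel's minmax.h, and the constant
name is reused only for readability):

  /* a min() that, like the kernel's, complains when the argument types differ */
  #define checked_min(x, y) ({                                            \
          __typeof__(x) _x = (x);                                         \
          __typeof__(y) _y = (y);                                         \
          (void)(&_x == &_y); /* distinct pointer types if x, y differ */ \
          _x < _y ? _x : _y; })

  #define BNXT_PTP_QTS_MAX_TMO_US 65535U /* typed to match the unsigned variable */

  unsigned int clamp_tmo(unsigned int tmo_us)
  {
          return checked_min(tmo_us, BNXT_PTP_QTS_MAX_TMO_US);
  }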

Fixes: 7de3c2218eed ("bnxt_en: Add a timeout parameter to bnxt_hwrm_port_ts_query()")
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240509151833.12579-1-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10 18:16:35 -07:00
David S. Miller
f8beae078c gtp pull request 24-05-07
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEFEKqOX9jqZmzwkCc1GSBvS7ZkBkFAmY5ajUACgkQ1GSBvS7Z
 kBmR5w//U8Cr5QqBzvczVZNmEtgNm4Zsy9DR2QnV7Wzp1hH/4DL7oDzJX319sw0i
 jNUu5QnGhbq5T3hBF/GIbugSQSVhTeS2oRk/Qxc/k8X7cNchlk+iB9U1YF8Lppj8
 Y7pB6pT20SFO/8nhQguYQC8F+W0TKnpwSNcykHHIF1THe1MO8PfPbkdcvty28Mvs
 xB/IiYee5nm88Chx/+PAuJWFSaUK/AmS5jlcBRKpffzPu/HtQ6cLtC8eRrZlYG0v
 plVkExIlsRYhCudBp8ihhLQx2Nc3PpbRPDnE8AKyQCOILXM29/Lk85U4583Lbg9t
 bMQuVEm1lw0HPH2iDMRwkkdsl09xPZX7XpnAv5zhEXnl9kDaEwLjV3eXaxdMUmK0
 wHd05OMnPiOeyTyBvAjVdjNythfgg8fdy40K8E7G2BXw0G9Yv+/klyT2LM8bhmYp
 ec2tI9FyxzCtY/Dz89Of2xQCrFLSXCHVjQAwmAYheSw247JfKu2m3fSnC5ABd/cm
 B70VBkXESYk78uzK6JtpRCqA3CgppeB7tryOJ60Ac0GczeZUSUirbSJxH5laMF71
 aPnhw+PRep+GWaX0ym2dmaIiK6NnkbEHk5B4oHwJFoJzwqJ2ezUAVh0oQssSxpwJ
 zU4i8CtFegVWNsyly7N8js96a6T/Sn5Sny0NTCt+nJpbCwLaaD0=
 =aPYm
 -----END PGP SIGNATURE-----

Merge tag 'gtp-24-05-07' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/gtp
Pablo Neira Ayuso says:

====================
gtp pull request 24-05-07

This v3 includes:
- fix for clang uninitialized variable per Jakub.
- address Smatch and Coccinelle reports per Simon
- remove inline in new IPv6 support per Simon
- fix memleaks in netlink control plane per Simon
-o-

The following patchset contains IPv6 GTP driver support for net-next;
this also includes IPv6 over IPv4 and vice versa:

Patch #1 removes an unnecessary stack variable initialization in the
         socket routine.

Patch #2 deals with GTP extension headers. This variable-length
         extension header is parsed to decapsulate packets accordingly.
         Otherwise, packets are dropped when these extension headers are
         present, which breaks interoperation with other non-Linux-based
         GTP implementations.

Patch #3 prepares for IPv6 support by moving IPv4 specific fields in PDP
         context objects to a union.

Patch #4 adds IPv6 support while retaining backward compatibility.
         Three new attributes allow declaring an IPv6 GTP tunnel:
         GTPA_FAMILY, GTPA_PEER_ADDR6 and GTPA_MS_ADDR6, as well as
         IFLA_GTP_LOCAL6 to declare the IPv6 GTP UDP socket. Up to this
         patch, only IPv6-over-IPv6 is supported.

Patch #5 uses an IPv6 /64 address prefix for the UE/MS in the inner
         headers. Unlike IPv4, which provides a 1:1 mapping for the
         UE/MS, the IPv6 tunnel encapsulates traffic for a /64 address
         as specified by the 3GPP TS. This patch has been split from
         Patch #4 to highlight this behaviour.

Patch #6 passes IPv6 link-local traffic, such as IPv6 SLAAC, up to
         userspace so that it is handled as control packets.

Patch #7 prepares to allow GTP IPv4 over IPv6 and vice versa by
         moving IP-specific debugging out of the function that builds
         IPv4 and IPv6 GTP packets.

Patch #8 generalizes TOS/DSCP handling following similar approach as
         in the existing iptunnel infrastructure.

Patch #9 adds a helper function to build an IPv4 GTP packet in the outer
         header.

Patch #10 adds a helper function to build an IPv6 GTP packet in the outer
          header.

Patch #11 adds support for GTP IPv4-over-IPv6 and vice-versa.

Patch #12 allows using the same TID/TEID (tunnel identifier) for inner
          IPv4 and IPv6 packets, for better UE/MS dual-stack integration.

This series integrates with the osmocom.org project CI and TTCN-3 test
infrastructure (Oliver Smith) as well as the userspace libgtpnl library.

Thanks to Harald Welte, Oliver Smith and Pau Espin for reviewing and
providing feedback through the osmocom.org redmine platform to make this
happen.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-10 13:59:27 +01:00
Florian Westphal
fa23e0d4b7 netfilter: nf_tables: allow clone callbacks to sleep
Sven Auhagen reports transaction failures with the following error:
  ./main.nft:13:1-26: Error: Could not process rule: Cannot allocate memory
  percpu: allocation failed, size=16 align=8 atomic=1, atomic alloc failed, no space left

This points to failing pcpu allocation with GFP_ATOMIC flag.
However, transactions happen from user context and are allowed to sleep.

One case where we can call into the percpu allocator with GFP_ATOMIC is
the nft_counter expression.

Normally this happens from the control plane, so it could use GFP_KERNEL
instead. But one use case, element insertion from the packet path,
needs to use GFP_ATOMIC allocations (nft_dynset expression).

At this time, .clone callbacks always use GFP_ATOMIC for this reason.

Add gfp_t argument to the .clone function and pass GFP_KERNEL or
GFP_ATOMIC flag depending on context, this allows all clone memory
allocations to sleep for the normal (transaction) case.
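
A compact sketch of the interface change described above (the type and
struct names below are illustrative stand-ins, not copied from
nf_tables):

  typedef unsigned int gfp_t;  /* stand-in for the kernel type */
  #define GFP_KERNEL 0x1       /* illustrative values only */
  #define GFP_ATOMIC 0x2

  struct expr;                 /* opaque stand-in for the expression object */

  struct expr_ops {
          /* before: int (*clone)(struct expr *dst, const struct expr *src);
           * after:  the caller chooses the allocation mode */
          int (*clone)(struct expr *dst, const struct expr *src, gfp_t gfp);
  };

  /* transaction path (user context): allowed to sleep */
  static inline int clone_from_transaction(const struct expr_ops *ops,
                                           struct expr *dst,
                                           const struct expr *src)
  {
          return ops->clone(dst, src, GFP_KERNEL);
  }

  /* packet path (nft_dynset element insertion): must not sleep */
  static inline int clone_from_packet_path(const struct expr_ops *ops,
                                           struct expr *dst,
                                           const struct expr *src)
  {
          return ops->clone(dst, src, GFP_ATOMIC);
  }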

Cc: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-10 11:13:45 +02:00
Florian Westphal
a8a388c2aa selftests: netfilter: add packetdrill based conntrack tests
Add a new test script that uses the packetdrill tool to exercise the
conntrack state machine.

It needs ip/ip6tables and the conntrack tool (to check whether we have
an entry in the expected state).

Test cases added here cover following scenarios:
1. already-acked (retransmitted) packets are not tagged as INVALID
2. RST packet coming when conntrack is already closing (FIN/CLOSE_WAIT)
  transitions conntrack to CLOSE even if the RST is not an exact match
3. RST packets with out-of-window sequence numbers are marked as INVALID
4. SYN+Challenge ACK: check that challenge ack is allowed to pass
5. Old SYN/ACK: check conntrack handles the case where SYN is answered
  with SYN/ACK for an old, previous connection attempt
6. Check SYN reception while in ESTABLISHED state generates a challenge
   ack, RST response clears 'outdated' state + next SYN retransmit gets
   us into 'SYN_RECV' conntrack state.

Tests get run twice, once with ipv4 and once with ipv6.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-10 11:13:45 +02:00
Florian Westphal
532aec7e87 netfilter: nft_set_pipapo: remove dirty flag
After the previous change:
 ->clone exists:  ->dirty is always true
 ->clone == NULL: ->dirty is always false

So remove this flag.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-10 11:13:45 +02:00
Florian Westphal
3f1d886cc7 netfilter: nft_set_pipapo: move cloning of match info to insert/removal path
This set type keeps two copies of the set's content:
   priv->match (live version, used to match from the packet path)
   priv->clone (work-in-progress version of the 'future' priv->match).

All additions and removals are done on priv->clone.  When transaction
completes, priv->clone becomes priv->match and a new clone is allocated
for use by next transaction.

The problem is that the cloning requires GFP_KERNEL allocations, but we
cannot fail at either commit or abort time.

This patch defers the clone until we get an insertion or removal
request.  This allows us to handle OOM situations correctly.
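
A generic sketch of the deferred-clone idea (plain C with placeholder
types, not the pipapo data structures):

  #include <errno.h>
  #include <stdlib.h>
  #include <string.h>

  struct match { int placeholder; }; /* stands in for the lookup structures */

  struct set_sketch {
          struct match *live;  /* packet path reads this (RCU in the real code) */
          struct match *clone; /* NULL until the first insert/removal */
  };

  static struct match *dup_match(const struct match *m)
  {
          struct match *c = malloc(sizeof(*c));

          if (c)
                  memcpy(c, m, sizeof(*c));
          return c;
  }

  /* Called from the insert/removal path: cloning may fail here, where an
   * error can still be returned, instead of at commit/abort time where
   * it cannot. */
  static int maybe_clone(struct set_sketch *s)
  {
          if (!s->clone) {
                  s->clone = dup_match(s->live);
                  if (!s->clone)
                          return -ENOMEM;
          }
          return 0;
  }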

This also allows to remove ->dirty in a followup change:

If ->clone exists, ->dirty is always true
If ->clone is NULL, ->dirty is always false, no elements were added
or removed (except catchall elements which are external to the specific
set backend).

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-10 11:05:16 +02:00
Florian Westphal
a238106703 netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone
The helper uses priv->clone unconditionally, which will fail once we do
the clone conditionally on the first insert or removal.

'nft get element' from userspace needs to use priv->match, since this
runs from an RCU read-side lock section.

Prepare for this by passing the match backend data as argument.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-10 11:04:08 +02:00
gaoxingwang
1cd354fe1e net: ipv6: fix wrong start position when receive hop-by-hop fragment
In IPv6, ipv6_rcv_core() parses the hop-by-hop extension header and
increases skb->transport_header by one extension header length. But if
there are more extension headers, such as a fragment header, at this
point skb->transport_header points to the second extension header, not
to the transport layer header or the first extension header.

This results in the start and nexthdrp variables not pointing to the
same position in ipv6frag_thdr_trunced, and in ipv6_skip_exthdr()
returning an incorrect offset and frag_off. Sometimes the length of the
last fragment is smaller than the calculated (incorrect) offset,
resulting in packet loss.

We can use the network header to compute the offset and the correct
position to solve this problem.
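
The fix amounts to deriving the extension-header walk start from the
network header, which always points at the fixed IPv6 header, along
these lines (a sketch of the idea, not necessarily the exact hunk):

  /* start of the extension header chain, independent of how far
   * ipv6_rcv_core() has already advanced skb->transport_header */
  unsigned int start = skb_network_offset(skb) + sizeof(struct ipv6hdr);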

Fixes: 9d9e937b1c8b ("ipv6/netfilter: Discard first fragment not including all headers")
Signed-off-by: Gao Xingwang <gaoxingwang1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-10 10:04:06 +01:00
Eric Dumazet
383eed2de5 tcp: get rid of twsk_unique()
DCCP is going away soon, and had no twsk_unique() method.

We can directly call tcp_twsk_unique() for TCP sockets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240507164140.940547-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-09 20:25:55 -07:00
Praveen Kumar Kannoju
33fb988b67 net/sched: adjust device watchdog timer to detect stopped queue at right time
Applications are sensitive to long network latency, particularly
heartbeat-monitoring ones. The longer the tx timeout recovery takes,
the higher the risk for such applications on production machines. This
patch remedies that, while still honoring the device's configured tx
timeout.

Modify the next watchdog timeout to be shorter than the device-specified
one. Compute the next timeout as the device watchdog timeout minus how
long ago the queue was stopped. At the next watchdog timeout, the tx
timeout handler is called if the queue is still stopped. Whether or not
it is called, restore the watchdog timeout back to the device-specified
value.
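
In other words (a stand-alone sketch of the arithmetic; the names are
illustrative, not the actual qdisc code):

  /* delay until the next watchdog check, in jiffies */
  static unsigned long next_watchdog_delay(unsigned long watchdog_timeo,
                                           unsigned long now,
                                           unsigned long queue_stopped_at)
  {
          unsigned long stopped_for = now - queue_stopped_at;

          if (stopped_for >= watchdog_timeo)
                  return 1;                    /* overdue: check right away */
          return watchdog_timeo - stopped_for; /* check at the device limit */
  }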

Signed-off-by: Praveen Kumar Kannoju <praveen.kannoju@oracle.com>
Link: https://lore.kernel.org/r/20240508133617.4424-1-praveen.kannoju@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-09 20:24:13 -07:00
Alan Maguire
fcd1ed89a0 kbuild,bpf: Switch to using --btf_features for pahole v1.26 and later
The btf_features list can be used for pahole v1.26 and later - it is
useful because if a feature is not yet implemented, pahole will not
exit with a failure message. This will allow us to add features to the
pahole options without having to check pahole versions in the future;
if the version of pahole supports the feature, it will be added.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240507135514.490467-1-alan.maguire@oracle.com
2024-05-09 14:44:35 -07:00
Martin KaFai Lau
0d03a4d24b Merge branch 'use network helpers, part 4'
Geliang Tang says:

====================
From: Geliang Tang <tanggeliang@kylinos.cn>

This patchset adds a post_socket_cb pointer to
struct network_helper_opts to make the start_server_addr() helper
more flexible. With these modifications, a lot of duplicate code
can be dropped.

Patches 1-3 address Martin's comments in the previous series.
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-09 13:40:39 -07:00
Geliang Tang
7abbf38cd8 selftests/bpf: Drop get_port in test_tcp_check_syncookie
The arguments "addr" and "len" of run_test() have dropped. This makes
function get_port() useless. Drop it from test_tcp_check_syncookie_user.c.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/a9b5c8064ab4cbf0f68886fe0e4706428b8d0d47.1714907662.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-09 13:40:38 -07:00
Geliang Tang
65a3f0df44 selftests/bpf: Use connect_to_fd in test_tcp_check_syncookie
This patch uses the public helper connect_to_fd(), exported in
network_helpers.h, instead of the locally defined function
connect_to_server() in test_tcp_check_syncookie_user.c. This avoids
duplicate code.

The arguments "addr" and "len" of run_test() then become useless; drop
them too.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/e0ae6b790ac0abc7193aadfb2660c8c9eb0fe1f0.1714907662.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-09 13:40:38 -07:00