37880 Commits

Author SHA1 Message Date
Pedro Tammela
ee3d122854 selftests: tc-testing: add test for 'rt' upgrade on hfsc
Add a test to check if inner rt curves are upgraded to sc curves.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-21 11:46:41 +01:00
Linus Torvalds
9c5d00cb7b perf tools fixes for v6.6: 2nd batch
- Fix regression in reading scale and unit files from sysfs for PMU
   events, so that we can use that info to pretty print instead of
   printing raw numbers:
 
   # perf stat -e power/energy-ram/,power/energy-gpu/ sleep 2
 
    Performance counter stats for 'system wide':
 
               1.64 Joules power/energy-ram/
               0.20 Joules power/energy-gpu/
 
        2.001228914 seconds time elapsed
   #
   # grep -m1 "model name" /proc/cpuinfo
   model name	: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
   #
 
 - The small llvm.cpp file used to check if the llvm devel files are present was
   incorrectly deleted when removing the BPF event in 'perf trace', put it back
   as it is also used by tools/bpf/bpftool, that uses llvm routines to do
   disassembly of BPF object files.
 
 - Fix use of addr_location__exit() in dlfilter__object_code(), making sure that
   it is only used to pair a previous addr_location__init() call.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZTKh5AAKCRCyPKLppCJ+
 J/g/AP0f6SNyHJz21JzDTzyjXAeSdMzKwic0LXv+kATQy31HJAD+Kf7UKQieUeZB
 fxvp60aKyFN8IVIgpYiAjZMS3k9XPAY=
 =N7Gv
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-fixes-for-v6.6-2-2023-10-20' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix regression in reading scale and unit files from sysfs for PMU
   events, so that we can use that info to pretty print instead of
   printing raw numbers:

     # perf stat -e power/energy-ram/,power/energy-gpu/ sleep 2

      Performance counter stats for 'system wide':

                 1.64 Joules power/energy-ram/
                 0.20 Joules power/energy-gpu/

          2.001228914 seconds time elapsed
     #
     # grep -m1 "model name" /proc/cpuinfo
     model name	: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
     #

 - The small llvm.cpp file used to check if the llvm devel files are
   present was incorrectly deleted when removing the BPF event in 'perf
   trace', put it back as it is also used by tools/bpf/bpftool, that
   uses llvm routines to do disassembly of BPF object files.

 - Fix use of addr_location__exit() in dlfilter__object_code(), making
   sure that it is only used to pair a previous addr_location__init()
   call.

* tag 'perf-tools-fixes-for-v6.6-2-2023-10-20' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  tools build: Fix llvm feature detection, still used by bpftool
  perf dlfilter: Add a test for object_code()
  perf dlfilter: Fix use of addr_location__exit() in dlfilter__object_code()
  perf pmu: Fix perf stat output with correct scale and unit
2023-10-20 14:49:24 -07:00
Linus Torvalds
444ccf1b11 linux_kselftest_active-fixes-6.6-rc7
This Kselftest update for Linux 6.6-rc7 consists of one single fix
 to assert check in user_events abi_test to properly check bit value
 on Big Endian architectures. The current code treats the bit values
 as Little Endian and the check fails on Big Endian.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmUytR4ACgkQCwJExA0N
 QxzjoQ/+MWgtyggLZo/dP+qJ7AU+tG08YXWWuh99lkSxVs3xHvwl2EGaYf1PXN9c
 ZUXb/KGfls8G4tv20KU2+/uSRSirkf+CFLN6HaBBG+2cun8o0KpHlVKGfmvRjb13
 ZUX8UQJ5u9kTSTqV7gCxVbemV5iOhTazuwVQ1Aq7wFL6e/0oL4eolbgNP0ub76vy
 UsZl9j/8pFhtVfdqRJqorKQ9H5RgvW7CCmobkUruGJoXg7jFuFdeqXtS4U1ziyJT
 g42LtXuC053KrEnEmhj1EadZC4E1eXadffssanfyWolAXjRD3N+2vLYasJOiGDAO
 mT111kQLaRNvZLJBULlrnWITkbVOhfE9Cxu14idxvdLShQiUO8kRCjoN2TbLZ2ET
 n7u/4anHBu99ljO6uZVNe7JY5ZrwaM1o7Myvk9eOVRe4WKkgOXJKUM3lfwvfMK8Q
 sEWvfBY04gL5y697DDiQvJ4g0fRwwnoadpHFvJTJX977H5C8/c750YZw/Emuip9D
 zxyHGEVtncM6awbHP8bGBgg2f6XISWKs5qvzrwLmdXx42oi0VjU4z+cEVOUSx/rY
 K+B+LVzcdYc/6UhjQzmylNBUs5CO7K6D05/tes07QaZ1MCCyw3b+DAR+LJn2U2+C
 Y3+x/kQT16abMIIpTM8qVWqxswvaFSyZ+5hCkVrAsYwkM1fpB4c=
 =PJhV
 -----END PGP SIGNATURE-----

Merge tag 'linux_kselftest_active-fixes-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kselftest fix from Shuah Khan:
 "One single fix to assert check in user_events abi_test to properly
  check bit value on Big Endian architectures. The code treated the bit
  values as Little Endian and the check failed on Big Endian"

* tag 'linux_kselftest_active-fixes-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/user_events: Fix abi_test for BE archs
2023-10-20 14:45:41 -07:00
Hou Tao
d440ba91ca selftests/bpf: Add more test cases for bpf memory allocator
Add the following 3 test cases for bpf memory allocator:
1) Do allocation in bpf program and free through map free
2) Do batch per-cpu allocation and per-cpu free in bpf program
3) Do per-cpu allocation in bpf program and free through map free

For per-cpu allocation, because per-cpu allocation can not refill timely
sometimes, so test 2) and test 3) consider it is OK for
bpf_percpu_obj_new_impl() to return NULL.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231020133202.4043247-8-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-20 14:15:13 -07:00
Kumar Kartikeya Dwivedi
da1055b673 selftests/bpf: Make linked_list failure test more robust
The linked list failure test 'pop_front_off' and 'pop_back_off'
currently rely on matching exact instruction and register values.  The
purpose of the test is to ensure the offset is correctly incremented for
the returned pointers from list pop helpers, which can then be used with
container_of to obtain the real object. Hence, somehow obtaining the
information that the offset is 48 will work for us. Make the test more
robust by relying on verifier error string of bpf_spin_lock and remove
dependence on fragile instruction index or register number, which can be
affected by different clang versions used to build the selftests.

Fixes: 300f19dcdb99 ("selftests/bpf: Add BPF linked list API tests")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231020144839.2734006-1-memxor@gmail.com
2023-10-20 09:29:39 -07:00
Francis Laniel
03b80ff802 selftests/ftrace: Add new test case which checks non unique symbol
If name_show() is non unique, this test will try to install a kprobe on this
function which should fail returning EADDRNOTAVAIL.
On kernel where name_show() is not unique, this test is skipped.

Link: https://lore.kernel.org/all/20231020104250.9537-3-flaniel@linux.microsoft.com/

Cc: stable@vger.kernel.org
Signed-off-by: Francis Laniel <flaniel@linux.microsoft.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-10-20 22:11:49 +09:00
Josh Poimboeuf
34de4fe7d1 objtool: Fix return thunk patching in retpolines
With CONFIG_RETHUNK enabled, the compiler replaces every RET with a tail
call to a return thunk ('JMP __x86_return_thunk').  Objtool annotates
all such return sites so they can be patched during boot by
apply_returns().

The implementation of __x86_return_thunk() is just a bare RET.  It's
only meant to be used temporarily until apply_returns() patches all
return sites with either a JMP to another return thunk or an actual RET.

Removing the .text..__x86.return_thunk section would break objtool's
detection of return sites in retpolines.  Since retpolines and return
thunks would land in the same section, the compiler no longer uses
relocations for the intra-section jumps between the retpolines and the
return thunk, causing objtool to overlook them.

As a result, none of the retpolines' return sites would get patched.
Each one stays at 'JMP __x86_return_thunk', effectively a bare RET.

Fix it by teaching objtool to detect when a non-relocated jump target is
a return thunk (or retpoline).

  [ bp: Massage the commit message now that the offending commit
    removing the .text..__x86.return_thunk section has been zapped.
    Still keep the objtool change here as it makes objtool more robust
    wrt handling such intra-TU jumps without relocations, should some
    toolchain and/or config generate them in the future. ]

Reported-by: David Kaplan <david.kaplan@amd.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231012024737.eg5phclogp67ik6x@treble
2023-10-20 12:51:41 +02:00
Jakub Kicinski
7d4caf54d2 netlink: specs: add support for auto-sized scalars
Support uint / sint types in specs and YNL.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:43:35 +01:00
Jakub Kicinski
e1b347c5f7 tools: ynl-gen: make the mnl_type() method public
uint/sint support will add more logic to mnl_type(),
deduplicate it and make it more accessible.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-20 11:43:35 +01:00
Arnaldo Carvalho de Melo
4fa008a2db tools build: Fix llvm feature detection, still used by bpftool
When removing the BPF event for perf a feature test that checks if the
llvm devel files are availabe was removed but that is also used by
bpftool.

bpftool uses it to decide what kind of disassembly it will use: llvm or
binutils based.

Removing the tools/build/feature/test-llvm.cpp file made bpftool to
always fallback to binutils disassembly, even with the llvm devel files
installed, fix it by restoring just that small test-llvm.cpp test file.

Fixes: 56b11a2126bf2f42 ("perf bpf: Remove support for embedding clang for compiling BPF events (-e foo.c)")
Reported-by: Manu Bretelle <chantr4@gmail.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: Manu Bretelle <chantr4@gmail.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Fangrui Song <maskray@google.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Wang ShaoBo <bobo.shaobowang@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/lkml/ZTGa0Ukt7QyxWcVy@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-10-19 22:33:58 -03:00
Chuyi Zhou
130e0f7af9 selftests/bpf: Add tests for open-coded task and css iter
This patch adds 4 subtests to demonstrate these patterns and validating
correctness.

subtest1:

1) We use task_iter to iterate all process in the system and search for the
current process with a given pid.

2) We create some threads in current process context, and use
BPF_TASK_ITER_PROC_THREADS to iterate all threads of current process. As
expected, we would find all the threads of current process.

3) We create some threads and use BPF_TASK_ITER_ALL_THREADS to iterate all
threads in the system. As expected, we would find all the threads which was
created.

subtest2:

We create a cgroup and add the current task to the cgroup. In the
BPF program, we would use bpf_for_each(css_task, task, css) to iterate all
tasks under the cgroup. As expected, we would find the current process.

subtest3:

1) We create a cgroup tree. In the BPF program, we use
bpf_for_each(css, pos, root, XXX) to iterate all descendant under the root
with pre and post order. As expected, we would find all descendant and the
last iterating cgroup in post-order is root cgroup, the first iterating
cgroup in pre-order is root cgroup.

2) We wse BPF_CGROUP_ITER_ANCESTORS_UP to traverse the cgroup tree starting
from leaf and root separately, and record the height. The diff of the
hights would be the total tree-high - 1.

subtest4:

Add some failure testcase when using css_task, task and css iters, e.g,
unlock when using task-iters to iterate tasks.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Link: https://lore.kernel.org/r/20231018061746.111364-9-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-19 17:02:47 -07:00
Chuyi Zhou
ddab78cbb5 selftests/bpf: rename bpf_iter_task.c to bpf_iter_tasks.c
The newly-added struct bpf_iter_task has a name collision with a selftest
for the seq_file task iter's bpf skel, so the selftests/bpf/progs file is
renamed in order to avoid the collision.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231018061746.111364-8-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-19 17:02:47 -07:00
Chuyi Zhou
7251d0905e bpf: Introduce css open-coded iterator kfuncs
This Patch adds kfuncs bpf_iter_css_{new,next,destroy} which allow
creation and manipulation of struct bpf_iter_css in open-coded iterator
style. These kfuncs actually wrapps css_next_descendant_{pre, post}.
css_iter can be used to:

1) iterating a sepcific cgroup tree with pre/post/up order

2) iterating cgroup_subsystem in BPF Prog, like
for_each_mem_cgroup_tree/cpuset_for_each_descendant_pre in kernel.

The API design is consistent with cgroup_iter. bpf_iter_css_new accepts
parameters defining iteration order and starting css. Here we also reuse
BPF_CGROUP_ITER_DESCENDANTS_PRE, BPF_CGROUP_ITER_DESCENDANTS_POST,
BPF_CGROUP_ITER_ANCESTORS_UP enums.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20231018061746.111364-5-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-19 17:02:46 -07:00
Chuyi Zhou
c68a78ffe2 bpf: Introduce task open coded iterator kfuncs
This patch adds kfuncs bpf_iter_task_{new,next,destroy} which allow
creation and manipulation of struct bpf_iter_task in open-coded iterator
style. BPF programs can use these kfuncs or through bpf_for_each macro to
iterate all processes in the system.

The API design keep consistent with SEC("iter/task"). bpf_iter_task_new()
accepts a specific task and iterating type which allows:

1. iterating all process in the system (BPF_TASK_ITER_ALL_PROCS)

2. iterating all threads in the system (BPF_TASK_ITER_ALL_THREADS)

3. iterating all threads of a specific task (BPF_TASK_ITER_PROC_THREADS)

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Link: https://lore.kernel.org/r/20231018061746.111364-4-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-19 17:02:46 -07:00
Chuyi Zhou
9c66dc94b6 bpf: Introduce css_task open-coded iterator kfuncs
This patch adds kfuncs bpf_iter_css_task_{new,next,destroy} which allow
creation and manipulation of struct bpf_iter_css_task in open-coded
iterator style. These kfuncs actually wrapps css_task_iter_{start,next,
end}. BPF programs can use these kfuncs through bpf_for_each macro for
iteration of all tasks under a css.

css_task_iter_*() would try to get the global spin-lock *css_set_lock*, so
the bpf side has to be careful in where it allows to use this iter.
Currently we only allow it in bpf_lsm and bpf iter-s.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20231018061746.111364-3-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-19 17:02:46 -07:00
Jakub Kicinski
f9bc3cbc20 tools: ynl-gen: support limit names
Support the use of symbolic names like s8-min or u32-max in checks
to make writing specs less painful.

Link: https://lore.kernel.org/r/20231018163917.2514503-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 15:54:56 -07:00
Jakub Kicinski
668c1ac828 tools: ynl-gen: support full range of min/max checks for integer values
Extend the support to full range of min/max checks.
None of the existing YNL families required complex integer validation.

The support is less than trivial, because we try to keep struct nla_policy
tiny the min/max members it holds in place are s16. Meaning we can only
express checks in range of s16. For larger ranges we need to define
a structure and link it in the policy.

Link: https://lore.kernel.org/r/20231018163917.2514503-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 15:54:56 -07:00
Jakub Kicinski
ee0a4cfcbd tools: ynl-gen: track attribute use
For range validation we'll need to know if any individual
attribute is used on input (i.e. whether we will generate
a policy for it). Track this information.

Link: https://lore.kernel.org/r/20231018163917.2514503-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 15:54:56 -07:00
Jakub Kicinski
041c3466f3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

net/mac80211/key.c
  02e0e426a2fb ("wifi: mac80211: fix error path key leak")
  2a8b665e6bcc ("wifi: mac80211: remove key_mtx")
  7d6904bf26b9 ("Merge wireless into wireless-next")
https://lore.kernel.org/all/20231012113648.46eea5ec@canb.auug.org.au/

Adjacent changes:

drivers/net/ethernet/ti/Kconfig
  a602ee3176a8 ("net: ethernet: ti: Fix mixed module-builtin object")
  98bdeae9502b ("net: cpmac: remove driver to prepare for platform removal")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 13:29:01 -07:00
Linus Torvalds
ce55c22ec8 Including fixes from bluetooth, netfilter, WiFi.
Feels like an up-tick in regression fixes, mostly for older releases.
 The hfsc fix, tcp_disconnect() and Intel WWAN fixes stand out as fairly
 clear-cut user reported regressions. The mlx5 DMA bug was causing strife
 for 390x folks. The fixes themselves are not particularly scary, tho.
 No open investigations / outstanding reports at the time of writing.
 
 Current release - regressions:
 
  - eth: mlx5: perform DMA operations in the right locations,
    make devices usable on s390x, again
 
  - sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve,
    previous fix of rejecting invalid config broke some scripts
 
  - rfkill: reduce data->mtx scope in rfkill_fop_open, avoid deadlock
 
  - revert "ethtool: Fix mod state of verbose no_mask bitset",
    needs more work
 
 Current release - new code bugs:
 
  - tcp: fix listen() warning with v4-mapped-v6 address
 
 Previous releases - regressions:
 
  - tcp: allow tcp_disconnect() again when threads are waiting,
    it was denied to plug a constant source of bugs but turns out
    .NET depends on it
 
  - eth: mlx5: fix double-free if buffer refill fails under OOM
 
  - revert "net: wwan: iosm: enable runtime pm support for 7560",
    it's causing regressions and the WWAN team at Intel disappeared
 
  - tcp: tsq: relax tcp_small_queue_check() when rtx queue contains
    a single skb, fix single-stream perf regression on some devices
 
 Previous releases - always broken:
 
  - Bluetooth:
    - fix issues in legacy BR/EDR PIN code pairing
    - correctly bounds check and pad HCI_MON_NEW_INDEX name
 
  - netfilter:
    - more fixes / follow ups for the large "commit protocol" rework,
      which went in as a fix to 6.5
    - fix null-derefs on netlink attrs which user may not pass in
 
  - tcp: fix excessive TLP and RACK timeouts from HZ rounding
    (bless Debian for keeping HZ=250 alive)
 
  - net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation, prevent
    letting frankenstein UDP super-frames from getting into the stack
 
  - net: fix interface altnames when ifc moves to a new namespace
 
  - eth: qed: fix the size of the RX buffers
 
  - mptcp: avoid sending RST when closing the initial subflow
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmUxawUACgkQMUZtbf5S
 Irt75w/+MC2ilnFQhfoQqpCxL2uHWOKzhenUz2PoNKR60OKgLLmTq8YBYcpyCVAQ
 wBQaNzNu1/yjF5BG7aNS/j0suGeJsOfhfoahQvXlLaat9NuDqxTpaeoT5FZ7eNQw
 RZ8+CJug3BRDV0TkaJH9UDVC/nfJTsnsGWNIhNXYGPsuveqAUun+xrnN8ZbvZIrn
 6D9rMF+u9SdVO+ANCquXBC7+CWEWiJS1ljUrU7BRNiv/9FSlnPQtjOdpuKleeBO8
 4usMS7TezHNgRdiAKC8GjSGUiIkIIMJT4y4wuczBEQAD4Pkki9UpBrui97ozOj7h
 W4N7UOuPlUBIardvKNoYz9rZyiFXBcPPm0GruHiuCqpxyqmzoFgv2XJsb/6KfzNn
 Dyro+lvh8smtbFHvFqiwaNu5y8ucfClaowvR4gjSe2KcB7hIpwNkh6vWC6OMGJK3
 hiKHnDrnXBQMbnP1YfiJ4feLmm3UYCG8eFdv/ULZT0a9TzZ7fKfzAfywUwD+/O8Y
 +S28Hr9srdDCHO7ih/gF3Wq9wtnnLy8QEkpt7cpXjRDj0uWH8JkHU3YEIGF2814Y
 LNVGmX9y6RcgrHNp03K1PmUcgAzhTTuV9QRoQKEucuBT5AK9ALDQ8YomnWDWDgrp
 UOdJPi1RUTsmqslADF15wZ9W5Ki/cDnUJsE4HU/MtnM2w95C49Q=
 =z1F6
 -----END PGP SIGNATURE-----

Merge tag 'net-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth, netfilter, WiFi.

  Feels like an up-tick in regression fixes, mostly for older releases.
  The hfsc fix, tcp_disconnect() and Intel WWAN fixes stand out as
  fairly clear-cut user reported regressions. The mlx5 DMA bug was
  causing strife for 390x folks. The fixes themselves are not
  particularly scary, tho. No open investigations / outstanding reports
  at the time of writing.

  Current release - regressions:

   - eth: mlx5: perform DMA operations in the right locations, make
     devices usable on s390x, again

   - sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner
     curve, previous fix of rejecting invalid config broke some scripts

   - rfkill: reduce data->mtx scope in rfkill_fop_open, avoid deadlock

   - revert "ethtool: Fix mod state of verbose no_mask bitset", needs
     more work

  Current release - new code bugs:

   - tcp: fix listen() warning with v4-mapped-v6 address

  Previous releases - regressions:

   - tcp: allow tcp_disconnect() again when threads are waiting, it was
     denied to plug a constant source of bugs but turns out .NET depends
     on it

   - eth: mlx5: fix double-free if buffer refill fails under OOM

   - revert "net: wwan: iosm: enable runtime pm support for 7560", it's
     causing regressions and the WWAN team at Intel disappeared

   - tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a
     single skb, fix single-stream perf regression on some devices

  Previous releases - always broken:

   - Bluetooth:
      - fix issues in legacy BR/EDR PIN code pairing
      - correctly bounds check and pad HCI_MON_NEW_INDEX name

   - netfilter:
      - more fixes / follow ups for the large "commit protocol" rework,
        which went in as a fix to 6.5
      - fix null-derefs on netlink attrs which user may not pass in

   - tcp: fix excessive TLP and RACK timeouts from HZ rounding (bless
     Debian for keeping HZ=250 alive)

   - net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation, prevent
     letting frankenstein UDP super-frames from getting into the stack

   - net: fix interface altnames when ifc moves to a new namespace

   - eth: qed: fix the size of the RX buffers

   - mptcp: avoid sending RST when closing the initial subflow"

* tag 'net-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
  Revert "ethtool: Fix mod state of verbose no_mask bitset"
  selftests: mptcp: join: no RST when rm subflow/addr
  mptcp: avoid sending RST when closing the initial subflow
  mptcp: more conservative check for zero probes
  tcp: check mptcp-level constraints for backlog coalescing
  selftests: mptcp: join: correctly check for no RST
  net: ti: icssg-prueth: Fix r30 CMDs bitmasks
  selftests: net: add very basic test for netdev names and namespaces
  net: move altnames together with the netdevice
  net: avoid UAF on deleted altname
  net: check for altname conflicts when changing netdev's netns
  net: fix ifname in netlink ntf during netns move
  net: ethernet: ti: Fix mixed module-builtin object
  net: phy: bcm7xxx: Add missing 16nm EPHY statistics
  ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr
  tcp_bpf: properly release resources on error paths
  net/sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve
  net: mdio-mux: fix C45 access returning -EIO after API change
  tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skb
  octeon_ep: update BQL sent bytes before ringing doorbell
  ...
2023-10-19 12:08:18 -07:00
Kent Overstreet
faf1dce852 objtool: Add bcachefs noreturns
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-19 14:58:29 -04:00
Linus Torvalds
189b756271 seccomp fix for v6.6-rc7
- Fix seccomp_unotify perf benchmark for 32-bit (Jiri Slaby)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmUwfdQWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJixtD/98vbEvwKMPxQxTH17Y2XMA7uSg
 8/WV6reVxE+DCalieqO14XRP/I++88W5A+HJSxZ+C9Q1WFXdsGINr3XMcihWZIc1
 IkzJ7t+LTTn4Y8MxC0m7LyIqpRIVYTkKW6RwrUbsZj0+vrpXtSWz6Gy7Oz7YXvah
 Y3DHVyFpi8FNK5cEz0cmBIITE0CCR/eQuiMD1057esudruWCFiKKGCZN6hPdNTlE
 pzCYGKNlA3eXOzsW/N2qAIj2nlnEJCMpP92vnvpRhcNIa+MKCzAELSi73QScG/ja
 lrcvCWWpWmR5sSNhA2dWbamR/S6GrEtwo16J4Sg/sYqHrywsr0vVGjEHLzkpf8Ef
 zxW5mGXl78dL+zoq6woyOzt/ztQ1kqVbSXF0VMMhVjUQhTSkvivr6LX+mwvNJYim
 KLAaJmWgPj6wm9agpQ/hukNKaENiA6sDTb+wpcWKdrWl958DoSY8zl3ZdJXWSxnE
 EWXLRtPioMP9BR/9U63Fr2U/m7dXKXHRfn5J2+0WBdq4MsuPtwkIs6SAaec0AKMS
 jJnRcu/b1ub00WitnJZk8v4jFfHVernFVkUGhFrgdXzwF7hXaMvsAPK6wfg2MNpe
 9aRPS4Pf8xgF6Bz7XqNrw3qgNmQwZ5kn/ucuwSjfOc2BcwSKjYGVqwXUQ8hbQdNK
 HtrskXY14BxtSLo1Gg==
 =Ir3f
 -----END PGP SIGNATURE-----

Merge tag 'seccomp-v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp fix from Kees Cook:

 - Fix seccomp_unotify perf benchmark for 32-bit (Jiri Slaby)

* tag 'seccomp-v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  perf/benchmark: fix seccomp_unotify benchmark for 32-bit
2023-10-19 10:10:14 -07:00
Matthieu Baerts
2cfaa8b3b7 selftests: mptcp: join: no RST when rm subflow/addr
Recently, we noticed that some RST were wrongly generated when removing
the initial subflow.

This patch makes sure RST are not sent when removing any subflows or any
addresses.

Fixes: c2b2ae3925b6 ("mptcp: handle correctly disconnect() failures")
Cc: stable@vger.kernel.org
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-5-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 09:10:00 -07:00
Matthieu Baerts
b134a58054 selftests: mptcp: join: correctly check for no RST
The commit mentioned below was more tolerant with the number of RST seen
during a test because in some uncontrollable situations, multiple RST
can be generated.

But it was not taking into account the case where no RST are expected:
this validation was then no longer reporting issues for the 0 RST case
because it is not possible to have less than 0 RST in the counter. This
patch fixes the issue by adding a specific condition.

Fixes: 6bf41020b72b ("selftests: mptcp: update and extend fastclose test-cases")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-1-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19 09:09:59 -07:00
Manu Bretelle
6bd5e167af bpftool: Wrap struct_ops dump in an array
When dumping a struct_ops, 2 dictionaries are emitted.

When using `name`, they were already wrapped in an array, but not when
using `id`. Causing `jq` to fail at parsing the payload as it reached
the comma following the first dict.

This change wraps those dictionaries in an array so valid json is emitted.

Before, jq fails to parse the output:
```
 $ sudo bpftool struct_ops dump id 1523612 | jq . > /dev/null
parse error: Expected value before ',' at line 19, column 2
```

After, no error parsing the output:
```
sudo ./bpftool  struct_ops dump id 1523612 | jq . > /dev/null
```

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20231018230133.1593152-3-chantr4@gmail.com
2023-10-19 16:30:15 +02:00
Manu Bretelle
90704b4be0 bpftool: Fix printing of pointer value
When printing a pointer value, "%p" will either print the hexadecimal
value of the pointer (e.g `0x1234`), or `(nil)` when NULL.

Both of those are invalid json "integer" values and need to be wrapped
in quotes.

Before:
```
$ sudo bpftool struct_ops dump  name ned_dummy_cca | grep next
                    "next": (nil),
$ sudo bpftool struct_ops dump  name ned_dummy_cca | \
    jq '.[1].bpf_struct_ops_tcp_congestion_ops.data.list.next'
parse error: Invalid numeric literal at line 29, column 34
```

After:
```
$ sudo ./bpftool struct_ops dump  name ned_dummy_cca | grep next
                    "next": "(nil)",
$ sudo ./bpftool struct_ops dump  name ned_dummy_cca | \
    jq '.[1].bpf_struct_ops_tcp_congestion_ops.data.list.next'
"(nil)"
```

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20231018230133.1593152-2-chantr4@gmail.com
2023-10-19 16:29:36 +02:00
Jakub Kicinski
3920431d98 selftests: net: add very basic test for netdev names and namespaces
Add selftest for fixes around naming netdevs and namespaces.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19 15:51:16 +02:00
Eric Dumazet
878d951c67 inet: lock the socket in ip_sock_set_tos()
Christoph Paasch reported a panic in TCP stack [1]

Indeed, we should not call sk_dst_reset() without holding
the socket lock, as __sk_dst_get() callers do not all rely
on bare RCU.

[1]
BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 12bad6067 P4D 12bad6067 PUD 12bad5067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 2750 Comm: syz-executor.5 Not tainted 6.6.0-rc4-g7a5720a344e7 #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
RIP: 0010:tcp_get_metrics+0x118/0x8f0 net/ipv4/tcp_metrics.c:321
Code: c7 44 24 70 02 00 8b 03 89 44 24 48 c7 44 24 4c 00 00 00 00 66 c7 44 24 58 02 00 66 ba 02 00 b1 01 89 4c 24 04 4c 89 7c 24 10 <49> 8b 0f 48 8b 89 50 05 00 00 48 89 4c 24 30 33 81 00 02 00 00 69
RSP: 0018:ffffc90000af79b8 EFLAGS: 00010293
RAX: 000000000100007f RBX: ffff88812ae8f500 RCX: ffff88812b5f8f01
RDX: 0000000000000002 RSI: ffffffff8300f080 RDI: 0000000000000002
RBP: 0000000000000002 R08: 0000000000000003 R09: ffffffff8205eca0
R10: 0000000000000002 R11: ffff88812b5f8f00 R12: ffff88812a9e0580
R13: 0000000000000000 R14: ffff88812ae8fbd2 R15: 0000000000000000
FS: 00007f70a006b640(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000012bad7003 CR4: 0000000000170ee0
Call Trace:
<TASK>
tcp_fastopen_cache_get+0x32/0x140 net/ipv4/tcp_metrics.c:567
tcp_fastopen_cookie_check+0x28/0x180 net/ipv4/tcp_fastopen.c:419
tcp_connect+0x9c8/0x12a0 net/ipv4/tcp_output.c:3839
tcp_v4_connect+0x645/0x6e0 net/ipv4/tcp_ipv4.c:323
__inet_stream_connect+0x120/0x590 net/ipv4/af_inet.c:676
tcp_sendmsg_fastopen+0x2d6/0x3a0 net/ipv4/tcp.c:1021
tcp_sendmsg_locked+0x1957/0x1b00 net/ipv4/tcp.c:1073
tcp_sendmsg+0x30/0x50 net/ipv4/tcp.c:1336
__sock_sendmsg+0x83/0xd0 net/socket.c:730
__sys_sendto+0x20a/0x2a0 net/socket.c:2194
__do_sys_sendto net/socket.c:2206 [inline]

Fixes: e08d0b3d1723 ("inet: implement lockless IP_TOS")
Reported-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231018090014.345158-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19 13:13:13 +02:00
Pedro Tammela
35027c7909 selftests: tc-testing: move auxiliary scripts to a dedicated folder
Some taprio tests need auxiliary scripts to wait for workqueue events to
process. Move them to a dedicated folder in order to package them for
the kselftests tarball.

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20231017152309.3196320-3-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18 18:07:51 -07:00
Pedro Tammela
f157b73d51 selftests: tc-testing: add missing Kconfig options to 'config'
Make sure CI builds using just tc-testing/config can run all tdc tests.
Some tests were broken because of missing knobs.

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20231017152309.3196320-2-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18 18:07:51 -07:00
Jiri Slaby (SUSE)
31c65705a8 perf/benchmark: fix seccomp_unotify benchmark for 32-bit
Commit 7d5cb68af638 (perf/benchmark: add a new benchmark for
seccom_unotify) added a reference to __NR_seccomp into perf. This is
fine as it added also a definition of __NR_seccomp for 64-bit. But it
failed to do so for 32-bit as instead of ifndef, ifdef was used.

Fix this typo (so fix the build of perf on 32-bit).

Fixes: 7d5cb68af638 (perf/benchmark: add a new benchmark for seccom_unotify)
Cc: Andrei Vagin <avagin@google.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20231017083019.31733-1-jirislaby@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2023-10-18 17:47:18 -07:00
Liam R. Howlett
099d7439ce maple_tree: add GFP_KERNEL to allocations in mas_expected_entries()
Users complained about OOM errors during fork without triggering
compaction.  This can be fixed by modifying the flags used in
mas_expected_entries() so that the compaction will be triggered in low
memory situations.  Since mas_expected_entries() is only used during fork,
the extra argument does not need to be passed through.

Additionally, the two test_maple_tree test cases and one benchmark test
were altered to use the correct locking type so that allocations would not
trigger sleeping and thus fail.  Testing was completed with lockdep atomic
sleep detection.

The additional locking change requires rwsem support additions to the
tools/ directory through the use of pthreads pthread_rwlock_t.  With this
change test_maple_tree works in userspace, as a module, and in-kernel.

Users may notice that the system gave up early on attempting to start new
processes instead of attempting to reclaim memory.

Link: https://lkml.kernel.org/r/20230915093243epcms1p46fa00bbac1ab7b7dca94acb66c44c456@epcms1p4
Link: https://lkml.kernel.org/r/20231012155233.2272446-1-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: <jason.sim@samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 12:12:41 -07:00
Samasth Norway Ananda
e2de156b0d selftests/mm: include mman header to access MREMAP_DONTUNMAP identifier
Definition for MREMAP_DONTUNMAP is not present in glibc older than 2.32
thus throwing an undeclared error when running make on mm.  Including
linux/mman.h solves the build error for people having older glibc.

Link: https://lkml.kernel.org/r/20231012155257.891776-1-samasth.norway.ananda@oracle.com
Fixes: 0183d777c29a ("selftests: mm: remove duplicate unneeded defines")
Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com>
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Closes: https://lore.kernel.org/linux-mm/CA+G9fYvV-71XqpCr_jhdDfEtN701fBdG3q+=bafaZiGwUXy_aA@mail.gmail.com/
Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18 12:12:41 -07:00
Phil Sutter
2e2d9c7d4d selftests: netfilter: Run nft_audit.sh in its own netns
Don't mess with the host's firewall ruleset. Since audit logging is not
per-netns, add an initial delay of a second so other selftests' netns
cleanups have a chance to finish.

Fixes: e8dbde59ca3f ("selftests: netfilter: Test nf_tables audit logging")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18 13:47:08 +02:00
Phil Sutter
1baf0152f7 netfilter: nf_tables: audit log object reset once per table
When resetting multiple objects at once (via dump request), emit a log
message per table (or filled skb) and resurrect the 'entries' parameter
to contain the number of objects being logged for.

To test the skb exhaustion path, perform some bulk counter and quota
adds in the kselftest.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com> (Audit)
Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18 13:43:40 +02:00
Phil Sutter
c4eee56e14 net: skb_find_text: Ignore patterns extending past 'to'
Assume that caller's 'to' offset really represents an upper boundary for
the pattern search, so patterns extending past this offset are to be
rejected.

The old behaviour also was kind of inconsistent when it comes to
fragmentation (or otherwise non-linear skbs): If the pattern started in
between 'to' and 'from' offsets but extended to the next fragment, it
was not found if 'to' offset was still within the current fragment.

Test the new behaviour in a kselftest using iptables' string match.

Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fixes: f72b948dcbb8 ("[NET]: skb_find_text ignores to argument")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-18 11:09:55 +01:00
Larysa Zaremba
bb6a88885f selftests/bpf: Add options and frags to xdp_hw_metadata
This is a follow-up to the commit 9b2b86332a9b ("bpf: Allow to use kfunc
XDP hints and frags together").

The are some possible implementations problems that may arise when providing
metadata specifically for multi-buffer packets, therefore there must be a
possibility to test such option separately.

Add an option to use multi-buffer AF_XDP xdp_hw_metadata and mark used XDP
program as capable to use frags.

As for now, xdp_hw_metadata accepts no options, so add simple option
parsing logic and a help message.

For quick reference, also add an ingress packet generation command to the
help message. The command comes from [0].

Example of output for multi-buffer packet:

  xsk_ring_cons__peek: 1
  0xead018: rx_desc[15]->addr=10000000000f000 addr=f100 comp_addr=f000
  rx_hash: 0x5789FCBB with RSS type:0x29
  rx_timestamp:  1696856851535324697 (sec:1696856851.5353)
  XDP RX-time:   1696856843158256391 (sec:1696856843.1583)
  	delta sec:-8.3771 (-8377068.306 usec)
  AF_XDP time:   1696856843158413078 (sec:1696856843.1584)
  	delta sec:0.0002 (156.687 usec)
  0xead018: complete idx=23 addr=f000
  xsk_ring_cons__peek: 1
  0xead018: rx_desc[16]->addr=100000000008000 addr=8100 comp_addr=8000
  0xead018: complete idx=24 addr=8000
  xsk_ring_cons__peek: 1
  0xead018: rx_desc[17]->addr=100000000009000 addr=9100 comp_addr=9000 EoP
  0xead018: complete idx=25 addr=9000

Metadata is printed for the first packet only.

  [0] https://lore.kernel.org/all/20230119221536.3349901-18-sdf@google.com/

Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20231017162800.24080-1-larysa.zaremba@intel.com
2023-10-18 10:08:28 +02:00
Jakub Kicinski
9fea94d3a8 tools: ynl: fix converting flags to names after recent cleanup
I recently cleaned up specs to not specify enum-as-flags
when target enum is already defined as flags.
YNL Python library did not convert flags, unfortunately,
so this caused breakage for Stan and Willem.

Note that the nlspec.py abstraction already hides the differences
between flags and enums (value vs user_value), so the changes
are pretty trivial.

Fixes: 0629f22ec130 ("ynl: netdev: drop unnecessary enum-as-flags")
Reported-and-tested-by: Willem de Bruijn <willemb@google.com>
Reported-and-tested-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/all/ZS10NtQgd_BJZ3RU@google.com/
Link: https://lore.kernel.org/r/20231016213937.1820386-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17 17:59:46 -07:00
Johannes Nixdorf
6f84090333 selftests: forwarding: bridge_fdb_learning_limit: Add a new selftest
Add a suite covering the fdb_n_learned and fdb_max_learned bridge
features, touching all special cases in accounting at least once.

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de>
Link: https://lore.kernel.org/r/20231016-fdb_limit-v5-5-32cddff87758@avm.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-17 17:39:02 -07:00
Beau Belgrave
cf5a103c98 selftests/user_events: Fix abi_test for BE archs
The abi_test currently uses a long sized test value for enablement
checks. On LE this works fine, however, on BE this results in inaccurate
assert checks due to a bit being used and assuming it's value is the
same on both LE and BE.

Use int type for 32-bit values and long type for 64-bit values to ensure
appropriate behavior on both LE and BE.

Fixes: 60b1af8de8c1 ("tracing/user_events: Add ABI self-test")
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-10-17 15:07:19 -06:00
Daniel Borkmann
24516309e3 selftests/bpf: Add additional mprog query test coverage
Add several new test cases which assert corner cases on the mprog query
mechanism, for example, around passing in a too small or a larger array
than the current count.

  ./test_progs -t tc_opts
  #252     tc_opts_after:OK
  #253     tc_opts_append:OK
  #254     tc_opts_basic:OK
  #255     tc_opts_before:OK
  #256     tc_opts_chain_classic:OK
  #257     tc_opts_chain_mixed:OK
  #258     tc_opts_delete_empty:OK
  #259     tc_opts_demixed:OK
  #260     tc_opts_detach:OK
  #261     tc_opts_detach_after:OK
  #262     tc_opts_detach_before:OK
  #263     tc_opts_dev_cleanup:OK
  #264     tc_opts_invalid:OK
  #265     tc_opts_max:OK
  #266     tc_opts_mixed:OK
  #267     tc_opts_prepend:OK
  #268     tc_opts_query:OK
  #269     tc_opts_query_attach:OK
  #270     tc_opts_replace:OK
  #271     tc_opts_revision:OK
  Summary: 20/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20231017081728.24769-1-daniel@iogearbox.net
2023-10-17 12:57:43 -07:00
Yafang Shao
44cb03f19b selftests/bpf: Add selftest for bpf_task_under_cgroup() in sleepable prog
The result is as follows:

  $ tools/testing/selftests/bpf/test_progs --name=task_under_cgroup
  #237     task_under_cgroup:OK
  Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Without the previous patch, there will be RCU warnings in dmesg when
CONFIG_PROVE_RCU is enabled. While with the previous patch, there will
be no warnings.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20231007135945.4306-2-laoar.shao@gmail.com
2023-10-17 18:31:27 +02:00
Andrii Nakryiko
137df1189d libbpf: Don't assume SHT_GNU_verdef presence for SHT_GNU_versym section
Fix too eager assumption that SHT_GNU_verdef ELF section is going to be
present whenever binary has SHT_GNU_versym section. It seems like either
SHT_GNU_verdef or SHT_GNU_verneed can be used, so failing on missing
SHT_GNU_verdef actually breaks use cases in production.

One specific reported issue, which was used to manually test this fix,
was trying to attach to `readline` function in BASH binary.

Fixes: bb7fa09399b9 ("libbpf: Support symbol versioning for uprobe")
Reported-by: Liam Wisehart <liamwisehart@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Manu Bretelle <chantr4@gmail.com>
Reviewed-by: Fangrui Song <maskray@google.com>
Acked-by: Hengqi Chen <hengqi.chen@gmail.com>
Link: https://lore.kernel.org/bpf/20231016182840.4033346-1-andrii@kernel.org
2023-10-17 11:43:20 +02:00
Jakub Kicinski
a3c2dd9648 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZS1d4wAKCRDbK58LschI
 g4DSAP441CdKh8fd+wNKUSKHFbpCQ6EvocR6Nf+Sj2DFUx/w/QEA7mfju7Abqjc3
 xwDEx0BuhrjMrjV5MmEpxc7lYl9XcQU=
 =vuWk
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-10-16

We've added 90 non-merge commits during the last 25 day(s) which contain
a total of 120 files changed, 3519 insertions(+), 895 deletions(-).

The main changes are:

1) Add missed stats for kprobes to retrieve the number of missed kprobe
   executions and subsequent executions of BPF programs, from Jiri Olsa.

2) Add cgroup BPF sockaddr hooks for unix sockets. The use case is
   for systemd to reimplement the LogNamespace feature which allows
   running multiple instances of systemd-journald to process the logs
   of different services, from Daan De Meyer.

3) Implement BPF CPUv4 support for s390x BPF JIT, from Ilya Leoshkevich.

4) Improve BPF verifier log output for scalar registers to better
   disambiguate their internal state wrt defaults vs min/max values
   matching, from Andrii Nakryiko.

5) Extend the BPF fib lookup helpers for IPv4/IPv6 to support retrieving
   the source IP address with a new BPF_FIB_LOOKUP_SRC flag,
   from Martynas Pumputis.

6) Add support for open-coded task_vma iterator to help with symbolization
   for BPF-collected user stacks, from Dave Marchevsky.

7) Add libbpf getters for accessing individual BPF ring buffers which
   is useful for polling them individually, for example, from Martin Kelly.

8) Extend AF_XDP selftests to validate the SHARED_UMEM feature,
   from Tushar Vyavahare.

9) Improve BPF selftests cross-building support for riscv arch,
   from Björn Töpel.

10) Add the ability to pin a BPF timer to the same calling CPU,
   from David Vernet.

11) Fix libbpf's bpf_tracing.h macros for riscv to use the generic
   implementation of PT_REGS_SYSCALL_REGS() to access syscall arguments,
   from Alexandre Ghiti.

12) Extend libbpf to support symbol versioning for uprobes, from Hengqi Chen.

13) Fix bpftool's skeleton code generation to guarantee that ELF data
    is 8 byte aligned, from Ian Rogers.

14) Inherit system-wide cpu_mitigations_off() setting for Spectre v1/v4
    security mitigations in BPF verifier, from Yafang Shao.

15) Annotate struct bpf_stack_map with __counted_by attribute to prepare
    BPF side for upcoming __counted_by compiler support, from Kees Cook.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (90 commits)
  bpf: Ensure proper register state printing for cond jumps
  bpf: Disambiguate SCALAR register state output in verifier logs
  selftests/bpf: Make align selftests more robust
  selftests/bpf: Improve missed_kprobe_recursion test robustness
  selftests/bpf: Improve percpu_alloc test robustness
  selftests/bpf: Add tests for open-coded task_vma iter
  bpf: Introduce task_vma open-coded iterator kfuncs
  selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c
  bpf: Don't explicitly emit BTF for struct btf_iter_num
  bpf: Change syscall_nr type to int in struct syscall_tp_t
  net/bpf: Avoid unused "sin_addr_len" warning when CONFIG_CGROUP_BPF is not set
  bpf: Avoid unnecessary audit log for CPU security mitigations
  selftests/bpf: Add tests for cgroup unix socket address hooks
  selftests/bpf: Make sure mount directory exists
  documentation/bpf: Document cgroup unix socket address hooks
  bpftool: Add support for cgroup unix socket address hooks
  libbpf: Add support for cgroup unix socket address hooks
  bpf: Implement cgroup sockaddr hooks for unix sockets
  bpf: Add bpf_sock_addr_set_sun_path() to allow writing unix sockaddr from bpf
  bpf: Propagate modified uaddrlen from cgroup sockaddr programs
  ...
====================

Link: https://lore.kernel.org/r/20231016204803.30153-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-16 21:05:33 -07:00
Linus Torvalds
86d6a628a2 ARM:
- Fix the handling of the phycal timer offset when FEAT_ECV
   and CNTPOFF_EL2 are implemented.
 
 - Restore the functionnality of Permission Indirection that
   was broken by the Fine Grained Trapping rework
 
 - Cleanup some PMU event sharing code
 
 MIPS:
 
 - Fix W=1 build.
 
 s390:
 
 - One small fix for gisa to avoid stalls.
 
 x86:
 
 - Truncate writes to PMU counters to the counter's width to avoid spurious
   overflows when emulating counter events in software.
 
 - Set the LVTPC entry mask bit when handling a PMI (to match Intel-defined
   architectural behavior).
 
 - Treat KVM_REQ_PMI as a wake event instead of queueing host IRQ work to
   kick the guest out of emulated halt.
 
 - Fix for loading XSAVE state from an old kernel into a new one.
 
 - Fixes for AMD AVIC
 
 selftests:
 
 - Play nice with %llx when formatting guest printf and assert statements.
 
 - Clean up stale test metadata.
 
 - Zero-initialize structures in memslot perf test to workaround a suspected
   "may be used uninitialized" false positives from GCC.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmUtvXgUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOE3gf/Q0Xvi/oU/+iDMuvfCbMZg/nhbrsa
 WmE/zXLrCF0DknppAsWulkhLGL2ceL6X+f37f2vWpBdG9SVDG/vSAg+QQDwsXiKN
 hTJoaybtMMPZM64emPZKeLMrq3A/U32qIMmWMJkoQRyz6dftUhGqZEuy1jw8oomJ
 n9idRDCMkbo+bick4URt0FEuI3Tf6dPIlG7P5hObFTw+nenzzxTjoxWZ432Mgyod
 yqveEke4hcQ+6K1zdAcDNZqT9ZhxeTxAO4yrBAYfnFoPLhUXKDUumkqAQPNOhKTo
 YN+b29kHBm+HvYkHN785FQla/13wjE1aq5TUj5J7NEDv4uRXDefDq2OAeg==
 =b9AY
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Fix the handling of the phycal timer offset when FEAT_ECV and
     CNTPOFF_EL2 are implemented

   - Restore the functionnality of Permission Indirection that was
     broken by the Fine Grained Trapping rework

   - Cleanup some PMU event sharing code

  MIPS:

   - Fix W=1 build

  s390:

   - One small fix for gisa to avoid stalls

  x86:

   - Truncate writes to PMU counters to the counter's width to avoid
     spurious overflows when emulating counter events in software

   - Set the LVTPC entry mask bit when handling a PMI (to match
     Intel-defined architectural behavior)

   - Treat KVM_REQ_PMI as a wake event instead of queueing host IRQ work
     to kick the guest out of emulated halt

   - Fix for loading XSAVE state from an old kernel into a new one

   - Fixes for AMD AVIC

  selftests:

   - Play nice with %llx when formatting guest printf and assert
     statements

   - Clean up stale test metadata

   - Zero-initialize structures in memslot perf test to workaround a
     suspected 'may be used uninitialized' false positives from GCC"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
  KVM: arm64: timers: Correctly handle TGE flip with CNTPOFF_EL2
  KVM: arm64: POR{E0}_EL1 do not need trap handlers
  KVM: arm64: Add nPIR{E0}_EL1 to HFG traps
  KVM: MIPS: fix -Wunused-but-set-variable warning
  KVM: arm64: pmu: Drop redundant check for non-NULL kvm_pmu_events
  KVM: SVM: Fix build error when using -Werror=unused-but-set-variable
  x86: KVM: SVM: refresh AVIC inhibition in svm_leave_nested()
  x86: KVM: SVM: add support for Invalid IPI Vector interception
  x86: KVM: SVM: always update the x2avic msr interception
  KVM: selftests: Force load all supported XSAVE state in state test
  KVM: selftests: Load XSAVE state into untouched vCPU during state test
  KVM: selftests: Touch relevant XSAVE state in guest for state test
  KVM: x86: Constrain guest-supported xfeatures only at KVM_GET_XSAVE{2}
  x86/fpu: Allow caller to constrain xfeatures when copying to uabi buffer
  KVM: selftests: Zero-initialize entire test_result in memslot perf test
  KVM: selftests: Remove obsolete and incorrect test case metadata
  KVM: selftests: Treat %llx like %lx when formatting guest printf
  KVM: x86/pmu: Synthesize at most one PMI per VM-exit
  KVM: x86: Mask LVTPC when handling a PMI
  KVM: x86/pmu: Truncate counter value to allowed width on write
  ...
2023-10-16 18:34:17 -07:00
Chuck Lever
f14122b2c2 tools: ynl: Add source files for nfsd netlink protocol
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-10-16 12:44:10 -04:00
Andrii Nakryiko
72f8a1de4a bpf: Disambiguate SCALAR register state output in verifier logs
Currently the way that verifier prints SCALAR_VALUE register state (and
PTR_TO_PACKET, which can have var_off and ranges info as well) is very
ambiguous.

In the name of brevity we are trying to eliminate "unnecessary" output
of umin/umax, smin/smax, u32_min/u32_max, and s32_min/s32_max values, if
possible. Current rules are that if any of those have their default
value (which for mins is the minimal value of its respective types: 0,
S32_MIN, or S64_MIN, while for maxs it's U32_MAX, S32_MAX, S64_MAX, or
U64_MAX) *OR* if there is another min/max value that as matching value.
E.g., if smin=100 and umin=100, we'll emit only umin=10, omitting smin
altogether. This approach has a few problems, being both ambiguous and
sort-of incorrect in some cases.

Ambiguity is due to missing value could be either default value or value
of umin/umax or smin/smax. This is especially confusing when we mix
signed and unsigned ranges. Quite often, umin=0 and smin=0, and so we'll
have only `umin=0` leaving anyone reading verifier log to guess whether
smin is actually 0 or it's actually -9223372036854775808 (S64_MIN). And
often times it's important to know, especially when debugging tricky
issues.

"Sort-of incorrectness" comes from mixing negative and positive values.
E.g., if umin is some large positive number, it can be equal to smin
which is, interpreted as signed value, is actually some negative value.
Currently, that smin will be omitted and only umin will be emitted with
a large positive value, giving an impression that smin is also positive.

Anyway, ambiguity is the biggest issue making it impossible to have an
exact understanding of register state, preventing any sort of automated
testing of verifier state based on verifier log. This patch is
attempting to rectify the situation by removing ambiguity, while
minimizing the verboseness of register state output.

The rules are straightforward:
  - if some of the values are missing, then it definitely has a default
  value. I.e., `umin=0` means that umin is zero, but smin is actually
  S64_MIN;
  - all the various boundaries that happen to have the same value are
  emitted in one equality separated sequence. E.g., if umin and smin are
  both 100, we'll emit `smin=umin=100`, making this explicit;
  - we do not mix negative and positive values together, and even if
  they happen to have the same bit-level value, they will be emitted
  separately with proper sign. I.e., if both umax and smax happen to be
  0xffffffffffffffff, we'll emit them both separately as
  `smax=-1,umax=18446744073709551615`;
  - in the name of a bit more uniformity and consistency,
  {u32,s32}_{min,max} are renamed to {s,u}{min,max}32, which seems to
  improve readability.

The above means that in case of all 4 ranges being, say, [50, 100] range,
we'd previously see hugely ambiguous:

    R1=scalar(umin=50,umax=100)

Now, we'll be more explicit:

    R1=scalar(smin=umin=smin32=umin32=50,smax=umax=smax32=umax32=100)

This is slightly more verbose, but distinct from the case when we don't
know anything about signed boundaries and 32-bit boundaries, which under
new rules will match the old case:

    R1=scalar(umin=50,umax=100)

Also, in the name of simplicity of implementation and consistency, order
for {s,u}32_{min,max} are emitted *before* var_off. Previously they were
emitted afterwards, for unclear reasons.

This patch also includes a few fixes to selftests that expect exact
register state to accommodate slight changes to verifier format. You can
see that the changes are pretty minimal in common cases.

Note, the special case when SCALAR_VALUE register is a known constant
isn't changed, we'll emit constant value once, interpreted as signed
value.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20231011223728.3188086-5-andrii@kernel.org
2023-10-16 13:49:18 +02:00
Andrii Nakryiko
cde7851428 selftests/bpf: Make align selftests more robust
Align subtest is very specific and finicky about expected verifier log
output and format. This is often completely unnecessary as in a bunch of
situations test actually cares about var_off part of register state. But
given how exact it is right now, any tiny verifier log changes can lead
to align tests failures, requiring constant adjustment.

This patch tries to make this a bit more robust by making logic first
search for specified register and then allowing to match only portion of
register state, not everything exactly. This will come handly with
follow up changes to SCALAR register output disambiguation.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20231011223728.3188086-4-andrii@kernel.org
2023-10-16 13:49:18 +02:00
Andrii Nakryiko
08a7078fea selftests/bpf: Improve missed_kprobe_recursion test robustness
Given missed_kprobe_recursion is non-serial and uses common testing
kfuncs to count number of recursion misses it's possible that some other
parallel test can trigger extraneous recursion misses. So we can't
expect exactly 1 miss. Relax conditions and expect at least one.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20231011223728.3188086-3-andrii@kernel.org
2023-10-16 13:49:18 +02:00
Andrii Nakryiko
2d78928c9c selftests/bpf: Improve percpu_alloc test robustness
Make these non-serial tests filter BPF programs by intended PID of
a test runner process. This makes it isolated from other parallel tests
that might interfere accidentally.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20231011223728.3188086-2-andrii@kernel.org
2023-10-16 13:49:18 +02:00