23629 Commits

Author SHA1 Message Date
Andrii Nakryiko
919d2b1dbb libbpf: Allow modification of BTF and add btf__add_str API
Allow internal BTF representation to switch from default read-only mode, in
which raw BTF data is a single non-modifiable block of memory with BTF header,
types, and strings layed out sequentially and contiguously in memory, into
a writable representation with types and strings data split out into separate
memory regions, that can be dynamically expanded.

Such writable internal representation is transparent to users of libbpf APIs,
but allows to append new types and strings at the end of BTF, which is
a typical use case when generating BTF programmatically. All the basic
guarantees of BTF types and strings layout is preserved, i.e., user can get
`struct btf_type *` pointer and read it directly. Such btf_type pointers might
be invalidated if BTF is modified, so some care is required in such mixed
read/write scenarios.

Switch from read-only to writable configuration happens automatically the
first time when user attempts to modify BTF by either adding a new type or new
string. It is still possible to get raw BTF data, which is a single piece of
memory that can be persisted in ELF section or into a file as raw BTF. Such
raw data memory is also still owned by BTF and will be freed either when BTF
object is freed or if another modification to BTF happens, as any modification
invalidates BTF raw representation.

This patch adds the first two BTF manipulation APIs: btf__add_str(), which
allows to add arbitrary strings to BTF string section, and btf__find_str()
which allows to find existing string offset, but not add it if it's missing.
All the added strings are automatically deduplicated. This is achieved by
maintaining an additional string lookup index for all unique strings. Such
index is built when BTF is switched to modifiable mode. If at that time BTF
strings section contained duplicate strings, they are not de-duplicated. This
is done specifically to not modify the existing content of BTF (types, their
string offsets, etc), which can cause confusion and is especially important
property if there is struct btf_ext associated with struct btf. By following
this "imperfect deduplication" process, btf_ext is kept consitent and correct.
If deduplication of strings is necessary, it can be forced by doing BTF
deduplication, at which point all the strings will be eagerly deduplicated and
all string offsets both in struct btf and struct btf_ext will be updated.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200926011357.2366158-6-andriin@fb.com
2020-09-28 17:27:31 -07:00
Andrii Nakryiko
7d9c71e10b libbpf: Extract generic string hashing function for reuse
Calculating a hash of zero-terminated string is a common need when using
hashmap, so extract it for reuse.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200926011357.2366158-5-andriin@fb.com
2020-09-28 17:27:31 -07:00
Andrii Nakryiko
192f5a1fe6 libbpf: Generalize common logic for managing dynamically-sized arrays
Managing dynamically-sized array is a common, but not trivial functionality,
which significant amount of logic and code to implement properly. So instead
of re-implementing it all the time, extract it into a helper function ans
reuse.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200926011357.2366158-4-andriin@fb.com
2020-09-28 17:27:31 -07:00
Andrii Nakryiko
b86042478f libbpf: Remove assumption of single contiguous memory for BTF data
Refactor internals of struct btf to remove assumptions that BTF header, type
data, and string data are layed out contiguously in a memory in a single
memory allocation. Now we have three separate pointers pointing to the start
of each respective are: header, types, strings. In the next patches, these
pointers will be re-assigned to point to independently allocated memory areas,
if BTF needs to be modified.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200926011357.2366158-3-andriin@fb.com
2020-09-28 17:27:31 -07:00
Andrii Nakryiko
740e69c3c5 libbpf: Refactor internals of BTF type index
Refactor implementation of internal BTF type index to not use direct pointers.
Instead it uses offset relative to the start of types data section. This
allows for types data to be reallocatable, enabling implementation of
modifiable BTF.

As now getting type by ID has an extra indirection step, convert all internal
type lookups to a new helper btf_type_id(), that returns non-const pointer to
a type by its ID.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200926011357.2366158-2-andriin@fb.com
2020-09-28 17:27:31 -07:00
Toke Høiland-Jørgensen
b000def2e0 selftests: Remove fmod_ret from test_overhead
The test_overhead prog_test included an fmod_ret program that attached to
__set_task_comm() in the kernel. However, this function was never listed as
allowed for return modification, so this only worked because of the
verifier skipping tests when a trampoline already existed for the attach
point. Now that the verifier checks have been fixed, remove fmod_ret from
the test so it works again.

Fixes: 4eaf0b5c5e04 ("selftest/bpf: Fmod_ret prog and implement test_overhead as part of bench")
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-28 17:20:28 -07:00
Lorenz Bauer
5b87adc3ce selftest: bpf: Test copying a sockmap and sockhash
Since we can now call map_update_elem(sockmap) from bpf_iter context
it's possible to copy a sockmap or sockhash in the kernel. Add a
selftest which exercises this.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200928090805.23343-5-lmb@cloudflare.com
2020-09-28 16:48:02 -07:00
Lorenz Bauer
2787031733 selftests: bpf: Remove shared header from sockmap iter test
The shared header to define SOCKMAP_MAX_ENTRIES is a bit overkill.
Dynamically allocate the sock_fd array based on bpf_map__max_entries
instead.

Suggested-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200928090805.23343-4-lmb@cloudflare.com
2020-09-28 16:48:02 -07:00
Lorenz Bauer
26c3270ddb selftests: bpf: Add helper to compare socket cookies
We compare socket cookies to ensure that insertion into a sockmap worked.
Pull this out into a helper function for use in other tests.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200928090805.23343-3-lmb@cloudflare.com
2020-09-28 16:47:58 -07:00
Song Liu
09d8ad1688 selftests/bpf: Add raw_tp_test_run
This test runs test_run for raw_tracepoint program. The test covers ctx
input, retval output, and running on correct cpu.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200925205432.1777-4-songliubraving@fb.com
2020-09-28 21:52:36 +02:00
Song Liu
88f7fe7233 libbpf: Support test run of raw tracepoint programs
Add bpf_prog_test_run_opts() with support of new fields in bpf_attr.test,
namely, flags and cpu. Also extend _opts operations to support outputs via
opts.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200925205432.1777-3-songliubraving@fb.com
2020-09-28 21:52:36 +02:00
Song Liu
1b4d60ec16 bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint
Add .test_run for raw_tracepoint. Also, introduce a new feature that runs
the target program on a specific CPU. This is achieved by a new flag in
bpf_attr.test, BPF_F_TEST_RUN_ON_CPU. When this flag is set, the program
is triggered on cpu with id bpf_attr.test.cpu. This feature is needed for
BPF programs that handle perf_event and other percpu resources, as the
program can access these resource locally.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200925205432.1777-2-songliubraving@fb.com
2020-09-28 21:52:36 +02:00
Jakub Kicinski
8c4cf4bc3e selftests: net: add a test for static UDP tunnel ports
Check UDP_TUNNEL_NIC_INFO_STATIC_IANA_VXLAN works as expected.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28 12:50:12 -07:00
Jakub Kicinski
53db3e53e2 selftests: net: add a test for shared UDP tunnel info tables
Add a test run of checks validating the shared UDP tunnel port
tables function as we expect.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28 12:50:12 -07:00
Arnaldo Carvalho de Melo
717d182e41 Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes and get v5.10 development in sync with the main kernel
sources.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-28 15:44:52 -03:00
Ian Rogers
a55b7bb1c1 perf test: Fix msan uninitialized use.
Ensure 'st' is initialized before an error branch is taken.
Fixes test "67: Parse and process metrics" with LLVM msan:

  ==6757==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x5570edae947d in rblist__exit tools/perf/util/rblist.c:114:2
    #1 0x5570edb1c6e8 in runtime_stat__exit tools/perf/util/stat-shadow.c:141:2
    #2 0x5570ed92cfae in __compute_metric tools/perf/tests/parse-metric.c:187:2
    #3 0x5570ed92cb74 in compute_metric tools/perf/tests/parse-metric.c:196:9
    #4 0x5570ed92c6d8 in test_recursion_fail tools/perf/tests/parse-metric.c:318:2
    #5 0x5570ed92b8c8 in test__parse_metric tools/perf/tests/parse-metric.c:356:2
    #6 0x5570ed8de8c1 in run_test tools/perf/tests/builtin-test.c:410:9
    #7 0x5570ed8ddadf in test_and_print tools/perf/tests/builtin-test.c:440:9
    #8 0x5570ed8dca04 in __cmd_test tools/perf/tests/builtin-test.c:661:4
    #9 0x5570ed8dbc07 in cmd_test tools/perf/tests/builtin-test.c:807:9
    #10 0x5570ed7326cc in run_builtin tools/perf/perf.c:313:11
    #11 0x5570ed731639 in handle_internal_command tools/perf/perf.c:365:8
    #12 0x5570ed7323cd in run_argv tools/perf/perf.c:409:2
    #13 0x5570ed731076 in main tools/perf/perf.c:539:3

Fixes: commit f5a56570a3f2 ("perf test: Fix memory leaks in parse-metric test")
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: clang-built-linux@googlegroups.com
Link: http://lore.kernel.org/lkml/20200923210655.4143682-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-28 09:24:01 -03:00
Ian Rogers
aa98d8482c perf parse-events: Reduce casts around bp_addr
perf_event_attr bp_addr is a u64. parse-events.y parses it as a u64, but
casts it to a void* and then parse-events.c casts it back to a u64.
Rather than all the casts, change the type of the address to be a u64.

This removes an issue noted in:

  https://lore.kernel.org/lkml/20200903184359.GC3495158@kernel.org/

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200925003903.561568-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-28 09:22:39 -03:00
Namhyung Kim
40b74c30ff perf test: Add expand cgroup event test
It'll expand given events for cgroups A, B and C.

  $ perf test -v expansion
  69: Event expansion for cgroups                      :
  --- start ---
  test child forked, pid 983140
  metric expr 1 / IPC for CPI
  metric expr instructions / cycles for IPC
  found event instructions
  found event cycles
  adding {instructions,cycles}:W
  copying metric event for cgroup 'A': instructions (idx=0)
  copying metric event for cgroup 'B': instructions (idx=0)
  copying metric event for cgroup 'C': instructions (idx=0)
  test child finished with 0
  ---- end ----
  Event expansion for cgroups: Ok

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200924124455.336326-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-28 09:21:05 -03:00
Namhyung Kim
89fb1ca2ab perf tools: Allow creation of cgroup without open
This is a preparation for a test case of expanding events for multiple
cgroups.  Instead of using real system cgroup, the test will use fake
cgroups so it needs a way to have them without a open file descriptor.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200924124455.336326-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-28 09:18:06 -03:00
Namhyung Kim
b214ba8c42 perf tools: Copy metric events properly when expand cgroups
The metricgroup__copy_metric_events() is to handle metrics events when
expanding event for cgroups.  As the metric events keep pointers to
evsel, it should be refreshed when events are cloned during the
operation.

The perf_stat__collect_metric_expr() is also called in case an event has
a metric directly.

During the copy, it references evsel by index as the evlist now has
cloned evsels for the given cgroup.

Also kernel test robot found an issue in the python module import so add
empty implementations of those two functions to fix it.

Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200924124455.336326-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-28 09:16:21 -03:00
Namhyung Kim
d1c5a0e86a perf stat: Add --for-each-cgroup option
The --for-each-cgroup option is a syntax sugar to monitor large number
of cgroups easily.  Current command line requires to list all the events
and cgroups even if users want to monitor same events for each cgroup.
This patch addresses that usage by copying given events for each cgroup
on user's behalf.

For instance, if they want to monitor 6 events for 200 cgroups each they
should write 1200 event names (with -e) AND 1200 cgroup names (with -G)
on the command line.  But with this change, they can just specify 6
events and 200 cgroups with a new option.

A simpler example below: It wants to measure 3 events for 2 cgroups ('A'
and 'B').  The result is that total 6 events are counted like below.

  $ perf stat -a -e cpu-clock,cycles,instructions --for-each-cgroup A,B sleep 1

   Performance counter stats for 'system wide':

              988.18 msec cpu-clock                 A #    0.987 CPUs utilized
       3,153,761,702      cycles                    A #    3.200 GHz                      (100.00%)
       8,067,769,847      instructions              A #    2.57  insn per cycle           (100.00%)
              982.71 msec cpu-clock                 B #    0.982 CPUs utilized
       3,136,093,298      cycles                    B #    3.182 GHz                      (99.99%)
       8,109,619,327      instructions              B #    2.58  insn per cycle           (99.99%)

         1.001228054 seconds time elapsed

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200924124455.336326-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-28 09:07:08 -03:00
Paolo Bonzini
0c899c25d7 KVM: x86: do not attempt TSC synchronization on guest writes
KVM special-cases writes to MSR_IA32_TSC so that all CPUs have
the same base for the TSC.  This logic is complicated, and we
do not want it to have any effect once the VM is started.

In particular, if any guest started to synchronize its TSCs
with writes to MSR_IA32_TSC rather than MSR_IA32_TSC_ADJUST,
the additional effect of kvm_write_tsc code would be uncharted
territory.

Therefore, this patch makes writes to MSR_IA32_TSC behave
essentially the same as writes to MSR_IA32_TSC_ADJUST when
they come from the guest.  A new selftest (which passes
both before and after the patch) checks the current semantics
of writes to MSR_IA32_TSC and MSR_IA32_TSC_ADJUST originating
from both the host and the guest.

Upcoming work to remove the special side effects
of host-initiated writes to MSR_IA32_TSC and MSR_IA32_TSC_ADJUST
will be able to build onto this test, adjusting the host side
to use the new APIs and achieve the same effect.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:59:52 -04:00
Alexander Graf
d468706e31 KVM: selftests: Add test for user space MSR handling
Now that we have the ability to handle MSRs from user space and also to
select which ones we do want to prevent in-kernel KVM code from handling,
let's add a selftest to show case and verify the API.

Signed-off-by: Alexander Graf <graf@amazon.com>

Message-Id: <20200925143422.21718-9-graf@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:58:45 -04:00
Sean Christopherson
7f3603b631 KVM: VMX: Rename RDTSCP secondary exec control name to insert "ENABLE"
Rename SECONDARY_EXEC_RDTSCP to SECONDARY_EXEC_ENABLE_RDTSCP in
preparation for consolidating the logic for adjusting secondary exec
controls based on the guest CPUID model.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923165048.20486-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:30 -04:00
Namhyung Kim
7fedd9b84b perf evsel: Add evsel__clone() function
The evsel__clone() is to create an exactly same evsel from same
attributes.  The function assumes the given evsel is not configured
yet so it cares fields set during event parsing.  Those fields are now
moved together as Jiri suggested.  Note that metric events will be
handled by later patch.

It will be used by perf stat to generate separate events for each
cgroup.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200924124455.336326-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-28 08:55:48 -03:00
Jin Yao
b5ff7f2799 perf vendor events: Update SkylakeX events to v1.21
- Update SkylakeX events to v1.21.
- Update SkylakeX JSON metrics from TMAM 4.0.

Other fixes:

- Add NO_NMI_WATCHDOG metric constraint to Backend_Bound
- Fix misspelled error

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/20200922031918.3723-1-yao.jin@linux.intel.com/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-28 08:46:47 -03:00
Jin Yao
038d3b53c2 perf vendor events intel: Update CascadelakeX events to v1.08
- Update CascadelakeX events to v1.08.
- Update CascadelakeX JSON metrics from TMAM 4.0.

Other fixes:

- Add NO_NMI_WATCHDOG metric constraint to Backend_Bound
- Change 'MB/sec' to 'MB' in UNC_M_PMM_BANDWIDTH.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Link: https://lore.kernel.org/lkml/20200922031918.3723-1-yao.jin@linux.intel.com/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-28 08:46:37 -03:00
Jakub Kicinski
090bc03bc9 netdevsim: fix duplicated debugfs directory
The "ethtool" debugfs directory holds per-netdev knobs, so move
it from the device instance directory to the port directory.

This fixes the following warning when creating multiple ports:

 debugfs: Directory 'ethtool' with parent 'netdevsim1' already present!

Fixes: ff1f7c17fb20 ("netdevsim: add pause frame stats")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-26 14:19:08 -07:00
Jacob Keller
cbb58368fb netdevsim: add support for flash_update overwrite mask
The devlink interface recently gained support for a new "overwrite mask"
parameter that allows specifying how various sub-sections of a flash
component are modified when updating.

Add support for this to netdevsim, to enable easily testing the
interface. Make the allowed overwrite mask values controllable via
a debugfs parameter. This enables testing a flow where the driver
rejects an unsupportable overwrite mask.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-25 17:20:57 -07:00
Jacob Keller
bc75c054f0 devlink: convert flash_update to use params structure
The devlink core recently gained support for checking whether the driver
supports a flash_update parameter, via `supported_flash_update_params`.
However, parameters are specified as function arguments. Adding a new
parameter still requires modifying the signature of the .flash_update
callback in all drivers.

Convert the .flash_update function to take a new `struct
devlink_flash_update_params` instead. By using this structure, and the
`supported_flash_update_params` bit field, a new parameter to
flash_update can be added without requiring modification to existing
drivers.

As before, all parameters except file_name will require driver opt-in.
Because file_name is a necessary field to for the flash_update to make
sense, no "SUPPORTED" bitflag is provided and it is always considered
valid. All future additional parameters will require a new bit in the
supported_flash_update_params bitfield.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Bin Luo <luobin9@huawei.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Ido Schimmel <idosch@mellanox.com>
Cc: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-25 17:20:57 -07:00
Linus Torvalds
7c7ec3226f Five small fixes. The nested migration bug will be fixed
with a better API in 5.10 or 5.11, for now this is a fix
 that works with existing userspace but keeps the current
 ugly API.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl9ufLMUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMe9AgAgU3YQ2SktkqEOXjHMLqCH5Y3PKFI
 S2anYpoKlH36Q6kzoqtkCj0GVagvdh5+Envz3I/tMdhv3Y/JgZaX1wHAe4cUl9BT
 VyoiDBTWkhYRmpUbLYA8AtmgxQw1Hp8srH86rnvVGmLG6zdAa/rgUAKiQgT688Ej
 CQvF5H7Zi3viPo2rInNSkgTIgewduqSWkwJ6+h4AQMmNJpbRaeZs45yMYyyu/FIi
 hUazy7Rwk2vkWcuTd/sqH9b9y3VCYpN9juRaehEiK8qxXT3ydTU4Tub25BHmvXdr
 dx5pShG4P3nAGnfV1qKAemyQcY7sjfMieqN1F3QcsRcxqZgySUm11o2JRw==
 =sHsX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more kvm fixes from Paolo Bonzini:
 "Five small fixes.

  The nested migration bug will be fixed with a better API in 5.10 or
  5.11, for now this is a fix that works with existing userspace but
  keeps the current ugly API"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: Add a dedicated INVD intercept routine
  KVM: x86: Reset MMU context if guest toggles CR4.SMAP or CR4.PKE
  KVM: x86: fix MSR_IA32_TSC read for nested migration
  selftests: kvm: Fix assert failure in single-step test
  KVM: x86: VMX: Make smaller physical guest address space support user-configurable
2020-09-25 17:15:19 -07:00
John Fastabend
99d4def4d0 bpf: Add AND verifier test case where 32bit and 64bit bounds differ
If we AND two values together that are known in the 32bit subregs, but not
known in the 64bit registers we rely on the tnum value to report the 32bit
subreg is known. And do not use mark_reg_known() directly from
scalar32_min_max_and()

Add an AND test to cover the case with known 32bit subreg, but unknown
64bit reg.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-25 16:47:21 -07:00
J. Bruce Fields
e56dc9e294 nfsd: remove fault injection code
It was an interesting idea but nobody seems to be using it, it's buggy
at this point, and nfs4state.c is already complicated enough without it.
The new nfsd/clients/ code provides some of the same functionality, and
could probably do more if desired.

This feature has been deprecated since 9d60d93198c6 ("Deprecate nfsd
fault injection").

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2020-09-25 18:01:26 -04:00
Martin KaFai Lau
9a856cae22 bpf: selftest: Add test_btf_skc_cls_ingress
This patch attaches a classifier prog to the ingress filter.
It exercises the following helpers with different socket pointer
types in different logical branches:
1. bpf_sk_release()
2. bpf_sk_assign()
3. bpf_skc_to_tcp_request_sock(), bpf_skc_to_tcp_sock()
4. bpf_tcp_gen_syncookie, bpf_tcp_check_syncookie

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200925000458.3859627-1-kafai@fb.com
2020-09-25 13:58:02 -07:00
Martin KaFai Lau
0c402c6c30 bpf: selftest: Remove enum tcp_ca_state from bpf_tcp_helpers.h
The enum tcp_ca_state is available in <linux/tcp.h>.
Remove it from the bpf_tcp_helpers.h to avoid conflict when the bpf prog
needs to include both both <linux/tcp.h> and bpf_tcp_helpers.h.

Modify the bpf_cubic.c and bpf_dctcp.c to use <linux/tcp.h> instead.
The <linux/stddef.h> is needed by <linux/tcp.h>.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200925000452.3859313-1-kafai@fb.com
2020-09-25 13:58:02 -07:00
Martin KaFai Lau
edc2d66ad1 bpf: selftest: Use bpf_skc_to_tcp_sock() in the sock_fields test
This test uses bpf_skc_to_tcp_sock() to get a kernel tcp_sock ptr "ktp".
Access the ktp->lsndtime and also pass ktp to bpf_sk_storage_get().

It also exercises the bpf_sk_cgroup_id() and bpf_sk_ancestor_cgroup_id()
with the "ktp".  To do that, a parent cgroup and a child cgroup are
created.  The bpf prog is attached to the child cgroup.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200925000446.3858975-1-kafai@fb.com
2020-09-25 13:58:02 -07:00
Martin KaFai Lau
c40a565a04 bpf: selftest: Use network_helpers in the sock_fields test
This patch uses start_server() and connect_to_fd() from network_helpers.h
to remove the network testing boiler plate codes.  epoll is no longer
needed also since the timeout has already been taken care of also.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200925000440.3858639-1-kafai@fb.com
2020-09-25 13:58:02 -07:00
Martin KaFai Lau
b18c1f0aa4 bpf: selftest: Adapt sock_fields test to use skel and global variables
skel is used.

Global variables are used to store the result from bpf prog.
addr_map, sock_result_map, and tcp_sock_result_map are gone.
Instead, global variables listen_tp, srv_sa6, cli_tp,, srv_tp,
listen_sk, srv_sk, and cli_sk are added.
Because of that, bpf_addr_array_idx and bpf_result_array_idx are also
no longer needed.

CHECK() macro from test_progs.h is reused and bail as soon as
a CHECK failure.

shutdown() is used to ensure the previous data-ack is received.
The bytes_acked, bytes_received, and the pkt_out_cnt checks are
using "<" to accommodate the final ack may not have been received/sent.
It is enough since it is not the focus of this test.

The sk local storage is all initialized to 0xeB9F now, so the
check_sk_pkt_out_cnt() always checks with the 0xeB9F base.  It is to
keep things simple.

The next patch will reuse helpers from network_helpers.h to simplify
things further.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200925000434.3858204-1-kafai@fb.com
2020-09-25 13:58:02 -07:00
Martin KaFai Lau
6f521a2bd2 bpf: selftest: Move sock_fields test into test_progs
This is a mechanical change to
1. move test_sock_fields.c to prog_tests/sock_fields.c
2. rename progs/test_sock_fields_kern.c to progs/test_sock_fields.c

Minimal change is made to the code itself.  Next patch will make
changes to use new ways of writing test, e.g. use skel and global
variables.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200925000427.3857814-1-kafai@fb.com
2020-09-25 13:58:02 -07:00
Martin KaFai Lau
5d13746dd8 bpf: selftest: Add ref_tracking verifier test for bpf_skc casting
The patch tests for:
1. bpf_sk_release() can be called on a tcp_sock btf_id ptr.

2. Ensure the tcp_sock btf_id pointer cannot be used
   after bpf_sk_release().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200925000421.3857616-1-kafai@fb.com
2020-09-25 13:58:02 -07:00
Martin KaFai Lau
27e5203bd9 bpf: Change bpf_sk_assign to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON
This patch changes the bpf_sk_assign() to take
ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer
returned by the bpf_skc_to_*() helpers also.

The bpf_sk_lookup_assign() is taking ARG_PTR_TO_SOCKET_"OR_NULL".  Meaning
it specifically takes a literal NULL.  ARG_PTR_TO_BTF_ID_SOCK_COMMON
does not allow a literal NULL, so another ARG type is required
for this purpose and another follow-up patch can be used if
there is such need.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200925000415.3857374-1-kafai@fb.com
2020-09-25 13:58:02 -07:00
Martin KaFai Lau
c0df236e13 bpf: Change bpf_tcp_*_syncookie to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON
This patch changes the bpf_tcp_*_syncookie() to take
ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer
returned by the bpf_skc_to_*() helpers also.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200925000409.3856725-1-kafai@fb.com
2020-09-25 13:58:01 -07:00
Martin KaFai Lau
592a349864 bpf: Change bpf_sk_storage_*() to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON
This patch changes the bpf_sk_storage_*() to take
ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer
returned by the bpf_skc_to_*() helpers also.

A micro benchmark has been done on a "cgroup_skb/egress" bpf program
which does a bpf_sk_storage_get().  It was driven by netperf doing
a 4096 connected UDP_STREAM test with 64bytes packet.
The stats from "kernel.bpf_stats_enabled" shows no meaningful difference.

The sk_storage_get_btf_proto, sk_storage_delete_btf_proto,
btf_sk_storage_get_proto, and btf_sk_storage_delete_proto are
no longer needed, so they are removed.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200925000402.3856307-1-kafai@fb.com
2020-09-25 13:58:01 -07:00
Martin KaFai Lau
a5fa25adf0 bpf: Change bpf_sk_release and bpf_sk_*cgroup_id to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON
The previous patch allows the networking bpf prog to use the
bpf_skc_to_*() helpers to get a PTR_TO_BTF_ID socket pointer,
e.g. "struct tcp_sock *".  It allows the bpf prog to read all the
fields of the tcp_sock.

This patch changes the bpf_sk_release() and bpf_sk_*cgroup_id()
to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will
work with the pointer returned by the bpf_skc_to_*() helpers
also.  For example, the following will work:

	sk = bpf_skc_lookup_tcp(skb, tuple, tuplen, BPF_F_CURRENT_NETNS, 0);
	if (!sk)
		return;
	tp = bpf_skc_to_tcp_sock(sk);
	if (!tp) {
		bpf_sk_release(sk);
		return;
	}
	lsndtime = tp->lsndtime;
	/* Pass tp to bpf_sk_release() will also work */
	bpf_sk_release(tp);

Since PTR_TO_BTF_ID could be NULL, the helper taking
ARG_PTR_TO_BTF_ID_SOCK_COMMON has to check for NULL at runtime.

A btf_id of "struct sock" may not always mean a fullsock.  Regardless
the helper's running context may get a non-fullsock or not,
considering fullsock check/handling is pretty cheap, it is better to
keep the same verifier expectation on helper that takes ARG_PTR_TO_BTF_ID*
will be able to handle the minisock situation.  In the bpf_sk_*cgroup_id()
case,  it will try to get a fullsock by using sk_to_full_sk() as its
skb variant bpf_sk"b"_*cgroup_id() has already been doing.

bpf_sk_release can already handle minisock, so nothing special has to
be done.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200925000356.3856047-1-kafai@fb.com
2020-09-25 13:58:01 -07:00
Peter Oskolkov
f166b111e0 rseq/selftests: Test MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
Based on Google-internal RSEQ work done by Paul Turner and Andrew
Hunter.

This patch adds a selftest for MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ.
The test quite often fails without the previous patch in this
patchset, but consistently passes with it.

Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lkml.kernel.org/r/20200923233618.2572849-3-posk@google.com
2020-09-25 14:23:27 +02:00
Peter Oskolkov
ea366dd79c rseq/selftests,x86_64: Add rseq_offset_deref_addv()
This patch adds rseq_offset_deref_addv() function to
tools/testing/selftests/rseq/rseq-x86.h, to be used in a selftest in
the next patch in the patchset.

Once an architecture adds support for this function they should define
"RSEQ_ARCH_HAS_OFFSET_DEREF_ADDV".

Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lkml.kernel.org/r/20200923233618.2572849-2-posk@google.com
2020-09-25 14:23:27 +02:00
Geliang Tang
dd72b0fede selftests: mptcp: add remove addr and subflow test cases
This patch added the remove addr and subflow test cases and two new
functions.

The first function run_remove_tests calls do_transfer with two new
arguments, rm_nr_ns1 and rm_nr_ns2, for the numbers of addresses should be
removed during the transfer process in namespace 1 and namespace 2.

If both these two arguments are 0, we do the join test cases with
"mptcp_connect -j" command. Otherwise, do the remove test cases with
"mptcp_connect -r" command.

The second function chk_rm_nr checks the RM_ADDR related mibs's counters.

The output of the test cases looks like this:

11 remove single subflow           syn[ ok ] - synack[ ok ] - ack[ ok ]
                                   rm [ ok ] - sf    [ ok ]
12 remove multiple subflows        syn[ ok ] - synack[ ok ] - ack[ ok ]
                                   rm [ ok ] - sf    [ ok ]
13 remove single address           syn[ ok ] - synack[ ok ] - ack[ ok ]
                                   add[ ok ] - echo  [ ok ]
                                   rm [ ok ] - sf    [ ok ]
14 remove subflow and signal       syn[ ok ] - synack[ ok ] - ack[ ok ]
                                   add[ ok ] - echo  [ ok ]
                                   rm [ ok ] - sf    [ ok ]
15 remove subflows and signal      syn[ ok ] - synack[ ok ] - ack[ ok ]
                                   add[ ok ] - echo  [ ok ]
                                   rm [ ok ] - sf    [ ok ]

Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:58:34 -07:00
Geliang Tang
1315332409 selftests: mptcp: add remove cfg in mptcp_connect
This patch added a new cfg, named cfg_remove in mptcp_connect. This new
cfg_remove is copied from cfg_join. The only difference between them is in
the do_rnd_write function. Here we slow down the transfer process of all
data to let the RM_ADDR suboption can be sent and received completely.
Otherwise the remove address and subflow test cases don't work.

Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:58:34 -07:00
Geliang Tang
be61316003 selftests: mptcp: add ADD_ADDR mibs check function
This patch added the ADD_ADDR related mibs counter check function
chk_add_nr(). This function check both ADD_ADDR and ADD_ADDR with
echo flag.

The output looks like this:

 07 unused signal address             syn[ ok ] - synack[ ok ] - ack[ ok ]
                                      add[ ok ] - echo  [ ok ]
 08 signal address                    syn[ ok ] - synack[ ok ] - ack[ ok ]
                                      add[ ok ] - echo  [ ok ]
 09 subflow and signal                syn[ ok ] - synack[ ok ] - ack[ ok ]
                                      add[ ok ] - echo  [ ok ]
 10 multiple subflows and signal      syn[ ok ] - synack[ ok ] - ack[ ok ]
                                      add[ ok ] - echo  [ ok ]
 11 remove subflow and signal         syn[ ok ] - synack[ ok ] - ack[ ok ]
                                      add[ ok ] - echo  [ ok ]

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24 19:58:33 -07:00
Andrii Nakryiko
87f92ac4c1 libbpf: Fix XDP program load regression for old kernels
Fix regression in libbpf, introduced by XDP link change, which causes XDP
programs to fail to be loaded into kernel due to specified BPF_XDP
expected_attach_type. While kernel doesn't enforce expected_attach_type for
BPF_PROG_TYPE_XDP, some old kernels already support XDP program, but they
don't yet recognize expected_attach_type field in bpf_attr, so setting it to
non-zero value causes program load to fail.

Luckily, libbpf already has a mechanism to deal with such cases, so just make
expected_attach_type optional for XDP programs.

Fixes: dc8698cac7aa ("libbpf: Add support for BPF XDP link")
Reported-by: Nikita Shirokov <tehnerd@tehnerd.com>
Reported-by: Udip Pant <udippant@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200924171705.3803628-1-andriin@fb.com
2020-09-24 10:33:02 -07:00