29483 Commits

Author SHA1 Message Date
Christoph Hellwig
9d7a224b46 dma-direct: always allow dma mask <= physiscal memory size
This way an architecture with less than 4G of RAM can support dma_mask
smaller than 32-bit without a ZONE_DMA.  Apparently that is a common
case on powerpc.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2018-10-01 07:31:25 -07:00
Christoph Hellwig
b4ebe60632 dma-direct: implement complete bus_dma_mask handling
Instead of rejecting devices with a too small bus_dma_mask we can handle
by taking the bus dma_mask into account for allocations and bounce
buffering decisions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-10-01 07:28:03 -07:00
Christoph Hellwig
7d21ee4c71 dma-direct: refine dma_direct_alloc zone selection
We need to take the DMA offset and encryption bit into account when
selecting a zone.  User the opportunity to factor out the zone
selection into a helper for reuse.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2018-10-01 07:28:00 -07:00
Christoph Hellwig
a20bb05837 dma-direct: add an explicit dma_direct_get_required_mask
This is somewhat modelled after the powerpc version, and differs from
the legacy fallback in use fls64 instead of pointlessly splitting up the
address into low and high dwords and in that it takes (__)phys_to_dma
into account.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2018-10-01 07:27:15 -07:00
Roman Gushchin
c6fdcd6e0c bpf: don't allow create maps of per-cpu cgroup local storages
Explicitly forbid creating map of per-cpu cgroup local storages.
This behavior matches the behavior of shared cgroup storages.

Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-01 16:18:33 +02:00
Roman Gushchin
b741f16303 bpf: introduce per-cpu cgroup local storage
This commit introduced per-cpu cgroup local storage.

Per-cpu cgroup local storage is very similar to simple cgroup storage
(let's call it shared), except all the data is per-cpu.

The main goal of per-cpu variant is to implement super fast
counters (e.g. packet counters), which don't require neither
lookups, neither atomic operations.

>From userspace's point of view, accessing a per-cpu cgroup storage
is similar to other per-cpu map types (e.g. per-cpu hashmaps and
arrays).

Writing to a per-cpu cgroup storage is not atomic, but is performed
by copying longs, so some minimal atomicity is here, exactly
as with other per-cpu maps.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-01 16:18:32 +02:00
Roman Gushchin
f294b37ec7 bpf: rework cgroup storage pointer passing
To simplify the following introduction of per-cpu cgroup storage,
let's rework a bit a mechanism of passing a pointer to a cgroup
storage into the bpf_get_local_storage(). Let's save a pointer
to the corresponding bpf_cgroup_storage structure, instead of
a pointer to the actual buffer.

It will help us to handle per-cpu storage later, which has
a different way of accessing to the actual data.

Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-01 16:18:32 +02:00
Roman Gushchin
8bad74f984 bpf: extend cgroup bpf core to allow multiple cgroup storage types
In order to introduce per-cpu cgroup storage, let's generalize
bpf cgroup core to support multiple cgroup storage types.
Potentially, per-node cgroup storage can be added later.

This commit is mostly a formal change that replaces
cgroup_storage pointer with a array of cgroup_storage pointers.
It doesn't actually introduce a new storage type,
it will be done later.

Each bpf program is now able to have one cgroup storage of each type.

Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-01 16:18:32 +02:00
Will Deacon
22839869f2 signal: Introduce COMPAT_SIGMINSTKSZ for use in compat_sys_sigaltstack
The sigaltstack(2) system call fails with -ENOMEM if the new alternative
signal stack is found to be smaller than SIGMINSTKSZ. On architectures
such as arm64, where the native value for SIGMINSTKSZ is larger than
the compat value, this can result in an unexpected error being reported
to a compat task. See, for example:

  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=904385

This patch fixes the problem by extending do_sigaltstack to take the
minimum signal stack size as an additional parameter, allowing the
native and compat system call entry code to pass in their respective
values. COMPAT_SIGMINSTKSZ is just defined as SIGMINSTKSZ if it has not
been defined by the architecture.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Reported-by: Steve McIntyre <steve.mcintyre@arm.com>
Tested-by: Steve McIntyre <93sam@debian.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-10-01 11:43:55 +01:00
Marc Zyngier
94967b55eb genirq/debugfs: Reinstate full OF path for domain name
On a DT based system, we use the of_node full name to name the
corresponding irq domain. We expect that name to be unique, so so that
domains with the same base name won't clash (this happens on multi-node
topologies, for example).

Since a7e4cfb0a7ca ("of/fdt: only store the device node basename in
full_name"), of_node_full_name() lies and only returns the basename. This
breaks the above requirement, and we end-up with only a subset of the
domains in /sys/kernel/debug/irq/domains.

Let's reinstate the feature by using the fancy new %pOF format specifier,
which happens to do the right thing.

Fixes: a7e4cfb0a7ca ("of/fdt: only store the device node basename in full_name")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20181001100522.180054-3-marc.zyngier@arm.com
2018-10-01 12:24:53 +02:00
Marc Zyngier
513145ea66 genirq/debugfs: Reset domain debugfs_file on removal of the debugfs file
When removing a debugfs file for a given irq domain, we fail to clear the
corresponding field, meaning that the corresponding domain won't be created
again if we need to do so.

It turns out that this is exactly what irq_domain_update_bus_token does
(delete old file, update domain name, recreate file).

This doesn't have any impact other than making debug more difficult, but we
do value ease of debugging... So clear the debugfs_file field.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20181001100522.180054-2-marc.zyngier@arm.com
2018-10-01 12:24:53 +02:00
Greg Kroah-Hartman
af17b3aa1f Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Thomas writes:
  "A single fix for a missing sanity check when a pinned event is tried
  to be read on the wrong CPU due to a legit event scheduling failure."

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Add sanity check to deal with pinned event failure
2018-09-29 11:32:03 -07:00
Reinette Chatre
befb1b3c27 perf/core: Add sanity check to deal with pinned event failure
It is possible that a failure can occur during the scheduling of a
pinned event. The initial portion of perf_event_read_local() contains
the various error checks an event should pass before it can be
considered valid. Ensure that the potential scheduling failure
of a pinned event is checked for and have a credible error.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: acme@kernel.org
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/6486385d1f30336e9973b24c8c65f5079543d3d3.1537377064.git.reinette.chatre@intel.com
2018-09-28 22:44:53 +02:00
Peng Hao
dc6253108f tick/broadcast: Remove redundant check
tick_device_is_functional() is called early in tick_broadcast_control(), so
no need to call it again later.

Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fweisbec@gmail.com
Link: https://lkml.kernel.org/r/1538150608-2599-1-git-send-email-penghao122@sina.com.cn
2018-09-28 22:29:35 +02:00
Roman Gushchin
4288ea006c bpf: harden flags check in cgroup_storage_update_elem()
cgroup_storage_update_elem() shouldn't accept any flags
argument values except BPF_ANY and BPF_EXIST to guarantee
the backward compatibility, had a new flag value been added.

Fixes: de9cbbaadba5 ("bpf: introduce cgroup storage maps")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-28 15:50:23 +02:00
Yonghong Song
5bf7a60b8e bpf: permit CGROUP_DEVICE programs accessing helper bpf_get_current_cgroup_id()
Currently, helper bpf_get_current_cgroup_id() is not permitted
for CGROUP_DEVICE type of programs. If the helper is used
in such cases, the verifier will log the following error:

  0: (bf) r6 = r1
  1: (69) r7 = *(u16 *)(r6 +0)
  2: (85) call bpf_get_current_cgroup_id#80
  unknown func bpf_get_current_cgroup_id#80

The bpf_get_current_cgroup_id() is useful for CGROUP_DEVICE
type of programs in order to customize action based on cgroup id.
This patch added such a support.

Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-28 14:15:19 +02:00
Ard Biesheuvel
e872267b8b jump_table: Move entries into ro_after_init region
The __jump_table sections emitted into the core kernel and into
each module consist of statically initialized references into
other parts of the code, and with the exception of entries that
point into init code, which are defused at post-init time, these
data structures are never modified.

So let's move them into the ro_after_init section, to prevent them
from being corrupted inadvertently by buggy code, or deliberately
by an attacker.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: https://lkml.kernel.org/r/20180919065144.25010-9-ard.biesheuvel@linaro.org
2018-09-27 17:56:49 +02:00
Ard Biesheuvel
1948367768 jump_label: Annotate entries that operate on __init code earlier
Jump table entries are mostly read-only, with the exception of the
init and module loader code that defuses entries that point into init
code when the code being referred to is freed.

For robustness, it would be better to move these entries into the
ro_after_init section, but clearing the 'code' member of each jump
table entry referring to init code at module load time races with the
module_enable_ro() call that remaps the ro_after_init section read
only, so we'd like to do it earlier.

So given that whether such an entry refers to init code can be decided
much earlier, we can pull this check forward. Since we may still need
the code entry at this point, let's switch to setting a low bit in the
'key' member just like we do to annotate the default state of a jump
table entry.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jessica Yu <jeyu@kernel.org>
Link: https://lkml.kernel.org/r/20180919065144.25010-8-ard.biesheuvel@linaro.org
2018-09-27 17:56:48 +02:00
Ard Biesheuvel
50ff18ab49 jump_label: Implement generic support for relative references
To reduce the size taken up by absolute references in jump label
entries themselves and the associated relocation records in the
.init segment, add support for emitting them as relative references
instead.

Note that this requires some extra care in the sorting routine, given
that the offsets change when entries are moved around in the jump_entry
table.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jessica Yu <jeyu@kernel.org>
Link: https://lkml.kernel.org/r/20180919065144.25010-3-ard.biesheuvel@linaro.org
2018-09-27 17:56:47 +02:00
Ard Biesheuvel
9ae033aca8 jump_label: Abstract jump_entry member accessors
In preparation of allowing architectures to use relative references
in jump_label entries [which can dramatically reduce the memory
footprint], introduce abstractions for references to the 'code' and
'key' members of struct jump_entry.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jessica Yu <jeyu@kernel.org>
Link: https://lkml.kernel.org/r/20180919065144.25010-2-ard.biesheuvel@linaro.org
2018-09-27 17:56:46 +02:00
Jiri Kosina
53c613fe63 x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation
STIBP is a feature provided by certain Intel ucodes / CPUs. This feature
(once enabled) prevents cross-hyperthread control of decisions made by
indirect branch predictors.

Enable this feature if

- the CPU is vulnerable to spectre v2
- the CPU supports SMT and has SMT siblings online
- spectre_v2 mitigation autoselection is enabled (default)

After some previous discussion, this leaves STIBP on all the time, as wrmsr
on crossing kernel boundary is a no-no. This could perhaps later be a bit
more optimized (like disabling it in NOHZ, experiment with disabling it in
idle, etc) if needed.

Note that the synchronization of the mask manipulation via newly added
spec_ctrl_mutex is currently not strictly needed, as the only updater is
already being serialized by cpu_add_remove_lock, but let's make this a
little bit more future-proof.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc:  "WoodhouseDavid" <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc:  "SchauflerCasey" <casey.schaufler@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438240.15880@cbobk.fhfr.pm
2018-09-26 14:26:52 +02:00
Jiri Kosina
dbfe2953f6 x86/speculation: Apply IBPB more strictly to avoid cross-process data leak
Currently, IBPB is only issued in cases when switching into a non-dumpable
process, the rationale being to protect such 'important and security
sensitive' processess (such as GPG) from data leaking into a different
userspace process via spectre v2.

This is however completely insufficient to provide proper userspace-to-userpace
spectrev2 protection, as any process can poison branch buffers before being
scheduled out, and the newly scheduled process immediately becomes spectrev2
victim.

In order to minimize the performance impact (for usecases that do require
spectrev2 protection), issue the barrier only in cases when switching between
processess where the victim can't be ptraced by the potential attacker (as in
such cases, the attacker doesn't have to bother with branch buffers at all).

[ tglx: Split up PTRACE_MODE_NOACCESS_CHK into PTRACE_MODE_SCHED and
  PTRACE_MODE_IBPB to be able to do ptrace() context tracking reasonably
  fine-grained ]

Fixes: 18bf3c3ea8 ("x86/speculation: Use Indirect Branch Prediction Barrier in context switch")
Originally-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc:  "WoodhouseDavid" <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc:  "SchauflerCasey" <casey.schaufler@intel.com>
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251437340.15880@cbobk.fhfr.pm
2018-09-26 14:26:51 +02:00
Andy Shevchenko
ca9184f079 tracing: Trivia spelling fix containerof() -> container_of()
This is the only location on kernel that has wrong spelling
of the container_of() helper. Fix it.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2018-09-26 12:21:00 +03:00
David S. Miller
105bc1306e Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-09-25

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Allow for RX stack hardening by implementing the kernel's flow
   dissector in BPF. Idea was originally presented at netconf 2017 [0].
   Quote from merge commit:

     [...] Because of the rigorous checks of the BPF verifier, this
     provides significant security guarantees. In particular, the BPF
     flow dissector cannot get inside of an infinite loop, as with
     CVE-2013-4348, because BPF programs are guaranteed to terminate.
     It cannot read outside of packet bounds, because all memory accesses
     are checked. Also, with BPF the administrator can decide which
     protocols to support, reducing potential attack surface. Rarely
     encountered protocols can be excluded from dissection and the
     program can be updated without kernel recompile or reboot if a
     bug is discovered. [...]

   Also, a sample flow dissector has been implemented in BPF as part
   of this work, from Petar and Willem.

   [0] http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf

2) Add support for bpftool to list currently active attachment
   points of BPF networking programs providing a quick overview
   similar to bpftool's perf subcommand, from Yonghong.

3) Fix a verifier pruning instability bug where a union member
   from the register state was not cleared properly leading to
   branches not being pruned despite them being valid candidates,
   from Alexei.

4) Various smaller fast-path optimizations in XDP's map redirect
   code, from Jesper.

5) Enable to recognize BPF_MAP_TYPE_REUSEPORT_SOCKARRAY maps
   in bpftool, from Roman.

6) Remove a duplicate check in libbpf that probes for function
   storage, from Taeung.

7) Fix an issue in test_progs by avoid checking for errno since
   on success its value should not be checked, from Mauricio.

8) Fix unused variable warning in bpf_getsockopt() helper when
   CONFIG_INET is not configured, from Anders.

9) Fix a compilation failure in the BPF sample code's use of
   bpf_flow_keys, from Prashant.

10) Minor cleanups in BPF code, from Yue and Zhong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25 20:29:38 -07:00
Christoph Hellwig
974c24c5be dma-mapping: add the missing ARCH_HAS_SYNC_DMA_FOR_CPU_ALL declaration
The patch adding the infrastructure failed to actually add the symbol
declaration, oops..

Fixes: faef87723a ("dma-noncoherent: add a arch_sync_dma_for_cpu_all hook")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Paul Burton <paul.burton@mips.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-09-25 15:11:58 -07:00
David S. Miller
a06ee256e5 Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net
Version bump conflict in batman-adv, take what's in net-next.

iavf conflict, adjustment of netdev_ops in net-next conflicting
with poll controller method removal in net.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25 10:35:29 -07:00
Ingo Molnar
9835bf7ff8 perf/core improvements and fixes:
perf test:
 
 - Add watchpoint entry (Ravi Bangoria)
 
 Build fixes:
 
 - Initialize perf_data_file fd field to fix building the CTF (trace format)
   converter with with gcc 4.8.4 on Ubuntu 14.04 (Jérémie Galarneau)
 
 - Use -Wno-redundant-decls to build with PYTHON=python3 to
   build the python binding, fixing the build in systems such
   as Clear Linux (Arnaldo Carvalho de Melo)
 
 Hardware tracing:
 
 - Suppress AUX/OVERWRITE records (Alexander Shishkin)
 
 Infrastructure:
 
 - Adopt PTR_ERR_OR_ZERO from the kernel and use it in
   the bpf-loader instead of open coded equivalent (Ding Xiang)
 
 - Improve the event ordering code to make it clear and fix
   a bug related to freeing of events when using pipe mode
   from 'record' to 'inject' (Jiri Olsa)
 
 - Some prep work to facilitate per-cpu threads to write
   record data to per-cpu files (Jiri Olsa)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCW6JP9wAKCRCyPKLppCJ+
 J9vjAQDfizmUwoCqgZD4q9d9hIx1KWrFK5q6EJOPsY+l0288BQEAzgx3Q0c7zFS1
 WB0DjtGgEntudoH57vqsswrT7VPKPAE=
 =Ee8x
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo-4.20-20180919' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

perf test improvements:

- Add watchpoint entry (Ravi Bangoria)

Build fixes:

- Initialize perf_data_file fd field to fix building the CTF (trace format)
  converter with with gcc 4.8.4 on Ubuntu 14.04 (Jérémie Galarneau)

- Use -Wno-redundant-decls to build with PYTHON=python3 to
  build the python binding, fixing the build in systems such
  as Clear Linux (Arnaldo Carvalho de Melo)

Hardware tracing improvements:

- Suppress AUX/OVERWRITE records (Alexander Shishkin)

Infrastructure changes:

- Adopt PTR_ERR_OR_ZERO from the kernel and use it in
  the bpf-loader instead of open coded equivalent (Ding Xiang)

- Improve the event ordering code to make it clear and fix
  a bug related to freeing of events when using pipe mode
  from 'record' to 'inject' (Jiri Olsa)

- Some prep work to facilitate per-cpu threads to write
  record data to per-cpu files (Jiri Olsa)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-25 11:20:01 +02:00
Greg Kroah-Hartman
2dd68cc7fd Merge gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net
Dave writes:
  "Networking fixes:

  1) Fix multiqueue handling of coalesce timer in stmmac, from Jose
     Abreu.

   2) Fix memory corruption in NFC, from Suren Baghdasaryan.

   3) Don't write reserved bits in ravb driver, from Kazuya Mizuguchi.

   4) SMC bug fixes from Karsten Graul, YueHaibing, and Ursula Braun.

   5) Fix TX done race in mvpp2, from Antoine Tenart.

   6) ipv6 metrics leak, from Wei Wang.

   7) Adjust firmware version requirements in mlxsw, from Petr Machata.

   8) Fix autonegotiation on resume in r8169, from Heiner Kallweit.

   9) Fixed missing entries when dumping /proc/net/if_inet6, from Jeff
      Barnhill.

   10) Fix double free in devlink, from Dan Carpenter.

   11) Fix ethtool regression from UFO feature removal, from Maciej
       Żenczykowski.

   12) Fix drivers that have a ndo_poll_controller() that captures the
       cpu entirely on loaded hosts by trying to drain all rx and tx
       queues, from Eric Dumazet.

   13) Fix memory corruption with jumbo frames in aquantia driver, from
       Friedemann Gerold."

* gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net: (79 commits)
  net: mvneta: fix the remaining Rx descriptor unmapping issues
  ip_tunnel: be careful when accessing the inner header
  mpls: allow routes on ip6gre devices
  net: aquantia: memory corruption on jumbo frames
  tun: remove ndo_poll_controller
  nfp: remove ndo_poll_controller
  bnxt: remove ndo_poll_controller
  bnx2x: remove ndo_poll_controller
  mlx5: remove ndo_poll_controller
  mlx4: remove ndo_poll_controller
  i40evf: remove ndo_poll_controller
  ice: remove ndo_poll_controller
  igb: remove ndo_poll_controller
  ixgb: remove ndo_poll_controller
  fm10k: remove ndo_poll_controller
  ixgbevf: remove ndo_poll_controller
  ixgbe: remove ndo_poll_controller
  bonding: use netpoll_poll_dev() helper
  netpoll: make ndo_poll_controller() optional
  rds: Fix build regression.
  ...
2018-09-25 11:19:49 +02:00
Ingo Molnar
fb437bc8fe This is the 4.19-rc5 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlunyjMACgkQONu9yGCS
 aT52HhAA0JU7E88QPZ1gSxc1ifTaIlHXhLQSvQKAXOhIvHDwj4tEKDqPhpCN/dWX
 /o/xaUf36gU0VzUD/1IyEiMFmJEeFKnfvN5SZYZLk8uSrd4swqaY8mSueZxNEDz4
 YNK9ugI/tPztuuz7I6KrO1iVquY1WlnECxc9FH76wvHsit8Sr3PvzhR+CVrOi+8p
 k3cpWlhHiOzT/3K3Wv2Et+oh+U+myKtQTlJDSe3fMx5chksJpBmsV/IDEtsLNZfz
 3v25fHz5a3DOYqKkGJaDrbLyPNC85249B+CiXqbXvfOAHDVkMwYqcxYUG+YZ5cpm
 U0OShLXm67dz8vT9cxqOSguCliPRlM9W5+EKzmVT7l8+ycds3BuEEHg1xWPrJWgG
 7XO10HkhZl+VvnJCj54KaszMUOdpvdEQSUs82gAFxjPbQIx5gosN9O0H+DnirMhS
 6VtzS20ZoIzjd4YVkRoLNcobHB4bZVTNXZ1Zi3C/neP9pxUjhOk0y+Vr/crC5Xph
 3TykIMgiVa+CdvQ/f4LOSiCgTFhF0tLGtfDQTG7f+9+W5pMc4NKSLi8EOMlJtYEy
 wsCYZ7/T9ElgrEzFvlxSvDwiPUhcldNao/EGdRYvMxXtgj0Ctw8LhR/2YKkqo6LK
 oMoKKWkj0o7uKSHKq+dakS0FprKnBnvE2Y+XA4SO/saPGFlDAVc=
 =OFJh
 -----END PGP SIGNATURE-----

Merge tag 'v4.19-rc5' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-09-25 11:19:44 +02:00
Ravi Bangoria
ccea8727dc trace_uprobe/sdt: Prevent multiple reference counter for same uprobe
We assume to have only one reference counter for one uprobe.
Don't allow user to add multiple trace_uprobe entries having
same inode+offset but different reference counter.

Ex,
  # echo "p:sdt_tick/loop2 /home/ravi/tick:0x6e4(0x10036)" > uprobe_events
  # echo "p:sdt_tick/loop2_1 /home/ravi/tick:0x6e4(0xfffff)" >> uprobe_events
  bash: echo: write error: Invalid argument

  # dmesg
  trace_kprobe: Reference counter offset mismatch.

There is one exception though:
When user is trying to replace the old entry with the new
one, we allow this if the new entry does not conflict with
any other existing entries.

Link: http://lkml.kernel.org/r/20180820044250.11659-4-ravi.bangoria@linux.ibm.com

Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-09-24 04:44:54 -04:00
Ravi Bangoria
22bad38286 uprobes/sdt: Prevent multiple reference counter for same uprobe
We assume to have only one reference counter for one uprobe.
Don't allow user to register multiple uprobes having same
inode+offset but different reference counter.

Link: http://lkml.kernel.org/r/20180820044250.11659-3-ravi.bangoria@linux.ibm.com

Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Song Liu <songliubraving@fb.com>
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-09-24 04:44:54 -04:00
Ravi Bangoria
1cc33161a8 uprobes: Support SDT markers having reference count (semaphore)
Userspace Statically Defined Tracepoints[1] are dtrace style markers
inside userspace applications. Applications like PostgreSQL, MySQL,
Pthread, Perl, Python, Java, Ruby, Node.js, libvirt, QEMU, glib etc
have these markers embedded in them. These markers are added by developer
at important places in the code. Each marker source expands to a single
nop instruction in the compiled code but there may be additional
overhead for computing the marker arguments which expands to couple of
instructions. In case the overhead is more, execution of it can be
omitted by runtime if() condition when no one is tracing on the marker:

    if (reference_counter > 0) {
        Execute marker instructions;
    }

Default value of reference counter is 0. Tracer has to increment the
reference counter before tracing on a marker and decrement it when
done with the tracing.

Implement the reference counter logic in core uprobe. User will be
able to use it from trace_uprobe as well as from kernel module. New
trace_uprobe definition with reference counter will now be:

    <path>:<offset>[(ref_ctr_offset)]

where ref_ctr_offset is an optional field. For kernel module, new
variant of uprobe_register() has been introduced:

    uprobe_register_refctr(inode, offset, ref_ctr_offset, consumer)

No new variant for uprobe_unregister() because it's assumed to have
only one reference counter for one uprobe.

[1] https://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

Note: 'reference counter' is called as 'semaphore' in original Dtrace
(or Systemtap, bcc and even in ELF) documentation and code. But the
term 'semaphore' is misleading in this context. This is just a counter
used to hold number of tracers tracing on a marker. This is not really
used for any synchronization. So we are calling it a 'reference counter'
in kernel / perf code.

Link: http://lkml.kernel.org/r/20180820044250.11659-2-ravi.bangoria@linux.ibm.com

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
[Only trace_uprobe.c]
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Song Liu <songliubraving@fb.com>
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-09-24 04:44:53 -04:00
Steven Rostedt (VMware)
d6b183eda4 tracing/kprobe: Remove unneeded extra strchr() from create_trace_kprobe()
By utilizing a temporary variable, we can avoid adding another call to
strchr(). Instead, save the first call to a temp variable, and then use that
variable as the reference to set the event variable.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-09-24 04:44:52 -04:00
Dennis Zhou (Facebook)
f0fcb3ec89 blkcg: remove additional reference to the css
The previous patch in this series removed carrying around a pointer to
the css in blkg. However, the blkg association logic still relied on
taking a reference on the css to ensure we wouldn't fail in getting a
reference for the blkg.

Here the implicit dependency on the css is removed. The association
continues to rely on the tryget logic walking up the blkg tree. This
streamlines the three ways that association can happen: normal, swap,
and writeback.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:15 -06:00
Dennis Zhou (Facebook)
c839e7a03f blkcg: remove bio->bi_css and instead use bio->bi_blkg
Prior patches ensured that all bios are now associated with some blkg.
This now makes bio->bi_css unnecessary as blkg maintains a reference to
the blkcg already.

This patch removes the field bi_css and transfers corresponding uses to
access via bi_blkg.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-21 20:29:13 -06:00
John Fastabend
b05545e15e bpf: sockmap, fix transition through disconnect without close
It is possible (via shutdown()) for TCP socks to go trough TCP_CLOSE
state via tcp_disconnect() without actually calling tcp_close which
would then call our bpf_tcp_close() callback. Because of this a user
could disconnect a socket then put it in a LISTEN state which would
break our assumptions about sockets always being ESTABLISHED state.

To resolve this rely on the unhash hook, which is called in the
disconnect case, to remove the sock from the sockmap.

Reported-by: Eric Dumazet <edumazet@google.com>
Fixes: 1aa12bdf1bfb ("bpf: sockmap, add sock close() hook to remove socks")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-22 02:46:41 +02:00
John Fastabend
5607fff303 bpf: sockmap only allow ESTABLISHED sock state
After this patch we only allow socks that are in ESTABLISHED state or
are being added via a sock_ops event that is transitioning into an
ESTABLISHED state. By allowing sock_ops events we allow users to
manage sockmaps directly from sock ops programs. The two supported
sock_ops ops are BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB and
BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB.

Similar to TLS ULP this ensures sk_user_data is correct.

Reported-by: Eric Dumazet <edumazet@google.com>
Fixes: 1aa12bdf1bfb ("bpf: sockmap, add sock close() hook to remove socks")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-22 02:46:41 +02:00
zhong jiang
788758d1fe bpf: remove redundant null pointer check before consume_skb
consume_skb has taken the null pointer into account. hence it is safe
to remove the redundant null pointer check before consume_skb.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-09-21 22:51:51 +02:00
YueHaibing
3bf181bc5d kernel/sys.c: remove duplicated include
Link: http://lkml.kernel.org/r/20180821133424.18716-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-20 22:01:11 +02:00
KJ Tsanaktsidis
f83606f5eb fork: report pid exhaustion correctly
Make the clone and fork syscalls return EAGAIN when the limit on the
number of pids /proc/sys/kernel/pid_max is exceeded.

Currently, when the pid_max limit is exceeded, the kernel will return
ENOSPC from the fork and clone syscalls.  This is contrary to the
documented behaviour, which explicitly calls out the pid_max case as one
where EAGAIN should be returned.  It also leads to really confusing error
messages in userspace programs which will complain about a lack of disk
space when they fail to create processes/threads for this reason.

This error is being returned because alloc_pid() uses the idr api to find
a new pid; when there are none available, idr_alloc_cyclic() returns
-ENOSPC, and this is being propagated back to userspace.

This behaviour has been broken before, and was explicitly fixed in
commit 35f71bc0a09a ("fork: report pid reservation failure properly"),
so I think -EAGAIN is definitely the right thing to return in this case.
The current behaviour change dates from commit 95846ecf9dac ("pid:
replace pid bitmap implementation with IDR AIP") and was I believe
unintentional.

This patch has no impact on the case where allocating a pid fails because
the child reaper for the namespace is dead; that case will still return
-ENOMEM.

Link: http://lkml.kernel.org/r/20180903111016.46461-1-ktsanaktsidis@zendesk.com
Fixes: 95846ecf9dac ("pid: replace pid bitmap implementation with IDR AIP")
Signed-off-by: KJ Tsanaktsidis <ktsanaktsidis@zendesk.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Gargi Sharma <gs051095@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-20 22:01:11 +02:00
Christoph Hellwig
9406a49fd1 dma-mapping: support non-coherent devices in dma_common_get_sgtable
We can use the arch_dma_coherent_to_pfn hook to provide a ->get_sgtable
implementation.  Note that this isn't an endorsement of this interface
(which is a horrible bad idea), but it is required to move arm64 over
to the generic code without a loss of functionality.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-09-20 09:01:17 +02:00
Christoph Hellwig
58b0440663 dma-mapping: consolidate the dma mmap implementations
The only functional differences (modulo a few missing fixes in the arch
code) is that architectures without coherent caches need a hook to
convert a virtual or dma address into a pfn, given that we don't have
the kernel linear mapping available for the otherwise easy virt_to_page
call.  As a side effect we can support mmap of the per-device coherent
area even on architectures not providing the callback, and we make
previous dangerous default methods dma_common_mmap actually save for
non-coherent architectures by rejecting it without the right helper.

In addition to that we need a hook so that some architectures can
override the protection bits when mmaping a dma coherent allocations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul Burton <paul.burton@mips.com> # MIPS parts
2018-09-20 09:01:16 +02:00
Christoph Hellwig
bc3ec75de5 dma-mapping: merge direct and noncoherent ops
All the cache maintainance is already stubbed out when not enabled,
but merging the two allows us to nicely handle the case where
cache maintainance is required for some devices, but not others.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul Burton <paul.burton@mips.com> # MIPS parts
2018-09-20 09:01:15 +02:00
Christoph Hellwig
f3ecc0ff04 dma-mapping: move the dma_coherent flag to struct device
Various architectures support both coherent and non-coherent dma on a
per-device basis.  Move the dma_noncoherent flag from the mips archdata
field to struct device proper to prepare the infrastructure for reuse on
other architectures.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul Burton <paul.burton@mips.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-20 09:01:15 +02:00
Christoph Hellwig
684f7e91d3 dma-mapping: add the missing ARCH_HAS_SYNC_DMA_FOR_CPU_ALL declaration
The patch adding the infrastructure failed to actually add the symbol
declaration, oops..

Fixes: faef87723a ("dma-noncoherent: add a arch_sync_dma_for_cpu_all hook")
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-09-20 09:01:14 +02:00
He Zhe
a3ceed87b0 dma-mapping: fix panic caused by passing empty cma command line argument
early_cma does not check input argument before passing it to
simple_strtoull. The argument would be a NULL pointer if "cma", without
its value, is set in command line and thus causes the following panic.

PANIC: early exception 0xe3 IP 10:ffffffffa3e9db8d error 0 cr2 0x0
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc3-yocto-standard+ #7
[    0.000000] RIP: 0010:_parse_integer_fixup_radix+0xd/0x70
...
[    0.000000] Call Trace:
[    0.000000]  simple_strtoull+0x29/0x70
[    0.000000]  memparse+0x26/0x90
[    0.000000]  early_cma+0x17/0x6a
[    0.000000]  do_early_param+0x57/0x8e
[    0.000000]  parse_args+0x208/0x320
[    0.000000]  ? rdinit_setup+0x30/0x30
[    0.000000]  parse_early_options+0x29/0x2d
[    0.000000]  ? rdinit_setup+0x30/0x30
[    0.000000]  parse_early_param+0x36/0x4d
[    0.000000]  setup_arch+0x336/0x99e
[    0.000000]  start_kernel+0x6f/0x4e6
[    0.000000]  x86_64_start_reservations+0x24/0x26
[    0.000000]  x86_64_start_kernel+0x6f/0x72
[    0.000000]  secondary_startup_64+0xa4/0xb0

This patch adds a check to prevent the panic.

Signed-off-by: He Zhe <zhe.he@windriver.com>
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-09-20 09:01:08 +02:00
Greg Kroah-Hartman
f21f7fa263 Vaibhav Nagarnaik found that modifying the ring buffer size could cause
a huge latency in the system because it does a while loop to free pages
 without releasing the CPU (on non preempt kernels). In a case where there
 are hundreds of thousands of pages to free it could actually cause a system
 stall. A properly place cond_resched() solves this issue.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCW6GGJhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qo2dAQDN4SUsItEc28ij5vYKoP1mSLt8aax1
 1UoIHrh1pTLUMQD+PSlbtZnUq27vfGwyEFrIWLQ5eeDy3IESkQzoXWcs0gY=
 =HpN3
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Steven writes:
  "Vaibhav Nagarnaik found that modifying the ring buffer size could cause
   a huge latency in the system because it does a while loop to free pages
   without releasing the CPU (on non preempt kernels). In a case where there
   are hundreds of thousands of pages to free it could actually cause a system
   stall. A properly place cond_resched() solves this issue."
2018-09-19 07:41:46 +02:00
Alexander Shishkin
1627314fb5 perf: Suppress AUX/OVERWRITE records
It has been pointed out to me many times that it is useful to be able to
switch off AUX records to save the bandwidth for records that actually
matter, for example, in AUX overwrite mode.

The usefulness of PERF_RECORD_AUX is in some of its flags, like the
TRUNCATED flag that tells the decoder where exactly gaps in the trace
are.  The OVERWRITE flag, on the other hand will be set on every single
record in overwrite mode. However, a PERF_RECORD_AUX[flags=OVERWRITE] is
generated on every target task's sched_out, which over time adds up to a
lot of useless information.

If any folks out there have userspace that depends on a constant stream
of OVERWRITE records for a good reason, they'll have to let us know.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Link: http://lkml.kernel.org/r/20180404145323.28651-1-alexander.shishkin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-09-18 17:21:13 -03:00
Arnaldo Carvalho de Melo
7f16023bfc Merge remote-tracking branch 'acme/perf/urgent' into perf/core
To pick up fixes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-09-18 17:20:41 -03:00
David S. Miller
e366fa4350 Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net
Two new tls tests added in parallel in both net and net-next.

Used Stephen Rothwell's linux-next resolution.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18 09:33:27 -07:00