31867 Commits

Author SHA1 Message Date
Vincent Guittot
b90f7c9d21 sched/pelt: Fix update of blocked PELT ordering
update_cfs_rq_load_avg() can call cpufreq_update_util() to trigger an
update of the frequency. Make sure that RT, DL and IRQ PELT signals have
been updated before calling cpufreq.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: dsmythies@telus.net
Cc: juri.lelli@redhat.com
Cc: mgorman@suse.de
Cc: rostedt@goodmis.org
Fixes: 371bf4273269 ("sched/rt: Add rt_rq utilization tracking")
Fixes: 3727e0e16340 ("sched/dl: Add dl_rq utilization tracking")
Fixes: 91c27493e78d ("sched/irq: Add IRQ utilization tracking")
Link: https://lkml.kernel.org/r/1572434309-32512-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-13 08:01:31 +01:00
Peter Zijlstra
ff51ff84d8 sched/core: Avoid spurious lock dependencies
While seemingly harmless, __sched_fork() does hrtimer_init(), which,
when DEBUG_OBJETS, can end up doing allocations.

This then results in the following lock order:

  rq->lock
    zone->lock.rlock
      batched_entropy_u64.lock

Which in turn causes deadlocks when we do wakeups while holding that
batched_entropy lock -- as the random code does.

Solve this by moving __sched_fork() out from under rq->lock. This is
safe because nothing there relies on rq->lock, as also evident from the
other __sched_fork() callsite.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: bigeasy@linutronix.de
Cc: cl@linux.com
Cc: keescook@chromium.org
Cc: penberg@kernel.org
Cc: rientjes@google.com
Cc: thgarnie@google.com
Cc: tytso@mit.edu
Cc: will@kernel.org
Fixes: b7d5dc21072c ("random: add a spinlock_t to struct batched_entropy")
Link: https://lkml.kernel.org/r/20191001091837.GK4536@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-13 08:01:30 +01:00
Linus Torvalds
eb094f0696 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 TSX Async Abort and iTLB Multihit mitigations from Thomas Gleixner:
 "The performance deterioration departement is not proud at all of
  presenting the seventh installment of speculation mitigations and
  hardware misfeature workarounds:

   1) TSX Async Abort (TAA) - 'The Annoying Affair'

      TAA is a hardware vulnerability that allows unprivileged
      speculative access to data which is available in various CPU
      internal buffers by using asynchronous aborts within an Intel TSX
      transactional region.

      The mitigation depends on a microcode update providing a new MSR
      which allows to disable TSX in the CPU. CPUs which have no
      microcode update can be mitigated by disabling TSX in the BIOS if
      the BIOS provides a tunable.

      Newer CPUs will have a bit set which indicates that the CPU is not
      vulnerable, but the MSR to disable TSX will be available
      nevertheless as it is an architected MSR. That means the kernel
      provides the ability to disable TSX on the kernel command line,
      which is useful as TSX is a truly useful mechanism to accelerate
      side channel attacks of all sorts.

   2) iITLB Multihit (NX) - 'No eXcuses'

      iTLB Multihit is an erratum where some Intel processors may incur
      a machine check error, possibly resulting in an unrecoverable CPU
      lockup, when an instruction fetch hits multiple entries in the
      instruction TLB. This can occur when the page size is changed
      along with either the physical address or cache type. A malicious
      guest running on a virtualized system can exploit this erratum to
      perform a denial of service attack.

      The workaround is that KVM marks huge pages in the extended page
      tables as not executable (NX). If the guest attempts to execute in
      such a page, the page is broken down into 4k pages which are
      marked executable. The workaround comes with a mechanism to
      recover these shattered huge pages over time.

  Both issues come with full documentation in the hardware
  vulnerabilities section of the Linux kernel user's and administrator's
  guide.

  Thanks to all patch authors and reviewers who had the extraordinary
  priviledge to be exposed to this nuisance.

  Special thanks to Borislav Petkov for polishing the final TAA patch
  set and to Paolo Bonzini for shepherding the KVM iTLB workarounds and
  providing also the backports to stable kernels for those!"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation/taa: Fix printing of TAA_MSG_SMT on IBRS_ALL CPUs
  Documentation: Add ITLB_MULTIHIT documentation
  kvm: x86: mmu: Recovery of shattered NX large pages
  kvm: Add helper function for creating VM worker threads
  kvm: mmu: ITLB_MULTIHIT mitigation
  cpu/speculation: Uninline and export CPU mitigations helpers
  x86/cpu: Add Tremont to the cpu vulnerability whitelist
  x86/bugs: Add ITLB_MULTIHIT bug infrastructure
  x86/tsx: Add config options to set tsx=on|off|auto
  x86/speculation/taa: Add documentation for TSX Async Abort
  x86/tsx: Add "auto" option to the tsx= cmdline parameter
  kvm/x86: Export MDS_NO=0 to guests when TSX is enabled
  x86/speculation/taa: Add sysfs reporting for TSX Async Abort
  x86/speculation/taa: Add mitigation for TSX Async Abort
  x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default
  x86/cpu: Add a helper function x86_read_arch_cap_msr()
  x86/msr: Add the IA32_TSX_CTRL MSR
2019-11-12 10:53:24 -08:00
Tejun Heo
743210386c cgroup: use cgrp->kn->id as the cgroup ID
cgroup ID is currently allocated using a dedicated per-hierarchy idr
and used internally and exposed through tracepoints and bpf.  This is
confusing because there are tracepoints and other interfaces which use
the cgroupfs ino as IDs.

The preceding changes made kn->id exposed as ino as 64bit ino on
supported archs or ino+gen (low 32bits as ino, high gen).  There's no
reason for cgroup to use different IDs.  The kernfs IDs are unique and
userland can easily discover them and map them back to paths using
standard file operations.

This patch replaces cgroup IDs with kernfs IDs.

* cgroup_id() is added and all cgroup ID users are converted to use it.

* kernfs_node creation is moved to earlier during cgroup init so that
  cgroup_id() is available during init.

* While at it, s/cgroup/cgrp/ in psi helpers for consistency.

* Fallback ID value is changed to 1 to be consistent with root cgroup
  ID.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
2019-11-12 08:18:04 -08:00
Tejun Heo
40430452fd kernfs: use 64bit inos if ino_t is 64bit
Each kernfs_node is identified with a 64bit ID.  The low 32bit is
exposed as ino and the high gen.  While this already allows using inos
as keys by looking up with wildcard generation number of 0, it's
adding unnecessary complications for 64bit ino archs which can
directly use kernfs_node IDs as inos to uniquely identify each cgroup
instance.

This patch exposes IDs directly as inos on 64bit ino archs.  The
conversion is mostly straight-forward.

* 32bit ino archs behave the same as before.  64bit ino archs now use
  the whole 64bit ID as ino and the generation number is fixed at 1.

* 64bit inos still use the same idr allocator which gurantees that the
  lower 32bits identify the current live instance uniquely and the
  high 32bits are incremented whenever the low bits wrap.  As the
  upper 32bits are no longer used as gen and we don't wanna start ino
  allocation with 33rd bit set, the initial value for highbits
  allocation is changed to 0 on 64bit ino archs.

* blktrace exposes two 32bit numbers - (INO,GEN) pair - to identify
  the issuing cgroup.  Userland builds FILEID_INO32_GEN fids from
  these numbers to look up the cgroups.  To remain compatible with the
  behavior, always output (LOW32,HIGH32) which will be constructed
  back to the original 64bit ID by __kernfs_fh_to_dentry().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
2019-11-12 08:18:04 -08:00
Tejun Heo
fe0f726c9f kernfs: combine ino/id lookup functions into kernfs_find_and_get_node_by_id()
kernfs_find_and_get_node_by_ino() looks the kernfs_node matching the
specified ino.  On top of that, kernfs_get_node_by_id() and
kernfs_fh_get_inode() implement full ID matching by testing the rest
of ID.

On surface, confusingly, the two are slightly different in that the
latter uses 0 gen as wildcard while the former doesn't - does it mean
that the latter can't uniquely identify inodes w/ 0 gen?  In practice,
this is a distinction without a difference because generation number
starts at 1.  There are no actual IDs with 0 gen, so it can always
safely used as wildcard.

Let's simplify the code by renaming kernfs_find_and_get_node_by_ino()
to kernfs_find_and_get_node_by_id(), moving all lookup logics into it,
and removing now unnecessary kernfs_get_node_by_id().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-12 08:18:04 -08:00
Tejun Heo
67c0496e87 kernfs: convert kernfs_node->id from union kernfs_node_id to u64
kernfs_node->id is currently a union kernfs_node_id which represents
either a 32bit (ino, gen) pair or u64 value.  I can't see much value
in the usage of the union - all that's needed is a 64bit ID which the
current code is already limited to.  Using a union makes the code
unnecessarily complicated and prevents using 64bit ino without adding
practical benefits.

This patch drops union kernfs_node_id and makes kernfs_node->id a u64.
ino is stored in the lower 32bits and gen upper.  Accessors -
kernfs[_id]_ino() and kernfs[_id]_gen() - are added to retrieve the
ino and gen.  This simplifies ID handling less cumbersome and will
allow using 64bit inos on supported archs.

This patch doesn't make any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexei Starovoitov <ast@kernel.org>
2019-11-12 08:18:03 -08:00
Srivatsa S. Bhat (VMware)
d61ca3c25e sched/Kconfig: Fix spelling mistake in user-visible help text
Fix a spelling mistake in the help text for PREEMPT_RT.

Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/157204450499.10518.4542293884417101528.stgit@srivatsa-ubuntu
2019-11-12 11:35:32 +01:00
Mukesh Ojha
1d6acc18fe time: Fix spelling mistake in comment
witin => within

Signed-off-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1571124819-9639-1-git-send-email-mojha@codeaurora.org
2019-11-12 11:30:46 +01:00
Arnd Bergmann
20d087368d time: Optimize ns_to_timespec64()
ns_to_timespec64() calls div_s64_rem(), which is a rather slow function on
32-bit architectures, as it cannot take advantage of the do_div()
optimizations for constant arguments.

Open-code the div_s64_rem() function in ns_to_timespec64(), so a constant
divider can be passed into the optimized div_u64_rem() function.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191108203435.112759-3-arnd@arndb.de
2019-11-12 08:15:15 +01:00
Arnd Bergmann
2f5841349d ntp/y2038: Remove incorrect time_t truncation
A cast to 'time_t' was accidentally left in place during the
conversion of __do_adjtimex() to 64-bit timestamps, so the
resulting value is incorrectly truncated.

Remove the cast so the 64-bit time gets propagated correctly.

Fixes: ead25417f82e ("timex: use __kernel_timex internally")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20191108203435.112759-2-arnd@arndb.de
2019-11-12 08:13:44 +01:00
Rafael J. Wysocki
c1d51f684c cpuidle: Use nanoseconds as the unit of time
Currently, the cpuidle subsystem uses microseconds as the unit of
time which (among other things) causes the idle loop to incur some
integer division overhead for no clear benefit.

In order to allow cpuidle to measure time in nanoseconds, add two
new fields, exit_latency_ns and target_residency_ns, to represent the
exit latency and target residency of an idle state in nanoseconds,
respectively, to struct cpuidle_state and initialize them with the
help of the corresponding values in microseconds provided by drivers.
Additionally, change cpuidle_governor_latency_req() to return the
idle state exit latency constraint in nanoseconds.

Also meeasure idle state residency (last_residency_ns in struct
cpuidle_device and time_ns in struct cpuidle_driver) in nanoseconds
and update the cpuidle core and governors accordingly.

However, the menu governor still computes typical intervals in
microseconds to avoid integer overflows.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Doug Smythies <dsmythies@telus.net>
Tested-by: Doug Smythies <dsmythies@telus.net>
2019-11-11 21:56:07 +01:00
Linus Torvalds
de620fb99e Merge branch 'for-5.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
 "There's an inadvertent preemption point in ptrace_stop() which was
  reliably triggering for a test scenario significantly slowing it down.

  This contains Oleg's fix to remove the unwanted preemption point"

* 'for-5.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: freezer: call cgroup_enter_frozen() with preemption disabled in ptrace_stop()
2019-11-11 12:41:14 -08:00
Masahiro Yamada
f276031b4e kheaders: explain why include/config/autoconf.h is excluded from md5sum
This comment block explains why include/generated/compile.h is omitted,
but nothing about include/generated/autoconf.h, which might be more
difficult to understand. Add more comments.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-11-11 20:10:01 +09:00
Masahiro Yamada
1463f74f49 kheaders: remove the last bashism to allow sh to run it
'pushd' ... 'popd' is the last bash-specific code in this script.
One way to avoid it is to run the code in a sub-shell.

With that addressed, you can run this script with sh.

I replaced $(BASH) with $(CONFIG_SHELL), and I changed the hashbang
to #!/bin/sh.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-11-11 20:10:01 +09:00
Masahiro Yamada
ea79e5168b kheaders: optimize header copy for in-tree builds
This script copies headers by the cpio command twice; first from
srctree, and then from objtree. However, when we building in-tree,
we know the srctree and the objtree are the same. That is, all the
headers copied by the first cpio are overwritten by the second one.

Skip the first cpio when we are building in-tree.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-11-11 20:10:01 +09:00
Masahiro Yamada
0e11773e76 kheaders: optimize md5sum calculation for in-tree builds
This script computes md5sum of headers in srctree and in objtree.
However, when we are building in-tree, we know the srctree and the
objtree are the same. That is, we end up with the same computation
twice. In fact, the first two lines of kernel/kheaders.md5 are always
the same for in-tree builds.

Unify the two md5sum calculations.

For in-tree builds ($building_out_of_srctree is empty), we check
only two directories, "include", and "arch/$SRCARCH/include".

For out-of-tree builds ($building_out_of_srctree is 1), we check
4 directories, "$srctree/include", "$srctree/arch/$SRCARCH/include",
"include", and "arch/$SRCARCH/include" since we know they are all
different.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-11-11 20:10:01 +09:00
Masahiro Yamada
9a06635718 kheaders: remove unneeded 'cat' command piped to 'head' / 'tail'
The 'head' and 'tail' commands can take a file path directly.
So, you do not need to run 'cat'.

  cat kernel/kheaders.md5 | head -1

... is equivalent to:

  head -1 kernel/kheaders.md5

and the latter saves forking one process.

While I was here, I replaced 'head -1' with 'head -n 1'.

I also replaced '==' with '=' since we do not have a good reason to
use the bashism.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-11-11 20:10:01 +09:00
Eric Dumazet
5e76f56457 dma-debug: increase HASH_SIZE
With modern NIC, it is not unusual having about ~256,000 active dma
mappings and a hash size of 1024 buckets is too small.

Forcing full cache line per bucket does not seem useful, especially now
that we have contention on free_entries_lock for allocations and freeing
of entries.  Better use the space to fit more buckets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-11-11 10:52:18 +01:00
Eric Dumazet
d3694f3073 dma-debug: reorder struct dma_debug_entry fields
Move all fields used during exact match lookups to the first cache line.
This makes debug_dma_mapping_error() and friends about 50% faster.

Since it removes two 32bit holes, force a cacheline alignment on struct
dma_debug_entry.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-11-11 10:52:18 +01:00
Christoph Hellwig
3acac06550 dma-mapping: merge the generic remapping helpers into dma-direct
Integrate the generic dma remapping implementation into the main flow.
This prepares for architectures like xtensa that use an uncached
segment for pages in the kernel mapping, but can also remap highmem
from CMA.  To simplify that implementation we now always deduct the
page from the physical address via the DMA address instead of the
virtual address.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
2019-11-11 10:52:18 +01:00
Christoph Hellwig
34dc0ea6bc dma-direct: provide mmap and get_sgtable method overrides
For dma-direct we know that the DMA address is an encoding of the
physical address that we can trivially decode.  Use that fact to
provide implementations that do not need the arch_dma_coherent_to_pfn
architecture hook.  Note that we still can only support mmap of
non-coherent memory only if the architecture provides a way to set an
uncached bit in the page tables.  This must be true for architectures
that use the generic remap helpers, but other architectures can also
manually select it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
2019-11-11 10:52:15 +01:00
Jiri Slaby
4b48512c2e stacktrace: Get rid of unneeded '!!' pattern
My commit b0c51f158455 ("stacktrace: Don't skip first entry on
noncurrent tasks") adds one or zero to skipnr by "!!(current == tsk)".

But the C99 standard says:

  The == (equal to) and != (not equal to) operators are
  ...
  Each of the operators yields 1 if the specified relation is true and 0
  if it is false.

So there is no need to prepend the above expression by "!!" -- remove it.

Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191111092647.27419-1-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11 10:30:59 +01:00
Frederic Weisbecker
feb4a51323 irq_work: Slightly simplify IRQ_WORK_PENDING clearing
Instead of fetching the value of flags and perform an xchg() to clear
a bit, just use atomic_fetch_andnot() that is more suitable to do that
job in one operation while keeping the full ordering.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191108160858.31665-4-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11 09:03:31 +01:00
Frederic Weisbecker
25269871db irq_work: Fix irq_work_claim() memory ordering
When irq_work_claim() finds IRQ_WORK_PENDING flag already set, we just
return and don't raise a new IPI. We expect the destination to see
and handle our latest updades thanks to the pairing atomic_xchg()
in irq_work_run_list().

But cmpxchg() doesn't guarantee a full memory barrier upon failure. So
it's possible that the destination misses our latest updates.

So use atomic_fetch_or() instead that is unconditionally fully ordered
and also performs exactly what we want here and simplify the code.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191108160858.31665-3-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11 09:03:31 +01:00
Frederic Weisbecker
153bedbac2 irq_work: Convert flags to atomic_t
We need to convert flags to atomic_t in order to later fix an ordering
issue on atomic_cmpxchg() failure. This will allow us to use atomic_fetch_or().

Also clarify the nature of those flags.

[ mingo: Converted two more usage site the original patch missed. ]

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191108160858.31665-2-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11 09:02:56 +01:00
Peter Zijlstra
a0e813f26e sched/core: Further clarify sched_class::set_next_task()
It turns out there really is something special to the first
set_next_task() invocation. In specific the 'change' pattern really
should not cause balance callbacks.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: dietmar.eggemann@arm.com
Cc: juri.lelli@redhat.com
Cc: ktkhai@virtuozzo.com
Cc: mgorman@suse.de
Cc: qais.yousef@arm.com
Cc: qperret@google.com
Cc: rostedt@goodmis.org
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Fixes: f95d4eaee6d0 ("sched/{rt,deadline}: Fix set_next_task vs pick_next_task")
Link: https://lkml.kernel.org/r/20191108131909.775434698@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11 08:35:21 +01:00
Peter Zijlstra
2eeb01a28c sched/fair: Use mul_u32_u32()
While reading the code I encountered another site where we should be
using mul_u32_u32() because GCC just won't take a hint.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: dietmar.eggemann@arm.com
Cc: juri.lelli@redhat.com
Cc: ktkhai@virtuozzo.com
Cc: mgorman@suse.de
Cc: qais.yousef@arm.com
Cc: qperret@google.com
Cc: rostedt@goodmis.org
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Link: https://lkml.kernel.org/r/20191108131909.717931380@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11 08:35:20 +01:00
Peter Zijlstra
98c2f700ed sched/core: Simplify sched_class::pick_next_task()
Now that the indirect class call never uses the last two arguments of
pick_next_task(), remove them.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: dietmar.eggemann@arm.com
Cc: juri.lelli@redhat.com
Cc: ktkhai@virtuozzo.com
Cc: mgorman@suse.de
Cc: qais.yousef@arm.com
Cc: qperret@google.com
Cc: rostedt@goodmis.org
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Link: https://lkml.kernel.org/r/20191108131909.660595546@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11 08:35:20 +01:00
Peter Zijlstra
5d7d605642 sched/core: Optimize pick_next_task()
Ever since we moved the sched_class definitions into their own files,
the constant expression {fair,idle}_sched_class.pick_next_task() is
not in fact a compile time constant anymore and results in an indirect
call (barring LTO).

Fix that by exposing pick_next_task_{fair,idle}() directly, this gets
rid of the indirect call (and RETPOLINE) on the fast path.

Also remove the unlikely() from the idle case, it is in fact /the/ way
we select idle -- and that is a very common thing to do.

Performance for will-it-scale/sched_yield improves by 2% (as reported
by 0-day).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: dietmar.eggemann@arm.com
Cc: juri.lelli@redhat.com
Cc: ktkhai@virtuozzo.com
Cc: mgorman@suse.de
Cc: qais.yousef@arm.com
Cc: qperret@google.com
Cc: rostedt@goodmis.org
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Link: https://lkml.kernel.org/r/20191108131909.603037345@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11 08:35:19 +01:00
Peter Zijlstra
f488e1057b sched/core: Make pick_next_task_idle() more consistent
Only pick_next_task_fair() needs the @prev and @rf argument; these are
required to implement the cpu-cgroup optimization. None of the other
pick_next_task() methods need this. Make pick_next_task_idle() more
consistent.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: dietmar.eggemann@arm.com
Cc: juri.lelli@redhat.com
Cc: ktkhai@virtuozzo.com
Cc: mgorman@suse.de
Cc: qais.yousef@arm.com
Cc: qperret@google.com
Cc: rostedt@goodmis.org
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Link: https://lkml.kernel.org/r/20191108131909.545730862@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11 08:35:19 +01:00
Peter Zijlstra
7277a34c6b sched/fair: Better document newidle_balance()
Whilst chasing the pick_next_task() race, there was some confusion
about the newidle_balance() return values. Document them.

[ mingo: Minor edits. ]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: dietmar.eggemann@arm.com
Cc: juri.lelli@redhat.com
Cc: ktkhai@virtuozzo.com
Cc: mgorman@suse.de
Cc: qais.yousef@arm.com
Cc: qperret@google.com
Cc: rostedt@goodmis.org
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Link: https://lkml.kernel.org/r/20191108131909.488364308@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11 08:35:18 +01:00
Ingo Molnar
6d5a763c30 Linux 5.4-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl3IqJQeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGOiUH+gOEDwid5OODaFAd
 CggXugdFIlBZefKqGVNW5sjgX8pxFWHXuEMC8iNb6QXtQZdFrI6LFf9hhUDmzQtm
 6y1LPxxEiTZjObMEsBNylb7tyzgujFHcAlp0Zro3w/HLCqmYTSP3FF46i2u6KZfL
 XhkpM4X7R7qxlfpdhlfESv/ElRGocZe6SwXfC7pcPo5flFcmkdu9ijqhNd/6CZ/h
 Nf9rTsD/wEDVUelFbgVN+LJzlaB0tsyc4Zbof07n8OsFZjhdEOop8gfM/kTBLcyY
 6bh66SfDScdsNnC/l8csbPjSZRx+i+nQs67DyhGNnsSAFgHBZdC4Tb/2mDCwhCLR
 dUvuYZc=
 =1N6F
 -----END PGP SIGNATURE-----

Merge tag 'v5.4-rc7' into sched/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11 08:34:59 +01:00
Ingo Molnar
1ca7feb590 Linux 5.4-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl3IqJQeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGOiUH+gOEDwid5OODaFAd
 CggXugdFIlBZefKqGVNW5sjgX8pxFWHXuEMC8iNb6QXtQZdFrI6LFf9hhUDmzQtm
 6y1LPxxEiTZjObMEsBNylb7tyzgujFHcAlp0Zro3w/HLCqmYTSP3FF46i2u6KZfL
 XhkpM4X7R7qxlfpdhlfESv/ElRGocZe6SwXfC7pcPo5flFcmkdu9ijqhNd/6CZ/h
 Nf9rTsD/wEDVUelFbgVN+LJzlaB0tsyc4Zbof07n8OsFZjhdEOop8gfM/kTBLcyY
 6bh66SfDScdsNnC/l8csbPjSZRx+i+nQs67DyhGNnsSAFgHBZdC4Tb/2mDCwhCLR
 dUvuYZc=
 =1N6F
 -----END PGP SIGNATURE-----

Merge tag 'v5.4-rc7' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-11 07:59:06 +01:00
Linus Torvalds
621084cd3d Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "A small set of fixes for timekeepoing and clocksource drivers:

   - VDSO data was updated conditional on the availability of a VDSO
     capable clocksource. This causes the VDSO functions which do not
     depend on a VDSO capable clocksource to operate on stale data.
     Always update unconditionally.

   - Prevent a double free in the mediatek driver

   - Use the proper helper in the sh_mtu2 driver so it won't attempt to
     initialize non-existing interrupts"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping/vsyscall: Update VDSO data unconditionally
  clocksource/drivers/sh_mtu2: Do not loop using platform_get_irq_by_name()
  clocksource/drivers/mediatek: Fix error handling
2019-11-10 12:03:58 -08:00
Linus Torvalds
81388c2b3f Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
 "Two fixes for scheduler regressions:

   - Plug a subtle race condition which was introduced with the rework
     of the next task selection functionality. The change of task
     properties became unprotected which can be observed inconsistently
     causing state corruption.

   - A trivial compile fix for CONFIG_CGROUPS=n"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix pick_next_task() vs 'change' pattern race
  sched/core: Fix compilation error when cgroup not selected
2019-11-10 12:00:47 -08:00
Linus Torvalds
ffba65ea24 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixlet from Thomas Gleixner:
 "A trivial fix for a kernel doc regression where an argument change was
  not reflected in the documentation"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irq/irqdomain: Update __irq_domain_alloc_fwnode() function documentation
2019-11-10 11:51:11 -08:00
Linus Torvalds
20c7e29684 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull stacktrace fix from Thomas Gleixner:
 "A small fix for a stacktrace regression.

  Saving a stacktrace for a foreign task skipped an extra entry which
  makes e.g. the output of /proc/$PID/stack incomplete"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  stacktrace: Don't skip first entry on noncurrent tasks
2019-11-10 11:47:39 -08:00
Al Viro
69924b8968 audit_get_nd(): don't unlock parent too early
if the child has been negative and just went positive
under us, we want coherent d_is_positive() and ->d_inode.
Don't unlock the parent until we'd done that work...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-11-10 11:56:55 -05:00
Al Viro
630faf81b3 cgroup: don't put ERR_PTR() into fc->root
the caller of ->get_tree() expects NULL left there on error...

Reported-by: Thibaut Sautereau <thibaut@sautereau.fr>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-11-10 11:53:27 -05:00
David S. Miller
14684b9301 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
One conflict in the BPF samples Makefile, some fixes in 'net' whilst
we were converting over to Makefile.target rules in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-09 11:04:37 -08:00
Linus Torvalds
0058b0a506 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) BPF sample build fixes from Björn Töpel

 2) Fix powerpc bpf tail call implementation, from Eric Dumazet.

 3) DCCP leaks jiffies on the wire, fix also from Eric Dumazet.

 4) Fix crash in ebtables when using dnat target, from Florian Westphal.

 5) Fix port disable handling whne removing bcm_sf2 driver, from Florian
    Fainelli.

 6) Fix kTLS sk_msg trim on fallback to copy mode, from Jakub Kicinski.

 7) Various KCSAN fixes all over the networking, from Eric Dumazet.

 8) Memory leaks in mlx5 driver, from Alex Vesker.

 9) SMC interface refcounting fix, from Ursula Braun.

10) TSO descriptor handling fixes in stmmac driver, from Jose Abreu.

11) Add a TX lock to synchonize the kTLS TX path properly with crypto
    operations. From Jakub Kicinski.

12) Sock refcount during shutdown fix in vsock/virtio code, from Stefano
    Garzarella.

13) Infinite loop in Intel ice driver, from Colin Ian King.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (108 commits)
  ixgbe: need_wakeup flag might not be set for Tx
  i40e: need_wakeup flag might not be set for Tx
  igb/igc: use ktime accessors for skb->tstamp
  i40e: Fix for ethtool -m issue on X722 NIC
  iavf: initialize ITRN registers with correct values
  ice: fix potential infinite loop because loop counter being too small
  qede: fix NULL pointer deref in __qede_remove()
  net: fix data-race in neigh_event_send()
  vsock/virtio: fix sock refcnt holding during the shutdown
  net: ethernet: octeon_mgmt: Account for second possible VLAN header
  mac80211: fix station inactive_time shortly after boot
  net/fq_impl: Switch to kvmalloc() for memory allocation
  mac80211: fix ieee80211_txq_setup_flows() failure path
  ipv4: Fix table id reference in fib_sync_down_addr
  ipv6: fixes rt6_probe() and fib6_nh->last_probe init
  net: hns: Fix the stray netpoll locks causing deadlock in NAPI path
  net: usb: qmi_wwan: add support for DW5821e with eSIM support
  CDC-NCM: handle incomplete transfer of MTU
  nfc: netlink: fix double device reference drop
  NFC: st21nfca: fix double free
  ...
2019-11-08 18:21:05 -08:00
Peter Zijlstra
6e2df0581f sched: Fix pick_next_task() vs 'change' pattern race
Commit 67692435c411 ("sched: Rework pick_next_task() slow-path")
inadvertly introduced a race because it changed a previously
unexplored dependency between dropping the rq->lock and
sched_class::put_prev_task().

The comments about dropping rq->lock, in for example
newidle_balance(), only mentions the task being current and ->on_cpu
being set. But when we look at the 'change' pattern (in for example
sched_setnuma()):

	queued = task_on_rq_queued(p); /* p->on_rq == TASK_ON_RQ_QUEUED */
	running = task_current(rq, p); /* rq->curr == p */

	if (queued)
		dequeue_task(...);
	if (running)
		put_prev_task(...);

	/* change task properties */

	if (queued)
		enqueue_task(...);
	if (running)
		set_next_task(...);

It becomes obvious that if we do this after put_prev_task() has
already been called on @p, things go sideways. This is exactly what
the commit in question allows to happen when it does:

	prev->sched_class->put_prev_task(rq, prev, rf);
	if (!rq->nr_running)
		newidle_balance(rq, rf);

The newidle_balance() call will drop rq->lock after we've called
put_prev_task() and that allows the above 'change' pattern to
interleave and mess up the state.

Furthermore, it turns out we lost the RT-pull when we put the last DL
task.

Fix both problems by extracting the balancing from put_prev_task() and
doing a multi-class balance() pass before put_prev_task().

Fixes: 67692435c411 ("sched: Rework pick_next_task() slow-path")
Reported-by: Quentin Perret <qperret@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Quentin Perret <qperret@google.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
2019-11-08 22:34:14 +01:00
Qais Yousef
e3b8b6a0d1 sched/core: Fix compilation error when cgroup not selected
When cgroup is disabled the following compilation error was hit

	kernel/sched/core.c: In function ‘uclamp_update_active_tasks’:
	kernel/sched/core.c:1081:23: error: storage size of ‘it’ isn’t known
	  struct css_task_iter it;
			       ^~
	kernel/sched/core.c:1084:2: error: implicit declaration of function ‘css_task_iter_start’; did you mean ‘__sg_page_iter_start’? [-Werror=implicit-function-declaration]
	  css_task_iter_start(css, 0, &it);
	  ^~~~~~~~~~~~~~~~~~~
	  __sg_page_iter_start
	kernel/sched/core.c:1085:14: error: implicit declaration of function ‘css_task_iter_next’; did you mean ‘__sg_page_iter_next’? [-Werror=implicit-function-declaration]
	  while ((p = css_task_iter_next(&it))) {
		      ^~~~~~~~~~~~~~~~~~
		      __sg_page_iter_next
	kernel/sched/core.c:1091:2: error: implicit declaration of function ‘css_task_iter_end’; did you mean ‘get_task_cred’? [-Werror=implicit-function-declaration]
	  css_task_iter_end(&it);
	  ^~~~~~~~~~~~~~~~~
	  get_task_cred
	kernel/sched/core.c:1081:23: warning: unused variable ‘it’ [-Wunused-variable]
	  struct css_task_iter it;
			       ^~
	cc1: some warnings being treated as errors
	make[2]: *** [kernel/sched/core.o] Error 1

Fix by protetion uclamp_update_active_tasks() with
CONFIG_UCLAMP_TASK_GROUP

Fixes: babbe170e053 ("sched/uclamp: Update CPU's refcount on TG's clamp changes")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Patrick Bellasi <patrick.bellasi@matbug.net>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Ben Segall <bsegall@google.com>
Link: https://lkml.kernel.org/r/20191105112212.596-1-qais.yousef@arm.com
2019-11-08 22:34:14 +01:00
Catalin Marinas
6be22809e5 Merge branches 'for-next/elf-hwcap-docs', 'for-next/smccc-conduit-cleanup', 'for-next/zone-dma', 'for-next/relax-icc_pmr_el1-sync', 'for-next/double-page-fault', 'for-next/misc', 'for-next/kselftest-arm64-signal' and 'for-next/kaslr-diagnostics' into for-next/core
* for-next/elf-hwcap-docs:
  : Update the arm64 ELF HWCAP documentation
  docs/arm64: cpu-feature-registers: Rewrite bitfields that don't follow [e, s]
  docs/arm64: cpu-feature-registers: Documents missing visible fields
  docs/arm64: elf_hwcaps: Document HWCAP_SB
  docs/arm64: elf_hwcaps: sort the HWCAP{, 2} documentation by ascending value

* for-next/smccc-conduit-cleanup:
  : SMC calling convention conduit clean-up
  firmware: arm_sdei: use common SMCCC_CONDUIT_*
  firmware/psci: use common SMCCC_CONDUIT_*
  arm: spectre-v2: use arm_smccc_1_1_get_conduit()
  arm64: errata: use arm_smccc_1_1_get_conduit()
  arm/arm64: smccc/psci: add arm_smccc_1_1_get_conduit()

* for-next/zone-dma:
  : Reintroduction of ZONE_DMA for Raspberry Pi 4 support
  arm64: mm: reserve CMA and crashkernel in ZONE_DMA32
  dma/direct: turn ARCH_ZONE_DMA_BITS into a variable
  arm64: Make arm64_dma32_phys_limit static
  arm64: mm: Fix unused variable warning in zone_sizes_init
  mm: refresh ZONE_DMA and ZONE_DMA32 comments in 'enum zone_type'
  arm64: use both ZONE_DMA and ZONE_DMA32
  arm64: rename variables used to calculate ZONE_DMA32's size
  arm64: mm: use arm64_dma_phys_limit instead of calling max_zone_dma_phys()

* for-next/relax-icc_pmr_el1-sync:
  : Relax ICC_PMR_EL1 (GICv3) accesses when ICC_CTLR_EL1.PMHE is clear
  arm64: Document ICC_CTLR_EL3.PMHE setting requirements
  arm64: Relax ICC_PMR_EL1 accesses when ICC_CTLR_EL1.PMHE is clear

* for-next/double-page-fault:
  : Avoid a double page fault in __copy_from_user_inatomic() if hw does not support auto Access Flag
  mm: fix double page fault on arm64 if PTE_AF is cleared
  x86/mm: implement arch_faults_on_old_pte() stub on x86
  arm64: mm: implement arch_faults_on_old_pte() on arm64
  arm64: cpufeature: introduce helper cpu_has_hw_af()

* for-next/misc:
  : Various fixes and clean-ups
  arm64: kpti: Add NVIDIA's Carmel core to the KPTI whitelist
  arm64: mm: Remove MAX_USER_VA_BITS definition
  arm64: mm: simplify the page end calculation in __create_pgd_mapping()
  arm64: print additional fault message when executing non-exec memory
  arm64: psci: Reduce the waiting time for cpu_psci_cpu_kill()
  arm64: pgtable: Correct typo in comment
  arm64: docs: cpu-feature-registers: Document ID_AA64PFR1_EL1
  arm64: cpufeature: Fix typos in comment
  arm64/mm: Poison initmem while freeing with free_reserved_area()
  arm64: use generic free_initrd_mem()
  arm64: simplify syscall wrapper ifdeffery

* for-next/kselftest-arm64-signal:
  : arm64-specific kselftest support with signal-related test-cases
  kselftest: arm64: fake_sigreturn_misaligned_sp
  kselftest: arm64: fake_sigreturn_bad_size
  kselftest: arm64: fake_sigreturn_duplicated_fpsimd
  kselftest: arm64: fake_sigreturn_missing_fpsimd
  kselftest: arm64: fake_sigreturn_bad_size_for_magic0
  kselftest: arm64: fake_sigreturn_bad_magic
  kselftest: arm64: add helper get_current_context
  kselftest: arm64: extend test_init functionalities
  kselftest: arm64: mangle_pstate_invalid_mode_el[123][ht]
  kselftest: arm64: mangle_pstate_invalid_daif_bits
  kselftest: arm64: mangle_pstate_invalid_compat_toggle and common utils
  kselftest: arm64: extend toplevel skeleton Makefile

* for-next/kaslr-diagnostics:
  : Provide diagnostics on boot for KASLR
  arm64: kaslr: Check command line before looking for a seed
  arm64: kaslr: Announce KASLR status on boot
2019-11-08 17:46:11 +00:00
Steven Rostedt (VMware)
7e16f581a8 ftrace: Separate out functionality from ftrace_location_range()
Create a new function called lookup_rec() from the functionality of
ftrace_location_range(). The difference between lookup_rec() is that it
returns the record that it finds, where as ftrace_location_range() returns
only if it found a match or not.

The lookup_rec() is static, and can be used for new functionality where
ftrace needs to find a record of a specific address.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-11-08 12:26:46 -05:00
Steven Rostedt (VMware)
714641c367 ftrace: Separate out the copying of a ftrace_hash from __ftrace_hash_move()
Most of the functionality of __ftrace_hash_move() can be reused, but not all
of it. That is, __ftrace_hash_move() is used to simply make a new hash from
an existing one, using the same size as the original. Creating a dup_hash(),
where we can specify a new size will be useful when we want to create a hash
with a default size, or simply copy the old one.

Signed-off-by: Steven Rostedt (VMWare) <rostedt@goodmis.org>
2019-11-08 12:25:46 -05:00
Martin KaFai Lau
7e3617a72d bpf: Add array support to btf_struct_access
This patch adds array support to btf_struct_access().
It supports array of int, array of struct and multidimensional
array.

It also allows using u8[] as a scratch space.  For example,
it allows access the "char cb[48]" with size larger than
the array's element "char".  Another potential use case is
"u64 icsk_ca_priv[]" in the tcp congestion control.

btf_resolve_size() is added to resolve the size of any type.
It will follow the modifier if there is any.  Please
see the function comment for details.

This patch also adds the "off < moff" check at the beginning
of the for loop.  It is to reject cases when "off" is pointing
to a "hole" in a struct.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191107180903.4097702-1-kafai@fb.com
2019-11-07 10:59:08 -08:00
Christoph Hellwig
4e1003aa56 dma-direct: remove the dma_handle argument to __dma_direct_alloc_pages
The argument isn't used anywhere, so stop passing it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
2019-11-07 17:25:57 +01:00
Christoph Hellwig
acaade1af3 dma-direct: remove __dma_direct_free_pages
We can just call dma_free_contiguous directly instead of wrapping it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
2019-11-07 17:25:40 +01:00