IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Pull x86 fixes from Thomas Gleixner:
"A set of fixes and updates for x86:
- Address a swiotlb regression which was caused by the recent DMA
rework and made driver fail because dma_direct_supported() returned
false
- Fix a signedness bug in the APIC ID validation which caused invalid
APIC IDs to be detected as valid thereby bloating the CPU possible
space.
- Fix inconsisten config dependcy/select magic for the MFD_CS5535
driver.
- Fix a corruption of the physical address space bits when encryption
has reduced the address space and late cpuinfo updates overwrite
the reduced bit information with the original value.
- Dominiks syscall rework which consolidates the architecture
specific syscall functions so all syscalls can be wrapped with the
same macros. This allows to switch x86/64 to struct pt_regs based
syscalls. Extend the clearing of user space controlled registers in
the entry patch to the lower registers"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Fix signedness bug in APIC ID validity checks
x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption
x86/olpc: Fix inconsistent MFD_CS5535 configuration
swiotlb: Use dma_direct_supported() for swiotlb_ops
syscalls/x86: Adapt syscall_wrapper.h to the new syscall stub naming convention
syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()
syscalls/core, syscalls/x86: Clean up compat syscall stub naming convention
syscalls/core, syscalls/x86: Clean up syscall stub naming convention
syscalls/x86: Extend register clearing on syscall entry to lower registers
syscalls/x86: Unconditionally enable 'struct pt_regs' based syscalls on x86_64
syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32
syscalls/core: Prepare CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y for compat syscalls
syscalls/x86: Use 'struct pt_regs' based syscall calling convention for 64-bit syscalls
syscalls/core: Introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
x86/syscalls: Don't pointlessly reload the system call number
x86/mm: Fix documentation of module mapping range with 4-level paging
x86/cpuid: Switch to 'static const' specifier
Pull scheduler fixes from Thomas Gleixner:
"A few scheduler fixes:
- Prevent a bogus warning vs. runqueue clock update flags in
do_sched_rt_period_timer()
- Simplify the helper functions which handle requests for skipping
the runqueue clock updat.
- Do not unlock the tunables mutex in the error path of the cpu
frequency scheduler utils. Its not held.
- Enforce proper alignement for 'struct util_est' in sched_avg to
prevent a misalignment fault on IA64"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Force proper alignment of 'struct util_est'
sched/core: Simplify helpers for rq clock update skip requests
sched/rt: Fix rq->clock_update_flags < RQCF_ACT_SKIP warning
sched/cpufreq/schedutil: Fix error path mutex unlock
Pull more perf updates from Thomas Gleixner:
"A rather large set of perf updates:
Kernel:
- Fix various initialization issues
- Prevent creating [ku]probes for not CAP_SYS_ADMIN users
Tooling:
- Show only failing syscalls with 'perf trace --failure' (Arnaldo
Carvalho de Melo)
e.g: See what 'openat' syscalls are failing:
# perf trace --failure -e openat
762.323 ( 0.007 ms): VideoCapture/4566 openat(dfd: CWD, filename: /dev/video2) = -1 ENOENT No such file or directory
<SNIP N /dev/videoN open attempts... sigh, where is that improvised camera lid?!? >
790.228 ( 0.008 ms): VideoCapture/4566 openat(dfd: CWD, filename: /dev/video63) = -1 ENOENT No such file or directory
^C#
- Show information about the event (freq, nr_samples, total
period/nr_events) in the annotate --tui and --stdio2 'perf
annotate' output, similar to the first line in the 'perf report
--tui', but just for the samples for a the annotated symbol
(Arnaldo Carvalho de Melo)
- Introduce 'perf version --build-options' to show what features were
linked, aliased as well as a shorter 'perf -vv' (Jin Yao)
- Add a "dso_size" sort order (Kim Phillips)
- Remove redundant ')' in the tracepoint output in 'perf trace'
(Changbin Du)
- Synchronize x86's cpufeatures.h, no effect on toolss (Arnaldo
Carvalho de Melo)
- Show group details on the title line in the annotate browser and
'perf annotate --stdio2' output, so that the per-event columns can
have headers (Arnaldo Carvalho de Melo)
- Fixup vertical line separating metrics from instructions and
cleaning unused lines at the bottom, both in the annotate TUI
browser (Arnaldo Carvalho de Melo)
- Remove duplicated 'samples' in lost samples warning in
'perf report' (Arnaldo Carvalho de Melo)
- Synchronize i915_drm.h, silencing the perf build process,
automagically adding support for the new DRM_I915_QUERY ioctl
(Arnaldo Carvalho de Melo)
- Make auxtrace_queues__add_buffer() allocate struct buffer, from a
patchkit already applied (Adrian Hunter)
- Fix the --stdio2/TUI annotate output to include group details, be
it for a recorded '{a,b,f}' explicit event group or when forcing
group display using 'perf report --group' for a set of events not
recorded as a group (Arnaldo Carvalho de Melo)
- Fix display artifacts in the ui browser (base class for the
annotate and main report/top TUI browser) related to the extra
title lines work (Arnaldo Carvalho de Melo)
- perf auxtrace refactorings, leftovers from a previously partially
processed patchset (Adrian Hunter)
- Fix the builtin clang build (Sandipan Das, Arnaldo Carvalho de
Melo)
- Synchronize i915_drm.h, silencing a perf build warning and in the
process automagically adding support for a new ioctl command
(Arnaldo Carvalho de Melo)
- Fix a strncpy issue in uprobe tracing"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
perf/core: Need CAP_SYS_ADMIN to create k/uprobe with perf_event_open()
tracing/uprobe_event: Fix strncpy corner case
perf/core: Fix perf_uprobe_init()
perf/core: Fix perf_kprobe_init()
perf/core: Fix use-after-free in uprobe_perf_close()
perf tests clang: Fix function name for clang IR test
perf clang: Add support for recent clang versions
perf tools: Fix perf builds with clang support
perf tools: No need to include namespaces.h in util.h
perf hists browser: Remove leftover from row returned from refresh
perf hists browser: Show extra_title_lines in the 'D' debug hotkey
perf auxtrace: Make auxtrace_queues__add_buffer() do CPU filtering
tools headers uapi: Synchronize i915_drm.h
perf report: Remove duplicated 'samples' in lost samples warning
perf ui browser: Fixup cleaning unused lines at the bottom
perf annotate browser: Fixup vertical line separating metrics from instructions
perf annotate: Show group details on the title line
perf auxtrace: Make auxtrace_queues__add_buffer() allocate struct buffer
perf/x86/intel: Move regs->flags EXACT bit init
perf trace: Remove redundant ')'
...
Pull irq affinity fixes from Thomas Gleixner:
- Fix error path handling in the affinity spreading code
- Make affinity spreading smarter to avoid issues on systems which
claim to have hotpluggable CPUs while in fact they can't hotplug
anything.
So instead of trying to spread the vectors (and thereby the
associated device queues) to all possibe CPUs, spread them on all
present CPUs first. If there are left over vectors after that first
step they are spread among the possible, but not present CPUs which
keeps the code backwards compatible for virtual decives and NVME
which allocate a queue per possible CPU, but makes the spreading
smarter for devices which have less queues than possible or present
CPUs.
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/affinity: Spread irq vectors among present CPUs as far as possible
genirq/affinity: Allow irq spreading from a given starting point
genirq/affinity: Move actual irq vector spreading into a helper function
genirq/affinity: Rename *node_to_possible_cpumask as *node_to_cpumask
genirq/affinity: Don't return with empty affinity masks on error
For s390 new kernels are loaded to fixed addresses in memory before they
are booted. With the current code this is a problem as it assumes the
kernel will be loaded to an 'arbitrary' address. In particular,
kexec_locate_mem_hole searches for a large enough memory region and sets
the load address (kexec_bufer->mem) to it.
Luckily there is a simple workaround for this problem. By returning 1
in arch_kexec_walk_mem, kexec_locate_mem_hole is turned off. This
allows the architecture to set kbuf->mem by hand. While the trick works
fine for the kernel it does not for the purgatory as here the
architectures don't have access to its kexec_buffer.
Give architectures access to the purgatories kexec_buffer by changing
kexec_load_purgatory to take a pointer to it. With this change
architectures have access to the buffer and can edit it as they need.
A nice side effect of this change is that we can get rid of the
purgatory_info->purgatory_load_address field. As now the information
stored there can directly be accessed from kbuf->mem.
Link: http://lkml.kernel.org/r/20180321112751.22196-11-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "kexec_file, x86, powerpc: refactoring for other
architecutres", v2.
This is a preparatory patchset for adding kexec_file support on arm64.
It was originally included in a arm64 patch set[1], but Philipp is also
working on their kexec_file support on s390[2] and some changes are now
conflicting.
So these common parts were extracted and put into a separate patch set
for better integration. What's more, my original patch#4 was split into
a few small chunks for easier review after Dave's comment.
As such, the resulting code is basically identical with my original, and
the only *visible* differences are:
- renaming of _kexec_kernel_image_probe() and _kimage_file_post_load_cleanup()
- change one of types of arguments at prepare_elf64_headers()
Those, unfortunately, require a couple of trivial changes on the rest
(#1, #6 to #13) of my arm64 kexec_file patch set[1].
Patch #1 allows making a use of purgatory optional, particularly useful
for arm64.
Patch #2 commonalizes arch_kexec_kernel_{image_probe, image_load,
verify_sig}() and arch_kimage_file_post_load_cleanup() across
architectures.
Patches #3-#7 are also intended to generalize parse_elf64_headers(),
along with exclude_mem_range(), to be made best re-use of.
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-February/561182.html
[2] http://lkml.iu.edu//hypermail/linux/kernel/1802.1/02596.html
This patch (of 7):
On arm64, crash dump kernel's usable memory is protected by *unmapping*
it from kernel virtual space unlike other architectures where the region
is just made read-only. It is highly unlikely that the region is
accidentally corrupted and this observation rationalizes that digest
check code can also be dropped from purgatory. The resulting code is so
simple as it doesn't require a bit ugly re-linking/relocation stuff,
i.e. arch_kexec_apply_relocations_add().
Please see:
http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545428.html
All that the purgatory does is to shuffle arguments and jump into a new
kernel, while we still need to have some space for a hash value
(purgatory_sha256_digest) which is never checked against.
As such, it doesn't make sense to have trampline code between old kernel
and new kernel on arm64.
This patch introduces a new configuration, ARCH_HAS_KEXEC_PURGATORY, and
allows related code to be compiled in only if necessary.
[takahiro.akashi@linaro.org: fix trivial screwup]
Link: http://lkml.kernel.org/r/20180309093346.GF25863@linaro.org
Link: http://lkml.kernel.org/r/20180306102303.9063-2-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We've got a bug report indicating a kernel panic at booting on an x86-32
system, and it turned out to be the invalid PCI resource assigned after
reallocation. __find_resource() first aligns the resource start address
and resets the end address with start+size-1 accordingly, then checks
whether it's contained. Here the end address may overflow the integer,
although resource_contains() still returns true because the function
validates only start and end address. So this ends up with returning an
invalid resource (start > end).
There was already an attempt to cover such a problem in the commit
47ea91b405 ("Resource: fix wrong resource window calculation"), but
this case is an overseen one.
This patch adds the validity check of the newly calculated resource for
avoiding the integer overflow problem.
Bugzilla: http://bugzilla.opensuse.org/show_bug.cgi?id=1086739
Link: http://lkml.kernel.org/r/s5hpo37d5l8.wl-tiwai@suse.de
Fixes: 23c570a674 ("resource: ability to resize an allocated resource")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reported-by: Michael Henders <hendersm@shaw.ca>
Tested-by: Michael Henders <hendersm@shaw.ca>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull tracing fixes from Steven Rostedt:
"A few clean ups and bug fixes:
- replace open coded "ARRAY_SIZE()" with macro
- updates to uprobes
- bug fix for perf event filter on error path"
* tag 'trace-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Enforce passing in filter=NULL to create_filter()
trace_uprobe: Simplify probes_seq_show()
trace_uprobe: Use %lx to display offset
tracing/uprobe: Add support for overlayfs
tracing: Use ARRAY_SIZE() macro instead of open coding it
Pull kdb updates from Jason Wessel:
- fix 2032 time access issues and new compiler warnings
- minor regression test cleanup
- formatting fixes for end user use of kdb
* tag 'for_linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
kdb: use memmove instead of overlapping memcpy
kdb: use ktime_get_mono_fast_ns() instead of ktime_get_ts()
kdb: bl: don't use tab character in output
kdb: drop newline in unknown command output
kdb: make "mdr" command repeat
kdb: use __ktime_get_real_seconds instead of __current_kernel_time
misc: kgdbts: Display progress of asynchronous tests
Pull more power management updates from Rafael Wysocki:
"These include one big-ticket item which is the rework of the idle loop
in order to prevent CPUs from spending too much time in shallow idle
states. It reduces idle power on some systems by 10% or more and may
improve performance of workloads in which the idle loop overhead
matters. This has been in the works for several weeks and it has been
tested and reviewed quite thoroughly.
Also included are changes that finalize the cpufreq cleanup moving
frequency table validation from drivers to the core, a few fixes and
cleanups of cpufreq drivers, a cpuidle documentation update and a PM
QoS core update to mark the expected switch fall-throughs in it.
Specifics:
- Rework the idle loop in order to prevent CPUs from spending too
much time in shallow idle states by making it stop the scheduler
tick before putting the CPU into an idle state only if the idle
duration predicted by the idle governor is long enough.
That required the code to be reordered to invoke the idle governor
before stopping the tick, among other things (Rafael Wysocki,
Frederic Weisbecker, Arnd Bergmann).
- Add the missing description of the residency sysfs attribute to the
cpuidle documentation (Prashanth Prakash).
- Finalize the cpufreq cleanup moving frequency table validation from
drivers to the core (Viresh Kumar).
- Fix a clock leak regression in the armada-37xx cpufreq driver
(Gregory Clement).
- Fix the initialization of the CPU performance data structures for
shared policies in the CPPC cpufreq driver (Shunyong Yang).
- Clean up the ti-cpufreq, intel_pstate and CPPC cpufreq drivers a
bit (Viresh Kumar, Rafael Wysocki).
- Mark the expected switch fall-throughs in the PM QoS core (Gustavo
Silva)"
* tag 'pm-4.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
tick-sched: avoid a maybe-uninitialized warning
cpufreq: Drop cpufreq_table_validate_and_show()
cpufreq: SCMI: Don't validate the frequency table twice
cpufreq: CPPC: Initialize shared perf capabilities of CPUs
cpufreq: armada-37xx: Fix clock leak
cpufreq: CPPC: Don't set transition_latency
cpufreq: ti-cpufreq: Use builtin_platform_driver()
cpufreq: intel_pstate: Do not include debugfs.h
PM / QoS: mark expected switch fall-throughs
cpuidle: Add definition of residency to sysfs documentation
time: hrtimer: Use timerqueue_iterate_next() to get to the next timer
nohz: Avoid duplication of code related to got_idle_tick
nohz: Gather tick_sched booleans under a common flag field
cpuidle: menu: Avoid selecting shallow states with stopped tick
cpuidle: menu: Refine idle state selection for running tick
sched: idle: Select idle state before stopping the tick
time: hrtimer: Introduce hrtimer_next_event_without()
time: tick-sched: Split tick_nohz_stop_sched_tick()
cpuidle: Return nohz hint from cpuidle_select()
jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC
...
This results in no change in structure size on 64-bit machines as it
fits in the padding between the gfp_t and the void *. 32-bit machines
will grow the structure from 8 to 12 bytes. Almost all radix trees are
protected with (at least) a spinlock, so as they are converted from
radix trees to xarrays, the data structures will shrink again.
Initialising the spinlock requires a name for the benefit of lockdep, so
RADIX_TREE_INIT() now needs to know the name of the radix tree it's
initialising, and so do IDR_INIT() and IDA_INIT().
Also add the xa_lock() and xa_unlock() family of wrappers to make it
easier to use the lock. If we could rely on -fplan9-extensions in the
compiler, we could avoid all of this syntactic sugar, but that wasn't
added until gcc 4.6.
Link: http://lkml.kernel.org/r/20180313132639.17387-8-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There's some inconsistency with what to set the output parameter filterp
when passing to create_filter(..., struct event_filter **filterp).
Whatever filterp points to, should be NULL when calling this function. The
create_filter() calls create_filter_start() with a pointer to a local
"filter" variable that is set to NULL. The create_filter_start() has a
WARN_ON() if the passed in pointer isn't pointing to a value set to NULL.
Ideally, create_filter() should pass the filterp variable it received to
create_filter_start() and not hide it as with a local variable, this allowed
create_filter() to fail, and not update the passed in filter, and the caller
of create_filter() then tried to free filter, which was never initialized to
anything, causing memory corruption.
Link: http://lkml.kernel.org/r/00000000000032a0c30569916870@google.com
Fixes: 80765597bc ("tracing: Rewrite filter logic to be simpler and faster")
Reported-by: syzbot+dadcc936587643d7f568@syzkaller.appspotmail.com
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
uprobes cannot successfully attach to binaries located in a directory
mounted with overlayfs.
To verify, create directories for mounting overlayfs
(upper,lower,work,merge), move some binary into merge/ and use readelf
to obtain some known instruction of the binary. I used /bin/true and the
entry instruction(0x13b0):
$ mount -t overlay overlay -o lowerdir=lower,upperdir=upper,workdir=work merge
$ cd /sys/kernel/debug/tracing
$ echo 'p:true_entry PATH_TO_MERGE/merge/true:0x13b0' > uprobe_events
$ echo 1 > events/uprobes/true_entry/enable
This returns 'bash: echo: write error: Input/output error' and dmesg
tells us 'event trace: Could not enable event true_entry'
This change makes create_trace_uprobe() look for the real inode of a
dentry. In the case of normal filesystems, this simplifies to just
returning the inode. In the case of overlayfs(and similar fs) we will
obtain the underlying dentry and corresponding inode, upon which uprobes
can successfully register.
Running the example above with the patch applied, we can see that the
uprobe is enabled and will output to trace as expected.
Link: http://lkml.kernel.org/r/20180410231030.2720-1-hmclauchlan@fb.com
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
It is useless to re-invent the ARRAY_SIZE macro so let's use it instead
of DATA_CNT.
Found with Coccinelle with the following semantic patch:
@r depends on (org || report)@
type T;
T[] E;
position p;
@@
(
(sizeof(E)@p /sizeof(*E))
|
(sizeof(E)@p /sizeof(E[...]))
|
(sizeof(E)@p /sizeof(T))
)
Link: http://lkml.kernel.org/r/20171016012250.26453-1-jeremy.lefaure@lse.epita.fr
Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
[ Removed useless include of kernel.h ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* pm-cpuidle:
tick-sched: avoid a maybe-uninitialized warning
cpuidle: Add definition of residency to sysfs documentation
time: hrtimer: Use timerqueue_iterate_next() to get to the next timer
nohz: Avoid duplication of code related to got_idle_tick
nohz: Gather tick_sched booleans under a common flag field
cpuidle: menu: Avoid selecting shallow states with stopped tick
cpuidle: menu: Refine idle state selection for running tick
sched: idle: Select idle state before stopping the tick
time: hrtimer: Introduce hrtimer_next_event_without()
time: tick-sched: Split tick_nohz_stop_sched_tick()
cpuidle: Return nohz hint from cpuidle_select()
jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC
sched: idle: Do not stop the tick before cpuidle_idle_call()
sched: idle: Do not stop the tick upfront in the idle loop
time: tick-sched: Reorganize idle tick management code
* pm-qos:
PM / QoS: mark expected switch fall-throughs
Pull tracing updates from Steven Rostedt:
"New features:
- Tom Zanussi's extended histogram work.
This adds the synthetic events to have histograms from multiple
event data Adds triggers "onmatch" and "onmax" to call the
synthetic events Several updates to the histogram code from this
- Allow way to nest ring buffer calls in the same context
- Allow absolute time stamps in ring buffer
- Rewrite of filter code parsing based on Al Viro's suggestions
- Setting of trace_clock to global if TSC is unstable (on boot)
- Better OOM handling when allocating large ring buffers
- Added initcall tracepoints (consolidated initcall_debug code with
them)
And other various fixes and clean ups"
* tag 'trace-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (68 commits)
init: Have initcall_debug still work without CONFIG_TRACEPOINTS
init, tracing: Have printk come through the trace events for initcall_debug
init, tracing: instrument security and console initcall trace events
init, tracing: Add initcall trace events
tracing: Add rcu dereference annotation for test func that touches filter->prog
tracing: Add rcu dereference annotation for filter->prog
tracing: Fixup logic inversion on setting trace_global_clock defaults
tracing: Hide global trace clock from lockdep
ring-buffer: Add set/clear_current_oom_origin() during allocations
ring-buffer: Check if memory is available before allocation
lockdep: Add print_irqtrace_events() to __warn
vsprintf: Do not preprocess non-dereferenced pointers for bprintf (%px and %pK)
tracing: Uninitialized variable in create_tracing_map_fields()
tracing: Make sure variable string fields are NULL-terminated
tracing: Add action comparisons when testing matching hist triggers
tracing: Don't add flag strings when displaying variable references
tracing: Fix display of hist trigger expressions containing timestamps
ftrace: Drop a VLA in module_exists()
tracing: Mention trace_clock=global when warning about unstable clocks
tracing: Default to using trace_global_clock if sched_clock is unstable
...
The use of bitfields seems to confuse gcc, leading to a false-positive
warning in all compiler versions:
kernel/time/tick-sched.c: In function 'tick_nohz_idle_exit':
kernel/time/tick-sched.c:538:2: error: 'now' may be used uninitialized in this function [-Werror=maybe-uninitialized]
This introduces a temporary variable to track the flags so gcc
doesn't have to evaluate twice, eliminating the code path that
leads to the warning.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85301
Fixes: 1cae544d42d2 ("nohz: Gather tick_sched booleans under a common flag field")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull networking fixes from David Miller:
1) The sockmap code has to free socket memory on close if there is
corked data, from John Fastabend.
2) Tunnel names coming from userspace need to be length validated. From
Eric Dumazet.
3) arp_filter() has to take VRFs properly into account, from Miguel
Fadon Perlines.
4) Fix oops in error path of tcf_bpf_init(), from Davide Caratti.
5) Missing idr_remove() in u32_delete_key(), from Cong Wang.
6) More syzbot stuff. Several use of uninitialized value fixes all
over, from Eric Dumazet.
7) Do not leak kernel memory to userspace in sctp, also from Eric
Dumazet.
8) Discard frames from unused ports in DSA, from Andrew Lunn.
9) Fix DMA mapping and reset/failover problems in ibmvnic, from Thomas
Falcon.
10) Do not access dp83640 PHY registers prematurely after reset, from
Esben Haabendal.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
vhost-net: set packet weight of tx polling to 2 * vq size
net: thunderx: rework mac addresses list to u64 array
inetpeer: fix uninit-value in inet_getpeer
dp83640: Ensure against premature access to PHY registers after reset
devlink: convert occ_get op to separate registration
ARM: dts: ls1021a: Specify TBIPA register address
net/fsl_pq_mdio: Allow explicit speficition of TBIPA address
ibmvnic: Do not reset CRQ for Mobility driver resets
ibmvnic: Fix failover case for non-redundant configuration
ibmvnic: Fix reset scheduler error handling
ibmvnic: Zero used TX descriptor counter on reset
ibmvnic: Fix DMA mapping mistakes
tipc: use the right skb in tipc_sk_fill_sock_diag()
sctp: sctp_sockaddr_af must check minimal addr length for AF_INET6
net: dsa: Discard frames from unused ports
sctp: do not leak kernel memory to user space
soreuseport: initialise timewait reuseport field
ipv4: fix uninit-value in ip_route_output_key_hash_rcu()
dccp: initialize ireq->ir_mark
net: fix uninit-value in __hw_addr_add_ex()
...
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use timerqueue_iterate_next() to get to the next timer in
__hrtimer_next_event_base() without browsing the timerqueue
details diredctly.
No intentional changes in functionality.
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Move the code setting ts->got_idle_tick into tick_sched_do_timer() to
avoid code duplication.
No intentional changes in functionality.
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Optimize the space and leave plenty of room for further flags.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
[ rjw: Do not use __this_cpu_read() to access tick_stopped and add
got_idle_tick to avoid overloading inidle ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If the tick isn't stopped, the target residency of the state selected
by the menu governor may be greater than the actual time to the next
tick and that means lost energy.
To avoid that, make tick_nohz_get_sleep_length() return the current
time to the next event (before stopping the tick) in addition to the
estimated one via an extra pointer argument and make menu_select()
use that value to refine the state selection when necessary.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
In order to address the issue with short idle duration predictions
by the idle governor after the scheduler tick has been stopped,
reorder the code in cpuidle_idle_call() so that the governor idle
state selection runs before tick_nohz_idle_go_idle() and use the
"nohz" hint returned by cpuidle_select() to decide whether or not
to stop the tick.
This isn't straightforward, because menu_select() invokes
tick_nohz_get_sleep_length() to get the time to the next timer
event and the number returned by the latter comes from
__tick_nohz_idle_stop_tick(). Fortunately, however, it is possible
to compute that number without actually stopping the tick and with
the help of the existing code.
Namely, tick_nohz_get_sleep_length() can be made call
tick_nohz_next_event(), introduced earlier, to get the time to the
next non-highres timer event. If that happens, tick_nohz_next_event()
need not be called by __tick_nohz_idle_stop_tick() again.
If it turns out that the scheduler tick cannot be stopped going
forward or the next timer event is too close for the tick to be
stopped, tick_nohz_get_sleep_length() can simply return the time to
the next event currently programmed into the corresponding clock
event device.
In addition to knowing the return value of tick_nohz_next_event(),
however, tick_nohz_get_sleep_length() needs to know the time to the
next highres timer event, but with the scheduler tick timer excluded,
which can be computed with the help of hrtimer_get_next_event().
That minimum of that number and the tick_nohz_next_event() return
value is the total time to the next timer event with the assumption
that the tick will be stopped. It can be returned to the idle
governor which can use it for predicting idle duration (under the
assumption that the tick will be stopped) and deciding whether or
not it makes sense to stop the tick before putting the CPU into the
selected idle state.
With the above, the sleep_length field in struct tick_sched is not
necessary any more, so drop it.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199227
Reported-by: Doug Smythies <dsmythies@telus.net>
Reported-by: Thomas Ilsche <thomas.ilsche@tu-dresden.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf 2018-04-09
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Two sockmap fixes: i) fix a potential warning when a socket with
pending cork data is closed by freeing the memory right when the
socket is closed instead of seeing still outstanding memory at
garbage collector time, ii) fix a NULL pointer deref in case of
duplicates release calls, so make sure to only reset the sk_prot
pointer when it's in a valid state to do so, both from John.
2) Fix a compilation warning in bpf_prog_attach_check_attach_type()
by moving the function under CONFIG_CGROUP_BPF ifdef since only
used there, from Anders.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>