IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
- Add sysadmin documentation for cpuidle (Rafael Wysocki).
- Make it possible to specify a cpuidle governor from kernel
command line, add new cpuidle state sysfs attributes for
governor evaluation, and improve the "polling" idle state
handling (Rafael Wysocki).
- Fix the handling of the "required-opps" DT property in the
operating performance points (OPP) framework, improve the
integration of it with the generic power domains (genpd)
framework, improve the handling of performance states in
them and clean up the idle states vs performance states
separation in genpd (Viresh Kumar, Ulf Hansson).
- Add a cpufreq driver called "qcom-hw" for Qualcomm SoCs using
a hardware engine to control CPU frequency transitions along
with DT bindings for it (Taniya Das).
- Fix an intel_pstate driver issue related to CPU offline and
update the documentation of it (Srinivas Pandruvada).
- Clean up the imx6q cpufreq driver (Anson Huang).
- Add SPDX license IDs to cpufreq schedutil governor files (Daniel
Lezcano).
- Switch over the runtime PM framework to using high-res timers
for device autosuspend to allow the control of it to be more
precise (Vincent Guittot).
- Disable non-wakeup ACPI GPEs during suspend-to-idle so that they
don't prevent the system from reaching the target low-power state
and simplify the suspend-to-idle handling on ACPI platforms
without full Low-Power S0 Idle (LPS0) support (Rafael Wysocki).
- Add system-wide suspend and resume support to the devfreq
framework (Lukasz Luba).
- Clean up the SmartReflex adaptive voltage scaling (AVS) driver and
add an SPDX license ID to it (Nishanth Menon, Uwe Kleine-König,
Thomas Meyer).
- Get rid of code duplication by using the DEFINE_SHOW_ATTRIBUTE
macro in some places, fix some DT node refcount leaks, and do
some other janitorial cleanups (Yangtao Li).
- Update the cpupower, intel_pstate_tracer and turbosat utilities
(Abhishek Goel, Doug Smythies, Len Brown).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJcHMOKAAoJEILEb/54YlRxtHUP/i4MePJYYIab3dW2WMtFC9wP
K3Z/qva5uiEampEZbjlATlqUrd0XxMU2YfkJ9N08rJ33bKnBi8wNp0BPiwZjYs77
lwjuEQ0Depz85LJ7W+dOjjxucdYv/H/ZXVK7VF2sfAfbpEbD7+RNTTa1+zdoh3Yq
AiCKH/fU+Yc+wOMcXxj8vWe8EecsuYfZUC/p/yJDv1uMaUyHTgiCAyE/S2JzvZcQ
mGFmly8b1hSkvCPOsipq/O8HUDtve8sOyZvFO5Up5BiNbDyhEHS4tDiKipbKgc/J
ljzP3pVATlnEZLFxyqg32PhuhwYB0MtN3+BZGTjLelqTmElTx9A2JvWBVk4jgaC9
ps97nUz1HO2dr1ERDyMnDtZFHFp24LVQcIB2RKfj/lqSq94IWQphmNZVF219jdyd
68wR2/fnkzTeDhJ1JvcJb5XKrrd/wq3IKfCJAd9AXVxy27BrCB0ryTsQFvXJ+AzV
n603yf3Sb7QaIC7Mofhj2WT3N/BQnzMzZhBJRA3EeX/Gd7TkLtQuvvHZQzcGbgOY
GGm6WKGWgEUp3YIJIz5ihUuy9SUETCvR2PqGZfo+Wmc7F77j/5FY5ejRt3llkeFn
8KaVQH9YIRjffBr1nclXMHKaRtf/JVB9SgVmDvA6Sh3Ac5/KBy3+IgwiJ2z5dunr
tYV/QJ22r15xEBITTSJ7
=WhM8
-----END PGP SIGNATURE-----
Merge tag 'pm-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These add sysadmin documentation for cpuidle, extend the cpuidle
subsystem somewhat, improve the handling of performance states in the
generic power domains (genpd) and operating performance points (OPP)
frameworks, add a new cpufreq driver for Qualcomm SoCs, update some
other cpufreq drivers, switch over the runtime PM framework to using
high-res timers for device autosuspend, fix a problem with
suspend-to-idle on ACPI-based platforms, add system-wide suspend and
resume handling to the devfreq framework, do some janitorial cleanups
all over and update some utilities.
Specifics:
- Add sysadmin documentation for cpuidle (Rafael Wysocki).
- Make it possible to specify a cpuidle governor from kernel command
line, add new cpuidle state sysfs attributes for governor
evaluation, and improve the "polling" idle state handling (Rafael
Wysocki).
- Fix the handling of the "required-opps" DT property in the
operating performance points (OPP) framework, improve the
integration of it with the generic power domains (genpd) framework,
improve the handling of performance states in them and clean up the
idle states vs performance states separation in genpd (Viresh
Kumar, Ulf Hansson).
- Add a cpufreq driver called "qcom-hw" for Qualcomm SoCs using a
hardware engine to control CPU frequency transitions along with DT
bindings for it (Taniya Das).
- Fix an intel_pstate driver issue related to CPU offline and update
the documentation of it (Srinivas Pandruvada).
- Clean up the imx6q cpufreq driver (Anson Huang).
- Add SPDX license IDs to cpufreq schedutil governor files (Daniel
Lezcano).
- Switch over the runtime PM framework to using high-res timers for
device autosuspend to allow the control of it to be more precise
(Vincent Guittot).
- Disable non-wakeup ACPI GPEs during suspend-to-idle so that they
don't prevent the system from reaching the target low-power state
and simplify the suspend-to-idle handling on ACPI platforms without
full Low-Power S0 Idle (LPS0) support (Rafael Wysocki).
- Add system-wide suspend and resume support to the devfreq framework
(Lukasz Luba).
- Clean up the SmartReflex adaptive voltage scaling (AVS) driver and
add an SPDX license ID to it (Nishanth Menon, Uwe Kleine-König,
Thomas Meyer).
- Get rid of code duplication by using the DEFINE_SHOW_ATTRIBUTE
macro in some places, fix some DT node refcount leaks, and do some
other janitorial cleanups (Yangtao Li).
- Update the cpupower, intel_pstate_tracer and turbosat utilities
(Abhishek Goel, Doug Smythies, Len Brown)"
* tag 'pm-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (54 commits)
PM / Domains: remove define_genpd_open_function() and define_genpd_debugfs_fops()
PM-runtime: Switch autosuspend over to using hrtimers
cpufreq: qcom-hw: Add support for QCOM cpufreq HW driver
dt-bindings: cpufreq: Introduce QCOM cpufreq firmware bindings
ACPI: PM: Loop in full LPS0 mode only
ACPI: EC / PM: Disable non-wakeup GPEs for suspend-to-idle
tools/power/x86/intel_pstate_tracer: Fix non root execution for post processing a trace file
tools/power turbostat: consolidate duplicate model numbers
tools/power turbostat: fix goldmont C-state limit decoding
PM / Domains: Propagate performance state updates
PM / Domains: Factorize dev_pm_genpd_set_performance_state()
PM / Domains: Save OPP table pointer in genpd
OPP: Don't return 0 on error from of_get_required_opp_performance_state()
OPP: Add dev_pm_opp_xlate_performance_state() helper
OPP: Improve _find_table_of_opp_np()
PM / Domains: Make genpd performance states orthogonal to the idlestates
PM / sleep: convert to DEFINE_SHOW_ATTRIBUTE
cpuidle: Add 'above' and 'below' idle state metrics
PM / AVS: SmartReflex: Switch to SPDX Licence ID
PM / AVS: SmartReflex: NULL check before some freeing functions is not needed
...
There are several locations that compare constants to the beginning of
string variables to determine what commands should be done, then the
constant length is used to index into the string. This is error prone as the
hard coded numbers have to match the size of the constants. Instead, use the
len returned from str_has_prefix() and remove the open coded string length
sizes.
Cc: Joe Perches <joe@perches.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org> (for trace_probe part)
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
As str_has_prefix() returns the length on match, we can use that for the
updating of the string pointer instead of recalculating the prefix size.
Cc: Tom Zanussi <zanussi@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
There are several instances of strncmp(str, "const", 123), where 123 is the
strlen of the const string to check if "const" is the prefix of str. But
this can be error prone. Use str_has_prefix() instead.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The tracing histogram code contains a lot of instances of the construct:
strncmp(str, "const", sizeof("const") - 1)
This can be prone to bugs due to typos or bad cut and paste. Use the
str_has_prefix() helper macro instead that removes the need for having two
copies of the constant string.
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
In commit 478409dd683d ("tracing: Add hook to function tracing for other
subsystems to use"), a new function ‘ftrace_exports’ was added. Since
this function can be made static, make it so.
Silence the following warning triggered using W=1:
kernel/trace/trace.c:2451:6: warning: no previous prototype for ‘ftrace_exports’ [-Wmissing-prototypes]
Link: http://lkml.kernel.org/r/20180516193012.25390-1-malat@debian.org
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
trace_seq_printf(..., "%s", ...) can be done with trace_seq_puts()
instead, avoiding printf overhead. In the second instance, the string
we're copying was just created from an snprintf() to a stack buffer, so
we might as well do that printf directly. This naturally leads to moving
the declaration of the str buffer inside the CONFIG_KALLSYMS guard,
which in turn will make gcc inline the function for !CONFIG_KALLSYMS (it
only has a single caller, but the huge stack frame seems to make gcc not
inline it for CONFIG_KALLSYMS).
Link: http://lkml.kernel.org/r/20181029223542.26175-4-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Building with -Wformat-nonliteral, gcc complains
kernel/trace/trace_output.c: In function ‘seq_print_sym’:
kernel/trace/trace_output.c:356:3: warning: format not a string literal, argument types not checked [-Wformat-nonliteral]
trace_seq_printf(s, fmt, name);
But seq_print_sym only has a single caller which passes "%s" as fmt, so
we might as well just use that directly. That also paves the way for
further cleanups that will actually make that format string go away
entirely.
Link: http://lkml.kernel.org/r/20181029223542.26175-3-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
These two functions are nearly identical, so we can avoid some code
duplication by moving the conditional into a common implementation.
Link: http://lkml.kernel.org/r/20181029223542.26175-2-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
All var_refs are now handled uniformly and there's no reason to treat
the synth_refs in a special way now, so remove them and associated
functions.
Link: http://lkml.kernel.org/r/b4d3470526b8f0426dcec125399dad9ad9b8589d.1545161087.git.tom.zanussi@linux.intel.com
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Since every var ref for a trigger has an entry in the var_ref[] array,
use that to destroy the var_refs, instead of piecemeal via the field
expressions.
This allows us to avoid having to keep and treat differently separate
lists for the action-related references, which future patches will
remove.
Link: http://lkml.kernel.org/r/fad1a164f0e257c158e70d6eadbf6c586e04b2a2.1545161087.git.tom.zanussi@linux.intel.com
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Have create_var_ref() manage the hist trigger's var_ref list, rather
than having similar code doing it in multiple places. This cleans up
the code and makes sure var_refs are always accounted properly.
Also, document the var_ref-related functions to make what their
purpose clearer.
Link: http://lkml.kernel.org/r/05ddae93ff514e66fc03897d6665231892939913.1545161087.git.tom.zanussi@linux.intel.com
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Since all the variable reference hist_fields are collected into
hist_data->var_refs[] array, there's no need to go through all the
fields looking for them, or in separate arrays like synth_var_refs[],
which will be going away soon anyway.
This also allows us to get rid of some unnecessary code and functions
currently used for the same purpose.
Link: http://lkml.kernel.org/r/1545246556.4239.7.camel@gmail.com
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
There's no need to use strlen() for static strings when the length is
already known, so update trace_events_hist.c with sizeof() for those
cases.
Link: http://lkml.kernel.org/r/e3e754f2bd18e56eaa8baf79bee619316ebf4cfc.1545161087.git.tom.zanussi@linux.intel.com
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The function ftrace_graph_get_ret_stack() takes a task struct descriptor but
uses current as the task to perform the operations on. In pretty much all
cases the task decriptor is the same as current, so this wasn't an issue.
But there is a case in the ARM architecture that passes in a task that is
not current, and expects a result from that task, and this code breaks it.
Fixes: 51584396cff5 ("arm64: Use ftrace_graph_get_ret_stack() instead of curr_ret_stack")
Reported-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Commit 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting") will
result in fork failing if allocating a kernel stack for a task in
dup_task_struct exceeds the kernel memory allowance for that cgroup.
Unfortunately, it also results in a crash.
This is due to the code jumping to free_stack and calling
free_thread_stack when the memcg kernel stack charge fails, but without
tsk->stack pointing at the freshly allocated stack.
This in turn results in the vfree_atomic in free_thread_stack oopsing
with a backtrace like this:
#5 [ffffc900244efc88] die at ffffffff8101f0ab
#6 [ffffc900244efcb8] do_general_protection at ffffffff8101cb86
#7 [ffffc900244efce0] general_protection at ffffffff818ff082
[exception RIP: llist_add_batch+7]
RIP: ffffffff8150d487 RSP: ffffc900244efd98 RFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88085ef55980 RCX: 0000000000000000
RDX: ffff88085ef55980 RSI: 343834343531203a RDI: 343834343531203a
RBP: ffffc900244efd98 R8: 0000000000000001 R9: ffff8808578c3600
R10: 0000000000000000 R11: 0000000000000001 R12: ffff88029f6c21c0
R13: 0000000000000286 R14: ffff880147759b00 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffffc900244efda0] vfree_atomic at ffffffff811df2c7
#9 [ffffc900244efdb8] copy_process at ffffffff81086e37
#10 [ffffc900244efe98] _do_fork at ffffffff810884e0
#11 [ffffc900244eff10] sys_vfork at ffffffff810887ff
#12 [ffffc900244eff20] do_syscall_64 at ffffffff81002a43
RIP: 000000000049b948 RSP: 00007ffcdb307830 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000896030 RCX: 000000000049b948
RDX: 0000000000000000 RSI: 00007ffcdb307790 RDI: 00000000005d7421
RBP: 000000000067370f R8: 00007ffcdb3077b0 R9: 000000000001ed00
R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000040
R13: 000000000000000f R14: 0000000000000000 R15: 000000000088d018
ORIG_RAX: 000000000000003a CS: 0033 SS: 002b
The simplest fix is to assign tsk->stack right where it is allocated.
Link: http://lkml.kernel.org/r/20181214231726.7ee4843c@imladris.surriel.com
Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting")
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull timer fix from Ingo Molnar:
"Fix a division by zero crash in the posix-timers code"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-timers: Fix division by zero bug
Pull futex fix from Ingo Molnar:
"A single fix for a robust futexes race between sys_exit() and
sys_futex_lock_pi()"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Cure exit race
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-12-21
The following pull-request contains BPF updates for your *net-next* tree.
There is a merge conflict in test_verifier.c. Result looks as follows:
[...]
},
{
"calls: cross frame pruning",
.insns = {
[...]
.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
.errstr_unpriv = "function calls to other bpf functions are allowed for root only",
.result_unpriv = REJECT,
.errstr = "!read_ok",
.result = REJECT,
},
{
"jset: functional",
.insns = {
[...]
{
"jset: unknown const compare not taken",
.insns = {
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
BPF_FUNC_get_prandom_u32),
BPF_JMP_IMM(BPF_JSET, BPF_REG_0, 1, 1),
BPF_LDX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, 0),
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
.errstr_unpriv = "!read_ok",
.result_unpriv = REJECT,
.errstr = "!read_ok",
.result = REJECT,
},
[...]
{
"jset: range",
.insns = {
[...]
},
.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
.result_unpriv = ACCEPT,
.result = ACCEPT,
},
The main changes are:
1) Various BTF related improvements in order to get line info
working. Meaning, verifier will now annotate the corresponding
BPF C code to the error log, from Martin and Yonghong.
2) Implement support for raw BPF tracepoints in modules, from Matt.
3) Add several improvements to verifier state logic, namely speeding
up stacksafe check, optimizations for stack state equivalence
test and safety checks for liveness analysis, from Alexei.
4) Teach verifier to make use of BPF_JSET instruction, add several
test cases to kselftests and remove nfp specific JSET optimization
now that verifier has awareness, from Jakub.
5) Improve BPF verifier's slot_type marking logic in order to
allow more stack slot sharing, from Jiong.
6) Add sk_msg->size member for context access and add set of fixes
and improvements to make sock_map with kTLS usable with openssl
based applications, from John.
7) Several cleanups and documentation updates in bpftool as well as
auto-mount of tracefs for "bpftool prog tracelog" command,
from Quentin.
8) Include sub-program tags from now on in bpf_prog_info in order to
have a reliable way for user space to get all tags of the program
e.g. needed for kallsyms correlation, from Song.
9) Add BTF annotations for cgroup_local_storage BPF maps and
implement bpf fs pretty print support, from Roman.
10) Fix bpftool in order to allow for cross-compilation, from Ivan.
11) Update of bpftool license to GPLv2-only + BSD-2-Clause in order
to be compatible with libbfd and allow for Debian packaging,
from Jakub.
12) Remove an obsolete prog->aux sanitation in dump and get rid of
version check for prog load, from Daniel.
13) Fix a memory leak in libbpf's line info handling, from Prashant.
14) Fix cpumap's frame alignment for build_skb() so that skb_shared_info
does not get unaligned, from Jesper.
15) Fix test_progs kselftest to work with older compilers which are less
smart in optimizing (and thus throwing build error), from Stanislav.
16) Cleanup and simplify AF_XDP socket teardown, from Björn.
17) Fix sk lookup in BPF kselftest's test_sock_addr with regards
to netns_id argument, from Andrey.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The frame_size passed to build_skb must be aligned, else it is
possible that the embedded struct skb_shared_info gets unaligned.
For correctness make sure that xdpf->headroom in included in the
alignment. No upstream drivers can hit this, as all XDP drivers provide
an aligned headroom. This was discovered when playing with implementing
XDP support for mvneta, which have a 2 bytes DSA header, and this
Marvell ARM64 platform didn't like doing atomic operations on an
unaligned skb_shinfo(skb)->dataref addresses.
Fixes: 1c601d829ab0 ("bpf: cpumap xdp_buff to skb conversion and allocation")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Lots of conflicts, by happily all cases of overlapping
changes, parallel adds, things of that nature.
Thanks to Stephen Rothwell, Saeed Mahameed, and others
for their guidance in these resolutions.
Signed-off-by: David S. Miller <davem@davemloft.net>
The cleanup in commit 356da6d0cde3 ("dma-mapping: bypass indirect calls
for dma-direct") accidentally inverted the logic in the check for the
presence of a ->dma_supported() callback. Switch this back to the way it
was to prevent a crash on boot.
Fixes: 356da6d0cde3 ("dma-mapping: bypass indirect calls for dma-direct")
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reorder the calls to check_max_stack_depth() and sanitize_dead_code()
to separate functions which can rewrite instructions from pure checks.
No functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Some JITs (nfp) try to optimize code on their own. It could make
sense in case of BPF_JSET instruction which is currently not interpreted
by the verifier, meaning for instance that dead could would not be
detected if it was under BPF_JSET branch.
Teach the verifier basics of BPF_JSET, JIT optimizations will be
removed shortly.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pull networking fixes from David Miller:
1) Off by one in netlink parsing of mac802154_hwsim, from Alexander
Aring.
2) nf_tables RCU usage fix from Taehee Yoo.
3) Flow dissector needs nhoff and thoff clamping, from Stanislav
Fomichev.
4) Missing sin6_flowinfo initialization in SCTP, from Xin Long.
5) Spectrev1 in ipmr and ip6mr, from Gustavo A. R. Silva.
6) Fix r8169 crash when DEBUG_SHIRQ is enabled, from Heiner Kallweit.
7) Fix SKB leak in rtlwifi, from Larry Finger.
8) Fix state pruning in bpf verifier, from Jakub Kicinski.
9) Don't handle completely duplicate fragments as overlapping, from
Michal Kubecek.
10) Fix memory corruption with macb and 64-bit DMA, from Anssi Hannula.
11) Fix TCP fallback socket release in smc, from Myungho Jung.
12) gro_cells_destroy needs to napi_disable, from Lorenzo Bianconi.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (130 commits)
rds: Fix warning.
neighbor: NTF_PROXY is a valid ndm_flag for a dump request
net: mvpp2: fix the phylink mode validation
net/sched: cls_flower: Remove old entries from rhashtable
net/tls: allocate tls context using GFP_ATOMIC
iptunnel: make TUNNEL_FLAGS available in uapi
gro_cell: add napi_disable in gro_cells_destroy
lan743x: Remove MAC Reset from initialization
net/mlx5e: Remove the false indication of software timestamping support
net/mlx5: Typo fix in del_sw_hw_rule
net/mlx5e: RX, Fix wrong early return in receive queue poll
ipv6: explicitly initialize udp6_addr in udp_sock_create6()
bnxt_en: Fix ethtool self-test loopback.
net/rds: remove user triggered WARN_ON in rds_sendmsg
net/rds: fix warn in rds_message_alloc_sgs
ath10k: skip sending quiet mode cmd for WCN3990
mac80211: free skb fraglist before freeing the skb
nl80211: fix memory leak if validate_pae_over_nl80211() fails
net/smc: fix TCP fallback socket release
vxge: ensure data0 is initialized in when fetching firmware version information
...
If we want to map memory from the DMA allocator to userspace it must be
zeroed at allocation time to prevent stale data leaks. We already do
this on most common architectures, but some architectures don't do this
yet, fix them up, either by passing GFP_ZERO when we use the normal page
allocator or doing a manual memset otherwise.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Sam Ravnborg <sam@ravnborg.org> [sparc]
This patch rejects a line_info if the bpf insn code referred by
line_info.insn_off is 0. F.e. a broken userspace tool might generate
a line_info.insn_off that points to the second 8 bytes of a BPF_LD_IMM64.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
There is no reason that modules should not be able
to use this, and NFS will need it when converted to
use 'struct cred'.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Sometimes we want to opportunistically get a
ref to a cred in an rcu_read_lock protected section.
get_task_cred() does this, and NFS does as similar thing
with its own credential structures.
To prepare for NFS converting to use 'struct cred' more
uniformly, define get_cred_rcu(), and use it in
get_task_cred().
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
NFS needs to compare to credentials, to see if they can
be treated the same w.r.t. filesystem access. Sometimes
an ordering is needed when credentials are used as a key
to an rbtree.
NFS currently has its own private credential management from
before 'struct cred' existed. To move it over to more consistent
use of 'struct cred' we need a comparison function.
This patch adds that function.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Devices which use managed interrupts usually have two classes of
interrupts:
- Interrupts for multiple device queues
- Interrupts for general device management
Currently both classes are treated the same way, i.e. as managed
interrupts. The general interrupts get the default affinity mask assigned
while the device queue interrupts are spread out over the possible CPUs.
Treating the general interrupts as managed is both a limitation and under
certain circumstances a bug. Assume the following situation:
default_irq_affinity = 4..7
So if CPUs 4-7 are offlined, then the core code will shut down the device
management interrupts because the last CPU in their affinity mask went
offline.
It's also a limitation because it's desired to allow manual placement of
the general device interrupts for various reasons. If they are marked
managed then the interrupt affinity setting from both user and kernel space
is disabled. That limitation was reported by Kashyap and Sumit.
Expand struct irq_affinity_desc with a new bit 'is_managed' which is set
for truly managed interrupts (queue interrupts) and cleared for the general
device interrupts.
[ tglx: Simplify code and massage changelog ]
Reported-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reported-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Dou Liyang <douliyangs@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-pci@vger.kernel.org
Cc: shivasharan.srikanteshwara@broadcom.com
Cc: ming.lei@redhat.com
Cc: hch@lst.de
Cc: bhelgaas@google.com
Cc: douliyang1@huawei.com
Link: https://lkml.kernel.org/r/20181204155122.6327-3-douliyangs@gmail.com
The interrupt affinity management uses straight cpumask pointers to convey
the automatically assigned affinity masks for managed interrupts. The core
interrupt descriptor allocation also decides based on the pointer being non
NULL whether an interrupt is managed or not.
Devices which use managed interrupts usually have two classes of
interrupts:
- Interrupts for multiple device queues
- Interrupts for general device management
Currently both classes are treated the same way, i.e. as managed
interrupts. The general interrupts get the default affinity mask assigned
while the device queue interrupts are spread out over the possible CPUs.
Treating the general interrupts as managed is both a limitation and under
certain circumstances a bug. Assume the following situation:
default_irq_affinity = 4..7
So if CPUs 4-7 are offlined, then the core code will shut down the device
management interrupts because the last CPU in their affinity mask went
offline.
It's also a limitation because it's desired to allow manual placement of
the general device interrupts for various reasons. If they are marked
managed then the interrupt affinity setting from both user and kernel space
is disabled.
To remedy that situation it's required to convey more information than the
cpumasks through various interfaces related to interrupt descriptor
allocation.
Instead of adding yet another argument, create a new data structure
'irq_affinity_desc' which for now just contains the cpumask. This struct
can be expanded to convey auxilliary information in the next step.
No functional change, just preparatory work.
[ tglx: Simplified logic and clarified changelog ]
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Dou Liyang <douliyangs@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-pci@vger.kernel.org
Cc: kashyap.desai@broadcom.com
Cc: shivasharan.srikanteshwara@broadcom.com
Cc: sumit.saxena@broadcom.com
Cc: ming.lei@redhat.com
Cc: hch@lst.de
Cc: douliyang1@huawei.com
Link: https://lkml.kernel.org/r/20181204155122.6327-2-douliyangs@gmail.com
Current btf internal verbose logger logs fwd type as
[2] FWD A type_id=0
where A is the type name.
Commit 9d5f9f701b18 ("bpf: btf: fix struct/union/fwd types
with kind_flag") introduced kind_flag which can be used
to distinguish whether a forward type is a struct or
union.
Also, "type_id=0" does not carry any meaningful
information for fwd type as btf_type.type = 0 is simply
enforced during btf verification and is not used
anywhere else.
This commit changed the log to
[2] FWD A struct
if kind_flag = 0, or
[2] FWD A union
if kind_flag = 1.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Verifier is supposed to support sharing stack slot allocated to ptr with
SCALAR_VALUE for privileged program. However this doesn't happen for some
cases.
The reason is verifier is not clearing slot_type STACK_SPILL for all bytes,
it only clears part of them, while verifier is using:
slot_type[0] == STACK_SPILL
as a convention to check one slot is ptr type.
So, the consequence of partial clearing slot_type is verifier could treat a
partially overridden ptr slot, which should now be a SCALAR_VALUE slot,
still as ptr slot, and rejects some valid programs.
Before this patch, test_xdp_noinline.o under bpf selftests, bpf_lxc.o and
bpf_netdev.o under Cilium bpf repo, when built with -mattr=+alu32 are
rejected due to this issue. After this patch, they all accepted.
There is no processed insn number change before and after this patch on
Cilium bpf programs.
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stefan reported, that the glibc tst-robustpi4 test case fails
occasionally. That case creates the following race between
sys_exit() and sys_futex_lock_pi():
CPU0 CPU1
sys_exit() sys_futex()
do_exit() futex_lock_pi()
exit_signals(tsk) No waiters:
tsk->flags |= PF_EXITING; *uaddr == 0x00000PID
mm_release(tsk) Set waiter bit
exit_robust_list(tsk) { *uaddr = 0x80000PID;
Set owner died attach_to_pi_owner() {
*uaddr = 0xC0000000; tsk = get_task(PID);
} if (!tsk->flags & PF_EXITING) {
... attach();
tsk->flags |= PF_EXITPIDONE; } else {
if (!(tsk->flags & PF_EXITPIDONE))
return -EAGAIN;
return -ESRCH; <--- FAIL
}
ESRCH is returned all the way to user space, which triggers the glibc test
case assert. Returning ESRCH unconditionally is wrong here because the user
space value has been changed by the exiting task to 0xC0000000, i.e. the
FUTEX_OWNER_DIED bit is set and the futex PID value has been cleared. This
is a valid state and the kernel has to handle it, i.e. taking the futex.
Cure it by rereading the user space value when PF_EXITING and PF_EXITPIDONE
is set in the task which 'owns' the futex. If the value has changed, let
the kernel retry the operation, which includes all regular sanity checks
and correctly handles the FUTEX_OWNER_DIED case.
If it hasn't changed, then return ESRCH as there is no way to distinguish
this case from malfunctioning user space. This happens when the exiting
task did not have a robust list, the robust list was corrupted or the user
space value in the futex was simply bogus.
Reported-by: Stefan Liebler <stli@linux.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200467
Link: https://lkml.kernel.org/r/20181210152311.986181245@linutronix.de
Distributions build drivers as modules, including network and filesystem
drivers which export numerous tracepoints. This enables
bpf(BPF_RAW_TRACEPOINT_OPEN) to attach to those tracepoints.
Signed-off-by: Matt Mullins <mmullins@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
- A bunch of new irqchip drivers (RDA8810PL, Madera, imx-irqsteer)
- Updates for new (and old) platforms (i.MX8MQ, F1C100s)
- A number of SPDX cleanups
- A workaround for a very broken GICv3 implementation
- A platform-msi fix
- Various cleanups
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAlwZI8cVHG1hcmMuenlu
Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDyokP+gKoKbZMc1E7dX6WxUrKh2N+fMJF
uVbuGF2s57CLG955YNuyo8BK4meWJIHGO3JahwE8I/9eu0G7PaudYvpZgP7s/sxD
XHLWFVHB1mq4lExMcluT0jG4ZpX7EKvYB1KGqgYM1ScOS9Uubb4ZG9T5GPhUT/YM
w1BAtHaZmCAg8d0wNPUMaAFc9Bd2B9Z1C8nwS+wpdJRxYxE9x8BES42r95rbXCG6
5Cq2ol/NbF4RbFodel4YdiAIKfrQtXyQ3N3twC5GRXln4XLjUfzs4mA5rxLLoeGZ
2UGXeIk0GcokSWF/e+0p3tQDWKwdbqoBhbRbqk7u5ZWuEWTRf4Zot3IlCVpJAMM3
iRw5XChWxovC+/oqgin4sp1gNpSRgf5mMvR1EauR5DTVtwlOjUBKaPEyKLrPITOo
B42EJugJ94J0YVdT9RUJsOSXIdOiYFE6I9F4i/XioLYq5FItBB56/81ARZgEncpg
FEdtseCCtRC3WWGzghxZsSzCW3iGi8wdddRdZmOXCNdPtH03TZg0dGPS+KIn8Soh
eVSGImV/4efN6hh6fSryeR02fYT3DKGgDQUiV4e/1SOSzxy6VjjrOh48tB8qn/M7
NbFZMqDKnltsXT2C+bh6zjhorbVCkj8AEtx1oF0d7iIyBxor3eHUelTz6VglNlLq
RFetH+Yjh9nt9ReO
=1Mk9
-----END PGP SIGNATURE-----
Merge tag 'irqchip-4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates from Marc Zyngier:
- A bunch of new irqchip drivers (RDA8810PL, Madera, imx-irqsteer)
- Updates for new (and old) platforms (i.MX8MQ, F1C100s)
- A number of SPDX cleanups
- A workaround for a very broken GICv3 implementation
- A platform-msi fix
- Various cleanups
The last users were removed a while ago since everyone moved to ktime_t,
so we can remove the two unused interfaces for old timespec structures.
With those two gone, set_normalized_timespec() is also unused, so
remove that as well.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: John Stultz <john.stultz@linaro.org>
After arch/sh has removed the last reference to these functions,
we can remove them completely and just rely on the 64-bit time_t
based versions. This cleans up a rather ugly use of __weak
functions.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: John Stultz <john.stultz@linaro.org>
Now that 32-bit architectures have two variants of sys_rt_sigtimedwaid()
for 32-bit and 64-bit time_t, we also need to have a second compat system
call entry point on the corresponding 64-bit architectures.
The traditional system call keeps getting handled
by compat_sys_rt_sigtimedwait(), and this adds a new
compat_sys_rt_sigtimedwait_time64() that differs only in the timeout
argument type.
The naming remains a bit asymmetric for the moment. Ideally we would
want to have compat_sys_rt_sigtimedwait_time32() for the old version
and compat_sys_rt_sigtimedwait() for the new one to mirror the names
of the native entry points, but renaming the existing system call
tables causes unnecessary churn. I would suggest renaming all such
system calls together at a later point.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Once sys_rt_sigtimedwait() gets changed to a 64-bit time_t, we have
to provide compatibility support for existing binaries.
An earlier version of this patch reused the compat_sys_rt_sigtimedwait
entry point to avoid code duplication, but this newer approach
duplicates the existing native entry point instead, which seems
a bit cleaner.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
recvmmsg() takes two arguments to pointers of structures that differ
between 32-bit and 64-bit architectures: mmsghdr and timespec.
For y2038 compatbility, we are changing the native system call from
timespec to __kernel_timespec with a 64-bit time_t (in another patch),
and use the existing compat system call on both 32-bit and 64-bit
architectures for compatibility with traditional 32-bit user space.
As we now have two variants of recvmmsg() for 32-bit tasks that are both
different from the variant that we use on 64-bit tasks, this means we
also require two compat system calls!
The solution I picked is to flip things around: The existing
compat_sys_recvmmsg() call gets moved from net/compat.c into net/socket.c
and now handles the case for old user space on all architectures that
have set CONFIG_COMPAT_32BIT_TIME. A new compat_sys_recvmmsg_time64()
call gets added in the old place for 64-bit architectures only, this
one handles the case of a compat mmsghdr structure combined with
__kernel_timespec.
In the indirect sys_socketcall(), we now need to call either
do_sys_recvmmsg() or __compat_sys_recvmmsg(), depending on what kind of
architecture we are on. For compat_sys_socketcall(), no such change is
needed, we always call __compat_sys_recvmmsg().
I decided to not add a new SYS_RECVMMSG_TIME64 socketcall: Any libc
implementation for 64-bit time_t will need significant changes including
an updated asm/unistd.h, and it seems better to consistently use the
separate syscalls that configuration, leaving the socketcall only for
backward compatibility with 32-bit time_t based libc.
The naming is asymmetric for the moment, so both existing syscalls
entry points keep their names, while the new ones are recvmmsg_time32
and compat_recvmmsg_time64 respectively. I expect that we will rename
the compat syscalls later as we start using generated syscall tables
everywhere and add these entry points.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Go over the IRQ subsystem source code (including irqchip drivers) and
fix common typos in comments.
No change in functionality intended.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org