IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
In bpf_tcp_sendmsg() the sk_alloc_sg() may fail. In the case of
ENOMEM, it may also mean that we've partially filled the scatterlist
entries with pages. Later jumping to sk_stream_wait_memory()
we could further fail with an error for several reasons, however
we miss to call free_start_sg() if the local sk_msg_buff was used.
Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
While working on bpf_tcp_sendmsg() code, I noticed that when a
sk->sk_err is set we error out with err = sk->sk_err. However
this is problematic since sk->sk_err is a positive error value
and therefore we will neither go into sk_stream_error() nor will
we report an error back to user space. I had this case with EPIPE
and user space was thinking sendmsg() succeeded since EPIPE is
a positive value, thinking we submitted 32 bytes. Fix it by
negating the sk->sk_err value.
Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-08-07
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add cgroup local storage for BPF programs, which provides a fast
accessible memory for storing various per-cgroup data like number
of transmitted packets, etc, from Roman.
2) Support bpf_get_socket_cookie() BPF helper in several more program
types that have a full socket available, from Andrey.
3) Significantly improve the performance of perf events which are
reported from BPF offload. Also convert a couple of BPF AF_XDP
samples overto use libbpf, both from Jakub.
4) seg6local LWT provides the End.DT6 action, which allows to
decapsulate an outer IPv6 header containing a Segment Routing Header.
Adds this action now to the seg6local BPF interface, from Mathieu.
5) Do not mark dst register as unbounded in MOV64 instruction when
both src and dst register are the same, from Arthur.
6) Define u_smp_rmb() and u_smp_wmb() to their respective barrier
instructions on arm64 for the AF_XDP sample code, from Brian.
7) Convert the tcp_client.py and tcp_server.py BPF selftest scripts
over from Python 2 to Python 3, from Jeremy.
8) Enable BTF build flags to the BPF sample code Makefile, from Taeung.
9) Remove an unnecessary rcu_read_lock() in run_lwt_bpf(), from Taehee.
10) Several improvements to the README.rst from the BPF documentation
to make it more consistent with RST format, from Tobin.
11) Replace all occurrences of strerror() by calls to strerror_r()
in libbpf and fix a FORTIFY_SOURCE build error along with it,
from Thomas.
12) Fix a bug in bpftool's get_btf() function to correctly propagate
an error via PTR_ERR(), from Yue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
__cgroup_bpf_attach() and __cgroup_bpf_detach() functions have
a good amount of duplicated code, which is possible to eliminate
by introducing the update_effective_progs() helper function.
The update_effective_progs() calls compute_effective_progs()
and then in case of success it calls activate_effective_progs()
for each descendant cgroup. In case of failure (OOM), it releases
allocated prog arrays and return the error code.
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Josh reported that the late SMT evaluation in cpu_smt_state_init() sets
cpu_smt_control to CPU_SMT_NOT_SUPPORTED in case that 'nosmt' was supplied
on the kernel command line as it cannot differentiate between SMT disabled
by BIOS and SMT soft disable via 'nosmt'. That wreckages the state and
makes the sysfs interface unusable.
Rework this so that during bringup of the non boot CPUs the availability of
SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a
'primary' thread then set the local cpu_smt_available marker and evaluate
this explicitely right after the initial SMP bringup has finished.
SMT evaulation on x86 is a trainwreck as the firmware has all the
information _before_ booting the kernel, but there is no interface to query
it.
Fixes: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")
Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Recently we tried to make the preemptirqsoff tracer to use irqsoff
tracepoint probes. However this causes issues as reported by Masami:
[2.271078] Testing tracer preemptirqsoff: .. no entries found ..FAILED!
[2.381015] WARNING: CPU: 0 PID: 1 at /home/mhiramat/ksrc/linux/kernel/
trace/trace.c:1512 run_tracer_selftest+0xf3/0x154
This is due to the tracepoint code increasing the preempt nesting count
by calling an additional preempt_disable before calling into the
preemptoff tracer which messes up the preempt_count() check in
tracer_hardirqs_off.
To fix this, make the irqsoff tracer probes balance the additional outer
preempt_disable with a preempt_enable_notrace.
The other way to fix this is to just use SRCU for all tracepoints.
However we can't do that because we can't use NMIs from RCU context.
Link: http://lkml.kernel.org/r/20180806034049.67949-1-joel@joelfernandes.org
Fixes: c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and unify their usage")
Fixes: e6753f23d961 ("tracepoint: Make rcuidle tracepoint callers use SRCU")
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
- GICv3 ITS LPI allocation revamp
- GICv3 support for hypervisor-enforced LPI range
- GICv3 ITS conversion to raw spinlock
-----BEGIN PGP SIGNATURE-----
iQJJBAABCAAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAltoBXMVHG1hcmMuenlu
Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDyUYP/1feAq3F7ZmhCIZka4c6y/m4EBpq
BjWEEgOAGMEyyB4s98flsRtZcEUxxp6CqEXo2FgCsd1Nj+og7oA7vwOlqy3aGzsi
9f/Z5Wi6SlG06lH5tmYNkyVbGk2tE3s2FzkH5Rg8qZGk+X3OCOdNs/+G20pYAkSp
ESePWSapbQUJSExJ1MqzfdHFidtVA1V+ev8BKdIp2ykl1NRae8LJeKHIbqac49Ym
JclfCLFpQM1M1ElB9j0E8hAvZhz10oOz7TtBR737O/1QEifVyFqGBckPzldvwIJM
zZ+nR+Yzj1ruD109xwaF1iKy9AinZWhiqrtN7UXJ3jwHtNih+sy0R6FQ38GMNoOC
0K02n/qStR5xglGr4BmAcWlOuFtBYWfz6HpSVMqaTWWmOxHEiqS6pXtEA+dV/YyI
wHLbo0YzpWTQm6t1+b/PoByAJ0/hOcD1nOD57b+NGjX7tZV0sGjpGsecvFhTSywh
BN3COBi9k/FOBrOTGDX1qUAI+mEf76vc2BAC+BkkoiiMg3WlY0E9qfQJguUxHdrb
0LS3lDZoHCNoz8RZLrUyenTT0NYGcjPGUTinMDJWG79VGXOWFexTDdCuX0kF90CK
1Zie3O6lrTYolmaiyLUxwukKp1SVUyoA5IpKVwfDJQYUhEfk27yvlzg2MBMcHDRA
uy3QSkmjx9vw/sAu
=gKw8
-----END PGP SIGNATURE-----
Merge tag 'irqchip-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates from Marc Zyngier:
- GICv3 ITS LPI allocation revamp
- GICv3 support for hypervisor-enforced LPI range
- GICv3 ITS conversion to raw spinlock
At present, "systemctl suspend" and "shutdown" can run in parrallel. A
system can suspend after devices_shutdown(), and resume. Then the shutdown
task goes on to power off. This causes many devices are not really shut
off. Hence replacing reboot_mutex with system_transition_mutex (renamed
from pm_mutex) to achieve the exclusion. The renaming of pm_mutex as
system_transition_mutex can be better to reflect the purpose of the mutex.
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
This addresses Coverity-ID: 114713 ("Missing break in switch").
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAltU8z0eHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG5X8H/2fJr7m3k242+t76
sitwvx1eoPqTgryW59dRKm9IuXAGA+AjauvHzaz1QxomeQa50JghGWefD0eiJfkA
1AphQ/24EOiAbbVk084dAI/C2p122dE4D5Fy7CrfLnuouyrbFaZI5STbnrRct7sR
9deeYW0GDHO1Uenp4WDCj0baaqJqaevZ+7GG09DnWpya2nQtSkGBjqn6GpYmrfOU
mqFuxAX8mEOW6cwK16y/vYtnVjuuMAiZ63/OJ8AQ6d6ArGLwAsdn7f8Fn4I4tEr2
L0d3CRLUyegms4++Dmlu05k64buQu46WlPhjCZc5/Ts4kjrNxBuHejj2/jeSnUSt
vJJlibI=
=42a5
-----END PGP SIGNATURE-----
Merge tag 'v4.18-rc6' into for-4.19/block2
Pull in 4.18-rc6 to get the NVMe core AEN change to avoid a
merge conflict down the line.
Signed-of-by: Jens Axboe <axboe@kernel.dk>
When cpu_stop_queue_work() releases the lock for the stopper
thread that was queued into its wake queue, preemption is
enabled, which leads to the following deadlock:
CPU0 CPU1
sched_setaffinity(0, ...)
__set_cpus_allowed_ptr()
stop_one_cpu(0, ...) stop_two_cpus(0, 1, ...)
cpu_stop_queue_work(0, ...) cpu_stop_queue_two_works(0, ..., 1, ...)
-grabs lock for migration/0-
-spins with preemption disabled,
waiting for migration/0's lock to be
released-
-adds work items for migration/0
and queues migration/0 to its
wake_q-
-releases lock for migration/0
and preemption is enabled-
-current thread is preempted,
and __set_cpus_allowed_ptr
has changed the thread's
cpu allowed mask to CPU1 only-
-acquires migration/0 and migration/1's
locks-
-adds work for migration/0 but does not
add migration/0 to wake_q, since it is
already in a wake_q-
-adds work for migration/1 and adds
migration/1 to its wake_q-
-releases migration/0 and migration/1's
locks, wakes migration/1, and enables
preemption-
-since migration/1 is requested to run,
migration/1 begins to run and waits on
migration/0, but migration/0 will never
be able to run, since the thread that
can wake it is affine to CPU1-
Disable preemption in cpu_stop_queue_work() before queueing works for
stopper threads, and queueing the stopper thread in the wake queue, to
ensure that the operation of queueing the works and waking the stopper
threads is atomic.
Fixes: 0b26351b910f ("stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock")
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Cc: matt@codeblueprint.co.uk
Cc: bigeasy@linutronix.de
Cc: gregkh@linuxfoundation.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1533329766-4856-1-git-send-email-isaacm@codeaurora.org
Co-Developed-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Pull timer fixes from Thomas Gleixner:
"Two oneliners addressing NOHZ failures:
- Use a bitmask to check for the pending timer softirq and not the
bit number. The existing code using the bit number checked for
the wrong bit, which caused timers to either expire late or stop
completely.
- Make the nohz evaluation on interrupt exit more robust. The
existing code did not re-arm the hardware when interrupting a
running softirq in task context (ksoftirqd or tail of
local_bh_enable()), which caused timers to either expire late
or stop completely"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
nohz: Fix missing tick reprogram when interrupting an inline softirq
nohz: Fix local_timer_softirq_pending()
There are only two signals that are delivered to every member of a
signal group: SIGSTOP and SIGKILL. Signal delivery requires every
signal appear to be delivered either before or after a clone syscall.
SIGKILL terminates the clone so does not need to be considered. Which
leaves only SIGSTOP that needs to be considered when creating new
threads.
Today in the event of a group stop TIF_SIGPENDING will get set and the
fork will restart ensuring the fork syscall participates in the group
stop.
A fork (especially of a process with a lot of memory) is one of the
most expensive system so we really only want to restart a fork when
necessary.
It is easy so check to see if a SIGSTOP is ongoing and have the new
thread join it immediate after the clone completes. Making it appear
the clone completed happened just before the SIGSTOP.
The calculate_sigpending function will see the bits set in jobctl and
set TIF_SIGPENDING to ensure the new task takes the slow path to userspace.
V2: The call to task_join_group_stop was moved before the new task is
added to the thread group list. This should not matter as
sighand->siglock is held over both the addition of the threads,
the call to task_join_group_stop and do_signal_stop. But the change
is trivial and it is one less thing to worry about when reading
the code.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Add a function calculate_sigpending to test to see if any signals are
pending for a new task immediately following fork. Signals have to
happen either before or after fork. Today our practice is to push
all of the signals to before the fork, but that has the downside that
frequent or periodic signals can make fork take much much longer than
normal or prevent fork from completing entirely.
So we need move signals that we can after the fork to prevent that.
This updates the code to set TIF_SIGPENDING on a new task if there
are signals or other activities that have moved so that they appear
to happen after the fork.
As the code today restarts if it sees any such activity this won't
immediately have an effect, as there will be no reason for it
to set TIF_SIGPENDING immediately after the fork.
Adding calculate_sigpending means the code in fork can safely be
changed to not always restart if a signal is pending.
The new calculate_sigpending function sets sigpending if there
are pending bits in jobctl, pending signals, the freezer needs
to freeze the new task or the live kernel patching framework
need the new thread to take the slow path to userspace.
I have verified that setting TIF_SIGPENDING does make a new process
take the slow path to userspace before it executes it's first userspace
instruction.
I have looked at the callers of signal_wake_up and the code paths
setting TIF_SIGPENDING and I don't see anything else that needs to be
handled. The code probably doesn't need to set TIF_SIGPENDING for the
kernel live patching as it uses a separate thread flag as well. But
at this point it seems safer reuse the recalc_sigpending logic and get
the kernel live patching folks to sort out their story later.
V2: I have moved the test into schedule_tail where siglock can
be grabbed and recalc_sigpending can be reused directly.
Further as the last action of setting up a new task this
guarantees that TIF_SIGPENDING will be properly set in the
new process.
The helper calculate_sigpending takes the siglock and
uncontitionally sets TIF_SIGPENDING and let's recalc_sigpending
clear TIF_SIGPENDING if it is unnecessary. This allows reusing
the existing code and keeps maintenance of the conditions simple.
Oleg Nesterov <oleg@redhat.com> suggested the movement
and pointed out the need to take siglock if this code
was going to be called while the new task is discoverable.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The full nohz tick is reprogrammed in irq_exit() only if the exit is not in
a nesting interrupt. This stands as an optimization: whether a hardirq or a
softirq is interrupted, the tick is going to be reprogrammed when necessary
at the end of the inner interrupt, with even potential new updates on the
timer queue.
When soft interrupts are interrupted, it's assumed that they are executing
on the tail of an interrupt return. In that case tick_nohz_irq_exit() is
called after softirq processing to take care of the tick reprogramming.
But the assumption is wrong: softirqs can be processed inline as well, ie:
outside of an interrupt, like in a call to local_bh_enable() or from
ksoftirqd.
Inline softirqs don't reprogram the tick once they are done, as opposed to
interrupt tail softirq processing. So if a tick interrupts an inline
softirq processing, the next timer will neither be reprogrammed from the
interrupting tick's irq_exit() nor after the interrupted softirq
processing. This situation may leave the tick unprogrammed while timers are
armed.
To fix this, simply keep reprogramming the tick even if a softirq has been
interrupted. That can be optimized further, but for now correctness is more
important.
Note that new timers enqueued in nohz_full mode after a softirq gets
interrupted will still be handled just fine through self-IPIs triggered by
the timer code.
Reported-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: stable@vger.kernel.org # 4.14+
Link: https://lkml.kernel.org/r/1533303094-15855-1-git-send-email-frederic@kernel.org
The support of force threading interrupts which are set up with both a
primary and a threaded handler wreckaged the setup of regular requested
threaded interrupts (primary handler == NULL).
The reason is that it does not check whether the primary handler is set to
the default handler which wakes the handler thread. Instead it replaces the
thread handler with the primary handler as it would do with force threaded
interrupts which have been requested via request_irq(). So both the primary
and the thread handler become the same which then triggers the warnon that
the thread handler tries to wakeup a not configured secondary thread.
Fortunately this only happens when the driver omits the IRQF_ONESHOT flag
when requesting the threaded interrupt, which is normaly caught by the
sanity checks when force irq threading is disabled.
Fix it by skipping the force threading setup when a regular threaded
interrupt is requested. As a consequence the interrupt request which lacks
the IRQ_ONESHOT flag is rejected correctly instead of silently wreckaging
it.
Fixes: 2a1d3ab8986d ("genirq: Handle force threading of irqs with primary and thread handler")
Reported-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
Cc: stable@vger.kernel.org
Code is emitting the following error message during boot on systems
without PMU hardware support while probing NMI capability.
NMI watchdog: Perf event create on CPU 0 failed with -2
This error is emitted as the perf subsystem returns -ENOENT due to lack of
PMUs in the system.
It is followed by the warning that NMI watchdog is disabled:
NMI watchdog: Perf NMI watchdog permanently disabled
While NMI disabled information is useful for ordinary users, seeing a PERF
event create failed with error code -2 is not.
Reduce the message severity to debug so that if debugging is still possible
in case the error code returned by perf is required for analysis.
Signed-off-by: Sinan Kaya <okaya@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=599368
Link: https://lkml.kernel.org/r/20180803060943.2643-1-okaya@kernel.org
The bpf_get_local_storage() helper function is used
to get a pointer to the bpf local storage from a bpf program.
It takes a pointer to a storage map and flags as arguments.
Right now it accepts only cgroup storage maps, and flags
argument has to be 0. Further it can be extended to support
other types of local storage: e.g. thread local storage etc.
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
As there is one-to-one relation between a bpf program
and cgroup local storage map, there is no sense in
creating a map of cgroup local storage maps.
Forbid it explicitly to avoid possible side effects.
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
BPF_MAP_TYPE_CGROUP_STORAGE maps are special in a way
that the access from the bpf program side is lookup-free.
That means the result is guaranteed to be a valid
pointer to the cgroup storage; no NULL-check is required.
This patch introduces BPF_PTR_TO_MAP_VALUE return type,
which is required to cause the verifier accept programs,
which are not checking the map value pointer for being NULL.
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch converts bpf_prog_array from an array of prog pointers
to the array of struct bpf_prog_array_item elements.
This allows to save a cgroup storage pointer for each bpf program
efficiently attached to a cgroup.
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
If a bpf program is using cgroup local storage, allocate
a bpf_cgroup_storage structure automatically on attaching the program
to a cgroup and save the pointer into the corresponding bpf_prog_list
entry.
Analogically, release the cgroup local storage on detaching
of the bpf program.
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This commit introduces the bpf_cgroup_storage_set() helper,
which will be used to pass a pointer to a cgroup storage
to the bpf helper.
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This commit introduces BPF_MAP_TYPE_CGROUP_STORAGE maps:
a special type of maps which are implementing the cgroup storage.
>From the userspace point of view it's almost a generic
hash map with the (cgroup inode id, attachment type) pair
used as a key.
The only difference is that some operations are restricted:
1) a user can't create new entries,
2) a user can't remove existing entries.
The lookup from userspace is o(log(n)).
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This commits extends existing bpf maps memory charging API
to support dynamic charging/uncharging.
This is required to account memory used by maps,
if all entries are created dynamically after
the map initialization.
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The BTF conflicts were simple overlapping changes.
The virtio_net conflict was an overlap of a fix of statistics counter,
happening alongisde a move over to a bonafide statistics structure
rather than counting value on the stack.
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix within_notrace_func() to check only notrace functions and to ignore the
kprobe-event which can not solve symbol addresses.
within_notrace_func() returns true if the given kprobe events probe point
seems to be out-of-range. But that is not the correct place to check for it,
it should be done in kprobes afterwards.
kprobe-events allow users to define a probe point on "currently unloaded
module" so that it can trace the event during module load. In this case, the
user will put a probe on a symbol which is not in kallsyms yet and it hits
the within_notrace_func(). As a result, kprobe-events always refuses if
user defines a probe on a "currenly unloaded module".
Fixes: commit 45408c4f9250 ("tracing: kprobes: Prohibit probing on notrace function")
Link: http://lkml.kernel.org/r/153319624799.29007.13604430345640129926.stgit@devbox
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
we prefer to the kmemdup rather than kmalloc+memcpy. so just
replace them.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Using cpu_all_mask in clockevents cpumask may result in issues while
comparing multiple clockevent devices to choose the preferred one.
On one of the platforms with 2 system (i.e. non per-CPU) timers with
different ratings, having cpu_all_mask for one of the device resulted in a
boot hang due to a endless loop in clockevents_notify_released() as both
were clocksources were selected as preferred.
In order to prevent such issues in the future, warn if any clockevent
driver sets cpu_all_mask as it's cpumask and just override it to use
cpu_possible_mask. All the existing occurrences of cpu_all_mask are already
replaced with cpu_possible_mask.
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lkml.kernel.org/r/1531308264-24220-3-git-send-email-sudeep.holla@arm.com
This is the last instance of cpu_all_mask usage in the core framework.
Replace it with cpu_possible_mask like all other instances in the
clockevent drivers. This makes it possible to add a warning in the core
clockevents_register_device on usage of cpu_all_mask from any clockevent
drivers in the future.
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lkml.kernel.org/r/1531308264-24220-2-git-send-email-sudeep.holla@arm.com
timer_base::must_forward_clock is indicating that the base clock might be
stale due to a long idle sleep.
The forwarding of the base clock takes place in the timer softirq or when a
timer is enqueued to a base which is idle. If the enqueue of timer to an
idle base happens from a remote CPU, then the following race can happen:
CPU0 CPU1
run_timer_softirq mod_timer
base = lock_timer_base(timer);
base->must_forward_clk = false
if (base->must_forward_clk)
forward(base); -> skipped
enqueue_timer(base, timer, idx);
-> idx is calculated high due to
stale base
unlock_timer_base(timer);
base = lock_timer_base(timer);
forward(base);
The root cause is that timer_base::must_forward_clk is cleared outside the
timer_base::lock held region, so the remote queuing CPU observes it as
cleared, but the base clock is still stale. This can cause large
granularity values for timers, i.e. the accuracy of the expiry time
suffers.
Prevent this by clearing the flag with timer_base::lock held, so that the
forwarding takes place before the cleared flag is observable by a remote
CPU.
Signed-off-by: Gaurav Kohli <gkohli@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john.stultz@linaro.org
Cc: sboyd@kernel.org
Cc: linux-arm-msm@vger.kernel.org
Link: https://lkml.kernel.org/r/1533199863-22748-1-git-send-email-gkohli@codeaurora.org
Return statements in functions returning bool should use true or false
instead of an integer value.
This code was detected with the help of Coccinelle.
Link: http://lkml.kernel.org/r/20180802010056.GA31012@embeddedor.com
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The value of ring_buffer_record_is_set_on() is either true or false, so have
its return value be bool.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The value of ring_buffer_record_is_on() is either true or false, so have its
return value be bool.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Almost all architectures include it. Add a ARCH_NO_PREEMPT symbol to
disable preempt support for alpha, hexagon, non-coldfire m68k and
user mode Linux.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
There's code that expects tracer_tracing_is_on() to be either true or false,
not some random number. Currently, it should only return one or zero, but
just in case, change its return value to bool, to enforce it.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The start function of the hwlat tracer should never be called when the hwlat
thread already exists. If it is called, do a WARN_ON().
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The hwlat tracer uses a kernel thread to measure latencies. The function
that creates this kernel thread, start_kthread(), can be called when the
tracer is initialized and when the tracer is explicitly enabled.
start_kthread() does not check if there is an existing hwlat kernel
thread and will create a new one each time it is called.
This causes the reference to the previous thread to be lost. Without the
thread reference, the old kernel thread becomes unstoppable and
continues to use CPU time even after the hwlat tracer has been disabled.
This problem can be observed when a system is booted with tracing
enabled and the hwlat tracer is configured like this:
echo hwlat > current_tracer; echo 1 > tracing_on
Add the missing check for an existing kernel thread in start_kthread()
to prevent this problem. This function and the rest of the hwlat kernel
thread setup and teardown are already serialized because they are called
through the tracer core code with trace_type_lock held.
[
Note, this only fixes the symptom. The real fix was not to call
this function when tracing_on was already one. But this still makes
the code more robust, so we'll add it.
]
Link: http://lkml.kernel.org/r/1533120354-22923-1-git-send-email-erica.bugden@linutronix.de
Signed-off-by: Erica Bugden <erica.bugden@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Currently, when one echo's in 1 into tracing_on, the current tracer's
"start()" function is executed, even if tracing_on was already one. This can
lead to strange side effects. One being that if the hwlat tracer is enabled,
and someone does "echo 1 > tracing_on" into tracing_on, the hwlat tracer's
start() function is called again which will recreate another kernel thread,
and make it unable to remove the old one.
Link: http://lkml.kernel.org/r/1533120354-22923-1-git-send-email-erica.bugden@linutronix.de
Cc: stable@vger.kernel.org
Fixes: 2df8f8a6a897e ("tracing: Fix regression with irqsoff tracer and tracing_on file")
Reported-by: Erica Bugden <erica.bugden@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
We were hitting a panic in production where we put too many times on the
request queue. This is because we'd get the throttle_queue of the
parent if we fork()'ed while we needed to be throttled, but we didn't
have a reference on it. Instead just clear these flags on fork so the
child doesn't pay for the sins of its father.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEEcQCq365ubpQNLgrWVeRaWujKfIoFAltguv8UHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQVeRaWujKfIqYOA/9GgMzBJYU+bVCbNSagcq6LluWFoYV
ObZb9sfsf23wL0YgtKgkWaCefWAAYnWOr6bUvDa+5oMRLVR+bsP+YEkCVK45CJr0
g44oe4VH9t5inX2F2JSkoVbkUDZIwwOxiTi/L4Emqhv8cT9zc89tcKRjYhqt50d1
4Gm4++jZcTHQNKkzYUIIpKc0TZmKW5mRNmFaGogWPi72FWrhbDfjKLZZUvd+kIUC
HSKnv6pKnwbxLPhd9i0p5NchuTM6kRCptGzN07UUzeww6UVvs8t62+DzHUM1o3Ft
sraIx7BLenGC8OBCgi8aNkE+yseQE4h2OTym3paEkLVJsl/9qcsSyXL1dwO4Z96U
HFq/TpDZoBieZihHDBk4ry7ox942mE5N51QTDUh+cygEWeNvqGwqpAUbI14J23oh
3p7w7hgXAtdtuj4pzqUARemHvIR0Xbpn8ritH9cx1s1mDdycyyBDn9mFw3Ehigom
XIpUrSJtdfJYFj+z6wA4vXssvXe4TITrJTUmPAM1Alk1p+LhRkTA8JxBjHmL3qjR
mFIxA40t+ON5OtCqTtGsapaoJy2Jj97dPEp5i5Jg49BQclQoTG2rpYuIu/aKrixG
EZwdezckD3DPQUQdQidru7dS1J/phIaDDvEauq291ERHPfNAxQuMllXHeczzyJkc
eVRMkj0/E5lihlE=
=MN3z
-----END PGP SIGNATURE-----
Merge tag 'audit-pr-20180731' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit fix from Paul Moore:
"A single small audit fix to guard against memory allocation failures
when logging information about a kernel module load.
It's small, easy to understand, and self-contained; while nothing is
zero risk, this should be pretty low"
* tag 'audit-pr-20180731' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: fix potential null dereference 'context->module.name'
When check_alu_op() handles a BPF_MOV64 between two registers,
it calls check_reg_arg(DST_OP) on the dst register, marking it
as unbounded. If the src and dst register are the same, this
marks the src as unbounded, which can lead to unexpected errors
for further checks that rely on bounds info. For example:
BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_MOV64_REG(BPF_REG_2, BPF_REG_2),
BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_2),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
Results in:
"math between ctx pointer and register with unbounded
min value is not allowed"
check_alu_op() now uses check_reg_arg(DST_OP_NO_MARK), and MOVs
that need to mark the dst register (MOVIMM, MOV32) do so.
Added a test case for MOV64 dst == src, and dst != src.
Signed-off-by: Arthur Fabre <afabre@cloudflare.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
local_timer_softirq_pending() checks whether the timer softirq is
pending with: local_softirq_pending() & TIMER_SOFTIRQ.
This is wrong because TIMER_SOFTIRQ is the softirq number and not a
bitmask. So the test checks for the wrong bit.
Use BIT(TIMER_SOFTIRQ) instead.
Fixes: 5d62c183f9e9 ("nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()")
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Cc: bigeasy@linutronix.de
Cc: peterz@infradead.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180731161358.29472-1-anna-maria@linutronix.de