IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
- Fix off-by-one error in retrieving the context buffer for trace_printk()
- Fix off-by-one error in stack nesting limit
- Fix recursion to not make all NMI code false positive as recursing
- Stop losing events in function tracing when transitioning between irq context
- Stop losing events in ring buffer when transitioning between irq context
- Fix return code of error pointer in parse_synth_field() to prevent
NULL pointer dereference.
- Fix false positive of NMI recursion in kprobe event handling
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCX6QqmRQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qtWiAQCKf3xblmQDcge2BLQPy2H7ih5VCSSb
vDPrQKgLa35pSgEAhGj7mgAKxb6kDbae1BqPorfa/qn9Bi13XfRNTxBK2Qw=
=zKfd
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix off-by-one error in retrieving the context buffer for
trace_printk()
- Fix off-by-one error in stack nesting limit
- Fix recursion to not make all NMI code false positive as recursing
- Stop losing events in function tracing when transitioning between irq
context
- Stop losing events in ring buffer when transitioning between irq
context
- Fix return code of error pointer in parse_synth_field() to prevent
NULL pointer dereference.
- Fix false positive of NMI recursion in kprobe event handling
* tag 'trace-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
kprobes: Tell lockdep about kprobe nesting
tracing: Make -ENOMEM the default error for parse_synth_field()
ring-buffer: Fix recursion protection transitions between interrupt context
tracing: Fix the checking of stackidx in __ftrace_trace_stack
ftrace: Handle tracing when switching between context
ftrace: Fix recursion check for NMI test
tracing: Fix out of bounds write in get_trace_buf
- Unify the handling of managed and stateless device links in the
runtime PM framework and prevent runtime PM references to devices
from being leaked after device link removal (Rafael Wysocki).
- Fix two mistakes in the cpuidle documentation (Julia Lawall).
- Prevent the schedutil cpufreq governor from missing policy
limits updates in some cases (Viresh Kumar).
- Prevent static OPPs from being dropped by mistake (Viresh Kumar).
- Prevent helper function in the OPP framework from returning
prematurely (Viresh Kumar).
- Prevent opp_table_lock from being held too long during removal
of OPP tables with no more active references (Viresh Kumar).
- Drop redundant semicolon from the Intel RAPL power capping
driver (Tom Rix).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl+kAnUSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxND0P/j5e/wqFZ/34XEaUa4PIs27TDoJHq1Jh
PfUxqSBEVO6Xm8TMCtOIuS8gw4x7ehenXC8gFdivy+Fu+4QNkPKYggF5/LgWO8Gl
cMNzYwJ8shIBQWQZftIx01Bn5tAvk1YhV1mnSNf580Iy7FhKqojMnrvzQnpGD4jR
piBkvBcbIJWDk+T96RzbqnqmMD0euvlYfzg1KnsyqsOpoRl7kZoH4ahWYskdIxDY
NtA4SqQ8dvxjTwI8+a+JBb//ua9jjjDyjd7FUV87HMdcVh9rZKbmHKkk4Yv/3C4C
jZuCTV6zaWIHCbbkZwi+OTNE8q21GgWm1xqwnGWVTFSGrs8ZyfjLs73/g9S5zI3N
C/n4YUJxiYDVcnrAVwR7kSTEouiQ6mTuChOt7T3r1Ilx67rw4TDsI59Fk1Xn07bZ
1wfaPASUPsFAxF7vMhkdzsidhpR3BYNgMwmAWo9R4Yvw4evyRN4tywoUQJ1X27zg
ERir6mFVz6XCnXPrJhyx0bWLo+VD8J/arhfIPSTqHR7wn0tgc23aVeYy7wvfYiiu
QQJqQ58Aoa0Z7ZpeAv3xbSung+eqQ/dDC9FnXqxdZCI6brYUhrZ19OhsFnEyzGUR
IDRzcP1/72AxhidpjO71txOJSJFTqKF2LzL/wuS6kMbYwmFjYfj18xX5GvOUuXmL
hO+E+QdleQFo
=lKQA
-----END PGP SIGNATURE-----
Merge tag 'pm-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix the device links support in runtime PM, correct mistakes in
the cpuidle documentation, fix the handling of policy limits changes
in the schedutil cpufreq governor, fix assorted issues in the OPP
(operating performance points) framework and make one janitorial
change.
Specifics:
- Unify the handling of managed and stateless device links in the
runtime PM framework and prevent runtime PM references to devices
from being leaked after device link removal (Rafael Wysocki).
- Fix two mistakes in the cpuidle documentation (Julia Lawall).
- Prevent the schedutil cpufreq governor from missing policy limits
updates in some cases (Viresh Kumar).
- Prevent static OPPs from being dropped by mistake (Viresh Kumar).
- Prevent helper function in the OPP framework from returning
prematurely (Viresh Kumar).
- Prevent opp_table_lock from being held too long during removal of
OPP tables with no more active references (Viresh Kumar).
- Drop redundant semicolon from the Intel RAPL power capping driver
(Tom Rix)"
* tag 'pm-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: runtime: Resume the device earlier in __device_release_driver()
PM: runtime: Drop pm_runtime_clean_up_links()
PM: runtime: Drop runtime PM references to supplier on link removal
powercap/intel_rapl: remove unneeded semicolon
Documentation: PM: cpuidle: correct path name
Documentation: PM: cpuidle: correct typo
cpufreq: schedutil: Don't skip freq update if need_freq_update is set
opp: Reduce the size of critical section in _opp_table_kref_release()
opp: Fix early exit from dev_pm_opp_register_set_opp_helper()
opp: Don't always remove static OPPs in _of_add_opp_table_v1()
Lockdep state handling on NMI enter and exit is nothing specific to X86. It's
not any different on other architectures. Also the extra state type is not
necessary, irqentry_state_t can carry the necessary information as well.
Move it to common code and extend irqentry_state_t to carry lockdep state.
[ Ira: Make exit_rcu and lockdep a union as they are mutually exclusive
between the IRQ and NMI exceptions, and add kernel documentation for
struct irqentry_state_t ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201102205320.1458656-7-ira.weiny@intel.com
When an exception/interrupt hits kernel space and the kernel is not
currently in the idle task then RCU must be watching.
irqentry_enter() validates this via rcu_irq_enter_check_tick(), which in
turn invokes lockdep when taking a lock. But at that point lockdep does not
yet know about the fact that interrupts have been disabled by the CPU,
which triggers a lockdep splat complaining about inconsistent state.
Invoking trace_hardirqs_off() before rcu_irq_enter_check_tick() defeats the
point of rcu_irq_enter_check_tick() because trace_hardirqs_off() uses RCU.
So use the same sequence as for the idle case and tell lockdep about the
irq state change first, invoke the RCU check and then do the lockdep and
tracer update.
Fixes: a5497bab5f72 ("entry: Provide generic interrupt entry/exit code")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87y2jhl19s.fsf@nanos.tec.linutronix.de
Since the kprobe handlers have protection that prohibits other handlers from
executing in other contexts (like if an NMI comes in while processing a
kprobe, and executes the same kprobe, it will get fail with a "busy"
return). Lockdep is unaware of this protection. Use lockdep's nesting api to
differentiate between locks taken in INT3 context and other context to
suppress the false warnings.
Link: https://lore.kernel.org/r/20201102160234.fa0ae70915ad9e2b21c08b85@kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Let's handle the successful call of mod_verify_sig() right after that call,
making the *switch* statement only handle the real errors, and then move
the comment from the first *case* before *switch* itself and the comment
before *default* after it. Fix the comment style, add article/comma/dash,
spell out "nomem" as "lack of memory" in these comments, while at it...
Suggested-by: Joe Perches <joe@perches.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Let's move the common handling of the non-fatal errors after the *switch*
statement -- this avoids *goto*s inside that *switch*...
Suggested-by: Joe Perches <joe@perches.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
The 'reason' variable in module_sig_check() points to 3 strings across
the *switch* statement, all needlessly starting with the same text.
Let's put the starting text into the pr_notice() call -- it saves 21
bytes of the object code (x86 gcc 10.2.1).
Suggested-by: Joe Perches <joe@perches.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
kcov_common_handle is a method that is used to obtain a "default" KCOV
remote handle of the current process. The handle can later be passed
to kcov_remote_start in order to collect coverage for the processing
that is initiated by one process, but done in another. For details see
Documentation/dev-tools/kcov.rst and comments in kernel/kcov.c.
Presently, if kcov_common_handle is called in an IRQ context, it will
return a handle for the interrupted process. This may lead to
unreliable and incorrect coverage collection.
Adjust the behavior of kcov_common_handle in the following way. If it
is called in a task context, return the common handle for the
currently running task. Otherwise, return 0.
Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The default value for refscale.nreaders is -1, which results in the code
setting the value to three-quarters of the number of CPUs. On single-CPU
systems, this results in three-quarters of the value one, which the C
language's integer arithmetic rounds to zero. This in turn results in
a divide-by-zero error.
This commit therefore adds bounds checking to the refscale module
parameters, so that if they are less than one, they are set to the
value one.
Reported-by: kernel test robot <lkp@intel.com>
Tested-by "Chen, Rong A" <rong.a.chen@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
At the end of the test and after rcu_torture_writer() stalls, rcutorture
invokes show_rcu_gp_kthreads() in order to dump out information on the
RCU grace-period kthread. This makes a lot of sense when testing vanilla
RCU, but not so much for the other flavors. This commit therefore allows
per-flavor kthread-dump functions to be specified.
[ paulmck: Apply feedback from kernel test robot <lkp@intel.com>. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The infinite for-loop in rcu_tasks_wait_gp() has its only exit at the
top of the loop, so this commit does the straightforward conversion to
a while-loop, thus saving a few lines.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The lockdep_is_held() macro is defined as:
#define lockdep_is_held(lock) lock_is_held(&(lock)->dep_map)
This hides away the dereference, so that builds with !LOCKDEP don't break.
This works in current kernels because the RCU_LOCKDEP_WARN() eliminates
its condition at preprocessor time in !LOCKDEP kernels. However, later
patches in this series will cause the compiler to see this condition even
in !LOCKDEP kernels. This commit prepares for this upcoming change by
switching from lock_is_held() to lockdep_is_held().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
--
CC: jiangshanlai@gmail.com
CC: paulmck@kernel.org
CC: josh@joshtriplett.org
CC: rostedt@goodmis.org
CC: mathieu.desnoyers@efficios.com
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Avoid setting up watchpoints on NULL pointers, as otherwise we would
crash inside the KCSAN runtime (when checking for value changes) instead
of the instrumented code.
Because that may be confusing, skip any address less than PAGE_SIZE.
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
In preparation of supporting only addresses not within the NULL page,
change the selftest to never use addresses that are less than PAGE_SIZE.
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
parse_synth_field() returns a pointer and requires that errors get
surrounded by ERR_PTR(). The ret variable is initialized to zero, but should
never be used as zero, and if it is, it could cause a false return code and
produce a NULL pointer dereference. It makes no sense to set ret to zero.
Set ret to -ENOMEM (the most common error case), and have any other errors
set it to something else. This removes the need to initialize ret on *every*
error branch.
Fixes: 761a8c58db6b ("tracing, synthetic events: Replace buggy strcat() with seq_buf operations")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The recursion protection of the ring buffer depends on preempt_count() to be
correct. But it is possible that the ring buffer gets called after an
interrupt comes in but before it updates the preempt_count(). This will
trigger a false positive in the recursion code.
Use the same trick from the ftrace function callback recursion code which
uses a "transition" bit that gets set, to allow for a single recursion for
to handle transitions between contexts.
Cc: stable@vger.kernel.org
Fixes: 567cd4da54ff4 ("ring-buffer: User context bit recursion checking")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
removed various __user annotations from function signatures as part of
its refactoring.
It also removed the __user annotation for proc_dohung_task_timeout_secs()
at its declaration in sched/sysctl.h, but not at its definition in
kernel/hung_task.c.
Hence, sparse complains:
kernel/hung_task.c:271:5: error: symbol 'proc_dohung_task_timeout_secs' redeclared with different type (incompatible argument 3 (different address spaces))
Adjust the annotation at the definition fitting to that refactoring to make
sparse happy again, which also resolves this warning from sparse:
kernel/hung_task.c:277:52: warning: incorrect type in argument 3 (different address spaces)
kernel/hung_task.c:277:52: expected void *
kernel/hung_task.c:277:52: got void [noderef] __user *buffer
No functional change. No change in object code.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrey Ignatov <rdna@fb.com>
Link: https://lkml.kernel.org/r/20201028130541.20320-1-lukas.bulwahn@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a small race window when a delayed work is being canceled and
the work still might be queued from the timer_fn:
CPU0 CPU1
kthread_cancel_delayed_work_sync()
__kthread_cancel_work_sync()
__kthread_cancel_work()
work->canceling++;
kthread_delayed_work_timer_fn()
kthread_insert_work();
BUG: kthread_insert_work() should not get called when work->canceling is
set.
Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201014083030.16895-1-qiang.zhang@windriver.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This testcase
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <pthread.h>
#include <assert.h>
void *tf(void *arg)
{
return NULL;
}
int main(void)
{
int pid = fork();
if (!pid) {
kill(getpid(), SIGSTOP);
pthread_t th;
pthread_create(&th, NULL, tf, NULL);
return 0;
}
waitpid(pid, NULL, WSTOPPED);
ptrace(PTRACE_SEIZE, pid, 0, PTRACE_O_TRACECLONE);
waitpid(pid, NULL, 0);
ptrace(PTRACE_CONT, pid, 0,0);
waitpid(pid, NULL, 0);
int status;
int thread = waitpid(-1, &status, 0);
assert(thread > 0 && thread != pid);
assert(status == 0x80137f);
return 0;
}
fails and triggers WARN_ON_ONCE(!signr) in do_jobctl_trap().
This is because task_join_group_stop() has 2 problems when current is traced:
1. We can't rely on the "JOBCTL_STOP_PENDING" check, a stopped tracee
can be woken up by debugger and it can clone another thread which
should join the group-stop.
We need to check group_stop_count || SIGNAL_STOP_STOPPED.
2. If SIGNAL_STOP_STOPPED is already set, we should not increment
sig->group_stop_count and add JOBCTL_STOP_CONSUME. The new thread
should stop without another do_notify_parent_cldstop() report.
To clarify, the problem is very old and we should blame
ptrace_init_task(). But now that we have task_join_group_stop() it makes
more sense to fix this helper to avoid the code duplication.
Reported-by: syzbot+3485e3773f7da290eecc@syzkaller.appspotmail.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christian Brauner <christian@brauner.io>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201019134237.GA18810@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The cpufreq policy's frequency limits (min/max) can get changed at any
point of time, while schedutil is trying to update the next frequency.
Though the schedutil governor has necessary locking and support in place
to make sure we don't miss any of those updates, there is a corner case
where the governor will find that the CPU is already running at the
desired frequency and so may skip an update.
For example, consider that the CPU can run at 1 GHz, 1.2 GHz and 1.4 GHz
and is running at 1 GHz currently. Schedutil tries to update the
frequency to 1.2 GHz, during this time the policy limits get changed as
policy->min = 1.4 GHz. As schedutil (and cpufreq core) does clamp the
frequency at various instances, we will eventually set the frequency to
1.4 GHz, while we will save 1.2 GHz in sg_policy->next_freq.
Now lets say the policy limits get changed back at this time with
policy->min as 1 GHz. The next time schedutil is invoked by the
scheduler, we will reevaluate the next frequency (because
need_freq_update will get set due to limits change event) and lets say
we want to set the frequency to 1.2 GHz again. At this point
sugov_update_next_freq() will find the next_freq == current_freq and
will abort the update, while the CPU actually runs at 1.4 GHz.
Until now need_freq_update was used as a flag to indicate that the
policy's frequency limits have changed, and that we should consider the
new limits while reevaluating the next frequency.
This patch fixes the above mentioned issue by extending the purpose of
the need_freq_update flag. If this flag is set now, the schedutil
governor will not try to abort a frequency change even if next_freq ==
current_freq.
As similar behavior is required in the case of
CPUFREQ_NEED_UPDATE_LIMITS flag as well, need_freq_update will never be
set to false if that flag is set for the driver.
We also don't need to consider the need_freq_update flag in
sugov_update_single() anymore to handle the special case of busy CPU, as
we won't abort a frequency update anymore.
Reported-by: zhuguangqing <zhuguangqing@xiaomi.com>
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Rearrange code to avoid a branch ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The array size is FTRACE_KSTACK_NESTING, so the index FTRACE_KSTACK_NESTING
is illegal too. And fix two typos by the way.
Link: https://lkml.kernel.org/r/20201031085714.2147-1-hqjagain@gmail.com
Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The tbl_dma_addr argument is used to check the DMA boundary for the
allocations, and thus needs to be a dma_addr_t. swiotlb-xen instead
passed a physical address, which could lead to incorrect results for
strange offsets. Fix this by removing the parameter entirely and hard
code the DMA address for io_tlb_start instead.
Fixes: 91ffe4ad534a ("swiotlb-xen: introduce phys_to_dma/dma_to_phys translations")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
kernel/dma/swiotlb.c:swiotlb_init gets called first and tries to
allocate a buffer for the swiotlb. It does so by calling
memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);
If the allocation must fail, no_iotlb_memory is set.
Later during initialization swiotlb-xen comes in
(drivers/xen/swiotlb-xen.c:xen_swiotlb_init) and given that io_tlb_start
is != 0, it thinks the memory is ready to use when actually it is not.
When the swiotlb is actually needed, swiotlb_tbl_map_single gets called
and since no_iotlb_memory is set the kernel panics.
Instead, if swiotlb-xen.c:xen_swiotlb_init knew the swiotlb hadn't been
initialized, it would do the initialization itself, which might still
succeed.
Fix the panic by setting io_tlb_start to 0 on swiotlb initialization
failure, and also by setting no_iotlb_memory to false on swiotlb
initialization success.
Fixes: ac2cbab21f31 ("x86: Don't panic if can not alloc buffer for swiotlb")
Reported-by: Elliott Mitchell <ehem+xen@m5p.com>
Tested-by: Elliott Mitchell <ehem+xen@m5p.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
When an interrupt or NMI comes in and switches the context, there's a delay
from when the preempt_count() shows the update. As the preempt_count() is
used to detect recursion having each context have its own bit get set when
tracing starts, and if that bit is already set, it is considered a recursion
and the function exits. But if this happens in that section where context
has changed but preempt_count() has not been updated, this will be
incorrectly flagged as a recursion.
To handle this case, create another bit call TRANSITION and test it if the
current context bit is already set. Flag the call as a recursion if the
TRANSITION bit is already set, and if not, set it and continue. The
TRANSITION bit will be cleared normally on the return of the function that
set it, or if the current context bit is clear, set it and clear the
TRANSITION bit to allow for another transition between the current context
and an even higher one.
Cc: stable@vger.kernel.org
Fixes: edc15cafcbfa3 ("tracing: Avoid unnecessary multiple recursion checks")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The code that checks recursion will work to only do the recursion check once
if there's nested checks. The top one will do the check, the other nested
checks will see recursion was already checked and return zero for its "bit".
On the return side, nothing will be done if the "bit" is zero.
The problem is that zero is returned for the "good" bit when in NMI context.
This will set the bit for NMIs making it look like *all* NMI tracing is
recursing, and prevent tracing of anything in NMI context!
The simple fix is to return "bit + 1" and subtract that bit on the end to
get the real bit.
Cc: stable@vger.kernel.org
Fixes: edc15cafcbfa3 ("tracing: Avoid unnecessary multiple recursion checks")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The nesting count of trace_printk allows for 4 levels of nesting. The
nesting counter starts at zero and is incremented before being used to
retrieve the current context's buffer. But the index to the buffer uses the
nesting counter after it was incremented, and not its original number,
which in needs to do.
Link: https://lkml.kernel.org/r/20201029161905.4269-1-hqjagain@gmail.com
Cc: stable@vger.kernel.org
Fixes: 3d9622c12c887 ("tracing: Add barrier to trace_printk() buffer nesting modification")
Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
- Prevent undefined behaviour in the timespec64_to_ns() conversion which
is used for converting user supplied time input to nanoseconds. It
lacked overflow protection.
- Mark sched_clock_read_begin/retry() to prevent recursion in the tracer
- Remove unused debug functions in the hrtimer and timerlist code
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+evaETHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoT69D/0Z3ZGdNQu3mAPYdMSaW7R3MfXnLKno
biRCwre/T88Exv+Yu612iLthTlfNlxZbofAgUc5BvCwjAjtt+uXlmS8mwXNsbGFd
516hisUnjJkjbcHI6WxrQQ7+YsK2XLjOeV90TpLs9zMgI94kQ+uzEzqKc6ZaRQP4
Zi/emqgHeO/RoObNBWjNh9McrHqXllXU/LeqD7y82JXdLPik78PniKFkh7B1d6u8
RL7kW1TeblkNaHMVjO4/lM9zR1hkZr4GPYbEHIbVN8FMQ10BAD2iswUR72U2mqx7
zf2Jt+09UEzPRu2gOr4Lrvo6bJQKClu4ZUnRt4apU4BeTgyG0gi7pgy0VZ3M1xtq
KUnN8dNSMBwTurv9GuGTdiNpfFad2nsnSAra8r7GlZDHxNe1wWIPJmyu/FV1xohD
yUgRq2s7rvCb894oz159hyIwkI/ZNbFr/AJTCvdzowgf9nf4WS5/Y2/JVLF28LRT
4xPhvuVUfrhHJajoZayuVvxYSVEPZFek2SgUEIFi6b1+FiMFwV6guv/f31D5GQ+E
NHgvDMbDsVmzGt++c4zFHoqWvBrgC8Os8AEP8s2oczfvwMt2A8g7nIrGieAux+Jm
eLO1DfXaXpQi/MmTBf/WD1dKwjuoLV9dT0Fp2y2docu9SXm/bMgRhxX7fm3uDge7
W87XWkfIxd5fzA==
=Gdoq
-----END PGP SIGNATURE-----
Merge tag 'timers-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"A few fixes for timers/timekeeping:
- Prevent undefined behaviour in the timespec64_to_ns() conversion
which is used for converting user supplied time input to
nanoseconds. It lacked overflow protection.
- Mark sched_clock_read_begin/retry() to prevent recursion in the
tracer
- Remove unused debug functions in the hrtimer and timerlist code"
* tag 'timers-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time: Prevent undefined behaviour in timespec64_to_ns()
timers: Remove unused inline funtion debug_timer_free()
hrtimer: Remove unused inline function debug_hrtimer_free()
time/sched_clock: Mark sched_clock_read_begin/retry() as notrace
caused by recursion when enabling or disabling a tracer on RISC-V (probably
all architectures which patch through stop machine).
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+evLQTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoQR6D/49PIepVEn63HmgsKdePUShUFzYb+v6
mPV0iwaDCdW6BKXVEuHkzGdz3dB5ytdIZP6tJLV5a44tYxPhYESO+O4FwAJnqBxD
JGWLUELxqeEESnjBHww26/YptxqRwcVir2G0WeRDIO0eV2JE424ptdRSisgRVC2s
Kcu41D9Ry5SLayEKN29GyRXQPrMwLCUTAzeziN4TwJeRB2exSUMg6iDFiGC2uZy6
IzSu9BXak5T61LL7q5urElkY8a+sGLx6N29QBtSNxazYeXCzCoEtX+1IQ2e9b3PO
4abFsEtYJ9Mbw4R+iKm0lIpUMKmr/GOPWBs/PDTqK0aYhpc8UnKXTCr6hWLTvtkE
Anb/08AqN7YtvgbBR+bEN4fPOP4jL5kQVjLbNlIKm5ktRjZ5Sfd9OKOc//c/Bs3e
Auqe2u+UW2REDsB3lNiVFY+I/1nQUVwKLwfOsBHJW1KzH6O+BhJrJgigUERTeyTf
B5eLsOIxKgP1vprA/qQgPc/VT25PFIMOtxr7KhH1qPBej9R/wGEOdeOGSdJoiNYM
aH3kCTUMU0NP8IUSyVQFt4v2IhZ5aF+3rQmyVXifGRl54QCXKgb5gzqcQ/L2SFn+
I4SkB35hAoEzJ9S4vtx5ZtWPXAO20V644qRMalYv5X739zMVZQ2AyHGB9rhhktmX
a8ZDIVFCh0ssmw==
=38RZ
-----END PGP SIGNATURE-----
Merge tag 'smp-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp fix from Thomas Gleixner:
"A single fix for stop machine.
Mark functions no trace to prevent a crash caused by recursion when
enabling or disabling a tracer on RISC-V (probably all architectures
which patch through stop machine)"
* tag 'smp-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
stop_machine, rcu: Mark functions as notrace
- Fix incorrect failure injection handling on the fuxtex code
- Prevent a preemption warning in lockdep when tracking local_irq_enable()
and interrupts are already enabled
- Remove more raw_cpu_read() usage from lockdep which causes state
corruption on !X86 architectures.
- Make the nr_unused_locks accounting in lockdep correct again.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+evEUTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYod93EADG90GmRBYQxn6y2eQUKE/9f5SMiMFJ
KdvuqNBqHOhYm3iUPbZJcb0P/JZi1NHP0fBFMishdESGi96tD96K7T04WD0gmjtm
ArWFroe8uzEYtY9atlEwM0Nrvq0w8ZBLv9x1adXzJ59vB8/8Uq+wzYioSWn9yMcv
ye3jfVyAlM7ouFHDQAA36s/nhvZfxms4C0t+6S3gjVTIp/6riGuYh5t7dbXUMlnu
nGLiIJFjU+ekurweVDGpqD/nAxYfqf3UxebWnrosf7iu6suwYwaPFZGZ/kxlbr5e
qWx0B1RuhjAoefVJlPTkHmuhd0SnH/Gm/tTNkQ3LidJhPTIhLJlb7zffwyZlc510
VdaUipfZ6bNqDD6/dK6fKJtdKSE4w/z3pT53954NUD5zw/jIcHlgnaQieh72DH+F
1EKqmsNrwHAxYfMndQxLGdIoBScUAFzHzDnzsY9KKS2cfhChljzLa2nDIfMsDfKQ
aROugzEbZPQEb1iWUEOF3XopcuZzZQCaPlLDLvAnsBeYEPm0gdmbKFPFsDjOyBVX
/Qc41O7DyHKcoiLX2zM2c7CxnV5J6YEZz3jQSZLFlpH9Ih7jwAl9/6VirggNUNvV
YVsgM/myhYQtJBqHHojNppZFFW3KdgfxWuY7+qt7Ox5w/ck5qYQwRnoB4FROwVHV
pzcYTBE5qkQnIw==
=S01o
-----END PGP SIGNATURE-----
Merge tag 'locking-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
"A couple of locking fixes:
- Fix incorrect failure injection handling in the fuxtex code
- Prevent a preemption warning in lockdep when tracking
local_irq_enable() and interrupts are already enabled
- Remove more raw_cpu_read() usage from lockdep which causes state
corruption on !X86 architectures.
- Make the nr_unused_locks accounting in lockdep correct again"
* tag 'locking-urgent-2020-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lockdep: Fix nr_unused_locks accounting
locking/lockdep: Remove more raw_cpu_read() usage
futex: Fix incorrect should_fail_futex() handling
lockdep: Fix preemption WARN for spurious IRQ-enable
Hi Linus,
Please, pull the following patches that replace zero-length arrays with
flexible-array members.
Thanks
--
Gustavo
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAl+cjRUACgkQRwW0y0cG
2zGWAhAAjUfTsAmXWhKNaWFSCYR0Q822puTUWOKfiBd+jjGaO04luTtr2gjv2Dkb
Vgad8H4N8oZU79xfh5JZ5PUyScaso8wE6ZJTh2PLKXpKmNd213f5x/pIt78CCDTa
Y1L/eR41mmveTL3VNS3sf6WaZpT9owxJKGIY8JgdiOmSjxJQpX5zdaC1KYso4eXr
lIXIRo9VLEmVLhhHhZi+QmX6+aQ05E1D9K0ENe4/uEnRsV525W78iwZ4fYeLzr+A
krEOdgx6sPgzajPYnHoayrrcKNKxD5YY1SWuVSm2tqYYIhlRoK3f5xgLOd10RiHE
YMgx8aWzGmGJwoUhgp1bo/l9EZ7O8OWRqM/GOP4x6Wgjdhqw2x5jgskmhsKNGEXu
/BlbS+qL5aUrMCxhvNbApuZW6xBiBbva76MH3vU9vFhZbVz1CHLQdGI0tfxggYWS
jc2UPgoxL9OQlf3jSc+gK7RMFhBGNWn2Aiy8GQas3BxPYXuYPvwOj+irDOG/qZ9D
VZ5swUw4+th+DsF5K53mEFeLv0fONMgL9Ka5bNR6+k6HG0WNLYYVOiet3xYUDo1f
eZbMZthfc+QW7R8cwG0WuFk6rC6mLqE+A9nQuLZoJD+VMuJd4pwW9+6EW8nDX08w
FS4/o92xUFJfOCgaLRS61FSAuSmFENieN+yoKMK/Uf6PJVdNMb4=
=vyu3
-----END PGP SIGNATURE-----
Merge tag 'flexible-array-conversions-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull more flexible-array member conversions from Gustavo A. R. Silva:
"Replace zero-length arrays with flexible-array members"
* tag 'flexible-array-conversions-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
printk: ringbuffer: Replace zero-length array with flexible-array member
net/smc: Replace zero-length array with flexible-array member
net/mlx5: Replace zero-length array with flexible-array member
mei: hw: Replace zero-length array with flexible-array member
gve: Replace zero-length array with flexible-array member
Bluetooth: btintel: Replace zero-length array with flexible-array member
scsi: target: tcmu: Replace zero-length array with flexible-array member
ima: Replace zero-length array with flexible-array member
enetc: Replace zero-length array with flexible-array member
fs: Replace zero-length array with flexible-array member
Bluetooth: Replace zero-length array with flexible-array member
params: Replace zero-length array with flexible-array member
tracepoint: Replace zero-length array with flexible-array member
platform/chrome: cros_ec_proto: Replace zero-length array with flexible-array member
platform/chrome: cros_ec_commands: Replace zero-length array with flexible-array member
mailbox: zynqmp-ipi-message: Replace zero-length array with flexible-array member
dmaengine: ti-cppi5: Replace zero-length array with flexible-array member
Almost all machines use GENERIC_CLOCKEVENTS, so it feels wrong to
require each one to select that symbol manually.
Instead, enable it whenever CONFIG_LEGACY_TIMER_TICK is disabled as
a simplification. It should be possible to select both
GENERIC_CLOCKEVENTS and LEGACY_TIMER_TICK from an architecture now
and decide at runtime between the two.
For the clockevents arch-support.txt file, this means that additional
architectures are marked as TODO when they have at least one machine
that still uses LEGACY_TIMER_TICK, rather than being marked 'ok' when
at least one machine has been converted. This means that both m68k and
arm (for riscpc) revert to TODO.
At this point, we could just always enable CONFIG_GENERIC_CLOCKEVENTS
rather than leaving it off when not needed. I built an m68k
defconfig kernel (using gcc-10.1.0) and found that this would add
around 5.5KB in kernel image size:
text data bss dec hex filename
3861936 1092236 196656 5150828 4e986c obj-m68k/vmlinux-no-clockevent
3866201 1093832 196184 5156217 4ead79 obj-m68k/vmlinux-clockevent
On Arm (MACH_RPC), that difference appears to be twice as large,
around 11KB on top of an 6MB vmlinux.
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
There are no more users of xtime_update aside from legacy_timer_tick(),
so fold it into that function and remove the declaration.
update_process_times() is now only called inside of the kernel/time/
code, so the declaration can be moved there.
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
All platforms that currently do not use generic clockevents roughly call
the same set of functions in their timer interrupts: xtime_update(),
update_process_times() and profile_tick(), sometimes in a different
sequence.
Add a helper function that performs all three of them, to make the
callers more uniform and simplify the interface.
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
With Arm EBSA110 gone, nothing uses it any more, so the corresponding
code and the Kconfig option can be removed.
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
If a hashtab is accessed in both non-NMI and NMI context, the system may
deadlock on bucket->lock. Fix this issue with percpu counter map_locked.
map_locked rejects concurrent access to the same bucket from the same CPU.
To reduce memory overhead, map_locked is not added per bucket. Instead,
8 percpu counters are added to each hashtab. buckets are assigned to these
counters based on the lower bits of its hash.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201029071925.3103400-3-songliubraving@fb.com
- Modify Kconfig to prevent configuring either the "conservative"
or the "ondemand" governor as the default cpufreq governor if
intel_pstate is selected, in which case "schedutil" is the
default choice for the default governor setting (Rafael Wysocki).
- Modify the cpufreq core, intel_pstate and the schedutil governor
to avoid missing updates of the HWP max limit when intel_pstate
operates in the passive mode with HWP enabled (Rafael Wysocki).
- Fix max_cstate module parameter handling in intel_idle for
processor models with C-state tables coming from ACPI (Chen Yu).
- Clean up assorted pieces of power management code (Jackie Zamow,
Tom Rix, Zhang Qilong).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl+cOyISHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxxj0P/1qmvZeLR5ODvSKLURSlzQYPiF4Ud0uo
7huByO3K63hJ9qpEQZVVzNl0Yuu5vCa7cliebnIRNDHMy4AVOVw5Qf9Uhz/tFlmZ
ZBvSX1LGWdvMzUnx00AASdHbNOrFy+YyooQ5RfA8sFmtvAhRhPNRW9VCzc9Midoh
cICSQ6fQDn+0wb+hv89HQlIRvbakM6a8QDL4xI6Y8aS/a5OwOHGJQnC7ZLPIoQLi
BxD/IEDdNvDiLFPx8OpxJ93NtLaz1tyvWFJHvqSnOREDYB22YkG8WLZftY7h8pz6
MMbXrZN11/Tex1H0KXDI4YscCBnwxNnFuBo1ur/FX6GvDm2Sm9ybpVHCXxYKux46
ZFc8K2auqApt98xTlrqKru/rfCB2/fubypUustqpYeCjowBENPFbjyag8bRaKLh3
awiDwwz1wZDd2mZfEl4iaxJI4gjUwRmPDC1UiT1dlc7nozaDzPxrGGiQghAshOnW
JVQAiw+evERIFuH5YvJ+/si1sJYhreMjoQ7mICNMmpgJ2ElVvNufpW9aVRQiycp1
9a0u1ZgGs9DI1580+YTl1nGvsRefdgvMDpozcESwxY8dmMm+027+lyLhwlrUTPfy
gmakUcb4jbvIicG53huxus7teqBLwrX36HOHBb1l3tSJfOkcQieddGUZxuT8Q9xp
rAY2mRMRr7SI
=w+LI
-----END PGP SIGNATURE-----
Merge tag 'pm-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a few issues related to running intel_pstate in the passive
mode with HWP enabled, correct the handling of the max_cstate module
parameter in intel_idle and make a few janitorial changes.
Specifics:
- Modify Kconfig to prevent configuring either the "conservative" or
the "ondemand" governor as the default cpufreq governor if
intel_pstate is selected, in which case "schedutil" is the default
choice for the default governor setting (Rafael Wysocki).
- Modify the cpufreq core, intel_pstate and the schedutil governor to
avoid missing updates of the HWP max limit when intel_pstate
operates in the passive mode with HWP enabled (Rafael Wysocki).
- Fix max_cstate module parameter handling in intel_idle for
processor models with C-state tables coming from ACPI (Chen Yu).
- Clean up assorted pieces of power management code (Jackie Zamow,
Tom Rix, Zhang Qilong)"
* tag 'pm-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: schedutil: Always call driver if CPUFREQ_NEED_UPDATE_LIMITS is set
cpufreq: Introduce cpufreq_driver_test_flags()
cpufreq: speedstep: remove unneeded semicolon
PM: sleep: fix typo in kernel/power/process.c
intel_idle: Fix max_cstate for processor models without C-state tables
cpufreq: intel_pstate: Avoid missing HWP max updates in passive mode
cpufreq: Introduce CPUFREQ_NEED_UPDATE_LIMITS driver flag
cpufreq: Avoid configuring old governors as default with intel_pstate
cpufreq: e_powersaver: remove unreachable break
Chris reported that commit 24d5a3bffef1 ("lockdep: Fix
usage_traceoverflow") breaks the nr_unused_locks validation code
triggered by /proc/lockdep_stats.
By fully splitting LOCK_USED and LOCK_USED_READ it becomes a bad
indicator for accounting nr_unused_locks; simplyfy by using any first
bit.
Fixes: 24d5a3bffef1 ("lockdep: Fix usage_traceoverflow")
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://lkml.kernel.org/r/20201027124834.GL2628@hirez.programming.kicks-ass.net
I initially thought raw_cpu_read() was OK, since if it is !0 we have
IRQs disabled and can't get migrated, so if we get migrated both CPUs
must have 0 and it doesn't matter which 0 we read.
And while that is true; it isn't the whole store, on pretty much all
architectures (except x86) this can result in computing the address for
one CPU, getting migrated, the old CPU continuing execution with another
task (possibly setting recursion) and then the new CPU reading the value
of the old CPU, which is no longer 0.
Similer to:
baffd723e44d ("lockdep: Revert "lockdep: Use raw_cpu_*() for per-cpu variables"")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201026152256.GB2651@hirez.programming.kicks-ass.net
Commit 3193c0836 ("bpf: Disable GCC -fgcse optimization for
___bpf_prog_run()") introduced a __no_fgcse macro that expands to a
function scope __attribute__((optimize("-fno-gcse"))), to disable a
GCC specific optimization that was causing trouble on x86 builds, and
was not expected to have any positive effect in the first place.
However, as the GCC manual documents, __attribute__((optimize))
is not for production use, and results in all other optimization
options to be forgotten for the function in question. This can
cause all kinds of trouble, but in one particular reported case,
it causes -fno-asynchronous-unwind-tables to be disregarded,
resulting in .eh_frame info to be emitted for the function.
This reverts commit 3193c0836, and instead, it disables the -fgcse
optimization for the entire source file, but only when building for
X86 using GCC with CONFIG_BPF_JIT_ALWAYS_ON disabled. Note that the
original commit states that CONFIG_RETPOLINE=n triggers the issue,
whereas CONFIG_RETPOLINE=y performs better without the optimization,
so it is kept disabled in both cases.
Fixes: 3193c0836f20 ("bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/lkml/CAMuHMdUg0WJHEcq6to0-eODpXPOywLot6UD2=GFHpzoj_hCoBQ@mail.gmail.com/
Link: https://lore.kernel.org/bpf/20201028171506.15682-2-ardb@kernel.org
Because sugov_update_next_freq() may skip a frequency update even if
the need_freq_update flag has been set for the policy at hand, policy
limits updates may not take effect as expected.
For example, if the intel_pstate driver operates in the passive mode
with HWP enabled, it needs to update the HWP min and max limits when
the policy min and max limits change, respectively, but that may not
happen if the target frequency does not change along with the limit
at hand. In particular, if the policy min is changed first, causing
the target frequency to be adjusted to it, and the policy max limit
is changed later to the same value, the HWP max limit will not be
updated to follow it as expected, because the target frequency is
still equal to the policy min limit and it will not change until
that limit is updated.
To address this issue, modify get_next_freq() to let the driver
callback run if the CPUFREQ_NEED_UPDATE_LIMITS cpufreq driver flag
is set regardless of whether or not the new frequency to set is
equal to the previous one.
Fixes: f6ebbcf08f37 ("cpufreq: intel_pstate: Implement passive mode with HWP enabled")
Reported-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: 1c534352f47f cpufreq: Introduce CPUFREQ_NEED_UPDATE_LIMITS ...
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: a62f68f5ca53 cpufreq: Introduce cpufreq_driver_test_flags()
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If a module fails to load due to an error in prepare_coming_module(),
the following error handling in load_module() runs with
MODULE_STATE_COMING in module's state. Fix it by correctly setting
MODULE_STATE_GOING under "bug_cleanup" label.
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
irq_enter_from_user_mode() was changed to irqentry_enter_from_user_mode().
Update the comment within irqentry_enter() to reflect this change.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201028163632.965518-1-ira.weiny@intel.com