IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This reverts commit 6fb8f43cede0e4bd3ead847de78d531424a96be9.
The IO threads do allow signals now, including SIGSTOP, and we can allow
ptrace attach. Attaching won't reveal anything interesting for the IO
threads, but it will allow eg gdb to attach to a task with io_urings
and IO threads without complaining. And once attached, it will allow
the usual introspection into regular threads.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This reverts commit 5be28c8f85ce99ed2d329d2ad8bdd18ea19473a5.
IO threads now take signals just fine, so there's no reason to limit them
specifically. Revert the change that prevented that from happening.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This is racy - move the blocking into when the task is created and
we're marking it as PF_IO_WORKER anyway. The IO threads are now
prepared to handle signals like SIGSTOP as well, so clear that from
the mask to allow proper stopping of IO threads.
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently module can be unloaded even if there's a trampoline
register in it. It's easily reproduced by running in parallel:
# while :; do ./test_progs -t module_attach; done
# while :; do rmmod bpf_testmod; sleep 0.5; done
Taking the module reference in case the trampoline's ip is
within the module code. Releasing it when the trampoline's
ip is unregistered.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210326105900.151466-1-jolsa@kernel.org
Right now we're never calling get_signal() from PF_IO_WORKER threads, but
in preparation for doing so, don't handle a fatal signal for them. The
workers have state they need to cleanup when exiting, so just return
instead of calling do_exit() on their behalf. The threads themselves will
detect a fatal signal and do proper shutdown.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- Modify the runtime PM device suspend to avoid suspending
supplier devices before the consumer device's status changes
to RPM_SUSPENDED (Rafael Wysocki).
- Change the Energy Model code to prevent it from attempting to
create its main debugfs directory too early (Lukasz Luba).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmBeCi8SHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxpLYQAJoZrFn3lBIt54Jr28n2DUkFphkt8Ve5
k0lYxW+UDmE+3NCgqrgIl1TB/oJzgPE6s15sv0gtO8IkAaHApoSZbWpT2hs5ipFs
S3zC1/IsBeHB4D2uf/AXhlRKpQtwltNlcTA+zR0bOZ/vqunxbECPpe2yotU4lZb4
EXKwEQEztM0K7uygNL2kxDklT0lOAJ4Ut9tK/wZsPLSO4YuHGgOtKXfnZAVGzupX
gBC5xnGSN3Dr/ywUnMlDc2mmyDots3W9g/uyvBPEUyqMx35kdfKy12Iwk98YbjQU
KIjKeSRPA0tmioXicNkzdikK/9ueV6Yk+89yP3AbmrVxXH0AmEu35wz9NII6nOVj
fSux1CACKFl0LAS0+BtESov2949XwXyOJsgHmKSP8jf5l7gmXk4UHSnU4OLC9t8l
7MjjlRb+0caHNJdpVtbl+eV3JTuR7vcz8xefXbr30r47YLD6MQukRrVi3yMrCWg0
vhLnIL4ZxGUB+D2HwKp3jm8Ezgh/xvnNOdUULLJPTsDWdcSC3SKd3P0KAIU5JAJa
YoNuCkzORHNXWG+B3FBJ35fyOpHDREF+JUqoS6tSxyptbTcLi6sUzHBl3YUTEhBC
nk7lhIUSNjl3kZy4hnm9PSZrbyvvgOw55k2dpZWGKu1UtZZ6/Ck/GTnnSwDJD0AA
cSPZrE0Hm6K7
=9csw
-----END PGP SIGNATURE-----
Merge tag 'pm-5.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix an issue related to device links in the runtime PM framework
and debugfs usage in the Energy Model code.
Specifics:
- Modify the runtime PM device suspend to avoid suspending supplier
devices before the consumer device's status changes to
RPM_SUSPENDED (Rafael Wysocki)
- Change the Energy Model code to prevent it from attempting to
create its main debugfs directory too early (Lukasz Luba)"
* tag 'pm-5.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: EM: postpone creating the debugfs dir till fs_initcall
PM: runtime: Defer suspending suppliers
The name string for BPF_XOR is "xor", not "or". Fix it.
Fixes: 981f94c3e921 ("bpf: Add bitwise atomic instructions")
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Brendan Jackman <jackmanb@google.com>
Link: https://lore.kernel.org/bpf/20210325134141.8533-1-xukuohai@huawei.com
With the introduction of the struct_ops program type, it became possible to
implement kernel functionality in BPF, making it viable to use BPF in place
of a regular kernel module for these particular operations.
Thus far, the only user of this mechanism is for implementing TCP
congestion control algorithms. These are clearly marked as GPL-only when
implemented as modules (as seen by the use of EXPORT_SYMBOL_GPL for
tcp_register_congestion_control()), so it seems like an oversight that this
was not carried over to BPF implementations. Since this is the only user
of the struct_ops mechanism, just enforcing GPL-only for the struct_ops
program type seems like the simplest way to fix this.
Fixes: 0baf26b0fcd7 ("bpf: tcp: Support tcp_congestion_ops in bpf")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210326100314.121853-1-toke@redhat.com
Merge misc fixes from Andrew Morton:
"14 patches.
Subsystems affected by this patch series: mm (hugetlb, kasan, gup,
selftests, z3fold, kfence, memblock, and highmem), squashfs, ia64,
gcov, and mailmap"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mailmap: update Andrey Konovalov's email address
mm/highmem: fix CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
mm: memblock: fix section mismatch warning again
kfence: make compatible with kmemleak
gcov: fix clang-11+ support
ia64: fix format strings for err_inject
ia64: mca: allocate early mca with GFP_ATOMIC
squashfs: fix xattr id and id lookup sanity checks
squashfs: fix inode lookup sanity checks
z3fold: prevent reclaim/free race for headless pages
selftests/vm: fix out-of-tree build
mm/mmu_notifiers: ensure range_end() is paired with range_start()
kasan: fix per-page tags for non-page_alloc pages
hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings
LLVM changed the expected function signatures for llvm_gcda_start_file()
and llvm_gcda_emit_function() in the clang-11 release. Users of
clang-11 or newer may have noticed their kernels failing to boot due to
a panic when enabling CONFIG_GCOV_KERNEL=y +CONFIG_GCOV_PROFILE_ALL=y.
Fix up the function signatures so calling these functions doesn't panic
the kernel.
Link: https://reviews.llvm.org/rGcdd683b516d147925212724b09ec6fb792a40041
Link: https://reviews.llvm.org/rG13a633b438b6500ecad9e4f936ebadf3411d0f44
Link: https://lkml.kernel.org/r/20210312224132.3413602-2-ndesaulniers@google.com
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reported-by: Prasad Sodagudi <psodagud@quicinc.com>
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Fangrui Song <maskray@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Cc: <stable@vger.kernel.org> [5.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mask is built in build_balance_mask() by for_each_cpu(i, sg_span), so
it must be a subset of sched_group_span(sg).
So the cpumask_and() call is redundant - remove it.
[ mingo: Adjusted the changelog a bit. ]
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com>
Link: https://lore.kernel.org/r/20210325023140.23456-1-song.bao.hua@hisilicon.com
-1 is -EPERM which is a somewhat odd error to return from
sched_dynamic_write(). No other callers care about which negative
value is used.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20210325004515.531631-2-linux@rasmusvillemoes.dk
Use the enum names which are also what is used in the switch() in
sched_dynamic_update().
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: https://lore.kernel.org/r/20210325004515.531631-1-linux@rasmusvillemoes.dk
Pull networking fixes from David Miller:
"Various fixes, all over:
1) Fix overflow in ptp_qoriq_adjfine(), from Yangbo Lu.
2) Always store the rx queue mapping in veth, from Maciej
Fijalkowski.
3) Don't allow vmlinux btf in map_create, from Alexei Starovoitov.
4) Fix memory leak in octeontx2-af from Colin Ian King.
5) Use kvalloc in bpf x86 JIT for storing jit'd addresses, from
Yonghong Song.
6) Fix tx ptp stats in mlx5, from Aya Levin.
7) Check correct ip version in tun decap, fropm Roi Dayan.
8) Fix rate calculation in mlx5 E-Switch code, from arav Pandit.
9) Work item memork leak in mlx5, from Shay Drory.
10) Fix ip6ip6 tunnel crash with bpf, from Daniel Borkmann.
11) Lack of preemptrion awareness in macvlan, from Eric Dumazet.
12) Fix data race in pxa168_eth, from Pavel Andrianov.
13) Range validate stab in red_check_params(), from Eric Dumazet.
14) Inherit vlan filtering setting properly in b53 driver, from
Florian Fainelli.
15) Fix rtnl locking in igc driver, from Sasha Neftin.
16) Pause handling fixes in igc driver, from Muhammad Husaini
Zulkifli.
17) Missing rtnl locking in e1000_reset_task, from Vitaly Lifshits.
18) Use after free in qlcnic, from Lv Yunlong.
19) fix crash in fritzpci mISDN, from Tong Zhang.
20) Premature rx buffer reuse in igb, from Li RongQing.
21) Missing termination of ip[a driver message handler arrays, from
Alex Elder.
22) Fix race between "x25_close" and "x25_xmit"/"x25_rx" in hdlc_x25
driver, from Xie He.
23) Use after free in c_can_pci_remove(), from Tong Zhang.
24) Uninitialized variable use in nl80211, from Jarod Wilson.
25) Off by one size calc in bpf verifier, from Piotr Krysiuk.
26) Use delayed work instead of deferrable for flowtable GC, from
Yinjun Zhang.
27) Fix infinite loop in NPC unmap of octeontx2 driver, from
Hariprasad Kelam.
28) Fix being unable to change MTU of dwmac-sun8i devices due to lack
of fifo sizes, from Corentin Labbe.
29) DMA use after free in r8169 with WoL, fom Heiner Kallweit.
30) Mismatched prototypes in isdn-capi, from Arnd Bergmann.
31) Fix psample UAPI breakage, from Ido Schimmel"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (171 commits)
psample: Fix user API breakage
math: Export mul_u64_u64_div_u64
ch_ktls: fix enum-conversion warning
octeontx2-af: Fix memory leak of object buf
ptp_qoriq: fix overflow in ptp_qoriq_adjfine() u64 calcalation
net: bridge: don't notify switchdev for local FDB addresses
net/sched: act_ct: clear post_ct if doing ct_clear
net: dsa: don't assign an error value to tag_ops
isdn: capi: fix mismatched prototypes
net/mlx5: SF, do not use ecpu bit for vhca state processing
net/mlx5e: Fix division by 0 in mlx5e_select_queue
net/mlx5e: Fix error path for ethtool set-priv-flag
net/mlx5e: Offload tuple rewrite for non-CT flows
net/mlx5e: Allow to match on MPLS parameters only for MPLS over UDP
net/mlx5: Add back multicast stats for uplink representor
net: ipconfig: ic_dev can be NULL in ic_close_devs
MAINTAINERS: Combine "QLOGIC QLGE 10Gb ETHERNET DRIVER" sections into one
docs: networking: Fix a typo
r8169: fix DMA being used after buffer free if WoL is enabled
net: ipa: fix init header command validation
...
This commit causes rcutorture to test the new start_poll_synchronize_rcu()
and poll_state_synchronize_rcu() functions. Because of the difficulty of
determining the nature of a synchronous RCU grace (expedited or not),
the test that insisted that poll_state_synchronize_rcu() detect an
intervening synchronize_rcu() had to be dropped.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
There is a need for a non-blocking polling interface for RCU grace
periods, so this commit supplies start_poll_synchronize_rcu() and
poll_state_synchronize_rcu() for this purpose. Note that the existing
get_state_synchronize_rcu() may be used if future grace periods are
inevitable (perhaps due to a later call_rcu() invocation). The new
start_poll_synchronize_rcu() is to be used if future grace periods
might not otherwise happen. Finally, poll_state_synchronize_rcu()
provides a lockless check for a grace period having elapsed since
the corresponding call to either of the get_state_synchronize_rcu()
or start_poll_synchronize_rcu().
As with get_state_synchronize_rcu(), the return value from either
get_state_synchronize_rcu() or start_poll_synchronize_rcu() is passed in
to a later call to either poll_state_synchronize_rcu() or the existing
(might_sleep) cond_synchronize_rcu().
[ paulmck: Revert cond_synchronize_rcu() to might_sleep() per Frederic Weisbecker feedback. ]
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
gcc warns about an empty statement when audit_remove_mark is defined to
nothing:
kernel/auditfilter.c: In function 'audit_data_to_entry':
kernel/auditfilter.c:609:51: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
609 | audit_remove_mark(entry->rule.exe); /* that's the template one */
| ^
Change the macros to use the usual "do { } while (0)" instead, and change a
few more that were (void)0, for consistency.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The debugfs directory '/sys/kernel/debug/energy_model' is needed before
the Energy Model registration can happen. With the recent change in
debugfs subsystem it's not allowed to create this directory at early
stage (core_initcall). Thus creating this directory would fail.
Postpone the creation of the EM debug dir to later stage: fs_initcall.
It should be safe since all clients: CPUFreq drivers, Devfreq drivers
will be initialized in later stages.
The custom debug log below prints the time of creation the EM debug dir
at fs_initcall and successful registration of EMs at later stages.
[ 1.505717] energy_model: creating rootdir
[ 3.698307] cpu cpu0: EM: created perf domain
[ 3.709022] cpu cpu1: EM: created perf domain
Fixes: 56348560d495 ("debugfs: do not attempt to create a new file before the filesystem is initalized")
Reported-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
A long-tail load balance cost is observed on the newly idle path,
this is caused by a race window between the first nr_running check
of the busiest runqueue and its nr_running recheck in detach_tasks.
Before the busiest runqueue is locked, the tasks on the busiest
runqueue could be pulled by other CPUs and nr_running of the busiest
runqueu becomes 1 or even 0 if the running task becomes idle, this
causes detach_tasks breaks with LBF_ALL_PINNED flag set, and triggers
load_balance redo at the same sched_domain level.
In order to find the new busiest sched_group and CPU, load balance will
recompute and update the various load statistics, which eventually leads
to the long-tail load balance cost.
This patch clears LBF_ALL_PINNED flag for this race condition, and hence
reduces the long-tail cost of newly idle balance.
Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/1614154549-116078-1-git-send-email-aubrey.li@intel.com
update_idle_core() is only done for the case of sched_smt_present.
but test_idle_cores() is done for all machines even those without
SMT.
This can contribute to up 8%+ hackbench performance loss on a
machine like kunpeng 920 which has no SMT. This patch removes the
redundant test_idle_cores() for !SMT machines.
Hackbench is ran with -g {2..14}, for each g it is ran 10 times to get
an average.
$ numactl -N 0 hackbench -p -T -l 20000 -g $1
The below is the result of hackbench w/ and w/o this patch:
g= 2 4 6 8 10 12 14
w/o: 1.8151 3.8499 5.5142 7.2491 9.0340 10.7345 12.0929
w/ : 1.8428 3.7436 5.4501 6.9522 8.2882 9.9535 11.3367
+4.1% +8.3% +7.3% +6.3%
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20210320221432.924-1-song.bao.hua@hisilicon.com
We noticed that the cost of psi increases with the increase in the
levels of the cgroups. Particularly the cost of cpu_clock() sticks out
as the kernel calls it multiple times as it traverses up the cgroup
tree. This patch reduces the calls to cpu_clock().
Performed perf bench on Intel Broadwell with 3 levels of cgroup.
Before the patch:
$ perf bench sched all
# Running sched/messaging benchmark...
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.747 [sec]
# Running sched/pipe benchmark...
# Executed 1000000 pipe operations between two processes
Total time: 3.516 [sec]
3.516689 usecs/op
284358 ops/sec
After the patch:
$ perf bench sched all
# Running sched/messaging benchmark...
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.640 [sec]
# Running sched/pipe benchmark...
# Executed 1000000 pipe operations between two processes
Total time: 3.329 [sec]
3.329820 usecs/op
300316 ops/sec
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lkml.kernel.org/r/20210321205156.4186483-1-shakeelb@google.com
Most callsites were covered by commit
a8b62fd08505 ("stop_machine: Add function and caller debug info")
but this skipped queue_stop_cpus_work(). Add caller debug info to it.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201210163830.21514-2-valentin.schneider@arm.com
Clang doesn't like format strings that truncate a 32-bit
value to something shorter:
kernel/locking/lockdep.c:709:4: error: format specifies type 'short' but the argument has type 'int' [-Werror,-Wformat]
In this case, the warning is a slightly questionable, as it could realize
that both class->wait_type_outer and class->wait_type_inner are in fact
8-bit struct members, even though the result of the ?: operator becomes an
'int'.
However, there is really no point in printing the number as a 16-bit
'short' rather than either an 8-bit or 32-bit number, so just change
it to a normal %d.
Fixes: de8f5e4f2dc1 ("lockdep: Introduce wait-type checks")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210322115531.3987555-1-arnd@kernel.org
Of the three LSMs that implement the security_task_getsecid() LSM
hook, all three LSMs provide the task's objective security
credentials. This turns out to be unfortunate as most of the hook's
callers seem to expect the task's subjective credentials, although
a small handful of callers do correctly expect the objective
credentials.
This patch is the first step towards fixing the problem: it splits
the existing security_task_getsecid() hook into two variants, one
for the subjective creds, one for the objective creds.
void security_task_getsecid_subj(struct task_struct *p,
u32 *secid);
void security_task_getsecid_obj(struct task_struct *p,
u32 *secid);
While this patch does fix all of the callers to use the correct
variant, in order to keep this patch focused on the callers and to
ease review, the LSMs continue to use the same implementation for
both hooks. The net effect is that this patch should not change
the behavior of the kernel in any way, it will be up to the latter
LSM specific patches in this series to change the hook
implementations and return the correct credentials.
Acked-by: Mimi Zohar <zohar@linux.ibm.com> (IMA)
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
There is a need for a non-blocking polling interface for RCU grace
periods, so this commit supplies start_poll_synchronize_rcu() and
poll_state_synchronize_rcu() for this purpose. Note that the existing
get_state_synchronize_rcu() may be used if future grace periods are
inevitable (perhaps due to a later call_rcu() invocation). The new
start_poll_synchronize_rcu() is to be used if future grace periods
might not otherwise happen. Finally, poll_state_synchronize_rcu()
provides a lockless check for a grace period having elapsed since
the corresponding call to either of the get_state_synchronize_rcu()
or start_poll_synchronize_rcu().
As with get_state_synchronize_rcu(), the return value from either
get_state_synchronize_rcu() or start_poll_synchronize_rcu() is passed in
to a later call to either poll_state_synchronize_rcu() or the existing
(might_sleep) cond_synchronize_rcu().
[ paulmck: Remove redundant smp_mb() per Frederic Weisbecker feedback. ]
[ Update poll_state_synchronize_rcu() docbook per Frederic Weisbecker feedback. ]
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The Frequency Invariance Engine (FIE) is providing a frequency scaling
correction factor that helps achieve more accurate load-tracking.
Normally, this scaling factor can be obtained directly with the help of
the cpufreq drivers as they know the exact frequency the hardware is
running at. But that isn't the case for CPPC cpufreq driver.
Another way of obtaining that is using the arch specific counter
support, which is already present in kernel, but that hardware is
optional for platforms.
This patch updates the CPPC driver to register itself with the topology
core to provide its own implementation (cppc_scale_freq_tick()) of
topology_scale_freq_tick() which gets called by the scheduler on every
tick. Note that the arch specific counters have higher priority than
CPPC counters, if available, though the CPPC driver doesn't need to have
any special handling for that.
On an invocation of cppc_scale_freq_tick(), we schedule an irq work
(since we reach here from hard-irq context), which then schedules a
normal work item and cppc_scale_freq_workfn() updates the per_cpu
arch_freq_scale variable based on the counter updates since the last
tick.
To allow platforms to disable this CPPC counter-based frequency
invariance support, this is all done under CONFIG_ACPI_CPPC_CPUFREQ_FIE,
which is enabled by default.
This also exports sched_setattr_nocheck() as the CPPC driver can be
built as a module.
Cc: linux-acpi@vger.kernel.org
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Fix ~42 single-word typos in scheduler code comments.
We have accumulated a few fun ones over the years. :-)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: linux-kernel@vger.kernel.org
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmBXahgQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgppMVEAC+Kn8AmNPbV7/AX3jfZYEh1UwyPetpJQ2m
FiWkXnuG85kM3UD12S5RYEYkHxzSob2d1yfZ+kL1TAkVJaz3FVoUU9ms0guXfCNb
l8k5fgK2zlegCyBIsPnouR/zV4Y/GJjf+tY0/c1e2Ovfl1zjCW486PvwjJzjMy8b
rXUi3MMKB3JPltML152qi9S1lJJuIHMB22ZUdTiyX+u4RtCzvGHGZmlpb4sw73RF
IRN7qBDYy5Pth+PCUBrhveIPmF/QSKhPHTarczIkgqSw/fSslsgEdBe88fxBDfbf
+WIaYifwqDongT4wkboXFUPTkSUlA+TbvnMW6dRZJTJvRspKz0SV4l+xC/QvT231
JqHqvRk2FkdVlpfXBvdVz94jLFiBJSl02QqTseQGbRdFY4BvxqkC15z4HkPdldJ8
QM2+6ZfzVWbzZkssgK42kTuDq9EX5Ks/+rOkIM/z2L5D00sbeeCVGCeNXf3uS7So
s7pskeTOLoXSvTpwzzEBEpJ6ebU698B1hx++Hjuy95Zifs2holkHXu36wvYmWFDm
CmxZ48waSQJq/emjbOSYfJthKc/TmaUzocsnMvSA5eoCmP445OUQJJTfifEj50if
/k0+XTi1DOrYHyy8R7a8T7xXDJIlMGY7fZyvmzopfRlJHnaHkeBfpbSaPCZXoAiJ
8T/mkYohAw==
=xaEf
-----END PGP SIGNATURE-----
Merge tag 'io_uring-5.12-2021-03-21' of git://git.kernel.dk/linux-block
Pull io_uring followup fixes from Jens Axboe:
- The SIGSTOP change from Eric, so we properly ignore that for
PF_IO_WORKER threads.
- Disallow sending signals to PF_IO_WORKER threads in general, we're
not interested in having them funnel back to the io_uring owning
task.
- Stable fix from Stefan, ensuring we properly break links for short
send/sendmsg recv/recvmsg if MSG_WAITALL is set.
- Catch and loop when needing to run task_work before a PF_IO_WORKER
threads goes to sleep.
* tag 'io_uring-5.12-2021-03-21' of git://git.kernel.dk/linux-block:
io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL
io-wq: ensure task is running before processing task_work
signal: don't allow STOP on PF_IO_WORKER threads
signal: don't allow sending any signals to PF_IO_WORKER threads
plus a DocBook fix.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmBXMJIRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1jl4BAAoDqrifzbrgC3wylJALpEAYnPJ1uPdAKP
tE1O1wPoJhb9P2b5ktWUiRzrAx9wpRD3Z3nIxsGgUAC1G/StJ9mF/XgigF0QSAFl
rn49iey6XljcB9prBpFnFkS9C4LmYX4P+0KDImerriSI2rHE/jlhBZrhlQRKTfcj
tHssqsu4i0ZH/O2xmOd0wOeDXiF/EkQX1FFekjfxFa+1xACW979Ucf8RTWjfhkVl
Dtvort/WC/VDzDXH+B0uPVGornTjZL6U6YcsmXu8EmXNo2htgHSkUBvLDMEs/T1q
vtkoTzoz4nrndSCDzSLZJOgp/qCn8Nf2iYesxzV8EICOj6ZDSqpOFIBH/dI0Swvi
8mUzzLRJ4Tb/ng806DBBxZw80q3SWt5VngBZjW37cSyIDtFRvdsp8F/VavBTvPx8
7rleLF0vftWTVVSiBluzZQiIb7wYqr/zQT9Umne/DfvPCqZi9GnJLcBU50Sg/fEB
cAMc8D6jYkoHiYT3eHr/O7QxNyyf7kaMfNMZV0Io71WTYudCvQOPTF055fWLD1+w
zc0MTuIWl+wkLlV9XQ8y9ol/frpN97tHRBOHSiukcci+7YVQwB4J6hla7094GpLl
6zNqQza2QrGtAX9lbwLlXGdnAqOQExyu+sGHZS7IdUUgj2z047iFzOPepWqqYimL
RHO/DJLSGqI=
=IkEX
-----END PGP SIGNATURE-----
Merge tag 'irq-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Ingo Molnar:
"A change to robustify force-threaded IRQ handlers to always disable
interrupts, plus a DocBook fix.
The force-threaded IRQ handler change has been accelerated from the
normal schedule of such a change to keep the bad pattern/workaround of
spin_lock_irqsave() in handlers or IRQF_NOTHREAD as a kludge from
spreading"
* tag 'irq-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Disable interrupts for force threaded handlers
genirq/irq_sim: Fix typos in kernel doc (fnode -> fwnode)
devicetree-node lookups.
- Restore the IRQ2 ignore logic
- Fix get_nr_restart_syscall() to return the correct restart syscall number.
Split in a 4-patches set to avoid kABI breakage when backporting to dead
kernels.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmBXJu0ACgkQEsHwGGHe
VUrCkQ/9Et5W76HMQfHccluks2i2yNXgd7nROhIt0iMS1Ph86AWYJZmMZ2dbaqW8
nORU20ziHme+9PScmcJb2LdJxIRDtYNs1J811IYeKNpvj8KHXtV2VYCVG9UcL21E
FmUlZf5oINiDMzu3q4SuqHw9t7X6RCItolQIRmQHDXqPraFhBxji2VOFXDIg+qhf
a4sBz6UfxA4a/b7d/KxHxNvuQE5Cluc9gninhtaYh1b7OQZJX4+vTa3W5V4kK0df
ohOH5pnJp9V7qH2CmB3UcGWJTxHeLbm4E0KYkyasnKG9M0KmIvJ6jNARlRAo3hAF
hn9D4xLtsnIWjtO6xEVdF7kSizkYZRPay5kX88quvlSa0FkkPnsUvFtW79Yi3ZNy
vL2NAu2biqNQyo7ZWVffJns2DrJwYZ6KOGA6oUBwTUBfieF9KMdDew8IXRUMYNdO
LzW87Irf9eZj9c+b7Rtr0VofmKgRYwy1Lo8eVT+VGkV+nOTOB9rlAll2lYBq3aNA
W6ei0S5/1zaRF5aU6Qmnap4eb1X/tp845q6CPYa9kIsZwVyGFOa7iLeYcNn9qHdB
G6RW6CUh97A7wwxUYt5VGUscjYV2V9Ycv9HvIwrG/T7aezWnhI9ODtggzDgCnbls
og6N/+heLZ9G/DyxAEmHuazV2ItDPJq69gag/POHhXJaSUGbdbA=
=WfC4
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
"The freshest pile of shiny x86 fixes for 5.12:
- Add the arch-specific mapping between physical and logical CPUs to
fix devicetree-node lookups
- Restore the IRQ2 ignore logic
- Fix get_nr_restart_syscall() to return the correct restart syscall
number. Split in a 4-patches set to avoid kABI breakage when
backporting to dead kernels"
* tag 'x86_urgent_for_v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic/of: Fix CPU devicetree-node lookups
x86/ioapic: Ignore IRQ2 again
x86: Introduce restart_block->arch_data to remove TS_COMPAT_RESTART
x86: Introduce TS_COMPAT_RESTART to fix get_nr_restart_syscall()
x86: Move TS_COMPAT back to asm/thread_info.h
kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data()
Just like we don't allow normal signals to IO threads, don't deliver a
STOP to a task that has PF_IO_WORKER set. The IO threads don't take
signals in general, and have no means of flushing out a stop either.
Longer term, we may want to look into allowing stop of these threads,
as it relates to eg process freezing. For now, this prevents a spin
issue if a SIGSTOP is delivered to the parent task.
Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
They don't take signals individually, and even if they share signals with
the parent task, don't allow them to be delivered through the worker
thread. Linux does allow this kind of behavior for regular threads, but
it's really a compatability thing that we need not care about for the IO
threads.
Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Since this message is printed when dynamically allocated spinlocks (e.g.
kzalloc()) are used without initialization (e.g. spin_lock_init()),
suggest to developers to check whether initialization functions for objects
were called, before making developers wonder what annotation is missing.
[ mingo: Minor tweaks to the message. ]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210321064913.4619-1-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With interrupt force threading all device interrupt handlers are invoked
from kernel threads. Contrary to hard interrupt context the invocation only
disables bottom halfs, but not interrupts. This was an oversight back then
because any code like this will have an issue:
thread(irq_A)
irq_handler(A)
spin_lock(&foo->lock);
interrupt(irq_B)
irq_handler(B)
spin_lock(&foo->lock);
This has been triggered with networking (NAPI vs. hrtimers) and console
drivers where printk() happens from an interrupt which interrupted the
force threaded handler.
Now people noticed and started to change the spin_lock() in the handler to
spin_lock_irqsave() which affects performance or add IRQF_NOTHREAD to the
interrupt request which in turn breaks RT.
Fix the root cause and not the symptom and disable interrupts before
invoking the force threaded handler which preserves the regular semantics
and the usefulness of the interrupt force threading as a general debugging
tool.
For not RT this is not changing much, except that during the execution of
the threaded handler interrupts are delayed until the handler
returns. Vs. scheduling and softirq processing there is no difference.
For RT kernels there is no issue.
Fixes: 8d32a307e4fa ("genirq: Provide forced interrupt threading")
Reported-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Johan Hovold <johan@kernel.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20210317143859.513307808@linutronix.de
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmBVI8cQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpuFOD/494N0khk5EpLnoq0+/uyRpnqnTjL3n+iWc
fviiodL2/eirKWML/WbNUaKOWMs76iBwRqvTFnmCuyVexM9iPq3BXHocNYESYFni
0EfuL+jzs/LjQLVJgCxyYUyafDtCGZ5ct/3ilfGWSY13ngfYdUVT1p+u9NK94T63
4SrT6KKqEnpStpA1kjCw+doL17Tx2jrcrnX8gztIm0IarTnJGusiNZboy1IBMcqf
Lw7CEePn4b9/0wKJa8sDYIFtI8Rvj2Jk86c4DDpGgoPU6I9fGPnp3oMGrxlwectT
uTguzTlKAvbSu6v+2jqHCcXpkOG3aQJJM+YaNZmWOKwkLdyzLLIDT7SPlNHlacDF
yBj+Ou3FbKvVUrYldUHlQoLZIAgp7AQO1JBilijNNibXsH0M4Gaw3aGPFmhEFfeJ
/y+DXEfi2TGC6Yo+Ogub9Rh3gd2kgATu9Qbbnxi5TmYFc6WASBHP3OQEMVpVkD6F
IZxZDvIKMj3DoYX3Can0vlqiWhmL5o7gyaRTkmxc4A21CR+AHstupDNTHbR23IsY
dVxWmfrU25VFcIUAUOUgzPayDRn5KevexXjpkC8MVPQUqe/8FgI18eigDWTwlkcG
0AZUraswv8uT5b0oLj9cawtAU9Dlit7niI6r9I3dtoUAD3JY4+yDp7oZp2TTOV2z
+rgS+5zjug==
=aPxz
-----END PGP SIGNATURE-----
Merge tag 'io_uring-5.12-2021-03-19' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Quieter week this time, which was both expected and desired. About
half of the below is fixes for this release, the other half are just
fixes in general. In detail:
- Fix the freezing of IO threads, by making the freezer not send them
fake signals. Make them freezable by default.
- Like we did for personalities, move the buffer IDR to xarray. Kills
some code and avoids a use-after-free on teardown.
- SQPOLL cleanups and fixes (Pavel)
- Fix linked timeout race (Pavel)
- Fix potential completion post use-after-free (Pavel)
- Cleanup and move internal structures outside of general kernel view
(Stefan)
- Use MSG_SIGNAL for send/recv from io_uring (Stefan)"
* tag 'io_uring-5.12-2021-03-19' of git://git.kernel.dk/linux-block:
io_uring: don't leak creds on SQO attach error
io_uring: use typesafe pointers in io_uring_task
io_uring: remove structures from include/linux/io_uring.h
io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls
io_uring: fix sqpoll cancellation via task_work
io_uring: add generic callback_head helpers
io_uring: fix concurrent parking
io_uring: halt SQO submission on ctx exit
io_uring: replace sqd rw_semaphore with mutex
io_uring: fix complete_post use ctx after free
io_uring: fix ->flags races by linked timeouts
io_uring: convert io_buffer_idr to XArray
io_uring: allow IO worker threads to be frozen
kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing
When irq_matrix_free() is called for an unallocated vector the
managed_allocated and total_allocated counters get out of sync with the
real state of the matrix. Later, when the last interrupt is freed, these
counters will underflow resulting in UINTMAX because the counters are
unsigned.
While this is certainly a problem of the calling code, this can be catched
in the allocator by checking the allocation bit for the to be freed vector
which simplifies debugging.
An example of the problem described above:
https://lore.kernel.org/lkml/20210318192819.636943062@linutronix.de/
Add the missing sanity check and emit a warning when it triggers.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210319111823.1105248-1-vkuznets@redhat.com
Simplify kdb commands registration via using linked list instead of
static array for commands storage.
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lore.kernel.org/r/20210224070827.408771-1-sumit.garg@linaro.org
Reviewed-by: Douglas Anderson <dianders@chromium.org>
[daniel.thompson@linaro.org: Removed a bunch of .cmd_minline = 0
initializers]
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Cleanup kdb code to get rid of unused function definitions/prototypes.
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lore.kernel.org/r/20210224071653.409150-1-sumit.garg@linaro.org
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Sites that match init_section_contains() get marked as INIT. For
built-in code init_sections contains both __init and __exit text. OTOH
kernel_text_address() only explicitly includes __init text (and there
are no __exit text markers).
Match what jump_label already does and ignore the warning for INIT
sites. Also see the excellent changelog for commit: 8f35eaa5f2de
("jump_label: Don't warn on __exit jump entries")
Fixes: 9183c3f9ed710 ("static_call: Add inline static call infrastructure")
Reported-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lkml.kernel.org/r/20210318113610.739542434@infradead.org
The intent is to avoid writing init code after init (because the text
might have been freed). The code is needlessly different between
jump_label and static_call and not obviously correct.
The existing code relies on the fact that the module loader clears the
init layout, such that within_module_init() always fails, while
jump_label relies on the module state which is more obvious and
matches the kernel logic.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lkml.kernel.org/r/20210318113610.636651340@infradead.org
It turns out that static_call_set_init() does not preserve the other
flags; IOW. it clears TAIL if it was set.
Fixes: 9183c3f9ed710 ("static_call: Add inline static call infrastructure")
Reported-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lkml.kernel.org/r/20210318113610.519406371@infradead.org
The ww_acquire_ctx structure for ww_mutex needs to persist for a complete
lock/unlock cycle. In the ww_mutex test in locktorture, however, both
ww_acquire_init() and ww_acquire_fini() are called within the lock
function only. This causes a lockdep splat of "WARNING: Nested lock
was not taken" when lockdep is enabled in the kernel.
To fix this problem, we need to move the ww_acquire_fini() after
the ww_mutex_unlock() in torture_ww_mutex_unlock(). This is done by
allocating a global array of ww_acquire_ctx structures. Each locking
thread is associated with its own ww_acquire_ctx via the unique thread
id it has so that both the lock and unlock functions can access the
same ww_acquire_ctx structure.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210318172814.4400-6-longman@redhat.com