Pull workqueue updates from Tejun Heo:
"Just a couple cleanup patches. No functional changes."
* 'for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Remove GPF argument from alloc_workqueue_attrs()
workqueue: Make alloc/apply/free_workqueue_attrs() static
Pull integrity updates from Mimi Zohar:
"Bug fixes, code clean up, and new features:
- IMA policy rules can be defined in terms of LSM labels, making the
IMA policy dependent on LSM policy label changes, in particular LSM
label deletions. The new environment, in which IMA-appraisal is
being used, frequently updates the LSM policy and permits LSM label
deletions.
- Prevent an mmap'ed shared file opened for write from also being
mmap'ed execute. In the long term, making this and other similar
changes at the VFS layer would be preferable.
- The IMA per policy rule template format support is needed for a
couple of new/proposed features (eg. kexec boot command line
measurement, appended signatures, and VFS provided file hashes).
- Other than the "boot-aggregate" record in the IMA measurement
list, all other measurements are of file data. Measuring and
storing the kexec boot command line in the IMA measurement list is
the first buffer based measurement included in the measurement
list"
* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
integrity: Introduce struct evm_xattr
ima: Update MAX_TEMPLATE_NAME_LEN to fit largest reasonable definition
KEXEC: Call ima_kexec_cmdline to measure the boot command line args
IMA: Define a new template field buf
IMA: Define a new hook to measure the kexec boot command line arguments
IMA: support for per policy rule template formats
integrity: Fix __integrity_init_keyring() section mismatch
ima: Use designated initializers for struct ima_event_data
ima: use the lsm policy update notifier
LSM: switch to blocking policy update notifiers
x86/ima: fix the Kconfig dependency for IMA_ARCH_POLICY
ima: Make arch_policy_entry static
ima: prevent a file already mmap'ed write to be mmap'ed execute
x86/ima: check EFI SetupMode too
Merge tag 'keys-namespace-20190627' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull keyring namespacing from David Howells:
"These patches help make keys and keyrings more namespace aware.
Firstly some miscellaneous patches to make the process easier:
- Simplify key index_key handling so that the word-sized chunks
assoc_array requires don't have to be shifted about, making it
easier to add more bits into the key.
- Cache the hash value in the key so that we don't have to calculate
on every key we examine during a search (it involves a bunch of
multiplications).
- Allow keyring_search() to search non-recursively.
Then the main patches:
- Make it so that keyring names are per-user_namespace from the point
of view of KEYCTL_JOIN_SESSION_KEYRING so that they're not
accessible cross-user_namespace.
keyctl_capabilities() shows KEYCTL_CAPS1_NS_KEYRING_NAME for this.
- Move the user and user-session keyrings to the user_namespace
rather than the user_struct. This prevents them propagating
directly across user_namespaces boundaries (ie. the KEY_SPEC_*
flags will only pick from the current user_namespace).
- Make it possible to include the target namespace in which the key
shall operate in the index_key. This will allow the possibility of
multiple keys with the same description, but different target
domains to be held in the same keyring.
keyctl_capabilities() shows KEYCTL_CAPS1_NS_KEY_TAG for this.
- Make it so that keys are implicitly invalidated by removal of a
domain tag, causing them to be garbage collected.
- Institute a network namespace domain tag that allows keys to be
differentiated by the network namespace in which they operate. New
keys that are of a type marked 'KEY_TYPE_NET_DOMAIN' are assigned
the network domain in force when they are created.
- Make it so that the desired network namespace can be handed down
into the request_key() mechanism. This allows AFS, NFS, etc. to
request keys specific to the network namespace of the superblock.
This also means that the keys in the DNS record cache are
thenceforth namespaced, provided network filesystems pass the
appropriate network namespace down into dns_query().
For DNS, AFS and NFS are good, whilst CIFS and Ceph are not. Other
cache keyrings, such as idmapper keyrings, also need to set the
domain tag - for which they need access to the network namespace of
the superblock"
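As a rough illustration of the domain-tag idea described above, the
following Python toy model indexes keys by type, description and domain
tag. The class and method names are hypothetical, and the kernel's real
data structures are not modelled. It only sketches why two keys with the
same description can sit in one keyring, and how removing a tag makes its
keys disappear.

```python
class Keyring:
    """Toy keyring: NOT the kernel implementation, names are hypothetical."""

    def __init__(self):
        # Index key is (type, description, domain_tag), so the tag is
        # part of the match criteria rather than an afterthought.
        self._keys = {}

    def link(self, key_type, description, payload, domain_tag=None):
        self._keys[(key_type, description, domain_tag)] = payload

    def search(self, key_type, description, domain_tag=None):
        return self._keys.get((key_type, description, domain_tag))

    def remove_domain(self, domain_tag):
        # Model of "implicit invalidation": every key carrying the
        # removed tag is dropped. Returns how many keys were collected.
        doomed = [k for k in self._keys if k[2] == domain_tag]
        for k in doomed:
            del self._keys[k]
        return len(doomed)
```

With this model, two "dns_resolver" keys for the same host name can be
linked under two different network-namespace tags, and a search only
returns the one whose tag matches the caller's.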
* tag 'keys-namespace-20190627' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
keys: Pass the network namespace into request_key mechanism
keys: Network namespace domain tag
keys: Garbage collect keys for which the domain has been removed
keys: Include target namespace in match criteria
keys: Move the user and user-session keyrings to the user_namespace
keys: Namespace keyring names
keys: Add a 'recurse' flag for keyring searches
keys: Cache the hash value to avoid lots of recalculation
keys: Simplify key description management
Merge tag 'keys-request-20190626' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull request_key improvements from David Howells:
"These are all request_key()-related, including a fix and some improvements:
- Fix the lack of a Link permission check on a key found by
request_key(), thereby enabling request_key() to link keys that
don't grant this permission to the target keyring (which must still
grant Write permission).
Note that the key must be in the caller's keyrings already to be
found.
- Invalidate used request_key authentication keys rather than
revoking them, so that they get cleaned up immediately rather than
hanging around till the expiry time is passed.
- Move the RCU locks outwards from the keyring search functions so
that a request_key_rcu() can be provided. This can be called in RCU
mode, so it can't sleep and can't upcall - but it can be called
from LOOKUP_RCU pathwalk mode.
- Cache the latest positive result of request_key*() temporarily in
task_struct so that filesystems that make a lot of request_key()
calls during pathwalk can take advantage of it to avoid having to
redo the searching. This requires CONFIG_KEYS_REQUEST_CACHE=y.
It is assumed that the key just found is likely to be used multiple
times in each step in an RCU pathwalk, and is likely to be reused
for the next step too.
Note that the cleanup of the cache is done on TIF_NOTIFY_RESUME,
just before userspace resumes, and on exit"
* tag 'keys-request-20190626' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
keys: Kill off request_key_async{,_with_auxdata}
keys: Cache result of request_key*() temporarily in task_struct
keys: Provide request_key_rcu()
keys: Move the RCU locks outwards from the keyring search functions
keys: Invalidate used request_key authentication keys
keys: Fix request_key() lack of Link perm check on found key
Merge tag 'keys-misc-20190619' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull misc keyring updates from David Howells:
"These are some miscellaneous keyrings fixes and improvements:
- Fix a bunch of warnings from sparse, including missing RCU bits and
kdoc-function argument mismatches
- Implement a keyctl to allow a key to be moved from one keyring to
another, with the option of prohibiting key replacement in the
destination keyring.
- Grant Link permission to possessors of request_key_auth tokens so
that upcall servicing daemons can more easily arrange things such
that only the necessary auth key is passed to the actual service
program, and not all the auth keys a daemon might possess.
- Improvement in lookup_user_key().
- Implement a keyctl to allow keyrings subsystem capabilities to be
queried.
The keyutils next branch has commits to make available, document and
test the move-key and capabilities code:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log
They're currently on the 'next' branch"
* tag 'keys-misc-20190619' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
keys: Add capability-checking keyctl function
keys: Reuse keyring_index_key::desc_len in lookup_user_key()
keys: Grant Link permission to possessers of request_key auth keys
keys: Add a keyctl to move a key between keyrings
keys: Hoist locking out of __key_link_begin()
keys: Break bits out of key_unlink()
keys: Change keyring_serialise_link_sem to a mutex
keys: sparse: Fix kdoc mismatches
keys: sparse: Fix incorrect RCU accesses
keys: sparse: Fix key_fs[ug]id_changed()
Merge tag 'audit-pr-20190702' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
"This pull request is a bit early, but with some vacation time coming
up I wanted to send this out now just in case the remote Internet Gods
decide not to smile on me once the merge window opens. The patchset
for v5.3 is pretty minor this time, the highlights include:
- When the audit daemon is sent a signal, ensure we deliver
information about the sender even when syscall auditing is not
enabled/supported.
- Add the ability to filter audit records based on network address
family.
- Tighten the audit field filtering restrictions on string based
fields.
- Cleanup the audit field filtering verification code.
- Remove a few BUG() calls from the audit code"
* tag 'audit-pr-20190702' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: remove the BUG() calls in the audit rule comparison functions
audit: enforce op for string fields
audit: add saddr_fam filter field
audit: re-structure audit field valid checks
audit: deliver signal_info regarless of syscall
Pull scheduler updates from Ingo Molnar:
- Remove the unused per rq load array and all its infrastructure, by
Dietmar Eggemann.
- Add utilization clamping support by Patrick Bellasi. This is a
refinement of the energy aware scheduling framework with support for
boosting of interactive and capping of background workloads: to make
sure critical GUI threads get maximum frequency ASAP, and to make
sure background processing doesn't unnecessarily move the cpufreq
governor to higher frequencies and less energy efficient CPU modes.
- Add the bare minimum of tracepoints required for LISA EAS regression
testing, by Qais Yousef - which allows automated testing of various
power management features, including energy aware scheduling.
- Restructure the former tsk_nr_cpus_allowed() facility that the -rt
kernel used to modify the scheduler's CPU affinity logic such as
migrate_disable() - introduce the task->cpus_ptr value instead of
taking the address of &task->cpus_allowed directly - by Sebastian
Andrzej Siewior.
- Misc optimizations, fixes, cleanups and small enhancements - see the
Git log for details.
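The boosting and capping behaviour of utilization clamping can be pictured
with a small Python sketch. This is only a model of the idea, not the
scheduler code: the function name is hypothetical, and the per-CPU bucket
tracking done by the real patches is left out. The 0..1024 range follows
the kernel's SCHED_CAPACITY_SCALE.

```python
SCHED_CAPACITY_SCALE = 1024  # full capacity of the biggest CPU


def clamp_util(util, util_min=0, util_max=SCHED_CAPACITY_SCALE):
    """Return the utilization a frequency governor would see (toy model).

    util_min boosts a task (interactive work), util_max caps it
    (background work). Both names are illustrative.
    """
    if not 0 <= util_min <= util_max <= SCHED_CAPACITY_SCALE:
        raise ValueError("expected 0 <= util_min <= util_max <= 1024")
    return max(util_min, min(util, util_max))
```

A lightly loaded GUI thread with util_min=512 is reported as at least half
capacity, while a busy background job with util_max=256 never asks for
more than a quarter.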
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
sched/uclamp: Add uclamp support to energy_compute()
sched/uclamp: Add uclamp_util_with()
sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks
sched/uclamp: Set default clamps for RT tasks
sched/uclamp: Reset uclamp values on RESET_ON_FORK
sched/uclamp: Extend sched_setattr() to support utilization clamping
sched/core: Allow sched_setattr() to use the current policy
sched/uclamp: Add system default clamps
sched/uclamp: Enforce last task's UCLAMP_MAX
sched/uclamp: Add bucket local max tracking
sched/uclamp: Add CPU's clamp buckets refcounting
sched/fair: Rename weighted_cpuload() to cpu_runnable_load()
sched/debug: Export the newly added tracepoints
sched/debug: Add sched_overutilized tracepoint
sched/debug: Add new tracepoint to track PELT at se level
sched/debug: Add new tracepoints to track PELT at rq level
sched/debug: Add a new sched_trace_*() helper functions
sched/autogroup: Make autogroup_path() always available
sched/wait: Deduplicate code with do-while
sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity()
...
Pull locking updates from Ingo Molnar:
"The main changes in this cycle are:
- rwsem scalability improvements, phase #2, by Waiman Long, which are
rather impressive:
"On a 2-socket 40-core 80-thread Skylake system with 40 reader
and writer locking threads, the min/mean/max locking operations
done in a 5-second testing window before the patchset were:
40 readers, Iterations Min/Mean/Max = 1,807/1,808/1,810
40 writers, Iterations Min/Mean/Max = 1,807/50,344/151,255
After the patchset, they became:
40 readers, Iterations Min/Mean/Max = 30,057/31,359/32,741
40 writers, Iterations Min/Mean/Max = 94,466/95,845/97,098"
There are a lot of changes to the locking implementation that make
it similar to qrwlock, including owner handoff for fairer
locking.
Another microbenchmark shows that the improvements hold across the
whole spectrum:
"With a locking microbenchmark running on 5.1 based kernel, the
total locking rates (in kops/s) on a 2-socket Skylake system
with equal numbers of readers and writers (mixed) before and
after this patchset were:
# of Threads  Before Patch  After Patch
------------  ------------  -----------
       2          2,618        4,193
       4          1,202        3,726
       8            802        3,622
      16            729        3,359
      32            319        2,826
      64            102        2,744"
The changes are extensive and the patch-set has been through
several iterations addressing various locking workloads. There
might be more regressions, but unless they are pathological I
believe we want to use this new implementation as the baseline
going forward.
- jump-label optimizations by Daniel Bristot de Oliveira: the primary
motivation was to remove IPI disturbance of isolated RT-workload
CPUs, which resulted in the implementation of batched jump-label
updates. Beyond the improvement of the kernel's real-time
characteristics, in one test this patchset improved static key update
overhead from 57 msecs to just 1.4 msecs - which is a nice speedup
as well.
- atomic64_t cross-arch type cleanups by Mark Rutland: over the last
~10 years of atomic64_t existence the various types used by the
APIs only had to be self-consistent within each architecture -
which means they became wildly inconsistent across architectures.
Mark puts an end to this by reworking all the atomic64
implementations to use 's64' as the base type for atomic64_t, and
to ensure that this type is consistently used for parameters and
return values in the API, avoiding further problems in this area.
- A large set of small improvements to lockdep by Yuyang Du: type
cleanups, output cleanups, function return type and other cleanups
all around the place.
- A set of percpu ops cleanups and fixes by Peter Zijlstra.
- Misc other changes - please see the Git log for more details"
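To put the rwsem microbenchmark figures quoted above into relative terms,
the before/after locking rates can be turned into speedup factors. The
numbers below are copied from the quoted table; the helper name is only
illustrative.

```python
# (threads, kops/s before, kops/s after), as quoted in the rwsem table
RWSEM_MIXED_RATES = [
    (2, 2618, 4193),
    (4, 1202, 3726),
    (8, 802, 3622),
    (16, 729, 3359),
    (32, 319, 2826),
    (64, 102, 2744),
]


def speedups(rows=RWSEM_MIXED_RATES):
    """Map thread count to the after/before ratio of locking rates."""
    return {threads: after / before for threads, before, after in rows}


if __name__ == "__main__":
    for threads, factor in speedups().items():
        print(f"{threads:>3} threads: {factor:.1f}x")
```

The ratio grows with the thread count, which matches the claim that the
gains are largest under heavy contention.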
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits)
locking/lockdep: increase size of counters for lockdep statistics
locking/atomics: Use sed(1) instead of non-standard head(1) option
locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING
x86/jump_label: Make tp_vec_nr static
x86/percpu: Optimize raw_cpu_xchg()
x86/percpu, sched/fair: Avoid local_clock()
x86/percpu, x86/irq: Relax {set,get}_irq_regs()
x86/percpu: Relax smp_processor_id()
x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}()
locking/rwsem: Guard against making count negative
locking/rwsem: Adaptive disabling of reader optimistic spinning
locking/rwsem: Enable time-based spinning on reader-owned rwsem
locking/rwsem: Make rwsem->owner an atomic_long_t
locking/rwsem: Enable readers spinning on writer
locking/rwsem: Clarify usage of owner's nonspinaable bit
locking/rwsem: Wake up almost all readers in wait queue
locking/rwsem: More optimal RT task handling of null owner
locking/rwsem: Always release wait_lock before waking up tasks
locking/rwsem: Implement lock handoff to prevent lock starvation
locking/rwsem: Make rwsem_spin_on_owner() return owner state
...
Pull RCU updates from Ingo Molnar:
"The changes in this cycle are:
- RCU flavor consolidation cleanups and optimizations
- Documentation updates
- Miscellaneous fixes
- SRCU updates
- RCU-sync flavor consolidation
- Torture-test updates
- Linux-kernel memory-consistency-model updates, most notably the
addition of plain C-language accesses"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits)
tools/memory-model: Improve data-race detection
tools/memory-model: Change definition of rcu-fence
tools/memory-model: Expand definition of barrier
tools/memory-model: Do not use "herd" to refer to "herd7"
tools/memory-model: Fix comment in MP+poonceonces.litmus
Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic()
rcu: Don't return a value from rcu_assign_pointer()
rcu: Force inlining of rcu_read_lock()
rcu: Fix irritating whitespace error in rcu_assign_pointer()
rcu: Upgrade sync_exp_work_done() to smp_mb()
rcutorture: Upper case solves the case of the vanishing NULL pointer
torture: Suppress propagating trace_printk() warning
rcutorture: Dump trace buffer for callback pipe drain failures
torture: Add --trust-make to suppress "make clean"
torture: Make --cpus override idleness calculations
torture: Run kernel build in source directory
torture: Add function graph-tracing cheat sheet
torture: Capture qemu output
rcutorture: Tweak kvm options
rcutorture: Add trivial RCU implementation
...
Pull x86 apic updates from Thomas Gleixner:
"Updates for the x86 APIC interrupt handling and APIC timer:
- Fix a long standing issue with spurious interrupts which was caused
by the big vector management rework a few years ago. Robert Hodaszi
finally provided enough debug data and an excellent initial failure
analysis, which made it possible to understand the underlying issues.
This contains a change to the core interrupt management code which
is required to handle this correctly for the APIC/IO_APIC. The core
changes are NOOPs for most architectures except ARM64. ARM64 is not
impacted by the change as confirmed by Marc Zyngier.
- Newer systems allow the PIT clock to be disabled for power saving,
which causes a panic in the timer interrupt delivery check of the
IO/APIC when the HPET timer is not enabled either. While the clock
could be turned on, this would cause an endless whack-a-mole game to
chase the proper register in each affected chipset.
These systems provide the relevant frequencies for TSC, CPU and the
local APIC timer via CPUID and/or MSRs, which makes it possible to avoid the
PIT/HPET based calibration. As the calibration code is the only
usage of the legacy timers on modern systems and is skipped anyway
when the frequencies are known already, there is no point in
setting up the PIT and actually checking for the interrupt delivery
via IO/APIC.
To achieve this on a wide variety of platforms, the CPUID/MSR based
frequency readout has been made more robust, which also made it
possible to remove quite a few workarounds which turned out to be no
longer required. Thanks to Daniel Drake for analysis, patches and
verification"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/irq: Seperate unused system vectors from spurious entry again
x86/irq: Handle spurious interrupt after shutdown gracefully
x86/ioapic: Implement irq_get_irqchip_state() callback
genirq: Add optional hardware synchronization for shutdown
genirq: Fix misleading synchronize_irq() documentation
genirq: Delay deactivation in free_irq()
x86/timer: Skip PIT initialization on modern chipsets
x86/apic: Use non-atomic operations when possible
x86/apic: Make apic_bsp_setup() static
x86/tsc: Set LAPIC timer period to crystal clock frequency
x86/apic: Rename 'lapic_timer_frequency' to 'lapic_timer_period'
x86/tsc: Use CPUID.0x16 to calculate missing crystal frequency
Pull timer updates from Thomas Gleixner:
"The timer and timekeeping department delivers:
Core:
- The consolidation of the VDSO code into a generic library including
the conversion of x86 and ARM64. Conversion of ARM and MIPS are en
route through the relevant maintainer trees and should end up in
5.4.
This gets rid of the unnecessary different copies of the same code
and brings all architectures on the same level of VDSO
functionality.
- Make the NTP user space interface more robust by restricting the
TAI offset to prevent undefined behaviour. Includes a selftest.
- Validate user input in the compat settimeofday() syscall to catch
invalid values which would be turned into valid values by a
multiplication overflow
- Consolidate the time accessors
- Small fixes, improvements and cleanups all over the place
Drivers:
- Support for the NXP system counter, TI davinci timer
- Move the Microsoft HyperV clocksource/events code into the
drivers/clocksource directory so it can be shared between x86 and
ARM64.
- Overhaul of the Tegra driver
- Delay timer support for IXP4xx
- Small fixes, improvements and cleanups as usual"
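The settimeofday() item above is easier to follow with a concrete number.
The Python sketch below models a microsecond-to-nanosecond conversion in
wrapping signed 64-bit arithmetic and shows an out-of-range tv_usec that
looks like a valid nanosecond count after the multiplication. The 64-bit
width and all function names are assumptions made for illustration; this
is not the kernel code.

```python
NSEC_PER_USEC = 1000
USEC_PER_SEC = 1_000_000
NSEC_PER_SEC = 1_000_000_000


def wrap_s64(value):
    """Reduce a Python int to a signed 64-bit value, as C would wrap."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


def usec_to_nsec_unchecked(tv_usec):
    # Multiply first, validate later: the wrap can hide a bad input.
    return wrap_s64(tv_usec * NSEC_PER_USEC)


def usec_to_nsec_checked(tv_usec):
    # Validate the user-supplied value before it is multiplied.
    if tv_usec < 0 or tv_usec >= USEC_PER_SEC:
        raise ValueError("tv_usec out of range")
    return tv_usec * NSEC_PER_USEC


# A negative value chosen so that the product wraps into [0, NSEC_PER_SEC).
BAD_USEC = -((1 << 64) // NSEC_PER_USEC)
```

Passing BAD_USEC to the unchecked helper yields a small non-negative
result that a later range check on nanoseconds would accept, whereas the
checked helper rejects it up front.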
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
time: Validate user input in compat_settimeofday()
timer: Document TIMER_PINNED
clocksource/drivers: Continue making Hyper-V clocksource ISA agnostic
clocksource/drivers: Make Hyper-V clocksource ISA agnostic
MAINTAINERS: Fix Andy's surname and the directory entries of VDSO
hrtimer: Use a bullet for the returns bullet list
arm64: vdso: Fix compilation with clang older than 8
arm64: compat: Fix __arch_get_hw_counter() implementation
arm64: Fix __arch_get_hw_counter() implementation
lib/vdso: Make delta calculation work correctly
MAINTAINERS: Add entry for the generic VDSO library
arm64: compat: No need for pre-ARMv7 barriers on an ARMv8 system
arm64: vdso: Remove unnecessary asm-offsets.c definitions
vdso: Remove superfluous #ifdef __KERNEL__ in vdso/datapage.h
clocksource/drivers/davinci: Add support for clocksource
clocksource/drivers/davinci: Add support for clockevents
clocksource/drivers/tegra: Set up maximum-ticks limit properly
clocksource/drivers/tegra: Cycles can't be 0
clocksource/drivers/tegra: Restore base address before cleanup
clocksource/drivers/tegra: Add verbose definition for 1MHz constant
...
Pull irq updates from Thomas Gleixner:
"The irq department provides the usual mixed bag:
Core:
- Further improvements to the irq timings code which aims to predict
the next interrupt for power state selection to achieve better
latency/power balance
- Add interrupt statistics to the core NMI handlers
- The usual small fixes and cleanups
Drivers:
- Support for Renesas RZ/A1, Annapurna Labs FIC, Meson-G12A SoC and
Amazon Graviton AMR/GIC interrupt controllers.
- Rework of the Renesas INTC controller driver
- ACPI support for Socionext SoCs
- Enhancements to the CSKY interrupt controller
- The usual small fixes and cleanups"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
irq/irqdomain: Fix comment typo
genirq: Update irq stats from NMI handlers
irqchip/gic-pm: Remove PM_CLK dependency
irqchip/al-fic: Introduce Amazon's Annapurna Labs Fabric Interrupt Controller Driver
dt-bindings: interrupt-controller: Add Amazon's Annapurna Labs FIC
softirq: Use __this_cpu_write() in takeover_tasklets()
irqchip/mbigen: Stop printing kernel addresses
irqchip/gic: Add dependency for ARM_GIC_MAX_NR
genirq/affinity: Remove unused argument from [__]irq_build_affinity_masks()
genirq/timings: Add selftest for next event computation
genirq/timings: Add selftest for irqs circular buffer
genirq/timings: Add selftest for circular array
genirq/timings: Encapsulate storing function
genirq/timings: Encapsulate timings push
genirq/timings: Optimize the period detection speed
genirq/timings: Fix timings buffer inspection
genirq/timings: Fix next event index function
irqchip/qcom: Use struct_size() in devm_kzalloc()
irqchip/irq-csky-mpintc: Remove unnecessary loop in interrupt handler
dt-bindings: interrupt-controller: Update csky mpintc
...
Pull SMP/hotplug updates from Thomas Gleixner:
"A small set of updates for SMP and CPU hotplug:
- Abort disabling secondary CPUs in the freezer when a wakeup is
pending instead of evaluating it only after all CPUs have been
offlined.
- Remove the shared annotation for the strict per CPU cfd_data in the
smp function call core code.
- Remove the return values of smp_call_function() and on_each_cpu()
as they are unconditionally 0. Fixup the few callers which actually
bothered to check the return value"
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smp: Remove smp_call_function() and on_each_cpu() return values
smp: Do not mark call_function_data as shared
cpu/hotplug: Abort disabling secondary CPUs if wakeup is pending
cpu/hotplug: Fix notify_cpu_starting() reference in bringup_wait_for_ap()
Merge tag 's390-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Improve stop_machine wait logic: replace cpu_relax_yield call in
generic stop_machine function with a weak stop_machine_yield
function. This is overridden on s390, which yields the current cpu to
the neighbouring cpu after a couple of retries, instead of blindly
giving up the cpu to the hypervisor. This significantly improves
stop_machine performance on s390 in overcommitted scenarios.
This includes common code changes which have been Acked by Peter
Zijlstra and Thomas Gleixner.
- Improve jump label transformation speed: transform jump labels
without using stop_machine.
- Refactoring of the vfio-ccw cp handling, simplifying the code and
avoiding unneeded allocating/copying.
- Various vfio-ccw fixes (ccw translation, state machine).
- Add support for vfio-ap queue interrupt control in the guest. This
includes s390 kvm changes which have been Acked by Christian
Borntraeger.
- Add protected virtualization support for virtio-ccw.
- Enforce both CONFIG_SMP and CONFIG_HOTPLUG_CPU, which allows the
removal of some code which most likely isn't working at all; besides,
s390 didn't even compile for !CONFIG_SMP.
- Support for special flagged EP11 CPRBs for zcrypt.
- Handle PCI devices with no support for new MIO instructions.
- Avoid KASAN false positives in reworked stack unwinder.
- Couple of fixes for the QDIO layer.
- Convert s390 specific documentation to ReST format.
- Let s390 crypto modules return -ENODEV instead of -EOPNOTSUPP if
hardware is missing. This way our modules behave like most other
modules, which is also what systemd's systemd-modules-load.service
expects.
- Replace defconfig with performance_defconfig, so there is one config
file less to maintain.
- Remove the SCLP call home device driver, which was never useful.
- Cleanups all over the place.
* tag 's390-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (83 commits)
docs: s390: s390dbf: typos and formatting, update crash command
docs: s390: unify and update s390dbf kdocs at debug.c
docs: s390: restore important non-kdoc parts of s390dbf.rst
vfio-ccw: Fix the conversion of Format-0 CCWs to Format-1
s390/pci: correctly handle MIO opt-out
s390/pci: deal with devices that have no support for MIO instructions
s390: ap: kvm: Enable PQAP/AQIC facility for the guest
s390: ap: implement PAPQ AQIC interception in kernel
vfio: ap: register IOMMU VFIO notifier
s390: ap: kvm: add PQAP interception for AQIC
s390/unwind: cleanup unused READ_ONCE_TASK_STACK
s390/kasan: avoid false positives during stack unwind
s390/qdio: don't touch the dsci in tiqdio_add_input_queues()
s390/qdio: (re-)initialize tiqdio list entries
s390/dasd: Fix a precision vs width bug in dasd_feature_list()
s390/cio: introduce driver_override on the css bus
vfio-ccw: make convert_ccw0_to_ccw1 static
vfio-ccw: Remove copy_ccw_from_iova()
vfio-ccw: Factor out the ccw0-to-ccw1 transition
vfio-ccw: Copy CCW data outside length calculation
...
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- arm64 support for syscall emulation via PTRACE_SYSEMU{,_SINGLESTEP}
- Wire up VM_FLUSH_RESET_PERMS for arm64, allowing the core code to
manage the permissions of executable vmalloc regions more strictly
- Slight performance improvement by keeping softirqs enabled while
touching the FPSIMD/SVE state (kernel_neon_begin/end)
- Expose a couple of ARMv8.5 features to user (HWCAP): CondM (new
XAFLAG and AXFLAG instructions for floating point comparison flags
manipulation) and FRINT (rounding floating point numbers to integers)
- Re-instate ARM64_PSEUDO_NMI support which was previously marked as
BROKEN due to some bugs (now fixed)
- Improve parking of stopped CPUs and implement an arm64-specific
panic_smp_self_stop() to avoid warning on not being able to stop
secondary CPUs during panic
- perf: enable the ARM Statistical Profiling Extensions (SPE) on ACPI
platforms
- perf: DDR performance monitor support for iMX8QXP
- cache_line_size() can now be set from DT or ACPI/PPTT if provided to
cope with a system cache info not exposed via the CPUID registers
- Avoid warning on hardware cache line size greater than
ARCH_DMA_MINALIGN if the system is fully coherent
- arm64 do_page_fault() and hugetlb cleanups
- Refactor set_pte_at() to avoid redundant READ_ONCE(*ptep)
- Ignore ACPI 5.1 FADTs reported as 5.0 (infer from the
'arm_boot_flags' introduced in 5.1)
- CONFIG_RANDOMIZE_BASE now enabled in defconfig
- Allow the selection of ARM64_MODULE_PLTS, currently only done via
RANDOMIZE_BASE (and an erratum workaround), allowing modules to spill
over into the vmalloc area
- Make ZONE_DMA32 configurable
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (54 commits)
perf: arm_spe: Enable ACPI/Platform automatic module loading
arm_pmu: acpi: spe: Add initial MADT/SPE probing
ACPI/PPTT: Add function to return ACPI 6.3 Identical tokens
ACPI/PPTT: Modify node flag detection to find last IDENTICAL
x86/entry: Simplify _TIF_SYSCALL_EMU handling
arm64: rename dump_instr as dump_kernel_instr
arm64/mm: Drop [PTE|PMD]_TYPE_FAULT
arm64: Implement panic_smp_self_stop()
arm64: Improve parking of stopped CPUs
arm64: Expose FRINT capabilities to userspace
arm64: Expose ARMv8.5 CondM capability to userspace
arm64: defconfig: enable CONFIG_RANDOMIZE_BASE
arm64: ARM64_MODULES_PLTS must depend on MODULES
arm64: bpf: do not allocate executable memory
arm64/kprobes: set VM_FLUSH_RESET_PERMS on kprobe instruction pages
arm64/mm: wire up CONFIG_ARCH_HAS_SET_DIRECT_MAP
arm64: module: create module allocations without exec permissions
arm64: Allow user selection of ARM64_MODULE_PLTS
acpi/arm64: ignore 5.1 FADTs that are reported as 5.0
arm64: Allow selecting Pseudo-NMI again
...
The user value is validated after converting the timeval to a timespec, but
for a wide range of negative tv_usec values the multiplication overflow turns
them into positive numbers. So the 'validated later' check does not catch the
invalid input.
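
The ordering problem described above can be illustrated with a minimal
user-space sketch. This is not the kernel's code: the helper name, the
DEMO_* constants and the -1 error convention are assumptions made for
illustration. The point is only that the range check runs before the
multiplication, so an out-of-range value never gets the chance to wrap.

```c
#include <stdint.h>

#define DEMO_USEC_PER_SEC  1000000LL
#define DEMO_NSEC_PER_USEC 1000LL

/*
 * Hypothetical helper, not the kernel's code: convert a user-supplied
 * microsecond field to nanoseconds. The range check happens BEFORE the
 * multiplication, so an out-of-range value can never wrap around into
 * something that looks valid. Returns -1 for invalid input.
 */
int64_t demo_usec_to_nsec_checked(int64_t tv_usec)
{
	if (tv_usec < 0 || tv_usec >= DEMO_USEC_PER_SEC)
		return -1;

	/* At most 999999 * 1000 here, which cannot overflow. */
	return tv_usec * DEMO_NSEC_PER_USEC;
}
```

A caller would reject the request as soon as the helper returns -1 and
only build the timespec from values that passed the check.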
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1562460701-113301-1-git-send-email-zhengbin13@huawei.com
The NMI handlers handle_percpu_devid_fasteoi_nmi() and handle_fasteoi_nmi()
do not update the interrupt counts. Due to that the NMI interrupt count
does not show up correctly in /proc/interrupts.
Add the statistics and treat the NMI handlers in the same way as per cpu
interrupts and prevent them from updating irq_desc::tot_count as this might
be corrupted due to concurrency.
[ tglx: Massaged changelog ]
Fixes: 2dcf1fbcad35 ("genirq: Provide NMI handlers")
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1562313336-11888-1-git-send-email-sthotton@marvell.com
Fix two issues:
When called for PTRACE_TRACEME, ptrace_link() would obtain an RCU
reference to the parent's objective credentials, then give that pointer
to get_cred(). However, the object lifetime rules for things like
struct cred do not permit unconditionally turning an RCU reference into
a stable reference.
PTRACE_TRACEME records the parent's credentials as if the parent was
acting as the subject, but that's not the case. If a malicious
unprivileged child uses PTRACE_TRACEME and the parent is privileged, and
at a later point, the parent process becomes attacker-controlled
(because it drops privileges and calls execve()), the attacker ends up
with control over two processes with a privileged ptrace relationship,
which can be abused to ptrace a suid binary and obtain root privileges.
Fix both of these by always recording the credentials of the process
that is requesting the creation of the ptrace relationship:
current_cred() can't change under us, and current is the proper subject
for access control.
This change is theoretically userspace-visible, but I am not aware of
any code that it will actually break.
Fixes: 64b875f7ac8a ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'trace-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"This includes three fixes:
- Fix a deadlock from a previous fix to keep module loading and
function tracing text modifications from stepping on each other
(this has a few patches to help document the issue in comments)
- Fix a crash when the snapshot buffer gets out of sync with the main
ring buffer
- Fix a memory leak when reading the memory logs"
* tag 'trace-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace/x86: Anotate text_mutex split between ftrace_arch_code_modify_post_process() and ftrace_arch_code_modify_prepare()
tracing/snapshot: Resize spare buffer if size changed
tracing: Fix memory leak in tracing_err_log_open()
ftrace/x86: Add a comment to why we take text_mutex in ftrace_arch_code_modify_prepare()
ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code()
free_irq() ensures that no hardware interrupt handler is executing on a
different CPU before actually releasing resources and deactivating the
interrupt completely in a domain hierarchy.
But that does not catch the case where the interrupt is in flight at the
hardware level but not yet serviced by the target CPU. That creates an
interesting race condition:
CPU 0                   CPU 1                  IRQ CHIP

                                               interrupt is raised
                                               sent to CPU1
                        Unable to handle
                        immediately
                        (interrupts off,
                        deep idle delay)
mask()
...
free()
  shutdown()
  synchronize_irq()
  release_resources()
                        do_IRQ()
                        -> resources are not available
That might be harmless and just trigger a spurious interrupt warning, but
some interrupt chips might get into a wedged state.
Utilize the existing irq_get_irqchip_state() callback for the
synchronization in free_irq().
synchronize_hardirq() is not using this mechanism as it might actually
deadlock under certain conditions, e.g. when called with interrupts
disabled and the target CPU is the one on which the synchronization is
invoked. synchronize_irq() uses it because that function cannot be called
from non-preemptible contexts as it might sleep.
No functional change intended and according to Marc the existing GIC
implementations where the driver supports the callback should be able
to cope with that core change. Famous last words.
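
As a rough user-space illustration of the synchronization idea, the sketch
below polls a simulated interrupt chip until it no longer reports the
interrupt as active, and only then lets the caller release resources.
struct demo_irq_chip and both functions are hypothetical stand-ins; the
real code queries the chip through irq_get_irqchip_state() and is not
reproduced here.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for an interrupt chip: the interrupt stays
 * "in flight" at the hardware level for a fixed number of polls.
 */
struct demo_irq_chip {
	int active_polls_left;
};

/* Simulated state query: true while the interrupt is still in flight. */
bool demo_chip_irq_active(struct demo_irq_chip *chip)
{
	if (chip->active_polls_left > 0) {
		chip->active_polls_left--;
		return true;
	}
	return false;
}

/*
 * Wait until the chip reports the interrupt as no longer active.
 * Returns how many polls observed it in flight. Resources may only
 * be released after this returns.
 */
int demo_synchronize_irq(struct demo_irq_chip *chip)
{
	int spins = 0;

	while (demo_chip_irq_active(chip))
		spins++;
	return spins;
}
```

In this model, releasing resources before demo_synchronize_irq() returns
corresponds to the race shown in the diagram above.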
Fixes: 464d12309e1b ("x86/vector: Switch IOAPIC to global reservation mode")
Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Link: https://lkml.kernel.org/r/20190628111440.279463375@linutronix.de
When interrupts are shut down, they are immediately deactivated in the
irqdomain hierarchy. While this looks obviously correct there is a subtle
issue:
There might be an interrupt in flight when free_irq() is invoking the
shutdown. This is properly handled at the irq descriptor / primary handler
level, but the deactivation might completely disable resources which are
required to acknowledge the interrupt.
Split the shutdown code and deactivate the interrupt after synchronization
in free_irq(). Fixup all other usage sites where this is not an issue to
invoke the combined shutdown_and_deactivate() function instead.
This still might be an issue if the interrupt in flight servicing is
delayed on a remote CPU beyond the invocation of synchronize_irq(), but
that cannot be handled at that level and needs to be handled in the
synchronize_irq() context.
Fixes: f8264e34965a ("irqdomain: Introduce new interfaces to support hierarchy irqdomains")
Reported-by: Robert Hodaszi <Robert.Hodaszi@digi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Link: https://lkml.kernel.org/r/20190628111440.098196390@linutronix.de
During soft reboot (kexec_file_load), boot command line
arguments are not measured.
Call the IMA hook ima_kexec_cmdline to measure the boot command line
arguments into the IMA measurement list.
- call ima_kexec_cmdline from kexec_file_load.
- move the call to ima_add_kexec_buffer after the cmdline
args have been measured.
Signed-off-by: Prakhar Srivastava <prsriva02@gmail.com>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Acked-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Pull SMP fixes from Thomas Gleixner:
"Two small changes for the cpu hotplug code:
- Prevent out of bounds access which actually might crash the machine
caused by a missing bounds check in the fail injection code
- Warn about unsupported mitigation mode command line arguments to
make people aware that they typoed the parameter. Not necessarily a
fix but quite some people tripped over that"
* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Fix out-of-bounds read when setting fail state
cpu/speculation: Warn on unsupported mitigations= parameter
Pull perf fixes from Ingo Molnar:
"Various fixes, most of them related to bugs perf fuzzing found in the
x86 code"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/regs: Use PERF_REG_EXTENDED_MASK
perf/x86: Remove pmu->pebs_no_xmm_regs
perf/x86: Clean up PEBS_XMM_REGS
perf/x86/regs: Check reserved bits
perf/x86: Disable extended registers for non-supported PMUs
perf/ioctl: Add check for the sample_period value
perf/core: Fix perf_sample_regs_user() mm check
Merge tag 'pm-5.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Avoid skipping bus-level PCI power management during system resume for
PCIe ports left in D0 during the preceding suspend transition on
platforms where the power states of those ports can change out of the
PCI layer's control"
* tag 'pm-5.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PCI: PM: Avoid skipping bus-level PM on platforms without ACPI
Commit 5eed6f1dff87 ("fork,memcg: fix crash in free_thread_stack on
memcg charge fail") corrected two instances, but there was a third
instance of this bug.
Without setting tsk->stack, if memcg_charge_kernel_stack fails, it'll
execute free_thread_stack() on a dangling pointer.
Enterprise kernels are compiled with VMAP_STACK=y so this isn't
critical, but custom VMAP_STACK=n builds should have some performance
advantage, with the drawback that fork risks failing because compaction
didn't succeed. So as long as VMAP_STACK=n is a supported option it's
worth fixing it upstream.
Link: http://lkml.kernel.org/r/20190619011450.28048-1-aarcange@redhat.com
Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the minimal fix for stable, I'll send cleanups later.
Commit 854a6ed56839 ("signal: Add restore_user_sigmask()") introduced
a visible change which breaks user-space: a signal temporarily unblocked
by set_user_sigmask() can be delivered even if the caller returns
success or timeout.
Change restore_user_sigmask() to accept the additional "interrupted"
argument which should be used instead of signal_pending() check, and
update the callers.
Eric said:
: For clarity. I don't think this is required by posix, or fundamentally to
: remove the races in select. It is what linux has always done and we have
: applications who care so I agree this fix is needed.
:
: Further in any case where the semantic change that this patch rolls back
: (aka where allowing a signal to be delivered and the select like call to
: complete) would be advantage we can do as well if not better by using
: signalfd.
:
: Michael is there any chance we can get this guarantee of the linux
: implementation of pselect and friends clearly documented. The guarantee
: that if the system call completes successfully we are guaranteed that no
: signal that is unblocked by using sigmask will be delivered?
Link: http://lkml.kernel.org/r/20190604134117.GA29963@redhat.com
Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add restore_user_sigmask()")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Eric Wong <e@80x24.org>
Tested-by: Eric Wong <e@80x24.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: <stable@vger.kernel.org> [5.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The commit 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text
permissions race") causes a possible deadlock between register_kprobe()
and ftrace_run_update_code() when ftrace is using stop_machine().
The existing dependency chain (in reverse order) is:
-> #1 (text_mutex){+.+.}:
validate_chain.isra.21+0xb32/0xd70
__lock_acquire+0x4b8/0x928
lock_acquire+0x102/0x230
__mutex_lock+0x88/0x908
mutex_lock_nested+0x32/0x40
register_kprobe+0x254/0x658
init_kprobes+0x11a/0x168
do_one_initcall+0x70/0x318
kernel_init_freeable+0x456/0x508
kernel_init+0x22/0x150
ret_from_fork+0x30/0x34
kernel_thread_starter+0x0/0xc
-> #0 (cpu_hotplug_lock.rw_sem){++++}:
check_prev_add+0x90c/0xde0
validate_chain.isra.21+0xb32/0xd70
__lock_acquire+0x4b8/0x928
lock_acquire+0x102/0x230
cpus_read_lock+0x62/0xd0
stop_machine+0x2e/0x60
arch_ftrace_update_code+0x2e/0x40
ftrace_run_update_code+0x40/0xa0
ftrace_startup+0xb2/0x168
register_ftrace_function+0x64/0x88
klp_patch_object+0x1a2/0x290
klp_enable_patch+0x554/0x980
do_one_initcall+0x70/0x318
do_init_module+0x6e/0x250
load_module+0x1782/0x1990
__s390x_sys_finit_module+0xaa/0xf0
system_call+0xd8/0x2d0
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(text_mutex);
                               lock(cpu_hotplug_lock.rw_sem);
                               lock(text_mutex);
  lock(cpu_hotplug_lock.rw_sem);
It is a similar problem to the one solved by commit 2d1e38f56622b9b
("kprobes: Cure hotplug lock ordering issues"). Many locks are involved.
To be on the safe side, text_mutex must become a low level lock taken
after cpu_hotplug_lock.rw_sem.
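
The ordering rule stated above can be expressed as a tiny rank check. This
is only a sketch: the enum, its values and the function name are
hypothetical, and lockdep itself tracks dependencies in a far more general
way. The sketch assumes each lock gets a rank and a nested acquisition is
valid only when the inner lock has a higher rank than the outer one.

```c
#include <stdbool.h>

/*
 * Hypothetical lock ranks: a lock with a higher rank is "lower level"
 * and may only be taken while holding locks of lower rank.
 */
enum demo_lock_rank {
	DEMO_CPU_HOTPLUG_LOCK = 0,	/* outer lock */
	DEMO_TEXT_MUTEX       = 1,	/* low level lock, taken last */
};

/* True if taking 'inner' while holding 'outer' respects the ordering. */
bool demo_lock_order_ok(enum demo_lock_rank outer, enum demo_lock_rank inner)
{
	return inner > outer;
}
```

With these ranks, the CPU1 sequence from the report passes the check and
the CPU0 sequence (text_mutex first, then cpu_hotplug_lock.rw_sem) fails
it.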
This can't be achieved easily with the current ftrace design.
For example, arm calls set_all_modules_text_rw() already in
ftrace_arch_code_modify_prepare(), see arch/arm/kernel/ftrace.c.
This function is called:
+ outside stop_machine() from ftrace_run_update_code()
+ without stop_machine() from ftrace_module_enable()
Fortunately, the problematic fix is needed only on x86_64. It is
the only architecture that calls set_all_modules_text_rw()
in ftrace path and supports livepatching at the same time.
Therefore it is enough to move text_mutex handling from the generic
kernel/trace/ftrace.c into arch/x86/kernel/ftrace.c:
ftrace_arch_code_modify_prepare()
ftrace_arch_code_modify_post_process()
This patch basically reverts the ftrace part of the problematic
commit 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module
text permissions race") and provides an x86_64-specific fix.
Some refactoring of the ftrace code will be needed when livepatching
is implemented for arm or nds32. These architectures call
set_all_modules_text_rw() and use stop_machine() at the same time.
Link: http://lkml.kernel.org/r/20190627081334.12793-1-pmladek@suse.com
Fixes: 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text permissions race")
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
[
As reviewed by Miroslav Benes <mbenes@suse.cz>, removed return value of
ftrace_run_update_code() as it is a void function.
]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
That gets rid of this warning:
./kernel/time/hrtimer.c:1119: WARNING: Block quote ends without a blank line; unexpected unindent.
and displays nicely both in the source code and in the generated
documentation.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Doc Mailing List <linux-doc@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Link: https://lkml.kernel.org/r/74ddad7dac331b4e5ce4a90e15c8a49e3a16d2ac.1561372382.git.mchehab+samsung@kernel.org
All callers use GFP_KERNEL. No point in having that argument.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
None of those functions have any users outside of workqueue.c. Confine
them.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
anon_inode_getfd() should be used *ONLY* in situations when we are
guaranteed to be past the last failure point (including copying the
descriptor number to userland, at that). And ksys_close() should
not be used for cleanups at all.
anon_inode_getfile() is there for all nontrivial cases like that.
Just use that...
Fixes: b3e583825266 ("clone: add CLONE_PIDFD")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Jann Horn <jannh@google.com>
Signed-off-by: Christian Brauner <christian@brauner.io>
Writing an invalid value to /sys/devices/system/cpu/cpuX/hotplug/fail
lets user space control the `struct cpuhp_step *sp` address, which results
in the following global-out-of-bounds read.
Reproducer:
# echo -2 > /sys/devices/system/cpu/cpu0/hotplug/fail
KASAN report:
BUG: KASAN: global-out-of-bounds in write_cpuhp_fail+0x2cd/0x2e0
Read of size 8 at addr ffffffff89734438 by task bash/1941
CPU: 0 PID: 1941 Comm: bash Not tainted 5.2.0-rc6+ #31
Call Trace:
write_cpuhp_fail+0x2cd/0x2e0
dev_attr_store+0x58/0x80
sysfs_kf_write+0x13d/0x1a0
kernfs_fop_write+0x2bc/0x460
vfs_write+0x1e1/0x560
ksys_write+0x126/0x250
do_syscall_64+0xc1/0x390
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f05e4f4c970
The buggy address belongs to the variable:
cpu_hotplug_lock+0x98/0xa0
Memory state around the buggy address:
ffffffff89734300: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
ffffffff89734380: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
>ffffffff89734400: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa
^
ffffffff89734480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffffff89734500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Add a sanity check for the value written from user space.
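
A sanity check of this kind could look like the sketch below. It is a
user-space illustration under stated assumptions: the three-entry state
enum, the table and the function name are hypothetical and much smaller
than the kernel's real hotplug state list. What matters is that the value
is range-checked before it is ever used as a table index.

```c
#include <errno.h>

/* Hypothetical, shortened state list; the real enum is much larger. */
enum demo_cpuhp_state {
	DEMO_CPUHP_OFFLINE = 0,
	DEMO_CPUHP_BRINGUP = 1,
	DEMO_CPUHP_ONLINE  = 2,
};

/* One flag per state: nonzero if fail injection is armed for it. */
int demo_fail_armed[DEMO_CPUHP_ONLINE + 1];

/*
 * Store handler sketch: reject anything outside the valid state range
 * before touching the table, so values like -2 never become an index.
 * Returns 0 on success or -EINVAL for out-of-range input.
 */
int demo_write_fail(long value)
{
	if (value < DEMO_CPUHP_OFFLINE || value > DEMO_CPUHP_ONLINE)
		return -EINVAL;

	demo_fail_armed[value] = 1;
	return 0;
}
```

With this check in place, the reproducer value of -2 is refused instead of
being turned into an address outside the table.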
Fixes: 1db49484f21ed ("smp/hotplug: Hotplug state fail injection")
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Link: https://lkml.kernel.org/r/20190627024732.31672-1-devel@etsukata.com
There are platforms that do not call pm_set_suspend_via_firmware(),
so pm_suspend_via_firmware() returns 'false' on them, but the power
states of PCI devices (PCIe ports in particular) are changed as a
result of powering down core platform components during system-wide
suspend. Thus the pm_suspend_via_firmware() checks in
pci_pm_suspend_noirq() and pci_pm_resume_noirq() introduced by
commit 3e26c5feed2a ("PCI: PM: Skip devices in D0 for
suspend-to-idle") are not sufficient to determine that devices left in D0
during suspend will remain in D0 during resume and so the bus-level
power management can be skipped for them.
For this reason, introduce a new global suspend flag,
PM_SUSPEND_FLAG_NO_PLATFORM, set it for suspend-to-idle only
and replace the pm_suspend_via_firmware() checks mentioned above
with checks against this flag.
Fixes: 3e26c5feed2a ("PCI: PM: Skip devices in D0 for suspend-to-idle")
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Move the user and user-session keyrings to the user_namespace struct rather
than pinning them from the user_struct struct. This prevents these
keyrings from propagating across user-namespaces boundaries with regard to
the KEY_SPEC_* flags, thereby making them more useful in a containerised
environment.
The issue is that a single user_struct may represent UIDs in several
different namespaces.
The way the patch does this is by attaching a 'register keyring' in each
user_namespace and then sticking the user and user-session keyrings into
that. It can then be searched to retrieve them.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jann Horn <jannh@google.com>
Keyring names are held in a single global list that any process can pick
from by means of keyctl_join_session_keyring (provided the keyring grants
Search permission). This isn't very container friendly, however.
Make the following changes:
(1) Make default session, process and thread keyring names begin with a
'.' instead of '_'.
(2) Keyrings whose names begin with a '.' aren't added to the list. Such
keyrings are system specials.
(3) Replace the global list with per-user_namespace lists. A keyring adds
its name to the list for the user_namespace that it is currently in.
(4) When a user_namespace is deleted, it just removes itself from the
keyring name list.
The global keyring_name_lock is retained for accessing the name lists.
This allows (4) to work.
This can be tested by:
# keyctl newring foo @s
995906392
# unshare -U
$ keyctl show
...
995906392 --alswrv 65534 65534 \_ keyring: foo
...
$ keyctl session foo
Joined session keyring: 935622349
As can be seen, a new session keyring was created.
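
The effect of per-namespace name lists can be modelled in a few lines of
user-space C. Everything here is a hypothetical simplification: the
struct, the fixed-size array and both function names are made up for
illustration, and locking is left out entirely. The sketch only shows
rules (2) and (3): dot-prefixed names are never listed, and a lookup sees
only the names registered in its own namespace.

```c
#include <string.h>

#define DEMO_MAX_NAMES 8

/* Hypothetical model of a user namespace holding its own name list. */
struct demo_user_ns {
	const char *names[DEMO_MAX_NAMES];
	int count;
};

/*
 * Add a keyring name to the namespace's list. Names starting with '.'
 * are treated as system specials and are not listed.
 * Returns 1 if the name was listed, 0 otherwise.
 */
int demo_register_keyring_name(struct demo_user_ns *ns, const char *name)
{
	if (name[0] == '.' || ns->count >= DEMO_MAX_NAMES)
		return 0;
	ns->names[ns->count++] = name;
	return 1;
}

/* Look a name up in one namespace only. Returns 1 if found. */
int demo_find_keyring_name(const struct demo_user_ns *ns, const char *name)
{
	for (int i = 0; i < ns->count; i++)
		if (strcmp(ns->names[i], name) == 0)
			return 1;
	return 0;
}
```

In this model, a keyring named "foo" registered in one namespace is not
found from another, which matches the session above where joining "foo"
after unshare -U produced a new keyring.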
The capability bit KEYCTL_CAPS1_NS_KEYRING_NAME is set if the kernel is
employing this feature.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric W. Biederman <ebiederm@xmission.com>
Currently, if the user specifies an unsupported mitigation strategy on the
kernel command line, it will be ignored silently. The code will fall back
to the default strategy, possibly leaving the system more vulnerable than
expected.
This may happen due to e.g. a simple typo, or, for a stable kernel release,
because not all mitigation strategies have been backported.
Inform the user by printing a message.
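
A sketch of the behaviour, written as ordinary user-space C rather than
kernel code: the enum, the function name, the two accepted strings and the
message wording are assumptions for illustration. An unrecognized value
still falls back to the default, but the fallback is no longer silent.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of mitigation modes. */
enum demo_mitigations {
	DEMO_MITIGATIONS_OFF,
	DEMO_MITIGATIONS_AUTO,	/* default */
};

/*
 * Parse the option value. Unknown values keep the default mode, but a
 * message is printed and *recognized is cleared so the fallback is
 * visible to the user.
 */
enum demo_mitigations demo_parse_mitigations(const char *arg, int *recognized)
{
	*recognized = 1;
	if (strcmp(arg, "off") == 0)
		return DEMO_MITIGATIONS_OFF;
	if (strcmp(arg, "auto") == 0)
		return DEMO_MITIGATIONS_AUTO;

	*recognized = 0;
	fprintf(stderr, "Unsupported mitigations=%s, using default\n", arg);
	return DEMO_MITIGATIONS_AUTO;
}
```

A typo such as "auot" then produces a visible message while the default
mode stays in effect.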
Fixes: 98af8452945c5565 ("cpu/speculation: Add 'mitigations=' cmdline option")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190516070935.22546-1-geert@linux-m68k.org
When a system has been running for a long time, signed integer
counters are not enough for some lockdep statistics. Using
unsigned long counters can satisfy the requirement. Besides,
most lockdep statistics are unsigned, so it is better to use
unsigned int instead of int.
Remove unused variables.
- max_recursion_depth
- nr_cyclic_check_recursions
- nr_find_usage_forwards_recursions
- nr_find_usage_backwards_recursions
Signed-off-by: Kobe Wu <kobe-cp.wu@mediatek.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <linux-mediatek@lists.infradead.org>
Cc: <wsd_upstream@mediatek.com>
Cc: Eason Lin <eason-yh.lin@mediatek.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/r/1561365348-16050-1-git-send-email-kobe-cp.wu@mediatek.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The last cleanup patch triggered another issue, as now another function
should be moved into the same section:
kernel/locking/lockdep.c:3580:12: error: 'mark_lock' defined but not used [-Werror=unused-function]
static int mark_lock(struct task_struct *curr, struct held_lock *this,
Move mark_lock() into the same #ifdef section as its only caller, and
remove the now-unused mark_lock_irq() stub helper.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yuyang Du <duyuyang@gmail.com>
Fixes: 0d2cc3b34532 ("locking/lockdep: Move valid_state() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING")
Link: https://lkml.kernel.org/r/20190617124718.1232976-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The Energy Aware Scheduler (EAS) estimates the energy impact of waking
up a task on a given CPU. This estimation is based on:
a) an (active) power consumption defined for each CPU frequency
b) an estimation of which frequency will be used on each CPU
c) an estimation of the busy time (utilization) of each CPU
Utilization clamping can affect both b) and c).
A CPU is expected to run:
- at a higher than required frequency, but for a shorter time, in case
its estimated utilization is smaller than the minimum utilization
enforced by uclamp
- at a lower than required frequency, but for a longer time, in case
its estimated utilization is bigger than the maximum utilization
enforced by uclamp
While compute_energy() already accounts for clamping effects on busy time,
the clamping effects on frequency selection are currently ignored.
Fix it by considering how CPU clamp values will be affected by a
task waking up and being RUNNABLE on that CPU.
Do that by refactoring schedutil_freq_util() to take an additional
task_struct* which allows EAS to evaluate the impact on clamp values of
a task eventually being queued on a CPU. Clamp values are applied to the
RT+CFS utilization only when a FREQUENCY_UTIL is required by
compute_energy().
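The effect of clamping on the energy estimate can be sketched with a toy
model. This is not the kernel's compute_energy(): struct opp, the table
layout, the power figures supplied by the caller and the formula
power * busy_util / capacity are all assumptions made for illustration. The
clamps influence only which operating point is selected, while the busy
utilization is taken as given.

```c
#include <stddef.h>

/* Hypothetical operating performance point (OPP) of a CPU. */
struct opp {
    unsigned int capacity;  /* highest utilization this OPP can serve */
    unsigned int power_mw;  /* assumed active power at this OPP */
};

/* Clamp util into [min_util, max_util]; min_util wins on inversion. */
static unsigned int clamp_util(unsigned int util, unsigned int min_util,
                               unsigned int max_util)
{
    if (min_util >= max_util)
        return min_util;
    if (util < min_util)
        return min_util;
    if (util > max_util)
        return max_util;
    return util;
}

/* Pick the lowest OPP covering util; the table is sorted by capacity. */
static const struct opp *pick_opp(const struct opp *table, size_t n,
                                  unsigned int util)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].capacity >= util)
            return &table[i];
    return &table[n - 1];
}

/*
 * Toy estimate: the clamped utilization selects the frequency, and the
 * busy utilization scaled by that OPP's capacity gives the busy time.
 */
unsigned int estimate_energy(const struct opp *table, size_t n,
                             unsigned int busy_util,
                             unsigned int min_util, unsigned int max_util)
{
    unsigned int freq_util = clamp_util(busy_util, min_util, max_util);
    const struct opp *o = pick_opp(table, n, freq_util);

    return o->power_mw * busy_util / o->capacity;
}
```

In this model a minimum clamp above the busy utilization selects a higher
OPP that is busy for a shorter time, and a maximum clamp below it selects a
lower OPP that is busy for longer, matching the two cases listed earlier in
this message.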
Do note that switching from ENERGY_UTIL to FREQUENCY_UTIL in the
computation of the cpu_util signal implies that we are more likely to
estimate the highest OPP when an RT task is running on another CPU of
the same performance domain. This can have an impact on energy
estimation but:
- it's not easy to say which approach is better, since it depends on
the use case
- the original approach could still be obtained by setting a smaller
task-specific util_min whenever required
While at it:
- rename schedutil_freq_util() to schedutil_cpu_util(),
since it's not only used for frequency selection.
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190621084217.8167-12-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
So far uclamp_util() allows clamping a specified utilization considering
the clamp values requested by RUNNABLE tasks on a CPU. For the Energy
Aware Scheduler (EAS) it is interesting to test how clamp values will
change when a task becomes RUNNABLE on a given CPU.
For example, EAS is interested in comparing the energy impact of
different scheduling decisions and the clamp values can play a role in
that.
Add uclamp_util_with() which allows clamping a given utilization by
considering the possible impact on CPU clamp values of a specified task.
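A minimal userspace sketch of the idea follows. It is not the kernel
function: struct clamp_pair and clamp_util_with() are hypothetical names, and
the sketch assumes that the CPU's clamps and the task's clamps are combined
by taking the larger value of each.

```c
#include <stddef.h>  /* NULL, for callers passing "no task" */

/* Hypothetical pair of clamp values, for either a CPU or a task. */
struct clamp_pair {
    unsigned int min_util;
    unsigned int max_util;
};

static unsigned int max_u(unsigned int a, unsigned int b)
{
    return a > b ? a : b;
}

/*
 * Clamp util with the CPU's current clamps, optionally combined with the
 * clamps of a task that may become RUNNABLE there (task may be NULL).
 */
unsigned int clamp_util_with(const struct clamp_pair *cpu, unsigned int util,
                             const struct clamp_pair *task)
{
    unsigned int min_util = cpu->min_util;
    unsigned int max_util = cpu->max_util;

    if (task) {
        /* Assumption: each clamp is max-aggregated with the task's. */
        min_util = max_u(min_util, task->min_util);
        max_util = max_u(max_util, task->max_util);
    }

    /* On an inverted range, the minimum clamp takes precedence. */
    if (min_util >= max_util)
        return min_util;
    if (util < min_util)
        return min_util;
    if (util > max_util)
        return max_util;
    return util;
}
```

Passing NULL for the task gives the plain per-CPU clamping, while passing a
candidate task lets a caller such as EAS ask what the clamped utilization
would be if that task were placed on the CPU.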
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190621084217.8167-11-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Each time a frequency update is required via schedutil, a frequency is
selected to (possibly) satisfy the utilization reported by each
scheduling class and by IRQs. However, when utilization clamping is in use,
the frequency selection should consider userspace utilization clamping
hints. This makes it possible, for example, to:
- boost tasks which are directly affecting the user experience
by running them at least at a minimum "requested" frequency
- cap low priority tasks not directly affecting the user experience
by running them only up to a maximum "allowed" frequency
These constraints are meant to support per-task tuning of the
frequency selection, thus supporting a fine-grained definition of
performance boosting vs energy saving strategies in kernel space.
Add support to clamp the utilization of RUNNABLE FAIR and RT tasks
within the boundaries defined by their aggregated utilization clamp
constraints.
Do that by considering the max(min_util, max_util) to give boosted tasks
the performance they need even when they happen to be co-scheduled with
other capped tasks.
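The aggregation described above can be sketched as follows. This is an
illustrative userspace model, not the scheduler code: struct task_clamps,
aggregate_clamps() and clamp_cpu_util() are hypothetical names, and the
"no clamping" default used for an empty set of tasks is an assumption of the
sketch.

```c
#include <stddef.h>

#define UTIL_SCALE 1024u  /* assumed full-scale utilization value */

/* Hypothetical per-task clamp request. */
struct task_clamps {
    unsigned int util_min;
    unsigned int util_max;
};

/* Max-aggregate the clamps of all RUNNABLE tasks on one CPU. */
struct task_clamps aggregate_clamps(const struct task_clamps *tasks, size_t n)
{
    struct task_clamps agg = { 0, UTIL_SCALE };

    if (n == 0)
        return agg;  /* assumption: no tasks means no clamping */

    agg.util_max = 0;
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].util_min > agg.util_min)
            agg.util_min = tasks[i].util_min;
        if (tasks[i].util_max > agg.util_max)
            agg.util_max = tasks[i].util_max;
    }
    return agg;
}

/* Apply the aggregated clamps to the utilization used for frequency. */
unsigned int clamp_cpu_util(unsigned int util, struct task_clamps agg)
{
    if (agg.util_min >= agg.util_max)
        return agg.util_min;
    if (util < agg.util_min)
        return agg.util_min;
    if (util > agg.util_max)
        return agg.util_max;
    return util;
}
```

With one boosted task and one capped task on the same CPU, the aggregated
minimum comes from the boosted task and the aggregated maximum is the larger
of the two caps, so the boosted task is not held back by its capped
neighbour.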
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190621084217.8167-10-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
By default FAIR tasks start without clamps, i.e. neither boosted nor
capped, and they run at the best frequency matching their utilization
demand. This default behavior does not fit RT tasks, which are instead
expected to run at the maximum available frequency unless they are
explicitly capped.
Enforce the correct behavior for RT tasks by setting util_min to max
whenever:
1. the task is switched to the RT class and it does not already have a
user-defined clamp value assigned.
2. an RT task is forked from a parent with RESET_ON_FORK set.
NOTE: utilization clamp values are cross scheduling class attributes and
thus they are never changed/reset once a value has been explicitly
defined from user-space.
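The two cases and the note above can be sketched in a small userspace model.
It is not the scheduler code: struct task_sketch, switch_to_rt() and
fork_with_reset() are hypothetical names, and resetting a non-RT child to a
util_min of 0 on fork is an assumption of the sketch.

```c
#include <stdbool.h>

#define UTIL_SCALE 1024u  /* assumed full-scale utilization value */

/* Hypothetical, simplified view of a task's minimum clamp state. */
struct task_sketch {
    bool is_rt;             /* task is in the RT class */
    bool user_defined;      /* util_min was set explicitly by user-space */
    unsigned int util_min;
};

/* Case 1: the task is switched to the RT class. */
void switch_to_rt(struct task_sketch *t)
{
    t->is_rt = true;
    /* A user-defined value is never overridden by the default. */
    if (!t->user_defined)
        t->util_min = UTIL_SCALE;
}

/* Case 2: a task is forked from a parent with RESET_ON_FORK set. */
struct task_sketch fork_with_reset(const struct task_sketch *parent)
{
    struct task_sketch child = *parent;

    /* Assumption: the reset drops any user-defined value in the child. */
    child.user_defined = false;
    child.util_min = child.is_rt ? UTIL_SCALE : 0;
    return child;
}
```

A task that already carries a user-defined util_min keeps it across a class
switch, which mirrors the note that explicitly defined clamp values are not
changed or reset.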
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190621084217.8167-9-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>