IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Pull irq fix from Thomas Gleixner:
"Fix for an off by one error in a cpumask result comparison"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Fix cpumask check in __irq_startup_managed()
Pull networking fixes from David Miller:
1) Fix hotplug deadlock in hv_netvsc, from Stephen Hemminger.
2) Fix double-free in rmnet driver, from Dan Carpenter.
3) INET connection socket layer can double put request sockets, fix
from Eric Dumazet.
4) Don't match collect metadata-mode tunnels if the device is down,
from Haishuang Yan.
5) Do not perform TSO6/GSO on ipv6 packets with extensions headers in
be2net driver, from Suresh Reddy.
6) Fix scaling error in gen_estimator, from Eric Dumazet.
7) Fix 64-bit statistics deadlock in systemport driver, from Florian
Fainelli.
8) Fix use-after-free in sctp_sock_dump, from Xin Long.
9) Reject invalid BPF_END instructions in verifier, from Edward Cree.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
mlxsw: spectrum_router: Only handle IPv4 and IPv6 events
Documentation: link in networking docs
tcp: fix data delivery rate
bpf/verifier: reject BPF_ALU64|BPF_END
sctp: do not mark sk dumped when inet_sctp_diag_fill returns err
sctp: fix an use-after-free issue in sctp_sock_dump
netvsc: increase default receive buffer size
tcp: update skb->skb_mstamp more carefully
net: ipv4: fix l3slave check for index returned in IP_PKTINFO
net: smsc911x: Quieten netif during suspend
net: systemport: Fix 64-bit stats deadlock
net: vrf: avoid gcc-4.6 warning
qed: remove unnecessary call to memset
tg3: clean up redundant initialization of tnapi
tls: make tls_sw_free_resources static
sctp: potential read out of bounds in sctp_ulpevent_type_enabled()
MAINTAINERS: review Renesas DT bindings as well
net_sched: gen_estimator: fix scaling error in bytes/packets samples
nfp: wait for the NSP resource to appear on boot
nfp: wait for board state before talking to the NSP
...
The result of cpumask_any_and() is invalid when result greater or equal
nr_cpu_ids. The current check is checking for greater only. Fix it.
Fixes: 761ea388e8c4 ("genirq: Handle managed irqs gracefully in irq_startup()")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Chen Yu <yu.c.chen@intel.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: stable@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/20170913213152.272283444@linutronix.de
Neither ___bpf_prog_run nor the JITs accept it.
Also adds a new test case.
Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull more set_fs removal from Al Viro:
"Christoph's 'use kernel_read and friends rather than open-coding
set_fs()' series"
* 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: unexport vfs_readv and vfs_writev
fs: unexport vfs_read and vfs_write
fs: unexport __vfs_read/__vfs_write
lustre: switch to kernel_write
gadget/f_mass_storage: stop messing with the address limit
mconsole: switch to kernel_read
btrfs: switch write_buf to kernel_write
net/9p: switch p9_fd_read to kernel_write
mm/nommu: switch do_mmap_private to kernel_read
serial2002: switch serial2002_tty_write to kernel_{read/write}
fs: make the buf argument to __kernel_write a void pointer
fs: fix kernel_write prototype
fs: fix kernel_read prototype
fs: move kernel_read to fs/read_write.c
fs: move kernel_write to fs/read_write.c
autofs4: switch autofs4_write to __kernel_write
ashmem: switch to ->read_iter
Pull ipc compat cleanup and 64-bit time_t from Al Viro:
"IPC copyin/copyout sanitizing, including 64bit time_t work from Deepa
Dinamani"
* 'work.ipc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
utimes: Make utimes y2038 safe
ipc: shm: Make shmid_kernel timestamps y2038 safe
ipc: sem: Make sem_array timestamps y2038 safe
ipc: msg: Make msg_queue timestamps y2038 safe
ipc: mqueue: Replace timespec with timespec64
ipc: Make sys_semtimedop() y2038 safe
get rid of SYSVIPC_COMPAT on ia64
semtimedop(): move compat to native
shmat(2): move compat to native
msgrcv(2), msgsnd(2): move compat to native
ipc(2): move compat to native
ipc: make use of compat ipc_perm helpers
semctl(): move compat to native
semctl(): separate all layout-dependent copyin/copyout
msgctl(): move compat to native
msgctl(): split the actual work from copyin/copyout
ipc: move compat shmctl to native
shmctl: split the work from copyin/copyout
Merge misc fixes from Andrew Morton:
"A few leftovers"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, page_owner: skip unnecessary stack_trace entries
arm64: stacktrace: avoid listing stacktrace functions in stacktrace
mm: treewide: remove GFP_TEMPORARY allocation flag
IB/mlx4: fix sprintf format warning
fscache: fix fscache_objlist_show format processing
lib/test_bitmap.c: use ULL suffix for 64-bit constants
procfs: remove unused variable
drivers/media/cec/cec-adap.c: fix build with gcc-4.4.4
idr: remove WARN_ON_ONCE() when trying to replace negative ID
Now that we have added breaks in the wait queue scan and allow bookmark
on scan position, we put this logic in the wake_up_page_bit function.
We can have very long page wait list in large system where multiple
pages share the same wait list. We break the wake up walk here to allow
other cpus a chance to access the list, and not to disable the interrupts
when traversing the list for too long. This reduces the interrupt and
rescheduling latency, and excessive page wait queue lock hold time.
[ v2: Remove bookmark_wake_function ]
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We encountered workloads that have very long wake up list on large
systems. A waker takes a long time to traverse the entire wake list and
execute all the wake functions.
We saw page wait list that are up to 3700+ entries long in tests of
large 4 and 8 socket systems. It took 0.8 sec to traverse such list
during wake up. Any other CPU that contends for the list spin lock will
spin for a long time. It is a result of the numa balancing migration of
hot pages that are shared by many threads.
Multiple CPUs waking are queued up behind the lock, and the last one
queued has to wait until all CPUs did all the wakeups.
The page wait list is traversed with interrupt disabled, which caused
various problems. This was the original cause that triggered the NMI
watch dog timer in: https://patchwork.kernel.org/patch/9800303/ . Only
extending the NMI watch dog timer there helped.
This patch bookmarks the waker's scan position in wake list and break
the wake up walk, to allow access to the list before the waker resume
its walk down the rest of the wait list. It lowers the interrupt and
rescheduling latency.
This patch also provides a performance boost when combined with the next
patch to break up page wakeup list walk. We saw 22% improvement in the
will-it-scale file pread2 test on a Xeon Phi system running 256 threads.
[ v2: Merged in Linus' changes to remove the bookmark_wake_function, and
simply access to flags. ]
Reported-by: Kan Liang <kan.liang@intel.com>
Tested-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All watchdog thread related functions are delegated to the smpboot thread
infrastructure, which handles serialization against CPU hotplug correctly.
The sysctl interface is completely decoupled from anything which requires
CPU hotplug protection.
No need to protect the sysctl writes against cpu hotplug anymore. Remove it
and add the now required protection to the powerpc arch_nmi_watchdog
implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170912194148.418497420@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that all functionality is properly serialized against CPU hotplug,
remove the extra per cpu storage which holds the disabled events for
cleanup. The core makes sure that cleanup happens before new events are
created.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194148.340708074@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Get rid of the hodgepodge which tries to be smart about perf being
unavailable and error printout rate limiting.
That's all not required simply because this is never invoked when the perf
NMI watchdog is not functional.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194148.259651788@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
watchdog_nmi_enable() is an unparseable mess, Provide a clean perf specific
implementation, which will be used when the existing setup/teardown mess is
replaced.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194148.180215498@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use the init time detection of the perf NMI watchdog to determine whether
the perf NMI watchdog is functional. If not disable it permanentely. It
won't come back magically at runtime.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194148.099799541@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The watchdog tries to create perf events even after it figured out that
perf is not functional or the requested event is not supported.
That's braindead as this can be done once at init time and if not supported
the NMI watchdog can be turned off unconditonally.
Implement the perf hardlockup detector functionality for that. This creates
a new event create function, which will replace the unholy mess of the
existing one in later patches.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194148.019090547@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Letting user space poke directly at variables which are used at run time is
stupid and causes a lot of race conditions and other issues.
Seperate the user variables and on change invoke the reconfiguration, which
then stops the watchdogs, reevaluates the new user value and restarts the
watchdogs with the new parameters.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.939985640@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Both the perf reconfiguration and the powerpc watchdog_nmi_reconfigure()
need to be done in two steps.
1) Stop all NMIs
2) Read the new parameters and start NMIs
Right now watchdog_nmi_reconfigure() is a combination of both. To allow a
clean reconfiguration add a 'run' argument and split the functionality in
powerpc.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20170912194147.862865570@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reflect that these variables are user interface related and remove the
whitespace damage in the sysctl table while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.783210221@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The sysctl of the nmi_watchdog file prevents writes by setting:
min = max = 0
if none of the users is enabled. That involves ifdeffery and is competely
non obvious.
If none of the facilities is enabeld, then the file can simply be made read
only. Move the ifdeffery into the header and use a constant for file
permissions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.706073616@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use a single function to update sysctl changes. This is not a high
frequency user space interface and it's root only.
Preparatory patch to cleanup the sysctl variable handling.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.549114957@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The lockup detector reconfiguration tears down all watchdog threads when
the watchdog is disabled and sets them up again when its enabled.
That's a pointless exercise. The watchdog threads are not consuming an
insane amount of resources, so it's enough to set them up at init time and
keep them in parked position when the watchdog is disabled and unpark them
when it is reenabled. The smpboot thread infrastructure takes care of
keeping the force parked threads in place even across cpu hotplug.
Aside of that the code implements the park/unpark facility of smp hotplug
threads on its own, which is even more pointless. We have functionality in
the smpboot thread code to do so.
Use the new thread management functions and get rid of the unholy mess.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.470370113@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The lockup detector reconfiguration tears down all watchdog threads when
the watchdog is disabled and sets them up again when its enabled.
That's a pointless exercise. The watchdog threads are not consuming an
insane amount of resources, so it's enough to set them up at init time and
keep them in parked position when the watchdog is disabled and unpark them
when it is reenabled. The smpboot thread infrastructure takes care of
keeping the force parked threads in place even across cpu hotplug.
Another horrible mechanism are the open coded park/unpark loops which are
used for reconfiguration of the watchdog. The smpboot infrastructure allows
exactly the same via smpboot_update_cpumask_thread_percpu(), which is cpu
hotplug safe. Using that instead of the open coded loops allows to get rid
of the hotplug locking mess in the watchdog code.
Implement a clean infrastructure which allows to replace the open coded
nonsense.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.377182587@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
smpboot_update_cpumask_threads_percpu() allocates a temporary cpumask at
runtime. This is suboptimal because the call site needs more code size for
proper error handling than a statically allocated temporary mask requires
data size.
Add static temporary cpumask. The function is globaly serialized, so no
further protection required.
Remove the half baken error handling in the watchdog code and get rid of
the export as there are no in tree modular users of that function.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.297288838@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Split the write part of the cpumask proc handler out into a separate helper
to avoid deep indentation. This also reduces the patch complexity in the
following cleanups.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.218075991@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The #ifdef maze in this file is horrible, group stuff at least a bit so one
can figure out what belongs to what.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.139629546@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Having stub functions which take a full page is not helping the
readablility of code.
Condense them and move the doubled #ifdef variant into the SYSFS section.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194147.045545271@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit:
b94f51183b06 ("kernel/watchdog: prevent false hardlockup on overloaded system")
tries to fix the following issue:
proc_write()
set_sample_period() <--- New sample period becoms visible
<----- Broken starts
proc_watchdog_update()
watchdog_enable_all_cpus() watchdog_hrtimer_fn()
update_watchdog_all_cpus() restart_timer(sample_period)
watchdog_park_threads()
thread->park()
disable_nmi()
<----- Broken ends
The reason why this is broken is that the update of the watchdog threshold
becomes immediately effective and visible for the hrtimer function which
uses that value to rearm the timer. But the NMI/perf side still uses the
old value up to the point where it is disabled. If the rate has been
lowered then the NMI can run fast enough to 'detect' a hard lockup because
the timer has not fired due to the longer period.
The patch 'fixed' this by adding a variable:
proc_write()
set_sample_period()
<----- Broken starts
proc_watchdog_update()
watchdog_enable_all_cpus() watchdog_hrtimer_fn()
update_watchdog_all_cpus() restart_timer(sample_period)
watchdog_park_threads()
park_in_progress = 1
<----- Broken ends
nmi_watchdog()
if (park_in_progress)
return;
The only effect of this variable was to make the window where the breakage
can hit small enough that it was not longer observable in testing. From a
correctness point of view it is a pointless bandaid which merily papers
over the root cause: the unsychronized update of the variable.
Looking deeper into the related code pathes unearthed similar problems in
the watchdog_start()/stop() functions.
watchdog_start()
perf_nmi_event_start()
hrtimer_start()
watchdog_stop()
hrtimer_cancel()
perf_nmi_event_stop()
In both cases the call order is wrong because if the tasks gets preempted
or the VM gets scheduled out long enough after the first call, then there is
a chance that the next NMI will see a stale hrtimer interrupt count and
trigger a false positive hard lockup splat.
Get rid of park_in_progress so the code can be gradually deobfuscated and
pruned from several layers of duct tape papering over the root cause,
which has been either ignored or not understood at all.
Once this is removed the underlying problem will be fixed by rewriting the
proc interface to do a proper synchronized update.
Address the start/stop() ordering problem as well by reverting the call
order, so this part is at least correct now.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1709052038270.2393@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The following deadlock is possible in the watchdog hotplug code:
cpus_write_lock()
...
takedown_cpu()
smpboot_park_threads()
smpboot_park_thread()
kthread_park()
->park() := watchdog_disable()
watchdog_nmi_disable()
perf_event_release_kernel();
put_event()
_free_event()
->destroy() := hw_perf_event_destroy()
x86_release_hardware()
release_ds_buffers()
get_online_cpus()
when a per cpu watchdog perf event is destroyed which drops the last
reference to the PMU hardware. The cleanup code there invokes
get_online_cpus() which instantly deadlocks because the hotplug percpu
rwsem is write locked.
To solve this add a deferring mechanism:
cpus_write_lock()
kthread_park()
watchdog_nmi_disable(deferred)
perf_event_disable(event);
move_event_to_deferred(event);
....
cpus_write_unlock()
cleaup_deferred_events()
perf_event_release_kernel()
This is still properly serialized against concurrent hotplug via the
cpu_add_remove_lock, which is held by the task which initiated the hotplug
event.
This is also used to handle event destruction when the watchdog threads are
parked via other mechanisms than CPU hotplug.
Analyzed-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194146.884469246@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The self disabling feature is broken vs. CPU hotplug locking:
CPU 0 CPU 1
cpus_write_lock();
cpu_up(1)
wait_for_completion()
....
unpark_watchdog()
->unpark()
perf_event_create() <- fails
watchdog_enable &= ~NMI_WATCHDOG;
....
cpus_write_unlock();
CPU 2
cpus_write_lock()
cpu_down(2)
wait_for_completion()
wakeup(watchdog);
watchdog()
if (!(watchdog_enable & NMI_WATCHDOG))
watchdog_nmi_disable()
perf_event_disable()
....
cpus_read_lock();
stop_smpboot_threads()
park_watchdog();
wait_for_completion(watchdog->parked);
Result: End of hotplug and instantaneous full lockup of the machine.
There is a similar problem with disabling the watchdog via the user space
interface as the sysctl function fiddles with watchdog_enable directly.
It's very debatable whether this is required at all. If the watchdog works
nicely on N CPUs and it fails to enable on the N + 1 CPU either during
hotplug or because the user space interface disabled it via sysctl cpumask
and then some perf user grabbed the counter which is then unavailable for
the watchdog when the sysctl cpumask gets changed back.
There is no real justification for this.
One of the reasons WHY this is done is the utter stupidity of the init code
of the perf NMI watchdog. Instead of checking upfront at boot whether PERF
is available and functional at all, it just does this check at run time
over and over when user space fiddles with the sysctl. That's broken beyond
repair along with the idiotic error code dependent warn level printks and
the even more silly printk rate limiting.
If the init code checks whether perf works at boot time, then this mess can
be more or less avoided completely. Perf does not come magically into life
at runtime. Brain usage while coding is overrated.
Remove the cruft and add a temporary safe guard which gets removed later.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194146.806708429@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The function is only used by the KVM init code. Mark it __init to prevent
creative abuse.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194146.727134632@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Following patches will use the mutex for other purposes as well. Rename it
as it is not longer a proc specific thing.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194146.647714850@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The watchdog proc interface causes extensive recursive locking of the CPU
hotplug percpu rwsem, which is deadlock prone.
Replace the get/put_online_cpus() pairs with cpu_hotplug_disable()/enable()
calls for now. Later patches will remove that requirement completely.
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194146.568079057@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This interface has several issues:
- It's causing recursive locking of the hotplug lock.
- It's complete overkill to teardown all threads and then recreate them
The same can be achieved with the simple hardlockup_detector_perf_stop /
restart() interfaces. The abuse from the busy looping poweroff() loop of
PARISC has been solved as well.
Remove the cruft.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194146.487537732@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
PARISC has a a busy looping power off routine. If the watchdog is enabled
the watchdog timer will still fire, but the thread is not running, which
causes the softlockup watchdog to trigger.
Provide a interface which allows to turn the watchdog off.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: linux-parisc@vger.kernel.org
Link: http://lkml.kernel.org/r/20170912194146.327343752@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Provide an interface to stop and restart perf NMI watchdog events on all
CPUs. This is only usable during init and especially for handling the perf
HT bug on Intel machines. It's safe to use it this way as nothing can
start/stop the NMI watchdog in parallel.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/20170912194146.167649596@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
GFP_TEMPORARY was introduced by commit e12ba74d8ff3 ("Group short-lived
and reclaimable kernel allocations") along with __GFP_RECLAIMABLE. It's
primary motivation was to allow users to tell that an allocation is
short lived and so the allocator can try to place such allocations close
together and prevent long term fragmentation. As much as this sounds
like a reasonable semantic it becomes much less clear when to use the
highlevel GFP_TEMPORARY allocation flag. How long is temporary? Can the
context holding that memory sleep? Can it take locks? It seems there is
no good answer for those questions.
The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
__GFP_RECLAIMABLE which in itself is tricky because basically none of
the existing caller provide a way to reclaim the allocated memory. So
this is rather misleading and hard to evaluate for any benefits.
I have checked some random users and none of them has added the flag
with a specific justification. I suspect most of them just copied from
other existing users and others just thought it might be a good idea to
use without any measuring. This suggests that GFP_TEMPORARY just
motivates for cargo cult usage without any reasoning.
I believe that our gfp flags are quite complex already and especially
those with highlevel semantic should be clearly defined to prevent from
confusion and abuse. Therefore I propose dropping GFP_TEMPORARY and
replace all existing users to simply use GFP_KERNEL. Please note that
SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
so they will be placed properly for memory fragmentation prevention.
I can see reasons we might want some gfp flag to reflect shorterm
allocations but I propose starting from a clear semantic definition and
only then add users with proper justification.
This was been brought up before LSF this year by Matthew [1] and it
turned out that GFP_TEMPORARY really doesn't have a clear semantic. It
seems to be a heuristic without any measured advantage for most (if not
all) its current users. The follow up discussion has revealed that
opinions on what might be temporary allocation differ a lot between
developers. So rather than trying to tweak existing users into a
semantic which they haven't expected I propose to simply remove the flag
and start from scratch if we really need a semantic for short term
allocations.
[1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org
[akpm@linux-foundation.org: fix typo]
[akpm@linux-foundation.org: coding-style fixes]
[sfr@canb.auug.org.au: drm/i915: fix up]
Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull scheduler fixes from Ingo Molnar:
"Three CPU hotplug related fixes and a debugging improvement"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/debug: Add debugfs knob for "sched_debug"
sched/core: WARN() when migrating to an offline CPU
sched/fair: Plug hole between hotplug and active_load_balance()
sched/fair: Avoid newidle balance for !active CPUs
Summary of modules changes for the 4.14 merge window:
- Minor code cleanups and fixes
- modpost: avoid building modules that have names that exceed the size
of the name field in struct module
Signed-off-by: Jessica Yu <jeyu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJZuOmrAAoJEMBFfjjOO8FySvAP/2SLHR+HLU53jbUdQTZF4cYp
2PitELmICHSOmBC2frBsZiy1Dnzh2LDHM4gEenWYkk2nUfpPbURYi+43xbUKugmR
I1pwr5aanibogCfu2C/xi57RonxkS0l/BsFOorFPNNqH8H24rsZaUfNMUtuOsh3D
K1KjM/N5BTncVF2wdXogPl1mlngtzM1Nvu02EbmltJYlTmwv+BlHc6xu4677sW6u
zeZ1gBt/oeKIgenYphL/NmbdI6veV8LVUd5EzcK7QQCbp2Pf/gAKQakQauDHNmRp
WQtNhTksvbKS1qmTX8Qf4UE1i9Sfzg1kokg3AMIsIFJMFCN+WkGz38yTzoNDRUgi
afv9Z0XPgBfoGvwZ2RCPtZqZXC/OHEUbhfnXTFPnjIQAHTrNWNGzwj89RXKTCTLz
dCgA4zUZ9DgGyve2iqDvgWSn+Tb2RevPhajzepEcpz+UNUdXQRJHdcVEfLXWN/1u
dqYXiLWSIcCfqIRl4RDwYeTSbeY9GrLkLzHsL7YSGVL//jubEoKjsSEr2cLsngtr
953jbA+El2DwnPJDoeEAOIN0XBg4arA9Roj4eIBeqG7y/BGpIn0HI+fZui5zxAoR
1fWhmqG5Uvoz/hzWwWOQTu3cQP2fgyJ5Jzg784oLjF8LwCrZYWc+yDdO+J5WywFJ
iA7DBkZoajKFQjf9SZkU
=oTCM
-----END PGP SIGNATURE-----
Merge tag 'modules-for-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull modules updates from Jessica Yu:
"Summary of modules changes for the 4.14 merge window:
- minor code cleanups and fixes
- modpost: avoid building modules that have names that exceed the
size of the name field in struct module"
* tag 'modules-for-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: Remove const attribute from alias for MODULE_DEVICE_TABLE
module: fix ddebug_remove_module()
modpost: abort if module name is too long
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEEcQCq365ubpQNLgrWVeRaWujKfIoFAlmoS/wUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQVeRaWujKfIqubxAAhkcmOgf+bh881VOWjkrl0MpO6n30
LAyLHNpa95xYw7AuxSrx+XP21hZHVWOSPEZdDjC+BOTToqv025XyYUAh+vvhm1pc
HgT7oNOyfEnGdXG8VtluC2zhSunw/gDz7uoUh7+dHpVqa+NayRqaopNY+4tgtVjT
6/DMwfvonTD5GWaNxraFZLaOENXAjbdVBcqoHhnY9cp4w5uGQ3rt6dFpLpW/gW7n
/fUzsjnLTztrsRx3nyEkwJuo/pxugbmZU5sjVgCFd7P729CfBVKqoToIh0CqJfj6
s4RIb//XmRxxiTF1EO7N1suPaqnESjT+Ua3moIuEixs4QjiEu25TNZy8K0b2zLsL
sTt40F5KAbKYXH/WyZxEtPf0HOUwL68oFZ+c4VYcCK6LwJmBLnfhan4BSZgH0/EO
rBIlb5O1znyfuGmLnjUfn+BlPuP35PhRpZVWP2eLZtOC4lY+yaVqzauFIEY/wY96
dYM6YwtJYuZ3C8sQxjT6UWuOYyj/02EgPbvlS7nv4zp1pZNnZ0dx8sfEu6FNeakY
QZAaI4oDvkpj7x4a0biNinacCYIUacRDF63jcKQnaNp3F3Nf1Vh4DKQWbFLfMidN
luWsEsVrPfLynUMZLq3KVUg825bTQw1MapqzlADmOyX6Dq/87/a+nY9IXWOH9TSm
fJjuSsMAtnui1/k=
=/6oy
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20170831' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"A relatively quiet period for SELinux, 11 patches with only two/three
having any substantive changes.
These noteworthy changes include another tweak to the NNP/nosuid
handling, per-file labeling for cgroups, and an object class fix for
AF_UNIX/SOCK_RAW sockets; the rest of the changes are minor tweaks or
administrative updates (Stephen's email update explains the file
explosion in the diffstat).
Everything passes the selinux-testsuite"
[ Also a couple of small patches from the security tree from Tetsuo
Handa for Tomoyo and LSM cleanup. The separation of security policy
updates wasn't all that clean - Linus ]
* tag 'selinux-pr-20170831' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: constify nf_hook_ops
selinux: allow per-file labeling for cgroupfs
lsm_audit: update my email address
selinux: update my email address
MAINTAINERS: update the NetLabel and Labeled Networking information
selinux: use GFP_NOWAIT in the AVC kmem_caches
selinux: Generalize support for NNP/nosuid SELinux domain transitions
selinux: genheaders should fail if too many permissions are defined
selinux: update the selinux info in MAINTAINERS
credits: update Paul Moore's info
selinux: Assign proper class to PF_UNIX/SOCK_RAW sockets
tomoyo: Update URLs in Documentation/admin-guide/LSM/tomoyo.rst
LSM: Remove security_task_create() hook.
Pull irq fixes from Ingo Molnar:
"A sparse irq race/locking fix, and a MSI irq domains population fix"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Make sparse_irq_lock protect what it should protect
genirq/msi: Fix populating multiple interrupts
I'm forever late for editing my kernel cmdline, add a runtime knob to
disable the "sched_debug" thing.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170907150614.142924283@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Migrating tasks to offline CPUs is a pretty big fail, warn about it.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170907150614.094206976@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The load balancer applies cpu_active_mask to whatever sched_domains it
finds, however in the case of active_balance there is a hole between
setting rq->{active_balance,push_cpu} and running the stop_machine
work doing the actual migration.
The @push_cpu can go offline in this window, which would result in us
moving a task onto a dead cpu, which is a fairly bad thing.
Double check the active mask before the stop work does the migration.
CPU0 CPU1
<SoftIRQ>
stop_machine(takedown_cpu)
load_balance() cpu_stopper_thread()
... work = multi_cpu_stop
stop_one_cpu_nowait( /* wait for CPU0 */
.func = active_load_balance_cpu_stop
);
</SoftIRQ>
cpu_stopper_thread()
work = multi_cpu_stop
/* sync with CPU1 */
take_cpu_down()
<idle>
play_dead();
work = active_load_balance_cpu_stop
set_task_cpu(p, CPU1); /* oops!! */
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170907150614.044460912@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On CPU hot unplug, when parking the last kthread we'll try and
schedule into idle to kill the CPU. This last schedule can (and does)
trigger newidle balance because at this point the sched domains are
still up because of commit:
77d1dfda0e79 ("sched/topology, cpuset: Avoid spurious/wrong domain rebuilds")
Obviously pulling tasks to an already offline CPU is a bad idea, and
all balancing operations _should_ be subject to cpu_active_mask, make
it so.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 77d1dfda0e79 ("sched/topology, cpuset: Avoid spurious/wrong domain rebuilds")
Link: http://lkml.kernel.org/r/20170907150613.994135806@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull namespace updates from Eric Biederman:
"Life has been busy and I have not gotten half as much done this round
as I would have liked. I delayed it so that a minor conflict
resolution with the mips tree could spend a little time in linux-next
before I sent this pull request.
This includes two long delayed user namespace changes from Kirill
Tkhai. It also includes a very useful change from Serge Hallyn that
allows the security capability attribute to be used inside of user
namespaces. The practical effect of this is people can now untar
tarballs and install rpms in user namespaces. It had been suggested to
generalize this and encode some of the namespace information
information in the xattr name. Upon close inspection that makes the
things that should be hard easy and the things that should be easy
more expensive.
Then there is my bugfix/cleanup for signal injection that removes the
magic encoding of the siginfo union member from the kernel internal
si_code. The mips folks reported the case where I had used FPE_FIXME
me is impossible so I have remove FPE_FIXME from mips, while at the
same time including a return statement in that case to keep gcc from
complaining about unitialized variables.
I almost finished the work to get make copy_siginfo_to_user a trivial
copy to user. The code is available at:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3
But I did not have time/energy to get the code posted and reviewed
before the merge window opened.
I was able to see that the security excuse for just copying fields
that we know are initialized doesn't work in practice there are buggy
initializations that don't initialize the proper fields in siginfo. So
we still sometimes copy unitialized data to userspace"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
Introduce v3 namespaced file capabilities
mips/signal: In force_fcr31_sig return in the impossible case
signal: Remove kernel interal si_code magic
fcntl: Don't use ambiguous SIG_POLL si_codes
prctl: Allow local CAP_SYS_ADMIN changing exe_file
security: Use user_namespace::level to avoid redundant iterations in cap_capable()
userns,pidns: Verify the userns for new pid namespaces
signal/testing: Don't look for __SI_FAULT in userspace
signal/mips: Document a conflict with SI_USER with SIGFPE
signal/sparc: Document a conflict with SI_USER with SIGFPE
signal/ia64: Document a conflict with SI_USER with SIGFPE
signal/alpha: Document a conflict with SI_USER for SIGTRAP
clang does not support variable length array for structure member.
It has the following error during compilation:
kernel/trace/trace_syscalls.c:568:17: error: fields must have a constant size:
'variable length array in structure' extension will never be supported
unsigned long args[sys_data->nb_args];
^
The fix is to use a fixed array length instead.
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Work around kernel-doc warning ('*' in Sphinx doc means "emphasis"):
../kernel/sched/fair.c:7584: WARNING: Inline emphasis start-string without end-string.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f18b30f9-6251-6d86-9d44-16501e386891@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull networking fixes from David Miller:
"The iwlwifi firmware compat fix is in here as well as some other
stuff:
1) Fix request socket leak introduced by BPF deadlock fix, from Eric
Dumazet.
2) Fix VLAN handling with TXQs in mac80211, from Johannes Berg.
3) Missing __qdisc_drop conversions in prio and qfq schedulers, from
Gao Feng.
4) Use after free in netlink nlk groups handling, from Xin Long.
5) Handle MTU update properly in ipv6 gre tunnels, from Xin Long.
6) Fix leak of ipv6 fib tables on netns teardown, from Sabrina Dubroca
with follow-on fix from Eric Dumazet.
7) Need RCU and preemption disabled during generic XDP data patch,
from John Fastabend"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits)
bpf: make error reporting in bpf_warn_invalid_xdp_action more clear
Revert "mdio_bus: Remove unneeded gpiod NULL check"
bpf: devmap, use cond_resched instead of cpu_relax
bpf: add support for sockmap detach programs
net: rcu lock and preempt disable missing around generic xdp
bpf: don't select potentially stale ri->map from buggy xdp progs
net: tulip: Constify tulip_tbl
net: ethernet: ti: netcp_core: no need in netif_napi_del
davicom: Display proper debug level up to 6
net: phy: sfp: rename dt properties to match the binding
dt-binding: net: sfp binding documentation
dt-bindings: add SFF vendor prefix
dt-bindings: net: don't confuse with generic PHY property
ip6_tunnel: fix setting hop_limit value for ipv6 tunnel
ip_tunnel: fix setting ttl and tos value in collect_md mode
ipv6: fix typo in fib6_net_exit()
tcp: fix a request socket leak
sctp: fix missing wake ups in some situations
netfilter: xt_hashlimit: fix build error caused by 64bit division
netfilter: xt_hashlimit: alloc hashtable with right size
...
Merge more updates from Andrew Morton:
- most of the rest of MM
- a small number of misc things
- lib/ updates
- checkpatch
- autofs updates
- ipc/ updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (126 commits)
ipc: optimize semget/shmget/msgget for lots of keys
ipc/sem: play nicer with large nsops allocations
ipc/sem: drop sem_checkid helper
ipc: convert kern_ipc_perm.refcount from atomic_t to refcount_t
ipc: convert sem_undo_list.refcnt from atomic_t to refcount_t
ipc: convert ipc_namespace.count from atomic_t to refcount_t
kcov: support compat processes
sh: defconfig: cleanup from old Kconfig options
mn10300: defconfig: cleanup from old Kconfig options
m32r: defconfig: cleanup from old Kconfig options
drivers/pps: use surrounding "if PPS" to remove numerous dependency checks
drivers/pps: aesthetic tweaks to PPS-related content
cpumask: make cpumask_next() out-of-line
kmod: move #ifdef CONFIG_MODULES wrapper to Makefile
kmod: split off umh headers into its own file
MAINTAINERS: clarify kmod is just a kernel module loader
kmod: split out umh code into its own file
test_kmod: flip INT checks to be consistent
test_kmod: remove paranoid UINT_MAX check on uint range processing
vfat: deduplicate hex2bin()
...