IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Add "idsfromsid" mount option to indicate to cifs.ko that it should
try to retrieve the uid and gid owner fields from special sids in the
ACL if present. This first patch just adds the parsing for the mount
option.
Signed-off-by: Steve French <steve.french@primarydata.com>
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
We are already doing the same thing for an ordinary open case:
we can't keep read oplock on a file if we have mandatory byte-range
locks because pagereading can conflict with these locks on a server.
Fix it by setting oplock level to NONE.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
openFileList of tcon can be changed while cifs_reopen_file() is called
that can lead to an unexpected behavior when we return to the loop.
Fix this by introducing a temp list for keeping all file handles that
need to be reopen.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
We split the rawntlmssp authentication into negotiate and
authencate parts. We also clean up the code and add helpers.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Add helper functions and split Kerberos authentication off
SMB2_sess_setup.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
/sys/module/cifs/parameters should display the three
other module load time configuration settings for cifs.ko
Signed-off-by: Germano Percossi <germano.percossi@citrix.com>
Signed-off-by: Steve French <steve.french@primarydata.com>
Cleanup some missing mem frees on some cifs ioctls, and
clarify others to make more obvious that no data is returned.
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Sachin Prabhu <sprabhu@redhat.com>
[CIFS] We had cases where we sent a SMB2/SMB3 setinfo request with all
timestamp (and DOS attribute) fields marked as 0 (ie do not change)
e.g. on chmod or chown.
Signed-off-by: Steve French <steve.french@primarydata.com>
CC: Stable <stable@vger.kernel.org>
Add mount option "max_credits" to allow setting maximum SMB3
credits to any value from 10 to 64000 (default is 32000).
This can be useful to workaround servers with problems allocating
credits, or to throttle the client to use smaller amount of
simultaneous i/o or to workaround server performance issues.
Also adds a cap, so that even if the server granted us more than
65000 credits due to a server bug, we would not use that many.
Signed-off-by: Steve French <steve.french@primarydata.com>
Continuous Availability features like persistent handles
require that clients reconnect their open files, not
just the sessions, soon after the network connection comes
back up, otherwise the server will throw away the state
(byte range locks, leases, deny modes) on those handles
after a timeout.
Add code to reconnect handles when use_persistent set
(e.g. Continuous Availability shares) after tree reconnect.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Germano Percossi <germano.percossi@citrix.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Remove the global file_list_lock to simplify cifs/smb3 locking and
have spinlocks that more closely match the information they are
protecting.
Add new tcon->open_file_lock and file->file_info_lock spinlocks.
Locks continue to follow a heirachy,
cifs_socket --> cifs_ses --> cifs_tcon --> cifs_file
where global tcp_ses_lock still protects socket and cifs_ses, while the
the newer locks protect the lower level structure's information
(tcon and cifs_file respectively).
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <steve.french@primarydata.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Germano Percossi <germano.percossi@citrix.com>
Patch a6b5058 results in -EREMOTE returned by is_path_accessible() in
cifs_mount() to be ignored which breaks DFS mounting.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
When we open a durable handle we give a Globally Unique
Identifier (GUID) to the server which we must keep for later reference
e.g. when reopening persistent handles on reconnection.
Without this the GUID generated for a new persistent handle was lost and
16 zero bytes were used instead on re-opening.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
GUIDs although random, and 16 bytes, need to be generated as
proper uuids.
Signed-off-by: Steve French <steve.french@primarydata.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reported-by: David Goebels <davidgoe@microsoft.com>
CC: Stable <stable@vger.kernel.org>
The kernel client requests 2 credits for many operations even though
they only use 1 credit (presumably to build up a buffer of credit).
Some servers seem to give the client as much credit as is requested. In
this case, the amount of credit the client has continues increasing to
the point where (server->credits * MAX_BUFFER_SIZE) overflows in
smb2_wait_mtu_credits().
Fix this by throttling the credit requests if an set limit is reached.
For async requests where the credit charge may be > 1, request as much
credit as what is charged.
The limit is chosen somewhat arbitrarily. The Windows client
defaults to 128 credits, the Windows server allows clients up to
512 credits (or 8192 for Windows 2016), and the NetApp server
(and at least one other) does not limit clients at all.
Choose a high enough value such that the client shouldn't limit
performance.
This behavior was seen with a NetApp filer (NetApp Release 9.0RC2).
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
In debugging smb3, it is useful to display the number
of credits available, so we can see when the server has not granted
sufficient operations for the client to make progress, or alternatively
the client has requested too many credits (as we saw in a recent bug)
so we can compare with the number of credits the server thinks
we have.
Add a /proc/fs/cifs/DebugData line to display the client view
on how many credits are available.
Signed-off-by: Steve French <steve.french@primarydata.com>
Reported-by: Germano Percossi <germano.percossi@citrix.com>
CC: Stable <stable@vger.kernel.org>
Add parsing for new pseudo-xattr user.cifs.creationtime file
attribute to allow backup and test applications to view
birth time of file on cifs/smb3 mounts.
Signed-off-by: Steve French <steve.french@primarydata.com>
Add parsing for new pseudo-xattr user.cifs.dosattrib file attribute
so tools can recognize what kind of file it is, and verify if common
SMB3 attributes (system, hidden, archive, sparse, indexed etc.) are
set.
Signed-off-by: Steve French <steve.french@primarydata.com>
Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>
Pull parisc fixes from Helge Deller:
"Some final updates and fixes for this merge window for the parisc
architecture. Changes include:
- Fix boot problems with new memblock allocator on rp3410 machine
- Increase initial kernel mapping size for 32- and 64-bit kernels,
this allows to boot bigger kernels which have many modules built-in
- Fix kernel layout regarding __gp and move exception table into RO
section
- Show trap names in crashes, use extable.h header instead of
module.h"
* 'parisc-4.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Show trap name in kernel crash
parisc: Zero-initialize newly alloced memblock
parisc: Move exception table into read-only section
parisc: Fix kernel memory layout regarding position of __gp
parisc: Increase initial kernel mapping size
parisc: Migrate exception table users off module.h and onto extable.h
Pull uaccess.h prepwork from Al Viro:
"Preparations to tree-wide switch to use of linux/uaccess.h (which,
obviously, will allow to start unifying stuff for real). The last step
there, ie
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
`git grep -l "$PATT"|grep -v ^include/linux/uaccess.h`
is not taken here - I would prefer to do it once just before or just
after -rc1. However, everything should be ready for it"
* 'work.uaccess2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
remove a stray reference to asm/uaccess.h in docs
sparc64: separate extable_64.h, switch elf_64.h to it
score: separate extable.h, switch module.h to it
mips: separate extable.h, switch module.h to it
x86: separate extable.h, switch sections.h to it
remove stray include of asm/uaccess.h from cacheflush.h
mn10300: remove a bogus processor.h->uaccess.h include
xtensa: split uaccess.h into C and asm sides
bonding: quit messing with IOCTL
kill __kernel_ds_p off
mn10300: finish verify_area() off
frv: move HAVE_ARCH_UNMAPPED_AREA to pgtable.h
exceptions: detritus removal
Pull drm updates from Dave Airlie:
"Core:
- Fence destaging work
- DRIVER_LEGACY to split off legacy drm drivers
- drm_mm refactoring
- Splitting drm_crtc.c into chunks and documenting better
- Display info fixes
- rbtree support for prime buffer lookup
- Simple VGA DAC driver
Panel:
- Add Nexus 7 panel
- More simple panels
i915:
- Refactoring GEM naming
- Refactored vma/active tracking
- Lockless request lookups
- Better stolen memory support
- FBC fixes
- SKL watermark fixes
- VGPU improvements
- dma-buf fencing support
- Better DP dongle support
amdgpu:
- Powerplay for Iceland asics
- Improved GPU reset support
- UVD/VEC powergating support for CZ/ST
- Preinitialised VRAM buffer support
- Virtual display support
- Initial SI support
- GTT rework
- PCI shutdown callback support
- HPD IRQ storm fixes
amdkfd:
- bugfixes
tilcdc:
- Atomic modesetting support
mediatek:
- AAL + GAMMA engine support
- Hook up gamma LUT
- Temporal dithering support
imx:
- Pixel clock from devicetree
- drm bridge support for LVDS bridges
- active plane reconfiguration
- VDIC deinterlacer support
- Frame synchronisation unit support
- Color space conversion support
analogix:
- PSR support
- Better panel on/off support
rockchip:
- rk3399 vop/crtc support
- PSR support
vc4:
- Interlaced vblank timing
- 3D rendering CPU overhead reduction
- HDMI output fixes
tda998x:
- HDMI audio ASoC support
sunxi:
- Allwinner A33 support
- better TCON support
msm:
- DT binding cleanups
- Explicit fence-fd support
sti:
- remove sti415/416 support
etnaviv:
- MMUv2 refactoring
- GC3000 support
exynos:
- Refactoring HDMI DCC/PHY
- G2D pm regression fix
- Page fault issues with wait for vblank
There is no nouveau work in this tree, as Ben didn't get a pull
request in, and he was fighting moving to atomic and adding mst
support, so maybe best it waits for a cycle"
* tag 'drm-for-v4.9' of git://people.freedesktop.org/~airlied/linux: (1412 commits)
drm/crtc: constify drm_crtc_index parameter
drm/i915: Fix conflict resolution from backmerge of v4.8-rc8 to drm-next
drm/i915/guc: Unwind GuC workqueue reservation if request construction fails
drm/i915: Reset the breadcrumbs IRQ more carefully
drm/i915: Force relocations via cpu if we run out of idle aperture
drm/i915: Distinguish last emitted request from last submitted request
drm/i915: Allow DP to work w/o EDID
drm/i915: Move long hpd handling into the hotplug work
drm/i915/execlists: Reinitialise context image after GPU hang
drm/i915: Use correct index for backtracking HUNG semaphores
drm/i915: Unalias obj->phys_handle and obj->userptr
drm/i915: Just clear the mmiodebug before a register access
drm/i915/gen9: only add the planes actually affected by ddb changes
drm/i915: Allow PCH DPLL sharing regardless of DPLL_SDVO_HIGH_SPEED
drm/i915/bxt: Fix HDMI DPLL configuration
drm/i915/gen9: fix the watermark res_blocks value
drm/i915/gen9: fix plane_blocks_per_line on watermarks calculations
drm/i915/gen9: minimum scanlines for Y tile is not always 4
drm/i915/gen9: fix the WaWmMemoryReadLatency implementation
drm/i915/kbl: KBL also needs to run the SAGV code
...
Merge more updates from Andrew Morton:
- a few block updates that fell in my lap
- lib/ updates
- checkpatch
- autofs
- ipc
- a ton of misc other things
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (100 commits)
mm: split gfp_mask and mapping flags into separate fields
fs: use mapping_set_error instead of opencoded set_bit
treewide: remove redundant #include <linux/kconfig.h>
hung_task: allow hung_task_panic when hung_task_warnings is 0
kthread: add kerneldoc for kthread_create()
kthread: better support freezable kthread workers
kthread: allow to modify delayed kthread work
kthread: allow to cancel kthread work
kthread: initial support for delayed kthread work
kthread: detect when a kthread work is used by more workers
kthread: add kthread_destroy_worker()
kthread: add kthread_create_worker*()
kthread: allow to call __kthread_create_on_node() with va_list args
kthread/smpboot: do not park in kthread_create_on_cpu()
kthread: kthread worker API cleanup
kthread: rename probe_kthread_data() to kthread_probe_data()
scripts/tags.sh: enable code completion in VIM
mm: kmemleak: avoid using __va() on addresses that don't have a lowmem mapping
kdump, vmcoreinfo: report memory sections virtual addresses
ipc/sem.c: add cond_resched in exit_sme
...
mapping->flags currently encodes two different things into a single flag.
It contains sticky gfp_mask for page cache allocations and AS_ codes used
to report errors/enospace and other states which are mapping specific.
Condensing the two semantically unrelated things saves few bytes but it
also complicates other things. For one thing the gfp flags space is
reduced and in fact we are already running out of available bits. It can
be assumed that more gfp flags will be necessary later on.
To not introduce the address_space grow (at least on x86_64) we can stick
it right after private_lock because we have a hole there.
struct address_space {
struct inode * host; /* 0 8 */
struct radix_tree_root page_tree; /* 8 16 */
spinlock_t tree_lock; /* 24 4 */
atomic_t i_mmap_writable; /* 28 4 */
struct rb_root i_mmap; /* 32 8 */
struct rw_semaphore i_mmap_rwsem; /* 40 40 */
/* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
long unsigned int nrpages; /* 80 8 */
long unsigned int nrexceptional; /* 88 8 */
long unsigned int writeback_index; /* 96 8 */
const struct address_space_operations * a_ops; /* 104 8 */
long unsigned int flags; /* 112 8 */
spinlock_t private_lock; /* 120 4 */
/* XXX 4 bytes hole, try to pack */
/* --- cacheline 2 boundary (128 bytes) --- */
struct list_head private_list; /* 128 16 */
void * private_data; /* 144 8 */
/* size: 152, cachelines: 3, members: 14 */
/* sum members: 148, holes: 1, sum holes: 4 */
/* last cacheline: 24 bytes */
};
Link: http://lkml.kernel.org/r/20160912114852.GI14524@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The mapping_set_error() helper sets the correct AS_ flag for the mapping
so there is no reason to open code it. Use the helper directly.
[akpm@linux-foundation.org: be honest about conversion from -ENXIO to -EIO]
Link: http://lkml.kernel.org/r/20160912111608.2588-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kernel source files need not include <linux/kconfig.h> explicitly
because the top Makefile forces to include it with:
-include $(srctree)/include/linux/kconfig.h
This commit removes explicit includes except the following:
* arch/s390/include/asm/facilities_src.h
* tools/testing/radix-tree/linux/kernel.h
These two are used for host programs.
Link: http://lkml.kernel.org/r/1473656164-11929-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Previously hung_task_panic would not be respected if enabled after
hung_task_warnings had already been decremented to 0.
Permit the kernel to panic if hung_task_panic is enabled after
hung_task_warnings has already been decremented to 0 and another task
hangs for hung_task_timeout_secs seconds.
Check if hung_task_panic is enabled so we don't return prematurely, and
check if hung_task_warnings is non-zero so we don't print the warning
unnecessarily.
[akpm@linux-foundation.org: fix off-by-one]
Link: http://lkml.kernel.org/r/1473450214-4049-1-git-send-email-jsiddle@redhat.com
Signed-off-by: John Siddle <jsiddle@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This macro is referenced in other kerneldoc comments, but lacks one of its
own; fix that.
Link: http://lkml.kernel.org/r/20160826072313.726a3485@lwn.net
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Reported-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch allows to make kthread worker freezable via a new @flags
parameter. It will allow to avoid an init work in some kthreads.
It currently does not affect the function of kthread_worker_fn()
but it might help to do some optimization or fixes eventually.
I currently do not know about any other use for the @flags
parameter but I believe that we will want more flags
in the future.
Finally, I hope that it will not cause confusion with @flags member
in struct kthread. Well, I guess that we will want to rework the
basic kthreads implementation once all kthreads are converted into
kthread workers or workqueues. It is possible that we will merge
the two structures.
Link: http://lkml.kernel.org/r/1470754545-17632-12-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are situations when we need to modify the delay of a delayed kthread
work. For example, when the work depends on an event and the initial delay
means a timeout. Then we want to queue the work immediately when the event
happens.
This patch implements kthread_mod_delayed_work() as inspired workqueues.
It cancels the timer, removes the work from any worker list and queues it
again with the given timeout.
A very special case is when the work is being canceled at the same time.
It might happen because of the regular kthread_cancel_delayed_work_sync()
or by another kthread_mod_delayed_work(). In this case, we do nothing and
let the other operation win. This should not normally happen as the caller
is supposed to synchronize these operations a reasonable way.
Link: http://lkml.kernel.org/r/1470754545-17632-11-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We are going to use kthread workers more widely and sometimes we will need
to make sure that the work is neither pending nor running.
This patch implements cancel_*_sync() operations as inspired by
workqueues. Well, we are synchronized against the other operations via
the worker lock, we use del_timer_sync() and a counter to count parallel
cancel operations. Therefore the implementation might be easier.
First, we check if a worker is assigned. If not, the work has newer been
queued after it was initialized.
Second, we take the worker lock. It must be the right one. The work must
not be assigned to another worker unless it is initialized in between.
Third, we try to cancel the timer when it exists. The timer is deleted
synchronously to make sure that the timer call back is not running. We
need to temporary release the worker->lock to avoid a possible deadlock
with the callback. In the meantime, we set work->canceling counter to
avoid any queuing.
Fourth, we try to remove the work from a worker list. It might be
the list of either normal or delayed works.
Fifth, if the work is running, we call kthread_flush_work(). It might
take an arbitrary time. We need to release the worker-lock again. In the
meantime, we again block any queuing by the canceling counter.
As already mentioned, the check for a pending kthread work is done under a
lock. In compare with workqueues, we do not need to fight for a single
PENDING bit to block other operations. Therefore we do not suffer from
the thundering storm problem and all parallel canceling jobs might use
kthread_flush_work(). Any queuing is blocked until the counter gets zero.
Link: http://lkml.kernel.org/r/1470754545-17632-10-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We are going to use kthread_worker more widely and delayed works
will be pretty useful.
The implementation is inspired by workqueues. It uses a timer to queue
the work after the requested delay. If the delay is zero, the work is
queued immediately.
In compare with workqueues, each work is associated with a single worker
(kthread). Therefore the implementation could be much easier. In
particular, we use the worker->lock to synchronize all the operations with
the work. We do not need any atomic operation with a flags variable.
In fact, we do not need any state variable at all. Instead, we add a list
of delayed works into the worker. Then the pending work is listed either
in the list of queued or delayed works. And the existing check of pending
works is the same even for the delayed ones.
A work must not be assigned to another worker unless reinitialized.
Therefore the timer handler might expect that dwork->work->worker is valid
and it could simply take the lock. We just add some sanity checks to help
with debugging a potential misuse.
Link: http://lkml.kernel.org/r/1470754545-17632-9-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nothing currently prevents a work from queuing for a kthread worker when
it is already running on another one. This means that the work might run
in parallel on more than one worker. Also some operations are not
reliable, e.g. flush.
This problem will be even more visible after we add kthread_cancel_work()
function. It will only have "work" as the parameter and will use
worker->lock to synchronize with others.
Well, normally this is not a problem because the API users are sane.
But bugs might happen and users also might be crazy.
This patch adds a warning when we try to insert the work for another
worker. It does not fully prevent the misuse because it would make the
code much more complicated without a big benefit.
It adds the same warning also into kthread_flush_work() instead of the
repeated attempts to get the right lock.
A side effect is that one needs to explicitly reinitialize the work if it
must be queued into another worker. This is needed, for example, when the
worker is stopped and started again. It is a bit inconvenient. But it
looks like a good compromise between the stability and complexity.
I have double checked all existing users of the kthread worker API and
they all seems to initialize the work after the worker gets started.
Just for completeness, the patch adds a check that the work is not already
in a queue.
The patch also puts all the checks into a separate function. It will be
reused when implementing delayed works.
Link: http://lkml.kernel.org/r/1470754545-17632-8-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current kthread worker users call flush() and stop() explicitly.
This function does the same plus it frees the kthread_worker struct
in one call.
It is supposed to be used together with kthread_create_worker*() that
allocates struct kthread_worker.
Link: http://lkml.kernel.org/r/1470754545-17632-7-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kthread workers are currently created using the classic kthread API,
namely kthread_run(). kthread_worker_fn() is passed as the @threadfn
parameter.
This patch defines kthread_create_worker() and
kthread_create_worker_on_cpu() functions that hide implementation details.
They enforce using kthread_worker_fn() for the main thread. But I doubt
that there are any plans to create any alternative. In fact, I think that
we do not want any alternative main thread because it would be hard to
support consistency with the rest of the kthread worker API.
The naming and function of kthread_create_worker() is inspired by the
workqueues API like the rest of the kthread worker API.
The kthread_create_worker_on_cpu() variant is motivated by the original
kthread_create_on_cpu(). Note that we need to bind per-CPU kthread
workers already when they are created. It makes the life easier.
kthread_bind() could not be used later for an already running worker.
This patch does _not_ convert existing kthread workers. The kthread
worker API need more improvements first, e.g. a function to destroy the
worker.
IMPORTANT:
kthread_create_worker_on_cpu() allows to use any format of the worker
name, in compare with kthread_create_on_cpu(). The good thing is that it
is more generic. The bad thing is that most users will need to pass the
cpu number in two parameters, e.g. kthread_create_worker_on_cpu(cpu,
"helper/%d", cpu).
To be honest, the main motivation was to avoid the need for an empty
va_list. The only legal way was to create a helper function that would be
called with an empty list. Other attempts caused compilation warnings or
even errors on different architectures.
There were also other alternatives, for example, using #define or
splitting __kthread_create_worker(). The used solution looked like the
least ugly.
Link: http://lkml.kernel.org/r/1470754545-17632-6-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kthread_create_on_node() implements a bunch of logic to create the
kthread. It is already called by kthread_create_on_cpu().
We are going to extend the kthread worker API and will need to call
kthread_create_on_node() with va_list args there.
This patch does only a refactoring and does not modify the existing
behavior.
Link: http://lkml.kernel.org/r/1470754545-17632-5-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kthread_create_on_cpu() was added by the commit 2a1d446019
("kthread: Implement park/unpark facility"). It is currently used only
when enabling new CPU. For this purpose, the newly created kthread has to
be parked.
The CPU binding is a bit tricky. The kthread is parked when the CPU has
not been allowed yet. And the CPU is bound when the kthread is unparked.
The function would be useful for more per-CPU kthreads, e.g.
bnx2fc_thread, fcoethread. For this purpose, the newly created kthread
should stay in the uninterruptible state.
This patch moves the parking into smpboot. It binds the thread already
when created. Then the function might be used universally. Also the
behavior is consistent with kthread_create() and kthread_create_on_node().
Link: http://lkml.kernel.org/r/1470754545-17632-4-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A good practice is to prefix the names of functions by the name
of the subsystem.
The kthread worker API is a mix of classic kthreads and workqueues. Each
worker has a dedicated kthread. It runs a generic function that process
queued works. It is implemented as part of the kthread subsystem.
This patch renames the existing kthread worker API to use
the corresponding name from the workqueues API prefixed by
kthread_:
__init_kthread_worker() -> __kthread_init_worker()
init_kthread_worker() -> kthread_init_worker()
init_kthread_work() -> kthread_init_work()
insert_kthread_work() -> kthread_insert_work()
queue_kthread_work() -> kthread_queue_work()
flush_kthread_work() -> kthread_flush_work()
flush_kthread_worker() -> kthread_flush_worker()
Note that the names of DEFINE_KTHREAD_WORK*() macros stay
as they are. It is common that the "DEFINE_" prefix has
precedence over the subsystem names.
Note that INIT() macros and init() functions use different
naming scheme. There is no good solution. There are several
reasons for this solution:
+ "init" in the function names stands for the verb "initialize"
aka "initialize worker". While "INIT" in the macro names
stands for the noun "INITIALIZER" aka "worker initializer".
+ INIT() macros are used only in DEFINE() macros
+ init() functions are used close to the other kthread()
functions. It looks much better if all the functions
use the same scheme.
+ There will be also kthread_destroy_worker() that will
be used close to kthread_cancel_work(). It is related
to the init() function. Again it looks better if all
functions use the same naming scheme.
+ there are several precedents for such init() function
names, e.g. amd_iommu_init_device(), free_area_init_node(),
jump_label_init_type(), regmap_init_mmio_clk(),
+ It is not an argument but it was inconsistent even before.
[arnd@arndb.de: fix linux-next merge conflict]
Link: http://lkml.kernel.org/r/20160908135724.1311726-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/1470754545-17632-3-git-send-email-pmladek@suse.com
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "kthread: Kthread worker API improvements"
The intention of this patchset is to make it easier to manipulate and
maintain kthreads. Especially, I want to replace all the custom main
cycles with a generic one. Also I want to make the kthreads sleep in a
consistent state in a common place when there is no work.
This patch (of 11):
A good practice is to prefix the names of functions by the name of the
subsystem.
This patch fixes the name of probe_kthread_data(). The other wrong
functions names are part of the kthread worker API and will be fixed
separately.
Link: http://lkml.kernel.org/r/1470754545-17632-2-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some of the kmemleak_*() callbacks in memblock, bootmem, CMA convert a
physical address to a virtual one using __va(). However, such physical
addresses may sometimes be located in highmem and using __va() is
incorrect, leading to inconsistent object tracking in kmemleak.
The following functions have been added to the kmemleak API and they take
a physical address as the object pointer. They only perform the
corresponding action if the address has a lowmem mapping:
kmemleak_alloc_phys
kmemleak_free_part_phys
kmemleak_not_leak_phys
kmemleak_ignore_phys
The affected calling places have been updated to use the new kmemleak
API.
Link: http://lkml.kernel.org/r/1471531432-16503-1-git-send-email-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KASLR memory randomization can randomize the base of the physical memory
mapping (PAGE_OFFSET), vmalloc (VMALLOC_START) and vmemmap
(VMEMMAP_START). Adding these variables on VMCOREINFO so tools can easily
identify the base of each memory section.
Link: http://lkml.kernel.org/r/1471531632-23003-1-git-send-email-thgarnie@google.com
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Xunlei Pang <xlpang@redhat.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Eugene Surovegin <surovegin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In CONFIG_PREEMPT=n kernel a softlockup was observed while the for loop in
exit_sem. Apparently it's possible for the loop to take quite a long time
and it doesn't have a scheduling point in it. Since the codes is
executing under an rcu read section this may also cause rcu stalls, which
in turn block synchronize_rcu operations, which more or less de-stabilises
the whole system.
Fix this by introducing a cond_resched() at the beginning of the loop.
So this patch fixes the following:
NMI watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [httpd:18119]
CPU: 10 PID: 18119 Comm: httpd Tainted: G O 4.4.20-clouder2 #6
Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015
task: ffff88348d695280 ti: ffff881c95550000 task.ti: ffff881c95550000
RIP: 0010:[<ffffffff81614bc7>] [<ffffffff81614bc7>] _raw_spin_lock+0x17/0x30
RSP: 0018:ffff881c95553e40 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff883161b1eea8 RCX: 000000000000000d
RDX: 0000000000000001 RSI: 000000000000000e RDI: ffff883161b1eea4
RBP: ffff881c95553ea0 R08: ffff881c95553e68 R09: ffff883fef376f88
R10: ffff881fffb58c20 R11: ffffea0072556600 R12: ffff883161b1eea0
R13: ffff88348d695280 R14: ffff883dec427000 R15: ffff8831621672a0
FS: 0000000000000000(0000) GS:ffff881fffb40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3b3723e020 CR3: 0000000001c0a000 CR4: 00000000001406e0
Call Trace:
? exit_sem+0x7c/0x280
do_exit+0x338/0xb40
do_group_exit+0x43/0xd0
SyS_exit_group+0x14/0x20
entry_SYSCALL_64_fastpath+0x16/0x6e
Link: http://lkml.kernel.org/r/1475154992-6363-1-git-send-email-kernel@kyup.com
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Cc: Herton R. Krzesinski <herton@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Blocked tasks queued in q_senders waiting for their message to fit in the
queue are blindly awoken every time we think there's a remote chance this
might happen. This could cause numerous (and expensive -- thundering
herd-ish) bogus wakeups if the queue is still really full. Adding to the
scheduling cost/overhead, there's also the fact that we need to take the
ipc object lock and requeue ourselves in the q_senders list.
By keeping track of the blocked sender's message size, we can know
previously if the wakeup ought to occur or not. Otherwise, to maintain
the current wakeup order we just move it to the tail. This is exactly
what occurs right now if the sender needs to go back to sleep.
The case of EIDRM is left completely untouched, as we need to wakeup all
the tasks, and shouldn't be playing games in the first place.
This patch was seen to save on the 'msgctl10' ltp testcase ~15% in context
switches (avg out of ten runs). Although these tests are really about
functionality (as opposed to performance), is does show the direct
benefits of the optimization.
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1469748819-19484-6-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently the use of wake_qs in sysv msg queues are only for the receiver
tasks that are blocked on the queue. But blocked sender tasks (due to
queue size constraints) still are awoken with the ipc object lock held,
which can be a problem particularly for small sized queues and far from
gracious for -rt (just like it was for the receiver side).
The paths that actually wakeup a sender are obviously related to when we
are either getting rid of the queue or after (some) space is freed-up
after a receiver takes the msg (msgrcv). Furthermore, with the exception
of msgrcv, we can always piggy-back on expunge_all that has its own tasks
lined-up for waking. Finally, upon unlinking the message, it should be no
problem delaying the wakeups a bit until after we've released the lock.
Link: http://lkml.kernel.org/r/1469748819-19484-3-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch moves the wakeup_process() invocation so it is not done under
the ipc global lock by making use of a lockless wake_q. With this change,
the waiter is woken up once the message has been assigned and it does not
need to loop on SMP if the message points to NULL. In the signal case we
still need to check the pointer under the lock to verify the state.
This change should also avoid the introduction of preempt_disable() in -RT
which avoids a busy-loop which pools for the NULL -> !NULL change if the
waiter has a higher priority compared to the waker.
By making use of wake_qs, the logic of sysv msg queues is greatly
simplified (and very well suited as we can batch lockless wakeups),
particularly around the lockless receive algorithm.
This has been tested with Manred's pmsg-shared tool on a "AMD A10-7800
Radeon R7, 12 Compute Cores 4C+8G":
test | before | after | diff
-----------------|------------|------------|----------
pmsg-shared 8 60 | 19,347,422 | 30,442,191 | + ~57.34 %
pmsg-shared 4 60 | 21,367,197 | 35,743,458 | + ~67.28 %
pmsg-shared 2 60 | 22,884,224 | 24,278,200 | + ~6.09 %
Link: http://lkml.kernel.org/r/1469748819-19484-2-git-send-email-dave@stgolabs.net
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 6d07b68ce1 ("ipc/sem.c: optimize sem_lock()") introduced a
race:
sem_lock has a fast path that allows parallel simple operations.
There are two reasons why a simple operation cannot run in parallel:
- a non-simple operations is ongoing (sma->sem_perm.lock held)
- a complex operation is sleeping (sma->complex_count != 0)
As both facts are stored independently, a thread can bypass the current
checks by sleeping in the right positions. See below for more details
(or kernel bugzilla 105651).
The patch fixes that by creating one variable (complex_mode)
that tracks both reasons why parallel operations are not possible.
The patch also updates stale documentation regarding the locking.
With regards to stable kernels:
The patch is required for all kernels that include the
commit 6d07b68ce1 ("ipc/sem.c: optimize sem_lock()") (3.10?)
The alternative is to revert the patch that introduced the race.
The patch is safe for backporting, i.e. it makes no assumptions
about memory barriers in spin_unlock_wait().
Background:
Here is the race of the current implementation:
Thread A: (simple op)
- does the first "sma->complex_count == 0" test
Thread B: (complex op)
- does sem_lock(): This includes an array scan. But the scan can't
find Thread A, because Thread A does not own sem->lock yet.
- the thread does the operation, increases complex_count,
drops sem_lock, sleeps
Thread A:
- spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
- sleeps before the complex_count test
Thread C: (complex op)
- does sem_lock (no array scan, complex_count==1)
- wakes up Thread B.
- decrements complex_count
Thread A:
- does the complex_count test
Bug:
Now both thread A and thread C operate on the same array, without
any synchronization.
Fixes: 6d07b68ce1 ("ipc/sem.c: optimize sem_lock()")
Link: http://lkml.kernel.org/r/1469123695-5661-1-git-send-email-manfred@colorfullife.com
Reported-by: <felixh@informatik.uni-bremen.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <1vier1@web.de>
Cc: <stable@vger.kernel.org> [3.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There's no point in collecting coverage from lib/stackdepot.c, as it is
not a function of syscall inputs. Disabling kcov instrumentation for that
file will reduce the coverage noise level.
Link: http://lkml.kernel.org/r/1474640972-104131-1-git-send-email-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>