linux

iv/linux

Author	SHA1	Message	Date
Tejun Heo	4e8f0a6096	workqueue: remove worker_pool->gcwq The only remaining user of pool->gcwq is std_worker_pool_pri(). Reimplement it using get_gcwq() and remove worker_pool->gcwq. This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-01-24 11:01:34 -08:00
Tejun Heo	38db41d984	workqueue: replace for_each_worker_pool() with for_each_std_worker_pool() for_each_std_worker_pool() takes @cpu instead of @gcwq. This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-01-24 11:01:34 -08:00
Tejun Heo	a1056305fa	workqueue: make freezing/thawing per-pool Instead of holding locks from both pools and then processing the pools together, make freezing/thwaing per-pool - grab locks of one pool, process it, release it and then proceed to the next pool. While this patch changes processing order across pools, order within each pool remains the same. As each pool is independent, this shouldn't break anything. This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-01-24 11:01:33 -08:00
Tejun Heo	94cf58bb29	workqueue: make hotplug processing per-pool Instead of holding locks from both pools and then processing the pools together, make hotplug processing per-pool - grab locks of one pool, process it, release it and then proceed to the next pool. rebind_workers() is updated to take and process @pool instead of @gcwq which results in a lot of de-indentation. gcwq_claim_assoc_and_lock() and its counterpart are replaced with in-line per-pool locking. While this patch changes processing order across pools, order within each pool remains the same. As each pool is independent, this shouldn't break anything. This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-01-24 11:01:33 -08:00
Tejun Heo	d565ed6309	workqueue: move global_cwq->lock to worker_pool Move gcwq->lock to pool->lock. The conversion is mostly straight-forward. Things worth noting are * In many places, this removes the need to use gcwq completely. pool is used directly instead. get_std_worker_pool() is added to help some of these conversions. This also leaves get_work_gcwq() without any user. Removed. * In hotplug and freezer paths, the pools belonging to a CPU are often processed together. This patch makes those paths hold locks of all pools, with highpri lock nested inside, to keep the conversion straight-forward. These nested lockings will be removed by following patches. This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-01-24 11:01:33 -08:00
Tejun Heo	ec22ca5eab	workqueue: move global_cwq->cpu to worker_pool Move gcwq->cpu to pool->cpu. This introduces a couple places where gcwq->pools[0].cpu is used. These will soon go away as gcwq is further reduced. This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-01-24 11:01:33 -08:00
Tejun Heo	c9e7cf273f	workqueue: move busy_hash from global_cwq to worker_pool There's no functional necessity for the two pools on the same CPU to share the busy hash table. It's also likely to be a bottleneck when implementing pools with user-specified attributes. This patch makes busy_hash per-pool. The conversion is mostly straight-forward. Changes worth noting are, * Large block of changes in rebind_workers() is moving the block inside for_each_worker_pool() as now there are separate hash tables for each pool. This changes the order of operations but doesn't break anything. * Thre for_each_worker_pool() loops in gcwq_unbind_fn() are combined into one. This again changes the order of operaitons but doesn't break anything. This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-01-24 11:01:33 -08:00
Tejun Heo	7c3eed5cd6	workqueue: record pool ID instead of CPU in work->data when off-queue Currently, when a work item is off-queue, work->data records the CPU it was last on, which is used to locate the last executing instance for non-reentrance, flushing, etc. We're in the process of removing global_cwq and making worker_pool the top level abstraction. This patch makes work->data point to the pool it was last associated with instead of CPU. After the previous WORK_OFFQ_POOL_CPU and worker_poo->id additions, the conversion is fairly straight-forward. WORK_OFFQ constants and functions are modified to record and read back pool ID instead. worker_pool_by_id() is added to allow looking up pool from ID. get_work_pool() replaces get_work_gcwq(), which is reimplemented using get_work_pool(). get_work_pool_id() replaces work_cpu(). This patch shouldn't introduce any observable behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-01-24 11:01:33 -08:00
Tejun Heo	9daf9e678d	workqueue: add worker_pool->id Add worker_pool->id which is allocated from worker_pool_idr. This will be used to record the last associated worker_pool in work->data. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-01-24 11:01:33 -08:00
Tejun Heo	715b06b864	workqueue: introduce WORK_OFFQ_CPU_NONE Currently, when a work item is off queue, high bits of its data encodes the last CPU it was on. This is scheduled to be changed to pool ID, which will make it impossible to use WORK_CPU_NONE to indicate no association. This patch limits the number of bits which are used for off-queue cpu number to 31 (so that the max fits in an int) and uses the highest possible value - WORK_OFFQ_CPU_NONE - to indicate no association. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-01-24 11:01:33 -08:00
Tejun Heo	35b6bb63b8	workqueue: make GCWQ_FREEZING a pool flag Make GCWQ_FREEZING a pool flag POOL_FREEZING. This patch doesn't change locking - FREEZING on both pools of a CPU are set or clear together while holding gcwq->lock. It shouldn't cause any functional difference. This leaves gcwq->flags w/o any flags. Removed. While at it, convert BUG_ON()s in freeze_workqueue_begin() and thaw_workqueues() to WARN_ON_ONCE(). This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-01-24 11:01:33 -08:00
Tejun Heo	2464757086	workqueue: make GCWQ_DISASSOCIATED a pool flag Make GCWQ_DISASSOCIATED a pool flag POOL_DISASSOCIATED. This patch doesn't change locking - DISASSOCIATED on both pools of a CPU are set or clear together while holding gcwq->lock. It shouldn't cause any functional difference. This is part of an effort to remove global_cwq and make worker_pool the top level abstraction, which in turn will help implementing worker pools with user-specified attributes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-01-24 11:01:33 -08:00
Tejun Heo	e34cdddb03	workqueue: use std_ prefix for the standard per-cpu pools There are currently two worker pools per cpu (including the unbound cpu) and they are the only pools in use. New class of pools are scheduled to be added and some pool related APIs will be added inbetween. Call the existing pools the standard pools and prefix them with std_. Do this early so that new APIs can use std_ prefix from the beginning. This patch doesn't introduce any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-01-24 11:01:33 -08:00
Tejun Heo	e2905b2912	workqueue: unexport work_cpu() This function no longer has any external users. Unexport it. It will be removed later on. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-01-24 11:01:32 -08:00
Tejun Heo	2eaebdb33e	workqueue: move struct worker definition to workqueue_internal.h This will be used to implement an inline function to query whether %current is a workqueue worker and, if so, allow determining which work item it's executing. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2013-01-18 14:05:55 -08:00
Tejun Heo	ea138446e5	workqueue: rename kernel/workqueue_sched.h to kernel/workqueue_internal.h Workqueue wants to expose more interface internal to kernel/. Instead of adding a new header file, repurpose kernel/workqueue_sched.h. Rename it to workqueue_internal.h and add include protector. This patch doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org>	2013-01-18 14:05:55 -08:00
Tejun Heo	111c225a5f	workqueue: set PF_WQ_WORKER on rescuers PF_WQ_WORKER is used to tell scheduler that the task is a workqueue worker and needs wq_worker_sleeping/waking_up() invoked on it for concurrency management. As rescuers never participate in concurrency management, PF_WQ_WORKER wasn't set on them. There's a need for an interface which can query whether %current is executing a work item and if so which. Such interface requires a way to identify all tasks which may execute work items and PF_WQ_WORKER will be used for that. As all normal workers always have PF_WQ_WORKER set, we only need to add it to rescuers. As rescuers start with WORKER_PREP but never clear it, it's always NOT_RUNNING and there's no need to worry about it interfering with concurrency management even if PF_WQ_WORKER is set; however, unlike normal workers, rescuers currently don't have its worker struct as kthread_data(). It uses the associated workqueue_struct instead. This is problematic as wq_worker_sleeping/waking_up() expect struct worker at kthread_data(). This patch adds worker->rescue_wq and start rescuer kthreads with worker struct as kthread_data and sets PF_WQ_WORKER on rescuers. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2013-01-17 17:19:58 -08:00
Tejun Heo	023f27d3d6	workqueue: fix find_worker_executing_work() brekage from hashtable conversion `42f8570f43` ("workqueue: use new hashtable implementation") incorrectly made busy workers hashed by the pointer value of worker instead of work. This broke find_worker_executing_work() which in turn broke a lot of fundamental operations of workqueue - non-reentrancy and flushing among others. The flush malfunction triggered warning in disk event code in Fengguang's automated test. write_dev_root_ (3265) used greatest stack depth: 2704 bytes left ------------[ cut here ]------------ WARNING: at /c/kernel-tests/src/stable/block/genhd.c:1574 disk_clear_events+0x\ cf/0x108() Hardware name: Bochs Modules linked in: Pid: 3328, comm: ata_id Not tainted 3.7.0-01930-gbff6343 #1167 Call Trace: [<ffffffff810997c4>] warn_slowpath_common+0x83/0x9c [<ffffffff810997f7>] warn_slowpath_null+0x1a/0x1c [<ffffffff816aea77>] disk_clear_events+0xcf/0x108 [<ffffffff811bd8be>] check_disk_change+0x27/0x59 [<ffffffff822e48e2>] cdrom_open+0x49/0x68b [<ffffffff81ab0291>] idecd_open+0x88/0xb7 [<ffffffff811be58f>] __blkdev_get+0x102/0x3ec [<ffffffff811bea08>] blkdev_get+0x18f/0x30f [<ffffffff811bebfd>] blkdev_open+0x75/0x80 [<ffffffff8118f510>] do_dentry_open+0x1ea/0x295 [<ffffffff8118f5f0>] finish_open+0x35/0x41 [<ffffffff8119c720>] do_last+0x878/0xa25 [<ffffffff8119c993>] path_openat+0xc6/0x333 [<ffffffff8119cf37>] do_filp_open+0x38/0x86 [<ffffffff81190170>] do_sys_open+0x6c/0xf9 [<ffffffff8119021e>] sys_open+0x21/0x23 [<ffffffff82c1c3d9>] system_call_fastpath+0x16/0x1b Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Sasha Levin <sasha.levin@oracle.com>	2012-12-19 11:24:06 -08:00
Tejun Heo	a2c1c57be8	workqueue: consider work function when searching for busy work items To avoid executing the same work item concurrenlty, workqueue hashes currently busy workers according to their current work items and looks up the the table when it wants to execute a new work item. If there already is a worker which is executing the new work item, the new item is queued to the found worker so that it gets executed only after the current execution finishes. Unfortunately, a work item may be freed while being executed and thus recycled for different purposes. If it gets recycled for a different work item and queued while the previous execution is still in progress, workqueue may make the new work item wait for the old one although the two aren't really related in any way. In extreme cases, this false dependency may lead to deadlock although it's extremely unlikely given that there aren't too many self-freeing work item users and they usually don't wait for other work items. To alleviate the problem, record the current work function in each busy worker and match it together with the work item address in find_worker_executing_work(). While this isn't complete, it ensures that unrelated work items don't interact with each other and in the very unlikely case where a twisted wq user triggers it, it's always onto itself making the culprit easy to spot. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Andrey Isakov <andy51@gmx.ru> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=51701 Cc: stable@vger.kernel.org	2012-12-18 10:56:14 -08:00
Sasha Levin	42f8570f43	workqueue: use new hashtable implementation Switch workqueues to use the new hashtable implementation. This reduces the amount of generic unrelated code in the workqueues. This patch depends on `d9b482c` ("hashtable: introduce a small and naive hashtable") which was merged in v3.6. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2012-12-18 09:21:13 -08:00
Linus Torvalds	e7b55b8fcd	Merge branch 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue changes from Tejun Heo: "Nothing exciting. Just two trivial changes." * 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: add WARN_ON_ONCE() on CPU number to wq_worker_waking_up() workqueue: trivial fix for return statement in work_busy()	2012-12-12 08:15:13 -08:00
Tejun Heo	fc4b514f27	workqueue: convert BUG_ON()s in __queue_delayed_work() to WARN_ON_ONCE()s `8852aac25e` ("workqueue: mod_delayed_work_on() shouldn't queue timer on 0 delay") unexpectedly uncovered a very nasty abuse of delayed_work in megaraid - it allocated work_struct, casted it to delayed_work and then pass that into queue_delayed_work(). Previously, this was okay because 0 @delay short-circuited to queue_work() before doing anything with delayed_work. `8852aac25e` moved 0 @delay test into __queue_delayed_work() after sanity check on delayed_work making megaraid trigger BUG_ON(). Although megaraid is already fixed by `c1d390d8e6` ("megaraid: fix BUG_ON() from incorrect use of delayed work"), this patch converts BUG_ON()s in __queue_delayed_work() to WARN_ON_ONCE()s so that such abusers, if there are more, trigger warning but don't crash the machine. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Xiaotian Feng <xtfeng@gmail.com>	2012-12-04 07:58:47 -08:00
Joonsoo Kim	3657600040	workqueue: add WARN_ON_ONCE() on CPU number to wq_worker_waking_up() Recently, workqueue code has gone through some changes and we found some bugs related to concurrency management operations happening on the wrong CPU. When a worker is concurrency managed (!WORKER_NOT_RUNNIG), it should be bound to its associated cpu and woken up to that cpu. Add WARN_ON_ONCE() to verify this. Signed-off-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2012-12-01 16:45:45 -08:00
Joonsoo Kim	999767beb1	workqueue: trivial fix for return statement in work_busy() Return type of work_busy() is unsigned int. There is return statement returning boolean value, 'false' in work_busy(). It is not problem, because 'false' may be treated '0'. However, fixing it would make code robust. Signed-off-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2012-12-01 16:45:40 -08:00
Tejun Heo	8852aac25e	workqueue: mod_delayed_work_on() shouldn't queue timer on 0 delay `8376fe22c7` ("workqueue: implement mod_delayed_work[_on]()") implemented mod_delayed_work[_on]() using the improved try_to_grab_pending(). The function is later used, among others, to replace [__]candel_delayed_work() + queue_delayed_work() combinations. Unfortunately, a delayed_work item w/ zero @delay is handled slightly differently by mod_delayed_work_on() compared to queue_delayed_work_on(). The latter skips timer altogether and directly queues it using queue_work_on() while the former schedules timer which will expire on the closest tick. This means, when @delay is zero, that [__]cancel_delayed_work() + queue_delayed_work_on() makes the target item immediately executable while mod_delayed_work_on() may induce delay of upto a full tick. This somewhat subtle difference breaks some of the converted users. e.g. block queue plugging uses delayed_work for deferred processing and uses mod_delayed_work_on() when the queue needs to be immediately unplugged. The above problem manifested as noticeably higher number of context switches under certain circumstances. The difference in behavior was caused by missing special case handling for 0 delay in mod_delayed_work_on() compared to queue_delayed_work_on(). Joonsoo Kim posted a patch to add it - ("workqueue: optimize mod_delayed_work_on() when @delay == 0")[1]. The patch was queued for 3.8 but it was described as optimization and I missed that it was a correctness issue. As both queue_delayed_work_on() and mod_delayed_work_on() use __queue_delayed_work() for queueing, it seems that the better approach is to move the 0 delay special handling to the function instead of duplicating it in mod_delayed_work_on(). Fix the problem by moving 0 delay special case handling from queue_delayed_work_on() to __queue_delayed_work(). This replaces Joonsoo's patch. [1] http://thread.gmane.org/gmane.linux.kernel/1379011/focus=1379012 Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Anders Kaseorg <andersk@MIT.EDU> Reported-and-tested-by: Zlatko Calusic <zlatko.calusic@iskon.hr> LKML-Reference: <alpine.DEB.2.00.1211280953350.26602@dr-wily.mit.edu> LKML-Reference: <50A78AA9.5040904@iskon.hr> Cc: Joonsoo Kim <js1304@gmail.com>	2012-12-01 16:43:18 -08:00
Mike Galbraith	412d32e6c9	workqueue: exit rescuer_thread() as TASK_RUNNING A rescue thread exiting TASK_INTERRUPTIBLE can lead to a task scheduling off, never to be seen again. In the case where this occurred, an exiting thread hit reiserfs homebrew conditional resched while holding a mutex, bringing the box to its knees. PID: 18105 TASK: ffff8807fd412180 CPU: 5 COMMAND: "kdmflush" #0 [ffff8808157e7670] schedule at ffffffff8143f489 #1 [ffff8808157e77b8] reiserfs_get_block at ffffffffa038ab2d [reiserfs] #2 [ffff8808157e79a8] __block_write_begin at ffffffff8117fb14 #3 [ffff8808157e7a98] reiserfs_write_begin at ffffffffa0388695 [reiserfs] #4 [ffff8808157e7ad8] generic_perform_write at ffffffff810ee9e2 #5 [ffff8808157e7b58] generic_file_buffered_write at ffffffff810eeb41 #6 [ffff8808157e7ba8] __generic_file_aio_write at ffffffff810f1a3a #7 [ffff8808157e7c58] generic_file_aio_write at ffffffff810f1c88 #8 [ffff8808157e7cc8] do_sync_write at ffffffff8114f850 #9 [ffff8808157e7dd8] do_acct_process at ffffffff810a268f [exception RIP: kernel_thread_helper] RIP: ffffffff8144a5c0 RSP: ffff8808157e7f58 RFLAGS: 00000202 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff8107af60 RDI: ffff8803ee491d18 RBP: 0000000000000000 R8: 0000000000000000 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 Signed-off-by: Mike Galbraith <mgalbraith@suse.de> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org	2012-12-01 15:56:42 -08:00
Dan Magenheimer	c0158ca64d	workqueue: cancel_delayed_work() should return %false if work item is idle `57b30ae77b` ("workqueue: reimplement cancel_delayed_work() using try_to_grab_pending()") made cancel_delayed_work() always return %true unless someone else is also trying to cancel the work item, which is broken - if the target work item is idle, the return value should be %false. try_to_grab_pending() indicates that the target work item was idle by zero return value. Use it for return. Note that this brings cancel_delayed_work() in line with __cancel_work_timer() in return value handling. Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Tejun Heo <tj@kernel.org> LKML-Reference: <444a6439-b1a4-4740-9e7e-bc37267cfe73@default>	2012-10-24 12:38:16 -07:00
Linus Torvalds	033d9959ed	Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue changes from Tejun Heo: "This is workqueue updates for v3.7-rc1. A lot of activities this round including considerable API and behavior cleanups. * delayed_work combines a timer and a work item. The handling of the timer part has always been a bit clunky leading to confusing cancelation API with weird corner-case behaviors. delayed_work is updated to use new IRQ safe timer and cancelation now works as expected. * Another deficiency of delayed_work was lack of the counterpart of mod_timer() which led to cancel+queue combinations or open-coded timer+work usages. mod_delayed_work[_on]() are added. These two delayed_work changes make delayed_work provide interface and behave like timer which is executed with process context. * A work item could be executed concurrently on multiple CPUs, which is rather unintuitive and made flush_work() behavior confusing and half-broken under certain circumstances. This problem doesn't exist for non-reentrant workqueues. While non-reentrancy check isn't free, the overhead is incurred only when a work item bounces across different CPUs and even in simulated pathological scenario the overhead isn't too high. All workqueues are made non-reentrant. This removes the distinction between flush_[delayed_]work() and flush_[delayed_]_work_sync(). The former is now as strong as the latter and the specified work item is guaranteed to have finished execution of any previous queueing on return. * In addition to the various bug fixes, Lai redid and simplified CPU hotplug handling significantly. * Joonsoo introduced system_highpri_wq and used it during CPU hotplug. There are two merge commits - one to pull in IRQ safe timer from tip/timers/core and the other to pull in CPU hotplug fixes from wq/for-3.6-fixes as Lai's hotplug restructuring depended on them." Fixed a number of trivial conflicts, but the more interesting conflicts were silent ones where the deprecated interfaces had been used by new code in the merge window, and thus didn't cause any real data conflicts. Tejun pointed out a few of them, I fixed a couple more. * 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits) workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending() workqueue: use cwq_set_max_active() helper for workqueue_set_max_active() workqueue: introduce cwq_set_max_active() helper for thaw_workqueues() workqueue: remove @delayed from cwq_dec_nr_in_flight() workqueue: fix possible stall on try_to_grab_pending() of a delayed work item workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback() workqueue: use __cpuinit instead of __devinit for cpu callbacks workqueue: rename manager_mutex to assoc_mutex workqueue: WORKER_REBIND is no longer necessary for idle rebinding workqueue: WORKER_REBIND is no longer necessary for busy rebinding workqueue: reimplement idle worker rebinding workqueue: deprecate __cancel_delayed_work() workqueue: reimplement cancel_delayed_work() using try_to_grab_pending() workqueue: use mod_delayed_work() instead of __cancel + queue workqueue: use irqsafe timer for delayed_work workqueue: clean up delayed_work initializers and add missing one workqueue: make deferrable delayed_work initializer names consistent workqueue: cosmetic whitespace updates for macro definitions workqueue: deprecate system_nrt[_freezable]_wq workqueue: deprecate flush[_delayed]_work_sync() ...	2012-10-02 09:54:49 -07:00
Tejun Heo	7c6e72e46c	workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending() `e0aecdd874` ("workqueue: use irqsafe timer for delayed_work") made try_to_grab_pending() safe to use from irq context but forgot to remove WARN_ON_ONCE(in_irq()). Remove it. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Fengguang Wu <fengguang.wu@intel.com>	2012-09-20 10:03:19 -07:00
Lai Jiangshan	70369b117a	workqueue: use cwq_set_max_active() helper for workqueue_set_max_active() workqueue_set_max_active() may increase ->max_active without activating delayed works and may make the activation order differ from the queueing order. Both aren't strictly bugs but the resulting behavior could be a bit odd. To make things more consistent, use cwq_set_max_active() helper which immediately makes use of the newly increased max_mactive if there are delayed work items and also keeps the activation order. tj: Slight update to description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2012-09-19 10:40:48 -07:00
Lai Jiangshan	9f4bd4cddb	workqueue: introduce cwq_set_max_active() helper for thaw_workqueues() Using a helper instead of open code makes thaw_workqueues() clearer. The helper will also be used by the next patch. tj: Slight update to comment and description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2012-09-19 10:40:48 -07:00
Tejun Heo	ed48ece27c	workqueue: reimplement work_on_cpu() using system_wq The existing work_on_cpu() implementation is hugely inefficient. It creates a new kthread, execute that single function and then let the kthread die on each invocation. Now that system_wq can handle concurrent executions, there's no advantage of doing this. Reimplement work_on_cpu() using system_wq which makes it simpler and way more efficient. stable: While this isn't a fix in itself, it's needed to fix a workqueue related bug in cpufreq/powernow-k8. AFAICS, this shouldn't break other existing users. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jiri Kosina <jkosina@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Len Brown <lenb@kernel.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: stable@vger.kernel.org	2012-09-19 10:13:12 -07:00
Lai Jiangshan	b3f9f405a2	workqueue: remove @delayed from cwq_dec_nr_in_flight() @delayed is now always false for all callers, remove it. tj: Updated description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2012-09-18 10:40:00 -07:00
Lai Jiangshan	3aa6249759	workqueue: fix possible stall on try_to_grab_pending() of a delayed work item Currently, when try_to_grab_pending() grabs a delayed work item, it leaves its linked work items alone on the delayed_works. The linked work items are always NO_COLOR and will cause future cwq_activate_first_delayed() increase cwq->nr_active incorrectly, and may cause the whole cwq to stall. For example, state: cwq->max_active = 1, cwq->nr_active = 1 one work in cwq->pool, many in cwq->delayed_works. step1: try_to_grab_pending() removes a work item from delayed_works but leaves its NO_COLOR linked work items on it. step2: Later on, cwq_activate_first_delayed() activates the linked work item increasing ->nr_active. step3: cwq->nr_active = 1, but all activated work items of the cwq are NO_COLOR. When they finish, cwq->nr_active will not be decreased due to NO_COLOR, and no further work items will be activated from cwq->delayed_works. the cwq stalls. Fix it by ensuring the target work item is activated before stealing PENDING in try_to_grab_pending(). This ensures that all the linked work items are activated without incorrectly bumping cwq->nr_active. tj: Updated comment and description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@kernel.org	2012-09-18 10:40:00 -07:00
Lai Jiangshan	a5b4e57d7c	workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback() workqueue_cpu_down_callback() is used only if HOTPLUG_CPU=y, so hotcpu_notifier() fits better than cpu_notifier(). When HOTPLUG_CPU=y, hotcpu_notifier() and cpu_notifier() are the same. When HOTPLUG_CPU=n, if we use cpu_notifier(), workqueue_cpu_down_callback() will be called during boot to do nothing, and the memory of workqueue_cpu_down_callback() and gcwq_unbind_fn() will be discarded after boot. If we use hotcpu_notifier(), we can avoid the no-op call of workqueue_cpu_down_callback() and the memory of workqueue_cpu_down_callback() and gcwq_unbind_fn() will be discard at build time: $ ls -l kernel/workqueue.o.cpu_notifier kernel/workqueue.o.hotcpu_notifier -rw-rw-r-- 1 laijs laijs 484080 Sep 15 11:31 kernel/workqueue.o.cpu_notifier -rw-rw-r-- 1 laijs laijs 478240 Sep 15 11:31 kernel/workqueue.o.hotcpu_notifier $ size kernel/workqueue.o.cpu_notifier kernel/workqueue.o.hotcpu_notifier text data bss dec hex filename 18513 2387 1221 22121 5669 kernel/workqueue.o.cpu_notifier 18082 2355 1221 21658 549a kernel/workqueue.o.hotcpu_notifier tj: Updated description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2012-09-18 09:59:23 -07:00
Lai Jiangshan	9fdf9b73d6	workqueue: use __cpuinit instead of __devinit for cpu callbacks For workqueue hotplug callbacks, it makes less sense to use __devinit which discards the memory after boot if !HOTPLUG. __cpuinit, which discards the memory after boot if !HOTPLUG_CPU fits better. tj: Updated description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2012-09-18 09:59:23 -07:00
Lai Jiangshan	b2eb83d123	workqueue: rename manager_mutex to assoc_mutex Now that manager_mutex's role has changed from synchronizing manager role to excluding hotplug against manager, the name is misleading. As it is protecting the CPU-association of the gcwq now, rename it to assoc_mutex. This patch is pure rename and doesn't introduce any functional change. tj: Updated comments and description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2012-09-18 09:59:23 -07:00
Lai Jiangshan	5f7dabfd5c	workqueue: WORKER_REBIND is no longer necessary for idle rebinding Now both worker destruction and idle rebinding remove the worker from idle list while it's still idle, so list_empty(&worker->entry) can be used to test whether either is pending and WORKER_DIE to distinguish between the two instead making WORKER_REBIND unnecessary. Use list_empty(&worker->entry) to determine whether destruction or rebinding is pending. This simplifies worker state transitions. WORKER_REBIND is not needed anymore. Remove it. tj: Updated comments and description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2012-09-18 09:59:23 -07:00
Lai Jiangshan	eab6d82843	workqueue: WORKER_REBIND is no longer necessary for busy rebinding Because the old unbind/rebinding implementation wasn't atomic w.r.t. GCWQ_DISASSOCIATED manipulation which is protected by global_cwq->lock, we had to use two flags, WORKER_UNBOUND and WORKER_REBIND, to avoid incorrectly losing all NOT_RUNNING bits with back-to-back CPU hotplug operations; otherwise, completion of rebinding while another unbinding is in progress could clear UNBIND prematurely. Now that both unbind/rebinding are atomic w.r.t. GCWQ_DISASSOCIATED, there's no need to use two flags. Just one is enough. Don't use WORKER_REBIND for busy rebinding. tj: Updated description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2012-09-18 09:59:22 -07:00
Lai Jiangshan	ea1abd6197	workqueue: reimplement idle worker rebinding Currently rebind_workers() uses rebinds idle workers synchronously before proceeding to requesting busy workers to rebind. This is necessary because all workers on @worker_pool->idle_list must be bound before concurrency management local wake-ups from the busy workers take place. Unfortunately, the synchronous idle rebinding is quite complicated. This patch reimplements idle rebinding to simplify the code path. Rather than trying to make all idle workers bound before rebinding busy workers, we simply remove all to-be-bound idle workers from the idle list and let them add themselves back after completing rebinding (successful or not). As only workers which finished rebinding can on on the idle worker list, the idle worker list is guaranteed to have only bound workers unless CPU went down again and local wake-ups are safe. After the change, @worker_pool->nr_idle may deviate than the actual number of idle workers on @worker_pool->idle_list. More specifically, nr_idle may be non-zero while ->idle_list is empty. All users of ->nr_idle and ->idle_list are audited. The only affected one is too_many_workers() which is updated to check %false if ->idle_list is empty regardless of ->nr_idle. After this patch, rebind_workers() no longer performs the nasty idle-rebind retries which require temporary release of gcwq->lock, and both unbinding and rebinding are atomic w.r.t. global_cwq->lock. worker->idle_rebind and global_cwq->rebind_hold are now unnecessary and removed along with the definition of struct idle_rebind. Changed from V1: 1) remove unlikely from too_many_workers(), ->idle_list can be empty anytime, even before this patch, no reason to use unlikely. 2) fix a small rebasing mistake. (which is from rebasing the orignal fixing patch to for-next) 3) add a lot of comments. 4) clear WORKER_REBIND unconditionaly in idle_worker_rebind() tj: Updated comments and description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2012-09-18 09:59:22 -07:00
Tejun Heo	6c1423ba5d	Merge branch 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq into for-3.7 This merge is necessary as Lai's CPU hotplug restructuring series depends on the CPU hotplug bug fixes in for-3.6-fixes. The merge creates one trivial conflict between the following two commits. `96e65306b8` "workqueue: UNBOUND -> REBIND morphing in rebind_workers() should be atomic" `e2b6a6d570` "workqueue: use system_highpri_wq for highpri workers in rebind_workers()" Both add local variable definitions to the same block and can be merged in any order. Signed-off-by: Tejun Heo <tj@kernel.org>	2012-09-17 16:09:09 -07:00
Lai Jiangshan	960bd11bf2	workqueue: always clear WORKER_REBIND in busy_worker_rebind_fn() busy_worker_rebind_fn() didn't clear WORKER_REBIND if rebinding failed (CPU is down again). This used to be okay because the flag wasn't used for anything else. However, after `25511a477` "workqueue: reimplement CPU online rebinding to handle idle workers", WORKER_REBIND is also used to command idle workers to rebind. If not cleared, the worker may confuse the next CPU_UP cycle by having REBIND spuriously set or oops / get stuck by prematurely calling idle_worker_rebind(). WARNING: at /work/os/wq/kernel/workqueue.c:1323 worker_thread+0x4cd/0x5 00() Hardware name: Bochs Modules linked in: test_wq(O-) Pid: 33, comm: kworker/1:1 Tainted: G O 3.6.0-rc1-work+ #3 Call Trace: [<ffffffff8109039f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff810903fa>] warn_slowpath_null+0x1a/0x20 [<ffffffff810b3f1d>] worker_thread+0x4cd/0x500 [<ffffffff810bc16e>] kthread+0xbe/0xd0 [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10 ---[ end trace e977cf20f4661968 ]--- BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff810b3db0>] worker_thread+0x360/0x500 PGD 0 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: test_wq(O-) CPU 0 Pid: 33, comm: kworker/1:1 Tainted: G W O 3.6.0-rc1-work+ #3 Bochs Bochs RIP: 0010:[<ffffffff810b3db0>] [<ffffffff810b3db0>] worker_thread+0x360/0x500 RSP: 0018:ffff88001e1c9de0 EFLAGS: 00010086 RAX: 0000000000000000 RBX: ffff88001e633e00 RCX: 0000000000004140 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009 RBP: ffff88001e1c9ea0 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000002 R11: 0000000000000000 R12: ffff88001fc8d580 R13: ffff88001fc8d590 R14: ffff88001e633e20 R15: ffff88001e1c6900 FS: 0000000000000000(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 00000000130e8000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/1:1 (pid: 33, threadinfo ffff88001e1c8000, task ffff88001e1c6900) Stack: ffff880000000000 ffff88001e1c9e40 0000000000000001 ffff88001e1c8010 ffff88001e519c78 ffff88001e1c9e58 ffff88001e1c6900 ffff88001e1c6900 ffff88001e1c6900 ffff88001e1c6900 ffff88001fc8d340 ffff88001fc8d340 Call Trace: [<ffffffff810bc16e>] kthread+0xbe/0xd0 [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10 Code: b1 00 f6 43 48 02 0f 85 91 01 00 00 48 8b 43 38 48 89 df 48 8b 00 48 89 45 90 e8 ac f0 ff ff 3c 01 0f 85 60 01 00 00 48 8b 53 50 <8b> 02 83 e8 01 85 c0 89 02 0f 84 3b 01 00 00 48 8b 43 38 48 8b RIP [<ffffffff810b3db0>] worker_thread+0x360/0x500 RSP <ffff88001e1c9de0> CR2: 0000000000000000 There was no reason to keep WORKER_REBIND on failure in the first place - WORKER_UNBOUND is guaranteed to be set in such cases preventing incorrectly activating concurrency management. Always clear WORKER_REBIND. tj: Updated comment and description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2012-09-17 15:42:31 -07:00
Lai Jiangshan	ee378aa49b	workqueue: fix possible idle worker depletion across CPU hotplug To simplify both normal and CPU hotplug paths, worker management is prevented while CPU hoplug is in progress. This is achieved by CPU hotplug holding the same exclusion mechanism used by workers to ensure there's only one manager per pool. If someone else seems to be performing the manager role, workers proceed to execute work items. CPU hotplug using the same mechanism can lead to idle worker depletion because all workers could proceed to execute work items while CPU hotplug is in progress and CPU hotplug itself wouldn't actually perform the worker management duty - it doesn't guarantee that there's an idle worker left when it releases management. This idle worker depletion, under extreme circumstances, can break forward-progress guarantee and thus lead to deadlock. This patch fixes the bug by using separate mechanisms for manager exclusion among workers and hotplug exclusion. For manager exclusion, POOL_MANAGING_WORKERS which was restored by the previous patch is used. pool->manager_mutex is now only used for exclusion between the elected manager and CPU hotplug. The elected manager won't proceed without holding pool->manager_mutex. This ensures that the worker which won the manager position can't skip managing while CPU hotplug is in progress. It will block on manager_mutex and perform management after CPU hotplug is complete. Note that hotplug may happen while waiting for manager_mutex. A manager isn't either on idle or busy list and thus the hoplug code can't unbind/rebind it. Make the manager handle its own un/rebinding. tj: Updated comment and description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2012-09-10 10:05:54 -07:00
Lai Jiangshan	552a37e936	workqueue: restore POOL_MANAGING_WORKERS This patch restores POOL_MANAGING_WORKERS which was replaced by pool->manager_mutex by `6037315269` "workqueue: use mutex for global_cwq manager exclusion". There's a subtle idle worker depletion bug across CPU hotplug events and we need to distinguish an actual manager and CPU hotplug preventing management. POOL_MANAGING_WORKERS will be used for the former and manager_mutex the later. This patch just lays POOL_MANAGING_WORKERS on top of the existing manager_mutex and doesn't introduce any synchronization changes. The next patch will update it. Note that this patch fixes a non-critical anomaly where too_many_workers() may return %true spuriously while CPU hotplug is in progress. While the issue could schedule idle timer spuriously, it didn't trigger any actual misbehavior. tj: Rewrote patch description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2012-09-10 10:04:54 -07:00
Tejun Heo	ec58815ab0	workqueue: fix possible deadlock in idle worker rebinding Currently, rebind_workers() and idle_worker_rebind() are two-way interlocked. rebind_workers() waits for idle workers to finish rebinding and rebound idle workers wait for rebind_workers() to finish rebinding busy workers before proceeding. Unfortunately, this isn't enough. The second wait from idle workers is implemented as follows. wait_event(gcwq->rebind_hold, !(worker->flags & WORKER_REBIND)); rebind_workers() clears WORKER_REBIND, wakes up the idle workers and then returns. If CPU hotplug cycle happens again before one of the idle workers finishes the above wait_event(), rebind_workers() will repeat the first part of the handshake - set WORKER_REBIND again and wait for the idle worker to finish rebinding - and this leads to deadlock because the idle worker would be waiting for WORKER_REBIND to clear. This is fixed by adding another interlocking step at the end - rebind_workers() now waits for all the idle workers to finish the above WORKER_REBIND wait before returning. This ensures that all rebinding steps are complete on all idle workers before the next hotplug cycle can happen. This problem was diagnosed by Lai Jiangshan who also posted a patch to fix the issue, upon which this patch is based. This is the minimal fix and further patches are scheduled for the next merge window to simplify the CPU hotplug path. Signed-off-by: Tejun Heo <tj@kernel.org> Original-patch-by: Lai Jiangshan <laijs@cn.fujitsu.com> LKML-Reference: <1346516916-1991-3-git-send-email-laijs@cn.fujitsu.com>	2012-09-05 16:10:15 -07:00
Tejun Heo	90beca5de5	workqueue: move WORKER_REBIND clearing in rebind_workers() to the end of the function This doesn't make any functional difference and is purely to help the next patch to be simpler. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com>	2012-09-05 16:10:14 -07:00
Lai Jiangshan	96e65306b8	workqueue: UNBOUND -> REBIND morphing in rebind_workers() should be atomic The compiler may compile the following code into TWO write/modify instructions. worker->flags &= ~WORKER_UNBOUND; worker->flags \|= WORKER_REBIND; so the other CPU may temporarily see worker->flags which doesn't have either WORKER_UNBOUND or WORKER_REBIND set and perform local wakeup prematurely. Fix it by using single explicit assignment via ACCESS_ONCE(). Because idle workers have another WORKER_NOT_RUNNING flag, this bug doesn't exist for them; however, update it to use the same pattern for consistency. tj: Applied the change to idle workers too and updated comments and patch description a bit. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org	2012-09-04 17:04:45 -07:00
Tejun Heo	57b30ae77b	workqueue: reimplement cancel_delayed_work() using try_to_grab_pending() cancel_delayed_work() can't be called from IRQ handlers due to its use of del_timer_sync() and can't cancel work items which are already transferred from timer to worklist. Also, unlike other flush and cancel functions, a canceled delayed_work would still point to the last associated cpu_workqueue. If the workqueue is destroyed afterwards and the work item is re-used on a different workqueue, the queueing code can oops trying to dereference already freed cpu_workqueue. This patch reimplements cancel_delayed_work() using try_to_grab_pending() and set_work_cpu_and_clear_pending(). This allows the function to be called from IRQ handlers and makes its behavior consistent with other flush / cancel functions. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org>	2012-08-21 13:18:24 -07:00
Tejun Heo	e0aecdd874	workqueue: use irqsafe timer for delayed_work Up to now, for delayed_works, try_to_grab_pending() couldn't be used from IRQ handlers because IRQs may happen while delayed_work_timer_fn() is in progress leading to indefinite -EAGAIN. This patch makes delayed_work use the new TIMER_IRQSAFE flag for delayed_work->timer. This makes try_to_grab_pending() and thus mod_delayed_work_on() safe to call from IRQ handlers. Signed-off-by: Tejun Heo <tj@kernel.org>	2012-08-21 13:18:24 -07:00
Tejun Heo	ae930e0f4e	workqueue: gut system_nrt[_freezable]_wq() Now that all workqueues are non-reentrant, system[_freezable]_wq() are equivalent to system_nrt[_freezable]_wq(). Replace the latter with wrappers around system[_freezable]_wq(). The wrapping goes through inline functions so that __deprecated can be added easily. Signed-off-by: Tejun Heo <tj@kernel.org>	2012-08-20 14:51:23 -07:00

1 2 3 4 5 ...

315 Commits