2498 Commits

Author SHA1 Message Date
Masanari Iida
278cee0515 treewide: Fix typo in printk
Correct spelling typo in printk within various drivers.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-06-18 13:48:45 +02:00
Greg Kroah-Hartman
bb07b00be7 Merge 3.10-rc6 into driver-core-next
We want these fixes here too.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-17 16:57:20 -07:00
Greg Kroah-Hartman
0e496b8e84 Merge 3.10-rc6 into char-misc-next
We want the fixes in here.
2013-06-17 11:54:25 -07:00
Tejun Heo
a4244454df percpu-refcount: use RCU-sched insted of normal RCU
percpu-refcount was incorrectly using preempt_disable/enable() for RCU
critical sections against call_rcu().  6a24474da8 ("percpu-refcount:
consistently use plain (non-sched) RCU") fixed it by converting the
preepmtion operations with rcu_read_[un]lock() citing that there isn't
any advantage in using sched-RCU over using the usual one; however,
rcu_read_[un]lock() for the preemptible RCU implementation -
CONFIG_TREE_PREEMPT_RCU, chosen when CONFIG_PREEMPT - are slightly
more expensive than preempt_disable/enable().

In a contrived microbench which repeats the followings,

 - percpu_ref_get()
 - copy 32 bytes of data into percpu buffer
 - percpu_put_get()
 - copy 32 bytes of data into percpu buffer

rcu_read_[un]lock() used in percpu_ref_get/put() makes it go slower by
about 15% when compared to using sched-RCU.

As the RCU critical sections are extremely short, using sched-RCU
shouldn't have any latency implications.  Convert to RCU-sched.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Kent Overstreet <koverstreet@google.com>
Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
2013-06-16 16:12:26 -07:00
Tejun Heo
dbece3a0f1 percpu-refcount: implement percpu_tryget() along with percpu_ref_kill_and_confirm()
Implement percpu_tryget() which stops giving out references once the
percpu_ref is visible as killed.  Because the refcnt is per-cpu,
different CPUs will start to see a refcnt as killed at different
points in time and tryget() may continue to succeed on subset of cpus
for a while after percpu_ref_kill() returns.

For use cases where it's necessary to know when all CPUs start to see
the refcnt as dead, percpu_ref_kill_and_confirm() is added.  The new
function takes an extra argument @confirm_kill which is invoked when
the refcnt is guaranteed to be viewed as killed on all CPUs.

While this isn't the prettiest interface, it doesn't force synchronous
wait and is much safer than requiring the caller to do its own
call_rcu().

v2: Patch description rephrased to emphasize that tryget() may
    continue to succeed on some CPUs after kill() returns as suggested
    by Kent.

v3: Function comment in percpu_ref_kill_and_confirm() updated warning
    people to not depend on the implied RCU grace period from the
    confirm callback as it's an implementation detail.

Signed-off-by: Tejun Heo <tj@kernel.org>
Slightly-Grumpily-Acked-by: Kent Overstreet <koverstreet@google.com>
2013-06-13 19:23:53 -07:00
Tejun Heo
bc497bd33b percpu-refcount: implement percpu_ref_cancel_init()
Normally, percpu_ref_init() initializes and percpu_ref_kill()
initiates destruction which completes asynchronously.  The
asynchronous destruction can be problematic in init failure path where
the caller wants to destroy half-constructed object - distinguishing
half-constructed objects from the usual release method can be painful
for complex objects.

This patch implements percpu_ref_cancel_init() which synchronously
destroys the percpu_ref without invoking release.  To avoid
unintentional misuses, the function requires the ref to have finished
percpu_ref_init() but never used and triggers WARN otherwise.

v2: Explain the weird name and usage restriction in the function
    comment.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Kent Overstreet <koverstreet@google.com>
2013-06-13 11:08:27 -07:00
Tejun Heo
acac7883ee percpu-refcount: add __must_check to percpu_ref_init() and don't use ACCESS_ONCE() in percpu_ref_kill_rcu()
Two small changes.

* Unlike most init functions, percpu_ref_init() allocates memory and
  may fail.  Let's mark it with __must_check in case the caller
  forgets.

* percpu_ref_kill_rcu() is unnecessarily using ACCESS_ONCE() to
  dereference @ref->pcpu_count, which can be misleading.  The pointer
  is guaranteed to be valid and visible and can't change underneath
  the function.  Drop ACCESS_ONCE().

Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-13 11:08:26 -07:00
Tejun Heo
ac899061a9 percpu-refcount: cosmetic updates
* s/percpu_ref_release/percpu_ref_func_t/ as it's customary to have _t
  postfix for types and the type is gonna be used for a different type
  of callback too.

* Add @ARG to function comments.

* Drop unnecessary and unaligned indentation from percpu_ref_init()
  function comment.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Kent Overstreet <koverstreet@google.com>
2013-06-12 20:43:06 -07:00
Chen Gang
5402b8047b lib/mpi/mpicoder.c: looping issue, need stop when equal to zero, found by 'EXTRA_FLAGS=-W'.
For 'while' looping, need stop when 'nbytes == 0', or will cause issue.
('nbytes' is size_t which is always bigger or equal than zero).

The related warning: (with EXTRA_CFLAGS=-W)

  lib/mpi/mpicoder.c:40:2: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12 16:29:44 -07:00
Kees Cook
b7165ebbf0 kobject: sanitize argument for format string
Unlike kobject_set_name(), the kset_create_and_add() interface does not
provide a way to use format strings, so make sure that the interface
cannot be abused accidentally. It looks like all current callers use
static strings, so there's no existing flaw.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-07 16:05:50 -07:00
Andy Shevchenko
4cd5773a2a net: core: move mac_pton() to lib/net_utils.c
Since we have at least one user of this function outside of CONFIG_NET
scope, we have to provide this function independently. The proposed
solution is to move it under lib/net_utils.c with corresponding
configuration variable and select wherever it is needed.

Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-05 12:00:27 -07:00
Herbert Xu
67822649d7 crypto: crct10dif - Use PTR_RET
lib/crc-t10dif.c:42:1-3: WARNING: PTR_RET can be used

 Use PTR_RET rather than if(IS_ERR(...)) + PTR_ERR

Generated by: coccinelle/api/ptr_ret.cocci

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-06-05 16:27:51 +08:00
Kent Overstreet
c1ae6e9b4d percpu-refcount: Don't use silly cmpxchg()
The cmpxchg() was just to ensure the debug check didn't race, which was
a bit excessive. The caller is supposed to do the appropriate
synchronization, which means percpu_ref_kill() can just do a simple
store.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-03 16:04:04 -07:00
Kent Overstreet
215e262f2a percpu: implement generic percpu refcounting
This implements a refcount with similar semantics to
atomic_get()/atomic_dec_and_test() - but percpu.

It also implements two stage shutdown, as we need it to tear down the
percpu counts.  Before dropping the initial refcount, you must call
percpu_ref_kill(); this puts the refcount in "shutting down mode" and
switches back to a single atomic refcount with the appropriate
barriers (synchronize_rcu()).

It's also legal to call percpu_ref_kill() multiple times - it only
returns true once, so callers don't have to reimplement shutdown
synchronization.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: coding-style tweak]
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-03 15:36:41 -07:00
Seth Jennings
3a76e5e09f debugfs: add get/set for atomic types
debugfs currently lack the ability to create attributes
that set/get atomic_t values.

This patch adds support for this through a new
debugfs_create_atomic_t() function.

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-03 13:55:01 -07:00
Steven Rostedt
360603a1be sprintf: hex_string(): fix comment
hex_string() had a typo in a comment.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-05-29 01:14:46 +02:00
Helge Deller
70ef5578dd MPILIB: disable usage of floating point registers on parisc
The umul_ppmm() macro for parisc uses the xmpyu assembler statement
which does calculation via a floating point register.

But usage of floating point registers inside the Linux kernel are not
allowed and gcc will stop compilation due to the -mdisable-fpregs
compiler option.

Fix this by disabling the umul_ppmm() and udiv_qrnnd() macros. The
mpilib will then use the generic built-in implementations instead.

Signed-off-by: Helge Deller <deller@gmx.de>
2013-05-24 22:30:11 +02:00
Linus Torvalds
c7153d0643 Driver core fixes for 3.10-rc2
Here are 3 tiny driver core fixes for 3.10-rc2.
 
 A needed symbol export, a change to make it easier to track down
 offending sysfs files with incorrect attributes, and a klist bugfix.
 
 All have been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlGePdAACgkQMUfUDdst+ynX3wCfbodTGeimy2GTnc5psVgXV/x4
 bz8AnR6G/JNCw54meAJ5UlYJRj7Dwo09
 =MNP/
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg Kroah-Hartman:
 "Here are 3 tiny driver core fixes for 3.10-rc2.

  A needed symbol export, a change to make it easier to track down
  offending sysfs files with incorrect attributes, and a klist bugfix.

  All have been in linux-next for a while"

* tag 'driver-core-3.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  klist: del waiter from klist_remove_waiters before wakeup waitting process
  driver core: print sysfs attribute name when warning about bogus permissions
  driver core: export subsys_virtual_register
2013-05-23 09:27:08 -07:00
Randy Dunlap
b4d3ba3346 lib: make iovec obj instead of lib
Fix build error io vmw_vmci.ko when CONFIG_VMWARE_VMCI=m by chaning
iovec.o from lib-y to obj-y.

  ERROR: "memcpy_toiovec" [drivers/misc/vmw_vmci/vmw_vmci.ko] undefined!
  ERROR: "memcpy_fromiovec" [drivers/misc/vmw_vmci/vmw_vmci.ko] undefined!

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-23 09:17:11 -07:00
wang, biao
ac5a2962b0 klist: del waiter from klist_remove_waiters before wakeup waitting process
There is a race between klist_remove and klist_release. klist_remove
uses a local var waiter saved on stack. When klist_release calls
wake_up_process(waiter->process) to wake up the waiter, waiter might run
immediately and reuse the stack. Then, klist_release calls
list_del(&waiter->list) to change previous
wait data and cause prior waiter thread corrupt.

The patch fixes it against kernel 3.9.

Signed-off-by: wang, biao <biao.wang@intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-21 10:16:39 -07:00
Tim Chen
2d31e518a4 crypto: crct10dif - Wrap crc_t10dif function all to use crypto transform framework
When CRC T10 DIF is calculated using the crypto transform framework, we
wrap the crc_t10dif function call to utilize it.  This allows us to
take advantage of any accelerated CRC T10 DIF transform that is
plugged into the crypto framework.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-05-20 20:11:01 +08:00
Rusty Russell
d2f83e9078 Hoist memcpy_fromiovec/memcpy_toiovec into lib/
ERROR: "memcpy_fromiovec" [drivers/vhost/vhost_scsi.ko] undefined!

That function is only present with CONFIG_NET.  Turns out that
crypto/algif_skcipher.c also uses that outside net, but it actually
needs sockets anyway.

In addition, commit 6d4f0139d642c45411a47879325891ce2a7c164a added
CONFIG_NET dependency to CONFIG_VMCI for memcpy_toiovec, so hoist
that function and revert that commit too.

socket.h already includes uio.h, so no callers need updating; trying
only broke things fo x86_64 randconfig (thanks Fengguang!).

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-05-20 10:24:22 +09:30
Linus Torvalds
ebb3727779 Merge branch 'for-3.10/drivers' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
 "It might look big in volume, but when categorized, not a lot of
  drivers are touched.  The pull request contains:

   - mtip32xx fixes from Micron.

   - A slew of drbd updates, this time in a nicer series.

   - bcache, a flash/ssd caching framework from Kent.

   - Fixes for cciss"

* 'for-3.10/drivers' of git://git.kernel.dk/linux-block: (66 commits)
  bcache: Use bd_link_disk_holder()
  bcache: Allocator cleanup/fixes
  cciss: bug fix to prevent cciss from loading in kdump crash kernel
  cciss: add cciss_allow_hpsa module parameter
  drivers/block/mg_disk.c: add CONFIG_PM_SLEEP to suspend/resume functions
  mtip32xx: Workaround for unaligned writes
  bcache: Make sure blocksize isn't smaller than device blocksize
  bcache: Fix merge_bvec_fn usage for when it modifies the bvm
  bcache: Correctly check against BIO_MAX_PAGES
  bcache: Hack around stuff that clones up to bi_max_vecs
  bcache: Set ra_pages based on backing device's ra_pages
  bcache: Take data offset from the bdev superblock.
  mtip32xx: mtip32xx: Disable TRIM support
  mtip32xx: fix a smatch warning
  bcache: Disable broken btree fuzz tester
  bcache: Fix a format string overflow
  bcache: Fix a minor memory leak on device teardown
  bcache: Documentation updates
  bcache: Use WARN_ONCE() instead of __WARN()
  bcache: Add missing #include <linux/prefetch.h>
  ...
2013-05-08 11:51:05 -07:00
Davidlohr Bueso
9607a85b67 rwsem: check counter to avoid cmpxchg calls
This patch tries to reduce the amount of cmpxchg calls in the writer
failed path by checking the counter value first before issuing the
instruction.  If ->count is not set to RWSEM_WAITING_BIAS then there is
no point wasting a cmpxchg call.

Furthermore, Michel states "I suppose it helps due to the case where
someone else steals the lock while we're trying to acquire
sem->wait_lock."

Two very different workloads and machines were used to see how this
patch improves throughput: pgbench on a quad-core laptop and aim7 on a
large 8 socket box with 80 cores.

Some results comparing Michel's fast-path write lock stealing
(tps-rwsem) on a quad-core laptop running pgbench:

  | db_size | clients  |  tps-rwsem     |   tps-patch  |
  +---------+----------+----------------+--------------+
  | 160 MB   |       1 |           6906 |         9153 | + 32.5
  | 160 MB   |       2 |          15931 |        22487 | + 41.1%
  | 160 MB   |       4 |          33021 |        32503 |
  | 160 MB   |       8 |          34626 |        34695 |
  | 160 MB   |      16 |          33098 |        34003 |
  | 160 MB   |      20 |          31343 |        31440 |
  | 160 MB   |      30 |          28961 |        28987 |
  | 160 MB   |      40 |          26902 |        26970 |
  | 160 MB   |      50 |          25760 |        25810 |
  ------------------------------------------------------
  | 1.6 GB   |       1 |           7729 |         7537 |
  | 1.6 GB   |       2 |          19009 |        23508 | + 23.7%
  | 1.6 GB   |       4 |          33185 |        32666 |
  | 1.6 GB   |       8 |          34550 |        34318 |
  | 1.6 GB   |      16 |          33079 |        32689 |
  | 1.6 GB   |      20 |          31494 |        31702 |
  | 1.6 GB   |      30 |          28535 |        28755 |
  | 1.6 GB   |      40 |          27054 |        27017 |
  | 1.6 GB   |      50 |          25591 |        25560 |
  ------------------------------------------------------
  | 7.6 GB   |       1 |           6224 |         7469 | + 20.0%
  | 7.6 GB   |       2 |          13611 |        12778 |
  | 7.6 GB   |       4 |          33108 |        32927 |
  | 7.6 GB   |       8 |          34712 |        34878 |
  | 7.6 GB   |      16 |          32895 |        33003 |
  | 7.6 GB   |      20 |          31689 |        31974 |
  | 7.6 GB   |      30 |          29003 |        28806 |
  | 7.6 GB   |      40 |          26683 |        26976 |
  | 7.6 GB   |      50 |          25925 |        25652 |
  ------------------------------------------------------

For the aim7 worloads, they overall improved on top of Michel's
patchset.  For full graphs on how the rwsem series plus this patch
behaves on a large 8 socket machine against a vanilla kernel:

  http://stgolabs.net/rwsem-aim7-results.tar.gz

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 16:11:51 -07:00
Anatol Pomozov
2d864e4171 kref: minor cleanup
- make warning smp-safe
 - result of atomic _unless_zero functions should be checked by caller
   to avoid use-after-free error
 - trivial whitespace fix.

Link: https://lkml.org/lkml/2013/4/12/391

Tested: compile x86, boot machine and run xfstests
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
[ Removed line-break, changed to use WARN_ON_ONCE()  - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 16:09:00 -07:00
Linus Torvalds
c8de2fa4dc Merge branch 'rwsem-optimizations'
Merge rwsem optimizations from Michel Lespinasse:
 "These patches extend Alex Shi's work (which added write lock stealing
  on the rwsem slow path) in order to provide rwsem write lock stealing
  on the fast path (that is, without taking the rwsem's wait_lock).

  I have unfortunately been unable to push this through -next before due
  to Ingo Molnar / David Howells / Peter Zijlstra being busy with other
  things.  However, this has gotten some attention from Rik van Riel and
  Davidlohr Bueso who both commented that they felt this was ready for
  v3.10, and Ingo Molnar has said that he was OK with me pushing
  directly to you.  So, here goes :)

  Davidlohr got the following test results from pgbench running on a
  quad-core laptop:

    | db_size | clients  |  tps-vanilla   |   tps-rwsem  |
    +---------+----------+----------------+--------------+
    | 160 MB   |       1 |           5803 |         6906 | + 19.0%
    | 160 MB   |       2 |          13092 |        15931 |
    | 160 MB   |       4 |          29412 |        33021 |
    | 160 MB   |       8 |          32448 |        34626 |
    | 160 MB   |      16 |          32758 |        33098 |
    | 160 MB   |      20 |          26940 |        31343 | + 16.3%
    | 160 MB   |      30 |          25147 |        28961 |
    | 160 MB   |      40 |          25484 |        26902 |
    | 160 MB   |      50 |          24528 |        25760 |
    ------------------------------------------------------
    | 1.6 GB   |       1 |           5733 |         7729 | + 34.8%
    | 1.6 GB   |       2 |           9411 |        19009 | + 101.9%
    | 1.6 GB   |       4 |          31818 |        33185 |
    | 1.6 GB   |       8 |          33700 |        34550 |
    | 1.6 GB   |      16 |          32751 |        33079 |
    | 1.6 GB   |      20 |          30919 |        31494 |
    | 1.6 GB   |      30 |          28540 |        28535 |
    | 1.6 GB   |      40 |          26380 |        27054 |
    | 1.6 GB   |      50 |          25241 |        25591 |
    ------------------------------------------------------
    | 7.6 GB   |       1 |           5779 |         6224 |
    | 7.6 GB   |       2 |          10897 |        13611 | + 24.9%
    | 7.6 GB   |       4 |          32683 |        33108 |
    | 7.6 GB   |       8 |          33968 |        34712 |
    | 7.6 GB   |      16 |          32287 |        32895 |
    | 7.6 GB   |      20 |          27770 |        31689 | + 14.1%
    | 7.6 GB   |      30 |          26739 |        29003 |
    | 7.6 GB   |      40 |          24901 |        26683 |
    | 7.6 GB   |      50 |          17115 |        25925 | + 51.5%
    ------------------------------------------------------

  (Davidlohr also has one additional patch which further improves
  throughput, though I will ask him to send it directly to you as I have
  suggested some minor changes)."

* emailed patches from Michel Lespinasse <walken@google.com>:
  rwsem: no need for explicit signed longs
  x86 rwsem: avoid taking slow path when stealing write lock
  rwsem: do not block readers at head of queue if other readers are active
  rwsem: implement support for write lock stealing on the fastpath
  rwsem: simplify __rwsem_do_wake
  rwsem: skip initial trylock in rwsem_down_write_failed
  rwsem: avoid taking wait_lock in rwsem_down_write_failed
  rwsem: use cmpxchg for trying to steal write lock
  rwsem: more agressive lock stealing in rwsem_down_write_failed
  rwsem: simplify rwsem_down_write_failed
  rwsem: simplify rwsem_down_read_failed
  rwsem: move rwsem_down_failed_common code into rwsem_down_{read,write}_failed
  rwsem: shorter spinlocked section in rwsem_down_failed_common()
  rwsem: make the waiter type an enumeration rather than a bitmask
2013-05-07 09:22:03 -07:00
Davidlohr Bueso
b5f541810e rwsem: no need for explicit signed longs
Change explicit "signed long" declarations into plain "long" as suggested
by Peter Hurley.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 07:20:17 -07:00
Michel Lespinasse
25c3932596 rwsem: do not block readers at head of queue if other readers are active
This change fixes a race condition where a reader might determine it
needs to block, but by the time it acquires the wait_lock the rwsem has
active readers and no queued waiters.

In this situation the reader can run in parallel with the existing
active readers; it does not need to block until the active readers
complete.

Thanks to Peter Hurley for noticing this possible race.

Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 07:20:17 -07:00
Michel Lespinasse
fe6e674c61 rwsem: implement support for write lock stealing on the fastpath
When we decide to wake up readers, we must first grant them as many read
locks as necessary, and then actually wake up all these readers.  But in
order to know how many read shares to grant, we must first count the
readers at the head of the queue.  This might take a while if there are
many readers, and we want to be protected against a writer stealing the
lock while we're counting.  To that end, we grant the first reader lock
before counting how many more readers are queued.

We also require some adjustments to the wake_type semantics.

RWSEM_WAKE_NO_ACTIVE used to mean that we had found the count to be
RWSEM_WAITING_BIAS, in which case the rwsem was known to be free as
nobody could steal it while we hold the wait_lock.  This doesn't make
sense once we implement fastpath write lock stealing, so we now use
RWSEM_WAKE_ANY in that case.

Similarly, when rwsem_down_write_failed found that a read lock was
active, it would use RWSEM_WAKE_READ_OWNED which signalled that new
readers could be woken without checking first that the rwsem was
available.  We can't do that anymore since the existing readers might
release their read locks, and a writer could steal the lock before we
wake up additional readers.  So, we have to use a new RWSEM_WAKE_READERS
value to indicate we only want to wake readers, but we don't currently
hold any read lock.

Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 07:20:16 -07:00
Michel Lespinasse
8cf5322ce6 rwsem: simplify __rwsem_do_wake
This is mostly for cleanup value:

- We don't need several gotos to handle the case where the first
  waiter is a writer. Two simple tests will do (and generate very
  similar code).

- In the remainder of the function, we know the first waiter is a reader,
  so we don't have to double check that. We can use do..while loops
  to iterate over the readers to wake (generates slightly better code).

Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 07:20:16 -07:00
Michel Lespinasse
9b0fc9c09f rwsem: skip initial trylock in rwsem_down_write_failed
We can skip the initial trylock in rwsem_down_write_failed() if there
are known active lockers already, thus saving one likely-to-fail
cmpxchg.

Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 07:20:16 -07:00
Michel Lespinasse
a7d2c573ae rwsem: avoid taking wait_lock in rwsem_down_write_failed
In rwsem_down_write_failed(), if there are active locks after we wake up
(i.e.  the lock got stolen from us), skip taking the wait_lock and go
back to sleep immediately.

Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 07:20:16 -07:00
Michel Lespinasse
5ede972df1 rwsem: use cmpxchg for trying to steal write lock
Using rwsem_atomic_update to try stealing the write lock forced us to
undo the adjustment in the failure path.  We can have simpler and faster
code by using cmpxchg instead.

Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 07:20:16 -07:00
Michel Lespinasse
ed00f64346 rwsem: more agressive lock stealing in rwsem_down_write_failed
Some small code simplifications can be achieved by doing more agressive
lock stealing:

- When rwsem_down_write_failed() notices that there are no active locks
  (and thus no thread to wake us if we decided to sleep), it used to wake
  the first queued process. However, stealing the lock is also sufficient
  to deal with this case, so we don't need this check anymore.

- In try_get_writer_sem(), we can steal the lock even when the first waiter
  is a reader. This is correct because the code path that wakes readers is
  protected by the wait_lock. As to the performance effects of this change,
  they are expected to be minimal: readers are still granted the lock
  (rather than having to acquire it themselves) when they reach the front
  of the wait queue, so we have essentially the same behavior as in
  rwsem-spinlock.

Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 07:20:16 -07:00
Michel Lespinasse
023fe4f712 rwsem: simplify rwsem_down_write_failed
When waking writers, we never grant them the lock - instead, they have
to acquire it themselves when they run, and remove themselves from the
wait_list when they succeed.

As a result, we can do a few simplifications in rwsem_down_write_failed():

- We don't need to check for !waiter.task since __rwsem_do_wake() doesn't
  remove writers from the wait_list

- There is no point releaseing the wait_lock before entering the wait loop,
  as we will need to reacquire it immediately. We can change the loop so
  that the lock is always held at the start of each loop iteration.

- We don't need to get a reference on the task structure, since the task
  is responsible for removing itself from the wait_list. There is no risk,
  like in the rwsem_down_read_failed() case, that a task would wake up and
  exit (thus destroying its task structure) while __rwsem_do_wake() is
  still running - wait_lock protects against that.

Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 07:20:16 -07:00
Michel Lespinasse
da16922cc0 rwsem: simplify rwsem_down_read_failed
When trying to acquire a read lock, the RWSEM_ACTIVE_READ_BIAS
adjustment doesn't cause other readers to block, so we never have to
worry about waking them back after canceling this adjustment in
rwsem_down_read_failed().

We also never want to steal the lock in rwsem_down_read_failed(), so we
don't have to grab the wait_lock either.

Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 07:20:16 -07:00
Michel Lespinasse
1e78277ccb rwsem: move rwsem_down_failed_common code into rwsem_down_{read,write}_failed
Remove the rwsem_down_failed_common function and replace it with two
identical copies of its code in rwsem_down_{read,write}_failed.

This is because we want to make different optimizations in
rwsem_down_{read,write}_failed; we are adding this pure-duplication
step as a separate commit in order to make it easier to check the
following steps.

Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 07:20:16 -07:00
Michel Lespinasse
f7dd1cee9a rwsem: shorter spinlocked section in rwsem_down_failed_common()
This change reduces the size of the spinlocked and TASK_UNINTERRUPTIBLE
sections in rwsem_down_failed_common():

- We only need the sem->wait_lock to insert ourselves on the wait_list;
  the waiter node can be prepared outside of the wait_lock.

- The task state only needs to be set to TASK_UNINTERRUPTIBLE immediately
  before checking if we actually need to sleep; it doesn't need to protect
  the entire function.

Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 07:20:16 -07:00
Michel Lespinasse
e2d57f782c rwsem: make the waiter type an enumeration rather than a bitmask
We are not planning to add some new waiter flags, so we can convert the
waiter type into an enumeration.

Background: David Howells suggested I do this back when I tried adding
a new waiter type for unfair readers. However, I believe the cleanup
applies regardless of that use case.

Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 07:20:15 -07:00
David Howells
9e6879460c Give the OID registry file module info to avoid kernel tainting
Give the OID registry file module information so that it doesn't taint the
kernel when compiled as a module and loaded.

Reported-by: Dros Adamson <Weston.Adamson@netapp.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Trond Myklebust <Trond.Myklebust@netapp.com>
cc: stable@vger.kernel.org
cc: linux-nfs@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-05 14:38:00 -07:00
Linus Torvalds
20a2078ce7 Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
 "This is the main drm pull request for 3.10.

  Wierd bits:
   - OMAP drm changes required OMAP dss changes, in drivers/video, so I
     took them in here.
   - one more fbcon fix for font handover
   - VT switch avoidance in pm code
   - scatterlist helpers for gpu drivers - have acks from akpm

  Highlights:
   - qxl kms driver - driver for the spice qxl virtual GPU

  Nouveau:
   - fermi/kepler VRAM compression
   - GK110/nvf0 modesetting support.

  Tegra:
   - host1x core merged with 2D engine support

  i915:
   - vt switchless resume
   - more valleyview support
   - vblank fixes
   - modesetting pipe config rework

  radeon:
   - UVD engine support
   - SI chip tiling support
   - GPU registers initialisation from golden values.

  exynos:
   - device tree changes
   - fimc block support

  Otherwise:
   - bunches of fixes all over the place."

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (513 commits)
  qxl: update to new idr interfaces.
  drm/nouveau: fix build with nv50->nvc0
  drm/radeon: fix handling of v6 power tables
  drm/radeon: clarify family checks in pm table parsing
  drm/radeon: consolidate UVD clock programming
  drm/radeon: fix UPLL_REF_DIV_MASK definition
  radeon: add bo tracking debugfs
  drm/radeon: add new richland pci ids
  drm/radeon: add some new SI PCI ids
  drm/radeon: fix scratch reg handling for UVD fence
  drm/radeon: allocate SA bo in the requested domain
  drm/radeon: fix possible segfault when parsing pm tables
  drm/radeon: fix endian bugs in atom_allocate_fb_scratch()
  OMAPDSS: TFP410: return EPROBE_DEFER if the i2c adapter not found
  OMAPDSS: VENC: Add error handling for venc_probe_pdata
  OMAPDSS: HDMI: Add error handling for hdmi_probe_pdata
  OMAPDSS: RFBI: Add error handling for rfbi_probe_pdata
  OMAPDSS: DSI: Add error handling for dsi_probe_pdata
  OMAPDSS: SDI: Add error handling for sdi_probe_pdata
  OMAPDSS: DPI: Add error handling for dpi_probe_pdata
  ...
2013-05-02 19:40:34 -07:00
Linus Torvalds
0279b3c0ad Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "This fixes the cputime scaling overflow problems for good without
  having bad 32-bit overhead, and gets rid of the div64_u64_rem() helper
  as well."

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "math64: New div64_u64_rem helper"
  sched: Avoid prev->stime underflow
  sched: Do not account bogus utime
  sched: Avoid cputime scaling overflow
2013-05-02 14:56:31 -07:00
Linus Torvalds
20b4fb4852 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS updates from Al Viro,

Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).

7kloc removed.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
  don't bother with deferred freeing of fdtables
  proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
  proc: Make the PROC_I() and PDE() macros internal to procfs
  proc: Supply a function to remove a proc entry by PDE
  take cgroup_open() and cpuset_open() to fs/proc/base.c
  ppc: Clean up scanlog
  ppc: Clean up rtas_flash driver somewhat
  hostap: proc: Use remove_proc_subtree()
  drm: proc: Use remove_proc_subtree()
  drm: proc: Use minor->index to label things, not PDE->name
  drm: Constify drm_proc_list[]
  zoran: Don't print proc_dir_entry data in debug
  reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
  proc: Supply an accessor for getting the data from a PDE's parent
  airo: Use remove_proc_subtree()
  rtl8192u: Don't need to save device proc dir PDE
  rtl8187se: Use a dir under /proc/net/r8180/
  proc: Add proc_mkdir_data()
  proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
  proc: Move PDE_NET() to fs/proc/proc_net.c
  ...
2013-05-01 17:51:54 -07:00
Linus Torvalds
5f56886521 Merge branch 'akpm' (incoming from Andrew)
Merge third batch of fixes from Andrew Morton:
 "Most of the rest.  I still have two large patchsets against AIO and
  IPC, but they're a bit stuck behind other trees and I'm about to
  vanish for six days.

   - random fixlets
   - inotify
   - more of the MM queue
   - show_stack() cleanups
   - DMI update
   - kthread/workqueue things
   - compat cleanups
   - epoll udpates
   - binfmt updates
   - nilfs2
   - hfs
   - hfsplus
   - ptrace
   - kmod
   - coredump
   - kexec
   - rbtree
   - pids
   - pidns
   - pps
   - semaphore tweaks
   - some w1 patches
   - relay updates
   - core Kconfig changes
   - sysrq tweaks"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (109 commits)
  Documentation/sysrq: fix inconstistent help message of sysrq key
  ethernet/emac/sysrq: fix inconstistent help message of sysrq key
  sparc/sysrq: fix inconstistent help message of sysrq key
  powerpc/xmon/sysrq: fix inconstistent help message of sysrq key
  ARM/etm/sysrq: fix inconstistent help message of sysrq key
  power/sysrq: fix inconstistent help message of sysrq key
  kgdb/sysrq: fix inconstistent help message of sysrq key
  lib/decompress.c: fix initconst
  notifier-error-inject: fix module names in Kconfig
  kernel/sys.c: make prctl(PR_SET_MM) generally available
  UAPI: remove empty Kbuild files
  menuconfig: print more info for symbol without prompts
  init/Kconfig: re-order CONFIG_EXPERT options to fix menuconfig display
  kconfig menu: move Virtualization drivers near other virtualization options
  Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
  relay: use macro PAGE_ALIGN instead of FIX_SIZE
  kernel/relay.c: move FIX_SIZE macro into relay.c
  kernel/relay.c: remove unused function argument actor
  drivers/w1/slaves/w1_ds2760.c: fix the error handling in w1_ds2760_add_slave()
  drivers/w1/slaves/w1_ds2781.c: fix the error handling in w1_ds2781_add_slave()
  ...
2013-04-30 17:37:43 -07:00
Andi Kleen
6f9982bdde lib/decompress.c: fix initconst
Signed-off-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:09 -07:00
Akinobu Mita
e12a95f40a notifier-error-inject: fix module names in Kconfig
The Kconfig help text for MEMORY_NOTIFIER_ERROR_INJECT and
OF_RECONFIG_NOTIFIER_ERROR_INJECT has mismatched module names.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:09 -07:00
Stephen Boyd
446f24d119 Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
The help text for this config is duplicated across the x86, parisc, and
s390 Kconfig.debug files.  Arnd Bergman noted that the help text was
slightly misleading and should be fixed to state that enabling this
option isn't a problem when using pre 4.4 gcc.

To simplify the rewording, consolidate the text into lib/Kconfig.debug
and modify it there to be more explicit about when you should say N to
this config.

Also, make the text a bit more generic by stating that this option
enables compile time checks so we can cover architectures which emit
warnings vs.  ones which emit errors.  The details of how an
architecture decided to implement the checks isn't as important as the
concept of compile time checking of copy_from_user() calls.

While we're doing this, remove all the copy_from_user_overflow() code
that's duplicated many times and place it into lib/ so that any
architecture supporting this option can get the function for free.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Helge Deller <deller@gmx.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:09 -07:00
Davidlohr Bueso
c75aaa8ed0 rbtree_test: add __init/__exit annotations
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:07 -07:00
Davidlohr Bueso
4130f0efbf rbtree_test: add extra rbtree integrity check
Account for the rbtree having  2**bh(v)-1 internal nodes.

While this can be seen as a consequence of other checks, Michel states
that it nicely sums up what the other properties are for.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:07 -07:00
Andy Shevchenko
d338b1379f dynamic_debug: reuse generic string_unescape function
There is kernel function to do the job in generic way. Let's use it.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Jason Baron <jbaron@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:03 -07:00