3819 Commits

Author SHA1 Message Date
Alex Elder
e0c594878e libceph: record byte count not page count
Record the byte count for an osd request rather than the page count.
The number of pages can always be derived from the byte count (and
alignment/offset) but the reverse is not true.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:36 -07:00
Alex Elder
0fff87ec79 libceph: separate read and write data
An osd request defines information about where data to be read
should be placed as well as where data to write comes from.
Currently these are represented by common fields.

Keep information about data for writing separate from data to be
read by splitting these into data_in and data_out fields.

This is the key patch in this whole series, in that it actually
identifies which osd requests generate outgoing data and which
generate incoming data.  It's less obvious (currently) that an osd
CALL op generates both outgoing and incoming data; that's the focus
of some upcoming work.

This resolves:
    http://tracker.ceph.com/issues/4127

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:27 -07:00
Alex Elder
2ac2b7a6d4 libceph: distinguish page and bio requests
An osd request uses either pages or a bio list for its data.  Use a
union to record information about the two, and add a data type
tag to select between them.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:25 -07:00
Alex Elder
2794a82a11 libceph: separate osd request data info
Pull the fields in an osd request structure that define the data for
the request out into a separate structure.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:24 -07:00
Linus Torvalds
20b4fb4852 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS updates from Al Viro,

Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).

7kloc removed.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
  don't bother with deferred freeing of fdtables
  proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
  proc: Make the PROC_I() and PDE() macros internal to procfs
  proc: Supply a function to remove a proc entry by PDE
  take cgroup_open() and cpuset_open() to fs/proc/base.c
  ppc: Clean up scanlog
  ppc: Clean up rtas_flash driver somewhat
  hostap: proc: Use remove_proc_subtree()
  drm: proc: Use remove_proc_subtree()
  drm: proc: Use minor->index to label things, not PDE->name
  drm: Constify drm_proc_list[]
  zoran: Don't print proc_dir_entry data in debug
  reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
  proc: Supply an accessor for getting the data from a PDE's parent
  airo: Use remove_proc_subtree()
  rtl8192u: Don't need to save device proc dir PDE
  rtl8187se: Use a dir under /proc/net/r8180/
  proc: Add proc_mkdir_data()
  proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
  proc: Move PDE_NET() to fs/proc/proc_net.c
  ...
2013-05-01 17:51:54 -07:00
Arjan van de Ven
564a232c05 NVMe: Set TASK_INTERRUPTIBLE before processing queues
The kthread has two tasks; handling timeouts (for which it runs once per
second), and submitting queued BIOs.  If a BIO happens to be queued after
the thread has processed the queue but before it calls schedule_timeout(),
the thread will sleep for a second before submitting it, which can cause
performance problems in some rare cases (that will become more common in
a subsequent patch).

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Tested-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-05-01 16:40:27 -04:00
Mihnea Dobrescu-Balaur
60abc786dd aoe: replace kmalloc and then memcpy with kmemdup
Signed-off-by: Mihnea Dobrescu-Balaur <mihneadb@gmail.com>
Cc: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:08 -07:00
Michal Belczyk
078be02b80 nbd: increase default and max request sizes
Raise the default max request size for nbd to 128KB (from 127KB) to get it
4KB aligned.  This patch also allows the max request size to be increased
(via /sys/block/nbd<x>/queue/max_sectors_kb) to 32MB.

The patch makes nbd network traffic more efficient by:
- reducing request fragmentation (4KB alignment)
- reducing the number of requests (fewer round trips, less network overhead)

Especially in high latency networks, larger request size can make a dramatic

Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Michal Belczyk <belczyk@bsd.krakow.pl>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:07 -07:00
Akinobu Mita
38b682b261 drbd: rename random32() to prandom_u32()
Use preferable function name which implies using a pseudo-random
number generator.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:42 -07:00
Mike Miller
0821e90405 cciss: bug fix to prevent cciss from loading in kdump crash kernel
By default the cciss driver supports all "older" HP Smart Array
controllers and hpsa supports all controllers starting with the G6 family.
 There are module parameters that allow a user to override those defaults
and use hpsa for any HP Smart Array controller.

If the user does override the default behavior and uses hpsa for older
controllers it is possible that cciss may try to load in a kdump crash
kernel.  This may happen if cciss is loaded first from the kdump initrd
image.  If cciss does load rather than hpsa and reset_devices is true we
immediately call cciss_hard_reset_controller.  This will result in a
kernel panic and the core file cannot be created.  This patch prevents
cciss from trying to load in this scenario.

Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-04-29 21:24:02 +02:00
Mike Miller
e4292e05d4 cciss: add cciss_allow_hpsa module parameter
Add the cciss_allow_hpsa modules parameter.  This allows users to use the
hpsa driver instead of cciss for older controllers.

Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-04-29 21:24:02 +02:00
Jingoo Han
aed3d67e57 drivers/block/mg_disk.c: add CONFIG_PM_SLEEP to suspend/resume functions
Add CONFIG_PM_SLEEP to suspend/resume functions to fix the following build
warning when CONFIG_PM_SLEEP is not selected.  This is because sleep PM
callbacks defined by SIMPLE_DEV_PM_OPS are only used when the
CONFIG_PM_SLEEP is enabled.

drivers/block/mg_disk.c:783:12: warning: 'mg_suspend' defined but not used [-Wunused-function]
drivers/block/mg_disk.c:807:12: warning: 'mg_resume' defined but not used [-Wunused-function]

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-04-29 21:24:02 +02:00
Asai Thambi S P
2077d94726 mtip32xx: Workaround for unaligned writes
Workaround for handling unaligned writes: limit number of outstanding
unaligned writes

Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-04-29 21:19:49 +02:00
Linus Torvalds
96d8683483 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fix from Sage Weil:
 "It's a simple fix for a hard to hit race, but low-risk and clearly
  correct"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: do a safe list traversal in rbd_img_request_submit()
2013-04-17 12:52:02 -07:00
Alex Elder
46faeed4a6 rbd: do a safe list traversal in rbd_img_request_submit()
It's possible that the reference to the object request dropped
inside the loop in rbd_img_request_submit() will be the last
one, in which case the content of the object pointer can't be
trusted.

Use a safe form of the object request list traversal to avoid
problems.

This resolves:
    http://tracker.ceph.com/issues/4705

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-17 11:39:09 -07:00
Keith Busch
5e82e952f0 NVMe: Add a character device for each nvme device
Registers a miscellaneous device for each nvme controller probed. This
creates character device files as /dev/nvmeN, where N is the device
instance, and supports nvme admin ioctl commands so devices without
namespaces can be managed.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-04-16 15:43:55 -04:00
Matthew Wilcox
1c9b52651d NVMe: Fix endian-related problems in user I/O submission path
When constructing the command, dsmgmt needs to be treated as a 32-bit
value, not a 16-bit value.  reftag, apptag and appmask all need to be
converted from native-endian to little-endian.  Again, sparse's bitwise
warnings caught this problem.  Thanks to Keith for pointing out the
correct way to fix the reftag.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Acked-by: Keith Busch <keith.busch@intel.com>
2013-04-16 15:21:06 -04:00
Matthew Wilcox
af2d9ca744 NVMe: Fix I/O cancellation status on big-endian machines
The sparse bitwise checks pointed out that I needed to shift the status
before changing its endianness, not after.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-04-16 15:18:30 -04:00
Vishal Verma
8741ee4cb6 NVMe: Fix sparse warnings in scsi emulation
Sparse produced warnings for some instances of
mismatched types and direct userspace dereferences.
This patch fixes those for the scsi emulation layer.

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-04-16 15:14:09 -04:00
Matthew Wilcox
422ef0c7c8 NVMe: Don't fail initialisation unnecessarily
The nvme_dev_add() function currently returns the last error code that it
saw, which (if everything else succeeds) happens to be the result of an
optional command, so it can legitimately fail.  Looking at the error path
more closely reveals that we should return success from this function,
even if no device namespaces are added.  So once the queues are created
and the device has responded to Identify, make sure that this function
succeeds.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Acked-by: Keith Busch <keith.busch@intel.com>
2013-04-16 15:13:41 -04:00
Matthew Wilcox
063cc6d559 NVMe: Abstract out sector to block number conversion
Introduce nvme_block_nr() to help convert sectors to block numbers.
This fixes an integer overflow in the SCSI conversion layer, and it's
slightly less typing than opencoding it.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Acked-by: Keith Busch <keith.busch@intel.com>
2013-04-16 15:05:22 -04:00
Arjan van de Ven
acb7aa0db0 NVMe: Use round_jiffies_relative() for the periodic, once-per-second timer
The nvme driver has a "once per second" event where the management kthread
wakes up the system and then reschedules itself for 1 second later.
For power efficiency reasons, I'd like this timer to happen together
with other wakeups in the system.

This patch makes the schedule_timeout() call in the kthread use
round_jiffies_relative(), causing the wakeup to at least align with other
"once per X seconds" events in the kernel.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Tested-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-04-16 15:05:16 -04:00
Asai Thambi S P
68466cbf92 mtip32xx: mtip32xx: Disable TRIM support
Temporarily disabling TRIM support until TRIM related issues
are addressed in the firmware.

Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-04-14 10:24:24 +02:00
Asai Thambi S P
5a79e1ac21 mtip32xx: fix a smatch warning
Reported smatch warning:
drivers/block/mtip32xx/mtip32xx.c:4163 mtip_block_shutdown() warn: variable dereferenced before check 'dd->disk' (see line 4159)

dd->disk->disk_name accessed before the check if dd->disk is NULL. Fixed this
and access of dd->queue/dd->disk->queue.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-04-14 10:24:24 +02:00
Linus Torvalds
27387dd8c6 for-linus-20130409
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJRZCnRAAoJEPfTWPspceCmQD0QAMS6a7kYln9n7ByzMa1rfm7x
 Mhu3NkT3I34a5+hgdiIwWTjqJ2Uu78S5pD522eAP4m9yGcG7m1c3aqdTEewDck7K
 xcIjgzS1j1J/XRlrtgPxJhsiA23Kzhliwztrr1YhczWLMijvIrNu2KYZ4EK4/YPB
 KkdGJJitunrOw4nhmnB1AYCRJoZJ5KXOpigqVr7vxTh0Ye7ue2k9mDD3x6tOp/Q0
 TxKxXyEZF9yHgUkpARMcy4OEcr63APfAeiAnZOPpdK5rbCc8pVXpoadLE8UipnlI
 0FcGBzuEezQE3FZrT9PL58FFNEPUmx3NWz2SwYV7Te5Gw5Zw/ffMxk8geE12ALmb
 YuULHGOxz97rxjFImYYGqIOQl7F6V59bYgbXegmKnWjorKEn6LxHyNI1Q76oH5V3
 UgvkmTK91yNa6DvB24Vrt1yKb3GvKQApHAwEn82y+LqcZFkWq1iKqlPVNNGGjCbR
 j47Qu4G/6Bif0vTFC34ciI79ga/di4S4FUOCthsjskm7waDUNvrlbP4Mvwq1VPAc
 VZaYGrQcWCDxuAY/S3XTR+2ci/B0n4Fhyn/iFg/6S+nQawJDeiM+C3yVDbH8caq4
 wV2PXEQ/r+SII4PVGFAmATewUKF3dkUpT2TKhc9KRHZh9+mFBXFj13MZ3+WpBPHl
 NTvEyi5nUL+MdyZGuJUi
 =CB+W
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20130409' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "I've got a few smaller fixes queued up for 3.9 that should go in.  The
  major one is the loop regression, the others are nice fixes on their
  own though.  It contains:

   - Fix for unitialized var in the block sysfs code, courtesy of Arnd
     and gcc-4.8.

   - Two fixes for mtip32xx, fixing probe and command timeout.  Also a
     debug measure that could have waited for 3.10, but it's driver
     only, so I let it slip in.

   - Revert the loop partition cleanup fix, it could cause a deadlock on
     auto-teardown as part of umount.  The fix is clear, but at this
     point we just want to revert it and get a real fix in for 3.10."

* tag 'for-linus-20130409' of git://git.kernel.dk/linux-block:
  Revert "loop: cleanup partitions when detaching loop device"
  mtip32xx: fix two smatch warnings
  mtip32xx: Add debugfs entry device_status
  mtip32xx: return 0 from pci probe in case of rebuild
  mtip32xx: recovery from command timeout
  block: avoid using uninitialized value in from queue_var_store
2013-04-09 12:05:41 -07:00
Al Viro
d9dda78bad procfs: new helper - PDE_DATA(inode)
The only part of proc_dir_entry the code outside of fs/proc
really cares about is PDE(inode)->data.  Provide a helper
for that; static inline for now, eventually will be moved
to fs/proc, along with the knowledge of struct proc_dir_entry
layout.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 14:13:32 -04:00
Al Viro
e88b7bb002 cciss: switch to ->show_info()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 14:13:19 -04:00
Al Viro
03d95eb2f2 lift sb_start_write() out of ->write()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 14:12:56 -04:00
Jens Axboe
c2fccc1c9f Revert "loop: cleanup partitions when detaching loop device"
This reverts commit 8761a3dc1f07b163414e2215a2cadbb4cfe2a107.

There are situations where the destruction path is called
with the bdev->bd_mutex already held, which then deadlocks in
loop_clr_fd(). The normal partition cleanup does a trylock()
on the mutex, but it'd be nice to have a more bullet proof
method in loop. So punt this more involved fix to the next
merge window, and just back out this buggy fix for now.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-04-08 10:12:11 +02:00
Jens Axboe
c66bb3f075 mtip32xx: fix two smatch warnings
Dan reports:

New smatch warnings:
drivers/block/mtip32xx/mtip32xx.c:2728 show_device_status() warn: variable dereferenced before check 'dd' (see line 2727)
drivers/block/mtip32xx/mtip32xx.c:2758 show_device_status() warn: variable dereferenced before check 'dd' (see line 2757)

which are checking if dd == NULL, in a list_for_each_entry() type loop.
Get rid of the check, dd can never be NULL here.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-04-04 09:03:41 +02:00
Asai Thambi S P
0caff00390 mtip32xx: Add debugfs entry device_status
This patch adds a new debugfs entry 'device_status' in
/sys/kernel/debug/rssd. The value of this entry shows
devices online and those in the process of removing.

Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-04-03 21:54:02 +02:00
Asai Thambi S P
6b06d35f3f mtip32xx: return 0 from pci probe in case of rebuild
Fix to return 0 from pci probe in case of rebuild. If not, pci consider
probe has failed, and crash during rmmod.

Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-04-03 21:54:02 +02:00
Asai Thambi S P
d0d096b1d8 mtip32xx: recovery from command timeout
To recover from command timeouts, reset the device. In addition
to that improved timeout handling of PIO commands.

Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-04-03 21:54:01 +02:00
Jens Axboe
64f8de4da7 Merge branch 'writeback-workqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq into for-3.10/core
Tejun writes:

-----

This is the pull request for the earlier patchset[1] with the same
name.  It's only three patches (the first one was committed to
workqueue tree) but the merge strategy is a bit involved due to the
dependencies.

* Because the conversion needs features from wq/for-3.10,
  block/for-3.10/core is based on rc3, and wq/for-3.10 has conflicts
  with rc3, I pulled mainline (rc5) into wq/for-3.10 to prevent those
  workqueue conflicts from flaring up in block tree.

* Resolving the issue that Jan and Dave raised about debugging
  requires arch-wide changes.  The patchset is being worked on[2] but
  it'll have to go through -mm after these changes show up in -next,
  and not included in this pull request.

The three commits are located in the following git branch.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git writeback-workqueue

Pulling it into block/for-3.10/core produces a conflict in
drivers/md/raid5.c between the following two commits.

  e3620a3ad5 ("MD RAID5: Avoid accessing gendisk or queue structs when not available")
  2f6db2a707 ("raid5: use bio_reset()")

The conflict is trivial - one removes an "if ()" conditional while the
other removes "rbi->bi_next = NULL" right above it.  We just need to
remove both.  The merged branch is available at

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git block-test-merge

so that you can use it for verification.  The test merge commit has
proper merge description.

While these changes are a bit of pain to route, they make code simpler
and even have, while minute, measureable performance gain[3] even on a
workload which isn't particularly favorable to showing the benefits of
this conversion.

----

Fixed up the conflict.

Conflicts:
	drivers/md/raid5.c

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-04-02 10:04:39 +02:00
Anatol Pomozov
c1681bf8a7 loop: prevent bdev freeing while device in use
struct block_device lifecycle is defined by its inode (see fs/block_dev.c) -
block_device allocated first time we access /dev/loopXX and deallocated on
bdev_destroy_inode. When we create the device "losetup /dev/loopXX afile"
we want that block_device stay alive until we destroy the loop device
with "losetup -d".

But because we do not hold /dev/loopXX inode its counter goes 0, and
inode/bdev can be destroyed at any moment. Usually it happens at memory
pressure or when user drops inode cache (like in the test below). When later in
loop_clr_fd() we want to use bdev we have use-after-free error with following
stack:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
  bd_set_size+0x10/0xa0
  loop_clr_fd+0x1f8/0x420 [loop]
  lo_ioctl+0x200/0x7e0 [loop]
  lo_compat_ioctl+0x47/0xe0 [loop]
  compat_blkdev_ioctl+0x341/0x1290
  do_filp_open+0x42/0xa0
  compat_sys_ioctl+0xc1/0xf20
  do_sys_open+0x16e/0x1d0
  sysenter_dispatch+0x7/0x1a

To prevent use-after-free we need to grab the device in loop_set_fd()
and put it later in loop_clr_fd().

The issue is reprodusible on current Linus head and v3.3. Here is the test:

  dd if=/dev/zero of=loop.file bs=1M count=1
  while [ true ]; do
    losetup /dev/loop0 loop.file
    echo 2 > /proc/sys/vm/drop_caches
    losetup -d /dev/loop0
  done

[ Doing bdgrab/bput in loop_set_fd/loop_clr_fd is safe, because every
  time we call loop_set_fd() we check that loop_device->lo_state is
  Lo_unbound and set it to Lo_bound If somebody will try to set_fd again
  it will get EBUSY.  And if we try to loop_clr_fd() on unbound loop
  device we'll get ENXIO.

  loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
  loop_device->lo_ctl_mutex. ]

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-01 15:48:47 -07:00
Linus Torvalds
ff3421dee6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) sadb_msg prepared for IPSEC userspace forgets to initialize the
    satype field, fix from Nicolas Dichtel.

 2) Fix mac80211 synchronization during station removal, from Johannes
    Berg.

 3) Fix IPSEC sequence number notifications when they wrap, from Steffen
    Klassert.

 4) Fix cfg80211 wdev tracing crashes when add_virtual_intf() returns an
    error pointer, from Johannes Berg.

 5) In mac80211, don't call into the channel context code with the
    interface list mutex held.  From Johannes Berg.

 6) In mac80211, if we don't actually associate, do not restart the STA
    timer, otherwise we can crash.  From Ben Greear.

 7) Missing dma_mapping_error() check in e1000, ixgb, and e1000e.  From
    Christoph Paasch.

 8) Fix sja1000 driver defines to not conflict with SH port, from Marc
    Kleine-Budde.

 9) Don't call il4965_rs_use_green with a NULL station, from Colin Ian
    King.

10) Suspend/Resume in the FEC driver fail because the buffer descriptors
    are not initialized at all the moments in which they should.  Fix
    from Frank Li.

11) cpsw and davinci_emac drivers both use the wrong interface to
    restart a stopped TX queue.  Use netif_wake_queue not
    netif_start_queue, the latter is for initialization/bringup not
    active management of the queue.  From Mugunthan V N.

12) Fix regression in rate calculations done by
    psched_ratecfg_precompute(), missing u64 type promotion.  From
    Sergey Popovich.

13) Fix length overflow in tg3 VPD parsing, from Kees Cook.

14) AOE driver fails to allocate enough headroom, resulting in crashes.
    Fix from Eric Dumazet.

15) RX overflow happens too quickly in sky2 driver because pause packet
    thresholds are not programmed correctly.  From Mirko Lindner.

16) Bonding driver manages arp_interval and miimon settings incorrectly,
    disabling one unintentionally disables both.  Fix from Nikolay
    Aleksandrov.

17) smsc75xx drivers don't program the RX mac properly for jumbo frames.
    Fix from Steve Glendinning.

18) Fix off-by-one in Codel packet scheduler.  From Vijay Subramanian.

19) Fix packet corruption in atl1c by disabling MSI support, from Hannes
    Frederic Sowa.

20) netdev_rx_handler_unregister() needs a synchronize_net() to fix
    crashes in bonding driver unload stress tests.  From Eric Dumazet.

21) rxlen field of ks8851 RX packet descriptors not interpreted
    correctly (it is 12 bits not 16 bits, so needs to be masked after
    shifting the 32-bit value down 16 bits).  Fix from Max Nekludov.

22) Fix missed RX/TX enable in sh_eth driver due to mishandling of link
    change indications.  From Sergei Shtylyov.

23) Fix crashes during spurious ECI interrupts in sh_eth driver, also
    from Sergei Shtylyov.

24) dm9000 driver initialization is done wrong for revision B devices
    with DSP PHY, from Joseph CHANG.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits)
  DM9000B: driver initialization upgrade
  sh_eth: make 'link' field of 'struct sh_eth_private' *int*
  sh_eth: workaround for spurious ECI interrupt
  sh_eth: fix handling of no LINK signal
  ks8851: Fix interpretation of rxlen field.
  net: add a synchronize_net() in netdev_rx_handler_unregister()
  MAINTAINERS: Update netxen_nic maintainers list
  atl1e: drop pci-msi support because of packet corruption
  net: fq_codel: Fix off-by-one error
  net: calxedaxgmac: Wake-on-LAN fixes
  net: calxedaxgmac: fix rx ring handling when OOM
  net: core: Remove redundant call to 'nf_reset' in 'dev_forward_skb'
  smsc75xx: fix jumbo frame support
  net: fix the use of this_cpu_ptr
  bonding: fix disabling of arp_interval and miimon
  ipv6: don't accept node local multicast traffic from the wire
  sky2: Threshold for Pause Packet is set wrong
  sky2: Receive Overflows not counted
  aoe: reserve enough headroom on skbs
  line up comment for ndo_bridge_getlink
  ...
2013-04-01 08:06:30 -07:00
Linus Torvalds
d299c29039 for-linus-20130331
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJRWHWXAAoJEPfTWPspceCmyGIQANHJlvexkzqkPsxzfA+hKi36
 90ramlHmOIGLqxKk8pJLEhAJAEAEmR1sN5FfPBeiI3I7E8RT+vuPHCOCqXhAXgku
 5saB294H0OGeaGsw4cxIl4KQFxBwa2PDskFq5irV4AYJd1IMolwUdyELr2wv37g1
 d4vJJUeJIUBON47pZjVfV96nQ4utISMjtHLeBmvpeREcmfqn2I1qKyYcEXxDkNeX
 DWRIyeJ/UApCxEWbZcxFgaVNVWE/9nGg861HgnuazCu+OiwUVhfMpS+azj/dtl8G
 wdZLhokjXZBi9yd70h8mZ9XReIqMbTUP6k4texNrUQXgHaN87OVUiCgbzL5JBfUB
 Iq2bmlCkSIUOwxV9qOsv1MfNo9TJTB2ZcOZJH381BAqf/ua1ouGzZu9KLTxmalZi
 yIO3oTpifELxgfCV7O/HGEP1jkRTROwpRFjErqPOFx+Jr9vhT+xj/LGZYgAzaVhX
 1HCXMtp8xjRBZa7TrHq/FZY2iO4fS3JZNGg0XaIVim8yHiFWfMnGxOg4TSs5rqEy
 AyPg3rFVufb7n9zSdRpYfgAg6gYK/pgHZ7OcyFTt44wRrGSWpMlR8TMxJREytbJx
 JjKlO2qRuIbBJXnoBS1J3W22Yt8NN/TaaMIoVL4GHD3fUYMbL88NugsjIZ5VKe/N
 /sw12PuUld2rTR+FghHV
 =u2RH
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20130331' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Alright, this time from 10K up in the air.

  Collection of fixes that have been queued up since the merge window
  opened, hence postponed until later in the cycle.  The pull request
  contains:

   - A bunch of fixes for the xen blk front/back driver.

   - A round of fixes for the new IBM RamSan driver, fixing various
     nasty issues.

   - Fixes for multiple drives from Wei Yongjun, bad handling of return
     values and wrong pointer math.

   - A fix for loop properly killing partitions when being detached."

* tag 'for-linus-20130331' of git://git.kernel.dk/linux-block: (25 commits)
  mg_disk: fix error return code in mg_probe()
  rsxx: remove unused variable
  rsxx: enable error return of rsxx_eeh_save_issued_dmas()
  block: removes dynamic allocation on stack
  Block: blk-flush: Fixed indent code style
  cciss: fix invalid use of sizeof in cciss_find_cfgtables()
  loop: cleanup partitions when detaching loop device
  loop: fix error return code in loop_add()
  mtip32xx: fix error return code in mtip_pci_probe()
  xen-blkfront: remove frame list from blk_shadow
  xen-blkfront: pre-allocate pages for requests
  xen-blkback: don't store dev_bus_addr
  xen-blkfront: switch from llist to list
  xen-blkback: fix foreach_grant_safe to handle empty lists
  xen-blkfront: replace kmalloc and then memcpy with kmemdup
  xen-blkback: fix dispatch_rw_block_io() error path
  rsxx: fix missing unlock on error return in rsxx_eeh_remap_dmas()
  Adding in EEH support to the IBM FlashSystem 70/80 device driver
  block: IBM RamSan 70/80 error message bug fix.
  block: IBM RamSan 70/80 branding changes.
  ...
2013-03-31 11:38:59 -07:00
Linus Torvalds
b92eded4b7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph fix from Sage Weil:
 "This fixes a regression introduced during the last merge window when
  mapping non-existent images."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: don't zero-fill non-image object requests
2013-03-29 11:47:43 -07:00
Alex Elder
6e2a4505db rbd: don't zero-fill non-image object requests
A result of ENOENT from a read request for an object that's part of
an rbd image indicates that there is a hole in that portion of the
image.  Similarly, a short read for such an object indicates that
the remainder of the read should be interpreted a full read with
zeros filling out the end of the request.

This behavior is not correct for objects that are not backing rbd
image data.  Currently rbd_img_obj_request_callback() assumes it
should be done for all objects.

Change rbd_img_obj_request_callback() so it only does this zeroing
for image objects.  Encapsulate that special handling in its own
function.  Add an assertion that the image object request is a bio
request, since we assume that (and we currently don't support any
other types).

This resolves a problem identified here:
    http://tracker.ceph.com/issues/4559

The regression was introduced by bf0d5f503dc11d6314c0503591d258d60ee9c944.

Reported-by: Dan van der Ster <dan@vanderster.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-off-by: Sage Weil <sage@inktank.com>
2013-03-29 11:32:07 -07:00
Vishal Verma
5d0f6131a7 NVMe: Add nvme-scsi.c
Translates SCSI commands in SG_IO ioctl to NVMe commands.
Uses the scsi-nvme translation spec from nvmexpress.org as reference.

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-03-28 14:50:49 -04:00
Eric Dumazet
91c5746425 aoe: reserve enough headroom on skbs
Some network drivers use a non default hard_header_len

Transmitted skb should take into account dev->hard_header_len, or risk
crashes or expensive reallocations.

In the case of aoe, lets reserve MAX_HEADER bytes.

David reported a crash in defxx driver, solved by this patch.

Reported-by: David Oostdyk <daveo@ll.mit.edu>
Tested-by: David Oostdyk <daveo@ll.mit.edu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ed Cashin <ecashin@coraid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-28 14:29:47 -04:00
Lars Ellenberg
0b6ef4164f drbd: fix if(); found by kbuild test robot
Recently introduced al_begin_io_nonblock() was returning -EBUSY,
even when it should return -EWOULDBLOCK.

Impact:
A few spurious wake_up() calls in prepare_al_transaction_nonblock().

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-28 10:10:26 -06:00
Philipp Reisner
3990e04df0 drbd: use sched_setscheduler()
It was unnoticed for some time that assigning to current->policy is
no longer sufficient to set a real time priority for a kernel thread.

Reported-by: Charlie Suffin <Charlie.Suffin@stratus.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-28 10:10:25 -06:00
Philipp Reisner
7c689e63a8 drbd: fix for deadlock when using automatic split-brain-recovery
With an automatic after split-brain recovery policy of
"after-sb-1pri call-pri-lost-after-sb",
when trying to drbd_set_role() to R_SECONDARY,
we run into a deadlock.

This was first recognized and supposedly fixed by
2009-06-10 "Fixed a deadlock when using automatic split brain recovery when both nodes are"
replacing drbd_set_role() with drbd_change_state() in that code-path,
but the first hunk of that patch forgets to remove the drbd_set_role().

We apparently only ever tested the "two primaries" case.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-28 10:10:25 -06:00
Alexey Khoroshilov
193d01532a drbd: add module_put() on error path in drbd_proc_open()
If single_open() fails in drbd_proc_open(), module refcount is left incremented.
The patch adds module_put() on the error path.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-28 10:10:25 -06:00
Lars Ellenberg
607f25e56e drbd: fix drbd epoch write count for ahead/behind mode
The sanity check when receiving P_BARRIER_ACK does expect all write
requests with a given req->epoch to have been either all replicated,
or all not replicated.

Because req->epoch was assigned before calling maybe_pull_ahead(),
this expectation was not met, leading to an off-by-one in the sanity
check, and further to a "Protocol Error".

Fix: move the call to maybe_pull_ahead() a few lines up,
and assign req->epoch only after that.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-28 10:10:25 -06:00
Philipp Reisner
ef57f9e6bb drbd: Fix build error when CONFIG_CRYPTO_HMAC is not set
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-28 10:10:25 -06:00
Lars Ellenberg
a3f8f7dc7a drbd: validate resync_after dependency on attach already
We validated resync_after dependencies, if changed via disk-options.
But we did not validate them when first created via attach.
We also did not check or cleanup dependencies that used to be correct,
but now point to meanwhile removed minor devices.

If the drbd_resync_after_valid() validation in disk-options tried to
follow a dependency chain in this way, this could lead to NULL pointer
dereference.

Validate resync_after settings in drbd_adm_attach() already, as well as
in drbd_adm_disk_opts(), and and only reject dependency loops.
Depending on non-existing disks is allowed and equivalent to no dependency.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-28 10:10:25 -06:00
Lars Ellenberg
94ad0a1014 drbd: fix memory leak
We forgot to free the disk_conf,
so for each attach/detach cycle we leaked 336 bytes.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-28 10:10:25 -06:00
Lars Ellenberg
7074e4a745 drbd: only fail empty flushes if no good data is reachable
We completed empty flushes (blkdev_issue_flush()) with IO error
if we lost the local disk, even if we still have an established
replication link to a healthy remote disk.

Fix this to only report errors to upper layers,
if neither local nor remote data is reachable.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-28 10:10:25 -06:00