IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This ends up being a rather large patch but what it's doing is
somewhat straightforward.
Basically, this is replacing two calls with one. The first of the
two calls is initializing a struct ceph_osd_data with data (either a
page array, a page list, or a bio list); the second is setting an
osd request op so it associates that data with one of the op's
parameters. In place of those two will be a single function that
initializes the op directly.
That means we sort of fan out a set of the needed functions:
- extent ops with pages data
- extent ops with pagelist data
- extent ops with bio list data
and
- class ops with page data for receiving a response
We also have define another one, but it's only used internally:
- class ops with pagelist data for request parameters
Note that we *still* haven't gotten rid of the osd request's
r_data_in and r_data_out fields. All the osd ops refer to them for
their data. For now, these data fields are pointers assigned to the
appropriate r_data_* field when these new functions are called.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
This patch just trivially moves around some code for consistency.
In preparation for initializing osd request data fields in
ceph_osdc_build_request(), I wanted to verify that rbd did in fact
call that immediately before it called ceph_osdc_start_request().
It was true (although image requests are built in a group and then
started as a group). But I made the changes here just to make
it more obvious, by making all of the calls follow a common
sequence:
osd_req_op_<optype>_init();
ceph_osd_data_<type>_init()
osd_req_op_<optype>_<datafield>()
rbd_osd_req_format()
...
ret = rbd_obj_request_submit()
I moved the initialization of the callback for image object requests
into rbd_img_request_fill_bio(), again, for consistency. To avoid
a forward reference, I moved the definition of rbd_img_obj_callback()
up in the file.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The osd data for a request is currently initialized inside
rbd_osd_req_create(), but that assumes an object request's data
belongs in the osd request's data in or data out field.
There are only three places where requests with data are set up, and
it turns out it's easier to call just the osd data init routines
directly there rather than handling it in rbd_osd_req_create().
(The real motivation here is moving toward getting rid of the
osd request in and out data fields.)
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Currently an object request has its osd request's data field set in
rbd_osd_req_format_op(). That assumes a single osd op per object
request, and that won't be the case for long.
Move the code that sets this out and into the caller.
Rename rbd_osd_req_format_op() to be just rbd_osd_req_format(),
removing the notion that it's doing anything op-specific.
This and the next patch resolve:
http://tracker.ceph.com/issues/4658
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
An osd request now holds all of its source op structures, and every
place that initializes one of these is in fact initializing one
of the entries in the the osd request's array.
So rather than supplying the address of the op to initialize, have
caller specify the osd request and an indication of which op it
would like to initialize. This better hides the details the
op structure (and faciltates moving the data pointers they use).
Since osd_req_op_init() is a common routine, and it's not used
outside the osd client code, give it static scope. Also make
it return the address of the specified op (so all the other
init routines don't have to repeat that code).
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
An extent type osd operation currently implies that there will
be corresponding data supplied in the data portion of the request
(for write) or response (for read) message. Similarly, an osd class
method operation implies a data item will be supplied to receive
the response data from the operation.
Add a ceph_osd_data pointer to each of those structures, and assign
it to point to eithre the incoming or the outgoing data structure in
the osd message. The data is not always available when an op is
initially set up, so add two new functions to allow setting them
after the op has been initialized.
Begin to make use of the data item pointer available in the osd
operation rather than the request data in or out structure in
places where it's convenient. Add some assertions to verify
pointers are always set the way they're expected to be.
This is a sort of stepping stone toward really moving the data
into the osd request ops, to allow for some validation before
making that jump.
This is the first in a series of patches that resolve:
http://tracker.ceph.com/issues/4657
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
An osd request keeps a pointer to the osd operations (ops) array
that it builds in its request message.
In order to allow each op in the array to have its own distinct
data, we will need to keep track of each op's data, and that
information does not go over the wire.
As long as we're tracking the data we might as well just track the
entire (source) op definition for each of the ops. And if we're
doing that, we'll have no more need to keep a pointer to the
wire-encoded version.
This patch makes the array of source ops be kept with the osd
request structure, and uses that instead of the version encoded in
the message in places where that was previously used. The array
will be embedded in the request structure, and the maximum number of
ops we ever actually use is currently 2. So reduce CEPH_OSD_MAX_OP
to 2 to reduce the size of the structure.
The result of doing this sort of ripples back up, and as a result
various function parameters and local variables become unnecessary.
Make r_num_ops be unsigned, and move the definition of struct
ceph_osd_req_op earlier to ensure it's defined where needed.
It does not yet add per-op data, that's coming soon.
This resolves:
http://tracker.ceph.com/issues/4656
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Define rbd_osd_req_format_op(), which encapsulates formatting
an osd op into an object request's osd request message. Only
one op is supported right now.
Stop calling ceph_osdc_build_request() in rbd_osd_req_create().
Instead, call rbd_osd_req_format_op() in each of the callers of
rbd_osd_req_create().
This is to prepare for the next patch, in which the source ops for
an osd request will be held in the osd request itself. Because of
that, we won't have the source op to work with until after the
request is created, so we can't format the op until then.
This an the next patch resolve:
http://tracker.ceph.com/issues/4656
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Define and use functions that encapsulate the initializion of a
ceph_osd_data structure.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
When rbd creates an object request containing an object method call
operation it is passing 0 for the size. I originally thought this
was because the length was not needed for method calls, but I think
it really should be supplied, to describe how much space is
available to receive response data. So provide the supplied length.
This resolves:
http://tracker.ceph.com/issues/4659
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
When assigning a bio pointer to an osd request, we don't have an
efficient way of knowing the total length bytes in the bio list.
That information is available at the point it's set up by the rbd
code, so record it with the osd data when it's set.
This and the next patch are related to maintaining the length of a
message's data independent of the message header, as described here:
http://tracker.ceph.com/issues/4589
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The rbd code has a function that allocates and populates a
ceph_osd_req_op structure (the in-core version of an osd request
operation). When reviewed, Josh suggested two things: that the
big varargs function might be better split into type-specific
functions; and that this functionality really belongs in the osd
client rather than rbd.
This patch implements both of Josh's suggestions. It breaks
up the rbd function into separate functions and defines them
in the osd client module as exported interfaces. Unlike the
rbd version, however, the functions don't allocate an osd_req_op
structure; they are provided the address of one and that is
initialized instead.
The rbd function has been eliminated and calls to it have been
replaced by calls to the new routines. The rbd code now now use a
stack (struct) variable to hold the op rather than allocating and
freeing it each time.
For now only the capabilities used by rbd are implemented.
Implementing all the other osd op types, and making the rest of the
code use it will be done separately, in the next few patches.
Note that only the extent, cls, and watch portions of the
ceph_osd_req_op structure are currently used. Delete the others
(xattr, pgls, and snap) from its definition so nobody thinks it's
actually implemented or needed. We can add it back again later
if needed, when we know it's been tested.
This (and a few follow-on patches) resolves:
http://tracker.ceph.com/issues/3861
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Move some definitions for max integer values out of the rbd code and
into the more central "decode.h" header file. These really belong
in a Linux (or libc) header somewhere, but I haven't gotten around
to proposing that yet.
This is in preparation for moving some code out of rbd.c and into
the osd client.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The length of outgoing data in an osd request is dependent on the
osd ops that are embedded in that request. Each op is encoded into
a request message using osd_req_encode_op(), so that should be used
to determine the amount of outgoing data implied by the op as it
is encoded.
Have osd_req_encode_op() return the number of bytes of outgoing data
implied by the op being encoded, and accumulate and use that in
ceph_osdc_build_request().
As a result, ceph_osdc_build_request() no longer requires its "len"
parameter, so get rid of it.
Using the sum of the op lengths rather than the length provided is
a valid change because:
- The only callers of osd ceph_osdc_build_request() are
rbd and the osd client (in ceph_osdc_new_request() on
behalf of the file system).
- When rbd calls it, the length provided is only non-zero for
write requests, and in that case the single op has the
same length value as what was passed here.
- When called from ceph_osdc_new_request(), (it's not all that
easy to see, but) the length passed is also always the same
as the extent length encoded in its (single) write op if
present.
This resolves:
http://tracker.ceph.com/issues/4406
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Record the byte count for an osd request rather than the page count.
The number of pages can always be derived from the byte count (and
alignment/offset) but the reverse is not true.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
An osd request defines information about where data to be read
should be placed as well as where data to write comes from.
Currently these are represented by common fields.
Keep information about data for writing separate from data to be
read by splitting these into data_in and data_out fields.
This is the key patch in this whole series, in that it actually
identifies which osd requests generate outgoing data and which
generate incoming data. It's less obvious (currently) that an osd
CALL op generates both outgoing and incoming data; that's the focus
of some upcoming work.
This resolves:
http://tracker.ceph.com/issues/4127
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
An osd request uses either pages or a bio list for its data. Use a
union to record information about the two, and add a data type
tag to select between them.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Pull the fields in an osd request structure that define the data for
the request out into a separate structure.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Pull VFS updates from Al Viro,
Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).
7kloc removed.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
don't bother with deferred freeing of fdtables
proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
proc: Make the PROC_I() and PDE() macros internal to procfs
proc: Supply a function to remove a proc entry by PDE
take cgroup_open() and cpuset_open() to fs/proc/base.c
ppc: Clean up scanlog
ppc: Clean up rtas_flash driver somewhat
hostap: proc: Use remove_proc_subtree()
drm: proc: Use remove_proc_subtree()
drm: proc: Use minor->index to label things, not PDE->name
drm: Constify drm_proc_list[]
zoran: Don't print proc_dir_entry data in debug
reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
proc: Supply an accessor for getting the data from a PDE's parent
airo: Use remove_proc_subtree()
rtl8192u: Don't need to save device proc dir PDE
rtl8187se: Use a dir under /proc/net/r8180/
proc: Add proc_mkdir_data()
proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
proc: Move PDE_NET() to fs/proc/proc_net.c
...
The kthread has two tasks; handling timeouts (for which it runs once per
second), and submitting queued BIOs. If a BIO happens to be queued after
the thread has processed the queue but before it calls schedule_timeout(),
the thread will sleep for a second before submitting it, which can cause
performance problems in some rare cases (that will become more common in
a subsequent patch).
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Tested-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Raise the default max request size for nbd to 128KB (from 127KB) to get it
4KB aligned. This patch also allows the max request size to be increased
(via /sys/block/nbd<x>/queue/max_sectors_kb) to 32MB.
The patch makes nbd network traffic more efficient by:
- reducing request fragmentation (4KB alignment)
- reducing the number of requests (fewer round trips, less network overhead)
Especially in high latency networks, larger request size can make a dramatic
Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Michal Belczyk <belczyk@bsd.krakow.pl>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use preferable function name which implies using a pseudo-random
number generator.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
By default the cciss driver supports all "older" HP Smart Array
controllers and hpsa supports all controllers starting with the G6 family.
There are module parameters that allow a user to override those defaults
and use hpsa for any HP Smart Array controller.
If the user does override the default behavior and uses hpsa for older
controllers it is possible that cciss may try to load in a kdump crash
kernel. This may happen if cciss is loaded first from the kdump initrd
image. If cciss does load rather than hpsa and reset_devices is true we
immediately call cciss_hard_reset_controller. This will result in a
kernel panic and the core file cannot be created. This patch prevents
cciss from trying to load in this scenario.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add the cciss_allow_hpsa modules parameter. This allows users to use the
hpsa driver instead of cciss for older controllers.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add CONFIG_PM_SLEEP to suspend/resume functions to fix the following build
warning when CONFIG_PM_SLEEP is not selected. This is because sleep PM
callbacks defined by SIMPLE_DEV_PM_OPS are only used when the
CONFIG_PM_SLEEP is enabled.
drivers/block/mg_disk.c:783:12: warning: 'mg_suspend' defined but not used [-Wunused-function]
drivers/block/mg_disk.c:807:12: warning: 'mg_resume' defined but not used [-Wunused-function]
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Workaround for handling unaligned writes: limit number of outstanding
unaligned writes
Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull Ceph fix from Sage Weil:
"It's a simple fix for a hard to hit race, but low-risk and clearly
correct"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: do a safe list traversal in rbd_img_request_submit()
It's possible that the reference to the object request dropped
inside the loop in rbd_img_request_submit() will be the last
one, in which case the content of the object pointer can't be
trusted.
Use a safe form of the object request list traversal to avoid
problems.
This resolves:
http://tracker.ceph.com/issues/4705
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Registers a miscellaneous device for each nvme controller probed. This
creates character device files as /dev/nvmeN, where N is the device
instance, and supports nvme admin ioctl commands so devices without
namespaces can be managed.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
When constructing the command, dsmgmt needs to be treated as a 32-bit
value, not a 16-bit value. reftag, apptag and appmask all need to be
converted from native-endian to little-endian. Again, sparse's bitwise
warnings caught this problem. Thanks to Keith for pointing out the
correct way to fix the reftag.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Acked-by: Keith Busch <keith.busch@intel.com>
The sparse bitwise checks pointed out that I needed to shift the status
before changing its endianness, not after.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Sparse produced warnings for some instances of
mismatched types and direct userspace dereferences.
This patch fixes those for the scsi emulation layer.
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
The nvme_dev_add() function currently returns the last error code that it
saw, which (if everything else succeeds) happens to be the result of an
optional command, so it can legitimately fail. Looking at the error path
more closely reveals that we should return success from this function,
even if no device namespaces are added. So once the queues are created
and the device has responded to Identify, make sure that this function
succeeds.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Introduce nvme_block_nr() to help convert sectors to block numbers.
This fixes an integer overflow in the SCSI conversion layer, and it's
slightly less typing than opencoding it.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Acked-by: Keith Busch <keith.busch@intel.com>
The nvme driver has a "once per second" event where the management kthread
wakes up the system and then reschedules itself for 1 second later.
For power efficiency reasons, I'd like this timer to happen together
with other wakeups in the system.
This patch makes the schedule_timeout() call in the kthread use
round_jiffies_relative(), causing the wakeup to at least align with other
"once per X seconds" events in the kernel.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Tested-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Temporarily disabling TRIM support until TRIM related issues
are addressed in the firmware.
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reported smatch warning:
drivers/block/mtip32xx/mtip32xx.c:4163 mtip_block_shutdown() warn: variable dereferenced before check 'dd->disk' (see line 4159)
dd->disk->disk_name accessed before the check if dd->disk is NULL. Fixed this
and access of dd->queue/dd->disk->queue.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCAAGBQJRZCnRAAoJEPfTWPspceCmQD0QAMS6a7kYln9n7ByzMa1rfm7x
Mhu3NkT3I34a5+hgdiIwWTjqJ2Uu78S5pD522eAP4m9yGcG7m1c3aqdTEewDck7K
xcIjgzS1j1J/XRlrtgPxJhsiA23Kzhliwztrr1YhczWLMijvIrNu2KYZ4EK4/YPB
KkdGJJitunrOw4nhmnB1AYCRJoZJ5KXOpigqVr7vxTh0Ye7ue2k9mDD3x6tOp/Q0
TxKxXyEZF9yHgUkpARMcy4OEcr63APfAeiAnZOPpdK5rbCc8pVXpoadLE8UipnlI
0FcGBzuEezQE3FZrT9PL58FFNEPUmx3NWz2SwYV7Te5Gw5Zw/ffMxk8geE12ALmb
YuULHGOxz97rxjFImYYGqIOQl7F6V59bYgbXegmKnWjorKEn6LxHyNI1Q76oH5V3
UgvkmTK91yNa6DvB24Vrt1yKb3GvKQApHAwEn82y+LqcZFkWq1iKqlPVNNGGjCbR
j47Qu4G/6Bif0vTFC34ciI79ga/di4S4FUOCthsjskm7waDUNvrlbP4Mvwq1VPAc
VZaYGrQcWCDxuAY/S3XTR+2ci/B0n4Fhyn/iFg/6S+nQawJDeiM+C3yVDbH8caq4
wV2PXEQ/r+SII4PVGFAmATewUKF3dkUpT2TKhc9KRHZh9+mFBXFj13MZ3+WpBPHl
NTvEyi5nUL+MdyZGuJUi
=CB+W
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20130409' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"I've got a few smaller fixes queued up for 3.9 that should go in. The
major one is the loop regression, the others are nice fixes on their
own though. It contains:
- Fix for unitialized var in the block sysfs code, courtesy of Arnd
and gcc-4.8.
- Two fixes for mtip32xx, fixing probe and command timeout. Also a
debug measure that could have waited for 3.10, but it's driver
only, so I let it slip in.
- Revert the loop partition cleanup fix, it could cause a deadlock on
auto-teardown as part of umount. The fix is clear, but at this
point we just want to revert it and get a real fix in for 3.10."
* tag 'for-linus-20130409' of git://git.kernel.dk/linux-block:
Revert "loop: cleanup partitions when detaching loop device"
mtip32xx: fix two smatch warnings
mtip32xx: Add debugfs entry device_status
mtip32xx: return 0 from pci probe in case of rebuild
mtip32xx: recovery from command timeout
block: avoid using uninitialized value in from queue_var_store
The only part of proc_dir_entry the code outside of fs/proc
really cares about is PDE(inode)->data. Provide a helper
for that; static inline for now, eventually will be moved
to fs/proc, along with the knowledge of struct proc_dir_entry
layout.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This reverts commit 8761a3dc1f07b163414e2215a2cadbb4cfe2a107.
There are situations where the destruction path is called
with the bdev->bd_mutex already held, which then deadlocks in
loop_clr_fd(). The normal partition cleanup does a trylock()
on the mutex, but it'd be nice to have a more bullet proof
method in loop. So punt this more involved fix to the next
merge window, and just back out this buggy fix for now.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Dan reports:
New smatch warnings:
drivers/block/mtip32xx/mtip32xx.c:2728 show_device_status() warn: variable dereferenced before check 'dd' (see line 2727)
drivers/block/mtip32xx/mtip32xx.c:2758 show_device_status() warn: variable dereferenced before check 'dd' (see line 2757)
which are checking if dd == NULL, in a list_for_each_entry() type loop.
Get rid of the check, dd can never be NULL here.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch adds a new debugfs entry 'device_status' in
/sys/kernel/debug/rssd. The value of this entry shows
devices online and those in the process of removing.
Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Fix to return 0 from pci probe in case of rebuild. If not, pci consider
probe has failed, and crash during rmmod.
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
To recover from command timeouts, reset the device. In addition
to that improved timeout handling of PIO commands.
Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tejun writes:
-----
This is the pull request for the earlier patchset[1] with the same
name. It's only three patches (the first one was committed to
workqueue tree) but the merge strategy is a bit involved due to the
dependencies.
* Because the conversion needs features from wq/for-3.10,
block/for-3.10/core is based on rc3, and wq/for-3.10 has conflicts
with rc3, I pulled mainline (rc5) into wq/for-3.10 to prevent those
workqueue conflicts from flaring up in block tree.
* Resolving the issue that Jan and Dave raised about debugging
requires arch-wide changes. The patchset is being worked on[2] but
it'll have to go through -mm after these changes show up in -next,
and not included in this pull request.
The three commits are located in the following git branch.
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git writeback-workqueue
Pulling it into block/for-3.10/core produces a conflict in
drivers/md/raid5.c between the following two commits.
e3620a3ad5 ("MD RAID5: Avoid accessing gendisk or queue structs when not available")
2f6db2a707 ("raid5: use bio_reset()")
The conflict is trivial - one removes an "if ()" conditional while the
other removes "rbi->bi_next = NULL" right above it. We just need to
remove both. The merged branch is available at
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git block-test-merge
so that you can use it for verification. The test merge commit has
proper merge description.
While these changes are a bit of pain to route, they make code simpler
and even have, while minute, measureable performance gain[3] even on a
workload which isn't particularly favorable to showing the benefits of
this conversion.
----
Fixed up the conflict.
Conflicts:
drivers/md/raid5.c
Signed-off-by: Jens Axboe <axboe@kernel.dk>
struct block_device lifecycle is defined by its inode (see fs/block_dev.c) -
block_device allocated first time we access /dev/loopXX and deallocated on
bdev_destroy_inode. When we create the device "losetup /dev/loopXX afile"
we want that block_device stay alive until we destroy the loop device
with "losetup -d".
But because we do not hold /dev/loopXX inode its counter goes 0, and
inode/bdev can be destroyed at any moment. Usually it happens at memory
pressure or when user drops inode cache (like in the test below). When later in
loop_clr_fd() we want to use bdev we have use-after-free error with following
stack:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
bd_set_size+0x10/0xa0
loop_clr_fd+0x1f8/0x420 [loop]
lo_ioctl+0x200/0x7e0 [loop]
lo_compat_ioctl+0x47/0xe0 [loop]
compat_blkdev_ioctl+0x341/0x1290
do_filp_open+0x42/0xa0
compat_sys_ioctl+0xc1/0xf20
do_sys_open+0x16e/0x1d0
sysenter_dispatch+0x7/0x1a
To prevent use-after-free we need to grab the device in loop_set_fd()
and put it later in loop_clr_fd().
The issue is reprodusible on current Linus head and v3.3. Here is the test:
dd if=/dev/zero of=loop.file bs=1M count=1
while [ true ]; do
losetup /dev/loop0 loop.file
echo 2 > /proc/sys/vm/drop_caches
losetup -d /dev/loop0
done
[ Doing bdgrab/bput in loop_set_fd/loop_clr_fd is safe, because every
time we call loop_set_fd() we check that loop_device->lo_state is
Lo_unbound and set it to Lo_bound If somebody will try to set_fd again
it will get EBUSY. And if we try to loop_clr_fd() on unbound loop
device we'll get ENXIO.
loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
loop_device->lo_ctl_mutex. ]
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from David Miller:
1) sadb_msg prepared for IPSEC userspace forgets to initialize the
satype field, fix from Nicolas Dichtel.
2) Fix mac80211 synchronization during station removal, from Johannes
Berg.
3) Fix IPSEC sequence number notifications when they wrap, from Steffen
Klassert.
4) Fix cfg80211 wdev tracing crashes when add_virtual_intf() returns an
error pointer, from Johannes Berg.
5) In mac80211, don't call into the channel context code with the
interface list mutex held. From Johannes Berg.
6) In mac80211, if we don't actually associate, do not restart the STA
timer, otherwise we can crash. From Ben Greear.
7) Missing dma_mapping_error() check in e1000, ixgb, and e1000e. From
Christoph Paasch.
8) Fix sja1000 driver defines to not conflict with SH port, from Marc
Kleine-Budde.
9) Don't call il4965_rs_use_green with a NULL station, from Colin Ian
King.
10) Suspend/Resume in the FEC driver fail because the buffer descriptors
are not initialized at all the moments in which they should. Fix
from Frank Li.
11) cpsw and davinci_emac drivers both use the wrong interface to
restart a stopped TX queue. Use netif_wake_queue not
netif_start_queue, the latter is for initialization/bringup not
active management of the queue. From Mugunthan V N.
12) Fix regression in rate calculations done by
psched_ratecfg_precompute(), missing u64 type promotion. From
Sergey Popovich.
13) Fix length overflow in tg3 VPD parsing, from Kees Cook.
14) AOE driver fails to allocate enough headroom, resulting in crashes.
Fix from Eric Dumazet.
15) RX overflow happens too quickly in sky2 driver because pause packet
thresholds are not programmed correctly. From Mirko Lindner.
16) Bonding driver manages arp_interval and miimon settings incorrectly,
disabling one unintentionally disables both. Fix from Nikolay
Aleksandrov.
17) smsc75xx drivers don't program the RX mac properly for jumbo frames.
Fix from Steve Glendinning.
18) Fix off-by-one in Codel packet scheduler. From Vijay Subramanian.
19) Fix packet corruption in atl1c by disabling MSI support, from Hannes
Frederic Sowa.
20) netdev_rx_handler_unregister() needs a synchronize_net() to fix
crashes in bonding driver unload stress tests. From Eric Dumazet.
21) rxlen field of ks8851 RX packet descriptors not interpreted
correctly (it is 12 bits not 16 bits, so needs to be masked after
shifting the 32-bit value down 16 bits). Fix from Max Nekludov.
22) Fix missed RX/TX enable in sh_eth driver due to mishandling of link
change indications. From Sergei Shtylyov.
23) Fix crashes during spurious ECI interrupts in sh_eth driver, also
from Sergei Shtylyov.
24) dm9000 driver initialization is done wrong for revision B devices
with DSP PHY, from Joseph CHANG.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits)
DM9000B: driver initialization upgrade
sh_eth: make 'link' field of 'struct sh_eth_private' *int*
sh_eth: workaround for spurious ECI interrupt
sh_eth: fix handling of no LINK signal
ks8851: Fix interpretation of rxlen field.
net: add a synchronize_net() in netdev_rx_handler_unregister()
MAINTAINERS: Update netxen_nic maintainers list
atl1e: drop pci-msi support because of packet corruption
net: fq_codel: Fix off-by-one error
net: calxedaxgmac: Wake-on-LAN fixes
net: calxedaxgmac: fix rx ring handling when OOM
net: core: Remove redundant call to 'nf_reset' in 'dev_forward_skb'
smsc75xx: fix jumbo frame support
net: fix the use of this_cpu_ptr
bonding: fix disabling of arp_interval and miimon
ipv6: don't accept node local multicast traffic from the wire
sky2: Threshold for Pause Packet is set wrong
sky2: Receive Overflows not counted
aoe: reserve enough headroom on skbs
line up comment for ndo_bridge_getlink
...