linux

iv/linux

Author	SHA1	Message	Date
Mike Snitzer	23508a96cd	dm: add WRITE SAME support WRITE SAME bios have a payload that contain a single page. When cloning WRITE SAME bios DM has no need to modify the bi_io_vec attributes (and doing so would be detrimental). DM need only alter the start and end of the WRITE SAME bio accordingly. Rather than duplicate __clone_and_map_discard, factor out a common function that is also used by __clone_and_map_write_same. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-12-21 20:23:37 +00:00
Mike Snitzer	d54eaa5a0f	dm: prepare to support WRITE SAME Allow targets to opt in to WRITE SAME support by setting 'num_write_same_requests' in the dm_target structure. A dm device will only advertise WRITE SAME support if all its targets and all its underlying devices support it. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-12-21 20:23:36 +00:00
Mikulas Patocka	9c5091f2ee	dm ioctl: use kmalloc if possible If the parameter buffer is small enough, try to allocate it with kmalloc() rather than vmalloc(). vmalloc is noticeably slower than kmalloc because it has to manipulate page tables. In my tests, on PA-RISC this patch speeds up activation 13 times. On Opteron this patch speeds up activation by 5%. This patch introduces a new function free_params() to free the parameters and this uses new flags that record whether or not vmalloc() was used and whether or not the input buffer must be wiped after use. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-12-21 20:23:36 +00:00
Mikulas Patocka	5023e5cf58	dm ioctl: remove PF_MEMALLOC When allocating memory for the userspace ioctl data, set some appropriate GPF flags directly instead of using PF_MEMALLOC. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-12-21 20:23:36 +00:00
Joe Thornber	7960123f2d	dm persistent data: improve improve space map block alloc failure message Improve space map error message when unable to allocate a new metadata block. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-12-21 20:23:36 +00:00
Mike Snitzer	c397741c76	dm thin: use DMERR_LIMIT for errors Throttle all errors logged from the IO path by dm thin. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-12-21 20:23:34 +00:00
Mike Snitzer	89ddeb8cb1	dm persistent data: use DMERR_LIMIT for errors Nearly all of persistent-data is in the IO path so throttle error messages with DMERR_LIMIT to limit the amount logged when something has gone wrong. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-12-21 20:23:34 +00:00
Mike Snitzer	a5bd968aeb	dm block manager: reinstate message when validator fails Reinstate a useful error message when the block manager buffer validator fails. This was mistakenly eliminated when the block manager was converted to use dm-bufio. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-12-21 20:23:34 +00:00
Jonathan Brassow	3a0f9aaee0	dm raid: round region_size to power of two If the user does not supply a bitmap region_size to the dm raid target, a reasonable size is computed automatically. If this is not a power of 2, the md code will report an error later. This patch catches the problem early and rounds the region_size to the next power of two. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-12-21 20:23:33 +00:00
Joe Thornber	2aab38502d	dm thin: cleanup dead code Remove unused @data_block parameter from cell_defer. Change thin_bio_map to use many returns rather than setting a variable. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-12-21 20:23:33 +00:00
Joe Thornber	f286ba0eed	dm thin: rename cell_defer_except to cell_defer_no_holder Rename cell_defer_except() to cell_defer_no_holder() which describes its function more clearly. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-12-21 20:23:33 +00:00
Mikulas Patocka	9aa0c0e60f	dm snapshot: optimize track_chunk track_chunk is always called with interrupts enabled. Consequently, we do not need to save and restore interrupt state in "flags" variable. This patch changes spin_lock_irqsave to spin_lock_irq and spin_unlock_irqrestore to spin_unlock_irq. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-12-21 20:23:33 +00:00
Mikulas Patocka	19cbbc60c6	dm raid: use DM_ENDIO_INCOMPLETE Use a defined macro DM_ENDIO_INCOMPLETE instead of a numeric constant. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-12-21 20:23:32 +00:00
Mikulas Patocka	7c27213b20	dm raid1: remove impossible mempool_alloc error test mempool_alloc can't fail if __GFP_WAIT is specified, so the condition that tests if read_record is non-NULL is always true. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-12-21 20:23:32 +00:00
Mike Snitzer	018debea8d	dm thin: emit ignore_discard in status when discards disabled If "ignore_discard" is specified when creating the thin pool device then discard support is disabled for that device. The pool device's status should reflect this fact rather than stating "no_discard_passdown" (which implies discards are enabled but passdown is disabled). Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-12-21 20:23:32 +00:00
Joe Thornber	e3cbf94513	dm persistent data: fix nested btree deletion When deleting nested btrees, the code forgets to delete the innermost btree. The thin-metadata code serendipitously compensates for this by claiming there is one extra layer in the tree. This patch corrects both problems. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-12-21 20:23:32 +00:00
Joe Thornber	563af186df	dm thin: wake worker when discard is prepared When discards are prepared it is best to directly wake the worker that will process them. The worker will be woken anyway, via periodic commit, but there is no reason to not wake_worker here. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-12-21 20:23:31 +00:00
Joe Thornber	e8088073c9	dm thin: fix race between simultaneous io and discards to same block There is a race when discard bios and non-discard bios are issued simultaneously to the same block. Discard support is expensive for all thin devices precisely because you have to be careful to quiesce the area you're discarding. DM thin must handle this conflicting IO pattern (simultaneous non-discard vs discard) even though a sane application shouldn't be issuing such IO. The race manifests as follows: 1. A non-discard bio is mapped in thin_bio_map. This doesn't lock out parallel activity to the same block. 2. A discard bio is issued to the same block as the non-discard bio. 3. The discard bio is locked in a dm_bio_prison_cell in process_discard to lock out parallel activity against the same block. 4. The non-discard bio's mapping continues and its all_io_entry is incremented so the bio is accounted for in the thin pool's all_io_ds which is a dm_deferred_set used to track time locality of non-discard IO. 5. The non-discard bio is finally locked in a dm_bio_prison_cell in process_bio. The race can result in deadlock, leaving the block layer hanging waiting for completion of a discard bio that never completes, e.g.: INFO: task ruby:15354 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ruby D ffffffff8160f0e0 0 15354 15314 0x00000000 ffff8802fb08bc58 0000000000000082 ffff8802fb08bfd8 0000000000012900 ffff8802fb08a010 0000000000012900 0000000000012900 0000000000012900 ffff8802fb08bfd8 0000000000012900 ffff8803324b9480 ffff88032c6f14c0 Call Trace: [<ffffffff814e5a19>] schedule+0x29/0x70 [<ffffffff814e3d85>] schedule_timeout+0x195/0x220 [<ffffffffa06b9bc1>] ? _dm_request+0x111/0x160 [dm_mod] [<ffffffff814e589e>] wait_for_common+0x11e/0x190 [<ffffffff8107a170>] ? try_to_wake_up+0x2b0/0x2b0 [<ffffffff814e59ed>] wait_for_completion+0x1d/0x20 [<ffffffff81233289>] blkdev_issue_discard+0x219/0x260 [<ffffffff81233e79>] blkdev_ioctl+0x6e9/0x7b0 [<ffffffff8119a65c>] block_ioctl+0x3c/0x40 [<ffffffff8117539c>] do_vfs_ioctl+0x8c/0x340 [<ffffffff8119a547>] ? block_llseek+0x67/0xb0 [<ffffffff811756f1>] sys_ioctl+0xa1/0xb0 [<ffffffff810561f6>] ? sys_rt_sigprocmask+0x86/0xd0 [<ffffffff814ef099>] system_call_fastpath+0x16/0x1b The thinp-test-suite's test_discard_random_sectors reliably hits this deadlock on fast SSD storage. The fix for this race is that the all_io_entry for a bio must be incremented whilst the dm_bio_prison_cell is held for the bio's associated virtual and physical blocks. That cell locking wasn't occurring early enough in thin_bio_map. This patch fixes this. Care is taken to always call the new function inc_all_io_entry() with the relevant cells locked, but they are generally unlocked before calling issue() to try to avoid holding the cells locked across generic_submit_request. Also, now that thin_bio_map may lock bios in a cell, process_bio() is no longer the only thread that will do so. Because of this we must be sure to use cell_defer_except() to release all non-holder entries, that were added by the other thread, because they must be deferred. This patch depends on "dm thin: replace dm_cell_release_singleton with cell_defer_except". Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: stable@vger.kernel.org	2012-12-21 20:23:31 +00:00
Joe Thornber	b7ca9c9273	dm thin: replace dm_cell_release_singleton with cell_defer_except Change existing users of the function dm_cell_release_singleton to share cell_defer_except instead, and then remove the now-unused function. Everywhere that calls dm_cell_release_singleton, the bio in question is the holder of the cell. If there are no non-holder entries in the cell then cell_defer_except behaves exactly like dm_cell_release_singleton. Conversely, if there are non-holder entries then dm_cell_release_singleton must not be used because those entries would need to be deferred. Consequently, it is safe to replace use of dm_cell_release_singleton with cell_defer_except. This patch is a pre-requisite for "dm thin: fix race between simultaneous io and discards to same block". Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-12-21 20:23:31 +00:00
Mike Snitzer	c1a94672a8	dm: disable WRITE SAME WRITE SAME bios are not yet handled correctly by device-mapper so disable their use on device-mapper devices by setting max_write_same_sectors to zero. As an example, a ciphertext device is incompatible because the data gets changed according to the location at which it written and so the dm crypt target cannot support it. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Cc: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-12-21 20:23:30 +00:00
Alasdair G Kergon	e910d7ebec	dm ioctl: prevent unsafe change to dm_ioctl data_size Abort dm ioctl processing if userspace changes the data_size parameter after we validated it but before we finished copying the data buffer from userspace. The dm ioctl parameters are processed in the following sequence: 1. ctl_ioctl() calls copy_params(); 2. copy_params() makes a first copy of the fixed-sized portion of the userspace parameters into the local variable "tmp"; 3. copy_params() then validates tmp.data_size and allocates a new structure big enough to hold the complete data and copies the whole userspace buffer there; 4. ctl_ioctl() reads userspace data the second time and copies the whole buffer into the pointer "param"; 5. ctl_ioctl() reads param->data_size without any validation and stores it in the variable "input_param_size"; 6. "input_param_size" is further used as the authoritative size of the kernel buffer. The problem is that userspace code could change the contents of user memory between steps 2 and 4. In particular, the data_size parameter can be changed to an invalid value after the kernel has validated it. This lets userspace force the kernel to access invalid kernel memory. The fix is to ensure that the size has not changed at step 4. This patch shouldn't have a security impact because CAP_SYS_ADMIN is required to run this code, but it should be fixed anyway. Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: stable@kernel.org	2012-12-21 20:23:30 +00:00
Mikulas Patocka	550929faf8	dm persistent data: rename node to btree_node This patch fixes a compilation failure on sparc32 by renaming struct node. struct node is already defined in include/linux/node.h. On sparc32, it happens to be included through other dependencies and persistent-data doesn't compile because of conflicting declarations. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2012-12-21 20:23:30 +00:00
Linus Torvalds	ea88eeac0c	md update for 3.8 Mostly just little fixes. Probably biggest part is AVX accelerated RAID6 calculations. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIVAwUAUM/w2Dnsnt1WYoG5AQKXlg/9F5juv4CjRkRRFLqZgOPBLmn/s/2Vspgh 2Kv8Jcyixd8jUQNbobZv0ahlJH/iSU61kpOE8QjLbKi5Y42vAbM0ZU2aHJ6nqGZy HiTI8K+7kTvCK3ZXLcUQ+4oPPBNTcoTZbLWaEOmIqB1ruLddoIR7M9fG3PspVeG0 jijnXR8IfL6mr4YDXnJkEhFrneTysVik05RkKYZKyM/9r3stAoMJ9o0/EFy3OFxb lO6mLEtvjVArXcnuf1RMCw2YKgki9Y4r73HCplgQsVFvcxcpsya4gFF+lRR5j7cO /eMYbSQ89iWEYKh1dJ9u1nofc8fX5ia71QQyO1fkO4GXRHXPVIyBgKSbe7SaL6iG JUMm7idUV2rZGeq3ln3k8Yor4QqHvN1n7pRKKUF+ZdsPoQ1B/TABu+qpsAdo5ZhP fxDsULsHrzEaxgetd4V8F2Uptca9ni43sMI8mwsvVlA0p6SOzMIyoJLC9xAZpx11 b3H3+7Oje/fasmszBoq5B9uAlSt9XXVN4DDn2q6cX+S96JSX6jcsN1c6cJBO+ZxB OU6a6P5mnU6HuxU02rspe7G8BeU+ybaonErOW+GdyC4r7M/cImC0dSp0NGHK2211 oqu0xBx/Q/ddTFwKQqa4HzR2ws09+LhKbjdqYIhCEKttIbLIAjf73ARZ19XPSRRX pDR/ey2CB6E= =uK52 -----END PGP SIGNATURE----- Merge tag 'md-3.8' of git://neil.brown.name/md Pull md update from Neil Brown: "Mostly just little fixes. Probably biggest part is AVX accelerated RAID6 calculations." * tag 'md-3.8' of git://neil.brown.name/md: md/raid5: add blktrace calls md/raid5: use async_tx_quiesce() instead of open-coding it. md: Use ->curr_resync as last completed request when cleanly aborting resync. lib/raid6: build proper files on corresponding arch lib/raid6: Add AVX2 optimized gen_syndrome functions lib/raid6: Add AVX2 optimized recovery functions md: Update checkpoint of resync/recovery based on time. md:Add place to update ->recovery_cp. md.c: re-indent various 'switch' statements. md: close race between removing and adding a device. md: removed unused variable in calc_sb_1_csm.	2012-12-18 09:32:44 -08:00
NeilBrown	a9add5d92b	md/raid5: add blktrace calls This makes it easier to trace what raid5 is doing. Signed-off-by: NeilBrown <neilb@suse.de>	2012-12-18 10:22:21 +11:00
Linus Torvalds	9228ff9038	Merge branch 'for-3.8/drivers' of git://git.kernel.dk/linux-block Pull block driver update from Jens Axboe: "Now that the core bits are in, here are the driver bits for 3.8. The branch contains: - A huge pile of drbd bits that were dumped from the 3.7 merge window. Following that, it was both made perfectly clear that there is going to be no more over-the-wall pulls and how the situation on individual pulls can be improved. - A few cleanups from Akinobu Mita for drbd and cciss. - Queue improvement for loop from Lukas. This grew into adding a generic interface for waiting/checking an even with a specific lock, allowing this to be pulled out of md and now loop and drbd is also using it. - A few fixes for xen back/front block driver from Roger Pau Monne. - Partition improvements from Stephen Warren, allowing partiion UUID to be used as an identifier." * 'for-3.8/drivers' of git://git.kernel.dk/linux-block: (609 commits) drbd: update Kconfig to match current dependencies drbd: Fix drbdsetup wait-connect, wait-sync etc... commands drbd: close race between drbd_set_role and drbd_connect drbd: respect no-md-barriers setting also when changed online via disk-options drbd: Remove obsolete check drbd: fixup after wait_even_lock_irq() addition to generic code loop: Limit the number of requests in the bio list wait: add wait_event_lock_irq() interface xen-blkfront: free allocated page xen-blkback: move free persistent grants code block: partition: msdos: provide UUIDs for partitions init: reduce PARTUUID min length to 1 from 36 block: store partition_meta_info.uuid as a string cciss: use check_signature() cciss: cleanup bitops usage drbd: use copy_highpage drbd: if the replication link breaks during handshake, keep retrying drbd: check return of kmalloc in receive_uuids drbd: Broadcast sync progress no more often than once per second drbd: don't try to clear bits once the disk has failed ...	2012-12-17 13:39:11 -08:00
Linus Torvalds	a2013a13e6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial branch from Jiri Kosina: "Usual stuff -- comment/printk typo fixes, documentation updates, dead code elimination." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) HOWTO: fix double words typo x86 mtrr: fix comment typo in mtrr_bp_init propagate name change to comments in kernel source doc: Update the name of profiling based on sysfs treewide: Fix typos in various drivers treewide: Fix typos in various Kconfig wireless: mwifiex: Fix typo in wireless/mwifiex driver messages: i2o: Fix typo in messages/i2o scripts/kernel-doc: check that non-void fcts describe their return value Kernel-doc: Convention: Use a "Return" section to describe return values radeon: Fix typo and copy/paste error in comments doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c various: Fix spelling of "asynchronous" in comments. Fix misspellings of "whether" in comments. eisa: Fix spelling of "asynchronous". various: Fix spelling of "registered" in comments. doc: fix quite a few typos within Documentation target: iscsi: fix comment typos in target/iscsi drivers treewide: fix typo of "suport" in various comments and Kconfig treewide: fix typo of "suppport" in various comments ...	2012-12-13 12:00:02 -08:00
NeilBrown	749586b7d3	md/raid5: use async_tx_quiesce() instead of open-coding it. handle_stripe_expansion contains: if (tx) { async_tx_ack(tx); dma_wait_for_async_tx(tx); } which is very similar to the body of async_tx_quiesce(), except that the later handles an error from dma_wait_for_async_tx() (admittedly by panicing, but that decision belongs in the dma code, not the md code). So just us async_tx_quiesce(). Acked-by: Dan Williams <djbw@fb.com> Reported-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-12-13 19:52:32 +11:00
majianpeng	0a19caabf0	md: Use ->curr_resync as last completed request when cleanly aborting resync. If a resync is aborted cleanly, ->curr_resync is a reliable record of where we got up to. If there was an error it is less reliable but we always know that ->curr_resync_completed is safe. So add a flag MD_RECOVERY_ERROR to differentiate between these cases and set recovery_cp accordingly. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-12-13 19:52:11 +11:00
majianpeng	54f89341e8	md: Update checkpoint of resync/recovery based on time. md will current only only checkpoint recovery or resync ever 1/16th of the device size. As devices get larger this can become a long time an so a lot of work that might need to be duplicated after a shutdown. So add a time-based checkpoint. Every 5 minutes limits the amount of duplicated effort to at most 5 minutes, and has almost zero impact on performance. [changelog entry re-written by NeilBrown] Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-12-13 16:41:40 +11:00
kernelmail	35d78c6696	md:Add place to update ->recovery_cp. In resyncing, recovery_cp only updated when resync aborted or completed. But in md drives,many place used it to judge.So add a place to update. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-12-13 16:41:01 +11:00
NeilBrown	c02c0aeb6c	md.c: re-indent various 'switch' statements. Intent was unnecessarily deep. Also change one 'switch' which has a single case element, into an 'if'. Signed-off-by: NeilBrown <neilb@suse.de>	2012-12-11 13:39:21 +11:00
NeilBrown	a7a3f08dc2	md: close race between removing and adding a device. When we remove a device from an md array, the final removal of the "dev-XX" sys entry is run asynchronously. If we then re-add that device immediately before the worker thread gets to run, we can end up trying to add the "dev-XX" sysfs entry back before it has been removed. So in both places where we add a device, call flush_workqueue(md_misc_wq); before taking the md lock (as holding the md lock can prevent removal to complete). Signed-off-by: NeilBrown <neilb@suse.de>	2012-12-11 13:35:54 +11:00
NeilBrown	1f3c9907b8	md: removed unused variable in calc_sb_1_csm. 'i' is unused. NeilBrown <neilb@suse.de>	2012-12-11 13:09:00 +11:00
Linus Torvalds	4ccc804586	Single bugfix for raid1/raid10. Fixes a recently introduced deadlock. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIVAwUAULVcJjnsnt1WYoG5AQKJjw//WXClUwmYi7FD9u8OfqLKsmvJCB8rTQFL RQTkm/AGwXMTrmV2iifR7Hpm14wP7pbwwiFNbLv4cw8W8ldt+4PfjCRTyoSwH5HT +evDAxmtuTfhznGn9fWCJClmrng0+W0ir3Bmkju+u35orCx97+98Cgv4rVAeYZ0R TR59g10y0c1QQuLPXoe3J5iVKuXjrlW4USIGRzkaqKmSZa9LGLETLE9V/RQtcdVD HB+uVMEMIGfibWSa918yWRbje8tGYeFyWOrxLs6eS/MdMEWPOdYgesxMErN+8/9q ZPYc+wIbGdYfzn8RcuAlE10ZMkS/6eNCn/O5ztBrM9Iyztecv3TKxNzb1S9RHppZ ze9d7qfX5kDhwc7YPikPZlnP4CDElDjaPzb0jSyy6FwNzWV45YuC9D5n4xGPOgcC 83ORlSzMcv6NOFZc8HjrV4NFYE4Dezm0sThFPMEkY2FfLzIztg1H5Q0k0bvfxtqa yzCaQtuGjMhsbcLELqHCXFNHFhBVaetuFKAPRnynnkgSDMiZVjmV5/rsapy+qBON 4BSI7Shwq5jn1xrqVd6ylLic5nkFIGuU7jZ15VftzP3ggQxmSLuq8Q7ZOZ+cXdZ9 bQTkZyrwsIp8qxBE9DCi33VDDcUNiSvCnTdr18XPAzJ0DQIQ4hhlmLxGGj8BV5f4 KKOqr3RBP6I= =fV2f -----END PGP SIGNATURE----- Merge tag 'md-3.7-fixes' of git://neil.brown.name/md Pull md bugfix from NeilBrown: "Single bugfix for raid1/raid10. Fixes a recently introduced deadlock." * tag 'md-3.7-fixes' of git://neil.brown.name/md: md/raid1{,0}: fix deadlock in bitmap_unplug.	2012-12-02 16:24:31 -08:00
Lukas Czerner	eed8c02e68	wait: add wait_event_lock_irq() interface New wait_event{_interruptible}_lock_irq{_cmd} macros added. This commit moves the private wait_event_lock_irq() macro from MD to regular wait includes, introduces new macro wait_event_lock_irq_cmd() instead of using the old method with omitting cmd parameter which is ugly and makes a use of new macros in the MD. It also introduces the _interruptible_ variant. The use of new interface is when one have a special lock to protect data structures used in the condition, or one also needs to invoke "cmd" before putting it to sleep. All new macros are expected to be called with the lock taken. The lock is released before sleep and is reacquired afterwards. We will leave the macro with the lock held. Note to DM: IMO this should also fix theoretical race on waitqueue while using simultaneously wait_event_lock_irq() and wait_event() because of lack of locking around current state setting and wait queue removal. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Cc: Neil Brown <neilb@suse.de> Cc: David Howells <dhowells@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-30 11:47:57 +01:00
NeilBrown	874807a831	md/raid1{,0}: fix deadlock in bitmap_unplug. If the raid1 or raid10 unplug function gets called from a make_request function (which is very possible) when there are bios on the current->bio_list list, then it will not be able to successfully call bitmap_unplug() and it could need to submit more bios and wait for them to complete. But they won't complete while current->bio_list is non-empty. So detect that case and handle the unplugging off to another thread just like we already do when called from within the scheduler. RAID1 version of bug was introduced in 3.6, so that part of fix is suitable for 3.6.y. RAID10 part won't apply. Cc: stable@vger.kernel.org Reported-by: Torsten Kaiser <just.for.lkml@googlemail.com> Reported-by: Peter Maloney <peter.maloney@brockmann-consult.de> Signed-off-by: NeilBrown <neilb@suse.de>	2012-11-27 12:14:40 +11:00
Linus Torvalds	1d838d70fb	Several bug fixes for md in 3.7 - raid5 discard has problems - raid10 replacement devices have problems - bad block lock seqlock usage has problems - dm-raid doesn't free everything -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIVAwUAUK/PfTnsnt1WYoG5AQJlFBAAry6TrfIEed7Sz1BwY0w1Ofd5ZFt6DCN3 CXc6yi7LQhaMAUYsMcF07BFfuphal0St68vwckFkd1jPShUgruetzsUPLdS1+cql AKOQZmJegN+yvpf+N6PxER8z0Ju8M0RNVCvgRZB166ujmoEHGf7A564Hby+FINpZ zk1d5eVtcRL05oV0NbeLaX8bNp42nNx2wwvFtM6NEVF4vwbzGzXkC9ePQ6oERJvQ Oqsu6F+TzqztIPYk/fbl1Yr/FPVAWXi4dR7KNxs/jHFcnWPi9vKcjjh1jrq46rNy xQY+y0xW6FlN0uApIKT6NC3UWutgwOGUqRdCRc4LJ1nT6aHVIn5OCIsipgRrlV0O da5pM+rgIMJK3kyT6NjhtuWuQZE4P4OSOmnq5q81VT9XOKADVsFOfibtrIr8cxYS c/8mNJVfd+cU58XNKGIEt886DsN+uzWiY8U8HZVckfeVxrBTIPas4ERXlurx+G1D jhXqK8TuEfi6ILNdBlWPphAr2ytFqWWpQIGXgYGHEIJp5WaUHoEoEblznl1MiRlZ +tYIYy0SRkcZuxs6nUNF8Or5vFidjvaIFJPjIJwSIhwgzkaV+YFad4GfI7/WgWaq 7VU12MG7UlXLlaGN1Yadvh3jAk7L45DPzWUa/Zgvvtrvvdp3JU7VQhD8d6oc/kxD 3IOrUdAXWxU= =fznK -----END PGP SIGNATURE----- Merge tag 'md-3.7-fixes' of git://neil.brown.name/md Pull md fixes from NeilBrown: "Several bug fixes for md in 3.7: - raid5 discard has problems - raid10 replacement devices have problems - bad block lock seqlock usage has problems - dm-raid doesn't free everything" * tag 'md-3.7-fixes' of git://neil.brown.name/md: md/raid10: decrement correct pending counter when writing to replacement. md/raid10: close race that lose writes lost when replacement completes. md/raid5: Make sure we clear R5_Discard when discard is finished. md/raid5: move resolving of reconstruct_state earlier in stripe_handle. md/raid5: round discard alignment up to power of 2. md: make sure everything is freed when dm-raid stops an array. md: Avoid write invalid address if read_seqretry returned true. md: Reassigned the parameters if read_seqretry returned true in func md_is_badblock.	2012-11-23 12:11:13 -10:00
Jens Axboe	a8c32a5c98	dm: fix deadlock with request based dm and queue request_fn recursion Request based dm attempts to re-run the request queue off the request completion path. If used with a driver that potentially does end_io from its request_fn, we could deadlock trying to recurse back into request dispatch. Fix this by punting the request queue run to kblockd. Tested to fix a quickly reproducible deadlock in such a scenario. Cc: stable@kernel.org Acked-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-23 14:32:54 +01:00
NeilBrown	884162df2a	md/raid10: decrement correct pending counter when writing to replacement. When a write to a replacement device completes, we carefully and correctly found the rdev that the write actually went to and the blithely called rdev_dec_pending on the primary rdev, even if this write was to the replacement. This means that any writes to an array while a replacement was ongoing would cause the nr_pending count for the primary device to go negative, so it could never be removed. This bug has been present since replacement was introduced in 3.3, so it is suitable for any -stable kernel since then. Reported-by: "George Spelvin" <linux@horizon.com> Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.de>	2012-11-22 15:12:42 +11:00
NeilBrown	e7c0c3fa29	md/raid10: close race that lose writes lost when replacement completes. When a replacement operation completes there is a small window when the original device is marked 'faulty' and the replacement still looks like a replacement. The faulty should be removed and the replacement moved in place very quickly, bit it isn't instant. So the code write out to the array must handle the possibility that the only working device for some slot in the replacement - but it doesn't. If the primary device is faulty it just gives up. This can lead to corruption. So make the code more robust: if either the primary or the replacement is present and working, write to them. Only when neither are present do we give up. This bug has been present since replacement was introduced in 3.3, so it is suitable for any -stable kernel since then. Reported-by: "George Spelvin" <linux@horizon.com> Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.de>	2012-11-22 15:12:36 +11:00
NeilBrown	ca64cae960	md/raid5: Make sure we clear R5_Discard when discard is finished. commit 9e44476851e91c86c98eb92b9bc27fb801f89072 MD: raid5 avoid unnecessary zero page for trim change raid5 to clear R5_Discard when the complete request is handled rather than when submitting the per-device discard request. However it did not clear R5_Discard for the parity device. This means that if the stripe_head was reused before it expired from the cache, the setting would be wrong and a hang would result. Also if the R5_Uptodate bit happens to be set, R5_Discard again won't be cleared. But R5_Uptodate really should be clear at this point. So make sure R5_Discard is cleared in all cases, and clear R5_Uptodate when a 'discard' completes. Signed-off-by: NeilBrown <neilb@suse.de>	2012-11-22 09:14:13 +11:00
NeilBrown	ef5b7c69b7	md/raid5: move resolving of reconstruct_state earlier in stripe_handle. The chunk of code in stripe_handle which responds to a *_result value in reconstruct_state is really the completion of some processing that happened outside of handle_stripe (possibly asynchronously) and so should be one of the first things done in handle_stripe(). After the next patch it will be important that it happens before handle_stripe_clean_event(), as that will clear some dev->flags bit that this code tests. Signed-off-by: NeilBrown <neilb@suse.de>	2012-11-22 09:14:09 +11:00
NeilBrown	4ac6875eeb	md/raid5: round discard alignment up to power of 2. blkdev_issue_discard currently assumes that the granularity is a power of 2. So in raid5, round the chosen number up to avoid embarrassment. Cc: Shaohua Li <shli@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de>	2012-11-20 19:42:56 +11:00
NeilBrown	5eff3c439d	md: make sure everything is freed when dm-raid stops an array. md_stop() would stop an array, but not free various attached data structures. For internal arrays, these are freed later in do_md_stop() or mddev_put(), but they don't apply for dm-raid arrays. So get md_stop() to free them, and only all it from dm-raid. For internal arrays we now call __md_stop. Reported-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-11-20 10:27:37 +11:00
majianpeng	35f9ac2dce	md: Avoid write invalid address if read_seqretry returned true. If read_seqretry returned true and bbp was changed, it will write invalid address which can cause some serious problem. This bug was introduced by commit v3.0-rc7-130-g2699b67. So fix is suitable for 3.0.y thru 3.6.y. Reported-by: zhuwenfeng@kedacom.com Tested-by: zhuwenfeng@kedacom.com Cc: stable@vger.kernel.org Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-11-20 10:27:17 +11:00
majianpeng	ab05613a06	md: Reassigned the parameters if read_seqretry returned true in func md_is_badblock. This bug was introduced by commit(v3.0-rc7-126-g2230dfe). So fix is suitable for 3.0.y thru 3.6.y. Cc: stable@vger.kernel.org Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-11-20 10:27:05 +11:00
Jonathan Brassow	ed30be077e	MD RAID10: Fix oops when creating RAID10 arrays via dm-raid.c Commit 2863b9eb didn't take into account the changes to add TRIM support to RAID10 (commit 532a2a3fb). That is, when using dm-raid.c to create the RAID10 arrays, there is no mddev->gendisk or mddev->queue. The code added to support TRIM simply assumes that mddev->queue is available without checking. The result is an oops any time dm-raid.c attempts to create a RAID10 device. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-10-31 11:42:30 +11:00
NeilBrown	02b898f2f0	md/raid1: Fix assembling of arrays containing Replacements. setup_conf in raid1.c uses conf->raid_disks before assigning a value. It is used when including 'Replacement' devices. The consequence is that assembling an array which contains a replacement will misbehave and either not include the replacement, or not include the device being replaced. Though this doesn't lead directly to data corruption, it could lead to reduced data safety. So use mddev->raid_disks, which is initialised, instead. Bug was introduced by commit c19d57980b38a5bb613a898937a1cf85f422fb9b md/raid1: recognise replacements when assembling arrays. in 3.3, so fix is suitable for 3.3.y thru 3.6.y. Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.de>	2012-10-31 11:42:03 +11:00
Masanari Iida	83f0d77a7f	md: Fix typo in drivers/md Correct spelling typo in drivers/md. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-10-29 22:57:50 +01:00
Eric Sandeen	0be1fecd7e	md faulty: use disk_stack_limits() in: fe86cdce block: do not artificially constrain max_sectors for stacking drivers max_sectors defaults to UINT_MAX. md faulty wasn't using disk_stack_limits(), so inherited this large value as well. This triggered a bug in XFS when stressed over md_faulty, when a very large bio_alloc() failed. That was on an older kernel, and I can't reproduce exactly the same thing upstream, but I think the fix is appropriate in any case. Thanks to Mike Snitzer for pointing out the problem. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-10-22 10:44:55 +11:00

... 3 4 5 6 7 ...

2787 Commits