linux

iv/linux

Author	SHA1	Message	Date
Mikulas Patocka	93e6442c76	dm: add basic support for using the select or poll function Add the ability to poll on the /dev/mapper/control device. The select or poll function waits until any event happens on any dm device since opening the /dev/mapper/control device. When select or poll returns the device as readable, we must close and reopen the device to wait for new dm events. Usage: 1. open the /dev/mapper/control device 2. scan the event numbers of all devices we are interested in and process them 3. call select, poll or epoll on the handle (it waits until some new event happens since opening the device) 4. close the /dev/mapper/control handle 5. go to step 1 The next commit allows to re-arm the polling without closing and reopening the device. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Andy Grover <agrover@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-06-19 11:03:49 -04:00
Ming Lei	f660174e8b	blk-mq: use the introduced blk_mq_unquiesce_queue() blk_mq_unquiesce_queue() is used for unquiescing the queue explicitly, so replace blk_mq_start_stopped_hw_queues() with it. For the scsi part, this patch takes Bart's suggestion to switch to block quiesce/unquiesce API completely. Cc: linux-nvme@lists.infradead.org Cc: linux-scsi@vger.kernel.org Cc: dm-devel@redhat.com Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-18 14:20:04 -06:00
NeilBrown	9b10f6a9c2	block: remove bio_clone() and all references. bio_clone() is no longer used. Only bio_clone_bioset() or bio_clone_fast(). This is for the best, as bio_clone() used fs_bio_set, and filesystems are unlikely to want to use bio_clone(). So remove bio_clone() and all references. This includes a fix to some incorrect documentation. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-18 12:40:59 -06:00
NeilBrown	5a136fdf5a	bcache: use kmalloc to allocate bio in bch_data_verify() This function allocates a bio, then a collection of pages. It copes with failure. It currently uses a mempool() to allocate the bio, but alloc_page() to allocate the pages. These fail in different ways, so the usage is inconsistent. Change the bio_clone() to bio_clone_kmalloc() so that no pool is used either for the bio or the pages. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by : Ming Lei <ming.lei@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-18 12:40:59 -06:00
NeilBrown	47e0fb461f	blk: make the bioset rescue_workqueue optional. This patch converts bioset_create() to not create a workqueue by default, so alloctions will never trigger punt_bios_to_rescuer(). It also introduces a new flag BIOSET_NEED_RESCUER which tells bioset_create() to preserve the old behavior. All callers of bioset_create() that are inside block device drivers, are given the BIOSET_NEED_RESCUER flag. biosets used by filesystems or other top-level users do not need rescuing as the bio can never be queued behind other bios. This includes fs_bio_set, blkdev_dio_pool, btrfs_bioset, xfs_ioend_bioset, and one allocated by target_core_iblock.c. biosets used by md/raid do not need rescuing as their usage was recently audited and revised to never risk deadlock. It is hoped that most, if not all, of the remaining biosets can end up being the non-rescued version. Reviewed-by: Christoph Hellwig <hch@lst.de> Credit-to: Ming Lei <ming.lei@redhat.com> (minor fixes) Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-18 12:40:59 -06:00
NeilBrown	011067b056	blk: replace bioset_create_nobvec() with a flags arg to bioset_create() "flags" arguments are often seen as good API design as they allow easy extensibility. bioset_create_nobvec() is implemented internally as a variation in flags passed to __bioset_create(). To support future extension, make the internal structure part of the API. i.e. add a 'flags' argument to bioset_create() and discard bioset_create_nobvec(). Note that the bio_split allocations in drivers/md/raid* do not need the bvec mempool - they should have used bioset_create_nobvec(). Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-18 12:40:59 -06:00
NeilBrown	af67c31fba	blk: remove bio_set arg from blk_queue_split() blk_queue_split() is always called with the last arg being q->bio_split, where 'q' is the first arg. Also blk_queue_split() sometimes uses the passed-in 'bs' and sometimes uses q->bio_split. This is inconsistent and unnecessary. Remove the last arg and always use q->bio_split inside blk_queue_split() Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Credit-to: Javier González <jg@lightnvm.io> (Noticed that lightnvm was missed) Reviewed-by: Javier González <javier@cnexlabs.com> Tested-by: Javier González <javier@cnexlabs.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-18 12:40:59 -06:00
Lidong Zhong	8df7202439	md: change the initialization value for a spare device spot to MD_DISK_ROLE_SPARE The value for spare spot of sb->dev_roles is changed from MD_DISK_ROLE_FAULTY to MD_DISK_ROLE_SPARE to keep align with the value when the superblock is firstly created in userspace. Signed-off-by: Lidong Zhong <lzhong@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-06-16 12:04:09 -07:00
Guoqing Jiang	037d2ff62c	md/raid1: remove unused bio in sync_request_write The "bio" is not used in sync_request_write after commit a68e58703575 ("md/raid1: split out two sub-functions from sync_request_write"). Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-06-16 12:04:09 -07:00
Guoqing Jiang	1cdd125794	md/raid10: fix FailFast test for wrong device We need to test FailFast flag for replacement device here since the set up for writing is for the replacement, so we need fix it like: - if (test_bit(FailFast, &conf->mirrors[d].rdev->flags)) + if (test_bit(FailFast, &conf->mirrors[d].replacement->flags)) Since commit f90145f317ef ("md/raid10: add rcu protection to rdev access in raid10_sync_request.") had added the rcu protection for the part, so let's extend the range protected by rcu and use rdev directly. Fixes: 1919cbb ("md/raid10: add failfast handling for writes.") Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-06-16 12:04:08 -07:00
Jens Axboe	c27b2d634f	Merge branch 'nvme-4.13' of git://git.infradead.org/nvme into for-4.13/block Pull NVMe changes for 4.13 from Christoph: Highlights: - UUID identifier support from Johannes - Lots of cleanups from Sagi - Host Memory Buffer support from me And lots of cleanups and smaller fixes of course. Note that the UUID identifier changes are based on top of the uuid tree. I am the maintainer of that tree and will send it to Linus as soon as 4.12 is released as various other trees depend on it as well (and the diffstat includes those changes unfortunately)	2017-06-16 10:14:59 -06:00
Dan Williams	abebfbe2f7	dm: add ->flush() dax operation support Allow device-mapper to route flush operations to the per-target implementation. In order for the device stacking to work we need a dax_dev and a pgoff relative to that device. This gives each layer of the stack the information it needs to look up the operation pointer for the next level. This conceptually allows for an array of mixed device drivers with varying flush implementations. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-06-15 14:34:59 -07:00
Mike Snitzer	cd15fb64ee	Revert "dm mirror: use all available legs on multiple failures" This reverts commit 12a7cf5ba6c776a2621d8972c7d42e8d3d959d20. This commit apparently attempted to fix an issue that didn't really exist, furthermore: this commit is the source of deadlocks and crashes seen in multiple cases related to failing the primary mirror dev while syncing. Reported-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-06-15 08:39:15 -04:00
Dan Carpenter	047385b3dd	dm: missing break in process_queued_bios() his used to be a fall through case, but we shifted code around and I think we want a break here now. Fixes: 4e4cbee93d56 ("block: switch bios to blk_status_t") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-14 08:22:55 -06:00
Mikulas Patocka	f9c79bc05a	md: don't use flush_signals in userspace processes The function flush_signals clears all pending signals for the process. It may be used by kernel threads when we need to prepare a kernel thread for responding to signals. However using this function for an userspaces processes is incorrect - clearing signals without the program expecting it can cause misbehavior. The raid1 and raid5 code uses flush_signals in its request routine because it wants to prepare for an interruptible wait. This patch drops flush_signals and uses sigprocmask instead to block all signals (including SIGKILL) around the schedule() call. The signals are not lost, but the schedule() call won't respond to them. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Acked-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-06-13 10:18:02 -07:00
NeilBrown	cc27b0c78c	md: fix deadlock between mddev_suspend() and md_write_start() If mddev_suspend() races with md_write_start() we can deadlock with mddev_suspend() waiting for the request that is currently in md_write_start() to complete the ->make_request() call, and md_write_start() waiting for the metadata to be updated to mark the array as 'dirty'. As metadata updates done by md_check_recovery() only happen then the mddev_lock() can be claimed, and as mddev_suspend() is often called with the lock held, these threads wait indefinitely for each other. We fix this by having md_write_start() abort if mddev_suspend() is happening, and ->make_request() aborts if md_write_start() aborted. md_make_request() can detect this abort, decrease the ->active_io count, and wait for mddev_suspend(). Reported-by: Nix <nix@esperi.org.uk> Fix: 68866e425be2(MD: no sync IO while suspended) Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-06-13 10:18:01 -07:00
Christoph Hellwig	fdd050b5b3	Merge branch 'uuid-types' of bombadil.infradead.org:public_git/uuid into nvme-base	2017-06-13 11:45:14 +02:00
Ondrej Mosnáček	2ad50606f8	dm integrity: reject mappings too large for device dm-integrity would successfully create mappings with the number of sectors greater than the provided data sector count. Attempts to read sectors of this mapping that were beyond the provided data sector count would then yield run-time messages of the form "device-mapper: integrity: Too big sector number: ...". Fix this by emitting an error when the requested mapping size is bigger than the provided data sector count. Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-06-12 17:05:55 -04:00
Jens Axboe	8f66439eec	Linux 4.12-rc5 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJZPdbLAAoJEHm+PkMAQRiGx4wH/1nCjfnl6fE8oJ24/1gEAOUh biFdqJkYZmlLYHVtYfLm4Ueg4adJdg0wx6qM/4RaAzmQVvLfDV34bc1qBf1+P95G kVF+osWyXrZo5cTwkwapHW/KNu4VJwAx2D1wrlxKDVG5AOrULH1pYOYGOpApEkZU 4N+q5+M0ce0GJpqtUZX+UnI33ygjdDbBxXoFKsr24B7eA0ouGbAJ7dC88WcaETL+ 2/7tT01SvDMo0jBSV0WIqlgXwZ5gp3yPGnklC3F4159Yze6VFrzHMKS/UpPF8o8E W9EbuzwxsKyXUifX2GY348L1f+47glen/1sedbuKnFhP6E9aqUQQJXvEO7ueQl4= =m2Gx -----END PGP SIGNATURE----- Merge tag 'v4.12-rc5' into for-4.13/block We've already got a few conflicts and upcoming work depends on some of the changes that have gone into mainline as regression fixes for this series. Pull in 4.12-rc5 to resolve these conflicts and make it easier on down stream trees to continue working on 4.13 changes. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-12 08:30:13 -06:00
Dan Williams	7e026c8c0a	dm: add ->copy_from_iter() dax operation support Allow device-mapper to route copy_from_iter operations to the per-target implementation. In order for the device stacking to work we need a dax_dev and a pgoff relative to that device. This gives each layer of the stack the information it needs to look up the operation pointer for the next level. This conceptually allows for an array of mixed device drivers with varying copy_from_iter implementations. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-06-09 09:22:21 -07:00
Christoph Hellwig	4e4cbee93d	block: switch bios to blk_status_t Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-06-09 09:27:32 -06:00
Christoph Hellwig	fc17b6534e	blk-mq: switch ->queue_rq return value to blk_status_t Use the same values for use for request completion errors as the return value from ->queue_rq. BLK_STS_RESOURCE is special cased to cause a requeue, and all the others are completed as-is. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-06-09 09:27:32 -06:00
Christoph Hellwig	2a842acab1	block: introduce new block status code type Currently we use nornal Linux errno values in the block layer, and while we accept any error a few have overloaded magic meanings. This patch instead introduces a new blk_status_t value that holds block layer specific status codes and explicitly explains their meaning. Helpers to convert from and to the previous special meanings are provided for now, but I suspect we want to get rid of them in the long run - those drivers that have a errno input (e.g. networking) usually get errnos that don't know about the special block layer overloads, and similarly returning them to userspace will usually return somethings that strictly speaking isn't correct for file system operations, but that's left as an exercise for later. For now the set of errors is a very limited set that closely corresponds to the previous overloaded errno values, but there is some low hanging fruite to improve it. blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse typechecking, so that we can easily catch places passing the wrong values. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-06-09 09:27:32 -06:00
Christoph Hellwig	1be5690984	dm: change ->end_io calling convention Turn the error paramter into a pointer so that target drivers can change the value, and make sure only DM_ENDIO_* values are returned from the methods. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-06-09 09:27:32 -06:00
Christoph Hellwig	846785e6a5	dm: don't return errnos from ->map Instead use the special DM_MAPIO_KILL return value to return -EIO just like we do for the request based path. Note that dm-log-writes returned -ENOMEM in a few places, which now becomes -EIO instead. No consumer treats -ENOMEM special so this shouldn't be an issue (and it should use a mempool to start with to make guaranteed progress). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-06-09 09:27:32 -06:00
Christoph Hellwig	14ef1e4826	dm mpath: merge do_end_io_bio into multipath_end_io_bio This simplifies the code and especially the error passing a bit and will help with the next patch. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-06-09 09:27:32 -06:00
Christoph Hellwig	9966afaf91	dm: fix REQ_RAHEAD handling A few (but not all) dm targets use a special EWOULDBLOCK error code for failing REQ_RAHEAD requests that fail due to a lack of available resources. But no one else knows about this magic code, and lower level drivers also don't generate it when failing read-ahead requests for similar reasons. So remove this special casing and ignore all additional error handling for REQ_RAHEAD - if this was a real underlying error we'd get a normal read once the real read comes in. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-06-09 09:27:32 -06:00
NeilBrown	a415c0f106	md: initialise ->writes_pending in personality modules. The new per-cpu counter for writes_pending is initialised in md_alloc(), which is not called by dm-raid. So dm-raid fails when md_write_start() is called. Move the initialization to the personality modules that need it. This way it is always initialised when needed, but isn't unnecessarily initialized (requiring memory allocation) when the personality doesn't use writes_pending. Reported-by: Heinz Mauelshagen <heinzm@redhat.com> Fixes: 4ad23a976413 ("MD: use per-cpu counter for writes_pending") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-06-05 16:04:35 -07:00
Amir Goldstein	e6fd2093a8	md: namespace private helper names The md private helper uuid_equal() collides with a generic helper of the same name. Rename the md private helper to md_uuid_equal() and do the same for md_sb_equal(). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Shaohua Li <shli@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>	2017-06-05 16:56:37 +02:00
Linus Torvalds	60c42a31dc	- a DM verity fix for a mode when no salt is used - a fix to DM to account for the possibility that PREFLUSH or FUA are used without the SYNC flag if the underlying storage doesn't have a volatile write-cache - a DM ioctl memory allocation flag fix to use __GFP_HIGH to allow emergency forward progress (by using memory reserves as last resort) - a small DM integrity cleanup to use kvmalloc() instead of duplicating the same -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJZMZqnAAoJEMUj8QotnQNaB3UIAKDWS1bAAuPR6iiIdlffmZzf DxbCkpgCIHcmuihNKm+57P+Vwa7a5OFb7LSZhU7p2ormncrzEmLRP4D5twtyxTNt f1zT85gB3FXZVRDbcOP7yx2JuZdimu41kYBc9PaQuyTaNUl8lisnQhTVuJ9m0VO3 jdgQpSD0jQPhSFd/rA4hi75l2MeKwOJA2cXsfBgSxIW+rflYFi7SoNfG7wQattpU ArdDEgU++lRuk9FaGVHOIgTduhutk8NtRQFmxXZlE10KEf05SbBLVr1EGgKLVnyC 72+NI8OKeqic9R4phyWNZhErWnteXy9bbQtvBOpEIs8byFY176byzBkGBXjHebE= =MLRg -----END PGP SIGNATURE----- Merge tag 'for-4.12/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - a DM verity fix for a mode when no salt is used - a fix to DM to account for the possibility that PREFLUSH or FUA are used without the SYNC flag if the underlying storage doesn't have a volatile write-cache - a DM ioctl memory allocation flag fix to use __GFP_HIGH to allow emergency forward progress (by using memory reserves as last resort) - a small DM integrity cleanup to use kvmalloc() instead of duplicating the same * tag 'for-4.12/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm: make flush bios explicitly sync dm ioctl: restore __GFP_HIGH in copy_params() dm integrity: use kvmalloc() instead of dm_integrity_kvmalloc() dm verity: fix no salt use case	2017-06-02 11:50:37 -07:00
Jan Kara	5a8948f8a3	md: Make flush bios explicitely sync Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as synchronous" removed REQ_SYNC flag from WRITE_{FUA\|PREFLUSH\|...} definitions. generic_make_request_checks() however strips REQ_FUA and REQ_PREFLUSH flags from a bio when the storage doesn't report volatile write cache and thus write effectively becomes asynchronous which can lead to performance regressions Fix the problem by making sure all bios which are synchronous are properly marked with REQ_SYNC. CC: linux-raid@vger.kernel.org CC: Shaohua Li <shli@kernel.org> Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3 CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Shaohua Li <shli@fb.com>	2017-05-31 09:25:53 -07:00
Jan Kara	ff0361b34a	dm: make flush bios explicitly sync Commit b685d3d65ac7 ("block: treat REQ_FUA and REQ_PREFLUSH as synchronous") removed REQ_SYNC flag from WRITE_{FUA\|PREFLUSH\|...} definitions. generic_make_request_checks() however strips REQ_FUA and REQ_PREFLUSH flags from a bio when the storage doesn't report volatile write cache and thus write effectively becomes asynchronous which can lead to performance regressions. Fix the problem by making sure all bios which are synchronous are properly marked with REQ_SYNC. Fixes: b685d3d65ac7 ("block: treat REQ_FUA and REQ_PREFLUSH as synchronous") Cc: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-31 10:50:23 -04:00
Nix	e153903686	md: report sector of stripes with check mismatches This makes it possible, with appropriate filesystem support, for a sysadmin to tell what is affected by the mismatch, and whether it should be ignored (if it's inside a swap partition, for instance). We ratelimit to prevent log flooding: if there are so many mismatches that ratelimiting is necessary, the individual messages are relatively unlikely to be important (either the machine is swapping like crazy or something is very wrong with the disk). Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-05-24 16:31:13 -07:00
Kyungchan Koh	4179bc30b2	md: uuid debug statement now in processor byte order. Previously, the uuid debug statements were printed in little-endian format, which wasn't consistent in machines that might not be in little-endian byte order. With this change, the output will be consistent for all machines with different byte-ordering. Signed-off-by: Kyungchan Koh <kkc6196@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-05-24 15:58:43 -07:00
Junaid Shahid	8c1e2162f2	dm ioctl: restore __GFP_HIGH in copy_params() Commit d224e9381897 ("drivers/md/dm-ioctl.c: use kvmalloc rather than opencoded variant") left out the __GFP_HIGH flag when converting from __vmalloc to kvmalloc. This can cause the DM ioctl to fail in some low memory situations where it wouldn't have failed earlier. Add __GFP_HIGH back to avoid any potential regression. Fixes: d224e9381897 ("drivers/md/dm-ioctl.c: use kvmalloc rather than opencoded variant") Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-22 19:30:03 -04:00
Mikulas Patocka	702a6204f8	dm integrity: use kvmalloc() instead of dm_integrity_kvmalloc() Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-22 14:09:52 -04:00
Gilad Ben-Yossef	f52236e0b0	dm verity: fix no salt use case DM-Verity has an (undocumented) mode where no salt is used. This was never handled directly by the DM-Verity code, instead working due to the fact that calling crypto_shash_update() with a zero length data is an implicit noop. This is no longer the case now that we have switched to crypto_ahash_update(). Fix the issue by introducing explicit handling of the no salt use case to DM-Verity. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Reported-by: Marian Csontos <mcsontos@redhat.com> Fixes: d1ac3ff ("dm verity: switch to using asynchronous hash crypto API") Tested-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-22 13:49:03 -04:00
Guoqing Jiang	2dffdc0724	md-cluster: fix potential lock issue in add_new_disk The add_new_disk returns with communication locked if __sendmsg returns failure, fix it with call unlock_comm before return. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> CC: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-05-21 20:37:09 -07:00
Linus Torvalds	8b4822de59	Merge tag 'md/4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md Pull MD fixes from Shaohua Li: - Several bug fixes for raid5-cache from Song Liu, mainly handle journal disk error - Fix bad block handling in choosing raid1 disk from Tomasz Majchrzak - Simplify external metadata array sysfs handling from Artur Paszkiewicz - Optimize raid0 discard handling from me, now raid0 will dispatch large discard IO directly to underlayer disks. * tag 'md/4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: raid1: prefer disk without bad blocks md/r5cache: handle sync with data in write back cache md/r5cache: gracefully handle journal device errors for writeback mode md/raid1/10: avoid unnecessary locking md/raid5-cache: in r5l_do_submit_io(), submit io->split_bio first md/md0: optimize raid0 discard handling md: don't return -EAGAIN in md_allow_write for external metadata arrays md/raid5: make use of spin_lock_irq over local_irq_disable + spin_lock	2017-05-18 12:04:41 -07:00
Colin Ian King	7e1b9521f5	dm cache: handle kmalloc failure allocating background_tracker struct Currently there is no kmalloc failure check on the allocation of the background_tracker struct in btracker_create(), and so a NULL return will lead to a NULL pointer dereference. Add a NULL check. Detected by CoverityScan, CID#1416587 ("Dereference null return value") Fixes: b29d4986d ("dm cache: significant rework to leverage dm-bio-prison-v2") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-17 09:44:53 -04:00
Mikulas Patocka	13840d3801	dm bufio: make the parameter "retain_bytes" unsigned long Change the type of the parameter "retain_bytes" from unsigned to unsigned long, so that on 64-bit machines the user can set more than 4GiB of data to be retained. Also, change the type of the variable "count" in the function "__evict_old_buffers" to unsigned long. The assignment "count = c->n_buffers[LIST_CLEAN] + c->n_buffers[LIST_DIRTY];" could result in unsigned long to unsigned overflow and that could result in buffers not being freed when they should. While at it, avoid division in get_retain_buffers(). Division is slow, we can change it to shift because we have precalculated the log2 of block size. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-16 15:12:08 -04:00
Christoph Hellwig	f98e0eb680	dm mpath: multipath_clone_and_map must not return -EIO Since 412445ac ("dm: introduce a new DM_MAPIO_KILL return value"), the clone_and_map_rq methods must not return errno values, so fix it up to properly return DM_MAPIO_KILL, instead of the -EIO value that snuck in due to a conflict between two patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-15 15:09:53 -04:00
Christoph Hellwig	18a482f524	dm mpath: don't return -EIO from dm_report_EIO Instead just turn the macro into a helper for the warning message. This removes an unnecessary assignment and will allow the next commit to fix a place where -EIO is the wrong return value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-15 15:09:52 -04:00
Christoph Hellwig	ece0728037	dm rq: add a missing break to map_request We don't want to bug when receiving a DM_MAPIO_KILL value.. Fixes: 412445ac ("dm: introduce a new DM_MAPIO_KILL return value") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-15 15:09:51 -04:00
Joe Thornber	0377a07c7a	dm space map disk: fix some book keeping in the disk space map When decrementing the reference count for a block, the free count wasn't being updated if the reference count went to zero. Cc: stable@vger.kernel.org Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-15 15:09:50 -04:00
Joe Thornber	91bcdb92d3	dm thin metadata: call precommit before saving the roots These calls were the wrong way round in __write_initial_superblock. Cc: stable@vger.kernel.org Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-15 15:09:49 -04:00
Joe Thornber	2e63309507	dm cache policy smq: don't do any writebacks unless IDLE If there are no clean blocks to be demoted the writeback will be triggered at that point. Preemptively writing back can hurt high IO load scenarios. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-14 21:54:33 -04:00
Joe Thornber	49b7f76890	dm cache: simplify the IDLE vs BUSY state calculation Drop the MODERATE state since it wasn't buying us much. Also, in check_migrations(), prepare for the next commit ("dm cache policy smq: don't do any writebacks unless IDLE") by deferring to the policy to make the final decision on whether writebacks can be serviced. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-14 21:54:33 -04:00
Joe Thornber	701e03e4e1	dm cache: track all IO to the cache rather than just the origin device's IO IO tracking used to throttle writebacks when the origin device is busy. Even if all the IO is going to the fast device, writebacks can significantly degrade performance. So track all IO to gauge whether the cache is busy or not. Otherwise, synthetic IO tests (e.g. fio) that might send all IO to the fast device wouldn't cause writebacks to get throttled. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-14 21:54:33 -04:00
Joe Thornber	6cf4cc8f8b	dm cache policy smq: stop preemptively demoting blocks It causes a lot of churn if the working set's size is close to the fast device's size. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-05-14 21:54:33 -04:00

... 2 3 4 5 6 ...

5028 Commits