IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
The discard_alignment queue limit is named a bit misleading means the
offset into the block device at which the discard granularity starts.
Setting it to the discard granularity as done by nbd is mostly harmless
but also useless.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220418045314.360785-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The discard_alignment queue limit is named a bit misleading means the
offset into the block device at which the discard granularity starts.
Setting it to the discard granularity as done by ubd is mostly harmless
but also useless.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220418045314.360785-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull MD updates from Song:
"1. Improve annotation in raid5 code, by Logan Gunthorpe.
2. Support MD_BROKEN flag in raid-1/5/10, by Mariusz Tkaczyk.
3. Other small fixes/cleanups."
* 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md: Replace role magic numbers with defined constants
md/raid0: Ignore RAID0 layout if the second zone has only one device
md/raid5: Annotate functions that hold device_lock with __must_hold
md/raid5-ppl: Annotate with rcu_dereference_protected()
md/raid5: Annotate rdev/replacement access when mddev_lock is held
md/raid5: Annotate rdev/replacement accesses when nr_pending is elevated
md/raid5: Add __rcu annotation to struct disk_info
md/raid5: Un-nest struct raid5_percpu definition
md/raid5: Cleanup setup_conf() error returns
md: replace deprecated strlcpy & remove duplicated line
md/bitmap: don't set sb values if can't pass sanity check
md: fix an incorrect NULL check in md_reload_sb
md: fix an incorrect NULL check in does_sb_need_changing
raid5: introduce MD_BROKEN
md: Set MD_BROKEN for RAID1 and RAID10
Total 16 bytes can be saved in two ways:
1) The field 'bio' will only be used in bio based mode, and the field
'rq' will only be used in mq mode. Since they won't be used in the
same time, declare a union for them.
2) The field 'bool fake_timeout' can be placed in the hole after the
field 'error'.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20220426022133.3999006-1-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There are several instances where magic numbers are used in md.c instead
of the defined constants in md_p.h. This patch set improves code
readability by replacing all occurrences of 0xffff, 0xfffe, and 0xfffd when
relating to md roles with their equivalent defined constant.
Signed-off-by: David Sloan <david.sloan@eideticom.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Song Liu <song@kernel.org>
The RAID0 layout is irrelevant if all members have the same size so the
array has only one zone. It is *also* irrelevant if the array has two
zones and the second zone has only one device, for example if the array
has two members of different sizes.
So in that case it makes sense to allow assembly even when the layout is
undefined, like what is done when the array has only one zone.
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Pascal Hambourg <pascal@plouf.fr.eu.org>
Signed-off-by: Song Liu <song@kernel.org>
A handful of functions note the device_lock must be held with a comment
but this is not comprehensive. Many other functions hold the lock when
taken so add an __must_hold() to each call to annotate when the lock is
held.
This makes it a bit easier to analyse device_lock.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
To suppress the last remaining sparse warnings about accessing
rdev, add rcu_dereference_protected calls to a couple places
in raid5-ppl. All of these places are called under raid5_run and
therefore are occurring before the array has started and is thus
safe.
There's no sensible check to do for the second argument of
rcu_dereference_protected() so a comment is added instead.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
The mddev_lock should be held during raid5_remove_disk() which is when
the rdev/replacement pointers are modified. So any access to these
pointers marked __rcu should be safe whenever the mddev_lock is held.
There are numerous such access that currently produce sparse warnings.
Add a helper function, rdev_mdlock_deref() that wraps
rcu_dereference_protected() in all these instances.
This annotation fixes a number of sparse warnings.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
There are a number of accesses to __rcu variables that should be safe
because nr_pending in the disk is known to be elevated.
Create a wrapper around rcu_dereference_protected() to annotate these
accesses and verify that nr_pending is non-zero.
This fixes a number of sparse warnings.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
rdev and replacement are protected in some circumstances with
rcu_dereference and synchronize_rcu (in raid5_remove_disk()). However,
they were not annotated with __rcu so a sparse warning is emitted for
every rcu_dereference() call.
Add the __rcu annotation and fix up the initialization with
RCU_INIT_POINTER, all pointer modifications with rcu_assign_pointer(),
a few cases where the pointer value is tested with rcu_access_pointer()
and one case where READ_ONCE() is used instead of rcu_dereference(),
a case in print_raid5_conf() that should have rcu_dereference() and
rcu_read_[un]lock() calls.
Additional sparse issues will be fixed up in further commits.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
Sparse reports many warnings of the form:
drivers/md/raid5.c:1476:16: warning: dereference of noderef expression
This is because all struct raid5_percpu definitions get marked as
__percpu when really only the pointer in r5conf should have that
annotation.
Fix this by moving the defnition of raid5_precpu out of the definition
of struct r5conf.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
Be more careful about the error returns. Most errors in this function
are actually ENOMEM, but it forcibly returns EIO if conf has been
allocated.
Instead return ret and ensure it is set appropriately before each goto
abort.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
This commit includes two topics:
1> replace deprecated strlcpy
change strlcpy to strscpy for strlcpy is marked as deprecated in
Documentation/process/deprecated.rst
2> remove duplicated strlcpy line
in md_bitmap_read_sb@md-bitmap.c there are two duplicated strlcpy(), the
history:
- commit cf921cc19c ("Add node recovery callbacks") introduced the first
usage of strlcpy().
- commit b97e92574c ("Use separate bitmaps for each nodes in the cluster")
introduced the second strlcpy(). this time, the two strlcpy() are same,
we can remove anyone safely.
- commit d3b178adb3 ("md: Skip cluster setup for dm-raid") added dm-raid
special handling. And the "nodes" value is the key of this patch. but
from this patch, strlcpy() which was introduced by b97e92574c
become necessary.
- commit 3c462c880b ("md: Increment version for clustered bitmaps") used
clustered major version to only handle in clustered env. this patch
could look a polishment for clustered code logic.
So cf921cc19c became useless after d3b178adb3, we could remove it
safely.
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: Song Liu <song@kernel.org>
The bug is here:
if (!rdev || rdev->desc_nr != nr) {
The list iterator value 'rdev' will *always* be set and non-NULL
by rdev_for_each_rcu(), so it is incorrect to assume that the
iterator value will be NULL if the list is empty or no element
found (In fact, it will be a bogus pointer to an invalid struct
object containing the HEAD). Otherwise it will bypass the check
and lead to invalid memory access passing the check.
To fix the bug, use a new variable 'iter' as the list iterator,
while using the original variable 'pdev' as a dedicated pointer to
point to the found element.
Cc: stable@vger.kernel.org
Fixes: 70bcecdb15 ("md-cluster: Improve md_reload_sb to be less error prone")
Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Signed-off-by: Song Liu <song@kernel.org>
The bug is here:
if (!rdev)
The list iterator value 'rdev' will *always* be set and non-NULL
by rdev_for_each(), so it is incorrect to assume that the iterator
value will be NULL if the list is empty or no element found.
Otherwise it will bypass the NULL check and lead to invalid memory
access passing the check.
To fix the bug, use a new variable 'iter' as the list iterator,
while using the original variable 'rdev' as a dedicated pointer to
point to the found element.
Cc: stable@vger.kernel.org
Fixes: 2aa82191ac ("md-cluster: Perform a lazy update")
Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Acked-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Song Liu <song@kernel.org>
Raid456 module had allowed to achieve failed state. It was fixed by
fb73b357fb ("raid5: block failing device if raid will be failed").
This fix introduces a bug, now if raid5 fails during IO, it may result
with a hung task without completion. Faulty flag on the device is
necessary to process all requests and is checked many times, mainly in
analyze_stripe().
Allow to set faulty on drive again and set MD_BROKEN if raid is failed.
As a result, this level is allowed to achieve failed state again, but
communication with userspace (via -EBUSY status) will be preserved.
This restores possibility to fail array via #mdadm --set-faulty command
and will be fixed by additional verification on mdadm side.
Reproduction steps:
mdadm -CR imsm -e imsm -n 3 /dev/nvme[0-2]n1
mdadm -CR r5 -e imsm -l5 -n3 /dev/nvme[0-2]n1 --assume-clean
mkfs.xfs /dev/md126 -f
mount /dev/md126 /mnt/root/
fio --filename=/mnt/root/file --size=5GB --direct=1 --rw=randrw
--bs=64k --ioengine=libaio --iodepth=64 --runtime=240 --numjobs=4
--time_based --group_reporting --name=throughput-test-job
--eta-newline=1 &
echo 1 > /sys/block/nvme2n1/device/device/remove
echo 1 > /sys/block/nvme1n1/device/device/remove
[ 1475.787779] Call Trace:
[ 1475.793111] __schedule+0x2a6/0x700
[ 1475.799460] schedule+0x38/0xa0
[ 1475.805454] raid5_get_active_stripe+0x469/0x5f0 [raid456]
[ 1475.813856] ? finish_wait+0x80/0x80
[ 1475.820332] raid5_make_request+0x180/0xb40 [raid456]
[ 1475.828281] ? finish_wait+0x80/0x80
[ 1475.834727] ? finish_wait+0x80/0x80
[ 1475.841127] ? finish_wait+0x80/0x80
[ 1475.847480] md_handle_request+0x119/0x190
[ 1475.854390] md_make_request+0x8a/0x190
[ 1475.861041] generic_make_request+0xcf/0x310
[ 1475.868145] submit_bio+0x3c/0x160
[ 1475.874355] iomap_dio_submit_bio.isra.20+0x51/0x60
[ 1475.882070] iomap_dio_bio_actor+0x175/0x390
[ 1475.889149] iomap_apply+0xff/0x310
[ 1475.895447] ? iomap_dio_bio_actor+0x390/0x390
[ 1475.902736] ? iomap_dio_bio_actor+0x390/0x390
[ 1475.909974] iomap_dio_rw+0x2f2/0x490
[ 1475.916415] ? iomap_dio_bio_actor+0x390/0x390
[ 1475.923680] ? atime_needs_update+0x77/0xe0
[ 1475.930674] ? xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
[ 1475.938455] xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
[ 1475.946084] xfs_file_read_iter+0xba/0xd0 [xfs]
[ 1475.953403] aio_read+0xd5/0x180
[ 1475.959395] ? _cond_resched+0x15/0x30
[ 1475.965907] io_submit_one+0x20b/0x3c0
[ 1475.972398] __x64_sys_io_submit+0xa2/0x180
[ 1475.979335] ? do_io_getevents+0x7c/0xc0
[ 1475.986009] do_syscall_64+0x5b/0x1a0
[ 1475.992419] entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 1476.000255] RIP: 0033:0x7f11fc27978d
[ 1476.006631] Code: Bad RIP value.
[ 1476.073251] INFO: task fio:3877 blocked for more than 120 seconds.
Cc: stable@vger.kernel.org
Fixes: fb73b357fb ("raid5: block failing device if raid will be failed")
Reviewd-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Song Liu <song@kernel.org>
There is no direct mechanism to determine raid failure outside
personality. It is done by checking rdev->flags after executing
md_error(). If "faulty" flag is not set then -EBUSY is returned to
userspace. -EBUSY means that array will be failed after drive removal.
Mdadm has special routine to handle the array failure and it is executed
if -EBUSY is returned by md.
There are at least two known reasons to not consider this mechanism
as correct:
1. drive can be removed even if array will be failed[1].
2. -EBUSY seems to be wrong status. Array is not busy, but removal
process cannot proceed safe.
-EBUSY expectation cannot be removed without breaking compatibility
with userspace. In this patch first issue is resolved by adding support
for MD_BROKEN flag for RAID1 and RAID10. Support for RAID456 is added in
next commit.
The idea is to set the MD_BROKEN if we are sure that raid is in failed
state now. This is done in each error_handler(). In md_error() MD_BROKEN
flag is checked. If is set, then -EBUSY is returned to userspace.
As in previous commit, it causes that #mdadm --set-faulty is able to
fail array. Previously proposed workaround is valid if optional
functionality[1] is disabled.
[1] commit 9a567843f7ce("md: allow last device to be forcibly removed from
RAID1/RAID10.")
Reviewd-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Song Liu <song@kernel.org>
lo_refcount counts how many openers a loop device has, but that count
is already provided by the block layer in the bd_openers field of the
whole-disk block_device. Remove lo_refcount and allow opens to
succeed even on devices beeing deleted - now that ->free_disk is
implemented we can handle that race gracefull and all I/O on it will
just fail. Similarly there is a small race window now where
loop_control_remove does not synchronize the delete vs the remove
due do bd_openers not being under lo_mutex protection, but we can
handle that just as gracefully.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220330052917.2566582-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Since ->release is called with disk->open_mutex held, and __loop_clr_fd()
from lo_release() is called via ->release when disk_openers() == 0, we are
guaranteed that "struct file" which will be passed to loop_validate_file()
via fget() cannot be the loop device __loop_clr_fd(lo, true) will clear.
Thus, there is no need to hold loop_validate_mutex from __loop_clr_fd()
if release == true.
When I made commit 3ce6e1f662 ("loop: reintroduce global lock for
safe loop_validate_file() traversal"), I wrote "It is acceptable for
loop_validate_file() to succeed, for actual clear operation has not started
yet.". But now I came to feel why it is acceptable to succeed.
It seems that the loop driver was added in Linux 1.3.68, and
if (lo->lo_refcnt > 1)
return -EBUSY;
check in loop_clr_fd() was there from the beginning. The intent of this
check was unclear. But now I think that current
disk_openers(lo->lo_disk) > 1
form is there for three reasons.
(1) Avoid I/O errors when some process which opens and reads from this
loop device in response to uevent notification (e.g. systemd-udevd),
as described in commit a1ecac3b06 ("loop: Make explicit loop
device destruction lazy"). This opener is short-lived because it is
likely that the file descriptor used by that process is closed soon.
(2) Avoid I/O errors caused by underlying layer of stacked loop devices
(i.e. ioctl(some_loop_fd, LOOP_SET_FD, other_loop_fd)) being suddenly
disappeared. This opener is long-lived because this reference is
associated with not a file descriptor but lo->lo_backing_file.
(3) Avoid I/O errors caused by underlying layer of mounted loop device
(i.e. mount(some_loop_device, some_mount_point)) being suddenly
disappeared. This opener is long-lived because this reference is
associated with not a file descriptor but mount.
While race in (1) might be acceptable, (2) and (3) should be checked
racelessly. That is, make sure that __loop_clr_fd() will not run if
loop_validate_file() succeeds, by doing refcount check with global lock
held when explicit loop device destruction is requested.
As a result of no longer waiting for lo->lo_mutex after setting Lo_rundown,
we can remove pointless BUG_ON(lo->lo_state != Lo_rundown) check.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220330052917.2566582-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently, udev change event is generated for a loop device before the
device is ready for IO. Due to serialization on lo->lo_mutex in
lo_open() this does not matter because anybody is able to open the
device and do IO only after the configuration is finished. However this
synchronization in lo_open() is going away so make sure userspace
reacting to the change event will see the new device state by generating
the event only when the device is setup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220330052917.2566582-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ensure that the lo_device which is stored in the gendisk private
data is valid until the gendisk is freed. Currently the loop driver
uses a lot of effort to make sure a device is not freed when it is
still in use, but to to fix a potential deadlock this will be relaxed
a bit soon.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220330052917.2566582-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
->release is only called after all outstanding I/O has completed, so only
freeze the queue when clearing the backing file of a live loop device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20220330052917.2566582-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
By the time the final ->release is called there can't be outstanding I/O.
For non-final ->release there is no need for driver action at all.
Thus remove the useless queue freeze.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20220330052917.2566582-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Nothing prevents a file system or userspace opener of the block device
from redirtying the page right afte sync_blockdev returned. Fortunately
data in the page cache during a block device change is mostly harmless
anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20220330052917.2566582-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is no need to reinitialize idle_worker_list, worker_tree and timer
every time a loop device is configured. Just initialize them once at
allocation time.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20220330052917.2566582-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use a common helper for both timer based and uncoditional freeing of idle
workers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20220330052917.2566582-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
All manipulation of bd_openers is under disk->open_mutex and will remain
so for the foreseeable future. But at least one place reads it without
the lock (blkdev_get) and there are more to be added. So make sure the
compiler does not do turn the increments and decrements into non-atomic
sequences by using an atomic_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220330052917.2566582-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a helper that returns the openers for a given gendisk to avoid having
drivers poke into disk->part0 to get at this information in a somewhat
cumbersome way.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220330052917.2566582-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Remove the bdev variable and just use the gendisk pointed to by the
zram_device directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220330052917.2566582-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use a local variable for the gendisk instead of the part0 block_device,
as the gendisk is what this function actually operates on.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220330052917.2566582-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The bdev parameter to ->ioctl contains the block device that the ioctl
is called on, which can be the partition. But the openers check in
nbd_bdev_reset really needs to check use the whole device, so switch to
using that.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220330052917.2566582-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Return boolean values ("true" or "false") instead of 1 or 0 from bool
functions. This fixes the following warnings from coccicheck:
./drivers/block/drbd/drbd_req.c:912:9-10: WARNING: return of 0/1 in
function 'remote_due_to_read_balancing' with return type bool
Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220406190715.1938174-8-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Instead of invoking a synchronize_rcu() to free a pointer
after a grace period we can directly make use of new API
that does the same but in more efficient way.
TO: Jens Axboe <axboe@kernel.dk>
TO: Philipp Reisner <philipp.reisner@linbit.com>
TO: Jason Gunthorpe <jgg@nvidia.com>
TO: drbd-dev@lists.linbit.com
TO: linux-block@vger.kernel.org
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220406190715.1938174-7-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
when run checkpath.pl for the first patch, found that
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'.
so fix it. BTW
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220406190715.1938174-6-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Variable err is set to '-EIO' but this value is never read as
it is overwritten or not used later on, hence it is a redundant
assignment and can be removed.
Clean up the following clang-analyzer warning:
drivers/block/drbd/drbd_receiver.c:3955:5: warning: Value stored to
'err' is never read [clang-analyzer-deadcode.DeadStores].
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220406190715.1938174-4-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
gcc -Wextra warns about mixing drbd_state_rv with drbd_ret_code
in a couple of places:
drivers/block/drbd/drbd_nl.c: In function 'drbd_adm_set_role':
drivers/block/drbd/drbd_nl.c:777:14: warning: comparison between 'enum drbd_state_rv' and 'enum drbd_ret_code' [-Wenum-compare]
777 | if (retcode != NO_ERROR)
| ^~
drivers/block/drbd/drbd_nl.c:784:12: warning: implicit conversion from 'enum drbd_ret_code' to 'enum drbd_state_rv' [-Wenum-conversion]
784 | retcode = ERR_MANDATORY_TAG;
| ^
drivers/block/drbd/drbd_nl.c: In function 'drbd_adm_attach':
drivers/block/drbd/drbd_nl.c:1965:10: warning: implicit conversion from 'enum drbd_state_rv' to 'enum drbd_ret_code' [-Wenum-conversion]
1965 | retcode = rv; /* FIXME: Type mismatch. */
| ^
drivers/block/drbd/drbd_nl.c: In function 'drbd_adm_connect':
drivers/block/drbd/drbd_nl.c:2690:10: warning: implicit conversion from 'enum drbd_state_rv' to 'enum drbd_ret_code' [-Wenum-conversion]
2690 | retcode = conn_request_state(connection, NS(conn, C_UNCONNECTED), CS_VERBOSE);
| ^
drivers/block/drbd/drbd_nl.c: In function 'drbd_adm_disconnect':
drivers/block/drbd/drbd_nl.c:2803:11: warning: implicit conversion from 'enum drbd_state_rv' to 'enum drbd_ret_code' [-Wenum-conversion]
2803 | retcode = rv; /* FIXME: Type mismatch. */
| ^
In each case, both are passed into drbd_adm_finish(), which just takes
a 32-bit integer and is happy with either, presumably intentionally.
Restructure the code to pass either type directly in there in most
cases, avoiding the warnings.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220406190715.1938174-3-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There are two initializers for P_RETRY_WRITE:
drivers/block/drbd/drbd_main.c:3676:22: warning: initialized field overwritten [-Woverride-init]
Remove the first one since it was already ignored by the compiler
and reorder the list to match the enum definition. As P_ZEROES had
no entry, add that one instead.
Fixes: 036b17eaab ("drbd: Receiving part for the PROTOCOL_UPDATE packet")
Fixes: f31e583aa2 ("drbd: introduce P_ZEROES (REQ_OP_WRITE_ZEROES on the "wire")")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220406190715.1938174-2-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Randomly poking into block device internals for manual prefetches isn't
exactly a very maintainable thing to do. And none of the performance
critical direct I/O implementations still use this library function
anyway, so just drop it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20220415045258.199825-28-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Secure erase is a very different operation from discard in that it is
a data integrity operation vs hint. Fully split the limits and helper
infrastructure to make the separation more clear.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd]
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nifs2]
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> [f2fs]
Acked-by: Coly Li <colyli@suse.de> [bcache]
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Acked-by: Chao Yu <chao@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-27-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Abstract away implementation details from file systems by providing a
block_device based helper to retrieve the discard granularity.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd]
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Link: https://lore.kernel.org/r/20220415045258.199825-26-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Just use a non-zero max_discard_sectors as an indicator for discard
support, similar to what is done for write zeroes.
The only places where needs special attention is the RAID5 driver,
which must clear discard support for security reasons by default,
even if the default stacking rules would allow for it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd]
Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390]
Acked-by: Coly Li <colyli@suse.de> [bcache]
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a helper to query the number of sectors support per each discard bio
based on the block device and use this helper to stop various places from
poking into the request_queue to see if discard is supported and if so how
much. This mirrors what is done e.g. for write zeroes as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd]
Acked-by: Coly Li <colyli@suse.de> [bcache]
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-24-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move all the logic to limit the discard bio size into a common helper
so that it is better documented.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Coly Li <colyli@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-23-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
No need to inline these fairly larger helpers. Also fix the return value
to be unsigned, just like the field in struct queue_limits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220415045258.199825-22-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>