IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This commit implements processing of the REQ_OP_ZONE_RESET_ALL operation
for zoned mapped devices. Given that this operation always has a BIO
sector of 0 and a 0 size, processing through the regular BIO
__split_and_process_bio() function does not work because this function
would always select the first target. Instead, handling of this
operation is implemented using the function __send_zone_reset_all().
Similarly to the __send_empty_flush() function, the new
__send_zone_reset_all() function manually goes through all targets of a
mapped device table doing the following:
1) If the target can natively support REQ_OP_ZONE_RESET_ALL,
__send_duplicate_bios() is used to forward the reset all operation to
the target. This case is handled with the
__send_zone_reset_all_native() function.
2) For other targets, the function __send_zone_reset_all_emulated() is
executed to emulate the execution of REQ_OP_ZONE_RESET_ALL using
regular REQ_OP_ZONE_RESET operations.
Targets that can natively support REQ_OP_ZONE_RESET_ALL are identified
using the new target field zone_reset_all_supported. This boolean is set
to true in for targets that have reliable zone limits, that is, targets
that map all sequential write required zones of their zoned device(s).
Setting this field is handled in dm_set_zones_restrictions() and
device_get_zone_resource_limits().
For targets with unreliable zone limits, REQ_OP_ZONE_RESET_ALL must be
emulated (case 2 above). This is implemented with
__send_zone_reset_all_emulated() and is similar to the block layer
function blkdev_zone_reset_all_emulated(): first a report zones is done
for the zones of the target to identify zones that need reset, that is,
any sequential write required zone that is not already empty. This is
done using a bitmap and the function dm_zone_get_reset_bitmap() which
sets to 1 the bit corresponding to a zone that needs reset. Next, this
zone bitmap is inspected and a clone BIO modified to use the
REQ_OP_ZONE_RESET operation issued for any zone with its bit set in the
zone bitmap.
This implementation is more efficient than what the block layer does
with blkdev_zone_reset_all_emulated(), which is always used for DM zoned
devices currently: as we can natively use REQ_OP_ZONE_RESET_ALL on
targets mapping all sequential write required zones, resetting all zones
of a zoned mapped device can be much faster compared to always emulating
this operation using regular per-zone reset. In the worst case, this
implementation is as-efficient as the block layer emulation. This
reduction in the time it takes to reset all zones of a zoned mapped
device depends directly on the mapped device targets mapping (reliable
zone limits or not).
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240704052816.623865-4-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge in last round of queue limits changes from Christoph.
* for-6.11/block-limits: (26 commits)
block: move the bounce flag into the features field
block: move the skip_tagset_quiesce flag to queue_limits
block: move the pci_p2pdma flag to queue_limits
block: move the zone_resetall flag to queue_limits
block: move the zoned flag into the features field
block: move the poll flag to queue_limits
block: move the dax flag to queue_limits
block: move the nowait flag to queue_limits
block: move the synchronous flag to queue_limits
block: move the stable_writes flag to queue_limits
block: move the io_stat flag setting to queue_limits
block: move the add_random flag to queue_limits
block: move the nonrot flag to queue_limits
block: move cache control settings out of queue->flags
block: remove blk_flush_policy
block: freeze the queue in queue_attr_store
nbd: move setting the cache control flags to __nbd_set_size
virtio_blk: remove virtblk_update_cache_mode
loop: fold loop_update_rotational into loop_reconfigure_limits
loop: also use the default block size from an underlying block device
...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move the zoned flags into the features field to reclaim a little
bit of space.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240617060532.127975-23-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
With the switch to using the zone append emulation of the block layer
zone write plugging, the macro DM_ZONE_INVALID_WP_OFST is no longer used
in dm-zone.c. Remove its definition.
Fixes: f211268ed1 ("dm: Use the block layer zone append emulation")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20240611023639.89277-5-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The generic stacking of limits implemented in the block layer cannot
correctly handle stacking of zone resource limits (max open zones and
max active zones) because these limits are for an entire device but the
stacking may be for a portion of that device (e.g. a dm-linear target
that does not cover an entire block device). As a result, when DM
devices are created on top of zoned block devices, the DM device never
has any zone resource limits advertized, which is only correct if all
underlying target devices also have no zone resource limits.
If at least one target device has resource limits, the user may see
either performance issues (if the max open zone limit of the device is
exceeded) or write I/O errors if the max active zone limit of one of
the underlying target devices is exceeded.
While it is very difficult to correctly and reliably stack zone resource
limits in general, cases where targets are not sharing zone resources of
the same device can be dealt with relatively easily. Such situation
happens when a target maps all sequential zones of a zoned block device:
for such mapping, other targets mapping other parts of the same zoned
block device can only contain conventional zones and thus will not
require any zone resource to correctly handle write operations.
For a mapped device constructed with such targets, which includes mapped
devices constructed with targets mapping entire zoned block devices, the
zone resource limits can be reliably determined using the non-zero
minimum of the zone resource limits of all targets.
For mapped devices that include targets partially mapping the set of
sequential write required zones of zoned block devices, instead of
advertizing no zone resource limits, it is also better to set the mapped
device limits to the non-zero minimum of the limits of all targets. In
this case the limits for a target depend on the number of sequential
zones being mapped: if this number of zone is larger than the limits,
then the limits of the device apply and can be used. If on the other
hand the target maps a number of zones smaller than the limits, then no
limits is needed and we can assume that the target has no limits (limits
set to 0).
This commit improves zone resource limits handling as described above
by modifying dm_set_zones_restrictions() to iterate the targets of a
mapped device to evaluate the max open and max active zone limits. This
relies on an internal "stacking" of the limits of the target devices
combined with a direct counting of the number of sequential zones
mapped by the targets.
1) For a target mapping an entire zoned block device, the limits for the
target are set to the limits of the device.
2) For a target partially mapping a zoned block device, the number of
mapped sequential zones is used to determine the limits: if the
target maps more sequential write required zones than the device
limits, then the limits of the device are used as-is. If the number
of mapped sequential zones is lower than the limits, then we assume
that the target has no limits (limits set to 0).
As this evaluation is done for each target, the zone resource limits
for the mapped device are evaluated as the non-zero minimum of the
limits of all the targets.
For configurations resulting in unreliable limits, i.e. a table
containing a target partially mapping a zoned device, a warning message
is issued.
The counting of mapped sequential zones for the target is done using the
new function dm_device_count_zones() which performs a report zones on
the entire block device with the callback dm_device_count_zones_cb().
This count of mapped sequential zones is also used to determine if the
mapped device contains only conventional zones. This allows simplifying
dm_set_zones_restrictions() to not do a report zones just for this.
For mapped devices mapping only conventional zones, as before, the
mapped device is changed to a regular device by setting its zoned limit
to false and clearing all its zone related limits.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20240611023639.89277-4-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
dm_revalidate_zones() is called from dm_set_zone_restrictions() when the
mapped device queue limits are not yet set. However,
dm_revalidate_zones() calls blk_revalidate_disk_zones() and this
function consults and modifies the mapped device queue limits. Thus,
currently, blk_revalidate_disk_zones() operates on limits that are not
yet initialized.
Fix this by moving the call to dm_revalidate_zones() out of
dm_set_zone_restrictions() and into dm_table_set_restrictions() after
executing queue_limits_set().
To further cleanup dm_set_zones_restrictions(), the message about the
type of zone append (native or emulated) is also moved inside
dm_revalidate_zones().
Fixes: 1c0e720228 ("dm: use queue_limits_set")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20240611023639.89277-3-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Don't stuff the values directly into the queue without any
synchronization, but instead delay applying the queue limits in
the caller and let dm_set_zones_restrictions work on the limit
structure.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240527123634.1116952-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Fold it into the only caller in preparation to changes in the
queue limits setup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240527123634.1116952-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Keep it together with the rest of the zoned code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240527123634.1116952-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Using targets such as dm-linear, a mapped device can be created to
contain only conventional zones. Such device should not be treated as
zoned as it does not contain any mandatory sequential write required
zone. Since such device can be randomly written, we can modify
dm_set_zones_restrictions() to set the mapped device zoned queue limit
to false to expose it as a regular block device. The function
dm_check_zoned() does this after counting the number of conventional
zones of the mapped device and comparing it to the total number of zones
reported. The special dm_check_zoned_cb() report zones callback function
is used to count conventional zones.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
Link: https://lore.kernel.org/r/20240501110907.96950-2-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The only user of blk_revalidate_disk_zones() second argument was the
SCSI disk driver (sd). Now that this driver does not require this
update_driver_data argument, remove it to simplify the interface of
blk_revalidate_disk_zones(). Also update the function kdoc comment to
be more accurate (i.e. there is no gendisk ->revalidate method).
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
Tested-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240408014128.205141-21-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
For targets requiring zone append operation emulation with regular
writes (e.g. dm-crypt), we can use the block layer emulation provided by
zone write plugging. Remove DM implemented zone append emulation and
enable the block layer one.
This is done by setting the max_zone_append_sectors limit of the
mapped device queue to 0 for mapped devices that have a target table
that cannot support native zone append operations (e.g. dm-crypt).
Such mapped devices are flagged with the DMF_EMULATE_ZONE_APPEND flag.
dm_split_and_process_bio() is modified to execute
blk_zone_write_plug_bio() for such device to let the block layer
transform zone append operations into regular writes. This is done
after ensuring that the submitted BIO is split if it straddles zone
boundaries. Both changes are implemented unsing the inline helpers
dm_zone_write_plug_bio() and dm_zone_bio_needs_split() respectively.
dm_revalidate_zones() is also modified to use the block layer provided
function blk_revalidate_disk_zones() so that all zone resources needed
for zone append emulation are initialized by the block layer without DM
core needing to do anything. Since the device table is not yet live when
dm_revalidate_zones() is executed, enabling the use of
blk_revalidate_disk_zones() requires adding a pointer to the device
table in struct mapped_device. This avoids errors in
dm_blk_report_zones() trying to get the table with dm_get_live_table().
The mapped device table pointer is set to the table passed as argument
to dm_revalidate_zones() before calling blk_revalidate_disk_zones() and
reset to NULL after this function returns to restore the live table
handling for user call of report zones.
All the code related to zone append emulation is removed from
dm-zone.c. This leads to simplifications of the functions __map_bio()
and dm_zone_endio(). This later function now only needs to deal with
completions of real zone append operations for targets that support it.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
Tested-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240408014128.205141-13-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them.
It is less verbose and it improves the semantic.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
'GPL-2.0-only' is used instead of 'GPL-2.0' because SPDX has
deprecated its use.
Suggested-by: John Wiele <jwiele@redhat.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
being split acorss files.
- Improve DM core's BLK_STS_DM_REQUEUE and BLK_STS_AGAIN handling.
- Optimize DM core's more common bio splitting by eliminating the use
of bio cloning with bio_split+bio_chain. Shift that cloning cost to
the relatively unlikely dm_io requeue case that only occurs during
error handling. Introduces dm_io_rewind() that will clone a bio that
reflects the subset of the original bio that must be requeued.
- Remove DM core's dm_table_get_num_targets() wrapper and audit all
dm_table_get_target() callers.
- Fix potential for OOM with DM writecache target by setting a default
MAX_WRITEBACK_JOBS (set to 256MiB or 1/16 of total system memory,
whichever is smaller).
- Fix DM writecache target's stats that are reported through
DM-specific table info.
- Fix use-after-free crash in dm_sm_register_threshold_callback().
- Refine DM core's Persistent Reservation handling in preparation for
broader work Mike Christie is doing to add compatibility with
Microsoft Windows Failover Cluster.
- Fix various KASAN reported bugs in the DM raid target.
- Fix DM raid target crash due to md_handle_request() bio splitting
that recurses to block core without properly initializing the bio's
bi_dev.
- Fix some code comment typos and fix some Documentation formatting.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmLoAAUACgkQxSPxCi2d
A1rFUAf/RnLkzNPS1QJ1uVbiiW64zQUD2o5U8kAliOZxoXQ45U+CgL22a5VaT2R+
rI9+YSg/VX9YkdJwB6I+y7VRWjUCO3NfYOfQUX5NS8GfcPQQn2Zp+jy3t8VnjjXN
5+8Ylu5tX1+QSQiBB9JvIgiK71rSorT6H+KF6bZypL7te5kj39hDaRbmh40VOMOY
QWfs8DmyJft9IHPG7Mku/rFJbdVJph3f31IoqUOoGTOKoDPDUDezhCVHi2vpFcV+
S+iCKkeXmP6eUdCnNzBreYPTcP9OtCvkZEPhFg4lh+jyhjyFq1o0B8G5cRo5jvgr
GcN1GUT040sSqNXFquA4UU6RGvaxlg==
=gBRJ
-----END PGP SIGNATURE-----
Merge tag 'for-6.0/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Refactor DM core's mempool allocation so that it clearer by not being
split acorss files.
- Improve DM core's BLK_STS_DM_REQUEUE and BLK_STS_AGAIN handling.
- Optimize DM core's more common bio splitting by eliminating the use
of bio cloning with bio_split+bio_chain. Shift that cloning cost to
the relatively unlikely dm_io requeue case that only occurs during
error handling. Introduces dm_io_rewind() that will clone a bio that
reflects the subset of the original bio that must be requeued.
- Remove DM core's dm_table_get_num_targets() wrapper and audit all
dm_table_get_target() callers.
- Fix potential for OOM with DM writecache target by setting a default
MAX_WRITEBACK_JOBS (set to 256MiB or 1/16 of total system memory,
whichever is smaller).
- Fix DM writecache target's stats that are reported through
DM-specific table info.
- Fix use-after-free crash in dm_sm_register_threshold_callback().
- Refine DM core's Persistent Reservation handling in preparation for
broader work Mike Christie is doing to add compatibility with
Microsoft Windows Failover Cluster.
- Fix various KASAN reported bugs in the DM raid target.
- Fix DM raid target crash due to md_handle_request() bio splitting
that recurses to block core without properly initializing the bio's
bi_dev.
- Fix some code comment typos and fix some Documentation formatting.
* tag 'for-6.0/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (29 commits)
dm: fix dm-raid crash if md_handle_request() splits bio
dm raid: fix address sanitizer warning in raid_resume
dm raid: fix address sanitizer warning in raid_status
dm: Start pr_preempt from the same starting path
dm: Fix PR release handling for non All Registrants
dm: Start pr_reserve from the same starting path
dm: Allow dm_call_pr to be used for path searches
dm: return early from dm_pr_call() if DM device is suspended
dm thin: fix use-after-free crash in dm_sm_register_threshold_callback
dm writecache: count number of blocks discarded, not number of discard bios
dm writecache: count number of blocks written, not number of write bios
dm writecache: count number of blocks read, not number of read bios
dm writecache: return void from functions
dm kcopyd: use __GFP_HIGHMEM when allocating pages
dm writecache: set a default MAX_WRITEBACK_JOBS
Documentation: dm writecache: Render status list as list
Documentation: dm writecache: add blank line before optional parameters
dm snapshot: fix typo in snapshot_map() comment
dm raid: remove redundant "the" in parse_raid_params() comment
dm cache: fix typo in 2 comment blocks
...
Use the enum req_op type for request operations instead of unsigned int.
This patch fixes a sparse warning that has been introduced by making
enum req_op __bitwise.
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-30-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
All callers of dm_table_get_target() are expected to do proper bounds
checking on the index they pass.
Move dm_table_get_target() to dm-core.h to make it extra clear that only
DM core code should be using it. Switch it to be inlined while at it.
Standardize all DM core callers to use the same for loop pattern and
make associated variables as local as possible. Rename some variables
(e.g. s/table/t/ and s/tgt/ti/) along the way.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
More efficient and readable to just access table->num_targets directly.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Move the zone related fields that are currently stored in
struct request_queue to struct gendisk as these are part of the highlevel
block layer API and are only used for non-passthrough I/O that requires
the gendisk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220706070350.1703384-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pass a block_device instead of a request_queue as that is what most
callers have at hand.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20220706070350.1703384-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use bdev_is_zoned in all places where a block_device is available instead
of open coding it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220706070350.1703384-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
dm_zone_map_bio() is only called from __map_bio in which the io's
reference is grabbed already, and the reference won't be released
until the bio is submitted, so not necessary to do it dm_zone_map_bio
any more.
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Commit 0fbb4d93b3 ("dm: add dm_submit_bio_remap interface") changed
the alloc_io() function to delay the initialization of struct dm_io's
orig_bio member, leaving it NULL until after the dm_io and associated
user submitted bio is processed by __split_and_process_bio(). This
change causes a NULL pointer dereference in dm_zone_map_bio() when the
original user bio is inspected to detect the need for zone append
command emulation.
Fix this NULL pointer by updating dm_zone_map_bio() to not access
->orig_bio when the same info can be accessed from the clone of the
->orig_bio _before_ any ->map processing. Save off the bio_op() and
bio_sectors() for the clone and then use the saved orig_bio_details as
needed.
Fixes: 0fbb4d93b3 ("dm: add dm_submit_bio_remap interface")
Reported-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
There are no more end-users of REQ_OP_WRITE_SAME left, so we can start
deleting it.
Link: https://lore.kernel.org/r/20220209082828.2629273-7-hch@lst.de
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Make sure that the zone write pointer offset array is allocated with a
vmalloc in dm_zone_revalidate_cb() by passing GFP_KERNEL gfp flag to
kvcalloc(). However, since we do not want to trigger IOs while
revalidating zones, change dm_revalidate_zones() to have the zone scan
done in GFP_NOIO context using memalloc_noio_save/restore calls.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: bb37d77239 ("dm: introduce zone append emulation")
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
For zoned targets that cannot support zone append operations, implement
an emulation using regular write operations. If the original BIO
submitted by the user is a zone append operation, change its clone into
a regular write operation directed at the target zone write pointer
position.
To do so, an array of write pointer offsets (write pointer position
relative to the start of a zone) is added to struct mapped_device. All
operations that modify a sequential zone write pointer (writes, zone
reset, zone finish and zone append) are intersepted in __map_bio() and
processed using the new functions dm_zone_map_bio().
Detection of the target ability to natively support zone append
operations is done from dm_table_set_restrictions() by calling the
function dm_set_zones_restrictions(). A target that does not support
zone append operation, either by explicitly declaring it using the new
struct dm_target field zone_append_not_supported, or because the device
table contains a non-zoned device, has its mapped device marked with the
new flag DMF_ZONE_APPEND_EMULATED. The helper function
dm_emulate_zone_append() is introduced to test a mapped device for this
new flag.
Atomicity of the zones write pointer tracking and updates is done using
a zone write locking mechanism based on a bitmap. This is similar to
the block layer method but based on BIOs rather than struct request.
A zone write lock is taken in dm_zone_map_bio() for any clone BIO with
an operation type that changes the BIO target zone write pointer
position. The zone write lock is released if the clone BIO is failed
before submission or when dm_zone_endio() is called when the clone BIO
completes.
The zone write lock bitmap of the mapped device, together with a bitmap
indicating zone types (conv_zones_bitmap) and the write pointer offset
array (zwp_offset) are allocated and initialized with a full device zone
report in dm_set_zones_restrictions() using the function
dm_revalidate_zones().
For failed operations that may have modified a zone write pointer, the
zone write pointer offset is marked as invalid in dm_zone_endio().
Zones with an invalid write pointer offset are checked and the write
pointer updated using an internal report zone operation when the
faulty zone is accessed again by the user.
All functions added for this emulation have a minimal overhead for
zoned targets natively supporting zone append operations. Regular
device targets are also not affected. The added code also does not
impact builds with CONFIG_BLK_DEV_ZONED disabled by stubbing out all
dm zone related functions.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
A target map method requesting the requeue of a bio with
DM_MAPIO_REQUEUE or completing it with DM_ENDIO_REQUEUE can cause
unaligned write errors if the bio is a write operation targeting a
sequential zone. If a zoned target request such a requeue, warn about
it and kill the IO.
The function dm_is_zone_write() is introduced to detect write operations
to zoned targets.
This change does not affect the target drivers supporting zoned devices
and exposing a zoned device, namely dm-crypt, dm-linear and dm-flakey as
none of these targets ever request a requeue.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
To simplify the implementation of the report_zones operation of a zoned
target, introduce the function dm_report_zones() to set a target
mapping start sector in struct dm_report_zones_args and call
blkdev_report_zones(). This new function is exported and the report
zones callback function dm_report_zones_cb() is not.
dm-linear, dm-flakey and dm-crypt are modified to use dm_report_zones().
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Move core and table code used for zoned targets and conditionally
defined with #ifdef CONFIG_BLK_DEV_ZONED to the new file dm-zone.c.
This file is conditionally compiled depending on CONFIG_BLK_DEV_ZONED.
The small helper dm_set_zones_restrictions() is introduced to
initialize a mapped device request queue zone attributes in
dm_table_set_restrictions().
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>