for-5.16/block-2021-10-29
-----BEGIN PGP SIGNATURE-----

iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmF8KDgQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpmQ2D/wO0nH3U+3+OZChi3XUwYck9Dev3o6BANCF
ClATiK/kivZY0xY1r8J4ixirZo2gcjIMpWSC3JGYZ5LdspfmYGLUbMjfZsaeU23i
lAKaX1IqfArmHN76k3IU1bKCg7B0/LFwC0q9QTFWTSwNSs8RK/EZLJ61U1hEXUb3
OfIpaMmvPiMaU7yuPqhcZK14m1cg1srrLM4rFB/PqsWWStF07pHq32WeArGDAU0e
Fe0YSnYD7qqA5Qc37KwqjCTmmxKX5YZf7etIcA6p3DNmwcuQrVNzKoCH/ZEDijaD
E2bS/BWbN1x96+rtoEZfBYEaNIrkmJzmW6+fJ53OITbJF3KqP6V66erhqNcFYCzC
mhFlRe7voXb/8AP7zQqSIhK529BUBM36sQ6nF7EiQcDrfLc1z39mq6eblUxbknIA
DDPISD5Tseik9N9x0bc7vINseKyHI1E90VAU/XKADcuGbzLvehPx+2p+Iq5ch5Ah
oa1G3RdlWWQOZxphJHWJhu1qMfo5+FP9dFZj1aoo7b8Kbc/CedyoQe71cpIE5wNh
Jj/EpWJnuyKXwuTic2VYGC+6ezM9O5DSdqCfP3YuZky95VESyvRCKJYMMgBYRVdC
/LuxhnBXIY2G8An7ZTnX0kLCCvLbapIwa0NyA98/xeOngO843coJ6wn8ZmE9LJNH
kMmpCygUrA==
=QWC+
-----END PGP SIGNATURE-----

Merge tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - mq-deadline accounting improvements (Bart)

 - blk-wbt timer fix (Andrea)

 - Untangle the block layer includes (Christoph)

 - Rework the poll support to be bio based, which will enable adding
   support for polling for bio based drivers (Christoph)

 - Block layer core support for multi-actuator drives (Damien)

 - blk-crypto improvements (Eric)

 - Batched tag allocation support (me)

 - Request completion batching support (me)

 - Plugging improvements (me)

 - Shared tag set improvements (John)

 - Concurrent queue quiesce support (Ming)

 - Cache bdev in ->private_data for block devices (Pavel)

 - bdev dio improvements (Pavel)

 - Block device invalidation and block size improvements (Xie)

 - Various cleanups, fixes, and improvements (Christoph, Jackie,
   Masahira, Tejun, Yu, Pavel, Zheng, me)

* tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-block: (174 commits)
  blk-mq-debugfs: Show active requests per queue for shared tags
  block: improve readability of blk_mq_end_request_batch()
  virtio-blk: Use blk_validate_block_size() to validate block size
  loop: Use blk_validate_block_size() to validate block size
  nbd: Use blk_validate_block_size() to validate block size
  block: Add a helper to validate the block size
  block: re-flow blk_mq_rq_ctx_init()
  block: prefetch request to be initialized
  block: pass in blk_mq_tags to blk_mq_rq_ctx_init()
  block: add rq_flags to struct blk_mq_alloc_data
  block: add async version of bio_set_polled
  block: kill DIO_MULTI_BIO
  block: kill unused polling bits in __blkdev_direct_IO()
  block: avoid extra iter advance with async iocb
  block: Add independent access ranges support
  blk-mq: don't issue request directly in case that current is to be blocked
  sbitmap: silence data race warning
  blk-cgroup: synchronize blkg creation against policy deactivation
  block: refactor bio_iov_bvec_set()
  block: add single bio async direct IO helper
  ...
commit
33c8846c81
@@ -7,230 +7,269 @@ Inline Encryption
Background
==========

Inline encryption hardware sits logically between memory and the disk, and can
en/decrypt data as it goes in/out of the disk. Inline encryption hardware has a
fixed number of "keyslots" - slots into which encryption contexts (i.e. the
encryption key, encryption algorithm, data unit size) can be programmed by the
kernel at any time. Each request sent to the disk can be tagged with the index
of a keyslot (and also a data unit number to act as an encryption tweak), and
the inline encryption hardware will en/decrypt the data in the request with the
encryption context programmed into that keyslot. This is very different from
full disk encryption solutions like self encrypting drives/TCG OPAL/ATA
Security standards, since with inline encryption, any block on disk could be
encrypted with any encryption context the kernel chooses.
Inline encryption hardware sits logically between memory and disk, and can
en/decrypt data as it goes in/out of the disk. For each I/O request, software
can control exactly how the inline encryption hardware will en/decrypt the data
in terms of key, algorithm, data unit size (the granularity of en/decryption),
and data unit number (a value that determines the initialization vector(s)).

Some inline encryption hardware accepts all encryption parameters including raw
keys directly in low-level I/O requests. However, most inline encryption
hardware instead has a fixed number of "keyslots" and requires that the key,
algorithm, and data unit size first be programmed into a keyslot. Each
low-level I/O request then just contains a keyslot index and data unit number.

Note that inline encryption hardware is very different from traditional crypto
accelerators, which are supported through the kernel crypto API. Traditional
crypto accelerators operate on memory regions, whereas inline encryption
hardware operates on I/O requests. Thus, inline encryption hardware needs to be
managed by the block layer, not the kernel crypto API.

Inline encryption hardware is also very different from "self-encrypting drives",
such as those based on the TCG Opal or ATA Security standards. Self-encrypting
drives don't provide fine-grained control of encryption and provide no way to
verify the correctness of the resulting ciphertext. Inline encryption hardware
provides fine-grained control of encryption, including the choice of key and
initialization vector for each sector, and can be tested for correctness.

Objective
=========

We want to support inline encryption (IE) in the kernel.
To allow for testing, we also want a crypto API fallback when actual
IE hardware is absent. We also want IE to work with layered devices
like dm and loopback (i.e. we want to be able to use the IE hardware
of the underlying devices if present, or else fall back to crypto API
en/decryption).

We want to support inline encryption in the kernel. To make testing easier, we
also want support for falling back to the kernel crypto API when actual inline
encryption hardware is absent. We also want inline encryption to work with
layered devices like device-mapper and loopback (i.e. we want to be able to use
the inline encryption hardware of the underlying devices if present, or else
fall back to crypto API en/decryption).

Constraints and notes
=====================

- IE hardware has a limited number of "keyslots" that can be programmed
with an encryption context (key, algorithm, data unit size, etc.) at any time.
One can specify a keyslot in a data request made to the device, and the
device will en/decrypt the data using the encryption context programmed into
that specified keyslot. When possible, we want to make multiple requests with
the same encryption context share the same keyslot.
- We need a way for upper layers (e.g. filesystems) to specify an encryption
context to use for en/decrypting a bio, and device drivers (e.g. UFSHCD) need
to be able to use that encryption context when they process the request.
Encryption contexts also introduce constraints on bio merging; the block layer
needs to be aware of these constraints.

- We need a way for upper layers like filesystems to specify an encryption
context to use for en/decrypting a struct bio, and a device driver (like UFS)
needs to be able to use that encryption context when it processes the bio.
- Different inline encryption hardware has different supported algorithms,
supported data unit sizes, maximum data unit numbers, etc. We call these
properties the "crypto capabilities". We need a way for device drivers to
advertise crypto capabilities to upper layers in a generic way.

- We need a way for device drivers to expose their inline encryption
capabilities in a unified way to the upper layers.
- Inline encryption hardware usually (but not always) requires that keys be
programmed into keyslots before being used. Since programming keyslots may be
slow and there may not be very many keyslots, we shouldn't just program the
key for every I/O request, but rather keep track of which keys are in the
keyslots and reuse an already-programmed keyslot when possible.

- Upper layers typically define a specific end-of-life for crypto keys, e.g.
when an encrypted directory is locked or when a crypto mapping is torn down.
At these times, keys are wiped from memory. We must provide a way for upper
layers to also evict keys from any keyslots they are present in.

Design
======
- When possible, device-mapper devices must be able to pass through the inline
encryption support of their underlying devices. However, it doesn't make
sense for device-mapper devices to have keyslots themselves.

We add a struct bio_crypt_ctx to struct bio that can
represent an encryption context, because we need to be able to pass this
encryption context from the upper layers (like the fs layer) to the
device driver to act upon.
Basic design
============

While IE hardware works on the notion of keyslots, the FS layer has no
knowledge of keyslots - it simply wants to specify an encryption context to
use while en/decrypting a bio.
We introduce ``struct blk_crypto_key`` to represent an inline encryption key and
how it will be used. This includes the actual bytes of the key; the size of the
key; the algorithm and data unit size the key will be used with; and the number
of bytes needed to represent the maximum data unit number the key will be used
with.

We introduce a keyslot manager (KSM) that handles the translation from
encryption contexts specified by the FS to keyslots on the IE hardware.
This KSM also serves as the way IE hardware can expose its capabilities to
upper layers. The generic mode of operation is: each device driver that wants
to support IE will construct a KSM and set it up in its struct request_queue.
Upper layers that want to use IE on this device can then use this KSM in
the device's struct request_queue to translate an encryption context into
a keyslot. The presence of the KSM in the request queue shall be used to mean
that the device supports IE.
We introduce ``struct bio_crypt_ctx`` to represent an encryption context. It
contains a data unit number and a pointer to a blk_crypto_key. We add pointers
to a bio_crypt_ctx to ``struct bio`` and ``struct request``; this allows users
of the block layer (e.g. filesystems) to provide an encryption context when
creating a bio and have it be passed down the stack for processing by the block
layer and device drivers. Note that the encryption context doesn't explicitly
say whether to encrypt or decrypt, as that is implicit from the direction of the
bio; WRITE means encrypt, and READ means decrypt.

The KSM uses refcounts to track which keyslots are idle (either they have no
encryption context programmed, or there are no in-flight struct bios
referencing that keyslot). When a new encryption context needs a keyslot, it
tries to find a keyslot that has already been programmed with the same
encryption context, and if there is no such keyslot, it evicts the least
recently used idle keyslot and programs the new encryption context into that
one. If no idle keyslots are available, then the caller will sleep until there
is at least one.
We also introduce ``struct blk_crypto_profile`` to contain all generic inline
encryption-related state for a particular inline encryption device. The
blk_crypto_profile serves as the way that drivers for inline encryption hardware
advertise their crypto capabilities and provide certain functions (e.g.,
functions to program and evict keys) to upper layers. Each device driver that
wants to support inline encryption will construct a blk_crypto_profile, then
associate it with the disk's request_queue.

The blk_crypto_profile also manages the hardware's keyslots, when applicable.
This happens in the block layer, so that users of the block layer can just
specify encryption contexts and don't need to know about keyslots at all, nor do
device drivers need to care about most details of keyslot management.

blk-mq changes, other block layer changes and blk-crypto-fallback
=================================================================
Specifically, for each keyslot, the block layer (via the blk_crypto_profile)
keeps track of which blk_crypto_key that keyslot contains (if any), and how many
in-flight I/O requests are using it. When the block layer creates a
``struct request`` for a bio that has an encryption context, it grabs a keyslot
that already contains the key if possible. Otherwise it waits for an idle
keyslot (a keyslot that isn't in-use by any I/O), then programs the key into the
least-recently-used idle keyslot using the function the device driver provided.
In both cases, the resulting keyslot is stored in the ``crypt_keyslot`` field of
the request, where it is then accessible to device drivers and is released after
the request completes.
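
The keyslot selection policy described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel implementation: ``struct slot_table``, ``get_keyslot()`` and ``put_keyslot()`` are hypothetical names, a key is identified by a plain integer instead of a ``blk_crypto_key`` pointer, and where the block layer would sleep waiting for an idle keyslot the model just returns -1.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SLOTS 2   /* hypothetical device with two keyslots */
#define NO_KEY    0   /* key id 0 marks an empty slot */

struct keyslot {
	uint32_t key_id;     /* stand-in for a pointer to a blk_crypto_key */
	unsigned int users;  /* in-flight requests holding this slot */
	uint64_t idle_since; /* logical time the slot last became idle */
};

struct slot_table {
	struct keyslot slots[NUM_SLOTS];
	uint64_t clock;        /* logical clock, bumped whenever a slot becomes idle */
	unsigned int programs; /* number of times a slot was (re)programmed */
};

/*
 * Returns the index of a slot holding key_id, or -1 if the key is not
 * programmed anywhere and every slot is in use (the real block layer
 * would wait here instead of failing).
 */
int get_keyslot(struct slot_table *t, uint32_t key_id)
{
	int i, slot = -1;

	assert(key_id != NO_KEY);

	/* Prefer a slot that already contains the key. */
	for (i = 0; i < NUM_SLOTS; i++) {
		if (t->slots[i].key_id == key_id) {
			slot = i;
			break;
		}
	}

	if (slot < 0) {
		/* Otherwise pick the idle slot that has been idle the longest. */
		for (i = 0; i < NUM_SLOTS; i++) {
			if (t->slots[i].users != 0)
				continue;
			if (slot < 0 ||
			    t->slots[i].idle_since < t->slots[slot].idle_since)
				slot = i;
		}
		if (slot < 0)
			return -1;
		/* A driver's keyslot_program operation would run here. */
		t->slots[slot].key_id = key_id;
		t->programs++;
	}

	t->slots[slot].users++;
	return slot;
}

/* Drops one reference; the slot becomes idle when the last user is gone. */
void put_keyslot(struct slot_table *t, int slot)
{
	assert(slot >= 0 && slot < NUM_SLOTS && t->slots[slot].users > 0);
	if (--t->slots[slot].users == 0)
		t->slots[slot].idle_since = ++t->clock;
}
```

In this model, two requests that use the same key share one slot and cause only one programming operation, which is the reuse behavior the text asks for.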

We add a pointer to a ``bi_crypt_context`` and ``keyslot`` to
struct request. These will be referred to as the ``crypto fields``
for the request. This ``keyslot`` is the keyslot into which the
``bi_crypt_context`` has been programmed in the KSM of the ``request_queue``
that this request is being sent to.
``struct request`` also contains a pointer to the original bio_crypt_ctx.
Requests can be built from multiple bios, and the block layer must take the
encryption context into account when trying to merge bios and requests. For two
bios/requests to be merged, they must have compatible encryption contexts: both
unencrypted, or both encrypted with the same key and contiguous data unit
numbers. Only the encryption context for the first bio in a request is
retained, since the remaining bios have been verified to be merge-compatible
with the first bio.
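
The merge rule above can be written down as a short check. This is a simplified userspace sketch, not the kernel code: ``struct crypt_ctx`` and ``crypt_ctx_mergeable()`` are hypothetical names, the data unit number is modeled as a single 64-bit word with no overflow handling, and the length of the front bio is assumed to be a multiple of the data unit size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct crypt_ctx {
	const void *key; /* identity of the key, stand-in for a blk_crypto_key pointer */
	uint64_t dun;    /* data unit number of the first data unit */
};

/*
 * Can "back" be appended directly after "front", where "front" covers
 * front_bytes bytes? A NULL context means the I/O is unencrypted.
 */
bool crypt_ctx_mergeable(const struct crypt_ctx *front, unsigned int front_bytes,
			 unsigned int data_unit_size,
			 const struct crypt_ctx *back)
{
	assert(data_unit_size != 0 && front_bytes % data_unit_size == 0);

	if (!front && !back)
		return true;  /* both unencrypted */
	if (!front || !back)
		return false; /* encrypted and unencrypted data never mix */
	if (front->key != back->key)
		return false; /* different keys */

	/* Same key: the data unit numbers must continue without a gap. */
	return front->dun + front_bytes / data_unit_size == back->dun;
}
```

For example, with 4096-byte data units, an 8192-byte bio starting at data unit number 100 can only be followed by a bio that uses the same key and starts at data unit number 102.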

We introduce ``block/blk-crypto-fallback.c``, which allows upper layers to remain
blissfully unaware of whether or not real inline encryption hardware is present
underneath. When a bio is submitted with a target ``request_queue`` that doesn't
support the encryption context specified with the bio, the block layer will
en/decrypt the bio with the blk-crypto-fallback.
To make it possible for inline encryption to work with request_queue based
layered devices, when a request is cloned, its encryption context is cloned as
well. When the cloned request is submitted, it is then processed as usual; this
includes getting a keyslot from the clone's target device if needed.

If the bio is a ``WRITE`` bio, a bounce bio is allocated, and the data in the bio
is encrypted stored in the bounce bio - blk-mq will then proceed to process the
bounce bio as if it were not encrypted at all (except when blk-integrity is
concerned). ``blk-crypto-fallback`` sets the bounce bio's ``bi_end_io`` to an
internal function that cleans up the bounce bio and ends the original bio.
blk-crypto-fallback
===================

If the bio is a ``READ`` bio, the bio's ``bi_end_io`` (and also ``bi_private``)
is saved and overwritten by ``blk-crypto-fallback`` to
``bio_crypto_fallback_decrypt_bio``. The bio's ``bi_crypt_context`` is also
overwritten with ``NULL``, so that to the rest of the stack, the bio looks
as if it was a regular bio that never had an encryption context specified.
``bio_crypto_fallback_decrypt_bio`` will decrypt the bio, restore the original
``bi_end_io`` (and also ``bi_private``) and end the bio again.
It is desirable for the inline encryption support of upper layers (e.g.
filesystems) to be testable without real inline encryption hardware, and
likewise for the block layer's keyslot management logic. It is also desirable
to allow upper layers to just always use inline encryption rather than have to
implement encryption in multiple ways.

Regardless of whether real inline encryption hardware is used or the
Therefore, we also introduce *blk-crypto-fallback*, which is an implementation
of inline encryption using the kernel crypto API. blk-crypto-fallback is built
into the block layer, so it works on any block device without any special setup.
Essentially, when a bio with an encryption context is submitted to a
request_queue that doesn't support that encryption context, the block layer will
handle en/decryption of the bio using blk-crypto-fallback.

For encryption, the data cannot be encrypted in-place, as callers usually rely
on it being unmodified. Instead, blk-crypto-fallback allocates bounce pages,
fills a new bio with those bounce pages, encrypts the data into those bounce
pages, and submits that "bounce" bio. When the bounce bio completes,
blk-crypto-fallback completes the original bio. If the original bio is too
large, multiple bounce bios may be required; see the code for details.

For decryption, blk-crypto-fallback "wraps" the bio's completion callback
(``bi_complete``) and private data (``bi_private``) with its own, unsets the
bio's encryption context, then submits the bio. If the read completes
successfully, blk-crypto-fallback restores the bio's original completion
callback and private data, then decrypts the bio's data in-place using the
kernel crypto API. Decryption happens from a workqueue, as it may sleep.
Afterwards, blk-crypto-fallback completes the bio.
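
The callback wrapping used on the read path can be illustrated with a userspace model. Everything here is an assumption made for the sketch: ``struct fake_bio``, ``fallback_prep_read()``, ``fallback_end_io()`` and ``count_completion()`` are hypothetical names, a one-byte XOR stands in for the real cipher, and decryption runs directly in the completion callback rather than from a workqueue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct fake_bio {
	uint8_t *data;
	size_t len;
	void (*end_io)(struct fake_bio *bio); /* models the completion callback */
	void *private_data;                   /* models bi_private */
	const uint8_t *crypt_key;             /* models the encryption context; NULL = none */
};

/* State saved while the wrapped bio is in flight. */
struct saved_ctx {
	void (*orig_end_io)(struct fake_bio *bio);
	void *orig_private;
	uint8_t key;
};

/* Toy stand-in for a cipher: XOR every byte with the key. */
static void toy_crypt(uint8_t *buf, size_t len, uint8_t key)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= key;
}

/* Runs when the lower layer completes the read. */
static void fallback_end_io(struct fake_bio *bio)
{
	struct saved_ctx *s = bio->private_data;

	/* Restore the caller's callback and private data first... */
	bio->end_io = s->orig_end_io;
	bio->private_data = s->orig_private;
	/* ...then decrypt in place and complete the bio for the caller. */
	toy_crypt(bio->data, bio->len, s->key);
	free(s);
	bio->end_io(bio);
}

/* Wraps the callback and strips the encryption context before submission. */
int fallback_prep_read(struct fake_bio *bio)
{
	struct saved_ctx *s;

	if (!bio->crypt_key)
		return 0; /* nothing to do for unencrypted reads */

	s = malloc(sizeof(*s));
	if (!s)
		return -1;
	s->orig_end_io = bio->end_io;
	s->orig_private = bio->private_data;
	s->key = *bio->crypt_key;

	bio->end_io = fallback_end_io;
	bio->private_data = s;
	bio->crypt_key = NULL; /* lower layers see a plain read */
	return 0;
}

/* Example caller callback: counts completions through private_data. */
void count_completion(struct fake_bio *bio)
{
	(*(int *)bio->private_data)++;
}
```

The caller's callback only ever sees the bio after its original callback and private data are back in place and the data has been decrypted, so from the caller's point of view the wrapping is invisible.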

In both cases, the bios that blk-crypto-fallback submits no longer have an
encryption context. Therefore, lower layers only see standard unencrypted I/O.

blk-crypto-fallback also defines its own blk_crypto_profile and has its own
"keyslots"; its keyslots contain ``struct crypto_skcipher`` objects. The reason
for this is twofold. First, it allows the keyslot management logic to be tested
without actual inline encryption hardware. Second, similar to actual inline
encryption hardware, the crypto API doesn't accept keys directly in requests but
rather requires that keys be set ahead of time, and setting keys can be
expensive; moreover, allocating a crypto_skcipher can't happen on the I/O path
at all due to the locks it takes. Therefore, the concept of keyslots still
makes sense for blk-crypto-fallback.

Note that regardless of whether real inline encryption hardware or
blk-crypto-fallback is used, the ciphertext written to disk (and hence the
on-disk format of data) will be the same (assuming the hardware's implementation
of the algorithm being used adheres to spec and functions correctly).

If a ``request queue``'s inline encryption hardware claimed to support the
encryption context specified with a bio, then it will not be handled by the
``blk-crypto-fallback``. We will eventually reach a point in blk-mq when a
struct request needs to be allocated for that bio. At that point,
blk-mq tries to program the encryption context into the ``request_queue``'s
keyslot_manager, and obtain a keyslot, which it stores in its newly added
``keyslot`` field. This keyslot is released when the request is completed.

When the first bio is added to a request, ``blk_crypto_rq_bio_prep`` is called,
which sets the request's ``crypt_ctx`` to a copy of the bio's
``bi_crypt_context``. bio_crypt_do_front_merge is called whenever a subsequent
bio is merged to the front of the request, which updates the ``crypt_ctx`` of
the request so that it matches the newly merged bio's ``bi_crypt_context``. In particular, the request keeps a copy of the ``bi_crypt_context`` of the first
bio in its bio-list (blk-mq needs to be careful to maintain this invariant
during bio and request merges).

To make it possible for inline encryption to work with request queue based
layered devices, when a request is cloned, its ``crypto fields`` are cloned as
well. When the cloned request is submitted, blk-mq programs the
``bi_crypt_context`` of the request into the clone's request_queue's keyslot
manager, and stores the returned keyslot in the clone's ``keyslot``.
on-disk format of data) will be the same (assuming that both the inline
encryption hardware's implementation and the kernel crypto API's implementation
of the algorithm being used adhere to spec and function correctly).

blk-crypto-fallback is optional and is controlled by the
``CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK`` kernel configuration option.

API presented to users of the block layer
=========================================

``struct blk_crypto_key`` represents a crypto key (the raw key, size of the
key, the crypto algorithm to use, the data unit size to use, and the number of
bytes required to represent data unit numbers that will be specified with the
``bi_crypt_context``).
``blk_crypto_config_supported()`` allows users to check ahead of time whether
inline encryption with particular crypto settings will work on a particular
request_queue -- either via hardware or via blk-crypto-fallback. This function
takes in a ``struct blk_crypto_config`` which is like blk_crypto_key, but omits
the actual bytes of the key and instead just contains the algorithm, data unit
size, etc. This function can be useful if blk-crypto-fallback is disabled.

``blk_crypto_init_key`` allows upper layers to initialize such a
``blk_crypto_key``.
``blk_crypto_init_key()`` allows users to initialize a blk_crypto_key.

``bio_crypt_set_ctx`` should be called on any bio that a user of
the block layer wants en/decrypted via inline encryption (or the
blk-crypto-fallback, if hardware support isn't available for the desired
crypto configuration). This function takes the ``blk_crypto_key`` and the
data unit number (DUN) to use when en/decrypting the bio.
Users must call ``blk_crypto_start_using_key()`` before actually starting to use
a blk_crypto_key on a request_queue (even if ``blk_crypto_config_supported()``
was called earlier). This is needed to initialize blk-crypto-fallback if it
will be needed. This must not be called from the data path, as this may have to
allocate resources, which may deadlock in that case.

``blk_crypto_config_supported`` allows upper layers to query whether or not the
an encryption context passed to request queue can be handled by blk-crypto
(either by real inline encryption hardware, or by the blk-crypto-fallback).
This is useful e.g. when blk-crypto-fallback is disabled, and the upper layer
wants to use an algorithm that may not supported by hardware - this function
lets the upper layer know ahead of time that the algorithm isn't supported,
and the upper layer can fallback to something else if appropriate.
Next, to attach an encryption context to a bio, users should call
``bio_crypt_set_ctx()``. This function allocates a bio_crypt_ctx and attaches
it to a bio, given the blk_crypto_key and the data unit number that will be used
for en/decryption. Users don't need to worry about freeing the bio_crypt_ctx
later, as that happens automatically when the bio is freed or reset.

``blk_crypto_start_using_key`` - Upper layers must call this function on
``blk_crypto_key`` and a ``request_queue`` before using the key with any bio
headed for that ``request_queue``. This function ensures that either the
hardware supports the key's crypto settings, or the crypto API fallback has
transforms for the needed mode allocated and ready to go. Note that this
function may allocate an ``skcipher``, and must not be called from the data
path, since allocating ``skciphers`` from the data path can deadlock.
Finally, when done using inline encryption with a blk_crypto_key on a
request_queue, users must call ``blk_crypto_evict_key()``. This ensures that
the key is evicted from all keyslots it may be programmed into and unlinked from
any kernel data structures it may be linked into.

``blk_crypto_evict_key`` *must* be called by upper layers before a
``blk_crypto_key`` is freed. Further, it *must* only be called only once
there are no more in-flight requests that use that ``blk_crypto_key``.
``blk_crypto_evict_key`` will ensure that a key is removed from any keyslots in
inline encryption hardware that the key might have been programmed into (or the blk-crypto-fallback).
In summary, for users of the block layer, the lifecycle of a blk_crypto_key is
as follows:

1. ``blk_crypto_config_supported()`` (optional)
2. ``blk_crypto_init_key()``
3. ``blk_crypto_start_using_key()``
4. ``bio_crypt_set_ctx()`` (potentially many times)
5. ``blk_crypto_evict_key()`` (after all I/O has completed)
6. Zeroize the blk_crypto_key (this has no dedicated function)

If a blk_crypto_key is being used on multiple request_queues, then
``blk_crypto_config_supported()`` (if used), ``blk_crypto_start_using_key()``,
and ``blk_crypto_evict_key()`` must be called on each request_queue.

API presented to device drivers
===============================

A :c:type:``struct blk_keyslot_manager`` should be set up by device drivers in
the ``request_queue`` of the device. The device driver needs to call
``blk_ksm_init`` (or its resource-managed variant ``devm_blk_ksm_init``) on the
``blk_keyslot_manager``, while specifying the number of keyslots supported by
the hardware.
A device driver that wants to support inline encryption must set up a
blk_crypto_profile in the request_queue of its device. To do this, it first
must call ``blk_crypto_profile_init()`` (or its resource-managed variant
``devm_blk_crypto_profile_init()``), providing the number of keyslots.

The device driver also needs to tell the KSM how to actually manipulate the
IE hardware in the device to do things like programming the crypto key into
the IE hardware into a particular keyslot. All this is achieved through the
struct blk_ksm_ll_ops field in the KSM that the device driver
must fill up after initing the ``blk_keyslot_manager``.
Next, it must advertise its crypto capabilities by setting fields in the
blk_crypto_profile, e.g. ``modes_supported`` and ``max_dun_bytes_supported``.

The KSM also handles runtime power management for the device when applicable
(e.g. when it wants to program a crypto key into the IE hardware, the device
must be runtime powered on) - so the device driver must also set the ``dev``
field in the ksm to point to the `struct device` for the KSM to use for runtime
power management.
It then must set function pointers in the ``ll_ops`` field of the
blk_crypto_profile to tell upper layers how to control the inline encryption
hardware, e.g. how to program and evict keyslots. Most drivers will need to
implement ``keyslot_program`` and ``keyslot_evict``. For details, see the
comments for ``struct blk_crypto_ll_ops``.

``blk_ksm_reprogram_all_keys`` can be called by device drivers if the device
needs each and every of its keyslots to be reprogrammed with the key it
"should have" at the point in time when the function is called. This is useful
e.g. if a device loses all its keys on runtime power down/up.
Once the driver registers a blk_crypto_profile with a request_queue, I/O
requests the driver receives via that queue may have an encryption context. All
encryption contexts will be compatible with the crypto capabilities declared in
the blk_crypto_profile, so drivers don't need to worry about handling
unsupported requests. Also, if a nonzero number of keyslots was declared in the
blk_crypto_profile, then all I/O requests that have an encryption context will
also have a keyslot which was already programmed with the appropriate key.

If the driver used ``blk_ksm_init`` instead of ``devm_blk_ksm_init``, then
``blk_ksm_destroy`` should be called to free up all resources used by a
``blk_keyslot_manager`` once it is no longer needed.
If the driver implements runtime suspend and its blk_crypto_ll_ops don't work
while the device is runtime-suspended, then the driver must also set the ``dev``
field of the blk_crypto_profile to point to the ``struct device`` that will be
resumed before any of the low-level operations are called.

If there are situations where the inline encryption hardware loses the contents
of its keyslots, e.g. device resets, the driver must handle reprogramming the
keyslots. To do this, the driver may call ``blk_crypto_reprogram_all_keys()``.

Finally, if the driver used ``blk_crypto_profile_init()`` instead of
``devm_blk_crypto_profile_init()``, then it is responsible for calling
``blk_crypto_profile_destroy()`` when the crypto profile is no longer needed.

Layered Devices
===============

Request queue based layered devices like dm-rq that wish to support IE need to
create their own keyslot manager for their request queue, and expose whatever
functionality they choose. When a layered device wants to pass a clone of that
request to another ``request_queue``, blk-crypto will initialize and prepare the
clone as necessary - see ``blk_crypto_insert_cloned_request`` in
``blk-crypto.c``.


Future Optimizations for layered devices
========================================

Creating a keyslot manager for a layered device uses up memory for each
keyslot, and in general, a layered device merely passes the request on to a
"child" device, so the keyslots in the layered device itself are completely
unused, and don't need any refcounting or keyslot programming. We can instead
define a new type of KSM; the "passthrough KSM", that layered devices can use
to advertise an unlimited number of keyslots, and support for any encryption
algorithms they choose, while not actually using any memory for each keyslot.
Another use case for the "passthrough KSM" is for IE devices that do not have a
limited number of keyslots.

Request queue based layered devices like dm-rq that wish to support inline
encryption need to create their own blk_crypto_profile for their request_queue,
and expose whatever functionality they choose. When a layered device wants to
pass a clone of that request to another request_queue, blk-crypto will
initialize and prepare the clone as necessary; see
``blk_crypto_insert_cloned_request()``.

Interaction between inline encryption and blk integrity
|
||||
=======================================================
|
||||
@ -257,7 +296,7 @@ Because there isn't any real hardware yet, it seems prudent to assume that
hardware implementations might not implement both features together correctly,
and disallow the combination for now. Whenever a device supports integrity, the
kernel will pretend that the device does not support hardware inline encryption
(by essentially setting the keyslot manager in the request_queue of the device
to NULL). When the crypto API fallback is enabled, this means that all bios with
and encryption context will use the fallback, and IO will complete as usual.
When the fallback is disabled, a bio with an encryption context will be failed.
(by setting the blk_crypto_profile in the request_queue of the device to NULL).
When the crypto API fallback is enabled, this means that all bios with and
encryption context will use the fallback, and IO will complete as usual. When
the fallback is disabled, a bio with an encryption context will be failed.
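
The rule in this paragraph can be read as a small decision function. The following userspace C sketch is a simplification under stated assumptions, not kernel code: ``toy_queue``, ``toy_outcome`` and ``toy_encrypted_bio_outcome()`` are hypothetical names, and the model ignores details such as whether the hardware supports the specific algorithm requested.

```c
#include <stdbool.h>

/* Possible outcomes for a bio that carries an encryption context. */
enum toy_outcome {
	TOY_USE_HARDWARE,	/* inline encryption hardware handles the bio */
	TOY_USE_FALLBACK,	/* crypto API fallback does the work in software */
	TOY_FAIL_BIO,		/* the bio is failed */
};

/* Hypothetical, simplified view of a device's request queue. */
struct toy_queue {
	bool hw_inline_crypto;		/* hardware offers inline encryption */
	bool supports_integrity;	/* device supports blk integrity */
};

enum toy_outcome toy_encrypted_bio_outcome(const struct toy_queue *q,
					   bool fallback_enabled)
{
	/*
	 * Integrity support hides the hardware crypto capability; the text
	 * describes this as the queue's crypto profile being set to NULL.
	 */
	bool hw_usable = q->hw_inline_crypto && !q->supports_integrity;

	if (hw_usable)
		return TOY_USE_HARDWARE;
	return fallback_enabled ? TOY_USE_FALLBACK : TOY_FAIL_BIO;
}
```

With both features present on a device, the result depends only on ``fallback_enabled``, which matches the two cases the paragraph distinguishes.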
3
Makefile
@ -1115,7 +1115,8 @@ export MODORDER := $(extmod_prefix)modules.order
|
||||
export MODULES_NSDEPS := $(extmod_prefix)modules.nsdeps
|
||||
|
||||
ifeq ($(KBUILD_EXTMOD),)
|
||||
core-y += kernel/ certs/ mm/ fs/ ipc/ security/ crypto/ block/
|
||||
core-y += kernel/ certs/ mm/ fs/ ipc/ security/ crypto/
|
||||
core-$(CONFIG_BLOCK) += block/
|
||||
|
||||
vmlinux-dirs := $(patsubst %/,%,$(filter %/, \
|
||||
$(core-y) $(core-m) $(drivers-y) $(drivers-m) \
|
||||
|
@ -58,7 +58,7 @@ struct nfhd_device {
|
||||
struct gendisk *disk;
|
||||
};
|
||||
|
||||
static blk_qc_t nfhd_submit_bio(struct bio *bio)
|
||||
static void nfhd_submit_bio(struct bio *bio)
|
||||
{
|
||||
struct nfhd_device *dev = bio->bi_bdev->bd_disk->private_data;
|
||||
struct bio_vec bvec;
|
||||
@ -76,7 +76,6 @@ static blk_qc_t nfhd_submit_bio(struct bio *bio)
|
||||
sec += len;
|
||||
}
|
||||
bio_endio(bio);
|
||||
return BLK_QC_T_NONE;
|
||||
}
|
||||
|
||||
static int nfhd_getgeo(struct block_device *bdev, struct hd_geometry *geo)
|
||||
|
@ -16,7 +16,6 @@
|
||||
#include <linux/console.h>
|
||||
#include <linux/memblock.h>
|
||||
#include <linux/ioport.h>
|
||||
#include <linux/blkdev.h>
|
||||
|
||||
#include <asm/bootinfo.h>
|
||||
#include <asm/mach-rc32434/ddr.h>
|
||||
|
@ -7,7 +7,6 @@
|
||||
#include <linux/kernel.h>
|
||||
#include <linux/linkage.h>
|
||||
#include <linux/mm.h>
|
||||
#include <linux/blkdev.h>
|
||||
#include <linux/memblock.h>
|
||||
#include <linux/pm.h>
|
||||
#include <linux/smp.h>
|
||||
|
@ -11,7 +11,6 @@
|
||||
#include <linux/spinlock.h>
|
||||
#include <linux/mm.h>
|
||||
#include <linux/memblock.h>
|
||||
#include <linux/blkdev.h>
|
||||
#include <linux/init.h>
|
||||
#include <linux/kernel.h>
|
||||
#include <linux/screen_info.h>
|
||||
|
@ -25,7 +25,6 @@
|
||||
#include <linux/memblock.h>
|
||||
#include <linux/init.h>
|
||||
#include <linux/delay.h>
|
||||
#include <linux/blkdev.h> /* for initrd_* */
|
||||
#include <linux/pagemap.h>
|
||||
|
||||
#include <asm/pgalloc.h>
|
||||
|
@ -21,6 +21,7 @@
|
||||
#include <linux/namei.h>
|
||||
#include <linux/pagemap.h>
|
||||
#include <linux/poll.h>
|
||||
#include <linux/seq_file.h>
|
||||
#include <linux/slab.h>
|
||||
|
||||
#include <asm/prom.h>
|
||||
|
@ -27,6 +27,7 @@
|
||||
#include <linux/blk-mq.h>
|
||||
#include <linux/ata.h>
|
||||
#include <linux/hdreg.h>
|
||||
#include <linux/major.h>
|
||||
#include <linux/cdrom.h>
|
||||
#include <linux/proc_fs.h>
|
||||
#include <linux/seq_file.h>
|
||||
|
@ -100,7 +100,7 @@ static void simdisk_transfer(struct simdisk *dev, unsigned long sector,
|
||||
spin_unlock(&dev->lock);
|
||||
}
|
||||
|
||||
static blk_qc_t simdisk_submit_bio(struct bio *bio)
|
||||
static void simdisk_submit_bio(struct bio *bio)
|
||||
{
|
||||
struct simdisk *dev = bio->bi_bdev->bd_disk->private_data;
|
||||
struct bio_vec bvec;
|
||||
@ -118,7 +118,6 @@ static blk_qc_t simdisk_submit_bio(struct bio *bio)
|
||||
}
|
||||
|
||||
bio_endio(bio);
|
||||
return BLK_QC_T_NONE;
|
||||
}
|
||||
|
||||
static int simdisk_open(struct block_device *bdev, fmode_t mode)
|
||||
|
@ -73,7 +73,7 @@ config BLK_DEV_ZONED
|
||||
|
||||
config BLK_DEV_THROTTLING
|
||||
bool "Block layer bio throttling support"
|
||||
depends on BLK_CGROUP=y
|
||||
depends on BLK_CGROUP
|
||||
select BLK_CGROUP_RWSTAT
|
||||
help
|
||||
Block layer bio throttling support. It can be used to limit
|
||||
@ -112,7 +112,7 @@ config BLK_WBT_MQ
|
||||
|
||||
config BLK_CGROUP_IOLATENCY
|
||||
bool "Enable support for latency based cgroup IO protection"
|
||||
depends on BLK_CGROUP=y
|
||||
depends on BLK_CGROUP
|
||||
help
|
||||
Enabling this option enables the .latency interface for IO throttling.
|
||||
The IO controller will attempt to maintain average IO latencies below
|
||||
@ -132,7 +132,7 @@ config BLK_CGROUP_FC_APPID
|
||||
|
||||
config BLK_CGROUP_IOCOST
|
||||
bool "Enable support for cost model based cgroup IO controller"
|
||||
depends on BLK_CGROUP=y
|
||||
depends on BLK_CGROUP
|
||||
select BLK_RQ_IO_DATA_LEN
|
||||
select BLK_RQ_ALLOC_TIME
|
||||
help
|
||||
@ -190,39 +190,31 @@ config BLK_INLINE_ENCRYPTION_FALLBACK
|
||||
by falling back to the kernel crypto API when inline
|
||||
encryption hardware is not present.
|
||||
|
||||
menu "Partition Types"
|
||||
|
||||
source "block/partitions/Kconfig"
|
||||
|
||||
endmenu
|
||||
|
||||
endif # BLOCK
|
||||
|
||||
config BLOCK_COMPAT
|
||||
bool
|
||||
depends on BLOCK && COMPAT
|
||||
default y
|
||||
def_bool COMPAT
|
||||
|
||||
config BLK_MQ_PCI
|
||||
bool
|
||||
depends on BLOCK && PCI
|
||||
default y
|
||||
def_bool PCI
|
||||
|
||||
config BLK_MQ_VIRTIO
|
||||
bool
|
||||
depends on BLOCK && VIRTIO
|
||||
depends on VIRTIO
|
||||
default y
|
||||
|
||||
config BLK_MQ_RDMA
|
||||
bool
|
||||
depends on BLOCK && INFINIBAND
|
||||
depends on INFINIBAND
|
||||
default y
|
||||
|
||||
config BLK_PM
|
||||
def_bool BLOCK && PM
|
||||
def_bool PM
|
||||
|
||||
# do not use in new code
|
||||
config BLOCK_HOLDER_DEPRECATED
|
||||
bool
|
||||
|
||||
source "block/Kconfig.iosched"
|
||||
|
||||
endif # BLOCK
|
||||
|
@ -1,6 +1,4 @@
|
||||
# SPDX-License-Identifier: GPL-2.0
|
||||
if BLOCK
|
||||
|
||||
menu "IO Schedulers"
|
||||
|
||||
config MQ_IOSCHED_DEADLINE
|
||||
@ -45,5 +43,3 @@ config BFQ_CGROUP_DEBUG
|
||||
files in a cgroup which can be useful for debugging.
|
||||
|
||||
endmenu
|
||||
|
||||
endif
|
||||
|
@ -3,13 +3,13 @@
|
||||
# Makefile for the kernel block layer
|
||||
#
|
||||
|
||||
obj-$(CONFIG_BLOCK) := bdev.o fops.o bio.o elevator.o blk-core.o blk-sysfs.o \
|
||||
obj-y := bdev.o fops.o bio.o elevator.o blk-core.o blk-sysfs.o \
|
||||
blk-flush.o blk-settings.o blk-ioc.o blk-map.o \
|
||||
blk-exec.o blk-merge.o blk-timeout.o \
|
||||
blk-lib.o blk-mq.o blk-mq-tag.o blk-stat.o \
|
||||
blk-mq-sysfs.o blk-mq-cpumap.o blk-mq-sched.o ioctl.o \
|
||||
genhd.o ioprio.o badblocks.o partitions/ blk-rq-qos.o \
|
||||
disk-events.o
|
||||
disk-events.o blk-ia-ranges.o
|
||||
|
||||
obj-$(CONFIG_BOUNCE) += bounce.o
|
||||
obj-$(CONFIG_BLK_DEV_BSG_COMMON) += bsg.o
|
||||
@ -36,6 +36,6 @@ obj-$(CONFIG_BLK_DEBUG_FS) += blk-mq-debugfs.o
|
||||
obj-$(CONFIG_BLK_DEBUG_FS_ZONED)+= blk-mq-debugfs-zoned.o
|
||||
obj-$(CONFIG_BLK_SED_OPAL) += sed-opal.o
|
||||
obj-$(CONFIG_BLK_PM) += blk-pm.o
|
||||
obj-$(CONFIG_BLK_INLINE_ENCRYPTION) += keyslot-manager.o blk-crypto.o
|
||||
obj-$(CONFIG_BLK_INLINE_ENCRYPTION) += blk-crypto.o blk-crypto-profile.o
|
||||
obj-$(CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK) += blk-crypto-fallback.o
|
||||
obj-$(CONFIG_BLOCK_HOLDER_DEPRECATED) += holder.o
|
||||
|
18
block/bdev.c
@ -12,6 +12,7 @@
|
||||
#include <linux/major.h>
|
||||
#include <linux/device_cgroup.h>
|
||||
#include <linux/blkdev.h>
|
||||
#include <linux/blk-integrity.h>
|
||||
#include <linux/backing-dev.h>
|
||||
#include <linux/module.h>
|
||||
#include <linux/blkpg.h>
|
||||
@ -326,12 +327,12 @@ int bdev_read_page(struct block_device *bdev, sector_t sector,
|
||||
if (!ops->rw_page || bdev_get_integrity(bdev))
|
||||
return result;
|
||||
|
||||
result = blk_queue_enter(bdev->bd_disk->queue, 0);
|
||||
result = blk_queue_enter(bdev_get_queue(bdev), 0);
|
||||
if (result)
|
||||
return result;
|
||||
result = ops->rw_page(bdev, sector + get_start_sect(bdev), page,
|
||||
REQ_OP_READ);
|
||||
blk_queue_exit(bdev->bd_disk->queue);
|
||||
blk_queue_exit(bdev_get_queue(bdev));
|
||||
return result;
|
||||
}
|
||||
|
||||
@ -362,7 +363,7 @@ int bdev_write_page(struct block_device *bdev, sector_t sector,
|
||||
|
||||
if (!ops->rw_page || bdev_get_integrity(bdev))
|
||||
return -EOPNOTSUPP;
|
||||
result = blk_queue_enter(bdev->bd_disk->queue, 0);
|
||||
result = blk_queue_enter(bdev_get_queue(bdev), 0);
|
||||
if (result)
|
||||
return result;
|
||||
|
||||
@ -375,7 +376,7 @@ int bdev_write_page(struct block_device *bdev, sector_t sector,
|
||||
clean_page_buffers(page);
|
||||
unlock_page(page);
|
||||
}
|
||||
blk_queue_exit(bdev->bd_disk->queue);
|
||||
blk_queue_exit(bdev_get_queue(bdev));
|
||||
return result;
|
||||
}
|
||||
|
||||
@ -492,6 +493,7 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
|
||||
spin_lock_init(&bdev->bd_size_lock);
|
||||
bdev->bd_partno = partno;
|
||||
bdev->bd_inode = inode;
|
||||
bdev->bd_queue = disk->queue;
|
||||
bdev->bd_stats = alloc_percpu(struct disk_stats);
|
||||
if (!bdev->bd_stats) {
|
||||
iput(inode);
|
||||
@ -962,9 +964,11 @@ EXPORT_SYMBOL(blkdev_put);
|
||||
* @pathname: special file representing the block device
|
||||
* @dev: return value of the block device's dev_t
|
||||
*
|
||||
* Get a reference to the blockdevice at @pathname in the current
|
||||
* namespace if possible and return it. Return ERR_PTR(error)
|
||||
* otherwise.
|
||||
* Lookup the block device's dev_t at @pathname in the current
|
||||
* namespace if possible and return it by @dev.
|
||||
*
|
||||
* RETURNS:
|
||||
* 0 if succeeded, errno otherwise.
|
||||
*/
|
||||
int lookup_bdev(const char *pathname, dev_t *dev)
|
||||
{
|
||||
|
@ -6,13 +6,13 @@
|
||||
#include <linux/slab.h>
|
||||
#include <linux/blkdev.h>
|
||||
#include <linux/cgroup.h>
|
||||
#include <linux/elevator.h>
|
||||
#include <linux/ktime.h>
|
||||
#include <linux/rbtree.h>
|
||||
#include <linux/ioprio.h>
|
||||
#include <linux/sbitmap.h>
|
||||
#include <linux/delay.h>
|
||||
|
||||
#include "elevator.h"
|
||||
#include "bfq-iosched.h"
|
||||
|
||||
#ifdef CONFIG_BFQ_CGROUP_DEBUG
|
||||
@ -463,7 +463,7 @@ static int bfqg_stats_init(struct bfqg_stats *stats, gfp_t gfp)
{
if (blkg_rwstat_init(&stats->bytes, gfp) ||
blkg_rwstat_init(&stats->ios, gfp))
return -ENOMEM;
goto error;

#ifdef CONFIG_BFQ_CGROUP_DEBUG
if (blkg_rwstat_init(&stats->merged, gfp) ||
@ -476,13 +476,15 @@ static int bfqg_stats_init(struct bfqg_stats *stats, gfp_t gfp)
bfq_stat_init(&stats->dequeue, gfp) ||
bfq_stat_init(&stats->group_wait_time, gfp) ||
bfq_stat_init(&stats->idle_time, gfp) ||
bfq_stat_init(&stats->empty_time, gfp)) {
bfqg_stats_exit(stats);
return -ENOMEM;
}
bfq_stat_init(&stats->empty_time, gfp))
goto error;
#endif

return 0;

error:
bfqg_stats_exit(stats);
return -ENOMEM;
}

static struct bfq_group_data *cpd_to_bfqgd(struct blkcg_policy_data *cpd)
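
The two hunks above route both failure checks in bfqg_stats_init() through a single error label that runs the cleanup once. A minimal userspace sketch of that single-exit pattern follows. All names (toy_stats, toy_alloc(), toy_stats_init(), toy_stats_exit()) are hypothetical, and the fail_at parameter exists only so the failure path can be exercised; nothing here is taken from the bfq code itself.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a group of separately allocated counters. */
struct toy_stats {
	int *bytes;
	int *ios;
	int *merged;
};

/* Release whatever was allocated; safe on a partially initialised struct. */
void toy_stats_exit(struct toy_stats *s)
{
	free(s->merged);
	free(s->ios);
	free(s->bytes);
	s->merged = s->ios = s->bytes = NULL;
}

/* Test hook: when *countdown reaches zero, simulate an allocation failure. */
int *toy_alloc(int *countdown)
{
	if (*countdown > 0 && --(*countdown) == 0)
		return NULL;
	return calloc(1, sizeof(int));
}

/* fail_at == 0 means no forced failure; n forces the n-th allocation to fail. */
int toy_stats_init(struct toy_stats *s, int fail_at)
{
	s->bytes = s->ios = s->merged = NULL;

	s->bytes = toy_alloc(&fail_at);
	if (!s->bytes)
		goto error;
	s->ios = toy_alloc(&fail_at);
	if (!s->ios)
		goto error;
	s->merged = toy_alloc(&fail_at);
	if (!s->merged)
		goto error;

	return 0;

error:
	/* One cleanup path for every failure point. */
	toy_stats_exit(s);
	return -ENOMEM;
}
```

Because every failure jumps to the same label, adding another allocation later only needs one more check, and no early return can skip the cleanup.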
|
@ -117,7 +117,6 @@
|
||||
#include <linux/slab.h>
|
||||
#include <linux/blkdev.h>
|
||||
#include <linux/cgroup.h>
|
||||
#include <linux/elevator.h>
|
||||
#include <linux/ktime.h>
|
||||
#include <linux/rbtree.h>
|
||||
#include <linux/ioprio.h>
|
||||
@ -127,6 +126,7 @@
|
||||
|
||||
#include <trace/events/block.h>
|
||||
|
||||
#include "elevator.h"
|
||||
#include "blk.h"
|
||||
#include "blk-mq.h"
|
||||
#include "blk-mq-tag.h"
|
||||
@ -6884,8 +6884,8 @@ static void bfq_depth_updated(struct blk_mq_hw_ctx *hctx)
|
||||
struct blk_mq_tags *tags = hctx->sched_tags;
|
||||
unsigned int min_shallow;
|
||||
|
||||
min_shallow = bfq_update_depths(bfqd, tags->bitmap_tags);
|
||||
sbitmap_queue_min_shallow_depth(tags->bitmap_tags, min_shallow);
|
||||
min_shallow = bfq_update_depths(bfqd, &tags->bitmap_tags);
|
||||
sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, min_shallow);
|
||||
}
|
||||
|
||||
static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
|
||||
|
@ -6,7 +6,7 @@
|
||||
* Written by: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
*/
|
||||
|
||||
#include <linux/blkdev.h>
|
||||
#include <linux/blk-integrity.h>
|
||||
#include <linux/mempool.h>
|
||||
#include <linux/export.h>
|
||||
#include <linux/bio.h>
|
||||
@ -134,7 +134,7 @@ int bio_integrity_add_page(struct bio *bio, struct page *page,
|
||||
iv = bip->bip_vec + bip->bip_vcnt;
|
||||
|
||||
if (bip->bip_vcnt &&
|
||||
bvec_gap_to_prev(bio->bi_bdev->bd_disk->queue,
|
||||
bvec_gap_to_prev(bdev_get_queue(bio->bi_bdev),
|
||||
&bip->bip_vec[bip->bip_vcnt - 1], offset))
|
||||
return 0;
|
||||
|
||||
|
171
block/bio.c
@ -87,7 +87,8 @@ static struct bio_slab *create_bio_slab(unsigned int size)
|
||||
|
||||
snprintf(bslab->name, sizeof(bslab->name), "bio-%d", size);
|
||||
bslab->slab = kmem_cache_create(bslab->name, size,
|
||||
ARCH_KMALLOC_MINALIGN, SLAB_HWCACHE_ALIGN, NULL);
|
||||
ARCH_KMALLOC_MINALIGN,
|
||||
SLAB_HWCACHE_ALIGN | SLAB_TYPESAFE_BY_RCU, NULL);
|
||||
if (!bslab->slab)
|
||||
goto fail_alloc_slab;
|
||||
|
||||
@ -156,7 +157,7 @@ out:
|
||||
|
||||
void bvec_free(mempool_t *pool, struct bio_vec *bv, unsigned short nr_vecs)
|
||||
{
|
||||
BIO_BUG_ON(nr_vecs > BIO_MAX_VECS);
|
||||
BUG_ON(nr_vecs > BIO_MAX_VECS);
|
||||
|
||||
if (nr_vecs == BIO_MAX_VECS)
|
||||
mempool_free(bv, pool);
|
||||
@ -281,6 +282,7 @@ void bio_init(struct bio *bio, struct bio_vec *table,
|
||||
|
||||
atomic_set(&bio->__bi_remaining, 1);
|
||||
atomic_set(&bio->__bi_cnt, 1);
|
||||
bio->bi_cookie = BLK_QC_T_NONE;
|
||||
|
||||
bio->bi_max_vecs = max_vecs;
|
||||
bio->bi_io_vec = table;
|
||||
@ -546,7 +548,7 @@ EXPORT_SYMBOL(zero_fill_bio);
|
||||
* REQ_OP_READ, zero the truncated part. This function should only
|
||||
* be used for handling corner cases, such as bio eod.
|
||||
*/
|
||||
void bio_truncate(struct bio *bio, unsigned new_size)
|
||||
static void bio_truncate(struct bio *bio, unsigned new_size)
|
||||
{
|
||||
struct bio_vec bv;
|
||||
struct bvec_iter iter;
|
||||
@ -677,7 +679,7 @@ static void bio_alloc_cache_destroy(struct bio_set *bs)
|
||||
void bio_put(struct bio *bio)
|
||||
{
|
||||
if (unlikely(bio_flagged(bio, BIO_REFFED))) {
|
||||
BIO_BUG_ON(!atomic_read(&bio->__bi_cnt));
|
||||
BUG_ON(!atomic_read(&bio->__bi_cnt));
|
||||
if (!atomic_dec_and_test(&bio->__bi_cnt))
|
||||
return;
|
||||
}
|
||||
@ -772,6 +774,23 @@ const char *bio_devname(struct bio *bio, char *buf)
}
EXPORT_SYMBOL(bio_devname);

/**
* bio_full - check if the bio is full
* @bio: bio to check
* @len: length of one segment to be added
*
* Return true if @bio is full and one segment with @len bytes can't be
* added to the bio, otherwise return false
*/
static inline bool bio_full(struct bio *bio, unsigned len)
{
if (bio->bi_vcnt >= bio->bi_max_vecs)
return true;
if (bio->bi_iter.bi_size > UINT_MAX - len)
return true;
return false;
}

static inline bool page_is_mergeable(const struct bio_vec *bv,
struct page *page, unsigned int len, unsigned int off,
bool *same_page)
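
The second test in bio_full() is written as a subtraction so that the check itself cannot wrap around. A small userspace sketch of the same comparison, using a hypothetical toy_bio struct rather than the kernel's struct bio:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical reduced bio: only the fields the check needs. */
struct toy_bio {
	unsigned short vcnt;		/* segments currently in use */
	unsigned short max_vecs;	/* segments available */
	unsigned int size;		/* bytes currently described */
};

bool toy_bio_full(const struct toy_bio *bio, unsigned int len)
{
	/* No free segment slot left. */
	if (bio->vcnt >= bio->max_vecs)
		return true;
	/*
	 * "size + len > UINT_MAX" would overflow before it could be tested,
	 * so compare against UINT_MAX - len, which cannot wrap.
	 */
	if (bio->size > UINT_MAX - len)
		return true;
	return false;
}
```

For a bio already holding UINT_MAX - 10 bytes, adding 10 more still fits while adding 11 is reported as full.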
@ -791,6 +810,44 @@ static inline bool page_is_mergeable(const struct bio_vec *bv,
|
||||
return (bv->bv_page + bv_end / PAGE_SIZE) == (page + off / PAGE_SIZE);
|
||||
}
|
||||
|
||||
/**
|
||||
* __bio_try_merge_page - try appending data to an existing bvec.
|
||||
* @bio: destination bio
|
||||
* @page: start page to add
|
||||
* @len: length of the data to add
|
||||
* @off: offset of the data relative to @page
|
||||
* @same_page: return if the segment has been merged inside the same page
|
||||
*
|
||||
* Try to add the data at @page + @off to the last bvec of @bio. This is a
|
||||
* useful optimisation for file systems with a block size smaller than the
|
||||
* page size.
|
||||
*
|
||||
* Warn if (@len, @off) crosses pages in case that @same_page is true.
|
||||
*
|
||||
* Return %true on success or %false on failure.
|
||||
*/
|
||||
static bool __bio_try_merge_page(struct bio *bio, struct page *page,
|
||||
unsigned int len, unsigned int off, bool *same_page)
|
||||
{
|
||||
if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
|
||||
return false;
|
||||
|
||||
if (bio->bi_vcnt > 0) {
|
||||
struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
|
||||
|
||||
if (page_is_mergeable(bv, page, len, off, same_page)) {
|
||||
if (bio->bi_iter.bi_size > UINT_MAX - len) {
|
||||
*same_page = false;
|
||||
return false;
|
||||
}
|
||||
bv->bv_len += len;
|
||||
bio->bi_iter.bi_size += len;
|
||||
return true;
|
||||
}
|
||||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
/*
|
||||
* Try to merge a page into a segment, while obeying the hardware segment
|
||||
* size limit. This is not for normal read/write bios, but for passthrough
|
||||
@ -908,7 +965,7 @@ EXPORT_SYMBOL(bio_add_pc_page);
|
||||
int bio_add_zone_append_page(struct bio *bio, struct page *page,
|
||||
unsigned int len, unsigned int offset)
|
||||
{
|
||||
struct request_queue *q = bio->bi_bdev->bd_disk->queue;
|
||||
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
|
||||
bool same_page = false;
|
||||
|
||||
if (WARN_ON_ONCE(bio_op(bio) != REQ_OP_ZONE_APPEND))
|
||||
@ -922,45 +979,6 @@ int bio_add_zone_append_page(struct bio *bio, struct page *page,
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(bio_add_zone_append_page);
|
||||
|
||||
/**
|
||||
* __bio_try_merge_page - try appending data to an existing bvec.
|
||||
* @bio: destination bio
|
||||
* @page: start page to add
|
||||
* @len: length of the data to add
|
||||
* @off: offset of the data relative to @page
|
||||
* @same_page: return if the segment has been merged inside the same page
|
||||
*
|
||||
* Try to add the data at @page + @off to the last bvec of @bio. This is a
|
||||
* useful optimisation for file systems with a block size smaller than the
|
||||
* page size.
|
||||
*
|
||||
* Warn if (@len, @off) crosses pages in case that @same_page is true.
|
||||
*
|
||||
* Return %true on success or %false on failure.
|
||||
*/
|
||||
bool __bio_try_merge_page(struct bio *bio, struct page *page,
|
||||
unsigned int len, unsigned int off, bool *same_page)
|
||||
{
|
||||
if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
|
||||
return false;
|
||||
|
||||
if (bio->bi_vcnt > 0) {
|
||||
struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
|
||||
|
||||
if (page_is_mergeable(bv, page, len, off, same_page)) {
|
||||
if (bio->bi_iter.bi_size > UINT_MAX - len) {
|
||||
*same_page = false;
|
||||
return false;
|
||||
}
|
||||
bv->bv_len += len;
|
||||
bio->bi_iter.bi_size += len;
|
||||
return true;
|
||||
}
|
||||
}
|
||||
return false;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(__bio_try_merge_page);
|
||||
|
||||
/**
|
||||
* __bio_add_page - add page(s) to a bio in a new segment
|
||||
* @bio: destination bio
|
||||
@ -1015,52 +1033,40 @@ int bio_add_page(struct bio *bio, struct page *page,
}
EXPORT_SYMBOL(bio_add_page);

void bio_release_pages(struct bio *bio, bool mark_dirty)
void __bio_release_pages(struct bio *bio, bool mark_dirty)
{
struct bvec_iter_all iter_all;
struct bio_vec *bvec;

if (bio_flagged(bio, BIO_NO_PAGE_REF))
return;

bio_for_each_segment_all(bvec, bio, iter_all) {
if (mark_dirty && !PageCompound(bvec->bv_page))
set_page_dirty_lock(bvec->bv_page);
put_page(bvec->bv_page);
}
}
EXPORT_SYMBOL_GPL(bio_release_pages);
EXPORT_SYMBOL_GPL(__bio_release_pages);

static void __bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
void bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
{
size_t size = iov_iter_count(iter);

WARN_ON_ONCE(bio->bi_max_vecs);

if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
size_t max_sectors = queue_max_zone_append_sectors(q);

size = min(size, max_sectors << SECTOR_SHIFT);
}

bio->bi_vcnt = iter->nr_segs;
bio->bi_io_vec = (struct bio_vec *)iter->bvec;
bio->bi_iter.bi_bvec_done = iter->iov_offset;
bio->bi_iter.bi_size = iter->count;
bio->bi_iter.bi_size = size;
bio_set_flag(bio, BIO_NO_PAGE_REF);
bio_set_flag(bio, BIO_CLONED);
}

static int bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
{
__bio_iov_bvec_set(bio, iter);
iov_iter_advance(iter, iter->count);
return 0;
}

static int bio_iov_bvec_set_append(struct bio *bio, struct iov_iter *iter)
{
struct request_queue *q = bio->bi_bdev->bd_disk->queue;
struct iov_iter i = *iter;

iov_iter_truncate(&i, queue_max_zone_append_sectors(q) << 9);
__bio_iov_bvec_set(bio, &i);
iov_iter_advance(iter, i.count);
return 0;
}

static void bio_put_pages(struct page **pages, size_t size, size_t off)
{
size_t i, nr = DIV_ROUND_UP(size + (off & ~PAGE_MASK), PAGE_SIZE);
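
In the hunk above, bio_iov_bvec_set() computes the bio size itself and clamps it to the zone-append limit, where the code previously needed a separate append helper. The size computation can be sketched in userspace as below. toy_bvec_set_size() and TOY_SECTOR_SHIFT are hypothetical names for illustration; the shift of 9 (512-byte sectors) matches the `<< 9` in the removed helper.

```c
#include <stddef.h>

#define TOY_SECTOR_SHIFT 9	/* 512-byte sectors, as in the "<< 9" above */

/*
 * Return the number of bytes the bio should describe: the full iterator
 * count, or at most the zone-append limit when the bio is a zone append.
 */
size_t toy_bvec_set_size(size_t iter_count, int is_zone_append,
			 size_t max_append_sectors)
{
	size_t size = iter_count;

	if (is_zone_append) {
		size_t limit = max_append_sectors << TOY_SECTOR_SHIFT;

		if (size > limit)
			size = limit;
	}
	return size;
}
```

The caller then advances the iterator by the returned size, as the later bio_iov_iter_get_pages() hunk does with bio->bi_iter.bi_size, so any bytes beyond the limit stay in the iterator for a following bio.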
@ -1130,7 +1136,7 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
|
||||
{
|
||||
unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
|
||||
unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
|
||||
struct request_queue *q = bio->bi_bdev->bd_disk->queue;
|
||||
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
|
||||
unsigned int max_append_sectors = queue_max_zone_append_sectors(q);
|
||||
struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
|
||||
struct page **pages = (struct page **)bv;
|
||||
@ -1202,9 +1208,9 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
|
||||
int ret = 0;
|
||||
|
||||
if (iov_iter_is_bvec(iter)) {
|
||||
if (bio_op(bio) == REQ_OP_ZONE_APPEND)
|
||||
return bio_iov_bvec_set_append(bio, iter);
|
||||
return bio_iov_bvec_set(bio, iter);
|
||||
bio_iov_bvec_set(bio, iter);
|
||||
iov_iter_advance(iter, bio->bi_iter.bi_size);
|
||||
return 0;
|
||||
}
|
||||
|
||||
do {
|
||||
@ -1260,18 +1266,7 @@ int submit_bio_wait(struct bio *bio)
|
||||
}
|
||||
EXPORT_SYMBOL(submit_bio_wait);
|
||||
|
||||
/**
|
||||
* bio_advance - increment/complete a bio by some number of bytes
|
||||
* @bio: bio to advance
|
||||
* @bytes: number of bytes to complete
|
||||
*
|
||||
* This updates bi_sector, bi_size and bi_idx; if the number of bytes to
|
||||
* complete doesn't align with a bvec boundary, then bv_len and bv_offset will
|
||||
* be updated on the last bvec as well.
|
||||
*
|
||||
* @bio will then represent the remaining, uncompleted portion of the io.
|
||||
*/
|
||||
void bio_advance(struct bio *bio, unsigned bytes)
|
||||
void __bio_advance(struct bio *bio, unsigned bytes)
|
||||
{
|
||||
if (bio_integrity(bio))
|
||||
bio_integrity_advance(bio, bytes);
|
||||
@ -1279,7 +1274,7 @@ void bio_advance(struct bio *bio, unsigned bytes)
|
||||
bio_crypt_advance(bio, bytes);
|
||||
bio_advance_iter(bio, &bio->bi_iter, bytes);
|
||||
}
|
||||
EXPORT_SYMBOL(bio_advance);
|
||||
EXPORT_SYMBOL(__bio_advance);
|
||||
|
||||
void bio_copy_data_iter(struct bio *dst, struct bvec_iter *dst_iter,
|
||||
struct bio *src, struct bvec_iter *src_iter)
|
||||
@ -1467,10 +1462,10 @@ again:
|
||||
return;
|
||||
|
||||
if (bio->bi_bdev && bio_flagged(bio, BIO_TRACKED))
|
||||
rq_qos_done_bio(bio->bi_bdev->bd_disk->queue, bio);
|
||||
rq_qos_done_bio(bdev_get_queue(bio->bi_bdev), bio);
|
||||
|
||||
if (bio->bi_bdev && bio_flagged(bio, BIO_TRACE_COMPLETION)) {
|
||||
trace_block_bio_complete(bio->bi_bdev->bd_disk->queue, bio);
|
||||
trace_block_bio_complete(bdev_get_queue(bio->bi_bdev), bio);
|
||||
bio_clear_flag(bio, BIO_TRACE_COMPLETION);
|
||||
}
|
||||
|
||||
|
@ -32,6 +32,7 @@
|
||||
#include <linux/psi.h>
|
||||
#include "blk.h"
|
||||
#include "blk-ioprio.h"
|
||||
#include "blk-throttle.h"
|
||||
|
||||
/*
|
||||
* blkcg_pol_mutex protects blkcg_policy[] and policy [de]activation.
|
||||
@ -620,7 +621,7 @@ struct block_device *blkcg_conf_open_bdev(char **inputp)
|
||||
*/
|
||||
int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
|
||||
char *input, struct blkg_conf_ctx *ctx)
|
||||
__acquires(rcu) __acquires(&bdev->bd_disk->queue->queue_lock)
|
||||
__acquires(rcu) __acquires(&bdev->bd_queue->queue_lock)
|
||||
{
|
||||
struct block_device *bdev;
|
||||
struct request_queue *q;
|
||||
@ -631,7 +632,15 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
|
||||
if (IS_ERR(bdev))
|
||||
return PTR_ERR(bdev);
|
||||
|
||||
q = bdev->bd_disk->queue;
|
||||
q = bdev_get_queue(bdev);
|
||||
|
||||
/*
|
||||
* blkcg_deactivate_policy() requires queue to be frozen, we can grab
|
||||
* q_usage_counter to prevent concurrent with blkcg_deactivate_policy().
|
||||
*/
|
||||
ret = blk_queue_enter(q, 0);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
rcu_read_lock();
|
||||
spin_lock_irq(&q->queue_lock);
|
||||
@ -702,6 +711,7 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
|
||||
goto success;
|
||||
}
|
||||
success:
|
||||
blk_queue_exit(q);
|
||||
ctx->bdev = bdev;
|
||||
ctx->blkg = blkg;
|
||||
ctx->body = input;
|
||||
@ -714,6 +724,7 @@ fail_unlock:
|
||||
rcu_read_unlock();
|
||||
fail:
|
||||
blkdev_put_no_open(bdev);
|
||||
blk_queue_exit(q);
|
||||
/*
|
||||
* If queue was bypassing, we should retry. Do so after a
|
||||
* short msleep(). It isn't strictly necessary but queue
|
||||
@ -736,9 +747,9 @@ EXPORT_SYMBOL_GPL(blkg_conf_prep);
|
||||
* with blkg_conf_prep().
|
||||
*/
|
||||
void blkg_conf_finish(struct blkg_conf_ctx *ctx)
|
||||
__releases(&ctx->bdev->bd_disk->queue->queue_lock) __releases(rcu)
|
||||
__releases(&ctx->bdev->bd_queue->queue_lock) __releases(rcu)
|
||||
{
|
||||
spin_unlock_irq(&ctx->bdev->bd_disk->queue->queue_lock);
|
||||
spin_unlock_irq(&bdev_get_queue(ctx->bdev)->queue_lock);
|
||||
rcu_read_unlock();
|
||||
blkdev_put_no_open(ctx->bdev);
|
||||
}
|
||||
@ -841,7 +852,7 @@ static void blkcg_fill_root_iostats(void)
|
||||
while ((dev = class_dev_iter_next(&iter))) {
|
||||
struct block_device *bdev = dev_to_bdev(dev);
|
||||
struct blkcg_gq *blkg =
|
||||
blk_queue_root_blkg(bdev->bd_disk->queue);
|
||||
blk_queue_root_blkg(bdev_get_queue(bdev));
|
||||
struct blkg_iostat tmp;
|
||||
int cpu;
|
||||
|
||||
@ -1800,7 +1811,7 @@ static inline struct blkcg_gq *blkg_tryget_closest(struct bio *bio,
|
||||
|
||||
rcu_read_lock();
|
||||
blkg = blkg_lookup_create(css_to_blkcg(css),
|
||||
bio->bi_bdev->bd_disk->queue);
|
||||
bdev_get_queue(bio->bi_bdev));
|
||||
while (blkg) {
|
||||
if (blkg_tryget(blkg)) {
|
||||
ret_blkg = blkg;
|
||||
@ -1836,8 +1847,8 @@ void bio_associate_blkg_from_css(struct bio *bio,
|
||||
if (css && css->parent) {
|
||||
bio->bi_blkg = blkg_tryget_closest(bio, css);
|
||||
} else {
|
||||
blkg_get(bio->bi_bdev->bd_disk->queue->root_blkg);
|
||||
bio->bi_blkg = bio->bi_bdev->bd_disk->queue->root_blkg;
|
||||
blkg_get(bdev_get_queue(bio->bi_bdev)->root_blkg);
|
||||
bio->bi_blkg = bdev_get_queue(bio->bi_bdev)->root_blkg;
|
||||
}
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(bio_associate_blkg_from_css);
|
||||
|
404
block/blk-core.c
@ -18,6 +18,7 @@
|
||||
#include <linux/blkdev.h>
|
||||
#include <linux/blk-mq.h>
|
||||
#include <linux/blk-pm.h>
|
||||
#include <linux/blk-integrity.h>
|
||||
#include <linux/highmem.h>
|
||||
#include <linux/mm.h>
|
||||
#include <linux/pagemap.h>
|
||||
@ -49,6 +50,7 @@
|
||||
#include "blk-mq.h"
|
||||
#include "blk-mq-sched.h"
|
||||
#include "blk-pm.h"
|
||||
#include "blk-throttle.h"
|
||||
|
||||
struct dentry *blk_debugfs_root;
|
||||
|
||||
@ -214,8 +216,7 @@ int blk_status_to_errno(blk_status_t status)
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_status_to_errno);
|
||||
|
||||
static void print_req_error(struct request *req, blk_status_t status,
|
||||
const char *caller)
|
||||
void blk_print_req_error(struct request *req, blk_status_t status)
|
||||
{
|
||||
int idx = (__force int)status;
|
||||
|
||||
@ -223,9 +224,9 @@ static void print_req_error(struct request *req, blk_status_t status,
|
||||
return;
|
||||
|
||||
printk_ratelimited(KERN_ERR
|
||||
"%s: %s error, dev %s, sector %llu op 0x%x:(%s) flags 0x%x "
|
||||
"%s error, dev %s, sector %llu op 0x%x:(%s) flags 0x%x "
|
||||
"phys_seg %u prio class %u\n",
|
||||
caller, blk_errors[idx].name,
|
||||
blk_errors[idx].name,
|
||||
req->rq_disk ? req->rq_disk->disk_name : "?",
|
||||
blk_rq_pos(req), req_op(req), blk_op_str(req_op(req)),
|
||||
req->cmd_flags & ~REQ_OP_MASK,
|
||||
@ -233,33 +234,6 @@ static void print_req_error(struct request *req, blk_status_t status,
|
||||
IOPRIO_PRIO_CLASS(req->ioprio));
|
||||
}
|
||||
|
||||
static void req_bio_endio(struct request *rq, struct bio *bio,
|
||||
unsigned int nbytes, blk_status_t error)
|
||||
{
|
||||
if (error)
|
||||
bio->bi_status = error;
|
||||
|
||||
if (unlikely(rq->rq_flags & RQF_QUIET))
|
||||
bio_set_flag(bio, BIO_QUIET);
|
||||
|
||||
bio_advance(bio, nbytes);
|
||||
|
||||
if (req_op(rq) == REQ_OP_ZONE_APPEND && error == BLK_STS_OK) {
|
||||
/*
|
||||
* Partial zone append completions cannot be supported as the
|
||||
* BIO fragments may end up not being written sequentially.
|
||||
*/
|
||||
if (bio->bi_iter.bi_size)
|
||||
bio->bi_status = BLK_STS_IOERR;
|
||||
else
|
||||
bio->bi_iter.bi_sector = rq->__sector;
|
||||
}
|
||||
|
||||
/* don't actually finish bio if it's part of flush sequence */
|
||||
if (bio->bi_iter.bi_size == 0 && !(rq->rq_flags & RQF_FLUSH_SEQ))
|
||||
bio_endio(bio);
|
||||
}
|
||||
|
||||
void blk_dump_rq_flags(struct request *rq, char *msg)
|
||||
{
|
||||
printk(KERN_INFO "%s: dev %s: flags=%llx\n", msg,
|
||||
@@ -402,7 +376,7 @@ void blk_cleanup_queue(struct request_queue *q)
 	 */
 	mutex_lock(&q->sysfs_lock);
 	if (q->elevator)
-		blk_mq_sched_free_requests(q);
+		blk_mq_sched_free_rqs(q);
 	mutex_unlock(&q->sysfs_lock);
 
 	percpu_ref_exit(&q->q_usage_counter);
@@ -415,7 +389,7 @@ EXPORT_SYMBOL(blk_cleanup_queue);
 static bool blk_try_enter_queue(struct request_queue *q, bool pm)
 {
 	rcu_read_lock();
-	if (!percpu_ref_tryget_live(&q->q_usage_counter))
+	if (!percpu_ref_tryget_live_rcu(&q->q_usage_counter))
 		goto fail;
 
 	/*
@@ -430,7 +404,7 @@ static bool blk_try_enter_queue(struct request_queue *q, bool pm)
 	return true;
 
 fail_put:
-	percpu_ref_put(&q->q_usage_counter);
+	blk_queue_exit(q);
 fail:
 	rcu_read_unlock();
 	return false;
@@ -470,10 +444,11 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 
 static inline int bio_queue_enter(struct bio *bio)
 {
-	struct gendisk *disk = bio->bi_bdev->bd_disk;
-	struct request_queue *q = disk->queue;
+	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 
 	while (!blk_try_enter_queue(q, false)) {
+		struct gendisk *disk = bio->bi_bdev->bd_disk;
+
 		if (bio->bi_opf & REQ_NOWAIT) {
 			if (test_bit(GD_DEAD, &disk->state))
 				goto dead;
@@ -553,7 +528,7 @@ struct request_queue *blk_alloc_queue(int node_id)
 
 	q->node = node_id;
 
-	atomic_set(&q->nr_active_requests_shared_sbitmap, 0);
+	atomic_set(&q->nr_active_requests_shared_tags, 0);
 
 	timer_setup(&q->timeout, blk_rq_timed_out_timer, 0);
 	INIT_WORK(&q->timeout_work, blk_timeout_work);
@@ -586,7 +561,7 @@ struct request_queue *blk_alloc_queue(int node_id)
 
 	blk_queue_dma_alignment(q, 511);
 	blk_set_default_limits(&q->limits);
-	q->nr_requests = BLKDEV_MAX_RQ;
+	q->nr_requests = BLKDEV_DEFAULT_RQ;
 
 	return q;
 
@@ -654,8 +629,9 @@ static void handle_bad_sector(struct bio *bio, sector_t maxsector)
 {
 	char b[BDEVNAME_SIZE];
 
-	pr_info_ratelimited("attempt to access beyond end of device\n"
+	pr_info_ratelimited("%s: attempt to access beyond end of device\n"
 			    "%s: rw=%d, want=%llu, limit=%llu\n",
+			    current->comm,
 			    bio_devname(bio, b), bio->bi_opf,
 			    bio_end_sector(bio), maxsector);
 }
@@ -797,7 +773,7 @@ static inline blk_status_t blk_check_zone_append(struct request_queue *q,
 static noinline_for_stack bool submit_bio_checks(struct bio *bio)
 {
 	struct block_device *bdev = bio->bi_bdev;
-	struct request_queue *q = bdev->bd_disk->queue;
+	struct request_queue *q = bdev_get_queue(bdev);
 	blk_status_t status = BLK_STS_IOERR;
 	struct blk_plug *plug;
 
@@ -839,7 +815,7 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio)
 	}
 
 	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
-		bio_clear_hipri(bio);
+		bio_clear_polled(bio);
 
 	switch (bio_op(bio)) {
 	case REQ_OP_DISCARD:
@@ -912,25 +888,22 @@ end_io:
 	return false;
 }
 
-static blk_qc_t __submit_bio(struct bio *bio)
+static void __submit_bio(struct bio *bio)
 {
 	struct gendisk *disk = bio->bi_bdev->bd_disk;
-	blk_qc_t ret = BLK_QC_T_NONE;
 
 	if (unlikely(bio_queue_enter(bio) != 0))
-		return BLK_QC_T_NONE;
+		return;
 
 	if (!submit_bio_checks(bio) || !blk_crypto_bio_prep(&bio))
 		goto queue_exit;
-	if (disk->fops->submit_bio) {
-		ret = disk->fops->submit_bio(bio);
-		goto queue_exit;
+	if (!disk->fops->submit_bio) {
+		blk_mq_submit_bio(bio);
+		return;
 	}
-	return blk_mq_submit_bio(bio);
-
+	disk->fops->submit_bio(bio);
 queue_exit:
 	blk_queue_exit(disk->queue);
-	return ret;
 }
 
 /*
@@ -952,10 +925,9 @@ queue_exit:
  * bio_list_on_stack[1] contains bios that were submitted before the current
  *	->submit_bio, but that haven't been processed yet.
  */
-static blk_qc_t __submit_bio_noacct(struct bio *bio)
+static void __submit_bio_noacct(struct bio *bio)
 {
 	struct bio_list bio_list_on_stack[2];
-	blk_qc_t ret = BLK_QC_T_NONE;
 
 	BUG_ON(bio->bi_next);
 
@@ -963,7 +935,7 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio)
 	current->bio_list = bio_list_on_stack;
 
 	do {
-		struct request_queue *q = bio->bi_bdev->bd_disk->queue;
+		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 		struct bio_list lower, same;
 
 		/*
@@ -972,7 +944,7 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio)
 		bio_list_on_stack[1] = bio_list_on_stack[0];
 		bio_list_init(&bio_list_on_stack[0]);
 
-		ret = __submit_bio(bio);
+		__submit_bio(bio);
 
 		/*
 		 * Sort new bios into those for a lower level and those for the
@@ -981,7 +953,7 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio)
 		bio_list_init(&lower);
 		bio_list_init(&same);
 		while ((bio = bio_list_pop(&bio_list_on_stack[0])) != NULL)
-			if (q == bio->bi_bdev->bd_disk->queue)
+			if (q == bdev_get_queue(bio->bi_bdev))
 				bio_list_add(&same, bio);
 			else
 				bio_list_add(&lower, bio);
@@ -995,22 +967,19 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio)
 	} while ((bio = bio_list_pop(&bio_list_on_stack[0])));
 
 	current->bio_list = NULL;
-	return ret;
 }
 
-static blk_qc_t __submit_bio_noacct_mq(struct bio *bio)
+static void __submit_bio_noacct_mq(struct bio *bio)
 {
 	struct bio_list bio_list[2] = { };
-	blk_qc_t ret;
 
 	current->bio_list = bio_list;
 
 	do {
-		ret = __submit_bio(bio);
+		__submit_bio(bio);
 	} while ((bio = bio_list_pop(&bio_list[0])));
 
 	current->bio_list = NULL;
-	return ret;
 }
 
 /**
@@ -1022,7 +991,7 @@ static blk_qc_t __submit_bio_noacct_mq(struct bio *bio)
  * systems and other upper level users of the block layer should use
  * submit_bio() instead.
  */
-blk_qc_t submit_bio_noacct(struct bio *bio)
+void submit_bio_noacct(struct bio *bio)
 {
 	/*
 	 * We only want one ->submit_bio to be active at a time, else stack
@@ -1030,14 +999,12 @@ blk_qc_t submit_bio_noacct(struct bio *bio)
 	 * to collect a list of requests submitted by a ->submit_bio method while
 	 * it is active, and then process them after it returned.
 	 */
-	if (current->bio_list) {
+	if (current->bio_list)
 		bio_list_add(&current->bio_list[0], bio);
-		return BLK_QC_T_NONE;
-	}
-
-	if (!bio->bi_bdev->bd_disk->fops->submit_bio)
-		return __submit_bio_noacct_mq(bio);
-	return __submit_bio_noacct(bio);
+	else if (!bio->bi_bdev->bd_disk->fops->submit_bio)
+		__submit_bio_noacct_mq(bio);
+	else
+		__submit_bio_noacct(bio);
 }
 EXPORT_SYMBOL(submit_bio_noacct);
 
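The comment in submit_bio_noacct() describes how nested submissions are collected on a per-task list instead of recursing. Below is a minimal userspace sketch of that idea, not kernel code and not part of this patch. Every `toy_*` name, the `depth` field, and the counters are hypothetical stand-ins invented for illustration. It mirrors only the simple loop of __submit_bio_noacct_mq() and leaves out the lower/same sorting done by __submit_bio_noacct().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio: only the fields this sketch needs. */
struct toy_bio {
	int id;
	int depth;		/* layers of stacked drivers still to traverse */
	struct toy_bio *next;
};

/* Hypothetical stand-in for struct bio_list: a singly linked FIFO. */
struct toy_bio_list {
	struct toy_bio *head, *tail;
};

static struct toy_bio_list *current_bio_list;	/* models current->bio_list */
static int nesting, max_nesting;		/* active driver calls, and the peak */
static int driver_calls;			/* total driver invocations */

static void toy_list_add(struct toy_bio_list *l, struct toy_bio *bio)
{
	bio->next = NULL;
	if (l->tail)
		l->tail->next = bio;
	else
		l->head = bio;
	l->tail = bio;
}

static struct toy_bio *toy_list_pop(struct toy_bio_list *l)
{
	struct toy_bio *bio = l->head;

	if (bio) {
		l->head = bio->next;
		if (!l->head)
			l->tail = NULL;
		bio->next = NULL;
	}
	return bio;
}

static void toy_submit_bio_noacct(struct toy_bio *bio);

/* Models a stacking driver's ->submit_bio: remap, then resubmit one layer down. */
static void toy_driver_submit(struct toy_bio *bio)
{
	nesting++;
	if (nesting > max_nesting)
		max_nesting = nesting;
	driver_calls++;
	if (bio->depth > 0) {
		bio->depth--;
		toy_submit_bio_noacct(bio);	/* queued, not recursed into */
	}
	nesting--;
}

static void toy_submit_bio_noacct(struct toy_bio *bio)
{
	struct toy_bio_list on_stack = { NULL, NULL };

	assert(bio != NULL);
	if (current_bio_list) {
		/* A driver call is already active: defer to the outer loop. */
		toy_list_add(current_bio_list, bio);
		return;
	}

	current_bio_list = &on_stack;
	do {
		toy_driver_submit(bio);
	} while ((bio = toy_list_pop(&on_stack)) != NULL);
	current_bio_list = NULL;
}
```

Submitting a toy bio with `depth = 3` runs the driver four times, yet `max_nesting` stays at 1: each resubmission is picked up by the outer loop after the previous driver call has returned, which is what keeps stack usage bounded.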
@@ -1054,10 +1021,10 @@ EXPORT_SYMBOL(submit_bio_noacct);
  * in @bio. The bio must NOT be touched by the caller until ->bi_end_io() has
  * been called.
  */
-blk_qc_t submit_bio(struct bio *bio)
+void submit_bio(struct bio *bio)
 {
 	if (blkcg_punt_bio_submit(bio))
-		return BLK_QC_T_NONE;
+		return;
 
 	/*
 	 * If it's a regular read/write or a barrier with data attached,
@@ -1068,7 +1035,7 @@ blk_qc_t submit_bio(struct bio *bio)
 
 		if (unlikely(bio_op(bio) == REQ_OP_WRITE_SAME))
 			count = queue_logical_block_size(
-					bio->bi_bdev->bd_disk->queue) >> 9;
+					bdev_get_queue(bio->bi_bdev)) >> 9;
 		else
 			count = bio_sectors(bio);
 
@@ -1089,19 +1056,92 @@ blk_qc_t submit_bio(struct bio *bio)
 	if (unlikely(bio_op(bio) == REQ_OP_READ &&
 	    bio_flagged(bio, BIO_WORKINGSET))) {
 		unsigned long pflags;
-		blk_qc_t ret;
 
 		psi_memstall_enter(&pflags);
-		ret = submit_bio_noacct(bio);
+		submit_bio_noacct(bio);
 		psi_memstall_leave(&pflags);
-
-		return ret;
+		return;
 	}
 
-	return submit_bio_noacct(bio);
+	submit_bio_noacct(bio);
 }
 EXPORT_SYMBOL(submit_bio);
 
 /**
+ * bio_poll - poll for BIO completions
+ * @bio: bio to poll for
+ * @flags: BLK_POLL_* flags that control the behavior
+ *
+ * Poll for completions on queue associated with the bio. Returns number of
+ * completed entries found.
+ *
+ * Note: the caller must either be the context that submitted @bio, or
+ * be in a RCU critical section to prevent freeing of @bio.
+ */
+int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags)
+{
+	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
+	blk_qc_t cookie = READ_ONCE(bio->bi_cookie);
+	int ret;
+
+	if (cookie == BLK_QC_T_NONE ||
+	    !test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
+		return 0;
+
+	if (current->plug)
+		blk_flush_plug(current->plug, false);
+
+	if (blk_queue_enter(q, BLK_MQ_REQ_NOWAIT))
+		return 0;
+	if (WARN_ON_ONCE(!queue_is_mq(q)))
+		ret = 0;	/* not yet implemented, should not happen */
+	else
+		ret = blk_mq_poll(q, cookie, iob, flags);
+	blk_queue_exit(q);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(bio_poll);
+
+/*
+ * Helper to implement file_operations.iopoll. Requires the bio to be stored
+ * in iocb->private, and cleared before freeing the bio.
+ */
+int iocb_bio_iopoll(struct kiocb *kiocb, struct io_comp_batch *iob,
+		    unsigned int flags)
+{
+	struct bio *bio;
+	int ret = 0;
+
+	/*
+	 * Note: the bio cache only uses SLAB_TYPESAFE_BY_RCU, so bio can
+	 * point to a freshly allocated bio at this point. If that happens
+	 * we have a few cases to consider:
+	 *
+	 *  1) the bio is being initialized and bi_bdev is NULL. We can
+	 *     simply do nothing in this case
+	 *  2) the bio points to a not poll enabled device. bio_poll will catch
+	 *     this and return 0
+	 *  3) the bio points to a poll capable device, including but not
+	 *     limited to the one that the original bio pointed to. In this
+	 *     case we will call into the actual poll method and poll for I/O,
+	 *     even if we don't need to, but it won't cause harm either.
+	 *
+	 * For cases 2) and 3) above the RCU grace period ensures that bi_bdev
+	 * is still allocated. Because partitions hold a reference to the whole
+	 * device bdev and thus disk, the disk is also still valid. Grabbing
+	 * a reference to the queue in bio_poll() ensures the hctxs and requests
+	 * are still valid as well.
+	 */
+	rcu_read_lock();
+	bio = READ_ONCE(kiocb->private);
+	if (bio && bio->bi_bdev)
+		ret = bio_poll(bio, iob, flags);
+	rcu_read_unlock();
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iocb_bio_iopoll);
+
+/**
  * blk_cloned_rq_check_limits - Helper function to check a cloned request
  *                              for the new queue limits
@@ -1177,8 +1217,7 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
 	if (blk_crypto_insert_cloned_request(rq))
 		return BLK_STS_IOERR;
 
-	if (blk_queue_io_stat(q))
-		blk_account_io_start(rq);
+	blk_account_io_start(rq);
 
 	/*
 	 * Since we have a scheduler attached on the top device,
@@ -1246,41 +1285,19 @@ again:
 	}
 }
 
-static void blk_account_io_completion(struct request *req, unsigned int bytes)
+void __blk_account_io_done(struct request *req, u64 now)
 {
-	if (req->part && blk_do_io_stat(req)) {
-		const int sgrp = op_stat_group(req_op(req));
+	const int sgrp = op_stat_group(req_op(req));
 
-		part_stat_lock();
-		part_stat_add(req->part, sectors[sgrp], bytes >> 9);
-		part_stat_unlock();
-	}
+	part_stat_lock();
+	update_io_ticks(req->part, jiffies, true);
+	part_stat_inc(req->part, ios[sgrp]);
+	part_stat_add(req->part, nsecs[sgrp], now - req->start_time_ns);
+	part_stat_unlock();
 }
 
-void blk_account_io_done(struct request *req, u64 now)
+void __blk_account_io_start(struct request *rq)
 {
-	/*
-	 * Account IO completion. flush_rq isn't accounted as a
-	 * normal IO on queueing nor completion. Accounting the
-	 * containing request is enough.
-	 */
-	if (req->part && blk_do_io_stat(req) &&
-	    !(req->rq_flags & RQF_FLUSH_SEQ)) {
-		const int sgrp = op_stat_group(req_op(req));
-
-		part_stat_lock();
-		update_io_ticks(req->part, jiffies, true);
-		part_stat_inc(req->part, ios[sgrp]);
-		part_stat_add(req->part, nsecs[sgrp], now - req->start_time_ns);
-		part_stat_unlock();
-	}
-}
-
-void blk_account_io_start(struct request *rq)
-{
-	if (!blk_do_io_stat(rq))
-		return;
-
 	/* passthrough requests can hold bios that do not have ->bi_bdev set */
 	if (rq->bio && rq->bio->bi_bdev)
 		rq->part = rq->bio->bi_bdev;
@@ -1376,112 +1393,6 @@ void blk_steal_bios(struct bio_list *list, struct request *rq)
 }
 EXPORT_SYMBOL_GPL(blk_steal_bios);
 
-/**
- * blk_update_request - Complete multiple bytes without completing the request
- * @req: the request being processed
- * @error: block status code
- * @nr_bytes: number of bytes to complete for @req
- *
- * Description:
- *     Ends I/O on a number of bytes attached to @req, but doesn't complete
- *     the request structure even if @req doesn't have leftover.
- *     If @req has leftover, sets it up for the next range of segments.
- *
- *     Passing the result of blk_rq_bytes() as @nr_bytes guarantees
- *     %false return from this function.
- *
- * Note:
- *     The RQF_SPECIAL_PAYLOAD flag is ignored on purpose in this function
- *     except in the consistency check at the end of this function.
- *
- * Return:
- *     %false - this request doesn't have any more data
- *     %true - this request has more data
- **/
-bool blk_update_request(struct request *req, blk_status_t error,
-		unsigned int nr_bytes)
-{
-	int total_bytes;
-
-	trace_block_rq_complete(req, blk_status_to_errno(error), nr_bytes);
-
-	if (!req->bio)
-		return false;
-
-#ifdef CONFIG_BLK_DEV_INTEGRITY
-	if (blk_integrity_rq(req) && req_op(req) == REQ_OP_READ &&
-	    error == BLK_STS_OK)
-		req->q->integrity.profile->complete_fn(req, nr_bytes);
-#endif
-
-	if (unlikely(error && !blk_rq_is_passthrough(req) &&
-		     !(req->rq_flags & RQF_QUIET)))
-		print_req_error(req, error, __func__);
-
-	blk_account_io_completion(req, nr_bytes);
-
-	total_bytes = 0;
-	while (req->bio) {
-		struct bio *bio = req->bio;
-		unsigned bio_bytes = min(bio->bi_iter.bi_size, nr_bytes);
-
-		if (bio_bytes == bio->bi_iter.bi_size)
-			req->bio = bio->bi_next;
-
-		/* Completion has already been traced */
-		bio_clear_flag(bio, BIO_TRACE_COMPLETION);
-		req_bio_endio(req, bio, bio_bytes, error);
-
-		total_bytes += bio_bytes;
-		nr_bytes -= bio_bytes;
-
-		if (!nr_bytes)
-			break;
-	}
-
-	/*
-	 * completely done
-	 */
-	if (!req->bio) {
-		/*
-		 * Reset counters so that the request stacking driver
-		 * can find how many bytes remain in the request
-		 * later.
-		 */
-		req->__data_len = 0;
-		return false;
-	}
-
-	req->__data_len -= total_bytes;
-
-	/* update sector only for requests with clear definition of sector */
-	if (!blk_rq_is_passthrough(req))
-		req->__sector += total_bytes >> 9;
-
-	/* mixed attributes always follow the first bio */
-	if (req->rq_flags & RQF_MIXED_MERGE) {
-		req->cmd_flags &= ~REQ_FAILFAST_MASK;
-		req->cmd_flags |= req->bio->bi_opf & REQ_FAILFAST_MASK;
-	}
-
-	if (!(req->rq_flags & RQF_SPECIAL_PAYLOAD)) {
-		/*
-		 * If total number of sectors is less than the first segment
-		 * size, something has gone terribly wrong.
-		 */
-		if (blk_rq_bytes(req) < blk_rq_cur_bytes(req)) {
-			blk_dump_rq_flags(req, "request botched");
-			req->__data_len = blk_rq_cur_bytes(req);
-		}
-
-		/* recalculate the number of segments */
-		req->nr_phys_segments = blk_recalc_rq_segments(req);
-	}
-
-	return true;
-}
-EXPORT_SYMBOL_GPL(blk_update_request);
-
 #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
 /**
  * rq_flush_dcache_pages - Helper function to flush all pages in a request
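The kerneldoc of blk_update_request() above describes completing a byte count across a request's bio chain and reporting whether data is left. Below is a minimal userspace sketch of just that accounting loop, not kernel code and not part of this patch. The `toy_*` types, the `ended` flag, and the fixed 512-byte sector shift are hypothetical simplifications. Error handling, integrity, tracing, and segment recalculation are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio. */
struct toy_bio {
	unsigned int size;	/* bytes still outstanding (bi_iter.bi_size) */
	bool ended;		/* set where the kernel would call bio_endio() */
	struct toy_bio *next;
};

/* Hypothetical stand-in for struct request. */
struct toy_req {
	struct toy_bio *bio;		/* first bio that is not yet complete */
	unsigned int data_len;		/* models req->__data_len */
	unsigned long long sector;	/* models req->__sector */
};

/* Returns true if the request still has data, false once it is fully done. */
static bool toy_update_request(struct toy_req *req, unsigned int nr_bytes)
{
	unsigned int total = 0;

	while (req->bio) {
		struct toy_bio *bio = req->bio;
		unsigned int n = bio->size < nr_bytes ? bio->size : nr_bytes;

		if (n == bio->size)
			req->bio = bio->next;	/* fully consumed: unlink it */

		bio->size -= n;			/* models bio_advance() */
		if (bio->size == 0)
			bio->ended = true;	/* models bio_endio() */

		total += n;
		nr_bytes -= n;
		if (!nr_bytes)
			break;
	}

	if (!req->bio) {
		req->data_len = 0;		/* completely done */
		return false;
	}

	req->data_len -= total;
	req->sector += total >> 9;		/* bytes to 512-byte sectors */
	return true;
}
```

With two 4096-byte bios, completing 6144 bytes ends the first bio, leaves 2048 bytes on the second, advances the sector by 12, and returns true. Completing the remaining 2048 bytes returns false.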
@@ -1629,6 +1540,32 @@ int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
 }
 EXPORT_SYMBOL(kblockd_mod_delayed_work_on);
 
+void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
+{
+	struct task_struct *tsk = current;
+
+	/*
+	 * If this is a nested plug, don't actually assign it.
+	 */
+	if (tsk->plug)
+		return;
+
+	plug->mq_list = NULL;
+	plug->cached_rq = NULL;
+	plug->nr_ios = min_t(unsigned short, nr_ios, BLK_MAX_REQUEST_COUNT);
+	plug->rq_count = 0;
+	plug->multiple_queues = false;
+	plug->has_elevator = false;
+	plug->nowait = false;
+	INIT_LIST_HEAD(&plug->cb_list);
+
+	/*
+	 * Store ordering should not be needed here, since a potential
+	 * preempt will imply a full memory barrier
+	 */
+	tsk->plug = plug;
+}
+
 /**
  * blk_start_plug - initialize blk_plug and track it inside the task_struct
  * @plug:	The &struct blk_plug that needs to be initialized
@@ -1654,25 +1591,7 @@ EXPORT_SYMBOL(kblockd_mod_delayed_work_on);
  */
 void blk_start_plug(struct blk_plug *plug)
 {
-	struct task_struct *tsk = current;
-
-	/*
-	 * If this is a nested plug, don't actually assign it.
-	 */
-	if (tsk->plug)
-		return;
-
-	INIT_LIST_HEAD(&plug->mq_list);
-	INIT_LIST_HEAD(&plug->cb_list);
-	plug->rq_count = 0;
-	plug->multiple_queues = false;
-	plug->nowait = false;
-
-	/*
-	 * Store ordering should not be needed here, since a potential
-	 * preempt will imply a full memory barrier
-	 */
-	tsk->plug = plug;
+	blk_start_plug_nr_ios(plug, 1);
 }
 EXPORT_SYMBOL(blk_start_plug);
 
@@ -1718,12 +1637,14 @@ struct blk_plug_cb *blk_check_plugged(blk_plug_cb_fn unplug, void *data,
 }
 EXPORT_SYMBOL(blk_check_plugged);
 
-void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
+void blk_flush_plug(struct blk_plug *plug, bool from_schedule)
 {
-	flush_plug_callbacks(plug, from_schedule);
-
-	if (!list_empty(&plug->mq_list))
+	if (!list_empty(&plug->cb_list))
+		flush_plug_callbacks(plug, from_schedule);
+	if (!rq_list_empty(plug->mq_list))
 		blk_mq_flush_plug_list(plug, from_schedule);
+	if (unlikely(!from_schedule && plug->cached_rq))
+		blk_mq_free_plug_rqs(plug);
 }
 
 /**
@@ -1738,11 +1659,10 @@ void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
  */
 void blk_finish_plug(struct blk_plug *plug)
 {
-	if (plug != current->plug)
-		return;
-	blk_flush_plug_list(plug, false);
-
-	current->plug = NULL;
+	if (plug == current->plug) {
+		blk_flush_plug(plug, false);
+		current->plug = NULL;
+	}
 }
 EXPORT_SYMBOL(blk_finish_plug);
 
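The plug hunks above rely on one rule: a nested plug is never installed, and only the plug that is actually installed on the task flushes on finish. Below is a minimal userspace sketch of that rule, not kernel code and not part of this patch. The `toy_*` names and the `flushed` counter are hypothetical, and a global pointer stands in for `current->plug`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct blk_plug. */
struct toy_plug {
	int rq_count;	/* requests batched while this plug is installed */
	int flushed;	/* requests handed on when the plug was finished */
};

static struct toy_plug *current_plug;	/* models current->plug */

static void toy_start_plug(struct toy_plug *plug)
{
	/* Nested plug: keep the outer one installed and leave this one untouched. */
	if (current_plug)
		return;

	plug->rq_count = 0;
	plug->flushed = 0;
	current_plug = plug;
}

/* Models queueing a request while a plug may be active. */
static void toy_queue_request(void)
{
	if (current_plug)
		current_plug->rq_count++;
}

static void toy_finish_plug(struct toy_plug *plug)
{
	/* Only the installed (outermost) plug flushes and uninstalls itself. */
	if (plug == current_plug) {
		plug->flushed += plug->rq_count;
		plug->rq_count = 0;
		current_plug = NULL;
	}
}
```

Starting an inner plug while an outer one is active changes nothing: requests keep accumulating on the outer plug, finishing the inner plug is a no-op, and the batch is flushed once when the outer plug finishes.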
@@ -12,12 +12,13 @@
 #include <crypto/skcipher.h>
 #include <linux/blk-cgroup.h>
 #include <linux/blk-crypto.h>
+#include <linux/blk-crypto-profile.h>
 #include <linux/blkdev.h>
 #include <linux/crypto.h>
-#include <linux/keyslot-manager.h>
 #include <linux/mempool.h>
 #include <linux/module.h>
 #include <linux/random.h>
+#include <linux/scatterlist.h>
 
 #include "blk-crypto-internal.h"
 
@@ -72,12 +73,12 @@ static mempool_t *bio_fallback_crypt_ctx_pool;
 static DEFINE_MUTEX(tfms_init_lock);
 static bool tfms_inited[BLK_ENCRYPTION_MODE_MAX];
 
-static struct blk_crypto_keyslot {
+static struct blk_crypto_fallback_keyslot {
 	enum blk_crypto_mode_num crypto_mode;
 	struct crypto_skcipher *tfms[BLK_ENCRYPTION_MODE_MAX];
 } *blk_crypto_keyslots;
 
-static struct blk_keyslot_manager blk_crypto_ksm;
+static struct blk_crypto_profile blk_crypto_fallback_profile;
 static struct workqueue_struct *blk_crypto_wq;
 static mempool_t *blk_crypto_bounce_page_pool;
 static struct bio_set crypto_bio_split;
@@ -88,9 +89,9 @@ static struct bio_set crypto_bio_split;
  */
 static u8 blank_key[BLK_CRYPTO_MAX_KEY_SIZE];
 
-static void blk_crypto_evict_keyslot(unsigned int slot)
+static void blk_crypto_fallback_evict_keyslot(unsigned int slot)
 {
-	struct blk_crypto_keyslot *slotp = &blk_crypto_keyslots[slot];
+	struct blk_crypto_fallback_keyslot *slotp = &blk_crypto_keyslots[slot];
 	enum blk_crypto_mode_num crypto_mode = slotp->crypto_mode;
 	int err;
 
@@ -103,45 +104,41 @@ static void blk_crypto_evict_keyslot(unsigned int slot)
 	slotp->crypto_mode = BLK_ENCRYPTION_MODE_INVALID;
 }
 
-static int blk_crypto_keyslot_program(struct blk_keyslot_manager *ksm,
-				      const struct blk_crypto_key *key,
-				      unsigned int slot)
+static int
+blk_crypto_fallback_keyslot_program(struct blk_crypto_profile *profile,
+				    const struct blk_crypto_key *key,
+				    unsigned int slot)
 {
-	struct blk_crypto_keyslot *slotp = &blk_crypto_keyslots[slot];
+	struct blk_crypto_fallback_keyslot *slotp = &blk_crypto_keyslots[slot];
 	const enum blk_crypto_mode_num crypto_mode =
 						key->crypto_cfg.crypto_mode;
 	int err;
 
 	if (crypto_mode != slotp->crypto_mode &&
 	    slotp->crypto_mode != BLK_ENCRYPTION_MODE_INVALID)
-		blk_crypto_evict_keyslot(slot);
+		blk_crypto_fallback_evict_keyslot(slot);
 
 	slotp->crypto_mode = crypto_mode;
 	err = crypto_skcipher_setkey(slotp->tfms[crypto_mode], key->raw,
 				     key->size);
 	if (err) {
-		blk_crypto_evict_keyslot(slot);
+		blk_crypto_fallback_evict_keyslot(slot);
 		return err;
 	}
 	return 0;
 }
 
-static int blk_crypto_keyslot_evict(struct blk_keyslot_manager *ksm,
-				    const struct blk_crypto_key *key,
-				    unsigned int slot)
+static int blk_crypto_fallback_keyslot_evict(struct blk_crypto_profile *profile,
+					     const struct blk_crypto_key *key,
+					     unsigned int slot)
 {
-	blk_crypto_evict_keyslot(slot);
+	blk_crypto_fallback_evict_keyslot(slot);
 	return 0;
 }
 
-/*
- * The crypto API fallback KSM ops - only used for a bio when it specifies a
- * blk_crypto_key that was not supported by the device's inline encryption
- * hardware.
- */
-static const struct blk_ksm_ll_ops blk_crypto_ksm_ll_ops = {
-	.keyslot_program = blk_crypto_keyslot_program,
-	.keyslot_evict = blk_crypto_keyslot_evict,
+static const struct blk_crypto_ll_ops blk_crypto_fallback_ll_ops = {
+	.keyslot_program = blk_crypto_fallback_keyslot_program,
+	.keyslot_evict = blk_crypto_fallback_keyslot_evict,
 };
 
 static void blk_crypto_fallback_encrypt_endio(struct bio *enc_bio)
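The keyslot program path above follows a small state machine: evict the slot if it currently holds a key for a different mode, record the new mode, set the key, and evict again if setting the key fails. Below is a minimal userspace sketch of that control flow, not kernel code and not part of this patch. The `toy_*` names are hypothetical, a byte array stands in for each per-mode `crypto_skcipher`, a wrong key length stands in for `crypto_skcipher_setkey()` failing, and zeroing the stored key is an assumption about what eviction does.

```c
#include <assert.h>
#include <string.h>

enum toy_mode { TOY_MODE_INVALID = 0, TOY_MODE_A, TOY_MODE_B, TOY_MODE_MAX };

#define TOY_KEY_SIZE 8

/* Hypothetical stand-in for struct blk_crypto_fallback_keyslot. */
struct toy_slot {
	enum toy_mode mode;
	unsigned char key[TOY_MODE_MAX][TOY_KEY_SIZE];	/* one "tfm" per mode */
};

static void toy_evict(struct toy_slot *s)
{
	if (s->mode == TOY_MODE_INVALID)
		return;
	/* Assumption: eviction overwrites the stored key with a blank one. */
	memset(s->key[s->mode], 0, TOY_KEY_SIZE);
	s->mode = TOY_MODE_INVALID;
}

static int toy_program(struct toy_slot *s, enum toy_mode mode,
		       const unsigned char *key, size_t len)
{
	/* Holding a key for another mode: clear it before reusing the slot. */
	if (mode != s->mode && s->mode != TOY_MODE_INVALID)
		toy_evict(s);

	s->mode = mode;
	if (len != TOY_KEY_SIZE) {
		/* Models setkey failure: leave the slot empty, not half set. */
		toy_evict(s);
		return -1;
	}
	memcpy(s->key[mode], key, len);
	return 0;
}
```

The point of the second eviction is that a failed program never leaves the slot claiming a mode whose key was not installed.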
@@ -159,7 +156,7 @@ static void blk_crypto_fallback_encrypt_endio(struct bio *enc_bio)
 	bio_endio(src_bio);
 }
 
-static struct bio *blk_crypto_clone_bio(struct bio *bio_src)
+static struct bio *blk_crypto_fallback_clone_bio(struct bio *bio_src)
 {
 	struct bvec_iter iter;
 	struct bio_vec bv;
@@ -186,13 +183,14 @@ static struct bio *blk_crypto_clone_bio(struct bio *bio_src)
 	return bio;
 }
 
-static bool blk_crypto_alloc_cipher_req(struct blk_ksm_keyslot *slot,
-					struct skcipher_request **ciph_req_ret,
-					struct crypto_wait *wait)
+static bool
+blk_crypto_fallback_alloc_cipher_req(struct blk_crypto_keyslot *slot,
+				     struct skcipher_request **ciph_req_ret,
+				     struct crypto_wait *wait)
 {
 	struct skcipher_request *ciph_req;
-	const struct blk_crypto_keyslot *slotp;
-	int keyslot_idx = blk_ksm_get_slot_idx(slot);
+	const struct blk_crypto_fallback_keyslot *slotp;
+	int keyslot_idx = blk_crypto_keyslot_index(slot);
 
 	slotp = &blk_crypto_keyslots[keyslot_idx];
 	ciph_req = skcipher_request_alloc(slotp->tfms[slotp->crypto_mode],
@@ -209,7 +207,7 @@ static bool blk_crypto_alloc_cipher_req(struct blk_ksm_keyslot *slot,
 	return true;
 }
 
-static bool blk_crypto_split_bio_if_needed(struct bio **bio_ptr)
+static bool blk_crypto_fallback_split_bio_if_needed(struct bio **bio_ptr)
 {
 	struct bio *bio = *bio_ptr;
 	unsigned int i = 0;
@@ -264,7 +262,7 @@ static bool blk_crypto_fallback_encrypt_bio(struct bio **bio_ptr)
 {
 	struct bio *src_bio, *enc_bio;
 	struct bio_crypt_ctx *bc;
-	struct blk_ksm_keyslot *slot;
+	struct blk_crypto_keyslot *slot;
 	int data_unit_size;
 	struct skcipher_request *ciph_req = NULL;
 	DECLARE_CRYPTO_WAIT(wait);
@@ -276,7 +274,7 @@ static bool blk_crypto_fallback_encrypt_bio(struct bio **bio_ptr)
 	blk_status_t blk_st;
 
 	/* Split the bio if it's too big for single page bvec */
-	if (!blk_crypto_split_bio_if_needed(bio_ptr))
+	if (!blk_crypto_fallback_split_bio_if_needed(bio_ptr))
 		return false;
 
 	src_bio = *bio_ptr;
@@ -284,24 +282,25 @@ static bool blk_crypto_fallback_encrypt_bio(struct bio **bio_ptr)
 	data_unit_size = bc->bc_key->crypto_cfg.data_unit_size;
 
 	/* Allocate bounce bio for encryption */
-	enc_bio = blk_crypto_clone_bio(src_bio);
+	enc_bio = blk_crypto_fallback_clone_bio(src_bio);
 	if (!enc_bio) {
 		src_bio->bi_status = BLK_STS_RESOURCE;
 		return false;
 	}
 
 	/*
-	 * Use the crypto API fallback keyslot manager to get a crypto_skcipher
-	 * for the algorithm and key specified for this bio.
+	 * Get a blk-crypto-fallback keyslot that contains a crypto_skcipher for
+	 * this bio's algorithm and key.
 	 */
-	blk_st = blk_ksm_get_slot_for_key(&blk_crypto_ksm, bc->bc_key, &slot);
+	blk_st = blk_crypto_get_keyslot(&blk_crypto_fallback_profile,
+					bc->bc_key, &slot);
 	if (blk_st != BLK_STS_OK) {
 		src_bio->bi_status = blk_st;
 		goto out_put_enc_bio;
 	}
 
 	/* and then allocate an skcipher_request for it */
-	if (!blk_crypto_alloc_cipher_req(slot, &ciph_req, &wait)) {
+	if (!blk_crypto_fallback_alloc_cipher_req(slot, &ciph_req, &wait)) {
 		src_bio->bi_status = BLK_STS_RESOURCE;
 		goto out_release_keyslot;
 	}
@@ -362,7 +361,7 @@ out_free_bounce_pages:
 out_free_ciph_req:
 	skcipher_request_free(ciph_req);
 out_release_keyslot:
-	blk_ksm_put_slot(slot);
+	blk_crypto_put_keyslot(slot);
 out_put_enc_bio:
 	if (enc_bio)
 		bio_put(enc_bio);
@@ -380,7 +379,7 @@ static void blk_crypto_fallback_decrypt_bio(struct work_struct *work)
 		container_of(work, struct bio_fallback_crypt_ctx, work);
 	struct bio *bio = f_ctx->bio;
 	struct bio_crypt_ctx *bc = &f_ctx->crypt_ctx;
-	struct blk_ksm_keyslot *slot;
+	struct blk_crypto_keyslot *slot;
 	struct skcipher_request *ciph_req = NULL;
 	DECLARE_CRYPTO_WAIT(wait);
 	u64 curr_dun[BLK_CRYPTO_DUN_ARRAY_SIZE];
@@ -393,17 +392,18 @@ static void blk_crypto_fallback_decrypt_bio(struct work_struct *work)
 	blk_status_t blk_st;
 
 	/*
-	 * Use the crypto API fallback keyslot manager to get a crypto_skcipher
-	 * for the algorithm and key specified for this bio.
+	 * Get a blk-crypto-fallback keyslot that contains a crypto_skcipher for
+	 * this bio's algorithm and key.
 	 */
-	blk_st = blk_ksm_get_slot_for_key(&blk_crypto_ksm, bc->bc_key, &slot);
+	blk_st = blk_crypto_get_keyslot(&blk_crypto_fallback_profile,
+					bc->bc_key, &slot);
 	if (blk_st != BLK_STS_OK) {
 		bio->bi_status = blk_st;
 		goto out_no_keyslot;
 	}
 
 	/* and then allocate an skcipher_request for it */
-	if (!blk_crypto_alloc_cipher_req(slot, &ciph_req, &wait)) {
+	if (!blk_crypto_fallback_alloc_cipher_req(slot, &ciph_req, &wait)) {
 		bio->bi_status = BLK_STS_RESOURCE;
 		goto out;
 	}
@@ -434,7 +434,7 @@ static void blk_crypto_fallback_decrypt_bio(struct work_struct *work)
 
 out:
 	skcipher_request_free(ciph_req);
-	blk_ksm_put_slot(slot);
+	blk_crypto_put_keyslot(slot);
 out_no_keyslot:
 	mempool_free(f_ctx, bio_fallback_crypt_ctx_pool);
 	bio_endio(bio);
@@ -473,9 +473,9 @@ static void blk_crypto_fallback_decrypt_endio(struct bio *bio)
  * @bio_ptr: pointer to the bio to prepare
  *
  * If bio is doing a WRITE operation, this splits the bio into two parts if it's
- * too big (see blk_crypto_split_bio_if_needed). It then allocates a bounce bio
- * for the first part, encrypts it, and update bio_ptr to point to the bounce
- * bio.
+ * too big (see blk_crypto_fallback_split_bio_if_needed()). It then allocates a
+ * bounce bio for the first part, encrypts it, and updates bio_ptr to point to
+ * the bounce bio.
  *
  * For a READ operation, we mark the bio for decryption by using bi_private and
  * bi_end_io.
@@ -499,8 +499,8 @@ bool blk_crypto_fallback_bio_prep(struct bio **bio_ptr)
 		return false;
 	}
 
-	if (!blk_ksm_crypto_cfg_supported(&blk_crypto_ksm,
-					  &bc->bc_key->crypto_cfg)) {
+	if (!__blk_crypto_cfg_supported(&blk_crypto_fallback_profile,
+					&bc->bc_key->crypto_cfg)) {
 		bio->bi_status = BLK_STS_NOTSUPP;
 		return false;
 	}
@@ -526,7 +526,7 @@ bool blk_crypto_fallback_bio_prep(struct bio **bio_ptr)
 
 int blk_crypto_fallback_evict_key(const struct blk_crypto_key *key)
 {
-	return blk_ksm_evict_key(&blk_crypto_ksm, key);
+	return __blk_crypto_evict_key(&blk_crypto_fallback_profile, key);
 }
 
 static bool blk_crypto_fallback_inited;
@@ -534,6 +534,7 @@ static int blk_crypto_fallback_init(void)
 {
 	int i;
 	int err;
+	struct blk_crypto_profile *profile = &blk_crypto_fallback_profile;
 
 	if (blk_crypto_fallback_inited)
 		return 0;
@@ -544,24 +545,24 @@ static int blk_crypto_fallback_init(void)
 	if (err)
 		goto out;
 
-	err = blk_ksm_init(&blk_crypto_ksm, blk_crypto_num_keyslots);
+	err = blk_crypto_profile_init(profile, blk_crypto_num_keyslots);
 	if (err)
 		goto fail_free_bioset;
 	err = -ENOMEM;
 
-	blk_crypto_ksm.ksm_ll_ops = blk_crypto_ksm_ll_ops;
-	blk_crypto_ksm.max_dun_bytes_supported = BLK_CRYPTO_MAX_IV_SIZE;
+	profile->ll_ops = blk_crypto_fallback_ll_ops;
+	profile->max_dun_bytes_supported = BLK_CRYPTO_MAX_IV_SIZE;
 
 	/* All blk-crypto modes have a crypto API fallback. */
 	for (i = 0; i < BLK_ENCRYPTION_MODE_MAX; i++)
-		blk_crypto_ksm.crypto_modes_supported[i] = 0xFFFFFFFF;
-	blk_crypto_ksm.crypto_modes_supported[BLK_ENCRYPTION_MODE_INVALID] = 0;
+		profile->modes_supported[i] = 0xFFFFFFFF;
+	profile->modes_supported[BLK_ENCRYPTION_MODE_INVALID] = 0;
 
 	blk_crypto_wq = alloc_workqueue("blk_crypto_wq",
 					WQ_UNBOUND | WQ_HIGHPRI |
 					WQ_MEM_RECLAIM, num_online_cpus());
 	if (!blk_crypto_wq)
-		goto fail_free_ksm;
+		goto fail_destroy_profile;
 
 	blk_crypto_keyslots = kcalloc(blk_crypto_num_keyslots,
 				      sizeof(blk_crypto_keyslots[0]),
@@ -595,8 +596,8 @@ fail_free_keyslots:
 	kfree(blk_crypto_keyslots);
 fail_free_wq:
 	destroy_workqueue(blk_crypto_wq);
-fail_free_ksm:
-	blk_ksm_destroy(&blk_crypto_ksm);
+fail_destroy_profile:
+	blk_crypto_profile_destroy(profile);
 fail_free_bioset:
 	bioset_exit(&crypto_bio_split);
 out:
@@ -610,7 +611,7 @@ out:
 int blk_crypto_fallback_start_using_mode(enum blk_crypto_mode_num mode_num)
 {
 	const char *cipher_str = blk_crypto_modes[mode_num].cipher_str;
-	struct blk_crypto_keyslot *slotp;
+	struct blk_crypto_fallback_keyslot *slotp;
 	unsigned int i;
 	int err = 0;
 
@ -7,7 +7,7 @@
|
||||
#define __LINUX_BLK_CRYPTO_INTERNAL_H
|
||||
|
||||
#include <linux/bio.h>
|
||||
#include <linux/blkdev.h>
|
||||
#include <linux/blk-mq.h>
|
||||
|
||||
/* Represents a crypto mode supported by blk-crypto */
|
||||
struct blk_crypto_mode {
|
||||
|
565
block/blk-crypto-profile.c
Normal file
@ -0,0 +1,565 @@
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
/*
|
||||
* Copyright 2019 Google LLC
|
||||
*/
|
||||
|
||||
/**
|
||||
* DOC: blk-crypto profiles
|
||||
*
|
||||
* 'struct blk_crypto_profile' contains all generic inline encryption-related
|
||||
* state for a particular inline encryption device. blk_crypto_profile serves
|
||||
* as the way that drivers for inline encryption hardware expose their crypto
|
||||
* capabilities and certain functions (e.g., functions to program and evict
|
||||
* keys) to upper layers. Device drivers that want to support inline encryption
|
||||
* construct a crypto profile, then associate it with the disk's request_queue.
|
||||
*
|
||||
* If the device has keyslots, then its blk_crypto_profile also handles managing
|
||||
* these keyslots in a device-independent way, using the driver-provided
|
||||
* functions to program and evict keys as needed. This includes keeping track
|
||||
* of which key and how many I/O requests are using each keyslot, getting
|
||||
* keyslots for I/O requests, and handling key eviction requests.
|
||||
*
|
||||
* For more information, see Documentation/block/inline-encryption.rst.
|
||||
*/
|
||||
|
||||
#define pr_fmt(fmt) "blk-crypto: " fmt
|
||||
|
||||
#include <linux/blk-crypto-profile.h>
|
||||
#include <linux/device.h>
|
||||
#include <linux/atomic.h>
|
||||
#include <linux/mutex.h>
|
||||
#include <linux/pm_runtime.h>
|
||||
#include <linux/wait.h>
|
||||
#include <linux/blkdev.h>
|
||||
#include <linux/blk-integrity.h>
|
||||
|
||||
struct blk_crypto_keyslot {
|
||||
atomic_t slot_refs;
|
||||
struct list_head idle_slot_node;
|
||||
struct hlist_node hash_node;
|
||||
const struct blk_crypto_key *key;
|
||||
struct blk_crypto_profile *profile;
|
||||
};
|
||||
|
||||
static inline void blk_crypto_hw_enter(struct blk_crypto_profile *profile)
|
||||
{
|
||||
/*
|
||||
* Calling into the driver requires profile->lock held and the device
|
||||
* resumed. But we must resume the device first, since that can acquire
|
||||
* and release profile->lock via blk_crypto_reprogram_all_keys().
|
||||
*/
|
||||
if (profile->dev)
|
||||
pm_runtime_get_sync(profile->dev);
|
||||
down_write(&profile->lock);
|
||||
}
|
||||
|
||||
static inline void blk_crypto_hw_exit(struct blk_crypto_profile *profile)
|
||||
{
|
||||
up_write(&profile->lock);
|
||||
if (profile->dev)
|
||||
pm_runtime_put_sync(profile->dev);
|
||||
}
|
||||
|
||||
/**
|
||||
* blk_crypto_profile_init() - Initialize a blk_crypto_profile
|
||||
* @profile: the blk_crypto_profile to initialize
|
||||
* @num_slots: the number of keyslots
|
||||
*
|
||||
* Storage drivers must call this when starting to set up a blk_crypto_profile,
|
||||
* before filling in additional fields.
|
||||
*
|
||||
* Return: 0 on success, or else a negative error code.
|
||||
*/
|
||||
int blk_crypto_profile_init(struct blk_crypto_profile *profile,
|
||||
unsigned int num_slots)
|
||||
{
|
||||
unsigned int slot;
|
||||
unsigned int i;
|
||||
unsigned int slot_hashtable_size;
|
||||
|
||||
memset(profile, 0, sizeof(*profile));
|
||||
init_rwsem(&profile->lock);
|
||||
|
||||
if (num_slots == 0)
|
||||
return 0;
|
||||
|
||||
/* Initialize keyslot management data. */
|
||||
|
||||
profile->slots = kvcalloc(num_slots, sizeof(profile->slots[0]),
|
||||
GFP_KERNEL);
|
||||
if (!profile->slots)
|
||||
return -ENOMEM;
|
||||
|
||||
profile->num_slots = num_slots;
|
||||
|
||||
init_waitqueue_head(&profile->idle_slots_wait_queue);
|
||||
INIT_LIST_HEAD(&profile->idle_slots);
|
||||
|
||||
for (slot = 0; slot < num_slots; slot++) {
|
||||
profile->slots[slot].profile = profile;
|
||||
list_add_tail(&profile->slots[slot].idle_slot_node,
|
||||
&profile->idle_slots);
|
||||
}
|
||||
|
||||
spin_lock_init(&profile->idle_slots_lock);
|
||||
|
||||
slot_hashtable_size = roundup_pow_of_two(num_slots);
|
||||
/*
|
||||
* hash_ptr() assumes bits != 0, so ensure the hash table has at least 2
|
||||
* buckets. This only makes a difference when there is only 1 keyslot.
|
||||
*/
|
||||
if (slot_hashtable_size < 2)
|
||||
slot_hashtable_size = 2;
|
||||
|
||||
profile->log_slot_ht_size = ilog2(slot_hashtable_size);
|
||||
profile->slot_hashtable =
|
||||
kvmalloc_array(slot_hashtable_size,
|
||||
sizeof(profile->slot_hashtable[0]), GFP_KERNEL);
|
||||
if (!profile->slot_hashtable)
|
||||
goto err_destroy;
|
||||
for (i = 0; i < slot_hashtable_size; i++)
|
||||
INIT_HLIST_HEAD(&profile->slot_hashtable[i]);
|
||||
|
||||
return 0;
|
||||
|
||||
err_destroy:
|
||||
blk_crypto_profile_destroy(profile);
|
||||
return -ENOMEM;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_crypto_profile_init);
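The hash table sizing rule in blk_crypto_profile_init() can be tried out in user space. The following is a minimal sketch, not the kernel code: `round_up_pow2()`, `log2_floor()`, and `slot_ht_size()` are hypothetical helpers standing in for the kernel's roundup_pow_of_two() and ilog2(), and the sketch assumes `num_slots` is between 1 and 2^31.

```c
/*
 * Hypothetical stand-in for roundup_pow_of_two().
 * Assumes 0 < n <= 2^31 so the shift cannot overflow.
 */
static unsigned int round_up_pow2(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Hypothetical stand-in for ilog2(): floor(log2(n)) for n > 0. */
static unsigned int log2_floor(unsigned int n)
{
	unsigned int bits = 0;

	while (n >>= 1)
		bits++;
	return bits;
}

/*
 * Mirrors the sizing rule above: the bucket count is a power of two
 * and never smaller than 2, so the hash always uses at least 1 bit.
 */
static unsigned int slot_ht_size(unsigned int num_slots)
{
	unsigned int size = round_up_pow2(num_slots);

	if (size < 2)
		size = 2;
	return size;
}
```

With a single keyslot the table still gets 2 buckets, which is the only case where the minimum changes the result.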
|
||||
|
||||
static void blk_crypto_profile_destroy_callback(void *profile)
|
||||
{
|
||||
blk_crypto_profile_destroy(profile);
|
||||
}
|
||||
|
||||
/**
|
||||
* devm_blk_crypto_profile_init() - Resource-managed blk_crypto_profile_init()
|
||||
* @dev: the device which owns the blk_crypto_profile
|
||||
* @profile: the blk_crypto_profile to initialize
|
||||
* @num_slots: the number of keyslots
|
||||
*
|
||||
* Like blk_crypto_profile_init(), but causes blk_crypto_profile_destroy() to be
|
||||
* called automatically on driver detach.
|
||||
*
|
||||
* Return: 0 on success, or else a negative error code.
|
||||
*/
|
||||
int devm_blk_crypto_profile_init(struct device *dev,
|
||||
struct blk_crypto_profile *profile,
|
||||
unsigned int num_slots)
|
||||
{
|
||||
int err = blk_crypto_profile_init(profile, num_slots);
|
||||
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
return devm_add_action_or_reset(dev,
|
||||
blk_crypto_profile_destroy_callback,
|
||||
profile);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(devm_blk_crypto_profile_init);
|
||||
|
||||
static inline struct hlist_head *
|
||||
blk_crypto_hash_bucket_for_key(struct blk_crypto_profile *profile,
|
||||
const struct blk_crypto_key *key)
|
||||
{
|
||||
return &profile->slot_hashtable[
|
||||
hash_ptr(key, profile->log_slot_ht_size)];
|
||||
}
|
||||
|
||||
static void
|
||||
blk_crypto_remove_slot_from_lru_list(struct blk_crypto_keyslot *slot)
|
||||
{
|
||||
struct blk_crypto_profile *profile = slot->profile;
|
||||
unsigned long flags;
|
||||
|
||||
spin_lock_irqsave(&profile->idle_slots_lock, flags);
|
||||
list_del(&slot->idle_slot_node);
|
||||
spin_unlock_irqrestore(&profile->idle_slots_lock, flags);
|
||||
}
|
||||
|
||||
static struct blk_crypto_keyslot *
|
||||
blk_crypto_find_keyslot(struct blk_crypto_profile *profile,
|
||||
const struct blk_crypto_key *key)
|
||||
{
|
||||
const struct hlist_head *head =
|
||||
blk_crypto_hash_bucket_for_key(profile, key);
|
||||
struct blk_crypto_keyslot *slotp;
|
||||
|
||||
hlist_for_each_entry(slotp, head, hash_node) {
|
||||
if (slotp->key == key)
|
||||
return slotp;
|
||||
}
|
||||
return NULL;
|
||||
}
|
||||
|
||||
static struct blk_crypto_keyslot *
|
||||
blk_crypto_find_and_grab_keyslot(struct blk_crypto_profile *profile,
|
||||
const struct blk_crypto_key *key)
|
||||
{
|
||||
struct blk_crypto_keyslot *slot;
|
||||
|
||||
slot = blk_crypto_find_keyslot(profile, key);
|
||||
if (!slot)
|
||||
return NULL;
|
||||
if (atomic_inc_return(&slot->slot_refs) == 1) {
|
||||
/* Took first reference to this slot; remove it from LRU list */
|
||||
blk_crypto_remove_slot_from_lru_list(slot);
|
||||
}
|
||||
return slot;
|
||||
}
|
||||
|
||||
/**
|
||||
* blk_crypto_keyslot_index() - Get the index of a keyslot
|
||||
* @slot: a keyslot that blk_crypto_get_keyslot() returned
|
||||
*
|
||||
* Return: the 0-based index of the keyslot within the device's keyslots.
|
||||
*/
|
||||
unsigned int blk_crypto_keyslot_index(struct blk_crypto_keyslot *slot)
|
||||
{
|
||||
return slot - slot->profile->slots;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_crypto_keyslot_index);
|
||||
|
||||
/**
|
||||
* blk_crypto_get_keyslot() - Get a keyslot for a key, if needed.
|
||||
* @profile: the crypto profile of the device the key will be used on
|
||||
* @key: the key that will be used
|
||||
* @slot_ptr: If a keyslot is allocated, an opaque pointer to the keyslot struct
|
||||
* will be stored here; otherwise NULL will be stored here.
|
||||
*
|
||||
* If the device has keyslots, this gets a keyslot that's been programmed with
|
||||
* the specified key. If the key is already in a slot, this reuses it;
|
||||
* otherwise this waits for a slot to become idle and programs the key into it.
|
||||
*
|
||||
* This must be paired with a call to blk_crypto_put_keyslot().
|
||||
*
|
||||
* Context: Process context. Takes and releases profile->lock.
|
||||
* Return: BLK_STS_OK on success, meaning that either a keyslot was allocated or
|
||||
* one wasn't needed; or a blk_status_t error on failure.
|
||||
*/
|
||||
blk_status_t blk_crypto_get_keyslot(struct blk_crypto_profile *profile,
|
||||
const struct blk_crypto_key *key,
|
||||
struct blk_crypto_keyslot **slot_ptr)
|
||||
{
|
||||
struct blk_crypto_keyslot *slot;
|
||||
int slot_idx;
|
||||
int err;
|
||||
|
||||
*slot_ptr = NULL;
|
||||
|
||||
/*
|
||||
* If the device has no concept of "keyslots", then there is no need to
|
||||
* get one.
|
||||
*/
|
||||
if (profile->num_slots == 0)
|
||||
return BLK_STS_OK;
|
||||
|
||||
down_read(&profile->lock);
|
||||
slot = blk_crypto_find_and_grab_keyslot(profile, key);
|
||||
up_read(&profile->lock);
|
||||
if (slot)
|
||||
goto success;
|
||||
|
||||
for (;;) {
|
||||
blk_crypto_hw_enter(profile);
|
||||
slot = blk_crypto_find_and_grab_keyslot(profile, key);
|
||||
if (slot) {
|
||||
blk_crypto_hw_exit(profile);
|
||||
goto success;
|
||||
}
|
||||
|
||||
/*
|
||||
* If we're here, that means there wasn't a slot that was
|
||||
* already programmed with the key. So try to program it.
|
||||
*/
|
||||
if (!list_empty(&profile->idle_slots))
|
||||
break;
|
||||
|
||||
blk_crypto_hw_exit(profile);
|
||||
wait_event(profile->idle_slots_wait_queue,
|
||||
!list_empty(&profile->idle_slots));
|
||||
}
|
||||
|
||||
slot = list_first_entry(&profile->idle_slots, struct blk_crypto_keyslot,
|
||||
idle_slot_node);
|
||||
slot_idx = blk_crypto_keyslot_index(slot);
|
||||
|
||||
err = profile->ll_ops.keyslot_program(profile, key, slot_idx);
|
||||
if (err) {
|
||||
wake_up(&profile->idle_slots_wait_queue);
|
||||
blk_crypto_hw_exit(profile);
|
||||
return errno_to_blk_status(err);
|
||||
}
|
||||
|
||||
/* Move this slot to the hash list for the new key. */
|
||||
if (slot->key)
|
||||
hlist_del(&slot->hash_node);
|
||||
slot->key = key;
|
||||
hlist_add_head(&slot->hash_node,
|
||||
blk_crypto_hash_bucket_for_key(profile, key));
|
||||
|
||||
atomic_set(&slot->slot_refs, 1);
|
||||
|
||||
blk_crypto_remove_slot_from_lru_list(slot);
|
||||
|
||||
blk_crypto_hw_exit(profile);
|
||||
success:
|
||||
*slot_ptr = slot;
|
||||
return BLK_STS_OK;
|
||||
}
|
||||
|
||||
/**
|
||||
* blk_crypto_put_keyslot() - Release a reference to a keyslot
|
||||
* @slot: The keyslot to release the reference of (may be NULL).
|
||||
*
|
||||
* Context: Any context.
|
||||
*/
|
||||
void blk_crypto_put_keyslot(struct blk_crypto_keyslot *slot)
|
||||
{
|
||||
struct blk_crypto_profile *profile;
|
||||
unsigned long flags;
|
||||
|
||||
if (!slot)
|
||||
return;
|
||||
|
||||
profile = slot->profile;
|
||||
|
||||
if (atomic_dec_and_lock_irqsave(&slot->slot_refs,
|
||||
&profile->idle_slots_lock, flags)) {
|
||||
list_add_tail(&slot->idle_slot_node, &profile->idle_slots);
|
||||
spin_unlock_irqrestore(&profile->idle_slots_lock, flags);
|
||||
wake_up(&profile->idle_slots_wait_queue);
|
||||
}
|
||||
}
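The get/put pairing above can be modelled in user space to show the reuse and least-recently-used replacement policy. This is a simplified, single-threaded sketch under stated assumptions: `struct slot`, `get_slot()`, `put_slot()`, and the `tick` counter are hypothetical names, there is no locking or hardware programming, keys are assumed non-NULL, and a full table returns NULL where the kernel code would sleep on the wait queue.

```c
#include <stddef.h>

#define NUM_SLOTS 2	/* hypothetical keyslot count */

/* Hypothetical model of a keyslot: no locking, no hardware. */
struct slot {
	const void *key;		/* key currently programmed, or NULL */
	unsigned int refs;		/* number of in-flight users */
	unsigned long idle_since;	/* tick at which refs last hit zero */
};

static struct slot slots[NUM_SLOTS];
static unsigned long tick;

/*
 * Return a slot programmed with @key (assumed non-NULL), or NULL if
 * every slot is in use.
 */
static struct slot *get_slot(const void *key)
{
	struct slot *victim = NULL;
	size_t i;

	/* Fast path: the key is already programmed into some slot. */
	for (i = 0; i < NUM_SLOTS; i++) {
		if (slots[i].key == key) {
			slots[i].refs++;
			return &slots[i];
		}
	}

	/* Slow path: reprogram the slot that has been idle the longest. */
	for (i = 0; i < NUM_SLOTS; i++) {
		if (slots[i].refs == 0 &&
		    (!victim || slots[i].idle_since < victim->idle_since))
			victim = &slots[i];
	}
	if (!victim)
		return NULL;

	victim->key = key;
	victim->refs = 1;
	return victim;
}

/* Drop one reference; at zero the slot becomes idle and evictable. */
static void put_slot(struct slot *slot)
{
	if (slot && --slot->refs == 0)
		slot->idle_since = ++tick;
}
```

An idle slot keeps its key until it is chosen as a victim, so a later request for the same key can reuse it without reprogramming.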
|
||||
|
||||
/**
|
||||
* __blk_crypto_cfg_supported() - Check whether the given crypto profile
|
||||
* supports the given crypto configuration.
|
||||
* @profile: the crypto profile to check
|
||||
* @cfg: the crypto configuration to check for
|
||||
*
|
||||
* Return: %true if @profile supports the given @cfg.
|
||||
*/
|
||||
bool __blk_crypto_cfg_supported(struct blk_crypto_profile *profile,
|
||||
const struct blk_crypto_config *cfg)
|
||||
{
|
||||
if (!profile)
|
||||
return false;
|
||||
if (!(profile->modes_supported[cfg->crypto_mode] & cfg->data_unit_size))
|
||||
return false;
|
||||
if (profile->max_dun_bytes_supported < cfg->dun_bytes)
|
||||
return false;
|
||||
return true;
|
||||
}
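The check above relies on each modes_supported[] entry being a bitmask of supported data unit sizes, which works because data unit sizes are powers of two. A small user-space sketch of the same test follows; `struct profile_caps`, `struct crypto_cfg`, `cfg_supported()`, and `SKETCH_MODE_MAX` are hypothetical stand-ins, not the kernel definitions.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MODE_MAX 4	/* hypothetical number of crypto modes */

/* Hypothetical subset of the capability fields of a crypto profile. */
struct profile_caps {
	/* Per mode: bitmask of supported data unit sizes (powers of two). */
	unsigned int modes_supported[SKETCH_MODE_MAX];
	unsigned int max_dun_bytes_supported;
};

/* Hypothetical stand-in for struct blk_crypto_config. */
struct crypto_cfg {
	unsigned int crypto_mode;
	unsigned int data_unit_size;	/* power of two, in bytes */
	unsigned int dun_bytes;
};

/* Same three checks as above: profile present, size bit set, DUN fits. */
static bool cfg_supported(const struct profile_caps *caps,
			  const struct crypto_cfg *cfg)
{
	if (!caps)
		return false;
	if (!(caps->modes_supported[cfg->crypto_mode] & cfg->data_unit_size))
		return false;
	if (caps->max_dun_bytes_supported < cfg->dun_bytes)
		return false;
	return true;
}
```

For example, a mode entry of `512 | 4096` accepts 4096-byte data units but rejects 1024-byte ones.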
|
||||
|
||||
/**
|
||||
* __blk_crypto_evict_key() - Evict a key from a device.
|
||||
* @profile: the crypto profile of the device
|
||||
* @key: the key to evict. It must not still be used in any I/O.
|
||||
*
|
||||
* If the device has keyslots, this finds the keyslot (if any) that contains the
|
||||
* specified key and calls the driver's keyslot_evict function to evict it.
|
||||
*
|
||||
* Otherwise, this just calls the driver's keyslot_evict function if it is
|
||||
* implemented, passing just the key (without any particular keyslot). This
|
||||
* allows layered devices to evict the key from their underlying devices.
|
||||
*
|
||||
* Context: Process context. Takes and releases profile->lock.
|
||||
* Return: 0 on success or if there's no keyslot with the specified key, -EBUSY
|
||||
* if the keyslot is still in use, or another -errno value on other
|
||||
* error.
|
||||
*/
|
||||
int __blk_crypto_evict_key(struct blk_crypto_profile *profile,
|
||||
const struct blk_crypto_key *key)
|
||||
{
|
||||
struct blk_crypto_keyslot *slot;
|
||||
int err = 0;
|
||||
|
||||
if (profile->num_slots == 0) {
|
||||
if (profile->ll_ops.keyslot_evict) {
|
||||
blk_crypto_hw_enter(profile);
|
||||
err = profile->ll_ops.keyslot_evict(profile, key, -1);
|
||||
blk_crypto_hw_exit(profile);
|
||||
return err;
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
blk_crypto_hw_enter(profile);
|
||||
slot = blk_crypto_find_keyslot(profile, key);
|
||||
if (!slot)
|
||||
goto out_unlock;
|
||||
|
||||
if (WARN_ON_ONCE(atomic_read(&slot->slot_refs) != 0)) {
|
||||
err = -EBUSY;
|
||||
goto out_unlock;
|
||||
}
|
||||
err = profile->ll_ops.keyslot_evict(profile, key,
|
||||
blk_crypto_keyslot_index(slot));
|
||||
if (err)
|
||||
goto out_unlock;
|
||||
|
||||
hlist_del(&slot->hash_node);
|
||||
slot->key = NULL;
|
||||
err = 0;
|
||||
out_unlock:
|
||||
blk_crypto_hw_exit(profile);
|
||||
return err;
|
||||
}
|
||||
|
||||
/**
|
||||
* blk_crypto_reprogram_all_keys() - Re-program all keyslots.
|
||||
* @profile: The crypto profile
|
||||
*
|
||||
* Re-program all keyslots that are supposed to have a key programmed. This is
|
||||
* intended only for use by drivers for hardware that loses its keys on reset.
|
||||
*
|
||||
* Context: Process context. Takes and releases profile->lock.
|
||||
*/
|
||||
void blk_crypto_reprogram_all_keys(struct blk_crypto_profile *profile)
|
||||
{
|
||||
unsigned int slot;
|
||||
|
||||
if (profile->num_slots == 0)
|
||||
return;
|
||||
|
||||
/* This is for device initialization, so don't resume the device */
|
||||
down_write(&profile->lock);
|
||||
for (slot = 0; slot < profile->num_slots; slot++) {
|
||||
const struct blk_crypto_key *key = profile->slots[slot].key;
|
||||
int err;
|
||||
|
||||
if (!key)
|
||||
continue;
|
||||
|
||||
err = profile->ll_ops.keyslot_program(profile, key, slot);
|
||||
WARN_ON(err);
|
||||
}
|
||||
up_write(&profile->lock);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_crypto_reprogram_all_keys);
|
||||
|
||||
void blk_crypto_profile_destroy(struct blk_crypto_profile *profile)
|
||||
{
|
||||
if (!profile)
|
||||
return;
|
||||
kvfree(profile->slot_hashtable);
|
||||
kvfree_sensitive(profile->slots,
|
||||
sizeof(profile->slots[0]) * profile->num_slots);
|
||||
memzero_explicit(profile, sizeof(*profile));
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_crypto_profile_destroy);
|
||||
|
||||
bool blk_crypto_register(struct blk_crypto_profile *profile,
|
||||
struct request_queue *q)
|
||||
{
|
||||
if (blk_integrity_queue_supports_integrity(q)) {
|
||||
pr_warn("Integrity and hardware inline encryption are not supported together. Disabling hardware inline encryption.\n");
|
||||
return false;
|
||||
}
|
||||
q->crypto_profile = profile;
|
||||
return true;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_crypto_register);
|
||||
|
||||
void blk_crypto_unregister(struct request_queue *q)
|
||||
{
|
||||
q->crypto_profile = NULL;
|
||||
}
|
||||
|
||||
/**
|
||||
* blk_crypto_intersect_capabilities() - restrict supported crypto capabilities
|
||||
* by child device
|
||||
* @parent: the crypto profile for the parent device
|
||||
* @child: the crypto profile for the child device, or NULL
|
||||
*
|
||||
* This clears all crypto capabilities in @parent that aren't set in @child. If
|
||||
* @child is NULL, then this clears all parent capabilities.
|
||||
*
|
||||
* Only use this when setting up the crypto profile for a layered device, before
|
||||
* it's been exposed yet.
|
||||
*/
|
||||
void blk_crypto_intersect_capabilities(struct blk_crypto_profile *parent,
|
||||
const struct blk_crypto_profile *child)
|
||||
{
|
||||
if (child) {
|
||||
unsigned int i;
|
||||
|
||||
parent->max_dun_bytes_supported =
|
||||
min(parent->max_dun_bytes_supported,
|
||||
child->max_dun_bytes_supported);
|
||||
for (i = 0; i < ARRAY_SIZE(child->modes_supported); i++)
|
||||
parent->modes_supported[i] &= child->modes_supported[i];
|
||||
} else {
|
||||
parent->max_dun_bytes_supported = 0;
|
||||
memset(parent->modes_supported, 0,
|
||||
sizeof(parent->modes_supported));
|
||||
}
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_crypto_intersect_capabilities);
|
||||
|
||||
/**
|
||||
* blk_crypto_has_capabilities() - Check whether @target supports at least all
|
||||
* the crypto capabilities that @reference does.
|
||||
* @target: the target profile
|
||||
* @reference: the reference profile
|
||||
*
|
||||
* Return: %true if @target supports all the crypto capabilities of @reference.
|
||||
*/
|
||||
bool blk_crypto_has_capabilities(const struct blk_crypto_profile *target,
|
||||
const struct blk_crypto_profile *reference)
|
||||
{
|
||||
int i;
|
||||
|
||||
if (!reference)
|
||||
return true;
|
||||
|
||||
if (!target)
|
||||
return false;
|
||||
|
||||
for (i = 0; i < ARRAY_SIZE(target->modes_supported); i++) {
|
||||
if (reference->modes_supported[i] & ~target->modes_supported[i])
|
||||
return false;
|
||||
}
|
||||
|
||||
if (reference->max_dun_bytes_supported >
|
||||
target->max_dun_bytes_supported)
|
||||
return false;
|
||||
|
||||
return true;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_crypto_has_capabilities);
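The intersect and subset operations above are plain bitmask arithmetic, which a user-space sketch can make concrete. The names `struct caps`, `caps_intersect()`, `caps_has_all()`, and `CAPS_MODE_MAX` are hypothetical and only mirror the capability fields used above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CAPS_MODE_MAX 4	/* hypothetical number of crypto modes */

/* Hypothetical subset of a crypto profile's capability fields. */
struct caps {
	unsigned int modes[CAPS_MODE_MAX];	/* bitmask per crypto mode */
	unsigned int max_dun_bytes;
};

/* Clear every capability in @parent that @child lacks (NULL: clear all). */
static void caps_intersect(struct caps *parent, const struct caps *child)
{
	size_t i;

	if (!child) {
		memset(parent, 0, sizeof(*parent));
		return;
	}
	if (child->max_dun_bytes < parent->max_dun_bytes)
		parent->max_dun_bytes = child->max_dun_bytes;
	for (i = 0; i < CAPS_MODE_MAX; i++)
		parent->modes[i] &= child->modes[i];
}

/* True if @target supports at least everything @reference does. */
static bool caps_has_all(const struct caps *target,
			 const struct caps *reference)
{
	size_t i;

	if (!reference)
		return true;
	if (!target)
		return false;
	for (i = 0; i < CAPS_MODE_MAX; i++) {
		/* Any bit set in reference but not in target is a gap. */
		if (reference->modes[i] & ~target->modes[i])
			return false;
	}
	return reference->max_dun_bytes <= target->max_dun_bytes;
}
```

After an intersection, the original parent still has all capabilities of the result, but the result generally does not have all capabilities of the original parent.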
|
||||
|
||||
/**
|
||||
* blk_crypto_update_capabilities() - Update the capabilities of a crypto
|
||||
* profile to match those of another crypto
|
||||
* profile.
|
||||
* @dst: The crypto profile whose capabilities to update.
|
||||
* @src: The crypto profile whose capabilities this function will update @dst's
|
||||
* capabilities to.
|
||||
*
|
||||
* Blk-crypto requires that crypto capabilities that were
|
||||
* advertised when a bio was created continue to be supported by the
|
||||
* device until that bio is ended. This in turn means that a device cannot
|
||||
* shrink its advertised crypto capabilities without any explicit
|
||||
* synchronization with upper layers. So if there's no such explicit
|
||||
* synchronization, @src must support all the crypto capabilities that
|
||||
* @dst does (i.e. we need blk_crypto_has_capabilities(@src, @dst)).
|
||||
*
|
||||
* Note also that as long as the crypto capabilities are being expanded, the
|
||||
* order of updates becoming visible is not important because it's alright
|
||||
* for blk-crypto to see stale values - they only cause blk-crypto to
|
||||
* believe that a crypto capability isn't supported when it actually is (which
|
||||
* might result in blk-crypto-fallback being used if available, or the bio being
|
||||
* failed).
|
||||
*/
|
||||
void blk_crypto_update_capabilities(struct blk_crypto_profile *dst,
|
||||
const struct blk_crypto_profile *src)
|
||||
{
|
||||
memcpy(dst->modes_supported, src->modes_supported,
|
||||
sizeof(dst->modes_supported));
|
||||
|
||||
dst->max_dun_bytes_supported = src->max_dun_bytes_supported;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_crypto_update_capabilities);
|
@ -11,7 +11,7 @@
|
||||
|
||||
#include <linux/bio.h>
|
||||
#include <linux/blkdev.h>
|
||||
#include <linux/keyslot-manager.h>
|
||||
#include <linux/blk-crypto-profile.h>
|
||||
#include <linux/module.h>
|
||||
#include <linux/slab.h>
|
||||
|
||||
@ -218,8 +218,9 @@ static bool bio_crypt_check_alignment(struct bio *bio)
|
||||
|
||||
blk_status_t __blk_crypto_init_request(struct request *rq)
|
||||
{
|
||||
return blk_ksm_get_slot_for_key(rq->q->ksm, rq->crypt_ctx->bc_key,
|
||||
&rq->crypt_keyslot);
|
||||
return blk_crypto_get_keyslot(rq->q->crypto_profile,
|
||||
rq->crypt_ctx->bc_key,
|
||||
&rq->crypt_keyslot);
|
||||
}
|
||||
|
||||
/**
|
||||
@ -233,7 +234,7 @@ blk_status_t __blk_crypto_init_request(struct request *rq)
|
||||
*/
|
||||
void __blk_crypto_free_request(struct request *rq)
|
||||
{
|
||||
blk_ksm_put_slot(rq->crypt_keyslot);
|
||||
blk_crypto_put_keyslot(rq->crypt_keyslot);
|
||||
mempool_free(rq->crypt_ctx, bio_crypt_ctx_pool);
|
||||
blk_crypto_rq_set_defaults(rq);
|
||||
}
|
||||
@ -264,6 +265,7 @@ bool __blk_crypto_bio_prep(struct bio **bio_ptr)
|
||||
{
|
||||
struct bio *bio = *bio_ptr;
|
||||
const struct blk_crypto_key *bc_key = bio->bi_crypt_context->bc_key;
|
||||
struct blk_crypto_profile *profile;
|
||||
|
||||
/* Error if bio has no data. */
|
||||
if (WARN_ON_ONCE(!bio_has_data(bio))) {
|
||||
@ -280,8 +282,8 @@ bool __blk_crypto_bio_prep(struct bio **bio_ptr)
|
||||
* Success if device supports the encryption context, or if we succeeded
|
||||
* in falling back to the crypto API.
|
||||
*/
|
||||
if (blk_ksm_crypto_cfg_supported(bio->bi_bdev->bd_disk->queue->ksm,
|
||||
&bc_key->crypto_cfg))
|
||||
profile = bdev_get_queue(bio->bi_bdev)->crypto_profile;
|
||||
if (__blk_crypto_cfg_supported(profile, &bc_key->crypto_cfg))
|
||||
return true;
|
||||
|
||||
if (blk_crypto_fallback_bio_prep(bio_ptr))
|
||||
@ -357,7 +359,7 @@ bool blk_crypto_config_supported(struct request_queue *q,
|
||||
const struct blk_crypto_config *cfg)
|
||||
{
|
||||
return IS_ENABLED(CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK) ||
|
||||
blk_ksm_crypto_cfg_supported(q->ksm, cfg);
|
||||
__blk_crypto_cfg_supported(q->crypto_profile, cfg);
|
||||
}
|
||||
|
||||
/**
|
||||
@ -378,7 +380,7 @@ bool blk_crypto_config_supported(struct request_queue *q,
|
||||
int blk_crypto_start_using_key(const struct blk_crypto_key *key,
|
||||
struct request_queue *q)
|
||||
{
|
||||
if (blk_ksm_crypto_cfg_supported(q->ksm, &key->crypto_cfg))
|
||||
if (__blk_crypto_cfg_supported(q->crypto_profile, &key->crypto_cfg))
|
||||
return 0;
|
||||
return blk_crypto_fallback_start_using_mode(key->crypto_cfg.crypto_mode);
|
||||
}
|
||||
@ -394,18 +396,17 @@ int blk_crypto_start_using_key(const struct blk_crypto_key *key,
|
||||
* evicted from any hardware that it might have been programmed into. The key
|
||||
* must not be in use by any in-flight IO when this function is called.
|
||||
*
|
||||
* Return: 0 on success or if key is not present in the q's ksm, -err on error.
|
||||
* Return: 0 on success or if the key wasn't in any keyslot; -errno on error.
|
||||
*/
|
||||
int blk_crypto_evict_key(struct request_queue *q,
|
||||
const struct blk_crypto_key *key)
|
||||
{
|
||||
if (blk_ksm_crypto_cfg_supported(q->ksm, &key->crypto_cfg))
|
||||
return blk_ksm_evict_key(q->ksm, key);
|
||||
if (__blk_crypto_cfg_supported(q->crypto_profile, &key->crypto_cfg))
|
||||
return __blk_crypto_evict_key(q->crypto_profile, key);
|
||||
|
||||
/*
|
||||
* If the request queue's associated inline encryption hardware didn't
|
||||
* have support for the key, then the key might have been programmed
|
||||
* into the fallback keyslot manager, so try to evict from there.
|
||||
* If the request_queue didn't support the key, then blk-crypto-fallback
|
||||
* may have been used, so try to evict the key from blk-crypto-fallback.
|
||||
*/
|
||||
return blk_crypto_fallback_evict_key(key);
|
||||
}
|
||||
|
@ -65,13 +65,19 @@ EXPORT_SYMBOL_GPL(blk_execute_rq_nowait);
|
||||
|
||||
static bool blk_rq_is_poll(struct request *rq)
|
||||
{
|
||||
return rq->mq_hctx && rq->mq_hctx->type == HCTX_TYPE_POLL;
|
||||
if (!rq->mq_hctx)
|
||||
return false;
|
||||
if (rq->mq_hctx->type != HCTX_TYPE_POLL)
|
||||
return false;
|
||||
if (WARN_ON_ONCE(!rq->bio))
|
||||
return false;
|
||||
return true;
|
||||
}
|
||||
|
||||
static void blk_rq_poll_completion(struct request *rq, struct completion *wait)
|
||||
{
|
||||
do {
|
||||
blk_poll(rq->q, request_to_qc_t(rq->mq_hctx, rq), true);
|
||||
bio_poll(rq->bio, NULL, 0);
|
||||
cond_resched();
|
||||
} while (!completion_done(wait));
|
||||
}
|
||||
|
@ -379,7 +379,7 @@ static void mq_flush_data_end_io(struct request *rq, blk_status_t error)
|
||||
* @rq is being submitted. Analyze what needs to be done and put it on the
|
||||
* right queue.
|
||||
*/
|
||||
void blk_insert_flush(struct request *rq)
|
||||
bool blk_insert_flush(struct request *rq)
|
||||
{
|
||||
struct request_queue *q = rq->q;
|
||||
unsigned long fflags = q->queue_flags; /* may change, cache */
|
||||
@ -409,7 +409,7 @@ void blk_insert_flush(struct request *rq)
|
||||
*/
|
||||
if (!policy) {
|
||||
blk_mq_end_request(rq, 0);
|
||||
return;
|
||||
return true;
|
||||
}
|
||||
|
||||
BUG_ON(rq->bio != rq->biotail); /*assumes zero or single bio rq */
|
||||
@ -420,10 +420,8 @@ void blk_insert_flush(struct request *rq)
|
||||
* for normal execution.
|
||||
*/
|
||||
if ((policy & REQ_FSEQ_DATA) &&
|
||||
!(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
|
||||
blk_mq_request_bypass_insert(rq, false, false);
|
||||
return;
|
||||
}
|
||||
!(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH)))
|
||||
return false;
|
||||
|
||||
/*
|
||||
* @rq should go through flush machinery. Mark it part of flush
|
||||
@ -439,6 +437,8 @@ void blk_insert_flush(struct request *rq)
|
||||
spin_lock_irq(&fq->mq_flush_lock);
|
||||
blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
|
||||
spin_unlock_irq(&fq->mq_flush_lock);
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
/**
|
||||
|
348
block/blk-ia-ranges.c
Normal file
@ -0,0 +1,348 @@
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
/*
|
||||
* Block device concurrent positioning ranges.
|
||||
*
|
||||
* Copyright (C) 2021 Western Digital Corporation or its Affiliates.
|
||||
*/
|
||||
#include <linux/kernel.h>
|
||||
#include <linux/blkdev.h>
|
||||
#include <linux/slab.h>
|
||||
#include <linux/init.h>
|
||||
|
||||
#include "blk.h"
|
||||
|
||||
static ssize_t
|
||||
blk_ia_range_sector_show(struct blk_independent_access_range *iar,
|
||||
char *buf)
|
||||
{
|
||||
return sprintf(buf, "%llu\n", iar->sector);
|
||||
}
|
||||
|
||||
static ssize_t
|
||||
blk_ia_range_nr_sectors_show(struct blk_independent_access_range *iar,
|
||||
char *buf)
|
||||
{
|
||||
return sprintf(buf, "%llu\n", iar->nr_sectors);
|
||||
}
|
||||
|
||||
struct blk_ia_range_sysfs_entry {
|
||||
struct attribute attr;
|
||||
ssize_t (*show)(struct blk_independent_access_range *iar, char *buf);
|
||||
};
|
||||
|
||||
static struct blk_ia_range_sysfs_entry blk_ia_range_sector_entry = {
|
||||
.attr = { .name = "sector", .mode = 0444 },
|
||||
.show = blk_ia_range_sector_show,
|
||||
};
|
||||
|
||||
static struct blk_ia_range_sysfs_entry blk_ia_range_nr_sectors_entry = {
|
||||
.attr = { .name = "nr_sectors", .mode = 0444 },
|
||||
.show = blk_ia_range_nr_sectors_show,
|
||||
};
|
||||
|
||||
static struct attribute *blk_ia_range_attrs[] = {
|
||||
&blk_ia_range_sector_entry.attr,
|
||||
&blk_ia_range_nr_sectors_entry.attr,
|
||||
NULL,
|
||||
};
|
||||
ATTRIBUTE_GROUPS(blk_ia_range);
|
||||
|
||||
static ssize_t blk_ia_range_sysfs_show(struct kobject *kobj,
|
||||
struct attribute *attr, char *buf)
|
||||
{
|
||||
struct blk_ia_range_sysfs_entry *entry =
|
||||
container_of(attr, struct blk_ia_range_sysfs_entry, attr);
|
||||
struct blk_independent_access_range *iar =
|
||||
container_of(kobj, struct blk_independent_access_range, kobj);
|
||||
ssize_t ret;
|
||||
|
||||
mutex_lock(&iar->queue->sysfs_lock);
|
||||
ret = entry->show(iar, buf);
|
||||
mutex_unlock(&iar->queue->sysfs_lock);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
static const struct sysfs_ops blk_ia_range_sysfs_ops = {
|
||||
.show = blk_ia_range_sysfs_show,
|
||||
};
|
||||
|
||||
/*
|
||||
* Independent access range entries are not freed individually, but all together
|
||||
* with struct blk_independent_access_ranges and its array of ranges. Since
|
||||
* kobject_add() takes a reference on the parent kobject contained in
|
||||
* struct blk_independent_access_ranges, the array of independent access range
|
||||
* entries cannot be freed until kobject_del() is called for all entries.
|
||||
* So we do not need to do anything here, but still need this no-op release
|
||||
* operation to avoid complaints from the kobject code.
|
||||
*/
|
||||
static void blk_ia_range_sysfs_nop_release(struct kobject *kobj)
|
||||
{
|
||||
}
|
||||
|
||||
static struct kobj_type blk_ia_range_ktype = {
|
||||
.sysfs_ops = &blk_ia_range_sysfs_ops,
|
||||
.default_groups = blk_ia_range_groups,
|
||||
.release = blk_ia_range_sysfs_nop_release,
|
||||
};
|
||||
|
||||
/*
|
||||
* This will be executed only after all independent access range entries are
|
||||
* removed with kobject_del(), at which point, it is safe to free everything,
|
||||
* including the array of ranges.
|
||||
*/
|
||||
static void blk_ia_ranges_sysfs_release(struct kobject *kobj)
|
||||
{
|
||||
struct blk_independent_access_ranges *iars =
|
||||
container_of(kobj, struct blk_independent_access_ranges, kobj);
|
||||
|
||||
kfree(iars);
|
||||
}
|
||||
|
||||
static struct kobj_type blk_ia_ranges_ktype = {
|
||||
.release = blk_ia_ranges_sysfs_release,
|
||||
};
|
||||
|
||||
/**
|
||||
* disk_register_independent_access_ranges - register with sysfs a set of independent
|
||||
* access ranges
|
||||
* @disk: Target disk
|
||||
* @new_iars: New set of independent access ranges
|
||||
*
|
||||
* Register with sysfs a set of independent access ranges for @disk.
|
||||
* If @new_iars is not NULL, this set of ranges is registered and the old set
|
||||
* specified by q->ia_ranges is unregistered. Otherwise, q->ia_ranges is
|
||||
* registered if it is not already.
|
||||
*/
|
||||
int disk_register_independent_access_ranges(struct gendisk *disk,
|
||||
struct blk_independent_access_ranges *new_iars)
|
||||
{
|
||||
struct request_queue *q = disk->queue;
|
||||
struct blk_independent_access_ranges *iars;
|
||||
int i, ret;
|
||||
|
||||
lockdep_assert_held(&q->sysfs_dir_lock);
|
||||
lockdep_assert_held(&q->sysfs_lock);
|
||||
|
||||
/* If a new range set is specified, unregister the old one */
|
||||
if (new_iars) {
|
||||
if (q->ia_ranges)
|
||||
disk_unregister_independent_access_ranges(disk);
|
||||
q->ia_ranges = new_iars;
|
||||
}
|
||||
|
||||
iars = q->ia_ranges;
|
||||
if (!iars)
|
||||
return 0;
|
||||
|
||||
/*
|
||||
* At this point, iars is the new set of sector access ranges that needs
|
||||
* to be registered with sysfs.
|
||||
*/
|
||||
WARN_ON(iars->sysfs_registered);
|
||||
ret = kobject_init_and_add(&iars->kobj, &blk_ia_ranges_ktype,
|
||||
&q->kobj, "%s", "independent_access_ranges");
|
||||
if (ret) {
|
||||
q->ia_ranges = NULL;
|
||||
kfree(iars);
|
||||
return ret;
|
||||
}
|
||||
|
||||
for (i = 0; i < iars->nr_ia_ranges; i++) {
|
||||
iars->ia_range[i].queue = q;
|
||||
ret = kobject_init_and_add(&iars->ia_range[i].kobj,
|
||||
&blk_ia_range_ktype, &iars->kobj,
|
||||
"%d", i);
|
||||
if (ret) {
|
||||
while (--i >= 0)
|
||||
kobject_del(&iars->ia_range[i].kobj);
|
||||
kobject_del(&iars->kobj);
|
||||
kobject_put(&iars->kobj);
|
||||
return ret;
|
||||
}
|
||||
}
|
||||
|
||||
iars->sysfs_registered = true;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
void disk_unregister_independent_access_ranges(struct gendisk *disk)
|
||||
{
|
||||
struct request_queue *q = disk->queue;
|
||||
struct blk_independent_access_ranges *iars = q->ia_ranges;
|
||||
int i;
|
||||
|
||||
lockdep_assert_held(&q->sysfs_dir_lock);
|
||||
lockdep_assert_held(&q->sysfs_lock);
|
||||
|
||||
if (!iars)
|
||||
return;
|
||||
|
||||
if (iars->sysfs_registered) {
|
||||
for (i = 0; i < iars->nr_ia_ranges; i++)
|
||||
kobject_del(&iars->ia_range[i].kobj);
|
||||
kobject_del(&iars->kobj);
|
||||
kobject_put(&iars->kobj);
|
||||
} else {
|
||||
kfree(iars);
|
||||
}
|
||||
|
||||
q->ia_ranges = NULL;
|
||||
}
|
||||
|
||||
static struct blk_independent_access_range *
|
||||
disk_find_ia_range(struct blk_independent_access_ranges *iars,
|
||||
sector_t sector)
|
||||
{
|
||||
struct blk_independent_access_range *iar;
|
||||
int i;
|
||||
|
||||
for (i = 0; i < iars->nr_ia_ranges; i++) {
|
||||
iar = &iars->ia_range[i];
|
||||
if (sector >= iar->sector &&
|
||||
sector < iar->sector + iar->nr_sectors)
|
||||
return iar;
|
||||
}
|
||||
|
||||
return NULL;
|
||||
}
|
||||
|
||||
static bool disk_check_ia_ranges(struct gendisk *disk,
|
||||
struct blk_independent_access_ranges *iars)
|
||||
{
|
||||
struct blk_independent_access_range *iar, *tmp;
|
||||
sector_t capacity = get_capacity(disk);
|
||||
sector_t sector = 0;
|
||||
int i;
|
||||
|
||||
/*
|
||||
* While sorting the ranges in increasing LBA order, check that the
|
||||
* ranges do not overlap, that there are no sector holes and that all
|
||||
* sectors belong to one range.
|
||||
*/
|
||||
for (i = 0; i < iars->nr_ia_ranges; i++) {
|
||||
tmp = disk_find_ia_range(iars, sector);
|
||||
if (!tmp || tmp->sector != sector) {
|
||||
pr_warn("Invalid non-contiguous independent access ranges\n");
|
||||
return false;
|
||||
}
|
||||
|
||||
iar = &iars->ia_range[i];
|
||||
if (tmp != iar) {
|
||||
swap(iar->sector, tmp->sector);
|
||||
swap(iar->nr_sectors, tmp->nr_sectors);
|
||||
}
|
||||
|
||||
sector += iar->nr_sectors;
|
||||
}
|
||||
|
||||
if (sector != capacity) {
|
||||
pr_warn("Independent access ranges do not match disk capacity\n");
|
||||
return false;
|
||||
}
|
||||
|
||||
return true;
|
||||
}
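The sort-and-validate loop in disk_check_ia_ranges() can be tried outside the kernel. The following is only a minimal userspace sketch under stated assumptions, not the kernel implementation: `struct ia_range`, `find_ia_range()` and `check_ia_ranges()` are hypothetical stand-ins for the kernel structures and helpers, sectors are plain `uint64_t`, the warnings are dropped, and whole entries are swapped rather than individual fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct blk_independent_access_range. */
struct ia_range {
	uint64_t sector;     /* first sector of the range */
	uint64_t nr_sectors; /* number of sectors in the range */
};

/* Return the range that contains @sector, or NULL if no range does. */
struct ia_range *find_ia_range(struct ia_range *r, size_t n, uint64_t sector)
{
	for (size_t i = 0; i < n; i++) {
		if (sector >= r[i].sector &&
		    sector < r[i].sector + r[i].nr_sectors)
			return &r[i];
	}
	return NULL;
}

/*
 * Sort the ranges in increasing sector order while checking that they
 * start at sector 0, do not overlap, leave no holes, and end exactly
 * at @capacity. Returns true if the set is valid.
 */
bool check_ia_ranges(struct ia_range *r, size_t n, uint64_t capacity)
{
	uint64_t sector = 0;

	assert(r != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		/* The next range must begin exactly where the last one ended. */
		struct ia_range *tmp = find_ia_range(r, n, sector);

		if (!tmp || tmp->sector != sector)
			return false; /* hole or overlap */

		/* Move the matching range into slot i (selection-sort step). */
		if (tmp != &r[i]) {
			struct ia_range t = r[i];

			r[i] = *tmp;
			*tmp = t;
		}

		sector += r[i].nr_sectors;
	}

	/* All sectors of the device must belong to some range. */
	return sector == capacity;
}
```

For example, the unsorted set {100, 50}, {0, 100} on a 150-sector device is accepted and reordered in place, while {0, 100}, {50, 100} is rejected because the second range does not start at sector 100.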
|
||||
|
||||
static bool disk_ia_ranges_changed(struct gendisk *disk,
|
||||
struct blk_independent_access_ranges *new)
|
||||
{
|
||||
struct blk_independent_access_ranges *old = disk->queue->ia_ranges;
|
||||
int i;
|
||||
|
||||
if (!old)
|
||||
return true;
|
||||
|
||||
if (old->nr_ia_ranges != new->nr_ia_ranges)
|
||||
return true;
|
||||
|
||||
for (i = 0; i < old->nr_ia_ranges; i++) {
|
||||
if (new->ia_range[i].sector != old->ia_range[i].sector ||
|
||||
new->ia_range[i].nr_sectors != old->ia_range[i].nr_sectors)
|
||||
return true;
|
||||
}
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
/**
|
||||
* disk_alloc_independent_access_ranges - Allocate an independent access ranges
|
||||
* data structure
|
||||
* @disk: target disk
|
||||
* @nr_ia_ranges: Number of independent access ranges
|
||||
*
|
||||
* Allocate a struct blk_independent_access_ranges structure with @nr_ia_ranges
|
||||
* access range descriptors.
|
||||
*/
|
||||
struct blk_independent_access_ranges *
|
||||
disk_alloc_independent_access_ranges(struct gendisk *disk, int nr_ia_ranges)
|
||||
{
|
||||
struct blk_independent_access_ranges *iars;
|
||||
|
||||
iars = kzalloc_node(struct_size(iars, ia_range, nr_ia_ranges),
|
||||
GFP_KERNEL, disk->queue->node);
|
||||
if (iars)
|
||||
iars->nr_ia_ranges = nr_ia_ranges;
|
||||
return iars;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(disk_alloc_independent_access_ranges);
|
||||
|
||||
/**
|
||||
* disk_set_independent_access_ranges - Set a disk independent access ranges
|
||||
* @disk: target disk
|
||||
* @iars: independent access ranges structure
|
||||
*
|
||||
* Set the independent access ranges information of the request queue
|
||||
* of @disk to @iars. If @iars is NULL, the independent access ranges
|
||||
* structure already set is cleared. If there are no differences between
|
||||
* @iars and the independent access ranges structure already set, @iars
|
||||
* is freed.
|
||||
*/
|
||||
void disk_set_independent_access_ranges(struct gendisk *disk,
|
||||
struct blk_independent_access_ranges *iars)
|
||||
{
|
||||
struct request_queue *q = disk->queue;
|
||||
|
||||
if (WARN_ON_ONCE(iars && !iars->nr_ia_ranges)) {
|
||||
kfree(iars);
|
||||
iars = NULL;
|
||||
}
|
||||
|
||||
mutex_lock(&q->sysfs_dir_lock);
|
||||
mutex_lock(&q->sysfs_lock);
|
||||
|
||||
if (iars) {
|
||||
if (!disk_check_ia_ranges(disk, iars)) {
|
||||
kfree(iars);
|
||||
iars = NULL;
|
||||
goto reg;
|
||||
}
|
||||
|
||||
if (!disk_ia_ranges_changed(disk, iars)) {
|
||||
kfree(iars);
|
||||
goto unlock;
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* This may be called for a registered queue, e.g. during a device
|
||||
* revalidation. If that is the case, we need to unregister the old
|
||||
* set of independent access ranges and register the new set. If the
|
||||
* queue is not registered, registration of the device request queue
|
||||
* will register the independent access ranges, so only swap in the
|
||||
* new set and free the old one.
|
||||
*/
|
||||
reg:
|
||||
if (blk_queue_registered(q)) {
|
||||
disk_register_independent_access_ranges(disk, iars);
|
||||
} else {
|
||||
swap(q->ia_ranges, iars);
|
||||
kfree(iars);
|
||||
}
|
||||
|
||||
unlock:
|
||||
mutex_unlock(&q->sysfs_lock);
|
||||
mutex_unlock(&q->sysfs_dir_lock);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(disk_set_independent_access_ranges);
|
@@ -6,7 +6,7 @@
|
||||
* Written by: Martin K. Petersen <martin.petersen@oracle.com>
|
||||
*/
|
||||
|
||||
#include <linux/blkdev.h>
|
||||
#include <linux/blk-integrity.h>
|
||||
#include <linux/backing-dev.h>
|
||||
#include <linux/mempool.h>
|
||||
#include <linux/bio.h>
|
||||
@@ -409,9 +409,9 @@ void blk_integrity_register(struct gendisk *disk, struct blk_integrity *template
|
||||
blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, disk->queue);
|
||||
|
||||
#ifdef CONFIG_BLK_INLINE_ENCRYPTION
|
||||
if (disk->queue->ksm) {
|
||||
if (disk->queue->crypto_profile) {
|
||||
pr_warn("blk-integrity: Integrity and hardware inline encryption are not supported together. Disabling hardware inline encryption.\n");
|
||||
blk_ksm_unregister(disk->queue);
|
||||
blk_crypto_unregister(disk->queue);
|
||||
}
|
||||
#endif
|
||||
}
|
||||
|
@@ -3165,12 +3165,12 @@ static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
|
||||
if (IS_ERR(bdev))
|
||||
return PTR_ERR(bdev);
|
||||
|
||||
ioc = q_to_ioc(bdev->bd_disk->queue);
|
||||
ioc = q_to_ioc(bdev_get_queue(bdev));
|
||||
if (!ioc) {
|
||||
ret = blk_iocost_init(bdev->bd_disk->queue);
|
||||
ret = blk_iocost_init(bdev_get_queue(bdev));
|
||||
if (ret)
|
||||
goto err;
|
||||
ioc = q_to_ioc(bdev->bd_disk->queue);
|
||||
ioc = q_to_ioc(bdev_get_queue(bdev));
|
||||
}
|
||||
|
||||
spin_lock_irq(&ioc->lock);
|
||||
@@ -3332,12 +3332,12 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
|
||||
if (IS_ERR(bdev))
|
||||
return PTR_ERR(bdev);
|
||||
|
||||
ioc = q_to_ioc(bdev->bd_disk->queue);
|
||||
ioc = q_to_ioc(bdev_get_queue(bdev));
|
||||
if (!ioc) {
|
||||
ret = blk_iocost_init(bdev->bd_disk->queue);
|
||||
ret = blk_iocost_init(bdev_get_queue(bdev));
|
||||
if (ret)
|
||||
goto err;
|
||||
ioc = q_to_ioc(bdev->bd_disk->queue);
|
||||
ioc = q_to_ioc(bdev_get_queue(bdev));
|
||||
}
|
||||
|
||||
spin_lock_irq(&ioc->lock);
|
||||
|
@@ -74,6 +74,7 @@
|
||||
#include <linux/sched/signal.h>
|
||||
#include <trace/events/block.h>
|
||||
#include <linux/blk-mq.h>
|
||||
#include <linux/blk-cgroup.h>
|
||||
#include "blk-rq-qos.h"
|
||||
#include "blk-stat.h"
|
||||
#include "blk.h"
|
||||
|
@@ -6,12 +6,45 @@
|
||||
#include <linux/module.h>
|
||||
#include <linux/bio.h>
|
||||
#include <linux/blkdev.h>
|
||||
#include <linux/blk-integrity.h>
|
||||
#include <linux/scatterlist.h>
|
||||
|
||||
#include <trace/events/block.h>
|
||||
|
||||
#include "blk.h"
|
||||
#include "blk-rq-qos.h"
|
||||
#include "blk-throttle.h"
|
||||
|
||||
static inline void bio_get_first_bvec(struct bio *bio, struct bio_vec *bv)
|
||||
{
|
||||
*bv = mp_bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
|
||||
}
|
||||
|
||||
static inline void bio_get_last_bvec(struct bio *bio, struct bio_vec *bv)
|
||||
{
|
||||
struct bvec_iter iter = bio->bi_iter;
|
||||
int idx;
|
||||
|
||||
bio_get_first_bvec(bio, bv);
|
||||
if (bv->bv_len == bio->bi_iter.bi_size)
|
||||
return; /* this bio only has a single bvec */
|
||||
|
||||
bio_advance_iter(bio, &iter, iter.bi_size);
|
||||
|
||||
if (!iter.bi_bvec_done)
|
||||
idx = iter.bi_idx - 1;
|
||||
else /* in the middle of bvec */
|
||||
idx = iter.bi_idx;
|
||||
|
||||
*bv = bio->bi_io_vec[idx];
|
||||
|
||||
/*
|
||||
* iter.bi_bvec_done records actual length of the last bvec
|
||||
* if this bio ends in the middle of one io vector
|
||||
*/
|
||||
if (iter.bi_bvec_done)
|
||||
bv->bv_len = iter.bi_bvec_done;
|
||||
}
|
||||
|
||||
static inline bool bio_will_gap(struct request_queue *q,
|
||||
struct request *prev_rq, struct bio *prev, struct bio *next)
|
||||
@@ -285,13 +318,13 @@ split:
|
||||
* iopoll in direct IO routine. Given performance gain of iopoll for
|
||||
* big IO can be trivial, disable iopoll when split needed.
|
||||
*/
|
||||
bio_clear_hipri(bio);
|
||||
|
||||
bio_clear_polled(bio);
|
||||
return bio_split(bio, sectors, GFP_NOIO, bs);
|
||||
}
|
||||
|
||||
/**
|
||||
* __blk_queue_split - split a bio and submit the second half
|
||||
* @q: [in] request_queue new bio is being queued at
|
||||
* @bio: [in, out] bio to be split
|
||||
* @nr_segs: [out] number of segments in the first bio
|
||||
*
|
||||
@@ -302,9 +335,9 @@ split:
|
||||
* of the caller to ensure that q->bio_split is only released after processing
|
||||
* of the split bio has finished.
|
||||
*/
|
||||
void __blk_queue_split(struct bio **bio, unsigned int *nr_segs)
|
||||
void __blk_queue_split(struct request_queue *q, struct bio **bio,
|
||||
unsigned int *nr_segs)
|
||||
{
|
||||
struct request_queue *q = (*bio)->bi_bdev->bd_disk->queue;
|
||||
struct bio *split = NULL;
|
||||
|
||||
switch (bio_op(*bio)) {
|
||||
@@ -321,21 +354,6 @@ void __blk_queue_split(struct bio **bio, unsigned int *nr_segs)
|
||||
nr_segs);
|
||||
break;
|
||||
default:
|
||||
/*
|
||||
* All drivers must accept single-segments bios that are <=
|
||||
* PAGE_SIZE. This is a quick and dirty check that relies on
|
||||
* the fact that bi_io_vec[0] is always valid if a bio has data.
|
||||
* The check might lead to occasional false negatives when bios
|
||||
* are cloned, but compared to the performance impact of cloned
|
||||
* bios themselves the loop below doesn't matter anyway.
|
||||
*/
|
||||
if (!q->limits.chunk_sectors &&
|
||||
(*bio)->bi_vcnt == 1 &&
|
||||
((*bio)->bi_io_vec[0].bv_len +
|
||||
(*bio)->bi_io_vec[0].bv_offset) <= PAGE_SIZE) {
|
||||
*nr_segs = 1;
|
||||
break;
|
||||
}
|
||||
split = blk_bio_segment_split(q, *bio, &q->bio_split, nr_segs);
|
||||
break;
|
||||
}
|
||||
@@ -365,9 +383,11 @@ void __blk_queue_split(struct bio **bio, unsigned int *nr_segs)
|
||||
*/
|
||||
void blk_queue_split(struct bio **bio)
|
||||
{
|
||||
struct request_queue *q = bdev_get_queue((*bio)->bi_bdev);
|
||||
unsigned int nr_segs;
|
||||
|
||||
__blk_queue_split(bio, &nr_segs);
|
||||
if (blk_may_split(q, *bio))
|
||||
__blk_queue_split(q, bio, &nr_segs);
|
||||
}
|
||||
EXPORT_SYMBOL(blk_queue_split);
|
||||
|
||||
@@ -558,6 +578,23 @@ static inline unsigned int blk_rq_get_max_segments(struct request *rq)
|
||||
return queue_max_segments(rq->q);
|
||||
}
|
||||
|
||||
static inline unsigned int blk_rq_get_max_sectors(struct request *rq,
|
||||
sector_t offset)
|
||||
{
|
||||
struct request_queue *q = rq->q;
|
||||
|
||||
if (blk_rq_is_passthrough(rq))
|
||||
return q->limits.max_hw_sectors;
|
||||
|
||||
if (!q->limits.chunk_sectors ||
|
||||
req_op(rq) == REQ_OP_DISCARD ||
|
||||
req_op(rq) == REQ_OP_SECURE_ERASE)
|
||||
return blk_queue_get_max_sectors(q, req_op(rq));
|
||||
|
||||
return min(blk_max_size_offset(q, offset, 0),
|
||||
blk_queue_get_max_sectors(q, req_op(rq)));
|
||||
}
|
||||
|
||||
static inline int ll_new_hw_segment(struct request *req, struct bio *bio,
|
||||
unsigned int nr_phys_segs)
|
||||
{
|
||||
@@ -718,6 +755,13 @@ static enum elv_merge blk_try_req_merge(struct request *req,
|
||||
return ELEVATOR_NO_MERGE;
|
||||
}
|
||||
|
||||
static inline bool blk_write_same_mergeable(struct bio *a, struct bio *b)
|
||||
{
|
||||
if (bio_page(a) == bio_page(b) && bio_offset(a) == bio_offset(b))
|
||||
return true;
|
||||
return false;
|
||||
}
|
||||
|
||||
/*
|
||||
* For non-mq, this has to be called with the request spinlock acquired.
|
||||
* For mq with scheduling, the appropriate queue wide lock should be held.
|
||||
@@ -1023,12 +1067,11 @@ static enum bio_merge_status blk_attempt_bio_merge(struct request_queue *q,
|
||||
* @q: request_queue new bio is being queued at
|
||||
* @bio: new bio being queued
|
||||
* @nr_segs: number of segments in @bio
|
||||
* @same_queue_rq: pointer to &struct request that gets filled in when
|
||||
* another request associated with @q is found on the plug list
|
||||
* (optional, may be %NULL)
|
||||
* @same_queue_rq: output value, will be true if there's an existing request
|
||||
* from the passed in @q already in the plug list
|
||||
*
|
||||
* Determine whether @bio being queued on @q can be merged with a request
|
||||
* on %current's plugged list. Returns %true if merge was successful,
|
||||
* Determine whether @bio being queued on @q can be merged with the previous
|
||||
* request on %current's plugged list. Returns %true if merge was successful,
|
||||
* otherwise %false.
|
||||
*
|
||||
* Plugging coalesces IOs from the same issuer for the same purpose without
|
||||
@@ -1041,36 +1084,26 @@ static enum bio_merge_status blk_attempt_bio_merge(struct request_queue *q,
|
||||
* Caller must ensure !blk_queue_nomerges(q) beforehand.
|
||||
*/
|
||||
bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
|
||||
unsigned int nr_segs, struct request **same_queue_rq)
|
||||
unsigned int nr_segs, bool *same_queue_rq)
|
||||
{
|
||||
struct blk_plug *plug;
|
||||
struct request *rq;
|
||||
struct list_head *plug_list;
|
||||
|
||||
plug = blk_mq_plug(q, bio);
|
||||
if (!plug)
|
||||
if (!plug || rq_list_empty(plug->mq_list))
|
||||
return false;
|
||||
|
||||
plug_list = &plug->mq_list;
|
||||
|
||||
list_for_each_entry_reverse(rq, plug_list, queuelist) {
|
||||
if (rq->q == q && same_queue_rq) {
|
||||
/*
|
||||
* Only blk-mq multiple hardware queues case checks the
|
||||
* rq in the same queue, there should be only one such
|
||||
* rq in a queue
|
||||
**/
|
||||
*same_queue_rq = rq;
|
||||
}
|
||||
|
||||
if (rq->q != q)
|
||||
continue;
|
||||
|
||||
if (blk_attempt_bio_merge(q, rq, bio, nr_segs, false) ==
|
||||
BIO_MERGE_OK)
|
||||
return true;
|
||||
/* check the previously added entry for a quick merge attempt */
|
||||
rq = rq_list_peek(&plug->mq_list);
|
||||
if (rq->q == q) {
|
||||
/*
|
||||
* Only blk-mq multiple hardware queues case checks the rq in
|
||||
* the same queue, there should be only one such rq in a queue
|
||||
*/
|
||||
*same_queue_rq = true;
|
||||
}
|
||||
|
||||
if (blk_attempt_bio_merge(q, rq, bio, nr_segs, false) == BIO_MERGE_OK)
|
||||
return true;
|
||||
return false;
|
||||
}
|
||||
|
||||
|
@@ -287,7 +287,7 @@ static const char *const cmd_flag_name[] = {
|
||||
CMD_FLAG_NAME(BACKGROUND),
|
||||
CMD_FLAG_NAME(NOWAIT),
|
||||
CMD_FLAG_NAME(NOUNMAP),
|
||||
CMD_FLAG_NAME(HIPRI),
|
||||
CMD_FLAG_NAME(POLLED),
|
||||
};
|
||||
#undef CMD_FLAG_NAME
|
||||
|
||||
@@ -453,11 +453,11 @@ static void blk_mq_debugfs_tags_show(struct seq_file *m,
|
||||
atomic_read(&tags->active_queues));
|
||||
|
||||
seq_puts(m, "\nbitmap_tags:\n");
|
||||
sbitmap_queue_show(tags->bitmap_tags, m);
|
||||
sbitmap_queue_show(&tags->bitmap_tags, m);
|
||||
|
||||
if (tags->nr_reserved_tags) {
|
||||
seq_puts(m, "\nbreserved_tags:\n");
|
||||
sbitmap_queue_show(tags->breserved_tags, m);
|
||||
sbitmap_queue_show(&tags->breserved_tags, m);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -488,7 +488,7 @@ static int hctx_tags_bitmap_show(void *data, struct seq_file *m)
|
||||
if (res)
|
||||
goto out;
|
||||
if (hctx->tags)
|
||||
sbitmap_bitmap_show(&hctx->tags->bitmap_tags->sb, m);
|
||||
sbitmap_bitmap_show(&hctx->tags->bitmap_tags.sb, m);
|
||||
mutex_unlock(&q->sysfs_lock);
|
||||
|
||||
out:
|
||||
@@ -522,77 +522,13 @@ static int hctx_sched_tags_bitmap_show(void *data, struct seq_file *m)
|
||||
if (res)
|
||||
goto out;
|
||||
if (hctx->sched_tags)
|
||||
sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags->sb, m);
|
||||
sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags.sb, m);
|
||||
mutex_unlock(&q->sysfs_lock);
|
||||
|
||||
out:
|
||||
return res;
|
||||
}
|
||||
|
||||
static int hctx_io_poll_show(void *data, struct seq_file *m)
|
||||
{
|
||||
struct blk_mq_hw_ctx *hctx = data;
|
||||
|
||||
seq_printf(m, "considered=%lu\n", hctx->poll_considered);
|
||||
seq_printf(m, "invoked=%lu\n", hctx->poll_invoked);
|
||||
seq_printf(m, "success=%lu\n", hctx->poll_success);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static ssize_t hctx_io_poll_write(void *data, const char __user *buf,
|
||||
size_t count, loff_t *ppos)
|
||||
{
|
||||
struct blk_mq_hw_ctx *hctx = data;
|
||||
|
||||
hctx->poll_considered = hctx->poll_invoked = hctx->poll_success = 0;
|
||||
return count;
|
||||
}
|
||||
|
||||
static int hctx_dispatched_show(void *data, struct seq_file *m)
|
||||
{
|
||||
struct blk_mq_hw_ctx *hctx = data;
|
||||
int i;
|
||||
|
||||
seq_printf(m, "%8u\t%lu\n", 0U, hctx->dispatched[0]);
|
||||
|
||||
for (i = 1; i < BLK_MQ_MAX_DISPATCH_ORDER - 1; i++) {
|
||||
unsigned int d = 1U << (i - 1);
|
||||
|
||||
seq_printf(m, "%8u\t%lu\n", d, hctx->dispatched[i]);
|
||||
}
|
||||
|
||||
seq_printf(m, "%8u+\t%lu\n", 1U << (i - 1), hctx->dispatched[i]);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static ssize_t hctx_dispatched_write(void *data, const char __user *buf,
|
||||
size_t count, loff_t *ppos)
|
||||
{
|
||||
struct blk_mq_hw_ctx *hctx = data;
|
||||
int i;
|
||||
|
||||
for (i = 0; i < BLK_MQ_MAX_DISPATCH_ORDER; i++)
|
||||
hctx->dispatched[i] = 0;
|
||||
return count;
|
||||
}
|
||||
|
||||
static int hctx_queued_show(void *data, struct seq_file *m)
|
||||
{
|
||||
struct blk_mq_hw_ctx *hctx = data;
|
||||
|
||||
seq_printf(m, "%lu\n", hctx->queued);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static ssize_t hctx_queued_write(void *data, const char __user *buf,
|
||||
size_t count, loff_t *ppos)
|
||||
{
|
||||
struct blk_mq_hw_ctx *hctx = data;
|
||||
|
||||
hctx->queued = 0;
|
||||
return count;
|
||||
}
|
||||
|
||||
static int hctx_run_show(void *data, struct seq_file *m)
|
||||
{
|
||||
struct blk_mq_hw_ctx *hctx = data;
|
||||
@@ -614,7 +550,7 @@ static int hctx_active_show(void *data, struct seq_file *m)
|
||||
{
|
||||
struct blk_mq_hw_ctx *hctx = data;
|
||||
|
||||
seq_printf(m, "%d\n", atomic_read(&hctx->nr_active));
|
||||
seq_printf(m, "%d\n", __blk_mq_active_requests(hctx));
|
||||
return 0;
|
||||
}
|
||||
|
||||
@@ -663,57 +599,6 @@ CTX_RQ_SEQ_OPS(default, HCTX_TYPE_DEFAULT);
|
||||
CTX_RQ_SEQ_OPS(read, HCTX_TYPE_READ);
|
||||
CTX_RQ_SEQ_OPS(poll, HCTX_TYPE_POLL);
|
||||
|
||||
static int ctx_dispatched_show(void *data, struct seq_file *m)
|
||||
{
|
||||
struct blk_mq_ctx *ctx = data;
|
||||
|
||||
seq_printf(m, "%lu %lu\n", ctx->rq_dispatched[1], ctx->rq_dispatched[0]);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static ssize_t ctx_dispatched_write(void *data, const char __user *buf,
|
||||
size_t count, loff_t *ppos)
|
||||
{
|
||||
struct blk_mq_ctx *ctx = data;
|
||||
|
||||
ctx->rq_dispatched[0] = ctx->rq_dispatched[1] = 0;
|
||||
return count;
|
||||
}
|
||||
|
||||
static int ctx_merged_show(void *data, struct seq_file *m)
|
||||
{
|
||||
struct blk_mq_ctx *ctx = data;
|
||||
|
||||
seq_printf(m, "%lu\n", ctx->rq_merged);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static ssize_t ctx_merged_write(void *data, const char __user *buf,
|
||||
size_t count, loff_t *ppos)
|
||||
{
|
||||
struct blk_mq_ctx *ctx = data;
|
||||
|
||||
ctx->rq_merged = 0;
|
||||
return count;
|
||||
}
|
||||
|
||||
static int ctx_completed_show(void *data, struct seq_file *m)
|
||||
{
|
||||
struct blk_mq_ctx *ctx = data;
|
||||
|
||||
seq_printf(m, "%lu %lu\n", ctx->rq_completed[1], ctx->rq_completed[0]);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static ssize_t ctx_completed_write(void *data, const char __user *buf,
|
||||
size_t count, loff_t *ppos)
|
||||
{
|
||||
struct blk_mq_ctx *ctx = data;
|
||||
|
||||
ctx->rq_completed[0] = ctx->rq_completed[1] = 0;
|
||||
return count;
|
||||
}
|
||||
|
||||
static int blk_mq_debugfs_show(struct seq_file *m, void *v)
|
||||
{
|
||||
const struct blk_mq_debugfs_attr *attr = m->private;
|
||||
@@ -789,9 +674,6 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
|
||||
{"tags_bitmap", 0400, hctx_tags_bitmap_show},
|
||||
{"sched_tags", 0400, hctx_sched_tags_show},
|
||||
{"sched_tags_bitmap", 0400, hctx_sched_tags_bitmap_show},
|
||||
{"io_poll", 0600, hctx_io_poll_show, hctx_io_poll_write},
|
||||
{"dispatched", 0600, hctx_dispatched_show, hctx_dispatched_write},
|
||||
{"queued", 0600, hctx_queued_show, hctx_queued_write},
|
||||
{"run", 0600, hctx_run_show, hctx_run_write},
|
||||
{"active", 0400, hctx_active_show},
|
||||
{"dispatch_busy", 0400, hctx_dispatch_busy_show},
|
||||
@@ -803,9 +685,6 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_ctx_attrs[] = {
|
||||
{"default_rq_list", 0400, .seq_ops = &ctx_default_rq_list_seq_ops},
|
||||
{"read_rq_list", 0400, .seq_ops = &ctx_read_rq_list_seq_ops},
|
||||
{"poll_rq_list", 0400, .seq_ops = &ctx_poll_rq_list_seq_ops},
|
||||
{"dispatched", 0600, ctx_dispatched_show, ctx_dispatched_write},
|
||||
{"merged", 0600, ctx_merged_show, ctx_merged_write},
|
||||
{"completed", 0600, ctx_completed_show, ctx_completed_write},
|
||||
{},
|
||||
};
|
||||
|
||||
|
@@ -57,10 +57,8 @@ void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx)
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_mq_sched_mark_restart_hctx);
|
||||
|
||||
void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
|
||||
void __blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
|
||||
{
|
||||
if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
|
||||
return;
|
||||
clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
|
||||
|
||||
/*
|
||||
@@ -363,7 +361,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
|
||||
}
|
||||
}
|
||||
|
||||
bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
|
||||
bool blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
|
||||
unsigned int nr_segs)
|
||||
{
|
||||
struct elevator_queue *e = q->elevator;
|
||||
@@ -389,13 +387,10 @@ bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
|
||||
* potentially merge with. Currently includes a hand-wavy stop
|
||||
* count of 8, to not spend too much time checking for merges.
|
||||
*/
|
||||
if (blk_bio_list_merge(q, &ctx->rq_lists[type], bio, nr_segs)) {
|
||||
ctx->rq_merged++;
|
||||
if (blk_bio_list_merge(q, &ctx->rq_lists[type], bio, nr_segs))
|
||||
ret = true;
|
||||
}
|
||||
|
||||
spin_unlock(&ctx->lock);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
@@ -515,83 +510,71 @@ void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx,
|
||||
percpu_ref_put(&q->q_usage_counter);
|
||||
}
|
||||
|
||||
static int blk_mq_sched_alloc_tags(struct request_queue *q,
|
||||
struct blk_mq_hw_ctx *hctx,
|
||||
unsigned int hctx_idx)
|
||||
static int blk_mq_sched_alloc_map_and_rqs(struct request_queue *q,
|
||||
struct blk_mq_hw_ctx *hctx,
|
||||
unsigned int hctx_idx)
|
||||
{
|
||||
struct blk_mq_tag_set *set = q->tag_set;
|
||||
int ret;
|
||||
|
||||
hctx->sched_tags = blk_mq_alloc_rq_map(set, hctx_idx, q->nr_requests,
|
||||
set->reserved_tags, set->flags);
|
||||
if (!hctx->sched_tags)
|
||||
return -ENOMEM;
|
||||
|
||||
ret = blk_mq_alloc_rqs(set, hctx->sched_tags, hctx_idx, q->nr_requests);
|
||||
if (ret) {
|
||||
blk_mq_free_rq_map(hctx->sched_tags, set->flags);
|
||||
hctx->sched_tags = NULL;
|
||||
if (blk_mq_is_shared_tags(q->tag_set->flags)) {
|
||||
hctx->sched_tags = q->sched_shared_tags;
|
||||
return 0;
|
||||
}
|
||||
|
||||
return ret;
|
||||
hctx->sched_tags = blk_mq_alloc_map_and_rqs(q->tag_set, hctx_idx,
|
||||
q->nr_requests);
|
||||
|
||||
if (!hctx->sched_tags)
|
||||
return -ENOMEM;
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void blk_mq_exit_sched_shared_tags(struct request_queue *queue)
|
||||
{
|
||||
blk_mq_free_rq_map(queue->sched_shared_tags);
|
||||
queue->sched_shared_tags = NULL;
|
||||
}
|
||||
|
||||
/* called in queue's release handler, tagset has gone away */
|
||||
static void blk_mq_sched_tags_teardown(struct request_queue *q)
|
||||
static void blk_mq_sched_tags_teardown(struct request_queue *q, unsigned int flags)
|
||||
{
|
||||
struct blk_mq_hw_ctx *hctx;
|
||||
int i;
|
||||
|
||||
queue_for_each_hw_ctx(q, hctx, i) {
|
||||
if (hctx->sched_tags) {
|
||||
blk_mq_free_rq_map(hctx->sched_tags, hctx->flags);
|
||||
if (!blk_mq_is_shared_tags(flags))
|
||||
blk_mq_free_rq_map(hctx->sched_tags);
|
||||
hctx->sched_tags = NULL;
|
||||
}
|
||||
}
|
||||
|
||||
if (blk_mq_is_shared_tags(flags))
|
||||
blk_mq_exit_sched_shared_tags(q);
|
||||
}
|
||||
|
||||
static int blk_mq_init_sched_shared_sbitmap(struct request_queue *queue)
|
||||
static int blk_mq_init_sched_shared_tags(struct request_queue *queue)
|
||||
{
|
||||
struct blk_mq_tag_set *set = queue->tag_set;
|
||||
int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags);
|
||||
struct blk_mq_hw_ctx *hctx;
|
||||
int ret, i;
|
||||
|
||||
/*
|
||||
* Set initial depth at max so that we don't need to reallocate for
|
||||
* updating nr_requests.
|
||||
*/
|
||||
ret = blk_mq_init_bitmaps(&queue->sched_bitmap_tags,
|
||||
&queue->sched_breserved_tags,
|
||||
MAX_SCHED_RQ, set->reserved_tags,
|
||||
set->numa_node, alloc_policy);
|
||||
if (ret)
|
||||
return ret;
|
||||
queue->sched_shared_tags = blk_mq_alloc_map_and_rqs(set,
|
||||
BLK_MQ_NO_HCTX_IDX,
|
||||
MAX_SCHED_RQ);
|
||||
if (!queue->sched_shared_tags)
|
||||
return -ENOMEM;
|
||||
|
||||
queue_for_each_hw_ctx(queue, hctx, i) {
|
||||
hctx->sched_tags->bitmap_tags =
|
||||
&queue->sched_bitmap_tags;
|
||||
hctx->sched_tags->breserved_tags =
|
||||
&queue->sched_breserved_tags;
|
||||
}
|
||||
|
||||
sbitmap_queue_resize(&queue->sched_bitmap_tags,
|
||||
queue->nr_requests - set->reserved_tags);
|
||||
blk_mq_tag_update_sched_shared_tags(queue);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void blk_mq_exit_sched_shared_sbitmap(struct request_queue *queue)
|
||||
{
|
||||
sbitmap_queue_free(&queue->sched_bitmap_tags);
|
||||
sbitmap_queue_free(&queue->sched_breserved_tags);
|
||||
}
|
||||
|
||||
int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
|
||||
{
|
||||
unsigned int i, flags = q->tag_set->flags;
|
||||
struct blk_mq_hw_ctx *hctx;
|
||||
struct elevator_queue *eq;
|
||||
unsigned int i;
|
||||
int ret;
|
||||
|
||||
if (!e) {
|
||||
@@ -606,23 +589,23 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
|
||||
* Additionally, this is a per-hw queue depth.
|
||||
*/
|
||||
q->nr_requests = 2 * min_t(unsigned int, q->tag_set->queue_depth,
|
||||
BLKDEV_MAX_RQ);
|
||||
BLKDEV_DEFAULT_RQ);
|
||||
|
||||
queue_for_each_hw_ctx(q, hctx, i) {
|
||||
ret = blk_mq_sched_alloc_tags(q, hctx, i);
|
||||
if (blk_mq_is_shared_tags(flags)) {
|
||||
ret = blk_mq_init_sched_shared_tags(q);
|
||||
if (ret)
|
||||
goto err_free_tags;
|
||||
return ret;
|
||||
}
|
||||
|
||||
if (blk_mq_is_sbitmap_shared(q->tag_set->flags)) {
|
||||
ret = blk_mq_init_sched_shared_sbitmap(q);
|
||||
queue_for_each_hw_ctx(q, hctx, i) {
|
||||
ret = blk_mq_sched_alloc_map_and_rqs(q, hctx, i);
|
||||
if (ret)
|
||||
goto err_free_tags;
|
||||
goto err_free_map_and_rqs;
|
||||
}
|
||||
|
||||
ret = e->ops.init_sched(q, e);
|
||||
if (ret)
|
||||
goto err_free_sbitmap;
|
||||
goto err_free_map_and_rqs;
|
||||
|
||||
blk_mq_debugfs_register_sched(q);
|
||||
|
||||
@@ -631,7 +614,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
|
||||
ret = e->ops.init_hctx(hctx, i);
|
||||
if (ret) {
|
||||
eq = q->elevator;
|
||||
blk_mq_sched_free_requests(q);
|
||||
blk_mq_sched_free_rqs(q);
|
||||
blk_mq_exit_sched(q, eq);
|
||||
kobject_put(&eq->kobj);
|
||||
return ret;
|
||||
@@ -642,12 +625,10 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
|
||||
|
||||
return 0;
|
||||
|
||||
err_free_sbitmap:
|
||||
if (blk_mq_is_sbitmap_shared(q->tag_set->flags))
|
||||
blk_mq_exit_sched_shared_sbitmap(q);
|
||||
err_free_tags:
|
||||
blk_mq_sched_free_requests(q);
|
||||
blk_mq_sched_tags_teardown(q);
|
||||
err_free_map_and_rqs:
|
||||
blk_mq_sched_free_rqs(q);
|
||||
blk_mq_sched_tags_teardown(q, flags);
|
||||
|
||||
q->elevator = NULL;
|
||||
return ret;
|
||||
}
|
||||
@@ -656,14 +637,20 @@ err_free_tags:
|
||||
* called in either blk_queue_cleanup or elevator_switch, tagset
|
||||
* is required for freeing requests
|
||||
*/
|
||||
void blk_mq_sched_free_requests(struct request_queue *q)
|
||||
void blk_mq_sched_free_rqs(struct request_queue *q)
|
||||
{
|
||||
struct blk_mq_hw_ctx *hctx;
|
||||
int i;
|
||||
|
||||
queue_for_each_hw_ctx(q, hctx, i) {
|
||||
if (hctx->sched_tags)
|
||||
blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i);
|
||||
if (blk_mq_is_shared_tags(q->tag_set->flags)) {
|
||||
blk_mq_free_rqs(q->tag_set, q->sched_shared_tags,
|
||||
BLK_MQ_NO_HCTX_IDX);
|
||||
} else {
|
||||
queue_for_each_hw_ctx(q, hctx, i) {
|
||||
if (hctx->sched_tags)
|
||||
blk_mq_free_rqs(q->tag_set,
|
||||
hctx->sched_tags, i);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@@ -684,8 +671,6 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e)
|
||||
blk_mq_debugfs_unregister_sched(q);
|
||||
if (e->type->ops.exit_sched)
|
||||
e->type->ops.exit_sched(e);
|
||||
blk_mq_sched_tags_teardown(q);
|
||||
if (blk_mq_is_sbitmap_shared(flags))
|
||||
blk_mq_exit_sched_shared_sbitmap(q);
|
||||
blk_mq_sched_tags_teardown(q, flags);
|
||||
q->elevator = NULL;
|
||||
}
|
||||
|
@@ -2,21 +2,22 @@
|
||||
#ifndef BLK_MQ_SCHED_H
|
||||
#define BLK_MQ_SCHED_H
|
||||
|
||||
#include "elevator.h"
|
||||
#include "blk-mq.h"
|
||||
#include "blk-mq-tag.h"
|
||||
|
||||
#define MAX_SCHED_RQ (16 * BLKDEV_MAX_RQ)
|
||||
#define MAX_SCHED_RQ (16 * BLKDEV_DEFAULT_RQ)
|
||||
|
||||
void blk_mq_sched_assign_ioc(struct request *rq);
|
||||
|
||||
bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
|
||||
unsigned int nr_segs, struct request **merged_request);
|
||||
bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
|
||||
bool blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
|
||||
unsigned int nr_segs);
|
||||
bool blk_mq_sched_try_insert_merge(struct request_queue *q, struct request *rq,
|
||||
struct list_head *free);
|
||||
void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx);
|
||||
void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx);
|
||||
void __blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx);
|
||||
|
||||
void blk_mq_sched_insert_request(struct request *rq, bool at_head,
|
||||
bool run_queue, bool async);
|
||||
@@ -28,45 +29,51 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx);
|
||||
|
||||
int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e);
|
||||
void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e);
|
||||
void blk_mq_sched_free_requests(struct request_queue *q);
|
||||
void blk_mq_sched_free_rqs(struct request_queue *q);
|
||||
|
||||
static inline bool
|
||||
blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
|
||||
unsigned int nr_segs)
|
||||
static inline void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
|
||||
{
|
||||
if (blk_queue_nomerges(q) || !bio_mergeable(bio))
|
||||
return false;
|
||||
if (test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
|
||||
__blk_mq_sched_restart(hctx);
|
||||
}
|
||||
|
||||
return __blk_mq_sched_bio_merge(q, bio, nr_segs);
|
||||
static inline bool bio_mergeable(struct bio *bio)
|
||||
{
|
||||
return !(bio->bi_opf & REQ_NOMERGE_FLAGS);
|
||||
}
|
||||
|
||||
static inline bool
|
||||
blk_mq_sched_allow_merge(struct request_queue *q, struct request *rq,
|
||||
struct bio *bio)
|
||||
{
|
||||
struct elevator_queue *e = q->elevator;
|
||||
|
||||
if (e && e->type->ops.allow_merge)
|
||||
return e->type->ops.allow_merge(q, rq, bio);
|
||||
if (rq->rq_flags & RQF_ELV) {
|
||||
struct elevator_queue *e = q->elevator;
|
||||
|
||||
if (e->type->ops.allow_merge)
|
||||
return e->type->ops.allow_merge(q, rq, bio);
|
||||
}
|
||||
return true;
|
||||
}
|
||||
|
||||
static inline void blk_mq_sched_completed_request(struct request *rq, u64 now)
|
||||
{
|
||||
struct elevator_queue *e = rq->q->elevator;
|
||||
if (rq->rq_flags & RQF_ELV) {
|
||||
struct elevator_queue *e = rq->q->elevator;
|
||||
|
||||
if (e && e->type->ops.completed_request)
|
||||
e->type->ops.completed_request(rq, now);
|
||||
if (e->type->ops.completed_request)
|
||||
e->type->ops.completed_request(rq, now);
|
||||
}
|
||||
}
|
||||
|
||||
static inline void blk_mq_sched_requeue_request(struct request *rq)
|
||||
{
|
||||
struct request_queue *q = rq->q;
|
||||
struct elevator_queue *e = q->elevator;
|
||||
if (rq->rq_flags & RQF_ELV) {
|
||||
struct request_queue *q = rq->q;
|
||||
struct elevator_queue *e = q->elevator;
|
||||
|
||||
if ((rq->rq_flags & RQF_ELVPRIV) && e && e->type->ops.requeue_request)
|
||||
e->type->ops.requeue_request(rq);
|
||||
if ((rq->rq_flags & RQF_ELVPRIV) && e->type->ops.requeue_request)
|
||||
e->type->ops.requeue_request(rq);
|
||||
}
|
||||
}
|
||||
|
||||
static inline bool blk_mq_sched_has_work(struct blk_mq_hw_ctx *hctx)
|
||||
|
@@ -24,13 +24,12 @@
|
||||
*/
|
||||
bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
|
||||
{
|
||||
if (blk_mq_is_sbitmap_shared(hctx->flags)) {
|
||||
if (blk_mq_is_shared_tags(hctx->flags)) {
|
||||
struct request_queue *q = hctx->queue;
|
||||
struct blk_mq_tag_set *set = q->tag_set;
|
||||
|
||||
if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) &&
|
||||
!test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
|
||||
atomic_inc(&set->active_queues_shared_sbitmap);
|
||||
atomic_inc(&hctx->tags->active_queues);
|
||||
} else {
|
||||
if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) &&
|
||||
!test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
|
||||
@@ -45,9 +44,9 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
|
||||
*/
|
||||
void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
|
||||
{
|
||||
sbitmap_queue_wake_all(tags->bitmap_tags);
|
||||
sbitmap_queue_wake_all(&tags->bitmap_tags);
|
||||
if (include_reserve)
|
||||
sbitmap_queue_wake_all(tags->breserved_tags);
|
||||
sbitmap_queue_wake_all(&tags->breserved_tags);
|
||||
}
|
||||
|
||||
/*
|
||||
@@ -57,20 +56,20 @@ void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
|
||||
void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
|
||||
{
|
||||
struct blk_mq_tags *tags = hctx->tags;
|
||||
struct request_queue *q = hctx->queue;
|
||||
struct blk_mq_tag_set *set = q->tag_set;
|
||||
|
||||
if (blk_mq_is_sbitmap_shared(hctx->flags)) {
|
||||
if (blk_mq_is_shared_tags(hctx->flags)) {
|
||||
struct request_queue *q = hctx->queue;
|
||||
|
||||
if (!test_and_clear_bit(QUEUE_FLAG_HCTX_ACTIVE,
|
||||
&q->queue_flags))
|
||||
return;
|
||||
atomic_dec(&set->active_queues_shared_sbitmap);
|
||||
} else {
|
||||
if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
|
||||
return;
|
||||
atomic_dec(&tags->active_queues);
|
||||
}
|
||||
|
||||
atomic_dec(&tags->active_queues);
|
||||
|
||||
blk_mq_tag_wakeup_all(tags, false);
|
||||
}
|
||||
|
||||
@@ -87,6 +86,21 @@ static int __blk_mq_get_tag(struct blk_mq_alloc_data *data,
|
||||
return __sbitmap_queue_get(bt);
|
||||
}
|
||||
|
||||
unsigned long blk_mq_get_tags(struct blk_mq_alloc_data *data, int nr_tags,
|
||||
unsigned int *offset)
|
||||
{
|
||||
struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
|
||||
struct sbitmap_queue *bt = &tags->bitmap_tags;
|
||||
unsigned long ret;
|
||||
|
||||
if (data->shallow_depth ||data->flags & BLK_MQ_REQ_RESERVED ||
|
||||
data->hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
|
||||
return 0;
|
||||
ret = __sbitmap_queue_get_batch(bt, nr_tags, offset);
|
||||
*offset += tags->nr_reserved_tags;
|
||||
return ret;
|
||||
}
|
||||
|
||||
unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
|
||||
{
|
||||
struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
|
||||
@@ -101,10 +115,10 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
|
||||
WARN_ON_ONCE(1);
|
||||
return BLK_MQ_NO_TAG;
|
||||
}
|
||||
bt = tags->breserved_tags;
|
||||
bt = &tags->breserved_tags;
|
||||
tag_offset = 0;
|
||||
} else {
|
||||
bt = tags->bitmap_tags;
|
||||
bt = &tags->bitmap_tags;
|
||||
tag_offset = tags->nr_reserved_tags;
|
||||
}
|
||||
|
||||
@@ -150,9 +164,9 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
|
||||
data->ctx);
|
||||
tags = blk_mq_tags_from_data(data);
|
||||
if (data->flags & BLK_MQ_REQ_RESERVED)
|
||||
bt = tags->breserved_tags;
|
||||
bt = &tags->breserved_tags;
|
||||
else
|
||||
bt = tags->bitmap_tags;
|
||||
bt = &tags->bitmap_tags;
|
||||
|
||||
/*
|
||||
* If destination hw queue is changed, fake wake up on
|
||||
@@ -186,13 +200,19 @@ void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
|
||||
const int real_tag = tag - tags->nr_reserved_tags;
|
||||
|
||||
BUG_ON(real_tag >= tags->nr_tags);
|
||||
sbitmap_queue_clear(tags->bitmap_tags, real_tag, ctx->cpu);
|
||||
sbitmap_queue_clear(&tags->bitmap_tags, real_tag, ctx->cpu);
|
||||
} else {
|
||||
BUG_ON(tag >= tags->nr_reserved_tags);
|
||||
sbitmap_queue_clear(tags->breserved_tags, tag, ctx->cpu);
|
||||
sbitmap_queue_clear(&tags->breserved_tags, tag, ctx->cpu);
|
||||
}
|
||||
}
|
||||
|
||||
void blk_mq_put_tags(struct blk_mq_tags *tags, int *tag_array, int nr_tags)
|
||||
{
|
||||
sbitmap_queue_clear_batch(&tags->bitmap_tags, tags->nr_reserved_tags,
|
||||
tag_array, nr_tags);
|
||||
}
|
||||
|
||||
struct bt_iter_data {
|
||||
struct blk_mq_hw_ctx *hctx;
|
||||
busy_iter_fn *fn;
|
||||
@@ -340,9 +360,9 @@ static void __blk_mq_all_tag_iter(struct blk_mq_tags *tags,
|
||||
WARN_ON_ONCE(flags & BT_TAG_ITER_RESERVED);
|
||||
|
||||
if (tags->nr_reserved_tags)
|
||||
bt_tags_for_each(tags, tags->breserved_tags, fn, priv,
|
||||
bt_tags_for_each(tags, &tags->breserved_tags, fn, priv,
|
||||
flags | BT_TAG_ITER_RESERVED);
|
||||
bt_tags_for_each(tags, tags->bitmap_tags, fn, priv, flags);
|
||||
bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, flags);
|
||||
}
|
||||
|
||||
/**
|
||||
@@ -379,9 +399,12 @@ void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
|
||||
void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
|
||||
busy_tag_iter_fn *fn, void *priv)
|
||||
{
|
||||
int i;
|
||||
unsigned int flags = tagset->flags;
|
||||
int i, nr_tags;
|
||||
|
||||
for (i = 0; i < tagset->nr_hw_queues; i++) {
|
||||
nr_tags = blk_mq_is_shared_tags(flags) ? 1 : tagset->nr_hw_queues;
|
||||
|
||||
for (i = 0; i < nr_tags; i++) {
|
||||
if (tagset->tags && tagset->tags[i])
|
||||
__blk_mq_all_tag_iter(tagset->tags[i], fn, priv,
|
||||
BT_TAG_ITER_STARTED);
|
||||
@@ -459,8 +482,8 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
|
||||
continue;
|
||||
|
||||
if (tags->nr_reserved_tags)
|
||||
bt_for_each(hctx, tags->breserved_tags, fn, priv, true);
|
||||
bt_for_each(hctx, tags->bitmap_tags, fn, priv, false);
|
||||
bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
|
||||
bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
|
||||
}
|
||||
blk_queue_exit(q);
|
||||
}
|
||||
@@ -492,56 +515,10 @@ free_bitmap_tags:
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
|
||||
int node, int alloc_policy)
|
||||
{
|
||||
int ret;
|
||||
|
||||
ret = blk_mq_init_bitmaps(&tags->__bitmap_tags,
|
||||
&tags->__breserved_tags,
|
||||
tags->nr_tags, tags->nr_reserved_tags,
|
||||
node, alloc_policy);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
tags->bitmap_tags = &tags->__bitmap_tags;
|
||||
tags->breserved_tags = &tags->__breserved_tags;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
int blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *set)
|
||||
{
|
||||
int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags);
|
||||
int i, ret;
|
||||
|
||||
ret = blk_mq_init_bitmaps(&set->__bitmap_tags, &set->__breserved_tags,
|
||||
set->queue_depth, set->reserved_tags,
|
||||
set->numa_node, alloc_policy);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
for (i = 0; i < set->nr_hw_queues; i++) {
|
||||
struct blk_mq_tags *tags = set->tags[i];
|
||||
|
||||
tags->bitmap_tags = &set->__bitmap_tags;
|
||||
tags->breserved_tags = &set->__breserved_tags;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
void blk_mq_exit_shared_sbitmap(struct blk_mq_tag_set *set)
|
||||
{
|
||||
sbitmap_queue_free(&set->__bitmap_tags);
|
||||
sbitmap_queue_free(&set->__breserved_tags);
|
||||
}
|
||||
|
||||
struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
|
||||
unsigned int reserved_tags,
|
||||
int node, unsigned int flags)
|
||||
int node, int alloc_policy)
|
||||
{
|
||||
int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(flags);
|
||||
struct blk_mq_tags *tags;
|
||||
|
||||
if (total_tags > BLK_MQ_TAG_MAX) {
|
||||
@@ -557,22 +534,19 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
|
||||
tags->nr_reserved_tags = reserved_tags;
|
||||
spin_lock_init(&tags->lock);
|
||||
|
||||
if (blk_mq_is_sbitmap_shared(flags))
|
||||
return tags;
|
||||
|
||||
if (blk_mq_init_bitmap_tags(tags, node, alloc_policy) < 0) {
|
||||
if (blk_mq_init_bitmaps(&tags->bitmap_tags, &tags->breserved_tags,
|
||||
total_tags, reserved_tags, node,
|
||||
alloc_policy) < 0) {
|
||||
kfree(tags);
|
||||
return NULL;
|
||||
}
|
||||
return tags;
|
||||
}
|
||||
|
||||
void blk_mq_free_tags(struct blk_mq_tags *tags, unsigned int flags)
|
||||
void blk_mq_free_tags(struct blk_mq_tags *tags)
|
||||
{
|
||||
if (!blk_mq_is_sbitmap_shared(flags)) {
|
||||
sbitmap_queue_free(tags->bitmap_tags);
|
||||
sbitmap_queue_free(tags->breserved_tags);
|
||||
}
|
||||
sbitmap_queue_free(&tags->bitmap_tags);
|
||||
sbitmap_queue_free(&tags->breserved_tags);
|
||||
kfree(tags);
|
||||
}
|
||||
|
||||
@@ -592,7 +566,6 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
|
||||
if (tdepth > tags->nr_tags) {
|
||||
struct blk_mq_tag_set *set = hctx->queue->tag_set;
|
||||
struct blk_mq_tags *new;
|
||||
bool ret;
|
||||
|
||||
if (!can_grow)
|
||||
return -EINVAL;
|
||||
@@ -604,34 +577,42 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
|
||||
if (tdepth > MAX_SCHED_RQ)
|
||||
return -EINVAL;
|
||||
|
||||
new = blk_mq_alloc_rq_map(set, hctx->queue_num, tdepth,
|
||||
tags->nr_reserved_tags, set->flags);
|
||||
/*
|
||||
* Only the sbitmap needs resizing since we allocated the max
|
||||
* initially.
|
||||
*/
|
||||
if (blk_mq_is_shared_tags(set->flags))
|
||||
return 0;
|
||||
|
||||
new = blk_mq_alloc_map_and_rqs(set, hctx->queue_num, tdepth);
|
||||
if (!new)
|
||||
return -ENOMEM;
|
||||
ret = blk_mq_alloc_rqs(set, new, hctx->queue_num, tdepth);
|
||||
if (ret) {
|
||||
blk_mq_free_rq_map(new, set->flags);
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
blk_mq_free_rqs(set, *tagsptr, hctx->queue_num);
|
||||
blk_mq_free_rq_map(*tagsptr, set->flags);
|
||||
blk_mq_free_map_and_rqs(set, *tagsptr, hctx->queue_num);
|
||||
*tagsptr = new;
|
||||
} else {
|
||||
/*
|
||||
* Don't need (or can't) update reserved tags here, they
|
||||
* remain static and should never need resizing.
|
||||
*/
|
||||
sbitmap_queue_resize(tags->bitmap_tags,
|
||||
sbitmap_queue_resize(&tags->bitmap_tags,
|
||||
tdepth - tags->nr_reserved_tags);
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set, unsigned int size)
|
||||
void blk_mq_tag_resize_shared_tags(struct blk_mq_tag_set *set, unsigned int size)
|
||||
{
|
||||
sbitmap_queue_resize(&set->__bitmap_tags, size - set->reserved_tags);
|
||||
struct blk_mq_tags *tags = set->shared_tags;
|
||||
|
||||
sbitmap_queue_resize(&tags->bitmap_tags, size - set->reserved_tags);
|
||||
}
|
||||
|
||||
void blk_mq_tag_update_sched_shared_tags(struct request_queue *q)
|
||||
{
|
||||
sbitmap_queue_resize(&q->sched_shared_tags->bitmap_tags,
|
||||
q->nr_requests - q->tag_set->reserved_tags);
|
||||
}
|
||||
|
||||
/**
|
||||
|
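Note on the batched tag allocation in the blk-mq-tag.c hunks above: blk_mq_get_tags() returns an unsigned long together with an offset instead of a single tag number, and the result appears to be a bitmask of claimed tags relative to *offset (the function also adds tags->nr_reserved_tags to the offset). The block below is a minimal userspace sketch of that idea under stated assumptions, not the kernel's implementation: `struct tag_map`, `get_tags_batch()` and `put_tags_batch()` are hypothetical names, one 64-bit word stands in for the sbitmap, and there is no atomicity, no reserved-tag offset and no per-CPU hinting.

```c
#include <assert.h>

/* Hypothetical model: one 64-bit word of tag bits, a set bit means "in use". */
struct tag_map {
	unsigned long long word;
	unsigned int depth;	/* number of valid tags in the word, at most 64 */
};

/*
 * Try to claim up to nr_tags tags in the window that starts at the first
 * free bit. Returns a mask of the tags actually claimed, relative to
 * *offset (bit i set means tag *offset + i was claimed), or 0 if the
 * window does not fit below depth.
 */
static unsigned long long get_tags_batch(struct tag_map *map,
					 unsigned int nr_tags,
					 unsigned int *offset)
{
	unsigned int first = 0;
	unsigned long long window, got;

	assert(map->depth <= 64);
	if (nr_tags == 0 || nr_tags > map->depth)
		return 0;

	/* Locate the first free (zero) bit. */
	while (first < map->depth && ((map->word >> first) & 1ULL))
		first++;
	if (first + nr_tags > map->depth)
		return 0;

	/* Build the window mask without shifting by the full word width. */
	window = (nr_tags == 64) ? ~0ULL : ((1ULL << nr_tags) - 1);

	/* Bits already in use inside the window are skipped, not reclaimed. */
	got = window & ~(map->word >> first);
	map->word |= got << first;
	*offset = first;
	return got;
}

/* Release every tag named by a mask previously returned for this offset. */
static void put_tags_batch(struct tag_map *map, unsigned long long mask,
			   unsigned int offset)
{
	map->word &= ~(mask << offset);
}
```

A caller would walk the set bits of the returned mask and treat `offset + bit` as the tag number. Returning a mask rather than a count lets the allocator hand back fewer tags than requested when some bits inside the window are already taken.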
@@ -2,52 +2,30 @@
|
||||
#ifndef INT_BLK_MQ_TAG_H
|
||||
#define INT_BLK_MQ_TAG_H
|
||||
|
||||
/*
|
||||
* Tag address space map.
|
||||
*/
|
||||
struct blk_mq_tags {
|
||||
unsigned int nr_tags;
|
||||
unsigned int nr_reserved_tags;
|
||||
|
||||
atomic_t active_queues;
|
||||
|
||||
struct sbitmap_queue *bitmap_tags;
|
||||
struct sbitmap_queue *breserved_tags;
|
||||
|
||||
struct sbitmap_queue __bitmap_tags;
|
||||
struct sbitmap_queue __breserved_tags;
|
||||
|
||||
struct request **rqs;
|
||||
struct request **static_rqs;
|
||||
struct list_head page_list;
|
||||
|
||||
/*
|
||||
* used to clear request reference in rqs[] before freeing one
|
||||
* request pool
|
||||
*/
|
||||
spinlock_t lock;
|
||||
};
|
||||
struct blk_mq_alloc_data;
|
||||
|
||||
extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
|
||||
unsigned int reserved_tags,
|
||||
int node, unsigned int flags);
|
||||
extern void blk_mq_free_tags(struct blk_mq_tags *tags, unsigned int flags);
|
||||
int node, int alloc_policy);
|
||||
extern void blk_mq_free_tags(struct blk_mq_tags *tags);
|
||||
extern int blk_mq_init_bitmaps(struct sbitmap_queue *bitmap_tags,
|
||||
struct sbitmap_queue *breserved_tags,
|
||||
unsigned int queue_depth,
|
||||
unsigned int reserved,
|
||||
int node, int alloc_policy);
|
||||
|
||||
extern int blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *set);
|
||||
extern void blk_mq_exit_shared_sbitmap(struct blk_mq_tag_set *set);
|
||||
extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
|
||||
unsigned long blk_mq_get_tags(struct blk_mq_alloc_data *data, int nr_tags,
|
||||
unsigned int *offset);
|
||||
extern void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
|
||||
unsigned int tag);
|
||||
void blk_mq_put_tags(struct blk_mq_tags *tags, int *tag_array, int nr_tags);
|
||||
extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
|
||||
struct blk_mq_tags **tags,
|
||||
unsigned int depth, bool can_grow);
|
||||
extern void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set,
|
||||
extern void blk_mq_tag_resize_shared_tags(struct blk_mq_tag_set *set,
|
||||
unsigned int size);
|
||||
extern void blk_mq_tag_update_sched_shared_tags(struct request_queue *q);
|
||||
|
||||
extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
|
||||
void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
|
||||
|
1032
block/blk-mq.c
File diff suppressed because it is too large
@@ -25,18 +25,14 @@ struct blk_mq_ctx {
|
||||
unsigned short index_hw[HCTX_MAX_TYPES];
|
||||
struct blk_mq_hw_ctx *hctxs[HCTX_MAX_TYPES];
|
||||
|
||||
/* incremented at dispatch time */
|
||||
unsigned long rq_dispatched[2];
|
||||
unsigned long rq_merged;
|
||||
|
||||
/* incremented at completion time */
|
||||
unsigned long ____cacheline_aligned_in_smp rq_completed[2];
|
||||
|
||||
struct request_queue *queue;
|
||||
struct blk_mq_ctxs *ctxs;
|
||||
struct kobject kobj;
|
||||
} ____cacheline_aligned_in_smp;
|
||||
|
||||
void blk_mq_submit_bio(struct bio *bio);
|
||||
int blk_mq_poll(struct request_queue *q, blk_qc_t cookie, struct io_comp_batch *iob,
|
||||
unsigned int flags);
|
||||
void blk_mq_exit_queue(struct request_queue *q);
|
||||
int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
|
||||
void blk_mq_wake_waiters(struct request_queue *q);
|
||||
@@ -54,15 +50,12 @@ void blk_mq_put_rq_ref(struct request *rq);
|
||||
*/
|
||||
void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
|
||||
unsigned int hctx_idx);
|
||||
void blk_mq_free_rq_map(struct blk_mq_tags *tags, unsigned int flags);
|
||||
struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
|
||||
unsigned int hctx_idx,
|
||||
unsigned int nr_tags,
|
||||
unsigned int reserved_tags,
|
||||
unsigned int flags);
|
||||
int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
|
||||
unsigned int hctx_idx, unsigned int depth);
|
||||
|
||||
void blk_mq_free_rq_map(struct blk_mq_tags *tags);
|
||||
struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
|
||||
unsigned int hctx_idx, unsigned int depth);
|
||||
void blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
|
||||
struct blk_mq_tags *tags,
|
||||
unsigned int hctx_idx);
|
||||
/*
|
||||
* Internal helpers for request insertion into sw queues
|
||||
*/
|
||||
@@ -109,9 +102,9 @@ static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q,
|
||||
enum hctx_type type = HCTX_TYPE_DEFAULT;
|
||||
|
||||
/*
|
||||
* The caller ensure that if REQ_HIPRI, poll must be enabled.
|
||||
* The caller ensure that if REQ_POLLED, poll must be enabled.
|
||||
*/
|
||||
if (flags & REQ_HIPRI)
|
||||
if (flags & REQ_POLLED)
|
||||
type = HCTX_TYPE_POLL;
|
||||
else if ((flags & REQ_OP_MASK) == REQ_OP_READ)
|
||||
type = HCTX_TYPE_READ;
|
||||
@@ -128,6 +121,8 @@ extern int __blk_mq_register_dev(struct device *dev, struct request_queue *q);
|
||||
extern int blk_mq_sysfs_register(struct request_queue *q);
|
||||
extern void blk_mq_sysfs_unregister(struct request_queue *q);
|
||||
extern void blk_mq_hctx_kobj_init(struct blk_mq_hw_ctx *hctx);
|
||||
void blk_mq_free_plug_rqs(struct blk_plug *plug);
|
||||
void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule);
|
||||
|
||||
void blk_mq_release(struct request_queue *q);
|
||||
|
||||
@@ -154,23 +149,27 @@ struct blk_mq_alloc_data {
|
||||
blk_mq_req_flags_t flags;
|
||||
unsigned int shallow_depth;
|
||||
unsigned int cmd_flags;
|
||||
unsigned int rq_flags;
|
||||
|
||||
/* allocate multiple requests/tags in one go */
|
||||
unsigned int nr_tags;
|
||||
struct request **cached_rq;
|
||||
|
||||
/* input & output parameter */
|
||||
struct blk_mq_ctx *ctx;
|
||||
struct blk_mq_hw_ctx *hctx;
|
||||
};
|
||||
|
||||
static inline bool blk_mq_is_sbitmap_shared(unsigned int flags)
|
||||
static inline bool blk_mq_is_shared_tags(unsigned int flags)
|
||||
{
|
||||
return flags & BLK_MQ_F_TAG_HCTX_SHARED;
|
||||
}
|
||||
|
||||
static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data *data)
|
||||
{
|
||||
if (data->q->elevator)
|
||||
return data->hctx->sched_tags;
|
||||
|
||||
return data->hctx->tags;
|
||||
if (!(data->rq_flags & RQF_ELV))
|
||||
return data->hctx->tags;
|
||||
return data->hctx->sched_tags;
|
||||
}
|
||||
|
||||
static inline bool blk_mq_hctx_stopped(struct blk_mq_hw_ctx *hctx)
|
||||
@@ -220,24 +219,24 @@ static inline int blk_mq_get_rq_budget_token(struct request *rq)
|
||||
|
||||
static inline void __blk_mq_inc_active_requests(struct blk_mq_hw_ctx *hctx)
|
||||
{
|
||||
if (blk_mq_is_sbitmap_shared(hctx->flags))
|
||||
atomic_inc(&hctx->queue->nr_active_requests_shared_sbitmap);
|
||||
if (blk_mq_is_shared_tags(hctx->flags))
|
||||
atomic_inc(&hctx->queue->nr_active_requests_shared_tags);
|
||||
else
|
||||
atomic_inc(&hctx->nr_active);
|
||||
}
|
||||
|
||||
static inline void __blk_mq_dec_active_requests(struct blk_mq_hw_ctx *hctx)
|
||||
{
|
||||
if (blk_mq_is_sbitmap_shared(hctx->flags))
|
||||
atomic_dec(&hctx->queue->nr_active_requests_shared_sbitmap);
|
||||
if (blk_mq_is_shared_tags(hctx->flags))
|
||||
atomic_dec(&hctx->queue->nr_active_requests_shared_tags);
|
||||
else
|
||||
atomic_dec(&hctx->nr_active);
|
||||
}
|
||||
|
||||
static inline int __blk_mq_active_requests(struct blk_mq_hw_ctx *hctx)
|
||||
{
|
||||
if (blk_mq_is_sbitmap_shared(hctx->flags))
|
||||
return atomic_read(&hctx->queue->nr_active_requests_shared_sbitmap);
|
||||
if (blk_mq_is_shared_tags(hctx->flags))
|
||||
return atomic_read(&hctx->queue->nr_active_requests_shared_tags);
|
||||
return atomic_read(&hctx->nr_active);
|
||||
}
|
||||
static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
|
||||
@@ -260,7 +259,20 @@ static inline void blk_mq_put_driver_tag(struct request *rq)
|
||||
__blk_mq_put_driver_tag(rq->mq_hctx, rq);
|
||||
}
|
||||
|
||||
bool blk_mq_get_driver_tag(struct request *rq);
|
||||
bool __blk_mq_get_driver_tag(struct blk_mq_hw_ctx *hctx, struct request *rq);
|
||||
|
||||
static inline bool blk_mq_get_driver_tag(struct request *rq)
|
||||
{
|
||||
struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
|
||||
|
||||
if (rq->tag != BLK_MQ_NO_TAG &&
|
||||
!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
|
||||
hctx->tags->rqs[rq->tag] = rq;
|
||||
return true;
|
||||
}
|
||||
|
||||
return __blk_mq_get_driver_tag(hctx, rq);
|
||||
}
|
||||
|
||||
static inline void blk_mq_clear_mq_map(struct blk_mq_queue_map *qmap)
|
||||
{
|
||||
@@ -331,19 +343,18 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
|
||||
if (bt->sb.depth == 1)
|
||||
return true;
|
||||
|
||||
if (blk_mq_is_sbitmap_shared(hctx->flags)) {
|
||||
if (blk_mq_is_shared_tags(hctx->flags)) {
|
||||
struct request_queue *q = hctx->queue;
|
||||
struct blk_mq_tag_set *set = q->tag_set;
|
||||
|
||||
if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
|
||||
return true;
|
||||
users = atomic_read(&set->active_queues_shared_sbitmap);
|
||||
} else {
|
||||
if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
|
||||
return true;
|
||||
users = atomic_read(&hctx->tags->active_queues);
|
||||
}
|
||||
|
||||
users = atomic_read(&hctx->tags->active_queues);
|
||||
|
||||
if (!users)
|
||||
return true;
|
||||
|
||||
|
@@ -189,9 +189,10 @@ static inline void rq_qos_throttle(struct request_queue *q, struct bio *bio)
|
||||
* BIO_TRACKED lets controllers know that a bio went through the
|
||||
* normal rq_qos path.
|
||||
*/
|
||||
bio_set_flag(bio, BIO_TRACKED);
|
||||
if (q->rq_qos)
|
||||
if (q->rq_qos) {
|
||||
bio_set_flag(bio, BIO_TRACKED);
|
||||
__rq_qos_throttle(q->rq_qos, bio);
|
||||
}
|
||||
}
|
||||
|
||||
static inline void rq_qos_track(struct request_queue *q, struct request *rq,
|
||||
|
@@ -17,6 +17,7 @@
|
||||
#include "blk-mq.h"
|
||||
#include "blk-mq-debugfs.h"
|
||||
#include "blk-wbt.h"
|
||||
#include "blk-throttle.h"
|
||||
|
||||
struct queue_sysfs_entry {
|
||||
struct attribute attr;
|
||||
@@ -432,26 +433,11 @@ static ssize_t queue_poll_show(struct request_queue *q, char *page)
|
||||
static ssize_t queue_poll_store(struct request_queue *q, const char *page,
|
||||
size_t count)
|
||||
{
|
||||
unsigned long poll_on;
|
||||
ssize_t ret;
|
||||
|
||||
if (!q->tag_set || q->tag_set->nr_maps <= HCTX_TYPE_POLL ||
|
||||
!q->tag_set->map[HCTX_TYPE_POLL].nr_queues)
|
||||
if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
|
||||
return -EINVAL;
|
||||
|
||||
ret = queue_var_store(&poll_on, page, count);
|
||||
if (ret < 0)
|
||||
return ret;
|
||||
|
||||
if (poll_on) {
|
||||
blk_queue_flag_set(QUEUE_FLAG_POLL, q);
|
||||
} else {
|
||||
blk_mq_freeze_queue(q);
|
||||
blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
|
||||
blk_mq_unfreeze_queue(q);
|
||||
}
|
||||
|
||||
return ret;
|
||||
pr_info_ratelimited("writes to the poll attribute are ignored.\n");
|
||||
pr_info_ratelimited("please use driver specific parameters instead.\n");
|
||||
return count;
|
||||
}
|
||||
|
||||
static ssize_t queue_io_timeout_show(struct request_queue *q, char *page)
|
||||
@@ -887,16 +873,15 @@ int blk_register_queue(struct gendisk *disk)
|
||||
}
|
||||
|
||||
mutex_lock(&q->sysfs_lock);
|
||||
|
||||
ret = disk_register_independent_access_ranges(disk, NULL);
|
||||
if (ret)
|
||||
goto put_dev;
|
||||
|
||||
if (q->elevator) {
|
||||
ret = elv_register_queue(q, false);
|
||||
if (ret) {
|
||||
mutex_unlock(&q->sysfs_lock);
|
||||
mutex_unlock(&q->sysfs_dir_lock);
|
||||
kobject_del(&q->kobj);
|
||||
blk_trace_remove_sysfs(dev);
|
||||
kobject_put(&dev->kobj);
|
||||
return ret;
|
||||
}
|
||||
if (ret)
|
||||
goto put_dev;
|
||||
}
|
||||
|
||||
blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q);
|
||||
@@ -927,6 +912,16 @@ unlock:
|
||||
percpu_ref_switch_to_percpu(&q->q_usage_counter);
|
||||
}
|
||||
|
||||
return ret;
|
||||
|
||||
put_dev:
|
||||
disk_unregister_independent_access_ranges(disk);
|
||||
mutex_unlock(&q->sysfs_lock);
|
||||
mutex_unlock(&q->sysfs_dir_lock);
|
||||
kobject_del(&q->kobj);
|
||||
blk_trace_remove_sysfs(dev);
|
||||
kobject_put(&dev->kobj);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
@@ -972,6 +967,7 @@ void blk_unregister_queue(struct gendisk *disk)
|
||||
mutex_lock(&q->sysfs_lock);
|
||||
if (q->elevator)
|
||||
elv_unregister_queue(q);
|
||||
disk_unregister_independent_access_ranges(disk);
|
||||
mutex_unlock(&q->sysfs_lock);
|
||||
mutex_unlock(&q->sysfs_dir_lock);
|
||||
|
||||
|
@@ -13,6 +13,7 @@
|
||||
#include <linux/blk-cgroup.h>
|
||||
#include "blk.h"
|
||||
#include "blk-cgroup-rwstat.h"
|
||||
#include "blk-throttle.h"
|
||||
|
||||
/* Max dispatch from a group in 1 round */
|
||||
#define THROTL_GRP_QUANTUM 8
|
||||
@@ -37,60 +38,9 @@
|
||||
*/
|
||||
#define LATENCY_FILTERED_HD (1000L) /* 1ms */
|
||||
|
||||
static struct blkcg_policy blkcg_policy_throtl;
|
||||
|
||||
/* A workqueue to queue throttle related work */
|
||||
static struct workqueue_struct *kthrotld_workqueue;
|
||||
|
||||
/*
|
||||
* To implement hierarchical throttling, throtl_grps form a tree and bios
|
||||
* are dispatched upwards level by level until they reach the top and get
|
||||
* issued. When dispatching bios from the children and local group at each
|
||||
* level, if the bios are dispatched into a single bio_list, there's a risk
|
||||
* of a local or child group which can queue many bios at once filling up
|
||||
* the list starving others.
|
||||
*
|
||||
* To avoid such starvation, dispatched bios are queued separately
|
||||
* according to where they came from. When they are again dispatched to
|
||||
* the parent, they're popped in round-robin order so that no single source
|
||||
* hogs the dispatch window.
|
||||
*
|
||||
* throtl_qnode is used to keep the queued bios separated by their sources.
|
||||
* Bios are queued to throtl_qnode which in turn is queued to
|
||||
* throtl_service_queue and then dispatched in round-robin order.
|
||||
*
|
||||
* It's also used to track the reference counts on blkg's. A qnode always
|
||||
* belongs to a throtl_grp and gets queued on itself or the parent, so
|
||||
* incrementing the reference of the associated throtl_grp when a qnode is
|
||||
* queued and decrementing when dequeued is enough to keep the whole blkg
|
||||
* tree pinned while bios are in flight.
|
||||
*/
|
||||
struct throtl_qnode {
|
||||
struct list_head node; /* service_queue->queued[] */
|
||||
struct bio_list bios; /* queued bios */
|
||||
struct throtl_grp *tg; /* tg this qnode belongs to */
|
||||
};
|
||||
|
||||
struct throtl_service_queue {
|
||||
struct throtl_service_queue *parent_sq; /* the parent service_queue */
|
||||
|
||||
/*
|
||||
* Bios queued directly to this service_queue or dispatched from
|
||||
* children throtl_grp's.
|
||||
*/
|
||||
struct list_head queued[2]; /* throtl_qnode [READ/WRITE] */
|
||||
unsigned int nr_queued[2]; /* number of queued bios */
|
||||
|
||||
/*
|
||||
* RB tree of active children throtl_grp's, which are sorted by
|
||||
* their ->disptime.
|
||||
*/
|
||||
struct rb_root_cached pending_tree; /* RB tree of active tgs */
|
||||
unsigned int nr_pending; /* # queued in the tree */
|
||||
unsigned long first_pending_disptime; /* disptime of the first tg */
|
||||
struct timer_list pending_timer; /* fires on first_pending_disptime */
|
||||
};
|
||||
|
||||
enum tg_state_flags {
|
||||
THROTL_TG_PENDING = 1 << 0, /* on parent's pending tree */
|
||||
THROTL_TG_WAS_EMPTY = 1 << 1, /* bio_lists[] became non-empty */
|
||||
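Note on the throtl_qnode comment in the hunk above: it describes keeping queued bios separated by source and popping them in round-robin order so that no single source hogs the dispatch window. The block below is a minimal userspace sketch of that policy only, under stated assumptions: `struct source_queue` and `pop_round_robin()` are hypothetical names, plain integers stand in for bios, fixed arrays stand in for the kernel's lists, and the blkg reference counting the comment mentions is left out.

```c
#include <stddef.h>

#define MAX_ITEMS 8

/* Hypothetical stand-in for a throtl_qnode: one FIFO of items per source. */
struct source_queue {
	int items[MAX_ITEMS];
	size_t head;	/* index of the next item to pop */
	size_t count;	/* number of items stored in items[] */
};

/*
 * Pop one item from the first non-empty source at or after *cursor, then
 * advance *cursor past that source so the next call starts with its
 * neighbour. Returns 1 and stores the item in *out on success, or 0 when
 * every source is empty.
 */
static int pop_round_robin(struct source_queue *srcs, size_t nr_srcs,
			   size_t *cursor, int *out)
{
	for (size_t tried = 0; tried < nr_srcs; tried++) {
		size_t idx = (*cursor + tried) % nr_srcs;
		struct source_queue *s = &srcs[idx];

		if (s->head < s->count) {
			*out = s->items[s->head++];
			*cursor = (idx + 1) % nr_srcs;
			return 1;
		}
	}
	return 0;
}
```

With one source holding three items and another holding one, the single item is served second rather than last. That is the starvation-avoidance property the comment is after: a source that queues many items at once cannot push the others to the back.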
@@ -98,93 +48,6 @@ enum tg_state_flags {
|
||||
|
||||
#define rb_entry_tg(node) rb_entry((node), struct throtl_grp, rb_node)
|
||||
|
||||
enum {
|
||||
LIMIT_LOW,
|
||||
LIMIT_MAX,
|
||||
LIMIT_CNT,
|
||||
};
|
||||
|
||||
struct throtl_grp {
|
||||
/* must be the first member */
|
||||
struct blkg_policy_data pd;
|
||||
|
||||
/* active throtl group service_queue member */
|
||||
struct rb_node rb_node;
|
||||
|
||||
/* throtl_data this group belongs to */
|
||||
struct throtl_data *td;
|
||||
|
||||
/* this group's service queue */
|
||||
struct throtl_service_queue service_queue;
|
||||
|
||||
/*
|
||||
* qnode_on_self is used when bios are directly queued to this
|
||||
* throtl_grp so that local bios compete fairly with bios
|
||||
* dispatched from children. qnode_on_parent is used when bios are
|
||||
* dispatched from this throtl_grp into its parent and will compete
|
||||
* with the sibling qnode_on_parents and the parent's
|
||||
* qnode_on_self.
|
||||
*/
|
||||
struct throtl_qnode qnode_on_self[2];
|
||||
struct throtl_qnode qnode_on_parent[2];
|
||||
|
||||
/*
|
||||
* Dispatch time in jiffies. This is the estimated time when group
|
||||
* will unthrottle and is ready to dispatch more bio. It is used as
|
||||
* key to sort active groups in service tree.
|
||||
*/
|
||||
unsigned long disptime;
|
||||
|
||||
unsigned int flags;
|
||||
|
||||
/* are there any throtl rules between this group and td? */
|
||||
bool has_rules[2];
|
||||
|
||||
/* internally used bytes per second rate limits */
|
||||
uint64_t bps[2][LIMIT_CNT];
|
||||
/* user configured bps limits */
|
||||
uint64_t bps_conf[2][LIMIT_CNT];
|
||||
|
||||
/* internally used IOPS limits */
|
||||
unsigned int iops[2][LIMIT_CNT];
|
||||
/* user configured IOPS limits */
|
||||
unsigned int iops_conf[2][LIMIT_CNT];
|
||||
|
||||
/* Number of bytes dispatched in current slice */
|
||||
uint64_t bytes_disp[2];
|
||||
/* Number of bio's dispatched in current slice */
|
||||
unsigned int io_disp[2];
|
||||
|
||||
unsigned long last_low_overflow_time[2];
|
||||
|
||||
uint64_t last_bytes_disp[2];
|
||||
unsigned int last_io_disp[2];
|
||||
|
||||
unsigned long last_check_time;
|
||||
|
||||
unsigned long latency_target; /* us */
|
||||
unsigned long latency_target_conf; /* us */
|
||||
/* When did we start a new slice */
|
||||
unsigned long slice_start[2];
|
||||
unsigned long slice_end[2];
|
||||
|
||||
unsigned long last_finish_time; /* ns / 1024 */
|
||||
unsigned long checked_last_finish_time; /* ns / 1024 */
|
||||
unsigned long avg_idletime; /* ns / 1024 */
|
||||
unsigned long idletime_threshold; /* us */
|
||||
unsigned long idletime_threshold_conf; /* us */
|
||||
|
||||
unsigned int bio_cnt; /* total bios */
|
||||
unsigned int bad_bio_cnt; /* bios exceeding latency threshold */
|
||||
unsigned long bio_cnt_reset_time;
|
||||
|
||||
atomic_t io_split_cnt[2];
|
||||
atomic_t last_io_split_cnt[2];
|
||||
|
||||
struct blkg_rwstat stat_bytes;
|
||||
struct blkg_rwstat stat_ios;
|
||||
};
|
||||
|
||||
/* We measure latency for request size from <= 4k to >= 1M */
|
||||
#define LATENCY_BUCKET_SIZE 9
|
||||
|
||||
@@ -231,16 +94,6 @@ struct throtl_data
|
||||
|
||||
static void throtl_pending_timer_fn(struct timer_list *t);
|
||||
|
||||
static inline struct throtl_grp *pd_to_tg(struct blkg_policy_data *pd)
|
||||
{
|
||||
return pd ? container_of(pd, struct throtl_grp, pd) : NULL;
|
||||
}
|
||||
|
||||
static inline struct throtl_grp *blkg_to_tg(struct blkcg_gq *blkg)
|
||||
{
|
||||
return pd_to_tg(blkg_to_pd(blkg, &blkcg_policy_throtl));
|
||||
}
|
||||
|
||||
static inline struct blkcg_gq *tg_to_blkg(struct throtl_grp *tg)
|
||||
{
|
||||
return pd_to_blkg(&tg->pd);
|
||||
@@ -1794,7 +1647,7 @@ static void throtl_shutdown_wq(struct request_queue *q)
|
||||
cancel_work_sync(&td->dispatch_work);
|
||||
}
|
||||
|
||||
static struct blkcg_policy blkcg_policy_throtl = {
|
||||
struct blkcg_policy blkcg_policy_throtl = {
|
||||
.dfl_cftypes = throtl_files,
|
||||
.legacy_cftypes = throtl_legacy_files,
|
||||
|
||||
@@ -2208,9 +2061,9 @@ void blk_throtl_charge_bio_split(struct bio *bio)
|
||||
} while (parent);
|
||||
}
|
||||
|
||||
bool blk_throtl_bio(struct bio *bio)
|
||||
bool __blk_throtl_bio(struct bio *bio)
|
||||
{
|
||||
struct request_queue *q = bio->bi_bdev->bd_disk->queue;
|
||||
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
|
||||
struct blkcg_gq *blkg = bio->bi_blkg;
|
||||
struct throtl_qnode *qn = NULL;
|
||||
struct throtl_grp *tg = blkg_to_tg(blkg);
|
||||
@@ -2221,19 +2074,12 @@ bool blk_throtl_bio(struct bio *bio)
|
||||
|
||||
rcu_read_lock();
|
||||
|
||||
/* see throtl_charge_bio() */
|
||||
if (bio_flagged(bio, BIO_THROTTLED))
|
||||
goto out;
|
||||
|
||||
if (!cgroup_subsys_on_dfl(io_cgrp_subsys)) {
|
||||
blkg_rwstat_add(&tg->stat_bytes, bio->bi_opf,
|
||||
bio->bi_iter.bi_size);
|
||||
blkg_rwstat_add(&tg->stat_ios, bio->bi_opf, 1);
|
||||
}
|
||||
|
||||
if (!tg->has_rules[rw])
|
||||
goto out;
|
||||
|
||||
spin_lock_irq(&q->queue_lock);
|
||||
|
||||
throtl_update_latency_buckets(td);
|
||||
@@ -2317,7 +2163,6 @@ again:
|
||||
|
||||
out_unlock:
|
||||
spin_unlock_irq(&q->queue_lock);
|
||||
out:
|
||||
bio_set_flag(bio, BIO_THROTTLED);
|
||||
|
||||
#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
|
||||
|
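The rename of blk_throtl_bio() to __blk_throtl_bio() in this file pairs with an inline wrapper in the new header that tests the cheap conditions before paying for a function call. The user-space sketch below illustrates that split under stated assumptions: `struct fake_bio`, `throttle_bio()`, `throttle_bio_slow()` and `slow_path_calls` are hypothetical names for illustration, not kernel APIs, and the slow path is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a bio; not the kernel structure. */
struct fake_bio {
	bool throttled;	/* models the BIO_THROTTLED flag */
	bool has_rules;	/* models tg->has_rules[rw] */
};

static int slow_path_calls;	/* counts entries into the slow path */

/* Out-of-line slow path; the kernel version takes the queue lock here. */
static bool throttle_bio_slow(struct fake_bio *bio)
{
	/* The inline wrapper guarantees both preconditions. */
	assert(!bio->throttled && bio->has_rules);
	slow_path_calls++;
	bio->throttled = true;	/* simplification: mark the bio as seen */
	return true;		/* pretend the bio was queued for later dispatch */
}

/* Inline fast path: return early when there is nothing to throttle. */
static inline bool throttle_bio(struct fake_bio *bio)
{
	if (bio->throttled)
		return false;
	if (!bio->has_rules)
		return false;
	return throttle_bio_slow(bio);
}
```

With this shape, callers whose group has no rules never leave the inlined checks, which is presumably the motivation for moving them into the header.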
182
block/blk-throttle.h
Normal file
@@ -0,0 +1,182 @@
|
||||
#ifndef BLK_THROTTLE_H
|
||||
#define BLK_THROTTLE_H
|
||||
|
||||
#include "blk-cgroup-rwstat.h"
|
||||
|
||||
/*
 * To implement hierarchical throttling, throtl_grps form a tree and bios
 * are dispatched upwards level by level until they reach the top and get
 * issued. When dispatching bios from the children and local group at each
 * level, if the bios are dispatched into a single bio_list, there's a risk
 * of a local or child group which can queue many bios at once filling up
 * the list starving others.
 *
 * To avoid such starvation, dispatched bios are queued separately
 * according to where they came from. When they are again dispatched to
 * the parent, they're popped in round-robin order so that no single source
 * hogs the dispatch window.
 *
 * throtl_qnode is used to keep the queued bios separated by their sources.
 * Bios are queued to throtl_qnode which in turn is queued to
 * throtl_service_queue and then dispatched in round-robin order.
 *
 * It's also used to track the reference counts on blkg's. A qnode always
 * belongs to a throtl_grp and gets queued on itself or the parent, so
 * incrementing the reference of the associated throtl_grp when a qnode is
 * queued and decrementing when dequeued is enough to keep the whole blkg
 * tree pinned while bios are in flight.
 */
|
||||
struct throtl_qnode {
|
||||
struct list_head node; /* service_queue->queued[] */
|
||||
struct bio_list bios; /* queued bios */
|
||||
struct throtl_grp *tg; /* tg this qnode belongs to */
|
||||
};
|
||||
|
||||
struct throtl_service_queue {
|
||||
struct throtl_service_queue *parent_sq; /* the parent service_queue */
|
||||
|
||||
/*
|
||||
* Bios queued directly to this service_queue or dispatched from
|
||||
* children throtl_grp's.
|
||||
*/
|
||||
struct list_head queued[2]; /* throtl_qnode [READ/WRITE] */
|
||||
unsigned int nr_queued[2]; /* number of queued bios */
|
||||
|
||||
/*
|
||||
* RB tree of active children throtl_grp's, which are sorted by
|
||||
* their ->disptime.
|
||||
*/
|
||||
struct rb_root_cached pending_tree; /* RB tree of active tgs */
|
||||
unsigned int nr_pending; /* # queued in the tree */
|
||||
unsigned long first_pending_disptime; /* disptime of the first tg */
|
||||
struct timer_list pending_timer; /* fires on first_pending_disptime */
|
||||
};
|
||||
|
||||
enum {
|
||||
LIMIT_LOW,
|
||||
LIMIT_MAX,
|
||||
LIMIT_CNT,
|
||||
};
|
||||
|
||||
struct throtl_grp {
|
||||
/* must be the first member */
|
||||
struct blkg_policy_data pd;
|
||||
|
||||
/* active throtl group service_queue member */
|
||||
struct rb_node rb_node;
|
||||
|
||||
/* throtl_data this group belongs to */
|
||||
struct throtl_data *td;
|
||||
|
||||
/* this group's service queue */
|
||||
struct throtl_service_queue service_queue;
|
||||
|
||||
/*
|
||||
* qnode_on_self is used when bios are directly queued to this
|
||||
* throtl_grp so that local bios compete fairly with bios
|
||||
* dispatched from children. qnode_on_parent is used when bios are
|
||||
* dispatched from this throtl_grp into its parent and will compete
|
||||
* with the sibling qnode_on_parents and the parent's
|
||||
* qnode_on_self.
|
||||
*/
|
||||
struct throtl_qnode qnode_on_self[2];
|
||||
struct throtl_qnode qnode_on_parent[2];
|
||||
|
||||
/*
|
||||
* Dispatch time in jiffies. This is the estimated time when group
|
||||
* will unthrottle and is ready to dispatch more bio. It is used as
|
||||
* key to sort active groups in service tree.
|
||||
*/
|
||||
unsigned long disptime;
|
||||
|
||||
unsigned int flags;
|
||||
|
||||
/* are there any throtl rules between this group and td? */
|
||||
bool has_rules[2];
|
||||
|
||||
/* internally used bytes per second rate limits */
|
||||
uint64_t bps[2][LIMIT_CNT];
|
||||
/* user configured bps limits */
|
||||
uint64_t bps_conf[2][LIMIT_CNT];
|
||||
|
||||
/* internally used IOPS limits */
|
||||
unsigned int iops[2][LIMIT_CNT];
|
||||
/* user configured IOPS limits */
|
||||
unsigned int iops_conf[2][LIMIT_CNT];
|
||||
|
||||
/* Number of bytes dispatched in current slice */
|
||||
uint64_t bytes_disp[2];
|
||||
/* Number of bio's dispatched in current slice */
|
||||
unsigned int io_disp[2];
|
||||
|
||||
unsigned long last_low_overflow_time[2];
|
||||
|
||||
uint64_t last_bytes_disp[2];
|
||||
unsigned int last_io_disp[2];
|
||||
|
||||
unsigned long last_check_time;
|
||||
|
||||
unsigned long latency_target; /* us */
|
||||
unsigned long latency_target_conf; /* us */
|
||||
/* When did we start a new slice */
|
||||
unsigned long slice_start[2];
|
||||
unsigned long slice_end[2];
|
||||
|
||||
unsigned long last_finish_time; /* ns / 1024 */
|
||||
unsigned long checked_last_finish_time; /* ns / 1024 */
|
||||
unsigned long avg_idletime; /* ns / 1024 */
|
||||
unsigned long idletime_threshold; /* us */
|
||||
unsigned long idletime_threshold_conf; /* us */
|
||||
|
||||
unsigned int bio_cnt; /* total bios */
|
||||
unsigned int bad_bio_cnt; /* bios exceeding latency threshold */
|
||||
unsigned long bio_cnt_reset_time;
|
||||
|
||||
atomic_t io_split_cnt[2];
|
||||
atomic_t last_io_split_cnt[2];
|
||||
|
||||
struct blkg_rwstat stat_bytes;
|
||||
struct blkg_rwstat stat_ios;
|
||||
};
|
||||
|
||||
extern struct blkcg_policy blkcg_policy_throtl;
|
||||
|
||||
static inline struct throtl_grp *pd_to_tg(struct blkg_policy_data *pd)
|
||||
{
|
||||
return pd ? container_of(pd, struct throtl_grp, pd) : NULL;
|
||||
}
|
||||
|
||||
static inline struct throtl_grp *blkg_to_tg(struct blkcg_gq *blkg)
|
||||
{
|
||||
return pd_to_tg(blkg_to_pd(blkg, &blkcg_policy_throtl));
|
||||
}
|
||||
|
||||
/*
|
||||
* Internal throttling interface
|
||||
*/
|
||||
#ifndef CONFIG_BLK_DEV_THROTTLING
|
||||
static inline int blk_throtl_init(struct request_queue *q) { return 0; }
|
||||
static inline void blk_throtl_exit(struct request_queue *q) { }
|
||||
static inline void blk_throtl_register_queue(struct request_queue *q) { }
|
||||
static inline void blk_throtl_charge_bio_split(struct bio *bio) { }
|
||||
static inline bool blk_throtl_bio(struct bio *bio) { return false; }
|
||||
#else /* CONFIG_BLK_DEV_THROTTLING */
|
||||
int blk_throtl_init(struct request_queue *q);
|
||||
void blk_throtl_exit(struct request_queue *q);
|
||||
void blk_throtl_register_queue(struct request_queue *q);
|
||||
void blk_throtl_charge_bio_split(struct bio *bio);
|
||||
bool __blk_throtl_bio(struct bio *bio);
|
||||
static inline bool blk_throtl_bio(struct bio *bio)
{
	struct throtl_grp *tg = blkg_to_tg(bio->bi_blkg);

	if (bio_flagged(bio, BIO_THROTTLED))
		return false;
	if (!tg->has_rules[bio_data_dir(bio)])
		return false;

	return __blk_throtl_bio(bio);
}
|
||||
#endif /* CONFIG_BLK_DEV_THROTTLING */
|
||||
|
||||
#endif
|
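The header comment in block/blk-throttle.h describes keeping bios separated by source and popping them in round-robin order so that no single source hogs the dispatch window. The sketch below shows that idea in user space only. It is a simplification under stated assumptions: the kernel keeps qnodes on a list and rotates them, whereas this version uses a fixed array and a cursor, and `struct source_queue`, `sq_push()` and `dispatch_round_robin()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SOURCES 3	/* hypothetical: local group plus two children */
#define MAX_ITEMS  8

/* Hypothetical per-source FIFO standing in for one qnode's bio list. */
struct source_queue {
	int items[MAX_ITEMS];
	size_t head, tail;
};

static int sq_empty(const struct source_queue *q)
{
	return q->head == q->tail;
}

static void sq_push(struct source_queue *q, int v)
{
	assert(q->tail < MAX_ITEMS);
	q->items[q->tail++] = v;
}

/*
 * Pop at most one item per source per turn, starting at *cursor.
 * Stops when the budget is used up or every source is empty.
 * Returns the number of items written to out.
 */
static size_t dispatch_round_robin(struct source_queue *src, size_t *cursor,
				   int *out, size_t budget)
{
	size_t n = 0, idle = 0;

	while (n < budget && idle < NR_SOURCES) {
		struct source_queue *q = &src[*cursor];

		*cursor = (*cursor + 1) % NR_SOURCES;
		if (sq_empty(q)) {
			idle++;		/* skip sources with nothing queued */
			continue;
		}
		idle = 0;
		out[n++] = q->items[q->head++];
	}
	return n;
}
```

A source with many queued items still gets only one slot per pass, so a busy child cannot starve its siblings or the local group.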
@@ -357,6 +357,9 @@ static void wb_timer_fn(struct blk_stat_callback *cb)
|
||||
unsigned int inflight = wbt_inflight(rwb);
|
||||
int status;
|
||||
|
||||
if (!rwb->rqos.q->disk)
|
||||
return;
|
||||
|
||||
status = latency_exceeded(rwb, cb->stat);
|
||||
|
||||
trace_wbt_timer(rwb->rqos.q->disk->bdi, status, rqd->scale_step,
|
||||
|
131
block/blk.h
@@ -12,6 +12,8 @@
|
||||
#include "blk-mq.h"
|
||||
#include "blk-mq-sched.h"
|
||||
|
||||
struct elevator_type;
|
||||
|
||||
/* Max future timer expiry for timeouts */
|
||||
#define BLK_MAX_TIMEOUT (5 * HZ)
|
||||
|
||||
@@ -94,6 +96,44 @@ static inline bool bvec_gap_to_prev(struct request_queue *q,
|
||||
return __bvec_gap_to_prev(q, bprv, offset);
|
||||
}
|
||||
|
||||
static inline bool rq_mergeable(struct request *rq)
|
||||
{
|
||||
if (blk_rq_is_passthrough(rq))
|
||||
return false;
|
||||
|
||||
if (req_op(rq) == REQ_OP_FLUSH)
|
||||
return false;
|
||||
|
||||
if (req_op(rq) == REQ_OP_WRITE_ZEROES)
|
||||
return false;
|
||||
|
||||
if (req_op(rq) == REQ_OP_ZONE_APPEND)
|
||||
return false;
|
||||
|
||||
if (rq->cmd_flags & REQ_NOMERGE_FLAGS)
|
||||
return false;
|
||||
if (rq->rq_flags & RQF_NOMERGE_FLAGS)
|
||||
return false;
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
/*
|
||||
* There are two different ways to handle DISCARD merges:
|
||||
* 1) If max_discard_segments > 1, the driver treats every bio as a range and
|
||||
* send the bios to controller together. The ranges don't need to be
|
||||
* contiguous.
|
||||
* 2) Otherwise, the request will be normal read/write requests. The ranges
|
||||
* need to be contiguous.
|
||||
*/
|
||||
static inline bool blk_discard_mergable(struct request *req)
|
||||
{
|
||||
if (req_op(req) == REQ_OP_DISCARD &&
|
||||
queue_max_discard_segments(req->q) > 1)
|
||||
return true;
|
||||
return false;
|
||||
}
|
||||
|
||||
#ifdef CONFIG_BLK_DEV_INTEGRITY
|
||||
void blk_flush_integrity(void);
|
||||
bool __bio_integrity_endio(struct bio *);
|
||||
@@ -175,21 +215,28 @@ static inline void blk_integrity_del(struct gendisk *disk)
|
||||
|
||||
unsigned long blk_rq_timeout(unsigned long timeout);
|
||||
void blk_add_timer(struct request *req);
|
||||
void blk_print_req_error(struct request *req, blk_status_t status);
|
||||
|
||||
bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
|
||||
unsigned int nr_segs, struct request **same_queue_rq);
|
||||
unsigned int nr_segs, bool *same_queue_rq);
|
||||
bool blk_bio_list_merge(struct request_queue *q, struct list_head *list,
|
||||
struct bio *bio, unsigned int nr_segs);
|
||||
|
||||
void blk_account_io_start(struct request *req);
|
||||
void blk_account_io_done(struct request *req, u64 now);
|
||||
void __blk_account_io_start(struct request *req);
|
||||
void __blk_account_io_done(struct request *req, u64 now);
|
||||
|
||||
/*
|
||||
* Plug flush limits
|
||||
*/
|
||||
#define BLK_MAX_REQUEST_COUNT 32
|
||||
#define BLK_PLUG_FLUSH_SIZE (128 * 1024)
|
||||
|
||||
/*
|
||||
* Internal elevator interface
|
||||
*/
|
||||
#define ELV_ON_HASH(rq) ((rq)->rq_flags & RQF_HASHED)
|
||||
|
||||
void blk_insert_flush(struct request *rq);
|
||||
bool blk_insert_flush(struct request *rq);
|
||||
|
||||
int elevator_switch_mq(struct request_queue *q,
|
||||
struct elevator_type *new_e);
|
||||
@@ -202,7 +249,7 @@ static inline void elevator_exit(struct request_queue *q,
|
||||
{
|
||||
lockdep_assert_held(&q->sysfs_lock);
|
||||
|
||||
blk_mq_sched_free_requests(q);
|
||||
blk_mq_sched_free_rqs(q);
|
||||
__elevator_exit(q, e);
|
||||
}
|
||||
|
||||
@@ -220,7 +267,32 @@ ssize_t part_timeout_show(struct device *, struct device_attribute *, char *);
|
||||
ssize_t part_timeout_store(struct device *, struct device_attribute *,
|
||||
const char *, size_t);
|
||||
|
||||
void __blk_queue_split(struct bio **bio, unsigned int *nr_segs);
|
||||
static inline bool blk_may_split(struct request_queue *q, struct bio *bio)
{
	switch (bio_op(bio)) {
	case REQ_OP_DISCARD:
	case REQ_OP_SECURE_ERASE:
	case REQ_OP_WRITE_ZEROES:
	case REQ_OP_WRITE_SAME:
		return true; /* non-trivial splitting decisions */
	default:
		break;
	}

	/*
	 * All drivers must accept single-segments bios that are <= PAGE_SIZE.
	 * This is a quick and dirty check that relies on the fact that
	 * bi_io_vec[0] is always valid if a bio has data. The check might
	 * lead to occasional false negatives when bios are cloned, but compared
	 * to the performance impact of cloned bios themselves the loop below
	 * doesn't matter anyway.
	 */
	return q->limits.chunk_sectors || bio->bi_vcnt != 1 ||
		bio->bi_io_vec->bv_len + bio->bi_io_vec->bv_offset > PAGE_SIZE;
}
|
||||
|
||||
void __blk_queue_split(struct request_queue *q, struct bio **bio,
|
||||
unsigned int *nr_segs);
|
||||
int ll_back_merge_fn(struct request *req, struct bio *bio,
|
||||
unsigned int nr_segs);
|
||||
bool blk_attempt_req_merge(struct request_queue *q, struct request *rq,
|
||||
@@ -240,7 +312,25 @@ int blk_dev_init(void);
|
||||
*/
|
||||
static inline bool blk_do_io_stat(struct request *rq)
|
||||
{
|
||||
return rq->rq_disk && (rq->rq_flags & RQF_IO_STAT);
|
||||
return (rq->rq_flags & RQF_IO_STAT) && rq->rq_disk;
|
||||
}
|
||||
|
||||
static inline void blk_account_io_done(struct request *req, u64 now)
|
||||
{
|
||||
/*
|
||||
* Account IO completion. flush_rq isn't accounted as a
|
||||
* normal IO on queueing nor completion. Accounting the
|
||||
* containing request is enough.
|
||||
*/
|
||||
if (blk_do_io_stat(req) && req->part &&
|
||||
!(req->rq_flags & RQF_FLUSH_SEQ))
|
||||
__blk_account_io_done(req, now);
|
||||
}
|
||||
|
||||
static inline void blk_account_io_start(struct request *req)
|
||||
{
|
||||
if (blk_do_io_stat(req))
|
||||
__blk_account_io_start(req);
|
||||
}
|
||||
|
||||
static inline void req_set_nomerge(struct request_queue *q, struct request *req)
|
||||
@@ -285,22 +375,6 @@ void ioc_clear_queue(struct request_queue *q);
|
||||
|
||||
int create_task_io_context(struct task_struct *task, gfp_t gfp_mask, int node);
|
||||
|
||||
/*
|
||||
* Internal throttling interface
|
||||
*/
|
||||
#ifdef CONFIG_BLK_DEV_THROTTLING
|
||||
extern int blk_throtl_init(struct request_queue *q);
|
||||
extern void blk_throtl_exit(struct request_queue *q);
|
||||
extern void blk_throtl_register_queue(struct request_queue *q);
|
||||
extern void blk_throtl_charge_bio_split(struct bio *bio);
|
||||
bool blk_throtl_bio(struct bio *bio);
|
||||
#else /* CONFIG_BLK_DEV_THROTTLING */
|
||||
static inline int blk_throtl_init(struct request_queue *q) { return 0; }
|
||||
static inline void blk_throtl_exit(struct request_queue *q) { }
|
||||
static inline void blk_throtl_register_queue(struct request_queue *q) { }
|
||||
static inline void blk_throtl_charge_bio_split(struct bio *bio) { }
|
||||
static inline bool blk_throtl_bio(struct bio *bio) { return false; }
|
||||
#endif /* CONFIG_BLK_DEV_THROTTLING */
|
||||
#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
|
||||
extern ssize_t blk_throtl_sample_time_show(struct request_queue *q, char *page);
|
||||
extern ssize_t blk_throtl_sample_time_store(struct request_queue *q,
|
||||
@@ -368,13 +442,20 @@ extern struct device_attribute dev_attr_events;
|
||||
extern struct device_attribute dev_attr_events_async;
|
||||
extern struct device_attribute dev_attr_events_poll_msecs;
|
||||
|
||||
static inline void bio_clear_hipri(struct bio *bio)
|
||||
static inline void bio_clear_polled(struct bio *bio)
|
||||
{
|
||||
/* can't support alloc cache if we turn off polling */
|
||||
bio_clear_flag(bio, BIO_PERCPU_CACHE);
|
||||
bio->bi_opf &= ~REQ_HIPRI;
|
||||
bio->bi_opf &= ~REQ_POLLED;
|
||||
}
|
||||
|
||||
long blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg);
|
||||
long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg);
|
||||
|
||||
extern const struct address_space_operations def_blk_aops;
|
||||
|
||||
int disk_register_independent_access_ranges(struct gendisk *disk,
|
||||
struct blk_independent_access_ranges *new_iars);
|
||||
void disk_unregister_independent_access_ranges(struct gendisk *disk);
|
||||
|
||||
#endif /* BLK_INTERNAL_H */
|
||||
|
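The blk_may_split() helper added to block/blk.h above ends with a cheap test: a bio can skip the splitting logic only if the queue has no chunk_sectors limit and the bio is a single segment that fits within one page. The sketch below isolates that final expression so it can be exercised in user space. It deliberately omits the switch over discard-like operations, and `struct bio_shape`, `may_split()` and `SKETCH_PAGE_SIZE` (assumed 4 KiB) are hypothetical names, not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical flattened view of the fields the check reads. */
struct bio_shape {
	unsigned int chunk_sectors;	/* models q->limits.chunk_sectors */
	unsigned int vcnt;		/* models bio->bi_vcnt */
	unsigned int bv_len;		/* first segment length in bytes */
	unsigned int bv_offset;		/* first segment offset in its page */
};

/* True when the bio has to go through the full splitting logic. */
static bool may_split(const struct bio_shape *b)
{
	/* Sketch precondition: the offset lies within a single page. */
	assert(b->bv_offset < SKETCH_PAGE_SIZE);

	return b->chunk_sectors || b->vcnt != 1 ||
	       b->bv_len + b->bv_offset > SKETCH_PAGE_SIZE;
}
```

A page-aligned 4 KiB segment passes the fast path, while the same length starting 512 bytes into a page crosses a page boundary and does not.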
@@ -14,6 +14,7 @@
|
||||
#include <linux/pagemap.h>
|
||||
#include <linux/mempool.h>
|
||||
#include <linux/blkdev.h>
|
||||
#include <linux/blk-cgroup.h>
|
||||
#include <linux/backing-dev.h>
|
||||
#include <linux/init.h>
|
||||
#include <linux/hash.h>
|
||||
|
@@ -26,7 +26,6 @@
|
||||
#include <linux/kernel.h>
|
||||
#include <linux/fs.h>
|
||||
#include <linux/blkdev.h>
|
||||
#include <linux/elevator.h>
|
||||
#include <linux/bio.h>
|
||||
#include <linux/module.h>
|
||||
#include <linux/slab.h>
|
||||
@@ -40,6 +39,7 @@
|
||||
|
||||
#include <trace/events/block.h>
|
||||
|
||||
#include "elevator.h"
|
||||
#include "blk.h"
|
||||
#include "blk-mq-sched.h"
|
||||
#include "blk-pm.h"
|
||||
@@ -637,7 +637,7 @@ static struct elevator_type *elevator_get_default(struct request_queue *q)
|
||||
return NULL;
|
||||
|
||||
if (q->nr_hw_queues != 1 &&
|
||||
!blk_mq_is_sbitmap_shared(q->tag_set->flags))
|
||||
!blk_mq_is_shared_tags(q->tag_set->flags))
|
||||
return NULL;
|
||||
|
||||
return elevator_get(q, "mq-deadline", false);
|
||||
|
@ -1,17 +1,13 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
#ifndef _LINUX_ELEVATOR_H
|
||||
#define _LINUX_ELEVATOR_H
|
||||
#ifndef _ELEVATOR_H
|
||||
#define _ELEVATOR_H
|
||||
|
||||
#include <linux/percpu.h>
|
||||
#include <linux/hashtable.h>
|
||||
|
||||
#ifdef CONFIG_BLOCK
|
||||
|
||||
struct io_cq;
|
||||
struct elevator_type;
|
||||
#ifdef CONFIG_BLK_DEBUG_FS
|
||||
struct blk_mq_debugfs_attr;
|
||||
#endif
|
||||
|
||||
/*
|
||||
* Return values from elevator merger
|
||||
@@ -162,20 +158,9 @@ extern struct request *elv_rb_find(struct rb_root *, sector_t);
|
||||
#define ELEVATOR_INSERT_FLUSH 5
|
||||
#define ELEVATOR_INSERT_SORT_MERGE 6
|
||||
|
||||
#define rq_end_sector(rq) (blk_rq_pos(rq) + blk_rq_sectors(rq))
|
||||
#define rb_entry_rq(node) rb_entry((node), struct request, rb_node)
|
||||
|
||||
#define rq_entry_fifo(ptr) list_entry((ptr), struct request, queuelist)
|
||||
#define rq_fifo_clear(rq) list_del_init(&(rq)->queuelist)
|
||||
|
||||
/*
|
||||
* Elevator features.
|
||||
*/
|
||||
|
||||
/* Supports zoned block devices sequential write constraint */
|
||||
#define ELEVATOR_F_ZBD_SEQ_WRITE (1U << 0)
|
||||
/* Supports scheduling on multiple hardware queues */
|
||||
#define ELEVATOR_F_MQ_AWARE (1U << 1)
|
||||
|
||||
#endif /* CONFIG_BLOCK */
|
||||
#endif
|
||||
#endif /* _ELEVATOR_H */
|
282
block/fops.c
@@ -17,7 +17,7 @@
|
||||
#include <linux/fs.h>
|
||||
#include "blk.h"
|
||||
|
||||
static struct inode *bdev_file_inode(struct file *file)
|
||||
static inline struct inode *bdev_file_inode(struct file *file)
|
||||
{
|
||||
return file->f_mapping->host;
|
||||
}
|
||||
@@ -54,14 +54,12 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
|
||||
static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
|
||||
struct iov_iter *iter, unsigned int nr_pages)
|
||||
{
|
||||
struct file *file = iocb->ki_filp;
|
||||
struct block_device *bdev = I_BDEV(bdev_file_inode(file));
|
||||
struct block_device *bdev = iocb->ki_filp->private_data;
|
||||
struct bio_vec inline_vecs[DIO_INLINE_BIO_VECS], *vecs;
|
||||
loff_t pos = iocb->ki_pos;
|
||||
bool should_dirty = false;
|
||||
struct bio bio;
|
||||
ssize_t ret;
|
||||
blk_qc_t qc;
|
||||
|
||||
if ((pos | iov_iter_alignment(iter)) &
|
||||
(bdev_logical_block_size(bdev) - 1))
|
||||
@@ -78,7 +76,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
|
||||
|
||||
bio_init(&bio, vecs, nr_pages);
|
||||
bio_set_dev(&bio, bdev);
|
||||
bio.bi_iter.bi_sector = pos >> 9;
|
||||
bio.bi_iter.bi_sector = pos >> SECTOR_SHIFT;
|
||||
bio.bi_write_hint = iocb->ki_hint;
|
||||
bio.bi_private = current;
|
||||
bio.bi_end_io = blkdev_bio_end_io_simple;
|
||||
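The two hunks above rely on simple bit arithmetic: the alignment test ORs the file offset with the iterator alignment and masks with the logical block size minus one, and the starting sector is the byte offset shifted right by SECTOR_SHIFT instead of a literal 9. A small user-space sketch of both steps follows. `dio_aligned()`, `pos_to_sector()` and `SKETCH_SECTOR_SHIFT` are hypothetical names, and the sketch assumes the logical block size is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SECTOR_SHIFT 9	/* 512-byte sectors, same value as SECTOR_SHIFT */

/*
 * True when both the file offset and the accumulated buffer alignment
 * bits are multiples of the logical block size (lbs).
 */
static bool dio_aligned(uint64_t pos, uint64_t buf_alignment, unsigned int lbs)
{
	/* The mask trick only works for power-of-two block sizes. */
	assert(lbs != 0 && (lbs & (lbs - 1)) == 0);

	return ((pos | buf_alignment) & (lbs - 1)) == 0;
}

/* Convert a byte offset into a 512-byte sector number. */
static uint64_t pos_to_sector(uint64_t pos)
{
	return pos >> SKETCH_SECTOR_SHIFT;
}
```

ORing the two values first means a single mask catches a misaligned offset or a misaligned buffer in one comparison.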
@@ -102,13 +100,12 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
|
||||
if (iocb->ki_flags & IOCB_HIPRI)
|
||||
bio_set_polled(&bio, iocb);
|
||||
|
||||
qc = submit_bio(&bio);
|
||||
submit_bio(&bio);
|
||||
for (;;) {
|
||||
set_current_state(TASK_UNINTERRUPTIBLE);
|
||||
if (!READ_ONCE(bio.bi_private))
|
||||
break;
|
||||
if (!(iocb->ki_flags & IOCB_HIPRI) ||
|
||||
!blk_poll(bdev_get_queue(bdev), qc, true))
|
||||
if (!(iocb->ki_flags & IOCB_HIPRI) || !bio_poll(&bio, NULL, 0))
|
||||
blk_io_schedule();
|
||||
}
|
||||
__set_current_state(TASK_RUNNING);
|
||||
@@ -126,6 +123,11 @@ out:
|
||||
return ret;
|
||||
}
|
||||
|
||||
enum {
|
||||
DIO_SHOULD_DIRTY = 1,
|
||||
DIO_IS_SYNC = 2,
|
||||
};
|
||||
|
||||
struct blkdev_dio {
|
||||
union {
|
||||
struct kiocb *iocb;
|
||||
@@ -133,35 +135,27 @@ struct blkdev_dio {
|
||||
};
|
||||
size_t size;
|
||||
atomic_t ref;
|
||||
bool multi_bio : 1;
|
||||
bool should_dirty : 1;
|
||||
bool is_sync : 1;
|
||||
struct bio bio;
|
||||
unsigned int flags;
|
||||
struct bio bio ____cacheline_aligned_in_smp;
|
||||
};
|
||||
|
||||
static struct bio_set blkdev_dio_pool;
|
||||
|
||||
static int blkdev_iopoll(struct kiocb *kiocb, bool wait)
|
||||
{
|
||||
struct block_device *bdev = I_BDEV(kiocb->ki_filp->f_mapping->host);
|
||||
struct request_queue *q = bdev_get_queue(bdev);
|
||||
|
||||
return blk_poll(q, READ_ONCE(kiocb->ki_cookie), wait);
|
||||
}
|
||||
|
||||
static void blkdev_bio_end_io(struct bio *bio)
|
||||
{
|
||||
struct blkdev_dio *dio = bio->bi_private;
|
||||
bool should_dirty = dio->should_dirty;
|
||||
bool should_dirty = dio->flags & DIO_SHOULD_DIRTY;
|
||||
|
||||
if (bio->bi_status && !dio->bio.bi_status)
|
||||
dio->bio.bi_status = bio->bi_status;
|
||||
|
||||
if (!dio->multi_bio || atomic_dec_and_test(&dio->ref)) {
|
||||
if (!dio->is_sync) {
|
||||
if (atomic_dec_and_test(&dio->ref)) {
|
||||
if (!(dio->flags & DIO_IS_SYNC)) {
|
||||
struct kiocb *iocb = dio->iocb;
|
||||
ssize_t ret;
|
||||
|
||||
WRITE_ONCE(iocb->private, NULL);
|
||||
|
||||
if (likely(!dio->bio.bi_status)) {
|
||||
ret = dio->size;
|
||||
iocb->ki_pos += ret;
|
||||
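The DIO_SHOULD_DIRTY and DIO_IS_SYNC values introduced above replace the one-bit bool fields of struct blkdev_dio with a single flags word. The sketch below shows how such a word might be initialised and tested in isolation. Only the two enum values come from the diff; `struct mini_dio`, `mini_dio_init()` and the two query helpers are hypothetical and exist purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum {
	DIO_SHOULD_DIRTY	= 1,
	DIO_IS_SYNC		= 2,
};

/* Hypothetical reduced dio: only the flags word. */
struct mini_dio {
	unsigned int flags;
};

static void mini_dio_init(struct mini_dio *dio, bool is_sync,
			  bool is_read, bool user_iovec)
{
	assert(dio != NULL);

	/* One plain store resets every flag at once. */
	dio->flags = is_sync ? DIO_IS_SYNC : 0;

	/* Pages are only re-dirtied for reads into user iovecs. */
	if (is_read && user_iovec)
		dio->flags |= DIO_SHOULD_DIRTY;
}

static bool mini_dio_should_dirty(const struct mini_dio *dio)
{
	return (dio->flags & DIO_SHOULD_DIRTY) != 0;
}

static bool mini_dio_is_sync(const struct mini_dio *dio)
{
	return (dio->flags & DIO_IS_SYNC) != 0;
}
```

A likely benefit of the flags word is that initialisation becomes a single assignment rather than several read-modify-write updates of adjacent bitfields.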
@@ -170,8 +164,7 @@ static void blkdev_bio_end_io(struct bio *bio)
|
||||
}
|
||||
|
||||
dio->iocb->ki_complete(iocb, ret, 0);
|
||||
if (dio->multi_bio)
|
||||
bio_put(&dio->bio);
|
||||
bio_put(&dio->bio);
|
||||
} else {
|
||||
struct task_struct *waiter = dio->waiter;
|
||||
|
||||
@@ -191,16 +184,12 @@ static void blkdev_bio_end_io(struct bio *bio)
|
||||
static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
|
||||
unsigned int nr_pages)
|
||||
{
|
||||
struct file *file = iocb->ki_filp;
|
||||
struct inode *inode = bdev_file_inode(file);
|
||||
struct block_device *bdev = I_BDEV(inode);
|
||||
struct block_device *bdev = iocb->ki_filp->private_data;
|
||||
struct blk_plug plug;
|
||||
struct blkdev_dio *dio;
|
||||
struct bio *bio;
|
||||
bool is_poll = (iocb->ki_flags & IOCB_HIPRI) != 0;
|
||||
bool is_read = (iov_iter_rw(iter) == READ), is_sync;
|
||||
loff_t pos = iocb->ki_pos;
|
||||
blk_qc_t qc = BLK_QC_T_NONE;
|
||||
int ret = 0;
|
||||
|
||||
if ((pos | iov_iter_alignment(iter)) &
|
||||
@@ -210,28 +199,31 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
|
||||
bio = bio_alloc_kiocb(iocb, nr_pages, &blkdev_dio_pool);
|
||||
|
||||
dio = container_of(bio, struct blkdev_dio, bio);
|
||||
dio->is_sync = is_sync = is_sync_kiocb(iocb);
|
||||
if (dio->is_sync) {
|
||||
atomic_set(&dio->ref, 1);
|
||||
/*
|
||||
* Grab an extra reference to ensure the dio structure which is embedded
|
||||
* into the first bio stays around.
|
||||
*/
|
||||
bio_get(bio);
|
||||
|
||||
is_sync = is_sync_kiocb(iocb);
|
||||
if (is_sync) {
|
||||
dio->flags = DIO_IS_SYNC;
|
||||
dio->waiter = current;
|
||||
bio_get(bio);
|
||||
} else {
|
||||
dio->flags = 0;
|
||||
dio->iocb = iocb;
|
||||
}
|
||||
|
||||
dio->size = 0;
|
||||
dio->multi_bio = false;
|
||||
dio->should_dirty = is_read && iter_is_iovec(iter);
|
||||
if (is_read && iter_is_iovec(iter))
|
||||
dio->flags |= DIO_SHOULD_DIRTY;
|
||||
|
||||
/*
|
||||
* Don't plug for HIPRI/polled IO, as those should go straight
|
||||
* to issue
|
||||
*/
|
||||
if (!is_poll)
|
||||
blk_start_plug(&plug);
|
||||
blk_start_plug(&plug);
|
||||
|
||||
for (;;) {
|
||||
bio_set_dev(bio, bdev);
|
||||
bio->bi_iter.bi_sector = pos >> 9;
|
||||
bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
|
||||
bio->bi_write_hint = iocb->ki_hint;
|
||||
bio->bi_private = dio;
|
||||
bio->bi_end_io = blkdev_bio_end_io;
|
||||
@@ -246,7 +238,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
|
||||
|
||||
if (is_read) {
|
||||
bio->bi_opf = REQ_OP_READ;
|
||||
if (dio->should_dirty)
|
||||
if (dio->flags & DIO_SHOULD_DIRTY)
|
||||
bio_set_pages_dirty(bio);
|
||||
} else {
|
||||
bio->bi_opf = dio_bio_write_op(iocb);
|
||||
@@ -260,40 +252,15 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
|
||||
|
||||
nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS);
|
||||
if (!nr_pages) {
|
||||
bool polled = false;
|
||||
|
||||
if (iocb->ki_flags & IOCB_HIPRI) {
|
||||
bio_set_polled(bio, iocb);
|
||||
polled = true;
|
||||
}
|
||||
|
||||
qc = submit_bio(bio);
|
||||
|
||||
if (polled)
|
||||
WRITE_ONCE(iocb->ki_cookie, qc);
|
||||
submit_bio(bio);
|
||||
break;
|
||||
}
|
||||
|
||||
if (!dio->multi_bio) {
|
||||
/*
|
||||
* AIO needs an extra reference to ensure the dio
|
||||
* structure which is embedded into the first bio
|
||||
* stays around.
|
||||
*/
|
||||
if (!is_sync)
|
||||
bio_get(bio);
|
||||
dio->multi_bio = true;
|
||||
atomic_set(&dio->ref, 2);
|
||||
} else {
|
||||
atomic_inc(&dio->ref);
|
||||
}
|
||||
|
||||
atomic_inc(&dio->ref);
|
||||
submit_bio(bio);
|
||||
bio = bio_alloc(GFP_KERNEL, nr_pages);
|
||||
}
|
||||
|
||||
if (!is_poll)
|
||||
blk_finish_plug(&plug);
|
||||
blk_finish_plug(&plug);
|
||||
|
||||
if (!is_sync)
|
||||
return -EIOCBQUEUED;
|
||||
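With the multi_bio special case gone, the reference count in __blkdev_direct_IO() appears to follow one rule: it starts at 1 for the first bio, is incremented before each additional bio is submitted, and the completion handler finishes the dio when the count drops to zero. The sketch below models that rule with C11 atomics in user space. `struct ref_dio` and its helpers are hypothetical names, and the kernel's atomic_t API is replaced by `<stdatomic.h>` for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reduced dio: a reference count and a completion marker. */
struct ref_dio {
	atomic_int ref;
	bool completed;
};

static void ref_dio_init(struct ref_dio *dio)
{
	atomic_init(&dio->ref, 1);	/* reference owned by the first bio */
	dio->completed = false;
}

/* Called before submitting each additional bio. */
static void ref_dio_add_bio(struct ref_dio *dio)
{
	atomic_fetch_add(&dio->ref, 1);
}

/* Called from each bio's completion; returns true for the last one. */
static bool ref_dio_bio_done(struct ref_dio *dio)
{
	/* atomic_fetch_sub() returns the value before the decrement. */
	if (atomic_fetch_sub(&dio->ref, 1) == 1) {
		assert(!dio->completed);	/* must complete exactly once */
		dio->completed = true;
		return true;
	}
	return false;
}
```

Because the count is always initialised, the completion path no longer needs a separate branch for the single-bio case.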
@@ -302,10 +269,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
|
||||
set_current_state(TASK_UNINTERRUPTIBLE);
|
||||
if (!READ_ONCE(dio->waiter))
|
||||
break;
|
||||
|
||||
if (!(iocb->ki_flags & IOCB_HIPRI) ||
|
||||
!blk_poll(bdev_get_queue(bdev), qc, true))
|
||||
blk_io_schedule();
|
||||
blk_io_schedule();
|
||||
}
|
||||
__set_current_state(TASK_RUNNING);
|
||||
|
||||
@@ -318,6 +282,94 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
|
||||
return ret;
|
||||
}
|
||||
|
||||
static void blkdev_bio_end_io_async(struct bio *bio)
{
	struct blkdev_dio *dio = container_of(bio, struct blkdev_dio, bio);
	struct kiocb *iocb = dio->iocb;
	ssize_t ret;

	if (likely(!bio->bi_status)) {
		ret = dio->size;
		iocb->ki_pos += ret;
	} else {
		ret = blk_status_to_errno(bio->bi_status);
	}

	iocb->ki_complete(iocb, ret, 0);

	if (dio->flags & DIO_SHOULD_DIRTY) {
		bio_check_pages_dirty(bio);
	} else {
		bio_release_pages(bio, false);
		bio_put(bio);
	}
}
|
||||
|
||||
static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
					struct iov_iter *iter,
					unsigned int nr_pages)
{
	struct block_device *bdev = iocb->ki_filp->private_data;
	struct blkdev_dio *dio;
	struct bio *bio;
	loff_t pos = iocb->ki_pos;
	int ret = 0;

	if ((pos | iov_iter_alignment(iter)) &
	    (bdev_logical_block_size(bdev) - 1))
		return -EINVAL;

	bio = bio_alloc_kiocb(iocb, nr_pages, &blkdev_dio_pool);
	dio = container_of(bio, struct blkdev_dio, bio);
	dio->flags = 0;
	dio->iocb = iocb;
	bio_set_dev(bio, bdev);
	bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
	bio->bi_write_hint = iocb->ki_hint;
	bio->bi_end_io = blkdev_bio_end_io_async;
	bio->bi_ioprio = iocb->ki_ioprio;

	if (iov_iter_is_bvec(iter)) {
		/*
		 * Users don't rely on the iterator being in any particular
		 * state for async I/O returning -EIOCBQUEUED, hence we can
		 * avoid expensive iov_iter_advance(). Bypass
		 * bio_iov_iter_get_pages() and set the bvec directly.
		 */
		bio_iov_bvec_set(bio, iter);
	} else {
		ret = bio_iov_iter_get_pages(bio, iter);
		if (unlikely(ret)) {
			bio->bi_status = BLK_STS_IOERR;
			bio_endio(bio);
			return ret;
		}
	}
	dio->size = bio->bi_iter.bi_size;

	if (iov_iter_rw(iter) == READ) {
		bio->bi_opf = REQ_OP_READ;
		if (iter_is_iovec(iter)) {
			dio->flags |= DIO_SHOULD_DIRTY;
			bio_set_pages_dirty(bio);
		}
	} else {
		bio->bi_opf = dio_bio_write_op(iocb);
		task_io_account_write(bio->bi_iter.bi_size);
	}

	if (iocb->ki_flags & IOCB_HIPRI) {
		bio->bi_opf |= REQ_POLLED | REQ_NOWAIT;
		submit_bio(bio);
		WRITE_ONCE(iocb->private, bio);
	} else {
		if (iocb->ki_flags & IOCB_NOWAIT)
			bio->bi_opf |= REQ_NOWAIT;
		submit_bio(bio);
	}
	return -EIOCBQUEUED;
}
|
||||
|
||||
static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
|
||||
{
|
||||
unsigned int nr_pages;
|
||||
@@ -326,9 +378,11 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
|
||||
return 0;
|
||||
|
||||
nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS + 1);
|
||||
if (is_sync_kiocb(iocb) && nr_pages <= BIO_MAX_VECS)
|
||||
return __blkdev_direct_IO_simple(iocb, iter, nr_pages);
|
||||
|
||||
if (likely(nr_pages <= BIO_MAX_VECS)) {
|
||||
if (is_sync_kiocb(iocb))
|
||||
return __blkdev_direct_IO_simple(iocb, iter, nr_pages);
|
||||
return __blkdev_direct_IO_async(iocb, iter, nr_pages);
|
||||
}
|
||||
return __blkdev_direct_IO(iocb, iter, bio_max_segs(nr_pages));
|
||||
}
|
||||
|
||||
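The reworked blkdev_direct_IO() above chooses between three submission paths: requests that fit in one bio go to the simple path when synchronous and to the new async path otherwise, and anything larger falls back to the multi-bio path. The sketch below restates only that decision. `pick_dio_path()`, `enum dio_path` and `SKETCH_BIO_MAX_VECS` are hypothetical names, and the value 256 is an assumption meant to mirror BIO_MAX_VECS.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_BIO_MAX_VECS 256u	/* assumption: mirrors BIO_MAX_VECS */

enum dio_path {
	DIO_PATH_SIMPLE,	/* models __blkdev_direct_IO_simple() */
	DIO_PATH_ASYNC,		/* models __blkdev_direct_IO_async() */
	DIO_PATH_MULTI_BIO,	/* models __blkdev_direct_IO() */
};

static enum dio_path pick_dio_path(bool is_sync, unsigned int nr_pages)
{
	/* The caller caps the vector count at BIO_MAX_VECS + 1. */
	assert(nr_pages <= SKETCH_BIO_MAX_VECS + 1);

	if (nr_pages <= SKETCH_BIO_MAX_VECS)
		return is_sync ? DIO_PATH_SIMPLE : DIO_PATH_ASYNC;

	return DIO_PATH_MULTI_BIO;
}
```

Asking for one vector more than the maximum lets a single comparison distinguish "fits in one bio" from "needs several".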
@@ -405,8 +459,7 @@ static loff_t blkdev_llseek(struct file *file, loff_t offset, int whence)
|
||||
static int blkdev_fsync(struct file *filp, loff_t start, loff_t end,
|
||||
int datasync)
|
||||
{
|
||||
struct inode *bd_inode = bdev_file_inode(filp);
|
||||
struct block_device *bdev = I_BDEV(bd_inode);
|
||||
struct block_device *bdev = filp->private_data;
|
||||
int error;
|
||||
|
||||
error = file_write_and_wait_range(filp, start, end);
|
||||
@@ -448,6 +501,8 @@ static int blkdev_open(struct inode *inode, struct file *filp)
|
||||
bdev = blkdev_get_by_dev(inode->i_rdev, filp->f_mode, filp);
|
||||
if (IS_ERR(bdev))
|
||||
return PTR_ERR(bdev);
|
||||
|
||||
filp->private_data = bdev;
|
||||
filp->f_mapping = bdev->bd_inode->i_mapping;
|
||||
filp->f_wb_err = filemap_sample_wb_err(filp->f_mapping);
|
||||
return 0;
|
||||
@@ -455,29 +510,12 @@ static int blkdev_open(struct inode *inode, struct file *filp)
|
||||
|
||||
static int blkdev_close(struct inode *inode, struct file *filp)
|
||||
{
|
||||
struct block_device *bdev = I_BDEV(bdev_file_inode(filp));
|
||||
struct block_device *bdev = filp->private_data;
|
||||
|
||||
blkdev_put(bdev, filp->f_mode);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static long block_ioctl(struct file *file, unsigned cmd, unsigned long arg)
|
||||
{
|
||||
struct block_device *bdev = I_BDEV(bdev_file_inode(file));
|
||||
fmode_t mode = file->f_mode;
|
||||
|
||||
/*
|
||||
* O_NDELAY can be altered using fcntl(.., F_SETFL, ..), so we have
|
||||
* to update it before every ioctl.
|
||||
*/
|
||||
if (file->f_flags & O_NDELAY)
|
||||
mode |= FMODE_NDELAY;
|
||||
else
|
||||
mode &= ~FMODE_NDELAY;
|
||||
|
||||
return blkdev_ioctl(bdev, mode, cmd, arg);
|
||||
}
|
||||
|
||||
/*
|
||||
* Write data to the block device. Only intended for the block device itself
|
||||
* and the raw driver which basically is a fake block device.
|
||||
@@ -487,14 +525,14 @@ static long block_ioctl(struct file *file, unsigned cmd, unsigned long arg)
|
||||
*/
|
||||
static ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
|
||||
{
|
||||
struct file *file = iocb->ki_filp;
|
||||
struct inode *bd_inode = bdev_file_inode(file);
|
||||
struct block_device *bdev = iocb->ki_filp->private_data;
|
||||
struct inode *bd_inode = bdev->bd_inode;
|
||||
loff_t size = i_size_read(bd_inode);
|
||||
struct blk_plug plug;
|
||||
size_t shorted = 0;
|
||||
ssize_t ret;
|
||||
|
||||
if (bdev_read_only(I_BDEV(bd_inode)))
|
||||
if (bdev_read_only(bdev))
|
||||
return -EPERM;
|
||||
|
||||
if (IS_SWAPFILE(bd_inode) && !is_hibernate_resume_dev(bd_inode->i_rdev))
|
||||
@@ -526,24 +564,26 @@ static ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
|
||||
|
||||
static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
|
||||
{
|
||||
struct file *file = iocb->ki_filp;
|
||||
struct inode *bd_inode = bdev_file_inode(file);
|
||||
loff_t size = i_size_read(bd_inode);
|
||||
struct block_device *bdev = iocb->ki_filp->private_data;
|
||||
loff_t size = i_size_read(bdev->bd_inode);
|
||||
loff_t pos = iocb->ki_pos;
|
||||
size_t shorted = 0;
|
||||
ssize_t ret;
|
||||
|
||||
if (pos >= size)
|
||||
return 0;
|
||||
|
||||
size -= pos;
|
||||
if (iov_iter_count(to) > size) {
|
||||
shorted = iov_iter_count(to) - size;
|
||||
iov_iter_truncate(to, size);
|
||||
if (unlikely(pos + iov_iter_count(to) > size)) {
|
||||
if (pos >= size)
|
||||
return 0;
|
||||
size -= pos;
|
||||
if (iov_iter_count(to) > size) {
|
||||
shorted = iov_iter_count(to) - size;
|
||||
iov_iter_truncate(to, size);
|
||||
}
|
||||
}
|
||||
|
||||
ret = generic_file_read_iter(iocb, to);
|
||||
iov_iter_reexpand(to, iov_iter_count(to) + shorted);
|
||||
|
||||
if (unlikely(shorted))
|
||||
iov_iter_reexpand(to, iov_iter_count(to) + shorted);
|
||||
return ret;
|
||||
}
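The read path above trims a request that would run past the end of the device and remembers how many bytes it trimmed ("shorted") so the iterator can be re-expanded afterwards. A minimal user-space sketch of that clamping arithmetic follows. `struct read_req` and `clamp_read()` are hypothetical names, not kernel APIs. As a simplification, a read starting at or beyond the end of the device is modelled as a zero-length request rather than an early return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the position and byte count of a read. */
struct read_req {
	uint64_t pos;	/* starting byte offset */
	size_t count;	/* bytes requested */
};

/*
 * Trim req->count so the read ends at dev_size at the latest.
 * Returns the number of bytes trimmed, so a caller can restore the
 * original count once the I/O is done.
 */
static size_t clamp_read(struct read_req *req, uint64_t dev_size)
{
	size_t shorted = 0;
	uint64_t remaining;

	assert(req != NULL);

	if (req->pos >= dev_size) {
		/* Nothing readable: the whole request is trimmed. */
		shorted = req->count;
		req->count = 0;
		return shorted;
	}

	remaining = dev_size - req->pos;
	if (req->count > remaining) {
		shorted = req->count - (size_t)remaining;
		req->count = (size_t)remaining;
	}
	return shorted;
}
```

Comparing `count` against `remaining` instead of computing `pos + count` avoids overflow in the sum for very large requests.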
|
||||
|
||||
@@ -592,16 +632,18 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
|
||||
switch (mode) {
|
||||
case FALLOC_FL_ZERO_RANGE:
|
||||
case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
|
||||
error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
|
||||
GFP_KERNEL, BLKDEV_ZERO_NOUNMAP);
|
||||
error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
|
||||
len >> SECTOR_SHIFT, GFP_KERNEL,
|
||||
BLKDEV_ZERO_NOUNMAP);
|
||||
break;
|
||||
case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE:
|
||||
error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
|
||||
GFP_KERNEL, BLKDEV_ZERO_NOFALLBACK);
|
||||
error = blkdev_issue_zeroout(bdev, start >> SECTOR_SHIFT,
|
||||
len >> SECTOR_SHIFT, GFP_KERNEL,
|
||||
BLKDEV_ZERO_NOFALLBACK);
|
||||
break;
|
||||
case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
|
||||
error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
|
||||
GFP_KERNEL, 0);
|
||||
error = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
|
||||
len >> SECTOR_SHIFT, GFP_KERNEL, 0);
|
||||
break;
|
||||
default:
|
||||
error = -EOPNOTSUPP;
|
||||
@@ -618,10 +660,10 @@ const struct file_operations def_blk_fops = {
|
||||
.llseek = blkdev_llseek,
|
||||
.read_iter = blkdev_read_iter,
|
||||
.write_iter = blkdev_write_iter,
|
||||
.iopoll = blkdev_iopoll,
|
||||
.iopoll = iocb_bio_iopoll,
|
||||
.mmap = generic_file_mmap,
|
||||
.fsync = blkdev_fsync,
|
||||
.unlocked_ioctl = block_ioctl,
|
||||
.unlocked_ioctl = blkdev_ioctl,
|
||||
#ifdef CONFIG_COMPAT
|
||||
.compat_ioctl = compat_blkdev_ioctl,
|
||||
#endif
|
||||
|
@@ -19,6 +19,7 @@
|
||||
#include <linux/seq_file.h>
|
||||
#include <linux/slab.h>
|
||||
#include <linux/kmod.h>
|
||||
#include <linux/major.h>
|
||||
#include <linux/mutex.h>
|
||||
#include <linux/idr.h>
|
||||
#include <linux/log2.h>
|
||||
@@ -625,6 +626,26 @@ void del_gendisk(struct gendisk *disk)
|
||||
}
|
||||
EXPORT_SYMBOL(del_gendisk);
|
||||
|
||||
/**
|
||||
* invalidate_disk - invalidate the disk
|
||||
* @disk: the struct gendisk to invalidate
|
||||
*
|
||||
* A helper to invalidate the disk. It will clean the disk's associated
|
||||
* buffer/page caches and reset its internal states so that the disk
|
||||
* can be reused by the drivers.
|
||||
*
|
||||
* Context: can sleep
|
||||
*/
|
||||
void invalidate_disk(struct gendisk *disk)
|
||||
{
|
||||
struct block_device *bdev = disk->part0;
|
||||
|
||||
invalidate_bdev(bdev);
|
||||
bdev->bd_inode->i_mapping->wb_err = 0;
|
||||
set_capacity(disk, 0);
|
||||
}
|
||||
EXPORT_SYMBOL(invalidate_disk);
|
||||
|
||||
/* sysfs access to bad-blocks list. */
|
||||
static ssize_t disk_badblocks_show(struct device *dev,
|
||||
struct device_attribute *attr,
|
||||
@@ -884,7 +905,7 @@ ssize_t part_stat_show(struct device *dev,
|
||||
struct device_attribute *attr, char *buf)
|
||||
{
|
||||
struct block_device *bdev = dev_to_bdev(dev);
|
||||
struct request_queue *q = bdev->bd_disk->queue;
|
||||
struct request_queue *q = bdev_get_queue(bdev);
|
||||
struct disk_stats stat;
|
||||
unsigned int inflight;
|
||||
|
||||
@@ -928,7 +949,7 @@ ssize_t part_inflight_show(struct device *dev, struct device_attribute *attr,
|
||||
char *buf)
|
||||
{
|
||||
struct block_device *bdev = dev_to_bdev(dev);
|
||||
struct request_queue *q = bdev->bd_disk->queue;
|
||||
struct request_queue *q = bdev_get_queue(bdev);
|
||||
unsigned int inflight[2];
|
||||
|
||||
if (queue_is_mq(q))
|
||||
@@ -1268,6 +1289,9 @@ struct gendisk *__alloc_disk_node(struct request_queue *q, int node_id,
|
||||
if (!disk->bdi)
|
||||
goto out_free_disk;
|
||||
|
||||
/* bdev_alloc() might need the queue, set before the first call */
|
||||
disk->queue = q;
|
||||
|
||||
disk->part0 = bdev_alloc(disk, 0);
|
||||
if (!disk->part0)
|
||||
goto out_free_bdi;
|
||||
@@ -1283,7 +1307,6 @@ struct gendisk *__alloc_disk_node(struct request_queue *q, int node_id,
|
||||
disk_to_dev(disk)->type = &disk_type;
|
||||
device_initialize(disk_to_dev(disk));
|
||||
inc_diskseq(disk);
|
||||
disk->queue = q;
|
||||
q->disk = disk;
|
||||
lockdep_init_map(&disk->lockdep_map, "(bio completion)", lkclass, 0);
|
||||
#ifdef CONFIG_BLOCK_HOLDER_DEPRECATED
|
||||
@@ -1388,12 +1411,6 @@ void set_disk_ro(struct gendisk *disk, bool read_only)
|
||||
}
|
||||
EXPORT_SYMBOL(set_disk_ro);
|
||||
|
||||
int bdev_read_only(struct block_device *bdev)
|
||||
{
|
||||
return bdev->bd_read_only || get_disk_ro(bdev->bd_disk);
|
||||
}
|
||||
EXPORT_SYMBOL(bdev_read_only);
|
||||
|
||||
void inc_diskseq(struct gendisk *disk)
|
||||
{
|
||||
disk->diskseq = atomic64_inc_return(&diskseq);
|
||||
|
@@ -1,5 +1,6 @@
|
||||
// SPDX-License-Identifier: GPL-2.0-only
|
||||
#include <linux/genhd.h>
|
||||
#include <linux/slab.h>
|
||||
|
||||
struct bd_holder_disk {
|
||||
struct list_head list;
|
||||
|
@@ -538,12 +538,22 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode,
|
||||
*
|
||||
* New commands must be compatible and go into blkdev_common_ioctl
|
||||
*/
|
||||
int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
|
||||
unsigned long arg)
|
||||
long blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
|
||||
{
|
||||
int ret;
|
||||
loff_t size;
|
||||
struct block_device *bdev = I_BDEV(file->f_mapping->host);
|
||||
void __user *argp = (void __user *)arg;
|
||||
fmode_t mode = file->f_mode;
|
||||
loff_t size;
|
||||
int ret;
|
||||
|
||||
/*
|
||||
* O_NDELAY can be altered using fcntl(.., F_SETFL, ..), so we have
|
||||
* to update it before every ioctl.
|
||||
*/
|
||||
if (file->f_flags & O_NDELAY)
|
||||
mode |= FMODE_NDELAY;
|
||||
else
|
||||
mode &= ~FMODE_NDELAY;
|
||||
|
||||
switch (cmd) {
|
||||
/* These need separate implementations for the data structure */
|
||||
@@ -588,7 +598,6 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
|
||||
return -ENOTTY;
|
||||
return bdev->bd_disk->fops->ioctl(bdev, mode, cmd, arg);
|
||||
}
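The O_NDELAY handling that moved into blkdev_ioctl() recomputes one mode bit on every call, because the file status flag can change after open. A minimal user-space sketch of that recomputation follows. `SKETCH_O_NDELAY`, `SKETCH_FMODE_NDELAY` and `ioctl_mode()` are hypothetical names, and the bit values are assumptions chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical bit values; the real kernel constants may differ. */
#define SKETCH_O_NDELAY		0x800u	/* file status flag */
#define SKETCH_FMODE_NDELAY	0x40u	/* per-call mode bit */

/*
 * Derive the mode passed down to the driver: start from the mode the
 * file was opened with, then mirror the current O_NDELAY state.
 */
static unsigned int ioctl_mode(unsigned int f_mode, unsigned int f_flags)
{
	unsigned int mode = f_mode;

	if (f_flags & SKETCH_O_NDELAY)
		mode |= SKETCH_FMODE_NDELAY;
	else
		mode &= ~SKETCH_FMODE_NDELAY;
	return mode;
}
```

Clearing the bit in the else branch matters: a mode opened with the bit set must lose it once the flag is dropped via fcntl().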
|
||||
EXPORT_SYMBOL_GPL(blkdev_ioctl); /* for /dev/raw */
|
||||
|
||||
#ifdef CONFIG_COMPAT
|
||||
|
||||
|
@@ -1,578 +0,0 @@
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
/*
|
||||
* Copyright 2019 Google LLC
|
||||
*/
|
||||
|
||||
/**
|
||||
* DOC: The Keyslot Manager
|
||||
*
|
||||
* Many devices with inline encryption support have a limited number of "slots"
|
||||
* into which encryption contexts may be programmed, and requests can be tagged
|
||||
* with a slot number to specify the key to use for en/decryption.
|
||||
*
|
||||
* As the number of slots is limited, and programming keys is expensive on
|
||||
* much inline encryption hardware, we don't want to program the same key into
|
||||
* multiple slots - if multiple requests are using the same key, we want to
|
||||
* program just one slot with that key and use that slot for all requests.
|
||||
*
|
||||
* The keyslot manager manages these keyslots appropriately, and also acts as
|
||||
* an abstraction between the inline encryption hardware and the upper layers.
|
||||
*
|
||||
* Lower layer devices will set up a keyslot manager in their request queue
|
||||
* and tell it how to perform device specific operations like programming/
|
||||
* evicting keys from keyslots.
|
||||
*
|
||||
* Upper layers will call blk_ksm_get_slot_for_key() to program a
|
||||
* key into some slot in the inline encryption hardware.
|
||||
*/
|
||||
|
||||
#define pr_fmt(fmt) "blk-crypto: " fmt
|
||||
|
||||
#include <linux/keyslot-manager.h>
|
||||
#include <linux/device.h>
|
||||
#include <linux/atomic.h>
|
||||
#include <linux/mutex.h>
|
||||
#include <linux/pm_runtime.h>
|
||||
#include <linux/wait.h>
|
||||
#include <linux/blkdev.h>
|
||||
|
||||
struct blk_ksm_keyslot {
|
||||
atomic_t slot_refs;
|
||||
struct list_head idle_slot_node;
|
||||
struct hlist_node hash_node;
|
||||
const struct blk_crypto_key *key;
|
||||
struct blk_keyslot_manager *ksm;
|
||||
};
|
||||
|
||||
static inline void blk_ksm_hw_enter(struct blk_keyslot_manager *ksm)
|
||||
{
|
||||
/*
|
||||
* Calling into the driver requires ksm->lock held and the device
|
||||
* resumed. But we must resume the device first, since that can acquire
|
||||
* and release ksm->lock via blk_ksm_reprogram_all_keys().
|
||||
*/
|
||||
if (ksm->dev)
|
||||
pm_runtime_get_sync(ksm->dev);
|
||||
down_write(&ksm->lock);
|
||||
}
|
||||
|
||||
static inline void blk_ksm_hw_exit(struct blk_keyslot_manager *ksm)
|
||||
{
|
||||
up_write(&ksm->lock);
|
||||
if (ksm->dev)
|
||||
pm_runtime_put_sync(ksm->dev);
|
||||
}
|
||||
|
||||
static inline bool blk_ksm_is_passthrough(struct blk_keyslot_manager *ksm)
|
||||
{
|
||||
return ksm->num_slots == 0;
|
||||
}
|
||||
|
||||
/**
|
||||
* blk_ksm_init() - Initialize a keyslot manager
|
||||
* @ksm: The keyslot_manager to initialize.
|
||||
* @num_slots: The number of key slots to manage.
|
||||
*
|
||||
* Allocate memory for keyslots and initialize a keyslot manager. Called by
|
||||
* e.g. storage drivers to set up a keyslot manager in their request_queue.
|
||||
*
|
||||
* Return: 0 on success, or else a negative error code.
|
||||
*/
|
||||
int blk_ksm_init(struct blk_keyslot_manager *ksm, unsigned int num_slots)
|
||||
{
|
||||
unsigned int slot;
|
||||
unsigned int i;
|
||||
unsigned int slot_hashtable_size;
|
||||
|
||||
memset(ksm, 0, sizeof(*ksm));
|
||||
|
||||
if (num_slots == 0)
|
||||
return -EINVAL;
|
||||
|
||||
ksm->slots = kvcalloc(num_slots, sizeof(ksm->slots[0]), GFP_KERNEL);
|
||||
if (!ksm->slots)
|
||||
return -ENOMEM;
|
||||
|
||||
ksm->num_slots = num_slots;
|
||||
|
||||
init_rwsem(&ksm->lock);
|
||||
|
||||
init_waitqueue_head(&ksm->idle_slots_wait_queue);
|
||||
INIT_LIST_HEAD(&ksm->idle_slots);
|
||||
|
||||
for (slot = 0; slot < num_slots; slot++) {
|
||||
ksm->slots[slot].ksm = ksm;
|
||||
list_add_tail(&ksm->slots[slot].idle_slot_node,
|
||||
&ksm->idle_slots);
|
||||
}
|
||||
|
||||
spin_lock_init(&ksm->idle_slots_lock);
|
||||
|
||||
slot_hashtable_size = roundup_pow_of_two(num_slots);
|
||||
/*
|
||||
* hash_ptr() assumes bits != 0, so ensure the hash table has at least 2
|
||||
* buckets. This only makes a difference when there is only 1 keyslot.
|
||||
*/
|
||||
if (slot_hashtable_size < 2)
|
||||
slot_hashtable_size = 2;
|
||||
|
||||
ksm->log_slot_ht_size = ilog2(slot_hashtable_size);
|
||||
ksm->slot_hashtable = kvmalloc_array(slot_hashtable_size,
|
||||
sizeof(ksm->slot_hashtable[0]),
|
||||
GFP_KERNEL);
|
||||
if (!ksm->slot_hashtable)
|
||||
goto err_destroy_ksm;
|
||||
for (i = 0; i < slot_hashtable_size; i++)
|
||||
INIT_HLIST_HEAD(&ksm->slot_hashtable[i]);
|
||||
|
||||
return 0;
|
||||
|
||||
err_destroy_ksm:
|
||||
blk_ksm_destroy(ksm);
|
||||
return -ENOMEM;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_ksm_init);
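blk_ksm_init() sizes the slot hash table by rounding the slot count up to a power of two, enforcing a minimum of two buckets, and storing the base-2 logarithm. A minimal user-space sketch of that arithmetic follows. The helpers `roundup_pow2()`, `ilog2_u()` and `slot_ht_bits()` are hypothetical re-implementations for illustration, not the kernel's `roundup_pow_of_two()` or `ilog2()`.

```c
#include <assert.h>

/* Smallest power of two >= n; n must be in 1..2^31 to avoid wraparound. */
static unsigned int roundup_pow2(unsigned int n)
{
	unsigned int p = 1;

	assert(n >= 1 && n <= (1u << 31));
	while (p < n)
		p <<= 1;
	return p;
}

/* Floor of log2(n) for n >= 1. */
static unsigned int ilog2_u(unsigned int n)
{
	unsigned int log = 0;

	assert(n >= 1);
	while (n >>= 1)
		log++;
	return log;
}

/* Number of hash bits for a table that holds num_slots keyslots. */
static unsigned int slot_ht_bits(unsigned int num_slots)
{
	unsigned int size = roundup_pow2(num_slots);

	/* A one-slot device would give zero hash bits; use two buckets. */
	if (size < 2)
		size = 2;
	return ilog2_u(size);
}
```

With one keyslot the rounded size is 1, so the minimum of two buckets is the only thing keeping the bit count non-zero.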
|
||||
|
||||
static void blk_ksm_destroy_callback(void *ksm)
|
||||
{
|
||||
blk_ksm_destroy(ksm);
|
||||
}
|
||||
|
||||
/**
|
||||
* devm_blk_ksm_init() - Resource-managed blk_ksm_init()
|
||||
* @dev: The device which owns the blk_keyslot_manager.
|
||||
* @ksm: The blk_keyslot_manager to initialize.
|
||||
* @num_slots: The number of key slots to manage.
|
||||
*
|
||||
* Like blk_ksm_init(), but causes blk_ksm_destroy() to be called automatically
|
||||
* on driver detach.
|
||||
*
|
||||
* Return: 0 on success, or else a negative error code.
|
||||
*/
|
||||
int devm_blk_ksm_init(struct device *dev, struct blk_keyslot_manager *ksm,
|
||||
unsigned int num_slots)
|
||||
{
|
||||
int err = blk_ksm_init(ksm, num_slots);
|
||||
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
return devm_add_action_or_reset(dev, blk_ksm_destroy_callback, ksm);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(devm_blk_ksm_init);
|
||||
|
||||
static inline struct hlist_head *
|
||||
blk_ksm_hash_bucket_for_key(struct blk_keyslot_manager *ksm,
|
||||
const struct blk_crypto_key *key)
|
||||
{
|
||||
return &ksm->slot_hashtable[hash_ptr(key, ksm->log_slot_ht_size)];
|
||||
}
|
||||
|
||||
static void blk_ksm_remove_slot_from_lru_list(struct blk_ksm_keyslot *slot)
|
||||
{
|
||||
struct blk_keyslot_manager *ksm = slot->ksm;
|
||||
unsigned long flags;
|
||||
|
||||
spin_lock_irqsave(&ksm->idle_slots_lock, flags);
|
||||
list_del(&slot->idle_slot_node);
|
||||
spin_unlock_irqrestore(&ksm->idle_slots_lock, flags);
|
||||
}
|
||||
|
||||
static struct blk_ksm_keyslot *blk_ksm_find_keyslot(
|
||||
struct blk_keyslot_manager *ksm,
|
||||
const struct blk_crypto_key *key)
|
||||
{
|
||||
const struct hlist_head *head = blk_ksm_hash_bucket_for_key(ksm, key);
|
||||
struct blk_ksm_keyslot *slotp;
|
||||
|
||||
hlist_for_each_entry(slotp, head, hash_node) {
|
||||
if (slotp->key == key)
|
||||
return slotp;
|
||||
}
|
||||
return NULL;
|
||||
}
|
||||
|
||||
static struct blk_ksm_keyslot *blk_ksm_find_and_grab_keyslot(
|
||||
struct blk_keyslot_manager *ksm,
|
||||
const struct blk_crypto_key *key)
|
||||
{
|
||||
struct blk_ksm_keyslot *slot;
|
||||
|
||||
slot = blk_ksm_find_keyslot(ksm, key);
|
||||
if (!slot)
|
||||
return NULL;
|
||||
if (atomic_inc_return(&slot->slot_refs) == 1) {
|
||||
/* Took first reference to this slot; remove it from LRU list */
|
||||
blk_ksm_remove_slot_from_lru_list(slot);
|
||||
}
|
||||
return slot;
|
||||
}
|
||||
|
||||
unsigned int blk_ksm_get_slot_idx(struct blk_ksm_keyslot *slot)
|
||||
{
|
||||
return slot - slot->ksm->slots;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_ksm_get_slot_idx);
|
||||
|
||||
/**
|
||||
* blk_ksm_get_slot_for_key() - Program a key into a keyslot.
|
||||
* @ksm: The keyslot manager to program the key into.
|
||||
* @key: Pointer to the key object to program, including the raw key, crypto
|
||||
* mode, and data unit size.
|
||||
* @slot_ptr: A pointer to return the pointer of the allocated keyslot.
|
||||
*
|
||||
* Get a keyslot that's been programmed with the specified key. If one already
|
||||
* exists, return it with incremented refcount. Otherwise, wait for a keyslot
|
||||
* to become idle and program it.
|
||||
*
|
||||
* Context: Process context. Takes and releases ksm->lock.
|
||||
* Return: BLK_STS_OK on success (and keyslot is set to the pointer of the
|
||||
* allocated keyslot), or some other blk_status_t otherwise (and
|
||||
* keyslot is set to NULL).
|
||||
*/
|
||||
blk_status_t blk_ksm_get_slot_for_key(struct blk_keyslot_manager *ksm,
|
||||
const struct blk_crypto_key *key,
|
||||
struct blk_ksm_keyslot **slot_ptr)
|
||||
{
|
||||
struct blk_ksm_keyslot *slot;
|
||||
int slot_idx;
|
||||
int err;
|
||||
|
||||
*slot_ptr = NULL;
|
||||
|
||||
if (blk_ksm_is_passthrough(ksm))
|
||||
return BLK_STS_OK;
|
||||
|
||||
down_read(&ksm->lock);
|
||||
slot = blk_ksm_find_and_grab_keyslot(ksm, key);
|
||||
up_read(&ksm->lock);
|
||||
if (slot)
|
||||
goto success;
|
||||
|
||||
for (;;) {
|
||||
blk_ksm_hw_enter(ksm);
|
||||
slot = blk_ksm_find_and_grab_keyslot(ksm, key);
|
||||
if (slot) {
|
||||
blk_ksm_hw_exit(ksm);
|
||||
goto success;
|
||||
}
|
||||
|
||||
/*
|
||||
* If we're here, that means there wasn't a slot that was
|
||||
* already programmed with the key. So try to program it.
|
||||
*/
|
||||
if (!list_empty(&ksm->idle_slots))
|
||||
break;
|
||||
|
||||
blk_ksm_hw_exit(ksm);
|
||||
wait_event(ksm->idle_slots_wait_queue,
|
||||
!list_empty(&ksm->idle_slots));
|
||||
}
|
||||
|
||||
slot = list_first_entry(&ksm->idle_slots, struct blk_ksm_keyslot,
|
||||
idle_slot_node);
|
||||
slot_idx = blk_ksm_get_slot_idx(slot);
|
||||
|
||||
err = ksm->ksm_ll_ops.keyslot_program(ksm, key, slot_idx);
|
||||
if (err) {
|
||||
wake_up(&ksm->idle_slots_wait_queue);
|
||||
blk_ksm_hw_exit(ksm);
|
||||
return errno_to_blk_status(err);
|
||||
}
|
||||
|
||||
/* Move this slot to the hash list for the new key. */
|
||||
if (slot->key)
|
||||
hlist_del(&slot->hash_node);
|
||||
slot->key = key;
|
||||
hlist_add_head(&slot->hash_node, blk_ksm_hash_bucket_for_key(ksm, key));
|
||||
|
||||
atomic_set(&slot->slot_refs, 1);
|
||||
|
||||
blk_ksm_remove_slot_from_lru_list(slot);
|
||||
|
||||
blk_ksm_hw_exit(ksm);
|
||||
success:
|
||||
*slot_ptr = slot;
|
||||
return BLK_STS_OK;
|
||||
}
|
||||
|
||||
/**
|
||||
* blk_ksm_put_slot() - Release a reference to a slot
|
||||
* @slot: The keyslot to release the reference of.
|
||||
*
|
||||
* Context: Any context.
|
||||
*/
|
||||
void blk_ksm_put_slot(struct blk_ksm_keyslot *slot)
|
||||
{
|
||||
struct blk_keyslot_manager *ksm;
|
||||
unsigned long flags;
|
||||
|
||||
if (!slot)
|
||||
return;
|
||||
|
||||
ksm = slot->ksm;
|
||||
|
||||
if (atomic_dec_and_lock_irqsave(&slot->slot_refs,
|
||||
&ksm->idle_slots_lock, flags)) {
|
||||
list_add_tail(&slot->idle_slot_node, &ksm->idle_slots);
|
||||
spin_unlock_irqrestore(&ksm->idle_slots_lock, flags);
|
||||
wake_up(&ksm->idle_slots_wait_queue);
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* blk_ksm_crypto_cfg_supported() - Find out if a crypto configuration is
|
||||
* supported by a ksm.
|
||||
* @ksm: The keyslot manager to check
|
||||
* @cfg: The crypto configuration to check for.
|
||||
*
|
||||
* Checks for crypto_mode/data unit size/dun bytes support.
|
||||
*
|
||||
* Return: Whether or not this ksm supports the specified crypto config.
|
||||
*/
|
||||
bool blk_ksm_crypto_cfg_supported(struct blk_keyslot_manager *ksm,
|
||||
const struct blk_crypto_config *cfg)
|
||||
{
|
||||
if (!ksm)
|
||||
return false;
|
||||
if (!(ksm->crypto_modes_supported[cfg->crypto_mode] &
|
||||
cfg->data_unit_size))
|
||||
return false;
|
||||
if (ksm->max_dun_bytes_supported < cfg->dun_bytes)
|
||||
return false;
|
||||
return true;
|
||||
}
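blk_ksm_crypto_cfg_supported() treats each crypto_modes_supported[] entry as a bitmask of data unit sizes, so a power-of-two size can be tested with a single AND. A minimal user-space sketch of the same check follows. `struct sketch_caps`, `struct sketch_cfg`, `SKETCH_MODE_MAX` and `sketch_cfg_supported()` are hypothetical names, and the mode count of 4 is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MODE_MAX 4	/* hypothetical number of crypto modes */

/* Hypothetical capability record: one size bitmask per crypto mode. */
struct sketch_caps {
	unsigned int modes_supported[SKETCH_MODE_MAX];
	unsigned int max_dun_bytes;
};

/* Hypothetical request: a mode, a power-of-two unit size, a DUN width. */
struct sketch_cfg {
	unsigned int mode;
	unsigned int data_unit_size;
	unsigned int dun_bytes;
};

static bool sketch_cfg_supported(const struct sketch_caps *caps,
				 const struct sketch_cfg *cfg)
{
	if (!caps)
		return false;
	assert(cfg->mode < SKETCH_MODE_MAX);
	/* The size is a single bit, so AND tests membership in the mask. */
	if (!(caps->modes_supported[cfg->mode] & cfg->data_unit_size))
		return false;
	if (caps->max_dun_bytes < cfg->dun_bytes)
		return false;
	return true;
}
```

The AND test is only a membership test when `data_unit_size` is a power of two; the sketch assumes callers guarantee that.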
|
||||
|
||||
/**
|
||||
* blk_ksm_evict_key() - Evict a key from the lower layer device.
|
||||
* @ksm: The keyslot manager to evict from
|
||||
* @key: The key to evict
|
||||
*
|
||||
* Find the keyslot that the specified key was programmed into, and evict that
|
||||
* slot from the lower layer device. The slot must not be in use by any
|
||||
* in-flight IO when this function is called.
|
||||
*
|
||||
* Context: Process context. Takes and releases ksm->lock.
|
||||
* Return: 0 on success or if there's no keyslot with the specified key, -EBUSY
|
||||
* if the keyslot is still in use, or another -errno value on other
|
||||
* error.
|
||||
*/
|
||||
int blk_ksm_evict_key(struct blk_keyslot_manager *ksm,
|
||||
const struct blk_crypto_key *key)
|
||||
{
|
||||
struct blk_ksm_keyslot *slot;
|
||||
int err = 0;
|
||||
|
||||
if (blk_ksm_is_passthrough(ksm)) {
|
||||
if (ksm->ksm_ll_ops.keyslot_evict) {
|
||||
blk_ksm_hw_enter(ksm);
|
||||
err = ksm->ksm_ll_ops.keyslot_evict(ksm, key, -1);
|
||||
blk_ksm_hw_exit(ksm);
|
||||
return err;
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
blk_ksm_hw_enter(ksm);
|
||||
slot = blk_ksm_find_keyslot(ksm, key);
|
||||
if (!slot)
|
||||
goto out_unlock;
|
||||
|
||||
if (WARN_ON_ONCE(atomic_read(&slot->slot_refs) != 0)) {
|
||||
err = -EBUSY;
|
||||
goto out_unlock;
|
||||
}
|
||||
err = ksm->ksm_ll_ops.keyslot_evict(ksm, key,
|
||||
blk_ksm_get_slot_idx(slot));
|
||||
if (err)
|
||||
goto out_unlock;
|
||||
|
||||
hlist_del(&slot->hash_node);
|
||||
slot->key = NULL;
|
||||
err = 0;
|
||||
out_unlock:
|
||||
blk_ksm_hw_exit(ksm);
|
||||
return err;
|
||||
}
|
||||
|
||||
/**
|
||||
* blk_ksm_reprogram_all_keys() - Re-program all keyslots.
|
||||
* @ksm: The keyslot manager
|
||||
*
|
||||
* Re-program all keyslots that are supposed to have a key programmed. This is
|
||||
* intended only for use by drivers for hardware that loses its keys on reset.
|
||||
*
|
||||
* Context: Process context. Takes and releases ksm->lock.
|
||||
*/
|
||||
void blk_ksm_reprogram_all_keys(struct blk_keyslot_manager *ksm)
|
||||
{
|
||||
unsigned int slot;
|
||||
|
||||
if (blk_ksm_is_passthrough(ksm))
|
||||
return;
|
||||
|
||||
/* This is for device initialization, so don't resume the device */
|
||||
down_write(&ksm->lock);
|
||||
for (slot = 0; slot < ksm->num_slots; slot++) {
|
||||
const struct blk_crypto_key *key = ksm->slots[slot].key;
|
||||
int err;
|
||||
|
||||
if (!key)
|
||||
continue;
|
||||
|
||||
err = ksm->ksm_ll_ops.keyslot_program(ksm, key, slot);
|
||||
WARN_ON(err);
|
||||
}
|
||||
up_write(&ksm->lock);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_ksm_reprogram_all_keys);
|
||||
|
||||
void blk_ksm_destroy(struct blk_keyslot_manager *ksm)
|
||||
{
|
||||
if (!ksm)
|
||||
return;
|
||||
kvfree(ksm->slot_hashtable);
|
||||
kvfree_sensitive(ksm->slots, sizeof(ksm->slots[0]) * ksm->num_slots);
|
||||
memzero_explicit(ksm, sizeof(*ksm));
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_ksm_destroy);
|
||||
|
||||
bool blk_ksm_register(struct blk_keyslot_manager *ksm, struct request_queue *q)
|
||||
{
|
||||
if (blk_integrity_queue_supports_integrity(q)) {
|
||||
pr_warn("Integrity and hardware inline encryption are not supported together. Disabling hardware inline encryption.\n");
|
||||
return false;
|
||||
}
|
||||
q->ksm = ksm;
|
||||
return true;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_ksm_register);
|
||||
|
||||
void blk_ksm_unregister(struct request_queue *q)
|
||||
{
|
||||
q->ksm = NULL;
|
||||
}
|
||||
|
||||
/**
|
||||
* blk_ksm_intersect_modes() - restrict supported modes by child device
|
||||
* @parent: The keyslot manager for parent device
|
||||
* @child: The keyslot manager for child device, or NULL
|
||||
*
|
||||
* Clear any crypto mode support bits in @parent that aren't set in @child.
|
||||
* If @child is NULL, then all parent bits are cleared.
|
||||
*
|
||||
* Only use this when setting up the keyslot manager for a layered device,
|
||||
* before it's been exposed yet.
|
||||
*/
|
||||
void blk_ksm_intersect_modes(struct blk_keyslot_manager *parent,
|
||||
const struct blk_keyslot_manager *child)
|
||||
{
|
||||
if (child) {
|
||||
unsigned int i;
|
||||
|
||||
parent->max_dun_bytes_supported =
|
||||
min(parent->max_dun_bytes_supported,
|
||||
child->max_dun_bytes_supported);
|
||||
for (i = 0; i < ARRAY_SIZE(child->crypto_modes_supported);
|
||||
i++) {
|
||||
parent->crypto_modes_supported[i] &=
|
||||
child->crypto_modes_supported[i];
|
||||
}
|
||||
} else {
|
||||
parent->max_dun_bytes_supported = 0;
|
||||
memset(parent->crypto_modes_supported, 0,
|
||||
sizeof(parent->crypto_modes_supported));
|
||||
}
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_ksm_intersect_modes);
|
||||
|
||||
/**
|
||||
* blk_ksm_is_superset() - Check if a KSM supports a superset of crypto modes
|
||||
* and DUN bytes that another KSM supports. Here,
|
||||
* "superset" refers to the mathematical meaning of the
|
||||
* word - i.e. if two KSMs have the *same* capabilities,
|
||||
* they *are* considered supersets of each other.
|
||||
* @ksm_superset: The KSM that we want to verify is a superset
|
||||
* @ksm_subset: The KSM that we want to verify is a subset
|
||||
*
|
||||
* Return: True if @ksm_superset supports a superset of the crypto modes and DUN
|
||||
* bytes that @ksm_subset supports.
|
||||
*/
|
||||
bool blk_ksm_is_superset(struct blk_keyslot_manager *ksm_superset,
|
||||
struct blk_keyslot_manager *ksm_subset)
|
||||
{
|
||||
int i;
|
||||
|
||||
if (!ksm_subset)
|
||||
return true;
|
||||
|
||||
if (!ksm_superset)
|
||||
return false;
|
||||
|
||||
for (i = 0; i < ARRAY_SIZE(ksm_superset->crypto_modes_supported); i++) {
|
||||
if (ksm_subset->crypto_modes_supported[i] &
|
||||
(~ksm_superset->crypto_modes_supported[i])) {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
if (ksm_subset->max_dun_bytes_supported >
|
||||
ksm_superset->max_dun_bytes_supported) {
|
||||
return false;
|
||||
}
|
||||
|
||||
return true;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_ksm_is_superset);
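blk_ksm_intersect_modes() and blk_ksm_is_superset() are both per-mode bitmask operations: intersection is an AND, and the superset test looks for any bit set in the subset but clear in the superset. A minimal user-space sketch follows. `struct sketch_caps`, `SKETCH_MODE_MAX`, `sketch_intersect()` and `sketch_is_superset()` are hypothetical names, and the mode count of 4 is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MODE_MAX 4	/* hypothetical number of crypto modes */

/* Hypothetical capability record: one size bitmask per crypto mode. */
struct sketch_caps {
	unsigned int modes_supported[SKETCH_MODE_MAX];
	unsigned int max_dun_bytes;
};

/* Restrict parent to what child also supports; NULL child clears all. */
static void sketch_intersect(struct sketch_caps *parent,
			     const struct sketch_caps *child)
{
	size_t i;

	if (!child) {
		memset(parent, 0, sizeof(*parent));
		return;
	}
	if (child->max_dun_bytes < parent->max_dun_bytes)
		parent->max_dun_bytes = child->max_dun_bytes;
	for (i = 0; i < SKETCH_MODE_MAX; i++)
		parent->modes_supported[i] &= child->modes_supported[i];
}

/* True if superset offers every capability that subset offers. */
static bool sketch_is_superset(const struct sketch_caps *superset,
			       const struct sketch_caps *subset)
{
	size_t i;

	if (!subset)
		return true;
	if (!superset)
		return false;
	for (i = 0; i < SKETCH_MODE_MAX; i++) {
		/* Any bit in subset that superset lacks fails the test. */
		if (subset->modes_supported[i] & ~superset->modes_supported[i])
			return false;
	}
	return subset->max_dun_bytes <= superset->max_dun_bytes;
}
```

Two records with identical capabilities are supersets of each other, which matches the "mathematical meaning" noted in the kernel-doc comment.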
|
||||
|
||||
/**
|
||||
* blk_ksm_update_capabilities() - Update the restrictions of a KSM to those of
|
||||
* another KSM
|
||||
* @target_ksm: The KSM whose restrictions to update.
|
||||
* @reference_ksm: The KSM to whose restrictions this function will update
|
||||
* @target_ksm's restrictions to.
|
||||
*
|
||||
* Blk-crypto requires that crypto capabilities that were
|
||||
* advertised when a bio was created continue to be supported by the
|
||||
* device until that bio is ended. This in turn means that a device cannot
|
||||
* shrink its advertised crypto capabilities without any explicit
|
||||
* synchronization with upper layers. So if there's no such explicit
|
||||
* synchronization, @reference_ksm must support all the crypto capabilities that
|
||||
* @target_ksm does
|
||||
* (i.e. we need blk_ksm_is_superset(@reference_ksm, @target_ksm) == true).
|
||||
*
|
||||
* Note also that as long as the crypto capabilities are being expanded, the
|
||||
* order of updates becoming visible is not important because it's alright
|
||||
* for blk-crypto to see stale values - they only cause blk-crypto to
|
||||
* believe that a crypto capability isn't supported when it actually is (which
|
||||
* might result in blk-crypto-fallback being used if available, or the bio being
|
||||
* failed).
|
||||
*/
|
||||
void blk_ksm_update_capabilities(struct blk_keyslot_manager *target_ksm,
|
||||
struct blk_keyslot_manager *reference_ksm)
|
||||
{
|
||||
memcpy(target_ksm->crypto_modes_supported,
|
||||
reference_ksm->crypto_modes_supported,
|
||||
sizeof(target_ksm->crypto_modes_supported));
|
||||
|
||||
target_ksm->max_dun_bytes_supported =
|
||||
reference_ksm->max_dun_bytes_supported;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_ksm_update_capabilities);
|
||||
|
||||
/**
|
||||
* blk_ksm_init_passthrough() - Init a passthrough keyslot manager
|
||||
* @ksm: The keyslot manager to init
|
||||
*
|
||||
* Initialize a passthrough keyslot manager.
|
||||
* Called by e.g. storage drivers to set up a keyslot manager in their
|
||||
* request_queue, when the storage driver wants to manage its keys by itself.
|
||||
* This is useful for inline encryption hardware that doesn't have the concept
|
||||
* of keyslots, and for layered devices.
|
||||
*/
|
||||
void blk_ksm_init_passthrough(struct blk_keyslot_manager *ksm)
|
||||
{
|
||||
memset(ksm, 0, sizeof(*ksm));
|
||||
init_rwsem(&ksm->lock);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(blk_ksm_init_passthrough);
|
@@ -9,12 +9,12 @@
|
||||
#include <linux/kernel.h>
|
||||
#include <linux/blkdev.h>
|
||||
#include <linux/blk-mq.h>
|
||||
#include <linux/elevator.h>
|
||||
#include <linux/module.h>
|
||||
#include <linux/sbitmap.h>
|
||||
|
||||
#include <trace/events/block.h>
|
||||
|
||||
#include "elevator.h"
|
||||
#include "blk.h"
|
||||
#include "blk-mq.h"
|
||||
#include "blk-mq-debugfs.h"
|
||||
@@ -453,11 +453,11 @@ static void kyber_depth_updated(struct blk_mq_hw_ctx *hctx)
|
||||
{
|
||||
struct kyber_queue_data *kqd = hctx->queue->elevator->elevator_data;
|
||||
struct blk_mq_tags *tags = hctx->sched_tags;
|
||||
unsigned int shift = tags->bitmap_tags->sb.shift;
|
||||
unsigned int shift = tags->bitmap_tags.sb.shift;
|
||||
|
||||
kqd->async_depth = (1U << shift) * KYBER_ASYNC_PERCENT / 100U;
|
||||
|
||||
sbitmap_queue_min_shallow_depth(tags->bitmap_tags, kqd->async_depth);
|
||||
sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, kqd->async_depth);
|
||||
}
|
||||
|
||||
static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
|
||||
|
@@ -9,7 +9,6 @@
|
||||
#include <linux/fs.h>
|
||||
#include <linux/blkdev.h>
|
||||
#include <linux/blk-mq.h>
|
||||
#include <linux/elevator.h>
|
||||
#include <linux/bio.h>
|
||||
#include <linux/module.h>
|
||||
#include <linux/slab.h>
|
||||
@@ -20,6 +19,7 @@
|
||||
|
||||
#include <trace/events/block.h>
|
||||
|
||||
#include "elevator.h"
|
||||
#include "blk.h"
|
||||
#include "blk-mq.h"
|
||||
#include "blk-mq-debugfs.h"
|
||||
@@ -31,6 +31,11 @@
|
||||
*/
|
||||
static const int read_expire = HZ / 2; /* max time before a read is submitted. */
|
||||
static const int write_expire = 5 * HZ; /* ditto for writes, these limits are SOFT! */
|
||||
/*
|
||||
* Time after which to dispatch lower priority requests even if higher
|
||||
* priority requests are pending.
|
||||
*/
|
||||
static const int prio_aging_expire = 10 * HZ;
|
||||
static const int writes_starved = 2; /* max times reads can starve a write */
|
||||
static const int fifo_batch = 16; /* # of sequential requests treated as one
|
||||
by the above parameters. For throughput. */
|
||||
@@ -51,17 +56,16 @@ enum dd_prio {
|
||||
|
||||
enum { DD_PRIO_COUNT = 3 };
|
||||
|
||||
/* I/O statistics per I/O priority. */
|
||||
/*
|
||||
* I/O statistics per I/O priority. It is fine if these counters overflow.
|
||||
* What matters is that these counters are at least as wide as
|
||||
* log2(max_outstanding_requests).
|
||||
*/
|
||||
struct io_stats_per_prio {
|
||||
local_t inserted;
|
||||
local_t merged;
|
||||
local_t dispatched;
|
||||
local_t completed;
|
||||
};
|
||||
|
||||
/* I/O statistics for all I/O priorities (enum dd_prio). */
|
||||
struct io_stats {
|
||||
struct io_stats_per_prio stats[DD_PRIO_COUNT];
|
||||
uint32_t inserted;
|
||||
uint32_t merged;
|
||||
uint32_t dispatched;
|
||||
atomic_t completed;
|
||||
};
|
||||
|
||||
/*
|
||||
@@ -74,6 +78,7 @@ struct dd_per_prio {
 	struct list_head fifo_list[DD_DIR_COUNT];
 	/* Next request in FIFO order. Read, write or both are NULL. */
 	struct request *next_rq[DD_DIR_COUNT];
+	struct io_stats_per_prio stats;
 };
 
 struct deadline_data {
@@ -88,8 +93,6 @@ struct deadline_data {
 	unsigned int batching; /* number of sequential requests made */
 	unsigned int starved; /* times reads have starved writes */
 
-	struct io_stats __percpu *stats;
-
 	/*
 	 * settings that change how the i/o scheduler behaves
 	 */
@@ -98,38 +101,12 @@ struct deadline_data {
 	int writes_starved;
 	int front_merges;
 	u32 async_depth;
+	int prio_aging_expire;
 
 	spinlock_t lock;
 	spinlock_t zone_lock;
 };
 
-/* Count one event of type 'event_type' and with I/O priority 'prio' */
-#define dd_count(dd, event_type, prio) do {				\
-	struct io_stats *io_stats = get_cpu_ptr((dd)->stats);		\
-									\
-	BUILD_BUG_ON(!__same_type((dd), struct deadline_data *));	\
-	BUILD_BUG_ON(!__same_type((prio), enum dd_prio));		\
-	local_inc(&io_stats->stats[(prio)].event_type);			\
-	put_cpu_ptr(io_stats);						\
-} while (0)
-
-/*
- * Returns the total number of dd_count(dd, event_type, prio) calls across all
- * CPUs. No locking or barriers since it is fine if the returned sum is slightly
- * outdated.
- */
-#define dd_sum(dd, event_type, prio) ({					\
-	unsigned int cpu;						\
-	u32 sum = 0;							\
-									\
-	BUILD_BUG_ON(!__same_type((dd), struct deadline_data *));	\
-	BUILD_BUG_ON(!__same_type((prio), enum dd_prio));		\
-	for_each_present_cpu(cpu)					\
-		sum += local_read(&per_cpu_ptr((dd)->stats, cpu)->	\
-				  stats[(prio)].event_type);		\
-	sum;								\
-})
-
 /* Maps an I/O priority class to a deadline scheduler priority. */
 static const enum dd_prio ioprio_class_to_prio[] = {
 	[IOPRIO_CLASS_NONE]	= DD_BE_PRIO,
@@ -233,7 +210,9 @@ static void dd_merged_requests(struct request_queue *q, struct request *req,
 	const u8 ioprio_class = dd_rq_ioclass(next);
 	const enum dd_prio prio = ioprio_class_to_prio[ioprio_class];
 
-	dd_count(dd, merged, prio);
+	lockdep_assert_held(&dd->lock);
+
+	dd->per_prio[prio].stats.merged++;
 
 	/*
 	 * if next expires before rq, assign its expire time to rq
@@ -270,6 +249,16 @@ deadline_move_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 	deadline_remove_request(rq->q, per_prio, rq);
 }
 
+/* Number of requests queued for a given priority level. */
+static u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
+{
+	const struct io_stats_per_prio *stats = &dd->per_prio[prio].stats;
+
+	lockdep_assert_held(&dd->lock);
+
+	return stats->inserted - atomic_read(&stats->completed);
+}
+
 /*
  * deadline_check_fifo returns 0 if there are no expired requests on the fifo,
  * 1 otherwise. Requires !list_empty(&dd->fifo_list[data_dir])
@@ -355,12 +344,27 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 	return rq;
 }
 
+/*
+ * Returns true if and only if @rq started after @latest_start where
+ * @latest_start is in jiffies.
+ */
+static bool started_after(struct deadline_data *dd, struct request *rq,
+			  unsigned long latest_start)
+{
+	unsigned long start_time = (unsigned long)rq->fifo_time;
+
+	start_time -= dd->fifo_expire[rq_data_dir(rq)];
+
+	return time_after(start_time, latest_start);
+}
+
 /*
  * deadline_dispatch_requests selects the best request according to
- * read/write expire, fifo_batch, etc
+ * read/write expire, fifo_batch, etc and with a start time <= @latest_start.
  */
 static struct request *__dd_dispatch_request(struct deadline_data *dd,
-					     struct dd_per_prio *per_prio)
+					     struct dd_per_prio *per_prio,
+					     unsigned long latest_start)
 {
 	struct request *rq, *next_rq;
 	enum dd_data_dir data_dir;
||||
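Review note: `started_after()` in the hunk above recovers a request's insertion time by subtracting the expiry interval from its FIFO deadline, then compares it with a wrap-safe jiffies test. A minimal user-space sketch of that idea follows. All names ending in `_sketch` are hypothetical stand-ins, and the comparison is written the way the kernel's `time_after()` is commonly described (signed difference), which should be treated as an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Wrap-safe "a is later than b" test on a free-running tick counter. */
static bool time_after_sketch(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical flattened request: only the two fields the check needs. */
struct rq_sketch {
	unsigned long fifo_time;   /* insertion time + expiry interval */
	unsigned long fifo_expire; /* expiry interval, in ticks */
};

/* True if the request was inserted strictly after latest_start. */
static bool started_after_sketch(const struct rq_sketch *rq,
				 unsigned long latest_start)
{
	unsigned long start_time;

	assert(rq != NULL);
	start_time = rq->fifo_time - rq->fifo_expire;

	return time_after_sketch(start_time, latest_start);
}
```

Because the comparison uses the signed difference, it keeps working when the tick counter wraps around, as long as the two timestamps are less than half the counter range apart.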
@@ -372,6 +376,8 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
 	if (!list_empty(&per_prio->dispatch)) {
 		rq = list_first_entry(&per_prio->dispatch, struct request,
 				      queuelist);
+		if (started_after(dd, rq, latest_start))
+			return NULL;
 		list_del_init(&rq->queuelist);
 		goto done;
 	}
@@ -449,6 +455,9 @@ dispatch_find_request:
 	dd->batching = 0;
 
 dispatch_request:
+	if (started_after(dd, rq, latest_start))
+		return NULL;
+
 	/*
 	 * rq is the selected appropriate request.
 	 */
@@ -457,7 +466,7 @@ dispatch_request:
 done:
 	ioprio_class = dd_rq_ioclass(rq);
 	prio = ioprio_class_to_prio[ioprio_class];
-	dd_count(dd, dispatched, prio);
+	dd->per_prio[prio].stats.dispatched++;
 	/*
 	 * If the request needs its target zone locked, do it.
 	 */
@@ -466,6 +475,34 @@ done:
 	return rq;
 }
 
+/*
+ * Check whether there are any requests with priority other than DD_RT_PRIO
+ * that were inserted more than prio_aging_expire jiffies ago.
+ */
+static struct request *dd_dispatch_prio_aged_requests(struct deadline_data *dd,
+						      unsigned long now)
+{
+	struct request *rq;
+	enum dd_prio prio;
+	int prio_cnt;
+
+	lockdep_assert_held(&dd->lock);
+
+	prio_cnt = !!dd_queued(dd, DD_RT_PRIO) + !!dd_queued(dd, DD_BE_PRIO) +
+		   !!dd_queued(dd, DD_IDLE_PRIO);
+	if (prio_cnt < 2)
+		return NULL;
+
+	for (prio = DD_BE_PRIO; prio <= DD_PRIO_MAX; prio++) {
+		rq = __dd_dispatch_request(dd, &dd->per_prio[prio],
+					   now - dd->prio_aging_expire);
+		if (rq)
+			return rq;
+	}
+
+	return NULL;
+}
+
 /*
  * Called from blk_mq_run_hw_queue() -> __blk_mq_sched_dispatch_requests().
  *
@@ -477,15 +514,26 @@ done:
 static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 {
 	struct deadline_data *dd = hctx->queue->elevator->elevator_data;
+	const unsigned long now = jiffies;
 	struct request *rq;
 	enum dd_prio prio;
 
 	spin_lock(&dd->lock);
+	rq = dd_dispatch_prio_aged_requests(dd, now);
+	if (rq)
+		goto unlock;
+
+	/*
+	 * Next, dispatch requests in priority order. Ignore lower priority
+	 * requests if any higher priority requests are pending.
+	 */
 	for (prio = 0; prio <= DD_PRIO_MAX; prio++) {
-		rq = __dd_dispatch_request(dd, &dd->per_prio[prio]);
-		if (rq)
+		rq = __dd_dispatch_request(dd, &dd->per_prio[prio], now);
+		if (rq || dd_queued(dd, prio))
 			break;
 	}
+
+unlock:
 	spin_unlock(&dd->lock);
 
 	return rq;
@@ -519,7 +567,7 @@ static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
 
 	dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
 
-	sbitmap_queue_min_shallow_depth(tags->bitmap_tags, dd->async_depth);
+	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
 }
 
 /* Called by blk_mq_init_hctx() and blk_mq_init_sched(). */
@@ -536,12 +584,21 @@ static void dd_exit_sched(struct elevator_queue *e)
 
 	for (prio = 0; prio <= DD_PRIO_MAX; prio++) {
 		struct dd_per_prio *per_prio = &dd->per_prio[prio];
+		const struct io_stats_per_prio *stats = &per_prio->stats;
+		uint32_t queued;
 
 		WARN_ON_ONCE(!list_empty(&per_prio->fifo_list[DD_READ]));
 		WARN_ON_ONCE(!list_empty(&per_prio->fifo_list[DD_WRITE]));
-	}
 
-	free_percpu(dd->stats);
+		spin_lock(&dd->lock);
+		queued = dd_queued(dd, prio);
+		spin_unlock(&dd->lock);
+
+		WARN_ONCE(queued != 0,
+			  "statistics for priority %d: i %u m %u d %u c %u\n",
+			  prio, stats->inserted, stats->merged,
+			  stats->dispatched, atomic_read(&stats->completed));
+	}
 
 	kfree(dd);
 }
@@ -566,11 +623,6 @@ static int dd_init_sched(struct request_queue *q, struct elevator_type *e)
 
 	eq->elevator_data = dd;
 
-	dd->stats = alloc_percpu_gfp(typeof(*dd->stats),
-				     GFP_KERNEL | __GFP_ZERO);
-	if (!dd->stats)
-		goto free_dd;
-
 	for (prio = 0; prio <= DD_PRIO_MAX; prio++) {
 		struct dd_per_prio *per_prio = &dd->per_prio[prio];
 
@@ -586,15 +638,13 @@ static int dd_init_sched(struct request_queue *q, struct elevator_type *e)
 	dd->front_merges = 1;
 	dd->last_dir = DD_WRITE;
 	dd->fifo_batch = fifo_batch;
+	dd->prio_aging_expire = prio_aging_expire;
 	spin_lock_init(&dd->lock);
 	spin_lock_init(&dd->zone_lock);
 
 	q->elevator = eq;
 	return 0;
 
-free_dd:
-	kfree(dd);
-
 put_eq:
 	kobject_put(&eq->kobj);
 	return ret;
@@ -677,8 +727,11 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	blk_req_zone_write_unlock(rq);
 
 	prio = ioprio_class_to_prio[ioprio_class];
-	dd_count(dd, inserted, prio);
-	rq->elv.priv[0] = (void *)(uintptr_t)1;
+	per_prio = &dd->per_prio[prio];
+	if (!rq->elv.priv[0]) {
+		per_prio->stats.inserted++;
+		rq->elv.priv[0] = (void *)(uintptr_t)1;
+	}
 
 	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
 		blk_mq_free_requests(&free);
@@ -687,7 +740,6 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 
 	trace_block_rq_insert(rq);
 
-	per_prio = &dd->per_prio[prio];
 	if (at_head) {
 		list_add(&rq->queuelist, &per_prio->dispatch);
 	} else {
@@ -759,12 +811,13 @@ static void dd_finish_request(struct request *rq)
 
 	/*
 	 * The block layer core may call dd_finish_request() without having
-	 * called dd_insert_requests(). Hence only update statistics for
-	 * requests for which dd_insert_requests() has been called. See also
-	 * blk_mq_request_bypass_insert().
+	 * called dd_insert_requests(). Skip requests that bypassed I/O
+	 * scheduling. See also blk_mq_request_bypass_insert().
 	 */
-	if (rq->elv.priv[0])
-		dd_count(dd, completed, prio);
+	if (!rq->elv.priv[0])
+		return;
+
+	atomic_inc(&per_prio->stats.completed);
 
 	if (blk_queue_is_zoned(q)) {
 		unsigned long flags;
@@ -809,6 +862,7 @@ static ssize_t __FUNC(struct elevator_queue *e, char *page) \
 #define SHOW_JIFFIES(__FUNC, __VAR) SHOW_INT(__FUNC, jiffies_to_msecs(__VAR))
 SHOW_JIFFIES(deadline_read_expire_show, dd->fifo_expire[DD_READ]);
 SHOW_JIFFIES(deadline_write_expire_show, dd->fifo_expire[DD_WRITE]);
+SHOW_JIFFIES(deadline_prio_aging_expire_show, dd->prio_aging_expire);
 SHOW_INT(deadline_writes_starved_show, dd->writes_starved);
 SHOW_INT(deadline_front_merges_show, dd->front_merges);
 SHOW_INT(deadline_async_depth_show, dd->front_merges);
@@ -838,6 +892,7 @@ static ssize_t __FUNC(struct elevator_queue *e, const char *page, size_t count)
 	STORE_FUNCTION(__FUNC, __PTR, MIN, MAX, msecs_to_jiffies)
 STORE_JIFFIES(deadline_read_expire_store, &dd->fifo_expire[DD_READ], 0, INT_MAX);
 STORE_JIFFIES(deadline_write_expire_store, &dd->fifo_expire[DD_WRITE], 0, INT_MAX);
+STORE_JIFFIES(deadline_prio_aging_expire_store, &dd->prio_aging_expire, 0, INT_MAX);
 STORE_INT(deadline_writes_starved_store, &dd->writes_starved, INT_MIN, INT_MAX);
 STORE_INT(deadline_front_merges_store, &dd->front_merges, 0, 1);
 STORE_INT(deadline_async_depth_store, &dd->front_merges, 1, INT_MAX);
@@ -856,6 +911,7 @@ static struct elv_fs_entry deadline_attrs[] = {
 	DD_ATTR(front_merges),
 	DD_ATTR(async_depth),
 	DD_ATTR(fifo_batch),
+	DD_ATTR(prio_aging_expire),
 	__ATTR_NULL
 };
 
@@ -947,38 +1003,48 @@ static int dd_async_depth_show(void *data, struct seq_file *m)
 	return 0;
 }
 
-/* Number of requests queued for a given priority level. */
-static u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
-{
-	return dd_sum(dd, inserted, prio) - dd_sum(dd, completed, prio);
-}
-
 static int dd_queued_show(void *data, struct seq_file *m)
 {
 	struct request_queue *q = data;
 	struct deadline_data *dd = q->elevator->elevator_data;
+	u32 rt, be, idle;
+
+	spin_lock(&dd->lock);
+	rt = dd_queued(dd, DD_RT_PRIO);
+	be = dd_queued(dd, DD_BE_PRIO);
+	idle = dd_queued(dd, DD_IDLE_PRIO);
+	spin_unlock(&dd->lock);
+
+	seq_printf(m, "%u %u %u\n", rt, be, idle);
 
-	seq_printf(m, "%u %u %u\n", dd_queued(dd, DD_RT_PRIO),
-		   dd_queued(dd, DD_BE_PRIO),
-		   dd_queued(dd, DD_IDLE_PRIO));
 	return 0;
 }
 
 /* Number of requests owned by the block driver for a given priority. */
 static u32 dd_owned_by_driver(struct deadline_data *dd, enum dd_prio prio)
 {
-	return dd_sum(dd, dispatched, prio) + dd_sum(dd, merged, prio)
-		- dd_sum(dd, completed, prio);
+	const struct io_stats_per_prio *stats = &dd->per_prio[prio].stats;
+
+	lockdep_assert_held(&dd->lock);
+
+	return stats->dispatched + stats->merged -
+		atomic_read(&stats->completed);
 }
 
 static int dd_owned_by_driver_show(void *data, struct seq_file *m)
 {
 	struct request_queue *q = data;
 	struct deadline_data *dd = q->elevator->elevator_data;
+	u32 rt, be, idle;
+
+	spin_lock(&dd->lock);
+	rt = dd_owned_by_driver(dd, DD_RT_PRIO);
+	be = dd_owned_by_driver(dd, DD_BE_PRIO);
+	idle = dd_owned_by_driver(dd, DD_IDLE_PRIO);
+	spin_unlock(&dd->lock);
+
+	seq_printf(m, "%u %u %u\n", rt, be, idle);
 
-	seq_printf(m, "%u %u %u\n", dd_owned_by_driver(dd, DD_RT_PRIO),
-		   dd_owned_by_driver(dd, DD_BE_PRIO),
-		   dd_owned_by_driver(dd, DD_IDLE_PRIO));
 	return 0;
 }
 
|
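Review note: the comment added to `struct io_stats_per_prio` in the diff above says the counters may overflow. The reason is that `dd_queued()` only ever uses the difference of two counters, and unsigned subtraction is computed modulo 2^32. A minimal user-space sketch, with hypothetical names and plain `uint32_t` fields standing in for the kernel's `uint32_t`/`atomic_t` pair:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two counters dd_queued() reads. */
struct stats_sketch {
	uint32_t inserted;  /* requests handed to the scheduler */
	uint32_t completed; /* requests finished by the driver */
};

/*
 * Number of requests still queued. The subtraction is done modulo 2^32,
 * so the result stays correct after 'inserted' has wrapped past zero,
 * provided fewer than 2^32 requests are outstanding at once.
 */
static uint32_t queued_sketch(const struct stats_sketch *s)
{
	assert(s != NULL);

	return (uint32_t)(s->inserted - s->completed);
}
```

For example, with `completed == 0xFFFFFFFE` and `inserted` already wrapped to 5, the function still reports 7 queued requests.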
@@ -2,6 +2,8 @@
 #
 # Partition configuration
 #
+menu "Partition Types"
+
 config PARTITION_ADVANCED
 	bool "Advanced partition selection"
 	help
@@ -267,3 +269,5 @@ config CMDLINE_PARTITION
 	help
 	  Say Y here if you want to read the partition table from bootargs.
 	  The format for the command line is just like mtdparts.
+
+endmenu

@@ -5,6 +5,7 @@
  * Copyright (C) 2020 Christoph Hellwig
  */
 #include <linux/fs.h>
+#include <linux/major.h>
 #include <linux/slab.h>
 #include <linux/ctype.h>
 #include <linux/genhd.h>
@@ -203,7 +204,7 @@ static ssize_t part_alignment_offset_show(struct device *dev,
 	struct block_device *bdev = dev_to_bdev(dev);
 
 	return sprintf(buf, "%u\n",
-		queue_limit_alignment_offset(&bdev->bd_disk->queue->limits,
+		queue_limit_alignment_offset(&bdev_get_queue(bdev)->limits,
 				bdev->bd_start_sect));
 }
 
@@ -213,7 +214,7 @@ static ssize_t part_discard_alignment_show(struct device *dev,
 	struct block_device *bdev = dev_to_bdev(dev);
 
 	return sprintf(buf, "%u\n",
-		queue_limit_discard_alignment(&bdev->bd_disk->queue->limits,
+		queue_limit_discard_alignment(&bdev_get_queue(bdev)->limits,
 				bdev->bd_start_sect));
 }
 

@@ -5,7 +5,7 @@
  */
 
 #include <linux/t10-pi.h>
-#include <linux/blkdev.h>
+#include <linux/blk-integrity.h>
 #include <linux/crc-t10dif.h>
 #include <linux/module.h>
 #include <net/checksum.h>

@@ -61,10 +61,10 @@
 #include <linux/hdreg.h>
 #include <linux/delay.h>
 #include <linux/init.h>
+#include <linux/major.h>
 #include <linux/mutex.h>
 #include <linux/fs.h>
 #include <linux/blk-mq.h>
-#include <linux/elevator.h>
 #include <linux/interrupt.h>
 #include <linux/platform_device.h>
 

@@ -68,6 +68,7 @@
 #include <linux/delay.h>
 #include <linux/init.h>
 #include <linux/blk-mq.h>
+#include <linux/major.h>
 #include <linux/mutex.h>
 #include <linux/completion.h>
 #include <linux/wait.h>

@@ -282,7 +282,7 @@ out:
 	return err;
 }
 
-static blk_qc_t brd_submit_bio(struct bio *bio)
+static void brd_submit_bio(struct bio *bio)
 {
 	struct brd_device *brd = bio->bi_bdev->bd_disk->private_data;
 	sector_t sector = bio->bi_iter.bi_sector;
@@ -299,16 +299,14 @@ static blk_qc_t brd_submit_bio(struct bio *bio)
 
 		err = brd_do_bvec(brd, bvec.bv_page, len, bvec.bv_offset,
 				  bio_op(bio), sector);
-		if (err)
-			goto io_error;
+		if (err) {
+			bio_io_error(bio);
+			return;
+		}
 		sector += len >> SECTOR_SHIFT;
 	}
 
 	bio_endio(bio);
-	return BLK_QC_T_NONE;
-io_error:
-	bio_io_error(bio);
-	return BLK_QC_T_NONE;
 }
 
 static int brd_rw_page(struct block_device *bdev, sector_t sector,
|
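Review note: the `->submit_bio` conversions in this series all follow the shape seen in the brd diff above. Once the handler returns `void`, the shared `io_error:` label that existed only to return `BLK_QC_T_NONE` is no longer needed, and the error path can complete the bio and return inline. The sketch below models that control flow in user space. Every type and function in it is a hypothetical stand-in, not the kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio: a list of per-segment outcomes. */
struct bio_sketch {
	const bool *segment_ok; /* true if the segment can be processed */
	size_t nr_segments;
	bool completed;         /* set once the bio has been ended */
	bool failed;            /* set if it was ended with an error */
};

/* Stand-in for bio_endio(): mark the bio as completed. */
static void bio_endio_sketch(struct bio_sketch *bio)
{
	bio->completed = true;
}

/* Stand-in for bio_io_error(): record the error, then complete. */
static void bio_io_error_sketch(struct bio_sketch *bio)
{
	bio->failed = true;
	bio_endio_sketch(bio);
}

/* void handler: fail inline and return early, no cookie to hand back. */
static void submit_bio_sketch(struct bio_sketch *bio)
{
	assert(bio != NULL);

	for (size_t i = 0; i < bio->nr_segments; i++) {
		if (!bio->segment_ok[i]) {
			bio_io_error_sketch(bio);
			return;
		}
	}

	bio_endio_sketch(bio);
}
```

Either way the bio is completed exactly once, which is the invariant the original `goto`-based version also maintained.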
@@ -1448,7 +1448,7 @@ extern void conn_free_crypto(struct drbd_connection *connection);
 /* drbd_req */
 extern void do_submit(struct work_struct *ws);
 extern void __drbd_make_request(struct drbd_device *, struct bio *);
-extern blk_qc_t drbd_submit_bio(struct bio *bio);
+void drbd_submit_bio(struct bio *bio);
 extern int drbd_read_remote(struct drbd_device *device, struct drbd_request *req);
 extern int is_valid_ar_handle(struct drbd_request *, sector_t);
 

@@ -1596,7 +1596,7 @@ void do_submit(struct work_struct *ws)
 	}
 }
 
-blk_qc_t drbd_submit_bio(struct bio *bio)
+void drbd_submit_bio(struct bio *bio)
 {
 	struct drbd_device *device = bio->bi_bdev->bd_disk->private_data;
 
@@ -1609,7 +1609,6 @@ blk_qc_t drbd_submit_bio(struct bio *bio)
 
 	inc_ap_bio(device);
 	__drbd_make_request(device, bio);
-	return BLK_QC_T_NONE;
 }
 
 static bool net_timeout_reached(struct drbd_request *net_req,

@@ -184,6 +184,7 @@ static int print_unex = 1;
 #include <linux/ioport.h>
 #include <linux/interrupt.h>
 #include <linux/init.h>
+#include <linux/major.h>
 #include <linux/platform_device.h>
 #include <linux/mod_devicetable.h>
 #include <linux/mutex.h>

@@ -272,19 +272,6 @@ static void __loop_update_dio(struct loop_device *lo, bool dio)
 	blk_mq_unfreeze_queue(lo->lo_queue);
 }
 
-/**
- * loop_validate_block_size() - validates the passed in block size
- * @bsize: size to validate
- */
-static int
-loop_validate_block_size(unsigned short bsize)
-{
-	if (bsize < 512 || bsize > PAGE_SIZE || !is_power_of_2(bsize))
-		return -EINVAL;
-
-	return 0;
-}
-
 /**
  * loop_set_size() - sets device size and notifies userspace
  * @lo: struct loop_device to set the size for
@@ -1236,7 +1223,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
 	}
 
 	if (config->block_size) {
-		error = loop_validate_block_size(config->block_size);
+		error = blk_validate_block_size(config->block_size);
 		if (error)
 			goto out_unlock;
 	}
@@ -1329,7 +1316,6 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
 {
 	struct file *filp = NULL;
 	gfp_t gfp = lo->old_gfp_mask;
-	struct block_device *bdev = lo->lo_device;
 	int err = 0;
 	bool partscan = false;
 	int lo_number;
@@ -1395,22 +1381,16 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
 	blk_queue_logical_block_size(lo->lo_queue, 512);
 	blk_queue_physical_block_size(lo->lo_queue, 512);
 	blk_queue_io_min(lo->lo_queue, 512);
-	if (bdev) {
-		invalidate_bdev(bdev);
-		bdev->bd_inode->i_mapping->wb_err = 0;
-	}
-	set_capacity(lo->lo_disk, 0);
+	invalidate_disk(lo->lo_disk);
 	loop_sysfs_exit(lo);
-	if (bdev) {
-		/* let user-space know about this change */
-		kobject_uevent(&disk_to_dev(bdev->bd_disk)->kobj, KOBJ_CHANGE);
-	}
+	/* let user-space know about this change */
+	kobject_uevent(&disk_to_dev(lo->lo_disk)->kobj, KOBJ_CHANGE);
 	mapping_set_gfp_mask(filp->f_mapping, gfp);
 	/* This is safe: open() is still holding a reference. */
 	module_put(THIS_MODULE);
 	blk_mq_unfreeze_queue(lo->lo_queue);
 
-	partscan = lo->lo_flags & LO_FLAGS_PARTSCAN && bdev;
+	partscan = lo->lo_flags & LO_FLAGS_PARTSCAN;
 	lo_number = lo->lo_number;
 	disk_force_media_change(lo->lo_disk, DISK_EVENT_MEDIA_CHANGE);
 out_unlock:
@@ -1759,7 +1739,7 @@ static int loop_set_block_size(struct loop_device *lo, unsigned long arg)
 	if (lo->lo_state != Lo_bound)
 		return -ENXIO;
 
-	err = loop_validate_block_size(arg);
+	err = blk_validate_block_size(arg);
 	if (err)
 		return err;
 
|
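Review note: the loop diff above drops the driver-private `loop_validate_block_size()` in favour of the shared `blk_validate_block_size()`. The body of the shared helper is not part of this diff, so the sketch below only restates the rule visible in the removed loop code: at least 512 bytes, at most one page, and a power of two. The `_sketch` names and the 4096-byte page size are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages; the kernel uses the architecture's PAGE_SIZE. */
#define PAGE_SIZE_SKETCH 4096UL

/* The accepted range [512, PAGE_SIZE] must not be empty. */
static_assert(PAGE_SIZE_SKETCH >= 512, "page size smaller than a sector");

/* True for 1, 2, 4, 8, ...; false for 0 and everything else. */
static bool is_power_of_2_sketch(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Returns 0 for an acceptable block size and -EINVAL otherwise. */
static int validate_block_size_sketch(unsigned long bsize)
{
	if (bsize < 512 || bsize > PAGE_SIZE_SKETCH ||
	    !is_power_of_2_sketch(bsize))
		return -EINVAL;

	return 0;
}
```

Centralising the check means loop, nbd and virtio-blk reject the same set of sizes instead of each carrying its own copy of the condition.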
@@ -84,7 +84,7 @@ static bool n64cart_do_bvec(struct device *dev, struct bio_vec *bv, u32 pos)
 	return true;
 }
 
-static blk_qc_t n64cart_submit_bio(struct bio *bio)
+static void n64cart_submit_bio(struct bio *bio)
 {
 	struct bio_vec bvec;
 	struct bvec_iter iter;
@@ -92,16 +92,14 @@ static blk_qc_t n64cart_submit_bio(struct bio *bio)
 	u32 pos = bio->bi_iter.bi_sector << SECTOR_SHIFT;
 
 	bio_for_each_segment(bvec, bio, iter) {
-		if (!n64cart_do_bvec(dev, &bvec, pos))
-			goto io_error;
+		if (!n64cart_do_bvec(dev, &bvec, pos)) {
+			bio_io_error(bio);
+			return;
+		}
 		pos += bvec.bv_len;
 	}
 
 	bio_endio(bio);
-	return BLK_QC_T_NONE;
-io_error:
-	bio_io_error(bio);
-	return BLK_QC_T_NONE;
 }
 
 static const struct block_device_operations n64cart_fops = {

@@ -310,20 +310,13 @@ static void nbd_mark_nsock_dead(struct nbd_device *nbd, struct nbd_sock *nsock,
 	nsock->sent = 0;
 }
 
-static void nbd_size_clear(struct nbd_device *nbd)
-{
-	if (nbd->config->bytesize) {
-		set_capacity(nbd->disk, 0);
-		kobject_uevent(&nbd_to_dev(nbd)->kobj, KOBJ_CHANGE);
-	}
-}
-
 static int nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
 		loff_t blksize)
 {
 	if (!blksize)
 		blksize = 1u << NBD_DEF_BLKSIZE_BITS;
-	if (blksize < 512 || blksize > PAGE_SIZE || !is_power_of_2(blksize))
+
+	if (blk_validate_block_size(blksize))
 		return -EINVAL;
 
 	nbd->config->bytesize = bytesize;
@@ -1237,7 +1230,9 @@ static void nbd_config_put(struct nbd_device *nbd)
 					&nbd->config_lock)) {
 		struct nbd_config *config = nbd->config;
 		nbd_dev_dbg_close(nbd);
-		nbd_size_clear(nbd);
+		invalidate_disk(nbd->disk);
+		if (nbd->config->bytesize)
+			kobject_uevent(&nbd_to_dev(nbd)->kobj, KOBJ_CHANGE);
 		if (test_and_clear_bit(NBD_RT_HAS_PID_FILE,
 				       &config->runtime_flags))
 			device_remove_file(disk_to_dev(nbd->disk), &pid_attr);

@@ -1422,7 +1422,7 @@ static struct nullb_queue *nullb_to_queue(struct nullb *nullb)
 	return &nullb->queues[index];
 }
 
-static blk_qc_t null_submit_bio(struct bio *bio)
+static void null_submit_bio(struct bio *bio)
 {
 	sector_t sector = bio->bi_iter.bi_sector;
 	sector_t nr_sectors = bio_sectors(bio);
@@ -1434,7 +1434,6 @@ static blk_qc_t null_submit_bio(struct bio *bio)
 	cmd->bio = bio;
 
 	null_handle_cmd(cmd, sector, nr_sectors, bio_op(bio));
-	return BLK_QC_T_NONE;
 }
 
 static bool should_timeout_request(struct request *rq)

@@ -2400,7 +2400,7 @@ static void pkt_make_request_write(struct request_queue *q, struct bio *bio)
 	}
 }
 
-static blk_qc_t pkt_submit_bio(struct bio *bio)
+static void pkt_submit_bio(struct bio *bio)
 {
 	struct pktcdvd_device *pd;
 	char b[BDEVNAME_SIZE];
@@ -2423,7 +2423,7 @@ static blk_qc_t pkt_submit_bio(struct bio *bio)
 	 */
 	if (bio_data_dir(bio) == READ) {
 		pkt_make_request_read(pd, bio);
-		return BLK_QC_T_NONE;
+		return;
 	}
 
 	if (!test_bit(PACKET_WRITABLE, &pd->flags)) {
@@ -2455,10 +2455,9 @@ static blk_qc_t pkt_submit_bio(struct bio *bio)
 		pkt_make_request_write(bio->bi_bdev->bd_disk->queue, split);
 	} while (split != bio);
 
-	return BLK_QC_T_NONE;
+	return;
 end_io:
 	bio_io_error(bio);
-	return BLK_QC_T_NONE;
 }
 
 static void pkt_init_queue(struct pktcdvd_device *pd)

@@ -578,7 +578,7 @@ out:
 	return next;
 }
 
-static blk_qc_t ps3vram_submit_bio(struct bio *bio)
+static void ps3vram_submit_bio(struct bio *bio)
 {
 	struct ps3_system_bus_device *dev = bio->bi_bdev->bd_disk->private_data;
 	struct ps3vram_priv *priv = ps3_system_bus_get_drvdata(dev);
@@ -594,13 +594,11 @@ static blk_qc_t ps3vram_submit_bio(struct bio *bio)
 	spin_unlock_irq(&priv->lock);
 
 	if (busy)
-		return BLK_QC_T_NONE;
+		return;
 
 	do {
 		bio = ps3vram_do_bio(dev, bio);
 	} while (bio);
-
-	return BLK_QC_T_NONE;
 }
 
 static const struct block_device_operations ps3vram_fops = {

@@ -836,7 +836,7 @@ struct rbd_options {
 	u32 alloc_hint_flags; /* CEPH_OSD_OP_ALLOC_HINT_FLAG_* */
 };
 
-#define RBD_QUEUE_DEPTH_DEFAULT BLKDEV_MAX_RQ
+#define RBD_QUEUE_DEPTH_DEFAULT BLKDEV_DEFAULT_RQ
 #define RBD_ALLOC_SIZE_DEFAULT (64 * 1024)
 #define RBD_LOCK_TIMEOUT_DEFAULT 0 /* no timeout */
 #define RBD_READ_ONLY_DEFAULT false

@@ -1176,7 +1176,7 @@ static blk_status_t rnbd_queue_rq(struct blk_mq_hw_ctx *hctx,
 	return ret;
 }
 
-static int rnbd_rdma_poll(struct blk_mq_hw_ctx *hctx)
+static int rnbd_rdma_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
 {
 	struct rnbd_queue *q = hctx->driver_data;
 	struct rnbd_clt_dev *dev = q->dev;

@@ -10,7 +10,7 @@
 #define RNBD_PROTO_H
 
 #include <linux/types.h>
-#include <linux/blkdev.h>
+#include <linux/blk-mq.h>
 #include <linux/limits.h>
 #include <linux/inet.h>
 #include <linux/in.h>

@@ -50,7 +50,7 @@ struct rsxx_bio_meta {
 
 static struct kmem_cache *bio_meta_pool;
 
-static blk_qc_t rsxx_submit_bio(struct bio *bio);
+static void rsxx_submit_bio(struct bio *bio);
 
 /*----------------- Block Device Operations -----------------*/
 static int rsxx_blkdev_ioctl(struct block_device *bdev,
@@ -120,7 +120,7 @@ static void bio_dma_done_cb(struct rsxx_cardinfo *card,
 	}
 }
 
-static blk_qc_t rsxx_submit_bio(struct bio *bio)
+static void rsxx_submit_bio(struct bio *bio)
 {
 	struct rsxx_cardinfo *card = bio->bi_bdev->bd_disk->private_data;
 	struct rsxx_bio_meta *bio_meta;
@@ -169,7 +169,7 @@ static blk_qc_t rsxx_submit_bio(struct bio *bio)
 	if (st)
 		goto queue_err;
 
-	return BLK_QC_T_NONE;
+	return;
 
 queue_err:
 	kmem_cache_free(bio_meta_pool, bio_meta);
@@ -177,7 +177,6 @@ req_err:
 	if (st)
 		bio->bi_status = st;
 	bio_endio(bio);
-	return BLK_QC_T_NONE;
 }
 
 /*----------------- Device Setup -------------------*/

@@ -16,6 +16,7 @@
 #include <linux/fd.h>
 #include <linux/slab.h>
 #include <linux/blk-mq.h>
+#include <linux/major.h>
 #include <linux/mutex.h>
 #include <linux/hdreg.h>
 #include <linux/kernel.h>

@@ -815,9 +815,17 @@ static int virtblk_probe(struct virtio_device *vdev)
 	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
 				   struct virtio_blk_config, blk_size,
 				   &blk_size);
-	if (!err)
+	if (!err) {
+		err = blk_validate_block_size(blk_size);
+		if (err) {
+			dev_err(&vdev->dev,
+				"virtio_blk: invalid block size: 0x%x\n",
+				blk_size);
+			goto out_cleanup_disk;
+		}
+
 		blk_queue_logical_block_size(q, blk_size);
-	else
+	} else
 		blk_size = queue_logical_block_size(q);
 
 	/* Use topology information if available */

@@ -42,6 +42,7 @@
 #include <linux/cdrom.h>
 #include <linux/module.h>
 #include <linux/slab.h>
+#include <linux/major.h>
 #include <linux/mutex.h>
 #include <linux/scatterlist.h>
 #include <linux/bitmap.h>

@@ -1598,22 +1598,18 @@ static void __zram_make_request(struct zram *zram, struct bio *bio)
 /*
  * Handler function for all zram I/O requests.
  */
-static blk_qc_t zram_submit_bio(struct bio *bio)
+static void zram_submit_bio(struct bio *bio)
 {
 	struct zram *zram = bio->bi_bdev->bd_disk->private_data;
 
 	if (!valid_io_request(zram, bio->bi_iter.bi_sector,
 					bio->bi_iter.bi_size)) {
 		atomic64_inc(&zram->stats.invalid_io);
-		goto error;
+		bio_io_error(bio);
+		return;
 	}
 
 	__zram_make_request(zram, bio);
-	return BLK_QC_T_NONE;
-
-error:
-	bio_io_error(bio);
-	return BLK_QC_T_NONE;
 }
 
 static void zram_slot_free_notify(struct block_device *bdev,

@@ -30,6 +30,7 @@
 #include <linux/sched.h>
 #include <linux/types.h>
 #include <linux/workqueue.h>
+#include <linux/sched/clock.h>
 
 struct drm_i915_private;
 struct timer_list;

@@ -1163,7 +1163,7 @@ static void quit_max_writeback_rate(struct cache_set *c,
 
 /* Cached devices - read & write stuff */
 
-blk_qc_t cached_dev_submit_bio(struct bio *bio)
+void cached_dev_submit_bio(struct bio *bio)
 {
 	struct search *s;
 	struct block_device *orig_bdev = bio->bi_bdev;
@@ -1176,7 +1176,7 @@ blk_qc_t cached_dev_submit_bio(struct bio *bio)
 		     dc->io_disable)) {
 		bio->bi_status = BLK_STS_IOERR;
 		bio_endio(bio);
-		return BLK_QC_T_NONE;
+		return;
 	}
 
 	if (likely(d->c)) {
@@ -1222,8 +1222,6 @@ blk_qc_t cached_dev_submit_bio(struct bio *bio)
 	} else
 		/* I/O request sent to backing device */
 		detached_dev_do_request(d, bio, orig_bdev, start_time);
-
-	return BLK_QC_T_NONE;
 }
 
 static int cached_dev_ioctl(struct bcache_device *d, fmode_t mode,
@@ -1273,7 +1271,7 @@ static void flash_dev_nodata(struct closure *cl)
 	continue_at(cl, search_free, NULL);
 }
 
-blk_qc_t flash_dev_submit_bio(struct bio *bio)
+void flash_dev_submit_bio(struct bio *bio)
 {
 	struct search *s;
 	struct closure *cl;
@@ -1282,7 +1280,7 @@ blk_qc_t flash_dev_submit_bio(struct bio *bio)
 	if (unlikely(d->c && test_bit(CACHE_SET_IO_DISABLE, &d->c->flags))) {
 		bio->bi_status = BLK_STS_IOERR;
 		bio_endio(bio);
-		return BLK_QC_T_NONE;
+		return;
 	}
 
 	s = search_alloc(bio, d, bio->bi_bdev, bio_start_io_acct(bio));
@@ -1298,7 +1296,7 @@ blk_qc_t flash_dev_submit_bio(struct bio *bio)
 		continue_at_nobarrier(&s->cl,
 				      flash_dev_nodata,
 				      bcache_wq);
-		return BLK_QC_T_NONE;
+		return;
 	} else if (bio_data_dir(bio)) {
 		bch_keybuf_check_overlapping(&s->iop.c->moving_gc_keys,
 					&KEY(d->id, bio->bi_iter.bi_sector, 0),
@@ -1314,7 +1312,6 @@ blk_qc_t flash_dev_submit_bio(struct bio *bio)
 	}
 
 	continue_at(cl, search_free, NULL);
-	return BLK_QC_T_NONE;
 }
 
 static int flash_dev_ioctl(struct bcache_device *d, fmode_t mode,

@@ -37,10 +37,10 @@ unsigned int bch_get_congested(const struct cache_set *c);
 void bch_data_insert(struct closure *cl);
 
 void bch_cached_dev_request_init(struct cached_dev *dc);
-blk_qc_t cached_dev_submit_bio(struct bio *bio);
+void cached_dev_submit_bio(struct bio *bio);
 
 void bch_flash_dev_request_init(struct bcache_device *d);
-blk_qc_t flash_dev_submit_bio(struct bio *bio);
+void flash_dev_submit_bio(struct bio *bio);
 
 extern struct kmem_cache *bch_search_cache;
 
|
@@ -8,6 +8,7 @@
 #define DM_BIO_RECORD_H
 
 #include <linux/bio.h>
+#include <linux/blk-integrity.h>
 
 /*
  * There are lots of mutable fields in the bio struct that get
|
@@ -13,7 +13,7 @@
 #include <linux/ktime.h>
 #include <linux/genhd.h>
 #include <linux/blk-mq.h>
-#include <linux/keyslot-manager.h>
+#include <linux/blk-crypto-profile.h>
 
 #include <trace/events/block.h>
 
@@ -200,7 +200,7 @@ struct dm_table {
 	struct dm_md_mempools *mempools;
 
 #ifdef CONFIG_BLK_INLINE_ENCRYPTION
-	struct blk_keyslot_manager *ksm;
+	struct blk_crypto_profile *crypto_profile;
 #endif
 };
 
|
@@ -15,6 +15,7 @@
 #include <linux/key.h>
 #include <linux/bio.h>
 #include <linux/blkdev.h>
+#include <linux/blk-integrity.h>
 #include <linux/mempool.h>
 #include <linux/slab.h>
 #include <linux/crypto.h>
|
@@ -12,6 +12,7 @@
 #include "dm-ima.h"
 
 #include <linux/ima.h>
+#include <linux/sched/mm.h>
 #include <crypto/hash.h>
 #include <linux/crypto.h>
 #include <crypto/hash_info.h>
|
@@ -27,6 +27,7 @@
 #include <linux/blkdev.h>
 #include <linux/slab.h>
 #include <linux/module.h>
+#include <linux/sched/clock.h>
 
 
 #define DM_MSG_PREFIX	"multipath historical-service-time"
|
@@ -7,7 +7,6 @@
 #include "dm-core.h"
 #include "dm-rq.h"
 
-#include <linux/elevator.h> /* for rq_end_sector() */
 #include <linux/blk-mq.h>
 
 #define DM_MSG_PREFIX "core-rq"
|
@@ -10,6 +10,7 @@
 #include <linux/module.h>
 #include <linux/vmalloc.h>
 #include <linux/blkdev.h>
+#include <linux/blk-integrity.h>
 #include <linux/namei.h>
 #include <linux/ctype.h>
 #include <linux/string.h>
@@ -169,7 +170,7 @@ static void free_devices(struct list_head *devices, struct mapped_device *md)
 	}
 }
 
-static void dm_table_destroy_keyslot_manager(struct dm_table *t);
+static void dm_table_destroy_crypto_profile(struct dm_table *t);
 
 void dm_table_destroy(struct dm_table *t)
 {
@@ -199,7 +200,7 @@ void dm_table_destroy(struct dm_table *t)
 
 	dm_free_md_mempools(t->mempools);
 
-	dm_table_destroy_keyslot_manager(t);
+	dm_table_destroy_crypto_profile(t);
 
 	kfree(t);
 }
@@ -1186,8 +1187,8 @@ static int dm_table_register_integrity(struct dm_table *t)
 
 #ifdef CONFIG_BLK_INLINE_ENCRYPTION
 
-struct dm_keyslot_manager {
-	struct blk_keyslot_manager ksm;
+struct dm_crypto_profile {
+	struct blk_crypto_profile profile;
 	struct mapped_device *md;
 };
 
@@ -1213,13 +1214,11 @@ static int dm_keyslot_evict_callback(struct dm_target *ti, struct dm_dev *dev,
  * When an inline encryption key is evicted from a device-mapper device, evict
  * it from all the underlying devices.
  */
-static int dm_keyslot_evict(struct blk_keyslot_manager *ksm,
+static int dm_keyslot_evict(struct blk_crypto_profile *profile,
 			    const struct blk_crypto_key *key, unsigned int slot)
 {
-	struct dm_keyslot_manager *dksm = container_of(ksm,
-						       struct dm_keyslot_manager,
-						       ksm);
-	struct mapped_device *md = dksm->md;
+	struct mapped_device *md =
+		container_of(profile, struct dm_crypto_profile, profile)->md;
 	struct dm_keyslot_evict_args args = { key };
 	struct dm_table *t;
 	int srcu_idx;
@@ -1239,150 +1238,148 @@ static int dm_keyslot_evict(struct blk_keyslot_manager *ksm,
 	return args.err;
 }
 
-static const struct blk_ksm_ll_ops dm_ksm_ll_ops = {
-	.keyslot_evict = dm_keyslot_evict,
-};
-
-static int device_intersect_crypto_modes(struct dm_target *ti,
-					 struct dm_dev *dev, sector_t start,
-					 sector_t len, void *data)
+static int
+device_intersect_crypto_capabilities(struct dm_target *ti, struct dm_dev *dev,
+				     sector_t start, sector_t len, void *data)
 {
-	struct blk_keyslot_manager *parent = data;
-	struct blk_keyslot_manager *child = bdev_get_queue(dev->bdev)->ksm;
+	struct blk_crypto_profile *parent = data;
+	struct blk_crypto_profile *child =
+		bdev_get_queue(dev->bdev)->crypto_profile;
 
-	blk_ksm_intersect_modes(parent, child);
+	blk_crypto_intersect_capabilities(parent, child);
 	return 0;
 }
 
-void dm_destroy_keyslot_manager(struct blk_keyslot_manager *ksm)
+void dm_destroy_crypto_profile(struct blk_crypto_profile *profile)
 {
-	struct dm_keyslot_manager *dksm = container_of(ksm,
-						       struct dm_keyslot_manager,
-						       ksm);
+	struct dm_crypto_profile *dmcp = container_of(profile,
+						      struct dm_crypto_profile,
+						      profile);
 
-	if (!ksm)
+	if (!profile)
 		return;
 
-	blk_ksm_destroy(ksm);
-	kfree(dksm);
+	blk_crypto_profile_destroy(profile);
+	kfree(dmcp);
 }
 
-static void dm_table_destroy_keyslot_manager(struct dm_table *t)
+static void dm_table_destroy_crypto_profile(struct dm_table *t)
 {
-	dm_destroy_keyslot_manager(t->ksm);
-	t->ksm = NULL;
+	dm_destroy_crypto_profile(t->crypto_profile);
+	t->crypto_profile = NULL;
 }
 
 /*
- * Constructs and initializes t->ksm with a keyslot manager that
- * represents the common set of crypto capabilities of the devices
- * described by the dm_table. However, if the constructed keyslot
- * manager does not support a superset of the crypto capabilities
- * supported by the current keyslot manager of the mapped_device,
- * it returns an error instead, since we don't support restricting
- * crypto capabilities on table changes. Finally, if the constructed
- * keyslot manager doesn't actually support any crypto modes at all,
- * it just returns NULL.
+ * Constructs and initializes t->crypto_profile with a crypto profile that
+ * represents the common set of crypto capabilities of the devices described by
+ * the dm_table. However, if the constructed crypto profile doesn't support all
+ * crypto capabilities that are supported by the current mapped_device, it
+ * returns an error instead, since we don't support removing crypto capabilities
+ * on table changes. Finally, if the constructed crypto profile is "empty" (has
+ * no crypto capabilities at all), it just sets t->crypto_profile to NULL.
  */
-static int dm_table_construct_keyslot_manager(struct dm_table *t)
+static int dm_table_construct_crypto_profile(struct dm_table *t)
 {
-	struct dm_keyslot_manager *dksm;
-	struct blk_keyslot_manager *ksm;
+	struct dm_crypto_profile *dmcp;
+	struct blk_crypto_profile *profile;
 	struct dm_target *ti;
 	unsigned int i;
-	bool ksm_is_empty = true;
+	bool empty_profile = true;
 
-	dksm = kmalloc(sizeof(*dksm), GFP_KERNEL);
-	if (!dksm)
+	dmcp = kmalloc(sizeof(*dmcp), GFP_KERNEL);
+	if (!dmcp)
 		return -ENOMEM;
-	dksm->md = t->md;
+	dmcp->md = t->md;
 
-	ksm = &dksm->ksm;
-	blk_ksm_init_passthrough(ksm);
-	ksm->ksm_ll_ops = dm_ksm_ll_ops;
-	ksm->max_dun_bytes_supported = UINT_MAX;
-	memset(ksm->crypto_modes_supported, 0xFF,
-	       sizeof(ksm->crypto_modes_supported));
+	profile = &dmcp->profile;
+	blk_crypto_profile_init(profile, 0);
+	profile->ll_ops.keyslot_evict = dm_keyslot_evict;
+	profile->max_dun_bytes_supported = UINT_MAX;
+	memset(profile->modes_supported, 0xFF,
+	       sizeof(profile->modes_supported));
 
 	for (i = 0; i < dm_table_get_num_targets(t); i++) {
 		ti = dm_table_get_target(t, i);
 
 		if (!dm_target_passes_crypto(ti->type)) {
-			blk_ksm_intersect_modes(ksm, NULL);
+			blk_crypto_intersect_capabilities(profile, NULL);
 			break;
 		}
 		if (!ti->type->iterate_devices)
 			continue;
-		ti->type->iterate_devices(ti, device_intersect_crypto_modes,
-					  ksm);
+		ti->type->iterate_devices(ti,
+					  device_intersect_crypto_capabilities,
+					  profile);
 	}
 
-	if (t->md->queue && !blk_ksm_is_superset(ksm, t->md->queue->ksm)) {
+	if (t->md->queue &&
+	    !blk_crypto_has_capabilities(profile,
+					 t->md->queue->crypto_profile)) {
 		DMWARN("Inline encryption capabilities of new DM table were more restrictive than the old table's. This is not supported!");
-		dm_destroy_keyslot_manager(ksm);
+		dm_destroy_crypto_profile(profile);
 		return -EINVAL;
 	}
 
 	/*
-	 * If the new KSM doesn't actually support any crypto modes, we may as
-	 * well represent it with a NULL ksm.
+	 * If the new profile doesn't actually support any crypto capabilities,
+	 * we may as well represent it with a NULL profile.
 	 */
-	ksm_is_empty = true;
-	for (i = 0; i < ARRAY_SIZE(ksm->crypto_modes_supported); i++) {
-		if (ksm->crypto_modes_supported[i]) {
-			ksm_is_empty = false;
+	for (i = 0; i < ARRAY_SIZE(profile->modes_supported); i++) {
+		if (profile->modes_supported[i]) {
+			empty_profile = false;
 			break;
 		}
 	}
 
-	if (ksm_is_empty) {
-		dm_destroy_keyslot_manager(ksm);
-		ksm = NULL;
+	if (empty_profile) {
+		dm_destroy_crypto_profile(profile);
+		profile = NULL;
 	}
 
 	/*
-	 * t->ksm is only set temporarily while the table is being set
-	 * up, and it gets set to NULL after the capabilities have
-	 * been transferred to the request_queue.
+	 * t->crypto_profile is only set temporarily while the table is being
+	 * set up, and it gets set to NULL after the profile has been
+	 * transferred to the request_queue.
 	 */
-	t->ksm = ksm;
+	t->crypto_profile = profile;
 
 	return 0;
 }
 
-static void dm_update_keyslot_manager(struct request_queue *q,
-				      struct dm_table *t)
+static void dm_update_crypto_profile(struct request_queue *q,
+				     struct dm_table *t)
 {
-	if (!t->ksm)
+	if (!t->crypto_profile)
 		return;
 
-	/* Make the ksm less restrictive */
-	if (!q->ksm) {
-		blk_ksm_register(t->ksm, q);
+	/* Make the crypto profile less restrictive. */
+	if (!q->crypto_profile) {
+		blk_crypto_register(t->crypto_profile, q);
 	} else {
-		blk_ksm_update_capabilities(q->ksm, t->ksm);
-		dm_destroy_keyslot_manager(t->ksm);
+		blk_crypto_update_capabilities(q->crypto_profile,
+					       t->crypto_profile);
+		dm_destroy_crypto_profile(t->crypto_profile);
 	}
-	t->ksm = NULL;
+	t->crypto_profile = NULL;
 }
 
 #else /* CONFIG_BLK_INLINE_ENCRYPTION */
 
-static int dm_table_construct_keyslot_manager(struct dm_table *t)
+static int dm_table_construct_crypto_profile(struct dm_table *t)
 {
 	return 0;
 }
 
-void dm_destroy_keyslot_manager(struct blk_keyslot_manager *ksm)
+void dm_destroy_crypto_profile(struct blk_crypto_profile *profile)
 {
 }
 
-static void dm_table_destroy_keyslot_manager(struct dm_table *t)
+static void dm_table_destroy_crypto_profile(struct dm_table *t)
 {
 }
 
-static void dm_update_keyslot_manager(struct request_queue *q,
-				      struct dm_table *t)
+static void dm_update_crypto_profile(struct request_queue *q,
+				     struct dm_table *t)
 {
 }
 
@@ -1414,9 +1411,9 @@ int dm_table_complete(struct dm_table *t)
 		return r;
 	}
 
-	r = dm_table_construct_keyslot_manager(t);
+	r = dm_table_construct_crypto_profile(t);
 	if (r) {
-		DMERR("could not construct keyslot manager.");
+		DMERR("could not construct crypto profile.");
 		return r;
 	}
 
@@ -2070,7 +2067,7 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 		return r;
 	}
 
-	dm_update_keyslot_manager(q, t);
+	dm_update_crypto_profile(q, t);
 	disk_update_readahead(t->md->disk);
 
 	return 0;
|
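The table-construction code in the diff above starts from a profile with every mode bit set, narrows it against each underlying device, and then treats an all-zero result as "no profile at all". The sketch below models that flow in plain userspace C. It is an illustration under stated assumptions, not the block layer's implementation: `toy_profile`, `toy_profile_init_all`, `toy_intersect`, `toy_is_empty` and `TOY_NUM_MODES` are hypothetical names, and the rule that a NULL child clears everything is inferred from how the diff calls `blk_crypto_intersect_capabilities(profile, NULL)` for targets that do not pass crypto through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NUM_MODES 4 /* hypothetical; the kernel sizes this by its mode enum */

/* Hypothetical stand-in for struct blk_crypto_profile. */
struct toy_profile {
	unsigned int max_dun_bytes_supported;
	unsigned int modes_supported[TOY_NUM_MODES]; /* per-mode bitmask of data unit sizes */
};

/* Start maximally permissive, as the diff does with UINT_MAX and memset(0xFF). */
static void toy_profile_init_all(struct toy_profile *p)
{
	p->max_dun_bytes_supported = ~0u;
	memset(p->modes_supported, 0xFF, sizeof(p->modes_supported));
}

/* Narrow parent to what child also supports; a NULL child supports nothing (assumed). */
static void toy_intersect(struct toy_profile *parent, const struct toy_profile *child)
{
	size_t i;

	assert(parent != NULL);
	if (!child) {
		parent->max_dun_bytes_supported = 0;
		memset(parent->modes_supported, 0, sizeof(parent->modes_supported));
		return;
	}
	if (child->max_dun_bytes_supported < parent->max_dun_bytes_supported)
		parent->max_dun_bytes_supported = child->max_dun_bytes_supported;
	for (i = 0; i < TOY_NUM_MODES; i++)
		parent->modes_supported[i] &= child->modes_supported[i];
}

/* Mirrors the empty_profile scan: true when no mode has any bit left. */
static bool toy_is_empty(const struct toy_profile *p)
{
	size_t i;

	for (i = 0; i < TOY_NUM_MODES; i++)
		if (p->modes_supported[i])
			return false;
	return true;
}
```

Starting from all-ones makes the first intersection behave like a plain copy of the first device's capabilities, so the loop needs no special case for the first target.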
@@ -18,6 +18,7 @@
 #include "dm-verity-verify-sig.h"
 #include <linux/module.h>
 #include <linux/reboot.h>
+#include <linux/scatterlist.h>
 
 #define DM_MSG_PREFIX			"verity"
 
|
@@ -29,7 +29,7 @@
 #include <linux/refcount.h>
 #include <linux/part_stat.h>
 #include <linux/blk-crypto.h>
-#include <linux/keyslot-manager.h>
+#include <linux/blk-crypto-profile.h>
 
 #define DM_MSG_PREFIX "core"
 
@@ -1183,14 +1183,13 @@ static noinline void __set_swap_bios_limit(struct mapped_device *md, int latch)
 	mutex_unlock(&md->swap_bios_lock);
 }
 
-static blk_qc_t __map_bio(struct dm_target_io *tio)
+static void __map_bio(struct dm_target_io *tio)
 {
 	int r;
 	sector_t sector;
 	struct bio *clone = &tio->clone;
 	struct dm_io *io = tio->io;
 	struct dm_target *ti = tio->ti;
-	blk_qc_t ret = BLK_QC_T_NONE;
 
 	clone->bi_end_io = clone_endio;
 
@@ -1226,7 +1225,7 @@ static blk_qc_t __map_bio(struct dm_target_io *tio)
 	case DM_MAPIO_REMAPPED:
 		/* the bio has been remapped so dispatch it */
 		trace_block_bio_remap(clone, bio_dev(io->orig_bio), sector);
-		ret = submit_bio_noacct(clone);
+		submit_bio_noacct(clone);
 		break;
 	case DM_MAPIO_KILL:
 		if (unlikely(swap_bios_limit(ti, clone))) {
@@ -1248,8 +1247,6 @@ static blk_qc_t __map_bio(struct dm_target_io *tio)
 		DMWARN("unimplemented target map return value: %d", r);
 		BUG();
 	}
-
-	return ret;
 }
 
 static void bio_setup_sector(struct bio *bio, sector_t sector, unsigned len)
@@ -1336,7 +1333,7 @@ static void alloc_multiple_bios(struct bio_list *blist, struct clone_info *ci,
 	}
 }
 
-static blk_qc_t __clone_and_map_simple_bio(struct clone_info *ci,
+static void __clone_and_map_simple_bio(struct clone_info *ci,
 					   struct dm_target_io *tio, unsigned *len)
 {
 	struct bio *clone = &tio->clone;
@@ -1346,8 +1343,7 @@ static blk_qc_t __clone_and_map_simple_bio(struct clone_info *ci,
 	__bio_clone_fast(clone, ci->bio);
 	if (len)
 		bio_setup_sector(clone, ci->sector, *len);
-
-	return __map_bio(tio);
+	__map_bio(tio);
 }
 
 static void __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti,
@@ -1361,7 +1357,7 @@ static void __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti,
 
 	while ((bio = bio_list_pop(&blist))) {
 		tio = container_of(bio, struct dm_target_io, clone);
-		(void) __clone_and_map_simple_bio(ci, tio, len);
+		__clone_and_map_simple_bio(ci, tio, len);
 	}
 }
 
@@ -1405,7 +1401,7 @@ static int __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti,
 		free_tio(tio);
 		return r;
 	}
-	(void) __map_bio(tio);
+	__map_bio(tio);
 
 	return 0;
 }
@@ -1520,11 +1516,10 @@ static void init_clone_info(struct clone_info *ci, struct mapped_device *md,
 /*
  * Entry point to split a bio into clones and submit them to the targets.
  */
-static blk_qc_t __split_and_process_bio(struct mapped_device *md,
+static void __split_and_process_bio(struct mapped_device *md,
 					struct dm_table *map, struct bio *bio)
 {
 	struct clone_info ci;
-	blk_qc_t ret = BLK_QC_T_NONE;
 	int error = 0;
 
 	init_clone_info(&ci, md, map, bio);
@@ -1567,19 +1562,17 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md,
 
 			bio_chain(b, bio);
 			trace_block_split(b, bio->bi_iter.bi_sector);
-			ret = submit_bio_noacct(bio);
+			submit_bio_noacct(bio);
 		}
 	}
 
 	/* drop the extra reference count */
 	dm_io_dec_pending(ci.io, errno_to_blk_status(error));
-	return ret;
 }
 
-static blk_qc_t dm_submit_bio(struct bio *bio)
+static void dm_submit_bio(struct bio *bio)
 {
 	struct mapped_device *md = bio->bi_bdev->bd_disk->private_data;
-	blk_qc_t ret = BLK_QC_T_NONE;
 	int srcu_idx;
 	struct dm_table *map;
 
@@ -1609,10 +1602,9 @@ static blk_qc_t dm_submit_bio(struct bio *bio)
 	if (is_abnormal_io(bio))
 		blk_queue_split(&bio);
 
-	ret = __split_and_process_bio(md, map, bio);
+	__split_and_process_bio(md, map, bio);
 out:
 	dm_put_live_table(md, srcu_idx);
-	return ret;
 }
 
 /*-----------------------------------------------------------------
@@ -1671,14 +1663,14 @@ static const struct dax_operations dm_dax_ops;
 static void dm_wq_work(struct work_struct *work);
 
 #ifdef CONFIG_BLK_INLINE_ENCRYPTION
-static void dm_queue_destroy_keyslot_manager(struct request_queue *q)
+static void dm_queue_destroy_crypto_profile(struct request_queue *q)
 {
-	dm_destroy_keyslot_manager(q->ksm);
+	dm_destroy_crypto_profile(q->crypto_profile);
 }
 
 #else /* CONFIG_BLK_INLINE_ENCRYPTION */
 
-static inline void dm_queue_destroy_keyslot_manager(struct request_queue *q)
+static inline void dm_queue_destroy_crypto_profile(struct request_queue *q)
 {
 }
 #endif /* !CONFIG_BLK_INLINE_ENCRYPTION */
@@ -1704,7 +1696,7 @@ static void cleanup_mapped_device(struct mapped_device *md)
 			dm_sysfs_exit(md);
 			del_gendisk(md->disk);
 		}
-		dm_queue_destroy_keyslot_manager(md->queue);
+		dm_queue_destroy_crypto_profile(md->queue);
 		blk_cleanup_disk(md->disk);
 	}
 
|
@@ -41,6 +41,7 @@
 #include <linux/sched/signal.h>
 #include <linux/kthread.h>
 #include <linux/blkdev.h>
+#include <linux/blk-integrity.h>
 #include <linux/badblocks.h>
 #include <linux/sysctl.h>
 #include <linux/seq_file.h>
@@ -51,6 +52,7 @@
 #include <linux/hdreg.h>
 #include <linux/proc_fs.h>
 #include <linux/random.h>
+#include <linux/major.h>
 #include <linux/module.h>
 #include <linux/reboot.h>
 #include <linux/file.h>
@@ -441,19 +443,19 @@ check_suspended:
 }
 EXPORT_SYMBOL(md_handle_request);
 
-static blk_qc_t md_submit_bio(struct bio *bio)
+static void md_submit_bio(struct bio *bio)
 {
 	const int rw = bio_data_dir(bio);
 	struct mddev *mddev = bio->bi_bdev->bd_disk->private_data;
 
 	if (mddev == NULL || mddev->pers == NULL) {
 		bio_io_error(bio);
-		return BLK_QC_T_NONE;
+		return;
 	}
 
 	if (unlikely(test_bit(MD_BROKEN, &mddev->flags)) && (rw == WRITE)) {
 		bio_io_error(bio);
-		return BLK_QC_T_NONE;
+		return;
 	}
 
 	blk_queue_split(&bio);
@@ -462,15 +464,13 @@ static blk_qc_t md_submit_bio(struct bio *bio)
 		if (bio_sectors(bio) != 0)
 			bio->bi_status = BLK_STS_IOERR;
 		bio_endio(bio);
-		return BLK_QC_T_NONE;
+		return;
 	}
 
 	/* bio could be mergeable after passing to underlayer */
 	bio->bi_opf &= ~REQ_NOMERGE;
 
 	md_handle_request(mddev, bio);
-
-	return BLK_QC_T_NONE;
 }
 
 /* mddev_suspend makes sure no new requests are submitted
|
@@ -16,13 +16,13 @@ void mmc_crypto_set_initial_state(struct mmc_host *host)
 {
 	/* Reset might clear all keys, so reprogram all the keys. */
 	if (host->caps2 & MMC_CAP2_CRYPTO)
-		blk_ksm_reprogram_all_keys(&host->ksm);
+		blk_crypto_reprogram_all_keys(&host->crypto_profile);
 }
 
 void mmc_crypto_setup_queue(struct request_queue *q, struct mmc_host *host)
 {
 	if (host->caps2 & MMC_CAP2_CRYPTO)
-		blk_ksm_register(&host->ksm, q);
+		blk_crypto_register(&host->crypto_profile, q);
 }
 EXPORT_SYMBOL_GPL(mmc_crypto_setup_queue);
 
@@ -30,12 +30,15 @@ void mmc_crypto_prepare_req(struct mmc_queue_req *mqrq)
 {
 	struct request *req = mmc_queue_req_to_req(mqrq);
 	struct mmc_request *mrq = &mqrq->brq.mrq;
+	struct blk_crypto_keyslot *keyslot;
 
 	if (!req->crypt_ctx)
 		return;
 
 	mrq->crypto_ctx = req->crypt_ctx;
-	if (req->crypt_keyslot)
-		mrq->crypto_key_slot = blk_ksm_get_slot_idx(req->crypt_keyslot);
+
+	keyslot = req->crypt_keyslot;
+	if (keyslot)
+		mrq->crypto_key_slot = blk_crypto_keyslot_index(keyslot);
 }
 EXPORT_SYMBOL_GPL(mmc_crypto_prepare_req);
|
@@ -12,6 +12,7 @@
 #include <linux/slab.h>
 #include <linux/stat.h>
 #include <linux/pm_runtime.h>
+#include <linux/scatterlist.h>
 
 #include <linux/mmc/host.h>
 #include <linux/mmc/card.h>
|
@@ -6,7 +6,7 @@
  */
 
 #include <linux/blk-crypto.h>
-#include <linux/keyslot-manager.h>
+#include <linux/blk-crypto-profile.h>
 #include <linux/mmc/host.h>
 
 #include "cqhci-crypto.h"
@@ -23,9 +23,10 @@ static const struct cqhci_crypto_alg_entry {
 };
 
 static inline struct cqhci_host *
-cqhci_host_from_ksm(struct blk_keyslot_manager *ksm)
+cqhci_host_from_crypto_profile(struct blk_crypto_profile *profile)
 {
-	struct mmc_host *mmc = container_of(ksm, struct mmc_host, ksm);
+	struct mmc_host *mmc =
+		container_of(profile, struct mmc_host, crypto_profile);
 
 	return mmc->cqe_private;
 }
@@ -57,12 +58,12 @@ static int cqhci_crypto_program_key(struct cqhci_host *cq_host,
 	return 0;
 }
 
-static int cqhci_crypto_keyslot_program(struct blk_keyslot_manager *ksm,
+static int cqhci_crypto_keyslot_program(struct blk_crypto_profile *profile,
 					const struct blk_crypto_key *key,
 					unsigned int slot)
 
 {
-	struct cqhci_host *cq_host = cqhci_host_from_ksm(ksm);
+	struct cqhci_host *cq_host = cqhci_host_from_crypto_profile(profile);
 	const union cqhci_crypto_cap_entry *ccap_array =
 		cq_host->crypto_cap_array;
 	const struct cqhci_crypto_alg_entry *alg =
@@ -115,11 +116,11 @@ static int cqhci_crypto_clear_keyslot(struct cqhci_host *cq_host, int slot)
 	return cqhci_crypto_program_key(cq_host, &cfg, slot);
 }
 
-static int cqhci_crypto_keyslot_evict(struct blk_keyslot_manager *ksm,
+static int cqhci_crypto_keyslot_evict(struct blk_crypto_profile *profile,
 				      const struct blk_crypto_key *key,
 				      unsigned int slot)
 {
-	struct cqhci_host *cq_host = cqhci_host_from_ksm(ksm);
+	struct cqhci_host *cq_host = cqhci_host_from_crypto_profile(profile);
 
 	return cqhci_crypto_clear_keyslot(cq_host, slot);
 }
@@ -132,7 +133,7 @@ static int cqhci_crypto_keyslot_evict(struct blk_keyslot_manager *ksm,
  * "enabled" when these are called, i.e. CQHCI_ENABLE might not be set in the
  * CQHCI_CFG register. But the hardware allows that.
  */
-static const struct blk_ksm_ll_ops cqhci_ksm_ops = {
+static const struct blk_crypto_ll_ops cqhci_crypto_ops = {
 	.keyslot_program	= cqhci_crypto_keyslot_program,
 	.keyslot_evict		= cqhci_crypto_keyslot_evict,
 };
@@ -157,8 +158,8 @@ cqhci_find_blk_crypto_mode(union cqhci_crypto_cap_entry cap)
  *
  * If the driver previously set MMC_CAP2_CRYPTO and the CQE declares
  * CQHCI_CAP_CS, initialize the crypto support. This involves reading the
- * crypto capability registers, initializing the keyslot manager, clearing all
- * keyslots, and enabling 128-bit task descriptors.
+ * crypto capability registers, initializing the blk_crypto_profile, clearing
+ * all keyslots, and enabling 128-bit task descriptors.
  *
  * Return: 0 if crypto was initialized or isn't supported; whether
  * MMC_CAP2_CRYPTO remains set indicates which one of those cases it is.
@@ -168,7 +169,7 @@ int cqhci_crypto_init(struct cqhci_host *cq_host)
 {
 	struct mmc_host *mmc = cq_host->mmc;
 	struct device *dev = mmc_dev(mmc);
-	struct blk_keyslot_manager *ksm = &mmc->ksm;
+	struct blk_crypto_profile *profile = &mmc->crypto_profile;
 	unsigned int num_keyslots;
 	unsigned int cap_idx;
 	enum blk_crypto_mode_num blk_mode_num;
@@ -199,15 +200,15 @@ int cqhci_crypto_init(struct cqhci_host *cq_host)
 	 */
 	num_keyslots = cq_host->crypto_capabilities.config_count + 1;
 
-	err = devm_blk_ksm_init(dev, ksm, num_keyslots);
+	err = devm_blk_crypto_profile_init(dev, profile, num_keyslots);
 	if (err)
 		goto out;
 
-	ksm->ksm_ll_ops = cqhci_ksm_ops;
-	ksm->dev = dev;
+	profile->ll_ops = cqhci_crypto_ops;
+	profile->dev = dev;
 
 	/* Unfortunately, CQHCI crypto only supports 32 DUN bits. */
-	ksm->max_dun_bytes_supported = 4;
+	profile->max_dun_bytes_supported = 4;
 
 	/*
 	 * Cache all the crypto capabilities and advertise the supported crypto
@@ -223,7 +224,7 @@ int cqhci_crypto_init(struct cqhci_host *cq_host)
 					cq_host->crypto_cap_array[cap_idx]);
 		if (blk_mode_num == BLK_ENCRYPTION_MODE_INVALID)
 			continue;
-		ksm->crypto_modes_supported[blk_mode_num] |=
+		profile->modes_supported[blk_mode_num] |=
 			cq_host->crypto_cap_array[cap_idx].sdus_mask * 512;
 	}
 
|
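The `cqhci_host_from_crypto_profile()` helper in the diff above works because the profile is embedded in `struct mmc_host` rather than pointed to, so a pointer to the member can be turned back into a pointer to its owner. The sketch below reproduces that idiom in userspace; `toy_container_of`, `toy_profile`, `toy_host` and `toy_private_from_profile` are hypothetical names, and the macro omits the type checking that the kernel's `container_of()` performs.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace version of the container_of() idiom, without the kernel's type checks. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct blk_crypto_profile. */
struct toy_profile {
	unsigned int num_slots;
};

/* Hypothetical stand-in for struct mmc_host. */
struct toy_host {
	int id;
	struct toy_profile crypto_profile; /* embedded, not a pointer */
	void *cqe_private;
};

/* Analogue of cqhci_host_from_crypto_profile(): member pointer -> owner's private data. */
static void *toy_private_from_profile(struct toy_profile *profile)
{
	struct toy_host *host;

	assert(profile != NULL);
	host = toy_container_of(profile, struct toy_host, crypto_profile);
	return host->cqe_private;
}
```

This only holds for profiles that really live inside a `toy_host`; passing a standalone profile would compute a bogus owner pointer, which is the usual caveat of the idiom.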
@@ -15,6 +15,7 @@
 #include <linux/slab.h>
 #include <linux/major.h>
 #include <linux/backing-dev.h>
+#include <linux/blkdev.h>
 #include <linux/fs_context.h>
 #include "mtdcore.h"
 
|
@@ -162,7 +162,7 @@ static int nsblk_do_bvec(struct nd_namespace_blk *nsblk,
 	return err;
 }
 
-static blk_qc_t nd_blk_submit_bio(struct bio *bio)
+static void nd_blk_submit_bio(struct bio *bio)
 {
 	struct bio_integrity_payload *bip;
 	struct nd_namespace_blk *nsblk = bio->bi_bdev->bd_disk->private_data;
@@ -173,7 +173,7 @@ static blk_qc_t nd_blk_submit_bio(struct bio *bio)
 	bool do_acct;
 
 	if (!bio_integrity_prep(bio))
-		return BLK_QC_T_NONE;
+		return;
 
 	bip = bio_integrity(bio);
 	rw = bio_data_dir(bio);
@@ -199,7 +199,6 @@ static blk_qc_t nd_blk_submit_bio(struct bio *bio)
 		bio_end_io_acct(bio, start);
 
 	bio_endio(bio);
-	return BLK_QC_T_NONE;
 }
 
 static int nsblk_rw_bytes(struct nd_namespace_common *ndns,
|
@@ -1440,7 +1440,7 @@ static int btt_do_bvec(struct btt *btt, struct bio_integrity_payload *bip,
 	return ret;
 }
 
-static blk_qc_t btt_submit_bio(struct bio *bio)
+static void btt_submit_bio(struct bio *bio)
 {
 	struct bio_integrity_payload *bip = bio_integrity(bio);
 	struct btt *btt = bio->bi_bdev->bd_disk->private_data;
@@ -1451,7 +1451,7 @@ static blk_qc_t btt_submit_bio(struct bio *bio)
 	bool do_acct;
 
 	if (!bio_integrity_prep(bio))
-		return BLK_QC_T_NONE;
+		return;
 
 	do_acct = blk_queue_io_stat(bio->bi_bdev->bd_disk->queue);
 	if (do_acct)
@@ -1483,7 +1483,6 @@ static blk_qc_t btt_submit_bio(struct bio *bio)
 		bio_end_io_acct(bio, start);
 
 	bio_endio(bio);
-	return BLK_QC_T_NONE;
 }
 
 static int btt_rw_page(struct block_device *bdev, sector_t sector,
|
@@ -7,6 +7,7 @@
 #include <linux/export.h>
 #include <linux/module.h>
 #include <linux/blkdev.h>
+#include <linux/blk-integrity.h>
 #include <linux/device.h>
 #include <linux/ctype.h>
 #include <linux/ndctl.h>
|
@@ -190,7 +190,7 @@ static blk_status_t pmem_do_write(struct pmem_device *pmem,
 	return rc;
 }
 
-static blk_qc_t pmem_submit_bio(struct bio *bio)
+static void pmem_submit_bio(struct bio *bio)
 {
 	int ret = 0;
 	blk_status_t rc = 0;
@@ -229,7 +229,6 @@ static blk_qc_t pmem_submit_bio(struct bio *bio)
 		bio->bi_status = errno_to_blk_status(ret);
 
 	bio_endio(bio);
-	return BLK_QC_T_NONE;
 }
 
 static int pmem_rw_page(struct block_device *bdev, sector_t sector,
|
@@ -6,6 +6,7 @@
 
 #include <linux/blkdev.h>
 #include <linux/blk-mq.h>
+#include <linux/blk-integrity.h>
 #include <linux/compat.h>
 #include <linux/delay.h>
 #include <linux/errno.h>
@@ -118,25 +119,6 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
 static void nvme_update_keep_alive(struct nvme_ctrl *ctrl,
 				   struct nvme_command *cmd);
 
-/*
- * Prepare a queue for teardown.
- *
- * This must forcibly unquiesce queues to avoid blocking dispatch, and only set
- * the capacity to 0 after that to avoid blocking dispatchers that may be
- * holding bd_butex. This will end buffered writers dirtying pages that can't
- * be synced.
- */
-static void nvme_set_queue_dying(struct nvme_ns *ns)
-{
-	if (test_and_set_bit(NVME_NS_DEAD, &ns->flags))
-		return;
-
-	blk_set_queue_dying(ns->queue);
-	blk_mq_unquiesce_queue(ns->queue);
-
-	set_capacity_and_notify(ns->disk, 0);
-}
-
 void nvme_queue_scan(struct nvme_ctrl *ctrl)
 {
 	/*
@@ -345,15 +327,19 @@ static inline enum nvme_disposition nvme_decide_disposition(struct request *req)
 	return RETRY;
 }
 
-static inline void nvme_end_req(struct request *req)
+static inline void nvme_end_req_zoned(struct request *req)
 {
-	blk_status_t status = nvme_error_status(nvme_req(req)->status);
-
 	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
 	    req_op(req) == REQ_OP_ZONE_APPEND)
 		req->__sector = nvme_lba_to_sect(req->q->queuedata,
 			le64_to_cpu(nvme_req(req)->result.u64));
+}
+
+static inline void nvme_end_req(struct request *req)
+{
+	blk_status_t status = nvme_error_status(nvme_req(req)->status);
 
+	nvme_end_req_zoned(req);
 	nvme_trace_bio_complete(req);
 	blk_mq_end_request(req, status);
 }
@@ -380,6 +366,13 @@ void nvme_complete_rq(struct request *req)
 }
 EXPORT_SYMBOL_GPL(nvme_complete_rq);
 
+void nvme_complete_batch_req(struct request *req)
+{
+	nvme_cleanup_cmd(req);
+	nvme_end_req_zoned(req);
+}
+EXPORT_SYMBOL_GPL(nvme_complete_batch_req);
+
 /*
  * Called to unwind from ->queue_rq on a failed command submission so that the
  * multipathing code gets called to potentially failover to another path.
|
||||
@ -631,7 +624,7 @@ static inline void nvme_init_request(struct request *req,
|
||||
|
||||
req->cmd_flags |= REQ_FAILFAST_DRIVER;
|
||||
if (req->mq_hctx->type == HCTX_TYPE_POLL)
|
||||
req->cmd_flags |= REQ_HIPRI;
|
||||
req->cmd_flags |= REQ_POLLED;
|
||||
nvme_clear_nvme_request(req);
|
||||
memcpy(nvme_req(req)->cmd, cmd, sizeof(*cmd));
|
||||
}
|
||||
@ -4473,6 +4466,37 @@ out:
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(nvme_init_ctrl);
|
||||
|
||||
static void nvme_start_ns_queue(struct nvme_ns *ns)
|
||||
{
|
||||
if (test_and_clear_bit(NVME_NS_STOPPED, &ns->flags))
|
||||
blk_mq_unquiesce_queue(ns->queue);
|
||||
}
|
||||
|
||||
static void nvme_stop_ns_queue(struct nvme_ns *ns)
|
||||
{
|
||||
if (!test_and_set_bit(NVME_NS_STOPPED, &ns->flags))
|
||||
blk_mq_quiesce_queue(ns->queue);
|
||||
}
|
||||
|
||||
/*
|
||||
* Prepare a queue for teardown.
|
||||
*
|
||||
* This must forcibly unquiesce queues to avoid blocking dispatch, and only set
|
||||
* the capacity to 0 after that to avoid blocking dispatchers that may be
|
||||
* holding bd_mutex. This will end buffered writers dirtying pages that can't
|
||||
* be synced.
|
||||
*/
|
||||
static void nvme_set_queue_dying(struct nvme_ns *ns)
|
||||
{
|
||||
if (test_and_set_bit(NVME_NS_DEAD, &ns->flags))
|
||||
return;
|
||||
|
||||
blk_set_queue_dying(ns->queue);
|
||||
nvme_start_ns_queue(ns);
|
||||
|
||||
set_capacity_and_notify(ns->disk, 0);
|
||||
}
|
||||
|
||||
/**
|
||||
* nvme_kill_queues(): Ends all namespace queues
|
||||
* @ctrl: the dead controller that needs to end
|
||||
@ -4488,7 +4512,7 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
|
||||
|
||||
/* Forcibly unquiesce queues to avoid blocking dispatch */
|
||||
if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
|
||||
blk_mq_unquiesce_queue(ctrl->admin_q);
|
||||
nvme_start_admin_queue(ctrl);
|
||||
|
||||
list_for_each_entry(ns, &ctrl->namespaces, list)
|
||||
nvme_set_queue_dying(ns);
|
||||
@ -4551,7 +4575,7 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
|
||||
|
||||
down_read(&ctrl->namespaces_rwsem);
|
||||
list_for_each_entry(ns, &ctrl->namespaces, list)
|
||||
blk_mq_quiesce_queue(ns->queue);
|
||||
nvme_stop_ns_queue(ns);
|
||||
up_read(&ctrl->namespaces_rwsem);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(nvme_stop_queues);
|
||||
@ -4562,11 +4586,25 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
|
||||
|
||||
down_read(&ctrl->namespaces_rwsem);
|
||||
list_for_each_entry(ns, &ctrl->namespaces, list)
|
||||
blk_mq_unquiesce_queue(ns->queue);
|
||||
nvme_start_ns_queue(ns);
|
||||
up_read(&ctrl->namespaces_rwsem);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(nvme_start_queues);
|
||||
|
||||
void nvme_stop_admin_queue(struct nvme_ctrl *ctrl)
|
||||
{
|
||||
if (!test_and_set_bit(NVME_CTRL_ADMIN_Q_STOPPED, &ctrl->flags))
|
||||
blk_mq_quiesce_queue(ctrl->admin_q);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(nvme_stop_admin_queue);
|
||||
|
||||
void nvme_start_admin_queue(struct nvme_ctrl *ctrl)
|
||||
{
|
||||
if (test_and_clear_bit(NVME_CTRL_ADMIN_Q_STOPPED, &ctrl->flags))
|
||||
blk_mq_unquiesce_queue(ctrl->admin_q);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(nvme_start_admin_queue);
|
||||
|
||||
void nvme_sync_io_queues(struct nvme_ctrl *ctrl)
|
||||
{
|
||||
struct nvme_ns *ns;
|
||||
|
@ -2382,7 +2382,7 @@ nvme_fc_ctrl_free(struct kref *ref)
|
||||
list_del(&ctrl->ctrl_list);
|
||||
spin_unlock_irqrestore(&ctrl->rport->lock, flags);
|
||||
|
||||
blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
|
||||
nvme_start_admin_queue(&ctrl->ctrl);
|
||||
blk_cleanup_queue(ctrl->ctrl.admin_q);
|
||||
blk_cleanup_queue(ctrl->ctrl.fabrics_q);
|
||||
blk_mq_free_tag_set(&ctrl->admin_tag_set);
|
||||
@ -2510,7 +2510,7 @@ __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
|
||||
/*
|
||||
* clean up the admin queue. Same thing as above.
|
||||
*/
|
||||
blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
|
||||
nvme_stop_admin_queue(&ctrl->ctrl);
|
||||
blk_sync_queue(ctrl->ctrl.admin_q);
|
||||
blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
|
||||
nvme_fc_terminate_exchange, &ctrl->ctrl);
|
||||
@ -3095,7 +3095,7 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
|
||||
ctrl->ctrl.max_hw_sectors = ctrl->ctrl.max_segments <<
|
||||
(ilog2(SZ_4K) - 9);
|
||||
|
||||
blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
|
||||
nvme_start_admin_queue(&ctrl->ctrl);
|
||||
|
||||
ret = nvme_init_ctrl_finish(&ctrl->ctrl);
|
||||
if (ret || test_bit(ASSOC_FAILED, &ctrl->flags))
|
||||
@ -3249,7 +3249,7 @@ nvme_fc_delete_association(struct nvme_fc_ctrl *ctrl)
|
||||
nvme_fc_free_queue(&ctrl->queues[0]);
|
||||
|
||||
/* re-enable the admin_q so anything new can fast fail */
|
||||
blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
|
||||
nvme_start_admin_queue(&ctrl->ctrl);
|
||||
|
||||
/* resume the io queues so that things will fast fail */
|
||||
nvme_start_queues(&ctrl->ctrl);
|
||||
|
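The nvme hunks above replace direct blk_mq_quiesce_queue()/blk_mq_unquiesce_queue() calls with helpers that first test a STOPPED flag bit, so a repeated stop or start becomes a no-op. The following is a minimal user-space sketch of that guard pattern, not kernel code. It assumes C11 atomics as a stand-in for the kernel's test_and_set_bit()/test_and_clear_bit(), and every demo_* name, the DEMO_NS_STOPPED mask, and the quiesce_depth counter are hypothetical and exist only for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request queue: counts outstanding quiesces. */
struct demo_queue {
	int quiesce_depth;
};

/* Hypothetical stand-in for struct nvme_ns, reduced to flags and a queue. */
struct demo_ns {
	atomic_ulong flags;
	struct demo_queue queue;
};

#define DEMO_NS_STOPPED (1UL << 0)	/* illustrative flag bit, an assumption */

/* Atomically set the bit; return whether it was already set. */
static bool demo_test_and_set(atomic_ulong *flags, unsigned long mask)
{
	return (atomic_fetch_or(flags, mask) & mask) != 0;
}

/* Atomically clear the bit; return whether it was set before. */
static bool demo_test_and_clear(atomic_ulong *flags, unsigned long mask)
{
	return (atomic_fetch_and(flags, ~mask) & mask) != 0;
}

/* Quiesce only on the first stop; later stops see the bit and do nothing. */
static void demo_stop_ns_queue(struct demo_ns *ns)
{
	if (!demo_test_and_set(&ns->flags, DEMO_NS_STOPPED))
		ns->queue.quiesce_depth++;
}

/* Unquiesce only if a stop is actually outstanding. */
static void demo_start_ns_queue(struct demo_ns *ns)
{
	if (demo_test_and_clear(&ns->flags, DEMO_NS_STOPPED))
		ns->queue.quiesce_depth--;
}

/* Stop the same namespace twice and report the resulting depth. */
int demo_double_stop_depth(void)
{
	struct demo_ns ns = { 0 };

	demo_stop_ns_queue(&ns);
	demo_stop_ns_queue(&ns);	/* no-op: flag already set */
	return ns.queue.quiesce_depth;
}
```

In this sketch two stops followed by two starts leave the depth at zero, because the flag makes each transition happen at most once. That is presumably the property the helpers in the diff are after, so that every quiesce is matched by exactly one unquiesce.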
Some files were not shown because too many files have changed in this diff