IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
For clarity, avoid using just the "blk_crypto_" prefix for functions and
structs that are specific to blk-crypto-fallback. Instead, use
"blk_crypto_fallback_". Some places already did this, but others
didn't.
This is also a prerequisite for using "struct blk_crypto_keyslot" to
mean a generic blk-crypto keyslot (which is what it sounds like).
Rename the fallback one to "struct blk_crypto_fallback_keyslot".
No change in behavior.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20211018180453.40441-2-ebiggers@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
To hide internal implementation and simplify some driver code,
this adds a helper to invalidate the gendisk. It will clean the
gendisk's associated buffer/page caches and reset its internal
states.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210922123711.187-2-xieyongji@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blk_try_enter_queue() already takes rcu_read_lock/unlock, so we can
avoid the second pair in percpu_ref_tryget_live(), use a newly added
percpu_ref_tryget_live_rcu().
As rcu_read_lock/unlock imply barrier()s, it's pretty noticeable,
especially for for !CONFIG_PREEMPT_RCU (default for some distributions),
where __rcu_read_lock/unlock() are not inlined.
3.20% io_uring [kernel.vmlinux] [k] __rcu_read_unlock
3.05% io_uring [kernel.vmlinux] [k] __rcu_read_lock
2.52% io_uring [kernel.vmlinux] [k] __rcu_read_unlock
2.28% io_uring [kernel.vmlinux] [k] __rcu_read_lock
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6b11c67ea495ed9d44f067622d852de4a510ce65.1634822969.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We switched to directly use dev_t to get block device, lookup changed the
meaning of use, now we fix this conflicting comment.
Fixes: 4e7b5671c6a8 ("block: remove i_bdev")
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211021071344.1600362-1-liu.yun@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Since it is now possible for a tagset to share a single set of tags, the
iter function should not re-iter the tags for the count of #hw queues in
that case. Rather it should just iter once.
Fixes: e155b0c238b2 ("blk-mq: Use shared tags for shared sbitmap support")
Reported-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Kashyap Desai <kashyap.desai@broadcom.com>
Link: https://lore.kernel.org/r/1634550083-202815-1-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Consolidate the various helpers into a single blk_flush_plug helper that
takes a plk_plug and the from_scheduler bool and switch all callsites to
call it directly. Checks that the plug is non-NULL must be performed by
the caller, something that most already do anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211020144119.142582-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Don't call flush_plug_callbacks if there are no plug callbacks.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211020144119.142582-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Replace the call to blk_flush_plug_list in blk_mq_submit_bio with a
direct call to blk_mq_flush_plug_list. This means we do not flush
plug callback from stackable devices, which doesn't really help with
the accumulated requests anyway, and it also means the cached requests
aren't freed here as they can still be used later on.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211020144119.142582-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This check is meant to catch cases where a requeue is attempted on a
request that is still inserted. It's never really been useful to catch any
misuse, and now it's actively wrong. Outside of that, this should not be a
BUG_ON() to begin with.
Remove the check as it's now causing active harm, as requeue off the plug
path will trigger it even though the request state is just fine.
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Link: https://lore.kernel.org/linux-block/CAHj4cs80zAUc2grnCZ015-2Rvd-=gXRfB_dFKy=RTm+wRo09HQ@mail.gmail.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Inline BIO_NO_PAGE_REF check of bio_release_pages() to avoid function
call.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
percpu_ref_put() are inlined for performance and bloat the binary, we
don't care about the fail case of blk_try_enter_queue(), so we can
replace it with a call to blk_queue_exit().
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
First, get rid of an extra branch and chain error checks. Also reshuffle
it with bio_advance(), so it goes closer to the final check, with that
the compiler loads rq->rq_flags only once, and also doesn't reload
bio->bi_iter.bi_size if bio_advance() didn't actually advanced the iter.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Convert bdev->bd_disk->queue to bdev_get_queue(), which is faster.
Apparently, there are a few such spots in block that got lost during
rebases.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In bfq_pd_alloc(), the function bfqg_stats_init() init bfqg. If
blkg_rwstat_init() init bfqg_stats->bytes successful and init
bfqg_stats->ios failed, bfqg_stats_init() return failed, bfqg will
be freed. But blkg_rwstat->cpu_cnt is not deleted from the list of
percpu_counters. If we traverse the list of percpu_counters, It will
have UAF problem.
we should use blkg_rwstat_exit() to cleanup bfqg_stats bytes in the
above scenario.
Fixes: commit fd41e60331b ("bfq-iosched: stop using blkg->stat_bytes and ->stat_ios")
Signed-off-by: Zheng Liang <zhengliang6@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20211018024225.1493938-1-zhengliang6@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we don't use an IO scheduler or have shared tags, then we don't need
to call into this external function at all. This saves ~2% for such
a setup.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Return to the normal blk_mq_submit_bio flow if the bio did not end up
actually being a flush because the device didn't support it. Note that
this is basically impossible to hit without special instrumentation given
that submit_bio_checks already clears these flags usually, so we'd need a
tight race to actually hit this code path.
With this the call to blk_mq_run_hw_queue for the flush requests can be
removed given that the actual flush requests are always issued via the
requeue workqueue which runs the queue unconditionally.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211019122553.2467817-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we have just one queue type in the plug list, then we can extend our
direct issue to cover a full plug list as well. This allows sending a
batch of requests for direct issue, which is more efficient than doing
one-at-a-time kind of issue.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use a singly linked list for the blk_plug. This saves 8 bytes in the
blk_plug struct, and makes for faster list manipulations than doubly
linked lists. As we don't use the doubly linked lists for anything,
singly linked is just fine.
This yields a bump in default (merging enabled) performance from 7.0
to 7.1M IOPS, and ~7.5M IOPS with merging disabled.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use the proper helper to read the block device size and switch various
places to pass the size in terms of sectors which is more practical.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211019062024.2171074-4-hch@lst.de
[axboe: fix comment typo]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We get all sorts of unreliable and funky results since the bio is
designed to align on a cacheline, which it does not when inlined like
this.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This is in the fast path of driver issue or completion, and it's a single
array index operation. Move it inline to avoid a function call for it.
This does mean making struct blk_mq_tags block layer public, but there's
not really much in there.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Even if we have multiple queues in the plug list, chances that they
are very interspersed is minimal. Don't bother spending CPU cycles
sorting the list.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Instead of returning the same queue request through a request pointer,
use a boolean to accomplish the same.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We only need to call it to resolve the blk_status_t -> errno mapping for
tracing, so move the conversion into the tracepoints that are not called
at all when tracing isn't enabled.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This is called for every write in the fast path, move it inline next
to get_disk_ro() which is called internally.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We added RQF_ELV to tell whether there's an IO scheduler attached, and
RQF_ELVPRIV tells us whether there's an IO scheduler with private data
attached. Don't check RQF_ELV in blk_mq_free_request(), what we care
about here is just if we have scheduler private data attached.
This fixes a boot crash
Fixes: 2ff0682da6e0 ("block: store elevator state in request")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Reported-by: syzbot+eb8104072aeab6cc1195@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reading the inode size brings in a new cacheline for IO submit, and
it's in the hot path being checked for every single IO. When doing
millions of IOs per core per second, this is noticeable overhead.
Cache the nr_sectors in the bdev itself.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use the proper helper to read the block device size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20211018101130.1838532-25-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Instead of calling blk_mq_end_request() on a single request, add a helper
that takes the new struct io_comp_batch and completes any request stored
in there.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
struct io_comp_batch contains a list head and a completion handler, which
will allow completions to more effciently completed batches of IO.
For now, no functional changes in this patch, we just define the
io_comp_batch structure and add the argument to the file_operations iopoll
handler.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Instead of open-coding the list additions, traversal, and removal,
provide a basic set of helpers.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Just like the blk_mq_ctx counterparts, we've got a bunch of counters
in here that are only for debugfs and are of questionnable value. They
are:
- dispatched, index of how many requests were dispatched in one go
- poll_{considered,invoked,success}, which track poll sucess rates. We're
confident in the iopoll implementation at this point, don't bother
tracking these.
As a bonus, this shrinks each hardware queue from 576 bytes to 512 bytes,
dropping a whole cacheline.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
These were added as part of early days debugging for blk-mq, and they
are not really useful anymore. Rather than spend cycles updating them,
just get rid of them.
As a bonus, this shrinks the per-cpu software queue size from 256b
to 192b. That's a whole cacheline less.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a local variable for rq_flags, it helps to compile out some of
rq_flags reloads.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We should have enough of registers in blk_mq_rq_ctx_init(), store them
in local vars, so we don't keep reloading them.
note: keeping q->elevator may look unnecessary, but it's also used
inside inlined blk_mq_tags_from_data().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Don't init rq->hash and rq->rb_node in blk_mq_rq_ctx_init() if there is
no elevator. Also, move some other initialisers that imply barriers to
the end, so the compiler is free to rearrange and optimise other the
rest of them.
note: fold in a change from Jens leaving queue_list unconditional, as
it might lead to problems otherwise.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When adding partitions to the disk, the reference count of the disk
object is increased. then alloc partition device and called
device_add(), if the device_add() return error, the reference
count of the disk object will be reduced twice, at put_device(pdev)
and put_disk(disk). this leads to the end of the object's life cycle
prematurely, and trigger following calltrace.
__init_work+0x2d/0x50 kernel/workqueue.c:519
synchronize_rcu_expedited+0x3af/0x650 kernel/rcu/tree_exp.h:847
bdi_remove_from_list mm/backing-dev.c:938 [inline]
bdi_unregister+0x17f/0x5c0 mm/backing-dev.c:946
release_bdi+0xa1/0xc0 mm/backing-dev.c:968
kref_put include/linux/kref.h:65 [inline]
bdi_put+0x72/0xa0 mm/backing-dev.c:976
bdev_free_inode+0x11e/0x220 block/bdev.c:408
i_callback+0x3f/0x70 fs/inode.c:226
rcu_do_batch kernel/rcu/tree.c:2508 [inline]
rcu_core+0x76d/0x16c0 kernel/rcu/tree.c:2743
__do_softirq+0x1d7/0x93b kernel/softirq.c:558
invoke_softirq kernel/softirq.c:432 [inline]
__irq_exit_rcu kernel/softirq.c:636 [inline]
irq_exit_rcu+0xf2/0x130 kernel/softirq.c:648
sysvec_apic_timer_interrupt+0x93/0xc0
making disk is NULL when calling put_disk().
Reported-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211018103422.2043-1-qiang.zhang1211@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add an rq private RQF_ELV flag, which tells the block layer that this
request was initialized on a queue that has an IO scheduler attached.
This allows for faster checking in the fast path, rather than having to
deference rq->q later on.
Elevator switching does full quiesce of the queue before detaching an
IO scheduler, so it's safe to cache this in the request itself.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We set BIO_TRACKED unconditionally when rq_qos_throttle() is called, even
though we may not even have an rq_qos handler. Only mark it as TRACKED if
it really is potentially tracked.
This saves considerable time for the case where the bio isn't tracked:
2.64% -1.65% [kernel.vmlinux] [k] bio_endio
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
For some reason we still have them in blk-core, with the rest of the
request completion being in blk-mq. That causes and out-of-line call
for each completion.
Move them into blk-mq.c instead, where they belong.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We have exactly one caller of this, just get rid of adding the useless
function name to the output.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we're completing nbytes and nbytes is the size of the bio, don't bother
with calling into the iterator increment helpers. Just clear the bio
size and we're done.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>