IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
In preparation for allowing BIO based device drivers to use zone write
plugging and its zone append emulation, allow these drivers to call
blk_revalidate_disk_zones() so that all zone resources necessary to zone
write plugging can be initialized.
To do so, remove the check in blk_revalidate_disk_zones() restricting
the use of this function to mq request-based drivers to allow also
BIO-based drivers to use it. This is safe to do as long as the
BIO-based block device queue is already setup and usable, as it should,
and can be safely frozen.
The helper function disk_need_zone_resources() is added to control the
allocation and initialization of the zone write plug hash table and
of the conventional zone bitmap only for mq devices and for BIO-based
devices that require zone append emulation.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
Tested-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240408014128.205141-12-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Given that zone write plugging manages all writes to zones of a zoned
block device and tracks the write pointer position of all zones that are
not full nor empty, emulating zone append operations using regular
writes can be implemented generically, without relying on the underlying
device driver to implement such emulation. This is needed for devices
that do not natively support the zone append command (e.g. SMR
hard-disks).
A device may request zone append emulation by setting its
max_zone_append_sectors queue limit to 0. For such device, the function
blk_zone_wplug_prepare_bio() changes zone append BIOs into
non-mergeable regular write BIOs. Modified zone append BIOs are flagged
with the new BIO flag BIO_EMULATES_ZONE_APPEND. This flag is checked
on completion of the BIO in blk_zone_write_plug_bio_endio() to restore
the original REQ_OP_ZONE_APPEND operation code of the BIO.
The block layer internal inline helper function bio_is_zone_append() is
added to test if a BIO is either a native zone append operation
(REQ_OP_ZONE_APPEND operation code) or if it is flagged with
BIO_EMULATES_ZONE_APPEND. Given that both native and emulated zone
append BIO completion handling should be similar, The functions
blk_update_request() and blk_zone_complete_request_bio() are modified to
use bio_is_zone_append() to execute blk_zone_update_request_bio() for
both native and emulated zone append operations.
This commit contains contributions from Christoph Hellwig <hch@lst.de>.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
Tested-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240408014128.205141-11-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In preparation for adding a generic zone append emulation using zone
write plugging, allow device drivers supporting zoned block device to
set a the max_zone_append_sectors queue limit of a device to 0 to
indicate the lack of native support for zone append operations and that
the block layer should emulate these operations using regular write
operations.
blk_queue_max_zone_append_sectors() is modified to allow passing 0 as
the max_zone_append_sectors argument. The function
queue_max_zone_append_sectors() is also modified to ensure that the
minimum of the max_hw_sectors and chunk_sectors limit is used whenever
the max_zone_append_sectors limit is 0. This minimum is consistent with
the value set for the max_zone_append_sectors limit by the function
blk_validate_zoned_limits() when limits for a queue are validated.
The helper functions queue_emulates_zone_append() and
bdev_emulates_zone_append() are added to test if a queue (or block
device) emulates zone append operations.
In order for blk_revalidate_disk_zones() to accept zoned block devices
relying on zone append emulation, the direct check to the
max_zone_append_sectors queue limit of the disk is replaced by a check
using the value returned by queue_max_zone_append_sectors(). Similarly,
queue_zone_append_max_show() is modified to use the same accessor so
that the sysfs attribute advertizes the non-zero limit that will be
used, regardless if it is for native or emulated commands.
For stacking drivers, a top device should not need to care if the
underlying devices have native or emulated zone append operations.
blk_stack_limits() is thus modified to set the top device
max_zone_append_sectors limit using the new accessor
queue_limits_max_zone_append_sectors(). queue_max_zone_append_sectors()
is modified to use this function as well. Stacking drivers that require
zone append emulation, e.g. dm-crypt, can still request this feature by
calling blk_queue_max_zone_append_sectors() with a 0 limit.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
Tested-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240408014128.205141-10-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
For a zoned block device that has no limit on the number of open zones
and no limit on the number of active zones, the zone write plug mempool
is created with a size of 128 zone write plugs. For such case, set the
device max_open_zones queue limit to this value to indicate to the user
the potential performance penalty that may happen when writing
simultaneously to more zones than the mempool size.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
Tested-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240408014128.205141-9-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Zone write plugging implements a per-zone "plug" for write operations
to control the submission and execution order of write operations to
sequential write required zones of a zoned block device. Per-zone
plugging guarantees that at any time there is at most only one write
request per zone being executed. This mechanism is intended to replace
zone write locking which implements a similar per-zone write throttling
at the scheduler level, but is implemented only by mq-deadline.
Unlike zone write locking which operates on requests, zone write
plugging operates on BIOs. A zone write plug is simply a BIO list that
is atomically manipulated using a spinlock and a kblockd submission
work. A write BIO to a zone is "plugged" to delay its execution if a
write BIO for the same zone was already issued, that is, if a write
request for the same zone is being executed. The next plugged BIO is
unplugged and issued once the write request completes.
This mechanism allows to:
- Untangle zone write ordering from block IO schedulers. This allows
removing the restriction on using mq-deadline for writing to zoned
block devices. Any block IO scheduler, including "none" can be used.
- Zone write plugging operates on BIOs instead of requests. Plugged
BIOs waiting for execution thus do not hold scheduling tags and thus
are not preventing other BIOs from executing (reads or writes to
other zones). Depending on the workload, this can significantly
improve the device use (higher queue depth operation) and
performance.
- Both blk-mq (request based) zoned devices and BIO-based zoned devices
(e.g. device mapper) can use zone write plugging. It is mandatory
for the former but optional for the latter. BIO-based drivers can
use zone write plugging to implement write ordering guarantees, or
the drivers can implement their own if needed.
- The code is less invasive in the block layer and is mostly limited to
blk-zoned.c with some small changes in blk-mq.c, blk-merge.c and
bio.c.
Zone write plugging is implemented using struct blk_zone_wplug. This
structure includes a spinlock, a BIO list and a work structure to
handle the submission of plugged BIOs. Zone write plugs structures are
managed using a per-disk hash table.
Plugging of zone write BIOs is done using the function
blk_zone_write_plug_bio() which returns false if a BIO execution does
not need to be delayed and true otherwise. This function is called
from blk_mq_submit_bio() after a BIO is split to avoid large BIOs
spanning multiple zones which would cause mishandling of zone write
plugs. This ichange enables by default zone write plugging for any mq
request-based block device. BIO-based device drivers can also use zone
write plugging by expliclty calling blk_zone_write_plug_bio() in their
->submit_bio method. For such devices, the driver must ensure that a
BIO passed to blk_zone_write_plug_bio() is already split and not
straddling zone boundaries.
Only write and write zeroes BIOs are plugged. Zone write plugging does
not introduce any significant overhead for other operations. A BIO that
is being handled through zone write plugging is flagged using the new
BIO flag BIO_ZONE_WRITE_PLUGGING. A request handling a BIO flagged with
this new flag is flagged with the new RQF_ZONE_WRITE_PLUGGING flag.
The completion of BIOs and requests flagged trigger respectively calls
to the functions blk_zone_write_bio_endio() and
blk_zone_write_complete_request(). The latter function is used to
trigger submission of the next plugged BIO using the zone plug work.
blk_zone_write_bio_endio() does the same for BIO-based devices.
This ensures that at any time, at most one request (blk-mq devices) or
one BIO (BIO-based devices) is being executed for any zone. The
handling of zone write plugs using a per-zone plug spinlock maximizes
parallelism and device usage by allowing multiple zones to be writen
simultaneously without lock contention.
Zone write plugging ignores flush BIOs without data. Hovever, any flush
BIO that has data is always plugged so that the write part of the flush
sequence is serialized with other regular writes.
Given that any BIO handled through zone write plugging will be the only
BIO in flight for the target zone when it is executed, the unplugging
and submission of a BIO will have no chance of successfully merging with
plugged requests or requests in the scheduler. To overcome this
potential performance degradation, blk_mq_submit_bio() calls the
function blk_zone_write_plug_attempt_merge() to try to merge other
plugged BIOs with the one just unplugged and submitted. Successful
merging is signaled using blk_zone_write_plug_bio_merged(), called from
bio_attempt_back_merge(). Furthermore, to avoid recalculating the number
of segments of plugged BIOs to attempt merging, the number of segments
of a plugged BIO is saved using the new struct bio field
__bi_nr_segments. To avoid growing the size of struct bio, this field is
added as a union with the bio_cookie field. This is safe to do as
polling is always disabled for plugged BIOs.
When BIOs are plugged in a zone write plug, the device request queue
usage counter is always incremented. This reference is kept and reused
for blk-mq devices when the plugged BIO is unplugged and submitted
again using submit_bio_noacct_nocheck(). For this case, the unplugged
BIO is already flagged with BIO_ZONE_WRITE_PLUGGING and
blk_mq_submit_bio() proceeds directly to allocating a new request for
the BIO, re-using the usage reference count taken when the BIO was
plugged. This extra reference count is dropped in
blk_zone_write_plug_attempt_merge() for any plugged BIO that is
successfully merged. Given that BIO-based devices will not take this
path, the extra reference is dropped after a plugged BIO is unplugged
and submitted.
Zone write plugs are dynamically allocated and managed using a hash
table (an array of struct hlist_head) with RCU protection.
A zone write plug is allocated when a write BIO is received for the
zone and not freed until the zone is fully written, reset or finished.
To detect when a zone write plug can be freed, the write state of each
zone is tracked using a write pointer offset which corresponds to the
offset of a zone write pointer relative to the zone start. Write
operations always increment this write pointer offset. Zone reset
operations set it to 0 and zone finish operations set it to the zone
size.
If a write error happens, the wp_offset value of a zone write plug may
become incorrect and out of sync with the device managed write pointer.
This is handled using the zone write plug flag BLK_ZONE_WPLUG_ERROR.
The function blk_zone_wplug_handle_error() is called from the new disk
zone write plug work when this flag is set. This function executes a
report zone to update the zone write pointer offset to the current
value as indicated by the device. The disk zone write plug work is
scheduled whenever a BIO flagged with BIO_ZONE_WRITE_PLUGGING completes
with an error or when bio_zone_wplug_prepare_bio() detects an unaligned
write. Once scheduled, the disk zone write plugs work keeps running
until all zone errors are handled.
To match the new data structures used for zoned disks, the function
disk_free_zone_bitmaps() is renamed to the more generic
disk_free_zone_resources(). The function disk_init_zone_resources() is
also introduced to initialize zone write plugs resources when a gendisk
is allocated.
In order to guarantee that the user can simultaneously write up to a
number of zones equal to a device max active zone limit or max open zone
limit, zone write plugs are allocated using a mempool sized to the
maximum of these 2 device limits. For a device that does not have
active and open zone limits, 128 is used as the default mempool size.
If a change to the device active and open zone limits is detected, the
disk mempool is resized when blk_revalidate_disk_zones() is executed.
This commit contains contributions from Christoph Hellwig <hch@lst.de>.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
Tested-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240408014128.205141-8-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In preparation for adding zone write plugging, modify
blk_revalidate_disk_zones() to get the capacity of zones of a zoned
block device. This capacity value as a number of 512B sectors is stored
in the gendisk zone_capacity field.
Given that host-managed SMR disks (including zoned UFS drives) and all
known NVMe ZNS devices have the same zone capacity for all zones
blk_revalidate_disk_zones() returns an error if different capacities are
detected for different zones.
This also adds check to verify that the values reported by the device
for zone capacities are correct, that is, that the zone capacity is
never 0, does not exceed the zone size and is equal to the zone size for
conventional zones.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
Tested-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240408014128.205141-7-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Remove "static" from the definition of bio_attempt_back_merge() and
declare this function in block/blk.h to allow using it internally from
other block layer files. The definition of enum bio_merge_status is
also moved to block/blk.h.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
Tested-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240408014128.205141-6-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Implement the inline helper functions bio_straddles_zones() and
bio_offset_from_zone_start() to respectively test if a BIO crosses a
zone boundary (the start sector and last sector belong to different
zones) and to obtain the offset of a BIO from the start sector of its
target zone.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
Tested-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240408014128.205141-5-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
On completion of a zone append request, the request sector indicates the
location of the written data. This value must be returned to the user
through the BIO iter sector. This is done in 2 places: in
blk_complete_request() and in blk_update_request(). Introduce the inline
helper function blk_zone_update_request_bio() to avoid duplicating
this BIO update for zone append requests, and to compile out this
helper call when CONFIG_BLK_DEV_ZONED is not enabled.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
Tested-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240408014128.205141-4-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Moving req_bio_endio() code into its only caller, blk_update_request(),
allows reducing accesses to and tests of bio and request fields. Also,
given that partial completions of zone append operations is not
possible and that zone append operations cannot be merged, the update
of the BIO sector using the request sector for these operations can be
moved directly before the call to bio_endio().
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
Tested-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240408014128.205141-3-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
On completion of a flush sequence, blk_flush_restore_request() restores
the bio of a request to the original submitted BIO. However, the last
use of the request in the flush sequence may have been for a POSTFLUSH
which does not have a sector. So make sure to restore the request sector
using the iter sector of the original BIO. This BIO has not changed yet
since the completions of the flush sequence intermediate steps use
requeueing of the request until all steps are completed.
Restoring the request sector ensures that blk_mq_end_request() will see
a valid sector as originally set when the flush BIO was submitted.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
Tested-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240408014128.205141-2-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blkdev_dio_unaligned() is called from __blkdev_direct_IO(),
__blkdev_direct_IO_simple(), and __blkdev_direct_IO_async(), and all these
are only called from blkdev_direct_IO().
Move the blkdev_dio_unaligned() call to the common callsite,
blkdev_direct_IO().
Pass those functions the bdev pointer from blkdev_direct_IO(), as it is
non-trivial to look up.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240415122020.1541594-1-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use bio_list_merge_init instead of open coding bio_list_merge and
bio_list_init.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David Sterba <dsterba@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240328084147.2954434-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use bio_list_merge_init instead of open coding bio_list_merge and
bio_list_init.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240328084147.2954434-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use bio_list_merge_init instead of open coding bio_list_merge and
bio_list_init.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240328084147.2954434-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This is a simple combination of bio_list_merge + bio_list_init
similar to list_splice_init. While it only saves a single
line in a callers, it makes the move all bios from one list to
another and reinitialize the original pattern a lot more obvious
in the callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Sakai <msakai@redhat.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240328084147.2954434-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently tg_prfill_limit() uses a combination of snprintf() and strcpy()
to generate the values parts of the limits string, before passing them as
arguments to seq_printf().
Convert to use only a sequence of seq_printf() calls per argument, which is
simpler.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240327094020.3505514-1-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Kernel parameter of `isolcpus=` or 'nohz_full=' are used to isolate CPUs
for specific task, and it isn't expected to let block IO disturb these CPUs.
blk-mq kworker shouldn't be scheduled on isolated CPUs. Also if isolated
CPUs is run for blk-mq kworker, long block IO latency can be caused.
Kernel workqueue only respects CPU isolation for WQ_UNBOUND, for bound
WQ, the responsibility is on user because CPU is specified as WQ API
parameter, such as mod_delayed_work_on(cpu), queue_delayed_work_on(cpu)
and queue_work_on(cpu).
So not run blk-mq kworker on isolated CPUs by removing isolated CPUs
from hctx->cpumask. Meantime use queue map to check if all CPUs in this
hw queue are offline instead of hctx->cpumask, this way can avoid any
cost in fast IO code path, and is safe since hctx->cpumask are only
used in the two cases.
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Andrew Theurer <atheurer@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Sebastian Jug <sejug@redhat.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Tejun Heo <tj@kernel.org>
Tesed-by: Joe Mario <jmario@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Link: https://lore.kernel.org/r/20240322021244.1056223-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This debugging check will become more costly in the future when we shrink
struct page. It has not proven to be useful, so simply remove it.
This lets us use __xa_insert instead of __xa_cmpxchg() as we no longer
need to know about the page that is currently stored in the XArray.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20240315181212.2573753-1-willy@infradead.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- Deduplicate Kconfig entries for CONFIG_CXL_PMU
- Fix unselectable choice entry in MIPS Kconfig, and forbid this
structure
- Remove unused include/asm-generic/export.h
- Fix a NULL pointer dereference bug in modpost
- Enable -Woverride-init warning consistently with W=1
- Drop KCSAN flags from *.mod.c files
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmYJVK0VHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGf/sP/3GOk//cQGwPyWCgtCEUo6T4yyD7
1m2TTR0JQk/lcohSFtYk0I20rhKRqU6yAMAERmyehI66D2QY7lhiYVc16ram5y04
x0nWxd9IqerIlGJtaWePOvNqKdCw2EP9fS9NKz58rEDMGlsSf0Rd3NEdSsWoH8td
dECtt8yCawENAMStb/rAfsnL6kn2JIhVMyqwo0RdQfiaVT5Zk6Qgpko0Oq0ncRP2
qdNgHbvnJdKMy81FHSBAi0QEZOYvhFNX+E+6lFfWEsX6xT+wvXddCNQzJf/YV3Cw
Klw1tGveV7UGzlZ4fsnFrv4V6g1KO2AD3342efdDo++ypBEBpImVODc+Rp0jE9Nk
OgdOQRe2k9a5keH0LWY0ehvDbQlSbfNxk0wNtAfo5Kk5e41nHmHJBWCwGG+cXrjJ
mPJjSrTpuNVSaGV0kt3EskHbDBeBmIIg+5QPbldmW2qcC88kWoavkyLD3WPFsg/a
CAuR/HqH7MDfxzvsqTCjonlVcyDKX6aW66LrQ1NCtmphI4F8mdKp746CzGlziuIm
gjYJL/UWVlx0VebMo8dwDpaHvez4/4s6xAJcyqtA+TS5HbrQWKQuwFkiv4iWQxNd
MvyVdzgKhcMdoXhfFpUZ0LlFvHGefJ+Z6N1FQLoQJkTirt5aqRbEAjP0VXwQB4eH
zYygkhvvtiH9/STu
=tx+2
-----END PGP SIGNATURE-----
Merge tag 'kbuild-fixes-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Deduplicate Kconfig entries for CONFIG_CXL_PMU
- Fix unselectable choice entry in MIPS Kconfig, and forbid this
structure
- Remove unused include/asm-generic/export.h
- Fix a NULL pointer dereference bug in modpost
- Enable -Woverride-init warning consistently with W=1
- Drop KCSAN flags from *.mod.c files
* tag 'kbuild-fixes-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: Fix typo HEIGTH to HEIGHT
Documentation/llvm: Note s390 LLVM=1 support with LLVM 18.1.0 and newer
kbuild: Disable KCSAN for autogenerated *.mod.c intermediaries
kbuild: make -Woverride-init warnings more consistent
modpost: do not make find_tosym() return NULL
export.h: remove include/asm-generic/export.h
kconfig: do not reparent the menu inside a choice block
MIPS: move unselectable FIT_IMAGE_FDT_EPM5 out of the "System type" choice
cxl: remove CONFIG_CXL_PMU entry in drivers/cxl/Kconfig
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmYJSoQACgkQEsHwGGHe
VUqGSxAAoEgLUz3A3rxPhUQ4P6ZCNYRIOV+/877EaADCE0FhBnxUVbtxxYMbfnjD
vZ1AsMH9KpZzIGkUJk9501p7yD5Vdgrqwl8A3+zcs52BpXj+1y5dnb74AVN9dLaQ
f0zfvcJorSL4MjHZ04EdgU6ZSYN3O60LzA8zYVDEki1uVru3CIrJzlKB5fjuFd8h
Of17FuL1Tx/CyrLwxfv1O2drYGtjGw+OkIDiw/NZs4lr9+eA5xSxlS/aoipmN6aZ
F0TajJMjk5ME33mRYasBfprlB8CI7OkuZi0DvMALH1FjsW8mRon+qQ0QDuaoWoKN
rdubBoufHokbJ+oAt8inytNAXaTIj0AMDESP4gG4olaPXR41nTlE6ur1bzGbrfRC
g6FABAMIK/sgeMIZPT9Tu08G7j/QgaKFB9U8V5YNir9eruA6rAsfGVB4M7FHy+tX
zlx0nIZAYyqTLG9PlawnnNFh+IzLmwjB4vDwwbN7Jx0G0iU/FVdjgdtyLXezYMtd
QA1JFbai0irAQUCNoVIhiq8pCEbnvfq+pc6JQ9mnvhwJZrMuN+eXj4XBPPmQHN45
p9yZiNf09jPtV6B3HeLQc+PmOG5kKz+vfyAq7LuWtyFLvVEztssmsOSjx/cDn+xf
MeXL8q+8GnsaZD5kKr6oBPDYrOT0ebML9A+Qq6V4OEJhGmo8ErI=
=EzDx
-----END PGP SIGNATURE-----
Merge tag 'edac_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:
- Fix more issues in the AMD FMPM driver
* tag 'edac_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
RAS: Avoid build errors when CONFIG_DEBUG_FS=n
RAS/AMD/FMPM: Safely handle saved records of various sizes
RAS/AMD/FMPM: Avoid NULL ptr deref in get_saved_records()
- Fix the IRQ sharing with pinctrl-amd and ACPI OSL
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmYJSYMACgkQEsHwGGHe
VUoWag//STcc2r8aw6PdAvtm0VlUqY1oTrvMK6jvJORXFYEFl+R9lMW7sg9jAsDo
fCcmRWFLD1dLy+eU+QNW4WNeqBlfODHKWQsxxtP9gl93qxvRAF6gvWW06JURvIoL
yqrJGj9P3X7WsccaQAhOPpDKg93yU+uXvhc3J3Edi0WXsdBZ2g6jsAFj35TbAPb/
DRl+8uFO3lGzF64ygg2hUFBPYE1tAEApbHrTpahQBHwiyA3iCKg0djgI9fnIM5E0
Q4mCtcs6JwkpRY5Z/QYLT/VJsrFmVrxRyBnPAWZ677x/QTfGZk/QKZ7kDkQuTh/k
jmfUVeJ4hnWmBg72GwYuahqKLlPlGUeG3iePQtw2rAKpDCU7xC3QMxbKzrVTCYKU
JG75uIzO7+0W0IBYLRAG+TYI/YVyK+UlbLV4AAu2QtEiqJ6pQ2ZLg9NcDtnJjNqU
CdwVgOeRGLLvQnKvS3Slj3+Z1wGEIFDE/w0dl9B0YEYMKiZNPoiClE/7VEA3b4Rn
Q8jsT9WoiiagFFwvkTpCofjvTgioOLXJ8h7adxREi9GSYQy2mxq8XsZQEViozM5W
mBVPRd2PCS7xIRwTBMehiK6Zta56z9h7RYhhpW2oPYy38/tHyhvvlyso9DF+I9pD
uQkOKfDDh26IRFBr0fMR8dD0Host8xDQS1h+hCwyi4iPX8sgB/A=
=80JR
-----END PGP SIGNATURE-----
Merge tag 'irq_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Fix an unused function warning on irqchip/irq-armada-370-xp
- Fix the IRQ sharing with pinctrl-amd and ACPI OSL
* tag 'irq_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/armada-370-xp: Suppress unused-function warning
genirq: Introduce IRQF_COND_ONESHOT and use it in pinctrl-amd
- Use the correct stalled cycles PMCs on AMD Zen2 and newer
- Fix detection of the LBR freeze feature on AMD
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmYJR3oACgkQEsHwGGHe
VUo/eRAAnGUq0rSi4ZUqsTtbu/jNepKbaeS7jR3/p3V6iSwfmjEoJ2xE6uIdN5vD
fnL6UkeDRMc8LaKHIdLD4ZbN8NRa3hOyzf5K7wwVp5bwle0NeyrcG5wVK8LgT/X/
rPSk7YxoR5frkYcA6zZwezJOv3HGYt8RMr5bKMD3YiJ35/XCdPsKnbHJTHb+F23Y
tYFBeyzRzOebQu0fFKP8ML9LbqvELESqJ5Smwu/jQ25aBW7sFsUNAxseGU2tYahX
c6pm8ytIlpZFwmi1HzXmMICF7lWugFO/KkP/ndCM1IpmujVGy56hrpLEy5gT3gzh
NE/nZDoqJAO2zhg2FuKybh3akdT+IgXUTjxYMYGUOkJIChzie3o4p9OqichgTIv5
+ngAq5qzjAHfC7cZ5nA96XWkw1fFU6BqlA3KPs1mzQU9uTDz7tSkyxIitp3C8L0B
JlilTr6yHUprJzFwCDk4hb+hfP5A9qYnrNeacMlldZmbH1jLYHEzB9FudK82MeM+
tIKFnM2jyRaRs/s8n+/UdrOVFNGk/+scX8GQllEBF451a8J5x1CYeHB7dGW+4pf/
cx5TupHg8dDRgNMsbaeEvwERoPu4h/VRozfBi6r1WjjskVm24lIdFFKTSm3BDbLk
EH3cflv/h8KE19cr0XLb7aYYw/9jb4cpnb0WBMw1gQOSvUMXzxU=
=gmta
-----END PGP SIGNATURE-----
Merge tag 'perf_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf fixes from Borislav Petkov:
- Define the correct set of default hw events on AMD Zen4
- Use the correct stalled cycles PMCs on AMD Zen2 and newer
- Fix detection of the LBR freeze feature on AMD
* tag 'perf_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd/core: Define a proper ref-cycles event for Zen 4 and later
perf/x86/amd/core: Update and fix stalled-cycles-* events for Zen 2 and later
perf/x86/amd/lbr: Use freeze based on availability
x86/cpufeatures: Add new word for scattered features
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmYJOv8ACgkQEsHwGGHe
VUrfCw/+MNZAKrldyZgLKndL8fNKxnbJkKltC5LgrkBtF0qyWWY2JLUJB0oyiTnf
LIhAcK+fmcUS5ugIqSFpRWwJYahmWwDMw6xu3zglbHtaGx/aGYn0CutprCZpoR0m
339g9JRL2tKK2tyZ1xodQXh6zjnF2gYzyhOTvgforYmRMJC/air1ZH8H+Cyyhrkw
lu5OXttB1wSZMTrx/J8bMFnmw6DpAr+owVgX/bZE7UbjpmRNGpc1NhevL+L96NjX
932cRbOzlAhRe6qGg3aTDMaY0MjJizixKagfmOK8L/QX63FtoUv7WIlpE4LzBZo/
Kh+bs+UkvXBVi5sNSleWvlLo/XZYqcjojVe39MvZ3c+Yea5H5iDY35mNH0yfP88O
+7ntrqbLRm+5g6ywg/HuQV57Vvy6jze0XZswbWciouDBYSqBM13ctiEe3L6N9wIX
7i+8T4ODkWH+l8hls8NhGr2IglGC/3pwB777yew6xUvBEUB4EB4vcD9kR6snmKsp
eAJHMnDfAJD/N0YShOAbzBMzSaavmfOkHO7Vj3b/qCe9fkhYRGVJOmsbtUsznyQP
4lawu0ikRCNRaTofKZc8tdkEtGbsn0CNFNrxjc/GghKVwF5+FqX1Z3P5wR7IhY5J
vz8qLRBW2Zu2K7aEnnBh/qr5/uLJr5Df9qnqpbGag4jxNHamRjg=
=VHYj
-----END PGP SIGNATURE-----
Merge tag 'timers_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timers update from Borislav Petkov:
- Volunteer in Anna-Maria and Frederic as timers co-maintainers so that
tglx can relax more :-P
* tag 'timers_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Add co-maintainers for time[rs]
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmYJOlEACgkQEsHwGGHe
VUpfLg//R2a9SILACbpfUU3sun7qN2EKRzi7ed0zcZZaw64WNvXtnRK3ogY19oML
JDjcCv0SlbRu1gT+WdB8XZfITi3UDNbMBgt/WY8EG6k7isXTs4A3mrLfSkpOnpNK
sWLtSEcTUktHR5ufngf7tlab7S09xvCmHF5BKqY3XyahbzOkIe2t06CbYFEXozmg
2pqC9ySYnZyroChYIFOChhcU3aGdUznxHmawvg6Avlo9xjmFmKRnAP+ScCrfAcrP
Ar0D14UFKuJXeT4UZI9pe/2snNEOmR3H8l088+R1dG8pWtlK7k+PQwRYq7jYQEMU
8R23bEK4SWAMFkNobPpoFYdNgt7IyJ/ZOEwiTMVeFfFu5dbIJROqW2ovic97o3z+
3DaaKY0/nMvNjWKeUekpVUx/CUOvACiskPC0340thfYTXuSVziTaT8MA/P7hsj8H
Udt9BCOw+qvFxWu7PI9UHe2IoBS4a2Cm8OXHEocMwkQqxYq2WKdUG+xoErSyTvYY
jJ7KLQ/J1yg1GWoaXUbvTa1cFZj+AKyjmsyrcdu19BpQWkzd4OY8IQffdU1Esg2S
BJYG691pKgS0xkjfwF+a2pzZ9bEWAA5Z9W0b81onySAHmDuqywLPR+brbPIhv1ZW
ma7MBDc/zoHAQ+1g3+UwficqYAA0tIY7dJOFXdAd8iYu8z8YJd0=
=duc+
-----END PGP SIGNATURE-----
Merge tag 'objtool_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fix from Borislav Petkov:
- Fix a format specifier build error in objtool during an x32 build
* tag 'objtool_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix compile failure when using the x32 compiler
make ... arch/x86/virt/vmx/tdx/seamcall.o
work again
- Do not do ROM range scans and memory validation when the kernel is
running as a SEV-SNP guest as those can get problematic and, before
that, are not really needed in such a guest
- Exclude the build-time generated vdso-image-x32.o object from objtool
validation and in particular the return sites in there due to
a warning which fires when an unpatched return thunk is being used
- Improve the NMI CPUs stall message to show additional information
about the state of each CPU wrt the NMI handler
- Enable gcc named address spaces support only on !KCSAN configs due to
compiler options incompatibility
- Revert a change which was trying to use GB pages for mapping regions
only when the regions would be large enough but that change lead to
kexec failing
- A documentation fixlet
-----BEGIN PGP SIGNATURE-----
iQIyBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmYJN64ACgkQEsHwGGHe
VUo9KA/48ELqeKhCAFQdkZn8nnYf57g7bSMibMCnI5/ndNjyuShdZO92SGtBV7fV
4L77vJRJXx2ytz4MztOPVClXaBEs4uDEa0NyZFSuEdvRIvJuglibsleQOFTbKfkD
HdL0geI5exyfCC3BmslCS857sd6aBanqmYPddVk1+5mwhTGcAcwy9aiNTCzxxE22
KuaO12A0+UNOdXNLAuUythmYL2V0Xn2Z9sXpDRchwsRnj12C1S2flIhPsWxg9AU+
3ws9PnfTFTcfcViEug5p1nyN0gWCgYhMxaN8i3IS4smEc09Yq0pxVsitwYqJB1D6
+neC27FTF4DCBr4G/8yu/x5BVHS92s9VK4u9Wo4nf9M4XBlMVU+If0bn9jtvE6jI
GbdoHzoF7e2SEgzVvIjOHpdiBxEzW6S4lZGEtj6oaYR+cI0Fc7vZXoHy8kXohtTz
QT2M1Hl9tMh2HjkwvoQLtNMYVHMWnuDy1X9k25Qk4nnCQJy9DbGiPnpxaGuPtaJ7
s8kha4r+U/zpEiEa0S5AoIozVD/syXe1lOtaN2dUpOWjqBJAMGr1QkK3NnOxEcO3
CeKOCcSfjvvAKPyTnk1Vrtec5KwHmUgD/VN2fn7EdHZVwxlbaoCybpP3g4CT4Jpl
TQakg0du9TyG9BXituN4XaZgGskDYFRlXv+U+Lity4wvEjWgmg==
=JrjO
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Make sure single object builds in arch/x86/virt/ ala
make ... arch/x86/virt/vmx/tdx/seamcall.o
work again
- Do not do ROM range scans and memory validation when the kernel is
running as a SEV-SNP guest as those can get problematic and, before
that, are not really needed in such a guest
- Exclude the build-time generated vdso-image-x32.o object from objtool
validation and in particular the return sites in there due to a
warning which fires when an unpatched return thunk is being used
- Improve the NMI CPUs stall message to show additional information
about the state of each CPU wrt the NMI handler
- Enable gcc named address spaces support only on !KCSAN configs due to
compiler options incompatibility
- Revert a change which was trying to use GB pages for mapping regions
only when the regions would be large enough but that change lead to
kexec failing
- A documentation fixlet
* tag 'x86_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Use obj-y to descend into arch/x86/virt/
x86/sev: Skip ROM range scans and validation for SEV-SNP guests
x86/vdso: Fix rethunk patching for vdso-image-x32.o too
x86/nmi: Upgrade NMI backtrace stall checks & messages
x86/percpu: Disable named address spaces for KCSAN
Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."
Documentation/x86: Fix title underline length
Fixed a typo in some variables where height was misspelled as heigth.
Signed-off-by: Isak Ellmer <isak01@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
As of the first s390 pull request during the 6.9 merge window,
commit 691632f0e869 ("Merge tag 's390-6.9-1' of
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux"), s390 can be
built with LLVM=1 when using LLVM 18.1.0, which is the first version
that has SystemZ support implemented in ld.lld and llvm-objcopy.
Update the supported architectures table in the Kbuild LLVM
documentation to note this explicitly to make it more discoverable by
users and other developers. Additionally, this brings s390 in line with
the rest of the architectures in the table, which all support LLVM=1.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
When KCSAN and CONSTRUCTORS are enabled, one can trigger the
"Unpatched return thunk in use. This should not happen!"
catch-all warning.
Usually, when objtool runs on the .o objects, it does generate a section
.return_sites which contains all offsets in the objects to the return
thunks of the functions present there. Those return thunks then get
patched at runtime by the alternatives.
KCSAN and CONSTRUCTORS add this to the object file's .text.startup
section:
-------------------
Disassembly of section .text.startup:
...
0000000000000010 <_sub_I_00099_0>:
10: f3 0f 1e fa endbr64
14: e8 00 00 00 00 call 19 <_sub_I_00099_0+0x9>
15: R_X86_64_PLT32 __tsan_init-0x4
19: e9 00 00 00 00 jmp 1e <__UNIQUE_ID___addressable_cryptd_alloc_aead349+0x6>
1a: R_X86_64_PLT32 __x86_return_thunk-0x4
-------------------
which, if it is built as a module goes through the intermediary stage of
creating a <module>.mod.c file which, when translated, receives a second
constructor:
-------------------
Disassembly of section .text.startup:
0000000000000010 <_sub_I_00099_0>:
10: f3 0f 1e fa endbr64
14: e8 00 00 00 00 call 19 <_sub_I_00099_0+0x9>
15: R_X86_64_PLT32 __tsan_init-0x4
19: e9 00 00 00 00 jmp 1e <_sub_I_00099_0+0xe>
1a: R_X86_64_PLT32 __x86_return_thunk-0x4
...
0000000000000030 <_sub_I_00099_0>:
30: f3 0f 1e fa endbr64
34: e8 00 00 00 00 call 39 <_sub_I_00099_0+0x9>
35: R_X86_64_PLT32 __tsan_init-0x4
39: e9 00 00 00 00 jmp 3e <__ksymtab_cryptd_alloc_ahash+0x2>
3a: R_X86_64_PLT32 __x86_return_thunk-0x4
-------------------
in the .ko file.
Objtool has run already so that second constructor's return thunk cannot
be added to the .return_sites section and thus the return thunk remains
unpatched and the warning rightfully fires.
Drop KCSAN flags from the mod.c generation stage as those constructors
do not contain data races one would be interested about.
Debugged together with David Kaplan <David.Kaplan@amd.com> and Nikolay
Borisov <nik.borisov@suse.com>.
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Closes: https://lore.kernel.org/r/0851a207-7143-417e-be31-8bf2b3afb57d@molgen.mpg.de
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> # Dell XPS 13
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Reviewed-by: Marco Elver <elver@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
The -Woverride-init warn about code that may be intentional or not,
but the inintentional ones tend to be real bugs, so there is a bit of
disagreement on whether this warning option should be enabled by default
and we have multiple settings in scripts/Makefile.extrawarn as well as
individual subsystems.
Older versions of clang only supported -Wno-initializer-overrides with
the same meaning as gcc's -Woverride-init, though all supported versions
now work with both. Because of this difference, an earlier cleanup of
mine accidentally turned the clang warning off for W=1 builds and only
left it on for W=2, while it's still enabled for gcc with W=1.
There is also one driver that only turns the warning off for newer
versions of gcc but not other compilers, and some but not all the
Makefiles still use a cc-disable-warning conditional that is no
longer needed with supported compilers here.
Address all of the above by removing the special cases for clang
and always turning the warning off unconditionally where it got
in the way, using the syntax that is supported by both compilers.
Fixes: 2cd3271b7a31 ("kbuild: avoid duplicate warning options")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Andrew Jeffery <andrew@codeconstruct.com.au>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
When compiling the v6.9-rc1 kernel with the x32 compiler, the following
errors are reported. The reason is that we take an "unsigned long"
variable and print it using "PRIx64" format string.
In file included from check.c:16:
check.c: In function ‘add_dead_ends’:
/usr/src/git/linux-2.6/tools/objtool/include/objtool/warn.h:46:17: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 5 has type ‘long unsigned int’ [-Werror=format=]
46 | "%s: warning: objtool: " format "\n", \
| ^~~~~~~~~~~~~~~~~~~~~~~~
check.c:613:33: note: in expansion of macro ‘WARN’
613 | WARN("can't find unreachable insn at %s+0x%" PRIx64,
| ^~~~
...
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: linux-kernel@vger.kernel.org
* Allow stripe unit/width value passed via mount option to be written over
existing values in the super block.
* Do not set current->journal_info to avoid its value from being miused by
another filesystem context.
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZgKa+AAKCRAH7y4RirJu
9IL1APwPBMzSowijBI/rCD5BGlISn7mCRlZwvyXE1avmRmbQPAEApU5yRhBHWi62
629azfSr1I5m678xM7WQKh6X3/VUDAg=
=pqNH
-----END PGP SIGNATURE-----
Merge tag 'xfs-6.9-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Chandan Babu:
- Allow stripe unit/width value passed via mount option to be written
over existing values in the super block
- Do not set current->journal_info to avoid its value from being miused
by another filesystem context
* tag 'xfs-6.9-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: don't use current->journal_info
xfs: allow sunit mount option to repair bad primary sb stripe values
Fully half this pull is updates to lpfc and qla2xxx which got
committed just as the merge window opened. A sizeable fraction of the
driver updates are simple bug fixes (and lock reworks for bug fixes in
the case of lpfc), so rather than splitting the few actual
enhancements out, we're just adding the drivers to the -rc1 pull. The
enhancements for lpfc are log message removals, copyright updates and
three patches redefining types. For qla2xxx it's just removing a
debug message on module removal and the manufacturer detail update.
The two major fixes are the sg teardown race and a core error leg
problem with the procfs directory not being removed if we destroy a
created host that never got to the running state. The rest are minor
fixes and constifications.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZghLoiYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pisheICAQDVOLQd
GHg/lRzbbBbeqU8aDiZCSfbPlRUla1IutNlZCQD7BmlP8bMQuHcY4auHMttCeLYd
s+EDe2cpznokwuNP0d4=
=NtRd
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes and updates from James Bottomley:
"Fully half this pull is updates to lpfc and qla2xxx which got
committed just as the merge window opened. A sizeable fraction of the
driver updates are simple bug fixes (and lock reworks for bug fixes in
the case of lpfc), so rather than splitting the few actual
enhancements out, we're just adding the drivers to the -rc1 pull.
The enhancements for lpfc are log message removals, copyright updates
and three patches redefining types. For qla2xxx it's just removing a
debug message on module removal and the manufacturer detail update.
The two major fixes are the sg teardown race and a core error leg
problem with the procfs directory not being removed if we destroy a
created host that never got to the running state. The rest are minor
fixes and constifications"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (41 commits)
scsi: bnx2fc: Remove spin_lock_bh while releasing resources after upload
scsi: core: Fix unremoved procfs host directory regression
scsi: mpi3mr: Avoid memcpy field-spanning write WARNING
scsi: sd: Fix TCG OPAL unlock on system resume
scsi: sg: Avoid sg device teardown race
scsi: lpfc: Copyright updates for 14.4.0.1 patches
scsi: lpfc: Update lpfc version to 14.4.0.1
scsi: lpfc: Define types in a union for generic void *context3 ptr
scsi: lpfc: Define lpfc_dmabuf type for ctx_buf ptr
scsi: lpfc: Define lpfc_nodelist type for ctx_ndlp ptr
scsi: lpfc: Use a dedicated lock for ras_fwlog state
scsi: lpfc: Release hbalock before calling lpfc_worker_wake_up()
scsi: lpfc: Replace hbalock with ndlp lock in lpfc_nvme_unregister_port()
scsi: lpfc: Update lpfc_ramp_down_queue_handler() logic
scsi: lpfc: Remove IRQF_ONESHOT flag from threaded IRQ handling
scsi: lpfc: Move NPIV's transport unregistration to after resource clean up
scsi: lpfc: Remove unnecessary log message in queuecommand path
scsi: qla2xxx: Update version to 10.02.09.200-k
scsi: qla2xxx: Delay I/O Abort on PCI error
scsi: qla2xxx: Change debug message during driver unload
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEOZGx6rniZ1Gk92RdFA3kzBSgKbYFAmYIJi4ACgkQFA3kzBSg
KbZ3vg/+NyeJiAIOnKK0qnCHfKWB3RmVFLe7KupN5n8j9zwsWzWSYyACCdFCSl9R
uh9vVZV3oRbekvB3vpBjUktvwaNG9zTiCx/i/Vs7IOKOGvN6ZjwpZtoqJO4mgwzx
zOeh1xRjrbpzHWv6w5nXtJ1VVQMQ5V+oinfffKHKEeFuG0NuHYiyk5Cynwwt2uQX
hS6c+Dv4ufz65uctCDOdi41crTDLxnfrTKi8mUiPZKeFL3entN3qWOVwtLz6Cy1x
RukSjeKmmLKsROnhaV2Z22Nfr9RpFtZ2VIns/f/ZptPKTreWnj7aXSwifUsky4T1
IFwKT8rGyqPMVansdxjtTM3IjYggzgVL3Ffuk9JZTxWB9f8I0pSQOUUiqwC0ftRt
AYbOp/fM7o8DCJl198lrujV6TaCiQKc13NSAPEGweIatkZ/zj46zV8f3Ky1uKZib
3WmnMLkjVTBcX9THKmCjGp5UxGxynIuXrlu2/rToEQrAa0pFM7xnoI+FPLmKuSYy
uYRQxpItJ4/Q8xF5FYkjVQH7m208YbVt+9sQEN3m5mJ0ieso2vof09kMz7/OthSz
Y/VkXVEfr399DcPI3nbNL0NdGVp3zAhEVy73Hc+IcGjq+NPXmvjLjm2kxJQ0gsLC
LgAqbqVC37S0mswoae4/Dd7rYQFVeenZZoLwl0ob+bjTu+BQraw=
=Vp8P
-----END PGP SIGNATURE-----
Merge tag 'i2c-for-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
"A fix from Andi for I2C host drivers"
* tag 'i2c-for-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: i801: Fix a refactoring that broke a touchpad on Lenovo P1
Here are a bunch of small USB fixes for reported problems and
regressions for 6.9-rc2. Included in here are:
- deadlock fixes for long-suffering issues
- USB phy driver revert for reported problem
- typec fixes for reported problems
- duplicate id in dwc3 dropped
- dwc2 driver fixes
- udc driver warning fix
- cdc-wdm race bugfix
- other tiny USB bugfixes
All of these have been in linux-next this past week with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZggDDg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymuXQCg0/LF/RSoCer/7dczP7zglY+Mw+sAni6ft9jx
gzxF9jiqPAjjePT7YFgE
=AB/K
-----END PGP SIGNATURE-----
Merge tag 'usb-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a bunch of small USB fixes for reported problems and
regressions for 6.9-rc2. Included in here are:
- deadlock fixes for long-suffering issues
- USB phy driver revert for reported problem
- typec fixes for reported problems
- duplicate id in dwc3 dropped
- dwc2 driver fixes
- udc driver warning fix
- cdc-wdm race bugfix
- other tiny USB bugfixes
All of these have been in linux-next this past week with no reported
issues"
* tag 'usb-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (26 commits)
USB: core: Fix deadlock in port "disable" sysfs attribute
USB: core: Add hub_get() and hub_put() routines
usb: typec: ucsi: Check capabilities before cable and identity discovery
usb: typec: ucsi: Clear UCSI_CCI_RESET_COMPLETE before reset
usb: typec: ucsi_acpi: Refactor and fix DELL quirk
usb: typec: ucsi: Ack unsupported commands
usb: typec: ucsi: Check for notifications after init
usb: typec: ucsi: Clear EVENT_PENDING under PPM lock
usb: typec: Return size of buffer if pd_set operation succeeds
usb: udc: remove warning when queue disabled ep
usb: dwc3: pci: Drop duplicate ID
usb: dwc3: Properly set system wakeup
Revert "usb: phy: generic: Get the vbus supply"
usb: cdc-wdm: close race between read and workqueue
usb: dwc2: gadget: LPM flow fix
usb: dwc2: gadget: Fix exiting from clock gating
usb: dwc2: host: Fix ISOC flow in DDMA mode
usb: dwc2: host: Fix remote wakeup from hibernation
usb: dwc2: host: Fix hibernation flow
USB: core: Fix deadlock in usb_deauthorize_interface()
...
Here are two small staging driver fixes for the vc04_services driver
that resolve reported problems:
- strncpy fix for information leak
- another information leak discovered by the previous strncpy fix
Both of these have been in linux-next all this past week with no
reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZggDpw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynyVACeP70TljJdaUAsUgTvRDFjf7unXtoAoJl2Awz0
oXqFuGqwt+WJeXqjwamx
=60tL
-----END PGP SIGNATURE-----
Merge tag 'staging-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are two small staging driver fixes for the vc04_services driver
that resolve reported problems:
- strncpy fix for information leak
- another information leak discovered by the previous strncpy fix
Both of these have been in linux-next all this past week with no
reported issues"
* tag 'staging-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: vc04_services: fix information leak in create_component()
staging: vc04_services: changen strncpy() to strscpy_pad()
malfunctions on some Lenovo P1 models by incorrectly overwriting
a status variable during successful SMBUS transactions.
-----BEGIN PGP SIGNATURE-----
iIwEABYIADQWIQScDfrjQa34uOld1VLaeAVmJtMtbgUCZgXR5RYcYW5kaS5zaHl0
aUBrZXJuZWwub3JnAAoJENp4BWYm0y1uqAQBAJqp/3aqB816lBH0F2i2qTs/sXL4
lmeDWXG3XXkdOC+WAQCcxJ/EoJcNH6kDfMm0XvgyixTDVy68Mk7HMsQOg1ntBQ==
=vXmL
-----END PGP SIGNATURE-----
Merge tag 'i2c-host-fixes-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
One fix in the i801 driver where a bug caused touchpad
malfunctions on some Lenovo P1 models by incorrectly overwriting
a status variable during successful SMBUS transactions.
Commit c33621b4c5ad ("x86/virt/tdx: Wire up basic SEAMCALL functions")
introduced a new instance of core-y instead of the standardized obj-y
syntax.
X86 Makefiles descend into subdirectories of arch/x86/virt inconsistently;
into arch/x86/virt/ via core-y defined in arch/x86/Makefile, but into
arch/x86/virt/svm/ via obj-y defined in arch/x86/Kbuild.
This is problematic when you build a single object in parallel because
multiple threads attempt to build the same file.
$ make -j$(nproc) arch/x86/virt/vmx/tdx/seamcall.o
[ snip ]
AS arch/x86/virt/vmx/tdx/seamcall.o
AS arch/x86/virt/vmx/tdx/seamcall.o
fixdep: error opening file: arch/x86/virt/vmx/tdx/.seamcall.o.d: No such file or directory
make[4]: *** [scripts/Makefile.build:362: arch/x86/virt/vmx/tdx/seamcall.o] Error 2
Use the obj-y syntax, as it works correctly.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240330060554.18524-1-masahiroy@kernel.org
This kselftest fixes update for Linux 6.9-rc2 consists of fixes
to seccomp and ftrace tests and a change to add config file for
dmabuf-heap test to increase coverage.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmYHILAACgkQCwJExA0N
QxwigxAA6715Nlzu3K3Dv0PG4IXdlx4ETnWfYO0YU5C30C+JVnQ5aAeCHo1wXK8f
EkBNImD2Fdy+dlTLVLlVhGsqYKNkGj335af66HmKBbNa/oOvO/cTxuJc6awPH6Sp
zOuf1G7fddOpai20If5/1SS9esMjCEPAuywHdAmztrdUcj28qnhQCJr1mQoJUL9y
1ow/ghEFZnAM77TCIKwMU6ow4ufbTwCn2pH+ctWqRBjZ9C3N1DgNXzIf2N8vb4Jw
ExU8WyI2wB1XDrxm1wEiFMzJKpK7BTDq1DUJe12LF+7dEAEw8s/9Vvl3YSt6iipY
r8RQBbkxfVnH7gv0vtk6jEmx7UTDQzKIGr3KKHPedNwffq03ObqlU3yIooDgYK6d
iyMKxkIimL7Cw8/oP3DrONxUbfcvsLnXOfVcBUYqTrElW/bx0Z/8wOZtUGCiCWty
hNup0gq8Mwg4YoqNpg0JjoEdgxUcEy5GEzqWWFFuugMEJNDBQHun2hCaFNUu7sxZ
lCER5PZDVH0GAacvnTnUa6SPiVr0hHnv985sCM78rUxAVK8Yggb8cWNrqtW1S7Ee
avCST17JBWGyEkyUIq4cHBtxXlGpoO3tqK1wMhXElXm/WmFeFf9N0lkb3uhORZnv
xYrtCQGOnwIUVmA0QXQQshMWIqTHEjb2uJIKFabhwJptLSYzPgA=
=qPtu
-----END PGP SIGNATURE-----
Merge tag 'linux_kselftest-fixes-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"Fixes to seccomp and ftrace tests and a change to add config file for
dmabuf-heap test to increase coverage"
* tag 'linux_kselftest-fixes-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: dmabuf-heap: add config file for the test
selftests/seccomp: Try to fit runtime of benchmark into timeout
selftests/ftrace: Fix event filter target_func selection
This kunit update for Linux 6.9-rc2 consists of one urgent fix for
--alltests build failure related to renaming of CONFIG_DAMON_DBGFS
to DAMON_DBGFS_DEPRECATED to the missing config option.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmYHEeIACgkQCwJExA0N
QxwWpxAAkCyeAE/A0EZwNNKI0xP7FdclaApaXByM+bfrNc58lKc6x5Za7G4j4UTa
u4Ve52/eO3RCE74q8RocQuLjNiL3kb2LePT5UvF7YbYHV0Z6RUDtSBroQaaBKSvk
1XWRpkPNe8zQrLa1HTM/vJyD9f3HejHowVYvqcxf/AKMCz20iUxqrQgF7yPMCkk8
ZPSHZYn0sHCwR2pcq1WjaPf06zrbl3PmKIdg0hP+OSvdL70MwhrAfh4uwC2ZNSNO
Q2UGGqDejEH9HsRfdj9XoPRrTCn/+7hG37MFOJP9D5kENIQ94gpagBw9G1n+pQkY
OKfyJcgJZ53A9lgtz9UXyKerNH6Eth6HYzDvDFCutHRx4h3LvqEpsuXv6ev09Tj/
7kS/DK2EIDhW5jpdixt+HtQuDG3j9HabZBljXYsGgRNi13uj/Y3Ta3MtS9Re6bHr
0XrW5l14x8Ogo5vcahlIAP7lpQDVIsMP8Waa7fWzI8mATCI7nk5yAIf+Wp92VBwQ
uFug7solHnkPWzYdHgm5IrVBN4k7W2BxgWbN7ASTHcjabcxiKfJzHjoHNZDWEqN3
vsFtpj0GUzoYNe2JZQmdOGNcj+K6A9UY8Ii29cZkY/oKcbY1noVhJ2Bnf6R3T3/Z
NYAJa5jUKZBigtX8aIO7ZvWXo9Sc17Gn9N6iUgtjWJ2UNT27UiY=
=GX2q
-----END PGP SIGNATURE-----
Merge tag 'linux_kselftest-kunit-fixes-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit fixes from Shuah Khan:
"One urgent fix for --alltests build failure related to renaming of
CONFIG_DAMON_DBGFS to DAMON_DBGFS_DEPRECATED to the missing config
option"
* tag 'linux_kselftest-kunit-fixes-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: configs: Enable CONFIG_DAMON_DBGFS_DEPRECATED for --alltests
The config fragment enlists all the config options needed for the test.
This config is merged into the kernel's config on which this test is
run.
Fixed whitespace errors during commit:
Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The seccomp benchmark runs five scenarios, one calibration run with no
seccomp filters enabled then four further runs each adding a filter. The
calibration run times itself for 15s and then each additional run executes
for the same number of times.
Currently the seccomp tests, including the benchmark, run with an extended
120s timeout but this is not sufficient to robustly run the tests on a lot
of platforms. Sample timings from some recent runs:
Platform Run 1 Run 2 Run 3 Run 4
--------- ----- ----- ----- -----
PowerEdge R200 16.6s 16.6s 31.6s 37.4s
BBB (arm) 20.4s 20.4s 54.5s
Synquacer (arm64) 20.7s 23.7s 40.3s
The x86 runs from the PowerEdge are quite marginal and routinely fail, for
the successful run reported here the timed portions of the run are at
117.2s leaving less than 3s of margin which is frequently breached. The
added overhead of adding filters on the other platforms is such that there
is no prospect of their runs fitting into the 120s timeout, especially
on 32 bit arm where there is no BPF JIT.
While we could lower the time we calibrate for I'm also already seeing the
currently completing runs reporting issues with the per filter overheads
not matching expectations:
Let's instead raise the timeout to 180s which is only a 50% increase on the
current timeout which is itself not *too* large given that there's only two
tests in this suite.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The event filter function test has been failing in our internal test
farm:
| # not ok 33 event filter function - test event filtering on functions
Running the test in verbose mode indicates that this is because the test
erroneously determines that kmem_cache_free() is the most common caller
of kmem_cache_free():
# # + cut -d: -f3 trace
# # + sed s/call_site=([^+]*)+0x.*/1/
# # + sort
# # + uniq -c
# # + sort
# # + tail -n 1
# # + sed s/^[ 0-9]*//
# # + target_func=kmem_cache_free
... and as kmem_cache_free() doesn't call itself, setting this as the
filter function for kmem_cache_free() results in no hits, and
consequently the test fails:
# # + grep kmem_cache_free trace
# # + grep kmem_cache_free
# # + wc -l
# # + hitcnt=0
# # + grep kmem_cache_free trace
# # + grep -v kmem_cache_free
# # + wc -l
# # + misscnt=0
# # + [ 0 -eq 0 ]
# # + exit_fail
This seems to be because the system in question has tasks with ':' in
their name (which a number of kernel worker threads have). These show up
in the trace, e.g.
test:.sh-1299 [004] ..... 2886.040608: kmem_cache_free: call_site=putname+0xa4/0xc8 ptr=000000000f4d22f4 name=names_cache
... and so when we try to extact the call_site with:
cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/'
... the 'cut' command will extrace the column containing
'kmem_cache_free' rather than the column containing 'call_site=...', and
the 'sed' command will leave this unchanged. Consequently, the test will
decide to use 'kmem_cache_free' as the filter function, resulting in the
failure seen above.
Fix this by matching the 'call_site=<func>' part specifically to extract
the function name.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: linux-trace-kernel@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The original version of the mitigation would patch in the calls to the
untraining routines directly. That is, the alternative() in UNTRAIN_RET
will patch in the CALL to srso_alias_untrain_ret() directly.
However, even if commit e7c25c441e9e ("x86/cpu: Cleanup the untrain
mess") meant well in trying to clean up the situation, due to micro-
architectural reasons, the untraining routine srso_alias_untrain_ret()
must be the target of a CALL instruction and not of a JMP instruction as
it is done now.
Reshuffle the alternative macros to accomplish that.
Fixes: e7c25c441e9e ("x86/cpu: Cleanup the untrain mess")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Revert thermal core optimization that introduced a functional issue
causing a critical trip point to be crossed in some cases (Daniel
Lezcano).
- Add missing conversion between different state ranges to the
devfreq cooling device driver (Ye Zhang).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmYHBtMSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRx9PUP/3Fj6pk2ibvesE9ixcDcLcP0E614RsB9
nGtS5819Kzw++4rqbHCq9USqxumtp28b/B+zZim39AmOKdKvtX3cz9XXZQpq95Va
1H8I0unAQwHmviPG9ZLBOLm6zdGdbbsFFUYctH0dhHt9epMNTAwFUSf4CUek0gxA
V/73KDCIqjR1boapaPyIeGD25TvlSipvQBzHdu476SK4DA0rUvmyl1XLVz1sg3UJ
Xvwig9za4TlemuTn+1g7192OIWcG67locfMlEjS+18rkbABARUWdnDbm8zH5N2PF
geetn9ukbFkFmShUCGR6h40lHpkcxI/8WEht7nNVn2AGImr1eIbcPoFjYlcf/6/a
ttAT/qof/d4/uy0clYObMXrFRkCExsI9U3+rZ/TnN0fGmr90mSjAAwSyPdDVNYSg
ksZlLojvrvCkdWYjtewr0ONbCJmADT3LnD+VsXQq0hUlMWXQndGLXHn5RzzCHIZa
oEdlnCYQjnvdZNk2rdMgxaAxGg1vIKWl5TccUl9tWDJz9m1qh8d+6lL75AnsiYjR
zuwD2MIO/WXpEFbEZeSv0DNJY2djIrYXakZjLnf+OlzoykZOqJ9vAXk2r4hxAqd0
yk9DmLWd9QBWLQTsiGiWp95bmI9qDYs1lC5MIEM7xKiJzjth1lwzxB7SkbaR/UyB
HrOKLgQloOoa
=8O09
-----END PGP SIGNATURE-----
Merge tag 'thermal-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"These revert a problematic optimization commit and address a devfreq
cooling device issue.
Specifics:
- Revert thermal core optimization that introduced a functional issue
causing a critical trip point to be crossed in some cases (Daniel
Lezcano)
- Add missing conversion between different state ranges to the
devfreq cooling device driver (Ye Zhang)"
* tag 'thermal-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: devfreq_cooling: Fix perf state when calculate dfc res_util
Revert "thermal: core: Don't update trip points inside the hysteresis range"