IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmRCvcIQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpk+JEACj01t7Xen2+Razagu3aTx9tmRGFnTNR3MY
raFG6B1TADk1TgCWWa2C4Dj67SOispPLm8hbIcOxqB1UscDWCCwjmnr/debADFzW
Ap6shv/IRwVGmDp+F7ocYas0ynwooOJg4WJTwkSKz2o4m4p3vzlwAKi4fLiSjbXp
gJTrA7WEvDOVjzajlTFUtjr8rc6PdunbGm25cPIufAxUEhvttYex2VbVqjDmfNsE
8tyyk9RWbe4AY/ZYaGXVn4yQ/CgL/sXFkVc5noRXNfAQ/K3CVLQrFLJ3JlwUHpiA
xXBor21TUWCZEo33Y2G5NConAYqE7etoPTkaTDO3/aZ+dAMFyhC/WAYLz1KZGMh1
+g1fDX1QKEd40H2lfDXvqF1ob7Ut8EzUx+gvBXcc3/AiRpJ5rjfOcj6LPUMUqQJk
nucLLFTiMKecnDMBERbvixqbaTyrjvkFEj2wYJvgj1LKXAd+x/bj8SGajs9r88Nb
9YT9ai/+Yl7Ppfb67rCgXJU7oNZQSAQ2H+X/l2jbiqImOgq1u/45AmINnbanS7HH
Y1I8pbH45AcnCgkJRoQwrNX3BnTOTBJ+D/4Fl4b8jsihq0D3UtwCwPCObHP4LW9S
MUNPhP3tUuYsAgXqX80+Sao6SYvXDwnbWOM+LOaaZXgjb1ndwDUZXpto8Ra8WB1u
8kM6s6ZR7g==
=W1Zb
-----END PGP SIGNATURE-----
Merge tag 'for-6.4/block-2023-04-21' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:
- drbd patches, bringing us closer to unifying the out-of-tree version
and the in tree one (Andreas, Christoph)
- support for auto-quiesce for the s390 dasd driver (Stefan)
- MD pull request via Song:
- md/bitmap: Optimal last page size (Jon Derrick)
- Various raid10 fixes (Yu Kuai, Li Nan)
- md: add error_handlers for raid0 and linear (Mariusz Tkaczyk)
- NVMe pull request via Christoph:
- Drop redundant pci_enable_pcie_error_reporting (Bjorn Helgaas)
- Validate nvmet module parameters (Chaitanya Kulkarni)
- Fence TCP socket on receive error (Chris Leech)
- Fix async event trace event (Keith Busch)
- Minor cleanups (Chaitanya Kulkarni, zhenwei pi)
- Fix and cleanup nvmet Identify handling (Damien Le Moal,
Christoph Hellwig)
- Fix double blk_mq_complete_request race in the timeout handler
(Lei Yin)
- Fix irq locking in nvme-fcloop (Ming Lei)
- Remove queue mapping helper for rdma devices (Sagi Grimberg)
- use structured request attribute checks for nbd (Jakub)
- fix blk-crypto race conditions between keyslot management (Eric)
- add sed-opal support for reading read locking range attributes
(Ondrej)
- make fault injection configurable for null_blk (Akinobu)
- clean up the request insertion API (Christoph)
- clean up the queue running API (Christoph)
- blkg config helper cleanups (Tejun)
- lazy init support for blk-iolatency (Tejun)
- various fixes and tweaks to ublk (Ming)
- remove hybrid polling. It hasn't really been useful since we got
async polled IO support, and these days we don't support sync polled
IO at all (Keith)
- misc fixes, cleanups, improvements (Zhong, Ondrej, Colin, Chengming,
Chaitanya, me)
* tag 'for-6.4/block-2023-04-21' of git://git.kernel.dk/linux: (118 commits)
nbd: fix incomplete validation of ioctl arg
ublk: don't return 0 in case of any failure
sed-opal: geometry feature reporting command
null_blk: Always check queue mode setting from configfs
block: ublk: switch to ioctl command encoding
blk-mq: fix the blk_mq_add_to_requeue_list call in blk_kick_flush
block, bfq: Fix division by zero error on zero wsum
fault-inject: fix build error when FAULT_INJECTION_CONFIGFS=y and CONFIGFS_FS=m
block: store bdev->bd_disk->fops->submit_bio state in bdev
block: re-arrange the struct block_device fields for better layout
md/raid5: remove unused working_disks variable
md/raid10: don't call bio_start_io_acct twice for bio which experienced read error
md/raid10: fix memleak of md thread
md/raid10: fix memleak for 'conf->bio_split'
md/raid10: fix leak of 'r10bio->remaining' for recovery
md/raid10: don't BUG_ON() in raise_barrier()
md: fix soft lockup in status_resync
md: add error_handlers for raid0 and linear
md: Use optimal I/O size for last bitmap page
md: Fix types in sb writer
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmRHC3gACgkQxWXV+ddt
WDvI/A//ZzREEE0wNexbuidoTacDVXVJ6LBb2K1eP+HUKfsmd6GYWQDJ9x/ExpKb
T1ehLibCYWLeYxEREFbjXI3x9G8mrvLzvzsqXs/MzJPkmEF1igPddFztidBwvLQH
ey/Bh+cra2bpVhRhkX0Cf09/q/YWp17/d14ZxxW60PMfyhx8RWXejXhHkulOPVv8
+3FL8E0kc2Zjx9ioUwOy/i18LR6YzsCNVXoHzUZuWyWM4A7NG2TZR6FhuLSjlWSZ
3RAnROwr+8i5nR0xchcyYaVMO2LMbqH6mBtHnXCtxCr+4pFrfrvKym+CQco/Xriz
v1y/xDc23XeYXLCVhb0beJ6uRcjaM9+gvDF1oVBSJEv6V7sQr/tEGo/8QRehfEfT
FTro7Lf89R1GOa1IBSkv/T5S25d9LlIID3/g7PbcUBtXNKvLAjDAGTH9bzL4HS5x
/MKwN80GvaGs1KyEfUndbVPIpAwNFDYZPHM7nw1x+JTkIBcHgfjRyAMAC9jrJd0D
730W04c+0nXZtQGtKKsxc3U8y4ewzSJAKx9t7Vgo7+1P6dSRnzvJee3x/5kXV9Yn
MhxxzYDfIN9EcWbASdSm11gY5WZdG3an609pO7nc1T2K4Tuo0SPs4xOR7c3xuZrY
MN5z3QFWyI2ustUuTG+nsd5J81j76DEmj5ymWQfG3SBplTneDM0=
=Jt7p
-----END PGP SIGNATURE-----
Merge tag 'for-6.4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"Mostly core changes and cleanups, some notable fixes and two
performance improvements in directory logging.
The IO path cleanups are removing or refactoring old code, scrub main
loop has been completely rewritten also refactoring old code.
There are some changes to non-btrfs code, mostly trivial, the cgroup
punt bio logic is only moved from generic code.
Performance improvements:
- improve logging changes in a directory during one transaction,
avoid iterating over items and reduce lock contention (fsync time
4x lower)
- when logging directory entries during one transaction, reduce
locking of subvolume trees by checking tree-log instead
(improvement in throughput and latency for concurrent access to a
subvolume)
Notable fixes:
- dev-replace:
- properly honor read mode when requested to avoid reading from
source device
- target device won't be used for eventual read repair, this is
unreliable for NODATASUM files
- when there are unpaired (and unrepairable) metadata during
replace, exit early with error and don't try to finish whole
operation
- scrub ioctl properly rejects unknown flags
- fix global block reserve calculations
- fix partial direct io write when there's a page fault in the
middle, iomap will try to continue with partial request but the
btrfs part did not match that, this can lead to zeros written
instead of data
Core changes:
- io path:
- continued cleanups and refactoring around bio handling
- extent io submit path simplifications and cleanups
- flush write path simplifications and cleanups
- rework logic of passing sync mode of bio, with further cleanups
- rewrite scrub code flow, restructure how the stripes are enumerated
and verified in a more unified way
- allow to set lower threshold for block group reclaim in debug mode
to aid zoned mode testing
- remove obsolete time-based delayed ref throttling logic when
truncating items
- DREW locks are not using percpu variables anymore
- more warning fixes (-Wmaybe-uninitialized)
- u64 division simplifications
- error handling improvements
Non-btrfs code changes:
- push cgroup punt bio logic to btrfs code (there was no other user
of that), the functionality can be now selected separately by
BLK_CGROUP_PUNT_BIO
- crc32c_impl removed after removing last uses in btrfs code
- add btrfs_assertfail() to objtool table"
* tag 'for-6.4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (147 commits)
btrfs: mark btrfs_assertfail() __noreturn
btrfs: fix uninitialized variable warnings
btrfs: use log root when iterating over index keys when logging directory
btrfs: avoid iterating over all indexes when logging directory
btrfs: dev-replace: error out if we have unrepaired metadata error during
btrfs: remove pointless loop at btrfs_get_next_valid_item()
btrfs: scrub: reject unsupported scrub flags
btrfs: reinterpret async discard iops_limit=0 as no delay
btrfs: set default discard iops_limit to 1000
btrfs: remove unused raid56 functions which were dedicated for scrub
btrfs: scrub: remove scrub_bio structure
btrfs: scrub: remove scrub_block and scrub_sector structures
btrfs: scrub: remove the old scrub recheck code
btrfs: scrub: remove the old writeback infrastructure
btrfs: scrub: remove scrub_parity structure
btrfs: scrub: use scrub_stripe to implement RAID56 P/Q scrub
btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure
btrfs: scrub: introduce helper to queue a stripe for scrub
btrfs: scrub: introduce error reporting functionality for scrub_stripe
btrfs: scrub: introduce a writeback helper for scrub_stripe
...
API:
- Total usage stats now include all that returned error (instead of some).
- Remove maximum hash statesize limit.
- Add cloning support for hmac and unkeyed hashes.
- Demote BUG_ON in crypto_unregister_alg to a WARN_ON.
Algorithms:
- Use RIP-relative addressing on x86 to prepare for PIE build.
- Add accelerated AES/GCM stitched implementation on powerpc P10.
- Add some test vectors for cmac(camellia).
- Remove failure case where jent is unavailable outside of FIPS mode in drbg.
- Add permanent and intermittent health error checks in jitter RNG.
Drivers:
- Add support for 402xx devices in qat.
- Add support for HiSTB TRNG.
- Fix hash concurrency issues in stm32.
- Add OP-TEE firmware support in caam.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmRGCjcACgkQxycdCkmx
i6d6JA//ZmwgEqAKA8qWpHnNKZylTLqFhLxnKZwr4Hhp1KzManh/T9pepXiD2zAY
D92wU60v0hfGAazeUWQRmrIZxcjyd3b3Tr7WiFuNoZbkPsuXWZAoz8iHgMq69dqb
DXZhKJnlmVlcr+qTSk9MP8HODL5kU6Ug2pk+r8hL/WsBI+JGfZEXKcJhhMqYLYls
nl+NN4fkE5tgcTh2lp/9dQsQRylhESZuqb8L2wItQmripSbhPGwYf24I7B7xcGrn
o7X4XG//cQO6zQErgnOJOosIgJEEynW27CN4ZiHB8WhRAk0YLXydQBs6EjZgNA8H
EvZC/bIx2YOt8ngG99q4kRg4OgKp4c7UnV6l1pxuJWbIyXrFh4djxHdq9pTYr3UB
P3pVEX38Wu7U5Tfgy3y1QqZzsvrPjmnI3NQ8QBrcFzNRDan5K6nH4kQyk9Cv7LQm
GlE1JOThU5U2G33ZWKCluJUjVUCRceMWQYla1X5R4uWMCwSqRMpmx8Ib9QvbYlWe
iUI+RatLnlIobx+lgaC8mtij9dQddFjk6YwFYhQcD3Bl30DhTeIlbnOUY9YOTXps
H6V9X2inVUjyZr1uJ4a7rPdCUuzQxR6HWPyp6fXMlbLrEhL8e6c4/QbEoTubRQeS
WTtoIFt4ezd2SG6hI6dTCscgFc5EAyEMDD5GtQmJeyozu0Gqtpo=
=ITkW
-----END PGP SIGNATURE-----
Merge tag 'v6.4-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Total usage stats now include all that returned errors (instead of
just some)
- Remove maximum hash statesize limit
- Add cloning support for hmac and unkeyed hashes
- Demote BUG_ON in crypto_unregister_alg to a WARN_ON
Algorithms:
- Use RIP-relative addressing on x86 to prepare for PIE build
- Add accelerated AES/GCM stitched implementation on powerpc P10
- Add some test vectors for cmac(camellia)
- Remove failure case where jent is unavailable outside of FIPS mode
in drbg
- Add permanent and intermittent health error checks in jitter RNG
Drivers:
- Add support for 402xx devices in qat
- Add support for HiSTB TRNG
- Fix hash concurrency issues in stm32
- Add OP-TEE firmware support in caam"
* tag 'v6.4-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (139 commits)
i2c: designware: Add doorbell support for Mendocino
i2c: designware: Use PCI PSP driver for communication
powerpc: Move Power10 feature PPC_MODULE_FEATURE_P10
crypto: p10-aes-gcm - Remove POWER10_CPU dependency
crypto: testmgr - Add some test vectors for cmac(camellia)
crypto: cryptd - Add support for cloning hashes
crypto: cryptd - Convert hash to use modern init_tfm/exit_tfm
crypto: hmac - Add support for cloning
crypto: hash - Add crypto_clone_ahash/shash
crypto: api - Add crypto_clone_tfm
crypto: api - Add crypto_tfm_get
crypto: x86/sha - Use local .L symbols for code
crypto: x86/crc32 - Use local .L symbols for code
crypto: x86/aesni - Use local .L symbols for code
crypto: x86/sha256 - Use RIP-relative addressing
crypto: x86/ghash - Use RIP-relative addressing
crypto: x86/des3 - Use RIP-relative addressing
crypto: x86/crc32c - Use RIP-relative addressing
crypto: x86/cast6 - Use RIP-relative addressing
crypto: x86/cast5 - Use RIP-relative addressing
...
Sometimes we use seq_buf to format a string buffer, which
we then pass to printk(). However, in certain situations
the seq_buf string buffer can get too big, exceeding the
PRINTKRB_RECORD_MAX bytes limit, and causing printk() to
truncate the string.
Add a new seq_buf helper. This helper prints the seq_buf
string buffer line by line, using \n as a delimiter,
rather than passing the whole string buffer to printk()
at once.
Link: https://lkml.kernel.org/r/20230415100110.1419872-1-senozhatsky@chromium.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmRGaygACgkQUqAMR0iA
lPKlGBAAqn0yS8E2CP16Oo8nCB5AjoPVzohh6pQ6O8G0CFhvu47EKVTHPTa1BEFE
YAz94geN5crpAmEcQyBcqkcJuLRXmYBOqE1x9M4PcCUUXTjcyYEzBYsOZO+j5jB7
LUPX6jBbm2PpbT/e1ZSr90R8MhblVfBTD7DJHmXGhibYHj5D4KOwxQnhx8uWz9aT
dgTWm1AgwEX85wUpXil5phD+YnvI/TxGlyV4AVOYh3y3K7Kc4CAeHFzCsg3h/Amr
c2RR1dzvmMcEvg8lF3U9MsnVNF/2i0Tg9BXLRxSe1c20CKhtzNNPH5krPa3vHGeP
P//FWDAd9S2hev54TN7LO92V+IsDh8nlU++HwRua50wflzJU/tkyWDtcmmlkGU6A
hqtMUWE4libAaAW7FBJomRFirmEtEA4GwXN5WH3+B6htgVwKKrKhL9U/PtQtZxZ1
GUEvtjmnBIfGndu7fHv70a1sLc9LuebOfmOQs3W6p6KUZkmL1Hqg1WGQoYwmUz4A
bZRbCwMYNJCG4iO2jDmPU27D6tWMbQdt1kZ20svP6p3PRGy8EuI1C5tnO5Jhkw3E
FCFudMMZEuZmBoztWWqEkZSfbMDlH6kc1+6+HMuCfSrpg6QD87TzO5CONIHCZyk9
f3UD04R//BubTdiKQ4y/g6OwctihX7F8i3O71hTj5etuYqPs0nI=
=t0d6
-----END PGP SIGNATURE-----
Merge tag 'printk-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Code cleanup and dead code removal
* tag 'printk-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: Remove obsoleted check for non-existent "user" object
lib/vsprintf: Use isodigit() for the octal number check
Remove orphaned CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT
These are various cleanups, fixing a number of uapi header files to no
longer reference CONFIG_* symbols, and one patch that introduces the
new CONFIG_HAS_IOPORT symbol for architectures that provide working
inb()/outb() macros, as a preparation for adding driver dependencies
on those in the following release.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmRG8IkACgkQYKtH/8kJ
Uid15Q/9E/neIIEqEk6IvtyhUicrJiIZUM0rGoYtWXiz75ggk6Kx9+3I+j8zIQ/E
kf2TzAG7q9Md7nfTDFLr4FSr0IcNDj+VG4nYxUyDHdKGcARO+g9Kpdvscxip3lgU
Rw5w74Gyd30u4iUKGS39OYuxcCgl9LaFjMA9Gh402Oiaoh+OYLmgQS9h/goUD5KN
Nd+AoFvkdbnHl0/SpxthLRyL5rFEATBmAY7apYViPyMvfjS3gfDJwXJR9jkKgi6X
Qs4t8Op8BA3h84dCuo6VcFqgAJs2Wiq3nyTSUnkF8NxJ2RFTpeiVgfsLOzXHeDgz
SKDB4Lp14o3mlyZyj00MWq1uMJRRetUgNiVb6iHOoKQ/E4demBdh+mhIFRybjM5B
XNTWFcg9PWFCMa4W9jnLfZBc881X4+7T+qUF8I0W/1AbRJUmyGj8HO6jLceC4yGD
UYLn5oFPM6OWXHp6DqJrCr9Yw8h6fuviQZFEbl/ARlgVGt+J4KbYweJYk8DzfX6t
PZIj8LskOqyIpRuC2oDA1PHxkaJ1/z+N5oRBHq1uicSh4fxY5HW7HnyzgF08+R3k
cf+fjAhC3TfGusHkBwQKQJvpxrxZjPuvYXDZ0GxTvNKJRB8eMeiTm1n41E5oTVwQ
swSblSCjZj/fMVVPXLcjxEW4SBNWRxa9Lz3tIPXb3RheU10Lfy8=
=H3k4
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"These are various cleanups, fixing a number of uapi header files to no
longer reference CONFIG_* symbols, and one patch that introduces the
new CONFIG_HAS_IOPORT symbol for architectures that provide working
inb()/outb() macros, as a preparation for adding driver dependencies
on those in the following release"
* tag 'asm-generic-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
Kconfig: introduce HAS_IOPORT option and select it as necessary
scripts: Update the CONFIG_* ignore list in headers_install.sh
pktcdvd: Remove CONFIG_CDROM_PKTCDVD_WCACHE from uapi header
Move bp_type_idx to include/linux/hw_breakpoint.h
Move ep_take_care_of_epollwakeup() to fs/eventpoll.c
Move COMPAT_ATM_ADDPARTY to net/atm/svc.c
- Improve the VDSO build time checks to cover all dynamic relocations
VDSO does not allow dynamic relcations, but the build time check is
incomplete and fragile.
It's based on architectures specifying the relocation types to search
for and does not handle R_*_NONE relocation entries correctly.
R_*_NONE relocations are injected by some GNU ld variants if they fail
to determine the exact .rel[a]/dyn_size to cover trailing zeros.
R_*_NONE relocations must be ignored by dynamic loaders, so they
should be ignored in the build time check too.
Remove the architecture specific relocation types to check for and
validate strictly that no other relocations than R_*_NONE end up
in the VSDO .so file.
- Prefer signal delivery to the current thread for
CLOCK_PROCESS_CPUTIME_ID based posix-timers
Such timers prefer to deliver the signal to the main thread of a
process even if the context in which the timer expires is the current
task. This has the downside that it might wake up an idle thread.
As there is no requirement or guarantee that the signal has to be
delivered to the main thread, avoid this by preferring the current
task if it is part of the thread group which shares sighand.
This not only avoids waking idle threads, it also distributes the
signal delivery in case of multiple timers firing in the context
of different threads close to each other better.
- Align the tick period properly (again)
For a long time the tick was starting at CLOCK_MONOTONIC zero, which
allowed users space applications to either align with the tick or to
place a periodic computation so that it does not interfere with the
tick. The alignement of the tick period was more by chance than by
intention as the tick is set up before a high resolution clocksource is
installed, i.e. timekeeping is still tick based and the tick period
advances from there.
The early enablement of sched_clock() broke this alignement as the time
accumulated by sched_clock() is taken into account when timekeeping is
initialized. So the base value now(CLOCK_MONOTONIC) is not longer a
multiple of tick periods, which breaks applications which relied on
that behaviour.
Cure this by aligning the tick starting point to the next multiple of
tick periods, i.e 1000ms/CONFIG_HZ.
- A set of NOHZ fixes and enhancements
- Cure the concurrent writer race for idle and IO sleeptime statistics
The statitic values which are exposed via /proc/stat are updated from
the CPU local idle exit and remotely by cpufreq, but that happens
without any form of serialization. As a consequence sleeptimes can be
accounted twice or worse.
Prevent this by restricting the accumulation writeback to the CPU
local idle exit and let the remote access compute the accumulated
value.
- Protect idle/iowait sleep time with a sequence count
Reading idle/iowait sleep time, e.g. from /proc/stat, can race with
idle exit updates. As a consequence the readout may result in random
and potentially going backwards values.
Protect this by a sequence count, which fixes the idle time
statistics issue, but cannot fix the iowait time problem because
iowait time accounting races with remote wake ups decrementing the
remote runqueues nr_iowait counter. The latter is impossible to fix,
so the only way to deal with that is to document it properly and to
remove the assertion in the selftest which triggers occasionally due
to that.
- Restructure struct tick_sched for better cache layout
- Some small cleanups and a better cache layout for struct tick_sched
- Implement the missing timer_wait_running() callback for POSIX CPU timers
For unknown reason the introduction of the timer_wait_running() callback
missed to fixup posix CPU timers, which went unnoticed for almost four
years.
While initially only targeted to prevent livelocks between a timer
deletion and the timer expiry function on PREEMPT_RT enabled kernels, it
turned out that fixing this for mainline is not as trivial as just
implementing a stub similar to the hrtimer/timer callbacks.
The reason is that for CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled systems
there is a livelock issue independent of RT.
CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y moves the expiry of POSIX CPU timers
out from hard interrupt context to task work, which is handled before
returning to user space or to a VM. The expiry mechanism moves the
expired timers to a stack local list head with sighand lock held. Once
sighand is dropped the task can be preempted and a task which wants to
delete a timer will spin-wait until the expiry task is scheduled back
in. In the worst case this will end up in a livelock when the preempting
task and the expiry task are pinned on the same CPU.
The timer wheel has a timer_wait_running() mechanism for RT, which uses
a per CPU timer-base expiry lock which is held by the expiry code and the
task waiting for the timer function to complete blocks on that lock.
This does not work in the same way for posix CPU timers as there is no
timer base and expiry for process wide timers can run on any task
belonging to that process, but the concept of waiting on an expiry lock
can be used too in a slightly different way.
Add a per task mutex to struct posix_cputimers_work, let the expiry task
hold it accross the expiry function and let the deleting task which
waits for the expiry to complete block on the mutex.
In the non-contended case this results in an extra mutex_lock()/unlock()
pair on both sides.
This avoids spin-waiting on a task which is scheduled out, prevents the
livelock and cures the problem for RT and !RT systems.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRGrj4THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoZhdEAC/lwfDWCnTXHC8ExQQRDIVNyXmDlLb
EHB8ZY7Wc4gNZ8UEXEOLOXJHMG9bsbtPGctVewJwRGnXZWKVhpPwQba6kCRycyX0
0J6l5DlvUaGGrpoOzOZwgETRmtIZE9tEArZR8xlfRScYd93a7yLhwIjO8JaV9vKs
IQpAQMeJ/ysp6gHrS59qakYfoHU/ERUAu3Tk4GqHUtPtcyz3nX3eTlLWV8LySqs+
00qr2yc0bQFUFoKzTCxtM8lcEi9ja9SOj1rw28348O+BXE4d0HC12Ie7eU/CDN2Y
OAlWYxVjy4LMh24LDrRQKTzoVqx9MXDx2g+09B3t8NK5LgeS+EJIjujDhZF147/H
5y906nplZUKa8BiZW5Rpm/HKH8tFI80T9XWSQCRBeMgTEJyRyRU1yASAwO4xw+dY
Dn3tGmFGymcV/72o4ic9JFKQd8cTSxPjEJS3qqzMkEAtyI/zPBmKxj/Tce50OH40
6FSZq1uU21ZQzszwSHISwgFtNr75laUSK4Z1te5OhPOOz+C7O9YqHvqS/1jwhPj2
tMd8X17fRW3UTUBlBj+zqxqiEGBl/Yk2AvKrJIXGUtfWYCtjMJ7ieCf0kZ7NSVJx
9ewubA0gqseMD783YomZsy8LLtMKnhclJeslUOVb1oKs1q/WF1R/k6qjy9vUwYaB
nIJuHl8mxSetag==
=SVnj
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timers and timekeeping updates from Thomas Gleixner:
- Improve the VDSO build time checks to cover all dynamic relocations
VDSO does not allow dynamic relocations, but the build time check is
incomplete and fragile.
It's based on architectures specifying the relocation types to search
for and does not handle R_*_NONE relocation entries correctly.
R_*_NONE relocations are injected by some GNU ld variants if they
fail to determine the exact .rel[a]/dyn_size to cover trailing zeros.
R_*_NONE relocations must be ignored by dynamic loaders, so they
should be ignored in the build time check too.
Remove the architecture specific relocation types to check for and
validate strictly that no other relocations than R_*_NONE end up in
the VSDO .so file.
- Prefer signal delivery to the current thread for
CLOCK_PROCESS_CPUTIME_ID based posix-timers
Such timers prefer to deliver the signal to the main thread of a
process even if the context in which the timer expires is the current
task. This has the downside that it might wake up an idle thread.
As there is no requirement or guarantee that the signal has to be
delivered to the main thread, avoid this by preferring the current
task if it is part of the thread group which shares sighand.
This not only avoids waking idle threads, it also distributes the
signal delivery in case of multiple timers firing in the context of
different threads close to each other better.
- Align the tick period properly (again)
For a long time the tick was starting at CLOCK_MONOTONIC zero, which
allowed users space applications to either align with the tick or to
place a periodic computation so that it does not interfere with the
tick. The alignement of the tick period was more by chance than by
intention as the tick is set up before a high resolution clocksource
is installed, i.e. timekeeping is still tick based and the tick
period advances from there.
The early enablement of sched_clock() broke this alignement as the
time accumulated by sched_clock() is taken into account when
timekeeping is initialized. So the base value now(CLOCK_MONOTONIC) is
not longer a multiple of tick periods, which breaks applications
which relied on that behaviour.
Cure this by aligning the tick starting point to the next multiple of
tick periods, i.e 1000ms/CONFIG_HZ.
- A set of NOHZ fixes and enhancements:
* Cure the concurrent writer race for idle and IO sleeptime
statistics
The statitic values which are exposed via /proc/stat are updated
from the CPU local idle exit and remotely by cpufreq, but that
happens without any form of serialization. As a consequence
sleeptimes can be accounted twice or worse.
Prevent this by restricting the accumulation writeback to the CPU
local idle exit and let the remote access compute the accumulated
value.
* Protect idle/iowait sleep time with a sequence count
Reading idle/iowait sleep time, e.g. from /proc/stat, can race
with idle exit updates. As a consequence the readout may result
in random and potentially going backwards values.
Protect this by a sequence count, which fixes the idle time
statistics issue, but cannot fix the iowait time problem because
iowait time accounting races with remote wake ups decrementing
the remote runqueues nr_iowait counter. The latter is impossible
to fix, so the only way to deal with that is to document it
properly and to remove the assertion in the selftest which
triggers occasionally due to that.
* Restructure struct tick_sched for better cache layout
* Some small cleanups and a better cache layout for struct
tick_sched
- Implement the missing timer_wait_running() callback for POSIX CPU
timers
For unknown reason the introduction of the timer_wait_running()
callback missed to fixup posix CPU timers, which went unnoticed for
almost four years.
While initially only targeted to prevent livelocks between a timer
deletion and the timer expiry function on PREEMPT_RT enabled kernels,
it turned out that fixing this for mainline is not as trivial as just
implementing a stub similar to the hrtimer/timer callbacks.
The reason is that for CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled
systems there is a livelock issue independent of RT.
CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y moves the expiry of POSIX CPU
timers out from hard interrupt context to task work, which is handled
before returning to user space or to a VM. The expiry mechanism moves
the expired timers to a stack local list head with sighand lock held.
Once sighand is dropped the task can be preempted and a task which
wants to delete a timer will spin-wait until the expiry task is
scheduled back in. In the worst case this will end up in a livelock
when the preempting task and the expiry task are pinned on the same
CPU.
The timer wheel has a timer_wait_running() mechanism for RT, which
uses a per CPU timer-base expiry lock which is held by the expiry
code and the task waiting for the timer function to complete blocks
on that lock.
This does not work in the same way for posix CPU timers as there is
no timer base and expiry for process wide timers can run on any task
belonging to that process, but the concept of waiting on an expiry
lock can be used too in a slightly different way.
Add a per task mutex to struct posix_cputimers_work, let the expiry
task hold it accross the expiry function and let the deleting task
which waits for the expiry to complete block on the mutex.
In the non-contended case this results in an extra
mutex_lock()/unlock() pair on both sides.
This avoids spin-waiting on a task which is scheduled out, prevents
the livelock and cures the problem for RT and !RT systems
* tag 'timers-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-cpu-timers: Implement the missing timer_wait_running callback
selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime monotonicity
selftests/proc: Remove idle time monotonicity assertions
MAINTAINERS: Remove stale email address
timers/nohz: Remove middle-function __tick_nohz_idle_stop_tick()
timers/nohz: Add a comment about broken iowait counter update race
timers/nohz: Protect idle/iowait sleep time under seqcount
timers/nohz: Only ever update sleeptime from idle exit
timers/nohz: Restructure and reshuffle struct tick_sched
tick/common: Align tick period with the HZ tick.
selftests/timers/posix_timers: Test delivery of signals across threads
posix-timers: Prefer delivery of signals to the current thread
vdso: Improve cmd_vdso_check to check all dynamic relocations
Prevent a race vs. statically initialized objects. Such objects are
usually not initialized via an init() function. They are special cased
and detected on first use under the assumption that they are already
correctly initialized via the static initializer.
This works correctly unless there are two concurrent debug object
operations on such an object.
The first one detects that the object is not yet tracked and tries to
establish a tracking object after dropping the debug objects hash bucket
lock. The concurrent operation does the same. The one which wins the
race ends up modifying the state of the object which makes the other one
fail resulting in a bogus debug objects warning.
Prevent this by making the detection of a static object and the
allocation of a tracking object atomic under the hash bucket lock. So the
first one to acquire the hash bucket lock will succeed and the second one
will observe the correct tracking state.
This race existed forever but was only exposed when the timer wheel code
added a debug_object_assert_init() call outside of the timer base locked
region. This replaced the previous warning about timer::function being
NULL which had to be removed when the timer_shutdown() mechanics were
added.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRGfx8THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYod4YD/98pgjxl9zht0tJpjOQv1GHQeKWGOnS
T2NcK7UBF7IZnGVCaQovs1TLPEiHZMY9TgSmefP9UuYNCUthdzgxUv1hljb915zI
xcQmqFopUyFF+F+qE7ti1C4HvXzbdss14XK97EcsoooS1ALq5xTkUJEcmdLFRL85
/ACkHz0/iMEHT9QVX6WoAOptg7HLoscb30CEZGa8skStAIRZfMIFqmN5GXzKUsPH
oLUldSjoXyqq2ZBu9jiO9GoPmei3VuaZO3qWtN4KYY0C37BvKavgS2N/NsOh7s+0
I51G5+R8o6kQgr3RSll6frsPcy1EXsgPDZXO5tC1W9bp6+yrQ97ztdG0QS52fcPb
fcCQtAX3L+K38vf4GfvboDyf7x21leJSYE3u+HCXUlyC2Es8QZgWw4U7Bi8IwSZg
/BKC6QkQD/YyG/aQyZq6ZGiLgbJt8g53WiR8HGx35P3RUEy5Mit3bBSuq1dSuGR0
RozFlWswUif3Teticq33MR6Mv9M3866lX4iTMGT50xjJZirb8ongpKkRxIOHVeXV
4//0V/GOswyTwkY884Q6zJCZZq2FEudn6/Vtjh97zLxvJzLbdIEnEPC5HG75Jed0
a9NISg+NT9VOx4PLwgMWgW6dlT5SNUeWD4ddC879c4ELbyNd1i4AY54pMrcwEVVj
fGdL6pFfFzZI5w==
=19cg
-----END PGP SIGNATURE-----
Merge tag 'core-debugobjects-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core debugobjects update from Thomas Gleixner:
"A single update to debugobjects:
Prevent a race vs statically initialized objects. Such objects are
usually not initialized via an init() function. They are special cased
and detected on first use under the assumption that they are already
correctly initialized via the static initializer.
This works correctly unless there are two concurrent debug object
operations on such an object.
The first one detects that the object is not yet tracked and tries to
establish a tracking object after dropping the debug objects hash
bucket lock. The concurrent operation does the same. The one which
wins the race ends up modifying the state of the object which makes
the other one fail resulting in a bogus debug objects warning.
Prevent this by making the detection of a static object and the
allocation of a tracking object atomic under the hash bucket lock. So
the first one to acquire the hash bucket lock will succeed and the
second one will observe the correct tracking state.
This race existed forever but was only exposed when the timer wheel
code added a debug_object_assert_init() call outside of the timer base
locked region. This replaced the previous warning about
timer::function being NULL which had to be removed when the
timer_shutdown() mechanics were added"
* tag 'core-debugobjects-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobject: Prevent init race with static objects
This KUnit update Linux 6.4-rc1 consists of:
- several fixes to kunit tool
- new klist structure test
- support for m68k under QEMU
- support for overriding the QEMU serial port
- support for SH under QEMU
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmRFYFsACgkQCwJExA0N
Qxw+PxAA1KHnHool3QbzZouFgLgTS2N/hxsOIoWKeUl6guUPX0XYu67FEIyt7p5k
a1eFLjt+q4URW/heHKYdffP+Up6xhN5yVP8xJEcbn6GD13lz1clI9RAjObiPOehc
KOV90PeAEfzosEGRIp97g4Gzu8NUMZqN7BsKBdzYJ4rEftlcjaILBVp4OfSuCyAi
UbYBdRjK4eIOwGXuHVfhNqzH1HRSbzcoSRTywj5qW0Qhpe6KnZBRuZESXYBsxzGb
G0nd4+OttjZyplI/xQYwaU0XGAI6roG5G4nAT5YGHLp5g8rTaHetTi+i3iK4iEru
wEL0NgywkA0ujAge97RldOjtU97vvSFk7FwxdS9lxaMW/Ut2sN72I2ThI8dBvVRZ
fcw8t8mmT1gUv3SCq+s1X13vz22IedXLOfvOY2o/fLk2zxOw5e8FirAz/aFeOf3K
++hK+IQvDmeMMv08bz0ORzdRQcjdwQNQ3klnfdrUVFN9yK+iAllOJ/nrXHLNIXu4
c3ITlAMldcAf2W+LRWzvqqKyT4H8MCXL3L0bBc1M1reRu9nM89AZedO8MHCB0R9Q
2ic0rOxIwZzPJuk0qPDxEVmN7Rpyx85I96YOwRemJTEfdkB/ZX+BfOU0KzinOVHC
3qrHuIw/SyRTlUEDAr53gJ5WHbdjhKAmrd1/FuplyoOSX0w6VVA=
=COQn
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-kunit-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit updates from Shuah Khan:
- several fixes to kunit tool
- new klist structure test
- support for m68k under QEMU
- support for overriding the QEMU serial port
- support for SH under QEMU
* tag 'linux-kselftest-kunit-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: add tests for using current KUnit test field
kunit: tool: Add support for SH under QEMU
kunit: tool: Add support for overriding the QEMU serial port
.gitignore: Unignore .kunitconfig
list: test: Test the klist structure
kunit: increase KUNIT_LOG_SIZE to 2048 bytes
kunit: Use gfp in kunit_alloc_resource() kernel-doc
kunit: tool: fix pre-existing `mypy --strict` errors and update run_checks.py
kunit: tool: remove unused imports and variables
kunit: tool: add subscripts for type annotations where appropriate
kunit: fix bug of extra newline characters in debugfs logs
kunit: fix bug in the order of lines in debugfs logs
kunit: fix bug in debugfs logs of parameterized tests
kunit: tool: Add support for m68k under QEMU
o MAINTAINERS files additions and changes.
o Fix hotplug warning in nohz code.
o Tick dependency changes by Zqiang.
o Lazy-RCU shrinker fixes by Zqiang.
o rcu-tasks stall reporting improvements by Neeraj.
o Initial changes for renaming of k[v]free_rcu() to its new k[v]free_rcu_mightsleep()
name for robustness.
o Documentation Updates:
o Significant changes to srcu_struct size.
o Deadlock detection for srcu_read_lock() vs synchronize_srcu() from Boqun.
o rcutorture and rcu-related tool, which are targeted for v6.4 from Boqun's tree.
o Other misc changes.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEcoCIrlGe4gjE06JJqA4nf2o45hAFAmQuBnIACgkQqA4nf2o4
5hACVRAAoXu7/gfh5Pjw9O4E4pCdPJKsZZVYrcrVGrq6NAxRn6M1SgurAdC5grj2
96x0waoGaiO82V0H5iJMcKdAVu67x9R8WaQ1JoxN75Efn8h9W4TguB87TV1gk0xS
eZ18b/CyEaM5mNb80DFFF4FLohy5737p/kNTMqXQdUyR1BsDl16iRMgjiBiFhNUx
yPo8Y2kC2U2OTbldZgaE7s9bQO3xxEcifx93sGWsAex/gx54FYNisiwSlCOSgOE+
XkYo/OKk8Xvr82tLVX8XQVEPCMJ+rxea8T5zSs8/alvsPq7gA8wW3y6fsoa3vUU/
+Gd+W+Q/OsONIDtp8rQAY1qsD0ScDpaR8052RSH0zTa7pj8HsQgE5PjZ+cJW0SEi
cKN+Oe8+ETqKald+xZ6PDf58O212VLrru3RpQWrOQcJ7fmKmfT4REK0RcbLgg4qT
CBgOo6eg+ub4pxq2y11LZJBNTv1/S7xAEzFE0kArew64KB2gyVud0VJRZVAJnEfe
93QQVDFrwK2bhgWQZ6J6IbTvGeQW0L93IibuaU6jhZPR283VtUIIvM7vrOylN7Fq
4jsae0T7YGYfKUhgTpm7rCnm8A/D3Ni8MY0sKYYgDSyKmZUsnpI5wpx1xke4lwwV
ErrY46RCFa+k8wscc6iWfB4cGXyyFHyu+wtyg0KpFn5JAzcfz4A=
=Rgbj
-----END PGP SIGNATURE-----
Merge tag 'rcu.6.4.april5.2023.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux
Pull RCU updates from Joel Fernandes:
- Updates and additions to MAINTAINERS files, with Boqun being added to
the RCU entry and Zqiang being added as an RCU reviewer.
I have also transitioned from reviewer to maintainer; however, Paul
will be taking over sending RCU pull-requests for the next merge
window.
- Resolution of hotplug warning in nohz code, achieved by fixing
cpu_is_hotpluggable() through interaction with the nohz subsystem.
Tick dependency modifications by Zqiang, focusing on fixing usage of
the TICK_DEP_BIT_RCU_EXP bitmask.
- Avoid needless calls to the rcu-lazy shrinker for CONFIG_RCU_LAZY=n
kernels, fixed by Zqiang.
- Improvements to rcu-tasks stall reporting by Neeraj.
- Initial renaming of k[v]free_rcu() to k[v]free_rcu_mightsleep() for
increased robustness, affecting several components like mac802154,
drbd, vmw_vmci, tracing, and more.
A report by Eric Dumazet showed that the API could be unknowingly
used in an atomic context, so we'd rather make sure they know what
they're asking for by being explicit:
https://lore.kernel.org/all/20221202052847.2623997-1-edumazet@google.com/
- Documentation updates, including corrections to spelling,
clarifications in comments, and improvements to the srcu_size_state
comments.
- Better srcu_struct cache locality for readers, by adjusting the size
of srcu_struct in support of SRCU usage by Christoph Hellwig.
- Teach lockdep to detect deadlocks between srcu_read_lock() vs
synchronize_srcu() contributed by Boqun.
Previously lockdep could not detect such deadlocks, now it can.
- Integration of rcutorture and rcu-related tools, targeted for v6.4
from Boqun's tree, featuring new SRCU deadlock scenarios, test_nmis
module parameter, and more
- Miscellaneous changes, various code cleanups and comment improvements
* tag 'rcu.6.4.april5.2023.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux: (71 commits)
checkpatch: Error out if deprecated RCU API used
mac802154: Rename kfree_rcu() to kvfree_rcu_mightsleep()
rcuscale: Rename kfree_rcu() to kfree_rcu_mightsleep()
ext4/super: Rename kfree_rcu() to kfree_rcu_mightsleep()
net/mlx5: Rename kfree_rcu() to kfree_rcu_mightsleep()
net/sysctl: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
lib/test_vmalloc.c: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
tracing: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
misc: vmw_vmci: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
drbd: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
rcu: Protect rcu_print_task_exp_stall() ->exp_tasks access
rcu: Avoid stack overflow due to __rcu_irq_enter_check_tick() being kprobe-ed
rcu-tasks: Report stalls during synchronize_srcu() in rcu_tasks_postscan()
rcu: Permit start_poll_synchronize_rcu_expedited() to be invoked early
rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
rcu: Register rcu-lazy shrinker only for CONFIG_RCU_LAZY=y kernels
rcu: Fix missing TICK_DEP_MASK_RCU_EXP dependency check
rcu: Fix set/clear TICK_DEP_BIT_RCU_EXP bitmask race
rcu/trace: use strscpy() to instead of strncpy()
tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem
...
Use the same pattern as the compat version of this code does: instead of
copying the whole array to a kernel buffer and then having a separate
phase of verifying it, just do it one entry at a time, verifying as you
go.
On Jens' /dev/zero readv() test this improves performance by ~6%.
[ This was obviously triggered by Jens' ITER_UBUF updates series ]
Reported-and-tested-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/all/de35d11d-bce7-e976-7372-1f2caf417103@kernel.dk/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmRCvdsQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpg4oD/457EJ21Fm36NuyT/S0Cr8ok9Tdk7t9BeBh
V/9CYThoXr5aqAox0Vq23FF+Rhzm81GzwYERN4493LBblliNeNOo2IaXF9/7qrUW
11v9Bkug2J3k3hRGtEa6Zl0EpMu+FRLsNpchjFS2KPuOq+iMDxrvwuy50kidWg7n
r25e4UwpExVO9fIoUSmzgWVfRHOTuj9yiG/UsaH2+2BRXerIX0Q1tyElwmcGh25M
Ad2hN+yDnuIbNA5gNUpnzY32Dp0zjAsquc//QOvq9mltcNTElokB8idGliismvyd
8qF0lkwQwewOBT/sSD5EY3K0Qd8IJu425bvT/yPUDScHz1chxHUoxo5eisIr2M9l
5AL5KHAf7Zzs8ZuV+IYPzZ5qM6a/vF3mHUisKRNKYVhF46Nmd4cBratfXwWb1MxV
clQM2qr0TLOYli9mOeTXph3hg/rBVqKqf90boAZoN8b2tWBKlMykpqRadbepjrgx
bmBSwwAF99NxIHEjU3U5DMdUloCSiMZIfMfDxQrPNDrfWAW4xJs5Ym0VeOjEotTt
oFEs1fr6c3Mn7KEuPPfOtnDxvs51IP/B8+gDgMt/edf+wHiCU1Zm31u2gxt2dsKh
g73Y92i5SHjIf36H5szBTeioyMy1E1VA9HF14xWz2eKdQ+wxQ9VNWoctcJ85k3F4
6AZDYRIrWA==
=EaE9
-----END PGP SIGNATURE-----
Merge tag 'iter-ubuf.2-2023-04-21' of git://git.kernel.dk/linux
Pull ITER_UBUF updates from Jens Axboe:
"This turns singe vector imports into ITER_UBUF, rather than
ITER_IOVEC.
The former is more trivial to iterate and advance, and hence a bit
more efficient. From some very unscientific testing, ~60% of all iovec
imports are single vector"
* tag 'iter-ubuf.2-2023-04-21' of git://git.kernel.dk/linux:
iov_iter: Mark copy_compat_iovec_from_user() noinline
iov_iter: import single vector iovecs as ITER_UBUF
iov_iter: convert import_single_range() to ITER_UBUF
iov_iter: overlay struct iovec and ubuf/len
iov_iter: set nr_segs = 1 for ITER_UBUF
iov_iter: remove iov_iter_iovec()
iov_iter: add iter_iov_addr() and iter_iov_len() helpers
ALSA: pcm: check for user backed iterator, not specific iterator type
IB/qib: check for user backed iterator, not specific iterator type
IB/hfi1: check for user backed iterator, not specific iterator type
iov_iter: add iter_iovec() helper
block: ensure bio_alloc_map_data() deals with ITER_UBUF correctly
In the case of reverse allocation, mas->index and mas->last do not point
to the correct allocation range, which will cause users to get incorrect
allocation results, so fix it. If the user does not use it in a specific
way, this bug will not be triggered.
This is a bug, but only VMA uses it now, the way VMA is used now will
not trigger it. There is a possibility that a user will trigger it in
the future.
Also re-check whether the size is still satisfied after the lower bound
was increased, which is a corner case and is incorrect in previous
versions.
Link: https://lkml.kernel.org/r/20230419093625.99201-1-zhangpeng.00@bytedance.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
__show_mem() needs to iterate over all zones that have memory, we can
simplify the code by using for_each_populated_zone().
Link: https://lkml.kernel.org/r/20230417035226.4013584-1-yajun.deng@linux.dev
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Export group_cpus_evenly() so that some modules
can make use of it to group CPUs evenly according
to NUMA and CPU locality.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230323053043.35-2-xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This has a slight benefit for x86 and has no effect on other targets.
The benefit to x86 is it change the codegen for setting a node to block
from `mov %r0, %r1; or $RB_BLACK, %r1` to `lea RB_BLACK(%r0), %r1` which
saves an instructions.
In all other cases it just replace ALU with ALU (or -> and) which
perform the same on all machines I am aware of.
Total instructions in rbtree.o:
Before - 802
After - 782
so it saves about 20 `mov` instructions.
Link: https://lkml.kernel.org/r/20230404221350.3806566-1-goldstein.w.n@gmail.com
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Cc: Michel Lespinasse <michel@lespinasse.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The type of variable pointed to by pivs is unsigned long, but the type
used in sizeof is a pointer type. Change it to unsigned long.
This change has no runtime effect, as sizeof(ul) == sizeof(ul *).
Link: https://lkml.kernel.org/r/20230411023513.15227-1-zhangpeng.00@bytedance.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reported-by: David Binderman <dcb314@hotmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Simplify code of mas_wr_node_walk() without changing functionality, and
improve readability. Remove some special judgments. Instead of
dynamically recording the min and max in the loop, get the final min and
max directly at the end.
Link: https://lkml.kernel.org/r/20230314124203.91572-3-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add vm_map_ram()/vm_unmap_ram() test case to our stress test-suite.
[akpm@linux-foundation.org: fix whitespace, per Lorenzo]
Link: https://lkml.kernel.org/r/20230330190639.431589-2-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The internal function of mas_awalk() was incorrectly skipping the last
entry in a node, which could potentially be NULL. This is only a problem
for the left-most node in the tree - otherwise that NULL would not exist.
Fix mas_awalk() by using the metadata to obtain the end of the node for
the loop and the logical pivot as apposed to the raw pivot value.
Link: https://lkml.kernel.org/r/20230414145728.4067069-2-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stop using maple state min/max for the range by passing through pointers
for those values. This will allow the maple state to be reused without
resetting.
Also add some logic to fail out early on searching with invalid
arguments.
Link: https://lkml.kernel.org/r/20230414145728.4067069-1-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This was only ever used by btrfs, and the usage just went away.
This effectively reverts df91f56adce1 ("libcrc32c: Add crc32c_impl
function").
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This fixes a build error when CONFIG_FAULT_INJECTION_CONFIGFS=y and
CONFIG_CONFIGFS_FS=m.
Since the fault-injection library cannot built as a module, avoid building
configfs as a module.
Fixes: 4668c7a2940d ("fault-inject: allow configuration via configfs")
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304150025.K0hczLR4-lkp@intel.com/
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In mas_alloc_nodes(), "node->node_count = 0" means to initialize the
node_count field of the new node, but the node may not be a new node. It
may be a node that existed before and node_count has a value, setting it
to 0 will cause a memory leak. At this time, mas->alloc->total will be
greater than the actual number of nodes in the linked list, which may
cause many other errors. For example, out-of-bounds access in
mas_pop_node(), and mas_pop_node() may return addresses that should not be
used. Fix it by initializing node_count only for new nodes.
Also, by the way, an if-else statement was removed to simplify the code.
Link: https://lkml.kernel.org/r/20230411041005.26205-1-zhangpeng.00@bytedance.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Statically initialized objects are usually not initialized via the init()
function of the subsystem. They are special cased and the subsystem
provides a function to validate whether an object which is not yet tracked
by debugobjects is statically initialized. This means the object is started
to be tracked on first use, e.g. activation.
This works perfectly fine, unless there are two concurrent operations on
that object. Schspa decoded the problem:
T0 T1
debug_object_assert_init(addr)
lock_hash_bucket()
obj = lookup_object(addr);
if (!obj) {
unlock_hash_bucket();
- > preemption
lock_subsytem_object(addr);
activate_object(addr)
lock_hash_bucket();
obj = lookup_object(addr);
if (!obj) {
unlock_hash_bucket();
if (is_static_object(addr))
init_and_track(addr);
lock_hash_bucket();
obj = lookup_object(addr);
obj->state = ACTIVATED;
unlock_hash_bucket();
subsys function modifies content of addr,
so static object detection does
not longer work.
unlock_subsytem_object(addr);
if (is_static_object(addr)) <- Fails
debugobject emits a warning and invokes the fixup function which
reinitializes the already active object in the worst case.
This race exists forever, but was never observed until mod_timer() got a
debug_object_assert_init() added which is outside of the timer base lock
held section right at the beginning of the function to cover the lockless
early exit points too.
Rework the code so that the lookup, the static object check and the
tracking object association happens atomically under the hash bucket
lock. This prevents the issue completely as all callers are serialized on
the hash bucket lock and therefore cannot observe inconsistent state.
Fixes: 3ac7fe5a4aab ("infrastructure to debug (dynamic) objects")
Reported-by: syzbot+5093ba19745994288b53@syzkaller.appspotmail.com
Debugged-by: Schspa Shi <schspa@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://syzkaller.appspot.com/bug?id=22c8a5938eab640d1c6bcc0e3dc7be519d878462
Link: https://lore.kernel.org/lkml/20230303161906.831686-1-schspa@gmail.com
Link: https://lore.kernel.org/r/87zg7dzgao.ffs@tglx
Since commit 8b41fc4454e ("kbuild: create modules.builtin without
Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations
are used to identify modules. As a consequence, uses of the macro
in non-modules will cause modprobe to misidentify their containing
object file as a module when it is not (false positives), and modprobe
might succeed rather than failing with a suitable error message.
So remove it in the files in this commit, none of which can be built as
modules.
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Suggested-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: linux-modules@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Hitomi Hasegawa <hasegawa-hitomi@fujitsu.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Since commit 8b41fc4454e ("kbuild: create modules.builtin without
Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations
are used to identify modules. As a consequence, uses of the macro
in non-modules will cause modprobe to misidentify their containing
object file as a module when it is not (false positives), and modprobe
might succeed rather than failing with a suitable error message.
So remove it in the files in this commit, none of which can be built as
modules.
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Suggested-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: linux-modules@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Hitomi Hasegawa <hasegawa-hitomi@fujitsu.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Since commit 8b41fc4454e ("kbuild: create modules.builtin without
Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations
are used to identify modules. As a consequence, uses of the macro
in non-modules will cause modprobe to misidentify their containing
object file as a module when it is not (false positives), and modprobe
might succeed rather than failing with a suitable error message.
So remove it in the files in this commit, none of which can be built as
modules.
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Suggested-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: linux-modules@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Hitomi Hasegawa <hasegawa-hitomi@fujitsu.com>
Cc: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Now blake2s-generic.c can no longer be a module, drop all remaining
module-related code as well.
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Requested-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: linux-modules@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Hitomi Hasegawa <hasegawa-hitomi@fujitsu.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Since commit 8b41fc4454e ("kbuild: create modules.builtin without
Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations
are used to identify modules. As a consequence, uses of the macro
in non-modules will cause modprobe to misidentify their containing
object file as a module when it is not (false positives), and modprobe
might succeed rather than failing with a suitable error message.
So remove it in the files in this commit, none of which can be built as
modules.
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Suggested-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: linux-modules@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Hitomi Hasegawa <hasegawa-hitomi@fujitsu.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
This provides a helper function to allow configuration of fault-injection
for configfs-based drivers.
The config items created by this function have the same interface as the
one created under debugfs by fault_create_debugfs_attr().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Link: https://lore.kernel.org/r/20230327143733.14599-2-akinobu.mita@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
After commit 6376ce56feb6 ("iov_iter: import single vector iovecs as
ITER_UBUF"), GCC does an inter-procedural compiler optimization which
moves the user_access_begin() out of copy_compat_iovec_from_user() and
into its callers:
lib/iov_iter.o: warning: objtool: .altinstr_replacement+0x0: redundant UACCESS disable
lib/iov_iter.o: warning: objtool: iovec_from_user.part.0+0xc7: call to copy_compat_iovec_from_user.part.0() with UACCESS enabled
lib/iov_iter.o: warning: objtool: __import_iovec+0x21d: call to copy_compat_iovec_from_user.part.0() with UACCESS enabled
Enforce the "no UACCESS enable across function boundaries" rule by
disabling cloning for copy_compat_iovec_from_user().
Fixes: 6376ce56feb6 ("iov_iter: import single vector iovecs as ITER_UBUF")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
https://lkml.kernel.org/lkml/20230327120017.6bb826d7@canb.auug.org.au
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When we get a random number to generate a flag in the valid range of
UNESCAPE flags, use UNESCAPE_ALL_MASK, It's more correct and prevents from
missed updates of the test coverage in the future if any.
Link: https://lkml.kernel.org/r/20230327142604.48213-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ELF is acronym and therefore should be spelled in all caps.
I left one exception at Documentation/arm/nwfpe/nwfpe.rst which looks like
being written in the first person.
Link: https://lkml.kernel.org/r/Y/3wGWQviIOkyLJW@p183
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Provide a means to copy a page to user space from an iterator, aborting if
a page fault would occur. This supports compound pages, but may be passed
a tail page with an offset extending further into the compound page, so we
cannot pass a folio.
This allows for this function to be called from atomic context and _try_
to user pages if they are faulted in, aborting if not.
The function does not use _copy_to_iter() in order to not specify
might_fault(), this is similar to copy_page_from_iter_atomic().
This is being added in order that an iteratable form of vread() can be
implemented while holding spinlocks.
Link: https://lkml.kernel.org/r/19734729defb0f498a76bdec1bef3ac48a3af3e8.1679511146.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
There is a concurrency bug that may cause the wrong value to be loaded
when a CPU is modifying the maple tree.
CPU1:
mtree_insert_range()
mas_insert()
mas_store_root()
...
mas_root_expand()
...
rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
ma_set_meta(node, maple_leaf_64, 0, slot); <---IP
CPU2:
mtree_load()
mtree_lookup_walk()
ma_data_end();
When CPU1 is about to execute the instruction pointed to by IP, the
ma_data_end() executed by CPU2 may return the wrong end position, which
will cause the value loaded by mtree_load() to be wrong.
An example of triggering the bug:
Add mdelay(100) between rcu_assign_pointer() and ma_set_meta() in
mas_root_expand().
static DEFINE_MTREE(tree);
int work(void *p) {
unsigned long val;
for (int i = 0 ; i< 30; ++i) {
val = (unsigned long)mtree_load(&tree, 8);
mdelay(5);
pr_info("%lu",val);
}
return 0;
}
mt_init_flags(&tree, MT_FLAGS_USE_RCU);
mtree_insert(&tree, 0, (void*)12345, GFP_KERNEL);
run_thread(work)
mtree_insert(&tree, 1, (void*)56789, GFP_KERNEL);
In RCU mode, mtree_load() should always return the value before or after
the data structure is modified, and in this example mtree_load(&tree, 8)
may return 56789 which is not expected, it should always return NULL. Fix
it by put ma_set_meta() before rcu_assign_pointer().
Link: https://lkml.kernel.org/r/20230314124203.91572-4-zhangpeng.00@bytedance.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
if (likely(offset > end))
max = pivots[offset];
The above code should be changed to if (likely(offset < end)), which is
correct. This affects the correctness of ma_data_end(). Now it seems
that the final result will not be wrong, but it is best to change it.
This patch does not change the code as above, because it simplifies the
code by the way.
Link: https://lkml.kernel.org/r/20230314124203.91572-1-zhangpeng.00@bytedance.com
Link: https://lkml.kernel.org/r/20230314124203.91572-2-zhangpeng.00@bytedance.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dereferencing RCU objects within the RCU callback without the RCU check
has caused lockdep to complain. Fix the RCU dereferencing by using the
RCU callback lock to ensure the operation is safe.
Also stop creating a new lock to use for dereferencing during destruction
of the tree or subtree. Instead, pass through a pointer to the tree that
has the lock that is held for RCU dereferencing checking. It also does
not make sense to use the maple state in the freeing scenario as the tree
walk is a special case where the tree no longer has the normal encodings
and parent pointers.
Link: https://lkml.kernel.org/r/20230227173632.3292573-8-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add an smp_rmb() before reading the parent pointer to ensure that anything
read from the node prior to the parent pointer hasn't been reordered ahead
of this check.
The is necessary for RCU mode.
Link: https://lkml.kernel.org/r/20230227173632.3292573-7-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
During the development of the maple tree, the strategy of freeing multiple
nodes changed and, in the process, the pivots were reused to store
pointers to dead nodes. To ensure the readers see accurate pivots, the
writers need to mark the nodes as dead and call smp_wmb() to ensure any
readers can identify the node as dead before using the pivot values.
There were two places where the old method of marking the node as dead
without smp_wmb() were being used, which resulted in RCU readers seeing
the wrong pivot value before seeing the node was dead. Fix this race
condition by using mte_set_node_dead() which has the smp_wmb() call to
ensure the race is closed.
Add a WARN_ON() to the ma_free_rcu() call to ensure all nodes being freed
are marked as dead to ensure there are no other call paths besides the two
updated paths.
This is necessary for the RCU mode of the maple tree.
Link: https://lkml.kernel.org/r/20230227173632.3292573-6-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The call to mte_set_dead_node() before the smp_wmb() already calls
smp_wmb() so this is not needed. This is an optimization for the RCU mode
of the maple tree.
Link: https://lkml.kernel.org/r/20230227173632.3292573-5-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The walk to destroy the nodes was not always setting the node type and
would result in a destroy method potentially using the values as nodes.
Avoid this by setting the correct node types. This is necessary for the
RCU mode of the maple tree.
Link: https://lkml.kernel.org/r/20230227173632.3292573-4-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When initially starting a search, the root node may already be in the
process of being replaced in RCU mode. Detect and restart the walk if
this is the case. This is necessary for RCU mode of the maple tree.
Link: https://lkml.kernel.org/r/20230227173632.3292573-3-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Fix VMA tree modification under mmap read lock".
Syzbot reported a BUG_ON in mm/mmap.c which was found to be caused by an
inconsistency between threads walking the VMA maple tree. The
inconsistency is caused by the page fault handler modifying the maple tree
while holding the mmap_lock for read.
This only happens for stack VMAs. We had thought this was safe as it only
modifies a single pivot in the tree. Unfortunately, syzbot constructed a
test case where the stack had no guard page and grew the stack to abut the
next VMA. This causes us to delete the NULL entry between the two VMAs
and rewrite the node.
We considered several options for fixing this, including dropping the
mmap_lock, then reacquiring it for write; and relaxing the definition of
the tree to permit a zero-length NULL entry in the node. We decided the
best option was to backport some of the RCU patches from -next, which
solve the problem by allocating a new node and RCU-freeing the old node.
Since the problem exists in 6.1, we preferred a solution which is similar
to the one we intended to merge next merge window.
These patches have been in -next since next-20230301, and have received
intensive testing in Android as part of the RCU page fault patchset. They
were also sent as part of the "Per-VMA locks" v4 patch series. Patches 1
to 7 are bug fixes for RCU mode of the tree and patch 8 enables RCU mode
for the tree.
Performance v6.3-rc3 vs patched v6.3-rc3: Running these changes through
mmtests showed there was a 15-20% performance decrease in
will-it-scale/brk1-processes. This tests creating and inserting a single
VMA repeatedly through the brk interface and isn't representative of any
real world applications.
This patch (of 8):
ma_pivots() and ma_data_end() may be called with a dead node. Ensure to
that the node isn't dead before using the returned values.
This is necessary for RCU mode of the maple tree.
Link: https://lkml.kernel.org/r/20230327185532.2354250-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230227173632.3292573-1-surenb@google.com
Link: https://lkml.kernel.org/r/20230227173632.3292573-2-surenb@google.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Li <chriscli@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: freak07 <michalechner92@googlemail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Oskolkov <posk@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We introduce a new HAS_IOPORT Kconfig option to indicate support for I/O
Port access. In a future patch HAS_IOPORT=n will disable compilation of
the I/O accessor functions inb()/outb() and friends on architectures
which can not meaningfully support legacy I/O spaces such as s390.
The following architectures do not select HAS_IOPORT:
* ARC
* C-SKY
* Hexagon
* Nios II
* OpenRISC
* s390
* User-Mode Linux
* Xtensa
All other architectures select HAS_IOPORT at least conditionally.
The "depends on" relations on HAS_IOPORT in drivers as well as ifdefs
for HAS_IOPORT specific sections will be added in subsequent patches on
a per subsystem basis.
Co-developed-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@kernel.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net> # for ARCH=um
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Create test suite called "kunit_current" to add test coverage for the use
of current->kunit_test, which returns the current KUnit test.
Add two test cases:
- kunit_current_test to test current->kunit_test and the method
kunit_get_current_test(), which utilizes current->kunit_test.
- kunit_current_fail_test to test the method
kunit_fail_current_test(), which utilizes current->kunit_test.
Signed-off-by: Rae Moar <rmoar@google.com>
Reviewed-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>