linux

iv/linux

Author	SHA1	Message	Date
Linus Torvalds	8ca09d5fa3	cpumask: fix incorrect cpumask scanning result checks It turns out that commit `596ff4a09b` ("cpumask: re-introduce constant-sized cpumask optimizations") exposed a number of cases of drivers not checking the result of "cpumask_next()" and friends correctly. The documented correct check for "no more cpus in the cpumask" is to check for the result being equal or larger than the number of possible CPU ids, exactly _because_ we've always done those constant-sized cpumask scans using a widened type before. So the return value of a cpumask scan should be checked with if (cpu >= nr_cpu_ids) ... because the cpumask scan did not necessarily stop exactly at that maximum CPU id. But a few cases ended up instead using checks like if (cpu == nr_cpumask_bits) ... which used that internal "widened" number of bits. And that used to work pretty much by accident (ok, in this case "by accident" is simply because it matched the historical internal implementation of the cpumask scanning, so it was more of a "intentionally using implementation details rather than an accident"). But the extended constant-sized optimizations then did that internal implementation differently, and now that code that did things wrong but matched the old implementation no longer worked at all. Which then causes subsequent odd problems due to using what ends up being an invalid CPU ID. Most of these cases require either unusual hardware or special uses to hit, but the random.c one triggers quite easily. All you really need is to have a sufficiently small CONFIG_NR_CPUS value for the bit scanning optimization to be triggered, but not enough CPUs to then actually fill that widened cpumask. At that point, the cpumask scanning will return the NR_CPUS constant, which is _not_ the same as nr_cpumask_bits. This just does the mindless fix with sed -i 's/== nr_cpumask_bits/>= nr_cpu_ids/' to fix the incorrect uses. The ones in the SCSI lpfc driver in particular could probably be fixed more cleanly by just removing that repeated pattern entirely, but I am not emptionally invested enough in that driver to care. Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/lkml/481b19b5-83a0-4793-b4fd-194ad7b978c3@roeck-us.net/ Reported-and-tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/lkml/CAMuHMdUKo_Sf7TjKzcNDa8Ve+6QrK+P8nSQrSQ=6LTRmcBKNww@mail.gmail.com/ Reported-by: Vernon Yang <vernon2gm@gmail.com> Link: https://lore.kernel.org/lkml/20230306160651.2016767-1-vernon2gm@gmail.com/ Cc: Yury Norov <yury.norov@gmail.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2023-03-06 12:15:13 -08:00
Jason A. Donenfeld	6bb20c152b	random: do not include <asm/archrandom.h> from random.h The <asm/archrandom.h> header is a random.c private detail, not something to be called by other code. As such, don't make it automatically available by way of random.h. Cc: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-12-20 03:13:45 +01:00
Linus Torvalds	75f4d9af8b	iov_iter work; most of that is about getting rid of direction misannotations and (hopefully) preventing more of the same for the future. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> -----BEGIN PGP SIGNATURE----- iHQEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCY5ZzQAAKCRBZ7Krx/gZQ 65RZAP4nTkvOn0NZLVFkuGOx8pgJelXAvrteyAuecVL8V6CR4AD40qCVY51PJp8N MzwiRTeqnGDxTTF7mgd//IB6hoatAA== =bcvF -----END PGP SIGNATURE----- Merge tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull iov_iter updates from Al Viro: "iov_iter work; most of that is about getting rid of direction misannotations and (hopefully) preventing more of the same for the future" * tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: use less confusing names for iov_iter direction initializers iov_iter: saner checks for attempt to copy to/from iterator [xen] fix "direction" argument of iov_iter_kvec() [vhost] fix 'direction' argument of iov_iter_{init,bvec}() [target] fix iov_iter_bvec() "direction" argument [s390] memcpy_real(): WRITE is "data source", not destination... [s390] zcore: WRITE is "data source", not destination... [infiniband] READ is "data destination", not source... [fsi] WRITE is "data source", not destination... [s390] copy_oldmem_kernel() - WRITE is "data source", not destination csum_and_copy_to_iter(): handle ITER_DISCARD get rid of unlikely() on page_copy_sane() calls	2022-12-12 18:29:54 -08:00
Jason A. Donenfeld	39ec9e6b14	random: align entropy_timer_state to cache line The theory behind the jitter dance is that multiple things are poking at the same cache line. This only works, however, if what's being poked at is actually all in the same cache line. Ensure this is the case by aligning the struct on the stack to the cache line size. We can't use ____cacheline_aligned on a stack variable, because gcc assumes 16 byte alignment when only 8 byte alignment is provided by the kernel, which means gcc could technically do something pathological like `(rsp & ~48) - 64`. It doesn't, but rather than risk it, just do the stack alignment manually with PTR_ALIGN and an oversized buffer. Fixes: `50ee7529ec` ("random: try to actively add entropy rather than passively wait for it") Cc: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-12-04 14:37:08 +01:00
Jason A. Donenfeld	b83e45fd06	random: mix in cycle counter when jitter timer fires Rather than just relying on interaction between cache lines of the timer and the main loop, also explicitly take into account the fact that the timer might fire at some time that's hard to predict, due to scheduling, interrupts, or cross-CPU conditions. Mix in a cycle counter during the firing of the timer, in addition to the existing one during the scheduling of the timer. It can't hurt and can only help. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-12-04 14:37:08 +01:00
Jason A. Donenfeld	1c21fe00ed	random: spread out jitter callback to different CPUs Rather than merely hoping that the callback gets called on another CPU, arrange for that to actually happen, by round robining which CPU the timer fires on. This way, on multiprocessor machines, we exacerbate jitter by touching the same memory from multiple different cores. There's a little bit of tricky bookkeeping involved here, because using timer_setup_on_stack() + add_timer_on() + del_timer_sync() will result in a use after free. See this sample code: <https://xn--4db.cc/xBdEiIKO/c>. Instead, it's necessary to call [try_to_]del_timer_sync() before calling add_timer_on(), so that the final call to del_timer_sync() at the end of the function actually succeeds at making sure no handlers are running. Cc: Sultan Alsawaf <sultan@kerneltoast.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-12-04 14:37:08 +01:00
Jason A. Donenfeld	0e42d14be2	random: remove extraneous period and add a missing one in comments Just some trivial typo fixes, and reflowing of lines. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-11-29 15:42:23 +01:00
Al Viro	de4eda9de2	use less confusing names for iov_iter direction initializers READ/WRITE proved to be actively confusing - the meanings are "data destination, as used with read(2)" and "data source, as used with write(2)", but people keep interpreting those as "we read data from it" and "we write data to it", i.e. exactly the wrong way. Call them ITER_DEST and ITER_SOURCE - at least that is harder to misinterpret... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2022-11-25 13:01:55 -05:00
Jason A. Donenfeld	bbc7e1bed1	random: add back async readiness notifier This is required by vsprint, because it can't do things synchronously from hardirq context, and it will be useful for an EFI notifier as well. I didn't initially want to do this, but with two potential consumers now, it seems worth it. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-11-22 14:53:00 +01:00
Jason A. Donenfeld	9148de3196	random: reseed in delayed work rather than on-demand Currently, we reseed when random bytes are requested, if the current seed is too old. Since random bytes can be requested from all contexts, including hard IRQ, this means sometimes we wind up adding a bit of latency to hard IRQ. This was so much of a problem on s390x that now s390x just doesn't provide its architectural RNG from hard IRQ context, so we miss out in that case. Instead, let's just schedule a persistent delayed work, so that the reseeding and potentially expensive operations will always happen from process context, reducing unexpected latencies from hard IRQ. This also has the nice effect of accumulating a transcript of random inputs over time, since it means that we amass more input values. And it should make future vDSO integration a bit easier. Cc: Harald Freudenberger <freude@linux.ibm.com> Cc: Juergen Christ <jchrist@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-11-18 02:18:10 +01:00
Jason A. Donenfeld	db516da95c	hw_random: use add_hwgenerator_randomness() for early entropy Rather than calling add_device_randomness(), the add_early_randomness() function should use add_hwgenerator_randomness(), so that the early entropy can be potentially credited, which allows for the RNG to initialize earlier without having to wait for the kthread to come up. This requires some minor API refactoring, by adding a `sleep_after` parameter to add_hwgenerator_randomness(), so that we don't hit a blocking sleep from add_early_randomness(). Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-11-18 02:18:10 +01:00
Jason A. Donenfeld	19258d05b6	random: modernize documentation comment on get_random_bytes() The prior text was very old and made outdated references to TCP sequence numbers, which should use one of the integer functions instead, since batched entropy was introduced. The current way of describing the quality of functions is just to say that it's as good as /dev/urandom, which now all the functions are. Fixes: `f5b98461cb` ("random: use chacha20 for get_random_int/long") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-11-18 02:18:10 +01:00
Jason A. Donenfeld	b240bab518	random: adjust comment to account for removed function Since `de492c83ca` ("prandom: remove unused functions"), get_random_int() no longer exists, so remove its reference from this comment. Fixes: `de492c83ca` ("prandom: remove unused functions") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-11-18 02:18:10 +01:00
Jason A. Donenfeld	2c03e16f44	random: remove early archrandom abstraction The arch_get_random_early() abstraction is not completely useful and adds complexity, because it's not a given that there will be no calls to arch_get_random() between random_init_early(), which uses arch_get_random_early(), and init_cpu_features(). During that gap, crng_reseed() might be called, which uses arch_get_random(), since it's mostly not init code. Instead we can test whether we're in the early phase in arch_get_random() itself, and in doing so avoid all ambiguity about where we are. Fortunately, the only architecture that currently implements arch_get_random_early() also has an alternatives-based cpu feature system, one flag of which determines whether the other flags have been initialized. This makes it possible to do the early check with zero cost once the system is initialized. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-11-18 02:18:10 +01:00
Jason A. Donenfeld	b9b01a5625	random: use random.trust_{bootloader,cpu} command line option only It's very unusual to have both a command line option and a compile time option, and apparently that's confusing to people. Also, basically everybody enables the compile time option now, which means people who want to disable this wind up having to use the command line option to ensure that anyway. So just reduce the number of moving pieces and nix the compile time option in favor of the more versatile command line option. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-11-18 02:18:10 +01:00
Jason A. Donenfeld	7f576b2593	random: add helpers for random numbers with given floor or range Now that we have get_random_u32_below(), it's nearly trivial to make inline helpers to compute get_random_u32_above() and get_random_u32_inclusive(), which will help clean up open coded loops and manual computations throughout the tree. One snag is that in order to make get_random_u32_inclusive() operate on closed intervals, we have to do some (unlikely) special case handling if get_random_u32_inclusive(0, U32_MAX) is called. The least expensive way of doing this is actually to adjust the slowpath of get_random_u32_below() to have its undefined 0 result just return the output of get_random_u32(). We can make this basically free by calling get_random_u32() before the branch, so that the branch latency gets interleaved. Cc: stable@vger.kernel.org # to ease future backports that use this api Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-11-18 02:15:12 +01:00
Jason A. Donenfeld	e9a688bcb1	random: use rejection sampling for uniform bounded random integers Until the very recent commits, many bounded random integers were calculated using `get_random_u32() % max_plus_one`, which not only incurs the price of a division -- indicating performance mostly was not a real issue -- but also does not result in a uniformly distributed output if max_plus_one is not a power of two. Recent commits moved to using `prandom_u32_max(max_plus_one)`, which replaces the division with a faster multiplication, but still does not solve the issue with non-uniform output. For some users, maybe this isn't a problem, and for others, maybe it is, but for the majority of users, probably the question has never been posed and analyzed, and nobody thought much about it, probably assuming random is random is random. In other words, the unthinking expectation of most users is likely that the resultant numbers are uniform. So we implement here an efficient way of generating uniform bounded random integers. Through use of compile-time evaluation, and avoiding divisions as much as possible, this commit introduces no measurable overhead. At least for hot-path uses tested, any potential difference was lost in the noise. On both clang and gcc, code generation is pretty small. The new function, get_random_u32_below(), lives in random.h, rather than prandom.h, and has a "get_random_xxx" function name, because it is suitable for all uses, including cryptography. In order to be efficient, we implement a kernel-specific variant of Daniel Lemire's algorithm from "Fast Random Integer Generation in an Interval", linked below. The kernel's variant takes advantage of constant folding to avoid divisions entirely in the vast majority of cases, works on both 32-bit and 64-bit architectures, and requests a minimal amount of bytes from the RNG. Link: https://arxiv.org/pdf/1805.10941.pdf Cc: stable@vger.kernel.org # to ease future backports that use this api Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-11-17 17:36:47 +01:00
Jean-Philippe Brucker	f5e4ec155d	random: use arch_get_random*_early() in random_init() While reworking the archrandom handling, commit `d349ab99ee` ("random: handle archrandom with multiple longs") switched to the non-early archrandom helpers in random_init(), which broke initialization of the entropy pool from the arm64 random generator. Indeed at that point the arm64 CPU features, which verify that all CPUs have compatible capabilities, are not finalized so arch_get_random_seed_longs() is unsuccessful. Instead random_init() should use the _early functions, which check only the boot CPU on arm64. On other architectures the _early functions directly call the normal ones. Fixes: `d349ab99ee` ("random: handle archrandom with multiple longs") Cc: stable@vger.kernel.org Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-10-29 00:24:03 +02:00
Jason A. Donenfeld	de492c83ca	prandom: remove unused functions With no callers left of prandom_u32() and prandom_bytes(), as well as get_random_int(), remove these deprecated wrappers, in favor of get_random_u32() and get_random_bytes(). Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-10-11 17:42:58 -06:00
Jason A. Donenfeld	a890d1c657	random: clear new batches when bringing new CPUs online The commit that added the new get_random_{u8,u16}() functions neglected to update the code that clears the batches when bringing up a new CPU. It also forgot a few comments and helper defines, so add those in too. Fixes: `585cd5fe9f` ("random: add 8-bit and 16-bit batches") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-10-06 09:35:55 -06:00
William Zijl	d687772e6d	random: fix typos in get_random_bytes() comment Remove extra whitespace and add a missing word to a sentence describing get_random_bytes(). Signed-off-by: William Zijl <postmaster@gusted.xyz> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-10-01 23:37:51 +02:00
Jason A. Donenfeld	1227334713	random: schedule jitter credit for next jiffy, not in two jiffies Counterintuitively, mod_timer(..., jiffies + 1) will cause the timer to fire not in the next jiffy, but in two jiffies. The way to cause the timer to fire in the next jiffy is with mod_timer(..., jiffies). Doing so then lets us bump the upper bound back up again. Fixes: `50ee7529ec` ("random: try to actively add entropy rather than passively wait for it") Fixes: `829d680e82` ("random: cap jitter samples per bit to factor of HZ") Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Sultan Alsawaf <sultan@kerneltoast.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-10-01 12:04:21 +02:00
Jason A. Donenfeld	585cd5fe9f	random: add 8-bit and 16-bit batches There are numerous places in the kernel that would be sped up by having smaller batches. Currently those callsites do `get_random_u32() & 0xff` or similar. Since these are pretty spread out, and will require patches to multiple different trees, let's get ahead of the curve and lay the foundation for `get_random_u8()` and `get_random_u16()`, so that it's then possible to start submitting conversion patches leisurely. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-09-29 21:37:27 +02:00
Jason A. Donenfeld	dd54fd7dfa	random: use init_utsname() instead of utsname() Rather than going through the current-> indirection for utsname, at this point in boot, init_utsname()==utsname(), so just use it directly that way. Additionally, init_utsname() appears to be available nearly always, so move it into random_init_early(). Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-09-29 21:37:27 +02:00
Jason A. Donenfeld	f62384995e	random: split initialization into early step and later step The full RNG initialization relies on some timestamps, made possible with initialization functions like time_init() and timekeeping_init(). However, these are only available rather late in initialization. Meanwhile, other things, such as memory allocator functions, make use of the RNG much earlier. So split RNG initialization into two phases. We can provide arch randomness very early on, and then later, after timekeeping and such are available, initialize the rest. This ensures that, for example, slabs are properly randomized if RDRAND is available. Without this, CONFIG_SLAB_FREELIST_RANDOM=y loses a degree of its security, because its random seed is potentially deterministic, since it hasn't yet incorporated RDRAND. It also makes it possible to use a better seed in kfence, which currently relies on only the cycle counter. Another positive consequence is that on systems with RDRAND, running with CONFIG_WARN_ALL_UNSEEDED_RANDOM=y results in no warnings at all. One subtle side effect of this change is that on systems with no RDRAND, RDTSC is now only queried by random_init() once, committing the moment of the function call, instead of multiple times as before. This is intentional, as the multiple RDTSCs in a loop before weren't accomplishing very much, with jitter being better provided by try_to_generate_entropy(). Plus, filling blocks with RDTSC is still being done in extract_entropy(), which is necessarily called before random bytes are served anyway. Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-09-29 21:36:27 +02:00
Jason A. Donenfeld	748bc4dd9e	random: use expired timer rather than wq for mixing fast pool Previously, the fast pool was dumped into the main pool periodically in the fast pool's hard IRQ handler. This worked fine and there weren't problems with it, until RT came around. Since RT converts spinlocks into sleeping locks, problems cropped up. Rather than switching to raw spinlocks, the RT developers preferred we make the transformation from originally doing: do_some_stuff() spin_lock() do_some_other_stuff() spin_unlock() to doing: do_some_stuff() queue_work_on(some_other_stuff_worker) This is an ordinary pattern done all over the kernel. However, Sherry noticed a 10% performance regression in qperf TCP over a 40gbps InfiniBand card. Quoting her message: > MT27500 Family [ConnectX-3] cards: > Infiniband device 'mlx4_0' port 1 status: > default gid: fe80:0000:0000:0000:0010:e000:0178:9eb1 > base lid: 0x6 > sm lid: 0x1 > state: 4: ACTIVE > phys state: 5: LinkUp > rate: 40 Gb/sec (4X QDR) > link_layer: InfiniBand > > Cards are configured with IP addresses on private subnet for IPoIB > performance testing. > Regression identified in this bug is in TCP latency in this stack as reported > by qperf tcp_lat metric: > > We have one system listen as a qperf server: > [root@yourQperfServer ~]# qperf > > Have the other system connect to qperf server as a client (in this > case, it’s X7 server with Mellanox card): > [root@yourQperfClient ~]# numactl -m0 -N0 qperf 20.20.20.101 -v -uu -ub --time 60 --wait_server 20 -oo msg_size:4K:1024K:*2 tcp_lat Rather than incur the scheduling latency from queue_work_on, we can instead switch to running on the next timer tick, on the same core. This also batches things a bit more -- once per jiffy -- which is okay now that mix_interrupt_randomness() can credit multiple bits at once. Reported-by: Sherry Yang <sherry.yang@oracle.com> Tested-by: Paul Webb <paul.x.webb@oracle.com> Cc: Sherry Yang <sherry.yang@oracle.com> Cc: Phillip Goerl <phillip.goerl@oracle.com> Cc: Jack Vogel <jack.vogel@oracle.com> Cc: Nicky Veitch <nicky.veitch@oracle.com> Cc: Colm Harrington <colm.harrington@oracle.com> Cc: Ramanan Govindarajan <ramanan.govindarajan@oracle.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Tejun Heo <tj@kernel.org> Cc: Sultan Alsawaf <sultan@kerneltoast.com> Cc: stable@vger.kernel.org Fixes: `58340f8e95` ("random: defer fast pool mixing to worker") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-09-28 18:17:53 +02:00
Jason A. Donenfeld	9ee0507e89	random: avoid reading two cache lines on irq randomness In order to avoid reading and dirtying two cache lines on every IRQ, move the work_struct to the bottom of the fast_pool struct. add_ interrupt_randomness() always touches .pool and .count, which are currently split, because .mix pushes everything down. Instead, move .mix to the bottom, so that .pool and .count are always in the first cache line, since .mix is only accessed when the pool is full. Fixes: `58340f8e95` ("random: defer fast pool mixing to worker") Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-09-28 18:17:30 +02:00
Jason A. Donenfeld	e78a802a7b	random: clamp credited irq bits to maximum mixed Since the most that's mixed into the pool is sizeof(long)*2, don't credit more than that many bytes of entropy. Fixes: `e3e33fc2ea` ("random: do not use input pool from hard IRQs") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-09-23 12:31:05 +02:00
Jason A. Donenfeld	d775335e35	random: throttle hwrng writes if no entropy is credited If a hwrng source does not provide an entropy estimate, it currently does not contribute at all to the CRNG. In order to help fix this, in case add_hwgenerator_randomness() is called with the entropy parameter set to zero, go to sleep until one reseed interval has passed. While the hwrng thread currently only runs under conditions where this is non-zero, this change is not harmful and prepares for future updates to the hwrng core. Cc: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-09-23 12:27:57 +02:00
Dominik Brodowski	745558f958	random: use hwgenerator randomness more frequently at early boot Mix in randomness from hw-rng sources more frequently during early boot, approximately once for every rng reseed. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-09-23 12:27:57 +02:00
Jason A. Donenfeld	cd4f24ae94	random: restore O_NONBLOCK support Prior to 5.6, when /dev/random was opened with O_NONBLOCK, it would return -EAGAIN if there was no entropy. When the pools were unified in 5.6, this was lost. The post 5.6 behavior of blocking until the pool is initialized, and ignoring O_NONBLOCK in the process, went unnoticed, with no reports about the regression received for two and a half years. However, eventually this indeed did break somebody's userspace. So we restore the old behavior, by returning -EAGAIN if the pool is not initialized. Unlike the old /dev/random, this can only occur during early boot, after which it never blocks again. In order to make this O_NONBLOCK behavior consistent with other expectations, also respect users reading with preadv2(RWF_NOWAIT) and similar. Fixes: `30c08efec8` ("random: make /dev/random be almost like /dev/urandom") Reported-by: Guozihua <guozihua@huawei.com> Reported-by: Zhongguohua <zhongguohua1@huawei.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Andrew Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-09-23 12:27:57 +02:00
Linus Torvalds	228dfe98a3	Char / Misc driver changes for 6.0-rc1 Here is the large set of char and misc and other driver subsystem changes for 6.0-rc1. Highlights include: - large set of IIO driver updates, additions, and cleanups - new habanalabs device support added (loads of register maps much like GPUs have) - soundwire driver updates - phy driver updates - slimbus driver updates - tiny virt driver fixes and updates - misc driver fixes and updates - interconnect driver updates - hwtracing driver updates - fpga driver updates - extcon driver updates - firmware driver updates - counter driver update - mhi driver fixes and updates - binder driver fixes and updates - speakup driver fixes Full details are in the long shortlog contents. All of these have been in linux-next for a while without any reported problems. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYup9QQ8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ylBKQCfaSuzl9ZP9dTvAw2FPp14oRqXnpoAnicvWAoq 1vU9Vtq2c73uBVLdZm4m =AwP3 -----END PGP SIGNATURE----- Merge tag 'char-misc-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char / misc driver updates from Greg KH: "Here is the large set of char and misc and other driver subsystem changes for 6.0-rc1. Highlights include: - large set of IIO driver updates, additions, and cleanups - new habanalabs device support added (loads of register maps much like GPUs have) - soundwire driver updates - phy driver updates - slimbus driver updates - tiny virt driver fixes and updates - misc driver fixes and updates - interconnect driver updates - hwtracing driver updates - fpga driver updates - extcon driver updates - firmware driver updates - counter driver update - mhi driver fixes and updates - binder driver fixes and updates - speakup driver fixes All of these have been in linux-next for a while without any reported problems" * tag 'char-misc-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (634 commits) drivers: lkdtm: fix clang -Wformat warning char: remove VR41XX related char driver misc: Mark MICROCODE_MINOR unused spmi: trace: fix stack-out-of-bound access in SPMI tracing functions dt-bindings: iio: adc: Add compatible for MT8188 iio: light: isl29028: Fix the warning in isl29028_remove() iio: accel: sca3300: Extend the trigger buffer from 16 to 32 bytes iio: fix iio_format_avail_range() printing for none IIO_VAL_INT iio: adc: max1027: unlock on error path in max1027_read_single_value() iio: proximity: sx9324: add empty line in front of bullet list iio: magnetometer: hmc5843: Remove duplicate 'the' iio: magn: yas530: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros iio: magnetometer: ak8974: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros iio: light: veml6030: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros iio: light: vcnl4035: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros iio: light: vcnl4000: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros iio: light: tsl2591: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() iio: light: tsl2583: Use DEFINE_RUNTIME_DEV_PM_OPS and pm_ptr() iio: light: isl29028: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() iio: light: gp2ap002: Switch to DEFINE_RUNTIME_DEV_PM_OPS and pm_ptr() ...	2022-08-04 11:05:48 -07:00
Jason A. Donenfeld	7f637be4d4	random: correct spelling of "overwrites" It was missing an 'r'. Fixes: `186873c549` ("random: use simpler fast key erasure flow on per-cpu keys") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-07-30 01:13:02 +02:00
Jason A. Donenfeld	d349ab99ee	random: handle archrandom with multiple longs The archrandom interface was originally designed for x86, which supplies RDRAND/RDSEED for receiving random words into registers, resulting in one function to generate an int and another to generate a long. However, other architectures don't follow this. On arm64, the SMCCC TRNG interface can return between one and three longs. On s390, the CPACF TRNG interface can return arbitrary amounts, with four longs having the same cost as one. On UML, the os_getrandom() interface can return arbitrary amounts. So change the api signature to take a "max_longs" parameter designating the maximum number of longs requested, and then return the number of longs generated. Since callers need to check this return value and loop anyway, each arch implementation does not bother implementing its own loop to try again to fill the maximum number of longs. Additionally, all existing callers pass in a constant max_longs parameter. Taken together, these two things mean that the codegen doesn't really change much for one-word-at-a-time platforms, while performance is greatly improved on platforms such as s390. Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-07-25 13:26:14 +02:00
Uros Bizjak	b7a68f67ff	random: use try_cmpxchg in _credit_init_bits Use `!try_cmpxchg(ptr, &orig, new)` instead of `cmpxchg(ptr, orig, new) != orig` in _credit_init_bits. This has two benefits: - The x86 cmpxchg instruction returns success in the ZF flag, so this change saves a compare after cmpxchg, as well as a related move instruction in front of cmpxchg. - try_cmpxchg implicitly assigns the *ptr value to &orig when cmpxchg fails, enabling further code simplifications. This patch has no functional change. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-07-18 15:04:04 +02:00
Jason A. Donenfeld	829d680e82	random: cap jitter samples per bit to factor of HZ Currently the jitter mechanism will require two timer ticks per iteration, and it requires N iterations per bit. This N is determined with a small measurement, and if it's too big, it won't waste time with jitter entropy because it'd take too long or not have sufficient entropy anyway. With the current max N of 32, there are large timeouts on systems with a small CONFIG_HZ. Rather than set that maximum to 32, instead choose a factor of CONFIG_HZ. In this case, 1/30 seems to yield sane values for different configurations of CONFIG_HZ. Reported-by: Vladimir Murzin <vladimir.murzin@arm.com> Fixes: `78c768e619` ("random: vary jitter iterations based on cycle counter speed") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-07-16 10:42:12 -07:00
Kalesh Singh	261e224d6a	pm/sleep: Add PM_USERSPACE_AUTOSLEEP Kconfig Systems that initiate frequent suspend/resume from userspace can make the kernel aware by enabling PM_USERSPACE_AUTOSLEEP config. This allows for certain sleep-sensitive code (wireguard/rng) to decide on what preparatory work should be performed (or not) in their pm_notification callbacks. This patch was prompted by the discussion at [1] which attempts to remove CONFIG_ANDROID that currently guards these code paths. [1] https://lore.kernel.org/r/20220629150102.1582425-1-hch@lst.de/ Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Link: https://lore.kernel.org/r/20220630191230.235306-1-kaleshsingh@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-07-01 10:39:20 +02:00
Jason A. Donenfeld	63b8ea5e4f	random: update comment from copy_to_user() -> copy_to_iter() This comment wasn't updated when we moved from read() to read_iter(), so this patch makes the trivial fix. Fixes: `1b388e7765` ("random: convert to using fops->read_iter()") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-06-20 11:06:17 +02:00
Jason A. Donenfeld	c01d4d0a82	random: quiet urandom warning ratelimit suppression message random.c ratelimits how much it warns about uninitialized urandom reads using __ratelimit(). When the RNG is finally initialized, it prints the number of missed messages due to ratelimiting. It has been this way since that functionality was introduced back in 2018. Recently, `cc1e127bfa` ("random: remove ratelimiting for in-kernel unseeded randomness") put a bit more stress on the urandom ratelimiting, which teased out a bug in the implementation. Specifically, when under pressure, __ratelimit() will print its own message and reset the count back to 0, making the final message at the end less useful. Secondly, it does so as a pr_warn(), which apparently is undesirable for people's CI. Fortunately, __ratelimit() has the RATELIMIT_MSG_ON_RELEASE flag exactly for this purpose, so we set the flag. Fixes: `4e00b339e2` ("random: rate limit unseeded randomness warnings") Cc: stable@vger.kernel.org Reported-by: Jon Hunter <jonathanh@nvidia.com> Reported-by: Ron Economos <re@w6rz.net> Tested-by: Ron Economos <re@w6rz.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-06-19 23:50:46 +02:00
Jason A. Donenfeld	534d2eaf19	random: schedule mix_interrupt_randomness() less often It used to be that mix_interrupt_randomness() would credit 1 bit each time it ran, and so add_interrupt_randomness() would schedule mix() to run every 64 interrupts, a fairly arbitrary number, but nonetheless considered to be a decent enough conservative estimate. Since `e3e33fc2ea` ("random: do not use input pool from hard IRQs"), mix() is now able to credit multiple bits, depending on the number of calls to add(). This was done for reasons separate from this commit, but it has the nice side effect of enabling this patch to schedule mix() less often. Currently the rules are: a) Credit 1 bit for every 64 calls to add(). b) Schedule mix() once a second that add() is called. c) Schedule mix() once every 64 calls to add(). Rules (a) and (c) no longer need to be coupled. It's still important to have _some_ value in (c), so that we don't "over-saturate" the fast pool, but the once per second we get from rule (b) is a plenty enough baseline. So, by increasing the 64 in rule (c) to something larger, we avoid calling queue_work_on() as frequently during irq storms. This commit changes that 64 in rule (c) to be 1024, which means we schedule mix() 16 times less often. And it does not need to change the 64 in rule (a). Fixes: `58340f8e95` ("random: defer fast pool mixing to worker") Cc: stable@vger.kernel.org Cc: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-06-19 23:50:45 +02:00
Jason A. Donenfeld	e052a478a7	random: remove rng_has_arch_random() With arch randomness being used by every distro and enabled in defconfigs, the distinction between rng_has_arch_random() and rng_is_initialized() is now rather small. In fact, the places where they differ are now places where paranoid users and system builders really don't want arch randomness to be used, in which case we should respect that choice, or places where arch randomness is known to be broken, in which case that choice is all the more important. So this commit just removes the function and its one user. Reviewed-by: Petr Mladek <pmladek@suse.com> # for vsprintf.c Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-06-10 11:29:48 +02:00
Jason A. Donenfeld	60e5b2886b	random: do not use jump labels before they are initialized Stephen reported that a static key warning splat appears during early boot on systems that credit randomness from device trees that contain an "rng-seed" property, because because setup_machine_fdt() is called before jump_label_init() during setup_arch(): static_key_enable_cpuslocked(): static key '0xffffffe51c6fcfc0' used before call to jump_label_init() WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:166 static_key_enable_cpuslocked+0xb0/0xb8 Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 5.18.0+ #224 44b43e377bfc84bc99bb5ab885ff694984ee09ff pstate: 600001c9 (nZCv dAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : static_key_enable_cpuslocked+0xb0/0xb8 lr : static_key_enable_cpuslocked+0xb0/0xb8 sp : ffffffe51c393cf0 x29: ffffffe51c393cf0 x28: 000000008185054c x27: 00000000f1042f10 x26: 0000000000000000 x25: 00000000f10302b2 x24: 0000002513200000 x23: 0000002513200000 x22: ffffffe51c1c9000 x21: fffffffdfdc00000 x20: ffffffe51c2f0831 x19: ffffffe51c6fcfc0 x18: 00000000ffff1020 x17: 00000000e1e2ac90 x16: 00000000000000e0 x15: ffffffe51b710708 x14: 0000000000000066 x13: 0000000000000018 x12: 0000000000000000 x11: 0000000000000000 x10: 00000000ffffffff x9 : 0000000000000000 x8 : 0000000000000000 x7 : 61632065726f6665 x6 : 6220646573752027 x5 : ffffffe51c641d25 x4 : ffffffe51c13142c x3 : ffff0a00ffffff05 x2 : 40000000ffffe003 x1 : 00000000000001c0 x0 : 0000000000000065 Call trace: static_key_enable_cpuslocked+0xb0/0xb8 static_key_enable+0x2c/0x40 crng_set_ready+0x24/0x30 execute_in_process_context+0x80/0x90 _credit_init_bits+0x100/0x154 add_bootloader_randomness+0x64/0x78 early_init_dt_scan_chosen+0x140/0x184 early_init_dt_scan_nodes+0x28/0x4c early_init_dt_scan+0x40/0x44 setup_machine_fdt+0x7c/0x120 setup_arch+0x74/0x1d8 start_kernel+0x84/0x44c __primary_switched+0xc0/0xc8 ---[ end trace 0000000000000000 ]--- random: crng init done Machine model: Google Lazor (rev1 - 2) with LTE A trivial fix went in to address this on arm64, `73e2d827a5` ("arm64: Initialize jump labels before setup_machine_fdt()"). I wrote patches as well for arm32 and risc-v. But still patches are needed on xtensa, powerpc, arc, and mips. So that's 7 platforms where things aren't quite right. This sort of points to larger issues that might need a larger solution. Instead, this commit just defers setting the static branch until later in the boot process. random_init() is called after jump_label_init() has been called, and so is always a safe place from which to adjust the static branch. Fixes: `f5bda35fba` ("random: use static branch for crng_ready()") Reported-by: Stephen Boyd <swboyd@chromium.org> Reported-by: Phil Elwell <phil@raspberrypi.com> Tested-by: Phil Elwell <phil@raspberrypi.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-06-10 11:29:48 +02:00
Jason A. Donenfeld	77fc95f8c0	random: account for arch randomness in bits Rather than accounting in bytes and multiplying (shifting), we can just account in bits and avoid the shift. The main motivation for this is there are other patches in flux that expand this code a bit, and avoiding the duplication of "* 8" everywhere makes things a bit clearer. Cc: stable@vger.kernel.org Fixes: `12e45a2a63` ("random: credit architectural init the exact amount") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-06-10 11:29:42 +02:00
Jason A. Donenfeld	39e0f991a6	random: mark bootloader randomness code as __init add_bootloader_randomness() and the variables it touches are only used during __init and not after, so mark these as __init. At the same time, unexport this, since it's only called by other __init code that's built-in. Cc: stable@vger.kernel.org Fixes: `428826f535` ("fdt: add support for rng-seed") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-06-10 11:28:16 +02:00
Jason A. Donenfeld	9b29b6b203	random: avoid checking crng_ready() twice in random_init() The current flow expands to: if (crng_ready()) ... else if (...) if (!crng_ready()) ... The second crng_ready() call is redundant, but can't so easily be optimized out by the compiler. This commit simplifies that to: if (crng_ready() ... else if (...) ... Fixes: `560181c27b` ("random: move initialization functions out of hot pages") Cc: stable@vger.kernel.org Cc: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-06-10 11:09:36 +02:00
Jason A. Donenfeld	1ce6c8d68f	random: check for signals after page of pool writes get_random_bytes_user() checks for signals after producing a PAGE_SIZE worth of output, just like /dev/zero does. write_pool() is doing basically the same work (actually, slightly more expensive), and so should stop to check for signals in the same way. Let's also name it write_pool_user() to match get_random_bytes_user(), so this won't be misused in the future. Before this patch, massive writes to /dev/urandom would tie up the process for an extremely long time and make it unterminatable. After, it can be successfully interrupted. The following test program can be used to see this works as intended: #include <unistd.h> #include <fcntl.h> #include <signal.h> #include <stdio.h> static unsigned char x[~0U]; static void handle(int) { } int main(int argc, char *argv[]) { pid_t pid = getpid(), child; int fd; signal(SIGUSR1, handle); if (!(child = fork())) { for (;;) kill(pid, SIGUSR1); } fd = open("/dev/urandom", O_WRONLY); pause(); printf("interrupted after writing %zd bytes\n", write(fd, x, sizeof(x))); close(fd); kill(child, SIGTERM); return 0; } Result before: "interrupted after writing 2147479552 bytes" Result after: "interrupted after writing 4096 bytes" Cc: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-05-22 22:34:31 +02:00
Jens Axboe	79025e727a	random: wire up fops->splice_{read,write}_iter() Now that random/urandom is using {read,write}_iter, we can wire it up to using the generic splice handlers. Fixes: `36e2c7421f` ("fs: don't allow splice read/write without explicit ops") Signed-off-by: Jens Axboe <axboe@kernel.dk> [Jason: added the splice_write path. Note that sendfile() and such still does not work for read, though it does for write, because of a file type restriction in splice_direct_to_actor(), which I'll address separately.] Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-05-20 18:26:48 +02:00
Jens Axboe	22b0a222af	random: convert to using fops->write_iter() Now that the read side has been converted to fix a regression with splice, convert the write side as well to have some symmetry in the interface used (and help deprecate ->write()). Signed-off-by: Jens Axboe <axboe@kernel.dk> [Jason: cleaned up random_ioctl a bit, require full writes in RNDADDENTROPY since it's crediting entropy, simplify control flow of write_pool(), and incorporate suggestions from Al.] Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-05-20 18:26:48 +02:00
Jens Axboe	1b388e7765	random: convert to using fops->read_iter() This is a pre-requisite to wiring up splice() again for the random and urandom drivers. It also allows us to remove the INT_MAX check in getrandom(), because import_single_range() applies capping internally. Signed-off-by: Jens Axboe <axboe@kernel.dk> [Jason: rewrote get_random_bytes_user() to simplify and also incorporate additional suggestions from Al.] Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-05-20 18:26:48 +02:00
Jason A. Donenfeld	3092adcef3	random: unify batched entropy implementations There are currently two separate batched entropy implementations, for u32 and u64, with nearly identical code, with the goal of avoiding unaligned memory accesses and letting the buffers be used more efficiently. Having to maintain these two functions independently is a bit of a hassle though, considering that they always need to be kept in sync. This commit factors them out into a type-generic macro, so that the expansion produces the same code as before, such that diffing the assembly shows no differences. This will also make it easier in the future to add u16 and u8 batches. This was initially tested using an always_inline function and letting gcc constant fold the type size in, but the code gen was less efficient, and in general it was more verbose and harder to follow. So this patch goes with the boring macro solution, similar to what's already done for the _wait functions in random.h. Cc: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-05-19 16:54:15 +02:00

1 2 3 4 5 ...

467 Commits