7132 Commits

Author SHA1 Message Date
Marco Elver
40eb5cf4cc kasan: test: make use of kunit_skip()
Make use of the recently added kunit_skip() to skip tests, as it permits
TAP parsers to recognize if a test was deliberately skipped.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-06-25 11:31:03 -06:00
David Gow
d99ea67514 kunit: test: Add example tests which are always skipped
Add two new tests to the example test suite, both of which are always
skipped. This is used as an example for how to write tests which are
skipped, and to demonstrate the difference between kunit_skip() and
kunit_mark_skipped().

Note that these tests are enabled by default, so a default run of KUnit
will have two skipped tests.

Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-06-25 11:31:03 -06:00
David Gow
6d2426b2f2 kunit: Support skipped tests
The kunit_mark_skipped() macro marks the current test as "skipped", with
the provided reason. The kunit_skip() macro will mark the test as
skipped, and abort the test.

The TAP specification supports this "SKIP directive" as a comment after
the "ok" / "not ok" for a test. See the "Directives" section of the TAP
spec for details:
https://testanything.org/tap-specification.html#directives

The 'success' field for KUnit tests is replaced with a kunit_status
enum, which can be SUCCESS, FAILURE, or SKIPPED, combined with a
'status_comment' containing information on why a test was skipped.

A new 'kunit_status' test suite is added to test this.

Signed-off-by: David Gow <davidgow@google.com>
Tested-by: Marco Elver <elver@google.com>
Reviewed-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-06-25 11:31:03 -06:00
Daniel Latypov
ebd09577be lib/test: convert lib/test_list_sort.c to use KUnit
Functionally, this just means that the test output will be slightly
changed and it'll now depend on CONFIG_KUNIT=y/m.

It'll still run at boot time and can still be built as a loadable
module.

There was a pre-existing patch to convert this test that I found later,
here [1]. Compared to [1], this patch doesn't rename files and uses
KUnit features more heavily (i.e. does more than converting pr_err()
calls to KUNIT_FAIL()).

What this conversion gives us:
* a shorter test thanks to KUnit's macros
* a way to run this a bit more easily via kunit.py (and
CONFIG_KUNIT_ALL_TESTS=y) [2]
* a structured way of reporting pass/fail
* uses kunit-managed allocations to avoid the risk of memory leaks
* more descriptive error messages:
  * i.e. it prints out which fields are invalid, what the expected
  values are, etc.

What this conversion does not do:
* change the name of the file (and thus the name of the module)
* change the name of the config option

Leaving these as-is for now to minimize the impact to people wanting to
run this test. IMO, that concern trumps following KUnit's style guide
for both names, at least for now.

[1] https://lore.kernel.org/linux-kselftest/20201015014616.309000-1-vitor@massaru.org/
[2] Can be run via
$ ./tools/testing/kunit/kunit.py run --kunitconfig /dev/stdin <<EOF
CONFIG_KUNIT=y
CONFIG_TEST_LIST_SORT=y
EOF

[16:55:56] Configuring KUnit Kernel ...
[16:55:56] Building KUnit Kernel ...
[16:56:29] Starting KUnit Kernel ...
[16:56:32] ============================================================
[16:56:32] ======== [PASSED] list_sort ========
[16:56:32] [PASSED] list_sort_test
[16:56:32] ============================================================
[16:56:32] Testing complete. 1 tests run. 0 failed. 0 crashed.
[16:56:32] Elapsed time: 35.668s total, 0.001s configuring, 32.725s building, 0.000s running

Note: the build time is as after a `make mrproper`.

Signed-off-by: Daniel Latypov <dlatypov@google.com>
Tested-by: David Gow <davidgow@google.com>
Acked-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-06-25 11:31:03 -06:00
Daniel Latypov
7122debb43 kunit: introduce kunit_kmalloc_array/kunit_kcalloc() helpers
Add in:
* kunit_kmalloc_array() and wire up kunit_kmalloc() to be a special
case of it.
* kunit_kcalloc() for symmetry with kunit_kzalloc()

This should using KUnit more natural by making it more similar to the
existing *alloc() APIs.

And while we shouldn't necessarily be writing unit tests where overflow
should be a concern, it can't hurt to be safe.

Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-06-25 11:31:03 -06:00
David Gow
44acdbb250 kunit: Add gnu_printf specifiers
Some KUnit functions use variable arguments to implement a printf-like
format string. Use the __printf() attribute to let the compiler warn if
invalid format strings are passed in.

If the kernel is build with W=1, it complained about the lack of these
specifiers, e.g.:
../lib/kunit/test.c:72:2: warning: function ‘kunit_log_append’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]

Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Daniel Latypov <dlatypov@google.com>
Acked-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-06-23 17:06:04 -06:00
David Gow
255ede3b12 lib/cmdline_kunit: Remove a cast which are no-longer required
With some of the stricter type checking in KUnit's EXPECT macros
removed, a cast in cmdline_kunit is no longer required.

Remove the unnecessary cast, using NULL instead of (int *) to make it
clearer.

Signed-off-by: David Gow <davidgow@google.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-06-23 16:41:41 -06:00
Paul E. McKenney
1253b9b87e clocksource: Provide kernel module to test clocksource watchdog
When the clocksource watchdog marks a clock as unstable, this might
be due to that clock being unstable or it might be due to delays that
happen to occur between the reads of the two clocks.  It would be good
to have a way of testing the clocksource watchdog's ability to
distinguish between these two causes of clock skew and instability.

Therefore, provide a new clocksource-wdtest module selected by a new
TEST_CLOCKSOURCE_WATCHDOG Kconfig option.  This module has a single module
parameter named "holdoff" that provides the number of seconds of delay
before testing should start, which defaults to zero when built as a module
and to 10 seconds when built directly into the kernel.  Very large systems
that boot slowly may need to increase the value of this module parameter.

This module uses hand-crafted clocksource structures to do its testing,
thus avoiding messing up timing for the rest of the kernel and for user
applications.  This module first verifies that the ->uncertainty_margin
field of the clocksource structures are set sanely.  It then tests the
delay-detection capability of the clocksource watchdog, increasing the
number of consecutive delays injected, first provoking console messages
complaining about the delays and finally forcing a clock-skew event.
Unexpected test results cause at least one WARN_ON_ONCE() console splat.
If there are no splats, the test has passed.  Finally, it fuzzes the
value returned from a clocksource to test the clocksource watchdog's
ability to detect time skew.

This module checks the state of its clocksource after each test, and
uses WARN_ON_ONCE() to emit a console splat if there are any failures.
This should enable all types of test frameworks to detect any such
failures.

This facility is intended for diagnostic use only, and should be avoided
on production systems.

Reported-by: Chris Mason <clm@fb.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Feng Tang <feng.tang@intel.com>
Link: https://lore.kernel.org/r/20210527190124.440372-5-paulmck@kernel.org
2021-06-22 16:53:17 +02:00
Peter Zijlstra
1a81229604 lockdep/selftest: Remove wait-type RCU_CALLBACK tests
The problem is that rcu_callback_map doesn't have wait_types defined,
and doing so would make it indistinguishable from SOFTIRQ in any case.
Remove it.

Fixes: 9271a40d2a14 ("lockdep/selftest: Add wait context selftests")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20210617190313.384290291@infradead.org
2021-06-22 16:42:08 +02:00
Peter Zijlstra
c0c2c0dad6 lockdep/selftests: Fix selftests vs PROVE_RAW_LOCK_NESTING
When PROVE_RAW_LOCK_NESTING=y many of the selftests FAILED because
HARDIRQ context is out-of-bounds for spinlocks. Instead make the
default hardware context the threaded hardirq context, which preserves
the old locking rules.

The wait-type specific locking selftests will have a non-threaded
HARDIRQ variant.

Fixes: de8f5e4f2dc1 ("lockdep: Introduce wait-type checks")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20210617190313.322096283@infradead.org
2021-06-22 16:42:08 +02:00
Boqun Feng
8946ccc25e locking/selftests: Add a selftest for check_irq_usage()
Johannes Berg reported a lockdep problem which could be reproduced by
the special test case introduced in this patch, so add it.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210618170110.3699115-5-boqun.feng@gmail.com
2021-06-22 16:42:07 +02:00
Peter Zijlstra
49faa77759 locking/lockdep: Improve noinstr vs errors
Better handle the failure paths.

  vmlinux.o: warning: objtool: debug_locks_off()+0x23: call to console_verbose() leaves .noinstr.text section
  vmlinux.o: warning: objtool: debug_locks_off()+0x19: call to __kasan_check_write() leaves .noinstr.text section

  debug_locks_off+0x19/0x40:
  instrument_atomic_write at include/linux/instrumented.h:86
  (inlined by) __debug_locks_off at include/linux/debug_locks.h:17
  (inlined by) debug_locks_off at lib/debug_locks.c:41

Fixes: 6eebad1ad303 ("lockdep: __always_inline more for noinstr")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210621120120.784404944@infradead.org
2021-06-22 13:56:43 +02:00
John Ogness
766c268bc6 lib/dump_stack: move cpu lock to printk.c
dump_stack() implements its own cpu-reentrant spinning lock to
best-effort serialize stack traces in the printk log. However,
there are other functions (such as show_regs()) that can also
benefit from this serialization.

Move the cpu-reentrant spinning lock (cpu lock) into new helper
functions printk_cpu_lock_irqsave()/printk_cpu_unlock_irqrestore()
so that it is available for others as well. For !CONFIG_SMP the
cpu lock is a NOP.

Note that having multiple cpu locks in the system can easily
lead to deadlock. Code needing a cpu lock should use the
printk cpu lock, since the printk cpu lock could be acquired
from any code and any context.

Also note that it is not necessary for a cpu lock to disable
interrupts. However, in upcoming work this cpu lock will be used
for emergency tasks (for example, atomic consoles during kernel
crashes) and any interruptions while holding the cpu lock should
be avoided if possible.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
[pmladek@suse.com: Backported on top of 5.13-rc1.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210617095051.4808-2-john.ogness@linutronix.de
2021-06-22 09:56:10 +02:00
Peter Zijlstra
2f064a59a1 sched: Change task_struct::state
Change the type and name of task_struct::state. Drop the volatile and
shrink it to an 'unsigned int'. Rename it in order to find all uses
such that we can use READ_ONCE/WRITE_ONCE as appropriate.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Link: https://lore.kernel.org/r/20210611082838.550736351@infradead.org
2021-06-18 11:43:09 +02:00
Ingo Molnar
b2c0931a07 Merge branch 'sched/urgent' into sched/core, to resolve conflicts
This commit in sched/urgent moved the cfs_rq_is_decayed() function:

  a7b359fc6a37: ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")

and this fresh commit in sched/core modified it in the old location:

  9e077b52d86a: ("sched/pelt: Check that *_avg are null when *_sum are")

Merge the two variants.

Conflicts:
	kernel/sched/fair.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-06-18 11:31:25 +02:00
Johannes Berg
ca2e334232 lib: add iomem emulation (logic_iomem)
Add IO memory emulation that uses callbacks for read/write to
the allocated regions. The callbacks can be registered by the
users using logic_iomem_alloc().

To use, an architecture must 'select LOGIC_IOMEM' in Kconfig
and then include <asm-generic/logic_io.h> into asm/io.h to get
the __raw_read*/__raw_write* functions.

Optionally, an architecture may 'select LOGIC_IOMEM_FALLBACK'
in which case non-emulated regions will 'fall back' to the
various real_* functions that must then be provided.

Cc: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2021-06-17 21:44:51 +02:00
Greg Kroah-Hartman
68afbd8459 Linux 5.13-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmDGe+4eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG/IUH/iyHVulAtAhL9bnR
 qL4M1kWfcG1sKS2TzGRZzo6YiUABf89vFP90r4sKxG3AKrb8YkTwmJr8B/sWwcsv
 PpKkXXTobbDfpSrsXGEapBkQOE7h2w739XeXyBLRPkoCR4UrEFn68TV2rLjMLBPS
 /EIZkonXLWzzWalgKDP4wSJ7GaQxi3LMx3dGAvbFArEGZ1mPHNlgWy2VokFY/yBf
 qh1EZ5rugysc78JCpTqfTf3fUPK2idQW5gtHSMbyESrWwJ/3XXL9o1ET3JWURYf1
 b0FgVztzddwgULoIGWLxDH5WWts3l54sjBLj0yrLUlnGKA5FjrZb12g9PdhdywuY
 /8KfjeE=
 =JfJm
 -----END PGP SIGNATURE-----

Merge tag 'v5.13-rc6' into driver-core-next

We need the driver core fix in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-14 09:07:45 +02:00
Greg Kroah-Hartman
db4e54aefd Linux 5.13-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmDGe+4eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG/IUH/iyHVulAtAhL9bnR
 qL4M1kWfcG1sKS2TzGRZzo6YiUABf89vFP90r4sKxG3AKrb8YkTwmJr8B/sWwcsv
 PpKkXXTobbDfpSrsXGEapBkQOE7h2w739XeXyBLRPkoCR4UrEFn68TV2rLjMLBPS
 /EIZkonXLWzzWalgKDP4wSJ7GaQxi3LMx3dGAvbFArEGZ1mPHNlgWy2VokFY/yBf
 qh1EZ5rugysc78JCpTqfTf3fUPK2idQW5gtHSMbyESrWwJ/3XXL9o1ET3JWURYf1
 b0FgVztzddwgULoIGWLxDH5WWts3l54sjBLj0yrLUlnGKA5FjrZb12g9PdhdywuY
 /8KfjeE=
 =JfJm
 -----END PGP SIGNATURE-----

Merge tag 'v5.13-rc6' into char-misc-next

We need the fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-14 08:59:06 +02:00
David Gow
b6d5799b0b kunit: Add 'kunit_shutdown' option
Add a new kernel command-line option, 'kunit_shutdown', which allows the
user to specify that the kernel poweroff, halt, or reboot after
completing all KUnit tests; this is very handy for running KUnit tests
on UML or a VM so that the UML/VM process exits cleanly immediately
after running all tests without needing a special initramfs.

Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Brendan Higgins <brendanhiggins@google.com>
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Tested-By: Daniel Latypov <dlatypov@google.com>
Reviewed-by: Daniel Latypov <dlatypov@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-06-11 16:04:57 -06:00
David Gow
384426bd10 kunit: Fix result propagation for parameterised tests
When one parameter of a parameterised test failed, its failure would be
propagated to the overall test, but not to the suite result (unless it
was the last parameter).

This is because test_case->success was being reset to the test->success
result after each parameter was used, so a failing test's result would
be overwritten by a non-failing result. The overall test result was
handled in a third variable, test_result, but this was discarded after
the status line was printed.

Instead, just propagate the result after each parameter run.

Signed-off-by: David Gow <davidgow@google.com>
Fixes: fadb08e7c750 ("kunit: Support for Parameterized Testing")
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2021-06-11 15:50:38 -06:00
Masami Hiramatsu
e5efaeb8a8 bootconfig: Support mixing a value and subkeys under a key
Support mixing a value and subkeys under a key. Since kernel cmdline
options will support "aaa.bbb=value1 aaa.bbb.ccc=value2", it is
better that the bootconfig supports such configuration too.

Note that this does not change syntax itself but just accepts
mixed value and subkeys e.g.

key = value1
key.subkey = value2

But this is not accepted;

key {
 value1
 subkey = value2
}

That will make value1 as a subkey.

Also, the order of the value node under a key is fixed. If there
are a value and subkeys, the value is always the first child node
of the key. Thus if user specifies subkeys first, e.g.

key.subkey = value1
key = value2

In the program (and /proc/bootconfig), it will be shown as below

key = value2
key.subkey = value1

Link: https://lkml.kernel.org/r/162262194685.264090.7738574774030567419.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-10 13:41:26 -04:00
Masami Hiramatsu
ca24306d83 bootconfig: Change array value to use child node
It is not possible to put an array value with subkeys under
a key node, because both of subkeys and the array elements
are using "next" field of the xbc_node.

Thus this changes the array values to use "child" field in
the array case. The reason why split this change is to
test it easily.

Link: https://lkml.kernel.org/r/162262193838.264090.16044473274501498656.stgit@devnote2

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-10 13:38:25 -04:00
Al Viro
6852df1266 csum_and_copy_to_pipe_iter(): leave handling of csum_state to caller
... since all the logics is already there for use by iovec/kvec/etc.
cases.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:25 -04:00
Al Viro
2a510a744b clean up copy_mc_pipe_to_iter()
... and we don't need kmap_atomic() there - kmap_local_page() is fine.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:24 -04:00
Al Viro
893839fd57 pipe_zero(): we don't need no stinkin' kmap_atomic()...
FWIW, memcpy_to_page() itself almost certainly ought to
use kmap_local_page()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:24 -04:00
Al Viro
2495bdcc86 iov_iter: clean csum_and_copy_...() primitives up a bit
1) kmap_atomic() is not needed here, kmap_local_page() is enough.
2) No need to make sum = csum_block_add(sum, next, off); conditional
upon next != 0 - adding 0 is a no-op as far as csum_block_add()
is concerned.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:23 -04:00
Al Viro
55ca375c5d copy_page_from_iter(): don't need kmap_atomic() for kvec/bvec cases
kmap_local_page() is enough.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:22 -04:00
Al Viro
c1d4d6a9ae copy_page_to_iter(): don't bother with kmap_atomic() for bvec/kvec cases
kmap_local_page() is enough there.  Moreover, we can use _copy_to_iter()
for actual copying in those cases - no useful extra checks on the
address we are copying from in that call.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:22 -04:00
Al Viro
4b179e9a9c iterate_xarray(): only of the first iteration we might get offset != 0
recalculating offset on each iteration is pointless - on all subsequent
passes through the loop it will be zero anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:21 -04:00
Al Viro
a6e4ec7bfd pull handling of ->iov_offset into iterate_{iovec,bvec,xarray}
fewer arguments (by one, but still...) for iterate_...() macros

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:20 -04:00
Al Viro
7baa509900 iov_iter: make iterator callbacks use base and len instead of iovec
Iterator macros used to provide the arguments for step callbacks in
a structure matching the flavour - iovec for ITER_IOVEC, kvec for
ITER_KVEC and bio_vec for ITER_BVEC.  That already broke down for
ITER_XARRAY (bio_vec there); now that we are using kvec callback
for bvec and xarray cases, we are always passing a pointer + length
(void __user * + size_t for ITER_IOVEC callback, void * + size_t
for everything else).

Note that the original reason for bio_vec (page + offset + len) in
case of ITER_BVEC used to be that we did *not* want to kmap a
page when all we wanted was e.g. to find the alignment of its
subrange.  Now all such users are gone and the ones that are left
want the page mapped anyway for actually copying the data.

So in all cases we have pointer + length, and there's no good
reason for keeping those in struct iovec or struct kvec - we
can just pass them to callback separately.

Again, less boilerplate in callbacks...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:20 -04:00
Al Viro
622838f3fd iov_iter: make the amount already copied available to iterator callbacks
Making iterator macros keep track of the amount of data copied is pretty
easy and it has several benefits:
	1) we no longer need the mess like (from += v.iov_len) - v.iov_len
in the callbacks - initial value + total amount copied so far would do
just fine.
	2) less obviously, we no longer need to remember the initial amount
of data we wanted to copy; the loops in iterator macros are along the lines
of
	wanted = bytes;
	while (bytes) {
		copy some
		bytes -= copied
		if short copy
			break
	}
	bytes = wanted - bytes;
Replacement is
	offs = 0;
	while (bytes) {
		copy some
		offs += copied
		bytes -= copied
		if short copy
			break
	}
	bytes = offs;
That wouldn't be a win per se, but unlike the initial value of bytes, the amount
copied so far *is* useful in callbacks.
	3) in some cases (csum_and_copy_..._iter()) we already had offs manually
maintained by the callbacks.  With that change we can drop that.

	Less boilerplate and more readable code...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:19 -04:00
Al Viro
21b56c8477 iov_iter: get rid of separate bvec and xarray callbacks
After the previous commit we have
	* xarray and bvec callbacks idential in all cases
	* both equivalent to kvec callback wrapped into
kmap_local_page()/kunmap_local() pair.

So we can pass only two (iovec and kvec) callbacks to
iterate_and_advance() and let iterate_{bvec,xarray} wrap
it into kmap_local_page()/kunmap_local_page().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:18 -04:00
Al Viro
1b4fb5ffd7 iov_iter: teach iterate_{bvec,xarray}() about possible short copies
... and now we finally can sort out the mess in _copy_mc_to_iter().
Provide a variant of iterate_and_advance() that does *NOT* ignore
the return values of bvec, xarray and kvec callbacks, use that in
_copy_mc_to_iter().  That gets rid of magic in those callbacks -
we used to need it so we'd get at least the right return value in
case of failure halfway through.

As a bonus, now iterator is advanced by the amount actually copied
for all flavours.  That's what the callers expect and it used to do that
correctly in iovec and xarray cases.  However, in kvec and bvec cases
the iterator had not been advanced on such failures, breaking the users.
Fixed now...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:18 -04:00
Al Viro
7491a2bf64 iterate_bvec(): expand bvec.h macro forest, massage a bit
... incidentally, using pointer instead of index in an array
(the only change here) trims half-kilobyte of .text...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:17 -04:00
Al Viro
5c67aa90cd iov_iter: unify iterate_iovec and iterate_kvec
The differences between iterate_iovec and iterate_kvec are minor:
	* kvec callback is treated as if it returned 0
	* initialization of __p is with i->iov and i->kvec resp.
which is trivially dealt with.

No code generation changes - compiler is quite capable of turning
	left = ((void)(STEP), 0);
	__v.iov_len -= left;
(with no accesses to left downstream) and
	(void)(STEP);
into the same code.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:16 -04:00
Al Viro
7a1bcb5d25 iov_iter: massage iterate_iovec and iterate_kvec to logics similar to iterate_bvec
Premature optimization is the root of all evil...  Trying
to unroll the first pass through the loop makes it harder
to follow and not just for readers - compiler ends up
generating worse code than it would on a "non-optimized"
loop.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:16 -04:00
Al Viro
f5da83545f iterate_and_advance(): get rid of magic in case when n is 0
iov_iter_advance() needs to do some non-trivial work when it's given
0 as argument (skip all empty iovecs, mostly).  We used to implement
it via iterate_and_advance(); we no longer do so and for all other
users of iterate_and_advance() zero length is a no-op.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:15 -04:00
Al Viro
594e450b3f csum_and_copy_to_iter(): massage into form closer to csum_and_copy_from_iter()
Namely, have off counted starting from 0 rather than from csstate->off.
To compensate we need to shift the initial value (csstate->sum) (rotate
by 8 bits, as usual for csum) and do the same after we are finished adding
the pieces up.

What we get out of that is a bit more redundancy in our variables - from
is always equal to addr + off, which will be useful several commits down
the road.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:14 -04:00
Al Viro
f0b65f39ac iov_iter: replace iov_iter_copy_from_user_atomic() with iterator-advancing variant
Replacement is called copy_page_from_iter_atomic(); unlike the old primitive the
callers do *not* need to do iov_iter_advance() after it.  In case when they end
up consuming less than they'd been given they need to do iov_iter_revert() on
everything they had not consumed.  That, however, needs to be done only on slow
paths.

All in-tree callers converted.  And that kills the last user of iterate_all_kinds()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:14 -04:00
Al Viro
e4f8df8679 [xarray] iov_iter_npages(): just use DIV_ROUND_UP()
Compiler is capable of recognizing division by power of 2 and turning
it into shifts.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:13 -04:00
Al Viro
66531c65aa iov_iter_npages(): don't bother with iterate_all_kinds()
note that in bvec case pages can be compound ones - we can't just assume
that each segment is covered by one (sub)page

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:12 -04:00
Al Viro
3d671ca62a get rid of iterate_all_kinds() in iov_iter_get_pages()/iov_iter_get_pages_alloc()
Here iterate_all_kinds() is used just to find the first (non-empty, in
case of iovec) segment.  Which can be easily done explicitly.
Note that in bvec case we now can get more than PAGE_SIZE worth of them,
in case when we have a compound page in bvec and a range that crosses
a subpage boundary.  Older behaviour had been to stop on that boundary;
we used to get the right first page (for_each_bvec() took care of that),
but that was all we'd got.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:12 -04:00
Al Viro
610c7a7154 iov_iter_gap_alignment(): get rid of iterate_all_kinds()
For one thing, it's only used for iovec (and makes sense only for those).
For another, here we don't care about iov_offset, since the beginning of
the first segment and the end of the last one are ignored.  So it makes
a lot more sense to just walk through the iovec array...

We need to deal with the case of truncated iov_iter, but unlike the
situation with iov_iter_alignment() we don't care where the last
segment ends - just which segment is the last one.

[fixed a braino spotted by Qian Cai <quic_qiancai@quicinc.com>]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:11 -04:00
Al Viro
9221d2e37b iov_iter_alignment(): don't bother with iterate_all_kinds()
It's easier to go over the array manually.  We need to watch out
for truncated iov_iter, though - iovec array might cover more
than i->count.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:10 -04:00
Al Viro
8409a0d261 sanitize iov_iter_fault_in_readable()
1) constify iov_iter argument; we are not advancing it in this primitive.

2) cap the amount requested by the amount of data in iov_iter.  All
existing callers should've been safe, but the check is really cheap and
doing it here makes for easier analysis, as well as more consistent
semantics among the primitives.

3) don't bother with iterate_iovec().  Explicit loop is not any harder
to follow, and we get rid of standalone iterate_iovec() users - it's
only used by iterate_and_advance() and (soon to be gone) iterate_all_kinds().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:10 -04:00
Al Viro
185ac4d436 iov_iter: optimize iov_iter_advance() for iovec and kvec
We can do better than generic iterate_and_advance() for this one;
inspired by bvec_iter_advance() (and massaged into that form by
equivalent transformations).

[fixed a braino caught by kernel test robot <oliver.sang@intel.com>]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:09 -04:00
Al Viro
8cd54c1c84 iov_iter: separate direction from flavour
Instead of having them mixed in iter->type, use separate ->iter_type
and ->data_source (u8 and bool resp.)  And don't bother with (pseudo-)
bitmap for the former - microoptimizations from being able to check
if the flavour is one of two values are not worth the confusion for
optimizer.  It can't prove that we never get e.g. ITER_IOVEC | ITER_PIPE,
so we end up with extra headache.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:08 -04:00
Al Viro
556351c1c0 iov_iter_advance(): don't modify ->iov_offset for ITER_DISCARD
the field is not used for that flavour

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:08 -04:00
Al Viro
28f38db7ed iov_iter: reorder handling of flavours in primitives
iovec is the most common one; test it first and test explicitly,
rather than "not anything else".  Replace all flavour checks with
use of iov_iter_is_...() helpers.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10 11:45:07 -04:00