1262714 Commits

Author SHA1 Message Date
Thomas Bertschinger
b07ce72626 bcachefs: omit alignment attribute on big endian struct bkey
This is needed for building Rust bindings on big endian architectures
like s390x. Currently this is only done in userspace, but it might
happen in-kernel in the future. When creating a Rust binding for struct
bkey, the "packed" attribute is needed to get a type with the correct
member offsets in the big endian case. However, rustc does not allow
types to have both a "packed" and "align" attribute. Thus, in order to
get a Rust type compatible with the C type, we must omit the "aligned"
attribute in C.

This does not affect the struct's size or member offsets, only its
toplevel alignment, which should be an acceptable impact.

The little endian version can have the "align" attribute because the
"packed" attr is redundant, and rust-bindgen will omit the "packed" attr
when an "align" attr is present and it can do so without changing a
type's layout

Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:25 -04:00
Kent Overstreet
6e9d0558b1 bcachefs: bch2_trigger_alloc() handles state changes better
bch2_trigger_alloc() kicks off certain tasks on bucket state changes;
e.g. triggering the bucket discard worker and the invalidate worker.

We've observed the discard worker running too often - most runs it
doesn't do any work, according to the tracepoint - so clearly, we're
kicking it off too often.

This adds an explicit statechange() macro to make these checks more
precise.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
b63570f747 bcachefs: bch2_print_opts()
Make sure early error messages get redirected, for
kernel-fsck-from-userland.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
130d229ff5 bcachefs: Improve error messages in device remove path
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
5ca8ff157d bcachefs: Use kvzalloc() when dynamically allocating btree paths
THis silences a mm/page_alloc.c warning about allocating more than a
page with GFP_NOFAIL - and there's no reason for this to not have a
vmalloc fallback anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
83bd5985fa bcachefs: Track iter->ip_allocated at bch2_trans_copy_iter()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
3254c1b0e5 bcachefs: Save key_cache_path in peek_slot()
When bch2_btree_iter_peek_slot() clones the iterator to search for the
next key, and then discovers that the key from the cloned iterator is
the key we want to return - we also want to save the
iter->key_cache_path as well, for the update path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
91dcad18d3 bcachefs: Pin btree cache in ram for random access in fsck
Various phases of fsck involve checking references from one btree to
another: this means doing a sequential scan of one btree, and then
mostly random access into the second.

This is particularly painful for checking extents <-> backpointers; we
can prefetch btree node access on the sequential scan, but not on the
random access portion, and this is particularly painful on spinning
rust, where we'd like to keep the pipeline fairly full of btree node
reads so that the elevator can reduce seeking.

This patch implements prefetching and pinning of the portion of the
btree that we'll be doing random access to. We already calculate how
much of the random access btree will fit in memory so it's a fairly
straightforward change.

This will put more pressure on system memory usage, so we introduce a
new option, fsck_memory_usage_percent, which is the percentage of total
system ram that fsck is allowed to pin.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
835cd3e147 bcachefs: Check for subvolume children when deleting subvolumes
Recursively destroying subvolumes isn't allowed yet.

Fixes: https://github.com/koverstreet/bcachefs/issues/634
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
b26d79147f bcachefs: BTREE_ID_subvolume_children
Add a btree to record a parent -> child subvolume relationships,
according to the filesystem heirarchy.

The subvolume_children btree is a bitset btree: if a bit is set at pos
p, that means p.offset is a child of subvolume p.inode.

This will be used for efficiently listing subvolumes, as well as
recursive deletion.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
b8628a2529 bcachefs: bch_subvolume::fs_path_parent
Record the filesystem path heirarchy for subvolumes in bch_subvolume

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
e07c28ab92 bcachefs: bch2_btree_bit_mod()
Provide a non-write buffer version of bch2_btree_bit_mod_buffered(), for
the subvolume children btree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
506b187603 bcachefs: bch2_btree_bit_mod -> bch2_btree_bit_mod_buffered
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
56e230473d bcachefs: Correctly reattach subvolumes
Subvolumes need special handling to reattach - we always reattach them
in the root subvolume's lost+found, and they need a slightly different
kind of dirent.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
3a136177f3 bcachefs: check_path() now prints full inode when reattaching
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
688a769409 bcachefs: Pass inode bkey to check_path()
prep work for improving logging/error messages

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
f5d58d0c72 bcachefs: Fix path where dirent -> subvol missing and we don't fix
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
64304aaf4e bcachefs: bch_subvolume::parent -> creation_parent
bit of renaming, prep for adding a fs path parent

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
45b4ed525e bcachefs: Repair subvol dirents that point to non subvols
when repair switches d_type to or from DT_SUBVOL, we need to update the
target accordingly

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
c60b7f803c bcachefs: check dirent->d_parent_subvol
Check that d_parent_subvol makes sense - the dirent's snapshot must be
visible in d_parent_subvol (i.e. an ancestor of d_parent_subvol's
snapshot) in order to be visible.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
f4e68c859f bcachefs: check inode->bi_parent_subvol against dirent
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
ea27001e14 bcachefs: delete duplicated checks in check_dirent_to_subvol()
these were already checked in check_subvol()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
e539ebb867 bcachefs: simplify check_dirent_inode_dirent()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:24 -04:00
Kent Overstreet
0b498a5a39 bcachefs: check bi_parent_subvol in check_inode()
check for inodes with a nonzero bi_parent_subvol field that aren't
actually subvolume roots

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:23 -04:00
Kent Overstreet
971a1503a2 bcachefs: better log message in lookup_inode_for_snapshot()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:23 -04:00
Kent Overstreet
0b17618fdc bcachefs: check_inode_dirent_inode()
check that if an inode has a backpointer, the dirent it points to points
back to it.

We do this in check_dirent_inode_dirent(), but only for inodes that have
dirents that point to them - we also have to do the check starting from
the inode to catch inodes that don't have dirents that point to them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:23 -04:00
Kent Overstreet
f2b02d099c bcachefs: Check subvol <-> inode pointers in check_inode()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:23 -04:00
Kent Overstreet
4c20278eb1 bcachefs: Check subvol <-> inode pointers in check_subvol()
Subvolumes and subvolume root inodes point to each other: this verifies
the subvolume -> inode -> subvolme path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:23 -04:00
Kent Overstreet
52946d828a bcachefs: Kill more -EIO error codes
This converts -EIOs related to btree node errors to private error codes,
which will help with some ongoing debugging by giving us better error
messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:23 -04:00
Kent Overstreet
da23795e4c bcachefs: thread_with_file: add f_ops.flush
Add a flush op, to return the exit code via close().

Also update bcachefs usage to use this to return fsck exit codes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:20 -04:00
Kent Overstreet
6b33312925 bcachefs: thread_with_file: Fix missing va_end()
Fixes: https://lore.kernel.org/linux-bcachefs/202402131603.E953E2CF@keescook/T/#u
Reported-by: coverity scan
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:18 -04:00
Darrick J. Wong
658a1e42ce bcachefs: thread_with_file: allow ioctls against these files
Make it so that a thread_with_stdio user can handle ioctls against the
file descriptor.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:15 -04:00
Darrick J. Wong
ab6752e24e bcachefs: thread_with_file: create ops structure for thread_with_stdio
Create an ops structure so we can add more file-based functionality in
the next few patches.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:13 -04:00
Darrick J. Wong
1cbae651e5 bcachefs: thread_with_file: fix various printf problems
Experimentally fix some problems with stdio_redirect_vprintf by creating
a MOO variant with which we can experiment.  We can't do a GFP_KERNEL
allocation while holding the spinlock, and I don't like how the printf
function can silently truncate the output if memory allocation fails.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:10 -04:00
Darrick J. Wong
fcb1620edd bcachefs: thread_with_file: allow creation of readonly files
Create a new run_thread_with_stdout function that opens a file in
O_RDONLY mode so that the kernel can write things to userspace but
userspace cannot write to the kernel.  This will be used to convey xfs
health event information to userspace.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:08 -04:00
Kent Overstreet
a5a650d647 bcachefs: thread_with_stdio: suppress hung task warning
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:04 -04:00
Kent Overstreet
5c3273ec3c kernel/hung_task.c: export sysctl_hung_task_timeout_secs
needed for thread_with_file; also rare but not unheard of to need this
in module code, when blocking on user input.

one workaround used by some code is wait_event_interruptible() - but
that can be buggy if the outer context isn't expecting unwinding.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: fuyuanli <fuyuanli@didiglobal.com>
2024-03-13 21:22:04 -04:00
Kent Overstreet
8f9320d3a3 bcachefs: thread_with_stdio: Mark completed in ->release()
This fixes stdio_redirect_read() getting stuck, not noticing that the
pipe has been closed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 21:22:00 -04:00
Jonathan Cameron
2056b282ad of: unittest: Use for_each_child_of_node_scoped()
A simple example of the utility of this autocleanup approach to
handling of_node_put().

In this particular case some of the nodes needed for the test are
not available and the _available_ version would cause them to be
skipped resulting in a test failure.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20240225142714.286440-4-jic23@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
2024-03-13 16:59:00 -06:00
Jonathan Cameron
34af4554fb of: Introduce for_each_*_child_of_node_scoped() to automate of_node_put() handling
To avoid issues with out of order cleanup, or ambiguity about when the
auto freed data is first instantiated, do it within the for loop definition.

The disadvantage is that the struct device_node *child variable creation
is not immediately obvious where this is used.
However, in many cases, if there is another definition of
struct device_node *child; the compiler / static analysers will notify us
that it is unused, or uninitialized.

Note that, in the vast majority of cases, the _available_ form should be
used and as code is converted to these scoped handers, we should confirm
that any cases that do not check for available have a good reason not
to.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20240225142714.286440-3-jic23@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
2024-03-13 16:59:00 -06:00
Jonathan Cameron
9448e55d03 of: Add cleanup.h based auto release via __free(device_node) markings
The recent addition of scope based cleanup support to the kernel
provides a convenient tool to reduce the chances of leaking reference
counts where of_node_put() should have been called in an error path.

This enables
	struct device_node *child __free(device_node) = NULL;

	for_each_child_of_node(np, child) {
		if (test)
			return test;
	}

with no need for a manual call of of_node_put().
A following patch will reduce the scope of the child variable to the
for loop, to avoid an issues with ordering of autocleanup, and make it
obvious when this assigned a non NULL value.

In this simple example the gains are small but there are some very
complex error handling cases buried in these loops that will be
greatly simplified by enabling early returns with out the need
for this manual of_node_put() call.

Note that there are coccinelle checks in
scripts/coccinelle/iterators/for_each_child.cocci to detect a failure
to call of_node_put(). This new approach does not cause false positives.
Longer term we may want to add scripting to check this new approach is
done correctly with no double of_node_put() calls being introduced due
to the auto cleanup. It may also be useful to script finding places
this new approach is useful.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20240225142714.286440-2-jic23@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
2024-03-13 16:58:50 -06:00
Kent Overstreet
032b3fd057 bcachefs: Thread with file documentation
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 18:39:13 -04:00
Kent Overstreet
f704f108af bcachefs: thread_with_stdio: fix bch2_stdio_redirect_readline()
This fixes a bug where we'd return data without waiting for a newline,
if data was present but a newline was not.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 18:39:13 -04:00
Kent Overstreet
a6777ca4ff bcachefs: thread_with_stdio: kill thread_with_stdio_done()
Move the cleanup code to a wrapper function, where we can call it after
the thread_with_stdio fn exits.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 18:39:13 -04:00
Kent Overstreet
60e1baa872 bcachefs: thread_with_stdio: convert to darray
- eliminate the dependency on printbufs, so that we can lift
   thread_with_file for use in xfs
 - add a nonblocking parameter to stdio_redirect_printf(), and either
   block if the buffer is full or drop it on the floor - don't buffer
   infinitely

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 18:39:13 -04:00
Kent Overstreet
e017047fdb bcachefs: thread_with_stdio: eliminate double buffering
The output buffer lock has to be a spinlock so that we can write to it
from interrupt context, so we can't use a direct copy_to_user; this
switches thread_with_file_read() to use fault_in_writeable() and
copy_to_user_nofault(), similar to how thread_with_file_write() works.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 18:39:13 -04:00
Kent Overstreet
cb6fc943b6 bcachefs: kill kvpmalloc()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13 18:39:12 -04:00
Kent Overstreet
0225bdfafd mempool: kvmalloc pool
Add mempool_init_kvmalloc_pool() and mempool_create_kvmalloc_pool(),
which wrap kvmalloc() instead of kmalloc() - kmalloc() with a vmalloc()
fallback.

This is part of a bcachefs cleanup - dropping an internal kvpmalloc()
helper (which predates kvmalloc()) along with mempool helpers; this
replaces the bcachefs-private kvpmalloc_pool.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: linux-mm@kvack.org
2024-03-13 18:38:13 -04:00
Muhammad Usama Anjum
2842dc9bc1 mtd: spi-nor: core: correct type of i
The i should be signed to find out the end of the loop. Otherwise,
i >= 0 is always true and loop becomes infinite. Make its type to be
int.

Fixes: 6a9eda34418f ("mtd: spi-nor: core: set mtd->eraseregions for non-uniform erase map")
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Reviewed-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Reviewed-by: Michael Walle <mwalle@kernel.org>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20240304090103.818092-1-usama.anjum@collabora.com
2024-03-13 23:22:19 +01:00
Miquel Raynal
8c8d25d959 SPI NOR gets the non uniform erase code cleaned. We stopped using
bitmasks for erase types and flags, and instead introduced dedicated
 members. We then passed the SPI NOR erase map to MTD. Users can now
 determine the erase regions and make informed decisions on partitions
 size.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEHUIqys8OyG1eHf7fS1VPR6WNFOkFAmXgjyUACgkQS1VPR6WN
 FOk75wf8DtDQsjGX3MBAqmLZcCBtDhso6IDGdvGanCrpk8Oi9bD61cqncdgG/lEz
 A7Ir7QXDPp8bvzxextJ2XyY2ekk2vfp81kJQx4Ethp7u8MrT3WLydGc4EghmC1OU
 FJ7vckp2u/NJZvn1h/DeRPhBxKVnDq0ez1QRyrNj37pw6OoyyGmOf3zHIKWWmAsz
 PAThBvqhVSnsOyJDF7deO4j2B7CDOE7KEUWGOIMhnqSj024mQX8PgLERzUzQnu3T
 le7lwIhtYX+Ubs9f3dboNiK3hrw95NiIEyYx6HCIUmSeA5x8mGRj8h4Kt5UmFzN4
 m5ypa+RgiIZqJozU6pVYgVm3TSml2w==
 =hkZT
 -----END PGP SIGNATURE-----

Merge tag 'spi-nor/for-6.9' into mtd/next

SPI NOR gets the non uniform erase code cleaned. We stopped using
bitmasks for erase types and flags, and instead introduced dedicated
members. We then passed the SPI NOR erase map to MTD. Users can now
determine the erase regions and make informed decisions on partitions
size.
2024-03-13 23:21:48 +01:00