Merge tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull vfs idmapping updates from Christian Brauner:
- Last cycle we introduced the dedicated struct mnt_idmap type for
mount idmapping and the required infrastructure in 256c8aed2b42 ("fs:
introduce dedicated idmap type for mounts"). As promised in last
cycle's pull request message this converts everything to rely on
struct mnt_idmap.
Currently we still pass around the plain namespace that was attached
to a mount. This is in general pretty convenient but it makes it easy
to conflate namespaces that are relevant on the filesystem with
namespaces that are relevant on the mount level. Especially for
non-vfs developers without detailed knowledge in this area this was a
potential source for bugs.
This finishes the conversion. Instead of passing the plain namespace
around, this changes all places that currently take a pointer to a
mnt_userns so that they take a pointer to struct mnt_idmap.
Now that the conversion is done all helpers down to the really
low-level helpers only accept a struct mnt_idmap argument instead of
two namespace arguments.
Conflating mount and other idmappings will now cause the compiler to
complain loudly, which eliminates this class of bugs. Filesystem
developers can no longer mix up mount and filesystem idmappings, as
they are two distinct types and require distinct helpers that cannot
be used interchangeably.
Everything associated with struct mnt_idmap is moved into a single
separate file. With that change no code can poke around in struct
mnt_idmap. It can only be interacted with through dedicated helpers.
That means all filesystems are and all of the vfs is completely
oblivious to the actual implementation of idmappings.
We are now also able to extend struct mnt_idmap as we see fit. For
example, we can decouple it completely from namespaces for users that
don't require or don't want to use them at all. We can also extend
the concept of idmappings so we can cover filesystem specific
requirements.
In combination with the vfs{g,u}id_t work we finished in v6.2, this
makes the feature substantially more robust: it is harder for a given
filesystem to implement incorrectly, and it also protects the vfs.
- Enable idmapped mounts for tmpfs.
A long-standing request from users has been to make it possible to
create idmapped mounts for tmpfs, for example to share the host's
tmpfs mount between multiple sandboxes. This is a prerequisite for
some advanced Kubernetes cases. Systemd also has a range of use-cases
to increase service isolation, and there are more users of this.
However, with all of the other work going on this was way down on the
priority list but luckily someone other than ourselves picked this
up.
As usual the patch is tiny as all the infrastructure work had been
done multiple kernel releases ago. In addition to all the tests that
we already have I requested that Rodrigo add a dedicated tmpfs
testsuite for idmapped mounts to xfstests. It is to be included into
xfstests during the v6.3 development cycle. This should add a slew of
additional tests.
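The type-safety argument made for struct mnt_idmap above can be illustrated with a small userspace sketch. This is not kernel code: the struct names mirror the kernel's, but their contents are simplified stand-ins, and demo_idmap_alloc(), demo_make_vfsuid() and the fixed-offset mapping are hypothetical names and behaviour invented for the illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins; the real kernel types are more involved. */
struct user_namespace { unsigned int offset; };  /* filesystem-level idmapping */
struct mnt_idmap;                                /* callers only see a pointer */

typedef struct { unsigned int val; } vfsuid_t;   /* distinct type for mapped ids */

/* In a real tree this definition would be private to a single file. */
struct mnt_idmap { unsigned int offset; };

/* Hypothetical constructor: a mount idmapping that shifts ids by offset. */
struct mnt_idmap *demo_idmap_alloc(unsigned int offset)
{
	struct mnt_idmap *idmap = malloc(sizeof(*idmap));

	if (idmap)
		idmap->offset = offset;
	return idmap;
}

/* The helper accepts only a mount idmapping, never a plain namespace. */
vfsuid_t demo_make_vfsuid(const struct mnt_idmap *idmap, unsigned int fs_uid)
{
	vfsuid_t v;

	assert(idmap != NULL);
	v.val = fs_uid + idmap->offset;
	return v;
}
```

Passing a struct user_namespace pointer to demo_make_vfsuid() is diagnosed by the compiler as an incompatible pointer type, which is the property the pull request relies on.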
* tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (26 commits)
shmem: support idmapped mounts for tmpfs
fs: move mnt_idmap
fs: port vfs{g,u}id helpers to mnt_idmap
fs: port fs{g,u}id helpers to mnt_idmap
fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap
fs: port i_{g,u}id_{needs_}update() to mnt_idmap
quota: port to mnt_idmap
fs: port privilege checking helpers to mnt_idmap
fs: port inode_owner_or_capable() to mnt_idmap
fs: port inode_init_owner() to mnt_idmap
fs: port acl to mnt_idmap
fs: port xattr to mnt_idmap
fs: port ->permission() to pass mnt_idmap
fs: port ->fileattr_set() to pass mnt_idmap
fs: port ->set_acl() to pass mnt_idmap
fs: port ->get_acl() to pass mnt_idmap
fs: port ->tmpfile() to pass mnt_idmap
fs: port ->rename() to pass mnt_idmap
fs: port ->mknod() to pass mnt_idmap
fs: port ->mkdir() to pass mnt_idmap
...
Merge tag 'locks-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull file locking updates from Jeff Layton:
"The main change here is that I've broken out most of the file locking
definitions into a new header file. I also went ahead and completed
the removal of the locks_inode function."
* tag 'locks-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
fs: remove locks_inode
filelock: move file locking definitions to separate header file
Convert function to use folios throughout. This is in preparation for the
removal of find_get_pages_range_tag(). This change removes 8 calls to
compound_head().
Also had to modify and rename gfs2_write_jdata_pagevec() to take in and
utilize folio_batch rather than pagevec and use folios rather than pages.
gfs2_write_jdata_batch() now supports large folios.
Link: https://lkml.kernel.org/r/20230104211448.4804-18-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In gfs2_make_fs_rw(), make sure to call gfs2_consist() to report an
inconsistency and mark the filesystem as withdrawn when
gfs2_find_jhead() fails.
At the end of gfs2_make_fs_rw(), when we discover that the filesystem
has been withdrawn, make sure we report an error. This also replaces
the gfs2_withdrawn() check after gfs2_find_jhead().
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: syzbot+f51cb4b9afbd87ec06f2@syzkaller.appspotmail.com
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
This reverts commit 970343cd4904 ("GFS2: free disk inode which is
deleted by remote node -V2").
The original intent behind commit 970343cd4904 was to cull dentries when a
remote node requests to demote an iopen glock, which happens when the
remote node tries to delete the inode. This is now handled by
gfs2_try_evict(), which is called via iopen_go_callback() ->
gfs2_queue_try_to_evict().
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Add a gfs2_evict_inodes() helper that evicts inodes cooperatively across
the cluster. This avoids running into timeouts during unmount
unnecessarily.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
In gfs2_kill_sb(), flush the delete work queue after setting the
SDF_DEACTIVATING flag. This ensures that no new inodes will be
instantiated anymore, and the inode cache will be empty after the
following kill_block_super() -> generic_shutdown_super() ->
evict_inodes() call.
With that, function gfs2_make_fs_ro() now calls gfs2_flush_delete_work()
after the workqueue has been destroyed. Skip that by checking for the
presence of the SDF_DEACTIVATING flag.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Add a check to delete_work_func() so that it quits when it finds that
the filesystem is deactivating. This speeds up the delete workqueue
draining in gfs2_kill_sb().
In addition, make sure that iopen_go_callback() won't queue any new
delete work while the filesystem is deactivating.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
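The interplay between the deactivating flag, the delete work function and the queueing path described in the messages above can be sketched in userspace C. This is a single-threaded model under stated assumptions: struct demo_sb, the counters and all demo_* functions are hypothetical and only mimic the ordering (set the flag first, then drain), not the gfs2 implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_sb {
	atomic_bool deactivating;  /* stand-in for the SDF_DEACTIVATING flag */
	int queued;                /* delete work items waiting to run */
	int evicted;               /* items the worker actually processed */
};

/* Refuse to queue new delete work once deactivation has started. */
bool demo_queue_delete_work(struct demo_sb *sb)
{
	if (atomic_load(&sb->deactivating))
		return false;
	sb->queued++;
	return true;
}

/* Worker body: quit early when the filesystem is deactivating. */
void demo_delete_work_func(struct demo_sb *sb)
{
	assert(sb->queued > 0);
	sb->queued--;
	if (atomic_load(&sb->deactivating))
		return;
	sb->evicted++;
}

/* Unmount path: set the flag first, then drain what is already queued. */
void demo_kill_sb(struct demo_sb *sb)
{
	atomic_store(&sb->deactivating, true);
	while (sb->queued > 0)
		demo_delete_work_func(sb);
}
```

Because the flag is set before draining, work that is still queued finishes quickly without doing anything, and nothing new can be added behind it.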
Add a new SDF_DEACTIVATING super block flag that is set when the
filesystem has started to deactivate. This will be used in the next
patch to stop and drain the delete work during unmount.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Function gfs2_clear_rgrpd() is called during unmount to free all rgrps
and their sub-objects. If the rgrp glock is held (e.g. in SH) it calls
gfs2_glock_cb() to unlock, then calls flush_delayed_work() to make
sure any glock work is finished. However, there is a race with other
cluster nodes that may request the rgrp glock in another mode (say, EX).
Function gfs2_clear_rgrpd() calls glock_clear_object(), which sets
gl_object to NULL, but that is done without holding the gl_lockref
spin_lock. While the lock is not held, another node's demote request can
cause the state machine to run again, and since the gl_lockref is
released in do_xmote, the second process's call to do_xmote can call
go_inval (rgrp_go_inval) after the gl_object has been cleared, which
results in a NULL pointer dereference of the rgrp glock's gl_object.
Other go_inval glops functions don't require the gl_object to exist, as
evidenced by function inode_go_inval(), which explicitly checks if (ip)
before dereferencing gl_object. This patch does the same thing
for rgrp glocks. Both the go_inval and go_sync ops are patched to check
the existence of gl_object (rgd) before trying to dereference it.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Function delete_work_func() is used for two purposes:
* to immediately try to evict the glock's inode, and
* to verify after a little while that the inode has been deleted as
expected, and didn't just get skipped.
These two operations are not separated very well, so introduce two new
glock flags to improve that. Split gfs2_queue_delete_work() into
gfs2_queue_try_to_evict() and gfs2_queue_verify_evict().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
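One way to picture the split into two requests is a pair of flag bits with one queueing helper each. The sketch below is an assumption-laden model: the flag names, the delay value and every demo_* function are hypothetical, and the commit message does not say how the real flags are named or handled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag names and delay value, chosen for this sketch only. */
enum {
	DEMO_TRY_TO_EVICT = 1 << 0,   /* evict the inode as soon as possible */
	DEMO_VERIFY_EVICT = 1 << 1,   /* check later that the inode is gone */
};
#define DEMO_VERIFY_DELAY 5

struct demo_glock {
	unsigned int flags;
	unsigned int delay;           /* delay before the work item runs */
};

static bool demo_queue(struct demo_glock *gl, unsigned int flag,
		       unsigned int delay)
{
	assert(gl);
	if (gl->flags & flag)
		return false;         /* this kind of work is already pending */
	gl->flags |= flag;
	gl->delay = delay;
	return true;
}

bool demo_queue_try_to_evict(struct demo_glock *gl)
{
	return demo_queue(gl, DEMO_TRY_TO_EVICT, 0);
}

bool demo_queue_verify_evict(struct demo_glock *gl)
{
	return demo_queue(gl, DEMO_VERIFY_EVICT, DEMO_VERIFY_DELAY);
}

/* Work function: handle each request separately and clear its flag. */
void demo_delete_work_func(struct demo_glock *gl, int *tries, int *checks)
{
	if (gl->flags & DEMO_TRY_TO_EVICT) {
		gl->flags &= ~DEMO_TRY_TO_EVICT;
		(*tries)++;
	}
	if (gl->flags & DEMO_VERIFY_EVICT) {
		gl->flags &= ~DEMO_VERIFY_EVICT;
		(*checks)++;
	}
}
```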
Move the global delete workqueue into struct gfs2_sbd so that we can
flush / drain it without interfering with other filesystems.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Get rid of the GLF_PENDING_DELETE glock flag introduced by commit
a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work"). The only use
of that flag is to prevent the iopen glock from being demoted (i.e.,
unlocked) while delete work is pending. It turns out that demoting the
iopen glock while delete work is pending is perfectly fine; we only need
to make sure that the glock isn't being freed while still in use. This
is ensured by the previous patch because delete_work_func() owns a
reference while the work is queued or running.
With these changes, gfs2_queue_delete_work() no longer takes the glock
spin lock, so we can use it in iopen_go_callback() instead of
open-coding it there.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
In __gfs2_glock_put(), remove the glock from the lru list *after*
dropping the glock lock. This prevents deadlocks against
gfs2_scan_glock_lru().
In gfs2_scan_glock_lru(), make sure that the glock's reference count is
zero before moving the glock to the dispose list. This skips glocks
that are marked dead as well as glocks that are still in use.
Additionally, switch to spin_trylock() as we already do in
gfs2_dispose_glock_lru(); this alone would also be enough to prevent
deadlocks against __gfs2_glock_put().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
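The scanning rule described above (never block on a per-entry lock, and only dispose of entries whose reference count is zero) can be sketched with C11 atomics. This is a userspace model: an atomic_flag stands in for the glock spin lock, an array stands in for the LRU list, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_glock {
	atomic_flag lock;       /* stand-in for the per-glock spin lock */
	int refcount;
	bool on_dispose_list;
};

void demo_glock_init(struct demo_glock *gl, int refcount)
{
	atomic_flag_clear(&gl->lock);
	gl->refcount = refcount;
	gl->on_dispose_list = false;
}

/* Try to take the lock without waiting; true means we now hold it. */
bool demo_trylock(struct demo_glock *gl)
{
	return !atomic_flag_test_and_set(&gl->lock);
}

void demo_unlock(struct demo_glock *gl)
{
	atomic_flag_clear(&gl->lock);
}

/* Move idle entries to the dispose list; skip locked or in-use entries. */
size_t demo_scan_lru(struct demo_glock *lru, size_t n)
{
	size_t moved = 0;

	assert(lru != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct demo_glock *gl = &lru[i];

		if (!demo_trylock(gl))
			continue;       /* contended: never wait while scanning */
		if (gl->refcount == 0 && !gl->on_dispose_list) {
			gl->on_dispose_list = true;
			moved++;
		}
		demo_unlock(gl);
	}
	return moved;
}
```

Skipping a contended entry costs little, since a later scan can pick it up, whereas waiting for the lock is what would make a deadlock against the put path possible.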
Switch to list_for_each_entry_safe() and eliminate the "skipped" list in
gfs2_scan_glock_lru().
At the same time, scan exactly the requested number of items, not one
more than that number.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
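The two points of this change, iterating safely while entries are removed and stopping after exactly the requested number of entries, can be sketched on a plain singly linked list. This only illustrates the idea behind list_for_each_entry_safe(); struct demo_node and demo_scan() are hypothetical and do not use the kernel list API.

```c
#include <assert.h>
#include <stddef.h>

struct demo_node {
	struct demo_node *next;
	int busy;
};

/*
 * Walk at most nr entries and unlink the idle ones. The next pointer is
 * saved before an entry may be unlinked, so removal does not break the walk.
 * Returns the number of entries that were unlinked.
 */
size_t demo_scan(struct demo_node **head, size_t nr)
{
	struct demo_node **link = head;
	struct demo_node *node, *next;
	size_t freed = 0;

	assert(head != NULL);
	for (node = *head; node && nr > 0; node = next, nr--) {
		next = node->next;
		if (node->busy) {
			link = &node->next;   /* entry stays on the list */
			continue;
		}
		*link = next;                 /* unlink the idle entry */
		node->next = NULL;
		freed++;
	}
	return freed;
}
```

With nr set to 3 on a four-entry list, the fourth entry is never examined, which matches the "not one more" wording above.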
Improve the comment describing the inode and iopen glock interactions
and the glock poking related to inode evict.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Function glock_clear_object() checks if the specified glock is still
pointing at the right object and clears the gl_object pointer. To
handle the case of incompletely constructed inodes, glock_clear_object()
also allows gl_object to be NULL.
However, in the teardown case, when iget_failed() is called and the
inode is removed from the inode hash, by the time we get to the
glock_clear_object() calls in gfs2_put_super() and its helpers, we don't
have exclusion against concurrent gfs2_inode_lookup() and
gfs2_create_inode() calls, and the inode and iopen glocks may already be
pointing at another inode, so the checks in glock_clear_object() are
incorrect.
To better handle this case, always completely disassociate an inode from
its glocks before tearing it down. In addition, get rid of a duplicate
glock_clear_object() call in gfs2_evict_inode(). That way,
glock_clear_object() will only ever be called when the glock points at
the current inode, and the NULL check in glock_clear_object() can be
removed.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The uevent() callback in struct kset_uevent_ops does not modify the
kobject passed into it, so make the pointer const to enforce this
restriction. When doing so, fix up all existing uevent() callbacks to
have the correct signature to preserve the build.
Cc: Christine Caulfield <ccaulfie@redhat.com>
Cc: David Teigland <teigland@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230111113018.459199-17-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
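The effect of making the callback's object pointer const can be shown with a reduced example. The types below are hypothetical stand-ins (struct demo_kobject, struct demo_uevent_ops and the buffer-based signature are not the kernel's); the point is only that the callback can read the object but the compiler rejects any attempt to modify it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct demo_kobject { const char *name; };

struct demo_uevent_ops {
	/* const pointer: the callback may inspect the object, not change it */
	int (*uevent)(const struct demo_kobject *kobj, char *buf, size_t len);
};

/* Writes "NAME=<name>" into buf; returns 0 on success, -1 if it did not fit. */
static int demo_uevent(const struct demo_kobject *kobj, char *buf, size_t len)
{
	int n;

	assert(kobj && buf);
	/* kobj->name = "other"; would not compile: kobj points to const data */
	n = snprintf(buf, len, "NAME=%s", kobj->name);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

const struct demo_uevent_ops demo_ops = { .uevent = demo_uevent };
```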
Commit b2b0a5e97855 switched from generic_writepages() to
filemap_fdatawrite_wbc() in gfs2_ail1_start_one() on the path to
replacing ->writepage() with ->writepages() and eventually eliminating
the former. Function gfs2_ail1_start_one() is called from
gfs2_log_flush(), our main function for flushing the filesystem log.
Unfortunately, at least as implemented today, ->writepage() and
->writepages() are entirely different operations for journaled data
inodes: while the former creates and submits transactions covering the
data to be written, the latter flushes dirty buffers out to disk.
With gfs2_ail1_start_one() now calling ->writepages(), we end up
creating filesystem transactions while we are in the course of a log
flush, which immediately deadlocks on the sdp->sd_log_flush_lock
semaphore.
Work around that by going back to how things used to work before commit
b2b0a5e97855 for now; figuring out a superior solution will take time we
don't have available right now. However ...
Since the removal of generic_writepages() is imminent, open-code it
here. We're already inside a blk_start_plug() ... blk_finish_plug()
section here, so skip that part of the original generic_writepages().
This reverts commit b2b0a5e978552e348f85ad9c7568b630a5ede659.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevant on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source of
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
These places just use b_page to get to the buffer's address_space.
Link: https://lkml.kernel.org/r/20221215214402.3522366-9-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The operations in struct page_ops all operate on folios, so rename
struct page_ops to struct folio_ops.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[djwong: port around not removing iomap_valid]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
The ->page_prepare() handler in struct iomap_page_ops is now somewhat
misnamed, so rename it to ->get_folio().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Change the iomap ->page_prepare() handler to get and return a locked
folio instead of doing that in iomap_write_begin(). This makes it
possible to recover from out-of-memory situations in ->page_prepare(),
which eliminates the corresponding error handling code in
iomap_write_begin().
The ->put_folio() handler now also isn't called with NULL as the folio
value anymore.
Filesystems are expected to use the iomap_get_folio() helper for getting
locked folios in their ->page_prepare() handlers.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
The ->page_done() handler in struct iomap_page_ops is now somewhat
misnamed in that it mainly deals with unlocking and putting a folio, so
rename it to ->put_folio().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
When an iomap defines a ->page_done() handler in its page_ops, delegate
unlocking the folio and putting the folio reference to that handler.
This makes it possible to fix a race between journaled data writes and
folio writeback in gfs2: before this change, gfs2_iomap_page_done() was
called after unlocking the folio, so writeback could start writing back
the folio's buffers before they could be marked for writing to the
journal. Also, try_to_free_buffers() could free the buffers before
gfs2_iomap_page_done() was done adding the buffers to the current
transaction. With this change, gfs2_iomap_page_done() adds the
buffers to the current transaction while the folio is still locked, so
the problems described above can no longer occur.
The only current user of ->page_done() is gfs2, so other filesystems are
not affected. To catch any out-of-tree users, switch from a page to
a folio in ->page_done().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
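The delegation described in this message can be modelled in a few lines of userspace C. Everything here is a hypothetical stand-in (struct demo_folio, struct demo_page_ops and the demo_* functions); the sketch only shows the control flow in which an optional handler takes over unlocking and putting the folio, so that it can do its own work while the folio is still locked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_folio {
	bool locked;
	int refcount;
	bool journaled;
};

struct demo_page_ops {
	/* If set, the handler is responsible for unlocking and putting. */
	void (*page_done)(struct demo_folio *folio);
};

static void demo_folio_unlock(struct demo_folio *f) { f->locked = false; }
static void demo_folio_put(struct demo_folio *f) { f->refcount--; }

/* Called at the end of a write, with the folio still locked. */
void demo_put_folio(const struct demo_page_ops *ops, struct demo_folio *folio)
{
	if (ops && ops->page_done) {
		ops->page_done(folio);
	} else {
		demo_folio_unlock(folio);
		demo_folio_put(folio);
	}
}

/* Journaling handler: record the buffers while the folio is still locked. */
void demo_journal_page_done(struct demo_folio *folio)
{
	assert(folio->locked);
	folio->journaled = true;
	demo_folio_unlock(folio);
	demo_folio_put(folio);
}
```

Without a handler the default path behaves as before, which is why filesystems that do not define one are unaffected.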
The file locking definitions have lived in fs.h since the dawn of time,
but they are only used by a small subset of the source files that
include it.
Move the file locking definitions to a new header file, and add the
appropriate #include directives to the source files that need them. By
doing this we trim down fs.h a bit and limit the amount of rebuilding
that has to be done when we make changes to the file locking APIs.
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Steve French <stfrench@microsoft.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Merge tag 'gfs2-v6.1-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- Revert a change to delete_work_func() that has gone wrong in commit
c412a97cf6c5 ("gfs2: Use TRY lock in gfs2_inode_lookup for UNLINKED
inodes").
- Avoid dequeuing GL_ASYNC glock holders twice by first checking if the
holder is still queued.
- gfs2: Always check the inode size of inline inodes when reading in
inodes to prevent corrupt filesystem images from causing weird errors.
- Properly handle a race between gfs2_create_inode() and
gfs2_inode_lookup() that causes insert_inode_locked4() to return
-EBUSY.
- Fix and clean up the interaction between gfs2_create_inode() and
gfs2_evict_inode() by completely handling the inode deallocation and
destruction in gfs2_evict_inode().
- Remove support for glock holder auto-demotion as we have no current
plans to use this feature again.
- And a few more minor cleanups and clarifications.
* tag 'gfs2-v6.1-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Remove support for glock holder auto-demotion (2)
gfs2: Remove support for glock holder auto-demotion
gfs2: Minor gfs2_try_evict cleanup
gfs2: Partially revert gfs2_inode_lookup change
gfs2: Add gfs2_inode_lookup comment
gfs2: Uninline and improve glock_{set,clear}_object
gfs2: Simply dequeue iopen glock in gfs2_evict_inode
gfs2: Clean up after gfs2_create_inode rework
gfs2: Avoid dequeuing GL_ASYNC glock holders twice
gfs2: Make gfs2_glock_hold return its glock argument
gfs2: Always check inode size of inline inodes
gfs2: Cosmetic gfs2_dinode_{in,out} cleanup
gfs2: Handle -EBUSY result of insert_inode_locked4
gfs2: Fix and clean up create / evict interaction
gfs2: Clean up initialization of "ip" in gfs2_create_inode
gfs2: Get rid of ghs[] in gfs2_create_inode
gfs2: Add extra error check in alloc_dinode
As a follow-up to the previous commit, move the recovery-related code in
__gfs2_glock_dq() to gfs2_glock_dq(), where it fits better. No
functional change.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Remove the support for glock holder auto-demotion (commit dc732906c245
and follow-ups) as we are not planning to use this feature, and the
additional code therefore only adds unnecessary complexity.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
In gfs2_try_evict(), when an inode can't be evicted, we are grabbing a
temporary reference on the inode glock to poke that glock. That should
be safe, but it's easier to just grab an inode reference as we already
do earlier in this function.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Commit c412a97cf6c5 changed delete_work_func() to always perform an
inode lookup when gfs2_try_evict() fails. This doesn't make sense as a
gfs2_try_evict() failure indicates that the inode is likely still in
use. Revert that change.
Fixes: c412a97cf6c5 ("gfs2: Use TRY lock in gfs2_inode_lookup for UNLINKED inodes")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Add comment on when and why gfs2_cancel_delete_work() needs to be
skipped in gfs2_inode_lookup().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Those functions have reached a size at which having them inline isn't
useful anymore, so uninline them. In addition, report the glock name on
assertion failures.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
With the previous change, to simplify things, we can always just dequeue
and uninitialize the iopen glock in gfs2_evict_inode() even if it isn't
queued anymore.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Since commit 3d36e57ff768 ("gfs2: gfs2_create_inode rework"),
gfs2_evict_inode() and gfs2_create_inode() / gfs2_inode_lookup() will
synchronize via the inode hash table and we can be certain that once a
new inode is inserted into the inode hash table, gfs2_evict_inode()
has completely destroyed any previous versions. We no longer need to
worry about overlapping inode object lifespans. Update the code and
comments accordingly.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
When a locking request fails, the associated glock holder is
automatically dequeued from the list of active and waiting holders. For
GL_ASYNC locking requests, this will obviously happen asynchronously
and it can race with attempts to cancel that locking request via
gfs2_glock_dq(). Therefore, don't forget to check if a locking request
has already been dequeued in gfs2_glock_dq().
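The check described above can be illustrated with a small userspace C sketch. It is a single-threaded toy under stated assumptions, not the gfs2 implementation: struct toy_glock, struct toy_holder and the toy_*() functions are hypothetical names, and the locking that real code would need around the "still queued" test is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy holder: remembers whether it is currently on the glock's list. */
struct toy_holder {
	bool queued;
};

/* Toy glock: the holder list is reduced to a counter. */
struct toy_glock {
	int nr_holders;
};

static void toy_enqueue(struct toy_glock *gl, struct toy_holder *gh)
{
	assert(!gh->queued);
	gh->queued = true;
	gl->nr_holders++;
}

/* Unconditional removal; callers must know the holder is queued. */
static void toy_remove(struct toy_glock *gl, struct toy_holder *gh)
{
	assert(gh->queued);
	gh->queued = false;
	gl->nr_holders--;
}

/* Failure path: a failed request dequeues its holder automatically. */
static void toy_request_failed(struct toy_glock *gl, struct toy_holder *gh)
{
	toy_remove(gl, gh);
}

/*
 * Cancel path: only dequeue if the holder is still queued.
 * Returns true if this call removed the holder, false if the
 * failure path had already done so.
 */
static bool toy_glock_dq(struct toy_glock *gl, struct toy_holder *gh)
{
	if (!gh->queued)
		return false;
	toy_remove(gl, gh);
	return true;
}
```

Without the queued test in toy_glock_dq(), a cancel that arrives after the failure path has run would trip the assertion in toy_remove() and, in this model, drive the holder count negative.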
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>