Add a check to delete_work_func() so that it quits when it finds that
the filesystem is deactivating. This speeds up the delete workqueue
draining in gfs2_kill_sb().
In addition, make sure that iopen_go_callback() won't queue any new
delete work while the filesystem is deactivating.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Function delete_work_func() is used for two purposes:
* to immediately try to evict the glock's inode, and
* to verify after a little while that the inode has been deleted as
expected, and didn't just get skipped.
These two operations are not separated very well, so introduce two new
glock flags to improve that. Split gfs2_queue_delete_work() into
gfs2_queue_try_to_evict() and gfs2_queue_verify_evict().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Move the global delete workqueue into struct gfs2_sbd so that we can
flush / drain it without interfering with other filesystems.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Get rid of the GLF_PENDING_DELETE glock flag introduced by commit
a0e3cc65fa ("gfs2: Turn gl_delete into a delayed work"). The only use
of that flag is to prevent the iopen glock from being demoted (i.e.,
unlocked) while delete work is pending. It turns out that demoting the
iopen glock while delete work is pending is perfectly fine; we only need
to make sure that the glock isn't being freed while still in use. This
is ensured by the previous patch because delete_work_func() owns a
reference while the work is queued or running.
With these changes, gfs2_queue_delete_work() no longer takes the glock
spin lock, so we can use it in iopen_go_callback() instead of
open-coding it there.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
In __gfs2_glock_put(), remove the glock from the lru list *after*
dropping the glock lock. This prevents deadlocks against
gfs2_scan_glock_lru().
In gfs2_scan_glock_lru(), make sure that the glock's reference count is
zero before moving the glock to the dispose list. This skips glocks
that are marked dead as well as glocks that are still in use.
Additionally, switch to spin_trylock() as we already do in
gfs2_dispose_glock_lru(); this alone would also be enough to prevent
deadlocks against __gfs2_glock_put().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Switch to list_for_each_entry_safe() and eliminate the "skipped" list in
gfs2_scan_glock_lru().
At the same time, scan exactly the requested number of items, not one
more than that number.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Function glock_clear_object() checks if the specified glock is still
pointing at the right object and clears the gl_object pointer. To
handle the case of incompletely constructed inodes, glock_clear_object()
also allows gl_object to be NULL.
However, in the teardown case, when iget_failed() is called and the
inode is removed from the inode hash, by the time we get to the
glock_clear_object() calls in gfs2_put_super() and its helpers, we don't
have exclusion against concurrent gfs2_inode_lookup() and
gfs2_create_inode() calls, and the inode and iopen glocks may already be
pointing at another inode, so the checks in glock_clear_object() are
incorrect.
To better handle this case, always completely disassociate an inode from
its glocks before tearing it down. In addition, get rid of a duplicate
glock_clear_object() call in gfs2_evict_inode(). That way,
glock_clear_object() will only ever be called when the glock points at
the current inode, and the NULL check in glock_clear_object() can be
removed.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
As a follow-up to the previous commit, move the recovery related code in
__gfs2_glock_dq() to gfs2_glock_dq() where it fits better. No
functional change.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Remove the support for glock holder auto-demotion (commit dc732906c2
and follow-ups) as we are not planning to use this feature, and the
additional code therefore only adds unnecessary complexity.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
In gfs2_try_evict(), when an inode can't be evicted, we are grabbing a
temporary reference on the inode glock to poke that glock. That should
be safe, but it's easier to just grab an inode reference as we already
do earlier in this function.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Commit c412a97cf6 changed delete_work_func() to always perform an
inode lookup when gfs2_try_evict() fails. This doesn't make sense as a
gfs2_try_evict() failure indicates that the inode is likely still in
use. Revert that change.
Fixes: c412a97cf6 ("gfs2: Use TRY lock in gfs2_inode_lookup for UNLINKED inodes")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Those functions have reached a size at which having them inline isn't
useful anymore, so uninline them. In addition, report the glock name on
assertion failures.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
When a locking request fails, the associated glock holder is
automatically dequeued from the list of active and waiting holders. For
GL_ASYNC locking requests, this will obviously happen asynchronously
and it can race with attempts to cancel that locking request via
gfs2_glock_dq(). Therefore, don't forget to check if a locking request
has already been dequeued in gfs2_glock_dq().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Resolves a conflict in gfs2_inode_lookup() between the following commits:
gfs2: Use TRY lock in gfs2_inode_lookup for UNLINKED inodes
gfs2: Mark the remaining process-independent glock holders as GL_NOPID
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
There are a couple places in function do_xmote where normal processing
is circumvented due to withdraws in progress. However, since we bypass
most of do_xmote() we bypass telling dlm to lock the dlm lock, which
means dlm will never respond with a completion callback. Since the
completion callback ordinarily clears GLF_LOCK, this patch changes
function do_xmote to handle those situations more gracefully so the
file system may be unmounted after withdraw.
A very similar situation happens with the GLF_DEMOTE_IN_PROGRESS flag,
which is cleared by function finish_xmote(). Since the withdraw causes
us to skip the majority of do_xmote, it therefore also skips the call
to finish_xmote() so the DEMOTE_IN_PROGRESS flag needs to be cleared
manually.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
When a withdraw occurs, ordinary (not system) glocks may not be granted
anymore. Later, when the file system is unmounted, gfs2_gl_hash_clear()
tries to clear out all the glocks, but these un-grantable pending
waiters prevent some glocks from being freed. So the unmount hangs, at
least for its ten-minute timeout period.
This patch takes measures to remove any pending waiters from
the glocks that will never be granted. This allows the unmount to
proceed in a reasonable period of time.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Before this patch, delete_work_func() would check for the GLF_DEMOTE
flag on the iopen glock and if set, it would perform special processing.
However, there was a race whereby the GLF_DEMOTE flag could be set by
another process after the check. Then when it called
gfs2_lookup_by_inum() which calls gfs2_inode_lookup(), it tried to lock
the iopen glock in SH mode, but the GLF_DEMOTE flag prevented the
request from being granted. But the iopen glock could never be demoted
because that happens when the inode is evicted, and the evict was never
completed because of the failed lookup.
To fix that, change function gfs2_inode_lookup() so that when
GFS2_BLKST_UNLINKED inodes are searched, it uses the LM_FLAG_TRY flag
for the iopen glock. If the locking request fails, fail
gfs2_inode_lookup() with -EAGAIN so that delete_work_func() can retry
the operation later.
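
The try-and-retry pattern described above can be sketched in userspace C.
This is a minimal model under stated assumptions, not the gfs2 code: the
toy_* names are hypothetical, and the demote_pending field merely stands in
for GLF_DEMOTE being set on the iopen glock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an iopen glock; not the kernel structure. */
struct toy_glock {
	bool demote_pending;	/* models GLF_DEMOTE being set */
	int shared_holders;
};

/* Models a lock request with a TRY flag: fail at once instead of blocking. */
static int toy_try_lock_shared(struct toy_glock *gl)
{
	if (gl->demote_pending)
		return -EAGAIN;
	gl->shared_holders++;
	return 0;
}

/* Models the lookup of an unlinked inode: pass -EAGAIN up to the caller. */
static int toy_inode_lookup(struct toy_glock *gl)
{
	int ret = toy_try_lock_shared(gl);

	if (ret)
		return ret;
	/* ... the inode would be read in here ... */
	gl->shared_holders--;
	return 0;
}

/*
 * Models the delete work: returns true when the work should be requeued
 * and retried later, false when it is finished.
 */
static bool toy_delete_work(struct toy_glock *gl)
{
	assert(gl);
	return toy_inode_lookup(gl) == -EAGAIN;
}
```

The point of the sketch is that the lookup never waits on a request that
cannot be granted; the caller decides when to try again.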
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Merge tag 'gfs2-v5.19-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- Instantiate glocks outside of the glock state engine, in the context
of the process taking the glock. This moves unnecessary complexity
out of the core glock code. Clean up the instantiate logic to be more
sensible.
- In gfs2_glock_async_wait(), cancel pending locking request upon
failure. Make sure all glocks are left in a consistent state.
- Various other minor cleanups and fixes.
* tag 'gfs2-v5.19-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: List traversal in do_promote is safe
gfs2: do_promote glock holder stealing fix
gfs2: Use better variable name
gfs2: Make go_instantiate take a glock
gfs2: Add new go_held glock operation
gfs2: Revert 'Fix "truncate in progress" hang'
gfs2: Instantiate glocks ouside of glock state engine
gfs2: Fix up gfs2_glock_async_wait
gfs2: Minor gfs2_glock_nq_m cleanup
gfs2: Fix spelling mistake in comment
gfs2: Rewrap overlong comment in do_promote
gfs2: Remove redundant NULL check before kfree
Currently shrinkers are anonymous objects. For debugging purposes they
can be identified by count/scan function names, but it's not always
useful: e.g. for superblock's shrinkers it's nice to have at least an
idea of which superblock the shrinker belongs to.
This commit adds names to shrinkers. register_shrinker() and
prealloc_shrinker() functions are extended to take a format and arguments
to construct a name.
In some cases it's not possible to determine a good name at the time when
a shrinker is allocated. For such cases shrinker_debugfs_rename() is
provided.
The expected format is:
<subsystem>-<shrinker_type>[:<instance>]-<id>
For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair.
After this change the shrinker debugfs directory looks like:
$ cd /sys/kernel/debug/shrinker/
$ ls
dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42
mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43
mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44
rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49
sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13
sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36
sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19
sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10
sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9
sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37
sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38
sb-dax-11 sb-proc-45 sb-tmpfs-35
sb-debugfs-7 sb-proc-46 sb-tmpfs-40
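
As an illustration of the name format above, here is a small userspace
sketch that assembles such a name with snprintf(). The helper
toy_shrinker_name() is hypothetical and is not the kernel interface; in
particular, how and where the kernel appends the trailing id is not shown
here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Builds "<subsystem>-<shrinker_type>[:<instance>]-<id>".
 * Pass instance == NULL to omit the optional ":<instance>" part.
 * Returns the snprintf() result, i.e. the length the full name needs.
 */
static int toy_shrinker_name(char *buf, size_t len, const char *subsystem,
			     const char *type, const char *instance, int id)
{
	assert(buf && subsystem && type);
	if (instance)
		return snprintf(buf, len, "%s-%s:%s-%d",
				subsystem, type, instance, id);
	return snprintf(buf, len, "%s-%s-%d", subsystem, type, id);
}
```

Called with ("sb", "xfs", "vda1", 36), the helper produces a name shaped
like the sb-xfs:vda1-36 entry in the listing.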
[roman.gushchin@linux.dev: fix build warnings]
Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In do_promote(), we're never removing the current entry from the list
and so the list traversal is actually safe. Switch back to
list_for_each_entry().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
In do_promote(), when the glock had no strong holders, we were
accidentally calling demote_incompat_holders() with new_gh == NULL, so
no weak holders were considered incompatible. Instead, the new holder
should have been passed in.
For doing that, the HIF_HOLDER flag needs to be set in new_gh to prevent
may_grant() from complaining. This means that the new holder will now
be recognized as a current holder, so skip over it explicitly in
demote_incompat_holders() to prevent it from being dequeued.
To further clarify things, we can now rename new_gh to current_gh in
demote_incompat_holders(); after all, the HIF_HOLDER flag is already set,
which means the new holder is already a current holder.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
In do_promote() and add_to_queue(), use current_gh as the variable name
for the first strong holder we could find: this matches the variable
name in may_grant(), and more clearly indicates that we're interested in
one (any) of the current strong holders.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Make go_instantiate take a glock instead of a glock holder as its argument:
this handler is supposed to instantiate the object associated with the glock.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Right now, inode_go_instantiate() contains functionality that relates to
how a glock is held rather than the glock itself, like waiting for
pending direct I/O to complete and completing interrupted truncates.
This code is meant to be run each time a holder is acquired, but
go_instantiate is actually only called once, when the glock is
instantiated.
To fix that, introduce a new go_held glock operation that is called each
time a glock holder is acquired. Move the holder specific code in
inode_go_instantiate() over to inode_go_held().
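
The difference between the two hooks can be sketched in userspace C as
follows. Only the names go_instantiate and go_held come from the text; the
toy_* structures, the counters, and toy_holder_ready() are hypothetical and
exist only to show that one hook runs once per instantiation while the
other runs for every acquired holder.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_glock;

/* Hypothetical operations table modelled on the two hooks named above. */
struct toy_glock_ops {
	int (*go_instantiate)(struct toy_glock *gl);	/* once per instantiation */
	int (*go_held)(struct toy_glock *gl);		/* once per acquired holder */
};

struct toy_glock {
	const struct toy_glock_ops *ops;
	bool instantiated;
	int instantiate_calls;	/* counters for illustration only */
	int held_calls;
};

static int toy_instantiate(struct toy_glock *gl)
{
	gl->instantiate_calls++;
	return 0;
}

static int toy_held(struct toy_glock *gl)
{
	gl->held_calls++;
	return 0;
}

static const struct toy_glock_ops toy_inode_ops = {
	.go_instantiate = toy_instantiate,
	.go_held = toy_held,
};

/* Called each time a holder has been granted. */
static int toy_holder_ready(struct toy_glock *gl)
{
	int ret;

	assert(gl && gl->ops);
	if (!gl->instantiated && gl->ops->go_instantiate) {
		ret = gl->ops->go_instantiate(gl);
		if (ret)
			return ret;
		gl->instantiated = true;
	}
	if (gl->ops->go_held)
		return gl->ops->go_held(gl);
	return 0;
}
```

After three calls to toy_holder_ready() on one glock, the instantiate
counter is 1 and the held counter is 3.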
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Now that interrupted truncates are completed in the context of the
process taking the glock, there is no need for the glock state engine to
delegate that task to gfs2_quotad or for quotad to perform those
truncates anymore. Get rid of the obsolete associated infrastructure.
Reverts commit 813e0c46c9 ("GFS2: Fix "truncate in progress" hang").
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Instantiate glocks outside of the glock state engine: there is no real
reason for instantiating them inside the glock state engine; it only
complicates the code.
Instead, instantiate them in gfs2_glock_wait() and gfs2_glock_async_wait()
using the new gfs2_glock_holder_ready() helper. On top of that, the only
other place that acquires a glock without using gfs2_glock_wait() or
gfs2_glock_async_wait() is gfs2_upgrade_iopen_glock(), so call
gfs2_glock_holder_ready() there as well.
If a dinode has a pending truncate, the glock-specific instantiate function
for inodes wakes up the truncate function in the quota daemon. Waiting for
the completion of the truncate was previously done by the glock state
engine, but we now need to wait in inode_go_instantiate().
This also means that gfs2_instantiate() will now no longer return any
"special" error codes.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Since commit 1fc05c8d84 ("gfs2: cancel timed-out glock requests"), a
pending locking request can be canceled by calling gfs2_glock_dq() on
the pending holder. In gfs2_glock_async_wait(), when we time out, use
that to cancel the remaining locking requests and dequeue the locking
requests already granted. That's simpler as well as more efficient than
waiting for all locking requests to eventually be granted and dequeuing
them then.
In addition, gfs2_glock_async_wait() promises that by the time the
function completes, all glocks are either granted or dequeued, but the
implementation doesn't keep that promise if individual locking requests
fail. Fix that as well.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Add a GL_NOPID flag to indicate that once a glock holder has been acquired, it
won't be associated with the current process anymore. This is useful for iopen
and flock glocks which are associated with open files, as well as journal glock
holders and similar which are associated with the filesystem.
Once GL_NOPID is used for all applicable glocks (see the next patches),
processes will no longer be falsely reported as holding glocks which they are
not actually holding in the glocks dump file. Unlike before, when a process is
reported as having "(ended)", this will indicate an actual bug.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Include flock glocks in the "glockfd" debugfs file. Those are similar to the
iopen glocks; while an open file is holding an flock, it is holding the file's
flock glock.
We cannot take f_fl_mutex in gfs2_glockfd_seq_show_flock() or else dumping the
"glockfd" file would block on flock operations. Instead, use the file->f_lock
spin lock to protect the f_fl_gh.gh_gl glock pointer.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
When a process has a gfs2 file open, the file is keeping a reference on the
underlying gfs2 inode, and the inode is keeping the inode's iopen glock held in
shared mode. In other words, the process depends on the iopen glock of each
open gfs2 file. Expose those dependencies in a new "glockfd" debugfs file.
The new debugfs file contains one line for each gfs2 file descriptor,
specifying the tgid, file descriptor number, and glock name, e.g.,
1601 6 5/816d
This list is compiled by iterating all tasks on the system using find_ge_pid(),
and all file descriptors of each task using task_lookup_next_fd_rcu(). To make
that work from gfs2, export those two functions.
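
A line of that file could be produced roughly as follows. This is a sketch
that assumes the glock name is the glock type in decimal followed by the
glock number in hexadecimal, which matches the example line; the struct and
function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one gfs2 file descriptor of one process. */
struct toy_glockfd_entry {
	int tgid;
	unsigned int fd;
	unsigned int gl_type;		/* assumed: first part of the glock name */
	unsigned long long gl_number;	/* assumed: printed in hexadecimal */
};

/* Formats "<tgid> <fd> <type>/<number>" into buf. */
static int toy_format_glockfd(char *buf, size_t len,
			      const struct toy_glockfd_entry *e)
{
	assert(buf && e);
	return snprintf(buf, len, "%d %u %u/%llx",
			e->tgid, e->fd, e->gl_type, e->gl_number);
}
```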
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Add state and flags arguments to gfs2_rlist_alloc() to make it somewhat more
obvious which state and flags an rlist uses. With that, stop knocking off
flags in gfs2_glock_nq_m() and its nq_m_sync() helper that are never set in the
first place.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Rewrap the comment to keep the line length below 80 characters.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Clang's structure layout randomization feature gets upset when it sees
struct address_space (which is randomized) cast to struct gfs2_glock.
This is due to seeing the mapping pointer as being treated as an array
of gfs2_glock, rather than "something else, before struct address_space":
In file included from fs/gfs2/acl.c:23:
fs/gfs2/meta_io.h:44:12: error: casting from randomized structure pointer type 'struct address_space *' to 'struct gfs2_glock *'
return (((struct gfs2_glock *)mapping) - 1)->gl_name.ln_sbd;
^
Replace the instances of open-coded pointer math with container_of()
usage, and update the allocator to match.
Some cleanups and conversion of gfs2_glock_get() and
gfs2_glock_dealloc() by Andreas.
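
To show the idea outside the kernel, here is a userspace sketch of
recovering the enclosing object with a container_of()-style macro instead
of casting and subtracting one. The toy_* types and the layout of
toy_glock_aspace are assumptions made for the example, not the actual gfs2
definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace version of the container_of() idea. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_address_space { int dummy; };
struct toy_glock { int gl_name; };

/* Assumed layout: the glock and its address space are allocated together. */
struct toy_glock_aspace {
	struct toy_glock glock;
	struct toy_address_space mapping;
};

/* Go from the embedded mapping back to the glock next to it. */
static struct toy_glock *toy_mapping_to_glock(struct toy_address_space *mapping)
{
	struct toy_glock_aspace *gla;

	assert(mapping);
	gla = toy_container_of(mapping, struct toy_glock_aspace, mapping);
	return &gla->glock;
}
```

Because the offset comes from the named member rather than from the size of
a neighbouring type, the lookup stays correct regardless of how the fields
inside either structure are ordered.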
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/lkml/202205041550.naKxwCBj-lkp@intel.com
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Bill Wendling <morbo@google.com>
Cc: cluster-devel@redhat.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The gh_error field of a glock holder is initialized to zero in
gfs2_holder_init(). When a locking operation fails, gh_error is set to
an error code; when it succeeds, the gh_error value is left unchanged.
The field isn't initialized in gfs2_holder_reinit(), which is a problem.
Instead of fixing that directly, initialize gh_error in gfs2_glock_nq().
That also obsoletes the assignment in do_flock().
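
The stale-error problem and the fix can be modelled in a few lines of
userspace C. The toy_* functions are hypothetical simplifications; the
should_fail parameter only simulates the outcome of a locking request.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_holder {
	int gh_state;
	int gh_error;
};

static void toy_holder_init(struct toy_holder *gh, int state)
{
	gh->gh_state = state;
	gh->gh_error = 0;
}

/* Like the reinit described above: it does not touch gh_error. */
static void toy_holder_reinit(struct toy_holder *gh, int state)
{
	gh->gh_state = state;
}

/* Models enqueueing a request; should_fail simulates the result. */
static int toy_glock_nq(struct toy_holder *gh, bool should_fail)
{
	assert(gh);
	gh->gh_error = 0;	/* reset here so a stale error cannot leak through */
	if (should_fail)
		gh->gh_error = -EIO;
	return gh->gh_error;
}
```

Resetting the field at enqueue time covers both the init and the reinit
path, since every request passes through the enqueue function.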
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The gfs2 evict code tries to upgrade the iopen glock from SH to EX. If
the attempt to upgrade times out, gfs2 needs to tell dlm to cancel the
lock request or it can deadlock. We also need to wake up the process
waiting for the lock when dlm sends its AST back to gfs2.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
It turns out that the might_sleep() call that commit 660a6126f8 adds
is triggering occasional data corruption in testing. We're not sure
about the root cause yet, but since this commit was added as a debugging
aid only, revert it for now.
This reverts commit 660a6126f8.
Fixes: 660a6126f8 ("gfs2: check context in gfs2_glock_put")
Cc: stable@vger.kernel.org # v5.16+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The description of gfs2_instantiate accidentally lists a glock argument,
but the function takes a glock holder.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The function name in the kernel-doc comment wasn't updated when the
function was renamed.
Fixes: b016d9a84a ("gfs2: Save ip from gfs2_glock_nq_init")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
When we mock up a temporary holder in gfs2_glock_cb to demote weak holders in
response to a remote locking conflict, we don't set the HIF_HOLDER flag. This
causes function may_grant to BUG. Fix by setting the missing HIF_HOLDER flag
in the mock glock holder.
In addition, define the mock glock holder where it is used.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Function demote_incompat_holders iterates over the list of glock holders
with list_for_each_entry, and it then sometimes removes the current
holder from the list. This will get the loop stuck; we must use
list_for_each_entry_safe instead.
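
Why removal during a plain traversal is a problem can be seen in a small
userspace list. The sketch assumes that a removed entry is re-linked to
itself, as list_del_init() does in the kernel; the toy_* names are
hypothetical. With that assumption, following gh->next after the removal
would never reach the list head again, so the successor is saved first.

```c
#include <assert.h>
#include <stdbool.h>

/* Circular doubly linked list; the head is a sentinel entry. */
struct toy_holder {
	struct toy_holder *prev, *next;
	bool incompatible;
};

static void toy_list_init(struct toy_holder *head)
{
	head->prev = head->next = head;
}

static void toy_list_add_tail(struct toy_holder *head, struct toy_holder *gh)
{
	gh->prev = head->prev;
	gh->next = head;
	head->prev->next = gh;
	head->prev = gh;
}

/* Unlink the entry and point it at itself. */
static void toy_list_del_init(struct toy_holder *gh)
{
	gh->prev->next = gh->next;
	gh->next->prev = gh->prev;
	gh->prev = gh->next = gh;
}

/* Removes all incompatible holders; returns how many were removed. */
static int toy_demote_incompat(struct toy_holder *head)
{
	struct toy_holder *gh, *tmp;
	int removed = 0;

	assert(head);
	/* Save the successor in tmp before the body may unlink gh. */
	for (gh = head->next, tmp = gh->next; gh != head;
	     gh = tmp, tmp = gh->next) {
		if (gh->incompatible) {
			toy_list_del_init(gh);
			removed++;
		}
	}
	return removed;
}
```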
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Replace test_bit() + set_bit() with test_and_set_bit() where we need an atomic
operation. Use clear_and_wake_up_bit() instead of open coding it.
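
The reason for the combined operation can be shown with C11 atomics in
userspace. The helper below is a hypothetical analogue, not the kernel's
test_and_set_bit(): it sets the bit and reports the previous value in one
atomic step, so no other thread can change the bit between the test and the
set.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically set bit nr in *word and report whether it was already set. */
static bool toy_test_and_set_bit(unsigned int nr, atomic_ulong *word)
{
	unsigned long mask;

	assert(nr < sizeof(unsigned long) * CHAR_BIT);
	mask = 1UL << nr;
	/* atomic_fetch_or() returns the value from before the OR. */
	return (atomic_fetch_or(word, mask) & mask) != 0;
}
```

With separate test and set steps, two threads could both observe the bit as
clear and both proceed; with the combined operation, exactly one caller
sees false.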
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Add a might_sleep call into gfs2_glock_put which can sleep in DLM when
the last reference is released. This will show problems earlier, and
not only when the last reference is put.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
So far, glock_hash_walk took a reference on each glock it iterated over, and it
was the examiner's responsibility to drop those references. Dropping the final
reference to a glock can sleep and the examiners are called in a RCU critical
section with spin locks held, so examiners that didn't need the extra reference
had to drop it asynchronously via gfs2_glock_queue_put or similar. This wasn't
done correctly in thaw_glock which did call gfs2_glock_put, and not at all in
dump_glock_func.
Change glock_hash_walk to not take glock references at all. That way, the
examiners that don't need them won't have to bother with slow asynchronous
puts, and the examiners that do need references can take them themselves.
Reported-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
In gfs2_inode_lookup and gfs2_create_inode, we're calling
gfs2_cancel_delete_work which currently cancels any remote delete work
(delete_work_func) synchronously. This means that if the work is
currently running, it will wait for it to finish. We're doing this to
prevent a previous instance of an inode from having any influence on the
next instance.
However, delete_work_func uses gfs2_inode_lookup internally, and we can
end up in a deadlock when delete_work_func gets interrupted at the wrong
time. For example,
(1) An inode's iopen glock has delete work queued, but the inode
itself has been evicted from the inode cache.
(2) The delete work is preempted before reaching gfs2_inode_lookup.
(3) Another process recreates the inode (gfs2_create_inode). It tries
to cancel any outstanding delete work, which blocks waiting for
the ongoing delete work to finish.
(4) The delete work calls gfs2_inode_lookup, which blocks waiting for
gfs2_create_inode to instantiate and unlock the new inode =>
deadlock.
It turns out that when the delete work notices that its inode has been
re-instantiated, it will do nothing. This means that it's safe to
cancel the delete work asynchronously. This prevents the kind of
deadlock described above.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Before this patch, when a glock was locked, the very first holder on the
queue would unlock the lockref and call the go_instantiate glops function
(if one existed), unless GL_SKIP was specified. When we introduced the new
node-scope concept, we allowed multiple holders to lock glocks in EX mode
and share the lock.
But node-scope introduced a new problem: if the first holder has GL_SKIP
and the next one does NOT, since it is not the first holder on the queue,
the go_instantiate op was not called. Eventually the GL_SKIP holder may
call the instantiate sub-function (e.g. gfs2_rgrp_bh_get) but there was
still a window of time in which another non-GL_SKIP holder assumes the
instantiate function had been called by the first holder. In the case of
rgrp glocks, this led to a NULL pointer dereference on the buffer_heads.
This patch tries to fix the problem by introducing two new glock flags:
GLF_INSTANTIATE_NEEDED, which keeps track of when the instantiate function
needs to be called to "fill in" or "read in" the object before it is
referenced.
GLF_INSTANTIATE_IN_PROG which is used to determine when a process is
in the process of reading in the object. Whenever a function needs to
reference the object, it checks the GLF_INSTANTIATE_NEEDED flag, and if
set, it sets GLF_INSTANTIATE_IN_PROG and calls the glops "go_instantiate"
function.
As before, the gl_lockref spin_lock is unlocked during the IO operation,
which may take a relatively long amount of time to complete. While
unlocked, if another process determines go_instantiate is still needed,
it sees GLF_INSTANTIATE_IN_PROG is set, and waits for the go_instantiate
glop operation to be completed. Once GLF_INSTANTIATE_IN_PROG is cleared,
it needs to check GLF_INSTANTIATE_NEEDED again because the other process's
go_instantiate operation may not have been successful.
Functions that previously called the instantiate sub-functions now call
directly into gfs2_instantiate so the new bits are managed properly.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Before this patch, function do_promote had a section of code that did
the actual instantiation. This patch splits that off into its own
function, gfs2_instantiate, which prepares us for the next patch that
will use that function.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
This patch further simplifies function do_promote by eliminating some
redundant code in favor of using a lock_released flag. This is just
prep work for a future patch.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>