linux/fs/gfs2
Bob Peterson d552a2b9b3 GFS2: Non-recursive delete
Implement truncate/delete as a non-recursive algorithm. The older
algorithm was implemented with recursion to strip off each layer
at a time (going by height, starting with the maximum height.
This version tries to do the same thing but without recursion,
and without needing to allocate new structures or lists in memory.

For example, say you want to truncate a very large file to 1 byte,
and its end-of-file metapath is: 0.505.463.428. The starting
metapath would be 0.0.0.0. Since it's a truncate to non-zero, it
needs to preserve that byte, and all metadata pointing to it.
So it would start at 0.0.0.0, look up all its metadata buffers,
then free all data blocks pointed to at the highest level.
After that buffer is "swept", it moves on to 0.0.0.1, then
0.0.0.2, etc., reading in buffers and sweeping them clean.
When it gets to the end of the 0.0.0 metadata buffer (for 4K
blocks the last valid one is 0.0.0.508), it backs up to the
previous height and starts working on 0.0.1.0, then 0.0.1.1,
and so forth. After it reaches the end and sweeps 0.0.1.508,
it continues with 0.0.2.0, and so on. When that height is
exhausted, and it reaches 0.0.508.508 it backs up another level,
to 0.1.0.0, then 0.1.0.1, through 0.1.0.508. So it has to keep
marching backwards and forwards through the metadata until it's
all swept clean. Once it has all the data blocks freed, it
lowers the strip height, and begins the process all over again,
but with one less height. This time it sweeps 0.0.0 through
0.505.463. When that's clean, it lowers the strip height again
and works to free 0.505. Eventually it strips the lowest height, 0.
For a delete or truncate to 0, all metadata for all heights of
0.0.0.0 would be freed. For a truncate to 1 byte, 0.0.0.0 would
be preserved.

This isn't much different from normal integer incrementing,
where an integer gets incremented from 0000 (0.0.0.0) to 3021
(3.0.2.1). So 0000 gets increments to 0001, 0002, up to 0009,
then on to 0010, 0011 up to 0099, then 0100 and so forth. It's
just that each "digit" goes from 0 to 508 (for a total of 509
pointers) rather than from 0 to 9.

Note that the dinode will only have 483 pointers due to the
dinode structure itself.

Also note: this is just an example. These numbers (509 and 483)
are based on a standard 4K block size. Smaller block sizes will
yield smaller numbers of indirect pointers accordingly.

The truncation process is accomplished with the help of two
major functions and a few helper functions.

Functions do_strip and recursive_scan are obsolete, so removed.

New function sweep_bh_for_rgrps cleans a buffer_head pointed to
by the given metapath and height. By cleaning, I mean it frees
all blocks starting at the offset passed in metapath. It starts
at the first block in the buffer pointed to by the metapath and
identifies its resource group (rgrp). From there it frees all
subsequent block pointers that lie within that rgrp. If it's
already inside a transaction, it stays within it as long as it
can. In other words, it doesn't close a transaction until it knows
it's freed what it can from the resource group. In this way,
multiple buffers may be cleaned in a single transaction, as long
as those blocks in the buffer all lie within the same rgrp.

If it's not in a transaction, it starts one. If the buffer_head
has references to blocks within multiple rgrps, it frees all the
blocks inside the first rgrp it finds, then closes the
transaction. Then it repeats the cycle: identifies the next
unfreed block, uses it to find its rgrp, then starts a new
transaction for that set. It repeats this process repeatedly
until the buffer_head contains no more references to any blocks
past the given metapath.

Function trunc_dealloc has been reworked into a finite state
automaton. It has basically 3 active states:
DEALLOC_MP_FULL, DEALLOC_MP_LOWER, and DEALLOC_FILL_MP:

The DEALLOC_MP_FULL state implies the metapath has a full set
of buffers out to the "shrink height", and therefore, it can
call function sweep_bh_for_rgrps to free the blocks within the
highest height of the metapath. If it's just swept the lowest
level (or an error has occurred) the state machine is ended.
Otherwise it proceeds to the DEALLOC_MP_LOWER state.

The DEALLOC_MP_LOWER state implies we are finished with a given
buffer_head, which may now be released, and therefore we are
then missing some buffer information from the metapath. So we
need to find more buffers to read in. In most cases, this is
just a matter of releasing the buffer_head and moving to the
next pointer from the previous height, so it may be read in and
swept as well. If it can't find another non-null pointer to
process, it checks whether it's reached the end of a height
and needs to lower the strip height, or whether it still needs
move forward through the previous height's metadata. In this
state, all zero-pointers are skipped. From this state, it can
only loop around (once more backing up another height) or,
once a valid metapath is found (one that has non-zero
pointers), proceed to state DEALLOC_FILL_MP.

The DEALLOC_FILL_MP state implies that we have a metapath
but not all its buffers are read in. So we must proceed to read
in buffer_heads until the metapath has a valid buffer for every
height. If the previous state backed us up 3 heights, we may
need to read in a buffer, increment the height, then repeat the
process until buffers have been read in for all required heights.
If it's successful reading a buffer, and it's at the highest
height we need, it proceeds back to the DEALLOC_MP_FULL state.
If it's unable to fill in a buffer, (encounters a hole, etc.)
it tries to find another non-zero block pointer. If they're all
zero, it lowers the height and returns to the DEALLOC_MP_LOWER
state. If it finds a good non-null pointer, it loops around and
reads it in, while keeping the metapath in lock-step with the
pointers it examines.

The state machine runs until the truncation request is
satisfied. Then any transactions are ended, the quota and
statfs data are updated, and the function is complete.

Helper function metaptr1 was introduced to be an easy way to
determine the start of a buffer_head's indirect pointers.

Helper function lookup_mp_height was introduced to find a
metapath index and read in the buffer that corresponds to it.
In this way, function lookup_metapath becomes a simple loop to
call it for every height.

Helper function fillup_metapath is similar to lookup_metapath
except it can do partial lookups. If the state machine
backed up multiple levels (like 2999 wrapping to 3000) it
needs to find out the next starting point and start issuing
metadata reads at that point.

Helper function hptrs is a shortcut to determine how many
pointers should be expected in a buffer. Height 0 is the dinode
which has fewer pointers than the others.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-04-19 08:25:43 -04:00
..
acl.c posix_acl: Clear SGID bit when setting file permissions 2016-09-22 10:55:32 +02:00
acl.h gfs2: Switch to generic xattr handlers 2016-05-12 22:28:05 -04:00
aops.c We've got eight GFS2 patches for this merge window: 2017-02-21 07:46:34 -08:00
bmap.c GFS2: Non-recursive delete 2017-04-19 08:25:43 -04:00
bmap.h GFS2: Clean up journal extent mapping 2014-03-03 13:50:12 +00:00
dentry.c gfs2: Lock holder cleanup 2016-06-27 09:47:09 -05:00
dir.c block,fs: untangle fs.h and blk_types.h 2016-11-01 09:43:26 -06:00
dir.h GFS2: Make rename not save dirent location 2014-10-01 14:06:15 +01:00
export.c gfs2: Get rid of gfs2_ilookup 2016-06-27 09:47:08 -05:00
file.c gfs2: Re-enable fallocate for the rindex 2017-04-05 11:45:26 -04:00
gfs2.h
glock.c gfs2: Switch to rhashtable_lookup_get_insert_fast 2017-04-03 09:14:41 -04:00
glock.h gfs2: Lock holder cleanup 2016-06-27 09:47:09 -05:00
glops.c GFS2: Get rid of dead code in inode_go_demote_ok 2016-04-05 11:59:18 -04:00
glops.h GFS2: update freeze code to use freeze/thaw_super on all nodes 2014-11-17 10:36:39 +00:00
incore.h gfs2: Don't pack struct lm_lockname 2017-03-16 09:58:49 -04:00
inode.c Revert "GFS2: Wait for iopen glock dequeues" 2017-04-03 09:14:47 -04:00
inode.h GFS2: use BIT() macro 2016-08-02 12:05:27 -05:00
Kconfig Finally eradicate CONFIG_HOTPLUG 2013-06-03 14:20:18 -07:00
lock_dlm.c sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
log.c We've got eight GFS2 patches for this merge window: 2017-02-21 07:46:34 -08:00
log.h GFS2: remove transaction glock 2014-05-14 10:04:34 +01:00
lops.c block: better op and flags encoding 2016-10-28 08:48:16 -06:00
lops.h gfs2: use bio op accessors 2016-06-07 13:41:38 -06:00
main.c gfs2: fix to detect failure of register_shrinker 2016-09-21 12:09:40 -05:00
Makefile
meta_io.c We've got eight GFS2 patches for this merge window: 2017-02-21 07:46:34 -08:00
meta_io.h GFS2: Refactor gfs2_remove_from_journal 2016-05-06 11:27:27 -05:00
ops_fstype.c for-4.11/linus-merge-signed 2017-02-21 10:57:33 -08:00
quota.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
quota.h GFS2: Make rgrp reservations part of the gfs2_inode structure 2015-12-14 12:16:38 -06:00
recovery.c GFS2: Fix gfs2_replay_incr_blk for multiple journal sizes 2016-07-21 13:02:44 -05:00
recovery.h GFS2: Fix gfs2_replay_incr_blk for multiple journal sizes 2016-07-21 13:02:44 -05:00
rgrp.c GFS2: Non-recursive delete 2017-04-19 08:25:43 -04:00
rgrp.h GFS2: Non-recursive delete 2017-04-19 08:25:43 -04:00
super.c Revert "GFS2: Wait for iopen glock dequeues" 2017-04-03 09:14:47 -04:00
super.h GFS2: update freeze code to use freeze/thaw_super on all nodes 2014-11-17 10:36:39 +00:00
sys.c sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
sys.h
trace_gfs2.h gfs2: Make statistics unsigned, suitable for use with do_div() 2015-09-03 13:33:32 -05:00
trans.c GFS2: Reduce contention on gfs2_log_lock 2017-01-30 12:10:25 -05:00
trans.h
util.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
util.h GFS2: Make rgrp reservations part of the gfs2_inode structure 2015-12-14 12:16:38 -06:00
xattr.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
xattr.h gfs2: Remove gfs2_xattr_acl_chmod 2015-12-06 21:25:17 -05:00