fs: fix lazytime expiration handling in __writeback_single_inode()
When lazytime is enabled and an inode is being written due to its in-memory updated timestamps having expired, either due to a sync() or syncfs() system call or due to dirtytime_expire_interval having elapsed, the VFS needs to inform the filesystem so that the filesystem can copy the inode's timestamps out to the on-disk data structures. This is done by __writeback_single_inode() calling mark_inode_dirty_sync(), which then calls ->dirty_inode(I_DIRTY_SYNC). However, this occurs after __writeback_single_inode() has already cleared the dirty flags from ->i_state. This causes two bugs: - mark_inode_dirty_sync() redirties the inode, causing it to remain dirty. This wastefully causes the inode to be written twice. But more importantly, it breaks cases where sync_filesystem() is expected to clean dirty inodes. This includes the FS_IOC_REMOVE_ENCRYPTION_KEY ioctl (as reported at https://lore.kernel.org/r/20200306004555.GB225345@gmail.com), as well as possibly filesystem freezing (freeze_super()). - Since ->i_state doesn't contain I_DIRTY_TIME when ->dirty_inode() is called from __writeback_single_inode() for lazytime expiration, xfs_fs_dirty_inode() ignores the notification. (XFS only cares about lazytime expirations, and it assumes that i_state will contain I_DIRTY_TIME during those.) Therefore, lazy timestamps aren't persisted by sync(), syncfs(), or dirtytime_expire_interval on XFS. Fix this by moving the call to mark_inode_dirty_sync() to earlier in __writeback_single_inode(), before the dirty flags are cleared from i_state. This makes filesystems be properly notified of the timestamp expiration, and it avoids incorrectly redirtying the inode. This fixes xfstest generic/580 (which tests FS_IOC_REMOVE_ENCRYPTION_KEY) when run on ext4 or f2fs with lazytime enabled. It also fixes the new lazytime xfstest I've proposed, which reproduces the above-mentioned XFS bug (https://lore.kernel.org/r/20210105005818.92978-1-ebiggers@kernel.org). Alternatively, we could call ->dirty_inode(I_DIRTY_SYNC) directly. But due to the introduction of I_SYNC_QUEUED, mark_inode_dirty_sync() is the right thing to do because mark_inode_dirty_sync() now knows not to move the inode to a writeback list if it is currently queued for sync. Fixes:0ae45f63d4
("vfs: add support for a lazytime mount option") Cc: stable@vger.kernel.org Depends-on:5afced3bf2
("writeback: Avoid skipping inode writeback") Link: https://lore.kernel.org/r/20210112190253.64307-2-ebiggers@kernel.org Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jan Kara <jack@suse.cz>
This commit is contained in:
parent
e609571b5f
commit
1e249cb5b7
@ -1473,22 +1473,26 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
|
|||||||
ret = err;
|
ret = err;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* If the inode has dirty timestamps and we need to write them, call
|
||||||
|
* mark_inode_dirty_sync() to notify the filesystem about it and to
|
||||||
|
* change I_DIRTY_TIME into I_DIRTY_SYNC.
|
||||||
|
*/
|
||||||
|
if ((inode->i_state & I_DIRTY_TIME) &&
|
||||||
|
(wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
|
||||||
|
time_after(jiffies, inode->dirtied_time_when +
|
||||||
|
dirtytime_expire_interval * HZ))) {
|
||||||
|
trace_writeback_lazytime(inode);
|
||||||
|
mark_inode_dirty_sync(inode);
|
||||||
|
}
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Some filesystems may redirty the inode during the writeback
|
* Some filesystems may redirty the inode during the writeback
|
||||||
* due to delalloc, clear dirty metadata flags right before
|
* due to delalloc, clear dirty metadata flags right before
|
||||||
* write_inode()
|
* write_inode()
|
||||||
*/
|
*/
|
||||||
spin_lock(&inode->i_lock);
|
spin_lock(&inode->i_lock);
|
||||||
|
|
||||||
dirty = inode->i_state & I_DIRTY;
|
dirty = inode->i_state & I_DIRTY;
|
||||||
if ((inode->i_state & I_DIRTY_TIME) &&
|
|
||||||
((dirty & I_DIRTY_INODE) ||
|
|
||||||
wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
|
|
||||||
time_after(jiffies, inode->dirtied_time_when +
|
|
||||||
dirtytime_expire_interval * HZ))) {
|
|
||||||
dirty |= I_DIRTY_TIME;
|
|
||||||
trace_writeback_lazytime(inode);
|
|
||||||
}
|
|
||||||
inode->i_state &= ~dirty;
|
inode->i_state &= ~dirty;
|
||||||
|
|
||||||
/*
|
/*
|
||||||
@ -1509,8 +1513,6 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
|
|||||||
|
|
||||||
spin_unlock(&inode->i_lock);
|
spin_unlock(&inode->i_lock);
|
||||||
|
|
||||||
if (dirty & I_DIRTY_TIME)
|
|
||||||
mark_inode_dirty_sync(inode);
|
|
||||||
/* Don't write the inode if only I_DIRTY_PAGES was set */
|
/* Don't write the inode if only I_DIRTY_PAGES was set */
|
||||||
if (dirty & ~I_DIRTY_PAGES) {
|
if (dirty & ~I_DIRTY_PAGES) {
|
||||||
int err = write_inode(inode, wbc);
|
int err = write_inode(inode, wbc);
|
||||||
|
Loading…
Reference in New Issue
Block a user