/*
 *  linux/fs/ext4/fsync.c
 *
 *  Copyright (C) 1993  Stephen Tweedie (sct@redhat.com)
 *  from
 *  Copyright (C) 1992  Remy Card (card@masi.ibp.fr)
 *                      Laboratoire MASI - Institut Blaise Pascal
 *                      Universite Pierre et Marie Curie (Paris VI)
 *  from
 *  linux/fs/minix/truncate.c   Copyright (C) 1991, 1992  Linus Torvalds
 *
 *  ext4fs fsync primitive
 *
 *  Big-endian to little-endian byte-swapping/bitmaps by
 *        David S. Miller (davem@caip.rutgers.edu), 1995
 *
 *  Removed unnecessary code duplication for little endian machines
 *  and excessive __inline__s.
 *        Andi Kleen, 1997
 *
 * Major simplifications and cleanup - we only need to do the metadata, because
 * we can depend on generic_block_fdatasync() to sync the data blocks.
 */

#include <linux/time.h>
#include <linux/fs.h>
#include <linux/sched.h>
#include <linux/writeback.h>
#include <linux/blkdev.h>

#include "ext4.h"
#include "ext4_jbd2.h"

#include <trace/events/ext4.h>

/*
 * If we're not journaling and this is a just-created file, we have to
 * sync our parent directory (if it was freshly created) since
 * otherwise it will only be written by writeback, leaving a huge
 * window during which a crash may lose the file.  This may apply for
 * the parent directory's parent as well, and so on recursively, if
 * they are also freshly created.
 */
ext4: sync the directory inode in ext4_sync_parent()
ext4 has taken the stance that, in the absence of a journal,
when an fsync/fdatasync of an inode is done, the parent
directory should be sync'ed if this inode entry is new.
ext4_sync_parent(), which implements this, does indeed sync
the dirent pages for parent directories, but it does not
sync the directory *inode*. This patch fixes this.
Also now return error status from ext4_sync_parent().
I tested this using a power fail test, which panics a
machine running a file server getting requests from a
client. Without this patch, on about every other test run,
the server is missing many, many files that had been synced.
With this patch, on > 6 runs, I see zero files being lost.
Google-Bug-Id: 4179519
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-04-10 22:05:31 -04:00
static int ext4_sync_parent(struct inode *inode)
{
	struct dentry *dentry = NULL;
	struct inode *next;
	int ret = 0;

	if (!ext4_test_inode_state(inode, EXT4_STATE_NEWENTRY))
		return 0;
	inode = igrab(inode);
	while (ext4_test_inode_state(inode, EXT4_STATE_NEWENTRY)) {
		ext4_clear_inode_state(inode, EXT4_STATE_NEWENTRY);
		dentry = d_find_any_alias(inode);
		if (!dentry)
			break;
		next = igrab(d_inode(dentry->d_parent));
		dput(dentry);
		if (!next)
			break;
		iput(inode);
		inode = next;
		/*
		 * The directory inode may have gone through rmdir by now. But
		 * the inode itself and its blocks are still allocated (we hold
		 * a reference to the inode so it didn't go through
		 * ext4_evict_inode()) and so we are safe to flush metadata
		 * blocks and the inode.
		 */
		ret = sync_mapping_buffers(inode->i_mapping);
		if (ret)
			break;
		ret = sync_inode_metadata(inode, 1);
		if (ret)
			break;
	}
	iput(inode);
	return ret;
}
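
The comment above ext4_sync_parent() explains why a freshly created file is only safe once its parent directory has been written too. The kernel does this walk itself when there is no journal; portable userspace code that cannot rely on that usually syncs the directory explicitly. A minimal userspace sketch of that pattern follows. It is an analogue, not the kernel path: durable_create() is a hypothetical helper name, and it handles a single directory level only, without the recursive walk up freshly created ancestors.

```c
#include <fcntl.h>
#include <libgen.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of ext4): create/overwrite path with buf,
 * fsync the file, then fsync the directory that holds its entry.
 * Returns 0 on success, -1 on any failure.
 */
int durable_create(const char *path, const void *buf, size_t len)
{
	char dirbuf[4096];
	int fd, dfd, ret = -1;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;

	/* A short write is treated as a failure to keep the sketch small. */
	if (write(fd, buf, len) != (ssize_t)len)
		goto out_file;

	/* Flush the file's data and its own inode. */
	if (fsync(fd) < 0)
		goto out_file;

	/* dirname() may modify its argument, so work on a copy. */
	if (strlen(path) >= sizeof(dirbuf))
		goto out_file;
	strcpy(dirbuf, path);

	/* Flush the parent directory so the new entry itself is durable. */
	dfd = open(dirname(dirbuf), O_RDONLY);
	if (dfd < 0)
		goto out_file;
	if (fsync(dfd) == 0)
		ret = 0;
	close(dfd);

out_file:
	close(fd);
	return ret;
}
```

A caller would use it as `durable_create("/some/dir/file", data, size)` and treat a return value of -1 as "not known to be on disk".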

/*
 * akpm: A new design for ext4_sync_file().
 *
 * This is only called from sys_fsync(), sys_fdatasync() and sys_msync().
 * There cannot be a transaction open by this task.
 * Another task could have dirtied this inode.  Its data can be in any
 * state in the journalling system.
 *
 * What we do is just kick off a commit and wait on it.  This will snapshot the
 * inode to disk.
 */

int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
{
	struct inode *inode = file->f_mapping->host;
	struct ext4_inode_info *ei = EXT4_I(inode);
	journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
	int ret = 0, err;
	tid_t commit_tid;
	bool needs_barrier = false;

	J_ASSERT(ext4_journal_current_handle() == NULL);

	trace_ext4_sync_file_enter(file, datasync);

	if (inode->i_sb->s_flags & MS_RDONLY) {
		/* Make sure that we read updated s_mount_flags value */
		smp_rmb();
		if (EXT4_SB(inode->i_sb)->s_mount_flags & EXT4_MF_FS_ABORTED)
			ret = -EROFS;
		goto out;
	}

	if (!journal) {
		ret = __generic_file_fsync(file, start, end, datasync);
		if (!ret)
			ret = ext4_sync_parent(inode);
		if (test_opt(inode->i_sb, BARRIER))
			goto issue_flush;
		goto out;
	}

	ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
	if (ret)
		return ret;
	/*
	 * data=writeback,ordered:
	 *  The caller's filemap_fdatawrite()/wait will sync the data.
	 *  Metadata is in the journal, we wait for proper transaction to
	 *  commit here.
	 *
	 * data=journal:
	 *  filemap_fdatawrite won't do anything (the buffers are clean).
	 *  ext4_force_commit will write the file data into the journal and
	 *  will wait on that.
	 *  filemap_fdatawait() will encounter a ton of newly-dirtied pages
	 *  (they were dirtied by commit).  But that's OK - the blocks are
	 *  safe in-journal, which is all fsync() needs to ensure.
	 */
	if (ext4_should_journal_data(inode)) {
		ret = ext4_force_commit(inode->i_sb);
		goto out;
	}

	commit_tid = datasync ? ei->i_datasync_tid : ei->i_sync_tid;
	if (journal->j_flags & JBD2_BARRIER &&
	    !jbd2_trans_will_send_data_barrier(journal, commit_tid))
		needs_barrier = true;
ext4/jbd2: don't wait (forever) for stale tid caused by wraparound
In the case where an inode has a very stale transaction id (tid) in
i_datasync_tid or i_sync_tid, it's possible that after a very large
(2**31) number of transactions, that the tid number space might wrap,
causing tid_geq()'s calculations to fail.
Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
attempted to fix this problem, but it only avoided kjournald spinning
forever by fixing the logic in jbd2_log_start_commit().
Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
that might call jbd2_log_start_commit() with a stale tid, those
functions will subsequently call jbd2_log_wait_commit() with the same
stale tid, and then wait for a very long time. To fix this, we
replace the calls to jbd2_log_start_commit() and
jbd2_log_wait_commit() with a call to a new function,
jbd2_complete_transaction(), which will correctly handle stale tid's.
As a bonus, jbd2_complete_transaction() will avoid locking
j_state_lock for writing unless a commit needs to be started. This
should have a small (but probably not measurable) improvement for
ext4's scalability.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Reported-by: George Barnett <gbarnett@atlassian.com>
Cc: stable@vger.kernel.org
2013-04-03 22:02:52 -04:00
	ret = jbd2_complete_transaction(journal, commit_tid);
	if (needs_barrier) {
	issue_flush:
		err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
		if (!ret)
			ret = err;
	}
out:
	trace_ext4_sync_file_exit(inode, ret);
	return ret;
}
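
The commit message attached to the jbd2_complete_transaction() call says that a tid left unused for about 2**31 transactions makes tid_geq()'s calculation fail. The sketch below illustrates why, using a userspace comparison modelled on the signed-difference technique that message refers to. The names demo_tid_t and demo_tid_geq() are hypothetical stand-ins, not the kernel's definitions, and the cast assumes the usual two's-complement conversion.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit transaction id that wraps around. */
typedef uint32_t demo_tid_t;

/*
 * Returns nonzero when x is at or after y in wraparound order.
 * The unsigned subtraction wraps modulo 2^32; reading the result as a
 * signed value gives the right answer only while the two ids are less
 * than 2^31 apart.
 */
int demo_tid_geq(demo_tid_t x, demo_tid_t y)
{
	int32_t difference = (int32_t)(x - y);

	return difference >= 0;
}
```

With ids close together the comparison survives the wrap: `demo_tid_geq(3, 0xFFFFFFFE)` is true, because 3 comes five transactions after 0xFFFFFFFE. Once a stored id falls 2**31 behind, the sign flips: `demo_tid_geq(0x80000005, 5)` is false, so the stale id 5 looks as if it has not been reached yet, which is the case a waiter on that id would stall on.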