linux/fs
Filipe Manana 1a4bcf470c Btrfs: fix fsync data loss after adding hard link to inode
We have a scenario where after the fsync log replay we can lose file data
that had been previously fsync'ed if we added an hard link for our inode
and after that we sync'ed the fsync log (for example by fsync'ing some
other file or directory).

This is because when adding an hard link we updated the inode item in the
log tree with an i_size value of 0. At that point the new inode item was
in memory only and a subsequent fsync log replay would not make us lose
the file data. However if after adding the hard link we sync the log tree
to disk, by fsync'ing some other file or directory for example, we ended
up losing the file data after log replay, because the inode item in the
persisted log tree had an an i_size of zero.

This is easy to reproduce, and the following excerpt from my test for
xfstests shows this:

  _scratch_mkfs >> $seqres.full 2>&1
  _init_flakey
  _mount_flakey

  # Create one file with data and fsync it.
  # This made the btrfs fsync log persist the data and the inode metadata with
  # a correct inode->i_size (4096 bytes).
  $XFS_IO_PROG -f -c "pwrite -S 0xaa -b 4K 0 4K" -c "fsync" \
       $SCRATCH_MNT/foo | _filter_xfs_io

  # Now add one hard link to our file. This made the btrfs code update the fsync
  # log, in memory only, with an inode metadata having a size of 0.
  ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link

  # Now force persistence of the fsync log to disk, for example, by fsyncing some
  # other file.
  touch $SCRATCH_MNT/bar
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/bar

  # Before a power loss or crash, we could read the 4Kb of data from our file as
  # expected.
  echo "File content before:"
  od -t x1 $SCRATCH_MNT/foo

  # Simulate a crash/power loss.
  _load_flakey_table $FLAKEY_DROP_WRITES
  _unmount_flakey

  _load_flakey_table $FLAKEY_ALLOW_WRITES
  _mount_flakey

  # After the fsync log replay, because the fsync log had a value of 0 for our
  # inode's i_size, we couldn't read anymore the 4Kb of data that we previously
  # wrote and fsync'ed. The size of the file became 0 after the fsync log replay.
  echo "File content after:"
  od -t x1 $SCRATCH_MNT/foo

Another alternative test, that doesn't need to fsync an inode in the same
transaction it was created, is:

  _scratch_mkfs >> $seqres.full 2>&1
  _init_flakey
  _mount_flakey

  # Create our test file with some data.
  $XFS_IO_PROG -f -c "pwrite -S 0xaa -b 8K 0 8K" \
       $SCRATCH_MNT/foo | _filter_xfs_io

  # Make sure the file is durably persisted.
  sync

  # Append some data to our file, to increase its size.
  $XFS_IO_PROG -f -c "pwrite -S 0xcc -b 4K 8K 4K" \
       $SCRATCH_MNT/foo | _filter_xfs_io

  # Fsync the file, so from this point on if a crash/power failure happens, our
  # new data is guaranteed to be there next time the fs is mounted.
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo

  # Add one hard link to our file. This made btrfs write into the in memory fsync
  # log a special inode with generation 0 and an i_size of 0 too. Note that this
  # didn't update the inode in the fsync log on disk.
  ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link

  # Now make sure the in memory fsync log is durably persisted.
  # Creating and fsync'ing another file will do it.
  touch $SCRATCH_MNT/bar
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/bar

  # As expected, before the crash/power failure, we should be able to read the
  # 12Kb of file data.
  echo "File content before:"
  od -t x1 $SCRATCH_MNT/foo

  # Simulate a crash/power loss.
  _load_flakey_table $FLAKEY_DROP_WRITES
  _unmount_flakey

  _load_flakey_table $FLAKEY_ALLOW_WRITES
  _mount_flakey

  # After mounting the fs again, the fsync log was replayed.
  # The btrfs fsync log replay code didn't update the i_size of the persisted
  # inode because the inode item in the log had a special generation with a
  # value of 0 (and it couldn't know the correct i_size, since that inode item
  # had a 0 i_size too). This made the last 4Kb of file data inaccessible and
  # effectively lost.
  echo "File content after:"
  od -t x1 $SCRATCH_MNT/foo

This isn't a new issue/regression. This problem has been around since the
log tree code was added in 2008:

  Btrfs: Add a write ahead tree log to optimize synchronous operations
  (commit e02119d5a7)

Test cases for xfstests follow soon.

CC: <stable@vger.kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-02-14 08:22:49 -08:00
..
9p assorted conversions to %p[dD] 2014-11-19 13:01:20 -05:00
adfs adfs: add __printf verification, fix format/argument mismatches 2014-08-08 15:57:24 -07:00
affs fs/affs/file.c: remove obsolete pagesize check 2014-12-13 12:42:52 -08:00
afs Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-12-11 14:27:06 -08:00
autofs4 assorted conversions to %p[dD] 2014-11-19 13:01:20 -05:00
befs befs: remove dead code 2014-12-13 12:42:51 -08:00
bfs fs/bfs: use bfs prefix for dump_imap 2014-08-08 15:57:24 -07:00
btrfs Btrfs: fix fsync data loss after adding hard link to inode 2015-02-14 08:22:49 -08:00
cachefiles assorted conversions to %p[dD] 2014-11-19 13:01:20 -05:00
ceph ceph: use %zu for len in ceph_fill_inline_data() 2015-01-08 20:36:56 +03:00
cifs cifs: make new inode cache when file type is different 2014-12-22 14:16:21 -06:00
coda coda_venus_readdir(): use file_inode() 2014-12-11 16:28:12 -05:00
configfs assorted conversions to %p[dD] 2014-11-19 13:01:20 -05:00
cramfs fs/cramfs/inode.c: use linux/uaccess.h 2014-08-08 15:57:25 -07:00
debugfs Driver core patches for 3.19-rc1 2014-12-14 16:10:09 -08:00
devpts
dlm Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-12-10 16:10:49 -08:00
ecryptfs Fixes for filename decryption and encrypted view plus a cleanup 2014-12-19 18:15:12 -08:00
efivarfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-12-10 16:10:49 -08:00
efs fs/efs/namei.c: return is not a function 2014-08-08 15:57:18 -07:00
exofs Boaz Harrosh - Fix broken email address 2014-10-19 20:22:32 +03:00
exportfs move d_rcu from overlapping d_child to overlapping d_alias 2014-11-03 15:20:29 -05:00
ext2 ext2: Convert to private i_dquot field 2014-11-10 10:06:10 +01:00
ext3 ext3: Convert to private i_dquot field 2014-11-10 10:06:10 +01:00
ext4 Revert a potential seek_data/hole regression which shows up when using 2015-01-06 14:05:40 -08:00
f2fs f2fs: avoid to ra unneeded blocks in recover flow 2014-12-08 14:19:09 -08:00
fat fat: fix data past EOF resulting from fsx testsuite 2014-12-13 12:42:51 -08:00
freevxfs
fscache fs/fscache/object-list.c: use __seq_open_private() 2014-10-13 17:52:21 +01:00
fuse fuse: add memory barrier to INIT 2015-01-06 10:45:35 +01:00
gfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-12-10 16:10:49 -08:00
hfs fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp 2014-12-10 17:41:16 -08:00
hfsplus hfsplus: fix longname handling 2014-12-18 19:08:10 -08:00
hostfs hostfs: support rename flags 2014-08-07 14:40:09 -04:00
hpfs fs/hpfs/dnode.c: fix suspect code indent 2014-08-08 15:57:22 -07:00
hppfs vfs: make first argument of dir_context.actor typed 2014-10-31 17:48:54 -04:00
hugetlbfs mm: convert i_mmap_mutex to rwsem 2014-12-13 12:42:45 -08:00
isofs isofs: Fix unchecked printing of ER records 2014-12-19 11:29:24 +01:00
jbd jbd: Deletion of an unnecessary check before the function call "iput" 2014-11-18 10:15:29 +01:00
jbd2 Lots of bugs fixes, including Zheng and Jan's extent status shrinker 2014-12-12 09:28:03 -08:00
jffs2 jffs2: Drop bogus if in comment 2014-11-28 18:23:44 -08:00
jfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-12-10 16:10:49 -08:00
kernfs kernfs: Fix kernfs_name_compare 2015-01-09 15:51:08 -08:00
lockd LOCKD: Fix a race when initialising nlmsvc_timeout 2015-01-05 19:40:53 -08:00
logfs fs/logfs/readwrite.c: kernel-doc warning fixes 2014-08-06 18:01:12 -07:00
minix minix zmap block counts calculation fix 2014-08-08 15:57:20 -07:00
ncpfs Merge branch 'akpm' (patchbomb from Andrew) 2014-12-10 18:34:42 -08:00
nfs NFSv4: Remove incorrect check in can_open_delegated() 2015-01-05 19:40:54 -08:00
nfs_common lockd: move lockd's grace period handling into its own module 2014-09-17 16:33:11 -04:00
nfsd nfsd: fix fi_delegees leak when fi_had_conflict returns true 2015-01-07 13:38:21 -05:00
nilfs2 nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races 2014-12-10 17:41:16 -08:00
nls
notify sched, fanotify: Deal with nested sleeps 2015-01-09 11:18:12 +01:00
ntfs assorted conversions to %p[dD] 2014-11-19 13:01:20 -05:00
ocfs2 ocfs2: fix the wrong directory passed to ocfs2_lookup_ino_from_name() when link file 2015-01-08 15:10:51 -08:00
omfs FS/OMFS: block number sanity check during fill_super operation 2014-10-14 02:18:22 +02:00
openpromfs
overlayfs Merge branch 'iov_iter' into for-next 2014-12-08 20:39:29 -05:00
proc Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-12-19 13:26:08 -08:00
pstore Driver core patches for 3.19-rc1 2014-12-14 16:10:09 -08:00
qnx4
qnx6 fs/qnx6: update debugging to current functions 2014-08-08 15:57:26 -07:00
quota vfs: Remove i_dquot field from inode 2014-11-10 10:06:18 +01:00
ramfs fs/ramfs/file-nommu.c: replace count*size kzalloc by kcalloc 2014-08-08 15:57:18 -07:00
reiserfs Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2014-12-16 15:46:01 -08:00
romfs fs/romfs/super.c: add blank line after declarations 2014-08-08 15:57:25 -07:00
squashfs Squashfs: Add LZ4 compression configuration option 2014-11-27 18:48:44 +00:00
sysfs sysfs/kernfs: make read requests on pre-alloc files use the buffer. 2014-11-07 10:54:38 -08:00
sysv
ubifs UBIFS: fix a couple bugs in UBIFS xattr length calculation 2014-11-07 12:32:22 +02:00
udf udf: Reduce repeated dereferences 2014-12-21 22:42:37 +01:00
ufs fs/ufs/balloc.c: remove unused variable 2014-10-14 02:18:20 +02:00
xfs xfs: update for 3.19-rc1 2014-12-12 09:48:17 -08:00
aio.c aio: Skip timer for io_getevents if timeout=0 2014-12-13 17:50:20 -05:00
anon_inodes.c
attr.c
bad_inode.c bad_inode: add ->rename2() 2014-08-07 14:40:09 -04:00
binfmt_aout.c assorted conversions to %p[dD] 2014-11-19 13:01:20 -05:00
binfmt_elf_fdpic.c handle suicide on late failure exits in execve() in search_binary_handler() 2014-10-09 02:39:00 -04:00
binfmt_elf.c Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2014-12-11 17:56:37 -08:00
binfmt_em86.c syscalls: implement execveat() system call 2014-12-13 12:42:51 -08:00
binfmt_flat.c
binfmt_misc.c unfuck binfmt_misc.c (broken by commit e6084d4) 2014-12-17 08:27:14 -05:00
binfmt_script.c syscalls: implement execveat() system call 2014-12-13 12:42:51 -08:00
binfmt_som.c
block_dev.c fs: add freeze_super/thaw_super fs hooks 2014-11-17 10:35:17 +00:00
buffer.c fs: clarify rate limit suppressed buffer I/O errors 2014-10-21 13:55:11 -06:00
char_dev.c fs/char_dev.c: remove pointless assignment from __register_chrdev_region() 2014-12-10 17:41:04 -08:00
compat_binfmt_elf.c
compat_ioctl.c
compat.c vfs: make first argument of dir_context.actor typed 2014-10-31 17:48:54 -04:00
coredump.c coredump: add %i/%I in core_pattern to report the tid of the crashed thread 2014-10-14 02:18:21 +02:00
dcache.c Merge branch 'iov_iter' into for-next 2014-12-08 20:39:29 -05:00
dcookies.c
direct-io.c fuse: honour max_read and max_write in direct_io mode 2014-09-26 21:16:51 -04:00
drop_caches.c mm: vmscan: invoke slab shrinkers from shrink_zone() 2014-12-13 12:42:48 -08:00
eventfd.c fs: Convert show_fdinfo functions to void 2014-11-05 14:13:23 -05:00
eventpoll.c fs: Convert show_fdinfo functions to void 2014-11-05 14:13:23 -05:00
exec.c syscalls: implement execveat() system call 2014-12-13 12:42:51 -08:00
fcntl.c vfs: renumber FMODE_NONOTIFY and add to uniqueness check 2015-01-08 15:10:52 -08:00
fhandle.c
file_table.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-10-13 11:28:42 +02:00
file.c fs/file.c: replace get_unused_fd() with get_unused_fd_flags(0) 2014-12-10 17:41:10 -08:00
filesystems.c
fs_pin.c make fs/{namespace,super}.c forget about acct.h 2014-08-07 14:40:09 -04:00
fs_struct.c
fs-writeback.c writeback: fix a subtle race condition in I_DIRTY clearing 2014-11-04 10:42:23 -07:00
inode.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-12-16 15:53:03 -08:00
internal.h take the targets of /proc/*/ns/* symlinks to separate fs 2014-12-10 21:30:20 -05:00
ioctl.c Merge branch 'for-3.19' of git://linux-nfs.org/~bfields/linux 2014-12-16 15:25:31 -08:00
Kconfig overlay filesystem 2014-10-24 00:14:38 +02:00
Kconfig.binfmt binfmt_elf: allow arch code to examine PT_LOPROC ... PT_HIPROC headers 2014-11-24 07:45:02 +01:00
libfs.c move d_rcu from overlapping d_child to overlapping d_alias 2014-11-03 15:20:29 -05:00
locks.c locks: fix NULL-deref in generic_delete_lease 2015-01-13 07:00:55 -05:00
Makefile Merge branch 'nsfs' into for-next 2014-12-10 21:31:59 -05:00
mbcache.c
mount.h common object embedded into various struct ....ns 2014-12-04 14:31:00 -05:00
mpage.c vfs: guard end of device for mpage interface 2014-10-09 22:25:53 -04:00
namei.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-12-16 15:53:03 -08:00
namespace.c mnt: Fix a memory stomp in umount 2014-12-18 11:22:02 -08:00
no-block.c
nsfs.c take the targets of /proc/*/ns/* symlinks to separate fs 2014-12-10 21:30:20 -05:00
open.c Merge branch 'for-3.19' of git://linux-nfs.org/~bfields/linux 2014-12-16 15:25:31 -08:00
pipe.c
pnode.c mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers. 2014-12-02 10:46:50 -06:00
pnode.h
posix_acl.c
proc_namespace.c vfs: make mounts and mountstats honor root dir like mountinfo does 2014-12-17 08:27:15 -05:00
read_write.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2014-12-14 20:36:37 -08:00
readdir.c vfs: make first argument of dir_context.actor typed 2014-10-31 17:48:54 -04:00
select.c
seq_file.c fs, seq_file: fallback to vmalloc instead of oom kill processes 2014-12-13 12:42:49 -08:00
signalfd.c fs: Convert show_fdinfo functions to void 2014-11-05 14:13:23 -05:00
splice.c vfs: export do_splice_direct() to modules 2014-10-24 00:14:35 +02:00
stack.c fs: fix comment for 'CONFIG_LBADF' 2014-08-26 09:35:56 +02:00
stat.c
statfs.c
super.c vfs: Remove i_dquot field from inode 2014-11-10 10:06:18 +01:00
sync.c kill f_dentry uses 2014-11-19 13:01:25 -05:00
timerfd.c fs: Convert show_fdinfo functions to void 2014-11-05 14:13:23 -05:00
utimes.c
xattr.c new helper: audit_file() 2014-11-19 13:01:26 -05:00