40356 Commits

Author SHA1 Message Date
Dongsheng Yang
d3001ed3a8 btrfs: qgroup: update limit info in function btrfs_run_qgroups().
When we commit_transaction(), qgroups in btree should be updated.
But, limit info is not considered currently. It will cause a problem
when a qgroup of a snapshot inherit the limit info from srcqgroup,
then there is an inconsistency.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:52:42 -07:00
Dongsheng Yang
1510e71c62 btrfs: qgroup: consolidate the parameter of fucntion update_qgroup_limit_item().
Cleanup: Change the parameter of update_qgroup_limit_item() to the family of
update_qgroup_xxx_item().

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:52:41 -07:00
Dongsheng Yang
e8c8541ac3 btrfs: qgroup: update qgroup in memory at the same time when we update it in btree.
When we call btrfs_qgroup_inherit() with BTRFS_QGROUP_INHERIT_SET_LIMITS,
btrfs will update the limit info of qgroup in btree but forget to update
the qgroup in rbtree at the same time. It obviousely will cause an inconsistency.

This patch fix it by updating the rbtree at the same time.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:52:40 -07:00
Dongsheng Yang
3eeb4d597e btrfs: qgroup: inherit limit info from srcgroup in creating snapshot.
Currently, when we snapshot a subvol, snapshot will not copy the limits
from srcqgroup.

This patch make the qgroup in snapshot inherit the limit info when create
a snapshot.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:52:38 -07:00
Zhao Lei
c99f1b0c6c btrfs: Support busy loop of write and delete
Reproduce:
 while true; do
   dd if=/dev/zero of=/mnt/btrfs/file count=[75% fs_size]
   rm /mnt/btrfs/file
 done
 Then we can see above loop failed on NO_SPACE.

It it long-term problem since very beginning, because delayed-iput
after rm are not run.

We already have commit_transaction() in alloc_space code, but it is
not triggered in above case.
This patch trigger commit_transaction() to run delayed-iput and
reflash pinned-space to to make write success.

It is based on previous fix of delayed-iput in commit_transaction(),
need to be applied on top of:
btrfs: Fix NO_SPACE bug caused by delayed-iput

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:31:10 -07:00
Zhao Lei
d7c151717a btrfs: Fix NO_SPACE bug caused by delayed-iput
Steps to reproduce:
  while true; do
    dd if=/dev/zero of=/btrfs_dir/file count=[fs_size * 75%]
    rm /btrfs_dir/file
    sync
  done

  And we'll see dd failed because btrfs return NO_SPACE.

Reason:
  Normally, btrfs_commit_transaction() call btrfs_run_delayed_iputs()
  in end to free fs space for next write, but sometimes it hadn't
  done work on time, because btrfs-cleaner thread get delayed-iputs
  from list before, but do iput() after next write.

  This is log:
  [ 2569.050776] comm=btrfs-cleaner func=btrfs_evict_inode() begin

  [ 2569.084280] comm=sync func=btrfs_commit_transaction() call btrfs_run_delayed_iputs()
  [ 2569.085418] comm=sync func=btrfs_commit_transaction() done btrfs_run_delayed_iputs()
  [ 2569.087554] comm=sync func=btrfs_commit_transaction() end

  [ 2569.191081] comm=dd begin
  [ 2569.790112] comm=dd func=__btrfs_buffered_write() ret=-28

  [ 2569.847479] comm=btrfs-cleaner func=add_pinned_bytes() 0 + 32677888 = 32677888
  [ 2569.849530] comm=btrfs-cleaner func=add_pinned_bytes() 32677888 + 23834624 = 56512512
  ...
  [ 2569.903893] comm=btrfs-cleaner func=add_pinned_bytes() 943976448 + 21762048 = 965738496
  [ 2569.908270] comm=btrfs-cleaner func=btrfs_evict_inode() end

Fix:
  Make btrfs_commit_transaction() wait current running btrfs-cleaner's
  delayed-iputs() done in end.

Test:
  Use script similar to above(more complex),
  before patch:
    7 failed in 100 * 20 loop.
  after patch:
    0 failed in 100 * 20 loop.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:27:41 -07:00
Zhao Lei
18d018ad2c btrfs: add WARN_ON() to check is space_info op current
space_info's value calculation is some complex and easy to cause
bug, add WARN_ON() to help debug.

Changelog v1->v2:
 Put WARN_ON()s under the ENOSPC_DEBUG mount option.
 Suggested by: David Sterba <dsterba@suse.cz>

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:27:35 -07:00
Zhao Lei
c30666d466 btrfs: Set relative data on clear btrfs_block_group_cache->pinned
Bug1:
  space_info->bytes_readonly was set to very large(negative) value in
  btrfs_remove_block_group().

Reason:
  Current code set block_group_cache->pinned = 0 in btrfs_delete_unused_bgs(),
  but above space was not counted to space_info->bytes_readonly.

  Then in btrfs_remove_block_group():
    block_group->space_info->bytes_readonly -= block_group->key.offset;
  We can see following value in trace:
    btrfs_remove_block_group: pid=2677 comm=btrfs-cleaner WARNING: bytes_readonly=12582912, key.offset=134217728

Bug2:
  space_info->total_bytes_pinned grow to value larger than fs size.
  In a 1.2G fs, we can get following trace log:
  at first:
    ZL_DEBUG: add_pinned_bytes: pid=2710 comm=sync change total_bytes_pinned flags=1 869793792 + 95944704 = 965738496
  after some op:
    ZL_DEBUG: add_pinned_bytes: pid=2770 comm=sync change total_bytes_pinned flags=1 1780178944 + 95944704 = 1876123648
  after some op:
    ZL_DEBUG: add_pinned_bytes: pid=3193 comm=sync change total_bytes_pinned flags=1 2924568576 + 95551488 = 3020120064
  ...

Reason:
  Similar to bug1, we also need to adjust space_info->total_bytes_pinned
  in above code block.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:27:29 -07:00
Zhao Lei
264ca0f60b btrfs: Adjust commit-transaction condition to avoid NO_SPACE more
If we have any chance to make a successful write, we should not give up.

This patch adjust commit-transaction condition from:
  pinned >= wanted
to
  left + pinned >= wanted

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:27:24 -07:00
Zhao Lei
f2ab76188e btrfs: Fix tail space processing in find_free_dev_extent()
It is another reason for NO_SPACE case.

When we found enough free space in loop and saved them to
max_hole_start/size before, and tail space contains pending extent,
origional innocent max_hole_start/size are reset in retry.

As a result, find_free_dev_extent() returns less space than it can,
and cause NO_SPACE in user program.

Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:27:18 -07:00
Zhao Lei
94b947b2f3 btrfs: fix condition of commit transaction
Old code bypass commit transaction when we don't have enough
pinned space, but another case is there exist freed bgs in current
transction, it have possibility to make alloc_chunk success.

This patch modify the condition to:
if (have_free_bg || have_pinned_space) commit_transaction()

Confirmed above action by printk before and after patch.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:26:40 -07:00
Chris Mason
de249e66a7 Btrfs: fix uninit variable in clone ioctl
Commit 0d97a64e0 creates a new variable but doesn't always set it up.
This puts it back to the original method (key.offset + 1) for the cases
not covered by Filipe's new logic.

Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:03:58 -07:00
Filipe Manana
ccccf3d672 Btrfs: fix inode eviction infinite loop after cloning into it
If we attempt to clone a 0 length region into a file we can end up
inserting a range in the inode's extent_io tree with a start offset
that is greater then the end offset, which triggers immediately the
following warning:

[ 3914.619057] WARNING: CPU: 17 PID: 4199 at fs/btrfs/extent_io.c:435 insert_state+0x4b/0x10b [btrfs]()
[ 3914.620886] BTRFS: end < start 4095 4096
(...)
[ 3914.638093] Call Trace:
[ 3914.638636]  [<ffffffff81425fd9>] dump_stack+0x4c/0x65
[ 3914.639620]  [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
[ 3914.640789]  [<ffffffffa03ca44f>] ? insert_state+0x4b/0x10b [btrfs]
[ 3914.642041]  [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48
[ 3914.643236]  [<ffffffffa03ca44f>] insert_state+0x4b/0x10b [btrfs]
[ 3914.644441]  [<ffffffffa03ca729>] __set_extent_bit+0x107/0x3f4 [btrfs]
[ 3914.645711]  [<ffffffffa03cb256>] lock_extent_bits+0x65/0x1bf [btrfs]
[ 3914.646914]  [<ffffffff8142b2fb>] ? _raw_spin_unlock+0x28/0x33
[ 3914.648058]  [<ffffffffa03cbac4>] ? test_range_bit+0xcc/0xde [btrfs]
[ 3914.650105]  [<ffffffffa03cb3c3>] lock_extent+0x13/0x15 [btrfs]
[ 3914.651361]  [<ffffffffa03db39e>] lock_extent_range+0x3d/0xcd [btrfs]
[ 3914.652761]  [<ffffffffa03de1fe>] btrfs_ioctl_clone+0x278/0x388 [btrfs]
[ 3914.654128]  [<ffffffff811226dd>] ? might_fault+0x58/0xb5
[ 3914.655320]  [<ffffffffa03e0909>] btrfs_ioctl+0xb51/0x2195 [btrfs]
(...)
[ 3914.669271] ---[ end trace 14843d3e2e622fc1 ]---

This later makes the inode eviction handler enter an infinite loop that
keeps dumping the following warning over and over:

[ 3915.117629] WARNING: CPU: 22 PID: 4228 at fs/btrfs/extent_io.c:435 insert_state+0x4b/0x10b [btrfs]()
[ 3915.119913] BTRFS: end < start 4095 4096
(...)
[ 3915.137394] Call Trace:
[ 3915.137913]  [<ffffffff81425fd9>] dump_stack+0x4c/0x65
[ 3915.139154]  [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
[ 3915.140316]  [<ffffffffa03ca44f>] ? insert_state+0x4b/0x10b [btrfs]
[ 3915.141505]  [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48
[ 3915.142709]  [<ffffffffa03ca44f>] insert_state+0x4b/0x10b [btrfs]
[ 3915.143849]  [<ffffffffa03ca729>] __set_extent_bit+0x107/0x3f4 [btrfs]
[ 3915.145120]  [<ffffffffa038c1e3>] ? btrfs_kill_super+0x17/0x23 [btrfs]
[ 3915.146352]  [<ffffffff811548f6>] ? deactivate_locked_super+0x3b/0x50
[ 3915.147565]  [<ffffffffa03cb256>] lock_extent_bits+0x65/0x1bf [btrfs]
[ 3915.148785]  [<ffffffff8142b7e2>] ? _raw_write_unlock+0x28/0x33
[ 3915.149931]  [<ffffffffa03bc325>] btrfs_evict_inode+0x196/0x482 [btrfs]
[ 3915.151154]  [<ffffffff81168904>] evict+0xa0/0x148
[ 3915.152094]  [<ffffffff811689e5>] dispose_list+0x39/0x43
[ 3915.153081]  [<ffffffff81169564>] evict_inodes+0xdc/0xeb
[ 3915.154062]  [<ffffffff81154418>] generic_shutdown_super+0x49/0xef
[ 3915.155193]  [<ffffffff811546d1>] kill_anon_super+0x13/0x1e
[ 3915.156274]  [<ffffffffa038c1e3>] btrfs_kill_super+0x17/0x23 [btrfs]
(...)
[ 3915.167404] ---[ end trace 14843d3e2e622fc2 ]---

So just bail out of the clone ioctl if the length of the region to clone
is zero, without locking any extent range, in order to prevent this issue
(same behaviour as a pwrite with a 0 length for example).

This is trivial to reproduce. For example, the steps for the test I just
made for fstests:

  mkfs.btrfs -f SCRATCH_DEV
  mount SCRATCH_DEV $SCRATCH_MNT

  touch $SCRATCH_MNT/foo
  touch $SCRATCH_MNT/bar

  $CLONER_PROG -s 0 -d 4096 -l 0 $SCRATCH_MNT/foo $SCRATCH_MNT/bar
  umount $SCRATCH_MNT

A test case for fstests follows soon.

CC: <stable@vger.kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:03:28 -07:00
Filipe Manana
113e828386 Btrfs: fix inode eviction infinite loop after extent_same ioctl
If we pass a length of 0 to the extent_same ioctl, we end up locking an
extent range with a start offset greater then its end offset (if the
destination file's offset is greater than zero). This results in a warning
from extent_io.c:insert_state through the following call chain:

  btrfs_extent_same()
    btrfs_double_lock()
      lock_extent_range()
        lock_extent(inode->io_tree, offset, offset + len - 1)
          lock_extent_bits()
            __set_extent_bit()
              insert_state()
                --> WARN_ON(end < start)

This leads to an infinite loop when evicting the inode. This is the same
problem that my previous patch titled
"Btrfs: fix inode eviction infinite loop after cloning into it" addressed
but for the extent_same ioctl instead of the clone ioctl.

CC: <stable@vger.kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:03:27 -07:00
Filipe Manana
df858e7672 Btrfs: fix range cloning when same inode used as source and destination
While searching for extents to clone we might find one where we only use
a part of it coming from its tail. If our destination inode is the same
the source inode, we end up removing the tail part of the extent item and
insert after a new one that point to the same extent with an adjusted
key file offset and data offset. After this we search for the next extent
item in the fs/subvol tree with a key that has an offset incremented by
one. But this second search leaves us at the new extent item we inserted
previously, and since that extent item has a non-zero data offset, it
it can make us call btrfs_drop_extents with an empty range (start == end)
which causes the following warning:

[23978.537119] WARNING: CPU: 6 PID: 16251 at fs/btrfs/file.c:550 btrfs_drop_extent_cache+0x43/0x385 [btrfs]()
(...)
[23978.557266] Call Trace:
[23978.557978]  [<ffffffff81425fd9>] dump_stack+0x4c/0x65
[23978.559191]  [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
[23978.560699]  [<ffffffffa047f0ea>] ? btrfs_drop_extent_cache+0x43/0x385 [btrfs]
[23978.562389]  [<ffffffff8104544d>] warn_slowpath_null+0x1a/0x1c
[23978.563613]  [<ffffffffa047f0ea>] btrfs_drop_extent_cache+0x43/0x385 [btrfs]
[23978.565103]  [<ffffffff810e3a18>] ? time_hardirqs_off+0x15/0x28
[23978.566294]  [<ffffffff81079ff8>] ? trace_hardirqs_off+0xd/0xf
[23978.567438]  [<ffffffffa047f73d>] __btrfs_drop_extents+0x6b/0x9e1 [btrfs]
[23978.568702]  [<ffffffff8107c03f>] ? trace_hardirqs_on+0xd/0xf
[23978.569763]  [<ffffffff811441c0>] ? ____cache_alloc+0x69/0x2eb
[23978.570817]  [<ffffffff81142269>] ? virt_to_head_page+0x9/0x36
[23978.571872]  [<ffffffff81143c15>] ? cache_alloc_debugcheck_after.isra.42+0x16c/0x1cb
[23978.573466]  [<ffffffff811420d5>] ? kmemleak_alloc_recursive.constprop.52+0x16/0x18
[23978.574962]  [<ffffffffa0480d07>] btrfs_drop_extents+0x66/0x7f [btrfs]
[23978.576179]  [<ffffffffa049aa35>] btrfs_clone+0x516/0xaf5 [btrfs]
[23978.577311]  [<ffffffffa04983dc>] ? lock_extent_range+0x7b/0xcd [btrfs]
[23978.578520]  [<ffffffffa049b2a2>] btrfs_ioctl_clone+0x28e/0x39f [btrfs]
[23978.580282]  [<ffffffffa049d9ae>] btrfs_ioctl+0xb51/0x219a [btrfs]
(...)
[23978.591887] ---[ end trace 988ec2a653d03ed3 ]---

Then we attempt to insert a new extent item with a key that already
exists, which makes btrfs_insert_empty_item return -EEXIST resulting in
abortion of the current transaction:

[23978.594355] WARNING: CPU: 6 PID: 16251 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]()
(...)
[23978.622589] Call Trace:
[23978.623181]  [<ffffffff81425fd9>] dump_stack+0x4c/0x65
[23978.624359]  [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
[23978.625573]  [<ffffffffa044ab6c>] ? __btrfs_abort_transaction+0x52/0x114 [btrfs]
[23978.626971]  [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48
[23978.628003]  [<ffffffff8108a6c8>] ? vprintk_default+0x1d/0x1f
[23978.629138]  [<ffffffffa044ab6c>] __btrfs_abort_transaction+0x52/0x114 [btrfs]
[23978.630528]  [<ffffffffa049ad1b>] btrfs_clone+0x7fc/0xaf5 [btrfs]
[23978.631635]  [<ffffffffa04983dc>] ? lock_extent_range+0x7b/0xcd [btrfs]
[23978.632886]  [<ffffffffa049b2a2>] btrfs_ioctl_clone+0x28e/0x39f [btrfs]
[23978.634119]  [<ffffffffa049d9ae>] btrfs_ioctl+0xb51/0x219a [btrfs]
(...)
[23978.647714] ---[ end trace 988ec2a653d03ed4 ]---

This is wrong because we should not process the extent item that we just
inserted previously, and instead process the extent item that follows it
in the tree

For example for the test case I wrote for fstests:

   bs=$((64 * 1024))
   mkfs.btrfs -f -l $bs -O ^no-holes /dev/sdc
   mount /dev/sdc /mnt

   xfs_io -f -c "pwrite -S 0xaa $(($bs * 2)) $(($bs * 2))" /mnt/foo

   $CLONER_PROG -s $((3 * $bs)) -d $((267 * $bs)) -l 0 /mnt/foo /mnt/foo
   $CLONER_PROG -s $((217 * $bs)) -d $((95 * $bs)) -l 0 /mnt/foo /mnt/foo

The second clone call fails with -EEXIST, because when we process the
first extent item (offset 262144), we drop part of it (counting from the
end) and then insert a new extent item with a key greater then the key we
found. The next time we search the tree we search for a key with offset
262144 + 1, which leaves us at the new extent item we have just inserted
but we think it refers to an extent that we need to clone.

Fix this by ensuring the next search key uses an offset corresponding to
the offset of the key we found previously plus the data length of the
corresponding extent item. This ensures we skip new extent items that we
inserted and works for the case of implicit holes too (NO_HOLES feature).

A test case for fstests follows soon.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-04-13 07:03:26 -07:00
Sheng Yong
1a7e985dd1 UBIFS: fix output format of INUM_WATERMARK
The INUM_WATERMARK is a unsigned 32bit value, `%d' prints it as negatave:
[  103.682255] UBIFS warning (ubi0:0 pid 691): ubifs_new_inode: running out of inode numbers (current 122763, max -256)

Fix it as:
[  154.422940] UBIFS warning (ubi0:0 pid 688): ubifs_new_inode: running out of inode numbers (current 122765, max 4294967040)

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2015-04-13 13:37:31 +03:00
Dave Chinner
6a63ef064b Merge branch 'xfs-misc-fixes-for-4.1-3' into for-next
Conflicts:
	fs/xfs/xfs_iops.c
2015-04-13 11:40:16 +10:00
Christoph Hellwig
21c3ea1881 xfs: unlock i_mutex in xfs_break_layouts
We want to drop all I/O path locks when recalling layouts, and that includes
i_mutex for the write path.  Without this we get stuck processe when recalls
take too long.

[dchinner: fix build with !CONFIG_PNFS]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-04-13 11:38:29 +10:00
Brian Foster
66db810496 xfs: kill unnecessary firstused overflow check on attr3 leaf removal
xfs_attr3_leaf_remove() removes an attribute from an attr leaf block. If
the attribute nameval data happens to be at the start of the nameval
region, a new start offset (firstused) for the region is calculated
(since the region grows from the tail of the block to the start). Once
the new firstused is calculated, it is checked for zero in an apparent
overflow check.

Now that the in-core firstused is 32-bit, overflow is not possible and
this check can be removed. Since the purpose for this check is not
documented and appears to exist since the port to Linux, be conservative
and replace it with an assert.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-04-13 11:27:59 +10:00
Brian Foster
e87021a2bc xfs: use larger in-core attr firstused field and detect overflow
The on-disk xfs_attr3_leaf_hdr structure firstused field is 16-bit and
subject to overflow when fs block size is 64k. The field is typically
initialized to block size when an attr leaf block is initialized. This
problem is demonstrated by assert failures when running xfstests
generic/117 on an fs with 64k blocks.

To support the existing attr leaf block algorithms for insertion,
rebalance and entry movement, increase the size of the in-core firstused
field to 32-bit and handle the potential overflow on conversion to/from
the on-disk structure. If the overflow condition occurs, set a special
value in the firstused field that is translated back on header read. The
special value is only required in the case of an empty 64k attr block. A
value of zero is used because firstused is initialized to the block size
and grows backwards from there. Furthermore, the attribute block header
occupies the first bytes of the block. Thus, a value of zero has no
other legitimate meaning for this structure. Two new conversion helpers
are created to manage the conversion of firstused to and from disk.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-04-13 11:27:10 +10:00
Brian Foster
2f66124154 xfs: pass attr geometry to attr leaf header conversion functions
The firstused field of the xfs_attr3_leaf_hdr structure is subject to an
overflow when fs blocksize is 64k. In preparation to handle this
overflow in the header conversion functions, pass the attribute geometry
to the functions that convert the in-core structure to and from the
on-disk structure.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-04-13 11:26:02 +10:00
Eric Sandeen
bbe051c841 xfs: disallow ro->rw remount on norecovery mount
There's a bit of a loophole in norecovery mount handling right
now: an initial mount must be readonly, but nothing prevents
a mount -o remount,rw from producing a writable, unrecovered
xfs filesystem.

It might be possible to try to perform a log recovery when this
is requested, but I'm not sure it's worth the effort.  For now,
simply disallow this sort of transition.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-04-13 11:25:41 +10:00
kbuild test robot
72c1a73993 xfs: xfs_shift_file_space can be static
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-04-13 11:25:04 +10:00
Linus Torvalds
6a23b45f1d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs and fs fixes from Al Viro:
 "Several AIO and OCFS2 fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ocfs2: _really_ sync the right range
  ocfs2_file_write_iter: keep return value and current position update in sync
  [regression] ocfs2: do *not* increment ->ki_pos twice
  ioctx_alloc(): fix vma (and file) leak on failure
  fix mremap() vs. ioctx_kill() race
2015-04-12 10:56:12 -07:00
Michael Halcrow
4461471107 ext4 crypto: enable filename encryption
Signed-off-by: Uday Savagaonkar <savagaon@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-04-12 01:09:05 -04:00
Michael Halcrow
1f3862b557 ext4 crypto: filename encryption modifications
Modifies htree_dirblock_to_tree, dx_make_map, ext4_match search_dir,
and ext4_find_dest_de to support fname crypto.  Filename encryption
feature is not yet enabled at this patch.

Signed-off-by: Uday Savagaonkar <savagaon@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-04-12 01:09:03 -04:00
Michael Halcrow
b309848644 ext4 crypto: partial update to namei.c for fname crypto
Modifies dx_show_leaf and dx_probe to support fname encryption.
Filename encryption not yet enabled.

Signed-off-by: Uday Savagaonkar <savagaon@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-04-12 01:07:01 -04:00
Michael Halcrow
4bdfc873ba ext4 crypto: insert encrypted filenames into a leaf directory block
Signed-off-by: Uday Savagaonkar <savagaon@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-04-12 00:56:28 -04:00
Theodore Ts'o
2f61830ae3 ext4 crypto: teach ext4_htree_store_dirent() to store decrypted filenames
For encrypted directories, we need to pass in a separate parameter for
the decrypted filename, since the directory entry contains the
encrypted filename.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-04-12 00:56:26 -04:00
Michael Halcrow
d5d0e8c720 ext4 crypto: filename encryption facilities
Signed-off-by: Uday Savagaonkar <savagaon@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-04-12 00:56:17 -04:00
Michael Halcrow
c9c7429c2e ext4 crypto: implement the ext4 decryption read path
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-04-12 00:56:10 -04:00
Michael Halcrow
2058f83a72 ext4 crypto: implement the ext4 encryption write path
Pulls block_write_begin() into fs/ext4/inode.c because it might need
to do a low-level read of the existing data, in which case we need to
decrypt it.

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-04-12 00:55:10 -04:00
Michael Halcrow
dde680cefc ext4 crypto: inherit encryption policies on inode and directory create
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-04-12 00:55:09 -04:00
Theodore Ts'o
d9cdc90331 ext4 crypto: enforce context consistency
Enforce the following inheritance policy:

1) An unencrypted directory may contain encrypted or unencrypted files
or directories.

2) All files or directories in a directory must be protected using the
same key as their containing directory.

As a result, assuming the following setup:

mke2fs -t ext4 -Fq -O encrypt /dev/vdc
mount -t ext4 /dev/vdc /vdc
mkdir /vdc/a /vdc/b /vdc/c
echo foo | e4crypt add_key /vdc/a
echo bar | e4crypt add_key /vdc/b
for i in a b c ; do cp /etc/motd /vdc/$i/motd-$i ; done

Then we will see the following results:

cd /vdc
mv a b			# will fail; /vdc/a and /vdc/b have different keys
mv b/motd-b a		# will fail, see above
ln a/motd-a b		# will fail, see above
mv c a	    		# will fail; all inodes in an encrypted directory
   	  		#	must be encrypted
ln c/motd-c b		# will fail, see above
mv a/motd-a c		# will succeed
mv c/motd-a a		# will succeed

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-04-12 00:55:08 -04:00
Michael Halcrow
88bd6ccdcd ext4 crypto: add encryption key management facilities
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Ildar Muslukhov <muslukhovi@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-04-12 00:55:06 -04:00
Michael Halcrow
b30ab0e034 ext4 crypto: add ext4 encryption facilities
On encrypt, we will re-assign the buffer_heads to point to a bounce
page rather than the control_page (which is the original page to write
that contains the plaintext). The block I/O occurs against the bounce
page.  On write completion, we re-assign the buffer_heads to the
original plaintext page.

On decrypt, we will attach a read completion callback to the bio
struct. This read completion will decrypt the read contents in-place
prior to setting the page up-to-date.

The current encryption mode, AES-256-XTS, lacks cryptographic
integrity. AES-256-GCM is in-plan, but we will need to devise a
mechanism for handling the integrity data.

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-04-12 00:43:56 -04:00
Al Viro
7da839c475 ocfs2: use __generic_file_write_iter()
we can do that now - all we need is to clear IOCB_DIRECT from ->ki_flags in
"can't do dio" case.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:30:22 -04:00
Al Viro
2ba48ce513 mirror O_APPEND and O_DIRECT into iocb->ki_flags
... avoiding write_iter/fcntl races.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:30:22 -04:00
Al Viro
3309dd04cb switch generic_write_checks() to iocb and iter
... returning -E... upon error and amount of data left in iter after
(possible) truncation upon success.  Note, that normal case gives
a non-zero (positive) return value, so any tests for != 0 _must_ be
updated.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Conflicts:
	fs/ext4/file.c
2015-04-11 22:30:21 -04:00
Al Viro
90320251db ocfs2: move generic_write_checks() before the alignment checks
Alignment checks for dio depend upon the range truncation done by
generic_write_checks().  They can be done as soon as we got ocfs2_rw_lock()
and that actually makes ocfs2_prepare_inode_for_write() simpler.

	The only thing to watch out for is restoring the original count
in "unlock and redo without dio" case.  Position doesn't need to be
restored, since we change it only in O_APPEND case and in that case it
will be reassigned anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:30:21 -04:00
Al Viro
5dc3161cb6 ocfs2_file_write_iter: stop messing with ppos
it's &iocb->ki_pos; no need to obfuscate.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:30:21 -04:00
Al Viro
dfea934575 Merge branch 'for-linus' into for-next 2015-04-11 22:29:51 -04:00
Al Viro
165f1a6e30 udf_file_write_iter: reorder and simplify
it's easier to do generic_write_checks() first

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:29:50 -04:00
Al Viro
6b775b18ee fuse: ->direct_IO() doesn't need generic_write_checks()
already done by caller.  We used to call __fuse_direct_write(), which
called generic_write_checks(); now the former got expanded, bringing
the latter to the surface.  It used to be called all along and calling
it from there had been wrong all along...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:29:50 -04:00
Al Viro
e768d7ff7b ext4_file_write_iter: move generic_write_checks() up
simpler that way...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:29:49 -04:00
Al Viro
99733fa372 xfs_file_aio_write_checks: switch to iocb/iov_iter
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:29:49 -04:00
Al Viro
0fa6b005af generic_write_checks(): drop isblk argument
all remaining callers are passing 0; some just obscure that fact.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:29:48 -04:00
Al Viro
7ec7b94a33 blkdev_write_iter: expand generic_file_checks() call in there
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:29:48 -04:00
Al Viro
5f380c7fa7 lift generic_write_checks() into callers of __generic_file_write_iter()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:29:47 -04:00
Al Viro
e9d1593d4e cifs: fold cifs_iovec_write() into the only caller
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:29:47 -04:00