Commit Graph

412959 Commits

Author SHA1 Message Date
Ben Myers
324bb26144 Merge branch 'xfs-for-linus-v3.13-rc5' into for-next 2013-12-18 10:36:58 -06:00
Dave Chinner
ac8809f9ab xfs: abort metadata writeback on permanent errors
If we are doing aysnc writeback of metadata, we can get write errors
but have nobody to report them to. At the moment, we simply attempt
to reissue the write from io completion in the hope that it's a
transient error.

When it's not a transient error, the buffer is stuck forever in
this loop, and we cannot break out of it. Eventually, unmount will
hang because the AIL cannot be emptied and everything goes downhill
from them.

To solve this problem, only retry the write IO once before aborting
it. We don't throw the buffer away because some transient errors can
last minutes (e.g.  FC path failover) or even hours (thin
provisioned devices that have run out of backing space) before they
go away. Hence we really want to keep trying until we can't try any
more.

Because the buffer was not cleaned, however, it does not get removed
from the AIL and hence the next pass across the AIL will start IO on
it again. As such, we still get the "retry forever" semantics that
we currently have, but we allow other access to the buffer in the
mean time. Meanwhile the filesystem can continue to modify the
buffer and relog it, so the IO errors won't hang the log or the
filesystem.

Now when we are pushing the AIL, we can see all these "permanent IO
error" buffers and we can issue a warning about failures before we
retry the IO. We can also catch these buffers when unmounting an
issue a corruption warning, too.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-17 09:40:23 -06:00
Dave Chinner
33177f0536 xfs: swalloc doesn't align allocations properly
When swalloc is specified as a mount option, allocations are
supposed to be aligned to the stripe width rather than the stripe
unit of the underlying filesystem. However, it does not do this.

What the implementation does is round up the allocation size to a
stripe width, hence ensuring that all allocations span a full stripe
width. It does not, however, ensure that that allocation is aligned
to a stripe width, and hence the allocations can span multiple
underlying stripes and so still see RMW cycles for things like
direct IO on MD RAID.

So, if the swalloc mount option is set, change the allocation
alignment in xfs_bmap_btalloc() to use the stripe width rather than
the stripe unit.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-17 09:30:12 -06:00
Christoph Hellwig
83a0adc3f9 xfs: remove xfsbdstrat error
The xfsbdstrat helper is a small but useless wrapper for xfs_buf_iorequest that
handles the case of a shut down filesystem.  Most of the users have private,
uncached buffers that can just be freed in this case, but the complex error
handling in xfs_bioerror_relse messes up the case when it's called without
a locked buffer.

Remove xfsbdstrat and opencode the error handling in the callers.  All but
one can simply return an error and don't need to deal with buffer state,
and the one caller that cares about the buffer state could do with a major
cleanup as well, but we'll defer that to later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-17 09:28:43 -06:00
Dave Chinner
6e708bcf65 xfs: align initial file allocations correctly
The function xfs_bmap_isaeof() is used to indicate that an
allocation is occurring at or past the end of file, and as such
should be aligned to the underlying storage geometry if possible.

Commit 27a3f8f ("xfs: introduce xfs_bmap_last_extent") changed the
behaviour of this function for empty files - it turned off
allocation alignment for this case accidentally. Hence large initial
allocations from direct IO are not getting correctly aligned to the
underlying geometry, and that is cause write performance to drop in
alignment sensitive configurations.

Fix it by considering allocation into empty files as requiring
aligned allocation again.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit f9b395a8ef)
2013-12-17 09:17:25 -06:00
Namjae Jeon
809625ca54 MAINTAINERS: fix incorrect mail address of XFS maintainer
When I tried to send the patches to XFS Maintainers,
I got returned mail included delivery fail message for Dave's mail.
Maybe, Dave Chinner mail address is incorrect.
I try to fix it correctly.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit db10bddc7d)
2013-12-17 09:17:04 -06:00
Jie Liu
718cc6f88c xfs: fix infinite loop by detaching the group/project hints from user dquot
xfs_quota(8) will hang up if trying to turn group/project quota off
before the user quota is off, this could be 100% reproduced by:
  # mount -ouquota,gquota /dev/sda7 /xfs
  # mkdir /xfs/test
  # xfs_quota -xc 'off -g' /xfs <-- hangs up
  # echo w > /proc/sysrq-trigger
  # dmesg

  SysRq : Show Blocked State
  task                        PC stack   pid father
  xfs_quota       D 0000000000000000     0 27574   2551 0x00000000
  [snip]
  Call Trace:
  [<ffffffff81aaa21d>] schedule+0xad/0xc0
  [<ffffffff81aa327e>] schedule_timeout+0x35e/0x3c0
  [<ffffffff8114b506>] ? mark_held_locks+0x176/0x1c0
  [<ffffffff810ad6c0>] ? call_timer_fn+0x2c0/0x2c0
  [<ffffffffa0c25380>] ? xfs_qm_shrink_count+0x30/0x30 [xfs]
  [<ffffffff81aa3306>] schedule_timeout_uninterruptible+0x26/0x30
  [<ffffffffa0c26155>] xfs_qm_dquot_walk+0x235/0x260 [xfs]
  [<ffffffffa0c059d8>] ? xfs_perag_get+0x1d8/0x2d0 [xfs]
  [<ffffffffa0c05805>] ? xfs_perag_get+0x5/0x2d0 [xfs]
  [<ffffffffa0b7707e>] ? xfs_inode_ag_iterator+0xae/0xf0 [xfs]
  [<ffffffffa0c22280>] ? xfs_trans_free_dqinfo+0x50/0x50 [xfs]
  [<ffffffffa0b7709f>] ? xfs_inode_ag_iterator+0xcf/0xf0 [xfs]
  [<ffffffffa0c261e6>] xfs_qm_dqpurge_all+0x66/0xb0 [xfs]
  [<ffffffffa0c2497a>] xfs_qm_scall_quotaoff+0x20a/0x5f0 [xfs]
  [<ffffffffa0c2b8f6>] xfs_fs_set_xstate+0x136/0x180 [xfs]
  [<ffffffff8136cf7a>] do_quotactl+0x53a/0x6b0
  [<ffffffff812fba4b>] ? iput+0x5b/0x90
  [<ffffffff8136d257>] SyS_quotactl+0x167/0x1d0
  [<ffffffff814cf2ee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [<ffffffff81abcd19>] system_call_fastpath+0x16/0x1b

It's fine if we turn user quota off at first, then turn off other
kind of quotas if they are enabled since the group/project dquot
refcount is decreased to zero once the user quota if off. Otherwise,
those dquots refcount is non-zero due to the user dquot might refer
to them as hint(s).  Hence, above operation cause an infinite loop
at xfs_qm_dquot_walk() while trying to purge dquot cache.

This problem has been around since Linux 3.4, it was introduced by:
  [ b84a3a9675 xfs: remove the per-filesystem list of dquots ]

Originally we will release the group dquot pointers because the user
dquots maybe carrying around as a hint via xfs_qm_detach_gdquots().
However, with above change, there is no such work to be done before
purging group/project dquot cache.

In order to solve this problem, this patch introduces a special routine
xfs_qm_dqpurge_hints(), and it would release the group/project dquot
pointers the user dquots maybe carrying around as a hint, and then it
will proceed to purge the user dquot cache if requested.

Cc: stable@vger.kernel.org
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit df8052e7da)
2013-12-17 09:16:40 -06:00
Jie Liu
5c22727895 xfs: fix assertion failure at xfs_setattr_nonsize
For CRC enabled v5 super block, change a file's ownership can simply
trigger an ASSERT failure at xfs_setattr_nonsize() if both group and
project quota are enabled, i.e,

[  305.337609] XFS: Assertion failed: !XFS_IS_PQUOTA_ON(mp), file: fs/xfs/xfs_iops.c, line: 621
[  305.339250] Kernel BUG at ffffffffa0a7fa32 [verbose debug info unavailable]
[  305.383939] Call Trace:
[  305.385536]  [<ffffffffa0a7d95a>] xfs_setattr_nonsize+0x69a/0x720 [xfs]
[  305.387142]  [<ffffffffa0a7dea9>] xfs_vn_setattr+0x29/0x70 [xfs]
[  305.388727]  [<ffffffff811ca388>] notify_change+0x1a8/0x350
[  305.390298]  [<ffffffff811ac39d>] chown_common+0xfd/0x110
[  305.391868]  [<ffffffff811ad6bf>] SyS_fchownat+0xaf/0x110
[  305.393440]  [<ffffffff811ad760>] SyS_lchown+0x20/0x30
[  305.394995]  [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f
[  305.399870] RIP  [<ffffffffa0a7fa32>] assfail+0x22/0x30 [xfs]

This fix adjust the assertion to check if the super block support both
quota inodes or not.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 5a01dd54f4)
2013-12-17 09:16:08 -06:00
Jie Liu
30d161c9aa xfs: fix false assertion at xfs_qm_vop_create_dqattach
After the previous fix, there still has another ASSERT failure if turning
off any type of quota while fsstress is running at the same time.

Backtrace in this case:

[   50.867897] XFS: Assertion failed: XFS_IS_GQUOTA_ON(mp), file: fs/xfs/xfs_qm.c, line: 2118
[   50.867924] ------------[ cut here ]------------
... <snip>
[   50.867957] Kernel BUG at ffffffffa0b55a32 [verbose debug info unavailable]
[   50.867999] invalid opcode: 0000 [#1] SMP
[   50.869407] Call Trace:
[   50.869446]  [<ffffffffa0bc408a>] xfs_qm_vop_create_dqattach+0x19a/0x2d0 [xfs]
[   50.869512]  [<ffffffffa0b9cc45>] xfs_create+0x5c5/0x6a0 [xfs]
[   50.869564]  [<ffffffffa0b5307c>] xfs_vn_mknod+0xac/0x1d0 [xfs]
[   50.869615]  [<ffffffffa0b531d6>] xfs_vn_mkdir+0x16/0x20 [xfs]
[   50.869655]  [<ffffffff811becd5>] vfs_mkdir+0x95/0x130
[   50.869689]  [<ffffffff811bf63a>] SyS_mkdirat+0xaa/0xe0
[   50.869723]  [<ffffffff811bf689>] SyS_mkdir+0x19/0x20
[   50.869757]  [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f
[   50.869793] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 <snip>
[   50.870003] RIP  [<ffffffffa0b55a32>] assfail+0x22/0x30 [xfs]
[   50.870050]  RSP <ffff88002941fd60>
[   50.879251] ---[ end trace c93a2b342341c65b ]---

We're hitting the ASSERT(XFS_IS_*QUOTA_ON(mp)) in xfs_qm_vop_create_dqattach(),
however the assertion itself is not right IMHO.  While performing quota off, we
firstly clear the XFS_*QUOTA_ACTIVE bit(s) from struct xfs_mount without taking
any special locks, see xfs_qm_scall_quotaoff().  Hence there is no guarantee
that the desired quota is still active.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 37eb9706eb)
2013-12-17 09:15:45 -06:00
Mark Tinguely
3a8c92086d xfs: fix memory leak in xfs_dir2_node_removename
Fix the leak of kernel memory in xfs_dir2_node_removename()
when xfs_dir2_leafn_remove() returns an error code.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit ef701600fd)
2013-12-17 09:15:12 -06:00
Ben Myers
46f23adf78 Merge branch 'xfs-factor-icluster-macros' into for-next 2013-12-13 14:15:33 -06:00
Jie Liu
f9e5abcfc5 xfs: use xfs_icluster_size_fsb in xfs_imap
Use xfs_icluster_size_fsb() in xfs_imap().

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13 15:51:49 +11:00
Jie Liu
982e939e4d xfs: use xfs_icluster_size_fsb in xfs_ifree_cluster
Use xfs_icluster_size_fsb() in xfs_ifree_cluster(), rename variable
ninodes to inodes_per_cluster, the latter is more meaningful.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13 15:51:49 +11:00
Jie Liu
6e0c7b8c3e xfs: use xfs_icluster_size_fsb in xfs_ialloc_inode_init
Use xfs_icluster_size_fsb() in xfs_ialloc_inode_init(), rename variable
ninodes to inodes_per_cluster, the latter is more meaningful.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13 15:51:49 +11:00
Jie Liu
a2ba07b2d2 xfs: use xfs_icluster_size_fsb in xfs_bulkstat
Use xfs_icluster_size_fsb() in xfs_bulkstat(), make the related
variables more meaningful and remove an unused variable nimask
from it.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13 15:51:48 +11:00
Jie Liu
904957b750 xfs: introduce a common helper xfs_icluster_size_fsb
Introduce a common routine xfs_icluster_size_fsb() to calculate
and return the number of file system blocks per inode cluster.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13 15:51:48 +11:00
Jie Liu
126cd105d4 xfs: get rid of XFS_IALLOC_BLOCKS macros
Get rid of XFS_IALLOC_BLOCKS() marcos, use mp->m_ialloc_blks directly.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13 15:51:48 +11:00
Jie Liu
0f49efd805 xfs: get rid of XFS_INODE_CLUSTER_SIZE macros
Get rid of XFS_INODE_CLUSTER_SIZE() macros, use mp->m_inode_cluster_size
directly.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13 15:51:48 +11:00
Jie Liu
717834383c xfs: get rid of XFS_IALLOC_INODES macros
Get rid of XFS_IALLOC_INODES() marcos, use mp->m_ialloc_inos directly.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13 15:51:46 +11:00
Christoph Hellwig
ffda4e83aa xfs: remove the quotaoff log format from the quotaoff log item
This one doesn't save a whole lot of memory, but still makes the
code simpler.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13 11:34:08 +11:00
Christoph Hellwig
ce8e962939 xfs: remove the dquot log format from the dquot log item
No need to keep the dquot log format around all the time, we can
easily generate it at iop_format time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13 11:34:07 +11:00
Christoph Hellwig
2f251293b0 xfs: remove the inode log format from the inode log item
No need to keep the inode log format around all the time, we can
easily generate it at iop_format time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13 11:34:05 +11:00
Christoph Hellwig
da7765031d xfs: format logged extents directly into the CIL
With the new iop_format scheme there is no need to have a temporary buffer
to format logged extents into, we can do so directly into the CIL.  This
also allows to remove the shortcut for big endian systems that probably
hasn't gotten a lot of test coverage for a long time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13 11:34:04 +11:00
Christoph Hellwig
bde7cff67c xfs: format log items write directly into the linear CIL buffer
Instead of setting up pointers to memory locations in iop_format which then
get copied into the CIL linear buffer after return move the copy into
the individual inode items.  This avoids the need to always have a memory
block in the exact same layout that gets written into the log around, and
allow the log items to be much more flexible in their in-memory layouts.

The only caveat is that we need to properly align the data for each
iovec so that don't have structures misaligned in subsequent iovecs.

Note that all log item format routines now need to be careful to modify
the copy of the item that was placed into the CIL after calls to
xlog_copy_iovec instead of the in-memory copy.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13 11:34:02 +11:00
Christoph Hellwig
1234351cba xfs: introduce xlog_copy_iovec
Add a helper to abstract out filling the log iovecs in the log item
format handlers.  This will allow us to change the way we do the log
item formatting more easily.

The copy in the name is a bit confusing for now as it just assigns a
pointer and lets the CIL code perform the copy, but that will change
soon.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13 11:00:43 +11:00
Christoph Hellwig
3de559fbd0 xfs: refactor xfs_inode_item_format
Split out a function to handle the data and attr fork, as well as a
helper for the really old v1 inodes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13 11:00:43 +11:00
Christoph Hellwig
ce9641d6c9 xfs: refactor xfs_inode_item_size
Split out two helpers to size the data and attribute to make the
function more readable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13 11:00:43 +11:00
Christoph Hellwig
7aeb722241 xfs: refactor xfs_buf_item_format_segment
Add two helpers to make the code more readable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13 11:00:43 +11:00
Christoph Hellwig
9597df6b26 xfs: remove duplicate code in xlog_cil_insert_format_items
Share code that was previously duplicated in two branches.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2013-12-13 11:00:42 +11:00
Dave Chinner
f9b395a8ef xfs: align initial file allocations correctly
The function xfs_bmap_isaeof() is used to indicate that an
allocation is occurring at or past the end of file, and as such
should be aligned to the underlying storage geometry if possible.

Commit 27a3f8f ("xfs: introduce xfs_bmap_last_extent") changed the
behaviour of this function for empty files - it turned off
allocation alignment for this case accidentally. Hence large initial
allocations from direct IO are not getting correctly aligned to the
underlying geometry, and that is cause write performance to drop in
alignment sensitive configurations.

Fix it by considering allocation into empty files as requiring
aligned allocation again.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-11 15:23:04 -06:00
Ben Myers
8e825e3a02 xfs: fix calculation of freed inode cluster blocks
rec.ir_startino is an agino rather than an ino.  Use the correct macro
when dealing with it in xfs_difree.

Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2013-12-11 15:22:43 -06:00
Namjae Jeon
db10bddc7d MAINTAINERS: fix incorrect mail address of XFS maintainer
When I tried to send the patches to XFS Maintainers,
I got returned mail included delivery fail message for Dave's mail.
Maybe, Dave Chinner mail address is incorrect.
I try to fix it correctly.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-11 15:22:13 -06:00
Dave Chinner
b3f03bac81 xfs: xfs_dir2_block_to_sf temp buffer allocation fails
If we are using a large directory block size, and memory becomes
fragmented, we can get memory allocation failures trying to
kmem_alloc(64k) for a temporary buffer. However, there is not need
for a directory buffer sized allocation, as the end result ends up
in the inode literal area. This is, at most, slightly less than 2k
of space, and hence we don't need an allocation larger than that
fora temporary buffer.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-11 14:59:20 -06:00
Dave Chinner
f94c44573e xfs: growfs overruns AGFL buffer on V4 filesystems
This loop in xfs_growfs_data_private() is incorrect for V4
superblocks filesystems:

		for (bucket = 0; bucket < XFS_AGFL_SIZE(mp); bucket++)
			agfl->agfl_bno[bucket] = cpu_to_be32(NULLAGBLOCK);

For V4 filesystems, we don't have a agfl header structure, and so
XFS_AGFL_SIZE() returns an entire sector's worth of entries, which
we then index from an offset into the sector. Hence: buffer overrun.

This problem was introduced in 3.10 by commit 77c95bba ("xfs: add
CRC checks to the AGFL") which changed the AGFL structure but failed
to update the growfs code to handle the different structures.

Fix it by using the correct offset into the buffer for both V4 and
V5 filesystems.

Cc: <stable@vger.kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit b7d961b35b)
2013-12-10 10:04:27 -06:00
Jie Liu
2f42d612e7 xfs: don't perform discard if the given range length is less than block size
For discard operation, we should return EINVAL if the given range length
is less than a block size, otherwise it will go through the file system
to discard data blocks as the end range might be evaluated to -1, e.g,
# fstrim -v -o 0 -l 100 /xfs7
/xfs7: 9811378176 bytes were trimmed

This issue can be triggered via xfstests/generic/288.

Also, it seems to get the request queue pointer via bdev_get_queue()
instead of the hard code pointer dereference is not a bad thing.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit f9fd013561)
2013-12-10 10:00:33 -06:00
Dan Carpenter
31978b5cc6 xfs: underflow bug in xfs_attrlist_by_handle()
If we allocate less than sizeof(struct attrlist) then we end up
corrupting memory or doing a ZERO_PTR_SIZE dereference.

This can only be triggered with CAP_SYS_ADMIN.

Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 071c529eb6)
2013-12-10 09:59:37 -06:00
Jie Liu
df8052e7da xfs: fix infinite loop by detaching the group/project hints from user dquot
xfs_quota(8) will hang up if trying to turn group/project quota off
before the user quota is off, this could be 100% reproduced by:
  # mount -ouquota,gquota /dev/sda7 /xfs
  # mkdir /xfs/test
  # xfs_quota -xc 'off -g' /xfs <-- hangs up
  # echo w > /proc/sysrq-trigger
  # dmesg

  SysRq : Show Blocked State
  task                        PC stack   pid father
  xfs_quota       D 0000000000000000     0 27574   2551 0x00000000
  [snip]
  Call Trace:
  [<ffffffff81aaa21d>] schedule+0xad/0xc0
  [<ffffffff81aa327e>] schedule_timeout+0x35e/0x3c0
  [<ffffffff8114b506>] ? mark_held_locks+0x176/0x1c0
  [<ffffffff810ad6c0>] ? call_timer_fn+0x2c0/0x2c0
  [<ffffffffa0c25380>] ? xfs_qm_shrink_count+0x30/0x30 [xfs]
  [<ffffffff81aa3306>] schedule_timeout_uninterruptible+0x26/0x30
  [<ffffffffa0c26155>] xfs_qm_dquot_walk+0x235/0x260 [xfs]
  [<ffffffffa0c059d8>] ? xfs_perag_get+0x1d8/0x2d0 [xfs]
  [<ffffffffa0c05805>] ? xfs_perag_get+0x5/0x2d0 [xfs]
  [<ffffffffa0b7707e>] ? xfs_inode_ag_iterator+0xae/0xf0 [xfs]
  [<ffffffffa0c22280>] ? xfs_trans_free_dqinfo+0x50/0x50 [xfs]
  [<ffffffffa0b7709f>] ? xfs_inode_ag_iterator+0xcf/0xf0 [xfs]
  [<ffffffffa0c261e6>] xfs_qm_dqpurge_all+0x66/0xb0 [xfs]
  [<ffffffffa0c2497a>] xfs_qm_scall_quotaoff+0x20a/0x5f0 [xfs]
  [<ffffffffa0c2b8f6>] xfs_fs_set_xstate+0x136/0x180 [xfs]
  [<ffffffff8136cf7a>] do_quotactl+0x53a/0x6b0
  [<ffffffff812fba4b>] ? iput+0x5b/0x90
  [<ffffffff8136d257>] SyS_quotactl+0x167/0x1d0
  [<ffffffff814cf2ee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [<ffffffff81abcd19>] system_call_fastpath+0x16/0x1b

It's fine if we turn user quota off at first, then turn off other
kind of quotas if they are enabled since the group/project dquot
refcount is decreased to zero once the user quota if off. Otherwise,
those dquots refcount is non-zero due to the user dquot might refer
to them as hint(s).  Hence, above operation cause an infinite loop
at xfs_qm_dquot_walk() while trying to purge dquot cache.

This problem has been around since Linux 3.4, it was introduced by:
  [ b84a3a9675 xfs: remove the per-filesystem list of dquots ]

Originally we will release the group dquot pointers because the user
dquots maybe carrying around as a hint via xfs_qm_detach_gdquots().
However, with above change, there is no such work to be done before
purging group/project dquot cache.

In order to solve this problem, this patch introduces a special routine
xfs_qm_dqpurge_hints(), and it would release the group/project dquot
pointers the user dquots maybe carrying around as a hint, and then it
will proceed to purge the user dquot cache if requested.

Cc: stable@vger.kernel.org
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-09 12:14:44 -06:00
Jie Liu
5a01dd54f4 xfs: fix assertion failure at xfs_setattr_nonsize
For CRC enabled v5 super block, change a file's ownership can simply
trigger an ASSERT failure at xfs_setattr_nonsize() if both group and
project quota are enabled, i.e,

[  305.337609] XFS: Assertion failed: !XFS_IS_PQUOTA_ON(mp), file: fs/xfs/xfs_iops.c, line: 621
[  305.339250] Kernel BUG at ffffffffa0a7fa32 [verbose debug info unavailable]
[  305.383939] Call Trace:
[  305.385536]  [<ffffffffa0a7d95a>] xfs_setattr_nonsize+0x69a/0x720 [xfs]
[  305.387142]  [<ffffffffa0a7dea9>] xfs_vn_setattr+0x29/0x70 [xfs]
[  305.388727]  [<ffffffff811ca388>] notify_change+0x1a8/0x350
[  305.390298]  [<ffffffff811ac39d>] chown_common+0xfd/0x110
[  305.391868]  [<ffffffff811ad6bf>] SyS_fchownat+0xaf/0x110
[  305.393440]  [<ffffffff811ad760>] SyS_lchown+0x20/0x30
[  305.394995]  [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f
[  305.399870] RIP  [<ffffffffa0a7fa32>] assfail+0x22/0x30 [xfs]

This fix adjust the assertion to check if the super block support both
quota inodes or not.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-09 12:10:30 -06:00
Christoph Hellwig
c91c46c127 xfs: add xfs_setattr_time
Split out a xfs_setattr_time helper to share code between truncate and
regular setattr similar to xfs_setattr_mode.  I might also have another
caller growing for this in the near future.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-06 17:26:19 -06:00
Christoph Hellwig
0c3d88dfce xfs: tiny xfs_setattr_mode cleanup
Remove the pointless tp argument, and properly align the local variable
declarations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-06 17:18:41 -06:00
Jie Liu
37eb9706eb xfs: fix false assertion at xfs_qm_vop_create_dqattach
After the previous fix, there still has another ASSERT failure if turning
off any type of quota while fsstress is running at the same time.

Backtrace in this case:

[   50.867897] XFS: Assertion failed: XFS_IS_GQUOTA_ON(mp), file: fs/xfs/xfs_qm.c, line: 2118
[   50.867924] ------------[ cut here ]------------
... <snip>
[   50.867957] Kernel BUG at ffffffffa0b55a32 [verbose debug info unavailable]
[   50.867999] invalid opcode: 0000 [#1] SMP
[   50.869407] Call Trace:
[   50.869446]  [<ffffffffa0bc408a>] xfs_qm_vop_create_dqattach+0x19a/0x2d0 [xfs]
[   50.869512]  [<ffffffffa0b9cc45>] xfs_create+0x5c5/0x6a0 [xfs]
[   50.869564]  [<ffffffffa0b5307c>] xfs_vn_mknod+0xac/0x1d0 [xfs]
[   50.869615]  [<ffffffffa0b531d6>] xfs_vn_mkdir+0x16/0x20 [xfs]
[   50.869655]  [<ffffffff811becd5>] vfs_mkdir+0x95/0x130
[   50.869689]  [<ffffffff811bf63a>] SyS_mkdirat+0xaa/0xe0
[   50.869723]  [<ffffffff811bf689>] SyS_mkdir+0x19/0x20
[   50.869757]  [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f
[   50.869793] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 <snip>
[   50.870003] RIP  [<ffffffffa0b55a32>] assfail+0x22/0x30 [xfs]
[   50.870050]  RSP <ffff88002941fd60>
[   50.879251] ---[ end trace c93a2b342341c65b ]---

We're hitting the ASSERT(XFS_IS_*QUOTA_ON(mp)) in xfs_qm_vop_create_dqattach(),
however the assertion itself is not right IMHO.  While performing quota off, we
firstly clear the XFS_*QUOTA_ACTIVE bit(s) from struct xfs_mount without taking
any special locks, see xfs_qm_scall_quotaoff().  Hence there is no guarantee
that the desired quota is still active.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-06 16:10:21 -06:00
Jie Liu
afbd123db4 xfs: integrate xfs_quota_priv header file to xfs_qm
The xfs_quota_priv header file is only included by xfs_qm header and
there is no much users for its contents, hence we can move those stuff
to xfs_qm header file and kill it.

This patch also remove an unused macro DQFLAGTO_TYPESTR.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-06 14:16:33 -06:00
Jie Liu
c61a9e39f6 xfs: make quota metadata truncation behavior consistent to user space
In xfs_qm_scall_trunc_qfiles(), we ignore the error if failed to remove
the users quota metadata and proceed to remove groups and projects if
they are being there.  However, in user space, the remove operation will
break and return if failed to remove any kind of quota.
Also for v5 super block, we can enabled both group and project quota at
the same time, in this case the current error handling will cover the
group error with projects but they might failed due to different reasons.

It seems we'd better the error handling consistent to the user space and
don't trying to remove another kind of quota metadata if the previous
operation is failed.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-06 14:06:15 -06:00
Mark Tinguely
ef701600fd xfs: fix memory leak in xfs_dir2_node_removename
Fix the leak of kernel memory in xfs_dir2_node_removename()
when xfs_dir2_leafn_remove() returns an error code.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-05 16:51:19 -06:00
Mark Tinguely
2a84108fe2 xfs: free the list of recovery items on error
Recovery builds a list of items on the transaction's
r_itemq head. Normally these items are committed and freed.
But in the event of a recovery error, these allocations
are leaked.

If the error occurs during item reordering, then reconstruct
the r_itemq list before deleting the list to avoid leaking
the entries that were on one of the temporary lists.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-05 16:50:47 -06:00
Christoph Hellwig
dff6efc326 fs: fix iversion handling
Currently notify_change directly updates i_version for size updates,
which not only is counter to how all other fields are updated through
struct iattr, but also breaks XFS, which need inode updates to happen
under its own lock, and synchronized to the structure that gets written
to the log.

Remove the update in the common code, and it to btrfs and ext4,
XFS already does a proper updaste internally and currently gets a
double update with the existing code.

IMHO this is 3.13 and -stable material and should go in through the XFS
tree.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Acked-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-05 16:36:21 -06:00
Dave Chinner
b7d961b35b xfs: growfs overruns AGFL buffer on V4 filesystems
This loop in xfs_growfs_data_private() is incorrect for V4
superblocks filesystems:

		for (bucket = 0; bucket < XFS_AGFL_SIZE(mp); bucket++)
			agfl->agfl_bno[bucket] = cpu_to_be32(NULLAGBLOCK);

For V4 filesystems, we don't have a agfl header structure, and so
XFS_AGFL_SIZE() returns an entire sector's worth of entries, which
we then index from an offset into the sector. Hence: buffer overrun.

This problem was introduced in 3.10 by commit 77c95bba ("xfs: add
CRC checks to the AGFL") which changed the AGFL structure but failed
to update the growfs code to handle the different structures.

Fix it by using the correct offset into the buffer for both V4 and
V5 filesystems.

Cc: <stable@vger.kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-05 16:19:51 -06:00
Jie Liu
f9fd013561 xfs: don't perform discard if the given range length is less than block size
For discard operation, we should return EINVAL if the given range length
is less than a block size, otherwise it will go through the file system
to discard data blocks as the end range might be evaluated to -1, e.g,
# fstrim -v -o 0 -l 100 /xfs7
/xfs7: 9811378176 bytes were trimmed

This issue can be triggered via xfstests/generic/288.

Also, it seems to get the request queue pointer via bdev_get_queue()
instead of the hard code pointer dereference is not a bad thing.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-04 15:42:52 -06:00
Christoph Hellwig
10f73d27c8 xfs: fix the comment explaining xfs_trans_dqlockedjoin
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-04 14:26:57 -06:00
Dan Carpenter
071c529eb6 xfs: underflow bug in xfs_attrlist_by_handle()
If we allocate less than sizeof(struct attrlist) then we end up
corrupting memory or doing a ZERO_PTR_SIZE dereference.

This can only be triggered with CAP_SYS_ADMIN.

Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-04 14:23:46 -06:00