3137 Commits

Author SHA1 Message Date
Eric Sandeen
ca23f8fdd6 xfs: add xfs_verifier_error()
We want to distinguish between corruption, CRC errors,
etc.  In addition, the full stack trace on verifier errors
seems less than helpful; it looks more like an oops than
corruption.

Create a new function to specifically alert the user to
verifier errors, which can differentiate between
EFSCORRUPTED and CRC mismatches.  It doesn't dump stack
unless the xfs error level is turned up high.

Define a new error message (EFSBADCRC) to clearly identify
CRC errors.  (Defined to EBADMSG, bad message)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-27 15:21:07 +11:00
Eric Sandeen
f1dbcd7e38 xfs: add helper for updating checksums on xfs_bufs
Many/most callers of xfs_update_cksum() pass bp->b_addr and
BBTOB(bp->b_length) as the first 2 args.  Add a helper
which can just accept the bp and the crc offset, and work
it out on its own, for brevity.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-27 15:18:23 +11:00
Eric Sandeen
5158217058 xfs: add helper for verifying checksums on xfs_bufs
Many/most callers of xfs_verify_cksum() pass bp->b_addr and
BBTOB(bp->b_length) as the first 2 args.  Add a helper
which can just accept the bp and the crc offset, and work
it out on its own, for brevity.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-27 15:17:27 +11:00
Eric Sandeen
533b81c875 xfs: Use defines for CRC offsets in all cases
Some calls to crc functions used useful #defines,
others used awkward offsetof() constructs.

Switch them all to #define to make things a bit cleaner.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-27 15:15:27 +11:00
Eric Sandeen
e0d2c23a25 xfs: skip pointless CRC updates after verifier failures
Most write verifiers don't update CRCs after the verifier
has failed and the buffer has been marked in error.  These
two didn't, but should.

Add returns to the verifier failure block, since the buffer
won't be written anyway.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-27 15:14:31 +11:00
Namjae Jeon
e1d8fb88a6 xfs: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate
This patch implements fallocate's FALLOC_FL_COLLAPSE_RANGE for XFS.

The semantics of this flag are following:
1) It collapses the range lying between offset and length by removing any data
   blocks which are present in this range and than updates all the logical
   offsets of extents beyond "offset + len" to nullify the hole created by
   removing blocks. In short, it does not leave a hole.
2) It should be used exclusively. No other fallocate flag in combination.
3) Offset and length supplied to fallocate should be fs block size aligned
   in case of xfs and ext4.
4) Collaspe range does not work beyond i_size.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-24 10:58:19 +11:00
Linus Torvalds
645ceee885 Merge branch 'xfs-fixes-for-3.14-rc4' of git://oss.sgi.com/xfs/xfs
Pull xfs fixes from Dave Chinner:
 "This is the first pull request I've had to do for you, so I'm still
  sorting things out.  The reason I'm sending this and not Ben should be
  obvious from the first commit below - SGI has stepped down from the
  XFS maintainership role.  As such, I'd like to take another
  opportunity to thank them for their many years of effort maintaining
  XFS and supporting the XFS community that they developed from the
  ground up.

  So I haven't had time to work things like signed tags into my
  workflows yet, so this is just a repo branch I'm asking you to pull
  from.  And yes, I named the branch -rc4 because I wanted the fixes in
  rc4, not because the branch was for merging into -rc3.  Probably not
  right, either.

  Anyway, I should have everything sorted out by the time the next merge
  window comes around.  If there's anything that you don't like in the
  pull req, feel free to flame me unmercifully.

  The changes are fixes for recent regressions and important thinkos in
  verification code:

        - a log vector buffer alignment issue on ia32
        - timestamps on truncate got mangled
        - primary superblock CRC validation fixes and error message
          sanitisation"

* 'xfs-fixes-for-3.14-rc4' of git://oss.sgi.com/xfs/xfs:
  xfs: limit superblock corruption errors to actual corruption
  xfs: skip verification on initial "guess" superblock read
  MAINTAINERS: SGI no longer maintaining XFS
  xfs: xfs_sb_read_verify() doesn't flag bad crcs on primary sb
  xfs: ensure correct log item buffer alignment
  xfs: ensure correct timestamp updates from truncate
2014-02-22 08:26:01 -08:00
Jan Kara
0dc83bd30b Revert "writeback: do not sync data dirtied after sync start"
This reverts commit c4a391b53a72d2df4ee97f96f78c1d5971b47489. Dave
Chinner <david@fromorbit.com> has reported the commit may cause some
inodes to be left out from sync(2). This is because we can call
redirty_tail() for some inode (which sets i_dirtied_when to current time)
after sync(2) has started or similarly requeue_inode() can set
i_dirtied_when to current time if writeback had to skip some pages. The
real problem is in the functions clobbering i_dirtied_when but fixing
that isn't trivial so revert is a safer choice for now.

CC: stable@vger.kernel.org # >= 3.13
Signed-off-by: Jan Kara <jack@suse.cz>
2014-02-22 02:02:28 +01:00
Dave Chinner
027f185e66 Merge remote-tracking branch 'xfs-async-aio-extend' into for-next 2014-02-20 15:16:39 +11:00
Dave Chinner
b678573e29 Merge branch 'xfs-fixes-for-3.15' into for-next 2014-02-20 15:16:09 +11:00
Eric Sandeen
5ef11eb070 xfs: limit superblock corruption errors to actual corruption
Today, if

xfs_sb_read_verify
  xfs_sb_verify
    xfs_mount_validate_sb

detects superblock corruption, it'll be extremely noisy, dumping
2 stacks, 2 hexdumps, etc.

This is because we call XFS_CORRUPTION_ERROR in xfs_mount_validate_sb
as well as in xfs_sb_read_verify.

Also, *any* errors in xfs_mount_validate_sb which are not corruption
per se; things like too-big-blocksize, bad version, bad magic, v1 dirs,
rw-incompat etc - things which do not return EFSCORRUPTED - will
still do the whole XFS_CORRUPTION_ERROR spew when xfs_sb_read_verify
sees any error at all.  And it suggests to the user that they
should run xfs_repair, even if the root cause of the mount failure
is a simple incompatibility.

I'll submit that the probably-not-corrupted errors don't warrant
this much noise, so this patch removes the warning for anything
other than EFSCORRUPTED returns, and replaces the lower-level
XFS_CORRUPTION_ERROR with an xfs_notice().

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-19 15:39:35 +11:00
Eric Sandeen
daba5427da xfs: skip verification on initial "guess" superblock read
When xfs_readsb() does the very first read of the superblock,
it makes a guess at the length of the buffer, based on the
sector size of the underlying storage.  This may or may
not match the filesystem sector size in sb_sectsize, so
we can't i.e. do a CRC check on it; it might be too short.

In fact, mounting a filesystem with sb_sectsize larger
than the device sector size will cause a mount failure
if CRCs are enabled, because we are checksumming a length
which exceeds the buffer passed to it.

So always read twice; the first time we read with NULL
buffer ops to skip verification; then set the proper
read length, hook up the proper verifier, and give it
another go.

Once we are sure that we've got the right buffer length,
we can also use bp->b_length in the xfs_sb_read_verify,
rather than the less-trusted on-disk sectorsize for
secondary superblocks.  Before this we ran the risk of
passing junk to the crc32c routines, which didn't always
handle extreme values.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-19 15:39:16 +11:00
Eric Sandeen
7a01e707a3 xfs: xfs_sb_read_verify() doesn't flag bad crcs on primary sb
My earlier commit 10e6e65 deserves a layer or two of brown paper
bags.  The logic in that commit means that a CRC failure on the
primary superblock will *never* result in an error return.

Hopefully this fixes it, so that we always return the error
if it's a primary superblock, otherwise only if the filesystem
has CRCs enabled.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2014-02-19 15:33:05 +11:00
Dave Chinner
3895e51f6d xfs: ensure correct log item buffer alignment
On 32 bit platforms, the log item vector headers are not 64 bit
aligned or sized. hence if we don't take care to align them
correctly or pad the buffer appropriately for 8 byte alignment, we
can end up with alignment issues when accessing the user buffer
directly as a structure.

To solve this, simply pad the buffer headers to 64 bit offset so
that the data section is always 8 byte aligned.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Tested-by: Michael L. Semon <mlsemon35@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-10 10:37:18 +11:00
Christoph Hellwig
fe60a8a091 xfs: ensure correct timestamp updates from truncate
The VFS doesn't set the proper ATTR_CTIME and ATTR_MTIME values for
truncate, so filesystems have to manually add them.  The
introduction of xfs_setattr_time accidentally broke this special
case an caused a regression in generic/313.  Fix this by removing
the local mask variable in xfs_setattr_size so that we only have a
single place to keep the attribute information.

cc: <stable@vger.kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-10 10:35:22 +11:00
Christoph Hellwig
9862f62fab xfs: allow appending aio writes
XFS can easily support appending aio writes by ensuring we always allocate
blocks as unwritten extents when performing direct I/O writes and only
converting them to written extents at I/O completion.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-10 10:28:04 +11:00
Christoph Hellwig
d531d91d69 xfs: always use unwritten extents for direct I/O writes
To allow aio writes beyond i_size we need to create unwritten extents for
newly allocated blocks, similar to how we already do inside i_size.

Instead of adding another special case we now use unwritten extents
unconditionally.  This also marks the end of directly allocation data
extents in all of XFS - we now always use either delalloc or unwritten
extents.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-10 10:27:43 +11:00
Al Viro
d311d79de3 fix O_SYNC|O_APPEND syncing the wrong range on write()
It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support)
when sync_page_range() had been introduced; generic_file_write{,v}() correctly
synced
	pos_after_write - written .. pos_after_write - 1
but generic_file_aio_write() synced
	pos_before_write .. pos_before_write + written - 1
instead.  Which is not the same thing with O_APPEND, obviously.
A couple of years later correct variant had been killed off when
everything switched to use of generic_file_aio_write().

All users of generic_file_aio_write() are affected, and the same bug
has been copied into other instances of ->aio_write().

The fix is trivial; the only subtle point is that generic_write_sync()
ought to be inlined to avoid calculations useless for the majority of
calls.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-02-09 15:18:09 -05:00
Jie Liu
492185ef1d xfs: remove XFS_TRANS_DEBUG dead code
Remove the leftover XFS_TRANS_DEBUG dead code following the previous
cleaning up of it in commits ec47eb6b0b450.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-07 15:26:11 +11:00
Jie Liu
4ae69fea58 xfs: return -E2BIG if hit the maximum size limits of ACLs
We should return -E2BIG rather than -EINVAL if hit the maximum size
limits of ACLS, as the former is consistent with VFS xattr syscalls.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-07 15:26:11 +11:00
Eric Sandeen
392c6de98a xfs: sanitize sb_inopblock in xfs_mount_validate_sb
xfs_mount_validate_sb doesn't check sb_inopblock for sanity
(as does its xfs_repair counterpart, FWIW).

If it's out of bounds, we can go off the rails in i.e.
xfs_inode_buf_verify(), which uses sb_inopblock as a loop
limit when stepping through a metadata buffer.

The problem can be demonstrated easily by corrupting
sb_inopblock with xfs_db and trying to mount the result:

# mkfs.xfs -dfile,name=fsfile,size=1g
# xfs_db -x fsfile
xfs_db> sb 0
xfs_db> write inopblock 512
inopblock = 512
xfs_db> quit

# mount -o loop fsfile  mnt
and we blow up in xfs_inode_buf_verify().

With this patch, we get a (very noisy) corruption error,
and fail the mount as we should.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-07 15:26:11 +11:00
Jie Liu
c6f9726444 xfs: convert xfs_log_commit_cil() to void
Convert xfs_log_commit_cil() to a void function since it return nothing
but 0 in any case, after that we can simplify the relative code logic
in xfs_trans_commit() accordingly.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-07 15:26:07 +11:00
Brian Foster
410b11a675 xfs: use tr_qm_dqalloc log reservation for dquot alloc
The dquot allocation path in xfs_qm_dqread() currently uses the
attribute set log reservation, which appears to be incorrect. We
have reports of transaction reservation overruns with the current
code. E.g., a repeated run of xfstests test generic/270 on a 512b
block size fs occassionally produces the following in dmesg:

	XFS (sdN): xlog_write: reservation summary:
	  trans type  = QM_DQALLOC (30)
	  unit res    = 7080 bytes
	  current res = -632 bytes
	  total reg   = 0 bytes (o/flow = 0 bytes)
	  ophdrs      = 0 (ophdr space = 0 bytes)
	  ophdr + reg = 0 bytes
	  num regions = 0

	XFS (sdN): xlog_write: reservation ran out. Need to up reservation

The dquot allocation case should consist of a write reservation
(i.e., we are allocating a range of the internal quota file) plus
the size of the actual dquots. We already have a log reservation
definition for this operation (tr_qm_dqalloc). Use it in
xfs_qm_dqread() and update the log reservation calculation function
to use the write res. calculation function rather than reading the
assumed to be pre-calculated value directly.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-07 14:55:54 +11:00
Eric Sandeen
c19ec23535 xfs: remove unused tr_swrite
tr_swrite is never used, remove it.

From a very quick look, I think the usage of it (and its ancestor
XFS_SWRITE_LOG_RES) went away in commit 13e6d5cd "xfs: merge fsync
and O_SYNC handling" back in 2009.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-07 14:54:22 +11:00
Brian Foster
70bbca0776 xfs: use tr_growrtalloc for growing rt files
This is a regression from the following commit:

3d3c8b5222b9 xfs: refactor xfs_trans_reserve() interface

Use the tr_growrtalloc log reservation for growing the
bitmap/summary files.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-02-07 14:53:50 +11:00
Linus Torvalds
f568849eda Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block
Pull core block IO changes from Jens Axboe:
 "The major piece in here is the immutable bio_ve series from Kent, the
  rest is fairly minor.  It was supposed to go in last round, but
  various issues pushed it to this release instead.  The pull request
  contains:

   - Various smaller blk-mq fixes from different folks.  Nothing major
     here, just minor fixes and cleanups.

   - Fix for a memory leak in the error path in the block ioctl code
     from Christian Engelmayer.

   - Header export fix from CaiZhiyong.

   - Finally the immutable biovec changes from Kent Overstreet.  This
     enables some nice future work on making arbitrarily sized bios
     possible, and splitting more efficient.  Related fixes to immutable
     bio_vecs:

        - dm-cache immutable fixup from Mike Snitzer.
        - btrfs immutable fixup from Muthu Kumar.

  - bio-integrity fix from Nic Bellinger, which is also going to stable"

* 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
  xtensa: fixup simdisk driver to work with immutable bio_vecs
  block/blk-mq-cpu.c: use hotcpu_notifier()
  blk-mq: for_each_* macro correctness
  block: Fix memory leak in rw_copy_check_uvector() handling
  bio-integrity: Fix bio_integrity_verify segment start bug
  block: remove unrelated header files and export symbol
  blk-mq: uses page->list incorrectly
  blk-mq: use __smp_call_function_single directly
  btrfs: fix missing increment of bi_remaining
  Revert "block: Warn and free bio if bi_end_io is not set"
  block: Warn and free bio if bi_end_io is not set
  blk-mq: fix initializing request's start time
  block: blk-mq: don't export blk_mq_free_queue()
  block: blk-mq: make blk_sync_queue support mq
  block: blk-mq: support draining mq queue
  dm cache: increment bi_remaining when bi_end_io is restored
  block: fixup for generic bio chaining
  block: Really silence spurious compiler warnings
  block: Silence spurious compiler warnings
  block: Kill bio_pair_split()
  ...
2014-01-30 11:19:05 -08:00
Linus Torvalds
f1499382f1 xfs: update #2 for v3.14-rc1
- allow logical sector sized direct io on 'advanced format' 4k/512 disk.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJS6C5lAAoJENaLyazVq6ZOFFMP/Ag3IzyGv9zQ1I1ebqxdGNmU
 8bNVvurN+Az1ISgXkMdcbqNgpVxcWaAcpbJwxWsRBaKBxtBmBWa0TF33XEmrFdmO
 Qqw3qLSsPzIDP5l518/bpn1Q0XhMWO2zss9HtHogw/7WBkB5qQYn94DAwIF1Zrga
 cimpuiVv7u73BnlMJeQl+mqONkuEUl9Oc9lOyDP7lqbAlp3mBDylTs0q99kug8fF
 LSQ+f+06PFKshPzTtcZHhqwMtK7Pd+wPp3B3i0iGvv1GcWWL1adM5HfWPLLQ/85l
 CvdTnKnIWpNvmaEbPtpB+kuz7V3CtnLFIBM1oRiYZoNRNEUG3SkkqH+xSkXY+Wfr
 Wiuplt50vHUvn1tiL6n2GE0QAyTNhcy2RiWzpkIim6S2LUXCmdOKvT9hEEs5ymVl
 Y+VSXwdPEbCHSfhqkg+PXUbWxk+WTkDe9rOMFK4VcTjDuxiWUAiFgVpEHrxjPV4A
 rFPwNgfUpaRMou5SzTZHLuM8YQ75DckDH+O3XpWD4boEfeTGeV3LhlRzLvIkL0I2
 8VB24kV9TPUj+rphh9AckdU3kfN1wIb7ZOjwr008I5udfgPwjoj+kSgnd+k2yPtr
 r/ryHTV+RchbsiYEdriibSgUOtJNdNWnboba1DEDZDSvqALsq5eKWmpm3InE85Ix
 2bN0SdjsTphp/Mc1Mllg
 =c0tN
 -----END PGP SIGNATURE-----

Merge tag 'xfs-for-linus-v3.14-rc1-2' of git://oss.sgi.com/xfs/xfs

Pull second xfs update from Ben Myers:
 "Allow logical sector sized direct io on 'advanced format' 4k/512 disk"

* tag 'xfs-for-linus-v3.14-rc1-2' of git://oss.sgi.com/xfs/xfs:
  xfs: allow logical-sector sized O_DIRECT
  xfs: rename xfs_buftarg structure members
  xfs: clean up xfs_buftarg
2014-01-28 18:21:22 -08:00
Linus Torvalds
bf3d846b78 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "Assorted stuff; the biggest pile here is Christoph's ACL series.  Plus
  assorted cleanups and fixes all over the place...

  There will be another pile later this week"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits)
  __dentry_path() fixes
  vfs: Remove second variable named error in __dentry_path
  vfs: Is mounted should be testing mnt_ns for NULL or error.
  Fix race when checking i_size on direct i/o read
  hfsplus: remove can_set_xattr
  nfsd: use get_acl and ->set_acl
  fs: remove generic_acl
  nfs: use generic posix ACL infrastructure for v3 Posix ACLs
  gfs2: use generic posix ACL infrastructure
  jfs: use generic posix ACL infrastructure
  xfs: use generic posix ACL infrastructure
  reiserfs: use generic posix ACL infrastructure
  ocfs2: use generic posix ACL infrastructure
  jffs2: use generic posix ACL infrastructure
  hfsplus: use generic posix ACL infrastructure
  f2fs: use generic posix ACL infrastructure
  ext2/3/4: use generic posix ACL infrastructure
  btrfs: use generic posix ACL infrastructure
  fs: make posix_acl_create more useful
  fs: make posix_acl_chmod more useful
  ...
2014-01-28 08:38:04 -08:00
Christoph Hellwig
2401dc2975 xfs: use generic posix ACL infrastructure
Also don't bother to set up a .get_acl method for symlinks as we do not
support access control (ACLs or even mode bits) for symlinks in Linux,
and create inodes with the proper mode instead of fixing it up later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-25 23:58:21 -05:00
Christoph Hellwig
37bc15392a fs: make posix_acl_create more useful
Rename the current posix_acl_created to __posix_acl_create and add
a fully featured helper to set up the ACLs on file creation that
uses get_acl().

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-25 23:58:18 -05:00
Christoph Hellwig
5bf3258fd2 fs: make posix_acl_chmod more useful
Rename the current posix_acl_chmod to __posix_acl_chmod and add
a fully featured ACL chmod helper that uses the ->set_acl inode
operation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-25 23:58:18 -05:00
Al Viro
96c8c44211 xfs: switch to kfree_put_link()
don't bother open-coding it...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-25 03:13:00 -05:00
Eric Sandeen
7c71ee7803 xfs: allow logical-sector sized O_DIRECT
Some time ago, mkfs.xfs started picking the storage physical
sector size as the default filesystem "sector size" in order
to avoid RMW costs incurred by doing IOs at logical sector
size alignments.

However, this means that for a filesystem made with i.e.
a 4k sector size on an "advanced format" 4k/512 disk,
512-byte direct IOs are no longer allowed.  This means
that XFS has essentially turned this AF drive into a hard
4K device, from the filesystem on up.

XFS's mkfs-specified "sector size" is really just controlling
the minimum size & alignment of filesystem metadata.

There is no real need to tightly couple XFS's minimal
metadata size to the minimum allowed direct IO size;
XFS can continue doing metadata in optimal sizes, but
still allow smaller DIOs for apps which issue them,
for whatever reason.

This patch adds a new field to the xfs_buftarg, so that
we now track 2 sizes:

 1) The metadata sector size, which is the minimum unit and
    alignment of IO which will be performed by metadata operations.
 2) The device logical sector size

The first is used internally by the file system for metadata
alignment and IOs.
The second is used for the minimum allowed direct IO alignment.

This has passed xfstests on filesystems made with 4k sectors,
including when run under the patch I sent to ignore
XFS_IOC_DIOINFO, and issue 512 DIOs anyway.  I also directly
tested end of block behavior on preallocated, sparse, and
existing files when we do a 512 IO into a 4k file on a 
4k-sector filesystem, to be sure there were no unexpected
behaviors.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2014-01-24 11:55:42 -06:00
Eric Sandeen
6da54179b3 xfs: rename xfs_buftarg structure members
In preparation for adding new members to the structure,
give these old ones more descriptive names:

	bt_ssize -> bt_meta_sectorsize
	bt_smask -> bt_meta_sectormask

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2014-01-24 11:52:12 -06:00
Eric Sandeen
f0bc9985fe xfs: clean up xfs_buftarg
Clean up the xfs_buftarg structure a bit:
- remove bt_bsize which is never used
- replace bt_sshift with bt_ssize; we only ever shift it back

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2014-01-24 11:49:20 -06:00
Chuansheng Liu
1f4a63bf01 xfs: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
In case CONFIG_DEBUG_OBJECTS_WORK is defined, it is needed to
call destroy_work_on_stack() which frees the debug object to pair
with INIT_WORK_ONSTACK().

Signed-off-by: Liu, Chuansheng <chuansheng.liu@intel.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 6f96b3063cdd473c68664a190524ed966ac0cd92)
2014-01-10 12:39:38 -06:00
Jie Liu
bba719b500 xfs: fix off-by-one error in xfs_attr3_rmt_verify
With CRC check is enabled, if trying to set an attributes value just
equal to the maximum size of XATTR_SIZE_MAX would cause the v3 remote
attr write verification procedure failure, which would yield the back
trace like below:

<snip>
XFS (sda7): Internal error xfs_attr3_rmt_write_verify at line 191 of file fs/xfs/xfs_attr_remote.c
<snip>
Call Trace:
[<ffffffff816f0042>] dump_stack+0x45/0x56
[<ffffffffa0d99c8b>] xfs_error_report+0x3b/0x40 [xfs]
[<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffffa0d99ce5>] xfs_corruption_error+0x55/0x80 [xfs]
[<ffffffffa0dbef6b>] xfs_attr3_rmt_write_verify+0x14b/0x1a0 [xfs]
[<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d96edd>] _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffff81184cda>] ? vm_map_ram+0x31a/0x460
[<ffffffff81097230>] ? wake_up_state+0x20/0x20
[<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d9726b>] xfs_buf_iorequest+0x6b/0xc0 [xfs]
[<ffffffffa0d97315>] xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d97906>] xfs_bwrite+0x46/0x80 [xfs]
[<ffffffffa0dbfa94>] xfs_attr_rmtval_set+0x334/0x490 [xfs]
[<ffffffffa0db84aa>] xfs_attr_leaf_addname+0x24a/0x410 [xfs]
[<ffffffffa0db8893>] xfs_attr_set_int+0x223/0x470 [xfs]
[<ffffffffa0db8b76>] xfs_attr_set+0x96/0xb0 [xfs]
[<ffffffffa0db13b2>] xfs_xattr_set+0x42/0x70 [xfs]
[<ffffffff811df9b2>] generic_setxattr+0x62/0x80
[<ffffffff811e0213>] __vfs_setxattr_noperm+0x63/0x1b0
[<ffffffff81307afe>] ? evm_inode_setxattr+0xe/0x10
[<ffffffff811e0415>] vfs_setxattr+0xb5/0xc0
[<ffffffff811e054e>] setxattr+0x12e/0x1c0
[<ffffffff811c6e82>] ? final_putname+0x22/0x50
[<ffffffff811c708b>] ? putname+0x2b/0x40
[<ffffffff811cc4bf>] ? user_path_at_empty+0x5f/0x90
[<ffffffff811bdfd9>] ? __sb_start_write+0x49/0xe0
[<ffffffff81168589>] ? vm_mmap_pgoff+0x99/0xc0
[<ffffffff811e07df>] SyS_setxattr+0x8f/0xe0
[<ffffffff81700c2d>] system_call_fastpath+0x1a/0x1f

Tests:
    setfattr -n user.longxattr -v `perl -e 'print "A"x65536'` testfile

This patch fix it to check the remote EA size is greater than the
XATTR_SIZE_MAX rather than more than or equal to it, because it's
valid if the specified EA value size is equal to the limitation as
per VFS setxattr interface.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 85dd0707f0cad26d60f2dc574d17a5ab948d10f7)
2014-01-10 12:38:41 -06:00
Ben Myers
bf3964c188 Merge branch 'xfs-extent-list-locking-fixes' into for-next
A set of fixes which makes sure we are taking the ilock whenever accessing the
extent list.  This was associated with "Access to block zero" messages which
may result in extent list corruption.
2014-01-09 16:03:18 -06:00
Ben Myers
dc16b186bb Merge branch 'xfs-misc' into for-next
A bugfix for an off-by-one in the remote attribute verifier, and a fix for a
missing destroy_work_on_stack() in the allocation worker.
2014-01-09 15:58:59 -06:00
Chuansheng Liu
6f96b3063c xfs: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
In case CONFIG_DEBUG_OBJECTS_WORK is defined, it is needed to
call destroy_work_on_stack() which frees the debug object to pair
with INIT_WORK_ONSTACK().

Signed-off-by: Liu, Chuansheng <chuansheng.liu@intel.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2014-01-09 15:50:31 -06:00
Zhi Yong Wu
ab29743117 xfs: allow linkat() on O_TMPFILE files
The VFS allows an anonymous temporary file to be named at a later
time via a linkat() syscall. The inodes for O_TMPFILE files are
are marked with a special flag I_LINKABLE and have a zero link count.

To support this in XFS, xfs_link() detects if this flag I_LINKABLE
is set and behaves appropriately when detected. So in this case,
its transaciton reservation takes into account the additional
overhead of removing the inode from the unlinked list. Then the
inode is removed from the unlinked list and the directory entry
is added. Finally its link count is bumped accordingly.

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de> 
Signed-off-by: Ben Myers <bpm@sgi.com>
2014-01-06 13:52:53 -06:00
Zhi Yong Wu
99b6436bc2 xfs: add O_TMPFILE support
Add two functions xfs_create_tmpfile() and xfs_vn_tmpfile()
to support O_TMPFILE file creation.

In contrast to xfs_create(), xfs_create_tmpfile() has a different
log reservation to the regular file creation because there is no
directory modification, and doesn't check if an entry can be added
to the directory, but the reservation quotas is required appropriately,
and finally its inode is added to the unlinked list.

xfs_vn_tmpfile() add one O_TMPFILE method to VFS interface and directly
invoke xfs_create_tmpfile().

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2014-01-06 13:50:06 -06:00
Zhi Yong Wu
163467d375 xfs: factor prid related codes into xfs_get_initial_prid()
It will be reused by the O_TMPFILE creation function.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2014-01-06 13:46:25 -06:00
Jie Liu
85dd0707f0 xfs: fix off-by-one error in xfs_attr3_rmt_verify
With CRC check is enabled, if trying to set an attributes value just
equal to the maximum size of XATTR_SIZE_MAX would cause the v3 remote
attr write verification procedure failure, which would yield the back
trace like below:

<snip>
XFS (sda7): Internal error xfs_attr3_rmt_write_verify at line 191 of file fs/xfs/xfs_attr_remote.c
<snip>
Call Trace:
[<ffffffff816f0042>] dump_stack+0x45/0x56
[<ffffffffa0d99c8b>] xfs_error_report+0x3b/0x40 [xfs]
[<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffffa0d99ce5>] xfs_corruption_error+0x55/0x80 [xfs]
[<ffffffffa0dbef6b>] xfs_attr3_rmt_write_verify+0x14b/0x1a0 [xfs]
[<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d96edd>] _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffff81184cda>] ? vm_map_ram+0x31a/0x460
[<ffffffff81097230>] ? wake_up_state+0x20/0x20
[<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d9726b>] xfs_buf_iorequest+0x6b/0xc0 [xfs]
[<ffffffffa0d97315>] xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d97906>] xfs_bwrite+0x46/0x80 [xfs]
[<ffffffffa0dbfa94>] xfs_attr_rmtval_set+0x334/0x490 [xfs]
[<ffffffffa0db84aa>] xfs_attr_leaf_addname+0x24a/0x410 [xfs]
[<ffffffffa0db8893>] xfs_attr_set_int+0x223/0x470 [xfs]
[<ffffffffa0db8b76>] xfs_attr_set+0x96/0xb0 [xfs]
[<ffffffffa0db13b2>] xfs_xattr_set+0x42/0x70 [xfs]
[<ffffffff811df9b2>] generic_setxattr+0x62/0x80
[<ffffffff811e0213>] __vfs_setxattr_noperm+0x63/0x1b0
[<ffffffff81307afe>] ? evm_inode_setxattr+0xe/0x10
[<ffffffff811e0415>] vfs_setxattr+0xb5/0xc0
[<ffffffff811e054e>] setxattr+0x12e/0x1c0
[<ffffffff811c6e82>] ? final_putname+0x22/0x50
[<ffffffff811c708b>] ? putname+0x2b/0x40
[<ffffffff811cc4bf>] ? user_path_at_empty+0x5f/0x90
[<ffffffff811bdfd9>] ? __sb_start_write+0x49/0xe0
[<ffffffff81168589>] ? vm_mmap_pgoff+0x99/0xc0
[<ffffffff811e07df>] SyS_setxattr+0x8f/0xe0
[<ffffffff81700c2d>] system_call_fastpath+0x1a/0x1f

Tests:
    setfattr -n user.longxattr -v `perl -e 'print "A"x65536'` testfile

This patch fix it to check the remote EA size is greater than the
XATTR_SIZE_MAX rather than more than or equal to it, because it's
valid if the specified EA value size is equal to the limitation as
per VFS setxattr interface.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2014-01-06 13:37:35 -06:00
Jens Axboe
b28bc9b38c Linux 3.13-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJSwLfoAAoJEHm+PkMAQRiGi6QH/1U1B7lmHChDTw3jj1lfm9gA
 189Si4QJlnxFWCKHvKEL+pcaVuACU+aMGI8+KyMYK4/JfuWVjjj5fr/SvyHH2/8m
 LdSK8aHMhJ46uBS4WJ/l6v46qQa5e2vn8RKSBAyKm/h4vpt+hd6zJdoFrFai4th7
 k/TAwOAEHI5uzexUChwLlUBRTvbq4U8QUvDu+DeifC8cT63CGaaJ4qVzjOZrx1an
 eP6UXZrKDASZs7RU950i7xnFVDQu4PsjlZi25udsbeiKcZJgPqGgXz5ULf8ZH8RQ
 YCi1JOnTJRGGjyIOyLj7pyB01h7XiSM2+eMQ0S7g54F2s7gCJ58c2UwQX45vRWU=
 =/4/R
 -----END PGP SIGNATURE-----

Merge tag 'v3.13-rc6' into for-3.14/core

Needed to bring blk-mq uptodate, since changes have been going in
since for-3.14/core was established.

Fixup merge issues related to the immutable biovec changes.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	block/blk-flush.c
	fs/btrfs/check-integrity.c
	fs/btrfs/extent_io.c
	fs/btrfs/scrub.c
	fs/logfs/dev_bdev.c
2013-12-31 09:51:02 -07:00
Christoph Hellwig
eef334e577 xfs: assert that we hold the ilock for extent map access
Make sure that xfs_bmapi_read has the ilock held in some way, and that
xfs_bmapi_write, xfs_bmapi_delay, xfs_bunmapi and xfs_iread_extents are
called with the ilock held exclusively.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-18 16:08:46 -06:00
Christoph Hellwig
568d994e9f xfs: use xfs_ilock_attr_map_shared in xfs_attr_list_int
We might not have read in the extent list at this point, so make sure we
take the ilock exclusively if we have to do so.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-18 16:08:04 -06:00
Christoph Hellwig
683cb94159 xfs: use xfs_ilock_attr_map_shared in xfs_attr_get
We might not have read in the extent list at this point, so make sure we
take the ilock exclusively if we have to do so.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-18 16:07:09 -06:00
Christoph Hellwig
da51d32d45 xfs: use xfs_ilock_data_map_shared in xfs_qm_dqiterate
We might not have read in the extent list at this point, so make sure we
take the ilock exclusively if we have to do so.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-18 16:06:38 -06:00
Christoph Hellwig
f4df8adc83 xfs: use xfs_ilock_data_map_shared in xfs_qm_dqtobp
We might not have read in the extent list at this point, so make sure we
take the ilock exclusively if we have to do so.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-12-18 16:04:18 -06:00