43248 Commits

Author SHA1 Message Date
Linus Torvalds
33caf82acf Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "All kinds of stuff.  That probably should've been 5 or 6 separate
  branches, but by the time I'd realized how large and mixed that bag
  had become it had been too close to -final to play with rebasing.

  Some fs/namei.c cleanups there, memdup_user_nul() introduction and
  switching open-coded instances, burying long-dead code, whack-a-mole
  of various kinds, several new helpers for ->llseek(), assorted
  cleanups and fixes from various people, etc.

  One piece probably deserves special mention - Neil's
  lookup_one_len_unlocked().  Similar to lookup_one_len(), but gets
  called without ->i_mutex and tries to avoid ever taking it.  That, of
  course, means that it's not useful for any directory modifications,
  but things like getting inode attributes in nfds readdirplus are fine
  with that.  I really should've asked for moratorium on lookup-related
  changes this cycle, but since I hadn't done that early enough...  I
  *am* asking for that for the coming cycle, though - I'm going to try
  and get conversion of i_mutex to rwsem with ->lookup() done under lock
  taken shared.

  There will be a patch closer to the end of the window, along the lines
  of the one Linus had posted last May - mechanical conversion of
  ->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/
  inode_is_locked()/inode_lock_nested().  To quote Linus back then:

    -----
    |    This is an automated patch using
    |
    |        sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/'
    |        sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/'
    |        sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[     ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/'
    |        sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/'
    |        sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/'
    |
    |    with a very few manual fixups
    -----

  I'm going to send that once the ->i_mutex-affecting stuff in -next
  gets mostly merged (or when Linus says he's about to stop taking
  merges)"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  nfsd: don't hold i_mutex over userspace upcalls
  fs:affs:Replace time_t with time64_t
  fs/9p: use fscache mutex rather than spinlock
  proc: add a reschedule point in proc_readfd_common()
  logfs: constify logfs_block_ops structures
  fcntl: allow to set O_DIRECT flag on pipe
  fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE
  fs: xattr: Use kvfree()
  [s390] page_to_phys() always returns a multiple of PAGE_SIZE
  nbd: use ->compat_ioctl()
  fs: use block_device name vsprintf helper
  lib/vsprintf: add %*pg format specifier
  fs: use gendisk->disk_name where possible
  poll: plug an unused argument to do_poll
  amdkfd: don't open-code memdup_user()
  cdrom: don't open-code memdup_user()
  rsxx: don't open-code memdup_user()
  mtip32xx: don't open-code memdup_user()
  [um] mconsole: don't open-code memdup_user_nul()
  [um] hostaudio: don't open-code memdup_user()
  ...
2016-01-12 17:11:47 -08:00
Linus Torvalds
fce205e9da Merge branch 'work.copy_file_range' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs copy_file_range updates from Al Viro:
 "Several series around copy_file_range/CLONE"

* 'work.copy_file_range' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  btrfs: use new dedupe data function pointer
  vfs: hoist the btrfs deduplication ioctl to the vfs
  vfs: wire up compat ioctl for CLONE/CLONE_RANGE
  cifs: avoid unused variable and label
  nfsd: implement the NFSv4.2 CLONE operation
  nfsd: Pass filehandle to nfs4_preprocess_stateid_op()
  vfs: pull btrfs clone API to vfs layer
  locks: new locks_mandatory_area calling convention
  vfs: Add vfs_copy_file_range() support for pagecache copies
  btrfs: add .copy_file_range file operation
  x86: add sys_copy_file_range to syscall tables
  vfs: add copy_file_range syscall and vfs helper
2016-01-12 16:30:34 -08:00
Linus Torvalds
065019a38f File locking related changes for v4.5 (pile #1)
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWkwhdAAoJEAAOaEEZVoIVgpUQAMhB2+ryZtlJy4s7lkfI3Wwi
 ni7lAuJ6xXB0FIA8wqNzz6fVDY0pbsfwR45OS11fh+hU2FnM8REHCDPC47E8MQYx
 ft0Kfp7Z0tLAPni7XTVd/gFy8zTDGOKXBlu44PNaVEdtPJzIXwVzm2QkT7F3ExOz
 mkXSCta7lFemBQ0DhbafiWbfQ8yav1HFGZG7XN06A76y8ZET+Uu1oyiPPI4jvHlO
 vHrxpwia2ROnQHeG0pLR7KvOmN3ZSTJZuH6LiMZH0QFqyocYzmhR9rQ/hrxBg0rU
 IDzcMjP0ybU9Fu/o7sDShnkTawRuVLt0zasfdlYtGVCTYBx8f7WqkJnLTCwWYVDG
 MLQM7y8xWHM1f7uLhgT8WHg8O/e5saVUQ/djBqPI/ubGG1/LHDxyxH/GPVbeKa66
 G8jChyPmIdxdsjIapzefOjnTIi2vhZqv9I1gSKCj+x554GahoYQe7l0YbNnZGmNS
 O12QQ7dUpkzgDQEiTh73S3Ay2Ng95K2DztuHs6NXFdbiwpFMZqVATLXBEOYryBx/
 n487ZqrsTV7T3jH/ekxth1+j0Hpmigj8FNy21/nZ0Nr0OaTJFwsLEdN4Vi7LIM+H
 jBMEBk5dGIHODMvB/8NCud0eWzB671iLgVto7or/rT1YmaFapl/KR7FEWNv19sLN
 tshSViTosLGffQMpObOk
 =wJUS
 -----END PGP SIGNATURE-----

Merge tag 'locks-v4.5-1' of git://git.samba.org/jlayton/linux

Pull file locking updates from Jeff Layton:
 "File locking related changes for v4.5 (pile #1)

  Highlights:
   - new Kconfig option to allow disabling mandatory locking (which is
     racy anyway)
   - new tracepoints for setlk and close codepaths
   - fix for a long-standing bug in code that handles races between
     setting a POSIX lock and close()"

* tag 'locks-v4.5-1' of git://git.samba.org/jlayton/linux:
  locks: rename __posix_lock_file to posix_lock_inode
  locks: prink more detail when there are leaked locks
  locks: pass inode pointer to locks_free_lock_context
  locks: sprinkle some tracepoints around the file locking code
  locks: don't check for race with close when setting OFD lock
  locks: fix unlock when fcntl_setlk races with a close
  fs: make locks.c explicitly non-modular
  locks: use list_first_entry_or_null()
  locks: Don't allow mounts in user namespaces to enable mandatory locking
  locks: Allow disabling mandatory locking at compile time
2016-01-12 15:46:17 -08:00
Linus Torvalds
4f31d774dd Merge branch 'for-linus-4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
 "This contains beside of random fixes/cleanups two bigger changes:

   - seccomp support by Mickaël Salaün

   - IRQ rework by Anton Ivanov"

* 'for-linus-4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: Use race-free temporary file creation
  um: Do not set unsecure permission for temporary file
  um: Fix build error and kconfig for i386
  um: Add seccomp support
  um: Add full asm/syscall.h support
  selftests/seccomp: Remove the need for HAVE_ARCH_TRACEHOOK
  um: Fix ptrace GETREGS/SETREGS bugs
  um: link with -lpthread
  um: Update UBD to use pread/pwrite family of functions
  um: Do not change hard IRQ flags in soft IRQ processing
  um: Prevent IRQ handler reentrancy
  uml: flush stdout before forking
  uml: fix hostfs mknod()
2016-01-12 13:27:18 -08:00
Linus Torvalds
67c707e451 Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "The main changes in this cycle were:

   - code patching and cpu_has cleanups (Borislav Petkov)

   - paravirt cleanups (Juergen Gross)

   - TSC cleanup (Thomas Gleixner)

   - ptrace cleanup (Chen Gang)"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arch/x86/kernel/ptrace.c: Remove unused arg_offs_table
  x86/mm: Align macro defines
  x86/cpu: Provide a config option to disable static_cpu_has
  x86/cpufeature: Remove unused and seldomly used cpu_has_xx macros
  x86/cpufeature: Cleanup get_cpu_cap()
  x86/cpufeature: Move some of the scattered feature bits to x86_capability
  x86/paravirt: Remove paravirt ops pmd_update[_defer] and pte_update_defer
  x86/paravirt: Remove unused pv_apic_ops structure
  x86/tsc: Remove unused tsc_pre_init() hook
  x86: Remove unused function cpu_has_ht_siblings()
  x86/paravirt: Kill some unused patching functions
2016-01-11 16:26:03 -08:00
Jaegeuk Kim
447135a866 f2fs: should unset atomic flag after successful commit
If there is an error during commit, we should keep the flag in order to
abort it.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-11 15:56:44 -08:00
Jaegeuk Kim
1663cae48c f2fs: fix wrong memory condition check
This patch fixes wrong decision for avaliable_free_memory.
The return valus is already set as false, so we should consider true condition
below only.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-11 15:56:43 -08:00
Jaegeuk Kim
42190d2a86 f2fs: monitor the number of background checkpoint
This patch adds to show the number of background checkpoint.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-11 15:56:42 -08:00
Jaegeuk Kim
d0239e1bf5 f2fs: detect idle time depending on user behavior
This patch adds last time that user requested filesystem operations.
This information is used to detect whether system is idle or not later.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-11 15:56:37 -08:00
Jaegeuk Kim
6beceb5427 f2fs: introduce time and interval facility
This patch adds time and interval arrays to store some timing variables.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-11 15:36:27 -08:00
Linus Torvalds
ddf1d6238d Merge branch 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs xattr updates from Al Viro:
 "Andreas' xattr cleanup series.

  It's a followup to his xattr work that went in last cycle; -0.5KLoC"

* 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  xattr handlers: Simplify list operation
  ocfs2: Replace list xattr handler operations
  nfs: Move call to security_inode_listsecurity into nfs_listxattr
  xfs: Change how listxattr generates synthetic attributes
  tmpfs: listxattr should include POSIX ACL xattrs
  tmpfs: Use xattr handler infrastructure
  btrfs: Use xattr handler infrastructure
  vfs: Distinguish between full xattr names and proper prefixes
  posix acls: Remove duplicate xattr name definitions
  gfs2: Remove gfs2_xattr_acl_chmod
  vfs: Remove vfs_xattr_cmp
2016-01-11 13:32:10 -08:00
Linus Torvalds
32fb378437 Merge branch 'work.symlinks' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs RCU symlink updates from Al Viro:
 "Replacement of ->follow_link/->put_link, allowing to stay in RCU mode
  even if the symlink is not an embedded one.

  No changes since the mailbomb on Jan 1"

* 'work.symlinks' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  switch ->get_link() to delayed_call, kill ->put_link()
  kill free_page_put_link()
  teach nfs_get_link() to work in RCU mode
  teach proc_self_get_link()/proc_thread_self_get_link() to work in RCU mode
  teach shmem_get_link() to work in RCU mode
  teach page_get_link() to work in RCU mode
  replace ->follow_link() with new method that could stay in RCU mode
  don't put symlink bodies in pagecache into highmem
  namei: page_getlink() and page_follow_link_light() are the same thing
  ufs: get rid of ->setattr() for symlinks
  udf: don't duplicate page_symlink_inode_operations
  logfs: don't duplicate page_symlink_inode_operations
  switch befs long symlinks to page_symlink_operations
2016-01-11 13:13:23 -08:00
Dave Chinner
dde7f55bd0 Merge branch 'xfs-misc-fixes-for-4.5-2' into for-next 2016-01-12 07:04:30 +11:00
Dave Chinner
7d6a13f023 xfs: handle dquot buffer readahead in log recovery correctly
When we do dquot readahead in log recovery, we do not use a verifier
as the underlying buffer may not have dquots in it. e.g. the
allocation operation hasn't yet been replayed. Hence we do not want
to fail recovery because we detect an operation to be replayed has
not been run yet. This problem was addressed for inodes in commit
d891400 ("xfs: inode buffers may not be valid during recovery
readahead") but the problem was not recognised to exist for dquots
and their buffers as the dquot readahead did not have a verifier.

The result of not using a verifier is that when the buffer is then
next read to replay a dquot modification, the dquot buffer verifier
will only be attached to the buffer if *readahead is not complete*.
Hence we can read the buffer, replay the dquot changes and then add
it to the delwri submission list without it having a verifier
attached to it. This then generates warnings in xfs_buf_ioapply(),
which catches and warns about this case.

Fix this and make it handle the same readahead verifier error cases
as for inode buffers by adding a new readahead verifier that has a
write operation as well as a read operation that marks the buffer as
not done if any corruption is detected.  Also make sure we don't run
readahead if the dquot buffer has been marked as cancelled by
recovery.

This will result in readahead either succeeding and the buffer
having a valid write verifier, or readahead failing and the buffer
state requiring the subsequent read to resubmit the IO with the new
verifier.  In either case, this will result in the buffer always
ending up with a valid write verifier on it.

Note: we also need to fix the inode buffer readahead error handling
to mark the buffer with EIO. Brian noticed the code I copied from
there wrong during review, so fix it at the same time. Add comments
linking the two functions that handle readahead verifier errors
together so we don't forget this behavioural link in future.

cc: <stable@vger.kernel.org> # 3.12 - current
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-12 07:04:01 +11:00
Dave Chinner
b79f4a1c68 xfs: inode recovery readahead can race with inode buffer creation
When we do inode readahead in log recovery, we do can do the
readahead before we've replayed the icreate transaction that stamps
the buffer with inode cores. The inode readahead verifier catches
this and marks the buffer as !done to indicate that it doesn't yet
contain valid inodes.

In adding buffer error notification  (i.e. setting b_error = -EIO at
the same time as as we clear the done flag) to such a readahead
verifier failure, we can then get subsequent inode recovery failing
with this error:

XFS (dm-0): metadata I/O error: block 0xa00060 ("xlog_recover_do..(read#2)") error 5 numblks 32

This occurs when readahead completion races with icreate item replay
such as:

	inode readahead
		find buffer
		lock buffer
		submit RA io
	....
	icreate recovery
	    xfs_trans_get_buffer
		find buffer
		lock buffer
		<blocks on RA completion>
	.....
	<ra completion>
		fails verifier
		clear XBF_DONE
		set bp->b_error = -EIO
		release and unlock buffer
	<icreate gains lock>
	icreate initialises buffer
	marks buffer as done
	adds buffer to delayed write queue
	releases buffer

At this point, we have an initialised inode buffer that is up to
date but has an -EIO state registered against it. When we finally
get to recovering an inode in that buffer:

	inode item recovery
	    xfs_trans_read_buffer
		find buffer
		lock buffer
		sees XBF_DONE is set, returns buffer
	    sees bp->b_error is set
		fail log recovery!

Essentially, we need xfs_trans_get_buf_map() to clear the error status of
the buffer when doing a lookup. This function returns uninitialised
buffers, so the buffer returned can not be in an error state and
none of the code that uses this function expects b_error to be set
on return. Indeed, there is an ASSERT(!bp->b_error); in the
transaction case in xfs_trans_get_buf_map() that would have caught
this if log recovery used transactions....

This patch firstly changes the inode readahead failure to set -EIO
on the buffer, and secondly changes xfs_buf_get_map() to never
return a buffer with an error state set so this first change doesn't
cause unexpected log recovery failures.

cc: <stable@vger.kernel.org> # 3.12 - current
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-12 07:03:44 +11:00
Chris Mason
988f1f576d Merge branch 'for-chris-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.5
Signed-off-by: Chris Mason <clm@fb.com>
2016-01-11 08:39:28 -08:00
Chris Mason
b28cf57246 Merge branch 'misc-cleanups-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5
Signed-off-by: Chris Mason <clm@fb.com>
2016-01-11 06:08:37 -08:00
Chris Mason
a3058101c1 Merge branch 'misc-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5 2016-01-11 05:59:32 -08:00
Eric Sandeen
f6106efae5 xfs: eliminate committed arg from xfs_bmap_finish
Calls to xfs_bmap_finish() and xfs_trans_ijoin(), and the
associated comments were replicated several times across
the attribute code, all dealing with what to do if the
transaction was or wasn't committed.

And in that replicated code, an ASSERT() test of an
uninitialized variable occurs in several locations:

	error = xfs_attr_thing(&args);
	if (!error) {
		error = xfs_bmap_finish(&args.trans, args.flist,
					&committed);
	}
	if (error) {
		ASSERT(committed);

If the first xfs_attr_thing() failed, we'd skip the xfs_bmap_finish,
never set "committed", and then test it in the ASSERT.

Fix this up by moving the committed state internal to xfs_bmap_finish,
and add a new inode argument.  If an inode is passed in, it is passed
through to __xfs_trans_roll() and joined to the transaction there if
the transaction was committed.

xfs_qm_dqalloc() was a little unique in that it called bjoin rather
than ijoin, but as Dave points out we can detect the committed state
but checking whether (*tpp != tp).

Addresses-Coverity-Id: 102360
Addresses-Coverity-Id: 102361
Addresses-Coverity-Id: 102363
Addresses-Coverity-Id: 102364
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-11 11:34:01 +11:00
Vegard Nossum
9f2dfda2f2 uml: fix hostfs mknod()
An inverted return value check in hostfs_mknod() caused the function
to return success after handling it as an error (and cleaning up).

It resulted in the following segfault when trying to bind() a named
unix socket:

  Pid: 198, comm: a.out Not tainted 4.4.0-rc4
  RIP: 0033:[<0000000061077df6>]
  RSP: 00000000daae5d60  EFLAGS: 00010202
  RAX: 0000000000000000 RBX: 000000006092a460 RCX: 00000000dfc54208
  RDX: 0000000061073ef1 RSI: 0000000000000070 RDI: 00000000e027d600
  RBP: 00000000daae5de0 R08: 00000000da980ac0 R09: 0000000000000000
  R10: 0000000000000003 R11: 00007fb1ae08f72a R12: 0000000000000000
  R13: 000000006092a460 R14: 00000000daaa97c0 R15: 00000000daaa9a88
  Kernel panic - not syncing: Kernel mode fault at addr 0x40, ip 0x61077df6
  CPU: 0 PID: 198 Comm: a.out Not tainted 4.4.0-rc4 #1
  Stack:
   e027d620 dfc54208 0000006f da981398
   61bee000 0000c1ed daae5de0 0000006e
   e027d620 dfcd4208 00000005 6092a460
  Call Trace:
   [<60dedc67>] SyS_bind+0xf7/0x110
   [<600587be>] handle_syscall+0x7e/0x80
   [<60066ad7>] userspace+0x3e7/0x4e0
   [<6006321f>] ? save_registers+0x1f/0x40
   [<6006c88e>] ? arch_prctl+0x1be/0x1f0
   [<60054985>] fork_handler+0x85/0x90

Let's also get rid of the "cosmic ray protection" while we're at it.

Fixes: e9193059b1b3 "hostfs: fix races in dentry_name() and inode_name()"
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10 21:49:47 +01:00
Richard Weinberger
4fdd1d51ad ubifs: Use XATTR_*_PREFIX_LEN
...instead of open coding it.

Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10 12:33:47 +01:00
Dongsheng Yang
170eb55f7d UBIFS: add a comment in key.h for unused parameter
Add a comment in key.h to explain why we keep an unused
parameter in key helpers.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-01-10 12:33:30 +01:00
Dan Williams
5a023cdba5 block: enable dax for raw block devices
If an application wants exclusive access to all of the persistent memory
provided by an NVDIMM namespace it can use this raw-block-dax facility
to forgo establishing a filesystem.  This capability is targeted
primarily to hypervisors wanting to provision persistent memory for
guests.  It can be disabled / enabled dynamically via the new BLKDAXSET
ioctl.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Reviewed-by: Jan Kara <jack@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-01-09 06:30:49 -08:00
Dan Williams
4ebb16ca9a block: introduce bdev_file_inode()
Similar to the file_inode() helper, provide a helper to lookup the inode for a
raw block device itself.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-01-09 06:30:49 -08:00
NeilBrown
bbddca8e8f nfsd: don't hold i_mutex over userspace upcalls
We need information about exports when crossing mountpoints during
lookup or NFSv4 readdir.  If we don't already have that information
cached, we may have to ask (and wait for) rpc.mountd.

In both cases we currently hold the i_mutex on the parent of the
directory we're asking rpc.mountd about.  We've seen situations where
rpc.mountd performs some operation on that directory that tries to take
the i_mutex again, resulting in deadlock.

With some care, we may be able to avoid that in rpc.mountd.  But it
seems better just to avoid holding a mutex while waiting on userspace.

It appears that lookup_one_len is pretty much the only operation that
needs the i_mutex.  So we could just drop the i_mutex elsewhere and do
something like

	mutex_lock()
	lookup_one_len()
	mutex_unlock()

In many cases though the lookup would have been cached and not required
the i_mutex, so it's more efficient to create a lookup_one_len() variant
that only takes the i_mutex when necessary.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09 03:07:52 -05:00
DengChao
db39c16724 fs:affs:Replace time_t with time64_t
The affs code uses "time_t" and "get_seconds()". This will cause
problems on 32-bit architectures in 2038 when time_t overflows.
This patch replaces them with "time64_t" and
"ktime_get_real_seconds()". This patch introduces expensive 64-bit
divsion in "secs_to_datestamp()", considering this function is not
called so often, the cost should be acceptable.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: DengChao <chao.deng@linaro.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09 02:59:19 -05:00
Sasha Levin
8f5fed1e91 fs/9p: use fscache mutex rather than spinlock
We may sleep inside a the lock, so use a mutex rather than spinlock.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09 02:57:21 -05:00
Eric Dumazet
3cc4a84e02 proc: add a reschedule point in proc_readfd_common()
User can pass an arbitrary large buffer to getdents().

It is typically a 32KB buffer used by libc scandir() implementation.

When scanning /proc/{pid}/fd, we can hold cpu way too long,
so add a cond_resched() to be kind with other tasks.

We've seen latencies of more than 50ms on real workloads.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09 02:56:10 -05:00
Julia Lawall
bc51b2a919 logfs: constify logfs_block_ops structures
The logfs_block_ops structures are never modified, so declare them as
const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09 02:55:45 -05:00
Stanislav Kinsburskiy
0dbf5f2065 fcntl: allow to set O_DIRECT flag on pipe
With packetized mode for pipes, it's not possible to set O_DIRECT on pipe file
via sys_fcntl, because of unsupported sanity checks.
Ability to set this flag will be used by CRIU to migrate packetized pipes.

v2:
Fixed typos and mode variable to check.

Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09 02:55:37 -05:00
Abhi Das
90330e689c fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE
During testing, I discovered that __generic_file_splice_read() returns
0 (EOF) when aops->readpage fails with AOP_TRUNCATED_PAGE on the first
page of a single/multi-page splice read operation. This EOF return code
causes the userspace test to (correctly) report a zero-length read error
when it was expecting otherwise.

The current strategy of returning a partial non-zero read when ->readpage
returns AOP_TRUNCATED_PAGE works only when the failed page is not the
first of the lot being processed.

This patch attempts to retry lookup and call ->readpage again on pages
that had previously failed with AOP_TRUNCATED_PAGE. With this patch, my
tests pass and I haven't noticed any unwanted side effects.

This version removes the thrice-retry loop and instead indefinitely
retries lookups on AOP_TRUNCATED_PAGE errors from ->readpage. This
behavior is now similar to do_generic_file_read().

Signed-off-by: Abhi Das <adas@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09 02:55:35 -05:00
Richard Weinberger
0b2a6f231d fs: xattr: Use kvfree()
... instead of open coding it.

Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-09 02:55:18 -05:00
Al Viro
263a3df18f nbd: use ->compat_ioctl()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-08 21:20:32 -05:00
Al Viro
6108209c4a Merge branch 'for-linus' into work.misc 2016-01-08 21:20:11 -05:00
Jann Horn
a7f61e89af compat_ioctl: don't call do_ioctl under set_fs(KERNEL_DS)
This replaces all code in fs/compat_ioctl.c that translated
ioctl arguments into a in-kernel structure, then performed
do_ioctl under set_fs(KERNEL_DS), with code that allocates
data on the user stack and can call the VFS ioctl handler
under USER_DS.

This is done as a hardening measure because the caller
does not know what kind of ioctl handler will be invoked,
only that no corresponding compat_ioctl handler exists and
what the ioctl command number is. The accidental
invocation of an unlocked_ioctl handler that unexpectedly
calls copy_to_user could be a severe security issue.

Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-08 21:18:13 -05:00
Al Viro
66cf191f3e compat_ioctl: don't pass fd around when not needed
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-08 21:16:50 -05:00
Jann Horn
b43417216e compat_ioctl: don't look up the fd twice
In code in fs/compat_ioctl.c that translates ioctl arguments
into a in-kernel structure, then performs sys_ioctl, possibly
under set_fs(KERNEL_DS), this commit changes the sys_ioctl
calls to do_ioctl calls. do_ioctl is a new function that does
the same thing as sys_ioctl, but doesn't look up the fd again.

This change is made to avoid (potential) security issues
because of ioctl handlers that accept one of the ioctl
commands I2C_FUNCS, VIDEO_GET_EVENT, MTIOCPOS, MTIOCGET,
TIOCGSERIAL, TIOCSSERIAL, RTC_IRQP_READ, RTC_EPOCH_READ.
This can happen for multiple reasons:

 - The ioctl command number could be reused.
 - The ioctl handler might not check the full ioctl
   command. This is e.g. true for drm_ioctl.
 - The ioctl handler is very special, e.g. cuse_file_ioctl

The real issue is that set_fs(KERNEL_DS) is used here,
but that's fixed in a separate commit
"compat_ioctl: don't call do_ioctl under set_fs(KERNEL_DS)".

This change mitigates potential security issues by
preventing a race that permits invocation of
unlocked_ioctl handlers under KERNEL_DS through compat
code even if a corresponding compat_ioctl handler exists.

So far, no way has been identified to use this to damage
kernel memory without having CAP_SYS_ADMIN in the init ns
(with the capability, doing reads/writes at arbitrary
kernel addresses should be easy through CUSE's ioctl
handler with FUSE_IOCTL_UNRESTRICTED set).

[AV: two missed sys_ioctl() taken care of]

Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-08 21:16:11 -05:00
Jeff Layton
6b9b21073d nfsd: give up on CB_LAYOUTRECALLs after two lease periods
Have the CB_LAYOUTRECALL code treat NFS4_OK and NFS4ERR_DELAY returns
equivalently. Change the code to periodically resend CB_LAYOUTRECALLS
until the ls_layouts list is empty or the client returns a different
error code.

If we go for two lease periods without the list being emptied or the
client sending a hard error, then we give up and clean out the list
anyway.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-01-08 16:47:51 -05:00
Li Xi
9b7365fc1c ext4: add FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support
This patch adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR ioctl interface
support for ext4. The interface is kept consistent with
XFS_IOC_FSGETXATTR/XFS_IOC_FSGETXATTR.

Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>
2016-01-08 16:01:22 -05:00
Li Xi
689c958cbe ext4: add project quota support
This patch adds mount options for enabling/disabling project quota
accounting and enforcement. A new specific inode is also used for
project quota accounting.

[ Includes fix from Dan Carpenter to crrect error checking from dqget(). ]

Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>
2016-01-08 16:01:22 -05:00
Li Xi
040cb3786d ext4: adds project ID support
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>
2016-01-08 16:01:21 -05:00
Theodore Ts'o
56a04915df ext4 crypto: simplify interfaces to directory entry insert functions
A number of functions include ext4_add_dx_entry, make_indexed_dir,
etc. are being passed a dentry even though the only thing they use is
the containing parent.  We can shrink the code size slightly by making
this replacement.  This will also be useful in cases where we don't
have a dentry as the argument to the directory entry insert functions.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-01-08 16:00:31 -05:00
Chao Yu
9b72a388f5 f2fs: skip releasing nodes in chindless extent tree
If there are no nodes in extent tree, let's skip releasing step to avoid
any overhead of grabbing/releasing extent tree lock.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-08 11:57:18 -08:00
Chao Yu
68e3538510 f2fs: use atomic type for node count in extent tree
1. rename field in struct extent_tree from count to node_cnt for
   readability.
2. alter to use atomic type for node_cnt.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-08 11:57:11 -08:00
Chao Yu
da5af127a1 f2fs: recognize encrypted data in f2fs_fiemap
This patch fixes to teach f2fs_fiemap to recognize encrypted data.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-08 11:51:58 -08:00
Jaegeuk Kim
2c4db1a6f6 f2fs: clean up f2fs_balance_fs
This patch adds one parameter to clean up all the callers of f2fs_balance_fs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-08 11:45:23 -08:00
Jaegeuk Kim
2a4b8e9fab f2fs: remove redundant calls
This patch removes redundant calls.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-08 11:45:23 -08:00
Jaegeuk Kim
12719ae14e f2fs: avoid unnecessary f2fs_balance_fs calls
Only when node page is newly dirtied, it needs to check whether we need to do
f2fs_gc.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-08 11:45:22 -08:00
Jaegeuk Kim
7612118ae8 f2fs: check the page status filled from disk
After reading a page, we need to check whether there is any error.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-08 11:45:21 -08:00
Chao Yu
0e022ea8fc f2fs: introduce __get_node_page to reuse common code
There are duplicated code in between get_node_page and get_node_page_ra,
introduce __get_node_page to includes common parts of these two, and
export get_node_page and get_node_page_ra by reusing __get_node_page.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-08 11:45:20 -08:00