38440 Commits

Author SHA1 Message Date
Andrew Morton
98acbf63d6 fs/ocfs2/stack_user.c: fix typo in ocfs2_control_release()
It is supposed to zero pv_minor.

Reported-by: Himangi Saraogi <himangi774@gmail.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09 22:25:46 -04:00
Andrea Gelmini
7143e49441 ntfs: remove bogus space
fs/ntfs/debug.c:124: WARNING: space prohibited between function name and
open parenthesis '('

Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09 22:25:46 -04:00
Anton Altaparmakov
5272d036b2 ntfs: use find_get_page_flags() to mark page accessed as it is no longer marked later on
Mel Gorman's commit 2457aec63745 ("mm: non-atomically mark page accessed
during page cache allocation where possible") removed mark_page_accessed()
calls from NTFS without updating the matching find_lock_page() to
find_get_page_flags(GFP_LOCK | FGP_ACCESSED) thus causing the page to
never be marked accessed.

This patch fixes that.

Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09 22:25:46 -04:00
Yann Droneaud
0b37e097a6 fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
According to commit 80af258867648 ("fanotify: groups can specify their
f_flags for new fd"), file descriptors created as part of file access
notification events inherit flags from the event_f_flags argument passed
to syscall fanotify_init(2)[1].

Unfortunately O_CLOEXEC is currently silently ignored.

Indeed, event_f_flags are only given to dentry_open(), which only seems to
care about O_ACCMODE and O_PATH in do_dentry_open(), O_DIRECT in
open_check_o_direct() and O_LARGEFILE in generic_file_open().

It's a pity, since, according to some lookup on various search engines and
http://codesearch.debian.net/, there's already some userspace code which
use O_CLOEXEC:

- in systemd's readahead[2]:

    fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);

- in clsync[3]:

    #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)

    int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);

- in examples [4] from "Filesystem monitoring in the Linux
  kernel" article[5] by Aleksander Morgado:

    if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
                                      O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)

Additionally, since commit 48149e9d3a7e ("fanotify: check file flags
passed in fanotify_init").  having O_CLOEXEC as part of fanotify_init()
second argument is expressly allowed.

So it seems expected to set close-on-exec flag on the file descriptors if
userspace is allowed to request it with O_CLOEXEC.

But Andrew Morton raised[6] the concern that enabling now close-on-exec
might break existing applications which ask for O_CLOEXEC but expect the
file descriptor to be inherited across exec().

In the other hand, as reported by Mihai Dontu[7] close-on-exec on the file
descriptor returned as part of file access notify can break applications
due to deadlock.  So close-on-exec is needed for most applications.

More, applications asking for close-on-exec are likely expecting it to be
enabled, relying on O_CLOEXEC being effective.  If not, it might weaken
their security, as noted by Jan Kara[8].

So this patch replaces call to macro get_unused_fd() by a call to function
get_unused_fd_flags() with event_f_flags value as argument.  This way
O_CLOEXEC flag in the second argument of fanotify_init(2) syscall is
interpreted and close-on-exec get enabled when requested.

[1] http://man7.org/linux/man-pages/man2/fanotify_init.2.html
[2] http://cgit.freedesktop.org/systemd/systemd/tree/src/readahead/readahead-collect.c?id=v208#n294
[3] https://github.com/xaionaro/clsync/blob/v0.2.1/sync.c#L1631
    https://github.com/xaionaro/clsync/blob/v0.2.1/configuration.h#L38
[4] http://www.lanedo.com/~aleksander/fanotify/fanotify-example.c
[5] http://www.lanedo.com/2013/filesystem-monitoring-linux-kernel/
[6] http://lkml.kernel.org/r/20141001153621.65e9258e65a6167bf2e4cb50@linux-foundation.org
[7] http://lkml.kernel.org/r/20141002095046.3715eb69@mdontu-l
[8] http://lkml.kernel.org/r/20141002104410.GB19748@quack.suse.cz

Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Tested-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Mihai Don\u021bu <mihai.dontu@gmail.com>
Cc: Pádraig Brady <P@draigBrady.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Michael Kerrisk-manpages <mtk.manpages@gmail.com>
Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Cc: Richard Guy Briggs <rgb@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09 22:25:46 -04:00
Sasha Levin
105d1b4253 fsnotify: don't put user context if it was never assigned
On some failure paths we may attempt to free user context even if it
wasn't assigned yet.  This will cause a NULL ptr deref and a kernel BUG.

The path I was looking at is in inotify_new_group():

        oevent = kmalloc(sizeof(struct inotify_event_info), GFP_KERNEL);
        if (unlikely(!oevent)) {
                fsnotify_destroy_group(group);
                return ERR_PTR(-ENOMEM);
        }

fsnotify_destroy_group() would get called here, but
group->inotify_data.user is only getting assigned later:

	group->inotify_data.user = get_current_user();

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Robert Love <rlove@rlove.org>
Cc: Eric Paris <eparis@parisplace.org>
Reviewed-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09 22:25:45 -04:00
Andrew Morton
cafbaae8af fs/notify/group.c: make fsnotify_final_destroy_group() static
No callers outside this file.

Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09 22:25:45 -04:00
Jan Kara
6174c2eb8e udf: Fix loading of special inodes
Some UDF media have special inodes (like VAT or metadata partition
inodes) whose link_count is 0. Thus commit 4071b9136223 (udf: Properly
detect stale inodes) broke loading these inodes because udf_iget()
started returning -ESTALE for them. Since we still need to properly
detect stale inodes queried by NFS, create two variants of udf_iget() -
one which is used for looking up special inodes (which ignores
link_count == 0) and one which is used for other cases which return
ESTALE when link_count == 0.

Fixes: 4071b913622316970d0e1919f7d82b4403fec5f2
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
2014-10-09 13:06:14 +02:00
Linus Torvalds
47137c6ba1 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "Nothing really exciting this time:

   - a few fixlets in the NOHZ code

   - a new ARM SoC timer abomination.  One should expect that we have
     enough of them already, but they insist on inventing new ones.

   - the usual bunch of ARM SoC timer updates.  That feels like herding
     cats"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource: arm_arch_timer: Consolidate arch_timer_evtstrm_enable
  clocksource: arm_arch_timer: Enable counter access for 32-bit ARM
  clocksource: arm_arch_timer: Change clocksource name if CP15 unavailable
  clocksource: sirf: Disable counter before re-setting it
  clocksource: cadence_ttc: Add support for 32bit mode
  clocksource: tcb_clksrc: Sanitize IRQ request
  clocksource: arm_arch_timer: Discard unavailable timers correctly
  clocksource: vf_pit_timer: Support shutdown mode
  ARM: meson6: clocksource: Add Meson6 timer support
  ARM: meson: documentation: Add timer documentation
  clocksource: sh_tmu: Document r8a7779 binding
  clocksource: sh_mtu2: Document r7s72100 binding
  clocksource: sh_cmt: Document SoC specific bindings
  timerfd: Remove an always true check
  nohz: Avoid tick's double reprogramming in highres mode
  nohz: Fix spurious periodic tick behaviour in low-res dynticks mode
2014-10-09 06:35:05 -04:00
Al Viro
821cc3070f ncpfs: use list_for_each_entry() for d_subdirs walk
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:39:16 -04:00
Seunghun Lee
5e6123f347 vfs: move getname() from callers to do_mount()
It would make more sense to pass char __user * instead of
char * in callers of do_mount() and do getname() inside do_mount().

Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Seunghun Lee <waydi1@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:39:16 -04:00
Al Viro
4d93bc3e81 gfs2_atomic_open(): skip lookups on hashed dentry
hashed dentry can be passed to ->atomic_open() only if
a) it has just passed revalidation and
b) it's negative

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:39:15 -04:00
Al Viro
9bb8730ed3 jfs: don't hash direct inode
hlist_add_fake(inode->i_hash), same as for the rest of special ones...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:39:13 -04:00
Al Viro
c2e3f5d5f4 ecryptfs: ->f_op is never NULL
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:39:12 -04:00
Al Viro
e983094d6d missing annotation in fs/file.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:39:11 -04:00
Tim Gardner
b8850d1fa8 fs: namespace: suppress 'may be used uninitialized' warnings
The gcc version 4.9.1 compiler complains Even though it isn't possible for
these variables to not get initialized before they are used.

fs/namespace.c: In function ‘SyS_mount’:
fs/namespace.c:2720:8: warning: ‘kernel_dev’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  ret = do_mount(kernel_dev, kernel_dir->name, kernel_type, flags,
        ^
fs/namespace.c:2699:8: note: ‘kernel_dev’ was declared here
  char *kernel_dev;
        ^
fs/namespace.c:2720:8: warning: ‘kernel_type’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  ret = do_mount(kernel_dev, kernel_dir->name, kernel_type, flags,
        ^
fs/namespace.c:2697:8: note: ‘kernel_type’ was declared here
  char *kernel_type;
        ^

Fix the warnings by simplifying copy_mount_string() as suggested by Al Viro.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:39:10 -04:00
Al Viro
2ec3a12a66 cachefiles_write_page(): switch to __kernel_write()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:39:05 -04:00
Al Viro
4b8e992392 9p: switch to %p[dD]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:39:04 -04:00
Al Viro
35c265e008 cifs: switch to use of %p[dD]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:39:03 -04:00
Mikulas Patocka
c2ca0fcd20 fs: make cont_expand_zero interruptible
This patch makes it possible to kill a process looping in
cont_expand_zero. A process may spend a lot of time in this function, so
it is desirable to be able to kill it.

It happened to me that I wanted to copy a piece data from the disk to a
file. By mistake, I used the "seek" parameter to dd instead of "skip". Due
to the "seek" parameter, dd attempted to extend the file and became stuck
doing so - the only possibility was to reset the machine or wait many
hours until the filesystem runs out of space and cont_expand_zero fails.
We need this patch to be able to terminate the process.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:39:03 -04:00
Tetsuo Handa
475d0db742 fs: Fix theoretical division by 0 in super_cache_scan().
total_objects could be 0 and is used as a denom.

While total_objects is a "long", total_objects == 0 unlikely happens for
3.12 and later kernels because 32-bit architectures would not be able to
hold (1 << 32) objects. However, total_objects == 0 may happen for kernels
between 3.1 and 3.11 because total_objects in prune_super() was an "int"
and (e.g.) x86_64 architecture might be able to hold (1 << 32) objects.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable <stable@kernel.org> # 3.1+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:39:02 -04:00
Daeseok Youn
b8314f9303 dcache: Fix no spaces at the start of a line in dcache.c
Fixed coding style in dcache.c

Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:39:02 -04:00
Al Viro
99358a1ca5 [jffs2] kill wbuf_queued/wbuf_dwork_lock
schedule_delayed_work() happening when the work is already pending is
a cheap no-op.  Don't bother with ->wbuf_queued logics - it's both
broken (cancelling ->wbuf_dwork leaves it set, as spotted by Jeff Harris)
and pointless.  It's cheaper to let schedule_delayed_work() handle that
case.

Reported-by: Jeff Harris <jefftharris@gmail.com>
Tested-by: Jeff Harris <jefftharris@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:39:01 -04:00
Al Viro
19d860a140 handle suicide on late failure exits in execve() in search_binary_handler()
... rather than doing that in the guts of ->load_binary().
[updated to fix the bug spotted by Shentino - for SIGSEGV we really need
something stronger than send_sig_info(); again, better do that in one place]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:39:00 -04:00
Al Viro
2926620145 dcache.c: call ->d_prune() regardless of d_unhashed()
the only in-tree instance checks d_unhashed() anyway,
out-of-tree code can preserve the current behaviour by
adding such check if they want it and we get an ability
to use it in cases where we *want* to be notified of
killing being inevitable before ->d_lock is dropped,
whether it's unhashed or not.  In particular, autofs
would benefit from that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:38:59 -04:00
Al Viro
29355c3904 d_prune_alias(): just lock the parent and call __dentry_kill()
The only reason for games with ->d_prune() was __d_drop(), which
was needed only to force dput() into killing the sucker off.

Note that lock_parent() can be called under ->i_lock and won't
drop it, so dentry is safe from somebody managing to kill it
under us - it won't happen while we are holding ->i_lock.

__dentry_kill() is called only with ->d_lockref.count being 0
(here and when picked from shrink list) or 1 (dput() and dropping
the ancestors in shrink_dentry_list()), so it will never be called
twice - the first thing it's doing is making ->d_lockref.count
negative and once that happens, nothing will increment it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:38:59 -04:00
Eric W. Biederman
bbd5192412 proc: Update proc_flush_task_mnt to use d_invalidate
Now that d_invalidate always succeeds and flushes mount points use
it in stead of a combination of shrink_dcache_parent and d_drop
in proc_flush_task_mnt.  This removes the danger of a mount point
under /proc/<pid>/... becoming unreachable after the d_drop.

Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:38:58 -04:00
Eric W. Biederman
c143c2333c vfs: Remove d_drop calls from d_revalidate implementations
Now that d_invalidate always succeeds it is not longer necessary or
desirable to hard code d_drop calls into filesystem specific
d_revalidate implementations.

Remove the unnecessary d_drop calls and rely on d_invalidate
to drop the dentries.  Using d_invalidate ensures that paths
to mount points will not be dropped.

Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:38:58 -04:00
Eric W. Biederman
5542aa2fa7 vfs: Make d_invalidate return void
Now that d_invalidate can no longer fail, stop returning a useless
return code.  For the few callers that checked the return code update
remove the handling of d_invalidate failure.

Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:38:57 -04:00
Eric W. Biederman
1ffe46d11c vfs: Merge check_submounts_and_drop and d_invalidate
Now that d_invalidate is the only caller of check_submounts_and_drop,
expand check_submounts_and_drop inline in d_invalidate.

Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:38:57 -04:00
Eric W. Biederman
9b053f3207 vfs: Remove unnecessary calls of check_submounts_and_drop
Now that check_submounts_and_drop can not fail and is called from
d_invalidate there is no longer a need to call check_submounts_and_drom
from filesystem d_revalidate methods so remove it.

Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:38:56 -04:00
Eric W. Biederman
8ed936b567 vfs: Lazily remove mounts on unlinked files and directories.
With the introduction of mount namespaces and bind mounts it became
possible to access files and directories that on some paths are mount
points but are not mount points on other paths.  It is very confusing
when rm -rf somedir returns -EBUSY simply because somedir is mounted
somewhere else.  With the addition of user namespaces allowing
unprivileged mounts this condition has gone from annoying to allowing
a DOS attack on other users in the system.

The possibility for mischief is removed by updating the vfs to support
rename, unlink and rmdir on a dentry that is a mountpoint and by
lazily unmounting mountpoints on deleted dentries.

In particular this change allows rename, unlink and rmdir system calls
on a dentry without a mountpoint in the current mount namespace to
succeed, and it allows rename, unlink, and rmdir performed on a
distributed filesystem to update the vfs cache even if when there is a
mount in some namespace on the original dentry.

There are two common patterns of maintaining mounts: Mounts on trusted
paths with the parent directory of the mount point and all ancestory
directories up to / owned by root and modifiable only by root
(i.e. /media/xxx, /dev, /dev/pts, /proc, /sys, /sys/fs/cgroup/{cpu,
cpuacct, ...}, /usr, /usr/local).  Mounts on unprivileged directories
maintained by fusermount.

In the case of mounts in trusted directories owned by root and
modifiable only by root the current parent directory permissions are
sufficient to ensure a mount point on a trusted path is not removed
or renamed by anyone other than root, even if there is a context
where the there are no mount points to prevent this.

In the case of mounts in directories owned by less privileged users
races with users modifying the path of a mount point are already a
danger.  fusermount already uses a combination of chdir,
/proc/<pid>/fd/NNN, and UMOUNT_NOFOLLOW to prevent these races.  The
removable of global rename, unlink, and rmdir protection really adds
nothing new to consider only a widening of the attack window, and
fusermount is already safe against unprivileged users modifying the
directory simultaneously.

In principle for perfect userspace programs returning -EBUSY for
unlink, rmdir, and rename of dentires that have mounts in the local
namespace is actually unnecessary.  Unfortunately not all userspace
programs are perfect so retaining -EBUSY for unlink, rmdir and rename
of dentries that have mounts in the current mount namespace plays an
important role of maintaining consistency with historical behavior and
making imperfect userspace applications hard to exploit.

v2: Remove spurious old_dentry.
v3: Optimized shrink_submounts_and_drop
    Removed unsued afs label
v4: Simplified the changes to check_submounts_and_drop
    Do not rename check_submounts_and_drop shrink_submounts_and_drop
    Document what why we need atomicity in check_submounts_and_drop
    Rely on the parent inode mutex to make d_revalidate and d_invalidate
    an atomic unit.
v5: Refcount the mountpoint to detach in case of simultaneous
    renames.

Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:38:56 -04:00
Eric W. Biederman
80b5dce8c5 vfs: Add a function to lazily unmount all mounts from any dentry.
The new function detach_mounts comes in two pieces.  The first piece
is a static inline test of d_mounpoint that returns immediately
without taking any locks if d_mounpoint is not set.  In the common
case when mountpoints are absent this allows the vfs to continue
running with it's same cacheline foot print.

The second piece of detach_mounts __detach_mounts actually does the
work and it assumes that a mountpoint is present so it is slow and
takes namespace_sem for write, and then locks the mount hash (aka
mount_lock) after a struct mountpoint has been found.

With those two locks held each entry on the list of mounts on a
mountpoint is selected and lazily unmounted until all of the mount
have been lazily unmounted.

v7: Wrote a proper change description and removed the changelog
    documenting deleted wrong turns.

Signed-off-by: Eric W. Biederman <ebiederman@twitter.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:38:55 -04:00
Eric W. Biederman
e2dfa93546 vfs: factor out lookup_mountpoint from new_mountpoint
I am shortly going to add a new user of struct mountpoint that
needs to look up existing entries but does not want to create
a struct mountpoint if one does not exist.  Therefore to keep
the code simple and easy to read split out lookup_mountpoint
from new_mountpoint.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:38:55 -04:00
Eric W. Biederman
0a5eb7c818 vfs: Keep a list of mounts on a mount point
To spot any possible problems call BUG if a mountpoint
is put when it's list of mounts is not empty.

AV: use hlist instead of list_head

Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Eric W. Biederman <ebiederman@twitter.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:38:54 -04:00
Eric W. Biederman
7af1364ffa vfs: Don't allow overwriting mounts in the current mount namespace
In preparation for allowing mountpoints to be renamed and unlinked
in remote filesystems and in other mount namespaces test if on a dentry
there is a mount in the local mount namespace before allowing it to
be renamed or unlinked.

The primary motivation here are old versions of fusermount unmount
which is not safe if the a path can be renamed or unlinked while it is
verifying the mount is safe to unmount.  More recent versions are simpler
and safer by simply using UMOUNT_NOFOLLOW when unmounting a mount
in a directory owned by an arbitrary user.

Miklos Szeredi <miklos@szeredi.hu> reports this is approach is good
enough to remove concerns about new kernels mixed with old versions
of fusermount.

A secondary motivation for restrictions here is that it removing empty
directories that have non-empty mount points on them appears to
violate the rule that rmdir can not remove empty directories.  As
Linus Torvalds pointed out this is useful for programs (like git) that
test if a directory is empty with rmdir.

Therefore this patch arranges to enforce the existing mount point
semantics for local mount namespace.

v2: Rewrote the test to be a drop in replacement for d_mountpoint
v3: Use bool instead of int as the return type of is_local_mountpoint

Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:38:54 -04:00
Eric W. Biederman
bafc9b754f vfs: More precise tests in d_invalidate
The current comments in d_invalidate about what and why it is doing
what it is doing are wildly off-base.  Which is not surprising as
the comments date back to last minute bug fix of the 2.2 kernel.

The big fat lie of a comment said: If it's a directory, we can't drop
it for fear of somebody re-populating it with children (even though
dropping it would make it unreachable from that root, we still might
repopulate it if it was a working directory or similar).

[AV] What we really need to avoid is multiple dentry aliases of the
same directory inode; on all filesystems that have ->d_revalidate()
we either declare all positive dentries always valid (and thus never
fed to d_invalidate()) or use d_materialise_unique() and/or d_splice_alias(),
which take care of alias prevention.

The current rules are:
- To prevent mount point leaks dentries that are mount points or that
  have childrent that are mount points may not be be unhashed.
- All dentries may be unhashed.
- Directories may be rehashed with d_materialise_unique

check_submounts_and_drop implements this already for well maintained
remote filesystems so implement the current rules in d_invalidate
by just calling check_submounts_and_drop.

The one difference between d_invalidate and check_submounts_and_drop
is that d_invalidate must respect it when a d_revalidate method has
earlier called d_drop so preserve the d_unhashed check in
d_invalidate.

Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:38:54 -04:00
Eric W. Biederman
3ccb354d64 vfs: Document the effect of d_revalidate on d_find_alias
d_drop or check_submounts_and_drop called from d_revalidate can result
in renamed directories with child dentries being unhashed.  These
renamed and drop directory dentries can be rehashed after
d_materialise_unique uses d_find_alias to find them.

Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:38:53 -04:00
Al Viro
9ea459e110 delayed mntput
On final mntput() we want fs shutdown to happen before return to
userland; however, the only case where we want it happen right
there (i.e. where task_work_add won't do) is MNT_INTERNAL victim.
Those have to be fully synchronous - failure halfway through module
init might count on having vfsmount killed right there.  Fortunately,
final mntput on MNT_INTERNAL vfsmounts happens on shallow stack.
So we handle those synchronously and do an analog of delayed fput
logics for everything else.

As the result, we are guaranteed that fs shutdown will always happen
on shallow stack.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:38:53 -04:00
Ian Kent
b3ca406f27 autofs - remove obsolete d_invalidate() from expire
Biederman's umount-on-rmdir series changes d_invalidate() to sumarily remove
mounts under the passed in dentry regardless of whether they are busy
or not. So calling this in fs/autofs4/expire.c:autofs4_tree_busy() is
definitely the wrong thing to do becuase it will silently umount entries
instead of just cleaning stale dentrys.

But this call shouldn't be needed and testing shows that automounting
continues to function without it.

As Al Viro correctly surmises the original intent of the call was to
perform what shrink_dcache_parent() does.

If at some time in the future I see stale dentries accumulating
following failed mounts I'll revisit the issue and possibly add a
shrink_dcache_parent() call if needed.

Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:38:52 -04:00
Al Viro
8d85b4845a Allow sharing external names after __d_move()
* external dentry names get a small structure prepended to them
(struct external_name).
* it contains an atomic refcount, matching the number of struct dentry
instances that have ->d_name.name pointing to that external name.  The
first thing free_dentry() does is decrementing refcount of external name,
so the instances that are between the call of free_dentry() and
RCU-delayed actual freeing do not contribute.
* __d_move(x, y, false) makes the name of x equal to the name of y,
external or not.  If y has an external name, extra reference is grabbed
and put into x->d_name.name.  If x used to have an external name, the
reference to the old name is dropped and, should it reach zero, freeing
is scheduled via kfree_rcu().
* free_dentry() in dentry with external name decrements the refcount of
that name and, should it reach zero, does RCU-delayed call that will
free both the dentry and external name.  Otherwise it does what it
used to do, except that __d_free() doesn't even look at ->d_name.name;
it simply frees the dentry.

All non-RCU accesses to dentry external name are safe wrt freeing since they
all should happen before free_dentry() is called.  RCU accesses might run
into a dentry seen by free_dentry() or into an old name that got already
dropped by __d_move(); however, in both cases dentry must have been
alive and refer to that name at some point after we'd done rcu_read_lock(),
which means that any freeing must be still pending.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:38:41 -04:00
Trond Myklebust
6543f80367 NFSv4.1/pnfs: replace broken pnfs_put_lseg_async
You cannot call pnfs_put_lseg_async() more than once per lseg, so it
is really an inappropriate way to deal with a refcount issue.

Instead, replace it with a function that decrements the refcount, and
puts the final 'free' operation (which is incompatible with locks) on
the workqueue.

Cc: Weston Andros Adamson <dros@primarydata.com>
Fixes: e6cf82d1830f: pnfs: add pnfs_put_lseg_async
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-10-08 16:45:43 -04:00
Andy Lutomirski
a1480dcc3c fs: Add a missing permission check to do_umount
Accessing do_remount_sb should require global CAP_SYS_ADMIN, but
only one of the two call sites was appropriately protected.

Fixes CVE-2014-7975.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2014-10-08 12:32:47 -07:00
Tom Haynes
ea18cb3f11 NFSv4: Remove dead prototype for nfs4_insert_deviceid_node()
nfs4_insert_deviceid_node() was removed in 661373b13d0490ff410a2133d4a7a117f2dd037e

Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-10-08 14:31:01 -04:00
Linus Torvalds
da01e61428 Merge tag 'f2fs-for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "This patch-set introduces a couple of new features such as large
  sector size, FITRIM, and atomic/volatile writes.

  Several patches enhance power-off recovery and checkpoint routines.

  The fsck.f2fs starts to support fixing corrupted partitions with
  recovery hints provided by this patch-set.

  Summary:
   - retain some recovery information for fsck.f2fs
   - enhance checkpoint speed
   - enhance flush command management
   - bug fix for lseek
   - tune in-place-update policies
   - enhance roll-forward speed
   - revisit all the roll-forward and fsync rules
   - support larget sector size
   - support FITRIM
   - support atomic and volatile writes

  And several clean-ups and bug fixes are included"

* tag 'f2fs-for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (42 commits)
  f2fs: support volatile operations for transient data
  f2fs: support atomic writes
  f2fs: remove unused return value
  f2fs: clean up f2fs_ioctl functions
  f2fs: potential shift wrapping buf in f2fs_trim_fs()
  f2fs: call f2fs_unlock_op after error was handled
  f2fs: check the use of macros on block counts and addresses
  f2fs: refactor flush_nat_entries to remove costly reorganizing ops
  f2fs: introduce FITRIM in f2fs_ioctl
  f2fs: introduce cp_control structure
  f2fs: use more free segments until SSR is activated
  f2fs: change the ipu_policy option to enable combinations
  f2fs: fix to search whole dirty segmap when get_victim
  f2fs: fix to clean previous mount option when remount_fs
  f2fs: skip punching hole in special condition
  f2fs: support large sector size
  f2fs: fix to truncate blocks past EOF in ->setattr
  f2fs: update i_size when __allocate_data_block
  f2fs: use MAX_BIO_BLOCKS(sbi)
  f2fs: remove redundant operation during roll-forward recovery
  ...
2014-10-08 12:53:15 -04:00
Linus Torvalds
6dea0737bc Merge branch 'for-3.18' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "Highlights:

   - support the NFSv4.2 SEEK operation (allowing clients to support
     SEEK_HOLE/SEEK_DATA), thanks to Anna.
   - end the grace period early in a number of cases, mitigating a
     long-standing annoyance, thanks to Jeff
   - improve SMP scalability, thanks to Trond"

* 'for-3.18' of git://linux-nfs.org/~bfields/linux: (55 commits)
  nfsd: eliminate "to_delegation" define
  NFSD: Implement SEEK
  NFSD: Add generic v4.2 infrastructure
  svcrdma: advertise the correct max payload
  nfsd: introduce nfsd4_callback_ops
  nfsd: split nfsd4_callback initialization and use
  nfsd: introduce a generic nfsd4_cb
  nfsd: remove nfsd4_callback.cb_op
  nfsd: do not clear rpc_resp in nfsd4_cb_done_sequence
  nfsd: fix nfsd4_cb_recall_done error handling
  nfsd4: clarify how grace period ends
  nfsd4: stop grace_time update at end of grace period
  nfsd: skip subsequent UMH "create" operations after the first one for v4.0 clients
  nfsd: set and test NFSD4_CLIENT_STABLE bit to reduce nfsdcltrack upcalls
  nfsd: serialize nfsdcltrack upcalls for a particular client
  nfsd: pass extra info in env vars to upcalls to allow for early grace period end
  nfsd: add a v4_end_grace file to /proc/fs/nfsd
  lockd: add a /proc/fs/lockd/nlm_end_grace file
  nfsd: reject reclaim request when client has already sent RECLAIM_COMPLETE
  nfsd: remove redundant boot_time parm from grace_done client tracking op
  ...
2014-10-08 12:51:44 -04:00
Linus Torvalds
25641c0c8d NFS client updates for Linux 3.18
Highlights include:
 
 Stable fixes:
 - fix an NFSv4.1 state renewal regression
 - fix open/lock state recovery error handling
 - fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
 - fix statd when reconnection fails
 - Don't wake tasks during connection abort
 - Don't start reboot recovery if lease check fails
 - fix duplicate proc entries
 
 Features:
 - pNFS block driver fixes and clean ups from Christoph
 - More code cleanups from Anna
 - Improve mmap() writeback performance
 - Replace use of PF_TRANS with a more generic mechanism for avoiding
   deadlocks in nfs_release_page
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUMpFYAAoJEGcL54qWCgDywHYP/A7XNykwOGhoHVP1Cgr3xqoz
 gVhAw97AEMZE8xSNVEGS++pJTe59JVzsIsYAwdHMwePV33l3zyzYorae6N9p7zWF
 0xVaNQ4qNLVhbrNLAoB5KA/c3/jMnNjF5t15+8akZad5pt4kXLlhSKjyVpdEEtJE
 A0eneXShMYEeLZoOJhpQt5bsw0OZ8YbWWEMjGlDqyeelvV3K1+zfivQOoyX6hS4w
 XFkPEDmU7zunE/xFP9ZoUaVdLO0TvOWfEZ7STWoHm7NuWfPQiDb9w1mTnuZbZyka
 ssezoGcitzwsjCcQ5e1iKTOoFRIsm/zYXFQgFQL7VFMBU1Tss9Of8047EyDkqcPF
 GxctsGg0gQ2FkG7yx7JH7AKpyibOIuByQrQQ916coWSf7K0L4H4Rcky3vryroylP
 1e1RI49xu215OTm+dLvlvYCv55bqCrTmaUGImZac18+ixD2eh6MNfW2ubSdxk89L
 U2rTFV09Bd52N7IQOGQx1FBEI2ZnIFUV4UaFz7v+rGFxOnk6+WYe+iWyb4wC70Yc
 8Jh/gTIQDd5aghql3FTieMOyfEvO6Re4pLMXmqEWMAevicx2t8DwkJriRu6X8Iy2
 rlDlBPwu5QmRWC20Dc897f0VajwDtwdeB8puod7nobOWzOfx4FrNqLJ+jR3pmHUk
 0otvJytqemXt+zkqqHKK
 =/OQi
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable fixes:
   - fix an NFSv4.1 state renewal regression
   - fix open/lock state recovery error handling
   - fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
   - fix statd when reconnection fails
   - don't wake tasks during connection abort
   - don't start reboot recovery if lease check fails
   - fix duplicate proc entries

  Features:
  - pNFS block driver fixes and clean ups from Christoph
  - More code cleanups from Anna
  - Improve mmap() writeback performance
  - Replace use of PF_TRANS with a more generic mechanism for avoiding
    deadlocks in nfs_release_page"

* tag 'nfs-for-3.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (66 commits)
  NFSv4.1: Fix an NFSv4.1 state renewal regression
  NFSv4: fix open/lock state recovery error handling
  NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
  NFS: Fabricate fscache server index key correctly
  SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT
  NFSv3: Fix missing includes of nfs3_fs.h
  NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page()
  NFS: avoid waiting at all in nfs_release_page when congested.
  NFS: avoid deadlocks with loop-back mounted NFS filesystems.
  MM: export page_wakeup functions
  SCHED: add some "wait..on_bit...timeout()" interfaces.
  NFS: don't use STABLE writes during writeback.
  NFSv4: use exponential retry on NFS4ERR_DELAY for async requests.
  rpc: Add -EPERM processing for xs_udp_send_request()
  rpc: return sent and err from xs_sendpages()
  lockd: Try to reconnect if statd has moved
  SUNRPC: Don't wake tasks during connection abort
  Fixing lease renewal
  nfs: fix duplicate proc entries
  pnfs/blocklayout: Fix a 64-bit division/remainder issue in bl_map_stripe
  ...
2014-10-08 12:49:23 -04:00
Qu Wenruo
a43bb39b5c btrfs: Fix compile error when CONFIG_SECURITY is not set.
Fix the following compile error when CONFIG_SECURITY is not set:

error: 'struct security_mnt_opts' has no member named 'num_mnt_opts'

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-10-08 06:59:24 -07:00
Fabian Frederick
d29c0afe4d GFS2: use _RET_IP_ instead of (unsigned long)__builtin_return_address(0)
use macro definition

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-10-08 09:57:07 +01:00
Linus Torvalds
28596c9722 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull "trivial tree" updates from Jiri Kosina:
 "Usual pile from trivial tree everyone is so eagerly waiting for"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  Remove MN10300_PROC_MN2WS0038
  mei: fix comments
  treewide: Fix typos in Kconfig
  kprobes: update jprobe_example.c for do_fork() change
  Documentation: change "&" to "and" in Documentation/applying-patches.txt
  Documentation: remove obsolete pcmcia-cs from Changes
  Documentation: update links in Changes
  Documentation: Docbook: Fix generated DocBook/kernel-api.xml
  score: Remove GENERIC_HAS_IOMAP
  gpio: fix 'CONFIG_GPIO_IRQCHIP' comments
  tty: doc: Fix grammar in serial/tty
  dma-debug: modify check_for_stack output
  treewide: fix errors in printk
  genirq: fix reference in devm_request_threaded_irq comment
  treewide: fix synchronize_rcu() in comments
  checkstack.pl: port to AArch64
  doc: queue-sysfs: minor fixes
  init/do_mounts: better syntax description
  MIPS: fix comment spelling
  powerpc/simpleboot: fix comment
  ...
2014-10-07 21:16:26 -04:00
Chris Mason
0d4cf4e6bf Btrfs: fix compiles when CONFIG_BTRFS_FS_RUN_SANITY_TESTS is off
Commit fccb84c94 moved added some helpers to cleanup our sanity tests,
but it looks like both Dave and I always compile with the tests enabled.

This fixes things to work when they are turned off too.

Signed-off-by: Chris Mason <clm@fb.com>
2014-10-07 13:24:20 -07:00