linux/fs/autofs4/autofs_i.h
David Howells cc53ce53c8 Add a dentry op to allow processes to be held during pathwalk transit
Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
sleep when it tries to transit away from one of that filesystem's directories
during a pathwalk.  The operation is keyed off a new dentry flag
(DCACHE_MANAGE_TRANSIT).

The filesystem is allowed to be selective about which processes it holds and
which it permits to continue on or prohibits from transiting from each flagged
directory.  This will allow autofs to hold up client processes whilst letting
its userspace daemon through to maintain the directory or the stuff behind it
or mounted upon it.

The ->d_manage() dentry operation:

	int (*d_manage)(struct path *path, bool mounting_here);

takes a pointer to the directory about to be transited away from and a flag
indicating whether the transit is undertaken by do_add_mount() or
do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.

It should return 0 if successful and to let the process continue on its way;
-EISDIR to prohibit the caller from skipping to overmounted filesystems or
automounting, and to use this directory; or some other error code to return to
the user.

->d_manage() is called with namespace_sem writelocked if mounting_here is true
and no other locks held, so it may sleep.  However, if mounting_here is true,
it may not initiate or wait for a mount or unmount upon the parameter
directory, even if the act is actually performed by userspace.

Within fs/namei.c, follow_managed() is extended to check with d_manage() first
on each managed directory, before transiting away from it or attempting to
automount upon it.

follow_down() is renamed follow_down_one() and should only be used where the
filesystem deliberately intends to avoid management steps (e.g. autofs).

A new follow_down() is added that incorporates the loop done by all other
callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
and CIFS do use it, their use is removed by converting them to use
d_automount()).  The new follow_down() calls d_manage() as appropriate.  It
also takes an extra parameter to indicate if it is being called from mount code
(with namespace_sem writelocked) which it passes to d_manage().  follow_down()
ignores automount points so that it can be used to mount on them.

__follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
sleep.  It would be possible to enter d_manage() in rcu-walk mode too, and have
that determine whether to abort or not itself.  That would allow the autofs
daemon to continue on in rcu-walk mode.

Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
required as every tranist from that directory will cause d_manage() to be
invoked.  It can always be set again when necessary.

==========================
WHAT THIS MEANS FOR AUTOFS
==========================

Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
trigger the automounting of indirect mounts, and both of these can be called
with i_mutex held.

autofs knows that the i_mutex will be held by the caller in lookup(), and so
can drop it before invoking the daemon - but this isn't so for d_revalidate(),
since the lock is only held on _some_ of the code paths that call it.  This
means that autofs can't risk dropping i_mutex from its d_revalidate() function
before it calls the daemon.

The bug could manifest itself as, for example, a process that's trying to
validate an automount dentry that gets made to wait because that dentry is
expired and needs cleaning up:

	mkdir         S ffffffff8014e05a     0 32580  24956
	Call Trace:
	 [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897
	 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58
	 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e
	 [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b
	 [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149
	 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f
	 [<ffffffff80057a2f>] lookup_create+0x46/0x80
	 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4

versus the automount daemon which wants to remove that dentry, but can't
because the normal process is holding the i_mutex lock:

	automount     D ffffffff8014e05a     0 32581      1              32561
	Call Trace:
	 [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b
	 [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1
	 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14
	 [<ffffffff800e6d55>] do_rmdir+0x77/0xde
	 [<ffffffff8005d229>] tracesys+0x71/0xe0
	 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

which means that the system is deadlocked.

This patch allows autofs to hold up normal processes whilst the daemon goes
ahead and does things to the dentry tree behind the automouter point without
risking a deadlock as almost no locks are held in d_manage() and none in
d_automount().

Signed-off-by: David Howells <dhowells@redhat.com>
Was-Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:31 -05:00

286 lines
7.8 KiB
C

/* -*- c -*- ------------------------------------------------------------- *
*
* linux/fs/autofs/autofs_i.h
*
* Copyright 1997-1998 Transmeta Corporation - All Rights Reserved
* Copyright 2005-2006 Ian Kent <raven@themaw.net>
*
* This file is part of the Linux kernel and is made available under
* the terms of the GNU General Public License, version 2, or at your
* option, any later version, incorporated herein by reference.
*
* ----------------------------------------------------------------------- */
/* Internal header file for autofs */
#include <linux/auto_fs4.h>
#include <linux/auto_dev-ioctl.h>
#include <linux/mutex.h>
#include <linux/spinlock.h>
#include <linux/list.h>
/* This is the range of ioctl() numbers we claim as ours */
#define AUTOFS_IOC_FIRST AUTOFS_IOC_READY
#define AUTOFS_IOC_COUNT 32
#define AUTOFS_DEV_IOCTL_IOC_FIRST (AUTOFS_DEV_IOCTL_VERSION)
#define AUTOFS_DEV_IOCTL_IOC_COUNT (AUTOFS_IOC_COUNT - 11)
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/time.h>
#include <linux/string.h>
#include <linux/wait.h>
#include <linux/sched.h>
#include <linux/mount.h>
#include <linux/namei.h>
#include <asm/current.h>
#include <asm/uaccess.h>
/* #define DEBUG */
#ifdef DEBUG
#define DPRINTK(fmt, args...) \
do { \
printk(KERN_DEBUG "pid %d: %s: " fmt "\n", \
current->pid, __func__, ##args); \
} while (0)
#else
#define DPRINTK(fmt, args...) do {} while (0)
#endif
#define AUTOFS_WARN(fmt, args...) \
do { \
printk(KERN_WARNING "pid %d: %s: " fmt "\n", \
current->pid, __func__, ##args); \
} while (0)
#define AUTOFS_ERROR(fmt, args...) \
do { \
printk(KERN_ERR "pid %d: %s: " fmt "\n", \
current->pid, __func__, ##args); \
} while (0)
extern spinlock_t autofs4_lock;
/* Unified info structure. This is pointed to by both the dentry and
inode structures. Each file in the filesystem has an instance of this
structure. It holds a reference to the dentry, so dentries are never
flushed while the file exists. All name lookups are dealt with at the
dentry level, although the filesystem can interfere in the validation
process. Readdir is implemented by traversing the dentry lists. */
struct autofs_info {
struct dentry *dentry;
struct inode *inode;
int flags;
struct completion expire_complete;
struct list_head active;
int active_count;
struct list_head expiring;
struct autofs_sb_info *sbi;
unsigned long last_used;
atomic_t count;
uid_t uid;
gid_t gid;
mode_t mode;
size_t size;
void (*free)(struct autofs_info *);
union {
const char *symlink;
} u;
};
#define AUTOFS_INF_EXPIRING (1<<0) /* dentry is in the process of expiring */
#define AUTOFS_INF_MOUNTPOINT (1<<1) /* mountpoint status for direct expire */
#define AUTOFS_INF_PENDING (1<<2) /* dentry pending mount */
struct autofs_wait_queue {
wait_queue_head_t queue;
struct autofs_wait_queue *next;
autofs_wqt_t wait_queue_token;
/* We use the following to see what we are waiting for */
struct qstr name;
u32 dev;
u64 ino;
uid_t uid;
gid_t gid;
pid_t pid;
pid_t tgid;
/* This is for status reporting upon return */
int status;
unsigned int wait_ctr;
};
#define AUTOFS_SBI_MAGIC 0x6d4a556d
struct autofs_sb_info {
u32 magic;
int pipefd;
struct file *pipe;
pid_t oz_pgrp;
int catatonic;
int version;
int sub_version;
int min_proto;
int max_proto;
unsigned long exp_timeout;
unsigned int type;
int reghost_enabled;
int needs_reghost;
struct super_block *sb;
struct mutex wq_mutex;
spinlock_t fs_lock;
struct autofs_wait_queue *queues; /* Wait queue pointer */
spinlock_t lookup_lock;
struct list_head active_list;
struct list_head expiring_list;
};
static inline struct autofs_sb_info *autofs4_sbi(struct super_block *sb)
{
return (struct autofs_sb_info *)(sb->s_fs_info);
}
static inline struct autofs_info *autofs4_dentry_ino(struct dentry *dentry)
{
return (struct autofs_info *)(dentry->d_fsdata);
}
/* autofs4_oz_mode(): do we see the man behind the curtain? (The
processes which do manipulations for us in user space sees the raw
filesystem without "magic".) */
static inline int autofs4_oz_mode(struct autofs_sb_info *sbi) {
return sbi->catatonic || task_pgrp_nr(current) == sbi->oz_pgrp;
}
/* Does a dentry have some pending activity? */
static inline int autofs4_ispending(struct dentry *dentry)
{
struct autofs_info *inf = autofs4_dentry_ino(dentry);
if (inf->flags & AUTOFS_INF_PENDING)
return 1;
if (inf->flags & AUTOFS_INF_EXPIRING)
return 1;
return 0;
}
static inline void autofs4_copy_atime(struct file *src, struct file *dst)
{
dst->f_path.dentry->d_inode->i_atime =
src->f_path.dentry->d_inode->i_atime;
return;
}
struct inode *autofs4_get_inode(struct super_block *, struct autofs_info *);
void autofs4_free_ino(struct autofs_info *);
/* Expiration */
int is_autofs4_dentry(struct dentry *);
int autofs4_expire_wait(struct dentry *dentry);
int autofs4_expire_run(struct super_block *, struct vfsmount *,
struct autofs_sb_info *,
struct autofs_packet_expire __user *);
int autofs4_do_expire_multi(struct super_block *sb, struct vfsmount *mnt,
struct autofs_sb_info *sbi, int when);
int autofs4_expire_multi(struct super_block *, struct vfsmount *,
struct autofs_sb_info *, int __user *);
struct dentry *autofs4_expire_direct(struct super_block *sb,
struct vfsmount *mnt,
struct autofs_sb_info *sbi, int how);
struct dentry *autofs4_expire_indirect(struct super_block *sb,
struct vfsmount *mnt,
struct autofs_sb_info *sbi, int how);
/* Device node initialization */
int autofs_dev_ioctl_init(void);
void autofs_dev_ioctl_exit(void);
/* Operations structures */
extern const struct inode_operations autofs4_symlink_inode_operations;
extern const struct inode_operations autofs4_dir_inode_operations;
extern const struct inode_operations autofs4_root_inode_operations;
extern const struct inode_operations autofs4_indirect_root_inode_operations;
extern const struct inode_operations autofs4_direct_root_inode_operations;
extern const struct file_operations autofs4_dir_operations;
extern const struct file_operations autofs4_root_operations;
/* Initializing function */
int autofs4_fill_super(struct super_block *, void *, int);
struct autofs_info *autofs4_init_ino(struct autofs_info *, struct autofs_sb_info *sbi, mode_t mode);
/* Queue management functions */
int autofs4_wait(struct autofs_sb_info *,struct dentry *, enum autofs_notify);
int autofs4_wait_release(struct autofs_sb_info *,autofs_wqt_t,int);
void autofs4_catatonic_mode(struct autofs_sb_info *);
static inline u32 autofs4_get_dev(struct autofs_sb_info *sbi)
{
return new_encode_dev(sbi->sb->s_dev);
}
static inline u64 autofs4_get_ino(struct autofs_sb_info *sbi)
{
return sbi->sb->s_root->d_inode->i_ino;
}
static inline int simple_positive(struct dentry *dentry)
{
return dentry->d_inode && !d_unhashed(dentry);
}
static inline void __autofs4_add_expiring(struct dentry *dentry)
{
struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
struct autofs_info *ino = autofs4_dentry_ino(dentry);
if (ino) {
if (list_empty(&ino->expiring))
list_add(&ino->expiring, &sbi->expiring_list);
}
return;
}
static inline void autofs4_add_expiring(struct dentry *dentry)
{
struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
struct autofs_info *ino = autofs4_dentry_ino(dentry);
if (ino) {
spin_lock(&sbi->lookup_lock);
if (list_empty(&ino->expiring))
list_add(&ino->expiring, &sbi->expiring_list);
spin_unlock(&sbi->lookup_lock);
}
return;
}
static inline void autofs4_del_expiring(struct dentry *dentry)
{
struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
struct autofs_info *ino = autofs4_dentry_ino(dentry);
if (ino) {
spin_lock(&sbi->lookup_lock);
if (!list_empty(&ino->expiring))
list_del_init(&ino->expiring);
spin_unlock(&sbi->lookup_lock);
}
return;
}
void autofs4_dentry_release(struct dentry *);
extern void autofs4_kill_sb(struct super_block *);