2005-04-17 02:20:36 +04:00
/*
 *  linux/fs/namei.c
 *
 *  Copyright (C) 1991, 1992  Linus Torvalds
 */
/*
 * Some corrections by tytso.
 */
/* [Feb 1997 T. Schoebel-Theuer] Complete rewrite of the pathname
 * lookup logic.
 */
/* [Feb-Apr 2000, AV] Rewrite to the new namespace architecture.
 */
# include <linux/init.h>
# include <linux/module.h>
# include <linux/slab.h>
# include <linux/fs.h>
# include <linux/namei.h>
# include <linux/quotaops.h>
# include <linux/pagemap.h>
[PATCH] inotify
inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:
* dnotify requires the opening of one fd per each directory
that you intend to watch. This quickly results in too many
open files and pins removable media, preventing unmount.
* dnotify is directory-based. You only learn about changes to
directories. Sure, a change to a file in a directory affects
the directory, but you are then forced to keep a cache of
stat structures.
* dnotify's interface to user-space is awful. Signals?
inotify provides a more usable, simple, powerful solution to file change
notification:
* inotify's interface is a system call that returns a fd, not SIGIO.
You get a single fd, which is select()-able.
* inotify has an event that says "the filesystem that the item
you were watching is on was unmounted."
* inotify can watch directories or files.
Inotify is currently used by Beagle (a desktop search infrastructure),
Gamin (a FAM replacement), and other projects.
See Documentation/filesystems/inotify.txt.
Signed-off-by: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-13 01:06:03 +04:00
# include <linux/fsnotify.h>
2005-04-17 02:20:36 +04:00
# include <linux/personality.h>
# include <linux/security.h>
2009-02-04 17:06:57 +03:00
# include <linux/ima.h>
2005-04-17 02:20:36 +04:00
# include <linux/syscalls.h>
# include <linux/mount.h>
# include <linux/audit.h>
2006-01-11 23:17:46 +03:00
# include <linux/capability.h>
2005-10-19 01:20:16 +04:00
# include <linux/file.h>
2006-01-19 04:43:53 +03:00
# include <linux/fcntl.h>
2008-04-29 12:00:10 +04:00
# include <linux/device_cgroup.h>
2009-03-30 03:50:06 +04:00
# include <linux/fs_struct.h>
2005-04-17 02:20:36 +04:00
# include <asm/uaccess.h>
2009-12-04 23:47:36 +03:00
# include "internal.h"
2005-04-17 02:20:36 +04:00
/* [Feb-1997 T. Schoebel-Theuer]
 * Fundamental changes in the pathname lookup mechanisms (namei)
 * were necessary because of omirr.  The reason is that omirr needs
 * to know the _real_ pathname, not the user-supplied one, in case
 * of symlinks (and also when transname replacements occur).
 *
 * The new code replaces the old recursive symlink resolution with
 * an iterative one (in case of non-nested symlink chains).  It does
 * this with calls to <fs>_follow_link().
 * As a side effect, dir_namei(), _namei() and follow_link() are now
 * replaced with a single function lookup_dentry() that can handle all
 * the special cases of the former code.
 *
 * With the new dcache, the pathname is stored at each inode, at least as
 * long as the refcount of the inode is positive.  As a side effect, the
 * size of the dcache depends on the inode cache and thus is dynamic.
 *
 * [29-Apr-1998 C. Scott Ananian] Updated above description of symlink
 * resolution to correspond with current state of the code.
 *
 * Note that the symlink resolution is not *completely* iterative.
 * There is still a significant amount of tail- and mid-recursion in
 * the algorithm.  Also, note that <fs>_readlink() is not used in
 * lookup_dentry(): lookup_dentry() on the result of <fs>_readlink()
 * may return different results than <fs>_follow_link().  Many virtual
 * filesystems (including /proc) exhibit this behavior.
 */
/* [24-Feb-97 T. Schoebel-Theuer] Side effects caused by new implementation:
 * New symlink semantics: when open() is called with flags O_CREAT | O_EXCL
 * and the name already exists in form of a symlink, try to create the new
 * name indicated by the symlink. The old code always complained that the
 * name already exists, due to not following the symlink even if its target
 * is nonexistent.  The new semantics affects also mknod() and link() when
 * the name is a symlink pointing to a nonexistent name.
 *
 * I don't know which semantics is the right one, since I have no access
 * to standards. But I found by trial that HP-UX 9.0 has the full "new"
 * semantics implemented, while SunOS 4.1.1 and Solaris (SunOS 5.4) have the
 * "old" one. Personally, I think the new semantics is much more logical.
 * Note that "ln old new" where "new" is a symlink pointing to a non-existing
 * file does succeed in both HP-UX and SunOS, but not in Solaris
 * and in the old Linux semantics.
 */
/* [16-Dec-97 Kevin Buhr] For security reasons, we change some symlink
 * semantics.  See the comments in "open_namei" and "do_link" below.
 *
 * [10-Sep-98 Alan Modra] Another symlink change.
 */
/* [Feb-Apr 2000 AV] Complete rewrite. Rules for symlinks:
 *	inside the path - always follow.
 *	in the last component in creation/removal/renaming - never follow.
 *	if LOOKUP_FOLLOW passed - follow.
 *	if the pathname has trailing slashes - follow.
 *	otherwise - don't follow.
 * (applied in that order).
 *
 * [Jun 2000 AV] Inconsistent behaviour of open() when flags == O_CREAT
 * restored for 2.4. This is the last surviving part of the old 4.2BSD bug.
 * During 2.4 we need to fix the userland stuff depending on it -
 * hopefully we will be able to get rid of that wart in 2.5. So far only
 * XEmacs seems to be relying on it...
 */
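/*
 * Illustration only, not part of the kernel: a minimal userspace sketch of
 * the symlink rules listed above, checked in the stated order. The names
 * sk_component, SK_CREATE, SK_FOLLOW and sk_should_follow() are hypothetical
 * and exist only for this sketch; SK_CREATE stands in for "creation/removal/
 * renaming" and SK_FOLLOW for a caller passing LOOKUP_FOLLOW.
 */

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags for this sketch; these are not the kernel's LOOKUP_* values. */
enum {
	SK_FOLLOW = 1 << 0,	/* caller asked to follow a symlink in the last component */
	SK_CREATE = 1 << 1,	/* operation is creation, removal or renaming */
};

/* Where a path component sits and how the pathname ended. */
struct sk_component {
	bool is_last;		/* false while still inside the path */
	bool trailing_slashes;	/* pathname ended in one or more '/' */
};

/* Apply the rules in the order the comment gives them; first match wins. */
static bool sk_should_follow(const struct sk_component *c, unsigned int flags)
{
	if (!c->is_last)
		return true;	/* inside the path: always follow */
	if (flags & SK_CREATE)
		return false;	/* last component of create/remove/rename: never */
	if (flags & SK_FOLLOW)
		return true;	/* caller explicitly asked for it */
	if (c->trailing_slashes)
		return true;	/* "name/" must resolve through the link */
	return false;		/* otherwise: don't follow */
}
```

/*
 * Because the rules are ordered, SK_CREATE on the last component wins even
 * when SK_FOLLOW is also set in this sketch.
 */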
/*
* [ Sep 2001 AV ] Single - semaphore locking scheme ( kudos to David Holland )
2006-03-23 14:00:33 +03:00
* implemented . Let ' s see if raised priority of - > s_vfs_rename_mutex gives
2005-04-17 02:20:36 +04:00
* any extra contention . . .
*/
/* In order to reduce some races, while at the same time doing additional
 * checking and hopefully speeding things up, we copy filenames to the
 * kernel data space before using them.
 *
 * POSIX.1 2.4: an empty pathname is invalid (ENOENT).
 * PATH_MAX includes the nul terminator --RR.
 */
2006-01-15 00:20:43 +03:00
static int do_getname ( const char __user * filename , char * page )
2005-04-17 02:20:36 +04:00
{
int retval ;
unsigned long len = PATH_MAX ;
if ( ! segment_eq ( get_fs ( ) , KERNEL_DS ) ) {
if ( ( unsigned long ) filename > = TASK_SIZE )
return - EFAULT ;
if ( TASK_SIZE - ( unsigned long ) filename < PATH_MAX )
len = TASK_SIZE - ( unsigned long ) filename ;
}
retval = strncpy_from_user ( page , filename , len ) ;
if ( retval > 0 ) {
if ( retval < len )
return 0 ;
return - ENAMETOOLONG ;
} else if ( ! retval )
retval = - ENOENT ;
return retval ;
}
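/*
 * Illustration only, not part of the kernel: a userspace sketch of how
 * do_getname() maps the copy result to an error code. sk_copy_bounded() is a
 * hypothetical stand-in that assumes the return convention do_getname()
 * relies on: the string length when a nul is found within the limit,
 * otherwise the limit itself. Fault handling (-EFAULT) is left out.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Copy at most max bytes of src into dst. Returns the string length
 * (excluding the nul) if the nul was copied, or max if no nul was seen.
 */
static long sk_copy_bounded(char *dst, const char *src, size_t max)
{
	size_t i;

	for (i = 0; i < max; i++) {
		dst[i] = src[i];
		if (src[i] == '\0')
			return (long)i;
	}
	return (long)max;
}

/* Mirror of the result mapping: 0 on success, negative errno otherwise. */
static int sk_getname(char *page, const char *filename, size_t len)
{
	long n = sk_copy_bounded(page, filename, len);

	if (n > 0)
		return (size_t)n < len ? 0 : -ENAMETOOLONG;
	return -ENOENT;		/* empty pathname is invalid */
}
```

/*
 * With len == 8, a 7-character name fits (7 bytes plus the nul), while an
 * 8-character name does not, which is what "PATH_MAX includes the nul
 * terminator" means in practice.
 */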
char * getname ( const char __user * filename )
{
char * tmp , * result ;
result = ERR_PTR ( - ENOMEM ) ;
tmp = __getname ( ) ;
if ( tmp ) {
int retval = do_getname ( filename , tmp ) ;
result = tmp ;
if ( retval < 0 ) {
__putname ( tmp ) ;
result = ERR_PTR ( retval ) ;
}
}
audit_getname ( result ) ;
return result ;
}
# ifdef CONFIG_AUDITSYSCALL
void putname ( const char * name )
{
2006-07-16 14:38:45 +04:00
if ( unlikely ( ! audit_dummy_context ( ) ) )
2005-04-17 02:20:36 +04:00
audit_putname ( name ) ;
else
__putname ( name ) ;
}
EXPORT_SYMBOL ( putname ) ;
# endif
2009-08-28 22:51:25 +04:00
/*
* This does basic POSIX ACL permission checking
2005-04-17 02:20:36 +04:00
*/
2009-08-28 22:51:25 +04:00
static int acl_permission_check ( struct inode * inode , int mask ,
2005-04-17 02:20:36 +04:00
int ( * check_acl ) ( struct inode * inode , int mask ) )
{
umode_t mode = inode - > i_mode ;
2008-07-16 05:03:57 +04:00
mask & = MAY_READ | MAY_WRITE | MAY_EXEC ;
2008-11-14 02:39:05 +03:00
if ( current_fsuid ( ) = = inode - > i_uid )
2005-04-17 02:20:36 +04:00
mode > > = 6 ;
else {
if ( IS_POSIXACL ( inode ) & & ( mode & S_IRWXG ) & & check_acl ) {
int error = check_acl ( inode , mask ) ;
2009-08-28 22:51:25 +04:00
if ( error ! = - EAGAIN )
2005-04-17 02:20:36 +04:00
return error ;
}
if ( in_group_p ( inode - > i_gid ) )
mode > > = 3 ;
}
/*
* If the DACs are ok we don ' t need any capability check .
*/
2008-07-16 05:03:57 +04:00
if ( ( mask & ~ mode ) = = 0 )
2005-04-17 02:20:36 +04:00
return 0 ;
2009-08-28 22:51:25 +04:00
return - EACCES ;
}
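/*
 * Illustration only, not part of the kernel: a userspace sketch of the mode
 * bit selection in acl_permission_check(), with the ACL callback left out.
 * sk_mode_check() and the SK_MAY_* constants are hypothetical; the constants
 * are chosen here so that they line up with the low "rwx" bits of a mode.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical request bits, laid out like the "other" rwx bits of a mode. */
enum { SK_MAY_EXEC = 1, SK_MAY_WRITE = 2, SK_MAY_READ = 4 };

/*
 * Pick the owner, group or other triplet by shifting it into the low three
 * bits, then require every requested bit to be present in that triplet.
 */
static int sk_mode_check(unsigned int mode, bool is_owner, bool in_group,
			 unsigned int mask)
{
	mask &= SK_MAY_READ | SK_MAY_WRITE | SK_MAY_EXEC;

	if (is_owner)
		mode >>= 6;	/* owner bits; group is not consulted */
	else if (in_group)
		mode >>= 3;	/* group bits */

	return (mask & ~mode) == 0 ? 0 : -EACCES;
}
```

/*
 * Note the owner case is exclusive: with mode 0077 the owner is denied even
 * if the owner is also a member of the file's group.
 */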
/**
 * generic_permission - check for access rights on a POSIX-like filesystem
 * @inode:	inode to check access rights for
 * @mask:	right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
 * @check_acl:	optional callback to check for POSIX ACLs
 *
 * Used to check for read/write/execute permissions on a file.
 * We use "fsuid" for this, letting us set arbitrary permissions
 * for filesystem access without changing the "normal" uids which
 * are used for other things.
 */
int generic_permission ( struct inode * inode , int mask ,
int ( * check_acl ) ( struct inode * inode , int mask ) )
{
int ret ;
/*
* Do the basic POSIX ACL permission checks .
*/
ret = acl_permission_check ( inode , mask , check_acl ) ;
if ( ret ! = - EACCES )
return ret ;
2005-04-17 02:20:36 +04:00
/*
* Read / write DACs are always overridable .
* Executable DACs are overridable if at least one exec bit is set .
*/
2008-07-31 15:41:58 +04:00
if ( ! ( mask & MAY_EXEC ) | | execute_ok ( inode ) )
2005-04-17 02:20:36 +04:00
if ( capable ( CAP_DAC_OVERRIDE ) )
return 0 ;
/*
* Searching includes executable on directories , else just read .
*/
2009-12-29 23:50:19 +03:00
mask & = MAY_READ | MAY_WRITE | MAY_EXEC ;
2005-04-17 02:20:36 +04:00
if ( mask = = MAY_READ | | ( S_ISDIR ( inode - > i_mode ) & & ! ( mask & MAY_WRITE ) ) )
if ( capable ( CAP_DAC_READ_SEARCH ) )
return 0 ;
return - EACCES ;
}
2008-10-24 11:59:29 +04:00
/**
 * inode_permission - check for access rights to a given inode
 * @inode:	inode to check permission on
 * @mask:	right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
 *
 * Used to check for read/write/execute permissions on an inode.
 * We use "fsuid" for this, letting us set arbitrary permissions
 * for filesystem access without changing the "normal" uids which
 * are used for other things.
 */
2008-07-22 08:07:17 +04:00
int inode_permission ( struct inode * inode , int mask )
2005-04-17 02:20:36 +04:00
{
2008-07-16 05:03:57 +04:00
int retval ;
2005-04-17 02:20:36 +04:00
if ( mask & MAY_WRITE ) {
2007-10-17 10:27:08 +04:00
umode_t mode = inode - > i_mode ;
2005-04-17 02:20:36 +04:00
/*
* Nobody gets write access to a read - only fs .
*/
if ( IS_RDONLY ( inode ) & &
( S_ISREG ( mode ) | | S_ISDIR ( mode ) | | S_ISLNK ( mode ) ) )
return - EROFS ;
/*
* Nobody gets write access to an immutable file .
*/
if ( IS_IMMUTABLE ( inode ) )
return - EACCES ;
}
2008-12-04 18:06:33 +03:00
if ( inode - > i_op - > permission )
2008-07-17 17:37:02 +04:00
retval = inode - > i_op - > permission ( inode , mask ) ;
2008-07-31 15:41:58 +04:00
else
2009-08-28 22:51:25 +04:00
retval = generic_permission ( inode , mask , inode - > i_op - > check_acl ) ;
2008-07-31 15:41:58 +04:00
2005-04-17 02:20:36 +04:00
if ( retval )
return retval ;
2008-04-29 12:00:10 +04:00
retval = devcgroup_inode_permission ( inode , mask ) ;
if ( retval )
return retval ;
2008-07-16 05:03:57 +04:00
return security_inode_permission ( inode ,
2008-07-28 21:32:38 +04:00
mask & ( MAY_READ | MAY_WRITE | MAY_EXEC | MAY_APPEND ) ) ;
2005-04-17 02:20:36 +04:00
}
2005-11-09 08:35:04 +03:00
/**
* file_permission - check for additional access rights to a given file
* @ file : file to check access rights for
* @ mask : right to check for ( % MAY_READ , % MAY_WRITE , % MAY_EXEC )
*
* Used to check for read / write / execute permissions on an already opened
* file .
*
* Note :
* Do not use this function in new code . All access checks should
2008-10-24 11:59:29 +04:00
* be done using inode_permission ( ) .
2005-11-09 08:35:04 +03:00
*/
int file_permission ( struct file * file , int mask )
{
2008-07-22 08:07:17 +04:00
return inode_permission ( file - > f_path . dentry - > d_inode , mask ) ;
2005-11-09 08:35:04 +03:00
}
2005-04-17 02:20:36 +04:00
/*
 * get_write_access() gets write permission for a file.
 * put_write_access() releases this write permission.
 * This is used for regular files.
 * We cannot support write (and maybe mmap read-write shared) accesses and
 * MAP_DENYWRITE mmappings simultaneously. The i_writecount field of an inode
 * can have the following values:
 * 0: no writers, no VM_DENYWRITE mappings
 * < 0: (-i_writecount) vm_area_structs with VM_DENYWRITE set exist
 * > 0: (i_writecount) users are writing to the file.
 *
 * Normally we operate on that counter with atomic_{inc,dec} and it's safe
 * except for the cases where we don't hold i_writecount yet. Then we need to
 * use {get,deny}_write_access() - these functions check the sign and refuse
 * to do the change if sign is wrong. Exclusion between them is provided by
 * the inode->i_lock spinlock.
 */
int get_write_access ( struct inode * inode )
{
spin_lock ( & inode - > i_lock ) ;
if ( atomic_read ( & inode - > i_writecount ) < 0 ) {
spin_unlock ( & inode - > i_lock ) ;
return - ETXTBSY ;
}
atomic_inc ( & inode - > i_writecount ) ;
spin_unlock ( & inode - > i_lock ) ;
return 0 ;
}
int deny_write_access ( struct file * file )
{
2006-12-08 13:36:35 +03:00
struct inode * inode = file - > f_path . dentry - > d_inode ;
2005-04-17 02:20:36 +04:00
spin_lock ( & inode - > i_lock ) ;
if ( atomic_read ( & inode - > i_writecount ) > 0 ) {
spin_unlock ( & inode - > i_lock ) ;
return - ETXTBSY ;
}
atomic_dec ( & inode - > i_writecount ) ;
spin_unlock ( & inode - > i_lock ) ;
return 0 ;
}
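/*
 * Illustration only, not part of the kernel: a single-threaded userspace
 * sketch of the signed i_writecount protocol described above. struct
 * sk_inode and the sk_*() helpers are hypothetical; the real code uses an
 * atomic counter under inode->i_lock, which this sketch deliberately omits.
 */

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical inode with only the counter this sketch needs. */
struct sk_inode {
	int writecount;	/* > 0: writers, < 0: deny-write mappings, 0: neither */
};

/* A writer may enter only while no deny-write mapping exists. */
static int sk_get_write_access(struct sk_inode *inode)
{
	if (inode->writecount < 0)
		return -ETXTBSY;
	inode->writecount++;
	return 0;
}

/* A deny-write mapping may be set up only while nobody is writing. */
static int sk_deny_write_access(struct sk_inode *inode)
{
	if (inode->writecount > 0)
		return -ETXTBSY;
	inode->writecount--;
	return 0;
}

/* Release helpers: the caller already holds a count of the right sign. */
static void sk_put_write_access(struct sk_inode *inode)
{
	inode->writecount--;
}

static void sk_allow_write_access(struct sk_inode *inode)
{
	inode->writecount++;
}
```

/*
 * The sign of the counter encodes which side currently owns the file, so a
 * single integer is enough to keep writers and deny-write mappings apart.
 */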
2008-02-15 06:34:38 +03:00
/**
 * path_get - get a reference to a path
 * @path: path to get the reference to
 *
 * Given a path, increment the reference count to the dentry and the vfsmount.
 */
void path_get ( struct path * path )
{
mntget ( path - > mnt ) ;
dget ( path - > dentry ) ;
}
EXPORT_SYMBOL ( path_get ) ;
2008-02-15 06:34:35 +03:00
/**
 * path_put - put a reference to a path
 * @path: path to put the reference to
 *
 * Given a path, decrement the reference count to the dentry and the vfsmount.
 */
void path_put ( struct path * path )
2005-04-17 02:20:36 +04:00
{
2008-02-15 06:34:35 +03:00
dput ( path - > dentry ) ;
mntput ( path - > mnt ) ;
2005-04-17 02:20:36 +04:00
}
2008-02-15 06:34:35 +03:00
EXPORT_SYMBOL ( path_put ) ;
2005-04-17 02:20:36 +04:00
2005-10-19 01:20:16 +04:00
/**
* release_open_intent - free up open intent resources
* @ nd : pointer to nameidata
*/
void release_open_intent ( struct nameidata * nd )
{
2006-12-08 13:36:35 +03:00
if ( nd - > intent . open . file - > f_path . dentry = = NULL )
2005-10-19 01:20:16 +04:00
put_filp ( nd - > intent . open . file ) ;
else
fput ( nd - > intent . open . file ) ;
}
2006-09-27 12:50:44 +04:00
static inline struct dentry *
do_revalidate ( struct dentry * dentry , struct nameidata * nd )
{
int status = dentry - > d_op - > d_revalidate ( dentry , nd ) ;
if ( unlikely ( status < = 0 ) ) {
/*
* The dentry failed validation .
* If d_revalidate returned 0 attempt to invalidate
* the dentry otherwise d_revalidate is asking us
* to return a fail status .
*/
if ( ! status ) {
if ( ! d_invalidate ( dentry ) ) {
dput ( dentry ) ;
dentry = NULL ;
}
} else {
dput ( dentry ) ;
dentry = ERR_PTR ( status ) ;
}
}
return dentry ;
}
2009-12-07 20:01:50 +03:00
/*
 * force_reval_path - force revalidation of a dentry
 *
 * In some situations the path walking code will trust dentries without
 * revalidating them. This causes problems for filesystems that depend on
 * d_revalidate to handle file opens (e.g. NFSv4). When FS_REVAL_DOT is set
 * (which indicates that it's possible for the dentry to go stale), force
 * a d_revalidate call before proceeding.
 *
 * Returns 0 if the revalidation was successful. If the revalidation fails,
 * returns either the error returned by d_revalidate or -ESTALE if
 * d_revalidate just returned 0. If d_revalidate returns 0, we attempt to
 * invalidate the dentry. It's up to the caller to handle putting references
 * to the path if necessary.
 */
static int
force_reval_path ( struct path * path , struct nameidata * nd )
{
int status ;
struct dentry * dentry = path - > dentry ;
/*
* only check on filesystems where it ' s possible for the dentry to
* become stale . It ' s assumed that if this flag is set then the
* d_revalidate op will also be defined .
*/
if ( ! ( dentry - > d_sb - > s_type - > fs_flags & FS_REVAL_DOT ) )
return 0 ;
status = dentry - > d_op - > d_revalidate ( dentry , nd ) ;
if ( status > 0 )
return 0 ;
if ( ! status ) {
d_invalidate ( dentry ) ;
status = - ESTALE ;
}
return status ;
}
2005-04-17 02:20:36 +04:00
/*
2009-12-16 09:01:38 +03:00
* Short - cut version of permission ( ) , for calling on directories
* during pathname resolution . Combines parts of permission ( )
* and generic_permission ( ) , and tests ONLY for MAY_EXEC permission .
2005-04-17 02:20:36 +04:00
*
* If appropriate , check DAC only . If not appropriate , or
2009-12-16 09:01:38 +03:00
* short - cut DAC fails , then call - > permission ( ) to do more
2005-04-17 02:20:36 +04:00
* complete permission check .
*/
2009-12-16 09:01:38 +03:00
static int exec_permission ( struct inode * inode )
2005-04-17 02:20:36 +04:00
{
2009-08-28 22:51:25 +04:00
int ret ;
2005-04-17 02:20:36 +04:00
2009-08-28 22:08:31 +04:00
if ( inode - > i_op - > permission ) {
2009-08-28 22:51:25 +04:00
ret = inode - > i_op - > permission ( inode , MAY_EXEC ) ;
2009-08-28 22:08:31 +04:00
if ( ! ret )
goto ok ;
return ret ;
}
2009-08-28 22:51:25 +04:00
ret = acl_permission_check ( inode , MAY_EXEC , inode - > i_op - > check_acl ) ;
if ( ! ret )
2005-04-17 02:20:36 +04:00
goto ok ;
2009-08-28 21:53:56 +04:00
if ( capable ( CAP_DAC_OVERRIDE ) | | capable ( CAP_DAC_READ_SEARCH ) )
2005-04-17 02:20:36 +04:00
goto ok ;
2009-08-28 22:51:25 +04:00
return ret ;
2005-04-17 02:20:36 +04:00
ok :
2008-07-17 17:37:02 +04:00
return security_inode_permission ( inode , MAY_EXEC ) ;
2005-04-17 02:20:36 +04:00
}
2009-04-07 19:49:53 +04:00
static __always_inline void set_root ( struct nameidata * nd )
{
if ( ! nd - > root . mnt ) {
struct fs_struct * fs = current - > fs ;
read_lock ( & fs - > lock ) ;
nd - > root = fs - > root ;
path_get ( & nd - > root ) ;
read_unlock ( & fs - > lock ) ;
}
}
2009-08-09 01:41:57 +04:00
static int link_path_walk ( const char * , struct nameidata * ) ;
2006-01-15 00:21:31 +03:00
static __always_inline int __vfs_follow_link ( struct nameidata * nd , const char * link )
2005-04-17 02:20:36 +04:00
{
int res = 0 ;
char * name ;
if ( IS_ERR ( link ) )
goto fail ;
if ( * link = = ' / ' ) {
2009-04-07 19:49:53 +04:00
set_root ( nd ) ;
2008-02-15 06:34:35 +03:00
path_put ( & nd - > path ) ;
2009-04-07 19:49:53 +04:00
nd - > path = nd - > root ;
path_get ( & nd - > root ) ;
2005-04-17 02:20:36 +04:00
}
2008-11-05 17:07:21 +03:00
2005-04-17 02:20:36 +04:00
res = link_path_walk ( link , nd ) ;
if ( nd - > depth | | res | | nd - > last_type ! = LAST_NORM )
return res ;
	/*
	 * If this is an iterative symlink resolution in open_namei() we
	 * have to copy the last component. And all that crap because of
	 * bloody create() on broken symlinks. Furrfu...
	 */
name = __getname ( ) ;
if ( unlikely ( ! name ) ) {
2008-02-15 06:34:35 +03:00
path_put ( & nd - > path ) ;
2005-04-17 02:20:36 +04:00
return - ENOMEM ;
}
strcpy ( name , nd - > last . name ) ;
nd - > last . name = name ;
return 0 ;
fail :
2008-02-15 06:34:35 +03:00
path_put ( & nd - > path ) ;
2005-04-17 02:20:36 +04:00
return PTR_ERR ( link ) ;
}
2008-02-15 06:34:35 +03:00
static void path_put_conditional ( struct path * path , struct nameidata * nd )
2006-03-27 13:14:53 +04:00
{
dput ( path - > dentry ) ;
2008-02-15 06:34:32 +03:00
if ( path - > mnt ! = nd - > path . mnt )
2006-03-27 13:14:53 +04:00
mntput ( path - > mnt ) ;
}
static inline void path_to_nameidata ( struct path * path , struct nameidata * nd )
{
2008-02-15 06:34:32 +03:00
dput ( nd - > path . dentry ) ;
if ( nd - > path . mnt ! = path - > mnt )
mntput ( nd - > path . mnt ) ;
nd - > path . mnt = path - > mnt ;
nd - > path . dentry = path - > dentry ;
2006-03-27 13:14:53 +04:00
}
2006-01-15 00:21:31 +03:00
static __always_inline int __do_follow_link ( struct path * path , struct nameidata * nd )
2005-04-17 02:20:36 +04:00
{
int error ;
2005-08-20 05:02:56 +04:00
void * cookie ;
2005-06-07 00:36:03 +04:00
struct dentry * dentry = path - > dentry ;
2005-04-17 02:20:36 +04:00
2005-06-07 00:36:14 +04:00
touch_atime ( path - > mnt , dentry ) ;
2005-04-17 02:20:36 +04:00
nd_set_link ( nd , NULL ) ;
2005-06-07 00:36:03 +04:00
2008-02-15 06:34:32 +03:00
if ( path - > mnt ! = nd - > path . mnt ) {
2006-03-27 13:14:53 +04:00
path_to_nameidata ( path , nd ) ;
dget ( dentry ) ;
}
mntget ( path - > mnt ) ;
2009-12-23 07:45:11 +03:00
nd - > last_type = LAST_BIND ;
2005-08-20 05:02:56 +04:00
cookie = dentry - > d_inode - > i_op - > follow_link ( dentry , nd ) ;
error = PTR_ERR ( cookie ) ;
if ( ! IS_ERR ( cookie ) ) {
2005-04-17 02:20:36 +04:00
char * s = nd_get_link ( nd ) ;
2005-08-20 05:02:56 +04:00
error = 0 ;
2005-04-17 02:20:36 +04:00
if ( s )
error = __vfs_follow_link ( nd , s ) ;
2009-12-07 20:01:50 +03:00
else if ( nd - > last_type = = LAST_BIND ) {
error = force_reval_path ( & nd - > path , nd ) ;
if ( error )
path_put ( & nd - > path ) ;
}
2005-04-17 02:20:36 +04:00
if ( dentry - > d_inode - > i_op - > put_link )
2005-08-20 05:02:56 +04:00
dentry - > d_inode - > i_op - > put_link ( dentry , nd , cookie ) ;
2005-04-17 02:20:36 +04:00
}
return error ;
}
/*
* This limits recursive symlink follows to 8 , while
* limiting consecutive symlinks to 40.
*
* Without that kind of total limit , nasty chains of consecutive
* symlinks can cause almost arbitrarily long lookups .
*/
2005-06-07 00:35:58 +04:00
static inline int do_follow_link ( struct path * path , struct nameidata * nd )
2005-04-17 02:20:36 +04:00
{
int err = - ELOOP ;
if ( current - > link_count > = MAX_NESTED_LINKS )
goto loop ;
if ( current - > total_link_count > = 40 )
goto loop ;
BUG_ON ( nd - > depth > = MAX_NESTED_LINKS ) ;
cond_resched ( ) ;
2005-06-07 00:35:58 +04:00
err = security_inode_follow_link ( path - > dentry , nd ) ;
2005-04-17 02:20:36 +04:00
if ( err )
goto loop ;
current - > link_count + + ;
current - > total_link_count + + ;
nd - > depth + + ;
2005-06-07 00:36:03 +04:00
err = __do_follow_link ( path , nd ) ;
2009-08-09 01:32:02 +04:00
path_put ( path ) ;
2005-06-07 00:36:02 +04:00
current - > link_count - - ;
nd - > depth - - ;
2005-04-17 02:20:36 +04:00
return err ;
loop :
2008-02-15 06:34:35 +03:00
path_put_conditional ( path , nd ) ;
path_put ( & nd - > path ) ;
2005-04-17 02:20:36 +04:00
return err ;
}
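/*
 * Illustration only, not part of the kernel: a userspace sketch of the two
 * limits enforced by do_follow_link(). struct sk_walk, sk_enter_link() and
 * sk_leave_link() are hypothetical; SK_MAX_NESTED assumes the value 8 given
 * in the comment above, and SK_MAX_TOTAL is the literal 40 from the code.
 */

```c
#include <assert.h>
#include <errno.h>

enum { SK_MAX_NESTED = 8, SK_MAX_TOTAL = 40 };

/* Hypothetical per-walk counters, standing in for the per-task ones. */
struct sk_walk {
	int nested;	/* links currently being resolved inside each other */
	int total;	/* links followed so far in this lookup */
};

/* Called before following a link; refuses once either limit is reached. */
static int sk_enter_link(struct sk_walk *w)
{
	if (w->nested >= SK_MAX_NESTED || w->total >= SK_MAX_TOTAL)
		return -ELOOP;
	w->nested++;
	w->total++;
	return 0;
}

/* Called after the link has been resolved; only the nesting depth drops. */
static void sk_leave_link(struct sk_walk *w)
{
	w->nested--;
}
```

/*
 * The nesting counter bounds recursion depth, while the total counter never
 * decreases during a lookup and therefore bounds long chains of consecutive
 * links that each return before the next one starts.
 */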
2009-04-18 11:26:48 +04:00
int follow_up ( struct path * path )
2005-04-17 02:20:36 +04:00
{
struct vfsmount * parent ;
struct dentry * mountpoint ;
spin_lock ( & vfsmount_lock ) ;
2009-04-18 11:26:48 +04:00
parent = path - > mnt - > mnt_parent ;
if ( parent = = path - > mnt ) {
2005-04-17 02:20:36 +04:00
spin_unlock ( & vfsmount_lock ) ;
return 0 ;
}
mntget ( parent ) ;
2009-04-18 11:26:48 +04:00
mountpoint = dget ( path - > mnt - > mnt_mountpoint ) ;
2005-04-17 02:20:36 +04:00
spin_unlock ( & vfsmount_lock ) ;
2009-04-18 11:26:48 +04:00
dput ( path - > dentry ) ;
path - > dentry = mountpoint ;
mntput ( path - > mnt ) ;
path - > mnt = parent ;
2005-04-17 02:20:36 +04:00
return 1 ;
}
/* no need for dcache_lock, as serialization is taken care of in
 * namespace.c
 */
2005-06-07 00:36:05 +04:00
static int __follow_mount ( struct path * path )
{
int res = 0 ;
while ( d_mountpoint ( path - > dentry ) ) {
2009-04-18 22:06:57 +04:00
struct vfsmount * mounted = lookup_mnt ( path ) ;
2005-06-07 00:36:05 +04:00
if ( ! mounted )
break ;
dput ( path - > dentry ) ;
if ( res )
mntput ( path - > mnt ) ;
path - > mnt = mounted ;
path - > dentry = dget ( mounted - > mnt_root ) ;
res = 1 ;
}
return res ;
}
2009-04-18 21:59:41 +04:00
static void follow_mount ( struct path * path )
2005-04-17 02:20:36 +04:00
{
2009-04-18 21:59:41 +04:00
while ( d_mountpoint ( path - > dentry ) ) {
2009-04-18 22:06:57 +04:00
struct vfsmount * mounted = lookup_mnt ( path ) ;
2005-04-17 02:20:36 +04:00
if ( ! mounted )
break ;
2009-04-18 21:59:41 +04:00
dput ( path - > dentry ) ;
mntput ( path - > mnt ) ;
path - > mnt = mounted ;
path - > dentry = dget ( mounted - > mnt_root ) ;
2005-04-17 02:20:36 +04:00
}
}
/* no need for dcache_lock, as serialization is taken care of in
 * namespace.c
 */
2009-04-18 21:58:15 +04:00
int follow_down ( struct path * path )
2005-04-17 02:20:36 +04:00
{
struct vfsmount * mounted ;
2009-04-18 22:06:57 +04:00
mounted = lookup_mnt ( path ) ;
2005-04-17 02:20:36 +04:00
if ( mounted ) {
2009-04-18 21:58:15 +04:00
dput ( path - > dentry ) ;
mntput ( path - > mnt ) ;
path - > mnt = mounted ;
path - > dentry = dget ( mounted - > mnt_root ) ;
2005-04-17 02:20:36 +04:00
return 1 ;
}
return 0 ;
}
2006-01-15 00:21:31 +03:00
static __always_inline void follow_dotdot ( struct nameidata * nd )
2005-04-17 02:20:36 +04:00
{
2009-04-07 19:49:53 +04:00
set_root ( nd ) ;
2006-09-29 13:01:22 +04:00
2005-04-17 02:20:36 +04:00
while ( 1 ) {
2008-02-15 06:34:32 +03:00
struct dentry * old = nd - > path . dentry ;
2005-04-17 02:20:36 +04:00
2009-04-07 19:49:53 +04:00
if ( nd - > path . dentry = = nd - > root . dentry & &
nd - > path . mnt = = nd - > root . mnt ) {
2005-04-17 02:20:36 +04:00
break ;
}
2008-02-15 06:34:32 +03:00
if ( nd - > path . dentry ! = nd - > path . mnt - > mnt_root ) {
2010-01-30 23:47:29 +03:00
/* rare case of legitimate dget_parent()... */
nd - > path . dentry = dget_parent ( nd - > path . dentry ) ;
2005-04-17 02:20:36 +04:00
dput ( old ) ;
break ;
}
2010-01-30 23:47:29 +03:00
if ( ! follow_up ( & nd - > path ) )
2005-04-17 02:20:36 +04:00
break ;
}
2009-04-18 21:59:41 +04:00
follow_mount ( & nd - > path ) ;
2005-04-17 02:20:36 +04:00
}
/*
 *  It's more convoluted than I'd like it to be, but... it's still fairly
 *  small and for now I'd prefer to have fast path as straight as possible.
 *  It _is_ time-critical.
 */
static int do_lookup ( struct nameidata * nd , struct qstr * name ,
struct path * path )
{
2008-02-15 06:34:32 +03:00
struct vfsmount * mnt = nd - > path . mnt ;
2009-08-13 23:38:37 +04:00
struct dentry * dentry , * parent ;
struct inode * dir ;
2009-08-13 18:27:43 +04:00
	/*
	 * See if the low-level filesystem might want
	 * to use its own hash.
	 */
if ( nd - > path . dentry - > d_op & & nd - > path . dentry - > d_op - > d_hash ) {
int err = nd - > path . dentry - > d_op - > d_hash ( nd - > path . dentry , name ) ;
if ( err < 0 )
return err ;
}
2005-04-17 02:20:36 +04:00
2009-08-13 18:27:43 +04:00
dentry = __d_lookup ( nd - > path . dentry , name ) ;
2005-04-17 02:20:36 +04:00
if ( ! dentry )
goto need_lookup ;
if ( dentry - > d_op & & dentry - > d_op - > d_revalidate )
goto need_revalidate ;
done :
path - > mnt = mnt ;
path - > dentry = dentry ;
2005-06-07 00:36:13 +04:00
__follow_mount ( path ) ;
2005-04-17 02:20:36 +04:00
return 0 ;
need_lookup :
2009-08-13 23:38:37 +04:00
parent = nd - > path . dentry ;
dir = parent - > d_inode ;
mutex_lock ( & dir - > i_mutex ) ;
	/*
	 * First re-do the cached lookup just in case it was created
	 * while we waited for the directory mutex.
	 *
	 * FIXME! This could use version numbering or similar to
	 * avoid unnecessary cache lookups.
	 *
	 * The "dcache_lock" is purely to protect the RCU list walker
	 * from concurrent renames at this point (we mustn't get false
	 * negatives from the RCU list walk here, unlike the optimistic
	 * fast walk).
	 *
	 * So we do d_lookup() (with seqlock) instead of the lock-free
	 * __d_lookup().
	 */
dentry = d_lookup ( parent , name ) ;
if ( ! dentry ) {
struct dentry * new ;
/* Don't create child dentry for a dead directory. */
dentry = ERR_PTR ( - ENOENT ) ;
if ( IS_DEADDIR ( dir ) )
goto out_unlock ;
new = d_alloc ( parent , name ) ;
dentry = ERR_PTR ( - ENOMEM ) ;
if ( new ) {
dentry = dir - > i_op - > lookup ( dir , new , nd ) ;
if ( dentry )
dput ( new ) ;
else
dentry = new ;
}
out_unlock :
mutex_unlock ( & dir - > i_mutex ) ;
if ( IS_ERR ( dentry ) )
goto fail ;
goto done ;
}
	/*
	 * Uhhuh! Nasty case: the cache was re-populated while
	 * we waited on the mutex. Need to revalidate.
	 */
mutex_unlock ( & dir - > i_mutex ) ;
if ( dentry - > d_op & & dentry - > d_op - > d_revalidate ) {
dentry = do_revalidate ( dentry , nd ) ;
if ( ! dentry )
dentry = ERR_PTR ( - ENOENT ) ;
}
2005-04-17 02:20:36 +04:00
if ( IS_ERR ( dentry ) )
goto fail ;
goto done ;
need_revalidate :
2006-09-27 12:50:44 +04:00
dentry = do_revalidate ( dentry , nd ) ;
if ( ! dentry )
goto need_lookup ;
if ( IS_ERR ( dentry ) )
goto fail ;
goto done ;
2005-04-17 02:20:36 +04:00
fail :
return PTR_ERR ( dentry ) ;
}
2010-02-16 21:09:36 +03:00
/*
* This is a temporary kludge to deal with " automount " symlinks ; proper
* solution is to trigger them on follow_mount ( ) , so that do_lookup ( )
* would DTRT . To be killed before 2.6 .34 - final .
*/
static inline int follow_on_final ( struct inode * inode , unsigned lookup_flags )
{
return inode & & unlikely ( inode - > i_op - > follow_link ) & &
( ( lookup_flags & LOOKUP_FOLLOW ) | | S_ISDIR ( inode - > i_mode ) ) ;
}
2005-04-17 02:20:36 +04:00
/*
* Name resolution .
2005-04-29 19:00:17 +04:00
* This is the basic name resolution function , turning a pathname into
* the final dentry . We expect ' base ' to be positive and a directory .
2005-04-17 02:20:36 +04:00
*
2005-04-29 19:00:17 +04:00
* Returns 0 and nd will have valid dentry and mnt on success .
* Returns error and drops reference to input namei data on failure .
2005-04-17 02:20:36 +04:00
*/
2009-08-09 01:41:57 +04:00
static int link_path_walk ( const char * name , struct nameidata * nd )
2005-04-17 02:20:36 +04:00
{
struct path next ;
struct inode * inode ;
int err ;
unsigned int lookup_flags = nd - > flags ;
while ( * name = = ' / ' )
name + + ;
if ( ! * name )
goto return_reval ;
2008-02-15 06:34:32 +03:00
inode = nd - > path . dentry - > d_inode ;
2005-04-17 02:20:36 +04:00
if ( nd - > depth )
2006-02-05 10:28:01 +03:00
lookup_flags = LOOKUP_FOLLOW | ( nd - > flags & LOOKUP_CONTINUE ) ;
2005-04-17 02:20:36 +04:00
/* At this point we know we have a real path component. */
for ( ; ; ) {
unsigned long hash ;
struct qstr this ;
unsigned int c ;
2005-10-19 01:20:18 +04:00
nd - > flags | = LOOKUP_CONTINUE ;
2009-12-16 09:01:38 +03:00
err = exec_permission ( inode ) ;
2005-04-17 02:20:36 +04:00
if ( err )
break ;
this . name = name ;
c = * ( const unsigned char * ) name ;
hash = init_name_hash ( ) ;
do {
name + + ;
hash = partial_name_hash ( c , hash ) ;
c = * ( const unsigned char * ) name ;
} while ( c & & ( c ! = ' / ' ) ) ;
this . len = name - ( const char * ) this . name ;
this . hash = end_name_hash ( hash ) ;
/* remove trailing slashes? */
if ( ! c )
goto last_component ;
while ( * + + name = = ' / ' ) ;
if ( ! * name )
goto last_with_slashes ;
/*
* " . " and " .. " are special - " .. " especially so because it has
* to be able to know about the current root directory and
* parent relationships .
*/
if ( this . name [ 0 ] = = ' . ' ) switch ( this . len ) {
default :
break ;
case 2 :
if ( this . name [ 1 ] ! = ' . ' )
break ;
2005-06-07 00:36:13 +04:00
follow_dotdot ( nd ) ;
2008-02-15 06:34:32 +03:00
inode = nd - > path . dentry - > d_inode ;
2005-04-17 02:20:36 +04:00
/* fallthrough */
case 1 :
continue ;
}
/* This does the actual lookups.. */
err = do_lookup ( nd , & this , & next ) ;
if ( err )
break ;
err = - ENOENT ;
inode = next . dentry - > d_inode ;
if ( ! inode )
goto out_dput ;
if ( inode - > i_op - > follow_link ) {
2005-06-07 00:35:58 +04:00
err = do_follow_link ( & next , nd ) ;
2005-04-17 02:20:36 +04:00
if ( err )
goto return_err ;
err = - ENOENT ;
2008-02-15 06:34:32 +03:00
inode = nd - > path . dentry - > d_inode ;
2005-04-17 02:20:36 +04:00
if ( ! inode )
break ;
2005-09-07 02:18:21 +04:00
} else
path_to_nameidata ( & next , nd ) ;
2005-04-17 02:20:36 +04:00
err = - ENOTDIR ;
if ( ! inode - > i_op - > lookup )
break ;
continue ;
/* here ends the main loop */
last_with_slashes :
lookup_flags | = LOOKUP_FOLLOW | LOOKUP_DIRECTORY ;
last_component :
2006-02-05 10:28:01 +03:00
/* Clear LOOKUP_CONTINUE iff it was previously unset */
nd - > flags & = lookup_flags | ~ LOOKUP_CONTINUE ;
2005-04-17 02:20:36 +04:00
if ( lookup_flags & LOOKUP_PARENT )
goto lookup_parent ;
if ( this . name [ 0 ] = = ' . ' ) switch ( this . len ) {
default :
break ;
case 2 :
if ( this . name [ 1 ] ! = ' . ' )
break ;
2005-06-07 00:36:13 +04:00
follow_dotdot ( nd ) ;
2008-02-15 06:34:32 +03:00
inode = nd - > path . dentry - > d_inode ;
2005-04-17 02:20:36 +04:00
/* fallthrough */
case 1 :
goto return_reval ;
}
err = do_lookup ( nd , & this , & next ) ;
if ( err )
break ;
inode = next . dentry - > d_inode ;
2010-02-16 21:09:36 +03:00
if ( follow_on_final ( inode , lookup_flags ) ) {
2005-06-07 00:35:58 +04:00
err = do_follow_link ( & next , nd ) ;
2005-04-17 02:20:36 +04:00
if ( err )
goto return_err ;
2008-02-15 06:34:32 +03:00
inode = nd - > path . dentry - > d_inode ;
2005-09-07 02:18:21 +04:00
} else
path_to_nameidata ( & next , nd ) ;
2005-04-17 02:20:36 +04:00
err = - ENOENT ;
if ( ! inode )
break ;
if ( lookup_flags & LOOKUP_DIRECTORY ) {
err = - ENOTDIR ;
2008-12-04 18:06:33 +03:00
if ( ! inode - > i_op - > lookup )
2005-04-17 02:20:36 +04:00
break ;
}
goto return_base ;
lookup_parent :
nd - > last = this ;
nd - > last_type = LAST_NORM ;
if ( this . name [ 0 ] ! = ' . ' )
goto return_base ;
if ( this . len = = 1 )
nd - > last_type = LAST_DOT ;
else if ( this . len = = 2 & & this . name [ 1 ] = = ' . ' )
nd - > last_type = LAST_DOTDOT ;
else
goto return_base ;
return_reval :
/*
* We bypassed the ordinary revalidation routines .
* We may need to check the cached dentry for staleness .
*/
2008-02-15 06:34:32 +03:00
if ( nd - > path . dentry & & nd - > path . dentry - > d_sb & &
( nd - > path . dentry - > d_sb - > s_type - > fs_flags & FS_REVAL_DOT ) ) {
2005-04-17 02:20:36 +04:00
err = - ESTALE ;
/* Note: we do not d_invalidate() */
2008-02-15 06:34:32 +03:00
if ( ! nd - > path . dentry - > d_op - > d_revalidate (
nd - > path . dentry , nd ) )
2005-04-17 02:20:36 +04:00
break ;
}
return_base :
return 0 ;
out_dput :
2008-02-15 06:34:35 +03:00
path_put_conditional ( & next , nd ) ;
2005-04-17 02:20:36 +04:00
break ;
}
2008-02-15 06:34:35 +03:00
path_put ( & nd - > path ) ;
2005-04-17 02:20:36 +04:00
return_err :
return err ;
}
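The component scan in the loop above (skip repeated slashes, measure one name, classify "." and "..", note trailing slashes) can be sketched in userspace as follows. This is only an illustration under stated assumptions: the sketch_* names are hypothetical, and hashing, permission checks, symlinks and mount crossing are all left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for LAST_NORM / LAST_DOT / LAST_DOTDOT. */
enum sketch_type { SKETCH_NORM, SKETCH_DOT, SKETCH_DOTDOT };

struct sketch_component {
	const char *name;	/* points into the path, not NUL-terminated */
	size_t len;		/* length of this component in bytes */
	enum sketch_type type;
	int last;		/* nothing but slashes follows this component */
	int trailing_slash;	/* last component, and slashes followed it */
};

/*
 * Scan one component starting at *pos and advance *pos past it and past
 * any slashes that follow. Returns 1 if a component was found, 0 if only
 * slashes (or nothing) remained.
 */
static int sketch_next_component(const char **pos, struct sketch_component *out)
{
	const char *p;

	assert(pos && *pos && out);
	p = *pos;
	while (*p == '/')
		p++;
	if (!*p) {
		*pos = p;
		return 0;
	}
	out->name = p;
	while (*p && *p != '/')
		p++;
	out->len = (size_t)(p - out->name);

	out->type = SKETCH_NORM;
	if (out->name[0] == '.') {
		if (out->len == 1)
			out->type = SKETCH_DOT;
		else if (out->len == 2 && out->name[1] == '.')
			out->type = SKETCH_DOTDOT;
	}

	out->trailing_slash = 0;
	if (*p == '/') {
		while (*p == '/')
			p++;
		if (!*p)
			out->trailing_slash = 1;
	}
	out->last = (*p == '\0');
	*pos = p;
	return 1;
}
```

A caller would loop on sketch_next_component() until it returns 0, treating a component with trailing_slash set the way the code above treats last_with_slashes (the result must be a directory).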
2008-02-08 15:19:52 +03:00
static int path_walk ( const char * name , struct nameidata * nd )
2005-04-17 02:20:36 +04:00
{
2009-08-09 01:41:57 +04:00
struct path save = nd - > path ;
int result ;
2005-04-17 02:20:36 +04:00
current - > total_link_count = 0 ;
2009-08-09 01:41:57 +04:00
/* make sure the stuff we saved doesn't go away */
path_get ( & save ) ;
result = link_path_walk ( name , nd ) ;
if ( result = = - ESTALE ) {
/* nd->path had been dropped */
current - > total_link_count = 0 ;
nd - > path = save ;
path_get ( & nd - > path ) ;
nd - > flags | = LOOKUP_REVAL ;
result = link_path_walk ( name , nd ) ;
}
path_put ( & save ) ;
return result ;
2005-04-17 02:20:36 +04:00
}
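The retry logic of path_walk() above (save the starting point, and on -ESTALE restore it and walk once more with revalidation forced) can be sketched in userspace like this. It is a simplified illustration, not the kernel code: the sketch_* names and SKETCH_REVAL are hypothetical, and the reference counting done by path_get()/path_put() is omitted.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_REVAL 0x1	/* hypothetical stand-in for LOOKUP_REVAL */

struct sketch_state {
	int pos;		/* stand-in for the current position (nd->path) */
	unsigned int flags;
};

typedef int (*sketch_walk_fn)(struct sketch_state *st);

/* Run the walk; on -ESTALE, restart once from the saved state with REVAL set. */
static int sketch_walk_with_retry(struct sketch_state *st, sketch_walk_fn walk)
{
	struct sketch_state save;
	int result;

	assert(st && walk);
	save = *st;		/* remember where the walk started */
	result = walk(st);
	if (result == -ESTALE) {
		*st = save;	/* the failed walk may have moved st */
		st->flags |= SKETCH_REVAL;
		result = walk(st);
	}
	return result;
}

/* Example walker: advances, and reports stale data unless REVAL is set. */
static int sketch_flaky_walk(struct sketch_state *st)
{
	st->pos++;
	if (!(st->flags & SKETCH_REVAL))
		return -ESTALE;
	return 0;
}
```

The point of saving the state before the first attempt is that a failed walk has already consumed its starting position, so the second attempt cannot simply be called again on whatever the first one left behind.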
2009-04-07 19:44:16 +04:00
static int path_init ( int dfd , const char * name , unsigned int flags , struct nameidata * nd )
2005-04-17 02:20:36 +04:00
{
2005-04-29 19:00:17 +04:00
int retval = 0 ;
2006-02-05 10:28:02 +03:00
int fput_needed ;
struct file * file ;
2005-04-17 02:20:36 +04:00
nd - > last_type = LAST_ROOT ; /* if there are only slashes... */
nd - > flags = flags ;
nd - > depth = 0 ;
2009-04-07 19:49:53 +04:00
nd - > root . mnt = NULL ;
2005-04-17 02:20:36 +04:00
if ( * name = = ' / ' ) {
2009-04-07 19:49:53 +04:00
set_root ( nd ) ;
nd - > path = nd - > root ;
path_get ( & nd - > root ) ;
2006-01-19 04:43:53 +03:00
} else if ( dfd = = AT_FDCWD ) {
2009-04-07 19:49:53 +04:00
struct fs_struct * fs = current - > fs ;
2006-09-29 13:01:22 +04:00
read_lock ( & fs - > lock ) ;
2008-02-15 06:34:38 +03:00
nd - > path = fs - > pwd ;
path_get ( & fs - > pwd ) ;
2006-09-29 13:01:22 +04:00
read_unlock ( & fs - > lock ) ;
2006-01-19 04:43:53 +03:00
} else {
struct dentry * dentry ;
file = fget_light ( dfd , & fput_needed ) ;
2006-02-05 10:28:02 +03:00
retval = - EBADF ;
if ( ! file )
2006-06-04 13:51:37 +04:00
goto out_fail ;
2006-01-19 04:43:53 +03:00
2006-12-08 13:36:35 +03:00
dentry = file - > f_path . dentry ;
2006-01-19 04:43:53 +03:00
2006-02-05 10:28:02 +03:00
retval = - ENOTDIR ;
if ( ! S_ISDIR ( dentry - > d_inode - > i_mode ) )
2006-06-04 13:51:37 +04:00
goto fput_fail ;
2006-01-19 04:43:53 +03:00
retval = file_permission ( file , MAY_EXEC ) ;
2006-02-05 10:28:02 +03:00
if ( retval )
2006-06-04 13:51:37 +04:00
goto fput_fail ;
2006-01-19 04:43:53 +03:00
2008-02-15 06:34:38 +03:00
nd - > path = file - > f_path ;
path_get ( & file - > f_path ) ;
2006-01-19 04:43:53 +03:00
fput_light ( file , fput_needed ) ;
2005-04-17 02:20:36 +04:00
}
2009-04-07 19:44:16 +04:00
return 0 ;
2007-05-09 13:33:41 +04:00
2009-04-07 19:44:16 +04:00
fput_fail :
fput_light ( file , fput_needed ) ;
out_fail :
return retval ;
}
/* Returns 0 and leaves nd valid on success; returns an error otherwise. */
static int do_path_lookup ( int dfd , const char * name ,
unsigned int flags , struct nameidata * nd )
{
int retval = path_init ( dfd , name , flags , nd ) ;
if ( ! retval )
retval = path_walk ( name , nd ) ;
2008-02-15 06:34:32 +03:00
if ( unlikely ( ! retval & & ! audit_dummy_context ( ) & & nd - > path . dentry & &
nd - > path . dentry - > d_inode ) )
audit_inode ( name , nd - > path . dentry ) ;
2009-04-07 19:49:53 +04:00
if ( nd - > root . mnt ) {
path_put ( & nd - > root ) ;
nd - > root . mnt = NULL ;
}
2006-02-05 10:28:02 +03:00
return retval ;
2005-04-17 02:20:36 +04:00
}
2008-02-08 15:19:52 +03:00
int path_lookup ( const char * name , unsigned int flags ,
2006-01-19 04:43:53 +03:00
struct nameidata * nd )
{
return do_path_lookup ( AT_FDCWD , name , flags , nd ) ;
}
2008-08-02 08:49:18 +04:00
int kern_path ( const char * name , unsigned int flags , struct path * path )
{
struct nameidata nd ;
int res = do_path_lookup ( AT_FDCWD , name , flags , & nd ) ;
if ( ! res )
* path = nd . path ;
return res ;
}
fs: introduce vfs_path_lookup
Stackable file systems, among others, frequently need to lookup paths or
path components starting from an arbitrary point in the namespace
(identified by a dentry and a vfsmount). Currently, such file systems use
lookup_one_len, which is frowned upon [1] as it does not pass the lookup
intent along; not passing a lookup intent, for example, can trigger BUG_ON's
when stacking on top of NFSv4.
The first patch introduces a new lookup function to allow lookup starting
from an arbitrary point in the namespace. This approach has been suggested
by Christoph Hellwig [2].
The second patch changes sunrpc to use vfs_path_lookup.
The third patch changes nfsctl.c to use vfs_path_lookup.
The fourth patch marks link_path_walk static.
The fifth and last patch unexports path_walk because it is no longer
necessary to call it directly, and using the new vfs_path_lookup is
cleaner.
For example, the following snippet of code, looks up "some/path/component"
in a directory pointed to by parent_{dentry,vfsmnt}:
err = vfs_path_lookup(parent_dentry, parent_vfsmnt,
"some/path/component", 0, &nd);
if (!err) {
/* exists */
...
/* once done, release the references */
path_release(&nd);
} else if (err == -ENOENT) {
/* doesn't exist */
} else {
/* other error */
}
VFS functions such as lookup_create can be used on the nameidata structure
to pass the create intent to the file system.
Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@suse.de>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 12:48:18 +04:00
/**
* vfs_path_lookup - lookup a file path relative to a dentry - vfsmount pair
* @ dentry : pointer to dentry of the base directory
* @ mnt : pointer to vfs mount of the base directory
* @ name : pointer to file name
* @ flags : lookup flags
* @ nd : pointer to nameidata
*/
int vfs_path_lookup ( struct dentry * dentry , struct vfsmount * mnt ,
const char * name , unsigned int flags ,
struct nameidata * nd )
{
int retval ;
/* same as do_path_lookup */
nd - > last_type = LAST_ROOT ;
nd - > flags = flags ;
nd - > depth = 0 ;
2008-06-10 03:40:35 +04:00
nd - > path . dentry = dentry ;
nd - > path . mnt = mnt ;
path_get ( & nd - > path ) ;
2009-04-07 19:53:49 +04:00
nd - > root = nd - > path ;
path_get ( & nd - > root ) ;
2007-07-19 12:48:18 +04:00
retval = path_walk ( name , nd ) ;
2008-02-15 06:34:32 +03:00
if ( unlikely ( ! retval & & ! audit_dummy_context ( ) & & nd - > path . dentry & &
nd - > path . dentry - > d_inode ) )
audit_inode ( name , nd - > path . dentry ) ;
2007-07-19 12:48:18 +04:00
2009-04-07 19:53:49 +04:00
path_put ( & nd - > root ) ;
nd - > root . mnt = NULL ;
2007-07-19 12:48:18 +04:00
2009-04-07 19:49:53 +04:00
return retval ;
2007-07-19 12:48:18 +04:00
}
2007-10-17 10:25:38 +04:00
static struct dentry * __lookup_hash ( struct qstr * name ,
struct dentry * base , struct nameidata * nd )
2005-04-17 02:20:36 +04:00
{
2007-04-26 11:12:05 +04:00
struct dentry * dentry ;
2005-04-17 02:20:36 +04:00
struct inode * inode ;
int err ;
inode = base - > d_inode ;
/*
* See if the low - level filesystem might want
* to use its own hash . .
*/
if ( base - > d_op & & base - > d_op - > d_hash ) {
err = base - > d_op - > d_hash ( base , name ) ;
dentry = ERR_PTR ( err ) ;
if ( err < 0 )
goto out ;
}
2009-08-13 23:38:37 +04:00
dentry = __d_lookup ( base , name ) ;
/* lockless __d_lookup may fail due to concurrent d_move()
* in some unrelated directory , so try with d_lookup
*/
if ( ! dentry )
dentry = d_lookup ( base , name ) ;
if ( dentry & & dentry - > d_op & & dentry - > d_op - > d_revalidate )
dentry = do_revalidate ( dentry , nd ) ;
2005-04-17 02:20:36 +04:00
if ( ! dentry ) {
2008-07-02 23:30:15 +04:00
struct dentry * new ;
/* Don't create child dentry for a dead directory. */
dentry = ERR_PTR ( - ENOENT ) ;
if ( IS_DEADDIR ( inode ) )
goto out ;
new = d_alloc ( base , name ) ;
2005-04-17 02:20:36 +04:00
dentry = ERR_PTR ( - ENOMEM ) ;
if ( ! new )
goto out ;
dentry = inode - > i_op - > lookup ( inode , new , nd ) ;
if ( ! dentry )
dentry = new ;
else
dput ( new ) ;
}
out :
return dentry ;
}
2007-04-26 11:12:05 +04:00
/*
* Restricted form of lookup . Doesn ' t follow links , single - component only ,
* needs parent already locked . Doesn ' t follow mounts .
* SMP - safe .
*/
2007-10-17 10:25:38 +04:00
static struct dentry * lookup_hash ( struct nameidata * nd )
2007-04-26 11:12:05 +04:00
{
int err ;
2009-12-16 09:01:38 +03:00
err = exec_permission ( nd - > path . dentry - > d_inode ) ;
2007-04-26 11:12:05 +04:00
if ( err )
2007-10-17 10:25:38 +04:00
return ERR_PTR ( err ) ;
2008-02-15 06:34:32 +03:00
return __lookup_hash ( & nd - > last , nd - > path . dentry , nd ) ;
2005-04-17 02:20:36 +04:00
}
2007-10-17 10:25:38 +04:00
static int __lookup_one_len ( const char * name , struct qstr * this ,
struct dentry * base , int len )
2005-04-17 02:20:36 +04:00
{
unsigned long hash ;
unsigned int c ;
2007-04-26 11:12:05 +04:00
this - > name = name ;
this - > len = len ;
2005-04-17 02:20:36 +04:00
if ( ! len )
2007-04-26 11:12:05 +04:00
return - EACCES ;
2005-04-17 02:20:36 +04:00
hash = init_name_hash ( ) ;
while ( len - - ) {
c = * ( const unsigned char * ) name + + ;
if ( c = = ' / ' | | c = = ' \0 ' )
2007-04-26 11:12:05 +04:00
return - EACCES ;
2005-04-17 02:20:36 +04:00
hash = partial_name_hash ( c , hash ) ;
}
2007-04-26 11:12:05 +04:00
this - > hash = end_name_hash ( hash ) ;
return 0 ;
}
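The validation performed by __lookup_one_len() above can be sketched in userspace as below: the name must be a single component of exactly len bytes, so an empty name, an embedded '/' or a NUL inside the first len bytes is rejected with -EACCES. The function name is hypothetical and the hashing is left out; unlike the original, this sketch also rejects a negative len.

```c
#include <assert.h>
#include <errno.h>

/* Return 0 if the first len bytes of name form one valid path component. */
static int sketch_check_component(const char *name, int len)
{
	assert(name);
	if (len <= 0)
		return -EACCES;
	while (len--) {
		unsigned char c = (unsigned char)*name++;

		/* A slash would make this a multi-component path; a NUL
		 * means the string is shorter than the caller claimed. */
		if (c == '/' || c == '\0')
			return -EACCES;
	}
	return 0;
}
```

Note that only the first len bytes are examined, so a longer string such as "foo/bar" passes with len 3 because the component "foo" is all the function is asked to look at.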
2005-04-17 02:20:36 +04:00
2007-10-17 10:25:38 +04:00
/**
2008-03-20 03:01:00 +03:00
* lookup_one_len - filesystem helper to lookup single pathname component
2007-10-17 10:25:38 +04:00
* @ name : pathname component to lookup
* @ base : base directory to lookup from
* @ len : number of bytes of @ name that make up the component
*
2008-03-20 03:01:00 +03:00
* Note that this routine is purely a helper for filesystem usage and should
* not be called by generic code . Also note that by using this function the
2007-10-17 10:25:38 +04:00
* nameidata argument is passed as NULL to the filesystem methods and a filesystem
* using this helper needs to be prepared for that .
*/
2007-04-26 11:12:05 +04:00
struct dentry * lookup_one_len ( const char * name , struct dentry * base , int len )
{
int err ;
struct qstr this ;
2009-04-21 02:18:37 +04:00
WARN_ON_ONCE ( ! mutex_is_locked ( & base - > d_inode - > i_mutex ) ) ;
2007-04-26 11:12:05 +04:00
err = __lookup_one_len ( name , & this , base , len ) ;
2007-10-17 10:25:38 +04:00
if ( err )
return ERR_PTR ( err ) ;
2009-12-16 09:01:38 +03:00
err = exec_permission ( base - > d_inode ) ;
2007-04-26 11:12:05 +04:00
if ( err )
return ERR_PTR ( err ) ;
2005-11-09 08:35:06 +03:00
return __lookup_hash ( & this , base , NULL ) ;
2007-04-26 11:12:05 +04:00
}
2008-07-22 17:59:21 +04:00
int user_path_at ( int dfd , const char __user * name , unsigned flags ,
struct path * path )
2005-04-17 02:20:36 +04:00
{
2008-07-22 17:59:21 +04:00
struct nameidata nd ;
2005-04-17 02:20:36 +04:00
char * tmp = getname ( name ) ;
int err = PTR_ERR ( tmp ) ;
if ( ! IS_ERR ( tmp ) ) {
2008-07-22 17:59:21 +04:00
BUG_ON ( flags & LOOKUP_PARENT ) ;
err = do_path_lookup ( dfd , tmp , flags , & nd ) ;
2005-04-17 02:20:36 +04:00
putname ( tmp ) ;
2008-07-22 17:59:21 +04:00
if ( ! err )
* path = nd . path ;
2005-04-17 02:20:36 +04:00
}
return err ;
}
2008-07-21 17:32:51 +04:00
static int user_path_parent ( int dfd , const char __user * path ,
struct nameidata * nd , char * * name )
{
char * s = getname ( path ) ;
int error ;
if ( IS_ERR ( s ) )
return PTR_ERR ( s ) ;
error = do_path_lookup ( dfd , s , LOOKUP_PARENT , nd ) ;
if ( error )
putname ( s ) ;
else
* name = s ;
return error ;
}
2005-04-17 02:20:36 +04:00
/*
* It ' s inline , so penalty for filesystems that don ' t use sticky bit is
* minimal .
*/
static inline int check_sticky ( struct inode * dir , struct inode * inode )
{
2008-11-14 02:39:05 +03:00
uid_t fsuid = current_fsuid ( ) ;
2005-04-17 02:20:36 +04:00
if ( ! ( dir - > i_mode & S_ISVTX ) )
return 0 ;
2008-11-14 02:39:05 +03:00
if ( inode - > i_uid = = fsuid )
2005-04-17 02:20:36 +04:00
return 0 ;
2008-11-14 02:39:05 +03:00
if ( dir - > i_uid = = fsuid )
2005-04-17 02:20:36 +04:00
return 0 ;
return ! capable ( CAP_FOWNER ) ;
}
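The sticky-bit rule in check_sticky() above can be exercised in userspace with a small sketch. The struct, the SKETCH_ISVTX constant (assumed to carry the conventional octal value 01000) and the has_cap_fowner parameter are hypothetical stand-ins for the inode fields, S_ISVTX and capable(CAP_FOWNER). A non-zero return means the removal is refused.

```c
#include <assert.h>

#define SKETCH_ISVTX 01000u	/* assumed: conventional sticky-bit value */

struct sketch_inode {
	unsigned int mode;
	unsigned int uid;
};

/*
 * Returns 0 if fsuid may remove victim from dir as far as the sticky bit
 * is concerned, 1 if the sticky bit forbids it.
 */
static int sketch_check_sticky(const struct sketch_inode *dir,
			       const struct sketch_inode *victim,
			       unsigned int fsuid, int has_cap_fowner)
{
	assert(dir && victim);
	if (!(dir->mode & SKETCH_ISVTX))
		return 0;	/* no sticky bit: nothing to enforce */
	if (victim->uid == fsuid)
		return 0;	/* caller owns the victim */
	if (dir->uid == fsuid)
		return 0;	/* caller owns the directory */
	return !has_cap_fowner;	/* otherwise only CAP_FOWNER helps */
}
```

This is the behaviour users see in a world-writable directory such as /tmp: anyone can create files there, but only the file's owner, the directory's owner or a suitably privileged process can remove them.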
/*
* Check whether we can remove a link victim from directory dir , check
* whether the type of victim is right .
* 1. We can ' t do it if dir is read - only ( done in permission ( ) )
* 2. We should have write and exec permissions on dir
* 3. We can ' t remove anything from append - only dir
* 4. We can ' t do anything with immutable dir ( done in permission ( ) )
* 5. If the sticky bit on dir is set we should either
* a . be owner of dir , or
* b . be owner of victim , or
* c . have CAP_FOWNER capability
* 6. If the victim is append - only or immutable we can ' t do anything with
* links pointing to it .
* 7. If we were asked to remove a directory and victim isn ' t one - ENOTDIR .
* 8. If we were asked to remove a non - directory and victim is a directory - EISDIR .
* 9. We can ' t remove a root or mountpoint .
* 10. We don ' t allow removal of NFS sillyrenamed files ; it ' s handled by
* nfs_async_unlink ( ) .
*/
2006-01-15 00:20:43 +03:00
static int may_delete ( struct inode * dir , struct dentry * victim , int isdir )
2005-04-17 02:20:36 +04:00
{
int error ;
if ( ! victim - > d_inode )
return - ENOENT ;
BUG_ON ( victim - > d_parent - > d_inode ! = dir ) ;
2009-12-25 13:07:33 +03:00
audit_inode_child ( victim , dir ) ;
2005-04-17 02:20:36 +04:00
2008-07-22 08:07:17 +04:00
error = inode_permission ( dir , MAY_WRITE | MAY_EXEC ) ;
2005-04-17 02:20:36 +04:00
if ( error )
return error ;
if ( IS_APPEND ( dir ) )
return - EPERM ;
if ( check_sticky ( dir , victim - > d_inode ) | | IS_APPEND ( victim - > d_inode ) | |
2008-11-20 02:36:38 +03:00
IS_IMMUTABLE ( victim - > d_inode ) | | IS_SWAPFILE ( victim - > d_inode ) )
2005-04-17 02:20:36 +04:00
return - EPERM ;
if ( isdir ) {
if ( ! S_ISDIR ( victim - > d_inode - > i_mode ) )
return - ENOTDIR ;
if ( IS_ROOT ( victim ) )
return - EBUSY ;
} else if ( S_ISDIR ( victim - > d_inode - > i_mode ) )
return - EISDIR ;
if ( IS_DEADDIR ( dir ) )
return - ENOENT ;
if ( victim - > d_flags & DCACHE_NFSFS_RENAMED )
return - EBUSY ;
return 0 ;
}
/* Check whether we can create an object with dentry child in directory
* dir .
* 1. We can ' t do it if child already exists ( open has special treatment for
* this case , but since we are inlined it ' s OK )
* 2. We can ' t do it if dir is read - only ( done in permission ( ) )
* 3. We should have write and exec permissions on dir
* 4. We can ' t do it if dir is immutable ( done in permission ( ) )
*/
2008-07-30 17:08:48 +04:00
static inline int may_create ( struct inode * dir , struct dentry * child )
2005-04-17 02:20:36 +04:00
{
if ( child - > d_inode )
return - EEXIST ;
if ( IS_DEADDIR ( dir ) )
return - ENOENT ;
2008-07-22 08:07:17 +04:00
return inode_permission ( dir , MAY_WRITE | MAY_EXEC ) ;
2005-04-17 02:20:36 +04:00
}
/*
* O_DIRECTORY translates into forcing a directory lookup .
*/
static inline int lookup_flags ( unsigned int f )
{
unsigned long retval = LOOKUP_FOLLOW ;
if ( f & O_NOFOLLOW )
retval & = ~ LOOKUP_FOLLOW ;
if ( f & O_DIRECTORY )
retval | = LOOKUP_DIRECTORY ;
return retval ;
}
/*
* p1 and p2 should be directories on the same fs .
*/
struct dentry * lock_rename ( struct dentry * p1 , struct dentry * p2 )
{
struct dentry * p ;
if ( p1 = = p2 ) {
2006-07-03 11:25:05 +04:00
mutex_lock_nested ( & p1 - > d_inode - > i_mutex , I_MUTEX_PARENT ) ;
2005-04-17 02:20:36 +04:00
return NULL ;
}
2006-03-23 14:00:33 +03:00
mutex_lock ( & p1 - > d_inode - > i_sb - > s_vfs_rename_mutex ) ;
2005-04-17 02:20:36 +04:00
2008-10-16 02:50:28 +04:00
p = d_ancestor ( p2 , p1 ) ;
if ( p ) {
mutex_lock_nested ( & p2 - > d_inode - > i_mutex , I_MUTEX_PARENT ) ;
mutex_lock_nested ( & p1 - > d_inode - > i_mutex , I_MUTEX_CHILD ) ;
return p ;
2005-04-17 02:20:36 +04:00
}
2008-10-16 02:50:28 +04:00
p = d_ancestor ( p1 , p2 ) ;
if ( p ) {
mutex_lock_nested ( & p1 - > d_inode - > i_mutex , I_MUTEX_PARENT ) ;
mutex_lock_nested ( & p2 - > d_inode - > i_mutex , I_MUTEX_CHILD ) ;
return p ;
2005-04-17 02:20:36 +04:00
}
2006-07-03 11:25:05 +04:00
mutex_lock_nested ( & p1 - > d_inode - > i_mutex , I_MUTEX_PARENT ) ;
mutex_lock_nested ( & p2 - > d_inode - > i_mutex , I_MUTEX_CHILD ) ;
2005-04-17 02:20:36 +04:00
return NULL ;
}
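The ordering decision in lock_rename() above (if one directory is an ancestor of the other, lock the ancestor first and return the child of the ancestor that lies on the path to the descendant) can be sketched without any real locks. The sketch_* types and names are hypothetical, and a root is modelled as a node that is its own parent.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_node {
	struct sketch_node *parent;	/* a root points to itself */
};

/*
 * If p1 is a proper ancestor of p2, return the child of p1 that is p2 or
 * an ancestor of p2; otherwise return NULL.
 */
static struct sketch_node *sketch_ancestor(struct sketch_node *p1,
					   struct sketch_node *p2)
{
	struct sketch_node *p;

	for (p = p2; p->parent != p; p = p->parent) {
		if (p->parent == p1)
			return p;
	}
	return NULL;
}

struct sketch_order {
	struct sketch_node *first;	/* lock this one first */
	struct sketch_node *second;	/* then this one, NULL if same dir */
	struct sketch_node *trap;	/* the value lock_rename() would return */
};

static struct sketch_order sketch_lock_order(struct sketch_node *p1,
					     struct sketch_node *p2)
{
	struct sketch_order o;

	assert(p1 && p2);
	o.first = p1;
	o.second = p2;
	o.trap = NULL;
	if (p1 == p2) {
		o.second = NULL;	/* a single lock is enough */
		return o;
	}
	o.trap = sketch_ancestor(p2, p1);
	if (o.trap) {			/* p2 is above p1: lock p2 first */
		o.first = p2;
		o.second = p1;
		return o;
	}
	o.trap = sketch_ancestor(p1, p2);	/* NULL when unrelated */
	return o;
}
```

Taking the ancestor first gives every pair of related directories a single agreed order, which is what keeps two concurrent renames from locking the same pair in opposite orders.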
void unlock_rename ( struct dentry * p1 , struct dentry * p2 )
{
2006-01-10 02:59:24 +03:00
mutex_unlock ( & p1 - > d_inode - > i_mutex ) ;
2005-04-17 02:20:36 +04:00
if ( p1 ! = p2 ) {
2006-01-10 02:59:24 +03:00
mutex_unlock ( & p2 - > d_inode - > i_mutex ) ;
2006-03-23 14:00:33 +03:00
mutex_unlock ( & p1 - > d_inode - > i_sb - > s_vfs_rename_mutex ) ;
2005-04-17 02:20:36 +04:00
}
}
int vfs_create ( struct inode * dir , struct dentry * dentry , int mode ,
struct nameidata * nd )
{
2008-07-30 17:08:48 +04:00
int error = may_create ( dir , dentry ) ;
2005-04-17 02:20:36 +04:00
if ( error )
return error ;
2008-12-04 18:06:33 +03:00
if ( ! dir - > i_op - > create )
2005-04-17 02:20:36 +04:00
return - EACCES ; /* shouldn't it be ENOSYS? */
mode & = S_IALLUGO ;
mode | = S_IFREG ;
error = security_inode_create ( dir , dentry , mode ) ;
if ( error )
return error ;
2009-01-26 18:45:12 +03:00
vfs_dq_init ( dir ) ;
2005-04-17 02:20:36 +04:00
error = dir - > i_op - > create ( dir , dentry , mode , nd ) ;
2005-09-10 00:01:44 +04:00
if ( ! error )
2005-11-03 18:57:06 +03:00
fsnotify_create ( dir , dentry ) ;
2005-04-17 02:20:36 +04:00
return error ;
}
2008-10-24 11:58:10 +04:00
int may_open ( struct path * path , int acc_mode , int flag )
2005-04-17 02:20:36 +04:00
{
2008-10-24 11:58:10 +04:00
struct dentry * dentry = path - > dentry ;
2005-04-17 02:20:36 +04:00
struct inode * inode = dentry - > d_inode ;
int error ;
if ( ! inode )
return - ENOENT ;
2009-01-05 21:27:23 +03:00
switch ( inode - > i_mode & S_IFMT ) {
case S_IFLNK :
2005-04-17 02:20:36 +04:00
return - ELOOP ;
2009-01-05 21:27:23 +03:00
case S_IFDIR :
if ( acc_mode & MAY_WRITE )
return - EISDIR ;
break ;
case S_IFBLK :
case S_IFCHR :
2008-10-24 11:58:10 +04:00
if ( path - > mnt - > mnt_flags & MNT_NODEV )
2005-04-17 02:20:36 +04:00
return - EACCES ;
2009-01-05 21:27:23 +03:00
/*FALLTHRU*/
case S_IFIFO :
case S_IFSOCK :
2005-04-17 02:20:36 +04:00
flag & = ~ O_TRUNC ;
2009-01-05 21:27:23 +03:00
break ;
2008-02-16 01:37:48 +03:00
}
2007-10-17 10:31:14 +04:00
2008-10-24 11:58:10 +04:00
error = inode_permission ( inode , acc_mode ) ;
2007-10-17 10:31:14 +04:00
if ( error )
return error ;
2009-02-04 17:06:57 +03:00
2005-04-17 02:20:36 +04:00
/*
* An append - only file must be opened in append mode for writing .
*/
if ( IS_APPEND ( inode ) ) {
2009-12-24 14:47:55 +03:00
if ( ( flag & O_ACCMODE ) ! = O_RDONLY & & ! ( flag & O_APPEND ) )
2009-12-16 11:54:00 +03:00
return - EPERM ;
2005-04-17 02:20:36 +04:00
if ( flag & O_TRUNC )
2009-12-16 11:54:00 +03:00
return - EPERM ;
2005-04-17 02:20:36 +04:00
}
/* O_NOATIME can only be set by the owner or superuser */
2009-12-16 11:54:00 +03:00
if ( flag & O_NOATIME & & ! is_owner_or_cap ( inode ) )
return - EPERM ;
2005-04-17 02:20:36 +04:00
/*
* Ensure there are no outstanding leases on the file .
*/
2009-12-16 14:27:40 +03:00
return break_lease ( inode , flag ) ;
2009-12-16 11:54:00 +03:00
}
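The append-only rule enforced in may_open() above can be tried out in userspace with the standard open(2) flag constants. The function name is hypothetical, and the sketch covers only the IS_APPEND() branch: it assumes the caller has already determined that the file is append-only.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/*
 * For an append-only file: any open for writing must carry O_APPEND,
 * and truncation is never allowed. Returns 0 or -EPERM.
 */
static int sketch_append_only_check(int flag)
{
	if ((flag & O_ACCMODE) != O_RDONLY && !(flag & O_APPEND))
		return -EPERM;
	if (flag & O_TRUNC)
		return -EPERM;
	return 0;
}
```

Read-only opens pass the first test regardless of O_APPEND, but O_TRUNC is refused even then, since truncation would discard data that an append-only file is meant to keep.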
2005-04-17 02:20:36 +04:00
2009-12-16 11:54:00 +03:00
static int handle_truncate ( struct path * path )
{
struct inode * inode = path - > dentry - > d_inode ;
int error = get_write_access ( inode ) ;
if ( error )
return error ;
/*
* Refuse to truncate files with mandatory locks held on them .
*/
error = locks_verify_locked ( inode ) ;
if ( ! error )
error = security_path_truncate ( path , 0 ,
ATTR_MTIME | ATTR_CTIME | ATTR_OPEN ) ;
if ( ! error ) {
error = do_truncate ( path - > dentry , 0 ,
ATTR_MTIME | ATTR_CTIME | ATTR_OPEN ,
NULL ) ;
}
put_write_access ( inode ) ;
2009-09-04 21:08:46 +04:00
return error ;
2005-04-17 02:20:36 +04:00
}
2008-02-16 01:37:27 +03:00
/*
* Be careful about ever adding any more callers of this
* function . Its flags must be in the namei format , not
* what gets passed to sys_open ( ) .
*/
static int __open_namei_create ( struct nameidata * nd , struct path * path ,
2009-12-24 14:47:55 +03:00
int open_flag , int mode )
2006-10-01 10:29:02 +04:00
{
int error ;
2008-02-15 06:34:32 +03:00
struct dentry * dir = nd - > path . dentry ;
2006-10-01 10:29:02 +04:00
if ( ! IS_POSIXACL ( dir - > d_inode ) )
2009-03-30 03:08:22 +04:00
mode & = ~ current_umask ( ) ;
2008-12-17 07:24:15 +03:00
error = security_path_mknod ( & nd - > path , path - > dentry , mode , 0 ) ;
if ( error )
goto out_unlock ;
2006-10-01 10:29:02 +04:00
error = vfs_create ( dir - > d_inode , path - > dentry , mode , nd ) ;
2008-12-17 07:24:15 +03:00
out_unlock :
2006-10-01 10:29:02 +04:00
mutex_unlock ( & dir - > d_inode - > i_mutex ) ;
2008-02-15 06:34:32 +03:00
dput ( nd - > path . dentry ) ;
nd - > path . dentry = path - > dentry ;
2006-10-01 10:29:02 +04:00
if ( error )
return error ;
/* Don't check for write permission, don't truncate */
2009-12-24 14:47:55 +03:00
return may_open ( & nd - > path , 0 , open_flag & ~ O_TRUNC ) ;
2006-10-01 10:29:02 +04:00
}
2008-02-16 01:37:27 +03:00
/*
* Note that while the flag value ( low two bits ) for sys_open means :
* 00 - read - only
* 01 - write - only
* 10 - read - write
* 11 - special
* it is changed into
* 00 - no permissions needed
* 01 - read - permission
* 10 - write - permission
* 11 - read - write
* for the internal routines ( ie open_namei ( ) / follow_link ( ) etc )
* This is more logical , and also allows the 00 " no perm needed "
* to be used for symlinks ( where the permissions are checked
* later ) .
*
*/
static inline int open_to_namei_flags ( int flag )
{
if ( ( flag + 1 ) & O_ACCMODE )
flag + + ;
return flag ;
}
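The translation described in the comment above can be checked with a few concrete values. This userspace sketch copies the arithmetic of open_to_namei_flags(); SKETCH_ACCMODE is a local, hypothetical stand-in for O_ACCMODE and assumes the access mode occupies the low two bits, as the table in the comment does.

```c
#include <assert.h>

#define SKETCH_ACCMODE 3	/* assumed: access mode is the low two bits */

/*
 * Map the sys_open() access mode (0 read-only, 1 write-only, 2 read-write)
 * to the internal encoding (bit 0 = read, bit 1 = write) by adding one.
 * The "special" value 3 is left alone because adding one would overflow
 * the two-bit field.
 */
static int sketch_open_to_namei_flags(int flag)
{
	assert(flag >= 0);
	if ((flag + 1) & SKETCH_ACCMODE)
		flag++;
	return flag;
}
```

Adding one turns the enumeration into a bit mask, which is why 0 becomes free to mean "no permissions needed". Bits above the low two are unaffected, so other open flags pass through unchanged.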
2009-12-16 11:54:00 +03:00
static int open_will_truncate ( int flag , struct inode * inode )
2008-02-16 01:37:48 +03:00
{
/*
* We ' ll never write to the fs underlying
* a device file .
*/
if ( special_file ( inode - > i_mode ) )
return 0 ;
return ( flag & O_TRUNC ) ;
}
2009-12-24 09:26:48 +03:00
static struct file * finish_open ( struct nameidata * nd ,
int open_flag , int flag , int acc_mode )
{
struct file * filp ;
int will_truncate ;
int error ;
will_truncate = open_will_truncate ( flag , nd - > path . dentry - > d_inode ) ;
if ( will_truncate ) {
error = mnt_want_write ( nd - > path . mnt ) ;
if ( error )
goto exit ;
}
error = may_open ( & nd - > path , acc_mode , open_flag ) ;
if ( error ) {
if ( will_truncate )
mnt_drop_write ( nd - > path . mnt ) ;
goto exit ;
}
filp = nameidata_to_filp ( nd ) ;
if ( ! IS_ERR ( filp ) ) {
error = ima_file_check ( filp , acc_mode ) ;
if ( error ) {
fput ( filp ) ;
filp = ERR_PTR ( error ) ;
}
}
if ( ! IS_ERR ( filp ) ) {
if ( acc_mode & MAY_WRITE )
vfs_dq_init ( nd - > path . dentry - > d_inode ) ;
if ( will_truncate ) {
error = handle_truncate ( & nd - > path ) ;
if ( error ) {
fput ( filp ) ;
filp = ERR_PTR ( error ) ;
}
}
}
/*
* It is now safe to drop the mnt write
* because the filp has had a write taken
* on its behalf .
*/
if ( will_truncate )
mnt_drop_write ( nd - > path . mnt ) ;
return filp ;
exit :
if ( ! IS_ERR ( nd - > intent . open . file ) )
release_open_intent ( nd ) ;
path_put ( & nd - > path ) ;
return ERR_PTR ( error ) ;
}
2009-12-24 09:58:28 +03:00
static struct file * do_last ( struct nameidata * nd , struct path * path ,
int open_flag , int flag , int acc_mode ,
int mode , const char * pathname ,
struct dentry * dir , int * is_link )
{
struct file * filp ;
int error ;
* is_link = 0 ;
error = PTR_ERR ( path - > dentry ) ;
if ( IS_ERR ( path - > dentry ) ) {
mutex_unlock ( & dir - > d_inode - > i_mutex ) ;
goto exit ;
}
if ( IS_ERR ( nd - > intent . open . file ) ) {
error = PTR_ERR ( nd - > intent . open . file ) ;
goto exit_mutex_unlock ;
}
/* Negative dentry, just create the file */
if ( ! path - > dentry - > d_inode ) {
/*
* This write is needed to ensure that a
* ro - > rw transition does not occur between
* the time when the file is created and when
* a permanent write count is taken through
* the ' struct file ' in nameidata_to_filp ( ) .
*/
error = mnt_want_write ( nd - > path . mnt ) ;
if ( error )
goto exit_mutex_unlock ;
error = __open_namei_create ( nd , path , open_flag , mode ) ;
if ( error ) {
mnt_drop_write ( nd - > path . mnt ) ;
goto exit ;
}
filp = nameidata_to_filp ( nd ) ;
mnt_drop_write ( nd - > path . mnt ) ;
if ( ! IS_ERR ( filp ) ) {
error = ima_file_check ( filp , acc_mode ) ;
if ( error ) {
fput ( filp ) ;
filp = ERR_PTR ( error ) ;
}
}
return filp ;
}
/*
* It already exists .
*/
mutex_unlock ( & dir - > d_inode - > i_mutex ) ;
audit_inode ( pathname , path - > dentry ) ;
error = - EEXIST ;
if ( flag & O_EXCL )
goto exit_dput ;
if ( __follow_mount ( path ) ) {
error = - ELOOP ;
if ( flag & O_NOFOLLOW )
goto exit_dput ;
}
error = - ENOENT ;
if ( ! path - > dentry - > d_inode )
goto exit_dput ;
if ( path - > dentry - > d_inode - > i_op - > follow_link ) {
* is_link = 1 ;
return NULL ;
}
path_to_nameidata ( path , nd ) ;
error = - EISDIR ;
if ( S_ISDIR ( path - > dentry - > d_inode - > i_mode ) )
goto exit ;
filp = finish_open ( nd , open_flag , flag , acc_mode ) ;
return filp ;
exit_mutex_unlock :
mutex_unlock ( & dir - > d_inode - > i_mutex ) ;
exit_dput :
path_put_conditional ( path , nd ) ;
exit :
if ( ! IS_ERR ( nd - > intent . open . file ) )
release_open_intent ( nd ) ;
path_put ( & nd - > path ) ;
return ERR_PTR ( error ) ;
}
2005-04-17 02:20:36 +04:00
/*
2008-02-16 01:37:48 +03:00
* Note that the low bits of the passed in " open_flag "
* are not the same as in the local variable " flag " . See
* open_to_namei_flags ( ) for more details .
2005-04-17 02:20:36 +04:00
*/
2008-02-16 01:37:28 +03:00
struct file * do_filp_open ( int dfd , const char * pathname ,
2009-04-06 19:16:22 +04:00
int open_flag , int mode , int acc_mode )
2005-04-17 02:20:36 +04:00
{
2008-02-16 01:37:48 +03:00
struct file * filp ;
2008-02-16 01:37:28 +03:00
struct nameidata nd ;
2009-04-06 19:16:22 +04:00
int error ;
2010-01-13 23:01:15 +03:00
struct path path ;
2005-04-17 02:20:36 +04:00
struct dentry * dir ;
int count = 0 ;
2008-02-16 01:37:27 +03:00
int flag = open_to_namei_flags ( open_flag ) ;
2010-01-13 23:01:15 +03:00
int force_reval = 0 ;
2009-12-24 09:58:28 +03:00
int is_link ;
2005-04-17 02:20:36 +04:00
2009-10-27 13:05:28 +03:00
/*
* O_SYNC is implemented as __O_SYNC | O_DSYNC . As many places only
* check for O_DSYNC if they need any syncing at all , we enforce that it is
* always set instead of having to deal with possibly weird behaviour
* for malicious applications setting only __O_SYNC .
*/
if ( open_flag & __O_SYNC )
open_flag | = O_DSYNC ;
2009-04-06 19:16:22 +04:00
if ( ! acc_mode )
2009-12-24 14:58:56 +03:00
acc_mode = MAY_OPEN | ACC_MODE ( open_flag ) ;
2005-04-17 02:20:36 +04:00
2005-10-19 01:20:16 +04:00
/* O_TRUNC implies we need access checks for write permissions */
if ( flag & O_TRUNC )
acc_mode | = MAY_WRITE ;
2005-04-17 02:20:36 +04:00
/* Allow the LSM permission hook to distinguish append
access from general write access . */
if ( flag & O_APPEND )
acc_mode | = MAY_APPEND ;
/*
* The simplest case - just a plain lookup .
*/
if ( ! ( flag & O_CREAT ) ) {
2009-08-13 20:40:45 +04:00
filp = get_empty_filp ( ) ;
if ( filp = = NULL )
return ERR_PTR ( - ENFILE ) ;
nd . intent . open . file = filp ;
2009-12-19 18:10:39 +03:00
filp - > f_flags = open_flag ;
2009-08-13 20:40:45 +04:00
nd . intent . open . flags = flag ;
nd . intent . open . create_mode = 0 ;
error = do_path_lookup ( dfd , pathname ,
lookup_flags ( flag ) | LOOKUP_OPEN , & nd ) ;
if ( IS_ERR ( nd . intent . open . file ) ) {
if ( error = = 0 ) {
error = PTR_ERR ( nd . intent . open . file ) ;
path_put ( & nd . path ) ;
}
} else if ( error )
release_open_intent ( & nd ) ;
2005-04-17 02:20:36 +04:00
if ( error )
2008-02-16 01:37:28 +03:00
return ERR_PTR ( error ) ;
2005-04-17 02:20:36 +04:00
goto ok ;
}
/*
* Create - we need to know the parent .
*/
2010-01-13 23:01:15 +03:00
reval :
2009-04-07 19:44:16 +04:00
error = path_init ( dfd , pathname , LOOKUP_PARENT , & nd ) ;
2005-04-17 02:20:36 +04:00
if ( error )
2008-02-16 01:37:28 +03:00
return ERR_PTR ( error ) ;
2010-01-13 23:01:15 +03:00
if ( force_reval )
nd . flags | = LOOKUP_REVAL ;
2009-04-07 19:44:16 +04:00
error = path_walk ( pathname , & nd ) ;
2009-06-18 18:30:15 +04:00
if ( error ) {
if ( nd . root . mnt )
path_put ( & nd . root ) ;
2009-04-07 19:44:16 +04:00
return ERR_PTR ( error ) ;
2009-06-18 18:30:15 +04:00
}
2009-04-07 19:44:16 +04:00
if ( unlikely ( ! audit_dummy_context ( ) ) )
audit_inode ( pathname , nd . path . dentry ) ;
2005-04-17 02:20:36 +04:00
/*
* We have the parent and last component . First of all , check
* that we are not asked to creat ( 2 ) an obvious directory - that
* will not do .
*/
error = - EISDIR ;
2008-02-16 01:37:28 +03:00
if ( nd . last_type ! = LAST_NORM | | nd . last . name [ nd . last . len ] )
2008-08-03 06:36:57 +04:00
goto exit_parent ;
2005-04-17 02:20:36 +04:00
2008-08-03 06:36:57 +04:00
error = - ENFILE ;
filp = get_empty_filp ( ) ;
if ( filp = = NULL )
goto exit_parent ;
nd . intent . open . file = filp ;
2009-12-19 18:10:39 +03:00
filp - > f_flags = open_flag ;
2008-08-03 06:36:57 +04:00
nd . intent . open . flags = flag ;
nd . intent . open . create_mode = mode ;
2008-02-16 01:37:28 +03:00
dir = nd . path . dentry ;
nd . flags & = ~ LOOKUP_PARENT ;
2008-08-03 06:36:57 +04:00
nd . flags | = LOOKUP_CREATE | LOOKUP_OPEN ;
2008-08-05 11:00:49 +04:00
if ( flag & O_EXCL )
nd . flags | = LOOKUP_EXCL ;
2006-01-10 02:59:24 +03:00
mutex_lock ( & dir - > d_inode - > i_mutex ) ;
2008-02-16 01:37:28 +03:00
path . dentry = lookup_hash ( & nd ) ;
path . mnt = nd . path . mnt ;
2009-12-24 09:58:28 +03:00
filp = do_last ( & nd , & path , open_flag , flag , acc_mode , mode ,
pathname , dir , & is_link ) ;
if ( is_link )
2005-04-17 02:20:36 +04:00
goto do_link ;
2009-12-24 10:02:38 +03:00
if ( nd . root . mnt )
path_put ( & nd . root ) ;
2009-12-24 09:58:28 +03:00
return filp ;
2005-04-17 02:20:36 +04:00
ok :
2009-12-24 09:26:48 +03:00
filp = finish_open ( & nd , open_flag , flag , acc_mode ) ;
2009-06-18 18:30:15 +04:00
if ( nd . root . mnt )
path_put ( & nd . root ) ;
2008-02-16 01:37:48 +03:00
return filp ;
2005-04-17 02:20:36 +04:00
exit_dput :
2008-02-16 01:37:28 +03:00
path_put_conditional ( & path , & nd ) ;
2005-04-17 02:20:36 +04:00
exit :
2008-02-16 01:37:28 +03:00
if ( ! IS_ERR ( nd . intent . open . file ) )
release_open_intent ( & nd ) ;
2008-08-03 06:36:57 +04:00
exit_parent :
2009-04-07 19:49:53 +04:00
if ( nd . root . mnt )
path_put ( & nd . root ) ;
2008-02-16 01:37:28 +03:00
path_put ( & nd . path ) ;
return ERR_PTR ( error ) ;
2005-04-17 02:20:36 +04:00
do_link :
error = - ELOOP ;
if ( flag & O_NOFOLLOW )
goto exit_dput ;
/*
* This is subtle . Instead of calling do_follow_link ( ) we do the
* thing by hand . The reason is that this way we have zero link_count
* and path_walk ( ) ( called from - > follow_link ) honoring LOOKUP_PARENT .
* After that we have the parent and last component , i . e .
* we are in the same situation as after the first path_walk ( ) .
* Well , almost - if the last component is normal we get its copy
* stored in nd - > last . name and we will have to putname ( ) it when we
* are done . Procfs - like symlinks just set LAST_BIND .
*/
2008-02-16 01:37:28 +03:00
nd . flags | = LOOKUP_PARENT ;
error = security_inode_follow_link ( path . dentry , & nd ) ;
2005-04-17 02:20:36 +04:00
if ( error )
goto exit_dput ;
2008-02-16 01:37:28 +03:00
error = __do_follow_link ( & path , & nd ) ;
2009-08-09 01:32:02 +04:00
path_put ( & path ) ;
2006-07-14 11:23:49 +04:00
if ( error ) {
/* Does someone understand code flow here? Or it is only
* me so stupid ? Anathema to whoever designed this non - sense
* with " intent.open " .
*/
2008-02-16 01:37:28 +03:00
release_open_intent ( & nd ) ;
2009-06-18 18:30:15 +04:00
if ( nd . root . mnt )
path_put ( & nd . root ) ;
2010-01-13 23:01:15 +03:00
if ( error = = - ESTALE & & ! force_reval ) {
force_reval = 1 ;
goto reval ;
}
2008-02-16 01:37:28 +03:00
return ERR_PTR ( error ) ;
2006-07-14 11:23:49 +04:00
}
2008-02-16 01:37:28 +03:00
nd . flags & = ~ LOOKUP_PARENT ;
if ( nd . last_type = = LAST_BIND )
2005-04-17 02:20:36 +04:00
goto ok ;
error = - EISDIR ;
2008-02-16 01:37:28 +03:00
if ( nd . last_type ! = LAST_NORM )
2005-04-17 02:20:36 +04:00
goto exit ;
2008-02-16 01:37:28 +03:00
if ( nd . last . name [ nd . last . len ] ) {
__putname ( nd . last . name ) ;
2005-04-17 02:20:36 +04:00
goto exit ;
}
error = - ELOOP ;
if ( count + + = = 32 ) {
2008-02-16 01:37:28 +03:00
__putname ( nd . last . name ) ;
2005-04-17 02:20:36 +04:00
goto exit ;
}
2008-02-16 01:37:28 +03:00
dir = nd . path . dentry ;
2006-01-10 02:59:24 +03:00
mutex_lock ( & dir - > d_inode - > i_mutex ) ;
2008-02-16 01:37:28 +03:00
path . dentry = lookup_hash ( & nd ) ;
path . mnt = nd . path . mnt ;
2009-12-24 10:05:43 +03:00
filp = do_last ( & nd , & path , open_flag , flag , acc_mode , mode ,
pathname , dir , & is_link ) ;
2009-12-24 10:08:19 +03:00
__putname ( nd . last . name ) ;
2009-12-24 10:05:43 +03:00
if ( is_link )
goto do_link ;
if ( nd . root . mnt )
path_put ( & nd . root ) ;
return filp ;
2005-04-17 02:20:36 +04:00
}
2008-02-16 01:37:28 +03:00
/**
* filp_open - open file and return file pointer
*
* @ filename : path to open
* @ flags : open flags as per the open ( 2 ) second argument
* @ mode : mode for the new file if O_CREAT is set , else ignored
*
* This is the helper to open a file from kernelspace if you really
* have to . But in general you should not do this , so please move
* along , nothing to see here . .
*/
struct file * filp_open ( const char * filename , int flags , int mode )
{
2009-04-06 19:16:22 +04:00
return do_filp_open ( AT_FDCWD , filename , flags , mode , 0 ) ;
2008-02-16 01:37:28 +03:00
}
EXPORT_SYMBOL ( filp_open ) ;
2005-04-17 02:20:36 +04:00
/**
* lookup_create - lookup a dentry , creating it if it doesn ' t exist
* @ nd : nameidata info
* @ is_dir : directory flag
*
* Simple function to lookup and return a dentry and create it
* if it doesn ' t exist . Is SMP - safe .
2005-06-23 11:09:49 +04:00
*
2008-02-15 06:34:32 +03:00
* Returns with nd - > path . dentry - > d_inode - > i_mutex locked .
2005-04-17 02:20:36 +04:00
*/
struct dentry * lookup_create ( struct nameidata * nd , int is_dir )
{
2005-06-23 11:09:49 +04:00
struct dentry * dentry = ERR_PTR ( - EEXIST ) ;
2005-04-17 02:20:36 +04:00
2008-02-15 06:34:32 +03:00
mutex_lock_nested ( & nd - > path . dentry - > d_inode - > i_mutex , I_MUTEX_PARENT ) ;
2005-06-23 11:09:49 +04:00
/*
* Yucky last component or no last component at all ?
* ( foo / . , foo / . . , /////)
*/
2005-04-17 02:20:36 +04:00
if ( nd - > last_type ! = LAST_NORM )
goto fail ;
nd - > flags & = ~ LOOKUP_PARENT ;
2008-08-05 11:00:49 +04:00
nd - > flags | = LOOKUP_CREATE | LOOKUP_EXCL ;
2006-08-23 04:06:02 +04:00
nd - > intent . open . flags = O_EXCL ;
2005-06-23 11:09:49 +04:00
/*
* Do the final lookup .
*/
2005-11-09 08:35:06 +03:00
dentry = lookup_hash ( nd ) ;
2005-04-17 02:20:36 +04:00
if ( IS_ERR ( dentry ) )
goto fail ;
2005-06-23 11:09:49 +04:00
2008-05-15 12:49:12 +04:00
if ( dentry - > d_inode )
goto eexist ;
2005-06-23 11:09:49 +04:00
/*
* Special case - lookup gave negative , but . . . we had foo / bar /
* From the vfs_mknod ( ) POV we just have a negative dentry -
* all is fine . Let ' s be bastards - you had / on the end , you ' ve
* been asking for ( non - existent ) directory . - ENOENT for you .
*/
2008-05-15 12:49:12 +04:00
if ( unlikely ( ! is_dir & & nd - > last . name [ nd - > last . len ] ) ) {
dput ( dentry ) ;
dentry = ERR_PTR ( - ENOENT ) ;
}
2005-04-17 02:20:36 +04:00
return dentry ;
2008-05-15 12:49:12 +04:00
eexist :
2005-04-17 02:20:36 +04:00
dput ( dentry ) ;
2008-05-15 12:49:12 +04:00
dentry = ERR_PTR ( - EEXIST ) ;
2005-04-17 02:20:36 +04:00
fail :
return dentry ;
}
2005-05-19 23:26:43 +04:00
EXPORT_SYMBOL_GPL ( lookup_create ) ;
2005-04-17 02:20:36 +04:00
int vfs_mknod ( struct inode * dir , struct dentry * dentry , int mode , dev_t dev )
{
2008-07-30 17:08:48 +04:00
int error = may_create ( dir , dentry ) ;
2005-04-17 02:20:36 +04:00
if ( error )
return error ;
if ( ( S_ISCHR ( mode ) | | S_ISBLK ( mode ) ) & & ! capable ( CAP_MKNOD ) )
return - EPERM ;
2008-12-04 18:06:33 +03:00
if ( ! dir - > i_op - > mknod )
2005-04-17 02:20:36 +04:00
return - EPERM ;
2008-04-29 12:00:10 +04:00
error = devcgroup_inode_mknod ( mode , dev ) ;
if ( error )
return error ;
2005-04-17 02:20:36 +04:00
error = security_inode_mknod ( dir , dentry , mode , dev ) ;
if ( error )
return error ;
2009-01-26 18:45:12 +03:00
vfs_dq_init ( dir ) ;
2005-04-17 02:20:36 +04:00
error = dir - > i_op - > mknod ( dir , dentry , mode , dev ) ;
2005-09-10 00:01:44 +04:00
if ( ! error )
2005-11-03 18:57:06 +03:00
fsnotify_create ( dir , dentry ) ;
2005-04-17 02:20:36 +04:00
return error ;
}
2008-02-16 01:37:57 +03:00
static int may_mknod ( mode_t mode )
{
switch ( mode & S_IFMT ) {
case S_IFREG :
case S_IFCHR :
case S_IFBLK :
case S_IFIFO :
case S_IFSOCK :
case 0 : /* zero mode translates to S_IFREG */
return 0 ;
case S_IFDIR :
return - EPERM ;
default :
return - EINVAL ;
}
}
2009-01-14 16:14:31 +03:00
SYSCALL_DEFINE4 ( mknodat , int , dfd , const char __user * , filename , int , mode ,
unsigned , dev )
2005-04-17 02:20:36 +04:00
{
2008-07-21 17:32:51 +04:00
int error ;
char * tmp ;
struct dentry * dentry ;
2005-04-17 02:20:36 +04:00
struct nameidata nd ;
if ( S_ISDIR ( mode ) )
return - EPERM ;
2008-07-21 17:32:51 +04:00
error = user_path_parent ( dfd , filename , & nd , & tmp ) ;
2005-04-17 02:20:36 +04:00
if ( error )
2008-07-21 17:32:51 +04:00
return error ;
2005-04-17 02:20:36 +04:00
dentry = lookup_create ( & nd , 0 ) ;
2008-02-16 01:37:57 +03:00
if ( IS_ERR ( dentry ) ) {
error = PTR_ERR ( dentry ) ;
goto out_unlock ;
}
2008-02-15 06:34:32 +03:00
if ( ! IS_POSIXACL ( nd . path . dentry - > d_inode ) )
2009-03-30 03:08:22 +04:00
mode & = ~ current_umask ( ) ;
2008-02-16 01:37:57 +03:00
error = may_mknod ( mode ) ;
if ( error )
goto out_dput ;
error = mnt_want_write ( nd . path . mnt ) ;
if ( error )
goto out_dput ;
2008-12-17 07:24:15 +03:00
error = security_path_mknod ( & nd . path , dentry , mode , dev ) ;
if ( error )
goto out_drop_write ;
2008-02-16 01:37:57 +03:00
switch ( mode & S_IFMT ) {
2005-04-17 02:20:36 +04:00
case 0 : case S_IFREG :
2008-02-15 06:34:32 +03:00
error = vfs_create ( nd . path . dentry - > d_inode , dentry , mode , & nd ) ;
2005-04-17 02:20:36 +04:00
break ;
case S_IFCHR : case S_IFBLK :
2008-02-15 06:34:32 +03:00
error = vfs_mknod ( nd . path . dentry - > d_inode , dentry , mode ,
2005-04-17 02:20:36 +04:00
new_decode_dev ( dev ) ) ;
break ;
case S_IFIFO : case S_IFSOCK :
2008-02-15 06:34:32 +03:00
error = vfs_mknod ( nd . path . dentry - > d_inode , dentry , mode , 0 ) ;
2005-04-17 02:20:36 +04:00
break ;
}
2008-12-17 07:24:15 +03:00
out_drop_write :
2008-02-16 01:37:57 +03:00
mnt_drop_write ( nd . path . mnt ) ;
out_dput :
dput ( dentry ) ;
out_unlock :
2008-02-15 06:34:32 +03:00
mutex_unlock ( & nd . path . dentry - > d_inode - > i_mutex ) ;
2008-02-15 06:34:35 +03:00
path_put ( & nd . path ) ;
2005-04-17 02:20:36 +04:00
putname ( tmp ) ;
return error ;
}
2009-01-14 16:14:16 +03:00
SYSCALL_DEFINE3 ( mknod , const char __user * , filename , int , mode , unsigned , dev )
2006-01-19 04:43:53 +03:00
{
return sys_mknodat ( AT_FDCWD , filename , mode , dev ) ;
}
2005-04-17 02:20:36 +04:00
int vfs_mkdir ( struct inode * dir , struct dentry * dentry , int mode )
{
2008-07-30 17:08:48 +04:00
int error = may_create ( dir , dentry ) ;
2005-04-17 02:20:36 +04:00
if ( error )
return error ;
2008-12-04 18:06:33 +03:00
if ( ! dir - > i_op - > mkdir )
2005-04-17 02:20:36 +04:00
return - EPERM ;
mode & = ( S_IRWXUGO | S_ISVTX ) ;
error = security_inode_mkdir ( dir , dentry , mode ) ;
if ( error )
return error ;
2009-01-26 18:45:12 +03:00
vfs_dq_init ( dir ) ;
2005-04-17 02:20:36 +04:00
error = dir - > i_op - > mkdir ( dir , dentry , mode ) ;
2005-09-10 00:01:44 +04:00
if ( ! error )
2005-11-03 18:57:06 +03:00
fsnotify_mkdir ( dir , dentry ) ;
2005-04-17 02:20:36 +04:00
return error ;
}
2009-01-14 16:14:31 +03:00
SYSCALL_DEFINE3 ( mkdirat , int , dfd , const char __user * , pathname , int , mode )
2005-04-17 02:20:36 +04:00
{
int error = 0 ;
char * tmp ;
2006-10-01 10:29:01 +04:00
struct dentry * dentry ;
struct nameidata nd ;
2005-04-17 02:20:36 +04:00
2008-07-21 17:32:51 +04:00
error = user_path_parent ( dfd , pathname , & nd , & tmp ) ;
if ( error )
2006-10-01 10:29:01 +04:00
goto out_err ;
2005-04-17 02:20:36 +04:00
2006-10-01 10:29:01 +04:00
dentry = lookup_create ( & nd , 1 ) ;
error = PTR_ERR ( dentry ) ;
if ( IS_ERR ( dentry ) )
goto out_unlock ;
2005-04-17 02:20:36 +04:00
2008-02-15 06:34:32 +03:00
if ( ! IS_POSIXACL ( nd . path . dentry - > d_inode ) )
2009-03-30 03:08:22 +04:00
mode & = ~ current_umask ( ) ;
2008-02-16 01:37:57 +03:00
error = mnt_want_write ( nd . path . mnt ) ;
if ( error )
goto out_dput ;
2008-12-17 07:24:15 +03:00
error = security_path_mkdir ( & nd . path , dentry , mode ) ;
if ( error )
goto out_drop_write ;
2008-02-15 06:34:32 +03:00
error = vfs_mkdir ( nd . path . dentry - > d_inode , dentry , mode ) ;
2008-12-17 07:24:15 +03:00
out_drop_write :
2008-02-16 01:37:57 +03:00
mnt_drop_write ( nd . path . mnt ) ;
out_dput :
2006-10-01 10:29:01 +04:00
dput ( dentry ) ;
out_unlock :
2008-02-15 06:34:32 +03:00
mutex_unlock ( & nd . path . dentry - > d_inode - > i_mutex ) ;
2008-02-15 06:34:35 +03:00
path_put ( & nd . path ) ;
2006-10-01 10:29:01 +04:00
putname ( tmp ) ;
out_err :
2005-04-17 02:20:36 +04:00
return error ;
}
2009-01-14 16:14:22 +03:00
SYSCALL_DEFINE2 ( mkdir , const char __user * , pathname , int , mode )
2006-01-19 04:43:53 +03:00
{
return sys_mkdirat ( AT_FDCWD , pathname , mode ) ;
}
2005-04-17 02:20:36 +04:00
/*
* We try to drop the dentry early : we should have
* a usage count of 2 if we ' re the only user of this
* dentry , and if that is true ( possibly after pruning
* the dcache ) , then we drop the dentry now .
*
* A low - level filesystem can , if it chooses , legally
* do a
*
* if ( ! d_unhashed ( dentry ) )
* return - EBUSY ;
*
* if it cannot handle the case of removing a directory
* that is still in use by something else . .
*/
void dentry_unhash ( struct dentry * dentry )
{
dget ( dentry ) ;
2006-12-07 07:37:07 +03:00
shrink_dcache_parent ( dentry ) ;
2005-04-17 02:20:36 +04:00
spin_lock ( & dcache_lock ) ;
spin_lock ( & dentry - > d_lock ) ;
if ( atomic_read ( & dentry - > d_count ) = = 2 )
__d_drop ( dentry ) ;
spin_unlock ( & dentry - > d_lock ) ;
spin_unlock ( & dcache_lock ) ;
}
int vfs_rmdir ( struct inode * dir , struct dentry * dentry )
{
int error = may_delete ( dir , dentry , 1 ) ;
if ( error )
return error ;
2008-12-04 18:06:33 +03:00
if ( ! dir - > i_op - > rmdir )
2005-04-17 02:20:36 +04:00
return - EPERM ;
2009-01-26 18:45:12 +03:00
vfs_dq_init ( dir ) ;
2005-04-17 02:20:36 +04:00
2006-01-10 02:59:24 +03:00
mutex_lock ( & dentry - > d_inode - > i_mutex ) ;
2005-04-17 02:20:36 +04:00
dentry_unhash ( dentry ) ;
if ( d_mountpoint ( dentry ) )
error = - EBUSY ;
else {
error = security_inode_rmdir ( dir , dentry ) ;
if ( ! error ) {
error = dir - > i_op - > rmdir ( dir , dentry ) ;
if ( ! error )
dentry - > d_inode - > i_flags | = S_DEAD ;
}
}
2006-01-10 02:59:24 +03:00
mutex_unlock ( & dentry - > d_inode - > i_mutex ) ;
2005-04-17 02:20:36 +04:00
if ( ! error ) {
d_delete ( dentry ) ;
}
dput ( dentry ) ;
return error ;
}
2006-01-19 04:43:53 +03:00
static long do_rmdir ( int dfd , const char __user * pathname )
2005-04-17 02:20:36 +04:00
{
int error = 0 ;
char * name ;
struct dentry * dentry ;
struct nameidata nd ;
2008-07-21 17:32:51 +04:00
error = user_path_parent ( dfd , pathname , & nd , & name ) ;
2005-04-17 02:20:36 +04:00
if ( error )
2008-07-21 17:32:51 +04:00
return error ;
2005-04-17 02:20:36 +04:00
switch ( nd . last_type ) {
2008-10-16 02:50:29 +04:00
case LAST_DOTDOT :
error = - ENOTEMPTY ;
goto exit1 ;
case LAST_DOT :
error = - EINVAL ;
goto exit1 ;
case LAST_ROOT :
error = - EBUSY ;
goto exit1 ;
2005-04-17 02:20:36 +04:00
}
2008-10-16 02:50:29 +04:00
nd . flags & = ~ LOOKUP_PARENT ;
2008-02-15 06:34:32 +03:00
mutex_lock_nested ( & nd . path . dentry - > d_inode - > i_mutex , I_MUTEX_PARENT ) ;
2005-11-09 08:35:06 +03:00
dentry = lookup_hash ( & nd ) ;
2005-04-17 02:20:36 +04:00
error = PTR_ERR ( dentry ) ;
2006-10-01 10:29:01 +04:00
if ( IS_ERR ( dentry ) )
goto exit2 ;
2008-02-16 01:37:34 +03:00
error = mnt_want_write ( nd . path . mnt ) ;
if ( error )
goto exit3 ;
2008-12-17 07:24:15 +03:00
error = security_path_rmdir ( & nd . path , dentry ) ;
if ( error )
goto exit4 ;
2008-02-15 06:34:32 +03:00
error = vfs_rmdir ( nd . path . dentry - > d_inode , dentry ) ;
2008-12-17 07:24:15 +03:00
exit4 :
2008-02-16 01:37:34 +03:00
mnt_drop_write ( nd . path . mnt ) ;
exit3 :
2006-10-01 10:29:01 +04:00
dput ( dentry ) ;
exit2 :
2008-02-15 06:34:32 +03:00
mutex_unlock ( & nd . path . dentry - > d_inode - > i_mutex ) ;
2005-04-17 02:20:36 +04:00
exit1 :
2008-02-15 06:34:35 +03:00
path_put ( & nd . path ) ;
2005-04-17 02:20:36 +04:00
putname ( name ) ;
return error ;
}
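/*
 * Illustrative userspace sketch, not part of the kernel build: how the
 * last_type switch in do_rmdir() above is expected to surface through
 * rmdir(2) on Linux. The helper name sketch_rmdir_errno() is hypothetical.
 * It is intended to be called only with ".", ".." or "/", which the
 * switch rejects before anything is removed.
 */
#include <errno.h>
#include <unistd.h>

static inline int sketch_rmdir_errno(const char *path)
{
	/* Return 0 on success, otherwise the errno value left by rmdir(2). */
	if (rmdir(path) == 0)
		return 0;
	return errno;
}
/*
 * Assuming a Linux system, "." should yield EINVAL, ".." ENOTEMPTY and
 * "/" EBUSY, matching the LAST_DOT, LAST_DOTDOT and LAST_ROOT cases.
 */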
2009-01-14 16:14:22 +03:00
SYSCALL_DEFINE1 ( rmdir , const char __user * , pathname )
2006-01-19 04:43:53 +03:00
{
return do_rmdir ( AT_FDCWD , pathname ) ;
}
2005-04-17 02:20:36 +04:00
int vfs_unlink ( struct inode * dir , struct dentry * dentry )
{
int error = may_delete ( dir , dentry , 0 ) ;
if ( error )
return error ;
2008-12-04 18:06:33 +03:00
if ( ! dir - > i_op - > unlink )
2005-04-17 02:20:36 +04:00
return - EPERM ;
2009-01-26 18:45:12 +03:00
vfs_dq_init ( dir ) ;
2005-04-17 02:20:36 +04:00
2006-01-10 02:59:24 +03:00
mutex_lock ( & dentry - > d_inode - > i_mutex ) ;
2005-04-17 02:20:36 +04:00
if ( d_mountpoint ( dentry ) )
error = - EBUSY ;
else {
error = security_inode_unlink ( dir , dentry ) ;
2010-03-03 22:12:08 +03:00
if ( ! error ) {
2005-04-17 02:20:36 +04:00
error = dir - > i_op - > unlink ( dir , dentry ) ;
2010-03-03 22:12:08 +03:00
if ( ! error )
dentry - > d_inode - > i_flags | = S_DEAD ;
}
2005-04-17 02:20:36 +04:00
}
2006-01-10 02:59:24 +03:00
mutex_unlock ( & dentry - > d_inode - > i_mutex ) ;
2005-04-17 02:20:36 +04:00
/* We don't d_delete() NFS sillyrenamed files--they still exist. */
if ( ! error & & ! ( dentry - > d_flags & DCACHE_NFSFS_RENAMED ) ) {
2008-02-06 12:37:13 +03:00
fsnotify_link_count ( dentry - > d_inode ) ;
2005-08-05 00:07:08 +04:00
d_delete ( dentry ) ;
2005-04-17 02:20:36 +04:00
}
2005-04-17 02:20:36 +04:00
return error ;
}
/*
* Make sure that the actual truncation of the file will occur outside its
2006-01-10 02:59:24 +03:00
* directory ' s i_mutex . Truncate can take a long time if there is a lot of
2005-04-17 02:20:36 +04:00
* writeout happening , and we don ' t want to prevent access to the directory
* while waiting on the I / O .
*/
2006-01-19 04:43:53 +03:00
static long do_unlinkat ( int dfd , const char __user * pathname )
2005-04-17 02:20:36 +04:00
{
2008-07-21 17:32:51 +04:00
int error ;
char * name ;
2005-04-17 02:20:36 +04:00
struct dentry * dentry ;
struct nameidata nd ;
struct inode * inode = NULL ;
2008-07-21 17:32:51 +04:00
error = user_path_parent ( dfd , pathname , & nd , & name ) ;
2005-04-17 02:20:36 +04:00
if ( error )
2008-07-21 17:32:51 +04:00
return error ;
2005-04-17 02:20:36 +04:00
error = - EISDIR ;
if ( nd . last_type ! = LAST_NORM )
goto exit1 ;
2008-10-16 02:50:29 +04:00
nd . flags & = ~ LOOKUP_PARENT ;
2008-02-15 06:34:32 +03:00
mutex_lock_nested ( & nd . path . dentry - > d_inode - > i_mutex , I_MUTEX_PARENT ) ;
2005-11-09 08:35:06 +03:00
dentry = lookup_hash ( & nd ) ;
2005-04-17 02:20:36 +04:00
error = PTR_ERR ( dentry ) ;
if ( ! IS_ERR ( dentry ) ) {
/* Why not before? Because we want correct error value */
if ( nd . last . name [ nd . last . len ] )
goto slashes ;
inode = dentry - > d_inode ;
if ( inode )
atomic_inc ( & inode - > i_count ) ;
2008-02-16 01:37:34 +03:00
error = mnt_want_write ( nd . path . mnt ) ;
if ( error )
goto exit2 ;
2008-12-17 07:24:15 +03:00
error = security_path_unlink ( & nd . path , dentry ) ;
if ( error )
goto exit3 ;
2008-02-15 06:34:32 +03:00
error = vfs_unlink ( nd . path . dentry - > d_inode , dentry ) ;
2008-12-17 07:24:15 +03:00
exit3 :
2008-02-16 01:37:34 +03:00
mnt_drop_write ( nd . path . mnt ) ;
2005-04-17 02:20:36 +04:00
exit2 :
dput ( dentry ) ;
}
2008-02-15 06:34:32 +03:00
mutex_unlock ( & nd . path . dentry - > d_inode - > i_mutex ) ;
2005-04-17 02:20:36 +04:00
if ( inode )
iput ( inode ) ; /* truncate the inode here */
exit1 :
2008-02-15 06:34:35 +03:00
path_put ( & nd . path ) ;
2005-04-17 02:20:36 +04:00
putname ( name ) ;
return error ;
slashes :
error = ! dentry - > d_inode ? - ENOENT :
S_ISDIR ( dentry - > d_inode - > i_mode ) ? - EISDIR : - ENOTDIR ;
goto exit2 ;
}
2009-01-14 16:14:31 +03:00
SYSCALL_DEFINE3 ( unlinkat , int , dfd , const char __user * , pathname , int , flag )
2006-01-19 04:43:53 +03:00
{
if ( ( flag & ~ AT_REMOVEDIR ) ! = 0 )
return - EINVAL ;
if ( flag & AT_REMOVEDIR )
return do_rmdir ( dfd , pathname ) ;
return do_unlinkat ( dfd , pathname ) ;
}
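/*
 * Illustrative userspace sketch, not part of the kernel build: the flag
 * handling of the unlinkat syscall above as seen from a caller. Only
 * AT_REMOVEDIR is accepted; any other bit is rejected with EINVAL before
 * the path is looked at. The helper name sketch_unlinkat_errno() and the
 * path used in the note below are hypothetical.
 */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

static inline int sketch_unlinkat_errno(int dfd, const char *path, int flag)
{
	/* Return 0 on success, otherwise the errno value left by unlinkat(2). */
	if (unlinkat(dfd, path, flag) == 0)
		return 0;
	return errno;
}
/*
 * For a path whose parent directory does not exist, such as
 * "/sketch-no-such-dir/entry", flag 0 and AT_REMOVEDIR should both give
 * ENOENT on Linux, while an unrelated bit such as AT_SYMLINK_FOLLOW
 * should give EINVAL.
 */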
2009-01-14 16:14:16 +03:00
SYSCALL_DEFINE1 ( unlink , const char __user * , pathname )
2006-01-19 04:43:53 +03:00
{
return do_unlinkat ( AT_FDCWD , pathname ) ;
}
2008-06-24 18:50:16 +04:00
int vfs_symlink ( struct inode * dir , struct dentry * dentry , const char * oldname )
2005-04-17 02:20:36 +04:00
{
2008-07-30 17:08:48 +04:00
int error = may_create ( dir , dentry ) ;
2005-04-17 02:20:36 +04:00
if ( error )
return error ;
2008-12-04 18:06:33 +03:00
if ( ! dir - > i_op - > symlink )
2005-04-17 02:20:36 +04:00
return - EPERM ;
error = security_inode_symlink ( dir , dentry , oldname ) ;
if ( error )
return error ;
2009-01-26 18:45:12 +03:00
vfs_dq_init ( dir ) ;
2005-04-17 02:20:36 +04:00
error = dir - > i_op - > symlink ( dir , dentry , oldname ) ;
2005-09-10 00:01:44 +04:00
if ( ! error )
2005-11-03 18:57:06 +03:00
fsnotify_create ( dir , dentry ) ;
2005-04-17 02:20:36 +04:00
return error ;
}
2009-01-14 16:14:31 +03:00
SYSCALL_DEFINE3 ( symlinkat , const char __user * , oldname ,
int , newdfd , const char __user * , newname )
2005-04-17 02:20:36 +04:00
{
2008-07-21 17:32:51 +04:00
int error ;
char * from ;
char * to ;
2006-10-01 10:29:01 +04:00
struct dentry * dentry ;
struct nameidata nd ;
2005-04-17 02:20:36 +04:00
from = getname ( oldname ) ;
2008-07-21 17:32:51 +04:00
if ( IS_ERR ( from ) )
2005-04-17 02:20:36 +04:00
return PTR_ERR ( from ) ;
2008-07-21 17:32:51 +04:00
error = user_path_parent ( newdfd , newname , & nd , & to ) ;
2006-10-01 10:29:01 +04:00
if ( error )
2008-07-21 17:32:51 +04:00
goto out_putname ;
2006-10-01 10:29:01 +04:00
dentry = lookup_create ( & nd , 0 ) ;
error = PTR_ERR ( dentry ) ;
if ( IS_ERR ( dentry ) )
goto out_unlock ;
2008-02-16 01:37:45 +03:00
error = mnt_want_write ( nd . path . mnt ) ;
if ( error )
goto out_dput ;
2008-12-17 07:24:15 +03:00
error = security_path_symlink ( & nd . path , dentry , from ) ;
if ( error )
goto out_drop_write ;
2008-06-24 18:50:16 +04:00
error = vfs_symlink ( nd . path . dentry - > d_inode , dentry , from ) ;
2008-12-17 07:24:15 +03:00
out_drop_write :
2008-02-16 01:37:45 +03:00
mnt_drop_write ( nd . path . mnt ) ;
out_dput :
2006-10-01 10:29:01 +04:00
dput ( dentry ) ;
out_unlock :
2008-02-15 06:34:32 +03:00
mutex_unlock ( & nd . path . dentry - > d_inode - > i_mutex ) ;
2008-02-15 06:34:35 +03:00
path_put ( & nd . path ) ;
2006-10-01 10:29:01 +04:00
putname ( to ) ;
out_putname :
2005-04-17 02:20:36 +04:00
putname ( from ) ;
return error ;
}
2009-01-14 16:14:16 +03:00
SYSCALL_DEFINE2 ( symlink , const char __user * , oldname , const char __user * , newname )
2006-01-19 04:43:53 +03:00
{
return sys_symlinkat ( oldname , AT_FDCWD , newname ) ;
}

int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry)
{
	struct inode *inode = old_dentry->d_inode;
	int error;

	if (!inode)
		return -ENOENT;

	error = may_create(dir, new_dentry);
	if (error)
		return error;

	if (dir->i_sb != inode->i_sb)
		return -EXDEV;

	/*
	 * A link to an append-only or immutable file cannot be created.
	 */
	if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
		return -EPERM;
	if (!dir->i_op->link)
		return -EPERM;
	if (S_ISDIR(inode->i_mode))
		return -EPERM;

	error = security_inode_link(old_dentry, dir, new_dentry);
	if (error)
		return error;

	mutex_lock(&inode->i_mutex);
	vfs_dq_init(dir);
	error = dir->i_op->link(old_dentry, dir, new_dentry);
	mutex_unlock(&inode->i_mutex);
	if (!error)
		fsnotify_link(dir, inode, new_dentry);
	return error;
}
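
/*
 * Illustration only, not part of namei.c: a minimal userspace sketch of how
 * the S_ISDIR() check in vfs_link() is expected to surface through link(2).
 * The helper name link_dir_errno() and the /tmp scratch path are assumptions
 * made for this example; it assumes a Linux system with a writable /tmp.
 */

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a scratch directory, try to hard-link it,
 * and return the errno reported by link(2). Returns 0 if the link
 * unexpectedly succeeds and -1 if the scratch directory cannot be created.
 */
static int link_dir_errno(void)
{
	char dir[] = "/tmp/linkdemoXXXXXX";
	char newname[64];
	int err = 0;

	/* The new name is the directory path plus a ".lnk" suffix. */
	assert(sizeof(dir) + 4 <= sizeof(newname));

	if (!mkdtemp(dir))
		return -1;
	snprintf(newname, sizeof(newname), "%s.lnk", dir);

	if (link(dir, newname) != 0)
		err = errno;		/* vfs_link() refuses directories */
	else
		unlink(newname);	/* not expected on Linux */

	rmdir(dir);
	return err;
}
```

/*
 * Under the assumptions above, link_dir_errno() should return EPERM, which
 * matches the "return -EPERM" taken when S_ISDIR(inode->i_mode) is true.
 */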

/*
 * Hardlinks are often used in delicate situations.  We avoid
 * security-related surprises by not following symlinks on the
 * newname.  --KAB
 *
 * We don't follow them on the oldname either to be compatible
 * with linux 2.0, and to avoid hard-linking to directories
 * and other special files.  --ADM
 */
SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
		int, newdfd, const char __user *, newname, int, flags)
{
	struct dentry *new_dentry;
	struct nameidata nd;
	struct path old_path;
	int error;
	char *to;

	if ((flags & ~AT_SYMLINK_FOLLOW) != 0)
		return -EINVAL;

	error = user_path_at(olddfd, oldname,
			     flags & AT_SYMLINK_FOLLOW ? LOOKUP_FOLLOW : 0,
			     &old_path);
	if (error)
		return error;

	error = user_path_parent(newdfd, newname, &nd, &to);
	if (error)
		goto out;
	error = -EXDEV;
	if (old_path.mnt != nd.path.mnt)
		goto out_release;
	new_dentry = lookup_create(&nd, 0);
	error = PTR_ERR(new_dentry);
	if (IS_ERR(new_dentry))
		goto out_unlock;
	error = mnt_want_write(nd.path.mnt);
	if (error)
		goto out_dput;
	error = security_path_link(old_path.dentry, &nd.path, new_dentry);
	if (error)
		goto out_drop_write;
	error = vfs_link(old_path.dentry, nd.path.dentry->d_inode, new_dentry);
out_drop_write:
	mnt_drop_write(nd.path.mnt);
out_dput:
	dput(new_dentry);
out_unlock:
	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
out_release:
	path_put(&nd.path);
	putname(to);
out:
	path_put(&old_path);

	return error;
}
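
/*
 * Illustration only, not part of namei.c: a minimal, self-contained sketch of
 * the goto-based unwind ladder used by linkat above. Each label releases one
 * resource and falls through to the labels below it, so cleanup runs in the
 * reverse order of acquisition. acquire(), do_op() and the one-letter release
 * log are hypothetical names invented for this example; the "cf." comments
 * are loose analogies to the kernel calls, not exact equivalents.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an acquisition step that may fail. */
static int acquire(int step, int fail_at)
{
	return step == fail_at ? -EIO : 0;
}

/*
 * Acquire three resources in order; on failure at step fail_at, release
 * only what was already acquired. The release order is recorded in
 * released[] as a string of 'c', 'b', 'a'. fail_at == 0 means no failure.
 */
static int do_op(int fail_at, char *released, size_t n)
{
	size_t i = 0;
	int error;

	assert(n >= 4);			/* room for "cba" plus terminator */

	error = acquire(1, fail_at);	/* cf. user_path_parent() */
	if (error)
		goto out;
	error = acquire(2, fail_at);	/* cf. lookup_create() */
	if (error)
		goto out_a;
	error = acquire(3, fail_at);	/* cf. mnt_want_write() */
	if (error)
		goto out_b;

	/* The real operation (vfs_link() in linkat) would run here. */

	released[i++] = 'c';		/* cf. mnt_drop_write() */
out_b:
	released[i++] = 'b';		/* cf. dput() */
out_a:
	released[i++] = 'a';		/* cf. path_put(&nd.path) */
out:
	released[i] = '\0';
	return error;
}
```

/*
 * The success path and every failure path share the same release code, which
 * is why the kernel functions here set "error" before each test and jump to
 * the label matching the last resource they hold.
 */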

SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname)
{
	return sys_linkat(AT_FDCWD, oldname, AT_FDCWD, newname, 0);
}

/*
 * The worst of all namespace operations - renaming directory. "Perverted"
 * doesn't even start to describe it. Somebody in UCB had a heck of a trip...
 * Problems:
 *	a) we can get into loop creation. Check is done in is_subdir().
 *	b) race potential - two innocent renames can create a loop together.
 *	   That's where 4.4 screws up. Current fix: serialization on
 *	   sb->s_vfs_rename_mutex. We might be more accurate, but that's another
 *	   story.
 *	c) we have to lock _three_ objects - parents and victim (if it exists).
 *	   And that - after we got ->i_mutex on parents (until then we don't know
 *	   whether the target exists).  Solution: try to be smart with locking
 *	   order for inodes.  We rely on the fact that tree topology may change
 *	   only under ->s_vfs_rename_mutex _and_ that parent of the object we
 *	   move will be locked.  Thus we can rank directories by the tree
 *	   (ancestors first) and rank all non-directories after them.
 *	   That works since everybody except rename does "lock parent, lookup,
 *	   lock child" and rename is under ->s_vfs_rename_mutex.
 *	   HOWEVER, it relies on the assumption that any object with ->lookup()
 *	   has no more than 1 dentry.  If "hybrid" objects will ever appear,
 *	   we'd better make sure that there's no link(2) for them.
 *	d) some filesystems don't support opened-but-unlinked directories,
 *	   either because of layout or because they are not ready to deal with
 *	   all cases correctly. The latter will be fixed (taking this sort of
 *	   stuff into VFS), but the former is not going away. Solution: the same
 *	   trick as in rmdir().
 *	e) conversion from fhandle to dentry may come in the wrong moment - when
 *	   we are removing the target. Solution: we will have to grab ->i_mutex
 *	   in the fhandle_to_dentry code. [FIXME - current nfsfh.c relies on
 *	   ->i_mutex on parents, which works but leads to some truly excessive
 *	   locking].
 */
static int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
			  struct inode *new_dir, struct dentry *new_dentry)
{
	int error = 0;
	struct inode *target;

	/*
	 * If we are going to change the parent - check write permissions,
	 * we'll need to flip '..'.
	 */
	if (new_dir != old_dir) {
		error = inode_permission(old_dentry->d_inode, MAY_WRITE);
		if (error)
			return error;
	}

	error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry);
	if (error)
		return error;

	target = new_dentry->d_inode;
	if (target) {
		mutex_lock(&target->i_mutex);
		dentry_unhash(new_dentry);
	}
	if (d_mountpoint(old_dentry) || d_mountpoint(new_dentry))
		error = -EBUSY;
	else
		error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
	if (target) {
		if (!error)
			target->i_flags |= S_DEAD;
		mutex_unlock(&target->i_mutex);
		if (d_unhashed(new_dentry))
			d_rehash(new_dentry);
		dput(new_dentry);
	}
	if (!error)
		if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
			d_move(old_dentry, new_dentry);
	return error;
}

static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
			    struct inode *new_dir, struct dentry *new_dentry)
{
	struct inode *target;
	int error;

	error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry);
	if (error)
		return error;

	dget(new_dentry);
	target = new_dentry->d_inode;
	if (target)
		mutex_lock(&target->i_mutex);
	if (d_mountpoint(old_dentry) || d_mountpoint(new_dentry))
		error = -EBUSY;
	else
		error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
	if (!error) {
		if (target)
			target->i_flags |= S_DEAD;
		if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
			d_move(old_dentry, new_dentry);
	}
	if (target)
		mutex_unlock(&target->i_mutex);
	dput(new_dentry);
	return error;
}

int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
	       struct inode *new_dir, struct dentry *new_dentry)
{
	int error;
	int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
	const char *old_name;

	if (old_dentry->d_inode == new_dentry->d_inode)
		return 0;

	error = may_delete(old_dir, old_dentry, is_dir);
	if (error)
		return error;

	if (!new_dentry->d_inode)
		error = may_create(new_dir, new_dentry);
	else
		error = may_delete(new_dir, new_dentry, is_dir);
	if (error)
		return error;

	if (!old_dir->i_op->rename)
		return -EPERM;

	vfs_dq_init(old_dir);
	vfs_dq_init(new_dir);

	old_name = fsnotify_oldname_init(old_dentry->d_name.name);

	if (is_dir)
		error = vfs_rename_dir(old_dir, old_dentry, new_dir, new_dentry);
	else
		error = vfs_rename_other(old_dir, old_dentry, new_dir, new_dentry);
	if (!error)
		fsnotify_move(old_dir, new_dir, old_name, is_dir,
			      new_dentry->d_inode, old_dentry);
	fsnotify_oldname_free(old_name);

	return error;
}

SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
		int, newdfd, const char __user *, newname)
{
	struct dentry *old_dir, *new_dir;
	struct dentry *old_dentry, *new_dentry;
	struct dentry *trap;
	struct nameidata oldnd, newnd;
	char *from;
	char *to;
	int error;

	error = user_path_parent(olddfd, oldname, &oldnd, &from);
	if (error)
		goto exit;

	error = user_path_parent(newdfd, newname, &newnd, &to);
	if (error)
		goto exit1;

	error = -EXDEV;
	if (oldnd.path.mnt != newnd.path.mnt)
		goto exit2;

	old_dir = oldnd.path.dentry;
	error = -EBUSY;
	if (oldnd.last_type != LAST_NORM)
		goto exit2;

	new_dir = newnd.path.dentry;
	if (newnd.last_type != LAST_NORM)
		goto exit2;

	oldnd.flags &= ~LOOKUP_PARENT;
	newnd.flags &= ~LOOKUP_PARENT;
	newnd.flags |= LOOKUP_RENAME_TARGET;

	trap = lock_rename(new_dir, old_dir);

	old_dentry = lookup_hash(&oldnd);
	error = PTR_ERR(old_dentry);
	if (IS_ERR(old_dentry))
		goto exit3;
	/* source must exist */
	error = -ENOENT;
	if (!old_dentry->d_inode)
		goto exit4;
	/* unless the source is a directory trailing slashes give -ENOTDIR */
	if (!S_ISDIR(old_dentry->d_inode->i_mode)) {
		error = -ENOTDIR;
		if (oldnd.last.name[oldnd.last.len])
			goto exit4;
		if (newnd.last.name[newnd.last.len])
			goto exit4;
	}
	/* source should not be ancestor of target */
	error = -EINVAL;
	if (old_dentry == trap)
		goto exit4;
	new_dentry = lookup_hash(&newnd);
	error = PTR_ERR(new_dentry);
	if (IS_ERR(new_dentry))
		goto exit4;
	/* target should not be an ancestor of source */
	error = -ENOTEMPTY;
	if (new_dentry == trap)
		goto exit5;

	error = mnt_want_write(oldnd.path.mnt);
	if (error)
		goto exit5;
	error = security_path_rename(&oldnd.path, old_dentry,
				     &newnd.path, new_dentry);
	if (error)
		goto exit6;
	error = vfs_rename(old_dir->d_inode, old_dentry,
			   new_dir->d_inode, new_dentry);
exit6:
	mnt_drop_write(oldnd.path.mnt);
exit5:
	dput(new_dentry);
exit4:
	dput(old_dentry);
exit3:
	unlock_rename(new_dir, old_dir);
exit2:
	path_put(&newnd.path);
	putname(to);
exit1:
	path_put(&oldnd.path);
	putname(from);
exit:
	return error;
}

SYSCALL_DEFINE2(rename, const char __user *, oldname, const char __user *, newname)
{
	return sys_renameat(AT_FDCWD, oldname, AT_FDCWD, newname);
}
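
/*
 * Illustration only, not part of namei.c: a minimal userspace sketch of the
 * two "trap" checks in renameat above. Moving a directory into its own
 * subtree should fail with EINVAL (source is an ancestor of the target), and
 * moving a directory onto one of its ancestors should fail with ENOTEMPTY.
 * The helper name rename_trap_errno() and the /tmp scratch layout are
 * assumptions made for this example; it assumes Linux and a writable /tmp.
 */

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: build top/a/b, then attempt one of two renames and
 * return the resulting errno (0 if the rename unexpectedly succeeds, -1 if
 * the scratch tree cannot be set up).
 *   into_own_subtree != 0: rename("top/a", "top/a/b/c")
 *   into_own_subtree == 0: rename("top/a/b", "top/a")
 */
static int rename_trap_errno(int into_own_subtree)
{
	char top[] = "/tmp/rendemoXXXXXX";
	char a[64], b[64], c[64];
	int err = 0;

	/* The longest path is top plus "/a/b/c". */
	assert(sizeof(top) + 6 <= sizeof(c));

	if (!mkdtemp(top))
		return -1;
	snprintf(a, sizeof(a), "%s/a", top);
	snprintf(b, sizeof(b), "%s/a/b", top);
	snprintf(c, sizeof(c), "%s/a/b/c", top);

	if (mkdir(a, 0700) != 0 || mkdir(b, 0700) != 0)
		err = -1;
	else if (into_own_subtree && rename(a, c) != 0)
		err = errno;	/* old_dentry == trap */
	else if (!into_own_subtree && rename(b, a) != 0)
		err = errno;	/* new_dentry == trap */

	/* Best-effort cleanup of the scratch tree. */
	rmdir(b);
	rmdir(a);
	rmdir(top);
	return err;
}
```

/*
 * Both error codes are chosen before the comparison with "trap", following
 * the "set error, then test" convention used throughout renameat.
 */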

int vfs_readlink(struct dentry *dentry, char __user *buffer, int buflen, const char *link)
{
	int len;

	len = PTR_ERR(link);
	if (IS_ERR(link))
		goto out;

	len = strlen(link);
	if (len > (unsigned) buflen)
		len = buflen;
	if (copy_to_user(buffer, link, len))
		len = -EFAULT;
out:
	return len;
}
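
/*
 * Illustration only, not part of namei.c: a minimal userspace sketch of the
 * truncation rule in vfs_readlink() above. The link body is clamped to the
 * caller's buffer length and copied without a terminating NUL, so readlink(2)
 * returns min(strlen(target), buflen). The helper name readlink_len() and
 * the /tmp scratch path are assumptions made for this example; it assumes a
 * writable /tmp on a filesystem that supports symlinks.
 */

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a symlink pointing at target, read it back
 * into a buffer of buflen bytes, and return what readlink(2) returned
 * (-1 if the scratch symlink cannot be created).
 */
static ssize_t readlink_len(const char *target, size_t buflen)
{
	char dir[] = "/tmp/rldemoXXXXXX";
	char path[64];
	char buf[256];
	ssize_t n = -1;

	assert(buflen > 0 && buflen <= sizeof(buf));

	if (!mkdtemp(dir))
		return -1;
	snprintf(path, sizeof(path), "%s/l", dir);

	if (symlink(target, path) == 0) {
		n = readlink(path, buf, buflen);	/* no NUL is appended */
		unlink(path);
	}
	rmdir(dir);
	return n;
}
```

/*
 * A caller that needs a C string therefore has to append the terminator
 * itself and cannot tell truncation apart from an exact fit by the return
 * value alone.
 */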

/*
 * A helper for ->readlink().  This should be used *ONLY* for symlinks that
 * have ->follow_link() touching nd only in nd_set_link().  Using (or not
 * using) it for any given inode is up to filesystem.
 */
int generic_readlink(struct dentry *dentry, char __user *buffer, int buflen)
{
	struct nameidata nd;
	void *cookie;
	int res;

	nd.depth = 0;
	cookie = dentry->d_inode->i_op->follow_link(dentry, &nd);
	if (IS_ERR(cookie))
		return PTR_ERR(cookie);

	res = vfs_readlink(dentry, buffer, buflen, nd_get_link(&nd));
	if (dentry->d_inode->i_op->put_link)
		dentry->d_inode->i_op->put_link(dentry, &nd, cookie);
	return res;
}

int vfs_follow_link(struct nameidata *nd, const char *link)
{
	return __vfs_follow_link(nd, link);
}

/* get the link contents into pagecache */
static char *page_getlink(struct dentry *dentry, struct page **ppage)
{
	char *kaddr;
	struct page *page;
	struct address_space *mapping = dentry->d_inode->i_mapping;

	page = read_mapping_page(mapping, 0, NULL);
	if (IS_ERR(page))
		return (char *)page;
	*ppage = page;
	kaddr = kmap(page);
	nd_terminate_link(kaddr, dentry->d_inode->i_size, PAGE_SIZE - 1);
	return kaddr;
}

int page_readlink(struct dentry *dentry, char __user *buffer, int buflen)
{
	struct page *page = NULL;
	char *s = page_getlink(dentry, &page);
	int res = vfs_readlink(dentry, buffer, buflen, s);

	if (page) {
		kunmap(page);
		page_cache_release(page);
	}
	return res;
}

void *page_follow_link_light(struct dentry *dentry, struct nameidata *nd)
{
	struct page *page = NULL;

	nd_set_link(nd, page_getlink(dentry, &page));
	return page;
}

void page_put_link(struct dentry *dentry, struct nameidata *nd, void *cookie)
{
	struct page *page = cookie;

	if (page) {
		kunmap(page);
		page_cache_release(page);
	}
}
fs: symlink write_begin allocation context fix
With the write_begin/write_end aops, page_symlink was broken because it
could no longer pass a GFP_NOFS type mask into the point where the
allocations happened. They are done in write_begin, which would always
assume that the filesystem can be entered from reclaim. This bug could
cause filesystem deadlocks.
The funny thing with having a gfp_t mask there is that it doesn't really
allow the caller to arbitrarily tinker with the context in which it can be
called. It couldn't ever be GFP_ATOMIC, for example, because it needs to
take the page lock. The only thing any callers care about is __GFP_FS
anyway, so turn that into a single flag.
Add a new flag for write_begin, AOP_FLAG_NOFS. Filesystems can now act on
this flag in their write_begin function. Change __grab_cache_page to
accept a nofs argument as well, to honour that flag (while we're there,
change the name to grab_cache_page_write_begin which is more instructive
and does away with random leading underscores).
This is really a more flexible way to go in the end anyway -- if a
filesystem happens to want any extra allocations aside from the pagecache
ones in its write_begin function, it may now use GFP_KERNEL (rather than
GFP_NOFS) for common case allocations (eg. ocfs2_alloc_write_ctxt, for a
random example).
[kosaki.motohiro@jp.fujitsu.com: fix ubifs]
[kosaki.motohiro@jp.fujitsu.com: fix fuse]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@kernel.org> [2.6.28.x]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Cleaned up the calling convention: just pass in the AOP flags
untouched to the grab_cache_page_write_begin() function. That
just simplifies everybody, and may even allow future expansion of the
logic. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
/*
 * The nofs argument instructs pagecache_write_begin to pass AOP_FLAG_NOFS
 */
int __page_symlink(struct inode *inode, const char *symname, int len, int nofs)
{
	struct address_space *mapping = inode->i_mapping;
	struct page *page;
	void *fsdata;
	int err;
	char *kaddr;
	unsigned int flags = AOP_FLAG_UNINTERRUPTIBLE;

	if (nofs)
		flags |= AOP_FLAG_NOFS;

retry:
	err = pagecache_write_begin(NULL, mapping, 0, len - 1,
				    flags, &page, &fsdata);
	if (err)
		goto fail;

	kaddr = kmap_atomic(page, KM_USER0);
	memcpy(kaddr, symname, len - 1);
	kunmap_atomic(kaddr, KM_USER0);

	err = pagecache_write_end(NULL, mapping, 0, len - 1, len - 1,
				  page, fsdata);
	if (err < 0)
		goto fail;
	if (err < len - 1)
		goto retry;

	mark_inode_dirty(inode);
	return 0;
fail:
	return err;
}

int page_symlink(struct inode *inode, const char *symname, int len)
{
	return __page_symlink(inode, symname, len,
			!(mapping_gfp_mask(inode->i_mapping) & __GFP_FS));
}

const struct inode_operations page_symlink_inode_operations = {
	.readlink	= generic_readlink,
	.follow_link	= page_follow_link_light,
	.put_link	= page_put_link,
};

EXPORT_SYMBOL(user_path_at);
EXPORT_SYMBOL(follow_down);
EXPORT_SYMBOL(follow_up);
EXPORT_SYMBOL(get_write_access); /* binfmt_aout */
EXPORT_SYMBOL(getname);
EXPORT_SYMBOL(lock_rename);
EXPORT_SYMBOL(lookup_one_len);
EXPORT_SYMBOL(page_follow_link_light);
EXPORT_SYMBOL(page_put_link);
EXPORT_SYMBOL(page_readlink);
EXPORT_SYMBOL(__page_symlink);
EXPORT_SYMBOL(page_symlink);
EXPORT_SYMBOL(page_symlink_inode_operations);
EXPORT_SYMBOL(path_lookup);
EXPORT_SYMBOL(kern_path);
fs: introduce vfs_path_lookup
Stackable file systems, among others, frequently need to lookup paths or
path components starting from an arbitrary point in the namespace
(identified by a dentry and a vfsmount). Currently, such file systems use
lookup_one_len, which is frowned upon [1] as it does not pass the lookup
intent along; not passing a lookup intent, for example, can trigger BUG_ON's
when stacking on top of NFSv4.
The first patch introduces a new lookup function to allow lookup starting
from an arbitrary point in the namespace. This approach has been suggested
by Christoph Hellwig [2].
The second patch changes sunrpc to use vfs_path_lookup.
The third patch changes nfsctl.c to use vfs_path_lookup.
The fourth patch marks link_path_walk static.
The fifth, and last patch, unexports path_walk because it is no longer
necessary to call it directly, and using the new vfs_path_lookup is
cleaner.
For example, the following snippet of code, looks up "some/path/component"
in a directory pointed to by parent_{dentry,vfsmnt}:
err = vfs_path_lookup(parent_dentry, parent_vfsmnt,
"some/path/component", 0, &nd);
if (!err) {
/* exists */
...
/* once done, release the references */
path_release(&nd);
} else if (err == -ENOENT) {
/* doesn't exist */
} else {
/* other error */
}
VFS functions such as lookup_create can be used on the nameidata structure
to pass the create intent to the file system.
Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@suse.de>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
EXPORT_SYMBOL(vfs_path_lookup);
EXPORT_SYMBOL(inode_permission);
EXPORT_SYMBOL(file_permission);
EXPORT_SYMBOL(unlock_rename);
EXPORT_SYMBOL(vfs_create);
EXPORT_SYMBOL(vfs_follow_link);
EXPORT_SYMBOL(vfs_link);
EXPORT_SYMBOL(vfs_mkdir);
EXPORT_SYMBOL(vfs_mknod);
EXPORT_SYMBOL(generic_permission);
EXPORT_SYMBOL(vfs_readlink);
EXPORT_SYMBOL(vfs_rename);
EXPORT_SYMBOL(vfs_rmdir);
EXPORT_SYMBOL(vfs_symlink);
EXPORT_SYMBOL(vfs_unlink);
EXPORT_SYMBOL(dentry_unhash);
EXPORT_SYMBOL(generic_readlink);