2005-04-16 15:20:36 -07:00
/*
 * proc/fs/generic.c --- generic routines for the proc-fs
 *
 * This file contains generic proc-fs routines for handling
 * directories and files.
 *
 * Copyright (C) 1991, 1992 Linus Torvalds.
 * Copyright (C) 1997 Theodore Ts'o
 */
#include <linux/errno.h>
#include <linux/time.h>
#include <linux/proc_fs.h>
#include <linux/stat.h>
2010-06-04 11:30:02 +02:00
#include <linux/mm.h>
2005-04-16 15:20:36 -07:00
#include <linux/module.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 17:04:11 +09:00
#include <linux/slab.h>
2013-02-27 17:03:16 -08:00
#include <linux/printk.h>
2005-04-16 15:20:36 -07:00
#include <linux/mount.h>
#include <linux/init.h>
#include <linux/idr.h>
#include <linux/bitops.h>
2006-03-26 01:36:55 -08:00
#include <linux/spinlock.h>
Fix rmmod/read/write races in /proc entries
Fix following races:
===========================================
1. Write via ->write_proc sleeps in copy_from_user(). Module disappears
meanwhile. Or, more generically, system call done on /proc file, method
supplied by module is called, module disappears meanwhile.
pde = create_proc_entry()
if (!pde)
return -ENOMEM;
pde->write_proc = ...
open
write
copy_from_user
pde = create_proc_entry();
if (!pde) {
remove_proc_entry();
return -ENOMEM;
/* module unloaded */
}
*boom*
==========================================
2. bogo-revoke aka proc_kill_inodes()
remove_proc_entry vfs_read
proc_kill_inodes [check ->f_op validness]
[check ->f_op->read validness]
[verify_area, security permissions checks]
->f_op = NULL;
if (file->f_op->read)
/* ->f_op dereference, boom */
NOTE, NOTE, NOTE: file_operations are proxied for regular files only. Let's
see how this scheme behaves, then extend if needed for directories.
Directories creators in /proc only set ->owner for them, so proxying for
directories may be unneeded.
NOTE, NOTE, NOTE: methods being proxied are ->llseek, ->read, ->write,
->poll, ->unlocked_ioctl, ->ioctl, ->compat_ioctl, ->open, ->release.
If your in-tree module uses something else, yell at me. Full audit pending.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-15 23:39:00 -07:00
#include <linux/completion.h>
2005-04-16 15:20:36 -07:00
#include <asm/uaccess.h>
2006-01-08 01:04:16 -08:00
#include "internal.h"
2015-09-09 15:35:57 -07:00
static DEFINE_RWLOCK(proc_subdir_lock);
2006-03-26 01:36:55 -08:00
2011-03-23 16:42:52 -07:00
static int proc_match(unsigned int len, const char *name, struct proc_dir_entry *de)
2005-04-16 15:20:36 -07:00
{
2014-12-10 15:45:01 -08:00
	if (len < de->namelen)
		return -1;
	if (len > de->namelen)
		return 1;
	return memcmp(name, de->name, len);
}
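proc_match() orders names by length first and only falls back to memcmp() when the lengths are equal, so a shorter name always sorts before a longer one regardless of its bytes. The block below is a minimal userspace sketch of that ordering. It assumes a hypothetical struct name_entry and function name_match() in place of struct proc_dir_entry and proc_match(); neither name exists in the kernel.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the name fields of struct proc_dir_entry. */
struct name_entry {
	const char *name;
	unsigned int namelen;
};

/*
 * Same ordering as proc_match(): shorter names sort first, and memcmp()
 * only breaks ties between names of equal length.
 */
static int name_match(unsigned int len, const char *name,
		      const struct name_entry *e)
{
	if (len < e->namelen)
		return -1;
	if (len > e->namelen)
		return 1;
	return memcmp(name, e->name, len);
}
```

With an entry named "tty", the two-byte name "zz" compares as smaller and the four-byte name "aaaa" compares as larger, because the length is decisive before any byte is examined.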
static struct proc_dir_entry *pde_subdir_first(struct proc_dir_entry *dir)
{
2014-12-10 15:45:07 -08:00
	return rb_entry_safe(rb_first(&dir->subdir), struct proc_dir_entry,
			     subdir_node);
2014-12-10 15:45:01 -08:00
}
static struct proc_dir_entry *pde_subdir_next(struct proc_dir_entry *dir)
{
2014-12-10 15:45:07 -08:00
	return rb_entry_safe(rb_next(&dir->subdir_node), struct proc_dir_entry,
			     subdir_node);
2014-12-10 15:45:01 -08:00
}
static struct proc_dir_entry *pde_subdir_find(struct proc_dir_entry *dir,
					      const char *name,
					      unsigned int len)
{
	struct rb_node *node = dir->subdir.rb_node;
	while (node) {
		struct proc_dir_entry *de = container_of(node,
							 struct proc_dir_entry,
							 subdir_node);
		int result = proc_match(len, name, de);
		if (result < 0)
			node = node->rb_left;
		else if (result > 0)
			node = node->rb_right;
		else
			return de;
	}
	return NULL;
}
static bool pde_subdir_insert(struct proc_dir_entry *dir,
			      struct proc_dir_entry *de)
{
	struct rb_root *root = &dir->subdir;
	struct rb_node **new = &root->rb_node, *parent = NULL;
	/* Figure out where to put new node */
	while (*new) {
		struct proc_dir_entry *this =
			container_of(*new, struct proc_dir_entry, subdir_node);
		int result = proc_match(de->namelen, de->name, this);
		parent = *new;
		if (result < 0)
			new = &(*new)->rb_left;
		else if (result > 0)
			new = &(*new)->rb_right;
		else
			return false;
	}
	/* Add new node and rebalance tree. */
	rb_link_node(&de->subdir_node, parent, new);
	rb_insert_color(&de->subdir_node, root);
	return true;
2005-04-16 15:20:36 -07:00
}
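The find and insert helpers above share one descent loop: compare, go left or right, and stop on an exact match, which for insertion means the name is already registered. The sketch below shows that descent in userspace. It deliberately swaps the kernel red-black tree for a plain unbalanced binary search tree, so there is no rebalancing step, and the names struct node, node_match(), tree_find() and tree_insert() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node type; an unbalanced BST, not the kernel rbtree. */
struct node {
	const char *name;
	unsigned int namelen;
	struct node *left, *right;
};

/* Length-first ordering, as in proc_match(). */
static int node_match(unsigned int len, const char *name, const struct node *n)
{
	if (len < n->namelen)
		return -1;
	if (len > n->namelen)
		return 1;
	return memcmp(name, n->name, len);
}

/* Descend until the name is found or a NULL link is reached. */
static struct node *tree_find(struct node *root, const char *name,
			      unsigned int len)
{
	while (root) {
		int result = node_match(len, name, root);

		if (result < 0)
			root = root->left;
		else if (result > 0)
			root = root->right;
		else
			return root;
	}
	return NULL;
}

/* Link n at the NULL slot the descent ends on; refuse duplicate names. */
static bool tree_insert(struct node **root, struct node *n)
{
	struct node **link = root;

	while (*link) {
		int result = node_match(n->namelen, n->name, *link);

		if (result < 0)
			link = &(*link)->left;
		else if (result > 0)
			link = &(*link)->right;
		else
			return false;	/* name already present */
	}
	n->left = n->right = NULL;
	*link = n;
	return true;
}
```

Tracking a pointer to the link rather than the node itself is the same idea as the `struct rb_node **new` cursor in pde_subdir_insert(): the loop ends holding the exact slot where the new node belongs.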
static int proc_notify_change(struct dentry *dentry, struct iattr *iattr)
{
2015-03-17 22:25:59 +00:00
	struct inode *inode = d_inode(dentry);
2005-04-16 15:20:36 -07:00
	struct proc_dir_entry *de = PDE(inode);
	int error;
	error = inode_change_ok(inode, iattr);
	if (error)
2010-06-04 11:30:02 +02:00
		return error;
2005-04-16 15:20:36 -07:00
2010-06-04 11:30:02 +02:00
	setattr_copy(inode, iattr);
	mark_inode_dirty(inode);
2012-12-15 11:48:48 +01:00
2014-01-23 15:55:41 -08:00
	proc_set_user(de, inode->i_uid, inode->i_gid);
2005-04-16 15:20:36 -07:00
	de->mode = inode->i_mode;
2010-06-04 11:30:02 +02:00
	return 0;
2005-04-16 15:20:36 -07:00
}
2005-09-06 15:17:18 -07:00
static int proc_getattr(struct vfsmount *mnt, struct dentry *dentry,
			struct kstat *stat)
{
2015-03-17 22:25:59 +00:00
	struct inode *inode = d_inode(dentry);
2015-02-12 15:01:03 -08:00
	struct proc_dir_entry *de = PDE(inode);
2005-09-06 15:17:18 -07:00
	if (de && de->nlink)
2011-10-28 14:13:29 +02:00
		set_nlink(inode, de->nlink);
2005-09-06 15:17:18 -07:00
	generic_fillattr(inode, stat);
	return 0;
}
2007-02-12 00:55:40 -08:00
static const struct inode_operations proc_file_inode_operations = {
2005-04-16 15:20:36 -07:00
	.setattr = proc_notify_change,
};
/*
 * This function parses a name such as "tty/driver/serial", and
 * returns the struct proc_dir_entry for "/proc/tty/driver", and
 * returns "serial" in residual.
 */
2010-03-05 13:43:59 -08:00
static int __xlate_proc_name(const char *name, struct proc_dir_entry **ret,
			     const char **residual)
2005-04-16 15:20:36 -07:00
{
	const char *cp = name, *next;
	struct proc_dir_entry *de;
2011-03-23 16:42:52 -07:00
	unsigned int len;
2005-04-16 15:20:36 -07:00
proc: less special case in xlate code
If valid "parent" is passed to proc_create/remove_proc_entry(), then name of
PDE should consist of only one path component, otherwise creation or
removal will fail. However, if NULL is passed as parent then create/remove
accept full path as an argument. This is arbitrary restriction -- all
infrastructure is in place.
So, patch allows the following to succeed:
create_proc_entry("foo/bar", 0, pde_baz);
remove_proc_entry("baz/foo/bar", &proc_root);
Also makes the following to behave identically:
create_proc_entry("foo/bar", 0, NULL);
create_proc_entry("foo/bar", 0, &proc_root);
Discrepancy noticed by Den Lunev (IIRC).
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 01:01:40 -07:00
	de = *ret;
	if (!de)
		de = &proc_root;
2005-04-16 15:20:36 -07:00
	while (1) {
		next = strchr(cp, '/');
		if (!next)
			break;
		len = next - cp;
2014-12-10 15:45:01 -08:00
		de = pde_subdir_find(de, cp, len);
2010-03-05 13:44:00 -08:00
		if (!de) {
			WARN(1, "name '%s'\n", name);
2010-03-05 13:43:59 -08:00
			return -ENOENT;
2010-03-05 13:44:00 -08:00
		}
2005-04-16 15:20:36 -07:00
		cp += len + 1;
	}
	*residual = cp;
	*ret = de;
2010-03-05 13:43:59 -08:00
	return 0;
}
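The loop in __xlate_proc_name() treats every component followed by '/' as a directory to descend into and hands back whatever follows the last '/' as the residual. The sketch below reproduces only the string walk in userspace: it counts the directory components instead of looking them up, so there is no equivalent of the -ENOENT path. The function name split_residual() and its ndirs parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Walk "a/b/c" the way __xlate_proc_name() walks its name argument.
 * Returns a pointer to the residual (the part after the last '/') and
 * stores the number of directory components skipped in *ndirs.
 * No lookup is performed here; the kernel calls pde_subdir_find() for
 * each component and fails if one is missing.
 */
static const char *split_residual(const char *name, unsigned int *ndirs)
{
	const char *cp = name, *next;
	unsigned int count = 0;

	while ((next = strchr(cp, '/')) != NULL) {
		count++;
		cp = next + 1;	/* same step as cp += len + 1 */
	}
	if (ndirs)
		*ndirs = count;
	return cp;
}
```

For "tty/driver/serial" the walk skips two directory components and leaves the residual pointing at "serial"; a name without any '/' is returned unchanged with a count of zero.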
static int xlate_proc_name(const char *name, struct proc_dir_entry **ret,
			   const char **residual)
{
	int rv;
2015-09-09 15:35:57 -07:00
	read_lock(&proc_subdir_lock);
2010-03-05 13:43:59 -08:00
	rv = __xlate_proc_name(name, ret, residual);
2015-09-09 15:35:57 -07:00
	read_unlock(&proc_subdir_lock);
2010-03-05 13:43:59 -08:00
	return rv;
2005-04-16 15:20:36 -07:00
}
2008-07-26 11:21:37 +04:00
static DEFINE_IDA(proc_inum_ida);
2005-04-16 15:20:36 -07:00
static DEFINE_SPINLOCK(proc_inum_lock); /* protects the above */
2008-07-26 11:18:28 +04:00
#define PROC_DYNAMIC_FIRST 0xF0000000U
2005-04-16 15:20:36 -07:00
/*
 * Store an inode number between PROC_DYNAMIC_FIRST and 0xffffffff
 * in *inum and return 0, or return a negative errno on failure.
 */
2011-06-17 13:33:20 -07:00
int proc_alloc_inum(unsigned int *inum)
2005-04-16 15:20:36 -07:00
{
2008-07-26 11:18:28 +04:00
	unsigned int i;
2005-04-16 15:20:36 -07:00
	int error;
retry:
2011-06-17 13:33:20 -07:00
	if (!ida_pre_get(&proc_inum_ida, GFP_KERNEL))
		return -ENOMEM;
2005-04-16 15:20:36 -07:00
2012-12-21 20:38:00 -08:00
	spin_lock_irq(&proc_inum_lock);
2008-07-26 11:21:37 +04:00
	error = ida_get_new(&proc_inum_ida, &i);
2012-12-21 20:38:00 -08:00
	spin_unlock_irq(&proc_inum_lock);
2005-04-16 15:20:36 -07:00
	if (error == -EAGAIN)
		goto retry;
	else if (error)
2011-06-17 13:33:20 -07:00
		return error;
2005-04-16 15:20:36 -07:00
2008-07-26 11:18:28 +04:00
	if (i > UINT_MAX - PROC_DYNAMIC_FIRST) {
2012-12-21 20:38:00 -08:00
		spin_lock_irq(&proc_inum_lock);
2008-07-26 11:21:37 +04:00
		ida_remove(&proc_inum_ida, i);
2012-12-21 20:38:00 -08:00
		spin_unlock_irq(&proc_inum_lock);
2011-06-17 13:33:20 -07:00
		return -ENOSPC;
2008-07-26 11:18:28 +04:00
	}
2011-06-17 13:33:20 -07:00
	*inum = PROC_DYNAMIC_FIRST + i;
	return 0;
2005-04-16 15:20:36 -07:00
}
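The range check in proc_alloc_inum() exists because the IDA hands out small ids starting at 0, while the inode number is that id shifted up by PROC_DYNAMIC_FIRST; an id above UINT_MAX - PROC_DYNAMIC_FIRST would wrap around. The sketch below shows only that arithmetic, with no allocator and no locking. It assumes a 32-bit unsigned int, and the names DYNAMIC_FIRST, id_to_inum() and inum_to_id() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

#define DYNAMIC_FIRST 0xF0000000U	/* same value as PROC_DYNAMIC_FIRST */

/*
 * Map an allocator id to an inode number, rejecting ids that would
 * overflow past UINT_MAX (assumes a 32-bit unsigned int).
 */
static int id_to_inum(unsigned int id, unsigned int *inum)
{
	if (id > UINT_MAX - DYNAMIC_FIRST)
		return -ENOSPC;
	*inum = DYNAMIC_FIRST + id;
	return 0;
}

/* Inverse mapping, as used when an inode number is released. */
static unsigned int inum_to_id(unsigned int inum)
{
	return inum - DYNAMIC_FIRST;
}
```

Under that assumption the usable id range is 0 through 0x0FFFFFFF, which maps onto inode numbers 0xF0000000 through 0xFFFFFFFF.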
2011-06-17 13:33:20 -07:00
void proc_free_inum(unsigned int inum)
2005-04-16 15:20:36 -07:00
{
2012-12-21 20:38:00 -08:00
	unsigned long flags;
	spin_lock_irqsave(&proc_inum_lock, flags);
2008-07-26 11:21:37 +04:00
	ida_remove(&proc_inum_ida, inum - PROC_DYNAMIC_FIRST);
2012-12-21 20:38:00 -08:00
	spin_unlock_irqrestore(&proc_inum_lock, flags);
2005-04-16 15:20:36 -07:00
}
/*
 * Don't create negative dentries here, return -ENOENT by hand
 * instead.
 */
[NET]: Make /proc/net a symlink on /proc/self/net (v3)
Current /proc/net is done with so called "shadows", but current
implementation is broken and has little chances to get fixed.
The problem is that dentries subtree of /proc/net directory has
fancy revalidation rules to make processes living in different
net namespaces see different entries in /proc/net subtree, but
currently, tasks see in the /proc/net subdir the contents of any
other namespace, depending on who opened the file first.
The proposed fix is to turn /proc/net into a symlink, which points
to /proc/self/net, which in turn shows what previously was in
/proc/net - the network-related info, from the net namespace the
appropriate task lives in.
# ls -l /proc/net
lrwxrwxrwx 1 root root 8 Mar 5 15:17 /proc/net -> self/net
In other words - this behaves like /proc/mounts, but unlike
"mounts", "net" is not a file, but a directory.
Changes from v2:
* Fixed discrepancy of /proc/net nlink count and selinux labeling
screwup pointed out by Stephen.
To get the correct nlink count the ->getattr callback for /proc/net
is overridden to read one from the net->proc_net entry.
To make selinux still work the net->proc_net entry is initialized
properly, i.e. with the "net" name and the proc_net parent.
Selinux fixes are
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Changes from v1:
* Fixed a task_struct leak in get_proc_task_net, pointed out by Paul.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-07 11:08:40 -08:00
struct dentry *proc_lookup_de(struct proc_dir_entry *de, struct inode *dir,
			      struct dentry *dentry)
2005-04-16 15:20:36 -07:00
{
2013-01-25 20:11:22 -05:00
	struct inode *inode;
2005-04-16 15:20:36 -07:00
2015-09-09 15:35:57 -07:00
	read_lock(&proc_subdir_lock);
2014-12-10 15:45:01 -08:00
	de = pde_subdir_find(de, dentry->d_name.name, dentry->d_name.len);
	if (de) {
		pde_get(de);
2015-09-09 15:35:57 -07:00
		read_unlock(&proc_subdir_lock);
2014-12-10 15:45:01 -08:00
		inode = proc_get_inode(dir->i_sb, de);
		if (!inode)
			return ERR_PTR(-ENOMEM);
		d_set_d_op(dentry, &simple_dentry_operations);
		d_add(dentry, inode);
		return NULL;
2005-04-16 15:20:36 -07:00
	}
2015-09-09 15:35:57 -07:00
	read_unlock(&proc_subdir_lock);
2013-01-25 20:11:22 -05:00
	return ERR_PTR(-ENOENT);
2005-04-16 15:20:36 -07:00
}
2008-03-07 11:08:40 -08:00
struct dentry *proc_lookup(struct inode *dir, struct dentry *dentry,
2012-06-10 17:13:09 -04:00
			   unsigned int flags)
2008-03-07 11:08:40 -08:00
{
	return proc_lookup_de(PDE(dir), dir, dentry);
}
2005-04-16 15:20:36 -07:00
/*
 * This returns non-zero if at EOF, so that the /proc
 * root directory can use this and check if it should
 * continue with the <pid> entries..
 *
 * Note that the VFS-layer doesn't care about the return
 * value of the readdir() call, as long as it's non-negative
 * for success..
 */
2013-05-16 12:07:31 -04:00
int proc_readdir_de(struct proc_dir_entry *de, struct file *file,
		    struct dir_context *ctx)
2005-04-16 15:20:36 -07:00
{
	int i;
2013-05-16 12:07:31 -04:00
	if (!dir_emit_dots(file, ctx))
		return 0;
2015-09-09 15:35:57 -07:00
	read_lock(&proc_subdir_lock);
2014-12-10 15:45:01 -08:00
	de = pde_subdir_first(de);
2013-05-16 12:07:31 -04:00
	i = ctx->pos - 2;
	for (;;) {
		if (!de) {
2015-09-09 15:35:57 -07:00
			read_unlock(&proc_subdir_lock);
2013-05-16 12:07:31 -04:00
			return 0;
		}
		if (!i)
			break;
2014-12-10 15:45:01 -08:00
		de = pde_subdir_next(de);
2013-05-16 12:07:31 -04:00
		i--;
2005-04-16 15:20:36 -07:00
	}
2013-05-16 12:07:31 -04:00
	do {
		struct proc_dir_entry *next;
		pde_get(de);
2015-09-09 15:35:57 -07:00
		read_unlock(&proc_subdir_lock);
2013-05-16 12:07:31 -04:00
		if (!dir_emit(ctx, de->name, de->namelen,
			      de->low_ino, de->mode >> 12)) {
			pde_put(de);
			return 0;
		}
2015-09-09 15:35:57 -07:00
		read_lock(&proc_subdir_lock);
2013-05-16 12:07:31 -04:00
		ctx->pos++;
2014-12-10 15:45:01 -08:00
		next = pde_subdir_next(de);
2013-05-16 12:07:31 -04:00
		pde_put(de);
		de = next;
	} while (de);
2015-09-09 15:35:57 -07:00
	read_unlock(&proc_subdir_lock);
2013-08-19 16:26:12 -07:00
	return 1;
2005-04-16 15:20:36 -07:00
}
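proc_readdir_de() is resumable: positions 0 and 1 belong to "." and "..", so it first skips ctx->pos - 2 entries, then emits until either the caller's buffer is full (return 0) or the last entry has been emitted (return 1). The sketch below models that contract in userspace over a plain array. It is an illustration only: there is no locking or reference counting, the dot entries are assumed to have been handled by the caller, and the name emit_from() and all of its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Emit entries from names[] into out[], resuming at *pos.
 * Returns 1 once the last entry has been emitted, 0 otherwise
 * (buffer full, or position already past the end).
 * Assumes *pos >= 2, i.e. "." and ".." were emitted by the caller.
 */
static int emit_from(const char *const *names, size_t count, long *pos,
		     const char **out, size_t room, size_t *emitted)
{
	size_t i, n = 0;

	*emitted = 0;
	if (*pos < 2)
		return 0;	/* dot entries are the caller's job */
	i = (size_t)(*pos - 2);	/* number of entries to skip */
	if (i >= count)
		return 0;	/* nothing left at this position */

	for (; i < count; i++) {
		if (n == room) {
			*emitted = n;
			return 0;	/* buffer full, resume from *pos */
		}
		out[n++] = names[i];
		(*pos)++;
	}
	*emitted = n;
	return 1;
}
```

With three entries and room for two, the first call emits two entries and leaves the position at 4; a second call from that position emits the remaining entry and reports that the end was reached.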
2013-05-16 12:07:31 -04:00
int proc_readdir(struct file *file, struct dir_context *ctx)
2008-03-07 11:08:40 -08:00
{
2013-05-16 12:07:31 -04:00
	struct inode *inode = file_inode(file);
2008-03-07 11:08:40 -08:00
2013-05-16 12:07:31 -04:00
	return proc_readdir_de(PDE(inode), file, ctx);
2008-03-07 11:08:40 -08:00
}
2005-04-16 15:20:36 -07:00
/*
 * These are the generic /proc directory operations. They
 * use the in-memory "struct proc_dir_entry" tree to parse
 * the /proc directory.
 */
2007-02-12 00:55:34 -08:00
static const struct file_operations proc_dir_operations = {
proc: stop using BKL
There are four BKL users in proc: de_put(), proc_lookup_de(),
proc_readdir_de(), proc_root_readdir(),
1) de_put()
-----------
de_put() is classic atomic_dec_and_test() refcount wrapper -- no BKL
needed. BKL doesn't matter to possible refcount leak as well.
2) proc_lookup_de()
-------------------
Walking PDE list is protected by proc_subdir_lock(), proc_get_inode() is
potentially blocking, all callers of proc_lookup_de() eventually end up
from ->lookup hooks which is protected by directory's ->i_mutex -- BKL
doesn't protect anything.
3) proc_readdir_de()
--------------------
"." and ".." part doesn't need BKL, walking PDE list is under
proc_subdir_lock, calling filldir callback is potentially blocking
because it writes to luserspace. All proc_readdir_de() callers
eventually come from ->readdir hook which is under directory's
->i_mutex -- BKL doesn't protect anything.
4) proc_root_readdir_de()
-------------------------
proc_root_readdir_de is ->readdir hook, see (3).
Since readdir hooks don't use BKL anymore, switch to
generic_file_llseek, since it also takes directory's i_mutex.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-27 22:48:36 +03:00
	.llseek = generic_file_llseek,
2005-04-16 15:20:36 -07:00
	.read = generic_read_dir,
2013-05-16 12:07:31 -04:00
	.iterate = proc_readdir,
2005-04-16 15:20:36 -07:00
};
/*
 * proc directories can do almost nothing..
 */
2007-02-12 00:55:40 -08:00
static const struct inode_operations proc_dir_inode_operations = {
2005-04-16 15:20:36 -07:00
	.lookup = proc_lookup,
2005-09-06 15:17:18 -07:00
	.getattr = proc_getattr,
2005-04-16 15:20:36 -07:00
	.setattr = proc_notify_change,
};
static int proc_register(struct proc_dir_entry *dir, struct proc_dir_entry *dp)
{
2011-06-17 13:33:20 -07:00
	int ret;
2014-12-10 15:45:01 -08:00
2011-06-17 13:33:20 -07:00
	ret = proc_alloc_inum(&dp->low_ino);
	if (ret)
		return ret;
2006-03-26 01:36:55 -08:00
2015-09-09 15:35:57 -07:00
	write_lock(&proc_subdir_lock);
2007-07-15 23:40:09 -07:00
	dp->parent = dir;
2014-12-10 15:45:04 -08:00
	if (pde_subdir_insert(dir, dp) == false) {
2014-12-10 15:45:01 -08:00
		WARN(1, "proc_dir_entry '%s/%s' already registered\n",
		     dir->name, dp->name);
2015-09-09 15:35:57 -07:00
		write_unlock(&proc_subdir_lock);
2014-12-10 15:45:04 -08:00
		proc_free_inum(dp->low_ino);
		return -EEXIST;
	}
2015-09-09 15:35:57 -07:00
	write_unlock(&proc_subdir_lock);
2007-07-15 23:40:09 -07:00
2005-04-16 15:20:36 -07:00
	return 0;
}
proc: fix ->open'less usage due to ->proc_fops flip
Typical PDE creation code looks like:
pde = create_proc_entry("foo", 0, NULL);
if (pde)
pde->proc_fops = &foo_proc_fops;
Notice that PDE is first created, only then ->proc_fops is set up to
final value. This is a problem because right after creation
a) PDE is fully visible in /proc , and
b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's
possible to ->read without ->open (see one class of oopses below).
The fix is new API called proc_create() which makes sure ->proc_fops are
set up before gluing PDE to main tree. Typical new code looks like:
pde = proc_create("foo", 0, NULL, &foo_proc_fops);
if (!pde)
return -ENOMEM;
Fix most networking users for a start.
In the long run, create_proc_entry() for regular files will go.
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024
printing eip: c1188c1b *pdpt = 000000002929e001 *pde = 0000000000000000
Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/block/sda/sda1/dev
Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
Pid: 24679, comm: cat Not tainted (2.6.24-rc3-mm1 #2)
EIP: 0060:[<c1188c1b>] EFLAGS: 00210002 CPU: 0
EIP is at mutex_lock_nested+0x75/0x25d
EAX: 000006fe EBX: fffffffb ECX: 00001000 EDX: e9340570
ESI: 00000020 EDI: 00200246 EBP: e9340570 ESP: e8ea1ef8
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 24679, ti=E8EA1000 task=E9340570 task.ti=E8EA1000)
Stack: 00000000 c106f7ce e8ee05b4 00000000 00000001 458003d0 f6fb6f20 fffffffb
00000000 c106f7aa 00001000 c106f7ce 08ae9000 f6db53f0 00000020 00200246
00000000 00000002 00000000 00200246 00200246 e8ee05a0 fffffffb e8ee0550
Call Trace:
[<c106f7ce>] seq_read+0x24/0x28a
[<c106f7aa>] seq_read+0x0/0x28a
[<c106f7ce>] seq_read+0x24/0x28a
[<c106f7aa>] seq_read+0x0/0x28a
[<c10818b8>] proc_reg_read+0x60/0x73
[<c1081858>] proc_reg_read+0x0/0x73
[<c105a34f>] vfs_read+0x6c/0x8b
[<c105a6f3>] sys_read+0x3c/0x63
[<c10025f2>] sysenter_past_esp+0x5f/0xa5
[<c10697a7>] destroy_inode+0x24/0x33
=======================
INFO: lockdep is turned off.
Code: 75 21 68 e1 1a 19 c1 68 87 00 00 00 68 b8 e8 1f c1 68 25 73 1f c1 e8 84 06 e9 ff e8 52 b8 e7 ff 83 c4 10 9c 5f fa e8 28 89 ea ff <f0> fe 4e 04 79 0a f3 90 80 7e 04 00 7e f8 eb f0 39 76 34 74 33
EIP: [<c1188c1b>] mutex_lock_nested+0x75/0x25d SS:ESP 0068:e8ea1ef8
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 04:18:37 -08:00
static struct proc_dir_entry *__proc_create(struct proc_dir_entry **parent,
2005-04-16 15:20:36 -07:00
					  const char *name,
2011-07-24 03:36:29 -04:00
					  umode_t mode,
2005-04-16 15:20:36 -07:00
					  nlink_t nlink)
{
	struct proc_dir_entry *ent = NULL;
2014-08-08 14:21:25 -07:00
	const char *fn;
	struct qstr qstr;
2005-04-16 15:20:36 -07:00
proc: less special case in xlate code
If valid "parent" is passed to proc_create/remove_proc_entry(), then name of
PDE should consist of only one path component, otherwise creation or
removal will fail. However, if NULL is passed as parent then create/remove
accept full path as an argument. This is an arbitrary restriction -- all
infrastructure is in place.
So, patch allows the following to succeed:
create_proc_entry("foo/bar", 0, pde_baz);
remove_proc_entry("baz/foo/bar", &proc_root);
Also makes the following behave identically:
create_proc_entry("foo/bar", 0, NULL);
create_proc_entry("foo/bar", 0, &proc_root);
Discrepancy noticed by Den Lunev (IIRC).
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 01:01:40 -07:00
	if (xlate_proc_name(name, parent, &fn) != 0)
2005-04-16 15:20:36 -07:00
		goto out;
2014-08-08 14:21:25 -07:00
	qstr.name = fn;
	qstr.len = strlen(fn);
	if (qstr.len == 0 || qstr.len >= 256) {
		WARN(1, "name len %u\n", qstr.len);
		return NULL;
	}
	if (*parent == &proc_root && name_to_int(&qstr) != ~0U) {
		WARN(1, "create '/proc/%s' by hand\n", qstr.name);
		return NULL;
	}
2015-05-11 16:44:25 -05:00
	if (is_empty_pde(*parent)) {
		WARN(1, "attempt to add to permanently empty directory");
		return NULL;
	}
2005-04-16 15:20:36 -07:00
2014-08-08 14:21:25 -07:00
	ent = kzalloc(sizeof(struct proc_dir_entry) + qstr.len + 1, GFP_KERNEL);
2012-10-04 17:15:43 -07:00
	if (!ent)
		goto out;
2005-04-16 15:20:36 -07:00
2014-08-08 14:21:25 -07:00
	memcpy(ent->name, fn, qstr.len + 1);
	ent->namelen = qstr.len;
2005-04-16 15:20:36 -07:00
	ent->mode = mode;
	ent->nlink = nlink;
2014-12-10 15:45:01 -08:00
	ent->subdir = RB_ROOT;
proc: fix proc_dir_entry refcounting
Creating PDEs with refcount 0 and "deleted" flag has problems (see below).
Switch to usual scheme:
* PDE is created with refcount 1
* every de_get does +1
* every de_put() and remove_proc_entry() do -1
* once refcount reaches 0, PDE is freed.
This elegantly fixes at least the following two races (both observed) without
introducing new locks, without abusing old locks, without spreading
lock_kernel():
1) PDE leak

remove_proc_entry                       de_put
-----------------                       ------
                        [refcnt = 1]
if (atomic_read(&de->count) == 0)
                                        if (atomic_dec_and_test(&de->count))
                                                if (de->deleted)
                                                        /* also not taken! */
                                                        free_proc_entry(de);
else
        de->deleted = 1;
                        [refcount=0, deleted=1]

2) use after free

remove_proc_entry                       de_put
-----------------                       ------
                        [refcnt = 1]
                                        if (atomic_dec_and_test(&de->count))
if (atomic_read(&de->count) == 0)
        free_proc_entry(de);
                                                /* boom! */
                                                if (de->deleted)
                                                        free_proc_entry(de);
BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
printing eip: c10acdda *pdpt = 00000000338f8001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
Pid: 23161, comm: cat Not tainted (2.6.24-rc2-8c0863403f109a43d7000b4646da4818220d501f #4)
EIP: 0060:[<c10acdda>] EFLAGS: 00210097 CPU: 1
EIP is at strnlen+0x6/0x18
EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: fffffffe
ESI: c128fa3b EDI: f380bf34 EBP: ffffffff ESP: f380be44
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 23161, ti=f380b000 task=f38f2570 task.ti=f380b000)
Stack: c10ac4f0 00000278 c12ce000 f43cd2a8 00000163 00000000 7da86067 00000400
c128fa20 00896b18 f38325a8 c128fe20 ffffffff 00000000 c11f291e 00000400
f75be300 c128fa20 f769c9a0 c10ac779 f380bf34 f7bfee70 c1018e6b f380bf34
Call Trace:
[<c10ac4f0>] vsnprintf+0x2ad/0x49b
[<c10ac779>] vscnprintf+0x14/0x1f
[<c1018e6b>] vprintk+0xc5/0x2f9
[<c10379f1>] handle_fasteoi_irq+0x0/0xab
[<c1004f44>] do_IRQ+0x9f/0xb7
[<c117db3b>] preempt_schedule_irq+0x3f/0x5b
[<c100264e>] need_resched+0x1f/0x21
[<c10190ba>] printk+0x1b/0x1f
[<c107c8ad>] de_put+0x3d/0x50
[<c107c8f8>] proc_delete_inode+0x38/0x41
[<c107c8c0>] proc_delete_inode+0x0/0x41
[<c1066298>] generic_delete_inode+0x5e/0xc6
[<c1065aa9>] iput+0x60/0x62
[<c1063c8e>] d_kill+0x2d/0x46
[<c1063fa9>] dput+0xdc/0xe4
[<c10571a1>] __fput+0xb0/0xcd
[<c1054e49>] filp_close+0x48/0x4f
[<c1055ee9>] sys_close+0x67/0xa5
[<c10026b6>] sysenter_past_esp+0x5f/0x85
=======================
Code: c9 74 0c f2 ae 74 05 bf 01 00 00 00 4f 89 fa 5f 89 d0 c3 85 c9 57 89 c7 89 d0 74 05 f2 ae 75 01 4f 89 f8 5f c3 89 c1 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 c3 90 90 90 57 83 c9
EIP: [<c10acdda>] strnlen+0x6/0x18 SS:ESP 0068:f380be44
Also, remove broken usage of ->deleted from reiserfs: if sget() succeeds,
module is already pinned and remove_proc_entry() can't happen => nobody
can mark PDE deleted.
Dummy proc root in netns code is not marked with refcount 1. AFAICS, we
never get it, it's just for proper /proc/net removal. I double checked
CLONE_NETNS continues to work.
Patch survives many hours of modprobe/rmmod/cat loops without new bugs
which can be attributed to refcounting.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-04 23:45:28 -08:00
	atomic_set(&ent->count, 1);
Fix rmmod/read/write races in /proc entries
Fix following races:
===========================================
1. Write via ->write_proc sleeps in copy_from_user(). Module disappears
meanwhile. Or, more generically, system call done on /proc file, method
supplied by module is called, module disappears meanwhile.

pde = create_proc_entry()
if (!pde)
        return -ENOMEM;
pde->write_proc = ...
                                open
                                write
                                copy_from_user
pde = create_proc_entry();
if (!pde) {
        remove_proc_entry();
        return -ENOMEM;
        /* module unloaded */
}
                                *boom*
==========================================
2. bogo-revoke aka proc_kill_inodes()

remove_proc_entry               vfs_read
proc_kill_inodes                [check ->f_op validness]
                                [check ->f_op->read validness]
                                [verify_area, security permissions checks]
        ->f_op = NULL;
                                if (file->f_op->read)
                                        /* ->f_op dereference, boom */
NOTE, NOTE, NOTE: file_operations are proxied for regular files only. Let's
see how this scheme behaves, then extend if needed for directories.
Directory creators in /proc only set ->owner for them, so proxying for
directories may be unneeded.
NOTE, NOTE, NOTE: methods being proxied are ->llseek, ->read, ->write,
->poll, ->unlocked_ioctl, ->ioctl, ->compat_ioctl, ->open, ->release.
If your in-tree module uses something else, yell at me. Full audit pending.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-15 23:39:00 -07:00
	spin_lock_init(&ent->pde_unload_lock);
2008-07-25 01:48:29 -07:00
	INIT_LIST_HEAD(&ent->pde_openers);
2012-10-04 17:15:43 -07:00
out:
2005-04-16 15:20:36 -07:00
	return ent;
}
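
The refcounting scheme that __proc_create() starts with atomic_set(&ent->count, 1) (created with refcount 1, every get adds one, every put subtracts one, freed at zero) can be sketched in userspace C11. This is only an illustration under stated assumptions, not the kernel code: struct entry, entry_create(), entry_get(), entry_put() and the entries_freed counter are hypothetical names, and C11 atomics stand in for the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct proc_dir_entry; only the refcount matters here. */
struct entry {
	atomic_int count;
};

/* Demo-only counter so a caller can observe when an entry was freed. */
static int entries_freed;

/* Create an entry that already holds one reference (the creator's). */
static struct entry *entry_create(void)
{
	struct entry *e = malloc(sizeof(*e));

	if (!e)
		return NULL;
	atomic_init(&e->count, 1);
	return e;
}

/* Take an additional reference. */
static void entry_get(struct entry *e)
{
	atomic_fetch_add(&e->count, 1);
}

/*
 * Drop one reference. The decrement and the "was this the last one?"
 * decision come from a single atomic operation, so there is no window
 * between reading the count and acting on it.
 * Returns true if this call freed the entry.
 */
static bool entry_put(struct entry *e)
{
	int old = atomic_fetch_sub(&e->count, 1);

	assert(old > 0);	/* dropping a reference nobody holds is a bug */
	if (old == 1) {
		free(e);
		entries_freed++;
		return true;
	}
	return false;
}
```

In this sketch both the "remove" path and the ordinary "put" path would call entry_put(), so whichever of them drops the last reference frees the entry and no separate "deleted" flag is needed.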
struct proc_dir_entry *proc_symlink(const char *name,
		struct proc_dir_entry *parent, const char *dest)
{
	struct proc_dir_entry *ent;
proc: fix ->open'less usage due to ->proc_fops flip
Typical PDE creation code looks like:
pde = create_proc_entry("foo", 0, NULL);
if (pde)
pde->proc_fops = &foo_proc_fops;
Notice that PDE is first created, only then ->proc_fops is set up to
final value. This is a problem because right after creation
a) PDE is fully visible in /proc, and
b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's
possible to ->read without ->open (see one class of oopses below).
The fix is new API called proc_create() which makes sure ->proc_fops are
set up before gluing PDE to main tree. Typical new code looks like:
pde = proc_create("foo", 0, NULL, &foo_proc_fops);
if (!pde)
return -ENOMEM;
Fix most networking users for a start.
In the long run, create_proc_entry() for regular files will go.
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024
printing eip: c1188c1b *pdpt = 000000002929e001 *pde = 0000000000000000
Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/block/sda/sda1/dev
Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
Pid: 24679, comm: cat Not tainted (2.6.24-rc3-mm1 #2)
EIP: 0060:[<c1188c1b>] EFLAGS: 00210002 CPU: 0
EIP is at mutex_lock_nested+0x75/0x25d
EAX: 000006fe EBX: fffffffb ECX: 00001000 EDX: e9340570
ESI: 00000020 EDI: 00200246 EBP: e9340570 ESP: e8ea1ef8
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 24679, ti=E8EA1000 task=E9340570 task.ti=E8EA1000)
Stack: 00000000 c106f7ce e8ee05b4 00000000 00000001 458003d0 f6fb6f20 fffffffb
00000000 c106f7aa 00001000 c106f7ce 08ae9000 f6db53f0 00000020 00200246
00000000 00000002 00000000 00200246 00200246 e8ee05a0 fffffffb e8ee0550
Call Trace:
[<c106f7ce>] seq_read+0x24/0x28a
[<c106f7aa>] seq_read+0x0/0x28a
[<c106f7ce>] seq_read+0x24/0x28a
[<c106f7aa>] seq_read+0x0/0x28a
[<c10818b8>] proc_reg_read+0x60/0x73
[<c1081858>] proc_reg_read+0x0/0x73
[<c105a34f>] vfs_read+0x6c/0x8b
[<c105a6f3>] sys_read+0x3c/0x63
[<c10025f2>] sysenter_past_esp+0x5f/0xa5
[<c10697a7>] destroy_inode+0x24/0x33
=======================
INFO: lockdep is turned off.
Code: 75 21 68 e1 1a 19 c1 68 87 00 00 00 68 b8 e8 1f c1 68 25 73 1f c1 e8 84 06 e9 ff e8 52 b8 e7 ff 83 c4 10 9c 5f fa e8 28 89 ea ff <f0> fe 4e 04 79 0a f3 90 80 7e 04 00 7e f8 eb f0 39 76 34 74 33
EIP: [<c1188c1b>] mutex_lock_nested+0x75/0x25d SS:ESP 0068:e8ea1ef8
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 04:18:37 -08:00
	ent = __proc_create(&parent, name,
2005-04-16 15:20:36 -07:00
			  (S_IFLNK | S_IRUGO | S_IWUGO | S_IXUGO), 1);
	if (ent) {
		ent->data = kmalloc((ent->size = strlen(dest)) + 1, GFP_KERNEL);
		if (ent->data) {
			strcpy((char *)ent->data, dest);
2014-12-25 16:47:49 -05:00
			ent->proc_iops = &proc_link_inode_operations;
2005-04-16 15:20:36 -07:00
			if (proc_register(parent, ent) < 0) {
				kfree(ent->data);
				kfree(ent);
				ent = NULL;
			}
		} else {
			kfree(ent);
			ent = NULL;
		}
	}
	return ent;
}
2009-12-30 13:24:41 +08:00
EXPORT_SYMBOL(proc_symlink);
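
proc_symlink() fills in ->data and ->proc_iops before calling proc_register(), which is the same ordering the proc_create() commit message asks for: finish initializing the entry, then make it visible. A minimal userspace sketch of that ordering follows. All names in it (struct ops, struct node, node_register(), node_lookup(), node_create(), demo_ops) are hypothetical, and it is single-threaded, so it shows only the ordering and not the locking a real registry would need.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical operations table, standing in for file_operations. */
struct ops {
	int (*open)(void);
};

/* Hypothetical registry node, standing in for a PDE. */
struct node {
	const char *name;	/* borrowed pointer; caller keeps it alive */
	const struct ops *ops;
	struct node *next;
};

static struct node *registry;	/* head of the list that lookups can see */

/* Publish a node: after this call, node_lookup() can return it. */
static void node_register(struct node *n)
{
	n->next = registry;
	registry = n;
}

static struct node *node_lookup(const char *name)
{
	for (struct node *n = registry; n; n = n->next)
		if (strcmp(n->name, name) == 0)
			return n;
	return NULL;
}

/*
 * Create and publish in one step. The ops pointer is set before the
 * node is linked into the registry, so a lookup never observes a
 * node whose ops are still missing.
 */
static struct node *node_create(const char *name, const struct ops *ops)
{
	struct node *n;

	if (!ops)
		return NULL;	/* refuse to publish an entry without ops */
	n = calloc(1, sizeof(*n));
	if (!n)
		return NULL;
	n->name = name;
	n->ops = ops;
	node_register(n);	/* last step: make it visible */
	return n;
}

/* Example ops instance for callers of the sketch. */
static int demo_open(void)
{
	return 0;
}

static const struct ops demo_ops = { .open = demo_open };
```

The design choice is that callers never get a chance to patch fields in after publication, because the creation function takes everything the entry needs as arguments.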
2005-04-16 15:20:36 -07:00
2013-04-12 02:48:30 +01:00
struct proc_dir_entry *proc_mkdir_data(const char *name, umode_t mode,
		struct proc_dir_entry *parent, void *data)
2005-04-16 15:20:36 -07:00
{
	struct proc_dir_entry *ent;
2013-04-12 02:48:30 +01:00
	if (mode == 0)
		mode = S_IRUGO | S_IXUGO;
2008-02-08 04:18:37 -08:00
	ent = __proc_create(&parent, name, S_IFDIR | mode, 2);
2005-04-16 15:20:36 -07:00
	if (ent) {
2013-04-12 02:48:30 +01:00
		ent->data = data;
2014-12-25 16:47:49 -05:00
		ent->proc_fops = &proc_dir_operations;
		ent->proc_iops = &proc_dir_inode_operations;
		parent->nlink++;
2005-04-16 15:20:36 -07:00
		if (proc_register(parent, ent) < 0) {
			kfree(ent);
2014-12-25 16:47:49 -05:00
			parent->nlink--;
2005-04-16 15:20:36 -07:00
			ent = NULL;
		}
	}
	return ent;
}
2013-04-12 02:48:30 +01:00
EXPORT_SYMBOL_GPL(proc_mkdir_data);
2005-04-16 15:20:36 -07:00
2013-04-12 02:48:30 +01:00
struct proc_dir_entry *proc_mkdir_mode(const char *name, umode_t mode,
				       struct proc_dir_entry *parent)
2008-05-02 04:12:41 -07:00
{
2013-04-12 02:48:30 +01:00
	return proc_mkdir_data(name, mode, parent, NULL);
2008-05-02 04:12:41 -07:00
}
2013-04-12 02:48:30 +01:00
EXPORT_SYMBOL(proc_mkdir_mode);
2008-05-02 04:12:41 -07:00
2005-04-16 15:20:36 -07:00
struct proc_dir_entry *proc_mkdir(const char *name,
		struct proc_dir_entry *parent)
{
2013-04-12 02:48:30 +01:00
	return proc_mkdir_data(name, 0, parent, NULL);
2005-04-16 15:20:36 -07:00
}
2009-12-30 13:24:41 +08:00
EXPORT_SYMBOL(proc_mkdir);
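
proc_mkdir() passes mode 0 to proc_mkdir_data(), which then falls back to S_IRUGO | S_IXUGO and ORs in S_IFDIR. The arithmetic can be checked with a small userspace sketch. The DEMO_ constants below are assumptions for illustration: they spell out the usual Linux octal values, because the kernel's S_IRUGO and S_IXUGO shorthands are not available in userspace headers, and demo_dir_mode() is a hypothetical helper, not a kernel function.

```c
/* Usual Linux values, written out so the sketch needs no headers. */
#define DEMO_S_IFDIR 0040000u	/* file type: directory */
#define DEMO_S_IRUGO 0000444u	/* read for user, group, other */
#define DEMO_S_IXUGO 0000111u	/* execute (search) for user, group, other */

/*
 * Mirror of the mode handling in proc_mkdir_data(): a caller-supplied
 * mode of 0 means "use the default", which is r-xr-xr-x. The directory
 * type bit is then added in either case.
 */
static unsigned int demo_dir_mode(unsigned int mode)
{
	if (mode == 0)
		mode = DEMO_S_IRUGO | DEMO_S_IXUGO;
	return DEMO_S_IFDIR | mode;
}
```

Under these assumptions a directory made through proc_mkdir() ends up with permission bits 0555, while proc_mkdir_mode() keeps whatever non-zero permission bits the caller supplied.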
2005-04-16 15:20:36 -07:00
2015-05-11 16:44:25 -05:00
struct proc_dir_entry *proc_create_mount_point(const char *name)
{
	umode_t mode = S_IFDIR | S_IRUGO | S_IXUGO;
	struct proc_dir_entry *ent, *parent = NULL;
	ent = __proc_create(&parent, name, mode, 2);
	if (ent) {
		ent->data = NULL;
		ent->proc_fops = NULL;
		ent->proc_iops = NULL;
		if (proc_register(parent, ent) < 0) {
			kfree(ent);
			parent->nlink--;
			ent = NULL;
		}
	}
	return ent;
}
2011-07-24 03:36:29 -04:00
struct proc_dir_entry *proc_create_data(const char *name, umode_t mode,
proc: introduce proc_create_data to setup de->data
This set of patches fixes a proc ->open'less usage due to the ->proc_fops flip in
most of the kernel code. The original OOPS is described in
commit 2d3a4e3666325a9709cc8ea2e88151394e8f20fc:
Typical PDE creation code looks like:
pde = create_proc_entry("foo", 0, NULL);
if (pde)
pde->proc_fops = &foo_proc_fops;
Notice that PDE is first created, only then ->proc_fops is set up to
final value. This is a problem because right after creation
a) PDE is fully visible in /proc, and
b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's
possible to ->read without ->open (see one class of oopses below).
The fix is new API called proc_create() which makes sure ->proc_fops are
set up before gluing PDE to main tree. Typical new code looks like:
pde = proc_create("foo", 0, NULL, &foo_proc_fops);
if (!pde)
return -ENOMEM;
Fix most networking users for a start.
In the long run, create_proc_entry() for regular files will go.
In addition to this, proc_create_data is introduced to fix reading from
proc without PDE->data. The race is basically the same as above.
create_proc_entries is replaced throughout the kernel code, as the new method
is also simply better.
This patch:
The problem is the same as for de->proc_fops. Right now PDE becomes visible
without data set. So, the entry could be looked up without data. This, in
most cases, will simply OOPS.
proc_create_data call is created to address this issue. proc_create now
becomes a wrapper around it.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Chris Mason <chris.mason@oracle.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Karsten Keil <kkeil@suse.de>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Pierre Peiffer <peifferp@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 01:02:00 -07:00
					struct proc_dir_entry *parent,
					const struct file_operations *proc_fops,
					void *data)
2008-02-08 04:18:37 -08:00
{
	struct proc_dir_entry *pde;
2013-03-30 21:20:14 -04:00
	if ((mode & S_IFMT) == 0)
		mode |= S_IFREG;
2008-02-08 04:18:37 -08:00
2013-03-30 21:20:14 -04:00
	if (!S_ISREG(mode)) {
		WARN_ON(1);	/* use proc_mkdir() */
		return NULL;
2008-02-08 04:18:37 -08:00
	}
2014-12-25 16:47:49 -05:00
	BUG_ON(proc_fops == NULL);
2013-03-30 21:20:14 -04:00
	if ((mode & S_IALLUGO) == 0)
		mode |= S_IRUGO;
	pde = __proc_create(&parent, name, mode, 1);
2008-02-08 04:18:37 -08:00
	if (!pde)
		goto out;
	pde->proc_fops = proc_fops;
proc: introduce proc_create_data to setup de->data
This set of patches fixes an proc ->open'less usage due to ->proc_fops flip in
the most part of the kernel code. The original OOPS is described in the
commit 2d3a4e3666325a9709cc8ea2e88151394e8f20fc:
Typical PDE creation code looks like:
pde = create_proc_entry("foo", 0, NULL);
if (pde)
pde->proc_fops = &foo_proc_fops;
Notice that PDE is first created, only then ->proc_fops is set up to
final value. This is a problem because right after creation
a) PDE is fully visible in /proc , and
b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's
possible to ->read without ->open (see one class of oopses below).
The fix is new API called proc_create() which makes sure ->proc_fops are
set up before gluing PDE to main tree. Typical new code looks like:
pde = proc_create("foo", 0, NULL, &foo_proc_fops);
if (!pde)
return -ENOMEM;
Fix most networking users for a start.
In the long run, create_proc_entry() for regular files will go.
In addition to this, proc_create_data() is introduced to fix reading from
proc without PDE->data. The race is basically the same as above.
create_proc_entry() calls are replaced throughout the kernel, as the new
method is also simply better.
This patch:
The problem is the same as for de->proc_fops. Right now the PDE becomes visible
before its data is set, so the entry can be looked up without data. In most
cases this will simply oops.
proc_create_data() is introduced to address this issue. proc_create() now
becomes a wrapper around it.
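The ordering problem described above can be modelled in userspace. The sketch below is not kernel code: struct entry, registry, publish(), lookup_data(), create_old_style() and create_with_data() are all hypothetical names chosen for illustration. It only shows why filling in the data before publishing closes the window in which a lookup sees an entry without data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct entry {
	const char *name;
	void *data;			/* plays the role of pde->data */
};

#define MAX_ENTRIES 8
static struct entry *registry[MAX_ENTRIES];	/* plays the role of the /proc tree */
static size_t nr_entries;

/* Make an entry visible to lookups. Returns 0, or -1 if the table is full. */
static int publish(struct entry *e)
{
	assert(e != NULL);
	if (nr_entries == MAX_ENTRIES)
		return -1;
	registry[nr_entries++] = e;
	return 0;
}

/* What a reader sees if it looks the entry up at this instant. */
static void *lookup_data(const char *name)
{
	for (size_t i = 0; i < nr_entries; i++)
		if (strcmp(registry[i]->name, name) == 0)
			return registry[i]->data;
	return NULL;
}

/* Old pattern: publish first; the caller fills in ->data afterwards. */
static int create_old_style(struct entry *e, const char *name)
{
	e->name = name;
	e->data = NULL;
	return publish(e);
}

/* New pattern: fill in ->data, then publish. */
static int create_with_data(struct entry *e, const char *name, void *data)
{
	e->name = name;
	e->data = data;
	return publish(e);
}
```

In this model, a lookup made between create_old_style() returning and the caller assigning ->data observes a NULL pointer, while an entry created with create_with_data() is never visible without its data. The real kernel race involves concurrent readers; the model is single-threaded and only makes the window explicit.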
Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Chris Mason <chris.mason@oracle.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Karsten Keil <kkeil@suse.de>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Pierre Peiffer <peifferp@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 01:02:00 -07:00
	pde->data = data;
2014-12-25 16:47:49 -05:00
	pde->proc_iops = &proc_file_inode_operations;
proc: fix ->open'less usage due to ->proc_fops flip
Typical PDE creation code looks like:
	pde = create_proc_entry("foo", 0, NULL);
	if (pde)
		pde->proc_fops = &foo_proc_fops;
Notice that the PDE is created first, and only then is ->proc_fops set to its
final value. This is a problem because right after creation
a) the PDE is fully visible in /proc, and
b) ->proc_fops is proc_file_operations, which has no ->open callback. So it's
   possible to ->read without ->open (see one class of oopses below).
The fix is a new API called proc_create(), which makes sure ->proc_fops is
set up before gluing the PDE to the main tree. Typical new code looks like:
	pde = proc_create("foo", 0, NULL, &foo_proc_fops);
	if (!pde)
		return -ENOMEM;
Fix most networking users for a start.
In the long run, create_proc_entry() for regular files will go.
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024
printing eip: c1188c1b *pdpt = 000000002929e001 *pde = 0000000000000000
Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/block/sda/sda1/dev
Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
Pid: 24679, comm: cat Not tainted (2.6.24-rc3-mm1 #2)
EIP: 0060:[<c1188c1b>] EFLAGS: 00210002 CPU: 0
EIP is at mutex_lock_nested+0x75/0x25d
EAX: 000006fe EBX: fffffffb ECX: 00001000 EDX: e9340570
ESI: 00000020 EDI: 00200246 EBP: e9340570 ESP: e8ea1ef8
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 24679, ti=E8EA1000 task=E9340570 task.ti=E8EA1000)
Stack: 00000000 c106f7ce e8ee05b4 00000000 00000001 458003d0 f6fb6f20 fffffffb
00000000 c106f7aa 00001000 c106f7ce 08ae9000 f6db53f0 00000020 00200246
00000000 00000002 00000000 00200246 00200246 e8ee05a0 fffffffb e8ee0550
Call Trace:
[<c106f7ce>] seq_read+0x24/0x28a
[<c106f7aa>] seq_read+0x0/0x28a
[<c106f7ce>] seq_read+0x24/0x28a
[<c106f7aa>] seq_read+0x0/0x28a
[<c10818b8>] proc_reg_read+0x60/0x73
[<c1081858>] proc_reg_read+0x0/0x73
[<c105a34f>] vfs_read+0x6c/0x8b
[<c105a6f3>] sys_read+0x3c/0x63
[<c10025f2>] sysenter_past_esp+0x5f/0xa5
[<c10697a7>] destroy_inode+0x24/0x33
=======================
INFO: lockdep is turned off.
Code: 75 21 68 e1 1a 19 c1 68 87 00 00 00 68 b8 e8 1f c1 68 25 73 1f c1 e8 84 06 e9 ff e8 52 b8 e7 ff 83 c4 10 9c 5f fa e8 28 89 ea ff <f0> fe 4e 04 79 0a f3 90 80 7e 04 00 7e f8 eb f0 39 76 34 74 33
EIP: [<c1188c1b>] mutex_lock_nested+0x75/0x25d SS:ESP 0068:e8ea1ef8
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 04:18:37 -08:00
	if (proc_register(parent, pde) < 0)
		goto out_free;
	return pde;
out_free:
	kfree(pde);
out:
	return NULL;
}
2009-12-30 13:24:41 +08:00
EXPORT_SYMBOL(proc_create_data);
2013-04-12 00:38:51 +01:00
void proc_set_size(struct proc_dir_entry *de, loff_t size)
{
	de->size = size;
}
EXPORT_SYMBOL(proc_set_size);
void proc_set_user(struct proc_dir_entry *de, kuid_t uid, kgid_t gid)
{
	de->uid = uid;
	de->gid = gid;
}
EXPORT_SYMBOL(proc_set_user);
2009-12-15 16:45:39 -08:00
static void free_proc_entry(struct proc_dir_entry *de)
2005-04-16 15:20:36 -07:00
{
2011-06-17 13:33:20 -07:00
	proc_free_inum(de->low_ino);
2005-04-16 15:20:36 -07:00
2008-02-08 04:18:28 -08:00
	if (S_ISLNK(de->mode))
2005-04-16 15:20:36 -07:00
		kfree(de->data);
	kfree(de);
}
2009-12-15 16:45:39 -08:00
void pde_put(struct proc_dir_entry *pde)
{
	if (atomic_dec_and_test(&pde->count))
		free_proc_entry(pde);
}
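The drop-the-last-reference pattern used by pde_put() can be sketched in userspace with C11 atomics. This is an illustration under stated assumptions, not the kernel implementation: struct obj, obj_new(), obj_get() and obj_put() are hypothetical names, and atomic_fetch_sub() stands in for the kernel's atomic_dec_and_test().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a refcounted entry. */
struct obj {
	atomic_int count;
	bool *freed_flag;	/* set on release, for demonstration only */
};

/* Allocate an object holding one reference; returns NULL on failure. */
static struct obj *obj_new(bool *freed_flag)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	atomic_init(&o->count, 1);
	o->freed_flag = freed_flag;
	*freed_flag = false;
	return o;
}

/* Take an additional reference on a live object. */
static void obj_get(struct obj *o)
{
	assert(atomic_load(&o->count) > 0);
	atomic_fetch_add(&o->count, 1);
}

/* Drop a reference; free the object when the last one goes away. */
static void obj_put(struct obj *o)
{
	/*
	 * atomic_fetch_sub() returns the value before the decrement, so a
	 * result of 1 means this call dropped the last reference.
	 */
	if (atomic_fetch_sub(&o->count, 1) == 1) {
		*o->freed_flag = true;
		free(o);
	}
}
```

Because the decrement and the zero test happen in one atomic operation, exactly one caller observes the transition to zero and performs the free, which is why a removal path can drop its reference while other users may still hold theirs.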
2013-03-30 20:13:46 -04:00
/*
 * Remove a /proc entry and free it if it's not currently in use.
 */
void remove_proc_entry(const char *name, struct proc_dir_entry *parent)
{
	struct proc_dir_entry *de = NULL;
	const char *fn = name;
	unsigned int len;
2015-09-09 15:35:57 -07:00
	write_lock(&proc_subdir_lock);
2013-03-30 20:13:46 -04:00
	if (__xlate_proc_name(name, &parent, &fn) != 0) {
2015-09-09 15:35:57 -07:00
		write_unlock(&proc_subdir_lock);
2013-03-30 20:13:46 -04:00
		return;
	}
	len = strlen(fn);
2014-12-10 15:45:01 -08:00
	de = pde_subdir_find(parent, fn, len);
	if (de)
		rb_erase(&de->subdir_node, &parent->subdir);
2015-09-09 15:35:57 -07:00
	write_unlock(&proc_subdir_lock);
2013-03-30 20:13:46 -04:00
	if (!de) {
		WARN(1, "name '%s'\n", name);
		return;
	}
2013-04-03 19:07:30 -04:00
	proc_entry_rundown(de);
2008-07-25 01:48:29 -07:00
2008-04-29 01:01:39 -07:00
	if (S_ISDIR(de->mode))
		parent->nlink--;
	de->nlink = 0;
2014-12-10 15:45:01 -08:00
	WARN(pde_subdir_first(de),
	     "%s: removing non-empty directory '%s/%s', leaking at least '%s'\n",
	     __func__, de->parent->name, de->name, pde_subdir_first(de)->name);
2009-12-15 16:45:39 -08:00
	pde_put(de);
2005-04-16 15:20:36 -07:00
}
2009-12-30 13:24:41 +08:00
EXPORT_SYMBOL(remove_proc_entry);
2013-03-30 20:13:46 -04:00
int remove_proc_subtree(const char *name, struct proc_dir_entry *parent)
{
	struct proc_dir_entry *root = NULL, *de, *next;
	const char *fn = name;
	unsigned int len;
2015-09-09 15:35:57 -07:00
	write_lock(&proc_subdir_lock);
2013-03-30 20:13:46 -04:00
	if (__xlate_proc_name(name, &parent, &fn) != 0) {
2015-09-09 15:35:57 -07:00
		write_unlock(&proc_subdir_lock);
2013-03-30 20:13:46 -04:00
		return -ENOENT;
	}
	len = strlen(fn);
2014-12-10 15:45:01 -08:00
	root = pde_subdir_find(parent, fn, len);
2013-03-30 20:13:46 -04:00
	if (!root) {
2015-09-09 15:35:57 -07:00
		write_unlock(&proc_subdir_lock);
2013-03-30 20:13:46 -04:00
		return -ENOENT;
	}
2014-12-10 15:45:01 -08:00
	rb_erase(&root->subdir_node, &parent->subdir);
2013-03-30 20:13:46 -04:00
	de = root;
	while (1) {
2014-12-10 15:45:01 -08:00
		next = pde_subdir_first(de);
2013-03-30 20:13:46 -04:00
		if (next) {
2014-12-10 15:45:01 -08:00
			rb_erase(&next->subdir_node, &de->subdir);
2013-03-30 20:13:46 -04:00
			de = next;
			continue;
		}
2015-09-09 15:35:57 -07:00
		write_unlock(&proc_subdir_lock);
2013-03-30 20:13:46 -04:00
2013-04-03 19:07:30 -04:00
		proc_entry_rundown(de);
2013-03-30 20:13:46 -04:00
		next = de->parent;
		if (S_ISDIR(de->mode))
			next->nlink--;
		de->nlink = 0;
		if (de == root)
			break;
		pde_put(de);
2015-09-09 15:35:57 -07:00
		write_lock(&proc_subdir_lock);
2013-03-30 20:13:46 -04:00
		de = next;
	}
	pde_put(root);
	return 0;
}
EXPORT_SYMBOL(remove_proc_subtree);
2013-04-12 14:06:01 +01:00
void *proc_get_parent_data(const struct inode *inode)
{
	struct proc_dir_entry *de = PDE(inode);
	return de->parent->data;
}
EXPORT_SYMBOL_GPL(proc_get_parent_data);
2013-04-12 17:27:28 +01:00
void proc_remove(struct proc_dir_entry *de)
{
	if (de)
		remove_proc_subtree(de->name, de->parent);
}
EXPORT_SYMBOL(proc_remove);
2013-04-12 18:03:36 +01:00
void *PDE_DATA(const struct inode *inode)
{
	return __PDE_DATA(inode);
}
EXPORT_SYMBOL(PDE_DATA);