linux

iv/linux

Author	SHA1	Message	Date
Chuck Lever	5b362ac379	NFS: Fix panic after nfs_umount() After a few unsuccessful NFS mount attempts in which the client and server cannot agree on an authentication flavor both support, the client panics. nfs_umount() is invoked in the kernel in this case. Turns out nfs_umount()'s UMNT RPC invocation causes the RPC client to write off the end of the rpc_clnt's iostat array. This is because the mount client's nrprocs field is initialized with the count of defined procedures (two: MNT and UMNT), rather than the size of the client's proc array (four). The fix is to use the same initialization technique used by most other upper layer clients in the kernel. Introduced by commit 0b524123, which failed to update nrprocs when support was added for UMNT in the kernel. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=24302 BugLink: http://bugs.launchpad.net/bugs/683938 Reported-by: Stefan Bader <stefan.bader@canonical.com> Tested-by: Stefan Bader <stefan.bader@canonical.com> Cc: stable@kernel.org # >= 2.6.32 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-10 13:01:50 -05:00
Tristan Ye	39c99f12f1	Ocfs2: Teach 'coherency=full' O_DIRECT writes to correctly up_read i_alloc_sem. Due to newly-introduced 'coherency=full' O_DIRECT writes also takes the EX rw_lock like buffered writes did(rw_level == 1), it turns out messing the usage of 'level' in ocfs2_dio_end_io() up, which caused i_alloc_sem being failed to get up_read'd correctly. This patch tries to teach ocfs2_dio_end_io to understand well on all locking stuffs by explicitly introducing a new bit for i_alloc_sem in iocb's private data, just like what we did for rw_lock. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-12-09 15:36:48 -08:00
Sunil Mushran	388c4bcb4e	ocfs2/dlm: Migrate lockres with no locks if it has a reference o2dlm was not migrating resources with zero locks because it assumed that that resource would get purged by dlm_thread. However, some usage patterns involve creating and dropping locks at a high rate leading to the migrate thread seeing zero locks but the purge thread seeing an active reference. When this happens, the dlm_thread cannot purge the resource and the migrate thread sees no reason to migrate that resource. The spell is broken when the migrate thread catches the resource with a lock. The fix is to make the migrate thread also consider the reference map. This usage pattern can be triggered by userspace on userdlm locks and flocks. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>	2010-12-09 15:36:00 -08:00
Christoph Hellwig	05340d4ab2	xfs: log timestamp changes to the source inode in rename Now that we don't mark VFS inodes dirty anymore for internal timestamp changes, but rely on the transaction subsystem to push them out, we need to explicitly log the source inode in rename after updating it's timestamps to make sure the changes actually get forced out by sync/fsync or an AIL push. We already account for the fourth inode in the log reservation, as a rename of directories needs to update the nlink field, so just adding the xfs_trans_log_inode call is enough. This fixes the xfsqa 065 regression introduced by: "xfs: don't use vfs writeback for pure metadata modifications" Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-12-09 17:07:02 -06:00
Josef Bacik	7e1fea731d	Btrfs: fixup return code for btrfs_del_orphan_item If the orphan item doesn't exist, we return 1, which doesn't make any sense to the callers. Instead return -ENOENT if we didn't find the item. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2010-12-09 13:57:15 -05:00
Josef Bacik	b8399dee47	Btrfs: do not do fast caching if we are allocating blocks for tree_root Since the fast caching uses normal tree locking, we can possibly deadlock if we get to the caching via a btrfs_search_slot() on the tree_root. So just check to see if the root we are on is the tree root, and just don't do the fast caching. Reported-by: Sage Weil <sage@newdream.net> Signed-off-by: Josef Bacik <josef@redhat.com>	2010-12-09 13:57:13 -05:00
Josef Bacik	2b20982e31	Btrfs: deal with space cache errors better Currently if the space cache inode generation number doesn't match the generation number in the space cache header we will just fail to load the space cache, but we won't mark the space cache as an error, so we'll keep getting that error each time somebody tries to cache that block group until we actually clear the thing. Fix this by marking the space cache as having an error so we only get the message once. This patch also makes it so that we don't try and setup space cache for a block group that isn't cached, since we won't be able to write it out anyway. None of these problems are actual problems, they are just annoying and sub-optimal. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2010-12-09 13:57:12 -05:00
Josef Bacik	955256f2c3	Btrfs: fix use after free in O_DIRECT This fixes a bug where we use dip after we have freed it. Instead just use the file_offset that was passed to the function. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2010-12-09 13:57:10 -05:00
Suresh Jayaraman	545c988b20	cifs: remove bogus remapping of error in cifs_filldir() As the FIXME points out correctly, now filldir() itself returns -EOVERFLOW if it not possible to represent the inode number supplied by the filesystem in the field provided by userspace. Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-12-08 18:47:54 +00:00
Neil Brown	c1ac3ffcd0	nfsd: Fix possible BUG_ON firing in set_change_info If vfs_getattr in fill_post_wcc returns an error, we don't set fh_post_change. For NFSv4, this can result in set_change_info triggering a BUG_ON. i.e. fh_post_saved being zero isn't really a bug. So: - instead of BUGging when fh_post_saved is zero, just clear ->atomic. - if vfs_getattr fails in fill_post_wcc, take a copy of i_ctime anyway. This will be used i seg_change_info, but not overly trusted. - While we are there, remove the pointless 'if' statements in set_change_info. There is no harm setting all the values. Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-12-08 11:44:04 -05:00
Trond Myklebust	2df485a774	nfs: remove extraneous and problematic calls to nfs_clear_request When a nfs_page is freed, nfs_free_request is called which also calls nfs_clear_request to clean out the lock and open contexts and free the pagecache page. However, a couple of places in the nfs code call nfs_clear_request themselves. What happens here if the refcount on the request is still high? We'll be releasing contexts and freeing pointers while the request is possibly still in use. Remove those bare calls to nfs_clear_context. That should only be done when the request is being freed. Note that when doing this, we need to watch out for tests of req->wb_page. Previously, nfs_set_page_tag_locked() and nfs_clear_page_tag_locked() would check the value of req->wb_page to figure out if the page is mapped into the nfsi->nfs_page_tree. We now indicate the page is mapped using the new bit PG_MAPPED in req->wb_flags . Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-07 23:02:44 -05:00
Mi Jinlong	0de1b7e800	nfs: kernel should return EPROTONOSUPPORT when not support NFSv4 When nfs client(kernel) don't support NFSv4, maybe user build kernel without NFSv4, there is a problem. Using command "mount SERVER-IP:/nfsv3 /mnt/" to mount NFSv3 filesystem, mount should should success, but fail and get error: "mount.nfs: an incorrect mount option was specified" System call mount "nfs"(not "nfs4") with "vers=4", if CONFIG_NFS_V4 is not defined, the "vers=4" will be parsed as invalid argument and kernel return EINVAL to nfs-utils. About that, we really want get EPROTONOSUPPORT rather than EINVAL. This path make sure kernel parses argument success, and return EPROTONOSUPPORT at nfs_validate_mount_data(). Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-07 19:30:44 -05:00
Sergey Vlasov	21ac19d484	NFS: Fix fcntl F_GETLK not reporting some conflicts The commit 129a84de2347002f09721cda3155ccfd19fade40 (locks: fix F_GETLK regression (failure to find conflicts)) fixed the posix_test_lock() function by itself, however, its usage in NFS changed by the commit 9d6a8c5c213e34c475e72b245a8eb709258e968c (locks: give posix_test_lock same interface as ->lock) remained broken - subsequent NFS-specific locking code received F_UNLCK instead of the user-specified lock type. To fix the problem, fl->fl_type needs to be saved before the posix_test_lock() call and restored if no local conflicts were reported. Reference: https://bugzilla.kernel.org/show_bug.cgi?id=23892 Tested-by: Alexander Morozov <amorozov@etersoft.ru> Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Cc: <stable@kernel.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-07 19:30:43 -05:00
Aneesh Kumar K.V	08a22b392a	nfs: Discard ACL cache on mode update An update of mode bits can result in ACL value being changed. We need to mark the acl cache invalid when we update mode. Similarly we need to update file attribute when we change ACL value Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-07 19:30:42 -05:00
Lino Sanfilippo	fdbf3ceeb6	fanotify: Dont try to open a file descriptor for the overflow event We should not try to open a file descriptor for the overflow event since this will always fail. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>	2010-12-07 16:14:24 -05:00
Eric Paris	2637919893	fanotify: do not leak user reference on allocation failure If fanotify_init is unable to allocate a new fsnotify group it will return but will not drop its reference on the associated user struct. Drop that reference on error. Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2010-12-07 16:14:23 -05:00
Eric Paris	a2ae4cc9a1	inotify: stop kernel memory leak on file creation failure If inotify_init is unable to allocate a new file for the new inotify group we leak the new group. This patch drops the reference on the group on file allocation failure. Reported-by: Vegard Nossum <vegard.nossum@gmail.com> cc: stable@kernel.org Signed-off-by: Eric Paris <eparis@redhat.com>	2010-12-07 16:14:22 -05:00
Lino Sanfilippo	09e5f14e57	fanotify: on group destroy allow all waiters to bypass permission check When fanotify_release() is called, there may still be processes waiting for access permission. Currently only processes for which an event has already been queued into the groups access list will be woken up. Processes for which no event has been queued will continue to sleep and thus cause a deadlock when fsnotify_put_group() is called. Furthermore there is a race allowing further processes to be waiting on the access wait queue after wake_up (if they arrive before clear_marks_by_group() is called). This patch corrects this by setting a flag to inform processes that the group is about to be destroyed and thus not to wait for access permission. [additional changelog from eparis] Lets think about the 4 relevant code paths from the PoV of the 'operator' 'listener' 'responder' and 'closer'. Where operator is the process doing an action (like open/read) which could require permission. Listener is the task (or in this case thread) slated with reading from the fanotify file descriptor. The 'responder' is the thread responsible for responding to access requests. 'Closer' is the thread attempting to close the fanotify file descriptor. The 'operator' is going to end up in: fanotify_handle_event() get_response_from_access() (THIS BLOCKS WAITING ON USERSPACE) The 'listener' interesting code path fanotify_read() copy_event_to_user() prepare_for_access_response() (THIS CREATES AN fanotify_response_event) The 'responder' code path: fanotify_write() process_access_response() (REMOVE A fanotify_response_event, SET RESPONSE, WAKE UP 'operator') The 'closer': fanotify_release() (SUPPOSED TO CLEAN UP THE REST OF THIS MESS) What we have today is that in the closer we remove all of the fanotify_response_events and set a bit so no more response events are ever created in prepare_for_access_response(). The bug is that we never wake all of the operators up and tell them to move along. You fix that in fanotify_get_response_from_access(). You also fix other operators which haven't gotten there yet. So I agree that's a good fix. [/additional changelog from eparis] [remove additional changes to minimize patch size] [move initialization so it was inside CONFIG_FANOTIFY_PERMISSION] Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>	2010-12-07 16:14:22 -05:00
Lino Sanfilippo	1734dee4e3	fanotify: Dont allow a mask of 0 if setting or removing a mark In mark_remove_from_mask() we destroy marks that have their event mask cleared. Thus we should not allow the creation of those marks in the first place. With this patch we check if the mask given from user is 0 in case of FAN_MARK_ADD. If so we return an error. Same for FAN_MARK_REMOVE since this does not have any effect. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>	2010-12-07 16:14:21 -05:00
Lino Sanfilippo	fa218ab98c	fanotify: correct broken ref counting in case adding a mark failed If adding a mount or inode mark failed fanotify_free_mark() is called explicitly. But at this time the mark has already been put into the destroy list of the fsnotify_mark kernel thread. If the thread is too slow it will try to decrease the reference of a mark, that has already been freed by fanotify_free_mark(). (If its fast enough it will only decrease the marks ref counter from 2 to 1 - note that the counter has been increased to 2 in add_mark() - which has practically no effect.) This patch fixes the ref counting by not calling free_mark() explicitly, but decreasing the ref counter and rely on the fsnotify_mark thread to cleanup in case adding the mark has failed. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>	2010-12-07 16:14:21 -05:00
Lino Sanfilippo	b1085ba80c	fanotify: if set by user unset FMODE_NONOTIFY before fsnotify_perm() is called Unsetting FMODE_NONOTIFY in fsnotify_open() is too late, since fsnotify_perm() is called before. If FMODE_NONOTIFY is set fsnotify_perm() will skip permission checks, so a user can still disable permission checks by setting this flag in an open() call. This patch corrects this by unsetting the flag before fsnotify_perm is called. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>	2010-12-07 16:14:21 -05:00
Eric Paris	ecf6f5e7d6	fanotify: deny permissions when no event was sent If no event was sent to userspace we cannot expect userspace to respond to permissions requests. Today such requests just hang forever. This patch will deny any permissions event which was unable to be sent to userspace. Reported-by: Tvrtko Ursulin <tvrtko.ursulin@sophos.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2010-12-07 16:14:17 -05:00
Jeff Layton	7d161b7f41	cifs: allow calling cifs_build_path_to_root on incomplete cifs_sb It's possible that cifs_mount will call cifs_build_path_to_root on a newly instantiated cifs_sb. In that case, it's likely that the master_tlink pointer has not yet been instantiated. Fix this by having cifs_build_path_to_root take a cifsTconInfo pointer as well, and have the caller pass that in. Reported-and-Tested-by: Robbert Kouprie <robbert@exx.nl> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-12-07 19:25:37 +00:00
Jeff Layton	03ceace5c6	cifs: fix check of error return from is_path_accessable This function will return 0 if everything went ok. Commit 9d002df4 however added a block of code after the following check for rc == -EREMOTE. With that change and when rc == 0, doing the "goto mount_fail_check" here skips that code, leaving the tlink_tree and master_tlink pointer unpopulated. That causes an oops later in cifs_root_iget. Reported-and-Tested-by: Robbert Kouprie <robbert@exx.nl> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-12-07 19:17:59 +00:00
Trond Myklebust	47c716cbf6	NFS: Readdir cleanups No functional changes, but clarify the code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-07 14:09:02 -05:00
Trond Myklebust	18fb5fe40c	NFS: nfs_readdir_search_for_cookie() don't mark as eof if cookie not found If we're searching for a specific cookie, and it isn't found in the page cache, we should try an uncached_readdir(). To do so, we return EBADCOOKIE, but we don't set desc->eof. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-07 12:41:58 -05:00
Ian Kent	de47de7404	autofs4 - remove ioctl mutex (bz23142) With the recent changes to remove the BKL a mutex was added to the ioctl entry point for calls to the old ioctl interface. This mutex needs to be removed because of the need for the expire ioctl to call back to the daemon to perform a umount and receive a completion status (via another ioctl). This should be fine as the new ioctl interface uses much of the same code and it has been used without a mutex for around a year without issue, as was the original intention. Ref: Bugzilla bug 23142 Signed-off-by: Ian Kent <raven@themaw.net> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-12-07 07:45:44 -08:00
Linus Torvalds	086b17046c	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2_connection_find() returns pointer to bad structure ocfs2: char is not always signed Ocfs2: Stop tracking a negative dentry after dentry_iput(). ocfs2: fix memory leak fs/ocfs2/dlm: Use GFP_ATOMIC under spin_lock	2010-12-06 20:08:25 -08:00
Jeff Layton	8846399968	cifs: remove Local_System_Name ...this string is zeroed out and nothing ever changes it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-12-06 22:45:19 +00:00
Jeff Layton	79df1baeec	cifs: fix use of CONFIG_CIFS_ACL Some of the code under CONFIG_CIFS_ACL is dependent upon code under CONFIG_CIFS_EXPERIMENTAL, but the Kconfig options don't reflect that dependency. Move more of the ACL code out from under CONFIG_CIFS_EXPERIMENTAL and under CONFIG_CIFS_ACL. Also move find_readable_file out from other any sort of Kconfig option and make it a function normally compiled in. Reported-and-Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-12-06 20:22:39 +00:00
Sage Weil	1cd275f609	ceph: fix ioctl magic The ioctl magic was inadvertently changed in 571dba52. Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-06 09:45:22 -08:00
Eric W. Biederman	7b2a69ba70	Revert "vfs: show unreachable paths in getcwd and proc" Because it caused a chroot ttyname regression in 2.6.36. As of 2.6.36 ttyname does not work in a chroot. It has already been reported that screen breaks, and for me this breaks an automated distribution testsuite, that I need to preserve the ability to run the existing binaries on for several more years. glibc 2.11.3 which has a fix for this is not an option. The root cause of this breakage is: commit 8df9d1a4142311c084ffeeacb67cd34d190eff74 Author: Miklos Szeredi <mszeredi@suse.cz> Date: Tue Aug 10 11:41:41 2010 +0200 vfs: show unreachable paths in getcwd and proc Prepend "(unreachable)" to path strings if the path is not reachable from the current root. Two places updated are - the return string from getcwd() - and symlinks under /proc/$PID. Other uses of d_path() are left unchanged (we know that some old software crashes if /proc/mounts is changed). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> So remove the nice sounding, but ultimately ill advised change to how /proc/fd symlinks work. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-12-05 16:39:45 -08:00
Steve French	ebb27386ff	Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6	2010-12-03 03:52:43 +00:00
Frederic Weisbecker	238af8751f	reiserfs: don't acquire lock recursively in reiserfs_acl_chmod reiserfs_acl_chmod() can be called by reiserfs_set_attr() and then take the reiserfs lock a second time. Thereafter it may call journal_begin() that definitely requires the lock not to be nested in order to release it before taking the journal mutex because the reiserfs lock depends on the journal mutex already. So, aviod nesting the lock in reiserfs_acl_chmod(). Reported-by: Pawel Zawora <pzawora@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Pawel Zawora <pzawora@gmail.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: <stable@kernel.org> [2.6.32.x+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-12-02 14:51:15 -08:00
Suresh Jayaraman	6d20e8406f	cifs: add attribute cache timeout (actimeo) tunable Currently, the attribute cache timeout for CIFS is hardcoded to 1 second. This means that the client might have to issue a QPATHINFO/QFILEINFO call every 1 second to verify if something has changes, which seems too expensive. On the other hand, if the timeout is hardcoded to a higher value, workloads that expect strict cache coherency might see unexpected results. Making attribute cache timeout as a tunable will allow us to make a tradeoff between performance and cache metadata correctness depending on the application/workload needs. Add 'actimeo' tunable that can be used to tune the attribute cache timeout. The default timeout is set to 1 second. Also, display actimeo option value in /proc/mounts. It appears to me that 'actimeo' and the proposed (but not yet merged) 'strictcache' option cannot coexist, so care must be taken that we reset the other option if one of them is set. Changes since last post: - fix option parsing and handle possible values correcly Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Signed-off-by: Steve French <sfrench@us.ibm.com>	2010-12-02 19:32:11 +00:00
Linus Torvalds	8cb280c90f	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: only run xfs_error_test if error injection is active xfs: avoid moving stale inodes in the AIL xfs: delayed alloc blocks beyond EOF are valid after writeback xfs: push stale, pinned buffers on trylock failures xfs: fix failed write truncation handling.	2010-12-02 09:13:36 -08:00
Linus Torvalds	8520eeaa12	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: fix parsing of hostname in dfs referrals cifs: display fsc in /proc/mounts cifs: enable fscache iff fsc mount option is used explicitly cifs: allow fsc mount option only if CONFIG_CIFS_FSCACHE is set cifs: Handle extended attribute name cifs_acl to generate cifs acl blob (try #4) cifs: Misc. cleanup in cifsacl handling [try #4] cifs: trivial comment fix for cifs_invalidate_mapping [CIFS] fs/cifs/Kconfig: CIFS depends on CRYPTO_HMAC cifs: don't take extra tlink reference in initiate_cifs_search cifs: Percolate error up to the caller during get/set acls [try #4] cifs: fix another memleak, in cifs_root_iget cifs: fix potential use-after-free in cifs_oplock_break_put	2010-12-02 08:04:21 -08:00
Trond Myklebust	11de3b11e0	NFS: Fix a memory leak in nfs_readdir We need to ensure that the entries in the nfs_cache_array get cleared when the page is removed from the page cache. To do so, we use the freepage address_space operation. Change nfs_readdir_clear_array to use kmap_atomic(), so that the function can be safely called from all contexts. Finally, modify the cache_page_release helper to call nfs_readdir_clear_array directly, when dealing with an anonymous page from 'uncached_readdir'. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-02 09:58:00 -05:00
Herb Shiu	a5b10629ed	ceph: Behave better when handling file lock replies. Fill in the local lock with response data if appropriate, and don't call posix_lock_file when reading locks. Signed-off-by: Herb Shiu <herb_shiu@tcloudcomputing.com> Acked-by: Greg Farnum <gregf@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-01 14:22:34 -08:00
Herb Shiu	637ae8d547	ceph: pass lock information by struct file_lock instead of as individual params. Signed-off-by: Herb Shiu <herb_shiu@tcloudcomputing.com> Acked-by: Greg Farnum <gregf@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-01 14:22:34 -08:00
Herb Shiu	25933abdd8	ceph: Handle file locks in replies from the MDS. Previously the kernel client incorrectly assumed everything was a directory. Signed-off-by: Herb Shiu <herb_shiu@tcloudcomputing.com> Acked-by: Greg Farnum <gregf@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-01 14:22:27 -08:00
Sage Weil	884ea89276	ceph: avoid possible null deref in readdir after dir llseek last may be NULL, but we dereference it in the else branch without checking. Normally it doesn't trigger because last == NULL when fpos == 2, but it could happen on a newly opened dir if the user seeks forward. Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-01 14:15:31 -08:00
Dave Chinner	c76febef57	xfs: only run xfs_error_test if error injection is active Recent tests writing lots of small files showed the flusher thread being CPU bound and taking a long time to do allocations on a debug kernel. perf showed this as the prime reason: samples pcnt function DSO _______ _____ ___________________________ _________________ 224648.00 36.8% xfs_error_test [kernel.kallsyms] 86045.00 14.1% xfs_btree_check_sblock [kernel.kallsyms] 39778.00 6.5% prandom32 [kernel.kallsyms] 37436.00 6.1% xfs_btree_increment [kernel.kallsyms] 29278.00 4.8% xfs_btree_get_rec [kernel.kallsyms] 27717.00 4.5% random32 [kernel.kallsyms] Walking btree blocks during allocation checking them requires each block (a cache hit, so no I/O) call xfs_error_test(), which then does a random32() call as the first operation. IOWs, ~50% of the CPU is being consumed just testing whether we need to inject an error, even though error injection is not active. Kill this overhead when error injection is not active by adding a global counter of active error traps and only calling into xfs_error_test when fault injection is active. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-12-01 07:40:20 -06:00
Dave Chinner	de25c1818c	xfs: avoid moving stale inodes in the AIL When an inode has been marked stale because the cluster is being freed, we don't want to (re-)insert this inode into the AIL. There is a race condition where the cluster buffer may be unpinned before the inode is inserted into the AIL during transaction committed processing. If the buffer is unpinned before the inode item has been committed and inserted, then it is possible for the buffer to be released and hence processthe stale inode callbacks before the inode is inserted into the AIL. In this case, we then insert a clean, stale inode into the AIL which will never get removed by an IO completion. It will, however, get reclaimed and that triggers an assert in xfs_inode_free() complaining about freeing an inode still in the AIL. This race can be avoided by not moving stale inodes forward in the AIL during transaction commit completion processing. This closes the race condition by ensuring we never insert clean stale inodes into the AIL. It is safe to do this because a dirty stale inode, by definition, must already be in the AIL. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-12-01 07:40:20 -06:00
Dave Chinner	309c848002	xfs: delayed alloc blocks beyond EOF are valid after writeback There is an assumption in the parts of XFS that flushing a dirty file will make all the delayed allocation blocks disappear from an inode. That is, that after calling xfs_flush_pages() then ip->i_delayed_blks will be zero. This is an invalid assumption as we may have specualtive preallocation beyond EOF and they are recorded in ip->i_delayed_blks. A flush of the dirty pages of an inode will not change the state of these blocks beyond EOF, so a non-zero deeelalloc block count after a flush is valid. The bmap code has an invalid ASSERT() that needs to be removed, and the swapext code has a bug in that while it swaps the data forks around, it fails to swap the i_delayed_blks counter associated with the fork and hence can get the block accounting wrong. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-12-01 07:40:20 -06:00
Dave Chinner	90810b9e82	xfs: push stale, pinned buffers on trylock failures As reported by Nick Piggin, XFS is suffering from long pauses under highly concurrent workloads when hosted on ramdisks. The problem is that an inode buffer is stuck in the pinned state in memory and as a result either the inode buffer or one of the inodes within the buffer is stopping the tail of the log from being moved forward. The system remains in this state until a periodic log force issued by xfssyncd causes the buffer to be unpinned. The main problem is that these are stale buffers, and are hence held locked until the transaction/checkpoint that marked them state has been committed to disk. When the filesystem gets into this state, only the xfssyncd can cause the async transactions to be committed to disk and hence unpin the inode buffer. This problem was encountered when scaling the busy extent list, but only the blocking lock interface was fixed to solve the problem. Extend the same fix to the buffer trylock operations - if we fail to lock a pinned, stale buffer, then force the log immediately so that when the next attempt to lock it comes around, it will have been unpinned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-12-01 07:40:20 -06:00
Dave Chinner	c726de4409	xfs: fix failed write truncation handling. Since the move to the new truncate sequence we call xfs_setattr to truncate down excessively instanciated blocks. As shown by the testcase in kernel.org BZ #22452 that doesn't work too well. Due to the confusion of the internal inode size, and the VFS inode i_size it zeroes data that it shouldn't. But full blown truncate seems like overkill here. We only instanciate delayed allocations in the write path, and given that we never released the iolock we can't have converted them to real allocations yet either. The only nasty case is pre-existing preallocation which we need to skip. We already do this for page discard during writeback, so make the delayed allocation block punching a generic function and call it from the failed write path as well as xfs_aops_discard_page. The callers are responsible for ensuring that partial blocks are not truncated away, and that they hold the ilock. Based on a fix originally from Christoph Hellwig. This version used filesystem blocks as the range unit. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-12-01 07:40:19 -06:00
Trond Myklebust	0aded708d1	NFS: Ensure we use the correct cookie in nfs_readdir_xdr_filler We need to use the cookie from the previous array entry, not the actual cookie that we are searching for (except for the case of uncached_readdir). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-01 08:16:16 -05:00
Oleg Nesterov	114279be21	exec: copy-and-paste the fixes into compat_do_execve() paths Note: this patch targets 2.6.37 and tries to be as simple as possible. That is why it adds more copy-and-paste horror into fs/compat.c and uglifies fs/exec.c, this will be cleanuped later. compat_copy_strings() plays with bprm->vma/mm directly and thus has two problems: it lacks the RLIMIT_STACK check and argv/envp memory is not visible to oom killer. Export acct_arg_size() and get_arg_page(), change compat_copy_strings() to use get_arg_page(), change compat_do_execve() to do acct_arg_size(0) as do_execve() does. Add the fatal_signal_pending/cond_resched checks into compat_count() and compat_copy_strings(), this matches the code in fs/exec.c and certainly makes sense. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-30 17:56:38 -08:00
Oleg Nesterov	3c77f84572	exec: make argv/envp memory visible to oom-killer Brad Spengler published a local memory-allocation DoS that evades the OOM-killer (though not the virtual memory RLIMIT): http://www.grsecurity.net/~spender/64bit_dos.c execve()->copy_strings() can allocate a lot of memory, but this is not visible to oom-killer, nobody can see the nascent bprm->mm and take it into account. With this patch get_arg_page() increments current's MM_ANONPAGES counter every time we allocate the new page for argv/envp. When do_execve() succeds or fails, we change this counter back. Technically this is not 100% correct, we can't know if the new page is swapped out and turn MM_ANONPAGES into MM_SWAPENTS, but I don't think this really matters and everything becomes correct once exec changes ->mm or fails. Reported-by: Brad Spengler <spender@grsecurity.net> Reviewed-and-discussed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-30 17:56:37 -08:00

1 2 3 4 5 ...

20529 Commits