linux

iv/linux

Author	SHA1	Message	Date
Erez Zadok	0d132f7364	ecryptfs: don't ignore return value from lock_rename Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Cc: Dustin Kirkland <kirkland@canonical.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2010-01-19 22:35:59 -06:00
Erez Zadok	e27759d7a3	ecryptfs: initialize private persistent file before dereferencing pointer Ecryptfs_open dereferences a pointer to the private lower file (the one stored in the ecryptfs inode), without checking if the pointer is NULL. Right afterward, it initializes that pointer if it is NULL. Swap order of statements to first initialize. Bug discovered by Duckjin Kang. Signed-off-by: Duckjin Kang <fromdj2k@gmail.com> Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Cc: Dustin Kirkland <kirkland@canonical.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@kernel.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2010-01-19 22:32:54 -06:00
Tyler Hicks	38e3eaeedc	eCryptfs: Remove mmap from directory operations Adrian reported that mkfontscale didn't work inside of eCryptfs mounts. Strace revealed the following: open("./", O_RDONLY\|O_NONBLOCK\|O_LARGEFILE\|O_DIRECTORY\|O_CLOEXEC) = 3 fcntl64(3, F_GETFD) = 0x1 (flags FD_CLOEXEC) open("./fonts.scale", O_WRONLY\|O_CREAT\|O_TRUNC, 0666) = 4 getdents(3, /* 80 entries */, 32768) = 2304 open("./.", O_RDONLY) = 5 fcntl64(5, F_SETFD, FD_CLOEXEC) = 0 fstat64(5, {st_mode=S_IFDIR\|0755, st_size=16384, ...}) = 0 mmap2(NULL, 16384, PROT_READ, MAP_PRIVATE, 5, 0) = 0xb7fcf000 close(5) = 0 --- SIGBUS (Bus error) @ 0 (0) --- +++ killed by SIGBUS +++ The mmap2() on a directory was successful, resulting in a SIGBUS signal later. This patch removes mmap() from the list of possible ecryptfs_dir_fops so that mmap() isn't possible on eCryptfs directory files. https://bugs.launchpad.net/ecryptfs/+bug/400443 Reported-by: Adrian C. <anrxc@sysphere.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2010-01-19 22:32:11 -06:00
Tyler Hicks	f8f484d1b6	eCryptfs: Add getattr function The i_blocks field of an eCryptfs inode cannot be trusted, but generic_fillattr() uses it to instantiate the blocks field of a stat() syscall when a filesystem doesn't implement its own getattr(). Users have noticed that the output of du is incorrect on newly created files. This patch creates ecryptfs_getattr() which calls into the lower filesystem's getattr() so that eCryptfs can use its kstat.blocks value after calling generic_fillattr(). It is important to note that the block count includes the eCryptfs metadata stored in the beginning of the lower file plus any padding used to fill an extent before encryption. https://bugs.launchpad.net/ecryptfs/+bug/390833 Reported-by: Dominic Sacré <dominic.sacre@gmx.de> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2010-01-19 22:32:09 -06:00
Tyler Hicks	5f3ef64f4d	eCryptfs: Use notify_change for truncating lower inodes When truncating inodes in the lower filesystem, eCryptfs directly invoked vmtruncate(). As Christoph Hellwig pointed out, vmtruncate() is a filesystem helper function, but filesystems may need to do more than just a call to vmtruncate(). This patch moves the lower inode truncation out of ecryptfs_truncate() and renames the function to truncate_upper(). truncate_upper() updates an iattr for the lower inode to indicate if the lower inode needs to be truncated upon return. ecryptfs_setattr() then calls notify_change(), using the updated iattr for the lower inode, to complete the truncation. For eCryptfs functions needing to truncate, ecryptfs_truncate() is reintroduced as a simple way to truncate the upper inode to a specified size and then truncate the lower inode accordingly. https://bugs.launchpad.net/bugs/451368 Reported-by: Christoph Hellwig <hch@lst.de> Acked-by: Dustin Kirkland <kirkland@canonical.com> Cc: ecryptfs-devel@lists.launchpad.net Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2010-01-19 22:32:07 -06:00
Linus Torvalds	1e868d8e6d	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: xfs_swap_extents needs to handle dynamic fork offsets xfs: fix missing error check in xfs_rtfree_range xfs: fix stale inode flush avoidance xfs: Remove inode iolock held check during allocation xfs: reclaim all inodes by background tree walks xfs: Avoid inodes in reclaim when flushing from inode cache xfs: reclaim inodes under a write lock	2010-01-18 14:08:07 -08:00
Linus Torvalds	7dc9c484a7	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: do_add_mount() should sanitize mnt_flags CIFS shouldn't make mountpoints shrinkable mnt_flags fixes in do_remount() attach_recursive_mnt() needs to hold vfsmount_lock over set_mnt_shared() may_umount() needs namespace_sem Fix configfs leak Fix the -ESTALE handling in do_filp_open() ecryptfs: Fix refcnt leak on ecryptfs_follow_link() error path Fix ACC_MODE() for real Unrot uml mconsole a bit hppfs: handle ->put_link() Kill 9p readlink() fix autofs/afs/etc. magic mountpoint breakage	2010-01-17 11:01:16 -08:00
David Howells	7e6608724c	nommu: fix shared mmap after truncate shrinkage problems Fix a problem in NOMMU mmap with ramfs whereby a shared mmap can happen over the end of a truncation. The problem is that ramfs_nommu_check_mappings() checks that the reduced file size against the VMA tree, but not the vm_region tree. The following sequence of events can cause the problem: fd = open("/tmp/x", O_RDWR\|O_TRUNC\|O_CREAT, 0600); ftruncate(fd, 32 * 1024); a = mmap(NULL, 32 * 1024, PROT_READ\|PROT_WRITE, MAP_SHARED, fd, 0); b = mmap(NULL, 16 * 1024, PROT_READ\|PROT_WRITE, MAP_SHARED, fd, 0); munmap(a, 32 * 1024); ftruncate(fd, 16 * 1024); c = mmap(NULL, 32 * 1024, PROT_READ\|PROT_WRITE, MAP_SHARED, fd, 0); Mapping 'a' creates a vm_region covering 32KB of the file. Mapping 'b' sees that the vm_region from 'a' is covering the region it wants and so shares it, pinning it in memory. Mapping 'a' then goes away and the file is truncated to the end of VMA 'b'. However, the region allocated by 'a' is still in effect, and has _not_ been reduced. Mapping 'c' is then created, and because there's a vm_region covering the desired region, get_unmapped_area() is _not_ called to repeat the check, and the mapping is granted, even though the pages from the latter half of the mapping have been discarded. However: d = mmap(NULL, 16 * 1024, PROT_READ\|PROT_WRITE, MAP_SHARED, fd, 0); Mapping 'd' should work, and should end up sharing the region allocated by 'a'. To deal with this, we shrink the vm_region struct during the truncation, lest do_mmap_pgoff() take it as licence to share the full region automatically without calling the get_unmapped_area() file op again. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-01-16 12:15:40 -08:00
David Howells	81759b5b22	nommu: fix race between ramfs truncation and shared mmap Fix the race between the truncation of a ramfs file and an attempt to make a shared mmap of region of that file. The problem is that do_mmap_pgoff() calls f_op->get_unmapped_area() to verify that the file region is made of contiguous pages and to find its base address - but there isn't any locking to guarantee this region until vma_prio_tree_insert() is called by add_vma_to_mm(). Note that moving the functionality into f_op->mmap() doesn't help as that is also called before vma_prio_tree_insert(). Instead make ramfs_nommu_check_mappings() grab nommu_region_sem whilst it does its checks. This means that this function will wait whilst mmaps take place. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-01-16 12:15:40 -08:00
Al Viro	27d55f1f4c	do_add_mount() should sanitize mnt_flags MNT_WRITE_HOLD shouldn't leak into new vfsmount and neither should MNT_SHARED (the latter will be set properly, along with the rest of shared-subtree data structures) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-01-16 13:07:36 -05:00
Al Viro	7e1295d9f8	CIFS shouldn't make mountpoints shrinkable Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-01-16 13:06:32 -05:00
Al Viro	7b43a79f32	mnt_flags fixes in do_remount() * need vfsmount_lock over modifying it * need to preserve MNT_SHARED/MNT_UNBINDABLE Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-01-16 13:01:26 -05:00
Al Viro	df1a1ad297	attach_recursive_mnt() needs to hold vfsmount_lock over set_mnt_shared() race in mnt_flags update Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-01-16 12:57:40 -05:00
Al Viro	8ad08d8a0c	may_umount() needs namespace_sem otherwise it races with clone_mnt() changing mnt_share/mnt_slaves Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-01-16 12:56:08 -05:00
Eric Paris	976ae32be4	inotify: only warn once for inotify problems inotify will WARN() if it finds that the idr and the fsnotify internals somehow got out of sync. It was only supposed to do this once but due to this stupid bug it would warn every single time a problem was detected. Signed-off-by: Eric Paris <eparis@redhat.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-01-15 14:49:23 -08:00
Eric Paris	9e572cc987	inotify: do not reuse watch descriptors Since commit `7e790dd5fc` ("inotify: fix error paths in inotify_update_watch") inotify changed the manor in which it gave watch descriptors back to userspace. Previous to this commit inotify acted like the following: inotify_add_watch(X, Y, Z) = 1 inotify_rm_watch(X, 1); inotify_add_watch(X, Y, Z) = 2 but after this patch inotify would return watch descriptors like so: inotify_add_watch(X, Y, Z) = 1 inotify_rm_watch(X, 1); inotify_add_watch(X, Y, Z) = 1 which I saw as equivalent to opening an fd where open(file) = 1; close(1); open(file) = 1; seemed perfectly reasonable. The issue is that quite a bit of userspace apparently relies on the behavior in which watch descriptors will not be quickly reused. KDE relies on it, I know some selinux packages rely on it, and I have heard complaints from other random sources such as debian bug 558981. Although the man page implies what we do is ok, we broke userspace so this patch almost reverts us to the old behavior. It is still slightly racey and I have patches that would fix that, but they are rather large and this will fix it for all real world cases. The race is as follows: - task1 creates a watch and blocks in idr_new_watch() before it updates the hint. - task2 creates a watch and updates the hint. - task1 updates the hint with it's older wd - task removes the watch created by task2 - task adds a new watch and will reuse the wd originally given to task2 it requires moving some locking around the hint (last_wd) but this should solve it for the real world and be -stable safe. As a side effect this patch papers over a bug in the lib/idr code which is causing a large number WARN's to pop on people's system and many reports in kerneloops.org. I'm working on the root cause of that idr bug seperately but this should make inotify immune to that issue. Signed-off-by: Eric Paris <eparis@redhat.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-01-15 14:49:23 -08:00
Dave Chinner	e09f98606d	xfs: xfs_swap_extents needs to handle dynamic fork offsets When swapping extents, we can corrupt inodes by swapping data forks that are in incompatible formats. This is caused by the two indoes having different fork offsets due to the presence of an attribute fork on an attr2 filesystem. xfs_fsr tries to be smart about setting the fork offset, but the trick it plays only works on attr1 (old fixed format attribute fork) filesystems. Changing the way xfs_fsr sets up the attribute fork will prevent this situation from ever occurring, so in the kernel code we can get by with a preventative fix - check that the data fork in the defragmented inode is in a format valid for the inode it is being swapped into. This will lead to files that will silently and potentially repeatedly fail defragmentation, so issue a warning to the log when this particular failure occurs to let us know that xfs_fsr needs updating/fixing. To help identify how to improve xfs_fsr to avoid this issue, add trace points for the inodes being swapped so that we can determine why the swap was rejected and to confirm that the code is making the right decisions and modifications when swapping forks. A further complication is even when the swap is allowed to proceed when the fork offset is different between the two inodes then value for the maximum number of extents the data fork can hold can be wrong. Make sure these are also set correctly after the swap occurs. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-01-15 13:49:07 -06:00
Dave Chinner	3daeb42c13	xfs: fix missing error check in xfs_rtfree_range When xfs_rtfind_forw() returns an error, the block is returned uninitialised. xfs_rtfree_range() is not checking the error return, so could be using an uninitialised block number for modifying bitmap summary info. The problem was found by gcc when compiling the userspace libxfs code - it is an copy of the kernel code with the exact same bug. gcc gives an uninitialised variable warning on the userspace code but not on the kernel code. You gotta love the consistency (Mmmm, slightly chewy today!). Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-01-15 13:46:19 -06:00
Dave Chinner	4b6a46882c	xfs: fix stale inode flush avoidance When reclaiming stale inodes, we need to guarantee that inodes are unpinned before returning with a "clean" status. If we don't we can reclaim inodes that are pinned, leading to use after free in the transaction subsystem as transactions complete. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-01-15 13:46:02 -06:00
Dave Chinner	126976c7c1	xfs: Remove inode iolock held check during allocation lockdep complains about a the lock not being initialised as we do an ASSERT based check that the lock is not held before we initialise it to catch inodes freed with the lock held. lockdep does this check for us in the lock initialisation code, so remove the ASSERT to stop the lockdep warning. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-01-15 13:45:33 -06:00
Dave Chinner	57817c6822	xfs: reclaim all inodes by background tree walks We cannot do direct inode reclaim without taking the flush lock to ensure that we do not reclaim an inode under IO. We check the inode is clean before doing direct reclaim, but this is not good enough because the inode flush code marks the inode clean once it has copied the in-core dirty state to the backing buffer. It is the flush lock that determines whether the inode is still under IO, even though it is marked clean, and the inode is still required at IO completion so we can't reclaim it even though it is clean in core. Hence the requirement that we need to take the flush lock even on clean inodes because this guarantees that the inode writeback IO has completed and it is safe to reclaim the inode. With delayed write inode flushing, we coul dend up waiting a long time on the flush lock even for a clean inode. The background reclaim already handles this efficiently, so avoid all the problems by killing the direct reclaim path altogether. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-01-15 13:44:44 -06:00
Dave Chinner	018027be90	xfs: Avoid inodes in reclaim when flushing from inode cache The reclaim code will handle flushing of dirty inodes before reclaim occurs, so avoid them when determining whether an inode is a candidate for flushing to disk when walking the radix trees. This is based on a test patch from Christoph Hellwig. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-01-15 13:44:21 -06:00
Dave Chinner	c8e20be020	xfs: reclaim inodes under a write lock Make the inode tree reclaim walk exclusive to avoid races with concurrent sync walkers and lookups. This is a version of a patch posted by Christoph Hellwig that avoids all the code duplication. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-01-15 13:43:55 -06:00
Al Viro	9b6e310211	Fix configfs leak Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-01-14 09:05:42 -05:00
Al Viro	9850c05655	Fix the -ESTALE handling in do_filp_open() Instead of playing sick games with path saving, cleanups, just retry the entire thing once with LOOKUP_REVAL added. Post-.34 we'll convert all -ESTALE handling in there to that style, rather than playing with many retry loops deep in the call chain. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-01-14 09:05:26 -05:00
OGAWA Hirofumi	806892e9e1	ecryptfs: Fix refcnt leak on ecryptfs_follow_link() error path If ->follow_link handler return the error, it should decrement nd->path refcnt. But, ecryptfs_follow_link() doesn't decrement. This patch fix it by using usual nd_set_link() style error handling, instead of playing with nd->path. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-01-14 09:05:26 -05:00
Al Viro	6d125529c6	Fix ACC_MODE() for real commit `5300990c03` had stepped on a rather nasty mess: definitions of ACC_MODE used to be different. Fixed the resulting breakage, converting them to variant that takes O_... value; all callers have that and it actually simplifies life (see tomoyo part of changes). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-01-14 09:05:26 -05:00
Al Viro	7b264fc2be	hppfs: handle ->put_link() current code works only because nothing in procfs has non-trivial ->put_link(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-01-14 09:05:25 -05:00
Al Viro	204f2f0e82	Kill 9p readlink() For symlinks generic_readlink() will work just fine and for directories we don't want ->readlink() at all. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-01-14 09:05:25 -05:00
Al Viro	86acdca1b6	fix autofs/afs/etc. magic mountpoint breakage We end up trying to kfree() nd.last.name on open("/mnt/tmp", O_CREAT) if /mnt/tmp is an autofs direct mount. The reason is that nd.last_type is bogus here; we want LAST_BIND for everything of that kind and we get LAST_NORM left over from finding parent directory. So make sure that it is set properly; set to LAST_BIND before doing ->follow_link() - for normal symlinks it will be changed by __vfs_follow_link() and everything else needs it set that way. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-01-14 09:05:25 -05:00
Linus Torvalds	e80c14e1ae	Merge branch 'fasync-helper' * fasync-helper: fasync: split 'fasync_helper()' into separate add/remove functions	2010-01-13 13:42:49 -08:00
Dave Chinner	2c761270d5	lib: Introduce generic list_sort function There are two copies of list_sort() in the tree already, one in the DRM code, another in ubifs. Now XFS needs this as well. Create a generic list_sort() function from the ubifs version and convert existing users to it so we don't end up with yet another copy in the tree. Signed-off-by: Dave Chinner <david@fromorbit.com> Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Artem Bityutskiy <dedekind@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-01-12 21:02:00 -08:00
Linus Torvalds	1b4d40a517	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: Ensure we force all busy extents in range to disk xfs: Don't flush stale inodes xfs: fix timestamp handling in xfs_setattr xfs: use DECLARE_EVENT_CLASS	2010-01-11 09:48:48 -08:00
Linus Torvalds	79ecb043ea	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: Use MAX_LFS_FILESIZE for meta inode size GFS2: Fix gfs2_xattr_acl_chmod() GFS2: Fix locking bug in rename GFS2: Ensure uptodate inode size when using O_APPEND	2010-01-11 09:48:29 -08:00
Linus Torvalds	db1fc95744	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: quota: Fix dquot_transfer for filesystems different from ext4	2010-01-11 09:48:14 -08:00
Minchan Kim	7f53a09ed4	smaps: fix wrong rss count A long time ago we regarded zero page as file_rss and vm_normal_page doesn't return NULL. But now, we reinstated ZERO_PAGE and vm_normal_page's implementation can return NULL in case of zero page. Also we don't count it with file_rss any more. Then, RSS and PSS can't be matched. For consistency, Let's ignore zero page in smaps_pte_range. Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Acked-by: Matt Mackall <mpm@selenic.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-01-11 09:34:07 -08:00
KOSAKI Motohiro	1306d603fc	proc: partially revert "procfs: provide stack information for threads" Commit `d899bf7b` (procfs: provide stack information for threads) introduced to show stack information in /proc/{pid}/status. But it cause large performance regression. Unfortunately /proc/{pid}/status is used ps command too and ps is one of most important component. Because both to take mmap_sem and page table walk are heavily operation. If many process run, the ps performance is, [before `d899bf7b`] % perf stat ps >/dev/null Performance counter stats for 'ps': 4090.435806 task-clock-msecs # 0.032 CPUs 229 context-switches # 0.000 M/sec 0 CPU-migrations # 0.000 M/sec 234 page-faults # 0.000 M/sec 8587565207 cycles # 2099.425 M/sec 9866662403 instructions # 1.149 IPC 3789415411 cache-references # 926.409 M/sec 30419509 cache-misses # 7.437 M/sec 128.859521955 seconds time elapsed [after `d899bf7b`] % perf stat ps > /dev/null Performance counter stats for 'ps': 4305.081146 task-clock-msecs # 0.028 CPUs 480 context-switches # 0.000 M/sec 2 CPU-migrations # 0.000 M/sec 237 page-faults # 0.000 M/sec 9021211334 cycles # 2095.480 M/sec 10605887536 instructions # 1.176 IPC 3612650999 cache-references # 839.160 M/sec 23917502 cache-misses # 5.556 M/sec 152.277819582 seconds time elapsed Thus, this patch revert it. Fortunately /proc/{pid}/task/{tid}/smaps provide almost same information. we can use it. Commit `d899bf7b` introduced two features: 1) Add the annotattion of [thread stack: xxxx] mark to /proc/{pid}/task/{tid}/maps. 2) Add StackUsage field to /proc/{pid}/status. I only revert (2), because I haven't seen (1) cause regression. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Stefani Seibold <stefani@seibold.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-01-11 09:34:06 -08:00
Jan Kara	05b5d89823	quota: Fix dquot_transfer for filesystems different from ext4 Commit `fd8fbfc1` modified the way we find amount of reserved space belonging to an inode. The amount of reserved space is checked from dquot_transfer and thus inode_reserved_space gets called even for filesystems that don't provide get_reserved_space callback which results in a BUG. Fix the problem by checking get_reserved_space callback and return 0 if the filesystem does not provide it. CC: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>	2010-01-11 13:06:41 +01:00
Steven Whitehouse	ba198098a2	GFS2: Use MAX_LFS_FILESIZE for meta inode size Using ~0ULL was cauing sign issues in filemap_fdatawrite_range, so use MAX_LFS_FILESIZE instead. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-01-11 08:57:55 +00:00
Dave Chinner	fd45e47841	xfs: Ensure we force all busy extents in range to disk When we search for and find a busy extent during allocation we force the log out to ensure the extent free transaction is on disk before the allocation transaction. The current implementation has a subtle bug in it--it does not handle multiple overlapping ranges. That is, if we free lots of little extents into a single contiguous extent, then allocate the contiguous extent, the busy search code stops searching at the first extent it finds that overlaps the allocated range. It then uses the commit LSN of the transaction to force the log out to. Unfortunately, the other busy ranges might have more recent commit LSNs than the first busy extent that is found, and this results in xfs_alloc_search_busy() returning before all the extent free transactions are on disk for the range being allocated. This can lead to potential metadata corruption or stale data exposure after a crash because log replay won't replay all the extent free transactions that cover the allocation range. Modified-by: Alex Elder <aelder@sgi.com> (Dropped the "found" argument from the xfs_alloc_busysearch trace event.) Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-01-10 12:22:02 -06:00
Dave Chinner	44e08c45cc	xfs: Don't flush stale inodes Because inodes remain in cache much longer than inode buffers do under memory pressure, we can get the situation where we have stale, dirty inodes being reclaimed but the backing storage has been freed. Hence we should never, ever flush XFS_ISTALE inodes to disk as there is no guarantee that the backing buffer is in cache and still marked stale when the flush occurs. Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-01-10 12:22:00 -06:00
Christoph Hellwig	d6d59bada3	xfs: fix timestamp handling in xfs_setattr We currently have some rather odd code in xfs_setattr for updating the a/c/mtime timestamps: - first we do a non-transaction update if all three are updated together - second we implicitly update the ctime for various changes instead of relying on the ATTR_CTIME flag - third we set the timestamps to the current time instead of the arguments in the iattr structure in many cases. This patch makes sure we update it in a consistent way: - always transactional - ctime is only updated if ATTR_CTIME is set or we do a size update, which is a special case - always to the times passed in from the caller instead of the current time The only non-size caller of xfs_setattr that doesn't come from the VFS is updated to set ATTR_CTIME and pass in a valid ctime value. Reported-by: Eric Blake <ebb9@byu.net> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-01-10 12:21:58 -06:00
Christoph Hellwig	ea9a48881e	xfs: use DECLARE_EVENT_CLASS Using DECLARE_EVENT_CLASS allows us to to use trace event code instead of duplicating it in the binary. This was not available before 2.6.33 so it had to be done as a separate step once the prerequisite was merged. This only requires changes to xfs_trace.h and the results are rather impressive: hch@brick:~/work/linux-2.6/obj-kvm$ size fs/xfs/xfs.o* text data bss dec hex filename 607732 41884 3616 653232 9f7b0 fs/xfs/xfs.o 1026732 41884 3808 1072424 105d28 fs/xfs/xfs.o.old Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-01-10 12:21:56 -06:00
Linus Torvalds	82062e7b50	Merge branch 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing * 'reiserfs/kill-bkl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing: reiserfs: Relax reiserfs_xattr_set_handle() while acquiring xattr locks reiserfs: Fix unreachable statement reiserfs: Don't call reiserfs_get_acl() with the reiserfs lock reiserfs: Relax lock on xattr removing reiserfs: Relax the lock before truncating pages reiserfs: Fix recursive lock on lchown reiserfs: Fix mistake in down_write() conversion	2010-01-08 14:03:55 -08:00
Linus Torvalds	dbd6a7cfea	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: kill some warnings on i386 builds	2010-01-08 13:57:32 -08:00
Linus Torvalds	e2b6d02cca	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: nfs: fix oops in nfs_rename() sunrpc: fix build-time warning sunrpc: on successful gss error pipe write, don't return error SUNRPC: Fix the return value in gss_import_sec_context() SUNRPC: Fix up an error return value in gss_import_sec_context_kerberos()	2010-01-08 13:55:14 -08:00
Dave Chinner	a539bd8c86	xfs: kill some warnings on i386 builds Randy Dunlap Reported printk() format-related warnings reported on i386 builds in his environment. Dave Chinner provided this patch to eliminate them. Signed-off by: Dave Chinner <david@fromorbit.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-01-08 13:32:29 -06:00
Steven Whitehouse	e412bdb126	GFS2: Fix gfs2_xattr_acl_chmod() The ref counting for the bh returned by gfs2_ea_find() was wrong. This patch ensures that we always drop the ref count to that bh correctly. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-01-08 13:42:59 +00:00
Steven Whitehouse	24b977b5fd	GFS2: Fix locking bug in rename The rename code was taking a resource group lock in cases where it wasn't actually needed, this caused problems if the rename was resulting in an inode being unlinked. The patch ensures that we only take the rgrp lock early if it is really needed. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-01-08 13:42:42 +00:00
Steven Whitehouse	56aa616a03	GFS2: Ensure uptodate inode size when using O_APPEND The VFS reads the inode size during generic_file_aio_write() but with no locking around it. In order to get the expected result from O_APPEND opens, this patch updated the inode size before calling generic_file_aio_write() There is of course still a race here, in that there is nothing to prevent another node coming in and extending the file in the mean time. On the other hand, when used with file locking this will ensure that the expected results are obtained. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-01-08 13:42:27 +00:00

1 2 3 4 5 ...

16475 Commits