linux

iv/linux

Author	SHA1	Message	Date
Al Viro	9f1fafee9e	merge handle_reval_dot and nameidata_drop_rcu_last new helper: complete_walk(). Done on successful completion of walk, drops out of RCU mode, does d_revalidate of final result if that hadn't been done already. handle_reval_dot() and nameidata_drop_rcu_last() subsumed into that one; callers converted to use of complete_walk(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-05-26 07:26:32 -04:00
Al Viro	19660af736	consolidate nameidata_..._drop_rcu() Merge these into a single function (unlazy_walk(nd, dentry)), kill ..._maybe variants Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-05-26 07:26:02 -04:00
Phillip Lougher	d7f2ff6718	Squashfs: update email address My existing email address may stop working in a month or two, so update email to one that will continue working. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2011-05-26 10:49:11 +01:00
Michal Marek	8d2c50e3b6	gfs2: Drop __TIME__ usage The kernel already prints its build timestamp during boot, no need to repeat it in random drivers and produce different object files each time. Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: cluster-devel@redhat.com Signed-off-by: Michal Marek <mmarek@suse.cz>	2011-05-26 10:54:37 +02:00
Michal Marek	75ce481e15	dlm: Drop __TIME__ usage The kernel already prints its build timestamp during boot, no need to repeat it in random drivers and produce different object files each time. Cc: Christine Caulfield <ccaulfie@redhat.com> Cc: David Teigland <teigland@redhat.com> Cc: cluster-devel@redhat.com Signed-off-by: Michal Marek <mmarek@suse.cz>	2011-05-26 09:46:17 +02:00
Joel Becker	ece928df16	Merge branch 'move_extents' of git://oss.oracle.com/git/tye/linux-2.6 into ocfs2-merge-window Conflicts: fs/ocfs2/ioctl.c	2011-05-25 21:51:55 -07:00
Tristan Ye	3d1c1829eb	Ocfs2: Teach local-mounted ocfs2 to handle unwritten_extents correctly. Oops, local-mounted of 'ocfs2_fops_no_plocks' is just missing the support of unwritten_extents/punching-hole due to no func pointer was given correctly to '.follocate' field. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>	2011-05-25 21:06:28 -07:00
Sunil Mushran	66effd3c68	ocfs2/dlm: Do not migrate resource to a node that is leaving the domain During dlm domain shutdown, o2dlm has to free all the lock resources. Ones that have no locks and references are freed. Ones that have locks and/or references are migrated to another node. The first task in migration is finding a target. Currently we scan the lock resource and find one node that either has a lock or a reference. This is not very efficient in a parallel umount case as we might end up migrating the lock resource to a node which itself may have to migrate it to a third node. The patch scans the dlm->exit_domain_map to ensure the target node is not leaving the domain. If no valid target node is found, o2dlm does not migrate the resource but instead waits for the unlock and deref messages that will allow it to free the resource. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-05-25 21:05:22 -07:00
Sunil Mushran	bddefdeec5	ocfs2/dlm: Add new dlm message DLM_BEGIN_EXIT_DOMAIN_MSG This patch adds a new dlm message DLM_BEGIN_EXIT_DOMAIN_MSG and ups the dlm protocol to 1.2. o2dlm sends this new message in dlm_unregister_domain() to mark the beginning of the exit domain. This message is sent to all nodes in the domain. Currently o2dlm has no way of informing other nodes of its impending exit. This information is useful as the other nodes could disregard the exiting node in certain operations. For example, in resource migration. If two or more nodes were umounting in parallel, it would be more efficient if o2dlm were to choose a non-exiting node to be the new master node rather than an exiting one. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-05-25 21:05:15 -07:00
Eric Paris	4db70f73e5	tmpfs: fix XATTR N overriding POSIX_ACL Y Choosing TMPFS_XATTR default N was switching off TMPFS_POSIX_ACL, even if it had been Y in oldconfig; and Linus reports that PulseAudio goes subtly wrong unless it can use ACLs on /dev/shm. Make TMPFS_POSIX_ACL select TMPFS_XATTR (and depend upon TMPFS), and move the TMPFS_POSIX_ACL entry before the TMPFS_XATTR entry, to avoid asking unnecessary questions then ignoring their answers. Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 19:53:02 -07:00
Linus Torvalds	14d74e0cab	Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfd * git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfd: net: fix get_net_ns_by_fd for !CONFIG_NET_NS ns proc: Return -ENOENT for a nonexistent /proc/self/ns/ entry. ns: Declare sys_setns in syscalls.h net: Allow setting the network namespace by fd ns proc: Add support for the ipc namespace ns proc: Add support for the uts namespace ns proc: Add support for the network namespace. ns: Introduce the setns syscall ns: proc files for namespace naming policy.	2011-05-25 18:10:16 -07:00
Linus Torvalds	3f5785ec31	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (89 commits) bonding: documentation and code cleanup for resend_igmp bonding: prevent deadlock on slave store with alb mode (v3) net: hold rtnl again in dump callbacks Add Fujitsu 1000base-SX PCI ID to tg3 bnx2x: protect sequence increment with mutex sch_sfq: fix peek() implementation isdn: netjet - blacklist Digium TDM400P via-velocity: don't annotate MAC registers as packed xen: netfront: hold RTNL when updating features. sctp: fix memory leak of the ASCONF queue when free asoc net: make dev_disable_lro use physical device if passed a vlan dev (v2) net: move is_vlan_dev into public header file (v2) bug.h: Fix build with CONFIG_PRINTK disabled. wireless: fix fatal kernel-doc error + warning in mac80211.h wireless: fix cfg80211.h new kernel-doc warnings iwlagn: dbg_fixed_rate only used when CONFIG_MAC80211_DEBUGFS enabled dst: catch uninitialized metrics be2net: hash key for rss-config cmd not set bridge: initialize fake_rtable metrics net: fix __dst_destroy_metrics_generic() ... Fix up trivial conflicts in drivers/staging/brcm80211/brcmfmac/wl_cfg80211.c	2011-05-25 17:00:17 -07:00
Ding Dinghua	3991b4008c	jbd2: fix a potential leak of a journal_head on an error path drop jh->b_jcount in error path Signed-off-by: Ding Dinghua <dingdinghua@nrchpc.ac.cn> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-25 17:43:48 -04:00
Yongqiang Yang	1b16da77f9	ext4: teach ext4_ext_split to calculate extents efficiently Make ext4_ext_split() get extents to be moved by calculating in a statement instead of counting in a loop. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-25 17:41:48 -04:00
Jan Kara	ae24f28d39	ext4: Convert ext4 to new truncate calling convention Trivial conversion. Fixup one error handling case calling vmtruncate() and remove ->truncate callback. We also fix a bug that IS_IMMUTABLE and IS_APPEND files could not be truncated during failed writes. In fact, the test can be completely removed as upper layers do necessary permission checks for truncate in do_sys_[f]truncate() and may_open() anyway. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-25 17:39:48 -04:00
Linus Torvalds	57bb559574	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits) ceph: fix cap flush race reentrancy libceph: subscribe to osdmap when cluster is full libceph: handle new osdmap down/state change encoding rbd: handle online resize of underlying rbd image ceph: avoid inode lookup on nfs fh reconnect ceph: use LOOKUPINO to make unconnected nfs fh more reliable rbd: use snprintf for disk->disk_name rbd: cleanup: make kfree match kmalloc rbd: warn on update_snaps failure on notify ceph: check return value for start_request in writepages ceph: remove useless check libceph: add missing breaks in addr_set_port libceph: fix TAG_WAIT case ceph: fix broken comparison in readdir loop libceph: fix osdmap timestamp assignment ceph: fix rare potential cap leak libceph: use snprintf for unknown addrs libceph: use snprintf for formatting object name ceph: use snprintf for dirstat content libceph: fix uninitialized value when no get_authorizer method is set ...	2011-05-25 11:46:31 -07:00
David S. Miller	22e95ac87d	Merge branch 'for-davem' of ssh://master.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6	2011-05-25 13:28:55 -04:00
Phillip Lougher	1094a4a611	Squashfs: add extra sanity checks at mount time Add some extra sanity checks of the inode and directory structures. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2011-05-25 18:21:33 +01:00
Phillip Lougher	1cac63cc9b	Squashfs: add sanity checks to fragment reading at mount time Fsfuzzer generates corrupted filesystems which throw a warn_on in kmalloc. One of these is due to a corrupted superblock fragments field. Fix this by checking that the number of bytes to be read (and allocated) does not extend into the next filesystem structure. Also add a couple of other sanity checks of the mount-time fragment table structures. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2011-05-25 18:21:33 +01:00
Phillip Lougher	ac51a0a713	Squashfs: add sanity checks to lookup table reading at mount time Fsfuzzer generates corrupted filesystems which throw a warn_on in kmalloc. One of these is due to a corrupted superblock inodes field. Fix this by checking that the number of bytes to be read (and allocated) does not extend into the next filesystem structure. Also add a couple of other sanity checks of the mount-time lookup table structures. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2011-05-25 18:21:32 +01:00
Phillip Lougher	37986f63c8	Squashfs: add sanity checks to id reading at mount time Fsfuzzer generates corrupted filesystems which throw a warn_on in kmalloc. One of these is due to a corrupted superblock no_ids field. Fix this by checking that the number of bytes to be read (and allocated) does not extend into the next filesystem structure. Also add a couple of other sanity checks of the mount-time id table structures. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2011-05-25 18:21:32 +01:00
Phillip Lougher	6f04864515	Squashfs: add sanity checks to xattr reading at mount time These checks add sanity checking of the mount-time xattr structures. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2011-05-25 18:21:31 +01:00
Phillip Lougher	76e002f755	Squashfs: reverse order of filesystem table reading Reverse order of table reading from mostly first to last in placement order, to last to first. This is to enable extra superblock sanity checks to be added in later patches. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2011-05-25 18:21:31 +01:00
Phillip Lougher	82de647e1f	Squashfs: move table allocation into squashfs_read_table() This eliminates a lot of duplicate code. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>	2011-05-25 18:21:31 +01:00
Linus Torvalds	2a651c7f8d	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9p: update Documentation pointers net/9p: enable 9p to work in non-default network namespace net/9p: p9_idpool_get return -1 on error fs/9p: Don't clunk dentry fid when we fail to get a writeback inode 9p: Small cleanup in <net/9p/9p.h> 9p: remove experimental tag from tested configurations 9p: typo fixes and minor cleanups net/9p: Change linuxdoc names to match functions.	2011-05-25 09:21:56 -07:00
Linus Torvalds	c88bc60a3b	Merge branch 'for-2.6.40/splice' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.40/splice' of git://git.kernel.dk/linux-2.6-block: splice: add wakeup_pipe_readers()	2011-05-25 09:20:20 -07:00
Linus Torvalds	798ce8f1cc	Merge branch 'for-2.6.40/core' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.40/core' of git://git.kernel.dk/linux-2.6-block: (40 commits) cfq-iosched: free cic_index if cfqd allocation fails cfq-iosched: remove unused 'group_changed' in cfq_service_tree_add() cfq-iosched: reduce bit operations in cfq_choose_req() cfq-iosched: algebraic simplification in cfq_prio_to_maxrq() blk-cgroup: Initialize ioc->cgroup_changed at ioc creation time block: move bd_set_size() above rescan_partitions() in __blkdev_get() block: call elv_bio_merged() when merged cfq-iosched: Make IO merge related stats per cpu cfq-iosched: Fix a memory leak of per cpu stats for root group backing-dev: Kill set but not used var in bdi_debug_stats_show() block: get rid of on-stack plugging debug checks blk-throttle: Make no throttling rule group processing lockless blk-cgroup: Make cgroup stat reset path blkg->lock free for dispatch stats blk-cgroup: Make 64bit per cpu stats safe on 32bit arch blk-throttle: Make dispatch stats per cpu blk-throttle: Free up a group only after one rcu grace period blk-throttle: Use helper function to add root throtl group to lists blk-throttle: Introduce a helper function to fill in device details blk-throttle: Dynamically allocate root group blk-cgroup: Allow sleeping while dynamically allocating a group ...	2011-05-25 09:14:07 -07:00
Christoph Hellwig	233eebb9a9	xfs: correctly decrement the extent buffer index in xfs_bmap_del_extent The code in xfs_bmap_del_extent does not correctly decrement the extent buffer index when deleting a whole extent. Most of the time this gets caught by checks in xfs_bmapi that work around it and decrement it manually and thus wasn't noticed so far. Based on an earlier patch from Lachlan McIlroy. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Lachlan McIlroy <lmcilroy@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-05-25 10:48:38 -05:00
Christoph Hellwig	87bef1812d	xfs: check for valid indices in xfs_iext_get_ext and xfs_iext_idx_to_irec Based on an earlier patch from Lachlan McIlroy. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Lachlan McIlroy <lmcilroy@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-05-25 10:48:37 -05:00
Christoph Hellwig	ab1908a5bb	xfs: fix up asserts in xfs_iflush_fork Remove asserts in xfs_iflush_fork that would call xfs_iext_get_ext with a potentially invalid extent buffer index. Based on an earlier patch from Lachlan McIlroy. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Lachlan McIlroy <lmcilroy@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-05-25 10:48:37 -05:00
Christoph Hellwig	f1c63b73cf	xfs: do not do pointer arithmetic on extent records We need to call xfs_iext_get_ext for the previous extent to get a valid pointer, and can't just do pointer arithmetics as they might be in different pages. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Lachlan McIlroy <lmcilroy@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-05-25 10:48:37 -05:00
Christoph Hellwig	00239acf36	xfs: do not use unchecked extent indices in xfs_bunmapi Make sure to only call xfs_iext_get_ext after we've validate the extent index when moving on to the next index in xfs_bunmapi. Also remove the old workaround for too large indices that has been superceeded by the proper fix in xfs_bmap_del_extent. Based on an earlier patch from Lachlan McIlroy. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Lachlan McIlroy <lmcilroy@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-05-25 10:48:37 -05:00
Christoph Hellwig	5690f92199	xfs: do not use unchecked extent indices in xfs_bmapi Make sure to only call xfs_iext_get_ext after we've validate the extent index when moving on to the next index in xfs_bmapi. Based on an earlier patch from Lachlan McIlroy. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Lachlan McIlroy <lmcilroy@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-05-25 10:48:37 -05:00
Christoph Hellwig	2f2b3220b0	xfs: do not use unchecked extent indices in xfs_bmap_add_extent_* Make sure to only call xfs_iext_get_ext after we've validate the extent index in the various xfs_bmap_add_extent_* helpers. Based on an earlier patch from Lachlan McIlroy. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Lachlan McIlroy <lmcilroy@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-05-25 10:48:37 -05:00
Christoph Hellwig	ec90c55634	xfs: remove if_lastex The if_lastex field in struct xfs_ifork is only used as a temporary index during xfs_bmapi and xfs_bunmapi. Instead of using the inode fork to store it keep it local in the callchain. Fortunately this is very easy as we already pass a stack copy of it down the whole chain which can simplify be changed to be passed by reference. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-05-25 10:48:37 -05:00
Christoph Hellwig	548932739b	xfs: remove the unused XFS_BMAPI_RSVBLOCKS flag The XFS_BMAPI_RSVBLOCKS is unused, and as far as I can see has always been. Remove it to simplify the bmapi implementation and conserve stack space. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-05-25 10:48:36 -05:00
Andrew Morton	2a5cac17c0	fs/ncpfs/inode.c: suppress used-uninitialised warning We get this spurious warning: fs/ncpfs/inode.c: In function 'ncp_fill_super': fs/ncpfs/inode.c:451: warning: 'data.mounted_vol[1u]' may be used uninitialized in this function fs/ncpfs/inode.c:451: warning: 'data.mounted_vol[2u]' may be used uninitialized in this function fs/ncpfs/inode.c:451: warning: 'data.mounted_vol[3u]' may be used uninitialized in this function ... It's notabug, but we can easily fix it with a memset(). Reported-by: Harry Wei <jiaweiwei.xiyou@gmail.com> Cc: Petr Vandrovec <petr@vandrovec.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:56 -07:00
Amerigo Wang	e50c1f609c	fscache: remove dead code under CONFIG_WORKQUEUE_DEBUGFS There is no CONFIG_WORKQUEUE_DEBUGFS any more, so this code is dead. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:44 -07:00
Stephen Wilson	5b52fc890b	proc: allocate storage for numa_maps statistics once In show_numa_map() we collect statistics into a numa_maps structure. Since the number of NUMA nodes can be very large, this structure is not a candidate for stack allocation. Instead of going thru a kmalloc()+kfree() cycle each time show_numa_map() is invoked, perform the allocation just once when /proc/pid/numa_maps is opened. Performing the allocation when numa_maps is opened, and thus before a reference to the target tasks mm is taken, eliminates a potential stalemate condition in the oom-killer as originally described by Hugh Dickins: ... imagine what happens if the system is out of memory, and the mm we're looking at is selected for killing by the OOM killer: while we wait in __get_free_page for more memory, no memory is freed from the selected mm because it cannot reach exit_mmap while we hold that reference. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:35 -07:00
Stephen Wilson	f2beb79836	proc: make struct proc_maps_private truly private Now that mm/mempolicy.c is no longer implementing /proc/pid/numa_maps there is no need to export struct proc_maps_private to the world. Move it to fs/proc/internal.h instead. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:35 -07:00
Stephen Wilson	f69ff943df	mm: proc: move show_numa_map() to fs/proc/task_mmu.c Moving show_numa_map() from mempolicy.c to task_mmu.c solves several issues. - Having the show() operation "miles away" from the corresponding seq_file iteration operations is a maintenance burden. - The need to export ad hoc info like struct proc_maps_private is eliminated. - The implementation of show_numa_map() can be improved in a simple manner by cooperating with the other seq_file operations (start, stop, etc) -- something that would be messy to do without this change. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:34 -07:00
Eric Paris	b09e0fa4b4	tmpfs: implement generic xattr support Implement generic xattrs for tmpfs filesystems. The Feodra project, while trying to replace suid apps with file capabilities, realized that tmpfs, which is used on the build systems, does not support file capabilities and thus cannot be used to build packages which use file capabilities. Xattrs are also needed for overlayfs. The xattr interface is a bit odd. If a filesystem does not implement any {get,set,list}xattr functions the VFS will call into some random LSM hooks and the running LSM can then implement some method for handling xattrs. SELinux for example provides a method to support security.selinux but no other security.* xattrs. As it stands today when one enables CONFIG_TMPFS_POSIX_ACL tmpfs will have xattr handler routines specifically to handle acls. Because of this tmpfs would loose the VFS/LSM helpers to support the running LSM. To make up for that tmpfs had stub functions that did nothing but call into the LSM hooks which implement the helpers. This new patch does not use the LSM fallback functions and instead just implements a native get/set/list xattr feature for the full security.* and trusted.* namespace like a normal filesystem. This means that tmpfs can now support both security.selinux and security.capability, which was not previously possible. The basic implementation is that I attach a: struct shmem_xattr { struct list_head list; /* anchored by shmem_inode_info->xattr_list / char name; size_t size; char value[0]; }; Into the struct shmem_inode_info for each xattr that is set. This implementation could easily support the user.* namespace as well, except some care needs to be taken to prevent large amounts of unswappable memory being allocated for unprivileged users. [mszeredi@suse.cz: new config option, suport trusted.*, support symlinks] Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com> Tested-by: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Hugh Dickins <hughd@google.com> Tested-by: Jordi Pujol <jordipujolp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:31 -07:00
Ying Han	1495f230fa	vmscan: change shrinker API by passing shrink_control struct Change each shrinker's API by consolidating the existing parameters into shrink_control struct. This will simplify any further features added w/o touching each file of shrinker. [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: fix warning] [kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API] [akpm@linux-foundation.org: fix xfs warning] [akpm@linux-foundation.org: update gfs2] Signed-off-by: Ying Han <yinghan@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:26 -07:00
Ying Han	a09ed5e000	vmscan: change shrink_slab() interfaces by passing shrink_control Consolidate the existing parameters to shrink_slab() into a new shrink_control struct. This is needed later to pass the same struct to shrinkers. Signed-off-by: Ying Han <yinghan@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:25 -07:00
Peter Zijlstra	3d48ae45e7	mm: Convert i_mmap_lock to a mutex Straightforward conversion of i_mmap_lock to a mutex. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Hugh Dickins <hughd@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:18 -07:00
Peter Zijlstra	97a894136f	mm: Remove i_mmap_lock lockbreak Hugh says: "The only significant loser, I think, would be page reclaim (when concurrent with truncation): could spin for a long time waiting for the i_mmap_mutex it expects would soon be dropped? " Counter points: - cpu contention makes the spin stop (need_resched()) - zap pages should be freeing pages at a higher rate than reclaim ever can I think the simplification of the truncate code is definitely worth it. Effectively reverts: `2aa15890f3` ("mm: prevent concurrent unmap_mapping_range() on the same inode") and takes out the code that caused its problem. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:17 -07:00
Peter Zijlstra	d16dfc550f	mm: mmu_gather rework Rework the existing mmu_gather infrastructure. The direct purpose of these patches was to allow preemptible mmu_gather, but even without that I think these patches provide an improvement to the status quo. The first 9 patches rework the mmu_gather infrastructure. For review purpose I've split them into generic and per-arch patches with the last of those a generic cleanup. The next patch provides generic RCU page-table freeing, and the followup is a patch converting s390 to use this. I've also got 4 patches from DaveM lined up (not included in this series) that uses this to implement gup_fast() for sparc64. Then there is one patch that extends the generic mmu_gather batching. After that follow the mm preemptibility patches, these make part of the mm a lot more preemptible. It converts i_mmap_lock and anon_vma->lock to mutexes which together with the mmu_gather rework makes mmu_gather preemptible as well. Making i_mmap_lock a mutex also enables a clean-up of the truncate code. This also allows for preemptible mmu_notifiers, something that XPMEM I think wants. Furthermore, it removes the new and universially detested unmap_mutex. This patch: Remove the first obstacle towards a fully preemptible mmu_gather. The current scheme assumes mmu_gather is always done with preemption disabled and uses per-cpu storage for the page batches. Change this to try and allocate a page for batching and in case of failure, use a small on-stack array to make some progress. Preemptible mmu_gather is desired in general and usable once i_mmap_lock becomes a mutex. Doing it before the mutex conversion saves us from having to rework the code by moving the mmu_gather bits inside the pte_lock. Also avoid flushing the tlb batches from under the pte lock, this is useful even without the i_mmap_lock conversion as it significantly reduces pte lock hold times. [akpm@linux-foundation.org: fix comment tpyo] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:12 -07:00
Michal Hocko	d05f3169c0	mm: make expand_downwards() symmetrical with expand_upwards() Currently we have expand_upwards exported while expand_downwards is accessible only via expand_stack or expand_stack_downwards. check_stack_guard_page is a nice example of the asymmetry. It uses expand_stack for VM_GROWSDOWN while expand_upwards is called for VM_GROWSUP case. Let's clean this up by exporting both functions and make those names consistent. Let's use expand_{upwards,downwards} because expanding doesn't always involve stack manipulation (an example is ia64_do_page_fault which uses expand_upwards for registers backing store expansion). expand_downwards has to be defined for both CONFIG_STACK_GROWS{UP,DOWN} because get_arg_page calls the downwards version in the early process initialization phase for growsup configuration. Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: Hugh Dickins <hughd@google.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:12 -07:00
Aneesh Kumar K.V	398c4f0efb	fs/9p: Don't clunk dentry fid when we fail to get a writeback inode The dentry fid get clunked via the dput. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-05-25 08:46:38 -05:00
Eric Van Hensbergen	87211cd8db	9p: remove experimental tag from tested configurations The 9p client is currently undergoing regular regresssion and stress testing as a by-product of the virtfs work. I think its finally time to take off the experimental tags from the well-tested code paths. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-05-25 08:46:38 -05:00
Vivek Haldar	556b27abf7	ext4: do not normalize block requests from fallocate() Currently, an fallocate request of size slightly larger than a power of 2 is turned into two block requests, each a power of 2, with the extra blocks pre-allocated for future use. When an application calls fallocate, it already has an idea about how large the file may grow so there is usually little benefit to reserve extra blocks on the preallocation list. This reduces disk fragmentation. Tested: fsstress. Also verified manually that fallocat'ed files are contiguously laid out with this change (whereas without it they begin at power-of-2 boundaries, leaving blocks in between). CPU usage of fallocate is not appreciably higher. In a tight fallocate loop, CPU usage hovers between 5%-8% with this change, and 5%-7% without it. Using a simulated file system aging program which the file system to 70%, the percentage of free extents larger than 8MB (as measured by e2freefrag) increased from 38.8% without this change, to 69.4% with this change. Signed-off-by: Vivek Haldar <haldar@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-25 07:41:54 -04:00
Allison Henderson	a4bb6b64e3	ext4: enable "punch hole" functionality This patch adds new routines: "ext4_punch_hole" "ext4_ext_punch_hole" and "ext4_ext_check_cache" fallocate has been modified to call ext4_punch_hole when the punch hole flag is passed. At the moment, we only support punching holes in extents, so this routine is pretty much a wrapper for the ext4_ext_punch_hole routine. The ext4_ext_punch_hole routine first completes all outstanding writes with the associated pages, and then releases them. The unblock aligned data is zeroed, and all blocks in between are punched out. The ext4_ext_check_cache routine is very similar to ext4_ext_in_cache except it accepts a ext4_ext_cache parameter instead of a ext4_extent parameter. This routine is used by ext4_ext_punch_hole to check and see if a block in a hole that has been cached. The ext4_ext_cache parameter is necessary because the members ext4_extent structure are not large enough to hold a 32 bit value. The existing ext4_ext_in_cache routine has become a wrapper to this new function. [ext4 punch hole patch series 5/5 v7] Signed-off-by: Allison Henderson <achender@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Mingming Cao <cmm@us.ibm.com>	2011-05-25 07:41:50 -04:00
Allison Henderson	e861304b8e	ext4: add "punch hole" flag to ext4_map_blocks() This patch adds a new flag to ext4_map_blocks() that specifies the given range of blocks should be punched out. Extents are first converted to uninitialized extents before they are punched out. Because punching a hole may require that the extent be split, it is possible that the splitting may need more blocks than are available. To deal with this, use of reserved blocks are enabled to allow the split to proceed. The routine then returns the number of blocks successfully punched out. [ext4 punch hole patch series 4/5 v7] Signed-off-by: Allison Henderson <achender@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Mingming Cao <cmm@us.ibm.com>	2011-05-25 07:41:46 -04:00
Allison Henderson	d583fb87a3	ext4: punch out extents This patch modifies the truncate routines to support hole punching Below is a brief summary of the patches changes: - Added end param to ext_ext4_rm_leaf This function has been modified to accept an end parameter which enables it to punch holes in leafs instead of just truncating them. - Implemented the "remove head" case in the ext_remove_blocks routine This routine is used by ext_ext4_rm_leaf to remove the tail of an extent during a truncate. The new ext_ext4_rm_leaf routine will now also use it to remove the head of an extent in the case that the hole covers a region of blocks at the beginning of an extent. - Added "end" param to ext4_ext_remove_space routine This function has been modified to accept a stop parameter, which is passed through to ext4_ext_rm_leaf. [ext4 punch hole patch series 3/5 v6] Signed-off-by: Allison Henderson <achender@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-25 07:41:43 -04:00
Allison Henderson	308488518d	ext4: add new function ext4_block_zero_page_range() This patch modifies the existing ext4_block_truncate_page() function which was used by the truncate code path, and which zeroes out block unaligned data, by adding a new length parameter, and renames it to ext4_block_zero_page_rage(). This function can now be used to zero out the head of a block, the tail of a block, or the middle of a block. The ext4_block_truncate_page() function is now a wrapper to ext4_block_zero_page_range(). [ext4 punch hole patch series 2/5 v7] Signed-off-by: Allison Henderson <achender@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Mingming Cao <cmm@us.ibm.com>	2011-05-25 07:41:32 -04:00
Allison Henderson	55f020db66	ext4: add flag to ext4_has_free_blocks This patch adds an allocation request flag to the ext4_has_free_blocks function which enables the use of reserved blocks. This will allow a punch hole to proceed even if the disk is full. Punching a hole may require additional blocks to first split the extents. Because ext4_has_free_blocks is a low level function, the flag needs to be passed down through several functions listed below: ext4_ext_insert_extent ext4_ext_create_new_leaf ext4_ext_grow_indepth ext4_ext_split ext4_ext_new_meta_block ext4_mb_new_blocks ext4_claim_free_blocks ext4_has_free_blocks [ext4 punch hole patch series 1/5 v7] Signed-off-by: Allison Henderson <achender@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Mingming Cao <cmm@us.ibm.com>	2011-05-25 07:41:26 -04:00
Tristan Ye	dda54e76d7	Ocfs2/move_extents: Set several trivial constraints for threshold. The threshold should be greater than clustersize and less than i_size. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>	2011-05-25 15:17:13 +08:00
Tristan Ye	4dfa66bd59	Ocfs2/move_extents: Let defrag handle partial extent moving. We're going to support partial extent moving, which may split entire extent movement into pieces to compromise the insuffice allocations, it eases the 'ENSPC' pain and makes the whole moving much less likely to fail, the downside is it may make the fs even more fragmented before moving, just let the userspace make a trade-off here. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>	2011-05-25 15:17:12 +08:00
Tristan Ye	53069d4e76	Ocfs2/move_extents: move/defrag extents within a certain range. the basic logic of moving extents for a file is pretty like punching-hole sequence, walk the extents within the range as user specified, calculating an appropriate len to defrag/move, then let ocfs2_defrag/move_extent() to do the actual moving. This func ends up setting 'OCFS2_MOVE_EXT_FL_COMPLETE' to userpace if operation gets done successfully. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>	2011-05-25 15:17:12 +08:00
Tristan Ye	ee16cc037e	Ocfs2/move_extents: helper to calculate the defraging length in one run. The helper is to calculate the defrag length in one run according to a threshold, it will proceed doing defragmentation until the threshold was meet, and skip a LARGE extent if any. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>	2011-05-25 15:17:12 +08:00
Tristan Ye	e08477176d	Ocfs2/move_extents: move entire/partial extent. ocfs2_move_extent() logic will validate the goal_offset_in_block, where extents to be moved, what's more, it also compromises a bit to probe the appropriate region around given goal_offset when the original goal is not able to fit the movement. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>	2011-05-25 15:17:11 +08:00
Tristan Ye	8473aa8a2b	Ocfs2/move_extents: helpers to update the group descriptor and global bitmap inode. These helpers were actually borrowed from alloc.c, which may be publicized later. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>	2011-05-25 15:17:11 +08:00
Tristan Ye	e6b5859ccc	Ocfs2/move_extents: helper to probe a proper region to move in an alloc group. Before doing the movement of extents, we'd better probe the alloc group from 'goal_blk' for searching a contiguous region to fit the wanted movement, we even will have a best-effort try by compromising to a threshold around the given goal. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>	2011-05-25 15:17:11 +08:00
Tristan Ye	99e4c75041	Ocfs2/move_extents: helper to validate and adjust moving goal. First best-effort attempt to validate and adjust the goal (physical address in block), while it can't guarantee later operation can succeed all the time since global_bitmap may change a bit over time. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>	2011-05-25 15:17:10 +08:00
Tristan Ye	1c06b91261	Ocfs2/move_extents: find the victim alloc group, where the given #blk fits. This function tries locate the right alloc group, where a given physical block resides, it returns the caller a buffer_head of victim group descriptor, and also the offset of block in this group, by passing the block number. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>	2011-05-25 15:17:10 +08:00
Tristan Ye	202ee5facb	Ocfs2/move_extents: defrag a range of extent. It's a relatively complete function to accomplish defragmentation for entire or partial extent, one journal handle was kept during the operation, it was logically doing one more thing than ocfs2_move_extent() acutally, yes, it's claiming the new clusters itself;-) Signed-off-by: Tristan Ye <tristan.ye@oracle.com>	2011-05-25 15:17:09 +08:00
Tristan Ye	8f603e567a	Ocfs2/move_extents: move a range of extent. The moving range of __ocfs2_move_extent() was within one extent always, it consists following parts: 1. Duplicates the clusters in pages to new_blkoffset, where extent to be moved. 2. Split the original extent with new extent, coalecse the nearby extents if possible. 3. Append old clusters to truncate log, or decrease_refcount if the extent was refcounted. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>	2011-05-25 15:17:09 +08:00
Tristan Ye	de474ee8bb	Ocfs2/move_extents: lock allocators and reserve metadata blocks and data clusters for extents moving. ocfs2_lock_allocators_move_extents() was like the common ocfs2_lock_allocators(), to lock metadata and data alloctors during extents moving, reserve appropriate metadata blocks and data clusters, also performa a best- effort to calculate the credits for journal transaction in one run of movement. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>	2011-05-25 15:17:09 +08:00
Tristan Ye	028ba5df63	Ocfs2/move_extents: Add basic framework and source files for extent moving. Adding new files move_extents.[c\|h] and fill it with nothing but only a context structure. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>	2011-05-25 15:17:08 +08:00
Tristan Ye	220ebc4334	Ocfs2/move_extents: Adding new ioctl code 'OCFS2_IOC_MOVE_EXT' to ocfs2. Patch also manages to add a manipulative struture for this ioctl. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>	2011-05-25 15:17:08 +08:00
Tristan Ye	3e19a25e05	Ocfs2/refcounttree: Publicize couple of funcs from refcounttree.c The original goal of commonizing these funcs is to benefit defraging/extent_moving codes in the future, based on the fact that reflink and defragmentation having the same Copy-On-Wrtie mechanism. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>	2011-05-25 15:17:08 +08:00
Tristan Ye	d24a10b9f8	Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' for o2info ioctl. This new code is a bit more complicated than former ones, the goal is to show user all statistics required to take a deep insight into filesystem on how the disk is being fragmentaed. The goal is achieved by scaning global bitmap from (cluster)group to group to figure out following factors in the filesystem: - How many free chunks in a fixed size as user requested. - How many real free chunks in all size. - Min/Max/Avg size(in) clusters of free chunks. - How do free chunks distribute(in size) in terms of a histogram, just like following: --------------------------------------------------------- Extent Size Range : Free extents Free Clusters Percent 32K... 64K- : 1 1 0.00% 1M... 2M- : 9 288 0.03% 8M... 16M- : 2 831 0.09% 32M... 64M- : 1 2047 0.23% 128M... 256M- : 1 8191 0.92% 256M... 512M- : 2 21706 2.43% 512M... 1024M- : 27 858623 96.29% --------------------------------------------------------- Userspace ioctl() call eventually gets the above info returned by passing a 'struct ocfs2_info_freefrag' with the chunk_size being specified first. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>	2011-05-25 12:18:07 +08:00
Tristan Ye	3e5db17d4d	Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl. The new code is dedicated to calculate free inodes number of all inode_allocs, then return the info to userpace in terms of an array. Specially, flag 'OCFS2_INFO_FL_NON_COHERENT', manipulated by '--cluster-coherent' from userspace, is now going to be involved. setting the flag on means no cluster coherency considered, usually, userspace tools choose none-coherency strategy by default for the sake of performace. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>	2011-05-25 12:18:02 +08:00
Tristan Ye	8aa1fa360d	Ocfs2: Using inline funcs to set/clear FILLED flags in info handler. It just removes some macros for the sake of typechecking gains. Signed-off-by: Tristan Ye <tristan.ye@oracle.com>	2011-05-25 12:17:18 +08:00
Aditya Kali	ae81230686	ext4: reserve inodes and feature code for 'quota' feature I am working on patch to add quota as a built-in feature for ext4 filesystem. The implementation is based on the design given at https://ext4.wiki.kernel.org/index.php/Design_For_1st_Class_Quota_in_Ext4. This patch reserves the inode numbers 3 and 4 for quota purposes and also reserves EXT4_FEATURE_RO_COMPAT_QUOTA feature code. Signed-off-by: Aditya Kali <adityakali@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-24 19:00:39 -04:00
Johann Lombardi	c5e06d101a	ext4: add support for multiple mount protection Prevent an ext4 filesystem from being mounted multiple times. A sequence number is stored on disk and is periodically updated (every 5 seconds by default) by a mounted filesystem. At mount time, we now wait for s_mmp_update_interval seconds to make sure that the MMP sequence does not change. In case of failure, the nodename, bdevname and the time at which the MMP block was last updated is displayed. Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: Johann Lombardi <johann@whamcloud.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-24 18:31:25 -04:00
Eric W. Biederman	62ca24baf1	ns proc: Return -ENOENT for a nonexistent /proc/self/ns/ entry. Spotted-by: Nathan Lynch <ntl@pobox.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2011-05-24 15:30:33 -07:00
Kazuya Mio	d02a9391f7	ext4: ensure f_bfree returned by ext4_statfs() is non-negative I found the issue that the number of free blocks went negative. # stat -f /mnt/mp1/ File: "/mnt/mp1/" ID: e175ccb83a872efe Namelen: 255 Type: ext2/ext3 Block size: 4096 Fundamental block size: 4096 Blocks: Total: 258022 Free: -15 Available: -13122 Inodes: Total: 65536 Free: 63029 f_bfree in struct statfs will go negative when the filesystem has few free blocks. Because the number of dirty blocks is bigger than the number of free blocks in the following two cases. CASE 1: ext4_da_writepages mpage_da_map_and_submit ext4_map_blocks ext4_ext_map_blocks ext4_mb_new_blocks ext4_mb_diskspace_used percpu_counter_sub(&sbi->s_freeblocks_counter, ac->ac_b_ex.fe_len); <--- interrupt statfs systemcall ---> ext4_da_update_reserve_space percpu_counter_sub(&sbi->s_dirtyblocks_counter, used + ei->i_allocated_meta_blocks); CASE 2: ext4_write_begin __block_write_begin ext4_map_blocks ext4_ext_map_blocks ext4_mb_new_blocks ext4_mb_diskspace_used percpu_counter_sub(&sbi->s_freeblocks_counter, ac->ac_b_ex.fe_len); <--- interrupt statfs systemcall ---> percpu_counter_sub(&sbi->s_dirtyblocks_counter, reserv_blks); To avoid the issue, this patch ensures that f_bfree is non-negative. Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>	2011-05-24 18:30:07 -04:00
Lukas Czerner	28739eea9c	ext4: protect bb_first_free in ext4_trim_all_free() with group lock We should protect reading bd_info->bb_first_free with the group lock because otherwise we might miss some free blocks. This is not a big deal at all, but the change to do right thing is really simple, so lets do that. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-24 18:28:07 -04:00
Lukas Czerner	7894408666	ext4: only load buddy bitmap in ext4_trim_fs() when it is needed Currently we are loading buddy ext4_mb_load_buddy() for every block group we are going through in ext4_trim_fs() in many cases just to find out that there is not enough space to be bothered with. As Amir Goldstein suggested we can use bb_free information directly from ext4_group_info. This commit removes ext4_mb_load_buddy() from ext4_trim_fs() and rather get the ext4_group_info via ext4_get_group_info() and use the bb_free information directly from that. This avoids unnecessary call to load buddy in the case the group does not have enough free space to trim. Loading buddy is now moved to ext4_trim_all_free(). Tested by me with xfstests 251. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-24 18:16:27 -04:00
Linus Torvalds	dc522adbee	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: jbd: Fix comment to match the code in journal_start() jbd/jbd2: remove obsolete summarise_journal_usage. jbd: Fix forever sleeping process in do_get_write_access() ext2: fix error msg when mounting fs with too-large blocksize jbd: fix fsync() tid wraparound bug ext3: Fix fs corruption when make_indexed_dir() fails ext3: Fix lock inversion in ext3_symlink()	2011-05-24 15:11:46 -07:00
Linus Torvalds	df3256f9ab	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: make plock operation killable dlm: remove shared message stub for recovery dlm: delayed reply message warning dlm: Remove superfluous call to recalc_sigpending()	2011-05-24 15:04:00 -07:00
Eryu Guan	c867516de5	jbd2: Fix comment to match the code in jbd2__journal_start() jbd2__journal_start() returns an ERR_PTR() value rather than NULL on failure. Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-24 17:09:58 -04:00
John W. Linville	31ec97d9ce	Merge ssh://master.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem	2011-05-24 16:47:54 -04:00
Linus Torvalds	b0ca118dba	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (43 commits) TOMOYO: Fix wrong domainname validation. SELINUX: add /sys/fs/selinux mount point to put selinuxfs CRED: Fix load_flat_shared_library() to initialise bprm correctly SELinux: introduce path_has_perm flex_array: allow 0 length elements flex_arrays: allow zero length flex arrays flex_array: flex_array_prealloc takes a number of elements, not an end SELinux: pass last path component in may_create SELinux: put name based create rules in a hashtable SELinux: generic hashtab entry counter SELinux: calculate and print hashtab stats with a generic function SELinux: skip filename trans rules if ttype does not match parent dir SELinux: rename filename_compute_type argument to type instead of con SELinux: fix comment to state filename_compute_type takes an objname not a qstr SMACK: smack_file_lock can use the struct path LSM: separate LSM_AUDIT_DATA_DENTRY from LSM_AUDIT_DATA_PATH LSM: split LSM_AUDIT_DATA_FS into _PATH and _INODE SELINUX: Make selinux cache VFS RCU walks safe SECURITY: Move exec_permission RCU checks into security modules SELinux: security_read_policy should take a size_t not ssize_t ...	2011-05-24 13:38:19 -07:00
Sage Weil	db3540522e	ceph: fix cap flush race reentrancy In `e9964c10` we change cap flushing to do a delicate dance because some inodes on the cap_dirty list could be in a migrating state (got EXPORT but not IMPORT) in which we couldn't actually flush and move from dirty->flushing, breaking the while (!empty) { process first } loop structure. It worked for a single sync thread, but was not reentrant and triggered infinite loops when multiple syncers came along. Instead, move inodes with dirty to a separate cap_dirty_migrating list when in the limbo export-but-no-import state, allowing us to go back to the simple loop structure (which was reentrant). This is cleaner and more robust. Audited the cap_dirty users and this looks fine: list_empty(&ci->i_dirty_item) is still a reliable indicator of whether we have dirty caps (which list we're on is irrelevant) and list_del_init() calls still do the right thing. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-24 11:52:12 -07:00
Sage Weil	45e3d3eeb6	ceph: avoid inode lookup on nfs fh reconnect If we get the inode from the MDS, we have a reference in req; don't do a fresh lookup. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-24 11:52:06 -07:00
Sage Weil	3c454cf216	ceph: use LOOKUPINO to make unconnected nfs fh more reliable If we are unable to locate an inode by ino, ask the MDS using the new LOOKUPINO command. Signed-off-by: Sage Weil <sage@newdream.net>	2011-05-24 11:52:05 -07:00
Linus Torvalds	eb08d8ff47	Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6 * 'linux-next' of git://git.infradead.org/ubifs-2.6: (52 commits) UBIFS: switch to dynamic printks UBIFS: fix kernel-doc comments UBIFS: fix extremely rare mount failure UBIFS: simplify LEB recovery function further UBIFS: always cleanup the recovered LEB UBIFS: clean up LEB recovery function UBIFS: fix-up free space on mount if flag is set UBIFS: add the fixup function UBIFS: add a superblock flag for free space fix-up UBIFS: share the next_log_lnum helper UBIFS: expect corruption only in last journal head LEBs UBIFS: synchronize write-buffer before switching to the next bud UBIFS: remove BUG statement UBIFS: change bud replay function conventions UBIFS: substitute the replay tree with a replay list UBIFS: simplify replay UBIFS: store free and dirty space in the bud replay entry UBIFS: remove unnecessary stack variable UBIFS: double check that buds are replied in order UBIFS: make 2 functions static ...	2011-05-24 11:51:07 -07:00
Christoph Hellwig	55a7bc5a30	xfs: do not discard alloc btree blocks Blocks for the allocation btree are allocated from and released to the AGFL, and thus frequently reused. Even worse we do not have an easy way to avoid using an AGFL block when it is discarded due to the simple FILO list of free blocks, and thus can frequently stall on blocks that are currently undergoing a discard. Add a flag to the busy extent tracking structure to skip the discard for allocation btree blocks. In normal operation these blocks are reused frequently enough that there is no need to discard them anyway, but if they spill over to the allocation btree as part of a balance we "leak" blocks that we would otherwise discard. We could fix this by adding another flag and keeping these block in the rbtree even after they aren't busy any more so that we could discard them when they migrate out of the AGFL. Given that this would cause significant overhead I don't think it's worthwile for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-05-24 11:17:22 -05:00
Christoph Hellwig	e84661aa84	xfs: add online discard support Now that we have reliably tracking of deleted extents in a transaction we can easily implement "online" discard support which calls blkdev_issue_discard once a transaction commits. The actual discard is a two stage operation as we first have to mark the busy extent as not available for reuse before we can start the actual discard. Note that we don't bother supporting discard for the non-delaylog mode. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-05-24 11:17:13 -05:00
Jan Kara	93628ffb9b	ext4: fix waiting and sending of a barrier in ext4_sync_file() jbd2_log_start_commit() returns 1 only when we really start a transaction. But we also need to wait for a transaction when the commit is already running. Fix this problem by waiting for transaction commit unconditionally (which is just a quick check if the transaction is already committed). Also we have to be more careful with sending of a barrier because when transaction is being committed in parallel to ext4_sync_file() running, we cannot be sure that the barrier the journalling code sends happens after we wrote all the data for fsync (note that not every data writeout needs to trigger metadata changes thus commit of some metadata changes can be running while other data is still written out). So use jbd2_will_send_data_barrier() helper to detect the common cases when we can be sure barrier will be issued by the commit code and issue the barrier ourselves in the remaining cases. Reported-by: Edward Goggin <egoggin@vmware.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-24 12:00:54 -04:00
Jan Kara	bbd2be3691	jbd2: Add function jbd2_trans_will_send_data_barrier() Provide a function which returns whether a transaction with given tid will send a flush to the filesystem device. The function will be used by ext4 to detect whether fsync needs to send a separate flush or not. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-24 11:59:18 -04:00
Jan Kara	81be12c817	jbd2: fix sending of data flush on journal commit In data=ordered mode, it's theoretically possible (however rare) that an inode is filed to transaction's t_inode_list and a flusher thread writes all the data and inode is reclaimed before the transaction starts to commit. In such a case, we could erroneously omit sending a flush to file system device when it is different from the journal device (because data can still be in disk cache only). Fix the problem by setting a flag in a transaction when some inode is added to it and then send disk flush in the commit code when the flag is set. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-24 11:52:40 -04:00
Yongqiang Yang	b221349fa8	ext4: fix ext4_ext_fiemap_cb() to handle blocks before request range correctly To get delayed-extent information, ext4_ext_fiemap_cb() looks up pagecache, it thus collects information starting from a page's head block. If blocksize < pagesize, the beginning blocks of a page may lies before the request range. So ext4_ext_fiemap_cb() should proceed ignoring them, because they has been handled before. If no mapped buffer in the range is found in the 1st page, we need to look up the 2nd page, otherwise delayed-extents after a hole will be ignored. Without this patch, xfstests 225 will hung on ext4 with 1K block. Reported-by: Amir Goldstein <amir73il@users.sourceforge.net> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-05-24 11:36:58 -04:00
James Morris	434d42cfd0	Merge branch 'next' into for-linus	2011-05-24 22:55:24 +10:00
Robin Dong	98ba073c60	ocfs2: change incorrect 'extern' keyword to 'static' in dlmfs Change function param_set_dlmfs_capabilities from 'extern' to 'static' since function param_get_dlmfs_capabilities is also 'static'. Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-05-23 23:59:40 -07:00
Sunil Mushran	9f62e96084	ocfs2/dlm: dlm_is_lockres_migrateable() returns boolean Patch cleans up the gunk added by commit `388c4bcb4e`. dlm_is_lockres_migrateable() now returns 1 if lockresource is deemed migrateable and 0 if not. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-05-23 23:37:39 -07:00
Tao Ma	10fca35ff1	ocfs2: Add trace event for trim. Add the corresponding trace event for trim. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-05-23 23:37:20 -07:00
Tao Ma	55e67872b6	ocfs2: Add FITRIM ioctl. Add the corresponding ioctl function for FITRIM. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-05-23 23:37:19 -07:00

1 2 3 4 5 ...

22823 Commits