iv/linux

Author	SHA1	Message	Date
Jan Kara	5acda9d12d	bdi: avoid oops on device removal After commit 839a8e8660b6 ("writeback: replace custom worker pool implementation with unbound workqueue") when device is removed while we are writing to it we crash in bdi_writeback_workfn() -> set_worker_desc() because bdi->dev is NULL. This can happen because even though bdi_unregister() cancels all pending flushing work, nothing really prevents new ones from being queued from balance_dirty_pages() or other places. Fix the problem by clearing BDI_registered bit in bdi_unregister() and checking it before scheduling of any flushing work. Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977 Reviewed-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Cc: Derek Basehore <dbasehore@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-04-03 16:20:49 -07:00
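The fix in the entry above (clear a "registered" bit on unregister, check it before queuing any flush work) can be modeled in userspace roughly as follows. This is only a sketch: `struct fake_bdi` and all `fake_bdi_*` functions are hypothetical names invented for illustration, and the kernel uses its own bit operations on BDI_registered rather than C11 atomics.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a backing device; not the kernel's struct. */
struct fake_bdi {
    atomic_bool registered;  /* models the BDI_registered bit */
    const char *dev_name;    /* models bdi->dev; NULL after unregister */
    int queued_work;         /* number of flush works accepted so far */
};

void fake_bdi_register(struct fake_bdi *bdi, const char *name)
{
    bdi->dev_name = name;
    bdi->queued_work = 0;
    atomic_store(&bdi->registered, true);
}

void fake_bdi_unregister(struct fake_bdi *bdi)
{
    /* Clear the bit first so that no new work is accepted ... */
    atomic_store(&bdi->registered, false);
    /* ... then tear down what a work function would dereference. */
    bdi->dev_name = NULL;
}

/* Returns true if flush work was queued, false if the device is gone. */
bool fake_bdi_queue_flush(struct fake_bdi *bdi)
{
    if (!atomic_load(&bdi->registered))
        return false;
    bdi->queued_work++;
    return true;
}
```

In this model, a caller such as the hypothetical equivalent of balance_dirty_pages() simply gets `false` back after unregistration instead of queuing work that would later touch a NULL device.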
Derek Basehore	6ca738d60c	backing_dev: fix hung task on sync bdi_wakeup_thread_delayed() used the mod_delayed_work() function to schedule work to writeback dirty inodes. The problem with this is that it can delay work that is scheduled for immediate execution, such as the work from sync_inodes_sb(). This can happen since mod_delayed_work() can now steal work from a work_queue. This fixes the problem by using queue_delayed_work() instead. This is a regression caused by commit 839a8e8660b6 ("writeback: replace custom worker pool implementation with unbound workqueue"). The reason that this causes a problem is that laptop-mode will change the delay, dirty_writeback_centisecs, to 60000 (10 minutes) by default. In the case that bdi_wakeup_thread_delayed() races with sync_inodes_sb(), sync will be stopped for 10 minutes and trigger a hung task. Even if dirty_writeback_centisecs is not long enough to cause a hung task, we still don't want to delay sync for that long. We fix the problem by using queue_delayed_work() when we want to schedule writeback sometime in future. This function doesn't change the timer if it is already armed. For the same reason, we also change bdi_writeback_workfn() to immediately queue the work again in the case that the work_list is not empty. The same problem can happen if the sync work is run on the rescue worker. [jack@suse.cz: update changelog, add comment, use bdi_wakeup_thread_delayed()] Signed-off-by: Derek Basehore <dbasehore@chromium.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zento.linux.org.uk> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Derek Basehore <dbasehore@chromium.org> Cc: Kees Cook <keescook@chromium.org> Cc: Benson Leung <bleung@chromium.org> Cc: Sonny Rao <sonnyrao@chromium.org> Cc: Luigi Semenzato <semenzato@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Dave Chinner <david@fromorbit.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-04-03 16:20:49 -07:00
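The difference between the two scheduling calls named in the entry above can be illustrated with a small userspace model. This is a sketch under stated assumptions: `struct fake_delayed_work` and both `fake_*` functions are hypothetical and only mimic the behavior the changelog relies on (queue_delayed_work() leaves an already armed timer alone, mod_delayed_work() re-arms it); they are not the kernel workqueue implementation.

```c
#include <stdbool.h>

/* Hypothetical model of a delayed work item: pending flag plus expiry. */
struct fake_delayed_work {
    bool pending;
    unsigned long expires;  /* absolute time at which the work would run */
};

/* Models queue_delayed_work(): does nothing if the work is already pending. */
bool fake_queue_delayed_work(struct fake_delayed_work *w,
                             unsigned long now, unsigned long delay)
{
    if (w->pending)
        return false;       /* timer already armed: leave it untouched */
    w->pending = true;
    w->expires = now + delay;
    return true;
}

/* Models mod_delayed_work(): always re-arms the timer with the new delay. */
bool fake_mod_delayed_work(struct fake_delayed_work *w,
                           unsigned long now, unsigned long delay)
{
    bool was_pending = w->pending;

    w->pending = true;
    w->expires = now + delay;  /* can push immediate work into the future */
    return was_pending;
}
```

With work already queued for immediate execution (delay 0), the `fake_mod_delayed_work()` variant moves the expiry out by the full delay, which is the sync stall described in the changelog; the `fake_queue_delayed_work()` variant keeps the earlier expiry.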
Dave Chinner	a6cf33bc56	Merge branch 'xfs-bug-fixes-for-3.15-3' into for-next	2014-04-04 08:07:35 +11:00
Mark Tinguely	c88547a811	xfs: fix directory hash ordering bug Commit f5ea1100 ("xfs: add CRCs to dir2/da node blocks") introduced in 3.10 incorrectly converted the btree hash index array pointer in xfs_da3_fixhashpath(). It resulted in the current hash always being compared against the first entry in the btree rather than the current block index into the btree block's hash entry array. As a result, it was comparing the wrong hashes, and so could misorder the entries in the btree. For most cases, this doesn't cause any problems as it requires hash collisions to expose the ordering problem. However, when there are hash collisions within a directory there is a very good probability that the entries will be ordered incorrectly and that actually matters when duplicate hashes are placed into or removed from the btree block hash entry array. This bug results in an on-disk directory corruption and that results in directory verifier functions throwing corruption warnings into the logs. While no data or directory entries are lost, access to them may be compromised, and attempts to remove entries from a directory that has suffered from this corruption may result in a filesystem shutdown. xfs_repair will fix the directory hash ordering without data loss occurring. [dchinner: wrote a useful commit message] cc: <stable@vger.kernel.org> Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-04-04 07:10:49 +11:00
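The class of bug described in the entry above (comparing against the first array entry instead of the entry at the current index) can be shown with a minimal sketch. The names `struct fake_node_entry`, `hash_matches_buggy()` and `hash_matches_fixed()` are hypothetical and greatly simplified; the real code in xfs_da3_fixhashpath() works on big-endian on-disk structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified node entry; the real one is an on-disk format. */
struct fake_node_entry {
    uint32_t hashval;
};

/* Buggy variant: ignores the index and always inspects the first entry. */
bool hash_matches_buggy(const struct fake_node_entry *btree,
                        size_t index, uint32_t lasthash)
{
    (void)index;
    return btree->hashval == lasthash;
}

/* Fixed variant: compares against the entry at the current block index. */
bool hash_matches_fixed(const struct fake_node_entry *btree,
                        size_t index, uint32_t lasthash)
{
    return btree[index].hashval == lasthash;
}
```

The two variants agree only when the index is 0, which is consistent with the changelog's remark that the bug needs particular conditions before it becomes visible.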
Linus Torvalds	32d01dc7be	Merge branch 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "A lot of updates for cgroup: - The biggest one is cgroup's conversion to kernfs. cgroup took after the long abandoned vfs-entangled sysfs implementation and made it even more convoluted over time. cgroup's internal objects were fused with vfs objects which also brought in vfs locking and object lifetime rules. Naturally, there are places where vfs rules don't fit and nasty hacks, such as credential switching or lock dance interleaving inode mutex and cgroup_mutex with object serial number comparison thrown in to decide whether the operation is actually necessary, needed to be employed. After conversion to kernfs, internal object lifetime and locking rules are mostly isolated from vfs interactions allowing shedding of several nasty hacks and overall simplification. This will also allow implementation of operations which may affect multiple cgroups which weren't possible before as it would have required nesting i_mutexes. - Various simplifications including dropping of module support, easier cgroup name/path handling, simplified cgroup file type handling and task_cg_lists optimization. - Preparatory changes for the planned unified hierarchy, which is still a patchset away from being actually operational. The dummy hierarchy is updated to serve as the default unified hierarchy. Controllers which aren't claimed by other hierarchies are associated with it, which BTW was what the dummy hierarchy was for anyway. - Various fixes from Li and others. This pull request includes some patches to add missing slab.h to various subsystems. This was triggered by xattr.h include removal from cgroup.h. cgroup.h indirectly got included in a lot of files, which brought in xattr.h, which brought in slab.h. There are several merge commits - one to pull in kernfs updates necessary for converting cgroup (already in upstream through driver-core), others for interfering changes in the fixes branch" * 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (74 commits) cgroup: remove useless argument from cgroup_exit() cgroup: fix spurious lockdep warning in cgroup_exit() cgroup: Use RCU_INIT_POINTER(x, NULL) in cgroup.c cgroup: break kernfs active_ref protection in cgroup directory operations cgroup: fix cgroup_taskset walking order cgroup: implement CFTYPE_ONLY_ON_DFL cgroup: make cgrp_dfl_root mountable cgroup: drop const from @buffer of cftype->write_string() cgroup: rename cgroup_dummy_root and related names cgroup: move ->subsys_mask from cgroupfs_root to cgroup cgroup: treat cgroup_dummy_root as an equivalent hierarchy during rebinding cgroup: remove NULL checks from [pr_cont_]cgroup_{name\|path}() cgroup: use cgroup_setup_root() to initialize cgroup_dummy_root cgroup: reorganize cgroup bootstrapping cgroup: relocate setting of CGRP_DEAD cpuset: use rcu_read_lock() to protect task_cs() cgroup_freezer: document freezer_fork() subtleties cgroup: update cgroup_transfer_tasks() to either succeed or fail cgroup: drop task_lock() protection around task->cgroups cgroup: update how a newly forked task gets associated with css_set ...	2014-04-03 13:05:42 -07:00
Dan Carpenter	805eeb8e04	xfs: extra semi-colon breaks a condition There were some extra semi-colons here which mean that we return true unintentionally. Fixes: a49935f200e2 ('xfs: xfs_check_page_type buffer checks need help') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2014-04-04 06:56:30 +11:00
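The effect of a stray semicolon after a condition, as described in the entry above, can be reproduced in a few lines. This is an illustrative sketch only: the function names and the `0x1` flag are hypothetical and unrelated to the actual xfs_check_page_type() code. Compilers may warn about the empty body in the buggy variant, which is intentional here.

```c
#include <stdbool.h>

/* Buggy: the ';' ends the if statement, so "return true" always runs. */
bool flag_set_buggy(int flags)
{
    if (flags & 0x1);
        return true;
    return false;
}

/* Fixed: without the stray ';' the return is guarded by the condition. */
bool flag_set_fixed(int flags)
{
    if (flags & 0x1)
        return true;
    return false;
}
```

For an input with the flag clear, the buggy variant still returns true, which matches the "we return true unintentionally" wording of the changelog.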
Linus Torvalds	159d8133d0	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "Usual rocket science -- mostly documentation and comment updates" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: sparse: fix comment doc: fix double words isdn: capi: fix "CAPI_VERSION" comment doc: DocBook: Fix typos in xml and template file Bluetooth: add module name for btwilink driver core: unexport static function create_syslog_header mmc: core: typo fix in printk specifier ARM: spear: clean up editing mistake net-sysfs: fix comment typo 'CONFIG_SYFS' doc: Insert MODULE_ in module-signing macros Documentation: update URL to hfsplus Technote 1150 gpio: update path to documentation ixgbe: Fix format string in ixgbe_fcoe. Kconfig: Remove useless "default N" lines user_namespace.c: Remove duplicated word in comment CREDITS: fix formatting treewide: Fix typo in Documentation/DocBook mm: Fix warning on make htmldocs caused by slab.c ata: ata-samsung_cf: cleanup in header file idr: remove unused prototype of idr_free()	2014-04-02 16:23:38 -07:00
Linus Torvalds	b9f2b21a32	Devicetree changes for v3.15 Updates to devicetree core code. This branch contains the following notable changes: * Add reserved memory binding * Make struct device_node a kobject and remove legacy /proc/device-tree * ePAPR conformance fixes * Update in-kernel DTC copy to version v1.4.0 * Preparation changes for dynamic device tree overlays * minor bug fixes and documentation changes The most significant change in this branch is the conversion of struct device_node to be a kobject that is exposed via sysfs and removal of the old /proc/device-tree code. This simplifies the device tree handling code and tightens up the lifecycle on device tree nodes. [updated: added fix for dangling select PROC_DEVICETREE] -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQIcBAABAgAGBQJTOyNwAAoJEMWQL496c2LNZY0QAIreUrpo3/hKRau61EDPXkOA UFRyPUHD0k/dNXWWDbTfvKH/nAfzdVwejhePqEWiODiFOFkq7JyQlMKPA+CZuZj0 ygN4215A1yj/hDf6JRD5Zn4WGpawDt9InlbZSps6P5dd8voV5t5dz6uzz+Y7uqaK CAjTDlBSmxEen5vRHiHQgKv74au/+b9yfSURjPQVWg46+wl3WJwjsdzerphm4unW tpEr8zkIsm51mqqAx4penIuiovh7+L2J5v4BFeg8o+kaZEuZpVxLHJPOuBd5hdom zeqEIj3AqHTh5suYIHe4aAbZ2wMP3kYGgkPGwfWLnwLyULxalcCtGZeaCi9nwTFj Fdj+7f17ocrt5mif0f5Deufi1LqJsDjhY6G9p7HuV7Y9hsMILpJIUoGENPji+TWj BA4L45eaPmNYdKJytEtFD7F2WnXeHZ6fDtYho/39DWW+Bt16IFX85T199irhxGG4 byN6LRaahk2UeycSXkQHAlWOQHqzBcJJAkQLN2iahzyYRr9Dy+VI2E9clm53m49O YQYcONdUlMYrtfRwJpbB9XHM0HgZUvg0LT5z/iHQs9uJtoo33Oj+zxFixyZLQ9Dq qyLqQWEpV9gFLAo9tpf56gffkLiJRsHkX4UJ6oTtj4DY1WWU9H81jjCvv/7flzp/ 8ZyyZzANQf1DZ9kqO2v+ =lyA5 -----END PGP SIGNATURE----- Merge tag 'dt-for-linus' of git://git.secretlab.ca/git/linux Pull devicetree changes from Grant Likely: "Updates to devicetree core code. This branch contains the following notable changes: - add reserved memory binding - make struct device_node a kobject and remove legacy /proc/device-tree - ePAPR conformance fixes - update in-kernel DTC copy to version v1.4.0 - preparatory changes for dynamic device tree overlays - minor bug fixes and documentation changes The most significant change in this branch is the conversion of struct device_node to be a kobject that is exposed via sysfs and removal of the old /proc/device-tree code. This simplifies the device tree handling code and tightens up the lifecycle on device tree nodes. [updated: added fix for dangling select PROC_DEVICETREE]" * tag 'dt-for-linus' of git://git.secretlab.ca/git/linux: (29 commits) dt: Remove dangling "select PROC_DEVICETREE" of: Add support for ePAPR "stdout-path" property of: device_node kobject lifecycle fixes of: only scan for reserved mem when fdt present powerpc: add support for reserved memory defined by device tree arm64: add support for reserved memory defined by device tree of: add missing major vendors of: add vendor prefix for SMSC of: remove /proc/device-tree of/selftest: Add self tests for manipulation of properties of: Make device nodes kobjects so they show up in sysfs arm: add support for reserved memory defined by device tree drivers: of: add support for custom reserved memory drivers drivers: of: add initialization code for dynamic reserved memory drivers: of: add initialization code for static reserved memory of: document bindings for reserved-memory nodes Revert "of: fix of_update_property()" kbuild: dtbs_install: new make target ARM: mvebu: Allows to get the SoC ID even without PCI enabled of: Allows to use the PCI translator without the PCI core ...	2014-04-02 14:27:15 -07:00
Linus Torvalds	7125764c5d	Merge branch 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull compat time conversion changes from Peter Anvin: "Despite the branch name this is really neither an x86 nor an x32-specific patchset, although it is the implementation of the discussions that followed the x32 security hole a few months ago. This removes get/put_compat_timespec/val() and replaces them with compat_get/put_timespec/val() which are savvy as to the current status of COMPAT_USE_64BIT_TIME. It removes several unused and/or incorrect/misleading functions (like compat_put_timeval_convert which doesn't in fact do any conversion) and also replaces several open-coded implementations of what is now called compat_convert_timespec() with that function" * 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: compat: Fix sparse address space warnings compat: Get rid of (get\|put)_compat_time(val\|spec)	2014-04-02 12:51:41 -07:00
Rajat Jain	f3846266f5	fuse: fix "uninitialized variable" warning Fix the following warning: In file included from include/linux/fs.h:16:0, from fs/fuse/fuse_i.h:13, from fs/fuse/file.c:9: fs/fuse/file.c: In function 'fuse_file_poll': include/linux/rbtree.h:82:28: warning: 'parent' may be used uninitialized in this function [-Wmaybe-uninitialized] fs/fuse/file.c:2592:27: note: 'parent' was declared here Signed-off-by: Rajat Jain <rajatxjain@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:51 +02:00
Pavel Emelyanov	4d99ff8f12	fuse: Turn writeback cache on Introduce a bit that the kernel and userspace exchange with each other at the init stage, and turn writeback on if userspace wants this and the mount option 'allow_wbcache' is present (controlled by fusermount). Also add each writable file to the per-inode write list and call generic_file_aio_write to make use of the Linux page cache engine. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:50 +02:00
Pavel Emelyanov	ea8cd33390	fuse: Fix O_DIRECT operations vs cached writeback misorder The problem is: 1. write cached data to a file 2. read directly from the same file (via another fd) The 2nd operation may read stale data, i.e. the data that was in the file before the 1st op. Problem is in how fuse manages writeback. When direct op occurs the core kernel code calls filemap_write_and_wait to flush all the cached ops in flight. But fuse acks the writeback right after the ->writepages callback exits w/o waiting for the real write to happen. Thus the subsequent direct op proceeds while the real writeback is still in flight. This is a problem for backends that reorder operations. Fix this by making the fuse direct IO callback explicitly wait on the in-flight writeback to finish. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:50 +02:00
Maxim Patlasov	fe38d7df23	fuse: fuse_flush() should wait on writeback The aim of the .flush fop is to hint to the file system that flushing its state or caches or any other important data to reliable storage would be desirable now. fuse_flush() passes this hint by sending FUSE_FLUSH request to userspace. However, dirty pages and pages under writeback may not yet be visible to userspace unless we ensure it explicitly. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:50 +02:00
Pavel Emelyanov	6b12c1b37e	fuse: Implement write_begin/write_end callbacks The .write_begin and .write_end are required to use generic routines (generic_file_aio_write --> ... --> generic_perform_write) for buffered writes. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:49 +02:00
Maxim Patlasov	482fce55d2	fuse: restructure fuse_readpage() Move the code filling and sending read request to a separate function. Future patches will use it for .write_begin -- partial modification of a page requires reading the page from the storage very similarly to what fuse_readpage does. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:49 +02:00
Pavel Emelyanov	e7cc133c37	fuse: Flush files on wb close Any write request requires a file handle to report to the userspace. Thus when we close a file (and free the fuse_file with this info) we have to flush all the outstanding dirty pages. filemap_write_and_wait() is enough because every page under fuse writeback is accounted in ff->count. This delays actual close until all fuse wb is completed. In case of "write cache" turned off, the flush is ensured by fuse_vma_close(). Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:49 +02:00
Maxim Patlasov	b0aa760652	fuse: Trust kernel i_mtime only Let the kernel maintain i_mtime locally: - clear S_NOCMTIME - implement i_op->update_time() - flush mtime on fsync and last close - update i_mtime explicitly on truncate and fallocate Fuse inode flag FUSE_I_MTIME_DIRTY serves as indication that local i_mtime should be flushed to the server eventually. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:48 +02:00
Pavel Emelyanov	8373200b12	fuse: Trust kernel i_size only Make fuse think that when writeback is on the inode's i_size is always up-to-date and not update it with the value received from the userspace. This is done because the page cache code may update i_size without letting the FS know. This assumption implies fixing the previously introduced short-read helper -- when a short read occurs the 'hole' is filled with zeroes. fuse_file_fallocate() is also fixed because now we should keep i_size up to date, so it must be updated if FUSE_FALLOCATE request succeeded. Signed-off-by: Maxim V. Patlasov <MPatlasov@parallels.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:48 +02:00
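The short-read handling mentioned in the entry above (the "hole" left by a short read is filled with zeroes) can be sketched in userspace as below. `zero_fill_short_read()` is a hypothetical helper written for illustration; the fuse code operates on page cache pages, not on a flat buffer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Zero the part of buf that a short read left unfilled.
 * requested: number of bytes asked for; got: number of bytes returned.
 * Returns the number of bytes that were zeroed.
 */
size_t zero_fill_short_read(unsigned char *buf, size_t requested, size_t got)
{
    if (got >= requested)
        return 0;                        /* full read: nothing to do */
    memset(buf + got, 0, requested - got);
    return requested - got;
}
```

The bytes that were actually read are left untouched; only the tail beyond `got` is cleared, so the caller sees zeroes up to the locally trusted size.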
Pavel Emelyanov	d5cd66c58e	fuse: Connection bit for enabling writeback Off (0) by default. Will be used in the next patches and will be turned on at the very end. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:48 +02:00
Pavel Emelyanov	a92adc824e	fuse: Prepare to handle short reads A helper which gets called when read reports fewer bytes than were requested. See patch "trust kernel i_size only" for details. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:47 +02:00
Pavel Emelyanov	650b22b941	fuse: Linking file to inode helper When writeback is ON every writeable file should be in per-inode write list, not only mmap-ed ones. Thus introduce a helper for this linkage. Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-02 15:38:47 +02:00
Linus Torvalds	7a48837732	Merge branch 'for-3.15/core' of git://git.kernel.dk/linux-block Pull core block layer updates from Jens Axboe: "This is the pull request for the core block IO bits for the 3.15 kernel. It's a smaller round this time, it contains: - Various little blk-mq fixes and additions from Christoph and myself. - Cleanup of the IPI usage from the block layer, and associated helper code. From Frederic Weisbecker and Jan Kara. - Duplicate code cleanup in bio-integrity from Gu Zheng. This will give you a merge conflict, but that should be easy to resolve. - blk-mq notify spinlock fix for RT from Mike Galbraith. - A blktrace partial accounting bug fix from Roman Pen. - Missing REQ_SYNC detection fix for blk-mq from Shaohua Li" * 'for-3.15/core' of git://git.kernel.dk/linux-block: (25 commits) blk-mq: add REQ_SYNC early rt,blk,mq: Make blk_mq_cpu_notify_lock a raw spinlock blk-mq: support partial I/O completions blk-mq: merge blk_mq_insert_request and blk_mq_run_request blk-mq: remove blk_mq_alloc_rq blk-mq: don't dump CPU -> hw queue map on driver load blk-mq: fix wrong usage of hctx->state vs hctx->flags blk-mq: allow blk_mq_init_commands() to return failure block: remove old blk_iopoll_enabled variable blktrace: fix accounting of partially completed requests smp: Rename __smp_call_function_single() to smp_call_function_single_async() smp: Remove wait argument from __smp_call_function_single() watchdog: Simplify a little the IPI call smp: Move __smp_call_function_single() below its safe version smp: Consolidate the various smp_call_function_single() declensions smp: Teach __smp_call_function_single() to check for offline cpus smp: Remove unused list_head from csd smp: Iterate functions through llist_for_each_entry_safe() block: Stop abusing rq->csd.list in blk-softirq block: Remove useless IPI struct initialization ...	2014-04-01 19:19:15 -07:00
Eric Whitney	ad6599ab3a	ext4: fix premature freeing of partial clusters split across leaf blocks Xfstests generic/311 and shared/298 fail when run on a bigalloc file system. Kernel error messages produced during the tests report that blocks to be freed are already on the to-be-freed list. When e2fsck is run at the end of the tests, it typically reports bad i_blocks and bad free blocks counts. The bug that causes these failures is located in ext4_ext_rm_leaf(). Code at the end of the function frees a partial cluster if it's not shared with an extent remaining in the leaf. However, if all the extents in the leaf have been removed, the code dereferences an invalid extent pointer (off the front of the leaf) when the check for sharing is made. This generally has the effect of unconditionally freeing the partial cluster, which leads to the observed failures when the partial cluster is shared with the last extent in the next leaf. Fix this by attempting to free the cluster only if extents remain in the leaf. Any remaining partial cluster will be freed if possible when the next leaf is processed or when leaf removal is complete. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2014-04-01 19:49:30 -04:00
Linus Torvalds	158e0d3621	Driver core / sysfs patches for 3.15-rc1 Here's the big driver core / sysfs update for 3.15-rc1. Lots of kernfs updates to make it useful for other subsystems, and a few other tiny driver core patches. All have been in linux-next for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iEYEABECAAYFAlM7A0wACgkQMUfUDdst+ynJNACfZlY+KNKIhNFt1OOW8rQfSZzy 1PYAnjYuOoly01JlPrpJD5b4TdxaAq71 =GVUg -----END PGP SIGNATURE----- Merge tag 'driver-core-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core and sysfs updates from Greg KH: "Here's the big driver core / sysfs update for 3.15-rc1. Lots of kernfs updates to make it useful for other subsystems, and a few other tiny driver core patches. All have been in linux-next for a while" * tag 'driver-core-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (42 commits) Revert "sysfs, driver-core: remove unused {sysfs\|device}_schedule_callback_owner()" kernfs: cache atomic_write_len in kernfs_open_file numa: fix NULL pointer access and memory leak in unregister_one_node() Revert "driver core: synchronize device shutdown" kernfs: fix off by one error. kernfs: remove duplicate dir.c at the top dir x86: align x86 arch with generic CPU modalias handling cpu: add generic support for CPU feature based module autoloading sysfs: create bin_attributes under the requested group driver core: unexport static function create_syslog_header firmware: use power efficient workqueue for unloading and aborting fw load firmware: give a protection when map page failed firmware: google memconsole driver fixes firmware: fix google/gsmi duplicate efivars_sysfs_init() drivers/base: delete non-required instances of include <linux/init.h> kernfs: fix kernfs_node_from_dentry() ACPI / platform: drop redundant ACPI_HANDLE check kernfs: fix hash calculation in kernfs_rename_ns() kernfs: add CONFIG_KERNFS sysfs, kobject: add sysfs wrapper for kernfs_enable_ns() ...	2014-04-01 16:28:19 -07:00
Linus Torvalds	1ead658124	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer changes from Thomas Gleixner: "This assorted collection provides: - A new timer based timer broadcast feature for systems which do not provide a global accessible timer device. That allows those systems to put CPUs into deep idle states where the per cpu timer device stops. - A few NOHZ_FULL related improvements to the timer wheel - The usual updates to timer devices found in ARM SoCs - Small improvements and updates all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) tick: Remove code duplication in tick_handle_periodic() tick: Fix spelling mistake in tick_handle_periodic() x86: hpet: Use proper destructor for delayed work workqueue: Provide destroy_delayed_work_on_stack() clocksource: CMT, MTU2, TMU and STI should depend on GENERIC_CLOCKEVENTS timer: Remove code redundancy while calling get_nohz_timer_target() hrtimer: Rearrange comments in the order struct members are declared timer: Use variable head instead of &work_list in __run_timers() clocksource: exynos_mct: silence a static checker warning arm: zynq: Add support for cpufreq arm: zynq: Don't use arm_global_timer with cpufreq clocksource/cadence_ttc: Overhaul clocksource frequency adjustment clocksource/cadence_ttc: Call clockevents_update_freq() with IRQs enabled clocksource: Add Kconfig entries for CMT, MTU2, TMU and STI sh: Remove Kconfig entries for TMU, CMT and MTU2 ARM: shmobile: Remove CMT, TMU and STI Kconfig entries clocksource: armada-370-xp: Use atomic access for shared registers clocksource: orion: Use atomic access for shared registers clocksource: timer-keystone: Delete unnecessary variable clocksource: timer-keystone: introduce clocksource driver for Keystone ...	2014-04-01 11:00:07 -07:00
Linus Torvalds	a21e40877a	Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Ingo Molnar: "The main purpose is to fix a full dynticks bug related to virtualization, where steal time accounting appears to be zero in /proc/stat even after a few seconds of competing guests running busy loops in a same host CPU. It's not a regression though as it was there since the beginning. The other commits are preparatory work to fix the bug and various cleanups" * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arch: Remove stub cputime.h headers sched: Remove needless round trip nsecs <-> tick conversion of steal time cputime: Fix jiffies based cputime assumption on steal accounting cputime: Bring cputime -> nsecs conversion cputime: Default implementation of nsecs -> cputime conversion cputime: Fix nsecs_to_cputime() return type cast	2014-04-01 10:16:10 -07:00
Miklos Szeredi	bd42998a6b	ext4: add cross rename support Implement RENAME_EXCHANGE flag in renameat2 syscall. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz>	2014-04-01 17:08:44 +02:00
Miklos Szeredi	bd1af145b9	ext4: rename: split out helper functions Cross rename (exchange source and dest) will need to call some of these helpers for both source and dest, while overwriting rename currently only calls them for one or the other. This also makes the code easier to follow. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz>	2014-04-01 17:08:44 +02:00
Miklos Szeredi	0d7d5d678b	ext4: rename: move EMLINK check up Move checking i_nlink from after ext4_get_first_dir_block() to before. The check doesn't rely on the result of that function and the function only fails on fs corruption, so the order shouldn't matter. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz>	2014-04-01 17:08:44 +02:00
Miklos Szeredi	c0d268c366	ext4: rename: create ext4_renament structure for local vars Need to split up ext4_rename() into helpers but there are too many local variables involved, so create a new structure. This also, apparently, makes the generated code size slightly smaller. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz>	2014-04-01 17:08:43 +02:00
Miklos Szeredi	da1ce0670c	vfs: add cross-rename If flags contain RENAME_EXCHANGE then exchange source and destination files. There's no restriction on the type of the files; e.g. a directory can be exchanged with a symlink. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: J. Bruce Fields <bfields@redhat.com>	2014-04-01 17:08:43 +02:00
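From userspace, the exchange behavior described in the entry above could be invoked roughly as follows. This is a hedged sketch: `rename_exchange()` is a hypothetical wrapper name, the raw `syscall()` form is used on the assumption that the C library may not provide a renameat2() wrapper, and the call only succeeds on kernels and filesystems that implement the flag.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>        /* AT_FDCWD */
#include <sys/syscall.h>  /* SYS_renameat2, where the headers define it */
#include <unistd.h>       /* syscall() */

#ifndef RENAME_EXCHANGE
#define RENAME_EXCHANGE (1 << 1)  /* value from the kernel UAPI headers */
#endif

/*
 * Atomically swap the two paths. Both must exist.
 * Returns 0 on success, -1 with errno set on failure.
 */
int rename_exchange(const char *path_a, const char *path_b)
{
#ifdef SYS_renameat2
    return (int)syscall(SYS_renameat2, AT_FDCWD, path_a,
                        AT_FDCWD, path_b, RENAME_EXCHANGE);
#else
    (void)path_a;
    (void)path_b;
    errno = ENOSYS;       /* headers too old to know the syscall number */
    return -1;
#endif
}
```

A caller should be prepared for errors such as ENOSYS or EINVAL when the running kernel or the filesystem does not support the flag.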
J. Bruce Fields	4fd699ae3f	vfs: lock_two_nondirectories: allow directory args lock_two_nondirectories warned if either of its args was a directory. Instead just ignore the directory args. This is needed for locking in cross rename. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-04-01 17:08:43 +02:00
Miklos Szeredi	0b3974eb04	security: add flags to rename hooks Add flags to security_path_rename() and security_inode_rename() hooks. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: J. Bruce Fields <bfields@redhat.com>	2014-04-01 17:08:43 +02:00
Miklos Szeredi	0a7c3937a1	vfs: add RENAME_NOREPLACE flag If this flag is specified and the target of the rename exists then the rename syscall fails with EEXIST. The VFS does the existence checking, so it is trivial to enable for most local filesystems. This patch only enables it in ext4. For network filesystems the VFS check is not enough as there may be a race between a remote create and the rename, so these filesystems need to handle this flag in their ->rename() implementations to ensure atomicity. Andy writes about why this is useful: "The trivial answer: to eliminate the race condition from 'mv -i'. Another answer: there's a common pattern to atomically create a file with contents: open a temporary file, write to it, optionally fsync it, close it, then link(2) it to the final name, then unlink the temporary file. The reason to use link(2) is because it won't silently clobber the destination. This is annoying: - It requires an extra system call that shouldn't be necessary. - It doesn't work on (IMO sensible) filesystems that don't support hard links (e.g. vfat). - It's not atomic -- there's an intermediate state where both files exist. - It's ugly. The new rename flag will make this totally sensible. To be fair, on new enough kernels, you can also use O_TMPFILE and linkat to achieve the same thing even more cleanly." Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: J. Bruce Fields <bfields@redhat.com>	2014-04-01 17:08:43 +02:00
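The "create a file atomically without clobbering the destination" pattern quoted in the entry above could use the new flag roughly like this. The sketch makes several assumptions: `rename_noreplace()` is a hypothetical wrapper name, the raw `syscall()` form is used in case the C library has no renameat2() wrapper, and filesystem support varies (the changelog says this patch enables the flag only in ext4).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>        /* AT_FDCWD */
#include <sys/syscall.h>  /* SYS_renameat2, where the headers define it */
#include <unistd.h>       /* syscall() */

#ifndef RENAME_NOREPLACE
#define RENAME_NOREPLACE (1 << 0)  /* value from the kernel UAPI headers */
#endif

/*
 * Rename oldpath to newpath, failing with EEXIST instead of replacing
 * an existing newpath. Returns 0 on success, -1 with errno set on failure.
 */
int rename_noreplace(const char *oldpath, const char *newpath)
{
#ifdef SYS_renameat2
    return (int)syscall(SYS_renameat2, AT_FDCWD, oldpath,
                        AT_FDCWD, newpath, RENAME_NOREPLACE);
#else
    (void)oldpath;
    (void)newpath;
    errno = ENOSYS;       /* headers too old to know the syscall number */
    return -1;
#endif
}
```

A typical use would be to write and fsync a temporary file, then call `rename_noreplace(tmp, final)` and treat EEXIST as "someone else created the file first", with a fallback path for ENOSYS or EINVAL.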
Miklos Szeredi	520c8b1650	vfs: add renameat2 syscall Add new renameat2 syscall, which is the same as renameat with an added flags argument. Pass flags to vfs_rename() and to i_op->rename() as well. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: J. Bruce Fields <bfields@redhat.com>	2014-04-01 17:08:42 +02:00
Miklos Szeredi	bc27027a73	vfs: rename: use common code for dir and non-dir There's actually very little difference between vfs_rename_dir() and vfs_rename_other() so move both inline into vfs_rename() which still stays reasonably readable. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: J. Bruce Fields <bfields@redhat.com>	2014-04-01 17:08:42 +02:00
Miklos Szeredi	de22a4c372	vfs: rename: move d_move() up Move the d_move() in vfs_rename_dir() up, similarly to how it's done in vfs_rename_other(). The next patch will consolidate these two functions and this is the only structural difference between them. I'm not sure if doing the d_move() after the dput is even valid. But there may be a logical explanation for that. But moving the d_move() before the dput() (and the mutex_unlock()) should definitely not hurt. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: J. Bruce Fields <bfields@redhat.com>	2014-04-01 17:08:42 +02:00
Miklos Szeredi	44b1d53043	vfs: add d_is_dir() Add d_is_dir(dentry) helper which is analogous to S_ISDIR(). To avoid confusion, rename d_is_directory() to d_can_lookup(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: J. Bruce Fields <bfields@redhat.com>	2014-04-01 17:08:41 +02:00
Lukas Czerner	e5b30416f3	ext4: remove unneeded test of ret variable Currently in ext4_fallocate() and ext4_zero_range() we're testing ret variable along with new_size. However in ext4_fallocate() we just tested ret before and in ext4_zero_range() it will always be zero when we get there so there is no need to test it in both cases. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-04-01 00:59:21 -04:00
Linus Torvalds	9d919e8d5b	Merge branch 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue changes from Tejun Heo: "PREPARE_[DELAYED_]WORK() were used to change the work function of work items without fully reinitializing it; however, this makes workqueue consider the work item as a different one from before and allows the work item to start executing before the previous instance is finished which can lead to extremely subtle issues which are painful to debug. The interface has never been popular. This pull request contains patches to remove existing usages and kill the interface. As one of the changes was routed during the last devel cycle and another depended on a pending change in nvme, for-3.15 contains a couple merge commits. In addition, interfaces which were deprecated quite a while ago - __cancel_delayed_work() and WQ_NON_REENTRANT - are removed too" * 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: remove deprecated WQ_NON_REENTRANT workqueue: Spelling s/instensive/intensive/ workqueue: remove PREPARE_[DELAYED_]WORK() staging/fwserial: don't use PREPARE_WORK afs: don't use PREPARE_WORK nvme: don't use PREPARE_WORK usb: don't use PREPARE_DELAYED_WORK floppy: don't use PREPARE_[DELAYED_]WORK ps3-vuart: don't use PREPARE_WORK wireless/rt2x00: don't use PREPARE_WORK in rt2800usb.c workqueue: Remove deprecated __cancel_delayed_work()	2014-03-31 15:08:51 -07:00
Linus Torvalds	1ce235faa8	Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull ARM64 updates from Catalin Marinas: - KGDB support for arm64 - PCI I/O space extended to 16M (in preparation of PCIe support patches) - Dropping ZONE_DMA32 in favour of ZONE_DMA (we only need one for the time being), together with swiotlb late initialisation to correctly setup the bounce buffer - DMA API cache maintenance support (not all ARMv8 platforms have hardware cache coherency) - Crypto extensions advertising via ELF_HWCAP2 for compat user space - Perf support for dwarf unwinding in compat mode - asm/tlb.h converted to the generic mmu_gather code - asm-generic rwsem implementation - Code clean-up * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (42 commits) arm64: Remove pgprot_dmacoherent() arm64: Support DMA_ATTR_WRITE_COMBINE arm64: Implement custom mmap functions for dma mapping arm64: Fix __range_ok macro arm64: Fix duplicated Kconfig entries arm64: mm: Route pmd thp functions through pte equivalents arm64: rwsem: use asm-generic rwsem implementation asm-generic: rwsem: de-PPCify rwsem.h arm64: enable generic CPU feature modalias matching for this architecture arm64: smp: make local symbol static arm64: debug: make local symbols static ARM64: perf: support dwarf unwinding in compat mode ARM64: perf: add support for frame pointer unwinding in compat mode ARM64: perf: add support for perf registers API arm64: Add boot time configuration of Intermediate Physical Address size arm64: Do not synchronise I and D caches for special ptes arm64: Make DMA coherent and strongly ordered mappings not executable arm64: barriers: add dmb barrier arm64: topology: Implement basic CPU topology support arm64: advertise ARMv8 extensions to 32-bit compat ELF binaries ...	2014-03-31 15:01:45 -07:00
Linus Torvalds	190f918660	Merge branch 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 compat wrapper rework from Heiko Carstens: "S390 compat system call wrapper simplification work. The intention of this work is to get rid of all hand written assembly compat system call wrappers on s390, which perform proper sign or zero extension, or pointer conversion of compat system call parameters. Instead all of this should be done with C code eg by using Al's COMPAT_SYSCALL_DEFINEx() macro. Therefore all common code and s390 specific compat system calls have been converted to the COMPAT_SYSCALL_DEFINEx() macro. In order to generate correct code all compat system calls may only have eg compat_ulong_t parameters, but no unsigned long parameters. Those patches which change parameter types from unsigned long to compat_ulong_t parameters are separate in this series, but shouldn't cause any harm. The only compat system calls which intentionally have 64 bit parameters (preadv64 and pwritev64) in support of the x86/32 ABI haven't been changed, but are now only available if an architecture defines __ARCH_WANT_COMPAT_SYS_PREADV64/PWRITEV64. System calls which do not have a compat variant but still need proper zero extension on s390, like eg "long sys_brk(unsigned long brk)" will get a proper wrapper function with the new s390 specific COMPAT_SYSCALL_WRAPx() macro: COMPAT_SYSCALL_WRAP1(brk, unsigned long, brk); which generates the following code (simplified): asmlinkage long sys_brk(unsigned long brk); asmlinkage long compat_sys_brk(long brk) { return sys_brk((u32)brk); } Given that the C file which contains all the COMPAT_SYSCALL_WRAP lines includes both linux/syscall.h and linux/compat.h, it will generate build errors, if the declaration of sys_brk() doesn't match, or if there exists a non-matching compat_sys_brk() declaration. 
In addition this will intentionally result in a link error if somewhere else a compat_sys_brk() function exists, which probably should have been used instead. Two more BUILD_BUG_ONs make sure the size and type of each compat syscall parameter can be handled correctly with the s390 specific macros. I converted the compat system calls step by step to verify the generated code is correct and matches the previous code. In fact it did not always match, however that was always a bug in the hand written asm code. In result we get less code, less bugs, and much more sanity checking" * 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (44 commits) s390/compat: add copyright statement compat: include linux/unistd.h within linux/compat.h s390/compat: get rid of compat wrapper assembly code s390/compat: build error for large compat syscall args mm/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types kexec/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types net/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types ipc/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types fs/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types ipc/compat: convert to COMPAT_SYSCALL_DEFINE fs/compat: convert to COMPAT_SYSCALL_DEFINE security/compat: convert to COMPAT_SYSCALL_DEFINE mm/compat: convert to COMPAT_SYSCALL_DEFINE net/compat: convert to COMPAT_SYSCALL_DEFINE kernel/compat: convert to COMPAT_SYSCALL_DEFINE fs/compat: optional preadv64/pwrite64 compat system calls ipc/compat_sys_msgrcv: change msgtyp type from long to compat_long_t s390/compat: partial parameter conversion within syscall wrappers s390/compat: automatic zero, sign and pointer conversion of syscalls s390/compat: add sync_file_range and fallocate compat syscalls ...	2014-03-31 14:32:17 -07:00
Linus Torvalds	7cc3afdf43	Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 EFI changes from Ingo Molnar: "The main changes: - Add debug code to the dump EFI pagetable - Borislav Petkov - Make 1:1 runtime mapping robust when booting on machines with lots of memory - Borislav Petkov - Move the EFI facilities bits out of 'x86_efi_facility' and into efi.flags which is the standard architecture independent place to keep EFI state, by Matt Fleming. - Add 'EFI mixed mode' support: this allows 64-bit kernels to be booted from 32-bit firmware. This needs a bootloader that supports the 'EFI handover protocol'. By Matt Fleming" * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) x86, efi: Abstract x86 efi_early calls x86/efi: Restore 'attr' argument to query_variable_info() x86/efi: Rip out phys_efi_get_time() x86/efi: Preserve segment registers in mixed mode x86/boot: Fix non-EFI build x86, tools: Fix up compiler warnings x86/efi: Re-disable interrupts after calling firmware services x86/boot: Don't overwrite cr4 when enabling PAE x86/efi: Wire up CONFIG_EFI_MIXED x86/efi: Add mixed runtime services support x86/efi: Firmware agnostic handover entry points x86/efi: Split the boot stub into 32/64 code paths x86/efi: Add early thunk code to go from 64-bit to 32-bit x86/efi: Build our own EFI services pointer table efi: Add separate 32-bit/64-bit definitions x86/efi: Delete dead code when checking for non-native x86/mm/pageattr: Always dump the right page table in an oops x86, tools: Consolidate #ifdef code x86/boot: Cleanup header.S by removing some #ifdefs efi: Use NULL instead of 0 for pointer ...	2014-03-31 12:26:05 -07:00
Linus Torvalds	b3fd4ea9df	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "Main changes: - Torture-test changes, including refactoring of rcutorture and introduction of a vestigial locktorture. - Real-time latency fixes. - Documentation updates. - Miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits) rcu: Provide grace-period piggybacking API rcu: Ensure kernel/rcu/rcu.h can be sourced/used stand-alone rcu: Fix sparse warning for rcu_expedited from kernel/ksysfs.c notifier: Substitute rcu_access_pointer() for rcu_dereference_raw() Documentation/memory-barriers.txt: Clarify release/acquire ordering rcutorture: Save kvm.sh output to log rcutorture: Add a lock_busted to test the test rcutorture: Place kvm-test-1-run.sh output into res directory rcutorture: Rename TREE_RCU-Kconfig.txt locktorture: Add kvm-recheck.sh plug-in for locktorture rcutorture: Gracefully handle NULL cleanup hooks locktorture: Add vestigial locktorture configuration rcutorture: Introduce "rcu" directory level underneath configs rcutorture: Rename kvm-test-1-rcu.sh rcutorture: Remove RCU dependencies from ver_functions.sh API rcutorture: Create CFcommon file for common Kconfig parameters rcutorture: Create config files for scripted test-the-test testing rcutorture: Add an rcu_busted to test the test locktorture: Add a lock-torture kernel module rcutorture: Abstract kvm-recheck.sh ...	2014-03-31 11:05:24 -07:00
Steven Whitehouse	1b2ad41214	GFS2: Fix address space from page function Now that rgrps use the address space which is part of the super block, we need to update gfs2_mapping2sbd() to take account of that. The only way to do that easily is to use a different set of address_space_operations for rgrps. Reported-by: Abhi Das <adas@redhat.com> Tested-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-03-31 17:48:27 +01:00
Abhi Das	059788039f	GFS2: Fix uninitialized VFS inode in gfs2_create_inode When gfs2_create_inode() fails due to quota violation, the VFS inode is not completely uninitialized. This can cause a list corruption error. This patch correctly uninitializes the VFS inode when a quota violation occurs in the gfs2_create_inode codepath. Resolves: rhbz#1059808 Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-03-31 16:41:39 +01:00
Jeff Layton	29723adee1	locks: make locks_mandatory_area check for file-private locks Allow locks_mandatory_area() to handle file-private locks correctly. If there is a file-private lock set on an open file and we're doing I/O via the same, then that should not cause anything to block. Handle this by first doing a non-blocking FL_ACCESS check for a file-private lock, and then fall back to checking for a classic POSIX lock (and possibly blocking). Note that this approach is subject to the same races that have always plagued mandatory locking on Linux. Reported-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:43 -04:00
Jeff Layton	d7a06983a0	locks: fix locks_mandatory_locked to respect file-private locks As Trond pointed out, you can currently deadlock yourself by setting a file-private lock on a file that requires mandatory locking and then trying to do I/O on it. Avoid this problem by plumbing some knowledge of file-private locks into the mandatory locking code. In order to do this, we must pass down information about the struct file that's being used to locks_verify_locked. Reported-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: J. Bruce Fields <bfields@redhat.com>	2014-03-31 08:24:43 -04:00
Jeff Layton	90478939dc	locks: require that flock->l_pid be set to 0 for file-private locks Neil Brown suggested potentially overloading the l_pid value as a "lock context" field for file-private locks. While I don't think we will probably want to do that here, it's probably a good idea to ensure that in the future we could extend this API without breaking existing callers. Typically the l_pid value is ignored for incoming struct flock arguments, serving mainly as a place to return the pid of the owner if there is a conflicting lock. For file-private locks, require that it currently be set to 0 and return EINVAL if it isn't. If we eventually want to make a non-zero l_pid mean something, then this will help ensure that we don't break legacy programs that are using file-private locks. Cc: Neil Brown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:43 -04:00
Jeff Layton	5d50ffd7c3	locks: add new fcntl cmd values for handling file private locks Due to some unfortunate history, POSIX locks have very strange and unhelpful semantics. The thing that usually catches people by surprise is that they are dropped whenever the process closes any file descriptor associated with the inode. This is extremely problematic for people developing file servers that need to implement byte-range locks. Developers often need a "lock management" facility to ensure that file descriptors are not closed until all of the locks associated with the inode are finished. Additionally, "classic" POSIX locks are owned by the process. Locks taken between threads within the same process won't conflict with one another, which renders them useless for synchronization between threads. This patchset adds a new type of lock that attempts to address these issues. These locks conflict with classic POSIX read/write locks, but have semantics that are more like BSD locks with respect to inheritance and behavior on close. This is implemented primarily by changing how fl_owner field is set for these locks. Instead of having them owned by the files_struct of the process, they are instead owned by the filp on which they were acquired. Thus, they are inherited across fork() and are only released when the last reference to a filp is put. These new semantics prevent them from being merged with classic POSIX locks, even if they are acquired by the same process. These locks will also conflict with classic POSIX locks even if they are acquired by the same process or on the same file descriptor. The new locks are managed using a new set of cmd values to the fcntl() syscall. The initial implementation of this converts these values to "classic" cmd values at a fairly high level, and the details are not exposed to the underlying filesystem. We may eventually want to push this handling out to the lower filesystem code but for now I don't see any need for it. 
Also, note that with this implementation the new cmd values are only available via fcntl64() on 32-bit arches. There's little need to add support for legacy apps on a new interface like this. Signed-off-by: Jeff Layton <jlayton@redhat.com>	2014-03-31 08:24:43 -04:00

35592 Commits