linux

iv/linux

Author	SHA1	Message	Date
Eric W. Biederman	0d0826019e	mnt: Prevent pivot_root from creating a loop in the mount tree Andy Lutomirski recently demonstrated that when chroot is used to set the root path below the path for the new ``root'' passed to pivot_root the pivot_root system call succeeds and leaks mounts. In examining the code I see that starting with a new root that is below the current root in the mount tree will result in a loop in the mount tree after the mounts are detached and then reattached to one another. Resulting in all kinds of ugliness including a leak of that mounts involved in the leak of the mount loop. Prevent this problem by ensuring that the new mount is reachable from the current root of the mount tree. [Added stable cc. Fixes CVE-2014-7970. --Andy] Cc: stable@vger.kernel.org Reported-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/87bnpmihks.fsf@x220.int.ebiederm.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>	2014-10-14 14:27:19 -07:00
Neale Ferguson	c07127b48c	dlm: fix missing endian conversion of rcom_status flags The flags are already converted to le when being sent, but are not being converted back to cpu when received. Signed-off-by: Neale Ferguson <neale@sinenomine.net> Signed-off-by: David Teigland <teigland@redhat.com>	2014-10-14 15:11:48 -05:00
Yan, Zheng	0bc62284ee	ceph: fix divide-by-zero in __validate_layout() The 'stripe_unit' field is 64 bits, casting it to 32 bits can result zero. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-10-14 12:57:05 -07:00
Fabian Frederick	ab6c2c3ebe	ceph: fix bool assignments Fix some coccinelle warnings: fs/ceph/caps.c:2400:6-10: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2401:6-15: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2402:6-17: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2403:6-22: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2404:6-22: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2405:6-19: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2440:4-20: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2469:3-16: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2490:2-18: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2519:3-7: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2549:3-12: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2575:2-6: WARNING: Assignment of bool to 0/1 fs/ceph/caps.c:2589:3-7: WARNING: Assignment of bool to 0/1 Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Ilya Dryomov <idryomov@redhat.com>	2014-10-14 12:57:04 -07:00
John Spray	14ed97033d	ceph: additional debugfs output MDS session state and client global ID is useful instrumentation when testing. Signed-off-by: John Spray <john.spray@redhat.com>	2014-10-14 12:57:01 -07:00
John Spray	a687ecaf50	ceph: export ceph_session_state_name function ...so that it can be used from the ceph debugfs code when dumping session info. Signed-off-by: John Spray <john.spray@redhat.com>	2014-10-14 12:56:50 -07:00
Yan, Zheng	b1ee94aa59	ceph: include the initial ACL in create/mkdir/mknod MDS requests Current code set new file/directory's initial ACL in a non-atomic manner. Client first sends request to MDS to create new file/directory, then set the initial ACL after the new file/directory is successfully created. The fix is include the initial ACL in create/mkdir/mknod MDS requests. So MDS can handle creating file/directory and setting the initial ACL in one request. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 12:56:49 -07:00
Yan, Zheng	25e6bae356	ceph: use pagelist to present MDS request data Current code uses page array to present MDS request data. Pages in the array are allocated/freed by caller of ceph_mdsc_do_request(). If request is interrupted, the pages can be freed while they are still being used by the request message. The fix is use pagelist to present MDS request data. Pagelist is reference counted. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 12:56:49 -07:00
Yan, Zheng	e4339d28f6	libceph: reference counting pagelist this allow pagelist to present data that may be sent multiple times. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 12:56:48 -07:00
Yan, Zheng	0abb43dcac	ceph: fix llistxattr on symlink only regular file and directory have vxattrs. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-10-14 12:56:48 -07:00
John Spray	dbd0c8bf79	ceph: send client metadata to MDS Implement version 2 of CEPH_MSG_CLIENT_SESSION syntax, which includes additional client metadata to allow the MDS to report on clients by user-sensible names like hostname. Signed-off-by: John Spray <john.spray@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>	2014-10-14 12:56:47 -07:00
Chao Yu	a4483e8a42	ceph: remove redundant code for max file size verification Both ceph_update_writeable_page and ceph_setattr will verify file size with max size ceph supported. There are two caller for ceph_update_writeable_page, ceph_write_begin and ceph_page_mkwrite. For ceph_write_begin, we have already verified the size in generic_write_checks of ceph_write_iter; for ceph_page_mkwrite, we have no chance to change file size when mmap. Likewise we have already verified the size in inode_change_ok when we call ceph_setattr. So let's remove the redundant code for max file size verification. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>	2014-10-14 21:03:40 +04:00
Yan, Zheng	3b70b388e3	ceph: remove redundant io_iter_advance() ceph_sync_read and generic_file_read_iter() have already advanced the IO iterator. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-10-14 21:03:39 +04:00
Yan, Zheng	6cd3bcad0d	ceph: move ceph_find_inode() outside the s_mutex ceph_find_inode() may wait on freeing inode, using it inside the s_mutex may cause deadlock. (the freeing inode is waiting for OSD read reply, but dispatch thread is blocked by the s_mutex) Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 21:03:39 +04:00
Yan, Zheng	508b32d866	ceph: request xattrs if xattr_version is zero Following sequence of events can happen. - Client releases an inode, queues cap release message. - A 'lookup' reply brings the same inode back, but the reply doesn't contain xattrs because MDS didn't receive the cap release message and thought client already has up-to-data xattrs. The fix is force sending a getattr request to MDS if xattrs_version is 0. The getattr mask is set to CEPH_STAT_CAP_XATTR, so MDS knows client does not have xattr. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-10-14 21:03:38 +04:00
Yan, Zheng	03974e8177	ceph: make sure request isn't in any waiting list when kicking request. we may corrupt waiting list if a request in the waiting list is kicked. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 21:03:24 +04:00
Yan, Zheng	656e438294	ceph: protect kick_requests() with mdsc->mutex Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 21:03:24 +04:00
Yan, Zheng	5d23371fdb	ceph: trim unused inodes before reconnecting to recovering MDS So the recovering MDS does not need to fetch these ununsed inodes during cache rejoin. This may reduce MDS recovery time. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-10-14 21:03:22 +04:00
Martin K. Petersen	e19a8a0ad2	block: Remove REQ_KERNEL REQ_KERNEL is no longer used. Remove it and drop the redundant uio argument to nfs_file_direct_{read,write}. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-10-14 09:00:44 -06:00
Vinícius Tinti	0458a953d8	btrfs: LLVMLinux: Remove VLAIS Replaced the use of a Variable Length Array In Struct (VLAIS) with a C99 compliant equivalent. This patch instead allocates the appropriate amount of memory using a char array using the SHASH_DESC_ON_STACK macro. The new code can be compiled with both gcc and clang. Signed-off-by: Vinícius Tinti <viniciustinti@gmail.com> Reviewed-by: Jan-Simon Möller <dl9pf@gmx.de> Reviewed-by: Mark Charlebois <charlebm@gmail.com> Signed-off-by: Behan Webster <behanw@converseincode.com> Acked-by: Chris Mason <clm@fb.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net>	2014-10-14 10:51:22 +02:00
Linus Torvalds	1b5a5f59e3	FS-Cache fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIVAwUAVDwD3ROxKuMESys7AQLayg//Tmdi4eLzcky/HcOfAoVIY3B5Wvs1MBbN 3HhaYWKDeJvWxFmRDfQK0c1dyjBA2Xe7bPhdwQ8S9epAWAoW6D4g3Mg2+YReGLCK U/CcrMHN77RSydTG0Mj/Z99IynSdf9rwdNrCEy8NiNkGe8Z/JCFPpZurRCc4PL44 4miTUq3ESMTGkUsa9BH+T0ngEka2ZdwnmzlYkdzeqmjmlbFx8RxcEewBeAoAlU73 eihKKyX+1uWX/2DmJol5NtZx+BbNkFsO+pX+s+70TsbjiyILCAmgh5meTpkGsDrW iJGcgxwhcmyq1aTPcHRmXeNsVenbqRefGUtz7B5Q0x1Uk+ofRYfVVdiyTS2juGbC DFGyNBUcFqsmbSMxM+yZGSzgR9KbzoZHDR/ppbJfMqIoe+oGju/NE+AZ6Q3f2/Es AIGc8imc96QU08OnrZtreZxfgFMcFxBoGHvAM9AUr1ue80SWhVRZjwYx/JcIP7Cm TKyilgb5hfxJ7zon+JuHSqttpeG3zOTjjhcKDmJlybYkKlTeRXm6ZcKVrro5d2+z GLnH32HQRJvXBZslymqb7OgkxIW4ySO3PcAWTosUv9+zG0BPR1mB0NVQrSLEPk4L JHA+Mjp8O37pN3kRantVNHk73t0z4qkbi8Ixft0yAus9qNNFMeKh+7NbBRjxUZAU ARcAbvVMyT0= =RtLr -----END PGP SIGNATURE----- Merge tag 'fscache-fixes-20141013' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull fs-cache fixes from David Howells: "Two fixes for bugs in CacheFiles and a cleanup in FS-Cache" * tag 'fscache-fixes-20141013' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: fs/fscache/object-list.c: use __seq_open_private() CacheFiles: Fix incorrect test for in-memory object collision CacheFiles: Handle object being killed before being set up	2014-10-14 08:40:15 +02:00
Linus Torvalds	b11445f830	* Fix for a theoretical race condition which could lead to a situation when UBIFS is unable to mount a file-system (Hujianyang) * Few fixes for the ubiblock sybsystem, error path fixes * The ubiblock subsystem has had the volume size change handling improved * Few fixes and nicifications in the fastmap subsystem -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJUO9l0AAoJECmIfjd9wqK01TwP/jAcA7GEnxUpQ8UFBZhJEIN0 0Ad4oDrGShpuEYgYyRFjCstuXErJBhMwImrevJhRmwxaY2fzGqBeDO9YKKGkKDfa qjGsQrUaCJgV6qC2iT056ZmI7V/XyZfnZQ4Z8nQbzafoJ3MPbB6ExqBy8CZi8q/6 A516cen/cnZfHOQ1aqN6gyw2l976IzdJx8v0WOeYaXcvfDMrmfY8mkfh7EahOIVm Kz9BVlVRxxfKPCqMpm+xV8KAOsMueOnKy+6zL7rFh+AvLQBACq44BV1HkZtg2avX NBAo1RTPumeCht2t4nLJfgc+BJZ7cNpNFAijsWVJxp6umUqlsnbqckAx69O+JE9/ VZjM1KN1suI0bm01bj6xysGvg+JNTMiZ+HEqiseICSWtDbnCT4qDL3MPFgmD9OYh 9ar92Ku2HeY3DakKNd89gqw0ey28cv4i957KleneYzewcfFQ5pC/dp4thcDWa5fH AHoblC4ShmcURDPYsIKRZsiTUf/uf3iLFIWAGJBDnSRg4dzzjoJkenz4W5ecWFDj JokceklSf0zm8qAAdIUXw5Sihza1cnSBAIYBxVR808U+bwkCTOFF5xcTQy6wKf3y NBb+ygh/ugps8B2evJEmp6ByLWQZr8j1q7IokZtglKWN2qOTfzyMxzlWl9vOQJYq EQytnka5OEEXamr7g1iB =2XCN -----END PGP SIGNATURE----- Merge tag 'upstream-3.18-rc1-v2' of git://git.infradead.org/linux-ubifs Pull UBI/UBIFS fixes from Artem Bityutskiy: - fix for a theoretical race condition which could lead to a situation when UBIFS is unable to mount a file-system (Hujianyang) - a few fixes for the ubiblock sybsystem, error path fixes - the ubiblock subsystem has had the volume size change handling improved - a few fixes and nicifications in the fastmap subsystem * tag 'upstream-3.18-rc1-v2' of git://git.infradead.org/linux-ubifs: UBI: Fastmap: Calc fastmap size correctly UBIFS: Fix trivial typo in power_cut_emulated() UBI: Fix trivial typo in __schedule_ubi_work UBI: wl: Rename cancel flag to shutdown UBI: ubi_eba_read_leb: Remove in vain variable assignment UBIFS: Align the dump messages of SB_NODE UBI: Fix livelock in produce_free_peb() UBI: return on error in rename_volumes() UBI: Improve comment on work_sem UBIFS: Remove bogus assert UBI: Dispatch update notification if the volume is updated UBI: block: Add support for the UBI_VOLUME_UPDATED notification UBI: block: Fix block device size setting UBI: block: fix dereference on uninitialized dev UBI: add missing kmem_cache_free() in process_pool_aeb error path UBIFS: fix free log space calculation UBIFS: fix a race condition	2014-10-14 08:38:54 +02:00
Darrick J. Wong	813d32f913	ext4: check s_chksum_driver when looking for bg csum presence Convert the ext4_has_group_desc_csum predicate to look for a checksum driver instead of the metadata_csum flag and change the bg checksum calculation function to look for GDT_CSUM before taking the crc16 path. Without this patch, if we mount with ^uninit_bg,^metadata_csum and later metadata_csum gets turned on by accident, the block group checksum functions will incorrectly assume that checksumming is enabled (metadata_csum) but that crc16 should be used (!s_chksum_driver). This is totally wrong, so fix the predicate and the checksum formula selection. (Granted, if the metadata_csum feature bit gets enabled on a live FS then something underhanded is going on, but we could at least avoid writing garbage into the on-disk fields.) Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org> Cc: stable@vger.kernel.org	2014-10-14 02:35:49 -04:00
Linus Torvalds	0ef3a56b1c	Merge branch 'CVE-2014-7975' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux Pull do_umount fix from Andy Lutomirski: "This fix really ought to be safe. Inside a mountns owned by a non-root user namespace, the namespace root almost always has MNT_LOCKED set (if it doesn't, then there's a bug, because rootfs could be exposed). In that case, calling umount on "/" will return -EINVAL with or without this patch. Outside a userns, this patch will have no effect. may_mount, required by umount, already checks ns_capable(current->nsproxy->mnt_ns->user_ns, CAP_SYS_ADMIN) so an additional capable(CAP_SYS_ADMIN) check will have no effect. That leaves anything that calls umount on "/" in a non-root userns while chrooted. This is the case that is currently broken (it remounts ro, which shouldn't be allowed) and that my patch changes to -EPERM. If anything relies on that, I'd be surprised" * 'CVE-2014-7975' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux: fs: Add a missing permission check to do_umount	2014-10-14 08:35:01 +02:00
Peter Feiner	64e455079e	mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared For VMAs that don't want write notifications, PTEs created for read faults have their write bit set. If the read fault happens after VM_SOFTDIRTY is cleared, then the PTE's softdirty bit will remain clear after subsequent writes. Here's a simple code snippet to demonstrate the bug: char* m = mmap(NULL, getpagesize(), PROT_READ \| PROT_WRITE, MAP_ANONYMOUS \| MAP_SHARED, -1, 0); system("echo 4 > /proc/$PPID/clear_refs"); /* clear VM_SOFTDIRTY / assert(m == '\0'); /* new PTE allows write access / assert(!soft_dirty(x)); m = 'x'; /* should dirty the page / assert(soft_dirty(x)); / fails */ With this patch, write notifications are enabled when VM_SOFTDIRTY is cleared. Furthermore, to avoid unnecessary faults, write notifications are disabled when VM_SOFTDIRTY is set. As a side effect of enabling and disabling write notifications with care, this patch fixes a bug in mprotect where vm_page_prot bits set by drivers were zapped on mprotect. An analogous bug was fixed in mmap by commit c9d0bf241451 ("mm: uncached vma support with writenotify"). Signed-off-by: Peter Feiner <pfeiner@google.com> Reported-by: Peter Feiner <pfeiner@google.com> Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Jamie Liu <jamieliu@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:28 +02:00
Zach Brown	9470dd5d35	fs: check bh blocknr earlier when searching lru It's very common for the buffer heads in the lru to have different block numbers. By comparing the blocknr before the bdev and size we can reduce the cost of searching in the very common case where all the entries have the same bdev and size. In quick hot cache cycle counting tests on a single fs workstation this cut the cost of a miss by about 20%. A diff of the disassembly shows the reordering of the bdev and blocknr comparisons. This is in such a tiny loop that skipping one comparison is a meaningful portion of the total work being done: 1628: 83 c1 01 add $0x1,%ecx 162b: 83 f9 08 cmp $0x8,%ecx 162e: 74 60 je 1690 <__find_get_block+0xa0> 1630: 89 c8 mov %ecx,%eax 1632: 65 4c 8b 04 c5 00 00 mov %gs:0x0(,%rax,8),%r8 1639: 00 00 163b: 4d 85 c0 test %r8,%r8 163e: 4c 89 c3 mov %r8,%rbx 1641: 74 e5 je 1628 <__find_get_block+0x38> - 1643: 4d 3b 68 30 cmp 0x30(%r8),%r13 + 1643: 4d 3b 68 18 cmp 0x18(%r8),%r13 1647: 75 df jne 1628 <__find_get_block+0x38> - 1649: 4d 3b 60 18 cmp 0x18(%r8),%r12 + 1649: 4d 3b 60 30 cmp 0x30(%r8),%r12 164d: 75 d9 jne 1628 <__find_get_block+0x38> 164f: 49 39 50 20 cmp %rdx,0x20(%r8) 1653: 75 d3 jne 1628 <__find_get_block+0x38> Signed-off-by: Zach Brown <zab@zabbo.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:26 +02:00
Rasmus Villemoes	a97df4277d	isofs: replace strnicmp with strncasecmp The kernel used to contain two functions for length-delimited, case-insensitive string comparison, strnicmp with correct semantics and a slightly buggy strncasecmp. The latter is the POSIX name, so strnicmp was renamed to strncasecmp, and strnicmp made into a wrapper for the new strncasecmp to avoid breaking existing users. To allow the compat wrapper strnicmp to be removed at some point in the future, and to avoid the extra indirection cost, do s/strnicmp/strncasecmp/g. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:24 +02:00
Rasmus Villemoes	2bd63329cb	ocfs2: replace strnicmp with strncasecmp The kernel used to contain two functions for length-delimited, case-insensitive string comparison, strnicmp with correct semantics and a slightly buggy strncasecmp. The latter is the POSIX name, so strnicmp was renamed to strncasecmp, and strnicmp made into a wrapper for the new strncasecmp to avoid breaking existing users. To allow the compat wrapper strnicmp to be removed at some point in the future, and to avoid the extra indirection cost, do s/strnicmp/strncasecmp/g. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:24 +02:00
Rasmus Villemoes	87e747cdb9	cifs: replace strnicmp with strncasecmp The kernel used to contain two functions for length-delimited, case-insensitive string comparison, strnicmp with correct semantics and a slightly buggy strncasecmp. The latter is the POSIX name, so strnicmp was renamed to strncasecmp, and strnicmp made into a wrapper for the new strncasecmp to avoid breaking existing users. To allow the compat wrapper strnicmp to be removed at some point in the future, and to avoid the extra indirection cost, do s/strnicmp/strncasecmp/g. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Steve French <sfrench@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:24 +02:00
Fabian Frederick	76e5121089	FS/OMFS: block number sanity check during fill_super operation This patch defines maximum block number to 2^31. It also converts bitmap_size and array_size to unsigned int in omfs_get_imap Signed-off-by: Fabian Frederick <fabf@skynet.be> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Bob Copeland <me@bobcopeland.com> Acked-by: Bob Copeland <me@bobcopeland.com> Tested-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:22 +02:00
Fabian Frederick	c70b17b653	fs/affs: remove redundant sys_tz declarations sys_tz is already declared in include/linux/time.h Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:22 +02:00
Fabian Frederick	73516ace94	fs/affs/file.c: fix shadow warnings Four functions declared variables twice resulting in shadow warnings. This patch renames internal variables and adds blank line after declarations. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:22 +02:00
Fabian Frederick	3bc759931d	fs/affs/inode.c: remove unused variable head is set to AFFS_HEAD(bh) but never used. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:22 +02:00
Fabian Frederick	1e907f4f11	fs/affs/super.c: remove unused variable key is set in affs_fill_super but never used. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:21 +02:00
Oleg Nesterov	b03023ecbd	coredump: add %i/%I in core_pattern to report the tid of the crashed thread format_corename() can only pass the leader's pid to the core handler, but there is no simple way to figure out which thread originated the coredump. As Jan explains, this also means that there is no simple way to create the backtrace of the crashed process: As programs are mostly compiled with implicit gcc -fomit-frame-pointer one needs program's .eh_frame section (equivalently PT_GNU_EH_FRAME segment) or .debug_frame section. .debug_frame usually is present only in separate debug info files usually not even installed on the system. While .eh_frame is a part of the executable/library (and it is even always mapped for C++ exceptions unwinding) it no longer has to be present anywhere on the disk as the program could be upgraded in the meantime and the running instance has its executable file already unlinked from disk. One possibility is to echo 0x3f >/proc/*/coredump_filter and dump all the file-backed memory including the executable's .eh_frame section. But that can create huge core files, for example even due to mmapped data files. Other possibility would be to read .eh_frame from /proc/PID/mem at the core_pattern handler time of the core dump. For the backtrace one needs to read the register state first which can be done from core_pattern handler: ptrace(PTRACE_SEIZE, tid, 0, PTRACE_O_TRACEEXIT) close(0); // close pipe fd to resume the sleeping dumper waitpid(); // should report EXIT PTRACE_GETREGS or other requests The remaining problem is how to get the 'tid' value of the crashed thread. It could be read from the first NT_PRSTATUS note of the core file but that makes the core_pattern handler complicated. Unfortunately %t is already used so this patch uses %i/%I. Automatic Bug Reporting Tool (https://github.com/abrt/abrt/wiki/overview) is experimenting with this. It is using the elfutils (https://fedorahosted.org/elfutils/) unwinder for generating the backtraces. Apart from not needing matching executables as mentioned above, another advantage is that we can get the backtrace without saving the core (which might be quite large) to disk. [mmilata@redhat.com: final paragraph of changelog] Signed-off-by: Jan Kratochvil <jan.kratochvil@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Cc: Mark Wielaard <mjw@redhat.com> Cc: Martin Milata <mmilata@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:21 +02:00
Fabian Frederick	877aabd6ce	fat: remove redundant sys_tz declaration sys_tz is already declared extern struct in include/linux/time.h Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:20 +02:00
Fabian Frederick	54cc6cea73	fs/reiserfs/journal.c: fix sparse context imbalance warning Merge conditional unlock/lock in the same condition to avoid sparse warning: fs/reiserfs/journal.c:703:36: warning: context imbalance in 'add_to_chunk' - unexpected unlock Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:20 +02:00
Fabian Frederick	35c0b380d8	fs/ufs/balloc.c: remove unused variable ucg is defined and set in ufs_bitmap_search but never used. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:20 +02:00
Fabian Frederick	a792d90829	fs/hfs/hfs_fs.h: remove redundant sys_tz declaration sys_tz is already declared in include/linux/time.h Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:20 +02:00
Andreas Rohner	b9f6614072	nilfs2: improve the performance of fdatasync() Support for fdatasync() has been implemented in NILFS2 for a long time, but whenever the corresponding inode is dirty the implementation falls back to a full-flegded sync(). Since every write operation has to update the modification time of the file, the inode will almost always be dirty and fdatasync() will fall back to sync() most of the time. But this fallback is only necessary for a change of the file size and not for a change of the various timestamps. This patch adds a new flag NILFS_I_INODE_SYNC to differentiate between those two situations. * If it is set the file size was changed and a full sync is necessary. * If it is not set then only the timestamps were updated and fdatasync() can go ahead. There is already a similar flag I_DIRTY_DATASYNC on the VFS layer with the exact same semantics. Unfortunately it cannot be used directly, because NILFS2 doesn't implement write_inode() and doesn't clear the VFS flags when inodes are written out. So the VFS writeback thread can clear I_DIRTY_DATASYNC at any time without notifying NILFS2. So I_DIRTY_DATASYNC has to be mapped onto NILFS_I_INODE_SYNC in nilfs_update_inode(). Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:20 +02:00
Andreas Rohner	e2c7617ae3	nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs() Under normal circumstances nilfs_sync_fs() writes out the super block, which causes a flush of the underlying block device. But this depends on the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the last segment crosses a segment boundary. So if only a small amount of data is written before the call to nilfs_sync_fs(), no flush of the block device occurs. In the above case an additional call to blkdev_issue_flush() is needed. To prevent unnecessary overhead, the new flag nilfs->ns_flushed_device is introduced, which is cleared whenever new logs are written and set whenever the block device is flushed. For convenience the function nilfs_flush_device() is added, which contains the above logic. Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:20 +02:00
Himangi Saraogi	0f2a84f41a	fs/befs/btree.c: remove typedef befs_btree_node The Linux kernel coding style guidelines suggest not using typedefs for structure types. This patch gets rid of the typedef for befs_btree_node. The following Coccinelle semantic patch detects the case. @tn1@ type td; @@ typedef struct { ... } td; @script:python tf@ td << tn1.td; tdres; @@ coccinelle.tdres = td; @@ type tn1.td; identifier tf.tdres; @@ -typedef struct + tdres { ... } -td ; @@ type tn1.td; identifier tf.tdres; @@ -td + struct tdres Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:20 +02:00
NeilBrown	ef16cc5909	autofs4: d_manage() should return -EISDIR when appropriate in rcu-walk mode. If rcu-walk mode we don't have to return -EISDIR for non-mount-traps as we will simply drop into REF-walk and handling DCACHE_NEED_AUTOMOUNT dentrys the slow way. But it is better if we do when possible. In 'oz_mode', use the same condition as ref-walk: if not a mountpoint, then it must be -EISDIR. In regular mode there are most tests needed. Most of them can be performed without taking any spinlocks. If we find a directory that isn't obviously empty, and isn't mounted on, we need to call 'simple_empty()' which does take a spinlock. If this turned out to hurt performance, some other approach could be found to signal when a directory is known to be empty. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Ian Kent <raven@themaw.net> Tested-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:16 +02:00
NeilBrown	4d885f90e3	autofs4: avoid taking fs_lock during rcu-walk ->fs_lock protects AUTOFS_INF_EXPIRING. We need to be sure that once the flag is set, no new references beneath the dentry are taken. So rcu-walk currently needs to take fs_lock before checking the flag. This hurts performance. Change the expiry to a two-stage process. First set AUTOFS_INF_NO_RCU which forces any path walk into ref-walk mode, then drop the lock and call synchronize_rcu(). Once that returns we can be sure no rcu-walk is active beneath the dentry and we can check reference counts again. Now during an RCU-walk we can test AUTOFS_INF_EXPIRING without taking the lock as along as we test AUTOFS_INF_NO_RCU too. If either are set, we must abort the RCU-walk If neither are set, we know that refcounts will be tested again after we finish the RCU-walk so we are safe to continue. ->fs_lock is still taken in d_manage() to check for a non-trap directory. That will be resolved in the next patch. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Ian Kent <raven@themaw.net> Tested-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:16 +02:00
NeilBrown	6ece08e618	autofs4: make "autofs4_can_expire" idempotent. Have a "test" function change the value it is testing can be confusing, particularly as a future patch will be calling this function twice. So move the update for 'last_used' to avoid repeat expiry to the place where the final determination on what to expire is known. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Ian Kent <raven@themaw.net> Tested-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:16 +02:00
NeilBrown	a5d1dba143	autofs4: factor should_expire() out of autofs4_expire_indirect. Future patch will potentially call this twice, so make it separate. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Ian Kent <raven@themaw.net> Tested-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:16 +02:00
NeilBrown	23bfc2a24e	autofs4: allow RCU-walk to walk through autofs4 This series teaches autofs about RCU-walk so that we don't drop straight into REF-walk when we hit an autofs directory, and so that we avoid spinlocks as much as possible when performing an RCU-walk. This is needed so that the benefits of the recent NFS support for RCU-walk are fully available when NFS filesystems are automounted. Patches have been carefully reviewed and tested both with test suites and in production - thanks a lot to Ian Kent for his support there. This patch (of 6): Any attempt to look up a pathname that passes though an autofs4 mount is currently forced out of RCU-walk into REF-walk. This can significantly hurt performance of many-thread work loads on many-core systems, especially if the automounted filesystem supports RCU-walk but doesn't get to benefit from it. So if autofs4_d_manage is called with rcu_walk set, only fail with -ECHILD if it is necessary to wait longer than a spinlock. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Ian Kent <raven@themaw.net> Tested-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:16 +02:00
Fabian Frederick	8a273345dc	fs/ncpfs/dir.c: remove redundant sys_tz declaration sys_tz is already declared in include/linux/time.h Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Petr Vandrovec <petr@vandrovec.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:16 +02:00
Arnd Bergmann	de8288b1f8	binfmt_misc: work around gcc-4.9 warning gcc-4.9 on ARM gives us a mysterious warning about the binfmt_misc parse_command function: fs/binfmt_misc.c: In function 'parse_command.part.3': fs/binfmt_misc.c:405:7: warning: array subscript is above array bounds [-Warray-bounds] I've managed to trace this back to the ARM implementation of memset, which is called from copy_from_user in case of a fault and which does #define memset(p,v,n) \ ({ \ void *__p = (p); size_t __n = n; \ if ((__n) != 0) { \ if (__builtin_constant_p((v)) && (v) == 0) \ __memzero((__p),(__n)); \ else \ memset((__p),(v),(__n)); \ } \ (__p); \ }) Apparently gcc gets confused by the check for "size != 0" and believes that the size might be zero when it gets to the line that does "if (s[count-1] == '\n')", so it would access data outside of the array. gcc is clearly wrong here, since this condition was already checked earlier in the function and the 'size' value can not change in the meantime. Fortunately, we can work around it and get rid of the warning by rearranging the function to check for zero size after doing the copy_from_user. It is still safe to pass a zero size into copy_from_user, so it does not cause any side effects. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:16 +02:00
Mike Frysinger	bbaecc0882	binfmt_misc: expand the register format limit to 1920 bytes The current code places a 256 byte limit on the registration format. This ends up being fairly limited when you try to do matching against a binary format like ELF: - the magic & mask formats cannot have any embedded NUL chars (string_unescape_inplace halts at the first NUL) - each escape sequence quadruples the size: \x00 is needed for NUL - trying to match bytes at the start of the file as well as further on leads to a lot of \x00 sequences in the mask - magic & mask have to be the same length (when decoded) - still need bytes for the other fields - impossible! Let's look at a concrete (and common) example: using QEMU to run MIPS ELFs. The name field uses 11 bytes "qemu-mipsel". The interp uses 20 bytes "/usr/bin/qemu-mipsel". The type & flags takes up 4 bytes. We need 7 bytes for the delimiter (usually ":"). We can skip offset. So already we're down to 107 bytes to use with the magic/mask instead of the real limit of 128 (BINPRM_BUF_SIZE). If people use shell code to register (which they do the majority of the time), they're down to ~26 possible bytes since the escape sequence must be \x##. The ELF format looks like (both 32 & 64 bit): e_ident: 16 bytes e_type: 2 bytes e_machine: 2 bytes Those 20 bytes are enough for most architectures because they have so few formats in the first place, thus they can be uniquely identified. That also means for shell users, since 20 is smaller than 26, they can sanely register a handler. But for some targets (like MIPS), we need to poke further. The ELF fields continue on: e_entry: 4 or 8 bytes e_phoff: 4 or 8 bytes e_shoff: 4 or 8 bytes e_flags: 4 bytes We only care about e_flags here as that includes the bits to identify whether the ELF is O32/N32/N64. But now we have to consume another 16 bytes (for 32 bit ELFs) or 28 bytes (for 64 bit ELFs) just to match the flags. If every byte is escaped, we send 288 more bytes to the kernel ((20 {e_ident,e_type,e_machine} + 12 {e_entry,e_phoff,e_shoff} + 4 {e_flags}) * 2 {mask,magic} * 4 {escape}) and we've clearly blown our budget. Even if we try to be clever and do the decoding ourselves (rather than relying on the kernel to process \x##), we still can't hit the mark -- string_unescape_inplace treats mask & magic as C strings so NUL cannot be embedded. That leaves us with having to pass \x00 for the 12/24 entry/phoff/shoff bytes (as those will be completely random addresses), and that is a minimum requirement of 48/96 bytes for the mask alone. Add up the rest and we blow through it (this is for 64 bit ELFs): magic: 20 {e_ident,e_type,e_machine} + 24 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 48 # ^^ See note below. mask: 20 {e_ident,e_type,e_machine} + 96 {e_entry,e_phoff,e_shoff} + 4 {e_flags} = 120 Remember above we had 107 left over, and now we're at 168. This is of course the best case scenario -- you'll also want to have NUL bytes in the magic & mask too to match literal zeros. Note: the reason we can use 24 in the magic is that we can work off of the fact that for bytes the mask would clobber, we can stuff any value into magic that we want. So when mask is \x00, we don't need the magic to also be \x00, it can be an unescaped raw byte like '!'. This lets us handle more formats (barely) under the current 256 limit, but that's a pretty tall hoop to force people to jump through. With all that said, let's bump the limit from 256 bytes to 1920. This way we support escaping every byte of the mask & magic field (which is 1024 bytes by themselves -- 128 * 4 * 2), and we leave plenty of room for other fields. Like long paths to the interpreter (when you have source in your /really/long/homedir/qemu/foo). Since the current code stuffs more than one structure into the same buffer, we leave a bit of space to easily round up to 2k. 1920 is just as arbitrary as 256 ;). Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-10-14 02:18:15 +02:00

... 2 3 4 5 6 ...

38454 Commits