24897 Commits

Author SHA1 Message Date
Andi Kleen
18772641db direct-io: separate map_bh from dio
Only a single b_private field in the map_bh buffer head is needed after
the submission path. Move map_bh separately to avoid storing
this information in the long term slab.

This avoids the weird 104 byte hole in struct dio_submit which also needed
to be memseted early.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:57 +02:00
Andi Kleen
6e8267f532 direct-io: use a slab cache for struct dio
A direct slab call is slightly faster than kmalloc and can be better cached
per CPU. It also avoids rounding to the next kmalloc slab.

In addition this enforces cache line alignment for struct dio to avoid
any false sharing.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:57 +02:00
Andi Kleen
0dc2bc49be direct-io: rearrange fields in dio/dio_submit to avoid holes
Fix most problems reported by pahole.

There is still a weird 104 byte hole after map_bh. I'm not sure what
causes this.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:56 +02:00
Andi Kleen
cde1ecb324 direct-io: fix a wrong comment
There's nothing on the stack, even before my changes.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:56 +02:00
Andi Kleen
eb28be2b4c direct-io: separate fields only used in the submission path from struct dio
This large, but largely mechanic, patch moves all fields in struct dio
that are only used in the submission path into a separate on stack
data structure. This has the advantage that the memory is very likely
cache hot, which is not guaranteed for memory fresh out of kmalloc.

This also gives gcc more optimization potential because it can easier
determine that there are no external aliases for these variables.

The sdio initialization is a initialization now instead of memset.
This allows gcc to break sdio into individual fields and optimize
away unnecessary zeroing (after all the functions are inlined)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:56 +02:00
Christoph Hellwig
62a3ddef61 vfs: fix spinning prevention in prune_icache_sb
We need to move the inode to the end of the list to actually make the
spinning prevention explained in the comment above it work.  With a
plain list_move it will simply stay in place as we're always reclaiming
from the head of the list.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:55 +02:00
Andreas Gruenbacher
948409c74d vfs: add a comment to inode_permission()
Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruen@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:55 +02:00
Andreas Gruenbacher
d124b60a83 vfs: pass all mask flags check_acl and posix_acl_permission
Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruen@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:54 +02:00
Andreas Gruenbacher
8fd90c8d1d vfs: indicate that the permission functions take all the MAY_* flags
Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruen@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:54 +02:00
Eric W. Biederman
1448c721e4 compat: sync compat_stats with statfs.
This was found by inspection while tracking a similar
bug in compat_statfs64, that has been fixed in mainline
since decemeber.

- This fixes a bug where not all of the f_spare fields
  were cleared on mips and s390.
- Add the f_flags field to struct compat_statfs
- Copy f_flags to userspace in case someone cares.
- Use __clear_user to copy the f_spare field to userspace
  to ensure that all of the elements of f_spare are cleared.
  On some architectures f_spare is has 5 ints and on some
  architectures f_spare only has 4 ints.  Which makes
  the previous technique of clearing each int individually
  broken.

I don't expect anyone actually uses the old statfs system
call anymore but if they do let them benefit from having
the compat and the native version working the same.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:53 +02:00
Bryan Schumaker
a877ee03ac vfs: add "device" tag to /proc/self/mountstats
nfsiostat was failing to find mounted filesystems on kernels after
2.6.38 because of changes to show_vfsstat() by commit
c7f404b40a3665d9f4e9a927cc5c1ee0479ed8f9.  This patch adds back the
"device" tag before the nfs server entry so scripts can parse the
mountstats file correctly.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
CC: stable@kernel.org [>=2.6.39]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 13:55:08 +02:00
Wang Sheng-Hui
814e1d25a5 cleanup: vfs: small comment fix for block_invalidatepage
The patch is aganist 3.1-rc3.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 13:55:08 +02:00
Steve French
96814ecb40 Add definition for share encryption
Samba supports a setfs info level to negotiate encrypted
shares.  This patch adds the defines so we recognize
this info level.  Later patches will add the enablement
for it.

Acked-by: Jeremy Allison <jra@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-27 16:53:31 -05:00
Eric Gouriou
80e675f906 ext4: optimize memmmove lengths in extent/index insertions
ext4_ext_insert_extent() (respectively ext4_ext_insert_index())
was using EXT_MAX_EXTENT() (resp. EXT_MAX_INDEX()) to determine
how many entries needed to be moved beyond the insertion point.
In practice this means that (320 - I) * 24 bytes were memmove()'d
when I is the insertion point, rather than (#entries - I) * 24 bytes.

This patch uses EXT_LAST_EXTENT() (resp. EXT_LAST_INDEX()) instead
to only move existing entries. The code flow is also simplified
slightly to highlight similarities and reduce code duplication in
the insertion logic.

This patch reduces system CPU consumption by over 25% on a 4kB
synchronous append DIO write workload when used with the
pre-2.6.39 x86_64 memmove() implementation. With the much faster
2.6.39 memmove() implementation we still see a decrease in
system CPU usage between 2% and 7%.

Note that the ext_debug() output changes with this patch, splitting
some log information between entries. Users of the ext_debug() output
should note that the "move %d" units changed from reporting the number
of bytes moved to reporting the number of entries moved.

Signed-off-by: Eric Gouriou <egouriou@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-27 11:52:18 -04:00
Eric Gouriou
6f91bc5fda ext4: optimize ext4_ext_convert_to_initialized()
This patch introduces a fast path in ext4_ext_convert_to_initialized()
for the case when the conversion can be performed by transferring
the newly initialized blocks from the uninitialized extent into
an adjacent initialized extent. Doing so removes the expensive
invocations of memmove() which occur during extent insertion and
the subsequent merge.

In practice this should be the common case for clients performing
append writes into files pre-allocated via
fallocate(FALLOC_FL_KEEP_SIZE). In such a workload performed via
direct IO and when using a suboptimal implementation of memmove()
(x86_64 prior to the 2.6.39 rewrite), this patch reduces kernel CPU
consumption by 32%.

Two new trace points are added to ext4_ext_convert_to_initialized()
to offer visibility into its operations. No exit trace point has
been added due to the multiplicity of return points. This can be
revisited once the upstream cleanup is backported.

Signed-off-by: Eric Gouriou <egouriou@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-27 11:43:23 -04:00
Randy Dunlap
4470575461 jbd2: fix build when CONFIG_BUG is not enabled
Fix build error when CONFIG_BUG is not enabled:

fs/jbd2/transaction.c:1175:3: error: implicit declaration of function '__WARN'

by changing __WARN() to WARN_ON(), as suggested by
Arnaud Lacombe <lacombar@gmail.com>.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Arnaud Lacombe <lacombar@gmail.com>
2011-10-27 04:05:13 -04:00
Boaz Harrosh
60325f0c6e fs/Makefile: Stupid typo breakage of exofs inclusion
In my last patch I did a stupid mistake and broke the exofs
compilation completely. Fix it ASAP.

Instead of obj-y I did obj-$(y)

Really Really sorry. Me totally blushing :-{|

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-27 08:36:51 +02:00
Linus Torvalds
c28cfd60e4 Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
* 'for-linus' of git://git.open-osd.org/linux-open-osd: (21 commits)
  ore: Enable RAID5 mounts
  exofs: Support for RAID5 read-4-write interface.
  ore: RAID5 Write
  ore: RAID5 read
  fs/Makefile: Always inspect exofs/
  ore: Make ore_calc_stripe_info EXPORT_SYMBOL
  ore/exofs: Change ore_check_io API
  ore/exofs: Define new ore_verify_layout
  ore: Support for partial component table
  ore: Support for short read/writes
  exofs: Support for short read/writes
  ore: Remove check for ios->kern_buff in _prepare_for_striping to later
  ore: cleanup: Embed an ore_striping_info inside ore_io_state
  ore: Only IO one group at a time (API change)
  ore/exofs: Change the type of the devices array (API change)
  ore: Make ore_striping_info and ore_calc_stripe_info public
  exofs: Remove unused data_map member from exofs_sb_info
  exofs: Rename struct ore_components comps => oc
  exofs/super.c: local functions should be static
  exofs/ore.c: local functions should be static
  ...
2011-10-26 21:33:50 +02:00
Linus Torvalds
39adff5f69 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  time, s390: Get rid of compile warning
  dw_apb_timer: constify clocksource name
  time: Cleanup old CONFIG_GENERIC_TIME references that snuck in
  time: Change jiffies_to_clock_t() argument type to unsigned long
  alarmtimers: Fix error handling
  clocksource: Make watchdog reset lockless
  posix-cpu-timers: Cure SMP accounting oddities
  s390: Use direct ktime path for s390 clockevent device
  clockevents: Add direct ktime programming function
  clockevents: Make minimum delay adjustments configurable
  nohz: Remove "Switched to NOHz mode" debugging messages
  proc: Consider NO_HZ when printing idle and iowait times
  nohz: Make idle/iowait counter update conditional
  nohz: Fix update_ts_time_stat idle accounting
  cputime: Clean up cputime_to_usecs and usecs_to_cputime macros
  alarmtimers: Rework RTC device selection using class interface
  alarmtimers: Add try_to_cancel functionality
  alarmtimers: Add more refined alarm state tracking
  alarmtimers: Remove period from alarm structure
  alarmtimers: Remove interval cap limit hack
  ...
2011-10-26 17:15:03 +02:00
Tao Ma
b3ff056908 ext4: don't check io->flag when setting EXT4_STATE_DIO_UNWRITTEN inode state
When we want to convert the unitialized extent in direct write, we can
either do it in ext4_end_io_nolock(AIO case) or in
ext4_ext_direct_IO(non AIO case) and EXT4_I(inode)->cur_aio_dio is a
guard for ext4_ext_map_blocks to find the right case.  In e9e3bcecf,
we mistakenly change it by:

-			if (io)
+			if (io && !(io->flag & EXT4_IO_END_UNWRITTEN)) {
 				io->flag = EXT4_IO_END_UNWRITTEN;
-			else
+				atomic_inc(&EXT4_I(inode)->i_aiodio_unwritten);
+			} else
 				ext4_set_inode_state(inode,
 						     EXT4_STATE_DIO_UNWRITTEN);

So now if we map 2 blocks, and the first one set the
EXT_IO_END_UNWRITTEN, the 2nd mapping will set inode state because of
the check for the flag. This is wrong.

Cc: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26 11:08:39 -04:00
Robin Dong
0a10da73e1 ext4: fix a wrong comment in __mb_check_buddy()
The comment says the bit should be 0, but the after code assert the
bit to be 1.  This makes people confused, so fix it.

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26 08:48:54 -04:00
Linus Torvalds
e33bae14fd Merge branch 'for-linus' of git://github.com/ericvh/linux
* 'for-linus' of git://github.com/ericvh/linux:
  9p: fix 9p.txt to advertise msize instead of maxdata
  net/9p: Convert net/9p protocol dumps to tracepoints
  fs/9p: change an int to unsigned int
  fs/9p: Cleanup option parsing in 9p
  9p: move dereference after NULL check
  fs/9p: inode file operation is properly initialized init_special_inode
  fs/9p: Update zero-copy implementation in 9p
2011-10-26 14:20:53 +02:00
Robin Dong
b051d8dc4e ext4: remove unused variable in mb_find_extent()
The variable 'ord' in function mb_find_extent() is redundant, so
remove it.

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26 05:30:30 -04:00
Robin Dong
66a83cde47 ext4: remove unused variable in ext4_mb_generate_from_pa()
The variable 'count' in function ext4_mb_generate_from_pa() looks
useless, so remove it.

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26 05:29:21 -04:00
Robin Dong
ebbe027797 ext4: use stream-alloc when mb_group_prealloc set to zero
The kernel will crash on 

ext4_mb_mark_diskspace_used:
	BUG_ON(ac->ac_b_ex.fe_len <= 0);

after we set /sys/fs/ext4/sda/mb_group_prealloc to zero and create new files in an ext4 filesystem.

The reason is: ac_b_ex.fe_len also set to zero(mb_group_prealloc) in ext4_mb_normalize_group_request
because the ac_flags contains EXT4_MB_HINT_GROUP_ALLOC.

I think when someone set mb_group_prealloc to zero, it means DO NOT USE GROUP PREALLOCATION,
so we should set alloc-strategy to STREAM in this case.

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26 05:14:27 -04:00
Yongqiang Yang
fcbb551582 ext4: let ext4_page_mkwrite stop started handle in failure
The started journal handle should be stopped in failure case.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
2011-10-26 05:00:19 -04:00
Curt Wohlgemuth
6f8ff53726 ext4: handle NULL p_ext in ext4_ext_next_allocated_block()
In ext4_ext_next_allocated_block(), the path[depth] might
have a p_ext that is NULL -- see ext4_ext_binsearch().  In
such a case, dereferencing it will crash the machine.

This patch checks for p_ext == NULL in
ext4_ext_next_allocated_block() before dereferencinging it.

Tested using a hand-crafted an inode with eh_entries == 0 in
an extent block, verified that running FIEMAP on it crashes
without this patch, works fine with it.

Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26 04:38:59 -04:00
Dan Carpenter
f85b287a01 ext4: error handling fix in ext4_ext_convert_to_initialized()
When allocated is unsigned it breaks the error handling at the end
of the function when we call:
	allocated = ext4_split_extent(...);
	if (allocated < 0)
		err = allocated;

I've made it a signed int instead of unsigned.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26 03:42:36 -04:00
Eric Sandeen
665436175c ext4: use ext4_reserve_inode_write in ext4_xattr_set_handle
ext4_mark_iloc_dirty() says:

 * The caller must have previously called ext4_reserve_inode_write().
 * Give this, we know that the caller already has write access to iloc->bh.

ext4_xattr_set_handle, however, just open-codes it.  May as well use
the helper function for consistency.

No bug here, just tidiness.

(Note: on cleanup path, ext4_reserve_inode_write sets
the bh to NULL if it returns an error, and brelse() of 
a null bh is handled gracefully).

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26 03:32:07 -04:00
Andreas Dilger
909a4cf1ff ext4: avoid setting directory i_nlink to zero
If a directory with more than EXT4_LINK_MAX subdirectories, the nlink
count is set to 1.  Subsequently, if any subdirectories are deleted,
ext4_dec_count() decrements the i_nlink count, which may go to 0
temporarily before being incremented back to 1.

While this is done under i_mutex, which prevents races for directory
and inode operations that check i_nlink, the temporary i_nlink == 0
case is exposed to userspace via stat() and similar calls that do not
hold i_mutex.

Instead, change the code to not decrement i_nlink count for any
directories that do not already have i_nlink larger than 2.

Reported-by: Cliff White <cliffw@whamcloud.com>
Reviewed-by: Johann Lombardi <johann@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26 03:22:31 -04:00
Sage Weil
3395734067 libceph: fix double-free of page vector
ceph_release_page_vector() kfrees the vector; we shouldn't do it here too.

Reported-by: Jeff Wu <cpwu@tnsoft.com.cn>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:17 -07:00
Amon Ott
3310f7541f ceph: fix 32-bit ino numbers
Fix 32-bit ino generation to not always be 1.

Signed-off-by: Amon Ott <a.ott@m-privacy.de>
2011-10-25 16:10:17 -07:00
Greg Farnum
a35eca958a ceph: let the set_layout ioctl set single traits
Previously we were validating the passed-in stripe unit, object size,
and stripe count against each other (and not testing most other stuff).
Instead, make sure that the composed previous layout and new values are valid,
and only send the new values to the MDS. This lets users change the
pool without setting the whole layout, for instance.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
2011-10-25 16:10:16 -07:00
Sage Weil
83eaea22bd Revert "ceph: don't truncate dirty pages in invalidate work thread"
This reverts commit c9af9fb68e01eb2c2165e1bc45cfeeed510c64e6.

We need to block and truncate all pages in order to reliably invalidate
them.  Otherwise, we could:

 - have some uptodate pages in the cache
 - queue an invalidate
 - write(2) locks some pages
 - invalidate_work skips them
 - write(2) only overwrites part of the page
 - page now dirty and uptodate
 -> partial leakage of invalidated data

It's not entirely clear why we started skipping locked pages in the first
place.  I just ran this through fsx and didn't see any problems.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:16 -07:00
Noah Watkins
80db8bea6a ceph: replace leading spaces with tabs
Trivial formatting fix.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:16 -07:00
Sage Weil
b61c27636f libceph: don't complain on msgpool alloc failures
The pool allocation failures are masked by the pool; there is no need to
spam the console about them.  (That's the whole point of having the pool
in the first place.)

Mark msg allocations whose failure is safely handled as such.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:15 -07:00
Sage Weil
6ab00d465a libceph: create messenger with client
This simplifies the init/shutdown paths, and makes client->msgr available
during the rest of the setup process.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:15 -07:00
Sage Weil
6a8ea4706a ceph: document ioctls
...after some prodding by Christoph.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:15 -07:00
Sage Weil
0d66a487c1 ceph: implement (optional) max read size
The 'rsize' mount option limits the maximum size of an individual
read(ahead) operation that is sent off to an OSD.  This is distinct from
'rasize', which controls the size of the readahead window.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:15 -07:00
Sage Weil
83817e35cb ceph: rename rsize -> rasize
It controls readahead.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:15 -07:00
Sage Weil
7c272194e6 ceph: make readpages fully async
When we get a ->readpages() aop, submit async reads for all page ranges
in the provided page list.  Lock the pages immediately, so that VFS/MM
will block until the reads complete.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:14 -07:00
Linus Torvalds
ef78cc75f1 Merge branch 'nfs-for-3.2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
* 'nfs-for-3.2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (26 commits)
  Check validity of cl_rpcclient in nfs_server_list_show
  NFS: Get rid of the nfs_rdata_mempool
  NFS: Don't rely on PageError in nfs_readpage_release_partial
  NFS: Get rid of unnecessary calls to ClearPageError() in read code
  NFS: Get rid of nfs_restart_rpc()
  NFS: Get rid of the unused nfs_write_data->flags field
  NFS: Get rid of the unused nfs_read_data->flags field
  NFSv4: Translate NFS4ERR_BADNAME into ENOENT when applied to a lookup
  NFS: Remove the unused "lookupfh()" version of nfs4_proc_lookup()
  NFS: Use the inode->i_version to cache NFSv4 change attribute information
  SUNRPC: Remove unnecessary export of rpc_sockaddr2uaddr
  SUNRPC: Fix rpc_sockaddr2uaddr
  nfs/super.c: local functions should be static
  pnfsblock: fix writeback deadlock
  pnfsblock: fix NULL pointer dereference
  pnfs: recoalesce when ld read pagelist fails
  pnfs: recoalesce when ld write pagelist fails
  pnfs: make _set_lo_fail generic
  pnfsblock: add missing rpc_put_mount and path_put
  SUNRPC/NFS: make rpc pipe upcall generic
  ...
2011-10-25 15:44:06 +02:00
Linus Torvalds
1442d1678c Merge branch 'for-3.2' of git://linux-nfs.org/~bfields/linux
* 'for-3.2' of git://linux-nfs.org/~bfields/linux: (103 commits)
  nfs41: implement DESTROY_CLIENTID operation
  nfsd4: typo logical vs bitwise negate for want_mask
  nfsd4: allow NFS4_SHARE_SIGNAL_DELEG_WHEN_RESRC_AVAIL | NFS4_SHARE_PUSH_DELEG_WHEN_UNCONTENDED
  nfsd4: seq->status_flags may be used unitialized
  nfsd41: use SEQ4_STATUS_BACKCHANNEL_FAULT when cb_sequence is invalid
  nfsd4: implement new 4.1 open reclaim types
  nfsd4: remove unneeded CLAIM_DELEGATE_CUR workaround
  nfsd4: warn on open failure after create
  nfsd4: preallocate open stateid in process_open1()
  nfsd4: do idr preallocation with stateid allocation
  nfsd4: preallocate nfs4_file in process_open1()
  nfsd4: clean up open owners on OPEN failure
  nfsd4: simplify process_open1 logic
  nfsd4: make is_open_owner boolean
  nfsd4: centralize renew_client() calls
  nfsd4: typo logical vs bitwise negate
  nfs: fix bug about IPv6 address scope checking
  nfsd4: more robust ignoring of WANT bits in OPEN
  nfsd4: move name-length checks to xdr
  nfsd4: move access/deny validity checks to xdr code
  ...
2011-10-25 15:42:01 +02:00
Darrick J. Wong
cf8039036a ext4: prevent stack overrun in ext4_file_open
In ext4_file_open, the filesystem records the mountpoint of the first
file that is opened after mounting the filesystem.  It does this by
allocating a 64-byte stack buffer, calling d_path() to grab the mount
point through which this file was accessed, and then memcpy()ing 64
bytes into the superblock's s_last_mounted field, starting from the
return value of d_path(), which is stored as "cp".  However, if cp >
buf (which it frequently is since path components are prepended
starting at the end of buf) then we can end up copying stack data into
the superblock.

Writing stack variables into the superblock doesn't sound like a great
idea, so use strlcpy instead.  Andi Kleen suggested using strlcpy
instead of strncpy.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-25 09:18:41 -04:00
Eric W. Biederman
b9e2780d57 sysfs: Remove support for tagged directories with untagged members (again)
In commit 8a9ea3237e7e ("Merge git://.../davem/net-next") where my sysfs
changes from the net tree merged with the sysfs rbtree changes from
Mickulas Patocka the conflict resolution failed to preserve the
simplified property that was the point of my changes.

That is sysfs_find_dirent can now say something is a match if and only
s_name and s_ns match what we are looking for, and sysfs_readdir can
simply return all of the directory entries where s_ns matches the
directory that we should be returning.

Now that we are back to exact matches we can tweak sysfs_find_dirent and
the name rb_tree to order sysfs_dirents by s_ns s_name and remove the
second loop in sysfs_find_dirent.  However that change seems a bit much
for a conflict resolution so it can come later.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-25 15:10:28 +02:00
Dmitry Monakhov
a4e5d88b1b ext4: update EOFBLOCKS flag on fallocate properly
EOFBLOCK_FL should be updated if called w/o FALLOCATE_FL_KEEP_SIZE
Currently it happens only if new extent was allocated.

TESTCASE:
fallocate test_file -n -l4096
fallocate test_file -l4096
Last fallocate cmd has updated size, but keept EOFBLOCK_FL set. And
fsck will complain about that.

Also remove ping pong in ext4_fallocate() in case of new extents,
where ext4_ext_map_blocks() clear EOFBLOCKS bit, and later
ext4_falloc_update_inode() restore it again.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-25 08:15:12 -04:00
Linus Torvalds
8a9ea3237e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1745 commits)
  dp83640: free packet queues on remove
  dp83640: use proper function to free transmit time stamping packets
  ipv6: Do not use routes from locally generated RAs
  |PATCH net-next] tg3: add tx_dropped counter
  be2net: don't create multiple RX/TX rings in multi channel mode
  be2net: don't create multiple TXQs in BE2
  be2net: refactor VF setup/teardown code into be_vf_setup/clear()
  be2net: add vlan/rx-mode/flow-control config to be_setup()
  net_sched: cls_flow: use skb_header_pointer()
  ipv4: avoid useless call of the function check_peer_pmtu
  TCP: remove TCP_DEBUG
  net: Fix driver name for mdio-gpio.c
  ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT
  rtnetlink: Add missing manual netlink notification in dev_change_net_namespaces
  ipv4: fix ipsec forward performance regression
  jme: fix irq storm after suspend/resume
  route: fix ICMP redirect validation
  net: hold sock reference while processing tx timestamps
  tcp: md5: add more const attributes
  Add ethtool -g support to virtio_net
  ...

Fix up conflicts in:
 - drivers/net/Kconfig:
	The split-up generated a trivial conflict with removal of a
	stale reference to Documentation/networking/net-modules.txt.
	Remove it from the new location instead.
 - fs/sysfs/dir.c:
	Fairly nasty conflicts with the sysfs rb-tree usage, conflicting
	with Eric Biederman's changes for tagged directories.
2011-10-25 13:25:22 +02:00
Stanislav Kinsbursky
16d0587090 NFSd: call svc rpcbind cleanup explicitly
We have to call svc_rpcb_cleanup() explicitly from nfsd_last_thread() since
this function is registered as service shutdown callback and thus nobody else
will done it for us.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-25 13:19:40 +02:00
Linus Torvalds
2d03423b23 Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (38 commits)
  mm: memory hotplug: Check if pages are correctly reserved on a per-section basis
  Revert "memory hotplug: Correct page reservation checking"
  Update email address for stable patch submission
  dynamic_debug: fix undefined reference to `__netdev_printk'
  dynamic_debug: use a single printk() to emit messages
  dynamic_debug: remove num_enabled accounting
  dynamic_debug: consolidate repetitive struct _ddebug descriptor definitions
  uio: Support physical addresses >32 bits on 32-bit systems
  sysfs: add unsigned long cast to prevent compile warning
  drivers: base: print rejected matches with DEBUG_DRIVER
  memory hotplug: Correct page reservation checking
  memory hotplug: Refuse to add unaligned memory regions
  remove the messy code file Documentation/zh_CN/SubmitChecklist
  ARM: mxc: convert device creation to use platform_device_register_full
  new helper to create platform devices with dma mask
  docs/driver-model: Update device class docs
  docs/driver-model: Document device.groups
  kobj_uevent: Ignore if some listeners cannot handle message
  dynamic_debug: make netif_dbg() call __netdev_printk()
  dynamic_debug: make netdev_dbg() call __netdev_printk()
  ...
2011-10-25 12:13:59 +02:00
Linus Torvalds
59e5253417 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (59 commits)
  MAINTAINERS: linux-m32r is moderated for non-subscribers
  linux@lists.openrisc.net is moderated for non-subscribers
  Drop default from "DM365 codec select" choice
  parisc: Kconfig: cleanup Kernel page size default
  Kconfig: remove redundant CONFIG_ prefix on two symbols
  cris: remove arch/cris/arch-v32/lib/nand_init.S
  microblaze: add missing CONFIG_ prefixes
  h8300: drop puzzling Kconfig dependencies
  MAINTAINERS: microblaze-uclinux@itee.uq.edu.au is moderated for non-subscribers
  tty: drop superfluous dependency in Kconfig
  ARM: mxc: fix Kconfig typo 'i.MX51'
  Fix file references in Kconfig files
  aic7xxx: fix Kconfig references to READMEs
  Fix file references in drivers/ide/
  thinkpad_acpi: Fix printk typo 'bluestooth'
  bcmring: drop commented out line in Kconfig
  btmrvl_sdio: fix typo 'btmrvl_sdio_sd6888'
  doc: raw1394: Trivial typo fix
  CIFS: Don't free volume_info->UNC until we are entirely done with it.
  treewide: Correct spelling of successfully in comments
  ...
2011-10-25 12:11:02 +02:00