Commit Graph

1513 Commits

Author SHA1 Message Date
Trond Myklebust
beb2a5ec38 NFSv4: Ensure change attribute returned by GETATTR callback conforms to spec
According to RFC3530 we're supposed to cache the change attribute
 at the time the client receives a write delegation.
 If the inode is clean, a CB_GETATTR callback by the server to the
 client is supposed to return the cached change attribute.
 If, OTOH, the inode is dirty, the client should bump the cached
 change attribute by 1.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:51 -05:00
Trond Myklebust
566dd6064e NFS: Make directIO aware of compound pages...
...and avoid calling set_page_dirty on them

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:50 -05:00
Trond Myklebust
70b9ecbdb9 NFS: Make stat() return updated mtimes after a write()
The SuS states that a call to write() will cause mtime to be updated on
 the file. In order to satisfy that requirement, we need to flush out
 any cached writes in nfs_getattr().
 Speed things up slightly by not committing the writes.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:50 -05:00
Trond Myklebust
24174119c7 NFSv4: Ensure that we return the delegation on the target of a rename too.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:50 -05:00
Chuck Lever
40859d7ee6 NFS: support large reads and writes on the wire
Most NFS server implementations allow up to 64KB reads and writes on the
 wire.  The Solaris NFS server allows up to a megabyte, for instance.

 Now the Linux NFS client supports transfer sizes up to 1MB, too.  This will
 help reduce protocol and context switch overhead on read/write intensive NFS
 workloads, and support larger atomic read and write operations on servers
 that support them.

 Test-plan:
 Connectathon and iozone on mount point with wsize=rsize>32768 over TCP.
 Tests with NFS over UDP to verify the maximum RPC payload size cap.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:49 -05:00
Chuck Lever
325cfed9ae NFS: make "inode number mismatch" message more useful
To help NFS users and server developers, make the "inode number mismatch"
 message display more useful information.

 Test-plan:
 None.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:49 -05:00
Chuck Lever
dc20f80390 NFS: get rid of useless kernel log message
nfs_statfs() generates a log message when GETATTR returns an error.  This
 is usually a useless message.  Make it a dprintk.

 Test plan:
 None

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:48 -05:00
Chuck Lever
6b59a75460 NFS: Fix error recovery code in fs/nfs/inode.c:__init_nfs()
Red Hat found a problem in the error recovery logic in __init_nfs.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:48 -05:00
Chuck Lever
ce1a8e6796 NFS: use generic_write_checks() to sanity check direct writes
Replace ad hoc write parameter sanity checking in nfs_file_direct_write()
 with a call to generic_write_checks().  This should make the proper checks
 modulo the O_LARGEFILE flag, and should catch NFSv2-specific limitations by
 virtue of i_sb->s_maxbytes.

 Test plan:
 Posix compliance testing with both NFSv2 and NFSv3.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:47 -05:00
Trond Myklebust
286d7d6a0c NFSv4: Remove requirement for machine creds for the "setclientid" operation
Use a cred from the nfs4_client->cl_state_owners list.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:47 -05:00
Trond Myklebust
b4454fe1a7 NFSv4: Remove requirement for machine creds for the "renew" operation
In RFC3530, the RENEW operation is allowed to use either

 the same principal, RPC security flavour and (if RPCSEC_GSS), the same
  mechanism and service that was used for SETCLIENTID_CONFIRM

 OR

 Any principal, RPC security flavour and service combination that
 currently has an OPEN file on the server.

 Choose the latter since that doesn't require us to keep credentials for
 the same principal for the entire duration of the mount.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:47 -05:00
Trond Myklebust
58d9714a44 NFSv4: Send RENEW requests to the server only when we're holding state
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:46 -05:00
Trond Myklebust
5043e900f5 NFS: Convert instances of kernel_thread() to kthread()
Convert private implementations in NFSv4 state recovery and delegation
 code to use kthreads.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:46 -05:00
Trond Myklebust
433fbe4c88 NFSv4: State recovery cleanup
Use wait_on_bit() when waiting for state recovery to complete.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:45 -05:00
Trond Myklebust
26e976a884 NFSv4: OPEN/LOCK/LOCKU/CLOSE will automatically renew the NFSv4 lease
Cut down on the number of unnecessary RENEW requests on the wire.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:45 -05:00
Trond Myklebust
2bd615797e SUNRPC: Ensure that SIGKILL will always terminate a synchronous RPC call.
...and make sure that the "intr" flag also enables SIGHUP and SIGTERM to
 interrupt RPC calls too (as per the Solaris implementation).

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:45 -05:00
Trond Myklebust
fe650407a8 NFSv4: Make DELEGRETURN an interruptible operation.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:44 -05:00
Trond Myklebust
a5d16a4d09 NFSv4: Convert LOCK rpc call into an asynchronous RPC call
In order to allow users to interrupt/cancel it.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:44 -05:00
Trond Myklebust
911d1aaf26 NFSv4: locking XDR cleanup
Get rid of some unnecessary intermediate structures

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:44 -05:00
Trond Myklebust
864472e9b8 NFSv4: Make open recovery track O_RDWR, O_RDONLY and O_WRONLY correctly
When recovering from a delegation recall or a network partition, we need
 to replay open(O_RDWR), open(O_RDONLY) and open(O_WRONLY) separately.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:43 -05:00
Trond Myklebust
e761692381 NFSv4: Make nfs4_state track O_RDWR, O_RDONLY and O_WRONLY separately
A closer reading of RFC3530 reveals that OPEN_DOWNGRADE must always
 specify a access modes that have been the argument of a previous OPEN
 operation.
 IOW: doing OPEN(O_RDWR) and then OPEN_DOWNGRADE(O_WRONLY) is forbidden
 unless the user called OPEN(O_WRONLY)

 In order to fix that, we really need to track the three possible open
 states separately.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:43 -05:00
Trond Myklebust
cdd4e68b5f NFSv4: Make open_confirm() asynchronous too
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:42 -05:00
Trond Myklebust
24ac23ab88 NFSv4: Convert open() into an asynchronous RPC call
OPEN is a stateful operation, so we must ensure that it always
 completes. In order to allow users to interrupt the operation,
 we need to make the RPC call asynchronous, and then wait on
 completion (or cancel).

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:42 -05:00
Trond Myklebust
e56e0b78eb NFSv4: Allocate OPEN call RPC arguments using kmalloc()
Cleanup in preparation for making OPEN calls interruptible by the user.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:41 -05:00
Trond Myklebust
06f814a3ad NFSv4: Make locku use the new RPC "wait on completion" interface.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:40 -05:00
Trond Myklebust
44c288732f NFSv4: stateful NFSv4 RPC call interface
The NFSv4 model requires us to complete all RPC calls that might
 establish state on the server whether or not the user wants to
 interrupt it. We may also need to schedule new work (including
 new RPC calls) in order to cancel the new state.

 The asynchronous RPC model will allow us to ensure that RPC calls
 always complete, but in order to allow for "synchronous" RPC, we
 want to add the ability to wait for completion.
 The waits are, of course, interruptible.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:40 -05:00
Trond Myklebust
4ce70ada1f SUNRPC: Further cleanups
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:40 -05:00
Trond Myklebust
963d8fe533 RPC: Clean up RPC task structure
Shrink the RPC task structure. Instead of storing separate pointers
 for task->tk_exit and task->tk_release, put them in a structure.

 Also pass the user data pointer as a parameter instead of passing it via
 task->tk_calldata. This enables us to nest callbacks.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:39 -05:00
Trond Myklebust
abd3e641d5 NFS: Work correctly with single-page ->writepage() calls
Ensure that we always initiate flushing of data before we exit
 a single-page ->writepage() call.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:39 -05:00
Linus Torvalds
d99cf9d679 Merge branch 'post-2.6.15' of git://brick.kernel.dk/data/git/linux-2.6-block
Manual fixup for merge with Jens' "Suspend support for libata", commit
ID 9b84754866.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 09:01:25 -08:00
Neil Brown
9f708e40fe [PATCH] knfsd: reduce stack consumption
A typical nfsd call trace is
 nfsd -> svc_process -> nfsd_dispatch -> nfsd3_proc_write ->
   nfsd_write ->nfsd_vfs_write -> vfs_writev

These add up to over 300 bytes on the stack.
Looking at each of these, I see that nfsd_write (which includes
 nfsd_vfs_write) contributes 0x8c to stack usage itself!!

It turns out this is because it puts a 'struct iattr' on the stack so
it can kill suid if needed.  The following patch saves about 50 bytes
off the stack in this call path.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:59 -08:00
David Shaw
a334de2866 [PATCH] knfsd: check error status from vfs_getattr and i_op->fsync
Both vfs_getattr and i_op->fsync return error statuses which nfsd was
largely ignoring.  This as noticed when exporting directories using fuse.

This patch cleans up most of the offences, which involves moving the call
to vfs_getattr out of the xdr encoding routines (where it is too late to
report an error) into the main NFS procedure handling routines.

There is still a called to vfs_gettattr (related to the ACL code) where the
status is ignored, and called to nfsd_sync_dir don't check return status
either.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:59 -08:00
Jan Kara
f93ea411b7 [PATCH] jbd: split checkpoint lists
Split the checkpoint list of the transaction into two lists.  In the first
list we keep the buffers that need to be submitted for IO.  In the second
list are kept buffers that were already submitted and we just have to wait
for the IO to complete.  This should simplify a handling of checkpoint
lists a bit and can eventually be also a performance gain.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:59 -08:00
Miklos Szeredi
39ee059aff [PATCH] fuse: check file type in lookup
Previously invalid types were quietly changed to regular files, but at
revalidation the inode was changed to bad.  This was rather inconsistent
behavior.

Now check if the type is valid on initial lookup, and return -EIO if not.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:56 -08:00
Miklos Szeredi
6ad84acab9 [PATCH] fuse: ensure progress in read and write
In direct_io mode, send at least one page per reqest.  Previously it was
possible that reqests with zero data were sent, and hence the read/write
didn't make any progress, resulting in an infinite (though interruptible)
loop.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:56 -08:00
Miklos Szeredi
3ec870d524 [PATCH] fuse: make maximum write data configurable
Make the maximum size of write data configurable by the filesystem.  The
previous fixed 4096 limit only worked on architectures where the page size is
less or equal to this.  This change make writing work on other architectures
too, and also lets the filesystem receive bigger write requests in direct_io
mode.

Normal writes which go through the page cache are still limited to a page
sized chunk per request.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:56 -08:00
Miklos Szeredi
1d3d752b47 [PATCH] fuse: clean up request size limit checking
Change the way a too large request is handled.  Until now in this case the
device read returned -EINVAL and the operation returned -EIO.

Make it more flexibible by not returning -EINVAL from the read, but restarting
it instead.

Also remove the fixed limit on setxattr data and let the filesystem provide as
large a read buffer as it needs to handle the extended attribute data.

The symbolic link length is already checked by VFS to be less than PATH_MAX,
so the extra check against FUSE_SYMLINK_MAX is not needed.

The check in fuse_create_open() against FUSE_NAME_MAX is not needed, since the
dentry has already been looked up, and hence the name already checked.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:56 -08:00
Miklos Szeredi
248d86e87d [PATCH] fuse: fail file operations on bad inode
Make file operations on a bad inode fail.  This just makes things a
bit more consistent.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:55 -08:00
Miklos Szeredi
6f9f11806a [PATCH] fuse: add code documentation
Document some not-so-trivial functions.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:55 -08:00
Miklos Szeredi
8cbdf1e6f6 [PATCH] fuse: support caching negative dentries
Add support for caching negative dentries.

Up till now, ->d_revalidate() always forced a new lookup on these.  Now let
the lookup method return a zero node ID (not used for anything else) meaning a
negative entry, but with a positive cache timeout.  The old way of signaling
negative entry (replying ENOENT) still works.

Userspace should check the ABI minor version to see whether sending a zero ID
is allowed by the kernel or not.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:55 -08:00
Miklos Szeredi
de5f120255 [PATCH] fuse: add frsize to statfs reply
Add 'frsize' member to the statfs reply.

I'm not sure if sending f_fsid will ever be needed, but just in case leave
some space at the end of the structure, so less compatibility mess would be
required.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:55 -08:00
Miklos Szeredi
45714d6561 [PATCH] fuse: bump interface version
Change interface version to 7.4.

Following changes will need backward compatibility support, so store the minor
version returned by userspace.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:55 -08:00
Miklos Szeredi
4633a22e7a [PATCH] fuse: clean up page offset calculation
Use page_offset() instead of doing page offset calculation by hand.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:55 -08:00
Miklos Szeredi
0aa7c6990e [PATCH] fuse: clean up fuse_lookup()
Simplify fuse_lookup() and related functions.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:54 -08:00
Martin Schwidefsky
347a8dc3b8 [PATCH] s390: cleanup Kconfig
Sanitize some s390 Kconfig options.  We have ARCH_S390, ARCH_S390X,
ARCH_S390_31, 64BIT, S390_SUPPORT and COMPAT.  Replace these 6 options by
S390, 64BIT and COMPAT.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:53 -08:00
Peter Oberparleiter
56dc6a88ec [PATCH] s390: cms volume label definitions
Moved definition of CMS volume label to vtoc.h and modify partitions/ibm.c to
use this volume label definition instead of anonymous array.

Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:48 -08:00
David Howells
642fb4d1f1 [PATCH] NOMMU: Provide shared-writable mmap support on ramfs
The attached patch makes ramfs support shared-writable mmaps by:

 (1) Attempting to perform a contiguous block allocation to the requested size
     when truncate attempts to increase the file from zero size, such as
     happens when:

	fd = shm_open("/file/on/ramfs", ...):
	ftruncate(fd, size_requested);
	addr = mmap(NULL, subsize, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED,
		    fd, offset);

 (2) Permitting any shared-writable mapping over any contiguous set of extant
     pages. get_unmapped_area() will return the address into the actual ramfs
     pages. The mapping may start anywhere and be of any size, but may not go
     over the end of file. Multiple mappings may overlap in any way.

 (3) Not permitting a file to be shrunk if it would truncate any shared
     mappings (private mappings are copied).

Thus this patch provides support for POSIX shared memory on NOMMU kernels,
with certain limitations such as there being a large enough block of pages
available to support the allocation and it only working on directly mappable
filesystems.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:32 -08:00
Nick Piggin
9617d95e6e [PATCH] mm: rmap optimisation
Optimise rmap functions by minimising atomic operations when we know there
will be no concurrent modifications.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:27 -08:00
David Gibson
1e8f889b10 [PATCH] Hugetlb: Copy on Write support
Implement copy-on-write support for hugetlb mappings so MAP_PRIVATE can be
supported.  This helps us to safely use hugetlb pages in many more
applications.  The patch makes the following changes.  If needed, I also have
it broken out according to the following paragraphs.

1. Add a pair of functions to set/clear write access on huge ptes.  The
   writable check in make_huge_pte is moved out to the caller for use by COW
   later.

2. Hugetlb copy-on-write requires special case handling in the following
   situations:

   - copy_hugetlb_page_range() - Copied pages must be write protected so
     a COW fault will be triggered (if necessary) if those pages are written
     to.

   - find_or_alloc_huge_page() - Only MAP_SHARED pages are added to the
     page cache.  MAP_PRIVATE pages still need to be locked however.

3. Provide hugetlb_cow() and calls from hugetlb_fault() and
   hugetlb_no_page() which handles the COW fault by making the actual copy.

4. Remove the check in hugetlbfs_file_map() so that MAP_PRIVATE mmaps
   will be allowed.  Make MAP_HUGETLB exempt from the depricated VM_RESERVED
   mapping check.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "Seth, Rohit" <rohit.seth@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:23 -08:00
Joshua Kwan
bd6a59b22f [PATCH] hfsplus oops fix
nls_utf8 is available, and the check in hfsplus_fill_super checks the wrong
pointer for NULLness (it checks the saved nls, not the new one that it
needs to use.)

Signed-off-by: Joshua Kwan <joshk@triplehelix.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:20 -08:00
Jens Axboe
80cfd548ee [BLOCK] bio: check for same page merge possibilities in __bio_add_page()
For filesystems with a blocksize < page size, we can merge same page
calls into the bio_vec at the end of the bio. This saves segments
on systems with a page size > the "normal" 4kb fs block size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-01-06 09:43:28 +01:00
Linus Torvalds
29552b1462 Merge http://oss.oracle.com/git/ocfs2 2006-01-05 20:43:11 -08:00
Linus Torvalds
db9edfd7e3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6
Trivial manual merge fixup for usb_find_interface clashes.
2006-01-04 18:44:12 -08:00
Linus Torvalds
52347f4e81 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial 2006-01-04 16:34:57 -08:00
Linus Torvalds
f61ea1b0c8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 2006-01-04 16:30:12 -08:00
Linus Torvalds
d347da0def Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2006-01-04 16:27:41 -08:00
Linus Torvalds
e28cc71572 Relax the rw_verify_area() error checking.
In particular, allow over-large read- or write-requests to be downgraded
to a more reasonable range, rather than considering them outright errors.

We want to protect lower layers from (the sadly all too common) overflow
conditions, but prefer to do so by chopping the requests up, rather than
just refusing them outright.

Cc: Peter Anvin <hpa@zytor.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-04 16:20:40 -08:00
Steven Rostedt
e80a5dea8e [PATCH] sysfs: handle failures in sysfs_make_dirent
I noticed that if sysfs_make_dirent fails to allocate the sd, then a
null will be passed to sysfs_put.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-01-04 16:18:09 -08:00
Greg Kroah-Hartman
8218ef8093 [PATCH] Driver core: Make block devices create the proper symlink name
Block devices need to add the block device name to the symlink they put
in the device directory, otherwise multiple symlinks of the same name
can be created.  This matches the class system, which works the same
way, we just forgot to convert block at the same time.

Cc: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-01-04 16:18:09 -08:00
Kay Sievers
312c004d36 [PATCH] driver core: replace "hotplug" by "uevent"
Leave the overloaded "hotplug" word to susbsystems which are handling
real devices. The driver core does not "plug" anything, it just exports
the state to userspace and generates events.

Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-01-04 16:18:08 -08:00
Kay Sievers
033b96fd30 [PATCH] remove mount/umount uevents from superblock handling
The names of these events have been confusing from the beginning
on, as they have been more like claim/release events. We needed these
events for noticing HAL if storage devices have been mounted.

Thanks to Al, we have the proper solution now and can poll()
/proc/mounts instead to get notfied about mount tree changes.

Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-01-04 16:18:07 -08:00
Arnaldo Carvalho de Melo
14c850212e [INET_SOCK]: Move struct inet_sock & helper functions to net/inet_sock.h
To help in reducing the number of include dependencies, several files were
touched as they were getting needed headers indirectly for stuff they use.

Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had
linux/dccp.h include twice.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:21 -08:00
Mark Fasheh
51e7a59870 [PATCH] o Update Kconfig documentation to reflect support for readonly mounts.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-01-03 11:45:57 -08:00
Adrian Bunk
82353b594c [PATCH] This patch contains the following cleanups:
- cluster/sys.c: make needlessly global code static
- dlm/: "extern" declarations for variables belong into header files
        (and in this case, they are already in dlmdomain.h)

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-01-03 11:45:55 -08:00
Mark Fasheh
b4e40a5188 [PATCH] OCFS2: The Second Oracle Cluster Filesystem
Link the code into the kernel build system. OCFS2 is marked as
experimental.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
2006-01-03 11:45:48 -08:00
Mark Fasheh
ccd979bdbc [PATCH] OCFS2: The Second Oracle Cluster Filesystem
The OCFS2 file system module.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
2006-01-03 11:45:47 -08:00
Mark Fasheh
8df08c89c6 [PATCH] OCFS2: The Second Oracle Cluster Filesystem
dlmfs: A minimal dlm userspace interface implemented via a virtual
file system.
Most of the OCFS2 tools make use of this to take cluster locks when
doing operations on the file system.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
2006-01-03 11:45:47 -08:00
Kurt Hackel
6714d8e86b [PATCH] OCFS2: The Second Oracle Cluster Filesystem
A distributed lock manager built with the cluster file system use case
in mind. The OCFS2 dlm exposes a VMS style API, though things have
been simplified internally. The only lock levels implemented currently
are NLMODE, PRMODE and EXMODE.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
2006-01-03 11:45:47 -08:00
Zach Brown
98211489d4 [PATCH] OCFS2: The Second Oracle Cluster Filesystem
Node messaging via tcp. Used by the dlm and the file system for point
to point communication between nodes.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
2006-01-03 11:45:46 -08:00
Mark Fasheh
a7f6a5fb4b [PATCH] OCFS2: The Second Oracle Cluster Filesystem
Disk based heartbeat. Configured and started from userspace, the
kernel component handles I/O submission and event generation via
callback mechanism.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
2006-01-03 11:45:46 -08:00
Kurt Hackel
0c83ed8eeb [PATCH] OCFS2: The Second Oracle Cluster Filesystem
A simple node information service, filled and updated from
userspace. The rest of the stack queries this service for simple node
information.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
2006-01-03 11:45:46 -08:00
Zach Brown
52fd3d6fea [PATCH] OCFS2: The Second Oracle Cluster Filesystem
Very simple printk wrapper which adds the ability to enable various
sets of debug messages at run-time.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
2006-01-03 11:45:45 -08:00
Zach Brown
994fc28c7b [PATCH] add AOP_TRUNCATED_PAGE, prepend AOP_ to WRITEPAGE_ACTIVATE
readpage(), prepare_write(), and commit_write() callers are updated to
understand the special return code AOP_TRUNCATED_PAGE in the style of
writepage() and WRITEPAGE_ACTIVATE.  AOP_TRUNCATED_PAGE tells the caller that
the callee has unlocked the page and that the operation should be tried again
with a new page.  OCFS2 uses this to detect and work around a lock inversion in
its aop methods.  There should be no change in behaviour for methods that don't
return AOP_TRUNCATED_PAGE.

WRITEPAGE_ACTIVATE is also prepended with AOP_ for consistency and they are
made enums so that kerneldoc can be used to document their semantics.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
2006-01-03 11:45:42 -08:00
Joel Becker
7063fbf226 [PATCH] configfs: User-driven configuration filesystem
Configfs, a file system for userspace-driven kernel object configuration.
The OCFS2 stack makes extensive use of this for propagation of cluster
configuration information into kernel.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2006-01-03 11:45:28 -08:00
Adrian Bunk
f4b09ebc8b update the email address of Randy Dunlap
This patch removes all references to the bouncing address
rddunlap@osdl.org and one dead web page from the kernel.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
2006-01-03 13:37:51 +01:00
Matt Mackall
4a4efbdee2 s/retreiv/retriev/g
As everyone knows, the rule is: "i before e.. um.. always."

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-01-03 13:27:11 +01:00
Adrian Bunk
7a1119b1fc fs/qnx4/bitmap.c: #if 0 qnx4_new_block()
qnx4_new_block() is neither implemented nor used.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Anders Larsen <al@alarsen.net>
2006-01-03 13:21:37 +01:00
Adrian Bunk
4d399cae3f remove pointers to the defunct UDF mailing list
This patch removes pointers to the defunct UDF mailing list.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-01-03 13:19:13 +01:00
Linus Torvalds
8b90db0df7 Insanity avoidance in /proc
The old /proc interfaces were never updated to use loff_t, and are just
generally broken.  Now, we should be using the seq_file interface for
all of the proc files, but converting the legacy functions is more work
than most people care for and has little upside..

But at least we can make the non-LFS rules explicit, rather than just
insanely wrapping the offset or something.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-30 08:39:10 -08:00
Paolo 'Blaisorblade' Giarrusso
30f04a4efa [PATCH] uml: hostfs - fix possible PAGE_CACHE_SHIFT overflows
Prevent page->index << PAGE_CACHE_SHIFT from overflowing.

There is a casting there, but was added without care, so it's at the wrong
place. Note the extra parens around the shift - "+" is higher precedence than
"<<", leading to a GCC warning which saved all us.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-29 09:48:15 -08:00
Paolo 'Blaisorblade' Giarrusso
3d0a07e331 [PATCH] Hostfs: remove unused var
Trivial removal of unused variable from this file - doesn't even change the
generated assembly code, in fact (gcc should trigger a warning for unused value
here).

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-29 09:48:15 -08:00
Adrian Bunk
0b57ee9e55 [SPARC]: introduce a SPARC Kconfig symbol
Introduce a Kconfig symbol SPARC that is defined on both the sparc and
sparc64 architectures.

This symbol makes some dependencies more readable.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-22 23:09:54 -08:00
ASANO Masahiro
0800c5f7a4 [PATCH] fix posix lock on NFS
NFS client prevents mandatory lock, but there is a flaw on it; Locks are
possibly left if the mode is changed while locking.

This permits unlocking even if the mandatory lock bits are set.

Signed-off-by: ASANO Masahiro <masano@tnes.nec.co.jp>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-22 09:24:05 -08:00
Tom Zanussi
fd30fc3256 [PATCH] relayfs: remove warning printk() in relay_switch_subbuf()
There's currently a diagnostic printk in relay_switch_subbuf() meant as
a warning if you accidentally try to log an event larger than the
sub-buffer size.

The problem is if this happens while logging from somewhere it's not
safe to be doing printks, such as in the scheduler, you can end up with
a deadlock.  This patch removes the warning from relay_switch_subbuf()
and instead prints some diagnostic info when the channel is closed.

Thanks to Mathieu Desnoyers for pointing out the problem and
suggesting a fix.

Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-20 17:33:22 -08:00
Andreas Gruenbacher
b7964c3d88 [PATCH] nfsd: check for read-only exports before setting acls
We must check for MAY_SATTR before setting acls, which includes checking
for read-only exports: the lower-level setxattr operation that
eventually sets the acl cannot check export-level restrictions.

Bug reported by Martin Walter <mawa@uni-freiburg.de>.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-20 10:31:33 -08:00
Trond Myklebust
9b5b1f5bf9 NLM: Fix Oops in nlmclnt_mark_reclaim()
When mixing -olock and -onolock mounts on the same client, we have to
 check that fl->fl_u.nfs_fl.owner is set before dereferencing it.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-12-19 23:12:31 -05:00
Trond Myklebust
29884df0d8 NFS: Fix another O_DIRECT race
Ensure we call unmap_mapping_range() and sync dirty pages to disk before
 doing an NFS direct write.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-12-19 23:12:09 -05:00
James Bottomley
2a1e1379ba Merge by hand (conflicts in scsi_lib.c)
This merge is pretty extensive.  The conflict is over the new
req->retries parameter, so I had to change the prototype to
scsi_setup_blk_pc_cmnd() and the usage in sd, sr and st.

Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2005-12-15 17:35:24 -06:00
Mike Christie
defd94b754 [SCSI] seperate max_sectors from max_hw_sectors
- export __blk_put_request and blk_execute_rq_nowait
needed for async REQ_BLOCK_PC requests
- seperate max_hw_sectors and max_sectors for block/scsi_ioctl.c and
SG_IO bio.c helpers per Jens's last comments. Since block/scsi_ioctl.c SG_IO was
already testing against max_sectors and SCSI-ml was setting max_sectors and
max_hw_sectors to the same value this does not change any scsi SG_IO behavior. It only
prepares ll_rw_blk.c, scsi_ioctl.c and bio.c for when SCSI-ml begins to set
a valid max_hw_sectors for all LLDs. Today if a LLD does not set it
SCSI-ml sets it to a safe default and some LLDs set it to a artificial low
value to overcome memory and feedback issues.

Note: Since we now cap max_sectors to BLK_DEF_MAX_SECTORS, which is 1024,
drivers that used to call blk_queue_max_sectors with a large value of
max_sectors will now see the fs requests capped to BLK_DEF_MAX_SECTORS.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2005-12-15 15:11:40 -08:00
Al Viro
51bfb75b0b [PATCH] xfs: missing gfp_t annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-15 10:04:29 -08:00
Mike Christie
6e68af666f [SCSI] Convert SCSI mid-layer to scsi_execute_async
Add scsi helpers to create really-large-requests and convert
scsi-ml to scsi_execute_async().

Per Jens's previous comments, I placed this function in scsi_lib.c.
I made it follow all the queue's limits - I think I did at least :), so
I removed the warning on the function header.

I think the scsi_execute_* functions should eventually take a request_queue
and be placed some place where the dm-multipath hw_handler can use them
if that failover code is going to stay in the kernel. That conversion
patch will be sent in another mail though.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2005-12-14 19:03:35 -08:00
Jeff Mahoney
2499604960 [PATCH] reiserfs: close open transactions on error path
The following patch fixes a bug where if the journal is aborted, it can
leave a transaction open.  The result will be a BUG when another code
path attempts to start a transaction and will get a "nesting into
different fs" error, since current->journal_info will be left non-NULL.

Original fix against SUSE kernel by Chris Mason <mason@suse.com>

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-14 18:56:08 -08:00
Jeff Mahoney
5d5e815618 [PATCH] reiserfs: skip commit on io error
This should have been part of the original io error patch, but got
dropped somewhere along the way.

It's extremely important when handling the i/o error in the journal to
not commit the transaction with corrupt data.  This patch adds that code
back in.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-14 18:56:07 -08:00
John McCutchan
8140a5005b [PATCH] inotify: add two inotify_add_watch flags
The below patch lets userspace have more control over the inodes that
inotify will watch.  It introduces two new flags.

        IN_ONLYDIR -- only watch the inode if it is a directory.
        This is needed to avoid the race that can occur when we want to be
        sure that we are watching a directory.

        IN_DONT_FOLLOW -- don't follow a symlink.  In combination
        with IN_ONLYDIR we can make sure that we don't watch the target of
        symlinks.

The issues the flags fix came up when writing the gnome-vfs inotify
backend.  Default behaviour is unchanged.

Signed-off-by: John McCutchan <ttb@tentacle.dhs.org>
Acked-by: Robert Love <rml@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12 08:57:43 -08:00
Daniel Drake
894ec8707c [PATCH] Fix listxattr() for generic security attributes
Commit f549d6c18c introduced a generic
fallback for security xattrs, but appears to include a subtle bug.

Gentoo users with kernels with selinux compiled in, and coreutils compiled
with acl support, noticed that they could not copy files on tmpfs using
'cp'.

cp (compiled with acl support) copies the file, lists the extended
attributes on the old file, copies them all to the new file, and then
exits.  However the listxattr() calls were failing with this odd behaviour:

llistxattr("a.out", (nil), 0)           = 17
llistxattr("a.out", 0x7fffff8c6cb0, 17) = -1 ERANGE (Numerical result out of
range)

I believe this is a simple problem in the logic used to check the buffer
sizes; if the user sends a buffer the exact size of the data, then its ok
:)

This change solves the problem.
More info can be found at http://bugs.gentoo.org/113138

Signed-off-by: Daniel Drake <dsd@gentoo.org>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12 08:57:42 -08:00
Trond Myklebust
3b6efee923 NFSv4: Fix an Oops in the synchronous write path
- Missing initialisation of attribute bitmask in _nfs4_proc_write()
 - On success, _nfs4_proc_write() must return number of bytes written.
 - Missing post_op_update_inode() in _nfs4_proc_write()
 - Missing initialisation of attribute bitmask in _nfs4_proc_commit()
 - Missing post_op_update_inode() in _nfs4_proc_commit()

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-12-03 15:20:21 -05:00
Trond Myklebust
5ba7cc4801 NFS: Fix post-op attribute revalidation...
- Missing nfs_mark_for_revalidate in nfs_proc_link()
  - Missing nfs_mark_for_revalidate in nfs_rename()

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-12-03 15:20:17 -05:00
Trond Myklebust
bb713d6d38 NFS: use set_page_writeback() in the appropriate places
Ensure that we use set_page_writeback() in the appropriate places
 to help the VM in keeping its page radix_tree in sync.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-12-03 15:20:14 -05:00
Trond Myklebust
24aa1fe677 NFS: Fix a few further cache consistency regressions
Steve Dickson writes:
 Doing the following:
 1. On server:
 $ mkdir ~/t
 $ echo Hello > ~/t/tmp

 2. On client, wait for a string to appear in this file:
 $ until grep -q foo t/tmp ; do echo -n . ; sleep 1 ; done

 3. On server, create a *new* file with the same name containing that
 string:
 $ mv ~/t/tmp ~/t/tmp.old; echo foo > ~/t/tmp

 will show how the client will never (and I mean never ;-) ) see
 the updated file.

 The problem is that we do not update nfsi->cache_change_attribute when the
 file changes on the server (we only update it when our client makes the
 changes). This again means that functions like nfs_check_verifier() will
 fail to register when the parent directory has changed and should trigger
 a dentry lookup revalidation.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-12-03 15:20:07 -05:00
Steve Dickson
223db122bf NFS: Fix cache consistency regression
Make sure cache_change_attribute is initialized to jiffies
 so when the mtime changes on directory, the directory
 will be refreshed.

 Signed-off by: Steve Dickson <steved@redhat.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-12-03 15:20:03 -05:00