361 Commits

Author SHA1 Message Date
Al Viro
463ffb2e9d [PATCH] namei fixes (9/19)
New helper: __follow_mount(struct path *path).  Same as follow_mount(), except
that we do *not* do mntput() after the first lookup_mnt().

IOW, original path->mnt stays pinned down.  We also take care to do dput()
before mntput() in the loop body (follow_mount() also needs that reordering,
but that will be done later in the series).

The following are equivalent, assuming that path.mnt == x:
(1)
	follow_mount(&path.mnt, &path.dentry)
(2)
	__follow_mount(&path);
	if (path.mnt != x)
		mntput(x);
(3)
	if (__follow_mount(&path))
		mntput(x);

Callers of follow_mount() in __link_path_walk() converted to (2).
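
The equivalence of (1) and (3) can be checked with a small user-space model
that tracks only mount reference counts.  This is a sketch, not the kernel
code: every demo_* name, the chain of stacked mounts, and the omission of
dentries are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vfsmount: a refcount plus the mount
 * stacked on top of this one (NULL if nothing is mounted here). */
struct demo_mnt {
	int count;
	struct demo_mnt *mounted_here;
};

struct demo_mnt *demo_mntget(struct demo_mnt *m)
{
	m->count++;
	return m;
}

void demo_mntput(struct demo_mnt *m)
{
	assert(m->count > 0);
	m->count--;
}

/* Models lookup_mnt(): returns the covering mount with a reference held,
 * or NULL when nothing is mounted on top. */
struct demo_mnt *demo_lookup_mnt(struct demo_mnt *m)
{
	return m->mounted_here ? demo_mntget(m->mounted_here) : NULL;
}

/* Variant (1): drops the reference on every mount it leaves. */
void demo_follow_mount(struct demo_mnt **mnt)
{
	struct demo_mnt *mounted;

	while ((mounted = demo_lookup_mnt(*mnt)) != NULL) {
		demo_mntput(*mnt);
		*mnt = mounted;
	}
}

/* Variant (3): the original mount stays pinned; returns 1 if we moved. */
int demo_follow_mount_pinned(struct demo_mnt **mnt)
{
	struct demo_mnt *mounted;
	int res = 0;

	while ((mounted = demo_lookup_mnt(*mnt)) != NULL) {
		if (res)
			demo_mntput(*mnt);	/* skipped on the first step */
		*mnt = mounted;
		res = 1;
	}
	return res;
}
```

Starting from the same counts, calling demo_follow_mount() alone or calling
demo_follow_mount_pinned() followed by a conditional demo_mntput() on the
original mount should leave every count identical.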

Equivalent transformation + fix for too-late-mntput() race in __follow_mount()
loop.

Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-06 14:42:25 -07:00
Al Viro
d671d5e514 [PATCH] namei fixes (8/19)
In open_namei() we never use path.mnt or path.dentry after exit: or ok:.
Assignment of path.dentry in case of LAST_BIND is dead code and only
obfuscates already convoluted function; assignment of path.mnt after
__do_follow_link() can be moved down to the place where we set path.dentry.

Obviously equivalent transformations, just to clean the air a bit in that
region.

Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-06 14:42:25 -07:00
Al Viro
cd4e91d3bc [PATCH] namei fixes (7/19)
The first argument of __do_follow_link() switched to struct path *
(__do_follow_link(path->dentry, ...) -> __do_follow_link(path, ...)).

All callers have the same calls of mntget() right before and dput()/mntput()
right after __do_follow_link(); these calls have been moved inside.

Obviously equivalent transformations.

Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-06 14:42:25 -07:00
Al Viro
839d9f93c9 [PATCH] namei fixes (6/19)
mntget(path->mnt) in do_follow_link() moved down to right before the
__do_follow_link() call and right after loop: resp.

dput()+mntput() on non-ELOOP branch moved up to right after __do_follow_link()
call.

resulting
loop:
	mntget(path->mnt);
	path_release(nd);
	dput(path->dentry);
	mntput(path->mnt);
replaced with equivalent
	dput(path->dentry);
	path_release(nd);

Equivalent transformations - the reason why we have that mntget() is that
__do_follow_link() can drop a reference to nd->mnt and that's what holds
path->mnt.  So that call can happen at any point prior to __do_follow_link()
touching nd->mnt.  The rest is obvious.

NOTE: current tree relies on symlinks *never* being mounted on anything.  It's
not hard to get rid of that assumption (actually, that will come for free
later in the series).  For now we are just not making the situation worse than
it is.

Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-06 14:42:25 -07:00
Al Viro
1be4a0900b [PATCH] namei fixes (5/19)
fix for too early mntput() in open_namei() - we pin path.mnt down for the
duration of __do_follow_link().  Otherwise we could get the fs where our
symlink lived unmounted while we were in __do_follow_link().  That would end
up with dentry of symlink staying pinned down through the fs shutdown.

Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-06 14:42:24 -07:00
Al Viro
d73ffe16b8 [PATCH] namei fixes (4/19)
path.mnt in open_namei() set to mirror nd->mnt.

nd->mnt is set in 3 places in that function - path_lookup() in the beginning,
__follow_down() loop after do_last: and __do_follow_link() call after
do_link:.

We set path.mnt to nd->mnt after path_lookup() and __do_follow_link().  In
__follow_down() loop we use &path.mnt instead of &nd->mnt and set nd->mnt to
path.mnt immediately after that loop.

Obviously equivalent transformation.

Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-06 14:42:24 -07:00
Al Viro
4e7506e4dd [PATCH] namei fixes (3/19)
Replaced struct dentry *dentry in namei with struct path path.  All uses of
dentry replaced with path.dentry there.

Obviously equivalent transformation.

Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-06 14:42:24 -07:00
Al Viro
5f92b3bcec [PATCH] namei fixes (2/19)
All callers of do_follow_link() do mntget() right before it and
dput()+mntput() right after.  These calls are moved inside do_follow_link()
now.

Obviously equivalent transformation.

Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-06 14:42:24 -07:00
Al Viro
90ebe5654f [PATCH] namei fixes
OK, here comes a patch series that hopefully should close all
too-early-mntput() races in fs/namei.c.  Entire area is convoluted as hell, so
I'm splitting that series into _very_ small chunks.

Patches already in the tree close only (very wide) races in following symlinks
(see "busy inodes after umount" thread some time ago).  Unfortunately, quite a
few narrower races of the same nature were not closed.  Hopefully this should
take care of all of them.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-06 14:42:24 -07:00
Steve French
0b68177ccd Merge with rsync://rsync.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git 2005-06-06 09:57:33 -07:00
Qu Fuping
854715be73 [PATCH] mpage_end_io_write() I/O error handling fix
When fsync() runs wait_on_page_writeback_range() it only inspects pages which
are actually under I/O (PAGECACHE_TAG_WRITEBACK).  If a page completed I/O
prior to wait_on_page_writeback_range() looking at it, it is supposed to have
recorded its I/O error state in the address_space.

But mpage_end_io_write() forgot to set the address_space error flag in
this case.

Signed-off-by: Qu Fuping <fs@ercist.iscas.ac.cn>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-04 17:12:59 -07:00
Dave Kleikamp
72e3148a6e JFS: Fix compiler warning in jfs_logmgr.c
fs/jfs/jfs_logmgr.c: In function `jfs_flush_journal':
fs/jfs/jfs_logmgr.c:1632: warning: unused variable `mp'

Some debug code in jfs_flush_journal does nothing when CONFIG_JFS_DEBUG
is not defined.  Place the whole code segment within an ifdef so that no
unnecessary code is compiled and the warning is no longer issued.
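
A minimal sketch of the pattern, assuming a hypothetical DEMO_DEBUG switch in
place of CONFIG_JFS_DEBUG and a made-up demo_count_dirty() helper (this is not
the jfs_logmgr.c code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* DEMO_DEBUG is a hypothetical stand-in for CONFIG_JFS_DEBUG. */
size_t demo_count_dirty(const int *pages, size_t n)
{
	size_t dirty = 0;
	size_t i;

	assert(pages != NULL);
	for (i = 0; i < n; i++)
		if (pages[i] != 0)
			dirty++;

#ifdef DEMO_DEBUG
	{
		/* Declared inside the ifdef, so non-debug builds never see
		 * an unused variable. */
		const int *mp;

		for (mp = pages; mp < pages + n; mp++)
			if (*mp != 0)
				fprintf(stderr, "page %ld still dirty\n",
					(long)(mp - pages));
	}
#endif
	return dirty;
}
```

Because both the declaration of mp and its only uses sit inside the same
ifdef, the non-debug build contains neither the loop nor the variable.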

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-06-03 14:09:54 -05:00
Steve French
d0d2f2df65 [CIFS] Update cifs version number and fix whitespace
Signed-off-by: Steve French (sfrench@us.ibm.com)
2005-06-02 15:12:36 -07:00
Jan Kara
7e3b11a9be [PATCH] ext3: fix list scanning in __cleanup_transaction
Fix a bug in list scanning that can cause us to skip the last buffer on the
checkpoint list (and hence fail to make any progress under some rather
unfavorable conditions).

The problem is we first do jh=next_jh and then test

	} while (jh!=last_jh);

Hence we skip the last buffer on the list (if it was not the only buffer on
the list).  Since we already do jh=next_jh; at the beginning of the loop, it
is safe to simply remove the assignment at the end.  It can happen that 'jh'
will be freed at the point we test jh != last_jh but that does not matter
as we never *dereference* the pointer.
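
The effect of the stray assignment can be reproduced on a plain circular list.
This is a user-space sketch; struct demo_node and the demo_* functions are
hypothetical and only mirror the loop shape described above.

```c
#include <assert.h>
#include <stddef.h>

struct demo_node {
	struct demo_node *next;
	struct demo_node *prev;
	int visited;
};

/* Link n nodes (n >= 1) into a circular doubly linked list. */
void demo_make_ring(struct demo_node *nodes, size_t n)
{
	size_t i;

	assert(n >= 1);
	for (i = 0; i < n; i++) {
		nodes[i].next = &nodes[(i + 1) % n];
		nodes[i].prev = &nodes[(i + n - 1) % n];
		nodes[i].visited = 0;
	}
}

/* Old loop shape: advances cur before the termination test. */
size_t demo_scan_buggy(struct demo_node *head)
{
	struct demo_node *last = head->prev;
	struct demo_node *next = head;
	struct demo_node *cur;
	size_t count = 0;

	do {
		cur = next;
		next = cur->next;
		cur->visited = 1;
		count++;
		cur = next;		/* the assignment the fix removes */
	} while (cur != last);
	return count;
}

/* Fixed loop shape: the test sees the node that was just processed. */
size_t demo_scan_fixed(struct demo_node *head)
{
	struct demo_node *last = head->prev;
	struct demo_node *next = head;
	struct demo_node *cur;
	size_t count = 0;

	do {
		cur = next;
		next = cur->next;
		cur->visited = 1;
		count++;
	} while (cur != last);
	return count;
}
```

On a three-node ring the old shape visits two nodes and the fixed shape visits
all three; on a single-node ring both visit the one node.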

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-02 15:12:29 -07:00
Jan Kara
00ea81459c [PATCH] ext3: fix log_do_checkpoint() assertion failure
Fix possible false assertion failure in log_do_checkpoint().  We might fail
to detect that we actually made progress when cleaning up the checkpoint
lists if we don't retry after writing something to disk.  The patch was
confirmed to fix observed assertion failures for several users.

When we flushed some buffers we need to retry scanning the list.
Otherwise we can fail to detect our progress.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-02 15:12:29 -07:00
Dave Kleikamp
c2731509cf JFS: kernel BUG at fs/jfs/jfs_txnmgr.c:859
add_missing_indices() must set tlck->type to tlckBTROOT when modifying
a btree root to avoid a trap in txRelease().

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-06-02 12:18:20 -05:00
Dave Kleikamp
7078253c08 Merge with /home/shaggy/git/linus-clean/
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-06-02 12:12:57 -05:00
Linus Torvalds
16a789c11d Automatic merge of rsync://rsync.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 2005-06-01 16:32:03 -07:00
Steve French
12725675e2 Merge with rsync://rsync.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git 2005-06-01 15:02:37 -07:00
Benjamin Herrenschmidt
5f64f73957 [PATCH] ppc32/ppc64: cleanup /proc/device-tree
This cleans up the /proc/device-tree representation of the Open Firmware
device-tree on ppc and ppc64.  It does the following things:

 - Work around an issue in some Apple device-trees where a property may
   exist with the same name as a child node of the parent.  We now
   simply "drop" the property instead of creating duplicate entries in
   /proc with random result...

 - Do not try to chop off the "@0" at the end of a node name whose unit
   address is 0.  This is not useful, inconsistent, and the code was
   buggy and didn't always work anyway.

 - Do not create symlinks for the short name and unit address parts of a
   node.  These were never really used, bloated the memory footprint of
   the device-tree with useless struct proc_dir_entry and their matching
   dentry and inode cache bloat.

This results in smaller code, smaller memory footprint, and a more
accurate view of the tree presented to userland.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-01 07:54:14 -07:00
Goffredo Baroncelli
e74d633dc5 [PATCH] UDF filesystem: array '__mon_yday' declared as not static
In fs/udf/udftime.c the global array '__mon_yday' is not static, and it
conflicts with the glibc one when the kernel is compiled as user mode.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-31 14:54:18 -07:00
Steve French
af6f5e3247 Merge with rsync://rsync.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git 2005-05-31 14:32:44 -07:00
Jeff Dike
a2e4b972c9 [PATCH] uml: remove 2_5compat.h
Remove old useless header that was used in Ye Olde Times during 2.4->2.5
porting to abstract differences.  Its definitions are no longer used anyway,
so let's finally kill it.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-28 16:46:11 -07:00
Christoph Hellwig
66f5507133 [XFS] remove an over-zealous WARN_ON 2005-05-27 01:17:08 -07:00
Christoph Hellwig
b19312c4c8 Merge with /pub/scm/linux/kernel/git/torvalds/linux-2.6.git 2005-05-27 01:16:24 -07:00
Vladimir Saveliev
f359b74c80 [PATCH] reiserfs: max_key fix
This patch fixes a bug introduced by Al Viro's patch: [patch 136/174]
reiserfs endianness: clone struct reiserfs_key

The problem is MAX_KEY and MAX_IN_CORE_KEY defined in this patch do not
look equal from reiserfs comp_key's point of view.  This caused reiserfs'
sanity check to complain.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-21 16:45:24 -07:00
Steve French
7e2987503d Merge with rsync://rsync.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git 2005-05-19 12:26:57 -07:00
Christoph Hellwig
f81a0bffa1 [AF_UNIX]: Use lookup_create().
Currently AF_UNIX open-codes it, but that's in the way of changing the
lookup_hash interface.

I'd prefer to disallow modular af_unix over exporting lookup_create,
but I'll leave that to you.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-05-19 12:26:43 -07:00
Stephen Tweedie
301216244b [PATCH] Avoid console spam with ext3 aborted journal.
Avoid console spam with ext3 aborted journal.

ext3 usually reports error conditions that it detects in its environment.
But when its journal gets aborted due to such errors, it can sometimes
continue to report that condition forever, spamming the console to such
an extent that the initial first cause of the journal abort can be lost.

When the journal aborts, we put the filesystem into readonly mode.  Most
subsequent filesystem operations will get rejected immediately by checks
for MS_RDONLY either in the filesystem or in the VFS.  But some paths do
not have such checks --- for example, if we continue to write to a file
handle that was opened before the fs went readonly.  (We only check for
the ROFS condition when the file is first opened.)  In these cases, we
can continue to generate log errors similar to

EXT3-fs error (device $DEV) in start_transaction: Journal has aborted

for each subsequent write.

There is really no point in generating these errors after the initial
error has been fully reported.  Specifically, if we're starting a
completely new filesystem operation, and the filesystem is *already*
readonly (ie. the ext3 layer has already detected and handled the
underlying jbd abort), and we see an EROFS error, then there is simply
no point in reporting it again.
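
The condition described above can be written as a small predicate.  This is a
sketch under stated assumptions: struct demo_sb, demo_should_report() and the
in_handle flag (standing for "a transaction is already running, so this is not
a new operation") are hypothetical and not the ext3 code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the superblock's readonly state. */
struct demo_sb {
	bool read_only;
};

/*
 * Returns false only when all three conditions from the text hold:
 * the error is EROFS, we are starting a new operation, and the
 * filesystem has already been switched to readonly.
 */
bool demo_should_report(const struct demo_sb *sb, int err, bool in_handle)
{
	assert(sb != NULL);
	if (err == -EROFS && !in_handle && sb->read_only)
		return false;
	return true;
}
```

Any other error, or an EROFS seen while the filesystem is still read-write,
is still reported, so the first cause of the abort is not suppressed.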

Signed-off-by: Stephen Tweedie <sct@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-18 09:10:02 -07:00
Steve French
b1a45695bd [CIFS] fix casts of unicode strings to match function definition
Signed-off-by: Steve French (sfrench@us.ibm.com)
2005-05-17 16:07:23 -05:00
Steve French
b2aeb9d565 [CIFS] Fix oops in cifs_unlink. Caused in some cases when renaming over existing,
newly created, file.

Samba bugzilla: 2697

Signed-off-by: Steve French (sfrench@us.ibm.com)
2005-05-17 13:16:18 -05:00
Steve French
67594feb4b [CIFS] missing break needed to handle < when mount option "mapchars" specified
Signed-off-by: Steve French (sfrench@us.ibm.com)
2005-05-17 13:04:49 -05:00
Andrew Morton
c64610ba58 [PATCH] block_read_full_page() get_block() error handling fix
If block_read_full_page() detects an error when running get_block() it will
run SetPageError(), then it will zero out the block in pagecache and will mark
the buffer_head uptodate.

So at the end of readahead we end up with a non-uptodate pagecache page which
is marked PageError.  But it has uptodate buffers.

The pagefault code will run ClearPageError, will launch readpage a second time
and block_read_full_page() will notice the uptodate buffers and will mark the
page uptodate as well.  We end up with an uptodate, !PageError page full of
zeros and the error is lost.

(It seems a little odd that filemap_nopage() runs ClearPageError().  I guess
all of this adds up to meaning that for each attempted access to the page, the
pagefault handler will retry the I/O.  Which is good and bad.  If the app is
ignoring SIGBUS for some reason we could get a lot of back-to-back I/O
errors.)

Fix it by not marking the pagecache buffer_head as uptodate if the attempt to
map that buffer to a disk block failed.
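
The lost error can be traced with a simplified model of the flags involved.
This is a sketch, not fs/buffer.c: struct demo_page, the demo_* functions and
the assumption that successful reads complete immediately are all
illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_BUFS 4

struct demo_page {
	bool error;
	bool uptodate;
	bool buf_uptodate[DEMO_BUFS];
};

/*
 * One pass of a simplified readpage.  map_fails[i] says whether mapping
 * block i to disk fails; mark_failed_uptodate selects the old behaviour.
 */
void demo_readpage(struct demo_page *page, const bool *map_fails,
		   bool mark_failed_uptodate)
{
	bool all_uptodate = true;
	size_t i;

	assert(page != NULL && map_fails != NULL);
	for (i = 0; i < DEMO_BUFS; i++) {
		if (page->buf_uptodate[i])
			continue;
		if (map_fails[i]) {
			page->error = true;
			/* the block is zeroed in pagecache at this point */
			if (mark_failed_uptodate)
				page->buf_uptodate[i] = true;
		} else {
			page->buf_uptodate[i] = true;
		}
		if (!page->buf_uptodate[i])
			all_uptodate = false;
	}
	if (all_uptodate && !page->error)
		page->uptodate = true;
}

/* Models the fault path: clear the error flag and read again. */
bool demo_fault_retry(struct demo_page *page, const bool *map_fails,
		      bool mark_failed_uptodate)
{
	page->error = false;
	demo_readpage(page, map_fails, mark_failed_uptodate);
	return page->uptodate;
}
```

With the old behaviour the retry finds only uptodate buffers and declares the
page uptodate with no error; with the fix the failed buffer is mapped again on
retry, so the error is raised again.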

Credit-to: Qu Fuping <fs@ercist.iscas.ac.cn>

  For reporting the bug and identifying its source.

Signed-off-by: Qu Fuping <fs@ercist.iscas.ac.cn>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-17 07:59:20 -07:00
Hugh Dickins
64d13c00cf [PATCH] fix impossible VmallocChunk
VmallocTotal: 34359738367 kB
VmallocUsed:    266288 kB
VmallocChunk: 18014366299193295 kB
is unsettling - x86_64 and some other architectures keep a separate address
range for modules in vmalloc's vmlist, which /proc/meminfo should pass over.
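
A sketch of how an area outside the vmalloc range can produce such a number.
The demo_* names, the address values and the skip_outside switch are
assumptions for illustration; the real list walk lives in the /proc/meminfo
code.

```c
#include <assert.h>
#include <stddef.h>

struct demo_area {
	unsigned long addr;
	unsigned long size;
};

/*
 * Largest free gap in [start, end) given areas sorted by address.
 * With skip_outside == 0, areas outside the range are counted too,
 * which models the behaviour before the fix.
 */
unsigned long demo_largest_chunk(const struct demo_area *areas, size_t n,
				 unsigned long start, unsigned long end,
				 int skip_outside)
{
	unsigned long prev_end = start;
	unsigned long largest = 0;
	size_t i;

	assert(start < end);
	for (i = 0; i < n; i++) {
		unsigned long addr = areas[i].addr;
		unsigned long free_size;

		if (skip_outside) {
			if (addr < start)
				continue;
			if (addr >= end)
				break;
		}
		free_size = addr - prev_end;
		if (largest < free_size)
			largest = free_size;
		prev_end = addr + areas[i].size;
	}
	/* Wraps around when prev_end lies beyond end. */
	if (end - prev_end > largest)
		largest = end - prev_end;
	return largest;
}
```

When the last area lies beyond end, the final unsigned subtraction wraps and
the reported chunk exceeds the size of the whole range; passing over such
areas keeps the result within bounds.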

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-17 07:59:10 -07:00
Greg Kroah-Hartman
a84a505956 [PATCH] fix Linux kernel ELF core dump privilege elevation
As reported by Paul Starzetz <ihaquer@isec.pl>

Reference: CAN-2005-1263

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-05-16 21:07:05 -07:00
Jesper Juhl
259692bd5a JFS: Remove redundant kfree() NULL pointer checks
kfree() can handle a NULL pointer, don't worry about passing it one. 
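
A user-space analogue of the cleanup, using free(), which the C standard
defines as a no-op for a null pointer.  struct demo_ctx and
demo_ctx_release() are hypothetical names, not JFS code.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_ctx {
	char *name;
	int *table;
};

void demo_ctx_release(struct demo_ctx *ctx)
{
	assert(ctx != NULL);
	/* No "if (ctx->name)" guard needed: free(NULL) does nothing. */
	free(ctx->name);
	free(ctx->table);
	ctx->name = NULL;
	ctx->table = NULL;
}
```

Resetting the members to NULL afterwards makes a second call to
demo_ctx_release() harmless as well.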

Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2005-05-09 10:47:14 -05:00
Andrew Morton
b2411dd202 [PATCH] revert msdos partitioning fix
This change from March 3rd causes the partition parsing code to ignore
partitions which have a signature byte of zero.  Turns out that more people
have such partitions than we expected, and their device numbering is coming up
wrong in post-2.6.11 kernels.

So revert the change while we think about the problem a bit more.

Cc: Andries Brouwer <Andries.Brouwer@cwi.nl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-06 22:09:27 -07:00
Nathan Scott
d3870398fa [XFS] Fix directory inodes ioctl compat code, minor code consistency cleanups
SGI Modid: xfs-linux:xfs-kern:21810a

Signed-off-by: Nathan Scott <nathans@sgi.com>
Signed-off-by: Christoph Hellwig <hch@sgi.com>
2005-05-06 06:44:46 -07:00
Russell Cattelan
68d1498c3a [XFS] Fix a bug in xfs_iomap for extent handling of write cases
This may be the cause of several open PV's of incorrect
delay flags being set and then tripping asserts.
Do not return a delay alloc extent when the caller is asking to do a write.

SGI Modid: xfs-linux:xfs-kern:189616a

Signed-off-by: Russell Cattelan <cattelan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@sgi.com>
2005-05-06 06:42:22 -07:00
Adrian Bunk
f59154c53f [PATCH] fs/udf/udftime.c: fix off by one error
This patch fixes an off by one error found by the Coverity checker.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-05 16:36:51 -07:00
Paolo 'Blaisorblade' Giarrusso
3677209239 [PATCH] comments on locking of task->comm
Add some comments about task->comm, to explain what it is near its definition
and provide some important pointers to its uses.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-05 16:36:48 -07:00
Randy.Dunlap
291c4a75ce [PATCH] reiserfs: use NULL instead of 0
Use NULL instead of 0 for pointer (sparse warning):
fs/reiserfs/namei.c:611:50: warning: Using plain integer as NULL pointer

Signed-off-by: Randy Dunlap <rddunlap@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-05 16:36:48 -07:00
Adrian Bunk
75c96f8584 [PATCH] make some things static
This patch makes some needlessly global identifiers static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Arjan van de Ven <arjanv@infradead.org>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-05 16:36:47 -07:00
Andrew Morton
d17d7fa44d [PATCH] revert ext3-writepages-support-for-writeback-mode
This had a fatal lock ranking bug: we do journal_start outside
mpage_writepages()'s lock_page().

Revert the whole thing, think again.

Credit-to: Jan Kara <jack@suse.cz>

For identifying the bug.

Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-05 16:36:44 -07:00
Christoph Hellwig
2ef41634de [PATCH] remove do_sync parameter from __invalidate_device
The only caller that ever sets it can call fsync_bdev itself easily.  Also
update some comments.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-05 16:36:44 -07:00
Adrian Bunk
dfc1e14854 [PATCH] remove BK documentation
There's no longer a reason to document the obsolete BK usage.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-05 16:36:42 -07:00
Andrew Morton
f0fbd5fc09 [PATCH] __block_write_full_page() simplification
The `last_bh' logic probably isn't worth much.  In those situations where only
the front part of the page is being written out we will save some looping but
in the vastly more common case of an all-page writeout it just adds more code.

Nick Piggin <nickpiggin@yahoo.com.au>

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-05 16:36:41 -07:00
Andrew Morton
05937baae9 [PATCH] __block_write_full_page speedup
Remove all those get_bh()'s and put_bh()'s by extending lock_page() to cover
the troublesome regions.

(get_bh() and put_bh() happen every time whereas contention on a page's lock
in there happens basically never).

Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-05 16:36:41 -07:00
Nick Piggin
ad576e63e0 [PATCH] __block_write_full_page race fix
When running
	fsstress -v -d $DIR/tmp -n 1000 -p 1000 -l 2
on an ext2 filesystem with 1024 byte block size, on SMP i386 with 4096 byte
page size over loopback to an image file on a tmpfs filesystem, I would
very quickly hit
	BUG_ON(!buffer_async_write(bh));
in fs/buffer.c:end_buffer_async_write

It seems that more than one request would be submitted for a given bh
at a time.

What would happen is the following:
2 threads doing __mpage_writepages on the same page.
Thread 1 - lock the page first, and enter __block_write_full_page.
Thread 1 - (eg.) mark_buffer_async_write on the first 2 buffers.
Thread 1 - set page writeback, unlock page.
Thread 2 - lock page, wait on page writeback
Thread 1 - submit_bh on the first 2 buffers.
=> both requests complete, none of the page buffers are async_write,
   end_page_writeback is called.
Thread 2 - wakes up. enters __block_write_full_page.
Thread 2 - mark_buffer_async_write on (eg.) the last buffer
Thread 1 - finds the last buffer has async_write set, submit_bh on that.
Thread 2 - submit_bh on the last buffer.
=> oops.

So change __block_write_full_page to explicitly keep track of the last bh
we need to issue, so we don't touch anything after issuing the last
request.

Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-05 16:36:40 -07:00
Nick Piggin
f3ddbdc626 [PATCH] fix race in __block_prepare_write
Fix a race where __block_prepare_write can leak out an in-flight read
against a bh if get_block returns an error.  This can lead to the page
becoming unlocked while the buffer is locked and the read still in flight.
__mpage_writepage BUGs on this condition.

BUG sighted on a 2-way Itanium2 system with 16K PAGE_SIZE running

	fsstress -v -d $DIR/tmp -n 1000 -p 1000 -l 2

where $DIR is a new ext2 filesystem with 4K blocks that is quite
small (causing get_block to fail often with -ENOSPC).

Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-05 16:36:40 -07:00