Tracepoints can provide insight into why btrfs hits bugs and be greatly
helpful for debugging, e.g.:
dd-7822 [000] 2121.641088: btrfs_inode_request: root = 5(FS_TREE), gen = 4, ino = 256, blocks = 8, disk_i_size = 0, last_trans = 8, logged_trans = 0
dd-7822 [000] 2121.641100: btrfs_inode_new: root = 5(FS_TREE), gen = 8, ino = 257, blocks = 0, disk_i_size = 0, last_trans = 0, logged_trans = 0
btrfs-transacti-7804 [001] 2146.935420: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29368320 (orig_level = 0), cow_buf = 29388800 (cow_level = 0)
btrfs-transacti-7804 [001] 2146.935473: btrfs_cow_block: root = 1(ROOT_TREE), refs = 2, orig_buf = 29364224 (orig_level = 0), cow_buf = 29392896 (cow_level = 0)
btrfs-transacti-7804 [001] 2146.972221: btrfs_transaction_commit: root = 1(ROOT_TREE), gen = 8
flush-btrfs-2-7821 [001] 2155.824210: btrfs_chunk_alloc: root = 3(CHUNK_TREE), offset = 1103101952, size = 1073741824, num_stripes = 1, sub_stripes = 0, type = DATA
flush-btrfs-2-7821 [001] 2155.824241: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29388800 (orig_level = 0), cow_buf = 29396992 (cow_level = 0)
flush-btrfs-2-7821 [001] 2155.824255: btrfs_cow_block: root = 4(DEV_TREE), refs = 2, orig_buf = 29372416 (orig_level = 0), cow_buf = 29401088 (cow_level = 0)
flush-btrfs-2-7821 [000] 2155.824329: btrfs_cow_block: root = 3(CHUNK_TREE), refs = 2, orig_buf = 20971520 (orig_level = 0), cow_buf = 20975616 (cow_level = 0)
btrfs-endio-wri-7800 [001] 2155.898019: btrfs_cow_block: root = 5(FS_TREE), refs = 2, orig_buf = 29384704 (orig_level = 0), cow_buf = 29405184 (cow_level = 0)
btrfs-endio-wri-7800 [001] 2155.898043: btrfs_cow_block: root = 7(CSUM_TREE), refs = 2, orig_buf = 29376512 (orig_level = 0), cow_buf = 29409280 (cow_level = 0)
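Trace lines in this format are regular enough to post-process with a short script. The following is a minimal sketch, assuming every line has the layout `task-pid [cpu] timestamp: event: key = value, ...` seen in the sample above. `parse_trace_line` is a hypothetical helper written for illustration and is not part of any tracing tool.

```python
import re

# Layout assumed from the sample output:
#   "<task>-<pid> [<cpu>] <timestamp>: <event>: <key> = <value>, ..."
LINE_RE = re.compile(
    r"^(?P<task>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):\s+(?P<fields>.*)$"
)
# A value runs up to the next comma; a trailing "(...)" group such as
# "(FS_TREE)" or "(orig_level = 0)" is kept as part of the value.
FIELD_RE = re.compile(r"(\w+)\s*=\s*([^,()]+(?:\([^)]*\))?)")


def parse_trace_line(line):
    """Split one trace line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    fields = {k: v.strip() for k, v in FIELD_RE.findall(m.group("fields"))}
    return {
        "task": m.group("task"),
        "pid": int(m.group("pid")),
        "cpu": int(m.group("cpu")),
        "timestamp": float(m.group("ts")),
        "event": m.group("event"),
        "fields": fields,  # values are kept as strings
    }
```

Grouping the parsed records by `event` or by `fields["root"]` is then a matter of ordinary dictionary handling.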
Here is what I have added:
1) ordered_extent:
btrfs_ordered_extent_add
btrfs_ordered_extent_remove
btrfs_ordered_extent_start
btrfs_ordered_extent_put
These provide critical information to understand how ordered_extents are
updated.
2) extent_map:
btrfs_get_extent
extent_map is used in both read and write cases, and it is useful for tracking
how btrfs specific IO is running.
3) writepage:
__extent_writepage
btrfs_writepage_end_io_hook
Pages are critical resources and produce a lot of corner cases during writeback,
so it is valuable to know how a page is written to disk.
4) inode:
btrfs_inode_new
btrfs_inode_request
btrfs_inode_evict
These show where and when an inode is created and when an inode is evicted.
5) sync:
btrfs_sync_file
btrfs_sync_fs
These show sync arguments.
6) transaction:
btrfs_transaction_commit
In a transaction-based filesystem, it is useful to know the generation and
who performs the commit.
7) back reference and cow:
btrfs_delayed_tree_ref
btrfs_delayed_data_ref
btrfs_delayed_ref_head
btrfs_cow_block
Btrfs natively supports back references; these tracepoints help in
understanding btrfs's COW mechanism.
8) chunk:
btrfs_chunk_alloc
btrfs_chunk_free
A chunk is a link between a physical offset and a logical offset and represents
space information in btrfs; these tracepoints help in tracing space usage.
9) reserved_extent:
btrfs_reserved_extent_alloc
btrfs_reserved_extent_free
These can show how btrfs uses its space.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The pointer to the extent buffer for the root of each tree
is protected by a spinlock so that we can safely read the pointer
and take a reference on the extent buffer.
But now that the extent buffers are freed via RCU, we can safely
use rcu_read_lock instead.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
A requested key is write locked in order to prevent modifications on the
authentication token while it is being used.
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
The ecryptfs_find_auth_tok_for_sig() call is moved before the
mutex_lock(s->tfm_mutex) instruction in order to avoid possible deadlocks
that may occur by holding the lock on the two semaphores 'key->sem' and
's->tfm_mutex' in reverse order.
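The fix relies on every path taking the two locks in the same order. A minimal user-space sketch of that rule follows, with `threading.Lock` objects standing in for 'key->sem' and 's->tfm_mutex'. The function names are hypothetical and the bodies are placeholders, not the eCryptfs code.

```python
import threading

key_sem = threading.Lock()    # stands in for key->sem
tfm_mutex = threading.Lock()  # stands in for s->tfm_mutex

acquired = []  # records the order in which locks were taken


def find_auth_tok_then_use_tfm():
    # The token lookup (which takes key_sem) happens BEFORE tfm_mutex is
    # taken, so every caller uses the single order key_sem -> tfm_mutex.
    with key_sem:
        acquired.append("key_sem")
        with tfm_mutex:
            acquired.append("tfm_mutex")
            return "done"


def run_two_paths():
    """Run two concurrent callers; return True if both finished."""
    threads = [threading.Thread(target=find_auth_tok_then_use_tfm) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    return all(not t.is_alive() for t in threads)
```

If one of the two callers took `tfm_mutex` first and `key_sem` second, each thread could end up holding one lock while waiting for the other.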
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Authentication tokens content may change if another requestor calls the
update() method of the corresponding key. The new function
ecryptfs_verify_auth_tok_from_key() retrieves the authentication token from
the provided key and verifies if it is still valid before being used to
encrypt or decrypt an eCryptfs file.
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
[tyhicks: Minor formatting changes]
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
The size of the 'keysig' array is increased by one byte in order to make
room for the NUL terminator. The 'keysig' variable is used in the function
ecryptfs_generate_key_packet_set() to find an authentication token with
the given signature, and it is printed in a debug message if the token
cannot be retrieved.
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
This patch removes the 'num_global_auth_toks' field of the
ecryptfs_mount_crypt_stat structure, used to count the number of items in
the 'global_auth_tok_list' list. This variable is not needed because there
are no checks based upon it.
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
The pointer '(*auth_tok_key)' is set to NULL in case request_key()
fails, in order to prevent its use by functions calling
ecryptfs_keyring_auth_tok_for_sig().
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Cc: <stable@kernel.org>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Unlock the page in the error path of ecryptfs_write_begin(). This may
happen, for example, if decryption fails while bringing the page
up-to-date.
Cc: <stable@kernel.org>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Change the write path to encrypt the data only when the page is written to
disk in ecryptfs_writepage. Previously, ecryptfs encrypted the page in
ecryptfs_write_end, which meant that multiple write requests to the same
page caused ecryptfs to re-encrypt that page over and over again.
This patch minimizes the number of encryptions needed.
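The effect of deferring encryption to writeback can be shown with a toy model. This is a sketch only: `Page`, `LowerFile`, `write_end` and `writepage` are hypothetical stand-ins, and the XOR "cipher" is a placeholder rather than real cryptography.

```python
class Page:
    """A cached page holding plaintext until it is written back."""

    def __init__(self):
        self.data = b""
        self.dirty = False


class LowerFile:
    """Toy lower file that counts how often a page is encrypted."""

    def __init__(self):
        self.encrypt_calls = 0
        self.stored = b""

    def encrypt(self, plaintext):
        self.encrypt_calls += 1
        return bytes(b ^ 0x5A for b in plaintext)  # placeholder, NOT real crypto


def write_end(page, data):
    # After the change: only update the cached page and mark it dirty.
    page.data = data
    page.dirty = True


def writepage(page, lower):
    # Encryption happens once, when the dirty page is written out.
    if page.dirty:
        lower.stored = lower.encrypt(page.data)
        page.dirty = False
```

Three calls to `write_end` on the same page followed by one `writepage` result in a single `encrypt` call, whereas encrypting inside `write_end` would have cost three.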
Signed-off-by: Thieu Le <thieule@chromium.org>
[tyhicks: Changed NULL .drop_inode sop pointer to generic_drop_inode]
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Now that grow_file() is not called in the ecryptfs_create() path, the
ECRYPTFS_NEW_FILE flag is no longer needed. It helped
ecryptfs_readpage() know not to decrypt zeroes that were read from the
lower file in the grow_file() path.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
When creating a new eCryptfs file, the crypto metadata is written out
and then the lower file was being "grown" with 4 kB of encrypted zeroes.
I suspect that growing the encrypted file was to prevent an information
leak that the unencrypted file was empty. However, the unencrypted file
size is stored, in plaintext, in the metadata so growing the file is
unnecessary.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Fix inode.c kernel-doc fatal error: 2 comment sections have the same name:
Error(fs/inode.c:1171): duplicate section name 'Note'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When m_start returns an error, the seq_file logic will still call m_stop
with that error entry, so we'd better make sure that we check it before
using it as a vma.
Introduced by commit ec6fd8a4355c ("report errors in /proc/*/*map*
sanely"), which replaced NULL with various ERR_PTR() cases.
(On ia64, you happen to get an unaligned fault instead of a page fault,
since the address used is generally some random error code like -EPERM)
Reported-by: Anca Emanuel <anca.emanuel@gmail.com>
Reported-by: Tony Luck <tony.luck@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Américo Wang <xiyou.wangcong@gmail.com>
Cc: Stephen Wilson <wilsons@start.ca>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that the inode scalability patches have been merged, it is no longer
safe to call igrab() under the inode->i_lock.
Now that we no longer call nfs_clear_request() until the nfs_page is
being freed, we know that we are always holding a reference to the
nfs_open_context, which again holds a reference to the path, and so
the inode cannot be freed until the last nfs_page has been removed
from the radix tree and freed.
We can therefore skip the igrab()/iput() altogether.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
I noticed that dio_end_io calls the appropriate endio function with an error,
but the endio functions don't actually do anything with that error; they assume
that if there was an error then the bio will not be uptodate. As a result, if
we had checksum failures we would never pass back EIO. So if there is an error
in our endio functions, make sure to clear the uptodate flag on the bio. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
When doing direct writes we store the checksums in the ordered sum stuff in the
ordered extent for writing them when the write completes, so we don't even use
the dip->csums array. So if we're writing, don't bother allocating dip->csums
since we won't use it anyway. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
This patch makes the free space cluster refilling code a little easier to
understand, and fixes some things with the bitmap part of it. Currently we
either want to refill a cluster with
1) All normal extent entries (those without bitmaps)
2) A bitmap entry with enough space
The current code has this ugly jump around logic that will first try and fill up
the cluster with extent entries and then if it can't do that it will try and
find a bitmap to use. So instead split this out into two functions, one that
tries to find only normal entries, and one that tries to find bitmaps.
This also fixes a suboptimal thing we would do with bitmaps. If we used a
bitmap we would just tell the cluster that we were pointing at a bitmap and it
would do the tree search in the block group for that entry every time we tried
to make an allocation. Instead of doing that now we just add it to the clusters
group.
I tested this with my ENOSPC tests and xfstests and it survived.
Signed-off-by: Josef Bacik <josef@redhat.com>
Now that the buffer cache has its own LRU, we do not need to use
the page cache to provide persistent caching and reclaim
infrastructure. Convert the buffer cache to use alloc_pages()
instead of the page cache. This will remove all the overhead of page
cache management from setup and teardown of the buffers, as well as
needing to mark pages accessed as we find buffers in the buffer
cache.
By avoiding the page cache, we also remove the need to keep state in
the page_private(page) field for persistent storage across buffer
free/buffer rebuild and so all that code can be removed. This also
fixes the long-standing problem of not having enough bits in the
page_private field to track all the state needed for a 512
sector/64k page setup.
It also removes the need for page locking during reads as the pages
are unique to the buffer and nobody else will be attempting to
access them.
Finally, it removes the buftarg address space lock as a point of
global contention on workloads that allocate and free buffers
quickly such as when creating or removing large numbers of inodes in
parallel. This removes the 16TB limit on filesystem size on 32 bit
machines as the page index (32 bit) is no longer used for lookups
of metadata buffers - the buffer cache is now solely indexed by disk
address which is stored in a 64 bit field in the buffer.
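The 16TB figure follows from the width of the page index. A worked check, assuming a 4 KiB page size (the page size is an assumption here, it is not stated in the message):

```python
PAGE_SIZE = 4096   # assumed 4 KiB pages
INDEX_BITS = 32    # width of the page index on a 32 bit machine

# Largest byte range addressable through a 32 bit page index.
max_bytes = (1 << INDEX_BITS) * PAGE_SIZE
max_tib = max_bytes // (1 << 40)

# A 64 bit disk address has no such ceiling in practice.
limit_removed = (1 << 64) > max_bytes
```

With these values `max_tib` comes out as 16, matching the limit described above.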
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
During mount, we can do a quotacheck that involves a bulkstat pass
on all inodes. If there are more inodes in the filesystem than can
be held in memory, we require the inode cache shrinker to run to
ensure that we don't run out of memory.
Unfortunately, the inode cache shrinker is not registered until we
get to the end of the superblock setup process, which is after a
quotacheck is run if it is needed. Hence we need to register the
inode cache shrinker earlier in the mount process so that we don't
OOM during mount. This requires that we also initialise the syncd
work before we register the shrinker, so we need to juggle that
around as well.
While there, make sure that we have set up the block sizes in the
VFS superblock correctly before the quotacheck is run so that any
inodes that are cached as a result of the quotacheck have their
block size fields set up correctly.
Cc: stable@kernel.org
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
When inside a transaction and we fail to read a buffer,
xfs_trans_read_buf returns a null buffer pointer and no error.
xfs_do_da_buf() checks the error return, but not the buffer, and as
a result this read failure condition causes a panic when it attempts
to dereference the non-existent buffer.
Make xfs_trans_read_buf() return the same error for this situation
regardless of whether it is in a transaction or not. This means
every caller does not need to check both the error return and the
buffer before proceeding to use the buffer.
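The contract described here, where a failed read always reports an error so that callers test only one thing, can be sketched as follows. `read_buf`, `caller` and the `READ_ERROR` value are hypothetical; the message above does not say which error code is returned.

```python
import errno

READ_ERROR = errno.EIO  # placeholder; the real error code is not given above


def read_buf(lookup, in_transaction):
    """Return (error, buf). A failed read always reports an error.

    in_transaction is kept only to show that both call paths behave alike.
    """
    buf = lookup()
    if buf is None:
        # Same result with or without a transaction, so callers only
        # need to test the error return.
        return READ_ERROR, None
    return 0, buf


def caller(lookup, in_transaction):
    error, buf = read_buf(lookup, in_transaction)
    if error:
        return -error
    return len(buf)  # safe: buf is never None when error == 0
```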
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
There is an ABBA deadlock between synchronous inode flushing in
xfs_reclaim_inode and xfs_icluster_free. xfs_icluster_free locks the
buffer, then takes inode ilocks, whilst synchronous reclaim takes
the ilock followed by the buffer lock in xfs_iflush().
To avoid this deadlock, separate the inode cluster buffer locking
semantics from the synchronous inode flush semantics, allowing
callers to attempt to lock the buffer but still issue synchronous IO
if they can get the buffer. This requires xfs_iflush() calls that
currently use non-blocking semantics to pass SYNC_TRYLOCK rather
than 0 as the flags parameter.
This allows xfs_reclaim_inode to avoid the deadlock on the buffer
lock and detect the failure so that it can drop the inode ilock and
restart the reclaim attempt on the inode. This allows
xfs_ifree_cluster to obtain the inode lock, mark the inode stale and
release it and hence defuse the deadlock situation. It also has the
pleasant side effect of avoiding IO in xfs_reclaim_inode when it
tries to next reclaim the inode as it is now marked stale.
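The back-off pattern described above can be sketched in user space. This is an illustration under stated assumptions, not the XFS code: `reclaim_inode` is a hypothetical function, and `threading.Lock` objects stand in for the inode ilock and the cluster buffer lock.

```python
import threading


def reclaim_inode(ilock, buffer_lock, flush):
    """Return True if the inode was flushed, False if the caller should retry."""
    with ilock:
        # Trylock-style attempt: never block on the buffer lock while
        # holding the inode lock, which is what made the ABBA deadlock possible.
        if not buffer_lock.acquire(blocking=False):
            return False  # ilock is dropped on exit; the caller restarts reclaim
        try:
            flush()  # synchronous IO is still issued once the buffer is held
            return True
        finally:
            buffer_lock.release()
```

While the buffer lock is held elsewhere, `reclaim_inode` returns `False` without holding the inode lock any longer, which gives the other path the chance to take the inode lock and finish.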
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
On 32 bit systems, vmalloc space is limited and XFS can chew through
it quickly as the vmalloc space is lazily freed. This can result in
failure to map buffers, even when there is apparently large amounts
of vmalloc space available. Hence, if we fail to map a buffer, purge
the aliases that have not yet been freed to hopefully free up enough
vmalloc space to allow a retry to succeed.
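A toy model of the purge-and-retry idea follows. `LazyArena` and `map_buffer` are hypothetical names; the arena only imitates an address space whose frees are deferred, it does not model vmalloc itself.

```python
class LazyArena:
    """Toy address space whose frees are deferred until purge()."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_use = 0
        self.lazily_freed = 0

    def map(self, size):
        # Lazily freed space still counts against the capacity.
        if self.in_use + self.lazily_freed + size > self.capacity:
            return False
        self.in_use += size
        return True

    def unmap(self, size):
        self.in_use -= size
        self.lazily_freed += size  # not reusable until purged

    def purge(self):
        self.lazily_freed = 0


def map_buffer(arena, size):
    """Try to map; on failure purge lazily freed space and retry once."""
    if arena.map(size):
        return True
    arena.purge()
    return arena.map(size)
```

A single retry keeps the failure path bounded: if the mapping still fails after the purge, the space really is exhausted.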
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Preallocation and hole punch transactions are currently synchronous
and this is causing performance problems in some cases. The
transactions don't need to be synchronous as we don't need to
guarantee the preallocation is persistent on disk until an
fdatasync, fsync or sync operation occurs. If the file is opened
O_SYNC or O_DATASYNC, only then should the transaction be issued
synchronously.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
The release method for mds connections uses a backpointer to the
mds_client, so we need to flush the workqueue of any pending work (and
ceph_connection references) prior to freeing the mds_client. This fixes
an oops easily triggered under UML by
while true ; do mount ... ; umount ... ; done
Also fix an outdated comment: the flush in ceph_destroy_client only flushes
OSD connections out. This bug is basically an artifact of the ceph ->
ceph+libceph conversion.
Signed-off-by: Sage Weil <sage@newdream.net>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (43 commits)
ext4: fix a BUG in mb_mark_used during trim.
ext4: unused variables cleanup in fs/ext4/extents.c
ext4: remove redundant set_buffer_mapped() in ext4_da_get_block_prep()
ext4: add more tracepoints and use dev_t in the trace buffer
ext4: don't kfree uninitialized s_group_info members
ext4: add missing space in printk's in __ext4_grp_locked_error()
ext4: add FITRIM to compat_ioctl.
ext4: handle errors in ext4_clear_blocks()
ext4: unify the ext4_handle_release_buffer() api
ext4: handle errors in ext4_rename
jbd2: add COW fields to struct jbd2_journal_handle
jbd2: add the b_cow_tid field to journal_head struct
ext4: Initialize fsync transaction ids in ext4_new_inode()
ext4: Use single thread to perform DIO unwritten convertion
ext4: optimize ext4_bio_write_page() when no extent conversion is needed
ext4: skip orphan cleanup if fs has unknown ROCOMPAT features
ext4: use the nblocks arg to ext4_truncate_restart_trans()
ext4: fix missing iput of root inode for some mount error paths
ext4: make FIEMAP and delayed allocation play well together
ext4: suppress verbose debugging information if malloc-debug is off
...
Fix up conflicts in fs/ext4/super.c due to workqueue changes
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6: (9356 commits)
[media] rc: update for bitop name changes
fs: simplify iget & friends
fs: pull inode->i_lock up out of writeback_single_inode
fs: rename inode_lock to inode_hash_lock
fs: move i_wb_list out from under inode_lock
fs: move i_sb_list out from under inode_lock
fs: remove inode_lock from iput_final and prune_icache
fs: Lock the inode LRU list separately
fs: factor inode disposal
fs: protect inode->i_state with inode->i_lock
lib, arch: add filter argument to show_mem and fix private implementations
SLUB: Write to per cpu data when allocating it
slub: Fix debugobjects with lockless fastpath
autofs4: Do not potentially dereference NULL pointer returned by fget() in autofs_dev_ioctl_setpipefd()
autofs4 - remove autofs4_lock
autofs4 - fix d_manage() return on rcu-walk
autofs4 - fix autofs4_expire_indirect() traversal
autofs4 - fix dentry leak in autofs4_expire_direct()
autofs4 - reinstate last used update on access
vfs - check non-mountpoint dentry might block in __follow_mount_rcu()
...
NOTE!
This merge commit was created to fix a compilation error. The block
tree was merged upstream and removed the 'elv_queue_empty()'
function which the new 'mtdswap' driver is using. So a simple
merge of the mtd tree with upstream does not compile. And the
mtd tree has already been published, so re-basing it is not an option.
To fix this unfortunate situation, I had to merge upstream into the
mtd-2.6.git tree without committing, put the fixup patch on top of
this, and then commit this. The result is that we do not have commits
which do not compile.
In other words, this merge commit "merges" 3 things: the MTD tree, the
upstream tree, and the fixup patch.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
fs: simplify iget & friends
fs: pull inode->i_lock up out of writeback_single_inode
fs: rename inode_lock to inode_hash_lock
fs: move i_wb_list out from under inode_lock
fs: move i_sb_list out from under inode_lock
fs: remove inode_lock from iput_final and prune_icache
fs: Lock the inode LRU list separately
fs: factor inode disposal
fs: protect inode->i_state with inode->i_lock
autofs4: Do not potentially dereference NULL pointer returned by fget() in autofs_dev_ioctl_setpipefd()
autofs4 - remove autofs4_lock
autofs4 - fix d_manage() return on rcu-walk
autofs4 - fix autofs4_expire_indirect() traversal
autofs4 - fix dentry leak in autofs4_expire_direct()
autofs4 - reinstate last used update on access
vfs - check non-mountpoint dentry might block in __follow_mount_rcu()
Merge get_new_inode/get_new_inode_fast into iget5_locked/iget_locked
as those were the only callers. Remove the internal ifind/ifind_fast
helpers - ifind_fast only had a single caller, and ifind had two
callers wanting it to do different things. Also clean up the comments
in this area to focus on information important to a developer trying
to use it, instead of overloading them with implementation details.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
First thing we do in writeback_single_inode() is take the i_lock and
the last thing we do is drop it. A caller already holds the i_lock,
so pull the i_lock out of writeback_single_inode() to reduce the
round trips on this lock during inode writeback.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
All that remains of the inode_lock is protecting the inode hash list
manipulation and traversals. Rename the inode_lock to
inode_hash_lock to reflect its actual function.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Protect the inode writeback list with a new global lock
inode_wb_list_lock and use it to protect the list manipulations and
traversals. This lock replaces the inode_lock as the inodes on the
list can be validity checked while holding the inode->i_lock and
hence the inode_lock is no longer needed to protect the list.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Protect the per-sb inode list with a new global lock
inode_sb_list_lock and use it to protect the list manipulations and
traversals. This lock replaces the inode_lock as the inodes on the
list can be validity checked while holding the inode->i_lock and
hence the inode_lock is no longer needed to protect the list.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that inode state changes are protected by the inode->i_lock and
the inode LRU manipulations by the inode_lru_lock, we can remove the
inode_lock from prune_icache and the initial part of iput_final().
Instead of using the inode_lock to protect the inode during
iput_final, use the inode->i_lock. This protects the inode
against new references being taken while we change the inode state
to I_FREEING, as well as preventing prune_icache from grabbing the
inode while we are manipulating it. Hence we no longer need the
inode_lock in iput_final prior to setting I_FREEING on the inode.
For prune_icache, we no longer need the inode_lock to protect the
LRU list, and the inodes themselves are protected against freeing
races by the inode->i_lock. Hence we can lift the inode_lock from
prune_icache as well.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Introduce the inode_lru_lock to protect the inode_lru list. This
lock is nested inside the inode->i_lock to allow the inode to be
added to the LRU list in iput_final without needing to deal with
lock inversions. This keeps iput_final() clean and neat.
Further, where marking the inode I_FREEING and removing it from the
LRU, move the LRU list manipulation within the inode->i_lock to keep
the list manipulation consistent with iput_final. This also means
that most of the open coded LRU list removal + unused inode
accounting can now use the inode_lru_list_del() wrappers which
cleans the code up further.
However, this locking change means that the LRU traversal in
prune_icache() inverts this lock ordering and needs to use trylock
semantics on the inode->i_lock to avoid deadlocking. In these cases,
if we fail to lock the inode we move it to the back of the LRU to
prevent spinning on it.
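The trylock traversal can be sketched as below. This is a simplified user-space illustration: `Inode` and `prune` are hypothetical, the LRU lock itself is not modelled, and "disposing" of an inode only records its number.

```python
import threading
from collections import deque


class Inode:
    def __init__(self, ino):
        self.ino = ino
        self.i_lock = threading.Lock()


def prune(lru, nr_to_scan):
    """Scan from the front of the LRU; return the inode numbers disposed of."""
    disposed = []
    for _ in range(min(nr_to_scan, len(lru))):
        inode = lru.popleft()
        # The LRU lock is conceptually held here, so blocking on i_lock
        # would invert the i_lock -> LRU lock order: only try, never wait.
        if not inode.i_lock.acquire(blocking=False):
            lru.append(inode)  # move to the back to avoid spinning on it
            continue
        try:
            disposed.append(inode.ino)
        finally:
            inode.i_lock.release()
    return disposed
```

An inode whose lock is busy is skipped and requeued at the tail, so the scan makes progress on the remaining entries.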
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We have a couple of places that dispose of inodes. Factor the
disposal into evict() to isolate this code and make it simpler to
peel away the inode_lock from the code.
While doing this, change the logic flow in iput_final() to separate
the different cases that need to be handled to make the transitions
the inode goes through more obvious.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Protect inode state transitions and validity checks with the
inode->i_lock. This enables us to make inode state transitions
independently of the inode_lock and is the first step to peeling
away the inode_lock from the code.
This requires that __iget() is done atomically with i_state checks
during list traversals so that we don't race with another thread
marking the inode I_FREEING between the state check and grabbing the
reference.
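The requirement that the state check and the reference grab happen under one lock can be sketched like this. The `Inode` class, `try_grab`, `mark_freeing` and the flag value are illustrative assumptions, and refusing to free a referenced inode is a simplification of the real rules.

```python
import threading

I_FREEING = 0x1  # illustrative flag value


class Inode:
    def __init__(self):
        self.i_lock = threading.Lock()
        self.i_state = 0
        self.i_count = 0


def try_grab(inode):
    """Take a reference only if the inode is not being freed."""
    with inode.i_lock:
        if inode.i_state & I_FREEING:
            return False
        inode.i_count += 1  # taken under the same lock as the state check
        return True


def mark_freeing(inode):
    """Mark the inode as being freed unless someone holds a reference."""
    with inode.i_lock:
        if inode.i_count:
            return False
        inode.i_state |= I_FREEING
        return True
```

Because both functions hold `i_lock` across their check and update, no thread can set `I_FREEING` between another thread's state check and its reference grab.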
Also remove the unlock_new_inode() memory barrier optimisation
required to avoid taking the inode_lock when clearing I_NEW.
Simplify the code by simply taking the inode->i_lock around the
state change and wakeup. Because the wakeup is no longer tricky,
remove the wake_up_inode() function and open code the wakeup where
necessary.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change LAYOUTGET and GETDEVICEINFO XDR parsing to:
- not use vmap, which doesn't work on incoherent archs
- use xdr_stream parsing for all xdr
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Test NFS_INO_LAYOUTCOMMIT before kzalloc
Mark inode dirty to retry LAYOUTCOMMIT on kzalloc failure.
Add comments.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In fs/autofs4/dev-ioctl.c::autofs_dev_ioctl_setpipefd() we call fget(),
which may return NULL, but we do not explicitly test for that NULL return
so we may end up dereferencing a NULL pointer - bad.
When I originally submitted this patch I had chosen EBUSY as the return
value to use if this happens. Ian Kent was kind enough to explain why that
would most likely be wrong and why EBADF should most likely be used
instead. This version of the patch uses EBADF.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The autofs4_lock introduced by the rcu-walk changes has unnecessarily
broad scope. The locking is better handled by the per-autofs super
block lookup_lock.
Signed-off-by: Ian Kent <raven@themaw.net>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The daemon never needs to block and, in the rcu-walk case, an error
return isn't used, so always return zero.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The vfs-scale changes changed the traversal used in
autofs4_expire_indirect() from a list to a depth-first tree traversal,
which isn't right.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
There is a missing dput() when returning from autofs4_expire_direct()
when we see that the dentry is already a pending mount.
Signed-off-by: Ian Kent <raven@themaw.net>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>