IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
If we get an IO error on a synchronous superblock write, we attach an
error release function to it so that when the last reference goes away
the release function is called and the buffer is invalidated and
unlocked. The buffer is left locked until the release function is
called so that other concurrent users of the buffer will be locked out
until the buffer error is fully processed.
Unfortunately, for the superblock buffer the filesyetm itself holds a
reference to the buffer which prevents the reference count from
dropping to zero and the release function being called. As a result,
once an IO error occurs on a sync write, the buffer will never be
unlocked and all future attempts to lock the buffer will hang.
To make matters worse, this problems is not unique to such buffers;
if there is a concurrent _xfs_buf_find() running, the lookup will grab
a reference to the buffer and then wait on the buffer lock, preventing
the reference count from ever falling to zero and hence unlocking the
buffer.
As such, the whole b_relse function implementation is broken because it
cannot rely on the buffer reference count falling to zero to unlock the
errored buffer. The synchronous write error path is the only path that
uses this callback - it is used to ensure that the synchronous waiter
gets the buffer error before the error state is cleared from the buffer
by the release function.
Given that the only sychronous buffer writes now go through xfs_bwrite
and the error path in question can only occur for a write of a dirty,
logged buffer, we can move most of the b_relse processing to happen
inline in xfs_buf_iodone_callbacks, just like a normal I/O completion.
In addition to that we make sure the error is not cleared in
xfs_buf_iodone_callbacks, so that xfs_bwrite can reliably check it.
Given that xfs_bwrite keeps the buffer locked until it has waited for
it and checked the error this allows to reliably propagate the error
to the caller, and make sure that the buffer is reliably unlocked.
Given that xfs_buf_iodone_callbacks was the only instance of the
b_relse callback we can remove it entirely.
Based on earlier patches by Dave Chinner and Ajeet Yadav.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Ajeet Yadav <ajeet.yadav.77@gmail.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Allow manual discards from userspace using the FITRIM ioctl. This is not
intended to be run during normal workloads, as the freepsace btree walks
can cause large performance degradation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
To ensure the log is covered and the filesystem idles correctly, we
need to ensure that dummy transactions hit the disk and do not stay
pinned in memory. If the superblock is pinned in memory, it can't
be flushed so the log covering cannot make progress. The result is
dependent on timing - more oftent han not we continue to issues a
log covering transaction every 36s rather than idling after ~90s.
Fix this by making the log covering transaction synchronous. To
avoid additional log force from xfssyncd, make the log covering
transaction take the place of the existing log force in the xfssyncd
background sync process.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
* 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (89 commits)
NFS fix the setting of exchange id flag
NFS: Don't use vm_map_ram() in readdir
NFSv4: Ensure continued open and lockowner name uniqueness
NFS: Move cl_delegations to the nfs_server struct
NFS: Introduce nfs_detach_delegations()
NFS: Move cl_state_owners and related fields to the nfs_server struct
NFS: Allow walking nfs_client.cl_superblocks list outside client.c
pnfs: layout roc code
pnfs: update nfs4_callback_recallany to handle layouts
pnfs: add CB_LAYOUTRECALL handling
pnfs: CB_LAYOUTRECALL xdr code
pnfs: change lo refcounting to atomic_t
pnfs: check that partial LAYOUTGET return is ignored
pnfs: add layout to client list before sending rpc
pnfs: serialize LAYOUTGET(openstateid)
pnfs: layoutget rpc code cleanup
pnfs: change how lsegs are removed from layout list
pnfs: change layout state seqlock to a spinlock
pnfs: add prefix to struct pnfs_layout_hdr fields
pnfs: add prefix to struct pnfs_layout_segment fields
...
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
UDF: Close small mem leak in udf_find_entry()
udf: Fix directory corruption after extent merging
udf: Protect udf_file_aio_write from possible races
udf: Remove unnecessary bkl usages
udf: Use of s_alloc_mutex to serialize udf_relocate_blocks() execution
udf: Replace bkl with the UDF_I(inode)->i_data_sem for protect udf_inode_info struct
udf: Remove BKL from free space counting functions
udf: Call udf_add_free_space() for more blocks at once in udf_free_blocks()
udf: Remove BKL from udf_put_super() and udf_remount_fs()
udf: Protect default inode credentials by rwlock
udf: Protect all modifications of LVID with s_alloc_mutex
udf: Move handling of uniqueID into a helper function and protect it by a s_alloc_mutex
udf: Remove BKL from udf_update_inode
udf: Convert UDF_SB(sb)->s_flags to use bitops
fs/udf: Add printf format/argument verification
fs/udf: Use vzalloc
(Evil merge: this also removes the BKL dependency from the Kconfig file)
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (44 commits)
ext4: fix trimming starting with block 0 with small blocksize
ext4: revert buggy trim overflow patch
ext4: don't pass entire map to check_eofblocks_fl
ext4: fix memory leak in ext4_free_branches
ext4: remove ext4_mb_return_to_preallocation()
ext4: flush the i_completed_io_list during ext4_truncate
ext4: add error checking to calls to ext4_handle_dirty_metadata()
ext4: fix trimming of a single group
ext4: fix uninitialized variable in ext4_register_li_request
ext4: dynamically allocate the jbd2_inode in ext4_inode_info as necessary
ext4: drop i_state_flags on architectures with 64-bit longs
ext4: reorder ext4_inode_info structure elements to remove unneeded padding
ext4: drop ec_type from the ext4_ext_cache structure
ext4: use ext4_lblk_t instead of sector_t for logical blocks
ext4: replace i_delalloc_reserved_flag with EXT4_STATE_DELALLOC_RESERVED
ext4: fix 32bit overflow in ext4_ext_find_goal()
ext4: add more error checks to ext4_mkdir()
ext4: ext4_ext_migrate should use NULL not 0
ext4: Use ext4_error_file() to print the pathname to the corrupted inode
ext4: use IS_ERR() to check for errors in ext4_error_file
...
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
ext2: Resolve 'dereferencing pointer to incomplete type' when enabling EXT2_XATTR_DEBUG
ext3: Remove redundant unlikely()
ext2: Remove redundant unlikely()
ext3: speed up file creates by optimizing rec_len functions
ext2: speed up file creates by optimizing rec_len functions
ext3: Add more journal error check
ext3: Add journal error check in resize.c
quota: Use %pV and __attribute__((format (printf in __quota_error and fix fallout
ext3: Add FITRIM handling
ext3: Add batched discard support for ext3
ext3: Add journal error check into ext3_rename()
ext3: Use search_dirblock() in ext3_dx_find_entry()
ext3: Avoid uninitialized memory references with a corrupted htree directory
ext3: Return error code from generic_check_addressable
ext3: Add journal error check into ext3_delete_entry()
ext3: Add error check in ext3_mkdir()
fs/ext3/super.c: Use printf extension %pV
fs/ext2/super.c: Use printf extension %pV
ext3: don't update sb journal_devnum when RO dev
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
fs/9p: Don't set dentry->d_op in create routines
fs/9p: fix spelling typo
fs/9p: TREADLINK bugfix
net/9p: Use proper data types
fs/9p: Simplify the .L create operation
fs/9p: Move dotl inode operations into a seperate file
fs/9p: fix menu presentation
fs/9p: Fix the return error on default acl removal
fs/9p: Remove unnecessary semicolons
When s_first_data_block is not zero (which happens e.g. when block size is 1KB)
and trim ioctl is called to start trimming from block 0, the math in
ext4_get_group_no_and_offset() overflows. The overall result is that ioctl
returns EINVAL which is kind of unexpected and we probably don't want
userspace tools to bother with internal details of filesystem structure.
So just silently increase starting offset (and shorten length) when starting
block is below s_first_data_block.
CC: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If we lose the backchannel and then the client repairs the problem,
resend any callbacks.
We use a new cb_done flag to track whether there is still work to be
done for the callback or whether it can be destroyed with the rpc.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If this loses any backchannel, make sure we have a chance to notice that
and set the sequence flags.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Distinguish between when the callback channel is known to be down, and
when it is not yet confirmed. This will be useful in the 4.1 case.
Also, we don't seem to be using the fact that this field is atomic.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Now that we have a list of connections to choose from, we can teach the
callback code to just pick a suitable connection and use that, instead
of insisting on forever using the connection that the first
create_session was sent with.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Basic xdr and processing for BIND_CONN_TO_SESSION. This adds a
connection to the list of connections associated with a session.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* 'for-linus-merged' of git://oss.sgi.com/xfs/xfs: (47 commits)
xfs: convert grant head manipulations to lockless algorithm
xfs: introduce new locks for the log grant ticket wait queues
xfs: convert log grant heads to atomic variables
xfs: convert l_tail_lsn to an atomic variable.
xfs: convert l_last_sync_lsn to an atomic variable
xfs: make AIL tail pushing independent of the grant lock
xfs: use wait queues directly for the log wait queues
xfs: combine grant heads into a single 64 bit integer
xfs: rework log grant space calculations
xfs: fact out common grant head/log tail verification code
xfs: convert log grant ticket queues to list heads
xfs: use AIL bulk delete function to implement single delete
xfs: use AIL bulk update function to implement single updates
xfs: remove all the inodes on a buffer from the AIL in bulk
xfs: consume iodone callback items on buffers as they are processed
xfs: reduce the number of AIL push wakeups
xfs: bulk AIL insertion during transaction commit
xfs: clean up xfs_ail_delete()
xfs: Pull EFI/EFD handling out from under the AIL lock
xfs: fix EFI transaction cancellation.
...
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (22 commits)
MAINTAINERS: Update Joel Becker's email address
ocfs2: Remove unused truncate function from alloc.c
ocfs2/cluster: dereferencing before checking in nst_seq_show()
ocfs2: fix build for OCFS2_FS_STATS not enabled
ocfs2/cluster: Show o2net timing statistics
ocfs2/cluster: Track process message timing stats for each socket
ocfs2/cluster: Track send message timing stats for each socket
ocfs2/cluster: Use ktime instead of timeval in struct o2net_sock_container
ocfs2/cluster: Replace timeval with ktime in struct o2net_send_tracking
ocfs2: Add DEBUG_FS dependency
ocfs2/dlm: Hard code the values for enums
ocfs2/dlm: Minor cleanup
ocfs2/dlm: Cleanup dlmdebug.c
ocfs2: Release buffer_head in case of error in ocfs2_double_lock.
ocfs2/cluster: Pin the local node when o2hb thread starts
ocfs2/cluster: Show pin state for each o2hb region
ocfs2/cluster: Pin/unpin o2hb regions
ocfs2/cluster: Remove dropped region from o2hb quorum region bitmap
ocfs2/cluster: Pin the remote node item in configfs
ocfs2/dlm: make existing convertion precedent over new lock
...
Indicate support for referrals. Do not set any PNFS roles. Check the flags
returned by the server for validity. Do not use exchange flags from an old
client ID instance when recovering a client ID.
Update the EXCHID4_FLAG_XXX set to RFC 5661.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We do set dentry->d_op in lookup even in case of EOENT entries.
That implies we should have dentry->d_op already set when
create/mkdir/mknod/link/symlink routines are called
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
introduced a typo somehow during a hand merge
Reported by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Remove v9fs_vfs_readlink_dotl function and use generic_readlink. Update
v9fs_vfs_follow_link_dotl function to accommodate this change
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reported-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Make the 9P_FS kconfig options subordinate to the 9P_FS kconfig symbol
in the menu presentation instead of them all being at the same level.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
If we don't have default ACL, then trying to remove
default acl on a file should return 0.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
This merge pulls the XFS master branch into the latest Linus master.
This results in a merge conflict whose best fix is not obvious.
I manually fixed the conflict, in "fs/xfs/xfs_iget.c".
Dave Chinner had done work that resulted in RCU freeing of inodes
separate from what Nick Piggin had done, and their results differed
slightly in xfs_inode_free(). The fix updates Nick's call_rcu()
with the use of VFS_I(), while incorporating needed updates to some
XFS inode fields implemented in Dave's series. Dave's RCU callback
function has also been removed.
Signed-off-by: Alex Elder <aelder@sgi.com>
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
driver core: Document that device_rename() is only for networking
sysfs: remove useless test from sysfs_merge_group
driver-core: merge private parts of class and bus
driver core: fix whitespace in class_attr_string
When two concurrent unaligned, non-overlapping direct IOs are issued
to the same block, the direct Io layer will race to zero the block.
The result is that one of the concurrent IOs will overwrite data
written by the other IO with zeros. This is demonstrated by the
xfsqa test 240.
To avoid this problem, serialise all unaligned direct IOs to an
inode with a big hammer. We need a big hammer approach as we need to
serialise AIO as well, so we can't just block writes on locks.
Hence, the big hammer is calling xfs_ioend_wait() while holding out
other unaligned direct IOs from starting.
We don't bother trying to serialised aligned vs unaligned IOs as
they are overlapping IO and the result of concurrent overlapping IOs
is undefined - the result of either IO is a valid result so we let
them race. Hence we only penalise unaligned IO, which already has a
major overhead compared to aligned IO so this isn't a major problem.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The buffered IO and direct IO write paths share a common set of
checks and limiting code prior to issuing the write. Factor that
into a common helper function.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Complete the split of the different write IO paths by splitting the
buffered IO write path out of xfs_file_aio_write(). This makes the
different mechanisms of the write patchs easier to follow.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The current xfs_file_aio_write code is a mess of locking shenanigans
to handle the different locking requirements of buffered and direct
IO. Start to clean this up by disentangling the direct IO path from
the mess.
This also removes the failed direct IO fallback path to buffered IO.
XFS handles all direct IO cases without needing to fall back to
buffered IO, so we can safely remove this unused path. This greatly
simplifies the logic and locking needed in the write path.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
We need to obtain the i_mutex, i_iolock and i_ilock during the read
and write paths. Add a set of wrapper functions to neatly
encapsulate the lock ordering and shared/exclusive semantics to make
the locking easier to follow and get right.
Note that this changes some of the exclusive locking serialisation in
that serialisation will occur against the i_mutex instead of the
XFS_IOLOCK_EXCL. This does not change any behaviour, and it is
arguably more efficient to use the mutex for such serialisation than
the rw_sem.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs_file_aio_write() only returns the error from synchronous
flushing of the data and inode if error == 0. At the point where
error is being checked, it is guaranteed to be > 0. Therefore any
errors returned by the data or fsync flush will never be returned.
Fix the checks so we overwrite the current error once and only if an
error really occurred.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
vm_map_ram() is not available on NOMMU platforms, and causes trouble
on incoherrent architectures such as ARM when we access the page data
through both the direct and the virtual mapping.
The alternative is to use the direct mapping to access page data
for the case when we are not crossing a page boundary, but to copy
the data into a linear scratch buffer when we are accessing data
that spans page boundaries.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: stable@kernel.org [2.6.37]
When I enable EXT2_XATTR_DEBUG in fs/ext2/xattr.c I get a build error stating
the following:
CC fs/ext2/xattr.o
fs/ext2/xattr.c: In function 'ext2_xattr_cache_insert':
fs/ext2/xattr.c:841: error: dereferencing pointer to incomplete type
fs/ext2/xattr.c:846: error: dereferencing pointer to incomplete type
make[2]: *** [fs/ext2/xattr.o] Error 1
make[1]: *** [fs/ext2] Error 2
make: *** [fs] Error 2
These lines reference ext2_xattr_cache->c_entry_count which is defined
in struct mb_cache. struct mb_cache is currently only defined in fs/mbcache.c.
Moving struct mb_cache definition to include/linux/mbcache.h to resolve the
issue.
Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Jan Kara <jack@suse.cz>
IS_ERR() already implies unlikely(), so it can be omitted here.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Jan Kara <jack@suse.cz>
IS_ERR() already implies unlikely(), so it can be omitted here.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Jan Kara <jack@suse.cz>
The addition of 64k block capability in the rec_len_from_disk
and rec_len_to_disk functions added a bit of math overhead which
slows down file create workloads needlessly when the architecture
cannot even support 64k blocks, thanks to page size limits.
Similar changes already exist in the ext4 codebase.
The directory entry checking can also be optimized a bit
by sprinkling in some unlikely() conditions to move the
error handling out of line.
bonnie++ sequential file creates on a 512MB ramdisk speeds up
from about 77,000/s to about 82,000/s, about a 6% improvement.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
The addition of 64k block capability in the rec_len_from_disk
and rec_len_to_disk functions added a bit of math overhead which
slows down file create workloads needlessly when the architecture
cannot even support 64k blocks, thanks to page size limits.
The directory entry checking can also be optimized a bit
by sprinkling in some unlikely() conditions to move the
error handling out of line.
bonnie++ sequential file creates on a 512MB ramdisk speeds up
from about 2200/s to about 2500/s, about a 14% improvement.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>