commit 5d069dbe8aaf2a197142558b6fb2978189ba3454 upstream.
Jan Kara's analysis of the syzbot report (edited):
The reproducer opens a directory on FUSE filesystem, it then attaches
dnotify mark to the open directory. After that a fuse_do_getattr() call
finds that attributes returned by the server are inconsistent, and calls
make_bad_inode() which, among other things, does:
inode->i_mode = S_IFREG;
This then confuses dnotify which doesn't tear down its structures
properly and eventually crashes.
Avoid calling make_bad_inode() on a live inode: switch to a private flag on
the fuse inode. Also add the test to ops which the bad_inode_ops would
have caught.
This bug goes back to the initial merge of fuse in 2.6.14...
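A minimal userspace sketch of the idea, not the kernel code: the struct, the
MODEL_* constants and the function names are hypothetical stand-ins. It
contrasts overwriting the type bits of a live inode with setting a private
flag that each operation tests itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constants for the file type bits of a mode word. */
#define MODEL_TYPE_MASK 0170000u
#define MODEL_TYPE_DIR  0040000u
#define MODEL_TYPE_REG  0100000u

/* Hypothetical stand-in for the fuse inode; not the kernel structure. */
struct model_inode {
	unsigned int mode; /* type bits, observed by other subsystems */
	bool bad;          /* private flag, only seen by the filesystem */
};

/* Old behaviour: the type of a live inode changes under its users. */
void model_mark_bad_by_mode(struct model_inode *inode)
{
	assert(inode != NULL);
	inode->mode = MODEL_TYPE_REG;
}

/* New behaviour: the type is left alone, only the private flag is set. */
void model_mark_bad_by_flag(struct model_inode *inode)
{
	assert(inode != NULL);
	inode->bad = true;
}

/* Each operation has to test the flag itself and fail with -EIO. */
int model_getattr(const struct model_inode *inode)
{
	return inode->bad ? -EIO : 0;
}
```

With the flag variant, a directory stays a directory for everyone holding a
reference to it, while filesystem operations on it still fail.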
Reported-by: syzbot+f427adf9324b92652ccc@syzkaller.appspotmail.com
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
[bwh: Backported to 4.9:
- Drop changes in fuse_dir_fsync(), fuse_readahead(), fuse_evict_inode()
- In fuse_get_link(), return ERR_PTR(-EIO) for bad inodes
- Convert some additional calls to is_bad_inode()
- Adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 31070f6ccec09f3bd4f1e28cd1e592fa4f3ba0b6 upstream.
The ioctl encoding for this parameter is a long but the documentation says
it should be an int and the kernel drivers expect it to be an int. If the
fuse driver treats this as a long it might end up scribbling over the stack
of a userspace process that only allocated enough space for an int.
This was previously discussed in [1] and a patch for fuse was proposed in
[2]. From what I can tell the patch in [2] was nacked in favor of adding
new, "fixed" ioctls and using those from userspace. However there is still
no "fixed" version of these ioctls and the fact is that it's sometimes
infeasible to change all userspace to use the new one.
Handling the ioctls specially in the fuse driver seems like the most
pragmatic way for fuse servers to support them without causing crashes in
userspace applications that call them.
[1]: https://lore.kernel.org/linux-fsdevel/20131126200559.GH20559@hall.aurel32.net/T/
[2]: https://sourceforge.net/p/fuse/mailman/message/31771759/
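The stack-scribbling risk can be illustrated with a small userspace model.
This is only a sketch under stated assumptions: the function names are
hypothetical, the "frame" is a byte array standing in for a caller that
reserved space for one int, and no real ioctl is issued.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical: number of bytes copied back to the caller. The encoded
 * size is what the ioctl number claims; special handling clamps it to
 * the int that callers really pass.
 */
size_t model_reply_width(size_t encoded_size, int handled_specially)
{
	return handled_specially ? sizeof(int) : encoded_size;
}

/*
 * The caller owns sizeof(int) bytes at the start of the frame; the rest
 * is unrelated data filled with a guard pattern. Returns 1 if the guard
 * bytes are still intact after the reply was copied in.
 */
int model_guard_survives(size_t width)
{
	unsigned char frame[sizeof(int) + sizeof(long)];
	unsigned char reply[sizeof(long)];
	size_t i;

	assert(width <= sizeof(reply));
	memset(frame, 0xAA, sizeof(frame));
	memset(reply, 0, sizeof(reply));
	memcpy(frame, reply, width);

	for (i = sizeof(int); i < sizeof(frame); i++)
		if (frame[i] != 0xAA)
			return 0;
	return 1;
}
```

On platforms where long is wider than int, copying the encoded width
clobbers bytes the caller never handed over; the clamped width does not.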
Signed-off-by: Chirantan Ekbote <chirantan@chromium.org>
Fixes: 59efec7b9039 ("fuse: implement ioctl support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d5880c7a8620290a6c90ced7a0e8bd0ad9419601 upstream.
unlock_page() was missing in case of an already in-flight write against the
same page.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Fixes: ff17be086477 ("fuse: writepage: skip already in flight")
Cc: <stable@vger.kernel.org> # v3.13
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bbd84f33652f852ce5992d65db4d020aba21f882 upstream.
Starting from commit 9c225f2655e3 ("vfs: atomic f_pos accesses as per
POSIX") files opened even via nonseekable_open gate read and write via lock
and do not allow them to be run simultaneously. This can create read vs
write deadlock if a filesystem is trying to implement a socket-like file
which is intended to be simultaneously used for both read and write from
filesystem client. See commit 10dce8af3422 ("fs: stream_open - opener for
stream-like files so that read and write can run simultaneously without
deadlock") for details and e.g. commit 581d21a2d02a ("xenbus: fix deadlock
on writes to /proc/xen/xenbus") for a similar deadlock example on
/proc/xen/xenbus.
To avoid such deadlock it was tempting to adjust fuse_finish_open to use
stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags,
but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
and in particular GVFS which actually uses offset in its read and write
handlers
https://codesearch.debian.net/search?q=-%3Enonseekable+%3D
https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080
https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346
https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481
so making such a change would break a real user.
Add another flag (FOPEN_STREAM) for filesystem servers to indicate that the
opened handle has stream-like semantics: it does not use the file position,
and thus the kernel is free to issue simultaneous read and write requests on
the opened file handle.
This patch together with stream_open() should be added to stable kernels
starting from v3.14+. This will allow patching OSSPD and other FUSE
filesystems that provide stream-like files to return FOPEN_STREAM |
FOPEN_NONSEEKABLE in open handler and this way avoid the deadlock on all
kernel versions. This should work because fuse_finish_open ignores unknown
open flags returned from a filesystem and so passing FOPEN_STREAM to a
kernel that is not aware of this flag cannot hurt. In turn the kernel that
is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE
is sufficient to implement streams without read vs write deadlock.
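The compatibility argument can be sketched as a small model. The MODEL_*
bit values and the function below are assumptions made for illustration,
not the kernel implementation; the point is only that a kernel unaware of
the stream flag falls back to the non-seekable behaviour.

```c
#include <assert.h>

/* Hypothetical flag bits, modelled on the FOPEN_* open flags. */
#define MODEL_FOPEN_NONSEEKABLE (1u << 2)
#define MODEL_FOPEN_STREAM      (1u << 4)

enum model_open_mode {
	MODEL_SEEKABLE,    /* regular file position handling */
	MODEL_NONSEEKABLE, /* no seeking, but read/write still serialized */
	MODEL_STREAM       /* no file position, read and write may overlap */
};

/*
 * Hypothetical model of how an open reply is interpreted. A kernel that
 * does not know the stream flag simply ignores that bit.
 */
enum model_open_mode model_finish_open(unsigned int open_flags,
				       int kernel_knows_stream)
{
	assert(kernel_knows_stream == 0 || kernel_knows_stream == 1);

	if (kernel_knows_stream && (open_flags & MODEL_FOPEN_STREAM))
		return MODEL_STREAM;
	if (open_flags & MODEL_FOPEN_NONSEEKABLE)
		return MODEL_NONSEEKABLE;
	return MODEL_SEEKABLE;
}
```

A server returning both flags therefore gets stream semantics on aware
kernels and the old non-seekable behaviour everywhere else, while existing
users of the non-seekable flag alone are unaffected.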
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 35d6fcbb7c3e296a52136347346a698a35af3fda upstream.
Do the proper cleanup in case the size check fails.
Tested with xfstests:generic/228
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 0cbade024ba5 ("fuse: honor RLIMIT_FSIZE in fuse_file_fallocate")
Cc: Liu Bo <bo.liu@linux.alibaba.com>
Cc: <stable@vger.kernel.org> # v3.5
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0cbade024ba501313da3b7e5dd2a188a6bc491b5 upstream.
fstests generic/228 reported this failure that fuse fallocate does not
honor what 'ulimit -f' has set.
This adds the necessary inode_newsize_ok() check.
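A rough userspace model of the added check, assuming the limit is passed in
as a plain number and that a keep-size request never changes the file size.
The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model of a size-extending fallocate. Returns the file
 * size after the call, or -EFBIG if the new size would exceed the limit
 * that 'ulimit -f' would have set.
 */
long long model_fallocate(long long i_size, long long offset,
			  long long length, int keep_size,
			  long long fsize_limit)
{
	long long end = offset + length;

	assert(offset >= 0 && length > 0);

	/* Only a size-extending request is checked against the limit. */
	if (!keep_size && end > i_size) {
		if (end > fsize_limit)
			return -EFBIG;
		i_size = end;
	}
	return i_size;
}
```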
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Fixes: 05ba1f082300 ("fuse: add FALLOCATE operation")
Cc: <stable@vger.kernel.org> # v3.5
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9de5be06d0a89ca97b5ab902694d42dfd2bb77d2 upstream.
Writepage requests were cropped to i_size & 0xffffffff, which meant that
mmaped writes to any file larger than 4G might be silently discarded.
Fix by storing the file size in a properly sized variable (loff_t instead
of size_t).
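The truncation can be reproduced in userspace by modelling the narrow
variable as uint32_t. This is a sketch with hypothetical function names,
not the fuse code; it only shows how the crop computation goes wrong once
the file size no longer fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Crop with the size held in a 32-bit variable (i_size & 0xffffffff). */
uint32_t model_crop_narrow(int64_t i_size, int64_t offset, uint32_t len)
{
	uint32_t size = (uint32_t)i_size; /* upper bits are lost here */

	assert(offset >= 0);
	if (offset + len <= size)
		return len;
	if (offset < size)
		return (uint32_t)(size - offset);
	return 0; /* request looks entirely beyond EOF and is dropped */
}

/* Same logic with the size held in a 64-bit variable. */
uint32_t model_crop_wide(int64_t i_size, int64_t offset, uint32_t len)
{
	int64_t size = i_size;

	assert(offset >= 0);
	if (offset + len <= size)
		return len;
	if (offset < size)
		return (uint32_t)(size - offset);
	return 0;
}
```

For a 5 GiB file and a write at 4.5 GiB, the narrow variant sees a 1 GiB
file and discards the request, while the wide variant keeps all of it.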
Reported-by: Antonio SJ Musumeci <trapexit@spawn.link>
Fixes: 6eaf4782eb09 ("fuse: writepages: crop secondary requests")
Cc: <stable@vger.kernel.org> # v3.13
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a2ebba824106dabe79937a9f29a875f837e1b6d4 upstream.
NR_WRITEBACK_TEMP is accounted on the temporary page in the request, not
the page cache page.
Fixes: 8b284dc47291 ("fuse: writepages: handle same page rewrites")
Cc: <stable@vger.kernel.org> # v3.13
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ebacb81273599555a7a19f7754a1451206a5fc4f upstream.
In the blocking async IO case, an additional reference to the io is taken
so that it survives fuse_aio_complete(). In the non-blocking case this
additional reference is not needed, yet we still dereference io to figure
out whether to wait for completion or not. This is wrong and will lead to a
use-after-free. Fix it by storing the blocking information in a separate
variable.
This was spotted by KASAN when running generic/208 fstest.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 744742d692e3 ("fuse: Add reference counting for fuse_io_priv")
Cc: <stable@vger.kernel.org> # v4.6
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 109728ccc5933151c68d1106e4065478a487a323 upstream.
The above error path returns with page unlocked, so this place seems also
to behave the same.
Fixes: f8dbdf81821b ("fuse: rework fuse_readpages()")
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 68227c03cba84a24faf8a7277d2b1a03c8959c2c upstream.
Before the patch, the flock flag could remain uninitialized for the
lifespan of the fuse_file allocation. Unless set to true in
fuse_file_flock(), it would remain in an indeterminate state until read in
an if statement in fuse_release_common(). This could consequently lead to
taking an unexpected branch in the code.
The bug was discovered by a runtime instrumentation designed to detect use
of uninitialized memory in the kernel.
Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com>
Fixes: 37fb3a30b462 ("fuse: fix flock")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2e38bea99a80eab408adee27f873a188d57b76cb upstream.
fuse_file_put() was missing the "force" flag for the RELEASE request when
sending synchronously (fuseblk).
If this flag is not set, then a sync request may be interrupted before it
is dequeued by the userspace filesystem. In this case the OPEN won't be
balanced with a RELEASE.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 5a18ec176c93 ("fuse: fix hang of single threaded fuseblk filesystem")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If pos is at the beginning of a page and copied is zero then page is not
zeroed but is marked uptodate.
Fix by skipping everything except unlock/put of page if zero bytes were
copied.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: 6b12c1b37e55 ("fuse: Implement write_begin/write_end callbacks")
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Pull misc vfs updates from Al Viro:
"Assorted misc bits and pieces.
There are several single-topic branches left after this (rename2
series from Miklos, current_time series from Deepa Dinamani, xattr
series from Andreas, uaccess stuff from me) and I'd prefer to
send those separately"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
proc: switch auxv to use of __mem_open()
hpfs: support FIEMAP
cifs: get rid of unused arguments of CIFSSMBWrite()
posix_acl: uapi header split
posix_acl: xattr representation cleanups
fs/aio.c: eliminate redundant loads in put_aio_ring_file
fs/internal.h: add const to ns_dentry_operations declaration
compat: remove compat_printk()
fs/buffer.c: make __getblk_slow() static
proc: unsigned file descriptors
fs/file: more unsigned file descriptors
fs: compat: remove redundant check of nr_segs
cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
cifs: don't use memcpy() to copy struct iov_iter
get rid of separate multipage fault-in primitives
fs: Avoid premature clearing of capabilities
fs: Give dentry to inode_change_ok() instead of inode
fuse: Propagate dentry down to inode_change_ok()
ceph: Propagate dentry down to inode_change_ok()
xfs: Propagate dentry down to inode_change_ok()
...
To avoid clearing of capabilities or security related extended
attributes too early, inode_change_ok() will need to take dentry instead
of inode. Propagate it down to fuse_do_setattr().
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
When reading from a loop device backed by a fuse file it deadlocks on
lock_page().
This is because the page is already locked by the read() operation done on
the loop device. In this case we don't want to either lock the page or
dirty it.
So do what fs/direct-io.c does: only dirty the page for ITER_IOVEC vectors.
Reported-by: Sheng Yang <sheng@yasker.org>
Fixes: aa4d86163e4e ("block: loop: switch to VFS ITER_BVEC")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org> # v4.1+
Reviewed-by: Sheng Yang <sheng@yasker.org>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Tested-by: Sheng Yang <sheng@yasker.org>
Tested-by: Ashish Samant <ashish.samant@oracle.com>
Pull fuse updates from Miklos Szeredi:
"This fixes error propagation from writeback to fsync/close for
writeback cache mode as well as adding a missing capability flag to
the INIT message. The rest are cleanups.
(The commits are recent but all the code actually sat in -next for a
while now. The recommits are due to conflict avoidance and the
addition of Cc: stable@...)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: use filemap_check_errors()
mm: export filemap_check_errors() to modules
fuse: fix wrong assignment of ->flags in fuse_send_init()
fuse: fuse_flush must check mapping->flags for errors
fuse: fsync() did not return IO errors
fuse: don't mess with blocking signals
new helper: wait_event_killable_exclusive()
fuse: improve aio directIO write performance for size extending writes
fuse_flush() calls write_inode_now() that triggers writeback, but actual
writeback will happen later, on fuse_sync_writes(). If an error happens,
fuse_writepage_end() will set error bit in mapping->flags. So, we have to
check mapping->flags after fuse_sync_writes().
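The ordering problem can be sketched with a userspace model. Everything
here is hypothetical (struct names, bit values, helpers); it only mirrors
the sequence described above: the error bit appears once the pending writes
have completed, so a check made before that point misses it.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical error bits kept in the mapping flags. */
#define MODEL_AS_EIO    (1u << 0)
#define MODEL_AS_ENOSPC (1u << 1)

struct model_mapping { unsigned int flags; };
struct model_file {
	struct model_mapping mapping;
	int pending_error; /* a queued write that is going to fail */
};

/* Test and clear the error bits, reporting the error that was found. */
int model_check_errors(struct model_mapping *m)
{
	int err = 0;

	if (m->flags & MODEL_AS_ENOSPC) {
		m->flags &= ~MODEL_AS_ENOSPC;
		err = -ENOSPC;
	}
	if (m->flags & MODEL_AS_EIO) {
		m->flags &= ~MODEL_AS_EIO;
		err = -EIO;
	}
	return err;
}

/* Wait for queued writes; a failure is recorded in the mapping flags. */
void model_sync_writes(struct model_file *f)
{
	assert(f != 0);
	if (f->pending_error)
		f->mapping.flags |= MODEL_AS_EIO;
	f->pending_error = 0;
}

/* Checking before the writes finished misses the error. */
int model_flush_check_before_sync(struct model_file *f)
{
	int err = model_check_errors(&f->mapping);

	model_sync_writes(f);
	return err;
}

/* Checking after the writes finished reports it. */
int model_flush_check_after_sync(struct model_file *f)
{
	model_sync_writes(f);
	return model_check_errors(&f->mapping);
}
```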
Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 4d99ff8f12eb ("fuse: Turn writeback cache on")
Cc: <stable@vger.kernel.org> # v3.15+
Due to the implementation of fuse writeback, filemap_write_and_wait_range()
does not catch errors. We have to do this directly after fuse_sync_writes().
Signed-off-by: Alexey Kuznetsov <kuznet@virtuozzo.com>
Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 4d99ff8f12eb ("fuse: Turn writeback cache on")
Cc: <stable@vger.kernel.org> # v3.15+
There are now a number of accounting oddities such as mapped file pages
being accounted for on the node while the total number of file pages are
accounted on the zone. This can be coped with to some extent but it's
confusing so this patch moves the relevant file-based accounting to the node. Due to
throttling logic in the page allocator for reliable OOM detection, it is
still necessary to track dirty and writeback pages on a per-zone basis.
[mgorman@techsingularity.net: fix NR_ZONE_WRITE_PENDING accounting]
Link: http://lkml.kernel.org/r/1468404004-5085-5-git-send-email-mgorman@techsingularity.net
Link: http://lkml.kernel.org/r/1467970510-21195-20-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While sending the blocking directIO in fuse, the write request is broken
into sub-requests, each of default size 128k and all the requests are sent
in non-blocking background mode if async_dio mode is supported by libfuse.
The process which issues the write waits for the completion of all the
sub-requests. Sending multiple requests in parallel gives a chance to perform
parallel writes in the user space fuse implementation if it is
multi-threaded and hence improves the performance.
When there is a size extending aio dio write, we switch to blocking mode so
that we can properly update the size of the file after completion of the
writes. However, in this situation all the sub-requests are sent in
serialized manner where the next request is sent only after receiving the
reply of the current request. Hence the multi-threaded user space
implementation is not utilized properly.
This patch changes the size extending aio dio behavior to exactly follow
blocking dio. For multi threaded fuse implementation having 10 threads and
using buffer size of 64MB to perform async directIO, we are getting double
the speed.
Signed-off-by: Ashish Sangwan <ashishsangwan2@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Pull vfs cleanups from Al Viro:
"More cleanups from Christoph"
* 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
nfsd: use RWF_SYNC
fs: add RWF_DSYNC aand RWF_SYNC
ceph: use generic_write_sync
fs: simplify the generic_write_sync prototype
fs: add IOCB_SYNC and IOCB_DSYNC
direct-io: remove the offset argument to dio_complete
direct-io: eliminate the offset argument to ->direct_IO
xfs: eliminate the pos variable in xfs_file_dio_aio_write
filemap: remove the pos argument to generic_file_direct_write
filemap: remove pos variables in generic_file_read_iter
Including blkdev_direct_IO and dax_do_io. It has to be ki_pos to actually
work, so eliminate the superfluous argument.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
fuse_get_user_pages() should return error or 0. Otherwise fuse_direct_io
read will not return 0 to indicate that read has completed.
Fixes: 742f992708df ("fuse: return patrial success from fuse_direct_io()")
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
ago with the promise that one day it would be possible to implement the page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized, and it is unlikely that it ever will.
We have many places where PAGE_CACHE_SIZE is assumed to be equal to
PAGE_SIZE. And it's a constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
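Why the first two rewrites are safe can be seen in a tiny userspace sketch.
The MODEL_* macros are hypothetical copies that assume 4 KiB pages; since
both shifts are equal, the shift distance is zero and the expression is the
identity.

```c
#include <assert.h>

/* Hypothetical userspace copies of the macros, assuming 4 KiB pages. */
#define MODEL_PAGE_SHIFT       12
#define MODEL_PAGE_SIZE        (1UL << MODEL_PAGE_SHIFT)
#define MODEL_PAGE_CACHE_SHIFT MODEL_PAGE_SHIFT
#define MODEL_PAGE_CACHE_SIZE  (1UL << MODEL_PAGE_CACHE_SHIFT)

/* Old form: convert a page offset into a page cache index. */
unsigned long model_index_old(unsigned long pgoff)
{
	return pgoff >> (MODEL_PAGE_CACHE_SHIFT - MODEL_PAGE_SHIFT);
}

/* New form after the rewrite: the shift by zero is simply dropped. */
unsigned long model_index_new(unsigned long pgoff)
{
	assert(MODEL_PAGE_CACHE_SHIFT == MODEL_PAGE_SHIFT);
	return pgoff;
}
```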
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is a revert of the changes to the
PAGE_CACHE_ALIGN definition: we are going to drop it later.
There are a few places in the code that coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If a user calls writev/readv in direct io mode with partially valid data
in the iovec array such that any vector other than the first one in the
array contains invalid data, we currently return the error for the invalid
iovec.
Instead, we should return the number of bytes already written/read and not
the error as we do in the non direct_io case.
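The intended return convention can be sketched in a few lines. The segment
struct and function are hypothetical and no real IO is performed; an
invalid segment is simply marked as such.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one iovec entry. */
struct model_seg {
	int valid; /* 0 models a segment pointing at invalid memory */
	long len;
};

/*
 * Returns the number of bytes transferred. An error is reported only if
 * nothing was transferred before the invalid segment was reached.
 */
long model_direct_io(const struct model_seg *segs, int nsegs)
{
	long done = 0;
	int i;

	assert(segs != NULL && nsegs >= 0);
	for (i = 0; i < nsegs; i++) {
		if (!segs[i].valid)
			return done > 0 ? done : -EFAULT;
		done += segs[i].len;
	}
	return done;
}
```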
Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
The 'reqs' member of fuse_io_priv serves two purposes. First is to track
the number of outstanding async requests to the server and to signal that
the io request is completed. The second is to be a reference count on the
structure to know when it can be freed.
For sync io requests these purposes can be at odds. fuse_direct_IO() wants
to block until the request is done, and since the signal is sent when
'reqs' reaches 0 it cannot keep a reference to the object. Yet it needs to
use the object after the userspace server has completed processing
requests. This leads to some handshaking and special casing that is
needlessly complicated and responsible for at least one race condition.
It's much cleaner and safer to maintain a separate reference count for the
object lifecycle and to let 'reqs' just be a count of outstanding requests
to the userspace server. Then we can know for sure when it is safe to free
the object without any handshaking or special cases.
The catch here is that most of the time these objects are stack allocated
and should not be freed. Initializing these objects with a single reference
that is never released prevents accidental attempts to free the objects.
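A single-threaded userspace sketch of the split, with hypothetical names
and a 'released' marker in place of a real free: the reference count alone
decides the lifetime, 'reqs' only counts outstanding requests, and the
initial reference keeps a stack object from ever being released.

```c
#include <assert.h>

/* Hypothetical stand-in for the io bookkeeping structure. */
struct model_io {
	int refcount; /* object lifetime */
	int reqs;     /* outstanding requests to the server */
	int released; /* set where the object would be freed */
};

void model_io_init(struct model_io *io)
{
	io->refcount = 1; /* initial reference held by the creator */
	io->reqs = 0;
	io->released = 0;
}

void model_io_put(struct model_io *io)
{
	assert(io->refcount > 0);
	if (--io->refcount == 0)
		io->released = 1;
}

/* Every request in flight pins the object with its own reference. */
void model_io_submit(struct model_io *io)
{
	io->refcount++;
	io->reqs++;
}

/* Returns 1 when the last outstanding request has completed. */
int model_io_complete(struct model_io *io)
{
	int last = (--io->reqs == 0); /* read before dropping the ref */

	model_io_put(io);
	return last;
}
```

A stack-allocated object never drops its initial reference, so completing
all requests cannot release it. A heap-allocated one is released by
whichever side drops the last reference.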
Fixes: 9d5722b7777e ("fuse: handle synchronous iocbs internally")
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
There's a race in fuse_direct_IO(), whereby is_sync_kiocb() is called on an
iocb that could have been freed if async io has already completed. The fix
in this case is simple and obvious: cache the result before starting io.
It was discovered by KASan:
kernel: ==================================================================
kernel: BUG: KASan: use after free in fuse_direct_IO+0xb1a/0xcc0 at addr ffff88036c414390
Signed-off-by: Robert Doebbelin <robert@quobyte.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: bcba24ccdc82 ("fuse: enable asynchronous processing direct IO")
Cc: <stable@vger.kernel.org> # 3.10+
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).
Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull fuse updates from Miklos Szeredi:
"This adds SEEK_HOLE and SEEK_DATA support in lseek"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: add support for SEEK_HOLE and SEEK_DATA in lseek
A useful performance improvement for accessing virtual machine images
via FUSE mount.
See https://bugzilla.redhat.com/show_bug.cgi?id=1220173 for a use-case
for glusterFS.
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
I got a report about an unkillable task eating CPU. Further
investigation shows that the problem is in the fuse_fill_write_pages()
function. If iov's first segment has zero length, we get an infinite
loop, because we never reach the iov_iter_advance() call.
Fix this by calling iov_iter_advance() before repeating an attempt to
copy data from userspace.
A similar problem is described in 124d3b7041f ("fix writev regression:
pan hanging unkillable and un-straceable"). If a zero-length segment
is followed by segment with invalid address,
iov_iter_fault_in_readable() checks only first segment (zero-length),
iov_iter_copy_from_user_atomic() skips it, fails at second and
returns zero -> goto again without skipping zero-length segment.
Patch calls iov_iter_advance() before goto again: we'll skip zero-length
segment at second iteration and iov_iter_fault_in_readable() will detect
invalid address.
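The loop can be modelled in userspace as follows. This is a sketch under
simplifying assumptions: all types and helpers are hypothetical, a segment
is either fully valid or invalid, and a retry budget stands in for the
hang so that the old behaviour can be observed as -ELOOP.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_seg { int valid; size_t len; };
struct model_iter { const struct model_seg *segs; size_t nsegs; size_t idx; };

/* Models advancing by zero bytes: skips empty segments at the front. */
static void model_skip_empty(struct model_iter *it)
{
	while (it->idx < it->nsegs && it->segs[it->idx].len == 0)
		it->idx++;
}

/* Models the fault-in check: looks at the current segment only. */
static int model_fault_in(const struct model_iter *it)
{
	const struct model_seg *s = &it->segs[it->idx];

	return (s->len > 0 && !s->valid) ? -EFAULT : 0;
}

/* Models the atomic copy: skips empty segments itself, 0 on a fault. */
static size_t model_copy(const struct model_iter *it)
{
	size_t i = it->idx;

	while (i < it->nsegs && it->segs[i].len == 0)
		i++;
	if (i >= it->nsegs || !it->segs[i].valid)
		return 0;
	return it->segs[i].len;
}

/* Returns bytes copied, -EFAULT, or -ELOOP if the retry budget ran out. */
long model_fill_pages(const struct model_seg *segs, size_t nsegs,
		      int advance_before_retry)
{
	struct model_iter it = { segs, nsegs, 0 };
	long total = 0;
	int budget;

	assert(segs != NULL);
	for (budget = 0; budget < 1000; budget++) {
		size_t copied;
		int err;

		if (it.idx >= it.nsegs)
			return total;
		err = model_fault_in(&it);
		if (err)
			return total ? total : err;
		copied = model_copy(&it);
		if (copied == 0) {
			if (advance_before_retry)
				model_skip_empty(&it);
			continue; /* the "goto again" of the description */
		}
		model_skip_empty(&it);
		it.idx++; /* consume the segment that was copied */
		total += (long)copied;
	}
	return -ELOOP;
}
```

With a zero-length segment followed by an invalid one, the variant without
the advance spins until the budget is exhausted, while the fixed variant
reaches the invalid segment on the next pass and returns -EFAULT.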
Special thanks to Konstantin Khlebnikov, who helped a lot with the commit
description.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Maxim Patlasov <mpatlasov@parallels.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Fixes: ea9b9907b82a ("fuse: implement perform_write")
Cc: <stable@vger.kernel.org>
Instead of having users check for FL_POSIX or FL_FLOCK to call the correct
locks API function, use the check within locks_lock_inode_wait(). This
allows for some later cleanup.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Pull more vfs updates from Al Viro:
"Assorted VFS fixes and related cleanups (IMO the most interesting in
that part are f_path-related things and Eric's descriptor-related
stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
fs-cache series, DAX patches, Jan's file_remove_suid() work"
[ I'd say this is much more than "fixes and related cleanups". The
file_table locking rule change by Eric Dumazet is a rather big and
fundamental update even if the patch isn't huge. - Linus ]
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
9p: cope with bogus responses from server in p9_client_{read,write}
p9_client_write(): avoid double p9_free_req()
9p: forgetting to cancel request on interrupted zero-copy RPC
dax: bdev_direct_access() may sleep
block: Add support for DAX reads/writes to block devices
dax: Use copy_from_iter_nocache
dax: Add block size note to documentation
fs/file.c: __fget() and dup2() atomicity rules
fs/file.c: don't acquire files->file_lock in fd_install()
fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
vfs: avoid creation of inode number 0 in get_next_ino
namei: make set_root_rcu() return void
make simple_positive() public
ufs: use dir_pages instead of ufs_dir_pages()
pagemap.h: move dir_pages() over there
remove the pointless include of lglock.h
fs: cleanup slight list_entry abuse
xfs: Correctly lock inode when removing suid and file capabilities
fs: Call security_ops->inode_killpriv on truncate
fs: Provide function telling whether file_remove_privs() will do anything
...
Pull fuse updates from Miklos Szeredi:
"This is the start of improving fuse scalability.
An input queue and a processing queue are split out from the monolithic
fuse connection, each of those having their own spinlock. The end of
the patchset adds the ability to clone a fuse connection. This means
that instead of having to read/write requests/answers on a single fuse
device fd, the fuse daemon can have multiple distinct file descriptors
open. Each of those can be used to receive requests and send answers,
currently the only constraint is that a request must be answered on
the same fd as it was read from.
This can be extended further to allow binding a device clone to a
specific CPU or NUMA node.
Based on a patchset by Srinivas Eeda and Ashish Samant. Thanks to
Ashish for the review of this series"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (40 commits)
fuse: update MAINTAINERS entry
fuse: separate pqueue for clones
fuse: introduce per-instance fuse_dev structure
fuse: device fd clone
fuse: abort: no fc->lock needed for request ending
fuse: no fc->lock for pqueue parts
fuse: no fc->lock in request_end()
fuse: cleanup request_end()
fuse: request_end(): do once
fuse: add req flag for private list
fuse: pqueue locking
fuse: abort: group pqueue accesses
fuse: cleanup fuse_dev_do_read()
fuse: move list_del_init() from request_end() into callers
fuse: duplicate ->connected in pqueue
fuse: separate out processing queue
fuse: simplify request_wait()
fuse: no fc->lock for iqueue parts
fuse: allow interrupt queuing without fc->lock
fuse: iqueue locking
...
Use flags for representing the state in fuse_req. This is needed since
req->list will be protected by different locks in different states, hence
we'll want the state itself to be split into distinct bits, each protected
with the relevant lock in that state.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
FUSE_REQ_INIT is actually the same state as FUSE_REQ_PENDING and
FUSE_REQ_READING and FUSE_REQ_WRITING can be merged into a common
FUSE_REQ_IO state.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Finer grained locking will mean there's no single lock to protect
modification of bitfields in fuse_req.
So move to using bitops. Can use the non-atomic variants for those which
happen while the request definitely has only one reference.
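A userspace analogue using C11 atomics, offered as a sketch only: the flag
names and helpers are hypothetical, and the relaxed load/store pair stands
in for the non-atomic variants that are acceptable while a single reference
exists.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag numbers; each state is its own bit. */
enum model_req_flag { MODEL_FR_PENDING, MODEL_FR_SENT, MODEL_FR_FINISHED };

struct model_req { atomic_ulong flags; };

void model_req_init(struct model_req *req)
{
	atomic_init(&req->flags, 0UL);
}

/* Atomic read-modify-write: safe with concurrent updates of other bits. */
void model_req_set(struct model_req *req, enum model_req_flag f)
{
	atomic_fetch_or(&req->flags, 1UL << f);
}

void model_req_clear(struct model_req *req, enum model_req_flag f)
{
	atomic_fetch_and(&req->flags, ~(1UL << f));
}

int model_req_test(struct model_req *req, enum model_req_flag f)
{
	return (int)((atomic_load(&req->flags) >> f) & 1UL);
}

/*
 * Separate load and store, not one atomic step. Only valid while the
 * caller is the sole owner of the request.
 */
void model_req_set_unshared(struct model_req *req, enum model_req_flag f)
{
	unsigned long v = atomic_load_explicit(&req->flags,
					       memory_order_relaxed);

	assert((unsigned)f < 8 * sizeof(unsigned long));
	atomic_store_explicit(&req->flags, v | (1UL << f),
			      memory_order_relaxed);
}
```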
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
file_remove_suid() is a misnomer since it also removes file capabilities
stored in xattrs and sets the S_NOSEC flag. Also, should_remove_suid()
reports something other than whether a file_remove_suid() call is
necessary, which leads to bugs.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback)
and the role of the separation is unclear. For cgroup support for
writeback IOs, a bdi will be updated to host multiple wb's where each
wb serves writeback IOs of a different cgroup on the bdi. To achieve
that, a wb should carry all states necessary for servicing writeback
IOs for a cgroup independently.
This patch moves bdi->bdi_stat[] into wb.
* enum bdi_stat_item is renamed to wb_stat_item and the prefix of all
enums is changed from BDI_ to WB_.
* BDI_STAT_BATCH() -> WB_STAT_BATCH()
* [__]{add|inc|dec|sum}_bdi_stat(bdi, ...) -> [__]{add|inc}_wb_stat(wb, ...)
* bdi_stat[_error]() -> wb_stat[_error]()
* bdi_writeout_inc() -> wb_writeout_inc()
* stat init is moved to bdi_wb_init() and bdi_wb_exit() is added and
frees stat.
* As there's still only one bdi_writeback per backing_dev_info, all
uses of bdi->stat[] are mechanically replaced with bdi->wb.stat[]
introducing no behavior changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
... returning -E... upon error and amount of data left in iter after
(possible) truncation upon success. Note, that normal case gives
a non-zero (positive) return value, so any tests for != 0 _must_ be
updated.
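What the changed convention means for callers, as a hedged sketch: the
check function below is a hypothetical model that follows the description
(negative errno on error, remaining byte count after possible truncation
on success), and the two small helpers contrast the old and new caller
tests.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical check: returns -errno or the byte count left to write. */
long model_write_checks(long pos, long count, long limit)
{
	assert(pos >= 0 && count >= 0);
	if (count == 0)
		return 0;
	if (pos >= limit)
		return -EFBIG;
	if (count > limit - pos)
		count = limit - pos; /* truncate to what still fits */
	return count;
}

/* Old caller test: any non-zero value was treated as "stop here". */
int model_caller_stops_old(long ret)
{
	return ret != 0;
}

/* Updated caller test: only errors and an empty write stop the caller. */
int model_caller_stops_new(long ret)
{
	return ret <= 0;
}
```

A caller still using the old test would bail out on every successful
non-empty write, which is why the tests have to be updated.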
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Conflicts:
fs/ext4/file.c
already done by caller. We used to call __fuse_direct_write(), which
called generic_write_checks(); now the former got expanded, bringing
the latter to the surface. It used to be called all along and calling
it from there had been wrong all along...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>