IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
[ Upstream commit cee8f4f6fcabfdf229542926128e9874d19016d5 ]
RHBZ: 1933527
Under SMB1 + POSIX, if an inode is reused on a server after we have read and
cached a part of a file, when we then open the new file with the
re-cycled inode there is a chance that we may serve the old data out of cache
to the application.
This only happens for SMB1 (deprecated) and when posix are used.
The simplest solution to avoid this race is to force a revalidate
on smb1-posix open.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9bd4540836684013aaad6070a65d6fcdd9006625 ]
Currenly we doesn't assume that a server may break a lease
from RWH to RW which causes us setting a wrong lease state
on a file and thus mistakenly flushing data and byte-range
locks and purging cached data on the client. This leads to
performance degradation because subsequent IOs go directly
to the server.
Fix this by propagating new lease state and epoch values
to the oplock break handler through cifsFileInfo structure
and removing the use of cifsInodeInfo flags for that. It
allows to avoid some races of several lease/oplock breaks
using those flags in parallel.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 44805b0e62f15e90d233485420e1847133716bdc upstream.
Currently the client translates O_SYNC and O_DIRECT flags
into corresponding SMB create options when openning a file.
The problem is that on reconnect when the file is being
re-opened the client doesn't set those flags and it causes
a server to reject re-open requests because create options
don't match. The latter means that any subsequent system
call against that open file fail until a share is re-mounted.
Fix this by properly setting SMB create options when
re-openning files after reconnects.
Fixes: 1013e760d10e6: ("SMB3: Don't ignore O_SYNC/O_DSYNC and O_DIRECT flags")
Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6f582b273ec23332074d970a7fb25bef835df71f upstream.
Currently when the client creates a cifsFileInfo structure for
a newly opened file, it allocates a list of byte-range locks
with a pointer to the new cfile and attaches this list to the
inode's lock list. The latter happens before initializing all
other fields, e.g. cfile->tlink. Thus a partially initialized
cifsFileInfo structure becomes available to other threads that
walk through the inode's lock list. One example of such a thread
may be an oplock break worker thread that tries to push all
cached byte-range locks. This causes NULL-pointer dereference
in smb2_push_mandatory_locks() when accessing cfile->tlink:
[598428.945633] BUG: kernel NULL pointer dereference, address: 0000000000000038
...
[598428.945749] Workqueue: cifsoplockd cifs_oplock_break [cifs]
[598428.945793] RIP: 0010:smb2_push_mandatory_locks+0xd6/0x5a0 [cifs]
...
[598428.945834] Call Trace:
[598428.945870] ? cifs_revalidate_mapping+0x45/0x90 [cifs]
[598428.945901] cifs_oplock_break+0x13d/0x450 [cifs]
[598428.945909] process_one_work+0x1db/0x380
[598428.945914] worker_thread+0x4d/0x400
[598428.945921] kthread+0x104/0x140
[598428.945925] ? process_one_work+0x380/0x380
[598428.945931] ? kthread_park+0x80/0x80
[598428.945937] ret_from_fork+0x35/0x40
Fix this by reordering initialization steps of the cifsFileInfo
structure: initialize all the fields first and then add the new
byte-range lock list to the inode's lock list.
Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d46b0da7a33dd8c99d969834f682267a45444ab3 ]
There's a deadlock that is possible and can easily be seen with
a test where multiple readers open/read/close of the same file
and a disruption occurs causing reconnect. The deadlock is due
a reader thread inside cifs_strict_readv calling down_read and
obtaining lock_sem, and then after reconnect inside
cifs_reopen_file calling down_read a second time. If in
between the two down_read calls, a down_write comes from
another process, deadlock occurs.
CPU0 CPU1
---- ----
cifs_strict_readv()
down_read(&cifsi->lock_sem);
_cifsFileInfo_put
OR
cifs_new_fileinfo
down_write(&cifsi->lock_sem);
cifs_reopen_file()
down_read(&cifsi->lock_sem);
Fix the above by changing all down_write(lock_sem) calls to
down_write_trylock(lock_sem)/msleep() loop, which in turn
makes the second down_read call benign since it will never
block behind the writer while holding lock_sem.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Suggested-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed--by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 30573a82fb179420b8aac30a3a3595aa96a93156 upstream.
Currently if the client identifies problems when processing
metadata returned in CREATE response, the open handle is being
leaked. This causes multiple problems like a file missing a lease
break by that client which causes high latencies to other clients
accessing the file. Another side-effect of this is that the file
can't be deleted.
Fix this by closing the file after the client hits an error after
the file was opened and the open descriptor wasn't returned to
the user space. Also convert -ESTALE to -EOPENSTALE to allow
the VFS to revalidate a dentry and retry the open.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 31fad7d41e73731f05b8053d17078638cf850fa6 upstream.
In cifs_read_allocate_pages, in case of ENOMEM, we go through
whole rdata->pages array but we have failed the allocation before
nr_pages, therefore we may end up calling put_page with NULL
pointer, causing oops
Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b98749cac4a695f084a5ff076f4510b23e353ecd upstream.
In the oplock break handler, writing pending changes from pages puts
the FileInfo handle. If the refcount reaches zero it closes the handle
and waits for any oplock break handler to return, thus causing a deadlock.
To prevent this situation:
* We add a wait flag to cifsFileInfo_put() to decide whether we should
wait for running/pending oplock break handlers
* We keep an additionnal reference of the SMB FileInfo handle so that
for the rest of the handler putting the handle won't close it.
- The ref is bumped everytime we queue the handler via the
cifs_queue_oplock_break() helper.
- The ref is decremented at the end of the handler
This bug was triggered by xfstest 464.
Also important fix to address the various reports of
oops in smb2_push_mandatory_locks
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bc31d0cdcfbadb6258b45db97e93b1c83822ba33 ]
We have a customer reporting crashes in lock_get_status() with many
"Leaked POSIX lock" messages preceeding the crash.
Leaked POSIX lock on dev=0x0:0x56 ...
Leaked POSIX lock on dev=0x0:0x56 ...
Leaked POSIX lock on dev=0x0:0x56 ...
Leaked POSIX lock on dev=0x0:0x53 ...
Leaked POSIX lock on dev=0x0:0x53 ...
Leaked POSIX lock on dev=0x0:0x53 ...
Leaked POSIX lock on dev=0x0:0x53 ...
POSIX: fl_owner=ffff8900e7b79380 fl_flags=0x1 fl_type=0x1 fl_pid=20709
Leaked POSIX lock on dev=0x0:0x4b ino...
Leaked locks on dev=0x0:0x4b ino=0xf911400000029:
POSIX: fl_owner=ffff89f41c870e00 fl_flags=0x1 fl_type=0x1 fl_pid=19592
stack segment: 0000 [#1] SMP
Modules linked in: binfmt_misc msr tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag rpcsec_gss_krb5 arc4 ecb auth_rpcgss nfsv4 md4 nfs nls_utf8 lockd grace cifs sunrpc ccm dns_resolver fscache af_packet iscsi_ibft iscsi_boot_sysfs vmw_vsock_vmci_transport vsock xfs libcrc32c sb_edac edac_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drbg ansi_cprng vmw_balloon aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr vmxnet3 i2c_piix4 vmw_vmci shpchp fjes processor button ac btrfs xor raid6_pq sr_mod cdrom ata_generic sd_mod ata_piix vmwgfx crc32c_intel drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm serio_raw ahci libahci drm libata vmw_pvscsi sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
Supported: Yes
CPU: 6 PID: 28250 Comm: lsof Not tainted 4.4.156-94.64-default #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
task: ffff88a345f28740 ti: ffff88c74005c000 task.ti: ffff88c74005c000
RIP: 0010:[<ffffffff8125dcab>] [<ffffffff8125dcab>] lock_get_status+0x9b/0x3b0
RSP: 0018:ffff88c74005fd90 EFLAGS: 00010202
RAX: ffff89bde83e20ae RBX: ffff89e870003d18 RCX: 0000000049534f50
RDX: ffffffff81a3541f RSI: ffffffff81a3544e RDI: ffff89bde83e20ae
RBP: 0026252423222120 R08: 0000000020584953 R09: 000000000000ffff
R10: 0000000000000000 R11: ffff88c74005fc70 R12: ffff89e5ca7b1340
R13: 00000000000050e5 R14: ffff89e870003d30 R15: ffff89e5ca7b1340
FS: 00007fafd64be800(0000) GS:ffff89f41fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000001c80018 CR3: 000000a522048000 CR4: 0000000000360670
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
0000000000000208 ffffffff81a3d6b6 ffff89e870003d30 ffff89e870003d18
ffff89e5ca7b1340 ffff89f41738d7c0 ffff89e870003d30 ffff89e5ca7b1340
ffffffff8125e08f 0000000000000000 ffff89bc22b67d00 ffff88c74005ff28
Call Trace:
[<ffffffff8125e08f>] locks_show+0x2f/0x70
[<ffffffff81230ad1>] seq_read+0x251/0x3a0
[<ffffffff81275bbc>] proc_reg_read+0x3c/0x70
[<ffffffff8120e456>] __vfs_read+0x26/0x140
[<ffffffff8120e9da>] vfs_read+0x7a/0x120
[<ffffffff8120faf2>] SyS_read+0x42/0xa0
[<ffffffff8161cbc3>] entry_SYSCALL_64_fastpath+0x1e/0xb7
When Linux closes a FD (close(), close-on-exec, dup2(), ...) it calls
filp_close() which also removes all posix locks.
The lock struct is initialized like so in filp_close() and passed
down to cifs
...
lock.fl_type = F_UNLCK;
lock.fl_flags = FL_POSIX | FL_CLOSE;
lock.fl_start = 0;
lock.fl_end = OFFSET_MAX;
...
Note the FL_CLOSE flag, which hints the VFS code that this unlocking
is done for closing the fd.
filp_close()
locks_remove_posix(filp, id);
vfs_lock_file(filp, F_SETLK, &lock, NULL);
return filp->f_op->lock(filp, cmd, fl) => cifs_lock()
rc = cifs_setlk(file, flock, type, wait_flag, posix_lck, lock, unlock, xid);
rc = server->ops->mand_unlock_range(cfile, flock, xid);
if (flock->fl_flags & FL_POSIX && !rc)
rc = locks_lock_file_wait(file, flock)
Notice how we don't call locks_lock_file_wait() which does the
generic VFS lock/unlock/wait work on the inode if rc != 0.
If we are closing the handle, the SMB server is supposed to remove any
locks associated with it. Similarly, cifs.ko frees and wakes up any
lock and lock waiter when closing the file:
cifs_close()
cifsFileInfo_put(file->private_data)
/*
* Delete any outstanding lock records. We'll lose them when the file
* is closed anyway.
*/
down_write(&cifsi->lock_sem);
list_for_each_entry_safe(li, tmp, &cifs_file->llist->locks, llist) {
list_del(&li->llist);
cifs_del_lock_waiters(li);
kfree(li);
}
list_del(&cifs_file->llist->llist);
kfree(cifs_file->llist);
up_write(&cifsi->lock_sem);
So we can safely ignore unlocking failures in cifs_lock() if they
happen with the FL_CLOSE flag hint set as both the server and the
client take care of it during the actual closing.
This is not a proper fix for the unlocking failure but it's safe and
it seems to prevent the lock leakages and crashes the customer
experiences.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 6dfbd84684700cb58b34e8602c01c12f3d2595c8 upstream.
When we have a READ lease for a file and have just issued a write
operation to the server we need to purge the cache and set oplock/lease
level to NONE to avoid reading stale data. Currently we do that
only if a write operation succedeed thus not covering cases when
a request was sent to the server but a negative error code was
returned later for some other reasons (e.g. -EIOCBQUEUED or -EINTR).
Fix this by turning off caching regardless of the error code being
returned.
The patches fixes generic tests 075 and 112 from the xfs-tests.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 92a8109e4d3a34fb6b115c9098b51767dc933444 ]
The code tries to allocate a contiguous buffer with a size supplied by
the server (maxBuf). This could fail if memory is fragmented since it
results in high order allocations for commonly used server
implementations. It is also wasteful since there are probably
few locks in the usual case. Limit the buffer to be no larger than a
page to avoid memory allocation failures due to fragmentation.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit b9a74cde94957d82003fb9f7ab4777938ca851cd upstream.
If maxBuf is small but non-zero, it could result in a zero sized lock
element array which we would then try and access OOB.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 560d388950ceda5e7c7cdef7f3d9a8ff297bbf9d ]
cifs_relock_file() can perform a down_write() on the inode's lock_sem even
though it was already performed in cifs_strict_readv(). Lockdep complains
about this. AFAICS, there is no problem here, and lockdep just needs to be
told that this nesting is OK.
=============================================
[ INFO: possible recursive locking detected ]
4.11.0+ #20 Not tainted
---------------------------------------------
cat/701 is trying to acquire lock:
(&cifsi->lock_sem){++++.+}, at: cifs_reopen_file+0x7a7/0xc00
but task is already holding lock:
(&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&cifsi->lock_sem);
lock(&cifsi->lock_sem);
*** DEADLOCK ***
May be due to missing lock nesting notation
1 lock held by cat/701:
#0: (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310
stack backtrace:
CPU: 0 PID: 701 Comm: cat Not tainted 4.11.0+ #20
Call Trace:
dump_stack+0x85/0xc2
__lock_acquire+0x17dd/0x2260
? trace_hardirqs_on_thunk+0x1a/0x1c
? preempt_schedule_irq+0x6b/0x80
lock_acquire+0xcc/0x260
? lock_acquire+0xcc/0x260
? cifs_reopen_file+0x7a7/0xc00
down_read+0x2d/0x70
? cifs_reopen_file+0x7a7/0xc00
cifs_reopen_file+0x7a7/0xc00
? printk+0x43/0x4b
cifs_readpage_worker+0x327/0x8a0
cifs_readpage+0x8c/0x2a0
generic_file_read_iter+0x692/0xd00
cifs_strict_readv+0x29f/0x310
generic_file_splice_read+0x11c/0x1c0
do_splice_to+0xa5/0xc0
splice_direct_to_actor+0xfa/0x350
? generic_pipe_buf_nosteal+0x10/0x10
do_splice_direct+0xb5/0xe0
do_sendfile+0x278/0x3a0
SyS_sendfile64+0xc4/0xe0
entry_SYSCALL_64_fastpath+0x1f/0xbe
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f04a703c3d613845ae3141bfaf223489de8ab3eb upstream.
If cifs_zap_mapping() returned an error, we would return without putting
the xid that we got earlier. Restructure cifs_file_strict_mmap() and
cifs_file_mmap() to be more similar to each other and have a single
point of return that always puts the xid.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1013e760d10e614dc10b5624ce9fc41563ba2e65 upstream.
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1fa839b4986d648b907d117275869a0e46c324b9 upstream.
This fixes Continuous Availability when errors during
file reopen are encountered.
cifs_user_readv and cifs_user_writev would wait for ever if
results of cifs_reopen_file are not stored and for later inspection.
In fact, results are checked and, in case of errors, a chain
of function calls leading to reads and writes to be scheduled in
a separate thread is skipped.
These threads will wake up the corresponding waiters once reads
and writes are done.
However, given the return value is not stored, when rc is checked
for errors a previous one (always zero) is inspected instead.
This leads to pending reads/writes added to the list, making
cifs_user_readv and cifs_user_writev wait for ever.
Signed-off-by: Germano Percossi <germano.percossi@citrix.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9c25702cee1405099f982894c865c163de7909a8 upstream.
Currently we call copy_page_to_iter() for uncached reading into a pipe.
This is wrong because it treats pages as VFS cache pages and copies references
rather than actual data. When we are trying to read from the pipe we end up
calling page_cache_pipe_buf_confirm() which returns -ENODATA. This error
is translated into 0 which is returned to a user.
This issue is reproduced by running xfs-tests suite (generic test #249)
against mount points with "cache=none". Fix it by mapping pages manually
and calling copy_to_iter() that copies data into the pipe.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 96a988ffeb90dba33a71c3826086fe67c897a183 upstream.
With the current code it is possible to lock a mutex twice when
a subsequent reconnects are triggered. On the 1st reconnect we
reconnect sessions and tcons and then persistent file handles.
If the 2nd reconnect happens during the reconnecting of persistent
file handles then the following sequence of calls is observed:
cifs_reopen_file -> SMB2_open -> small_smb2_init -> smb2_reconnect
-> cifs_reopen_persistent_file_handles -> cifs_reopen_file (again!).
So, we are trying to acquire the same cfile->fh_mutex twice which
is wrong. Fix this by moving reconnecting of persistent handles to
the delayed work (smb2_reconnect_server) and submitting this work
every time we reconnect tcon in SMB2 commands handling codepath.
This can also lead to corruption of a temporary file list in
cifs_reopen_persistent_file_handles() because we can recursively
call this function twice.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We are already doing the same thing for an ordinary open case:
we can't keep read oplock on a file if we have mandatory byte-range
locks because pagereading can conflict with these locks on a server.
Fix it by setting oplock level to NONE.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
openFileList of tcon can be changed while cifs_reopen_file() is called
that can lead to an unexpected behavior when we return to the loop.
Fix this by introducing a temp list for keeping all file handles that
need to be reopen.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Continuous Availability features like persistent handles
require that clients reconnect their open files, not
just the sessions, soon after the network connection comes
back up, otherwise the server will throw away the state
(byte range locks, leases, deny modes) on those handles
after a timeout.
Add code to reconnect handles when use_persistent set
(e.g. Continuous Availability shares) after tree reconnect.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Germano Percossi <germano.percossi@citrix.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Remove the global file_list_lock to simplify cifs/smb3 locking and
have spinlocks that more closely match the information they are
protecting.
Add new tcon->open_file_lock and file->file_info_lock spinlocks.
Locks continue to follow a heirachy,
cifs_socket --> cifs_ses --> cifs_tcon --> cifs_file
where global tcp_ses_lock still protects socket and cifs_ses, while the
the newer locks protect the lower level structure's information
(tcon and cifs_file respectively).
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <steve.french@primarydata.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Germano Percossi <germano.percossi@citrix.com>
Pull more vfs updates from Al Viro:
">rename2() work from Miklos + current_time() from Deepa"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Replace current_fs_time() with current_time()
fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
fs: Replace CURRENT_TIME with current_time() for inode timestamps
fs: proc: Delete inode time initializations in proc_alloc_inode()
vfs: Add current_time() api
vfs: add note about i_op->rename changes to porting
fs: rename "rename2" i_op to "rename"
vfs: remove unused i_op->rename
fs: make remaining filesystems use .rename2
libfs: support RENAME_NOREPLACE in simple_rename()
fs: support RENAME_NOREPLACE for local filesystems
ncpfs: fix unused variable warning
current_fs_time() uses struct super_block* as an argument.
As per Linus's suggestion, this is changed to take struct
inode* as a parameter instead. This is because the function
is primarily meant for vfs inode timestamps.
Also the function was renamed as per Arnd's suggestion.
Change all calls to current_fs_time() to use the new
current_time() function instead. current_fs_time() will be
deleted.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Vladimir has noticed that we might declare memcg oom even during
readahead because read_pages only uses GFP_KERNEL (with mapping_gfp
restriction) while __do_page_cache_readahead uses
page_cache_alloc_readahead which adds __GFP_NORETRY to prevent from
OOMs. This gfp mask discrepancy is really unfortunate and easily
fixable. Drop page_cache_alloc_readahead() which only has one user and
outsource the gfp_mask logic into readahead_gfp_mask and propagate this
mask from __do_page_cache_readahead down to read_pages.
This alone would have only very limited impact as most filesystems are
implementing ->readpages and the common implementation mpage_readpages
does GFP_KERNEL (with mapping_gfp restriction) again. We can tell it to
use readahead_gfp_mask instead as this function is called only during
readahead as well. The same applies to read_cache_pages.
ext4 has its own ext4_mpage_readpages but the path which has pages !=
NULL can use the same gfp mask. Btrfs, cifs, f2fs and orangefs are
doing a very similar pattern to mpage_readpages so the same can be
applied to them as well.
[akpm@linux-foundation.org: coding-style fixes]
[mhocko@suse.com: restrict gfp mask in mpage_alloc]
Link: http://lkml.kernel.org/r/20160610074223.GC32285@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/1465301556-26431-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Chris Mason <clm@fb.com>
Cc: Steve French <sfrench@samba.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Cc: Mike Marshall <hubcap@omnibond.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Changman Lee <cm224.lee@samsung.com>
Cc: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Right now, we send the tgid cross the wire. What we really want to send
though is a hashed fl_owner_t since samba treats this field as a generic
lockowner.
It turns out that because we enforce and release locks locally before
they are ever sent to the server, this patch makes no difference in
behavior. Still, setting OFD locks on the server using the process
pid seems wrong, so I think this patch still makes sense.
Signed-off-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilovsky@samba.org>
Acked-by: Sachin Prabhu <sprabhu@redhat.com>
Pull cifs iovec cleanups from Al Viro.
* 'sendmsg.cifs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
cifs: don't bother with kmap on read_pages side
cifs_readv_receive: use cifs_read_from_socket()
cifs: no need to wank with copying and advancing iovec on recvmsg side either
cifs: quit playing games with draining iovecs
cifs: merge the hash calculation helpers
Pull cifs updates from Steve French:
"Various small CIFS and SMB3 fixes (including some for stable)"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
remove directory incorrectly tries to set delete on close on non-empty directories
Update cifs.ko version to 2.09
fs/cifs: correctly to anonymous authentication for the NTLM(v2) authentication
fs/cifs: correctly to anonymous authentication for the NTLM(v1) authentication
fs/cifs: correctly to anonymous authentication for the LANMAN authentication
fs/cifs: correctly to anonymous authentication via NTLMSSP
cifs: remove any preceding delimiter from prefix_path
cifs: Use file_dentry()
CIFS may be used as lower layer of overlayfs and accessing f_path.dentry can
lead to a crash.
Fix by replacing direct access of file->f_path.dentry with the
file_dentry() accessor, which will always return a native object.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
The kiocb already has the new position, so use that. The only interesting
case is AIO, where we currently don't bother updating ki_pos. We're about
to free the kiocb after we're done, so we might as well update it to make
everyone's life simpler.
While we're at it also return the bytes written argument passed in if
we were successful so that the boilerplate error switch code in the
callers can go away.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This will allow us to do per-I/O sync file writes, as required by a lot
of fileservers or storage targets.
XXX: Will need a few additional audits for O_DSYNC
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Including blkdev_direct_IO and dax_do_io. It has to be ki_pos to actually
work, so eliminate the superflous argument.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Mostly direct substitution with occasional adjustment or removing
outdated comments.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).
Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
lock_page() must operate on the whole compound page. It doesn't make
much sense to lock part of compound page. Change code to use head
page's PG_locked, if tail page is passed.
This patch also gets rid of custom helper functions --
__set_page_locked() and __clear_page_locked(). They are replaced with
helpers generated by __SETPAGEFLAG/__CLEARPAGEFLAG. Tail pages to these
helper would trigger VM_BUG_ON().
SLUB uses PG_locked as a bit spin locked. IIUC, tail pages should never
appear there. VM_BUG_ON() is added to make sure that this assumption is
correct.
[akpm@linux-foundation.org: fix fs/cifs/file.c]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are many places which use mapping_gfp_mask to restrict a more
generic gfp mask which would be used for allocations which are not
directly related to the page cache but they are performed in the same
context.
Let's introduce a helper function which makes the restriction explicit and
easier to track. This patch doesn't introduce any functional changes.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWNsKlAAoJEAAOaEEZVoIVKNMP+QHb96HMNWnMlBE9jwPbBK/2
yM80sa6wRcbCF519sRFbmOheet4bgNSHixegtUez5kyqyI7Hr0tsRYvIo5/amAWX
EIh03fZoM+Bgm+dblYivorSrPmmx2UQ9RG6pUbcOPtxdCpQ79tfzVyYVykG5wcb5
NLSibG9s5USutOXPTatxDqS6P2QwvvWXHR5oX1mkU2W7nQXfHOdQKSuk5CqUeIWx
JSGIa+plS9fath1Ndu4pJ7atvU8cR0t+VeOqPmGoqqIDyGVbo45XgXZmk0xCxEs9
XsVSbdGBMAtA63xlZHFROADFNXIosay2zA7mdG0i3IrLRMQr/okQhTqBrFMKmj0m
cDMDNOs4j4M8JJPkwrJQ3S/1Tnl+zyAuKKTJwgvVnd1tcyTZjs3g77I9e84pSTsp
chL4FmfeR7dhk+YJgcnbzvnnP7tBbQcV0ET/ILVsDU7bNDujWlcDzYkbbWx70WLa
KobjmsW/OAGaQugIMA1oGLTexT1u9HtDYOw8JVNBKwlrnPKyFVb8X88gx2Laf34L
Qa04TdrFseuxbnBGifLyQTsLxgF9QalUo+51J0I4a7G3WX0U2Zuk+ZTbHc6ChhdW
d0oL2SEyToscRADRL0/u2CUR1dEXkdDXi3pxgvDs5PTJVU+lIy4czp/dI5JrjKUA
L7O27Kstgoe2GctHn6FI
=OYAZ
-----END PGP SIGNATURE-----
Merge tag 'locks-v4.4-1' of git://git.samba.org/jlayton/linux
Pull file locking updates from Jeff Layton:
"The largest series of changes is from Ben who offered up a set to add
a new helper function for setting locks based on the type set in
fl_flags. Dmitry also send in a fix for a potential race that he
found with KTSAN"
* tag 'locks-v4.4-1' of git://git.samba.org/jlayton/linux:
locks: cleanup posix_lock_inode_wait and flock_lock_inode_wait
Move locks API users to locks_lock_inode_wait()
locks: introduce locks_lock_inode_wait()
locks: Use more file_inode and fix a comment
fs: fix data races on inode->i_flctx
locks: change tracepoint for generic_add_lease
Instead of having users check for FL_POSIX or FL_FLOCK to call the correct
locks API function, use the check within locks_lock_inode_wait(). This
allows for some later cleanup.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Commit 6afdb859b710 ("mm: do not ignore mapping_gfp_mask in page cache
allocation paths") has caught some users of hardcoded GFP_KERNEL used in
the page cache allocation paths. This, however, wasn't complete and
there were others which went unnoticed.
Dave Chinner has reported the following deadlock for xfs on loop device:
: With the recent merge of the loop device changes, I'm now seeing
: XFS deadlock on my single CPU, 1GB RAM VM running xfs/073.
:
: The deadlocked is as follows:
:
: kloopd1: loop_queue_read_work
: xfs_file_iter_read
: lock XFS inode XFS_IOLOCK_SHARED (on image file)
: page cache read (GFP_KERNEL)
: radix tree alloc
: memory reclaim
: reclaim XFS inodes
: log force to unpin inodes
: <wait for log IO completion>
:
: xfs-cil/loop1: <does log force IO work>
: xlog_cil_push
: xlog_write
: <loop issuing log writes>
: xlog_state_get_iclog_space()
: <blocks due to all log buffers under write io>
: <waits for IO completion>
:
: kloopd1: loop_queue_write_work
: xfs_file_write_iter
: lock XFS inode XFS_IOLOCK_EXCL (on image file)
: <wait for inode to be unlocked>
:
: i.e. the kloopd, with it's split read and write work queues, has
: introduced a dependency through memory reclaim. i.e. that writes
: need to be able to progress for reads make progress.
:
: The problem, fundamentally, is that mpage_readpages() does a
: GFP_KERNEL allocation, rather than paying attention to the inode's
: mapping gfp mask, which is set to GFP_NOFS.
:
: The didn't used to happen, because the loop device used to issue
: reads through the splice path and that does:
:
: error = add_to_page_cache_lru(page, mapping, index,
: GFP_KERNEL & mapping_gfp_mask(mapping));
This has changed by commit aa4d86163e4 ("block: loop: switch to VFS
ITER_BVEC").
This patch changes mpage_readpage{s} to follow gfp mask set for the
mapping. There are, however, other places which are doing basically the
same.
lustre:ll_dir_filler is doing GFP_KERNEL from the function which
apparently uses GFP_NOFS for other allocations so let's make this
consistent.
cifs:readpages_get_pages is called from cifs_readpages and
__cifs_readpages_from_fscache called from the same path obeys mapping
gfp.
ramfs_nommu_expand_for_mapping is hardcoding GFP_KERNEL as well
regardless it uses mapping_gfp_mask for the page allocation.
ext4_mpage_readpages is the called from the page cache allocation path
same as read_pages and read_cache_pages
As I've noticed in my previous post I cannot say I would be happy about
sprinkling mapping_gfp_mask all over the place and it sounds like we
should drop gfp_mask argument altogether and use it internally in
__add_to_page_cache_locked that would require all the filesystems to use
mapping gfp consistently which I am not sure is the case here. From a
quick glance it seems that some file system use it all the time while
others are selective.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Dave Chinner <david@fromorbit.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With two exceptions (drm/qxl and drm/radeon) all vm_operations_struct
structs should be constant.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
posix_lock_file_wait may fail under certain circumstances, and its result is
usually checked/returned. But given the complexity of cifs, I'm not sure if
the result is intentially left unchecked and always expected to succeed.
Signed-off-by: Chengyu Song <csong84@gatech.edu>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Steve French <smfrench@gmail.com>
that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... returning -E... upon error and amount of data left in iter after
(possible) truncation upon success. Note, that normal case gives
a non-zero (positive) return value, so any tests for != 0 _must_ be
updated.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Conflicts:
fs/ext4/file.c