IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Just call inode_dio_wait directly instead of through a pointless wrapper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The only difference to nfs_file_fsync is the call to pnfs_sync_inode. But
pnfs_sync_inode is just an inline that calls a pNFS layout driver method
if CONFIG_PNFS is designed, and thus can be called just fine from the core
NFS module.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* multipath:
NFS add callback_ops to nfs4_proc_bind_conn_to_session_callback
pnfs/NFSv4.1: Add multipath capabilities to pNFS flexfiles servers over NFSv3
SUNRPC: Allow addition of new transports to a struct rpc_clnt
NFSv4.1: nfs4_proc_bind_conn_to_session must iterate over all connections
SUNRPC: Make NFS swap work with multipath
SUNRPC: Add a helper to apply a function to all the rpc_clnt's transports
SUNRPC: Allow caller to specify the transport to use
SUNRPC: Use the multipath iterator to assign a transport to each task
SUNRPC: Make rpc_clnt store the multipath iterators
SUNRPC: Add a structure to track multiple transports
SUNRPC: Make freeing of struct xprt rcu-safe
SUNRPC: Uninline xprt_get(); It isn't performance critical.
SUNRPC: Reorder rpc_task to put waitqueue related info in same cachelines
SUNRPC: Remove unused function rpc_task_reset_client
* nfsv41_cb:
NFSv4.x: Fix NFS4ERR_RETRY_UNCACHED_REP in nfs4_callback_sequence
NFSv4.x: Allow multiple callbacks in flight
NFSv4.x: Fix wraparound issues when validing the callback sequence id
NFSv4.x: Enforce the ca_maxresponsesize_cached on the back channel
NFSv4.x: CB_SEQUENCE should return NFS4ERR_DELAY if still executing
NFSv4.x: Remove hard coded slotids in callback channel
Replace another case where the layout 'plh_block_lgets' can trigger
infinite loops in send_layoutget().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If the server reboots while there is a layoutget outstanding, then
the call to pnfs_choose_layoutget_stateid() will fail with an EAGAIN
error, which causes an infinite loop in send_layoutget(). The reason
why we never break out of the loop is that the layout 'plh_block_lgets'
field is never cleared.
Fix is to replace plh_block_lgets with NFS_LAYOUT_INVALID_STID, which
can be reset after a new layoutget.
Fixes: ab7d763e47 ("pNFS: Ensure nfs4_layoutget_prepare returns...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The newly added NFS v4.2 operations (ALLOCATE, DEALLOCATE, SEEK and CLONE)
use a helper called nfs42_set_rw_stateid to select a stateid that is sent
to the server. But they don't set the inode and state fields in the
nfs4_exception structure, and this don't partake in the stateid recovery
protocol. Because of this they will simply return errors insted of trying
to recover a stateid when the server return a BAD_STATEID error.
Additionally CLONE has the problem that it operates on two files and thus
two stateids, and thus needs to call the exception handler twice to
recover stateids.
While we're at it stop grabbing an addititional reference to the open
context in all these operations - having the file open guarantees that
the open context won't go away.
All this can be produces with the generic/168 and generic/170 tests in
xfstests which stress the CLONE stateid handling.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
In the case where d_add_unique() finds an appropriate alias to use it will
have already incremented the reference count. An additional dget() to swap
the open context's dentry is unnecessary and will leak a reference.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: 275bb30786 ("NFSv4: Move dentry instantiation into the NFSv4-...")
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
When setting the layout return mode, we must always also set the
NFS_LAYOUT_RETURN_REQUESTED flag to ensure that we send a layoutreturn.
Otherwise pnfs_error_mark_layout_for_return() could set the mode, but
fail to send the layoutreturn because another is already in flight.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
We don't need to schedule a layoutreturn if the layout segment can
be freed immediately.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Merge fixes from Andrew Morton:
"22 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
epoll: restrict EPOLLEXCLUSIVE to POLLIN and POLLOUT
radix-tree: fix oops after radix_tree_iter_retry
MAINTAINERS: trim the file triggers for ABI/API
dax: dirty inode only if required
thp: make deferred_split_scan() work again
mm: replace vma_lock_anon_vma with anon_vma_lock_read/write
ocfs2/dlm: clear refmap bit of recovery lock while doing local recovery cleanup
um: asm/page.h: remove the pte_high member from struct pte_t
mm, hugetlb: don't require CMA for runtime gigantic pages
mm/hugetlb: fix gigantic page initialization/allocation
mm: downgrade VM_BUG in isolate_lru_page() to warning
mempolicy: do not try to queue pages from !vma_migratable()
mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress
vmstat: make vmstat_update deferrable
mm, vmstat: make quiet_vmstat lighter
mm/Kconfig: correct description of DEFERRED_STRUCT_PAGE_INIT
memblock: don't mark memblock_phys_mem_size() as __init
dump_stack: avoid potential deadlocks
mm: validate_mm browse_rb SMP race condition
m32r: fix build failure due to SMP and MMU
...
Pull Ceph fixes from Sage Weil:
"We have a few wire protocol compatibility fixes, ports of a few recent
CRUSH mapping changes, and a couple error path fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
libceph: MOSDOpReply v7 encoding
libceph: advertise support for TUNABLES5
crush: decode and initialize chooseleaf_stable
crush: add chooseleaf_stable tunable
crush: ensure take bucket value is valid
crush: ensure bucket id is valid before indexing buckets array
ceph: fix snap context leak in error path
ceph: checking for IS_ERR instead of NULL
In the current implementation of the EPOLLEXCLUSIVE flag (added for
4.5-rc1), if epoll waiters create different POLL* sets and register them
as exclusive against the same target fd, the current implementation will
stop waking any further waiters once it finds the first idle waiter.
This means that waiters could miss wakeups in certain cases.
For example, when we wake up a pipe for reading we do:
wake_up_interruptible_sync_poll(&pipe->wait, POLLIN | POLLRDNORM); So if
one epoll set or epfd is added to pipe p with POLLIN and a second set
epfd2 is added to pipe p with POLLRDNORM, only epfd may receive the
wakeup since the current implementation will stop after it finds any
intersection of events with a waiter that is blocked in epoll_wait().
We could potentially address this by requiring all epoll waiters that
are added to p be required to pass the same set of POLL* events. IE the
first EPOLL_CTL_ADD that passes EPOLLEXCLUSIVE establishes the set POLL*
flags to be used by any other epfds that are added as EPOLLEXCLUSIVE.
However, I think it might be somewhat confusing interface as we would
have to reference count the number of users for that set, and so
userspace would have to keep track of that count, or we would need a
more involved interface. It also adds some shared state that we'd have
store somewhere. I don't think anybody will want to bloat
__wait_queue_head for this.
I think what we could do instead, is to simply restrict EPOLLEXCLUSIVE
such that it can only be specified with EPOLLIN and/or EPOLLOUT. So
that way if the wakeup includes 'POLLIN' and not 'POLLOUT', we can stop
once we hit the first idle waiter that specifies the EPOLLIN bit, since
any remaining waiters that only have 'POLLOUT' set wouldn't need to be
woken. Likewise, we can do the same thing if 'POLLOUT' is in the wakeup
bit set and not 'POLLIN'. If both 'POLLOUT' and 'POLLIN' are set in the
wake bit set (there is at least one example of this I saw in fs/pipe.c),
then we just wake the entire exclusive list. Having both 'POLLOUT' and
'POLLIN' both set should not be on any performance critical path, so I
think that's ok (in fs/pipe.c its in pipe_release()). We also continue
to include EPOLLERR and EPOLLHUP by default in any exclusive set. Thus,
the user can specify EPOLLERR and/or EPOLLHUP but is not required to do
so.
Since epoll waiters may be interested in other events as well besides
EPOLLIN, EPOLLOUT, EPOLLERR and EPOLLHUP, these can still be added by
doing a 'dup' call on the target fd and adding that as one normally
would with EPOLL_CTL_ADD. Since I think that the POLLIN and POLLOUT
events are what we are interest in balancing, I think that the 'dup'
thing could perhaps be added to only one of the waiter threads.
However, I think that EPOLLIN, EPOLLOUT, EPOLLERR and EPOLLHUP should be
sufficient for the majority of use-cases.
Since EPOLLEXCLUSIVE is intended to be used with a target fd shared
among multiple epfds, where between 1 and n of the epfds may receive an
event, it does not satisfy the semantics of EPOLLONESHOT where only 1
epfd would get an event. Thus, it is not allowed to be specified in
conjunction with EPOLLEXCLUSIVE.
EPOLL_CTL_MOD is also not allowed if the fd was previously added as
EPOLLEXCLUSIVE. It seems with the limited number of flags to not be as
interesting, but this could be relaxed at some further point.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Tested-by: Madars Vitolins <m@silodev.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Eric Wong <normalperson@yhbt.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When recovery master down, dlm_do_local_recovery_cleanup() only remove
the $RECOVERY lock owned by dead node, but do not clear the refmap bit.
Which will make umount thread falling in dead loop migrating $RECOVERY
to the dead node.
Signed-off-by: xuejiufei <xuejiufei@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Previously the pfn_mkwrite() fault handler for raw block devices called
bldev_dax_fault() -> __dax_fault() to do a full DAX page fault.
Really what the pfn_mkwrite() fault handler needs to do is call
dax_pfn_mkwrite() to make sure that the radix tree entry for the given
PTE is marked as dirty so that a follow-up fsync or msync call will
flush it durably to media.
Fixes: 5a023cdba5 ("block: enable dax for raw block devices")
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds multipathing to pNFS over NFSv3 as described in the flexfiles
draft spec.
Ideally, we'd like to do the same for pNFS files, but the NFSv4.1 protocol
requires a call to EXCHANGE_ID in order to test that the connection can do
session trunking.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Use the new helper to ensure that nfs4_proc_bind_conn_to_session() is called
for all connections.
However ensure that we only set the backchannel flag for the connection
pointed to by rpc_clnt->cl_xprt.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Merge fixes from Andrew Morton:
"18 fixes"
[ The 18 fixes turned into 17 commits, because one of the fixes was a
fix for another patch in the series that I just folded in by editing
the patch manually - hopefully correctly - Linus ]
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: fix memory leak in copy_huge_pmd()
drivers/hwspinlock: fix race between radix tree insertion and lookup
radix-tree: fix race in gang lookup
mm/vmpressure.c: fix subtree pressure detection
mm: polish virtual memory accounting
mm: warn about VmData over RLIMIT_DATA
Documentation: cgroup-v2: add memory.stat::sock description
mm: memcontrol: drop superfluous entry in the per-memcg stats array
drivers/scsi/sg.c: mark VMA as VM_IO to prevent migration
proc: revert /proc/<pid>/maps [stack:TID] annotation
numa: fix /proc/<pid>/numa_maps for hugetlbfs on s390
MAINTAINERS: update Seth email
ocfs2/cluster: fix memory leak in o2hb_region_release
lib/test-string_helpers.c: fix and improve string_get_size() tests
thp: limit number of object to scan on deferred_split_scan()
thp: change deferred_split_count() to return number of THP in queue
thp: make split_queue per-node
Commit b76437579d ("procfs: mark thread stack correctly in
proc/<pid>/maps") added [stack:TID] annotation to /proc/<pid>/maps.
Finding the task of a stack VMA requires walking the entire thread list,
turning this into quadratic behavior: a thousand threads means a
thousand stacks, so the rendering of /proc/<pid>/maps needs to look at a
million combinations.
The cost is not in proportion to the usefulness as described in the
patch.
Drop the [stack:TID] annotation to make /proc/<pid>/maps (and
/proc/<pid>/numa_maps) usable again for higher thread counts.
The [stack] annotation inside /proc/<pid>/task/<tid>/maps is retained, as
identifying the stack VMA there is an O(1) operation.
Siddesh said:
"The end users needed a way to identify thread stacks programmatically and
there wasn't a way to do that. I'm afraid I no longer remember (or have
access to the resources that would aid my memory since I changed
employers) the details of their requirement. However, I did do this on my
own time because I thought it was an interesting project for me and nobody
really gave any feedback then as to its utility, so as far as I am
concerned you could roll back the main thread maps information since the
information is available in the thread-specific files"
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com>
Cc: Shaohua Li <shli@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When working with hugetlbfs ptes (which are actually pmds) is not valid to
directly use pte functions like pte_present() because the hardware bit
layout of pmds and ptes can be different. This is the case on s390.
Therefore we have to convert the hugetlbfs ptes first into a valid pte
encoding with huge_ptep_get().
Currently the /proc/<pid>/numa_maps code uses hugetlbfs ptes without
huge_ptep_get(). On s390 this leads to the following two problems:
1) The pte_present() function returns false (instead of true) for
PROT_NONE hugetlb ptes. Therefore PROT_NONE vmas are missing
completely in the "numa_maps" output.
2) The pte_dirty() function always returns false for all hugetlb ptes.
Therefore these pages are reported as "mapped=xxx" instead of
"dirty=xxx".
Therefore use huge_ptep_get() to correctly convert the hugetlb ptes.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: <stable@vger.kernel.org> [4.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
o2hb_region_release currently doesn't free o2hb_debug_buf
hr_db_elapsed_time and hr_db_pinned malloced in o2hb_debug_create. Also
we should call debugfs_remove before freeing its data, to prevent the risk
accessing debugfs rightly after its data has been freed.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from David Miller:
"This looks like a lot but it's a mixture of regression fixes as well
as fixes for longer standing issues.
1) Fix on-channel cancellation in mac80211, from Johannes Berg.
2) Handle CHECKSUM_COMPLETE properly in xt_TCPMSS netfilter xtables
module, from Eric Dumazet.
3) Avoid infinite loop in UDP SO_REUSEPORT logic, also from Eric
Dumazet.
4) Avoid a NULL deref if we try to set SO_REUSEPORT after a socket is
bound, from Craig Gallek.
5) GRO key comparisons don't take lightweight tunnels into account,
from Jesse Gross.
6) Fix struct pid leak via SCM credentials in AF_UNIX, from Eric
Dumazet.
7) We need to set the rtnl_link_ops of ipv6 SIT tunnels before we
register them, otherwise the NEWLINK netlink message is missing
the proper attributes. From Thadeu Lima de Souza Cascardo.
8) Several Spectrum chip bug fixes for mlxsw switch driver, from Ido
Schimmel
9) Handle fragments properly in ipv4 easly socket demux, from Eric
Dumazet.
10) Don't ignore the ifindex key specifier on ipv6 output route
lookups, from Paolo Abeni"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (128 commits)
tcp: avoid cwnd undo after receiving ECN
irda: fix a potential use-after-free in ircomm_param_request
net: tg3: avoid uninitialized variable warning
net: nb8800: avoid uninitialized variable warning
net: vxge: avoid unused function warnings
net: bgmac: clarify CONFIG_BCMA dependency
net: hp100: remove unnecessary #ifdefs
net: davinci_cpdma: use dma_addr_t for DMA address
ipv6/udp: use sticky pktinfo egress ifindex on connect()
ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()
netlink: not trim skb for mmaped socket when dump
vxlan: fix a out of bounds access in __vxlan_find_mac
net: dsa: mv88e6xxx: fix port VLAN maps
fib_trie: Fix shift by 32 in fib_table_lookup
net: moxart: use correct accessors for DMA memory
ipv4: ipconfig: avoid unused ic_proto_used symbol
bnxt_en: Fix crash in bnxt_free_tx_skbs() during tx timeout.
bnxt_en: Exclude rx_drop_pkts hw counter from the stack's rx_dropped counter.
bnxt_en: Ring free response from close path should use completion ring
net_sched: drr: check for NULL pointer in drr_dequeue
...
Pull libnvdimm fixes from Dan Williams:
"1/ Fixes to the libnvdimm 'pfn' device that establishes a reserved
area for storing a struct page array.
2/ Fixes for dax operations on a raw block device to prevent pagecache
collisions with dax mappings.
3/ A fix for pfn_t usage in vm_insert_mixed that lead to a null
pointer de-reference.
These have received build success notification from the kbuild robot
across 153 configs and pass the latest ndctl tests"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
phys_to_pfn_t: use phys_addr_t
mm: fix pfn_t to page conversion in vm_insert_mixed
block: use DAX for partition table reads
block: revert runtime dax control of the raw block device
fs, block: force direct-I/O for dax-enabled block devices
devm_memremap_pages: fix vmem_altmap lifetime + alignment handling
libnvdimm, pfn: fix restoring memmap location
libnvdimm: fix mode determination for e820 devices
We need to initialize cb_sequenceres information when reporting a
NFS4ERR_RETRY_UNCACHED_REP error, since that will apply to the
next operation, not to the CB_SEQUENCE itself.
Reported-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Pull timer fixes from Thomas Gleixner:
"The timer departement delivers:
- a regression fix for the NTP code along with a proper selftest
- prevent a spurious timer interrupt in the NOHZ lowres code
- a fix for user space interfaces returning the remaining time on
architectures with CONFIG_TIME_LOW_RES=y
- a few patches to fix COMPILE_TEST fallout"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/nohz: Set the correct expiry when switching to nohz/lowres mode
clocksource: Fix dependencies for archs w/o HAS_IOMEM
clocksource: Select CLKSRC_MMIO where needed
tick/sched: Hide unused oneshot timer code
kselftests: timers: Add adjtimex SETOFFSET validity tests
ntp: Fix ADJ_SETOFFSET being used w/ ADJ_NANO
itimers: Handle relative timers with CONFIG_TIME_LOW_RES proper
posix-timers: Handle relative timers with CONFIG_TIME_LOW_RES proper
timerfd: Handle relative timers with CONFIG_TIME_LOW_RES proper
hrtimer: Handle remaining time proper for TIME_LOW_RES
clockevents/tcb_clksrc: Prevent disabling an already disabled clock
Johan Hedberg says:
====================
pull request: bluetooth 2016-01-30
Here's a set of important Bluetooth fixes for the 4.5 kernel:
- Two fixes to 6LoWPAN code (one fixing a potential crash)
- Fix LE pairing with devices using both public and random addresses
- Fix allocation of dynamic LE PSM values
- Fix missing COMPATIBLE_IOCTL for UART line discipline
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid populating pagecache when the block device is in DAX mode.
Otherwise these page cache entries collide with the fsync/msync
implementation and break data durability guarantees.
Cc: Jan Kara <jack@suse.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dynamically enabling DAX requires that the page cache first be flushed
and invalidated. This must occur atomically with the change of DAX mode
otherwise we confuse the fsync/msync tracking and violate data
durability guarantees. Eliminate the possibilty of DAX-disabled to
DAX-enabled transitions for now and revisit this for the next cycle.
Cc: Jan Kara <jack@suse.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull btrfs fixes from Chris Mason:
"Dave had a small collection of fixes to the new free space tree code,
one of which was keeping our sysfs files more up to date with feature
bits as different things get enabled (lzo, raid5/6, etc).
I should have kept the sysfs stuff for rc3, since we always manage to
trip over something. This time it was GFP_KERNEL from somewhere that
is NOFS only. Instead of rebasing it out I've put a revert in, and
we'll fix it properly for rc3.
Otherwise, Filipe fixed a btrfs DIO race and Qu Wenruo fixed up a
use-after-free in our tracepoints that Dave Jones reported"
* 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Revert "btrfs: synchronize incompat feature bits with sysfs files"
btrfs: don't use GFP_HIGHMEM for free-space-tree bitmap kzalloc
btrfs: sysfs: check initialization state before updating features
Revert "btrfs: clear PF_NOFREEZE in cleaner_kthread()"
btrfs: async-thread: Fix a use-after-free error for trace
Btrfs: fix race between fsync and lockless direct IO writes
btrfs: add free space tree to the cow-only list
btrfs: add free space tree to lockdep classes
btrfs: tweak free space tree bitmap allocation
btrfs: tests: switch to GFP_KERNEL
btrfs: synchronize incompat feature bits with sysfs files
btrfs: sysfs: introduce helper for syncing bits with sysfs files
btrfs: sysfs: add free-space-tree bit attribute
btrfs: sysfs: fix typo in compat_ro attribute definition
This reverts commit 14e46e0495.
This ends up doing sysfs operations from deep in balance (where we
should be GFP_NOFS) and under heavy balance load, we're making races
against sysfs internals.
Revert it for now while we figure things out.
Signed-off-by: Chris Mason <clm@fb.com>
NFS_LAYOUT_RETURN_BEFORE_CLOSE is being used to signal that a
layoutreturn is needed, either due to a layout recall or to a
layout error. Rename it to NFS_LAYOUT_RETURN_REQUESTED in order
to clarify its purpose.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The HCIUARTGETDEVICE, HCIUARTSETFLAGS and HCIUARTGETFLAGS ioctl are
missing the COMPATIBLE_IOCTL declaration.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
If the mount phase is not finished, we can't update the sysfs files.
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
The layoutreturn code currently relies on pnfs_put_lseg() to initiate the
RPC call when conditions are right. A problem arises when we want to
free the layout segment from inside an inode->i_lock section (e.g. in
pnfs_clear_request_commit()), since we cannot sleep.
The workaround is to move the actual call to pnfs_send_layoutreturn()
to pnfs_put_layout_hdr(), which doesn't have this restriction.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Parameter of trace_btrfs_work_queued() can be freed in its workqueue.
So no one use use that pointer after queue_work().
Fix the user-after-free bug by move the trace line before queue_work().
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
An fsync, using the fast path, can race with a concurrent lockless direct
IO write and end up logging a file extent item that points to an extent
that wasn't written to yet. This is because the fast fsync path collects
ordered extents into a local list and then collects all the new extent
maps to log file extent items based on them, while the direct IO write
path creates the new extent map before it creates the corresponding
ordered extent (and submitting the respective bio(s)).
So fix this by making the direct IO write path create ordered extents
before the extent maps and make the fast fsync path collect any new
ordered extents after it collects the extent maps.
Note that making the fsync handler call inode_dio_wait() (after acquiring
the inode's i_mutex) would not work and lead to a deadlock when doing
AIO, as through AIO we end up in a path where the fsync handler is called
(through dio_aio_complete_work() -> dio_complete() -> vfs_fsync_range())
before the inode's dio counter is decremented (inode_dio_wait() waits
for this counter to have a value of zero).
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Hook the callback channel into the same session management machinery
as we use for the forward channel.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>