IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
XFS only supports the unwritten extent bit in the data fork, and only if
the file system has a version 5 superblock or the unwritten extent
feature bit.
We currently have two routines that validate the invariant:
xfs_check_nostate_extents which return -EFSCORRUPTED when it's not met,
and xfs_validate_extent that triggers and assert in debug build.
Both of them iterate over all extents of an inode fork when called,
which isn't very efficient.
This patch instead adds a new helper that verifies the invariant one
extent at a time, and calls it from the places where we iterate over
all extents to converted them from or two the in-memory format. The
callers then return -EFSCORRUPTED when reading invalid extents from
disk, or trigger an assert when writing them to disk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
We only ever use the normal and unwritten states. And the actual
ondisk format (this enum isn't despite being in xfs_format.h) only
has space for the unwritten bit anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
On some architectures do_div does the pointer compare
trick to make sure that we've sent it an unsigned 64-bit
number. (Why unsigned? I don't know.)
Fix up the few places that squawk about this; in
xfs_bmap_wants_extents() we just used a bare int64_t so change
that to unsigned.
In xfs_adjust_extent_unmap_boundaries() all we wanted was the
mod, and we have an xfs-specific function to handle that w/o
side effects, which includes proper casting for do_div.
In xfs_daddr_to_ag[b]no, we were using the wrong type anyway;
XFS_BB_TO_FSBT returns a block in the filesystem, so use
xfs_rfsblock_t not xfs_daddr_t, and gain the unsignedness
from that type as a bonus.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The kbuild test robot caught this; in debug code we have another
caller of do_div with a 32-bit dividend (j) which is caught now
that we are using the kernel-supplied do_div.
None of the values used here are 64-bit; just use simple division.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The trailing newlines wil lead to extra newlines in the trace file
which looks like the following output, so remove them.
>kworker/4:1H-1508 [004] .... 47879.101608: xfs_discard_extent: dev 8:0
>
>kworker/u16:2-238 [004] .... 47879.101725: xfs_extent_busy_clear: dev 8:0
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: fix the getfsmap tracepoints too]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Directory block readahead uses a complex iteration mechanism to map
between high-level directory blocks and underlying physical extents.
This mechanism attempts to traverse the higher-level dir blocks in a
manner that handles multi-fsb directory blocks and simultaneously
maintains a reference to the corresponding physical blocks.
This logic doesn't handle certain (discontiguous) physical extent
layouts correctly with multi-fsb directory blocks. For example,
consider the case of a 4k FSB filesystem with a 2 FSB (8k) directory
block size and a directory with the following extent layout:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
0: [0..7]: 88..95 0 (88..95) 8
1: [8..15]: 80..87 0 (80..87) 8
2: [16..39]: 168..191 0 (168..191) 24
3: [40..63]: 5242952..5242975 1 (72..95) 24
Directory block 0 spans physical extents 0 and 1, dirblk 1 lies
entirely within extent 2 and dirblk 2 spans extents 2 and 3. Because
extent 2 is larger than the directory block size, the readahead code
erroneously assumes the block is contiguous and issues a readahead
based on the physical mapping of the first fsb of the dirblk. This
results in read verifier failure and a spurious corruption or crc
failure, depending on the filesystem format.
Further, the subsequent readahead code responsible for walking
through the physical table doesn't correctly advance the physical
block reference for dirblk 2. Instead of advancing two physical
filesystem blocks, the first iteration of the loop advances 1 block
(correctly), but the subsequent iteration advances 2 more physical
blocks because the next physical extent (extent 3, above) happens to
cover more than dirblk 2. At this point, the higher-level directory
block walking is completely off the rails of the actual physical
layout of the directory for the respective mapping table.
Update the contiguous dirblock logic to consider the current offset
in the physical extent to avoid issuing directory readahead to
unrelated blocks. Also, update the mapping table advancing code to
consider the current offset within the current dirblock to avoid
advancing the mapping reference too far beyond the dirblock.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Carlos had a case where "find" seemed to start spinning
forever and never return.
This was on a filesystem with non-default multi-fsb (8k)
directory blocks, and a fragmented directory with extents
like this:
0:[0,133646,2,0]
1:[2,195888,1,0]
2:[3,195890,1,0]
3:[4,195892,1,0]
4:[5,195894,1,0]
5:[6,195896,1,0]
6:[7,195898,1,0]
7:[8,195900,1,0]
8:[9,195902,1,0]
9:[10,195908,1,0]
10:[11,195910,1,0]
11:[12,195912,1,0]
12:[13,195914,1,0]
...
i.e. the first extent is a contiguous 2-fsb dir block, but
after that it is fragmented into 1 block extents.
At the top of the readdir path, we allocate a mapping array
which (for this filesystem geometry) can hold 10 extents; see
the assignment to map_info->map_size. During readdir, we are
therefore able to map extents 0 through 9 above into the array
for readahead purposes. If we count by 2, we see that the last
mapped index (9) is the first block of a 2-fsb directory block.
At the end of xfs_dir2_leaf_readbuf() we have 2 loops to fill
more readahead; the outer loop assumes one full dir block is
processed each loop iteration, and an inner loop that ensures
that this is so by advancing to the next extent until a full
directory block is mapped.
The problem is that this inner loop may step past the last
extent in the mapping array as it tries to reach the end of
the directory block. This will read garbage for the extent
length, and as a result the loop control variable 'j' may
become corrupted and never fail the loop conditional.
The number of valid mappings we have in our array is stored
in map->map_valid, so stop this inner loop based on that limit.
There is an ASSERT at the top of the outer loop for this
same condition, but we never made it out of the inner loop,
so the ASSERT never fired.
Huge appreciation for Carlos for debugging and isolating
the problem.
Debugged-and-analyzed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
On a ppc64 machine executing overlayfs/019 with xfs as the lower and
upper filesystem causes the following call trace,
WARNING: CPU: 2 PID: 8034 at /root/repos/linux/fs/iomap.c:765 .iomap_dio_actor+0xcc/0x420
Modules linked in:
CPU: 2 PID: 8034 Comm: fsstress Tainted: G L 4.11.0-rc5-next-20170405 #100
task: c000000631314880 task.stack: c0000003915d4000
NIP: c00000000035a72c LR: c00000000035a6f4 CTR: c00000000035a660
REGS: c0000003915d7570 TRAP: 0700 Tainted: G L (4.11.0-rc5-next-20170405)
MSR: 800000000282b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>
CR: 24004284 XER: 00000000
CFAR: c0000000006f7190 SOFTE: 1
GPR00: c00000000035a6f4 c0000003915d77f0 c0000000015a3f00 000000007c22f600
GPR04: 000000000022d000 0000000000002600 c0000003b2d56360 c0000003915d7960
GPR08: c0000003915d7cd0 0000000000000002 0000000000002600 c000000000521cc0
GPR12: 0000000024004284 c00000000fd80a00 000000004b04ae64 ffffffffffffffff
GPR16: 000000001000ca70 0000000000000000 c0000003b2d56380 c00000000153d2b8
GPR20: 0000000000000010 c0000003bc87bac8 0000000000223000 000000000022f5ff
GPR24: c0000003b2d56360 000000000000000c 0000000000002600 000000000022d000
GPR28: 0000000000000000 c0000003915d7960 c0000003b2d56360 00000000000001ff
NIP [c00000000035a72c] .iomap_dio_actor+0xcc/0x420
LR [c00000000035a6f4] .iomap_dio_actor+0x94/0x420
Call Trace:
[c0000003915d77f0] [c00000000035a6f4] .iomap_dio_actor+0x94/0x420 (unreliable)
[c0000003915d78f0] [c00000000035b9f4] .iomap_apply+0xf4/0x1f0
[c0000003915d79d0] [c00000000035c320] .iomap_dio_rw+0x230/0x420
[c0000003915d7ae0] [c000000000512a14] .xfs_file_dio_aio_read+0x84/0x160
[c0000003915d7b80] [c000000000512d24] .xfs_file_read_iter+0x104/0x130
[c0000003915d7c10] [c0000000002d6234] .__vfs_read+0x114/0x1a0
[c0000003915d7cf0] [c0000000002d7a8c] .vfs_read+0xac/0x1a0
[c0000003915d7d90] [c0000000002d96b8] .SyS_read+0x58/0x100
[c0000003915d7e30] [c00000000000b8e0] system_call+0x38/0xfc
Instruction dump:
78630020 7f831b78 7ffc07b4 7c7ce039 40820360 a13d0018 2f890003 419e0288
2f890004 419e00a0 2f890001 419e02a8 <0fe00000> 3b80fffb 38210100 7f83e378
The above problem can also be recreated on a regular xfs filesystem
using the command,
$ fsstress -d /mnt -l 1000 -n 1000 -p 1000
The reason for the call trace is,
1. When 'reserving' blocks for delayed allocation , XFS reserves more
blocks (i.e. past file's current EOF) than required. This is done
because XFS assumes that userspace might write more data and hence
'reserving' more blocks might lead to the file's new data being
stored contiguously on disk.
2. The in-memory 'struct xfs_bmbt_irec' mapping the file's last extent would
then cover the prealloc-ed EOF blocks in addition to the regular blocks.
3. When flushing the dirty blocks to disk, we only flush data till the
file's EOF. But before writing out the dirty data, we allocate blocks
on the disk for holding the file's new data. This allocation includes
the blocks that are part of the 'prealloc EOF blocks'.
4. Later, when the last reference to the inode is being closed, XFS frees the
unused 'prealloc EOF blocks' in xfs_inactive().
In step 3 above, When allocating space on disk for the delayed allocation
range, the space allocator might sometimes allocate less blocks than
required. If such an allocation ends right at the current EOF of the
file, We will not be able to clear the "delayed allocation" flag for the
'prealloc EOF blocks', since we won't have dirty buffer heads associated
with that range of the file.
In such a situation if a Direct I/O read operation is performed on file
range [X, Y] (where X < EOF and Y > EOF), we flush dirty data in the
range [X, Y] and invalidate page cache for that range (Refer to
iomap_dio_rw()). Later for performing the Direct I/O read, XFS obtains
the extent items (which are still cached in memory) for the file
range. When doing so we are not supposed to get an extent item with
IOMAP_DELALLOC flag set, since the previous "flush" operation should
have converted any delayed allocation data in the range [X, Y]. Hence we
end up hitting a WARN_ON_ONCE(1) statement in iomap_dio_actor().
This commit fixes the bug by preventing the read operation from going
beyond iomap_dio->i_size.
Reported-by: Santhosh G <santhog4@linux.vnet.ibm.com>
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Now that reflink operations don't set the firstblock value we don't
need the workarounds for non-NULL firstblock values without a prior
allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The main thing that xfs_bmap_remap_alloc does is fixing the AGFL, similar
to what we do in the space allocator. But the reflink code doesn't touch
the allocation btree unlike the normal space allocator, so we couldn't
care less about the state of the AGFL.
So remove xfs_bmap_remap_alloc and just handle the di_nblocks update in
the caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Add a new helper to be used for reflink extent list additions instead of
funneling them through xfs_bmapi_write and overloading the firstblock
member in struct xfs_bmalloca and struct xfs_alloc_args.
With some small changes to xfs_bmap_remap_alloc this also means we do
not need a xfs_bmalloca structure for this case at all.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
For the reflink case we'd much rather pass the required arguments than
faking up a struct xfs_bmalloca.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
We never do COW operations for the attr fork, so don't pretend we handle
them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
bno should be a xfs_fsblock_t, which is 64-bit wides instead of a
xfs_aglock_t, which truncates the value to 32 bits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
When delivering an event to userspace for a file on an NFS share,
if the file is deleted on server side before user reads the event,
user will not get the event.
If the event queue contained several events, the stale event is
quietly dropped and read() returns to user with events read so far
in the buffer.
If the event queue contains a single stale event or if the stale
event is a permission event, read() returns to user with the kernel
internal error code 518 (EOPENSTALE), which is not a POSIX error code.
Check the internal return value -EOPENSTALE in fanotify_read(), just
the same as it is checked in path_openat() and drop the event in the
cases that it is not already dropped.
This is a reproducer from Marko Rauhamaa:
Just take the example program listed under "man fanotify" ("fantest")
and follow these steps:
==============================================================
NFS Server NFS Client(1) NFS Client(2)
==============================================================
# echo foo >/nfsshare/bar.txt
# cat /nfsshare/bar.txt
foo
# ./fantest /nfsshare
Press enter key to terminate.
Listening for events.
# rm -f /nfsshare/bar.txt
# cat /nfsshare/bar.txt
read: Unknown error 518
cat: /nfsshare/bar.txt: Operation not permitted
==============================================================
where NFS Client (1) and (2) are two terminal sessions on a single NFS
Client machine.
Reported-by: Marko Rauhamaa <marko.rauhamaa@f-secure.com>
Tested-by: Marko Rauhamaa <marko.rauhamaa@f-secure.com>
Cc: <linux-api@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Async request may be throttled in block layer, so page for async may keep WRITE_BACK
for a long time.
For encrytped inode, we need wait on page writeback no matter if the device supports
BDI_CAP_STABLE_WRITES. This may result in a higher waiting page writeback time for
async encrypted inode page.
This patch skips IPU for encrypted inode's updating write.
Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch also reverts d0db7703ac1 ("f2fs: do SSR in higher priority").
This patch fixes out of free segments caused by many small file creation by
1) mkfs -s 1 2G
2) mount
3) untar
- preoduce 60000 small files burstly
4) sync
- flush node pages
- flush imeta
Here, when we do f2fs_balance_fs, we missed # of imeta blocks, resulting in
skipping to check has_not_enough_free_secs.
Another test is done by
1) mkfs -s 12 2G
2) mount
3) untar
- preoduce 60000 small files burstly
4) sync
- flush node pages
- flush imeta
In this case, this patch also fixes wrong block allocation under large section
size.
Reported-by: William Brana <wbrana@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
With a recent addition of f2fs_lookup_extent_tree(), we get a warning about
the use of empty macros:
fs/f2fs/extent_cache.c: In function 'f2fs_lookup_extent_tree':
fs/f2fs/extent_cache.c:358:32: error: suggest braces around empty body in an 'else' statement [-Werror=empty-body]
stat_inc_rbtree_node_hit(sbi);
A good way to avoid the warning and make the code more robust is to define
all no-op macros as 'do { } while (0)'.
Fixes: 54c2258cd63a ("f2fs: extract rb-tree operation infrastructure")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reivewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds an ioctl to flush data in faster device to cold area. User can
give device number and number of segments to move. It doesn't move it if there
is only one device.
The parameter looks like:
struct f2fs_flush_device {
u32 dev_num; /* device number to flush */
u32 segments; /* # of segments to flush */
};
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Improve comments in ext4_quota_{on|off}() to explain that returning
success despite ext4_journal_start() failing is deliberate.
Signed-off-by: Jan Kara <jack@suse.cz>
We recently shifted this code around, so we're no longer holding the
lock on this path.
Fixes: 755b5bc681eb ("fsnotify: Remove indirection from mark list addition")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Use temporary mapping for memory copying operations.
To avoid any sleeping problem,
mark_inode_dirty(inode) was moved after kunmap() in
udf_adinicb_readpage()
down_write(&iinfo->i_data_sem) set before kmap_atomic()
in udf_expand_file_adinicb()
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
According to commit f90774e1fd27 ("checkpatch: look for symbolic
permissions and suggest octal instead")
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
- More O_TMPFILE fallout
- RENAME_WHITEOUT regression due to a mis-merge
- Memory leak in ubifs_mknod()
- Power-cut problem in UBI's update volume feature
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJY/HTyAAoJEEtJtSqsAOnW7bYP/jYNCRKUls/FhWcF02h/tuPz
gYemObWxoEvPqy8laNwyYEkzct4ADsz4C2RCGxygJks2qYCzuXy+IQd/gQlJNIhM
B9sNa1se1ntt71vwdwgRUgVRMGjknJ6rY8ZP2actuuzztdJE9fGNKhEarMVyQQ61
f7+AvmFc0CHTEBi5hcHX/Rtdqvzt4OOJl5pLo+VT2RRegxb3oHgl1UTD+gqkdkMg
WOxvdriSQPi8HHz1clCN36lG2H05sKLd7YEhSWKf6Pt0YYJmF8ckxLERzRclpV4D
XXqUEwhrovJC+oRQbXS+DaJfjsML9y8GB28izC5c+H8/teGKv2n5NXtjsyWNCfnx
gmZT1S24TnTsofbmd1GoemMMD4Vlt9EDq2+QVFCaPgK9eWkDQ/xlhjrfPMsuz+YY
uCWgyzsJyvRewiOlkN1jcTua6f1nHt0SP7alNy0XUQF08jwYMaaf5IJAbEHP8Ink
seIXNObiGgHl+bBsTQ4SU9PUIJIKIar0TSQf8HXV+/Go3XDz4ZTSjc1KVXs5Hg97
irUPllIKmjb5Doiln12A5zePmgT2uaChFG21+6SwEWkFYrD3XIxBORxkEF6E7FJU
asRk2oBzNYezz+qpjn3NeZnIV7Uc9B8bBZpqFrsys2KRAJIZRtdhpAF+jIbrS+7N
JrakPr2I/FKnsNU/0mZh
=ER2q
-----END PGP SIGNATURE-----
Merge tag 'upstream-4.11-rc7' of git://git.infradead.org/linux-ubifs
Pull UBI/UBIFS fixes from Richard Weinberger:
"This contains fixes for issues in both UBI and UBIFS:
- more O_TMPFILE fallout
- RENAME_WHITEOUT regression due to a mis-merge
- memory leak in ubifs_mknod()
- power-cut problem in UBI's update volume feature"
* tag 'upstream-4.11-rc7' of git://git.infradead.org/linux-ubifs:
ubifs: Fix O_TMPFILE corner case in ubifs_link()
ubifs: Fix RENAME_WHITEOUT support
ubifs: Fix debug messages for an invalid filename in ubifs_dump_inode
ubifs: Fix debug messages for an invalid filename in ubifs_dump_node
ubifs: Remove filename from debug messages in ubifs_readdir
ubifs: Fix memory leak in error path in ubifs_mknod
ubi/upd: Always flush after prepared for an update
Both conflict were simple overlapping changes.
In the kaweth case, Eric Dumazet's skb_cow() bug fix overlapped the
conversion of the driver in net-next to use in-netdev stats.
Signed-off-by: David S. Miller <davem@davemloft.net>
unsupported NFSv4 compound op.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJY+mlFAAoJECebzXlCjuG+eCMP/2r/lXdx91Kibg4/D/9OB+LW
YmcyUH83iRq/F/V88yEMz5XMUTZ/UkZ3iSGSDObjhBK7oswAtZhC1KiWOiuI9ry4
tcqZM3se7FQNpbwm2wvhtRmbjd2YOZfS3o32Uorhc/beMZzIzZhUb15pp5o3hOIy
4aoxeUs6//B51fil4Ur0oeLI1EqUd3t+1z0ytyRRzSfORHF/QEiTTXSmmKK+K2XA
t35XXYJE3SCLxQpWW/CHT/IODv6nsaM0rgN/kkmZUxAyaw2GXT7U6+mzkz3tSL8s
AEqNUptCn2M3iAv/e6wgpO3BS49HR+Gdg+JxPilCcs+GniWu4zH3sE0HFjTuqQ23
7u/NB4lkDqFs9pO1x4D5Y/69yXbs6jUHn+NMnhTZS85pTwU5o3RNhzwegoOfIp/3
CY22R92ZLHEqb1/VTc7rpeuj3d9r7Swg3U8X7kDnn7XtAj2HlMqaHZfw2zjTIYBO
wqKnikTDMHuzZ7MtegLld4oayNZO4hN7Vi01Npzgr3nkwvpFpz8Jbvsn8mMHvUq8
ZYWoMfz5rbH20N7cnFUAtB5fm67U9smjb01uzx3m6ylh1SqTWxAoZPWluGJTv0IA
JKEM4g4PSNOizhib6KI2PW0SngLkCWKLy463fDS2Mx/M7DBZf0j4697TDK21R5NB
Pny/pwJT4mjLwePc/qt2
=mSBF
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfix from Bruce Fields:
"Fix a 4.11 regression that triggers a BUG() on an attempt to use an
unsupported NFSv4 compound op"
* tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux:
nfsd: fix oops on unsupported operation
Commit 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
introduced blk_integrity_revalidate(), which seems to assume ownership
of the stable pages flag and unilaterally clears it if no blk_integrity
profile is registered:
if (bi->profile)
disk->queue->backing_dev_info->capabilities |=
BDI_CAP_STABLE_WRITES;
else
disk->queue->backing_dev_info->capabilities &=
~BDI_CAP_STABLE_WRITES;
It's called from revalidate_disk() and rescan_partitions(), making it
impossible to enable stable pages for drivers that support partitions
and don't use blk_integrity: while the call in revalidate_disk() can be
trivially worked around (see zram, which doesn't support partitions and
hence gets away with zram_revalidate_disk()), rescan_partitions() can
be triggered from userspace at any time. This breaks rbd, where the
ceph messenger is responsible for generating/verifying CRCs.
Since blk_integrity_{un,}register() "must" be used for (un)registering
the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
setting there. This way drivers that call blk_integrity_register() and
use integrity infrastructure won't interfere with drivers that don't
but still want stable pages.
Fixes: 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 4.4+, needs backporting
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Replace bdev_direct_access() with dax_direct_access() that uses
dax_device and dax_operations instead of a block_device and
block_device_operations for dax. Once all consumers of the old api have
been converted bdev_direct_access() will be deleted.
Given that block device partitioning decisions can cause dax page
alignment constraints to be violated this also introduces the
bdev_dax_pgoff() helper. It handles calculating a logical pgoff relative
to the dax_device and also checks for page alignment.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This passes on the scsi_cmnd result field to users of passthrough
requests. Currently we abuse req->errors for this purpose, but that
field will go away in its current form.
Note that the old IDE code abuses the errors field in very creative
ways and stores all kinds of different values in it. I didn't dare
to touch this magic, so the abuses are brought forward 1:1.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The function only returns -EIO if rq->errors is non-zero, which is not
very useful and lets a large number of callers ignore the return value.
Just let the callers figure out their error themselves.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Drop 'parent' argument of bdi_register() and bdi_register_va(). It is
always NULL.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Now that all bdi structures filesystems use are properly refcounted, we
can remove the SB_I_DYNBDI flag.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Allocate struct backing_dev_info separately instead of embedding it
inside the superblock. This unifies handling of bdi among users.
CC: Richard Weinberger <richard@nod.at>
CC: Artem Bityutskiy <dedekind1@gmail.com>
CC: Adrian Hunter <adrian.hunter@intel.com>
CC: linux-mtd@lists.infradead.org
Acked-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Allocate struct backing_dev_info separately instead of embedding it
inside the superblock. This unifies handling of bdi among users.
CC: Anna Schumaker <anna.schumaker@netapp.com>
CC: linux-nfs@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Allocate struct backing_dev_info separately instead of embedding it
inside the superblock. This unifies handling of bdi among users.
CC: Petr Vandrovec <petr@vandrovec.name>
Acked-by: Petr Vandrovec <petr@vandrovec.name>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Similarly to set_bdev_super() NILFS2 just used block device reference to
bdi. Convert it to properly getting bdi reference. The reference will
get automatically dropped on superblock destruction.
CC: linux-nilfs@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <axboe@fb.com>
Similarly to set_bdev_super() GFS2 just used block device reference to
bdi. Convert it to properly getting bdi reference. The reference will
get automatically dropped on superblock destruction.
CC: Steven Whitehouse <swhiteho@redhat.com>
CC: Bob Peterson <rpeterso@redhat.com>
CC: cluster-devel@redhat.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
It is not needed anymore since bdi is initialized whenever superblock
exists.
CC: Miklos Szeredi <miklos@szeredi.hu>
CC: linux-fsdevel@vger.kernel.org
Suggested-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Allocate struct backing_dev_info separately instead of embedding it
inside the superblock. This unifies handling of bdi among users.
CC: Miklos Szeredi <miklos@szeredi.hu>
CC: linux-fsdevel@vger.kernel.org
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Allocate struct backing_dev_info separately instead of embedding it
inside the superblock. This unifies handling of bdi among users.
CC: Boaz Harrosh <ooo@electrozaur.com>
CC: Benny Halevy <bhalevy@primarydata.com>
Acked-by: Boaz Harrosh <ooo@electrozaur.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Allocate struct backing_dev_info separately instead of embedding it
inside the superblock. This unifies handling of bdi among users.
CC: Jan Harkes <jaharkes@cs.cmu.edu>
CC: coda@cs.cmu.edu
CC: codalist@coda.cs.cmu.edu
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Allocate struct backing_dev_info separately instead of embedding it
inside the superblock. This unifies handling of bdi among users.
CC: David Howells <dhowells@redhat.com>
CC: linux-afs@lists.infradead.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Allocate struct backing_dev_info separately instead of embedding it
inside the superblock. This unifies handling of bdi among users.
CC: Tyler Hicks <tyhicks@canonical.com>
CC: ecryptfs@vger.kernel.org
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Allocate struct backing_dev_info separately instead of embedding it
inside superblock. This unifies handling of bdi among users.
CC: Steve French <sfrench@samba.org>
CC: linux-cifs@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Allocate struct backing_dev_info separately instead of embedding it
inside client structure. This unifies handling of bdi among users.
CC: Ilya Dryomov <idryomov@gmail.com>
CC: "Yan, Zheng" <zyan@redhat.com>
CC: Sage Weil <sage@redhat.com>
CC: ceph-devel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>