53526 Commits

Author SHA1 Message Date
Chengguang Xu
b884014a91 ceph: adding protection for showing cap reservation info
Adding spinlock protection during getting cap reservation
ralated fields so that the numbers match below BUG_ON condition
in the code.

BUG_ON(mdsc->caps_total_count != mdsc->caps_use_count +
				 mdsc->caps_reserve_count +
				 mdsc->caps_avail_count);

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:46 +02:00
Chengguang Xu
4d8969af28 ceph: use seq_show_option for string type options
Using seq_show_option to replace seq_printf for string type options.

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:45 +02:00
Chengguang Xu
11e1478df9 libceph, ceph: change permission for readonly debugfs entries
Remove write permission for debugfs entries which only have readonly
function.

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:45 +02:00
Chengguang Xu
7ae7a828d9 ceph: keep consistent semantic in fscache related option combination
When specifying multiple fscache related options, the result isn't always
the same as option order, this fix will keep strict consistent meaning
by order.

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:45 +02:00
Chengguang Xu
4c069a5821 ceph: add newline to end of debug message format
Some of dout format do not include newline in the end,
fix for the files which are in fs/ceph and net/ceph directories,
and changing printk to dout for printing debug info in super.c

Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:44 +02:00
Ilya Dryomov
08c1ac508b libceph, ceph: move ceph_calc_file_object_mapping() to striper.c
ceph_calc_file_object_mapping() has nothing to do with osdmaps.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02 10:12:43 +02:00
Ilya Dryomov
dccbf08005 libceph, ceph: change ceph_calc_file_object_mapping() signature
- make it void
- xlen (object extent length) out parameter should be u32 because only
  a single stripe unit is mapped at a time

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2018-04-02 10:12:38 +02:00
Theodore Ts'o
e40ff21389 ext4: force revalidation of directory pointer after seekdir(2)
A malicious user could force the directory pointer to be in an invalid
spot by using seekdir(2).  Use the mechanism we already have to notice
if the directory has changed since the last time we called
ext4_readdir() to force a revalidation of the pointer.

Reported-by: syzbot+1236ce66f79263e8a862@syzkaller.appspotmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2018-04-01 23:21:03 -04:00
Long Li
21a4e14aae cifs: smbd: disconnect transport on RDMA errors
On RDMA errors, transport should disconnect the RDMA CM connection. This
will notify the upper layer, and it will attempt transport reconnect.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org>
2018-04-01 20:24:40 -05:00
Long Li
48f238a79f cifs: smbd: avoid reconnect lockup
During transport reconnect, other processes may have registered memory
and blocked on transport. This creates a deadlock situation because the
transport resources can't be freed, and reconnect is blocked.

Fix this by returning to upper layer on timeout. Before returning,
transport status is set to reconnecting so other processes will release
memory registration resources.

Upper layer will retry the reconnect. This is not in fast I/O path so
setting the timeout to 5 seconds.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org>
2018-04-01 20:24:40 -05:00
Steve French
2a18287b54 Don't log confusing message on reconnect by default
Change the following message (which can occur on reconnect) from
a warning to an FYI message.  It is confusing to users.

   [58360.523634] CIFS VFS: Free previous auth_key.response = 00000000a91cdc84

By default this message won't show up on reconnect unless the user bumps
up the log level to include FYI messages.

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-04-01 20:24:40 -05:00
Steve French
2564f2ff83 Don't log expected error on DFS referral request
STATUS_FS_DRIVER_REQUIRED is expected when DFS is not turned
on on the server.  Do not log it on DFS referral response.
It clutters the dmesg log unnecessarily at mount time.

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com
Reviewed-by: Ronnie sahlberg <lsahlber@redhat.com>
2018-04-01 20:24:40 -05:00
Phillip Potter
31cd106bb1 fs: cifs: Replace _free_xid call in cifs_root_iget function
Modify end of cifs_root_iget function in fs/cifs/inode.c to call
free_xid(xid) instead of _free_xid(xid), thereby allowing debug
notification of this action when enabled.

Signed-off-by: Phillip Potter <phil@philpotter.co.uk>
Signed-off-by: Steve French <smfrench@gmail.com>
2018-04-01 20:24:40 -05:00
Steve French
d68f353fc9 SMB3.1.1 dialect is no longer experimental
SMB3.1.1 is a very important dialect, with much improved security.
We can remove the ExPERIMENTAL comments about it. It is widely
supported by servers.

Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
2018-04-01 20:24:40 -05:00
Steve French
6188f28bf6 Tree connect for SMB3.1.1 must be signed for non-encrypted shares
SMB3.1.1 tree connect was only being signed when signing was mandatory
but needs to always be signed (for non-guest users).

See MS-SMB2 section 3.2.4.1.1

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
2018-04-01 20:24:40 -05:00
Ronnie Sahlberg
262916bc69 fix smb3-encryption breakage when CONFIG_DEBUG_SG=y
We can not use the standard sg_set_buf() fucntion since when
CONFIG_DEBUG_SG=y this adds a check that will BUG_ON for cifs.ko
when we pass it an object from the stack.

Create a new wrapper smb2_sg_set_buf() which avoids doing that particular check
and use it for smb3 encryption instead.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
2018-04-01 20:24:40 -05:00
Gustavo A. R. Silva
70e80655f5 CIFS: fix sha512 check in cifs_crypto_secmech_release
It seems this is a copy-paste error and that the proper variable to use
in this particular case is _sha512_ instead of _md5_.

Addresses-Coverity-ID: 1465358 ("Copy-paste error")
Fixes: 1c6614d229e7 ("CIFS: add sha512 secmech")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2018-04-01 20:24:40 -05:00
Aurelien Aptel
8bd68c6e47 CIFS: implement v3.11 preauth integrity
SMB3.11 clients must implement pre-authentification integrity.

* new mechanism to certify requests/responses happening before Tree
  Connect.
* supersedes VALIDATE_NEGOTIATE
* fixes signing for SMB3.11

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-04-01 20:24:40 -05:00
Aurelien Aptel
5fcd7f3f96 CIFS: add sha512 secmech
* prepare for SMB3.11 pre-auth integrity
* enable sha512 when SMB311 is enabled in Kconfig
* add sha512 as a soft dependency

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-04-01 20:24:39 -05:00
Aurelien Aptel
82fb82be05 CIFS: refactor crypto shash/sdesc allocation&free
shash and sdesc and always allocated and freed together.
* abstract this in new functions cifs_alloc_hash() and cifs_free_hash().
* make smb2/3 crypto allocation independent from each other.

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org>
2018-04-01 20:24:39 -05:00
Ronnie Sahlberg
b7a73c84eb cifs: fix memory leak in SMB2_open()
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
2018-04-01 20:24:39 -05:00
Colin Ian King
ac65cb6203 CIFS: SMBD: fix spelling mistake: "faield" and "legnth"
Trivial fix to spelling mistake in log_rdma_send and log_rdma_mr
message text.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2018-04-01 20:24:39 -05:00
David S. Miller
c0b458a946 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor conflicts in drivers/net/ethernet/mellanox/mlx5/core/en_rep.c,
we had some overlapping changes:

1) In 'net' MLX5E_PARAMS_LOG_{SQ,RQ}_SIZE -->
   MLX5E_REP_PARAMS_LOG_{SQ,RQ}_SIZE

2) In 'net-next' params->log_rq_size is renamed to be
   params->log_rq_mtu_frames.

3) In 'net-next' params->hard_mtu is added.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-01 19:49:34 -04:00
Theodore Ts'o
54dd0e0a1b ext4: add extra checks to ext4_xattr_block_get()
Add explicit checks in ext4_xattr_block_get() just in case the
e_value_offs and e_value_size fields in the the xattr block are
corrupted in memory after the buffer_verified bit is set on the xattr
block.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2018-03-30 20:04:11 -04:00
David Sterba
57599c7e77 btrfs: lift errors from add_extent_changeset to the callers
The missing error handling in add_extent_changeset was hidden, so make
it at least visible in the callers.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:03:25 +02:00
Liu Bo
f50f435390 Btrfs: print error messages when failing to read trees
When mount fails to read trees like fs tree, checksum tree, extent
tree, etc, there is not enough information about where went wrong.

With this, messages like

"BTRFS warning (device sdf): failed to read root (objectid=7): -5"

would help us a bit.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:02:14 +02:00
David Sterba
38e82de8cc btrfs: user proper type for btrfs_mask_flags flags
All users pass a local unsigned int and not the __uXX types that are
supposed to be used for userspace interfaces.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:07 +02:00
David Sterba
7e79cb86be btrfs: split dev-replace locking helpers for read and write
The current calls are unclear in what way btrfs_dev_replace_lock takes
the locks, so drop the argument, split the helpers and use similar
naming as for read and write locks.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:07 +02:00
David Sterba
e7ab0af6c3 btrfs: remove stale comments about fs_mutex
The fs_mutex has been killed in 2008, a213501153fd66e2 ("Btrfs: Replace
the big fs_mutex with a collection of other locks"), still remembered in
some comments.

We don't have any extra needs for locking in the ACL handlers.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:07 +02:00
David Sterba
88c14590cd btrfs: use RCU in btrfs_show_devname for device list traversal
The show_devname callback is used to print device name in
/proc/self/mounts, we need to traverse the device list consistently and
read the name that's copied to a seq buffer so we don't need further
locking.

If the first device is being deleted at the same time, the RCU will
allow us to read the device name, though it will become stale right
after the RCU protection ends. This is unavoidable and the user can
expect that the device will disappear from the filesystem's list at some
point.

The device_list_mutex was pretty heavy as it is used eg. for writing
superblock and a few other IO related contexts. This can stall any
application that reads the proc file for no reason.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:06 +02:00
David Sterba
d1980131ca btrfs: update barrier in should_cow_block
Once there was a simple int force_cow that was used with the plain
barriers, and then converted to a bit, so we should use the appropriate
barrier helper.

Other variables in the complex if condition do not depend on a barrier,
so we should be fine in case the atomic barrier becomes a no-op.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:06 +02:00
David Sterba
a32bf9a302 btrfs: use lockdep_assert_held for mutexes
Using lockdep_assert_held is preferred, replace mutex_is_locked.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:06 +02:00
David Sterba
a4666e688f btrfs: use lockdep_assert_held for spinlocks
Using lockdep_assert_held is preferred, replace assert_spin_locked.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:06 +02:00
Qu Wenruo
581c176041 btrfs: Validate child tree block's level and first key
We have several reports about node pointer points to incorrect child
tree blocks, which could have even wrong owner and level but still with
valid generation and checksum.

Although btrfs check could handle it and print error message like:
leaf parent key incorrect 60670574592

Kernel doesn't have enough check on this type of corruption correctly.
At least add such check to read_tree_block() and btrfs_read_buffer(),
where we need two new parameters @level and @first_key to verify the
child tree block.

The new @level check is mandatory and all call sites are already
modified to extract expected level from its call chain.

While @first_key is optional, the following call sites are skipping such
check:
1) Root node/leaf
   As ROOT_ITEM doesn't contain the first key, skip @first_key check.
2) Direct backref
   Only parent bytenr and level is known and we need to resolve the key
   all by ourselves, skip @first_key check.

Another note of this verification is, it needs extra info from nodeptr
or ROOT_ITEM, so it can't fit into current tree-checker framework, which
is limited to node/leaf boundary.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:06 +02:00
Qu Wenruo
3c0efdf03b btrfs: tests/qgroup: Fix wrong tree backref level
The extent tree of the test fs is like the following:

 BTRFS info (device (null)): leaf 16327509003777336587 total ptrs 1 free space 3919
  item 0 key (4096 168 4096) itemoff 3944 itemsize 51
          extent refs 1 gen 1 flags 2
          tree block key (68719476736 0 0) level 1
                                           ^^^^^^^
          ref#0: tree block backref root 5

And it's using an empty tree for fs tree, so there is no way that its
level can be 1.

For REAL (created by mkfs) fs tree backref with no skinny metadata, the
result should look like:

 item 3 key (30408704 EXTENT_ITEM 4096) itemoff 3845 itemsize 51
         refs 1 gen 4 flags TREE_BLOCK
         tree block key (256 INODE_ITEM 0) level 0
                                           ^^^^^^^
         tree block backref root 5

Fix the level to 0, so it won't break later tree level checker.

Fixes: faa2dbf004e8 ("Btrfs: add sanity tests for new qgroup accounting code")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:05 +02:00
Filipe Manana
8434ec46c6 Btrfs: fix copy_items() return value when logging an inode
When logging an inode, at tree-log.c:copy_items(), if we call
btrfs_next_leaf() at the loop which checks for the need to log holes, we
need to make sure copy_items() returns the value 1 to its caller and
not 0 (on success). This is because the path the caller passed was
released and is now different from what is was before, and the caller
expects a return value of 0 to mean both success and that the path
has not changed, while a return value of 1 means both success and
signals the caller that it can not reuse the path, it has to perform
another tree search.

Even though this is a case that should not be triggered on normal
circumstances or very rare at least, its consequences can be very
unpredictable (especially when replaying a log tree).

Fixes: 16e7549f045d ("Btrfs: incompatible format change to remove hole extents")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:05 +02:00
Filipe Manana
4ee3fad34a Btrfs: fix fsync after hole punching when using no-holes feature
When we have the no-holes mode enabled and fsync a file after punching a
hole in it, we can end up not logging the whole hole range in the log tree.
This happens if the file has extent items that span more than one leaf and
we punch a hole that covers a range that starts in a leaf but does not go
beyond the offset of the first extent in the next leaf.

Example:

  $ mkfs.btrfs -f -O no-holes -n 65536 /dev/sdb
  $ mount /dev/sdb /mnt
  $ for ((i = 0; i <= 831; i++)); do
	offset=$((i * 2 * 256 * 1024))
	xfs_io -f -c "pwrite -S 0xab -b 256K $offset 256K" \
		/mnt/foobar >/dev/null
    done
  $ sync

  # We now have 2 leafs in our filesystem fs tree, the first leaf has an
  # item corresponding the extent at file offset 216530944 and the second
  # leaf has a first item corresponding to the extent at offset 217055232.
  # Now we punch a hole that partially covers the range of the extent at
  # offset 216530944 but does go beyond the offset 217055232.

  $ xfs_io -c "fpunch $((216530944 + 128 * 1024 - 4000)) 256K" /mnt/foobar
  $ xfs_io -c "fsync" /mnt/foobar

  <power fail>

  # mount to replay the log
  $ mount /dev/sdb /mnt

  # Before this patch, only the subrange [216658016, 216662016[ (length of
  # 4000 bytes) was logged, leaving an incorrect file layout after log
  # replay.

Fix this by checking if there is a hole between the last extent item that
we processed and the first extent item in the next leaf, and if there is
one, log an explicit hole extent item.

Fixes: 16e7549f045d ("Btrfs: incompatible format change to remove hole extents")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:05 +02:00
David Sterba
a1840b5023 btrfs: use helper to set ulist aux from a qgroup
We have a nice helper to do proper casting of a qgroup to a ulist aux
value. And several places that could make use of it.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:05 +02:00
Qu Wenruo
0b78877a2a Revert "btrfs: qgroups: Retry after commit on getting EDQUOT"
This reverts commit 48a89bc4f2ceab87bc858a8eb189636b09c846a7.

The idea to commit transaction and free some space after hitting qgroup
limit is good, although the problem is it can easily cause deadlocks.

One deadlock example is caused by trying to flush data while still
holding it:

Call Trace:
 __schedule+0x49d/0x10f0
 schedule+0xc6/0x290
 schedule_timeout+0x187/0x1c0
 wait_for_completion+0x204/0x3a0
 btrfs_wait_ordered_extents+0xa40/0xaf0 [btrfs]
 qgroup_reserve+0x913/0xa10 [btrfs]
 btrfs_qgroup_reserve_data+0x3ef/0x580 [btrfs]
 btrfs_check_data_free_space+0x96/0xd0 [btrfs]
 __btrfs_buffered_write+0x3ac/0xd40 [btrfs]
 btrfs_file_write_iter+0x62a/0xba0 [btrfs]
 __vfs_write+0x320/0x430
 vfs_write+0x107/0x270
 SyS_write+0xbf/0x150
 do_syscall_64+0x1b0/0x3d0
 entry_SYSCALL64_slow_path+0x25/0x25

Another can be caused by trying to commit one transaction while nesting
with trans handle held by ourselves:

btrfs_start_transaction()
|- btrfs_qgroup_reserve_meta_pertrans()
   |- qgroup_reserve()
      |- btrfs_join_transaction()
      |- btrfs_commit_transaction()

The retry is causing more problems than exppected when limit is enabled.
At least a graceful EDQUOT is way better than deadlock.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:05 +02:00
Qu Wenruo
4ee0d8832c btrfs: qgroup: Update trace events for metadata reservation
Now trace_qgroup_meta_reserve() will have extra type parameter.

And introduce two new trace events:

1) trace_qgroup_meta_free_all_pertrans()
   For btrfs_qgroup_free_meta_all_pertrans()

2) trace_qgroup_meta_convert()
   For btrfs_qgroup_convert_reserved_meta()

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:05 +02:00
Qu Wenruo
8287475a20 btrfs: qgroup: Use root::qgroup_meta_rsv_* to record qgroup meta reserved space
For quota disabled->enable case, it's possible that at reservation time
quota was not enabled so no bytes were really reserved, while at release
time, quota was enabled so we will try to release some bytes we didn't
really own.

Such situation can cause metadata reserveation underflow, for both types,
also less possible for per-trans type since quota enable will commit
transaction.

To address this, record qgroup meta reserved bytes into
root::qgroup_meta_rsv_pertrans and ::prealloc.
So at releasing time we won't free any bytes we didn't reserve.

For DATA, it's already handled by io_tree, so nothing needs to be done
there.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:04 +02:00
Qu Wenruo
4f5427ccce btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and item
Quite similar for delalloc, some modification to delayed-inode and
delayed-item reservation.  Also needs extra parameter for release case
to distinguish normal release and error release.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 02:01:03 +02:00
Theodore Ts'o
9496005d6c ext4: add bounds checking to ext4_xattr_find_entry()
Add some paranoia checks to make sure we don't stray beyond the end of
the valid memory region containing ext4 xattr entries while we are
scanning for a match.

Also rename the function to xattr_find_entry() since it is static and
thus only used in fs/ext4/xattr.c

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2018-03-30 20:00:56 -04:00
Qu Wenruo
43b18595d6 btrfs: qgroup: Use separate meta reservation type for delalloc
Before this patch, btrfs qgroup is mixing per-transcation meta rsv with
preallocated meta rsv, making it quite easy to underflow qgroup meta
reservation.

Since we have the new qgroup meta rsv types, apply it to delalloc
reservation.

Now for delalloc, most of its reserved space will use META_PREALLOC qgroup
rsv type.

And for callers reducing outstanding extent like btrfs_finish_ordered_io(),
they will convert corresponding META_PREALLOC reservation to
META_PERTRANS.

This is mainly due to the fact that current qgroup numbers will only be
updated in btrfs_commit_transaction(), that's to say if we don't keep
such placeholder reservation, we can exceed qgroup limitation.

And for callers freeing outstanding extent in error handler, we will
just free META_PREALLOC bytes.

This behavior makes callers of btrfs_qgroup_release_meta() or
btrfs_qgroup_convert_meta() to be aware of which type they are.
So in this patch, btrfs_delalloc_release_metadata() and its callers get
an extra parameter to info qgroup to do correct meta convert/release.

The good news is, even we use the wrong type (convert or free), it won't
cause obvious bug, as prealloc type is always in good shape, and the
type only affects how per-trans meta is increased or not.

So the worst case will be at most metadata limitation can be sometimes
exceeded (no convert at all) or metadata limitation is reached too soon
(no free at all).

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:14 +02:00
Qu Wenruo
64cfaef636 btrfs: qgroup: Introduce function to convert META_PREALLOC into META_PERTRANS
For meta_prealloc reservation users, after btrfs_join_transaction()
caller will modify tree so part (or even all) meta_prealloc reservation
should be converted to meta_pertrans until transaction commit time.

This patch introduces a new function,
btrfs_qgroup_convert_reserved_meta() to do this for META_PREALLOC
reservation user.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:14 +02:00
Qu Wenruo
e1211d0e89 btrfs: qgroup: Don't use root->qgroup_meta_rsv for qgroup
Since qgroup has seperate metadata reservation types now, we can
completely get rid of the old root->qgroup_meta_rsv, which mostly acts
as current META_PERTRANS reservation type.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:14 +02:00
Qu Wenruo
733e03a0b2 btrfs: qgroup: Split meta rsv type into meta_prealloc and meta_pertrans
Btrfs uses 2 different methods to reseve metadata qgroup space.

1) Reserve at btrfs_start_transaction() time
   This is quite straightforward, caller will use the trans handler
   allocated to modify b-trees.

   In this case, reserved metadata should be kept until qgroup numbers
   are updated.

2) Reserve by using block_rsv first, and later btrfs_join_transaction()
   This is more complicated, caller will reserve space using block_rsv
   first, and then later call btrfs_join_transaction() to get a trans
   handle.

   In this case, before we modify trees, the reserved space can be
   modified on demand, and after btrfs_join_transaction(), such reserved
   space should also be kept until qgroup numbers are updated.

Since these two types behave differently, split the original "META"
reservation type into 2 sub-types:

  META_PERTRANS:
    For above case 1)

  META_PREALLOC:
    For reservations that happened before btrfs_join_transaction() of
    case 2)

NOTE: This patch will only convert existing qgroup meta reservation
callers according to its situation, not ensuring all callers are at
correct timing.
Such fix will be added in later patches.

Signed-off-by: Qu Wenruo <wqu@suse.com>
[ update comments ]
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:14 +02:00
Qu Wenruo
5c40507ffb btrfs: qgroup: Cleanup the remaining old reservation counters
So qgroup is switched to new separate types reservation system.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:13 +02:00
Qu Wenruo
64ee4e751a btrfs: qgroup: Update trace events to use new separate rsv types
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:13 +02:00
Qu Wenruo
429d6275d5 btrfs: qgroup: Fix wrong qgroup reservation update for relationship modification
When modifying qgroup relationship, for qgroup which only owns exclusive
extents, we will go through quick update path.

In this path, we will add/subtract exclusive and reference number for
parent qgroup, since the source (child) qgroup only has exclusive
extents, destination (parent) qgroup will also own or lose those extents
exclusively.

The same should be the same for reservation, since later reservation
adding/releasing will also affect parent qgroup, without the reservation
carried from child, parent will underflow reservation or have dead
reservation which will never be freed.

However original code doesn't do the same thing for reservation.
It handles qgroup reservation quite differently:

It removes qgroup reservation, as it's allocating space from the
reserved qgroup for relationship adding.
But does nothing for qgroup reservation if we're removing a qgroup
relationship.

According to the original code, it looks just like because we're adding
qgroup->rfer, the code assumes we're writing new data, so it's follows
the normal write routine, by reducing qgroup->reserved and adding
qgroup->rfer/excl.

This old behavior is wrong, and should be fixed to follow the same
excl/rfer behavior.

Just fix it by using the correct behavior described above.

Fixes: 31193213f1f9 ("Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use.")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-03-31 01:41:13 +02:00