IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
If we get an -ENOENT back from btrfs_uuid_iter_rem when iterating the
uuid tree we'll just continue and do btrfs_next_item(). However we've
done a btrfs_release_path() at this point and no longer have a valid
path. So increment the key and go back and do a normal search.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We can just abort the transaction here, and in fact do that for every
other failure in this function except these two cases.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Normally when cloning a file range if we find an implicit hole at the end
of the range we assume it is because the NO_HOLES feature is enabled.
However that is not always the case. One well known case [1] is when we
have a power failure after mixing buffered and direct IO writes against
the same file.
In such cases we need to punch a hole in the destination file, and if
the NO_HOLES feature is not enabled, we need to insert explicit file
extent items to represent the hole. After commit 690a5dbfc51315
("Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning
extents"), we started to insert file extent items representing the hole
with an item size of 0, which is invalid and should be 53 bytes (the size
of a btrfs_file_extent_item structure), resulting in all sorts of
corruptions and invalid memory accesses. This is detected by the tree
checker when we attempt to write a leaf to disk.
The problem can be sporadically triggered by test case generic/561 from
fstests. That test case does not exercise power failure and creates a new
filesystem when it starts, so it does not use a filesystem created by any
previous test that tests power failure. However the test does both
buffered and direct IO writes (through fsstress) and it's precisely that
which is creating the implicit holes in files. That happens even before
the commit mentioned earlier. I need to investigate why we get those
implicit holes to check if there is a real problem or not. For now this
change fixes the regression of introducing file extent items with an item
size of 0 bytes.
Fix the issue by calling btrfs_punch_hole_range() without passing a
btrfs_clone_extent_info structure, which ensures file extent items are
inserted to represent the hole with a correct item size. We were passing
a btrfs_clone_extent_info with a value of 0 for its 'item_size' field,
which was causing the insertion of file extent items with an item size
of 0.
[1] https://www.spinics.net/lists/linux-btrfs/msg75350.html
Reported-by: David Sterba <dsterba@suse.com>
Fixes: 690a5dbfc51315 ("Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning extents")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When a tree mod log user no longer needs to use the tree it calls
btrfs_put_tree_mod_seq() to remove itself from the list of users and
delete all no longer used elements of the tree's red black tree, which
should be all elements with a sequence number less then our equals to
the caller's sequence number. However the logic is broken because it
can delete and free elements from the red black tree that have a
sequence number greater then the caller's sequence number:
1) At a point in time we have sequence numbers 1, 2, 3 and 4 in the
tree mod log;
2) The task which got assigned the sequence number 1 calls
btrfs_put_tree_mod_seq();
3) Sequence number 1 is deleted from the list of sequence numbers;
4) The current minimum sequence number is computed to be the sequence
number 2;
5) A task using sequence number 2 is at tree_mod_log_rewind() and gets
a pointer to one of its elements from the red black tree through
a call to tree_mod_log_search();
6) The task with sequence number 1 iterates the red black tree of tree
modification elements and deletes (and frees) all elements with a
sequence number less then or equals to 2 (the computed minimum sequence
number) - it ends up only leaving elements with sequence numbers of 3
and 4;
7) The task with sequence number 2 now uses the pointer to its element,
already freed by the other task, at __tree_mod_log_rewind(), resulting
in a use-after-free issue. When CONFIG_DEBUG_PAGEALLOC=y it produces
a trace like the following:
[16804.546854] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
[16804.547451] CPU: 0 PID: 28257 Comm: pool Tainted: G W 5.4.0-rc8-btrfs-next-51 #1
[16804.548059] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
[16804.548666] RIP: 0010:rb_next+0x16/0x50
(...)
[16804.550581] RSP: 0018:ffffb948418ef9b0 EFLAGS: 00010202
[16804.551227] RAX: 6b6b6b6b6b6b6b6b RBX: ffff90e0247f6600 RCX: 6b6b6b6b6b6b6b6b
[16804.551873] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff90e0247f6600
[16804.552504] RBP: ffff90dffe0d4688 R08: 0000000000000001 R09: 0000000000000000
[16804.553136] R10: ffff90dffa4a0040 R11: 0000000000000000 R12: 000000000000002e
[16804.553768] R13: ffff90e0247f6600 R14: 0000000000001663 R15: ffff90dff77862b8
[16804.554399] FS: 00007f4b197ae700(0000) GS:ffff90e036a00000(0000) knlGS:0000000000000000
[16804.555039] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[16804.555683] CR2: 00007f4b10022000 CR3: 00000002060e2004 CR4: 00000000003606f0
[16804.556336] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[16804.556968] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[16804.557583] Call Trace:
[16804.558207] __tree_mod_log_rewind+0xbf/0x280 [btrfs]
[16804.558835] btrfs_search_old_slot+0x105/0xd00 [btrfs]
[16804.559468] resolve_indirect_refs+0x1eb/0xc70 [btrfs]
[16804.560087] ? free_extent_buffer.part.19+0x5a/0xc0 [btrfs]
[16804.560700] find_parent_nodes+0x388/0x1120 [btrfs]
[16804.561310] btrfs_check_shared+0x115/0x1c0 [btrfs]
[16804.561916] ? extent_fiemap+0x59d/0x6d0 [btrfs]
[16804.562518] extent_fiemap+0x59d/0x6d0 [btrfs]
[16804.563112] ? __might_fault+0x11/0x90
[16804.563706] do_vfs_ioctl+0x45a/0x700
[16804.564299] ksys_ioctl+0x70/0x80
[16804.564885] ? trace_hardirqs_off_thunk+0x1a/0x20
[16804.565461] __x64_sys_ioctl+0x16/0x20
[16804.566020] do_syscall_64+0x5c/0x250
[16804.566580] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[16804.567153] RIP: 0033:0x7f4b1ba2add7
(...)
[16804.568907] RSP: 002b:00007f4b197adc88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[16804.569513] RAX: ffffffffffffffda RBX: 00007f4b100210d8 RCX: 00007f4b1ba2add7
[16804.570133] RDX: 00007f4b100210d8 RSI: 00000000c020660b RDI: 0000000000000003
[16804.570726] RBP: 000055de05a6cfe0 R08: 0000000000000000 R09: 00007f4b197add44
[16804.571314] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4b197add48
[16804.571905] R13: 00007f4b197add40 R14: 00007f4b100210d0 R15: 00007f4b197add50
(...)
[16804.575623] ---[ end trace 87317359aad4ba50 ]---
Fix this by making btrfs_put_tree_mod_seq() skip deletion of elements that
have a sequence number equals to the computed minimum sequence number, and
not just elements with a sequence number greater then that minimum.
Fixes: bd989ba359f2ac ("Btrfs: add tree modification log functions")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Having checksum items, either on the checksums tree or in a log tree, that
represent ranges that overlap each other is a sign of a corruption. Such
case confuses the checksum lookup code and can result in not being able to
find checksums or find stale checksums.
So add a check for such case.
This is motivated by a recent fix for a case where a log tree had checksum
items covering ranges that overlap each other due to extent cloning, and
resulted in missing checksums after replaying the log tree. It also helps
detect past issues such as stale and outdated checksums due to overlapping,
commit 27b9a8122ff71a ("Btrfs: fix csum tree corruption, duplicate and
outdated checksums").
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When logging a file that has shared extents (reflinked with other files or
with itself), we can end up logging multiple checksum items that cover
overlapping ranges. This confuses the search for checksums at log replay
time causing some checksums to never be added to the fs/subvolume tree.
Consider the following example of a file that shares the same extent at
offsets 0 and 256Kb:
[ bytenr 13893632, offset 64Kb, len 64Kb ]
0 64Kb
[ bytenr 13631488, offset 64Kb, len 192Kb ]
64Kb 256Kb
[ bytenr 13893632, offset 0, len 256Kb ]
256Kb 512Kb
When logging the inode, at tree-log.c:copy_items(), when processing the
file extent item at offset 0, we log a checksum item covering the range
13959168 to 14024704, which corresponds to 13893632 + 64Kb and 13893632 +
64Kb + 64Kb, respectively.
Later when processing the extent item at offset 256K, we log the checksums
for the range from 13893632 to 14155776 (which corresponds to 13893632 +
256Kb). These checksums get merged with the checksum item for the range
from 13631488 to 13893632 (13631488 + 256Kb), logged by a previous fsync.
So after this we get the two following checksum items in the log tree:
(...)
item 6 key (EXTENT_CSUM EXTENT_CSUM 13631488) itemoff 3095 itemsize 512
range start 13631488 end 14155776 length 524288
item 7 key (EXTENT_CSUM EXTENT_CSUM 13959168) itemoff 3031 itemsize 64
range start 13959168 end 14024704 length 65536
The first one covers the range from the second one, they overlap.
So far this does not cause a problem after replaying the log, because
when replaying the file extent item for offset 256K, we copy all the
checksums for the extent 13893632 from the log tree to the fs/subvolume
tree, since searching for an checksum item for bytenr 13893632 leaves us
at the first checksum item, which covers the whole range of the extent.
However if we write 64Kb to file offset 256Kb for example, we will
not be able to find and copy the checksums for the last 128Kb of the
extent at bytenr 13893632, referenced by the file range 384Kb to 512Kb.
After writing 64Kb into file offset 256Kb we get the following extent
layout for our file:
[ bytenr 13893632, offset 64K, len 64Kb ]
0 64Kb
[ bytenr 13631488, offset 64Kb, len 192Kb ]
64Kb 256Kb
[ bytenr 14155776, offset 0, len 64Kb ]
256Kb 320Kb
[ bytenr 13893632, offset 64Kb, len 192Kb ]
320Kb 512Kb
After fsync'ing the file, if we have a power failure and then mount
the filesystem to replay the log, the following happens:
1) When replaying the file extent item for file offset 320Kb, we
lookup for the checksums for the extent range from 13959168
(13893632 + 64Kb) to 14155776 (13893632 + 256Kb), through a call
to btrfs_lookup_csums_range();
2) btrfs_lookup_csums_range() finds the checksum item that starts
precisely at offset 13959168 (item 7 in the log tree, shown before);
3) However that checksum item only covers 64Kb of data, and not 192Kb
of data;
4) As a result only the checksums for the first 64Kb of data referenced
by the file extent item are found and copied to the fs/subvolume tree.
The remaining 128Kb of data, file range 384Kb to 512Kb, doesn't get
the corresponding data checksums found and copied to the fs/subvolume
tree.
5) After replaying the log userspace will not be able to read the file
range from 384Kb to 512Kb, because the checksums are missing and
resulting in an -EIO error.
The following steps reproduce this scenario:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt/sdc
$ xfs_io -f -c "pwrite -S 0xa3 0 256K" /mnt/sdc/foobar
$ xfs_io -c "fsync" /mnt/sdc/foobar
$ xfs_io -c "pwrite -S 0xc7 256K 256K" /mnt/sdc/foobar
$ xfs_io -c "reflink /mnt/sdc/foobar 320K 0 64K" /mnt/sdc/foobar
$ xfs_io -c "fsync" /mnt/sdc/foobar
$ xfs_io -c "pwrite -S 0xe5 256K 64K" /mnt/sdc/foobar
$ xfs_io -c "fsync" /mnt/sdc/foobar
<power failure>
$ mount /dev/sdc /mnt/sdc
$ md5sum /mnt/sdc/foobar
md5sum: /mnt/sdc/foobar: Input/output error
$ dmesg | tail
[165305.003464] BTRFS info (device sdc): no csum found for inode 257 start 401408
[165305.004014] BTRFS info (device sdc): no csum found for inode 257 start 405504
[165305.004559] BTRFS info (device sdc): no csum found for inode 257 start 409600
[165305.005101] BTRFS info (device sdc): no csum found for inode 257 start 413696
[165305.005627] BTRFS info (device sdc): no csum found for inode 257 start 417792
[165305.006134] BTRFS info (device sdc): no csum found for inode 257 start 421888
[165305.006625] BTRFS info (device sdc): no csum found for inode 257 start 425984
[165305.007278] BTRFS info (device sdc): no csum found for inode 257 start 430080
[165305.008248] BTRFS warning (device sdc): csum failed root 5 ino 257 off 393216 csum 0x1337385e expected csum 0x00000000 mirror 1
[165305.009550] BTRFS warning (device sdc): csum failed root 5 ino 257 off 393216 csum 0x1337385e expected csum 0x00000000 mirror 1
Fix this simply by deleting first any checksums, from the log tree, for the
range of the extent we are logging at copy_items(). This ensures we do not
get checksum items in the log tree that have overlapping ranges.
This is a long time issue that has been present since we have the clone
(and deduplication) ioctl, and can happen both when an extent is shared
between different files and within the same file.
A test case for fstests follows soon.
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Callers of alloc_test_extent_buffer have not correctly interpreted the
return value as error pointer, as alloc_test_extent_buffer should behave
as alloc_extent_buffer. The self-tests were unaffected but
btrfs_find_create_tree_block could call both functions and that would
cause problems up in the call chain.
Fixes: faa2dbf004e8 ("Btrfs: add sanity tests for new qgroup accounting code")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The value 0 for devs_max means to spread the allocated chunks over all
available devices, eg. stripe for RAID0 or RAID5. This got mistakenly
copied to the RAID1C3/4 profiles. The intention is to have exactly 3 and
4 copies respectively.
Fixes: 47e6f7423b91 ("btrfs: add support for 3-copy replication (raid1c3)")
Fixes: 8d6fac0087e5 ("btrfs: add support for 4-copy replication (raid1c4)")
Signed-off-by: David Sterba <dsterba@suse.com>
Argument BTRFS_FILE_EXTENT_INLINE_DATA_START is defined as offsetof(),
which returns type size_t, so we need %zu instead of %lu.
This fixes a build warning on 32-bit ARM:
../fs/btrfs/tree-checker.c: In function 'check_extent_data_item':
../fs/btrfs/tree-checker.c:230:43: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'unsigned int' [-Wformat=]
230 | "invalid item size, have %u expect [%lu, %u)",
| ~~^
| long unsigned int
| %u
Fixes: 153a6d299956 ("btrfs: tree-checker: Check item size before reading file extent type")
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andreas Färber <afaerber@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If we're rename exchanging two subvols we'll try to lock this lock
twice, which is bad. Just lock once if either of the ino's are subvols.
Fixes: cdd1fedf8261 ("btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have a BUG_ON(ret < 0) in find_free_extent from
btrfs_cache_block_group. If we fail to allocate our ctl we'll just
panic, which is not good. Instead just go on to another block group.
If we fail to find a block group we don't want to return ENOSPC, because
really we got a ENOMEM and that's the root of the problem. Save our
return from btrfs_cache_block_group(), and then if we still fail to make
our allocation return that ret so we get the right error back.
Tested with inject-error.py from bcc.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Testing with the new fsstress uncovered a pretty nasty deadlock with
lookup and snapshot deletion.
Process A
unlink
-> final iput
-> inode_tree_del
-> synchronize_srcu(subvol_srcu)
Process B
btrfs_lookup <- srcu_read_lock() acquired here
-> btrfs_iget
-> find inode that has I_FREEING set
-> __wait_on_freeing_inode()
We're holding the srcu_read_lock() while doing the iget in order to make
sure our fs root doesn't go away, and then we are waiting for the inode
to finish freeing. However because the free'ing process is doing a
synchronize_srcu() we deadlock.
Fix this by dropping the synchronize_srcu() in inode_tree_del(). We
don't need people to stop accessing the fs root at this point, we're
only adding our empty root to the dead roots list.
A larger much more invasive fix is forthcoming to address how we deal
with fs roots, but this fixes the immediate problem.
Fixes: 76dda93c6ae2 ("Btrfs: add snapshot/subvolume destroy ioctl")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since commit 0e4a459f56c32d3e ("tracing: Remove unnecessary DEBUG_FS
dependency"), CONFIG_DEBUG_FS is no longer auto-enabled. This breaks
booting Debian 9, as systemd needs debugfs:
[FAILED] Failed to mount /sys/kernel/debug.
See 'systemctl status sys-kernel-debug.mount' for details.
[DEPEND] Dependency failed for Local File Systems.
...
You are in emergGive root password for maintenance
(or press Control-D to continue):
Fix this by enabling CONFIG_DEBUG_FS explicitly.
See also commit 18977008f44c66bd ("ARM: multi_v7_defconfig: Restore
debugfs support").
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://lore.kernel.org/r/20191209101327.26571-1-geert+renesas@glider.be
When using the NO_HOLES feature if we clone a range that contains a hole
and a temporary ENOSPC happens while dropping extents from the target
inode's range, we can end up failing and aborting the transaction with
-EEXIST or with a corrupt file extent item, that has a length greater
than it should and overlaps with other extents. For example when cloning
the following range from inode A to inode B:
Inode A:
extent A1 extent A2
[ ----------- ] [ hole, implicit, 4MB length ] [ ------------- ]
0 1MB 5MB 6MB
Range to clone: [1MB, 6MB)
Inode B:
extent B1 extent B2 extent B3 extent B4
[ ---------- ] [ --------- ] [ ---------- ] [ ---------- ]
0 1MB 1MB 2MB 2MB 5MB 5MB 6MB
Target range: [1MB, 6MB) (same as source, to make it easier to explain)
The following can happen:
1) btrfs_punch_hole_range() gets -ENOSPC from __btrfs_drop_extents();
2) At that point, 'cur_offset' is set to 1MB and __btrfs_drop_extents()
set 'drop_end' to 2MB, meaning it was able to drop only extent B2;
3) We then compute 'clone_len' as 'drop_end' - 'cur_offset' = 2MB - 1MB =
1MB;
4) We then attempt to insert a file extent item at inode B with a file
offset of 5MB, which is the value of clone_info->file_offset. This
fails with error -EEXIST because there's already an extent at that
offset (extent B4);
5) We abort the current transaction with -EEXIST and return that error
to user space as well.
Another example, for extent corruption:
Inode A:
extent A1 extent A2
[ ----------- ] [ hole, implicit, 10MB length ] [ ------------- ]
0 1MB 11MB 12MB
Inode B:
extent B1 extent B2
[ ----------- ] [ --------- ] [ ----------------------------- ]
0 1MB 1MB 5MB 5MB 12MB
Target range: [1MB, 12MB) (same as source, to make it easier to explain)
1) btrfs_punch_hole_range() gets -ENOSPC from __btrfs_drop_extents();
2) At that point, 'cur_offset' is set to 1MB and __btrfs_drop_extents()
set 'drop_end' to 5MB, meaning it was able to drop only extent B2;
3) We then compute 'clone_len' as 'drop_end' - 'cur_offset' = 5MB - 1MB =
4MB;
4) We then insert a file extent item at inode B with a file offset of 11MB
which is the value of clone_info->file_offset, and a length of 4MB (the
value of 'clone_len'). So we get 2 extents items with ranges that
overlap and an extent length of 4MB, larger then the extent A2 from
inode A (1MB length);
5) After that we end the transaction, balance the btree dirty pages and
then start another or join the previous transaction. It might happen
that the transaction which inserted the incorrect extent was committed
by another task so we end up with extent corruption if a power failure
happens.
So fix this by making sure we attempt to insert the extent to clone at
the destination inode only if we are past dropping the sub-range that
corresponds to a hole.
Fixes: 690a5dbfc51315 ("Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning extents")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The branch of qgroup_rescan_init which is executed from the mount
path prints wrong errors messages. The textual print out in case
BTRFS_QGROUP_STATUS_FLAG_RESCAN/BTRFS_QGROUP_STATUS_FLAG_ON are not
set are transposed. Fix it by exchanging their place.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
* pm-cpuidle:
cpuidle: Drop unnecessary type cast in cpuidle_poll_time()
cpuidle: Fix cpuidle_driver_state_disabled()
cpuidle: use first valid target residency as poll time
* acpi-pm:
ACPI: PM: Avoid attaching ACPI PM domain to certain devices
When CONFIG_XEN_BALLOON_MEMORY_HOTPLUG is not defined
reserve_additional_memory() will set balloon_stats.target_pages to a
wrong value in case there are still some ballooned pages allocated via
alloc_xenballooned_pages().
This will result in balloon_process() no longer be triggered when
ballooned pages are freed in batches.
Reported-by: Nicholas Tsirakis <niko.tsirakis@gmail.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Objects allocated by xen_blkif_alloc come from the 'blkif_cache' kmem
cache. This cache is destoyed when xen-blkif is unloaded so it is
necessary to wait for the deferred free routine used for such objects to
complete. This necessity was missed in commit 14855954f636 "xen-blkback:
allow module to be cleanly unloaded". This patch fixes the problem by
taking/releasing extra module references in xen_blkif_alloc/free()
respectively.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
SMB2_tdis() checks if a root handle is valid in order to decide
whether it needs to close the handle or not. However if another
thread has reference for the handle, it may end up with putting
the reference twice. The extra reference that we want to put
during the tree disconnect is the reference that has a directory
lease. So, track the fact that we have a directory lease and
close the handle only in that case.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Ran into an intermittent crash in
SMB2_open_init+0x2f6/0x970
due to oparms.cifs_sb not being initialized when called from:
smb2_compound_op+0x45d/0x1690
Zero the whole oparms struct in the compounding path before setting up the
oparms so we don't risk any uninitialized fields.
Fixes: fdef665ba44a ("smb3: fix mode passed in on create for modetosid mount option")
Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
drm-fixes-5.5-2019-12-12:
amdgpu:
- DC fixes for renoir
- Gfx8 fence flush align with mesa
- Power profile fix for arcturus
- Freesync fix
- DC I2c over aux fix
- DC aux defer fix
- GPU reset fix
- GPUVM invalidation semaphore fixes for PCO and SR-IOV
- Golden settings updates for gfx10
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191212223211.8034-1-alexander.deucher@amd.com
- Fix user reported issue #673: GPU hang on transition to idle
- Avoid corruption on the top of the screen on GLK+ by disabling FBC
- Fix non-privileged access to OA on Tigerlake
- Fix HDCP code not to touch global state when just computing commit
- Fix CI splat by saving irqstate around virtual_context_destroy
- Serialise context retirement possibly on another CPU
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191212100759.GA22260@jlahtine-desk.ger.corp.intel.com
Commit 5e6669387e22 ("of/platform: Pause/resume sync state during init
and of_platform_populate()") paused/resumed sync state during init only
if Linux had parsed and populated a devicetree.
However, the check for that (of_have_populated_dt()) can change after
of_platform_default_populate_init() executes. One example of this is
when devicetree unittests are enabled. This causes an unmatched
pause/resume of sync state. To avoid this, just unconditionally
pause/resume sync state during init.
Fixes: 5e6669387e22 ("of/platform: Pause/resume sync state during init and of_platform_populate()")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Reviewed-by: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Json-schema requires a $ref to be under an 'allOf' if there are
additional constraints otherwise the additional constraints are
ignored. (Note that this behavior will be changed in draft8.)
Fixes: 641262f5e1ed ("dt-bindings: memory: Add binding for NVIDIA Tegra30 External Memory Controller")
Fixes: 785685b7a106 ("dt-bindings: memory: Add binding for NVIDIA Tegra30 Memory Controller")
Fixes: 8da65c377b21 ("dt-bindings: memory: tegra30: Convert to Tegra124 YAML")
Cc: Thierry Reding <treding@nvidia.com>
Cc: Jonathan Hunter <jonathanh@nvidia.com>
Cc: linux-tegra@vger.kernel.org
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
While my email address has changed for a while, all the schemas I
contributed still have the old one unfortunately. Update it.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Signed-off-by: Rob Herring <robh@kernel.org>
Fix device memory flows so that only once there will be no live mmaped
VA to a given allocation the matching object will be destroyed.
This prevents a potential scenario that existing VA that was mmaped by
one process might still be used post its deallocation despite that it's
owned now by other process.
The above is achieved by integrating with IB core APIs to manage
mmap/munmap. Only once the refcount will become 0 the DM object and its
underlay area will be freed.
Fixes: 3b113a1ec3d4 ("IB/mlx5: Support device memory type attribute")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191212100237.330654-3-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
Introduce rdma_user_mmap_entry_insert_range() API to be used once the
required key for the given entry should be in a given range.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191212100237.330654-2-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
It may fail to load guest driver in round 2 when using invalidate
semaphore for SRIOV. So it needs to avoid using invalidate semaphore
for SRIOV.
Signed-off-by: changzhu <Changfeng.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It may fail to load guest driver in round 2 or cause Xstart problem
when using invalidate semaphore for SRIOV or picasso. So it needs avoid
using invalidate semaphore for SRIOV and picasso.
Signed-off-by: changzhu <Changfeng.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There are two flow rule destinations: QP and packet. While users are
setting DROP packet rule, the QP should not be set as a destination.
Fixes: 3b3233fbf02e ("IB/mlx5: Add flow counters binding support")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191212091214.315005-4-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
Current code device add sequence is:
ib_register_device()
ib_mad_init()
init_sriov_init()
register_netdev_notifier()
Therefore, the remove sequence should be,
unregister_netdev_notifier()
close_sriov()
mad_cleanup()
ib_unregister_device()
However it is not above.
Hence, make do above remove sequence.
Fixes: fa417f7b520ee ("IB/mlx4: Add support for IBoE")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/20191212091214.315005-3-leon@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
from Xiubo, a patch to add some observability into cap waiters from
Jeff and a couple of cleanups.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAl3yiJMTHGlkcnlvbW92
QGdtYWlsLmNvbQAKCRBKf944AhHzi5k9CACmM3fJGrTUuOLgXAxxllCfiV6UQLoY
nuTo/bx0DmG603n+Ze8+Z0iz7hDc1Gw2XUeLkJcAE/xSetgZXO/MvJ0Ionq5Ac/k
CrqS6ucIa1bPxbE1QMTHswHjkajKwBpAZ5+khdLNLuXJxy3c9HDCGOT4VZav7Yc9
99W4kIdzOKdYLpZHAedMK97IJIrD5WhYTAFW4rNPY0GL6OPD1V0uiS9v7xUWIxnZ
Uusnu+zY8miQlLVx/V9DyLh/6G5X7XyQO1nkSQcVXZOOG7+qnkq6jDhQW8adgOSZ
wUFigTxxhSTIcntWg01TaCRNoi1N3/P8Z9/rD27zBHPbl93ANH+lUkCh
=NicF
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-5.5-rc2' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A fix to avoid a corner case when scheduling cap reclaim in batches
from Xiubo, a patch to add some observability into cap waiters from
Jeff and a couple of cleanups"
* tag 'ceph-for-5.5-rc2' of git://github.com/ceph/ceph-client:
ceph: add more debug info when decoding mdsmap
ceph: switch to global cap helper
ceph: trigger the reclaim work once there has enough pending caps
ceph: show tasks waiting on caps in debugfs caps file
ceph: convert int fields in ceph_mount_options to unsigned int
ksys_dup() is used only at one place in the kernel, namely to duplicate
fd 0 of /dev/console to stdout and stderr. The same functionality can be
achieved by using functions already available within the kernel namespace.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
- Add missing jedec,spi-nor compatible for imx6ul-14x14-evk board,
so that SPI NOR device can be probed.
- Fix power button of E60K02 board by removing LDORTC2 regulator.
- A couple of fixes on serial number support of i.MX6ULL/ULZ SoCs to
remove the boot regression caused by 8267ff89b713 ("ARM: imx: Add
serial number support for i.MX6/7 SoCs").
- A couple of fixes on LS1028A SoC TMU regarding to calibration data
and reboot register configuration.
- Fix a regression seen on imx6ul-evk board by marking always-on for
the regulator that is shared by many peripherals.
- Explicitly restore CONFIG_DEBUG_FS in imx_v6_v7_defconfig.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJd8i+oAAoJEFBXWFqHsHzOSDYH/2clV9RTidg9hYqWV5nuMHry
16+S4LYyWEkKgSZwheMbX13+f0UEMLZxQ1/c/oTlJGa95GppEzNq0ydmfZ22mScH
1I1L1t/AfmxiVB6z3HRS0GZk8uIn7CVPe/PBf8tOm+95qEznOOXljykmB2SHLBP3
KVRSx4OIJ1Nb/IBWmPar7tEkNnLAmI77dTvCZFWbaxep5QTmHX4/dff78eghKmwL
CyYJeINpsZt6a+sXOMbD7UBUx4zWKv50QDFToCRdGWJ4o+e1vp6FM+s/D82vC+3A
ZA8cCt9oRxiDAWiwx7gvlpzNVMyLLW3ork7z+a6shtD2Q4BAbStaSNpIXcVKbtI=
=3SH8
-----END PGP SIGNATURE-----
Merge tag 'imx-fixes-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes
i.MX fixes for 5.5:
- Add missing jedec,spi-nor compatible for imx6ul-14x14-evk board,
so that SPI NOR device can be probed.
- Fix power button of E60K02 board by removing LDORTC2 regulator.
- A couple of fixes on serial number support of i.MX6ULL/ULZ SoCs to
remove the boot regression caused by 8267ff89b713 ("ARM: imx: Add
serial number support for i.MX6/7 SoCs").
- A couple of fixes on LS1028A SoC TMU regarding to calibration data
and reboot register configuration.
- Fix a regression seen on imx6ul-evk board by marking always-on for
the regulator that is shared by many peripherals.
- Explicitly restore CONFIG_DEBUG_FS in imx_v6_v7_defconfig.
* tag 'imx-fixes-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: imx: Fix boot crash if ocotp is not found
ARM: imx_v6_v7_defconfig: Explicitly restore CONFIG_DEBUG_FS
ARM: dts: imx6ul-evk: Fix peripheral regulator
arm64: dts: ls1028a: fix reboot node
arm64: dts: ls1028a: fix typo in TMU calibration data
ARM: imx: Correct ocotp id for serial number support of i.MX6ULL/ULZ SoCs
ARM: dts: e60k02: fix power button
ARM: dts: imx6ul: imx6ul-14x14-evk.dtsi: Fix SPI NOR probing
Link: https://lore.kernel.org/r/20191212122427.GK15858@dragon
Signed-off-by: Olof Johansson <olof@lixom.net>
The data type of the target_residency_ns field in struct cpuidle_state
is u64, so it does not need to be cast into u64.
Get rid of the unnecessary type cast.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since blk_drain_queue had already been removed, so this function
is not needed anymore.
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We have dts property for "ti,sysc-delay-us", and we're using it, but the
wait after OCP softreset only happens if devices are probed in legacy mode.
Let's add a delay after writing the OCP softreset when specified.
Fixes: e0db94fe87da ("bus: ti-sysc: Make OCP reset work for sysstatus and sysconfig reset bits")
Cc: Keerthy <j-keerthy@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
The IDE driver creates some passthru requests which never get
submitted to the block layer in such a way that blk_account_io_start()
gets called. However, the driver still calls __blk_mq_end_request() in
ide_end_rq() which will call blk_account_io_completion() which tries
to dereferences req->part which is never set. See ide_prep_sense() for
an example of where these requests come from.
To fix this, blk_account_io_completion() and blk_account_io_done()
should do nothing if req->part is not set.
The back trace of this bug is:
BUG: kernel NULL pointer dereference, address: 000002ac
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
*pde = 00000000
Oops: 0002 [#1]
CPU: 0 PID: 237 Comm: kworker/0:1H Not tainted
5.4.0-rc2-00011-g48d9b0d43105e #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1
04/01/2014
Workqueue: kblockd drive_rq_insert_work
EIP: blk_account_io_completion+0x7a/0xf0
Code: 89 54 24 08 31 d2 89 4c 24 04 31 c9 c7 04 24 02 00 00 00 c1 ee
09 e8 f5 21 a6 ff e8 70 5c a7 ff 8b 53 60 8d 04 bd 00 00 00 00 <01> b4
02 ac 02 00 00 8b 9a 88 02 00 00 85 db 74 11 85 d2 74 51 8b
EAX: 00000000 EBX: f5b80000 ECX: 00000000 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: f3031e70 ESP: f3031e54
DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010046
CR0: 80050033 CR2: 000002ac CR3: 03c25000 CR4: 000406d0
Call Trace:
<IRQ>
blk_update_request+0x85/0x420
ide_end_rq+0x38/0xa0
ide_complete_rq+0x3d/0x70
cdrom_newpc_intr+0x258/0xba0
ide_intr+0x135/0x250
__handle_irq_event_percpu+0x3e/0x250
handle_irq_event_percpu+0x1f/0x50
handle_irq_event+0x32/0x60
handle_level_irq+0x6c/0x110
handle_irq+0x72/0xa0
</IRQ>
do_IRQ+0x45/0xad
common_interrupt+0x115/0x11c
Fixes: 48d9b0d43105 ("block: account statistics for passthrough requests")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It may cause timeout waiting for sem acquire in VM flush when using
invalidate semaphore for picasso. So it needs to avoid using invalidate
semaphore for piasso.
Signed-off-by: changzhu <Changfeng.Zhu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit f2efc6e60089c99c342a6b7da47f1037e06c4296.
This was fixed properly for 5.5, but came back via 5.4 merge
into drm-next, so revert it again.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In prepare_namespace(), do_mount() can be used instead of ksys_mount()
as the first and third argument are const strings in the kernel, the
second and fourth argument are passed through anyway, and the fifth
argument is NULL.
In do_mount_root(), ksys_mount() is called with the first and third
argument being already kernelspace strings, which do not need to be
copied over from userspace to kernelspace (again). The second and
fourth arguments are passed through to do_mount() anyway. The fifth
argument, while already residing in kernelspace, needs to be put into
a page of its own. Then, do_mount() can be used instead of
ksys_mount().
Once this is done, there are no in-kernel users to ksys_mount() left,
which can therefore be removed.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
All three calls to ksys_mount() in initrd-related kernel code can
be switched over to do_mount():
- the first and third arguments are const strings in the kernel,
and do not need to be copied over from userspace;
- the fifth argument is NULL, and therefore no page needs to be,
copied over from userspace;
- the second and fourth argument are passed through anyway.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
In devtmpfs, do_mount() can be called directly instead of complex wrapping
by ksys_mount():
- the first and third arguments are const strings in the kernel,
and do not need to be copied over from userspace;
- the fifth argument is NULL, and therefore no page needs to be
copied over from userspace;
- the second and fourth argument are passed through anyway.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
The original ID that was added for Comet Lake PCH was
actually for the -LP (low power) variant even though the
constant for it said CMLH. Changing that while at it.
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Acked-by: Felipe Balbi <balbi@kernel.org>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191212093713.60614-1-heikki.krogerus@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As we will remove items off the list using list_del(), we need to use the
safe version of list_for_each_entry().
Fixes: 4e60a9568dc6 ("interconnect: qcom: add msm8974 driver")
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
Link: https://lore.kernel.org/r/20191212075332.16202-5-georgi.djakov@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As we will remove items off the list using list_del(), we need to use the
safe version of list_for_each_entry().
Fixes: 5e4e6c4d3ae0 ("interconnect: qcom: Add QCS404 interconnect provider driver")
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
Cc: <stable@vger.kernel.org> # v5.4
Link: https://lore.kernel.org/r/20191212075332.16202-4-georgi.djakov@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>