62798 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Josef Bacik
|
f4b1363cae |
btrfs: do not do delalloc reservation under page lock
We ran into a deadlock in production with the fixup worker. The stack traces were as follows: Thread responsible for the writeout, waiting on the page lock [<0>] io_schedule+0x12/0x40 [<0>] __lock_page+0x109/0x1e0 [<0>] extent_write_cache_pages+0x206/0x360 [<0>] extent_writepages+0x40/0x60 [<0>] do_writepages+0x31/0xb0 [<0>] __writeback_single_inode+0x3d/0x350 [<0>] writeback_sb_inodes+0x19d/0x3c0 [<0>] __writeback_inodes_wb+0x5d/0xb0 [<0>] wb_writeback+0x231/0x2c0 [<0>] wb_workfn+0x308/0x3c0 [<0>] process_one_work+0x1e0/0x390 [<0>] worker_thread+0x2b/0x3c0 [<0>] kthread+0x113/0x130 [<0>] ret_from_fork+0x35/0x40 [<0>] 0xffffffffffffffff Thread of the fixup worker who is holding the page lock [<0>] start_delalloc_inodes+0x241/0x2d0 [<0>] btrfs_start_delalloc_roots+0x179/0x230 [<0>] btrfs_alloc_data_chunk_ondemand+0x11b/0x2e0 [<0>] btrfs_check_data_free_space+0x53/0xa0 [<0>] btrfs_delalloc_reserve_space+0x20/0x70 [<0>] btrfs_writepage_fixup_worker+0x1fc/0x2a0 [<0>] normal_work_helper+0x11c/0x360 [<0>] process_one_work+0x1e0/0x390 [<0>] worker_thread+0x2b/0x3c0 [<0>] kthread+0x113/0x130 [<0>] ret_from_fork+0x35/0x40 [<0>] 0xffffffffffffffff Thankfully the stars have to align just right to hit this. First you have to end up in the fixup worker, which is tricky by itself (my reproducer does DIO reads into a MMAP'ed region, so not a common operation). Then you have to have less than a page size of free data space and 0 unallocated space so you go down the "commit the transaction to free up pinned space" path. This was accomplished by a random balance that was running on the host. Then you get this deadlock. I'm still in the process of trying to force the deadlock to happen on demand, but I've hit other issues. I can still trigger the fixup worker path itself so this patch has been tested in that regard, so the normal case is fine. Fixes: 87826df0ec36 ("btrfs: delalloc for page dirtied out-of-band in fixup worker") Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
Josef Bacik
|
5ab5805569 |
btrfs: drop the -EBUSY case in __extent_writepage_io
Now that we only return 0 or -EAGAIN from btrfs_writepage_cow_fixup, we do not need this -EBUSY case. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
Chris Mason
|
25f3c50219 |
Btrfs: keep pages dirty when using btrfs_writepage_fixup_worker
For COW, btrfs expects pages dirty pages to have been through a few setup steps. This includes reserving space for the new block allocations and marking the range in the state tree for delayed allocation. A few places outside btrfs will dirty pages directly, especially when unmapping mmap'd pages. In order for these to properly go through COW, we run them through a fixup worker to wait for stable pages, and do the delalloc prep. 87826df0ec36 added a window where the dirty pages were cleaned, but pending more action from the fixup worker. We clear_page_dirty_for_io() before we call into writepage, so the page is no longer dirty. The commit changed it so now we leave the page clean between unlocking it here and the fixup worker starting at some point in the future. During this window, page migration can jump in and relocate the page. Once our fixup work actually starts, it finds page->mapping is NULL and we end up freeing the page without ever writing it. This leads to crc errors and other exciting problems, since it screws up the whole statemachine for waiting for ordered extents. The fix here is to keep the page dirty while we're waiting for the fixup worker to get to work. This is accomplished by returning -EAGAIN from btrfs_writepage_cow_fixup if we queued the page up for fixup, which will cause the writepage function to redirty the page. Because we now expect the page to be dirty once it gets to the fixup worker we must adjust the error cases to call clear_page_dirty_for_io() on the page. That is the bulk of the patch, but it is not the fix, the fix is the -EAGAIN from btrfs_writepage_cow_fixup. We cannot separate these two changes out because the error conditions change with the new expectations. Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
Josef Bacik
|
a30a3d2067 |
btrfs: take overcommit into account in inc_block_group_ro
inc_block_group_ro does a calculation to see if we have enough room left over if we mark this block group as read only in order to see if it's ok to mark the block group as read only. The problem is this calculation _only_ works for data, where our used is always less than our total. For metadata we will overcommit, so this will almost always fail for metadata. Fix this by exporting btrfs_can_overcommit, and then see if we have enough space to remove the remaining free space in the block group we are trying to mark read only. If we do then we can mark this block group as read only. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
Josef Bacik
|
a7a63acc65 |
btrfs: fix force usage in inc_block_group_ro
For some reason we've translated the do_chunk_alloc that goes into btrfs_inc_block_group_ro to force in inc_block_group_ro, but these are two different things. force for inc_block_group_ro is used when we are forcing the block group read only no matter what, for example when the underlying chunk is marked read only. We need to not do the space check here as this block group needs to be read only. btrfs_inc_block_group_ro() has a do_chunk_alloc flag that indicates that we need to pre-allocate a chunk before marking the block group read only. This has nothing to do with forcing, and in fact we _always_ want to do the space check in this case, so unconditionally pass false for force in this case. Then fixup inc_block_group_ro to honor force as it's expected and documented to do. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
Nikolay Borisov
|
5750c37523 |
btrfs: Correctly handle empty trees in find_first_clear_extent_bit
Raviu reported that running his regular fs_trim segfaulted with the following backtrace: [ 237.525947] assertion failed: prev, in ../fs/btrfs/extent_io.c:1595 [ 237.525984] ------------[ cut here ]------------ [ 237.525985] kernel BUG at ../fs/btrfs/ctree.h:3117! [ 237.525992] invalid opcode: 0000 [#1] SMP PTI [ 237.525998] CPU: 4 PID: 4423 Comm: fstrim Tainted: G U OE 5.4.14-8-vanilla #1 [ 237.526001] Hardware name: ASUSTeK COMPUTER INC. [ 237.526044] RIP: 0010:assfail.constprop.58+0x18/0x1a [btrfs] [ 237.526079] Call Trace: [ 237.526120] find_first_clear_extent_bit+0x13d/0x150 [btrfs] [ 237.526148] btrfs_trim_fs+0x211/0x3f0 [btrfs] [ 237.526184] btrfs_ioctl_fitrim+0x103/0x170 [btrfs] [ 237.526219] btrfs_ioctl+0x129a/0x2ed0 [btrfs] [ 237.526227] ? filemap_map_pages+0x190/0x3d0 [ 237.526232] ? do_filp_open+0xaf/0x110 [ 237.526238] ? _copy_to_user+0x22/0x30 [ 237.526242] ? cp_new_stat+0x150/0x180 [ 237.526247] ? do_vfs_ioctl+0xa4/0x640 [ 237.526278] ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs] [ 237.526283] do_vfs_ioctl+0xa4/0x640 [ 237.526288] ? __do_sys_newfstat+0x3c/0x60 [ 237.526292] ksys_ioctl+0x70/0x80 [ 237.526297] __x64_sys_ioctl+0x16/0x20 [ 237.526303] do_syscall_64+0x5a/0x1c0 [ 237.526310] entry_SYSCALL_64_after_hwframe+0x49/0xbe That was due to btrfs_fs_device::aloc_tree being empty. Initially I thought this wasn't possible and as a percaution have put the assert in find_first_clear_extent_bit. Turns out this is indeed possible and could happen when a file system with SINGLE data/metadata profile has a 2nd device added. Until balance is run or a new chunk is allocated on this device it will be completely empty. In this case find_first_clear_extent_bit should return the full range [0, -1ULL] and let the caller handle this i.e for trim the end will be capped at the size of actual device. Link: https://lore.kernel.org/linux-btrfs/izW2WNyvy1dEDweBICizKnd2KDwDiDyY2EYQr4YCwk7pkuIpthx-JRn65MPBde00ND6V0_Lh8mW0kZwzDiLDv25pUYWxkskWNJnVP0kgdMA=@protonmail.com/ Fixes: 45bfcfc168f8 ("btrfs: Implement find_first_clear_extent_bit") CC: stable@vger.kernel.org # 5.2+ Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
Josef Bacik
|
42ffb0bf58 |
btrfs: flush write bio if we loop in extent_write_cache_pages
There exists a deadlock with range_cyclic that has existed forever. If we loop around with a bio already built we could deadlock with a writer who has the page locked that we're attempting to write but is waiting on a page in our bio to be written out. The task traces are as follows PID: 1329874 TASK: ffff889ebcdf3800 CPU: 33 COMMAND: "kworker/u113:5" #0 [ffffc900297bb658] __schedule at ffffffff81a4c33f #1 [ffffc900297bb6e0] schedule at ffffffff81a4c6e3 #2 [ffffc900297bb6f8] io_schedule at ffffffff81a4ca42 #3 [ffffc900297bb708] __lock_page at ffffffff811f145b #4 [ffffc900297bb798] __process_pages_contig at ffffffff814bc502 #5 [ffffc900297bb8c8] lock_delalloc_pages at ffffffff814bc684 #6 [ffffc900297bb900] find_lock_delalloc_range at ffffffff814be9ff #7 [ffffc900297bb9a0] writepage_delalloc at ffffffff814bebd0 #8 [ffffc900297bba18] __extent_writepage at ffffffff814bfbf2 #9 [ffffc900297bba98] extent_write_cache_pages at ffffffff814bffbd PID: 2167901 TASK: ffff889dc6a59c00 CPU: 14 COMMAND: "aio-dio-invalid" #0 [ffffc9003b50bb18] __schedule at ffffffff81a4c33f #1 [ffffc9003b50bba0] schedule at ffffffff81a4c6e3 #2 [ffffc9003b50bbb8] io_schedule at ffffffff81a4ca42 #3 [ffffc9003b50bbc8] wait_on_page_bit at ffffffff811f24d6 #4 [ffffc9003b50bc60] prepare_pages at ffffffff814b05a7 #5 [ffffc9003b50bcd8] btrfs_buffered_write at ffffffff814b1359 #6 [ffffc9003b50bdb0] btrfs_file_write_iter at ffffffff814b5933 #7 [ffffc9003b50be38] new_sync_write at ffffffff8128f6a8 #8 [ffffc9003b50bec8] vfs_write at ffffffff81292b9d #9 [ffffc9003b50bf00] ksys_pwrite64 at ffffffff81293032 I used drgn to find the respective pages we were stuck on page_entry.page 0xffffea00fbfc7500 index 8148 bit 15 pid 2167901 page_entry.page 0xffffea00f9bb7400 index 7680 bit 0 pid 1329874 As you can see the kworker is waiting for bit 0 (PG_locked) on index 7680, and aio-dio-invalid is waiting for bit 15 (PG_writeback) on index 8148. aio-dio-invalid has 7680, and the kworker epd looks like the following crash> struct extent_page_data ffffc900297bbbb0 struct extent_page_data { bio = 0xffff889f747ed830, tree = 0xffff889eed6ba448, extent_locked = 0, sync_io = 0 } Probably worth mentioning as well that it waits for writeback of the page to complete while holding a lock on it (at prepare_pages()). Using drgn I walked the bio pages looking for page 0xffffea00fbfc7500 which is the one we're waiting for writeback on bio = Object(prog, 'struct bio', address=0xffff889f747ed830) for i in range(0, bio.bi_vcnt.value_()): bv = bio.bi_io_vec[i] if bv.bv_page.value_() == 0xffffea00fbfc7500: print("FOUND IT") which validated what I suspected. The fix for this is simple, flush the epd before we loop back around to the beginning of the file during writeout. Fixes: b293f02e1423 ("Btrfs: Add writepages support") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
Filipe Manana
|
7227ff4de5 |
Btrfs: fix race between adding and putting tree mod seq elements and nodes
There is a race between adding and removing elements to the tree mod log list and rbtree that can lead to use-after-free problems. Consider the following example that explains how/why the problems happens: 1) Task A has mod log element with sequence number 200. It currently is the only element in the mod log list; 2) Task A calls btrfs_put_tree_mod_seq() because it no longer needs to access the tree mod log. When it enters the function, it initializes 'min_seq' to (u64)-1. Then it acquires the lock 'tree_mod_seq_lock' before checking if there are other elements in the mod seq list. Since the list it empty, 'min_seq' remains set to (u64)-1. Then it unlocks the lock 'tree_mod_seq_lock'; 3) Before task A acquires the lock 'tree_mod_log_lock', task B adds itself to the mod seq list through btrfs_get_tree_mod_seq() and gets a sequence number of 201; 4) Some other task, name it task C, modifies a btree and because there elements in the mod seq list, it adds a tree mod elem to the tree mod log rbtree. That node added to the mod log rbtree is assigned a sequence number of 202; 5) Task B, which is doing fiemap and resolving indirect back references, calls btrfs get_old_root(), with 'time_seq' == 201, which in turn calls tree_mod_log_search() - the search returns the mod log node from the rbtree with sequence number 202, created by task C; 6) Task A now acquires the lock 'tree_mod_log_lock', starts iterating the mod log rbtree and finds the node with sequence number 202. Since 202 is less than the previously computed 'min_seq', (u64)-1, it removes the node and frees it; 7) Task B still has a pointer to the node with sequence number 202, and it dereferences the pointer itself and through the call to __tree_mod_log_rewind(), resulting in a use-after-free problem. This issue can be triggered sporadically with the test case generic/561 from fstests, and it happens more frequently with a higher number of duperemove processes. When it happens to me, it either freezes the VM or it produces a trace like the following before crashing: [ 1245.321140] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI [ 1245.321200] CPU: 1 PID: 26997 Comm: pool Not tainted 5.5.0-rc6-btrfs-next-52 #1 [ 1245.321235] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014 [ 1245.321287] RIP: 0010:rb_next+0x16/0x50 [ 1245.321307] Code: .... [ 1245.321372] RSP: 0018:ffffa151c4d039b0 EFLAGS: 00010202 [ 1245.321388] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8ae221363c80 RCX: 6b6b6b6b6b6b6b6b [ 1245.321409] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8ae221363c80 [ 1245.321439] RBP: ffff8ae20fcc4688 R08: 0000000000000002 R09: 0000000000000000 [ 1245.321475] R10: ffff8ae20b120910 R11: 00000000243f8bb1 R12: 0000000000000038 [ 1245.321506] R13: ffff8ae221363c80 R14: 000000000000075f R15: ffff8ae223f762b8 [ 1245.321539] FS: 00007fdee1ec7700(0000) GS:ffff8ae236c80000(0000) knlGS:0000000000000000 [ 1245.321591] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1245.321614] CR2: 00007fded4030c48 CR3: 000000021da16003 CR4: 00000000003606e0 [ 1245.321642] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1245.321668] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1245.321706] Call Trace: [ 1245.321798] __tree_mod_log_rewind+0xbf/0x280 [btrfs] [ 1245.321841] btrfs_search_old_slot+0x105/0xd00 [btrfs] [ 1245.321877] resolve_indirect_refs+0x1eb/0xc60 [btrfs] [ 1245.321912] find_parent_nodes+0x3dc/0x11b0 [btrfs] [ 1245.321947] btrfs_check_shared+0x115/0x1c0 [btrfs] [ 1245.321980] ? extent_fiemap+0x59d/0x6d0 [btrfs] [ 1245.322029] extent_fiemap+0x59d/0x6d0 [btrfs] [ 1245.322066] do_vfs_ioctl+0x45a/0x750 [ 1245.322081] ksys_ioctl+0x70/0x80 [ 1245.322092] ? trace_hardirqs_off_thunk+0x1a/0x1c [ 1245.322113] __x64_sys_ioctl+0x16/0x20 [ 1245.322126] do_syscall_64+0x5c/0x280 [ 1245.322139] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 1245.322155] RIP: 0033:0x7fdee3942dd7 [ 1245.322177] Code: .... [ 1245.322258] RSP: 002b:00007fdee1ec6c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 1245.322294] RAX: ffffffffffffffda RBX: 00007fded40210d8 RCX: 00007fdee3942dd7 [ 1245.322314] RDX: 00007fded40210d8 RSI: 00000000c020660b RDI: 0000000000000004 [ 1245.322337] RBP: 0000562aa89e7510 R08: 0000000000000000 R09: 00007fdee1ec6d44 [ 1245.322369] R10: 0000000000000073 R11: 0000000000000246 R12: 00007fdee1ec6d48 [ 1245.322390] R13: 00007fdee1ec6d40 R14: 00007fded40210d0 R15: 00007fdee1ec6d50 [ 1245.322423] Modules linked in: .... [ 1245.323443] ---[ end trace 01de1e9ec5dff3cd ]--- Fix this by ensuring that btrfs_put_tree_mod_seq() computes the minimum sequence number and iterates the rbtree while holding the lock 'tree_mod_log_lock' in write mode. Also get rid of the 'tree_mod_seq_lock' lock, since it is now redundant. Fixes: bd989ba359f2ac ("Btrfs: add tree modification log functions") Fixes: 097b8a7c9e48e2 ("Btrfs: join tree mod log code with the code holding back delayed refs") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> |
||
Linus Torvalds
|
ccaaaf6fe5 |
MPX requires recompiling applications, which requires compiler support.
Unfortunately, GCC 9.1 is expected to be be released without support for MPX. This means that there was only a relatively small window where folks could have ever used MPX. It failed to gain wide adoption in the industry, and Linux was the only mainstream OS to ever support it widely. Support for the feature may also disappear on future processors. This set completes the process that we started during the 5.4 merge window. -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJeK1/pAAoJEGg1lTBwyZKwgC8QAIiVn1d7A9Uj/WpnpgfCChCZ 9XiV6Ak999qD9fbAcrgNfPjieaD4mtokocSRVJuRgJu5iLnIJCINlozLPe4yVl7P 7zebnxkLq0CIA8d56bEUoFlC0J+oWYlDVQePZzNQsSk5KHVGXVLpF6U4vDVzZeQy cprgvdeY+ehB7G6IIo0MWTg5ylKYAsOAyVvK8NIGpKY2k6/YqCnsptnsVE7bvlHy TrEOiUWLv+hh0bMkZdP1PwKQKEuMO/IZly0HtviFbMN7T4TB1spfg7ELoBucEq3T s4EVbYRe+nIE4tuEAveaX3CgxJek8cY5MlticskdaKSEACBwabdOF55qsZy0u+WA PYC4iUIXfbOH8OgieKWtGX4IuSkRYdQ2nP4BOpe4ZX4+zvU7zOCIyVSKRrwkX8cc ADtWI5FAtB36KCgUuWnHGHNZpOxPTbTLBuBataFY4Q2uBNJEBJpscZ5H9ObtyGFU ZjlzqFnM0nFNDKEI1EEtv9jLzgZTU1RQ46s7EFeSeEQ2/s9wJ3+s5sBlVbljsmus o658bLOEaRWC/aF15dgmEXW9GAO6uifNdmbzGnRn7oEMYyFQPTWbZvi1zGz58QaG Y6WTtigVtsSrHS4wpYd+p+n1W06VnB6J3BpBM4G1VQv1Vm0dNd1tUOfkqOzPjg7c 33Itmsz2LaW1mb67GlgZ =g4cC -----END PGP SIGNATURE----- Merge tag 'mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx Pull x86 MPX removal from Dave Hansen: "MPX requires recompiling applications, which requires compiler support. Unfortunately, GCC 9.1 is expected to be be released without support for MPX. This means that there was only a relatively small window where folks could have ever used MPX. It failed to gain wide adoption in the industry, and Linux was the only mainstream OS to ever support it widely. Support for the feature may also disappear on future processors. This set completes the process that we started during the 5.4 merge window when the MPX prctl()s were removed. XSAVE support is left in place, which allows MPX-using KVM guests to continue to function" * tag 'mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx: x86/mpx: remove MPX from arch/x86 mm: remove arch_bprm_mm_init() hook x86/mpx: remove bounds exception code x86/mpx: remove build infrastructure x86/alternatives: add missing insn.h include |
||
Linus Torvalds
|
e84bcd61f6 |
This pull request contains mostly fixes for UBI and UBIFS:
UBI: - Fixes for memory leaks in error paths - Fix for an logic error in a fastmap selfcheck UBIFS: - Fix for FS_IOC_SETFLAGS related to fscrypt flag - Support for FS_ENCRYPT_FL - Fix for a dead lock in bulk-read mode -----BEGIN PGP SIGNATURE----- iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAl4k1v8WHHJpY2hhcmRA c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wTGuD/9PU3qZJq1w5F499YAJb2qx2hLD rseg7SZ4rzKwXI5m2g2EP63lgMQkJZC7u5YUv9c1m0gQnNvVXbTdhba3M1V437Kd F0Ce2SeqbVRi4faGWyEH0TEEmFo2Nk7Uz3iJeaXxUY8BqVrvQZaBYk6GWtj+wWIl Yc2ONKwzIF2BDTU5pFvP2yubHnTm00M4uP46MqAcaWoehd9L+xZhC0xiXyWFiAQE g/ITk4vgD3bJRkI0nYNuHxFIgafIweGlyuNfMMpfh2Yqo5/tGnppPE+H+Maokb8V 6Gqmt9XR34ZGH8mOZsMFWxeK6e68DP2AkzL1EsiT2FlUc6hhCr+pOVEN17Y4eb// IRpy7l8f9BkHvR72roaQusE1UjANC2sw2VtDi4TJO6WpFRx4n94//bf+IxO32os8 0AbIyzYCEo1Kql0wTxhqTZnHJr+zHjcFWOuzZ/95iH5wVQmb3hvlmmozL6ZPV9sG cqyV1sEcFhkUKgCSTmbtoYBKfEJLj4j3WYvLoI7apLYN014ExNJY7PIVfIUtMfZQ Sn0sN8+/gpQOOben67IQK9EdcvEhEkY4JdTHpuZEQmh3cS4HNnhzjM2A+n2sWTU8 lxkakeemcO2sV6ue/Vg6Fq4fPEnQtyOVVsSUHjA7hIy5JXwprDvft4gHkI2r40FI fC9PuCjoUIIJtMyA7g== =9tMk -----END PGP SIGNATURE----- Merge tag 'upstream-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull UBI/UBIFS updates from Miquel Raynal: "This pull request contains mostly fixes for UBI and UBIFS: UBI: - Fixes for memory leaks in error paths - Fix for an logic error in a fastmap selfcheck UBIFS: - Fix for FS_IOC_SETFLAGS related to fscrypt flag - Support for FS_ENCRYPT_FL - Fix for a dead lock in bulk-read mode" Sent on behalf of Richard Weinberger who is traveling. * tag 'upstream-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubi: Fix an error pointer dereference in error handling code ubifs: Fix memory leak from c->sup_node ubifs: Fix ino_t format warnings in orphan_delete() ubifs: Fix deadlock in concurrent bulk-read and writepage ubifs: Fix wrong memory allocation ubi: Free the normal volumes in error paths of ubi_attach_mtd_dev() ubi: Check the presence of volume before call ubi_fastmap_destroy_checkmap() ubifs: Add support for FS_ENCRYPT_FL ubifs: Fix FS_IOC_SETFLAGS unexpectedly clearing encrypt flag ubi: wl: Remove set but not used variable 'prev_e' ubi: fastmap: Fix inverted logic in seen selfcheck |
||
Linus Torvalds
|
6e135baed8 |
f2fs-for-5.6
In this series, we've implemented transparent compression experimentally. It supports LZO and LZ4, but will add more later as we investigate in the field more. At this point, the feature doesn't expose compressed space to user directly in order to guarantee potential data updates later to the space. Instead, the main goal is to reduce data writes to flash disk as much as possible, resulting in extending disk life time as well as relaxing IO congestion. Alternatively, we're also considering to add ioctl() to reclaim compressed space and show it to user after putting the immutable bit. Enhancement: - add compression support - avoid unnecessary locks in quota ops - harden power-cut scenario for zoned block devices - use private bio_set to avoid IO congestion - replace GC mutex with rwsem to serialize callers Bug fix: - fix dentry consistency and memory corruption in rename()'s error case - fix wrong swap extent reports - fix casefolding bugs - change lock coverage to avoid deadlock - avoid GFP_KERNEL under f2fs_lock_op And, we've cleaned up sysfs entries to prepare no debugfs. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAl4zInwACgkQQBSofoJI UNL4Tg/+JBbVEFa3IUBGMdbjfgd/g0Jye++iMAYYGRWT6Ll/IGcHRV9NunITjgWU mBZqdhI28kXeiGCcewB1ZvivjLx22X4n6yevHk2B5A6PNe9IDCHi0HOAhJJHkjPH ecv2L+vX3Oj4y0+H7JNz9Fo3OIPJvMPtCQWlg1z+VQyhB85zNP7fZlvvIY4tG8yw ERo0YNotLqwcF1BxCwNbAhV3aJGDxar+MI//yNzpiwDX7IptVpqestfcoIYc9kKL 4kSWRyEIGwcuIeyoM6aofGS9t4Z/Oe/gdqcxNr6l5n0Q/tMTpb4b/fJFGNr6RRx9 X9NQo8flkQb2DEIOP0DVpO2aPebzsVtzg3LZUOLA83+wCHfwINtHai2Dy2zDJ2my BrVdou8fe2oxoaYihJg/Tz9cd0nA/6mZArtpYvDImAmX/xuGOvVk9zZkXNwc9nVX EyVzy0vW4lA6gAIJ95aG6DDhJcAtVoy0MhBRWG92Pufxhn9aW24AV63ChWUf9DRx /3RqpMAuQ3UC2gOxXKKnr54lsdhUIMn/y9sjROkVvQ1BvgRVxO8I4GFvMHMKv9pR 9KXiVRdzyYERyoL4+MF7A2zTnw+RHL4RVILa85p2ALGy2jQ1UuNUQi0BN9x2u1v8 S1ifNNX8SwOP+83ImFJhhn3HybpFQ45aLO3F7ZjKBQAnufJu+xw= =zeoY -----END PGP SIGNATURE----- Merge tag 'f2fs-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this series, we've implemented transparent compression experimentally. It supports LZO and LZ4, but will add more later as we investigate in the field more. At this point, the feature doesn't expose compressed space to user directly in order to guarantee potential data updates later to the space. Instead, the main goal is to reduce data writes to flash disk as much as possible, resulting in extending disk life time as well as relaxing IO congestion. Alternatively, we're also considering to add ioctl() to reclaim compressed space and show it to user after putting the immutable bit. Enhancements: - add compression support - avoid unnecessary locks in quota ops - harden power-cut scenario for zoned block devices - use private bio_set to avoid IO congestion - replace GC mutex with rwsem to serialize callers Bug fixes: - fix dentry consistency and memory corruption in rename()'s error case - fix wrong swap extent reports - fix casefolding bugs - change lock coverage to avoid deadlock - avoid GFP_KERNEL under f2fs_lock_op And, we've cleaned up sysfs entries to prepare no debugfs" * tag 'f2fs-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (31 commits) f2fs: fix race conditions in ->d_compare() and ->d_hash() f2fs: fix dcache lookup of !casefolded directories f2fs: Add f2fs stats to sysfs f2fs: delete duplicate information on sysfs nodes f2fs: change to use rwsem for gc_mutex f2fs: update f2fs document regarding to fsync_mode f2fs: add a way to turn off ipu bio cache f2fs: code cleanup for f2fs_statfs_project() f2fs: fix miscounted block limit in f2fs_statfs_project() f2fs: show the CP_PAUSE reason in checkpoint traces f2fs: fix deadlock allocating bio_post_read_ctx from mempool f2fs: remove unneeded check for error allocating bio_post_read_ctx f2fs: convert inline_dir early before starting rename f2fs: fix memleak of kobject f2fs: fix to add swap extent correctly f2fs: run fsck when getting bad inode during GC f2fs: support data compression f2fs: free sysfs kobject f2fs: declare nested quota_sem and remove unnecessary sems f2fs: don't put new_page twice in f2fs_rename ... |
||
Linus Torvalds
|
0196be12aa |
\n
-----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAl4zANcACgkQnJ2qBz9k QNkyBQgA5/ppAhSC7Snc6BDm5PMiOJjN+FhYB1W9bHbkRlKfTetJxQTxbPpokZPq A+99KuuNb3Uay2XWqan2pwZ90/9SIUZT8HnwNYwEHh33Nt76A1ybqqM0IAk+RWus KjW7Jg/xCbbFKQX/estngjIlniUQ0WP7VTTwS/NPnvsIYNEpWJQvyIecm2DZhWGS fmbn5x7PYnyveADd2Tf9z0iOKKI0ysLYksUlx+Ndg3fwPaWsI57tgUZL0Tzf552S cCsRjQrcnhjuHTDEhH9HOGQlu45U4bBNkXKKoc1HUrp58UyTY2Rnn/QCM8jkTpzB 7NwoFyqPtWguJTFDsUH1rmqQisYoMQ== =1v6t -----END PGP SIGNATURE----- Merge tag 'for_v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull UDF, quota, reiserfs, ext2 fixes and cleanups from Jan Kara: "A few assorted fixes and cleanups for udf, quota, reiserfs, and ext2" * tag 'for_v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fs/reiserfs: remove unused macros fs/quota: remove unused macro udf: Clarify meaning of f_files in udf_statfs udf: Allow writing to 'Rewritable' partitions udf: Disallow R/W mode for disk with Metadata partition udf: Fix meaning of ENTITYID_FLAGS_* macros to be really bitwise-or flags udf: Fix free space reporting for metadata and virtual partitions udf: Update header files to UDF 2.60 udf: Move OSTA Identifier Suffix macros from ecma_167.h to osta_udf.h udf: Fix spelling in EXT_NEXT_EXTENT_ALLOCDESCS ext2: Adjust indentation in ext2_fill_super quota: avoid time_t in v1_disk_dqblk definition reiserfs: Fix spurious unlock in reiserfs_fill_super() error handling reiserfs: Fix memory leak of journal device string ext2: set proper errno in error case of ext2_fill_super() |
||
Linus Torvalds
|
91f1a9566f |
New code for 5.6:
- Get rid of compat_time_t - Convert time_t to time64_t in quota code - Remove shadow variables - Prevent ATTR_ flag misuse in the attrmulti ioctls - Clean out strlen in the attr code - Remove some bogus asserts - Fix various file size limit calculation errors with 32-bit kernels - Pack xfs_dir2_sf_entry_t to fix build errors on arm oabi - Fix nowait inode locking calls for directio aio reads. - Fix memory corruption bugs when invalidating remote xattr value buffers. - Streamline remote attr value removal. - Make the buffer log format size consistent across platforms. - Strengthen buffer log format size checking. - Fix messed up return types of xfs_inode_need_cow. - Fix some unused variable warnings. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl4rGQcACgkQ+H93GTRK tOv1uhAAo5F38h9ZpdoMK4TPJNaREmLtM+MAA03l3sk+BY5zGDo690WDaxa5cd8h cJi8pqHpcnroJVptCkbdagx0BIPnHMVpw7CKFpBlCjNHpFeO1ULLpk/RcvwISRZy OvEOmpn6Z/FQ2jTBjr45KXIiS5DAK3KRMrmGdvh6j73YVhQ1lgadY/6Jj2TRjzT1 FFjnYMGhufBsP0yV9Rg10AcIhNCIAr3RYmfjqFXMA8raEX33gdpsJ6GU6r5iyXRJ B9dByoBlGCL15ZlxZOqEiN4omqqBLux5jrJr1tg0L0hbfu7UIxMcwXiSTnoaO2SZ G7GjlEO3wszf3wGEeaaJd/tN58SQDvfz3yY9vZObrlTelN5iDrcziLHbWc+xkwPh mykly36x8+dZ8kqgBxiF1WFgRheYNQIWnZ9wtCfWvsPtjrklIZNZqKDEAxXnSg0w 8lRXgeNfy6Jh78A817aPQibqkUAtxSk8RJhimkPWyKwUA3jSzgv1LsjgzoRCM8ik /vmmRKmXviFrQwIgDKYQc/wHx58khKBi2Kna+rZUJrozrBU+P0EfG+AU6Rxjw1ob cOOSNooI5kp+uTrmlaKKr9mua80m0hACgrFT9Fj+SJWgi313343VVNXubRgK/yV7 WxTcTOZadkQgqgeQSMzwFQuQPjQob9bNoQf5PK01sJ/4v96Zsmg= =J5Em -----END PGP SIGNATURE----- Merge tag 'xfs-5.6-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs updates from Darrick Wong: "In this release we clean out the last of the old 32-bit timestamp code, fix a number of bugs and memory corruptions on 32-bit platforms, and a refactoring of some of the extended attribute code. I think I'll be back next week with some refactoring of how the XFS buffer code returns error codes, however I prefer to hold onto that for another week to let it soak a while longer Summary: - Get rid of compat_time_t - Convert time_t to time64_t in quota code - Remove shadow variables - Prevent ATTR_ flag misuse in the attrmulti ioctls - Clean out strlen in the attr code - Remove some bogus asserts - Fix various file size limit calculation errors with 32-bit kernels - Pack xfs_dir2_sf_entry_t to fix build errors on arm oabi - Fix nowait inode locking calls for directio aio reads - Fix memory corruption bugs when invalidating remote xattr value buffers - Streamline remote attr value removal - Make the buffer log format size consistent across platforms - Strengthen buffer log format size checking - Fix messed up return types of xfs_inode_need_cow - Fix some unused variable warnings" * tag 'xfs-5.6-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (24 commits) xfs: remove unused variable 'done' xfs: fix uninitialized variable in xfs_attr3_leaf_inactive xfs: change return value of xfs_inode_need_cow to int xfs: check log iovec size to make sure it's plausibly a buffer log format xfs: make struct xfs_buf_log_format have a consistent size xfs: complain if anyone tries to create a too-large buffer log item xfs: clean up xfs_buf_item_get_format return value xfs: streamline xfs_attr3_leaf_inactive xfs: fix memory corruption during remote attr value buffer invalidation xfs: refactor remote attr value buffer invalidation xfs: fix IOCB_NOWAIT handling in xfs_file_dio_aio_read xfs: Add __packed to xfs_dir2_sf_entry_t definition xfs: fix s_maxbytes computation on 32-bit kernels xfs: truncate should remove all blocks, not just to the end of the page cache xfs: introduce XFS_MAX_FILEOFF xfs: remove bogus assertion when online repair isn't enabled xfs: Remove all strlen in all xfs_attr_* functions for attr names. xfs: fix misuse of the XFS_ATTR_INCOMPLETE flag xfs: also remove cached ACLs when removing the underlying attr xfs: reject invalid flags combinations in XFS_IOC_ATTRMULTI_BY_HANDLE ... |
||
Linus Torvalds
|
e5da4c933c |
This merge window, we've added some performance improvements in how we
handle inode locking in the read/write paths, and improving the performance of Direct I/O overwrites. We also now record the error code which caused the first and most recent ext4_error() report in the superblock, to make it easier to root cause problems in production systems. There are also many of the usual cleanups and miscellaneous bug fixes. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAl4yBf0ACgkQ8vlZVpUN gaOK8Af9EsY1vyR/IvEosfXJoKIqnTXN1SLt94iAOUh6dNeVNcyv1SIzRGFrpmsg uHY02EkcTl68b/AjV7ieDpOnOSmlP7NzynuVoar2hrjKX0MzpEu03Vv1a3dUQKuU zcdchi83EwRjEvegsNK/VF3FFadk3TtC7x+7o6p840V6OAyp5CXhjm1akJqIJwvd A4gTpruTSRIFg6Jj36HEDNRgSAeILed3wC7Ywtxt51tLK7Lp/qB1EuvYodMQRvGz d0fRhbNHKepVYfxwpDUDMFnrqDPZ/SZGF73XBxP2zHd6SXy9dBLzGsRL+oj9tTUg YQJtt4Yxjjg8Q1UrMyMRzQpi4S8dAQ== =pVeR -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "This merge window, we've added some performance improvements in how we handle inode locking in the read/write paths, and improving the performance of Direct I/O overwrites. We also now record the error code which caused the first and most recent ext4_error() report in the superblock, to make it easier to root cause problems in production systems. There are also many of the usual cleanups and miscellaneous bug fixes" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (49 commits) jbd2: clean __jbd2_journal_abort_hard() and __journal_abort_soft() jbd2: make sure ESHUTDOWN to be recorded in the journal superblock ext4, jbd2: ensure panic when aborting with zero errno jbd2: switch to use jbd2_journal_abort() when failed to submit the commit record jbd2_seq_info_next should increase position index jbd2: remove pointless assertion in __journal_remove_journal_head ext4,jbd2: fix comment and code style jbd2: delete the duplicated words in the comments ext4: fix extent_status trace points ext4: fix symbolic enum printing in trace output ext4: choose hardlimit when softlimit is larger than hardlimit in ext4_statfs_project() ext4: fix race conditions in ->d_compare() and ->d_hash() ext4: make dioread_nolock the default ext4: fix extent_status fragmentation for plain files jbd2: clear JBD2_ABORT flag before journal_reset to update log tail info when load journal ext4: drop ext4_kvmalloc() ext4: Add EXT4_IOC_FSGETXATTR/EXT4_IOC_FSSETXATTR to compat_ioctl ext4: remove unused macro MPAGE_DA_EXTENT_TAIL ext4: add missing braces in ext4_ext_drop_refs() ext4: fix some nonstandard indentation in extents.c ... |
||
Ronnie Sahlberg
|
c54849ddd8 |
cifs: fix soft mounts hanging in the reconnect code
RHBZ: 1795429 In recent DFS updates we have a new variable controlling how many times we will retry to reconnect the share. If DFS is not used, then this variable is initialized to 0 in: static inline int dfs_cache_get_nr_tgts(const struct dfs_cache_tgt_list *tl) { return tl ? tl->tl_numtgts : 0; } This means that in the reconnect loop in smb2_reconnect() we will immediately wrap retries to -1 and never actually get to pass this conditional: if (--retries) continue; The effect is that we no longer reach the point where we fail the commands with -EHOSTDOWN and basically the kernel threads are virtually hung and unkillable. Fixes: a3a53b7603798fd8 (cifs: Add support for failover in smb2_reconnect()) Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> CC: Stable <stable@vger.kernel.org> |
||
Stefan Metzmacher
|
d7f62e825f |
io_uring: add BUILD_BUG_ON() to assert the layout of struct io_uring_sqe
With nesting of anonymous unions and structs it's hard to review layout changes. It's better to ask the compiler for these things. Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Jens Axboe
|
87ce955b24 |
io_uring: add ->show_fdinfo() for the io_uring file descriptor
It can be hard to know exactly what is registered with the ring. Especially for credentials, it'd be handy to be able to see which ones are registered, what personalities they have, and what the ID of each of them is. This adds support for showing information registered in the ring from the fdinfo of the io_uring fd. Here's an example from a test case that registers 4 files (two of them sparse), 4 buffers, and 2 personalities: pos: 0 flags: 02000002 mnt_id: 14 UserFiles: 4 0: file-no-1 1: file-no-2 2: <none> 3: <none> UserBufs: 4 0: 0x563817c46000/128 1: 0x563817c47000/256 2: 0x563817c48000/512 3: 0x563817c49000/1024 Personalities: 1 Uid: 0 0 0 0 Gid: 0 0 0 0 Groups: 0 CapEff: 0000003fffffffff 2 Uid: 0 0 0 0 Gid: 0 0 0 0 Groups: 0 CapEff: 0000003fffffffff Suggested-by: Jann Horn <jannh@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Linus Torvalds
|
83fa805bcb |
threads-v5.6
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXjFo8wAKCRCRxhvAZXjc omaGAQDVwCHQekqxp2eC8EJH4Pkt+Bn1BLrA25stlTo93YBPHgEAsPVUCRNcrZAl VncYmxCfpt3Yu0S/MTVXu5xrRiIXPQk= =uqTN -----END PGP SIGNATURE----- Merge tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull thread management updates from Christian Brauner: "Sargun Dhillon over the last cycle has worked on the pidfd_getfd() syscall. This syscall allows for the retrieval of file descriptors of a process based on its pidfd. A task needs to have ptrace_may_access() permissions with PTRACE_MODE_ATTACH_REALCREDS (suggested by Oleg and Andy) on the target. One of the main use-cases is in combination with seccomp's user notification feature. As a reminder, seccomp's user notification feature was made available in v5.0. It allows a task to retrieve a file descriptor for its seccomp filter. The file descriptor is usually handed of to a more privileged supervising process. The supervisor can then listen for syscall events caught by the seccomp filter of the supervisee and perform actions in lieu of the supervisee, usually emulating syscalls. pidfd_getfd() is needed to expand its uses. There are currently two major users that wait on pidfd_getfd() and one future user: - Netflix, Sargun said, is working on a service mesh where users should be able to connect to a dns-based VIP. When a user connects to e.g. 1.2.3.4:80 that runs e.g. service "foo" they will be redirected to an envoy process. This service mesh uses seccomp user notifications and pidfd to intercept all connect calls and instead of connecting them to 1.2.3.4:80 connects them to e.g. 127.0.0.1:8080. - LXD uses the seccomp notifier heavily to intercept and emulate mknod() and mount() syscalls for unprivileged containers/processes. With pidfd_getfd() more uses-cases e.g. bridging socket connections will be possible. - The patchset has also seen some interest from the browser corner. Right now, Firefox is using a SECCOMP_RET_TRAP sandbox managed by a broker process. In the future glibc will start blocking all signals during dlopen() rendering this type of sandbox impossible. Hence, in the future Firefox will switch to a seccomp-user-nofication based sandbox which also makes use of file descriptor retrieval. The thread for this can be found at https://sourceware.org/ml/libc-alpha/2019-12/msg00079.html With pidfd_getfd() it is e.g. possible to bridge socket connections for the supervisee (binding to a privileged port) and taking actions on file descriptors on behalf of the supervisee in general. Sargun's first version was using an ioctl on pidfds but various people pushed for it to be a proper syscall which he duely implemented as well over various review cycles. Selftests are of course included. I've also added instructions how to deal with merge conflicts below. There's also a small fix coming from the kernel mentee project to correctly annotate struct sighand_struct with __rcu to fix various sparse warnings. We've received a few more such fixes and even though they are mostly trivial I've decided to postpone them until after -rc1 since they came in rather late and I don't want to risk introducing build warnings. Finally, there's a new prctl() command PR_{G,S}ET_IO_FLUSHER which is needed to avoid allocation recursions triggerable by storage drivers that have userspace parts that run in the IO path (e.g. dm-multipath, iscsi, etc). These allocation recursions deadlock the device. The new prctl() allows such privileged userspace components to avoid allocation recursions by setting the PF_MEMALLOC_NOIO and PF_LESS_THROTTLE flags. The patch carries the necessary acks from the relevant maintainers and is routed here as part of prctl() thread-management." * tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim sched.h: Annotate sighand_struct with __rcu test: Add test for pidfd getfd arch: wire up pidfd_getfd syscall pid: Implement pidfd_getfd syscall vfs, fdtable: Add fget_task helper |
||
Linus Torvalds
|
896f8d23d0 |
for-5.6/io_uring-vfs-2020-01-29
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl4yEegQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpn5ZD/4/WlXs2cUDgg1C65bzZFO4qvevm+VkXmsk GbyrnFstRekvSH01/ZQxlyDVKS8Wux0XIJ6OArCh1047LvL1bEE5dvOW5iIiwa/r grjQuwFAzIPsE2fgcAO17BKIUzq2Z96+hwDzH7dw0i32yBuLvNmY/1SxcCHKfPut uzGyp7t3/2dIHbpWILRndMYe0O9j9ubmOMvKyKTwy723yDEafsUoqu2mlpigzTq4 2i+DbYBIAd8qmLqG/m3e+vOt9xodJ2Q0hlO+v6DcP2SKXU64Hb/N98HadR//aWP9 41DBXqs+dvDBcu3Jxb80PFUTiOQZECJivkns5cNcjuSXmNkOuQhDQR5K372AHmR9 m6e6FSBxwej8HselAZCI6yu9uBKd0i+MM4FnFs/O73QGYx2ayXsEXp/Jad9xiYgW pC5XJTSqJQhPE0AYYEOzHPPcBLBcpvXHkvmGKdjkNb8OLhhgh2S/YG0DNC+8ABXr j1uIe/n3kJEEmOanUyiitGyLmDq+mXd7aCVKJL/J0KiGD8Gkc1avAZ1ZrTQgjujY FqqBFawO8gv3g0L4WMI8JI+HJGMnA488obet6UKm9+l/Z/urEpXzDAKf/W/vnx2B LD0FSA0bCh1tyO6JU+avFwHlwShtV7/rx/OhrmCK7CCYKtZCA2IEctxyr8U+PBIv DtwIMTYTsA== =ZZUI -----END PGP SIGNATURE----- Merge tag 'for-5.6/io_uring-vfs-2020-01-29' of git://git.kernel.dk/linux-block Pull io_uring updates from Jens Axboe: - Support for various new opcodes (fallocate, openat, close, statx, fadvise, madvise, openat2, non-vectored read/write, send/recv, and epoll_ctl) - Faster ring quiesce for fileset updates - Optimizations for overflow condition checking - Support for max-sized clamping - Support for probing what opcodes are supported - Support for io-wq backend sharing between "sibling" rings - Support for registering personalities - Lots of little fixes and improvements * tag 'for-5.6/io_uring-vfs-2020-01-29' of git://git.kernel.dk/linux-block: (64 commits) io_uring: add support for epoll_ctl(2) eventpoll: support non-blocking do_epoll_ctl() calls eventpoll: abstract out epoll_ctl() handler io_uring: fix linked command file table usage io_uring: support using a registered personality for commands io_uring: allow registering credentials io_uring: add io-wq workqueue sharing io-wq: allow grabbing existing io-wq io_uring/io-wq: don't use static creds/mm assignments io-wq: make the io_wq ref counted io_uring: fix refcounting with batched allocations at OOM io_uring: add comment for drain_next io_uring: don't attempt to copy iovec for READ/WRITE io_uring: honor IOSQE_ASYNC for linked reqs io_uring: prep req when do IOSQE_ASYNC io_uring: use labeled array init in io_op_defs io_uring: optimise sqe-to-req flags translation io_uring: remove REQ_F_IO_DRAINED io_uring: file switch work needs to get flushed on exit io_uring: hide uring_fd in ctx ... |
||
Linus Torvalds
|
33c84e89ab |
SCSI misc on 20200129
This series is slightly unusual because it includes Arnd's compat ioctl tree here: 1c46a2cf2dbd Merge tag 'block-ioctl-cleanup-5.6' into 5.6/scsi-queue Excluding Arnd's changes, this is mostly an update of the usual drivers: megaraid_sas, mpt3sas, qla2xxx, ufs, lpfc, hisi_sas. There are a couple of core and base updates around error propagation and atomicity in the attribute container base we use for the SCSI transport classes. The rest is minor changes and updates. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXjHQJyYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishZZ8AQC02N+v iUnTl1YxGPjIWBbnHuUxN2Qbb9D3C6gAT1LkigEArlk163K3A1XEQHF/VNCdAz/f 01XYTd3p1VHuegIBHlk= =Cn52 -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This series is slightly unusual because it includes Arnd's compat ioctl tree here: 1c46a2cf2dbd Merge tag 'block-ioctl-cleanup-5.6' into 5.6/scsi-queue Excluding Arnd's changes, this is mostly an update of the usual drivers: megaraid_sas, mpt3sas, qla2xxx, ufs, lpfc, hisi_sas. There are a couple of core and base updates around error propagation and atomicity in the attribute container base we use for the SCSI transport classes. The rest is minor changes and updates" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (149 commits) scsi: hisi_sas: Rename hisi_sas_cq.pci_irq_mask scsi: hisi_sas: Add prints for v3 hw interrupt converge and automatic affinity scsi: hisi_sas: Modify the file permissions of trigger_dump to write only scsi: hisi_sas: Replace magic number when handle channel interrupt scsi: hisi_sas: replace spin_lock_irqsave/spin_unlock_restore with spin_lock/spin_unlock scsi: hisi_sas: use threaded irq to process CQ interrupts scsi: ufs: Use UFS device indicated maximum LU number scsi: ufs: Add max_lu_supported in struct ufs_dev_info scsi: ufs: Delete is_init_prefetch from struct ufs_hba scsi: ufs: Inline two functions into their callers scsi: ufs: Move ufshcd_get_max_pwr_mode() to ufshcd_device_params_init() scsi: ufs: Split ufshcd_probe_hba() based on its called flow scsi: ufs: Delete struct ufs_dev_desc scsi: ufs: Fix ufshcd_probe_hba() reture value in case ufshcd_scsi_add_wlus() fails scsi: ufs-mediatek: enable low-power mode for hibern8 state scsi: ufs: export some functions for vendor usage scsi: ufs-mediatek: add dbg_register_dump implementation scsi: qla2xxx: Fix a NULL pointer dereference in an error path scsi: qla1280: Make checking for 64bit support consistent scsi: megaraid_sas: Update driver version to 07.713.01.00-rc1 ... |
||
Linus Torvalds
|
08a3ef8f6b |
linux-kselftest-5.6-rc1-kunit
This kunit update for Linux 5.6-rc1 consists of: -- Support for building kunit as a module from Alan Maguire -- AppArmor KUnit tests for policy unpack from Mike Salvatore -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl4xz/wACgkQCwJExA0N Qxyg2A//X0bnhN82oCchkTRW3GyGi5wTR2wGhoNzMZD0XUtCvn+4BlCSP20ttYdT beiLCiewcuEdvXRyEV9Kikvet/67ovbjA/ce6ZrR7TlIHo8esKcy19/nu1OTvtI1 8eji1q7NSEV9iswz1ZoBAw+MTDHZfOI9qYY2UPcwjy7xWN84z2X1w+8UQ3EamOKd 6BfbohsYuuTTHhA2k1aUzvQcHqNz0YdH4yvNQpdunJXLUI04TeGZA6Ug66u6kWEd 1f5SSAu6r1vnU7DADrb1QwEDuIwL4KBuaMg2Rj5GLxTNp3wxmW9M2Dit+iN7+vNH TS31kZW6KgxC5XuGVPENJaWlDX5Hm+5W8uiRZLNXsxDy927u53RzwrSZw/FbdbB1 HuPZZCzE1soWHdPIQz44HCCAg9XddypYlC1o4IYL1JkJknqG12ky4xgM8GRNCZAB oUW3Ax3Lcr0EJALO/kFd/uEbl79PdmDk8uPMU1jtLyx5cs70yC3fsT2GB+DbP802 i/FxTtrOMGjU2OWcYfQcXapvZdgImf9nPsSZe3FJXjHfytNRbVZOZ2rHAMh03Keu EBthDs6ejm6OUSGUXjngE9NaQKXsNSQ1Qor+6FrGnT4IxUMzWenudqHH7/dgF7Fr fHlZGBilKMc/EYKb/6hj4kvEChrSIXj6TFknmI28I/epPiOr2gU= =AFO4 -----END PGP SIGNATURE----- Merge tag 'linux-kselftest-5.6-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest kunit updates from Shuah Khan: "This kunit update consists of: - Support for building kunit as a module from Alan Maguire - AppArmor KUnit tests for policy unpack from Mike Salvatore" * tag 'linux-kselftest-5.6-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: building kunit as a module breaks allmodconfig kunit: update documentation to describe module-based build kunit: allow kunit to be loaded as a module kunit: remove timeout dependence on sysctl_hung_task_timeout_seconds kunit: allow kunit tests to be loaded as a module kunit: hide unexported try-catch interface in try-catch-impl.h kunit: move string-stream.h to lib/kunit apparmor: add AppArmor KUnit tests for policy unpack |
||
Linus Torvalds
|
22b17db4ea |
y2038: core, driver and file system changes
These are updates to device drivers and file systems that for some reason or another were not included in the kernel in the previous y2038 series. I've gone through all users of time_t again to make sure the kernel is in a long-term maintainable state, replacing all remaining references to time_t with safe alternatives. Some related parts of the series were picked up into the nfsd, xfs, alsa and v4l2 trees. A final set of patches in linux-mm removes the now unused time_t/timeval/timespec types and helper functions after all five branches are merged for linux-5.6, ensuring that no new users get merged. As a result, linux-5.6, or my backport of the patches to 5.4 [1], should be the first release that can serve as a base for a 32-bit system designed to run beyond year 2038, with a few remaining caveats: - All user space must be compiled with a 64-bit time_t, which will be supported in the coming musl-1.2 and glibc-2.32 releases, along with installed kernel headers from linux-5.6 or higher. - Applications that use the system call interfaces directly need to be ported to use the time64 syscalls added in linux-5.1 in place of the existing system calls. This impacts most users of futex() and seccomp() as well as programming languages that have their own runtime environment not based on libc. - Applications that use a private copy of kernel uapi header files or their contents may need to update to the linux-5.6 version, in particular for sound/asound.h, xfs/xfs_fs.h, linux/input.h, linux/elfcore.h, linux/sockios.h, linux/timex.h and linux/can/bcm.h. - A few remaining interfaces cannot be changed to pass a 64-bit time_t in a compatible way, so they must be configured to use CLOCK_MONOTONIC times or (with a y2106 problem) unsigned 32-bit timestamps. Most importantly this impacts all users of 'struct input_event'. - All y2038 problems that are present on 64-bit machines also apply to 32-bit machines. In particular this affects file systems with on-disk timestamps using signed 32-bit seconds: ext4 with ext3-style small inodes, ext2, xfs (to be fixed soon) and ufs. Changes since v1 [2]: - Add Acks I received - Rebase to v5.5-rc1, dropping patches that got merged already - Add NFS, XFS and the final three patches from another series - Rewrite etnaviv patches - Add one late revert to avoid an etnaviv regression [1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=y2038-endgame [2] https://lore.kernel.org/lkml/20191108213257.3097633-1-arnd@arndb.de/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJeMYy3AAoJEGCrR//JCVInEGwP/0R+S+ok7vw9OdLVT0lFl07D IcVabgOWf24imN7m7L7Mlt3nDfxIT4tMpiAXq7eMO3spcyViG18O2LXdSQ4/7QBp +BlhoMjOP9w34Jyd7mnkFr4vqQALvfIqkS8rFObDtDub2Rfj9PC36MRMIu8BPXlv RK8bigwJeH/DV38yc5/JeUcD+WuewYLsK9XPWN+4yB4vgGsNU3ZQQ6nnzbR3hMsN DN8WZ68Y7IBs0Kyxkf+s2zmRXtCa2RiFg/2TUsk5olVAJVaenvte69hq5RSbg1vW vLi6K8cBoPWL59nqCzcNE+TUhSUg3LOj/a/KWyl76yovz7AlJaNjssOf8ZjHw6sL MhQqz3hXTxiJDS2Jvbf1yojiYGlzrq/gqcRFGe9jPcZdieMc4/yZCx60G/Exa5Pu YdMcqMyDWPFyUAFQNWEF59HPheOdj6tb1KpJ6bwgCo3P7QqhLrU4z9w3Py4/ZfBO 4sWcWteSsD6MN/ADJ2WQ56nNxzM2AvkeVJKcF6FCkdngXX9T0GExmZz7SqB5Du99 9lNjIiD5E+LBa/Swo/7n49aYa8x06V1pmHYTZVh9Wkl+CZiO21umezQFrWsfaMTp xt3c6pFdMG5xNMGpreTAXOmf2R+T6O8IO2qQq/TYjzqOLH7QC830P7avkmml+cK1 LjOBE2TfSeO8Ru1dXV4t =wx0A -----END PGP SIGNATURE----- Merge tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Pull y2038 updates from Arnd Bergmann: "Core, driver and file system changes These are updates to device drivers and file systems that for some reason or another were not included in the kernel in the previous y2038 series. I've gone through all users of time_t again to make sure the kernel is in a long-term maintainable state, replacing all remaining references to time_t with safe alternatives. Some related parts of the series were picked up into the nfsd, xfs, alsa and v4l2 trees. A final set of patches in linux-mm removes the now unused time_t/timeval/timespec types and helper functions after all five branches are merged for linux-5.6, ensuring that no new users get merged. As a result, linux-5.6, or my backport of the patches to 5.4 [1], should be the first release that can serve as a base for a 32-bit system designed to run beyond year 2038, with a few remaining caveats: - All user space must be compiled with a 64-bit time_t, which will be supported in the coming musl-1.2 and glibc-2.32 releases, along with installed kernel headers from linux-5.6 or higher. - Applications that use the system call interfaces directly need to be ported to use the time64 syscalls added in linux-5.1 in place of the existing system calls. This impacts most users of futex() and seccomp() as well as programming languages that have their own runtime environment not based on libc. - Applications that use a private copy of kernel uapi header files or their contents may need to update to the linux-5.6 version, in particular for sound/asound.h, xfs/xfs_fs.h, linux/input.h, linux/elfcore.h, linux/sockios.h, linux/timex.h and linux/can/bcm.h. - A few remaining interfaces cannot be changed to pass a 64-bit time_t in a compatible way, so they must be configured to use CLOCK_MONOTONIC times or (with a y2106 problem) unsigned 32-bit timestamps. Most importantly this impacts all users of 'struct input_event'. - All y2038 problems that are present on 64-bit machines also apply to 32-bit machines. In particular this affects file systems with on-disk timestamps using signed 32-bit seconds: ext4 with ext3-style small inodes, ext2, xfs (to be fixed soon) and ufs" [1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=y2038-endgame * tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (21 commits) Revert "drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC" y2038: sh: remove timeval/timespec usage from headers y2038: sparc: remove use of struct timex y2038: rename itimerval to __kernel_old_itimerval y2038: remove obsolete jiffies conversion functions nfs: fscache: use timespec64 in inode auxdata nfs: fix timstamp debug prints nfs: use time64_t internally sunrpc: convert to time64_t for expiry drm/etnaviv: avoid deprecated timespec drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC drm/msm: avoid using 'timespec' hfs/hfsplus: use 64-bit inode timestamps hostfs: pass 64-bit timestamps to/from user space packet: clarify timestamp overflow tsacct: add 64-bit btime field acct: stop using get_seconds() um: ubd: use 64-bit time_t where possible xtensa: ISS: avoid struct timeval dlm: use SO_SNDTIMEO_NEW instead of SO_SNDTIMEO_OLD ... |
||
Jens Axboe
|
3e4827b05d |
io_uring: add support for epoll_ctl(2)
This adds IORING_OP_EPOLL_CTL, which can perform the same work as the epoll_ctl(2) system call. Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Jens Axboe
|
39220e8d4a |
eventpoll: support non-blocking do_epoll_ctl() calls
Also make it available outside of epoll, along with the helper that decides if we need to copy the passed in epoll_event. Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Jens Axboe
|
58e41a44c4 |
eventpoll: abstract out epoll_ctl() handler
No functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Jens Axboe
|
f86cd20c94 |
io_uring: fix linked command file table usage
We're not consistent in how the file table is grabbed and assigned if we have a command linked that requires the use of it. Add ->file_table to the io_op_defs[] array, and use that to determine when to grab the table instead of having the handlers set it if they need to defer. This also means we can kill the IO_WQ_WORK_NEEDS_FILES flag. We always initialize work->files, so io-wq can just check for that. Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Linus Torvalds
|
3893c2025f |
Changes since last update:
- fix an out-of-bound read access introduced in v5.3, which could rarely cause data corruption; - various cleanup patches. -----BEGIN PGP SIGNATURE----- iIwEABYIADQWIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCXjCB9xYcZ2FveGlhbmcy NUBodWF3ZWkuY29tAAoJEDk3MdwfteYEW+0BALDobAWvVw7Bxkz0tkFkmoZKF4eG Otdzi/EhLq6baeyGAQCtSbTblu9/hAL53p6++RsTXazzzWDUyZZKtZj8MkR5Ag== =c1Fs -----END PGP SIGNATURE----- Merge tag 'erofs-for-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "A regression fix, several cleanups and (maybe) plus an upcoming new mount api convert patch as a part of vfs update are considered available for this cycle. All commits have been in linux-next and tested with no smoke out. Summary: - fix an out-of-bound read access introduced in v5.3, which could rarely cause data corruption - various cleanup patches" * tag 'erofs-for-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: clean up z_erofs_submit_queue() erofs: fold in postsubmit_is_all_bypassed() erofs: fix out-of-bound read for shifted uncompressed block erofs: remove void tagging/untagging of workgroup pointers erofs: remove unused tag argument while registering a workgroup erofs: remove unused tag argument while finding a workgroup erofs: correct indentation of an assigned structure inside a function |
||
Linus Torvalds
|
5307040655 |
Merge branch 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull adfs updates from Al Viro: "adfs stuff for this cycle" * 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits) fs/adfs: bigdir: Fix an error code in adfs_fplus_read() Documentation: update adfs filesystem documentation fs/adfs: mostly divorse inode number from indirect disc address fs/adfs: super: add support for E and E+ floppy image formats fs/adfs: super: extract filesystem block probe fs/adfs: dir: remove debug in adfs_dir_update() fs/adfs: super: fix inode dropping fs/adfs: bigdir: implement directory update support fs/adfs: bigdir: calculate and validate directory checkbyte fs/adfs: bigdir: directory validation strengthening fs/adfs: bigdir: extract directory validation fs/adfs: bigdir: factor out directory entry offset calculation fs/adfs: newdir: split out directory commit from update fs/adfs: newdir: clean up adfs_f_update() fs/adfs: newdir: merge adfs_dir_read() into adfs_f_read() fs/adfs: newdir: improve directory validation fs/adfs: newdir: factor out directory format validation fs/adfs: dir: use pointers to access directory head/tails fs/adfs: dir: add more efficient iterate() per-format method fs/adfs: dir: switch to iterate_shared method ... |
||
Linus Torvalds
|
6aee4badd8 |
Merge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull openat2 support from Al Viro: "This is the openat2() series from Aleksa Sarai. I'm afraid that the rest of namei stuff will have to wait - it got zero review the last time I'd posted #work.namei, and there had been a leak in the posted series I'd caught only last weekend. I was going to repost it on Monday, but the window opened and the odds of getting any review during that... Oh, well. Anyway, openat2 part should be ready; that _did_ get sane amount of review and public testing, so here it comes" From Aleksa's description of the series: "For a very long time, extending openat(2) with new features has been incredibly frustrating. This stems from the fact that openat(2) is possibly the most famous counter-example to the mantra "don't silently accept garbage from userspace" -- it doesn't check whether unknown flags are present[1]. This means that (generally) the addition of new flags to openat(2) has been fraught with backwards-compatibility issues (O_TMPFILE has to be defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old kernels gave errors, since it's insecure to silently ignore the flag[2]). All new security-related flags therefore have a tough road to being added to openat(2). Furthermore, the need for some sort of control over VFS's path resolution (to avoid malicious paths resulting in inadvertent breakouts) has been a very long-standing desire of many userspace applications. This patchset is a revival of Al Viro's old AT_NO_JUMPS[3] patchset (which was a variant of David Drysdale's O_BENEATH patchset[4] which was a spin-off of the Capsicum project[5]) with a few additions and changes made based on the previous discussion within [6] as well as others I felt were useful. In line with the conclusions of the original discussion of AT_NO_JUMPS, the flag has been split up into separate flags. However, instead of being an openat(2) flag it is provided through a new syscall openat2(2) which provides several other improvements to the openat(2) interface (see the patch description for more details). The following new LOOKUP_* flags are added: LOOKUP_NO_XDEV: Blocks all mountpoint crossings (upwards, downwards, or through absolute links). Absolute pathnames alone in openat(2) do not trigger this. Magic-link traversal which implies a vfsmount jump is also blocked (though magic-link jumps on the same vfsmount are permitted). LOOKUP_NO_MAGICLINKS: Blocks resolution through /proc/$pid/fd-style links. This is done by blocking the usage of nd_jump_link() during resolution in a filesystem. The term "magic-links" is used to match with the only reference to these links in Documentation/, but I'm happy to change the name. It should be noted that this is different to the scope of ~LOOKUP_FOLLOW in that it applies to all path components. However, you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it will *not* fail (assuming that no parent component was a magic-link), and you will have an fd for the magic-link. In order to correctly detect magic-links, the introduction of a new LOOKUP_MAGICLINK_JUMPED state flag was required. LOOKUP_BENEATH: Disallows escapes to outside the starting dirfd's tree, using techniques such as ".." or absolute links. Absolute paths in openat(2) are also disallowed. Conceptually this flag is to ensure you "stay below" a certain point in the filesystem tree -- but this requires some additional to protect against various races that would allow escape using "..". Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it can trivially beam you around the filesystem (breaking the protection). In future, there might be similar safety checks done as in LOOKUP_IN_ROOT, but that requires more discussion. In addition, two new flags are added that expand on the above ideas: LOOKUP_NO_SYMLINKS: Does what it says on the tin. No symlink resolution is allowed at all, including magic-links. Just as with LOOKUP_NO_MAGICLINKS this can still be used with NOFOLLOW to open an fd for the symlink as long as no parent path had a symlink component. LOOKUP_IN_ROOT: This is an extension of LOOKUP_BENEATH that, rather than blocking attempts to move past the root, forces all such movements to be scoped to the starting point. This provides chroot(2)-like protection but without the cost of a chroot(2) for each filesystem operation, as well as being safe against race attacks that chroot(2) is not. If a race is detected (as with LOOKUP_BENEATH) then an error is generated, and similar to LOOKUP_BENEATH it is not permitted to cross magic-links with LOOKUP_IN_ROOT. The primary need for this is from container runtimes, which currently need to do symlink scoping in userspace[7] when opening paths in a potentially malicious container. There is a long list of CVEs that could have bene mitigated by having RESOLVE_THIS_ROOT (such as CVE-2017-1002101, CVE-2017-1002102, CVE-2018-15664, and CVE-2019-5736, just to name a few). In order to make all of the above more usable, I'm working on libpathrs[8] which is a C-friendly library for safe path resolution. It features a userspace-emulated backend if the kernel doesn't support openat2(2). Hopefully we can get userspace to switch to using it, and thus get openat2(2) support for free once it's ready. Future work would include implementing things like RESOLVE_NO_AUTOMOUNT and possibly a RESOLVE_NO_REMOTE (to allow programs to be sure they don't hit DoSes though stale NFS handles)" * 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Documentation: path-lookup: include new LOOKUP flags selftests: add openat2(2) selftests open: introduce openat2(2) syscall namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution namei: LOOKUP_IN_ROOT: chroot-like scoped resolution namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution namei: LOOKUP_NO_XDEV: block mountpoint crossing namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution namei: LOOKUP_NO_SYMLINKS: block symlink resolution namei: allow set_root() to produce errors namei: allow nd_jump_link() to produce errors nsfs: clean-up ns_get_path() signature to return int namei: only return -ECHILD from follow_dotdot_rcu() |
||
Linus Torvalds
|
975f9ce9a0 |
Driver core changes for 5.6-rc1
Here is a small set of changes for 5.6-rc1 for the driver core and some firmware subsystem changes. Included in here are: - device.h splitup like you asked for months ago - devtmpfs minor cleanups - firmware core minor changes - debugfs fix for lockdown mode - kernfs cleanup fix - cpu topology minor fix All of these have been in linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXjFL1g8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ymzfwCgpHAaFeiF1u2XS1pvfySssOUeQ8sAoMXWWgr4 BHms+ey3TaUnzm38dxk8 =Xv9r -----END PGP SIGNATURE----- Merge tag 'driver-core-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is a small set of changes for 5.6-rc1 for the driver core and some firmware subsystem changes. Included in here are: - device.h splitup like you asked for months ago - devtmpfs minor cleanups - firmware core minor changes - debugfs fix for lockdown mode - kernfs cleanup fix - cpu topology minor fix All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (22 commits) firmware: Rename FW_OPT_NOFALLBACK to FW_OPT_NOFALLBACK_SYSFS devtmpfs: factor out common tail of devtmpfs_{create,delete}_node devtmpfs: initify a bit devtmpfs: simplify initialization of mount_dev devtmpfs: factor out setup part of devtmpfsd() devtmpfs: fix theoretical stale pointer deref in devtmpfsd() driver core: platform: fix u32 greater or equal to zero comparison cpu-topology: Don't error on more than CONFIG_NR_CPUS CPUs in device tree debugfs: Return -EPERM when locked down driver core: Print device when resources present in really_probe() driver core: Fix test_async_driver_probe if NUMA is disabled driver core: platform: Prevent resouce overflow from causing infinite loops fs/kernfs/dir.c: Clean code by removing always true condition component: do not dereference opaque pointer in debugfs drivers/component: remove modular code debugfs: Fix warnings when building documentation device.h: move 'struct driver' stuff out to device/driver.h device.h: move 'struct class' stuff out to device/class.h device.h: move 'struct bus' stuff out to device/bus.h device.h: move dev_printk()-like functions to dev_printk.h ... |
||
Jens Axboe
|
75c6a03904 |
io_uring: support using a registered personality for commands
For personalities previously registered via IORING_REGISTER_PERSONALITY, allow any command to select them. This is done through setting sqe->personality to the id returned from registration, and then flagging sqe->flags with IOSQE_PERSONALITY. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Jens Axboe
|
071698e13a |
io_uring: allow registering credentials
If an application wants to use a ring with different kinds of credentials, it can register them upfront. We don't lookup credentials, the credentials of the task calling IORING_REGISTER_PERSONALITY is used. An 'id' is returned for the application to use in subsequent personality support. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Pavel Begunkov
|
24369c2e3b |
io_uring: add io-wq workqueue sharing
If IORING_SETUP_ATTACH_WQ is set, it expects wq_fd in io_uring_params to be a valid io_uring fd io-wq of which will be shared with the newly created io_uring instance. If the flag is set but it can't share io-wq, it fails. This allows creation of "sibling" io_urings, where we prefer to keep the SQ/CQ private, but want to share the async backend to minimize the amount of overhead associated with having multiple rings that belong to the same backend. Reported-by: Jens Axboe <axboe@kernel.dk> Reported-by: Daurnimator <quae@daurnimator.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Pavel Begunkov
|
eba6f5a330 |
io-wq: allow grabbing existing io-wq
Export a helper to attach to an existing io-wq, rather than setting up a new one. This is doable now that we have reference counted io_wq's. Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Jens Axboe
|
cccf0ee834 |
io_uring/io-wq: don't use static creds/mm assignments
We currently setup the io_wq with a static set of mm and creds. Even for a single-use io-wq per io_uring, this is suboptimal as we have may have multiple enters of the ring. For sharing the io-wq backend, it doesn't work at all. Switch to passing in the creds and mm when the work item is setup. This means that async work is no longer deferred to the io_uring mm and creds, it is done with the current mm and creds. Flag this behavior with IORING_FEAT_CUR_PERSONALITY, so applications know they can rely on the current personality (mm and creds) being the same for direct issue and async issue. Reviewed-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Linus Torvalds
|
a78208e243 |
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu: "API: - Removed CRYPTO_TFM_RES flags - Extended spawn grabbing to all algorithm types - Moved hash descsize verification into API code Algorithms: - Fixed recursive pcrypt dead-lock - Added new 32 and 64-bit generic versions of poly1305 - Added cryptogams implementation of x86/poly1305 Drivers: - Added support for i.MX8M Mini in caam - Added support for i.MX8M Nano in caam - Added support for i.MX8M Plus in caam - Added support for A33 variant of SS in sun4i-ss - Added TEE support for Raven Ridge in ccp - Added in-kernel API to submit TEE commands in ccp - Added AMD-TEE driver - Added support for BCM2711 in iproc-rng200 - Added support for AES256-GCM based ciphers for chtls - Added aead support on SEC2 in hisilicon" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (244 commits) crypto: arm/chacha - fix build failured when kernel mode NEON is disabled crypto: caam - add support for i.MX8M Plus crypto: x86/poly1305 - emit does base conversion itself crypto: hisilicon - fix spelling mistake "disgest" -> "digest" crypto: chacha20poly1305 - add back missing test vectors and test chunking crypto: x86/poly1305 - fix .gitignore typo tee: fix memory allocation failure checks on drv_data and amdtee crypto: ccree - erase unneeded inline funcs crypto: ccree - make cc_pm_put_suspend() void crypto: ccree - split overloaded usage of irq field crypto: ccree - fix PM race condition crypto: ccree - fix FDE descriptor sequence crypto: ccree - cc_do_send_request() is void func crypto: ccree - fix pm wrongful error reporting crypto: ccree - turn errors to debug msgs crypto: ccree - fix AEAD decrypt auth fail crypto: ccree - fix typo in comment crypto: ccree - fix typos in error msgs crypto: atmel-{aes,sha,tdes} - Retire crypto_platform_data crypto: x86/sha - Eliminate casts on asm implementations ... |
||
Linus Torvalds
|
68353984d6 |
Various SMB3/CIFS fixes including 4 for stable. Improvement to fallocate (enables 3 additional xfstests). Fix for file creation when mounting with modefromsid, add ability to backup/restore dos attributes and creation time, DFS failover and reconnect fixes and performance optimization for readir
-----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl4wd8IACgkQiiy9cAdy T1FFmQwAuh4efFQuBjV3/0CRHC7ezqW88/zVuRqK5m1xC+VDW2tcV24mcsyVOOMW wgOdr0NXWRxrPHoJfIGLPFwq7o59PFWR5Cdwjpu/th72W9zftAPZdQCjCO3ssO3u tGmtavN/rk61A3ALjqtjC6lRx+A1wxYoxgBNlAXwyCktWShQBWHFsRnFq6w7e76H lvaXkk0ubh7LDnCvwqpMupkD3n//Y9UAHQ1D1oIC29i6F5M8SmGs36e1ptuSCssq q4Xb48uc+QXeD8NvA0Y9S/Su3NZEy/iqJpFok75tHzbr+ypAthhvxfMW3SnOh2NI t4zWUSIcNU7b2v+QjrWqJlN8vpYB8R4Awti0FkI1LvjjZwZ1lTcW2o40pAwgQl3o BoFqlksbOf+457tmcuQ7/psqRxAvgA3z7rSFJbYvKFYR2Go0E6oxpi1mXjph5Bw3 K5s3ieEbEC2mbAa/5Qf4HHit4jL4tGxijo5MhUB9uMymCGyxYIKUJcVnZ/LnTgpC WZTjeNqT =7t46 -----END PGP SIGNATURE----- Merge tag '5.6-smb3-fixes-and-dfs-and-readdir-improvements' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs updates from Steve French: "Various SMB3/CIFS fixes including four for stable. - Improvement to fallocate (enables 3 additional xfstests) - Fix for file creation when mounting with modefromsid - Add ability to backup/restore dos attributes and creation time - DFS failover and reconnect fixes - performance optimization for readir Note that due to the upcoming SMB3 Test Event (at SNIA SDC next week) there will likely be more changesets near the end of the merge window (since we will be testing heavily next week, I held off on some patches and I expect some additional multichannel patches as well as patches to enable some additional xfstests)" * tag '5.6-smb3-fixes-and-dfs-and-readdir-improvements' of git://git.samba.org/sfrench/cifs-2.6: (24 commits) CIFS: Fix task struct use-after-free on reconnect cifs: use PTR_ERR_OR_ZERO() to simplify code cifs: add support for fallocate mode 0 for non-sparse files cifs: fix NULL dereference in match_prepath smb3: fix default permissions on new files when mounting with modefromsid CIFS: Add support for setting owner info, dos attributes, and create time cifs: remove set but not used variable 'server' cifs: Fix memory allocation in __smb2_handle_cancelled_cmd() cifs: Fix mount options set in automount cifs: fix unitialized variable poential problem with network I/O cache lock patch cifs: Fix return value in __update_cache_entry cifs: Avoid doing network I/O while holding cache lock cifs: Fix potential deadlock when updating vol in cifs_reconnect() cifs: Merge is_path_valid() into get_normalized_path() cifs: Introduce helpers for finding TCP connection cifs: Get rid of kstrdup_const()'d paths cifs: Clean up DFS referral cache cifs: Don't use iov_iter::type directly cifs: set correct max-buffer-size for smb2_ioctl_init() cifs: use compounding for open and first query-dir for readdir() ... |
||
Linus Torvalds
|
c8994374d9 |
fsverity updates for 5.6
- Optimize fs-verity sequential read performance by implementing readahead of Merkle tree pages. This allows the Merkle tree to be read in larger chunks. - Optimize FS_IOC_ENABLE_VERITY performance in the uncached case by implementing readahead of data pages. - Allocate the hash requests from a mempool in order to eliminate the possibility of allocation failures during I/O. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCXi+OuhQcZWJpZ2dlcnNA Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK/ZIAP452KKPs6AGXrClZ2l+5nFbkDLN9Or8 w277B0BeRnu5ogEApmKnYsmRsduLZRJbni7VCpkJLAYI2kmFCwGkFfe3tAQ= =svdR -----END PGP SIGNATURE----- Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt Pull fsverity updates from Eric Biggers: - Optimize fs-verity sequential read performance by implementing readahead of Merkle tree pages. This allows the Merkle tree to be read in larger chunks. - Optimize FS_IOC_ENABLE_VERITY performance in the uncached case by implementing readahead of data pages. - Allocate the hash requests from a mempool in order to eliminate the possibility of allocation failures during I/O. * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fs-verity: use u64_to_user_ptr() fs-verity: use mempool for hash requests fs-verity: implement readahead of Merkle tree pages fs-verity: implement readahead for FS_IOC_ENABLE_VERITY |
||
Linus Torvalds
|
f0d8744143 |
fscrypt updates for 5.6
- Extend the FS_IOC_ADD_ENCRYPTION_KEY ioctl to allow the raw key to be provided via a keyring key. - Prepare for the new dirhash method (SipHash of plaintext name) that will be used by directories that are both encrypted and casefolded. - Switch to a new format for "no-key names" that prepares for the new dirhash method, and also fixes a longstanding bug where multiple filenames could map to the same no-key name. - Allow the crypto algorithms used by fscrypt to be built as loadable modules when the fscrypt-capable filesystems are. - Optimize fscrypt_zeroout_range(). - Various cleanups. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCXi+OfxQcZWJpZ2dlcnNA Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK0tJAQDkxUl11NRqVgS06TLWAniKVMWh3kaL CaonjEmfs1m6kgEA+TP8coTAxUvJNubaHz3J3x9dyx9RPUVFyUzSB0J/TQg= =XkXJ -----END PGP SIGNATURE----- Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt Pull fscrypt updates from Eric Biggers: - Extend the FS_IOC_ADD_ENCRYPTION_KEY ioctl to allow the raw key to be provided via a keyring key. - Prepare for the new dirhash method (SipHash of plaintext name) that will be used by directories that are both encrypted and casefolded. - Switch to a new format for "no-key names" that prepares for the new dirhash method, and also fixes a longstanding bug where multiple filenames could map to the same no-key name. - Allow the crypto algorithms used by fscrypt to be built as loadable modules when the fscrypt-capable filesystems are. - Optimize fscrypt_zeroout_range(). - Various cleanups. * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: (26 commits) fscrypt: improve format of no-key names ubifs: allow both hash and disk name to be provided in no-key names ubifs: don't trigger assertion on invalid no-key filename fscrypt: clarify what is meant by a per-file key fscrypt: derive dirhash key for casefolded directories fscrypt: don't allow v1 policies with casefolding fscrypt: add "fscrypt_" prefix to fname_encrypt() fscrypt: don't print name of busy file when removing key ubifs: use IS_ENCRYPTED() instead of ubifs_crypt_is_encrypted() fscrypt: document gfp_flags for bounce page allocation fscrypt: optimize fscrypt_zeroout_range() fscrypt: remove redundant bi_status check fscrypt: Allow modular crypto algorithms fscrypt: include <linux/ioctl.h> in UAPI header fscrypt: don't check for ENOKEY from fscrypt_get_encryption_info() fscrypt: remove fscrypt_is_direct_key_policy() fscrypt: move fscrypt_valid_enc_modes() to policy.c fscrypt: check for appropriate use of DIRECT_KEY flag earlier fscrypt: split up fscrypt_supported_policy() by policy version fscrypt: introduce fscrypt_needs_contents_encryption() ... |
||
Linus Torvalds
|
b5f7ab6b1c |
fs-dedupe-last-block-tag
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl4vEM0ACgkQxWXV+ddt WDvWFA//eHQZjWQ4FNby9AZwkDS4NIq2v6XGg7Xx9Ioxf1NenuBdyt1uNda2I3Mw 4/0tEnrd3XL3POldx4TTZXlg0itfXeY2un8pGREPwbqGL67296wmENUT8kwXU8aX 4zMuqO8veO7jzqeibX7T0VntnAC4Q+yWUd8Tb07KXaNH6Jeowo2rbU7n2CdI8T91 IZdc/AxnmomVQ3Y0lmQe5fL5z9KRkwud7toXek8yVXyheX1JgnqkzVjCgLyk3ai7 0VP0jfOPtC1qoMPR+buD2VuxaDTrLGkmsaG9l4XZghmmSnAvTMOxIosFA/SLqHwp LPDZ5RABbxMj1MHTAr2Bq+9uokqSIIdbSWFiMazeJsL/JWOiOQk0BekNJudeL7Ow PtGIZkKC+c3qKIjLymnd+u56klCe/O4vhx7wOHjoGxC0j+qRN473LK8dmGmgM9Yv XH5YfWqDpjt5h7SSei94S7JrqaJ29+gY8bMtMXJhx46jHh6AdXAkszU83kLr8Biz mvV49ECib12RuI4EVB4I3XedCftzMRH5ocN/Y3B2qGn/bU3g9nuExAc24FRBt6QK e/ucc4jQyRpbvpDwsmdE+kSxUvENuIRnlyke9XdOsN9Mth0X85gvYOS6aFS+4qjq QiXborlp+fZY00INYjG6Ck9UyfpdSEUJg452xVkGBhd/emAAIDo= =FGXQ -----END PGP SIGNATURE----- Merge tag 'fs-dedupe-last-block-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull fs deduplication fix from David Sterba: "This is a fix for deduplication bug: the last block of two files is allowed to deduplicated. This got broken in 5.1 by lifting some generic checks to VFS layer. The affected filesystems are btrfs and xfs. The patches are marked for stable as the bug decreases deduplication effectivity" * tag 'fs-dedupe-last-block-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: make deduplication with range including the last block work fs: allow deduplication of eof block into the end of the destination file |
||
Linus Torvalds
|
81a046b18b |
for-5.6-tag
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl4vDYkACgkQxWXV+ddt WDsNJQ//WJEcYoRpN5Y7oOIk/vo5ulF68P3kUh3hl206A13xpaHorvTvZKAD5s2o C6xACJk839sGEhMdDRWvdeBDCHTedMk7EXjiZ6kJD+7EPpWmDllI5O6DTolT7SR2 b9zId4KCO+m8LiLZccRsxCJbdkJ7nJnz2c5+063TjsS3uq1BFudctRUjW/XnFCCZ JIE5iOkdXrA+bFqc+l2zKTwgByQyJg+hVKRTZEJBT0QZsyNQvHKzXAmXxGopW8bO SeuzFkiFTA0raK8xBz6mUwaZbk40Qlzm9v9AitFZx0x2nvQnMu447N3xyaiuyDWd Li1aMN0uFZNgSz+AemuLfG0Wj70x1HrQisEj958XKzn4cPpUuMcc3lr1PZ2NIX+C p6pSgaLOEq8Rc0U78/euZX6oyiLJPAmQO1TdkVMHrcMi36esBI6uG11rds+U+xeK XoP20qXLFVYLLrl3wH9F4yIzydfMYu66Us1AeRPRB14NSSa7tbCOG//aCafOoLM6 518sJCazSWlv1kDewK8dtLiXc8eM6XJN+KI4NygFZrUj2Rq376q5oovUUKKkn3iN pdHtF/7gAxIx6bZ+jY/gyt/Xe5AdPi7sKggahvrSOL3X+LLINwC4r+vAnnpd6yh4 NfJj5fobvc/mO9PEVMwgJ8PmHw5uNqeMlORGjk7stQs7Oez3tCw= =4OkE -----END PGP SIGNATURE----- Merge tag 'for-5.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "Features, highlights: - async discard - "mount -o discard=async" to enable it - freed extents are not discarded immediatelly, but grouped together and trimmed later, with IO rate limiting - the "sync" mode submits short extents that could have been ignored completely by the device, for SATA prior to 3.1 the requests are unqueued and have a big impact on performance - the actual discard IO requests have been moved out of transaction commit to a worker thread, improving commit latency - IO rate and request size can be tuned by sysfs files, for now enabled only with CONFIG_BTRFS_DEBUG as we might need to add/delete the files and don't have a stable-ish ABI for general use, defaults are conservative - export device state info in sysfs, eg. missing, writeable - no discard of extents known to be untouched on disk (eg. after reservation) - device stats reset is logged with process name and PID that called the ioctl Fixes: - fix missing hole after hole punching and fsync when using NO_HOLES - writeback: range cyclic mode could miss some dirty pages and lead to OOM - two more corner cases for metadata_uuid change after power loss during the change - fix infinite loop during fsync after mix of rename operations Core changes: - qgroup assign returns ENOTCONN when quotas not enabled, used to return EINVAL that was confusing - device closing does not need to allocate memory anymore - snapshot aware code got removed, disabled for years due to performance problems, reimplmentation will allow to select wheter defrag breaks or does not break COW on shared extents - tree-checker: - check leaf chunk item size, cross check against number of stripes - verify location keys for DIR_ITEM, DIR_INDEX and XATTR items - new self test for physical -> logical mapping code, used for super block range exclusion - assertion helpers/macros updated to avoid objtool "unreachable code" reports on older compilers or config option combinations" * tag 'for-5.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (84 commits) btrfs: free block groups after free'ing fs trees btrfs: Fix split-brain handling when changing FSID to metadata uuid btrfs: Handle another split brain scenario with metadata uuid feature btrfs: Factor out metadata_uuid code from find_fsid. btrfs: Call find_fsid from find_fsid_inprogress Btrfs: fix infinite loop during fsync after rename operations btrfs: set trans->drity in btrfs_commit_transaction btrfs: drop log root for dropped roots btrfs: sysfs, add devid/dev_state kobject and device attributes btrfs: Refactor btrfs_rmap_block to improve readability btrfs: Add self-tests for btrfs_rmap_block btrfs: selftests: Add support for dummy devices btrfs: Move and unexport btrfs_rmap_block btrfs: separate definition of assertion failure handlers btrfs: device stats, log when stats are zeroed btrfs: fix improper setting of scanned for range cyclic write cache pages btrfs: safely advance counter when looking up bio csums btrfs: remove unused member btrfs_device::work btrfs: remove unnecessary wrapper get_alloc_profile btrfs: add correction to handle -1 edge case in async discard ... |
||
Linus Torvalds
|
4244057c3d |
Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control updates from Ingo Molnar: "The main change in this tree is the extension of the resctrl procfs ABI with a new file that helps tooling to navigate from tasks back to resctrl groups: /proc/{pid}/cpu_resctrl_groups. Also fix static key usage for certain feature combinations and simplify the task exit resctrl case" * 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Add task resctrl information display x86/resctrl: Check monitoring static key in the MBM overflow handler x86/resctrl: Do not reconfigure exiting tasks |
||
Linus Torvalds
|
c677124e63 |
Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar: "These were the main changes in this cycle: - More -rt motivated separation of CONFIG_PREEMPT and CONFIG_PREEMPTION. - Add more low level scheduling topology sanity checks and warnings to filter out nonsensical topologies that break scheduling. - Extend uclamp constraints to influence wakeup CPU placement - Make the RT scheduler more aware of asymmetric topologies and CPU capacities, via uclamp metrics, if CONFIG_UCLAMP_TASK=y - Make idle CPU selection more consistent - Various fixes, smaller cleanups, updates and enhancements - please see the git log for details" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits) sched/fair: Define sched_idle_cpu() only for SMP configurations sched/topology: Assert non-NUMA topology masks don't (partially) overlap idle: fix spelling mistake "iterrupts" -> "interrupts" sched/fair: Remove redundant call to cpufreq_update_util() sched/psi: create /proc/pressure and /proc/pressure/{io|memory|cpu} only when psi enabled sched/fair: Fix sgc->{min,max}_capacity calculation for SD_OVERLAP sched/fair: calculate delta runnable load only when it's needed sched/cputime: move rq parameter in irqtime_account_process_tick stop_machine: Make stop_cpus() static sched/debug: Reset watchdog on all CPUs while processing sysrq-t sched/core: Fix size of rq::uclamp initialization sched/uclamp: Fix a bug in propagating uclamp value in new cgroups sched/fair: Load balance aggressively for SCHED_IDLE CPUs sched/fair : Improve update_sd_pick_busiest for spare capacity case watchdog: Remove soft_lockup_hrtimer_cnt and related code sched/rt: Make RT capacity-aware sched/fair: Make EAS wakeup placement consider uclamp restrictions sched/fair: Make task_fits_capacity() consider uclamp restrictions sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with() sched/uclamp: Make uclamp util helpers use and return UL values ... |
||
Linus Torvalds
|
c0e809e244 |
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar: "Kernel side changes: - Ftrace is one of the last W^X violators (after this only KLP is left). These patches move it over to the generic text_poke() interface and thereby get rid of this oddity. This requires a surprising amount of surgery, by Peter Zijlstra. - x86/AMD PMUs: add support for 'Large Increment per Cycle Events' to count certain types of events that have a special, quirky hw ABI (by Kim Phillips) - kprobes fixes by Masami Hiramatsu Lots of tooling updates as well, the following subcommands were updated: annotate/report/top, c2c, clang, record, report/top TUI, sched timehist, tests; plus updates were done to the gtk ui, libperf, headers and the parser" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) perf/x86/amd: Add support for Large Increment per Cycle Events perf/x86/amd: Constrain Large Increment per Cycle events perf/x86/intel/rapl: Add Comet Lake support tracing: Initialize ret in syscall_enter_define_fields() perf header: Use last modification time for timestamp perf c2c: Fix return type for histogram sorting comparision functions perf beauty sockaddr: Fix augmented syscall format warning perf/ui/gtk: Fix gtk2 build perf ui gtk: Add missing zalloc object perf tools: Use %define api.pure full instead of %pure-parser libperf: Setup initial evlist::all_cpus value perf report: Fix no libunwind compiled warning break s390 issue perf tools: Support --prefix/--prefix-strip perf report: Clarify in help that --children is default tools build: Fix test-clang.cpp with Clang 8+ perf clang: Fix build with Clang 9 kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic tools lib: Fix builds when glibc contains strlcpy() perf report/top: Make 'e' visible in the help and make it toggle showing callchains perf report/top: Do not offer annotation for symbols without samples ... |
||
Bob Peterson
|
a31b4ec539 |
Revert "gfs2: eliminate tr_num_revoke_rm"
This reverts commit e955537e3262de8e56f070b13817f525f472fa00. Before patch e955537e32, tr_num_revoke tracked the number of revokes added to the transaction, and tr_num_revoke_rm tracked how many revokes were removed. But since revokes are queued off the sdp (superblock) pointer, some transactions could remove more revokes than they added. (e.g. revokes added by a different process). Commit e955537e32 eliminated transaction variable tr_num_revoke_rm, but in order to do so, it changed the accounting to always use tr_num_revoke for its math. Since you can remove more revokes than you add, tr_num_revoke could now become a negative value. This negative value broke the assert in function gfs2_trans_end: if (gfs2_assert_withdraw(sdp, (nbuf <=3D tr->tr_blocks) && (tr->tr_num_revoke <=3D tr->tr_revokes))) One way to fix this is to simply remove the tr_num_revoke clause from the assert and allow the value to become negative. Andreas didn't like that idea, so instead, we decided to revert e955537e32. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> |
||
Linus Torvalds
|
ab67f60025 |
A small set of SMP core code changes:
- Rework the smp function call core code to avoid the allocation of an additional cpumask. - Remove the not longer required GFP argument from on_each_cpu_cond() and on_each_cpu_cond_mask() and fixup the callers. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl4vcrATHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYocr1D/4ptWrZKsgBxGKBP34lvJAjd0KRqVoz J9dLAN+AAs6YZSnOmRBX1b9d9IL2PrccOEF+J/Ja3ZkB+PAoAQ9W3uCHkZ77WUph xx5eJahZCo+3nZ6amGgS2cPdG8WjxSK3enxPcU4pJhV/QaaP7R9BZt5YQgreYAQO kRi0qyt10AExLqLd+077GX5DKcEOXwwVG/qckUQK2h8Kkd68vTbjDxggvsHwmpSE MHaszv85UpE+YQbT6DyG5Hi4kK3AJeODBy/fKr2VODIBLZpKiuQ5kK4lbNHYPpVB wXw0umXHLQggrKoPKo58ayoCXD0bAG9JT0rvapjUJIz1/9YejQ6lB/t5f0dPbSrU al4CJq/pfNky4H6uLWFVbAXJabJuBcB/eG1csaM88Yw0pEXkbnHCOkJAdosoDhhl qNQYg4yaE9tTuy1chXDMntH0R0Qztqry6+DMsczJxT21TgERsHCRJV+mGLV46/ZN GXJEoJ/cnjNJlqj8GirjbksPRbxuvmQNHRVrTh8qOSxbPKUQZfZocp9HHNmFsBaN Q07VgWMHXzYj1L4r3cbJ/ONpOCo66lw7F//MNGk0eIWdeL6H7XZvJQPX+YUrLsZc tVlZh8mZOGbRiM8g1dN0BSJO7QrVYmJWGb0oQQtv5tVSRN/V8Y9VZ8YX8lpYlF1e ETkrZLGhTJWp4A== =M4aK -----END PGP SIGNATURE----- Merge tag 'smp-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core SMP updates from Thomas Gleixner: "A small set of SMP core code changes: - Rework the smp function call core code to avoid the allocation of an additional cpumask - Remove the not longer required GFP argument from on_each_cpu_cond() and on_each_cpu_cond_mask() and fixup the callers" * tag 'smp-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smp: Remove allocation mask from on_each_cpu_cond.*() smp: Add a smp_cond_func_t argument to smp_call_function_many() smp: Use smp_cond_func_t as type for the conditional function |
||
Linus Torvalds
|
e279160f49 |
The timekeeping and timers departement provides:
- Time namespace support: If a container migrates from one host to another then it expects that clocks based on MONOTONIC and BOOTTIME are not subject to disruption. Due to different boot time and non-suspended runtime these clocks can differ significantly on two hosts, in the worst case time goes backwards which is a violation of the POSIX requirements. The time namespace addresses this problem. It allows to set offsets for clock MONOTONIC and BOOTTIME once after creation and before tasks are associated with the namespace. These offsets are taken into account by timers and timekeeping including the VDSO. Offsets for wall clock based clocks (REALTIME/TAI) are not provided by this mechanism. While in theory possible, the overhead and code complexity would be immense and not justified by the esoteric potential use cases which were discussed at Plumbers '18. The overhead for tasks in the root namespace (host time offsets = 0) is in the noise and great effort was made to ensure that especially in the VDSO. If time namespace is disabled in the kernel configuration the code is compiled out. Kudos to Andrei Vagin and Dmitry Sofanov who implemented this feature and kept on for more than a year addressing review comments, finding better solutions. A pleasant experience. - Overhaul of the alarmtimer device dependency handling to ensure that the init/suspend/resume ordering is correct. - A new clocksource/event driver for Microchip PIT64 - Suspend/resume support for the Hyper-V clocksource - The usual pile of fixes, updates and improvements mostly in the driver code. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl4vbTcTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoXT2D/96iJ3G9Snn2khEQP3XS2rYmtDGw7NO m1n96falwWeGe6zreU80R2Jge5nLxQtNhRoMPLLee1GpHwRC6lvqEqgdZ4LMBrD2 JqV7Gzg8Urmdh+hpDsyTCpeEWEzoMKxiFOX8PxwctqUhM4szEe5iQg2YQsg85Jw2 vG6M93N2xwDILh4rhEMbKjo+5ZmYn7c1RQvpGOSmpKOj940W/N7H2HBsFhdaJ1Kw FW5pFv1211PaU5RV2YNb2dMeeMTT1N3e2VN4Dkadoxp47pb+725gNHEBEjmV9poG Lp4IhzGAPnj8zVD88icQZSTaK3gUHMClxprJ0Pf84WEtiH7SeGu8BPYyu77+oNDe yzcctDJNyCWXkzmaP/fe/HLc0TStbvNAJ5Tagp4BC75gzebeb4/n8RtRT0fKeDYL pxpDPKDAPU7p1JSjxiWAtshqjBycWNY3Z49bA7/VhKBhnv8BDyBPGlYd7/4xrbGr RK7DQNXJwaJaiNJ7p5PiaFxGzNyB0B9sThD/slSlEInIKb4h9YzWr0TV+NB62VnB sDcN+tpLbRPz5/5cHGGfxR0+zKWpfyai8pzbmmaXEaKssjRYwyvcac5EZdgbWpbK k7CqAjoWLA2P+tGeePNJOf5JYK6Vmdyh4clmuwM0zOiRJ9NlWUyMf3z7QYILs4RO UAI+6opYlZEPAw== =x3qT -----END PGP SIGNATURE----- Merge tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The timekeeping and timers departement provides: - Time namespace support: If a container migrates from one host to another then it expects that clocks based on MONOTONIC and BOOTTIME are not subject to disruption. Due to different boot time and non-suspended runtime these clocks can differ significantly on two hosts, in the worst case time goes backwards which is a violation of the POSIX requirements. The time namespace addresses this problem. It allows to set offsets for clock MONOTONIC and BOOTTIME once after creation and before tasks are associated with the namespace. These offsets are taken into account by timers and timekeeping including the VDSO. Offsets for wall clock based clocks (REALTIME/TAI) are not provided by this mechanism. While in theory possible, the overhead and code complexity would be immense and not justified by the esoteric potential use cases which were discussed at Plumbers '18. The overhead for tasks in the root namespace (ie where host time offsets = 0) is in the noise and great effort was made to ensure that especially in the VDSO. If time namespace is disabled in the kernel configuration the code is compiled out. Kudos to Andrei Vagin and Dmitry Sofanov who implemented this feature and kept on for more than a year addressing review comments, finding better solutions. A pleasant experience. - Overhaul of the alarmtimer device dependency handling to ensure that the init/suspend/resume ordering is correct. - A new clocksource/event driver for Microchip PIT64 - Suspend/resume support for the Hyper-V clocksource - The usual pile of fixes, updates and improvements mostly in the driver code" * tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits) alarmtimer: Make alarmtimer_get_rtcdev() a stub when CONFIG_RTC_CLASS=n alarmtimer: Use wakeup source from alarmtimer platform device alarmtimer: Make alarmtimer platform device child of RTC device alarmtimer: Update alarmtimer_get_rtcdev() docs to reflect reality hrtimer: Add missing sparse annotation for __run_timer() lib/vdso: Only read hrtimer_res when needed in __cvdso_clock_getres() MIPS: vdso: Define BUILD_VDSO32 when building a 32bit kernel clocksource/drivers/hyper-v: Set TSC clocksource as default w/ InvariantTSC clocksource/drivers/hyper-v: Untangle stimers and timesync from clocksources clocksource/drivers/timer-microchip-pit64b: Fix sparse warning clocksource/drivers/exynos_mct: Rename Exynos to lowercase clocksource/drivers/timer-ti-dm: Fix uninitialized pointer access clocksource/drivers/timer-ti-dm: Switch to platform_get_irq clocksource/drivers/timer-ti-dm: Convert to devm_platform_ioremap_resource clocksource/drivers/em_sti: Fix variable declaration in em_sti_probe clocksource/drivers/em_sti: Convert to devm_platform_ioremap_resource clocksource/drivers/bcm2835_timer: Fix memory leak of timer clocksource/drivers/cadence-ttc: Use ttc driver as platform driver clocksource/drivers/timer-microchip-pit64b: Add Microchip PIT64B support clocksource/drivers/hyper-v: Reserve PAGE_SIZE space for tsc page ... |
||
Jens Axboe
|
848f7e1887 |
io-wq: make the io_wq ref counted
In preparation for sharing an io-wq across different users, add a reference count that manages destruction of it. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Pavel Begunkov
|
9466f43741 |
io_uring: fix refcounting with batched allocations at OOM
In case of out of memory the second argument of percpu_ref_put_many() in io_submit_sqes() may evaluate into "nr - (-EAGAIN)", that is clearly wrong. Fixes: 2b85edfc0c90 ("io_uring: batch getting pcpu references") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Pavel Begunkov
|
8cdf2193a3 |
io_uring: add comment for drain_next
Draining the middle of a link is tricky, so leave a comment there Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |