75596 Commits

Author SHA1 Message Date
Xiaomeng Tong
a96c94481f cifs: fix incorrect use of list iterator after the loop
The bug is here:
if (!tcon) {
	resched = true;
	list_del_init(&ses->rlist);
	cifs_put_smb_ses(ses);

Because the list_for_each_entry() never exits early (without any
break/goto/return inside the loop), the iterator 'ses' after the
loop will always be an pointer to a invalid struct containing the
HEAD (&pserver->smb_ses_list). As a result, the uses of 'ses' above
will lead to a invalid memory access.

The original intention should have been to walk each entry 'ses' in
'&tmp_ses_list', delete '&ses->rlist' and put 'ses'. So fix it with
a list_for_each_entry_safe().

Cc: stable@vger.kernel.org # 5.17
Fixes: 3663c9045f51a ("cifs: check reconnects for channels of active tcons too")
Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-23 15:20:15 -05:00
Paulo Alcantara
2d004c6cae ksmbd: store fids as opaque u64 integers
There is no need to store the fids as le64 integers as they are opaque
to the client and only used for equality.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Tom Talpey <tom@talpey.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-23 15:20:15 -05:00
Paulo Alcantara
351a59dace cifs: fix bad fids sent over wire
The client used to partially convert the fids to le64, while storing
or sending them by using host endianness.  This broke the client on
big-endian machines.  Instead of converting them to le64, store them
as opaque integers and then avoid byteswapping when sending them over
wire.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-23 15:20:14 -05:00
Ronnie Sahlberg
8708b10760 cifs: change smb2_query_info_compound to use a cached fid, if available
This will reduce the number of Open/Close we send on the wire and replace
a Open/GetInfo/Close compound with just a simple GetInfo request
IF we have a cached handle for the object.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-23 15:20:14 -05:00
Ronnie Sahlberg
5e0c969e9e cifs: convert the path to utf16 in smb2_query_info_compound
and not in the callers.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-23 15:17:22 -05:00
Andreas Gruenbacher
124c458a40 gfs2: Minor retry logic cleanup
Clean up the retry logic in the read and write functions somewhat.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-03-23 16:52:41 +01:00
Andreas Gruenbacher
52f3f033a5 gfs2: Disable page faults during lockless buffered reads
During lockless buffered reads, filemap_read() holds page cache page
references while trying to copy data to the user-space buffer.  The
calling process isn't holding the inode glock, but the page references
it holds prevent those pages from being removed from the page cache, and
that prevents the underlying inode glock from being moved to another
node.  Thus, we can end up in the same kinds of distributed deadlock
situations as with normal (non-lockless) buffered reads.

Fix that by disabling page faults during lockless reads as well.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-03-23 16:52:41 +01:00
Andreas Gruenbacher
bb7f5d96aa gfs2: Fix should_fault_in_pages() logic
Fix the fault-in window size logic:
* Use a maximum window size of 1 MiB instead of BIO_MAX_VECS * PAGE_SIZE.
  The previous window size was always one page because the pages variable
  was accidentally being defined and then redefined in
  should_fault_in_pages().
* The nr_dirtied heuristic for guessing when there might be memory
  pressure often results in very small window sizes.  Don't let
  nr_dirtied drop below 8 pages (as btrfs does).
* Compute the window size in units of bytes, not pages.
* Account for page overlap (unaligned iterators).

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-03-23 16:51:46 +01:00
Christoph Hellwig
61285ff72a fs: do not pass __GFP_HIGHMEM to bio_alloc in do_mpage_readpage
The mpage bio alloc cleanup accidentally removed clearing ~GFP_KERNEL
bits from the mask passed to bio_alloc.  Fix this up in a slightly
less obsfucated way that mirrors what iomap does in its readpage code.

Fixes: 77c436de01c0 ("mpage: pass the operation to bio_alloc")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Link: https://lore.kernel.org/r/20220323153952.1418560-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-23 09:42:15 -06:00
Jens Axboe
4d55f238f8 io_uring: don't recycle provided buffer if punted to async worker
We only really need to recycle the buffer when going async for a file
type that has an indefinite reponse time (eg non-file/bdev). And for
files that to arm poll, the async worker will arm poll anyway and the
buffer will get recycled there.

In that latter case, we're not holding ctx->uring_lock. Ensure we take
the issue_flags into account and acquire it if we need to.

Fixes: b1c62645758e ("io_uring: recycle provided buffers if request goes async")
Reported-by: Stefan Roesch <shr@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-23 06:26:06 -06:00
Jens Axboe
d89a4fac0f io_uring: fix assuming triggered poll waitqueue is the single poll
syzbot reports a recent regression:

BUG: KASAN: use-after-free in __wake_up_common+0x637/0x650 kernel/sched/wait.c:101
Read of size 8 at addr ffff888011e8a130 by task syz-executor413/3618

CPU: 0 PID: 3618 Comm: syz-executor413 Tainted: G        W         5.17.0-syzkaller-01402-g8565d64430f8 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 print_address_description.constprop.0.cold+0x8d/0x303 mm/kasan/report.c:255
 __kasan_report mm/kasan/report.c:442 [inline]
 kasan_report.cold+0x83/0xdf mm/kasan/report.c:459
 __wake_up_common+0x637/0x650 kernel/sched/wait.c:101
 __wake_up_common_lock+0xd0/0x130 kernel/sched/wait.c:138
 tty_release+0x657/0x1200 drivers/tty/tty_io.c:1781
 __fput+0x286/0x9f0 fs/file_table.c:317
 task_work_run+0xdd/0x1a0 kernel/task_work.c:164
 exit_task_work include/linux/task_work.h:32 [inline]
 do_exit+0xaff/0x29d0 kernel/exit.c:806
 do_group_exit+0xd2/0x2f0 kernel/exit.c:936
 __do_sys_exit_group kernel/exit.c:947 [inline]
 __se_sys_exit_group kernel/exit.c:945 [inline]
 __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:945
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f439a1fac69

which is due to leaving the request on the waitqueue mistakenly. The
reproducer is using a tty device, which means we end up arming the same
poll queue twice (it uses the same poll waitqueue for both), but in
io_poll_wake() we always just clear REQ_F_SINGLE_POLL regardless of which
entry triggered. This leaves one waitqueue potentially armed after we're
done, which then blows up in tty when the waitqueue is attempted removed.

We have no room to store this information, so simply encode it in the
wait_queue_entry->private where we store the io_kiocb request pointer.

Fixes: 91eac1c69c20 ("io_uring: cache poll/double-poll state with a request flag")
Reported-by: syzbot+09ad4050dd3a120bfccd@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-23 06:26:05 -06:00
Jens Axboe
e2c0cb7c0c io_uring: bump poll refs to full 31-bits
The previous commit:

1bc84c40088 ("io_uring: remove poll entry from list when canceling all")

removed a potential overflow condition for the poll references. They
are currently limited to 20-bits, even if we have 31-bits available. The
upper bit is used to mark for cancelation.

Bump the poll ref space to 31-bits, making that kind of situation much
harder to trigger in general. We'll separately add overflow checking
and handling.

Fixes: aa43477b0402 ("io_uring: poll rework")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-23 06:26:05 -06:00
Linus Torvalds
6b1f86f8e9 Filesystem folio changes for 5.18
Primarily this series converts some of the address_space operations
 to take a folio instead of a page.
 
 ->is_partially_uptodate() takes a folio instead of a page and changes the
 type of the 'from' and 'count' arguments to make it obvious they're bytes.
 ->invalidatepage() becomes ->invalidate_folio() and has a similar type change.
 ->launder_page() becomes ->launder_folio()
 ->set_page_dirty() becomes ->dirty_folio() and adds the address_space as
 an argument.
 
 There are a couple of other misc changes up front that weren't worth
 separating into their own pull request.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmI4hqMACgkQDpNsjXcp
 gj7r7Af/fVJ7m8kKqjP/IayX3HiJRuIDQw+vM++BlRNXdjz+IyED6whdmFGxJeOY
 BMyT+8ApOAz7ErS4G+7fAv4ScJK/aEgFUsnSeAiCp0PliiEJ5NNJzElp6sVmQ7H5
 SX7+Ek444FZUGsQuy0qL7/ELpR3ditnD7x+5U2g0p5TeaHGUQn84crRyfR4xuhNG
 EBD9D71BOb7OxUcOHe93pTkK51QsQ0aCrcIsB1tkK5KR0BAthn1HqF7ehL90Rvrr
 omx5M7aDWGY4oj7IKrhlAs+55Ah2WaOzrZBp0FXNbr4UENDBKWKyUxErwa4xPkf6
 Gm1iQG/CspOHnxN3YWsd5WjtlL3A+A==
 =cOiq
 -----END PGP SIGNATURE-----

Merge tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache

Pull filesystem folio updates from Matthew Wilcox:
 "Primarily this series converts some of the address_space operations to
  take a folio instead of a page.

  Notably:

   - a_ops->is_partially_uptodate() takes a folio instead of a page and
     changes the type of the 'from' and 'count' arguments to make it
     obvious they're bytes.

   - a_ops->invalidatepage() becomes ->invalidate_folio() and has a
     similar type change.

   - a_ops->launder_page() becomes ->launder_folio()

   - a_ops->set_page_dirty() becomes ->dirty_folio() and adds the
     address_space as an argument.

  There are a couple of other misc changes up front that weren't worth
  separating into their own pull request"

* tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache: (53 commits)
  fs: Remove aops ->set_page_dirty
  fb_defio: Use noop_dirty_folio()
  fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio
  fs: Convert __set_page_dirty_buffers to block_dirty_folio
  nilfs: Convert nilfs_set_page_dirty() to nilfs_dirty_folio()
  mm: Convert swap_set_page_dirty() to swap_dirty_folio()
  ubifs: Convert ubifs_set_page_dirty to ubifs_dirty_folio
  f2fs: Convert f2fs_set_node_page_dirty to f2fs_dirty_node_folio
  f2fs: Convert f2fs_set_data_page_dirty to f2fs_dirty_data_folio
  f2fs: Convert f2fs_set_meta_page_dirty to f2fs_dirty_meta_folio
  afs: Convert afs_dir_set_page_dirty() to afs_dir_dirty_folio()
  btrfs: Convert extent_range_redirty_for_io() to use folios
  fs: Convert trivial uses of __set_page_dirty_nobuffers to filemap_dirty_folio
  btrfs: Convert from set_page_dirty to dirty_folio
  fscache: Convert fscache_set_page_dirty() to fscache_dirty_folio()
  fs: Add aops->dirty_folio
  fs: Remove aops->launder_page
  orangefs: Convert launder_page to launder_folio
  nfs: Convert from launder_page to launder_folio
  fuse: Convert from launder_page to launder_folio
  ...
2022-03-22 18:26:56 -07:00
Linus Torvalds
9030fb0bb9 Folio changes for 5.18
- Rewrite how munlock works to massively reduce the contention
    on i_mmap_rwsem (Hugh Dickins):
    https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/
  - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph Hellwig):
    https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/
  - Convert GUP to use folios and make pincount available for order-1
    pages. (Matthew Wilcox)
  - Convert a few more truncation functions to use folios (Matthew Wilcox)
  - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew Wilcox)
  - Convert rmap_walk to use folios (Matthew Wilcox)
  - Convert most of shrink_page_list() to use a folio (Matthew Wilcox)
  - Add support for creating large folios in readahead (Matthew Wilcox)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmI4ucgACgkQDpNsjXcp
 gj69Wgf6AwqwmO5Tmy+fLScDPqWxmXJofbocae1kyoGHf7Ui91OK4U2j6IpvAr+g
 P/vLIK+JAAcTQcrSCjymuEkf4HkGZOR03QQn7maPIEe4eLrZRQDEsmHC1L9gpeJp
 s/GMvDWiGE0Tnxu0EOzfVi/yT+qjIl/S8VvqtCoJv1HdzxitZ7+1RDuqImaMC5MM
 Qi3uHag78vLmCltLXpIOdpgZhdZexCdL2Y/1npf+b6FVkAJRRNUnA0gRbS7YpoVp
 CbxEJcmAl9cpJLuj5i5kIfS9trr+/QcvbUlzRxh4ggC58iqnmF2V09l2MJ7YU3XL
 v1O/Elq4lRhXninZFQEm9zjrri7LDQ==
 =n9Ad
 -----END PGP SIGNATURE-----

Merge tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache

Pull folio updates from Matthew Wilcox:

 - Rewrite how munlock works to massively reduce the contention on
   i_mmap_rwsem (Hugh Dickins):

     https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/

 - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph
   Hellwig):

     https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/

 - Convert GUP to use folios and make pincount available for order-1
   pages. (Matthew Wilcox)

 - Convert a few more truncation functions to use folios (Matthew
   Wilcox)

 - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew
   Wilcox)

 - Convert rmap_walk to use folios (Matthew Wilcox)

 - Convert most of shrink_page_list() to use a folio (Matthew Wilcox)

 - Add support for creating large folios in readahead (Matthew Wilcox)

* tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits)
  mm/damon: minor cleanup for damon_pa_young
  selftests/vm/transhuge-stress: Support file-backed PMD folios
  mm/filemap: Support VM_HUGEPAGE for file mappings
  mm/readahead: Switch to page_cache_ra_order
  mm/readahead: Align file mappings for non-DAX
  mm/readahead: Add large folio readahead
  mm: Support arbitrary THP sizes
  mm: Make large folios depend on THP
  mm: Fix READ_ONLY_THP warning
  mm/filemap: Allow large folios to be added to the page cache
  mm: Turn can_split_huge_page() into can_split_folio()
  mm/vmscan: Convert pageout() to take a folio
  mm/vmscan: Turn page_check_references() into folio_check_references()
  mm/vmscan: Account large folios correctly
  mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios
  mm/vmscan: Free non-shmem folios without splitting them
  mm/rmap: Constify the rmap_walk_control argument
  mm/rmap: Convert rmap_walk() to take a folio
  mm: Turn page_anon_vma() into folio_anon_vma()
  mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read()
  ...
2022-03-22 17:03:12 -07:00
Linus Torvalds
3bf03b9a08 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs

 - Most the MM patches which precede the patches in Willy's tree: kasan,
   pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
   sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb,
   userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp,
   cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap,
   zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (227 commits)
  mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release()
  Docs/ABI/testing: add DAMON sysfs interface ABI document
  Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface
  selftests/damon: add a test for DAMON sysfs interface
  mm/damon/sysfs: support DAMOS stats
  mm/damon/sysfs: support DAMOS watermarks
  mm/damon/sysfs: support schemes prioritization
  mm/damon/sysfs: support DAMOS quotas
  mm/damon/sysfs: support DAMON-based Operation Schemes
  mm/damon/sysfs: support the physical address space monitoring
  mm/damon/sysfs: link DAMON for virtual address spaces monitoring
  mm/damon: implement a minimal stub for sysfs-based DAMON interface
  mm/damon/core: add number of each enum type values
  mm/damon/core: allow non-exclusive DAMON start/stop
  Docs/damon: update outdated term 'regions update interval'
  Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling
  Docs/vm/damon: call low level monitoring primitives the operations
  mm/damon: remove unnecessary CONFIG_DAMON option
  mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}()
  mm/damon/dbgfs-test: fix is_target_id() change
  ...
2022-03-22 16:11:53 -07:00
Hugh Dickins
b698f0a177 mm/fs: delete PF_SWAPWRITE
PF_SWAPWRITE has been redundant since v3.2 commit ee72886d8ed5 ("mm:
vmscan: do not writeback filesystem pages in direct reclaim").

Coincidentally, NeilBrown's current patch "remove inode_congested()"
deletes may_write_to_inode(), which appeared to be the one function which
took notice of PF_SWAPWRITE.  But if you study the old logic, and the
conditions under which may_write_to_inode() was called, you discover that
flag and function have been pointless for a decade.

Link: https://lkml.kernel.org/r/75e80e7-742d-e3bd-531-614db8961e4@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: NeilBrown <neilb@suse.de>
Cc: Jan Kara <jack@suse.de>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:08 -07:00
Nadav Amit
824ddc601a userfaultfd: provide unmasked address on page-fault
Userfaultfd is supposed to provide the full address (i.e., unmasked) of
the faulting access back to userspace.  However, that is not the case for
quite some time.

Even running "userfaultfd_demo" from the userfaultfd man page provides the
wrong output (and contradicts the man page).  Notice that
"UFFD_EVENT_PAGEFAULT event" shows the masked address (7fc5e30b3000) and
not the first read address (0x7fc5e30b300f).

	Address returned by mmap() = 0x7fc5e30b3000

	fault_handler_thread():
	    poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
	    UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fc5e30b3000
		(uffdio_copy.copy returned 4096)
	Read address 0x7fc5e30b300f in main(): A
	Read address 0x7fc5e30b340f in main(): A
	Read address 0x7fc5e30b380f in main(): A
	Read address 0x7fc5e30b3c0f in main(): A

The exact address is useful for various reasons and specifically for
prefetching decisions.  If it is known that the memory is populated by
certain objects whose size is not page-aligned, then based on the faulting
address, the uffd-monitor can decide whether to prefetch and prefault the
adjacent page.

This bug has been for quite some time in the kernel: since commit
1a29d85eb0f1 ("mm: use vmf->address instead of of vmf->virtual_address")
vmf->virtual_address"), which dates back to 2016.  A concern has been
raised that existing userspace application might rely on the old/wrong
behavior in which the address is masked.  Therefore, it was suggested to
provide the masked address unless the user explicitly asks for the exact
address.

Add a new userfaultfd feature UFFD_FEATURE_EXACT_ADDRESS to direct
userfaultfd to provide the exact address.  Add a new "real_address" field
to vmf to hold the unmasked address.  Provide the address to userspace
accordingly.

Initialize real_address in various code-paths to be consistent with
address, even when it is not used, to be on the safe side.

[namit@vmware.com: initialize real_address on all code paths, per Jan]
  Link: https://lkml.kernel.org/r/20220226022655.350562-1-namit@vmware.com
[akpm@linux-foundation.org: fix typo in comment, per Jan]

Link: https://lkml.kernel.org/r/20220218041003.3508-1-namit@vmware.com
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:08 -07:00
Muchun Song
f53bf711d4 mm: dcache: use kmem_cache_alloc_lru() to allocate dentry
Like inode cache, the dentry will also be added to its memcg list_lru.  So
replace kmem_cache_alloc() with kmem_cache_alloc_lru() to allocate dentry.

Link: https://lkml.kernel.org/r/20220228122126.37293-8-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:03 -07:00
Muchun Song
65d3af647b f2fs: allocate inode by using alloc_inode_sb()
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().

Link: https://lkml.kernel.org/r/20220228122126.37293-6-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:03 -07:00
Muchun Song
fd60b28842 fs: allocate inode by using alloc_inode_sb()
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() of all filesystems to alloc_inode_sb().

Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Theodore Ts'o <tytso@mit.edu>		[ext4]
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:03 -07:00
Muchun Song
8b9f3ac5b0 fs: introduce alloc_inode_sb() to allocate filesystems specific inode
The allocated inode cache is supposed to be added to its memcg list_lru
which should be allocated as well in advance.  That can be done by
kmem_cache_alloc_lru() which allocates object and list_lru.  The file
systems is main user of it.  So introduce alloc_inode_sb() to allocate
file system specific inodes and set up the inode reclaim context
properly.  The file system is supposed to use alloc_inode_sb() to
allocate inodes.

In later patches, we will convert all users to the new API.

Link: https://lkml.kernel.org/r/20220228122126.37293-4-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:03 -07:00
Minchan Kim
c0226eb8bd mm: fs: fix lru_cache_disabled race in bh_lru
Check lru_cache_disabled under bh_lru_lock.  Otherwise, it could introduce
race below and it fails to migrate pages containing buffer_head.

   CPU 0					CPU 1

bh_lru_install
                                       lru_cache_disable
  lru_cache_disabled = false
                                       atomic_inc(&lru_disable_count);
				       invalidate_bh_lrus_cpu of CPU 0
				       bh_lru_lock
				       __invalidate_bh_lrus
				       bh_lru_unlock
  bh_lru_lock
  install the bh
  bh_lru_unlock

WHen this race happens a CMA allocation fails, which is critical for
the workload which depends on CMA.

Link: https://lkml.kernel.org/r/20220308180709.2017638-1-minchan@kernel.org
Fixes: 8cc621d2f45d ("mm: fs: invalidate BH LRU during page migration")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: John Dias <joaodias@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:01 -07:00
Anthony Iliopoulos
a128b054ce mount: warn only once about timestamp range expiration
Commit f8b92ba67c5d ("mount: Add mount warning for impending timestamp
expiry") introduced a mount warning regarding filesystem timestamp
limits, that is printed upon each writable mount or remount.

This can result in a lot of unnecessary messages in the kernel log in
setups where filesystems are being frequently remounted (or mounted
multiple times).

Avoid this by setting a superblock flag which indicates that the warning
has been emitted at least once for any particular mount, as suggested in
[1].

Link: https://lore.kernel.org/CAHk-=wim6VGnxQmjfK_tDg6fbHYKL4EFkmnTjVr9QnRqjDBAeA@mail.gmail.com/ [1]
Link: https://lkml.kernel.org/r/20220119202934.26495-1-ailiop@suse.com
Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:01 -07:00
NeilBrown
a64239d0ef f2fs: replace congestion_wait() calls with io_schedule_timeout()
As congestion is no longer tracked, congestion_wait() is effectively
equivalent to io_schedule_timeout().

So introduce f2fs_io_schedule_timeout() which sets TASK_UNINTERRUPTIBLE
and call that instead.

Link: https://lkml.kernel.org/r/164549983744.9187.6425865370954230902.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:01 -07:00
NeilBrown
b9b1335e64 remove bdi_congested() and wb_congested() and related functions
These functions are no longer useful as no BDIs report congestions any
more.

Removing the test on bdi_write_contested() in current_may_throttle()
could cause a small change in behaviour, but only when PF_LOCAL_THROTTLE
is set.

So replace the calls by 'false' and simplify the code - and remove the
functions.

[akpm@linux-foundation.org: fix build]

Link: https://lkml.kernel.org/r/164549983742.9187.2570198746005819592.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>	[nilfs]
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:01 -07:00
NeilBrown
fe55d563d4 remove inode_congested()
inode_congested() reports if the backing-device for the inode is
congested.  No bdi reports congestion any more, so this always returns
'false'.

So remove inode_congested() and related functions, and remove the call
sites, assuming that inode_congested() always returns 'false'.

Link: https://lkml.kernel.org/r/164549983741.9187.2174285592262191311.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:01 -07:00
NeilBrown
503d4fa6ee ceph: remove reliance on bdi congestion
The bdi congestion tracking in not widely used and will be removed.

CEPHfs is one of a small number of filesystems that uses it, setting just
the async (write) congestion flags at what it determines are appropriate
times.

The only remaining effect of the async flag is to cause (some)
WB_SYNC_NONE writes to be skipped.

So instead of setting the flag, set an internal flag and change:

 - .writepages to do nothing if WB_SYNC_NONE and the flag is set

 - .writepage to return AOP_WRITEPAGE_ACTIVATE if WB_SYNC_NONE and the
   flag is set.

The writepages change causes a behavioural change in that pageout() can
now return PAGE_ACTIVATE instead of PAGE_KEEP, so SetPageActive() will
be called on the page which (I think) wil further delay the next attempt
at writeout.  This might be a good thing.

Link: https://lkml.kernel.org/r/164549983739.9187.14895675781408171186.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:00 -07:00
NeilBrown
6df25e5853 nfs: remove reliance on bdi congestion
The bdi congestion tracking in not widely used and will be removed.

NFS is one of a small number of filesystems that uses it, setting just
the async (write) congestion flag at what it determines are appropriate
times.

The only remaining effect of the async flag is to cause (some)
WB_SYNC_NONE writes to be skipped.

So instead of setting the flag, set an internal flag and change:

 - .writepages to do nothing if WB_SYNC_NONE and the flag is set

 - .writepage to return AOP_WRITEPAGE_ACTIVATE if WB_SYNC_NONE and the
   flag is set.

The writepages change causes a behavioural change in that pageout() can
now return PAGE_ACTIVATE instead of PAGE_KEEP, so SetPageActive() will be
called on the page which (I think) wil further delay the next attempt at
writeout.  This might be a good thing.

Link: https://lkml.kernel.org/r/164549983738.9187.3972219847989393182.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:00 -07:00
NeilBrown
670d21c6e1 fuse: remove reliance on bdi congestion
The bdi congestion tracking in not widely used and will be removed.

Fuse is one of a small number of filesystems that uses it, setting both
the sync (read) and async (write) congestion flags at what it determines
are appropriate times.

The only remaining effect of the sync flag is to cause read-ahead to be
skipped.  The only remaining effect of the async flag is to cause (some)
WB_SYNC_NONE writes to be skipped.

So instead of setting the flags, change:

 - .readahead to stop when it has submitted all non-async pages for
   read.

 - .writepages to do nothing if WB_SYNC_NONE and the flag would be set

 - .writepage to return AOP_WRITEPAGE_ACTIVATE if WB_SYNC_NONE and the
   flag would be set.

The writepages change causes a behavioural change in that pageout() can
now return PAGE_ACTIVATE instead of PAGE_KEEP, so SetPageActive() will be
called on the page which (I think) will further delay the next attempt at
writeout.  This might be a good thing.

Link: https://lkml.kernel.org/r/164549983737.9187.2627117501000365074.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:00 -07:00
hongnanli
137cebf943 fs/ocfs2: fix comments mentioning i_mutex
inode->i_mutex has been replaced with inode->i_rwsem long ago.  Fix
comments still mentioning i_mutex.

Link: https://lkml.kernel.org/r/20220214031314.100094-1-hongnan.li@linux.alibaba.com
Signed-off-by: hongnanli <hongnan.li@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:00 -07:00
Joseph Qi
38c9d2d3f3 ocfs2: cleanup some return variables
Simply return directly instead of assign the return value to another
variable.

Link: https://lkml.kernel.org/r/20220114021641.13927-1-joseph.qi@linux.alibaba.com
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Cc: Minghao Chi <chi.minghao@zte.com.cn>
Cc: CGEL ZTE <cgel.zte@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:00 -07:00
Dongliang Mu
714fbf2647 ntfs: add sanity check on allocation size
ntfs_read_inode_mount invokes ntfs_malloc_nofs with zero allocation
size.  It triggers one BUG in the __ntfs_malloc function.

Fix this by adding sanity check on ni->attr_list_size.

Link: https://lkml.kernel.org/r/20220120094914.47736-1-dzm91@hust.edu.cn
Reported-by: syzbot+3c765c5248797356edaa@syzkaller.appspotmail.com
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Acked-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:00 -07:00
David Howells
70ef38515b cifs: writeback fix
Wait for the page to be written to the cache before we allow it
to be modified

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-22 15:38:20 -05:00
Peter Zijlstra
b9067cd80f Merge branch 'kvm/kvm-sls-fix'
Sync with the last minute SLS fix to extend it for IBT.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2022-03-22 21:12:14 +01:00
Trond Myklebust
a245832aaa pNFS/files: Ensure pNFS allocation modes are consistent with nfsiod
Ensure that pNFS file commit allocations in rpciod/nfsiod callbacks can
fail in low memory mode, so that the threads don't block and loop
forever.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-22 15:52:56 -04:00
Trond Myklebust
3e5f151e94 pNFS/flexfiles: Ensure pNFS allocation modes are consistent with nfsiod
Ensure that pNFS flexfile allocations in rpciod/nfsiod callbacks can
fail in low memory mode, so that the threads don't block and loop
forever.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-22 15:52:56 -04:00
Trond Myklebust
63d8a41b1d NFSv4/pnfs: Ensure pNFS allocation modes are consistent with nfsiod
Ensure that pNFS allocations that can be called from rpciod/nfsiod
callback can fail in low memory mode, so that the threads don't block
and loop forever.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-22 15:52:56 -04:00
Trond Myklebust
0bae835b63 NFS: Avoid writeback threads getting stuck in mempool_alloc()
In a low memory situation, allow the NFS writeback code to fail without
getting stuck in infinite loops in mempool_alloc().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-22 15:52:56 -04:00
Trond Myklebust
515dcdcd48 NFS: nfsiod should not block forever in mempool_alloc()
The concern is that since nfsiod is sometimes required to kick off a
commit, it can get locked up waiting forever in mempool_alloc() instead
of failing gracefully and leaving the commit until later.

Try to allocate from the slab first, with GFP_KERNEL | __GFP_NORETRY,
then fall back to a non-blocking attempt to allocate from the memory
pool.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-22 15:52:56 -04:00
Trond Myklebust
e47a62df29 NFS: Fix revalidation of empty readdir pages
If the page is empty, we need to check the array->last_cookie instead of
the first entry. Add a helper for the cases where we care.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-22 15:52:55 -04:00
Linus Torvalds
5191290407 for-5.18-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmI44SgACgkQxWXV+ddt
 WDtzyg//YgMKr05jRsU3I/pIQ9znuKZmmllThwF63ZRG4PvKz2QfzvKdrMuzNjru
 5kHbG59iJqtLmU/aVsdp8mL6mmg5U3Ym2bIRsrW5m4HTtTowKdirvL/lQ3/tWm8j
 CSDJhUdCL2SwFjpru+4cxOeHLXNSfsk4BoCu8nsLitL+oXv/EPo/dkmu6nPjiMY3
 RjsIDBeDEf7J20KOuP/qJuN2YOAT7TeISPD3Ow4aDsmndWQ8n6KehEmAZb7QuqZQ
 SYubZ2wTb9HuPH/qpiTIA7innBIr+JkYtUYlz2xxixM2BUWNfqD6oKHw9RgOY5Sg
 CULFssw0i7cgGKsvuPJw1zdM002uG4wwXKigGiyljTVWvxneyr4mNDWiGad+LyFJ
 XWhnABPidkLs/1zbUkJ23DVub5VlfZsypkFDJAUXI0nGu3VrhjDfTYMa8eCe2L/F
 YuGG6CrAC+5K/arKAWTVj7hOb+52UzBTEBJz60LJJ6dS9eQoBy857V6pfo7w7ukZ
 t/tqA6q75O4tk/G3Ix3V1CjuAH3kJE6qXrvBxhpu8aZNjofopneLyGqS5oahpcE8
 8edtT+ZZhNuU9sLSEJCJATVxXRDdNzpQ8CHgOR5HOUbmM/vwKNzHPfRQzDnImznw
 UaUlFaaHwK17M6Y/6CnMecz26U2nVSJ7pyh39mb784XYe2a1efE=
 =YARd
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "This contains feature updates, performance improvements, preparatory
  and core work and some related VFS updates:

  Features:

   - encoded read/write ioctls, allows user space to read or write raw
     data directly to extents (now compressed, encrypted in the future),
     will be used by send/receive v2 where it saves processing time

   - zoned mode now works with metadata DUP (the mkfs.btrfs default)

   - error message header updates:
      - print error state: transaction abort, other error, log tree
        errors
      - print transient filesystem state: remount, device replace,
        ignored checksum verifications

   - tree-checker: verify the transaction id of the to-be-written dirty
     extent buffer

  Performance improvements for fsync:

   - directory logging speedups (up to -90% run time)

   - avoid logging all directory changes during renames (up to -60% run
     time)

   - avoid inode logging during rename and link when possible (up to
     -60% run time)

   - prepare extents to be logged before locking a log tree path
     (throughput +7%)

   - stop copying old file extents when doing a full fsync()

   - improved logging of old extents after truncate

  Core, fixes:

   - improved stale device identification by dev_t and not just path
     (for devices that are behind other layers like device mapper)

   - continued extent tree v2 preparatory work
      - disable features that won't work yet
      - add wrappers and abstractions for new tree roots

   - improved error handling

   - add super block write annotations around background block group
     reclaim

   - fix device scanning messages potentially accessing stale pointer

   - cleanups and refactoring

  VFS:

   - allow reflinks/deduplication from two different mounts of the same
     filesystem

   - export and add helpers for read/write range verification, for the
     encoded ioctls"

* tag 'for-5.18-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (98 commits)
  btrfs: zoned: put block group after final usage
  btrfs: don't access possibly stale fs_info data in device_list_add
  btrfs: add lockdep_assert_held to need_preemptive_reclaim
  btrfs: verify the tranisd of the to-be-written dirty extent buffer
  btrfs: unify the error handling of btrfs_read_buffer()
  btrfs: unify the error handling pattern for read_tree_block()
  btrfs: factor out do_free_extent_accounting helper
  btrfs: remove last_ref from the extent freeing code
  btrfs: add a alloc_reserved_extent helper
  btrfs: remove BUG_ON(ret) in alloc_reserved_tree_block
  btrfs: add and use helper for unlinking inode during log replay
  btrfs: extend locking to all space_info members accesses
  btrfs: zoned: mark relocation as writing
  fs: allow cross-vfsmount reflink/dedupe
  btrfs: remove the cross file system checks from remap
  btrfs: pass btrfs_fs_info to btrfs_recover_relocation
  btrfs: pass btrfs_fs_info for deleting snapshots and cleaner
  btrfs: add filesystems state details to error messages
  btrfs: deal with unexpected extent type during reflinking
  btrfs: fix unexpected error path when reflinking an inline extent
  ...
2022-03-22 10:51:40 -07:00
Linus Torvalds
9b03992f0c Fix some bugs in converting ext4 to use the new mount API, as well as
more bug fixes and clean ups in the ext4 fast_commit feature (most
 notably, in the tracepoints).  In the jbd2 layer, the t_handle_lock
 spinlock has been removed, with the last place where it was actually
 needed replaced with an atomic cmpxchg.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEOrBXt+eNlFyMVZH70292m8EYBPAFAmI5QPsACgkQ0292m8EY
 BPD99Q//ZO7L0XUicNEsuxvtXDj4GBWg/R62JXcHsIetIRhCpM4+L2vS3kZVXCBh
 Pd+QMYsfAhuXFK3yK5MfvAb+XJ2e0iVNaw9ke2ueAsrGidlKWOstbMWWqEa4Dcr2
 age1DdrRB5hXIFxOg+Qz8bPMpP9XfxEiMqT2K7PgB+R/Z2chBbwPJ/G1VmdOMNGe
 FzxZgvXFY5EIJgjqj09ioD5pUSlAj5sgI4y/Dcv4liW11kvgBxg7Mm8ZouoPRHa3
 gZF5mj/40izhD8gYPmZJtHFFYuqBBLsrmJnjUF2Y2yvZt5I14j4w/5OfcjuJqWn6
 t2mlnk8/o6ZHzyuUV0/8/x35rZBlwqMq9ep6L9pN07XVB0/xOMG7a59pWOElhFGp
 sG/ELhP251ZE+glWC9/X1VjnhM+agduiWAUY0wF0X62LM3sw6sVO3q6hsI1EKunN
 O5VlvzPXNTZn9bcKzXzpvbxP9vF6s4exf32rMxd1rfSptE6myavGwss7MlB1HM7v
 xiM/A4AZT/qlUH9eOJxF1aGcswpxnFwTwHKZqBVW32iyG9mbqRT2lKajx4WstowU
 pdI4X7YfF7GhI5tMFR5S4Fa9FiOivO4EEc6z/db753kVuxSjwtyDd8ip56ZuJUhU
 LqpwZO2bh+UhR0vRO0pQH/ZqCzCfCaJTGtBwAvJUwo3g/zRTmoc=
 =naSC
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Fix some bugs in converting ext4 to use the new mount API, as well as
  more bug fixes and clean ups in the ext4 fast_commit feature (most
  notably, in the tracepoints).

  In the jbd2 layer, the t_handle_lock spinlock has been removed, with
  the last place where it was actually needed replaced with an atomic
  cmpxchg"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (35 commits)
  ext4: fix kernel doc warnings
  ext4: fix remaining two trace events to use same printk convention
  ext4: add commit tid info in ext4_fc_commit_start/stop trace events
  ext4: add commit_tid info in jbd debug log
  ext4: add transaction tid info in fc_track events
  ext4: add new trace event in ext4_fc_cleanup
  ext4: return early for non-eligible fast_commit track events
  ext4: do not call FC trace event in ext4_fc_commit() if FS does not support FC
  ext4: convert ext4_fc_track_dentry type events to use event class
  ext4: fix ext4_fc_stats trace point
  ext4: remove unused enum EXT4_FC_COMMIT_FAILED
  ext4: warn when dirtying page w/o buffers in data=journal mode
  doc: fixed a typo in ext4 documentation
  ext4: make mb_optimize_scan performance mount option work with extents
  ext4: make mb_optimize_scan option work with set/unset mount cmd
  ext4: don't BUG if someone dirty pages without asking ext4 first
  ext4: remove redundant assignment to variable split_flag1
  ext4: fix underflow in ext4_max_bitmap_size()
  ext4: fix ext4_mb_clear_bb() kernel-doc comment
  ext4: fix fs corruption when tring to remove a non-empty directory with IO error
  ...
2022-03-22 10:36:55 -07:00
Linus Torvalds
14705fda8f New features:
- NFSv3 support in NFSD is now always built
 - Added NFSD support for the NFSv4 birth-time file attribute
 - Added support for storing and displaying sockaddrs in trace points
 - NFSD now recognizes RPC_AUTH_TLS probes
 
 Performance improvements:
 - Optimized the svc transport enqueuing mechanism
 - Added micro-optimizations for the duplicate reply cache
 
 Notable bug fixes:
 - Allocation of the NFSD file cache hash table is more reliable
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmI3XNkACgkQM2qzM29m
 f5dNERAAqJ/nzfVp2H5BKLszJ7p/s13wFqbW719Rzymzl6/tUUHOqIsVdBsJsa/b
 BdZQLLDwa6ZB5zOAnC6FosRKYu4lwixOOf94pC6a9ZDD/glYVKF8mZG+RZXPAy16
 g3JUOi/bcyHXv0ZUhbv7DqW+HHM+owPP4vlNJ9ChiiLr/Xdp8NBKj+4Qtn/wcAo+
 Xuvx7fU/5Mbemh+dd5mWker4afHvpt9V6U6s04m5LiTPPnHVnxmeyekJGUCOY0QO
 cm/6SPNDqyn/VEfM/SRxEnLE9QcHRhZo/4PKRGF4wYolcviIogbILE1M7Ig/r/Gv
 6Du2kcRAhyZ3zgWnu799Ivn3Q6IrVjxZwqmsi7YHURTwYKyZtxYsUk0MCBcpnxrE
 WyTS2onpElbMop3viKCnQdpIetbsHnUNg3udUV6ugbdCbnZuhUw5B/d6x0o8ZWDE
 C0f+jnX+GnBstn0vkcj2H0+VQTd5hUJtXMrooI42ODJoboQRZcmePwoXjqCmw3sy
 PXTxLZC/5+4zNHGUuz4Pq4V7FKr4nHhDzaW4ZDO3mILx4ahceotulY1B/yoBUu8o
 /LAhu2kJ6nFQkmpzdrGzPeOstgJYHm9CaitRvMzg+NJxEAJdebypdQDbX5iNpgfW
 MDXH4n8eIqroTlQ/mQYEV0EbC7BaTqSCL6rQdcrcFUPu9n4Fcno=
 =5nac
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "New features:

   - NFSv3 support in NFSD is now always built

   - Added NFSD support for the NFSv4 birth-time file attribute

   - Added support for storing and displaying sockaddrs in trace points

   - NFSD now recognizes RPC_AUTH_TLS probes

  Performance improvements:

   - Optimized the svc transport enqueuing mechanism

   - Added micro-optimizations for the duplicate reply cache

  Notable bug fixes:

   - Allocation of the NFSD file cache hash table is more reliable"

* tag 'nfsd-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (30 commits)
  nfsd: fix using the correct variable for sizeof()
  nfsd: use correct format characters
  NFSD: prevent integer overflow on 32 bit systems
  NFSD: prevent underflow in nfssvc_decode_writeargs()
  fs/lock: documentation cleanup. Replace inode->i_lock with flc_lock.
  NFSD: Fix nfsd_breaker_owns_lease() return values
  NFSD: Clean up _lm_ operation names
  arch: Remove references to CONFIG_NFSD_V3 in the default configs
  NFSD: Remove CONFIG_NFSD_V3
  nfsd: more robust allocation failure handling in nfsd_file_cache_init
  SUNRPC: Teach server to recognize RPC_AUTH_TLS
  NFSD: Move svc_serv_ops::svo_function into struct svc_serv
  NFSD: Remove svc_serv_ops::svo_module
  SUNRPC: Remove svc_shutdown_net()
  SUNRPC: Rename svc_close_xprt()
  SUNRPC: Rename svc_create_xprt()
  SUNRPC: Remove svo_shutdown method
  SUNRPC: Merge svc_do_enqueue_xprt() into svc_enqueue_xprt()
  SUNRPC: Remove the .svo_enqueue_xprt method
  SUNRPC: Record endpoint information in trace log
  ...
2022-03-22 10:29:51 -07:00
Linus Torvalds
105b6c05c5 5 cifs/smb3 fixes, 3 for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmI4zfoACgkQiiy9cAdy
 T1F5cQwAiAoZ3aMPLhr2pq9EAWtAy413opNIGwxmqAYMGsFKunMfjtMBdNx0y6g+
 rEkImRA315U9MobFqndRJgmwIdztimxqst9jZEZJPuMO0Yt3WMXU6YIoYd/rbcJ2
 XxBuOwhm9fW56qQv6M9xGXuE6r47VhhHTTpQh9nMWlvH6sad5Rzwzjt2ycrOAmKL
 Jn65LTmwTCK4htkXJWzGFwxSB31ueQ2cEDpA0IlDtnlYlcBJ1lSmoEL6muIVKn+h
 ynWvY4k39xa83GF9t1BDCGFsXx4htxD3+LSP4Gb07Rp1n+ZMcwUHk1cGFBRRaise
 IzcLgLV8cwiruHQgqkGKW5b4DplFv7ScEj8iqO/Dve5kDmXNFzgPFygJKSDCLr/j
 IsrGpZfGINwiMDgI+R6jxCnLgEfswFbEHYhwZtRxguuav1JLGtLfHCb3G84nPJQu
 4sRfk3euYkkGP3IYW0uQw4tfhjDaB8m17EHQLzWw/RYAzUdsA67Ocvn/F8kdbF76
 YBXhJhrA
 =ai8m
 -----END PGP SIGNATURE-----

Merge tag '5.18-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6

Pull cfis updates from Steve French:
 "Handlecache,  unmount, fiemap and two reconnect fixes"

* tag '5.18-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: use a different reconnect helper for non-cifsd threads
  cifs: we do not need a spinlock around the tree access during umount
  Adjust cifssb maximum read size
  cifs: truncate the inode and mapping when we simulate fcollapse
  cifs: fix handlecache and multiuser
2022-03-22 10:24:16 -07:00
Linus Torvalds
ef510682af f2fs-for-5.18
In this cycle, f2fs has some performance improvements for Android workloads such
 as using read-unfair rwsems and adding some sysfs entries to control GCs and
 discard commands in more details. In addtiion, it has some tunings to improve
 the recovery speed after sudden power-cut.
 
 Enhancement:
  - add reader-unfair rwsems with F2FS_UNFAIR_RWSEM
   : will replace with generic API support
  - adjust to make the readahead/recovery flow more efficiently
  - sysfs entries to control issue speeds of GCs and Discard commands
  - enable idmapped mounts
 
 Bug fix:
  - correct wrong error handling routines
  - fix missing conditions in quota
  - fix a potential deadlock between writeback and block plug routines
  - fix a deadlock btween freezefs and evict_inode
 
 We've added some boundary checks to avoid kernel panics on corrupted images,
 and several minor code clean-ups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmI44c4ACgkQQBSofoJI
 UNIqdBAAgBjV/76Gphbpg2lR5+13pWBV0jp66yYaaPiqmM6IsSPYKTlGMJpEBy41
 x6M+MRc+NjtwSEHAWOptOIPbP9zwXYJn/KSDMCAP3+454YhBFDLqDAkAxBt1frYT
 0EkwCIYw/LqmVnuIttQ01gnT8v5zH4d/x4+gsdM+b7flmpCP/AoZDvI19Zd66F0y
 RdOdQQWyhvmmetZbaeaPoxbjS8LJ9b0ZMcxidTv9a+5GylCAXNicBdM9x1iVVmJ1
 dT1n2w7USKVdL4ydpwPUiec6RwACRk49CL3FgyyGNRlcpMmU9ArcY2l/Qr+At7ky
 tgPODXme/EvH12DsfoixjkNSLc4a7RHPfiJ3qy8XC6dshWYMKIegjateG8lVhf0P
 kdifMRCdOa+/l+RoyD1IjKTXPmVl9ihh6RBYDr6YrFclxg3uI4CvJCXht4dSXOCE
 5vLIVZEf5yk+6Ee2ozcNTG2hZ8gd+aNy1WqBN3/5lFxhBYVNlTnUYd0URzenwIdW
 i2QP99mFrntCL25lhF7f7AeTHxSg/UVXnRA1oQZ+6qIPPLhNdApfd1lov/6+Hhe4
 0zDbCbmIfVko/vZJeYOppaj+6jSZ3FafMfH5dDYyis4S4RbX2sjR9wGSd8PEdOTw
 /4dZXXfB2XslPb3KQsJSyGz75af3PxZ8PHLxj0HBSQXOA140htY=
 =t75l
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this cycle, f2fs has some performance improvements for Android
  workloads such as using read-unfair rwsems and adding some sysfs
  entries to control GCs and discard commands in more details. In
  addtiion, it has some tunings to improve the recovery speed after
  sudden power-cut.

  Enhancement:
   - add reader-unfair rwsems with F2FS_UNFAIR_RWSEM: will replace with
     generic API support
   - adjust to make the readahead/recovery flow more efficiently
   - sysfs entries to control issue speeds of GCs and Discard commands
   - enable idmapped mounts

  Bug fix:
   - correct wrong error handling routines
   - fix missing conditions in quota
   - fix a potential deadlock between writeback and block plug routines
   - fix a deadlock btween freezefs and evict_inode

  We've added some boundary checks to avoid kernel panics on corrupted
  images, and several minor code clean-ups"

* tag 'f2fs-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (27 commits)
  f2fs: fix to do sanity check on .cp_pack_total_block_count
  f2fs: make gc_urgent and gc_segment_mode sysfs node readable
  f2fs: use aggressive GC policy during f2fs_disable_checkpoint()
  f2fs: fix compressed file start atomic write may cause data corruption
  f2fs: initialize sbi->gc_mode explicitly
  f2fs: introduce gc_urgent_mid mode
  f2fs: compress: fix to print raw data size in error path of lz4 decompression
  f2fs: remove redundant parameter judgment
  f2fs: use spin_lock to avoid hang
  f2fs: don't get FREEZE lock in f2fs_evict_inode in frozen fs
  f2fs: remove unnecessary read for F2FS_FITS_IN_INODE
  f2fs: introduce F2FS_UNFAIR_RWSEM to support unfair rwsem
  f2fs: avoid an infinite loop in f2fs_sync_dirty_inodes
  f2fs: fix to do sanity check on curseg->alloc_type
  f2fs: fix to avoid potential deadlock
  f2fs: quota: fix loop condition at f2fs_quota_sync()
  f2fs: Restore rwsem lockdep support
  f2fs: fix missing free nid in f2fs_handle_failed_inode
  f2fs: support idmapped mounts
  f2fs: add a way to limit roll forward recovery time
  ...
2022-03-22 10:00:31 -07:00
Linus Torvalds
aab4ed5816 Changes since last update:
- Avoid using page structure directly for all uncompressed paths;
 
  - Fix a double-free issue when sysfs initialization fails;
 
  - Complete DAX description for erofs;
 
  - Use mtime instead since there's no (easy) way for users to control
    ctime;
 
  - Several code cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iJIEABYIADoWIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCYjfdHxwcaHNpYW5na2Fv
 QGxpbnV4LmFsaWJhYmEuY29tAAoJEDk3MdwfteYEvbYBANAWd+wSFLS3XDEzM3Nw
 VzdMW7lfnrqog8HgqbcRHm9OAP9zx3KZaCh/frUA1OCKk/H0KD6UFShu6fXgBri+
 DJMnCw==
 =a5Ns
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs updates from Gao Xiang:
 "In this cycle, we continue converting to use meta buffers for all
  remaining uncompressed paths to prepare for the upcoming subpage,
  folio and fscache features.

  We also fixed a double-free issue when sysfs initialization fails,
  which was reported by syzbot.

  Besides, in order for the userspace to control per-file timestamp
  easier, we now switch to record mtime instead of ctime with a
  compatible feature marked. And there are also some code cleanups and
  documentation update as usual.

  Summary:

   - Avoid using page structure directly for all uncompressed paths

   - Fix a double-free issue when sysfs initialization fails

   - Complete DAX description for erofs

   - Use mtime instead since there's no (easy) way for users to control
     ctime

   - Several code cleanups"

* tag 'erofs-for-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: rename ctime to mtime
  erofs: use meta buffers for inode lookup
  erofs: use meta buffers for reading directories
  fs: erofs: add sanity check for kobject in erofs_unregister_sysfs
  erofs: refine managed inode stuffs
  erofs: clean up z_erofs_extent_lookback
  erofs: silence warnings related to impossible m_plen
  Documentation/filesystem/dax: update DAX description on erofs
  erofs: clean up preload_compressed_pages()
  erofs: get rid of `struct z_erofs_collector'
  erofs: use meta buffers for erofs_read_superblock()
2022-03-22 09:54:56 -07:00
Linus Torvalds
881b568756 fscrypt updates for 5.18
Add support for direct I/O on encrypted files when blk-crypto (inline
 encryption) is being used for file contents encryption.
 
 There will be a merge conflict with the block pull request in
 fs/iomap/direct-io.c, due to some bio interface cleanups.  The merge
 resolution is straightforward and can be found in linux-next.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCYjgCfhQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK6vjAQDp8D8OKIyj67KjYwvyHpNy0fZhxeur
 RexC0nDfd9BE/AD/fV6zpCglmsuGxqxL0jmqeczKXn2y7nRFmPciCBTi/wY=
 =kwNd
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt

Pull fscrypt updates from Eric Biggers:
 "Add support for direct I/O on encrypted files when blk-crypto (inline
  encryption) is being used for file contents encryption"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
  fscrypt: update documentation for direct I/O support
  f2fs: support direct I/O with fscrypt using blk-crypto
  ext4: support direct I/O with fscrypt using blk-crypto
  iomap: support direct I/O with fscrypt using blk-crypto
  fscrypt: add functions for direct I/O support
2022-03-22 09:50:16 -07:00
Haimin Zhang
a530462910 jfs: prevent NULL deref in diFree
Add validation check for JFS_IP(ipimap)->i_imap to prevent a NULL deref
in diFree since diFree uses it without do any validations.
When function jfs_mount calls diMount to initialize fileset inode
allocation map, it can fail and JFS_IP(ipimap)->i_imap won't be
initialized. Then it calls diFreeSpecial to close fileset inode allocation
map inode and it will flow into jfs_evict_inode. Function jfs_evict_inode
just validates JFS_SBI(inode->i_sb)->ipimap, then calls diFree. diFree use
JFS_IP(ipimap)->i_imap directly, then it will cause a NULL deref.

Reported-by: TCS Robot <tcs_robot@tencent.com>
Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2022-03-22 10:09:19 -05:00
Pavel Skripkin
2cc7cc01c1 jfs: fix divide error in dbNextAG
Syzbot reported divide error in dbNextAG(). The problem was in missing
validation check for malicious image.

Syzbot crafted an image with bmp->db_numag equal to 0. There wasn't any
validation checks, but dbNextAG() blindly use bmp->db_numag in divide
expression

Fix it by validating bmp->db_numag in dbMount() and return an error if
image is malicious

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-and-tested-by: syzbot+46f5c25af73eb8330eb6@syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2022-03-22 10:09:13 -05:00
Trond Myklebust
648a4548d6 NFS: Don't deadlock when cookie hashes collide
In the very rare case where the readdir reply contains multiple cookies
that map to the same hash value, we can end up deadlocking waiting for a
page lock that we already hold. In this case we should fail the page
lock by using grab_cache_page_nowait().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-22 09:14:39 -04:00