78804 Commits

Author SHA1 Message Date
Darrick J. Wong
c1ccf967bf xfs: fix uninitialized list head in struct xfs_refcount_recovery
We're supposed to initialize the list head of an object before adding it
to another list.  Fix that, and stop using the kmem_{alloc,free} calls
from the Irix days.

Fixes: 174edb0e46e5 ("xfs: store in-progress CoW allocations in the refcount btree")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:22 -07:00
Darrick J. Wong
f1fdc82078 xfs: fix agblocks check in the cow leftover recovery function
As we've seen, refcount records use the upper bit of the rc_startblock
field to ensure that all the refcount records are at the right side of
the refcount btree.  This works because an AG is never allowed to have
more than (1U << 31) blocks in it.  If we ever encounter a filesystem
claiming to have that many blocks, we absolutely do not want reflink
touching it at all.

However, this test at the start of xfs_refcount_recover_cow_leftovers is
slightly incorrect -- it /should/ be checking that agblocks isn't larger
than the XFS_MAX_CRC_AG_BLOCKS constant, and it should check that the
constant is never large enough to conflict with that CoW flag.

Note that the V5 superblock verifier has not historically rejected
filesystems where agblocks >= XFS_MAX_CRC_AG_BLOCKS, which is why this
ended up in the COW recovery routine.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:21 -07:00
Darrick J. Wong
f62ac3e0ac xfs: check record domain when accessing refcount records
Now that we've separated the startblock and CoW/shared extent domain in
the incore refcount record structure, check the domain whenever we
retrieve a record to ensure that it's still in the domain that we want.
Depending on the circumstances, a change in domain either means we're
done processing or that we've found a corruption and need to fail out.

The refcount check in xchk_xref_is_cow_staging is redundant since
_get_rec has done that for a long time now, so we can get rid of it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:21 -07:00
Darrick J. Wong
68d0f38917 xfs: remove XFS_FIND_RCEXT_SHARED and _COW
Now that we have an explicit enum for shared and CoW staging extents, we
can get rid of the old FIND_RCEXT flags.  Omit a couple of conversions
that disappear in the next patches.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:21 -07:00
Darrick J. Wong
f492135df0 xfs: refactor domain and refcount checking
Create a helper function to ensure that CoW staging extent records have
a single refcount and that shared extent records have more than 1
refcount.  We'll put this to more use in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:21 -07:00
Darrick J. Wong
571423a162 xfs: report refcount domain in tracepoints
Now that we've broken out the startblock and shared/cow domain in the
incore refcount extent record structure, update the tracepoints to
report the domain.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:21 -07:00
Darrick J. Wong
9a50ee4f8d xfs: track cow/shared record domains explicitly in xfs_refcount_irec
Just prior to committing the reflink code into upstream, the xfs
maintainer at the time requested that I find a way to shard the refcount
records into two domains -- one for records tracking shared extents, and
a second for tracking CoW staging extents.  The idea here was to
minimize mount time CoW reclamation by pushing all the CoW records to
the right edge of the keyspace, and it was accomplished by setting the
upper bit in rc_startblock.  We don't allow AGs to have more than 2^31
blocks, so the bit was free.

Unfortunately, this was a very late addition to the codebase, so most of
the refcount record processing code still treats rc_startblock as a u32
and pays no attention to whether or not the upper bit (the cow flag) is
set.  This is a weakness is theoretically exploitable, since we're not
fully validating the incoming metadata records.

Fuzzing demonstrates practical exploits of this weakness.  If the cow
flag of a node block key record is corrupted, a lookup operation can go
to the wrong record block and start returning records from the wrong
cow/shared domain.  This causes the math to go all wrong (since cow
domain is still implicit in the upper bit of rc_startblock) and we can
crash the kernel by tricking xfs into jumping into a nonexistent AG and
tripping over xfs_perag_get(mp, <nonexistent AG>) returning NULL.

To fix this, start tracking the domain as an explicit part of struct
xfs_refcount_irec, adjust all refcount functions to check the domain
of a returned record, and alter the function definitions to accept them
where necessary.

Found by fuzzing keys[2].cowflag = add in xfs/464.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:21 -07:00
Darrick J. Wong
5a8c345ca8 xfs: refactor refcount record usage in xchk_refcountbt_rec
Consolidate the open-coded xfs_refcount_irec fields into an actual
struct and use the existing _btrec_to_irec to decode the ondisk record.
This will reduce code churn in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:21 -07:00
Darrick J. Wong
9e7e2436c1 xfs: move _irec structs to xfs_types.h
Structure definitions for incore objects do not belong in the ondisk
format header.  Move them to the incore types header where they belong.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:20 -07:00
Darrick J. Wong
8edbe0cf8b xfs: check deferred refcount op continuation parameters
If we're in the middle of a deferred refcount operation and decide to
roll the transaction to avoid overflowing the transaction space, we need
to check the new agbno/aglen parameters that we're about to record in
the new intent.  Specifically, we need to check that the new extent is
completely within the filesystem, and that continuation does not put us
into a different AG.

If the keys of a node block are wrong, the lookup to resume an
xfs_refcount_adjust_extents operation can put us into the wrong record
block.  If this happens, we might not find that we run out of aglen at
an exact record boundary, which will cause the loop control to do the
wrong thing.

The previous patch should take care of that problem, but let's add this
extra sanity check to stop corruption problems sooner than later.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:20 -07:00
Darrick J. Wong
b65e08f83b xfs: create a predicate to verify per-AG extents
Create a predicate function to verify that a given agbno/blockcount pair
fit entirely within a single allocation group and don't suffer
mathematical overflows.  Refactor the existng open-coded logic; we're
going to add more calls to this function in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:20 -07:00
Darrick J. Wong
f850995f60 xfs: make sure aglen never goes negative in xfs_refcount_adjust_extents
Prior to calling xfs_refcount_adjust_extents, we trimmed agbno/aglen
such that the end of the range would not be in the middle of a refcount
record.  If this is no longer the case, something is seriously wrong
with the btree.  Bail out with a corruption error.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:20 -07:00
Darrick J. Wong
950f0d50ee xfs: dump corrupt recovered log intent items to dmesg consistently
If log recovery decides that an intent item is corrupt and wants to
abort the mount, capture a hexdump of the corrupt log item in the kernel
log for further analysis.  Some of the log item code already did this,
so we're fixing the rest to do it consistently.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:20 -07:00
Darrick J. Wong
921ed96b4f xfs: actually abort log recovery on corrupt intent-done log items
If log recovery picks up intent-done log items that are not of the
correct size it needs to abort recovery and fail the mount.  Debug
assertions are not good enough.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:20 -07:00
Darrick J. Wong
3c5aaaced9 xfs: refactor all the EFI/EFD log item sizeof logic
Refactor all the open-coded sizeof logic for EFI/EFD log item and log
format structures into common helper functions whose names reflect the
struct names.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:20 -07:00
Darrick J. Wong
03a7485cd7 xfs: fix memcpy fortify errors in EFI log format copying
Starting in 6.1, CONFIG_FORTIFY_SOURCE checks the length parameter of
memcpy.  Since we're already fixing problems with BUI item copying, we
should fix it everything else.

An extra difficulty here is that the ef[id]_extents arrays are declared
as single-element arrays.  This is not the convention for flex arrays in
the modern kernel, and it causes all manner of problems with static
checking tools, since they often cannot tell the difference between a
single element array and a flex array.

So for starters, change those array[1] declarations to array[]
declarations to signal that they are proper flex arrays and adjust all
the "size-1" expressions to fit the new declaration style.

Next, refactor the xfs_efi_copy_format function to handle the copying of
the head and the flex array members separately.  While we're at it, fix
a minor validation deficiency in the recovery function.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:20 -07:00
Darrick J. Wong
b45ca961e9 xfs: fix memcpy fortify errors in RUI log format copying
Starting in 6.1, CONFIG_FORTIFY_SOURCE checks the length parameter of
memcpy.  Since we're already fixing problems with BUI item copying, we
should fix it everything else.

Refactor the xfs_rui_copy_format function to handle the copying of the
head and the flex array members separately.  While we're at it, fix a
minor validation deficiency in the recovery function.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:19 -07:00
Darrick J. Wong
a38935c03c xfs: fix memcpy fortify errors in CUI log format copying
Starting in 6.1, CONFIG_FORTIFY_SOURCE checks the length parameter of
memcpy.  Since we're already fixing problems with BUI item copying, we
should fix it everything else.

Refactor the xfs_cui_copy_format function to handle the copying of the
head and the flex array members separately.  While we're at it, fix a
minor validation deficiency in the recovery function.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:19 -07:00
Darrick J. Wong
a38ebce1da xfs: fix memcpy fortify errors in BUI log format copying
Starting in 6.1, CONFIG_FORTIFY_SOURCE checks the length parameter of
memcpy.  Unfortunately, it doesn't handle flex arrays correctly:

------------[ cut here ]------------
memcpy: detected field-spanning write (size 48) of single field "dst_bui_fmt" at fs/xfs/xfs_bmap_item.c:628 (size 16)

Fix this by refactoring the xfs_bui_copy_format function to handle the
copying of the head and the flex array members separately.  While we're
at it, fix a minor validation deficiency in the recovery function.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:19 -07:00
Darrick J. Wong
59da7ff49d xfs: fix validation in attr log item recovery
Before we start fixing all the complaints about memcpy'ing log items
around, let's fix some inadequate validation in the xattr log item
recovery code and get rid of the (now trivial) copy_format function.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:19 -07:00
Filipe Manana
8184620ae2 btrfs: fix lost file sync on direct IO write with nowait and dsync iocb
When doing a direct IO write using a iocb with nowait and dsync set, we
end up not syncing the file once the write completes.

This is because we tell iomap to not call generic_write_sync(), which
would result in calling btrfs_sync_file(), in order to avoid a deadlock
since iomap can call it while we are holding the inode's lock and
btrfs_sync_file() needs to acquire the inode's lock. The deadlock happens
only if the write happens synchronously, when iomap_dio_rw() calls
iomap_dio_complete() before it returns. Instead we do the sync ourselves
at btrfs_do_write_iter().

For a nowait write however we can end up not doing the sync ourselves at
at btrfs_do_write_iter() because the write could have been queued, and
therefore we get -EIOCBQUEUED returned from iomap in such case. That makes
us skip the sync call at btrfs_do_write_iter(), as we don't do it for
any error returned from btrfs_direct_write(). We can't simply do the call
even if -EIOCBQUEUED is returned, since that would block the task waiting
for IO, both for the data since there are bios still in progress as well
as potentially blocking when joining a log transaction and when syncing
the log (writing log trees, super blocks, etc).

So let iomap do the sync call itself and in order to avoid deadlocks for
the case of synchronous writes (without nowait), use __iomap_dio_rw() and
have ourselves call iomap_dio_complete() after unlocking the inode.

A test case will later be sent for fstests, after this is fixed in Linus'
tree.

Fixes: 51bd9563b678 ("btrfs: fix deadlock due to page faults during direct IO reads and writes")
Reported-by: Марк Коренберг <socketpair@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAEmTpZGRKbzc16fWPvxbr6AfFsQoLmz-Lcg-7OgJOZDboJ+SGQ@mail.gmail.com/
CC: stable@vger.kernel.org # 6.0+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-31 16:52:56 +01:00
Darrick J. Wong
47ba8cc7b4 xfs: fix incorrect return type for fsdax fault handlers
The kernel robot complained about this:

>> fs/xfs/xfs_file.c:1266:31: sparse: sparse: incorrect type in return expression (different base types) @@     expected int @@     got restricted vm_fault_t @@
   fs/xfs/xfs_file.c:1266:31: sparse:     expected int
   fs/xfs/xfs_file.c:1266:31: sparse:     got restricted vm_fault_t
   fs/xfs/xfs_file.c:1314:21: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted vm_fault_t [usertype] ret @@     got int @@
   fs/xfs/xfs_file.c:1314:21: sparse:     expected restricted vm_fault_t [usertype] ret
   fs/xfs/xfs_file.c:1314:21: sparse:     got int

Fix the incorrect return type for these two functions.

While we're at it, make the !fsdax version return VM_FAULT_SIGBUS
because a zero return value will cause some callers to try to lock
vmf->page, which we never set here.

Fixes: ea6c49b784f0 ("xfs: support CoW in fsdax mode")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:51:45 -07:00
Christophe JAILLET
063b1f21cc btrfs: fix a memory allocation failure test in btrfs_submit_direct
After allocation 'dip' is tested instead of 'dip->csums'.  Fix it.

Fixes: 642c5d34da53 ("btrfs: allocate the btrfs_dio_private as part of the iomap dio bio")
CC: stable@vger.kernel.org # 5.19+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-31 16:50:15 +01:00
Linus Torvalds
28b7bd4ad2 3 cifs/smb3 fixes (also for stable)
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmNd32gACgkQiiy9cAdy
 T1E/FQwAqaALaUBM+PHcqHnOKhflYf7+DjkClKcbKSuQ6kN9R5ajHk6pi7EEcexx
 eDNvNszxNm+eEGmSRMoH10cQ3U0C68Dtip1vEyBmywCRBk7QWKAvGDF2JzzS92Cv
 rz4IrZfBGDS5kxeKHAHMaGBy4xnU+4yeVFkcgESltv+g3+C2wLmwL72oeI/ttkIg
 +Tmr2EQLKG/FIxobLZePc90fWUg6vvUM3u0HwK0bzW2ZtkrxTa8/RU2ziNCbOOQN
 VVUEq9FlEVf+71TLa+N4fJStBWQWqldX197Fk15C4on7zcT05wVITXro2CYAPvhV
 ZwROwggSu0jCPiohVkrg4lQjVAFXejE/GNv8c1casliWbxaixpfLC2czONU0PItj
 lTpntaX+fxUIblhMgCsJNxpgYpgeyTS1XyC9kqmt2tsOAgkrgUx7wKtreI+M18yC
 GXPbXkczAEq73pkfkvBRd1UcWuoUPi2ex6UG6oQQO6DnFQwIBYNlMGGIHwEG3VoI
 kvdtwhi8
 =/RPn
 -----END PGP SIGNATURE-----

Merge tag '6.1-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:

 - use after free fix for reconnect race

 - two memory leak fixes

* tag '6.1-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix use-after-free caused by invalid pointer `hostname`
  cifs: Fix pages leak when writedata alloc failed in cifs_write_from_iter()
  cifs: Fix pages array leak when writedata alloc failed in cifs_writedata_alloc()
2022-10-30 09:40:04 -07:00
Linus Torvalds
3c339dbd13 23 hotfixes.
Eight fix pre-6.0 bugs and the remainder address issues which were
 introduced in the 6.1-rc merge cycle, or address issues which aren't
 considered sufficiently serious to warrant a -stable backport.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY1w/LAAKCRDdBJ7gKXxA
 jovHAQDqY3TGAVQsvCBKdUqkp5nakZ7o7kK+mUGvsZ8Cgp5fwQD/Upsu93RZsTgm
 oJfYW4W6eSVEKPu7oAY20xVwLvK6iQ0=
 =z0Fn
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2022-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc hotfixes from Andrew Morton:
 "Eight fix pre-6.0 bugs and the remainder address issues which were
  introduced in the 6.1-rc merge cycle, or address issues which aren't
  considered sufficiently serious to warrant a -stable backport"

* tag 'mm-hotfixes-stable-2022-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits)
  mm: multi-gen LRU: move lru_gen_add_mm() out of IRQ-off region
  lib: maple_tree: remove unneeded initialization in mtree_range_walk()
  mmap: fix remap_file_pages() regression
  mm/shmem: ensure proper fallback if page faults
  mm/userfaultfd: replace kmap/kmap_atomic() with kmap_local_page()
  x86: fortify: kmsan: fix KMSAN fortify builds
  x86: asm: make sure __put_user_size() evaluates pointer once
  Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by default
  x86/purgatory: disable KMSAN instrumentation
  mm: kmsan: export kmsan_copy_page_meta()
  mm: migrate: fix return value if all subpages of THPs are migrated successfully
  mm/uffd: fix vma check on userfault for wp
  mm: prep_compound_tail() clear page->private
  mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs
  mm/page_isolation: fix clang deadcode warning
  fs/ext4/super.c: remove unused `deprecated_msg'
  ipc/msg.c: fix percpu_counter use after free
  memory tier, sysfs: rename attribute "nodes" to "nodelist"
  MAINTAINERS: git://github.com -> https://github.com for nilfs2
  mm/kmemleak: prevent soft lockup in kmemleak_scan()'s object iteration loops
  ...
2022-10-29 17:49:33 -07:00
Sebastian Andrzej Siewior
dda1c41a07 mm: multi-gen LRU: move lru_gen_add_mm() out of IRQ-off region
lru_gen_add_mm() has been added within an IRQ-off region in the commit
mentioned below.  The other invocations of lru_gen_add_mm() are not within
an IRQ-off region.

The invocation within IRQ-off region is problematic on PREEMPT_RT because
the function is using a spin_lock_t which must not be used within
IRQ-disabled regions.

The other invocations of lru_gen_add_mm() occur while
task_struct::alloc_lock is acquired.  Move lru_gen_add_mm() after
interrupts are enabled and before task_unlock().

Link: https://lkml.kernel.org/r/20221026134830.711887-1-bigeasy@linutronix.de
Fixes: bd74fdaea1460 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-28 13:37:23 -07:00
Andrew Morton
bb2282cf01 fs/ext4/super.c: remove unused `deprecated_msg'
fs/ext4/super.c:1744:19: warning: 'deprecated_msg' defined but not used [-Wunused-const-variable=]

Reported-by: kernel test robot <lkp@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-28 13:37:22 -07:00
Phillip Lougher
e11c4e088b squashfs: fix buffer release race condition in readahead code
Fix a buffer release race condition, where the error value was used after
release.

Link: https://lkml.kernel.org/r/20221020223616.7571-4-phillip@squashfs.org.uk
Fixes: b09a7a036d20 ("squashfs: support reading fragments in readahead call")
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reported-by: Marc Miltenberger <marcmiltenberger@gmail.com>
Cc: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Cc: Slade Watkins <srw@sladewatkins.net>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-28 13:37:21 -07:00
Phillip Lougher
c9199de82b squashfs: fix extending readahead beyond end of file
The readahead code will try to extend readahead to the entire size of the
Squashfs data block.

But, it didn't take into account that the last block at the end of the
file may not be a whole block.  In this case, the code would extend
readahead to beyond the end of the file, leaving trailing pages.

Fix this by only requesting the expected number of pages.

Link: https://lkml.kernel.org/r/20221020223616.7571-3-phillip@squashfs.org.uk
Fixes: 8fc78b6fe24c ("squashfs: implement readahead")
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reported-by: Marc Miltenberger <marcmiltenberger@gmail.com>
Cc: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Cc: Slade Watkins <srw@sladewatkins.net>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: <stable@vger.kernel.org>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-28 13:37:21 -07:00
Phillip Lougher
9ef8eb6104 squashfs: fix read regression introduced in readahead code
Patch series "squashfs: fix some regressions introduced in the readahead
code".

This patchset fixes 3 regressions introduced by the recent readahead code
changes.  The first regression is causing "snaps" to randomly fail after a
couple of hours or days, which how the regression came to light.


This patch (of 3):

If a file isn't a whole multiple of the page size, the last page will have
trailing bytes unfilled.

There was a mistake in the readahead code which did this.  In particular
it incorrectly assumed that the last page in the readahead page array
(page[nr_pages - 1]) will always contain the last page in the block, which
if we're at file end, will be the page that needs to be zero filled.

But the readahead code may not return the last page in the block, which
means it is unmapped and will be skipped by the decompressors (a temporary
buffer used).

In this case the zero filling code will zero out the wrong page, leading
to data corruption.

Fix this by by extending the "page actor" to return the last page if
present, or NULL if a temporary buffer was used.

Link: https://lkml.kernel.org/r/20221020223616.7571-1-phillip@squashfs.org.uk
Link: https://lkml.kernel.org/r/20221020223616.7571-2-phillip@squashfs.org.uk
Fixes: 8fc78b6fe24c ("squashfs: implement readahead")
Link: https://lore.kernel.org/lkml/b0c258c3-6dcf-aade-efc4-d62a8b3a1ce2@alu.unizg.hr/
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Tested-by: Slade Watkins <srw@sladewatkins.net>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reported-by: Marc Miltenberger <marcmiltenberger@gmail.com>
Cc: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
Cc: Hsin-Yi Wang <hsinyi@chromium.org>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-10-28 13:37:21 -07:00
Miklos Szeredi
4a6f278d48 fuse: add file_modified() to fallocate
Add missing file_modified() call to fuse_file_fallocate().  Without this
fallocate on fuse failed to clear privileges.

Fixes: 05ba1f082300 ("fuse: add FALLOCATE operation")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2022-10-28 14:25:20 +02:00
Zeng Heng
153695d36e cifs: fix use-after-free caused by invalid pointer hostname
`hostname` needs to be set as null-pointer after free in
`cifs_put_tcp_session` function, or when `cifsd` thread attempts
to resolve hostname and reconnect the host, the thread would deref
the invalid pointer.

Here is one of practical backtrace examples as reference:

Task 477
---------------------------
 do_mount
  path_mount
   do_new_mount
    vfs_get_tree
     smb3_get_tree
      smb3_get_tree_common
       cifs_smb3_do_mount
        cifs_mount
         mount_put_conns
          cifs_put_tcp_session
          --> kfree(server->hostname)

cifsd
---------------------------
 kthread
  cifs_demultiplex_thread
   cifs_reconnect
    reconn_set_ipaddr_from_hostname
    --> if (!server->hostname)
    --> if (server->hostname[0] == '\0')  // !! UAF fault here

CIFS: VFS: cifs_mount failed w/return code = -112
mount error(112): Host is down
BUG: KASAN: use-after-free in reconn_set_ipaddr_from_hostname+0x2ba/0x310
Read of size 1 at addr ffff888108f35380 by task cifsd/480
CPU: 2 PID: 480 Comm: cifsd Not tainted 6.1.0-rc2-00106-gf705792f89dd-dirty #25
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x68/0x85
 print_report+0x16c/0x4a3
 kasan_report+0x95/0x190
 reconn_set_ipaddr_from_hostname+0x2ba/0x310
 __cifs_reconnect.part.0+0x241/0x800
 cifs_reconnect+0x65f/0xb60
 cifs_demultiplex_thread+0x1570/0x2570
 kthread+0x2c5/0x380
 ret_from_fork+0x22/0x30
 </TASK>
Allocated by task 477:
 kasan_save_stack+0x1e/0x40
 kasan_set_track+0x21/0x30
 __kasan_kmalloc+0x7e/0x90
 __kmalloc_node_track_caller+0x52/0x1b0
 kstrdup+0x3b/0x70
 cifs_get_tcp_session+0xbc/0x19b0
 mount_get_conns+0xa9/0x10c0
 cifs_mount+0xdf/0x1970
 cifs_smb3_do_mount+0x295/0x1660
 smb3_get_tree+0x352/0x5e0
 vfs_get_tree+0x8e/0x2e0
 path_mount+0xf8c/0x1990
 do_mount+0xee/0x110
 __x64_sys_mount+0x14b/0x1f0
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 477:
 kasan_save_stack+0x1e/0x40
 kasan_set_track+0x21/0x30
 kasan_save_free_info+0x2a/0x50
 __kasan_slab_free+0x10a/0x190
 __kmem_cache_free+0xca/0x3f0
 cifs_put_tcp_session+0x30c/0x450
 cifs_mount+0xf95/0x1970
 cifs_smb3_do_mount+0x295/0x1660
 smb3_get_tree+0x352/0x5e0
 vfs_get_tree+0x8e/0x2e0
 path_mount+0xf8c/0x1990
 do_mount+0xee/0x110
 __x64_sys_mount+0x14b/0x1f0
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
The buggy address belongs to the object at ffff888108f35380
 which belongs to the cache kmalloc-16 of size 16
The buggy address is located 0 bytes inside of
 16-byte region [ffff888108f35380, ffff888108f35390)
The buggy address belongs to the physical page:
page:00000000333f8e58 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888108f350e0 pfn:0x108f35
flags: 0x200000000000200(slab|node=0|zone=2)
raw: 0200000000000200 0000000000000000 dead000000000122 ffff8881000423c0
raw: ffff888108f350e0 000000008080007a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
 ffff888108f35280: fa fb fc fc fa fb fc fc fa fb fc fc fa fb fc fc
 ffff888108f35300: fa fb fc fc fa fb fc fc fa fb fc fc fa fb fc fc
>ffff888108f35380: fa fb fc fc fa fb fc fc fa fb fc fc fa fb fc fc
                   ^
 ffff888108f35400: fa fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888108f35480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

Fixes: 7be3248f3139 ("cifs: To match file servers, make sure the server hostname matches")
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-27 23:59:13 -05:00
Theodore Ts'o
9a8c5b0d06 ext4: update the backup superblock's at the end of the online resize
When expanding a file system using online resize, various fields in
the superblock (e.g., s_blocks_count, s_inodes_count, etc.) change.
To update the backup superblocks, the online resize uses the function
update_backups() in fs/ext4/resize.c.  This function was not updating
the checksum field in the backup superblocks.  This wasn't a big deal
previously, because e2fsck didn't care about the checksum field in the
backup superblock.  (And indeed, update_backups() goes all the way
back to the ext3 days, well before we had support for metadata
checksums.)

However, there is an alternate, more general way of updating
superblock fields, ext4_update_primary_sb() in fs/ext4/ioctl.c.  This
function does check the checksum of the backup superblock, and if it
doesn't match will mark the file system as corrupted.  That was
clearly not the intent, so avoid to aborting the resize when a bad
superblock is found.

In addition, teach update_backups() to properly update the checksum in
the backup superblocks.  We will eventually want to unify
updapte_backups() with the infrasture in ext4_update_primary_sb(), but
that's for another day.

Note: The problem has been around for a while; it just didn't really
matter until ext4_update_primary_sb() was added by commit bbc605cdb1e1
("ext4: implement support for get/set fs label").  And it became
trivially easy to reproduce after commit 827891a38acc ("ext4: update
the s_overhead_clusters in the backup sb's when resizing") in v6.0.

Cc: stable@kernel.org # 5.17+
Fixes: bbc605cdb1e1 ("ext4: implement support for get/set fs label")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-10-27 23:21:40 -04:00
Linus Torvalds
7dd257d02e execve fixes for v6.1-rc3
- Fix an ancient signal action copy race. (Bernd Edlinger)
 
 - Fix a memory leak in ELF loader, when under memory pressure. (Li Zetao)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmNa1xEWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJoLqD/927ZXWxVLQ0GygmNz3xSEZh+5c
 34flrZv4LUDQPw1rNXycWx2D5MQv5MehrpsMvF+11pu/M1EP3e3+R3bngFeFXtBo
 12ov3yEloe6yA8bOPPWEDB1fU8K7C9aODKMcJOoWFCk20g7uQGYS8+GCUGhLxjHs
 mZn5U8OuEGGvn4QuGknIps+Ddca2SHuJ7jBtsw8NVjuvtWcAhlw9PYNbLTJEgBzU
 0zsfK68idMpQHDPvWMmoRcwAXn3kiVzc3wKeR9Zdx9q2NyDIS+OxgynEAc3fM2rf
 ag19+Epn6GUGPMakS/zJNQS0wCA4+pJi60Z+Hlddy0WNUocg55uHd0zY7xcT3s75
 rsPtbTeabOrtzQMf7lSpsn5OUeCDJjc3KcZIlmILaZaVXUZv+jvysRwH7CRdDNNS
 gM2j9nu87I8TbSPXbY79KutvucfKAl88iWxRgFqnzyqzRYLWahwWSKsiVubH7OoU
 kUYdDdPmiZh7XAqTFUsMF4++wyx/PAwU7RdYuxaUvHZd6PT8J92AqIisPwRT9ojL
 oqLpgRoeYX3JY7aDyvBjYan2IKfIPhB0WZF9vCeHVoTXoEy/LVZeWVNoBXyO6ILl
 BYzBAjp5oJRLbJYVtjI4/gkDizdtpAu8YYRYX36TUvBAkFqpGYn9dvySpMGl24uJ
 g3IEqTj/kajeZleHnQ==
 =dHXB
 -----END PGP SIGNATURE-----

Merge tag 'execve-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull execve fixes from Kees Cook:

 - Fix an ancient signal action copy race (Bernd Edlinger)

 - Fix a memory leak in ELF loader, when under memory pressure (Li
   Zetao)

* tag 'execve-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  fs/binfmt_elf: Fix memory leak in load_elf_binary()
  exec: Copy oldsighand->action under spin-lock
2022-10-27 13:16:36 -07:00
Zhang Xiaoxu
7e8436728e nfs4: Fix kmemleak when allocate slot failed
If one of the slot allocate failed, should cleanup all the other
allocated slots, otherwise, the allocated slots will leak:

  unreferenced object 0xffff8881115aa100 (size 64):
    comm ""mount.nfs"", pid 679, jiffies 4294744957 (age 115.037s)
    hex dump (first 32 bytes):
      00 cc 19 73 81 88 ff ff 00 a0 5a 11 81 88 ff ff  ...s......Z.....
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<000000007a4c434a>] nfs4_find_or_create_slot+0x8e/0x130
      [<000000005472a39c>] nfs4_realloc_slot_table+0x23f/0x270
      [<00000000cd8ca0eb>] nfs40_init_client+0x4a/0x90
      [<00000000128486db>] nfs4_init_client+0xce/0x270
      [<000000008d2cacad>] nfs4_set_client+0x1a2/0x2b0
      [<000000000e593b52>] nfs4_create_server+0x300/0x5f0
      [<00000000e4425dd2>] nfs4_try_get_tree+0x65/0x110
      [<00000000d3a6176f>] vfs_get_tree+0x41/0xf0
      [<0000000016b5ad4c>] path_mount+0x9b3/0xdd0
      [<00000000494cae71>] __x64_sys_mount+0x190/0x1d0
      [<000000005d56bdec>] do_syscall_64+0x35/0x80
      [<00000000687c9ae4>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

Fixes: abf79bb341bf ("NFS: Add a slot table to struct nfs_client for NFSv4.0 transport blocking")
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-27 15:52:11 -04:00
Benjamin Coddington
038efb6348 NFSv4.2: Fixup CLONE dest file size for zero-length count
When holding a delegation, the NFS client optimizes away setting the
attributes of a file from the GETATTR in the compound after CLONE, and for
a zero-length CLONE we will end up setting the inode's size to zero in
nfs42_copy_dest_done().  Handle this case by computing the resulting count
from the server's reported size after CLONE's GETATTR.

Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: 94d202d5ca39 ("NFSv42: Copy offload should update the file size when appropriate")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-27 15:52:10 -04:00
Benjamin Coddington
f5ea16137a NFSv4: Retry LOCK on OLD_STATEID during delegation return
There's a small window where a LOCK sent during a delegation return can
race with another OPEN on client, but the open stateid has not yet been
updated.  In this case, the client doesn't handle the OLD_STATEID error
from the server and will lose this lock, emitting:
"NFS: nfs4_handle_delegation_recall_error: unhandled error -10024".

Fix this by sending the task through the nfs4 error handling in
nfs4_lock_done() when we may have to reconcile our stateid with what the
server believes it to be.  For this case, the result is a retry of the
LOCK operation with the updated stateid.

Reported-by: Gonzalo Siero Humet <gsierohu@redhat.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-27 15:52:10 -04:00
Trond Myklebust
e59679f2b7 NFSv4.1: We must always send RECLAIM_COMPLETE after a reboot
Currently, we are only guaranteed to send RECLAIM_COMPLETE if we have
open state to recover. Fix the client to always send RECLAIM_COMPLETE
after setting up the lease.

Fixes: fce5c838e133 ("nfs41: RECLAIM_COMPLETE functionality")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-27 15:52:10 -04:00
Trond Myklebust
5d917cba32 NFSv4.1: Handle RECLAIM_COMPLETE trunking errors
If RECLAIM_COMPLETE sets the NFS4CLNT_BIND_CONN_TO_SESSION flag, then we
need to loop back in order to handle it.

Fixes: 0048fdd06614 ("NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-27 15:52:10 -04:00
Trond Myklebust
1ba04394e0 NFSv4: Fix a potential state reclaim deadlock
If the server reboots while we are engaged in a delegation return, and
there is a pNFS layout with return-on-close set, then the current code
can end up deadlocking in pnfs_roc() when nfs_inode_set_delegation()
tries to return the old delegation.
Now that delegreturn actually uses its own copy of the stateid, it
should be safe to just always update the delegation stateid in place.

Fixes: 078000d02d57 ("pNFS: We want return-on-close to complete when evicting the inode")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-27 15:52:10 -04:00
Kees Cook
cf0d7e7f45 NFS: Avoid memcpy() run-time warning for struct sockaddr overflows
The 'nfs_server' and 'mount_server' structures include a union of
'struct sockaddr' (with the older 16 bytes max address size) and
'struct sockaddr_storage' which is large enough to hold all the
supported sa_family types (128 bytes max size). The runtime memcpy()
buffer overflow checker is seeing attempts to write beyond the 16
bytes as an overflow, but the actual expected size is that of 'struct
sockaddr_storage'. Plumb the use of 'struct sockaddr_storage' more
completely through-out NFS, which results in adjusting the memcpy()
buffers to the correct union members. Avoids this false positive run-time
warning under CONFIG_FORTIFY_SOURCE:

  memcpy: detected field-spanning write (size 28) of single field "&ctx->nfs_server.address" at fs/nfs/namespace.c:178 (size 16)

Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/all/202210110948.26b43120-yujie.liu@intel.com
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-27 15:52:10 -04:00
Yushan Zhou
121affdf8a nfs: Remove redundant null checks before kfree
Fix the following coccicheck warning:
fs/nfs/dir.c:2494:2-7: WARNING:
NULL check before some freeing functions is not needed.

Signed-off-by: Yushan Zhou <katrinzhou@tencent.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-27 15:52:10 -04:00
Linus Torvalds
200204f56f fscrypt fix for 6.1-rc3
Fix a memory leak that was introduced by a change that went into -rc1.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCY1oM6BQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK3ixAP9IY1TdJu64uKTofFdYvO/wBASpdszm
 GkY1QnEFxATA9AEAwRswZgaGiuKj4hFBeIWmu9+luT4T7kVIcaumslTyTg8=
 =YinC
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt

Pull fscrypt fix from Eric Biggers:
 "Fix a memory leak that was introduced by a change that went into -rc1"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
  fscrypt: fix keyring memory leak on mount failure
2022-10-27 11:44:18 -07:00
Allison Henderson
e07ee6fe21 xfs: increase rename inode reservation
xfs_rename can update up to 5 inodes: src_dp, target_dp, src_ip, target_ip
and wip.  So we need to increase the inode reservation to match.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-10-26 13:02:24 -07:00
Li Zetao
594d2a14f2 fs/binfmt_elf: Fix memory leak in load_elf_binary()
There is a memory leak reported by kmemleak:

  unreferenced object 0xffff88817104ef80 (size 224):
    comm "xfs_admin", pid 47165, jiffies 4298708825 (age 1333.476s)
    hex dump (first 32 bytes):
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      60 a8 b3 00 81 88 ff ff a8 10 5a 00 81 88 ff ff  `.........Z.....
    backtrace:
      [<ffffffff819171e1>] __alloc_file+0x21/0x250
      [<ffffffff81918061>] alloc_empty_file+0x41/0xf0
      [<ffffffff81948cda>] path_openat+0xea/0x3d30
      [<ffffffff8194ec89>] do_filp_open+0x1b9/0x290
      [<ffffffff8192660e>] do_open_execat+0xce/0x5b0
      [<ffffffff81926b17>] open_exec+0x27/0x50
      [<ffffffff81a69250>] load_elf_binary+0x510/0x3ed0
      [<ffffffff81927759>] bprm_execve+0x599/0x1240
      [<ffffffff8192a997>] do_execveat_common.isra.0+0x4c7/0x680
      [<ffffffff8192b078>] __x64_sys_execve+0x88/0xb0
      [<ffffffff83bbf0a5>] do_syscall_64+0x35/0x80

If "interp_elf_ex" fails to allocate memory in load_elf_binary(),
the program will take the "out_free_ph" error handing path,
resulting in "interpreter" file resource is not released.

Fix it by adding an error handing path "out_free_file", which will
release the file resource when "interp_elf_ex" failed to allocate
memory.

Fixes: 0693ffebcfe5 ("fs/binfmt_elf.c: allocate less for static executable")
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221024154421.982230-1-lizetao1@huawei.com
2022-10-25 15:11:21 -07:00
Bernd Edlinger
5bf2fedca8 exec: Copy oldsighand->action under spin-lock
unshare_sighand should only access oldsighand->action
while holding oldsighand->siglock, to make sure that
newsighand->action is in a consistent state.

Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Cc: stable@vger.kernel.org
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/AM8PR10MB470871DEBD1DED081F9CC391E4389@AM8PR10MB4708.EURPRD10.PROD.OUTLOOK.COM
2022-10-25 15:05:58 -07:00
Qu Wenruo
76a66ba101 btrfs: don't use btrfs_chunk::sub_stripes from disk
[BUG]
There are two reports (the earliest one from LKP, a more recent one from
kernel bugzilla) that we can have some chunks with 0 as sub_stripes.

This will cause divide-by-zero errors at btrfs_rmap_block, which is
introduced by a recent kernel patch ac0677348f3c ("btrfs: merge
calculations for simple striped profiles in btrfs_rmap_block"):

		if (map->type & (BTRFS_BLOCK_GROUP_RAID0 |
				 BTRFS_BLOCK_GROUP_RAID10)) {
			stripe_nr = stripe_nr * map->num_stripes + i;
			stripe_nr = div_u64(stripe_nr, map->sub_stripes); <<<
		}

[CAUSE]
From the more recent report, it has been proven that we have some chunks
with 0 as sub_stripes, mostly caused by older mkfs.

It turns out that the mkfs.btrfs fix is only introduced in 6718ab4d33aa
("btrfs-progs: Initialize sub_stripes to 1 in btrfs_alloc_data_chunk")
which is included in v5.4 btrfs-progs release.

So there would be quite some old filesystems with such 0 sub_stripes.

[FIX]
Just don't trust the sub_stripes values from disk.

We have a trusted btrfs_raid_array[] to fetch the correct sub_stripes
numbers for each profile and that are fixed.

By this, we can keep the compatibility with older filesystems while
still avoid divide-by-zero bugs.

Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: Viktor Kuzmin <kvaster@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216559
Fixes: ac0677348f3c ("btrfs: merge calculations for simple striped profiles in btrfs_rmap_block")
CC: stable@vger.kernel.org # 6.0
Reviewed-by: Su Yue <glass@fydeos.io>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-25 10:17:33 +02:00
David Sterba
2398091f9c btrfs: fix type of parameter generation in btrfs_get_dentry
The type of parameter generation has been u32 since the beginning,
however all callers pass a u64 generation, so unify the types to prevent
potential loss.

CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-24 15:28:58 +02:00
BingJing Chang
9b8be45f1e btrfs: send: fix send failure of a subcase of orphan inodes
Commit 9ed0a72e5b35 ("btrfs: send: fix failures when processing inodes with
no links") tries to fix all incremental send cases of orphan inodes the
send operation will meet. However, there's still a bug causing the corner
subcase fails with a ENOENT error.

Here's shortened steps of that subcase:

  $ btrfs subvolume create vol
  $ touch vol/foo

  $ btrfs subvolume snapshot -r vol snap1
  $ btrfs subvolume snapshot -r vol snap2

  # Turn the second snapshot to RW mode and delete the file while
  # holding an open file descriptor on it
  $ btrfs property set snap2 ro false
  $ exec 73<snap2/foo
  $ rm snap2/foo

  # Set the second snapshot back to RO mode and do an incremental send
  # with an unusal reverse order
  $ btrfs property set snap2 ro true
  $ btrfs send -p snap2 snap1 > /dev/null
  At subvol snap1
  ERROR: send ioctl failed with -2: No such file or directory

It's subcase 3 of BTRFS_COMPARE_TREE_CHANGED in the commit 9ed0a72e5b35
("btrfs: send: fix failures when processing inodes with no links"). And
it's not a common case. We still have not met it in the real world.
Theoretically, this case can happen in a batch cascading snapshot backup.
In cascading backups, the receive operation in the middle may cause orphan
inodes to appear because of the open file descriptors on the snapshot files
during receiving. And if we don't do the batch snapshot backups in their
creation order, then we can have an inode, which is an orphan in the parent
snapshot but refers to a file in the send snapshot. Since an orphan inode
has no paths, the send operation will fail with a ENOENT error if it
tries to generate a path for it.

In that patch, this subcase will be treated as an inode with a new
generation. However, when the routine tries to delete the old paths in
the parent snapshot, the function process_all_refs() doesn't check whether
there are paths recorded or not before it calls the function
process_recorded_refs(). And the function process_recorded_refs() try
to get the first path in the parent snapshot in the beginning. Since it has
no paths in the parent snapshot, the send operation fails.

To fix this, we can easily put a link count check to avoid entering the
deletion routine like what we do a link count check to avoid creating a
new one. Moreover, we can assume that the function process_all_refs()
can always collect references to process because we know it has a
positive link count.

Fixes: 9ed0a72e5b35 ("btrfs: send: fix failures when processing inodes with no links")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: BingJing Chang <bingjingc@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-24 15:28:52 +02:00
Qu Wenruo
3d17adea74 btrfs: make thaw time super block check to also verify checksum
Previous commit a05d3c915314 ("btrfs: check superblock to ensure the fs
was not modified at thaw time") only checks the content of the super
block, but it doesn't really check if the on-disk super block has a
matching checksum.

This patch will add the checksum verification to thaw time superblock
verification.

This involves the following extra changes:

- Export btrfs_check_super_csum()
  As we need to call it in super.c.

- Change the argument list of btrfs_check_super_csum()
  Instead of passing a char *, directly pass struct btrfs_super_block *
  pointer.

- Verify that our checksum type didn't change before checking the
  checksum value, like it's done at mount time

Fixes: a05d3c915314 ("btrfs: check superblock to ensure the fs was not modified at thaw time")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-24 15:28:29 +02:00