Commit Graph

1137311 Commits

Author SHA1 Message Date
Kirill A. Shutemov
373e715e31 x86/tdx: Panic on bad configs that #VE on "private" memory access
All normal kernel memory is "TDX private memory".  This includes
everything from kernel stacks to kernel text.  Handling
exceptions on arbitrary accesses to kernel memory is essentially
impossible because they can happen in horribly nasty places like
kernel entry/exit.  But, TDX hardware can theoretically _deliver_
a virtualization exception (#VE) on any access to private memory.

But, it's not as bad as it sounds.  TDX can be configured to never
deliver these exceptions on private memory with a "TD attribute"
called ATTR_SEPT_VE_DISABLE.  The guest has no way to *set* this
attribute, but it can check it.

Ensure ATTR_SEPT_VE_DISABLE is set in early boot.  panic() if it
is unset.  There is no sane way for Linux to run with this
attribute clear so a panic() is appropriate.

There's small window during boot before the check where kernel
has an early #VE handler. But the handler is only for port I/O
and will also panic() as soon as it sees any other #VE, such as
a one generated by a private memory access.

[ dhansen: Rewrite changelog and rebase on new tdx_parse_tdinfo().
	   Add Kirill's tested-by because I made changes since
	   he wrote this. ]

Fixes: 9a22bf6deb ("x86/traps: Add #VE support for TDX guest")
Reported-by: ruogui.ygr@alibaba-inc.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20221028141220.29217-3-kirill.shutemov%40linux.intel.com
2022-11-01 16:02:40 -07:00
Vishal Verma
71ee71d7ad cxl/region: Fix decoder allocation crash
When an intermediate port's decoders have been exhausted by existing
regions, and creating a new region with the port in question in it's
hierarchical path is attempted, cxl_port_attach_region() fails to find a
port decoder (as would be expected), and drops into the failure / cleanup
path.

However, during cleanup of the region reference, a sanity check attempts
to dereference the decoder, which in the above case didn't exist. This
causes a NULL pointer dereference BUG.

To fix this, refactor the decoder allocation and de-allocation into
helper routines, and in this 'free' routine, check that the decoder,
@cxld, is valid before attempting any operations on it.

Cc: <stable@vger.kernel.org>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Fixes: 384e624bb2 ("cxl/region: Attach endpoint decoders")
Link: https://lore.kernel.org/r/20221101074100.1732003-1-vishal.l.verma@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-01 15:33:07 -07:00
Linus Torvalds
8f71a2b3f4 Four small fixes for the docs tree.
-----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmNhl1kPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5Y05IIAJComMJ+V1zgOvl2NQmT89GZqoySbqyu3yI2
 yNvfpVMdTWjaC5qi6+G7HZrqmZI9EPBO9C1kR677onk4hHV5lVhGePNn7ybu/f76
 47vNkAo9azyOiP8H9AYByg6s+pTAE4A3Ko+ldr5ZszXdpl1SN7/FjE+QP2DAQm7Z
 f/cdaGTOW9XTZzNZq/mwP1a90T2Hh8eHF5oWsGiXTe0eF1WiM5gJsWKTuYohsOHj
 H3Pq/S4YiFuQ9RlB4sbjei0QCfUdrZvi5ddt06nhh3Kkrft/4papHFyN55IV1hh4
 xQHotvn6Kg0ACv3lNeyXxJP3sUzYmUd0tOMCfhXs/u3dNkB+u2s=
 =Kbhs
 -----END PGP SIGNATURE-----

Merge tag 'docs-6.1-fixes' of git://git.lwn.net/linux

Pull documentation fixes from Jonathan Corbet:
 "Four small fixes for the docs tree"

* tag 'docs-6.1-fixes' of git://git.lwn.net/linux:
  docs/process/howto: Replace C89 with C11
  Documentation: Fix spelling mistake in hacking.rst
  Documentation: process: replace outdated LTS table w/ link
  tracing/histogram: Update document for KEYS_MAX size
2022-11-01 15:11:42 -07:00
Linus Torvalds
6eafb4a13d Fixes:
- Fix a loop that occurs when using multiple net namespaces
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmNhj5EACgkQM2qzM29m
 f5fqNxAAwlrhR83lzzcM4xt8woUnhnlyUjbVF38lVFLV7SJQC0Q2y4BktORxK1se
 GPKkWF5vn188xCwvhGZwFYdR2dL3z3GmUkOX9MOWWJwjAVkcACj5lVZcOzdSq2Ny
 iXKTym/6zTqp2rc0rjXQnaXLwUUHo3uNZe6qtMpY8tezwYkN9EG3ZWNcSgtFGdwA
 4accYxYu4p3J0BGig4rq0R3tjFf3Ya2u9igCdvBrObzBRNyYpoVlyYpXRoK0f1mp
 PhWk+9qtEBD5qqddj5ZgQtkZt8GSIHxJvlyyYnvv/YvSqZ26e3zjkS9tDVLPdTss
 6RiaKz8iKYEOHAtABfqikJMoPGU51fg5auY4gmm4DgeYO9HTQmQXvpHBZEuejTKt
 Gv4CVOV7ziQtSl5EwOLO5d1CiHWA9u57PYrzQeHf7+Y1kCHmB9dy35LztG+3LaNJ
 r357EyGaGhXD4tpad4xZAl9soo2DUy2BWIr1CvbwvLaveV3oAu/svPUAvCWXRPH9
 /PDfVmAOo1t4yYvMIsx/gJn//Wv0qBtnLsCaby34el4NF5eSTRaYTT+LTUNPLd/j
 oVwf0FPEyp7lHXNH+rrjCn91YrjY+1qnVLkrf1TbpC9XcemONe3lNnl/X5IjSNqS
 BiJXS1Xe1qLeiU+vRsxH8gN/+vr4PDkebm/M371rs7ymL5pfrEM=
 =Mss6
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fix from Chuck Lever:

 - Fix a loop that occurs when using multiple net namespaces

* tag 'nfsd-6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: fix net-namespace logic in __nfsd_file_cache_purge
2022-11-01 15:05:03 -07:00
Jeff Layton
d3aefd2b29 nfsd: fix net-namespace logic in __nfsd_file_cache_purge
If the namespace doesn't match the one in "net", then we'll continue,
but that doesn't cause another rhashtable_walk_next call, so it will
loop infinitely.

Fixes: ce502f81ba ("NFSD: Convert the filecache to use rhashtable")
Reported-by: Petr Vorel <pvorel@suse.cz>
Link: https://lore.kernel.org/ltp/Y1%2FP8gDAcWC%2F+VR3@pevik/
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-11-01 17:27:27 -04:00
Linus Torvalds
54917c90c2 Urgent nolibc pull request for v6.1
This pull request contains a couple of commits that fix string-function
 bugs introduced by:
 
 96980b833a ("tools/nolibc/string: do not use __builtin_strlen() at -O0")
 66b6f755ad ("rcutorture: Import a copy of nolibc")
 
 These appeared in v5.19 and v5.0, respectively, but it would be good
 to get these fixes in sooner rather than later.  Commits providing the
 corresponding tests are in -rcu and I expect to submit them into the
 upcoming v6.2 merge window.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmNcVM8THHBhdWxtY2tA
 a2VybmVsLm9yZwAKCRCevxLzctn7jB6ED/43qPeIBl2n4w0MvIfFGW80/laSWZ0o
 J1r6z0EqTwyHJOx8Yv2tPtpqiJllQXTIH1t4d18wEn/4Bf7rykOCFpNHqvRrui4i
 kp7V5QQUsAOj5Erk7tenbuIA/x2oHkOBmPQwFTF8T+tBuH9cMKKg2bb4VYW40+Dc
 31b3PbBoUDSg1USPfL357itbR+hp8oeWQZdrkYGI3pbF95FEz/49d5UOUZcLya4y
 TWXAFJFSVb5VLqju3Pg9aa8f2HK1JVAduEN/YRYM+h+9jURo6GV5gtUA1HNDHACF
 ZF5Q9cm+OwRLHvfMdX8nJl8wBAbumNkPe0S1ABOM0L38vcj2I+p3CIKF1cZK8Z0X
 QN9eU6EUanlkNcUIsAtT2If1Xs9On6C800Tl3HEGZmHQJmSIy2g++5LY88YfMFO9
 Q+2vGIofixXqJEkwg/iFBOF2R57g3iUJNot3uq6Y5z0kWeZMtN5iXiydPd22/eIm
 puz1NEmTzWsvKN4bI64NmKmSxLbSJWc1Uaydo+XJPgbr1fblTYUKtculPRgbf3N2
 4K0Z5k98Az/cbHj614eZ9sgomFbe43nJ/JLlVFsXVjx9H6PnpvrnIrcxRcWWt4+2
 6nJwB+GMp/YpSbq74gqvt2LVG+NoYNQZKrQKNePj0vxyNwPx4YpMCrXDaodW3dLT
 O/G2Xl0HOBqNPA==
 =EN2i
 -----END PGP SIGNATURE-----

Merge tag 'nolibc-urgent.2022.10.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull nolibc fixes from Paul McKenney:
 "This contains a couple of fixes for string-function bugs"

* tag 'nolibc-urgent.2022.10.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  tools/nolibc/string: Fix memcmp() implementation
  tools/nolibc: Fix missing strlen() definition and infinite loop with gcc-12
2022-11-01 13:15:14 -07:00
Mark Brown
be0ddf5293 arm64: booting: Document our requirements for fine grained traps with SME
With SME we require that fine grained traps on access to TPIDR2_EL0 and
SMPRI_EL1 are disabled but did not document that fact. Add the relevant
register bits.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221101112716.52035-2-broonie@kernel.org
2022-11-01 19:30:34 +00:00
Linus Torvalds
f526d6a822 x86:
- fix lock initialization race in gfn-to-pfn cache (+selftests)
 
 - fix two refcounting errors
 
 - emulator fixes
 
 - mask off reserved bits in CPUID
 
 - fix bug with disabling SGX
 
 RISC-V:
 
 - update MAINTAINERS
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmNcYawUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPMzwf/Uh1lO2Op6IJ3/no2gPfShc9bdqgM
 LiCkzfJp9PQAuDl/hs44CiQHUlPEfeIsI/ns0euNj37TlnB3zKmm46mtiWhEefIH
 rwcm/ngKgw3283pZEf8FeMTDfNexOaBg2ZNoODR7JQsU50tbToY4TNE2nNRgbdL5
 SNmzOwox1rZIQHxEa2r/k2B/HdRbeCFUU82EjwFqaNzH1yhzBXMcokdSCmGCBMsE
 3xfCzQ7uMkXw/rlkkG0be65+5dTNmhfiKQYGAQe4s7PycVPMD79D2EhCfbpvbK7t
 EmgOXStmvtW6+ukqPATHbRVCDwW0VmiQv5IWOGbLB1Qdy5/REynJ5ObC8g==
 =Hvro
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "x86:

   - fix lock initialization race in gfn-to-pfn cache (+selftests)

   - fix two refcounting errors

   - emulator fixes

   - mask off reserved bits in CPUID

   - fix bug with disabling SGX

  RISC-V:

   - update MAINTAINERS"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/xen: Fix eventfd error handling in kvm_xen_eventfd_assign()
  KVM: x86: smm: number of GPRs in the SMRAM image depends on the image format
  KVM: x86: emulator: update the emulation mode after CR0 write
  KVM: x86: emulator: update the emulation mode after rsm
  KVM: x86: emulator: introduce emulator_recalc_and_set_mode
  KVM: x86: emulator: em_sysexit should update ctxt->mode
  KVM: selftests: Mark "guest_saw_irq" as volatile in xen_shinfo_test
  KVM: selftests: Add tests in xen_shinfo_test to detect lock races
  KVM: Reject attempts to consume or refresh inactive gfn_to_pfn_cache
  KVM: Initialize gfn_to_pfn_cache locks in dedicated helper
  KVM: VMX: fully disable SGX if SECONDARY_EXEC_ENCLS_EXITING unavailable
  KVM: x86: Exempt pending triple fault from event injection sanity check
  MAINTAINERS: git://github -> https://github.com for kvm-riscv
  KVM: debugfs: Return retval of simple_attr_open() if it fails
  KVM: x86: Reduce refcount if single_open() fails in kvm_mmu_rmaps_stat_open()
  KVM: x86: Mask off reserved bits in CPUID.8000001FH
  KVM: x86: Mask off reserved bits in CPUID.8000001AH
  KVM: x86: Mask off reserved bits in CPUID.80000008H
  KVM: x86: Mask off reserved bits in CPUID.80000006H
  KVM: x86: Mask off reserved bits in CPUID.80000001H
2022-11-01 12:28:52 -07:00
Linus Torvalds
d79dcde0bc linux-watchdog 6.1-rc4 tag
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iEYEABECAAYFAmNg1YcACgkQ+iyteGJfRsoBNQCgyixgNdLK31WFtoanRlo++KuO
 4csAnjnJ0BdMwJj13TBfv3B107RYpxrd
 =L7M+
 -----END PGP SIGNATURE-----

Merge tag 'linux-watchdog-6.1-rc4' of git://www.linux-watchdog.org/linux-watchdog

Pull watchdog fixes from Wim Van Sebroeck:

 - fix use after free in exar driver

 - spelling fix in comment

* tag 'linux-watchdog-6.1-rc4' of git://www.linux-watchdog.org/linux-watchdog:
  drivers: watchdog: exar_wdt.c fix use after free
  watchdog: sp805_wdt: fix spelling typo in comment
2022-11-01 12:21:53 -07:00
Mark Rutland
024f4b2e1f arm64: entry: avoid kprobe recursion
The cortex_a76_erratum_1463225_debug_handler() function is called when
handling debug exceptions (and synchronous exceptions from BRK
instructions), and so is called when a probed function executes. If the
compiler does not inline cortex_a76_erratum_1463225_debug_handler(), it
can be probed.

If cortex_a76_erratum_1463225_debug_handler() is probed, any debug
exception or software breakpoint exception will result in recursive
exceptions leading to a stack overflow. This can be triggered with the
ftrace multiple_probes selftest, and as per the example splat below.

This is a regression caused by commit:

  6459b84697 ("arm64: entry: consolidate Cortex-A76 erratum 1463225 workaround")

... which removed the NOKPROBE_SYMBOL() annotation associated with the
function.

My intent was that cortex_a76_erratum_1463225_debug_handler() would be
inlined into its caller, el1_dbg(), which is marked noinstr and cannot
be probed. Mark cortex_a76_erratum_1463225_debug_handler() as
__always_inline to ensure this.

Example splat prior to this patch (with recursive entries elided):

| # echo p cortex_a76_erratum_1463225_debug_handler > /sys/kernel/debug/tracing/kprobe_events
| # echo p do_el0_svc >> /sys/kernel/debug/tracing/kprobe_events
| # echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
| Insufficient stack space to handle exception!
| ESR: 0x0000000096000047 -- DABT (current EL)
| FAR: 0xffff800009cefff0
| Task stack:     [0xffff800009cf0000..0xffff800009cf4000]
| IRQ stack:      [0xffff800008000000..0xffff800008004000]
| Overflow stack: [0xffff00007fbc00f0..0xffff00007fbc10f0]
| CPU: 0 PID: 145 Comm: sh Not tainted 6.0.0 #2
| Hardware name: linux,dummy-virt (DT)
| pstate: 604003c5 (nZCv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : arm64_enter_el1_dbg+0x4/0x20
| lr : el1_dbg+0x24/0x5c
| sp : ffff800009cf0000
| x29: ffff800009cf0000 x28: ffff000002c74740 x27: 0000000000000000
| x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
| x23: 00000000604003c5 x22: ffff80000801745c x21: 0000aaaac95ac068
| x20: 00000000f2000004 x19: ffff800009cf0040 x18: 0000000000000000
| x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
| x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
| x11: 0000000000000010 x10: ffff800008c87190 x9 : ffff800008ca00d0
| x8 : 000000000000003c x7 : 0000000000000000 x6 : 0000000000000000
| x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000000043a4
| x2 : 00000000f2000004 x1 : 00000000f2000004 x0 : ffff800009cf0040
| Kernel panic - not syncing: kernel stack overflow
| CPU: 0 PID: 145 Comm: sh Not tainted 6.0.0 #2
| Hardware name: linux,dummy-virt (DT)
| Call trace:
|  dump_backtrace+0xe4/0x104
|  show_stack+0x18/0x4c
|  dump_stack_lvl+0x64/0x7c
|  dump_stack+0x18/0x38
|  panic+0x14c/0x338
|  test_taint+0x0/0x2c
|  panic_bad_stack+0x104/0x118
|  handle_bad_stack+0x34/0x48
|  __bad_stack+0x78/0x7c
|  arm64_enter_el1_dbg+0x4/0x20
|  el1h_64_sync_handler+0x40/0x98
|  el1h_64_sync+0x64/0x68
|  cortex_a76_erratum_1463225_debug_handler+0x0/0x34
...
|  el1h_64_sync_handler+0x40/0x98
|  el1h_64_sync+0x64/0x68
|  cortex_a76_erratum_1463225_debug_handler+0x0/0x34
...
|  el1h_64_sync_handler+0x40/0x98
|  el1h_64_sync+0x64/0x68
|  cortex_a76_erratum_1463225_debug_handler+0x0/0x34
|  el1h_64_sync_handler+0x40/0x98
|  el1h_64_sync+0x64/0x68
|  do_el0_svc+0x0/0x28
|  el0t_64_sync_handler+0x84/0xf0
|  el0t_64_sync+0x18c/0x190
| Kernel Offset: disabled
| CPU features: 0x0080,00005021,19001080
| Memory Limit: none
| ---[ end Kernel panic - not syncing: kernel stack overflow ]---

With this patch, cortex_a76_erratum_1463225_debug_handler() is inlined
into el1_dbg(), and el1_dbg() cannot be probed:

| # echo p cortex_a76_erratum_1463225_debug_handler > /sys/kernel/debug/tracing/kprobe_events
| sh: write error: No such file or directory
| # grep -w cortex_a76_erratum_1463225_debug_handler /proc/kallsyms | wc -l
| 0
| # echo p el1_dbg > /sys/kernel/debug/tracing/kprobe_events
| sh: write error: Invalid argument
| # grep -w el1_dbg /proc/kallsyms | wc -l
| 1

Fixes: 6459b84697 ("arm64: entry: consolidate Cortex-A76 erratum 1463225 workaround")
Cc: <stable@vger.kernel.org> # 5.12.x
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221017090157.2881408-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-11-01 17:43:31 +00:00
Dave Hansen
a6dd6f3900 x86/tdx: Prepare for using "INFO" call for a second purpose
The TDG.VP.INFO TDCALL provides the guest with various details about
the TDX system that the guest needs to run.  Only one field is currently
used: 'gpa_width' which tells the guest which PTE bits mark pages shared
or private.

A second field is now needed: the guest "TD attributes" to tell if
virtualization exceptions are configured in a way that can harm the guest.

Make the naming and calling convention more generic and discrete from the
mask-centric one.

Thanks to Sathya for the inspiration here, but there's no code, comments
or changelogs left from where he started.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: stable@vger.kernel.org
2022-11-01 10:07:15 -07:00
Darrick J. Wong
4eb559dd15 xfs: improve runtime refcountbt corruption detection
Fuzz testing of the refcount btree demonstrated a weakness in validation
 of refcount btree records during normal runtime.  The idea of using the
 upper bit of the rc_startblock field to separate the refcount records
 into one group for shared space and another for CoW staging extents was
 added at the last minute.  The incore struct left this bit encoded in
 the upper bit of the startblock field, which makes it all too easy for
 arithmetic operations to overflow if we don't detect the cowflag
 properly.
 
 When I ran a norepair fuzz tester, I was able to crash the kernel on one
 of these accidental overflows by fuzzing a key record in a node block,
 which broke lookups.  To fix the problem, make the domain (shared/cow) a
 separate field in the incore record.
 
 Unfortunately, a customer also hit this once in production.  Due to bugs
 in the kernel running on the VM host, writes to the disk image would
 occasionally be lost.  Given sufficient memory pressure on the VM guest,
 a refcountbt xfs_buf could be reclaimed and later reloaded from the
 stale copy on the virtual disk.  The stale disk contents were a refcount
 btree leaf block full of records for the wrong domain, and this caused
 an infinite loop in the guest VM.
 
 v2: actually include the refcount adjust loop invariant checking patch;
     move the deferred refcount continuation checks earlier in the series;
     break up the megapatch into smaller pieces; fix an uninitialized list
     error.
 v3: in the continuation check patch, verify the per-ag extent before
     converting it to a fsblock
 
 Signed-off-by: Darrick J. Wong <djwong@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmNf8LoACgkQ+H93GTRK
 tOu3Aw//Q14hrj8im673xcFvRdviPIsaGZWMrNsDG6gcnnWxOyDWcx3TkzzWQsNT
 82C7urLUIEScYdjNC0G03FVLIROPVGRgaHSXHK1ldQLCjN/63Gzx2Mo5zd6eiroZ
 0RJZtVTr19zQtIwvIDxXKGO/9dJpdDDTJxgtlW8BV2Xr2uBEdPQd4dcGLBiJM/cH
 A/oGOeZHgXlBRFDu9HYzpZRPoOmz0KPxmobSEQF3f131J0jXWFyx5z8qnCGI0BEh
 sNutyY+J77pIkbd3T4tIBZABvehUOcHQbsgET4yAKM6mw80XOiGfAyLL/4bMuBTb
 HBTJwJ8EIm3ibpzutTRUf/z8sIlr4SQJ4+fnEyY40LvcNaF1THF1DxqNJYSSpwSb
 MNXn6X2f5QRCCNHErAE3VpQUNT8mSRTmKKjJu6aYm9m4JpNrHZ6GedF390D3Uuwl
 fNuOGN02fOhtL94XrvCdmDO+lchQ1PBN9/K7ii2b/mmja45CzxzM4y06ZT75ishX
 Gq8ljh50GIjRnf1GAOoOcaPTFd+G+/UAW3cbr/+XA+0Fd7UNkDOY/JQmYR282rtz
 vkebnvmpZqUlzpmbtp0G+izk0gY0GIe1widPEq67zBB1kCqTnWSoOU8vPul/Ng6n
 HZAY/+35tuiCQnyp1JHH3KXQ7YOGMTbx9AQmxeifucXoc1CTMMw=
 =PiZA
 -----END PGP SIGNATURE-----

Merge tag 'refcount-cow-domain-6.1_2022-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.1-fixesA

xfs: improve runtime refcountbt corruption detection

Fuzz testing of the refcount btree demonstrated a weakness in validation
of refcount btree records during normal runtime.  The idea of using the
upper bit of the rc_startblock field to separate the refcount records
into one group for shared space and another for CoW staging extents was
added at the last minute.  The incore struct left this bit encoded in
the upper bit of the startblock field, which makes it all too easy for
arithmetic operations to overflow if we don't detect the cowflag
properly.

When I ran a norepair fuzz tester, I was able to crash the kernel on one
of these accidental overflows by fuzzing a key record in a node block,
which broke lookups.  To fix the problem, make the domain (shared/cow) a
separate field in the incore record.

Unfortunately, a customer also hit this once in production.  Due to bugs
in the kernel running on the VM host, writes to the disk image would
occasionally be lost.  Given sufficient memory pressure on the VM guest,
a refcountbt xfs_buf could be reclaimed and later reloaded from the
stale copy on the virtual disk.  The stale disk contents were a refcount
btree leaf block full of records for the wrong domain, and this caused
an infinite loop in the guest VM.

v2: actually include the refcount adjust loop invariant checking patch;
    move the deferred refcount continuation checks earlier in the series;
    break up the megapatch into smaller pieces; fix an uninitialized list
    error.
v3: in the continuation check patch, verify the per-ag extent before
    converting it to a fsblock

Signed-off-by: Darrick J. Wong <djwong@kernel.org>

* tag 'refcount-cow-domain-6.1_2022-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: rename XFS_REFC_COW_START to _COWFLAG
  xfs: fix uninitialized list head in struct xfs_refcount_recovery
  xfs: fix agblocks check in the cow leftover recovery function
  xfs: check record domain when accessing refcount records
  xfs: remove XFS_FIND_RCEXT_SHARED and _COW
  xfs: refactor domain and refcount checking
  xfs: report refcount domain in tracepoints
  xfs: track cow/shared record domains explicitly in xfs_refcount_irec
  xfs: refactor refcount record usage in xchk_refcountbt_rec
  xfs: move _irec structs to xfs_types.h
  xfs: check deferred refcount op continuation parameters
  xfs: create a predicate to verify per-AG extents
  xfs: make sure aglen never goes negative in xfs_refcount_adjust_extents
2022-11-01 09:52:13 -07:00
Christophe JAILLET
6c412da54c sfc: Fix an error handling path in efx_pci_probe()
If an error occurs after the first kzalloc() the corresponding memory
allocation is never freed.

Add the missing kfree() in the error handling path, as already done in the
remove() function.

Fixes: 7e773594da ("sfc: Separate efx_nic memory from net_device memory")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/dc114193121c52c8fa3779e49bdd99d4b41344a9.1667077009.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-01 09:09:47 -07:00
Marc Zyngier
4151bb636a KVM: arm64: Fix SMPRI_EL1/TPIDR2_EL0 trapping on VHE
The trapping of SMPRI_EL1 and TPIDR2_EL0 currently only really
work on nVHE, as only this mode uses the fine-grained trapping
that controls these two registers.

Move the trapping enable/disable code into
__{de,}activate_traps_common(), allowing it to be called when it
actually matters on VHE, and remove the flipping of EL2 control
for TPIDR2_EL0, which only affects the host access of this
register.

Fixes: 861262ab86 ("KVM: arm64: Handle SME host state when running guests")
Reported-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/86bkpqer4z.wl-maz@kernel.org
2022-11-01 15:56:52 +00:00
Nathan Huckleberry
fc007fb815 drm/imx: imx-tve: Fix return type of imx_tve_connector_mode_valid
The mode_valid field in drm_connector_helper_funcs is expected to be of
type:
enum drm_mode_status (* mode_valid) (struct drm_connector *connector,
                                     struct drm_display_mode *mode);

The mismatched return type breaks forward edge kCFI since the underlying
function definition does not match the function hook definition.

The return type of imx_tve_connector_mode_valid should be changed from
int to enum drm_mode_status.

Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1703
Cc: llvm@lists.linux.dev
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20220913205544.155106-1-nhuck@google.com
2022-11-01 14:36:55 +01:00
Liu Ying
ff52fe006f drm/imx: Kconfig: Remove duplicated 'select DRM_KMS_HELPER' line
A duplicated line 'select DRM_KMS_HELPER' was introduced in Kconfig file
by commit 09717af7d1 ("drm: Remove CONFIG_DRM_KMS_CMA_HELPER option"),
so remove it.

Fixes: 09717af7d1 ("drm: Remove CONFIG_DRM_KMS_CMA_HELPER option")
Signed-off-by: Liu Ying <victor.liu@nxp.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20221009023527.3669647-1-victor.liu@nxp.com
2022-11-01 14:36:17 +01:00
Nam Cao
d6643d7207 i2c: i801: add lis3lv02d's I2C address for Vostro 5568
Dell Vostro 5568 laptop has lis3lv02d, but its i2c address is not known
to the kernel. Add this address.

Output of "cat /sys/devices/platform/lis3lv02d/position" on Dell Vostro
5568 laptop:
    - Horizontal: (-18,0,1044)
    - Front elevated: (522,-18,1080)
    - Left elevated: (-18,-360,1080)
    - Upside down: (36,108,-1134)

Signed-off-by: Nam Cao <namcaov@gmail.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Reviewed-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2022-11-01 13:46:30 +01:00
Thierry Reding
cdbf26251d i2c: tegra: Allocate DMA memory for DMA engine
When the I2C controllers are running in DMA mode, it is the DMA engine
that performs the memory accesses rather than the I2C controller. Pass
the DMA engine's struct device pointer to the DMA API to make sure the
correct DMA operations are used.

This fixes an issue where the DMA engine's SMMU stream ID needs to be
misleadingly set for the I2C controllers in device tree.

Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2022-11-01 13:36:58 +01:00
Chen Zhongjin
569bea74c9 i2c: piix4: Fix adapter not be removed in piix4_remove()
In piix4_probe(), the piix4 adapter will be registered in:

   piix4_probe()
     piix4_add_adapters_sb800() / piix4_add_adapter()
       i2c_add_adapter()

Based on the probed device type, piix4_add_adapters_sb800() or single
piix4_add_adapter() will be called.
For the former case, piix4_adapter_count is set as the number of adapters,
while for antoher case it is not set and kept default *zero*.

When piix4 is removed, piix4_remove() removes the adapters added in
piix4_probe(), basing on the piix4_adapter_count value.
Because the count is zero for the single adapter case, the adapter won't
be removed and makes the sources allocated for adapter leaked, such as
the i2c client and device.

These sources can still be accessed by i2c or bus and cause problems.
An easily reproduced case is that if a new adapter is registered, i2c
will get the leaked adapter and try to call smbus_algorithm, which was
already freed:

Triggered by: rmmod i2c_piix4 && modprobe max31730

 BUG: unable to handle page fault for address: ffffffffc053d860
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 Oops: 0000 [#1] PREEMPT SMP KASAN
 CPU: 0 PID: 3752 Comm: modprobe Tainted: G
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
 RIP: 0010:i2c_default_probe (drivers/i2c/i2c-core-base.c:2259) i2c_core
 RSP: 0018:ffff888107477710 EFLAGS: 00000246
 ...
 <TASK>
  i2c_detect (drivers/i2c/i2c-core-base.c:2302) i2c_core
  __process_new_driver (drivers/i2c/i2c-core-base.c:1336) i2c_core
  bus_for_each_dev (drivers/base/bus.c:301)
  i2c_for_each_dev (drivers/i2c/i2c-core-base.c:1823) i2c_core
  i2c_register_driver (drivers/i2c/i2c-core-base.c:1861) i2c_core
  do_one_initcall (init/main.c:1296)
  do_init_module (kernel/module/main.c:2455)
  ...
 </TASK>
 ---[ end trace 0000000000000000 ]---

Fix this problem by correctly set piix4_adapter_count as 1 for the
single adapter so it can be normally removed.

Fixes: 528d53a159 ("i2c: piix4: Fix probing of reserved ports on AMD Family 16h Model 30h")
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2022-11-01 13:09:33 +01:00
Cristian Marussi
c4a7b9b587 arm64: dts: juno: Add thermal critical trip points
When thermnal zones are defined, trip points definitions are mandatory.
Define a couple of critical trip points for monitoring of existing
PMIC and SOC thermal zones.

This was lost between txt to yaml conversion and was re-enforced recently
via the commit 8c59632423 ("dt-bindings: thermal: Fix missing required property")

Cc: Rob Herring <robh+dt@kernel.org>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org>
Cc: devicetree@vger.kernel.org
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Fixes: f7b636a8d8 ("arm64: dts: juno: add thermal zones for scpi sensors")
Link: https://lore.kernel.org/r/20221028140833.280091-8-cristian.marussi@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-11-01 12:03:07 +00:00
Cristian Marussi
1eff6929af firmware: arm_scmi: Fix deferred_tx_wq release on error paths
Use devres to allocate the dedicated deferred_tx_wq polling workqueue so
as to automatically trigger the proper resource release on error path.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 5a3b7185c4 ("firmware: arm_scmi: Add atomic mode support to virtio transport")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20221028140833.280091-6-cristian.marussi@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-11-01 11:36:20 +00:00
Cristian Marussi
5ffc1c4cb8 firmware: arm_scmi: Fix devres allocation device in virtio transport
SCMI virtio transport device managed allocations must use the main
platform device in devres operations instead of the channel devices.

Cc: Peter Hilber <peter.hilber@opensynergy.com>
Fixes: 46abe13b5e ("firmware: arm_scmi: Add virtio transport")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20221028140833.280091-5-cristian.marussi@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-11-01 11:35:45 +00:00
Cristian Marussi
be9ba1f7f9 firmware: arm_scmi: Make Rx chan_setup fail on memory errors
SCMI Rx channels are optional and they can fail to be setup when not
present but anyway channels setup routines must bail-out on memory errors.

Make channels setup, and related probing, fail when memory errors are
reported on Rx channels.

Fixes: 5c8a47a5a9 ("firmware: arm_scmi: Make scmi core independent of the transport type")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20221028140833.280091-4-cristian.marussi@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-11-01 11:35:22 +00:00
Cristian Marussi
59172b212e firmware: arm_scmi: Make tx_prepare time out eventually
SCMI transports based on shared memory, at start of transmissions, have
to wait for the shared Tx channel area to be eventually freed by the
SCMI platform before accessing the channel. In fact the channel is owned
by the SCMI platform until marked as free by the platform itself and,
as such, cannot be used by the agent until relinquished.

As a consequence a badly misbehaving SCMI platform firmware could lock
the channel indefinitely and make the kernel side SCMI stack loop
forever waiting for such channel to be freed, possibly hanging the
whole boot sequence.

Add a timeout to the existent Tx waiting spin-loop so that, when the
system ends up in this situation, the SCMI stack can at least bail-out,
nosily warn the user, and abort the transmission.

Reported-by: YaxiongTian <iambestgod@outlook.com>
Suggested-by: YaxiongTian <iambestgod@outlook.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Etienne Carriere <etienne.carriere@linaro.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20221028140833.280091-3-cristian.marussi@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-11-01 11:33:24 +00:00
Cristian Marussi
fd96fbc8fa firmware: arm_scmi: Suppress the driver's bind attributes
Suppress the capability to unbind the core SCMI driver since all the
SCMI stack protocol drivers depend on it.

Fixes: aa4f886f38 ("firmware: arm_scmi: add basic driver infrastructure for SCMI")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20221028140833.280091-2-cristian.marussi@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-11-01 11:32:49 +00:00
Cristian Marussi
3f4071cbd2 firmware: arm_scmi: Cleanup the core driver removal callback
Platform drivers .remove callbacks are not supposed to fail and report
errors. Such errors are indeed ignored by the core platform drivers
and the driver unbind process is anyway completed.

The SCMI core platform driver as it is now, instead, bails out reporting
an error in case of an explicit unbind request.

Fix the removal path by adding proper device links between the core SCMI
device and the SCMI protocol devices so that a full SCMI stack unbind is
triggered when the core driver is removed. The remove process does not
bail out anymore on the anomalous conditions triggered by an explicit
unbind but the user is still warned.

Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Link: https://lore.kernel.org/r/20221028140833.280091-1-cristian.marussi@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-11-01 11:30:13 +00:00
Jay Fang
a0e215088e
MAINTAINERS: Update HiSilicon LPC BUS Driver maintainer
Add Jay Fang as the maintainer of the HiSilicon LPC BUS Driver, replacing
John Garry.

Signed-off-by: Jay Fang <f.fangjian@huawei.com>
Link: https://lore.kernel.org/r/20221028105434.1661264-1-f.fangjian@huawei.com'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-11-01 12:22:02 +01:00
Linus Walleij
cd73adcdba
ARM: dts: ux500: Add trips to battery thermal zones
Recent changes to the thermal framework has made the trip
points (trips) for thermal zones compulsory, which made
the Ux500 DTS files break validation and also stopped
probing because of similar changes to the code.

Fix this by adding an "outer bounding box": battery thermal
zones should not get warmer than 70 degress, then we will
shut down.

Fixes: 8c59632423 ("dt-bindings: thermal: Fix missing required property")
Fixes: 3fd6d6e2b4 ("thermal/of: Rework the thermal device tree initialization")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: linux-pm@vger.kernel.org
Link: https://lore.kernel.org/r/20221030210854.346662-1-linus.walleij@linaro.org'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-11-01 12:21:17 +01:00
Arnd Bergmann
c872f6ce22 i.MX fixes for 6.1:
- Fix imx93-pd driver to release resources when error occurs in probe.
 - A series from Ioana Ciornei to add missing clock frequencies for MDIO
   controllers on LayerScape SoCs, so that the kernel driver can work
   independently from bootloader.
 - A series from Li Jun to fix USB power domain setup in i.MX8MM/N device
   trees.
 - Fix CPLD_Dn pull configuration for MX8Menlo board to avoid interfering
   with CPLD power off functionality.
 - Fix ctrl_sleep_moci GPIO setup for verdin-imx8mp board.
 - Fix DT schema check warnings on uSDHC clocks for imx8-ss-conn device
   tree.
 - Fix up gpcv2 DT bindings to have an optional `power-domains` property.
 - A couple of i.MX93 device tree fixes on S4MU interrupt and gpio-ranges
   of GPIO controllers.
 - Keep PU regulator on for Quad and QuadPlus based imx6dl-yapp4 boards to
   work around a hardware design flaw in supply voltage distribution.
 - Fix user push-button GPIO offset on imx6qdl-gw59 boards.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCgAyFiEEFmJXigPl4LoGSz08UFdYWoewfM4FAmNgjiEUHHNoYXduZ3Vv
 QGtlcm5lbC5vcmcACgkQUFdYWoewfM4anAgAmZ5lSvXKhlaYijN7DSh9WlpwDcf1
 BRGAoT94d2C0rv1+lPKaEcSCJAo7/jqiMUgNorUVhlwR9frJIJK9HvNkfzJ66vQ9
 DRksUZjomlIuefaOWdja18ZZD6WuPoe95pjW2ssB/A2zAmKYxtSoqiHsPfnzRtp1
 yaBa1DyKBodVp+Eo0HT9Cyg1gBbPaHDex4D/v852UaBQ3J297phykeFrL/FjwsZT
 ojAlK30sEb5LOBvcBnQxus1f0EhaB+sjapAsNMX0vaEkx49zO+fOEDa2I/vBXbA8
 NF/jOf87pl9QeYkVDGKlP4EtTbXKObsVOfSFc05C1gT68FUGUbGF2pX/3g==
 =YELK
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmNhAQoACgkQmmx57+YA
 GNknHQ/9ENt7IOQmPPYPF+9jsEE6YtlzPSO40hdbgTcQM94qFN2mQUYRTf9kBzr+
 Gaeot/CE9cZEj5ZlN0qSUK/NWsIfHNiMmCMCTrEB9hqjJqyKfPrWjUD2CaAnORii
 za1Qma7IBrSc0wfiNNx0ZIiWeJluPYw5iw9BgF7HlHOYd81Rs6ZAxDjT0nwa81NN
 DjzpIswc7dFQSJU6mgJ20tEPvthZBRRlDnI4zdzr84hVovizCMF6UfM3K+XfM5Tb
 lJMuLC1w3uCVqS6v3+MpZJABg3qFZwB7TqRI+fS/W+R+w21EGkhzpo8OqqpSULYJ
 4HHrd4Gw1QItue8dVqbqNSTBJNo0lpUvcFKOoNPh0kUuUwRgoxOWKE+EowLpqaNO
 6Judyy4KWTTBv3YyBSF0AVEy/wuQDsv4wj//cttHOrWZKcEnF3t9EiKEBzu0f7b9
 u4dVVw6aOlF/bv7CoOXwhUK79Dm2XOV6NseXKGPGkCdWZNg+bwojKjDIy80sjL7i
 3YR0ggDCb5rN+eioKp04l//qaKbw6cr8FtW7AjfXASfqgHojcyh8UXu0PIuMKRNW
 jod9rP2HFKV8+gLVl/Ax4T9GWyKAbBtogzR+J+77MZXZ5+hbZa4e7CEtHy3feHzC
 d52+kjPqOLiW8CfhgNbTYeWkNhz+qSF1ea8I7AjpyUurWNtlVf4=
 =W2eO
 -----END PGP SIGNATURE-----

Merge tag 'imx-fixes-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes

i.MX fixes for 6.1:

- Fix imx93-pd driver to release resources when error occurs in probe.
- A series from Ioana Ciornei to add missing clock frequencies for MDIO
  controllers on LayerScape SoCs, so that the kernel driver can work
  independently from bootloader.
- A series from Li Jun to fix USB power domain setup in i.MX8MM/N device
  trees.
- Fix CPLD_Dn pull configuration for MX8Menlo board to avoid interfering
  with CPLD power off functionality.
- Fix ctrl_sleep_moci GPIO setup for verdin-imx8mp board.
- Fix DT schema check warnings on uSDHC clocks for imx8-ss-conn device
  tree.
- Fix up gpcv2 DT bindings to have an optional `power-domains` property.
- A couple of i.MX93 device tree fixes on S4MU interrupt and gpio-ranges
  of GPIO controllers.
- Keep PU regulator on for Quad and QuadPlus based imx6dl-yapp4 boards to
  work around a hardware design flaw in supply voltage distribution.
- Fix user push-button GPIO offset on imx6qdl-gw59 boards.

* tag 'imx-fixes-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  arm64: dts: ls208xa: specify clock frequencies for the MDIO controllers
  arm64: dts: ls1088a: specify clock frequencies for the MDIO controllers
  arm64: dts: lx2160a: specify clock frequencies for the MDIO controllers
  soc: imx: imx93-pd: Fix the error handling path of imx93_pd_probe()
  arm64: dts: imx93: correct gpio-ranges
  arm64: dts: imx93: correct s4mu interrupt names
  dt-bindings: power: gpcv2: add power-domains property
  arm64: dts: imx8: correct clock order
  ARM: dts: imx6dl-yapp4: Do not allow PM to switch PU regulator off on Q/QP
  ARM: dts: imx6qdl-gw59{10,13}: fix user pushbutton GPIO offset
  arm64: dts: imx8mn: Correct the usb power domain
  arm64: dts: imx8mn: remove otg1 power domain dependency on hsio
  arm64: dts: imx8mm: correct usb power domains
  arm64: dts: imx8mm: remove otg1/2 power domain dependency on hsio
  arm64: dts: verdin-imx8mp: fix ctrl_sleep_moci
  arm64: dts: imx8mm: Enable CPLD_Dn pull down resistor on MX8Menlo

Link: https://lore.kernel.org/r/20221101031547.GB125525@dragon
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-11-01 12:20:41 +01:00
Pablo Neira Ayuso
26b5934ff4 netfilter: nf_tables: release flow rule object from commit path
No need to postpone this to the commit release path, since no packets
are walking over this object, this is accessed from control plane only.
This helped uncovered UAF triggered by races with the netlink notifier.

Fixes: 9dd732e0bd ("netfilter: nf_tables: memleak flow rule from commit path")
Reported-by: syzbot+8f747f62763bc6c32916@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-11-01 12:19:47 +01:00
Pablo Neira Ayuso
d4bc8271db netfilter: nf_tables: netlink notifier might race to release objects
commit release path is invoked via call_rcu and it runs lockless to
release the objects after rcu grace period. The netlink notifier handler
might win race to remove objects that the transaction context is still
referencing from the commit release path.

Call rcu_barrier() to ensure pending rcu callbacks run to completion
if the list of transactions to be destroyed is not empty.

Fixes: 6001a930ce ("netfilter: nftables: introduce table ownership")
Reported-by: syzbot+8f747f62763bc6c32916@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-11-01 12:19:46 +01:00
Michael Ellerman
02a771c9a6 powerpc/32: Select ARCH_SPLIT_ARG64
On 32-bit kernels, 64-bit syscall arguments are split into two
registers. For that to work with syscall wrappers, the prototype of the
syscall must have the argument split so that the wrapper macro properly
unpacks the arguments from pt_regs.

The fanotify_mark() syscall is one such syscall, which already has a
split prototype, guarded behind ARCH_SPLIT_ARG64.

So select ARCH_SPLIT_ARG64 to get that prototype and fix fanotify_mark()
on 32-bit kernels with syscall wrappers.

Note also that fanotify_mark() is the only usage of ARCH_SPLIT_ARG64.

Fixes: 7e92e01b72 ("powerpc: Provide syscall wrapper")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221101034852.2340319-1-mpe@ellerman.id.au
2022-11-01 15:27:12 +11:00
Ziyang Xuan
363a5328f4 net: tun: fix bugs for oversize packet when napi frags enabled
Recently, we got two syzkaller problems because of oversize packet
when napi frags enabled.

One of the problems is because the first seg size of the iov_iter
from user space is very big, it is 2147479538 which is bigger than
the threshold value for bail out early in __alloc_pages(). And
skb->pfmemalloc is true, __kmalloc_reserve() would use pfmemalloc
reserves without __GFP_NOWARN flag. Thus we got a warning as following:

========================================================
WARNING: CPU: 1 PID: 17965 at mm/page_alloc.c:5295 __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
...
Call trace:
 __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
 __alloc_pages_node include/linux/gfp.h:550 [inline]
 alloc_pages_node include/linux/gfp.h:564 [inline]
 kmalloc_large_node+0x94/0x350 mm/slub.c:4038
 __kmalloc_node_track_caller+0x620/0x8e4 mm/slub.c:4545
 __kmalloc_reserve.constprop.0+0x1e4/0x2b0 net/core/skbuff.c:151
 pskb_expand_head+0x130/0x8b0 net/core/skbuff.c:1654
 __skb_grow include/linux/skbuff.h:2779 [inline]
 tun_napi_alloc_frags+0x144/0x610 drivers/net/tun.c:1477
 tun_get_user+0x31c/0x2010 drivers/net/tun.c:1835
 tun_chr_write_iter+0x98/0x100 drivers/net/tun.c:2036

The other problem is because odd IPv6 packets without NEXTHDR_NONE
extension header and have big packet length, it is 2127925 which is
bigger than ETH_MAX_MTU(65535). After ipv6_gso_pull_exthdrs() in
ipv6_gro_receive(), network_header offset and transport_header offset
are all bigger than U16_MAX. That would trigger skb->network_header
and skb->transport_header overflow error, because they are all '__u16'
type. Eventually, it would affect the value for __skb_push(skb, value),
and make it be a big value. After __skb_push() in ipv6_gro_receive(),
skb->data would less than skb->head, an out of bounds memory bug occurred.
That would trigger the problem as following:

==================================================================
BUG: KASAN: use-after-free in eth_type_trans+0x100/0x260
...
Call trace:
 dump_backtrace+0xd8/0x130
 show_stack+0x1c/0x50
 dump_stack_lvl+0x64/0x7c
 print_address_description.constprop.0+0xbc/0x2e8
 print_report+0x100/0x1e4
 kasan_report+0x80/0x120
 __asan_load8+0x78/0xa0
 eth_type_trans+0x100/0x260
 napi_gro_frags+0x164/0x550
 tun_get_user+0xda4/0x1270
 tun_chr_write_iter+0x74/0x130
 do_iter_readv_writev+0x130/0x1ec
 do_iter_write+0xbc/0x1e0
 vfs_writev+0x13c/0x26c

To fix the problems, restrict the packet size less than
(ETH_MAX_MTU - NET_SKB_PAD - NET_IP_ALIGN) which has considered reserved
skb space in napi_alloc_skb() because transport_header is an offset from
skb->head. Add len check in tun_napi_alloc_frags() simply.

Fixes: 90e33d4594 ("tun: enable napi_gro_frags() for TUN/TAP driver")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20221029094101.1653855-1-william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-31 20:04:55 -07:00
Rick Lindsley
e230d36f7d ibmvnic: change maintainers for vnic driver
Changed maintainers for vnic driver, since Dany has new responsibilities.
Also added Nick Child as reviewer.

Signed-off-by: Rick Lindsley <ricklind@us.ibm.com>
Link: https://lore.kernel.org/r/20221028203509.4070154-1-ricklind@us.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-31 19:56:57 -07:00
Al Viro
878eb6e48f block: blk_add_rq_to_plug(): clear stale 'last' after flush
blk_mq_flush_plug_list() empties ->mq_list and request we'd peeked there
before that call is gone; in any case, we are not dealing with a mix
of requests for different queues now - there's no requests left in the
plug.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-31 20:21:38 -06:00
Andreas Schwab
ce883a2ba3 powerpc/32: fix syscall wrappers with 64-bit arguments
With the introduction of syscall wrappers all wrappers for syscalls with
64-bit arguments must be handled specially, not only those that have
unaligned 64-bit arguments. This left out the fallocate() and
sync_file_range2() syscalls.

Fixes: 7e92e01b72 ("powerpc: Provide syscall wrapper")
Fixes: e237506238 ("powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs")
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/87mt9cxd6g.fsf_-_@igel.home
2022-11-01 10:24:09 +11:00
Andreas Schwab
40ff214328 asm-generic: compat: fix compat_arg_u64() and compat_arg_u64_dual()
The macros are defined backwards.

This affects the following compat syscalls:
 - compat_sys_truncate64()
 - compat_sys_ftruncate64()
 - compat_sys_fallocate()
 - compat_sys_sync_file_range()
 - compat_sys_fadvise64_64()
 - compat_sys_readahead()
 - compat_sys_pread64()
 - compat_sys_pwrite64()

Fixes: 43d5de2b67 ("asm-generic: compat: Support BE for long long args in 32-bit ABIs")
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
[mpe: Add list of affected syscalls]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/871qqoyvni.fsf_-_@igel.home
2022-11-01 10:20:11 +11:00
Linus Torvalds
5aaef24b5c for-6.1-rc3-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmNfzNwACgkQxWXV+ddt
 WDuC6Q//a72PAq1sjwvQqAcr+OOe3PWnmlwYZCnXxiab5c74Kc7rDhDZcO3m/Qt5
 3YTwgK5FT4Y0AI8RN1NXx3+UOAYCWp/TGeBdbPHg35XIYKAnCh4pfql84Uiw1Awz
 HbqmSTma7sqVdRMehkKCkd7w4YoyAAsDdyXFQlSFm4ah9WHFZDswBc+m6xQZuWvU
 QVQS6wUTxkxuBZp0UComWGBNHiDeDZbga7VqO8UHPYOB394IV2mYP6fh8l0oB/BS
 bfKgsHjV9e0S0Ul0oPVADCGCiJcTbdnw3IA+Cje7MSgZ3kds/4Bo5IJWT5QRb94A
 yDAFpxc+t3+FgpoKS3/tZK7imXwgpXueiT2bBj+BjDDWD2VUVVBG4QmXYIW6tuqY
 vtEFw9+NCAvS2gRetHyXxQshYh/QW//+AZSkuI6/fuPSM+lRG5E0lnDxqrZiOMIo
 e6SJOGH3tCmtusL5VSXIQ8DPaLI9PBg4OXChytwmLHwPIusbQOvD5sTDpd99UezB
 dLXqZOGGScAc11HU1AFyZfAxTBybUgUxX/xCviJtf7ZOWKdcwiFrzSJOL5upSPz3
 8qZTVjrD71mJlEa0Z8wj0Utuu4Psecp0GN+fs5JJxmqsFO0cYApU17OqPZ22+yEV
 RU26YNpqurYVarHVER4WxyXYraBYd1Cr6s6bFVDnuZynfiCOYIw=
 =3tvc
 -----END PGP SIGNATURE-----

Merge tag 'for-6.1-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few more fixes and regression fixes:

   - fix a corner case when handling tree-mod-log chagnes in reallocated
     notes

   - fix crash on raid0 filesystems created with <5.4 mkfs.btrfs that
     could lead to division by zero

   - add missing super block checksum verification after thawing
     filesystem

   - handle one more case in send when dealing with orphan files

   - fix parameter type mismatch for generation when reading dentry

   - improved error handling in raid56 code

   - better struct bio packing after recent cleanups"

* tag 'for-6.1-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: don't use btrfs_chunk::sub_stripes from disk
  btrfs: fix type of parameter generation in btrfs_get_dentry
  btrfs: send: fix send failure of a subcase of orphan inodes
  btrfs: make thaw time super block check to also verify checksum
  btrfs: fix tree mod log mishandling of reallocated nodes
  btrfs: reorder btrfs_bio for better packing
  btrfs: raid56: avoid double freeing for rbio if full_stripe_write() failed
  btrfs: raid56: properly handle the error when unable to find the missing stripe
2022-10-31 12:28:29 -07:00
Linus Torvalds
78a089d033 lsm/stable-6.1 PR 20221031
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmNfpvEUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXM4wBAAr3iQ2y+j88aZKbgHMp+uT5FF8fp6
 xTAI+Zyqn6KUD3H2VC8DYm1crlyibA6bZhscO3Al14ustS4wyVxXqBkXBTukkXxE
 exTzfmyx8SHCcke5vEfWvF1M/w9nHGRLTwtMwc2W0GR3Qz1uB65ezsxTikDwjlyP
 Ax5nXoC9r0DMsunfkYuLlRpfoe3Vwbz2in93odemB4cHSDiqj0V0Llk5z/kidcqF
 XrPf/GknVZblqS9NDZYg9accZGe8cLuIVHEeiXhmCt21mVoX13PycUWRzSnAvG7/
 9M+Wb3KExpZFn+8J3G0HK89P7v+PUmpOUMsH03kQARdHS0br35jE7eAqfEwo96xk
 UWJKbJCCEqURKmR9nzG6tuHqbUA2e8Sw/fqCMFRTxYBhAl64ptRqJPD5hqwY50Od
 P6khJo75F8uIuwJtW+0fQ9kAIrJqjzVHiObOMEZmt9vSiOOGHqjriGsEitWIMe6+
 cVxVSqwuNeaUyux5sj9IiKyKnFelPt0qMpMncrePZ8l2y4ATf9MQFX28X6HhskPt
 7JD2nIprsCsMHUSjUf4Z+fBZC8IFw8yWSQbM+9S/ErnV2zieq5/OxlnJs87vro6W
 3skrgwsB1C4TQoW9qRf3bDbT5O31kbu4lmUcD5mgUUzQd/V+L257DY2d+rF1rB3w
 QMDyRxPPR/BP6bE=
 =L2Xt
 -----END PGP SIGNATURE-----

Merge tag 'lsm-pr-20221031' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull LSM fix from Paul Moore:
 "A single patch to the capabilities code to fix a potential memory leak
  in the xattr allocation error handling"

* tag 'lsm-pr-20221031' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  capabilities: fix potential memleak on error path from vfs_getxattr_alloc()
2022-10-31 12:09:42 -07:00
Gavin Shan
7a2726ec32 KVM: Check KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL} prior to enabling them
There are two capabilities related to ring-based dirty page tracking:
KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL. Both are
supported by x86. However, arm64 supports KVM_CAP_DIRTY_LOG_RING_ACQ_REL
only when the feature is supported on arm64. The userspace doesn't have
to enable the advertised capability, meaning KVM_CAP_DIRTY_LOG_RING can
be enabled on arm64 by userspace and it's wrong.

Fix it by double checking if the capability has been advertised prior to
enabling it. It's rejected to enable the capability if it hasn't been
advertised.

Fixes: 17601bfed9 ("KVM: Add KVM_CAP_DIRTY_LOG_RING_ACQ_REL capability and config option")
Reported-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221031003621.164306-4-gshan@redhat.com
2022-10-31 17:22:15 +00:00
Darrick J. Wong
9f187ba0d5 xfs: fix various problems with log intent item recovery
Starting with 6.1-rc1, CONFIG_FORTIFY_SOURCE checks became smart enough
 to detect memcpy() callers that copy beyond what seems to be the end of
 a struct.  Unfortunately, gcc has a bug wherein it cannot reliably
 compute the size of a struct containing another struct containing a flex
 array at the end.  This is the case with the xfs log item format
 structures, which means that -rc1 starts complaining all over the place.
 
 Fix these problems by memcpying the struct head and the flex arrays
 separately.  Although it's tempting to use the FLEX_ARRAY macros, the
 structs involved are part of the ondisk log format.  Some day we're
 going to want to make the ondisk log contents endian-safe, which means
 that we will have to stop using memcpy entirely.
 
 While we're at it, fix some deficiencies in the validation of recovered
 log intent items -- if the size of the recovery buffer is not even large
 enough to cover the flex array record count in the head, we should abort
 the recovery of that item immediately.
 
 The last patch of this series changes the EFI/EFD sizeof functions names
 and behaviors to be consistent with the similarly named sizeof helpers
 for other log intent items.
 
 v2: fix more inadequate log intent done recovery validation and dump
     corrupt recovered items
 
 Signed-off-by: Darrick J. Wong <djwong@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmNf8LkACgkQ+H93GTRK
 tOt3QQ//SuyxzE4i2Vr8o7dwFQ6qtQeSt9RixtgKUG3ay+eZLCgpA7KS8po0Dv7W
 /8aAY6K712Mp2IzmdJUIHb/Pch5UbRSN5rw0169CsNDOmU/R9njqfeMWMfDr9ixS
 HAWfo13yh/QSmBTyioijZhP08N0TpyNVFsM9s5/4hKU7UGV4h5g2kz+hyDHrsSmB
 KXAM7FAh6SX8eBjxpj3iKLgsdEW7mcsDYurSVOnfmgWkXvgZXoLOvPt84e09A+s3
 tLq5AEiLr261o45VbfExrjqn0qvwE7HdMdLPJrTa/tp6ztfsU2SJ6AxmG/XTTlBj
 jnIcYL9unu8JOndmJjLZxuhXmXXwZ3eFfsUgn0/tluSeR/nMMc3CCItZ58Ox5zk7
 kUpN0JnY1+ecYmDw1Qz8LhhSIReOiA5Rw2SwVQ8wB3Oit9/cBQsxtM9YxxOne3MN
 od2096CiyvCYjpm6EGTRCkxQuz2nleJ5LajXb7dmkw91IiPdvoWbTPT+trtjO/63
 gYbD0A4Qko9iDW0bWCCvWPD6vBZhN1q6r1j1lu77Az+z/45W47ut6MGokK4NHzo3
 fTarDMqbVDxyeSrhW713iQO7PypLoOv7b72HD1+SvSkHzKwdi42kIlWe8e5B8Rew
 GkH2ycfQtaq+UR2fT4rSs4wWWuxoZLyo9Utdh0DA0ZF5QZfAqFo=
 =wP07
 -----END PGP SIGNATURE-----

Merge tag 'fix-log-recovery-misuse-6.1_2022-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.1-fixes

xfs: fix various problems with log intent item recovery

Starting with 6.1-rc1, CONFIG_FORTIFY_SOURCE checks became smart enough
to detect memcpy() callers that copy beyond what seems to be the end of
a struct.  Unfortunately, gcc has a bug wherein it cannot reliably
compute the size of a struct containing another struct containing a flex
array at the end.  This is the case with the xfs log item format
structures, which means that -rc1 starts complaining all over the place.

Fix these problems by memcpying the struct head and the flex arrays
separately.  Although it's tempting to use the FLEX_ARRAY macros, the
structs involved are part of the ondisk log format.  Some day we're
going to want to make the ondisk log contents endian-safe, which means
that we will have to stop using memcpy entirely.

While we're at it, fix some deficiencies in the validation of recovered
log intent items -- if the size of the recovery buffer is not even large
enough to cover the flex array record count in the head, we should abort
the recovery of that item immediately.

The last patch of this series changes the EFI/EFD sizeof functions names
and behaviors to be consistent with the similarly named sizeof helpers
for other log intent items.

v2: fix more inadequate log intent done recovery validation and dump
    corrupt recovered items

Signed-off-by: Darrick J. Wong <djwong@kernel.org>

* tag 'fix-log-recovery-misuse-6.1_2022-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: dump corrupt recovered log intent items to dmesg consistently
  xfs: actually abort log recovery on corrupt intent-done log items
  xfs: refactor all the EFI/EFD log item sizeof logic
  xfs: fix memcpy fortify errors in EFI log format copying
  xfs: fix memcpy fortify errors in RUI log format copying
  xfs: fix memcpy fortify errors in CUI log format copying
  xfs: fix memcpy fortify errors in BUI log format copying
  xfs: fix validation in attr log item recovery
2022-10-31 09:15:37 -07:00
Darrick J. Wong
8b972158af xfs: rename XFS_REFC_COW_START to _COWFLAG
We've been (ab)using XFS_REFC_COW_START as both an integer quantity and
a bit flag, even though it's *only* a bit flag.  Rename the variable to
reflect its nature and update the cast target since we're not supposed
to be comparing it to xfs_agblock_t now.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:22 -07:00
Darrick J. Wong
c1ccf967bf xfs: fix uninitialized list head in struct xfs_refcount_recovery
We're supposed to initialize the list head of an object before adding it
to another list.  Fix that, and stop using the kmem_{alloc,free} calls
from the Irix days.

Fixes: 174edb0e46 ("xfs: store in-progress CoW allocations in the refcount btree")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:22 -07:00
Darrick J. Wong
f1fdc82078 xfs: fix agblocks check in the cow leftover recovery function
As we've seen, refcount records use the upper bit of the rc_startblock
field to ensure that all the refcount records are at the right side of
the refcount btree.  This works because an AG is never allowed to have
more than (1U << 31) blocks in it.  If we ever encounter a filesystem
claiming to have that many blocks, we absolutely do not want reflink
touching it at all.

However, this test at the start of xfs_refcount_recover_cow_leftovers is
slightly incorrect -- it /should/ be checking that agblocks isn't larger
than the XFS_MAX_CRC_AG_BLOCKS constant, and it should check that the
constant is never large enough to conflict with that CoW flag.

Note that the V5 superblock verifier has not historically rejected
filesystems where agblocks >= XFS_MAX_CRC_AG_BLOCKS, which is why this
ended up in the COW recovery routine.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:21 -07:00
Darrick J. Wong
f62ac3e0ac xfs: check record domain when accessing refcount records
Now that we've separated the startblock and CoW/shared extent domain in
the incore refcount record structure, check the domain whenever we
retrieve a record to ensure that it's still in the domain that we want.
Depending on the circumstances, a change in domain either means we're
done processing or that we've found a corruption and need to fail out.

The refcount check in xchk_xref_is_cow_staging is redundant since
_get_rec has done that for a long time now, so we can get rid of it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:21 -07:00
Darrick J. Wong
68d0f38917 xfs: remove XFS_FIND_RCEXT_SHARED and _COW
Now that we have an explicit enum for shared and CoW staging extents, we
can get rid of the old FIND_RCEXT flags.  Omit a couple of conversions
that disappear in the next patches.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:21 -07:00
Darrick J. Wong
f492135df0 xfs: refactor domain and refcount checking
Create a helper function to ensure that CoW staging extent records have
a single refcount and that shared extent records have more than 1
refcount.  We'll put this to more use in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:21 -07:00
Darrick J. Wong
571423a162 xfs: report refcount domain in tracepoints
Now that we've broken out the startblock and shared/cow domain in the
incore refcount extent record structure, update the tracepoints to
report the domain.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:21 -07:00
Darrick J. Wong
9a50ee4f8d xfs: track cow/shared record domains explicitly in xfs_refcount_irec
Just prior to committing the reflink code into upstream, the xfs
maintainer at the time requested that I find a way to shard the refcount
records into two domains -- one for records tracking shared extents, and
a second for tracking CoW staging extents.  The idea here was to
minimize mount time CoW reclamation by pushing all the CoW records to
the right edge of the keyspace, and it was accomplished by setting the
upper bit in rc_startblock.  We don't allow AGs to have more than 2^31
blocks, so the bit was free.

Unfortunately, this was a very late addition to the codebase, so most of
the refcount record processing code still treats rc_startblock as a u32
and pays no attention to whether or not the upper bit (the cow flag) is
set.  This is a weakness is theoretically exploitable, since we're not
fully validating the incoming metadata records.

Fuzzing demonstrates practical exploits of this weakness.  If the cow
flag of a node block key record is corrupted, a lookup operation can go
to the wrong record block and start returning records from the wrong
cow/shared domain.  This causes the math to go all wrong (since cow
domain is still implicit in the upper bit of rc_startblock) and we can
crash the kernel by tricking xfs into jumping into a nonexistent AG and
tripping over xfs_perag_get(mp, <nonexistent AG>) returning NULL.

To fix this, start tracking the domain as an explicit part of struct
xfs_refcount_irec, adjust all refcount functions to check the domain
of a returned record, and alter the function definitions to accept them
where necessary.

Found by fuzzing keys[2].cowflag = add in xfs/464.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:21 -07:00
Darrick J. Wong
5a8c345ca8 xfs: refactor refcount record usage in xchk_refcountbt_rec
Consolidate the open-coded xfs_refcount_irec fields into an actual
struct and use the existing _btrec_to_irec to decode the ondisk record.
This will reduce code churn in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-10-31 08:58:21 -07:00