1280697 Commits

Author SHA1 Message Date
Mike Rapoport (IBM)
8043832e2a memblock: use numa_valid_node() helper to check for invalid node ID
Introduce numa_valid_node(nid) that verifies that nid is a valid node ID
and use that instead of comparing nid parameter with either NUMA_NO_NODE
or MAX_NUMNODES.

This makes the checks for valid node IDs consistent and more robust and
allows to get rid of multiple WARNings.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2024-06-16 10:17:57 +03:00
Fabio Estevam
a5d400b643 arm64: dts: imx93-11x11-evk: Remove the 'no-sdio' property
The usdhc2 port is connected to the microSD slot. The presence of the
'no-sdio' property prevents Wifi SDIO cards, such as CMP9010-X-EVB [1]
to be detected.

Remove the 'no-sdio' property so that SDIO cards could also work.

[1] https://www.nxp.com/products/wireless-connectivity/wi-fi-plus-bluetooth-plus-802-15-4/cmp9010-x-evb-iw416-usd-interface-evaluation-board:CMP9010-X-EVB

Fixes: e37907bd8294 ("arm64: dts: freescale: add i.MX93 11x11 EVK basic support")
Signed-off-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2024-06-16 14:04:49 +08:00
Michael Ellerman
a986fa57fd KVM: PPC: Book3S HV: Prevent UAF in kvm_spapr_tce_attach_iommu_group()
Al reported a possible use-after-free (UAF) in kvm_spapr_tce_attach_iommu_group().

It looks up `stt` from tablefd, but then continues to use it after doing
fdput() on the returned fd. After the fdput() the tablefd is free to be
closed by another thread. The close calls kvm_spapr_tce_release() and
then release_spapr_tce_table() (via call_rcu()) which frees `stt`.

Although there are calls to rcu_read_lock() in
kvm_spapr_tce_attach_iommu_group() they are not sufficient to prevent
the UAF, because `stt` is used outside the locked regions.

With an artifcial delay after the fdput() and a userspace program which
triggers the race, KASAN detects the UAF:

  BUG: KASAN: slab-use-after-free in kvm_spapr_tce_attach_iommu_group+0x298/0x720 [kvm]
  Read of size 4 at addr c000200027552c30 by task kvm-vfio/2505
  CPU: 54 PID: 2505 Comm: kvm-vfio Not tainted 6.10.0-rc3-next-20240612-dirty #1
  Hardware name: 8335-GTH POWER9 0x4e1202 opal:skiboot-v6.5.3-35-g1851b2a06 PowerNV
  Call Trace:
    dump_stack_lvl+0xb4/0x108 (unreliable)
    print_report+0x2b4/0x6ec
    kasan_report+0x118/0x2b0
    __asan_load4+0xb8/0xd0
    kvm_spapr_tce_attach_iommu_group+0x298/0x720 [kvm]
    kvm_vfio_set_attr+0x524/0xac0 [kvm]
    kvm_device_ioctl+0x144/0x240 [kvm]
    sys_ioctl+0x62c/0x1810
    system_call_exception+0x190/0x440
    system_call_vectored_common+0x15c/0x2ec
  ...
  Freed by task 0:
   ...
   kfree+0xec/0x3e0
   release_spapr_tce_table+0xd4/0x11c [kvm]
   rcu_core+0x568/0x16a0
   handle_softirqs+0x23c/0x920
   do_softirq_own_stack+0x6c/0x90
   do_softirq_own_stack+0x58/0x90
   __irq_exit_rcu+0x218/0x2d0
   irq_exit+0x30/0x80
   arch_local_irq_restore+0x128/0x230
   arch_local_irq_enable+0x1c/0x30
   cpuidle_enter_state+0x134/0x5cc
   cpuidle_enter+0x6c/0xb0
   call_cpuidle+0x7c/0x100
   do_idle+0x394/0x410
   cpu_startup_entry+0x60/0x70
   start_secondary+0x3fc/0x410
   start_secondary_prolog+0x10/0x14

Fix it by delaying the fdput() until `stt` is no longer in use, which
is effectively the entire function. To keep the patch minimal add a call
to fdput() at each of the existing return paths. Future work can convert
the function to goto or __cleanup style cleanup.

With the fix in place the test case no longer triggers the UAF.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Closes: https://lore.kernel.org/all/20240610024437.GA1464458@ZenIV/
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240614122910.3499489-1-mpe@ellerman.id.au
2024-06-16 10:20:11 +10:00
Linus Torvalds
a3e18a5405 Bug fixes for 6.10-rc4:
* Ensure xfs incore superblock's
     1. Allocated inode counter
     2. Free inode counter
     3. Free data block counter
     are zero or positive when they are copied over from
     xfs_mount->m_[icount,ifree,fdblocks] respectively.
 
 Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZm20zAAKCRAH7y4RirJu
 9KU+AQDbXY06rXotcoDqALGSKoGZddBnfJMhOiFgOzNDJ+Cy7AEAieixGN2COKoV
 Xa3TzIO6fb0mbNg7RDySY6XideiJ7Qw=
 =xnum
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.10-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fix from Chandan Babu:
 "Ensure xfs incore superblock's allocated inode counter, free inode
  counter, and free data block counter are all zero or positive when
  they are copied over from xfs_mount->m_[icount,ifree,fdblocks]
  respectively"

* tag 'xfs-6.10-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: make sure sb_fdblocks is non-negative
2024-06-15 12:03:32 -07:00
Linus Torvalds
62e1f3b3fd Two small smb3 server fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmZts6cACgkQiiy9cAdy
 T1FGnQwAhUSxD4sWvJ8XL2Y+k8AL/Lh9sGpoj9vgBBgjt9W9nFLT0thEcGDqkpDO
 n9U8OMxsP1+U86WuM28Yuz9+3WvVvX+ZYrG/LBD3DuyoZnpuIzkBst2XUCA+SpKB
 XZ52lLsEQj0Apr0muM98kv+RqZrJwHjE0nrPv0BhQAxNzgaJPJ7RqjjIqBvnBT35
 +OPckXbl3uda5mbnj/jPBYRU3asIkfLcAXh2Q6dwpFRtWLj4P+IOAEB7wfPlVr9O
 rA7ASq7fPuwKSCHpCehWlNdkPItqV2JDN7uvoIZF+83Ob7I6U+Mm6vDJKmf2ap9T
 JW3U2FwytAvvcCnlPj2xW+7fs227hDsLDbbnQGnV00W9LCCxi9t30PgCw5ISx2CP
 XDSg3VvBt6TLzRkGQ44enHxQdNuQ8JGh99MWFl4U54N/j5smyXPfegph28QLi73f
 Ksq+VnHJUMEppaoWO37G0lHfgodA1G4zLWqFeZnNHpC9cqgXm7TNxD7ZhAgQEKGL
 EdtEH3rH
 =6H3L
 -----END PGP SIGNATURE-----

Merge tag '6.10-rc3-smb3-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:
 "Two small smb3 server fixes:

   - set xatttr fix

   - pathname parsing check fix"

* tag '6.10-rc3-smb3-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: fix missing use of get_write in in smb2_set_ea()
  ksmbd: move leading slash check to smb2_get_name()
2024-06-15 12:00:25 -07:00
Linus Torvalds
08a6b55aa0 Misc fixes:
- Fix the 8 bytes get_user() logic on x86-32
 
  - Fix build bug that creates weird & mistaken target directory under arch/x86/
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmZtTI8RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1j//RAAjD1QkSLfhGRH90CWLAuigLCrAMigacd4
 DzKb71RqpYMolaWNk+cDqksEqtSMhLNlXJYMWPuq6ojD1QUEL+7l7/5JcQGl8VB/
 LY/sPEiMxsycRW5ZxE5OzGoqEiEAhqbxFQpi+7hBYhaYxNvZcJouCrKk9MM6+5s5
 uWjHC8PSsv4YGwHKz8OwHwC6+gklUfYP7dBOLEtOjwqVZtdoiNnv4xl+i7z2lYt/
 d5GjUZQ5KD61Oa64MwgElbs2mKsp5P5SigJ0ts5BTi6y/60W6KPisVDDyEE5Kwx4
 vMrTbvw1rVxDQITokLymKii+vV6j5zDUp2uxqtC78T3Y0rBiHAo3AAfgDOQ58pVo
 Jr/vnCJnbjjWeSvrkBYicHtezKreQv9rOXAb4BGkVqzam+U61hCnk9vo4Idi/5ra
 lmQuQZHTxHsTeGMzV6XEwUFtty5HYWhHArrNdTCMNHjmPkbnML05bc40QScy2xDE
 fX3c5VGe603hm/h9N7esoqOgmK5VSi7Q6r1rGnFQJCClXJ4E4W/zmuVHyjLGOtMh
 IoopKq5I+lReNOE1QW51TQW7W67vW88juTzVIa06jiJfYws41Wr1brvhf7drb+zI
 QgpLDIBN0BiJS5At1cA8ZpTiWqn0xC1GP+uGyv/gJ0xULZehYhnYbk46glbeb5nr
 Uk8B4Yg79sw=
 =Z3bk
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2024-06-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

 - Fix the 8 bytes get_user() logic on x86-32

 - Fix build bug that creates weird & mistaken target directory under
   arch/x86/

* tag 'x86-urgent-2024-06-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Don't add the EFI stub to targets, again
  x86/uaccess: Fix missed zeroing of ia32 u64 get_user() range checking
2024-06-15 11:03:05 -07:00
Linus Torvalds
41d707222e Fix boot-time warning in tick_setup_device().
Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmZtSrcRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gIRA//TukXjLJbUYA/vsdatgKLMGog4CepkXjc
 heU6qymee4CqY08g3RnXngzR3p0pwMZ8UFlG1Tx1/QOC6rkbZNccuLxRKt80aO4t
 guBggTbY/HyQK5COn1mRcHsoqN0TrzFwezixkC9x1Tyo+aQmctaKz3D0f/PYWsbU
 9oy5/Ap+Gz/Yym63MPvl7/p+8hXX2Iy6LrgepDfMfU5Srl4dCwffJihRmWX/RjZi
 g9CeCw3EKONnBYhR1Eu57IjRXUQlA77x0dw/Wy+1KmVVb2DP7woZMWKruXVh2Cx1
 qnRcSkdIMSpW1d36F8q2xG6FLLxk6w+flpvwWel0MU2WRLh1Wi6Ly598AzKseFY5
 X4JtnAaIBzNLc8UYdebZjCqlVinbp/GGSyD2D5IIELPJ34W8yGe/EiAYzmJ+nM2a
 AawVms1zvnzdC7UxqXrCN+jwh6hMmEUxDuXKLxvgoFkPdNvhK5Fi/2wYKPw4lYHV
 bJQ8kUOeh4j/nDv22IGqMGR7025+tyi7gwgcz2Hgo/ShfyrKxuUcfZ4/znFCAcZZ
 dKCeU+1XsNFxPUlVSp3dccOS3Chs8rljiRFodnV0JM7dKMcXmkI0UQkp3eoRta6X
 RnTh0MmJjBlks32KcivK9zLcrNXHkT5bY/kZpNYiCu688oiqR/aRFMc65mIs9dsF
 2hKbW9LJ4bU=
 =b1pp
 -----END PGP SIGNATURE-----

Merge tag 'timers-urgent-2024-06-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
 "Fix boot-time warning in tick_setup_device()"

* tag 'timers-urgent-2024-06-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device()
2024-06-15 10:54:24 -07:00
Aleksandr Nogikh
01c8f9806b kcov: don't lose track of remote references during softirqs
In kcov_remote_start()/kcov_remote_stop(), we swap the previous KCOV
metadata of the current task into a per-CPU variable.  However, the
kcov_mode_enabled(mode) check is not sufficient in the case of remote KCOV
coverage: current->kcov_mode always remains KCOV_MODE_DISABLED for remote
KCOV objects.

If the original task that has invoked the KCOV_REMOTE_ENABLE ioctl happens
to get interrupted and kcov_remote_start() is called, it ultimately leads
to kcov_remote_stop() NOT restoring the original KCOV reference.  So when
the task exits, all registered remote KCOV handles remain active forever.

The most uncomfortable effect (at least for syzkaller) is that the bug
prevents the reuse of the same /sys/kernel/debug/kcov descriptor.  If
we obtain it in the parent process and then e.g.  drop some
capabilities and continuously fork to execute individual programs, at
some point current->kcov of the forked process is lost,
kcov_task_exit() takes no action, and all KCOV_REMOTE_ENABLE ioctls
calls from subsequent forks fail.

And, yes, the efficiency is also affected if we keep on losing remote
kcov objects.
a) kcov_remote_map keeps on growing forever.
b) (If I'm not mistaken), we're also not freeing the memory referenced
by kcov->area.

Fix it by introducing a special kcov_mode that is assigned to the task
that owns a KCOV remote object.  It makes kcov_mode_enabled() return true
and yet does not trigger coverage collection in __sanitizer_cov_trace_pc()
and write_comp_data().

[nogikh@google.com: replace WRITE_ONCE() with an ordinary assignment]
  Link: https://lkml.kernel.org/r/20240614171221.2837584-1-nogikh@google.com
Link: https://lkml.kernel.org/r/20240611133229.527822-1-nogikh@google.com
Fixes: 5ff3b30ab57d ("kcov: collect coverage from interrupts")
Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Tested-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Marco Elver <elver@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15 10:43:08 -07:00
Baolin Wang
9094b4a1c7 mm: shmem: fix getting incorrect lruvec when replacing a shmem folio
When testing shmem swapin, I encountered the warning below on my machine. 
The reason is that replacing an old shmem folio with a new one causes
mem_cgroup_migrate() to clear the old folio's memcg data.  As a result,
the old folio cannot get the correct memcg's lruvec needed to remove
itself from the LRU list when it is being freed.  This could lead to
possible serious problems, such as LRU list crashes due to holding the
wrong LRU lock, and incorrect LRU statistics.

To fix this issue, we can fallback to use the mem_cgroup_replace_folio()
to replace the old shmem folio.

[ 5241.100311] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5d9960
[ 5241.100317] head: order:4 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 5241.100319] flags: 0x17fffe0000040068(uptodate|lru|head|swapbacked|node=0|zone=2|lastcpupid=0x3ffff)
[ 5241.100323] raw: 17fffe0000040068 fffffdffd6687948 fffffdffd69ae008 0000000000000000
[ 5241.100325] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 5241.100326] head: 17fffe0000040068 fffffdffd6687948 fffffdffd69ae008 0000000000000000
[ 5241.100327] head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 5241.100328] head: 17fffe0000000204 fffffdffd6665801 ffffffffffffffff 0000000000000000
[ 5241.100329] head: 0000000a00000010 0000000000000000 00000000ffffffff 0000000000000000
[ 5241.100330] page dumped because: VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled())
[ 5241.100338] ------------[ cut here ]------------
[ 5241.100339] WARNING: CPU: 19 PID: 78402 at include/linux/memcontrol.h:775 folio_lruvec_lock_irqsave+0x140/0x150
[...]
[ 5241.100374] pc : folio_lruvec_lock_irqsave+0x140/0x150
[ 5241.100375] lr : folio_lruvec_lock_irqsave+0x138/0x150
[ 5241.100376] sp : ffff80008b38b930
[...]
[ 5241.100398] Call trace:
[ 5241.100399]  folio_lruvec_lock_irqsave+0x140/0x150
[ 5241.100401]  __page_cache_release+0x90/0x300
[ 5241.100404]  __folio_put+0x50/0x108
[ 5241.100406]  shmem_replace_folio+0x1b4/0x240
[ 5241.100409]  shmem_swapin_folio+0x314/0x528
[ 5241.100411]  shmem_get_folio_gfp+0x3b4/0x930
[ 5241.100412]  shmem_fault+0x74/0x160
[ 5241.100414]  __do_fault+0x40/0x218
[ 5241.100417]  do_shared_fault+0x34/0x1b0
[ 5241.100419]  do_fault+0x40/0x168
[ 5241.100420]  handle_pte_fault+0x80/0x228
[ 5241.100422]  __handle_mm_fault+0x1c4/0x440
[ 5241.100424]  handle_mm_fault+0x60/0x1f0
[ 5241.100426]  do_page_fault+0x120/0x488
[ 5241.100429]  do_translation_fault+0x4c/0x68
[ 5241.100431]  do_mem_abort+0x48/0xa0
[ 5241.100434]  el0_da+0x38/0xc0
[ 5241.100436]  el0t_64_sync_handler+0x68/0xc0
[ 5241.100437]  el0t_64_sync+0x14c/0x150
[ 5241.100439] ---[ end trace 0000000000000000 ]---

[baolin.wang@linux.alibaba.com: remove less helpful comments, per Matthew]
  Link: https://lkml.kernel.org/r/ccad3fe1375b468ebca3227b6b729f3eaf9d8046.1718423197.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/3c11000dd6c1df83015a8321a859e9775ebbc23e.1718266112.git.baolin.wang@linux.alibaba.com
Fixes: 85ce2c517ade ("memcontrol: only transfer the memcg data for migration")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15 10:43:08 -07:00
Peter Xu
0b1ef4fde7 mm/debug_vm_pgtable: drop RANDOM_ORVALUE trick
Macro RANDOM_ORVALUE was used to make sure the pgtable entry will be
populated with !none data in clear tests.

The RANDOM_ORVALUE tried to cover mostly all the bits in a pgtable entry,
even if there's no discussion on whether all the bits will be vaild.  Both
S390 and PPC64 have their own masks to avoid touching some bits.  Now it's
the turn for x86_64.

The issue is there's a recent report from Mikhail Gavrilov showing that
this can cause a warning with the newly added pte set check in commit
8430557fc5 on writable v.s.  userfaultfd-wp bit, even though the check
itself was valid, the random pte is not.  We can choose to mask more bits
out.

However the need to have such random bits setup is questionable, as now
it's already guaranteed to be true on below:

  - For pte level, the pgtable entry will be installed with value from
    pfn_pte(), where pfn points to a valid page.  Hence the pte will be
    !none already if populated with pfn_pte().

  - For upper-than-pte level, the pgtable entry should contain a directory
    entry always, which is also !none.

All the cases look like good enough to test a pxx_clear() helper.  Instead
of extending the bitmask, drop the "set random bits" trick completely.  Add
some warning guards to make sure the entries will be !none before clear().

Link: https://lkml.kernel.org/r/20240523132139.289719-1-peterx@redhat.com
Fixes: 8430557fc584 ("mm/page_table_check: support userfault wr-protect entries")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Link: https://lore.kernel.org/r/CABXGCsMB9A8-X+Np_Q+fWLURYL_0t3Y-MdoNabDM-Lzk58-DGA@mail.gmail.com
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15 10:43:08 -07:00
Kefeng Wang
cfdd12b482 mm: fix possible OOB in numa_rebuild_large_mapping()
The large folio is mapped with folio size(not greater PMD_SIZE) aligned
virtual address during the pagefault, ie, 'addr = ALIGN_DOWN(vmf->address,
nr_pages * PAGE_SIZE)' in do_anonymous_page().  But after the mremap(),
the virtual address only requires PAGE_SIZE alignment.  Also pte is moved
to new in move_page_tables(), then traversal of the new pte in the
numa_rebuild_large_mapping() could hit the following issue,

   Unable to handle kernel paging request at virtual address 00000a80c021a788
   Mem abort info:
     ESR = 0x0000000096000004
     EC = 0x25: DABT (current EL), IL = 32 bits
     SET = 0, FnV = 0
     EA = 0, S1PTW = 0
     FSC = 0x04: level 0 translation fault
   Data abort info:
     ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
     CM = 0, WnR = 0, TnD = 0, TagAccess = 0
     GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
   user pgtable: 4k pages, 48-bit VAs, pgdp=00002040341a6000
   [00000a80c021a788] pgd=0000000000000000, p4d=0000000000000000
   Internal error: Oops: 0000000096000004 [#1] SMP
   ...
   CPU: 76 PID: 15187 Comm: git Kdump: loaded Tainted: G        W          6.10.0-rc2+ #209
   Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 1.79 08/21/2021
   pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
   pc : numa_rebuild_large_mapping+0x338/0x638
   lr : numa_rebuild_large_mapping+0x320/0x638
   sp : ffff8000b41c3b00
   x29: ffff8000b41c3b30 x28: ffff8000812a0000 x27: 00000000000a8000
   x26: 00000000000000a8 x25: 0010000000000001 x24: ffff20401c7170f0
   x23: 0000ffff33a1e000 x22: 0000ffff33a76000 x21: ffff20400869eca0
   x20: 0000ffff33976000 x19: 00000000000000a8 x18: ffffffffffffffff
   x17: 0000000000000000 x16: 0000000000000020 x15: ffff8000b41c36a8
   x14: 0000000000000000 x13: 205d373831353154 x12: 5b5d333331363732
   x11: 000000000011ff78 x10: 000000000011ff10 x9 : ffff800080273f30
   x8 : 000000320400869e x7 : c0000000ffffd87f x6 : 00000000001e6ba8
   x5 : ffff206f3fb5af88 x4 : 0000000000000000 x3 : 0000000000000000
   x2 : 0000000000000000 x1 : fffffdffc0000000 x0 : 00000a80c021a780
   Call trace:
    numa_rebuild_large_mapping+0x338/0x638
    do_numa_page+0x3e4/0x4e0
    handle_pte_fault+0x1bc/0x238
    __handle_mm_fault+0x20c/0x400
    handle_mm_fault+0xa8/0x288
    do_page_fault+0x124/0x498
    do_translation_fault+0x54/0x80
    do_mem_abort+0x4c/0xa8
    el0_da+0x40/0x110
    el0t_64_sync_handler+0xe4/0x158
    el0t_64_sync+0x188/0x190

Fix it by making the start and end not only within the vma range, but also
within the page table range.

Link: https://lkml.kernel.org/r/20240612122822.4033433-1-wangkefeng.wang@huawei.com
Fixes: d2136d749d76 ("mm: support multi-size THP numa balancing")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15 10:43:07 -07:00
Hugh Dickins
8e279f970b mm/migrate: fix kernel BUG at mm/compaction.c:2761!
I hit the VM_BUG_ON(!list_empty(&cc->migratepages)) in compact_zone(); and
if DEBUG_VM were off, then pages would be lost on a local list.

Our convention is that if migrate_pages() reports complete success (0),
then the migratepages list will be empty; but if it reports an error or
some pages remaining, then its caller must putback_movable_pages().

There's a new case in which migrate_pages() has been reporting complete
success, but returning with pages left on the migratepages list: when
migrate_pages_batch() successfully split a folio on the deferred list, but
then the "Failure isn't counted" call does not dispose of them all.

Since that block is expecting the large folio to have been counted as 1
failure already, and since the return code is later adjusted to success
whenever the returned list is found empty, the simple way to fix this
safely is to count splitting the deferred folio as "a failure".

Link: https://lkml.kernel.org/r/46c948b4-4dd8-6e03-4c7b-ce4e81cfa536@google.com
Fixes: 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split list")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15 10:43:07 -07:00
Mark Brown
e7d2a28bd0 selftests: mm: make map_fixed_noreplace test names stable
KTAP parsers interpret the output of ksft_test_result_*() as being the
name of the test.  The map_fixed_noreplace test uses a dynamically
allocated base address for the mmap()s that it tests and currently
includes this in the test names that it logs so the test names that are
logged are not stable between runs.  It also uses multiples of PAGE_SIZE
which mean that runs for kernels with different PAGE_SIZE configurations
can't be directly compared.  Both these factors cause issues for CI
systems when interpreting and displaying results.

Fix this by replacing the current test names with fixed strings describing
the intent of the mappings that are logged, the existing messages with the
actual addresses and sizes are retained as diagnostic prints to aid in
debugging.

Link: https://lkml.kernel.org/r/20240605-kselftest-mm-fixed-noreplace-v1-1-a235db8b9be9@kernel.org
Fixes: 4838cf70e539 ("selftests/mm: map_fixed_noreplace: conform test to TAP format output")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15 10:43:07 -07:00
Jeff Xu
653c5c7511 mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
When MFD_NOEXEC_SEAL was introduced, there was one big mistake: it didn't
have proper documentation.  This led to a lot of confusion, especially
about whether or not memfd created with the MFD_NOEXEC_SEAL flag is
sealable.  Before MFD_NOEXEC_SEAL, memfd had to explicitly set
MFD_ALLOW_SEALING to be sealable, so it's a fair question.

As one might have noticed, unlike other flags in memfd_create,
MFD_NOEXEC_SEAL is actually a combination of multiple flags.  The idea is
to make it easier to use memfd in the most common way, which is NOEXEC +
F_SEAL_EXEC + MFD_ALLOW_SEALING.  This works with sysctl vm.noexec to help
existing applications move to a more secure way of using memfd.

Proposals have been made to put MFD_NOEXEC_SEAL non-sealable, unless
MFD_ALLOW_SEALING is set, to be consistent with other flags [1], Those
are based on the viewpoint that each flag is an atomic unit, which is a
reasonable assumption.  However, MFD_NOEXEC_SEAL was designed with the
intent of promoting the most secure method of using memfd, therefore a
combination of multiple functionalities into one bit.

Furthermore, the MFD_NOEXEC_SEAL has been added for more than one year,
and multiple applications and distributions have backported and utilized
it.  Altering ABI now presents a degree of risk and may lead to
disruption.

MFD_NOEXEC_SEAL is a new flag, and applications must change their code to
use it.  There is no backward compatibility problem.

When sysctl vm.noexec == 1 or 2, applications that don't set
MFD_NOEXEC_SEAL or MFD_EXEC will get MFD_NOEXEC_SEAL memfd.  And
old-application might break, that is by-design, in such a system vm.noexec
= 0 shall be used.  Also no backward compatibility problem.

I propose to include this documentation patch to assist in clarifying the
semantics of MFD_NOEXEC_SEAL, thereby preventing any potential future
confusion.

Finally, I would like to express my gratitude to David Rheinsberg and
Barnabás Pőcze for initiating the discussion on the topic of sealability.

[1]
https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/

[jeffxu@chromium.org: updates per Randy]
  Link: https://lkml.kernel.org/r/20240611034903.3456796-2-jeffxu@chromium.org
[jeffxu@chromium.org: v3]
  Link: https://lkml.kernel.org/r/20240611231409.3899809-2-jeffxu@chromium.org
Link: https://lkml.kernel.org/r/20240607203543.2151433-2-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Barnabás Pőcze <pobrn@protonmail.com>
Cc: Daniel Verkamp <dverkamp@chromium.org>
Cc: David Rheinsberg <david@readahead.eu>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15 10:43:07 -07:00
Rafael Aquini
3afb76a66b mm: mmap: allow for the maximum number of bits for randomizing mmap_base by default
An ASLR regression was noticed [1] and tracked down to file-mapped areas
being backed by THP in recent kernels.  The 21-bit alignment constraint
for such mappings reduces the entropy for randomizing the placement of
64-bit library mappings and breaks ASLR completely for 32-bit libraries.

The reported issue is easily addressed by increasing vm.mmap_rnd_bits and
vm.mmap_rnd_compat_bits.  This patch just provides a simple way to set
ARCH_MMAP_RND_BITS and ARCH_MMAP_RND_COMPAT_BITS to their maximum values
allowed by the architecture at build time.

[1] https://zolutal.github.io/aslrnt/

[akpm@linux-foundation.org: default to `y' if 32-bit, per Rafael]
Link: https://lkml.kernel.org/r/20240606180622.102099-1-aquini@redhat.com
Fixes: 1854bc6e2420 ("mm/readahead: Align file mappings for non-DAX")
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15 10:43:06 -07:00
Peter Oberparleiter
c1558bc57b gcov: add support for GCC 14
Using gcov on kernels compiled with GCC 14 results in truncated 16-byte
long .gcda files with no usable data.  To fix this, update GCOV_COUNTERS
to match the value defined by GCC 14.

Tested with GCC versions 14.1.0 and 13.2.0.

Link: https://lkml.kernel.org/r/20240610092743.1609845-1-oberpar@linux.ibm.com
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reported-by: Allison Henderson <allison.henderson@oracle.com>
Reported-by: Chuck Lever III <chuck.lever@oracle.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15 10:43:06 -07:00
Oleg Nesterov
7fea700e04 zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING
kernel_wait4() doesn't sleep and returns -EINTR if there is no
eligible child and signal_pending() is true.

That is why zap_pid_ns_processes() clears TIF_SIGPENDING but this is not
enough, it should also clear TIF_NOTIFY_SIGNAL to make signal_pending()
return false and avoid a busy-wait loop.

Link: https://lkml.kernel.org/r/20240608120616.GB7947@redhat.com
Fixes: 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Rachel Menge <rachelmenge@linux.microsoft.com>
Closes: https://lore.kernel.org/all/1386cd49-36d0-4a5c-85e9-bc42056a5a38@linux.microsoft.com/
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Tested-by: Wei Fu <fuweid89@gmail.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Cc: Allen Pais <apais@linux.microsoft.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Joel Granados <j.granados@samsung.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Zqiang <qiang.zhang1211@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15 10:43:06 -07:00
Ran Xiaokai
6a50c9b512 mm: huge_memory: fix misused mapping_large_folio_support() for anon folios
When I did a large folios split test, a WARNING "[ 5059.122759][ T166]
Cannot split file folio to non-0 order" was triggered.  But the test cases
are only for anonmous folios.  while mapping_large_folio_support() is only
reasonable for page cache folios.

In split_huge_page_to_list_to_order(), the folio passed to
mapping_large_folio_support() maybe anonmous folio.  The folio_test_anon()
check is missing.  So the split of the anonmous THP is failed.  This is
also the same for shmem_mapping().  We'd better add a check for both.  But
the shmem_mapping() in __split_huge_page() is not involved, as for
anonmous folios, the end parameter is set to -1, so (head[i].index >= end)
is always false.  shmem_mapping() is not called.

Also add a VM_WARN_ON_ONCE() in mapping_large_folio_support() for anon
mapping, So we can detect the wrong use more easily.

THP folios maybe exist in the pagecache even the file system doesn't
support large folio, it is because when CONFIG_TRANSPARENT_HUGEPAGE is
enabled, khugepaged will try to collapse read-only file-backed pages to
THP.  But the mapping does not actually support multi order large folios
properly.

Using /sys/kernel/debug/split_huge_pages to verify this, with this patch,
large anon THP is successfully split and the warning is ceased.

Link: https://lkml.kernel.org/r/202406071740485174hcFl7jRxncsHDtI-Pz-o@zte.com.cn
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
Reviewed-by: Barry Song <baohua@kernel.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15 10:43:06 -07:00
Suren Baghdasaryan
a273559e9e lib/alloc_tag: fix RCU imbalance in pgalloc_tag_get()
put_page_tag_ref() should be called only when get_page_tag_ref() returns a
valid reference because only in that case get_page_tag_ref() enters RCU
read section while put_page_tag_ref() will call rcu_read_unlock() even if
the provided reference is NULL.  Fix pgalloc_tag_get() which does not
follow this rule causing RCU imbalance.  Add a warning in
put_page_tag_ref() to catch any future mistakes.

Link: https://lkml.kernel.org/r/20240601233840.617458-1-surenb@google.com
Fixes: cc92eba1c88b ("mm: fix non-compound multi-order memory accounting in __free_pages")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202405271029.6d2f9c4c-lkp@intel.com
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Kees Cook <keescook@chromium.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15 10:43:05 -07:00
Suren Baghdasaryan
c944bf60c1 lib/alloc_tag: do not register sysctl interface when CONFIG_SYSCTL=n
Memory allocation profiling is trying to register sysctl interface even
when CONFIG_SYSCTL=n, resulting in proc_do_static_key() being undefined. 
Prevent that by skipping sysctl registration for such configurations.

Link: https://lkml.kernel.org/r/20240601233831.617124-1-surenb@google.com
Fixes: 22d407b164ff ("lib: add allocation tagging support for memory allocation profiling")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405280616.wcOGWJEj-lkp@intel.com/
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Kees Cook <keescook@chromium.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15 10:43:05 -07:00
Lorenzo Stoakes
3ab85f4046 MAINTAINERS: remove Lorenzo as vmalloc reviewer
I haven't had the bandwidth to review vmalloc patches recently and I
suspect I won't be able to do so consistently moving forwards, so I think
it's best if I remove myself as reviewer for the time being.

Link: https://lkml.kernel.org/r/20240602205510.108807-1-lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15 10:43:05 -07:00
David Hildenbrand
384a746bb5 Revert "mm: init_mlocked_on_free_v3"
There was insufficient review and no agreement that this is the right
approach.

There are serious flaws with the implementation that make processes using
mlock() not even work with simple fork() [1] and we get reliable crashes
when rebooting.

Further, simply because we might be unmapping a single PTE of a large
mlocked folio, we shouldn't zero out the whole folio.

... especially because the code can also *corrupt* urelated memory because
	kernel_init_pages(page, folio_nr_pages(folio));

Could end up writing outside of the actual folio if we work with a tail
page.

Let's revert it.  Once there is agreement that this is the right approach,
the issues were fixed and there was reasonable review and proper testing,
we can consider it again.

[1] https://lkml.kernel.org/r/4da9da2f-73e4-45fd-b62f-a8a513314057@redhat.com

Link: https://lkml.kernel.org/r/20240605091710.38961-1-david@redhat.com
Fixes: ba42b524a040 ("mm: init_mlocked_on_free_v3")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: David Wang <00107082@163.com>
Closes: https://lore.kernel.org/lkml/20240528151340.4282-1-00107082@163.com/
Reported-by: Lance Yang <ioworker0@gmail.com>
Closes: https://lkml.kernel.org/r/20240601140917.43562-1-ioworker0@gmail.com
Acked-by: Lance Yang <ioworker0@gmail.com>
Cc: York Jasper Niebuhr <yjnworkstation@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15 10:43:05 -07:00
Peter Xu
8bb592c2ec mm/page_table_check: fix crash on ZONE_DEVICE
Not all pages may apply to pgtable check.  One example is ZONE_DEVICE
pages: they map PFNs directly, and they don't allocate page_ext at all
even if there's struct page around.  One may reference
devm_memremap_pages().

When both ZONE_DEVICE and page-table-check enabled, then try to map some
dax memories, one can trigger kernel bug constantly now when the kernel
was trying to inject some pfn maps on the dax device:

 kernel BUG at mm/page_table_check.c:55!

While it's pretty legal to use set_pxx_at() for ZONE_DEVICE pages for page
fault resolutions, skip all the checks if page_ext doesn't even exist in
pgtable checker, which applies to ZONE_DEVICE but maybe more.

Link: https://lkml.kernel.org/r/20240605212146.994486-1-peterx@redhat.com
Fixes: df4e817b7108 ("mm: page table check")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15 10:43:04 -07:00
Yury Norov
8e5bd4eadd gcc: disable '-Warray-bounds' for gcc-9
'-Warray-bounds' is already disabled for gcc-10+.  Now that we've merged
bitmap_{read,write), I see the following error when building the kernel
with gcc-9.4 (Ubuntu 20.04.4 LTS) for x86_64 allmodconfig:

drivers/pinctrl/pinctrl-cy8c95x0.c: In function `cy8c95x0_read_regs_mask.isra.0':
include/linux/bitmap.h:756:18: error: array subscript [1, 288230376151711744] is outside array bounds of `long unsigned int[1]' [-Werror=array-bounds]
  756 |  value_high = map[index + 1] & BITMAP_LAST_WORD_MASK(start + nbits);
      |               ~~~^~~~~~~~~~~

The immediate reason is that the commit b44759705f7d ("bitmap: make
bitmap_{get,set}_value8() use bitmap_{read,write}()") switched the
bitmap_get_value8() to an alias of bitmap_read(); the same for 'set'.

Now; the code that triggers Warray-bounds, calls the function like this:

  #define MAX_BANK 8
  #define BANK_SZ 8
  #define MAX_LINE        (MAX_BANK * BANK_SZ)
  DECLARE_BITMAP(tval, MAX_LINE); // 64-bit map: unsigned long tval[1]

  read_val |= bitmap_get_value8(tval, i * BANK_SZ) & ~bits;

bitmap_read() is implemented such that it may conditionally dereference a
pointer beyond the boundary like this:

	unsigned long offset = start % BITS_PER_LONG;
        unsigned long space = BITS_PER_LONG - offset;

        if (space >= nbits)
                return (map[index] >> offset) & BITMAP_LAST_WORD_MASK(nbits);

        value_low = map[index] & BITMAP_FIRST_WORD_MASK(start);
        value_high = map[index + 1] & BITMAP_LAST_WORD_MASK(start + nbits);
        return (value_low >> offset) | (value_high << space);

In case of bitmap_get_value8(), it's impossible to violate the boundary
because 'space >= nbits' is never the true for byte-aligned 8-bit access. 
So, this is clearly a false-positive.

The same type of false-positives break my allmodconfig build in many
places.  gcc-8, is clear, however.

Link: https://lkml.kernel.org/r/20240522225830.1201778-1-yury.norov@gmail.com
Fixes: b44759705f7d ("bitmap: make bitmap_{get,set}_value8() use bitmap_{read,write}()")
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Yoann Congal <yoann.congal@smile.fr>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15 10:43:04 -07:00
Joseph Qi
685d03c379 ocfs2: fix NULL pointer dereference in ocfs2_abort_trigger()
bdev->bd_super has been removed and commit 8887b94d9322 change the usage
from bdev->bd_super to b_assoc_map->host->i_sb.  Since ocfs2 hasn't set
bh->b_assoc_map, it will trigger NULL pointer dereference when calling
into ocfs2_abort_trigger().

Actually this was pointed out in history, see commit 74e364ad1b13.  But
I've made a mistake when reviewing commit 8887b94d9322 and then
re-introduce this regression.

Since we cannot revive bdev in buffer head, so fix this issue by
initializing all types of ocfs2 triggers when fill super, and then get the
specific ocfs2 trigger from ocfs2_caching_info when access journal.

[joseph.qi@linux.alibaba.com: v2]
  Link: https://lkml.kernel.org/r/20240602112045.1112708-1-joseph.qi@linux.alibaba.com
Link: https://lkml.kernel.org/r/20240530110630.3933832-2-joseph.qi@linux.alibaba.com
Fixes: 8887b94d9322 ("ocfs2: stop using bdev->bd_super for journal error logging")
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>	[6.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15 10:43:04 -07:00
Joseph Qi
58f7e1e2c9 ocfs2: fix NULL pointer dereference in ocfs2_journal_dirty()
bdev->bd_super has been removed and commit 8887b94d9322 change the usage
from bdev->bd_super to b_assoc_map->host->i_sb.  This introduces the
following NULL pointer dereference in ocfs2_journal_dirty() since
b_assoc_map is still not initialized.  This can be easily reproduced by
running xfstests generic/186, which simulate no more credits.

[  134.351592] BUG: kernel NULL pointer dereference, address: 0000000000000000
...
[  134.355341] RIP: 0010:ocfs2_journal_dirty+0x14f/0x160 [ocfs2]
...
[  134.365071] Call Trace:
[  134.365312]  <TASK>
[  134.365524]  ? __die_body+0x1e/0x60
[  134.365868]  ? page_fault_oops+0x13d/0x4f0
[  134.366265]  ? __pfx_bit_wait_io+0x10/0x10
[  134.366659]  ? schedule+0x27/0xb0
[  134.366981]  ? exc_page_fault+0x6a/0x140
[  134.367356]  ? asm_exc_page_fault+0x26/0x30
[  134.367762]  ? ocfs2_journal_dirty+0x14f/0x160 [ocfs2]
[  134.368305]  ? ocfs2_journal_dirty+0x13d/0x160 [ocfs2]
[  134.368837]  ocfs2_create_new_meta_bhs.isra.51+0x139/0x2e0 [ocfs2]
[  134.369454]  ocfs2_grow_tree+0x688/0x8a0 [ocfs2]
[  134.369927]  ocfs2_split_and_insert.isra.67+0x35c/0x4a0 [ocfs2]
[  134.370521]  ocfs2_split_extent+0x314/0x4d0 [ocfs2]
[  134.371019]  ocfs2_change_extent_flag+0x174/0x410 [ocfs2]
[  134.371566]  ocfs2_add_refcount_flag+0x3fa/0x630 [ocfs2]
[  134.372117]  ocfs2_reflink_remap_extent+0x21b/0x4c0 [ocfs2]
[  134.372994]  ? inode_update_timestamps+0x4a/0x120
[  134.373692]  ? __pfx_ocfs2_journal_access_di+0x10/0x10 [ocfs2]
[  134.374545]  ? __pfx_ocfs2_journal_access_di+0x10/0x10 [ocfs2]
[  134.375393]  ocfs2_reflink_remap_blocks+0xe4/0x4e0 [ocfs2]
[  134.376197]  ocfs2_remap_file_range+0x1de/0x390 [ocfs2]
[  134.376971]  ? security_file_permission+0x29/0x50
[  134.377644]  vfs_clone_file_range+0xfe/0x320
[  134.378268]  ioctl_file_clone+0x45/0xa0
[  134.378853]  do_vfs_ioctl+0x457/0x990
[  134.379422]  __x64_sys_ioctl+0x6e/0xd0
[  134.379987]  do_syscall_64+0x5d/0x170
[  134.380550]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  134.381231] RIP: 0033:0x7fa4926397cb
[  134.381786] Code: 73 01 c3 48 8b 0d bd 56 38 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8d 56 38 00 f7 d8 64 89 01 48
[  134.383930] RSP: 002b:00007ffc2b39f7b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  134.384854] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fa4926397cb
[  134.385734] RDX: 00007ffc2b39f7f0 RSI: 000000004020940d RDI: 0000000000000003
[  134.386606] RBP: 0000000000000000 R08: 00111a82a4f015bb R09: 00007fa494221000
[  134.387476] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[  134.388342] R13: 0000000000f10000 R14: 0000558e844e2ac8 R15: 0000000000f10000
[  134.389207]  </TASK>

Fix it by only aborting transaction and journal in ocfs2_journal_dirty()
now, and leave ocfs2_abort() later when detecting an aborted handle,
e.g. start next transaction. Also log the handle details in this case.

Link: https://lkml.kernel.org/r/20240530110630.3933832-1-joseph.qi@linux.alibaba.com
Fixes: 8887b94d9322 ("ocfs2: stop using bdev->bd_super for journal error logging")
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>	[6.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15 10:43:04 -07:00
Tim Harvey
e1b4622efb arm64: dts: freescale: imx8mp-venice-gw73xx-2x: fix BT shutdown GPIO
Fix the invalid BT shutdown GPIO (gpio1_io3 not gpio4_io16)

Fixes: 716ced308234 ("arm64: dts: freescale: Add imx8mp-venice-gw73xx-2x")
Signed-off-by: Tim Harvey <tharvey@gateworks.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2024-06-15 16:35:57 +08:00
Waiman Long
46e27b9961 efi/arm64: Fix kmemleak false positive in arm64_efi_rt_init()
The kmemleak code sometimes complains about the following leak:

unreferenced object 0xffff8000102e0000 (size 32768):
  comm "swapper/0", pid 1, jiffies 4294937323 (age 71.240s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000db9a88a3>] __vmalloc_node_range+0x324/0x450
    [<00000000ff8903a4>] __vmalloc_node+0x90/0xd0
    [<000000001a06634f>] arm64_efi_rt_init+0x64/0xdc
    [<0000000007826a8d>] do_one_initcall+0x178/0xac0
    [<0000000054a87017>] do_initcalls+0x190/0x1d0
    [<00000000308092d0>] kernel_init_freeable+0x2c0/0x2f0
    [<000000003e7b99e0>] kernel_init+0x28/0x14c
    [<000000002246af5b>] ret_from_fork+0x10/0x20

The memory object in this case is for efi_rt_stack_top and is allocated
in an initcall. So this is certainly a false positive. Mark the object
as not a leak to quash it.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-06-15 10:25:02 +02:00
Ard Biesheuvel
75dde792d6 efi/x86: Free EFI memory map only when installing a new one.
The logic in __efi_memmap_init() is shared between two different
execution flows:
- mapping the EFI memory map early or late into the kernel VA space, so
  that its entries can be accessed;
- the x86 specific cloning of the EFI memory map in order to insert new
  entries that are created as a result of making a memory reservation
  via a call to efi_mem_reserve().

In the former case, the underlying memory containing the kernel's view
of the EFI memory map (which may be heavily modified by the kernel
itself on x86) is not modified at all, and the only thing that changes
is the virtual mapping of this memory, which is different between early
and late boot.

In the latter case, an entirely new allocation is created that carries a
new, updated version of the kernel's view of the EFI memory map. When
installing this new version, the old version will no longer be
referenced, and if the memory was allocated by the kernel, it will leak
unless it gets freed.

The logic that implements this freeing currently lives on the code path
that is shared between these two use cases, but it should only apply to
the latter. So move it to the correct spot.

While at it, drop the dummy definition for non-x86 architectures, as
that is no longer needed.

Cc: <stable@vger.kernel.org>
Fixes: f0ef6523475f ("efi: Fix efi_memmap_alloc() leaks")
Tested-by: Ashish Kalra <Ashish.Kalra@amd.com>
Link: https://lore.kernel.org/all/36ad5079-4326-45ed-85f6-928ff76483d3@amd.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-06-15 10:25:02 +02:00
Ard Biesheuvel
a1439d8948 efi/arm: Disable LPAE PAN when calling EFI runtime services
EFI runtime services are remapped into the lower 1 GiB of virtual
address space at boot, so they are guaranteed to be able to co-exist
with the kernel virtual mappings without the need to allocate space for
them in the kernel's vmalloc region, which is rather small.

This means those mappings are covered by TTBR0 when LPAE PAN is enabled,
and so 'user' access must be enabled while such calls are in progress.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-06-15 10:25:02 +02:00
Liu Ying
bcdea3e81e arm: dts: imx53-qsb-hdmi: Disable panel instead of deleting node
We cannot use /delete-node/ directive to delete a node in a DT
overlay.  The node won't be deleted effectively.  Instead, set
the node's status property to "disabled" to achieve something
similar.

Fixes: eeb403df953f ("ARM: dts: imx53-qsb: add support for the HDMI expander")
Signed-off-by: Liu Ying <victor.liu@nxp.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2024-06-15 16:07:15 +08:00
Marek Vasut
c03984d43a arm64: dts: imx8mp: Fix TC9595 input clock on DH i.MX8M Plus DHCOM SoM
The IMX8MP_CLK_CLKOUT2 supplies the TC9595 bridge with 13 MHz reference
clock. The IMX8MP_CLK_CLKOUT2 is supplied from IMX8MP_AUDIO_PLL2_OUT.
The IMX8MP_CLK_CLKOUT2 operates only as a power-of-two divider, and the
current 156 MHz is not power-of-two divisible to achieve 13 MHz.

To achieve 13 MHz output from IMX8MP_CLK_CLKOUT2, set IMX8MP_AUDIO_PLL2_OUT
to 208 MHz, because 208 MHz / 16 = 13 MHz.

Fixes: 20d0b83e712b ("arm64: dts: imx8mp: Add TC9595 bridge on DH electronics i.MX8M Plus DHCOM")
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2024-06-15 16:00:43 +08:00
Takashi Sakamoto
893098b2af firewire: core: record card index in bus_reset_handle tracepoints event
The bus reset event occurs in the bus managed by one of 1394 OHCI
controller in Linux system, however the existing tracepoints events has
the lack of data about it to distinguish the issued hardware from the
others.

This commit adds card_index member into event structure to store the index
of host controller in use, and prints it.

Link: https://lore.kernel.org/r/20240613131440.431766-9-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15 14:59:26 +09:00
Takashi Sakamoto
7507dbc46b firewire: core: record card index in tracepoinrts events derived from bus_reset_arrange_template
The asynchronous transmission of phy packet is initiated on one of 1394
OHCI controller, however the existing tracepoints events has the lack of
data about it.

This commit adds card_index member into event structure to store the index
of host controller in use, and prints it.

Link: https://lore.kernel.org/r/20240613131440.431766-8-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15 14:59:17 +09:00
Takashi Sakamoto
abbb4bd96d firewire: core: record card index in async_phy_inbound tracepoints event
The asynchronous transmission of phy packet is initiated on one of 1394
OHCI controller, however the existing tracepoints events has the lack of
data about it.

This commit adds card_index member into event structure to store the index
of host controller in use, and prints it.

Link: https://lore.kernel.org/r/20240613131440.431766-7-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15 14:59:17 +09:00
Takashi Sakamoto
810f2aa835 firewire: core: record card index in async_phy_outbound_complete tracepoints event
The asynchronous transmission of phy packet is initiated on one of 1394
OHCI controller, however the existing tracepoints events has the lack of
data about it.

This commit adds card_index member into event structure to store the index
of host controller in use, and prints it.

Link: https://lore.kernel.org/r/20240613131440.431766-6-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15 14:59:17 +09:00
Takashi Sakamoto
3cb44a72a3 firewire: core: record card index in async_phy_outbound_initiate tracepoints event
The asynchronous transaction is initiated on one of 1394 OHCI
controller, however the existing tracepoints events has the lack of data
about it.

This commit adds card_index member into event structure to store the index
of host controller in use, and prints it.

Link: https://lore.kernel.org/r/20240613131440.431766-5-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15 14:59:17 +09:00
Takashi Sakamoto
65ec7ebefe firewire: core: record card index in tracepoinrts events derived from async_inbound_template
The asynchronous transaction is initiated on one of 1394 OHCI controller,
however the existing tracepoints events has the lack of data about it.

This commit adds card_index member into event structure to store the index
of host controller in use, and prints it.

Link: https://lore.kernel.org/r/20240613131440.431766-4-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15 14:59:17 +09:00
Takashi Sakamoto
64e02b64fb firewire: core: record card index in tracepoinrts events derived from async_outbound_initiate_template
The asynchronous transaction is initiated on one of 1394 OHCI controller,
however the existing tracepoints events has the lack of data about it.

This commit adds card_index member into event structure to store the index
of host controller in use, and prints it.

Link: https://lore.kernel.org/r/20240613131440.431766-3-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15 14:59:17 +09:00
Takashi Sakamoto
e7da16abf0 firewire: core: record card index in tracepoinrts events derived from async_outbound_complete_template
The asynchronous transaction is initiated on one of 1394 OHCI controller,
however the existing tracepoints events has the lack of data about it.

This commit adds card_index member into event structure to store the index
of host controller in use, and prints it.

Link: https://lore.kernel.org/r/20240613131440.431766-2-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15 14:59:17 +09:00
Takashi Sakamoto
e789523fe2 firewire: fix website URL in Kconfig
The wiki in kernel.org is no longer updated. This commit replaces the
website URL with the latest one.

Link: https://lore.kernel.org/r/20240613090343.416198-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-06-15 14:59:04 +09:00
Linus Torvalds
44ef20baed s390 updates for 6.10-rc4
- A couple of fixes for regressions resulting from the uncoupling of
   physical vs virtual kernel address spaces: fix the mapping of the
   kernel image using large pages; enforce alignment checks on physical
   addresses before creating large pages
 
 - Update defconfigs
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmZsyVMACgkQjYWKoQLX
 FBiMBwf+Ly2mFUG3LQNPauvjhNkhH5NbkNafQTX9yzuDFn2fvQGUH+IruUosa3WI
 egYWP3Z6W0nfpBGWcDleuWCcSUmUvmEiFFI4q30Ze0Q7IMm9VE14/PjSeplWwstV
 pFwpZIqofvXFMbP7TYw4R6lwTe1MTtXLQSv53SMrIufeq0OnSWpWp/2dL2AC36fe
 QuQCqHDuqbEvoV+3s35+xQ6OMoWcWSfG6gHgBPyQ6brkI39PlZwymdSJFanUUsl3
 DCextvbUJEPSI2M5Dc0XrpOnahrN75Sh+w6FYMBcUxVMMNHdbvUXKDzQK3aPL4sQ
 qNhzZJtlK545PpYEZ4TVDuU7zPQnzw==
 =w9Pa
 -----END PGP SIGNATURE-----

Merge tag 's390-6.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Vasily Gorbik:

 - A couple of fixes for regressions resulting from the uncoupling of
   physical vs virtual kernel address spaces: fix the mapping of the
   kernel image using large pages; enforce alignment checks on physical
   addresses before creating large pages

 - Update defconfigs

* tag 's390-6.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/mm: Restore mapping of kernel image using large pages
  s390/mm: Allow large pages only for aligned physical addresses
  s390: Update defconfigs
2024-06-14 19:27:02 -07:00
Jakub Kicinski
143492fce3 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-06-11 (ice)

This series contains updates to ice driver only.

En-Wei Wu resolves IRQ collision during suspend.

Paul corrects 200Gbps speed being reported as unknown.

Wojciech adds retry mechanism when package download fails.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  ice: implement AQ download pkg retry
  ice: fix 200G link speed message log
  ice: avoid IRQ collision to fix init failure on ACPI S3 resume
====================

Link: https://lore.kernel.org/r/20240613154514.1948785-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14 19:05:38 -07:00
Linus Torvalds
d4332da0f2 drm fixes for 6.10-rc4
core:
 - Werror Kconfig fix
 
 panel:
 - add orientation quirk for Aya Neo KUN
 - fix runtime warning on panel/bridge release
 
 nouveau:
 - remove unused struct
 - fix wq crash on cards with no display
 
 amdgpu:
 - fix bo release clear page warning
 
 xe:
 - update MAINTAINERS
 - Use correct forcewake assertions.
 - Assert that VRAM provisioning is only done on DGFX.
 - Flush render caches before user-fence signalling on all engines.
 - Move the disable_c6 call since it was sometimes never called.
 
 exynos:
 - fix regression with fallback mode
 - fix EDID related memory leak
 - remove redundant code
 
 komeda:
 - fix debugfs conditional compilations
 - check pointer error value
 
 renesas:
 - atomic shutdown fix
 
 mediatek:
 - atomic shutdown fix
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmZsvfcACgkQDHTzWXnE
 hr5xDQ/+LjrK7jOp+bI4UYmu3MOieUE49iOEpKS8aNpMoKtISMca6ky/VmJPYBzQ
 awN1AAEfdWNa2IqFGBZAsSCXaPuIDDqsWa8PdsXpVNKGN6AkA65TDN2a1eWehTQa
 T8chEnYkQXLFmJxSpMvkpzmiEBnrwi6tvXZ0aGb0iUyyeYZm+tJj91ZW4GbJ4V8+
 /sCuyCi/Nexx9O2uDAQPdqvillWF1nIO/3+2e+YwBVJO4YKMceZunKvNd661JOnq
 idUml6bYvCneZh1SvIIl5gj9Qze87iCa4QR1V5jngJaSjdlBeVpm+Z/r/FdAVPd3
 3Hbd1wFPlHP9UbGE/Q30RVcYUWyUu4+/w3kw46dpNbfd2UQkPy6M9UJTFwWzMrXP
 FmFAA1XahPHtCqBnCOxCZI2FZGoB2Jq7tI/NRspnBQ+gcy0LiU2pNfSkevKDN/Eu
 7LZeq/H2NGPuBTxe0TigsrU0ZEsqWy84tj1eTPDzWVuM8p/fogBP27vDQecLNpDG
 xQw7hmwtKfI3Hd4v27NC+788SZ3PgGKR7L7QSsu30KCK5bb7cybN6TCZWPl5mXo7
 3p60G5FuQX4c131iuSeoAu1Kxu5g9KFbQW1z1hR37kwfPq1bm6mzTFDZys8SSx7d
 qCc3QvJHWUvCug8Mu4RU0h/ZCAMIj62rgMp5XtLqOT9i6R/4x90=
 =ZzdU
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2024-06-15' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Weekly fixes. Seems a little quieter than usual, but still a bunch of
  stuff across the board. Mostly xe, some exynos and nouveau fixes.

  core:
   - Werror Kconfig fix

  panel:
   - add orientation quirk for Aya Neo KUN
   - fix runtime warning on panel/bridge release

  nouveau:
   - remove unused struct
   - fix wq crash on cards with no display

  amdgpu:
   - fix bo release clear page warning

  xe:
   - update MAINTAINERS
   - Use correct forcewake assertions
   - Assert that VRAM provisioning is only done on DGFX
   - Flush render caches before user-fence signalling on all engines
   - Move the disable_c6 call since it was sometimes never called

  exynos:
   - fix regression with fallback mode
   - fix EDID related memory leak
   - remove redundant code

  komeda:
   - fix debugfs conditional compilations
   - check pointer error value

  renesas:
   - atomic shutdown fix

  mediatek:
   - atomic shutdown fix"

* tag 'drm-fixes-2024-06-15' of https://gitlab.freedesktop.org/drm/kernel:
  arm/komeda: Remove all CONFIG_DEBUG_FS conditional compilations
  drm/xe: move disable_c6 call
  drm/xe: flush engine buffers before signalling user fence on all engines
  drm/xe/pf: Assert LMEM provisioning is done only on DGFX
  drm/xe/xe_gt_idle: use GT forcewake domain assertion
  drm/mediatek: Call drm_atomic_helper_shutdown() at shutdown time
  drm: renesas: shmobile: Call drm_atomic_helper_shutdown() at shutdown time
  drm/nouveau: remove unused struct 'init_exec'
  drm/nouveau: don't attempt to schedule hpd_work on headless cards
  drm/amdgpu: Fix the BO release clear memory warning
  drm/bridge/panel: Fix runtime warning on panel bridge release
  drm/komeda: check for error-valued pointer
  drm: panel-orientation-quirks: Add quirk for Aya Neo KUN
  drm/exynos/vidi: fix memory leak in .get_modes()
  drm/exynos: dp: drop driver owner initialization
  drm/exynos: hdmi: report safe 640x480 mode as a fallback when no EDID found
  drm: have config DRM_WERROR depend on !WERROR
  MAINTAINERS: Update Xe driver maintainers
  MAINTAINERS: update Xe driver maintainers
2024-06-14 18:57:28 -07:00
Linus Torvalds
68132b3536 VFIO fixes for v6.10-rc4
- Fix long standing lockdep issue of using remap_pfn_range() from
    the vfio-pci fault handler for mapping device MMIO.  Commit
    ba168b52bf8e ("mm: use rwsem assertion macros for mmap_lock") now
    exposes this as a warning forcing this to be addressed.
 
    remap_pfn_range() was used here to efficiently map the entire vma,
    but it really never should have been used in the fault handler and
    doesn't handle concurrency, which introduced complex locking.  We
    also needed to track vmas mapping the device memory in order to zap
    those vmas when the memory is disabled resulting in a vma list.
 
    Instead of all that mess, setup an address space on the device fd
    such that we can use unmap_mapping_range() for zapping to avoid
    the tracking overhead and use the standard vmf_insert_pfn() to
    insert mappings on fault.  For now we'll iterate the vma and
    opportunistically try to insert mappings for the entire vma.  This
    aligns with typical use cases, but hopefully in the future we can
    drop the iterative approach and make use of huge_fault instead,
    once vmf_insert_pfn{pud,pmd}() learn to handle pfnmaps.
    (Alex Williamson)
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmZsu1AbHGFsZXgud2ls
 bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsiliUQAIYXnii7HumOdcMIPnre
 twv9K4JFixpaH1bRCKd+bCoAH/7RZDRj0oSKa5OHtYiOfnlalzLAKCq84BRiUndj
 cj9gYlpSlEpZl4Aa9gr2YR9ng5poQjeVq5GIzWQGZwszX2hYCk7Bz5zKbdG95zAm
 qVOIFYTU+i+9D5NRMXwXEnDyKmazwuCwbmlYp4nZijMsP3/rNAy3dmDZ6ljEhIa0
 p0QKvRi7L3BbTu1Zy6PXEowo1neF4d8KgViY7B1eYLsR48awmnXzMcHmvcBk14Mb
 79GwripnXI2SXiGdzzqt0ODuVHs2xyV4P/Ddb6lUpZhO5KgpuxVblW5NMKSKm9ta
 /12WrknWlpIKcPljWVgyDU70O9Umm3f39lUQ6Ns4e/ieS8c8GHC+5Nl5Q8PSEpqj
 VYbSRuObwXSa6qzyB6O2QtNaJ8B55/bjl+FSoN4qnfccprZ7R4k96O/4hu+StZOZ
 4wNaQXyB0FIakelotBy9T/ZbI4YQmhlC4FcsDXugz4wOdVUkwOVVFZ8R7jAUTNxn
 Ty8RzBTSAX4alvpYhe+WFBLq3TnS+c8J7tK1q9ihfwjrEWd3gnx4M+if60XPpPHt
 WxKJnvSzYvasO32AF2yTSmm3S5NlEQvQ2LTY3yfsGHfk0x/W4FEi78KcnjLZLTFk
 dhf3X5qIIw8AGrLHKlgKcu61
 =bSXX
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v6.10-rc4' of https://github.com/awilliam/linux-vfio

Pull VFIO fixes from Alex Williamson:
 "Fix long standing lockdep issue of using remap_pfn_range() from the
  vfio-pci fault handler for mapping device MMIO. Commit ba168b52bf8e
  ("mm: use rwsem assertion macros for mmap_lock") now exposes this as a
  warning forcing this to be addressed.

  remap_pfn_range() was used here to efficiently map the entire vma, but
  it really never should have been used in the fault handler and doesn't
  handle concurrency, which introduced complex locking. We also needed
  to track vmas mapping the device memory in order to zap those vmas
  when the memory is disabled resulting in a vma list.

  Instead of all that mess, setup an address space on the device fd
  such that we can use unmap_mapping_range() for zapping to avoid the
  tracking overhead and use the standard vmf_insert_pfn() to insert
  mappings on fault.

  For now we'll iterate the vma and opportunistically try to insert
  mappings for the entire vma. This aligns with typical use cases, but
  hopefully in the future we can drop the iterative approach and make
  use of huge_fault instead, once vmf_insert_pfn{pud,pmd}() learn to
  handle pfnmaps"

* tag 'vfio-v6.10-rc4' of https://github.com/awilliam/linux-vfio:
  vfio/pci: Insert full vma on mmap'd MMIO fault
  vfio/pci: Use unmap_mapping_range()
  vfio: Create vfio_fs_type with inode per device
2024-06-14 18:46:53 -07:00
Jakub Kicinski
7ed352d34f netdev-genl: fix error codes when outputting XDP features
-EINVAL will interrupt the dump. The correct error to return
if we have more data to dump is -EMSGSIZE.

Discovered by doing:

  for i in `seq 80`; do ip link add type veth; done
  ./cli.py --dbg-small-recv 5300 --spec netdev.yaml --dump dev-get >> /dev/null
  [...]
     nl_len = 64 (48) nl_flags = 0x0 nl_type = 19
     nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
  	error: -22

Fixes: d3d854fd6a1d ("netdev-genl: create a simple family for netdev stuff")
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240613213044.3675745-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14 18:04:29 -07:00
Jakub Kicinski
c64da10adb bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZmykPwAKCRDbK58LschI
 g7LOAQDVPkJ9k50/xrWIBtgvkGq1jCrMlpwEh49QYO0xoqh1IgEA+6Xje9jCIsdp
 AHz9WmZ6G0EpTuDgFq50K1NVZ7MgSQE=
 =zKfv
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2024-06-14

We've added 8 non-merge commits during the last 2 day(s) which contain
a total of 9 files changed, 92 insertions(+), 11 deletions(-).

The main changes are:

1) Silence a syzkaller splat under CONFIG_DEBUG_NET=y in pskb_pull_reason()
   triggered via __bpf_try_make_writable(), from Florian Westphal.

2) Fix removal of kfuncs during linking phase which then throws a kernel
   build warning via resolve_btfids about unresolved symbols,
   from Tony Ambardar.

3) Fix a UML x86_64 compilation failure from BPF as pcpu_hot symbol
   is not available on User Mode Linux, from Maciej Żenczykowski.

4) Fix a register corruption in reg_set_min_max triggering an invariant
   violation in BPF verifier, from Daniel Borkmann.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Harden __bpf_kfunc tag against linker kfunc removal
  compiler_types.h: Define __retain for __attribute__((__retain__))
  bpf: Avoid splat in pskb_pull_reason
  bpf: fix UML x86_64 compile failure
  selftests/bpf: Add test coverage for reg_set_min_max handling
  bpf: Reduce stack consumption in check_stack_write_fixed_off
  bpf: Fix reg_set_min_max corruption of fake_reg
  MAINTAINERS: mailmap: Update Stanislav's email address
====================

Link: https://lore.kernel.org/r/20240614203223.26500-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-14 17:57:10 -07:00
Dave Airlie
9f0a86492a Merge tag 'drm-misc-fixes-2024-06-14' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
drm-misc-fixes for v6.10-rc4:
- Kconfig fix for WERROR.
- Add panel quirk for Aya Neo KUN
- Small bugfixes in komeda, bridge/panel, amdgpu, nouveau.
- Remove unused nouveau struct.
- Call drm_atomic_helper_shutdown for shmobile and mediatek on shutdown.
- Remove DEBUGFS ifdefs from komeda.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/941c0552-3614-4af1-b04a-0a62c99fd7fb@linux.intel.com
2024-06-15 06:52:56 +10:00
Linus Torvalds
c286c21ff9 block-6.10-20240614
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmZsY6wQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpjVoD/9emDOqfWGbYQ2DQaIyXYbWaUHteBrg1d3S
 U8fRQsipHY0F3a+3eoY6Epjyt/8fq3jK4g/0/7hs+wYUtWgEXJMjl9lql7gZl4rS
 3kGlk9CmwFEm5lYQAYCIDNzKr6814naGi6yYhCnstyrsSzawjr3dYP3vr9soQL3h
 TvIv3PPRy6xg03mKEH+X2OSWMa4CB/85WrcKF2L/TrPD1/v3hbLZ2exfK1XX5nhl
 wtM9e7fUoP0bCOx/5GKd01uJJJjrC4dhsLDRTXM1ayVrOO0jREIBi27Ho95TdzfU
 MPwwWosGXgNzBCDKzZyBw1i1LFuHb+7waU/FgvpU5vXS8xaYBvm/4pWnDEGJlSAQ
 8Hwiwl0uiaok9zpMCDexiWl7LdNilcS2wMxKptdpENQ9mzC+doxWqMhQLY2eRprp
 eaBdLsHngS+GCiPePdZXuFMJ9Rz4/J9MEOBWmhicBDjwITLCdF6jGQ1/3tU/vQzp
 wvBjc2qjOQ46uRl+HgJVNe4E2G0BZdF1b17nYEpNRatmIWMCoydZtrLNuve2ZMkG
 icGP31nQe9YlgTMm/ayS2qgLD4yDFFpMeD2Ff3xXmtWZyyJ9L2UdOSUUeCZANmF8
 7OVbQrHAnHi61CINGf0JHVkx1xKy/o3Z06+FEypycxlE849k8sentSHbqWsjWzMP
 4jCICEwsHg==
 =p7jM
 -----END PGP SIGNATURE-----

Merge tag 'block-6.10-20240614' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - NVMe pull request via Keith:
     - Discard double free on error conditions (Chunguang)
     - Target Fixes (Daniel)
     - Namespace detachment regression fix (Keith)

 - Fix for an issue with flush requests and queuelist reuse (Chengming)

 - nbd sparse annotation fixes (Christoph)

 - unmap and free bio mapped data via submitter (Anuj)

 - loop discard/fallocate unsupported fix (Cyril)

 - Fix for the zoned write plugging added in this release (Damien)

 - sed-opal wrong address fix (Su)

* tag 'block-6.10-20240614' of git://git.kernel.dk/linux:
  loop: Disable fallocate() zero and discard if not supported
  nvme: fix namespace removal list
  nbd: Remove __force casts
  nvmet: always initialize cqe.result
  nvmet-passthru: propagate status from id override functions
  nvme: avoid double free special payload
  block: unmap and free user mapped integrity via submitter
  block: fix request.queuelist usage in flush
  block: Optimize disk zone resource cleanup
  block: sed-opal: avoid possible wrong address reference in read_sed_opal_key()
2024-06-14 11:41:50 -07:00
Linus Torvalds
ac3cb72aea io_uring-6.10-20240614
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmZsY78QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgppBzEADdmuV9pgx3fGp1f+4XVdzu053z7HWFj6RU
 gQ0cLAL5p0Yx5s+wYG64CC2eluYy7cPxvrfCXoMAdNnFsSSYIPDDerI3ZZ6vt5ft
 m0yHJQx3V0j9ZCqcGFw9PxkOmCOx3jPKe+xkclE1rLIht/lNJd3GaNT8QuXTGcKz
 FqRJ1Jd2a+Rt1/QXJhA/HjFGLTHrpAzFNfmzhCmoVtTJQ9fjw/JNJ7jzrX9p4V8i
 guUBawWVCOOOED1ieg84qCZKIRzaIhgjvo8klELN1dvEKyjqzJUBI7qg1S6GePwy
 rGMuWJHQcOnXvp9+PZ3V9Zs6hGB/NHzOuNJMZ+n8LEm13UGBEcoysRwdcfcZh+jE
 D8NWxofsKeAveoNiKXIZxuZvYQN0RG7+BjBTBbTsr6z/Q/s4r4ll+baQ4lLReiPh
 oC4VKwsTQ3hzpHPomY5r9ZEI3JQOuM8+I2ILLZ26xP+lWyTyrQ8CUEiOCVBPhx8A
 TLRPydRp7V453E1/DjlB7L/oOC6ZBWZqg72kyBj3En8j8KVB1NRJcIuaBrMn+Fi4
 0cgUCmOJQz1ep38j43vwGuRsCVWoZNZ18L2JhhViMFnPh6rw/SlcLSh00ei4cl7v
 BEEJkeYNGyXxkozp7pj7I8CAm3gM3nyy6ULFd9X4vKVBnvnj7aTnQIH4hKd29z/u
 UoNwHCHiSA==
 =bA5c
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-6.10-20240614' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:
 "Two fixes from Pavel headed to stable:

   - Ensure that the task state is correct before attempting to grab a
     mutex

   - Split cancel sequence flag into a separate variable, as it can get
     set by someone not owning the request (but holding the ctx lock)"

* tag 'io_uring-6.10-20240614' of git://git.kernel.dk/linux:
  io_uring: fix cancellation overwriting req->flags
  io_uring/rsrc: don't lock while !TASK_RUNNING
2024-06-14 11:17:24 -07:00