1234971 Commits

Author SHA1 Message Date
Peter Xu
4980e837ca mm/pagemap: fix wr-protect even if PM_SCAN_WP_MATCHING not set
The new pagemap ioctl contains a fast path for wr-protections without
looking into category masks.  It forgets to check PM_SCAN_WP_MATCHING
before applying the wr-protections.  It can cause, e.g., pte markers
installed on archs that do not even support uffd wr-protect.

WARNING: CPU: 0 PID: 5059 at mm/memory.c:1520 zap_pte_range mm/memory.c:1520 [inline]

Link: https://lkml.kernel.org/r/20231116201547.536857-3-peterx@redhat.com
Fixes: 12f6b01a0bcb ("fs/proc/task_mmu: add fast paths to get/clear PAGE_IS_WRITTEN flag")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: syzbot+7ca4b2719dc742b8d0a4@syzkaller.appspotmail.com
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Andrei Vagin <avagin@gmail.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:45 -08:00
Peter Xu
0dff1b407d mm/pagemap: fix ioctl(PAGEMAP_SCAN) on vma check
Patch series "mm/pagemap: A few fixes to the recent PAGEMAP_SCAN".

This series should fix two known reports from syzbot on the new
PAGEMAP_SCAN ioctl():

https://lore.kernel.org/all/000000000000b0e576060a30ee3b@google.com/
https://lore.kernel.org/all/000000000000773fa7060a31e2cc@google.com/

The 3rd patch is something I found when testing these patches.


This patch (of 3):

The new ioctl(PAGEMAP_SCAN) relies on vma wr-protect capability provided
by userfault, however in the vma test it didn't explicitly require the vma
to have wr-protect function enabled, even if PM_SCAN_WP_MATCHING flag is
set.

It means the pagemap code can now apply uffd-wp bit to a page in the vma
even if not registered to userfaultfd at all.

Then in whatever way as long as the pte got written and page fault
resolved, we'll apply the write bit even if uffd-wp bit is set.  We'll see
a pte that has both UFFD_WP and WRITE bit set.  Anything later that looks
up the pte for uffd-wp bit will trigger the warning:

WARNING: CPU: 1 PID: 5071 at arch/x86/include/asm/pgtable.h:403 pte_uffd_wp arch/x86/include/asm/pgtable.h:403 [inline]

Fix it by doing proper check over the vma attributes when
PM_SCAN_WP_MATCHING is specified.

Link: https://lkml.kernel.org/r/20231116201547.536857-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20231116201547.536857-2-peterx@redhat.com
Fixes: 52526ca7fdb9 ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: syzbot+e94c5aaf7890901ebf9b@syzkaller.appspotmail.com
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:44 -08:00
Roman Gushchin
5f79489a73 mm: kmem: properly initialize local objcg variable in current_obj_cgroup()
Erhard reported that the 6.7-rc1 kernel panics on boot if being
built with clang-16. The problem was not reproducible with gcc.

[    5.975049] general protection fault, probably for non-canonical address 0xf555515555555557: 0000 [#1] SMP KASAN PTI
[    5.976422] KASAN: maybe wild-memory-access in range [0xaaaaaaaaaaaaaab8-0xaaaaaaaaaaaaaabf]
[    5.977475] CPU: 3 PID: 1 Comm: systemd Not tainted 6.7.0-rc1-Zen3 #77
[    5.977860] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[    5.977860] RIP: 0010:obj_cgroup_charge_pages+0x27/0x2d5
[    5.977860] Code: 90 90 90 55 41 57 41 56 41 55 41 54 53 89 d5 41 89 f6 49 89 ff 48 b8 00 00 00 00 00 fc ff df 49 83 c7 10 4d3
[    5.977860] RSP: 0018:ffffc9000001fb18 EFLAGS: 00010a02
[    5.977860] RAX: dffffc0000000000 RBX: aaaaaaaaaaaaaaaa RCX: ffff8883eb9a8b08
[    5.977860] RDX: 0000000000000005 RSI: 0000000000400cc0 RDI: aaaaaaaaaaaaaaaa
[    5.977860] RBP: 0000000000000005 R08: 3333333333333333 R09: 0000000000000000
[    5.977860] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8883eb9a8b18
[    5.977860] R13: 1555555555555557 R14: 0000000000400cc0 R15: aaaaaaaaaaaaaaba
[    5.977860] FS:  00007f2976438b40(0000) GS:ffff8883eb980000(0000) knlGS:0000000000000000
[    5.977860] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.977860] CR2: 00007f29769e0060 CR3: 0000000107222003 CR4: 0000000000370eb0
[    5.977860] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    5.977860] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    5.977860] Call Trace:
[    5.977860]  <TASK>
[    5.977860]  ? __die_body+0x16/0x75
[    5.977860]  ? die_addr+0x4a/0x70
[    5.977860]  ? exc_general_protection+0x1c9/0x2d0
[    5.977860]  ? cgroup_mkdir+0x455/0x9fb
[    5.977860]  ? __x64_sys_mkdir+0x69/0x80
[    5.977860]  ? asm_exc_general_protection+0x26/0x30
[    5.977860]  ? obj_cgroup_charge_pages+0x27/0x2d5
[    5.977860]  obj_cgroup_charge+0x114/0x1ab
[    5.977860]  pcpu_alloc+0x1a6/0xa65
[    5.977860]  ? mem_cgroup_css_alloc+0x1eb/0x1140
[    5.977860]  ? cgroup_apply_control_enable+0x26b/0x7c0
[    5.977860]  mem_cgroup_css_alloc+0x23f/0x1140
[    5.977860]  cgroup_apply_control_enable+0x26b/0x7c0
[    5.977860]  ? cgroup_kn_set_ugid+0x2d/0x1a0
[    5.977860]  cgroup_mkdir+0x455/0x9fb
[    5.977860]  ? __cfi_cgroup_mkdir+0x10/0x10
[    5.977860]  kernfs_iop_mkdir+0x130/0x170
[    5.977860]  vfs_mkdir+0x405/0x530
[    5.977860]  do_mkdirat+0x188/0x1f0
[    5.977860]  __x64_sys_mkdir+0x69/0x80
[    5.977860]  do_syscall_64+0x7d/0x100
[    5.977860]  ? do_syscall_64+0x89/0x100
[    5.977860]  ? do_syscall_64+0x89/0x100
[    5.977860]  ? do_syscall_64+0x89/0x100
[    5.977860]  ? do_syscall_64+0x89/0x100
[    5.977860]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[    5.977860] RIP: 0033:0x7f297671defb
[    5.977860] Code: 8b 05 39 7f 0d 00 bb ff ff ff ff 64 c7 00 16 00 00 00 e9 61 ff ff ff e8 23 0c 02 00 0f 1f 00 f3 0f 1e fa b88
[    5.977860] RSP: 002b:00007ffee6242bb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
[    5.977860] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f297671defb
[    5.977860] RDX: 0000000000000000 RSI: 00000000000001ed RDI: 000055c6b449f0e0
[    5.977860] RBP: 00007ffee6242bf0 R08: 000000000000000e R09: 0000000000000000
[    5.977860] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c6b445db80
[    5.977860] R13: 00000000000003a0 R14: 00007f2976a68651 R15: 00000000000003a0
[    5.977860]  </TASK>
[    5.977860] Modules linked in:
[    6.014095] ---[ end trace 0000000000000000 ]---
[    6.014701] RIP: 0010:obj_cgroup_charge_pages+0x27/0x2d5
[    6.015348] Code: 90 90 90 55 41 57 41 56 41 55 41 54 53 89 d5 41 89 f6 49 89 ff 48 b8 00 00 00 00 00 fc ff df 49 83 c7 10 4d3
[    6.017575] RSP: 0018:ffffc9000001fb18 EFLAGS: 00010a02
[    6.018255] RAX: dffffc0000000000 RBX: aaaaaaaaaaaaaaaa RCX: ffff8883eb9a8b08
[    6.019120] RDX: 0000000000000005 RSI: 0000000000400cc0 RDI: aaaaaaaaaaaaaaaa
[    6.019983] RBP: 0000000000000005 R08: 3333333333333333 R09: 0000000000000000
[    6.020849] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8883eb9a8b18
[    6.021747] R13: 1555555555555557 R14: 0000000000400cc0 R15: aaaaaaaaaaaaaaba
[    6.022609] FS:  00007f2976438b40(0000) GS:ffff8883eb980000(0000) knlGS:0000000000000000
[    6.023593] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.024296] CR2: 00007f29769e0060 CR3: 0000000107222003 CR4: 0000000000370eb0
[    6.025279] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    6.026139] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    6.027000] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

Actually the problem is caused by uninitialized local variable in
current_obj_cgroup().  If the root memory cgroup is set as an active
memory cgroup for a charging scope (as in the trace, where systemd tries
to create the first non-root cgroup, so the parent cgroup is the root
cgroup), the "for" loop is skipped and uninitialized objcg is returned,
causing a panic down the accounting stack.

The fix is trivial: initialize the objcg variable to NULL unconditionally
before the "for" loop.

[vbabka@suse.cz: remove redundant assignment]
  Link: https://lkml.kernel.org/r/4bd106d5-c3e3-6731-9a74-cff81e2392de@suse.cz
Link: https://lkml.kernel.org/r/20231116025109.3775055-1-roman.gushchin@linux.dev
Fixes: e86828e5446d ("mm: kmem: scoped objcg protection")
Signed-off-by: Roman Gushchin (Cruise) <roman.gushchin@linux.dev>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Closes: https://github.com/ClangBuiltLinux/linux/issues/1959
Tested-by:  Erhard Furtner <erhard_f@mailbox.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:44 -08:00
Liu Shixin
d63385a7d3 mm/kmemleak: move set_track_prepare() outside raw_spinlocks
set_track_prepare() will call __alloc_pages() which attempts to acquire
zone->lock(spinlocks), so move it outside object->lock(raw_spinlocks)
because it's not right to acquire spinlocks while holding raw_spinlocks in
RT mode.

Link: https://lkml.kernel.org/r/20231115082138.2649870-3-liushixin2@huawei.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Patrick Wang <patrick.wang.shcn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:44 -08:00
Liu Shixin
4eff7d62ab Revert "mm/kmemleak: move the initialisation of object to __link_object"
Patch series "Fix invalid wait context of set_track_prepare()".

Geert reported an invalid wait context[1] which is resulted by moving
set_track_prepare() inside kmemleak_lock.  This is not allowed because in
RT mode, the spinlocks can be preempted but raw_spinlocks can not, so it
is not allowd to acquire spinlocks while holding raw_spinlocks.  The
second patch fix same problem in kmemleak_update_trace().


This patch (of 2):

Move the initialisation of object back to__alloc_object() because
set_track_prepare() attempt to acquire zone->lock(spinlocks) while
__link_object is holding kmemleak_lock(raw_spinlocks).  This is not right
for RT mode.

This reverts commit 245245c2fffd00 ("mm/kmemleak: move the initialisation
of object to __link_object").

Link: https://lkml.kernel.org/r/20231115082138.2649870-1-liushixin2@huawei.com
Link: https://lkml.kernel.org/r/20231115082138.2649870-2-liushixin2@huawei.com
Fixes: 245245c2fffd ("mm/kmemleak: move the initialisation of object to __link_object")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Closes: https://lore.kernel.org/linux-mm/CAMuHMdWj0UzwNaxUvcocTfh481qRJpOWwXxsJCTJfu1oCqvgdA@mail.gmail.com/ [1]
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Patrick Wang <patrick.wang.shcn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:44 -08:00
Andrew Morton
727d16f199 mm/memory.c:zap_pte_range() print bad swap entry
We have a report of this WARN() triggering.  Let's print the offending
swp_entry_t to help diagnosis.

Link: https://lkml.kernel.org/r/000000000000b0e576060a30ee3b@google.com
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:43 -08:00
Mike Kravetz
187da0f825 hugetlb: fix null-ptr-deref in hugetlb_vma_lock_write
The routine __vma_private_lock tests for the existence of a reserve map
associated with a private hugetlb mapping.  A pointer to the reserve map
is in vma->vm_private_data.  __vma_private_lock was checking the pointer
for NULL.  However, it is possible that the low bits of the pointer could
be used as flags.  In such instances, vm_private_data is not NULL and not
a valid pointer.  This results in the null-ptr-deref reported by syzbot:

general protection fault, probably for non-canonical address 0xdffffc000000001d:
 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000000e8-0x00000000000000ef]
CPU: 0 PID: 5048 Comm: syz-executor139 Not tainted 6.6.0-rc7-syzkaller-00142-g88
8cf78c29e2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 1
0/09/2023
RIP: 0010:__lock_acquire+0x109/0x5de0 kernel/locking/lockdep.c:5004
...
Call Trace:
 <TASK>
 lock_acquire kernel/locking/lockdep.c:5753 [inline]
 lock_acquire+0x1ae/0x510 kernel/locking/lockdep.c:5718
 down_write+0x93/0x200 kernel/locking/rwsem.c:1573
 hugetlb_vma_lock_write mm/hugetlb.c:300 [inline]
 hugetlb_vma_lock_write+0xae/0x100 mm/hugetlb.c:291
 __hugetlb_zap_begin+0x1e9/0x2b0 mm/hugetlb.c:5447
 hugetlb_zap_begin include/linux/hugetlb.h:258 [inline]
 unmap_vmas+0x2f4/0x470 mm/memory.c:1733
 exit_mmap+0x1ad/0xa60 mm/mmap.c:3230
 __mmput+0x12a/0x4d0 kernel/fork.c:1349
 mmput+0x62/0x70 kernel/fork.c:1371
 exit_mm kernel/exit.c:567 [inline]
 do_exit+0x9ad/0x2a20 kernel/exit.c:861
 __do_sys_exit kernel/exit.c:991 [inline]
 __se_sys_exit kernel/exit.c:989 [inline]
 __x64_sys_exit+0x42/0x50 kernel/exit.c:989
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Mask off low bit flags before checking for NULL pointer.  In addition, the
reserve map only 'belongs' to the OWNER (parent in parent/child
relationships) so also check for the OWNER flag.

Link: https://lkml.kernel.org/r/20231114012033.259600-1-mike.kravetz@oracle.com
Reported-by: syzbot+6ada951e7c0f7bc8a71e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-mm/00000000000078d1e00608d7878b@google.com/
Fixes: bf4916922c60 ("hugetlbfs: extend hugetlb_vma_lock to private VMAs")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Rik van Riel <riel@surriel.com>
Cc: Edward Adam Davis <eadavis@qq.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Tom Rix <trix@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:43 -08:00
Andrew Morton
b197d16669 MAINTAINERS: add Andrew Morton for lib/*
Add myself as the fallthough maintainer for material under lib/.

Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06 16:12:43 -08:00
Daniel Hill
e597288839 bcachefs: rebalance shouldn't attempt to compress unwritten extents
This fixes a bug where rebalance would loop repeatedly on the same
extents.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-06 17:43:21 -05:00
Fabio Estevam
136c6531ba dt-bindings: display: adi,adv75xx: Document #sound-dai-cells
When using audio from ADV7533 or ADV7535 and describing the audio
card via simple-audio-card, the '#sound-dai-cells' needs to be passed.

Document the '#sound-dai-cells' property to fix the following
dt-schema warning:

imx8mn-beacon-kit.dtb: hdmi@3d: '#sound-dai-cells' does not match any of the regexes: 'pinctrl-[0-9]+'
	from schema $id: http://devicetree.org/schemas/display/bridge/adi,adv7533.yaml#

Signed-off-by: Fabio Estevam <festevam@denx.de>
Acked-by: Adam Ford <aford173@gmail.com>
Link: https://lore.kernel.org/r/20231206093643.2198562-1-festevam@gmail.com
Signed-off-by: Rob Herring <robh@kernel.org>
2023-12-06 16:36:14 -06:00
Fabio Estevam
b6c7ca4d79 dt-bindings: lcdif: Properly describe the i.MX23 interrupts
i.MX23 has two LCDIF interrupts instead of a single one like other
i.MX devices.

Take this into account for properly describing the i.MX23 LCDIF
interrupts.

This fixes the following dt-schema warning:

imx23-olinuxino.dtb: lcdif@80030000: interrupts: [[46], [45]] is too long
        from schema $id: http://devicetree.org/schemas/display/fsl,lcdif.yaml#

Signed-off-by: Fabio Estevam <festevam@denx.de>
Reviewed-by: Marek Vasut <marex@denx.de>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20231206112337.2234849-1-festevam@gmail.com
Signed-off-by: Rob Herring <robh@kernel.org>
2023-12-06 16:36:05 -06:00
Jens Axboe
7d2affce33 Merge tag 'md-fixes-20231206' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.7
Pull MD fixes from Song:

"This set from Yu Kuai fixes issues around sync_work, which was introduced
 in 6.7 kernels."

* tag 'md-fixes-20231206' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md: fix stopping sync thread
  md: don't leave 'MD_RECOVERY_FROZEN' in error path of md_set_readonly()
  md: fix missing flush of sync_work
2023-12-06 15:31:58 -07:00
Jiri Olsa
ffed24eff9 selftests/bpf: Add test for early update in prog_array_map_poke_run
Adding test that tries to trigger the BUG_IN during early map update
in prog_array_map_poke_run function.

The idea is to share prog array map between thread that constantly
updates it and another one loading a program that uses that prog
array.

Eventually we will hit a place where the program is ok to be updated
(poke->tailcall_target_stable check) but the address is still not
registered in kallsyms, so the bpf_arch_text_poke returns -EINVAL
and cause imbalance for the next tail call update check, which will
fail with -EBUSY in bpf_arch_text_poke as described in previous fix.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20231206083041.1306660-3-jolsa@kernel.org
2023-12-06 22:40:43 +01:00
Jiri Olsa
4b7de80160 bpf: Fix prog_array_map_poke_run map poke update
Lee pointed out issue found by syscaller [0] hitting BUG in prog array
map poke update in prog_array_map_poke_run function due to error value
returned from bpf_arch_text_poke function.

There's race window where bpf_arch_text_poke can fail due to missing
bpf program kallsym symbols, which is accounted for with check for
-EINVAL in that BUG_ON call.

The problem is that in such case we won't update the tail call jump
and cause imbalance for the next tail call update check which will
fail with -EBUSY in bpf_arch_text_poke.

I'm hitting following race during the program load:

  CPU 0                             CPU 1

  bpf_prog_load
    bpf_check
      do_misc_fixups
        prog_array_map_poke_track

                                    map_update_elem
                                      bpf_fd_array_map_update_elem
                                        prog_array_map_poke_run

                                          bpf_arch_text_poke returns -EINVAL

    bpf_prog_kallsyms_add

After bpf_arch_text_poke (CPU 1) fails to update the tail call jump, the next
poke update fails on expected jump instruction check in bpf_arch_text_poke
with -EBUSY and triggers the BUG_ON in prog_array_map_poke_run.

Similar race exists on the program unload.

Fixing this by moving the update to bpf_arch_poke_desc_update function which
makes sure we call __bpf_arch_text_poke that skips the bpf address check.

Each architecture has slightly different approach wrt looking up bpf address
in bpf_arch_text_poke, so instead of splitting the function or adding new
'checkip' argument in previous version, it seems best to move the whole
map_poke_run update as arch specific code.

  [0] https://syzkaller.appspot.com/bug?extid=97a4fe20470e9bc30810

Fixes: ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Reported-by: syzbot+97a4fe20470e9bc30810@syzkaller.appspotmail.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Cc: Lee Jones <lee@kernel.org>
Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20231206083041.1306660-2-jolsa@kernel.org
2023-12-06 22:40:16 +01:00
Boris Burkov
e85a0adacf btrfs: ensure releasing squota reserve on head refs
A reservation goes through a 3 step lifetime:

- generated during delalloc
- released/counted by ordered_extent allocation
- freed by running delayed ref

That third step depends on must_insert_reserved on the head ref, so the
head ref with that field set owns the reservation. Once you prepare to
run the head ref, must_insert_reserved is unset, which means that
running the ref must free the reservation, whether or not it succeeds,
or else the reservation is leaked. That results in either a risk of
spurious ENOSPC if the fs stays writeable or a warning on unmount if it
is readonly.

The existing squota code was aware of these invariants, but missed a few
cases. Improve it by adding a helper function to use in the cleanup
paths and call it from the existing early returns in running delayed
refs. This also simplifies btrfs_record_squota_delta and struct
btrfs_quota_delta.

This fixes (or at least improves the reliability of) generic/475 with
"mkfs -O squota". On my machine, that test failed ~4/10 times without
this patch and passed 100/100 times with it.

Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-06 22:32:57 +01:00
Boris Burkov
a86805504b btrfs: don't clear qgroup reserved bit in release_folio
The EXTENT_QGROUP_RESERVED bit is used to "lock" regions of the file for
duplicate reservations. That is two writes to that range in one
transaction shouldn't create two reservations, as the reservation will
only be freed once when the write finally goes down. Therefore, it is
never OK to clear that bit without freeing the associated qgroup
reserve. At this point, we don't want to be freeing the reserve, so mask
off the bit.

CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-06 22:32:52 +01:00
Boris Burkov
b321a52cce btrfs: free qgroup pertrans reserve on transaction abort
If we abort a transaction, we never run the code that frees the pertrans
qgroup reservation. This results in warnings on unmount as that
reservation has been leaked. The leak isn't a huge issue since the fs is
read-only, but it's better to clean it up when we know we can/should. Do
it during the cleanup_transaction step of aborting.

CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-06 22:32:49 +01:00
Boris Burkov
9e65bfca24 btrfs: fix qgroup_free_reserved_data int overflow
The reserved data counter and input parameter is a u64, but we
inadvertently accumulate it in an int. Overflowing that int results in
freeing the wrong amount of data and breaking reserve accounting.

Unfortunately, this overflow rot spreads from there, as the qgroup
release/free functions rely on returning an int to take advantage of
negative values for error codes.

Therefore, the full fix is to return the "released" or "freed" amount by
a u64 argument and to return 0 or negative error code via the return
value.

Most of the call sites simply ignore the return value, though some
of them handle the error and count the returned bytes. Change all of
them accordingly.

CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-06 22:32:46 +01:00
Boris Burkov
f63e1164b9 btrfs: free qgroup reserve when ORDERED_IOERR is set
An ordered extent completing is a critical moment in qgroup reserve
handling, as the ownership of the reservation is handed off from the
ordered extent to the delayed ref. In the happy path we release (unlock)
but do not free (decrement counter) the reservation, and the delayed ref
drives the free. However, on an error, we don't create a delayed ref,
since there is no ref to add. Therefore, free on the error path.

CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-06 22:32:40 +01:00
Alex Deucher
dab96d8b61 drm/amdgpu: fix buffer funcs setting order on suspend
We need to disable this after the last eviction
call, but before we disable the SDMA IP.

Fixes: b70438004a14 ("drm/amdgpu: move buffer funcs setting up a level")
Link: https://lore.kernel.org/r/87edgv4x3i.fsf@vps.thesusis.net
Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Tested-by: Phillip Susi <phill@thesusis.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Phillip Susi <phill@thesusis.net>
Cc: Luben Tuikov <ltuikov89@gmail.com>
2023-12-06 16:05:32 -05:00
Lijo Lazar
27b024a88a drm/amdgpu: Avoid querying DRM MGCG status
MP0 v13.0.6 SOCs don't support DRM MGCG.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 16:05:32 -05:00
Lijo Lazar
555e39f027 drm/amdgpu: Update HDP 4.4.2 clock gating flags
HDP 4.4.2 clockgating is enabled by default, update the flags
accordingly.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 16:05:32 -05:00
Lijo Lazar
81577503ef drm/amdgpu: Add NULL checks for function pointers
Check if function is implemented before making the call.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 16:05:32 -05:00
Lijo Lazar
6fce23a4d8 drm/amdgpu: Restrict extended wait to PSP v13.0.6
Only PSPv13.0.6 SOCs take a longer time to reach steady state. Other
PSPv13 based SOCs don't need extended wait. Also, reduce PSPv13.0.6 wait
time.

Cc: stable@vger.kernel.org
Fixes: fc5988907156 ("drm/amdgpu: update retry times for psp vmbx wait")
Fixes: d8c1925ba8cd ("drm/amdgpu: update retry times for psp BL wait")
Link: https://lore.kernel.org/amd-gfx/34dd4c66-f7bf-44aa-af8f-c82889dd652c@amd.com/
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 16:05:32 -05:00
Alex Deucher
5b750b2253 drm/amd/display: Increase frame warning limit with KASAN or KCSAN in dml
Does the same thing as:
commit 6740ec97bcdb ("drm/amd/display: Increase frame warning limit with KASAN or KCSAN in dml2")

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202311302107.hUDXVyWT-lkp@intel.com/
Fixes: 67e38874b85b ("drm/amd/display: Increase num voltage states to 40")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Alvin Lee <alvin.lee2@amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
Cc: Samson Tam <samson.tam@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
2023-12-06 16:05:32 -05:00
Yang Wang
dbf3850d12 drm/amdgpu: optimize the printing order of error data
sort error data list to optimize the printing order.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 16:05:32 -05:00
Hawking Zhang
0e8af20517 drm/amdgpu: Update fw version for boot time error query
Boot time error query is not available until fw a10109

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 16:05:32 -05:00
Yang Wang
37c57631c1 drm/amd/pm: support new mca smu error code decoding
support new mca smu error code decoding from smu 85.86.0 for smu v13.0.6

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 16:05:32 -05:00
Li Ma
78825df90d drm/amd/swsmu: update smu v14_0_0 driver if version and metrics table
Increment the driver if version and add new mems to the mertics table.

Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 16:05:32 -05:00
Roman Li
9f7cb03e3c drm/amd/display: Fix array-index-out-of-bounds in dml2
[Why]
UBSAN errors observed in dmesg.
array-index-out-of-bounds in dml2/display_mode_core.c

[How]
Fix the index.

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Acked-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Signed-off-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 16:05:32 -05:00
Ivan Lipski
3d71a8726e drm/amd/display: Add monitor patch for specific eDP
[WHY]
Some eDP panels's ext caps don't write initial value cause the value of
dpcd_addr(0x317) is random.  It means that sometimes the eDP will
clarify it is OLED, miniLED...etc cause the backlight control interface
is incorrect.

[HOW]
Add a new panel patch to remove sink ext caps(HDR,OLED...etc)

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Sun peng Li <sunpeng.li@amd.com>
Acked-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Signed-off-by: Ivan Lipski <ivlipski@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 15:59:17 -05:00
Alvin Lee
fec05adc40 drm/amd/display: Use channel_width = 2 for vram table 3.0
VBIOS has suggested to use channel_width=2 for any ASIC that uses vram
info 3.0. This is because channel_width in the vram table no longer
represents the memory width

Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Samson Tam <samson.tam@amd.com>
Acked-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Signed-off-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-06 15:59:17 -05:00
Yu Kuai
f52f5c71f3 md: fix stopping sync thread
Currently sync thread is stopped from multiple contex:
 - idle_sync_thread
 - frozen_sync_thread
 - __md_stop_writes
 - md_set_readonly
 - do_md_stop

And there are some problems:
1) sync_work is flushed while reconfig_mutex is grabbed, this can
   deadlock because the work function will grab reconfig_mutex as well.
2) md_reap_sync_thread() can't be called directly while md_do_sync() is
   not finished yet, for example, commit 130443d60b1b ("md: refactor
   idle/frozen_sync_thread() to fix deadlock").
3) If MD_RECOVERY_RUNNING is not set, there is no need to stop
   sync_thread at all because sync_thread must not be registered.

Factor out a helper stop_sync_thread(), so that above contex will behave
the same. Fix 1) by flushing sync_work after reconfig_mutex is released,
before waiting for sync_thread to be done; Fix 2) bt letting daemon thread
to unregister sync_thread; Fix 3) by always checking MD_RECOVERY_RUNNING
first.

Fixes: db5e653d7c9f ("md: delay choosing sync action to md_start_sync()")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231205094215.1824240-4-yukuai1@huaweicloud.com
2023-12-06 12:44:00 -08:00
Yu Kuai
c9f7cb5b2b md: don't leave 'MD_RECOVERY_FROZEN' in error path of md_set_readonly()
If md_set_readonly() failed, the array could still be read-write, however
'MD_RECOVERY_FROZEN' could still be set, which leave the array in an
abnormal state that sync or recovery can't continue anymore.
Hence make sure the flag is cleared after md_set_readonly() returns.

Fixes: 88724bfa68be ("md: wait for pending superblock updates before switching to read-only")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231205094215.1824240-3-yukuai1@huaweicloud.com
2023-12-06 12:44:00 -08:00
Yu Kuai
f2d87a759f md: fix missing flush of sync_work
Commit ac619781967b ("md: use separate work_struct for md_start_sync()")
use a new sync_work to replace del_work, however, stop_sync_thread() and
__md_stop_writes() was trying to wait for sync_thread to be done, hence
they should switch to use sync_work as well.

Noted that md_start_sync() from sync_work will grab 'reconfig_mutex',
hence other contex can't held the same lock to flush work, and this will
be fixed in later patches.

Fixes: ac619781967b ("md: use separate work_struct for md_start_sync()")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20231205094215.1824240-2-yukuai1@huaweicloud.com
2023-12-06 12:44:00 -08:00
Jiadong Zhu
d6a5758866 drm/amdgpu: disable MCBP by default
Disable MCBP(mid command buffer preemption) by default as old Mesa
hangs with it. We shall not enable the feature that breaks old usermode
driver.

Fixes: 50a7c8765ca6 ("drm/amdgpu: enable mcbp by default on gfx9")
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-12-06 15:39:59 -05:00
Steven Rostedt (Google)
f458a14534 ring-buffer: Test last update in 32bit version of __rb_time_read()
Since 64 bit cmpxchg() is very expensive on 32bit architectures, the
timestamp used by the ring buffer does some interesting tricks to be able
to still have an atomic 64 bit number. It originally just used 60 bits and
broke it up into two 32 bit words where the extra 2 bits were used for
synchronization. But this was not enough for all use cases, and all 64
bits were required.

The 32bit version of the ring buffer timestamp was then broken up into 3
32bit words using the same counter trick. But one update was not done. The
check to see if the read operation was done without interruption only
checked the first two words and not last one (like it had before this
update). Fix it by making sure all three updates happen without
interruption by comparing the initial counter with the last updated
counter.

Link: https://lore.kernel.org/linux-trace-kernel/20231206100050.3100b7bb@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: f03f2abce4f39 ("ring-buffer: Have 32 bit time stamps use all 64 bits")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-12-06 15:01:49 -05:00
Steven Rostedt (Google)
b2dd797543 ring-buffer: Force absolute timestamp on discard of event
There's a race where if an event is discarded from the ring buffer and an
interrupt were to happen at that time and insert an event, the time stamp
is still used from the discarded event as an offset. This can screw up the
timings.

If the event is going to be discarded, set the "before_stamp" to zero.
When a new event comes in, it compares the "before_stamp" with the
"write_stamp" and if they are not equal, it will insert an absolute
timestamp. This will prevent the timings from getting out of sync due to
the discarded event.

Link: https://lore.kernel.org/linux-trace-kernel/20231206100244.5130f9b3@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 6f6be606e763f ("ring-buffer: Force before_stamp and write_stamp to be different on discard")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-12-06 15:00:59 -05:00
Shyam Prasad N
04909192ad cifs: reconnect worker should take reference on server struct unconditionally
Reconnect worker currently assumes that the server struct
is alive and only takes reference on the server if it needs
to call smb2_reconnect.

With the new ability to disable channels based on whether the
server has multichannel disabled, this becomes a problem when
we need to disable established channels. While disabling the
channels and deallocating the server, there could be reconnect
work that could not be cancelled (because it started).

This change forces the reconnect worker to unconditionally
take a reference on the server when it runs.

Also, this change now allows smb2_reconnect to know if it was
called by the reconnect worker. Based on this, the cifs_put_tcp_session
can decide whether it can cancel the reconnect work synchronously or not.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-06 11:04:23 -06:00
Shyam Prasad N
8233425248 Revert "cifs: reconnect work should have reference on server struct"
This reverts commit 19a4b9d6c372cab6a3b2c9a061a236136fe95274.

This earlier commit was making an assumption that each mod_delayed_work
called for the reconnect work would result in smb2_reconnect_server
being called twice. This assumption turns out to be untrue. So reverting
this change for now.

I will submit a follow-up patch to fix the actual problem in a different
way.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-12-06 11:03:36 -06:00
Phil Sutter
7ae836a3d6 netfilter: xt_owner: Fix for unsafe access of sk->sk_socket
A concurrently running sock_orphan() may NULL the sk_socket pointer in
between check and deref. Follow other users (like nft_meta.c for
instance) and acquire sk_callback_lock before dereferencing sk_socket.

Fixes: 0265ab44bacc ("[NETFILTER]: merge ipt_owner/ip6t_owner in xt_owner")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-06 17:52:15 +01:00
Arnd Bergmann
b0b2981c49 Arm FF-A fixes for v6.7
A bunch of fixes addressing issues around the notification support that
 was added this cycle. They address issue in partition IDs handling in
 ffa_notification_info_get(), notifications cleanup path and the size of
 the allocation in ffa_partitions_cleanup().
 
 It also adds check for the notification enabled state so that the drivers
 registering the callbacks can be rejected if not enabled/supported.
 
 It also moves the partitions setup operation after the notification
 initialisation so that the driver has the correct state for notification
 enabled/supported before the partitions are initialised/setup.
 
 It also now allows FF-A initialisation to complete successfully even
 when the notification initialisation fails as it is an optional support
 in the specification. Initial support just allowed it only if the
 firmware didn't support notifications.
 
 Finally, it also adds a fix for smatch warning by declaring ffa_bus_type
 structure in the header.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEunHlEgbzHrJD3ZPhAEG6vDF+4pgFAmVWagYACgkQAEG6vDF+
 4pjtzg//blfExUD8Eu78LtL0CL58qzUrA1bOwT52xbGl202hoykgB49x0YJ4uZMP
 edkwPeUzT1sizvd2/kI+q7qjCrBqwwmaFGD1QUzMX0O7d9bpNZ2r8xzrTEDTUvOh
 yLi4F6D1m50lY2ZFq/8iu/eTtrAHKX0zzmnz0NgNLqITbOMpp1GAvUkhFJjmL4c+
 8MPZL3hHOfVV4m06YQHmpHEbdZiJ5iqY38nXW4Lyl9xQOO9E7/OU48bt5+fFC1SN
 E3PRGrqFKz4O2iCOCEb/mIa94fOjaO4ymgc1/jtUUSZ3ok4+yQlz2ZK2JNPQbwhs
 C8MB/arxIWBdNKejXjUquPiZ4z3FbFIG9A3MptxFQQH6QudygldxifVcqh6AUeJa
 44iaz0Il7oFZF1E1iiMqKHdQEoKXLo0IiGJlBkwTrUKqEQQ0V12hLcujVxKcHMxh
 /UTE5R/yzo5It9NiS9RSHzavTXSbe6B66zepSJv0sm3YWwgaaJZ1uShKHhs9L99L
 rPfTXHcAk5QTCWn66KLnNAW7NzdZ/2paZ+G7gNadqRcVyYdk9Vce3C4HXmse7eOk
 6nfOaRrOSCKFbH5q05k7JUcV6On+2fUFHFpQOnxxatglrpf08WHVLsir5s8DyiZ3
 WC81z23HkoqnNZ63ofw5FUdervQSmB3iK+m+R7Hqrowqg9lcT1U=
 =mgRO
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmVwotEACgkQYKtH/8kJ
 Uif3QA/+MGZdnJywBQKQ89cFl1LMBYhm6+NC/qr6zzTJX8hwAuO31yR4WzUb6H2M
 Y+kZKXmWUYz98ylSBDLXWAWjfI1bHr68h+4LHPWway9SPfrt9ZtTdI0m3kTtcuTQ
 mIa8mkewa2MnD6lxjmkl5jUHjywpFsSCJGHnCRYmoCkUtGVDbXqp4ByBhpQPIBGu
 FeTj8w4EQ3MIf7zIMqm6UmpRrB+zUjwPeinHBcsv6DJvTSn2dWyqpJ1DhoA6Ulkk
 xBrULU2pE6b1qP6WwG81+Nsrdg9NIaO29PNBylEyQQEYQGPTvRwcfVouhRKmeAUk
 2C1JOujbLk67CRfvD5CCyxvptJPW++3scrFFFY1R56B8Z+xz2Kx/IlkXv7U4W5xy
 AWU7kYzWY4andQdzoM1S8M+qwgWKcQ9QiddHOHgnqLLvgn8HOAYUF2LdJneHdr+t
 MZpKKAfYyBlz0tXT7p6YoHDVIRe5pQgb5Pd+irFCpk/40b+xoV4OD4u+zO7JW2e8
 DUGvIbNwQsT8lpgQLmYJfoLZJzEVpHXUiQ9h5g8gzVnL4cDKsvxlvWc6RFTZxDaD
 FgHHQhY0qPyEzVNrw/88QnaT/3XwgB2Boyx1BbYmNovfHRX2zw/UsEQlVNwCjzZ9
 mSUNXTCjfA1sYeX5Akah2upswstNEfs8xWMTLLywsxtJ6SB1zPI=
 =4Rj2
 -----END PGP SIGNATURE-----

Merge tag 'ffa-fixes-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into arm/fixes

Arm FF-A fixes for v6.7

A bunch of fixes addressing issues around the notification support that
was added this cycle. They address issue in partition IDs handling in
ffa_notification_info_get(), notifications cleanup path and the size of
the allocation in ffa_partitions_cleanup().

It also adds check for the notification enabled state so that the drivers
registering the callbacks can be rejected if not enabled/supported.

It also moves the partitions setup operation after the notification
initialisation so that the driver has the correct state for notification
enabled/supported before the partitions are initialised/setup.

It also now allows FF-A initialisation to complete successfully even
when the notification initialisation fails as it is an optional support
in the specification. Initial support just allowed it only if the
firmware didn't support notifications.

Finally, it also adds a fix for smatch warning by declaring ffa_bus_type
structure in the header.

* tag 'ffa-fixes-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
  firmware: arm_ffa: Fix ffa_notification_info_get() IDs handling
  firmware: arm_ffa: Fix the size of the allocation in ffa_partitions_cleanup()
  firmware: arm_ffa: Fix FFA notifications cleanup path
  firmware: arm_ffa: Add checks for the notification enabled state
  firmware: arm_ffa: Setup the partitions after the notification initialisation
  firmware: arm_ffa: Allow FF-A initialisation even when notification fails
  firmware: arm_ffa: Declare ffa_bus_type structure in the header

Link: https://lore.kernel.org/r/20231116191603.929767-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-12-06 17:35:29 +01:00
Pablo Neira Ayuso
f6e1532a26 netfilter: nf_tables: validate family when identifying table via handle
Validate table family when looking up for it via NFTA_TABLE_HANDLE.

Fixes: 3ecbfd65f50e ("netfilter: nf_tables: allocate handle and delete objects via handle")
Reported-by: Xingyuan Mo <hdthky0@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-06 17:15:43 +01:00
Pablo Neira Ayuso
3701cd390f netfilter: nf_tables: bail out on mismatching dynset and set expressions
If dynset expressions provided by userspace is larger than the declared
set expressions, then bail out.

Fixes: 48b0ae046ee9 ("netfilter: nftables: netlink support for several set element expressions")
Reported-by: Xingyuan Mo <hdthky0@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-06 17:15:43 +01:00
Florian Westphal
63331e37fb netfilter: nf_tables: fix 'exist' matching on bigendian arches
Maze reports "tcp option fastopen exists" fails to match on
OpenWrt 22.03.5, r20134-5f15225c1e (5.10.176) router.

"tcp option fastopen exists" translates to:
inet
  [ exthdr load tcpopt 1b @ 34 + 0 present => reg 1 ]
  [ cmp eq reg 1 0x00000001 ]

.. but existing nft userspace generates a 1-byte compare.

On LSB (x86), "*reg32 = 1" is identical to nft_reg_store8(reg32, 1), but
not on MSB, which will place the 1 last. IOW, on bigendian aches the cmp8
is awalys false.

Make sure we store this in a consistent fashion, so existing userspace
will also work on MSB (bigendian).

Regardless of this patch we can also change nft userspace to generate
'reg32 == 0' and 'reg32 != 0' instead of u8 == 0 // u8 == 1 when
adding 'option x missing/exists' expressions as well.

Fixes: 3c1fece8819e ("netfilter: nft_exthdr: Allow checking TCP option presence, too")
Fixes: b9f9a485fb0e ("netfilter: nft_exthdr: add boolean DCCP option matching")
Fixes: 055c4b34b94f ("netfilter: nft_fib: Support existence check")
Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com>
Closes: https://lore.kernel.org/netfilter-devel/CAHo-OozyEqHUjL2-ntATzeZOiuftLWZ_HU6TOM_js4qLfDEAJg@mail.gmail.com/
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-06 17:15:42 +01:00
Florian Westphal
317eb96850 netfilter: nft_set_pipapo: skip inactive elements during set walk
Otherwise set elements can be deactivated twice which will cause a crash.

Reported-by: Xingyuan Mo <hdthky0@gmail.com>
Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-06 17:14:37 +01:00
D. Wythe
1834d62ae8 netfilter: bpf: fix bad registration on nf_defrag
We should pass a pointer to global_hook to the get_proto_defrag_hook()
instead of its value, since the passed value won't be updated even if
the request module was loaded successfully.

Log:

[   54.915713] nf_defrag_ipv4 has bad registration
[   54.915779] WARNING: CPU: 3 PID: 6323 at net/netfilter/nf_bpf_link.c:62 get_proto_defrag_hook+0x137/0x160
[   54.915835] CPU: 3 PID: 6323 Comm: fentry Kdump: loaded Tainted: G            E      6.7.0-rc2+ #35
[   54.915839] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
[   54.915841] RIP: 0010:get_proto_defrag_hook+0x137/0x160
[   54.915844] Code: 4f 8c e8 2c cf 68 ff 80 3d db 83 9a 01 00 0f 85 74 ff ff ff 48 89 ee 48 c7 c7 8f 12 4f 8c c6 05 c4 83 9a 01 01 e8 09 ee 5f ff <0f> 0b e9 57 ff ff ff 49 8b 3c 24 4c 63 e5 e8 36 28 6c ff 4c 89 e0
[   54.915849] RSP: 0018:ffffb676003fbdb0 EFLAGS: 00010286
[   54.915852] RAX: 0000000000000023 RBX: ffff9596503d5600 RCX: ffff95996fce08c8
[   54.915854] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff95996fce08c0
[   54.915855] RBP: ffffffff8c4f12de R08: 0000000000000000 R09: 00000000fffeffff
[   54.915859] R10: ffffb676003fbc70 R11: ffffffff8d363ae8 R12: 0000000000000000
[   54.915861] R13: ffffffff8e1f75c0 R14: ffffb676003c9000 R15: 00007ffd15e78ef0
[   54.915864] FS:  00007fb6e9cab740(0000) GS:ffff95996fcc0000(0000) knlGS:0000000000000000
[   54.915867] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   54.915868] CR2: 00007ffd15e75c40 CR3: 0000000101e62006 CR4: 0000000000360ef0
[   54.915870] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   54.915871] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   54.915873] Call Trace:
[   54.915891]  <TASK>
[   54.915894]  ? __warn+0x84/0x140
[   54.915905]  ? get_proto_defrag_hook+0x137/0x160
[   54.915908]  ? __report_bug+0xea/0x100
[   54.915925]  ? report_bug+0x2b/0x80
[   54.915928]  ? handle_bug+0x3c/0x70
[   54.915939]  ? exc_invalid_op+0x18/0x70
[   54.915942]  ? asm_exc_invalid_op+0x1a/0x20
[   54.915948]  ? get_proto_defrag_hook+0x137/0x160
[   54.915950]  bpf_nf_link_attach+0x1eb/0x240
[   54.915953]  link_create+0x173/0x290
[   54.915969]  __sys_bpf+0x588/0x8f0
[   54.915974]  __x64_sys_bpf+0x20/0x30
[   54.915977]  do_syscall_64+0x45/0xf0
[   54.915989]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[   54.915998] RIP: 0033:0x7fb6e9daa51d
[   54.916001] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2b 89 0c 00 f7 d8 64 89 01 48
[   54.916003] RSP: 002b:00007ffd15e78ed8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
[   54.916006] RAX: ffffffffffffffda RBX: 00007ffd15e78fc0 RCX: 00007fb6e9daa51d
[   54.916007] RDX: 0000000000000040 RSI: 00007ffd15e78ef0 RDI: 000000000000001c
[   54.916009] RBP: 000000000000002d R08: 00007fb6e9e73a60 R09: 0000000000000001
[   54.916010] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000006
[   54.916012] R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000
[   54.916014]  </TASK>
[   54.916015] ---[ end trace 0000000000000000 ]---

Fixes: 91721c2d02d3 ("netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Acked-by: Daniel Xu <dxu@dxuuu.xyz>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-06 17:14:26 +01:00
Chester Lin
94ea0ed356
MAINTAINERS: change the S32G2 maintainer's email address.
I am leaving SUSE so the current email address <clin@suse.com> will be
disabled soon. <chester62515@gmail.com> will be my new address for handling
emails, patches and pull requests from upstream and communities.

Cc: Chester Lin <chester62515@gmail.com>
Cc: NXP S32 Linux Team <s32@nxp.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Olof Johansson <olof@lixom.net>
Cc: Andreas Färber <afaerber@suse.de>
Cc: Matthias Brugger <mbrugger@suse.com>
Signed-off-by: Chester Lin <clin@suse.com>
Reviewed-by: Matthias Brugger <mbrugger@suse.com>
Link: https://lore.kernel.org/r/20231115234508.11510-1-clin@suse.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-12-06 16:46:49 +01:00
Arnd Bergmann
56d4eaed1e This pull request contains Broadcom ARM-based SoCs Device Tree fixes for
6.7, please pull the following:
 
 - Stefan corrects the disabling of the activity LED for the Raspberry Pi
   400
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEm+Rq3+YGJdiR9yuFh9CWnEQHBwQFAmVvqQ0ACgkQh9CWnEQH
 BwRkzRAAioWk537inVyfayyYXimkM+GdCkQ8iReieWPDtTJIqRp7OJunf/+mEciP
 r0fmCYmDTQjnHLmR0rZUKQxlLRyvHdW7ghsFyQGQVKfINc+/wfU4XzsbkxiEZUu4
 0vfn9D/m3NHJSooyJH2ec7JWQawwEj9LaCOVI3LlzRrChVeCluF/w9jhBL/PqpC0
 cLFZqr9Ym2olSets3gDgU3fkWBTklfcO4mYOTT3QvLAbgJ25lCJw0tyEnSelet9g
 35d74cKh9DOoEGTm4fc+BrDHSurclvRH24J2hbp/AxSe6JxLYuLaaiVM1aVSEJff
 PhfJfF/ByhvAS1Y9L+UZlmAbwQuzUPm//7NAtKkpORdkzprgkBeAUn2GawDLWdZY
 Dnm2twKtSaHSJ1kX9xjCju0bhikbMAN6L72ukfcWASWbWzlAmKhPA7QlFcGn4jip
 VjflqpKQoJGp5uc4bAb2G0bWlUs8eAvVgJk4lsyIdeZ+6Ft5rCliHRzprUBh3tj2
 xF4IjcgysuFU5Q68AHoQg5eA2ONiyI22o/4BxwgYE0+DDi4WVcuyXQNDC/QWeFWg
 o+uRjr04GSjpGa/4qrM3LqP/Hq6pTfGO7uKf5PMMA/jBVBqh0yg9Md3zEEocUf2t
 YTgtxbjchoHiPxtF69mfhbdXxIaJ0Vm7QruFXIXdFCRNOdzI91c=
 =Q9Li
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmVwlR0ACgkQYKtH/8kJ
 UidvvA//SW68rwNwprc35SJ5UBT5eh7GdjAr2W2Jam4oc4DgY9gzcwaTMDVhrw+4
 sQcz/9j1jquX1l0RREQas2F7YSZbo3TCtWHMpHQS4Tv29oWNPI4YaPS9vjvgndll
 CoZOT1zOaSzK9b7za/LLoYfspMb4fUqwX03vwZIVV4CYkr0shivrHvxvxNQuVzMH
 w4A54D3iJFkjYVp/OdfIDGSgShgX4umGb2vUYv1EykFxeyQ6w2rsuFbu55Gcq0Hy
 dBMiE62Cw+bKHjjbRmZRMX5v6AtWor4uaYnWxhAvg+YboLEVb6ZJOh4SZoZPUM15
 cXdCLVwiViBfsJIq/GYgY43HL2fTYk+PLlK+bGxeursiHDIwLBuRysrj6k2diW4j
 NST4H7J3bRfUzGcvM9SqZzm6/JPrnGBxc/FsQJH7e+mT5FUg474CAPVhUp3WmxJZ
 yplMq3Vc42CCJVsODf/W3VaQmUsY8RGQryZ48eHgJPAe8LIhDL0hc+Wssq/EotLV
 gdFXFnrgY/LGi6TtuznbaNK5vbt+YKa8Zc/9hR5yjVJUwUXqCRepeiyz/1Jvbl77
 IJuul1EYcRyfo30JgjF5CCrraZRiCpu4fjvxw5cae0Q/OLIEDXgRmFkQ13Uyf/YY
 VDeTvI3hI7dzyUs+4vnhgwSHV4uGjZWDI9qnyJKkltPzcVcIXhE=
 =v2s8
 -----END PGP SIGNATURE-----

Merge tag 'arm-soc/for-6.7/devicetree-fixes' of https://github.com/Broadcom/stblinux into arm/fixes

This pull request contains Broadcom ARM-based SoCs Device Tree fixes for
6.7, please pull the following:

- Stefan corrects the disabling of the activity LED for the Raspberry Pi
  400

* tag 'arm-soc/for-6.7/devicetree-fixes' of https://github.com/Broadcom/stblinux:
  ARM: dts: bcm2711-rpi-400: Fix delete-node of led_act

Link: https://lore.kernel.org/r/20231205225021.653045-1-florian.fainelli@broadcom.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-12-06 16:37:01 +01:00
Heiner Kallweit
fe2b122665 leds: trigger: netdev: fix RTNL handling to prevent potential deadlock
When working on LED support for r8169 I got the following lockdep
warning. Easiest way to prevent this scenario seems to be to take
the RTNL lock before the trigger_data lock in set_device_name().

======================================================
WARNING: possible circular locking dependency detected
6.7.0-rc2-next-20231124+ #2 Not tainted
------------------------------------------------------
bash/383 is trying to acquire lock:
ffff888103aa1c68 (&trigger_data->lock){+.+.}-{3:3}, at: netdev_trig_notify+0xec/0x190 [ledtrig_netdev]

but task is already holding lock:
ffffffff8cddf808 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x12/0x20

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (rtnl_mutex){+.+.}-{3:3}:
       __mutex_lock+0x9b/0xb50
       mutex_lock_nested+0x16/0x20
       rtnl_lock+0x12/0x20
       set_device_name+0xa9/0x120 [ledtrig_netdev]
       netdev_trig_activate+0x1a1/0x230 [ledtrig_netdev]
       led_trigger_set+0x172/0x2c0
       led_trigger_write+0xf1/0x140
       sysfs_kf_bin_write+0x5d/0x80
       kernfs_fop_write_iter+0x15d/0x210
       vfs_write+0x1f0/0x510
       ksys_write+0x6c/0xf0
       __x64_sys_write+0x14/0x20
       do_syscall_64+0x3f/0xf0
       entry_SYSCALL_64_after_hwframe+0x6c/0x74

-> #0 (&trigger_data->lock){+.+.}-{3:3}:
       __lock_acquire+0x1459/0x25a0
       lock_acquire+0xc8/0x2d0
       __mutex_lock+0x9b/0xb50
       mutex_lock_nested+0x16/0x20
       netdev_trig_notify+0xec/0x190 [ledtrig_netdev]
       call_netdevice_register_net_notifiers+0x5a/0x100
       register_netdevice_notifier+0x85/0x120
       netdev_trig_activate+0x1d4/0x230 [ledtrig_netdev]
       led_trigger_set+0x172/0x2c0
       led_trigger_write+0xf1/0x140
       sysfs_kf_bin_write+0x5d/0x80
       kernfs_fop_write_iter+0x15d/0x210
       vfs_write+0x1f0/0x510
       ksys_write+0x6c/0xf0
       __x64_sys_write+0x14/0x20
       do_syscall_64+0x3f/0xf0
       entry_SYSCALL_64_after_hwframe+0x6c/0x74

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(rtnl_mutex);
                               lock(&trigger_data->lock);
                               lock(rtnl_mutex);
  lock(&trigger_data->lock);

 *** DEADLOCK ***

8 locks held by bash/383:
 #0: ffff888103ff33f0 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x6c/0xf0
 #1: ffff888103aa1e88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x114/0x210
 #2: ffff8881036f1890 (kn->active#82){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x11d/0x210
 #3: ffff888108e2c358 (&led_cdev->led_access){+.+.}-{3:3}, at: led_trigger_write+0x30/0x140
 #4: ffffffff8cdd9e10 (triggers_list_lock){++++}-{3:3}, at: led_trigger_write+0x75/0x140
 #5: ffff888108e2c270 (&led_cdev->trigger_lock){++++}-{3:3}, at: led_trigger_write+0xe3/0x140
 #6: ffffffff8cdde3d0 (pernet_ops_rwsem){++++}-{3:3}, at: register_netdevice_notifier+0x1c/0x120
 #7: ffffffff8cddf808 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x12/0x20

stack backtrace:
CPU: 0 PID: 383 Comm: bash Not tainted 6.7.0-rc2-next-20231124+ #2
Hardware name: Default string Default string/Default string, BIOS ADLN.M6.SODIMM.ZB.CY.015 08/08/2023
Call Trace:
 <TASK>
 dump_stack_lvl+0x5c/0xd0
 dump_stack+0x10/0x20
 print_circular_bug+0x2dd/0x410
 check_noncircular+0x131/0x150
 __lock_acquire+0x1459/0x25a0
 lock_acquire+0xc8/0x2d0
 ? netdev_trig_notify+0xec/0x190 [ledtrig_netdev]
 __mutex_lock+0x9b/0xb50
 ? netdev_trig_notify+0xec/0x190 [ledtrig_netdev]
 ? __this_cpu_preempt_check+0x13/0x20
 ? netdev_trig_notify+0xec/0x190 [ledtrig_netdev]
 ? __cancel_work_timer+0x11c/0x1b0
 ? __mutex_lock+0x123/0xb50
 mutex_lock_nested+0x16/0x20
 ? mutex_lock_nested+0x16/0x20
 netdev_trig_notify+0xec/0x190 [ledtrig_netdev]
 call_netdevice_register_net_notifiers+0x5a/0x100
 register_netdevice_notifier+0x85/0x120
 netdev_trig_activate+0x1d4/0x230 [ledtrig_netdev]
 led_trigger_set+0x172/0x2c0
 ? preempt_count_add+0x49/0xc0
 led_trigger_write+0xf1/0x140
 sysfs_kf_bin_write+0x5d/0x80
 kernfs_fop_write_iter+0x15d/0x210
 vfs_write+0x1f0/0x510
 ksys_write+0x6c/0xf0
 __x64_sys_write+0x14/0x20
 do_syscall_64+0x3f/0xf0
 entry_SYSCALL_64_after_hwframe+0x6c/0x74
RIP: 0033:0x7f269055d034
Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d 35 c3 0d 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48
RSP: 002b:00007ffddb7ef748 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f269055d034
RDX: 0000000000000007 RSI: 000055bf5f4af3c0 RDI: 0000000000000001
RBP: 000055bf5f4af3c0 R08: 0000000000000073 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000007
R13: 00007f26906325c0 R14: 00007f269062ff20 R15: 0000000000000000
 </TASK>

Fixes: d5e01266e7f5 ("leds: trigger: netdev: add additional specific link speed mode")
Cc: stable@vger.kernel.org
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Lee Jones <lee@kernel.org>
Link: https://lore.kernel.org/r/fb5c8294-2a10-4bf5-8f10-3d2b77d2757e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-06 07:36:55 -08:00